On Tue, 12 Dec 2023 23:06:14 -0500 Steve Litt <sl...@troubleshooters.com> wrote:
> I've already split paratext into multiple LINE tokens which represent
> a line without its NL, and now I'm thinking of splitting line into
> multiple chars ("[^\n]"). Perhaps this will make the rules less
> complicated, though longer.

Have the scanner return two tokens only:

    LINE    a line of text, no newline
    SEP     one or more blank lines

The lexer might have:

    .+/\n                   { ... return LINE; }
    (\n[[:blank:]]*){2,}    { return SEP; }  // two or more newlines, i.e. one or more blank lines
    \n                      { /* ignore */ }

Then your parser wants:

    top:        paragraphs
            |   paragraphs SEP      // to allow for trailing blank lines
            ;

    paragraphs: paragraph
            |   paragraphs SEP paragraph
            ;

    paragraph:  lines
            ;

    lines:      LINE
            |   lines LINE
            ;

I would think that would work.

--jkl
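
The LINE/SEP scheme above can be tried out without running lex or yacc. What follows is only a rough Python sketch of the same idea, not the suggested scanner itself: the names `TOKEN_RE`, `tokenize` and `parse` are made up for illustration, `[ \t]` stands in for `[[:blank:]]`, and the input is assumed to end with a newline (otherwise the last line has no trailing context to match, and the sketch raises an error).

```python
import re

# The three lexer rules as one regex. Python picks the first alternative that
# matches rather than the longest, so SEP is listed before the lone newline;
# for these three rules that gives the same choice lex would make.
TOKEN_RE = re.compile(
    r"(?P<SEP>(?:\n[ \t]*){2,})"    # two or more newlines: at least one blank line
    r"|(?P<LINE>[^\n]+(?=\n))"      # text up to, but not including, the newline
    r"|(?P<NL>\n)"                  # a single newline ending a line: ignored
)


def tokenize(text):
    """Return a list of ("LINE", text) and ("SEP", None) tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            # Happens when the final line lacks its trailing newline.
            raise ValueError(f"no rule matches at offset {pos}")
        if m.lastgroup == "LINE":
            tokens.append(("LINE", m.group()))
        elif m.lastgroup == "SEP":
            tokens.append(("SEP", None))
        pos = m.end()
    return tokens


def parse(tokens):
    """Group tokens into paragraphs: LINE+ (SEP LINE+)* with an optional final SEP."""
    paragraphs = []
    current = []
    for kind, value in tokens:
        if kind == "LINE":
            current.append(value)
        else:
            # A SEP must follow at least one LINE, as in the grammar.
            if not current:
                raise SyntaxError("SEP with no preceding paragraph")
            paragraphs.append(current)
            current = []
    if current:
        paragraphs.append(current)
    if not paragraphs:
        raise SyntaxError("expected at least one paragraph")
    return paragraphs


if __name__ == "__main__":
    sample = "one\ntwo\n\n\nthree\n\n"
    for number, lines in enumerate(parse(tokenize(sample)), start=1):
        print(number, lines)
```

As in the grammar, input that starts with blank lines is rejected, because a SEP then arrives before any paragraph; handling that case would need one more alternative under `top`.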