On Tue, 12 Dec 2023 23:06:14 -0500
Steve Litt <[email protected]> wrote:
> I've already split paratext into multiple LINE tokens which represent
> a line without its NL, and now I'm thinking of splitting line into
> multiple chars ("[^\n]"). Perhaps this will make the rules less
> complicated, though longer.

Have the scanner return only two tokens:

	LINE	a line of text, without its newline
	SEP	one or more blank lines

The lexer might have:

	.+/\n			{ ... return LINE; }
	(\n[[:blank:]]*){2,}	{ return SEP; } // two or more newlines, i.e. one or more blank lines
	\n			{ /* ignore */ }

Then your parser wants:

	top:		paragraphs
		|	paragraphs SEP	// to allow for trailing blank lines
		;

	paragraphs:	paragraph
		|	paragraphs SEP paragraph
		;

	paragraph:	lines
		;

	lines:		LINE
		|	lines LINE
		;

I would think that would work.
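
If you want to see the token stream before wiring up flex and bison, here is a hand-written C sketch of the same idea. It is only an approximation, not generated code: next_token() and count_paragraphs() are names I made up for illustration, and unlike the `.+/\n` rule the sketch also accepts a final line that has no newline.

```c
#include <assert.h>
#include <stddef.h>

enum token { END = 0, LINE, SEP };

/* Hypothetical helper: return the next token and advance *pp past it.
 * It imitates the three lexer rules, including longest-match behavior. */
static enum token next_token(const char **pp)
{
    const char *p = *pp;

    for (;;) {
        if (*p == '\0') {
            *pp = p;
            return END;
        }
        if (*p != '\n') {
            /* Like .+/\n : consume text up to, not including, the newline.
             * Assumption: a last line without a newline also counts. */
            while (*p != '\0' && *p != '\n')
                p++;
            *pp = p;
            return LINE;
        }
        /* Like (\n[[:blank:]]*){2,} : count newlines, each one
         * optionally followed by spaces or tabs. */
        const char *q = p;
        int newlines = 0;
        while (*q == '\n') {
            q++;
            newlines++;
            while (*q == ' ' || *q == '\t')
                q++;
        }
        if (newlines >= 2) {
            *pp = q;
            return SEP;
        }
        p++; /* Like the bare \n rule: ignore a single newline. */
    }
}

/* Hypothetical helper: accept the same language as the grammar
 *   top: paragraphs | paragraphs SEP
 * and return the number of paragraphs, or -1 on a syntax error. */
int count_paragraphs(const char *input)
{
    assert(input != NULL);

    const char *p = input;
    int count = 0;
    enum token t = next_token(&p);

    for (;;) {
        if (t != LINE)
            return -1;          /* a paragraph needs at least one LINE */
        while (t == LINE)       /* lines: LINE | lines LINE */
            t = next_token(&p);
        count++;
        if (t == END)
            return count;
        t = next_token(&p);     /* t was SEP; look past it */
        if (t == END)
            return count;       /* trailing blank lines are allowed */
    }
}
```

With this sketch, count_paragraphs("a\nb\n\nc\n") returns 2. Input that starts with blank lines returns -1, because the grammar as written has no rule for a leading SEP. Note also that the SEP pattern swallows any leading blanks on the first line of the next paragraph, since the trailing [[:blank:]]* is part of the match.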
--jkl