James K. Lowden said on Tue, 12 Dec 2023 20:24:35 -0500

>On Tue, 12 Dec 2023 23:06:14 -0500
>Steve Litt <sl...@troubleshooters.com> wrote:
>
>> I've already split paratext into multiple LINE tokens which represent
>> a line without its NL, and now I'm thinking of splitting line into
>> multiple chars ("[^\n]"). Perhaps this will make the rules less
>> complicated, though longer.
>
>Have the scanner return two tokens only:
>
>    LINE    a line of text, no newline
>    SEP     a blank line
>
>The lexer might have:
>
>.+/\n                   { ... return LINE; }
>(\n[[:blank:]]*){2,}    { return SEP; }  // two or more blank lines
>\n                      { /* ignore */ }
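
To get a feel for what those three rules would emit before wiring them into Flex, here is a rough Python emulation. It is only a sketch, not Flex itself: the names `RULES` and `tokenize` are made up for illustration, the trailing context `/\n` is approximated with a lookahead, and Flex's "longest match wins, earliest rule breaks ties" behavior is imitated by hand.

```python
import re

# The three quoted rules, in the same order, as Python regexes.
# ".+/\n" becomes "[^\n]+(?=\n)": the lookahead stands in for the
# trailing context, so the newline itself is not consumed.
RULES = [
    ("LINE", re.compile(r"[^\n]+(?=\n)")),
    ("SEP", re.compile(r"(?:\n[ \t]*){2,}")),  # [[:blank:]] is space or tab
    ("NL", re.compile(r"\n")),
]


def tokenize(text):
    """Return a list of (token, value) pairs, roughly as the rules would."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            # Strictly longer match wins; on a tie the earlier rule stays.
            if m and len(m.group()) > best_len:
                best_name, best_len = name, len(m.group())
        if best_name is None:
            # No rule matched (e.g. a last line with no newline).
            # Flex's default rule would echo the character; here it is skipped.
            pos += 1
            continue
        if best_name == "LINE":
            tokens.append(("LINE", text[pos:pos + best_len]))
        elif best_name == "SEP":
            tokens.append(("SEP", None))
        # A lone newline (NL) is ignored, as in the third rule.
        pos += best_len
    return tokens


if __name__ == "__main__":
    sample = "para one line 1\npara one line 2\n\npara two\n"
    for tok in tokenize(sample):
        print(tok)
```

Run on that sample, it prints two LINE tokens, one SEP, and a final LINE. Because the LINE rule leaves its newline unconsumed, the SEP pattern already fires on a single blank line (two consecutive newlines), and `[[:blank:]]*` absorbs any spaces or tabs sitting on the blank line.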

Thanks James, this looks great! I won't need to consider end-of-line spaces because I now have a sed one-liner preprocessor that gets rid of trailing space :-).

Right now I've gone back to the Hello World stage and am making a Flex/Bison scanner that does nothing but copy the file. Once I learn from that, I'll try your suggestions. They look refreshingly simple and understandable to me.

Thanks much,

SteveT

Steve Litt

Autumn 2023 featured book: Rapid Learning for the 21st Century
http://www.troubleshooters.com/rl21