On Wednesday, 8 July 2020 06:24:13 CEST, Akim Demaille wrote:
> > As I don't speak BASIC, let me rephrase this problem in FORTRAN IV which
> > is also "blank agnostic":
> >
> > DO <number> <variable> = <expression> , <expression> [, <expression>]
> >
> > It is not until you reach the comma after the first expression that you
> > know whether the statement is the beginning of a loop or it is an
> > assignment. And the expression can contain commas in function calls,
> > which defeats any trivial lookahead scanning. E.g.,
> >
> > D O 17 6PQ R=FUN X(1 4, V 8)
> >
> > is an assignment to variable DO176PQR. The function arguments can also be
> > expressions that contain function calls.
> >
> > As you can see, this more or less defeats any attempt to write a lex
> > scanner. And you cannot just squeeze out all blanks in a front end
> > because "Hollerith fields" can contain blanks that are significant (must
> > remain).
>
> I still think you can address this case with Flex, but I agree it's
> going to be painful. I would go for something like
>
> sp  [ \t]*
> do  D{sp}O
>
> id  [a-zA-Z]({sp}[a-zA-Z_0-9]+)*
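To make the ambiguity concrete, here is a minimal Python sketch (an illustration, not code from this thread). It transcribes the `sp`/`do`/`id` definitions above into Python regexes, emulates Flex's longest-match rule, and adds a hypothetical `is_do_loop` helper implementing the classic FORTRAN lookahead: a statement beginning with DO is a loop header only if a comma follows the first `=` at parenthesis depth 0. The helper squeezes out all blanks and therefore ignores Hollerith fields, so it only holds under that simplifying assumption.

```python
import re

# The Flex definitions from the thread, transcribed into Python regexes:
#   sp  [ \t]*
#   do  D{sp}O
#   id  [a-zA-Z]({sp}[a-zA-Z_0-9]+)*
SP = r"[ \t]*"
RULES = [
    ("DO", re.compile("D" + SP + "O")),
    ("ID", re.compile("[a-zA-Z](?:" + SP + "[a-zA-Z_0-9]+)*")),
]


def longest_match(text):
    """Emulate Flex: the longest match wins; ties go to the earlier rule."""
    best = None
    for name, rx in RULES:
        m = rx.match(text)
        if m and (best is None or len(m.group()) > len(best[1])):
            best = (name, m.group())
    return best


def is_do_loop(stmt):
    """Hypothetical statement-level lookahead. After removing blanks, a
    statement starting with DO is a loop header iff a comma follows the
    first '=' at parenthesis depth 0. Assumes no Hollerith fields."""
    s = re.sub(r"[ \t]+", "", stmt)
    eq = s.find("=")
    if not s.startswith("DO") or eq < 0:
        return False
    depth = 0
    for ch in s[eq + 1:]:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            return True
    return False


def first_token(stmt):
    """Hypothetical fix: consult the lookahead before the Flex-style rules."""
    stmt = stmt.lstrip(" \t")
    if is_do_loop(stmt):
        return ("DO", RULES[0][1].match(stmt).group())
    return longest_match(stmt)


print(longest_match("DO 10 I = 1, N"))                 # pure longest match
print(first_token("DO 10 I = 1, N"))                   # with lookahead
print(first_token("D O 17 6PQ R=FUN X(1 4, V 8)"))     # the assignment case
```

With longest match alone, `DO 10 I` is a single `id`, because the `id` rule matches more input than `do`. The lookahead recovers the loop reading, but only by scanning the whole statement with knowledge of parentheses, i.e. context a plain longest-match scanner does not have.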
`do 10 i = 1, n` would then be scanned as the single identifier 'do10i' (the `id` rule yields the longer match), i.e. as an assignment to that variable, although it is actually a loop header. So yes, you could certainly make this work correctly with Flex by taking additional measures, but I think both the Fortran and the BASIC examples could be solved much more easily (with less complexity) and more elegantly by a monolithic, combined parser-scanner, since the parser could then detect keywords out of the box depending on the grammar context.

Best regards,
Christian Schoenebeck