Re: Parsing a language with optional spaces

Christian Schoenebeck Wed, 08 Jul 2020 03:16:50 -0700

On Wednesday, 8 July 2020 06:24:13 CEST Akim Demaille wrote:
> > As I don't speak BASIC, let me rephrase this problem in FORTRAN IV which
> > is also "blank agnostic":
> > 
> > DO <number> <variable> = <expression> , <expression> [, <expression>]
> > 
> > It is not until you reach the comma after the first expression that you
> > know whether the statement is the beginning of a loop or it is an
> > assignment.  And the expression can contain commas in function calls,
> > which defeats any trivial lookahead scanning.  E.g.,
> > 
> > D O 17 6PQ R=FUN X(1 4, V 8)
> > 
> > is an assignment to variable DO176PQR.  The function arguments can also be
> > expressions that contain function calls.
> > 
> > As you can see, this more or less defeats any attempt to write a lex
> > scanner.  And you cannot just squeeze out all blanks in a front end
> > because "Hollerith fields" can contain blanks that are significant (must
> > remain).
> I still think you can address this case with Flex, but I agree it's
> going to be painful.  I would go for something like
> 
> sp   [ \t]*
> do   D{sp}O
> 
> id   [a-zA-Z]({sp}[a-zA-Z_0-9]+)*


do 10 i = 1, n

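would then be interpreted as an assignment to variable 'do10i', although it
is actually a loop definition: the scanner prefers the longest match, so the
id pattern swallows 'do 10 i' before the do pattern gets a chance.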
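
To illustrate, here is a small sketch that tries the two quoted patterns
outside Flex. It is written in Python, assumes case-insensitive matching of
the keyword, and emulates Flex's longest-match rule by hand; the helper name
longest_match is hypothetical and not part of Flex.

```python
import re

SP = r"[ \t]*"

# The two quoted Flex patterns, transcribed to Python regex syntax.
# Assumption: the keyword is matched case-insensitively.
PATTERNS = {
    "DO": re.compile(rf"D{SP}O", re.IGNORECASE),
    "ID": re.compile(rf"[a-zA-Z](?:{SP}[a-zA-Z_0-9]+)*"),
}

def longest_match(text):
    """Return (rule_name, lexeme) for the longest match at the start of text.

    This mimics Flex's preference for the longest match; on a tie the
    earlier rule wins. Returns None if no rule matches.
    """
    best = None
    for name, pattern in PATTERNS.items():
        m = pattern.match(text)
        if m and (best is None or len(m.group()) > len(best[1])):
            best = (name, m.group())
    return best

name, lexeme = longest_match("do 10 i = 1, n")
# Squeeze the blanks out of the lexeme to get the identifier the scanner sees.
print(name, "".join(lexeme.split()))
```

The id rule matches seven characters ('do 10 i') against two for the do rule,
so the keyword is never reported for this input.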

So yes, you could certainly make this work correctly with Flex by adding
further measures, but I think both the Fortran and the BASIC examples could
be solved much more easily (with less complexity) and more elegantly with a
monolithically combined parser-scanner, as the parser could then detect
keywords out of the box, depending on the grammar context.
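
As a rough illustration of letting the grammar context decide, here is a
hand-written sketch in Python. It is not a generated parser-scanner, only the
idea in miniature: the statement is read as a DO loop only if the text after
'=' contains a comma outside any parentheses. All names (strip_blanks,
top_level_comma, classify) are hypothetical, and the sketch assumes there are
no Hollerith fields, so every blank may be dropped.

```python
import re

def strip_blanks(stmt):
    # Assumption: no Hollerith fields, so all blanks are insignificant.
    return re.sub(r"[ \t]", "", stmt).upper()

def top_level_comma(expr):
    """Return True if expr contains a comma outside any parentheses."""
    depth = 0
    for ch in expr:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            return True
    return False

# DO <label> <variable> = <rest>, with blanks already removed.
DO_HEAD = re.compile(r"DO(\d+)([A-Z][A-Z0-9]*)=(.+)")

def classify(stmt):
    """Classify a statement as ('loop', label, var) or ('assignment', target)."""
    s = strip_blanks(stmt)
    m = DO_HEAD.fullmatch(s)
    # 'DO' is a keyword only if the grammar for a loop header fits.
    if m and top_level_comma(m.group(3)):
        return ("loop", m.group(1), m.group(2))
    # Otherwise everything left of '=' is the assignment target.
    return ("assignment", s.split("=", 1)[0])

print(classify("do 10 i = 1, n"))
print(classify("D O 17 6PQ R=FUN X(1 4, V 8)"))
```

The first statement is classified as a loop with label 10 and variable I; the
second, whose only comma sits inside the function call, is classified as an
assignment to DO176PQR.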

Best regards,
Christian Schoenebeck
