On Monday, 6 July 2020 18:25:56 CEST Maury Markowitz wrote:
> Moving to a new thread - I was surprised I could even post, previous efforts
> were bounced from the list server for no obvious reason. Someone helpfully
> posted for me in the past. And now everything is magically working, so I
> hope you don't all mind the duplicate.
>
> On Jul 6, 2020, at 9:04 AM, Christian Schoenebeck
> > <schoeneb...@crudebyte.com> wrote:
> > You would simply add a RegEx pattern & rule like this:
> Consider two snippets in home-computer-era BASIC:
>
> FOREX=10
>
> and:
>
> FOREX=10TO20
>
> Is the first one a broken FOR statement or a perfectly valid variable
> assignment? (Why not both?!) MS says the former, BBC the latter.
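
To make the two readings concrete, here is a minimal Python sketch of the two scanning strategies. It is not how either interpreter is actually implemented: the function names, the token names and the two-keyword set are my own assumptions, chosen only to cover your two snippets. One scanner applies the lex/flex rule (longest match wins, earlier rule breaks ties), the other recognises a keyword wherever one starts.

```python
import re

KEYWORDS = ("FOR", "TO")  # assumption: just enough keywords for the two snippets

# Rules for the longest-match scanner; earlier rules win ties, as in lex/flex.
RULES = [
    ("FOR", re.compile(r"FOR")),
    ("TO", re.compile(r"TO")),
    ("IDENT", re.compile(r"[A-Z][A-Z0-9]*")),
    ("NUMBER", re.compile(r"[0-9]+")),
    ("EQ", re.compile(r"=")),
]
NUMBER_OR_EQ = re.compile(r"[0-9]+|=")


def scan_longest_match(text):
    """Longest match wins; on equal length the earliest rule wins."""
    tokens, pos = [], 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            if m and m.end() - pos > best_len:
                best_name, best_len = name, m.end() - pos
        if best_name is None:
            raise ValueError(f"cannot scan {text[pos]!r} at position {pos}")
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens


def keyword_at(text, pos):
    """Return the keyword that starts at pos, or None."""
    return next((k for k in KEYWORDS if text.startswith(k, pos)), None)


def scan_keyword_first(text):
    """Keywords are recognised wherever they start, even inside a run of letters."""
    tokens, pos = [], 0
    while pos < len(text):
        kw = keyword_at(text, pos)
        if kw:
            tokens.append((kw, kw))
            pos += len(kw)
            continue
        m = NUMBER_OR_EQ.match(text, pos)
        if m:
            kind = "EQ" if m.group() == "=" else "NUMBER"
            tokens.append((kind, m.group()))
            pos = m.end()
            continue
        if not text[pos].isalpha():
            raise ValueError(f"cannot scan {text[pos]!r} at position {pos}")
        # An identifier ends at a non-alphanumeric character or where a keyword begins.
        end = pos + 1
        while end < len(text) and text[end].isalnum() and not keyword_at(text, end):
            end += 1
        tokens.append(("IDENT", text[pos:end]))
        pos = end
    return tokens


for line in ("FOREX=10", "FOREX=10TO20"):
    print(line)
    print("  longest match:", scan_longest_match(line))
    print("  keyword first:", scan_keyword_first(line))
```

With the longest-match scanner both lines start with a single identifier token FOREX; with the keyword-first scanner both start with a FOR token followed by the identifier EX. Whether the resulting token stream is then accepted is a separate question for the parser.
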

To avoid mixing up language design with the actual parser implementation: most
programming languages allow 1..n spaces between keywords, of course. 0..n
spaces, however, is usually only allowed where it cannot lead to ambiguities.
From a language design perspective, your example

	FOREX=...

would IMO clearly be a variable assignment, never a loop definition. Originally,
though, variable assignments in BASIC were actually written like this, BTW:

	LET foo=10

That would resolve the ambiguity in your example, if it actually exists. I have
never seen "FOREX" as a valid BASIC loop definition before; if there is one, a
source for that specification would be appreciated.

> In the lex/flex model of longest-match-wins, assuming any reasonable
> definition for your variable pattern, both statements are variable
> assignments and the second fails to parse.
>
> To match the behaviour of BASIC, one has to complicate the variable pattern.
> "Complicate" varies between adding a tail pattern for every possible
> keyword, or artificially limiting the variables in length, or...
>
> Is there a better way?
>
> Ideally, I would love if there was an optional #keyword which is similar to
> #token but has a "higher priority" so they would match first. I suspect
> this would be valuable in a wide variety of tasks, but I'm completely noob
> so I can't say.

On the (F)lex side, more complex handling of white space is commonly done with
scanner states, i.e. start conditions (<CONDITIONNAME> in front of patterns).
States are pushed and popped in the actions of other scanner rules by calling
yy_push_state() and yy_pop_state() accordingly (in flex these two functions
are only available with %option stack).

For instance, in this programming language scanner I needed more complicated
white space filtering, as e.g. there are preprocessor statements (including
variable white space as well) that should be preprocessed by the scanner before
entering the language parser:

http://svn.linuxsampler.org/cgi-bin/viewvc.cgi/linuxsampler/trunk/src/scriptvm/scanner.l?view=markup

In your particular example, you might also consider simply working with the
line start anchor (^) instead:

	^ACTUALPATTERN

Best regards,
Christian Schoenebeck