Hi Nick,

APL historically has required consecutive numbers to be separated by at
least one character that can't be part of a number.

This differs from the lex approach of matching the longest legal token and
not caring whether the next token begins immediately after it.

Since doing it differently doesn't add any new functionality, it's probably
best to just stick with the way it's always been done.

Such occurrences in people's code are almost always typos, and it's best to
report them as errors.
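
To make the difference concrete, here is a rough Python sketch of the two
policies. It is not GNU APL's tokenizer: the simplified number pattern, the
set of "number characters", and the tokenize() function are all assumptions
made up for illustration.

```python
import re

# Simplified numeric literal (an assumption, not GNU APL's real grammar):
# optional high minus, digits with optional fraction, optional exponent.
NUMBER = re.compile(r"¯?(?:[0-9]+\.?[0-9]*|\.[0-9]+)(?:[eE]¯?[0-9]+)?")
NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
# Characters assumed able to be part of a number in this sketch.
NUMBER_CHARS = set("0123456789.¯eE")


def tokenize(text, strict=False):
    """Split text into tokens.

    strict=False: lex style, take the longest number and carry on.
    strict=True:  reject a number that is immediately followed by a
                  character that could itself be part of a number.
    """
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        m = NUMBER.match(text, pos)
        if m:
            end = m.end()
            if strict and end < len(text) and text[end] in NUMBER_CHARS:
                raise ValueError("Bad number: " + text[pos:])
            tokens.append(m.group())
            pos = end
            continue
        m = NAME.match(text, pos)
        if m:
            tokens.append(m.group())
            pos = m.end()
        else:
            # Anything else becomes a one-character token.
            tokens.append(text[pos])
            pos += 1
    return tokens


print(tokenize("¯5¯6¯7"))                 # ['¯5', '¯6', '¯7']
print(tokenize("1E6E7"))                  # ['1E6', 'E7']
print(tokenize("¯5 ¯6 ¯7", strict=True))  # ['¯5', '¯6', '¯7']
```

With strict=True, inputs such as ¯5¯6¯7 or 1E6E7 raise ValueError in this
sketch instead of being silently split, which is the behaviour I'm arguing
for.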

Regards,

Mike


On Fri, Aug 28, 2015 at 10:23 AM, Nick Lobachevsky <[email protected]>
wrote:

> Before the parsing comes the lexical analysis....
>
> Have a look at the ancient Unix lex (or flex) utility for some insight
> as to how GNU APL might recognise numbers.  Also, have a look at the
> APL.LEX definition file from Timothy Budd's APLc compiler project,
> see http://home.earthlink.net/~swsirlin/aplc.tar.Z which you can
> unzip.  More info here: http://home.earthlink.net/~swsirlin/aplcc.html
>
> Budd's Lex regex definitions for numeric constants are:
> (".ng"{ws})?[0-9]+\.[0-9]*([eE][+-]?[0-9]+)?    {return( lexnum(RCON));}
> (".ng"{ws})?[0-9]*\.[0-9]+([eE][+-]?[0-9]+)?    {return( lexnum(RCON));}
> (".ng"{ws})?[0-9]+                              {return( lexnum(ICON)); }
> With Lex, the longest match wins.  Evidently, the reason for the two
> similar real number definitions is to support things like 1.e3 and
> .1e3 instead of the more complete 1.0e3 and 0.1e3.
>
> So to my Lex-influenced way of thinking,
>
>       ¯5¯6¯7
> really should be three negative numbers, as the high minus
> unambiguously begins the next numeric token.
>
>       1E6E7
> 1E6 is a complete numeric token and processing ends for that number
> immediately.  What follows is E7, which looks like a variable or
> function name.
>
>        1E¯¯6
> would be four tokens, 1, then E, then a lone high minus, then negative 6.
>
>       1E¯
> three tokens
>
>       1D¯¯6
> also four tokens, interesting why the Bad Number.
>
