Re: Preferring a shorter token match

Jeffrey Kegler Wed, 02 Dec 2020 13:29:04 -0800

I'll first describe your immediate problem, then ask a couple of questions.

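The problem: Lexing is LATM (*Longest* Acceptable Token Matching).  The
lexeme priority is only a tie breaker, used when acceptable tokens are the
same length.  At the start of line 2 of your failing input, a statement can
begin with either a command or a variable.  "PA" matches command (2 chars),
but "PAx" matches variable (3 chars), so variable is your longest acceptable
token and the only choice at length 3.  Lexemes of different lengths are
never compared for priority, so a priority on command never comes into
play, and the parse then fails because no "=" follows the variable.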
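To make the rule concrete, here is a toy model of LATM selection in plain
Perl.  It is a sketch, not Marpa's implementation: the helper latm_pick and
the (name, regex, priority) triples are made up for illustration.  It keeps
only the longest matches among the acceptable lexemes and consults priority
only among those, so "PAx" goes to variable even though command has the
higher priority.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Toy model of LATM lexeme selection. This is NOT Marpa's implementation;
# it only mimics the rule: among the lexemes acceptable at a position, keep
# the longest matches, then use priority to break ties among those.
# Returns (match_length, names_of_winning_lexemes), or () if nothing matches.
sub latm_pick {
    my ($input, $pos, $acceptable) = @_;   # $acceptable: [ [name, qr//, priority], ... ]
    my $rest = substr $input, $pos;
    my ($best_len, @best) = (-1);
    for my $lex (@$acceptable) {
        my ($name, $re, $prio) = @$lex;
        next unless $rest =~ /\A($re)/;
        my $len = length $1;
        if    ($len > $best_len)  { ($best_len, @best) = ($len, $lex) }
        elsif ($len == $best_len) { push @best, $lex }
    }
    return () unless @best;
    # Priority is compared only among matches of the same (longest) length.
    my ($top) = sort { $b <=> $a } map { $_->[2] } @best;
    return ($best_len, map { $_->[0] } grep { $_->[2] == $top } @best);
}

# At the start of a statement, both a command and a variable are acceptable.
my @acceptable = (
    [ command  => qr/PA|PR/, 2 ],   # priority 2, as in your commented-out line
    [ variable => qr/\w+/,   0 ],
);

for my $line ("PA x\n", "PAx\n") {
    my ($len, @names) = latm_pick($line, 0, \@acceptable);
    print "'", substr($line, 0, $len), "' -> @names\n";
}
# prints 'PA' -> command, then 'PAx' -> variable
```

With "PA x" the two lexemes tie at 2 chars and priority picks command; with
"PAx" variable is strictly longer and wins before priority is ever looked
at.  If the priorities were equal, the toy returns every tied lexeme, which
loosely mirrors Marpa keeping equal-length, equal-priority lexemes as
alternatives.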


(Btw, the reason for this is that, as implemented, lexeme priorities can be
(and are) tested in a few machine instructions.  If Marpa needed to look at
earlier possibilities, the logic would get vastly more complex, efficiency
would go out the window, and you would be in territory where the grammar can
often be handled in easier, faster ways.)

Now the questions:

1.) I notice statements cannot be multiline.  Is that the intent going
forward?

2.) In the example, commands always begin with a capital letter, variables
never do.  Will that continue to be the case?  (If so, it points to an
easy, fast solution.)

Depending on your answers, possible solutions include finding something
that distinguishes commands from variables in the lexer; custom lexers;
using events to guide custom lexing; and character-by-character lexing,
whereby you handle your own whitespace.
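As an illustration of the custom-lexer route, here is a minimal sketch in
plain Perl.  It assumes statements never span lines (question 1), and it
uses one distinguishing feature your example input happens to have: a
statement containing "=" is an assignment, anything else must begin with a
command name.  The sub classify_statements is hypothetical, not part of
Marpa.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical statement-level lexer for the toy language (not a Marpa API).
# Assumes one statement per line (or per ';') and insignificant whitespace.
# Returns a list of [ 'Assign', variable, value ] or [ 'Command', name, arg ].
sub classify_statements {
    my ($text) = @_;
    my @statements;
    for my $stmt (split /[;\n]/, $text) {
        (my $s = $stmt) =~ s/[ \t]+//g;    # drop all blanks and tabs
        next if $s eq '';                  # skip empty statements
        if ($s =~ /\A(\w+)=([0-9]+)\z/) {
            # An '=' decides it: this is an assignment, so "PAx"
            # is taken whole as the variable name.
            push @statements, [ Assign => $1, $2 ];
        }
        elsif ($s =~ /\A(PA|PR)(\w+)\z/) {
            # No '=': the statement must start with a command name, and the
            # rest (with any spaces already removed) is the argument.
            push @statements, [ Command => $1, $2 ];
        }
        else {
            die "Cannot classify statement: '$stmt'\n";
        }
    }
    return @statements;
}

# Both of your inputs now yield the same classification.
for my $input ("x=23\nPA x\nPAx=42\n", "x=23\nPAx\nPAx=42\n") {
    print join(' ', map { "[@{$_}]" } classify_statements($input)), "\n";
}
```

Its output could then be fed to a Marpa recognizer token by token through
the SLIF's external-lexing interface (lexeme_read in Marpa::R2::Scanless::R),
or used directly if the language really is this simple.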



On Wed, Dec 2, 2020 at 3:27 PM Dean S <[email protected]> wrote:

> Hello, I'm having trouble figuring out how to express my grammar and was
> hoping someone could help. I've tried rewriting it in various ways and
> looking for options that might change its behavior, but I haven't been able
> to figure it out.
>
> I have a language with variable assignment and simple commands, and it
> doesn't care about whitespace. So,
>
> PAx=42   # variable assignment to "PAx"
>
> PAx      # PA command with argument x
>
> I have a grammar, but it insists on spaces after command names. I've tried
> hiding assignment behind a prioritized rule and tried setting the command
> lexeme priority, but I always get parse errors when parsing "PAx". Here is a
> simplified grammar that exhibits the issue:
>
> #!/usr/bin/perl
> use warnings; use strict; use 5.028;
>
> use Marpa::R2 8.000000;
> use Data::Dumper;
>
> my $grammar = 'Marpa::R2::Scanless::G'->new({ source => \(<<'RULES') });
> :default ::= action => [name,values]
> lexeme default = latm => 1
>
> :start     ::= Program
>
> Program    ::= Statement+
>
> Statement  ::= Command terminator
>             || Assign terminator
>
> Command    ::= command arg
>
> Assign     ::= variable equal value
>
> # Doesn't help: :lexeme  ~ command  priority => 2
> command      ~ 'PA' | 'PR'
> arg          ~ [\w]+
> equal        ~ '='
> value        ~ [0-9]+
> variable     ~ [\w]+
> terminator   ~ [;\n]
>
> :discard     ~ whitespace
> whitespace   ~ [ \t]+
> RULES
>
> # This parses correctly, line 2 is a command, line 3 is assignment.
> my $ok = <<TEXT;
> x=23
> PA x
> PAx=42
> TEXT
>
> say Dumper($grammar->parse(\$ok));
>
> # I want to generate same tree as above,
> # but my grammar wants line 2 to be an assignment.
> my $bad = <<TEXT;
> x=23
> PAx
> PAx=42
> TEXT
>
> # Error in SLIF parse: No lexeme found at line 2, column 4
> say Dumper($grammar->parse(\$bad));
>
> Is there some trick to this? Did I miss something in the documentation?
>
> Any suggestions?
>
> Thanks!
>   - Dean
>
> --
> You received this message because you are subscribed to the Google Groups
> "marpa parser" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/marpa-parser/32966712-7756-4590-8b13-c9b2decbc3e4n%40googlegroups.com
>

