Re: Scanless lexer doesn't try shorter lexem

Jeffrey Kegler Mon, 06 Jan 2014 15:35:41 -0800

First off, welcome back. Since you've been away for a while, allow me to let new readers know that you're the founder of this mailing list, and someone whose support and advice have been very valuable to Marpa.

Second, I'm about to dive into the answer, but I'm very open to ideas that would make Marpa easier to use.

Marpa does an old-fashioned longest-tokens-match. It does have information about which tokens are expected, but it imitates traditional parsers in not using that. At the beginning, it looks for the longest token. If there is only one, and it is not acceptable to the grammar, the parse fails. In this case it finds a <value>, and because a <value> is not acceptable first thing, the parse fails.

Longest-tokens-match requires that you contrive it so that the longest token, including those which the grammar will not find acceptable, is always the one you want. Could Marpa do it differently? Yes, and it will in the future. (Aside @amon: perhaps the IRIF already does better?)
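
To make the behaviour described above concrete, here is a minimal sketch in plain Perl. It is not Marpa's implementation and uses no Marpa API. The lexeme table, its patterns, and the helper names longest_tokens and lex_step are assumptions made up for illustration, loosely modelled on the vCard grammar discussed in this thread.

```perl
use strict;
use warnings;

# Hypothetical lexeme table loosely modelled on the vCard grammar in this
# thread. The patterns are illustrative guesses, not the real L0 rules.
my @LEXEMES = (
    [ 'BEGIN:VCARD' => qr/BEGIN:VCARD/   ],
    [ 'A_D_D'       => qr/[A-Za-z0-9-]+/ ],    # alphanumerics and dashes
    [ 'text'        => qr/[^\r\n]+/      ],    # anything up to end of line
    [ 'CRLF'        => qr/\r?\n/         ],
);

# Return the length of the longest match at $pos, followed by the names of
# every lexeme that matches at exactly that length. Shorter matches are
# discarded here, before the grammar is consulted at all.
sub longest_tokens {
    my ($input, $pos) = @_;
    my ($best, @names) = (0);
    for my $lexeme (@LEXEMES) {
        my ($name, $re) = @{$lexeme};
        pos($input) = $pos;
        next unless $input =~ /\G$re/gc;
        my $len = pos($input) - $pos;
        if    ($len > $best)  { ($best, @names) = ($len, $name) }
        elsif ($len == $best) { push @names, $name }
    }
    return ($best, @names);
}

# One lexing step: find the longest tokens first, and only then check them
# against what the grammar expects. An empty "accepted" list means failure.
sub lex_step {
    my ($input, $pos, @expected) = @_;
    my ($len, @found) = longest_tokens($input, $pos);
    my %ok = map { $_ => 1 } @expected;
    return {
        length   => $len,
        accepted => [ grep {  $ok{$_} } @found ],
        rejected => [ grep { !$ok{$_} } @found ],
    };
}

for my $case ( [ "BEGIN:VCARD\r\n", 'BEGIN:VCARD' ],
               [ "UID:urn:uuid:4fbe8971\r\n", 'A_D_D' ] ) {
    my ($input, @expected) = @{$case};
    my $step = lex_step($input, 0, @expected);
    printf "length=%d accepted=[%s] rejected=[%s]\n",
        $step->{length},
        join(',', @{ $step->{accepted} }),
        join(',', @{ $step->{rejected} });
}
```

In the first case the literal and text tie for longest, so the literal survives the check against the expected tokens. In the second case A_D_D would have matched "UID", but it was dropped in favour of the longer text match before the expected set was ever looked at, so nothing is accepted.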


-- jeffrey

On 01/06/2014 03:08 PM, Ruslan Zakirov wrote:

Hi,

A shorter script that demonstrates the problem: https://gist.github.com/ruz/8291475

Comments below:

On Tue, Jan 7, 2014 at 2:57 AM, Ron Savage <[email protected]> wrote:


    I made some small changes:

    ron@zigzag:~/Documents/repos/marpa.papers$ diff
    ~/bin/vcard.parser.orig.pl ~/bin/vcard.parser.pl
    0a1,2
    > #!/usr/bin/env perl
    >
    7c9
    < my $syntax = <<'END';
    ---
    > my $syntax = <<'EOS';
    15,17c17,19
    < group ~ A_D_D
    < name ~ A_D_D
    < params ::= ';' param_list | empty
    ---
    > group ::= A_D_D
    > name ::= A_D_D
    > params ::= SEMICOLON param_list | empty
    21c23
    < any_param_name ~ A_D_D
    ---
    > any_param_name ::= A_D_D
    86c88
    < END
    ---
    > EOS
    89c91
    < say "rules L0:\n", $grammar->show_rules(1, 'G0');
    ---
    > #say "rules L0:\n", $grammar->show_rules(1, 'G0');

    and I get:

    ron@zigzag:~/Documents/repos/marpa.papers$ ~/bin/vcard.parser.pl
    Setting trace_terminals option
    Lexer "L0" rejected lexeme L1c1-11: text; value="BEGIN:VCARD"
    Lexer "L0" accepted lexeme L1c1-11: 'BEGIN:VCARD'; value="BEGIN:VCARD"

You can see here that the lexer rejected the text rule, but accepted a literal rule of the same length.


    Lexer "L0" accepted lexeme L1c12: CRLF; value="
    "
    Lexer "L0" rejected lexeme L2c1-11: text; value="VERSION:4.0"
    Lexer "L0" accepted lexeme L2c1-11: 'VERSION:4.0'; value="VERSION:4.0"


Once again.

    Lexer "L0" accepted lexeme L2c12: CRLF; value="
    "
    Lexer "L0" rejected lexeme L3c1-49: text;
    value="UID:urn:uuid:4fbe8971-0bc3-424c-9c26-36c3e1eff6b1"


Here the lexer went for the longer match and never tried A_D_D; value="UID".

    progress:
    P0 @0-0 L1c1 vCards -> . vCard +
    P1 @0-0 L1c1 vCard -> . 'BEGIN:VCARD' CRLF 'VERSION:4.0' CRLF
    content 'END:VCARD'
    P36 @0-0 L1c1 :start -> . vCards
    R1:1 @0-1 L1c1-11 vCard -> 'BEGIN:VCARD' . CRLF 'VERSION:4.0' CRLF
    content 'END:VCARD'
    R1:2 @0-2 L1c1-12 vCard -> 'BEGIN:VCARD' CRLF . 'VERSION:4.0' CRLF
    content 'END:VCARD'
    R1:3 @0-3 L1c1-L2c11 vCard -> 'BEGIN:VCARD' CRLF 'VERSION:4.0' .
    CRLF content 'END:VCARD'
    R1:4 @0-4 L1c1-L2c12 vCard -> 'BEGIN:VCARD' CRLF 'VERSION:4.0'
    CRLF . content 'END:VCARD'
    P2 @4-4 L2c12 content -> . content_line +
    P3 @4-4 L2c12 content_line -> . content_name params ':' value CRLF
    P4 @4-4 L2c12 content_name -> . name
    P5 @4-4 L2c12 content_name -> . group '.' name
    P6 @4-4 L2c12 group -> . A_D_D
    P7 @4-4 L2c12 name -> . A_D_D

    Error in SLIF parse: No lexemes accepted at line 3, column 1
      Lexer "L0" rejected 1 lexeme(s)
      Rejected lexeme #1: text;
    value="UID:urn:uuid:4fbe8971-0bc3-424c-9c26-36c3e1eff6b1"; length = 49
    * String before error: BEGIN:VCARD\nVERSION:4.0\n
    * The error was at line 3, column 1, and at character 0x0055 'U', ...
    * here: UID:urn:uuid:4fbe8971-0bc3-424c-9c26-36c3e1eff6b1\n
    Marpa::R2 exception at /home/ron/bin/vcard.parser.pl line 96.

    So it is trying A_D_D.


Sure. The recognizer waits for A_D_D, but the lexer never offers it.

    --
    You received this message because you are subscribed to the Google
    Groups "marpa parser" group.
    To unsubscribe from this group and stop receiving emails from it,
    send an email to [email protected].
    For more options, visit https://groups.google.com/groups/opt_out.




--
Best regards, Ruslan.
