Re: Scanless lexer doesn't try shorter lexemes

Jeffrey Kegler Mon, 06 Jan 2014 15:50:34 -0800

By the way, essentially the same problem came up on Stack Overflow <http://stackoverflow.com/questions/17773976/prevent-naive-longest-token-matching-in-marpar2scanless>, and a solution for the current SLIF is there.


-- jeffrey


On 01/06/2014 03:08 PM, Ruslan Zakirov wrote:

Hi,

A shorter script that demonstrates the problem: https://gist.github.com/ruz/8291475

Comments below:

On Tue, Jan 7, 2014 at 2:57 AM, Ron Savage <[email protected]> wrote:


    I made some small changes:

    ron@zigzag:~/Documents/repos/marpa.papers$ diff ~/bin/vcard.parser.orig.pl ~/bin/vcard.parser.pl
    0a1,2
    > #!/usr/bin/env perl
    >
    7c9
    < my $syntax = <<'END';
    ---
    > my $syntax = <<'EOS';
    15,17c17,19
    < group ~ A_D_D
    < name ~ A_D_D
    < params ::= ';' param_list | empty
    ---
    > group ::= A_D_D
    > name ::= A_D_D
    > params ::= SEMICOLON param_list | empty
    21c23
    < any_param_name ~ A_D_D
    ---
    > any_param_name ::= A_D_D
    86c88
    < END
    ---
    > EOS
    89c91
    < say "rules L0:\n", $grammar->show_rules(1, 'G0');
    ---
    > #say "rules L0:\n", $grammar->show_rules(1, 'G0');

    and I get:

    ron@zigzag:~/Documents/repos/marpa.papers$ ~/bin/vcard.parser.pl
    Setting trace_terminals option
    Lexer "L0" rejected lexeme L1c1-11: text; value="BEGIN:VCARD"
    Lexer "L0" accepted lexeme L1c1-11: 'BEGIN:VCARD'; value="BEGIN:VCARD"

Here you can see that the lexer rejected the text rule but accepted the literal rule of the same length.
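A minimal sketch of why both lines appear in the trace, written in Python purely as an illustration (it is not Marpa::R2 code). The regexes in the hypothetical LEXEMES table only approximate the thread's lexemes; the real definitions are in the gist and may differ. The hypothetical helpers longest_matches() and offer() assume the lexer collects every lexeme that ties for the longest match and hands each one to the parser, which keeps only the ones it expects.

```python
import re

# Hypothetical lexeme patterns that only approximate the thread's grammar;
# the real definitions are in Ruslan's gist and may differ.
LEXEMES = {
    "'BEGIN:VCARD'": re.compile(re.escape("BEGIN:VCARD")),
    "text": re.compile(r"[^\r\n]+"),
    "A_D_D": re.compile(r"[A-Za-z0-9-]+"),
}


def longest_matches(src, pos):
    """Return (length, names) for every lexeme tying for the longest match at pos."""
    best_len, best = 0, []
    for name, pattern in LEXEMES.items():
        m = pattern.match(src, pos)
        if m is None or m.end() == pos:
            continue  # no match, or an empty one
        length = m.end() - pos
        if length > best_len:
            best_len, best = length, [name]
        elif length == best_len:
            best.append(name)
    return best_len, best


def offer(src, pos, expected):
    """Offer all longest lexemes to the parser; split them into accepted/rejected."""
    length, names = longest_matches(src, pos)
    accepted = [n for n in names if n in expected]
    rejected = [n for n in names if n not in expected]
    return length, accepted, rejected


length, accepted, rejected = offer("BEGIN:VCARD\r\n", 0, {"'BEGIN:VCARD'"})
print(length, accepted, rejected)
```

With the parser expecting only the 'BEGIN:VCARD' literal, both it and text match 11 characters; the literal is accepted and text is rejected, mirroring the two trace lines.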


    Lexer "L0" accepted lexeme L1c12: CRLF; value="
    "
    Lexer "L0" rejected lexeme L2c1-11: text; value="VERSION:4.0"
    Lexer "L0" accepted lexeme L2c1-11: 'VERSION:4.0'; value="VERSION:4.0"


Once again.

    Lexer "L0" accepted lexeme L2c12: CRLF; value="
    "
    Lexer "L0" rejected lexeme L3c1-49: text; value="UID:urn:uuid:4fbe8971-0bc3-424c-9c26-36c3e1eff6b1"


Here the lexer went for the longer match and never tried A_D_D, which would have matched value="UID".
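A minimal Python sketch of naive longest-token matching on the failing line, under the same caveat: the two regexes in the hypothetical LEXEMES table only approximate text and A_D_D, and naive_ltm() is an illustrative name, not part of Marpa::R2. Only the longest match is ever offered, so when the parser expects A_D_D, the 49-character text lexeme is rejected and the 3-character A_D_D match is dropped without being tried.

```python
import re

# Hypothetical lexeme patterns that only approximate the thread's grammar.
LEXEMES = {
    "text": re.compile(r"[^\r\n]+"),
    "A_D_D": re.compile(r"[A-Za-z0-9-]+"),
}


def naive_ltm(src, pos, expected):
    """Naive longest-token matching: only the longest match(es) are offered.

    Returns (accepted, offered, length). An empty `accepted` list is the
    situation the SLIF reports as "No lexemes accepted".
    """
    lengths = {}
    for name, pattern in LEXEMES.items():
        m = pattern.match(src, pos)
        if m is not None and m.end() > pos:
            lengths[name] = m.end() - pos
    if not lengths:
        return [], [], 0
    longest = max(lengths.values())
    offered = [name for name, n in lengths.items() if n == longest]
    # Shorter matches, such as A_D_D on "UID", are silently dropped here.
    accepted = [name for name in offered if name in expected]
    return accepted, offered, longest


line = "UID:urn:uuid:4fbe8971-0bc3-424c-9c26-36c3e1eff6b1\n"
accepted, offered, length = naive_ltm(line, 0, {"A_D_D"})
print(accepted, offered, length)
```

The result is an empty accepted list with text offered at length 49, the same picture as the SLIF error message.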

    progress:
    P0 @0-0 L1c1 vCards -> . vCard +
    P1 @0-0 L1c1 vCard -> . 'BEGIN:VCARD' CRLF 'VERSION:4.0' CRLF content 'END:VCARD'
    P36 @0-0 L1c1 :start -> . vCards
    R1:1 @0-1 L1c1-11 vCard -> 'BEGIN:VCARD' . CRLF 'VERSION:4.0' CRLF content 'END:VCARD'
    R1:2 @0-2 L1c1-12 vCard -> 'BEGIN:VCARD' CRLF . 'VERSION:4.0' CRLF content 'END:VCARD'
    R1:3 @0-3 L1c1-L2c11 vCard -> 'BEGIN:VCARD' CRLF 'VERSION:4.0' . CRLF content 'END:VCARD'
    R1:4 @0-4 L1c1-L2c12 vCard -> 'BEGIN:VCARD' CRLF 'VERSION:4.0' CRLF . content 'END:VCARD'
    P2 @4-4 L2c12 content -> . content_line +
    P3 @4-4 L2c12 content_line -> . content_name params ':' value CRLF
    P4 @4-4 L2c12 content_name -> . name
    P5 @4-4 L2c12 content_name -> . group '.' name
    P6 @4-4 L2c12 group -> . A_D_D
    P7 @4-4 L2c12 name -> . A_D_D

    Error in SLIF parse: No lexemes accepted at line 3, column 1
      Lexer "L0" rejected 1 lexeme(s)
      Rejected lexeme #1: text; value="UID:urn:uuid:4fbe8971-0bc3-424c-9c26-36c3e1eff6b1"; length = 49
    * String before error: BEGIN:VCARD\nVERSION:4.0\n
    * The error was at line 3, column 1, and at character 0x0055 'U', ...
    * here: UID:urn:uuid:4fbe8971-0bc3-424c-9c26-36c3e1eff6b1\n
    Marpa::R2 exception at /home/ron/bin/vcard.parser.pl line 96.

    So it is trying A_D_D.


Sure. The recognizer waits for A_D_D, but the lexer never offers it.
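A minimal Python sketch of the behavior being asked for: try only the lexemes the recognizer expects at this position, then keep the longest of those. This is an illustration of the idea, not Marpa::R2's implementation; the regexes and the longest_acceptable() helper are hypothetical, and the sketch assumes the recognizer can report its expected terminals (the progress report above shows A_D_D expected at that position via the group and name items).

```python
import re

# Hypothetical lexeme patterns that only approximate the thread's grammar.
LEXEMES = {
    "text": re.compile(r"[^\r\n]+"),
    "A_D_D": re.compile(r"[A-Za-z0-9-]+"),
}


def longest_acceptable(src, pos, expected):
    """Try only lexemes the recognizer expects, then keep the longest of those.

    Returns (names, length); names is empty if no expected lexeme matches.
    """
    best_len, best = 0, []
    for name in sorted(expected):  # sorted for a deterministic tie order
        pattern = LEXEMES.get(name)
        if pattern is None:
            continue  # an expected symbol this lexer has no pattern for
        m = pattern.match(src, pos)
        if m is None or m.end() == pos:
            continue
        length = m.end() - pos
        if length > best_len:
            best_len, best = length, [name]
        elif length == best_len:
            best.append(name)
    return best, best_len


line = "UID:urn:uuid:4fbe8971-0bc3-424c-9c26-36c3e1eff6b1\n"
names, length = longest_acceptable(line, 0, {"A_D_D"})
print(names, line[:length])  # the shorter lexeme covering "UID" is now offered
```

The design choice is to filter by expectation before comparing lengths, so the longer text match can no longer shadow a shorter lexeme the parser actually needs.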

    --
    You received this message because you are subscribed to the Google Groups "marpa parser" group.
    To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
    For more options, visit https://groups.google.com/groups/opt_out.




--
Best regards, Ruslan.
