Re: Scanless lexer doesn't try shorter lexem

Ruslan Zakirov Mon, 06 Jan 2014 15:56:45 -0800

Hi,


Peter mentioned longest-tokens-match off list an hour ago, and I only
noticed it 5 minutes ago. This is not what I was expecting from the scanless
interface.

This means Repa is still a valid thing. I should kill all attempts at
continuous parsing in it and release it.

Pauses and manual lexing are not "sexy" :)

What is IRIF? Is it a new Marpa front end with inline actions?

On Tue, Jan 7, 2014 at 3:35 AM, Jeffrey Kegler <
[email protected]> wrote:

>  First off, welcome back.  Since you've been away for a while, allow me
> to let new readers know that you're the founder of this mailing list, and
> someone whose support and advice have been very valuable to Marpa.
>
> Second, I'm about to dive into the answer, but I'm very open to ideas that
> would make Marpa easier to use.
>
> Marpa does an old-fashioned longest-tokens-match.  It does have information
> about which tokens are expected, but it imitates traditional parsers in not
> using that.  At the beginning, it looks for the longest token.  If there is
> only one, and it is not acceptable to the grammar, the parse fails.  In this
> case it finds a <value>, and because a <value> is not acceptable first
> thing, the parse fails.
>
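For readers who want to see the mechanics, here is a minimal Python sketch of the longest-tokens-match rule Jeffrey describes. It is not Marpa's implementation or API. The lexeme names mirror Ron's trace, but the regular expressions are guesses at the grammar in Ruslan's gist, and `longest_matches`, `ltm_lex`, `Match` and `LexError` are hypothetical names. The key point is the order of operations: the lexer first finds the longest match(es), and only then asks whether the grammar accepts them; it never falls back to a shorter lexeme.

```python
import re
from typing import NamedTuple

# Hypothetical lexeme table. Names follow the trace in this thread;
# the regexes are guesses, not the patterns from the actual gist.
LEXEMES = {
    "'BEGIN:VCARD'": re.compile(r"BEGIN:VCARD"),
    "'VERSION:4.0'": re.compile(r"VERSION:4\.0"),
    "CRLF": re.compile(r"\r?\n"),
    "A_D_D": re.compile(r"[A-Za-z0-9-]+"),  # alpha, digit, dash
    "text": re.compile(r"[^\r\n]+"),        # rest of the line
}


class Match(NamedTuple):
    name: str
    value: str


class LexError(Exception):
    """Raised when none of the longest lexemes is acceptable."""


def longest_matches(src: str, pos: int) -> list:
    """Every lexeme whose match at `pos` ties for the longest length."""
    found = []
    for name, rx in LEXEMES.items():
        m = rx.match(src, pos)
        if m:
            found.append(Match(name, m.group()))
    if not found:
        return []
    best = max(len(f.value) for f in found)
    return [f for f in found if len(f.value) == best]


def ltm_lex(src: str, pos: int, expected: set) -> Match:
    """Longest-tokens-match: pick the longest lexemes first, then let the
    grammar accept or reject them. Shorter lexemes are never retried."""
    candidates = longest_matches(src, pos)
    for cand in candidates:
        if cand.name in expected:
            return cand
    rejected = ", ".join(f"{c.name}={c.value!r}" for c in candidates)
    raise LexError(f"no lexemes accepted at offset {pos}; rejected: {rejected}")


src = ("BEGIN:VCARD\nVERSION:4.0\n"
       "UID:urn:uuid:4fbe8971-0bc3-424c-9c26-36c3e1eff6b1\n")

# Offset 0: the literal and `text` tie at 11 characters; the grammar
# expects the literal, so it is accepted and `text` is rejected.
print(ltm_lex(src, 0, {"'BEGIN:VCARD'"}))

# Offset 24 (line 3): `text` (49 chars) beats A_D_D ("UID", 3 chars),
# so A_D_D is never offered even though it is the only expected lexeme.
try:
    ltm_lex(src, 24, {"A_D_D"})
except LexError as err:
    print(err)
```

This reproduces both halves of Ron's trace: the tie at `BEGIN:VCARD` is resolved by the grammar, while at the `UID` line the single longest lexeme (`text`, length 49) is rejected and the parse fails.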
> Longest-tokens-match requires that you contrive it so that the longest
> token, including those which the grammar will not find acceptable, is
> always the one you want.  Could Marpa do it differently?  Yes, and it will
> in the future.  (Aside @amon: perhaps the IRIF already does better?)
>
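Jeffrey notes that Marpa has the expected-token information but deliberately does not use it. A sketch of what using it could look like, again in hypothetical Python: filter by the set of expected lexemes first, then take the longest of those. This is only one possible strategy and not a description of what Marpa does or plans to do; `expected_first_lex` and the lexeme regexes are assumptions. The trade-off is that lexing now depends on parser state, so the same text can lex differently in different contexts, and the lexer can no longer report the longer token it saw but could not use.

```python
import re

# Hypothetical lexeme table (same guesses as the trace in this thread,
# not the patterns from the actual gist).
LEXEMES = {
    "'BEGIN:VCARD'": re.compile(r"BEGIN:VCARD"),
    "'VERSION:4.0'": re.compile(r"VERSION:4\.0"),
    "CRLF": re.compile(r"\r?\n"),
    "A_D_D": re.compile(r"[A-Za-z0-9-]+"),
    "text": re.compile(r"[^\r\n]+"),
}


class LexError(Exception):
    """Raised when no expected lexeme matches at the current offset."""


def expected_first_lex(src: str, pos: int, expected: set) -> tuple:
    """Consult the lexemes the recognizer expects *before* matching, then
    take the longest of those. On a length tie, the alphabetically first
    lexeme name wins, so the result is deterministic."""
    best = None
    for name in sorted(expected):
        rx = LEXEMES.get(name)
        m = rx.match(src, pos) if rx else None
        if m and (best is None or len(m.group()) > len(best[1])):
            best = (name, m.group())
    if best is None:
        raise LexError(f"no expected lexeme matches at offset {pos}")
    return best


src = ("BEGIN:VCARD\nVERSION:4.0\n"
       "UID:urn:uuid:4fbe8971-0bc3-424c-9c26-36c3e1eff6b1\n")

# At line 3 the grammar expects A_D_D, so `text` is never considered and
# the 3-character "UID" is returned.
print(expected_first_lex(src, 24, {"A_D_D"}))
```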

>
> -- jeffrey
>
>  On 01/06/2014 03:08 PM, Ruslan Zakirov wrote:
>
> Hi,
>
>  A shorter script that demonstrates the problem: https://gist.github.com/ruz/8291475
>
>  Comments below:
>
>
> On Tue, Jan 7, 2014 at 2:57 AM, Ron Savage <[email protected]> wrote:
>
>> I made some small changes:
>>
>>  ron@zigzag:~/Documents/repos/marpa.papers$ diff ~/bin/vcard.parser.orig.pl ~/bin/vcard.parser.pl
>> 0a1,2
>> > #!/usr/bin/env perl
>> >
>> 7c9
>> < my $syntax = <<'END';
>> ---
>> > my $syntax = <<'EOS';
>> 15,17c17,19
>> < group ~ A_D_D
>> < name ~ A_D_D
>> < params ::= ';' param_list | empty
>> ---
>> > group ::= A_D_D
>> > name ::= A_D_D
>> > params ::= SEMICOLON param_list | empty
>> 21c23
>> < any_param_name ~ A_D_D
>> ---
>> > any_param_name ::= A_D_D
>> 86c88
>> < END
>> ---
>> > EOS
>> 89c91
>> < say "rules L0:\n", $grammar->show_rules(1, 'G0');
>> ---
>> > #say "rules L0:\n", $grammar->show_rules(1, 'G0');
>>
>>  and I get:
>>
>>  ron@zigzag:~/Documents/repos/marpa.papers$ ~/bin/vcard.parser.pl
>> Setting trace_terminals option
>> Lexer "L0" rejected lexeme L1c1-11: text; value="BEGIN:VCARD"
>> Lexer "L0" accepted lexeme L1c1-11: 'BEGIN:VCARD'; value="BEGIN:VCARD"
>>
>
>  You can see here that the lexer rejected the text rule but accepted the
> literal rule of the same length.
>
>> Lexer "L0" accepted lexeme L1c12: CRLF; value="
>> "
>> Lexer "L0" rejected lexeme L2c1-11: text; value="VERSION:4.0"
>> Lexer "L0" accepted lexeme L2c1-11: 'VERSION:4.0'; value="VERSION:4.0"
>>
>
>  Once again.
>
>
>> Lexer "L0" accepted lexeme L2c12: CRLF; value="
>> "
>> Lexer "L0" rejected lexeme L3c1-49: text; value="UID:urn:uuid:4fbe8971-0bc3-424c-9c26-36c3e1eff6b1"
>>
>>
>  Here the lexer went for the longer match and never tried A_D_D with value="UID".
>
>
>> progress:
>> P0 @0-0 L1c1 vCards -> . vCard +
>> P1 @0-0 L1c1 vCard -> . 'BEGIN:VCARD' CRLF 'VERSION:4.0' CRLF content 'END:VCARD'
>> P36 @0-0 L1c1 :start -> . vCards
>> R1:1 @0-1 L1c1-11 vCard -> 'BEGIN:VCARD' . CRLF 'VERSION:4.0' CRLF content 'END:VCARD'
>> R1:2 @0-2 L1c1-12 vCard -> 'BEGIN:VCARD' CRLF . 'VERSION:4.0' CRLF content 'END:VCARD'
>> R1:3 @0-3 L1c1-L2c11 vCard -> 'BEGIN:VCARD' CRLF 'VERSION:4.0' . CRLF content 'END:VCARD'
>> R1:4 @0-4 L1c1-L2c12 vCard -> 'BEGIN:VCARD' CRLF 'VERSION:4.0' CRLF . content 'END:VCARD'
>> P2 @4-4 L2c12 content -> . content_line +
>> P3 @4-4 L2c12 content_line -> . content_name params ':' value CRLF
>> P4 @4-4 L2c12 content_name -> . name
>> P5 @4-4 L2c12 content_name -> . group '.' name
>> P6 @4-4 L2c12 group -> . A_D_D
>> P7 @4-4 L2c12 name -> . A_D_D
>>
>> Error in SLIF parse: No lexemes accepted at line 3, column 1
>>   Lexer "L0" rejected 1 lexeme(s)
>>   Rejected lexeme #1: text; value="UID:urn:uuid:4fbe8971-0bc3-424c-9c26-36c3e1eff6b1"; length = 49
>> * String before error: BEGIN:VCARD\nVERSION:4.0\n
>> * The error was at line 3, column 1, and at character 0x0055 'U', ...
>> * here: UID:urn:uuid:4fbe8971-0bc3-424c-9c26-36c3e1eff6b1\n
>> Marpa::R2 exception at /home/ron/bin/vcard.parser.pl line 96.
>>
>>  So it is trying A_D_D.
>>
>
>  Sure. The recognizer waits for A_D_D, but the lexer never offers it.
>
>
>>  --
>> You received this message because you are subscribed to the Google Groups
>> "marpa parser" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> For more options, visit https://groups.google.com/groups/opt_out.
>>
>
>
>
>  --
> Best regards, Ruslan.



-- 
Best regards, Ruslan.
