Hi,
Peter mentioned longest-tokens-match off list an hour ago, and I only noticed it 5 minutes ago. This is not what I was expecting from a scannerless interface. It means Repa is still a valid thing. I should kill all attempts at continuous parsing in it and release it. Pauses and manual lexing are not "sexy" :)

What is IRIF? Is it a new Marpa front end with inline actions?

On Tue, Jan 7, 2014 at 3:35 AM, Jeffrey Kegler <[email protected]> wrote:

> First off, welcome back. Since you've been away for a while, allow me
> to let new readers know that you're the founder of this mailing list, and
> someone whose support and advice have been very valuable to Marpa.
>
> Second, I'm about to dive into the answer, but I'm very open to ideas that
> would make Marpa easier to use.
>
> Marpa does an old-fashioned longest-tokens-match. It does have information
> about tokens expected, but it imitates traditional parsers in not using
> that. At the beginning, it looks for the longest token. If there is only
> one, and it is not acceptable to the grammar, the parse fails. In this
> case it finds a <value>, and because a <value> is not acceptable first
> thing, the parse fails.
>
> Longest-tokens-match requires that you contrive it so that the longest
> token, including those which the grammar will not find acceptable, is
> always the one you want. Could Marpa do it differently? Yes, and it will
> in the future. (Aside @amon: perhaps the IRIF already does better?)
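
To make the behaviour concrete, here is a minimal sketch of longest-tokens-match as described above. It is plain Python, not Marpa code: the regexes for A_D_D and text are assumptions standing in for the real lexemes in the gist, and `longest_tokens` and `lex` are hypothetical helper names.

```python
import re

# Assumed stand-ins for the L0 lexemes of the vCard grammar; the real
# definitions are in the gist and may differ.
LEXEMES = {
    "A_D_D": re.compile(r"[A-Za-z0-9-]+"),
    "text": re.compile(r"[^\r\n]+"),
    "CRLF": re.compile(r"\r?\n"),
    "'BEGIN:VCARD'": re.compile(r"BEGIN:VCARD"),
}


def longest_tokens(source, pos):
    """Return (length, names) of the longest lexeme(s) matching at pos."""
    best_len, best_names = 0, []
    for name, pattern in LEXEMES.items():
        m = pattern.match(source, pos)
        if m is None or m.end() == pos:
            continue
        length = m.end() - pos
        if length > best_len:
            best_len, best_names = length, [name]
        elif length == best_len:
            best_names.append(name)
    return best_len, best_names


def lex(source, pos, expected):
    """Pick the longest lexemes first; only then check them against what
    the parser expects. Shorter matches are never offered."""
    length, names = longest_tokens(source, pos)
    accepted = [n for n in names if n in expected]
    if not accepted:
        raise ValueError(f"No lexemes accepted at offset {pos}; rejected {names}")
    return accepted, source[pos:pos + length]


if __name__ == "__main__":
    # Tie at 11 characters: text is rejected, the literal is accepted.
    print(lex("BEGIN:VCARD\n", 0, {"'BEGIN:VCARD'"}))
    # The parser expects A_D_D, but text is longer, so the lex step fails.
    try:
        lex("UID:urn:uuid:4fbe8971-0bc3-424c-9c26-36c3e1eff6b1\n", 0, {"A_D_D"})
    except ValueError as err:
        print(err)
```

On the UID line, text matches all 49 characters and is the only longest lexeme, so A_D_D ("UID", 3 characters) is never offered, even though it is the one the parser is waiting for. On BEGIN:VCARD, text and the literal tie at 11 characters and only the literal is accepted, which is the same pattern the trace below shows.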
>
> -- jeffrey
>
> On 01/06/2014 03:08 PM, Ruslan Zakirov wrote:
>
> Hi,
>
> Shorter script that demos problem: https://gist.github.com/ruz/8291475
>
> Comments below:
>
> On Tue, Jan 7, 2014 at 2:57 AM, Ron Savage <[email protected]> wrote:
>
>> I made some small changes:
>>
>> ron@zigzag:~/Documents/repos/marpa.papers$ diff ~/bin/vcard.parser.orig.pl ~/bin/vcard.parser.pl
>> 0a1,2
>> > #!/usr/bin/env perl
>> >
>> 7c9
>> < my $syntax = <<'END';
>> ---
>> > my $syntax = <<'EOS';
>> 15,17c17,19
>> < group ~ A_D_D
>> < name ~ A_D_D
>> < params ::= ';' param_list | empty
>> ---
>> > group ::= A_D_D
>> > name ::= A_D_D
>> > params ::= SEMICOLON param_list | empty
>> 21c23
>> < any_param_name ~ A_D_D
>> ---
>> > any_param_name ::= A_D_D
>> 86c88
>> < END
>> ---
>> > EOS
>> 89c91
>> < say "rules L0:\n", $grammar->show_rules(1, 'G0');
>> ---
>> > #say "rules L0:\n", $grammar->show_rules(1, 'G0');
>>
>> and I get:
>>
>> ron@zigzag:~/Documents/repos/marpa.papers$ ~/bin/vcard.parser.pl
>> Setting trace_terminals option
>> Lexer "L0" rejected lexeme L1c1-11: text; value="BEGIN:VCARD"
>> Lexer "L0" accepted lexeme L1c1-11: 'BEGIN:VCARD'; value="BEGIN:VCARD"
>
> You see here that lexer rejected text rule, but accepted literal rule of
> the same length.
>
>> Lexer "L0" accepted lexeme L1c12: CRLF; value="
>> "
>> Lexer "L0" rejected lexeme L2c1-11: text; value="VERSION:4.0"
>> Lexer "L0" accepted lexeme L2c1-11: 'VERSION:4.0'; value="VERSION:4.0"
>
> Once again.
>
>> Lexer "L0" accepted lexeme L2c12: CRLF; value="
>> "
>> Lexer "L0" rejected lexeme L3c1-49: text;
>> value="UID:urn:uuid:4fbe8971-0bc3-424c-9c26-36c3e1eff6b1"
>
> Here lexer went for longer match and never tried A_D_D; value="UID".
>
>> progress:
>> P0 @0-0 L1c1 vCards -> . vCard +
>> P1 @0-0 L1c1 vCard -> . 'BEGIN:VCARD' CRLF 'VERSION:4.0' CRLF content 'END:VCARD'
>> P36 @0-0 L1c1 :start -> . vCards
>> R1:1 @0-1 L1c1-11 vCard -> 'BEGIN:VCARD' . CRLF 'VERSION:4.0' CRLF content 'END:VCARD'
>> R1:2 @0-2 L1c1-12 vCard -> 'BEGIN:VCARD' CRLF . 'VERSION:4.0' CRLF content 'END:VCARD'
>> R1:3 @0-3 L1c1-L2c11 vCard -> 'BEGIN:VCARD' CRLF 'VERSION:4.0' . CRLF content 'END:VCARD'
>> R1:4 @0-4 L1c1-L2c12 vCard -> 'BEGIN:VCARD' CRLF 'VERSION:4.0' CRLF . content 'END:VCARD'
>> P2 @4-4 L2c12 content -> . content_line +
>> P3 @4-4 L2c12 content_line -> . content_name params ':' value CRLF
>> P4 @4-4 L2c12 content_name -> . name
>> P5 @4-4 L2c12 content_name -> . group '.' name
>> P6 @4-4 L2c12 group -> . A_D_D
>> P7 @4-4 L2c12 name -> . A_D_D
>>
>> Error in SLIF parse: No lexemes accepted at line 3, column 1
>> Lexer "L0" rejected 1 lexeme(s)
>> Rejected lexeme #1: text;
>> value="UID:urn:uuid:4fbe8971-0bc3-424c-9c26-36c3e1eff6b1"; length = 49
>> * String before error: BEGIN:VCARD\nVERSION:4.0\n
>> * The error was at line 3, column 1, and at character 0x0055 'U', ...
>> * here: UID:urn:uuid:4fbe8971-0bc3-424c-9c26-36c3e1eff6b1\n
>> Marpa::R2 exception at /home/ron/bin/vcard.parser.pl line 96.
>>
>> So it is trying A_D_D.
>
> Sure. Recognizer waits for A_D_D, but lexer never offers it.
>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "marpa parser" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> For more options, visit https://groups.google.com/groups/opt_out.
>
> --
> Best regards, Ruslan.

--
Best regards, Ruslan.
