By the way, essentially this same problem came up on Stack Overflow
<http://stackoverflow.com/questions/17773976/prevent-naive-longest-token-matching-in-marpar2scanless>,
and a solution for the current SLIF is there.
-- jeffrey
On 01/06/2014 03:08 PM, Ruslan Zakirov wrote:
Hi,
Shorter script that demos problem: https://gist.github.com/ruz/8291475
Comments below:
On Tue, Jan 7, 2014 at 2:57 AM, Ron Savage <[email protected]> wrote:
I made some small changes:
ron@zigzag:~/Documents/repos/marpa.papers$ diff ~/bin/vcard.parser.orig.pl ~/bin/vcard.parser.pl
0a1,2
> #!/usr/bin/env perl
>
7c9
< my $syntax = <<'END';
---
> my $syntax = <<'EOS';
15,17c17,19
< group ~ A_D_D
< name ~ A_D_D
< params ::= ';' param_list | empty
---
> group ::= A_D_D
> name ::= A_D_D
> params ::= SEMICOLON param_list | empty
21c23
< any_param_name ~ A_D_D
---
> any_param_name ::= A_D_D
86c88
< END
---
> EOS
89c91
< say "rules L0:\n", $grammar->show_rules(1, 'G0');
---
> #say "rules L0:\n", $grammar->show_rules(1, 'G0');
and I get:
ron@zigzag:~/Documents/repos/marpa.papers$ ~/bin/vcard.parser.pl
Setting trace_terminals option
Lexer "L0" rejected lexeme L1c1-11: text; value="BEGIN:VCARD"
Lexer "L0" accepted lexeme L1c1-11: 'BEGIN:VCARD'; value="BEGIN:VCARD"
You see here that the lexer rejected the text rule but accepted the
literal rule of the same length.
Lexer "L0" accepted lexeme L1c12: CRLF; value="
"
Lexer "L0" rejected lexeme L2c1-11: text; value="VERSION:4.0"
Lexer "L0" accepted lexeme L2c1-11: 'VERSION:4.0'; value="VERSION:4.0"
Once again.
Lexer "L0" accepted lexeme L2c12: CRLF; value="
"
Lexer "L0" rejected lexeme L3c1-49: text;
value="UID:urn:uuid:4fbe8971-0bc3-424c-9c26-36c3e1eff6b1"
Here the lexer went for the longer match and never tried A_D_D; value="UID".
progress:
P0 @0-0 L1c1 vCards -> . vCard +
P1 @0-0 L1c1 vCard -> . 'BEGIN:VCARD' CRLF 'VERSION:4.0' CRLF
content 'END:VCARD'
P36 @0-0 L1c1 :start -> . vCards
R1:1 @0-1 L1c1-11 vCard -> 'BEGIN:VCARD' . CRLF 'VERSION:4.0' CRLF
content 'END:VCARD'
R1:2 @0-2 L1c1-12 vCard -> 'BEGIN:VCARD' CRLF . 'VERSION:4.0' CRLF
content 'END:VCARD'
R1:3 @0-3 L1c1-L2c11 vCard -> 'BEGIN:VCARD' CRLF 'VERSION:4.0' .
CRLF content 'END:VCARD'
R1:4 @0-4 L1c1-L2c12 vCard -> 'BEGIN:VCARD' CRLF 'VERSION:4.0'
CRLF . content 'END:VCARD'
P2 @4-4 L2c12 content -> . content_line +
P3 @4-4 L2c12 content_line -> . content_name params ':' value CRLF
P4 @4-4 L2c12 content_name -> . name
P5 @4-4 L2c12 content_name -> . group '.' name
P6 @4-4 L2c12 group -> . A_D_D
P7 @4-4 L2c12 name -> . A_D_D
Error in SLIF parse: No lexemes accepted at line 3, column 1
Lexer "L0" rejected 1 lexeme(s)
Rejected lexeme #1: text;
value="UID:urn:uuid:4fbe8971-0bc3-424c-9c26-36c3e1eff6b1"; length = 49
* String before error: BEGIN:VCARD\nVERSION:4.0\n
* The error was at line 3, column 1, and at character 0x0055 'U', ...
* here: UID:urn:uuid:4fbe8971-0bc3-424c-9c26-36c3e1eff6b1\n
Marpa::R2 exception at /home/ron/bin/vcard.parser.pl line 96.
So it is trying A_D_D.
Sure. The recognizer waits for A_D_D, but the lexer never offers it.
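For anyone following along, the mismatch can be reproduced outside Marpa
with a small stand-alone Perl sketch. It is only a simulation under stated
assumptions: the two patterns for A_D_D and text are guesses at what the
grammar intends, and longest_match is a hypothetical helper, not part of
Marpa::R2. It shows that when every lexeme competes, text wins on length
for the UID line, while restricting the candidates to what the recognizer
expects lets A_D_D through.

```perl
use strict;
use warnings;

# Assumed lexeme patterns (not copied from the gist): A_D_D is letters,
# digits and dashes; text is anything up to the end of the line.
my @lexemes = (
    [ A_D_D => qr/[A-Za-z0-9-]+/ ],
    [ text  => qr/[^\r\n]+/ ],
);

# Hypothetical helper: find the longest match at the start of $input.
# With no $expected hash, every lexeme competes (naive longest match).
# With $expected, only lexemes the parser is waiting for are tried.
sub longest_match {
    my ($input, $expected) = @_;
    my $best_len = 0;
    my @best;
    for my $lexeme (@lexemes) {
        my ($name, $re) = @{$lexeme};
        next if $expected && !$expected->{$name};
        next unless $input =~ /\A($re)/;
        my $len = length $1;
        if    ($len > $best_len)  { $best_len = $len; @best = ($name) }
        elsif ($len == $best_len) { push @best, $name }
    }
    return ($best_len, @best);
}

my $line = 'UID:urn:uuid:4fbe8971-0bc3-424c-9c26-36c3e1eff6b1';

# Every lexeme competes: text matches all 49 characters and wins.
my ($naive_len, @naive) = longest_match($line);
print "naive: @naive (length $naive_len)\n";

# Only A_D_D is expected at this point, per the progress report above.
my ($wanted_len, @wanted) = longest_match($line, { A_D_D => 1 });
print "expected only: @wanted (length $wanted_len)\n";
```

In the naive case the only candidate offered is text, which the
recognizer does not expect at that position, hence "No lexemes accepted".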
--
You received this message because you are subscribed to the Google
Groups "marpa parser" group.
To unsubscribe from this group and stop receiving emails from it,
send an email to [email protected].
For more options, visit https://groups.google.com/groups/opt_out.
--
Best regards, Ruslan.