First off, welcome back. Since you've been away for a while, let me tell
new readers that you're the founder of this mailing list, and someone
whose support and advice have been very valuable to Marpa.
Second, I'm about to dive into the answer, but I'm very open to ideas
that would make Marpa easier to use.
Marpa does an old-fashioned longest-tokens-match. It does have
information about which tokens are expected, but, like traditional
parsers, it does not use that information. At the beginning, it looks
for the longest token. If there is only one, and the grammar does not
accept it, the parse fails. In this case it finds a <value>, and because
a <value> is not acceptable as the first token there, the parse fails.
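To make the failure concrete, here is a toy Perl sketch of longest-tokens-match. It is not Marpa's code, and it assumes a hypothetical, simplified lexeme table: A_D_D is taken to be a run of letters, digits and dashes, and text (the name the trace quoted below gives the rejected lexeme) to be anything up to the end of the line. Neither regex is the definition from the actual script. The lexer picks the longest match without asking what the grammar expects, so on the UID line it chooses text over A_D_D, and the later check against the expected tokens finds nothing acceptable.

```perl
use strict;
use warnings;

# Hypothetical lexeme table; these regexes are assumptions, not the
# definitions from vcard.parser.pl.
my @lexemes = (
    [ A_D_D => qr/[A-Za-z0-9-]+/ ],   # letters, digits, dashes
    [ text  => qr/[^\r\n]+/ ],        # anything up to end of line
);

# Longest-tokens-match: find the longest lexeme(s) at the start of
# $input, ignoring what the parser expects. Returns (length, names).
sub longest_tokens {
    my ($input) = @_;
    my ($best_len, @best) = (0);
    for my $lexeme (@lexemes) {
        my ($name, $re) = @$lexeme;
        next unless $input =~ /\A($re)/;
        my $len = length $1;
        if    ($len > $best_len)  { ($best_len, @best) = ($len, $name) }
        elsif ($len == $best_len) { push @best, $name }
    }
    return ($best_len, @best);
}

# Only after the longest tokens are chosen are they checked against the
# tokens the recognizer expects; if none survive, lexing fails.
sub lex_ltm {
    my ($input, %expected) = @_;
    my ($len, @names) = longest_tokens($input);
    my @accepted = grep { $expected{$_} } @names;
    return @accepted ? ($len, @accepted) : ();
}

my $line = 'UID:urn:uuid:4fbe8971-0bc3-424c-9c26-36c3e1eff6b1';
my ($len, @names) = longest_tokens($line);
print "longest: @names ($len chars)\n";      # longest: text (49 chars)
my @result = lex_ltm($line, A_D_D => 1);     # recognizer expects A_D_D
print @result ? "accepted: @result\n" : "no lexemes accepted\n";
```

The second print reports "no lexemes accepted", which mirrors the "No lexemes accepted at line 3, column 1" error in the trace.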
Longest-tokens-match requires you to contrive your lexemes so that the
longest token, counting those the grammar will not accept, is always
the one you want. Could Marpa do it differently? Yes, and it will in
the future. (Aside @amon: perhaps the IRIF already does better?)
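For contrast, here is a toy Perl sketch of one way a lexer could use the expected-token information instead: discard the lexemes the recognizer cannot accept at this point, then take the longest of the remaining matches. It is only an illustration of the idea, not a description of how Marpa does or will do it, and it assumes a hypothetical lexeme table (A_D_D as letters, digits and dashes; text as anything up to end of line) rather than the script's real definitions.

```perl
use strict;
use warnings;

# Hypothetical lexeme table; an assumption, not the real script's rules.
my @lexemes = (
    [ A_D_D => qr/[A-Za-z0-9-]+/ ],   # letters, digits, dashes
    [ text  => qr/[^\r\n]+/ ],        # anything up to end of line
);

# Expected-first matching: skip lexemes the recognizer cannot accept
# here, then keep the longest of the remaining matches.
# Returns (length, names), or an empty list if nothing matches.
sub lex_expected_first {
    my ($input, %expected) = @_;
    my ($best_len, @best) = (0);
    for my $lexeme (@lexemes) {
        my ($name, $re) = @$lexeme;
        next unless $expected{$name};
        next unless $input =~ /\A($re)/;
        my $len = length $1;
        if    ($len > $best_len)  { ($best_len, @best) = ($len, $name) }
        elsif ($len == $best_len) { push @best, $name }
    }
    return @best ? ($best_len, @best) : ();
}

my $line = 'UID:urn:uuid:4fbe8971-0bc3-424c-9c26-36c3e1eff6b1';
my ($len, @names) = lex_expected_first($line, A_D_D => 1);
print "accepted: @names ($len chars)\n";   # accepted: A_D_D (3 chars)
```

One consequence of this design is that what the lexer returns now depends on the recognizer's state at each position, not on the input alone.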
-- jeffrey
On 01/06/2014 03:08 PM, Ruslan Zakirov wrote:
Hi,
Shorter script that demos problem: https://gist.github.com/ruz/8291475
Comments below:
On Tue, Jan 7, 2014 at 2:57 AM, Ron Savage <[email protected]> wrote:
I made some small changes:
ron@zigzag:~/Documents/repos/marpa.papers$ diff ~/bin/vcard.parser.orig.pl ~/bin/vcard.parser.pl
0a1,2
> #!/usr/bin/env perl
>
7c9
< my $syntax = <<'END';
---
> my $syntax = <<'EOS';
15,17c17,19
< group ~ A_D_D
< name ~ A_D_D
< params ::= ';' param_list | empty
---
> group ::= A_D_D
> name ::= A_D_D
> params ::= SEMICOLON param_list | empty
21c23
< any_param_name ~ A_D_D
---
> any_param_name ::= A_D_D
86c88
< END
---
> EOS
89c91
< say "rules L0:\n", $grammar->show_rules(1, 'G0');
---
> #say "rules L0:\n", $grammar->show_rules(1, 'G0');
and I get:
ron@zigzag:~/Documents/repos/marpa.papers$ ~/bin/vcard.parser.pl
Setting trace_terminals option
Lexer "L0" rejected lexeme L1c1-11: text; value="BEGIN:VCARD"
Lexer "L0" accepted lexeme L1c1-11: 'BEGIN:VCARD'; value="BEGIN:VCARD"
Here you can see that the lexer rejected the text rule but accepted the
literal rule of the same length.
Lexer "L0" accepted lexeme L1c12: CRLF; value="
"
Lexer "L0" rejected lexeme L2c1-11: text; value="VERSION:4.0"
Lexer "L0" accepted lexeme L2c1-11: 'VERSION:4.0'; value="VERSION:4.0"
Once again.
Lexer "L0" accepted lexeme L2c12: CRLF; value="
"
Lexer "L0" rejected lexeme L3c1-49: text; value="UID:urn:uuid:4fbe8971-0bc3-424c-9c26-36c3e1eff6b1"
Here the lexer went for the longer match and never tried A_D_D (value="UID").
progress:
P0 @0-0 L1c1 vCards -> . vCard +
P1 @0-0 L1c1 vCard -> . 'BEGIN:VCARD' CRLF 'VERSION:4.0' CRLF content 'END:VCARD'
P36 @0-0 L1c1 :start -> . vCards
R1:1 @0-1 L1c1-11 vCard -> 'BEGIN:VCARD' . CRLF 'VERSION:4.0' CRLF content 'END:VCARD'
R1:2 @0-2 L1c1-12 vCard -> 'BEGIN:VCARD' CRLF . 'VERSION:4.0' CRLF content 'END:VCARD'
R1:3 @0-3 L1c1-L2c11 vCard -> 'BEGIN:VCARD' CRLF 'VERSION:4.0' . CRLF content 'END:VCARD'
R1:4 @0-4 L1c1-L2c12 vCard -> 'BEGIN:VCARD' CRLF 'VERSION:4.0' CRLF . content 'END:VCARD'
P2 @4-4 L2c12 content -> . content_line +
P3 @4-4 L2c12 content_line -> . content_name params ':' value CRLF
P4 @4-4 L2c12 content_name -> . name
P5 @4-4 L2c12 content_name -> . group '.' name
P6 @4-4 L2c12 group -> . A_D_D
P7 @4-4 L2c12 name -> . A_D_D
Error in SLIF parse: No lexemes accepted at line 3, column 1
Lexer "L0" rejected 1 lexeme(s)
Rejected lexeme #1: text; value="UID:urn:uuid:4fbe8971-0bc3-424c-9c26-36c3e1eff6b1"; length = 49
* String before error: BEGIN:VCARD\nVERSION:4.0\n
* The error was at line 3, column 1, and at character 0x0055 'U', ...
* here: UID:urn:uuid:4fbe8971-0bc3-424c-9c26-36c3e1eff6b1\n
Marpa::R2 exception at /home/ron/bin/vcard.parser.pl line 96.
So it is trying A_D_D.
Sure. The recognizer waits for A_D_D, but the lexer never offers it.
--
You received this message because you are subscribed to the Google
Groups "marpa parser" group.
To unsubscribe from this group and stop receiving emails from it,
send an email to [email protected].
For more options, visit https://groups.google.com/groups/opt_out.
--
Best regards, Ruslan.