Hi Yves (and perhaps Damian, if he has time to read
to the end, sorry),
> "Orton, Yves" schrieb:
>
> > oh fine, the next time I will look first in the source!
>
> No. Read the docs first. Its there... :-)
Oh god, I read it, at least twice, and the FAQ as well. Next
time I'll use grep, sigh.
And further on in the docs (shame on me, but it still
doesn't solve the problem with misleading <error> messages):
> Terminal Separators
>
> For the purpose of matching, each terminal in a production is
> considered to be preceded by a ``prefix'' - a pattern which
> must be matched before a token match is attempted. By default,
> the prefix is optional whitespace (which always matches, at
> least trivially), but this default may be reset in any production.
>
> The variable $Parse::RecDescent::skip stores the universal
> prefix, which is the default for all terminal matches in
> all parsers built with Parse::RecDescent.
>
> The prefix for an individual production can be altered by
> using the <skip:...> directive (see below).
But, and that was my problem, none of this is repeated under
the explanation of the <skip> directive itself.
> > but not proper matching, it's already stopping at the first
> > line comment
> > and therefore you get this ERROR messages as you get.
>
> Er, I dont understand you. That pattern will skip all line comments
> and whitespace. (Well, actually P::RD will match that regex repeated
> times as is necessary.)
No, look at this code (your regex), its trace and its output:
#!/usr/bin/perl -w
use Parse::RecDescent;
$Parse::RecDescent::skip = qr{(^\s+|#.*$)+};
$RD_TRACE = 1;
my $grammar =<<'EOGRAMMAR';
file : int(s) /\z/
| <error>
int : /[+-]?\d+/
| <error>
EOGRAMMAR
my $parser = Parse::RecDescent->new($grammar);
my $text = <<'EOTEXT';
# comment
.123
EOTEXT
my $result = $parser->file($text);
Parse::RecDescent: Treating "file :" as a rule declaration
Parse::RecDescent: Treating "int(s)" as a one-or-more subrule match
Parse::RecDescent: Treating "/\z/" as a /../ pattern terminal
Parse::RecDescent: Treating "| <error" as a new (error) production
Parse::RecDescent: Treating "<error>" as an error marker
Parse::RecDescent: Treating "int :" as a rule declaration
Parse::RecDescent: Treating "/[+-]?\d+/" as a /../ pattern terminal
Parse::RecDescent: Treating "| <error" as a new (error) production
Parse::RecDescent: Treating "<error>" as an error marker
printing code (10158) to RD_TRACE
| file |Trying rule: [file] |
| file | |" # comment\n .123\n"
| file |Trying production: [int /\z/] |
| file |Trying repeated subrule: [int] |
| int |Trying rule: [int] |
| int |Trying production: [/[+-]?\d+/] |
| int |Trying terminal: [/[+-]?\d+/] |
| int |<<Didn't match terminal>> |
| int | |"# comment\n .123\n"
| int |Trying production: [<error...>] |
| int | |" # comment\n .123\n"
| int |Trying directive: [<error...>] |
| int | |"# comment\n .123\n"
| int |<<Didn't match directive>> |
| int |<<Didn't match rule>> |
| file |<<Didn't match repeated subrule: |
| |[int]>> |
| file | |" # comment\n .123\n"
| file |Trying production: [<error...>] |
| file |Trying directive: [<error...>] |
| file |<<Didn't match directive>> |
| file |<<Didn't match rule>> |
ERROR (line 1): Invalid int: Was expecting /[+-]?\\d+/
ERROR (line 1): Invalid file: Was expecting int
As you can see, with this skip regex the parser never gets
past the first comment, and therefore the error message
is correct (everything happens in line 1).
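To make the difference visible without the parser, here is a minimal
sketch in plain Perl. The helper strip_prefix is hypothetical and only
assumes that the prefix is removed once, anchored at the start of the
remaining text; the input string is copied from the trace. It compares
the anchored pattern with a flattened one that has no ^ and $:

```perl
use strict;
use warnings;

# Hypothetical helper (not Parse::RecDescent code): remove one leading
# match of $skip from $text and return what is left.
sub strip_prefix {
    my ($text, $skip) = @_;
    $text =~ s/\A(?:$skip)//;
    return $text;
}

# Make newlines visible when printing.
sub visible {
    my ($s) = @_;
    $s =~ s/\n/\\n/g;
    return $s;
}

my $text = " # comment\n .123\n";      # input as shown in the trace

my $anchored  = qr{(^\s+|#.*$)+};      # pattern with ^ and $, no /m
my $flattened = qr{(\s+|#.*)+};        # same alternatives, no anchors

my $left_anchored  = strip_prefix($text, $anchored);
my $left_flattened = strip_prefix($text, $flattened);

print "anchored : ", visible($left_anchored),  "\n";
print "flattened: ", visible($left_flattened), "\n";
```

Without /m the $ can only match at the very end of the string (or
before a final newline), so the comment alternative fails and only
the first blank is removed.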
And with my flattened regex (thanks to your hints!):
#!/usr/bin/perl -w
use Parse::RecDescent;
$Parse::RecDescent::skip = qr{(\s+|#.*)+};
$RD_TRACE = 1;
my $grammar =<<'EOGRAMMAR';
file : int(s) /\z/
| <error>
int : /[+-]?\d+/
| <error>
EOGRAMMAR
my $parser = Parse::RecDescent->new($grammar);
my $text = <<'EOTEXT';
# comment
.123
EOTEXT
my $result = $parser->file($text);
Parse::RecDescent: Treating "file :" as a rule declaration
Parse::RecDescent: Treating "int(s)" as a one-or-more subrule match
Parse::RecDescent: Treating "/\z/" as a /../ pattern terminal
Parse::RecDescent: Treating "| <error" as a new (error) production
Parse::RecDescent: Treating "<error>" as an error marker
Parse::RecDescent: Treating "int :" as a rule declaration
Parse::RecDescent: Treating "/[+-]?\d+/" as a /../ pattern terminal
Parse::RecDescent: Treating "| <error" as a new (error) production
Parse::RecDescent: Treating "<error>" as an error marker
printing code (10156) to RD_TRACE
| file |Trying rule: [file] |
| file | |" # comment\n .123\n"
| file |Trying production: [int /\z/] |
| file |Trying repeated subrule: [int] |
| int |Trying rule: [int] |
| int |Trying production: [/[+-]?\d+/] |
| int |Trying terminal: [/[+-]?\d+/] |
| int |<<Didn't match terminal>> |
| int | |".123\n"
| int |Trying production: [<error...>] |
| int | |" # comment\n .123\n"
| int |Trying directive: [<error...>] |
| int | |".123\n"
| int |<<Didn't match directive>> |
| int |<<Didn't match rule>> |
| file |<<Didn't match repeated subrule: |
| |[int]>> |
| file | |" # comment\n .123\n"
| file |Trying production: [<error...>] |
| file |Trying directive: [<error...>] |
| file |<<Didn't match directive>> |
| file |<<Didn't match rule>> |
ERROR (line 2): Invalid int: Was expecting /[+-]?\\d+/
ERROR (line 1): Invalid file: Was expecting int
Here you can see in the rightmost column that the skip was
done properly, but the int terminal didn't match, and then
again (as with the <skip:...> directive) the comment gets stuffed
back into the input stream before the error is generated,
even though this skip was not bound to this production.
That means that at the moment a global skip regex, not bound
to any rule, doesn't help as long as the skipped patterns
are not consumed.
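The effect on the line number can be sketched like this. It is only
an illustration under the assumption that the reported line is derived
from how much of the input is still unconsumed; line_of_remaining is a
hypothetical helper, not the module's own code:

```perl
use strict;
use warnings;

# Hypothetical helper: 1-based line at which $remaining starts,
# assuming $remaining is a suffix of $full.
sub line_of_remaining {
    my ($full, $remaining) = @_;
    my $consumed = substr($full, 0, length($full) - length($remaining));
    return 1 + ($consumed =~ tr/\n//);   # count newlines already consumed
}

my $full = " # comment\n .123\n";        # input as shown in the trace

# Skipped prefix stays consumed: the error points at the offending line.
my $line_after_skip = line_of_remaining($full, ".123\n");

# Skipped prefix is put back: the error points at the start again.
my $line_after_restore = line_of_remaining($full, $full);

print "prefix consumed: line $line_after_skip\n";
print "prefix restored: line $line_after_restore\n";
```

Under that assumption the two cases give exactly the two different
line numbers seen in the ERROR output above.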
BTW, Damian, it would be nice to see the text matched
by the skip pattern in the trace, just as we can see the
other matched things.
Damian should also apply the skip to the input stream before he
produces the error message. I assume he is just trimming
the input stream with \s* (from a quick look at the code, not
the docs, this time :)
> 1663 sub message ($)
> 1664 {
> 1665 my ($self) = @_;
> 1666 $self->{expected} = $self->{defexpected} unless $self->{expected};
> 1667 $self->{expected} =~ s/_/ /g;
> 1668 if (!$self->{unexpected} || $self->{unexpected} =~ /\A\s*\Z/s)
> 1669 {
> 1670 return "Was expecting $self->{expected}";
> 1671 }
> 1672 else
> 1673 {
> 1674 $self->{unexpected} =~ /\s*(.*)/;
> 1675 return "Was expecting $self->{expected} but found \"$1\" instead";
> 1676 }
> 1677 }
As you can see in line 1674, only whitespace is trimmed from the
remaining text, not whatever $Parse::RecDescent::skip or the skip
regex for this production would match.
I think it's a bug.
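What I mean is roughly the following. This is only a sketch of the
idea, not a patch: found_text is a hypothetical stand-alone function,
and passing the prefix pattern in as a parameter is my assumption
about how it could be wired up.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Parse::RecDescent: return the first
# line of $unexpected after removing a leading match of $prefix.
sub found_text {
    my ($unexpected, $prefix) = @_;
    $unexpected =~ s/\A(?:$prefix)//;
    my ($first_line) = $unexpected =~ /\A(.*)/;
    return $first_line;
}

my $unexpected = " # comment\n .123\n";  # input as shown in the trace

# Trimming whitespace only, as line 1674 does.
my $by_whitespace = found_text($unexpected, qr/\s*/);

# Trimming with the skip pattern that is actually in effect.
my $by_skip = found_text($unexpected, qr{(\s+|#.*)+});

print qq{whitespace only: but found "$by_whitespace" instead\n};
print qq{skip pattern   : but found "$by_skip" instead\n};
```

With whitespace trimming the message would name the comment as the
unexpected text, with the skip pattern it would name the text that
really caused the failure.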
Anyway, even if Damian fixes the code that builds the
"expected ... but found" message, the line number is still wrong
because it was already reset when the skipped text was stuffed back.
For the next release, Damian should think about handling these
skipped fragments a little differently from how it is done now.
> Im not really sure what the difference is, but whatever makes you
> comfortable. I think that P::RD will automatically stick a \G or ^ or
> \A at the beginning anyway so it probably doesnt matter.... :-)
No, see my examples above. And even if you fix the end of the
comment alternative by replacing #.*$) with #.*\n), you still need
the /m modifier so that the ^ at the beginning of the regex
qr{(^\s+|#.*\n)+}m can match at the start of the second line of
the following input stream:
# comment
# comment
which is just:
$text = " # comment\n # comment\n";
but this isn't and wasn't the topic of this message.
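For completeness, a small plain-Perl check of the /m point. It again
assumes the prefix is removed once, anchored at the start of the text;
only the modifier differs between the two patterns:

```perl
use strict;
use warnings;

my $text = " # comment\n # comment\n";

my $without_m = qr{(^\s+|#.*\n)+};       # ^ matches only at the string start
my $with_m    = qr{(^\s+|#.*\n)+}m;      # ^ also matches after a newline

# Remove one leading match of each pattern from a copy of the text.
(my $rest_without_m = $text) =~ s/\A(?:$without_m)//;
(my $rest_with_m    = $text) =~ s/\A(?:$with_m)//;

# Show what is left, with newlines made visible.
for my $pair ([ 'without /m', $rest_without_m ], [ 'with /m', $rest_with_m ]) {
    my ($label, $rest) = @$pair;
    $rest =~ s/\n/\\n/g;
    print "$label: '$rest'\n";
}
```

Without /m the second comment line is left over, because the blank in
front of it can only be matched by the ^\s+ alternative.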
The problem is with skip handling and misleading error
messages and strange error line indications when skipping
over different things than just the default.
Regards
Charly