Hi all,

I've been working on this on and off for a fortnight now.  I've read the
list archives, read the tutorial, read the manpage, read the FAQ,
googled, read through examples, googled . . . and I'm still not getting
anywhere.  Any help would be greatly appreciated.

I'm writing a simple program to back up the servers here in work (I know
about Amanda; it's not doing what we want and it's regularly failing
here).  It's pretty much complete except for parsing the config file,
which has a relatively simple syntax:

tapedevice      /path/to/device
monthly         yes
backup {
        filesystem      /
        filesystem      /usr
        monthly         no
        weekly          yes
}
. . .

Global directives appear outside a backup section; the global section
must be followed by a backup section; there may be multiple
(global,backup) pairs within the config file.  Most directives
can be given in either a backup or global section, but some are
specifically global or backup only.
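
To make the layout concrete, here is a rough sketch of how the sample
above might be held in memory as a list of (global, backup) pairs.  The
key names, the setting() helper, and the idea that a backup-section value
overrides the global one are all illustrative assumptions, not
necessarily what the real program does.

```perl
use strict;
use warnings;

# Hypothetical in-memory form of the sample config: one entry per
# (global, backup) pair.  Key names are illustrative only.
my @config = (
    {
        global => { tapedevice => '/path/to/device', monthly => 'yes' },
        backup => {
            filesystem => [ '/', '/usr' ],   # repeated directive, so a list
            monthly    => 'no',
            weekly     => 'yes',
        },
    },
);

# Hypothetical lookup (assumption): a value given in the backup section
# wins over the same directive in the preceding global section.
sub setting {
    my ($pair, $name) = @_;
    return exists $pair->{backup}{$name} ? $pair->{backup}{$name}
                                         : $pair->{global}{$name};
}

print "monthly:    ", setting($config[0], 'monthly'), "\n";
print "tapedevice: ", setting($config[0], 'tapedevice'), "\n";
```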

So far I can parse any number of config files, and it all works fine,
creating my data structures etc, except for two problems.

PROBLEM 1.

I want to follow the standard Unix config file conventions of # starting
a comment, whitespace being insignificant and \ escaping a newline.  I
figured I'd do this by <skip: qr/.../>, which is working.  But if I set
$Parse::RecDescent::skip = qr/.../ instead, the parse fails.  The actual regex is:

qr/((\s*\#.*\n)|(\s*\\\n)|\s*)*/

Which broken down is:
        (\s*\#.*\n)     # match a comment
        (\s*\\\n)       # backslash escaped newline
        \s*             # whitespace
Match any of the three any number of times.
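
The breakdown above can be checked on its own, without Parse::RecDescent
involved.  This is only a sketch: the regex is the same pattern rewritten
with /x and non-capturing groups so each alternative carries its comment,
and the sample input is made up for illustration.

```perl
use strict;
use warnings;

# Same three alternatives as above, spelled out with /x.
my $skip = qr{
    (?: \s* \# .* \n     # a comment running to end of line
      | \s* \\ \n        # a backslash-escaped newline
      | \s*              # plain whitespace
    )*
}x;

# Hypothetical input: a comment, an escaped newline, a tab, then a token.
my $text = "   # a comment\n  \\\n\ttapedevice";

# Strip everything skippable from the front, as a parser would do
# before trying to match the next token.
(my $rest = $text) =~ s/\A$skip//;
print "remaining: '$rest'\n";
```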

The shortest test case I've come up with is:

-------------------------------------------------

#!/usr/bin/perl -w

use strict;
use warnings;

use Parse::RecDescent;
$::RD_HINT = 1;
$::RD_TRACE = 1;

my $parser1 = Parse::RecDescent->new( <<'GRAMMAR1' );
config: <skip: qr/((\s*\#.*\n)|(\s*\\\n)|\s*)*/>
        "tapedevice" "foo"
GRAMMAR1
rename "RD_TRACE", "RD_TRACE.1";

$Parse::RecDescent::skip = qr/((\s*\#.*\n)|(\s*\\\n)|\s*)*/;
my $parser2 = Parse::RecDescent->new( <<'GRAMMAR2' );
config: 
        "tapedevice" "foo"
GRAMMAR2
rename "RD_TRACE", "RD_TRACE.2";

my $text = <<'TEXT';
tapedevice      \
        foo
TEXT

print "before parser1: $text\n";
my $retval1 = $parser1->config ( $text );
defined $retval1 or warn "parser1 failed\n";
print "retval1: $retval1\n";

print "before parser2: $text\n";
my $retval2 = $parser2->config ( $text );
defined $retval2 or warn "parser2 failed\n";
print "retval2: $retval2\n";

-------------------------------------------------

Looking through RD_TRACE.1 and RD_TRACE.2 I see that RD_TRACE.1 has:
$skip= qr/((\s*\#.*\n)|(\s*\\\n)|\s*)*/;
whereas RD_TRACE.2 has:
$skip = '(?-xism:((\s*\#.*\n)|(\s*\\\n)|\s*)*)';
I've tried various combinations of backslashes in case it's an
interpolation issue, but that hasn't helped.  At this stage I'm stuck.
Obviously I could just stick with using the skip directive instead of
$Parse::RecDescent::skip, but I've spent long enough battling with this
that I want to know why it doesn't work.
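
One way to probe the quoting question outside Parse::RecDescent is to
compare the compiled regex with the same pattern text held in a
single-quoted string, which is the form RD_TRACE.2 shows.  This is a
sketch only: it assumes the single-quoted assignment in the trace is
evaluated as ordinary Perl source (not confirmed against the module),
it leaves off the (?-xism:...) wrapper, and after_skip() is a made-up
helper.

```perl
use strict;
use warnings;

# Form 1: the compiled regex, as in the <skip: ...> directive.
my $compiled = qr/((\s*\#.*\n)|(\s*\\\n)|\s*)*/;

# Form 2: the same pattern text inside single quotes, as in RD_TRACE.2.
# Single quotes turn \\ into \, so \\\n becomes \\n, which as a regex
# means "a backslash followed by the letter n".
my $requoted = '((\s*\#.*\n)|(\s*\\\n)|\s*)*';

# Hypothetical helper: what is left of $text after skipping from the
# start with $pattern.
sub after_skip {
    my ($pattern, $text) = @_;
    $text =~ s/\A$pattern//;
    return $text;
}

# The text that follows "tapedevice" in the test input.
my $rest = "      \\\n        foo\n";

print "compiled leaves: [", after_skip($compiled, $rest), "]\n";
print "requoted leaves: [", after_skip($requoted, $rest), "]\n";
```

If the two forms leave different text behind, the difference lies in how
the pattern is quoted rather than in the pattern itself.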


PROBLEM 2.

I'm trying to get error messages from the parser I've created and I
can't figure out how.  Here's a contrived example because the real
grammar is too long to post; I'll explain my problems after it.

my $grammar = <<'GRAMMAR';
config:         global_line(s?) backup /\s*/

global_line:      "foo"         <commit>        boolean
                | "bar"         <commit>        string
                | common

boolean:          /\b(yes|true|on|1)\b/
                | /\b(no|false|off|0)\b/
                | <error>

string:         /.*/

backup:         "backup" "{" backup_line(s?) "}"

backup_line:      "yellow"      <commit>        boolean
                | "blue"        <commit>        string
                | common

common:           "burger"      <commit>        boolean
                | "chips"       <commit>        boolean
                | "pizza"       <commit>        string
                | <error?>
GRAMMAR

If you feed this a correct file it's fine; feed it an incorrect file
containing something like:
yellow custard
and it fails to parse (good), but doesn't give a helpful error message.

Because global_line can match 0 times, any error the grammar generates
is ignored until we get all the way back up the stack and down into
backup_line, even if I stack | <error> all the way up.
When the message is eventually displayed, it says something like:
        found "yellow", expecting "backup"
This is incorrect: it should say something like:
        found "custard", expecting boolean

The <commit> does at least prune the tree, but I guess what I'm looking
for is a <reallycommit>, because I know if I reach the <commit> but fail
to match afterwards, there really, really is an error and it's never
going to match.  There was a similar mail to this list months ago, but
nothing suggested has worked for me.  Has anyone any ideas on how I can
achieve this?

Also, when I do get error messages (from stacking <error> all the way up)
I get incorrect (negative) line numbers: instead of $thisline, I get
$thisline minus the number of lines in the file.  I'm working around this
by splitting the file into an array, splitting the non-matched text into
an array, then figuring out the line number, but it's a nasty solution.
Has anyone else come across this and found a better solution?
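
For reference, the workaround amounts to something like the sketch
below.  The function name and the sample strings are made up for
illustration, and it assumes the non-matched text is exactly the tail of
the original file.

```perl
use strict;
use warnings;

# Hypothetical helper: given the whole file and the text the parser did
# not consume, return the 1-based line on which the unparsed text starts.
sub error_line {
    my ($whole, $remaining) = @_;
    my @all  = split /\n/, $whole,     -1;   # -1 keeps trailing empty fields
    my @rest = split /\n/, $remaining, -1;
    return scalar(@all) - scalar(@rest) + 1;
}

# Made-up example: parsing stopped at "custard" on the third line.
my $whole     = "monthly yes\nbackup {\nyellow custard\n}\n";
my $remaining = "custard\n}\n";
print "error on line ", error_line($whole, $remaining), "\n";
```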

Thanks for your time, help and reading all this way through such a long
mail :)

-- 
John Tobin
[Parrot] will have reflection, introspection, and Deep Meditative
Capabilities.
                    Dan Sugalski, 2002/07/11, [EMAIL PROTECTED]
