> I want to follow the standard Unix config file conventions of # starting
> a comment, whitespace being insignificant and \ escaping a newline. I
> figured I'd do this by <skip: qr/.../>, which is working.
> But if I try
>     $Parse::RecDescent::skip = qr/.../
> it fails. The actual regex is:
>     qr/((\s*\#.*\n)|(\s*\\\n)|\s*)*/

Have you considered that it probably makes much more sense to resolve this before parsing? Why not strip these out first?

    s/(?<!\\)#.*$//mg;
    s/\\\n//g;

I think a good rule of thumb with PRD grammars is: don't do things within them that you don't have to. If you can move stuff from a grammar rule into a regex, it's usually a good thing. If you can move simple processing into a preprocess step, then it is probably a good plan. If the rules for these things are too complex for an s/// preprocess, then why not use two grammars? One to remove the whitespace and rejoin lines, and one to parse the resulting mess. I'm pretty sure that the end result will have a much cleaner grammar and will run faster.

> I've tried various combinations of backslashes incase it's an
> interpolation issue, but that hasn't helped. At this stage I'm stuck.

You'll kick yourself. It works just fine with the regex:

    $Parse::RecDescent::skip=qr/(?:\s*#.*\n|\s*\\\s*\n|\s*)*/;

or rather, it parsed

    my $text="chips \\
    true #comment
    backup{yellow true} #more comment";

on my machine with no problem.

(However, it's not clear to me why this works. If I change it to the hypothetically equivalent

    $Parse::RecDescent::skip=qr/(?:\s*#.*\n|\s*[\\]\s*\n|\s*)*/;

it errors. I'm not sure, but I think Damian may be doing something really sneaky here and is handling \\\s as a special case, but when it's [\\] he doesn't do the same handling. It's unclear to me right now.)

The key issue is that on a Win32 machine (where I am running) the (\s*\\\n) won't match, because the text is actually (\s*\\\r\n). By sticking a \s* in between the backslash and the \n, the \r is gobbled up and there is no problem. I suspect that if you are on Unix you are sometimes actually encountering "\\ \n" or the like.

Also, is # ONLY used for comments? I would change the regex to be

    $Parse::RecDescent::skip=qr/\s*(?:(?!<[\\\\])#.*\n|[\\\\]\s*\n)*/;

Incidentally (hopefully Damian is watching) there IS a bug in P::RD with regard to handling backslashing.
I encountered it when trying to get the above regex to work. I originally tried:

    $Parse::RecDescent::skip=qr/\s*(?:(?!<\\)#.*\n|\\\s*\n)*/;

Which produced the following error:

    Unmatched ( before HERE mark in regex m/\A( << HERE (?-xism:\s*(?:(?!<\)#.*\n|\\s*\n)*))/ at (eval 15) line 1567.

The stringified form of the qr() is:

    (?-xism:\s*(?:(?!<\\)#.*\n|\\\s*\n)*)

It looks like the problem is that the qr() in $PRD::skip gets stringified, and then evalled. This is a disastrous circumstance, as the eval destroys all of the benefit of using qr(), and is responsible for (?!<\\) becoming (?!<\), which causes all kinds of problems. Even worse, \\\s* becomes \\s*, which when evalled becomes \s*, which of course will never match "\\ " (or shouldn't; I'm not so sure about what Damian is doing here). This applies whether you use qr() or another form of quoting.

The solution is as I did above: use [\\\\] instead of \\. This is OK because /[\\\\]/ matches the same thing as /\\/, but when it gets double interpolated it becomes /[\\]/, which of course is also the same as /\\/.

>
> PROBLEM 2:
>
> I'm trying to get error messages from the parser I've created and I
> can't figure out how. Here's a contrived example because the real
> grammar is too long to post; I'll explain my problems after it.
>
> my $grammar = <<'GRAMMAR';
> config: global_line(s?) backup /\s*/
>
> global_lines: "foo" <commit> boolean
>             | "bar" <commit> string
>             | common
>
> boolean: /\b(yes|true|on|1)\b/
>        | /\b(no|false|off|0)\b/
>        | <error>
>
> string: /.*/
>
> backup: "backup" "{" backup_line(s?) "}"
>
> backup_line: "yellow" <commit> boolean
>            | "blue" <commit> string
>            | common
>
> common: "burger" <commit> boolean
>       | "chips" <commit> boolean
>       | "pizza" <commit> string
>       | <error?>
> GRAMMAR
>
> If you feed this a correct file it's fine; feed it an incorrect file
> containing something like:
>     yellow custard
> and it fails to parse (good), but doesn't give a helpful error message.
>
> When the message is eventually displayed, it says something like:
>     found "yellow", expecting "backup"
> This is incorrect: it should say something like:
>     found "custard", expecting boolean

Why? The grammar says to first compare "yellow" against 'global_lines', and to accept that it won't match. It doesn't match (as it does not begin with "foo" or "bar" or "burger", "chips", "pizza"), so the 'global_lines' rule is satisfied and it goes on to match "yellow" against 'backup'. 'backup' requires that the string starts with the literal "backup", which certainly doesn't match, so it complains of the fact, quite correctly. I'm curious why you think it should say what you do. It makes no sense to me: what rule do you think should be consuming the "yellow"?

Although I just ran the code, and it turns out that the posted code doesn't match at all. The rule

    config: global_line(s?) backup /\s*/

doesn't refer to an existing rule. $::RD_WARN causes the following message to be displayed:

    Warning: Undefined (sub)rule "global_line" used in a production.
    (Hint: Will you be providing this rule later, or did you perhaps
    misspell "global_line"? Otherwise it will be treated as an
    immediate <reject>.)

I guessed that you really meant

    config: global_lines(?) backup /\s*/

and when I ran the code against "yellow custard" I got no error at all (except that the parse() method returned false).
But the trace is revealing:

    |  common  |Trying directive: [<error?:...>]      |
    |  common  |>>Matched directive<< (return value:  |
    |          |[0])                                  |
    |  common  |>>Rejecting production<< (found       |
    |          |<reject>)                             |
    |  common  |<<Didn't match rule>>                 |
    |global_lin|<<Didn't match subrule: [common]>>    |
    |global_lin|<<Didn't match rule>>                 |
    |  config  |>>Matched repeated subrule:           |
    |          |[global_lines]<< (0 times)            |

This shows that the subrule 'global_lines' was considered to match (0 times), which means it will then try to match "yellow" against the next subrule of 'config'.

> > The <commit> does at least prune the tree, but I guess what

AFAICT "yellow custard" doesn't ever cause a commit to fire. As such, when it reaches the <error?> directive it simply ignores the error part and treats the rule as matching 0 times, which is an accepting condition. Now if you change that error condition to be <error>, I get

    ERROR (line 1): Invalid common: Was expecting 'burger', or 'chips', or 'pizza'
    Bad text!
    printing code (49598) to RD_TRACE

(assuming the above change to the config rule is made).

> Thanks for your time, help and reading all this way through
> such a long mail :)

Hope the reply wasn't too long for anybody either.

HTH
yves

ps (If this post doesn't seem as well organized as it could be, it's because I changed various bits over time as I explored the issues you have raised. So sometimes it may say what I thought originally, only to be followed by a clarification from further research. Apologies if you find this confusing.)
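
For what it's worth, the "strip it out before parsing" suggestion near the top of this reply could look roughly like the sketch below. It uses only core Perl and does not touch Parse::RecDescent at all. The name preprocess_config is made up for illustration, and the \r\n normalisation and the optional trailing blanks after the backslash are my assumptions, added because of the Win32 and "\\ \n" cases mentioned above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Parse::RecDescent): clean up a config
# text before it is handed to the parser.
sub preprocess_config {
    my ($text) = @_;
    $text =~ s/\r\n/\n/g;         # assumption: normalise Win32 line endings first
    $text =~ s/(?<!\\)#.*$//mg;   # drop comments that start at an unescaped '#'
    $text =~ s/\\[ \t]*\n//g;     # rejoin backslash-continued lines
    return $text;
}

# The sample text from this reply, with a trailing newline added.
my $raw = "chips \\\ntrue #comment\nbackup{yellow true} #more comment\n";
print preprocess_config($raw);
```

The grammar then only ever sees ordinary whitespace, so the default skip pattern should be enough and none of the backslash trouble in $Parse::RecDescent::skip comes into play.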
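
The double-interpolation effect described in the backslash discussion above can be reproduced without Parse::RecDescent. The sketch below says nothing about P::RD's actual internals, which I have not verified; it only assumes that the stringified pattern goes through one extra round of single-quote-style backslash processing, which is consistent with the error message shown earlier. The helper name reinterpolate is invented for the demonstration, and it assumes the pattern contains no unbalanced braces.

```perl
use strict;
use warnings;

# Hypothetical helper: stringify a qr// object and push it through one
# extra round of single-quote-style processing, where \\ collapses to \.
sub reinterpolate {
    my ($pattern) = @_;
    my $source = "$pattern";          # stringified form of the qr//
    my $again  = eval "q{$source}";   # \\ becomes \, other escapes survive
    die "re-quoting failed: $@" unless defined $again;
    return $again;
}

my $plain   = qr/\\\s*\n/;       # backslash, optional whitespace, newline
my $classed = qr/[\\\\]\s*\n/;   # same intent, backslash written as a class
my $line    = "\\ \n";           # backslash, space, newline

for my $case ([plain => $plain], [classed => $classed]) {
    my ($name, $re) = @$case;
    my $after = reinterpolate($re);
    printf "%-8s before: %-8s after: %s\n", $name,
        ($line =~ /\A$re/    ? "match" : "no match"),
        ($line =~ /\A$after/ ? "match" : "no match");
}
```

The plain form stops matching after the extra round because \\\s* has turned into \\s* (a backslash followed by zero or more letter "s"), while the [\\\\] form turns into [\\] and keeps working.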