> I've been re-reading A5 (regexen), and I was trying to work out how to
> incorporate a preprocessor into regex, without a separate lexer. I came
> to the conclusion that preprocessor commands are part of the whitespace
> in the higher layer of the grammar. So we just need to define the <ws>
> rule appropriately. Preprocessing also requires maintenance of filename
> and linenum as we move between files.
>
> The following code illustrates my thoughts: am I anywhere near close?
>
>
> grammar expression is preprocessed
> {
>   rule main { <expression>* <EOF> }
>
>   rule expression
>   {
>     :w <number> <op> <number> { print eval join $op, @number }

C<join> is probably a method now so that should probably be either:

    { print eval @number.join($op) }

Or the equivalent indirect object syntax.  Then again, C<&*join> might
just be a forwarding function of the same nature.

>   }
> }
>
> grammar preprocessed
> {
>   rule ws
>   {
>     <SUPER.ws>  # we are extending the default <ws> rule

That's <SUPER::ws>, as SUPER isn't a parse object.  This shouldn't go
first either, because it matches a null string.

>     # match #include _filename_.
>     | ^^ \# <SUPER.ws>* include <SUPER.ws>* <filename>

<ws> matches optional repeated whitespace, so that * is not needed.
Also, I think you want this all on the same line, right?  Those
<SUPER::ws>'s match newlines as well.  See below.

>             < <SUPER.ws> - \n >* \n

Hmmm, I don't know whether you can use C<-> like that on rules that
are more than single characters.  I'm not sure how to get around that
one without plunging into the innards of <ws>.  If you could, however,
it would be:

    < <SUPER::ws> - [\n] >

>         # assume .pos acts like a stack: push the
>         # new filehandle onto it: hope the handle
>         # will supply filename and line_num

That'd be nice.

>         { .pos.push open "<$filename" or fail }

Method calls need parens, though.

    { .pos.push(open "<$filename" err fail) }

>     # at end of file, pop the .pos stack, if not empty
>     | <EOF> { .pos.empty and fail } { .pos.pop }
>   }

You probably need to rename this rule C<my_ws is private> or somesuch,
and then make the ws rule:

    rule ws { <my_ws>+ }

Because you want to allow whitespace before and after your directive
with a single <ws> call.

>   # assume Safe module gives us a filename with no dangerous
>   # meta-chars
>   rule filename { Safe.filename }

Probably

    rule filename { <Safe::filename> }

Or even

    grammar preprocessed is private filename

(If private inheritance is supported).  From a design perspective,
either is valid.

Now about the < <SUPER::ws> - [\n] > thing.
Here's a very slow way of doing it:

    <SUPER::ws> <( $0[-1] !~ /\n/ )>

It'd be nice to be able to tell a rule to minimal match:

    <SUPER::ws?> \n

But there could be so many different meanings of that for some
particular rule, that it's probably not possible.  The *rules (global)
might include ? variants of themselves, however, so things like this
would be easy.  And of course you could just do:

    \s*? \n

But that's not very good object oriented design (what if someone else
overrode <ws> above you?).

What about a rule junction?

    <all /<SUPER::ws>/, /\N*/>

(I am entirely unsure of my syntax there).  Presumably, that would find
a place in the input where both of those match from the point they
started, which would do it.  It could be optimized (heh :-) to make an
aggregate rule which checks both sides at each character match, which
would bring the match time out of exponential.  I see use for that
construct.

Luke

> }
>
> --
> Dave.
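
For anyone who wants to try the "position acts like a stack" idea
outside the rule syntax, here is a minimal sketch in Python (not
Perl 6, and not anything A5 specifies).  It treats an #include line as
skippable input, pushes the included source onto a stack, and pops it
at end of file, so every remaining line still knows its filename and
line number.  The names expand, INCLUDE and the in-memory sources dict
(standing in for real files) are all made up for illustration, and the
recursive-include check is an extra that the thread does not discuss.

```python
import re
from typing import Dict, Iterator, List, Tuple

# Hypothetical directive syntax: '#include name' alone on one line,
# with only horizontal whitespace (never a newline) around the tokens.
INCLUDE = re.compile(r'^[ \t]*#[ \t]*include[ \t]+(\S+)[ \t]*$')


def expand(sources: Dict[str, str], start: str) -> Iterator[Tuple[str, int, str]]:
    """Yield (filename, line_number, text) for every non-directive line."""
    # Each stack frame is [filename, lines, index of the next line to read].
    stack: List[list] = [[start, sources[start].splitlines(), 0]]
    while stack:
        frame = stack[-1]
        name, lines, index = frame
        if index == len(lines):
            stack.pop()  # end of this file: resume the including file
            continue
        frame[2] = index + 1  # advance before possibly pushing a new frame
        line = lines[index]
        match = INCLUDE.match(line)
        if match is None:
            yield name, index + 1, line  # line numbers are 1-based
            continue
        target = match.group(1)
        if target not in sources:
            raise FileNotFoundError(target)
        if any(f[0] == target for f in stack):
            raise RecursionError(f"recursive include of {target}")
        stack.append([target, sources[target].splitlines(), 0])


if __name__ == "__main__":
    demo = {
        "main": "1 + 2\n#include inc\n3 * 4\n",
        "inc": "5 - 6\n",
    }
    for filename, lineno, text in expand(demo, "main"):
        print(f"{filename}:{lineno}: {text}")
```

The directive pattern uses [ \t] rather than \s on purpose: that is the
"whitespace but not newline" restriction discussed above, hard-coded
here instead of derived from an inherited <ws>.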