Re: Parsers with Pre-processors

Luke Palmer Thu, 17 Jul 2003 23:08:52 -0700

> I've been re-reading A5 (regexen), and I was trying to work out how to 
> incorporate a preprocessor into regex, without a separate lexer. I came 
> to the conclusion that preprocessor commands are part of the whitespace 
> in the higher layer of the grammar. So we just need to define the <ws> 
> rule appropriately. Preprocessing also requires maintenance of filename 
> and linenum as we move between files.
> 
> The following code illustrates my thoughts: am I anywhere near close?
> 
> 
> grammar expression is preprocessed
> {
>      rule main { <expression>* <EOF> }
> 
>      rule expression
>      {
>          :w <number> <op> <number> { print eval join $op, @number }


C<join> is probably a method now, so that should be either:

    { print eval @number.join($op) }

Or the equivalent indirect object syntax.  Then again, C<&*join> might
just be a forwarding function of the same nature.

>      }
> }
> 
> grammar preprocessed
> {
>      rule ws
>      {
>          <SUPER.ws> # we are extending the default <ws> rule

That's <SUPER::ws>, as SUPER isn't a parse object.  This shouldn't go
first either: it matches the null string, so the alternation would
succeed on this branch before the others get a chance.

>          # match #include _filename_.
>          | ^^ \# <SUPER.ws>* include <SUPER.ws>* <filename>

<ws> matches optional repeated whitespace, so that * is not needed. 

Also, I think you want this all on the same line, right?  Those
<SUPER::ws>'s match newlines as well.  See below.

>            < <SUPER.ws> - \n >* \n

Hmmm, I don't know whether you can use C<-> like that on rules that
are more than single characters.  I'm not sure how to get around that
one without plunging into the innards of <ws>.   If you could,
however, it would be:

    < <SUPER::ws> - [\n] >

>              # assume .pos acts like a stack: push the
>              # new filehandle onto it: hope the handle
>              # will supply filename and line_num

That'd be nice.

>              { .pos.push open "<$filename" or fail }

Method calls need parens, though.

    { .pos.push(open "<$filename" err fail) }

>          # at end of file, pop the .pos stack, if not empty
>          | <EOF> { .pos.empty and fail } { .pos.pop }
>      }

You probably need to rename this rule C<my_ws is private> or
somesuch, and then make the ws rule:

    rule ws { <my_ws>+ }

Because you want to allow whitespace before and after your directive
with a single <ws> call.
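
To make the intended behaviour concrete, here is a rough sketch of the
same idea outside the rule engine.  It is in Python rather than Perl 6,
since the rule syntax is still unsettled.  It treats an #include line
as ignorable whitespace and keeps a stack of open sources, so every
token carries the file name and line number it came from.  The names
tokens, INCLUDE and the in-memory sources dict are all made up for
illustration; a real version would open files and would still need the
filename check discussed below.

```python
import re

# Hypothetical directive syntax: "#include name" alone on a line.
INCLUDE = re.compile(r'^#[ \t]*include[ \t]+(\S+)[ \t]*$')

def numbered_lines(sources, name):
    """Iterate over (line_number, line) pairs of one in-memory source."""
    return iter(enumerate(sources[name].splitlines(), 1))

def tokens(sources, name):
    """Yield (filename, line_number, token), following #include lines."""
    stack = [(name, numbered_lines(sources, name))]
    while stack:
        filename, lines = stack[-1]
        for lineno, line in lines:
            m = INCLUDE.match(line)
            if m:
                # Push the included source; the outer iterator keeps its
                # place and resumes after the directive once we pop.
                included = m.group(1)
                stack.append((included, numbered_lines(sources, included)))
                break
            for tok in line.split():
                yield filename, lineno, tok
        else:
            # End of this source: pop back to whoever included it.
            stack.pop()

sources = {
    "main": "1 + 2\n#include inc\n5 * 6\n",
    "inc": "3 - 4\n",
}
result = list(tokens(sources, "main"))
```

An unknown name raises KeyError here, which plays the role of the
C<fail> in the original block.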

>      # assume Safe module gives us a filename with no dangerous
>      # meta-chars
>      rule filename { Safe.filename }

Probably

    rule filename { <Safe::filename> }

Or even

    grammar preprocessed is private filename

(If private inheritance is supported).  From a design perspective,
either is valid.

Now about the < <SUPER::ws> - [\n] > thing.  Here's a very slow way of
doing it:

    <SUPER::ws> <( $0[-1] !~ /\n/ )>

It'd be nice to be able to tell a rule to match minimally:

    <SUPER::ws?> \n

But there could be so many different meanings of that for some
particular rule, that it's probably not possible.  The *rules (global)
might include ? variants of themselves, however, so things like this
would be easy.

And of course you could just do:

    \s*? \n

But that's not very good object oriented design (what if someone else
overrode <ws> above you?).
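
For anyone unsure what the minimal match buys you, here is a small
illustration using Python's re module instead of Perl 6 rules.  Its \s
and *? are only assumed to behave like the ones above.  The input has
two newlines, and only the minimal version stops at the first.

```python
import re

text = "  \n\n  x"

# Greedy: \s* swallows all the whitespace, then backs off until a \n follows.
greedy = re.match(r"\s*\n", text)
# Minimal: \s*? grows one character at a time until a \n follows.
minimal = re.match(r"\s*?\n", text)

greedy_end = greedy.end()    # 4: consumed both newlines
minimal_end = minimal.end()  # 3: stopped after the first newline
```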

What about a rule junction?

    <all /<SUPER::ws>/, /\N*/>

(I am entirely unsure of my syntax there).  Presumably, that would
find a place in the input where both of those match from the point
they started, which would do it.  It could be optimized (heh :-) to
make an aggregate rule which checks both sides at each character
match, which would bring the match time out of exponential.

I see use for that construct.
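
A brute-force sketch of what such a junction would have to compute,
again in Python and again only an assumption about the semantics: find
an end position at which every pattern matches the same stretch of
input from the same starting point.  The helper name match_all is
invented for this example, and it makes no attempt at the per-character
optimization mentioned above.

```python
import re

def match_all(patterns, text, start=0):
    """Return the longest end offset such that every pattern matches
    exactly text[start:end], or None if there is no such offset."""
    compiled = [re.compile(p) for p in patterns]
    # Try the longest candidate span first, then shrink.
    for end in range(len(text), start - 1, -1):
        if all(p.fullmatch(text, start, end) for p in compiled):
            return end
    return None

text = "  \t\nfoo"
# Whitespace (stand-in for <SUPER::ws>) that is also free of newlines (\N*).
end = match_all([r"\s*", r"[^\n]*"], text)
```

Here end comes out as 3, the offset just before the newline, which is
the "whitespace up to but not including \n" that the directive rule
wants.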

Luke

> }
> 
> --
> Dave.
