1] Fix #2666 - Make sure regex are parsed in regex context

Brice Figureau Wed, 23 Sep 2009 08:49:47 -0700

On Wed, 2009-09-23 at 07:53 -0700, Markus Roberts wrote:
> 
> 
>         When lexing 4096, we'd reset @no_regex to false, and we'd
>         parse / 23 /
>         as a regex.
>         Essentially no_regex_after has 3 distinct values:
>                * true, means we can't lex a regex until it is said we
>         can
>                * false, we now can lex a regex
>                * nil, leave no_regex in the mode it was
>         
>         I know this is kludgy like the whole patch. I'd favor another
>         approach
>         if I'd have one.
>         
>         > I recognise that the behavior isn't the same and suspect
>         that it has
>         > something to do with
>         >
>         >         ')' => :RPAREN,
>         > -        '=' => :EQUALS,
>         > +        '=' => { :name => :EQUALS, :no_regex_after =>
>         true },
>         >         '+=' => :APPENDS,
>         >
>         > but I'm not clear on the purpose.  We're setting @no_regex
>         to true
>         > when we see an :EQUALS and then leaving it unchanged until
>         we see
>         > a  :CASE, :NODE, etc.  Why not take a more restrictive
>         stance and only
>         > allow regular expressions directly after :CASE, :NODE, etc.?
>         
>         
>         I don't remember exactly, but I wen't the other way because
>         there was a
>         case that wouldn't work easily (I tried at least 3 differents
>         alternatives yesterday in a small amount of time so things are
>         now
>         blurred in my mind).
> 
> I figured as much.  If you remember the case I'd be interested in
> knowing what it is; I may try to doodle something together as that's
> often the best way to see where the cliffs and pitfalls are in a
> problem like this.


I'll give it another round tonight after closing #2672. Maybe that'll be
enough to sparks something in my mind.

Hmm, now that I think about it, I think the issue was with the CASE,
specifically this:

case $variable {
        "something": {
                $othervar = 4096 / 2
        },
        /regex/: {
                notice("this notably sucks")
        }
}

Of course the proposed patch doesn't even fix it :-(

>         > I guess my inclination would have been to (as I noted on the
>         ticket)
>         > make the test a property of the :REGEX token type (using a
>         general
>         > "applicable" test that didn't refer to the REGEX rules), and
>         let it
>         > look at the most recently generated token(s) to make the
>         judgement
>         > rather than scattering the infomation on all the various
>         token types.
>         >  But it may be that I'm missing some subtilty of the syntax
>         that makes
>         > that infeasable.
>         
>         
>         My feeling is that either way will fail. There are too much
>         places where
>         a / can be mis-lexed as a regex.
> 
> So what if DIV is the default?  I was under the impression (perhaps
> mistaken) that there was a relatively small number of places a regular
> expression could occur (directly after NODE, directly after CASE,
> etc.) and everywhere else we could assume it was DIV because REGEXP
> wouldn't be legal.

On the contrary DIV is allowed only in expressions which happens only
after an :EQUALS or any branch after an IF.

You can find regex directly after a NODE, a MATCH, a NOMATCH. It can be
found a possibly large number of tokens (including every other kind of
tokens) after a CASE

The whole issue is that where a regex or DIV is allowed is defined in
the parser. That'd be über-great to keep this as is.

>         The more I think about it, the more I think the parser should
>         parse the
>         regex, and the lexer shouldn't do anything about those.
>         
>         Maybe with an untested grammar like this:
>         
>         regex: DIV regex_content DIV
>         regex_content: regex_content regex_string
>                       | LPAREN regex_content RPAREN
>                       | regex_content STAR
>                       | regex_content QMARK
>                       | LBRACE character_class RBRACE
>                       ...
>         
>         character_class: string
>         
>         Of course the lexer would have to provide some support,
>         especially it
>         should lex spaces between / /...
>         
>         What do you think?
> 
> I thought about this and its gets goofy around the edges.  Take:
> 
>     node /www(\d)\.case\.edu/ {...
> 
> and think about the token stream this would produce. The parser would
> have to de-tokenize CASE, for one thing, and I'm not sure how well the
> lexer would like the rest.

My stupid idea was to jump into a smaller-set regexp only lexer. But
that doesn't really solve the initial issue :-(

What'd be really great is to lex DIV as a REGEX, then the parser will
notice it is an error, enter error recovery (where we'd push back the
tokens to the lexer saying: please no more regex), and then lex again.
I still think it is possible with racc, as in error recovery mode you
get access to the current token stack. It would be enough pop it, and
tell the lexer to rewing the input stream to the previous match, then
exit error recovery.
The only issue is that racc documenation says: "DO NOT MODIFY THE
STACK", and looking at the code, it won't work.

> Interactive (where the parser asks the lexer for either a token or a
> character (with the ability to unget) would be cleaner.

But this feature is not available with racc.

I'm really considering David Schmitt's idea of modifying the regex
syntax to m// instead of //, if we can afford the change of course.
Otherwise, I'm afraid we'll have to deploy a large amount of efforts to
close every hole...
-- 
Brice Figureau
Follow the latest Puppet Community evolutions on www.planetpuppet.org!


--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Puppet Developers" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to 
[email protected]
For more options, visit this group at 
http://groups.google.com/group/puppet-dev?hl=en
-~----------~----~----~----~------~----~------~--~---

[Puppet-dev] Re: [PATCH/puppet 1/1] Fix #2666 - Make sure regex are parsed in regex context

Reply via email to