1] Fix #2666 - Make sure regex are parsed in regex context

Brice Figureau Wed, 23 Sep 2009 00:28:59 -0700

On Tue, 2009-09-22 at 18:07 -0700, Markus Roberts wrote:
> So I like the idea but I'm not thrilled about some of the details (as
> will always be the case with a hack of this sort I suppose).
> 
>         +    attr_accessor :line, :indefine, :no_regex
> 
> I'd call this "regex_permitted" or "may_be_followed_by_regex"; ditto
> with :no_regex_after


Sure, I don't mind.

> 
>                 def add_tokens(hash)
>                     hash.each do |regex, name|
>         -                add_token(name, regex)
>         +                options = {}
>         +                if name.is_a?(Hash)
>         +                    options = name
>         +                    name = options[:name]
>         +                    options.delete(:name)
>         +                end
>         +                add_token(name, regex, options)
>                     end
>                 end
> 
> I'd be inclined to write this:
> 
>         def add_tokens(hash)
>             hash.each do |regex, options|
>                 options = {:name => options} unless options.is_a? Hash
>                 name = options.delete(:name)
>                 add_token(name, regex, options)
>             end
>         end
> 
> but that's a minor stylistic matter.

OK.

> @@ -365,6 +373,12 @@ class Puppet::Parser::Lexer
> 
>                     @commentstack.push(comment)
>                 end
>         
>         +        if token.no_regex_after
>         +            @no_regex = true
>         +        elsif not token.no_regex_after.nil?
>         +            @no_regex = false
>         +        end
>         +
> 
> I'm not understanding why this isn't just
> 
> @no_regex = token.no_regex_after

Because we don't touch no_regex if the token is in no_regex_after dunno
mode.
Otherwise:

$var = 4096 / 23 / 4

When lexing 4096, we'd reset @no_regex to false, and we'd parse / 23 /
as a regex.
Essentially no_regex_after has 3 distinct values:
        * true, means we can't lex a regex until it is said we can
        * false, we now can lex a regex
        * nil, leave no_regex in the mode it was

I know this is kludgy like the whole patch. I'd favor another approach
if I'd have one.

> I recognise that the behavior isn't the same and suspect that it has
> something to do with
> 
>         ')' => :RPAREN,
> -        '=' => :EQUALS,
> +        '=' => { :name => :EQUALS, :no_regex_after => true },
>         '+=' => :APPENDS,
> 
> but I'm not clear on the purpose.  We're setting @no_regex to true
> when we see an :EQUALS and then leaving it unchanged until we see
> a  :CASE, :NODE, etc.  Why not take a more restrictive stance and only
> allow regular expressions directly after :CASE, :NODE, etc.?

I don't remember exactly, but I wen't the other way because there was a
case that wouldn't work easily (I tried at least 3 differents
alternatives yesterday in a small amount of time so things are now
blurred in my mind).

> I guess my inclination would have been to (as I noted on the ticket)
> make the test a property of the :REGEX token type (using a general
> "applicable" test that didn't refer to the REGEX rules), and let it
> look at the most recently generated token(s) to make the judgement
> rather than scattering the infomation on all the various token types.
>  But it may be that I'm missing some subtilty of the syntax that makes
> that infeasable.

My feeling is that either way will fail. There are too much places where
a / can be mis-lexed as a regex.

The more I think about it, the more I think the parser should parse the
regex, and the lexer shouldn't do anything about those.

Maybe with an untested grammar like this:

regex: DIV regex_content DIV
regex_content: regex_content regex_string 
               | LPAREN regex_content RPAREN
               | regex_content STAR
               | regex_content QMARK
               | LBRACE character_class RBRACE
               ...

character_class: string

Of course the lexer would have to provide some support, especially it
should lex spaces between / /...

What do you think?
-- 
Brice Figureau
Follow the latest Puppet Community evolutions on www.planetpuppet.org!


--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Puppet Developers" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to 
[email protected]
For more options, visit this group at 
http://groups.google.com/group/puppet-dev?hl=en
-~----------~----~----~----~------~----~------~--~---

[Puppet-dev] Re: [PATCH/puppet 1/1] Fix #2666 - Make sure regex are parsed in regex context

Reply via email to