On Sep 23, 2009, at 11:53 AM, Brice Figureau wrote:

>
> On 23/09/09 20:42, Luke Kanies wrote:
>> On Sep 23, 2009, at 11:34 AM, Markus Roberts wrote:
>>
>>>>>> Given the ridiculous complexity we're looking at adding to work
>>>>>> around this, making the lexer easier by adding the leading 'm'
>>>>>> or whatever seems like a good step.  A bit ugly lexically, but
>>>>>> way easier to implement and will help us avoid more pain in
>>>>>> the future.
>>>>> Definitely.
>>>>> For which version^H^H^H^H codename do we plan the change?
>>>> Well, it's a bugfix that just happens to horribly break backward
>>>> compatibility.  But I'd say 0.25.1, because we're basically choosing
>>>> between breaking compatibility with code in 0.24 (incorrectly lexing
>>>> division) and 0.25 (changing the regex lexing).
>>>>
>>>> Thoughts?
>>>
>>> Yeah.  I don't like the leading 'm'.
>>>
>>> Further, I think the complexity in the workaround is coming from the
>>> approach (maintaining special-purpose state) rather than from the
>>> problem itself.
>>>
>>> I'd favor (as I mentioned on the ticket) adding a general-purpose
>>> "acceptable?" test to tokens and having the lexer use it along with
>>> the longest-match-first logic.
>>>
>>> Make "acceptable?" default to true, but in the case of REGEX have it
>>> check to see if it's in a value context (that is, that the last
>>> token generated was in the set LBRACE, RBRACE, COMMA, etc.).  As far
>>> as I can see, the last token gives a clean division between where
>>> DIV is acceptable and where REGEX is acceptable so there need only
>>> be one place (REGEX.acceptable?) where we do anything funky--and
>>> that just an [...].include? test.
>>>
>>> I'm working on debugging a 0.24.8/0.25.0 problem at the moment but
>>> I'll try to code this up later today.
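
For concreteness, here's a rough sketch of how I read that proposal
(just a sketch, not anyone's actual patch; the class and token names
are guesses):

    # By default, a token is acceptable in any context.
    class Token
      def acceptable?(context)
        true
      end
    end

    # REGEX is only acceptable where a value can begin, which the
    # last token generated tells us.
    class RegexToken < Token
      VALUE_CONTEXT = [:LBRACE, :RBRACE, :COMMA]   # etc.

      def acceptable?(context)
        VALUE_CONTEXT.include?(context[:after])
      end
    end

And in the lexer's longest-match-first loop, tokens that don't fit
the current context would simply be skipped:

    next unless token.acceptable?(:after => last_token_name)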
>
> OK, thanks.
>
>> Ok, I look forward to seeing your code.  I also am not that fond of
>> the m//, but in the absence of another clean solution...
>
> Started working on this m// thing and found a bug in the lexer: in
> find_regex_token, when we have several tokens matching, only the
> first one wins, which is the opposite of what we want.
>
> Why?
> Because inside the "if match_length > length" branch we scan the
> input string, which advances the scan pointer, so any other matching
> token has trouble matching afterward.
> It doesn't seem to me to be a feature (though I might be wrong).
>
> diff --git a/lib/puppet/parser/lexer.rb b/lib/puppet/parser/lexer.rb
> index 0db6c22..020963d 100644
> --- a/lib/puppet/parser/lexer.rb
> +++ b/lib/puppet/parser/lexer.rb
> @@ -297,12 +300,12 @@ class Puppet::Parser::Lexer
>
>              # We've found a longer match
>              if match_length > length
> -                value = @scanner.scan(token.regex)
> -                length = value.length
> +                length = match_length
>                  matched_token = token
>              end
>          end
>
> +        value = @scanner.scan(matched_token.regex) unless matched_token.nil?
>          return matched_token, value
>      end

You're right - the scanning should be outside the loop.
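
For reference, the whole method would then read roughly like this (a
sketch only -- the hunk doesn't show the loop setup, so the iteration
and variable names here are guesses; note that "length" now has to
come from match_length, since value is no longer assigned inside the
loop):

    def find_regex_token
        @regex += 1
        matched_token = nil
        value = nil
        length = 0

        TOKENS.regex_tokens.each do |token|
            # match? peeks at the match length without advancing
            # the scan pointer.
            next unless match_length = @scanner.match?(token.regex)

            # Remember the longest match, but don't consume it yet.
            if match_length > length
                length = match_length
                matched_token = token
            end
        end

        # Only now consume the input, for the winning token alone.
        value = @scanner.scan(matched_token.regex) unless matched_token.nil?
        return matched_token, value
    end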

-- 
Kai's Example Dilemma:
     A good analogy is like a diagonal frog.
---------------------------------------------------------------------
Luke Kanies | http://reductivelabs.com | http://madstop.com

