On 23/09/09 20:42, Luke Kanies wrote:
> On Sep 23, 2009, at 11:34 AM, Markus Roberts wrote:
> 
>>>>> Given the ridiculous complexity we're looking at adding to work
>>>> around this, making the lexer easier by adding the leading 'm'
>>>> or whatever seems like a good step.  A bit ugly lexically, but
>>>> way easier to implement and will help us avoid more pain in
>>>> the future.
>>> Definitely.
>>> For which version^H^H^H^H codename do we plan the change?
>> Well, it's a bugfix that just happens to horribly break backward
>> compatibility.  But I'd say 0.25.1, because we're basically choosing
>> between breaking compatibility with code in 0.24 (incorrectly lexing
>> division) and 0.25 (changing the regex lexing).
>>
>> Thoughts?
>>
>> Yeah.  I don't like the leading 'm'.
>>
>> Further, I think the complexity in the workaround is coming from the  
>> approach (maintaining special purpose state) rather than the problem.
>>
>> I'd favor (as I mentioned on the ticket) adding a general purpose
>> "acceptable?" test to tokens and having the lexer use that with the
>> longest-match-first logic.
>>
>> Make "acceptable?" default to true, but in the case of REGEX have it  
>> check to see if it's in a value context (that is, that the last  
>> token generated was in the set LBRACE, RBRACE, COMMA, etc.).  As far  
>> as I can see, the last token gives a clean division between where
>> DIV is acceptable and where REGEX is acceptable, so there need only
>> be one place (REGEX.acceptable?) where we do anything funky--and
>> that's just an [...].include? test.
>>
>> I'm working on debugging a 24.8/25.0 problem at the moment but I'll  
>> try to code this up later today.

OK, thanks.
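For reference, here's a minimal Ruby sketch of the "acceptable?" idea
Markus describes above. All the names (Token, RegexToken, VALUE_CONTEXT,
and the token list itself) are illustrative, not the actual Puppet lexer
API:

```ruby
# Illustrative sketch only -- class names and the token list are made
# up, not the real Puppet lexer API.
class Token
  # By default a token is acceptable in any context.
  def acceptable?(context)
    true
  end
end

class RegexToken < Token
  # Last-seen tokens after which a '/' starts a regex rather than a
  # division (partial, hypothetical list).
  VALUE_CONTEXT = [:LBRACE, :RBRACE, :COMMA, :MATCH, :NOMATCH].freeze

  # A regex is only acceptable when the previous token puts us in a
  # "value expected" context.
  def acceptable?(context)
    VALUE_CONTEXT.include?(context[:last_token])
  end
end
```

The lexer would then skip any candidate token whose acceptable? returns
false for the last generated token, before applying the usual
longest-match-first rule.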

> Ok, I look forward to seeing your code.  I also am not that fond of  
> the m//, but in the absence of another clean solution...

Started working on this m// thing and found a bug in the lexer: in 
find_regex_token, when several tokens match, only the first one wins, 
which is the opposite of what we want (we want the longest match).

Why?
Because inside the "if match_length > length" branch we scan the input 
string, which advances the scanner's pointer, so any other matching 
token can no longer match at the original position.
It doesn't seem to me to be a feature (but I might be wrong).
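A quick standalone strscan demonstration of the pointer advance (not
Puppet code, just Ruby's StringScanner from the standard library):

```ruby
require 'strscan'

scanner = StringScanner.new("/foo/ =~ $bar")

# match? reports the match length WITHOUT advancing the pointer...
scanner.match?(%r{/[^/\n]*/})  # => 5, scanner.pos is still 0

# ...but scan consumes the match and advances the pointer...
scanner.scan(%r{/[^/\n]*/})    # => "/foo/", scanner.pos is now 5

# ...so any later candidate regex can no longer match at the start.
scanner.scan(%r{/[^/\n]*/})    # => nil
```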

diff --git a/lib/puppet/parser/lexer.rb b/lib/puppet/parser/lexer.rb
index 0db6c22..020963d 100644
--- a/lib/puppet/parser/lexer.rb
+++ b/lib/puppet/parser/lexer.rb
@@ -297,12 +300,12 @@ class Puppet::Parser::Lexer

              # We've found a longer match
              if match_length > length
-                value = @scanner.scan(token.regex)
-                length = value.length
+                length = match_length
                  matched_token = token
              end
          end

+        value = @scanner.scan(matched_token.regex) unless matched_token.nil?
          return matched_token, value
      end
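For clarity, here's a self-contained sketch of the corrected loop.
It's simplified from lexer.rb: Tok and MiniLexer are illustrative names,
not the real lexer classes, and the token table setup is assumed:

```ruby
require 'strscan'

# Illustrative names only -- not the real Puppet lexer classes.
Tok = Struct.new(:name, :regex)

class MiniLexer
  def initialize(input, tokens)
    @scanner = StringScanner.new(input)
    @regex_tokens = tokens
  end

  def find_regex_token
    matched_token = nil
    length = 0
    @regex_tokens.each do |token|
      next unless (match_length = @scanner.match?(token.regex))
      # match? reports the length without moving the pointer, so every
      # candidate token gets to try against the same position.
      if match_length > length
        length = match_length
        matched_token = token
      end
    end
    # Consume the input exactly once, for the winning (longest) token.
    value = @scanner.scan(matched_token.regex) unless matched_token.nil?
    return matched_token, value
  end
end
```

With this, given tokens for DIV (%r{/}) and REGEX (%r{/[^/\n]*/}) and
input starting with "/foo/", REGEX wins with the full "/foo/" because
both candidates were tried at the same position.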

-- 
Brice Figureau
My Blog: http://www.masterzen.fr/


--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Puppet Developers" group.
-~----------~----~----~----~------~----~------~--~---
