On 23/09/09 21:03, Luke Kanies wrote:
> On Sep 23, 2009, at 11:53 AM, Brice Figureau wrote:
>
>> On 23/09/09 20:42, Luke Kanies wrote:
>>> On Sep 23, 2009, at 11:34 AM, Markus Roberts wrote:
>>>
>>>>>>> Given the ridiculous complexity we're looking at adding to work
>>>>>>> around this, making the lexer easier by adding the leading 'm'
>>>>>>> or whatever seems like a good step. A bit ugly lexically, but
>>>>>>> way easier to implement, and it will help us avoid more pain in
>>>>>>> the future.
>>>>>>
>>>>>> Definitely.
>>>>>> For which version^H^H^H^H codename do we plan the change?
>>>>>
>>>>> Well, it's a bugfix that just happens to horribly break backward
>>>>> compatibility. But I'd say 0.25.1, because we're basically
>>>>> choosing between breaking compatibility with code in 0.24
>>>>> (incorrectly lexing division) and 0.25 (changing the regex
>>>>> lexing).
>>>>>
>>>>> Thoughts?
>>>>
>>>> Yeah. I don't like the leading 'm'.
>>>>
>>>> Further, I think the complexity in the workaround comes from the
>>>> approach (maintaining special-purpose state) rather than from the
>>>> problem itself.
>>>>
>>>> I'd favor (as I mentioned on the ticket) adding a general-purpose
>>>> "acceptable?" test to tokens and having the lexer use it along with
>>>> the longest-match-first logic.
>>>>
>>>> Make "acceptable?" default to true, but in the case of REGEX have
>>>> it check that we're in a value context (that is, that the last
>>>> token generated was in the set LBRACE, RBRACE, COMMA, etc.). As far
>>>> as I can see, the last token gives a clean division between where
>>>> DIV is acceptable and where REGEX is acceptable, so there need only
>>>> be one place (REGEX.acceptable?) where we do anything funky--and
>>>> that's just an [...].include? test.
>>>>
>>>> I'm working on debugging a 24.8/25.0 problem at the moment, but
>>>> I'll try to code this up later today.
>>
>> OK, thanks.
>>
>>> Ok, I look forward to seeing your code. I also am not that fond of
>>> the m//, but in the absence of another clean solution...
>>
>> Started working on this m// thing and found a bug in the lexer: in
>> find_regex_token, when several tokens match, only the first one wins,
>> which is the opposite of what we want.
>>
>> Why?
>> Because inside the "if match_length > length" branch we scan the
>> input string, which advances the scan pointer, so any other matching
>> token will have trouble matching again.
>> It doesn't look like a feature to me (though I might be wrong).
>>
>> diff --git a/lib/puppet/parser/lexer.rb b/lib/puppet/parser/lexer.rb
>> index 0db6c22..020963d 100644
>> --- a/lib/puppet/parser/lexer.rb
>> +++ b/lib/puppet/parser/lexer.rb
>> @@ -297,12 +300,12 @@ class Puppet::Parser::Lexer
>>
>>          # We've found a longer match
>>          if match_length > length
>> -          value = @scanner.scan(token.regex)
>>            length = value.length
>>            matched_token = token
>>          end
>>        end
>>
>> +      value = @scanner.scan(matched_token.regex) unless matched_token.nil?
>>        return matched_token, value
>>      end
>
> You're right - the scanning should be outside the loop.
I'll file a ticket and send a patch tomorrow if someone doesn't beat me
to it. I'm wondering what the side effects of this fix will be :-(
--
Brice Figureau
My Blog: http://www.masterzen.fr/
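
To make Markus's "acceptable?" idea above concrete, here is a minimal
Ruby sketch. The class names and the context hash are assumptions, not
the real Puppet lexer API, and the value-context set is just the
LBRACE/RBRACE/COMMA examples he mentions:

  # Hypothetical token classes; names and structure are illustrative only.
  class Token
    # By default a token is acceptable in any context.
    def acceptable?(context)
      true
    end
  end

  class RegexToken < Token
    # Tokens after which a '/' starts a regex rather than a division.
    # The exact set is an assumption based on the examples above.
    VALUE_CONTEXT = [:LBRACE, :RBRACE, :COMMA].freeze

    def acceptable?(context)
      VALUE_CONTEXT.include?(context[:after])
    end
  end

The longest-match loop would then skip any token whose
acceptable?(:after => last_token_name) returns false, so DIV and REGEX
never compete for the same '/'.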
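
For the find_regex_token fix itself, here is a self-contained sketch of
the loop with the scan hoisted out, along the lines of Brice's patch
above. The names (find_longest_match, Tok) are made up for
illustration; note that once the scan leaves the loop, the
per-candidate length has to come from StringScanner#match?, which
measures a match without advancing the pointer, rather than from value:

  require 'strscan'

  # Illustrative stand-in for the lexer's matching loop, not the real
  # find_regex_token. Tokens only need to respond to #regex here.
  def find_longest_match(scanner, tokens)
    matched_token = nil
    length = 0

    tokens.each do |token|
      # match? returns the match length (or nil) without moving the
      # scan pointer, so every candidate gets a fair look.
      match_length = scanner.match?(token.regex) || 0
      if match_length > length
        length = match_length
        matched_token = token
      end
    end

    # Advance the scanner exactly once, for the winning token only.
    value = scanner.scan(matched_token.regex) unless matched_token.nil?
    return matched_token, value
  end

  # Quick check with two hypothetical tokens: the longer REGEX match
  # wins over DIV even though DIV is tried first.
  Tok = Struct.new(:name, :regex)
  tokens = [Tok.new(:DIV, %r{/}), Tok.new(:REGEX, %r{/[^/\n]*/})]
  scanner = StringScanner.new("/foo/ =~ $bar")
  p find_longest_match(scanner, tokens).first.name   # => :REGEX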
