On Mon, Oct 12, 2009 at 4:19 PM, Joan Pujol <joanpu...@gmail.com> wrote: > Hi, > > I've a problem with the lexer. Given that grammar: > > > TEXT : (~'$' | '$' ~'{')+; > OTHER : .; > program : (TEXT|OTHER)+; > > If I try with input 'Hola ${ ' it does'nt works. And I can't understand > why. > If I look at generated code of rule TEXT it doesn't use a lookahead of > 2. In fact when it finds a '$' it enters in the subrule '$' ~'{ > without using LA(2) to see that the next caracter is a { and that then > it can't enter the subrule. > In fact as it only compares LA(1)='$' it enters the subrule but then > when it founds the{ it fails with a lexical error. > > And I can't understand why the lexer isn't capable to generate code to > compare with LA(1)='$' and LA(2)!='{' to enter the subrule. > > Is that the correct behaivour? Why de lexer can aproximate the K=2? > Someone can explain why? > > A lot of thanks in advance, >
There are far too many threads that refer to the FAQ entry, but not enough actual links (I've always found it hard to find while searching the FAQ, maybe I'm just blind or something). See here: http://www.antlr.org/wiki/display/ANTLR3/Lexer+grammar+for+floating+point,+dot,+range,+time+specs This is a common problem. Terrance gave some discussion in this thread: http://www.antlr.org/pipermail/antlr-interest/2007-October/024273.html It'd be great if the lexer would emit errors/warnings noting that there is input that can end in a no mans land and the lexer will blow up. I don't think it will do that as of 3.1.3 (I haven't played with newer versions). The crux of the problem is (with my limited understanding): Once the lexer/DFA decides: "A longer match *MIGHT* exist", there is no more backing up, or use a rule that matched the longest prefix so far. So if a rule looks like it might match longer input, the rules that match the shorter input are thrown away. A trivial case would be something like this: (From memory, I haven't actually put this into ANTLRWorks, but I know I've seen something similar to this setup have problems where lookahead was the only solution): ONE_BRACE: '{'; THREE_BRACE: '{{{'; If you feed it the input of '{{', it will blow up upon inside the lexer, despite the fact that the would would match ONE_BRACE, ONE_BRACE. Once the lexer/DFA decides, there have been enough braces that it could match a longer rule then ONE_BRACE any more, the lexer will never emit "ONE_BRACE"... So the THREE_BRACE is all that is left, once the "match" is called, it gives some type of RecognitionException. It might work around that if backtracking is enabled, I'm not sure. I know I've had problems like this where enabling backtracking resolved some of them, but not others. It's annoying, but ANTLR is useful enough I just get over it. I'm assuming that the it is because the '$' ~'{' is longer and could match, so the lexer/DFA has decided that the OTHER rule is no longer an option. You might try manually using the lookahead or a predicate like the following: // Syntactic Predicate I believe TEXT : (~'$' | ('$' ~'{') => ('$' ~'{'))+; // or // Semantic Predicate implementing simple look ahead I believe TEXT : (~'$' | { input.LA(1) == '$' && input.LA(2) != '{' } | '$')+; The first one is slower, but I believe more portable, the second one is faster, but the lexer will have to be adjusted for each language target (the first one would work in the Java and C targets, the second one would only work in the Java target as I understand it). Cheers, Kirby > > > > -- > Joan Jesús Pujol Espinar > http://www.joanpujol.cat > http://lujop.deviantart.com > > List: http://www.antlr.org/mailman/listinfo/antlr-interest > Unsubscribe: > http://www.antlr.org/mailman/options/antlr-interest/your-email-address > List: http://www.antlr.org/mailman/listinfo/antlr-interest Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address --~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "il-antlr-interest" group. To post to this group, send email to il-antlr-interest@googlegroups.com To unsubscribe from this group, send email to il-antlr-interest+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/il-antlr-interest?hl=en -~----------~----~----~----~------~----~------~--~---