I mentioned this idea in a prior message, calling it a "zero-width negative look-ahead assertion", mainly because I am unfamiliar with the term "syntactic predicate". I didn't say much about implementation, though.
Two suggestions for this would be to allow ANTLR to:

1) treat:

   kreturn : 'return' ws? ;

   as if it were:

   kreturn : 'r' 'e' 't' 'u' 'r' 'n' ws? ;

2) permit the use of regular expressions instead of simple strings in such instances, which would make that production look like:

   kreturn : 'return\s?' ;

   and, to also deal with the keyword problem:

   kreturn : 'return(?!\w)' ;

   where (?!\w) looks at, but does not consume, the next character on the stream, and the test passes only if that character is a non-word character or there is no next character at all (\w matches a-z, A-Z, 0-9 and _).

I apologize for my ignorance of ANTLR's architecture; I have no idea if or how it would support this.

On a related note, it looks to me as though, when you have something like:

   kreturn : 'return' ws? ;

the lexer is automatically created with a lexical element 'return', whereas my initial expectation was that 'return' would only be tested for existence at that particular point in the parsing process. I think limiting the parser to considering only the rule at hand is crucial to obtaining the flexibility implied by a scannerless parser generator.

Best regards,
Jason Doege

On 4/17/2011 5:35 AM, Peter Kooiman wrote:
> Ter,
>
> First of all, let me explain that the only reason I'm being such a nuisance
> is that I really want this to work! However, I'm afraid that in the end,
> ANTLR falls just short of being a scannerless tool.
>
> The problem lies with distinguishing between keywords, and identifiers that
> happen to start with the same letters as a keyword.
> The sample at http://bit.ly/gT3Q1C cannot distinguish between "returnme;" and
> "return me;", because kreturn is expressed as:
>
> kreturn : 'r' 'e' 't' 'u' 'r' 'n' ws? ;
>
> My first thought was, just make the whitespace not optional. But, in C for
> example, we can have
> return;
> return me;
>
> whereas "returnme;" would be a syntax error.
> Now, making ws not optional is
> no longer possible; what is really needed is a way to express
> "'r' 'e' 't' 'u' 'r' 'n' followed by anything that can NOT be part of an
> identifier". Although you could re-write the return statement rule to
> something awful like
>
> retstat: kreturn ws? colon
>        | kreturn ws id colon
>        ;
>
> the underlying problem remains: there is no way to prevent ANTLR entering
> rule kreturn upon seeing an identifier like "returnme" that happens to start
> with the same letters as keyword "return". In Rats!, you would write
>
> KRETURN = "return" !LetterOrDigit ws? ;
>
> where the "!" operator denotes a syntactic predicate meaning "LetterOrDigit
> must not match, and corresponding input will not be consumed"
>
> Without the ability to express "something followed by anything that is not a
> letter or digit", I don't see how to get it right in ANTLR. I very much hope
> I am wrong though!
>
>
> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> Unsubscribe:
> http://www.antlr.org/mailman/options/antlr-interest/your-email-address

--
You received this message because you are subscribed to the Google Groups "il-antlr-interest" group.
To post to this group, send email to il-antlr-inter...@googlegroups.com.
To unsubscribe from this group, send email to il-antlr-interest+unsubscr...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/il-antlr-interest?hl=en.
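To make the zero-width negative look-ahead idea concrete outside ANTLR, here is a minimal Python sketch. Python is used only for illustration, and nothing here is ANTLR's API. It recognizes the keyword return only when the next character cannot continue a C-style identifier, first with a regular expression that uses a negative look-ahead on \w, then by hand in the spirit of the Rats! rule KRETURN = "return" !LetterOrDigit ws? ;. The names match_kreturn, match_kreturn_manual, IDENT_CHARS and WS_CHARS are hypothetical; identifier characters are assumed to be ASCII letters, digits and underscore, and ws? is assumed to mean an optional run of whitespace.

```python
import re
from typing import Optional

# Assumption: C-style ASCII identifier characters (a-z, A-Z, 0-9 and _).
IDENT_CHARS = frozenset(
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789_"
)
# The same characters re.ASCII treats as \s.
WS_CHARS = frozenset(" \t\n\r\f\v")

# 'return' followed by a zero-width negative look-ahead: the next character
# is inspected but not consumed, and the match fails if it is a word
# character. The look-ahead also succeeds at end of input.
# \s* stands in for ws? (assumed to be an optional run of whitespace).
KRETURN = re.compile(r"return(?!\w)\s*", re.ASCII)


def match_kreturn(text: str, pos: int = 0) -> Optional[int]:
    """Return the index just past the keyword and trailing whitespace,
    or None if 'return' is not a complete keyword at pos."""
    m = KRETURN.match(text, pos)
    return m.end() if m else None


def match_kreturn_manual(text: str, pos: int = 0) -> Optional[int]:
    """Hand-written equivalent, in the spirit of the Rats! rule
    KRETURN = "return" !LetterOrDigit ws? ;"""
    keyword = "return"
    if not text.startswith(keyword, pos):
        return None
    end = pos + len(keyword)
    # Negative syntactic predicate: peek at the next character without
    # consuming it; reject if it could continue an identifier.
    if end < len(text) and text[end] in IDENT_CHARS:
        return None
    # Optional whitespace (ws?): consume any run of whitespace.
    while end < len(text) and text[end] in WS_CHARS:
        end += 1
    return end


if __name__ == "__main__":
    for sample in ["return;", "return me;", "returnme;", "return"]:
        print(repr(sample), match_kreturn(sample), match_kreturn_manual(sample))
```

With either function, "return;" yields 6 and "return me;" yields 7 (the keyword plus the single space), while "returnme;" yields None because the look-ahead sees the 'm' without consuming it. Note the lower-case \w: a negative look-ahead on upper-case \W would invert the test and accept "returnme".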