I mentioned this idea in a prior message, calling it a "zero-width 
negative look-ahead assertion", mainly because I am unfamiliar with the 
term "syntactic predicate". I didn't say much about implementation, though.

Two suggestions for this would be to allow ANTLR to:

1) treat:

kreturn : 'return' ws? ;

as if it were:

kreturn : 'r' 'e' 't' 'u' 'r' 'n' ws? ;

2) permit regular expressions instead of simple strings in such 
instances, which would make that production look like:

kreturn : 'return\s?' ;

and, to deal with the keyword problem as well:

kreturn : 'return(?!\w)' ;

where (?!\w) looks at but does not consume the next character on the 
stream, and the test passes only if that character is not a word 
character ( \w matches a-z, A-Z, 0-9 or _ ) or if the input ends there.
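
To illustrate the look-ahead outside of ANTLR, here is a minimal sketch 
using Python's re module. It shows only the regex behaviour, not how 
ANTLR would or could implement it; the helper name is_return_keyword is 
made up for this example. Note the lower-case \w inside the negative 
look-ahead: the assertion fails when a word character follows 'return'.

```python
import re

# 'return' followed by a zero-width negative look-ahead: the match
# succeeds only if the next character is NOT a word character
# (a-z, A-Z, 0-9 or _). The look-ahead itself consumes nothing.
KRETURN = re.compile(r'return(?!\w)')


def is_return_keyword(text, pos=0):
    """Return True if the keyword 'return' starts at text[pos]."""
    return KRETURN.match(text, pos) is not None


if __name__ == "__main__":
    for sample in ("return;", "return me;", "returnme;"):
        print(repr(sample), is_return_keyword(sample))
```

With these inputs, "return;" and "return me;" are accepted and 
"returnme;" is rejected, which is the distinction the keyword problem 
asks for.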

I apologize for my ignorance of ANTLR architecture. I have no idea if or 
how the architecture of ANTLR would support this.

On a related note, it looks to me as though, when you have something 
like:

kreturn : 'return' ws?;

the lexer is automatically created with a lexical element 'return', 
whereas my initial expectation was that 'return' would only be tested 
for at that particular point in the parsing process. I think limiting 
the parser to considering only the rule at hand is crucial to obtaining 
the flexibility implied by a scanner-less parser generator.
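
To make that expectation concrete, here is a rough hand-written sketch 
in Python of what I mean by testing a literal only at the current point 
in the parse. This is not ANTLR output and says nothing about ANTLR 
internals; match_literal, kreturn and ident are hypothetical names, and 
the identifier rule is deliberately simplified (it does not forbid a 
leading digit).

```python
import string

WORD_CHARS = set(string.ascii_letters + string.digits + "_")


def match_literal(text, pos, literal):
    """Match `literal` at text[pos]; return the new position or None."""
    if text.startswith(literal, pos):
        return pos + len(literal)
    return None


def kreturn(text, pos):
    """kreturn : 'return' <not followed by a word character> ws? ;"""
    end = match_literal(text, pos, "return")
    if end is None:
        return None
    # Zero-width check: peek at the next character without consuming it.
    if end < len(text) and text[end] in WORD_CHARS:
        return None
    # Optional whitespace (zero or more characters here).
    while end < len(text) and text[end] in " \t\r\n":
        end += 1
    return end


def ident(text, pos):
    """Simplified identifier rule: one or more word characters."""
    end = pos
    while end < len(text) and text[end] in WORD_CHARS:
        end += 1
    return end if end > pos else None


if __name__ == "__main__":
    for sample in ("return;", "return me;", "returnme;"):
        print(repr(sample), kreturn(sample, 0), ident(sample, 0))
```

Because 'return' is only ever tested from inside kreturn, there is no 
global token for it: the same characters are free to match as part of 
an identifier wherever the grammar asks for ident instead.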

Best regards,
Jason Doege

On 4/17/2011 5:35 AM, Peter Kooiman wrote:
> Ter,
>
> First of all, let me explain that the only reason I'm being such a nuisance 
> is that I really want this to work! However, I'm afraid that in the end, 
> ANTLR falls just short of being a scannerless tool.
>
> The problem lies with distinguishing between keywords, and identifiers that 
> happen to start with the same letters as a keyword.
> The sample at http://bit.ly/gT3Q1C cannot distinguish between "returnme;" and 
> "return me;", because kreturn is expressed as:
>
> kreturn : 'r' 'e' 't' 'u' 'r' 'n' ws? ;
>
> My first thought was, just make the whitespace not optional. But, in C for 
> example, we can have
> return;
> return me;
>
> whereas "returnme;" would be a syntax error. Now, making ws not optional is 
> no longer possible; what is really needed is a way to express
> "'r' 'e' 't' 'u' 'r' 'n' followed by anything that can NOT be part of an 
> identifier". Although you could re-write the return statement rule to 
> something awful like
>
> retstat: kreturn ws? colon
>           | kreturn ws id colon
>           ;
>
> the underlying problem remains: there is no way to prevent ANTLR entering 
> rule kreturn upon seeing an identifier like "returnme" that happens to start 
> with the same letters as keyword "return". In Rats!, you would write
>
> KRETURN = "return" !LetterOrDigit ws? ;
>
> where the "!" operator denotes a syntactic predicate meaning "LetterOrDigit 
> must not match, and corresponding input will not be consumed"
>
> Without the ability to express "something followed by anything that is not a 
> letter or digit", I don't see how to get it right in ANTLR. I very much hope 
> I am wrong though!
>
>
> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> Unsubscribe: 
> http://www.antlr.org/mailman/options/antlr-interest/your-email-address



