Hi,

On Sat, Nov 5, 2011 at 4:16 PM, Patrick Zimmermann <patr...@zakweb.de> wrote:
> Hi,
>
> thank you a lot.
> Using a lexer rule does in fact solve this problem.
>
> And now I am already on to the next one, stripped down to:
>
> start : ('{' 'ab' '}')* '{a}';
>
> Using the input:
>
> {ab}{a}
>
> AntlrWorks will not list '{ab' on the input stream and thus fails to parse
> the input. I suspect this is another "should be done with the lexer"-thing.

No, the literals in your parser rule already are implicit lexer rules,
although it's better to define them as explicit lexer rules instead of
mixing them into your parser rules:

ABraced : '{a}';
OBrace  : '{';
CBrace  : '}';
AB      : 'ab';
A       : 'a';

When the lexer tries to tokenize the input "{ab", it sees "{a" and expects a
"}" to complete '{a}', but it finds a "b" instead, so an error is emitted.

> I'm currently thinking about whether ANTLR is the right tool for my job:
>
> In many cases my input is context sensitive at the character level. I have
> some areas (the free text area) where '(' and ')' have a specific meaning,
> and others (the note area) where '(' and ')' are simply normal text. The
> same goes for whitespace, which is significant in the text but should be
> ignored in tags and similar constructs.
>
> If I'm not mistaken, the lexer runs completely before the parser and
> constructs tokens. Those tokens are then matched by the parser. So if some
> input could match several tokens (e.g. text not containing parentheses)
> and the lexer picks the "wrong" one, the parser is screwed, right?

Yes, the parser has no control over what tokens the lexer produces.

> I now realize that I am forced to use lexer rules for certain constructs
> (like ..) because I need character ranges to define the characters that
> are allowed (Unicode, only certain languages).
>
> Do you think ANTLR is the right tool for this job and I'm just not seeing
> how to do it, or would I be better off using something else? If so, what?
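To make the "{ab" failure concrete, here is a toy Python model of the grammar above. It is not ANTLR's implementation: it assumes the lexer picks a rule from two characters of lookahead (longest literal wins) and never falls back to a shorter rule once it has chosen one, which mirrors the behavior described for "{ab". `RULES`, `tokenize` and `parse` are hypothetical names. It also runs the lexer over the whole input before the parser sees a single token.

```python
class LexerError(Exception):
    """Raised when the lexer cannot complete the rule it committed to."""


class ParseError(Exception):
    """Raised when the token sequence does not match the parser rule."""


# Literal token rules from the grammar above, encoded as (name, literal).
RULES = [
    ("ABraced", "{a}"),
    ("OBrace", "{"),
    ("CBrace", "}"),
    ("AB", "ab"),
    ("A", "a"),
]


def tokenize(text, k=2):
    """Turn the whole input into tokens before any parsing happens.

    Simplifying assumption: a rule is chosen from at most k characters of
    lookahead (the longest literal wins), and the lexer never falls back to
    a shorter rule if the chosen one fails to match completely.
    """
    tokens, pos = [], 0
    while pos < len(text):
        viable = [(n, lit) for n, lit in RULES if text.startswith(lit[:k], pos)]
        if not viable:
            raise LexerError(f"no rule matches {text[pos]!r} at {pos}")
        name, lit = max(viable, key=lambda rule: len(rule[1]))
        # Committed to `name`: every remaining character must match.
        for i, expected in enumerate(lit):
            if pos + i >= len(text) or text[pos + i] != expected:
                found = text[pos + i] if pos + i < len(text) else "EOF"
                raise LexerError(
                    f"{name}: expected {expected!r} but found {found!r} at {pos + i}"
                )
        tokens.append(name)
        pos += len(lit)
    return tokens


def parse(tokens):
    """start : (OBrace AB CBrace)* ABraced ;

    The parser only ever sees token names; this sketch also requires the
    token list to end right after ABraced.
    """
    i = 0
    while tokens[i:i + 3] == ["OBrace", "AB", "CBrace"]:
        i += 3
    if tokens[i:] != ["ABraced"]:
        raise ParseError(f"unexpected tokens {tokens[i:]}")
    return True


print(tokenize("{a}"))  # ['ABraced']
try:
    parse(tokenize("{ab}{a}"))
except LexerError as err:
    print(err)  # ABraced: expected '}' but found 'b' at 2
```

The parser rule would happily accept OBrace AB CBrace, but it never gets the chance: `tokenize` fails on the "b" before `parse` is called.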

You could let the lexer create simple single-character tokens and write
parser rules that match a sequence of those tokens (like the `ab` rule
below):

start : OBrace ab CBrace OBrace A CBrace EOF ;
ab    : A B ;

OBrace : '{';
CBrace : '}';
A      : 'a';
B      : 'b';

> Thanks so far,
> Patrick

Regards,

Bart.

P.S. Could you please use the list for communication?

List: http://www.antlr.org/mailman/listinfo/antlr-interest
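A minimal Python sketch of the single-character-token approach, again a toy model rather than ANTLR-generated code: `CHAR_TOKENS`, `tokenize` and `Parser` are hypothetical names, and the parser is a hand-written recursive-descent version of the `start` and `ab` rules.

```python
class LexerError(Exception):
    """Raised for a character that has no token rule."""


class ParseError(Exception):
    """Raised when the next token is not the one a rule expects."""


# One token type per character, mirroring the lexer rules above.
CHAR_TOKENS = {"{": "OBrace", "}": "CBrace", "a": "A", "b": "B"}


def tokenize(text):
    """Every token is a single character, so the lexer never has to choose
    between overlapping multi-character rules such as '{a}' and '{'."""
    tokens = []
    for pos, ch in enumerate(text):
        if ch not in CHAR_TOKENS:
            raise LexerError(f"unexpected character {ch!r} at {pos}")
        tokens.append(CHAR_TOKENS[ch])
    tokens.append("EOF")
    return tokens


class Parser:
    """Hand-written recursive-descent parser for the two rules above."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def match(self, ttype):
        # Consume one token of the expected type or fail.
        found = self.tokens[self.pos]
        if found != ttype:
            raise ParseError(f"expected {ttype}, found {found}")
        self.pos += 1

    def start(self):
        # start : OBrace ab CBrace OBrace A CBrace EOF ;
        self.match("OBrace")
        self.ab()
        for ttype in ("CBrace", "OBrace", "A", "CBrace", "EOF"):
            self.match(ttype)
        return True

    def ab(self):
        # ab : A B ;  (the "ab" sequence is recognized here, not in the lexer)
        self.match("A")
        self.match("B")


tokens = tokenize("{ab}{a}")
print(tokens)
# ['OBrace', 'A', 'B', 'CBrace', 'OBrace', 'A', 'CBrace', 'EOF']
print(Parser(tokens).start())  # True
```

Because no two token rules overlap, the lexer has nothing to guess; deciding that "a" followed by "b" forms `ab` now happens in the parser, where the surrounding rule provides the context.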