You can, of course, do

    COMMENT : '\n' 'C' (~'\n')+ ;
    NEWLINE : '\n' ;

(the ordering matters for ANTLR 3's DFA construction), but the approach Brodie suggested is the common idiom, since it costs less in terms of performance and does not depend on the quirks of ANTLR DFA construction. "Start of line" is a semantic notion, whereas '\n' 'C' specifies syntax.

--Loring

----- Original Message ----
> From: Christian Convey <[email protected]>
> To: John B. Brodie <[email protected]>
> Cc: [email protected]
> Sent: Sun, June 6, 2010 12:09:02 PM
> Subject: Re: [antlr-interest] Parsing whole-line comments?
>
> Hi John,
>
> Thanks for the ideas. The "{ $type = ..." approach sounds viable. But it
> still seems like a messier solution than I was hoping for when I decided
> to take ANTLR for a test drive.
>
> Do you know why ANTLR lacks regular expressions that can match the
> beginning-of-line? It seems to me like it would go a long way to making
> line-oriented languages easier to describe. I can't think of any good
> reason for ANTLR to not support this, at least as an option.
>
> - C
>
> On Sun, Jun 6, 2010 at 2:16 PM, John B. Brodie <[email protected]> wrote:
> > Greetings!
> >
> > On Sun, 2010-06-06 at 12:19 -0400, Christian Convey wrote:
> >> > Alternatively, you can apply a semantic predicate to lexer rules
> >> > like this:
> >> > ------------------------
> >> >
> >> > C: { $pos == 0 }?=> 'C' ;
> >> >
> >> > ------------------------
> >> >
> >> > It should only match "C" at the beginning of the line, but I found
> >> > (in my noob experiences) semantic predicates can be pretty tricky
> >> > due to the "hoisting out" business and how it affects prediction
> >> > DFA construction - I'm sure more experienced hands can tell you
> >> > better.
> >>
> >> Thanks. But I'm actually pretty against intermixing lexical,
> >> grammatical, and semantic rules. At that point (at least in my
> >> particular project) I've given up most of the clarity that I was
> >> hoping to gain by using ANTLR as opposed to a hand-written recursive
> >> descent parser.
> >>
> >> I think at this point I'm just going to hand-write the parser for my
> >> DSL. Thanks very much for the help.
> >>
> >
> > you might look at the Python lexer in the examples. It seems to me the
> > Python lexer would have a similar problem to yours --- identifying
> > white space at the beginning of a line --- your case seems a little
> > simpler because you seem to care about just the first letter at the
> > beginning of the line.
> >
> > also perhaps realizing that the first character of a line must be
> > preceded by a new-line character (except the very first line).
> >
> > so:
> >
> > tokens { C; E; }
> >
> > ......
> >
> > NEWLINE : ( '\r' | '\n' )+ // for the last line....
> >           ( 'C' { $type = C; }
> >           | 'E' { $type = E; }
> >           //..... other first-char possibilities go here
> >           )
> >         ;
> >
> > CALL : 'CALL' ;
> > ID : ('a'..'z'|'A'..'Z')+ // or whatever
> >
> > and of course create a wrapper around the input stream in order to
> > supply a new-line as the very first character and then the actual
> > input text as the rest of the stream. (in effect append a new-line to
> > the front of the input)
> >
> > just a thought.....
> > -jbb
> >

List: http://www.antlr.org/mailman/listinfo/antlr-interest
Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address
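For readers who want to see the "a line start is just a character preceded by a new-line" idea in isolation, here is a minimal sketch in plain Python. It does not use ANTLR or any of its APIs. The `tokenize` function, the `TOKEN_RE` pattern and the `ID`/`WS` token names are assumptions made up for this illustration; only the shape of the COMMENT and NEWLINE rules and the trick of putting a new-line in front of the input come from the thread.

```python
import re

# Hypothetical token set for illustration. The first two alternatives loosely
# mirror the rules discussed in the thread:
#   COMMENT : '\n' 'C' (~'\n')+ ;
#   NEWLINE : '\n' ;
# COMMENT is listed first so it is tried before NEWLINE at each position.
TOKEN_RE = re.compile(
    r"(?P<COMMENT>\nC[^\n]+)"
    r"|(?P<NEWLINE>\n)"
    r"|(?P<ID>[A-Za-z]+)"
    r"|(?P<WS>[ \t]+)"
)


def tokenize(text):
    """Return a list of (token_type, token_text) pairs, dropping whitespace."""
    # Put a new-line in front of the input so that the first line is also
    # "preceded by '\n'", as suggested in the thread.
    source = "\n" + text
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise ValueError(
                f"unexpected character {source[pos]!r} at offset {pos}"
            )
        if match.lastgroup != "WS":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

With this sketch, a 'C' in the middle of a line comes out as an ordinary ID, while a 'C' directly after a new-line starts a COMMENT. Note that, exactly like the rule it mirrors, any line whose first character is 'C' becomes a comment, so a line beginning with "CALL" in column one would be swallowed as well.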
