Mark van Assem wrote:
> Hello Antlers,
> 
> I'm designing a lexer/parser for units of measure (e.g. meters, 
> seconds). In that process I'm trying to match symbols like Ω (Ohm) and å 
> (angstrom).

The Ångstrom symbol is capital-A-ring (\u00C5 or \u212B), by the way.
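If your input might contain either code point, one option is to normalize the text before it reaches the lexer. A minimal sketch using only the JDK follows; the class and method names are made up for illustration. Under NFC, the Ångstrom sign \u212B maps to \u00C5, and likewise the Ohm sign \u2126 maps to Greek capital omega \u03A9.

```java
import java.text.Normalizer;

// Hypothetical helper, not part of ANTLR: folds compatibility-style unit
// signs onto their canonical letters so the grammar needs one form only.
class UnitSymbolNormalizer {

    // Apply Unicode Normalization Form C to the whole input.
    static String canonical(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        // ANGSTROM SIGN becomes LATIN CAPITAL LETTER A WITH RING ABOVE
        System.out.println(canonical("\u212B").equals("\u00C5"));
        // OHM SIGN becomes GREEK CAPITAL LETTER OMEGA
        System.out.println(canonical("\u2126").equals("\u03A9"));
    }
}
```

With that in place the grammar only has to match \u00C5 and \u03A9.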

> Below is the relevant part of the grammar -  the part that treats 
> symbols. The grammar checks out OK in ANTLRWorks, but I get a 
> EarlyExitException when I run it on a file that contains two lines with 
> on the first the Ohm sign and on the second the angstrom sign. The 
> behaviour is different in the interpreter: there the first line is 
> parsed OK, but for the second line a NoViableAltException is given.

The grammar includes alpha, not the Ångstrom symbol, so that explains
the interpreter behaviour. The behaviour when run on a file is likely
to be a character encoding issue; make sure that the charset parameter
to ANTLRInputStream matches the encoding of your file (probably UTF-8).
Also, either make sure that the file does not contain an initial BOM
(Byte Order Mark, \uFEFF), or match that character in your grammar.
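
A rough sketch of the first option, assuming the file really is UTF-8: decode it explicitly and drop a leading BOM (U+FEFF) before handing the text to the lexer. The class and method names here are invented for the example, and it uses only the JDK; the resulting String could then be wrapped in something like ANTLRStringStream.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of ANTLR.
class BomAwareReader {
    static final char BOM = '\uFEFF';

    // Remove a single leading BOM, if there is one.
    static String stripBom(String text) {
        if (!text.isEmpty() && text.charAt(0) == BOM) {
            return text.substring(1);
        }
        return text;
    }

    // Decode bytes as UTF-8 (an assumption about the file) and strip the BOM.
    static String decodeUtf8(byte[] bytes) {
        return stripBom(new String(bytes, StandardCharsets.UTF_8));
    }

    // Read a whole file with an explicit charset instead of the platform default.
    static String readUtf8(Path path) throws IOException {
        return decodeUtf8(Files.readAllBytes(path));
    }

    public static void main(String[] args) throws IOException {
        // Write a small test file: BOM, Ohm (as omega) line, A-ring line.
        Path tmp = Files.createTempFile("units", ".txt");
        try {
            Files.write(tmp, "\uFEFF\u03A9\n\u00C5\n".getBytes(StandardCharsets.UTF_8));
            String text = readUtf8(tmp);
            // The first character is now the omega rather than the BOM.
            System.out.println(Integer.toHexString(text.charAt(0)));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

If you would rather keep using ANTLRInputStream directly, pass "UTF-8" as its encoding argument and handle the BOM in the grammar instead.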

-- 
David-Sarah Hopwood  ⚥  http://davidsarah.livejournal.com