Mark van Assem wrote:
> Hello Antlers,
>
> I'm designing a lexer/parser for units of measure (e.g. meters,
> seconds). In that process I'm trying to match symbols like Ω (Ohm) and å
> (angstrom).
The Ångstrom symbol is capital-A-ring (\u00C5 or \u212B), by the way.

> Below is the relevant part of the grammar - the part that treats
> symbols. The grammar checks out OK in ANTLRWorks, but I get a
> EarlyExitException when I run it on a file that contains two lines with
> on the first the Ohm sign and on the second the angstrom sign. The
> behaviour is different in the interpreter: there the first line is
> parsed OK, but for the second line a NoViableAltException is given.

The grammar includes alpha, not the Ångstrom symbol, so that explains the
interpreter behaviour.

The behaviour when run on a file is probably a character encoding issue:
make sure that the charset parameter to ANTLRInputStream matches the
encoding of your file (probably UTF-8). Also, either make sure that the
file does not start with a BOM (Byte Order Mark, \uFEFF, which UTF-8
encodes as the bytes EF BB BF), or match that character in your grammar.

--
David-Sarah Hopwood ⚥ http://davidsarah.livejournal.com
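For reference, here is a minimal plain-Java sketch of the encoding advice above. It does not use the ANTLR runtime; it only shows what the lexer would receive. It assumes the input is UTF-8, decodes it with an explicit charset, strips a leading U+FEFF, and prints each code point so that a wrong decoding is easy to spot. With ANTLR 3 you would either pass "UTF-8" as the encoding argument to ANTLRInputStream, or hand the already-decoded string to ANTLRStringStream. The class and method names (UnitSymbolInput, decode, codePoints, readUnits) are hypothetical. The NFC step is an extra option not suggested in the post: it folds the ANGSTROM SIGN (U+212B) into U+00C5 and the OHM SIGN (U+2126) into U+03A9, so a lexer rule only has to match one form of each.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.text.Normalizer;

public class UnitSymbolInput {

    /** U+FEFF, the byte order mark that some editors prepend to UTF-8 files. */
    static final char BOM = '\uFEFF';

    /** Decode raw bytes with an explicit charset and drop a leading BOM, if any. */
    static String decode(byte[] bytes, Charset charset) {
        String text = new String(bytes, charset);
        if (!text.isEmpty() && text.charAt(0) == BOM) {
            text = text.substring(1);
        }
        return text;
    }

    /** Hypothetical helper: read a units file, assuming it is UTF-8. */
    static String readUnits(String fileName) throws IOException {
        return decode(Files.readAllBytes(Paths.get(fileName)), StandardCharsets.UTF_8);
    }

    /** Render each code point as U+XXXX so encoding problems become visible. */
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) throws IOException {
        // Assumed sample input: UTF-8 with a BOM, Ohm sign on line 1, Angstrom sign on line 2.
        byte[] bytes = "\uFEFF\u2126\n\u212B\n".getBytes(StandardCharsets.UTF_8);

        // Decoding with the matching charset: BOM removed, one symbol per line.
        String right = decode(bytes, StandardCharsets.UTF_8);
        for (String line : right.split("\n")) {
            String nfc = Normalizer.normalize(line, Normalizer.Form.NFC);
            System.out.println("UTF-8: " + codePoints(line) + "  NFC: " + codePoints(nfc));
        }

        // Decoding with the wrong charset: the BOM survives as three junk characters.
        String wrong = decode(bytes, StandardCharsets.ISO_8859_1);
        System.out.println("ISO-8859-1: " + codePoints(wrong.split("\n")[0]));
    }
}
```

With the sample bytes, the UTF-8 lines come out as U+2126 and U+212B (NFC: U+03A9 and U+00C5), while the ISO-8859-1 line starts with U+00EF U+00BB U+00BF, which is the undecoded BOM a lexer would choke on.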
List: http://www.antlr.org/mailman/listinfo/antlr-interest
