I've found ANTLR very useful as a language parser for my application,
but I now have a requirement to use UNICODE files as input.  I'm using
the C runtime since my application is written in C. I hope someone can
help me with a couple of questions.

A UNICODE (UTF-16) file normally begins with a two-byte BOM (byte order
mark). My application will run on multiple platforms (Java wasn't an
option), and the input files could come from machines with different
architectures, so I will need to interpret the BOM myself, since I don't
think ANTLR does this. Is that correct? My plan is to write a function
that reads the BOM and always converts the input to one particular byte
order. I think that is a correct approach, unless there is something in
the ANTLR C runtime that can help.
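
To make the BOM handling concrete, here is a minimal sketch of what I
have in mind. It assumes the input is UTF-16 and uses no ANTLR calls.
The names detect_utf16_bom and utf16_bytes_to_host are hypothetical, and
defaulting to big-endian when no BOM is present is only an assumption
made for this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte order reported by the BOM, if any. */
typedef enum { BOM_NONE, BOM_UTF16_LE, BOM_UTF16_BE } bom_kind;

/* Inspect the first two bytes of the buffer for a UTF-16 BOM (U+FEFF).
 * Hypothetical helper; only UTF-16 is considered here. */
static bom_kind detect_utf16_bom(const unsigned char *buf, size_t len)
{
    if (len >= 2) {
        if (buf[0] == 0xFF && buf[1] == 0xFE) return BOM_UTF16_LE;
        if (buf[0] == 0xFE && buf[1] == 0xFF) return BOM_UTF16_BE;
    }
    return BOM_NONE;
}

/* Assemble raw bytes into 16-bit code units in host order, skipping the
 * BOM. Building each unit from individual bytes avoids any dependence
 * on the endianness of the machine running the code.
 * Assumption: big-endian input when no BOM is present.
 * Returns the number of code units written to out. */
static size_t utf16_bytes_to_host(const unsigned char *buf, size_t len,
                                  uint16_t *out, size_t out_cap)
{
    bom_kind kind = detect_utf16_bom(buf, len);
    size_t i = (kind == BOM_NONE) ? 0 : 2;
    size_t n = 0;

    for (; i + 1 < len && n < out_cap; i += 2) {
        if (kind == BOM_UTF16_LE)
            out[n++] = (uint16_t)(buf[i] | (buf[i + 1] << 8));
        else
            out[n++] = (uint16_t)((buf[i] << 8) | buf[i + 1]);
    }
    return n;
}
```

The idea is that the file is read into memory as raw bytes, passed
through utf16_bytes_to_host once, and the resulting buffer is what gets
handed to the parser's input stream.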

I've read about how I need to convert a UNICODE file to UTF-32 and use
the UCS2 input functions, but I've had little to no success in doing so.
I get lots of errors or things just don't parse. Does anyone have sample
C code that accomplishes this? Or even just the functions that I should
use and the order in which to call them?
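
For reference, this is roughly what I understand the UTF-16 to UTF-32
conversion step to involve. It is only a sketch in plain C with no ANTLR
calls. The function name utf16_to_utf32 is hypothetical, it assumes the
16-bit code units are already in host byte order, and replacing unpaired
surrogates with U+FFFD is my own choice rather than anything ANTLR
requires.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode host-order UTF-16 code units into UTF-32 code points.
 * A high surrogate (D800..DBFF) followed by a low surrogate
 * (DC00..DFFF) combines into one code point above U+FFFF.
 * Unpaired surrogates are replaced with U+FFFD (sketch assumption).
 * Returns the number of code points written to out. */
static size_t utf16_to_utf32(const uint16_t *in, size_t in_len,
                             uint32_t *out, size_t out_cap)
{
    size_t i = 0, n = 0;

    while (i < in_len && n < out_cap) {
        uint32_t u = in[i++];

        if (u >= 0xD800 && u <= 0xDBFF) {
            if (i < in_len && in[i] >= 0xDC00 && in[i] <= 0xDFFF) {
                uint32_t lo = in[i++];
                u = 0x10000 + ((u - 0xD800) << 10) + (lo - 0xDC00);
            } else {
                u = 0xFFFD; /* high surrogate with no partner */
            }
        } else if (u >= 0xDC00 && u <= 0xDFFF) {
            u = 0xFFFD;     /* stray low surrogate */
        }
        out[n++] = u;
    }
    return n;
}
```

What I still don't know is which ANTLR C runtime input function should
receive the converted buffer, and with what size and encoding arguments.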

TIA
_______________________________________________
antlr-dev mailing list
antlr-dev@antlr.org
http://www.antlr.org/mailman/listinfo/antlr-dev
