[antlr-dev] UNICODE file input for C Runtime

Goins, John C (IS) Wed, 06 Jan 2010 15:37:50 -0800

I've found ANTLR very useful as a language parser for my application,
but I now need to accept UNICODE files as input. I'm using the C
runtime because my application is written in C. I hope someone can help
me with a couple of questions.


A UNICODE (UTF-16) file normally begins with a two-byte BOM (byte order
mark). My application will run on multiple platforms (Java wasn't an
option), and the input files could come from machines with different
architectures, so I will need to interpret the BOM. I don't think ANTLR
uses it; is that correct? My plan is to read the BOM myself and write a
function that always converts the input to one particular byte order. I
think that is the right approach, unless there is something in the ANTLR
C Runtime that can help.
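For what it's worth, the BOM check described above only needs a few byte comparisons. The sketch below is plain C with no ANTLR calls; `detect_bom` and `bom_kind` are hypothetical names of my own. It recognises the standard UTF-8, UTF-16 and UTF-32 signatures and reports how many bytes to skip. UTF-32LE has to be tested before UTF-16LE because `FF FE` is a prefix of `FF FE 00 00`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type: which byte order mark (if any) starts a buffer. */
typedef enum {
    BOM_NONE,     /* no recognised BOM; caller must assume an encoding */
    BOM_UTF8,     /* EF BB BF    */
    BOM_UTF16LE,  /* FF FE       */
    BOM_UTF16BE,  /* FE FF       */
    BOM_UTF32LE,  /* FF FE 00 00 */
    BOM_UTF32BE   /* 00 00 FE FF */
} bom_kind;

/* Hypothetical helper: classify the BOM at the start of buf and store its
 * length in *bom_len (0 when there is no BOM). Longer signatures are checked
 * first so that UTF-32LE is not mistaken for UTF-16LE. */
bom_kind detect_bom(const unsigned char *buf, size_t len, size_t *bom_len)
{
    assert(bom_len != NULL);
    assert(buf != NULL || len == 0);

    if (len >= 4 && buf[0] == 0xFF && buf[1] == 0xFE && buf[2] == 0x00 && buf[3] == 0x00) {
        *bom_len = 4;
        return BOM_UTF32LE;
    }
    if (len >= 4 && buf[0] == 0x00 && buf[1] == 0x00 && buf[2] == 0xFE && buf[3] == 0xFF) {
        *bom_len = 4;
        return BOM_UTF32BE;
    }
    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF) {
        *bom_len = 3;
        return BOM_UTF8;
    }
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE) {
        *bom_len = 2;
        return BOM_UTF16LE;
    }
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF) {
        *bom_len = 2;
        return BOM_UTF16BE;
    }
    *bom_len = 0;
    return BOM_NONE;
}
```

One caveat with this ordering: a UTF-16LE file whose first character is U+0000 would be reported as UTF-32LE, which is rarely a problem for source text. Files with no BOM at all need an agreed default encoding.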

I've read that I need to convert a UNICODE file to UTF-32 and use the
UCS2 input functions, but I've had little to no success doing so: I get
lots of errors, or the input just doesn't parse. Does anyone have sample
C code that accomplishes this, or even a list of the functions I should
use and the order in which to call them?
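The conversion step before handing data to a UCS-2 input stream can be sketched like this, assuming the BOM has already been stripped and the file's byte order is known. `utf16_to_native` and `has_surrogates` are hypothetical helpers, not ANTLR functions. Each 16-bit unit is assembled with shifts from its high and low bytes. This produces the correct value on both big- and little-endian hosts, so no `#ifdef` on host byte order is needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: convert raw UTF-16 bytes (BOM already removed) into
 * a malloc'd array of 16-bit code units in the host's native byte order.
 * Returns NULL on an odd byte count or allocation failure; on success the
 * number of units is stored in *out_units and the caller must free() it. */
uint16_t *utf16_to_native(const unsigned char *bytes, size_t nbytes,
                          int big_endian, size_t *out_units)
{
    assert(out_units != NULL);
    *out_units = 0;
    if (nbytes % 2 != 0)
        return NULL;                 /* truncated input: not valid UTF-16 */

    size_t n = nbytes / 2;
    /* Allocate at least one unit so an empty file is not mistaken for failure. */
    uint16_t *units = malloc((n > 0 ? n : 1) * sizeof *units);
    if (units == NULL)
        return NULL;

    for (size_t i = 0; i < n; i++) {
        unsigned hi = big_endian ? bytes[2 * i] : bytes[2 * i + 1];
        unsigned lo = big_endian ? bytes[2 * i + 1] : bytes[2 * i];
        units[i] = (uint16_t)((hi << 8) | lo);   /* endian-independent */
    }
    *out_units = n;
    return units;
}

/* Hypothetical helper: report whether any unit is a UTF-16 surrogate
 * (0xD800-0xDFFF). A strict UCS-2 consumer sees such a character as two
 * separate code units rather than one code point. */
int has_surrogates(const uint16_t *units, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (units[i] >= 0xD800 && units[i] <= 0xDFFF)
            return 1;
    return 0;
}
```

The resulting buffer and unit count would then go to whichever UCS-2 string-stream constructor your ANTLR C runtime version provides. The exact name and signature vary between releases, so check `antlr3input.h` rather than relying on this sketch. If `has_surrogates` returns true, the input contains characters outside the Basic Multilingual Plane, which a UCS-2 stream will not treat as single characters.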

TIA

_______________________________________________
antlr-dev mailing list
antlr-dev@antlr.org
http://www.antlr.org/mailman/listinfo/antlr-dev
