I've found ANTLR very useful as a language parser for my application, but I now have a requirement to use UNICODE files as input. I'm using the C runtime since my application is written in C. I hope someone can help me with a couple of questions.
A UNICODE (UTF-16) file begins with a two-byte BOM (byte order mark). My application will run on multiple platforms (Java wasn't an option), and the input files could come from machines with different architectures. I don't think ANTLR interprets the BOM itself. Is that correct?

If so, I can write a function that reads the BOM myself and always converts the input to one particular byte order. I think that is a correct approach, unless there is something in the ANTLR C runtime that can help.

I've read that I need to convert a UNICODE file to UTF-32 and use the UCS2 input functions, but I've had little to no success in doing so. I get lots of errors, or the input just doesn't parse. Does anyone have sample C code that accomplishes this? Or even just the functions I should use and the order in which to call them?

TIA
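In case it helps frame the question, here is roughly the BOM handling I have in mind. It is only a minimal sketch: it assumes the input is UTF-16, the names (`bom_kind`, `detect_utf16_bom`, `utf16_bytes_to_host`) are my own hypothetical helpers and not part of the ANTLR C runtime, and falling back to big-endian when no BOM is present is an assumption too.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers; nothing here comes from the ANTLR C runtime. */
typedef enum { BOM_NONE, BOM_UTF16_LE, BOM_UTF16_BE } bom_kind;

/* Inspect the first two bytes of the buffer for a UTF-16 byte order mark. */
static bom_kind detect_utf16_bom(const unsigned char *buf, size_t len)
{
    if (len >= 2) {
        if (buf[0] == 0xFF && buf[1] == 0xFE) return BOM_UTF16_LE;
        if (buf[0] == 0xFE && buf[1] == 0xFF) return BOM_UTF16_BE;
    }
    return BOM_NONE;
}

/*
 * Decode raw UTF-16 bytes into 16-bit code units in host byte order,
 * skipping the BOM if there is one. `out` must have room for len / 2
 * code units. Returns the number of code units written.
 * Assumption: input without a BOM is treated as big-endian.
 */
static size_t utf16_bytes_to_host(const unsigned char *buf, size_t len,
                                  uint16_t *out)
{
    bom_kind kind = detect_utf16_bom(buf, len);
    size_t i = (kind == BOM_NONE) ? 0 : 2;
    size_t n = 0;

    /* Build each value arithmetically so the result does not depend on
       the endianness of the machine running this code. */
    for (; i + 1 < len; i += 2) {
        if (kind == BOM_UTF16_LE)
            out[n++] = (uint16_t)(buf[i] | (buf[i + 1] << 8));
        else
            out[n++] = (uint16_t)((buf[i] << 8) | buf[i + 1]);
    }
    return n;
}
```

The idea would be to read the whole file into a byte buffer, run it through `utf16_bytes_to_host`, and hand the resulting buffer to whichever ANTLR input stream function is appropriate. That last step is the part I'm unsure about.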
_______________________________________________
antlr-dev mailing list
antlr-dev@antlr.org
http://www.antlr.org/mailman/listinfo/antlr-dev