On Tue, 20 Oct 2015 20:23:02 +0100 "John Dougrez-Lewis" <jle...@lightblue.com> wrote:
> Hi,

Hi, are you by any chance the Raving Loony I once knew at Cambridge?

> I need to be able to service and respond to requests as follows:

Basically there are three parts to working with character encodings:
 * Detecting them in incoming data.
 * Converting them to order.
 * Correctly labelling outgoing data.

mod_xml2enc will do all that for libxml2-based filters, and could
easily be tweaked to drop the libxml2-specific optimisations for
general-purpose use. Alternatively the charset-detection from
mod_xml2enc could probably be folded into mod_charset_lite.

> The input and output buffers appears to be 8-bit char* based but I can't see
> any references to specific encodings.
>
> How do I go about massaging the input & output into UTF-8 and fixed width
> 16-bit Unicode?
>
> Are there any good references on how to achieve this?

It's a bit of a mess, because there are several different standards
(HTTP, XML and HTML), and in real life those are sometimes in conflict.
The detection in mod_xml2enc has been fine-tuned over the years and
test-driven on a wide range of scripts, including non-Latin charsets
such as Russian/Cyrillic and Arabic.

-- 
Nick Kew
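
For illustration only, the three parts listed above (detect, convert, label) can be sketched in a few lines of Python. This is not how mod_xml2enc is implemented, and it covers far fewer cases: it only checks for a byte-order mark and then the charset parameter of a Content-Type value. The function names (`detect_charset`, `transcode`, `label`) and the ISO-8859-1 fallback are assumptions made for this sketch.

```python
import codecs

# Byte-order marks, with UTF-32 listed before UTF-16 because the
# UTF-32-LE BOM begins with the same two bytes as the UTF-16-LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_charset(body, content_type="", default="iso-8859-1"):
    """Step 1: guess the charset of incoming bytes.

    Returns (charset, bom_length). The fallback default is an
    assumption of this sketch, not a recommendation.
    """
    for bom, name in _BOMS:
        if body.startswith(bom):
            return name, len(bom)
    # No BOM: look for a charset parameter in the Content-Type value.
    for param in content_type.split(";")[1:]:
        key, _, value = param.partition("=")
        if key.strip().lower() == "charset" and value.strip():
            return value.strip().strip('"').lower(), 0
    return default, 0


def transcode(body, content_type=""):
    """Step 2: convert incoming bytes to UTF-8 and to UTF-16-LE.

    Note that UTF-16 is only fixed width for characters in the Basic
    Multilingual Plane; anything beyond it needs a surrogate pair.
    """
    charset, skip = detect_charset(body, content_type)
    text = body[skip:].decode(charset)
    return text.encode("utf-8"), text.encode("utf-16-le")


def label(media_type, charset):
    """Step 3: build a Content-Type value that names the output charset."""
    return f"{media_type}; charset={charset}"
```

A caller would pass the raw request body and its Content-Type value to `transcode`, then send the result out with `label("text/html", "utf-8")` as the Content-Type. Real-world input also declares its encoding inside the document (an XML declaration or an HTML META element), which this sketch ignores.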