Re: Apache Module Development Query on character encodings.

Nick Kew Tue, 20 Oct 2015 15:38:23 -0700

On Tue, 20 Oct 2015 20:23:02 +0100
"John Dougrez-Lewis" <[email protected]> wrote:


> Hi,

Hi, are you by any chance the Raving Loony I once knew at Cambridge?

> I need to be able to service and respond to requests as follows:

Basically there are three parts to working with character encodings:
 * Detecting them in incoming data.
 * Converting them as required.
 * Correctly labelling outgoing data.

mod_xml2enc will do all that for libxml2-based filters,
and could easily be tweaked to drop the libxml2-specific
optimisations for general-purpose use.  Alternatively
the charset-detection from mod_xml2enc could probably
be folded into mod_charset_lite.
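
To give a flavour of the detection step only, here is a minimal sketch
in plain C that looks for a Unicode byte-order mark at the start of a
buffer.  It is not the algorithm mod_xml2enc uses, which also has to
weigh HTTP headers, XML declarations and HTML META elements; the
function name sniff_bom is hypothetical and nothing here comes from
the httpd API.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of httpd or mod_xml2enc).
 * Returns the charset name implied by a leading byte-order mark and
 * stores the BOM length in *bom_len, or returns NULL (and sets
 * *bom_len to 0) when the buffer does not start with a known BOM.
 */
static const char *sniff_bom(const unsigned char *buf, size_t len,
                             size_t *bom_len)
{
    /* The 4-byte signatures come first: the UTF-32LE BOM starts with
     * the same two bytes as the UTF-16LE BOM.  A UTF-16LE stream whose
     * first character is U+0000 would therefore be misread; that
     * ambiguity is inherent in BOM sniffing. */
    static const struct {
        const char *name;
        unsigned char sig[4];
        size_t sig_len;
    } boms[] = {
        { "UTF-32BE", { 0x00, 0x00, 0xFE, 0xFF }, 4 },
        { "UTF-32LE", { 0xFF, 0xFE, 0x00, 0x00 }, 4 },
        { "UTF-8",    { 0xEF, 0xBB, 0xBF, 0x00 }, 3 },
        { "UTF-16BE", { 0xFE, 0xFF, 0x00, 0x00 }, 2 },
        { "UTF-16LE", { 0xFF, 0xFE, 0x00, 0x00 }, 2 },
    };
    size_t i;

    for (i = 0; i < sizeof(boms) / sizeof(boms[0]); i++) {
        if (len >= boms[i].sig_len
            && memcmp(buf, boms[i].sig, boms[i].sig_len) == 0) {
            *bom_len = boms[i].sig_len;
            return boms[i].name;
        }
    }
    *bom_len = 0;
    return NULL;
}
```

A NULL result only means "no BOM"; a real filter would then fall back
to whatever the request headers or the document itself declare.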

> The input and output buffers appears to be 8-bit char* based but I can't see
> any references to specific encodings.
>
> How do I go about massaging the input & output into UTF-8 and fixed width
> 16-bit Unicode?
>
> Are there any good references on how to achieve this?

It's a bit of a mess, because there are several different
standards (HTTP, XML and HTML), and in real life those are
sometimes in conflict.  The detection in mod_xml2enc has
been fine-tuned over the years and test-driven on a wide
range of scripts, including non-Latin charsets such
as Russian/Cyrillic and Arabic.
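
For the conversion step itself, once the source charset is known, a
one-shot conversion can be sketched with POSIX iconv as below.  This
is an illustration under assumptions, not what mod_xml2enc or
mod_charset_lite do internally: the helper name convert_charset is
hypothetical, the whole input is assumed to fit in one buffer, and
the charset names accepted depend on the local iconv implementation.
Inside an httpd module the apr_xlate functions (apr_xlate_open,
apr_xlate_conv_buffer) would probably be the more natural choice.
Note also that UTF-16 is only fixed-width for characters in the Basic
Multilingual Plane; anything beyond it needs a surrogate pair.

```c
#include <iconv.h>
#include <stddef.h>

/*
 * Hypothetical helper: convert in[0..in_len) from charset `from` to
 * charset `to`, writing at most out_cap bytes to out.
 * Returns the number of bytes written, or (size_t)-1 on any error
 * (unknown charset, invalid or truncated input, output buffer full).
 */
static size_t convert_charset(const char *from, const char *to,
                              const char *in, size_t in_len,
                              char *out, size_t out_cap)
{
    iconv_t cd = iconv_open(to, from);   /* note: target comes first */
    char *inp = (char *)in;              /* iconv() takes char **    */
    char *outp = out;
    size_t in_left = in_len;
    size_t out_left = out_cap;
    size_t rc;

    if (cd == (iconv_t)-1)
        return (size_t)-1;

    rc = iconv(cd, &inp, &in_left, &outp, &out_left);
    if (rc != (size_t)-1) {
        /* Flush any pending shift state (matters for stateful
         * encodings; harmless for UTF-8 and UTF-16). */
        rc = iconv(cd, NULL, NULL, &outp, &out_left);
    }
    iconv_close(cd);

    if (rc == (size_t)-1)
        return (size_t)-1;
    return out_cap - out_left;
}
```

Calling it with from = "ISO-8859-1" and to = "UTF-8" covers the first
half of the question, and to = "UTF-16LE" the second.  A streaming
filter would need to carry incomplete multibyte sequences over from
one bucket to the next rather than treat them as errors.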

-- 
Nick Kew
