On Tuesday, 25.12.2007, at 22:54 +0000, Nick Kew wrote:
> As developer or co-developer of several libxml2-based filter
> modules, ...

Hey, I thought you were on the expat side :) 

> The basic features are:
>   1. Sniff charset of incoming data, from (in order):
>       (a) HTTP headers, if available
>       (b) XML BOM / XML Declaration
>       (c) HTML <meta> elements
>       (d) Configuration default

A configuration directive like
XML2EncSniff HTTP XML META CONF
might be desirable for this in the long run, so that one can, for
example, ignore META.
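
To make the idea concrete, here is a minimal sketch in Python (not the module's C code) of how such a directive could drive the sniffing order. Everything in it is an assumption for illustration: the directive name is only the one proposed above, and the helper functions and their regular expressions are hypothetical and much simpler than real detection would need to be.

```python
import codecs
import re

# Hypothetical sniffers, one per source named in the proposed directive.
# Each returns a charset name or None if its source yields nothing.
def sniff_http(headers, body, default):
    m = re.search(r'charset\s*=\s*"?([\w.:-]+)',
                  headers.get("Content-Type", ""), re.I)
    return m.group(1) if m else None

def sniff_xml(headers, body, default):
    # Simplified: only UTF-8/UTF-16 BOMs and an ASCII-readable declaration.
    if body.startswith(codecs.BOM_UTF8):
        return "UTF-8"
    if body.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "UTF-16"
    m = re.match(rb'<\?xml[^>]*?encoding\s*=\s*["\']([\w.:-]+)["\']', body)
    return m.group(1).decode("ascii") if m else None

def sniff_meta(headers, body, default):
    # Simplified: look for charset=... inside a <meta> tag near the start.
    m = re.search(rb'<meta[^>]+charset\s*=\s*["\']?([\w.:-]+)',
                  body[:1024], re.I)
    return m.group(1).decode("ascii") if m else None

def sniff_conf(headers, body, default):
    return default

SNIFFERS = {"HTTP": sniff_http, "XML": sniff_xml,
            "META": sniff_meta, "CONF": sniff_conf}

def parse_directive(line):
    """Parse e.g. 'XML2EncSniff HTTP XML META CONF' into an ordered list."""
    name, *sources = line.split()
    if name != "XML2EncSniff":
        raise ValueError("unexpected directive: " + name)
    unknown = [s for s in sources if s not in SNIFFERS]
    if unknown:
        raise ValueError("unknown sniff source(s): " + ", ".join(unknown))
    return sources

def sniff_charset(order, headers, body, default=None):
    """Try each configured source in order; the first hit wins."""
    for source in order:
        charset = SNIFFERS[source](headers, body, default)
        if charset:
            return charset
    return None
```

Leaving META out of the directive then simply means the `<meta>` element is never consulted, and the configured default is used when the HTTP headers and the XML declaration yield nothing.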

>   2. If the charset is not supported by libxml2,
>      convert it to UTF-8 using apr_xlate (if supported).
>   3. Remove <meta> elements that are invalidated by
>      any such conversion.
>   4. Perform other preprocessing fixups, and offer an
>      optional hook for preprocessing.

Does this mean, e.g., fixing the XML declaration if the HTTP header says
otherwise?
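
What I have in mind is roughly the following Python sketch. It is only my reading of "preprocessing fixups", not a description of what the module does. It assumes the HTTP header takes precedence (as in the order given under point 1) and that the document is in an ASCII-compatible encoding, so the declaration can be matched as bytes. The function name is hypothetical.

```python
import re

# Matches an XML declaration at the very start of the document and
# captures the value of its encoding pseudo-attribute (group 3).
XML_DECL = re.compile(
    rb'^(<\?xml\b[^>]*?\bencoding\s*=\s*)(["\'])([\w.:-]+)\2')

def fix_xml_declaration(body, header_charset):
    """Rewrite the declared encoding if it disagrees with the HTTP header."""
    m = XML_DECL.match(body)
    if not m:
        return body  # no declaration, or no encoding given: nothing to fix
    declared = m.group(3).decode("ascii")
    if declared.lower() == header_charset.lower():
        return body  # header and declaration already agree
    return (body[:m.start(3)]
            + header_charset.encode("ascii")
            + body[m.end(3):])
```

For instance, a body starting with `<?xml version="1.0" encoding="ISO-8859-1"?>` that arrives with `charset=UTF-8` in the header would be passed on with `encoding="UTF-8"` in its declaration.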

>   5. Support post-filtering from UTF-8 to a server admin's
>      choice of charset.

Good.

> The challenging aspect of this is to enable it to be inserted
> twice in a filter chain (before and after libxml2), and perform
> different transformations each time. 

This means two different filter functions, right?

> Currently it offers
> configuration options appropriate to a pre-filter, and will
> export a function for other filter modules to insert it with
> their own configuration options (f->ctx) for post-filtering.
> Unless anyone has a better suggestion.

Why do you think it is necessary to have other filter modules supply the
configuration this way? What is the advantage of this over simply having
configuration options for the post-filter?

Hey, you may want to interface with mod_negotiation :) Charsets are not
really negotiable now, but with your module they will be.

Sincerely,
Joachim

