As developer or co-developer of several libxml2-based filter
modules, I sometimes find myself wanting to replicate functionality
across a number of modules. One such case is improved
internationalisation, which is a good candidate for a separate
module. So I've been hacking on just such a module, mod_xml2enc,
which is now at http://apache.webthing.com/mod_xml2enc/
The basic features are:
  1. Sniff charset of incoming data, from (in order):
       (a) HTTP headers, if available
       (b) XML BOM / XML Declaration
       (c) HTML <meta> elements
       (d) Configuration default
  2. If the charset is not supported by libxml2,
     convert it to UTF-8 using apr_xlate (if supported).
  3. Remove <meta> elements that are invalidated by
     any such conversion.
  4. Perform other preprocessing fixups, and offer an
     optional hook for preprocessing.
  5. Support post-filtering from UTF-8 to a server admin's
     choice of charset.
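
To make the sniffing order in (1) concrete, here is a rough standalone
sketch in plain C. It is not the module's code: sniff_charset(),
sniff_bom() and token_after() are hypothetical names, and it assumes the
start of the body is available as a NUL-terminated, ASCII-compatible
buffer. Matching is case-sensitive and deliberately naive; a real filter
would work on bucket data and be more tolerant.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Find "key" in buf (before "end", if end is non-NULL) and copy the
 * charset-like token that follows it into out. Returns 1 on success. */
static int token_after(const char *buf, const char *end, const char *key,
                       char *out, size_t outlen)
{
    const char *p = strstr(buf, key);
    size_t n = 0;

    assert(outlen > 0);
    out[0] = '\0';
    if (p == NULL || (end != NULL && p >= end))
        return 0;
    p += strlen(key);
    while (*p == '"' || *p == '\'' || *p == ' ')
        p++;                              /* skip quotes and spaces */
    while (*p != '\0' && n + 1 < outlen &&
           (isalnum((unsigned char)*p) || *p == '-' || *p == '_' ||
            *p == '.' || *p == ':'))
        out[n++] = *p++;
    out[n] = '\0';
    return n > 0;
}

/* Recognise the common byte order marks; NULL if none is present. */
static const char *sniff_bom(const unsigned char *d, size_t len)
{
    if (len >= 3 && d[0] == 0xEF && d[1] == 0xBB && d[2] == 0xBF)
        return "UTF-8";
    if (len >= 2 && d[0] == 0xFE && d[1] == 0xFF)
        return "UTF-16BE";
    if (len >= 2 && d[0] == 0xFF && d[1] == 0xFE)
        return "UTF-16LE";
    return NULL;
}

/* Try each source in the order (a)..(d) and stop at the first hit. */
static void sniff_charset(const char *ctype, const char *body,
                          const char *dflt, char *out, size_t outlen)
{
    const char *bom, *meta;

    /* (a) HTTP headers: charset parameter of Content-Type */
    if (ctype != NULL && token_after(ctype, NULL, "charset=", out, outlen))
        return;
    /* (b) BOM, then the encoding pseudo-attribute of the XML declaration */
    bom = sniff_bom((const unsigned char *)body, strlen(body));
    if (bom != NULL) {
        snprintf(out, outlen, "%s", bom);
        return;
    }
    if (strncmp(body, "<?xml", 5) == 0 &&
        token_after(body, strstr(body, "?>"), "encoding=", out, outlen))
        return;
    /* (c) first HTML <meta> element carrying a charset */
    meta = strstr(body, "<meta");
    if (meta != NULL && token_after(meta, NULL, "charset=", out, outlen))
        return;
    /* (d) configured default */
    snprintf(out, outlen, "%s", dflt);
}
```

A caller would pass the Content-Type header (or NULL), the first chunk
of the body, and the configured default, for example
sniff_charset(ctype, buf, "ISO-8859-1", cs, sizeof cs). The result in
cs is what step (2) would then check against what libxml2 supports.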
This is work-in-progress, and currently won't do anything more
useful than crash your server. But I think it's time to
solicit developer feedback, particularly from those of you who
use libxml2 with Apache. So I've committed it to public SVN,
and started on a module page:
http://apache.webthing.com/mod_xml2enc/
The challenging aspect of this is to enable it to be inserted
twice in a filter chain (before and after libxml2), and perform
different transformations each time. Currently it offers
configuration options appropriate to a pre-filter, and will
export a function for other filter modules to insert it with
their own configuration options (f->ctx) for post-filtering.
Unless anyone has a better suggestion.
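
To illustrate the idea of one filter playing two roles, here is a toy
model in plain C. It does not use the httpd API: the filter struct only
stands in for ap_filter_t, and enc_ctx, enc_role, xml2enc_insert() and
xml2enc_role() are all hypothetical names. In a real module the
insertion would presumably go through ap_add_output_filter(), with the
caller's options passed as the ctx argument.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical roles the same filter can take in one chain. */
typedef enum { ENC_PRE, ENC_POST } enc_role;

/* Hypothetical per-instance options, carried in f->ctx. */
typedef struct {
    enc_role role;
    const char *charset;   /* post-filter: charset to convert UTF-8 into */
} enc_ctx;

/* Minimal stand-in for a filter chain entry. */
typedef struct filter {
    const char *name;
    void *ctx;
    struct filter *next;
} filter;

/* Stand-in for the exported function: append an xml2enc instance to
 * the chain, carrying the caller's own context (NULL means "use the
 * module's own pre-filter configuration"). */
static filter *xml2enc_insert(filter **chain, filter *slot, enc_ctx *ctx)
{
    assert(chain != NULL && slot != NULL);
    slot->name = "xml2enc";
    slot->ctx = ctx;
    slot->next = NULL;
    while (*chain != NULL)            /* walk to the end of the chain */
        chain = &(*chain)->next;
    *chain = slot;
    return slot;
}

/* The filter function would branch on this at run time. */
static enc_role xml2enc_role(const filter *f)
{
    const enc_ctx *ctx = f->ctx;
    return (ctx == NULL) ? ENC_PRE : ctx->role;
}
```

In this model the pre-filter instance is inserted with a NULL context
and falls back on the module's configuration, while a libxml2-based
module inserts the second instance after itself with an enc_ctx naming
the output charset.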
--
Nick Kew
Application Development with Apache - the Apache Modules Book
http://www.apachetutor.org/