On Fri, 04 Jan 2008 22:47:16 +0100 Joachim Zobel <[EMAIL PROTECTED]> wrote:
> On Tuesday, 25.12.2007, 22:54 +0000, Nick Kew wrote:
> > As developer or co-developer of several libxml2-based filter
> > modules, ...
>
> Hey, I thought you were on the expat side :)

Just mod_xmlns.  All my other SAX parsing modules are libxml2.

> > The basic features are:
> > 1. Sniff charset of incoming data, from (in order):
> >    (a) HTTP headers, if available
> >    (b) XML BOM / XML Declaration
> >    (c) HTML <meta> elements
> >    (d) Configuration default
>
> A configuration like
>   XML2EncSniff HTTP XML META CONF
> might be desirable for this in the long run. So one can, for example,
> ignore META.

Indeed, that's a thought.  Not to mention sniffing according to
Content-Type, since one purpose of this is *also* to support
non-markup text.

> > 2. If the charset is not supported by libxml2,
> >    convert it to UTF-8 using apr_xlate (if supported).
> > 3. Remove <meta> elements that are invalidated by
> >    any such conversion.
> > 4. Perform other preprocessing fixups, and offer an
> >    optional hook for preprocessing.
>
> Does this mean e.g. fixing the XML declaration if the header says
> something different?

Yes, though that's a TBD.

> > 5. Support post-filtering from UTF-8 to a server admin's
> >    choice of charset.
>
> Good.
>
> > The challenging aspect of this is to enable it to be inserted
> > twice in a filter chain (before and after libxml2), and perform
> > different transformations each time.
>
> This means two different filter functions, right?

No, one function, with its behaviour determined by its ctx.

> > Currently it offers
> > configuration options appropriate to a pre-filter, and will
> > export a function for other filter modules to insert it with
> > their own configuration options (f->ctx) for post-filtering.
> > Unless anyone has a better suggestion.
>
> Why do you think it is necessary to ask other filters for
> configuration this way? What is the advantage of this over simply
> having configuration options for the post-filter?

That gets messy, with two filters both of type AP_FTYPE_RESOURCE.
If I hack it with offsets, that breaks interaction with other filters.

> Hey, you may want to interface with mod_negotiation :) Charsets are not
> really negotiable now, but with your module they will be.

Hehe.  Well, there's also mod_charset_lite :-)

Thanks for the comments.

-- 
Nick Kew

Application Development with Apache - the Apache Modules Book
http://www.apachetutor.org/
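For illustration, the sniffing order in feature 1 ((a) HTTP headers, (b) BOM / XML declaration, (c) HTML <meta> elements, (d) configuration default) might look roughly like the following. This is a minimal standalone sketch in plain C, not code from the module under discussion: it makes no APR or libxml2 calls, and the names sniff_charset, sniff_source and SNIFF_MAX are hypothetical. It assumes that only the first SNIFF_MAX bytes of the body are inspected, that only UTF-8 and UTF-16 BOMs matter, and that a simple case-insensitive substring search is good enough to locate the charset parameters.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical: how many leading body bytes the sniffer inspects. */
#define SNIFF_MAX 1024

/* Where the charset was found, in the priority order (a)-(d). */
typedef enum { FROM_HTTP, FROM_BOM, FROM_XMLDECL, FROM_META, FROM_CONF } sniff_source;

/* Case-insensitive "does s start with key?"; stops safely at s's NUL. */
static int ci_prefix(const char *s, const char *key) {
    for (; *key; s++, key++)
        if (tolower((unsigned char)*s) != tolower((unsigned char)*key))
            return 0;
    return 1;
}

/* Pointer just past the first case-insensitive match of key, or NULL. */
static const char *find_ci(const char *hay, const char *key) {
    size_t klen = strlen(key);
    for (; *hay; hay++)
        if (ci_prefix(hay, key))
            return hay + klen;
    return NULL;
}

/* Copy a charset token (after optional spaces/quotes) into out; 1 if non-empty. */
static int copy_charset(const char *p, char *out, size_t outlen) {
    size_t n = 0;
    while (*p == ' ' || *p == '"' || *p == '\'')
        p++;
    while (*p && (isalnum((unsigned char)*p) || strchr("-_.:", *p)) && n + 1 < outlen)
        out[n++] = *p++;
    out[n] = '\0';
    return n > 0;
}

/* Look for key only between start and stop (or to the end if stop is NULL). */
static int charset_in_span(const char *start, const char *stop, const char *key,
                           char *out, size_t outlen) {
    char span[256];
    size_t len = stop ? (size_t)(stop - start) : strlen(start);
    if (len >= sizeof span)
        len = sizeof span - 1;
    memcpy(span, start, len);
    span[len] = '\0';
    const char *v = find_ci(span, key);
    return v != NULL && copy_charset(v, out, outlen);
}

static sniff_source sniff_charset(const char *content_type,
                                  const unsigned char *body, size_t len,
                                  const char *conf_default,
                                  char *out, size_t outlen) {
    char buf[SNIFF_MAX + 1];
    assert(out != NULL && outlen > 0 && conf_default != NULL);

    /* (a) charset parameter of the Content-Type header, if present. */
    if (content_type && charset_in_span(content_type, NULL, "charset=", out, outlen))
        return FROM_HTTP;

    /* (b1) Byte order mark (raw bytes, checked before any string handling). */
    if (len >= 3 && body[0] == 0xEF && body[1] == 0xBB && body[2] == 0xBF) {
        snprintf(out, outlen, "UTF-8");
        return FROM_BOM;
    }
    if (len >= 2 && (body[0] == 0xFE || body[0] == 0xFF) && body[1] == (body[0] ^ 0x01)) {
        snprintf(out, outlen, body[0] == 0xFE ? "UTF-16BE" : "UTF-16LE");
        return FROM_BOM;
    }

    /* Work on a NUL-terminated copy of the first SNIFF_MAX bytes. */
    size_t n = len < SNIFF_MAX ? len : SNIFF_MAX;
    memcpy(buf, body, n);
    buf[n] = '\0';

    /* (b2) encoding="..." inside an XML declaration at the very start. */
    if (ci_prefix(buf, "<?xml") &&
        charset_in_span(buf, strstr(buf, "?>"), "encoding=", out, outlen))
        return FROM_XMLDECL;

    /* (c) charset= inside any <meta ...> tag (covers both HTML forms). */
    for (const char *m = find_ci(buf, "<meta"); m != NULL; m = find_ci(m, "<meta")) {
        const char *end = strchr(m, '>');
        if (charset_in_span(m, end, "charset=", out, outlen))
            return FROM_META;
        if (end == NULL)
            break;
        m = end;
    }

    /* (d) Fall back to the configured default. */
    snprintf(out, outlen, "%s", conf_default);
    return FROM_CONF;
}
```

Here the order is hard-coded. Joachim's XML2EncSniff suggestion would amount to iterating over a configured list of sources instead, so that, for example, the META step could be skipped; a Content-Type-driven step for non-markup text would slot in the same way.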
