Re: mod_dav and EBCDIC
On Sun, Nov 20, 2005 at 09:53:50AM -0500, Jeff Trawick wrote:
> On input path, ap_xml_parse_input() handles converting xml to native charset (at least in 2.2). On output, there is no provision for converting xml in responses.

OK, pop quiz: how is a Unicode XML document getting converted into EBCDIC on input without losing most of the character set along the way?

joe
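To make the pop quiz concrete, here is a minimal sketch of the loss in question. It assumes Python's built-in `cp037` codec as a stand-in for "the native EBCDIC charset"; the actual codepage on a given machine may differ, and the helper names are hypothetical, not anything in httpd or APR.

```python
def lost_characters(text, codepage="cp037"):
    """Return the characters of `text` that the single-byte codepage cannot represent."""
    lost = []
    for ch in text:
        try:
            ch.encode(codepage)
        except UnicodeEncodeError:
            lost.append(ch)
    return lost


def roundtrip_via_codepage(text, codepage="cp037"):
    """Encode to the codepage (substituting '?' for unmappable characters) and decode back."""
    return text.encode(codepage, errors="replace").decode(codepage)


if __name__ == "__main__":
    sample = "<D:displayname>名前 €</D:displayname>"
    print(lost_characters(sample))
    print(roundtrip_via_codepage(sample))
```

A single-byte EBCDIC codepage has at most 256 code points, so anything outside its repertoire is either rejected or replaced during the conversion, which is the concern raised above.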
Re: mod_dav and EBCDIC
Joe Orton wrote:
> On Sun, Nov 20, 2005 at 09:53:50AM -0500, Jeff Trawick wrote:
>> On input path, ap_xml_parse_input() handles converting xml to native charset (at least in 2.2). On output, there is no provision for converting xml in responses.

Yes there is, it's the explicit or implicit charset definition of the xml format. If mod_dav, or others, are generating EBCDIC xml without the proper charset tags (if such a thing were even possible to decode) then I'd think this is a much deeper problem in the modules creating the xml, not the httpd core.

I understand all xml is iso-8859-1 until informed otherwise. How can the xml parser or xml apps do anything with alternate explicit charsets, if the core/filters switch the input or output stream to EBCDIC?
mod_dav and EBCDIC
On input path, ap_xml_parse_input() handles converting xml to native charset (at least in 2.2). On output, there is no provision for converting xml in responses.

Some choices:

(a) convert right in DAV before calling ap_fXXX() APIs

(b) have DAV implement a filter that converts xml from native to UTF-8 (or whatever charset the xml says it is in); add the filter automatically within DAV when it generates an xml response; it would be prudent for mod_charset_lite to be aware of this filter so that it won't touch the body where it is added

(c) have mod_charset_lite implement a special filter for this purpose; perhaps it uses the existing logic but the name of the filter sets the proper configuration; DAV would add this filter implicitly when it generates an xml response; user simply loads mod_charset_lite on non-ASCII machine and no further configuration is needed to get DAV xml translatable

(d) ??? (some solutions in mod_charset_lite which require user to do special mod_charset_lite configuration, such as to indicate that dav xml is translated one way and actual content is potentially translated a different way, or not translated at all)

(c) looks reasonable to me; the two modules agree on the filter name (mod_charset_lite.h) and it is presumed that it means to translate from the codepage of compiled-in strings (such as the string DAV_XML_HEADER in mod_dav.h) to UTF-8.

Thoughts?
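The translation step that options (b) and (c) describe can be sketched outside httpd as follows. This is only an illustration of the idea, not a filter implementation: it assumes `cp037` as the native codepage, `translate_chunks` is a hypothetical name, and the header string is a stand-in for a compiled-in XML header rather than a quote from mod_dav.h.

```python
import codecs


def translate_chunks(chunks, native="cp037", target="utf-8"):
    """Re-encode a stream of byte chunks from the native codepage to the target charset.

    An incremental decoder is used so the same loop would also work for a
    multi-byte source charset whose characters may straddle chunk boundaries.
    """
    decoder = codecs.getincrementaldecoder(native)()
    for chunk in chunks:
        text = decoder.decode(chunk)
        if text:
            yield text.encode(target)
    # Flush anything the decoder was still holding at end of stream.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail.encode(target)


if __name__ == "__main__":
    # Stand-in for a header compiled into the module in the native codepage.
    header = '<?xml version="1.0" encoding="utf-8"?>'
    native_bytes = header.encode("cp037")
    # Simulate the response body arriving in two pieces.
    pieces = [native_bytes[:10], native_bytes[10:]]
    print(b"".join(translate_chunks(pieces)).decode("utf-8"))
```

Processing chunk by chunk mirrors how an output filter sees the response body in pieces rather than as one buffer.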
Re: mod_dav and EBCDIC
On Sunday 20 November 2005 14:53, Jeff Trawick wrote:
> On input path, ap_xml_parse_input() handles converting xml to native charset (at least in 2.2). On output, there is no provision for converting xml in responses.

Is this a hypothetical or real-life issue?

> Some choices:
> (a) convert right in DAV before calling ap_fXXX() APIs

Ugh. What are filters for?

> (b) have DAV implement a filter that converts xml from native to UTF-8 (or whatever the xml says the charset is in); add the filter automatically within DAV when it generates an xml response; it would be prudent for mod_charset_lite to be aware of this filter so that it won't touch the body where it is added

FWIW, libxml2 does that. There's well-tried-and-tested code for detecting encoding and converting to utf-8 in several of my filters; for example mod_proxy_html (which is GPL, but I'd have no problem relicensing the relevant parts for a good cause). As regards working with mod_charset_lite, mod_filter dispatching should deal with that.

> (c) have mod_charset_lite implement a special filter for this purpose; perhaps it uses the existing logic but the name of the filter sets the proper configuration; DAV would add this filter implicitly when it generates an xml response; user simply loads mod_charset_lite on non-ASCII machine and no further configuration is needed to get DAV xml translatable

Bear in mind that a filter at the charset_lite/iconv level is going to have to edit the xmldecl and, in the case of (X)HTML, any meta element that sets charset.

> (c) looks reasonable to me; they agree on filter name (mod_charset_lite.h) and it is presumed that it means to translate from the codepage of compiled in strings (such as the string DAV_XML_HEADER in mod_dav.h) to UTF-8

Indeed, subject to the above caveats.

-- 
Nick Kew
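The xmldecl caveat can be illustrated with a rough sketch: once the body has been transcoded, the `encoding` pseudo-attribute in the XML declaration has to name the new charset. The function name `retag_xmldecl` is hypothetical, the regular expression only handles a declaration at the very start of the document that already carries an `encoding` pseudo-attribute, and it works on already-decoded text; it is not how any existing filter does it.

```python
import re

# Matches '<?xml ... encoding="NAME"' at the start of the document, capturing
# the prefix, the quote character, and the current charset name.
_XMLDECL_ENCODING = re.compile(
    r'^(<\?xml\b[^>]*?\bencoding\s*=\s*)(["\'])([A-Za-z][A-Za-z0-9._-]*)\2'
)


def retag_xmldecl(document, new_charset):
    """Return `document` with the charset named in its XML declaration replaced.

    Documents without a declaration, or without an encoding pseudo-attribute,
    are returned unchanged.
    """
    def _swap(match):
        prefix, quote = match.group(1), match.group(2)
        return f"{prefix}{quote}{new_charset}{quote}"

    return _XMLDECL_ENCODING.sub(_swap, document, count=1)


if __name__ == "__main__":
    doc = '<?xml version="1.0" encoding="iso-8859-1"?><D:multistatus xmlns:D="DAV:"/>'
    print(retag_xmldecl(doc, "utf-8"))
```

The same kind of rewrite would be needed for an (X)HTML meta element that sets charset, which this sketch does not attempt.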