Re: mod_dav and EBCDIC

2005-11-23 Thread Joe Orton
On Sun, Nov 20, 2005 at 09:53:50AM -0500, Jeff Trawick wrote:
> On input path, ap_xml_parse_input() handles converting xml to native
> charset (at least in 2.2).  On output, there is no provision for
> converting xml in responses.

OK, pop quiz: how is a Unicode XML document getting converted into 
EBCDIC on input without losing most of the character set along the way?

joe
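
The loss in question can be made concrete. The sketch below assumes the native code page is a single-byte EBCDIC page with the Latin-1 repertoire (IBM-1047 is one such page), so any code point above U+00FF has no native equivalent. The name count_unrepresentable is hypothetical and is not part of httpd or APR; the function only counts the characters in a UTF-8 buffer that such a conversion would have to drop or substitute.

```c
#include <stddef.h>

/* Hypothetical helper: count code points in a UTF-8 buffer that cannot
 * survive conversion to a single-byte code page limited to U+0000..U+00FF.
 * Assumes mostly well-formed input; continuation bytes are not validated. */
size_t count_unrepresentable(const unsigned char *s, size_t len)
{
    size_t i = 0, lost = 0;

    while (i < len) {
        unsigned long cp;
        size_t n, k;
        unsigned char b = s[i];

        if (b < 0x80)                { cp = b;        n = 1; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; n = 2; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; n = 3; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; n = 4; }
        else { lost++; i++; continue; }   /* stray or invalid lead byte */

        if (i + n > len) { lost++; break; }   /* truncated sequence */

        /* Fold the continuation bytes into the code point. */
        for (k = 1; k < n; k++)
            cp = (cp << 6) | (s[i + k] & 0x3F);

        if (cp > 0xFF)
            lost++;                       /* no slot in a 256-entry page */
        i += n;
    }
    return lost;
}
```

For a document that is pure ASCII or Latin-1 the count is zero; a single euro sign or CJK character makes it non-zero, which is the case a native-charset parse path cannot represent.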


Re: mod_dav and EBCDIC

2005-11-23 Thread William A. Rowe, Jr.

Joe Orton wrote:
> On Sun, Nov 20, 2005 at 09:53:50AM -0500, Jeff Trawick wrote:
>> On input path, ap_xml_parse_input() handles converting xml to native
>> charset (at least in 2.2).  On output, there is no provision for
>> converting xml in responses.

Yes there is, it's the explicit or implicit charset definition of the
xml format.  If mod_dav, or others, are generating EBCDIC xml without
the proper charset tags (if such a thing were even possible to decode)
then I'd think this is a much deeper problem in the modules creating
the xml, not the httpd core.  I understand all xml is iso-8859-1 until
informed otherwise.

How can the xml parser or xml apps do anything with alternate explicit
charsets, if the core/filters switch the input or output stream to EBCDIC?
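
On whether an EBCDIC document is "even possible to decode": a receiver can at least guess the encoding family from the first bytes before it reads the declaration, in the manner of the autodetection appendix of the XML 1.0 specification. The sketch below is a minimal illustration of that idea; the enum and the name sniff_xml_family are hypothetical, and the UCS-4 cases are left out for brevity.

```c
#include <stddef.h>
#include <string.h>

typedef enum {
    ENC_UNKNOWN,
    ENC_UTF8_BOM,
    ENC_UTF16_BE,
    ENC_UTF16_LE,
    ENC_ASCII_FAMILY,    /* UTF-8, ISO 8859-x, ...: "<?xm" as 3C 3F 78 6D */
    ENC_EBCDIC_FAMILY    /* some EBCDIC page:       "<?xm" as 4C 6F A7 94 */
} enc_family;

/* Hypothetical helper: classify the start of an XML entity by its
 * leading bytes. It only narrows things down to a family; the exact
 * charset still has to come from the encoding declaration. */
enc_family sniff_xml_family(const unsigned char *buf, size_t len)
{
    static const unsigned char ascii_decl[]  = { 0x3C, 0x3F, 0x78, 0x6D };
    static const unsigned char ebcdic_decl[] = { 0x4C, 0x6F, 0xA7, 0x94 };

    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF)
        return ENC_UTF8_BOM;
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF)
        return ENC_UTF16_BE;
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE)
        return ENC_UTF16_LE;
    if (len >= 4 && memcmp(buf, ascii_decl, 4) == 0)
        return ENC_ASCII_FAMILY;
    if (len >= 4 && memcmp(buf, ebcdic_decl, 4) == 0)
        return ENC_EBCDIC_FAMILY;
    return ENC_UNKNOWN;
}
```

The declaration itself uses only characters that sit at the same positions across the usual EBCDIC pages, which is what makes this first step workable.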


mod_dav and EBCDIC

2005-11-20 Thread Jeff Trawick
On input path, ap_xml_parse_input() handles converting xml to native
charset (at least in 2.2).  On output, there is no provision for
converting xml in responses.

Some choices:

(a) convert right in DAV before calling ap_fXXX() APIs
(b) have DAV implement a filter that converts xml from native to UTF-8
(or whatever the xml says the charset is in); add the filter
automatically within DAV when it generates an xml response; it would
be prudent for mod_charset_lite to be aware of this filter so that it
won't touch the body where it is added
(c) have mod_charset_lite implement a special filter for this purpose;
perhaps it uses the existing logic but the name of the filter sets the
proper configuration; DAV would add this filter implicitly when it
generates an xml response; user simply loads mod_charset_lite on
non-ASCII machine and no further configuration is needed to get DAV
xml translatable
(d) ??? (some solutions in mod_charset_lite which require user to do
special mod_charset_lite configuration, such as to indicate that dav
xml is translated one way and actual content is potentially translated
a different way, or not translated at all)

(c) looks reasonable to me; mod_dav and mod_charset_lite agree on the
filter name (mod_charset_lite.h), and the name is presumed to mean:
translate from the code page of compiled-in strings (such as the string
DAV_XML_HEADER in mod_dav.h) to UTF-8

Thoughts?
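
As a rough illustration of the conversion step that choices (b) and (c) would perform inside a filter, here is a standalone sketch. It is not httpd code: it assumes an EBCDIC native code page and handles only letters, digits, and a few markup characters whose positions are the same across the common EBCDIC pages. The names ebcdic_to_ascii and native_markup_to_utf8 are hypothetical; a real filter would more likely go through a general translation facility (mod_charset_lite builds on APR's apr_xlate routines) than a hand-written table.

```c
#include <stddef.h>

/* Hypothetical helper: map one EBCDIC byte to its ASCII value, or
 * return -1 for bytes outside the small subset handled here.
 * Results are given as hex so the code means the same thing on an
 * ASCII or an EBCDIC build machine. */
static int ebcdic_to_ascii(unsigned char c)
{
    if (c >= 0x81 && c <= 0x89) return 0x61 + (c - 0x81);   /* a-i */
    if (c >= 0x91 && c <= 0x99) return 0x6A + (c - 0x91);   /* j-r */
    if (c >= 0xA2 && c <= 0xA9) return 0x73 + (c - 0xA2);   /* s-z */
    if (c >= 0xC1 && c <= 0xC9) return 0x41 + (c - 0xC1);   /* A-I */
    if (c >= 0xD1 && c <= 0xD9) return 0x4A + (c - 0xD1);   /* J-R */
    if (c >= 0xE2 && c <= 0xE9) return 0x53 + (c - 0xE2);   /* S-Z */
    if (c >= 0xF0 && c <= 0xF9) return 0x30 + (c - 0xF0);   /* 0-9 */
    switch (c) {
    case 0x40: return 0x20;   /* space */
    case 0x4B: return 0x2E;   /* .     */
    case 0x4C: return 0x3C;   /* <     */
    case 0x60: return 0x2D;   /* -     */
    case 0x61: return 0x2F;   /* /     */
    case 0x6E: return 0x3E;   /* >     */
    case 0x6F: return 0x3F;   /* ?     */
    case 0x7A: return 0x3A;   /* :     */
    case 0x7E: return 0x3D;   /* =     */
    case 0x7F: return 0x22;   /* "     */
    }
    return -1;
}

/* Translate a native (EBCDIC) markup fragment into UTF-8. Every byte
 * produced is ASCII, which is valid UTF-8 as is. Returns 0 on success,
 * -1 if a byte is outside the subset or the output buffer is too small. */
int native_markup_to_utf8(const unsigned char *in, size_t len,
                          char *out, size_t outsz)
{
    size_t i;

    if (outsz < len + 1)
        return -1;
    for (i = 0; i < len; i++) {
        int a = ebcdic_to_ascii(in[i]);
        if (a < 0)
            return -1;
        out[i] = (char)a;
    }
    out[len] = '\0';
    return 0;
}
```

The point of the sketch is only the direction of the translation: compiled-in strings are in the native code page, and something on the output path has to turn them into the encoding the response claims to use.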


Re: mod_dav and EBCDIC

2005-11-20 Thread Nick Kew
On Sunday 20 November 2005 14:53, Jeff Trawick wrote:
> On input path, ap_xml_parse_input() handles converting xml to native
> charset (at least in 2.2).  On output, there is no provision for
> converting xml in responses.

Is this a hypothetical or real-life issue?


> Some choices:
>
> (a) convert right in DAV before calling ap_fXXX() APIs

Ugh.  What are filters for?

> (b) have DAV implement a filter that converts xml from native to UTF-8
> (or whatever the xml says the charset is in); add the filter
> automatically within DAV when it generates an xml response; it would
> be prudent for mod_charset_lite to be aware of this filter so that it
> won't touch the body where it is added

FWIW, libxml2 does that.  There's well-tried-and-tested code for
detecting encoding and converting to utf-8 in several of my filters;
for example mod_proxy_html (which is GPL, but I'd have no problem
relicensing the relevant parts for a good cause).

As regards working with mod_charset_lite, mod_filter dispatching
should deal with that.

> (c) have mod_charset_lite implement a special filter for this purpose;
> perhaps it uses the existing logic but the name of the filter sets the
> proper configuration; DAV would add this filter implicitly when it
> generates an xml response; user simply loads mod_charset_lite on
> non-ASCII machine and no further configuration is needed to get DAV
> xml translatable

Bear in mind that a filter at the charset_lite/iconv level is going to have to
edit the xmldecl and, in the case of (X)HTML, any meta element that
sets charset.
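
A minimal sketch of the xmldecl edit: when a filter changes the bytes on the wire, the encoding named in the XML declaration has to change with them. The function name rewrite_xml_encoding is hypothetical, and the parsing is deliberately simple: it assumes the document starts with a declaration that carries a quoted encoding pseudo-attribute. It does not cover (X)HTML meta elements.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy doc into out with the encoding named in its
 * XML declaration replaced by new_enc. Returns 0 on success, -1 if there
 * is no declaration, no quoted encoding in it, or out is too small. */
int rewrite_xml_encoding(const char *doc, const char *new_enc,
                         char *out, size_t outsz)
{
    const char *decl_end, *key, *q_open, *q_close;
    char quote;
    int n;

    if (strncmp(doc, "<?xml", 5) != 0)
        return -1;
    decl_end = strstr(doc, "?>");
    if (decl_end == NULL)
        return -1;

    /* The pseudo-attribute must sit inside the declaration itself. */
    key = strstr(doc, "encoding");
    if (key == NULL || key > decl_end)
        return -1;

    /* Skip the '=' and any spaces around it to reach the opening quote. */
    q_open = key + strlen("encoding");
    while (*q_open == ' ' || *q_open == '=')
        q_open++;
    quote = *q_open;
    if (quote != '"' && quote != '\'')
        return -1;
    q_close = strchr(q_open + 1, quote);
    if (q_close == NULL || q_close > decl_end)
        return -1;

    /* Prefix up to and including the opening quote, then the new name,
     * then everything from the closing quote onward. */
    n = snprintf(out, outsz, "%.*s%s%s",
                 (int)(q_open + 1 - doc), doc, new_enc, q_close);
    return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}
```

In a streaming filter the declaration could be split across buckets, so a real implementation would need to buffer the start of the body before attempting this.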

> (c) looks reasonable to me; they agree on filter name
> (mod_charset_lite.h) and it is presumed that it means to translate
> from the codepage of compiled in strings (such as the string
> DAV_XML_HEADER in mod_dav.h) to UTF-8

Indeed, subject to the above caveats.


-- 
Nick Kew