Hi Nick,

On 11/02/2012 07:25 PM CEST +02:00, Nick Kew wrote:
>> just debugged a case where Apache used as reverse proxy filters a
>> text/javascript file through mod_proxy_html and mod_xml2enc. As
>> mod_proxy_html sees no business in filtering that file, it removes
>> itself from the filter chain, but mod_xml2enc still tries to do its job.
>
> That looks like a logic bug you've found!

Yes, that's also possible.

> It looks like an edge case: one you'll only see when the charset coming
> from the backend is not supported by libxml2 on your platform, so that
> mod_xml2enc converts it using apr_iconv.

No, not exactly this edge case. The backend server sends the response
header "Content-Type: text/javascript", i.e. without any charset
information. From what I've seen in GDB, mod_xml2enc seems to fall back
to assuming that the server sends ISO-8859-1 and converts that to UTF-8
without an error (even though the content actually seems to be a mix of
ISO-8859-1 and UTF-8).

>> The attached patch based on httpd-trunk fixes that issue by removing the
>> Content-Length header entirely. Please review it. I would appreciate it,
>> if it could get applied to trunk and then backported to the httpd-2.4.x
>> branch.
>
> Your patch fixes the immediate bug (thanks!), but the fact that
> mod_xml2enc is doing anything at all in the case you describe is a
> bigger bug.

OK, I also wondered whether mod_xml2enc staying active is a bug, but was
not sure, so I only fixed the immediate bug. If you consider this a bug,
I assume the Content-Type check just needs to be unified. In
mod_proxy_html, check_filter_init() checks for "text/html" or
"application/xhtml+xml", whereas in mod_xml2enc, xml2enc_ffunc() checks
for the prefix "text/" or for "xml" anywhere in the content type string.
This is inconsistent and causes mod_proxy_html to skip text/javascript
(or text/css) files while mod_xml2enc still processes them.

> There's no easy solution: mod_proxy_html delays some of the checks
> until it has a first chunk of data, to allow for cases where an earlier
> filter (e.g. XSLT) might affect Content-Type. But by that time it's
> too late to insert or uninsert the xml2enc filter, as that needs to go
> in front of the proxy_html filter.

Yes, the delayed checks also seem necessary for the charset guessing in
case no charset is specified. But what about making the Content-Type
check consistent?
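
To illustrate the mismatch, here is a minimal standalone sketch of the
two checks as I described them above. This is not the actual module
code: the helper names proxy_html_wants() and xml2enc_wants() are made
up for this example, and the logic only follows my description of
check_filter_init() and xml2enc_ffunc().

```c
#include <string.h>
#include <strings.h>  /* strncasecmp() (POSIX) */

/* Hypothetical helper modelling the check described for mod_proxy_html:
 * accept only content types starting with "text/html" or
 * "application/xhtml+xml". */
static int proxy_html_wants(const char *ctype)
{
    if (ctype == NULL)
        return 0;
    return strncasecmp(ctype, "text/html", 9) == 0
        || strncasecmp(ctype, "application/xhtml+xml", 21) == 0;
}

/* Hypothetical helper modelling the check described for mod_xml2enc:
 * accept anything with the prefix "text/" or with "xml" anywhere in
 * the content type string. */
static int xml2enc_wants(const char *ctype)
{
    if (ctype == NULL)
        return 0;
    return strncasecmp(ctype, "text/", 5) == 0
        || strstr(ctype, "xml") != NULL;
}
```

With these two helpers, "text/javascript" and "text/css" are rejected by
proxy_html_wants() but accepted by xml2enc_wants(), which is the case I
ran into. Both accept "text/html" and "application/xhtml+xml".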

Regards,
Micha