Hi Apache developers,

Here is another bug: mod_proxy_html adds some random characters (plus HTML code) to HTML pages if the document is smaller than 4 bytes. (Thomas, Ewald, this is issue #18378 in our Mantis.) The added output appears to come from uninitialized memory; sometimes it matches part of the response to a previous request. It also seems to happen only when the same browser makes multiple HTTP requests over a keep-alive connection.

The root cause is that charset detection with xml2enc needs to consume at least 4 bytes of the document before it can reach a conclusion. The consumed bytes are buffered so that they can later be prepended to the output again. However, the code apparently assumes that at least 4 bytes are always available, which is not always the case. When fewer bytes are available, the buffer may still contain bytes left over from the previous request on the same connection.
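To illustrate the failure mode, here is a minimal, self-contained C model of it. It assumes a fixed 4-byte prefix buffer that is reused across requests on the same keep-alive connection; the struct and function names are hypothetical, and this is not the actual xml2enc or mod_proxy_html code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the problem; sniff_buf, sniff_buggy, sniff_fixed
 * and replay_prefix are illustrative names, NOT real httpd code. */
#define SNIFF_LEN 4

struct sniff_buf {
    char   data[SNIFF_LEN]; /* reused across requests on one connection */
    size_t len;             /* how many bytes of data[] are valid */
};

/* Buggy pattern: copies only what is there but always claims SNIFF_LEN
 * bytes, so a short document leaves stale bytes from an earlier request. */
static void sniff_buggy(struct sniff_buf *sb, const char *chunk, size_t n)
{
    assert(sb != NULL && chunk != NULL);
    memcpy(sb->data, chunk, n < SNIFF_LEN ? n : SNIFF_LEN);
    sb->len = SNIFF_LEN;
}

/* Fixed pattern: record how many bytes were actually consumed. */
static void sniff_fixed(struct sniff_buf *sb, const char *chunk, size_t n)
{
    size_t take = n < SNIFF_LEN ? n : SNIFF_LEN;

    assert(sb != NULL && chunk != NULL);
    memcpy(sb->data, chunk, take);
    sb->len = take;
}

/* Copies the buffered prefix to `out` (room for SNIFF_LEN bytes) and
 * returns its length, modelling "prepend the consumed bytes to the output". */
static size_t replay_prefix(const struct sniff_buf *sb, char *out)
{
    assert(sb != NULL && out != NULL && sb->len <= SNIFF_LEN);
    memcpy(out, sb->data, sb->len);
    return sb->len;
}
```

With the buggy variant, feeding "<html>" and then a 2-byte body "ok" through the same buffer replays "oktm": the new request's "ok" followed by "tm" left over from the previous one, which mirrors the symptom described above. The fixed variant replays only "ok".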

The attached patch fixes this by simply skipping documents smaller than 4 bytes. The rationale is that HTML rewriting can only do something useful on an absolute URL (i.e. one including a scheme), and the scheme "http" alone is already 4 bytes, so a smaller document contains nothing to rewrite.
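The condition used in the patch can be modelled in isolation as follows. This is a sketch under stated assumptions: `struct fake_bucket` and `skip_rewrite` are hypothetical stand-ins, not the APR bucket API; only the decision logic (first data chunk shorter than 4 bytes and immediately followed by EOS) mirrors the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an APR bucket: NOT the APR API, just enough
 * state to model the condition used in the patch. */
struct fake_bucket {
    size_t len;     /* bytes in a data bucket (0 for EOS) */
    bool   is_eos;  /* models APR_BUCKET_IS_EOS() */
};

#define MIN_SNIFF_BYTES 4  /* 4 bytes: the length of the scheme "http" */

/* Returns true when the filter would step aside: the first data bucket
 * holds fewer than MIN_SNIFF_BYTES bytes AND the next bucket is EOS,
 * i.e. the complete response body is known to be that short. */
static bool skip_rewrite(const struct fake_bucket *b, size_t count)
{
    assert(b != NULL);
    if (count == 0 || b[0].is_eos)
        return false;   /* no data bucket to inspect; out of scope here */
    if (count < 2)
        return false;   /* next bucket not in this brigade, so not EOS */
    return b[0].len < MIN_SNIFF_BYTES && b[1].is_eos;
}
```

For example, a 2-byte bucket followed by EOS is skipped, a 100-byte bucket is not, and a 2-byte bucket followed by more data is not either. That middle case shows the guard, as written, only fires when the whole short body arrives in a single bucket directly followed by EOS; a short body split across buckets or brigades is not caught by it.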

The patch is based on httpd trunk, rev. 1579365.

Please let me know whether I should file an issue in Apache's Bugzilla or whether that isn't necessary in this case.

Regards,
Micha
Index: modules/filters/mod_proxy_html.c
===================================================================
--- modules/filters/mod_proxy_html.c	(Revision 1579365)
+++ modules/filters/mod_proxy_html.c	(Arbeitskopie)
@@ -885,6 +885,15 @@
         else if (apr_bucket_read(b, &buf, &bytes, APR_BLOCK_READ)
                  == APR_SUCCESS) {
             if (ctxt->parser == NULL) {
                 const char *cenc;
+                /* For documents smaller than four bytes, there is no reason to do
+                 * HTML rewriting. The URL scheme (i.e. 'http') alone needs four bytes.
+                 * And the HTML parser needs at least four bytes to initialise correctly.
+                 */
+                if ((bytes < 4) && APR_BUCKET_IS_EOS(APR_BUCKET_NEXT(b))) {
+                    ap_remove_output_filter(f);
+                    return ap_pass_brigade(f->next, bb);
+                }
+
                 if (!xml2enc_charset ||
                     (xml2enc_charset(f->r, &enc, &cenc) != APR_SUCCESS)) {