Re: [PATCH] mod_proxy_html sometimes adds random characters to HTML pages smaller than 4 bytes

2014-03-19 Thread Micha Lenk

Hi,

On 19.03.2014 21:19, Jim Jagielski wrote:

> It's always best, imo, to follow up with a Bugzilla entry with
> description and patch.


OK, this issue is now filed in the ASF Bugzilla as #56286.

Regards,
Micha


Re: [PATCH] mod_proxy_html sometimes adds random characters to HTML pages smaller than 4 bytes

2014-03-19 Thread Jim Jagielski
It's always best, imo, to follow up with a Bugzilla entry with
description and patch.

Thx!!
On Mar 19, 2014, at 3:58 PM, Micha Lenk wrote:

> Hi Apache developers,
> 
> the next issue is a bug that causes mod_proxy_html to add some random
> characters (plus HTML code) to HTML pages if the document is smaller
> than 4 bytes. (Thomas, Ewald, this is issue #18378 in our Mantis.) The
> output looks like it comes from uninitialized memory. The added string
> sometimes matches part of a previously delivered request. Also, it
> looks like this happens only when the same browser sends multiple HTTP
> requests over an HTTP keep-alive connection.
> 
> The root cause is that charset guessing with xml2enc needs to consume
> at least 4 bytes from the document to reach a conclusion. The consumed
> bytes are buffered so that they can later be prepended to the output
> again. But the code apparently assumes that at least 4 bytes are always
> available, which is not always true. In those cases the buffer may
> contain bytes left behind from the previous request on the same
> connection.
> 
> The attached patch fixes the issue by simply skipping documents smaller
> than 4 bytes. The rationale is that for HTML rewriting to do anything
> useful, it needs to work on an absolute URL (i.e. one including a
> scheme). But as the scheme "http" alone is already 4 bytes, there would
> be nothing to rewrite.
> 
> The patch is based on httpd trunk, rev. 1579365.
> 
> Please let me know whether I should file an issue in Apache's Bugzilla
> or whether that isn't needed in this case.
> 
> Regards,
> Micha
> 



[PATCH] mod_proxy_html sometimes adds random characters to HTML pages smaller than 4 bytes

2014-03-19 Thread Micha Lenk

Hi Apache developers,

the next issue is a bug that causes mod_proxy_html to add some random
characters (plus HTML code) to HTML pages if the document is smaller
than 4 bytes. (Thomas, Ewald, this is issue #18378 in our Mantis.) The
output looks like it comes from uninitialized memory. The added string
sometimes matches part of a previously delivered request. Also, it
looks like this happens only when the same browser sends multiple HTTP
requests over an HTTP keep-alive connection.


The root cause is that charset guessing with xml2enc needs to consume
at least 4 bytes from the document to reach a conclusion. The consumed
bytes are buffered so that they can later be prepended to the output
again. But the code apparently assumes that at least 4 bytes are always
available, which is not always true. In those cases the buffer may
contain bytes left behind from the previous request on the same
connection.
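
To make the mechanism concrete, here is a toy model of it. This is not
the actual xml2enc or mod_proxy_html code; conn_ctx, replay_buggy and
replay_exact are made-up names, and the assumption is simply that a
4-byte sniff buffer survives from one request to the next on the same
connection.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SNIFF_LEN 4  /* bytes needed for charset guessing */

/* Hypothetical per-connection state (not the real module structs). */
typedef struct {
    char sniff[SNIFF_LEN];  /* bytes consumed for charset detection */
} conn_ctx;

/* Buggy variant: buffers at most SNIFF_LEN bytes of the document,
 * but always replays SNIFF_LEN bytes to the output. */
static size_t replay_buggy(conn_ctx *ctx, const char *doc, size_t len,
                           char *out, size_t out_cap)
{
    size_t n = len < SNIFF_LEN ? len : SNIFF_LEN;
    assert(out_cap >= SNIFF_LEN);
    memcpy(ctx->sniff, doc, n);
    /* If n < SNIFF_LEN, the tail of ctx->sniff still holds bytes
     * from the previous document on this connection. */
    memcpy(out, ctx->sniff, SNIFF_LEN);
    return SNIFF_LEN;
}

/* Variant that replays only the bytes that were actually buffered. */
static size_t replay_exact(conn_ctx *ctx, const char *doc, size_t len,
                           char *out, size_t out_cap)
{
    size_t n = len < SNIFF_LEN ? len : SNIFF_LEN;
    assert(out_cap >= n);
    memcpy(ctx->sniff, doc, n);
    memcpy(out, ctx->sniff, n);
    return n;
}
```

In this model, a 2-byte document that follows a longer one on the same
conn_ctx comes out of replay_buggy with two extra bytes taken from the
earlier document, which matches the symptom described above.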


The attached patch fixes the issue by simply skipping documents smaller
than 4 bytes. The rationale is that for HTML rewriting to do anything
useful, it needs to work on an absolute URL (i.e. one including a
scheme). But as the scheme "http" alone is already 4 bytes, there would
be nothing to rewrite.
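
The skip decision can be sketched in isolation as follows. The chunk
type and should_skip_rewrite are hypothetical stand-ins; the patch
itself works on apr_bucket and uses APR_BUCKET_IS_EOS. The point of the
sketch is that a short first chunk alone is not enough: it must also be
the last data before end-of-stream, otherwise more content may follow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MIN_REWRITE_LEN 4  /* length of the scheme "http" */

/* Hypothetical stand-in for a bucket in a brigade. */
typedef struct chunk {
    size_t len;                /* payload length in bytes */
    bool is_eos;               /* true for the end-of-stream marker */
    const struct chunk *next;  /* following chunk, or NULL */
} chunk;

/* Skip rewriting only if the first data chunk is shorter than
 * MIN_REWRITE_LEN and is immediately followed by end-of-stream. */
static bool should_skip_rewrite(const chunk *first)
{
    assert(first != NULL);
    return first->len < MIN_REWRITE_LEN
        && first->next != NULL
        && first->next->is_eos;
}
```

With this check, a 2-byte document followed by end-of-stream is passed
through untouched, while a 2-byte first chunk followed by more data
still goes through the normal rewriting path.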


The patch is based on httpd trunk, rev. 1579365.

Please let me know whether I should file an issue in Apache's Bugzilla
or whether that isn't needed in this case.


Regards,
Micha
Index: modules/filters/mod_proxy_html.c
===================================================================
--- modules/filters/mod_proxy_html.c	(revision 1579365)
+++ modules/filters/mod_proxy_html.c	(working copy)
@@ -885,6 +885,15 @@
         else if (apr_bucket_read(b, &buf, &bytes, APR_BLOCK_READ)
                  == APR_SUCCESS) {
             if (ctxt->parser == NULL) {
+                /* For documents smaller than four bytes, there is no reason to do
+                 * HTML rewriting. The URL scheme (i.e. 'http') needs four bytes alone.
+                 * And the HTML parser needs at least four bytes to initialise correctly.
+                 */
+                if ((bytes < 4) && APR_BUCKET_IS_EOS(APR_BUCKET_NEXT(b))) {
+                    ap_remove_output_filter(f) ;
+                    return ap_pass_brigade(f->next, bb) ;
+                }
+
                 const char *cenc;
                 if (!xml2enc_charset ||
                     (xml2enc_charset(f->r, &enc, &cenc) != APR_SUCCESS)) {