> Aha!  Revisiting that, I see I still have an uncommitted patch to make
> content types to process configurable.  I think that was an issue you
> originally raised?  But compression is another issue.

Yep.

> Hmmm?

> If the backend sends compressed contents with no content-encoding,
> doesn't that imply:
> 1. INFLATE doesn't see encoding, so steps away.
> 2. xml2enc and proxy-html can't parse compressed content, so step away
>    (log an error?)
> 3. DEFLATE … aha, that's what you meant about double-compression.
> In effect the whole chain was reduced to just DEFLATE.  That's a bit
> nonsensical but not incorrect, and the user-agent will reverse the
> DEFLATE and restore the original from the backend, yesno?

I think you are right. Yet with Firefox or Chrome (both in their latest
versions) the final result is still 'double compressed'. Repeating the
steps manually (curl + gzip) works fine: the original file from the
server is restored as it should be. I'm reluctant to blame the clients,
however.

> But is the real issue anything more than an inability to use
> ProxyHTMLEnable with compressed contents? In which case, wouldn't
> mod_proxy_html be the place to patch?  Have it test/insert deflate at
> the same point as it inserts xml2enc?

No, yes and I tried but couldn't get it to work. Following your advice I
went along the lines of

diff --git a/modules/filters/mod_proxy_html.c b/modules/filters/mod_proxy_html.c
index b964fec..9760115 100644
--- a/modules/filters/mod_proxy_html.c
+++ b/modules/filters/mod_proxy_html.c
@@ -1569,10 +1569,19 @@ static void proxy_html_insert(request_rec *r)
     proxy_html_conf *cfg;
     cfg = ap_get_module_config(r->per_dir_config, &proxy_html_module);
     if (cfg->enabled) {
-        if (xml2enc_filter)
+        int add_deflate_output_filter = 0;
+        if (apr_table_get(r->headers_in, "Content-Encoding:") != NULL) {
+            ap_add_input_filter("inflate", NULL, r, r->connection);
+            add_deflate_output_filter = 1;
+        }
+        if (xml2enc_filter) {
             xml2enc_filter(r, NULL, ENCIO_INPUT_CHECKS);
+        }
         ap_add_output_filter("proxy-html", NULL, r, r->connection);
         ap_add_output_filter("proxy-css", NULL, r, r->connection);
+        if (add_deflate_output_filter) {
+            ap_add_output_filter("deflate", NULL, r, r->connection);
+        }
     }
 }
 static void proxy_html_hooks(apr_pool_t *p)

but it appears to be way off because it does exactly nothing. When logging
the headers at this point, I found that r->headers_in contains the client
request headers whereas r->headers_out is empty. Doesn't this tell me I'm
doing all of this too early?
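
Independent of the timing question, a stand-alone sketch of the header
check the patch is attempting may help pin down what it should look at.
This is not Apache code: `struct header`, `header_get` and `needs_inflate`
are made-up names and the sample headers are purely illustrative. It
assumes the lookup takes the bare header name (no trailing colon) and
compares names case-insensitively, as apr_table_get() does, and that the
check belongs on the backend response headers rather than on the client
request.

```c
#include <stddef.h>
#include <strings.h>   /* strcasecmp (POSIX) */

/* Hypothetical stand-in for an APR header table: a flat name/value list. */
struct header {
    const char *name;
    const char *value;
};

/* Illustrative client request headers (what r->headers_in held in my log). */
static const struct header sample_request[] = {
    { "Host",            "example.invalid" },
    { "Accept-Encoding", "gzip, deflate"   },
};

/* Illustrative backend response headers (assumed, not captured output). */
static const struct header sample_response[] = {
    { "Content-Type",     "text/html" },
    { "Content-Encoding", "gzip"      },
};

/* Case-insensitive lookup by header name; returns NULL if not present. */
static const char *header_get(const struct header *h, size_t n,
                              const char *name)
{
    for (size_t i = 0; i < n; i++) {
        if (strcasecmp(h[i].name, name) == 0)
            return h[i].value;
    }
    return NULL;
}

/*
 * Returns 1 if the given headers declare a gzip-encoded body.
 * Simplification: only the exact token "gzip" is recognised; "x-gzip"
 * or a list of several codings is not handled here.
 */
static int needs_inflate(const struct header *h, size_t n)
{
    const char *enc = header_get(h, n, "Content-Encoding");
    return enc != NULL && strcasecmp(enc, "gzip") == 0;
}
```

With these samples, looking up "Content-Encoding:" (with the colon, as in
my diff) finds nothing, and needs_inflate() is false for the request
headers and true only for the response headers.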




On Tue, Dec 17, 2013 at 12:47 PM, Nick Kew <[email protected]> wrote:

>
> On 17 Dec 2013, at 10:32, Thomas Eckert wrote:
>
> > I've been over this with Nick before: mod_proxy_html uses mod_xml2enc to
> > do the detection magic but mod_xml2enc fails to detect compressed content
> > correctly. Hence a simple "ProxyHTMLEnable" fails when content compression
> > is in place.
>
> Aha!  Revisiting that, I see I still have an uncommitted patch to make
> content types to process configurable.  I think that was an issue you
> originally raised?  But compression is another issue.
>
> > To work around this without dropping support for content compression you
> > can do
> >
> >   SetOutputfilter INFLATE;xml2enc;proxy-html;DEFLATE
> >
> > or at least that was the kind-of-result of the half-finished discussion
> > last time.
>
> I didn't find that discussion.  But I suspect my reaction would have
> included a certain aversion to that level of processing overhead in the
> proxy in these days of fatter pipes and hardware compression.
>
> > Suppose the client does
> >
> >   GET /something.tar.gz HTTP/1.1
> >   ...
> >   Accept-Encoding: gzip, deflate
> >
> > to which the backend will respond with 200 but *not* send a
> > "Content-Encoding" header since the content is already encoded. Using the
> > above filter chain "corrupts" the content because it will be inflated and
> > then deflated, double compressing it in the end.
>
> Hmmm?
>
> If the backend sends compressed contents with no content-encoding, doesn't
> that imply:
> 1. INFLATE doesn't see encoding, so steps away.
> 2. xml2enc and proxy-html can't parse compressed content, so step away
>    (log an error?)
> 3. DEFLATE … aha, that's what you meant about double-compression.
> In effect the whole chain was reduced to just DEFLATE.  That's a bit
> nonsensical but not incorrect, and the user-agent will reverse the
> DEFLATE and restore the original from the backend, yesno?
>
> > Imho this whole issue lies with proxy_html using xml2enc to do the
> > content type detection and xml2enc failing to detect the content encoding.
> > I guess all it really takes is to have xml2enc inspect the headers_in to
> > see if there is a "Content-Encoding" header and then add the
> > inflate/deflate filters (unless there is a general reason not to rely on
> > the input headers, see below).
>
> Well in this particular case, surely it lies with the backend?
> But is the real issue anything more than an inability to use
> ProxyHTMLEnable with compressed contents?  In which case, wouldn't
> mod_proxy_html be the place to patch?  Have it test/insert deflate at
> the same point as it inserts xml2enc?
>
> > Of course, this whole issue would disappear if inflate/deflate would be
> > run automagically (upon seeing a Content-Encoding header) in general.
> > Anyway, what's the reasoning behind not having them run always and give
> > them the knowledge (e.g. about the input headers) to get out of the way if
> > necessary ?
>
> That's an interesting thought.  mod_deflate will of course do exactly that
> if configured, so the issue seems to boil down to configuring that filter
> chain.
>
> The ultimate chain here would be:
> 1.      INFLATE // unpack compressed contents
> 2.      xml2enc         // deal with charset for libxml2/mod_proxy_html
> 3.      proxy-html      // fix URLs
> 4.      xml2enc         // set an output encoding other than utf-8
> 5.      DEFLATE // compress
>
> That's not possible with SetOutputFilter or FilterChain&family, because
> you can't configure both instances of xml2enc at once (that's what
> ProxyHTMLEnable deals with).  But of those, 4 and 5 seem low-priority
> as they're not doing really essential things.
>
> Returning to:
> > SetOutputfilter INFLATE;xml2enc;proxy-html;DEFLATE
>
> AFAICS the only thing that's missing is the nonessential step 4 above.
>
> Am I missing something?
>
> --
> Nick Kew
