On 17 Dec 2013, at 10:32, Thomas Eckert wrote:

> I've been over this with Nick before: mod_proxy_html uses mod_xml2enc to do 
> the detection magic but mod_xml2enc fails to detect compressed content 
> correctly. Hence a simple "ProxyHTMLEnable" fails when content compression is 
> in place.

Aha!  Revisiting that, I see I still have an uncommitted patch to make the
set of content types to process configurable.  I think that was an issue you
originally raised?  But compression is another issue.

> To work around this without dropping support for content compression you can 
> do
> 
>   SetOutputFilter INFLATE;xml2enc;proxy-html;DEFLATE
> 
> or at least that was the kind-of-result of the half-finished discussion last 
> time.

I didn't find that discussion.  But I suspect my reaction would have included
a certain aversion to that level of processing overhead in the proxy in these
days of fatter pipes and hardware compression.
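
For the record, that workaround would sit in a reverse-proxy context
something like the following (a hypothetical vhost fragment; the backend
address and the /app/ paths are placeholders, not from this thread):

```apache
# Hypothetical reverse-proxy fragment; backend address is a placeholder.
<Location "/app/">
    ProxyPass        "http://backend.example.com/app/"
    ProxyPassReverse "http://backend.example.com/app/"

    # Decompress, fix charset for libxml2, rewrite URLs, recompress:
    SetOutputFilter  INFLATE;xml2enc;proxy-html;DEFLATE

    ProxyHTMLURLMap  "http://backend.example.com/app/" "/app/"
</Location>
```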

> Suppose the client does
>   
>   GET /something.tar.gz HTTP/1.1
>   ...
>   Accept-Encoding: gzip, deflate
> 
> to which the backend will respond with 200 but *not* send an 
> "Content-Encoding" header since the content is already encoded. Using the 
> above filter chain "corrupts" the content because it will be inflated and 
> then deflated, double compressing it in the end. 

Hmmm?

If the backend sends compressed contents with no content-encoding, doesn't that 
imply:
1. INFLATE doesn't see encoding, so steps away.
2. xml2enc and proxy-html can't parse compressed content, so step away (log an 
error?)
3. DEFLATE … aha, that's what you meant about double-compression.
In effect the whole chain was reduced to just DEFLATE.  That's a bit
nonsensical but not incorrect, and the user-agent will reverse the DEFLATE
and restore the original from the backend, yes/no?
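
One could also keep DEFLATE away from payloads that are compressed by
nature.  mod_deflate steps aside when the no-gzip environment variable is
set, so something like this (an untested sketch; extend the pattern to
taste) avoids the pointless recompression:

```apache
# Sketch: don't DEFLATE responses whose payload is already compressed.
# mod_deflate honours the no-gzip environment variable.
SetEnvIfNoCase Request_URI "\.(?:gz|tgz|bz2|zip)$" no-gzip
```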

> Imho this whole issue lies with proxy_html using xml2enc to do the content 
> type detection and xml2enc failing to detect the content encoding. I guess 
> all it really takes is to have xml2enc inspect the headers_in to see if there 
> is a "Content-Encoding" header and then add the inflate/deflate filters 
> (unless there is a general reason not to rely on the input headers, see 
> below).

Well in this particular case, surely it lies with the backend?
But is the real issue anything more than an inability to use ProxyHTMLEnable
with compressed contents?  In which case, wouldn't mod_proxy_html be the
place to patch?  Have it test/insert deflate at the same point as it inserts 
xml2enc?
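
Roughly this, I mean (a non-compilable sketch, not a patch: the hook and
the cfg lookup are from memory, and only the ap_add_output_filter calls
are real API):

```c
/* Sketch only -- not compilable as-is.  Inside mod_proxy_html's
 * filter-insertion path, next to where it inserts xml2enc:
 */
static void proxy_html_insert(request_rec *r)
{
    if (cfg->enabled) {  /* i.e. ProxyHTMLEnable On; cfg is hypothetical */
        /* If the backend response declares a Content-Encoding,
         * unpack it first so libxml2 sees plain text.
         */
        if (apr_table_get(r->headers_out, "Content-Encoding")) {
            ap_add_output_filter("INFLATE", NULL, r, r->connection);
        }
        ap_add_output_filter("xml2enc", NULL, r, r->connection);
        ap_add_output_filter("proxy-html", NULL, r, r->connection);
    }
}
```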

> Of course, this whole issue would disappear if inflate/deflate would be run 
> automagically (upon seeing a Content-Encoding header) in general. Anyway, 
> what's the reasoning behind not having them run always and give them the 
> knowledge (e.g. about the input headers) to get out of the way if necessary ?

That's an interesting thought.  mod_deflate will of course do exactly that
if configured, so the issue seems to boil down to configuring that filter chain.
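
With mod_filter one can at least make the INFLATE step conditional on what
the backend actually sent (an untested sketch in 2.4 expression syntax):

```apache
# Untested sketch: run INFLATE only when the backend declared
# a gzip Content-Encoding on the response.
FilterDeclare  unpack  CONTENT_SET
FilterProvider unpack  INFLATE  "%{resp:Content-Encoding} == 'gzip'"
FilterChain    unpack
```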

The ultimate chain here would be:
1.  INFLATE     // unpack compressed contents
2.  xml2enc     // deal with charset for libxml2/mod_proxy_html
3.  proxy-html  // fix URLs
4.  xml2enc     // set an output encoding other than utf-8
5.  DEFLATE     // compress

That's not possible with SetOutputFilter or FilterChain & family, because
you can't configure both instances of xml2enc at once (that's what
ProxyHTMLEnable deals with).  But of those, 4 and 5 seem low-priority
as they're not doing really essential things.

Returning to:
> SetOutputFilter INFLATE;xml2enc;proxy-html;DEFLATE

AFAICS the only thing that's missing is the nonessential step 4 above.

Am I missing something?

-- 
Nick Kew
