Re: mod_cache, mod_deflate and Vary: User-Agent

2009-08-28 Thread Nick Kew


On 28 Aug 2009, at 06:13, toki...@aol.com wrote:



 Brian Akins of Turner Broadcasting, Inc. wrote...

 We are moving towards the 'if you say you support gzip,
 then you get gzip' attitude.


The only approach that makes sense.  Good to hear that from
folks as big as you.


There isn't a browser in the world that can 'Accept Encoding'
successfully for ALL mime types.


Huh?  Whyever not?  Encoding is orthogonal to MIME type,
and for the ability to decode to be dependent on MIME type
would indicate tortuously over-complicated and hopelessly
broken browser design.

--
Nick Kew


Re: mod_cache, mod_deflate and Vary: User-Agent

2009-08-27 Thread tokiley

 William A. Rowe, Jr. wrote...

 I think we blew it :)

 Vary: user-agent is not practical for correcting errant browser behavior.

You have not 'blown it'.

From a certain perspective, it's the only reasonable thing to do.

Everyone keeps forgetting one very important aspect of this issue:
the 'Browsers' themselves participate in the whole 'caching'
scheme, and they are the source of the actual requests, so their
behavior is as much a part of the equation as any inline proxy cache.

There is no real solution to this problem.

The HTTP protocol itself does not have the capability
to deal correctly with compressed variants.

The only decision that anyone needs to make is 'Where is
the pain factor?'.

If you VARY on ANYTHING other than 'User-Agent', then this
might show some reduction of the pain factor at the proxy
level, but you have now exponentially increased the pain
factor at the infamous 'Last Mile'.

Most modern browsers will NOT 'cache' anything that has
a 'Vary:' header naming any field OTHER than 'User-Agent'.
This is as true today as it was 10 years ago.

The following discussion involving myself and some of the 
authors of the SQUID Proxy caching Server took place just 
short of SEVEN (7) YEARS ago but, as unbelievable as it might
seem, is still just as relevant ( and unresolved )...

http://marc.info/?l=apache-modgzip&m=103958533520502&w=2

It's way too long to reproduce here, but here is just
the SUMMARY part. You would have to follow the link
above to read all the gory details...

[snip]

 Hello all.

 This is a continuation of the thread entitled...

 [Mod_gzip] mod_gzip_send_vary=Yes disables caching on IE

 After several hours spent doing my own testing with MSIE and
 digging into MSIE internals with a kernel debugger I think I
 have the answers.

 The news is NOT GOOD.

 I will start with a SUMMARY for those who don't have the
 time to read the whole, ugly story, but for those who want to
 know where the following 'conclusions' are coming from, I
 refer you to the rest of the message and the details.

 SUMMARY

 There is only 1 request header field name that you can use with
 Vary: that will cause MSIE to cache a non-compressed
 response and that is ( drum roll please ) User-Agent.

 If you use ANY other (legal) request header field name in
 a Vary: header then MSIE ( Versions 4, 5 and 6 ) will
 REFUSE to cache that response in the MSIE local cache.

 This is why Jordan is seeing a caching problem and Slava
 is not. Slava is 'accidentally' using the only possible Vary:
 field name that will cause MSIE to behave as it should
 and cache a non-compressed response.

 Jordan is seeing non-compressed responses never being
 cached by MSIE because the responses are arriving
 with something other than Vary: User-Agent, such as
 Vary: Accept-Encoding.

 It should be perfectly legal and fine to send Vary: Accept-Encoding
 on a non-compressed response that can 'Vary' on that field
 value and that response SHOULD be 'cached' by MSIE...
 but so much for assumptions. MSIE will NOT cache this response.

 MSIE will treat ANY field name other than User-Agent
 as if Vary: * ( Vary + STAR ) was used and it will
 NOT cache the non-compressed response.

 The reason the COMPRESSED responses are, in fact,
 always getting cached no matter what Vary: field name
 is present is just as I suspected... it is because MSIE
 decides it MUST cache responses that arrive with
 Content-Encoding: gzip because it MUST have a
 disk ( cache ) file to work with in order to do the
 decompression.

 The problem exists in ALL versions of MSIE but it's
 even WORSE for any version earlier than 5.0. MSIE 4.x
 will not even cache responses with Vary: User-Agent.

 That's it for the SUMMARY.

 The rest of this message contains the gory details.

[/snip]
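The reported findings in that summary can be restated as a small decision function. This is a minimal Python sketch that encodes only the observations quoted above (MSIE 4.x to 6.x, as tested in 2002); the function name and the treatment of a response with no Vary header at all are assumptions, not measured behaviour.

```python
from typing import Iterable, Optional


def msie_caches_response(major_version: int,
                         vary_fields: Iterable[str],
                         content_encoding: Optional[str]) -> bool:
    """Hypothetical model of the MSIE 4-6 cache behaviour reported above."""
    # Reported: gzip-encoded responses are always written to the local cache,
    # because MSIE needs a disk file to decompress from, whatever Vary says.
    if content_encoding is not None and content_encoding.strip().lower() == "gzip":
        return True

    fields = {f.strip().lower() for f in vary_fields if f.strip()}
    # Assumption (not covered by the report): with no Vary header at all,
    # the normal cacheability rules apply, modelled here as "cached".
    if not fields:
        return True

    # Reported: MSIE 4.x will not cache even Vary: User-Agent responses.
    if major_version < 5:
        return False

    # Reported: any field name other than User-Agent is treated like Vary: *.
    return fields == {"user-agent"}


# Vary: Accept-Encoding on an uncompressed response defeats the MSIE 6 cache.
print(msie_caches_response(6, ["Accept-Encoding"], None))  # False
```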

I participated in another lengthy 'offline' discussion about
all this some 3 or 4 years ago, again with the authors of
SQUID. There was still no real resolution to the problem.

The general consensus was that if there is always going to
be a 'pain factor' then it's better to follow one of the
rules of Networking and assume the following...

The least amount of resources will always be present
the closer you get to the last mile.

In other words... it's BETTER to live with some redundant
traffic at the proxy level, where the equipment and bandwidth
are usually more robust and closer to the backbone, than to put
the pain factor onto the 'last mile', where resources are usually
more constrained.

If anyone is going to start dropping in special code
anywhere to 'invisibly handle the problem', my suggestion
would be to look at coming up with a scheme that undoes
the damage these out-of-control, redundant 'User-Agent' strings are
causing. The only thing a proxy cache really needs to know is
whether a certain 'User-Agent' string represents a
different level of DEVCAP than another one. If all that
is changing is a version number and there is no change
with regards to 

Re: mod_cache, mod_deflate and Vary: User-Agent

2009-08-27 Thread Akins, Brian
On 8/26/09 3:20 PM, Paul Querna p...@querna.org wrote:

 I would write little Lua scriptlets that map user agents to two
 buckets: supports gzip, doesn't support gzip.  Store the thing in
 mod_cache only twice, instead of once for every user agent.

We do the same basic thing.  We are moving towards the 'if you say you
support gzip, then you get gzip' attitude.  I think less than 1% of our
clients would be affected, and I think a lot of those are fake agents
anyway.


-- 
Brian Akins
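The 'if you say you support gzip, then you get gzip' policy reduces variant selection to a single question about the Accept-Encoding request header. Below is a minimal Python sketch of that check, assuming the RFC 2616 (section 14.3) q-value rules: a coding listed with q=0 is refused, and "*" covers codings not named explicitly. The function names are illustrative only; they are not an Apache API.

```python
from typing import Dict, Optional


def parse_accept_encoding(value: Optional[str]) -> Dict[str, float]:
    """Map each content-coding in an Accept-Encoding header to its q-value."""
    qvalues: Dict[str, float] = {}
    for item in (value or "").split(","):
        coding, *params = [p.strip() for p in item.split(";")]
        if not coding:
            continue
        q = 1.0  # default when no q parameter is given
        for param in params:
            name, _, raw = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(raw)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        qvalues[coding.lower()] = q
    return qvalues


def choose_encoding(accept_encoding: Optional[str]) -> str:
    """Return "gzip" whenever the client says it accepts gzip, else "identity"."""
    q = parse_accept_encoding(accept_encoding)
    explicit = [q[name] for name in ("gzip", "x-gzip") if name in q]
    if explicit:
        return "gzip" if max(explicit) > 0 else "identity"
    return "gzip" if q.get("*", 0.0) > 0 else "identity"


print(choose_encoding("gzip,deflate"))  # gzip
print(choose_encoding("gzip;q=0"))      # identity
```

The trade-off is the one described above: the small fraction of clients that advertise gzip but mishandle it get broken content, and in exchange a cache only ever needs two variants per URL.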



Re: mod_cache, mod_deflate and Vary: User-Agent

2009-08-27 Thread tokiley


 Brian Akins of Turner Broadcasting, Inc. wrote...

 We are moving towards the 'if you say you support gzip,
 then you get gzip' attitude.

There isn't a browser in the world that can 'Accept Encoding'
successfully for ALL mime types.

Some are better than others, but there are always certain
MIME types that should never be returned with any
'Content-Encoding', regardless of what the browser
is saying.

In that sense, you can never really trust the
'Accept-Encoding: gzip, deflate' header at all.
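In practice, a server that distrusts Accept-Encoding this way ends up combining it with a per-MIME-type exclusion list. Below is a minimal Python sketch of such a policy; the NEVER_COMPRESS list is a hypothetical example (mostly formats that are already compressed), not a list of types any particular browser is known to mishandle.

```python
from typing import Optional

# Hypothetical exclusion list: exact MIME types, or prefixes ending in "/".
NEVER_COMPRESS = ("application/pdf", "application/zip", "image/", "audio/", "video/")


def gzip_qvalue(accept_encoding: Optional[str]) -> float:
    """q-value the client gives to gzip (0.0 if gzip is not listed).

    Simplified: the "*" wildcard is deliberately ignored here.
    """
    for item in (accept_encoding or "").split(","):
        coding, *params = [p.strip() for p in item.split(";")]
        if coding.lower() != "gzip":
            continue
        for param in params:
            name, _, raw = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    return float(raw)
                except ValueError:
                    return 0.0
        return 1.0  # listed without a q parameter
    return 0.0


def should_compress(accept_encoding: Optional[str], content_type: str) -> bool:
    """Compress only if the client accepts gzip AND the type is not excluded."""
    mime = content_type.split(";")[0].strip().lower()
    for excluded in NEVER_COMPRESS:
        if mime == excluded or (excluded.endswith("/") and mime.startswith(excluded)):
            return False
    return gzip_qvalue(accept_encoding) > 0


print(should_compress("gzip, deflate", "text/html; charset=utf-8"))  # True
print(should_compress("gzip, deflate", "application/pdf"))           # False
```

Excluded types are always sent uncompressed and need no Vary at all; only the types that can be compressed still have to carry Vary: Accept-Encoding.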

There is (currently) no mechanism in the HTTP protocol
for a client to specify WHICH MIME types it can
successfully decode.

It was supposed to be an 'all or nothing' DEVCAP
indicator but that's not how things have evolved in
the real world.

There are really only 3 choices...

1. Stick with the original spec and continue to treat
'Accept-Encoding: whatever' as an 'all or nothing' indicator
with regard to possible MIME types, and treat every
complaint of breakage as 'it's not our problem, your
browser is non-compliant'.

2. Change the original spec and add a way for clients
to indicate which MIME types can be successfully
decoded, and then wait for all the resulting support code
to be added to all Servers and Proxies.

3. Do nothing, and let every individual Server owner
continue to find their own solution(s) to the problem(s).

Yours
Kevin Kiley



 

-Original Message-
From: Akins, Brian brian.ak...@turner.com
To: dev@httpd.apache.org dev@httpd.apache.org
Sent: Thu, Aug 27, 2009 9:42 am
Subject: Re: mod_cache, mod_deflate and Vary: User-Agent

On 8/26/09 3:20 PM, Paul Querna p...@querna.org wrote:

 I would write little Lua scriptlets that map user agents to two
 buckets: supports gzip, doesn't support gzip.  Store the thing in
 mod_cache only twice, instead of once for every user agent.

We do the same basic thing.  We are moving towards the 'if you say you
support gzip, then you get gzip' attitude.  I think less than 1% of our
clients would be affected, and I think a lot of those are fake agents
anyway.


-- 
Brian Akins


Re: mod_cache, mod_deflate and Vary: User-Agent

2009-08-26 Thread Paul Querna
On Wed, Aug 26, 2009 at 11:47 AM, William A. Rowe, Jr.
<wr...@rowe-clan.net> wrote:
 I think we blew it :)

 Vary: user-agent is not practical for correcting errant browser behavior.

 For example:

  User-Agent: Mozilla/5.0 Gecko/20090729 Firefox/3.5.2

 produces a myriad of 'variant' flavors when we tag Vary with
 User-Agent to decide whether the deflate/gzip-compressed variant
 or the uncompressed variant should be served.

 What we really meant to do was to determine which Accept-Encoding values
 were invalid based on known browser bugs, and -remove them- from the A-E
 header *prior* to determining the cache handling (quick handler hook) or
 typical content handling.

 Which implies that setenvif + headers need an extra chance to run
 first, ahead of the quick handler.

 Any better suggestions?

Yes, write a Varied header to 'hash' plugin API for mod_cache.

I would write little Lua scriptlets that map user agents to two
buckets: supports gzip, doesn't support gzip.  Store the thing in
mod_cache only twice, instead of once for every user agent.
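Paul's suggestion could look roughly like the following. He proposes Lua scriptlets behind a new mod_cache plugin API; this is only a Python sketch of the mapping logic, with hypothetical function names. The "no gzip" rule is loosely borrowed from the BrowserMatch example in the mod_deflate documentation (Netscape 4.06 to 4.08 get no gzip; MSIE, which also identifies itself as Mozilla/4, is exempt).

```python
import re

# Loosely modelled on the BrowserMatch example in the mod_deflate docs:
# Netscape 4.06-4.08 get no gzip; MSIE also claims "Mozilla/4" but is exempt.
_NO_GZIP = re.compile(r"^Mozilla/4\.0[678]")
_MSIE = re.compile(r"\bMSIE")


def gzip_bucket(user_agent: str) -> str:
    """Collapse any User-Agent string into one of two buckets."""
    if _NO_GZIP.search(user_agent) and not _MSIE.search(user_agent):
        return "no-gzip"
    return "gzip"


def cache_key(url: str, user_agent: str) -> str:
    # Key on the bucket rather than the raw User-Agent, so each URL is
    # stored at most twice instead of once per distinct agent string.
    return url + "#" + gzip_bucket(user_agent)


agents = [
    "Mozilla/5.0 Gecko/20090729 Firefox/3.5.2",
    "Mozilla/5.0 Gecko/20090731 Firefox/3.5.2",
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",
    "Mozilla/4.07 [en] (X11; I; Linux 2.2 i686)",
]
print(len({cache_key("/index.html", ua) for ua in agents}))  # 2
```

As described, the mapping looks only at User-Agent; a real key would presumably also need to reflect whether the request's Accept-Encoding includes gzip at all. It also only helps the local mod_cache, not downstream proxies, which is William's objection in his reply.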


Re: mod_cache, mod_deflate and Vary: User-Agent

2009-08-26 Thread William A. Rowe, Jr.
Paul Querna wrote:
 
 Yes, write a Varied header to 'hash' plugin API for mod_cache.
 
 I would write little Lua scriptlets that map user agents to two
 buckets: supports gzip, doesn't support gzip.  Store the thing in
 mod_cache only twice, instead of once for every user agent.

This doesn't solve the problem of each-and-every downstream proxy
cache storing an excessively large number of copies.  Even if we
strip down comments from the fields before choosing cache entries,
Mozilla's many versions of Mozilla/2.0.3 and Gecko/20090731 tags
are going to continue to proliferate copies.
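The point about stripping comments can be checked directly: removing the parenthesised comments from two Firefox 3.5.2 User-Agent strings still leaves distinct keys, because the Gecko build dates differ. The Python sketch below uses hypothetical helper names and sample agent strings, and assumes comments do not nest. It contrasts that with also dropping the version from each product token, which is closer to the 'same DEVCAP, different version' collapsing suggested elsewhere in this thread.

```python
import re

# A parenthesised comment with no nested parentheses (assumed not to nest).
_COMMENT = re.compile(r"\([^()]*\)")


def strip_comments(ua: str) -> str:
    """Drop "(...)" comments and normalise whitespace."""
    return " ".join(_COMMENT.sub(" ", ua).split())


def strip_versions(ua: str) -> str:
    """Also drop the "/version" part of each product token."""
    return " ".join(tok.split("/", 1)[0] for tok in strip_comments(ua).split())


a = "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.2) Gecko/20090729 Firefox/3.5.2"
b = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.2) Gecko/20090731 Firefox/3.5.2"

print(strip_comments(a) == strip_comments(b))  # False: Gecko build dates differ
print(strip_versions(a))                       # Mozilla Gecko Firefox
```

Stripping versions is far too blunt on its own: it discards exactly the version information (for example Mozilla/4.07 versus Mozilla/4.0) that rules for known browser bugs key on.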

I'm suggesting that this might need to be 'invisibly' handled, not
using Vary:, but by any proxy clever enough to detect the non-conforming
browser and then strip deflate/gzip from its request.  At that point, the
choice-of-two becomes obvious to all proxies and back end servers with
this knowledge.  If this is unknown to an earlier proxy, the client
could get the broken deflate/gzip content, but that seems unavoidable.

Honestly, I can't see a way to honor HTTP/1.1 cache negotiation goals
while minimizing cache pollution.

I did consider a module (Lua or otherwise) that would 'interfere' in
the initial quick handler phase just to work out broken user agents,
rather than carry the entire weight of setenvif/headers to the quick
handler phase.
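The alternative of rewriting the request before any cache sees it can be sketched as below. This is a minimal Python illustration, not mod_cache, mod_setenvif or mod_headers code; headers are modelled as a plain dict with canonical capitalisation, and the "broken" rule is a hypothetical example reusing the Netscape 4.06 to 4.08 pattern from the mod_deflate documentation's BrowserMatch sample.

```python
import re
from typing import Dict

# Hypothetical "known broken gzip" rule (Netscape 4.06-4.08, but not MSIE).
_BROKEN_UA = re.compile(r"^Mozilla/4\.0[678]")
_STRIP_CODINGS = {"gzip", "x-gzip", "deflate"}


def sanitize_request(headers: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of headers with gzip/deflate removed for broken agents."""
    out = dict(headers)  # never mutate the caller's headers
    ua = out.get("User-Agent", "")
    if not _BROKEN_UA.search(ua) or "MSIE" in ua:
        return out
    accept = out.get("Accept-Encoding")
    if accept is None:
        return out
    kept = [item.strip() for item in accept.split(",")
            if item.strip()
            and item.split(";")[0].strip().lower() not in _STRIP_CODINGS]
    if kept:
        out["Accept-Encoding"] = ", ".join(kept)
    else:
        del out["Accept-Encoding"]  # nothing left: client gets identity
    return out


req = {"User-Agent": "Mozilla/4.07 [en] (X11; I; Linux)",
       "Accept-Encoding": "gzip, deflate"}
print(sanitize_request(req))
```

Once this runs ahead of the cache lookup, the choice between the compressed and uncompressed variant depends only on Accept-Encoding, so the server can send Vary: Accept-Encoding rather than Vary: User-Agent. As noted above, a client behind a proxy that does not perform this rewrite can still receive the broken deflate/gzip content.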




Re: mod_cache, mod_deflate and Vary: User-Agent

2009-08-26 Thread Paul Querna
On Wed, Aug 26, 2009 at 2:50 PM, William A. Rowe, Jr.
<wr...@rowe-clan.net> wrote:
 Paul Querna wrote:

 Yes, write a Varied header to 'hash' plugin API for mod_cache.

 I would write little Lua scriptlets that map user agents to two
 buckets: supports gzip, doesn't support gzip.  Store the thing in
 mod_cache only twice, instead of once for every user agent.

 This doesn't solve the problem of each-and-every downstream proxy
 cache storing an excessively large number of copies.  Even if we
 strip down comments from the fields before choosing cache entries,
 Mozilla's many versions of Mozilla/2.0.3 and Gecko/20090731 tags
 are going to continue to proliferate copies.

 I'm suggesting that this might need to be 'invisibly' handled, not
 using Vary:, but by any proxy clever enough to detect the non-conforming
 browser and then strip deflate/gzip from its request.  At that point, the
 choice-of-two becomes obvious to all proxies and back end servers with
 this knowledge.  If this is unknown to an earlier proxy, the client
 could get the broken deflate/gzip content, but that seems unavoidable.

 Honestly, I can't see a way to honor HTTP/1.1 cache negotiation goals
 while minimizing cache pollution.

There isn't.  So, optimize your own cache, and strip caching headers
from responses sent to downstream proxies.

Maybe Waka can fix it.