Re: mod_cache, mod_deflate and Vary: User-Agent
On 28 Aug 2009, at 06:13, toki...@aol.com wrote:

Brian Akins of Turner Broadcasting, Inc. wrote...

We are moving towards the 'if you say you support gzip, then you get gzip' attitude.

The only approach that makes sense. Good to hear that from folks as big as you.

There isn't a browser in the world that can 'Accept Encoding' successfully for ALL mime types.

Huh? Whyever not? Encoding is orthogonal to MIME type, and for the ability to decode to be dependent on MIME type would indicate tortuously over-complicated and hopelessly broken browser design.

-- Nick Kew
Re: mod_cache, mod_deflate and Vary: User-Agent
William A. Rowe, Jr. wrote...

I think we blew it :) Vary: user-agent is not practical for correcting errant browser behavior.

You have not 'blown it'. From a certain perspective, it's the only reasonable thing to do.

Everyone keeps forgetting one very important aspect of this issue and that is the fact that the 'Browsers' themselves are participating in the whole 'caching' scheme and that they are the source of the actual requests, so their behavior is as much a part of the equation as any inline proxy cache.

There is no real solution to this problem. The HTTP protocol itself does not have the capability to deal with things correctly with regards to compressed variants.

The only decision that anyone needs to make is 'Where is the pain factor?'.

If you VARY on ANYTHING other than 'User-Agent' then this might show some reduction of the pain factor at the proxy level but you have now exponentially increased the pain factor at the infamous 'Last Mile'.

Most modern browsers will NOT 'cache' anything that has a 'Vary:' header OTHER than 'User-Agent:'. This is as true today as it was 10 years ago.

The following discussion involving myself and some of the authors of the SQUID Proxy caching Server took place just short of SEVEN (7) YEARS ago but, as unbelievable as it might seem, is still just as relevant ( and unresolved )...

http://marc.info/?l=apache-modgzip&m=103958533520502&w=2

It's way too long to reproduce here but here is just the SUMMARY part. You would have to access the link above to read all the gory details...

[snip]

Hello all.

This is a continuation of the thread entitled...

[Mod_gzip] mod_gzip_send_vary=Yes disables caching on IE

After several hours spent doing my own testing with MSIE and digging into MSIE internals with a kernel debugger I think I have the answers.

The news is NOT GOOD.
I will start with a SUMMARY first for those who don't have the time to read the whole, ugly story but for those who want to know where the following 'conclusions' are coming from I refer you to the rest of the message and the detail.

SUMMARY

There is only 1 request header value that you can use with Vary: that will cause MSIE to cache a non-compressed response and that is ( drum roll please ) User-Agent.

If you use ANY other (legal) request header field name in a Vary: header then MSIE ( Versions 4, 5 and 6 ) will REFUSE to cache that response in the MSIE local cache.

This is why Jordan is seeing a caching problem and Slava is not. Slava is 'accidentally' using the only possible Vary: field name that will cause MSIE to behave as it should and cache a non-compressed response.

Jordan is seeing non-compressed responses never being cached by MSIE because the responses are arriving with something other than Vary: User-Agent like Vary: Accept-Encoding.

It should be perfectly legal and fine to send Vary: Accept-Encoding on a non-compressed response that can 'Vary' on that field value and that response SHOULD be 'cached' by MSIE... but so much for assumptions. MSIE will NOT cache this response.

MSIE will treat ANY field name other than User-Agent as if Vary: * ( Vary + STAR ) was used and it will NOT cache the non-compressed response.

The reason the COMPRESSED responses are, in fact, always getting cached no matter what Vary: field name is present is just as I suspected... it is because MSIE decides it MUST cache responses that arrive with Content-Encoding: gzip because it MUST have a disk ( cache ) file to work with in order to do the decompression.

The problem exists in ALL versions of MSIE but it's even WORSE for any version earlier than 5.0. MSIE 4.x will not even cache responses with Vary: User-Agent.

That's it for the SUMMARY. The rest of this message contains the gory details.
[/snip]

I participated in another lengthy 'offline' discussion about all this some 3 or 4 years ago again with the authors of SQUID. There was still no real resolution to the problem.

The general consensus was that if there is always going to be a 'pain factor' then it's better to follow one of the rules of Networking and assume the following...

The least amount of resources will always be present the closer you get to the last mile.

In other words... it's BETTER to live with some redundant traffic at the proxy level, where the equipment and bandwidth is usually more robust and closer to the backbone, than to put the pain factor onto the 'last mile' where resources are usually more constrained.

If anyone is going to start dropping some special code anywhere to 'invisibly handle the problem' my suggestion would be to look at coming up with a scheme that undoes the damage these out-of-control redundant 'User-Agent' strings are causing.

The only thing a proxy cache really needs to know is whether a certain 'User-Agent' string represents a different level of DEVCAP than another one. If all that is changing is a version number and there is no change with regards to
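The MSIE behaviour reported in the quoted summary can be written down as a small decision function. This is only a Python model of what the message claims for MSIE 4, 5 and 6; it has not been checked against a real browser. The function name, the treatment of a response with no Vary header, and the decision to ignore every other cache-related header are assumptions made for illustration.

```python
def msie_caches_response(msie_major, vary=None, content_encoding=None):
    """Model of the MSIE 4-6 local-cache behaviour described in the summary.

    Assumption: all other cache-related headers are ignored, and a response
    without a Vary header is treated as cacheable.
    """
    # Reported: gzip responses are always written to the cache, whatever
    # Vary says, because MSIE needs a disk file to decompress from.
    if content_encoding and content_encoding.strip().lower() == "gzip":
        return True

    names = {n.strip().lower() for n in (vary or "").split(",") if n.strip()}
    if not names:
        return True  # assumption: no Vary header, nothing to object to

    # Reported: MSIE 4.x will not even cache Vary: User-Agent.
    if msie_major < 5:
        return False

    # Reported: any field name other than User-Agent is treated like "Vary: *".
    return names == {"user-agent"}
```

Under this model, `msie_caches_response(6, "Accept-Encoding")` is False while `msie_caches_response(6, "User-Agent")` is True, which matches the Jordan and Slava cases in the summary.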
Re: mod_cache, mod_deflate and Vary: User-Agent
On 8/26/09 3:20 PM, Paul Querna p...@querna.org wrote:

I would write little lua scriptlets that map user agents to two buckets: supports gzip, doesn't support gzip. Store the thing in mod_cache only twice, instead of once for every user agent.

We do the same basic thing. We are moving towards the 'if you say you support gzip, then you get gzip' attitude. I think less than 1% of our clients would be affected, and I think a lot of those are fake agents anyway.

-- Brian Akins
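The 'if you say you support gzip, then you get gzip' rule amounts to trusting the Accept-Encoding header alone and ignoring User-Agent. A minimal Python sketch of that check follows; the function name is hypothetical and the parsing is deliberately simple (it honours q-values and the `*` wildcard, nothing more).

```python
def accepts_gzip(accept_encoding):
    """Return True if the Accept-Encoding value allows gzip with q > 0."""
    if not accept_encoding:
        return False
    wildcard_ok = False
    for item in accept_encoding.split(","):
        parts = [p.strip() for p in item.split(";")]
        coding = parts[0].lower()
        q = 1.0  # a coding listed without a q-value is fully acceptable
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # unparsable q-value: treat as not acceptable
        if coding in ("gzip", "x-gzip"):
            return q > 0  # an explicit gzip entry decides the matter
        if coding == "*":
            wildcard_ok = q > 0
    return wildcard_ok
```

With this rule a cache only ever needs two variants per URL, keyed on the result of `accepts_gzip(...)`.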
Re: mod_cache, mod_deflate and Vary: User-Agent
Brian Akins of Turner Broadcasting, Inc. wrote...

We are moving towards the 'if you say you support gzip, then you get gzip' attitude.

There isn't a browser in the world that can 'Accept Encoding' successfully for ALL mime types.

Some are better than others but there are always certain mime types that should never be returned with any 'Content Encoding' regardless of what the browser is saying.

In that sense, you can never really trust the 'Accept-encoding: gzip, deflate' header at all.

There is (currently) no mechanism in the HTTP protocol for a client to specify WHICH mime types it can successfully decode.

It was supposed to be an 'all or nothing' DEVCAP indicator but that's not how things have evolved in the real world.

There are really only 3 choices...

1. Stick with the original spec and continue to treat 'Accept-encoding: whatever' as an 'all or nothing' indicator with regards to possible mime types and treat every complaint of breakage as 'it's not our problem, your browser is non-compliant'.

2. Change the original spec and add a way for clients to indicate which mime types can be successfully decoded and then wait for all the resulting support code to be added to all Servers and Proxies.

3. Do nothing, and let every individual Server owner continue to find their own solution(s) to the problem(s).

Yours
Kevin Kiley

-Original Message-
From: Akins, Brian brian.ak...@turner.com
To: dev@httpd.apache.org dev@httpd.apache.org
Sent: Thu, Aug 27, 2009 9:42 am
Subject: Re: mod_cache, mod_deflate and Vary: User-Agent

On 8/26/09 3:20 PM, Paul Querna p...@querna.org wrote:

I would write little lua scriptlets that map user agents to two buckets: supports gzip, doesn't support gzip. Store the thing in mod_cache only twice, instead of once for every user agent.

We do the same basic thing. We are moving towards the 'if you say you support gzip, then you get gzip' attitude. I think less than 1% of our clients would be affected, and I think a lot of those are fake agents anyway.
-- Brian Akins
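Choice 3 above ('every Server owner finds their own solution') usually ends up as a per-type exclusion list on the server. A rough Python sketch of that policy is below. The list of media types is purely hypothetical, since the thread does not name the problem types, and q-values in Accept-Encoding are ignored to keep the example short.

```python
# Hypothetical exclusion list: types the server never compresses,
# whatever the client claims. Not taken from the thread.
NEVER_ENCODE = ("image/", "video/", "audio/", "application/pdf", "application/zip")


def choose_content_encoding(accept_encoding, content_type):
    """Return "gzip" or None for a response of the given Content-Type."""
    # Drop parameters such as "; charset=utf-8" before matching.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type.startswith(NEVER_ENCODE):
        return None  # server-side override of the client's claim

    # Simplification: q-values are not inspected here.
    codings = [c.split(";", 1)[0].strip().lower()
               for c in (accept_encoding or "").split(",")]
    return "gzip" if "gzip" in codings else None
```

For example, `choose_content_encoding("gzip, deflate", "image/png")` returns None even though the client advertised gzip.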
Re: mod_cache, mod_deflate and Vary: User-Agent
On Wed, Aug 26, 2009 at 11:47 AM, William A. Rowe, Jr. wr...@rowe-clan.net wrote:

I think we blew it :) Vary: user-agent is not practical for correcting errant browser behavior. For example;

User-Agent: Mozilla/5.0 Gecko/20090729 Firefox/3.5.2

produces a myriad number of 'variant' flavors when tagging Vary with the User-Agent when determining if the deflate/gzip compression should be served, or the uncompressed variant.

What we really meant to do was to determine which Accept-Encoding values were invalid based on known browser bugs, and -remove them- from the A-E header *prior* to determining the cache handling (quick handler hook) or typical content handling.

Which implies that setenvif + headers need an extra chance to run really first in front of the quick handler.

Any better suggestions?

Yes, write a Varied header to 'hash' plugin API for mod_cache.

I would write little lua scriptlets that map user agents to two buckets: supports gzip, doesn't support gzip. Store the thing in mod_cache only twice, instead of once for every user agent.
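The two-bucket idea can be sketched in a few lines. The example is in Python rather than Lua, and it is not a mod_cache API: `gzip_bucket`, `cache_key` and the agent patterns are all hypothetical names and values chosen for illustration. The point is only that the cache key depends on the bucket, not on the raw User-Agent string.

```python
import re

# Illustrative patterns for agents placed in the "no-gzip" bucket.
# These are assumptions for the sketch, not a vetted list of broken browsers.
NO_GZIP_AGENTS = [
    re.compile(r"^Mozilla/4\.0[678]"),
    re.compile(r"MSIE [1-4]\."),
]


def gzip_bucket(user_agent):
    """Map any User-Agent string to one of two buckets."""
    ua = user_agent or ""
    if any(pattern.search(ua) for pattern in NO_GZIP_AGENTS):
        return "no-gzip"
    return "gzip"  # assumption: unknown agents go to the gzip bucket


def cache_key(url, user_agent):
    """Cache key that varies on the bucket instead of the full header."""
    return (url, gzip_bucket(user_agent))
```

Two Firefox builds that differ only in their Gecko date tag then share one cache entry, so a URL is stored at most twice.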
Re: mod_cache, mod_deflate and Vary: User-Agent
Paul Querna wrote:

Yes, write a Varied header to 'hash' plugin API for mod_cache. I would write little lua scriptlets that map user agents to two buckets: supports gzip, doesn't support gzip. Store the thing in mod_cache only twice, instead of once for every user agent.

This doesn't solve the problem of each-and-every downstream proxy cache storing an excessively large number of copies. Even if we strip down comments from the fields before choosing cache entries, Mozilla's many versions of Mozilla/2.0.3 and Gecko/20090731 tags are going to continue to proliferate copies.

I'm suggesting that this might need to be 'invisibly' handled, not using Vary:, but by any proxy clever enough to detect the non-conforming browser to then strip the request to deflate/gzip. At that point, the choice-of-two becomes obvious to all proxies and back end servers with this knowledge. If this is unknown to an earlier proxy, the client could get the broken deflate/gzip content, but that seems unavoidable.

Honestly, I can't see a way to honor HTTP/1.1 cache negotiation goals while minimizing cache pollution.

I did consider a module (lua or otherwise) that would 'interfere' in the initial quick handler phase just to work out broken user agents, rather than carry the entire weight of setenvif/headers to the quick handler phase.
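The 'invisible' handling described here means rewriting the request before any cache lookup, so that later stages only ever see a trustworthy Accept-Encoding. A rough Python sketch under stated assumptions: the pattern for non-conforming agents is hypothetical, headers are modelled as a plain dict, and nothing here corresponds to an actual httpd hook.

```python
import re

# Hypothetical pattern for agents known to mishandle compressed content.
BROKEN_GZIP_AGENTS = re.compile(r"^Mozilla/4\.0[678]")


def normalize_request_headers(headers):
    """Return a copy of the headers with Accept-Encoding removed
    when the User-Agent matches a known-broken agent."""
    normalized = dict(headers)

    # Header names are case-insensitive, so look them up by lowered name.
    user_agent = ""
    for name, value in headers.items():
        if name.lower() == "user-agent":
            user_agent = value

    if BROKEN_GZIP_AGENTS.search(user_agent):
        for name in list(normalized):
            if name.lower() == "accept-encoding":
                del normalized[name]  # the request now asks for identity only
    return normalized
```

After this step the response would only need to vary on Accept-Encoding, which gives every proxy and back end that applies the same rule the choice-of-two.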
Re: mod_cache, mod_deflate and Vary: User-Agent
On Wed, Aug 26, 2009 at 2:50 PM, William A. Rowe, Jr. wr...@rowe-clan.net wrote:

Paul Querna wrote:

Yes, write a Varied header to 'hash' plugin API for mod_cache. I would write little lua scriptlets that map user agents to two buckets: supports gzip, doesn't support gzip. Store the thing in mod_cache only twice, instead of once for every user agent.

This doesn't solve the problem of each-and-every downstream proxy cache storing an excessively large number of copies. Even if we strip down comments from the fields before choosing cache entries, Mozilla's many versions of Mozilla/2.0.3 and Gecko/20090731 tags are going to continue to proliferate copies.

I'm suggesting that this might need to be 'invisibly' handled, not using Vary:, but by any proxy clever enough to detect the non-conforming browser to then strip the request to deflate/gzip. At that point, the choice-of-two becomes obvious to all proxies and back end servers with this knowledge. If this is unknown to an earlier proxy, the client could get the broken deflate/gzip content, but that seems unavoidable.

Honestly, I can't see a way to honor HTTP/1.1 cache negotiation goals while minimizing cache pollution.

There isn't. So, optimize your cache, strip caching headers to downstream proxies. Maybe Waka can fix it.