Re: Wrong etag sent with mod_deflate
Henrik Nordstrom wrote:
> But the unique identity of the response entity is defined by request-URI + ETag and/or Content-Location. The cache is not supposed to evaluate Accept-* headers in determining the entity identity; only the origin server does that.

However, on an initial (i.e., non-conditional) request we do not have an ETag from the client; we only have information like Host, URI, Accept-*, etc. So how would the cache identify which entity to serve in this case?

> Please see RFC2616 13.6 Caching Negotiated Responses; it explains in full detail how the RFC intends caches to operate with respect to Vary, ETag and Content-Location.

I have read it many times. In our case (cnn.com, etc.) we have decided to be RFC compliant from the client to the cache server. From the cache to the origin, however, we are not as concerned. In a reverse-proxy cache, this is not a big deal. However, in a normal forward-proxy cache, where one does not control both cache and origin, one must be more careful.

--
Brian Akins
Chief Operations Engineer
Turner Digital Media Technologies
Re: Wrong etag sent with mod_deflate
On Wed 2006-12-13 at 08:51 -0500, Brian Akins wrote:
> However, on an initial (i.e., non-conditional) request we do not have an ETag from the client; we only have information like Host, URI, Accept-*, etc. So how would the cache identify which entity to serve in this case?

You have the URL and the other cached entities of that URL. It does not matter whether the client request was conditional or not. The conditions in the request apply to the response, deciding whether it should be a 200 or a 304. They are not selectors for which entity to respond with. The selected response entity is always the same for the same request, with or without conditions.

Obviously, on the very first request for a given URL you have nothing, and that request is forwarded without any added condition. After that, however, every Vary cache miss on that URL becomes an If-None-Match conditional. It asks the server whether any of the cached entity variants is applicable to the current request.

> I have read it many times. In our case (cnn.com, etc.) we have decided to be RFC compliant from the client to the cache server. From the cache to the origin, however, we are not as concerned.

And you are free to. A reverse proxy is by definition the origin server. How it finds the content is of no concern to the RFC. It just happens to be HTTP rather than plain files, NFS, a database or whatever.

> In a reverse-proxy cache, this is not a big deal. However, in a normal forward-proxy cache, where one does not control both cache and origin, one must be more careful.

Indeed. On the other hand, it is actually reverse proxy configurations that have pushed for 13.6 compliance in Squid. For processing-intensive servers, evaluating If-None-Match is a lot easier than rendering the entity again. When you depend on Accept-Language + Accept-Encoding + User-Agent, the number of request combinations becomes quite significant, especially if there are perhaps only two or three variants under the URL.
Regards
Henrik
Re: Wrong etag sent with mod_deflate
Henrik Nordstrom wrote:
> On Mon 2006-12-11 at 14:25 -0500, Brian Akins wrote:
>> So, multiple variants of the same object can have the same ETag, but still be different cached objects.
>
> Your implementation ignores RFC 2616 13.6 Caching Negotiated Responses, but is otherwise fine. It's functionally compliant but not as effective as it could be.

That was a simplified explanation; we actually do not store a cache entry for every single variant. In our case the only thing we ever care about is whether or not you support gzip. So all the variants for "Vary: User-Agent, Accept-Encoding" actually boil down to two variants: gzip or no gzip.

One of the major reasons we quit using Squid was its support for Vary. (This was pre-3.0, so things may have changed.) Of course, at the time httpd wasn't any better, but it was a lot easier to hack ;)

> Variants are identified by ETag or Content-Location. Only if there is neither an ETag nor a Content-Location in the response entity is the response entity identified by the Vary request headers.

Only conditional requests from clients, generally, have If-None-Match headers. So the only way for a cache, on an initial request from a client, to determine what object to serve is to use the client-supplied information, which doesn't include an ETag. You therefore usually have to rely on the URI first, and then the Vary information.

--
Brian Akins
Chief Operations Engineer
Turner Digital Media Technologies
Re: Wrong etag sent with mod_deflate
On Tue 2006-12-12 at 09:20 -0500, Brian Akins wrote:
> Only conditional requests from clients, generally, have If-None-Match headers.

Correct. It's a conditional. These days you also see them from Squid, btw.

> So the only way for a cache, on an initial request from a client, to determine what object to serve is to use the client-supplied information, which doesn't include an ETag. You therefore usually have to rely on the URI first, and then the Vary information.

Indeed. This is always the case. If-None-Match MUST NOT be used to identify which response to use. It's a conditional only.

But the unique identity of the response entity is defined by request-URI + ETag and/or Content-Location. The cache is not supposed to evaluate Accept-* headers in determining the entity identity; only the origin server does that.

The identity of the entity is important for:

- Cache correctness, making sure updates invalidate cached copies where needed.
- Avoiding duplicated storage.

There may be any number of request header combinations, in any Vary dimensions, all mapping to the same entity.

This logic is not at all unique to Accept-Encoding. The way a cache is supposed to operate applies equally to all headers indicated by Vary. The spec does not make any distinction between Accept-Encoding, Accept-Language, User-Agent, etc. in how caches are supposed to operate. It all boils down to the entity identified by URI + ETag and/or Content-Location. These are returned in 200 and 304 responses, which allows the cache to map requests to entities.

Please see RFC2616 13.6 Caching Negotiated Responses; it explains in full detail how the RFC intends caches to operate with respect to Vary, ETag and Content-Location.

Regards
Henrik
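The following minimal C sketch makes Henrik's point concrete. The cache keys stored entities on request-URI + ETag. A separate table maps each Vary'd request-header combination to one of those entities. All type and function names (`cached_entity`, `vary_mapping`, `map_request`) are hypothetical. Content-Location is left out for brevity. This is not Squid's or mod_cache's actual data structure.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One stored entity, identified by request-URI + ETag (this sketch leaves
 * Content-Location out). */
struct cached_entity {
    const char *uri;
    const char *etag;
    const char *body;
};

/* Maps one combination of Vary'd request headers to a stored entity.
 * Holding a pointer means many combinations can share one entity,
 * which is the "avoiding duplicated storage" point above. */
struct vary_mapping {
    const char *uri;
    const char *vary_key;
    const struct cached_entity *entity;
};

static const struct cached_entity *
find_entity(const struct cached_entity *store, size_t n,
            const char *uri, const char *etag)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(store[i].uri, uri) == 0 && strcmp(store[i].etag, etag) == 0)
            return &store[i];
    return NULL;
}

/* Record that requests with this Vary key get the entity whose ETag the
 * origin returned in its 200/304. Returns 0 on success, -1 if no such
 * entity is stored or the mapping table is full. */
static int map_request(struct vary_mapping *maps, size_t *nmaps, size_t cap,
                       const struct cached_entity *store, size_t nstore,
                       const char *uri, const char *vary_key,
                       const char *etag)
{
    const struct cached_entity *e = find_entity(store, nstore, uri, etag);
    if (e == NULL || *nmaps >= cap)
        return -1;
    maps[*nmaps].uri = uri;
    maps[*nmaps].vary_key = vary_key;
    maps[*nmaps].entity = e;
    (*nmaps)++;
    return 0;
}
```

Take two requests whose Accept-Encoding values differ (`gzip, deflate` and `gzip`) but for which the origin names the same ETag. Both map to the single stored gzip entity, so nothing is stored twice.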
Re: Wrong etag sent with mod_deflate
This is not a response to any post on this subject, but more of a comment. Here is a real-world example of how we use deflate and ETags with our cache. (Note this is very similar to mod_cache, but I do not know the inner workings of it as well.)

1. Generate key from URI and ap_get_server_name.
2. Open cached object. Is it Vary? If no, go to step 5.
3. It is Vary: generate a new key.
4. Open cached object.
5. Check expiry time; exit if expired.
6. Load headers.
7. Call ap_meets_conditions (ETags, IMS, etc.). If the conditions are met, return 304 (or whatever).
8. If the conditions are not met, serve from cache.

So, multiple variants of the same object can have the same ETag, but still be different cached objects.

This probably has no bearing on the current conversation, but perhaps I am not fully appreciating the core of the debate?

--
Brian Akins
Chief Operations Engineer
Turner Digital Media Technologies
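Brian's eight steps can be sketched in C roughly as follows. This is a simplified model, not his cache's code or mod_cache's:

- The object store is a plain array.
- Only one Vary'd header is handled.
- A string compare on a single If-None-Match ETag stands in for `ap_meets_conditions()`.

The names `cache_obj`, `make_key`, `open_obj` and `lookup` are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical in-memory cache object. A "Vary marker" stored under the
 * primary key has vary_header set; it tells the lookup to build a second
 * key. Real variants have vary_header == NULL. */
struct cache_obj {
    const char *key;
    const char *vary_header;   /* e.g. "Accept-Encoding", or NULL */
    time_t expires;
    const char *etag;          /* NULL if the stored response had none */
};

enum lookup_result { LOOKUP_MISS, LOOKUP_EXPIRED, LOOKUP_304, LOOKUP_SERVE };

/* Steps 1 and 3: primary key = server name + URI; the secondary key also
 * appends the request's value of the Vary'd header. */
static void make_key(char *out, size_t n, const char *server,
                     const char *uri, const char *vary_value)
{
    if (vary_value != NULL)
        snprintf(out, n, "%s%s|%s", server, uri, vary_value);
    else
        snprintf(out, n, "%s%s", server, uri);
}

static const struct cache_obj *open_obj(const struct cache_obj *objs,
                                        size_t n, const char *key)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(objs[i].key, key) == 0)
            return &objs[i];
    return NULL;
}

/* Steps 1-8 for a single Vary'd header. vary_value is the request's value
 * of that header (NULL if absent); if_none_match is one client ETag or
 * NULL. A plain string compare stands in for ap_meets_conditions(). */
static enum lookup_result lookup(const struct cache_obj *objs, size_t n,
                                 const char *server, const char *uri,
                                 const char *vary_value,
                                 const char *if_none_match, time_t now)
{
    char key[256];
    make_key(key, sizeof key, server, uri, NULL);                  /* 1 */
    const struct cache_obj *o = open_obj(objs, n, key);            /* 2 */
    if (o == NULL)
        return LOOKUP_MISS;
    if (o->vary_header != NULL) {                                  /* 3 */
        make_key(key, sizeof key, server, uri,
                 vary_value != NULL ? vary_value : "");
        o = open_obj(objs, n, key);                                /* 4 */
        if (o == NULL)
            return LOOKUP_MISS;
    }
    if (o->expires <= now)                                         /* 5 */
        return LOOKUP_EXPIRED;
    /* 6-7: headers are already in o; check the client's condition */
    if (if_none_match != NULL && o->etag != NULL &&
        strcmp(if_none_match, o->etag) == 0)
        return LOOKUP_304;
    return LOOKUP_SERVE;                                           /* 8 */
}
```

Consider a Vary marker under `www.example.com/a` with variants under `www.example.com/a|gzip` and `www.example.com/a|`. Both variants can carry the same ETag and still be separate objects, which is the situation Brian describes. A real implementation would normalize the header value first so that `gzip, deflate` and `gzip` do not produce separate keys; Brian, for instance, reduces it to gzip or no-gzip.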
Re: Wrong etag sent with mod_deflate
Let me preface all comments by saying that I AGREE with BOTH Roy and Henrik...

If Apache is sending the exact same (strong) ETag value for both a compressed and an identity variant of the same entity... then, according to the current RFC content, that is broken behavior and it should be fixed.

You could take the part of the RFC that talks specifically about how weak ETags might seem ideal for compressed variants, and argue it against Henrik's view that a compressed variant should ALWAYS be treated as a separate (unique) HTTP entity. But I don't want to go there. Not now, anyway.

Personally, I tend to agree with the following concept. Even when DCE (Dynamic Content Encoding) is employed, any code doing DCE (versus Transfer Encoding) should make the dynamically generated entity appear as if it were simply a disk-based (separate) resource.

DCE is, after all, just a magic trick. It makes it APPEAR to end-users as if compressed variants of entities actually physically exist and are being sent back to anyone ready/able/willing to receive them...

...and it's a GOOD TRICK, when done correctly.

Roy wrote...
> In other words, Henrik has it right. It is our responsibility to assign different etags to different variants because doing otherwise may result in errors on shared caches that use the etag as a variant identifier.

See above. Totally agree.

Justin wrote...
> As Kevin mentioned, Squid is only using the ETag and is ignoring the Vary header. That's the crux of the broken behavior on their part.

Roy wrote...
> Then they will still be broken regardless of what we do here. It simply isn't a relevant issue.

It's relevant to the extent that I think some things are still missing from the RFCs with regard to all this. That is why a piece of software like SQUID might be doing the wrong thing as well.

The best way I can elaborate on that feeling is to just walk through Roy's scenario...

Roy wrote...
> Unlike Squid, RFC compliance is part of our mission, at least when it isn't due to a bug in the spec. This is not a bug in the spec. A high-efficiency response cache is expected to have multiple representations of a given resource cached.

No doubt.

> The cache key is the URI.

Yes.

> If the set of varying header field values that generated the cached response is different from the request set,

...as when one browser asks for a URI and sends "Accept-Encoding: gzip", and another asks for the same URI and does NOT supply "Accept-Encoding: gzip"...

> then a conditional GET request is made containing ALL of the cached entity tags in an If-None-Match field (in accordance with the Vary requirements).

...and, currently, suppose the cache has stored both a compressed and a non-compressed version of the same entity received from Apache (sic: mod_deflate). Then the same (strong) ETag appears in the conditional GET for both of the cached variants.

Hmm... begins to look like a problem... but is it really?...

> If the server says that any one of the representations, as indicated by the ETag in a 304 response, is okay,

"Okay" means fresh.

In the case of a DCE-encoded variant, an argument could be made here that it doesn't make a bit of difference whether the ETags for the compressed and non-compressed variants are the 'same' or 'different'.

All the cache really wants to know is: is the ORIGINAL (uncompressed) version of this response fresh or not?

The compressed variant should ALWAYS be just the encoded version of the same original uncompressed entity.

If the original uncompressed version (indicated by strong ETag 1) is not fresh, then there is no possible way for any compressed variant of the same entity (marked by the same strong ETag 1) to be fresh. It's just not possible.

So, in essence, when the Vary: has to do with just compression, the compressed and uncompressed variants are married in a way that is perhaps not covered by the existing ETag RFC specifications.
The ETag CAN/SHOULD be the same, because there is no way for the original (strong ETag) to become stale without the other representation also becoming stale. These kinds of variants are synced in a way perhaps not (currently) covered by the ETag specs.

> then the cached representation with that entity tag is sent to the user-agent regardless of the Vary calculation.

"Sent to" means the cache has received its 304 response and decided what it CAN/SHOULD send back to the user, right?

Well... follow the argument above about how certain variants are synced together. Then even if two variants in the cache share the same strong ETag... how can the cache send back the wrong thing, or NOT pay attention to the Vary calculation on its end?

I don't know the exact details of the field problem that Henrik is trying to solve, but it seems to me that EVEN THOUGH the compressed and non-compressed variants might happen to share the same (strong) ETag... if SQUID is delivering
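The conditional-GET step in Roy's scenario can be illustrated with a small C sketch, which also shows why a shared ETag is a problem for it. The cache lists every stored variant's ETag in If-None-Match. On a 304, it serves the stored variant whose ETag matches the one in the 304. `struct variant`, `build_if_none_match` and `select_on_304` are hypothetical names. Real caches also track Content-Location and freshness.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct variant {
    const char *etag;
    const char *content_encoding;   /* "gzip" or "identity" */
};

/* Build "If-None-Match: e1, e2, ..." from every cached variant's ETag.
 * Returns 0 on success, -1 if the buffer is too small. */
static int build_if_none_match(char *out, size_t n,
                               const struct variant *v, size_t nv)
{
    int w = snprintf(out, n, "If-None-Match: ");
    if (w < 0 || (size_t)w >= n)
        return -1;
    size_t used = (size_t)w;
    for (size_t i = 0; i < nv; i++) {
        w = snprintf(out + used, n - used, "%s%s", i ? ", " : "", v[i].etag);
        if (w < 0 || (size_t)w >= n - used)
            return -1;
        used += (size_t)w;
    }
    return 0;
}

/* On a 304, the cache serves the stored variant whose ETag equals the
 * ETag in the 304, regardless of the Vary calculation. Returns how many
 * stored variants match and points *chosen at the first one. */
static size_t select_on_304(const struct variant *v, size_t nv,
                            const char *etag_304,
                            const struct variant **chosen)
{
    size_t matches = 0;
    *chosen = NULL;
    for (size_t i = 0; i < nv; i++) {
        if (strcmp(v[i].etag, etag_304) == 0) {
            if (matches == 0)
                *chosen = &v[i];
            matches++;
        }
    }
    return matches;
}
```

When the identity and gzip variants carry distinct tags, the 304 names exactly one stored variant. With a shared tag, `select_on_304` finds two candidates, and nothing in the 304 says which encoding to send. According to Henrik, Squid 2.6 sidesteps this by keeping only the last response it saw with that ETag.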
Re: Wrong etag sent with mod_deflate
On 12/09/2006 06:52 AM, Roy T. Fielding wrote:
> The best solution is to not mess with content-encoding at all, which gets us out of both this consistency problem and related problems with the entity-header fields (content-md5, signatures, etc.). That is why transfer encoding was invented in the first place. We should have an implementation of deflate as a transfer encoding, but it should be configurable independent of the existing filter. Some people will want TE specifically to avoid the addition of Vary and all the other problems that content-changing filters cause. For example, an additional directive option for CE, TE, or either.

I think fixing the current CE filter is easier right now than adding the option above. The option can be added in a second step, and it sounds like a good idea to me.

> The existing filter needs to modify the ETag field value (and any other entity-dependent values that we can think of) or be removed as a feature. Weak etags are not a solution -- being able to make range requests of large cached representations requires a strong etag, and it really isn't hard to provide one. It is better to not deflate the response at all than to interfere with caching.

Would the following patch address all your points for a CE mod_deflate filter?

Index: modules/filters/mod_deflate.c
===================================================================
--- modules/filters/mod_deflate.c	(revision 484803)
+++ modules/filters/mod_deflate.c	(working copy)
@@ -320,6 +320,7 @@
     if (!ctx) {
         char *token;
         const char *encoding;
+        char *etag;
 
         /* only work on main request/no subrequests */
         if (r->main != NULL) {
@@ -483,7 +484,26 @@
         else {
             apr_table_mergen(r->headers_out, "Content-Encoding", "gzip");
         }
+        /*
+         * Unset headers which are no longer valid after we have compressed
+         * the content.
+         */
         apr_table_unset(r->headers_out, "Content-Length");
+        apr_table_unset(r->headers_out, "Content-MD5");
+        /* Adjust ETag if present (on a pool copy; table values are const) */
+        etag = apr_pstrdup(r->pool, apr_table_get(r->headers_out, "ETag"));
+        if (etag) {
+            if (*etag) {
+                /* Remove the '"' at the end of the ETag */
+                etag[strlen(etag) - 1] = '\0';
+                apr_table_set(r->headers_out, "ETag",
+                              apr_pstrcat(r->pool, etag, "-gzip\"", NULL));
+            }
+            else {
+                /* Does not seem to be a valid ETag. So remove it. */
+                apr_table_unset(r->headers_out, "ETag");
+            }
+        }
 
         /* initialize deflate output buffer */
         ctx->stream.next_out = ctx->buffer;

Regards

Rüdiger
Re: Wrong etag sent with mod_deflate
On 12/9/06, Ruediger Pluem wrote:
>> The existing filter needs to modify the ETag field value (and any other entity-dependent values that we can think of) or be removed as a feature. Weak etags are not a solution -- being able to make range requests of large cached representations requires a strong etag, and it really isn't hard to provide one. It is better to not deflate the response at all than to interfere with caching.
>
> Would the following patch address all your points for a CE mod_deflate filter?

No - this patch breaks conditional GETs, which is what I'm against.

See, the problem here is that you have to teach ap_meets_conditions() about this. An ETag of 1234-gzip also needs to satisfy a conditional request when the ETag seen by ap_meets_conditions() is 1234. In other words, ap_meets_conditions() also needs to strip -gzip, if present, before it does the ETag comparison. But the issue is that there is no real way for us to implement this without a butt-ugly hack.

However, I disagree with Roy in that we most certainly *do* treat the ETag values as opaque. Subversion has its own ETag values. Roy's position only works if you assume the core is assigning the ETag value, which has a set format, rather than a third-party module. IMO, any valid solution that we deploy must work *independently* of what any module may set ETag to. It is perfectly valid for a 3rd-party module to include -gzip at the end of its ETag. For example, if you had a file called foo-gzip in revision 10, SVN would assign the ETag 10//foo-gzip. (And I could construct a conflict where httpd would hork the ETag incorrectly for any arbitrary value.)

-- justin
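Justin's objection to suffix stripping can be shown with a naive C helper that removes a trailing `-gzip` from a quoted ETag before comparison. `strip_gzip_suffix` is hypothetical. The `10//foo-gzip` value follows Justin's description of Subversion's ETag format rather than anything verified here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Naive "undecoration": copy etag into out (size n) and, if it ends in
 * -gzip" , drop the -gzip part while keeping the closing quote.
 * Returns 1 if a suffix was stripped, 0 if not, -1 if out is too small. */
static int strip_gzip_suffix(const char *etag, char *out, size_t n)
{
    static const char suffix[] = "-gzip\"";
    size_t len = strlen(etag);
    size_t slen = sizeof suffix - 1;          /* 6 characters */
    if (n < len + 1)
        return -1;
    memcpy(out, etag, len + 1);
    if (len > slen && strcmp(etag + len - slen, suffix) == 0) {
        out[len - slen] = '"';                /* re-close the quote */
        out[len - slen + 1] = '\0';
        return 1;
    }
    return 0;
}
```

The helper cannot tell a tag that mod_deflate decorated from one that a module such as Subversion produced with `-gzip` already in it. In the second case it strips to a tag that belongs to a different resource.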
Re: Wrong etag sent with mod_deflate
On 12/9/06, Roy T. Fielding wrote:
> The best solution is to not mess with content-encoding at all, which gets us out of both this consistency problem and related problems with the entity-header fields (content-md5, signatures, etc.). That is why transfer encoding was invented in the first place.

We don't live in a world that uses Transfer Encoding for gzip. Firefox, MSIE, and Opera don't support it. So dropping Content Encoding support in mod_deflate is a non-starter.

> We should have an implementation of deflate as a transfer encoding, but it should be configurable independent of the existing filter. Some people will want TE specifically to avoid the addition of Vary and all the other problems that content-changing filters cause. For example, an additional directive option for CE, TE, or either.

As I said earlier, mod_deflate could respond with a gzip transfer-coding if TE is sent; there's no need for a directive here. That also sidesteps the ETag violation. It's a trivial addition of just a few lines to the current filter, and it gives the one cache in the world that doesn't support Vary a way out. So I feel this resolves the RFC violation that Squid sees, as long as Squid sends "TE: gzip" instead.

> The existing filter needs to modify the ETag field value (and any other entity-dependent values that we can think of) or be removed as a feature. Weak etags are not a solution -- being able to make range requests of large cached representations requires a strong etag, and it really isn't hard to provide one. It is better to not deflate the response at all than to interfere with caching.

As Rüdiger's patch shows, removing the ETag or appending junk in mod_deflate isn't enough. You have to teach ap_meets_conditions() how to know what it is looking at. I'm against adding ugly hacks there that only teach it how to handle -gzip. (mod_deflate could in theory just as well send deflate compression.) So any solution within ap_meets_conditions() needs to be generic, not a one-off just for mod_deflate.

> In any case, I won't accept anyone's votes on this issue until there is a patch that can be voted on, and the technical considerations of security and correctness take priority over other trade-offs.

RTC. The patch you have been outlining is straightforward but ultimately broken, because you haven't sketched a way to handle the ap_meets_conditions() problem. I'm merely informing you that I will veto any approach that breaks conditional GETs with real browsers. I couldn't care less what a broken proxy cache does (especially if we can give it a way not to be broken) if the alternative means that mod_deflate no longer supports browser caches.

-- justin
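The negotiation Justin proposes could look roughly like the C sketch below: use a gzip transfer-coding when the client sends `TE: gzip`, and otherwise fall back to gzip content-coding via Accept-Encoding. `choose_coding` and `header_has_token` are hypothetical. The sketch ignores q-values entirely (so `gzip;q=0` still counts as an offer) and only makes the decision. Framing is out of scope: under RFC 2616 section 3.6, a message body with a transfer-coding applied must also include chunked, as the last coding, unless the message is ended by closing the connection.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if the comma-separated header value lists `token`
 * (case-insensitive). Parameters after ';' (including q-values) are
 * ignored, so "gzip;q=0" still counts as present in this sketch. */
static int header_has_token(const char *hdr, const char *token)
{
    size_t tlen = strlen(token);
    const char *p = hdr;
    if (p == NULL)
        return 0;
    while (*p != '\0') {
        while (*p == ' ' || *p == '\t' || *p == ',')
            p++;
        const char *start = p;
        while (*p != '\0' && *p != ',' && *p != ';')
            p++;
        const char *end = p;
        while (end > start && (end[-1] == ' ' || end[-1] == '\t'))
            end--;
        if ((size_t)(end - start) == tlen) {
            size_t i = 0;
            while (i < tlen && tolower((unsigned char)start[i]) ==
                               tolower((unsigned char)token[i]))
                i++;
            if (i == tlen)
                return 1;
        }
        while (*p != '\0' && *p != ',')
            p++;                       /* skip any ;parameters */
    }
    return 0;
}

enum coding_choice { SEND_IDENTITY, SEND_TE_GZIP, SEND_CE_GZIP };

/* Prefer a gzip transfer-coding when the client sends "TE: gzip" (the
 * entity, and so its ETag, stays unchanged); otherwise use gzip
 * content-coding if Accept-Encoding offers it. */
static enum coding_choice choose_coding(const char *te,
                                        const char *accept_encoding)
{
    if (header_has_token(te, "gzip"))
        return SEND_TE_GZIP;
    if (header_has_token(accept_encoding, "gzip"))
        return SEND_CE_GZIP;
    return SEND_IDENTITY;
}
```

Only the TE branch leaves the entity, and therefore its ETag, untouched. The Accept-Encoding branch is where the ETag question in this thread applies.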
Re: Wrong etag sent with mod_deflate
On 12/09/2006 03:23 PM, Justin Erenkrantz wrote:
> On 12/9/06, Ruediger Pluem wrote:
>> Would the following patch address all your points for a CE mod_deflate filter?
>
> No - this patch breaks conditional GETs, which is what I'm against.

OK, to be honest my question was directed more at Roy than at you. I wanted to understand Roy's ideas and plans from a patch-level perspective. I was pretty sure that you would not like it, as you have clearly said so before.

> See, the problem here is that you have to teach ap_meets_conditions() about this. An ETag of 1234-gzip also needs to satisfy a conditional request when the ETag seen by ap_meets_conditions() is 1234. In other words, ap_meets_conditions() also needs to strip -gzip, if present, before it does the ETag comparison. But the issue is that there is no real way for us to implement this without a butt-ugly hack.

Thanks for the pointer to ap_meets_conditions. So content compressed by mod_deflate would no longer pass conditional requests based on ETags. That would be bad. Would it help if we simply unset the ETag in mod_deflate? mod_filter does this in these situations. Or does this have other nasty side effects?

So what I understand from the current discussion is:

1. Using TE instead of CE would be RFC compliant and would relieve us of many problems. The exception is that none of the major browsers can handle it, which would effectively make mod_deflate useless.

2. There are two different points of view in the CE case. Roy and Henrik say that a strong ETag arriving at mod_deflate must be replaced with a different strong ETag within mod_deflate (e.g. by adding -gzip to it): since mod_deflate is doing CE, the entities before and after mod_deflate are different and require different ETags. Justin, on the other hand, says that it is sufficient to convert a strong ETag into a weak one, right?

Regards

Rüdiger
Re: Wrong etag sent with mod_deflate
On 12/9/06, Ruediger Pluem wrote:
> Thanks for the pointer to ap_meets_conditions. So content compressed by mod_deflate would no longer pass conditional requests based on ETags. That would be bad. Would it help if we simply unset the ETag in mod_deflate? mod_filter does this in these situations. Or does this have other nasty side effects?

AIUI, many caches do not allow the response to be cached at all if it doesn't have an ETag. This is why it was brought up that not doing deflate at all might be better in some cases than removing the ETag.

> So what I understand from the current discussion is:
>
> 1. Using TE instead of CE would be RFC compliant and would relieve us of many problems. The exception is that none of the major browsers can handle it, which would effectively make mod_deflate useless.

Right.

> 2. There are two different points of view in the CE case. Roy and Henrik say that a strong ETag arriving at mod_deflate must be replaced with a different strong ETag within mod_deflate (e.g. by adding -gzip to it): since mod_deflate is doing CE, the entities before and after mod_deflate are different and require different ETags. Justin, on the other hand, says that it is sufficient to convert a strong ETag into a weak one, right?

In an ideal world, I think a weak ETag would be the 'right' thing. However, the current spec doesn't allow conditional GETs to work with weak ETags, so mod_deflate can only produce strong ETags if conditional GETs are to keep working. Yet to make conditional GETs work while also creating a different ETag, the transformation has to be reversible. I believe that may become a sticking point.

(BTW, I disagree with Roy and Henrik that the transformation mod_deflate applies changes the actual meaning of the content; but that's a largely irrelevant and academic point for this list.)

HTH.

-- justin
Re: Wrong etag sent with mod_deflate
On 12/09/2006 07:02 PM, Justin Erenkrantz wrote:
> On 12/9/06, Ruediger Pluem wrote:
>> Thanks for the pointer to ap_meets_conditions. So content compressed by mod_deflate would no longer pass conditional requests based on ETags. That would be bad. Would it help if we simply unset the ETag in mod_deflate? mod_filter does this in these situations. Or does this have other nasty side effects?
>
> AIUI, many caches do not allow the response to be cached at all if it doesn't have an ETag. This is why it was brought up that not doing

AFAICS this is not the case for mod_cache. mod_cache can cache the response, provided all other required conditions are met, as long as at least one of the following headers is present:

- ETag
- Last-Modified
- Expires

Regards

Rüdiger
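Rüdiger's description of mod_cache's rule can be written as a small C predicate over a response's headers. It models only the rule as stated in his message: at least one of ETag, Last-Modified or Expires must be present. `struct header`, `find_header` and `has_cache_validator_or_expiry` are hypothetical names, not mod_cache internals. The other storability checks are not modelled.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

struct header {
    const char *name;
    const char *value;
};

/* Case-insensitive ASCII equality; HTTP header names are case-insensitive. */
static int name_eq(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0' &&
           tolower((unsigned char)*a) == tolower((unsigned char)*b)) {
        a++;
        b++;
    }
    return *a == '\0' && *b == '\0';
}

static const char *find_header(const struct header *h, size_t n,
                               const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (name_eq(h[i].name, name))
            return h[i].value;
    return NULL;
}

/* The rule as stated above: storable only if at least one of ETag,
 * Last-Modified or Expires is present (other conditions not modelled). */
static int has_cache_validator_or_expiry(const struct header *h, size_t n)
{
    return find_header(h, n, "ETag") != NULL ||
           find_header(h, n, "Last-Modified") != NULL ||
           find_header(h, n, "Expires") != NULL;
}
```

Under this rule, a deflated response whose ETag was unset still qualifies, as long as Last-Modified or Expires survives.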
Re: Wrong etag sent with mod_deflate
On Fri 2006-12-08 at 15:35 -0800, Justin Erenkrantz wrote:
> As Kevin mentioned, Squid is only using the ETag and is ignoring the Vary header. That's the crux of the broken behavior on their part. If they want to point out minor RFC violations in Apache, then we can play that game as well. (mod_cache deals with this Vary/ETag case just fine, FWIW.)

We are not at all ignoring Vary. We use If-None-Match to ask the server which of the N already cached entities belonging to the resource URI is valid for this specific request. In doing so, we indirectly learn the server-side content negotiation logic in use.

> The compromise I'd be willing to accept is to have mod_deflate support the 'TE: gzip' request header and add 'gzip' to the Transfer-Encoding bit - and to prefer that over any Accept-Encoding bits that are sent.

That would be a great move if you cannot make it behave correctly in the content space. But if you make mod_deflate behave according to the RFC, then sending Content-Encoding: gzip is fine by me. TE is a much better fit from the RFC point of view, though.

> The ETag can clearly remain the same in that case - even as a strong ETag.

Yes.

> So, Squid can change to send along TE: gzip (if it isn't already).

TE: gzip is likely to appear in 3.1.

> And, everyone else who sends Accept-Encoding gets the result in a way that doesn't pooch their cache if they try to do a later conditional request.

As long as mod_deflate continues ignoring the RFC with respect to ETag, there will be conflicts with various cache implementations.

> Is that acceptable?
> -- justin

Intentionally not following a MUST-level requirement in the RFC is not an acceptable solution in my eyes. For one thing, even if you ignore everyone else, it would make it impossible for Apache + mod_deflate to claim RFC 2616 HTTP/1.1 compliance.

Regards
Henrik
Re: Wrong etag sent with mod_deflate
On Sat 2006-12-09 at 15:23 +0100, Justin Erenkrantz wrote:
> See, the problem here is that you have to teach ap_meets_conditions() about this. An ETag of 1234-gzip also needs to satisfy a conditional request when the ETag seen by ap_meets_conditions() is 1234. In other words, ap_meets_conditions() also needs to strip -gzip, if present, before it does the ETag comparison. But the issue is that there is no real way for us to implement this without a butt-ugly hack.

Be careful there... Blindly stripping the decoration alone won't work out. Consider, for example, If-None-Match. Specifically, If-None-Match with the ETag of the gzip variant should only return 304 if the request would cause Apache to send the gzipped variant of the entity.

"If-None-Match: <list of etags>" returns a 304, carrying the single correct ETag, if any of the ETags in the header matches the current response to the current request.

Regards
Henrik
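Henrik's rule for If-None-Match can be sketched in C. Instead of stripping a decoration from the client's tags, the sketch works out the ETag of the variant this request would actually receive and compares the listed tags against that. The `-gzip` decoration is the hypothetical scheme from Rüdiger's patch. `selected_etag`, `inm_matches` and `should_send_304` are illustrative names. The list parser handles neither backslash escapes inside tags nor weak comparison.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* The ETag this request would actually receive: the identity tag, or the
 * decorated one if negotiation would pick the gzip variant. */
static void selected_etag(char *out, size_t n, const char *opaque, int gzip)
{
    snprintf(out, n, "\"%s%s\"", opaque, gzip ? "-gzip" : "");
}

/* 1 if any entity-tag in an If-None-Match value is byte-identical to
 * `current` (or the value is "*"). Weak prefixes are compared literally
 * and backslash escapes inside tags are not handled. */
static int inm_matches(const char *inm, const char *current)
{
    size_t clen = strlen(current);
    const char *p = inm;
    while (*p != '\0') {
        while (*p == ' ' || *p == '\t' || *p == ',')
            p++;
        if (*p == '*')
            return 1;
        const char *start = p;
        if (p[0] == 'W' && p[1] == '/')
            p += 2;
        if (*p != '"')
            return 0;              /* end of list, or malformed value */
        p++;
        while (*p != '\0' && *p != '"')
            p++;
        if (*p == '"')
            p++;
        if ((size_t)(p - start) == clen && memcmp(start, current, clen) == 0)
            return 1;
    }
    return 0;
}

/* Answer 304 only if the list names the ETag of the variant that this
 * particular request would be sent. */
static int should_send_304(const char *inm, const char *opaque,
                           int request_gets_gzip)
{
    char current[128];
    selected_etag(current, sizeof current, opaque, request_gets_gzip);
    return inm_matches(inm, current);
}
```

The second test call shows the case Henrik warns about. A client holding the gzip variant's tag gets no 304 when this request would be served the identity variant. As Henrik's last paragraph says, a 304 would carry the single matching ETag (`current` in this sketch).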
Re: Wrong etag sent with mod_deflate
On Sat 2006-12-09 at 19:02 +0100, Justin Erenkrantz wrote:
> AIUI, many caches do not allow the response to be cached at all if it doesn't have an ETag.

Most still cache it, but Mozilla, for example, has bugs in its Vary handling if there is no ETag and the conditions change...

> In the ideal world, I think a weak ETag would be the 'right' thing

I don't have an opinion on whether you return a strong or a weak ETag. But it must still be different from the ETag of the identity-encoded object, not just the same ETag flagged as weak.

Your main decision on whether the ETag of the mod_deflate-generated entity should be weak or strong should be:

a) If the original entity's ETag is weak, then the one for the mod_deflate-generated entity MUST be weak as well.
b) If mod_deflate cannot be trusted to generate the exact same octet representation on each request, then the ETag of the generated entity MUST be weak.

Otherwise the ETag SHOULD be strong.

> however, the current spec doesn't allow conditional GETs to work with weak ETags.

Err... weak ETags are allowed in If-None-Match for GET/HEAD.

Regards
Henrik
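Henrik's rules (a) and (b) can be combined with the ETag decoration into one C helper. `make_gzip_etag` and its `deterministic` flag are hypothetical. The flag stands for "mod_deflate produces the same octets for this entity on every request", which a real filter would have to establish on its own.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build the ETag for the gzip-encoded entity from the original ETag
 * ("1234" or W/"1234", quotes included). The result always differs from
 * the original (the -gzip decoration); it is weak if the original was
 * weak (rule a) or if the output is not byte-for-byte reproducible
 * (rule b), and strong otherwise. Returns 0 on success, -1 if the input
 * is not a quoted entity-tag or out is too small. */
static int make_gzip_etag(char *out, size_t n, const char *etag,
                          int deterministic)
{
    int weak = (strncmp(etag, "W/", 2) == 0);    /* rule (a) */
    const char *q = weak ? etag + 2 : etag;      /* opening quote */
    size_t qlen = strlen(q);
    if (qlen < 2 || q[0] != '"' || q[qlen - 1] != '"')
        return -1;
    if (!deterministic)
        weak = 1;                                /* rule (b) */
    int w = snprintf(out, n, "%s\"%.*s-gzip\"", weak ? "W/" : "",
                     (int)(qlen - 2), q + 1);
    return (w < 0 || (size_t)w >= n) ? -1 : 0;
}
```

In every case the result differs from the identity tag, which is Henrik's base requirement; only the `W/` prefix varies.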
Re: Wrong etag sent with mod_deflate
On Fri 2006-12-08 at 15:40 -0800, Justin Erenkrantz wrote:
> I think we all (hopefully) agree that a weak ETag is ideally what mod_deflate should add.

Please read RFC2616 13.6 Caching Negotiated Responses for an in-depth description of how caches should handle Vary.

And please stop lying about Squid. If you think something in our cache implementation of Vary/ETag is not right, then say what, and back it up with an RFC reference.

My base requirement is that you comply with If-None-Match. For this you MUST return a different ETag. Whether it's weak or strong does not matter to me, as the main concern for a cache is GET/HEAD requests. Flagging the existing ETag as weak does not make it a different ETag. If-None-Match on GET/HEAD allows the weak comparison function, in which weakness is ignored:

13.3.3 Weak and Strong Validators
- The weak comparison function: in order to be considered equal, both validators MUST be identical in every way, but either or both of them MAY be tagged as "weak" without affecting the result.

Regards
Henrik
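The two comparison functions from RFC 2616 section 13.3.3 fit in a few lines of C. They show why flagging the existing tag as weak is not enough. `weak_equal` and `strong_equal` are illustrative names. Tags are compared as whole strings, including their quotes.

```c
#include <assert.h>
#include <string.h>

/* Skip an optional W/ prefix. */
static const char *opaque_part(const char *etag)
{
    return strncmp(etag, "W/", 2) == 0 ? etag + 2 : etag;
}

/* Weak comparison: identical apart from weakness flags. */
static int weak_equal(const char *a, const char *b)
{
    return strcmp(opaque_part(a), opaque_part(b)) == 0;
}

/* Strong comparison: identical, and neither tag is weak. */
static int strong_equal(const char *a, const char *b)
{
    return strncmp(a, "W/", 2) != 0 && strncmp(b, "W/", 2) != 0 &&
           strcmp(a, b) == 0;
}
```

`W/"1234"` and `"1234"` compare equal under the weak function that If-None-Match on GET/HEAD may use. A client or cache holding the identity tag would therefore still match the deflated response. A distinct tag such as `W/"1234-gzip"` does not.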
Re: Wrong etag sent with mod_deflate
lör 2006-12-09 klockan 05:44 -0500 skrev [EMAIL PROTECTED]: It's relevant to the extent that I think there are still some things missing from the RFCs with regards to all this which is why a piece of software like SQUID might be doing the wrong thing as well. Ater reading the RFC on this topic many many times I can not agree that it's that incomplete. The scheme set by the RFC is quite complete as long as you stay with strong ETags, allowing for cache correctness, update serialization and many good things. Situations requiring weak etags also works out pretty well for cache correctness thanks to If-None-Match, but not other operations as they are banned from both non-GET/HEAD requests and If-Match conditions. ...and, currently, if the cache has stored both a compressed and and non-compressed version of the same entity received from Apache ( sic: mod_deflate ) then the same ( strong ) ETag is returned in the conditional GET for both of the cached variants. Hmmm... begins to look like a problem... but is it really?... It is. See 13.6 Caching Negotiated Responses (all of it). And then skim over 14.26 If-None-Match, and finally read 10.3.5 304 Not Modified. Then piece them together. Also take note that nowhere is there any requirement on the cache to evaluate any server driven content negotiation inputs (Accept-XXX etc). This responsibility is fully at the origin server and reflected back via ETag. Caches evaluate Vary in finding the correct response entity. If the server says that any one of the representations, as indicated by the ETag in a 304 response, is okay, okay means fresh. Not only that, it also tells which entity among the N cached ones is valid to send as response to this request. happen to share the same (strong) ETag... if SQUID is delivering stale compressed variants when a 304 response says that the original identity variant is not fresh then that's just a colossal screw-up in the caching code itself. 
The 304 says Send the entity with the ETag XXX, its still fresh. Nothing more. If does not indicate if this is a identiy of gzip encoded, neither the content length, content type or anything other relevant to the actual content besides the ETag and/or Content-Location. Regardless of what the server says... how could you ever get into a situation where you would consider a compressed variant of an entity fresh when the identity version is now stale? As HTTP did not consider dynamic content encoding it sees the two entities as different objects (i.e. file and file.gz) and does not enforce a strict synchronization between the two. The only requirement set in the RFC is that the origin server SHOULD make sure the two representations on the server is in synch. is seriously confused even if the ETags are the same and the cache is sending back stale compressed variants when the identity variant ( strong ETag value ) is also stale. I don't know what condition you refer to here. the Squid cache (2.6) only remembers the last seen of the two as the later response with the same ETag overwrites the first.. There's still something missing from the specs or something. Not that I can tell. When an exact, literal interpretation of a spec tends to defy common sense... my instinct is to suspect the spec itself. In what way? There is something in your reasoning I don't get. DCE ( Dynamic Content Encoding ) is a valid concept even if it wasn't sufficiently imagined at the time the specs were codified. It works. It works WELL... and it is something that OUGHT to always be possible if the RFCs mean anything at all. And it is possible. Just that you need to pay attention to Content-Location ETag Content-MD5 as all of these is affected by dynamically altering the entity by server driven content negotiation with static or dynamic recoding of the entity. 
One of the main prime directives for developing Apache 2.0 at all was to finally re-org the IO stream so that schemes like DCE could be done more easily than they already were in the 1.3.x framework. Mission was accomplished. Filtering was born. It would be a shame to consider abandoning one of the very concepts that gave birth to Apache 2.0 for the sake of a few more lines of code that could take it into the end zone. Agreed. No argument here. Transfer-encoding is about a DECADE overdue now. And as already indicated it should be a piece of cake to add to mod_deflate, and as HTTP support evolves in clients and caches it is likely to lessen the complexity of dealing with mod_deflate and conditionals considerably. In the case of compressed entities it would still be a good idea to always add a standard header which indicates the original uncompressed content-length ( if it's possible to know it ). There is no such header in HTTP, but you are free to propose one. But it's worth noting that this information also exists in the gzip encoding. Current specs do not handle
Re: Wrong etag sent with mod_deflate
Justin wrote... No - this patch breaks conditional GETs which is what I'm against. See the problem here is that you have to teach ap_meets_conditions() about this. An ETag of 1234-gzip needs to also satisfy a conditional request when the ETag at the time ap_meets_conditions() runs is 1234. In other words, ap_meets_conditions() also needs to strip -gzip if it is present before it does the ETag comparison. But, the issue is that there is no real way for us to implement this without a butt-ugly hack. However, I disagree with Roy in that we most certainly *do* treat the ETag values as opaque - Subversion has its own ETag values - Roy's position only works if you assume the core is assigning the ETag value which has a set format - not a third-party module. IMO, any valid solution that we deploy must work *independently* of what any module may set ETag to. It is perfectly valid for a 3rd-party module to include -gzip at the end of their ETag. ...or -bzip2. mod_bzip2 has been working fine for almost a year now and presents the same issue Justin is talking about here. It (can) generate its own ETag values, if you want it to ( configurable ), and ap_meets_conditions isn't going to know what to strip or not strip. Yours Kevin
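One way to read the ap_meets_conditions() problem is that the server must compare the client's tags against the ETag the response will actually carry, which is the core tag plus a suffix only when this particular response is going to be gzip-encoded. The Python sketch below assumes a "-gzip" suffix appended inside the quotes; the function names are hypothetical and none of this is httpd's API.

```python
import re

def parse_tags(header):
    """Extract entity tags from an If-None-Match value (escaped quotes are not handled)."""
    return re.findall(r'(?:W/)?"[^"]*"', header)

def variant_etag(core_etag, will_gzip, suffix="-gzip"):
    """ETag the response will carry: the core tag, or the core tag with a suffix inside the quotes."""
    if not will_gzip:
        return core_etag
    return core_etag[:-1] + suffix + '"'  # works for both "x" and W/"x"

def weak_equal(a, b):
    """Weak comparison (RFC 2616 13.3.3): ignore a W/ prefix on either side."""
    def strip(tag):
        return tag[2:] if tag.startswith("W/") else tag
    return strip(a) == strip(b)

def if_none_match_hits(core_etag, will_gzip, if_none_match):
    """True if a GET's If-None-Match names the variant this request would get (send 304)."""
    if if_none_match.strip() == "*":
        return True
    target = variant_etag(core_etag, will_gzip)
    return any(weak_equal(target, tag) for tag in parse_tags(if_none_match))
```

Kevin's objection survives the sketch: the suffix is fixed, so a module that mints its own tags, or a filter such as mod_bzip2 using a different suffix, would need its own way of computing variant_etag().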
Re: Wrong etag sent with mod_deflate
And please stop lying about Squid. C'mon Henrik. No one is intentionally trying to LIE about Squid. If you are referring to Justin quoting ME let me supply a big fat MEA CULPA here and say right now that I haven't looked at the SQUID Vary/ETag code since the last major release and I DO NOT KNOW FOR SURE what SQUID is doing ( or not doing ) if/when it sees the same (strong) ETag for both a compressed and an identity version of the same entity. Period. I DO NOT KNOW FER SURE. I should have made that perfectly clear along with any opinion previously offered. I apologize for that. I also DID already state clearly in another post... I don't know the exact details of the exact field problem that Henrik is trying to solve... Keyphrase --don't know the exact details In my other posts, I was suggesting, however, that even if an upstream content server ( Apache ) is not sending separate unique ETags I am still having a hard time understanding why that would cause SQUID to deliver the wrong Varied response back to the user. Something is nagging at me telling me that EVEN IF the same (strong) ETag happens to be on both a compressed and a non-compressed version of the SAME ENTITY that there shouldn't be a big problem in the field ( sic: A user not getting what they asked for ). A compressed version of an entity IS the same entity... for all intents and purposes... it just has compression applied. One cannot possibly become stale without the other also being stale at the same exact moment in time. If you think something in our cache implementation of Vary/ETag is not right then say what and back it up with RFC reference. At the moment... yes... I do... but if you read my other posts I also have a feeling the reason I can't quote you Verse and Chapter from an RFC is because I have a sneaking suspicion that there is something missing from the ETag/Vary scheme that can lead to problems like this... and it's NOT IN ANY RFC YET. 
It has something to do with being too literal about a spec and ignoring common sense. In other words... you may be doing exactly what hours and hours of reading an RFC seems to be telling you you SHOULD do... but there still might be something else that OUGHT to be done. I hope the discussion continues. This is something that has been lurking for years now and it needs to get resolved. There will always be the chance that some upstream server will ( mistakenly? ) keep the same (strong) ETag on a compressed variant. People are not perfect and they make mistakes. I still think that even when that happens any caching software should follow the "be lenient in what you accept and strict in what you send" rule and still use the other information available to it ( sic: What the client really asked for and expects ) and do the right thing. Only the cache knows what the client is REALLY asking for. Yours... Kevin
Re: Wrong etag sent with mod_deflate
On Sat 2006-12-09 at 20:38 -0500, [EMAIL PROTECTED] wrote: If you are referring to Justin quoting ME let me supply a big fat MEA CULPA here and say right now that I haven't looked at the SQUID Vary/ETag code since the last major release and I DO NOT KNOW FOR SURE what SQUID is doing ( or not doing ) if/when it sees the same (strong) ETag for both a compressed and an identity version of the same entity. That's not the problem. The problem is that Apache tells us that we should use whatever we got first on all subsequent responses. The chain of events leading to the problem is as follows: 1. We forward request A. Let's say this claims Accept-Encoding: gzip. 2. Apache mod_deflate returns a gzipped entity with ETag 6bf1f7-6-1b6d6340 and Vary: Accept-Encoding. 3. We get another request with a different Accept-Encoding value. This gets forwarded to Apache with an If-None-Match header listing the ETags of the entities we have, i.e. If-None-Match 6bf1f7-6-1b6d6340. 4. The entity hasn't changed and Apache responds with a 304 ETag 6bf1f7-6-1b6d6340 telling us that the valid response entity for this request is the previously received response with ETag 6bf1f7-6-1b6d6340, and any updated HTTP headers for that response. The problem arises in '4'. Period. I DO NOT KNOW FER SURE. Then stop saying that Squid is broken, does not implement X or "broken clients such as Squid". All I ask. Fine to say that you do not understand why it is a problem for Squid. In my other posts, I was suggesting, however, that even if an upstream content server ( Apache ) is not sending separate unique ETags I am still having a hard time understanding why that would cause SQUID to deliver the wrong Varied response back to the user. Simply because Apache explicitly tells it to do exactly that in its 304 response. A compressed version of an entity IS the same entity... Nope. It's a different representation of the same resource, but not the same entity in terms of HTTP.
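The four steps can be replayed against a toy cache that keeps one URI's variants in a dictionary keyed by ETag. This is a simplified model assumed for illustration, not Squid's code; the body is a placeholder built with Python's gzip module.

```python
import gzip

# Toy model: the variants of one URI held by a cache, keyed by ETag.
variants = {}

# Steps 1-2: a request with Accept-Encoding: gzip gets a gzip body and ETag E.
E = '"6bf1f7-6-1b6d6340"'
variants[E] = ("gzip", gzip.compress(b"hello\n"))

# Step 3: a request without gzip support is forwarded listing every stored ETag.
if_none_match = ", ".join(variants)

# Step 4: the file is unchanged, so the origin answers 304 with ETag E.
# Following RFC 2616 13.6, the cache serves the entry that ETag names.
served_encoding, served_body = variants[E]
print("If-None-Match:", if_none_match, "-> serving", served_encoding)

# Had the origin answered 200 with an identity body and the same ETag instead,
# storing it under the same key would overwrite the gzip entry.
variants[E] = ("identity", b"hello\n")
```

The cache never consults Accept-Encoding at step 4; the ETag in the 304 is the only selector, so a shared tag hands gzip bytes to a client that did not ask for them.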
This is the key difference between Content-Encoding and Transfer-Encoding. Content-Encoding is a property of the entity. Transfer-Encoding is a property of how the message is sent, just like chunked, with no implications on the entity. The problem arises from trying to use Content-Encoding as if it was Transfer-Encoding. Many years ago we had the same discussion about Vary, and when the dust settled all understood the problem of not sending correct Vary in the responses. Now as the cache implementation is evolving we are hitting the exact same problem again, in a different form this time, due to ETag collisions. I am sorry that we did not realize the full extent of the brokenness of these responses the first time when Vary was discussed. for all intents and purposes... it just has compression applied. One cannot possibly become stale without the other also being stale at the same exact moment in time. HTTP does not make this strict freshness relation between entities of the same URI, but that's a different question and generally not a big problem. At the moment... yes... I do... but if you read my other posts I also have a feeling the reason I can't quote you Verse and Chapter from an RFC is because I have a sneaking suspicion that there is something missing from the ETag/Vary scheme that can lead to problems like this... and it's NOT IN ANY RFC YET. And what I am saying is that Apache mod_deflate is violating a MUST level requirement on ETag in the RFC, thereby making the caching section of the same RFC break down. In other words... you may be doing exactly what hours and hours of reading an RFC seems to be telling you you SHOULD do... but there still might be something else that OUGHT to be done. And I am telling you that this part of the RFC is complete, save for the small detail that the server cannot signal that both the compressed and identity encodings become stale when one changes, only one at a time.
There will always be the chance that some upstream server will ( mistakenly? ) keep the same (strong) ETag on a compressed variant. True, there will always be non-compliant implementations out there in various forms, and they will continue causing problems, at least for as long as it's about MUST level violations. In many cases (this one included) workarounds can be found, but that does not justify remaining intentionally non-compliant once informed about the problem. People are not perfect and they make mistakes. I still think that even when that happens any caching software should follow the "be lenient in what you accept and strict in what you send" rule and still use the other information available to it Which in this case is none. The only information we ever get from Apache is the ETag of the supposedly valid to use response, and possibly new freshness details about the same. ( sic: What the client really asked for and expects ) and do the right thing. Only the cache knows
Re: Wrong etag sent with mod_deflate
On 12/8/06, Henrik Nordstrom [EMAIL PROTECTED] wrote: No, that won't work. You would still be just as non-conforming by doing that. But if mod_deflate may produce different octet-level results on different requests for the same original entity then it must do this in addition to other transforms of the ETag. The identity and gzip encodings are not bidirectionally semantically equivalent, and additionally a normal conditional comparison of W/X to X is true. Uh, no, they *are* semantically equivalent - but, yes, not syntactically (bit-for-bit) equivalent. You inflate the response and you get exactly what the ETag originally represented. See RFC 2616 13.3.3 Weak and Strong Validators You must make the value of the ETag differ between the two entities. mod_deflate is clearly only doing a semantic (weak) transformation. -- justin
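Both halves of this exchange can be checked with Python's standard gzip module (Python 3.8+ for the mtime argument). Inflating does restore the identity bytes, which is Justin's point, but the encoded octets and their length differ, and gzip output is not even unique per input, because the header written by Python's gzip module carries an mtime field. The sample body is arbitrary.

```python
import gzip

identity = b"<p>Same resource, two encodings.</p>\n" * 40
encoded = gzip.compress(identity, mtime=0)
encoded_later = gzip.compress(identity, mtime=1)  # same input, different header timestamp

print(encoded == identity)                   # False: the octets differ
print(gzip.decompress(encoded) == identity)  # True: inflating restores the original
print(len(identity), len(encoded))           # so does Content-Length
print(encoded == encoded_later)              # False: gzip output is not unique per input
```

The last line is why Henrik ties tag strength to octet stability: if a compressor can emit different bytes for the same source, a strong tag on the encoded variant would promise more than it delivers.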
Re: Wrong etag sent with mod_deflate
On 12/8/06, Henrik Nordstrom [EMAIL PROTECTED] wrote: The protocol is quite fine as it is, and not easy to change. As it is now it's mainly a matter of understanding that mod_deflate does create a completely new entity from the original one. To the protocol it's exactly the same as when using mod_negotiation and having both the identity and gzip encoded entities on disk. The fact that you do this encoding on the fly is of no concern to HTTP. mod_deflate is certainly not creating a new resource - it's modifying the representation. There is no legitimate reason for it to modify the ETag other than to mark it as weak. That the caching bits in the RFC didn't understand this speaks to the fact that it's quite subtle. -- justin
Re: Wrong etag sent with mod_deflate
On Fri 2006-12-08 at 14:47 +0100, Justin Erenkrantz wrote: mod_deflate is certainly not creating a new resource It is creating a new HTTP entity. Not a new object on your server, but still a new unique HTTP entity with different characteristics from the identity encoding. If we were talking about transfer-encoding then you would be correct, as it only alters the encoding for transfer purposes and not the HTTP entity as such, but this is content-encoding. Content encoding is a property of the response entity. The main reason why things get blurred is because the creation of this entity is done on the fly instead of creating a new resource on the server like HTTP expects. As a result you need to be very careful with the ETag and Content-Location headers. Not modifying ETag (including just making it weak) says that the identity and gzip encodings are semantically equivalent and can be exchanged freely. In other words, it says it's fine to send gzip encoding to all clients (which we all know it's not). Not modifying/removing Content-Location is less harmful but will cause cache bouncing, as each time the cache sees a new response entity for a given URI any older ones with the same Content-Location will get removed from the cache. Regards Henrik
Re: Wrong etag sent with mod_deflate
On Fri 2006-12-08 at 14:40 +0100, Justin Erenkrantz wrote: Uh, no, they *are* semantically equivalent - but, yes, not syntactically (bit-for-bit) equivalent. You inflate the response and you get exactly what the ETag originally represented. Two entities are only semantically equivalent if they can be interchanged freely at the HTTP level with no semantic difference in the end-user result. The identity and gzip encodings cannot be said to bidirectionally have the same semantic meaning, as a gzip encoded entity is pure rubbish to a recipient not understanding gzip. No more than a Swedish translation of a document could be said to be semantically equivalent to a Greek translation of the same document. Content-Encoding is a case of unidirectional semantic equivalence where the identity encoding can be substituted for the gzip encoding with kept semantics, but for ETag bidirectional semantic equivalence is required, which is not fulfilled as gzip encoding cannot be substituted for identity encoding without risking a significant semantic difference to the recipient. The only real difference of a weak ETag compared to a strong one is that the weak one does not guarantee octet equality. All other restrictions apply. Plus a bunch of protocol restrictions where weak ETags are not allowed to be used. Regards Henrik
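The one-way substitutability follows from the Accept-Encoding rules in RFC 2616 14.3: identity is acceptable to any client that has not explicitly refused it, while gzip is acceptable only to clients that list it (or accept "*"). A minimal Python sketch of those rules, assuming the request carries an Accept-Encoding field and using a deliberately simplified parser:

```python
def parse_accept_encoding(value):
    """Map each listed content-coding to its qvalue (simplified: no spaces around '=')."""
    prefs = {}
    for item in value.split(","):
        coding, _, params = item.strip().partition(";")
        coding = coding.strip().lower()
        if not coding:
            continue
        q = 1.0
        for param in params.split(";"):
            param = param.strip()
            if param.startswith("q="):
                q = float(param[2:])
        prefs[coding] = q
    return prefs

def acceptable(coding, accept_encoding):
    """RFC 2616 14.3 acceptability, for a request that carries an Accept-Encoding field."""
    prefs = parse_accept_encoding(accept_encoding)
    if coding in prefs:
        return prefs[coding] > 0
    if "*" in prefs:
        return prefs["*"] > 0
    # Not listed and no "*": only identity remains acceptable.
    return coding == "identity"
```

So a cache may hand the identity entity to a client that sent "Accept-Encoding: gzip", but not the gzip entity to a client that sent "Accept-Encoding: deflate".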
Re: Wrong etag sent with mod_deflate
On Thu 2006-12-07 at 02:42 +0100, Justin Erenkrantz wrote: -1 on adding semantic junk to the existing ETag (and keeping it strong); that's blatantly uncool. Any generated ETag from mod_deflate should either be the original strong version or a weak version of any previous etag. mod_deflate by *definition* is just creating a weak version of the prior entity. You basically only have two choices: a) Make mod_deflate not send an ETag on modified responses. b) Modify the value (within the quotes) of the ETag somehow. And if mod_deflate cannot be trusted to always return the same octet representation, make sure to use a weak ETag unless the ETag generation is also tightly coupled to the octet representation, guaranteeing a different ETag should mod_deflate encode slightly differently. And to be fully compliant you also need to pay attention to the Content-Location header. Here I don't see much choice but to not send Content-Location in mod_deflate mangled responses (but it can be kept on the original response, no problem there). RFC 2616 13.6 Caching Negotiated Responses, last paragraph. mod_deflate does properly stick in the Vary header, so caches already have enough knowledge to know what's going on anyway even without a fix. (This is probably why mod_cache doesn't flag it as an error.) My opinion is to fix the protocol and move on... -- justin The protocol is quite fine as it is, and not easy to change. As it is now it's mainly a matter of understanding that mod_deflate does create a completely new entity from the original one. To the protocol it's exactly the same as when using mod_negotiation and having both the identity and gzip encoded entities on disk. The fact that you do this encoding on the fly is of no concern to HTTP. Another option is to explore the use of gzip transfer encoding instead of content encoding.
In transfer encoding none of these problems apply, as it's done on the transport level and not the entity level, but it's not that well supported in clients, unfortunately. Regards Henrik
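Henrik's two choices, plus his Content-Location advice, can be sketched as a header transform for a gzip-encoded response. Everything here is an assumption for illustration: the helper name, the plain-dict headers, and the "-gzip" value tweak for choice b). Content-MD5 is dropped as well because, as Henrik notes earlier in the thread, it is affected by recoding the entity.

```python
def deflate_response_headers(headers, option="modify", suffix="-gzip"):
    """Headers for a gzip-encoded variant; a hypothetical helper, not httpd's API."""
    out = dict(headers)
    out["Content-Encoding"] = "gzip"
    vary = [v.strip() for v in out.get("Vary", "").split(",") if v.strip()]
    if "accept-encoding" not in (v.lower() for v in vary):
        vary.append("Accept-Encoding")
    out["Vary"] = ", ".join(vary)
    etag = out.pop("ETag", None)
    if etag is not None and option == "modify":
        # Choice b): change the value inside the quotes, keeping any W/ prefix.
        out["ETag"] = etag[:-1] + suffix + '"'
    # Choice a) ("drop"): the ETag simply stays removed.
    out.pop("Content-Location", None)  # see RFC 2616 13.6, last paragraph
    out.pop("Content-MD5", None)       # the identity body's digest no longer applies
    out.pop("Content-Length", None)    # must be recomputed for the encoded body
    return out
```

This sketch keeps a strong tag strong; per Henrik, a compressor whose output is not octet-stable should also mark the modified tag weak.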
Re: Wrong etag sent with mod_deflate
Argh, my stupid ISP is losing apache email again because they use spamcop. On Dec 7, 2006, at 2:45 PM, Henrik Nordstrom wrote: On Thu 2006-12-07 at 02:42 +0100, Justin Erenkrantz wrote: -1 on adding semantic junk to the existing ETag (and keeping it strong); that's blatantly uncool. Any generated ETag from mod_deflate should either be the original strong version or a weak version of any previous etag. mod_deflate by *definition* is just creating a weak version of the prior entity. No, it is changing the content-encoding value, which is changing the entity. The purpose of etag for caching is two-fold: 1) for freshness checks, and 2) handling conditional range/authoring requests. That is why the spec is full of gobbledygook on etag handling -- it was stretched at the last minute to reuse a very simple freshness check as a form of variant identifier. What we should be doing is sending transfer-encoding, not content-encoding, and get past the chicken and egg dilemma of that feature in HTTP. If we are changing content-encoding, then we must behave as if there are two different files on the server representing the resource. That means tweaking the etag and being prepared to handle that tweak on future conditional requests. In other words, Henrik has it right. It is our responsibility to assign different etags to different variants because doing otherwise may result in errors on shared caches that use the etag as a variant identifier. Roy
Re: Wrong etag sent with mod_deflate
In other words, Henrik has it right. It is our responsibility to assign different etags to different variants because doing otherwise may result in errors on shared caches that use the etag as a variant identifier. Henrik is trying to make it sound like it is all Apache's fault. It is not. SQUID is screwing up, too. ...shared caches that use the etag as a variant identifier. To ONLY ever use ETag as the end-all-be-all for variant identification is, itself, a mistake. If the Vary: field is present... then THAT is what the entity (also) Varies: on and to ignore that and only rely on ETag is a screw-up. I had this argument years ago with folks at the SQUID forum. It was just prior to when they ( finally ) got around to adding any support for Vary: at all but (limited) support for ETag:. Regardless of whether it's DCC ( Dynamic Content-Encoding ) or not... if the entity Varies: on Content-encoding: but some cache software is ignoring that just because its ETag matches some other stored variant... well... that's just WRONG. Both pieces of software ( SQUID and Apache ) need just a little more code to finally get it right. Don't forget about Content-Length, either. If 2 different responses for the same requested entity come back with 2 different Content-Lengths and there is no Vary: or ETag then regardless of any other protocol semantics the only SANE thing for any caching software to do is to recognize that, assume it is not a mistake, and REPLACE the existing entity with the new one. Yea.. sure... you might get a lot of cache bounce that way but at least you are returning a fresh copy. It is not possible for 2 EXACTLY identical representations of the same requested entity to have different content lengths. If the lengths are different, then SOMETHING is different with regards to what you have in your cache. To ignore that reality as well ( which most caching software does ) is just kinda stupid. No protocol ( sic: set of rules ) can ever cover all the realities.
( Good ) software knows how to make common sense as well. Yours... Kevin Kiley In a message dated 12/8/2006 11:45:44 AM Pacific Standard Time, [EMAIL PROTECTED] writes: Argh, my stupid ISP is losing apache email again because they use spamcop. On Dec 7, 2006, at 2:45 PM, Henrik Nordstrom wrote: On Thu 2006-12-07 at 02:42 +0100, Justin Erenkrantz wrote: -1 on adding semantic junk to the existing ETag (and keeping it strong); that's blatantly uncool. Any generated ETag from mod_deflate should either be the original strong version or a weak version of any previous etag. mod_deflate by *definition* is just creating a weak version of the prior entity. No, it is changing the content-encoding value, which is changing the entity. The purpose of etag for caching is two-fold: 1) for freshness checks, and 2) handling conditional range/authoring requests. That is why the spec is full of gobbledygook on etag handling -- it was stretched at the last minute to reuse a very simple freshness check as a form of variant identifier. What we should be doing is sending transfer-encoding, not content-encoding, and get past the chicken and egg dilemma of that feature in HTTP. If we are changing content-encoding, then we must behave as if there are two different files on the server representing the resource. That means tweaking the etag and being prepared to handle that tweak on future conditional requests. In other words, Henrik has it right. It is our responsibility to assign different etags to different variants because doing otherwise may result in errors on shared caches that use the etag as a variant identifier. Roy
Re: Wrong etag sent with mod_deflate
On Fri 2006-12-08 at 15:03 -0500, [EMAIL PROTECTED] wrote: To ONLY ever use ETag as the end-all-be-all for variant identification is, itself, a mistake. Well, this area of the HTTP specs is pretty clear in my eyes, but then I have read it up and down too many times unwinding the tangled web which is found in there. An entity (including encoding) is identified by request URI + Content-Location. A specific version of an entity is identified by its unique ETag. Vary: tells which headers the server used in server driven negotiation of which entity to respond with. Accept-Encoding is one input to this. A strong ETag must be unique among all variants of a given URI, that is all different forms of entities that may reside under the URI and all their past and future versions. A weak ETag may be shared by two variants/versions if and only if they can be considered semantically equivalent and mutually exchangeable at the HTTP level with no semantic loss. For example different levels of compression, or minor changes of negligible or no importance to the semantics of the resource (hit counter example in the specs). Both pieces of software ( SQUID and Apache ) need just a little more code to finally get it right. It's correct that the current Squid implementation is not flawless. Most notably it has very poor handling of cache invalidations at the moment. Don't forget about Content-Length, either. If 2 different responses for the same requested entity come back with 2 different Content-Lengths and there is no Vary: or ETag then regardless of any other protocol semantics the only SANE thing for any caching software to do is to recognize that, assume it is not a mistake, and REPLACE the existing entity with the new one. Caches tend to by nature replace what they have with what they get. Yea.. sure... you might get a lot of cache bounce that way but at least you are returning a fresh copy. How would Content-Length changes cause cache bouncing?
It is not possible for 2 EXACTLY identical representations of the same requested entity to have different content lengths. If the lengths are different, then SOMETHING is different with regards to what you have in your cache. Yes, but when would this be seen? We only get the ETag from Apache, not the Content-Length. The specs forbid Apache from sending the Content-Length or other entity headers in 304 responses, partly to make sure entities do not get corrupted by errors in the origin server side implementation of server driven content negotiation. No protocol ( sic: set of rules ) can ever cover all the realities. ( Good ) software knows how to make common sense as well. Indeed, and that is why we are going slow on implementing the more advanced features of the specs. But violating MUST level protocol requirements is not common sense. And if you actually follow the specs these parts do make great sense once you get the picture that ETags MUST be unique for all entity versions of a given URI. The only poor part I have seen in this area of the specs is that the If-None-Match condition is perhaps a bit blunt, only telling the end result (the ETag of the valid response entity of a negotiated resource), not how the server came to that conclusion. This adds a few more roundtrips to the origin than would be required, only to figure out that Content-Language: en is ok both for Accept-Language: en and Accept-Language: en, sv, but that's about it. (Yes, I intentionally avoided Accept-Encoding here to illustrate the point; the mechanism is exactly the same, however.) RFC 2616 3.11 Entity Tags A strong entity tag MAY be shared by two entities of a resource only if they are equivalent by octet equality. An entity tag MUST be unique across all versions of all entities associated with a particular resource. A given entity tag value MAY See also 14.26 If-None-Match, and numerous other references to ETag.
I can bombard you with long chains of supporting claims from the RFC if you like, depending on which parts of the equation you feel are loosely connected. Just tell me which part you don't trust and I'll happily help you see the light. a) That identity and gzip content-encoding of the same resource represent different entities of the same resource. b) That different entities of the same resource MUST have different (strong) ETags. c) That gzip and identity encoding are not semantically equivalent. d) That the weak ETag W/X is semantically equivalent to the strong ETag X with the same quoted value. Regards Henrik
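Henrik's claims a) to d), read together with the RFC 2616 3.11 text he quotes, can be turned into a small consistency check over observed responses. This Python sketch is an illustrative assumption, not a tool from the thread: it groups tags per URI (different URIs may reuse a tag), treats W/"x" and "x" as the same value as in d), flags any group mixing content-codings as in c), and requires identical octets among strong tags.

```python
from collections import defaultdict

def etag_collisions(responses):
    """Return (uri, tag) pairs that break RFC 2616 3.11 for the given responses.

    Each response is (uri, etag, content_encoding, body). Tags are grouped per
    URI, and W/"x" is grouped with "x" because weak comparison treats them as equal.
    """
    groups = defaultdict(list)
    for uri, etag, encoding, body in responses:
        weak = etag.startswith("W/")
        groups[(uri, etag[2:] if weak else etag)].append((weak, encoding, body))
    bad = []
    for key, members in groups.items():
        encodings = {encoding for _, encoding, _ in members}
        strong_bodies = {body for weak, _, body in members if not weak}
        # Different content-codings are never interchangeable;
        # strong tags additionally require octet equality.
        if len(encodings) > 1 or len(strong_bodies) > 1:
            bad.append(key)
    return sorted(bad)
```

Under this check, weakening mod_deflate's tag alone does not help: W/"E" on the gzip variant still collides with "E" on the identity variant of the same URI.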
Re: Wrong etag sent with mod_deflate
On Fri 2006-12-08 at 11:44 -0800, Roy T. Fielding wrote: In other words, Henrik has it right. It is our responsibility to assign different etags to different variants because doing otherwise may result in errors on shared caches that use the etag as a variant identifier. Thanks ;-) Regards Henrik
Re: Wrong etag sent with mod_deflate
On Fri 2006-12-08 at 22:28 +0100, Henrik Nordstrom wrote: A strong ETag must be unique among all variants of a given URI, that is all different forms of entities that may reside under the URI and all their past and future versions. Forgot the last piece there, which clears many doubts: Entities from different URIs may share the same ETag (or even Content-Location) with no implications on any form of equivalence between the two. Also I am sorry that my use of terms is a bit messed up wrt entity vs variant vs version, but so are the specs.. Regards Henrik
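In Henrik's model Vary only tells a cache which request headers fed the negotiation, so a stored variant may be reused without asking the origin only when those headers match. The secondary key below captures that rule; it decides when reuse is safe, while naming the entity stays the job of the ETag (and Content-Location). The function name and the verbatim value comparison are simplifying assumptions.

```python
def vary_key(uri, vary_header, request_headers):
    """Secondary cache key: the URI plus the request's values for each Vary'd header.

    Returns None for Vary: *, which never matches a later request. Values are
    compared verbatim here; RFC 2616 13.6 also lets a cache ignore differences
    in linear white space and in how repeated header fields are combined.
    """
    names = sorted({h.strip().lower() for h in vary_header.split(",") if h.strip()})
    if "*" in names:
        return None
    lowered = {k.lower(): v for k, v in request_headers.items()}
    return (uri,) + tuple((name, lowered.get(name, "")) for name in names)
```

Because the URI is part of the key, two resources that happen to share an ETag never meet, which matches the point that tags only need to be unique per URI.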
Re: Wrong etag sent with mod_deflate
On 12/8/06, Roy T. Fielding [EMAIL PROTECTED] wrote: What we should be doing is sending transfer-encoding, not content-encoding, and get past the chicken and egg dilemma of that feature in HTTP. If we are changing content-encoding, then we must behave as if there are two different files on the server representing the resource. That means tweaking the etag and being prepared to handle that tweak on future conditional requests. There's just no way to know how to handle any ETag modification on future requests. So, that's a non-starter. Therefore, any fix for this edge case which breaks cacheability in the common case of real browsers I would find unacceptable. In other words, Henrik has it right. It is our responsibility to assign different etags to different variants because doing otherwise may result in errors on shared caches that use the etag as a variant identifier. As Kevin mentioned, Squid is only using the ETag and is ignoring the Vary header. That's the crux of the broken behavior on their part. If they want to point out minor RFC violations in Apache, then we can play that game as well. (mod_cache deals with this Vary/ETag case just fine, FWIW.) The compromise I'd be willing to accept is to have mod_deflate support the 'TE: gzip' request header and add 'gzip' to the Transfer-Encoding bit - and to prefer that over any Accept-Encoding bits that are sent. The ETag can clearly remain the same in that case - even as a strong ETag. So, Squid can change to send along TE: gzip (if it isn't already). And, everyone else who sends Accept-Encoding gets the result in a way that doesn't pooch their cache if they try to do a later conditional request. Is that acceptable? -- justin
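A Python sketch of a TE-first selection policy. The TE branch follows Justin's compromise (Transfer-Encoding: gzip, same strong ETag); the Accept-Encoding branch is an assumption that goes beyond his proposal by applying the distinct-ETag rule Henrik and Roy argue for, using a hypothetical "-gzip" suffix. The parser and function names are illustrative only.

```python
def parse_codings(value):
    """Map coding -> qvalue for a TE or Accept-Encoding value (simplified parser)."""
    result = {}
    for item in value.split(","):
        coding, _, params = item.strip().partition(";")
        coding = coding.strip().lower()
        if not coding:
            continue
        q = 1.0
        for param in params.split(";"):
            param = param.strip()
            if param.startswith("q="):
                q = float(param[2:])
        result[coding] = q
    return result

def gzip_response_headers(request_headers, etag):
    """Hypothetical policy: prefer TE: gzip, fall back to Accept-Encoding: gzip."""
    lowered = {k.lower(): v for k, v in request_headers.items()}
    te = parse_codings(lowered.get("te", ""))
    accept = parse_codings(lowered.get("accept-encoding", ""))
    headers = {"Vary": "Accept-Encoding"}  # the entity choice still depends on Accept-Encoding
    if te.get("gzip", 0) > 0:
        # Transfer coding is hop-by-hop; the entity, and so its ETag, is unchanged.
        # HTTP/1.1 requires chunked as the final coding unless the connection closes.
        headers.update({"Transfer-Encoding": "gzip, chunked", "ETag": etag})
    elif accept.get("gzip", 0) > 0:
        # Content coding creates a different entity, which needs its own ETag.
        headers.update({"Content-Encoding": "gzip", "ETag": etag[:-1] + '-gzip"'})
    else:
        headers["ETag"] = etag
    return headers
```

Since TE is hop-by-hop, an HTTP/1.1 client (or cache) sending it must also list TE in its Connection header; the sketch does not check for that.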
Re: Wrong etag sent with mod_deflate
On 12/8/06, Henrik Nordstrom [EMAIL PROTECTED] wrote: A strong ETag must be unique among all variants of a given URI, that is all different forms of entities that may reside under the URI and all their past and future versions. A weak ETag may be shared by two variants/versions if and only if they can be considered semantically equivalent and mutually exchangeable at the HTTP level with no semantic loss. For example different levels of compression, or minor changes of negligible or no importance to the semantics of the resource (hit counter example in the specs). I think we all (hopefully) agree that a weak ETag is ideally what mod_deflate should add. But, the specs simply dropped the ball here as doing that breaks conditional requests. If we could issue a weak ETag and have it work for conditional requests, this would be easy and be done by now. We can't, so I would much prefer that we don't break conditional requests just because mod_deflate is in use. I also don't believe we can come up with a reversible ETag semantic without rewriting big chunks of code or introducing butt-ugly hacks. Apache has always treated the ETag as opaque (except for W/) - to do otherwise is to bust large assumptions. -- justin
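Justin's "weakening breaks conditional requests" can be made concrete with the two comparison functions of RFC 2616 13.3.3. Weak comparison is only permitted for If-None-Match on GET and HEAD; If-Match and range requests need strong comparison, which a W/ tag never satisfies. The same rules show Henrik's point d): for revalidation, W/"E" still matches "E". The helper names below are illustrative.

```python
def strip_weak(tag):
    return tag[2:] if tag.startswith("W/") else tag

def strong_match(a, b):
    """RFC 2616 13.3.3: identical, and neither tag is weak."""
    return a == b and not a.startswith("W/") and not b.startswith("W/")

def weak_match(a, b):
    """RFC 2616 13.3.3: identical once any W/ prefix is ignored."""
    return strip_weak(a) == strip_weak(b)

def condition_matches(header, method, current, listed):
    """Does any listed tag match `current` under the comparison the header requires?"""
    if header == "If-None-Match" and method in ("GET", "HEAD"):
        compare = weak_match  # the only place weak comparison is allowed
    else:
        compare = strong_match  # If-Match, If-Range, and If-None-Match on other methods
    return any(compare(current, tag) for tag in listed)
```

So a weak tag on the gzip variant keeps cache revalidation working but makes If-Match and If-Range requests against that variant fail, and it still does not separate the variant from the identity entity's strong tag.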
Re: Wrong etag sent with mod_deflate
On Dec 8, 2006, at 3:35 PM, Justin Erenkrantz wrote: On 12/8/06, Roy T. Fielding wrote: What we should be doing is sending transfer-encoding, not content-encoding, and get past the chicken and egg dilemma of that feature in HTTP. If we are changing content-encoding, then we must behave as if there are two different files on the server representing the resource. That means tweaking the etag and being prepared to handle that tweak on future conditional requests. There's just no way to know how to handle any ETag modification on future requests. So, that's a non-starter. Therefore, any fix for this edge case which breaks cacheability in the common case of real browsers I would find unacceptable. It isn't necessary to handle any ETag modification -- our ETag generation is fairly limited and is not opaque to the server. We only need to avoid conflicts between the content-encoded variant and the non-encoded variant, which is guaranteed if the encoded variant has -gzip appended to the existing entity-tag. That will work fine with the common case of real browsers -- far better than the current case which will deliver invalid content if a browser tries to complete a partial download from a cache. In other words, Henrik has it right. It is our responsibility to assign different etags to different variants because doing otherwise may result in errors on shared caches that use the etag as a variant identifier. As Kevin mentioned, Squid is only using the ETag and is ignoring the Vary header. That's the crux of the broken behavior on their part. Then they will still be broken regardless of what we do here. It simply isn't a relevant issue. If they want to point out minor RFC violations in Apache, then we can play that game as well. (mod_cache deals with this Vary/ETag case just fine, FWIW.) Unlike Squid, RFC compliance is part of our mission, at least when it isn't due to a bug in the spec. This is not a bug in the spec.
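A minimal sketch of the "-gzip suffix" idea, assuming the base ETag comes from the server's own generator and never already ends in "-gzip". The tag values and function names are made up for illustration; this is not Apache's code. The point is that the mapping is trivially reversible, so a conditional request can be answered for whichever variant the current request would select.

```python
SUFFIX = "-gzip"


def _split(etag):
    """Return (weak_prefix, opaque_value) for a quoted entity-tag."""
    etag = etag.strip()
    prefix = "W/" if etag.startswith("W/") else ""
    quoted = etag[len(prefix):]
    if len(quoted) < 2 or quoted[0] != '"' or quoted[-1] != '"':
        raise ValueError(f"malformed entity-tag: {etag!r}")
    return prefix, quoted[1:-1]


def gzip_etag(base_etag):
    """'"abc"' -> '"abc-gzip"' (a weak W/ prefix is preserved)."""
    prefix, value = _split(base_etag)
    return f'{prefix}"{value}{SUFFIX}"'


def split_variant(etag):
    """Reverse mapping: return (base_etag, is_gzip_variant)."""
    prefix, value = _split(etag)
    if value.endswith(SUFFIX):
        return f'{prefix}"{value[:-len(SUFFIX)]}"', True
    return etag.strip(), False


def not_modified(if_none_match, base_etag, send_gzip):
    """Would a GET with this If-None-Match get a 304 for the selected variant?"""
    current = gzip_etag(base_etag) if send_gzip else base_etag
    _, current_value = _split(current)
    for tag in if_none_match.split(","):
        # GET may use weak comparison: ignore W/ and compare opaque values.
        if tag.strip() == "*" or _split(tag)[1] == current_value:
            return True
    return False
```

A browser that revalidates the gzip variant sends "...-gzip" and gets a 304 only if the server would again send gzip, while the identity variant keeps its original tag.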
A high-efficiency response cache is expected to have multiple representations of a given resource cached. The cache key is the URI. If the set of varying header field values that generated the cached response is different from the request set, then a conditional GET request is made containing ALL of the cached entity tags in an If-None-Match field (in accordance with the Vary requirements). If the server says that any one of the representations, as indicated by the ETag in a 304 response, is okay, then the cached representation with that entity tag is sent to the user-agent regardless of the Vary calculation. In short, if we have two active representations that have the same etag, then we have violated the spec and created an unnecessary interoperability problem: If the selecting request header fields for the cached entry do not match the selecting request header fields of the new request, then the cache MUST NOT use a cached entry to satisfy the request unless it first relays the new request to the origin server in a conditional request and the server responds with 304 (Not Modified), including an entity tag or Content-Location that indicates the entity to be used. If an entity tag was assigned to a cached representation, the forwarded request SHOULD be conditional and include the entity tags in an If-None-Match header field from all its cache entries for the resource. This conveys to the server the set of entities currently held by the cache, so that if any one of these entities matches the requested entity, the server can use the ETag header field in its 304 (Not Modified) response to tell the cache which entry is appropriate. If the entity-tag of the new response matches that of an existing entry, the new response SHOULD be used to update the header fields of the existing entry, and the result MUST be returned to the client. 
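The cache behaviour described above (RFC 2616 13.6) can be modelled in a few lines. This is a toy in-memory sketch with hypothetical class and method names, not Squid's or mod_cache's design: variants are keyed by URI, a Vary mismatch produces an If-None-Match listing every cached tag, a 304 picks the variant by ETag, and a 200 whose ETag matches an existing entry refreshes that entry in place.

```python
from dataclasses import dataclass, field


@dataclass
class Variant:
    etag: str
    selecting: dict   # values of the request headers named by Vary
    body: bytes


def selecting_headers(vary, request_headers):
    req = {k.lower(): v for k, v in request_headers.items()}
    return {name.lower(): req.get(name.lower()) for name in vary}


@dataclass
class VariantCache:
    entries: dict = field(default_factory=dict)   # URI -> list[Variant]

    def lookup(self, uri, vary, request_headers):
        """Return ('hit', variant) or ('forward', If-None-Match value or None)."""
        wanted = selecting_headers(vary, request_headers)
        variants = self.entries.get(uri, [])
        for v in variants:
            if v.selecting == wanted:
                return "hit", v
        tags = ", ".join(v.etag for v in variants)
        return "forward", tags or None

    def store_200(self, uri, etag, vary, request_headers, body):
        variants = self.entries.setdefault(uri, [])
        sel = selecting_headers(vary, request_headers)
        for v in variants:
            if v.etag == etag:
                # Same strong ETag is taken to mean the same entity: refresh it.
                v.selecting, v.body = sel, body
                return v
        v = Variant(etag, sel, body)
        variants.append(v)
        return v

    def handle_304(self, uri, etag):
        """The ETag in the 304 tells the cache which stored variant to serve."""
        for v in self.entries.get(uri, []):
            if v.etag == etag:
                return v
        return None
```

If the gzip and identity responses share one ETag, the second store_200 silently overwrites the first, and every later 304 for that tag returns whichever body was stored last, which is exactly the inconsistency reported at the start of the thread.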
In other words, the conditional request containing all of the entity tags satisfies the semantics of Vary when the server responds with 304 and one of those entity tags. And, no, mod_cache doesn't deal with it -- it just isn't a very efficient cache. The compromise I'd be willing to accept is to have mod_deflate support the 'TE: gzip' request header and add 'gzip' to the Transfer-Encoding bit - and to prefer that over any Accept-Encoding bits that are sent. The ETag can clearly remain the same in that case - even as a strong ETag. So, Squid can change to send along TE: gzip (if it isn't already). And, everyone else who sends Accept-Encoding gets the result in a way that doesn't pooch their cache if they try to do a later conditional request. Is that acceptable? -- justin The best solution is to not mess with content-encoding at all, which gets us out of both this consistency problem and related problems with the entity-header fields (content-md5, signatures, etc.). That is why transfer encoding was invented in the first place. We should have an
Re: Wrong etag sent with mod_deflate
On Thu 2006-12-07 at 02:31 +0100, Justin Erenkrantz wrote: mod_deflate should just add the W/ prefix if it's not already there. -- justin No, that won't work. You would still be just as non-conforming by doing that. But if mod_deflate may produce different octet-level results on different requests for the same original entity, then it must do this in addition to other transforms of the ETag. The identity and gzip encodings are not bidirectionally semantically equivalent, and additionally a normal conditional comparison of W/X against X evaluates as true. See RFC 2616 13.3.3 Weak and Strong Validators. You must make the value of the ETag differ between the two entities. Regards Henrik
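Henrik's point about W/X versus X follows directly from the weak comparison function in RFC 2616 13.3.3: either or both tags may carry W/ without affecting the result, so only the opaque value matters. A small illustrative check (hypothetical helper, not taken from any server):

```python
def weak_compare(a, b):
    """RFC 2616 weak comparison: ignore any W/ flag, compare opaque values."""
    def opaque(tag):
        tag = tag.strip()
        return tag[2:] if tag.startswith("W/") else tag
    return opaque(a) == opaque(b)


# Identity entity keeps "X"; mod_deflate merely prefixing W/ yields W/"X".
identity_tag = '"X"'
prefixed_only = 'W/"X"'
value_changed = 'W/"X-gzip"'
```

weak_compare(prefixed_only, identity_tag) is true, so an If-None-Match carrying the gzip variant's tag still "matches" the identity entity; only changing the value inside the quotes, as in value_changed, keeps the two variants apart.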
Re: Wrong etag sent with mod_deflate
On Thu 2006-12-07 at 02:42 +0100, Justin Erenkrantz wrote: -1 on adding semantic junk to the existing ETag (and keeping it strong); that's blatantly uncool. Any generated ETag from mod_deflate should either be the original strong version or a weak version of any previous etag. mod_deflate by *definition* is just creating a weak version of the prior entity. You basically only have two choices: a) Make mod_deflate not send an ETag on modified responses. b) Modify the value (within the quotes) of the ETag somehow. And if mod_deflate cannot be trusted to always return the same octet representation, make sure to use a weak ETag, unless the ETag generation is also tightly coupled to the octet representation, guaranteeing a different ETag should mod_deflate encode slightly differently. And to be fully compliant you also need to pay attention to the Content-Location header. Here I don't see much choice but to not send Content-Location in mod_deflate mangled responses (but it can be kept on the original response, no problem there). RFC 2616 13.6 Caching Negotiated Responses, last paragraph. mod_deflate does properly stick in the Vary header, so caches already have enough knowledge to know what's going on anyway even without a fix. (This is probably why mod_cache doesn't flag it as an error.) My opinion is to fix the protocol and move on... -- justin The protocol is quite fine as it is, and not easy to change. As it is now it's mainly a matter of understanding that mod_deflate does create a completely new entity from the original one. To the protocol it's exactly the same as when using mod_negotiation and having both the identity and gzip encoded entities on disk. The fact that you do this encoding on the fly is of no concern to HTTP. Another option is to explore the use of gzip transfer encoding instead of content encoding.
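Henrik's choices (a) and (b), plus dropping Content-Location, can be sketched as a header rewrite applied to a response after its body has been gzip-encoded. The function name, the "-gzip" suffix and the mode switch are assumptions for illustration; the sketch also ignores Content-Length, Content-MD5 and Vary: *, which a real filter would have to handle.

```python
def fix_deflated_headers(headers, mode="suffix", weak=False):
    """Return a copy of the response headers adjusted for a gzip-encoded body.

    mode='drop'   -> choice a): send no ETag at all.
    mode='suffix' -> choice b): change the value inside the quotes;
                     weak=True additionally marks the new tag W/.
    """
    if mode not in ("drop", "suffix"):
        raise ValueError(f"unknown mode: {mode!r}")
    lower = {k.lower(): v for k, v in headers.items()}
    # Copy everything except the headers rewritten below. Content-Location
    # is deliberately not carried over to the encoded response (13.6).
    out = {k: v for k, v in headers.items()
           if k.lower() not in ("etag", "content-location", "vary")}
    etag = lower.get("etag")
    if etag is not None and mode == "suffix":
        is_weak = etag.startswith("W/")
        quoted = etag[2:] if is_weak else etag
        out["ETag"] = ("W/" if (is_weak or weak) else "") + '"' + quoted[1:-1] + '-gzip"'
    # Keep any existing Vary members and make sure Accept-Encoding is listed.
    vary = [v.strip() for v in lower.get("vary", "").split(",") if v.strip()]
    if "accept-encoding" not in (v.lower() for v in vary):
        vary.append("Accept-Encoding")
    out["Vary"] = ", ".join(vary)
    out["Content-Encoding"] = "gzip"
    return out
```

The weak flag reflects Henrik's caveat: it only needs to be set when the encoder cannot promise byte-identical output for the same source entity.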
In transfer encoding none of these problems apply, as it's done at the transport level and not the entity level, but unfortunately it's not that well supported in clients. Regards Henrik
Re: Wrong etag sent with mod_deflate
Roy T. Fielding wrote: Protocol issues really should be brought up on the dev list, with an appropriate subject, and not left in bugzilla. FWIW, there was a dev list thread on this 3 years ago with the subject mod_deflate and transfer / content encoding problem. http://www.mail-archive.com/dev@httpd.apache.org/msg18366.html
Re: Wrong etag sent with mod_deflate
On 12/7/06, Roy T. Fielding wrote: Entities gzipped by mod_deflate still carry the same ETag as the plain entity, causing inconsistency in ETag-aware proxy caches. I'll have a look later and see if I can fix it, but let me know if there is already a patch in the works (that doesn't rely on mod_filter). mod_deflate should just add the W/ prefix if it's not already there. -- justin
Re: Wrong etag sent with mod_deflate
On 12/7/06, Justin Erenkrantz wrote: mod_deflate should just add the W/ prefix if it's not already there. -- justin But, that'll break caches as we're not allowed to serve If-Match with weak entity tags. Feh. -1 on adding semantic junk to the existing ETag (and keeping it strong); that's blatantly uncool. Any generated ETag from mod_deflate should either be the original strong version or a weak version of any previous etag. mod_deflate by *definition* is just creating a weak version of the prior entity. mod_deflate does properly stick in the Vary header, so caches already have enough knowledge to know what's going on anyway even without a fix. (This is probably why mod_cache doesn't flag it as an error.) My opinion is to fix the protocol and move on... -- justin