ETag and Content-Encoding
http://issues.apache.org/bugzilla/show_bug.cgi?id=39727

We have some controversy surrounding this bug, and bugzilla has turned into a technical discussion that belongs here.

Fundamental question: Does a weak ETag preclude (negotiated) changes to Content-Encoding?

Summary:

Original bug: mod_deflate may compress/decompress content but leave an existing ETag in place.

[ various discussion followed ]

Yesterday: I committed a fix to /trunk/, assuming it would be uncontroversial. The fix is that any existing ETag should be made a weak ETag if mod_deflate either inflates or deflates the contents. Rationale: a weak ETag promises equivalent but not byte-by-byte identical contents, and that's exactly what you have with mod_deflate.

Henrik Nordstrom commented:

"Not sufficient. The two versions are not semantically equivalent, as one can not be exchanged for the other without breaking the protocol. In the context of If-None-Match the weak comparator is used in HTTP and there a strong ETag is equal to a weak ETag."

Further discussion followed. I won't repost it here in full, but since there clearly is an issue, it needs discussing here.

Cc: folks subscribed to the bug.

--
Nick Kew

Application Development with Apache - the Apache Modules Book
http://www.apachetutor.org/
Re: ETag and Content-Encoding
On Wed, 2007-10-03 at 14:23 +0100, Nick Kew wrote:
> http://issues.apache.org/bugzilla/show_bug.cgi?id=39727
>
> We have some controversy surrounding this bug, and bugzilla
> has turned into a technical discussion that belongs here.
>
> Fundamental question: Does a weak ETag preclude (negotiated)
> changes to Content-Encoding?

A weak ETag means the response is semantically equivalent at both the protocol and the content level, and may be exchanged freely.

Two resource variants with different Content-Encoding are not semantically equivalent, as the recipient may not be able to understand a variant sent with an incompatible encoding.

Sending a weak ETag does not signal that there is negotiation taking place (Vary does that). All it signals is that there may be multiple but fully compatible versions of the entity variant in circulation, or that each request results in a slightly different object where the difference has no practical meaning (i.e. embedded non-important timestamp or similar).

> deflates the contents. Rationale: a weak ETag promises
> equivalent but not byte-by-byte identical contents, and
> that's exactly what you have with mod_deflate.

I disagree. It's two very different entities.

Note: If mod_deflate is deterministic and always returns the exact same encoded version, then using a strong ETag is correct.

What this boils down to in the end is:

a) HTTP must be able to tell if an already cached variant is valid for a new request by using If-None-Match. This means that each negotiated entity needs to use a different ETag value. Accept-Encoding is no different in this than any of the other inputs to content negotiation.

b) If the object undergoes some transformation that is not deterministic, then the ETag must be weak to signify that byte-equivalence can not be guaranteed.

Note regarding a: The weak/strong property of the ETag has no significance here. If-None-Match uses the weak comparison function, where only the value is compared, not the strength. See 13.3.3 paragraph "The weak comparison function".

Regards
Henrik
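The two comparison functions referred to in the note on (a) can be sketched roughly as follows. This is a minimal Python illustration of the RFC 2616 section 13.3.3 definitions, not Apache's implementation; the function names and the simplified parsing (a `W/` prefix followed by a quoted string) are assumptions made for the example.

```python
def parse_etag(tag):
    """Split an entity tag into (is_weak, opaque_value).

    Assumes the simplified form: an optional 'W/' prefix, then a quoted string.
    """
    tag = tag.strip()
    if tag.startswith("W/"):
        return True, tag[2:]
    return False, tag


def strong_match(a, b):
    """Strong comparison: values identical and neither tag is weak."""
    weak_a, value_a = parse_etag(a)
    weak_b, value_b = parse_etag(b)
    return not weak_a and not weak_b and value_a == value_b


def weak_match(a, b):
    """Weak comparison: only the opaque values are compared, not the strength."""
    return parse_etag(a)[1] == parse_etag(b)[1]
```

Under this sketch, `weak_match('"abc"', 'W/"abc"')` is true while `strong_match('"abc"', 'W/"abc"')` is false, which is why merely adding `W/` does not separate two variants for a recipient that applies the weak function.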
Re: ETag and Content-Encoding
On 10/03/2007 03:23 PM, Nick Kew wrote:
> http://issues.apache.org/bugzilla/show_bug.cgi?id=39727
>
> We have some controversy surrounding this bug, and bugzilla
> has turned into a technical discussion that belongs here.
>
> Fundamental question: Does a weak ETag preclude (negotiated)
> changes to Content-Encoding?
>
> Summary:
>
> Original bug: mod_deflate may compress/decompress content
> but leave an existing ETag in place.
>
> [ various discussion followed ]
>
> Yesterday: I committed a fix to /trunk/, assuming it would
> be uncontroversial. The fix is that any existing ETag should
> be made a weak ETag if mod_deflate either inflates or
> deflates the contents. Rationale: a weak ETag promises
> equivalent but not byte-by-byte identical contents, and
> that's exactly what you have with mod_deflate.
>
> Henrik Nordstrom commented:
>
> "Not sufficient. The two versions are not semantically equivalent, as one
> can not be exchanged for the other without breaking the protocol. In
> the context of If-None-Match the weak comparator is used in HTTP and
> there a strong ETag is equal to a weak ETag."
>
> Further discussion followed. I won't repost it here in full, but
> since there clearly is an issue, it needs discussing here.

Currently I share your opinion that a weak ETag should fix the issue (besides, ap_meets_condition currently does not work correctly with weak ETags, but this is another story). OTOH I am trying to understand why Henrik thinks it is not sufficient.

Ok, before the patch we had the following situation:

Depending on the client, httpd sent an uncompressed or a compressed response with the *same* (possibly) strong ETag and a Vary: Accept-Encoding header. A cache in the line stored the response and, because both responses had the *same* (possibly) strong ETag, it only stored it *once* (either the compressed or the uncompressed version) and in fact ignored the Vary header. So if a client requested that resource from the cache, either conditionally (If-None-Match) or unconditionally, it delivered what it had in stock, ignoring the Accept-Encoding header of the client.

Now after the patch we have the following situation:

Depending on the client, httpd sends an uncompressed or a compressed response with the original ETag if it does not modify the response, and with a weak version of the ETag if it does compress / uncompress the response. In any case it sets a Vary: Accept-Encoding header.

Ok, sending the original ETag if we do not alter the response might be an error, but let's assume we do not and send a weak version of the original ETag in both cases (altering the response / not altering the response).

Does this allow the cache in the line to store it only *once* and ignore the Vary header? If yes, then the fix is not sufficient, but if a weak ETag forces the cache to store each variant based on the Vary header then it should work.

Regards
Rüdiger
Re: ETag and Content-Encoding
On Oct 3, 2007 7:20 AM, Henrik Nordstrom <[EMAIL PROTECTED]> wrote:
> > deflates the contents. Rationale: a weak ETag promises
> > equivalent but not byte-by-byte identical contents, and
> > that's exactly what you have with mod_deflate.
>
> I disagree. It's two very different entities.

As before, I still don't understand why Vary is not sufficient to allow real-world clients to differentiate here. If Squid is ignoring Vary, then it does so at its own peril - regardless of ETags.

The problem with trying to invent new ETags is that we'll almost certainly break conditional requests and I find that a total non-starter. Your suggestion of appending ";gzip" leaks information that doesn't belong in the ETag - as it is quite possible for that to appear in a valid ETag from another source - for example, it is trivial to make Subversion generate ETags containing that at the end - this would create nasty false positives and corrupt Subversion's conditional request checks.

Plus, rewriting every filter to append or delete a 'special' marker in the ETag is bound to make the situation way worse.

-- justin
Re: ETag and Content-Encoding
On Oct 3, 2007, at 7:20 AM, Henrik Nordstrom wrote:
> On Wed, 2007-10-03 at 14:23 +0100, Nick Kew wrote:
> > http://issues.apache.org/bugzilla/show_bug.cgi?id=39727
> >
> > We have some controversy surrounding this bug, and bugzilla
> > has turned into a technical discussion that belongs here.
> >
> > Fundamental question: Does a weak ETag preclude (negotiated)
> > changes to Content-Encoding?
>
> A weak ETag means the response is semantically equivalent at both the protocol and the content level, and may be exchanged freely.
>
> Two resource variants with different Content-Encoding are not semantically equivalent, as the recipient may not be able to understand a variant sent with an incompatible encoding.

That is not true. The weak etag is for content that has changed but is just as good a response content as would have been received. In other words, protocol equivalence is irrelevant.

> Sending a weak ETag does not signal that there is negotiation taking place (Vary does that). All it signals is that there may be multiple but fully compatible versions of the entity variant in circulation, or that each request results in a slightly different object where the difference has no practical meaning (i.e. embedded non-important timestamp or similar).

Yes. Compression has no practical meaning.

> > deflates the contents. Rationale: a weak ETag promises
> > equivalent but not byte-by-byte identical contents, and
> > that's exactly what you have with mod_deflate.
>
> I disagree. It's two very different entities.

That is irrelevant. What matters is the resource semantics, not the message bits. Every bit can change randomly and still be semantically equivalent to a resource representation of random bits.

> Note: If mod_deflate is deterministic and always returns the exact same encoded version, then using a strong ETag is correct.
>
> What this boils down to in the end is:
>
> a) HTTP must be able to tell if an already cached variant is valid for a new request by using If-None-Match. This means that each negotiated entity needs to use a different ETag value. Accept-Encoding is no different in this than any of the other inputs to content negotiation.

That is not HTTP. Don't confuse the needs of caching with the needs of range requests -- only range requests need strong etags.

> b) If the object undergoes some transformation that is not deterministic, then the ETag must be weak to signify that byte-equivalence can not be guaranteed.
>
> Note regarding a: The weak/strong property of the ETag has no significance here. If-None-Match uses the weak comparison function, where only the value is compared, not the strength. See 13.3.3 paragraph "The weak comparison function".

As intended,

Roy
Re: ETag and Content-Encoding
On Oct 3, 2007, at 7:53 AM, Justin Erenkrantz wrote:
> The problem with trying to invent new ETags is that we'll almost certainly break conditional requests and I find that a total non-starter. Your suggestion of appending ";gzip" leaks information that doesn't belong in the ETag - as it is quite possible for that to appear in a valid ETag from another source - for example, it is trivial to make Subversion generate ETags containing that at the end - this would create nasty false positives and corrupt Subversion's conditional request checks.
>
> Plus, rewriting every filter to append or delete a 'special' marker in the ETag is bound to make the situation way worse. -- justin

I don't see how that is possible, unless subversion is depending on content-encoding to twiddle between compressed and uncompressed transfer without changing the etag. In that case, subversion will be broken, as would any poster child for misusing content-encoding as a transfer encoding.

Roy
Re: ETag and Content-Encoding
On Oct 3, 2007 12:19 PM, Roy T. Fielding <[EMAIL PROTECTED]> wrote:
> I don't see how that is possible, unless subversion is depending
> on content-encoding to twiddle between compressed and uncompressed
> transfer without changing the etag. In that case, subversion will be
> broken, as would any poster child for misusing content-encoding as
> a transfer encoding.

I don't understand - why should Subversion care? It doesn't know anything related to gzip - that's purely mod_deflate's job.

The issue here is that mod_dav_svn generates an ETag (based off rev num and path) and that ETag can be later used to check for conditional requests. But, if mod_deflate always strips a 'special' tag from the ETag (per Henrik), then by the time that mod_dav_svn sees it, the tag could be corrupt - as that special tag could have been part of a valid ETag produced by mod_dav_svn as we've *never* placed restrictions on the format of the ETag produced by our modules.

-- justin
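The collision described here can be made concrete with a small sketch. It is a hypothetical Python model, not mod_deflate code: the `;gzip` marker, its placement inside the quotes, and both function names are assumptions chosen only to show how an unconditional strip can damage an ETag that legitimately ends in the marker.

```python
MARKER = ";gzip"  # hypothetical marker appended by a compressing filter


def add_marker(etag):
    """Outgoing direction: '"abc"' becomes '"abc;gzip"'."""
    return etag[:-1] + MARKER + '"'


def strip_marker(etag):
    """Incoming direction: remove the marker from a validator if present."""
    if etag.endswith(MARKER + '"'):
        return etag[:-len(MARKER) - 1] + '"'
    return etag


# Round trip works when the origin ETag does not contain the marker.
assert strip_marker(add_marker('"r42"')) == '"r42"'

# An origin module that happens to emit an ETag ending in the marker
# (and whose response was not compressed) gets its validator rewritten.
origin_etag = '"r42/doc;gzip"'
corrupted = strip_marker(origin_etag)
```

In this model `corrupted` is `'"r42/doc"'`, a value the origin module never generated, so its conditional check would compare against the wrong validator.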
Re: ETag and Content-Encoding
On Wed, 2007-10-03 at 07:53 -0700, Justin Erenkrantz wrote:
> As before, I still don't understand why Vary is not sufficient to
> allow real-world clients to differentiate here. If Squid is ignoring
> Vary, then it does so at its own peril - regardless of ETags.

See RFC2616 13.6 Caching Negotiated Responses and you should understand why returning a unique ETag on each variant is very important. (Yes, the gzip and identity content-encoded responses are two different variants of the same resource; see earlier discussions if you don't agree on that.)

But yes, thinking this over a second time, converting the ETag to a weak ETag is sufficient to plaster over the problem, assuming the original ETag is a strong one. Not because it's correct from a protocol perspective, but because Apache does not use the weak compare function when processing If-None-Match, so in Apache's world changing a strong ETag to a weak one is about the same as assigning a new ETag. However, if the original ETag is already weak then the problem remains exactly as it is today.

It's also almost the same as deleting the ETag, as you also destroy If-None-Match processing of filtered responses, which is also why it works.

> The problem with trying to invent new ETags is that we'll almost
> certainly break conditional requests and I find that a total
> non-starter.

Only because your processing of conditional requests is broken. See earlier discussions on the topic of this bug, which already cover this aspect. To work properly, the conditionals need to (logically) be processed when the response entity is known, which is after mod_deflate (or another filter) does its dance to transform the response headers. Doing conditionals before the actual response headers are known is very error-prone and likely to cause false matches, as you don't know this is the response which will be sent to the requestor.

> Your suggestion of appending ";gzip" leaks information
> that doesn't belong in the ETag - as it is quite possible for that to
> appear in a valid ETag from another source - for example, it is
> trivial to make Subversion generate ETags containing that at the end -
> this would create nasty false positives and corrupt Subversion's
> conditional request checks.

Then use something stronger, less likely to be seen in the original ETag. Or fix the filter architecture to deal with conditionals properly, making this question ("collisions") pretty much a non-issue. Or, until conditionals can be processed correctly in the presence of filters, drop the ETag on filtered responses where the filter does some kind of negotiation.

> Plus, rewriting every filter to append or
> delete a 'special' marker in the ETag is bound to make the situation
> way worse. -- justin

I don't see much choice if you want to comply with the RFC requirements. The other choice is to drop the ETag header on such responses, which is also not a nice thing, but it at least complies with the specifications, making it better than sending out the same ETag on incompatible responses from the same resource.

Regards
Henrik
Re: ETag and Content-Encoding
On Wed, 2007-10-03 at 13:29 -0700, Justin Erenkrantz wrote:
> The issue here is that mod_dav_svn generates an ETag (based off rev
> num and path) and that ETag can be later used to check for conditional
> requests. But, if mod_deflate always strips a 'special' tag from the
> ETag (per Henrik),

That was only a suggestion on how you may work around your somewhat limited conditional processing capabilities wrt filters like mod_deflate, but I think it's probably the cleanest approach considering the requirements of If-Match and modifying methods (PUT, DELETE, PROPPATCH etc).

In that construct the tag added to the ETag by mod_deflate (or another entity-transforming filter) needs to be sufficiently unique that it is not likely to be seen in the original ETag value.

It's not easy to fulfill the needs of all components when doing dynamic entity transformations, especially when there is negotiation involved.

Regards
Henrik
Re: ETag and Content-Encoding
On Wed, 2007-10-03 at 12:10 -0700, Roy T. Fielding wrote:
> > Two resource variants with different Content-Encoding are not
> > semantically equivalent, as the recipient may not be able to understand
> > a variant sent with an incompatible encoding.
>
> That is not true. The weak etag is for content that has changed but
> is just as good a response content as would have been received.
> In other words, protocol equivalence is irrelevant.

By protocol semantic equivalence I mean responses being acceptable to requests.

Example: Two negotiated responses with different Content-Encoding are not semantically equivalent at the HTTP level, as their negotiation properties are different, and one can not substitute one for the other and expect that HTTP works. But two compressed response entities with different compression levels depending on the CPU load are.

Note: Ignoring transfer-encoding here, as it's transport and pretty much irrelevant to the operations of the protocol other than wire message encoding/decoding.

> > a) HTTP must be able to tell if an already cached variant is valid for a
> > new request by using If-None-Match. This means that each negotiated
> > entity needs to use a different ETag value. Accept-Encoding is no
> > different in this than any of the other inputs to content negotiation.
>
> That is not HTTP. Don't confuse the needs of caching with the needs
> of range requests -- only range requests need strong etags.

I am not. I am talking about If-None-Match, not If-Range. And specifically the use of If-None-Match in 13.6 Caching Negotiated Responses.

It's a very simple and effective mechanism, but it requires servers to properly assign ETags to each (semantically, in the case of weak) unique entity of a resource (not the resource as such).

Content-Encoding is no different in this than any of the other negotiated properties (Content-Type, Content-Language, whatever).

Regards
Henrik
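The reliance of the 13.6 mechanism on per-variant ETags can be illustrated with a toy model. The Python below is only a sketch under stated assumptions: it is not Squid's or any real cache's code, the function names are invented, and `"abc-gzip"` is a hypothetical distinct value. It models one step: a cache holding several variants of a URL receives a 304 carrying an ETag and must decide which stored variant that ETag designates, using the weak comparison.

```python
def opaque(etag):
    """Return the entity tag without any 'W/' weakness prefix."""
    return etag[2:] if etag.startswith("W/") else etag


def matching_variants(stored, etag_in_304):
    """Return the labels of stored variants designated by the 304's ETag.

    `stored` maps a variant label (here the content coding) to its ETag.
    Weak comparison is used, so only the opaque value counts.
    """
    wanted = opaque(etag_in_304)
    return sorted(label for label, tag in stored.items() if opaque(tag) == wanted)


# Same ETag on both variants: the 304 cannot tell the cache which one to serve.
shared = matching_variants({"identity": '"abc"', "gzip": '"abc"'}, '"abc"')

# Weakening only the compressed variant does not help under weak comparison.
weakened = matching_variants({"identity": '"abc"', "gzip": 'W/"abc"'}, 'W/"abc"')

# Distinct values per variant: the 304 selects exactly one stored response.
distinct = matching_variants({"identity": '"abc"', "gzip": '"abc-gzip"'}, '"abc-gzip"')
```

In this model `shared` and `weakened` both contain two labels, while `distinct` contains only `gzip`, which is the property the argument above depends on.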
Re: ETag and Content-Encoding
Henrik Nordstrom wrote:
> On Wed, 2007-10-03 at 13:29 -0700, Justin Erenkrantz wrote:
> > The issue here is that mod_dav_svn generates an ETag (based off rev
> > num and path) and that ETag can be later used to check for conditional
> > requests. But, if mod_deflate always strips a 'special' tag from the
> > ETag (per Henrik),
>
> That was only a suggestion on how you may work around your somewhat limited conditional processing capabilities wrt filters like mod_deflate, but I think it's probably the cleanest approach considering the requirements of If-Match and modifying methods (PUT, DELETE, PROPPATCH etc).
>
> In that construct the tag added to the ETag by mod_deflate (or another entity-transforming filter) needs to be sufficiently unique that it is not likely to be seen in the original ETag value.
> ...

Two cents -- no three cents :-):

#1) I agree with Henrik's analysis.

#2) If Content-Encoding is implemented through a separate module, it will have to rewrite both outgoing and incoming etags; note that this includes the "If-*" headers from RFC2616 and the "If" header defined in RFC4918 (obsoleting RFC2518).

#3) If just appending "-gzip" doesn't provide sufficient uniqueness, the implementation may want to *always* append a token (such as "-identity"), even when no compression occurred.

Best regards,
Julian
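Point #3 can be sketched as follows. This is a minimal Python illustration, assuming strong, quoted ETags and hypothetical helper names; it is not taken from any Apache module. Because a token is appended on every response, the incoming direction can always remove exactly the last token, and an origin ETag that already ends in "-gzip" survives the round trip.

```python
def tag_variant(etag, encoding):
    """Append '-<encoding>' inside the quotes, e.g. '"abc"' -> '"abc-gzip"'."""
    return etag[:-1] + "-" + encoding + '"'


def untag_variant(etag):
    """Split off the final '-<token>' and return (origin_etag, token).

    Unambiguous only because tag_variant() is applied to every response,
    including uncompressed ones.
    """
    value = etag[1:-1]
    base, _, token = value.rpartition("-")
    return '"' + base + '"', token


# An origin ETag that happens to end in "-gzip" is still recovered intact.
origin = '"r42-gzip"'
as_identity = tag_variant(origin, "identity")
as_gzip = tag_variant(origin, "gzip")
```

Here `untag_variant(as_identity)` yields `('"r42-gzip"', 'identity')` and `untag_variant(as_gzip)` yields `('"r42-gzip"', 'gzip')`, so the two variants carry different ETags and the origin value is never corrupted. Handling of a `W/` prefix is deliberately left out of the sketch.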
Re: ETag and Content-Encoding
On Wed, 2007-10-03 at 23:52 +0200, Henrik Nordstrom wrote:
> > That is not HTTP. Don't confuse the needs of caching with the needs
> > of range requests -- only range requests need strong etags.
>
> I am not. I am talking about If-None-Match, not If-Range. And
> specifically the use of If-None-Match in 13.6 Caching Negotiated
> Responses.

To clarify, I do not care much about strong/weak ETags. This is a property of how the server generates the content, with no significant relevance to caching other than that the ETags as such must be sufficiently unique (there are some cache impacts of weak ETags, but they are not really relevant to this discussion).

If anything I said seems to imply that I only want to see strong ETags, then that's solely due to the use of poor language on my part and not intentional.

All I am trying to say is that the responses

  [no Content-Encoding]

and

  Content-Encoding: gzip

from the same negotiated resource are two different variants in terms of HTTP and must carry different ETag values, if any. End. The rest is just trying to get people to see this.

Apache mod_deflate does not do this when doing its dynamic content-negotiation-driven transformations, and that is a bug (13.11 MUST) with quite nasty implications for caching of negotiated responses (13.6).

The fact that responses with different Content-Encoding are meant to result in the same object after decoding is pretty much irrelevant here. They are two incompatible, different negotiated variants of the resource, and that is all that matters.

I am also saying that the simple change of making mod_deflate transform any existing ETag into a weak one is not sufficient to address this properly, but it's quite likely to plaster over the problem for a while in most uses except when the original response ETag is already weak. It will however break completely if Apache GET If-None-Match processing is changed to use the weak comparison as mandated by the RFC (13.3.3) (to the best of my knowledge Apache always uses the strong function, but I may be wrong there).

Negotiation of Content-Encoding is really not any different from negotiation of any of the other content properties such as Content-Language or Content-Type. The same rules apply, and each unique outcome (variant) of the negotiation process needs to be assigned a unique ETag with no overlaps between variants, and for strong ETags each binary version of each variant needs to have a unique ETag with no overlaps.

This ignores any out-of-band dynamic parameters to the negotiation process, such as server load, which might affect responses to the same request; I am only talking about negotiation based on request headers. For out-of-band negotiation properties it's important to respect the strong ETag binary equivalence requirements.

Note: Changed language to use the more proper term "variant" instead of "entity". Hopefully less confusing.

Regards
Henrik
Cc: lists (Re: ETag and Content-Encoding)
On Wed, 3 Oct 2007 07:53:31 -0700 "Justin Erenkrantz" <[EMAIL PROTECTED]> wrote:
> [chop]

The Cc: list on this and subsequent postings is screwed:

(1) It includes me, so I get everything twice. OK, I can live with that, but it's annoying.

(2) It fails to include Henrik Nordstrom, the principal non-Apache protagonist in this discussion.

--
Nick Kew
Re: Cc: lists (Re: ETag and Content-Encoding)
On Wed, 2007-10-03 at 21:44 +0100, Nick Kew wrote:
> The Cc: list on this and subsequent postings is screwed:
>
> (1) It includes me, so I get everything twice.
> OK, I can live with that, but it's annoying.

Use a Message-Id filter?

> (2) It fails to include Henrik Nordstrom, the principal
> non-Apache protagonist in this discussion.

No problem. I am a dev@ subscriber.

Regards
Henrik