Re: svn commit: r951222 - in /httpd/httpd/trunk: CHANGES modules/cache/mod_disk_cache.c
On 07 Jun 2010, at 8:19 AM, Ruediger Pluem wrote: I still fail to see how to integrate this easily with the current mod_cache API, since you sure want to use all the generic freshness checks within mod_cache. And if a copy is seen as fresh it is simply served. But I guess this now goes OT and it would be better to discuss this in a separate thread as soon as someone offers an implementation to actually cache 206's. It is easier to discuss the exact code then. Does someone actually plan to do this? I may need to do this soon. We have been promoting the use of inline HTTP caching within our service oriented platform, and have been stress testing mod_cache quite significantly in the process. We deliver all sorts of content, including video file fragments. One of the things that needs to be fixed is the current situation where a cached entity has one header file and one data file. This creates races in the cache. What really needs to happen is that there is one header file only, and the data file has a unique name contained in the header file, so that a second new data file can be safely moved into place at leisure without affecting the original data file, and the corresponding headers file renamed in atomically. I predict this won't be a backportable change, so I'm getting the backportable fixes done first before I attempt this. Regards, Graham --
Re: svn commit: r951222 - in /httpd/httpd/trunk: CHANGES modules/cache/mod_disk_cache.c
On 06/06/2010 01:35 PM, Graham Leggett wrote: > On 05 Jun 2010, at 6:51 PM, Ruediger Pluem wrote: > IMHO it does not (at least according to the comments and the code looks like to follow these): >>> >>> This is only present on trunk, and this needs to be fixed too. The >>> problem we saw was in httpd v2.2. >> >> A similar code is part of 2.2. Hence I am still astonished why you >> see this problem in 2.2. > > I think we're talking across purposes, the code in the most recent v2.2 > branch I am looking at looks like this with the comments stripped: Ah, sorry I missed the inner if which was added in r727599. You are correct, 2.2 caches 206 with expire or cache-control set. > >> I am currently worried that a fresh cached copy of a 206 response cannot >> be updated accordingly if a full response is requested. That puts the >> burden >> of this stuff on each provider instead of some general framework solution >> in mod_cache. So I think mod_cache should provide a better framework for >> 206 responses first before providers should implement it. > > I don't follow what mod_cache would need to do in this case with respect > to a 206 that makes a cache implementation require special support. > > If a cache implementation wanted to, it could cache the 206, marking the > range in the cache as appropriate. When a full request came along, the > cache could potentially modify the request to ask for the missing range, > and then decline the opportunity to serve from cache. The backend > supplies the missing range, and the cache_save filter saves the missing > range to complete the request, and inserts the existing cached data into > the response as a simple file bucket, completing the response. I still fail to see how to integrate this easily with the current mod_cache API, since you sure want to use all the generic freshness checks within mod_cache. And if a copy is seen as fresh it is simply served. But I guess this now goes OT and it would be better to discuss this in a separate thread as soon as someone offers an implementation to actually cache 206's. It is easier to discuss the exact code then. Does someone actually plan to do this? > If we did find a way to support a generic 206 mechanism in mod_cache, > that would be great (assuming it was necessary), but the opposite, > banning any caching implementation from even attempting to cache a 206 > by enforcing a ban in mod_cache itself isn't ideal, if that makes sense? Ok, I am fine with that now. I guess the discussion might continue as soon as we see the first implementation for caching 206's. Regards Rüdiger
Re: svn commit: r951222 - in /httpd/httpd/trunk: CHANGES modules/cache/mod_disk_cache.c
On 06.06.2010 13:35, Graham Leggett wrote: On 05 Jun 2010, at 6:51 PM, Ruediger Pluem wrote: The reason we don't see this problem very often is because we don't see a 206 very often at this point in the code. The filter that does content ranges sits after the CACHE_SAVE filter, and so under most circumstances, the response is still a complete 200 OK when it gets here, and the cache caches the full 200 OK response. If we reverse proxy to an application server of some kind, many applications don't consider the range request at all, and return a 200 OK anyway. Again, the cache does "the right thing" at this point and caches the complete response, even if the browser has requested a range, the range filter turning the response into a range response after the cache has finished caching the full response. It's only when we reverse proxy to another server that understands and responds to range requests itself that we will see a 206 at this point, and this isn't a common configuration. Because the cache doesn't know any better, it passes the 206 back to mod_disk_cache. Because mod_disk_cache previously didn't know better either, it cached the 206 (this is now fixed). Subsequent ordinary requests were handed cached 206 partial responses in return even though the browser asked for a normal request, until an end user goes "huh", clicks "shift reload" and the cache fixes itself again. I'm not very experienced with mod_cache, but that's exactly a situation I noticed with an early 2.2 release, where a caching reverse proxy cached parts of PDF documents served by a backend and served those parts on later requests for the document. The Byte Range requests came from Acrobat (loading individual pages). At that time I didn't investigate closer, but at least for an old 2.2 release I can confirm this behaviour. Regards, Rainer
Re: svn commit: r951222 - in /httpd/httpd/trunk: CHANGES modules/cache/mod_disk_cache.c
On 05 Jun 2010, at 6:51 PM, Ruediger Pluem wrote: IMHO it does not (at least according to the comments and the code looks like to follow these): This is only present on trunk, and this needs to be fixed too. The problem we saw was in httpd v2.2. A similar code is part of 2.2. Hence I am still astonished why you see this problem in 2.2. I think we're talking across purposes, the code in the most recent v2.2 branch I am looking at looks like this with the comments stripped: if (r->status != HTTP_OK && r->status != HTTP_NON_AUTHORITATIVE && r->status != HTTP_MULTIPLE_CHOICES && r->status != HTTP_MOVED_PERMANENTLY && r->status != HTTP_NOT_MODIFIED) { if (exps != NULL || cc_out != NULL) { // noop } else { reason = apr_psprintf(p, "Response status %d", r->status); } } If the request survives to the end of this code with "reason" still NULL, the response is still cacheable at this point. A 206 always gets past the first if, and is then tested against the second. If an Expires or a Cache-Control is present (and this is true in many cases, and particularly in our case where the news people found the bug), the "noop" path is followed, and we reach the end of the if statements with reason still NULL. The cache implementation then duly incorrectly caches the partial response. The reason we don't see this problem very often is because we don't see a 206 very often at this point in the code. The filter that does content ranges sits after the CACHE_SAVE filter, and so under most circumstances, the response is still a complete 200 OK when it gets here, and the cache caches the full 200 OK response. If we reverse proxy to an application server of some kind, many applications don't consider the range request at all, and return a 200 OK anyway. Again, the cache does "the right thing" at this point and caches the complete response, even if the browser has requested a range, the range filter turning the response into a range response after the cache has finished caching the full response. It's only when we reverse proxy to another server that understands and responds to range requests itself that we will see a 206 at this point, and this isn't a common configuration. Because the cache doesn't know any better, it passes the 206 back to mod_disk_cache. Because mod_disk_cache previously didn't know better either, it cached the 206 (this is now fixed). Subsequent ordinary requests were handed cached 206 partial responses in return even though the browser asked for a normal request, until an end user goes "huh", clicks "shift reload" and the cache fixes itself again. The fix to this problem comes in two parts, the first is for mod_disk_cache to say "I explicitly decline to cache range requests, I don't yet know how", the second part is to make the comments in mod_cache match the code: The comments in mod_cache say that a 206 is not cacheable, the code said it is, but only if CC or Expires is present. You then fixed this on trunk by banning 206 caching entirely for all mod_cache implementations, but this comes at the price of removing the option for an implementation to choose to cache a range request if it so wished. I am currently worried that a fresh cached copy of a 206 response cannot be updated accordingly if a full response is requested. That puts the burden of this stuff on each provider instead of some general framework solution in mod_cache. So I think mod_cache should provide a better framework for 206 responses first before providers should implement it. I don't follow what mod_cache would need to do in this case with respect to a 206 that makes a cache implementation require special support. If a cache implementation wanted to, it could cache the 206, marking the range in the cache as appropriate. When a full request came along, the cache could potentially modify the request to ask for the missing range, and then decline the opportunity to serve from cache. The backend supplies the missing range, and the cache_save filter saves the missing range to complete the request, and inserts the existing cached data into the response as a simple file bucket, completing the response. If we did find a way to support a generic 206 mechanism in mod_cache, that would be great (assuming it was necessary), but the opposite, banning any caching implementation from even attempting to cache a 206 by enforcing a ban in mod_cache itself isn't ideal, if that makes sense? Regards, Graham --
Re: svn commit: r951222 - in /httpd/httpd/trunk: CHANGES modules/cache/mod_disk_cache.c
On 06/04/2010 05:09 PM, Graham Leggett wrote: > On 04 Jun 2010, at 4:15 PM, Plüm, Rüdiger, VF-Group wrote: > >> IMHO it does not (at least according to the comments and the code >> looks like >> to follow these): > > This is only present on trunk, and this needs to be fixed too. The > problem we saw was in httpd v2.2. A similar code is part of 2.2. Hence I am still astonished why you see this problem in 2.2. > >>> implementation (mod_disk_cache) to decide whether it wants to >>> handle a >>> 206 or not. >>> >>> mod_cache is not the place to fix this. It is entirely valid for a >> >> So you think that should be fixed in every single provider? > > Yes. > > Each provider should have the opportunity to cache a 206 if it so > wishes, as RFC2616 allows it. Remember that providers don't have to be > written by us. > > Any provider that chooses not to support a 206 should explicitly do so, > not rely on mod_cache to enforce a blanket ban on supporting 206 > response caching. > >> I am currently not convinced that any provider could cache a 206 with >> the current mod_cache infrastructure. I am currently worried that a fresh cached copy of a 206 response cannot be updated accordingly if a full response is requested. That puts the burden of this stuff on each provider instead of some general framework solution in mod_cache. So I think mod_cache should provide a better framework for 206 responses first before providers should implement it. Regards Rüdiger
Re: svn commit: r951222 - in /httpd/httpd/trunk: CHANGES modules/cache/mod_disk_cache.c
On 04 Jun 2010, at 4:15 PM, Plüm, Rüdiger, VF-Group wrote: cache implementation to be given the opportunity to cache a 206, if Right, RFC2616 permits caching 206's. What I have in mind is this below. mod_cache supports the caching of a 206, but mod_disk_cache chooses not to. Once backported to 2.2, this will fix the current problem where if a 206 arrives, and contains an Expires or Cache-Control header, it gets cached. Index: modules/cache/mod_cache.c === --- modules/cache/mod_cache.c (revision 951708) +++ modules/cache/mod_cache.c (working copy) @@ -726,16 +726,17 @@ * They are tested here one by one to be clear and unambiguous. */ if (r->status != HTTP_OK && r->status != HTTP_NON_AUTHORITATIVE +&& r->status != HTTP_PARTIAL_CONTENT && r->status != HTTP_MULTIPLE_CHOICES && r->status != HTTP_MOVED_PERMANENTLY && r->status != HTTP_NOT_MODIFIED) { /* RFC2616 13.4 we are allowed to cache 200, 203, 206, 300, 301 or 410 - * We don't cache 206, because we don't (yet) cache partial responses. + * We allow the caching of 206, but a cache implementation might choose + * to decline to cache a 206 if it doesn't know how to. * We include 304 Not Modified here too as this is the origin server * telling us to serve the cached copy. */ -if ((exps != NULL || cc_out != NULL) -&& r->status != HTTP_PARTIAL_CONTENT) { +if (exps != NULL || cc_out != NULL) { /* We are also allowed to cache any response given that it has a * valid Expires or Cache Control header. If we find a either of * those here, we pass request through the rest of the tests. From @@ -748,9 +749,6 @@ * include the following: an Expires header (section 14.21); a * "max-age", "s-maxage", "must-revalidate", "proxy- revalidate", * "public" or "private" cache-control directive (section 14.9). - * - * But do NOT store 206 responses in any case since we - * don't (yet) cache partial responses. */ } else { Regards, Graham --
Re: svn commit: r951222 - in /httpd/httpd/trunk: CHANGES modules/cache/mod_disk_cache.c
On 04 Jun 2010, at 4:15 PM, Plüm, Rüdiger, VF-Group wrote: IMHO it does not (at least according to the comments and the code looks like to follow these): This is only present on trunk, and this needs to be fixed too. The problem we saw was in httpd v2.2. implementation (mod_disk_cache) to decide whether it wants to handle a 206 or not. mod_cache is not the place to fix this. It is entirely valid for a So you think that should be fixed in every single provider? Yes. Each provider should have the opportunity to cache a 206 if it so wishes, as RFC2616 allows it. Remember that providers don't have to be written by us. Any provider that chooses not to support a 206 should explicitly do so, not rely on mod_cache to enforce a blanket ban on supporting 206 response caching. I am currently not convinced that any provider could cache a 206 with the current mod_cache infrastructure. There was nothing in the original design for mod_cache that stopped a provider trying to cache a 206. cache implementation to be given the opportunity to cache a 206, if Right, RFC2616 permits caching 206's. Regards, Graham --
RE: svn commit: r951222 - in /httpd/httpd/trunk: CHANGES modules/cache/mod_disk_cache.c
> -Original Message- > From: Graham Leggett > Sent: Freitag, 4. Juni 2010 15:44 > To: dev@httpd.apache.org > Subject: Re: svn commit: r951222 - in /httpd/httpd/trunk: > CHANGES modules/cache/mod_disk_cache.c > > On 04 Jun 2010, at 7:27 AM, Ruediger Pluem wrote: > > > Why is this needed? mod_cache itself does not allow partial > content > > to be cached > > and even if this does not work it should be fixed there and not in > > one of the > > storage providers. > > mod_cache does allow a 206 to be cached, it is up to the cache IMHO it does not (at least according to the comments and the code looks like to follow these): /* * what responses should we not cache? * * At this point we decide based on the response headers whether it * is appropriate _NOT_ to cache the data from the server. There are * a whole lot of conditions that prevent us from caching this data. * They are tested here one by one to be clear and unambiguous. */ if (r->status != HTTP_OK && r->status != HTTP_NON_AUTHORITATIVE && r->status != HTTP_MULTIPLE_CHOICES && r->status != HTTP_MOVED_PERMANENTLY && r->status != HTTP_NOT_MODIFIED) { /* RFC2616 13.4 we are allowed to cache 200, 203, 206, 300, 301 or 410 * We don't cache 206, because we don't (yet) cache partial responses. * We include 304 Not Modified here too as this is the origin server * telling us to serve the cached copy. */ if ((exps != NULL || cc_out != NULL) && r->status != HTTP_PARTIAL_CONTENT) { /* We are also allowed to cache any response given that it has a * valid Expires or Cache Control header. If we find a either of * those here, we pass request through the rest of the tests. From * the RFC: * * A response received with any other status code (e.g. status * codes 302 and 307) MUST NOT be returned in a reply to a * subsequent request unless there are cache-control directives or * another header(s) that explicitly allow it. For example, these * include the following: an Expires header (section 14.21); a * "max-age", "s-maxage", "must-revalidate", "proxy-revalidate", * "public" or "private" cache-control directive (section 14.9). * * But do NOT store 206 responses in any case since we * don't (yet) cache partial responses. */ } else { reason = apr_psprintf(p, "Response status %d", r->status); } } > implementation (mod_disk_cache) to decide whether it wants to > handle a > 206 or not. > > mod_cache is not the place to fix this. It is entirely valid for a So you think that should be fixed in every single provider? I am currently not convinced that any provider could cache a 206 with the current mod_cache infrastructure. > cache implementation to be given the opportunity to cache a 206, if Right, RFC2616 permits caching 206's. Regards Rüdiger
Re: svn commit: r951222 - in /httpd/httpd/trunk: CHANGES modules/cache/mod_disk_cache.c
On 04 Jun 2010, at 7:27 AM, Ruediger Pluem wrote: Why is this needed? mod_cache itself does not allow partial content to be cached and even if this does not work it should be fixed there and not in one of the storage providers. mod_cache does allow a 206 to be cached, it is up to the cache implementation (mod_disk_cache) to decide whether it wants to handle a 206 or not. mod_cache is not the place to fix this. It is entirely valid for a cache implementation to be given the opportunity to cache a 206, if the cache implementation wants to. It is also entirely valid for a cache implementation to choose not to cache a 206 if it doesn't yet know how to, and that's what this fix does. This patch is in response to buggy behaviour discovered in one of the UK's news websites, where mod_cache on cluster A was reverse proxying content served from origin cluster B. Origin cluster B was reacting to range requests with a 206, and cluster A then cached the 206, returning it to subsequent visitors. This patch fixes the problem. Regards, Graham --
Re: svn commit: r951222 - in /httpd/httpd/trunk: CHANGES modules/cache/mod_disk_cache.c
On 04.06.2010 02:17, minf...@apache.org wrote: > Author: minfrin > Date: Fri Jun 4 00:17:16 2010 > New Revision: 951222 > > URL: http://svn.apache.org/viewvc?rev=951222&view=rev > Log: > mod_disk_cache: Decline the opportunity to cache if the response is > a 206 Partial Content. This stops a reverse proxied partial response > from becoming cached, and then being served in subsequent responses. > > Modified: > httpd/httpd/trunk/CHANGES > httpd/httpd/trunk/modules/cache/mod_disk_cache.c > > Modified: httpd/httpd/trunk/CHANGES > URL: > http://svn.apache.org/viewvc/httpd/httpd/trunk/CHANGES?rev=951222&r1=951221&r2=951222&view=diff > == > --- httpd/httpd/trunk/CHANGES [utf-8] (original) > +++ httpd/httpd/trunk/CHANGES [utf-8] Fri Jun 4 00:17:16 2010 > @@ -28,6 +28,11 @@ Changes with Apache 2.3.6 > processing is completed, avoiding orphaned callback pointers. > [Brett Gervasoni , Jeff Trawick] > > + *) mod_disk_cache: Decline the opportunity to cache if the response is > + a 206 Partial Content. This stops a reverse proxied partial response > + from becoming cached, and then being served in subsequent responses. > + [Graham Leggett] > + >*) mod_deflate: avoid the risk of forwarding data before headers are set. > PR 49369 [Matthew Steele ] > > > Modified: httpd/httpd/trunk/modules/cache/mod_disk_cache.c > URL: > http://svn.apache.org/viewvc/httpd/httpd/trunk/modules/cache/mod_disk_cache.c?rev=951222&r1=951221&r2=951222&view=diff > == > --- httpd/httpd/trunk/modules/cache/mod_disk_cache.c (original) > +++ httpd/httpd/trunk/modules/cache/mod_disk_cache.c Fri Jun 4 00:17:16 2010 > @@ -333,6 +333,14 @@ static int create_entity(cache_handle_t > return DECLINED; > } > > +/* we don't support caching of range requests (yet) */ > +if (r->status == HTTP_PARTIAL_CONTENT) { > +ap_log_error(APLOG_MARK, APLOG_DEBUG, 0, r->server, > + "disk_cache: URL %s partial content response not > cached", > + key); > +return DECLINED; > +} > + Why is this needed? mod_cache itself does not allow partial content to be cached and even if this does not work it should be fixed there and not in one of the storage providers. Regards Rüdiger