On 05 Jun 2010, at 6:51 PM, Ruediger Pluem wrote:
IMHO it does not (at least according to the comments and the code
looks like
to follow these):
This is only present on trunk, and this needs to be fixed too. The
problem we saw was in httpd v2.2.
A similar code is part of 2.2. Hence I am still astonished why you
see this problem in 2.2.
I think we're talking across purposes, the code in the most recent
v2.2 branch I am looking at looks like this with the comments stripped:
if (r->status != HTTP_OK && r->status != HTTP_NON_AUTHORITATIVE
&& r->status != HTTP_MULTIPLE_CHOICES
&& r->status != HTTP_MOVED_PERMANENTLY
&& r->status != HTTP_NOT_MODIFIED) {
if (exps != NULL || cc_out != NULL) {
// noop
}
else {
reason = apr_psprintf(p, "Response status %d", r->status);
}
}
If the request survives to the end of this code with "reason" still
NULL, the response is still cacheable at this point.
A 206 always gets past the first if, and is then tested against the
second. If an Expires or a Cache-Control is present (and this is true
in many cases, and particularly in our case where the news people
found the bug), the "noop" path is followed, and we reach the end of
the if statements with reason still NULL. The cache implementation
then duly incorrectly caches the partial response.
The reason we don't see this problem very often is because we don't
see a 206 very often at this point in the code. The filter that does
content ranges sits after the CACHE_SAVE filter, and so under most
circumstances, the response is still a complete 200 OK when it gets
here, and the cache caches the full 200 OK response.
If we reverse proxy to an application server of some kind, many
applications don't consider the range request at all, and return a 200
OK anyway. Again, the cache does "the right thing" at this point and
caches the complete response, even if the browser has requested a
range, the range filter turning the response into a range response
after the cache has finished caching the full response.
It's only when we reverse proxy to another server that understands and
responds to range requests itself that we will see a 206 at this
point, and this isn't a common configuration.
Because the cache doesn't know any better, it passes the 206 back to
mod_disk_cache. Because mod_disk_cache previously didn't know better
either, it cached the 206 (this is now fixed). Subsequent ordinary
requests were handed cached 206 partial responses in return even
though the browser asked for a normal request, until an end user goes
"huh", clicks "shift reload" and the cache fixes itself again.
The fix to this problem comes in two parts, the first is for
mod_disk_cache to say "I explicitly decline to cache range requests, I
don't yet know how", the second part is to make the comments in
mod_cache match the code: The comments in mod_cache say that a 206 is
not cacheable, the code said it is, but only if CC or Expires is
present. You then fixed this on trunk by banning 206 caching entirely
for all mod_cache implementations, but this comes at the price of
removing the option for an implementation to choose to cache a range
request if it so wished.
I am currently worried that a fresh cached copy of a 206 response
cannot
be updated accordingly if a full response is requested. That puts
the burden
of this stuff on each provider instead of some general framework
solution
in mod_cache. So I think mod_cache should provide a better framework
for
206 responses first before providers should implement it.
I don't follow what mod_cache would need to do in this case with
respect to a 206 that makes a cache implementation require special
support.
If a cache implementation wanted to, it could cache the 206, marking
the range in the cache as appropriate. When a full request came along,
the cache could potentially modify the request to ask for the missing
range, and then decline the opportunity to serve from cache. The
backend supplies the missing range, and the cache_save filter saves
the missing range to complete the request, and inserts the existing
cached data into the response as a simple file bucket, completing the
response.
If we did find a way to support a generic 206 mechanism in mod_cache,
that would be great (assuming it was necessary), but the opposite,
banning any caching implementation from even attempting to cache a 206
by enforcing a ban in mod_cache itself isn't ideal, if that makes sense?
Regards,
Graham
--