Re: vary stuff

2007-02-12 Thread Henrik Nordstrom
Mon 2007-02-12 at 20:46 +0800, Adrian Chadd wrote:
 I'm about to start implementing replacement memory-only store client
 primitives and I'm not fully on top of how the vary code abuses
 store objects to do its thing in store.c.

Heh, "abuses" is a good description.

It actually doesn't do very much with the store objects as such; most of
the magic currently takes place in the request and is used by the
storeLookupByRequestMethod call.

 Would you mind if the Vary support was culled out of the storage work
 branch until I've tidied up the storage manager layer somewhat?

No problem. It's not really that tricky a thing to support. The tricky
part was getting it into Squid-2 without a suitable store interface or
even an intermediary layer.

The things you need to remember about Vary:ing objects and HTTP caching
in general:

0. The caching specification in HTTP is primarily concerned with GET
requests resulting in 200 OK or derived responses (i.e. 206/304), and
with variants of that 200 OK, where the server has N variants per URI,
each identified uniquely by ETag and/or Content-Location. There are some
odd twists, like POST, which may return a cacheable 200 OK suitable for
later GET requests of the same URI (I doubt this is used anywhere, btw).

1. The client-intermediary lookup API needs to be async for it to be
able to do the vary dance. May need multiple store lookups and possibly
a conditional upstream request to find the correct response.

2. In the optimal world each variant has a unique ETag identifying the
response entity (body + entity headers). Such objects may be shared by
multiple requests thanks to If-None-Match 304 replies building up the
knowledge of the Vary logic in the cache. Responses not having an ETag
are identified by their request headers selected by Vary, and are unique
for that request header combination.

3. The vary dance has two different but related results:

a) On a cache hit (matching request found), the result is the
matching response entity (headers + body), based on previously seen
request headers and Vary responses and the object (ETag or unique) these
map to.

b) On a cache miss, i.e. when no matching request headers + Vary
response header pair is found, one needs to find the list of ETags of
the currently cached variants (fresh and expired alike) of the URI. This
list is used for building an If-None-Match conditional request to find
out which (if any) cached variant is valid for this request.
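To make points 2 and 3 a bit more concrete, here is a minimal sketch in
Python (not Squid code). All names in it (Variant, UriEntry, request_key,
store, learn_304, lookup) are made up for illustration, and it assumes
request header names have already been lowercased. It only shows the
mapping logic, not freshness or the async part.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Variant:
    """One cached response entity (headers + body) for a URI."""
    body: bytes
    etag: Optional[str] = None


@dataclass
class UriEntry:
    """What the cache knows about one URI (hypothetical layout)."""
    vary: List[str] = field(default_factory=list)  # header names from Vary
    by_request: Dict[Tuple, Variant] = field(default_factory=dict)
    by_etag: Dict[str, Variant] = field(default_factory=dict)


def request_key(vary, request_headers):
    """Key built from the request headers that Vary selects."""
    return tuple(
        (name.lower(), request_headers.get(name.lower(), "")) for name in vary
    )


def store(entry, request_headers, variant):
    """Remember which variant answered this request header combination."""
    if variant.etag is not None:
        # One shared object per ETag, however many requests map to it.
        variant = entry.by_etag.setdefault(variant.etag, variant)
    entry.by_request[request_key(entry.vary, request_headers)] = variant


def learn_304(entry, request_headers, etag):
    """A 304 to If-None-Match tells us this request maps to a known ETag."""
    entry.by_request[request_key(entry.vary, request_headers)] = entry.by_etag[etag]


def lookup(entry, request_headers):
    """Return ("hit", variant) or ("miss", If-None-Match value or None)."""
    variant = entry.by_request.get(request_key(entry.vary, request_headers))
    if variant is not None:
        return "hit", variant
    # Miss: offer every cached ETag (fresh or expired) to the server.
    etags = sorted(entry.by_etag)
    return "miss", (", ".join(etags) if etags else None)
```

On a miss the second element is what would go into the If-None-Match
header of the conditional request; a 304 reply naming one of those ETags
is then fed to learn_304 so the next identical request becomes a hit.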

A twist here is that many server implementers, mainly of dynamic gzip
content-encoding (which really, really should be done as
transfer-encoding), don't understand HTTP that well and mess up wrt
ETag and Content-Location. Due to this we need a blacklist where ETag
alone isn't trusted but must be combined with the Accept-Encoding request
header as well to identify the variants of the URI. The Content-Location
problem will bite us the day we start to follow the RFC and correctly
invalidate variants on changes, and I have not yet identified whether a
similar workaround is possible.
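A rough sketch of what such a blacklist check could look like, in Python.
The function name variant_id and the idea of keying the blacklist on the
server's host name are assumptions made for illustration only; nothing
above settles what the blacklist should actually be keyed on.

```python
def variant_id(etag, request_headers, host, untrusted_hosts):
    """Identify a variant of a URI.

    Normally the ETag alone is enough. For servers on the blacklist
    (assumed here to be a set of host names) the ETag is combined with
    the Accept-Encoding request header. Header names are assumed to be
    lowercased already.
    """
    if host in untrusted_hosts:
        return (etag, request_headers.get("accept-encoding", ""))
    return (etag, None)
```

With this, a broken server that sends the same ETag for its gzipped and
plain responses still ends up with two distinct variants in the cache.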


Some words on ETag vs Content-Location:

This whole dance is based on the server-driven content negotiation
scheme, thought of as a server having multiple variants of the same
object, differing in format (e.g. gif/jpeg/png), language (e.g.
sv/en/de) or encoding (e.g. identity/gzip/deflate), each stored as a
unique file in the http directory of the server and each accessible
separately by a unique URI.

Content-Location defines the exact origin of the response. ETag
identifies the exact version of the response.

ETag is guaranteed to be unique across all variants of the URI and, for
a strong ETag, across all versions as well, so the protocol focuses on
ETag in mapping relations between requests and responses.

Content-Location is mainly used in invalidations to make sure all users
get the most recently seen version of a variant.



There are still some small details wrt freshness of Vary:ing objects
where I have not fully understood how it's supposed to work. In the
worst case we may need to maintain freshness separately per request
header combination.


Regards
Henrik





Re: vary stuff

2007-02-12 Thread Adrian Chadd
On Mon, Feb 12, 2007, Henrik Nordstrom wrote:

  Would you mind if the Vary support was culled out of the storage work
  branch until I've tidied up the storage manager layer somewhat?
 
 No problem. It's not really that tricky a thing to support. The tricky
 part was getting it into Squid-2 without a suitable store interface or
 even an intermediary layer.

Hm! I'll give it a shot this evening. I wouldn't mind it if you yanked
the code out of the storework branch before me though.. ;)

(And thanks for the caching description here!)

 1. The client-intermediary lookup API needs to be async for it to be
 able to do the vary dance. May need multiple store lookups and possibly
 a conditional upstream request to find the correct response.

We'll have to do this to support a number of 'other' things, such as
the types of storedirs people have wanted in the past, e.g. md5-based
reiserfs access: performance may suffer but the memory footprint will
be drastically smaller!

From what I've heard from others (as I don't have a commercial web cache lab
here) commercial caches treat Vary content very, very simplistically. We might
want to re-evaluate how we handle Vary, e.g. allowing for Vary header contents
to be 'normalised' (e.g. Vary: Accept-Encoding shouldn't vary based on the
verbatim header contents, as UAs are pretty arbitrary with their Accept-Encoding
headers; instead tokenise Accept-Encoding into a number of states and vary
based on those states.)
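
Something along these lines, as a quick Python sketch rather than Squid
code. The function name normalise_accept_encoding and the choice of three
states (gzip, deflate, identity) are just assumptions to show the idea;
it also ignores the "*" wildcard.

```python
def normalise_accept_encoding(value):
    """Collapse an Accept-Encoding header value into one of a few states."""
    accepted = set()
    for item in value.split(","):
        parts = item.strip().lower().split(";")
        coding = parts[0].strip()
        q = 1.0
        for param in parts[1:]:
            name, _, val = param.strip().partition("=")
            if name.strip() == "q":
                try:
                    q = float(val)
                except ValueError:
                    q = 0.0  # unparsable q-value: treat as not acceptable
        if coding and q > 0:
            accepted.add(coding)
    # Prefer gzip, then deflate; everything else collapses to identity.
    if "gzip" in accepted or "x-gzip" in accepted:
        return "gzip"
    if "deflate" in accepted:
        return "deflate"
    return "identity"
```

The cache would then key variants on the returned state instead of the
raw header, so "gzip, deflate" and "GZIP;q=1.0,  deflate" share one entry.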

But ok. If you or I or someone else feels up to it then let's yank the Vary
support out of the SF storework branch and leave it out until we've got the
rest of the store manager sorted. It does mean we might have trouble finding
testers (wikimedia probably won't test it, as they really do want Vary
support) but I'll see what I can do.




Adrian