mod_disk_cache patch, preview edition (was: new cache arch)

Niklas Edmundsson Tue, 02 May 2006 05:03:39 -0700

On Tue, 2 May 2006, Graham Leggett wrote:

>> I've been hacking on mod_disk_cache to make it:
>> * Only store one set of data when one uncached item is accessed
>>    simultaneously (currently all requests cache the file and the last
>>    one to finish caching "wins").
>> * Not wait until the whole item is cached, but reply while caching
>>    (currently it stalls).
>> * Not block the requesting thread when requesting a large uncached
>>    item, but cache in the background and reply while caching
>>    (currently it stalls).
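The first point amounts to electing a single writer per uncached item. One common way to do that, sketched here assuming a local POSIX filesystem (O_EXCL has historically been unreliable on older NFS versions) and not necessarily how the patch does it, is to let exclusive creation of the cache file act as the lock: open() with O_CREAT|O_EXCL succeeds for exactly one caller. The names claim_cache_file and cache_role are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical result of trying to claim an uncached item (not from the patch). */
typedef enum { CACHE_WRITER, CACHE_READER, CACHE_ERROR } cache_role;

/* Try to become the single request that caches 'path'.
 * O_CREAT|O_EXCL is atomic on a local POSIX filesystem, so exactly one of
 * several concurrent callers creates the file and becomes the writer.
 * Everyone else gets EEXIST and opens the same file read-only, so they
 * can reply from it while it is being filled instead of caching their
 * own copy. */
static cache_role claim_cache_file(const char *path, int *fd_out)
{
    int fd;

    assert(path != NULL && fd_out != NULL);
    fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd >= 0) {
        *fd_out = fd;
        return CACHE_WRITER;
    }
    if (errno == EEXIST) {
        /* Can still fail if the writer gave up and removed the file. */
        fd = open(path, O_RDONLY);
        if (fd >= 0) {
            *fd_out = fd;
            return CACHE_READER;
        }
    }
    *fd_out = -1;
    return CACHE_ERROR;
}
```

The caller that gets CACHE_WRITER fills the file; the others read it as it grows. Removing a half-written file when the writer fails is not shown.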


> This is great: in doing this you've been solving a proxy bug that was
> first reported in 1998 :).


OK. Stuck in the "File under L for Later" pile? ;)

> The only things to be careful of are that Cache-Control: no-cache and
> friends are handled gracefully (the partially cached file should be
> marked as "delete-me" so that the current request creates a new cache
> file, or no cache file; existing running downloads should be unaffected
> by this), and that backend failures (either a timeout or a premature
> socket close) cause the cache entry to be invalidated and deleted.

I haven't changed the handling of this, so any bugs in this regard shouldn't be my fault at least ;)

As for partially cached files, the patch recognizes when caching a file has failed, and so on.

>> * Use more or less atomic operations (caching headers and data in
>>    separate files gets very messy if you want to keep them consistent).


> Keep in mind that HTTP/1.1 compliance requires that the headers be
> updatable without changing the body.

They are. It seek()s to the offset where the body is stored, so the headers can be updated in place as long as they don't grow too much.
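A minimal sketch of that kind of layout, assuming a fixed-size header area at the start of the cache file so that the body always begins at a known offset. HDR_AREA_SIZE and the entry_open, update_headers, write_body and read_body names are hypothetical; the patch may lay the file out differently.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical layout: [header area: HDR_AREA_SIZE bytes][body ...].
 * Because the body always starts at HDR_AREA_SIZE, the headers can be
 * replaced without moving or rewriting the body. */
#define HDR_AREA_SIZE 4096

/* Create (or truncate) a cache entry file for reading and writing. */
static int entry_open(const char *path)
{
    return open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
}

/* Rewrite the header area in place.  Returns -1 if the new headers no
 * longer fit (the entry would then have to be re-cached) or on I/O error. */
static int update_headers(int fd, const char *hdrs, size_t len)
{
    char area[HDR_AREA_SIZE];

    assert(hdrs != NULL);
    if (len > sizeof(area))
        return -1;                     /* headers grew too much */
    memset(area, 0, sizeof(area));     /* NUL padding hides older, longer headers */
    memcpy(area, hdrs, len);
    if (pwrite(fd, area, sizeof(area), 0) != (ssize_t)sizeof(area))
        return -1;
    return 0;
}

/* Write body bytes at body offset 'off', i.e. after the header area. */
static int write_body(int fd, const void *buf, size_t len, off_t off)
{
    return pwrite(fd, buf, len, HDR_AREA_SIZE + off) == (ssize_t)len ? 0 : -1;
}

/* Read body bytes at body offset 'off'; returns bytes read or -1. */
static ssize_t read_body(int fd, void *buf, size_t len, off_t off)
{
    return pread(fd, buf, len, HDR_AREA_SIZE + off);
}
```

Padding with NULs means a shorter set of replacement headers leaves no stale bytes behind, while headers that outgrow the area force a re-cache, which corresponds to the "as long as they don't grow too much" caveat.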

>> * You can't use tempfiles, since other requests need to be able to
>>    find the data in order to reply while caching.
>> * You want to know the size of the data in order to tell when you're
>>    done (i.e. the current size of the file isn't necessarily the real
>>    size of the body, since it might still be caching while we're
>>    reading it).


> The cache already wants to know the size of the data so that it can decide
> whether it's prepared to try and cache the file in the first place, so in
> theory this should not be a problem.


The need-to-know-the-size issue applies to retrievals as well.

You also have the "size unknown right now" issue, which this patch solves by writing a header with the size -1 and then updating it when the size is known.
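A sketch of the "size -1 until known" scheme, assuming the body size is kept as a signed 64-bit field at the start of the cache file (64 bits so that large files fit on 32-bit platforms too). struct entry_hdr, SIZE_UNKNOWN and the entry_* functions are hypothetical, not the patch's own. Detecting a writer that dies before recording the size, so that readers don't wait forever, is left out.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical marker for "body size not known yet". */
#define SIZE_UNKNOWN ((int64_t)-1)

/* Hypothetical on-disk header (not the patch's format); only the size is shown. */
struct entry_hdr {
    int64_t body_size;
};

/* Writer: create the entry with SIZE_UNKNOWN in its header, before the
 * body size is known.  Returns the fd or -1. */
static int entry_create(const char *path)
{
    struct entry_hdr h = { SIZE_UNKNOWN };
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);

    if (fd < 0)
        return -1;
    if (pwrite(fd, &h, sizeof(h), 0) != (ssize_t)sizeof(h)) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Writer: once the whole body has been stored, record its real size. */
static int entry_finish(int fd, int64_t size)
{
    struct entry_hdr h = { size };

    assert(size >= 0);
    return pwrite(fd, &h, sizeof(h), 0) == (ssize_t)sizeof(h) ? 0 : -1;
}

/* Reader: sets *done once the writer has stored a real size and the
 * reader has already sent that many body bytes.  The file's current
 * length can't be used for this, since the writer may still be
 * appending to it. */
static int entry_reader_done(int fd, int64_t bytes_sent, int *done)
{
    struct entry_hdr h;

    if (pread(fd, &h, sizeof(h), 0) != (ssize_t)sizeof(h))
        return -1;
    *done = (h.body_size != SIZE_UNKNOWN && bytes_sent >= h.body_size);
    return 0;
}
```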

>> In any case the patch is more or less finished. Independent testing
>> and auditing haven't been done yet, but I can submit a preliminary
>> jumbo-patch if people are interested in having a look at it now.


> Post it; people can take a look.

OK, it's attached. It has only had mild testing with the worker MPM and mmap enabled; it needs a bit more testing and auditing before anyone trusts it too much.

Note that this patch fixes a whole slew of other issues along the way, the most notable being LFS (large file support) on 32-bit architectures, not eating all of your 32-bit memory/address space when caching huge files, providing r->filename so that %f in LogFormat works, and other smaller issues.


/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se      |     [EMAIL PROTECTED]
---------------------------------------------------------------------------
 I am Zirofsky of Borg. I will reassimilate Alaska and Finland.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

httpd-2.2.2-mod_disk_cache-jumbo20060502.patch.gz
Description: Binary data
