On Fri, 9 May 2008, Graham Leggett wrote:

There is a lot of valuable stuff in your jumbo patch - but I am not sure what the best approach is to fold it in.

Could you have have a look at the rough patch I posted earlier today - and let me know if you have any thoughts as to which parts should be moved 'up' -- and hence be of use to other caching backends as well - and which
parts are pure disk optimized/specific ?

Or perhaps form an option if we need multiple disk cache modules - each optimized differently (e.g. for large files on multiple spindles; versus very 'hot' cache which is virtually living on a meory disk).

I think the safest option is to develop it as another disk cache module in parallel to the current one. It would definitely help if the jumbo patch could be updated alongside the current batch of changes.

My jumbopatch was rather singlemindedly aimed at making the disk cache work with our usecase: caching large files on machines with not-that-much ram (ie. 4GB DVD images on machines with 3GB RAM). The current mod_disk_cache simply sucks at this, to the point it's completely unusable since all it does is start caching the file once per access, filling your disk and eating all your memory/address space and then segfault.

With that in mind, there are a few things that without a doubt should benefit "vanilla" mod_disk_cache, like the cleanup of on-disk-format (storing of headers, data types used for sizes), cleanup of various functions and probably other things. Those things could probably be merged to trunk without too much debate, given that I can find time to do it.

Also, we want the cache to be usable by other parties. We have a preloaded library that makes rsync able to use it, and an improved version that makes vsftpd work needs some testing before we put it to use.

Changing the absolutely silly defaults for length and number of subdirectories in the cache hierarchy would probably also be wise. You should never have to delete cache directories like is being done now, if you have to you're creating way too many (or are using a crappy filesystem without indexed dirs). CacheDirLength 1 and CacheDirLevels 2 gives you 4096 directories (64^2), that should be more than enough for a competent filesystem.

If I remember correctly the parts of the patch causing commotion was:
- Getting rid of the temporary files and caching directly to the
  destination file (requires O_EXCL which supposedly doesn't work on
  Windows, "too much and ugly code" and other comments).
- The background-caching-stuff. Yes, I know that forking a new
  thread/process isn't pretty but that was the easy way to do it. A
  service-process that does the background-caching is probably more
  elegant but involved much more httpd magic that I'm clueless about.
- Using fstat() in the read-while-caching buckets, people wanted
  inotify-stuff for this even though there is no support in APR for
  it.

The thing is - this is what makes the thing work for using the cache with large files, and I'm not convinced that it fundamentally breaks stuff for small-file-cache-people.

Oh, and for a reference that the jumbo patch is at least usable: It has served approx 4220 PB during the time we have used it, with some 90-95% cache reuse :)

That said I'm very well aware that my code is not the ultimate solution, it's more of a proof that there's some odds and ends missing in apr/httpd/whatever infrastructure to do this in an elegant way. However, obviously still possible to do in a single module without having to patch your way through httpd.

I'm not convinced that forking the disk cache having two similar ones tuned for different usecases is a good idea in the long run, I'm pretty sure that the parts that needs tweaking can be solved with config options and documentation. For a development sprint like this it seems sane though, although one option could be to simply do a branch of modules/cache and go wild with the controversial parts until all parties are satisfied.


/Nikke - will have another glass of wine now, I'll fail miserably in
         Guitar Hero anyhow ;)
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se      |     [EMAIL PROTECTED]
---------------------------------------------------------------------------
 When all else fails, read the manual.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

Reply via email to