On Fri, 9 May 2008, Graham Leggett wrote:
There is a lot of valuable stuff in your jumbo patch - but I am not sure
what the best approach is to fold it in.
Could you have have a look at the rough patch I posted earlier today - and
let me know if you have any thoughts
as to which parts should be moved 'up' -- and hence be of use to other
caching backends as well - and which
parts are pure disk optimized/specific ?
Or perhaps form an option if we need multiple disk cache modules - each
optimized differently (e.g. for large files on multiple spindles; versus
very 'hot' cache which is virtually living on a meory disk).
I think the safest option is to develop it as another disk cache module in
parallel to the current one. It would definitely help if the jumbo patch
could be updated alongside the current batch of changes.
My jumbopatch was rather singlemindedly aimed at making the disk cache
work with our usecase: caching large files on machines with
not-that-much ram (ie. 4GB DVD images on machines with 3GB RAM). The
current mod_disk_cache simply sucks at this, to the point it's
completely unusable since all it does is start caching the file once
per access, filling your disk and eating all your memory/address space
and then segfault.
With that in mind, there are a few things that without a doubt should
benefit "vanilla" mod_disk_cache, like the cleanup of on-disk-format
(storing of headers, data types used for sizes), cleanup of various
functions and probably other things. Those things could probably be
merged to trunk without too much debate, given that I can find time to
do it.
Also, we want the cache to be usable by other parties. We have a
preloaded library that makes rsync able to use it, and an improved
version that makes vsftpd work needs some testing before we put it to
use.
Changing the absolutely silly defaults for length and number of
subdirectories in the cache hierarchy would probably also be wise. You
should never have to delete cache directories like is being done now,
if you have to you're creating way too many (or are using a crappy
filesystem without indexed dirs). CacheDirLength 1 and CacheDirLevels
2 gives you 4096 directories (64^2), that should be more than enough
for a competent filesystem.
If I remember correctly the parts of the patch causing commotion was:
- Getting rid of the temporary files and caching directly to the
destination file (requires O_EXCL which supposedly doesn't work on
Windows, "too much and ugly code" and other comments).
- The background-caching-stuff. Yes, I know that forking a new
thread/process isn't pretty but that was the easy way to do it. A
service-process that does the background-caching is probably more
elegant but involved much more httpd magic that I'm clueless about.
- Using fstat() in the read-while-caching buckets, people wanted
inotify-stuff for this even though there is no support in APR for
it.
The thing is - this is what makes the thing work for using the cache
with large files, and I'm not convinced that it fundamentally breaks
stuff for small-file-cache-people.
Oh, and for a reference that the jumbo patch is at least usable: It
has served approx 4220 PB during the time we have used it, with
some 90-95% cache reuse :)
That said I'm very well aware that my code is not the ultimate
solution, it's more of a proof that there's some odds and ends missing
in apr/httpd/whatever infrastructure to do this in an elegant way.
However, obviously still possible to do in a single module without
having to patch your way through httpd.
I'm not convinced that forking the disk cache having two similar ones
tuned for different usecases is a good idea in the long run, I'm
pretty sure that the parts that needs tweaking can be solved with
config options and documentation. For a development sprint like this
it seems sane though, although one option could be to simply do a
branch of modules/cache and go wild with the controversial parts until
all parties are satisfied.
/Nikke - will have another glass of wine now, I'll fail miserably in
Guitar Hero anyhow ;)
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | [EMAIL PROTECTED]
---------------------------------------------------------------------------
When all else fails, read the manual.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=