On Mon, 6 Sep 2010, Graham Leggett wrote:
<snip>
>> For those who have forgotten, that's what we do in our
>> large-file-caching-patchset for mod_disk_cache (hidden as an attachment to
>> https://issues.apache.org/bugzilla/show_bug.cgi?id=39380, but I should
>> really get around to uploading an up-to-date version that applies cleanly to the
>> current 2.2 release). Some of the solutions there aren't really applicable
>> to httpd proper (mostly workarounds for missing infrastructure), but some
>> ideas are rather sane (like writing the header files in a single go with an
>> iovec of null-terminated strings instead of CRLF-delimited text that needs to be
>> parsed). Oh, and the design caters for a shared data cache (FTP and rsync
>> access use the same cache), which isn't really a priority for something in
>> httpd proper.
> Given that the make-cache-writes-atomic problem requires a change to the data
> format, it may be useful to look at this now, before v2.4 is baked, which
> will happen soon.
Indeed.
While we're at it, it might make sense to replace arch-specific data types
like int and apr_size_t with apr_int32_t and such. Most people have
probably made the 32/64-bit transition already, though, so it might be a
non-issue.
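A minimal sketch of what fixed-width fields buy, assuming plain C11 with <stdint.h> (inside httpd the APR types such as apr_uint32_t and apr_int64_t would play the same role). The struct name and its fields are hypothetical, not mod_disk_cache's actual layout. With every field fixed-width and ordered so that no padding is needed, the record has the same size and offsets in 32-bit and 64-bit builds.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical on-disk record: every field has a fixed width, so size and
 * field offsets do not depend on whether the build is 32- or 64-bit. */
struct cache_info_fixed {
    uint32_t format;        /* on-disk format version */
    uint32_t header_len;    /* length of the header block that follows */
    uint64_t body_size;     /* expected size of the body file */
    int64_t  date;          /* apr_time_t-style values (microseconds) */
    int64_t  expire;
};

/* Two 32-bit fields first, then 64-bit ones: no padding on common ABIs.
 * Fail the build if that assumption ever stops holding. */
_Static_assert(sizeof(struct cache_info_fixed) == 32,
               "cache_info_fixed must not contain padding");
_Static_assert(offsetof(struct cache_info_fixed, body_size) == 8,
               "unexpected field offset");
```

Fixed widths settle size and layout only; byte order still differs between architectures.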
Another good thing to have would be the filename of the matching
data/body file. httpd's mod_disk_cache derives it by hashing the URL, but
there may be smarter ways to choose it at cache time, which requires the
resulting filename to be stored (for example, we use dev/inode on plain
files to reduce data duplication when caching DVD images with dozens
of known URLs). The size of that file is also good to have: on a mismatch,
the cache is out of sync or corrupted (unless the file is still being written,
but then we know enough to start answering the query from the cache).
We also save r->filename so that we can fill it in when replying to a
query (I think this is to make logging of filenames work).
> How much of a performance boost does using null-terminated strings give?
Since CPU is cheap nowadays, not much in terms of end-to-end performance, but
the logic for figuring out whether a header file is correct and complete
becomes much easier when you construct the entire .header file in an
iovec, place the total header length in the on-disk structure, and
then write it out.
Reading it back then amounts to reading the main data structure, and then
reading whatever length the structure indicates as headers. If you get more
or less data than the structure says, something is wrong and you can
either retry (if the header seems to be in the middle of being written and
the iovec size limit is too small, so it takes multiple writes; but as the
current mod_disk_cache code uses temporary files, that's a non-issue)
or discard it.
The current text-based .header files offer no way of verifying the
integrity of the header file, and store_table()/read_table() carry
quite a lot of complexity when simply handling the null-terminated
strings as-is would do nicely.
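With headers stored as null-terminated strings, parsing reduces to walking the buffer. This is a minimal sketch that assumes a "name\0value\0" layout; walk_header_pairs() and header_cb are hypothetical names. In httpd the callback would typically add each pair to an apr_table_t, for instance with apr_table_add(). Checking that the buffer ends in a NUL before calling strlen() keeps every scan inside the buffer.

```c
#include <stddef.h>
#include <string.h>

typedef void (*header_cb)(const char *name, const char *value, void *ctx);

/* Walk a header block laid out as "name\0value\0name\0value\0".
 * Returns the number of pairs, or -1 if the block is malformed.
 * Pass cb == NULL to validate without visiting. When -1 is returned,
 * cb may already have been called for earlier pairs. */
int walk_header_pairs(const char *buf, size_t len, header_cb cb, void *ctx)
{
    size_t off = 0;
    int pairs = 0;

    if (len == 0)
        return 0;
    if (buf[len - 1] != '\0')
        return -1;          /* last string unterminated: strlen() could overrun */
    while (off < len) {
        const char *name = buf + off;
        off += strlen(name) + 1;
        if (off >= len)
            return -1;      /* a name with no value after it */
        const char *value = buf + off;
        off += strlen(value) + 1;
        if (cb != NULL)
            cb(name, value, ctx);
        pairs++;
    }
    return pairs;
}
```

Calling it once with cb == NULL before the real pass gives an all-or-nothing check, so a corrupt file never leaves a partially filled table behind.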
/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | ni...@acc.umu.se
---------------------------------------------------------------------------
After three days of intense pain, the snake died. * Riker
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=