Re: mod_cache: store_body() bites off more than it can chew

Niklas Edmundsson Mon, 06 Sep 2010 05:10:18 -0700

On Mon, 6 Sep 2010, Graham Leggett wrote:

<snip>

For those who have forgotten, that's what we do in our large-file-caching-patchset for mod_disk_cache (hidden as an attachment to https://issues.apache.org/bugzilla/show_bug.cgi?id=39380 but I should really get around to uploading an up2date version that applies cleanly to the current 2.2 release). Some of the solutions there aren't really applicable to httpd proper (mostly workarounds for missing infrastructure), but some ideas are rather sane (like writing the header files in a single go with an iovec with null-terminated strings instead of crlf-stuff that needs to be parsed). Oh, and the design caters for a shared data cache (ftp and rsync access uses the same cache), which isn't really a priority for something in httpd proper.
Given that the make-cache-writes-atomic problem requires a change to the data format, it may be useful to look at this now, before v2.4 is baked, which will happen soon.


Indeed.

While at it, it might make sense to replace arch-specific data types like int and apr_size_t with apr_int32_t and such. Most people would have made the 32/64 bit transition already though, so it might be a non-issue.
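
To illustrate the fixed-width idea, here is a minimal sketch of an on-disk structure. The struct name and its fields are made up for this example and are not the actual mod_disk_cache format. It uses the standard <stdint.h> types so that it compiles without APR; in httpd the equivalents would be apr_uint32_t, apr_uint64_t and apr_int64_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical on-disk header: every field has an explicit width, so the
 * layout does not change between 32-bit and 64-bit builds. */
typedef struct {
    uint32_t format_version; /* bumped whenever the layout changes */
    uint32_t header_len;     /* bytes of header strings that follow */
    uint64_t body_size;      /* expected size of the body file */
    int64_t  expire_usec;    /* expiry time in microseconds */
} cache_info_fixed_t;

/* Compile-time checks: the build fails if the layout ever drifts. */
static_assert(sizeof(cache_info_fixed_t) == 24, "unexpected struct size");
static_assert(offsetof(cache_info_fixed_t, body_size) == 8, "unexpected offset");
```

Note that fixed widths only pin down sizes and offsets; byte order still differs between architectures, which matters if the cache is shared between them.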

Another good thing to have would be the filename of the matching data/body file. httpd mod_disk_cache hashes this from the URL, but there may be smarter ways to do this at cache-time, which require the resulting filename to be stored (for example we use dev/inode on plain files to reduce data duplication when caching DVD images with dozens of known URLs). The size of that file is also good to have: on mismatch the cache is out of sync/corrupted (unless the file is being written, but then we know enough to start answering the query from cache).
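
A rough sketch of the dev/inode idea, using plain POSIX stat(). The function name and the "%llx-%llx.body" naming scheme are assumptions made for this example, not what our patchset or httpd actually uses. The point is only that every URL resolving to the same plain file yields the same body name.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

/* Derive a cache body name from the device and inode of a plain file.
 * Returns 0 on success, -1 if path is not a regular file or if buf is
 * too small to hold the name. */
static int body_name_from_inode(const char *path, char *buf, size_t buflen)
{
    struct stat st;
    int n;

    assert(path != NULL && buf != NULL);
    if (stat(path, &st) != 0 || !S_ISREG(st.st_mode))
        return -1;

    /* Hypothetical naming scheme: hex device, dash, hex inode. */
    n = snprintf(buf, buflen, "%llx-%llx.body",
                 (unsigned long long)st.st_dev,
                 (unsigned long long)st.st_ino);
    return (n > 0 && (size_t)n < buflen) ? 0 : -1;
}
```

Since the name can no longer be recomputed from the URL alone, it has to be stored in the header file, which is the point made above.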

Also, we save r->filename to be able to fill it in when replying to a query (I think for making logging of filenames work).

How much of a performance boost is the use-null-terminated-strings?

As CPU is cheap nowadays, not much in end-to-end performance, but the logic of figuring out whether a header file is correct/complete becomes much easier when you construct the entire .header file in an iovec, place the total header length in the on-disk structure, and then write it out.

Reading it in becomes reading the main data structure, and then reading whatever length the structure indicates as headers. If you get more or less than the data structure says, then something is wrong and you can either retry or discard it. A retry makes sense if the header file appears to be still being written, for example because the iovec limit is too small and it takes multiple writes; as the current mod_disk_cache code uses temporary files, that's a non-issue there.
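
The write-then-verify scheme can be sketched roughly as follows, using POSIX writev()/read() rather than APR to keep it self-contained. The names hdr_info_t, hdr_write(), hdr_read() and the MAX_HDR_STRINGS limit are invented for this example; this is not the code from our patchset.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define MAX_HDR_STRINGS 15 /* hypothetical limit, keeps the iovec small */

/* Hypothetical fixed-size structure stored at the start of the file. */
typedef struct {
    uint32_t format;     /* format version */
    uint32_t header_len; /* bytes of strings that follow, NULs included */
} hdr_info_t;

/* Write the structure plus n NUL-terminated strings with one writev().
 * Returns 0 on success, -1 on error or short write. */
static int hdr_write(int fd, const char *const *strs, int n)
{
    struct iovec iov[MAX_HDR_STRINGS + 1];
    hdr_info_t info = { 1, 0 };
    size_t total = sizeof(info);
    int i;

    assert(n >= 0 && n <= MAX_HDR_STRINGS);
    iov[0].iov_base = &info;
    iov[0].iov_len = sizeof(info);
    for (i = 0; i < n; i++) {
        size_t len = strlen(strs[i]) + 1; /* keep the terminating NUL */
        iov[i + 1].iov_base = (void *)strs[i];
        iov[i + 1].iov_len = len;
        info.header_len += (uint32_t)len;
        total += len;
    }
    return writev(fd, iov, n + 1) == (ssize_t)total ? 0 : -1;
}

/* Read up to len bytes, stopping early only at EOF or on error. */
static ssize_t read_full(int fd, void *buf, size_t len)
{
    size_t got = 0;
    while (got < len) {
        ssize_t r = read(fd, (char *)buf + got, len - got);
        if (r < 0)
            return -1;
        if (r == 0)
            break;
        got += (size_t)r;
    }
    return (ssize_t)got;
}

/* Read the header strings into buf. Returns their total length, or -1 if
 * the file holds more or less than the structure promises. */
static ssize_t hdr_read(int fd, char *buf, size_t buflen)
{
    hdr_info_t info;
    char extra;

    if (read_full(fd, &info, sizeof(info)) != (ssize_t)sizeof(info))
        return -1;
    if (info.format != 1 || info.header_len > buflen)
        return -1;
    if (read_full(fd, buf, info.header_len) != (ssize_t)info.header_len)
        return -1; /* less data than promised */
    if (info.header_len > 0 && buf[info.header_len - 1] != '\0')
        return -1; /* last string is not terminated */
    if (read(fd, &extra, 1) != 0)
        return -1; /* more data than promised, or a read error */
    return (ssize_t)info.header_len;
}
```

After a successful hdr_read() the buffer holds the strings back to back, so the caller can walk them with strlen() without any parsing. A caller that gets -1 can decide whether to retry or discard the entry.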

The current text-ish-based .header files offer no way of knowing the integrity of the header file, and store_table()/read_table() have quite a lot of complexity when just handling the null-terminated strings as-is would do nicely.


/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se      |     ni...@acc.umu.se
---------------------------------------------------------------------------
 After three days of intense pain, the snake died. * Riker
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
