On Mon, 6 Sep 2010, Paul Fee wrote:

If mod_disk_cache's on disk format is changing, now may be an opportunity to
investigate some options to improve performance of httpd as a caching proxy.

Currently headers and data are in separate files.  If they were in a single
file, the operating system is given more indication that these two items are
tightly coupled.  For example, when the headers are read in, the O/S can
readahead and buffer part of the body.

A difficulty with this could be refreshing the headers after a response to a
conditional GET.  If the headers are at the start of the file and they
change size, then they may overwrite the start of the existing body.  You
could leave room for expansion (risks wasted space and may not be enough) or
you could put the headers at the end of the file (may not benefit from
readahead).
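
The "leave room for expansion" option can be sketched as below. This is only an illustration in Python, not mod_disk_cache code: the 4096-byte reserve, the 4-byte length prefix and the function names are all assumptions made up for the example, and the in-place rewrite is not atomic with respect to concurrent readers.

```python
HEADER_RESERVE = 4096  # hypothetical size of the reserved header region, in bytes
LEN_PREFIX = 4         # hypothetical length prefix stored in front of the headers


def _pack(headers: bytes) -> bytes:
    """Length prefix plus headers, zero-padded to exactly HEADER_RESERVE bytes."""
    room = HEADER_RESERVE - LEN_PREFIX
    if len(headers) > room:
        raise ValueError("headers do not fit in the reserved region")
    return len(headers).to_bytes(LEN_PREFIX, "big") + headers.ljust(room, b"\0")


def write_entry(path: str, headers: bytes, body: bytes) -> None:
    """Create a cache file: padded header region first, body right after it."""
    with open(path, "wb") as f:
        f.write(_pack(headers))
        f.write(body)


def refresh_headers(path: str, headers: bytes) -> bool:
    """Rewrite the headers in place without touching the body.

    Returns False when the new headers have outgrown the reserve, in which
    case the caller would have to rewrite the whole file.
    """
    try:
        region = _pack(headers)
    except ValueError:
        return False
    with open(path, "r+b") as f:  # r+b: overwrite from offset 0, no truncation
        f.write(region)
    return True


def read_entry(path: str) -> tuple[bytes, bytes]:
    """Return (headers, body) from a file written by write_entry."""
    with open(path, "rb") as f:
        n = int.from_bytes(f.read(LEN_PREFIX), "big")
        headers = f.read(HEADER_RESERVE - LEN_PREFIX)[:n]
        body = f.read()
    return headers, body
```

Both drawbacks mentioned above show up directly: every entry pays for the full reserve even when the headers are tiny, and refresh_headers() has to give up once the headers grow past it.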

I tried to go the single-file route, but after banging my head against the above issue and others while trying to design/implement something that would work for read-while-caching using only O_EXCL file locking, I did some benchmarking, found out that the gain was minimal, and reverted to having separate header and body files.

What DID matter VERY MUCH regarding performance was the totally bogus defaults that affect the number of directories mod_disk_cache creates. CacheDirLength 1 and CacheDirLevels 2 give you 4096 directories (64^2) that hold files, and those will hold many millions of files even on an fs that isn't too good at coping with many entries in a directory. With the defaults you tend to end up with one directory for each query, which is not very optimal.
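
The directory arithmetic can be checked with a few lines of Python. This is a back-of-the-envelope sketch: the 64-character alphabet is taken from the 64^2 figure above, the helper names are made up for the example, and the second setting (length 2, levels 3) is just an arbitrary larger configuration for comparison.

```python
HASH_ALPHABET_SIZE = 64  # assumption: 64 possible characters per path character


def leaf_directories(dir_length: int, dir_levels: int) -> int:
    """Number of leaf directories: one per combination of path characters."""
    return HASH_ALPHABET_SIZE ** (dir_length * dir_levels)


def files_per_directory(total_files: int, dir_length: int, dir_levels: int) -> float:
    """Average files per leaf directory, assuming an even hash distribution."""
    return total_files / leaf_directories(dir_length, dir_levels)


print(leaf_directories(1, 2))                 # 4096
print(files_per_directory(10_000_000, 1, 2))  # 2441.40625
print(leaf_directories(2, 3))                 # 68719476736
```

With length 1 and levels 2, ten million cached files average out to roughly 2400 per directory. With far more leaf directories than files, nearly every file ends up alone in its own directory.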

Also, set CacheRemoveDirectories false, because otherwise mod_disk_cache creates and deletes directories all the time, which is a total waste of time. If you need to delete cache dirs then you have tuned yourself into the wrong corner, so IMHO that part of mod_disk_cache is plainly wrong.

Oh, this rant is based on xfs on Linux, which is what I used while hacking on our large-file-cache-patchset. The basics should apply to most other fs/os combos too ;)

On a similar theme, would filesystem extended attributes be suitable for
storing the headers?  The cache file's contents would be the entity body.  A
problem with this approach could be portability.  However the APR could
abstract this, reverting to separate files on platforms/filesystems that
didn't offer extended attributes.
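
A rough sketch of that fallback idea, in Python rather than APR: headers go into an extended attribute on the body file where possible, otherwise into a separate file. The attribute name "user.cache.headers" and the ".header" suffix are made up for the example, and os.setxattr()/os.getxattr() are only available on Linux. Filesystems also limit how much data an attribute can hold, so large header sets may end up on the fallback path anyway.

```python
import os

XATTR_NAME = "user.cache.headers"  # hypothetical attribute name
SIDECAR_SUFFIX = ".header"         # hypothetical suffix for the fallback file


def store_headers(body_path: str, headers: bytes) -> str:
    """Attach headers to the body file; return "xattr" or "file" to say how."""
    if hasattr(os, "setxattr"):  # only present on Linux builds of Python
        try:
            os.setxattr(body_path, XATTR_NAME, headers)
            return "xattr"
        except OSError:
            pass  # fs lacks xattr support, or the value is too large
    with open(body_path + SIDECAR_SUFFIX, "wb") as f:
        f.write(headers)
    return "file"


def load_headers(body_path: str) -> bytes:
    """Read headers back from the xattr, falling back to the separate file."""
    if hasattr(os, "getxattr"):
        try:
            return os.getxattr(body_path, XATTR_NAME)
        except OSError:
            pass  # no such attribute or no xattr support: try the sidecar
    with open(body_path + SIDECAR_SUFFIX, "rb") as f:
        return f.read()
```

A real implementation would also have to clean up stale fallback files and decide what happens when a cache directory is moved to a filesystem with different xattr support.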

http://en.wikipedia.org/wiki/Extended_file_attributes

I haven't tested extended attributes to see if they offer performance gains
over separate header and body files.  However it seems cleaner to have both
parts in one file.  It should also eliminate race conditions where
headers/body could get out of sync.

I'm honestly not sure you will get any massive performance gains; only benchmarks will tell :) The consistency issues should be cleaner to handle, though.

Also, you will/might lose the possibility of having multiple headers pointing to the same body (the classic example is multiple URLs resulting in the same plain file).

/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se      |     ni...@acc.umu.se
---------------------------------------------------------------------------
 IBM stands for Inferior But Marketable.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
