On Mon, 6 Sep 2010, Paul Fee wrote:
If mod_disk_cache's on-disk format is changing, now may be an opportunity to
investigate some options to improve the performance of httpd as a caching proxy.
Currently headers and data are in separate files. If they were in a single
file, the operating system would get a stronger hint that these two items are
tightly coupled. For example, when the headers are read in, the O/S can
read ahead and buffer part of the body.
One difficulty with this could be refreshing the headers after a response to a
conditional GET. If the headers are at the start of the file and they
change size, they may overwrite the start of the existing body. You
could leave room for expansion (which risks wasted space and may not be enough),
or you could put the headers at the end of the file (which may not benefit from
readahead).
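To make the "leave room for expansion" option concrete, here is a minimal C
sketch. The layout, the HDR_RESERVE size and all function names are
hypothetical and are not mod_disk_cache's actual format: headers live in a
fixed, NUL-padded region at the start of the file and the body follows, so a
header refresh can be done in place as long as the new headers still fit.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical single-file layout (not mod_disk_cache's real format):
 *   bytes [0, HDR_RESERVE)   headers, NUL-padded
 *   bytes [HDR_RESERVE, EOF) entity body
 * HDR_RESERVE is an arbitrary illustrative size. */
#define HDR_RESERVE 512u

/* Write the NUL-padded header region at offset 0.
 * Returns -1 if the headers do not fit (at least one NUL must remain). */
static int write_header_region(FILE *f, const char *hdrs)
{
    size_t hlen = strlen(hdrs);
    if (hlen >= HDR_RESERVE)
        return -1;
    char region[HDR_RESERVE] = {0};
    memcpy(region, hdrs, hlen);
    if (fseek(f, 0, SEEK_SET) != 0 ||
        fwrite(region, 1, HDR_RESERVE, f) != HDR_RESERVE)
        return -1;
    return 0;
}

/* Store a complete entry: header region followed by the body. */
static int write_entry(FILE *f, const char *hdrs,
                       const char *body, size_t body_len)
{
    if (write_header_region(f, hdrs) != 0 ||
        fwrite(body, 1, body_len, f) != body_len)
        return -1;
    return fflush(f) == 0 ? 0 : -1;
}

/* Refresh headers in place (e.g. after revalidation); the body bytes are
 * never touched. -1 means the new headers outgrew the reserved space and
 * the whole entry would have to be rewritten. */
static int refresh_headers(FILE *f, const char *hdrs)
{
    if (write_header_region(f, hdrs) != 0)
        return -1;
    return fflush(f) == 0 ? 0 : -1;
}
```

The -1 return from refresh_headers is exactly the case described above: a
reserve that is too small forces a full rewrite, while a large one wastes
space in every cache file.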
I tried to go the single-file route, but after banging my head
against the above issue and others while trying to design and implement
something that would work for read-while-caching using only
O_EXCL-based locking, I did some benchmarking, found out that the gain
was minimal, and reverted to separate header and body files.
What DID matter VERY MUCH for performance were the totally bogus
defaults that affect the number of directories mod_disk_cache
creates. CacheDirLength 1 and CacheDirLevels 2 give you 4096
directories (64^2) to hold files, which is enough for many millions of
files even on a filesystem that isn't very good at coping with many
entries in a directory. With the defaults you tend to end up with one
directory per query, which is far from optimal.
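The 4096 figure follows from the naming scheme: the 64^2 above reflects a
64-symbol alphabet for directory names, and the directory part of a path uses
CacheDirLevels levels of CacheDirLength characters each. The C sketch below
reproduces only that arithmetic and the path-splitting idea; leaf_dirs and
key_to_path are hypothetical helpers, not httpd functions, and the key is
assumed to be already hashed and encoded.

```c
#include <stddef.h>
#include <string.h>

/* With a 64-symbol alphabet, each directory level of dir_length
 * characters has 64^dir_length possible subdirectories, so the number
 * of leaf directories is 64^(dir_length * dir_levels). */
static unsigned long long leaf_dirs(unsigned dir_length, unsigned dir_levels)
{
    unsigned long long n = 1;
    for (unsigned i = 0; i < dir_length * dir_levels; i++)
        n *= 64;                 /* one factor of 64 per path character */
    return n;
}

/* Hypothetical helper: split an already-encoded key into dir_levels
 * directories of dir_length characters each, followed by the rest of
 * the key as the file name.
 * Example: "AbCdEf" with length 1, levels 2 -> "A/b/CdEf".
 * Returns 0 on success, -1 if the key is too short or out is too small. */
static int key_to_path(const char *key, unsigned dir_length,
                       unsigned dir_levels, char *out, size_t outlen)
{
    size_t klen = strlen(key);
    size_t prefix = (size_t)dir_length * dir_levels;
    /* output needs the key, one '/' per level and a terminating NUL */
    if (klen <= prefix || outlen < klen + dir_levels + 1)
        return -1;
    size_t o = 0;
    for (unsigned lvl = 0; lvl < dir_levels; lvl++) {
        memcpy(out + o, key + (size_t)lvl * dir_length, dir_length);
        o += dir_length;
        out[o++] = '/';
    }
    memcpy(out + o, key + prefix, klen - prefix + 1);  /* copies the NUL too */
    return 0;
}
```

With CacheDirLength 1 and CacheDirLevels 2, leaf_dirs returns 4096, so ten
million cached files would average roughly 2,441 files per directory
(10,000,000 / 4096).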
Also, set CacheRemoveDirectories to false, because otherwise
mod_disk_cache creates and deletes directories all the time, which is a
total waste of time. If you need to delete cache dirs, you have
tuned yourself into the wrong corner, so IMHO that part of
mod_disk_cache is plainly wrong.
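A minimal POSIX sketch of where the wasted work comes from, assuming a
simplified store/remove path: store_entry, remove_entry and their parameters
are hypothetical and are not mod_disk_cache's code. When removal prunes empty
parent directories, the next store into the same bucket has to mkdir() them
again; with pruning off, each directory is created once and then reused.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Create every directory component of a file path (like "mkdir -p" on
 * its parent). Existing directories are tolerated via EEXIST. */
static int ensure_parent_dirs(const char *path)
{
    char buf[4096];
    size_t len = strlen(path);
    if (len >= sizeof buf)
        return -1;
    memcpy(buf, path, len + 1);
    for (char *p = buf + 1; *p; p++) {
        if (*p != '/')
            continue;
        *p = '\0';
        if (mkdir(buf, 0700) != 0 && errno != EEXIST)
            return -1;
        *p = '/';
    }
    return 0;
}

/* Store: make sure the bucket directories exist, then create the file. */
static int store_entry(const char *path)
{
    if (ensure_parent_dirs(path) != 0)
        return -1;
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    return close(fd);
}

/* Remove a cached file. If prune_dirs is set, also rmdir() its parent
 * directories below cache_root (no trailing slash), stopping at the
 * first rmdir() failure, typically a non-empty directory. */
static int remove_entry(const char *cache_root, const char *path,
                        int prune_dirs)
{
    if (unlink(path) != 0)
        return -1;
    if (!prune_dirs)
        return 0;
    char buf[4096];
    size_t rootlen = strlen(cache_root);
    size_t len = strlen(path);
    if (len >= sizeof buf)
        return -1;
    memcpy(buf, path, len + 1);
    char *slash;
    while ((slash = strrchr(buf, '/')) != NULL &&
           (size_t)(slash - buf) > rootlen) {
        *slash = '\0';
        if (rmdir(buf) != 0)
            break;
    }
    return 0;
}
```

With pruning on, a store/remove cycle on an otherwise empty bucket costs two
extra mkdir() and two extra rmdir() calls every time; with pruning off, those
calls only happen on the first store.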
Oh, this rant is based on xfs on Linux, which is what I was using while
hacking on our large-file-cache patchset. The basics should apply to most
other fs/OS combos too ;)
On a similar theme, would filesystem extended attributes be suitable for
storing the headers? The cache file's contents would be the entity body. One
problem with this approach could be portability. However, APR could
abstract this, falling back to separate files on platforms/filesystems that
don't offer extended attributes.
http://en.wikipedia.org/wiki/Extended_file_attributes
I haven't tested extended attributes to see whether they offer performance gains
over separate header and body files. However, it seems cleaner to have both
parts in one file. It should also eliminate race conditions where the
headers and body could get out of sync.
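A minimal Linux-only sketch of the extended-attribute idea, including the
fallback to a separate header file that APR could hide. It uses the Linux
setxattr()/getxattr() calls from <sys/xattr.h>; the attribute name
"user.cache.headers", the ".header" sidecar suffix and the helper names are
assumptions for illustration, not an existing APR or httpd API. The sketch
only falls back when the filesystem reports no xattr support (ENOTSUP).

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/xattr.h>

/* Hypothetical attribute name; not used by any real httpd module. */
#define HDR_XATTR "user.cache.headers"

/* Sidecar path used without xattr support: "<body>.header", i.e. the
 * existing separate-file layout. */
static void sidecar_path(char *out, size_t outlen, const char *body_path)
{
    snprintf(out, outlen, "%s.header", body_path);
}

static int xattr_unsupported(int err)
{
    return err == ENOTSUP || err == EOPNOTSUPP;
}

/* Attach headers to the body file. Returns 0 on success, -1 on failure. */
static int store_headers(const char *body_path, const char *hdrs, size_t len)
{
    if (setxattr(body_path, HDR_XATTR, hdrs, len, 0) == 0)
        return 0;
    if (!xattr_unsupported(errno))
        return -1;
    char side[4096];
    sidecar_path(side, sizeof side, body_path);
    FILE *f = fopen(side, "wb");
    if (!f)
        return -1;
    size_t n = fwrite(hdrs, 1, len, f);
    if (fclose(f) != 0 || n != len)
        return -1;
    return 0;
}

/* Read headers into buf. Returns the byte count, or -1 on failure. */
static ssize_t load_headers(const char *body_path, char *buf, size_t buflen)
{
    ssize_t n = getxattr(body_path, HDR_XATTR, buf, buflen);
    if (n >= 0 || !xattr_unsupported(errno))
        return n;
    char side[4096];
    sidecar_path(side, sizeof side, body_path);
    FILE *f = fopen(side, "rb");
    if (!f)
        return -1;
    size_t got = fread(buf, 1, buflen, f);
    int err = ferror(f);
    fclose(f);
    return err ? -1 : (ssize_t)got;
}
```

When xattrs are available, headers and body share one inode, so there is a
single file to rename or unlink; the sidecar path keeps the old two-file
behaviour, including its consistency issues.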
I'm honestly not sure you will get any massive performance gains; only
benchmarks will tell :) The consistency issues should be cleaner to handle,
though.
Also, you might lose the possibility of having multiple header files
point to the same body (the classic example is multiple URLs resulting
in the same plain file).
/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | ni...@acc.umu.se
---------------------------------------------------------------------------
IBM stands for Inferior But Marketable.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=