Davi Arnaut wrote:

. Problem:

You have described two separate problems below.

For a moment forget about file buckets and large files; what's really at
stake is proxy/cache brigade management when the arrival rate is too
high (e.g. a single 4.7GB file bucket, or high-rate input data to be
consumed by a relatively low-rate client).

By operating as a normal output filter, mod_cache must deal with
potentially large brigades containing bucket types (possibly other than
the stock ones) created by other filters in the chain.

This first problem has largely been solved, bar some testing.

The solution was to pass the output filter to the save_body() hook, and let the save_body() code decide for itself when the best time is to write the bucket(s) to the network.

For example, in the disk cache the apr_bucket_read() loop reads the 4.7GB file in 4MB chunks. Each chunk is cached, then written to the network, then cleaned up. Rinse, repeat.
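The loop described above can be sketched roughly as follows. This is a hypothetical stand-in, not the actual mod_disk_cache code: plain stdio replaces apr_bucket_read() and the filter stack, and CHUNK is shrunk from 4MB to 4 bytes for brevity. The point it illustrates is that only one chunk is ever held in memory at a time.

```c
/* Sketch of the chunked save_body() loop: read a chunk, append it to
 * the cache file, send the same chunk to the "network", then reuse the
 * buffer. stream_through() and its arguments are illustrative names. */
#include <stdio.h>

#define CHUNK 4  /* stands in for the 4MB chunk size */

static size_t stream_through(FILE *src, FILE *cache, FILE *net)
{
    char buf[CHUNK];
    size_t n, total = 0;

    while ((n = fread(buf, 1, CHUNK, src)) > 0) {
        fwrite(buf, 1, n, cache);  /* 1. save this chunk to the cache    */
        fwrite(buf, 1, n, net);    /* 2. write the same chunk out now    */
        total += n;                /* 3. buf is reused: nothing piles up */
    }                              /*    in memory. Rinse, repeat.       */
    return total;
}
```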

Previously, save_body() was expected to save all 4.7GB to the cache, and then only write the first byte to the network possibly minutes later.

If a filter before cache converted file buckets into heap buckets for any reason (mod_deflate, for example), then save_body() would try to store 4.7GB of heap buckets in RAM to pass to the network later, and boom.

How mod_disk_cache chooses to send data to the network is an entirely separate issue, detailed below.

The problem arises from the fact that the mod_disk_cache store function
traverses the brigade by itself, reading each bucket in order to write
its contents to disk, potentially filling memory with large chunks
of data allocated/created by the bucket type's read function (e.g. the
file bucket's).

To put this another way:

The core problem in the old cache code was the assumption that it was practical to call apr_bucket_read() on the same data _twice_ -- once during caching, once during the network write.

This assumption isn't valid, thus the recent fixes.

. Constraints:

No threads/forked processes.
Bucket type specific workarounds won't work.
No core changes/knowledge, easily back-portable fixes are preferable.

. Proposed solution:

File buffering (or a part of Graham's last approach).

The solution consists of using the cache file as an output buffer by
splitting the buckets into smaller chunks and writing them to disk. Once
a chunk is written (apr_file_write_full), a new file bucket is created
with the offset and size of the just-written buffer, and the old bucket
is deleted.

After that, the bucket is inserted into a temporary (empty) brigade and
sent down the output filter stack for (probably) network i/o.
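The buffer-then-resend step above can be sketched in a self-contained way. This is an illustration only, assuming nothing about APR: plain stdio stands in for apr_file_write_full(), a small (offset, length) record plays the role of the file bucket, and the names chunk_ref, buffer_chunk and send_chunk are invented for the example.

```c
/* Sketch of file buffering: each chunk is appended to the cache file,
 * the in-memory copy can then be dropped, and only a small (offset,
 * length) reference -- the "file bucket" -- travels down to the
 * "network" side, which rereads the data from the file. */
#include <stdio.h>

struct chunk_ref {        /* plays the role of a file bucket */
    long   offset;        /* where the chunk starts in the cache file */
    size_t len;           /* how many bytes it covers */
};

/* Buffer one chunk: append it to the cache file, return its reference. */
static struct chunk_ref buffer_chunk(FILE *cache, const char *data, size_t len)
{
    struct chunk_ref ref;
    fseek(cache, 0, SEEK_END);
    ref.offset = ftell(cache);
    ref.len = len;
    fwrite(data, 1, len, cache);   /* apr_file_write_full() analogue */
    fflush(cache);
    return ref;                    /* the heap copy is no longer needed */
}

/* "Network" side: reread the chunk from the cache file via its ref. */
static size_t send_chunk(FILE *cache, struct chunk_ref ref, char *out)
{
    fseek(cache, ref.offset, SEEK_SET);
    return fread(out, 1, ref.len, cache);
}
```

If the chunk was written moments ago, this reread is normally served from the kernel's buffer cache rather than the disk, which is the point made below.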

At a quick glance this solution may sound absurd -- the chunk is
already in memory, and the output filter might need it again in memory
soon. But there's no silver bullet, and it's a simple enough approach to
solve the growing memory problem without incurring performance
penalties.

As soon as apr_file_write_full() returns, the bucket just saved to the disk cache is also in the kernel's buffer cache - meaning that a subsequent apr_bucket_read() in the network code reads data that is already cached in kernel memory.

In performance testing, on files small enough to be buffered by the kernel (a few MB), the initial part of the download after caching is very fast.

What this technique does is guarantee that regardless of the source of the response, be it a file, a CGI, or a proxy, what gets written to the network is always a file, and always takes advantage of kernel-based file performance features.

Regards,
Graham
--
