On 09/02/2010 07:16 PM, Graham Leggett wrote:
> Hi all,
> 
> An issue with mod_cache I would like to address this weekend is the
> definition of the store_body() function in the cache implementation
> provider:
> 
>     apr_status_t (*store_body)(cache_handle_t *h, request_rec *r,
>                                apr_bucket_brigade *b);
> 
> Right now, mod_cache expects a cache implementation to swallow the
> entire bucket brigade b before returning to mod_cache.
> 
> This is fine until the bucket brigade b contains something really large,
> such as a single file bucket pointing at a 4GB DVD image (such a
> scenario occurs when files on a slow disk are cached on a fast SSD
> disk). At this point, mod_cache expects the cache implementation to
> swallow the entire brigade in one go, and this can take a significant
> amount of time, certainly enough time for the client to get bored and
> time out should the file be large and the original disk slow.

I guess this makes sense for another reason as well. Looking at your example
(a single file bucket for a 4 GB file), I think the current implementation
can consume an insane amount of virtual memory in the httpd process, as it
transforms the file bucket into mmap buckets while reading it to store its
contents in the cache.
With the 4 GB example this kills a 32-bit process, since 4 GB is already
the size of its entire address space.

> 
> What I propose is a change to the function that looks like this:
> 
>     apr_status_t (*store_body)(cache_handle_t *h, request_rec *r,
>                                apr_bucket_brigade *in,
>                                apr_bucket_brigade *out);
> 
> Instead of one brigade b being passed in, we pass two brigades in, one
> labelled "in", the other labelled "out".
> 
> The brigade previously marked "b" becomes "in", and the cache
> implementation is free to consume as much of the "in" brigade as it sees
> fit, and as the "in" brigade is consumed, the consumed buckets are moved
> to the "out" brigade.
> 
> If store_body() returns with an empty "in" brigade, mod_cache writes the
> "out" brigade to the output filter stack and we are done as is the case
> now.
> 
> Should however the cache implementation want to take a breath, it
> returns to mod_cache with unconsumed bucket(s) still remaining in the
> "in" brigade. mod_cache in turn sends the already-processed buckets in
> the "out" brigade down the filter stack to the client, and then loops
> round, calling the store_body() function again until the "in" brigade is
> empty.
> 
> In this way, the cache implementation has the option to swallow data in
> as many smaller chunks as it sees fit, and in turn the client gets fed
> data often enough to not get bored and time out if the file is very large.

Sounds reasonable and should solve the problem above as well, provided
that the downstream filters consume these buckets and delete them.
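
To make sure I read the proposal correctly, here is a rough sketch of the
loop in plain C. It does not use APR at all: brigade_t is a hypothetical
stand-in for apr_bucket_brigade (just a FIFO of bucket lengths), and
store_body_sketch(), pass_downstream(), save_loop_sketch() and the per-call
byte budget are names and assumptions of mine, not existing mod_cache code.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_BUCKETS 16

/* Hypothetical stand-in for apr_bucket_brigade: a FIFO of bucket lengths. */
typedef struct {
    size_t len[MAX_BUCKETS];
    size_t head;   /* index of the first unconsumed bucket */
    size_t count;  /* one past the last bucket */
} brigade_t;

static void brigade_append(brigade_t *b, size_t len)
{
    assert(b->count < MAX_BUCKETS);
    b->len[b->count++] = len;
}

/* Provider side: consume at most `budget` bytes from `in`, "cache" them,
 * and move what was consumed to `out`. A bucket larger than the remaining
 * budget is split, so unconsumed data may stay behind in `in`. */
static int store_body_sketch(brigade_t *in, brigade_t *out,
                             size_t budget, size_t *cached)
{
    while (budget > 0 && in->head < in->count) {
        size_t *first = &in->len[in->head];
        size_t take = (*first < budget) ? *first : budget;

        *cached += take;            /* pretend to write to the cache */
        brigade_append(out, take);  /* consumed data goes to "out" */
        *first -= take;
        budget -= take;
        if (*first == 0) {
            in->head++;             /* bucket fully consumed */
        }
    }
    return 0;
}

/* Downstream filters: consume the buckets and delete them. */
static void pass_downstream(brigade_t *out, size_t *sent)
{
    size_t i;
    for (i = out->head; i < out->count; i++) {
        *sent += out->len[i];
    }
    out->head = 0;
    out->count = 0;
}

/* mod_cache side: call the provider until "in" is empty, flushing "out"
 * to the client after every call. Returns the number of passes. */
static int save_loop_sketch(brigade_t *in, size_t budget,
                            size_t *cached, size_t *sent)
{
    brigade_t out = {{0}, 0, 0};
    int passes = 0;

    assert(budget > 0);  /* a zero budget would never make progress */
    do {
        store_body_sketch(in, &out, budget, cached);
        pass_downstream(&out, sent);
        passes++;
    } while (in->head < in->count);
    return passes;
}
```

In this sketch "out" never holds more than one budget's worth of data at a
time, but only because pass_downstream() empties it after every call, which
is the condition I mention above.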

Regards

Rüdiger

