Graham Leggett wrote:

> On 06 Sep 2010, at 11:00 PM, Paul Querna wrote:
> 
>> Isn't this problem an artifact of how all bucket brigades work, and
>> isn't it present in all output filter chains?
>>
>> An output filter might be called multiple times, but a single bucket
>> can still easily contain a 4GB chunk.
>>
>> It seems to me it would be better to think about this holistically
>> down the entire output filter chain, rather than building in special
>> case support for this inside mod_cache's internal methods?
> 
> In the cache case, thinking about it a bit, the in and out brigades are
> probably unavoidable, as the cache is a special case in that it wants
> to write the data twice: once to the cache, and a second time to the
> rest of the filter stack. Right now, the cache is forced to read the
> complete brigade to cache it, with no option to give up early. And the
> cache has no choice but to keep the buckets in the brigade so that
> they can be passed a second time up the filter stack, with no deleting
> of buckets as you go like you normally would. Read one 4GB file bucket
> in the cache, and in the process the file bucket gets morphed into 1/2
> million heap buckets, oops. With two brigades, one in and one out, the
> in brigade can have its buckets removed as they are consumed, as
> normal, and moved to the out brigade. The cache can quit at any time,
> and the code that follows knows what data to write to the network
> (out) and what data to loop round and resend to the cache (in). The
> cache provider could choose to quit and ask to be called again, either
> because writing took too long or because too much data was read (and
> in the process became heap buckets); either reason is fine.
> 
> That said, following on from your suggestion of thinking about this in
> the general sense, it would be really nice if the filter stack had the
> option to say "I have bitten off as much of the brigade as I am
> prepared to chew on right now, and the leftovers are still in the
> brigade; can you call me back with this data, maybe with more data
> added, and I'll try to swallow some more?".
> 
> In theory, that would mean all handlers (or entities that sent data)
> would no longer be allowed to make the blind assumption that the
> filter stack was willing to consume every possible set of buckets the
> handler wanted to send, and would have to accept that the stack had
> the right to go "I'm full, give me a second to chew on this".
> 
> This wouldn't need separate brigades, probably just a return code that
> meant EAGAIN and that handlers were expected to honour.
> 
> Regards,
> Graham
> --
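
To make the in/out brigade idea on the store side concrete, a
store_body()-style provider callback could look roughly like the sketch
below.  The per-call budget, the write_chunk_to_cache() helper and the
exact signature are made up for illustration, not taken from existing
mod_cache code:

#include "httpd.h"
#include "apr_buckets.h"
#include "mod_cache.h"    /* for cache_handle_t */

/* Hypothetical helper: a real provider would persist one chunk of body
 * data to its backend here.  Stubbed out for the sketch. */
static apr_status_t write_chunk_to_cache(cache_handle_t *h, request_rec *r,
                                         const char *data, apr_size_t len)
{
    return APR_SUCCESS;
}

/* Sketch of a store_body()-style callback.  It consumes buckets from "in",
 * writes their data to the cache, and moves each consumed bucket to "out"
 * so the caller can pass it on down the filter stack.  Once a per-call
 * budget is used up it returns, leaving the remainder in "in" so the
 * caller can send "out" and then call back with the leftovers. */
static apr_status_t store_body_sketch(cache_handle_t *h, request_rec *r,
                                      apr_bucket_brigade *in,
                                      apr_bucket_brigade *out)
{
    apr_off_t written = 0;
    const apr_off_t budget = 128 * 1024;    /* arbitrary per-call limit */

    while (!APR_BRIGADE_EMPTY(in)) {
        apr_bucket *e = APR_BRIGADE_FIRST(in);
        const char *data;
        apr_size_t len;
        apr_status_t rv;

        if (APR_BUCKET_IS_METADATA(e)) {
            /* EOS, FLUSH etc: nothing to cache, just hand it onwards. */
            APR_BUCKET_REMOVE(e);
            APR_BRIGADE_INSERT_TAIL(out, e);
            continue;
        }

        /* Reading may morph a large file bucket, but only a chunk at a
         * time, not all 4GB at once. */
        rv = apr_bucket_read(e, &data, &len, APR_BLOCK_READ);
        if (rv != APR_SUCCESS) {
            return rv;
        }

        rv = write_chunk_to_cache(h, r, data, len);
        if (rv != APR_SUCCESS) {
            return rv;
        }

        /* Consumed: move the bucket to the out brigade for the network. */
        APR_BUCKET_REMOVE(e);
        APR_BRIGADE_INSERT_TAIL(out, e);

        written += len;
        if (written >= budget) {
            break;    /* "I'm full", call me back with the rest of "in" */
        }
    }
    return APR_SUCCESS;
}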

Retrieving bodies from the cache has a similar scalability issue.  The
CACHE_OUT filter makes a single call to the provider's recall_body().  The
entire body must be placed in a single brigade, which is then sent along
the filter chain with a single ap_pass_brigade() call.

If a custom provider is using heap buckets and the body is large, this can
consume too much memory.  It would be better to loop, asking the provider
repeatedly for portions of the body until it supplies an EOS bucket,
roughly as sketched below.  Is there interest in a patch implementing this
approach?
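
Something like the following, where recall_body() (or a variant of it) is
assumed to append only the next portion of the body on each call, and to
append an EOS bucket once the body is complete; that repeated-call
behaviour is the proposed change, not how the current provider API works,
and the names here are illustrative only:

#include "httpd.h"
#include "util_filter.h"
#include "apr_buckets.h"
#include "mod_cache.h"    /* for cache_provider / cache_handle_t */

/* Sketch of a chunked CACHE_OUT body loop.  Each provider call is assumed
 * to append one portion of the body, with EOS marking the end. */
static apr_status_t recall_body_in_portions(ap_filter_t *f,
                                            const cache_provider *provider,
                                            cache_handle_t *h)
{
    apr_bucket_brigade *bb = apr_brigade_create(f->r->pool,
                                                f->c->bucket_alloc);
    int seen_eos = 0;

    while (!seen_eos) {
        apr_status_t rv;

        /* Ask the provider for the next portion of the cached body. */
        rv = provider->recall_body(h, f->r->pool, bb);
        if (rv != APR_SUCCESS) {
            return rv;
        }

        seen_eos = !APR_BRIGADE_EMPTY(bb)
                   && APR_BUCKET_IS_EOS(APR_BRIGADE_LAST(bb));

        /* Pass this portion down the filter chain straight away, so memory
         * stays bounded by the portion size, not the whole body size. */
        rv = ap_pass_brigade(f->next, bb);
        if (rv != APR_SUCCESS) {
            return rv;
        }
        apr_brigade_cleanup(bb);
    }
    return APR_SUCCESS;
}

A provider that still delivers the whole body plus EOS in its first call
would make the loop degenerate to a single pass, so existing providers
would keep working unchanged.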

Thanks,
Paul
