On 06 Sep 2010, at 11:00 PM, Paul Querna wrote:
> Isn't this problem an artifact of how all bucket brigades work, and is
> present in all output filter chains?
> An output filter might be called multiple times, but a single bucket
> can still contain a 4gb chunk easily.
> It seems to me it would be better to think about this holistically
> down the entire output filter chain, rather than building in special
> case support for this inside mod_cache's internal methods?
In the cache case, having thought about it a bit, the in and out
brigades are probably unavoidable. The cache is a special case in that
it wants to write the data twice: once to the cache, and a second time
to the rest of the filter stack. Right now, the cache is forced to read
the complete brigade in order to cache it, with no option to give up
early. The cache also has no choice but to keep the buckets in the
brigade so that they can be passed on to the rest of the filter stack
afterwards, so it cannot delete buckets as it goes like you normally
would. Read one 4GB file bucket in the cache, and in the process the
file bucket gets morphed into half a million heap buckets, oops.
With two brigades, one in and one out, buckets can be removed from the
in brigade as they are consumed, as normal, and moved to the out
brigade. The cache can quit at any time, and the code that follows
knows what data to write to the network (out), and what data to loop
round and resend to the cache (in). The cache provider could choose to
quit and ask to be called again either because writing took too long,
or because too much data was read (and in the process became heap
buckets). Either reason is fine.
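
To make the in/out idea concrete, here is a rough sketch in plain C. It
is a toy model only, not mod_cache or APR code: toy_brigade,
toy_cache_store and toy_drive are hypothetical names, and a "bucket" is
reduced to a byte count. The provider moves buckets from in to out
until a byte budget is reached, and the caller keeps looping while
anything is left in the in brigade.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a bucket brigade: a FIFO of bucket lengths.
 * Hypothetical type; real code would use apr_bucket_brigade. */
#define TOY_MAX_BUCKETS 16

typedef struct {
    size_t len[TOY_MAX_BUCKETS];
    size_t count;
} toy_brigade;

/* Append a bucket of the given length to the end of the brigade. */
static void toy_push(toy_brigade *bb, size_t len)
{
    assert(bb->count < TOY_MAX_BUCKETS);
    bb->len[bb->count++] = len;
}

/* Remove the first bucket and return its length. */
static size_t toy_pop_front(toy_brigade *bb)
{
    assert(bb->count > 0);
    size_t len = bb->len[0];
    for (size_t i = 1; i < bb->count; i++) {
        bb->len[i - 1] = bb->len[i];
    }
    bb->count--;
    return len;
}

/* Sum of all bucket lengths currently in the brigade. */
static size_t toy_total(const toy_brigade *bb)
{
    size_t total = 0;
    for (size_t i = 0; i < bb->count; i++) {
        total += bb->len[i];
    }
    return total;
}

/* Hypothetical cache provider step: consume buckets from `in`, "cache"
 * them, and move them to `out`. Stops once at least `budget` bytes
 * have been handled, so it may overshoot by one bucket (buckets are
 * not split in this toy). Returns the bytes handled in this call. */
static size_t toy_cache_store(toy_brigade *in, toy_brigade *out,
                              size_t budget)
{
    assert(budget > 0); /* a zero budget would never make progress */
    size_t done = 0;
    while (in->count > 0 && done < budget) {
        size_t len = toy_pop_front(in);
        /* a real provider would write the bucket data to the cache here */
        toy_push(out, len);
        done += len;
    }
    return done;
}

/* Caller loop: keep calling the provider until `in` is empty. After
 * each call, `out` holds exactly what is safe to send downstream.
 * Returns the number of provider calls; *sent gets the byte total. */
static size_t toy_drive(toy_brigade *in, size_t budget, size_t *sent)
{
    toy_brigade out = {0};
    size_t calls = 0;
    *sent = 0;
    while (in->count > 0) {
        toy_cache_store(in, &out, budget);
        *sent += toy_total(&out); /* stand-in for writing to the network */
        out.count = 0;            /* downstream consumed the out brigade */
        calls++;
    }
    return calls;
}
```

With five 4000 byte buckets and a budget of 8192, the provider in this
toy is called twice: it takes three buckets on the first call and the
remaining two on the second.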
That said, following on your suggestion of thinking about this in the
general sense, it would be really nice if the filter stack had the
option to say "I have bitten off as much of the brigade as I am
prepared to chew on right now, and the leftovers are still in the
brigade, can you call me back with this data, maybe with more data
added, and I'll try to swallow some more?".
In theory, that would mean all handlers (or any other entity that sends
data) would no longer be allowed to blindly assume that the filter
stack is willing to consume every possible set of buckets the handler
wants to send. The stack would have the right to say "I'm full, give me
a second to chew on this".
This wouldn't need separate brigades, probably just a return code
meaning EAGAIN that handlers would be expected to honour.
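
Roughly what I have in mind, again as a toy in plain C rather than
anything that exists in httpd today: TOY_EAGAIN, toy_pass_brigade and
toy_handler are made-up names, and the brigade is reduced to a count of
pending bytes. The stack takes at most one "bite" per call and leaves
the rest in the brigade, and the handler only adds new data once the
leftovers have been swallowed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical return codes for this sketch only. */
enum { TOY_OK = 0, TOY_EAGAIN = 1 };

/* Toy brigade: just the number of bytes not yet consumed. */
typedef struct {
    size_t pending;
} toy_brigade;

/* Hypothetical filter stack entry point: consumes at most `max_bite`
 * bytes, leaves the rest in the brigade, and returns TOY_EAGAIN when
 * leftovers remain so the caller knows to call back. */
static int toy_pass_brigade(toy_brigade *bb, size_t max_bite,
                            size_t *consumed)
{
    assert(max_bite > 0); /* a zero bite would never make progress */
    size_t bite = bb->pending < max_bite ? bb->pending : max_bite;
    bb->pending -= bite;
    *consumed += bite;
    return bb->pending > 0 ? TOY_EAGAIN : TOY_OK;
}

/* Hypothetical handler that honours TOY_EAGAIN: it produces `total`
 * bytes in chunks of `chunk`, but only adds a new chunk after the
 * stack has finished with the previous one. On TOY_EAGAIN it simply
 * re-sends the leftovers. Returns the number of calls made. */
static size_t toy_handler(size_t total, size_t chunk, size_t max_bite,
                          size_t *consumed)
{
    assert(chunk > 0);
    toy_brigade bb = {0};
    size_t produced = 0;
    size_t calls = 0;
    int rv = TOY_OK;
    *consumed = 0;

    while (produced < total || rv == TOY_EAGAIN) {
        if (rv == TOY_OK) {
            /* the stack is ready for more: add the next chunk */
            size_t left = total - produced;
            size_t add = left < chunk ? left : chunk;
            bb.pending += add;
            produced += add;
        }
        rv = toy_pass_brigade(&bb, max_bite, consumed);
        calls++;
    }
    return calls;
}
```

A handler sending 10 bytes in one chunk to a stack that bites off 4
bytes at a time ends up making three calls in this model, with nothing
lost and nothing buffered beyond what the handler already held.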
Regards,
Graham