From: "Aaron Bannert" <[EMAIL PROTECTED]>
Sent: Wednesday, October 31, 2001 2:54 PM


> On Wed, Oct 31, 2001 at 02:18:58PM -0600, William Rowe wrote:
> > 
> > The most efficient model is for the consumer to keep calling the chain
> > until it has sufficient bytes for what it is trying to query [be it one
> > line, one n byte block, one null delimited record, or whatever] and push
> > back on the filter chain the buckets it refuses [for whatever reason] to
> > consume.
> 
> I'm beginning to come around in support of the "push-back" approach to
> this problem. The thing I'm beginning to like about it is the way it
> treats all parsers the same, whether they are simply looking for LFs
> or they are extracting SSI tags. (Simply pushing back all the unused
> data after each call is probably inefficient, but we might be able
> to limit that to when it is really needed, like when that filter is to
> be removed from the chain...implementation details.)

Look at what we are really saying here.  App requests data from the client.
The chain hands it back a brigade containing a bucket or three of data.  App
sees that partway through, there is a boundary condition.  App splits the
brigade at that boundary.  App calls back on the filter chain, pushing that
unused brigade back to the prior filter.  The prior filter then does one of
the following:

  1. Never changes the content length or contents - ergo it can push the
     brigade back further and never has to deal with the headaches.

  2. Does change the data, but not the length or content.  However, if we
     are sufficiently clever and know which buckets were pushed back at us,
     we could push back -original- buckets at the parent, with their
     original content.  This bridges 1 above with 3 below, as an
     optimization, but it is so complex to author that I'm certain authors
     would rarely try it.

  3. Does change the data.  It needs a set-aside helper to hang on to this
     pushed-back data for the next read-brigade call.  This is the one type
     of filter that suffers, significantly, from a push-back model.  Contrast
     this, though, with the next case:

  4. Already sets aside incomplete data itself, so it simply appends this
     brigade to any existing set-aside brigade (see the sketch below).
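
To make that concrete, here is a rough sketch (not code from any actual
patch) of the consumer side: read from the chain until a record boundary
turns up, split there, and refuse the rest.  Since no push-back call exists
today, the unconsumed tail is handled the case-4 way, with ap_save_brigade().
The record_ctx layout and the find_boundary() helper are hypothetical; the
brigade and filter calls are the existing ones, give or take the exact
read-mode constants.

#include <string.h>
#include "httpd.h"
#include "util_filter.h"
#include "apr_buckets.h"

typedef struct {
    apr_bucket_brigade *setaside;   /* case 4: data held over between reads */
} record_ctx;                       /* hypothetical filter context */

/* Hypothetical helper: here the boundary is a NUL-delimited record. */
static int find_boundary(const char *data, apr_size_t len, apr_size_t *off)
{
    const char *p = memchr(data, '\0', len);
    if (p == NULL) {
        return 0;
    }
    *off = (apr_size_t)(p - data) + 1;    /* consume through the delimiter */
    return 1;
}

/* Read from the chain until one full record is available; whatever we
 * refuse to consume is set aside for the next call. */
static apr_status_t read_one_record(ap_filter_t *f, apr_bucket_brigade *out)
{
    record_ctx *ctx = f->ctx;
    apr_bucket_brigade *bb = apr_brigade_create(f->r->pool,
                                                f->c->bucket_alloc);
    apr_status_t rv;

    /* Start with anything set aside by the previous call. */
    if (ctx->setaside != NULL) {
        APR_BRIGADE_CONCAT(bb, ctx->setaside);
    }

    for (;;) {
        apr_bucket *e;

        /* Rescanning from the front keeps the sketch simple. */
        for (e = APR_BRIGADE_FIRST(bb);
             e != APR_BRIGADE_SENTINEL(bb);
             e = APR_BUCKET_NEXT(e)) {
            const char *data;
            apr_size_t len, off;

            rv = apr_bucket_read(e, &data, &len, APR_BLOCK_READ);
            if (rv != APR_SUCCESS) {
                return rv;
            }
            if (find_boundary(data, len, &off)) {
                apr_bucket_brigade *rest;

                apr_bucket_split(e, off);            /* cut inside the bucket */
                rest = apr_brigade_split(bb, APR_BUCKET_NEXT(e));
                APR_BRIGADE_CONCAT(out, bb);         /* the complete record */

                /* No push-back call exists, so the unconsumed tail is set
                 * aside for the next read instead. */
                return ap_save_brigade(f, &ctx->setaside, &rest, f->r->pool);
            }
        }

        /* End of stream with no delimiter: hand over what we have. */
        if (!APR_BRIGADE_EMPTY(bb) &&
            APR_BUCKET_IS_EOS(APR_BRIGADE_LAST(bb))) {
            APR_BRIGADE_CONCAT(out, bb);
            return APR_EOF;
        }

        /* No boundary yet: pull more from the filter below us. */
        rv = ap_get_brigade(f->next, bb, AP_MODE_READBYTES,
                            APR_BLOCK_READ, 8192);
        if (rv != APR_SUCCESS) {
            return rv;
        }
    }
}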

> > Pull only works when everyone in the chain agrees on boundary conditions,
> > and a good filtering design doesn't impose one filter's boundary condition
> > on another filter; those should be transparent.
> 
> [High-level discussion from a self-acknowledged filter/zero-copy newbie]
> 
> I think this comes all the way back to the ideas I was forming on the
> apr-serf list about what types of data a bucket can represent. Having
> boundary conditions implies uniform units of data, but that is not how our
> filters work (i.e. get_mime_headers() wants lines, not chunks of data or
> bytes; mod_include wants a data stream). I think most of the complication
> here is because we want to write filters that can quickly gather the
> data they need in a form that is native to that filter, while OTOH we
> are writing the zero-copy routines to pass byte-buffers around as quickly
> as possible. Maybe those aren't as compatible as we'd like them to be.

Yes.  When we bridge from using buckets 'just for httpd' to using buckets
everywhere, this becomes a significant weakness.  Both the consumer and the
provider have knowledge of boundaries they enforce, and boundaries they can't
expect.  For zero copy to be truly effective, filters should be size and
content agnostic, until they discover the elements they care about.
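
Roughly, such a filter looks like the sketch below: the interesting()
predicate and the filter name are hypothetical, and anything the filter has
no stake in (metadata buckets, data without the token) flows to the next
filter untouched, so nothing gets copied.

#include <string.h>
#include "httpd.h"
#include "util_filter.h"
#include "apr_buckets.h"

/* Hypothetical predicate: does this chunk hold an element we care about? */
static int interesting(const char *data, apr_size_t len)
{
    return memchr(data, '<', len) != NULL;  /* e.g. a possible SSI tag start */
}

/* A size- and content-agnostic output filter: it imposes no boundary of its
 * own on the stream. */
static apr_status_t agnostic_out_filter(ap_filter_t *f, apr_bucket_brigade *bb)
{
    apr_bucket *e;

    for (e = APR_BRIGADE_FIRST(bb);
         e != APR_BRIGADE_SENTINEL(bb);
         e = APR_BUCKET_NEXT(e)) {
        const char *data;
        apr_size_t len;

        if (APR_BUCKET_IS_METADATA(e)) {
            continue;                   /* EOS/FLUSH: not ours to interpret */
        }
        if (apr_bucket_read(e, &data, &len, APR_BLOCK_READ) != APR_SUCCESS) {
            break;
        }
        if (interesting(data, len)) {
            /* ...split or rewrite just this bucket here... */
        }
    }
    /* Everything we did not touch passes through unchanged. */
    return ap_pass_brigade(f->next, bb);
}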

But I'm just beating the same hollow drum again; I'll hush up now until 
the next good example bites ;)

Bill
