On Wed, Oct 31, 2001 at 02:18:58PM -0600, William Rowe wrote:
> From: "Justin Erenkrantz" <[EMAIL PROTECTED]>
> Sent: Wednesday, October 31, 2001 2:00 PM
> 
> 
> > Yes, they are worthless because the core input filter has no way of 
> > knowing if it is working on a request, header, or body line.  
> 
> It never needs to know.
> 
> > There is simply a call to ap_get_brigade with *readbytes == 0.  That 
> > code has no way of knowing what the maximum request fields size is 
> > (it has no knowledge of HTTP).  
> 
> It doesn't need to know.  It reads, and returns what it's got.  If that 
> is 5 bytes on the socket, fine, then the consumer needs to call back and 
> accumulate more if that wasn't enough.  Your choice of blocking/nonblock
> should only indicate that any data will be returned, not how much.
> 
> > This point was brought up by OtherBill 
> > and Aaron when we discussed the input filtering changes (i.e. maybe 
> > we should have a maximum).  I think readline should be a distinct 
> > mode not an interpretation of the length (and the passed-in length
> > can now be the max to read).
> 
> My point from day one is that this model is flawed.  Whatever we have
> read should simply be returned.  If that's too many or too few bytes,
> so be it.  The consumer [the guy trying to interpret this line of data]
> is the one who should handle it.  This merely reiterates the need for
> better design.

I agree that this is a design issue.

> The most efficient model is for the consumer to keep calling the chain
> until it has sufficient bytes for what it is trying to query [be it one
> line, one n byte block, one null delimited record, or whatever] and push
> back on the filter chain the buckets it refuses [for whatever reason] to
> consume.

I'm beginning to come around in support of the "push-back" approach to
this problem. The thing I'm beginning to like about it is the way it
treats all parsers the same, whether they are simply looking for LFs
or they are extracting SSI tags. (Simply pushing back all the unused
data after each call is probably inefficient, but we might be able
to limit that to when it is really needed, like when that filter is to
be removed from the chain...implementation details.)
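
To make the push-back idea concrete, here is a rough, self-contained
sketch in plain C. It deliberately does not use the real bucket/brigade
API: source_t, source_read(), source_unread() and read_line() are all
hypothetical names invented for illustration. The source returns
whatever it has, and the consumer accumulates until it sees an LF, then
hands back the bytes it refuses to consume. The length limit lives in
the consumer, not in the source.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the filter chain: it knows nothing about lines. */
typedef struct {
    const char **chunks;   /* data as it happens to arrive */
    size_t nchunks;
    size_t next;           /* index of the next chunk to hand out */
    size_t off;            /* offset into that chunk if partly read */
    char pushback[256];    /* bytes a consumer refused and handed back */
    size_t pushback_len;
} source_t;

/* Return whatever is available (pushed-back bytes first); 0 means EOF. */
static size_t source_read(source_t *s, char *buf, size_t cap)
{
    if (s->pushback_len > 0) {
        size_t n = s->pushback_len < cap ? s->pushback_len : cap;
        memcpy(buf, s->pushback, n);
        memmove(s->pushback, s->pushback + n, s->pushback_len - n);
        s->pushback_len -= n;
        return n;
    }
    if (s->next >= s->nchunks)
        return 0;
    const char *c = s->chunks[s->next] + s->off;
    size_t len = strlen(c);
    size_t n = len < cap ? len : cap;
    memcpy(buf, c, n);
    if (n == len) { s->next++; s->off = 0; } else { s->off += n; }
    return n;
}

/* Hand unconsumed bytes back; they are returned first on the next read. */
static int source_unread(source_t *s, const char *data, size_t len)
{
    if (s->pushback_len + len > sizeof(s->pushback))
        return -1;
    memmove(s->pushback + len, s->pushback, s->pushback_len);
    memcpy(s->pushback, data, len);
    s->pushback_len += len;
    return 0;
}

/*
 * Consumer: keep calling the source until an LF shows up, then push back
 * everything after it. Returns the line length (LF excluded), or -1 on
 * EOF before an LF or when the line exceeds the consumer's own limit.
 */
static int read_line(source_t *s, char *line, size_t cap)
{
    size_t have = 0;
    assert(cap > 0);
    while (have < cap) {
        size_t n = source_read(s, line + have, cap - have);
        if (n == 0)
            return -1;
        char *lf = memchr(line + have, '\n', n);
        have += n;
        if (lf != NULL) {
            size_t linelen = (size_t)(lf - line);
            if (source_unread(s, lf + 1, have - linelen - 1) != 0)
                return -1;
            *lf = '\0';
            return (int)linelen;
        }
    }
    return -1;
}
```

Note that this version copies the leftover bytes on every call, which is
exactly the inefficiency mentioned above; a real implementation would
presumably keep them in place and only push back when it has to.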

> Pull only works when everyone in the chain agrees on boundary conditions,
> and a good filtering design doesn't impose one filter's boundary condition
> on another filter; those should be transparent.


[High-level discussion from a self-acknowledged filter/zero-copy newbie]

I think this comes all the way back to the ideas I was forming on the
apr-serf list about what types of data a bucket can represent. Having
boundary conditions implies uniform units of data, but that is not how our
filters work (i.e. get_mime_headers() wants lines, not chunks of data or
bytes; mod_include wants a data stream). I think most of the complication
here is because we want to write filters that can quickly gather the
data they need in a form that is native to that filter, while OTOH we
are writing the zero-copy routines to pass byte-buffers around as quickly
as possible. Maybe those aren't as compatible as we'd like them to be.

> > As Greg Ames pointed out, I'm not a fan of this commit, but it is 
> > an appropriate stopgap until something cleaner comes along.  -- justin
> 
> Agreed, better a running server than no server a'tall.

-aaron
