On Wed, Oct 03, 2001 at 11:35:45AM -0500, William A. Rowe, Jr. wrote:
> From: "Justin Erenkrantz" <[EMAIL PROTECTED]>
> Sent: Wednesday, October 03, 2001 11:15 AM
> 
> > On Wed, Oct 03, 2001 at 08:10:11AM -0500, William A. Rowe, Jr. wrote:
> > > Now ... the core input filter can't decide where to break input, it has to allow
> > > connection filters to insert themselves and decode whatever is read.
> > > 
> > > That means that the core filter needs to be split, and another line input
> > > dequeue filter needs to be inserted as the last 'connection' filter.  That
> > > leaves us ready for the request filter to read 'lines' or 'bytes' and make
> > > decisions based on the lines and bytes read, and fall back on the line input
> > > dequeue to keep the 'next' request's input set-aside for the next request.

No... The input filters have a mode which says "give me a line of input."
*Every* filter must be able to obey that mode. Reading lines can happen at
any point in the input filter stack -- it is *not* just something for the
top level app (httpd) to perform. Consider:

*) httpd needs to read a line for the request ("GET /foo HTTP/1.0")

*) httpd reads lines for the MIME headers

*) HTTP_IN reads lines that specify a chunk size

*) HTTP_IN reads lines for trailer "headers"

In each case, the next filter is supposed to return a single line.

This is exactly why I proposed adding the brigade functions to read lines.
We don't want to build line-reading code into each and every filter.

> > Yes and no.  You are kind of right here - I see why you might want to
> > do that, but that was the previous architecture with HTTP_IN and
> > CORE_IN.  (HTTP_IN was an admittedly bad name for it, but that is 
> > what it tried to do - do readlines.)

Nah. HTTP_IN was simply the only guy who happened to respond properly to the
"*readbytes == 0" mode.

Of course, using count==0 as a mode specifier is whacky. We already pass a
mode parameter, so it should be used. There are two sets of flags that go in
that param: blocking, and read block/line. The "peek" mode and -1 are other
bastard styles; one is "delete any blank lines you see" and the other is
"read everything from the input".

> Then that was the change, to eliminate the HTTP_IN filter?

No.

The idea was to get CORE_IN to handle the various "get_brigade" modes.
Before, it was doing something totally "wrong" -- it returned a brigade that
(effectively) had "everything" in it. That prevented higher level filters
from reading only a specific amount of data from the socket (e.g. whatever
the Content-Length header said). In turn, that meant our input reading and
parsing was busted. (well, it worked, but the coupling created a completely
fragile system)

The second item that was changed was to combine HTTP_IN and DECHUNK. The
former dealt with Content-Length type requests and the latter with chunked
requests. The former could read lines, which happened to support what
DECHUNK was doing.

Of course, when DECHUNK read a line, it used the broken ap_getline()
function. That thing would read from the *top* of the filter stack. That
would actually mean that DECHUNK was re-entered, it would read a line again,
etc etc. I cannot imagine how the original code would have *ever* worked
with chunking on the request.

> Removing HTTP_IN
> and trying to have a CORE_IN of the everything and the bathwater is a broken 
> model.

Yes, it would be. Thankfully, we weren't shooting for that :-)

> You've succeeded in reverting Apache to an http-only server, if that was your
> intention.  As this eliminates our flexibile interoperability for other 
> protocols, I need to veto this patch.

Slow down, Tex :-)

I would suggest learning what happened, and where things are going, before
pulling out that veto card.

> > A lot of the complexity was removed by assuming that only one filter
> > has the -1 brigade.  And, Greg and Ryan have commented that they 
> > would rather see CORE_IN not deal with socket buckets at all but 
> > read directly from the socket.

Yah. The socket bucket and the brigade it was put in is a lot of needless
structure. Consider what is going on there: you're using the brigade to read
from the socket. A bit of buffering is going on to hold unused data that was
read.

But note: you're *copying* data in there.

* apr_bucket_read() to generate a HEAP bucket from the SOCKET bucket

* ->split() to split that bucket into (requested) / (unused)
  
  - the split mallocs and copies a new bucket.

* move the post-split bucket from ctx->b to the passed-brigade


That is a lot of work, and a needless copy. Instead, the CORE_IN can alloc a
buffer of the requested size and do a apr_socket_read() right into that
buffer. Wrap a bucket around it and place it into the passed-brigade.

Of course, you might not read everything requested, and you might also want
to limit the allocation to some maximum number of bytes. (note that the
current code uses apr_brigade_partition() which blocks until the requested
number of bytes is read)

> The CORE_IN filter reads directly from the socket.  If you want to provide a
> readline and readbytes here, both blocking and nonblocking, that's fine.  It
> doesn't mitigate the fact that the ssl filter will transform any readline into
> read bytes, and that an HTTP filter must handle readline after recieving some
> readbytes, and handle ALL the chunking behavior.

No problem.

> > Ryan and Greg, how do you guys see this working?  I yield to
> > your wisdom here...  If the CORE_IN read from the socket 
> > directly as you both have suggested, how would (or should) 
> > mod_ssl interact?

Why should mod_ssl care *how* data is read? Why should it care whether the
data comes from a socket?  Feh.

mod_ssl says "give me N bytes" and the *next* filter returns *up to* that
number of bytes. That next filter *might* be CORE_IN.

CORE_IN reads from the socket, placing buckets into the passed-brigade (per
the comment above).

> > I see it being a much simpler task to write a module that
> > replaces CORE_IN than trying to work around it.  It's not

No. As Ryan said, "bogus". mod_ssl reads from the "next" filter. It can do
that quite fine. Why should it care about sockets? Why should we remove
CORE_IN when the SSL filter happens to be installed?

> > that much code - and I think mod_ssl could even ditch
> > some of the lesser-supported modes (readbytes==-1 and PEEK 
> > which it already doesn't support).  We'll probably end up
> > removing them in core at some point.  -- justin

Those modes should be fixed independent of mod_ssl. The "peek" thing really
isn't, and I don't even know whether and how Netscape's "padding" interacts
with SSL on the wire (do they still pop out of the encrypted pipe? maybe it
knows better than to send them?)

It Netscape still puts that \r\n down the encrypted pipe, then mod_ssl would
need to deal with seeing that stuff and tossing it.

> If you have to replace CORE_IN, then your implementation of CORE_IN is
> broken.  For that too I would veto the patch you've committed, if this
> can't be fixed to allow the old functionality.

It doesn't have to be replaced. *PLEASE* ... Ease up on your veto threats.
Take a breath and stop to learn what is happening.

> Maybe filtering needs some pushback so each higher level filter can avoid
> hanging on to bytes it doesn't want or need, and push them back at a lower
> level filter that can keep them in their queue for the next transaction
> within the request or the next request within the connection.

At this point, we don't need any concept of pushback. A filter should return
up to what was asked and no more.

But I seriously doubt that is the problem here. I'm guessing that mod_ssl
was coded "knowing" that the brigade it got back from f->next was a socket
bucket. Or that the brigade had certain properties when reading from it.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

Reply via email to