Re: [PATCH] Move http header parsing into the http_input_filter

2003-02-27 Thread Bill Stoddard
I am working on the next rev of this patch fixing some of the problems 
pointed out by Justin and Bill R. I have the most concern about protocol 
negotiation. What's in store for HTTP post 1.1? We may have much bigger 
problems than simply replacing a filter.

Bill



Re: [PATCH] Move http header parsing into the http_input_filter

2003-02-26 Thread Bill Stoddard
Justin Erenkrantz wrote:
--On Tuesday, February 25, 2003 8:09 PM -0500 Bill Stoddard 
[EMAIL PROTECTED] wrote:

Ok, I know what I was thinking. Check out tmp_bb in ap_read_request
in protocol.c (unpatched version). We create tmp_bb, do
read_request_line, ap_get_mime_headers then destroy tmp_bb.  My
patch does essentially the same thing.  If there is a request body
to read, whoeever needs to read it will create a new brigade. In
this regard, my patch does not create any new bogosity. (and no, we
do not throw away the first read).


No, it's not the same thing.  One is: 'read the request line and headers 
using this temporary brigade,' the other is: 'read part of the input 
body into this brigade - oh, by the way, feel free to read the headers 
and status while you are at it, and, oh, don't feel that I require any 
data.'
No, I think it is the same.  The request body (or the next request in a 
set of pipelined requests) is set aside in the http_input_filter_ctx_t 
in the HTTP_IN filter after the headers are parsed.   The next call to 
ap_get_brigade to fetch the body will fetch any setaside buckets then 
call down into core_in for more bytes as needed.

I probably cannot say this enough... I've not done any work to get the 
code to handle a request body, but AFAIK, it should not be difficult to 
do.

As I've said before, the patch uses ap_get_brigade only as a way to 
drive the input filters before we are ready to read the body.  
Call me dense but the unpatched code in ap_read_request sure appears to 
do the same thing.

 To me, it
indicates a strong disconnect with how the filters are designed. Filters 
are meant to be called asynchronously as needed (pulled), but the 
request parsing must happen synchronously and at a specific point in time.

I don't fully understand why the status and header parsing must belong 
in an input filter (hammer, meet balloon).  I think the patch could be 
rewritten removing the context of an input filter and still see 
significant performance benefits.  At the time the request is read, 
there should be no other input filters, so I don't believe there would 
be significant overhead to not being an input filter.  -- justin
One of the keys behind why this patch improves performance is the 
elimination of the repeat calls to ap_rgetline() to read the header 
fields. ap_rgetline calls ap_get_brigade(AP_MODE_GETLINE) which drived 
multiple trips into the core input filter. The patch enables 
ap_http_filter to make a -single- call to 
ap_get_brigade(AP_MODE_READBYTES) (and a single entry to core_in) and 
works on parsing the header fields out of the returned brigade. So how 
do you solve the problem of ap_get_brigade(AP_MODE_READBYTES) returning 
bytes beyond the header fields?  AFAIK there are two ways to solve this:

1. push the extra bytes back to the core input filter
2. setaside the extra bytes in the ap_http_filter context so that they 
can be fetched later
My patch implements 2) which is essentially the same thing 
core_input_filter does. core_in will issue a bucket read on the socket 
and that read may return request body bytes. If core_in is entered with 
mode AP_MODE_GETLINE, only a single line is returned and the other bytes 
are setaside in the core_input_filter context. My patch just moves that 
setaside function up one level to a protocol level filter.

Bill




Re: [PATCH] Move http header parsing into the http_input_filter

2003-02-25 Thread Bill Stoddard
Justin,
Thanks for the review. It'll be a while before I have time to respond. 
This week is booking up fast :-(

Bill



Re: [PATCH] Move http header parsing into the http_input_filter

2003-02-25 Thread William A. Rowe, Jr.
At 02:42 AM 2/25/2003, Justin Erenkrantz wrote:

[...]

I agree with 95% of all your observations, so I'll just focus on a few
discrepancies.

--On Monday, February 24, 2003 12:21 PM -0500 Bill Stoddard [EMAIL PROTECTED] wrote:

+rv = read_request_line(f-next, r, b, mode);
+if (rv != APR_SUCCESS) {
+return rv;
+}
+rv = read_header_fields(f-next, r, b, mode);
+if (rv != APR_SUCCESS) {
+#if 0
+/* Handle errors at the top of the filter chain? */
+if (r-status != HTTP_REQUEST_TIME_OUT) {
+ap_log_rerror(APLOG_MARK, APLOG_ERR, 0, r,
+  request failed: error reading the
headers); +ap_send_error_response(r, 0);
+ap_run_log_transaction(r);
+}
+#endif

I believe the strategy we have taken is that protocol errors
should be passed to the output filters directly.  The application
(caller) of this filter has no idea what to do in an error.
I think maintaining that approach would be a good thing.

What do you mean?  Returning a non-apr_status_t result?  We just
got finished hashing that out - since filters are APR brigade based,
then they must return a recognizable apr_status_t value.  If you want
to extend that error space in the application error range, you can do
that for our filters.  But you can't just return 500.

Who cares if the caller doesn't understand the error, they don't
necessarily understand 555 either.  It's either good (0) or not.

--- server/protocol.c   3 Feb 2003 17:53:19 -   1.127
+++ server/protocol.c   24 Feb 2003 17:03:21 -
@@ -923,117 +924,16 @@
...
+r-connection-r = r;
+bb = apr_brigade_create(r-pool, r-connection-bucket_alloc);
+rv = ap_get_brigade(r-input_filters, bb, AP_MODE_READBYTES,
+APR_BLOCK_READ, HUGE_STRING_LEN);
+apr_brigade_destroy(bb);

Um, what?  I think this is how you are trying to get around when to
call the input filter.  This is extremely bogus.  You can't read
data and then throw it away.  And, no, setaside isn't the answer
(sorry, OtherBill).

Huh?  What setaside?  Pushback, damn it - and your comments earlier
to Bill (which were correct) highlight how bogus the entire we want to
make callers responsible for everything they fetch - stick it out so to
speak, even if they would rather have read, and would then rather just go 
away, when they recognize from these bytes that they didn't belong in 
the filter chain.  (Paraphrasing)

The real answer is pushing back on the lower level filter so the next caller
(not the same module) reads beginning with the correct bytes.

Maybe the comprimize is that only connection level filters should support
pushback (at least initially) so that things like ssl filtering can be inserted
and removed painlessly, and the first HTTP_IN filter can also be added
or dropped for another protocol.

I've never advocated the current filter setaside approach, it buries bytes 
in the wrong filters, if that's what you trying to associate me with, Justin.

In order to even contemplating merging this, I would like to see
a clear plan on how to support 'Upgrade' in HTTP/1.1.  I imagine
in the lifetime of 2.1/2.2, we will have a new HTTP protocol.  If
your solution doesn't allow for Upgrade, it'll have to be thrown
away later.  -- justin

We support upgrade now in httpd-2.1 .0-dev, so if we break it, 
it would be obvious.  Would be, had we something to test it with
in perl-framework.

I'll just close by agreeing with FirstBill's noble goals here - and point
out again that this is the time to revisit what we 'got wrong' the first
time around with our filter approach :-)  Bring on the patches - but I
think we are a little ways from committing such a radical patch just
yet.  I'll review in great depth the second draft, since Justin did such
a nice job with the first draft review.  Hopefully trying to get these
changes right will tell us just where we went wrong with the first
filters approach.

My bottom line on the diatribe above is this; once a request's filters
are done, there should be another request just sitting in the connection
filter waiting to be read.  That connection filter must be protocol agnostic
so the next HTTP_IN filter has a fresh start with no bias.  Now exactly
how that happens without all the extra little cruft we've added, that Bill
is trying to mop up, is possibly best done with pushing back the unneeded
data onto the connection filter.  I'll leave my comments at that and go off
to stare at my belly button and try to come up with fresh ideas.

Bill




Re: [PATCH] Move http header parsing into the http_input_filter

2003-02-25 Thread William A. Rowe, Jr.
At 11:04 AM 2/25/2003, Justin Erenkrantz wrote:
--On Tuesday, February 25, 2003 10:20 AM -0600 William A. Rowe, Jr. [EMAIL 
PROTECTED] wrote:

What do you mean?  Returning a non-apr_status_t result?  We just
got finished hashing that out - since filters are APR brigade based,
then they must return a recognizable apr_status_t value.  If you
want to extend that error space in the application error range, you
can do that for our filters.  But you can't just return 500.

Creating an error bucket that describes the HTTP failure and pushing it out to the 
output_filters.  Then a suitable error code can be returned.  But, I was focusing on 
returning an error to the client rather than returning an error to the caller.

Yes - error_buckets are neglected right now.  And I agree that we need to
review who's on first when it comes to providing responses to the client.

The real answer is pushing back on the lower level filter so the
next caller (not the same module) reads beginning with the correct
bytes.

This has been a point of disagreement between us for a long time.  I believe we can 
create a filter model that doesn't rely on pushback. I think in the end, the model is 
better for not having pushback.

Better how?  Simpler?  Well, if you measure not having to set aside data
that the caller isn't ready to use, it's half of one/half dozen of the other if
the caller does that (trapping those bytes) or pushes it back on the prior
filter.  The point is that whatever API we use should be trivial for module
authors to implement, so I agree simplicity is the key, and we should help
that no matter which flavor is implemented for 2.2.

I've never advocated the current filter setaside approach, it
buries bytes  in the wrong filters, if that's what you trying to
associate me with, Justin.

I mixed up setaside and pushback.  =)  Sorry.

I thought that might be the case :-)

We support upgrade now in httpd-2.1 .0-dev, so if we break it,
it would be obvious.  Would be, had we something to test it with
in perl-framework.

Ah, okay.  Perhaps we should create a test.

My current thinking is use lwp and family to first test the negotiation,
then to create another lwp session with a dirty kluge to pass the
now-upgrading ssl socket into the rsa thunks for upgrading.  It will
be dirty and evil, and I need to spend some time on this.

Once we've proven it works with the kludge, then we can upgrade
the lwp/libwww stuff to support upgrade (and neon and others can
begin to implement for our SVN friends.)

I'll just close by agreeing with FirstBill's noble goals here - and
point out again that this is the time to revisit what we 'got
wrong' the first time around with our filter approach :-)  Bring on

Right, I'm all for removing as many instructions as we can and cleaning up what we 
can in the API.  -- justin

Good, we are all on the same page, and this doesn't need to become
the filter wars, redux :-)

FWIW I never see this code going into 2.0.x but only moving forward with
the suggested improvements for the 2.2 release.  I was hoping for a first
release by summer, but that requires a stable API.  Knowing everyone's
busy plates, lets keep our fingers crossed and our dialog flowing so there
might be some hope of that :-)

If 2.2 is filtering and authn/authz both done the 'right way', I'd say that's
enough of a full plate for us to call 2.2 baked and move on further redesign
to 2.4 (file store and expectations for async mpm's all done the right way?)
for release at the end of the year.

Before we can release 2.2 with connection: upgrade tls support, we must
have tests, so I will probably start there.

Bill




Re: [PATCH] Move http header parsing into the http_input_filter

2003-02-25 Thread Bill Stoddard

Index: server/protocol.c
===
RCS file: /home/cvs/httpd-2.0/server/protocol.c,v
retrieving revision 1.127
diff -u -r1.127 protocol.c
--- server/protocol.c3 Feb 2003 17:53:19 -1.127
+++ server/protocol.c24 Feb 2003 17:03:21 -
@@ -923,117 +924,16 @@
...
+r-connection-r = r;
+bb = apr_brigade_create(r-pool, r-connection-bucket_alloc);
+rv = ap_get_brigade(r-input_filters, bb, AP_MODE_READBYTES,
+APR_BLOCK_READ, HUGE_STRING_LEN);
+apr_brigade_destroy(bb);


Um, what?  I think this is how you are trying to get around when to
call the input filter.  This is extremely bogus.  You can't read
data and then throw it away.  And, no, setaside isn't the answer
(sorry, OtherBill).
I have to responds to this (even though I am up to my ears in other 
work)... If bb contains -any- bytes after after the call to 
ap_get_brigade(), then there is something wrong (that is fixable) in the 
filter stack. As I envision how this should operate, bb should be empty 
when we call apr_brigade_destroy. It it is not, then something else is 
broken.

And you are right that ap_brigade_getline() is ignoring the possibility 
of getting an EOS bucket. That's a bug in ap_brigade_getline() that can 
easily be fixed. I also chose to implement ap_brigade_getline() in the 
most direct fashion possible (rather than use apr_brigade_splitline, et. 
al.) because it is much more efficient the way I did it. Yes, there is 
some code duplication, but it is minor.

Bill



[PATCH] Move http header parsing into the http_input_filter

2003-02-24 Thread Bill Stoddard
This patch trims about 2500 instructions out of the path between 
ap_read_request and ap_graceful_stop_signalled when handling a simple 
GET request (will post some profiles in a few minutes).  The patch 
demonstrates some primary principles but is still far from complete. 
POST requests will certainly fail, as will pipelined requests, but the 
basic framework is complete enought to allow these to be fixed without 
requiring major rewriting of code.

Quick Overview and Interesting (controversial) Design Points
1. HTTP header parsing moved out of protocol.c and into the HTTP_IN 
filter (ap_http_filter in http_protocol.c)
2. HTTP_IN is now nearly purely state driven. ap_http_filter registers a 
cleanup on the request pool that drives the filter to the 'new request' 
state.
3. A pointer to the current request req is placed in the conn_rec for 
use by ap_http_filter
4. New function, ap_brigade_getline() replaces 
ap_get_brigade(AP_MODE_GETLINE). This new function has some interesting 
properties..
- You can pass -in- a brigade to the function.
- If you pass in an empty brigade, it will issue 
ap_get_brigade(AP_MODE_READBYTES) to fetch bytes from lower level filters
- User of this function must be prepared to setaside the portion of the 
brigade not immediately needed (eg, the second request in a pipelined 
request in http_input_filter_ctx_t)

I generally like the idea of putting protocol parsing in a connection 
level filter. I think this is a more straight forward way to use the 
Apache 2 infrastructure to handle protocols.

Issues to resolve:
1. net_time_filter sits on top of the HTTP_IN filter. I have given this 
zero thought but it may need to repositioned to sit below HTTP_IN.
2. Lots of broken error paths.
3. Need to implement and debug code to handle pipelined requests and 
request bodies (chunked, unchunked, etc).
4. Formally address (by API changes?) the notion of a PROTOCOL filter 
that lasts the duration of a connection
5. Reevaluate what services should be provided by the core_input_filter 
vs http_in (or any other protocol) filter.

Bill


Index: include/http_protocol.h
===
RCS file: /home/cvs/httpd-2.0/include/http_protocol.h,v
retrieving revision 1.84
diff -u -r1.84 http_protocol.h
--- include/http_protocol.h 3 Feb 2003 17:52:53 -   1.84
+++ include/http_protocol.h 24 Feb 2003 17:03:20 -
@@ -717,6 +717,9 @@
   apr_bucket_brigade *);
 AP_DECLARE_NONSTD(apr_status_t) ap_old_write_filter(ap_filter_t *f, 
apr_bucket_brigade *b);
 
+AP_CORE_DECLARE_NONSTD(apr_status_t) ap_http_header_input_filter(ap_filter_t *f, 
apr_bucket_brigade *b,
+ ap_input_mode_t 
mode, apr_read_type_e block,
+ apr_off_t readbytes);
 /*
  * Setting up the protocol fields for subsidiary requests...
  * Also, a wrapup function to keep the internal accounting straight.
Index: include/httpd.h
===
RCS file: /home/cvs/httpd-2.0/include/httpd.h,v
retrieving revision 1.194
diff -u -r1.194 httpd.h
--- include/httpd.h 3 Feb 2003 17:52:53 -   1.194
+++ include/httpd.h 24 Feb 2003 17:03:20 -
@@ -1013,6 +1013,7 @@
 void *sbh;
 /** The bucket allocator to use for all bucket/brigade creations */
 struct apr_bucket_alloc_t *bucket_alloc;
+request_rec *r;
 };
 
 /* Per-vhost config... */
Index: modules/http/http_core.c
===
RCS file: /home/cvs/httpd-2.0/modules/http/http_core.c,v
retrieving revision 1.309
diff -u -r1.309 http_core.c
--- modules/http/http_core.c3 Feb 2003 17:53:03 -   1.309
+++ modules/http/http_core.c24 Feb 2003 17:03:20 -
@@ -277,11 +277,18 @@
 int csd_set = 0;
 apr_socket_t *csd = NULL;
 
+/* Add the http_header_input_filter to the filter stack. This filter should
+ * remain installed for the duration of the connection.
+ * XXX: should this be added in a pre_connection hook? I think doing it
+ * here makes the code easier to read and understand.
+ */
+ap_add_input_filter_handle(ap_http_input_filter_handle, 
+   NULL, NULL, c); 
+
 /*
  * Read and process each request found on our connection
  * until no requests are left or we decide to close.
  */
- 
 ap_update_child_status(c-sbh, SERVER_BUSY_READ, NULL);
 while ((r = ap_read_request(c)) != NULL) {
 
@@ -327,7 +334,6 @@
 
 return OK;
 }
-
 static void register_hooks(apr_pool_t *p)
 {
 ap_hook_process_connection(ap_process_http_connection,NULL,NULL,
@@ -338,7 +344,7 @@
 ap_hook_create_request(http_create_request, NULL, NULL, APR_HOOK_REALLY_LAST);
 ap_http_input_filter_handle =
 ap_register_input_filter(HTTP_IN,