Re: [PATCH] Move http header parsing into the http_input_filter
I am working on the next rev of this patch fixing some of the problems pointed out by Justin and Bill R. I have the most concern about protocol negotiation. What's in store for HTTP post 1.1? We may have much bigger problems than simply replacing a filter. Bill
Re: [PATCH] Move http header parsing into the http_input_filter
Justin Erenkrantz wrote: --On Tuesday, February 25, 2003 8:09 PM -0500 Bill Stoddard [EMAIL PROTECTED] wrote: Ok, I know what I was thinking. Check out tmp_bb in ap_read_request in protocol.c (unpatched version). We create tmp_bb, do read_request_line, ap_get_mime_headers then destroy tmp_bb. My patch does essentially the same thing. If there is a request body to read, whoeever needs to read it will create a new brigade. In this regard, my patch does not create any new bogosity. (and no, we do not throw away the first read). No, it's not the same thing. One is: 'read the request line and headers using this temporary brigade,' the other is: 'read part of the input body into this brigade - oh, by the way, feel free to read the headers and status while you are at it, and, oh, don't feel that I require any data.' No, I think it is the same. The request body (or the next request in a set of pipelined requests) is set aside in the http_input_filter_ctx_t in the HTTP_IN filter after the headers are parsed. The next call to ap_get_brigade to fetch the body will fetch any setaside buckets then call down into core_in for more bytes as needed. I probably cannot say this enough... I've not done any work to get the code to handle a request body, but AFAIK, it should not be difficult to do. As I've said before, the patch uses ap_get_brigade only as a way to drive the input filters before we are ready to read the body. Call me dense but the unpatched code in ap_read_request sure appears to do the same thing. To me, it indicates a strong disconnect with how the filters are designed. Filters are meant to be called asynchronously as needed (pulled), but the request parsing must happen synchronously and at a specific point in time. I don't fully understand why the status and header parsing must belong in an input filter (hammer, meet balloon). I think the patch could be rewritten removing the context of an input filter and still see significant performance benefits. At the time the request is read, there should be no other input filters, so I don't believe there would be significant overhead to not being an input filter. -- justin One of the keys behind why this patch improves performance is the elimination of the repeat calls to ap_rgetline() to read the header fields. ap_rgetline calls ap_get_brigade(AP_MODE_GETLINE) which drived multiple trips into the core input filter. The patch enables ap_http_filter to make a -single- call to ap_get_brigade(AP_MODE_READBYTES) (and a single entry to core_in) and works on parsing the header fields out of the returned brigade. So how do you solve the problem of ap_get_brigade(AP_MODE_READBYTES) returning bytes beyond the header fields? AFAIK there are two ways to solve this: 1. push the extra bytes back to the core input filter 2. setaside the extra bytes in the ap_http_filter context so that they can be fetched later My patch implements 2) which is essentially the same thing core_input_filter does. core_in will issue a bucket read on the socket and that read may return request body bytes. If core_in is entered with mode AP_MODE_GETLINE, only a single line is returned and the other bytes are setaside in the core_input_filter context. My patch just moves that setaside function up one level to a protocol level filter. Bill
Re: [PATCH] Move http header parsing into the http_input_filter
Justin, Thanks for the review. It'll be a while before I have time to respond. This week is booking up fast :-( Bill
Re: [PATCH] Move http header parsing into the http_input_filter
At 02:42 AM 2/25/2003, Justin Erenkrantz wrote: [...] I agree with 95% of all your observations, so I'll just focus on a few discrepancies. --On Monday, February 24, 2003 12:21 PM -0500 Bill Stoddard [EMAIL PROTECTED] wrote: +rv = read_request_line(f-next, r, b, mode); +if (rv != APR_SUCCESS) { +return rv; +} +rv = read_header_fields(f-next, r, b, mode); +if (rv != APR_SUCCESS) { +#if 0 +/* Handle errors at the top of the filter chain? */ +if (r-status != HTTP_REQUEST_TIME_OUT) { +ap_log_rerror(APLOG_MARK, APLOG_ERR, 0, r, + request failed: error reading the headers); +ap_send_error_response(r, 0); +ap_run_log_transaction(r); +} +#endif I believe the strategy we have taken is that protocol errors should be passed to the output filters directly. The application (caller) of this filter has no idea what to do in an error. I think maintaining that approach would be a good thing. What do you mean? Returning a non-apr_status_t result? We just got finished hashing that out - since filters are APR brigade based, then they must return a recognizable apr_status_t value. If you want to extend that error space in the application error range, you can do that for our filters. But you can't just return 500. Who cares if the caller doesn't understand the error, they don't necessarily understand 555 either. It's either good (0) or not. --- server/protocol.c 3 Feb 2003 17:53:19 - 1.127 +++ server/protocol.c 24 Feb 2003 17:03:21 - @@ -923,117 +924,16 @@ ... +r-connection-r = r; +bb = apr_brigade_create(r-pool, r-connection-bucket_alloc); +rv = ap_get_brigade(r-input_filters, bb, AP_MODE_READBYTES, +APR_BLOCK_READ, HUGE_STRING_LEN); +apr_brigade_destroy(bb); Um, what? I think this is how you are trying to get around when to call the input filter. This is extremely bogus. You can't read data and then throw it away. And, no, setaside isn't the answer (sorry, OtherBill). Huh? What setaside? Pushback, damn it - and your comments earlier to Bill (which were correct) highlight how bogus the entire we want to make callers responsible for everything they fetch - stick it out so to speak, even if they would rather have read, and would then rather just go away, when they recognize from these bytes that they didn't belong in the filter chain. (Paraphrasing) The real answer is pushing back on the lower level filter so the next caller (not the same module) reads beginning with the correct bytes. Maybe the comprimize is that only connection level filters should support pushback (at least initially) so that things like ssl filtering can be inserted and removed painlessly, and the first HTTP_IN filter can also be added or dropped for another protocol. I've never advocated the current filter setaside approach, it buries bytes in the wrong filters, if that's what you trying to associate me with, Justin. In order to even contemplating merging this, I would like to see a clear plan on how to support 'Upgrade' in HTTP/1.1. I imagine in the lifetime of 2.1/2.2, we will have a new HTTP protocol. If your solution doesn't allow for Upgrade, it'll have to be thrown away later. -- justin We support upgrade now in httpd-2.1 .0-dev, so if we break it, it would be obvious. Would be, had we something to test it with in perl-framework. I'll just close by agreeing with FirstBill's noble goals here - and point out again that this is the time to revisit what we 'got wrong' the first time around with our filter approach :-) Bring on the patches - but I think we are a little ways from committing such a radical patch just yet. I'll review in great depth the second draft, since Justin did such a nice job with the first draft review. Hopefully trying to get these changes right will tell us just where we went wrong with the first filters approach. My bottom line on the diatribe above is this; once a request's filters are done, there should be another request just sitting in the connection filter waiting to be read. That connection filter must be protocol agnostic so the next HTTP_IN filter has a fresh start with no bias. Now exactly how that happens without all the extra little cruft we've added, that Bill is trying to mop up, is possibly best done with pushing back the unneeded data onto the connection filter. I'll leave my comments at that and go off to stare at my belly button and try to come up with fresh ideas. Bill
Re: [PATCH] Move http header parsing into the http_input_filter
At 11:04 AM 2/25/2003, Justin Erenkrantz wrote: --On Tuesday, February 25, 2003 10:20 AM -0600 William A. Rowe, Jr. [EMAIL PROTECTED] wrote: What do you mean? Returning a non-apr_status_t result? We just got finished hashing that out - since filters are APR brigade based, then they must return a recognizable apr_status_t value. If you want to extend that error space in the application error range, you can do that for our filters. But you can't just return 500. Creating an error bucket that describes the HTTP failure and pushing it out to the output_filters. Then a suitable error code can be returned. But, I was focusing on returning an error to the client rather than returning an error to the caller. Yes - error_buckets are neglected right now. And I agree that we need to review who's on first when it comes to providing responses to the client. The real answer is pushing back on the lower level filter so the next caller (not the same module) reads beginning with the correct bytes. This has been a point of disagreement between us for a long time. I believe we can create a filter model that doesn't rely on pushback. I think in the end, the model is better for not having pushback. Better how? Simpler? Well, if you measure not having to set aside data that the caller isn't ready to use, it's half of one/half dozen of the other if the caller does that (trapping those bytes) or pushes it back on the prior filter. The point is that whatever API we use should be trivial for module authors to implement, so I agree simplicity is the key, and we should help that no matter which flavor is implemented for 2.2. I've never advocated the current filter setaside approach, it buries bytes in the wrong filters, if that's what you trying to associate me with, Justin. I mixed up setaside and pushback. =) Sorry. I thought that might be the case :-) We support upgrade now in httpd-2.1 .0-dev, so if we break it, it would be obvious. Would be, had we something to test it with in perl-framework. Ah, okay. Perhaps we should create a test. My current thinking is use lwp and family to first test the negotiation, then to create another lwp session with a dirty kluge to pass the now-upgrading ssl socket into the rsa thunks for upgrading. It will be dirty and evil, and I need to spend some time on this. Once we've proven it works with the kludge, then we can upgrade the lwp/libwww stuff to support upgrade (and neon and others can begin to implement for our SVN friends.) I'll just close by agreeing with FirstBill's noble goals here - and point out again that this is the time to revisit what we 'got wrong' the first time around with our filter approach :-) Bring on Right, I'm all for removing as many instructions as we can and cleaning up what we can in the API. -- justin Good, we are all on the same page, and this doesn't need to become the filter wars, redux :-) FWIW I never see this code going into 2.0.x but only moving forward with the suggested improvements for the 2.2 release. I was hoping for a first release by summer, but that requires a stable API. Knowing everyone's busy plates, lets keep our fingers crossed and our dialog flowing so there might be some hope of that :-) If 2.2 is filtering and authn/authz both done the 'right way', I'd say that's enough of a full plate for us to call 2.2 baked and move on further redesign to 2.4 (file store and expectations for async mpm's all done the right way?) for release at the end of the year. Before we can release 2.2 with connection: upgrade tls support, we must have tests, so I will probably start there. Bill
Re: [PATCH] Move http header parsing into the http_input_filter
Index: server/protocol.c === RCS file: /home/cvs/httpd-2.0/server/protocol.c,v retrieving revision 1.127 diff -u -r1.127 protocol.c --- server/protocol.c3 Feb 2003 17:53:19 -1.127 +++ server/protocol.c24 Feb 2003 17:03:21 - @@ -923,117 +924,16 @@ ... +r-connection-r = r; +bb = apr_brigade_create(r-pool, r-connection-bucket_alloc); +rv = ap_get_brigade(r-input_filters, bb, AP_MODE_READBYTES, +APR_BLOCK_READ, HUGE_STRING_LEN); +apr_brigade_destroy(bb); Um, what? I think this is how you are trying to get around when to call the input filter. This is extremely bogus. You can't read data and then throw it away. And, no, setaside isn't the answer (sorry, OtherBill). I have to responds to this (even though I am up to my ears in other work)... If bb contains -any- bytes after after the call to ap_get_brigade(), then there is something wrong (that is fixable) in the filter stack. As I envision how this should operate, bb should be empty when we call apr_brigade_destroy. It it is not, then something else is broken. And you are right that ap_brigade_getline() is ignoring the possibility of getting an EOS bucket. That's a bug in ap_brigade_getline() that can easily be fixed. I also chose to implement ap_brigade_getline() in the most direct fashion possible (rather than use apr_brigade_splitline, et. al.) because it is much more efficient the way I did it. Yes, there is some code duplication, but it is minor. Bill
[PATCH] Move http header parsing into the http_input_filter
This patch trims about 2500 instructions out of the path between ap_read_request and ap_graceful_stop_signalled when handling a simple GET request (will post some profiles in a few minutes). The patch demonstrates some primary principles but is still far from complete. POST requests will certainly fail, as will pipelined requests, but the basic framework is complete enought to allow these to be fixed without requiring major rewriting of code. Quick Overview and Interesting (controversial) Design Points 1. HTTP header parsing moved out of protocol.c and into the HTTP_IN filter (ap_http_filter in http_protocol.c) 2. HTTP_IN is now nearly purely state driven. ap_http_filter registers a cleanup on the request pool that drives the filter to the 'new request' state. 3. A pointer to the current request req is placed in the conn_rec for use by ap_http_filter 4. New function, ap_brigade_getline() replaces ap_get_brigade(AP_MODE_GETLINE). This new function has some interesting properties.. - You can pass -in- a brigade to the function. - If you pass in an empty brigade, it will issue ap_get_brigade(AP_MODE_READBYTES) to fetch bytes from lower level filters - User of this function must be prepared to setaside the portion of the brigade not immediately needed (eg, the second request in a pipelined request in http_input_filter_ctx_t) I generally like the idea of putting protocol parsing in a connection level filter. I think this is a more straight forward way to use the Apache 2 infrastructure to handle protocols. Issues to resolve: 1. net_time_filter sits on top of the HTTP_IN filter. I have given this zero thought but it may need to repositioned to sit below HTTP_IN. 2. Lots of broken error paths. 3. Need to implement and debug code to handle pipelined requests and request bodies (chunked, unchunked, etc). 4. Formally address (by API changes?) the notion of a PROTOCOL filter that lasts the duration of a connection 5. Reevaluate what services should be provided by the core_input_filter vs http_in (or any other protocol) filter. Bill Index: include/http_protocol.h === RCS file: /home/cvs/httpd-2.0/include/http_protocol.h,v retrieving revision 1.84 diff -u -r1.84 http_protocol.h --- include/http_protocol.h 3 Feb 2003 17:52:53 - 1.84 +++ include/http_protocol.h 24 Feb 2003 17:03:20 - @@ -717,6 +717,9 @@ apr_bucket_brigade *); AP_DECLARE_NONSTD(apr_status_t) ap_old_write_filter(ap_filter_t *f, apr_bucket_brigade *b); +AP_CORE_DECLARE_NONSTD(apr_status_t) ap_http_header_input_filter(ap_filter_t *f, apr_bucket_brigade *b, + ap_input_mode_t mode, apr_read_type_e block, + apr_off_t readbytes); /* * Setting up the protocol fields for subsidiary requests... * Also, a wrapup function to keep the internal accounting straight. Index: include/httpd.h === RCS file: /home/cvs/httpd-2.0/include/httpd.h,v retrieving revision 1.194 diff -u -r1.194 httpd.h --- include/httpd.h 3 Feb 2003 17:52:53 - 1.194 +++ include/httpd.h 24 Feb 2003 17:03:20 - @@ -1013,6 +1013,7 @@ void *sbh; /** The bucket allocator to use for all bucket/brigade creations */ struct apr_bucket_alloc_t *bucket_alloc; +request_rec *r; }; /* Per-vhost config... */ Index: modules/http/http_core.c === RCS file: /home/cvs/httpd-2.0/modules/http/http_core.c,v retrieving revision 1.309 diff -u -r1.309 http_core.c --- modules/http/http_core.c3 Feb 2003 17:53:03 - 1.309 +++ modules/http/http_core.c24 Feb 2003 17:03:20 - @@ -277,11 +277,18 @@ int csd_set = 0; apr_socket_t *csd = NULL; +/* Add the http_header_input_filter to the filter stack. This filter should + * remain installed for the duration of the connection. + * XXX: should this be added in a pre_connection hook? I think doing it + * here makes the code easier to read and understand. + */ +ap_add_input_filter_handle(ap_http_input_filter_handle, + NULL, NULL, c); + /* * Read and process each request found on our connection * until no requests are left or we decide to close. */ - ap_update_child_status(c-sbh, SERVER_BUSY_READ, NULL); while ((r = ap_read_request(c)) != NULL) { @@ -327,7 +334,6 @@ return OK; } - static void register_hooks(apr_pool_t *p) { ap_hook_process_connection(ap_process_http_connection,NULL,NULL, @@ -338,7 +344,7 @@ ap_hook_create_request(http_create_request, NULL, NULL, APR_HOOK_REALLY_LAST); ap_http_input_filter_handle = ap_register_input_filter(HTTP_IN,