Ryan Bloom wrote:
>On Monday 27 August 2001 19:25, Paul J. Reder wrote:
>
>>Ryan Bloom wrote:
>>
>>>On Monday 27 August 2001 16:05, Brian Pane wrote:
>>>
>>>>In mod_include's find_start_sequence function, there's some code that
>>>>splits the current bucket if "ctx->bytes_parsed >=
>>>>BYTE_COUNT_THRESHOLD."
>>>>
>>>>Can somebody explain the rationale for this? It seems redundant to be
>>>>splitting the data into smaller chunks in a content filter; I'd expect
>>>>mod_include to defer network block sizing to downstream filters. In
>>>>the profile data I'm looking at currently, this check accounts for 35%
>>>>of the total run time of find_start_sequence, so there's some
>>>>performance to be gained if the "ctx->bytes_parsed >=
>>>>BYTE_COUNT_THRESHOLD" check can be eliminated.
>>>>
>>>It is used to ensure that we don't buffer all the data in mod_include.
>>>It isn't really done correctly though, because what we should be doing,
>>>is just continuing to read as much data as possible, but as soon as we
>>>can't read something, send what we have down the filter stack.
>>>
>>>This variable, basically ensures we don't keep reading all the data until
>>>we process the whole file, or reach the first tag.
>>>
>>In what manner do you mean "as soon as we can't read something"? It is my
>>understanding that the bucket code hides reading delays from the
>>mod_include code. If that is true how would the mod_include code know when
>>
>
>The bucket does not hide reading delays at all. Basically, you call apr_bucket_read
>with APR_NONBLOCK_READ, and when you get a return code of APR_EAGAIN,
>you send what you have, and then call apr_bucket_read with APR_BLOCK_READ.
>
>>to send a chunk along? Are you saying the bucket code should do some magic
>>like send all buckets in the brigade up to the current one? This would
>>wreak havoc on code like mod_include that may be setting aside or tagging
>>buckets for replacement when the end of the tag is found.
>>
>
>Huh? The bucket code doesn't ever send data down the filter stack unless you
>tell it to. Take a look at the content-length filter to see what I mean.
>
>>This code was put in because we were seeing the mod_include code buffer up
>>the entire collection of buckets until an SSI tag was found. If you have a
>>200 MB file with an SSI tag footer at the end of the brigade, the whole
>>thing was being buffered. How do you propose that this be done differently?
>>
>
>I don't care if mod_include buffers 200 Megs, as long as it is constantly doing
>something with the data. If we have a 200 Meg file that has no SSI tags in
>it, but we can get all 200 Meg at one time, then we shouldn't have any problem
>just scanning through the entire 200 Megs very quickly. Worst case, we do what
>Brian suggested, and just check the bucket length once we have finished
>processing all of the data in that bucket. The buffering only becomes a
>real problem when we sit waiting for data from a CGI or some other slow
>content generator.
>
How does this patch look? It does the check only on bucket
boundaries, and it pushes out the buffered content if either:
* the amount of buffered content is too large, or
* the apr_bucket_read would block.
If this technique looks sound, I can create another patch that
does the same thing for find_end_sequence.
--Brian
Index: mod_include.c
===================================================================
RCS file: /home/cvspublic/httpd-2.0/modules/filters/mod_include.c,v
retrieving revision 1.136
diff -u -r1.136 mod_include.c
--- mod_include.c 2001/08/27 20:25:42 1.136
+++ mod_include.c 2001/08/28 06:16:38
@@ -143,9 +143,10 @@
* first byte of the BEGINNING_SEQUENCE (after finding a complete
match) or it
* returns NULL if no match found.
*/
-static apr_bucket *find_start_sequence(apr_bucket *dptr, include_ctx_t
*ctx,
- apr_bucket_brigade *bb, int
*do_cleanup)
+static apr_bucket *find_start_sequence(apr_bucket *dptr, ap_filter_t *f,
+ apr_bucket_brigade **bb, int
*do_cleanup)
{
+ include_ctx_t *ctx = f->ctx;
apr_size_t len;
const char *c;
const char *buf;
@@ -156,39 +157,43 @@
*do_cleanup = 0;
do {
+ apr_status_t rv;
+
if (APR_BUCKET_IS_EOS(dptr)) {
break;
}
- apr_bucket_read(dptr, &buf, &len, APR_BLOCK_READ);
- /* XXX handle retcodes */
- if (len == 0) { /* end of pipe? */
- break;
+
+ if ((ctx->bytes_parsed >= BYTE_COUNT_THRESHOLD) &&
+ !APR_BRIGADE_EMPTY(*bb)) {
+ apr_bucket_brigade *remainder = apr_brigade_split(*bb, dptr);
+ rv = ap_pass_brigade(f->next, *bb);
+ *bb = remainder;
+ if (rv != APR_SUCCESS) {
+ return NULL;
+ }
}
- c = buf;
- while (c < buf + len) {
- if (ctx->bytes_parsed >= BYTE_COUNT_THRESHOLD) {
- apr_bucket *start_bucket;
- if (ctx->head_start_index > 0) {
- start_index = ctx->head_start_index;
- start_bucket = ctx->head_start_bucket;
+ rv = apr_bucket_read(dptr, &buf, &len, APR_NONBLOCK_READ);
+ if (rv == APR_EAGAIN) {
+ if (!APR_BRIGADE_EMPTY(*bb)) {
+ apr_bucket_brigade *remainder = apr_brigade_split(*bb,
dptr);
+ rv = ap_pass_brigade(f->next, *bb);
+ *bb = remainder;
+ if (rv != APR_SUCCESS) {
+ return NULL;
}
- else {
- start_index = (c - buf);
- start_bucket = dptr;
- }
- apr_bucket_split(start_bucket, start_index);
- tmp_bkt = APR_BUCKET_NEXT(start_bucket);
- if (ctx->head_start_index > 0) {
- ctx->head_start_index = 0;
- ctx->head_start_bucket = tmp_bkt;
- ctx->parse_pos = 0;
- ctx->state = PRE_HEAD;
- }
-
- return tmp_bkt;
}
+ rv = apr_bucket_read(dptr, &buf, &len, APR_BLOCK_READ);
+ }
+ if (rv != APR_SUCCESS) {
+ return NULL;
+ }
+ if (len == 0) { /* end of pipe? */
+ break;
+ }
+ c = buf;
+ while (c < buf + len) {
if (*c == str[ctx->parse_pos]) {
if (ctx->state == PRE_HEAD) {
ctx->state = PARSE_HEAD;
@@ -244,7 +249,7 @@
ctx->bytes_parsed++;
}
dptr = APR_BUCKET_NEXT(dptr);
- } while (dptr != APR_BRIGADE_SENTINEL(bb));
+ } while (dptr != APR_BRIGADE_SENTINEL(*bb));
return NULL;
}
@@ -2363,7 +2368,7 @@
int do_cleanup = 0;
apr_size_t cleanup_bytes = ctx->parse_pos;
- tmp_dptr = find_start_sequence(dptr, ctx, *bb, &do_cleanup);
+ tmp_dptr = find_start_sequence(dptr, f, bb, &do_cleanup);
/* The few bytes stored in the ssi_tag_brigade turned out
not to
* be a tag after all. This can only happen if the starting