[PATCH] Re: chunking of content in mod_include?

Brian Pane Mon, 27 Aug 2001 23:03:43 -0700
Ryan Bloom wrote:

>On Monday 27 August 2001 19:25, Paul J. Reder wrote:
>
>>Ryan Bloom wrote:
>>
>>>On Monday 27 August 2001 16:05, Brian Pane wrote:
>>>
>>>>In mod_include's find_start_sequence function, there's some code that
>>>>splits the current bucket if "ctx->bytes_parsed >=
>>>>BYTE_COUNT_THRESHOLD."
>>>>
>>>>Can somebody explain the rationale for this?  It seems redundant to be
>>>>splitting the data into smaller chunks in a content filter; I'd expect
>>>>mod_include to defer network block sizing to downstream filters.  In
>>>>the profile data I'm looking at currently, this check accounts for 35%
>>>>of the total run time of find_start_sequence, so there's some
>>>>performance to be gained if the "ctx->bytes_parsed >=
>>>>BYTE_COUNT_THRESHOLD" check can be eliminated.
>>>>
>>>It is used to ensure that we don't buffer all the data in mod_include. 
>>>It isn't really done correctly though, because what we should be doing,
>>>is just continuing to read as much data as possible, but as soon as we
>>>can't read something, send what we have down the filter stack.
>>>
>>>This variable, basically ensures we don't keep reading all the data until
>>>we process the whole file, or reach the first tag.
>>>
>>In what manner do you mean "as soon as we can't read something"? It is my
>>understanding that the bucket code hides reading delays from the
>>mod_include code. If that is true how would the mod_include code know when
>>
>
>The bucket does not hide reading delays at all.  Basically, you call apr_bucket_read
>with APR_NONBLOCK_READ, and when you get a return code of APR_EAGAIN,
>you send what you have, and then call apr_bucket_read with APR_BLOCK_READ.
>
>>to send a chunk along? Are you saying the bucket code should do some magic
>>like send all buckets in the brigade up to the current one? This would
>>wreak havoc on code like mod_include that may be setting aside or tagging
>>buckets for replacement when the end of the tag is found.
>>
>
>Huh?  The bucket code doesn't ever send data down the filter stack unless you
>tell it to.  Take a look at the content-length filter to see what I mean.
>
>>This code was put in because we were seeing the mod_include code buffer up
>>the entire collection of buckets until an SSI tag was found. If you have a
>>200 MB file with an SSI tag footer at the end of the brigade, the whole
>>thing was being buffered. How do you propose that this be done differently?
>>
>
>I don't care if mod_include buffers 200 Megs, as long as it is constantly doing
>something with the data.  If we have a 200 Meg file that has no SSI tags in
>it, but we can get all 200 Meg at one time, then we shouldn't have any problem
>just scanning through the entire 200 Megs very quickly.  Worst case, we do what
>Brian suggested, and just check the bucket length once we have finished
>processing all of the data in that bucket.  The buffering only becomes a
>real problem when we sit waiting for data from a CGI or some other slow
>content generator.
>

How does this patch look?  It does the check only on bucket
boundaries, and it pushes out the buffered content if either:
  * the amount of buffered content is too large, or
  * the apr_bucket_read would block.

If this technique looks sound, I can create another patch that
does the same thing for find_end_sequence.

--Brian


Index: mod_include.c
===================================================================
RCS file: /home/cvspublic/httpd-2.0/modules/filters/mod_include.c,v
retrieving revision 1.136
diff -u -r1.136 mod_include.c
--- mod_include.c    2001/08/27 20:25:42    1.136
+++ mod_include.c    2001/08/28 06:16:38
@@ -143,9 +143,10 @@
  * first byte of the BEGINNING_SEQUENCE (after finding a complete 
match) or it
  * returns NULL if no match found.
  */
-static apr_bucket *find_start_sequence(apr_bucket *dptr, include_ctx_t 
*ctx,
-                                      apr_bucket_brigade *bb, int 
*do_cleanup)
+static apr_bucket *find_start_sequence(apr_bucket *dptr, ap_filter_t *f,
+                                      apr_bucket_brigade **bb, int 
*do_cleanup)
 {
+    include_ctx_t *ctx = f->ctx;
     apr_size_t len;
     const char *c;
     const char *buf;
@@ -156,39 +157,43 @@
     *do_cleanup = 0;
 
     do {
+        apr_status_t rv;
+
         if (APR_BUCKET_IS_EOS(dptr)) {
             break;
         }
-        apr_bucket_read(dptr, &buf, &len, APR_BLOCK_READ);
-        /* XXX handle retcodes */
-        if (len == 0) { /* end of pipe? */
-            break;
+
+        if ((ctx->bytes_parsed >= BYTE_COUNT_THRESHOLD) &&
+            !APR_BRIGADE_EMPTY(*bb)) {
+            apr_bucket_brigade *remainder = apr_brigade_split(*bb, dptr);
+            rv = ap_pass_brigade(f->next, *bb);
+            *bb = remainder;
+            if (rv != APR_SUCCESS) {
+                return NULL;
+            }
         }
-        c = buf;
-        while (c < buf + len) {
-            if (ctx->bytes_parsed >= BYTE_COUNT_THRESHOLD) {
-                apr_bucket *start_bucket;
 
-                if (ctx->head_start_index > 0) {
-                    start_index  = ctx->head_start_index;
-                    start_bucket = ctx->head_start_bucket;
+        rv = apr_bucket_read(dptr, &buf, &len, APR_NONBLOCK_READ);
+        if (rv == APR_EAGAIN) {
+            if (!APR_BRIGADE_EMPTY(*bb)) {
+                apr_bucket_brigade *remainder = apr_brigade_split(*bb, 
dptr);
+                rv = ap_pass_brigade(f->next, *bb);
+                *bb = remainder;
+                if (rv != APR_SUCCESS) {
+                    return NULL;
                 }
-                else {
-                    start_index  = (c - buf);
-                    start_bucket = dptr;
-                }
-                apr_bucket_split(start_bucket, start_index);
-                tmp_bkt = APR_BUCKET_NEXT(start_bucket);
-                if (ctx->head_start_index > 0) {
-                    ctx->head_start_index  = 0;
-                    ctx->head_start_bucket = tmp_bkt;
-                    ctx->parse_pos = 0;
-                    ctx->state = PRE_HEAD;
-                }
-
-                return tmp_bkt;
             }
+            rv = apr_bucket_read(dptr, &buf, &len, APR_BLOCK_READ);
+        }
+        if (rv != APR_SUCCESS) {
+            return NULL;
+        }
 
+        if (len == 0) { /* end of pipe? */
+            break;
+        }
+        c = buf;
+        while (c < buf + len) {
             if (*c == str[ctx->parse_pos]) {
                 if (ctx->state == PRE_HEAD) {
                     ctx->state             = PARSE_HEAD;
@@ -244,7 +249,7 @@
             ctx->bytes_parsed++;
         }
         dptr = APR_BUCKET_NEXT(dptr);
-    } while (dptr != APR_BRIGADE_SENTINEL(bb));
+    } while (dptr != APR_BRIGADE_SENTINEL(*bb));
     return NULL;
 }
 
@@ -2363,7 +2368,7 @@
             int do_cleanup = 0;
             apr_size_t cleanup_bytes = ctx->parse_pos;
 
-            tmp_dptr = find_start_sequence(dptr, ctx, *bb, &do_cleanup);
+            tmp_dptr = find_start_sequence(dptr, f, bb, &do_cleanup);
 
             /* The few bytes stored in the ssi_tag_brigade turned out 
not to
              * be a tag after all. This can only happen if the starting
[PATCH] Re: chunking of content in mod_include?

Reply via email to