Module code review (long). Was Re: RFC: Who owns a passed brigade?

Rici Lake Fri, 22 Apr 2005 20:33:34 -0700

On 22-Apr-05, at 6:19 AM, Nick Kew wrote:

However, I think it is a fallacy that a cleaned-up brigade is not too big to wait for pool cleanup. The brigade itself is about four pointers; and the corresponding pool cleanup is another four pointers, so that's a total of around 32 bytes, not a lot. But suppose that a brigade is created for every 8k of filtered data,
Are there applications like that in the wild?  I'd have thought that
sounds like the kind of case where you definitely want to re-use
one brigade, with a brigade_cleanup after each 8Kb pass_brigade.

Indeed. But the filter does not know, in general, how much data it will be called upon to filter, nor how many times it will be invoked. Being invoked once every 8 kb should be pretty common, though, since that seems to be the recommendation on how often a filter should pass a brigade.

Are there applications like that in the wild? I don't know, but it seems likely that there are. None of the following examples are definitive, but they seem illustrative.

I cite these examples only because they are available; it was not an attempt to criticize anyone's code in particular, but rather to show the result of the lack of clear documentation on how to write an output filter.

In summary, of five modules examined, three of them create new brigades on every invocation without destroying the previous one. One of them fails to empty the passed brigade. One of them destroys the passed brigade. Two of them do one-time initialization on every invocation of the filter.

The only output filter I looked at which seems to conform to "brigades-are-retained, buckets-are-passed" is mod_deflate. This is the loop from mod_deflate.c, which seems to be a reasonable model:

  1   static apr_status_t deflate_out_filter(ap_filter_t *f,
  2                                          apr_bucket_brigade *bb)
  3   {
  // ... declarations
  4       if (APR_BRIGADE_EMPTY(bb)) {
  5           return APR_SUCCESS;
  6       }
  7       if (!ctx) {
              // ... determine whether we should filter the content
  8           /* We're cool with filtering this. */
  9           ctx = f->ctx = apr_pcalloc(r->pool, sizeof(*ctx));
 10           ctx->bb = apr_brigade_create(r->pool, f->c->bucket_alloc);
              // ... more initialization
 11       }
 12       while (!APR_BRIGADE_EMPTY(bb))
 13       {
 14           e = APR_BRIGADE_FIRST(bb);
              // ... check if e is metadata, process data in e
 15           apr_bucket_delete(e);
 16       }
 17       apr_brigade_cleanup(bb);
 18       return APR_SUCCESS;
 19   }

Notes:
lines 4-6
   these only serve to avoid initialization in the event that the filter
   only receives empty brigades. I don't know if this is necessary.
line 7
   all initialization is performed only once, if the ctx structure has
   not yet been created
line 12, 15
   all buckets are removed from the brigade and deleted
line 17
   the call to cleanup is presumably redundant since there is no break
   in the while block. But there could be.

mod_case_filter is supposedly an example of how to write filters, so it is likely that its code will be copied by naive filter writers. Here is its loop, in essence:

1 static apr_status_t CaseFilterOutFilter(ap_filter_t *f, 2 apr_bucket_brigade *pbbIn) 3 { // ... declarations 4 pbbOut=apr_brigade_create(r->pool, c->bucket_alloc); 5 for (pbktIn = APR_BRIGADE_FIRST(pbbIn); 6 pbktIn != APR_BRIGADE_SENTINEL(pbbIn); 7 pbktIn = APR_BUCKET_NEXT(pbktIn)) 8 { // ... check for EOS 8 apr_bucket_read(pbktIn,&data,&len,APR_BLOCK_READ); // ... do the transform 10 pbktOut = apr_bucket_heap_create(buf, len, apr_bucket_free, 11 c->bucket_alloc); 12 APR_BRIGADE_INSERT_TAIL(pbbOut,pbktOut); 13 } 14 return ap_pass_brigade(f->next,pbbOut); 15 }

Notes: line 4: a new brigade is created on every invocation. lines 5-7: the input brigade is iterated but the buckets are not removed line 14: the input brigade is not cleaned, and the output brigade is not destroyed.

OK, here's mod_charset_lite, which is flagged as a learning experience (for the author), but I believe can actually be found in the wild. The filter function itself was a bit much to include here (it does seem to empty its input brigade), but the key is in the following function which is called everytime the filter decides it has output available:

1 static apr_status_t send_downstream(ap_filter_t *f, const char *tmp, 2 apr_size_t len) 3 { // ... declarations 4 bb = apr_brigade_create(r->pool, c->bucket_alloc); 5 b = apr_bucket_transient_create(tmp, len, c->bucket_alloc); 6 APR_BRIGADE_INSERT_TAIL(bb, b); 7 rv = ap_pass_brigade(f->next, bb); 8 if (rv != APR_SUCCESS) { 9 ctx->ees = EES_DOWNSTREAM; 10 } 11 return rv; 12 }

Notes:
line 4:
  a new brigade is created on every output (8 kb)
line 11:
  the brigade is not destroyed

Now, mod_include. This is a complex module, but stripped to its essence with respect to bucket_brigade handling, we have:

1 static apr_status_t send_parsed_content(ap_filter_t *f, apr_bucket_brigade *bb) 2 { // ... lots of stuff snipped 3 apr_bucket *b = APR_BRIGADE_FIRST(bb); 4 apr_bucket_brigade *pass_bb; 5 pass_bb = apr_brigade_create(ctx->pool, f->c->bucket_alloc);

6 /* loop over the current bucket brigade */ 7 while (b != APR_BRIGADE_SENTINEL(bb)) { // ... remove buckets from bb and process them, periodically // ... calling ap_pass_brigade 8 /* if something's left over, pass it along */ 9 if (!APR_BRIGADE_EMPTY(pass_bb)) { 10 rv = ap_pass_brigade(f->next, pass_bb); 11 } 12 else { 13 rv = APR_SUCCESS; 14 apr_brigade_destroy(pass_bb); 15 } 16 return rv; 17 }

Notes
line 4:
  a new brigade is created on every invocation
line 14:
  the brigade is destroyed after being passed

While I was looking at this code, I realized that send_parsed_content is not the actual filter function. The actual filter function, again heavily edited:

1 static apr_status_t includes_filter(ap_filter_t *f, apr_bucket_brigade *b) 2 { 3 if (!(ap_allow_options(r) & OPT_INCLUDES)) { // ... refuse to handle request 4 }

  5       if (!f->ctx) {
              // ... initialize context
  6       }

7 if ((parent = ap_get_module_config(r->request_config, &include_module))) { 8 r->subprocess_env = r->main->subprocess_env; 9 apr_pool_join(r->main->pool, r->pool); 10 r->finfo.mtime = r->main->finfo.mtime; 11 } 12 else { 13 /* we're not a nested include, so we create an initial 14 * environment */ 15 ap_add_common_vars(r); 16 ap_add_cgi_vars(r); 17 add_include_vars(r, conf->default_time_fmt); 18 } // ... more stuff having to do with Last-Modified and Content-Length // ... and creating QUERY_STRING from r->args 19 return send_parsed_content(f, b); 20 }

Notes lines 7-18: these run on every invocation, including the calls to ap_add_common_vars and friends at line 15-17. I spent some time fruitlessly searching for the place where it avoids that on the second invocation, and finally tested it by applying it to a self-proxy:

<Location /inc>
    Allow from all
    ProxyPass http://freeb:8089
    ProxyPassReverse http://freeb:8089
    SetOutputFilter INCLUDES
</Location>

using the file:

<html><head><title>test</title></head> <body> <pre>  </pre> <p> ........................................................................ ....... ------ last line repeated 2048 times -----  <pre>  </pre> </body></html>

As might be expected, the first printenv shows the actual SCRIPT_NAME, whereas the second one shows /wouldn't/you/like/to/know. Moving the <!--#set...> to just after the <body> reverses this: the first printenv shows the occulted SCRIPT_NAME, but since ap_add_cgi_vars has run again by the time the second printenv is encountered, the original value is shown by the second printenv.


Finally, mod_layout2, a third-party module actually in use:

1 static apr_status_t layout_filter(ap_filter_t *f, apr_bucket_brigade *b) { 2 if (r->main) { 3 return ap_pass_brigade(f->next, b); 4 } 5 ap_table_setn(r->headers_out, "X-Powered-By", "ModLayout/"VERSION); 6 cfg = ap_get_module_config(r->per_dir_config, &layout_module); 7 if (cfg->uris_ignore) { 8 if (table_find(cfg->uris_ignore, r->uri)) { 9 return ap_pass_brigade(f->next, b); 10 } 11 } 12 info = create_layout_request(r, cfg); 13 if (ctx == NULL) { 14 f->ctx = ctx = apr_pcalloc(f->r->pool, sizeof(*ctx)); 15 ctx->b = apr_brigade_create(f->r->pool, r->connection->bucket_alloc); 16 ctx->output = NULL; 17 }

 18       apr_table_unset(f->r->headers_out, "Content-Length");
 19       apr_table_unset(f->r->headers_out, "ETag");

 20       APR_BRIGADE_FOREACH(e, b) {
 21           if (APR_BUCKET_IS_EOS(e) || APR_BUCKET_IS_FLUSH  (e)) {
 22               info->f = f->next;
 23               info->b = ctx->b;
                  // ... snip: actually process accumulated input
 24               APR_BUCKET_REMOVE(e);
 25               APR_BRIGADE_INSERT_TAIL(ctx->b, e);
 26               return ap_pass_brigade(f->next, ctx->b);
 27           }
 28           if (APR_BUCKET_IS_FLUSH(e)) {
 29               continue;
 30           }
 31           apr_bucket_read(e, &str, &len, APR_NONBLOCK_READ);

32 if (ctx->output) { 33 ctx->output = apr_psprintf(r->pool,"%s%.*s", ctx->output, len, str); 34 } else { 35 ctx->output = apr_pstrndup(r->pool, str, len); 36 } 37 }

 38       apr_brigade_destroy(b);
 39       return APR_SUCCESS;
 40   }

Notes lines 5-12: like mod_include, this module performs initialization on every invocation. line 15: however, a single brigade is created for passing to the next filter line 26: although the module destroys its received brigade (see below), it seems to assume that the next filter will not destroy the brigade it is passing lines 32-36: incoming data is simply buffered in memory until an EOS or FLUSH. The latter seems to mean that a tag will not be recognized if it is interrupted by a flush bucket, but I haven't tested that. line 38: the input buckets have not yet been deleted, but apr_brigade_destroy will do that. It will also destroy the brigade, though.

Module code review (long). Was Re: RFC: Who owns a passed brigade?

Reply via email to