We have run across two problems in mod_include that were corrupting our pages and would like to share our patches.
Both of these situations are highlighted by situations where bucketeering and brigading are accentuated (mostly through the use of mod_proxy to a remote host, but should be testable with simple includes of appropriately sized (8K+) files). The first problem was appearing as a few (important) chars being truncated right before the occurrence of a virtual include. The situation was heightened by heavy use of mod_proxy to retrieve these files and the smaller brigade size that employs (about 1.5K). After snooping the results of the proxy request, we verified that the returned file was looking just fine with various calls to <!--#include virtual... --> ... but after processing by mod_include, some of the (valid) includes where failing. In particular, we tracked down a case where the start of a comment at the very end of the proceeding bucket would cause a miscount of the start of the location of the include in the current bucket. As an example, consider a bucket that ends as follows: ... <! and the new bucket starts as -- some comment --><img src="a.gif"><!--#include ... The problem we were seeing in the mod_include parsed page was that the two chars proceeding this include were being eaten. So, we would see the following string: ... <!-- some comment --><img src="a.gif<!-- missing include --> As you might imagine, the missing end quote was troublesome. :) (Not to mention the bad include...) At line 438 of mod_include, we see the code is set up to handle this 'left over partial match' and it increments the temporary buffer pointer, c, by 2 as it walks past the '--' at the start of this bucket. So it (as far as I could determine) adds the left over '<!' to this bucket and continues on, calling the quick scanner function, bndm(), passing in the recently updated parameter 'c'. The result of this is 'pos', an offset into the bucket supposedly representing the start of the include... but it's off by 2! Later, it proceeds to 'split' the bucket at what it thinks is the start of the include (but is actually the '">' from the proceeding tag) and fails to figure out what include command is being requested, hence the 'missing' comment. This patch which doesn't really fix the problem but corrects the resulting problem... we hope someone with a better understanding of mod_include and buckets can come up with a true solution. ====== The second problem actually occurs across brigades (we learned an awful lot about buckets and brigades in diagnosing these problems! :D) And, instead of removing characters was instead adding large extra chunks. This problem revolves around the use of 'if' 'else' and 'elseif'. Originally, we thought the problem was related to <!--#else --> but later we found an troublesome <!--#if ... --> that lead to us finding the real culprit. As an example, consider the following example: ... <!--#if ... -->Some if text ... end if text <!--#endif --> The problem reveals itself if the 'Some if text ...' is in one brigade and the 'end if text' is in the next *and* the if's condition is false (hence the if's conditional body should *not* be present after mod_include works it's magic). What we found is that we were getting 'Some if text ...' out of mod_include... but not the 'end if text' when the condition was false. :( We finally tracked the problem down because of the comments on line 3200 of mod_include.c: /* Inside a false conditional (if, elif, else), so toss it all... */ if ((dptr != APR_BRIGADE_SENTINEL(*bb)) && (!(ctx->flags & FLAG_PRINTING))) { apr_bucket *free_bucket; do { free_bucket = dptr; dptr = APR_BUCKET_NEXT (dptr); apr_bucket_delete(free_bucket); } while (dptr != APR_BRIGADE_SENTINEL(*bb)); } Rightly so, this loop makes sure it doesn't throw out the brigade's sentinel. Surely this would be a 'bad thing'. However, earlier in the code (3015 ... 3044), we have this: if (tmp_dptr != NULL) { ... } else { /* remainder of this brigade... */ dptr = APR_BRIGADE_SENTINEL(*bb); } And tmp_dptr, is set via line 2937: tmp_dptr = find_start_sequence(dptr, ctx, *bb, &do_cleanup); But since there's no start sequences in the if's conditional text, tmp_dptr is set to NULL (left as an exercise for the reader :). So the code would get to line 3200... and dptr would *always* be pointing at the sentinel. That just clearly seems wrong. The do loop would never be executed and the <!--#if-->'s body would never be thrown away. Exactly what we were seeing. :( Once again, we don't claim the patch really fixes the problem; it just tries to correct the situation we were seeing in the most minimal way. In fact, we kept (what we think is) the 'bad' code in case there's some strange scenario where the code really is good. Attached is a single patch to fix both problems.
mod_include.patch.formal
Description: Binary data