Re: corrupt brigades when using mod_cache in reverse proxy

2011-11-18 Thread Graham Leggett

On 18 Nov 2011, at 9:50 PM, f_los_ch wrote:


When enabling mod_cache in my reverse-proxy scenario, the first file
with a filesize above some threshold gets delivered corrupted,
subsequent requests served from cache are fine.


Can you confirm if the following patch makes any difference for you? I  
suspect the disk cache is finding a way to leave without passing all  
the saved buckets to out first.


Index: modules/cache/mod_cache_disk.c
===
--- modules/cache/mod_cache_disk.c  (revision 1203646)
+++ modules/cache/mod_cache_disk.c  (working copy)
@@ -1098,7 +1098,7 @@
 /* are we done completely? if so, pass any trailing buckets  
right through */

 if (dobj->done || !dobj->data.pool) {
 APR_BUCKET_REMOVE(e);
-APR_BRIGADE_INSERT_TAIL(out, e);
+APR_BRIGADE_INSERT_TAIL(dobj->bb, e);
 continue;
 }

@@ -1107,16 +1107,14 @@
 seen_eos = 1;
 dobj->done = 1;
 APR_BUCKET_REMOVE(e);
-APR_BRIGADE_CONCAT(out, dobj->bb);
-APR_BRIGADE_INSERT_TAIL(out, e);
+APR_BRIGADE_INSERT_TAIL(dobj->bb, e);
 break;
 }

 /* honour flush buckets, we'll get called again */
 if (APR_BUCKET_IS_FLUSH(e)) {
 APR_BUCKET_REMOVE(e);
-APR_BRIGADE_CONCAT(out, dobj->bb);
-APR_BRIGADE_INSERT_TAIL(out, e);
+APR_BRIGADE_INSERT_TAIL(dobj->bb, e);
 break;
 }

@@ -1203,17 +1201,18 @@
 dobj->offset -= length;
 if (dobj->offset <= 0) {
 dobj->offset = 0;
-APR_BRIGADE_CONCAT(out, dobj->bb);
 break;
 }
 if ((dconf->readtime && apr_time_now() > dobj->timeout)) {
 dobj->timeout = 0;
-APR_BRIGADE_CONCAT(out, dobj->bb);
 break;
 }

 }

+/* make sure we sweep all saved buckets to the out brigade */
+APR_BRIGADE_CONCAT(out, dobj->bb);
+
 /* Was this the final bucket? If yes, close the temp file and  
perform

  * sanity checks.
  */

Regards,
Graham
--



Re: corrupt brigades when using mod_cache in reverse proxy

2011-11-19 Thread f_los_ch
Hello,

Thanks for the reply and your patch - it worked!

I could not longer reproduce diffs for cached/uncached files. The log
with dumped buckets according to my previous patch is again available
at: http://paste2.org/p/1785342

Now, when the location is spotted, maybe I can also try to isolate the
bug even more via further debugging for developing some production-use
patch. In the hope that really the bug was in there and the continuous
flushing did not masquerade the underlying issue..?

But that it is working for now is a real relief.

Thanks again & kind regards:
Florian


Am Samstag, den 19.11.2011, 03:09 +0200 schrieb Graham Leggett: 
> On 18 Nov 2011, at 9:50 PM, f_los_ch wrote:
> 
> > When enabling mod_cache in my reverse-proxy scenario, the first file
> > with a filesize above some threshold gets delivered corrupted,
> > subsequent requests served from cache are fine.
> 
> Can you confirm if the following patch makes any difference for you? I  
> suspect the disk cache is finding a way to leave without passing all  
> the saved buckets to out first.
> 
> Index: modules/cache/mod_cache_disk.c
> ===
> --- modules/cache/mod_cache_disk.c(revision 1203646)
> +++ modules/cache/mod_cache_disk.c(working copy)
> @@ -1098,7 +1098,7 @@
>   /* are we done completely? if so, pass any trailing buckets  
> right through */
>   if (dobj->done || !dobj->data.pool) {
>   APR_BUCKET_REMOVE(e);
> -APR_BRIGADE_INSERT_TAIL(out, e);
> +APR_BRIGADE_INSERT_TAIL(dobj->bb, e);
>   continue;
>   }
> 
> @@ -1107,16 +1107,14 @@
>   seen_eos = 1;
>   dobj->done = 1;
>   APR_BUCKET_REMOVE(e);
> -APR_BRIGADE_CONCAT(out, dobj->bb);
> -APR_BRIGADE_INSERT_TAIL(out, e);
> +APR_BRIGADE_INSERT_TAIL(dobj->bb, e);
>   break;
>   }
> 
>   /* honour flush buckets, we'll get called again */
>   if (APR_BUCKET_IS_FLUSH(e)) {
>   APR_BUCKET_REMOVE(e);
> -APR_BRIGADE_CONCAT(out, dobj->bb);
> -APR_BRIGADE_INSERT_TAIL(out, e);
> +APR_BRIGADE_INSERT_TAIL(dobj->bb, e);
>   break;
>   }
> 
> @@ -1203,17 +1201,18 @@
>   dobj->offset -= length;
>   if (dobj->offset <= 0) {
>   dobj->offset = 0;
> -APR_BRIGADE_CONCAT(out, dobj->bb);
>   break;
>   }
>   if ((dconf->readtime && apr_time_now() > dobj->timeout)) {
>   dobj->timeout = 0;
> -APR_BRIGADE_CONCAT(out, dobj->bb);
>   break;
>   }
> 
>   }
> 
> +/* make sure we sweep all saved buckets to the out brigade */
> +APR_BRIGADE_CONCAT(out, dobj->bb);
> +
>   /* Was this the final bucket? If yes, close the temp file and  
> perform
>* sanity checks.
>*/
> 
> Regards,
> Graham
> --
> 




Re: corrupt brigades when using mod_cache in reverse proxy

2011-11-19 Thread Graham Leggett

On 19 Nov 2011, at 3:56 PM, f_los_ch wrote:


Thanks for the reply and your patch - it worked!

I could not longer reproduce diffs for cached/uncached files. The log
with dumped buckets according to my previous patch is again available
at: http://paste2.org/p/1785342

Now, when the location is spotted, maybe I can also try to isolate the
bug even more via further debugging for developing some production-use
patch. In the hope that really the bug was in there and the continuous
flushing did not masquerade the underlying issue..?

But that it is working for now is a real relief.


Thanks for confirming it works. Would it be possible to try this  
alternative patch? Looking at the brigade inside h, it doesn't seem to  
be doing anything useful in this case, and was a hangup from older  
code. This new patch takes the intermediate brigade out completely,  
and should in theory use less memory and be slightly faster: What I  
need to do is find out why we haven't discovered this before.


Index: modules/cache/mod_cache_disk.c
===
--- modules/cache/mod_cache_disk.c  (revision 1203646)
+++ modules/cache/mod_cache_disk.c  (working copy)
@@ -1075,9 +1075,6 @@
 disk_cache_dir_conf *dconf = ap_get_module_config(r- 
>per_dir_config, &cache_disk_module);

 int seen_eos = 0;

-if (!dobj->bb) {
-dobj->bb = apr_brigade_create(r->pool, r->connection- 
>bucket_alloc);

-}
 if (!dobj->offset) {
 dobj->offset = dconf->readsize;
 }
@@ -1107,7 +1104,6 @@
 seen_eos = 1;
 dobj->done = 1;
 APR_BUCKET_REMOVE(e);
-APR_BRIGADE_CONCAT(out, dobj->bb);
 APR_BRIGADE_INSERT_TAIL(out, e);
 break;
 }
@@ -1115,7 +,6 @@
 /* honour flush buckets, we'll get called again */
 if (APR_BUCKET_IS_FLUSH(e)) {
 APR_BUCKET_REMOVE(e);
-APR_BRIGADE_CONCAT(out, dobj->bb);
 APR_BRIGADE_INSERT_TAIL(out, e);
 break;
 }
@@ -1123,21 +1118,20 @@
 /* metadata buckets are preserved as is */
 if (APR_BUCKET_IS_METADATA(e)) {
 APR_BUCKET_REMOVE(e);
-APR_BRIGADE_INSERT_TAIL(dobj->bb, e);
+APR_BRIGADE_INSERT_TAIL(out, e);
 continue;
 }

 /* read the bucket, write to the cache */
 rv = apr_bucket_read(e, &str, &length, APR_BLOCK_READ);
 APR_BUCKET_REMOVE(e);
-APR_BRIGADE_INSERT_TAIL(dobj->bb, e);
+APR_BRIGADE_INSERT_TAIL(out, e);
 if (rv != APR_SUCCESS) {
 ap_log_rerror(APLOG_MARK, APLOG_ERR, 0, r,
 "cache_disk: Error when reading bucket for URL  
%s",

 h->cache_obj->key);
 /* Remove the intermediate cache file and return non- 
APR_SUCCESS */

 apr_pool_destroy(dobj->data.pool);
-APR_BRIGADE_CONCAT(out, dobj->bb);
 return rv;
 }

@@ -1156,7 +1150,6 @@
  APR_BUFFERED | APR_EXCL, dobj- 
>data.pool);

 if (rv != APR_SUCCESS) {
 apr_pool_destroy(dobj->data.pool);
-APR_BRIGADE_CONCAT(out, dobj->bb);
 return rv;
 }
 dobj->file_size = 0;
@@ -1164,7 +1157,6 @@
 dobj->data.tempfd);
 if (rv != APR_SUCCESS) {
 apr_pool_destroy(dobj->data.pool);
-APR_BRIGADE_CONCAT(out, dobj->bb);
 return rv;
 }
 dobj->disk_info.device = finfo.device;
@@ -1180,7 +1172,6 @@
 h->cache_obj->key);
 /* Remove the intermediate cache file and return non- 
APR_SUCCESS */

 apr_pool_destroy(dobj->data.pool);
-APR_BRIGADE_CONCAT(out, dobj->bb);
 return rv;
 }
 dobj->file_size += written;
@@ -1191,7 +1182,6 @@
 h->cache_obj->key, dobj->file_size, dconf->maxfs);
 /* Remove the intermediate cache file and return non- 
APR_SUCCESS */

 apr_pool_destroy(dobj->data.pool);
-APR_BRIGADE_CONCAT(out, dobj->bb);
 return APR_EGENERAL;
 }

@@ -1203,12 +1193,10 @@
 dobj->offset -= length;
 if (dobj->offset <= 0) {
 dobj->offset = 0;
-APR_BRIGADE_CONCAT(out, dobj->bb);
 break;
 }
 if ((dconf->readtime && apr_time_now() > dobj->timeout)) {
 dobj->timeout = 0;
-APR_BRIGADE_CONCAT(out, dobj->bb);
 break;
 }

Index: modules/cache/mod_cache_disk.h
===
--- modules/cache/mod_cache_disk.h  (revision 1203646)
+++ modules/cache/mod_cache_disk.h  (working copy)
@@ -49,7 +49,6 @@
 const char *key; /* On-disk prefix; URI with Vary  
bits (if present) */
 apr_off_t file_siz

Re: corrupt brigades when using mod_cache in reverse proxy

2011-11-19 Thread Florian S.
Yes, I can confirm that this patch works for me, too.

And now I understand your point: Buffering/keeping back the output via
an additional brigade until a flush occurs isn't actually needed for the
cache-filter as it could simply be passed along immediately. Seems
pretty reasonable.

I'll use it in (semi-)production for the next days and keep an eye on it
(but probably not needed).

Thanks again for your effort & Kind regards:
Florian


Am Samstag, den 19.11.2011, 16:59 +0200 schrieb Graham Leggett:
> On 19 Nov 2011, at 3:56 PM, f_los_ch wrote:
> 
> > Thanks for the reply and your patch - it worked!
> >
> > I could not longer reproduce diffs for cached/uncached files. The log
> > with dumped buckets according to my previous patch is again available
> > at: http://paste2.org/p/1785342
> >
> > Now, when the location is spotted, maybe I can also try to isolate the
> > bug even more via further debugging for developing some production-use
> > patch. In the hope that really the bug was in there and the continuous
> > flushing did not masquerade the underlying issue..?
> >
> > But that it is working for now is a real relief.
> 
> Thanks for confirming it works. Would it be possible to try this  
> alternative patch? Looking at the brigade inside h, it doesn't seem to  
> be doing anything useful in this case, and was a hangup from older  
> code. This new patch takes the intermediate brigade out completely,  
> and should in theory use less memory and be slightly faster: What I  
> need to do is find out why we haven't discovered this before.
> 
> Index: modules/cache/mod_cache_disk.c
> ===
> --- modules/cache/mod_cache_disk.c(revision 1203646)
> +++ modules/cache/mod_cache_disk.c(working copy)
> @@ -1075,9 +1075,6 @@
>   disk_cache_dir_conf *dconf = ap_get_module_config(r- 
>  >per_dir_config, &cache_disk_module);
>   int seen_eos = 0;
> 
> -if (!dobj->bb) {
> -dobj->bb = apr_brigade_create(r->pool, r->connection- 
>  >bucket_alloc);
> -}
>   if (!dobj->offset) {
>   dobj->offset = dconf->readsize;
>   }
> @@ -1107,7 +1104,6 @@
>   seen_eos = 1;
>   dobj->done = 1;
>   APR_BUCKET_REMOVE(e);
> -APR_BRIGADE_CONCAT(out, dobj->bb);
>   APR_BRIGADE_INSERT_TAIL(out, e);
>   break;
>   }
> @@ -1115,7 +,6 @@
>   /* honour flush buckets, we'll get called again */
>   if (APR_BUCKET_IS_FLUSH(e)) {
>   APR_BUCKET_REMOVE(e);
> -APR_BRIGADE_CONCAT(out, dobj->bb);
>   APR_BRIGADE_INSERT_TAIL(out, e);
>   break;
>   }
> @@ -1123,21 +1118,20 @@
>   /* metadata buckets are preserved as is */
>   if (APR_BUCKET_IS_METADATA(e)) {
>   APR_BUCKET_REMOVE(e);
> -APR_BRIGADE_INSERT_TAIL(dobj->bb, e);
> +APR_BRIGADE_INSERT_TAIL(out, e);
>   continue;
>   }
> 
>   /* read the bucket, write to the cache */
>   rv = apr_bucket_read(e, &str, &length, APR_BLOCK_READ);
>   APR_BUCKET_REMOVE(e);
> -APR_BRIGADE_INSERT_TAIL(dobj->bb, e);
> +APR_BRIGADE_INSERT_TAIL(out, e);
>   if (rv != APR_SUCCESS) {
>   ap_log_rerror(APLOG_MARK, APLOG_ERR, 0, r,
>   "cache_disk: Error when reading bucket for URL  
> %s",
>   h->cache_obj->key);
>   /* Remove the intermediate cache file and return non- 
> APR_SUCCESS */
>   apr_pool_destroy(dobj->data.pool);
> -APR_BRIGADE_CONCAT(out, dobj->bb);
>   return rv;
>   }
> 
> @@ -1156,7 +1150,6 @@
>APR_BUFFERED | APR_EXCL, dobj- 
>  >data.pool);
>   if (rv != APR_SUCCESS) {
>   apr_pool_destroy(dobj->data.pool);
> -APR_BRIGADE_CONCAT(out, dobj->bb);
>   return rv;
>   }
>   dobj->file_size = 0;
> @@ -1164,7 +1157,6 @@
>   dobj->data.tempfd);
>   if (rv != APR_SUCCESS) {
>   apr_pool_destroy(dobj->data.pool);
> -APR_BRIGADE_CONCAT(out, dobj->bb);
>   return rv;
>   }
>   dobj->disk_info.device = finfo.device;
> @@ -1180,7 +1172,6 @@
>   h->cache_obj->key);
>   /* Remove the intermediate cache file and return non- 
> APR_SUCCESS */
>   apr_pool_destroy(dobj->data.pool);
> -APR_BRIGADE_CONCAT(out, dobj->bb);
>   return rv;
>   }
>   dobj->file_size += written;
> @@ -1191,7 +1182,6 @@
>   h->cache_obj->key, dobj->file_size, dconf->maxfs);
>   /* Remove the intermediate cache file and return non- 
> APR_SUCCESS */
>   apr_pool_destroy(dobj->data.pool);
> -APR_BRIGADE_CONCAT(o