Re: mod_cache: Broken Expires from back end and CacheStoreExpired

2018-06-19 Thread Jim Jagielski
I can't fault the logic... +1 for the patch.

> On Jun 19, 2018, at 6:47 AM, Rainer Jung  wrote:
> 
> I have a situation where I have a caching Apache in front of a back end. The 
> backend sends a response header "Expires: -1" and mod_cache unconditionally 
> refuses to cache the response with the error "Broken expires header".
> 
> RFC 7234 section 5.3 [1] contains the text:
> 
> ===
> ...
> The Expires value is an HTTP-date timestamp, as defined in Section 7.1.1.1 of 
> [RFC7231].
> 
> Expires = HTTP-date
> 
> For example
> 
> Expires: Thu, 01 Dec 1994 16:00:00 GMT
> 
> A cache recipient MUST interpret invalid date formats, especially the value 
> "0", as representing a time in the past (i.e., "already expired").
> ...
> ===
> 
> Furthermore:
> 
> ===
> ...
> If a response includes a Cache-Control field with the max-age directive 
> (Section 5.2.2.8), a recipient MUST ignore the Expires field.  Likewise, if a 
> response includes the s-maxage directive (Section 5.2.2.9), a shared cache 
> recipient MUST ignore the Expires field.  In both these cases, the value in 
> Expires is only intended for recipients that have not yet implemented the 
> Cache-Control field.
> ...
> ===
> 
> I would like to make the following case a) behave like case b):
> 
> a) expires header contains no valid date format
> 
> b) expires header contains a valid date format, but that date is in the past
> 
> This is currently not the case. Case a) never caches, case b) does not cache 
> by default, but caching can be forced by CacheStoreExpired in the 
> configuration. Also, max-age and s-maxage take precedence over expires in 
> case b).
> 
> Code archeology of mod_cache shows:
> 
> 1) Originally, content whose expires was an HTTP-date in the past was 
> cached (case b) was not handled in an RFC-compliant way). Content with an 
> invalid expires was never cached, from the beginning (that's case a)).
> 
> 2) r450055 (minfrin) added a check to refuse caching in case expires was an 
> HTTP-date in the past (case b))
> 
> 3) r1000106 (wrowe) added a config option to allow caching stale content but 
> applied this option only to case b), not to case a), although the RFC says "A 
> cache recipient MUST interpret invalid date formats, ..., as representing a 
> time in the past (i.e., "already expired").".
> 
> 4) r1003882 (minfrin) added more options to control more caching behavior by 
> configuration, for example in case b).
> 
> 5) r1726675 (covener) added max-age and s-maxage checks to case b).
> 
> I guess that case a) simply wasn't on the radar when 3), 4) and 5) were added.
> 
> I propose the following patch for trunk and 2.4. Of course in the above case 
> a), caching behavior will not be completely the same (additional caching in 
> case of a broken Expires header if the config uses CacheStoreExpired or either 
> max-age or s-maxage was sent). The condition and comment additions were copied 
> from the case b) "if" directly below the changed block:
> 
> Index: modules/cache/mod_cache.c
> ===
> --- modules/cache/mod_cache.c   (revision 1833803)
> +++ modules/cache/mod_cache.c   (working copy)
> @@ -1040,8 +1040,11 @@
> if (reason) {
> /* noop */
> }
> -else if (exps != NULL && exp == APR_DATE_BAD) {
> -/* if a broken Expires header is present, don't cache it */
> +else if (!control.s_maxage && !control.max_age && !dconf->store_expired
> + && exps != NULL && exp == APR_DATE_BAD) {
> +/* if a broken Expires header is present, don't cache it
> + * Unless CC: s-maxage or max-age is present
> + */
> reason = apr_pstrcat(p, "Broken expires header: ", exps, NULL);
> }
> else if (!control.s_maxage && !control.max_age
> 
> 
> Thanks for any feedback!
> 
> Regards,
> 
> Rainer
> 
> [1] https://tools.ietf.org/html/rfc7234#section-5.3
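
The condition in the proposed patch can be read as a small decision function. The following is a minimal sketch, not the mod_cache code: the struct and function names are hypothetical, and only the boolean logic is taken from the patch above.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the inputs the patched condition looks at. */
struct expires_inputs {
    bool has_expires;      /* an Expires header was sent (exps != NULL) */
    bool expires_is_valid; /* it parsed as an HTTP-date (exp != APR_DATE_BAD) */
    bool has_max_age;      /* Cache-Control: max-age present */
    bool has_s_maxage;     /* Cache-Control: s-maxage present */
    bool store_expired;    /* CacheStoreExpired is enabled */
};

/* Returns true when a broken Expires header alone should block caching,
 * mirroring the patched "else if" condition. */
static bool broken_expires_blocks_caching(const struct expires_inputs *in)
{
    return !in->has_s_maxage && !in->has_max_age && !in->store_expired
           && in->has_expires && !in->expires_is_valid;
}
```

With this logic, "Expires: -1" only prevents caching when neither max-age nor s-maxage was sent and CacheStoreExpired is off, which is the same set of exceptions that case b) already has.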



Re: mod_cache forgets to include cached headers when serving cached content under certain circumstances (2.2.x)

2016-05-23 Thread Eric Covener
On Mon, May 23, 2016 at 6:24 AM, Rob Landrito  wrote:
> The specific circumstance is described in a comment in mod_cache:
>
> /* Hold the phone. Some servers might allow us to cache a 2xx, but
>  * then make their 304 responses non cacheable. This leaves us in a
>  * sticky position. If the 304 is in answer to our own conditional
>  * request, we cannot send this 304 back to the client because the
>  * client isn't expecting it. Instead, our only option is to respect
>  * the answer to the question we asked (has it changed, answer was
>  * no) and return the cached item to the client, and then respect
>  * the uncacheable nature of this 304 by allowing the remove_url
>  * filter to kick in and remove the cached entity.
>  */
>
> The code proceeds to push the cached content onto the bucket brigade but
> neglects to apply the matching cached headers.
>
> Originally I thought the solution would be to simply add the cached
> headers but in the 2.4.x branch, it seems that this case is handled by
> re-requesting the resource without the conditional.  Should this code be
> backported into the 2.2.x branch?

Should/could yes. Not a ton of developer capacity on 2.2.x, though.


-- 
Eric Covener
cove...@gmail.com


Re: mod_cache: Broken code?

2015-12-07 Thread Eric Covener
On Fri, Apr 24, 2015 at 2:04 AM, Niklas Edmundsson  wrote:
>
> When trying to debug something else I stumbled across this code-snippet in
> modules/cache/mod_cache.c:
>
> errno = 0;
> x = control.max_age_value;
> if (errno) {
>     x = dconf->defex;
> }
> else {
>     x = x * MSEC_ONE_SEC;
> }
>
> It looks that way both in trunk and 2.4.x.
>
> The likelihood of that if-statement having more than one outcome is low, and
> the fact that errno isn't used anywhere else in mod_cache.c makes it even
> more suspicious.
>
> Looking at the annotated history it seems like the errno-stuff stems from
> apr_atoi64() being used once upon a time but has since been removed without
> cleaning up the related code...
>
> It's easy to just remove the now redundant code, but is that the right way
> to do it or did the initial code have some function that's now gone missing?
>

Just came across the same thing (looking at s-maxage stuff) and found your note
searching my mail.

I think all of the checking is still present between ap_cache_control
and the caller checking control.max_age before looking at
control.max_age_value.

I just removed the errno stuff  in r1718496.
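
For context, an errno check is only meaningful directly after a conversion call that can set errno; a plain assignment such as the one in the quoted snippet never does. Below is a minimal sketch of the idiom using the standard strtoll() in place of apr_atoi64(). The function name parse_max_age_usec, the default_usec parameter, and the USEC_PER_SEC constant are assumptions made for illustration, not mod_cache identifiers.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define USEC_PER_SEC INT64_C(1000000)

/* Convert a max-age string (seconds) to microseconds.
 * Falls back to default_usec when the text is not a usable number. */
static int64_t parse_max_age_usec(const char *text, int64_t default_usec)
{
    char *end = NULL;
    long long secs;

    errno = 0;                       /* must be cleared before the call */
    secs = strtoll(text, &end, 10);  /* sets errno to ERANGE on overflow */

    if (errno != 0 || end == text || secs < 0
        || secs > INT64_MAX / USEC_PER_SEC) {
        return default_usec;
    }
    return (int64_t)secs * USEC_PER_SEC;
}
```

Once the parsing lives elsewhere and the caller only reads an already-validated value, the errno test has nothing left to observe, which is consistent with removing it.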


Re: mod_cache thundering herd bug

2014-04-21 Thread Eric Covener
> Covener - Are you talking about my comments in #16 on the ticket? 
> (https://issues.apache.org/bugzilla/show_bug.cgi?id=50317#c16)
>
> If so, do either you or Graham have thoughts on the Age header getting 
> returned with stale content? In my testing, when stale content is getting 
> returned, no Age header is set which appears to be a violation of HTTP 1.1.
>

Yes, I think it's not that it's unset, but that the calculation
somehow uses the revalidation-in-progress check time as the basis.

-- 
Eric Covener
cove...@gmail.com
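
To make the suspicion above concrete, here is a minimal sketch of the age calculation from RFC 7234 section 4.2.3, using plain integer seconds. It is not the mod_cache code, and the function and parameter names are assumptions. One way the symptom could arise: if the stored Date and response time were both refreshed when a revalidation starts, the result would restart near zero even though the entity is old.

```c
#include <stdint.h>

static int64_t max64(int64_t a, int64_t b) { return a > b ? a : b; }

/* current_age per RFC 7234 section 4.2.3; all values in seconds.
 * age_value:     Age header received from upstream (0 if absent)
 * date_value:    Date header of the stored response
 * request_time:  when the cache sent the request that produced it
 * response_time: when the cache received that response
 * now:           current time */
static int64_t current_age(int64_t age_value, int64_t date_value,
                           int64_t request_time, int64_t response_time,
                           int64_t now)
{
    int64_t apparent_age = max64(0, response_time - date_value);
    int64_t response_delay = response_time - request_time;
    int64_t corrected_age_value = age_value + response_delay;
    int64_t corrected_initial_age = max64(apparent_age, corrected_age_value);
    int64_t resident_time = now - response_time;

    return corrected_initial_age + resident_time;
}
```

For an entity fetched at t=1000 and served at t=1101 this yields an age of about 100 seconds; feeding in timestamps taken at t=1100 instead yields 1.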


Re: mod_cache thundering herd bug

2014-04-21 Thread Jim Riggs
On 21 Apr 2014, at 06:38, Graham Leggett  wrote:

> On 19 Apr 2014, at 10:26 PM, Eric Covener  wrote:
> 
>> Graham -- related subject brought up either in Denver or in the bug.
>> It seems that when we serve a stale file while the cache is locked,
>> the age headers are small instead of large. I got totally lost trying
>> to track down the issue, maybe it makes sense to you?  It's almost as
> >> if the time of the revalidation is somehow updated early and the
>> delta in the stale cache hits is based off of that.
> 
> All thundering herd does is after letting the first conditional request 
> through, it serves stale data (RFC willing) until that conditional request 
> comes back or a specific maximum time is reached, whichever comes first.
> 
> The most valuable piece of information in this process is the "reason" 
> variable, which describes the reason why something wasn't eligible for 
> caching. In httpd v2.4 the X-Cache-Detail header will give this to you, in 
> httpd v2.2 you'll need to log at DEBUG level to get this:
> 
>     ap_log_rerror(APLOG_MARK, APLOG_DEBUG, 0, r,
>                   "cache: %s not cached. Reason: %s", r->unparsed_uri,
>                   reason);
> 
> The questions to answer are:
> 
> - Is there stale content to serve? No stale content, no thundering herd 
> protection.
> - If stale content is being deleted, identify why that is. This is likely to 
> be unrelated to thundering herd, but rather in other parts of mod_cache.



Covener - Are you talking about my comments in #16 on the ticket? 
(https://issues.apache.org/bugzilla/show_bug.cgi?id=50317#c16)

If so, do either you or Graham have thoughts on the Age header getting returned 
with stale content? In my testing, when stale content is getting returned, no 
Age header is set which appears to be a violation of HTTP 1.1.



Re: mod_cache thundering herd bug

2014-04-21 Thread Graham Leggett
On 19 Apr 2014, at 10:26 PM, Eric Covener  wrote:

> Graham -- related subject brought up either in Denver or in the bug.
> It seems that when we serve a stale file while the cache is locked,
> the age headers are small instead of large. I got totally lost trying
> to track down the issue, maybe it makes sense to you?  It's almost as
> if the time of the revalidation is somehow updated early and the
> delta in the stale cache hits is based off of that.

All thundering herd does is after letting the first conditional request 
through, it serves stale data (RFC willing) until that conditional request 
comes back or a specific maximum time is reached, whichever comes first.

The most valuable piece of information in this process is the "reason" 
variable, which describes the reason why something wasn't eligible for caching. 
In httpd v2.4 the X-Cache-Detail header will give this to you, in httpd v2.2 
you'll need to log at DEBUG level to get this:

    ap_log_rerror(APLOG_MARK, APLOG_DEBUG, 0, r,
                  "cache: %s not cached. Reason: %s", r->unparsed_uri,
                  reason);

The questions to answer are:

- Is there stale content to serve? No stale content, no thundering herd 
protection.
- If stale content is being deleted, identify why that is. This is likely to be 
unrelated to thundering herd, but rather in other parts of mod_cache.

Regards,
Graham
--
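
The behaviour described above (let the first conditional request through, then serve stale data until it returns or a maximum time is reached) can be sketched as a small decision function. This is an illustration under stated assumptions, not the mod_cache implementation: the enum, struct, and field names are hypothetical, and lock_max_age only loosely corresponds to the CacheLockMaxAge directive.

```c
#include <stdbool.h>
#include <stdint.h>

enum herd_action {
    HERD_REVALIDATE,   /* this request goes to the back end */
    HERD_SERVE_STALE   /* answer from the stale cached entity */
};

struct herd_state {
    bool    lock_held;       /* another request is already revalidating */
    int64_t lock_age_secs;   /* how long that lock has existed */
    int64_t lock_max_age;    /* upper bound on honouring the lock */
    bool    stale_allowed;   /* "RFC willing": stale may be served */
    bool    have_stale_copy; /* there is stale content to serve */
};

static enum herd_action herd_decide(const struct herd_state *s)
{
    /* Stale is served only while a live lock exists and a stale copy
     * is both present and permitted; otherwise hit the back end. */
    if (s->lock_held && s->lock_age_secs < s->lock_max_age
        && s->stale_allowed && s->have_stale_copy) {
        return HERD_SERVE_STALE;
    }
    return HERD_REVALIDATE;
}
```

The have_stale_copy input reflects the first question in the list: if the stale entity has been removed, every request falls through to the back end regardless of the lock.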



Re: mod_cache thundering herd bug

2014-04-19 Thread Eric Covener
On Tue, Apr 8, 2014 at 4:11 PM, Jim Riggs  wrote:
> https://issues.apache.org/bugzilla/show_bug.cgi?id=50317
>
> While we are at ApacheCon, I would love to address this nasty bug with 
> someone familiar with 2.2's mod_cache. Our sites were brought down a few 
> times last year before we finally tracked it down to being this particular 
> bug. I am using a crude backport of the 2.3 patch (r1023398) in 2.2. It 
> works, but I don't know if it is correct.
>
> Can someone look at this one with me? We really need to get this fixed in 
> 2.2, because there is NO thundering herd protection at all as things stand 
> right now.
>


Graham -- related subject brought up either in Denver or in the bug.
It seems that when we serve a stale file while the cache is locked,
the age headers are small instead of large. I got totally lost trying
to track down the issue, maybe it makes sense to you?  It's almost as
if the time of the revalidation is somehow updated early and the
delta in the stale cache hits is based off of that.

-- 
Eric Covener
cove...@gmail.com


Re: mod_cache thundering herd bug

2014-04-14 Thread Maciej Bogucki



r1023398 for 2.2:

http://people.apache.org/~covener/patches/httpd-2.2.x-thunder.diff

The remove_url() prevents other threads from serving a stale cached
file during refresh of a slow response, but it's unnecessary to have a
separate path because the refresh has to deal with 200s already.  When
the remove_url was added, there was no thundering herd lock / no
ability to serve stale content while one guy was reloading.


covener, mrumph, and I looked at this today at ApacheCon. I updated the bug 
with some comments and attached this patch.

https://issues.apache.org/bugzilla/show_bug.cgi?id=50317


Hello,

Thank you very much for the patch, but *it doesn't work*. When I run an ab 
test (/usr/bin/ab -k -c 5 -n 10 http://host/url), the application gets more 
than one request:

1.1.1.1 - - [14/Apr/2014:14:01:58 +0200] "GET /url HTTP/1.0" 200 42398 9A68DBA96CED90DC517F7D6302F5A748.gpi-app1 1163 1163
1.1.1.1 - - [14/Apr/2014:14:02:05 +0200] "GET /url HTTP/1.0" 200 42398 D378685BBD4FB87C63A3A867ABFAFB3E.gpi-app1 2931 2930
1.1.1.1 - - [14/Apr/2014:14:02:05 +0200] "GET /url HTTP/1.0" 200 42398 8B77A0C68FC6F16E0BA3A89C7A614E1A.gpi-app1 2992 2991
1.1.1.1 - - [14/Apr/2014:14:02:05 +0200] "GET /url HTTP/1.0" 200 42398 57A48B49FB6C52E28F1FA97DDFCDC0C8.gpi-app1 3007 3006
1.1.1.1 - - [14/Apr/2014:14:02:05 +0200] "GET /url HTTP/1.0" 200 42398 71573080388181B3C55E88CB4BFAB890.gpi-app1 3051 3051
1.1.1.1 - - [14/Apr/2014:14:02:06 +0200] "GET /url HTTP/1.0" 200 42398 38DA8533D4F9B4046A2F607071652E94.gpi-app1 1412 1412


Here is more information on how to reproduce it.

*Compilation*

cd /tmp
svn co http://svn.apache.org/repos/asf/httpd/httpd/branches/2.2.x
cd 2.2.x/
svn co http://svn.apache.org/repos/asf/apr/apr/branches/1.4.x srclib/apr
svn co http://svn.apache.org/repos/asf/apr/apr-util/branches/1.4.x srclib/apr-util
./buildconf
./configure --prefix=/etc/httpd --exec-prefix=/usr --bindir=/usr/bin \
  --sbindir=/usr/sbin --mandir=/usr/share/man --libdir=/usr/lib64 \
  --sysconfdir=/etc/httpd/conf --includedir=/usr/include/httpd \
  --libexecdir=/usr/lib64/httpd/modules --datadir=/var/www \
  --with-installbuilddir=/usr/lib64/httpd/build --with-mpm=prefork \
  --with-apr=/usr --with-apr-util=/usr --enable-suexec --with-suexec \
  --with-suexec-caller=apache --with-suexec-docroot=/var/www \
  --with-suexec-logfile=/var/log/httpd/suexec.log \
  --with-suexec-bin=/usr/sbin/suexec --with-suexec-uidmin=500 \
  --with-suexec-gidmin=100 --enable-pie --with-pcre \
  --enable-mods-shared=all --enable-ssl --with-ssl --enable-proxy \
  --enable-cache --enable-disk-cache --enable-ldap --enable-authnz-ldap \
  --enable-cgid --enable-authn-anon --enable-authn-alias \
  --disable-imagemap
patch -p0 < /root/rpmbuild/SOURCES/httpd-2.2.x-thunder.patch
make
make install

*Configuration*

...
...
## Cache
CacheRoot /tmp/cache
CacheEnable disk /
CacheDisable /static/
CacheMinFileSize 0
CacheMaxFileSize 1048576
CacheDirLevels 2
CacheDirLength 2
CacheLock on
CacheLockPath /tmp/mod_cache-lock
CacheLockMaxAge 5
CacheIgnoreHeaders ETag Set-Cookie
Header unset Expires
Header unset Cache-Control
Header always set Cache-Control "max-age=30,stale-while-revalidate=15"


Best Regards
Maciej Bogucki


Re: mod_cache thundering herd bug

2014-04-09 Thread Jim Riggs
On 9 Apr 2014, at 14:46, Eric Covener  wrote:

> r1023398 for 2.2:
> 
>  http://people.apache.org/~covener/patches/httpd-2.2.x-thunder.diff
> 
> The remove_url() prevents other threads from serving a stale cached
> file during refresh of a slow response, but it's unnecessary to have a
> separate path because the refresh has to deal with 200s already.  When
> the remove_url was added, there was no thundering herd lock / no
> ability to serve stale content while one guy was reloading.


covener, mrumph, and I looked at this today at ApacheCon. I updated the bug 
with some comments and attached this patch.

https://issues.apache.org/bugzilla/show_bug.cgi?id=50317



Re: mod_cache thundering herd bug

2014-04-09 Thread Eric Covener
r1023398 for 2.2:

  http://people.apache.org/~covener/patches/httpd-2.2.x-thunder.diff

The remove_url() prevents other threads from serving a stale cached
file during refresh of a slow response, but it's unnecessary to have a
separate path because the refresh has to deal with 200s already.  When
the remove_url was added, there was no thundering herd lock / no
ability to serve stale content while one guy was reloading.

On Tue, Apr 8, 2014 at 2:11 PM, Jim Riggs  wrote:
> https://issues.apache.org/bugzilla/show_bug.cgi?id=50317
>
> While we are at ApacheCon, I would love to address this nasty bug with 
> someone familiar with 2.2's mod_cache. Our sites were brought down a few 
> times last year before we finally tracked it down to being this particular 
> bug. I am using a crude backport of the 2.3 patch (r1023398) in 2.2. It 
> works, but I don't know if it is correct.
>
> Can someone look at this one with me? We really need to get this fixed in 
> 2.2, because there is NO thundering herd protection at all as things stand 
> right now.
>
> - Jim
>



-- 
Eric Covener
cove...@gmail.com


Re: mod_cache with Cache-Control no-cache= or private=

2013-03-27 Thread Yann Ylavic
The latest patch is attached to bugzilla #54706.

Regards,
Yann.


Re: mod_cache with Cache-Control no-cache= or private=

2013-03-27 Thread Yann Ylavic
Sorry for my haste: the Content-Type is stripped from the
validated (stale) headers with the previous patch, so we have to make a
copy as below.

Regards,
Yann.

Index: modules/cache/cache_util.c
===
--- modules/cache/cache_util.c  (revision 1461557)
+++ modules/cache/cache_util.c  (working copy)
@@ -542,13 +542,10 @@
 }

 /* These come from the cached entity. */
-if (h->cache_obj->info.control.no_cache
-|| h->cache_obj->info.control.no_cache_header
-|| h->cache_obj->info.control.private_header) {
+if (h->cache_obj->info.control.no_cache) {
 /*
- * The cached entity contained Cache-Control: no-cache, or a
- * no-cache with a header present, or a private with a header
- * present, so treat as stale causing revalidation.
+ * The cached entity contained Cache-Control: no-cache,
+ * so treat as stale causing revalidation.
  */
 return 0;
 }
@@ -1069,9 +1066,7 @@
 /* ...then try slowest cases */
 else if (!strncasecmp(token, "no-cache", 8)) {
 if (token[8] == '=') {
-if (apr_table_get(headers, token + 9)) {
-cc->no_cache_header = 1;
-}
+cc->no_cache_header = 1;
 }
 else if (!token[8]) {
 cc->no_cache = 1;
@@ -1146,9 +1141,7 @@
 }
 else if (!strncasecmp(token, "private", 7)) {
 if (token[7] == '=') {
-if (apr_table_get(headers, token + 8)) {
-cc->private_header = 1;
-}
+cc->private_header = 1;
 }
 else if (!token[7]) {
 cc->private = 1;
Index: modules/cache/mod_cache.c
===
--- modules/cache/mod_cache.c   (revision 1461557)
+++ modules/cache/mod_cache.c   (working copy)
@@ -714,6 +714,65 @@
 }

 /*
+ * Same as ap_cacheable_headers_out(), but also strips the headers
+ * specified by the Cache-Control private= or no-cache= directives.
+ */
+static int cc_field_doo_doo(void *t, const char *key,
+ const char *val)
+{
+if (val) {
+apr_table_addn(t, key, val);
+return 0;
+}
+return 1;
+}
+static apr_table_t *cache_cacheable_headers_cc(request_rec *r,
+ const cache_control_t *cc)
+{
+apr_table_t *headers_out = ap_cache_cacheable_headers_out(r);
+if (cc && (cc->no_cache_header || cc->private_header)) {
+char *token;
+const char *cc_out = apr_table_get(headers_out, "Cache-Control");
+while (cc_out && (token = ap_get_list_item(r->pool, &cc_out))) {
+apr_size_t len = strlen(token);
+
+/* ap_get_list_item() strips the spurious whitespaces and
+ * lowercases anything (but the quoted-strings) */
+if (len > 9 && strncmp(token, "no-cache=", 9) == 0) {
+token += 9;
+len -= 9;
+}
+else if (len > 8 && strncmp(token, "private=", 8) == 0) {
+token += 8;
+len -= 8;
+}
+else {
+continue;
+}
+
+/* RFC2616 14.9: quoted list of field-names */
+if (len > 2 && token[0] == '"' && token[--len] == '"') {
+/* strip the header(s) from the cacheable headers out,
+ * but preserve the ones from the current response by
+ * adding them to the err_headers_out */
+const char *tok, *header;
+(++token)[--len] = '\0';
+tok = token;
+do {
+if ((header = ap_cache_tokstr(r->pool, tok, &tok)) &&
+!apr_table_do(cc_field_doo_doo, r->err_headers_out,
+  headers_out, header, NULL)) {
+apr_table_unset(r->headers_out, header);
+apr_table_unset(headers_out, header);
+}
+} while (tok);
+}
+}
+}
+return headers_out;
+}
+
+/*
  * CACHE_SAVE filter
  * ---
  *
@@ -746,6 +805,7 @@
 apr_time_t exp, date, lastmod, now;
 apr_off_t size = -1;
 cache_info *info = NULL;
+apr_table_t *cc_headers;
 char *reason;
 apr_pool_t *p;
 apr_bucket *e;
@@ -1075,7 +1135,7 @@
  * err_headers_out and we also need to strip any hop-by-hop
  * headers that might have snuck in.
  */
-r->headers_out = ap_cache_cacheable_headers_out(r);
+r->headers_out = cache_cacheable_headers_cc(r, &control);

   

Re: mod_cache with Cache-Control no-cache= or private=

2013-03-27 Thread Yann Ylavic
In fact this patch is probably better since it does not change
h->resp_hdrs before calling cache_accept_headers() which uses them.

Regards,
Yann.

Index: modules/cache/cache_util.c
===
--- modules/cache/cache_util.c  (revision 1461557)
+++ modules/cache/cache_util.c  (working copy)
@@ -542,13 +542,10 @@
 }

 /* These come from the cached entity. */
-if (h->cache_obj->info.control.no_cache
-|| h->cache_obj->info.control.no_cache_header
-|| h->cache_obj->info.control.private_header) {
+if (h->cache_obj->info.control.no_cache) {
 /*
- * The cached entity contained Cache-Control: no-cache, or a
- * no-cache with a header present, or a private with a header
- * present, so treat as stale causing revalidation.
+ * The cached entity contained Cache-Control: no-cache,
+ * so treat as stale causing revalidation.
  */
 return 0;
 }
@@ -1069,9 +1066,7 @@
 /* ...then try slowest cases */
 else if (!strncasecmp(token, "no-cache", 8)) {
 if (token[8] == '=') {
-if (apr_table_get(headers, token + 9)) {
-cc->no_cache_header = 1;
-}
+cc->no_cache_header = 1;
 }
 else if (!token[8]) {
 cc->no_cache = 1;
@@ -1146,9 +1141,7 @@
 }
 else if (!strncasecmp(token, "private", 7)) {
 if (token[7] == '=') {
-if (apr_table_get(headers, token + 8)) {
-cc->private_header = 1;
-}
+cc->private_header = 1;
 }
 else if (!token[7]) {
 cc->private = 1;
Index: modules/cache/mod_cache.c
===
--- modules/cache/mod_cache.c   (revision 1461557)
+++ modules/cache/mod_cache.c   (working copy)
@@ -714,6 +714,65 @@
 }

 /*
+ * Same as ap_cacheable_headers_out(), but also strips the headers
+ * specified by the Cache-Control private= or no-cache= directives.
+ */
+static int cc_field_doo_doo(void *t, const char *key,
+ const char *val)
+{
+if (val) {
+apr_table_addn(t, key, val);
+return 0;
+}
+return 1;
+}
+static apr_table_t *cache_cacheable_headers_cc(request_rec *r,
+ const cache_control_t *cc)
+{
+apr_table_t *headers_out = ap_cache_cacheable_headers_out(r);
+if (cc && (cc->no_cache_header || cc->private_header)) {
+char *token;
+const char *cc_out = apr_table_get(headers_out, "Cache-Control");
+while (cc_out && (token = ap_get_list_item(r->pool, &cc_out))) {
+apr_size_t len = strlen(token);
+
+/* ap_get_list_item() strips the spurious whitespaces and
+ * lowercases anything (but the quoted-strings) */
+if (len > 9 && strncmp(token, "no-cache=", 9) == 0) {
+token += 9;
+len -= 9;
+}
+else if (len > 8 && strncmp(token, "private=", 8) == 0) {
+token += 8;
+len -= 8;
+}
+else {
+continue;
+}
+
+/* RFC2616 14.9: quoted list of field-names */
+if (len > 2 && token[0] == '"' && token[--len] == '"') {
+/* strip the header(s) from the cacheable headers out,
+ * but preserve the ones from the current response by
+ * adding them to the err_headers_out */
+const char *tok, *header;
+(++token)[--len] = '\0';
+tok = token;
+do {
+if ((header = ap_cache_tokstr(r->pool, tok, &tok)) &&
+!apr_table_do(cc_field_doo_doo, r->err_headers_out,
+  headers_out, header, NULL)) {
+apr_table_unset(r->headers_out, header);
+apr_table_unset(headers_out, header);
+}
+} while (tok);
+}
+}
+}
+return headers_out;
+}
+
+/*
  * CACHE_SAVE filter
  * ---
  *
@@ -746,6 +805,7 @@
 apr_time_t exp, date, lastmod, now;
 apr_off_t size = -1;
 cache_info *info = NULL;
+apr_table_t *cc_headers;
 char *reason;
 apr_pool_t *p;
 apr_bucket *e;
@@ -1075,7 +1135,7 @@
  * err_headers_out and we also need to strip any hop-by-hop
  * headers that might have snuck in.
  */
-r->headers_out = ap_cache_cacheable_headers_out(r);
+r->headers_out = cache_cacheable_headers_cc(r, &control);

 /* Merge in ou

Re: mod_cache with Cache-Control no-cache= or private=

2013-03-27 Thread Yann Ylavic
On Wed, Mar 27, 2013 at 5:44 PM, Graham Leggett  wrote:
> Been snowed under and haven't had a chance to look at this in detail, but one 
> quick thing - we would definitely want to be able to backport this to v2.4 so 
> as to get it into people's hands, and to do that, we cannot change the public 
> APIs. We would need to find a way to do this without changing the API.
>
> A further thing is the reparsing of the Cache-Control string. I'd like to see 
> if I can find a way to avoid this, but need to dig into how to do that.

In the patch below there is no API change and the Cache-Control headers
won't be parsed twice: h->req_hdrs and h->resp_hdrs are updated before
calling provider->store_headers(), which then uses them instead of
recomputing everything from r->(err_)headers_out.

Regards,
Yann.

Index: modules/cache/cache_util.c
===
--- modules/cache/cache_util.c  (revision 1461557)
+++ modules/cache/cache_util.c  (working copy)
@@ -542,13 +542,10 @@
 }

 /* These come from the cached entity. */
-if (h->cache_obj->info.control.no_cache
-|| h->cache_obj->info.control.no_cache_header
-|| h->cache_obj->info.control.private_header) {
+if (h->cache_obj->info.control.no_cache) {
 /*
- * The cached entity contained Cache-Control: no-cache, or a
- * no-cache with a header present, or a private with a header
- * present, so treat as stale causing revalidation.
+ * The cached entity contained Cache-Control: no-cache,
+ * so treat as stale causing revalidation.
  */
 return 0;
 }
@@ -1069,9 +1066,7 @@
 /* ...then try slowest cases */
 else if (!strncasecmp(token, "no-cache", 8)) {
 if (token[8] == '=') {
-if (apr_table_get(headers, token + 9)) {
-cc->no_cache_header = 1;
-}
+cc->no_cache_header = 1;
 }
 else if (!token[8]) {
 cc->no_cache = 1;
@@ -1146,9 +1141,7 @@
 }
 else if (!strncasecmp(token, "private", 7)) {
 if (token[7] == '=') {
-if (apr_table_get(headers, token + 8)) {
-cc->private_header = 1;
-}
+cc->private_header = 1;
 }
 else if (!token[7]) {
 cc->private = 1;
Index: modules/cache/mod_cache.c
===
--- modules/cache/mod_cache.c   (revision 1461557)
+++ modules/cache/mod_cache.c   (working copy)
@@ -714,6 +714,65 @@
 }

 /*
+ * Same as ap_cacheable_headers_out(), but also strips the headers
+ * specified by the Cache-Control private= or no-cache= directives.
+ */
+static int cc_field_doo_doo(void *t, const char *key,
+ const char *val)
+{
+if (val) {
+apr_table_addn(t, key, val);
+return 0;
+}
+return 1;
+}
+static apr_table_t *cache_cacheable_headers_cc(request_rec *r,
+ const cache_control_t *cc)
+{
+apr_table_t *headers_out = ap_cache_cacheable_headers_out(r);
+if (cc && (cc->no_cache_header || cc->private_header)) {
+char *token;
+const char *cc_out = apr_table_get(headers_out, "Cache-Control");
+while (cc_out && (token = ap_get_list_item(r->pool, &cc_out))) {
+apr_size_t len = strlen(token);
+
+/* ap_get_list_item() strips the spurious whitespaces and
+ * lowercases anything (but the quoted-strings) */
+if (len > 9 && strncmp(token, "no-cache=", 9) == 0) {
+token += 9;
+len -= 9;
+}
+else if (len > 8 && strncmp(token, "private=", 8) == 0) {
+token += 8;
+len -= 8;
+}
+else {
+continue;
+}
+
+/* RFC2616 14.9: quoted list of field-names */
+if (len > 2 && token[0] == '"' && token[--len] == '"') {
+/* strip the header(s) from the cacheable headers out,
+ * but preserve the ones from the current response by
+ * adding them to the err_headers_out */
+const char *tok, *header;
+(++token)[--len] = '\0';
+tok = token;
+do {
+if ((header = ap_cache_tokstr(r->pool, tok, &tok)) &&
+!apr_table_do(cc_field_doo_doo, r->err_headers_out,
+  headers_out, header, NULL)) {
+apr_table_unset(r->headers_out, header);
+apr_table_unset(headers_out, header);
+}
+} 

Re: mod_cache with Cache-Control no-cache= or private=

2013-03-27 Thread Graham Leggett
On 27 Mar 2013, at 6:06 PM, Yann Ylavic  wrote:

> Index: modules/cache/mod_cache.h
> ===
> --- modules/cache/mod_cache.h (revision 1461557)
> +++ modules/cache/mod_cache.h (working copy)
> @@ -152,9 +152,12 @@
> 
> /* Create a new table consisting of those elements from an output
>  * headers table that are allowed to be stored in a cache;
> + * when cc is not NULL, also strip the headers specified by the
> + * Cache-Control private= or no-cache= directives;
>  * ensure there is a content type and capture any errors.
>  */
> -CACHE_DECLARE(apr_table_t *)ap_cache_cacheable_headers_out(request_rec *r);
> +CACHE_DECLARE(apr_table_t *)ap_cache_cacheable_headers_out(request_rec *r,
> + const cache_control_t *cc);
> 
> /**
>  * Parse the Cache-Control and Pragma headers in one go, marking

Been snowed under and haven't had a chance to look at this in detail, but one 
quick thing - we would definitely want to be able to backport this to v2.4 so 
as to get it into people's hands, and to do that, we cannot change the public 
APIs. We would need to find a way to do this without changing the API.

A further thing is the reparsing of the Cache-Control string. I'd like to see 
if I can find a way to avoid this, but need to dig into how to do that.

Regards,
Graham
--





Re: mod_cache with Cache-Control no-cache= or private=

2013-03-27 Thread Yann Ylavic
I already created bugzilla issue #54706 nearly 2 weeks ago,
about mod_cache possibly serving cached private= or no-cache= response
headers.

Should I link this discussion or the patch to that issue?

Regards,
Yann.


Re: mod_cache with Cache-Control no-cache= or private=

2013-03-27 Thread Yann Ylavic
On Mon, Mar 25, 2013 at 11:58 PM, Roy T. Fielding  wrote:
> On Mar 13, 2013, at 10:20 AM, Graham Leggett wrote:
>> I don't read it that way from the spec, I think it all comes down to the 
>> phrase "without successful revalidation with the origin server". I read it 
>> as "without successful revalidation [of the whole request] with the origin 
>> server". In other words, the origin server sent the original header, if the 
>> origin server doesn't update the header in the 304 response then it means 
>> "I've had my opportunity to revalidate the entity, current cached value is 
>> fine, send it".
>>
>> Roy ultimately would be able to answer this?
>
> It means delete the header fields prior to storing them in the cache
> for later reuse.  If the origin had wanted must-revalidate, it would
> simply use that directive instead.  The successful revalidation bit
> is saying that the cache should forward all of the fields for the response
> to the original request and for any response that is revalidated
> (i.e., forward the new fields received in 304), but not for the
> requests that are entirely handled by the cache.
>

Thank you for the clarification. Hence mod_cache is allowed to serve the
cached response (with respect to the "other restrictions"); no
revalidation is needed (for the CC header fields at least).

The following patch implements this behaviour: the CC header
fields are not stored but are still passed along with the origin
(validation) response.

Regards,
Yann.

Index: modules/cache/cache_util.c
===
--- modules/cache/cache_util.c  (revision 1461557)
+++ modules/cache/cache_util.c  (working copy)
@@ -542,13 +542,10 @@
     }

     /* These come from the cached entity. */
-    if (h->cache_obj->info.control.no_cache
-        || h->cache_obj->info.control.no_cache_header
-        || h->cache_obj->info.control.private_header) {
+    if (h->cache_obj->info.control.no_cache) {
         /*
-         * The cached entity contained Cache-Control: no-cache, or a
-         * no-cache with a header present, or a private with a header
-         * present, so treat as stale causing revalidation.
+         * The cached entity contained Cache-Control: no-cache,
+         * so treat as stale causing revalidation.
          */
         return 0;
     }
@@ -915,12 +912,23 @@
     return ap_cache_cacheable_headers(r->pool, r->headers_in, r->server);
 }

-/*
- * Create a new table consisting of those elements from an output
+static int cc_field_doo_doo(void *t, const char *key, const char *val)
+{
+    if (val) {
+        apr_table_addn(t, key, val);
+        return 0;
+    }
+    return 1;
+}
+
+/* Create a new table consisting of those elements from an output
  * headers table that are allowed to be stored in a cache;
+ * when cc is not NULL, also strip the headers specified by the
+ * Cache-Control private= or no-cache= directives;
  * ensure there is a content type and capture any errors.
  */
-CACHE_DECLARE(apr_table_t *)ap_cache_cacheable_headers_out(request_rec *r)
+CACHE_DECLARE(apr_table_t *)ap_cache_cacheable_headers_out(request_rec *r,
+        const cache_control_t *cc)
 {
     apr_table_t *headers_out;

@@ -944,6 +952,46 @@
                        r->content_encoding);
     }

+    if (cc && (cc->no_cache_header || cc->private_header)) {
+        char *token;
+        const char *cc_out = apr_table_get(headers_out, "Cache-Control");
+        while (cc_out && (token = ap_get_list_item(r->pool, &cc_out))) {
+            apr_size_t len = strlen(token);
+
+            /* ap_get_list_item() strips the spurious whitespaces and
+             * lowercases anything (but the quoted-strings) */
+            if (len > 9 && strncmp(token, "no-cache=", 9) == 0) {
+                token += 9;
+                len -= 9;
+            }
+            else if (len > 8 && strncmp(token, "private=", 8) == 0) {
+                token += 8;
+                len -= 8;
+            }
+            else {
+                continue;
+            }
+
+            /* RFC2616 14.9: quoted list of field-names */
+            if (len > 2 && token[0] == '"' && token[--len] == '"') {
+                /* strip the header(s) from the cacheable headers out,
+                 * but preserve the ones from the current response by
+                 * adding them to the err_headers_out */
+                const char *tok, *header;
+                (++token)[--len] = '\0';
+                tok = token;
+                do {
+                    if ((header = ap_cache_tokstr(r->pool, tok, &tok)) &&
+                        !apr_table_do(cc_field_doo_doo, r->err_headers_out,
+                                      headers_out, header, NULL)) {
+                        apr_table_unset(r->headers_out, header);
+                        apr_table_unset(headers_out, header);
+                    }
+                } while (tok);
+            }
+

Re: mod_cache with Cache-Control no-cache= or private=

2013-03-25 Thread Roy T. Fielding
On Mar 13, 2013, at 10:20 AM, Graham Leggett wrote:

> On 11 Mar 2013, at 12:50 PM, Yann Ylavic  wrote:
> 
>>> The way I read the spec, "the specified field-name(s) MUST NOT be sent in 
>>> the response to a subsequent request without successful revalidation with 
>>> the origin server". What this means is that if the specified field names 
>>> are found to be present in the cached response, then the origin server 
>>> needs to be given the opportunity to update these fields through a 
>>> conditional request. In the current cache code, we return 0 meaning "this 
>>> is stale, revalidate", and a conditional request is sent to the origin. We 
>>> hope the origin sends "304 Not Modified", with updated headers 
>>> corresponding to the fields.
>> 
>> Ok, I see your point, and this is surely the right reading of the rfc,
>> but there is necessarily a difference between no-cache and
>> no-cache="", particularly with the handling of that (stale)
>> header(s).
>> 
>> For what I understand now, if the origin does not send (one of) the
>> header(s) in its 304 response, the stale header(s) "MUST NOT" be
>> served.
> 
> I don't read it that way from the spec, I think it all comes down to the 
> phrase "without successful revalidation with the origin server". I read it as 
> "without successful revalidation [of the whole request] with the origin 
> server". In other words, the origin server sent the original header, if the 
> origin server doesn't update the header in the 304 response then it means 
> "I've had my opportunity to revalidate the entity, current cached value is 
> fine, send it".
> 
> Roy ultimately would be able to answer this?

It means delete the header fields prior to storing them in the cache
for later reuse.  If the origin had wanted must-revalidate, it would
simply use that directive instead.  The successful revalidation bit
is saying that the cache should forward all of the fields for the response
to the original request and for any response that is revalidated
(i.e., forward the new fields received in 304), but not for the
requests that are entirely handled by the cache.

Roy
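
As an illustration of the rule Roy describes (fields named by no-cache=
or private= are dropped before the response is stored), here is a minimal
standalone C sketch. It is not the mod_cache patch and uses no APR calls;
the names ci_equal_n() and cc_names_field() are hypothetical. It only
recognises the quoted-list form from RFC 2616 14.9 and reports whether a
given field is named by either directive.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive comparison of at most n bytes; stops at a NUL in a. */
static int ci_equal_n(const char *a, const char *b, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) {
        if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i])) {
            return 0;
        }
        if (a[i] == '\0') {
            return 1;
        }
    }
    return 1;
}

/* Hypothetical helper: returns 1 if `field` is listed in a
 * no-cache="..." or private="..." directive of the Cache-Control
 * value `cc`, 0 otherwise. Unquoted arguments are ignored. */
int cc_names_field(const char *cc, const char *field)
{
    size_t flen = strlen(field);
    const char *p = cc;

    while (*p) {
        const char *name, *val, *end;
        size_t nlen;
        int listed;

        /* skip separators in front of the next directive */
        while (*p == ',' || *p == ' ' || *p == '\t') {
            p++;
        }
        if (!*p) {
            break;
        }
        name = p;
        while (*p && *p != '=' && *p != ',') {
            p++;
        }
        nlen = (size_t)(p - name);
        listed = (nlen == 8 && ci_equal_n(name, "no-cache", 8))
              || (nlen == 7 && ci_equal_n(name, "private", 7));
        if (*p != '=') {
            continue;               /* directive without an argument */
        }
        p++;
        if (*p != '"') {
            while (*p && *p != ',') {
                p++;                /* skip an unquoted argument */
            }
            continue;
        }
        val = ++p;
        while (*p && *p != '"') {
            p++;
        }
        if (!*p) {
            break;                  /* unterminated quoted-string */
        }
        end = p++;
        if (!listed) {
            continue;
        }
        /* walk the field names between the quotes */
        while (val < end) {
            const char *tok;
            while (val < end && (*val == ',' || *val == ' ' || *val == '\t')) {
                val++;
            }
            tok = val;
            while (val < end && *val != ',' && *val != ' ' && *val != '\t') {
                val++;
            }
            if (flen > 0 && (size_t)(val - tok) == flen
                && ci_equal_n(tok, field, flen)) {
                return 1;
            }
        }
    }
    return 0;
}
```

A cache following this reading could run such a check on each response
header before storing it: a header for which the check is non-zero would
be left out of the stored copy, while still being forwarded on the
response that triggered the store.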



Re: mod_cache with Cache-Control no-cache= or private=

2013-03-13 Thread Yann Ylavic
On Wed, Mar 13, 2013 at 9:28 PM, Tim Bannister  wrote:
> Is this the situation you're worried about:
>
> ClientA: GET /foo HTTP/1.1
> ReverseProxy: GET /foo HTTP/1.1
> Origin: HTTP/1.1 200 OK
> Origin: Set-Cookie: data=AA
> Origin: Cache-Control: private=Set-Cookie
> ReverseProxy: Set-Cookie: data=AA
> ReverseProxy: Cache-Control: private=Set-Cookie
>
> ClientB: GET /foo HTTP/1.1
> ClientB: Cookie: data=BB
> ReverseProxy: GET /foo HTTP/1.1
> ReverseProxy: Cookie: data=BB
> Origin: HTTP/1.1 304 Not Modified

Yes, and regarding what happens now: the ReverseProxy (mod_cache) must
not send "Set-Cookie: data=AA" to ClientB.

> This should just work. The final reply from the caching reverse proxy should 
> look like this:
> ReverseProxy: HTTP/1.1 304 Not Modified
> ReverseProxy: Date: …

This actually does not work: mod_cache will serve the cached Set-Cookie.
Only the CacheIgnoreHeaders directive can prevent this (and it is not
controlled by the origin).

The final reply to ClientB, whose request is not conditional, can also be:
ReverseProxy: HTTP/1.1 200 OK
ReverseProxy: Cache-Control: private=Set-Cookie
ReverseProxy: 
That's the main goal I guess (limit backend hits for large things).

> and the Set-Cookie: header from the stored response would not be used (in 
> fact, the proxy may have elected not to store it). The origin doesn't have to 
> mention that header in the 304 response.

In mod_cache the "no-store" of a particular header is harder to patch
than the "no-cache" (ie. do not serve without revalidation), but
indeed the former would be more efficient, no need to "sanitize" each
served response.

For the "private=", the rfc does not say more than its BNF...
If private= has the same semantics as private without the =, the header
should not be stored (the "Cache-Control: private" response is not
stored by mod_cache).

In all cases, IMHO, no cached Set-Cookie should ever be served by a cache
when private/no-cache="Set-Cookie" is associated with the resource.


Re: mod_cache with Cache-Control no-cache= or private=

2013-03-13 Thread Tim Bannister
On 13 Mar 2013, at 17:41, Yann Ylavic  wrote:
> On Wed, Mar 13, 2013 at 6:35 PM, Tom Evans  wrote:
>> On Wed, Mar 13, 2013 at 5:27 PM, Yann Ylavic  wrote:
>>> 
>>> How would the origin invalidate a Set-Cookie, with an empty one ?
>>> 
>>> Regards,
>>> Yann.
>> 
>> Set it again, with an in the past expiry date.
> 
> Well, that's not exactly the same thing, the user may have a valid Cookie 
> (which is not the one cached) the origin wants to keep going on.
> I meant invalidating the cached cookie, not the one of the user.


Is this the situation you're worried about:

ClientA: GET /foo HTTP/1.1
ClientA: Host: …

ReverseProxy: GET /foo HTTP/1.1
ReverseProxy: Host: …

Origin: HTTP/1.1 200 OK
Origin: Date: …
Origin: Set-Cookie: data=AA
Origin: Cache-Control: private=Set-Cookie

ReverseProxy: HTTP/1.1 200 OK
ReverseProxy: Date: …
ReverseProxy: Set-Cookie: data=AA
ReverseProxy: Cache-Control: private=Set-Cookie



ClientB: GET /foo HTTP/1.1
ClientB: Host: …
ClientB: Cookie: data=BB

ReverseProxy: GET /foo HTTP/1.1
ReverseProxy: Host: …
ReverseProxy: Cookie: data=BB

Origin: HTTP/1.1 304 Not Modified
Origin: Date: …
Origin: Cache-Control: private=Set-Cookie



This should just work. The final reply from the caching reverse proxy should 
look like this:
ReverseProxy: HTTP/1.1 304 Not Modified
ReverseProxy: Date: …

and the Set-Cookie: header from the stored response would not be used (in fact, 
the proxy may have elected not to store it). The origin doesn't have to mention 
that header in the 304 response.


-- 
Tim Bannister – is...@jellybaby.net



Re: mod_cache with Cache-Control no-cache= or private=

2013-03-13 Thread Yann Ylavic
On Wed, Mar 13, 2013 at 6:35 PM, Tom Evans  wrote:
> On Wed, Mar 13, 2013 at 5:27 PM, Yann Ylavic  wrote:
>>
>> How would the origin invalidate a Set-Cookie, with an empty one ?
>>
>> Regards,
>> Yann.
>
> Set it again, with an in the past expiry date.

Well, that's not exactly the same thing, the user may have a valid
Cookie (which is not the one cached) the origin wants to keep going
on.
I meant invalidating the cached cookie, not the one of the user.

Cheers,
Yann.


Re: mod_cache with Cache-Control no-cache= or private=

2013-03-13 Thread Yann Ylavic
On Wed, Mar 13, 2013 at 6:30 PM, Graham Leggett  wrote:
> On 13 Mar 2013, at 7:27 PM, Yann Ylavic  wrote:
>
>> How would the origin invalidate a Set-Cookie, with an empty one ?
>
> I would imagine with a 200 OK.
>
> Roy would be able to confirm.

Well, then I can't see the difference from no-cache without a header
specified (the current code)...

Regards,
Yann.


Re: mod_cache with Cache-Control no-cache= or private=

2013-03-13 Thread Tom Evans
On Wed, Mar 13, 2013 at 5:27 PM, Yann Ylavic  wrote:
>
> How would the origin invalidate a Set-Cookie, with an empty one ?
>
> Regards,
> Yann.

Set it again, with an in the past expiry date.

Cheers

Tom
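
Tom's suggestion amounts to sending a Set-Cookie with the same name and
a date in the past. A minimal C sketch of building such a header value
follows; the function name format_expired_cookie() and the choice of
attributes (Expires, Max-Age, Path) are assumptions for illustration,
not something taken from mod_cache. Note that this removes the cookie
held by the client, which is a different thing from dropping only the
copy held by the cache.

```c
#include <stdio.h>

/* Hypothetical helper: writes a Set-Cookie value asking the client to
 * discard the cookie `name` for `path`. Returns what snprintf() returns,
 * i.e. the length the full string needs (excluding the NUL). */
int format_expired_cookie(char *buf, size_t buflen,
                          const char *name, const char *path)
{
    return snprintf(buf, buflen,
                    "%s=; Expires=Thu, 01 Jan 1970 00:00:00 GMT; "
                    "Max-Age=0; Path=%s",
                    name, path);
}
```

The path (and domain, if one was set) generally has to match the cookie
being removed, which is why it is a parameter here rather than fixed.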


Re: mod_cache with Cache-Control no-cache= or private=

2013-03-13 Thread Graham Leggett
On 13 Mar 2013, at 7:27 PM, Yann Ylavic  wrote:

> How would the origin invalidate a Set-Cookie, with an empty one ?

I would imagine with a 200 OK.

Roy would be able to confirm.

Regards,
Graham


Re: mod_cache with Cache-Control no-cache= or private=

2013-03-13 Thread Yann Ylavic
On Wed, Mar 13, 2013 at 6:20 PM, Graham Leggett  wrote:
> On 11 Mar 2013, at 12:50 PM, Yann Ylavic  wrote:
>
>>> The way I read the spec, "the specified field-name(s) MUST NOT be sent in 
>>> the response to a subsequent request without successful revalidation with 
>>> the origin server". What this means is that if the specified field names 
>>> are found to be present in the cached response, then the origin server 
>>> needs to be given the opportunity to update these fields through a 
>>> conditional request. In the current cache code, we return 0 meaning "this 
>>> is stale, revalidate", and a conditional request is sent to the origin. We 
>>> hope the origin sends "304 Not Modified", with updated headers 
>>> corresponding to the fields.
>>
>> Ok, I see your point, and this is surely the right reading of the rfc,
>> but there is necessarily a difference between no-cache and
>> no-cache="", particularly with the handling of that (stale)
>> header(s).
>>
>> For what I understand now, if the origin does not send (one of) the
>> header(s) in its 304 response, the stale header(s) "MUST NOT" be
>> served.
>
> I don't read it that way from the spec, I think it all comes down to the 
> phrase "without successful revalidation with the origin server". I read it as 
> "without successful revalidation [of the whole request] with the origin 
> server". In other words, the origin server sent the original header, if the 
> origin server doesn't update the header in the 304 response then it means 
> "I've had my opportunity to revalidate the entity, current cached value is 
> fine, send it".

How would the origin invalidate a Set-Cookie, with an empty one ?

Regards,
Yann.


Re: mod_cache with Cache-Control no-cache= or private=

2013-03-13 Thread Graham Leggett
On 11 Mar 2013, at 12:50 PM, Yann Ylavic  wrote:

>> The way I read the spec, "the specified field-name(s) MUST NOT be sent in 
>> the response to a subsequent request without successful revalidation with 
>> the origin server". What this means is that if the specified field names are 
>> found to be present in the cached response, then the origin server needs to 
>> be given the opportunity to update these fields through a conditional 
>> request. In the current cache code, we return 0 meaning "this is stale, 
>> revalidate", and a conditional request is sent to the origin. We hope the 
>> origin sends "304 Not Modified", with updated headers corresponding to the 
>> fields.
> 
> Ok, I see your point, and this is surely the right reading of the rfc,
> but there is necessarily a difference between no-cache and
> no-cache="", particularly with the handling of that (stale)
> header(s).
> 
> For what I understand now, if the origin does not send (one of) the
> header(s) in its 304 response, the stale header(s) "MUST NOT" be
> served.

I don't read it that way from the spec, I think it all comes down to the phrase 
"without successful revalidation with the origin server". I read it as "without 
successful revalidation [of the whole request] with the origin server". In 
other words, the origin server sent the original header, if the origin server 
doesn't update the header in the 304 response then it means "I've had my 
opportunity to revalidate the entity, current cached value is fine, send it".

Roy ultimately would be able to answer this?

Regards,
Graham


Re: mod_cache with Cache-Control no-cache= or private=

2013-03-13 Thread Yann Ylavic
Here is the patch that strips the headers specified by "no-cache=" or
"private=" after the origin server's validation, leaving only the
headers updated by the origin.

Regards,
Yann.

Index: modules/cache/cache_storage.c
===
--- modules/cache/cache_storage.c   (revision 1456050)
+++ modules/cache/cache_storage.c   (working copy)
@@ -156,6 +156,51 @@
         apr_table_unset(h->resp_hdrs, "Last-Modified");
     }

+    v = apr_table_get(h->resp_hdrs, "Cache-Control");
+    if (v && (h->cache_obj->info.control.no_cache_header ||
+              h->cache_obj->info.control.private_header)) {
+        /*
+         * RFC2616 14.9.1: If the no-cache directive does specify one or more
+         * field-names, then a cache MAY use the response to satisfy a
+         * subsequent request, subject to any other restrictions on caching.
+         * However, the specified field-name(s) MUST NOT be sent in the
+         * response to a subsequent request without successful revalidation
+         * with the origin server.
+         *
+         * Hence we will strip these cached headers (if any) and let the only
+         * ones validated by the origin server.
+         */
+        char *token;
+        apr_size_t len;
+        while ((token = ap_get_list_item(r->pool, &v))) {
+            /* ap_get_list_item() strips the spurious whitespaces and
+             * lowercases anything (but the quoted-strings) */
+            if (strncmp(token, "no-cache=", 9) == 0) {
+                token += 9;
+            }
+            else if (strncmp(token, "private=", 8) == 0) {
+                token += 8;
+            }
+            else {
+                continue;
+            }
+
+            /* RFC2616 14.9: quoted list of field-names */
+            len = strlen(token);
+            if (token[0] == '"' && token[--len] == '"') {
+                (++token)[--len] = '\0';
+                do {
+                    const char *name = ap_cache_tokstr(r->pool, token,
+                                                       (const char**)&token);
+                    if (name) {
+                        /* strip that name header the response */
+                        apr_table_unset(h->resp_hdrs, name);
+                    }
+                } while (token);
+            }
+        }
+    }
+
     /* The HTTP specification says that it is legal to merge duplicate
      * headers into one.  Some browsers that support Cookies don't like
      * merged headers and prefer that each Set-Cookie header is sent


Re: mod_cache with Cache-Control no-cache= or private=

2013-03-11 Thread Yann Ylavic
On Sun, Mar 10, 2013 at 1:55 AM, Graham Leggett  wrote:
> On 04 Mar 2013, at 8:22 PM, ylavic dev  wrote:
>
>> For what I understand, mod_cache is allowed to serve its cached entity 
>> (though without the specified header(s)).
>
> I read this through again, this time having slept properly.

Thank you for your enlightened consideration.

> The way I read the spec, "the specified field-name(s) MUST NOT be sent in the 
> response to a subsequent request without successful revalidation with the 
> origin server". What this means is that if the specified field names are 
> found to be present in the cached response, then the origin server needs to 
> be given the opportunity to update these fields through a conditional 
> request. In the current cache code, we return 0 meaning "this is stale, 
> revalidate", and a conditional request is sent to the origin. We hope the 
> origin sends "304 Not Modified", with updated headers corresponding to the 
> fields.

Ok, I see your point, and this is surely the right reading of the rfc,
but there is necessarily a difference between no-cache and
no-cache="", particularly with the handling of that (stale)
header(s).

For what I understand now, if the origin does not send (one of) the
header(s) in its 304 response, the stale header(s) "MUST NOT" be
served.
So mod_cache should never send these stale headers to the client, and
should either not cache them at all, or strip them before overlapping
the 304's ones.

The current code does not comply with this requirement since it
overlaps the stale headers with the origin ones, hence the no-cache
headers will still be there if they are not specified in the 304
response.
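
The "strip, then overlap" order described above can be sketched in
standalone C as follows. This is only an illustration under stated
assumptions: the fixed-size struct header_set and the hs_*() helpers are
hypothetical stand-ins for APR tables, and merge_after_304() is not a
mod_cache function.

```c
#include <ctype.h>
#include <stddef.h>

#define MAX_HDRS 16

struct header { const char *name; const char *value; };
struct header_set { struct header h[MAX_HDRS]; size_t n; };

/* Case-insensitive comparison of two header names. */
static int name_eq(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b)) {
            return 0;
        }
        a++;
        b++;
    }
    return *a == *b;
}

/* Remove every header called `name` from the set. */
void hs_unset(struct header_set *hs, const char *name)
{
    size_t i, out = 0;
    for (i = 0; i < hs->n; i++) {
        if (!name_eq(hs->h[i].name, name)) {
            hs->h[out++] = hs->h[i];
        }
    }
    hs->n = out;
}

/* Return the first value stored for `name`, or NULL. */
const char *hs_get(const struct header_set *hs, const char *name)
{
    size_t i;
    for (i = 0; i < hs->n; i++) {
        if (name_eq(hs->h[i].name, name)) {
            return hs->h[i].value;
        }
    }
    return NULL;
}

/* Build the headers of a revalidated response in `cached`:
 * 1. drop the stale copies of the protected fields,
 * 2. let every field present in the 304 replace the cached one.
 * Returns 0 on success, -1 if the set is full. */
int merge_after_304(struct header_set *cached,
                    const struct header_set *fresh,
                    const char *const *protected_names, size_t n_protected)
{
    size_t i;
    for (i = 0; i < n_protected; i++) {
        hs_unset(cached, protected_names[i]);
    }
    for (i = 0; i < fresh->n; i++) {
        hs_unset(cached, fresh->h[i].name);
    }
    for (i = 0; i < fresh->n; i++) {
        if (cached->n >= MAX_HDRS) {
            return -1;
        }
        cached->h[cached->n++] = fresh->h[i];
    }
    return 0;
}
```

With a cached Set-Cookie named in private="Set-Cookie" and a 304 that
does not carry a Set-Cookie, the merged set contains no Set-Cookie at
all; if the 304 does carry one, only the origin's new value survives.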

> If we were to follow this patch, it means that the first time we hit the URL, 
> the client sees the private/no-cache fields, but every cached response after 
> will be treated as fresh with the field missing. This breaks caching.

Indeed this patch is broken, but I can modify it to comply with my
comment above, meaning (at first glance) that the treatment should be
in cache_accept_headers().

Should I propose the new patch, or is my understanding definitely broken?

Regards,
Yann.


Re: mod_cache with Cache-Control no-cache= or private=

2013-03-09 Thread Graham Leggett
On 04 Mar 2013, at 8:22 PM, ylavic dev  wrote:

> I've been working on a patch for mod_cache to deal (fully) with the response 
> header Cache-Control and the no-cache= or private= directives.
> This "feature" is mainly used with the Set-Cookie header, and allows the 
> origin server to control the caching of that particular header.
> 
> Although the code is already there to detect their usage with a header, 
> mod_cache still handle these directives as if no header was specified.
> That is, "stale entity causing revalidation" [by the origin server].
> 
> The RFC-2616 (14.9.1 What is Cacheable) says this about the no-cache= 
> directive :
>   If the no-cache directive does specify one or more field-names,
>   then a cache MAY use the response to satisfy a subsequent request,
>   subject to any other restrictions on caching. However, the
>   specified field-name(s) MUST NOT be sent in the response to a
>   subsequent request without successful revalidation with the origin
>   server. This allows an origin server to prevent the re-use of
>   certain header fields in a response, while still allowing caching
>   of the rest of the response.
> For what I understand, mod_cache is allowed to serve its cached entity 
> (though without the specified header(s)).

I read this through again, this time having slept properly.

The way I read the spec, "the specified field-name(s) MUST NOT be sent in the 
response to a subsequent request without successful revalidation with the 
origin server". What this means is that if the specified field names are found 
to be present in the cached response, then the origin server needs to be given 
the opportunity to update these fields through a conditional request. In the 
current cache code, we return 0 meaning "this is stale, revalidate", and a 
conditional request is sent to the origin. We hope the origin sends "304 Not 
Modified", with updated headers corresponding to the fields.

If we were to follow this patch, it means that the first time we hit the URL, 
the client sees the private/no-cache fields, but every cached response after 
will be treated as fresh with the field missing. This breaks caching.

What you're trying to achieve needs to be handled by your origin server, which 
should support conditional requests, and then send updated Set-Cookie headers 
along with the 304 Not Modified responses. This way the body stays cached, but 
your Set-Cookie is updated on every hit.

Regards,
Graham


Re: mod_cache with Cache-Control no-cache= or private=

2013-03-06 Thread Graham Leggett
On 06 Mar 2013, at 12:04 PM, Yann Ylavic  wrote:

>> I've been working on a patch for mod_cache to deal (fully) with the
>> response header Cache-Control and the no-cache= or private=
>> directives.
> 
> I realize that, maybe, the patch should have been included in the
> message, rather than in an attachment, for it to be read quickly.
> So let me reply to myself with the patch below (which is not a big deal)...
> 
> Or maybe is there a reason not to include that functionality in
> mod_cache, with most of the code being already there ?
> I could not find any related discussion on the list or anywhere else
> (about mod_cache, other than statements that it is not implemented).

I reviewed the patch and it looks sane, but my schedule has been insane, and I 
need some sleep before I commit this.

Should have time over the weekend if not before. Complying with all of the RFC 
is the goal of mod_cache, thank you for contributing this.

Regards,
Graham


Re: mod_cache with Cache-Control no-cache= or private=

2013-03-06 Thread Yann Ylavic
Hi,

On Mon, Mar 4, 2013 at 7:22 PM, ylavic dev  wrote:
> I've been working on a patch for mod_cache to deal (fully) with the
> response header Cache-Control and the no-cache= or private=
> directives.

I realize that, maybe, the patch should have been included in the
message, rather than in an attachment, for it to be read quickly.
So let me reply to myself with the patch below (which is not a big deal)...

Or maybe is there a reason not to include that functionality in
mod_cache, with most of the code being already there ?
I could not find any related discussion on the list or anywhere else
(about mod_cache, other than statements that it is not implemented).

Regards,
Yann.

Index: modules/cache/cache_util.c
===
--- modules/cache/cache_util.c  (revision 1451191)
+++ modules/cache/cache_util.c  (working copy)
@@ -27,7 +27,7 @@

 extern module AP_MODULE_DECLARE_DATA cache_module;

-#define CACHE_SEPARATOR ",   "
+#define CACHE_SEPARATOR ", \t"

 /* Determine if "url" matches the hostname, scheme and port and path
  * in "filter". All but the path comparisons are case-insensitive.
@@ -542,17 +542,84 @@
     }

     /* These come from the cached entity. */
-    if (h->cache_obj->info.control.no_cache
-        || h->cache_obj->info.control.no_cache_header
-        || h->cache_obj->info.control.private_header) {
+    if (h->cache_obj->info.control.no_cache) {
         /*
-         * The cached entity contained Cache-Control: no-cache, or a
-         * no-cache with a header present, or a private with a header
-         * present, so treat as stale causing revalidation.
+         * The cached entity contained Cache-Control: no-cache, so
+         * treat as stale causing revalidation.
          */
         return 0;
     }
+    if (h->cache_obj->info.control.no_cache_header
+        || h->cache_obj->info.control.private_header) {
+        /*
+         * RFC2616 14.9.1: The cached entity contained
+         * Cache-Control: no-cache=, or Cache-Control: private=, with
+         * a header present, hence we are allowed to serve this entity,
+         * but without the specified headers, so let's strip them now,
+         * and fall through the other restrictions.
+         *
+         * Here we assume mixed Cache-Control: no-cache and no-cache=
+         * have been caught above and treated as stale causing revalidation,
+         * leaving here the only no-cache= and/or private= with a header.
+         */
+        char *token;
+        const char *header = apr_table_get(h->resp_hdrs, "Cache-Control");
+        while (header && (token = ap_get_list_item(r->pool, &header))) {
+            /* ap_get_list_item() strips the spurious whitespaces and
+             * lowercases anything (but the quoted-strings) */
+            if (strncmp(token, "no-cache=", 9) == 0) {
+                token += 9;
+            }
+            else if (strncmp(token, "private=", 8) == 0) {
+                token += 8;
+            }
+            else {
+                continue;
+            }
+
+            if (*token == '"') {
+                /* RFC2616 2.2: quoted-string
+                 * found no ap_*() function to unquote those strings,
+                 * so the job is done here... */
+                char *pos, *start, *end;
+                pos = start = end = token + 1;
+                while (*pos && *pos != '"') {
+                    if (*pos == '\\') {
+                        /* RFC2616 2.2: quoted-pair */
+                        if (end == pos) {
+                            /* duplicate to preserve the original token
+                             * should the quoted-string be invalid */
+                            start = apr_pstrdup(r->pool, start);
+                            pos = end = start + (pos - token) - 1;
+                        }
+                        /* skip the quote */
+                        pos++;
+                    }
+                    if (end != pos) {
+                        *end = *pos;
+                    }
+                    end++;
+                    pos++;
+                }
+                if (*pos == '"' && !*(pos + 1)) {
+                    /* valid quoted-string */
+                    token = start;
+                    *end = '\0';
+                }
+                else {
+                    /* invalid quoted-string, continue? fall through?
+                     * like ap_get_mime_headers_core() we do not check
+                     * headers' names validity, and just fall through,
+                     * is there a tiny chance to unset such a header? */
+                    /*continue;*/
+                }
+            }
+
+            /* strip that header from the response */
+            apr_table_unset(h->resp_hdrs, token);
+        }
+    }
+
     if ((agestr = apr_table_get(h->resp_hdrs, "Age"))) {
         age_c = apr_atoi64(agestr);
     }


Re: mod_cache incompatible with efficient PHP?

2011-09-18 Thread Graham Leggett

On 18 Sep 2011, at 7:36 AM, Bill Lipa wrote:


> According to this thread on serverfault:
> http://serverfault.com/questions/74025/apaches-mod-cache-not-caching-fcgi-php-output
> and this dormant bug:
> https://issues.apache.org/bugzilla/show_bug.cgi?id=48364
> the use of Action directives to handle php requests (or indeed any
> type of request) breaks mod_cache.
>
> I believe Action is a key part of the standard / recommended way to
> use php with fcgi, and fcgi is needed for even moderate efficiency.
> Being able to use mod_cache with PHP seems like a pretty important
> use case.  Is there hope for an Apache-only caching solution for php
> sites?


Reading through this bug again, this looks like a filter problem
rather than a mod_cache problem. To solve this, you would need to
confirm that the filter stack is not being bypassed or otherwise
being fiddled with during the request by the handler you're using
(mod_fastcgi?).

The debug log shows the cache filters being added, but then silence,
which in most cases means the filters like mod_cache never got the
chance to run.


Regards,
Graham


Re: mod_cache not caching 301s

2011-05-12 Thread Damon Green
Does anyone have any ideas on this? According to the docs it is supposed to
work.

Thanks,
Damon.


On Wed, May 11, 2011 at 9:49 AM, Damon Green  wrote:

> Hi Folks, I posted this question on users but haven't had any joy there,
> hoping someone here may know more.
>
> I have an issue with mod_cache, it refuses to cache redirects (301) and
> insists on cacheing 404 error responses, so really two issues.
>
> I'm using Apache 2.2.17 and the mod_cache/mod_disk_cache from Apache 2.3
> which serves stale content from its disk cache when the Tomcat is
> unavailable. (patched version from Graham Leggett)
>
> Trawling the list archives and docos imply that 404 responses should not be
> cached, and that 30x responses should be, but the behaviour I'm seeing is
> the opposite of that.
>
> I need 301 redirects to remain working (from the cache) when we disable
> Tomcat.
>
> To test this I've created a rewrite rule in the Apache conf:
>
> RewriteRule ^/damon/(.*) http://www.slashdot.org [R=301,L]
>
> Then cleared the cache, hit a page in /damon/, got redirected, nothing
> created in the disk cache.
> any 200 or 404 however creates files in the cache.
>
>
>
> http://httpd.apache.org/docs/2.2/caching.html
>
> # The response must have a HTTP status code of 200, 203, 300, 301 or 410.
>
> This is largely a function 13.4 in the RFC:
>
>A response received with a status code of 200, 203, 206, 300, 301 or
>410 MAY be stored by a cache and used in reply to a subsequent
>request, subject to the expiration mechanism, unless a cache-control
>directive prohibits caching. However, a cache that does not support
>the Range and Content-Range headers MUST NOT cache 206 (Partial
>Content) responses.
>
>
> Any advice or ideas gratefully received.
>
> Regards,
> Damon Green.
>
>


Re: mod_cache not caching 301s

2011-05-11 Thread Damon Green
On Wed, May 11, 2011 at 9:49 AM, Damon Green  wrote:

> Hi Folks, I posted this question on users but haven't had any joy there,
> hoping someone here may know more.
>
> I have an issue with mod_cache, it refuses to cache redirects (301) and
> insists on cacheing 404 error responses, so really two issues.
>
> I'm using Apache 2.2.17 and the mod_cache/mod_disk_cache from Apache 2.3
> which serves stale content from its disk cache when the Tomcat is
> unavailable. (patched version from Graham Leggett)
>
> Trawling the list archives and docos imply that 404 responses should not be
> cached, and that 30x responses should be, but the behaviour I'm seeing is
> the opposite of that.
>
> I need 301 redirects to remain working (from the cache) when we disable
> Tomcat.
>
> To test this I've created a rewrite rule in the Apache conf:
>
> RewriteRule ^/damon/(.*) http://www.slashdot.org [R=301,L]
>
> Then cleared the cache, hit a page in /damon/, got redirected, nothing
> created in the disk cache.
> any 200 or 404 however creates files in the cache.
>
>
>
> http://httpd.apache.org/docs/2.2/caching.html
>
> # The response must have a HTTP status code of 200, 203, 300, 301 or 410.
>
> This is largely a function 13.4 in the RFC:
>
>A response received with a status code of 200, 203, 206, 300, 301 or
>410 MAY be stored by a cache and used in reply to a subsequent
>request, subject to the expiration mechanism, unless a cache-control
>directive prohibits caching. However, a cache that does not support
>the Range and Content-Range headers MUST NOT cache 206 (Partial
>Content) responses.
>
>
> Any advice or ideas gratefully received.
>
> Regards,
> Damon Green.
>
>
This bug (https://issues.apache.org/bugzilla/show_bug.cgi?id=45273) mentions
that mod_dir might intercept the CACHE_SAVE filter, so I've tried disabling
mod_dir, though this has had no beneficial effect. Is there a way to force the
CACHE_SAVE filter on a per-response-code basis?

Thanks,
Damon Green.
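
For reference, the status-code rule quoted from RFC 2616 section 13.4
can be written down as a small C function. This is a sketch of the RFC
text only, not of mod_cache's actual checks; the function name and both
parameters are hypothetical. The second branch reflects the remainder of
section 13.4, which permits other status codes to be reused when a
header such as Expires or a Cache-Control directive explicitly allows
it; that may be relevant to the 404 responses being stored here, though
the thread does not confirm it.

```c
/* Hypothetical helper following RFC 2616 13.4.
 * status:           HTTP status code of the response
 * supports_ranges:  non-zero if the cache handles Range/Content-Range
 * explicit_allowed: non-zero if Expires or a Cache-Control directive
 *                   explicitly allows caching of this response
 * Returns 1 if the response may be stored and reused, 0 otherwise. */
int status_may_be_cached(int status, int supports_ranges,
                         int explicit_allowed)
{
    switch (status) {
    case 200:
    case 203:
    case 300:
    case 301:
    case 410:
        return 1;
    case 206:
        /* partial content needs Range support in the cache */
        return supports_ranges ? 1 : 0;
    default:
        /* e.g. 302, 307, 404: only with explicit permission */
        return explicit_allowed ? 1 : 0;
    }
}
```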


Re: mod_cache: serving stale content during outages

2010-10-19 Thread Mark Nottingham
FYI, while you're doing this it might be interesting to make it explicitly 
controllable by the origin:
   http://tools.ietf.org/html/rfc5861

Cheers,


On 12/10/2010, at 9:43 AM, Graham Leggett wrote:

> Hi all,
> 
> RFC2616 allows us to serve stale content during outages:
> 
>/* RFC2616 13.8 Errors or Incomplete Response Cache Behavior:
> * If a cache receives a 5xx response while attempting to revalidate an
> * entry, it MAY either forward this response to the requesting client,
> * or act as if the server failed to respond. In the latter case, it MAY
> * return a previously received response unless the cached entry
> * includes the "must-revalidate" cache-control directive (see section
> * 14.9).
> */
> 
> The next patch teaches mod_cache how to optionally serve stale content should 
> a backend be responding with 5xx errors, as per the RFC above.
> 
> In order to make this possible, the cache_out_filter needed to be cleaned up 
> so that it cleanly discarded data before the EOS bucket (instead of ignoring 
> it, as before). The cache_status hook needed to be updated so that 
> r->err_headers_out could be passed to it.
> 
> Regards,
> Graham
> --
> 

--
Mark Nottingham   http://www.mnot.net/
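
The two rules mentioned in this message can be sketched in C as below.
The first function follows the RFC 2616 13.8 text quoted above; the
second models the stale-if-error extension of RFC 5861 as a window of N
seconds after the response became stale. Both function names and all
parameters are hypothetical, and neither reflects the actual mod_cache
patch.

```c
/* RFC 2616 13.8: on a 5xx during revalidation, a cache MAY answer from
 * the stored entry unless that entry carries must-revalidate.
 * Returns 1 if the stale entry may be served, 0 otherwise. */
int may_serve_stale_on_error(int revalidation_status, int must_revalidate)
{
    if (revalidation_status < 500 || revalidation_status > 599) {
        return 0;   /* not a 5xx: forward the origin's answer */
    }
    return must_revalidate ? 0 : 1;
}

/* RFC 5861 stale-if-error=N, modelled as: the entry may be used on
 * error for N seconds after it became stale. All values in seconds;
 * pass a negative stale_if_error when the directive is absent. */
int within_stale_if_error(long age, long freshness_lifetime,
                          long stale_if_error)
{
    if (stale_if_error < 0) {
        return 0;
    }
    return (age - freshness_lifetime) <= stale_if_error ? 1 : 0;
}
```

An origin-controlled variant would consult the second function only
when the first one already permits serving stale content.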





Re: mod_cache: overriding Cache-Control and Vary

2010-10-17 Thread Roy T. Fielding
On Oct 17, 2010, at 9:19 AM, Graham Leggett wrote:

> Hi all,
> 
> One of the missing things that mod_cache can't do that other caches can is to 
> be able to override the Cache-Control and Vary headers, so that the cache can 
> be targeted for custom behaviour.
> 
> The classic use case is when you insert request headers into your server 
> stack, which you want to vary on. The Vary header in this case would make no 
> sense to the internet at large, or the Vary header for the benefit of the 
> internet may be different to the Vary header for the benefit of mod_cache.
> 
> What I propose are some simple per-directory directives, allowing you to 
> optionally override the names of Cache-Control, Pragma, and Vary headers for 
> outgoing requests, for example X-Cache-Control, X-Pragma or X-Vary.

No, we don't introduce X- prefix headers.  The functionality you describe
is not specific to cache -- just implement a response header filter
that the admin can configure as they wish and that is applied just after
the content generator.

Roy



Re: mod_cache: use of ap_log_error() instead of ap_log_rerror()

2010-10-17 Thread Stefan Fritsch
Hi Graham,

On Sunday 17 October 2010, Graham Leggett wrote:
> Across mod_cache, all the logging directives log at the server
> scope using ap_log_error(), instead of at the request scope
> ap_log_rerror().
> 
> While I suspect the original intention of this was because the
> quick_handler() is involved, is it true to assume that the
> ap_log_rerror() request scope logging function can't be used in a
> quick_handler()?

Up to 2.2.x, there was not much difference between server scope and 
request scope logging. I suspect that the original intention was just 
to omit the referrer which is appended in request scope.

I don't see any reason why request scope logging should not work for 
the quick handler. Probably all the per-directory log configuration 
will be ignored when using the quick handler, but IMHO that's just 
something that needs to be documented.

> Hoping that someone who knows the logging stuff better than I do
> can confirm, because I'd like to use ap_log_rerror() across the
> board if I can.

+1, because that would allow us to use some of the new features, like
correlating error log and access log entries via the log id.

Cheers,
Stefan


Re: mod_cache: disk layout for vary support

2010-10-10 Thread Igor Galić

- "William A. Rowe Jr."  wrote:

> On 10/10/2010 11:26 PM, Paul Querna wrote:
> > 
> > I would rather change the defaults to use only two letters and two
> > levels deep for the cache directories, and probably restrict the

see below

> > character set even further to just [a-zA-Z].
> > 
> > I think a case should be made for not using sub-directories inside the
> > .varies folder. Instead just flatten it out and put all the variants
> > inside a flat directory, rather than distributing them over the cache
> > again.

+1

> This also makes sense... 3x5 depth is insane if you aren't caching content
> for, say, a cellular ip access node.

In trunk the defaults are 2x2 -- but even that is basically advised
against: http://httpd.apache.org/docs/trunk/caching.html#disk

> What if variants had a single variable, number of directory levels, using
> the same dir segment length as the general case?  Then the typical default
> flavor could be 1 (additional dir layer) with the option to switch to 0
> for sites managing very few variants, or 2 if a large number of variants
> were expected.

-- 
Igor Galić

Tel: +43 (0) 664 886 22 883
Mail: i.ga...@brainsware.org
URL: http://brainsware.org/


Re: mod_cache: disk layout for vary support

2010-10-10 Thread William A. Rowe Jr.
On 10/10/2010 11:26 PM, Paul Querna wrote:
> 
> I would rather change the defaults to use only two letters and two
> levels deep for the cache directories, and probably restrict the
> character set even further to just [a-zA-Z].
> 
> I think a case should be made for not using sub-directories inside the
> .varies folder. Instead just flatten it out and put all the variants
> inside a flat directory, rather than distributing them over the cache
> again.

This also makes sense... 3x5 depth is insane if you aren't caching content
for, say, a cellular ip access node.

What if variants had a single variable, number of directory levels, using
the same dir segment length as the general case?  Then the typical default
flavor could be 1 (additional dir layer) with the option to switch to 0
for sites managing very few variants, or 2 if a large number of variants
were expected.


Re: mod_cache: disk layout for vary support

2010-10-10 Thread Paul Querna
On Sun, Oct 10, 2010 at 8:56 AM, Graham Leggett  wrote:
> Hi all,
>
> One of the things that needs to be fixed with mod_cache is the support for
> caching varying responses. In the current cache, we store it as below, as an
> additional directory tree below the original URL's directory tree. This
> wastes lots of inodes, and is very expensive to write.
>
> /tmp/cacheroot/
> /tmp/cacheroot//1uq
> /tmp/cacheroot//1uq/w...@d
> /tmp/cacheroot//1uq/w...@d/Fok
> /tmp/cacheroot//1uq/w...@d/Fok/HRU
> /tmp/cacheroot//1uq/w...@d/Fok/HRU/I62
> /tmp/cacheroot//1uq/w...@d/Fok/HRU/I62/QSJf2JA.header
> /tmp/cacheroot//1uq/w...@d/Fok/HRU/I62/QSJf2JA.header.vary
> /tmp/cacheroot//1uq/w...@d/Fok/HRU/I62/QSJf2JA.header.vary/thJ
> /tmp/cacheroot//1uq/w...@d/Fok/HRU/I62/QSJf2JA.header.vary/thJ/bK5
> /tmp/cacheroot//1uq/w...@d/Fok/HRU/I62/QSJf2JA.header.vary/thJ/bK5/im1
> /tmp/cacheroot//1uq/w...@d/Fok/HRU/I62/QSJf2JA.header.vary/thJ/bK5/im1/RSz
> /tmp/cacheroot//1uq/w...@d/Fok/HRU/I62/QSJf2JA.header.vary/thJ/bK5/im1/RSz/fCK
> /tmp/cacheroot//1uq/w...@d/Fok/HRU/I62/QSJf2JA.header.vary/thJ/bK5/im1/RSz/fCK/YHquMmA.data
> /tmp/cacheroot//1uq/w...@d/Fok/HRU/I62/QSJf2JA.header.vary/thJ/bK5/im1/RSz/fCK/YHquMmA.header
>
> What I have in mind is to move the varied content into the main tree, like
> this:
>
> /tmp/cacheroot/
> /tmp/cacheroot//1uq
> /tmp/cacheroot//1uq/w...@d
> /tmp/cacheroot//1uq/w...@d/Fok
> /tmp/cacheroot//1uq/w...@d/Fok/HRU
> /tmp/cacheroot//1uq/w...@d/Fok/HRU/I62
> /tmp/cacheroot//1uq/w...@d/Fok/HRU/I62/QSJf2JA.header
> /tmp/cacheroot//thJ
> /tmp/cacheroot//thJ/bK5
> /tmp/cacheroot//thJ/bK5/im1
> /tmp/cacheroot//thJ/bK5/im1/RSz
> /tmp/cacheroot//thJ/bK5/im1/RSz/fCK
> /tmp/cacheroot//thJ/bK5/im1/RSz/fCK/YHquMmA.data
> /tmp/cacheroot//thJ/bK5/im1/RSz/fCK/YHquMmA.header
>
> We reuse the same directory structure in the process, and keep the original
> QSJf2JA.header file indicating that the URL is a varied URL.

The problem with the second layout is that it makes it near impossible
to clear out all variants of a URL easily.

I think there are far more general issues with waste of inodes on the
cache disk format, than anything to do with varies.

I would rather change the defaults to use only two letters and two
levels deep for the cache directories, and probably restrict the
character set even further to just [a-zA-Z].

I think a case should be made for not using sub-directories inside the
.varies folder. Instead just flatten it out and put all the variants
inside a flat directory, rather than distributing them over the cache
again.

Thanks,

Paul


Re: mod_cache: disk layout for vary support

2010-10-10 Thread pfee
----- Original Message -----

From: William A. Rowe Jr. 
To: dev@httpd.apache.org
Sent: Sunday, 10 October, 2010 18:09:23
Subject: Re: mod_cache: disk layout for vary support

> On 10/10/2010 10:56 AM, Graham Leggett wrote:
> > 
> > One of the things that needs to be fixed with mod_cache is the support for
> > caching varying responses. In the current cache, we store it as below, as an
> > additional directory tree below the original URL's directory tree. This
> > wastes lots of inodes, and is very expensive to write.
> >
> > What I have in mind is to move the varied content into the main tree, [...]
> > We reuse the same directory structure in the process, and keep the original
> > QSJf2JA.header file indicating that the URL is a varied URL.
>
> +1

Currently the CacheDirLevels and CacheDirLength are also used to calculate 
the path for the varied entities.  What about having separate configuration 
for the vary sub-directory tree?

CacheDirLevels/CacheDirLength will be tuned for storing a huge number of 
URLs.  But how many variants do we expect per URL? A much smaller number.

I'd expect you can safely store all variants in a single subdirectory, i.e.
VaryCacheDirLevels=0.  That way when it comes to retrieving content, once
you've parsed the vary file, you have fewer inodes to deal with before getting
to the final content.


The directory tree would be:

/tmp/cacheroot/
/tmp/cacheroot//1uq
/tmp/cacheroot//1uq/w...@d
/tmp/cacheroot//1uq/w...@d/Fok
/tmp/cacheroot//1uq/w...@d/Fok/HRU
/tmp/cacheroot//1uq/w...@d/Fok/HRU/I62
/tmp/cacheroot//1uq/w...@d/Fok/HRU/I62/QSJf2JA.header
/tmp/cacheroot//1uq/w...@d/Fok/HRU/I62/QSJf2JA.header.vary
/tmp/cacheroot//1uq/w...@d/Fok/HRU/I62/QSJf2JA.header.vary/thJbK5im1RSzfCKYHquMmA.data
/tmp/cacheroot//1uq/w...@d/Fok/HRU/I62/QSJf2JA.header.vary/thJbK5im1RSzfCKYHquMmA.header

Is that better than starting over from the top level to find the varied 
content?

Thanks,
Paul
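
To make the effect of the levels/length settings concrete, here is a rough sketch of how a hashed key could be split into directories. It is a Python model, not the mod_disk_cache code; the function name is made up for illustration, and VaryCacheDirLevels is only the hypothetical setting suggested above. The hash string is the variant name from the listings in this thread.

```python
def cache_path(root, hashed_key, dir_levels, dir_length, suffix):
    """Split hashed_key into dir_levels directories of dir_length chars each."""
    parts = []
    pos = 0
    for _ in range(dir_levels):
        parts.append(hashed_key[pos:pos + dir_length])
        pos += dir_length
    parts.append(hashed_key[pos:] + suffix)   # the remainder is the file name
    return "/".join([root] + parts)

# Five levels of three characters, as in the listings in this thread:
deep = cache_path("/tmp/cacheroot", "thJbK5im1RSzfCKYHquMmA", 5, 3, ".data")
# Zero levels, i.e. all variants flat in one directory:
flat = cache_path("QSJf2JA.header.vary", "thJbK5im1RSzfCKYHquMmA", 0, 3, ".data")
```

With zero levels the whole hash becomes the file name, so a lookup touches one directory instead of five.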






Re: mod_cache: disk layout for vary support

2010-10-10 Thread William A. Rowe Jr.
On 10/10/2010 10:56 AM, Graham Leggett wrote:
> 
> One of the things that needs to be fixed with mod_cache is the support for
> caching varying responses. In the current cache, we store it as below, as an
> additional directory tree below the original URL's directory tree. This
> wastes lots of inodes, and is very expensive to write.
>
> What I have in mind is to move the varied content into the main tree, [...]
> We reuse the same directory structure in the process, and keep the original
> QSJf2JA.header file indicating that the URL is a varied URL.

+1


Re: mod_cache: scoping directives to per directory/location

2010-09-30 Thread Niklas Edmundsson

On Thu, 30 Sep 2010, Graham Leggett wrote:


> Hi all,
>
> In the case of some of the mod_cache and mod_disk_cache directives, there
> isn't a reason to force these directives to be server wide, they can be per
> location instead. These are mainly directives that control what goes into the
> cache, like CacheStorePrivate, CacheStoreNoStore, and CacheMaxExpire.
>
> The criterion for this to be possible is that the directive must control what
> goes into the cache, rather than what comes out. What comes out of the cache
> is handled in the quick handler, and the directory and location walks are
> bypassed, hiding the directives.


+1

/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | ni...@acc.umu.se
---
 Too rich for me. - O'Brien
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


RE: mod_cache: store_body() bites off more than it can chew

2010-09-13 Thread Paul Fee
"Plüm, Rüdiger, VF-Group" wrote:

>  
> 
>> -----Original Message-----
>> From: Graham Leggett
>> Sent: Montag, 13. September 2010 16:35
>> To: dev@httpd.apache.org
>> Subject: Re: mod_cache: store_body() bites off more than it can chew
>> 
>> On 13 Sep 2010, at 4:18 PM, Plüm, Rüdiger, VF-Group wrote:
>> 
>> > It is not a problem for mod_disk_cache as you say, but
>> > I guess he meant for 3rd party providers that could only deliver
>> > the cached responses via heap buckets.
>> 
>> The cache provider itself puts the bucket in the brigade, and has the
>> power to put any bucket into the brigade it likes, including its own
>> custom developed buckets. The fact that brigades become heap buckets
>> when read is a property of our bucket brigades, it isn't a
>> restriction applied by the cache.
>> 
>> For example, in the large disk cache patch, a special bucket was
>> invented that represented a file that was not completely present,
>> and that blocked waiting for more data if the in-flight cache file was
>> not yet all there. There was no need to change the API to support this
>> scenario, the cache just dropped the special bucket into the brigade
>> and it was done.
> 
> Yeah, but in a tricky way, which is absolutely fine and cool if you cannot
> change the API, but the question is: Is this the way providers
> should go and does the API look as it should?
> 
> Regards
> 
> Rüdiger

Hi,

I'm familiar with the FILE bucket and have considered implementing a new 
bucket type that would have similar morphing properties for our custom 3rd 
party cache provider.

Currently a handler has the ability to call ap_pass_brigade multiple times 
hence can produce large bodies in small chunks.  The CACHE_OUT filter as 
currently implemented does not offer that, forcing a 3rd party provider to 
implement their own bucket type if HEAP buckets would occupy too much 
memory.

Changing CACHE_OUT filter to call recall_body() repeatedly until an EOS is 
obtained is a small change.  More importantly, it won't affect existing 
providers as they'll produce a brigade with an EOS bucket on their first 
invocation.

Custom bucket types may be a better approach, but shouldn't the CACHE_OUT 
filter be able to send the content in multiple brigades in the same way a 
handler would?

Thanks,
Paul
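
A minimal sketch of the loop suggested above, as a Python model with lists standing in for brigades. The names chunked_provider and cache_out are assumptions for illustration, and EOS is just a marker object, not the APR bucket type.

```python
EOS = object()  # stand-in for an end-of-stream bucket

def chunked_provider(body, chunk_size):
    """Hypothetical provider: each recall_body() call returns one small brigade."""
    offset = 0

    def recall_body():
        nonlocal offset
        brigade = []
        if offset < len(body):
            brigade.append(body[offset:offset + chunk_size])
            offset += chunk_size
        if offset >= len(body):
            brigade.append(EOS)           # last brigade carries the EOS marker
        return brigade

    return recall_body

def cache_out(recall_body, pass_brigade):
    """Call recall_body() repeatedly, passing each brigade on, until EOS is seen."""
    calls = 0
    while True:
        brigade = recall_body()
        calls += 1
        pass_brigade(brigade)
        if brigade and brigade[-1] is EOS:
            return calls
```

A provider that puts the whole body plus EOS into its first brigade is only called once, which matches the point that existing providers would not be affected.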


RE: mod_cache: store_body() bites off more than it can chew

2010-09-13 Thread Plüm, Rüdiger, VF-Group
 

> -----Original Message-----
> From: Graham Leggett 
> Sent: Montag, 13. September 2010 16:35
> To: dev@httpd.apache.org
> Subject: Re: mod_cache: store_body() bites off more than it can chew
> 
> On 13 Sep 2010, at 4:18 PM, Plüm, Rüdiger, VF-Group wrote:
> 
> > It is not a problem for mod_disk_cache as you say, but
> > I guess he meant for 3rd party providers that could only deliver
> > the cached responses via heap buckets.
> 
> The cache provider itself puts the bucket in the brigade, and has the
> power to put any bucket into the brigade it likes, including its own
> custom developed buckets. The fact that brigades become heap buckets
> when read is a property of our bucket brigades, it isn't a
> restriction applied by the cache.
> 
> For example, in the large disk cache patch, a special bucket was
> invented that represented a file that was not completely present,
> and that blocked waiting for more data if the in-flight cache file was
> not yet all there. There was no need to change the API to support this
> scenario, the cache just dropped the special bucket into the brigade
> and it was done.

Yeah, but in a tricky way, which is absolutely fine and cool if you cannot
change the API, but the question is: Is this the way providers
should go and does the API look as it should?

Regards

Rüdiger



Re: mod_cache: store_body() bites off more than it can chew

2010-09-13 Thread Graham Leggett

On 13 Sep 2010, at 4:18 PM, Plüm, Rüdiger, VF-Group wrote:


> It is not a problem for mod_disk_cache as you say, but
> I guess he meant for 3rd party providers that could only deliver
> the cached responses via heap buckets.

The cache provider itself puts the bucket in the brigade, and has the
power to put any bucket into the brigade it likes, including its own
custom developed buckets. The fact that brigades become heap buckets
when read is a property of our bucket brigades, it isn't a
restriction applied by the cache.

For example, in the large disk cache patch, a special bucket was
invented that represented a file that was not completely present,
and that blocked waiting for more data if the in-flight cache file was
not yet all there. There was no need to change the API to support this
scenario, the cache just dropped the special bucket into the brigade
and it was done.

> And unlike a handler in the same situation the cache provider's
> recall_body cannot run multiple passes through the output filter chain
> with multiple smaller brigades.


In theory we could teach the cache to keep calling the recall_body()  
function until recall_body() returned an EOS bucket, but by supporting  
this we're going against the traditional way that bucket brigades work.


Regards,
Graham
--



RE: mod_cache: store_body() bites off more than it can chew

2010-09-13 Thread Plüm, Rüdiger, VF-Group
 

> -----Original Message-----
> From: Graham Leggett 
> Sent: Montag, 13. September 2010 16:04
> To: dev@httpd.apache.org
> Subject: Re: mod_cache: store_body() bites off more than it can chew
> 
> On 13 Sep 2010, at 1:14 PM, Paul Fee wrote:
> 
> > Retrieving bodies from the cache has a similar scalability issue.  The
> > CACHE_OUT filter makes a single call to the provider's recall_body().  The
> > entire body must be placed in a single brigade which is sent along the
> > filter chain with a single ap_pass_brigade() call.
> 
> This isn't a problem for the cache, as the cached content is 

It is not a problem for mod_disk_cache as you say, but
I guess he meant for 3rd party providers that could only deliver
the cached responses via heap buckets.
And unlike a handler in the same situation the cache provider's
recall_body cannot run multiple passes through the output filter chain
with multiple smaller brigades.

> passed as  
> a single FILE bucket.

Regards

Rüdiger



Re: mod_cache: store_body() bites off more than it can chew

2010-09-13 Thread Graham Leggett

On 13 Sep 2010, at 1:14 PM, Paul Fee wrote:


> Retrieving bodies from the cache has a similar scalability issue.  The
> CACHE_OUT filter makes a single call to the provider's recall_body().  The
> entire body must be placed in a single brigade which is sent along the
> filter chain with a single ap_pass_brigade() call.


This isn't a problem for the cache, as the cached content is passed as  
a single FILE bucket.


When an upstream filter reads from the file bucket, the bucket  
"morphs" into a RAM resident HEAP bucket, followed by a file bucket  
representing the rest of the file. As long as a filter deletes each  
(heap) bucket as it reads it, we don't have memory issues.


This is no different to the default handler serving a static file.

For special needs, an implementation can define its own bucket types
which behave specially when read.


Regards,
Graham
--
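
The morphing behaviour described above can be modelled in a few lines. This is a Python sketch that only tracks byte counts, not real buckets; the function names and the 8000-byte read size are assumptions for illustration.

```python
def read_file_bucket(remaining, buff_size=8000):
    """Model one read of a FILE bucket: return (heap_bytes, remaining_after).

    The read splits off a heap bucket of at most buff_size bytes and leaves
    a file bucket for the rest. buff_size=8000 is an assumed value.
    """
    heap = min(remaining, buff_size)
    return heap, remaining - heap

def stream(file_size, delete_after_read, buff_size=8000):
    """Return the peak number of bytes held in heap buckets while streaming."""
    remaining, held, peak = file_size, 0, 0
    while remaining > 0:
        heap, remaining = read_file_bucket(remaining, buff_size)
        held += heap
        peak = max(peak, held)
        if delete_after_read:
            held -= heap      # the filter deletes each heap bucket once consumed
    return peak
```

With delete_after_read=True the peak stays at one read buffer regardless of file size; with False it grows to the whole file, which is the situation a component gets into when it has to keep the buckets around.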



Re: mod_cache: store_body() bites off more than it can chew

2010-09-13 Thread Paul Fee
Graham Leggett wrote:

> On 06 Sep 2010, at 11:00 PM, Paul Querna wrote:
> 
>> Isn't this problem an artifact of how all bucket brigades work, and is
>> present in all output filter chains?
>>
>> An output filter might be called multiple times, but a single bucket
>> can still contain a 4gb chunk easily.
>>
>> It seems to me it would be better to think about this holistically
>> down the entire output filter chain, rather than building in special
>> case support for this inside mod_cache's internal methods?
> 
> In the cache case, thinking about it a bit the in and out brigades are
> probably unavoidable, as the cache is a special case in that it wants
> to write the data twice, once to the cache, a second time to the rest
> of the filter stack. Right now, the cache is forced to read the
> complete brigade to cache it, no option to give up early. And the
> cache has no choice but to keep the brigade buckets in the brigade so
> that they can be passed a second time up the filter stack, no deleting
> buckets as you go like you normally would. Read one 4GB file bucket in
> the cache, and in the process the file bucket gets morphed into 1/2
> million heap buckets, oops. With two brigades, one in, one out, the in
> brigade can have the buckets removed as they are consumed, as normal,
> and moved to the out brigade. The cache can quit at any time, and the
> code following knows what data to write to the network (out), and what
> data to loop round and resend to the cache (in). The cache provider
> could choose to quit and ask to be called again either because writing
> took too long, or too much data was read (and in the process became
> heap buckets), either reason is fine.
> 
> That said, following on your suggestion of thinking about this in the
> general sense, it would be really nice if the filter stack had the
> option to say "I have bitten off as much of the brigade as I am
> prepared to chew on right now, and the leftovers are still in the
> brigade, can you call me back with this data, maybe with more data
> added, and I'll try swallow some more?".
> 
> In theory, that would mean all handlers (or entities that sent data)
> would no longer be allowed to make the blind assumption that the
> filter stack was willing to consume every possible set of buckets the
> handler wanted to send, and that the stack had the right to go "I'm
> full, give me a second to chew on this".
> 
> This wouldn't need separate brigades, probably just a return code that
> meant EAGAIN, and that was expected to be honoured by handlers.
> 
> Regards,
> Graham
> --

Retrieving bodies from the cache has a similar scalability issue.  The 
CACHE_OUT filter makes a single call to the provider's recall_body().  The 
entire body must be placed in a single brigade which is sent along the 
filter chain with a single ap_pass_brigade() call.

If a custom provider is using heap buckets and the body is large, then this 
can consume too much memory.  It would be better to loop, asking the 
provider repeatedly for portions of the body until the provider provides an 
EOS bucket.  Is there interest in a patch implementing this approach?

Thanks,
Paul


Re: mod_cache: store_body() bites off more than it can chew

2010-09-12 Thread Graham Leggett

On 06 Sep 2010, at 11:00 PM, Paul Querna wrote:


> Isn't this problem an artifact of how all bucket brigades work, and is
> present in all output filter chains?
>
> An output filter might be called multiple times, but a single bucket
> can still contain a 4gb chunk easily.
>
> It seems to me it would be better to think about this holistically
> down the entire output filter chain, rather than building in special
> case support for this inside mod_cache's internal methods?


In the cache case, thinking about it a bit the in and out brigades are  
probably unavoidable, as the cache is a special case in that it wants  
to write the data twice, once to the cache, a second time to the rest  
of the filter stack. Right now, the cache is forced to read the  
complete brigade to cache it, no option to give up early. And the  
cache has no choice but to keep the brigade buckets in the brigade so  
that they can be passed a second time up the filter stack, no deleting  
buckets as you go like you normally would. Read one 4GB file bucket in  
the cache, and in the process the file bucket gets morphed into 1/2  
million heap buckets, oops. With two brigades, one in, one out, the in  
brigade can have the buckets removed as they are consumed, as normal,  
and moved to the out brigade. The cache can quit at any time, and the  
code following knows what data to write to the network (out), and what  
data to loop round and resend to the cache (in). The cache provider  
could choose to quit and ask to be called again either because writing  
took too long, or too much data was read (and in the process became  
heap buckets), either reason is fine.
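
The "1/2 million" figure can be checked with quick arithmetic, assuming each read of the file bucket produces a heap bucket of 8000 bytes. That read size is an assumption here; it is not stated in this thread.

```python
FILE_SIZE = 4 * 1024**3     # one 4 GB file bucket, in bytes
READ_SIZE = 8000            # assumed bytes per heap bucket

# Ceiling division: the last, partial read still produces a bucket.
heap_buckets = -(-FILE_SIZE // READ_SIZE)
print(heap_buckets)         # 536871
```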


That said, following on your suggestion of thinking about this in the  
general sense, it would be really nice if the filter stack had the  
option to say "I have bitten off as much of the brigade as I am  
prepared to chew on right now, and the leftovers are still in the  
brigade, can you call me back with this data, maybe with more data  
added, and I'll try swallow some more?".


In theory, that would mean all handlers (or entities that sent data)  
would no longer be allowed to make the blind assumption that the  
filter stack was willing to consume every possible set of buckets the  
handler wanted to send, and that the stack had the right to go "I'm  
full, give me a second to chew on this".


This wouldn't need separate brigades, probably just a return code that  
meant EAGAIN, and that was expected to be honoured by handlers.


Regards,
Graham
--
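
A rough sketch of the "call me back" idea above, as a Python model with a list standing in for the brigade. EAGAIN and OK are plain strings here, and make_filter and handler are hypothetical names; nothing like this exists in the filter API as discussed in this thread.

```python
EAGAIN = "EAGAIN"   # stand-in for a hypothetical "call me back" status
OK = "OK"

def make_filter(max_buckets_per_call, sink):
    """Hypothetical filter that consumes at most max_buckets_per_call (>= 1)."""
    def output_filter(brigade):
        for _ in range(min(max_buckets_per_call, len(brigade))):
            sink.append(brigade.pop(0))    # consumed buckets leave the brigade
        return EAGAIN if brigade else OK   # leftovers stay for the next call
    return output_filter

def handler(buckets, output_filter):
    """Keep offering the same brigade until the filter has consumed all of it."""
    brigade = list(buckets)
    calls = 0
    while True:
        calls += 1
        if output_filter(brigade) == OK:
            return calls
```

The point of the model is that one brigade is enough: the leftovers simply stay where they are, and the sender has to honour the status instead of assuming everything was consumed.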



Re: mod_cache: store_body() bites off more than it can chew

2010-09-06 Thread Paul Querna
On Thu, Sep 2, 2010 at 10:16 AM, Graham Leggett  wrote:
> Hi all,
>
> An issue with mod_cache I would like to address this weekend is the
> definition of the store_body() function in the cache implementation
> provider:
>
>    apr_status_t (*store_body)(cache_handle_t *h, request_rec *r,
> apr_bucket_brigade *b);
>
> Right now, mod_cache expects a cache implementation to swallow the entire
> bucket brigade b before returning to mod_cache.
>
> This is fine until the bucket brigade b contains something really large,
> such as a single file bucket pointing at a 4GB DVD image (such a scenario
> occurs when files on a slow disk are cached on a fast SSD disk). At this
> point, mod_cache expects the cache implementation to swallow the entire
> brigade in one go, and this can take a significant amount of time, certainly
> enough time for the client to get bored and time out should the file be
> large and the original disk slow.

Isn't this problem an artifact of how all bucket brigades work, and is
present in all output filter chains?

An output filter might be called multiple times, but a single bucket
can still contain a 4gb chunk easily.

It seems to me it would be better to think about this holistically
down the entire output filter chain, rather than building in special
case support for this inside mod_cache's internal methods?

> What I propose is a change to the function that looks like this:
>
>    apr_status_t (*store_body)(cache_handle_t *h, request_rec *r,
> apr_bucket_brigade *in, apr_bucket_brigade *out);
>
> Instead of one brigade b being passed in, we pass two brigades in, one
> labelled "in", the other labelled "out".
>
> The brigade previously marked "b" becomes "in", and the cache implementation
> is free to consume as much of the "in" brigade as it sees fit, and as the
> "in" brigade is consumed, the consumed buckets are moved to the "out"
> brigade.
>
> If store_body() returns with an empty "in" brigade, mod_cache writes the
> "out" brigade to the output filter stack and we are done as is the case now.
>
> Should however the cache implementation want to take a breath, it returns to
> mod_cache with unconsumed bucket(s) still remaining in the "in" brigade.
> mod_cache in turn sends the already-processed buckets in the "out" brigade
> down the filter stack to the client, and then loops round, calling the
> store_body() function again until the "in" brigade is empty.
>
> In this way, the cache implementation has the option to swallow data in as
> many smaller chunks as it sees fit, and in turn the client gets fed data
> often enough to not get bored and time out if the file is very large.
>
> Regards,
> Graham
> --
>
>
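
The proposed in/out flow can be sketched as follows. This is a Python model with lists standing in for brigades; make_store_body, cache_save and the per-call limit are assumptions for illustration, not the mod_cache implementation.

```python
def make_store_body(max_per_call, cache_file):
    """Hypothetical provider: caches at most max_per_call (>= 1) buckets per call."""
    def store_body(in_brigade, out_brigade):
        for _ in range(min(max_per_call, len(in_brigade))):
            bucket = in_brigade.pop(0)
            cache_file.append(bucket)      # write the bucket to the cache...
            out_brigade.append(bucket)     # ...then move it to "out"
    return store_body

def cache_save(in_brigade, store_body, pass_brigade):
    """Loop until "in" is empty, sending "out" to the client after each call."""
    passes = 0
    while True:
        out_brigade = []
        store_body(in_brigade, out_brigade)
        pass_brigade(out_brigade)          # client gets data on every pass
        passes += 1
        if not in_brigade:
            return passes
```

Each pass sends something to the client before the provider is asked to swallow more, which is what keeps a slow cache write from starving the connection.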


Re: mod_cache: store_body() bites off more than it can chew

2010-09-06 Thread Stefan Fritsch
On Monday 06 September 2010, Paul Fee wrote:
> Currently headers and data are in separate files.  If they were in
> a single  file, the operating system is given more indication that
> these two items are tightly coupled.  For example, when the
> headers are read in, the O/S can readahead and buffer part of the
> body.

I think it would be better to use posix_fadvise() to give the OS hints 
to improve read-ahead. But this probably needs some profiling, it 
could have a negative effect on memory constrained systems.
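
For illustration, this is roughly what such a hint looks like from Python, which exposes the same call as os.posix_fadvise() on platforms that have it. The helper name is made up, and whether the hint actually helps is exactly the profiling question raised above.

```python
import os
import tempfile

def hint_sequential_read(fd):
    """Ask the OS for more read-ahead on fd; return False if unavailable."""
    if not hasattr(os, "posix_fadvise"):
        return False                       # platform has no posix_fadvise()
    try:
        # offset 0 with length 0 covers the whole file
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)
    except OSError:
        return False                       # the hint was rejected for this file
    return True

with tempfile.TemporaryFile() as f:
    f.write(b"x" * 4096)
    f.flush()
    supported = hint_sequential_read(f.fileno())
```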


Re: mod_cache: store_body() bites off more than it can chew

2010-09-06 Thread Niklas Edmundsson

On Mon, 6 Sep 2010, Paul Fee wrote:


> If mod_disk_cache's on disk format is changing, now may be an opportunity to
> investigate some options to improve performance of httpd as a caching proxy.
>
> Currently headers and data are in separate files.  If they were in a single
> file, the operating system is given more indication that these two items are
> tightly coupled.  For example, when the headers are read in, the O/S can
> readahead and buffer part of the body.
>
> A difficulty with this could be refreshing the headers after a response to a
> conditional GET.  If the headers are at the start of the file and they
> change size, then they may overwrite the start of the existing body.  You
> could leave room for expansion (risks wasted space and may not be enough) or
> you could put the headers at the end of the file (may not benefit from
> readahead).


I tried to go the single-file route, but after having banged my head
against the above issue and others while trying to design/implement
something that would work for read-while-caching using only
O_EXCL file locking, I did some benchmarking, found out that the gain
was minimal, and reverted to having a separate header and body file.


What DID matter VERY MUCH regarding performance was the totally bogus
defaults which affect the number of directories mod_disk_cache
creates. CacheDirLength 1 and CacheDirLevels 2 gives you 4096
directories (64^2) that holds files, that will hold many millions of 
files even on an fs that isn't too good at coping with many entries in 
a directory. With the defaults you tend to end up with one directory 
for each query, not very optimal.
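
The directory arithmetic above can be spelled out as a quick check. The alphabet size of 64 comes from the 64^2 figure above; the function names and the even-spread assumption are for illustration only.

```python
def leaf_directories(dir_levels, dir_length, alphabet_size=64):
    """Number of leaf directories the cache tree can have."""
    return alphabet_size ** (dir_levels * dir_length)

def files_per_directory(total_files, dir_levels, dir_length):
    """Average files per leaf directory, assuming hashes spread evenly."""
    return total_files / leaf_directories(dir_levels, dir_length)

print(leaf_directories(2, 1))                # 4096
print(files_per_directory(4_096_000, 2, 1))  # 1000.0
```

So with CacheDirLength 1 and CacheDirLevels 2, roughly four million cached files still average only about a thousand entries per directory.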


Also, set CacheRemoveDirectories false because otherwise 
mod_disk_cache creates and deletes directories all the time which is a 
total waste of time. If you need to delete cache dirs then you have 
tuned yourself into the wrong corner, so IMHO that part of 
mod_disk_cache is plainly wrong.


Oh, this rant applies for xfs on Linux while I was hacking on our 
large-file-cache-patchset. The basics should apply for most other 
fs/os combos too ;)



> On a similar theme, would filesystem extended attributes be suitable for
> storing the headers?  The cache file's contents would be the entity body.  A
> problem with this approach could be portability.  However the APR could
> abstract this, reverting to separate files on platforms/filesystems that
> didn't offer extended attributes.
>
> http://en.wikipedia.org/wiki/Extended_file_attributes
>
> I haven't tested extended attributes to see if they offer performance gains
> over separate header and body files.  However it seems cleaner to have both
> parts in one file.  It should also eliminate race conditions where
> headers/body could get out of sync.


I'm honestly not sure you will get any massive performance gains, only 
benchmarks will tell :) The consistency-issues should be cleaner 
though.


Also, you will/might lose any possibility to have multiple headers 
pointing to the same body (classic example is multiple URLs resulting 
in the same plain file).


/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | ni...@acc.umu.se
---
 IBM stands for Inferior But Marketable.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


Re: mod_cache: store_body() bites off more than it can chew

2010-09-06 Thread Graham Leggett

On 06 Sep 2010, at 3:10 PM, Paul Fee wrote:

> If mod_disk_cache's on disk format is changing, now may be an opportunity to
> investigate some options to improve performance of httpd as a caching proxy.
>
> Currently headers and data are in separate files.  If they were in a single
> file, the operating system is given more indication that these two items are
> tightly coupled.  For example, when the headers are read in, the O/S can
> readahead and buffer part of the body.
>
> A difficulty with this could be refreshing the headers after a response to a
> conditional GET.  If the headers are at the start of the file and they
> change size, then they may overwrite the start of the existing body.  You
> could leave room for expansion (risks wasted space and may not be enough) or
> you could put the headers at the end of the file (may not benefit from
> readahead).


This requirement is the reason the files are separate now. It's not  
that practical to have a header which can change size in time, and a  
body which can change size in time in the same file.
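
A tiny demonstration of the problem, using an in-memory buffer in place of a cache file. This is a Python sketch; the header strings and helper names are made up for illustration.

```python
import io

def write_single_file(headers, body):
    """Lay out headers followed directly by the body in one buffer."""
    f = io.BytesIO()
    f.write(headers)
    f.write(body)
    return f, len(headers)           # second value: offset where the body starts

def refresh_headers_in_place(f, new_headers):
    """Naively overwrite the header region starting at offset 0."""
    f.seek(0)
    f.write(new_headers)

f, body_offset = write_single_file(b"ETag: 1\n", b"BODYBODY")
refresh_headers_in_place(f, b"ETag: 1\nAge: 5\n")    # headers grew by 7 bytes
damaged_body = f.getvalue()[body_offset:]
print(damaged_body)                                   # b'Age: 5\nY'
```

The refreshed headers are seven bytes longer than the originals, so they run over the first seven bytes of the body. Keeping headers in their own file avoids having to reserve slack space or relocate the body.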


> On a similar theme, would filesystem extended attributes be suitable for
> storing the headers?  The cache file's contents would be the entity body.  A
> problem with this approach could be portability.  However the APR could
> abstract this, reverting to separate files on platforms/filesystems that
> didn't offer extended attributes.
>
> http://en.wikipedia.org/wiki/Extended_file_attributes
>
> I haven't tested extended attributes to see if they offer performance gains
> over separate header and body files.  However it seems cleaner to have both
> parts in one file.  It should also eliminate race conditions where
> headers/body could get out of sync.


mod_cache.h provides a provider mechanism for cache implementations,  
it is entirely possible to create a new cache implementation targeted  
directly at platforms where extended attributes are supported. The  
only requirement would be that it be possible to update the extended  
attributes atomically.


Regards,
Graham
--



Re: mod_cache: store_body() bites off more than it can chew

2010-09-06 Thread Paul Fee
Graham Leggett wrote:
> 
> Given that the make-cache-writes-atomic problem requires a change to
> the data format, it may be useful to look at this now, before v2.4 is
> baked, which will happen soon.
> 
> How much of a performance boost is the use-null-terminated-strings?
> 
> Regards,
> Graham
> --

If mod_disk_cache's on disk format is changing, now may be an opportunity to 
investigate some options to improve performance of httpd as a caching proxy.

Currently headers and data are in separate files.  If they were in a single 
file, the operating system is given more indication that these two items are 
tightly coupled.  For example, when the headers are read in, the O/S can 
readahead and buffer part of the body.

A difficulty with this could be refreshing the headers after a response to a 
conditional GET.  If the headers are at the start of the file and they 
change size, then they may overwrite the start of the existing body.  You 
could leave room for expansion (risks wasted space and may not be enough) or 
you could put the headers at the end of the file (may not benefit from 
readahead).

On a similar theme, would filesystem extended attributes be suitable for 
storing the headers?  The cache file's contents would be the entity body.  A 
problem with this approach could be portability.  However the APR could 
abstract this, reverting to separate files on platforms/filesystems that 
didn't offer extended attributes.

http://en.wikipedia.org/wiki/Extended_file_attributes

I haven't tested extended attributes to see if they offer performance gains 
over separate header and body files.  However it seems cleaner to have both 
parts in one file.  It should also eliminate race conditions where 
headers/body could get out of sync.

Thanks,
Paul


Re: mod_cache: store_body() bites off more than it can chew

2010-09-06 Thread Dan Poirier
On 2010-09-06 at 05:52, Niklas Edmundsson  wrote:

> On Fri, 3 Sep 2010, Graham Leggett wrote:
>> Been keen to do this for a while, this would definitely solve the
>> RAM problem, but wouldn't solve the time problem. Copying 4GB of
>> data from a slow disk can easily take minutes, and when Blu-ray
>> images start becoming common, the problem would get worse.
>
> Yup. The next step to solve that would be to be able to serve requests
> from cache while they are being cached. I don't know the RFC
> implications of doing this, but in real life it's really useful.

That would be cool.  If two requests come in for the same 4GB resource
close together, it's a shame to have to retrieve it twice.



Re: mod_cache: store_body() bites off more than it can chew

2010-09-06 Thread Niklas Edmundsson

On Mon, 6 Sep 2010, Graham Leggett wrote:


>> For those who have forgotten, that's what we do in our
>> large-file-caching-patchset for mod_disk_cache (hidden as an attachment to
>> https://issues.apache.org/bugzilla/show_bug.cgi?id=39380 but I should
>> really get around to upload an up2date version that applies cleanly to the
>> current 2.2 release). Some of the solutions there aren't really applicable
>> to httpd proper (mostly workarounds for missing infrastructure), but some
>> ideas are rather sane (like writing the header files in a single go with an
>> iovec with null terminated strings instead of crlf-stuff that needs to be
>> parsed). Oh, and the design caters for a shared data cache (ftp and rsync
>> access uses the same cache), which isn't really a priority for something in
>> httpd proper.
>
> Given that the make-cache-writes-atomic problem requires a change to the data
> format, it may be useful to look at this now, before v2.4 is baked, which
> will happen soon.


Indeed.

When at it, it might make sense to replace arch-specific data types 
like int and apr_size_t with apr_int32_t and such. Most people would 
have made the 32/64 bit transition already though, so it might be a 
non-issue.


Another good thing to have would be the filename of the matching
data/body file. httpd mod_disk_cache hashes this from the URL, but 
there may be smarter ways to do this at cache-time which requires the 
resulting filename to be stored (for example we use dev/inode on plain 
files to reduce data duplication when caching DVD images with dozens 
of known URLs). Size of that file is also good to have, on mismatch 
the cache is out of sync/corrupted (unless the file is being written, 
but then we know enough to start answering the query from cache).


Also we save r->filename to be able to fill it in when replying on a 
query (I think for making logging filenames work).



> How much of a performance boost is the use-null-terminated-strings?


As CPU is cheap nowadays, not much in end-to-end performance, but the 
logic of figuring out whether a header file is correct/complete 
becomes much easier when you construct the entire .header-file in an 
iovec, place the total header length in the on-disk structure, and 
then write it out.


Reading it in becomes reading main data structure, and then reading 
whatever length the structure indicates as headers. If you get more or 
less than the data structure says then something is wrong and you can 
either retry (if the header seems to be currently writing and the 
iovec size is too small so it takes multiple writes, but as the 
current mod_disk_cache code uses temporary files that's a non-issue) 
or discard it.


The current text-ish-based .header files offer no way of verifying the
integrity of the header file, and store_table()/read_table() carry
quite a lot of complexity when just handling the null terminated
strings as is would do nicely.
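
[Editorial sketch, not part of the original message.] The length-prefixed layout described above can be illustrated roughly as follows. This is a minimal sketch under stated assumptions: the disk_info struct, its field names, and the functions pack_headers() and headers_complete() are hypothetical and are not taken from mod_disk_cache or from the patchset. It packs into a flat buffer where the message speaks of an iovec written in one go, and it uses fixed-width integer types in the spirit of the apr_int32_t remark.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed-size on-disk record; field names are illustrative only. */
typedef struct {
    uint32_t format;      /* format version of the header file */
    uint32_t header_len;  /* total bytes of NUL-terminated strings that follow */
} disk_info;

/* Pack the record plus n NUL-terminated strings into buf.
 * Returns the number of bytes written, or 0 if buf is too small. */
static size_t pack_headers(unsigned char *buf, size_t cap,
                           const char *const *strs, size_t n)
{
    disk_info info = { 1, 0 };
    size_t i, off = sizeof info;

    for (i = 0; i < n; i++)
        info.header_len += (uint32_t)(strlen(strs[i]) + 1);
    if (cap < sizeof info + info.header_len)
        return 0;

    memcpy(buf, &info, sizeof info);
    for (i = 0; i < n; i++) {
        size_t len = strlen(strs[i]) + 1;   /* include the terminating NUL */
        memcpy(buf + off, strs[i], len);
        off += len;
    }
    return off;
}

/* Returns 1 if the bytes read match what the record promises, else 0.
 * A short or long read means the file is incomplete or corrupt. */
static int headers_complete(const unsigned char *buf, size_t got)
{
    disk_info info;

    if (got < sizeof info)
        return 0;
    memcpy(&info, buf, sizeof info);
    if (info.format != 1)
        return 0;
    if (got != sizeof info + info.header_len)
        return 0;
    /* The last byte must be a NUL so every string is terminated. */
    return info.header_len == 0 || buf[got - 1] == '\0';
}
```

The point of the design is that the reader needs no parsing to judge integrity: it compares the byte count it got against the count stored in the record and either accepts, retries, or discards.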


/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | ni...@acc.umu.se
---
 After three days of intense pain, the snake died. * Riker
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


Re: mod_cache: store_body() bites off more than it can chew

2010-09-06 Thread Graham Leggett

On 06 Sep 2010, at 11:52 AM, Niklas Edmundsson wrote:

>>> Regarding the issue of the disk cache cramming the entire file
>>> into memory/address space, an alternate solution could be that the
>>> cache returns buckets pointing to the cached file, ie that the
>>> cache consumed those pesky mmapped buckets. This way the cache
>>> could cache the file rather quickly independent of the speed of
>>> the client, so that the file caching is finished in a sane time so
>>> others can benefit from it even though the cache-initiating
>>> request is running at snail-speed...
>>
>> Been keen to do this for a while, this would definitely solve the
>> RAM problem, but wouldn't solve the time problem. Copying 4GB of
>> data from a slow disk can easily take minutes, and when Blu-ray
>> images start becoming common, the problem would get worse.
>
> Yup. The next step to solve that would be to be able to serve
> requests from cache while they are being cached. I don't know the
> RFC implications of doing this, but in real life it's really useful.


With this in place, it becomes easy for an implementation to decide to  
do "in flight" caching, plugging into the generic mod_cache provider  
interface.


> For those who have forgotten, that's what we do in our
> large-file-caching-patchset for mod_disk_cache (hidden as an attachment to
> https://issues.apache.org/bugzilla/show_bug.cgi?id=39380
> but I should really get around to upload an up2date version that
> applies cleanly to the current 2.2 release). Some of the solutions
> there aren't really applicable to httpd proper (mostly workarounds
> for missing infrastructure), but some ideas are rather sane (like
> writing the header files in a single go with an iovec with null
> terminated strings instead of crlf-stuff that needs to be parsed).
> Oh, and the design caters for a shared data cache (ftp and rsync
> access uses the same cache), which isn't really a priority for
> something in httpd proper.


Given that the make-cache-writes-atomic problem requires a change to  
the data format, it may be useful to look at this now, before v2.4 is  
baked, which will happen soon.


How much of a performance boost is the use-null-terminated-strings?

Regards,
Graham
--



Re: mod_cache: store_body() bites off more than it can chew

2010-09-06 Thread Niklas Edmundsson

On Fri, 3 Sep 2010, Graham Leggett wrote:




>> Regarding the issue of the disk cache cramming the entire file into
>> memory/address space, an alternate solution could be that the cache returns
>> buckets pointing to the cached file, ie that the cache consumed those pesky
>> mmapped buckets. This way the cache could cache the file rather quickly
>> independent of the speed of the client, so that the file caching is
>> finished in a sane time so others can benefit from it even though the
>> cache-initiating request is running at snail-speed...
>
> Been keen to do this for a while, this would definitely solve the RAM
> problem, but wouldn't solve the time problem. Copying 4GB of data from a slow
> disk can easily take minutes, and when Blu-ray images start becoming common,
> the problem would get worse.


Yup. The next step to solve that would be to be able to serve requests 
from cache while they are being cached. I don't know the RFC 
implications of doing this, but in real life it's really useful.


For those who have forgotten, that's what we do in our 
large-file-caching-patchset for mod_disk_cache (hidden as an 
attachment to https://issues.apache.org/bugzilla/show_bug.cgi?id=39380 
but I should really get around to upload an up2date version that 
applies cleanly to the current 2.2 release). Some of the solutions 
there aren't really applicable to httpd proper (mostly workarounds for 
missing infrastructure), but some ideas are rather sane (like writing 
the header files in a single go with an iovec with null terminated 
strings instead of crlf-stuff that needs to be parsed). Oh, and the
design caters for a shared data cache (ftp and rsync access uses the 
same cache), which isn't really a priority for something in httpd 
proper.


Sadly I haven't had any time to whack this into pieces and feed the 
appropriate parts into httpd trunk, but I can at least vouch for the 
fact that the functionality is really useful and essential when 
dealing with large files (ie dvd/bluray images) and that development 
in this direction is desired.



/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | ni...@acc.umu.se
---
 Must Go - Some Jehovahs witnesses need shouting at.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


Re: mod_cache: store_body() bites off more than it can chew

2010-09-03 Thread Graham Leggett

On 03 Sep 2010, at 4:25 PM, Niklas Edmundsson wrote:

> This could even go a bit further with providing the cache
> implementation with a hint of when it would be polite of it to
> return. I think it would probably be easier if the cache
> implementation knows what's expected of it. Or?


That I've covered separately in the email about atomic commits.

> Also, if the client hangs up, will the cache impl get the chance to
> finish its job (ie. completing the caching of a file instead of
> starting over later on)?


That is a decision made by mod_cache itself, not the implementation  
though, but it's definitely possible. In theory, if mod_cache kept  
track of a downstream failure, and then responded to the failure by  
reading from the backend and caching until done before returning the  
error, this would definitely work.


> A side-step from this: how would it interact with the thundering
> herd lock when a slow client is first to access a large file while
> other fast clients also want to access it? Wouldn't this just be
> another variety of the "client gets bored before reply" scenario?


The thundering herd lock never holds back a client or makes a client  
wait. When the URL is completely uncached, the lock allows the first  
hit to start to cache, and passes all subsequent requests through  
without caching, in the process stopping the huge race that used to  
occur while many requests attempted to cache the same file over and  
over until at least one response was successfully completed. When the  
URL is already cached and has recently gone stale, the first hit is  
allowed to hit the backend and (hopefully) refresh the entry, while  
subsequent requests are served stale content with a Warning (as per  
the RFC). There is a safety valve on the lock in that the lock only  
lives for a few seconds, so if the request to the backend breaks, a  
new request to attempt to freshen the cache will be attempted in a few  
seconds time.
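
[Editorial sketch, not part of the original message.] The lock behaviour described in the paragraph above can be summarised as a small decision function. This is a sketch under assumptions, not mod_cache's code: the type names, the decide() function, and the 5-second LOCK_MAX_AGE (standing in for "a few seconds") are all hypothetical.

```c
#include <time.h>

typedef enum { ENTRY_UNCACHED, ENTRY_STALE, ENTRY_FRESH } entry_state;

typedef enum {
    ACT_SERVE_CACHED,     /* entry is fresh: serve it from cache */
    ACT_FETCH_AND_CACHE,  /* this request took the lock: hit the backend and cache */
    ACT_PASS_THROUGH,     /* uncached, lock held elsewhere: hit backend, do not cache */
    ACT_SERVE_STALE       /* stale, lock held elsewhere: serve stale with a Warning */
} action;

/* Hypothetical per-URL lock record. */
typedef struct {
    int held;
    time_t taken_at;
} herd_lock;

/* Assumed value for "the lock only lives for a few seconds". */
#define LOCK_MAX_AGE 5

/* No branch ever makes the client wait; the lock only decides who caches. */
static action decide(entry_state st, herd_lock *lock, time_t now)
{
    if (st == ENTRY_FRESH)
        return ACT_SERVE_CACHED;

    /* Safety valve: an old lock is dropped so a broken backend request
     * cannot block refreshes indefinitely. */
    if (lock->held && now - lock->taken_at > LOCK_MAX_AGE)
        lock->held = 0;

    if (!lock->held) {
        lock->held = 1;
        lock->taken_at = now;
        return ACT_FETCH_AND_CACHE;
    }
    return st == ENTRY_STALE ? ACT_SERVE_STALE : ACT_PASS_THROUGH;
}
```

For example, the first request for an uncached URL gets ACT_FETCH_AND_CACHE, a second one arriving a second later gets ACT_PASS_THROUGH, and once the lock is older than LOCK_MAX_AGE the next request is allowed to try again.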


> Regarding the issue of the disk cache cramming the entire file into
> memory/address space, an alternate solution could be that the cache
> returns buckets pointing to the cached file, ie that the cache
> consumed those pesky mmapped buckets. This way the cache could cache
> the file rather quickly independent of the speed of the client, so
> that the file caching is finished in a sane time so others can
> benefit from it even though the cache-initiating request is running
> at snail-speed...


Been keen to do this for a while, this would definitely solve the RAM  
problem, but wouldn't solve the time problem. Copying 4GB of data from  
a slow disk can easily take minutes, and when Blu-ray images start  
becoming common, the problem would get worse.


Regards,
Graham
--



Re: mod_cache: store_body() bites off more than it can chew

2010-09-03 Thread Niklas Edmundsson

On Thu, 2 Sep 2010, Graham Leggett wrote:


> Should however the cache implementation want to take a breath, it returns to
> mod_cache with unconsumed bucket(s) still remaining in the "in" brigade.
> mod_cache in turn sends the already-processed buckets in the "out" brigade
> down the filter stack to the client, and then loops round, calling the
> store_body() function again until the "in" brigade is empty.
>
> In this way, the cache implementation has the option to swallow data in as
> many smaller chunks as it sees fit, and in turn the client gets fed data
> often enough to not get bored and time out if the file is very large.
>
> Regards,
> Graham



This could even go a bit further with providing the cache 
implementation with a hint of when it would be polite of it to return. 
I think it would probably be easier if the cache implementation knows 
what's expected of it. Or?


Also, if the client hangs up, will the cache impl get the chance to 
finish its job (ie. completing the caching of a file instead of 
starting over later on)?


A side-step from this: how would it interact with the thundering herd
lock when a slow client is first to access a large file while other fast
clients also want to access it? Wouldn't this just be another variety
of the "client gets bored before reply" scenario?


Regarding the issue of the disk cache cramming the entire file into 
memory/address space, an alternate solution could be that the cache 
returns buckets pointing to the cached file, ie that the cache 
consumed those pesky mmapped buckets. This way the cache could cache 
the file rather quickly independent of the speed of the client, so 
that the file caching is finished in a sane time so others can benefit 
from it even though the cache-initiating request is running at 
snail-speed...



/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | ni...@acc.umu.se
---
 "I'm going to hell. It's that simple. I am going straight to hell." - Veronica
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


Re: mod_cache: store_body() bites off more than it can chew

2010-09-02 Thread Graham Leggett

On 02 Sep 2010, at 8:45 PM, Ruediger Pluem wrote:

> I guess this makes sense for another reason as well. Looking at your example
> (a single file bucket of a 4 GB file) I think the current implementation
> can consume an insane amount of virtual memory in the httpd process as it
> transforms the file bucket into mmap buckets while reading the file bucket
> to store its contents in the cache.
> Taking the 4 GB example this kills a 32 bit process.


This is another long standing problem that this change would fix,  
definitely.


Regards,
Graham
--



Re: mod_cache: store_body() bites off more than it can chew

2010-09-02 Thread Ruediger Pluem


On 09/02/2010 07:16 PM, Graham Leggett wrote:
> Hi all,
> 
> An issue with mod_cache I would like to address this weekend is the
> definition of the store_body() function in the cache implementation
> provider:
> 
> apr_status_t (*store_body)(cache_handle_t *h, request_rec *r,
> apr_bucket_brigade *b);
> 
> Right now, mod_cache expects a cache implementation to swallow the
> entire bucket brigade b before returning to mod_cache.
> 
> This is fine until the bucket brigade b contains something really large,
> such as a single file bucket pointing at a 4GB DVD image (such a
> scenario occurs when files on a slow disk are cached on a fast SSD
> disk). At this point, mod_cache expects the cache implementation to
> swallow the entire brigade in one go, and this can take a significant
> amount of time, certainly enough time for the client to get bored and
> time out should the file be large and the original disk slow.

I guess this makes sense for another reason as well. Looking at your example
(a single file bucket of a 4 GB file) I think the current implementation
can consume an insane amount of virtual memory in the httpd process as it
transforms the file bucket into mmap buckets while reading the file bucket
to store its contents in the cache.
Taking the 4 GB example this kills a 32 bit process.

> 
> What I propose is a change to the function that looks like this:
> 
> apr_status_t (*store_body)(cache_handle_t *h, request_rec *r,
> apr_bucket_brigade *in, apr_bucket_brigade *out);
> 
> Instead of one brigade b being passed in, we pass two brigades in, one
> labelled "in", the other labelled "out".
> 
> The brigade previously marked "b" becomes "in", and the cache
> implementation is free to consume as much of the "in" brigade as it sees
> fit, and as the "in" brigade is consumed, the consumed buckets are moved
> to the "out" brigade.
> 
> If store_body() returns with an empty "in" brigade, mod_cache writes the
> "out" brigade to the output filter stack and we are done as is the case
> now.
> 
> Should however the cache implementation want to take a breath, it
> returns to mod_cache with unconsumed bucket(s) still remaining in the
> "in" brigade. mod_cache in turn sends the already-processed buckets in
> the "out" brigade down the filter stack to the client, and then loops
> round, calling the store_body() function again until the "in" brigade is
> empty.
> 
> In this way, the cache implementation has the option to swallow data in
> as many smaller chunks as it sees fit, and in turn the client gets fed
> data often enough to not get bored and time out if the file is very large.

Sounds reasonable and should solve the problem above as well, provided
that the downstream filters consume these buckets and delete them.
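
[Editorial sketch, not part of the original message.] The proposed in/out loop can be modelled in plain C as follows. Everything here is a stand-in: the brigade struct is a simple FIFO of bucket sizes instead of an apr_bucket_brigade, and store_body(), pass_brigade(), cache_body() and the per-call byte budget are hypothetical names and parameters chosen for illustration, not the provider API.

```c
#include <stddef.h>

#define MAX_BUCKETS 64

/* Hypothetical stand-in for a bucket brigade: a FIFO of bucket sizes. */
typedef struct {
    size_t len[MAX_BUCKETS];
    int head;   /* index of the first unconsumed bucket */
    int tail;   /* one past the last bucket */
} brigade;

static int brigade_empty(const brigade *b) { return b->head == b->tail; }

static void brigade_push(brigade *b, size_t len)
{
    if (b->tail < MAX_BUCKETS)
        b->len[b->tail++] = len;
}

/* Provider side: "store" at most budget bytes per call, moving what was
 * consumed from in to out. A bucket larger than the budget is split. */
static void store_body(brigade *in, brigade *out, size_t budget)
{
    while (!brigade_empty(in) && budget > 0) {
        size_t *cur = &in->len[in->head];
        size_t take = *cur < budget ? *cur : budget;

        brigade_push(out, take);
        *cur -= take;
        budget -= take;
        if (*cur == 0)
            in->head++;
    }
}

/* Downstream side: send the out brigade to the client and empty it.
 * Returns the number of bytes sent. */
static size_t pass_brigade(brigade *out)
{
    size_t sent = 0;

    while (!brigade_empty(out))
        sent += out->len[out->head++];
    out->head = out->tail = 0;
    return sent;
}

/* mod_cache side: call store_body() until in is empty, flushing out to the
 * client after every call. Returns the number of passes, or -1 if budget
 * is 0 (which could never make progress). */
static int cache_body(brigade *in, size_t budget, size_t *total)
{
    brigade out = { {0}, 0, 0 };
    int passes = 0;

    *total = 0;
    if (budget == 0)
        return -1;
    do {
        store_body(in, &out, budget);
        *total += pass_brigade(&out);
        passes++;
    } while (!brigade_empty(in));
    return passes;
}
```

With a single 10-byte bucket and a budget of 4, the client is fed three times (4, 4 and 2 bytes) instead of waiting for the whole bucket. Because the out brigade is emptied on every pass, memory use stays bounded by the budget, which is the condition raised above about downstream filters consuming and deleting the buckets.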

Regards

Rüdiger




RE: mod_cache with sub_request crashing

2010-02-12 Thread Shashwat Agarwal
Summarizing, to reproduce this crash it is essential to:

1. Use a subrequest to generate contents for the URL which is
   configured to be cached using mod_cache. While creating the subrequest
   it is essential to pass the main request's output filters:
     request_rec *rr = ap_sub_req_lookup_uri("/cache", r, r->output_filters);
   This causes "cache_save_filter" to be invoked on each ap_rwrite/ap_rflush.

2. Make multiple calls to ap_rwrite from within the subrequest.

3. Finally, send 2 requests for exactly the same URL (to be cached)
   simultaneously.

 

This is what I am trying to achieve here: 

I have a set of public facing URLs which are generated dynamically. These
URLs are configured to be cached for public contents. Now, these URLs need
be authorized by a  handler which then generates a new URL containing the
actual location of the content sought. Now to fetch this content, request
need be transferred to this new URL. External redirects for these URLs is
not an option as they are private URLs. Response from Internal-transfer to
these URLs cannot be cached as part of original URL. The only option here
is to use sub requests to which I am passing main URL's output filters to
facilitate caching.

 

It will be great if someone can point me how to fix this in case I am
using subrequests in an exceptional way.

 

Tx,

-Shashwat

 

From: Shashwat Agarwal [mailto:shashw...@decho.com] 
Sent: 11 February 2010 PM 10:54
To: dev@httpd.apache.org
Cc: 'Suresh Krishnappa'; 'L. Suresh'; 'Vinay Y S'
Subject: FW: mod_cache with sub_request crashing

 

Hi,

 

mod_cache is crashing when used with a subrequest. PFA the source of the
module causing the crash (mod_subreq.c). I am using httpd-2.2.8 with the
following details:

 

Server version: Apache/2.2.8 (Unix)
Server built:   Jan 24 2008 10:45:24
Server's Module Magic Number: 20051115:11
Server loaded:  APR 1.2.8, APR-Util 1.2.10
Compiled using: APR 1.2.8, APR-Util 1.2.10
Architecture:   32-bit
Server MPM:     Worker
  threaded:     yes (fixed thread count)
    forked:     yes (variable process count)
Server compiled with

 

Scenario is like this:

There is a URL: /hello for which cache is configured (see attached
httpd.conf). And the content for /hello is being generated using a
sub_request to url: /cache as:

request_rec *rr = ap_sub_req_lookup_uri("/cache", r, r->output_filters);
if(!rr)
{
...
}

 

Now when two requests to /hello are issued simultaneously,  apache is
crashing with the following call stack:

 

#0  0x00927b5c in cache_save_filter (f=0xb4707ea0, in=0xb470a940) at /usr/src/debug/httpd-2.2.8/modules/cache/mod_cache.c:368
 364 if (cache->in_checked) {
 365 /* pass the brigades into the cache, then pass them
 366  * up the filter stack
 367  */
 368 rv = cache->provider->store_body(cache->handle, r, in); <<< crashing here because of NULL cache->provider
 369 if (rv != APR_SUCCESS) {
 370 ap_log_error(APLOG_MARK, APLOG_DEBUG, rv, r->server,
 371  "cache: Cache provider's store_body failed!");
 372 ap_remove_output_filter(f);
 373 }
 374 return ap_pass_brigade(f->next, in);
 375 }
#1  0xb7ef7b60 in ap_pass_brigade (next=0xb4707df8, bb=0xb470a940) at /usr/src/debug/httpd-2.2.8/server/util_filter.c:526
#2  0xb7ee7db7 in ap_sub_req_output_filter (f=0xb4709560, bb=0xb470a940) at /usr/src/debug/httpd-2.2.8/server/request.c:1553
#3  0xb7ef7b60 in ap_pass_brigade (next=0xb4707df8, bb=0xb470a940) at /usr/src/debug/httpd-2.2.8/server/util_filter.c:526
#4  0xb7ede3d1 in ap_old_write_filter (f=0xb470a908, bb=0xb470a940) at /usr/src/debug/httpd-2.2.8/server/protocol.c:1405
#5  0xb7ef7b60 in ap_pass_brigade (next=0xb4707df8, bb=0xb470a940) at /usr/src/debug/httpd-2.2.8/server/util_filter.c:526
#6  0xb7ede349 in ap_rflush (r=0xb4709090) at /usr/src/debug/httpd-2.2.8/server/protocol.c:1609
#7  0x00dee4fc in modsubreq_handler_impl () from /etc/httpd/modules/libmod_subrequest.so
#8  0x00dee039 in modsubreq_handler () from /etc/httpd/modules/libmod_subrequest.so
#9  0xb7eec52d in ap_run_handler (r=0xb4709090) at /usr/src/debug/httpd-2.2.8/server/config.c:157
#10 0xb7eefef7 in ap_invoke_handler (r=0xb4709090) at /usr/src/debug/httpd-2.2.8/server/config.c:372
#11 0xb7ee79a9 in ap_run_sub_req (r=0xb4709090) at /usr/src/debug/httpd-2.2.8/server/request.c:1880
#12 0x00dee8f0 in modsubreq_handler_impl () from /etc/httpd/modules/libmod_subrequest.so
#13 0x00dee039 in modsubreq_handler () from /etc/httpd/modules/libmod_subrequest.so
#14 0xb7eec52d in ap_run_handler (r=0xb47071a8) at /usr/src/debug/httpd-2.2.8/server/config.c:157
#15 0xb7eefef7 in ap_invoke_handler (r=0xb47071a8) at /usr/src/debug/httpd-2.2.8/server/config.c:372
#16 0xb7efbe24 in ap_internal_redirect (new_uri=0xb470902c "/hello?s=1", r=0xb4703040) at /usr/src/debug/httpd-2.2.8/modules/http/http_request.c:4

Re: mod_cache sends 200 code instead of 304

2009-09-10 Thread Nicholas Sherlock

Graham Leggett wrote:
> Nicholas Sherlock wrote:
>> But couldn't it just send a 304 Not Modified code instead? At the moment
>> it ends up wasting large amounts of bandwidth on my website in the case
>> where you press refresh on an unmodified object in Firefox, which sends
>> these request headers:
>
> I kept this back to investigate as I have been ENOTIME, but I've noticed
> a small detail:


Actually, this problem was traced to a bug in PHP's Apache filter. It 
sets "no_local_copy" to 1 in its response to Apache, which denies 
mod_cache from creating its own 304 Not Modified response code.



> Etags and If-None-Match are HTTP/1.1 caching concepts, and yet you're
> sending a response back to the cache telling the cache that you are an
> HTTP/1.0 server.
>
> I suspect what is happening is that the cache is seeing an HTTP/1.0
> response with HTTP/1.1 headers in it, and is in turn ignoring your 304
> not modified response.
>
> Try changing your response to 'HTTP/1.1 304 Not Modified' instead.


I think I changed it to HTTP/1.0 as a last resort after I had exhausted 
all my other options. I changed it back to HTTP/1.1, and no change, it 
still gives the same behaviour.



> Another thing to check, you're using a function called "header" to set
> what is really the response status line, I'm not a php person, but that
> looks wrong to me.


header() is correct for setting response headers in PHP :).


> Check you aren't sending back a 200 OK without realising it (which will
> cause the cache to go "oh, the entity just got refreshed, send 200 back
> to the original client", which is in turn the symptom you are seeing).


The PHP script is definitely sending a 304, it logs it to a file to 
confirm (and I've verified that file). You can actually tell that 
mod_cache is getting the 304 response code, because mod_cache serves the 
document body from the cache along with the incorrect 200 code (the body 
of the 304 response from PHP itself is of course empty). Using that test 
code, if the branch that was supposed to set a 304 code set a 200 code 
instead, you would expect an empty document body.


I'm currently running unmodified Apache and PHP patched to not set 
no_local_copy=1 in its response constructor on my production server, and 
mod_cache works flawlessly - the 304 code is correctly sent to the 
client instead of the 200 code.
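
[Editorial sketch, not part of the original message.] The outcome described in this thread can be written down as a small decision function. This is a simplified model under assumptions, not mod_cache's logic: the client_req struct and status_for_client() are hypothetical, only the no_local_copy flag is named in the message, and ETags are compared with exact string equality (no weak comparison, no lists, no "*").

```c
#include <string.h>

/* Hypothetical view of the client's request. */
typedef struct {
    const char *if_none_match;  /* validator sent by the client, or NULL */
    int no_local_copy;          /* when set, a 304 must not be generated locally */
} client_req;

/* Decide which status the cache sends to the client after revalidating
 * a stale entry against the backend. */
static int status_for_client(const client_req *req, const char *cached_etag,
                             int backend_status)
{
    /* Anything other than 304 means the backend replaced the entity,
     * so its status is passed through. */
    if (backend_status != 304)
        return backend_status;

    /* Backend confirmed the cached copy. If the client already holds the
     * same entity, a 304 saves sending the body again. */
    if (!req->no_local_copy && req->if_none_match != NULL &&
        cached_etag != NULL && strcmp(req->if_none_match, cached_etag) == 0)
        return 304;

    /* Otherwise serve the cached body with a 200. */
    return 200;
}
```

Under this model, a request with a matching validator gets 304, while the same request with no_local_copy set gets 200 plus the cached body, which matches the symptom reported before PHP was patched.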


Cheers,
Nicholas Sherlock



Re: mod_cache sends 200 code instead of 304

2009-09-09 Thread Graham Leggett
Nicholas Sherlock wrote:

> If you make a conditional request for a cached document, but the
> document is expired in the cache, mod_cache currently passes on the
> conditional request to the backend. If the backend responds with a "304
> Not Modified" response that indicates that the cached copy is still up
> to date, mod_cache serves the contents of the cache to the client with a
> 200 code.
> 
> But couldn't it just send a 304 Not Modified code instead? At the moment
> it ends up wasting large amounts of bandwidth on my website in the case
> where you press refresh on an unmodified object in Firefox, which sends
> these request headers:

I kept this back to investigate as I have been ENOTIME, but I've noticed
a small detail:

> if (isset($_SERVER['HTTP_IF_NONE_MATCH']) &&
> $_SERVER['HTTP_IF_NONE_MATCH'] == $etag) {
> 
> /* At a users' request, the cache has been bypassed, but the
>  * document is still the same. Avoid costly response generation
>  * and waste of bandwidth by just sending not-modified.
>  */
> header('HTTP/1.0 304 Not Modified');
  
> 
> error_log(date('r')." - Response: 304 Not Modified\n");
> exit(); //Don't generate or send the body
> }   

Etags and If-None-Match are HTTP/1.1 caching concepts, and yet you're
sending a response back to the cache telling the cache that you are an
HTTP/1.0 server.

I suspect what is happening is that the cache is seeing an HTTP/1.0
response with HTTP/1.1 headers in it, and is in turn ignoring your 304
not modified response.

Try changing your response to 'HTTP/1.1 304 Not Modified' instead.

Another thing to check, you're using a function called "header" to set
what is really the response status line, I'm not a php person, but that
looks wrong to me.

Check you aren't sending back a 200 OK without realising it (which will
cause the cache to go "oh, the entity just got refreshed, send 200 back
to the original client", which is in turn the symptom you are seeing).

Regards,
Graham
--




Re: mod_cache, mod_deflate and Vary: User-Agent

2009-08-28 Thread Nick Kew


On 28 Aug 2009, at 06:13, toki...@aol.com wrote:



> Brian Akins of Turner Broadcasting, Inc. wrote...
>
> We are moving towards the 'if you say you support gzip,
> then you get gzip' attitude.


The only approach that makes sense.  Good to hear that from
folks as big as you.


> There isn't a browser in the world that can 'Accept Encoding'
> successfully for ALL mime types.


Huh?  Whyever not?  Encoding is orthogonal to MIME type,
and for the ability to decode to be dependent on MIME type
would indicate tortuously over-complicated and hopelessly
broken browser design.

--
Nick Kew


Re: mod_cache, mod_deflate and Vary: User-Agent

2009-08-27 Thread tokiley


> Brian Akins of Turner Broadcasting, Inc. wrote...
>
> We are moving towards the 'if you say you support gzip,
> then you get gzip' attitude.

There isn't a browser in the world that can 'Accept Encoding'
successfully for ALL mime types.

Some are better than others but there are always certain
mime types that should never be returned with any
'Content Encoding' regardless of what the browser
is saying.

In that sense, you can never really trust the 
'Accept-encoding: gzip, deflate' header at all.

There is (currently) no mechanism in the HTTP protocol
for a client to specify WHICH mime types it can
successfully decode.

It was supposed to be an 'all or nothing' DEVCAP
indicator but that's not how things have evolved in
the real world.

There are really only 3 choices...

1. Stick with the original spec and continue to treat
'Accept-encoding: whatever' as an 'all or nothing' indicator
with regards to possible mime types and treat every 
complaint of breakage as 'it's not our problem, your 
browser is non-compliant'.

2. Change the original spec and add a way for clients 
to indicate which mime types can be successfully
decoded and then wait for all the resulting support code 
to be added to all Servers and Proxies.

3. Do nothing, and let every individual Server owner
continue to find their own solution(s) to the problem(s).

Yours
Kevin Kiley

-Original Message-
From: Akins, Brian 
To: dev@httpd.apache.org 
Sent: Thu, Aug 27, 2009 9:42 am
Subject: Re: mod_cache, mod_deflate and Vary: User-Agent

On 8/26/09 3:20 PM, "Paul Querna"  wrote:

> I would write little lua scriptlets that map user agents to two
> buckets: supports gzip, doesnt support gzip.  store the thing in
> mod_cache only twice, instead of once for every user agent.

We do the same basic thing.  We are moving towards the "if you say you
support gzip, then you get gzip" attitude.  I think less than 1% of our
clients would be affected, and I think a lot of those are fake agents
anyway.


-- 
Brian Akins


Re: mod_cache, mod_deflate and Vary: User-Agent

2009-08-27 Thread Akins, Brian
On 8/26/09 3:20 PM, "Paul Querna"  wrote:

> I would write little lua scriptlets that map user agents to two
> buckets: supports gzip, doesnt support gzip.  store the thing in
> mod_cache only twice, instead of once for every user agent.

We do the same basic thing.  We are moving towards the "if you say you
support gzip, then you get gzip" attitude.  I think less than 1% of our
clients would be affected, and I think a lot of those are fake agents
anyway.


-- 
Brian Akins



Re: mod_cache, mod_deflate and Vary: User-Agent

2009-08-27 Thread tokiley

> William A. Rowe, Jr.
>
> I think we blew it :)
>
> Vary: user-agent is not practical for correcting errant browser behavior.

You have not 'blown it'.

From a certain perspective, it's the only reasonable thing to do.

Everyone keeps forgetting one very important aspect of this issue
and that is the fact that the 'Browsers' themselves are 
participating in the whole 'caching' scheme and that they
are the source of the actual requests, so their behavior is
as much a part of the equation as any inline proxy cache.

There is no real solution to this problem.

The HTTP protocol itself does not have the capability
to deal with things correctly with regards to 
compressed variants.

The only decision that anyone needs to make is 'Where is
the pain factor?'.

If you VARY on ANYTHING other than 'User-Agent' then this
might show some reduction of the pain factor at the proxy
level but you have now exponentially increased the pain
factor at the infamous 'Last Mile'.

Most modern browsers will NOT 'cache' anything that has
a 'Vary:' header OTHER than 'User-Agent:'. This is as true
today as it was 10 years ago.

The following discussion involving myself and some of the 
authors of the SQUID Proxy caching Server took place just 
short of SEVEN (7) YEARS ago but, as unbelievable as it might
seem, is still just as relevant ( and unresolved )...

http://marc.info/?l=apache-modgzip&m=103958533520502&w=2

It's way too long to reproduce here but here is just 
the SUMMARY part. You would have to access the link
above to read all the gory details...

[snip]

> Hello all.
>
> This is a continuation of the thread entitled...
>
> [Mod_gzip] "mod_gzip_send_vary=Yes" disables caching on IE
>
> After several hours spent doing my own testing with MSIE and
> digging into MSIE internals with a kernel debugger I think I
> have the answers.
>
> The news is NOT GOOD.
>
> I will start with a SUMMARY first for those who don't have the
> time to read the whole, ugly story but for those who want to
> know where the following 'conclusions' are coming from I
> refer you to the rest of the message and the "detail".
>
> SUMMARY
>
> There is only 1 request header value that you can use with
> "Vary:" that will cause MSIE to cache a non-compressed
> response and that is ( drum roll please ) "User-Agent".
>
> If you use ANY other (legal) request header field name in
> a "Vary:" header then MSIE ( Versions 4, 5 and 6 ) will
> REFUSE to cache that response in the MSIE local cache.
>
> This is why Jordan is seeing a caching problem and Slava
> is not. Slava is 'accidentally' using the only possible "Vary:"
> field name that will cause MSIE to behave as it should
> and cache a non-compressed response.
>
> Jordan is seeing non-compressed responses never being
> cached by MSIE because the responses are arriving
> with something other than "Vary: User-Agent" like
> "Vary: Accept-Encoding".
>
> It should be perfectly legal and fine to send "Vary: Accept-Encoding"
> on a non-compressed response that can 'Vary' on that field
> value and that response SHOULD be 'cached' by MSIE...
> but so much for assumptions. MSIE will NOT cache this response.
>
> MSIE will treat ANY field name other than "User-Agent"
> as if "Vary: *" ( Vary + STAR ) was used and it will
> NOT cache the non-compressed response.
>
> The reason the COMPRESSED responses are, in fact,
> always getting cached no matter what "Vary:" field name
> is present is just as I suspected... it is because MSIE
> decides it MUST cache responses that arrive with
> "Content-Encoding: gzip" because it MUST have a
> disk ( cache ) file to work with in order to do the
> decompression.
>
> The problem exists in ALL versions of MSIE but it's
> even WORSE for any version earlier than 5.0. MSIE 4.x
> will not even cache responses with "Vary: User-Agent".
>
> That's it for the SUMMARY.
>
> The rest of this message contains the gory details.

[/snip]
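
The rules in that SUMMARY can be written down as a small decision function. This is only a Python model of the behaviour as reported above, not of MSIE itself. Two things are assumptions: that a response with no Vary header is cached normally, and that the "compressed responses are always cached" observation also holds for 4.x.

```python
def msie_caches_response(vary, content_encoding=None, major_version=6):
    """Model of the reported rules: would MSIE keep this response in its local cache?"""
    # Compressed responses are reported as always cached, because MSIE
    # needs a disk file to do the decompression.
    if content_encoding and content_encoding.lower() == "gzip":
        return True
    if not vary:
        return True  # assumption: no Vary header means normal caching
    fields = {f.strip().lower() for f in vary.split(",") if f.strip()}
    if fields == {"user-agent"}:
        # Reported: MSIE 4.x will not cache even with "Vary: User-Agent".
        return major_version >= 5
    # Any other field name is reported to be treated like "Vary: *".
    return False
```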

I participated in another lengthy 'offline' discussion about
all this some 3 or 4 years ago again with the authors of 
SQUID. There was still no real resolution to the problem.

The general consensus was that if there is always going to
be a 'pain factor' then it's better to follow one of the
rules of Networking and assume the following...

"The least amount of resources will always be present
the closer you get to the last mile."

In other words... it's BETTER to live with some redundant
traffic at the proxy level, where the equipment and bandwidth 
is usually more robust and closer to the backbone, than to put 
the pain factor onto the 'last mile' where resources are usually
more constrained.

If anyone is going to start dropping some special code
anywhere to 'invisibly handle the problem' my suggestion
would be to look at coming up with a scheme that undoes
the damage these out-of-control redundant 'User-Agent' strings are 
causing. The only thing a proxy cache really needs to know is
whether a certain 'User-Agent' string represents a 
different level of DEVCAP th

Re: mod_cache, mod_deflate and Vary: User-Agent

2009-08-26 Thread Paul Querna
On Wed, Aug 26, 2009 at 2:50 PM, William A. Rowe,
Jr. wrote:
> Paul Querna wrote:
>>
>> Yes, write a Varied header to 'hash' plugin API for mod_cache.
>>
>> I would write little lua scriptlets that map user agents to two
>> buckets: supports gzip, doesnt support gzip.  store the thing in
>> mod_cache only twice, instead of once for every user agent.
>
> This doesn't solve the problem of each-and-every downstream proxy
> cache storing an excessively large number of copies.  Even if we
> strip down comments from the fields before choosing cache entries,
> Mozilla's many versions of Mozilla/2.0.3 and Gecko/20090731 tags
> are going to continue to proliferate copies.
>
> I'm suggesting that this might need to be 'invisibly' handled, not
> using Vary:, but by any proxy clever enough to detect the non-conforming
> browser to then strip the request to deflate/gzip.  At that point, the
> choice-of-two becomes obvious to all proxies and back end servers with
> this knowledge.  If this is unknown to an earlier proxy, the client
> could get the broken deflate/gzip content, but that seems unavoidable.
>
> Honestly, I can't see a way to honor HTTP/1.1 cache negotiation goals
> while minimizing cache pollution.

There isn't.  So, optimize your cache, strip caching headers to
downstream proxies.

Maybe Waka can fix it.


Re: mod_cache, mod_deflate and Vary: User-Agent

2009-08-26 Thread William A. Rowe, Jr.
Paul Querna wrote:
> 
> Yes, write a Varied header to 'hash' plugin API for mod_cache.
> 
> I would write little lua scriptlets that map user agents to two
> buckets: supports gzip, doesnt support gzip.  store the thing in
> mod_cache only twice, instead of once for every user agent.

This doesn't solve the problem of each-and-every downstream proxy
cache storing an excessively large number of copies.  Even if we
strip down comments from the fields before choosing cache entries,
Mozilla's many versions of Mozilla/2.0.3 and Gecko/20090731 tags
are going to continue to proliferate copies.

I'm suggesting that this might need to be 'invisibly' handled, not
using Vary:, but by any proxy clever enough to detect the non-conforming
browser to then strip the request to deflate/gzip.  At that point, the
choice-of-two becomes obvious to all proxies and back end servers with
this knowledge.  If this is unknown to an earlier proxy, the client
could get the broken deflate/gzip content, but that seems unavoidable.
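
A minimal Python sketch of that 'invisible' handling, assuming a proxy that rewrites the request before any cache lookup. The user-agent pattern is a made-up placeholder, not a real list of broken browsers, and the function name is hypothetical.

```python
import re

# Hypothetical marker for a non-conforming browser; not real data.
NON_CONFORMING = re.compile(r"ExampleBrokenBrowser/")


def sanitize_accept_encoding(headers):
    """Return a copy of the request headers with gzip/deflate removed
    from Accept-Encoding when the agent is known to be non-conforming."""
    result = dict(headers)
    if not NON_CONFORMING.search(headers.get("User-Agent", "")):
        return result
    kept = [
        coding.strip()
        for coding in headers.get("Accept-Encoding", "").split(",")
        if coding.strip()
        and coding.split(";")[0].strip().lower() not in ("gzip", "deflate")
    ]
    if kept:
        result["Accept-Encoding"] = ", ".join(kept)
    else:
        result.pop("Accept-Encoding", None)
    return result
```

After this step the cache only has to distinguish requests by what is left in Accept-Encoding, which is the choice-of-two described above.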

Honestly, I can't see a way to honor HTTP/1.1 cache negotiation goals
while minimizing cache pollution.

I did consider a module (lua or otherwise) that would 'interfere' in
the initial quick handler phase just to work out broken user agents,
rather than carry the entire weight of setenvif/headers to the quick
handler phase.




Re: mod_cache, mod_deflate and Vary: User-Agent

2009-08-26 Thread Paul Querna
On Wed, Aug 26, 2009 at 11:47 AM, William A. Rowe,
Jr. wrote:
> I think we blew it :)
>
> Vary: user-agent is not practical for correcting errant browser behavior.
>
> For example;
>
>  User-Agent: Mozilla/5.0 Gecko/20090729 Firefox/3.5.2
>
> produces a myriad number of 'variant' flavors when tagging Vary with
> the User-Agent when determining if the deflate/gzip compression should
> be served, or the uncompressed variant.
>
> What we really meant to do was to determine which Accept-Encoding values
> were invalid based on known browser bugs, and -remove them- from the A-E
> header *prior* to determining the cache handling (quick handler hook) or
> typical content handling.
>
> Which implies that setenvif + headers need an extra chance to run really
> first in front of the quick handler.
>
> Any better suggestions?

Yes, write a Varied header to 'hash' plugin API for mod_cache.

I would write little lua scriptlets that map user agents to two
buckets: supports gzip, doesnt support gzip.  store the thing in
mod_cache only twice, instead of once for every user agent.
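
The two-bucket idea can be sketched as follows, in Python rather than Lua. No such plugin API is shown in this thread, so the function names are hypothetical, the broken-agent pattern is a placeholder, and the gzip test is a naive substring check that ignores qvalues.

```python
import re

# Hypothetical pattern for agents assumed to mishandle gzip; not real data.
BROKEN_GZIP_AGENTS = re.compile(r"ExampleBrokenBrowser/\d")


def gzip_bucket(user_agent, accept_encoding):
    """Collapse any User-Agent string into one of two buckets."""
    claims_gzip = "gzip" in accept_encoding.lower()  # simplification: ignores q=0
    if claims_gzip and not BROKEN_GZIP_AGENTS.search(user_agent):
        return "gzip"
    return "identity"


def cache_key(url, user_agent, accept_encoding):
    """The cache varies on the bucket, not on the raw User-Agent value."""
    return (url, gzip_bucket(user_agent, accept_encoding))
```

With this mapping, however many distinct User-Agent strings arrive, a URL is stored at most twice.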


Re: mod_cache sends 200 code instead of 304

2009-07-25 Thread Nicholas Sherlock

Nicholas Sherlock wrote:
> Thanks, I wasn't certain if the behaviour I wanted was HTTP-correct, but
> it seems that it is (and anyway it'll save me on bandwidth costs, so I
> really want to fix it). I'll go add it now.


This is now bug report #47580

https://issues.apache.org/bugzilla/show_bug.cgi?id=47580

Cheers,
Nicholas Sherlock



Re: mod_cache sends 200 code instead of 304

2009-07-25 Thread Nicholas Sherlock

Dan Poirier wrote:
> Nicholas Sherlock  writes:
>
>> If you make a conditional request for a cached document, but the
>> document is expired in the cache, mod_cache currently passes on the
>> conditional request to the backend. If the backend responds with a
>> "304 Not Modified" response that indicates that the cached copy is
>> still up to date, mod_cache serves the contents of the cache to the
>> client with a 200 code.
>
> This wouldn't surprise me.  There's currently a bug open for the
> opposite case, returning a 304 to an unconditional request (45341).
>
> I believe this violates a SHOULD in 14.25 of RFC 2616, which isn't as
> strong as a MUST, but certainly would indicate it's worthwhile to try to
> fix it.
>
> I'd suggest opening a bug report
> (http://httpd.apache.org/bug_report.html), including all the details
> from your original message, so this doesn't fall through the cracks
> before someone gets to look at it in more depth.


Thanks, I wasn't certain if the behaviour I wanted was HTTP-correct, but 
it seems that it is (and anyway it'll save me on bandwidth costs, so I 
really want to fix it). I'll go add it now.


Cheers,
Nicholas Sherlock



Re: mod_cache sends 200 code instead of 304

2009-07-25 Thread Dan Poirier
Nicholas Sherlock  writes:

> If you make a conditional request for a cached document, but the
> document is expired in the cache, mod_cache currently passes on the
> conditional request to the backend. If the backend responds with a
> "304 Not Modified" response that indicates that the cached copy is
> still up to date, mod_cache serves the contents of the cache to the
> client with a 200 code.

This wouldn't surprise me.  There's currently a bug open for the
opposite case, returning a 304 to an unconditional request (45341).

I believe this violates a SHOULD in 14.25 of RFC 2616, which isn't as
strong as a MUST, but certainly would indicate it's worthwhile to try to
fix it.

I'd suggest opening a bug report
(http://httpd.apache.org/bug_report.html), including all the details
from your original message, so this doesn't fall through the cracks
before someone gets to look at it in more depth.
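
For reference, the behaviour being asked for can be sketched in Python. This is not mod_cache code: the function name is hypothetical, entity tags are compared as plain strings (no weak/strong distinction), and If-Modified-Since is matched by exact string equality instead of by date comparison.

```python
def status_for_client(request_headers, cached_headers):
    """Pick the status to send after the backend confirmed, with a 304,
    that the cached entry is still up to date."""
    inm = request_headers.get("If-None-Match")
    ims = request_headers.get("If-Modified-Since")
    if inm is not None:
        tags = [t.strip() for t in inm.split(",")]
        if "*" in tags or cached_headers.get("ETag") in tags:
            return 304  # the client's copy matches the revalidated entry
        return 200
    if ims is not None and ims == cached_headers.get("Last-Modified"):
        return 304
    return 200  # unconditional request: serve the cached body
```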

Dan


Re: mod_cache and module-based authentication

2009-02-10 Thread Graham Leggett

Jon Grov wrote:

> Our current workaround is to run two reverse proxy-instances, one which
> provides authentication (on port 80) and another providing cache (on port
> 7920, which is only accessible from within PROXY). A request then
> first hits the authentication proxy on port 80, and if valid, is
> forwarded to the caching proxy on local port 7920.
>
> This works, but it feels somewhat suboptimal, and we would much prefer to be
> able to use one instance to serve both purposes.


I have been tasked with solving a very similar problem: the ability to 
optionally place the cache anywhere in the output filter chain (instead 
of replacing the whole filter chain, as now).


The rationale is that we need to cache content before the INCLUDES 
filter gets hold of the content, and that is currently not possible. 
Give me a day or two.


Regards,
Graham
--


smime.p7s
Description: S/MIME Cryptographic Signature


Re: mod_cache not caching some RFC valid cacheable requests

2008-12-06 Thread Paul Querna

Alex Polvi wrote:

Hi there,

I ran into a weird case where *I think* mod_cache should be caching a
request that it is not. Thought I would try to fix it myself, but
would like to seek your feedback as it is my first httpd patch (thanks
to pquerna for the help/encouragement).

Caching a 302 is generally not valid (RFC2616 13.4), unless the
response headers include an Expires or Cache-Control header (section
13.4, last paragraph). This makes the fix a matter of messing with the
cacheability logic. I optimized for least amount of code change, but
there are surely different ways to do this. Feedback on the best
approach would be greatly appreciated!
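
The rule described above, reduced to the status-code part of the cacheability check, might look like this in Python. It is not the committed patch. The set of default-cacheable codes follows RFC 2616 section 13.4, the function name is hypothetical, and the header test mirrors the wording above (presence of Expires or Cache-Control) without inspecting the directives, so a value such as no-store would still need separate handling.

```python
# Status codes RFC 2616 13.4 lists as storable without explicit headers.
CACHEABLE_BY_DEFAULT = {200, 203, 206, 300, 301, 410}


def status_allows_caching(status, headers):
    """True if the status code alone does not rule out caching."""
    if status in CACHEABLE_BY_DEFAULT:
        return True
    names = {name.lower() for name in headers}
    # Other codes (e.g. 302) need an explicit Expires or Cache-Control header.
    return "expires" in names or "cache-control" in names
```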

Thanks,

-Alex

PS: I also filed a bug, if that is a better forum for this discussion:
https://issues.apache.org/bugzilla/show_bug.cgi?id=46346


Committed to trunk in r724093,

Thanks,

-Paul


Re: mod_cache: Don't update when req max-age=0?

2007-05-24 Thread Henrik Nordstrom
On Thu 2007-05-24 at 13:22 +0200, Niklas Edmundsson wrote:

> c) RFC-wise it seems to me that a not-modified object is a
> not-modified object. There is no guarantee that next request will
> hit the same cache, so nothing can expect a max-age=0 request to
> force a cache to rewrite its headers and then access it with
> max-age!=0 and get headers of that age.

Yes. RFC wise it's fine to not update the cache with the 304. Updating
of cached entries is optional (RFC2616 10.3.5 last paragraph).

The only MUST regarding 304 and caches is that you MUST ignore the 304
and retry the request without the conditional if the 304 indicates
another object than what is currently cached (i.e. ETag or Last-Modified
differs).  (same section, the paragraph above)
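
That MUST can be sketched as a validator comparison. This is a Python illustration with a hypothetical function name; validators are compared as plain strings, and a validator missing from the 304 is assumed not to indicate a different object.

```python
def must_retry_unconditionally(cached_headers, not_modified_headers):
    """True if the 304 describes a different object than the cached one,
    in which case the 304 is ignored and the request is repeated
    without the conditional headers."""
    for validator in ("ETag", "Last-Modified"):
        new_value = not_modified_headers.get(validator)
        if new_value is not None and new_value != cached_headers.get(validator):
            return True
    return False
```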

Regards
Henrik


signature.asc
Description: This is a digitally signed message part


Re: mod_cache: Don't update when req max-age=0?

2007-05-24 Thread Niklas Edmundsson

On Thu, 24 May 2007, Sander Striker wrote:


>> -8<---
>> Does anybody see a problem with changing mod_cache to not update the
>> stored headers when the request has max-age=0, the body turns out not
>> to be stale and the on-disk header hasn't expired?
>> -8<---
>
> My understanding:
>
> It's fine in an RFC point of view for the cache to completely ignore a
> 304 and not update the stored entity at all. But the response to this
> request should be the merge of the two responses assuming the
> conditional was added by the cache.

This is in line with my understanding, and since the response-merging
is being done today the only change that would be done is to skip
storing the header to disk. I think it would be wise to only skip the
storing for the max-age=0 case though.


Why limit it to the max-age=0 case?  Isn't it a general improvement?


Consider a default cache lifetime of 86400 seconds, and requests 
coming in with max-age=4 (we see a lot of mozilla downloads with 
this, for example). If you don't rewrite the on-disk headers you'll 
end up always hitting your backend when you pass an age of 4.
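
A toy simulation of the max-age=4 case, in Python. The model is an assumption made for illustration: time is in whole seconds, every revalidation is answered with a 304, and only the request's max-age is checked (the 86400 second default lifetime never comes into play over such a short run).

```python
def backend_hits(request_times, max_age, rewrite_headers):
    """Count backend revalidations for requests carrying max-age=<max_age>."""
    stored_date = 0  # time the on-disk headers were last written
    hits = 0
    for now in request_times:
        if now - stored_date > max_age:
            hits += 1  # entry is too old for this request: revalidate
            if rewrite_headers:
                stored_date = now  # the refreshed headers reset the age
    return hits


# One request per second for 20 seconds, each with max-age=4.
print(backend_hits(range(1, 21), 4, rewrite_headers=True))
print(backend_hits(range(1, 21), 4, rewrite_headers=False))
```

With the rewrite the backend is contacted once every five seconds; without it, every request after the fourth second goes to the backend.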


In the max-age=0 case you only force an unnecessary header write, 
because:

a) The written header won't be useful for other requests with
   max-age=0. A ground rule of caching is to not save stuff that's
   never used.
b) Requests with max-age!=0 aren't helped much by it, the only penalty
   would be when a max-age!=0 request causes a header rewrite that
   a max-age=0 access would have performed. Doing this single rewrite
   instead of potentially thousands if rewriting due to max-age=0
   is a rather big win.
c) RFC-wise it seems to me that a not-modified object is a
   not-modified object. There is no guarantee that next request will
   hit the same cache, so nothing can expect a max-age=0 request to
   force a cache to rewrite its headers and then access it with
   max-age!=0 and get headers of that age.
d) Also, an object tends to be accessed with more-or-less the same
   max-age. So to store headers in the max-age=0 case just because it
   might be accessed by max-age!=0 makes no sense, since it's more
   likely that the next request to this object will have the same
   max-age.


/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 Did I just step on someones toes again??
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


Re: mod_cache: Don't update when req max-age=0?

2007-05-24 Thread Graham Leggett
On Thu, May 24, 2007 10:23 am, Sander Striker wrote:

>> > It's fine in an RFC point of view for the cache to completely ignore a
>> > 304 and not update the stored entity at all. But the response to this
>> > request should be the merge of the two responses assuming the
>> > conditional was added by the cache.
>>
>> This is in line with my understanding, and since the response-merging
>> is being done today the only change that would be done is to skip
>> storing the header to disk. I think it would be wise to only skip the
>> storing for the max-age=0 case though.
>
> Why limit it to the max-age=0 case?  Isn't it a general improvement?

It isn't - the net effect of not storing the headers to disk is that
once a fresh object goes stale, it will remain stale until the end of
days, because the mechanism to make that object fresh again has been
removed.

If the object remains stale, it means that a conditional request will be
generated and sent to the backend on every single hit, which is
unnecessary load on both the backend network and the backend webserver.

As a directive controlled special case, this feature makes sense - but
this isn't the kind of default behaviour you want to see on a cache.

A better approach might be to determine whether the headers have actually
changed before writing them to disk. You needed to read the header in
the first place anyway; if the previously-read header and the
newly-received header from the backend are the same, then don't write to
disk, it's unnecessary.

This remains RFC compliant and solves the underlying problem.
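
A minimal Python sketch of that compare-before-write idea. The names are hypothetical and `write` stands in for whatever stores headers on disk. Which headers should take part in the comparison is left open here, so the sketch simply compares all of them, with case-insensitive names.

```python
def _normalized(headers):
    """Lower-case header names and trim values so cosmetic differences don't count."""
    return {name.lower(): value.strip() for name, value in headers.items()}


def store_headers_if_changed(stored, received, write):
    """Call write(received) only when the header set differs.
    Returns True if a write happened."""
    if _normalized(stored) == _normalized(received):
        return False  # identical: skip the disk write
    write(received)
    return True
```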

Regards,
Graham
--




Re: mod_cache: Don't update when req max-age=0?

2007-05-24 Thread Sander Striker

On 5/24/07, Niklas Edmundsson <[EMAIL PROTECTED]> wrote:

On Tue, 22 May 2007, Henrik Nordstrom wrote:

> On Tue 2007-05-22 at 11:40 +0200, Niklas Edmundsson wrote:
>
>> -8<---
>> Does anybody see a problem with changing mod_cache to not update the
>> stored headers when the request has max-age=0, the body turns out not
>> to be stale and the on-disk header hasn't expired?
>> -8<---
>
> My understanding:
>
> It's fine in an RFC point of view for the cache to completely ignore a
> 304 and not update the stored entity at all. But the response to this
> request should be the merge of the two responses assuming the
> conditional was added by the cache.

This is in line with my understanding, and since the response-merging
is being done today the only change that would be done is to skip
storing the header to disk. I think it would be wise to only skip the
storing for the max-age=0 case though.


Why limit it to the max-age=0 case?  Isn't it a general improvement?

Sander


Re: mod_cache: Don't update when req max-age=0?

2007-05-24 Thread Niklas Edmundsson

On Tue, 22 May 2007, Henrik Nordstrom wrote:


On Tue 2007-05-22 at 11:40 +0200, Niklas Edmundsson wrote:


-8<---
Does anybody see a problem with changing mod_cache to not update the
stored headers when the request has max-age=0, the body turns out not
to be stale and the on-disk header hasn't expired?
-8<---


My understanding:

It's fine in an RFC point of view for the cache to completely ignore a
304 and not update the stored entity at all. But the response to this
request should be the merge of the two responses assuming the
conditional was added by the cache.


This is in line with my understanding, and since the response-merging 
is being done today the only change that would be done is to skip 
storing the header to disk. I think it would be wise to only skip the 
storing for the max-age=0 case though.


Should I try to whip up a patch for it then?


/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 Radioactive halibut will make fission chips.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


Re: mod_cache: Don't update when req max-age=0?

2007-05-22 Thread Henrik Nordstrom
On Tue 2007-05-22 at 11:40 +0200, Niklas Edmundsson wrote:

> -8<---
> Does anybody see a problem with changing mod_cache to not update the 
> stored headers when the request has max-age=0, the body turns out not 
> to be stale and the on-disk header hasn't expired?
> -8<---

My understanding:

It's fine in an RFC point of view for the cache to completely ignore a
304 and not update the stored entity at all. But the response to this
request should be the merge of the two responses assuming the
conditional was added by the cache.

Regards
Henrik


signature.asc
Description: This is a digitally signed message part


Re: mod_cache: Don't update when req max-age=0?

2007-05-22 Thread Niklas Edmundsson

On Mon, 21 May 2007, Roy T. Fielding wrote:


On May 21, 2007, at 7:49 AM, Niklas Edmundsson wrote:
Does anybody see a problem with changing mod_cache to not update the stored 
headers when the request has max-age=0, the body turns out not to be stale 
and the on-disk header hasn't expired?


Yes, the problem is that it will break content management systems that
need to refresh a cache front-end after the content has changed.


To quote myself:
-8<---
Does anybody see a problem with changing mod_cache to not update the 
stored headers when the request has max-age=0, the body turns out not 
to be stale and the on-disk header hasn't expired?

-8<---

Read the conditions I stated carefully, please.

Since the body isn't stale and the on-disk header hasn't expired it is 
_exactly the same_ as what is being returned to the client today!


This is because mod_cache already detects this, look in 
cache_save_filter() for HTTP_NOT_MODIFIED and stale_handle. Today it 
rewrites the on-disk header and uses the already cached body to 
fulfill the request.


What I'm requesting is that it skips rewriting the on-disk headers in 
the case where all these conditions are fulfilled (yes, I'm 
reiterating here):

- The request has max-age=0
- The body is NOT stale
- The on-disk header hasn't expired

This will NOT break requests where the body/content has changed, since 
they simply don't fulfill the "body is NOT stale" condition.
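
Spelled out as a predicate, the three conditions look like this. It is a Python sketch, not the mod_cache code path; the function and parameter names are made up for illustration.

```python
def skip_header_rewrite(request_cache_control, body_is_stale, header_expired):
    """True only when all three conditions listed above hold."""
    directives = [
        d.strip().lower().replace(" ", "")
        for d in request_cache_control.split(",")
    ]
    return (
        "max-age=0" in directives   # the request has max-age=0
        and not body_is_stale       # the body is NOT stale
        and not header_expired      # the on-disk header hasn't expired
    )
```

A request whose content has changed fails the second condition, so its headers (and body) are still rewritten as they are today.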


The win will be that we don't have to waste IO and cycles rewriting 
the on-disk headers more or less continuously when objects are 
hammered with max-age=0 requests, we only have to rewrite the on-disk 
headers when really needed.


On a side note, shouldn't we check the return value of recall_body() 
in mod_cache.c?


The rationale behind this is that there are hordes of stupid "download 
managers" that always issue this kind of request, and multiple in parallel 
to the same file at that. This hammers the entire cache-layer by causing 
headers to be rewritten for each request.


Why don't you just add an ignore of cache-control on requests from
those stupid download managers?  A simple BrowserMatch should do.


The download managers set the same browser-string as ordinary 
browsers; they are usually indistinguishable from a real browser. Yes, 
broken. Yes, that's life.


/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 Never test for an error you don't know how to handle.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


Re: mod_cache: Don't update when req max-age=0?

2007-05-21 Thread Roy T. Fielding

On May 21, 2007, at 2:22 PM, Ruediger Pluem wrote:

>> Why don't you just add an ignore of cache-control on requests from
>> those stupid download managers?  A simple BrowserMatch should do.
>
> I am not quite sure what you mean by this. AFAIK you cannot set
> CacheIgnoreCacheControl based on env variables.


Which is why we would have to add it to the code.  Note that this
would be to ignore client-provided cache control, which is a good
feature to have on a cache for various DoS reasons.
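
What such a code change would amount to can be sketched in Python. The environment variable name below is invented for illustration; no such variable is described in this thread, and this is not how CacheIgnoreCacheControl is implemented.

```python
def effective_request_headers(headers, env):
    """Drop client-provided cache control when the request environment
    asks for it (e.g. set by a BrowserMatch-style rule)."""
    # "ignore-client-cache-control" is a hypothetical variable name.
    if env.get("ignore-client-cache-control"):
        return {
            name: value
            for name, value in headers.items()
            if name.lower() not in ("cache-control", "pragma")
        }
    return dict(headers)
```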

Roy


Re: mod_cache: Don't update when req max-age=0?

2007-05-21 Thread Graham Leggett

Niklas Edmundsson wrote:


At first glance, doing this I think will break RFC2616 compliance, and if
it does break RFC compliance then I think it should not be default
behaviour. However if it does solve a real problem for admins, then having
a directive allowing the admin to enable this behaviour does make sense.


Why would it break RFC compliance?


Because when clients say "max-age=0" it means "please consider all URLs 
as stale and revalidate them", and the server is obliged to honor this.


This request will never benefit from 
the headers being saved to disk, and the headers returned to the client 
should of course be those that resulted from the revalidation of the 
object. The only difference is that they aren't saved to disk too.


If this happens you introduce a subtle bug - when the URL becomes stale 
on the frontend, it will remain stale to the end of days, because the 
entry on disk is never refreshed with new headers to show the content is 
fresh.


Yup. CacheIgnoreCacheControl is one of those, we use it on the 
offloaders that only serve large files that we know don't need the 
RFC behaviour.


I was thinking of a directive like CacheOrigin [on|off], meaning that 
*this* cache isn't a cache at all, but rather an origin server that just 
happens to fetch data via HTTP from some backend if the data isn't fresh 
in the cache.


Regards,
Graham
--


smime.p7s
Description: S/MIME Cryptographic Signature


Re: mod_cache: Don't update when req max-age=0?

2007-05-21 Thread Ruediger Pluem


On 05/21/2007 09:07 PM, Roy T. Fielding wrote:

> 
> Why don't you just add an ignore of cache-control on requests from
> those stupid download managers?  A simple BrowserMatch should do.

I am not quite sure what you mean by this. AFAIK you cannot set
CacheIgnoreCacheControl based on env variables.

Regards

Rüdiger



Re: mod_cache: Don't update when req max-age=0?

2007-05-21 Thread Roy T. Fielding

On May 21, 2007, at 7:49 AM, Niklas Edmundsson wrote:
Does anybody see a problem with changing mod_cache to not update  
the stored headers when the request has max-age=0, the body turns  
out not to be stale and the on-disk header hasn't expired?


Yes, the problem is that it will break content management systems that
need to refresh a cache front-end after the content has changed.

The rationale behind this is that there are hordes of stupid  
"download managers" that always issue this kind of request, and  
multiple in parallel to the same file at that. This hammers the  
entire cache-layer by causing headers to be rewritten for each  
request.


Why don't you just add an ignore of cache-control on requests from
those stupid download managers?  A simple BrowserMatch should do.

Roy



Re: mod_cache: Don't update when req max-age=0?

2007-05-21 Thread Niklas Edmundsson

On Mon, 21 May 2007, Graham Leggett wrote:


Since max-age=0 requests can't be fulfilled without revalidating the
object they don't benefit from this header rewrite, and requests with
max-age!=0 that can benefit from the header rewrite won't be affected
by this change.

Am I making sense? Have I missed something fundamental?


At first glance, doing this I think will break RFC2616 compliance, and if
it does break RFC compliance then I think it should not be default
behaviour. However if it does solve a real problem for admins, then having
a directive allowing the admin to enable this behaviour does make sense.


Why would it break RFC compliance? This request will never benefit from 
the headers being saved to disk, and the headers returned to the 
client should of course be those that resulted from the revalidation of 
the object. The only difference is that they aren't saved to disk too.


The only difference I can see is that you can't "probe" that the 
previous request was a max-age=0 by doing max-age!=0 request 
afterwards...



Zooming out a little bit, this seems to fall into the category of "RFC
violations that allow the cache to either hit the backend less, or hit the
backend not at all, for the benefit of an admin who knows what they are
doing".

A simple set of directives that allow an admin to break RFC compliance
under certain circumstances in order to achieve certain goals does make
sense.


Yup. CacheIgnoreCacheControl is one of those, we use it on the 
offloaders that only serve large files that we know don't need the 
RFC behaviour.


/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 Sir, We are receiving 285,000 Hails. þ Crusher
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


Re: mod_cache: Don't update when req max-age=0?

2007-05-21 Thread Graham Leggett
On Mon, May 21, 2007 4:49 pm, Niklas Edmundsson wrote:

> Does anybody see a problem with changing mod_cache to not update the
> stored headers when the request has max-age=0, the body turns out
> not to be stale and the on-disk header hasn't expired?
>
> The rationale behind this is that there are hordes of stupid "download
> managers" that always issue this kind of request, and multiple in
> parallel to the same file at that. This hammers the entire
> cache-layer by causing headers to be rewritten for each request.
>
> Since max-age=0 requests can't be fulfilled without revalidating the
> object they don't benefit from this header rewrite, and requests with
> max-age!=0 that can benefit from the header rewrite won't be affected
> by this change.
>
> Am I making sense? Have I missed something fundamental?

At first glance, doing this I think will break RFC2616 compliance, and if
it does break RFC compliance then I think it should not be default
behaviour. However if it does solve a real problem for admins, then having
a directive allowing the admin to enable this behaviour does make sense.

Zooming out a little bit, this seems to fall into the category of "RFC
violations that allow the cache to either hit the backend less, or hit the
backend not at all, for the benefit of an admin who knows what they are
doing".

A simple set of directives that allow an admin to break RFC compliance
under certain circumstances in order to achieve certain goals does make
sense.

Regards,
Graham
--




Re: mod_cache: 304 on HEAD (bug 41230)

2007-04-11 Thread Niklas Edmundsson

On Wed, 11 Apr 2007, Niklas Edmundsson wrote:

> Would the correct fix be to check for r->header_only in cache_select(), or
> is there even more funky stuff going on? You don't want the cached object to
> be removed just because you got a HEAD request when it really isn't stale but
> just in need of revalidation. Ideally the HEAD request would cause the object
> to be revalidated if possible, but we can live with HEAD requests just doing
> fallback without touching the cache.
>
> I can whip up a patch for it, but I suspect that you guys are more clued on
> the deep magic involved :)


Looking a bit further, I think that something like this would actually 
be enough:

---8<--
--- mod_cache.c.orig  2007-04-11 13:29:14.0 +0200
+++ mod_cache.c  2007-04-11 14:06:29.0 +0200
@@ -456,7 +456,7 @@ static int cache_save_filter(ap_filter_t
          */
         reason = "No Last-Modified, Etag, or Expires headers";
     }
-    else if (r->header_only) {
+    else if (r->header_only && !cache->stale_handle) {
         /* HEAD requests */
         reason = "HTTP HEAD request";
     }
@@ -589,11 +589,12 @@ static int cache_save_filter(ap_filter_t
             cache->provider->remove_entity(cache->stale_handle);
             /* Treat the request as if it wasn't conditional. */
             cache->stale_handle = NULL;
+            rv = !OK;
         }
     }

-    /* no cache handle, create a new entity */
-    if (!cache->handle) {
+    /* no cache handle, create a new entity only for non-HEAD request */
+    if (!cache->handle && !r->header_only) {
         rv = cache_create_entity(r, size);
         info = apr_pcalloc(r->pool, sizeof(cache_info));
         /* We only set info->status upon the initial creation. */
---8<--

If I have understood things right this would:
- Accept revalidations even though it's a HEAD if the object wasn't
  stale.
- Bail out if the object is stale and it's a HEAD.

I haven't tried it yet though, I'm just trying to get a grasp of 
things. I have no clue on whether other things would break due to the 
fact that it's revalidated based on a HEAD instead of a GET, for 
example.
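
The intended effect of the patch can be sketched as a small decision
function. This is only one reading of the two bullets above, not tested
against httpd: the struct and enum names are hypothetical, and the sketch
assumes that the code surrounding the second hunk keeps the cached entity
when the backend answers 304 and drops it otherwise.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the request and cache state the patch tests. */
struct save_state {
    bool header_only;      /* r->header_only: this is a HEAD request */
    bool has_stale_handle; /* cache->stale_handle != NULL */
    bool not_modified;     /* backend answered 304 to the revalidation */
};

enum save_action {
    SKIP_PLAIN_HEAD,     /* HEAD with nothing to revalidate: do not cache */
    ACCEPT_REVALIDATION, /* 304: keep the cached entity, HEAD or not */
    BAIL_OUT,            /* HEAD and the entity changed: store nothing */
    STORE_NEW_ENTITY     /* GET: create a new entity from the response */
};

enum save_action choose_save_action(const struct save_state *s)
{
    assert(s != NULL);

    if (s->header_only && !s->has_stale_handle)
        return SKIP_PLAIN_HEAD;
    if (s->has_stale_handle && s->not_modified)
        return ACCEPT_REVALIDATION;
    /* Any stale entity has been dropped by this point. */
    if (s->header_only)
        return BAIL_OUT;
    return STORE_NEW_ENTITY;
}
```

A HEAD request therefore never creates a new entity in this sketch; at most
it confirms an existing one.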


/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 I am Mr. T of Borg. I pity da fool that resists me.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


Re: Mod_cache expires check

2007-03-03 Thread Bart van der Schans

Ruediger Pluem wrote:



> Providing a better reference to the patch you are talking about would be a
> start :-).


Of course, and now when I'm trying to find Davi's mail from the 18th of 
January in the archive it seems to be missing, so maybe it didn't even 
make it to the list :( So here is his patch again.


The original problem was described in:
http://mail-archives.apache.org/mod_mbox/httpd-dev/200701.mbox/[EMAIL PROTECTED]

Regards,

Bart



Index: modules/cache/mod_cache.c
===
--- modules/cache/mod_cache.c	(revision 497262)
+++ modules/cache/mod_cache.c	(working copy)
@@ -372,13 +372,8 @@
         exps = apr_table_get(r->headers_out, "Expires");
     }
     if (exps != NULL) {
-        if (APR_DATE_BAD == (exp = apr_date_parse_http(exps))) {
-            exps = NULL;
-        }
+        exp = apr_date_parse_http(exps);
     }
-    else {
-        exp = APR_DATE_BAD;
-    }
 
     /* read the last-modified date; if the date is bad, then delete it */
     lastmods = apr_table_get(r->err_headers_out, "Last-Modified");
@@ -424,21 +419,24 @@
          */
         reason = apr_psprintf(p, "Response status %d", r->status);
     }
+    else if (r->args && exps == NULL) {
+        /* if query string present but no expiration time, don't cache it
+         * (RFC 2616/13.9)
+         */
+        reason = "Query string present but no expires header";
+    }
+    /* XXX: APR_DATE_BAD (0) is a valid date */
     else if (exps != NULL && exp == APR_DATE_BAD) {
         /* if a broken Expires header is present, don't cache it */
         reason = apr_pstrcat(p, "Broken expires header: ", exps, NULL);
     }
-    else if (exp != APR_DATE_BAD && exp < r->request_time)
+    else if (exps != NULL && exp < r->request_time)
     {
-        /* if a Expires header is in the past, don't cache it */
+        /* if a Expires header is in the past, don't cache it.
+         * it may also be a broken header too, anyway.. we won't cache it
+         */
         reason = "Expires header already expired, not cacheable";
     }
-    else if (r->args && exps == NULL) {
-        /* if query string present but no expiration time, don't cache it
-         * (RFC 2616/13.9)
-         */
-        reason = "Query string present but no expires header";
-    }
     else if (r->status == HTTP_NOT_MODIFIED &&
              !cache->handle && !cache->stale_handle) {
         /* if the server said 304 Not Modified but we have no cache
@@ -686,7 +684,7 @@
      *   else
      *  expire date = date + defaultexpire
      */
-    if (exp == APR_DATE_BAD) {
+    if (exps == NULL) {
         char expire_hdr[APR_RFC822_DATE_LEN];
 
         /* if lastmod == date then you get 0*conf->factor which results in
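
The order of the checks after this patch, and the reason the XXX comment
exists, can be shown with a self-contained sketch. It does not use APR:
DATE_BAD is a stand-in for APR_DATE_BAD (0, as the patch comment notes), the
parsed time is passed in by the caller, and the enum and function names are
made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for APR_DATE_BAD, which the patch comment notes is 0. */
#define DATE_BAD ((int64_t)0)

enum expires_verdict {
    CACHEABLE,             /* not rejected by any of the checks below */
    QUERY_WITHOUT_EXPIRES, /* "Query string present but no expires header" */
    BROKEN_EXPIRES,        /* "Broken expires header" */
    ALREADY_EXPIRED        /* "Expires header already expired" */
};

/* exps: raw Expires value, or NULL when the header is absent.
 * exp:  result of parsing exps (only meaningful when exps != NULL).
 * now:  request time, in the same units as exp. */
enum expires_verdict check_expires(bool has_query, const char *exps,
                                   int64_t exp, int64_t now)
{
    if (has_query && exps == NULL)
        return QUERY_WITHOUT_EXPIRES;
    if (exps != NULL && exp == DATE_BAD)
        return BROKEN_EXPIRES;
    if (exps != NULL && exp < now)
        return ALREADY_EXPIRED;
    return CACHEABLE;
}
```

Because the sentinel is 0, a header that parses to the Unix epoch is
reported as broken rather than expired. Either way the response is not
cached, which matches the test result reported in this thread.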


Re: Mod_cache expires check

2007-03-03 Thread Ruediger Pluem


On 03/03/2007 06:08 PM, Bart van der Schans wrote:
> Bart van der Schans wrote:
> 
>> Davi Arnaut wrote:
>>  >
>>  > Looking at it more, the previous check it's also useless. Attempted
>> patch...
>>
>> I finally had some time to test the patch and it seems to work
>> correctly. It still recognizes Unix epoch as a bad date, but mod_cache
>> won't cache it.
>>
> 
> Is there any change the patch from Davi will make it in the trunk (and
> hopefully backported to the branch)? What is the correct way to move
> forward with this? Can I help in some way?

Providing a better reference to the patch you are talking about would be a 
start :-).
Seriously, in general you are correct in bugging us here and the help you can
offer varies from case to case (testing, providing documentation, rewriting the
patch according to review remarks, etc.)

Regards

Rüdiger



Re: Mod_cache expires check

2007-03-03 Thread Bart van der Schans

Bart van der Schans wrote:

> Is there any change the patch from Davi will make it in the trunk (and


That should have read chance of course, sorry about the typo.

Bart


Re: Mod_cache expires check

2007-03-03 Thread Bart van der Schans

Bart van der Schans wrote:

> Davi Arnaut wrote:
>  >
>  > Looking at it more, the previous check it's also useless. Attempted
>  > patch...
>
> I finally had some time to test the patch and it seems to work
> correctly. It still recognizes Unix epoch as a bad date, but mod_cache
> won't cache it.




Is there any change the patch from Davi will make it in the trunk (and 
hopefully backported to the branch)? What is the correct way to move 
forward with this? Can I help in some way?


Regards,

Bart


Re: mod_cache: MISS or HIT

2007-02-13 Thread Dziugas Baltrunas

Hi,

I was looking for something more reliable, so I now realize it will
require patching mod_cache.c to add some flag (e.g. setting a key in
r->notes, registering a mod_log_config modifier via log_pfn_register,
or similar).

On 2/12/07, Joshua Slive <[EMAIL PROTECTED]> wrote:

> On 2/12/07, Dziugas Baltrunas <[EMAIL PROTECTED]> wrote:
>
>> I'm trying to figure out how to put information in the access log
>> (via mod_log_config) about whether the request was a cache hit or miss
>> (similar to what squid does - TCP_MISS and TCP_HIT). I think this
>> information is necessary for any proxy server (or anything acting like
>> one), and it's always wise to know the hit/miss ratio.
>
> You can log the Age HTTP response header for this.  If it is present,
> you have a cache hit, and if not, a miss.
>
> Of course, this probably wouldn't be reliable if your cache was in
> front of another cache.  But it works in most circumstances.
>
> Joshua.




--
Dziugas


Re: mod_cache: MISS or HIT

2007-02-12 Thread Joshua Slive

On 2/12/07, Dziugas Baltrunas <[EMAIL PROTECTED]> wrote:


> I'm trying to figure out how to put information in the access log
> (via mod_log_config) about whether the request was a cache hit or miss
> (similar to what squid does - TCP_MISS and TCP_HIT). I think this
> information is necessary for any proxy server (or anything acting like
> one), and it's always wise to know the hit/miss ratio.


You can log the Age HTTP response header for this.  If it is present,
you have a cache hit, and if not, a miss.

Of course, this probably wouldn't be reliable if your cache was in
front of another cache.  But it works in most circumstances.

Joshua.
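
As a rough illustration of the suggestion above: if the access log carries
the Age response header as its own field (for example via a "%{Age}o" item
in a LogFormat), a log post-processor can classify each line. mod_log_config
writes "-" for an absent header. The function name is hypothetical, and as
noted above the result is only a heuristic when another cache sits behind
this one.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* age_field is the text logged for the Age response header.
 * A missing header shows up as "-" (or, defensively, as an empty string). */
bool logged_as_cache_hit(const char *age_field)
{
    assert(age_field != NULL);
    return age_field[0] != '\0' && strcmp(age_field, "-") != 0;
}
```

Counting the lines for which this returns true, against the total, gives an
approximate hit ratio.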


Re: mod_cache+mod_rewrite behaviour

2007-01-28 Thread Fredrik Widlund

Hi,

This is regarding my 
http://issues.apache.org/bugzilla/show_bug.cgi?id=41484 suggested 
enhancement. I am attaching the same patch in this mail.


I need to use caching to improve performance when delivering, for
example, banners. Since query-string parameters are used to associate
unique meta-information with each request, mod_cache treats each
request as a new one. Since mod_cache runs as a quick handler, it
is not possible to use mod_rewrite to remove the query-string part of
the request.


I believe this should be considered a relevant scenario, and that an
option to disable this behaviour is justified.


I don't see any drawbacks as a result of this option.

This is very much a real-life scenario: I am doing this on behalf of a
leading European CDN and we need to use this in a highly distributed 
environment to be able to scale efficiently.


PPP regards,
Fredrik Widlund

Plüm wrote:
  

-Original Message-
From: Fredrik Widlund 
Sent: Friday, 19 January 2007 12:30

To: dev@httpd.apache.org
Subject: Re: mod_cache+mod_rewrite behaviour


Hi,

Thanks for the information. I tried the patch and it mends the
behaviour; however, it doesn't really help me, of course, since I am
indeed trying to rewrite the URL before it's cached.

What are the chances of getting a patch that adds a
"CacheIgnoreQueryString" option accepted? Who/where do I ask this?



This is the right place for discussion. I would propose the following:

1. Create a bug report describing your problem and mark it as enhancement.
2. If you have a patch for CacheIgnoreQueryString attach it to the report.
3. Come back here with your problem (in this thread), refer to the report
   and attach the patch for convenience.
4. Give some arguments why this is not only useful for you but for everyone
   else. And if there are any drawbacks as a result of your patch, why it is
   worth the tradeoff.
5. Be PPP (Patient, Polite, Persistent) :-). Keep on bugging us from time to
   time if the reaction to your proposal is only inactivity and not decline.

Regards

Rüdiger

  


--- cache_storage.c.orig  Sun Jan 28 20:12:51 2007
+++ cache_storage.c  Sun Jan 28 20:25:01 2007
@@ -331,10 +331,16 @@
 apr_status_t cache_generate_key_default(request_rec *r, apr_pool_t* p,
                                         char**key)
 {
+    cache_server_conf *conf;
     char *port_str, *hn, *lcs;
     const char *hostname, *scheme;
     int i;
 
+    /* Get the module configuration. We need this for the CacheIgnoreQueryString
+     * below */
+    conf = (cache_server_conf *) ap_get_module_config(r->server->module_config,
+                                                      &cache_module);
+
     /*
      * Use the canonical name to improve cache hit rate, but only if this is
      * not a proxy request or if this is a reverse proxy request.
@@ -424,10 +430,14 @@
         /* Use the server port */
         port_str = apr_psprintf(p, ":%u", ap_get_server_port(r));
     }
-
-    /* Key format is a URI */
-    *key = apr_pstrcat(p, scheme, "://", hostname, port_str,
-                       r->parsed_uri.path, "?", r->args, NULL);
+
+    /* Key format is a URI, optionally without the query-string */
+    if (conf->ignorequerystring)
+        *key = apr_pstrcat(p, scheme, "://", hostname, port_str,
+                           r->parsed_uri.path, "?", NULL);
+    else
+        *key = apr_pstrcat(p, scheme, "://", hostname, port_str,
+                           r->parsed_uri.path, "?", r->args, NULL);
 
     return APR_SUCCESS;
 }
--- mod_cache.c.orig  Sun Jan 28 20:32:48 2007
+++ mod_cache.c  Sun Jan 28 20:32:32 2007
@@ -433,7 +433,8 @@
         /* if a Expires header is in the past, don't cache it */
         reason = "Expires header already expired, not cacheable";
     }
-    else if (r->args && exps == NULL) {
+    else if (!conf->ignorequerystring &&
+             r->parsed_uri.query && exps == NULL) {
         /* if query string present but no expiration time, don't cache it
          * (RFC 2616/13.9)
          */
@@ -889,6 +890,8 @@
     ps->no_last_mod_ignore = 0;
     ps->ignorecachecontrol = 0;
     ps->ignorecachecontrol_set = 0;
+    ps->ignorequerystring = 0;
+    ps->ignorequerystring_set = 0;
     ps->store_private = 0;
     ps->store_private_set = 0;
     ps->store_nostore = 0;
@@ -929,6 +932,10 @@
         (overrides->ignorecachecontrol_set == 0)
         ? base->ignorecachecontrol
         : overrides->ignorecachecontrol;
+    ps->ignorequerystring  =
+        (overrides->ignorequerystring_set == 0)
+        ? base->ignorequerystring
+        : overrides->ignorequerystring;
     ps->store_private  =
         (overrides->store_private_set == 0)
         ? base->store_private
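
The effect of the option on the cache key can be tried outside httpd with a
small stand-alone sketch. It uses snprintf instead of apr_pstrcat and always
prints the port, which simplifies what cache_generate_key_default does; the
function names, host and paths below are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Builds "scheme://host:port/path?args" into buf. With ignore_query set,
 * args is left out, so only "scheme://host:port/path?" remains.
 * Returns the length snprintf reports (may exceed len on truncation). */
int build_cache_key(char *buf, size_t len, const char *scheme,
                    const char *host, unsigned port, const char *path,
                    const char *args, bool ignore_query)
{
    assert(buf != NULL && len > 0);
    const char *query = (ignore_query || args == NULL) ? "" : args;
    return snprintf(buf, len, "%s://%s:%u%s?%s",
                    scheme, host, port, path, query);
}

/* Two requests share one cache entry when their keys are identical. */
bool same_cache_entry(const char *key_a, const char *key_b)
{
    return strcmp(key_a, key_b) == 0;
}
```

With ignore_query set, requests for /banner.gif?id=1 and /banner.gif?id=2
produce the same key and so share one cached entry; with it cleared, each
query string gets its own entry, which is the behaviour described at the top
of this message.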

Re: mod_cache: save filter recalls body to non-empty brigade?

2007-01-25 Thread Niklas Edmundsson

On Wed, 24 Jan 2007, Plüm, Rüdiger, VF EITO wrote:


>> Of course practically you don't want to make assumptions about the
>> emptiness of the existing brigade, so clearing the brigade as a first
>> step makes definite sense.
>
> There is no need to clear the brigade, because the brigade passed to
> the filter is named in, while the one where recall_body stores the cached
> file is bb. In the case of a recalled body we pass bb down the chain,
> not in.


Ah, of course.

/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se  | [EMAIL PROTECTED]
---
 Air pollution is a mist demeanor.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

