On 2014-08-23 12:36, Graham Leggett wrote:
On 23 Aug 2014, at 3:40 PM, Mark Montague <[email protected]> wrote:

AH00526: Syntax error on line 148 of /etc/httpd/conf/dev.catseye.org.conf: CacheEnable cannot occur within <If> section
The solution here is to lift the restriction above. Having a generic mechanism 
to handle conditional behaviour, and then having a special case to handle the 
same behaviour in a different way is the wrong way to go.

I assumed this would be OK because the Header directive has a similar expr=expression clause.

But I'll look into whether the restriction on If could be removed. If I rewrite things to use the If directive, do you see bypass functionality as something worth including? I ask because from your points below I get the impression that the answer is "no".


The proposed enhancement is about the server deciding when to serve items from 
the cache.  Although the client can specify a Cache-Control request header in 
order to bypass the server's cache, there is no good way for a web application 
to signal to a client when it should do this (for example, when a login cookie 
is set). The behavior of other caches is controlled using the Cache-Control 
response header.
There is - use “Cache-Control: private”. This will tell all public caches, 
including mod_cache and ISP caches, not to cache content with cookies attached, 
while at the same time telling browser caches that they should.

The problem is not whether the content should be cached: it should. The problem is, to which clients should the cached content be served? If the client's request does not contain a login cookie, that client should get the cached copy. If the client's request does contain a login cookie, the cache should be bypassed and the client should get a copy of the resource generated specifically for it.
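
A minimal sketch of the serving decision described above, written in Python rather than as httpd configuration. The cookie name "login" and both function names are hypothetical, and this is not how mod_cache is implemented; it only illustrates the desired behavior.

```python
# Hypothetical cookie name; the real one depends on the web application.
LOGIN_COOKIE = "login"


def cookie_names(cookie_header):
    """Return the set of cookie names found in a Cookie request header."""
    names = set()
    for pair in (cookie_header or "").split(";"):
        name, sep, _value = pair.strip().partition("=")
        if sep and name:
            names.add(name)
    return names


def serve_from_cache(cookie_header):
    """True if the cached copy may be served, False if the cache is bypassed."""
    return LOGIN_COOKIE not in cookie_names(cookie_header)
```

Under this sketch, a request carrying only tracking cookies is still served from the cache, and only a request carrying the login cookie goes to the back-end.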

"Cache-Control: private" cannot be used in a request, only in a response, where it works as you said. The problem is that the first request for a given resource where the client includes a login cookie gets intercepted by mod_cache and served from the cache (if you assume that other clients without login cookies have already requested it). There must therefore be some way to tell mod_cache that this client needs something different. One way to do this would be by having different URL paths for logged in versus non-logged in users, but this is awkward, user-visible, and may not be feasible with all web application.


> - Back-end sets response header "Cache-Control: max-age=0, s-maxage=14400" so that mod_cache
> caches the response, but ISP caches and browser caches do not.  (mod_cache removes s-maxage
> and does not pass it upstream).
mod_cache shouldn’t remove any Cache-Control headers.

It apparently does, although I haven't found where in the code yet. I would be interested to see if anyone can reproduce my experience. As far as I know, I don't have any configuration that would result in this.
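
For reference, the intent behind sending both directives can be sketched as follows. Under the HTTP caching rules (RFC 7234), a shared cache prefers s-maxage over max-age, while a private (browser) cache ignores s-maxage. The function names are hypothetical, and the parsing is deliberately simplified (it does not handle quoted strings containing commas). It says nothing about whether a cache forwards or strips the directive.

```python
def parse_cache_control(value):
    """Split a Cache-Control header value into a {directive: argument} dict."""
    directives = {}
    for part in value.split(","):
        name, sep, arg = part.strip().partition("=")
        if name:
            directives[name.lower()] = arg.strip('"') if sep else None
    return directives


def freshness_lifetime(cache_control, shared):
    """Explicit freshness lifetime in seconds, or None if none is given."""
    d = parse_cache_control(cache_control)
    if shared and d.get("s-maxage") is not None:
        return int(d["s-maxage"])  # shared caches prefer s-maxage
    if d.get("max-age") is not None:
        return int(d["max-age"])
    return None
```

With the header from test.php, a shared cache would compute 14400 seconds and a browser cache would compute 0.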

httpd 2.4.10 with mod_proxy_fcgi (Fedora 19 build)
PHP 5.5.5 with PHP-FPM

Relevant configuration:

CacheEnable disk /
CacheDefaultExpire 86400
CacheIgnoreHeaders Set-Cookie
CacheHeader on
CacheDetailHeader on
    # We'll be paying attention to "Cache-Control: s-maxage=xxx" for all
    # of our caching decisions.  The browser will use max-age=yyy for its
    # decisions.  So we drop the Expires header. See the following page
    # from Google which says, "It is redundant to specify both Expires and
    # Cache-Control: max-age"
    # https://developers.google.com/speed/docs/best-practices/caching?hl=sv
Header unset Expires
RewriteRule ^(.*\.php)$ fcgi://127.0.0.1:9001/www/dev.catseye.org/content/$1 [P,L]

File test.php, containing:

<?php
  header( "Cache-Control: max-age=0, s-maxage=14400" );
  header( "Content-type: text/html" );
?>
<html><body>Hello!</body></html>

Browser transaction for https://dev.catseye.org/test.php:

GET /test.php HTTP/1.1
Host: dev.catseye.org
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:31.0) Gecko/20100101 Firefox/31.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
DNT: 1
Connection: keep-alive

HTTP/1.1 200 OK
Date: Sat, 23 Aug 2014 20:11:00 GMT
Server: Apache/2.4
Cache-Control: max-age=0
X-Cache: MISS from dev.catseye.org
X-Cache-Detail: "cache miss: attempting entity save" from dev.catseye.org
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive
Transfer-Encoding: chunked
Content-Type: text/html;charset=UTF-8

And mod_cache definitely receives s-maxage from the backend:

[root@sky cache]# cat ./J/k/WPiKG0bwW@R_H4YvSOdw.header
(binary data omitted)https://dev.catseye.org:443/test.php?Cache-Control: max-age=0
Cache-Control: max-age=0, s-maxage=14400
Content-Security-Policy: default-src 'self'; script-src 'self' 'unsafe-inline' 'unsafe-eval'; style-src 'self' 'unsafe-inline'; img-src 'self' data: ; font-src 'self' data: ; report-uri /csp-report.php
Content-Type: text/html;charset=UTF-8

Host: dev.catseye.org
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:31.0) Gecko/20100101 Firefox/31.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
DNT: 1

[root@sky cache]# cat ./J/k/WPiKG0bwW@R_H4YvSOdw.data
<html><body>Hello!</body></html>
[root@sky cache]#


- When back-end content changes (e.g., an author makes an update), the back-end invokes 
"htcacheclean /path/to/resource" to invalidate the cached page so that it is 
regenerated the next time a client requests it.
Set your max-age correctly and this becomes unnecessary. If you have long lived 
resources that you want caching for a very long time, and you want to change 
that resource, place the version number of the resource in the URL and refer to 
the new URL after the change.

This is fine for JavaScript, CSS files, and images, but I'd rather have users see nice, human-friendly URLs in their browser's location bar, like

https://example.com/latest-news

rather than

https://example.com/latest-news?20140823T164300

...and I certainly don't want them bookmarking the latter one.


- Clients have multiple cookies set.  Tracking cookies and cookies used by 
JavaScript should not cause a mod_cache miss.
- Dynamic pages that are generated when a login cookie is set should not be cached.  
This is accomplished by the back-end setting the response header 
"Cache-Control: max-age=0".
This is incorrect: max-age=0 means that a cache is welcome to cache the 
content, but the content must be declared stale immediately and revalidated.

I checked the code and what is actually getting set for all pages dynamically generated for logged-in users is:

Cache-Control: no-cache, must-revalidate, max-age=0

I apologize for being sloppy and not verifying this before sending my previous reply.
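
To make the difference between these directives concrete, here is a simplified sketch of how a shared cache might interpret them under RFC 7234. The function names are hypothetical, heuristic freshness is not modeled, and the field-name form of no-cache is treated the same as the bare directive.

```python
def directives(cache_control):
    """Split a Cache-Control header value into a {directive: argument} dict."""
    result = {}
    for part in cache_control.split(","):
        name, sep, arg = part.strip().partition("=")
        if name:
            result[name.lower()] = arg if sep else None
    return result


def shared_cache_may_store(cache_control):
    """no-store and private forbid storing; no-cache and max-age=0 do not."""
    d = directives(cache_control)
    return "no-store" not in d and "private" not in d


def must_revalidate_before_reuse(cache_control, age_seconds):
    """True if a stored response needs validation before it is served."""
    d = directives(cache_control)
    if "no-cache" in d:
        return True  # always validate with the origin first
    lifetime = d.get("s-maxage")
    if lifetime is None:
        lifetime = d.get("max-age")
    if lifetime is None:
        return False  # assumption: no explicit lifetime, heuristics ignored
    return age_seconds >= int(lifetime)
```

In this reading, "no-cache, must-revalidate, max-age=0" can still be stored, but every reuse requires revalidation with the back-end.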

- However, when a login cookie is set, dynamic pages that are currently cached 
should not be served to the client with the login cookie, while they should 
still be served to all other clients.
All of the above is handled by HTTP already, just follow the protocol.

Make sure you separate your cacheable content from your uncacheable content. 
Ensure that you use HTTP conditional requests so that expensive calls can be 
made cheap. Properly declare the request headers you vary on using the Vary 
header, but keep in mind that headers with many variations will DoS a cache. 
Cache long-lived content and change the URL if the content is updated. Use 
max-age (and s-maxage) on short lived content to make the generation of it 
cheap.
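
The point about Vary can be illustrated with a small sketch of how a cache might select a stored variant. The function variant_key is hypothetical and is not mod_cache's actual key scheme; it only shows why a header with many distinct values, such as Cookie, multiplies the number of stored copies.

```python
def variant_key(url, vary, request_headers):
    """Build a lookup key from the URL and the request headers named in Vary."""
    lowered = {k.lower(): v for k, v in request_headers.items()}
    names = sorted(n.strip().lower() for n in vary.split(",") if n.strip())
    return (url,) + tuple((n, lowered.get(n, "")) for n in names)
```

With "Vary: Cookie", every distinct Cookie value yields a different key and therefore a separate cached copy. With "Vary: Accept-Encoding", requests that differ only in their cookies share one copy.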

The only thing I see above that will actually help is having separate URL paths for cacheable and non-cacheable content, but I'd have to hack that in using mod_rewrite (since I'm limited in the scope of changes I can make to the code of 3rd party web applications). I'd prefer to avoid having logged in and non-logged in users seeing different URLs in their browser location bars.

Thanks for all of your replies!

--
  Mark Montague
  [email protected]
