Re: [RFC] Apache::CacheContent - Caching PerlFixupHandler

2001-12-11 Thread Ask Bjoern Hansen

On Thu, 6 Dec 2001, Paul Lindner wrote:

> > BTW -- I think where the docs are cached should be configurable.  I don't
> > like the idea of the document root writable by the web process.
>
> That's the price you pay for this functionality.  Because we use
> Apache's native file serving code we need a url->directory mapping
> somewhere.

uh, why couldn't Apache::CacheContent just set
$r->filename("/where/we/put/the/cache/$file") ?

If you add Bill's suggestion about caching on args, headers and
whatnot you would (on some filesystems) need something like that
anyway to make a hashed directory tree.


 - ask

-- 
ask bjoern hansen, http://ask.netcetera.dk/ !try; do();
more than a billion impressions per week, http://valueclick.com




Re: [RFC] Apache::CacheContent - Caching PerlFixupHandler

2001-12-11 Thread Paul Lindner

On Tue, Dec 11, 2001 at 01:50:52AM -0800, Ask Bjoern Hansen wrote:
> On Thu, 6 Dec 2001, Paul Lindner wrote:
>
> > > BTW -- I think where the docs are cached should be configurable.  I don't
> > > like the idea of the document root writable by the web process.
> >
> > That's the price you pay for this functionality.  Because we use
> > Apache's native file serving code we need a url->directory mapping
> > somewhere.
>
> uh, why couldn't Apache::CacheContent just set
> $r->filename("/where/we/put/the/cache/$file") ?

Simplicity really.  This was an example in our upcoming book so I
didn't want to add a filename generator to the code; instead we use
Apache's url->file mapping mechanism.  Also this code was derived from
a 404 error handler that I wrote ages ago :)

I assume (since you suggested it) that you can set $r->filename to any
file in any directory without adding a <Directory> config?  I'll have
to see how this interacts with the built-in access control logic.

> If you add Bill's suggestion about caching on args, headers and
> whatnot you would (on some filesystems) need something like that
> anyway to make a hashed directory tree.

Right.  A more elaborate Apache::CacheContent would have a filename
hash function, and a separate cache directory structure along the
lines of Cache::FileCache.

I suppose that one could put the whole uri->cachefile mapping into a
custom PerlTransHandler and leave Apache::CacheContent as-is..

-- 
Paul Lindner   [EMAIL PROTECTED]| | | | |  |  |  |   |   |

mod_perl Developer's Cookbook   http://www.modperlcookbook.org
 Human Rights Declaration   http://www.unhchr.ch/udhr/index.htm



Re: [RFC] Apache::CacheContent - Caching PerlFixupHandler

2001-12-11 Thread DeWitt Clinton

On Tue, Dec 11, 2001 at 02:31:36AM -0800, Paul Lindner wrote:

> Right.  A more elaborate Apache::CacheContent would have a filename
> hash function, and a separate cache directory structure along the
> lines of Cache::FileCache.

Just curious -- any reason not to use Cache::Cache as the persistence
mechanism?  It was designed for exactly this scenario and could
provide a nice abstraction for the filesystem or shared memory, as
well as handle things like filename hashing and branching directories
(and namespaces, size limits, OS independence, taint checking, and
more).

-DeWitt




Re: [RFC] Apache::CacheContent - Caching PerlFixupHandler

2001-12-11 Thread Geoffrey Young

Paul Lindner wrote:
 
[snip]
 
> I suppose that one could put the whole uri->cachefile mapping into a
> custom PerlTransHandler and leave Apache::CacheContent as-is..

yeah, I think that we're starting to talk about two different
approaches now.  the cool thing about the current logic is that no
filename mapping has to take place, making it rather fast - basically,
after a simple check of some cached stat() properties you're done;
Apache's native translation mechanism has done all the work.  the
price you pay for that quick simplicity is stuff written to your
document root.  adding a URI->filename translation step adds overhead,
though it may be preferable to some.  it shouldn't be a requirement,
however.

one of the neat things about this module is that it makes (pretty
creative) use of method handlers.  the base class comes with
disk_cache(), but memory_cache(), uri_cache(),
i_dont_want_the_file_in_my_docroot_cache(), or whatever can be added
either to the module proper or a subclass.  

so, maybe disk_cache() needs a better and less generic name, like
docroot_cache().

--Geoff



[RFC] Apache::CacheContent - Caching PerlFixupHandler

2001-12-06 Thread Paul Lindner

Hi,

I would like to propose a new Apache module before I send it off to
CPAN.  The name chosen is Apache::CacheContent.

It's pretty generic code, and is intended to be subclassed.  It
handles the gory details of caching a page to disk and serving it up
until it expires.  

It's derived from work done on the mod_perl Developer's Cookbook, so
it's already been reviewed by a number of people.

I've attached a README below.  To download it go to
http://www.modperlcookbook.org/code.html




NAME
Apache::CacheContent - PerlFixupHandler class that caches dynamic
content

SYNOPSIS
* Make your method handler a subclass of Apache::CacheContent
* Allow your web server process to write into portions of your document
  root.
* Add a ttl() subroutine (optional)
* Add directives to your httpd.conf that are similar to these:

  PerlModule MyHandler

  # dynamic url
  <Location /dynamic>
    SetHandler perl-script
    PerlHandler MyHandler->handler
  </Location>

  # cached URL
  <Location /cached>
    SetHandler perl-script
    PerlFixupHandler MyHandler->disk_cache
    # CacheTTL is in minutes
    PerlSetVar CacheTTL 120
  </Location>

DESCRIPTION
Note:  This code is derived from the *Cookbook::CacheContent*
   module, available as part of The mod_perl Developer's
   Cookbook

The Apache::CacheContent module implements a PerlFixupHandler that helps
you to write handlers that can automatically cache generated web pages
to disk. This is a definite performance win for sites that end up
generating the exact same content for many users.

The module is written to use Apache's built-in file handling routines to
efficiently serve data to clients. This means that your code will not
need to worry about HTTP/1.X, byte ranges, if-modified-since, HEAD
requests, etc. It works by writing files into your DocumentRoot, so be
sure that your web server process can write there.

To use this you MUST use mod_perl method handlers. This means that your
version of mod_perl must support method handlers (the argument
EVERYTHING=1 to the mod_perl build will do this). Next you'll need to
have a content-generating mod_perl handler. If it isn't a method handler,
modify the *handler* subroutine to read:

  sub handler ($$) {
    my ($class, $r) = @_;
    # ... the rest of your handler ...
  }

Next, make your handler a subclass of *Apache::CacheContent* by adding
an ISA entry:

  @MyHandler::ISA = qw(Apache::CacheContent);

You may need to modify your handler code to only look at the *uri* of
the request. Remember, the cached content is independent of any query
string or form elements.

After this is done, you can activate your handler. To use your handler
in a fully dynamic way, configure it as a PerlHandler in your httpd.conf,
like this:

  PerlModule MyHandler
  <Location /dynamic>
    SetHandler perl-script
    PerlHandler MyHandler->handler
  </Location>

So requests to *http://localhost/dynamic/foo.html* will call your
handler method directly. This is great for debugging and testing the
module. To activate the caching mechanism configure httpd.conf as
follows:

  PerlModule MyHandler
  <Location /cached>
    SetHandler perl-script
    PerlFixupHandler MyHandler->disk_cache
    # CacheTTL is in minutes
    PerlSetVar CacheTTL 120
  </Location>

Now when you access URLs like *http://localhost/cached/foo.html* the
content will be generated and stored in the file
*DocumentRoot*/cached/foo.html. Subsequent requests for the same URL will
return the cached content, depending on the *CacheTTL* setting.
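
As a rough illustration of the expiry decision described above, here is a
minimal sketch of a freshness check based on the cached file's modification
time. The helper name cache_is_fresh is hypothetical and is not part of
Apache::CacheContent; the module's real logic may differ. The only thing
taken from the README is that CacheTTL is expressed in minutes.

```perl
use strict;
use warnings;

# Hypothetical helper (an assumption for illustration, not the module's API):
# returns 1 when $file exists and was last modified less than $ttl_minutes
# ago, and 0 otherwise. $now is optional and defaults to the current time.
sub cache_is_fresh {
    my ($file, $ttl_minutes, $now) = @_;
    $now = time unless defined $now;

    my $mtime = (stat $file)[9];      # element 9 of stat() is the mtime
    return 0 unless defined $mtime;   # no cached copy on disk yet

    return ($now - $mtime) < $ttl_minutes * 60 ? 1 : 0;
}

# Usage: decide whether to regenerate a page cached under the docroot.
my $cached = '/usr/local/apache/htdocs/cached/foo.html';   # example path only
print cache_is_fresh($cached, 120) ? "serve cached copy\n" : "regenerate\n";
```

A stale or missing file simply means the content handler has to run again
and rewrite the file; a fresh one can be left to Apache's normal file
serving.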

For further customization you can write your own *ttl* function that can
dynamically change the caching time based on the current request.

AUTHOR
Paul Lindner [EMAIL PROTECTED], Geoffrey Young, Randy Kobes

SEE ALSO
The example mod_perl method handler, CacheWeather (see the CacheWeather manpage).

The mod_perl Developer's Cookbook





-- 
Paul Lindner   [EMAIL PROTECTED]| | | | |  |  |  |   |   |

mod_perl Developer's Cookbook   http://www.modperlcookbook.org
 Human Rights Declaration   http://www.unhchr.ch/udhr/index.htm



Re: [RFC] Apache::CacheContent - Caching PerlFixupHandler

2001-12-06 Thread Perrin Harkins

> I would like to propose a new Apache module before I send it off to
> CPAN.  The name chosen is Apache::CacheContent.

This is very cool.  I was planning to write one of these, and now I don't
have to.  Your implementation is short and interesting.  I was planning to
do it with a PerlFixupHandler and an Apache::Filter module to capture the
output.  While that approach wouldn't require the use of method handlers, I
think yours may be easier for newbies because it doesn't require them to
understand as many modules.  The only real advantage of using Apache::Filter
is that it would work well with existing Registry scripts.

A couple of other C's for your R:

A cache defines parameters that constitute a unique request.  Your cache
currently only handles the filename from the request as a parameter.  It
would be nice to also handle query args, POST data, and arbitrary headers
like cookies or language choices.  You could even support an optional
request_keys method for handlers which would let people generate their own
unique key based on their analysis of the request.

Doing this would mean you would need to generate filenames based on the
unique keys (probably by hashing, as in Cache::FileCache) and do an internal
redirect to that file if available when someone sends a request that
matches.
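
To make the key-hashing idea concrete, here is a minimal sketch, assuming
MD5 as the hash and a two-level directory fan-out. The helper names
request_key and cache_path, the choice of fields, and the example cache
root are all assumptions for illustration; this is not how
Apache::CacheContent or Cache::FileCache actually lay out their files.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical helper: build one string from the parts of a request that
# should distinguish one cached document from another.
sub request_key {
    my (%req) = @_;
    my $args    = $req{args}    || {};
    my $headers = $req{headers} || {};

    # Header names are case-insensitive, so normalize them before sorting.
    my %hdr = map { (lc($_) => $headers->{$_}) } keys %$headers;

    return join "\n",
        $req{uri},
        (map { "arg:$_=$args->{$_}" } sort keys %$args),
        (map { "hdr:$_=$hdr{$_}" }    sort keys %hdr);
}

# Hypothetical helper: map a key to a file under $cache_root. The first
# $depth hex digits of the digest become nested directories, so that no
# single directory ends up holding every cached file.
sub cache_path {
    my ($cache_root, $key, $depth) = @_;
    $depth = 2 unless defined $depth;

    my $digest = md5_hex($key);
    my @dirs   = map { substr($digest, $_, 1) } 0 .. $depth - 1;
    return join '/', $cache_root, @dirs, $digest;
}

# Usage: the same URI with different args or headers gets a different file.
my $key = request_key(
    uri     => '/cached/foo.html',
    args    => { lang => 'en', page => 2 },
    headers => { 'Accept-Language' => 'en' },
);
print cache_path('/var/cache/www/mydir', $key), "\n";
```

Sorting the argument and header names keeps the key stable regardless of
the order in which the client sent them.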

Another thing that might be nice would be to store the TTL with the file
rather than making the handler give it to you again each time.  This is done
in mod_proxy by putting an Expires header in the file and reading it before
sending the file, but you could also store them in a dbm or something.
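
A minimal sketch of the "store them in a dbm" variant, assuming the core
SDBM_File module and an expiry time kept as epoch seconds per cache key.
The helper names open_expiry_db, remember_expiry and is_expired are
hypothetical, and nothing here reflects how mod_proxy or
Apache::CacheContent store their metadata.

```perl
use strict;
use warnings;
use Fcntl qw(O_RDWR O_CREAT);
use SDBM_File;

# Hypothetical helper: open (creating if needed) a dbm file that maps
# cache key -> expiry time in epoch seconds. Returns a ref to the tied hash.
sub open_expiry_db {
    my ($path) = @_;
    tie my %db, 'SDBM_File', $path, O_RDWR | O_CREAT, 0644
        or die "cannot tie $path: $!";
    return \%db;
}

# Record that $key expires $ttl_seconds after $now (default: current time).
sub remember_expiry {
    my ($db, $key, $ttl_seconds, $now) = @_;
    $now = time unless defined $now;
    $db->{$key} = $now + $ttl_seconds;
    return $db->{$key};
}

# Returns 1 if $key has no recorded expiry or its expiry time has passed.
sub is_expired {
    my ($db, $key, $now) = @_;
    $now = time unless defined $now;
    my $expires = $db->{$key};
    return 1 unless defined $expires;   # unknown entries count as expired
    return $now >= $expires ? 1 : 0;
}
```

The content handler would call remember_expiry() once when it writes the
file, and the fixup handler would only need is_expired() on later hits.
SDBM_File does no locking of its own, so a server with many child
processes would need to add file locking around these calls.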
Support for sending Expires headers automatically would also be useful.

When I first thought about this problem, I wanted to do it the way Vignette
StoryServer does: by having people link to the cached files directly and
making the content generating code be the 404 handler for those files.  That
gives the best possible performance for cached files, since no
PerlFixupHandler needs to run.  The downside is that then you need an
external process to go through and clean up expired files.  It's also hard
to handle complex cache criteria like query args.  StoryServer does it by
having really crazy generated file names and processing all the links to
files on the way out so that they use the cached file names.  Pretty ugly.

I know you guys are pushing to get the book done, so don't feel pressured to
address this stuff now.  I think the current module looks more than good
enough for an initial CPAN release.

- Perrin




Re: [RFC] Apache::CacheContent - Caching PerlFixupHandler

2001-12-06 Thread Bill Moseley

At 08:19 AM 12/06/01 -0800, Paul Lindner wrote:

Ok, hit me over the head.  Why wouldn't you want to use a caching proxy?

BTW -- I think where the docs are cached should be configurable.  I don't
like the idea of the document root writable by the web process.



Bill Moseley
mailto:[EMAIL PROTECTED]



Re: [RFC] Apache::CacheContent - Caching PerlFixupHandler

2001-12-06 Thread Tatsuhiko Miyagawa

On Thu, 06 Dec 2001 10:04:26 -0800
Bill Moseley [EMAIL PROTECTED] wrote:

> BTW -- I think where the docs are cached should be configurable.  I don't
> like the idea of the document root writable by the web process.

Maybe:

  Alias /cached /tmp/cache


--
Tatsuhiko Miyagawa [EMAIL PROTECTED]




Re: [RFC] Apache::CacheContent - Caching PerlFixupHandler

2001-12-06 Thread Tatsuhiko Miyagawa

On Thu, 6 Dec 2001 08:19:09 -0800
Paul Lindner [EMAIL PROTECTED] wrote:

> I've attached a README below.  To download it go to
> http://www.modperlcookbook.org/code.html

Nice one. here's a patch to make the sample code work :)


--- CacheContent.pm~ Thu Dec  6 22:11:35 2001
+++ CacheContent.pm Fri Dec  7 03:23:39 2001
@@ -6,6 +6,7 @@
 @Apache::CacheContent::ISA = qw(Apache);

 use Apache;
+use Apache::Log;
 use Apache::Constants qw(OK SERVER_ERROR DECLINED);
 use Apache::File ();

--- eg/CacheWeather.pm~ Thu Dec  6 08:10:09 2001
+++ eg/CacheWeather.pm  Fri Dec  7 03:24:14 2001
@@ -8,7 +8,7 @@

 use strict;

-@CacheWeather::ISA = qw(Cookbook::CacheContent);
+@CacheWeather::ISA = qw(Apache::CacheContent);

 sub ttl {
   my($self, $r) = @_;


--
Tatsuhiko Miyagawa [EMAIL PROTECTED]




Re: [RFC] Apache::CacheContent - Caching PerlFixupHandler

2001-12-06 Thread Matt Sergeant

On Thu, 6 Dec 2001, Paul Lindner wrote:

> On Thu, Dec 06, 2001 at 10:04:26AM -0800, Bill Moseley wrote:
> > At 08:19 AM 12/06/01 -0800, Paul Lindner wrote:
> >
> > Ok, hit me over the head.  Why wouldn't you want to use a caching proxy?
>
> Apache::CacheContent gives you more control over the caching process
> and keeps the expiration headers from leaking to the browser.  Or
> maybe you want to dynamically control the TTL?
>
> sub ttl {
>   ...
>   if ($load_avg > 5) {
>     return 60 * 5;
>   } else {
>     return 60;
>   }
> }

While a ttl might be useful to some projects, others I'm sure would prefer
per-hit checking, so you can say "Yes, this thing has changed now."

Just a thought.

-- 
Matt/

/||** Founder and CTO  **  **   http://axkit.com/ **
   //||**  AxKit.com Ltd   **  ** XML Application Serving **
  // ||** http://axkit.org **  ** XSLT, XPathScript, XSP  **
 // \\| // ** mod_perl news and resources: http://take23.org  **
 \\//
 //\\
//  \\




Re: [RFC] Apache::CacheContent - Caching PerlFixupHandler

2001-12-06 Thread Bill Moseley

At 10:33 AM 12/06/01 -0800, Paul Lindner wrote:
>On Thu, Dec 06, 2001 at 10:04:26AM -0800, Bill Moseley wrote:
>> At 08:19 AM 12/06/01 -0800, Paul Lindner wrote:
>>
>> Ok, hit me over the head.  Why wouldn't you want to use a caching proxy?
>
>Apache::CacheContent gives you more control over the caching process
>and keeps the expiration headers from leaking to the browser.

Ok, I see.

>Or maybe you want to dynamically control the TTL?

Would you still use it with a front-end lightweight server?  Even with
caching, a mod_perl server is still used to send the cached file (possibly
over 56K modem), right?



Bill Moseley
mailto:[EMAIL PROTECTED]



Re: [RFC] Apache::CacheContent - Caching PerlFixupHandler

2001-12-06 Thread Paul Lindner

On Thu, Dec 06, 2001 at 10:47:35AM -0800, Bill Moseley wrote:
> >> Ok, hit me over the head.  Why wouldn't you want to use a caching proxy?
> >
> >Apache::CacheContent gives you more control over the caching process
> >and keeps the expiration headers from leaking to the browser.
>
> Ok, I see.
>
> >Or maybe you want to dynamically control the TTL?
>
> Would you still use it with a front-end lightweight server?  Even with
> caching, a mod_perl server is still used to send the cached file (possibly
> over 56K modem), right?

You definitely want a proxy-cache in front of your mod_perl server.

One thing that I like about this module is that you can control the
server-side caching of content separate from the client/browser cache.

So, on to the RFC.  Is the name acceptable for Apache::*?

I will endeavor to add any functionality that makes it worthy :)

For example, I think adding a virtual method that generates the
filename might be useful.

-- 
Paul Lindner   [EMAIL PROTECTED]| | | | |  |  |  |   |   |

mod_perl Developer's Cookbook   http://www.modperlcookbook.org
 Human Rights Declaration   http://www.unhchr.ch/udhr/index.htm



Re: [RFC] Apache::CacheContent - Caching PerlFixupHandler

2001-12-06 Thread Igor Sysoev

On Thu, 6 Dec 2001, Paul Lindner wrote:

> On Thu, Dec 06, 2001 at 10:04:26AM -0800, Bill Moseley wrote:
> > At 08:19 AM 12/06/01 -0800, Paul Lindner wrote:
> >
> > Ok, hit me over the head.  Why wouldn't you want to use a caching proxy?
>
> Apache::CacheContent gives you more control over the caching process
> and keeps the expiration headers from leaking to the browser.  Or
> maybe you want to dynamically control the TTL?

You can use mod_accel to cache a backend flexibly:
ftp://ftp.lexa.ru/pub/apache-rus/contrib/mod_accel-1.0.7.tar.gz

mod_accel understands the standard Expires and Cache-Control headers
and a special X-Accel-Expires header (which is not sent to the client).
It can also ignore the Expires and Cache-Control headers
from the backend and set expiration with the AccelDefaultExpire directive.

Compared to mod_proxy, mod_accel reads the backend response
and closes the connection to the backend as soon as possible.
There is no 2-second backend lingering-close timeout
with big answers and slow clients. A big answer means one bigger than the
frontend kernel TCP send buffer: 16K in FreeBSD and 64K in Linux by default.
mod_accel also reads the whole POST body before connecting to the backend.

mod_accel can ignore the client's Pragma: no-cache,
Cache-Control: no-cache and even Authorization headers.
mod_accel can be told not to pass some URLs to the backend.
mod_accel allows tuning of various buffer sizes and timeouts.
mod_accel can cache responses with cookie-dependent content.
mod_accel can use busy locks and can limit the number of connections to the backend.
mod_accel allows simple fault tolerance with DNS-balanced backends.
mod_accel logs various information about request processing.
mod_accel can invalidate the cache on a per-URL basis.

mod_accel has only two drawbacks: too much memory per connection
(an inherited Apache drawback) and Russian-only documentation.

Igor Sysoev




Re: [RFC] Apache::CacheContent - Caching PerlFixupHandler

2001-12-06 Thread Andrew Ho

Hello,

PL>That's the price you pay for this functionality.  Because we use
PL>Apache's native file serving code we need a url->directory mapping
PL>somewhere.
PL>
PL>Of course you don't need to make the entire docroot writable, just the
PL>directory corresponding to your script.

Apologies if this is obvious--I haven't downloaded and tried this module
yet. But would it not be possible to specify a separate directory
altogether and make it serveable (<Directory ...> ... Allow from all ...)?
If so perhaps it'd be easy to add this as a configurable parameter.

In general it is a fine idea to not make the DocumentRoot writeable by the
web user. In fact, I believe it is a good policy that the web user should
be able to write only to a small subset of controlled locations.

Humbly,

Andrew

--
Andrew Ho   http://www.tellme.com/   [EMAIL PROTECTED]
Engineer   [EMAIL PROTECTED]  Voice 650-930-9062
Tellme Networks, Inc.   1-800-555-TELLFax 650-930-9101
--




Re: [RFC] Apache::CacheContent - Caching PerlFixupHandler

2001-12-06 Thread Paul Lindner

On Thu, Dec 06, 2001 at 12:55:25PM -0800, Andrew Ho wrote:
> Hello,
>
> PL>That's the price you pay for this functionality.  Because we use
> PL>Apache's native file serving code we need a url->directory mapping
> PL>somewhere.
> PL>
> PL>Of course you don't need to make the entire docroot writable, just the
> PL>directory corresponding to your script.
>
> Apologies if this is obvious--I haven't downloaded and tried this module
> yet. But would it not be possible to specify a separate directory
> altogether and make it serveable (<Directory ...> ... Allow from all ...)?
> If so perhaps it'd be easy to add this as a configurable parameter.

Yes, you can do this using the regular Apache directives:

# mkdir /var/cache/www/mydir
# chown apache /var/cache/www/mydir
# vi /etc/httpd/conf/httpd.conf


<Directory /var/cache/www/mydir>
  AllowOverride None
  Order allow,deny
  Allow from all
</Directory>

Alias /mydir/ /var/cache/www/mydir/

> In general it is a fine idea to not make the DocumentRoot writeable by the
> web user. In fact, I believe it is a good policy that the web user should
> be able to write only to a small subset of controlled locations.

Yes, I agree totally!  I'll add some warning to the docs to make sure
that people do not inadvertently misconfigure their servers..

-- 
Paul Lindner   [EMAIL PROTECTED]| | | | |  |  |  |   |   |

mod_perl Developer's Cookbook   http://www.modperlcookbook.org
 Human Rights Declaration   http://www.unhchr.ch/udhr/index.htm