Re: Determining when a cached item is out of date

2003-01-17 Thread Honza Pazdziora
On Thu, Jan 16, 2003 at 04:40:07PM -0600, Christopher L. Everett wrote:
> 
> But again, there is the issue of mapping changed data onto dependent
> pages.  I guess one way to do that is to track which database rows
> appear in which pages in the database.  Since typically I do several
> database operations to generate a page, adding one more delete or
> insert operation whenever a new page is generated won't kill me.
> Could get nasty in a big hurry if I'm not careful though.  Perhaps
> a cache manager object/class that handles cache mappings & invalidation
> would be handy.  Or maybe do that as part of the PageKit base Model class.

It all depends on the ratio between updates and selects, and the
number of distinct selects.

New versions of MySQL can cache the results of selects (I believe
this is disabled by default). So if the ratio is good, you might
just use the MySQL cache and be done with it. Of course, if your
select set is so wide that it is not reasonably cacheable, you are
out of luck with this route.

Yours,

-- 

 Honza Pazdziora | [EMAIL PROTECTED] | http://www.fi.muni.cz/~adelton/
  ... all of these signs saying sorry but we're closed ...




Re: Determining when a cached item is out of date

2003-01-16 Thread Perrin Harkins
Christopher L. Everett wrote:

> I see where one could combine polling and invalidation, for instance
> by having empty files representing a page that get touched when the
> data for them go out of date.


More commonly you would combine TTL with invalidation.  You use 
invalidation for the simple stuff, where people need to make instant 
updates they can see, and you use TTL to catch everything else.
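
A minimal Python sketch of that combination, assuming a single-process
in-memory cache. TTLCache, get, invalidate and the 300-second TTL are
illustrative names and values, not Cache::Cache's API:

```python
import time


class TTLCache:
    """Toy in-memory cache: entries expire after ttl seconds or on invalidate()."""

    def __init__(self, ttl, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock   # injectable so expiry can be exercised in tests
        self._store = {}     # key -> (expires_at, value)

    def get(self, key, generate):
        """Return the cached value, calling generate() on a miss or after expiry."""
        entry = self._store.get(key)
        now = self.clock()
        if entry is not None and entry[0] > now:
            return entry[1]
        value = generate()
        self._store[key] = (now + self.ttl, value)
        return value

    def invalidate(self, key):
        """Drop an entry right away, e.g. after a user edits their own data."""
        self._store.pop(key, None)


# Hypothetical usage: the update path invalidates the page the user will
# look at next; everything else simply ages out after five minutes.
cache = TTLCache(ttl=300)
page = cache.get("/profile/42", lambda: "<html>profile 42</html>")
cache.invalidate("/profile/42")
```

The update code only needs to know the one or two keys a user will check
immediately; pages it does not know about still refresh once the TTL runs out.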

> But again, there is the issue of mapping changed data onto dependent
> pages.


Tracking dependencies gets difficult quickly, and that's why almost no 
one does it.  TTL is very efficient, if you can live with data being out 
of sync for a little while.

- Perrin



Re: Determining when a cached item is out of date

2003-01-16 Thread Christopher L. Everett
Perrin Harkins wrote:

> Christopher L. Everett wrote:
>
> > But I haven't
> > been able to wrap my skull around knowing when the data in MySQL is
> > fresher than what is in the cache without doing a major portion of the
> > work needed to generate that web page to begin with.


> There are three ways to handle cache synchronization:
>
> 1) Time to Live (TTL).  This approach just keeps the data cached for a
> certain amount of time and ignores possible updates.  This is the most
> popular because it is easy to implement and gives good performance.
> Cache::Cache and friends work this way.

I'm cursed by my installed base.  Our users go into our site to "make
sure" their changes are up correctly.  I don't think a 15 second TTL
would do us any good :)


> 2) Polling.  This involves checking the freshness of the data before
> serving it from cache.  This is only feasible if you have a way to check
> freshness that is faster than re-generating the data.  This is difficult
> in most situations.
>
> 3) Invalidation.  This approach involves removing cache entries whenever
> you update something that would make them out of date.  This is only
> feasible if you have total control over the update mechanism and can
> calculate all the dependencies quickly.

I see where one could combine polling and invalidation, for instance
by having empty files representing a page that get touched when the
data for them go out of date.

But again, there is the issue of mapping changed data onto dependent
pages.  I guess one way to do that is to track which database rows
appear in which pages in the database.  Since typically I do several
database operations to generate a page, adding one more delete or
insert operation whenever a new page is generated won't kill me.
Could get nasty in a big hurry if I'm not careful though.  Perhaps
a cache manager object/class that handles cache mappings & invalidation
would be handy.  Or maybe do that as part of the PageKit base Model class.
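
A rough Python sketch of what such a cache manager's mapping might look
like. Everything here is hypothetical (the DependencyTracker name, the
(table, row_id) keys, the page paths), and it keeps the mapping in memory
for brevity, whereas the idea above is to store it in the database:

```python
from collections import defaultdict


class DependencyTracker:
    """Maps (table, row_id) pairs to the pages that were built from them."""

    def __init__(self):
        self._pages_by_row = defaultdict(set)   # (table, row_id) -> {page}
        self._rows_by_page = defaultdict(set)   # page -> {(table, row_id)}

    def record(self, page, rows):
        """Replace the stored dependencies of page with rows."""
        self.forget(page)
        for row in rows:
            self._pages_by_row[row].add(page)
            self._rows_by_page[page].add(row)

    def forget(self, page):
        """Remove every mapping that points at page."""
        for row in self._rows_by_page.pop(page, set()):
            pages = self._pages_by_row[row]
            pages.discard(page)
            if not pages:
                del self._pages_by_row[row]

    def pages_for(self, row):
        """Pages that must be dropped from the cache when row changes."""
        return set(self._pages_by_row.get(row, ()))


tracker = DependencyTracker()
tracker.record("/jobs/1", [("jobs", 1), ("employers", 7)])
tracker.record("/employers/7", [("employers", 7)])
stale = tracker.pages_for(("employers", 7))   # both pages depend on this row
```

Every code path that updates a row has to call pages_for() and drop
those pages, which is where this gets nasty if any writer is missed.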


> > One more thing.  Perrin Harkins' eToys case study casually mentions
> > a means of removing files from the mod_proxy cache directory so that
> > mod_proxy had to go back to the application servers to get an up to
> > date copy.  I haven't seen anything in the mod_proxy docs that says
> > this is possible.  Does something like that exist outside of eToys?


> Not in mod_proxy.  We added it ourselves.  I don't have the code for
> that anymore, but it's not hard to do if you have a competent C hacker
> handy.  Maybe mod_accel has this feature.

Well, I like to think I'm language independent, heh.  But reinventing
the wheel isn't cheap.  I'll root around some more.

--
Christopher L. Everett
Chief Technology Officer
The Medical Banner Exchange
Physicians Employment on the Internet




Re: Determining when a cached item is out of date

2003-01-16 Thread Perrin Harkins
Christopher L. Everett wrote:

> But I haven't
> been able to wrap my skull around knowing when the data in MySQL is
> fresher than what is in the cache without doing a major portion of the
> work needed to generate that web page to begin with.


There are three ways to handle cache synchronization:

1) Time to Live (TTL).  This approach just keeps the data cached for a 
certain amount of time and ignores possible updates.  This is the most 
popular because it is easy to implement and gives good performance. 
Cache::Cache and friends work this way.

2) Polling.  This involves checking the freshness of the data before 
serving it from cache.  This is only feasible if you have a way to check 
freshness that is faster than re-generating the data.  This is difficult 
in most situations.

3) Invalidation.  This approach involves removing cache entries whenever 
you update something that would make them out of date.  This is only 
feasible if you have total control over the update mechanism and can 
calculate all the dependencies quickly.
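
To make option 2 concrete, here is a minimal Python sketch of polling. It
assumes there is some cheap way to get a version stamp for the underlying
data; the current_version callable is hypothetical and might wrap a
last-modified timestamp or an update counter. PolledCache and generate
are illustrative names, not part of Cache::Cache or any other library:

```python
class PolledCache:
    """Serve from cache only while a cheap version check still matches."""

    def __init__(self):
        self._store = {}   # key -> (version, value)

    def get(self, key, current_version, generate):
        # The version check runs on every request, so it only pays off
        # when it is much cheaper than generate().
        version = current_version()
        entry = self._store.get(key)
        if entry is not None and entry[0] == version:
            return entry[1]
        value = generate()
        self._store[key] = (version, value)
        return value


# Hypothetical usage with a counter standing in for the real data version.
data_version = {"n": 1}
cache = PolledCache()
first = cache.get("/listing", lambda: data_version["n"], lambda: "page v1")
data_version["n"] = 2   # the data changed behind the cache
second = cache.get("/listing", lambda: data_version["n"], lambda: "page v2")
```

If no such stamp is available short of re-running the queries, polling
costs about as much as regenerating the page.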

> One more thing.  Perrin Harkins' eToys case study casually mentions
> a means of removing files from the mod_proxy cache directory so that
> mod_proxy had to go back to the application servers to get an up to
> date copy.  I haven't seen anything in the mod_proxy docs that says
> this is possible.  Does something like that exist outside of eToys?


Not in mod_proxy.  We added it ourselves.  I don't have the code for 
that anymore, but it's not hard to do if you have a competent C hacker 
handy.  Maybe mod_accel has this feature.

- Perrin



Re: Determining when a cached item is out of date

2003-01-16 Thread Ed
On Thu, Jan 16, 2003 at 06:33:52PM +0100, Honza Pazdziora wrote:
> On Thu, Jan 16, 2003 at 06:05:30AM -0600, Christopher L. Everett wrote:
> > 
> > Do AxKit and PageKit pay such close attention to caching because XML
> > processing is so deadly slow that one doesn't have a hope of reasonable
> > response times on a fast but lightly loaded server otherwise?  Or is
> > it because even a fast server would quickly be on its knees under
> > anything more than a light load?
> 
> It really pays off to do any steps that will increase the throughput.
> And AxKit is well suited for caching because it has clear layers and
> interfaces between them. So I see AxKit doing caching not only to get
> the performance, but also "just because it can". You cannot do the
> caching easily with more dirty approaches.
> 
> > With an MVC-type architecture, would it make sense to have the Model
> > objects maintain the XML related to the content I want to serve as
> > static files so that a simple stat of the appropriate XML file tells
> > me if my cached HTML document is out of date?
> 
> Well, AxKit uses filesystem cache, doesn't it?
> 
> It really depends on how much precision you need to achieve. If you
> run a website that lists cinema programs, it's just fine that your
> public will see the updated pages after five minutes, not immediately
> after they were changed by the data manager. Then you can really go
> with simply timing out the items in the cache.
> 
> If you need to do something more real-time, you might prefer the push
> approach of MVC (because pull involves too much processing anyway, as
> you have said), and then you have a small problem with MySQL. As it
> lacks trigger support, you will have to send the push invalidation
> from your applications. Which might or might not be a problem, it
> depends on how many of them you have.

I have pages that update as often as every 15 seconds.  I just use mtime()
and has_changed() properly in my custom Provider.pm modules, or rely on
File::Provider's checking of the stat of the XML files.  Mostly users are
getting cached files.
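
The stat comparison itself is tiny.  A hedged Python sketch of the idea
(this is not AxKit's provider API; cached_html_is_fresh and both path
arguments are hypothetical names):

```python
import os


def cached_html_is_fresh(xml_path, html_path):
    """True if html_path exists and is at least as new as xml_path.

    A missing file on either side counts as "not fresh", so the caller
    falls through to regenerating the page.
    """
    try:
        return os.stat(html_path).st_mtime >= os.stat(xml_path).st_mtime
    except FileNotFoundError:
        return False
```

This only works if whatever changes the data also rewrites (or touches)
the XML file, so that its mtime moves forward.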

For XSPs that are no_cache(1), the code that generates the information that
gets sent through the taglib does its own caching, just as if it were a
plain mod_perl handler.  They use IPC::MM and Cache::Cache (usually FileCache).

I've fooled w/ having the cache use different databases but finally decided it
didn't make much of a difference since the OS and disk can be tuned effectively.
The standard rules apply: put the cache on its own disk spindle, i.e. not on
the same physical disk as your SQL database etc.  Makes a big difference ...
you can see it w/ vmstat, systat etc.

The only trouble is cleaning up the ever-growing stale cache.  So I use this
simple script in my /etc/daily.local file; one could also use cron.

It's similar to what OpenBSD uses for cleaning /tmp and /var/tmp in the
/etc/daily script.

Ed.

# cat /etc/clean_www.conf
CLEAN_WWW_DIRS="/u4/www/cache /var/www/temp"

# cat /usr/local/sbin/clean_www
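#!/bin/sh -
# $Id: clean_www.sh,v 1.2 2003/01/03 00:18:27 entropic Exp $

# Config file that sets CLEAN_WWW_DIRS; can be overridden from the environment.
: ${CLEAN_WWW_CONF:=/etc/clean_www.conf}

clean_dir() {
    dir=$1
    echo "Removing scratch and junk files from '$dir':"
    # Only descend into real directories, never through a symlink.
    if [ -d "$dir" -a ! -L "$dir" ]; then
        cd "$dir" && {
        # Remove non-directories that have not been accessed for over a day.
        find . ! -name . ! -type d -atime +1 -execdir rm -f -- {} \;
        # Remove old directories; rmdir fails harmlessly on non-empty ones.
        find . ! -name . -type d -mtime +1 -execdir rmdir -- {} \; \
            >/dev/null 2>&1; }
    fi
}

if [ -f "$CLEAN_WWW_CONF" ]; then
    . "$CLEAN_WWW_CONF"
fi

# Run only if the config actually listed directories to clean.
if [ "X${CLEAN_WWW_DIRS}" != X"" ]; then
    echo ""
    for cfg_dir in $CLEAN_WWW_DIRS; do
        clean_dir "${cfg_dir}"
    done
fi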






Re: Determining when a cached item is out of date

2003-01-16 Thread Honza Pazdziora
On Thu, Jan 16, 2003 at 06:05:30AM -0600, Christopher L. Everett wrote:
> 
> Do AxKit and PageKit pay such close attention to caching because XML
> processing is so deadly slow that one doesn't have a hope of reasonable
> response times on a fast but lightly loaded server otherwise?  Or is
> it because even a fast server would quickly be on its knees under
> anything more than a light load?

It really pays off to do any steps that will increase the throughput.
And AxKit is well suited for caching because it has clear layers and
interfaces between them. So I see AxKit doing caching not only to get
the performance, but also "just because it can". You cannot do the
caching easily with more dirty approaches.

> With an MVC-type architecture, would it make sense to have the Model
> objects maintain the XML related to the content I want to serve as
> static files so that a simple stat of the appropriate XML file tells
> me if my cached HTML document is out of date?

Well, AxKit uses filesystem cache, doesn't it?

It really depends on how much precision you need to achieve. If you
run a website that lists cinema programs, it's just fine that your
public will see the updated pages after five minutes, not immediately
after they were changed by the data manager. Then you can really go
with simply timing out the items in the cache.

If you need to do something more real-time, you might prefer the push
approach of MVC (because pull involves too much processing anyway, as
you have said), and then you have a small problem with MySQL. As it
lacks trigger support, you will have to send the push invalidation
from your applications. Which might or might not be a problem, it
depends on how many of them you have.

-- 

 Honza Pazdziora | [EMAIL PROTECTED] | http://www.fi.muni.cz/~adelton/
  ... all of these signs saying sorry but we're closed ...




Determining when a cached item is out of date

2003-01-16 Thread Christopher L. Everett
I'm moving into the XML space and one of the things I see is that XML
processing is very expensive, so AxKit, PageKit, et al make extensive
use of caching.  I'm keeping all of my data in a MySQL DB with about
40 tables.  I'm pretty clear about how to turn that MySQL data into
XML and turn the XML into HTML, WML, or what have you.  But I haven't
been able to wrap my skull around knowing when the data in MySQL is
fresher than what is in the cache without doing a major portion of the
work needed to generate that web page to begin with.

Do AxKit and PageKit pay such close attention to caching because XML
processing is so deadly slow that one doesn't have a hope of reasonable
response times on a fast but lightly loaded server otherwise?  Or is
it because even a fast server would quickly be on its knees under
anything more than a light load?

With an MVC-type architecture, would it make sense to have the Model
objects maintain the XML related to the content I want to serve as
static files so that a simple stat of the appropriate XML file tells
me if my cached HTML document is out of date?

One more thing.  Perrin Harkins' eToys case study casually mentions
a means of removing files from the mod_proxy cache directory so that
mod_proxy had to go back to the application servers to get an up to
date copy.  I haven't seen anything in the mod_proxy docs that says
this is possible.  Does something like that exist outside of eToys?

I don't know, maybe my Prussian Perfection gene has taken over again
and wants a bigger win than I need to get ...

--
Christopher L. Everett
Chief Technology Officer
The Medical Banner Exchange
Physicians Employment on the Internet