Re: Determining when a cached item is out of date
On Thu, Jan 16, 2003 at 04:40:07PM -0600, Christopher L. Everett wrote:
>
> But again, there is the issue of mapping changed data onto dependent
> pages.  I guess one way to do that is to track which database rows
> appear in which pages in the database.  Since typically I do several
> database operations to generate a page, adding one more delete or
> insert operation whenever a new page is generated won't kill me.
> Could get nasty in a big hurry if I'm not careful though.  Perhaps
> a cache manager object/class that handles cache mappings & invalidation
> would be handy.  Or maybe do that as part of the PageKit base Model class.

It all depends on the ratio between updates and selects, and the number
of distinct selects.  New versions of MySQL have support (disabled by
default, I believe) for caching the results of selects.  So if the
ratio is good, you might just use the MySQL cache and be done with it.
Of course, if your select set is so wide that it is not reasonably
cacheable, you are out of luck with this route.

Yours,

--
Honza Pazdziora | [EMAIL PROTECTED] | http://www.fi.muni.cz/~adelton/
... all of these signs saying sorry but we're closed ...
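As a follow-up, the MySQL query cache mentioned above can be inspected and enabled roughly like this (a sketch only: the variable names are from the MySQL documentation, the size shown is purely illustrative, and exact behavior depends on your MySQL version):

```sql
-- The query cache ships disabled (query_cache_size = 0 by default).
SHOW VARIABLES LIKE 'query_cache%';

-- Give it some memory; equivalently, put query_cache_size = 16M
-- under the [mysqld] section of my.cnf and restart.
SET GLOBAL query_cache_size = 16777216;

-- Note: any INSERT/UPDATE/DELETE on a table invalidates every cached
-- SELECT result that used that table, which is why the ratio between
-- updates and selects matters so much.
```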
Re: Determining when a cached item is out of date
Christopher L. Everett wrote:
> I see where one could combine polling and invalidation, for instance
> by having empty files representing a page that get touched when the
> data for them go out of date.

More commonly you would combine TTL with invalidation.  You use
invalidation for the simple stuff, where people need to make instant
updates they can see, and you use TTL to catch everything else.

> But again, there is the issue of mapping changed data onto dependent
> pages.

Tracking dependencies gets difficult quickly, and that's why almost no
one does it.  TTL is very efficient, if you can live with data being
out of sync for a little while.

- Perrin
Re: Determining when a cached item is out of date
Perrin Harkins wrote:
> Christopher L. Everett wrote:
>> But I haven't been able to wrap my skull around knowing when the
>> data in MySQL is fresher than what is in the cache without doing a
>> major portion of the work needed to generate that web page to begin
>> with.
>
> There are three ways to handle cache synchronization:
>
> 1) Time to Live (TTL).  This approach just keeps the data cached for
> a certain amount of time and ignores possible updates.  This is the
> most popular because it is easy to implement and gives good
> performance.  Cache::Cache and friends work this way.

I'm cursed by my installed base.  Our users go into our site to "make
sure" their changes are up correctly.  I don't think a 15 second TTL
would do us any good :)

> 2) Polling.  This involves checking the freshness of the data before
> serving it from cache.  This is only feasible if you have a way to
> check freshness that is faster than re-generating the data.  This is
> difficult in most situations.
>
> 3) Invalidation.  This approach involves removing cache entries
> whenever you update something that would make them out of date.  This
> is only feasible if you have total control over the update mechanism
> and can calculate all the dependencies quickly.

I see where one could combine polling and invalidation, for instance
by having empty files representing a page that get touched when the
data for them go out of date.

But again, there is the issue of mapping changed data onto dependent
pages.  I guess one way to do that is to track which database rows
appear in which pages in the database.  Since typically I do several
database operations to generate a page, adding one more delete or
insert operation whenever a new page is generated won't kill me.
Could get nasty in a big hurry if I'm not careful though.  Perhaps a
cache manager object/class that handles cache mappings & invalidation
would be handy.  Or maybe do that as part of the PageKit base Model
class.

One more thing.
Perrin Harkins' eToys case study casually mentions a means of removing
files from the mod_proxy cache directory so that mod_proxy had to go
back to the application servers to get an up-to-date copy.  I haven't
seen anything in the mod_proxy docs that says this is possible.  Does
something like that exist outside of eToys?

> Not in mod_proxy.  We added it ourselves.  I don't have the code for
> that anymore, but it's not hard to do if you have a competent C
> hacker handy.  Maybe mod_accel has this feature.

Well, I like to think I'm language independent, heh.  But reinventing
the wheel isn't cheap.  I'll root around some more.

--
Christopher L. Everett
Chief Technology Officer                  The Medical Banner Exchange
Physicians Employment on the Internet
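The cache-manager idea floated above (track which database rows appear in which pages, then invalidate the dependent pages on update) might look something like the following minimal sketch.  Everything here is hypothetical: plain in-memory hashes stand in for what would really be a dependency table in the database and a file or shared-memory page cache, and the page/table names are invented.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %deps;   # "table:id" => { page_key => 1, ... }
my %cache;  # page_key   => cached HTML

# Record that a page used a given row; this is the "one more insert
# operation" per page build mentioned above.
sub note_dependency {
    my ($page, $table, $id) = @_;
    $deps{"$table:$id"}{$page} = 1;
}

sub store_page {
    my ($page, $html) = @_;
    $cache{$page} = $html;
}

# Called from the update path: drop every page that showed this row.
sub invalidate_row {
    my ($table, $id) = @_;
    my $pages = delete $deps{"$table:$id"} or return;
    delete $cache{$_} for keys %$pages;
}

# Two pages both render row physicians:42 ...
store_page('/doc/42.html', '<html>doc 42</html>');
note_dependency('/doc/42.html', 'physicians', 42);
store_page('/list.html', '<html>listing</html>');
note_dependency('/list.html', 'physicians', 42);

# ... so an update to that row blows both cached copies away.
invalidate_row('physicians', 42);
```

This only stays cheap if, as noted, you have total control over the update mechanism; any write that bypasses invalidate_row() leaves stale pages behind.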
Re: Determining when a cached item is out of date
Christopher L. Everett wrote:
> But I haven't been able to wrap my skull around knowing when the data
> in MySQL is fresher than what is in the cache without doing a major
> portion of the work needed to generate that web page to begin with.

There are three ways to handle cache synchronization:

1) Time to Live (TTL).  This approach just keeps the data cached for a
certain amount of time and ignores possible updates.  This is the most
popular because it is easy to implement and gives good performance.
Cache::Cache and friends work this way.

2) Polling.  This involves checking the freshness of the data before
serving it from cache.  This is only feasible if you have a way to
check freshness that is faster than re-generating the data.  This is
difficult in most situations.

3) Invalidation.  This approach involves removing cache entries
whenever you update something that would make them out of date.  This
is only feasible if you have total control over the update mechanism
and can calculate all the dependencies quickly.

> One more thing.  Perrin Harkins' eToys case study casually mentions a
> means of removing files from the mod_proxy cache directory so that
> mod_proxy had to go back to the application servers to get an
> up-to-date copy.  I haven't seen anything in the mod_proxy docs that
> says this is possible.  Does something like that exist outside of
> eToys?

Not in mod_proxy.  We added it ourselves.  I don't have the code for
that anymore, but it's not hard to do if you have a competent C hacker
handy.  Maybe mod_accel has this feature.

- Perrin
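The TTL approach described in (1) can be sketched in a few lines of pure Perl.  This is only an illustration of the mechanism (an expiry timestamp stored next to each value); in practice you would reach for Cache::Cache, as the message says, rather than roll your own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %store;  # key => [ expires_at_epoch, value ]

sub ttl_set {
    my ($key, $value, $ttl) = @_;
    $store{$key} = [ time() + $ttl, $value ];
}

sub ttl_get {
    my ($key) = @_;
    my $entry = $store{$key} or return undef;
    if (time() >= $entry->[0]) {   # expired: drop it, force a rebuild
        delete $store{$key};
        return undef;
    }
    return $entry->[1];
}

ttl_set('front_page', '<html>...</html>', 300);  # cache for 5 minutes
my $html = ttl_get('front_page');  # fresh: returned from the cache
ttl_set('ticker', 'x', -1);        # already past its expiry time
my $gone = ttl_get('ticker');      # undef: caller must regenerate
```

Note that nothing here ever checks the database, which is exactly why TTL is fast and exactly why updates within the TTL window go unseen.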
Re: Determining when a cached item is out of date
On Thu, Jan 16, 2003 at 06:33:52PM +0100, Honza Pazdziora wrote:
> On Thu, Jan 16, 2003 at 06:05:30AM -0600, Christopher L. Everett wrote:
> >
> > Do AxKit and PageKit pay such close attention to caching because XML
> > processing is so deadly slow that one doesn't have a hope of
> > reasonable response times on a fast but lightly loaded server
> > otherwise?  Or is it because even a fast server would quickly be on
> > its knees under anything more than a light load?
>
> It really pays off to do any steps that will increase the throughput.
> And AxKit is well suited for caching because it has clear layers and
> interfaces between them.  So I see AxKit doing caching not only to
> get the performance, but also "just because it can".  You cannot do
> the caching easily with more dirty approaches.
>
> > With an MVC type architecture, would it make sense to have the
> > Model objects maintain the XML related to the content I want to
> > serve as static files so that a simple stat of the appropriate XML
> > file tells me if my cached HTML document is out of date?
>
> Well, AxKit uses a filesystem cache, doesn't it?
>
> It really depends on how much precision you need to achieve.  If you
> run a website that lists cinema programs, it's just fine that your
> public will see the updated pages after five minutes, not immediately
> after they were changed by the data manager.  Then you can really go
> with simply timing out the items in the cache.
>
> If you need to do something more real-time, you might prefer the push
> approach of MVC (because pull involves too much processing anyway, as
> you have said), and then you have a small problem with MySQL.  As it
> lacks trigger support, you will have to send the push invalidation
> from your applications.  Which might or might not be a problem, it
> depends on how many of them you have.

I have pages that update as often as every 15 seconds.
I just use mtime() and has_changed() properly in my custom
Provider.pm's, or rely on the File Provider's checking of the stat of
the XML files.  Mostly users are getting cached files.

For XSPs that are no_cache(1), the code that generates the information
that gets sent through the taglib does its own caching, just as if it
were a plain mod_perl handler.  They use IPC::MM and Cache::Cache
(usually a file cache).  I've fooled with having the cache use
different databases but finally decided it didn't make much of a
difference, since the OS and disk can be tuned effectively.  The
standard rules apply: put the cache on its own disk spindle, i.e. not
on the same physical disk as your SQL database, etc.  Makes a big
difference ... you can see with vmstat, systat, etc.

The only trouble is cleaning up the ever growing stale cache.  So, I
use this simple script in my /etc/daily.local file, or a guy could use
cron.  It's similar to what OpenBSD uses for its cleaning of /tmp and
/var/tmp in the /etc/daily script.

Ed.

# cat /etc/clean_www.conf
CLEAN_WWW_DIRS="/u4/www/cache /var/www/temp"

# cat /usr/local/sbin/clean_www
#!/bin/sh -
# $Id: clean_www.sh,v 1.2 2003/01/03 00:18:27 entropic Exp $

: ${CLEAN_WWW_CONF:=/etc/clean_www.conf}

clean_dir() {
        dir=$1
        echo "Removing scratch and junk files from '$dir':"
        if [ -d $dir -a ! -L $dir ]; then
                cd $dir && {
                        find . ! -name . -atime +1 \
                            -execdir rm -f -- {} \;
                        find . ! -name . -type d -mtime +1 \
                            -execdir rmdir -- {} \; >/dev/null 2>&1;
                }
        fi
}

if [ -f $CLEAN_WWW_CONF ]; then
        . $CLEAN_WWW_CONF
fi

if [ "X${CLEAN_WWW_DIRS}" != X"" ]; then
        echo ""
        for cfg_dir in $CLEAN_WWW_DIRS; do
                clean_dir "${cfg_dir}"
        done
fi
Re: Determining when a cached item is out of date
On Thu, Jan 16, 2003 at 06:05:30AM -0600, Christopher L. Everett wrote:
>
> Do AxKit and PageKit pay such close attention to caching because XML
> processing is so deadly slow that one doesn't have a hope of
> reasonable response times on a fast but lightly loaded server
> otherwise?  Or is it because even a fast server would quickly be on
> its knees under anything more than a light load?

It really pays off to do any steps that will increase the throughput.
And AxKit is well suited for caching because it has clear layers and
interfaces between them.  So I see AxKit doing caching not only to get
the performance, but also "just because it can".  You cannot do the
caching easily with more dirty approaches.

> With an MVC type architecture, would it make sense to have the Model
> objects maintain the XML related to the content I want to serve as
> static files so that a simple stat of the appropriate XML file tells
> me if my cached HTML document is out of date?

Well, AxKit uses a filesystem cache, doesn't it?

It really depends on how much precision you need to achieve.  If you
run a website that lists cinema programs, it's just fine that your
public will see the updated pages after five minutes, not immediately
after they were changed by the data manager.  Then you can really go
with simply timing out the items in the cache.

If you need to do something more real-time, you might prefer the push
approach of MVC (because pull involves too much processing anyway, as
you have said), and then you have a small problem with MySQL.  As it
lacks trigger support, you will have to send the push invalidation
from your applications.  Which might or might not be a problem, it
depends on how many of them you have.

--
Honza Pazdziora | [EMAIL PROTECTED] | http://www.fi.muni.cz/~adelton/
... all of these signs saying sorry but we're closed ...
Determining when a cached item is out of date
I'm moving into the XML space, and one of the things I see is that XML
processing is very expensive, so AxKit, PageKit, et al. make extensive
use of caching.

I'm keeping all of my data in a MySQL DB with about 40 tables.  I'm
pretty clear about how to turn that MySQL data into XML and turn the
XML into HTML, WML, or what have you.  But I haven't been able to wrap
my skull around knowing when the data in MySQL is fresher than what is
in the cache without doing a major portion of the work needed to
generate that web page to begin with.

Do AxKit and PageKit pay such close attention to caching because XML
processing is so deadly slow that one doesn't have a hope of
reasonable response times on a fast but lightly loaded server
otherwise?  Or is it because even a fast server would quickly be on
its knees under anything more than a light load?

With an MVC type architecture, would it make sense to have the Model
objects maintain the XML related to the content I want to serve as
static files, so that a simple stat of the appropriate XML file tells
me if my cached HTML document is out of date?

One more thing.  Perrin Harkins' eToys case study casually mentions a
means of removing files from the mod_proxy cache directory so that
mod_proxy had to go back to the application servers to get an
up-to-date copy.  I haven't seen anything in the mod_proxy docs that
says this is possible.  Does something like that exist outside of
eToys?

I don't know, maybe my Prussian Perfection gene has taken over again
and wants a bigger win than I need to get ...

--
Christopher L. Everett
Chief Technology Officer                  The Medical Banner Exchange
Physicians Employment on the Internet
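The stat idea in the MVC question above can be sketched like this.  It is only an illustration under assumed conventions: the file names are invented, and it presumes the Model objects rewrite the XML file on every data change so that its mtime is an honest freshness marker.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Cached HTML is stale whenever its XML source carries a newer mtime.
sub cache_is_stale {
    my ($xml_file, $html_file) = @_;
    return 1 unless -e $html_file;            # never rendered: stale
    my $xml_mtime  = (stat $xml_file)[9];
    my $html_mtime = (stat $html_file)[9];
    return $xml_mtime > $html_mtime ? 1 : 0;  # source newer => rebuild
}

# Demonstration with two scratch files in a temp directory.
my $dir  = tempdir(CLEANUP => 1);
my $xml  = "$dir/page.xml";
my $html = "$dir/page.html";
for my $f ($xml, $html) {
    open my $fh, '>', $f or die "open $f: $!";
    close $fh;
}
my $now = time;
utime $now - 60, $now - 60, $html;   # HTML rendered a minute ago ...
utime $now,      $now,      $xml;    # ... but the XML just changed
my $stale = cache_is_stale($xml, $html);   # 1: time to re-render
```

One stat per request is about as cheap as polling gets, which is what makes this variant of approach (2) feasible where re-running the queries would not be.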