Re: SELECT cacheing
Brian Cocks wrote:
>
> I'm wondering how much improvement this caching is over the database caching
> the parsed SQL statement and results (disk blocks)?
>
> In Oracle, if you issue a query that is cached, it doesn't need to be parsed.
> If the resulting blocks are also cached, there isn't any disk access. If the
> database is tuned, you should be able to get stuff out of cache over 90% of
> the time. I don't know what other databases other than Oracle do.

Oracle is clever in this respect - and you are right, if you tune correctly
you should hit the cache. But many other DBs do not have a shared execution
plan / results cache. MySQL, mSQL and PostgreSQL do not. As far as I am
aware, Sybase only has an execution plan cache that is per connection (I
could be wrong here).

The MySQL developers have a SELECT CACHED idea, where you can define a
statement as cacheable, and further calls to SELECT CACHE will return the
cached results - but this is all on the todo list with no fixed date.

> What are the advantages of implementing your own cache? Is there any reason
> other than speed?

Could be wrong, but no - and it has a downside in that it introduces an
added layer of complexity. I am certainly interested, as accessing a local
cache should be an order of magnitude faster than asking a busy DB.

Greg Cope

> --
> Brian Cocks
> Senior Software Architect
> Multi-Ad Services, Inc.
> [EMAIL PROTECTED]
Re: SELECT cacheing
On Fri, 8 Sep 2000, Perrin Harkins wrote:
> On Fri, 8 Sep 2000, Roger Espel Llima wrote:
> > > - If possible, use some existing cache module for the storage, like
> > > Apache::Session or one of the m/Cache/ modules on CPAN.
> >
> > Others have suggested Storable. I've used this one before, and I can
> > agree that it's probably a good solution.
>
> Storable is just a way to turn a complex data structure into a single
> scalar. You still need to handle the file manipulation yourself. Most of
> the existing cache modules use Storable to serialize to a scalar and then
> files or shared memory or a dbm for actual storage.

I would delegate the tieing, serialization, and locking to a module like
Apache::Session (it uses Storable internally). Then the user can specify
their own favorite backing store and locking mechanism by subclassing
Apache::Session.

I would also look to the Memoize module for ideas:

  http://search.cpan.org/search?dist=Memoize

-Tim
Re: SELECT cacheing
I'm wondering how much improvement this caching is over the database caching
the parsed SQL statement and results (disk blocks)?

In Oracle, if you issue a query that is cached, it doesn't need to be parsed.
If the resulting blocks are also cached, there isn't any disk access. If the
database is tuned, you should be able to get stuff out of cache over 90% of
the time. I don't know what other databases other than Oracle do.

What are the advantages of implementing your own cache? Is there any reason
other than speed?

--
Brian Cocks
Senior Software Architect
Multi-Ad Services, Inc.
[EMAIL PROTECTED]
Re: SELECT cacheing
On Fri, 8 Sep 2000, Roger Espel Llima wrote:
> > - If possible, use some existing cache module for the storage, like
> > Apache::Session or one of the m/Cache/ modules on CPAN.
>
> Others have suggested Storable. I've used this one before, and I can
> agree that it's probably a good solution.

Storable is just a way to turn a complex data structure into a single
scalar. You still need to handle the file manipulation yourself. Most of
the existing cache modules use Storable to serialize to a scalar and then
files or shared memory or a dbm for actual storage.

> This means that this module would need a config file.

Or some PerlSetVar directives in httpd.conf.

- Perrin
Re: SELECT cacheing
On Fri, Sep 08, 2000 at 03:46:25PM +0100, Tim Sweetman wrote:
> This can be an extremely powerful approach to speeding up web
> applications. We use a similar module which ended up fairly large - it
> takes a method name & arguments, rather than an SQL string, meaning that
> you can cache the result of operations other than SQL queries. It's also
> grown several other enhancements: a mutual-exclusion-and-backoff
> algorithm, so if one process is looking for the answer, others wait for
> it rather than performing the same query at the same time, and several
> ways to expire results that have become outdated (specifying lifetime,
> or via timestamp files that get touched when major changes happen)

That sure sounds powerful!

> The one thing I'd advise is: BE VERY CAREFUL WITH RACE CONDITIONS. You
> can easily end up with something that will, in an unusual case, store
> garbled data. I think you'd need to either use flock(), or write to
> files then rename them, since rename is an atomic operation - and I
> don't know how well that works under OSs other than UNIXes.

I use the latter approach: write to a temp name, then rename. I really think
this should be safe anywhere; I don't think any OS would be broken enough to
make a rename non-atomic and let other processes read garbled data after the
original file has been written and closed.

> Many CPAN things that do this sort of thing use tied hashes, which
> (mostly, at least) won't work in a multi-process environment because
> they don't handle concurrent reads & writes.

I really prefer Storable or something like it for this application. That way
each cached value is a file, and we can use the filesystem and its lastmod
metadata, and standard tools like find or File::Find, to clean up.

--
Roger Espel Llima, [EMAIL PROTECTED]
http://www.iagora.com/~espel/index.html
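For illustration, a minimal sketch of the write-then-rename approach Roger
describes; the helper name is made up, and $data stands for whatever
serialized scalar you want to cache:

  # Sketch: write the serialized result to a temporary name in the same
  # directory, then rename() it into place so readers never see a partial
  # file.
  sub write_cache_atomically {
      my ($path, $data) = @_;
      my $tmp = "$path.$$.tmp";          # per-process temp name, same filesystem
      open my $fh, '>', $tmp  or die "open $tmp: $!";
      print {$fh} $data       or die "write $tmp: $!";
      close $fh               or die "close $tmp: $!";
      rename $tmp, $path      or die "rename $tmp to $path: $!";
  }

Keeping the temp file in the same directory matters: rename() is only atomic
within a single filesystem.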
Re: SELECT cacheing
On Fri, Sep 08, 2000 at 09:26:23AM -0400, Drew Taylor wrote:
> I'm certainly interested. One question though - in the module do you
> blindly use the cache? I ask because in my instance I display the
> contents of a shopping cart on every page. And while only a few pages
> change the cart contents, the cart listing does need to be current. How
> do you handle this situation?

The module gets the expiration time. If it's 0 or negative, it ignores the
cache and reads straight from the db.

--
Roger Espel Llima, [EMAIL PROTECTED]
http://www.iagora.com/~espel/index.html
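A rough sketch of that expiration logic; run_select(), cache_path_for(),
read_cache() and write_cache() are assumed helpers for this example, not
functions in Roger's module:

  # Sketch of the expiration check: a timeout of 0 or less bypasses the
  # cache entirely, otherwise a fresh-enough cache file is reused.
  sub cached_or_live {
      my ($dbh, $sql, $timeout) = @_;
      return run_select($dbh, $sql) if $timeout <= 0;   # always hit the db

      my $path = cache_path_for($sql);
      if (-f $path && time - (stat _)[9] < $timeout) {
          return read_cache($path);                     # cache still fresh
      }
      my $rows = run_select($dbh, $sql);
      write_cache($path, $rows);
      return $rows;
  }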
Re: SELECT cacheing
On Thu, Sep 07, 2000 at 06:22:40PM -0700, Perrin Harkins wrote:
> I'd say this is probably useful to some people, so go ahead. A few
> suggestions:
> - Use the DBIx namespace for the module.

Sounds reasonable. The question then is: what should the API be like? The
way it works right now is with an API of its own:

  $arrayref_of_arrays = SelectCache::select($dbh, $st, $timeout);

It'd be nice to have an API that mimics the DBI one more closely, with the
different fetchrow_* and fetchall_* interfaces. Then again, for a module
whose main purpose in life is to speed up SELECTs, maybe restricting it to
mimic selectall_arrayref(), selectrow_array() and selectcol_arrayref() would
be enough. I really don't see much of a purpose in writing iterators a la
fetchrow_* for a module that gets all the rows at the same time.

This is actually an important thing to decide, because AFAICS
fetchrow_hashref is the only method that returns hashes and therefore needs
to care about column names. So if we decide to support only the select*
interfaces, all we have to store are arrayrefs of arrayrefs. If we do
support fetchrow_hashref, then we need either two kinds of storage (so that
the results of the same SELECT can be cached twice, if one script wants
arrays and the other wants hashes), or a way to get arrays from hashes or
vice versa, which looks hard because one loses the order and the other the
names.

Another matter is: should this be a subclass of DBI, mimicking its API, with
an interface like:

  my $dbc = DBI::SelectCache->new($db);
  $dbc->expiration(180);
  my $st = qq{ select ... };
  my $rows = $dbc->selectall_arrayref($st);

and letting everything else (prepare, fetch*, etc) fall through to the
superclass, or should it be passing the expiration time as part of the main
function call, as I was doing before? That would be something like:

  my $st = qq{ select ... };
  my $rows = DBI::SelectCache->selectall_arrayref($db, $st);

The first option fits in better with DBI, but the second is more practical
for the user, who doesn't need to create another object, and can think of
the expiration as a per-statement thing (which it is), rather than
per-connection. Any suggestions? I'm a bit lost on how to give this thing a
good, extendable interface.

> - If possible, use some existing cache module for the storage, like
> Apache::Session or one of the m/Cache/ modules on CPAN.

Others have suggested Storable. I've used this one before, and I can agree
that it's probably a good solution. Right now, what I'm doing is just
join()ing the arrays with null characters as separators, and using split()
to get them back. But null chars are allowed in databases, so I agree that
switching to Storable would be a good idea.

> - Provide a safety check so that if a query brought back a few million
> rows by accident you wouldn't try to write the whole mess to disk.
> - Maybe try to support the other results interfaces in DBI?

Sounds good. This means that this module would need a config file.

--
Roger Espel Llima, [EMAIL PROTECTED]
http://www.iagora.com/~espel/index.html
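A small sketch of what swapping the null-joined strings for Storable could
look like. nfreeze() and thaw() are real Storable functions; $rows is the
arrayref from selectall_arrayref(), and write_cache_atomically() and $path
are assumed helpers (see the earlier rename sketch):

  use Storable qw(nfreeze thaw);

  # nfreeze() turns the arrayref of arrayrefs into a single scalar that is
  # safe for any byte values (including nulls); thaw() reverses it.
  my $frozen = nfreeze($rows);
  write_cache_atomically($path, $frozen);   # assumed helper

  my $roundtrip = thaw($frozen);            # same structure back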
Re: SELECT cacheing
On Fri, 8 Sep 2000, Tim Sweetman wrote:
> > - Use the DBIx namespace for the module.
>
> Possibly. SQL is not the only application for this sort of tool, though
> it seems to be the main one.

The module we're discussing is DBI-specific. At least the interesting part
of it is. The actual caching part is the second most re-invented wheel on
the mod_perl list, right behind templating systems.

> > - If possible, use some existing cache module for the storage, like
> > Apache::Session or one of the m/Cache/ modules on CPAN.
>
> IIRC, Apache::Session *generates* its own key for each session.

It only does that if you don't hand it one.

- Perrin
Re: SELECT cacheing
DeWitt - this started as a reply to the modperl mailing list, & I had a look
at File::Cache as my reply grew. See the end of this for the relevant bit :)
- I think I've found a bug...

Drew Taylor wrote:
>
> Roger Espel Llima wrote:
> >
> > I've written a very small module to cache SELECT results from DBI
> > requests. The interface looks like:
> >
> >   use SelectCache;
> >
> >   my $db = whatever::get_a_handle();
> >   my $st = qq{ select this, that ... };
> >   my $rows = SelectCache::select($db, $st, 180);
> >
> > this returns an arrayref of rows (like the selectall_arrayref function),
> > and caches the result in a file, which gets reused for 180 seconds
> > instead of asking the db again.

"Storable" is probably a good way to store this sort of result.

> > The names of the cache files are the md5's of the select statement,
> > using the last hex digit as a subdirectory name. There's no file
> > cleanup function; you can always do that from cron with find.
> >
> > This is all very simple, but it's pretty useful in combination with
> > mod_perl, to speed up things like showing the "latest 10 posts", on
> > frequently accessed webpages.
> >
> > The question now is: is there any interest in releasing this? I could
> > write some minimal docs and give it a 'proper' module name, if there's
> > interest.

This can be an extremely powerful approach to speeding up web applications.
We use a similar module which ended up fairly large - it takes a method name
& arguments, rather than an SQL string, meaning that you can cache the
result of operations other than SQL queries. It's also grown several other
enhancements: a mutual-exclusion-and-backoff algorithm, so that if one
process is looking for the answer, others wait for it rather than performing
the same query at the same time, and several ways to expire results that
have become outdated (specifying a lifetime, or via timestamp files that get
touched when major changes happen).

I always thought it'd make a good thing to CPANify but never got round to
it :(

The one thing I'd advise is: BE VERY CAREFUL WITH RACE CONDITIONS. You can
easily end up with something that will, in an unusual case, store garbled
data. I think you'd need to either use flock(), or write to files and then
rename them, since rename is an atomic operation - and I don't know how well
that works under OSs other than UNIXes.

> I'm certainly interested. One question though - in the module do you
> blindly use the cache? I ask because in my instance I display the
> contents of a shopping cart on every page.

I think this would be tricky to use with a cache - cart contents will change
in real time, and there's one copy per user, so you'd need a way of expiring
the cached data according to user ID. Some RDBMSs get large performance
improvements from using placeholders ("select * from foo where userid = ?")
and caching the statement handles - I don't know if this applies to MySQL.
With your sort of application I'd try those measures before trying to use a
complex cache mechanism. Where up-to-date results are not critical, a cache
mechanism has great merit, IMHO.

Reading back along this thread, Perrin Harkins wrote:
> - Use the DBIx namespace for the module.

Possibly. SQL is not the only application for this sort of tool, though it
seems to be the main one.

> - If possible, use some existing cache module for the storage, like
> Apache::Session or one of the m/Cache/ modules on CPAN.

IIRC, Apache::Session *generates* its own key for each session. This isn't
going to work with an MD5-keyed cache, where the key is generated from the
SQL.

File::Cache seems to do something rather similar, though without the MD5
bit. However, from a cursory look at the code, I think it's vulnerable to
concurrency conditions such as:

+ process (a) reads a file whilst (b) is still writing it
+ processes (a) and (b) both write to a file simultaneously, possibly
  corrupting it?! (this may be impossible, not sure)
+ a process fails whilst writing a file (e.g. the process catches a KILL);
  subsequent reads of that file will get a fatal error

... which will pop up only Sometimes, usually on a busy site open to the
public :) This is Not Nice, assuming it's true.

Many CPAN things that do this sort of thing use tied hashes, which (mostly,
at least) won't work in a multi-process environment because they don't
handle concurrent reads & writes.

Cheers

--
Tim Sweetman
A L Digital
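For reference, a short sketch of the placeholder plus cached-statement-handle
approach Tim mentions, using DBI's prepare_cached(). It assumes $dbh is an
already-connected DBI handle; the table, columns and $user_id are invented
for the example:

  # Placeholder query with a cached statement handle: the SQL is parsed and
  # planned once per process, and only the bind value changes per request.
  my $sth = $dbh->prepare_cached(
      'select id, title from cart_items where user_id = ?'
  );
  $sth->execute($user_id);
  while (my ($id, $title) = $sth->fetchrow_array) {
      # ... render one cart line ...
  }
  $sth->finish;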
Re: SELECT cacheing
Some good ideas. I think that this package might come out a bit thin,
though. I've written a package that does arbitrary variable caching (like
everybody else), but it has a list of other bells and whistles, things like
cache expiration and data refresh hooks. It's a pretty simple process.

From there, I have (but admittedly don't use yet) a little DB package that
sits as an interface between the programmer and the DB, and incorporates
things like this caching package at the same time. So you do:

  $dbh = DB->new(...)
  $sth = $dbh->prepare($q)
  %results1 = $sth->fetch...

  $sth = $dbh->prepare($q)
  %results2 = $sth->fetch...
  # Results are the same, %results2 comes from cache.

  $sth = $dbh->prepare($insert)
  $sth->execute

  $sth = $dbh->prepare($q)
  %diff_results = $sth->fetch...
  # %diff_results is new data because the DB has changed.

Just some thoughts for y'all to mull over.

Rodney Broom
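A rough sketch of the invalidate-on-write idea Rodney describes, not his
actual package: results are cached per SQL string (plus bind values), and
the whole cache is dropped whenever a modifying statement runs. The package
and method names are illustrative only:

  package CachingDB;
  use strict;
  use DBI;

  # cache keyed by SQL + bind values, thrown away on any write
  my %cache;

  sub new {
      my ($class, @connect_args) = @_;
      return bless { dbh => DBI->connect(@connect_args) }, $class;
  }

  sub select_cached {
      my ($self, $sql, @bind) = @_;
      my $key = join "\0", $sql, @bind;
      $cache{$key} = $self->{dbh}->selectall_arrayref($sql, undef, @bind)
          unless exists $cache{$key};
      return $cache{$key};
  }

  sub do {
      my ($self, $sql, @bind) = @_;
      %cache = ();                         # the data changed: drop everything
      return $self->{dbh}->do($sql, undef, @bind);
  }

Note this only invalidates the cache within one process; writes made by
other processes would still need an expiration time or shared invalidation.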
Re: SELECT cacheing
I don't know about Roger, but in my situation queries are called as follows:

  my $queryhandle = Query("select blah from blah where blah");

The Query routine can be overloaded with a timeout value (with a settable
default), where a timeout of 0 means that the select should never be cached
and should always be selected live from the database. I'd assume Roger would
need to have something similar in the module he's developing.

regards,

P

On Fri, 8 Sep 2000, Drew Taylor wrote:
> Roger Espel Llima wrote:
> >
> > I've written a very small module to cache SELECT results from DBI
> > requests. The interface looks like:
> >
> >   use SelectCache;
> >
> >   my $db = whatever::get_a_handle();
> >   my $st = qq{ select this, that ... };
> >   my $rows = SelectCache::select($db, $st, 180);
> >
> > this returns an arrayref of rows (like the selectall_arrayref function),
> > and caches the result in a file, which gets reused for 180 seconds
> > instead of asking the db again.
> >
> > The names of the cache files are the md5's of the select statement,
> > using the last hex digit as a subdirectory name. There's no file
> > cleanup function; you can always do that from cron with find.
> >
> > This is all very simple, but it's pretty useful in combination with
> > mod_perl, to speed up things like showing the "latest 10 posts", on
> > frequently accessed webpages.
> >
> > The question now is: is there any interest in releasing this? I could
> > write some minimal docs and give it a 'proper' module name, if there's
> > interest.
>
> I'm certainly interested. One question though - in the module do you
> blindly use the cache? I ask because in my instance I display the
> contents of a shopping cart on every page. And while only a few pages
> change the cart contents, the cart listing does need to be current. How
> do you handle this situation?

--
.-.
| Peter Skipworth    Ph: 03 9897 1121   |
| Senior Programmer  Mob: 0417 013 292  |
| realestate.com.au  [EMAIL PROTECTED]  |
`-'
Re: SELECT cacheing
Roger Espel Llima wrote:
>
> I've written a very small module to cache SELECT results from DBI
> requests. The interface looks like:
>
>   use SelectCache;
>
>   my $db = whatever::get_a_handle();
>   my $st = qq{ select this, that ... };
>   my $rows = SelectCache::select($db, $st, 180);
>
> this returns an arrayref of rows (like the selectall_arrayref function),
> and caches the result in a file, which gets reused for 180 seconds
> instead of asking the db again.
>
> The names of the cache files are the md5's of the select statement,
> using the last hex digit as a subdirectory name. There's no file
> cleanup function; you can always do that from cron with find.
>
> This is all very simple, but it's pretty useful in combination with
> mod_perl, to speed up things like showing the "latest 10 posts", on
> frequently accessed webpages.
>
> The question now is: is there any interest in releasing this? I could
> write some minimal docs and give it a 'proper' module name, if there's
> interest.

I'm certainly interested. One question though - in the module do you blindly
use the cache? I ask because in my instance I display the contents of a
shopping cart on every page. And while only a few pages change the cart
contents, the cart listing does need to be current. How do you handle this
situation?

--
Drew Taylor
Vialogix Communications, Inc.
501 N. College Street
Charlotte, NC 28202
704 370 0550
http://www.vialogix.com/
Re: SELECT cacheing
On Thu, 7 Sep 2000, Roger Espel Llima wrote:
> The question now is: is there any interest in releasing this? I could
> write some minimal docs and give it a 'proper' module name, if there's
> interest.

I'd say this is probably useful to some people, so go ahead. A few
suggestions:

- Use the DBIx namespace for the module.
- If possible, use some existing cache module for the storage, like
  Apache::Session or one of the m/Cache/ modules on CPAN.
- Provide a safety check so that if a query brought back a few million
  rows by accident you wouldn't try to write the whole mess to disk.
- Maybe try to support the other results interfaces in DBI?
Re: SELECT cacheing
Hi there,

On Thu, 7 Sep 2000, Roger Espel Llima wrote:
> I've written a very small module to cache SELECT results from DBI
> requests.

[snip]

> The question now is: is there any interest in releasing this? I could
> write some minimal docs and give it a 'proper' module name, if there's
> interest.

I'm sure there is. On my last-but-one contract I did a very similar thing,
but I implemented the cache as a tied DBM file.

73,
Ged.
Re: SELECT cacheing
Roger,

Sounds interesting - as a matter of fact, I'm doing it exactly the same way
on realestate.com.au, except I use the last 2 hex digits as the directory
name. Great minds think alike!

I don't know if many people realise this, but MySQL, for those of you that
use it, will soon include a select_cached function for doing exactly the
same thing. Of course, a non-RDBMS-specific solution on the Perl side would
be great as well.

I'm using Storable to write the query result to disk as an array of hashes.
I've implemented most of DBI's query methods, including fetchrow,
fetchrow_hashref, fetchrow_arrayref, etc, and it works quite well and is
transparent to the programmer. Page generation speed improved by up to 100%
in a lot of cases.

I'd suggest that you include an (optionally implementable) cache-cleaning
routine, so that it's an 'all in one' solution, as well as parameters such
as 'maximum cache size' and a cache timeout per store. Let me know if you'd
like any help with getting this completed, as I'd be happy to help.

regards,

Peter Skipworth

On Thu, 7 Sep 2000, Roger Espel Llima wrote:
> I've written a very small module to cache SELECT results from DBI
> requests. The interface looks like:
>
>   use SelectCache;
>
>   my $db = whatever::get_a_handle();
>   my $st = qq{ select this, that ... };
>   my $rows = SelectCache::select($db, $st, 180);
>
> this returns an arrayref of rows (like the selectall_arrayref function),
> and caches the result in a file, which gets reused for 180 seconds
> instead of asking the db again.
>
> The names of the cache files are the md5's of the select statement,
> using the last hex digit as a subdirectory name. There's no file
> cleanup function; you can always do that from cron with find.
>
> This is all very simple, but it's pretty useful in combination with
> mod_perl, to speed up things like showing the "latest 10 posts", on
> frequently accessed webpages.
>
> The question now is: is there any interest in releasing this? I could
> write some minimal docs and give it a 'proper' module name, if there's
> interest.

--
.-.
| Peter Skipworth    Ph: 03 9897 1121   |
| Senior Programmer  Mob: 0417 013 292  |
| realestate.com.au  [EMAIL PROTECTED]  |
`-'
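One possible shape for the optional cache-cleaning routine Peter suggests: a
sweep that deletes files older than the cache lifetime. The directory and
the 180-second lifetime are only examples, and the cron alternative is
sketched in the comment:

  use File::Find;

  # Sweep the cache directory, deleting files older than the cache lifetime.
  # A cron job could do roughly the same with something like:
  #   find /var/cache/selectcache -type f -mmin +3 -exec rm {} \;
  my $lifetime = 180;
  find(sub {
      return unless -f $_;
      unlink $_ if time - (stat _)[9] > $lifetime;
  }, '/var/cache/selectcache');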
SELECT cacheing
I've written a very small module to cache SELECT results from DBI requests.
The interface looks like:

  use SelectCache;

  my $db = whatever::get_a_handle();
  my $st = qq{ select this, that ... };
  my $rows = SelectCache::select($db, $st, 180);

this returns an arrayref of rows (like the selectall_arrayref function), and
caches the result in a file, which gets reused for 180 seconds instead of
asking the db again.

The names of the cache files are the md5's of the select statement, using
the last hex digit as a subdirectory name. There's no file cleanup function;
you can always do that from cron with find.

This is all very simple, but it's pretty useful in combination with
mod_perl, to speed up things like showing the "latest 10 posts" on
frequently accessed webpages.

The question now is: is there any interest in releasing this? I could write
some minimal docs and give it a 'proper' module name, if there's interest.

--
Roger Espel Llima, [EMAIL PROTECTED]
http://www.iagora.com/~espel/index.html
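As a sketch only (not the module's actual code), deriving such a cache path
might look like this; the helper name and cache root are invented, and
Digest::MD5's md5_hex() does the hashing:

  use Digest::MD5 qw(md5_hex);
  use File::Spec;

  # Hypothetical helper: map a SELECT statement to its cache file, using
  # the md5 of the SQL and the last hex digit as a subdirectory.
  sub cache_path_for {
      my ($sql, $cache_root) = @_;
      my $digest = md5_hex($sql);        # 32 hex characters
      my $subdir = substr($digest, -1);  # last hex digit, e.g. "e"
      return File::Spec->catfile($cache_root, $subdir, $digest);
  }

  my $path = cache_path_for(
      "select this, that from posts order by id desc limit 10",
      "/var/cache/selectcache",
  );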