Re: Comparison of different caching schemes
Ok, I'm a bit slow... At 03:05 PM 12/12/01 +1100, Rob Mueller (fastmail) wrote: Just thought people might be interested... Seems like they were! Thanks again. I didn't see anyone comment about this, but I was a bit surprised by MySQLs good performance. I suppose caching is key. I wonder if things would change with 50 or 100 thousand rows in the table. I always assumed something like Cache::FileCache would have less overhead than a RDMS. It's impressive. Now to the results, here they are. Package C0 - In process hash Sets per sec = 147116 Gets per sec = 81597 Mixes per sec = 124120 Package C1 - Storable freeze/thaw Sets per sec = 2665 Gets per sec = 6653 Mixes per sec = 3880 Package C2 - Cache::Mmap Sets per sec = 809 Gets per sec = 3235 Mixes per sec = 1261 Package C3 - Cache::FileCache Sets per sec = 393 Gets per sec = 831 Mixes per sec = 401 Package C4 - DBI with freeze/thaw Sets per sec = 651 Gets per sec = 1648 Mixes per sec = 816 Package C5 - DBI (use updates with dup) with freeze/thaw Sets per sec = 657 Gets per sec = 1994 Mixes per sec = 944 Package C6 - MLDBM::Sync::SDBM_File Sets per sec = 334 Gets per sec = 1279 Mixes per sec = 524 Package C7 - Cache::SharedMemoryCache Sets per sec = 42 Gets per sec = 29 Mixes per sec = 32 Bill Moseley mailto:[EMAIL PROTECTED]
Re: Comparison of different caching schemes
Rob Mueller (fastmail) wrote: And ++ on Paul's comments about Devel::DProf and other profilers. Ditto again. I've been using Apache::DProf recently and it's been great at tracking down exactly where time is spent in my program. One place that Rob and I still haven't found a good solution for profiling is trying to work out whether we should be focussing on optimising our mod_perl code, or our IMAP config, or our MySQL DB, or our SMTP setup, or our daemons' code, or... Can anyone suggest a way (under Linux 2.4, if it's OS dependent) to get a log of CPU (and also IO preferably) usage by process name over some period of time? Sorry to move slightly off-topic here, but in optimising our mod_perl app we have to work out which bit of the whole system to prioritise.
Re: Comparison of different caching schemes
One place that Rob and I still haven't found a good solution for profiling is trying to work out whether we should be focussing on optimising our mod_perl code, or our IMAP config, or our MySQL DB, or our SMTP setup, or our daemons' code, or... Assuming that the mod_perl app is the front-end for users and that you're trying to optimize for speed of responses, you should just use DProf to tell you which subroutines are using the most wall clock time. (I think it's dprofpp -t or something. Check the man page.) If the sub that uses IMAP is the slowest, then work on speeding up your IMAP server or the way you access it. CPU utilization may not be all that telling, since database stuff often takes the longest but doesn't burn much CPU. - Perrin
Re: Comparison of different caching schemes
On Sun, Dec 16, 2001 at 09:58:09AM +1100, Jeremy Howard wrote: Can anyone suggest a way (under Linux 2.4, if it's OS dependent) to get a log of CPU (and also IO preferably) usage by process name over some period of time? What about BSD Process Accounting (supported in most *nix systems) and sysstat (iostat + sar)? regards, Luciano Rocha -- Luciano Rocha, [EMAIL PROTECTED] The trouble with computers is that they do what you tell them, not what you want. -- D. Cohen
Re: Comparison of different caching schemes
On Sat, Dec 15, 2001 at 08:57:30PM -0500, Perrin Harkins wrote: One place that Rob and I still haven't found a good solution for profiling is trying to work out whether we should be focussing on optimising our mod_perl code, or our IMAP config, or our MySQL DB, or our SMTP setup, or our daemons' code, or... Assuming that the mod_perl app is the front-end for users and that you're trying to optimize for speed of responses, you should just use DProf to tell you which subroutines are using the most wall clock time. (I think it's dprofpp -t or something. Check the man page.) If the sub that uses IMAP is the slowest, then work on speeding up your IMAP server or the way you access it. CPU utilization may not be all that telling, since database stuff often takes the longest but doesn't burn much CPU. I agree 100%, wall clock time is the best metric. Consider that most apps are not CPU bound, they are I/O or network bound. Wall clock time measures that exactly. Another useful tool is truss/strace. Start up your server single process (-X) and then attach to it with strace. The -c and -r flags to strace are quite handy.. For example here's the cumulative stats for 'ls' % time seconds usecs/call callserrors syscall -- --- --- - - 52.290.004508 751 6 read 19.480.001679 560 3 write 12.930.001115 5620 close 6.010.000518 2323 old_mmap 5.450.000470 2122 2 open 1.220.000105 21 5 munmap 0.930.80 40 2 getdents64 0.710.61 321 fstat64 0.290.25 4 7 brk 0.240.21 11 2 mprotect 0.190.16 8 2 ioctl 0.120.10 10 1 uname 0.090.08 3 3 2 rt_sigaction 0.060.05 5 1 fcntl64 -- --- --- - - 100.000.008621 118 4 total -- Paul Lindner [EMAIL PROTECTED]| | | | | | | | | | mod_perl Developer's Cookbook http://www.modperlcookbook.org Human Rights Declaration http://www.unhchr.ch/udhr/index.htm
Re: Comparison of different caching schemes
I was using Cache::SharedMemoryCache on my system. I figured, Hey, it's RAM, right? It's gonna be WAY faster than anything disk-based. The thing you were missing is that on an OS with an aggressively caching filesystem (like Linux), frequently read files will end up cached in RAM anyway. The kernel can usually do a better job of managing an efficient cache than your program can. For what it's worth, DeWitt Clinton accompanied his first release of File::Cache (the precursor to Cache::FileCache) with a benchmark showing this same thing. That was the reason File::Cache was created. And ++ on Paul's comments about Devel::DProf and other profilers. - Perrin
RE: Comparison of different caching schemes
Another powerful tool for tracking down performance problems is perl's profiler combined with Devel::DProf and Apache::DProf. Devel::DProf is bundled with perl. Apache::DProf is hidden in the Apache-DB package on CPAN. Ya know the place in my original comment where I was optimizing a different subsystem? I just discovered Devel::DProf last week (after 5 *years* of perl... smacks forehead), and was using that. *AND* had improved a sore spot's performance by 10% without even working hard, because of profiling. Point taken. At the same time I added some code to track the time it takes to process a request using Time::HiRes. This value is set as a note via $r-note('REQTIME'). A customlog directive takes care of dumping that value in the logs... Hmm... I was already logging a status message via warn(), I did the SAME TRICK but stored it in a local variable because I didn't need to go as far as a customlog... Sounds like great minds think alike! :-) L8r, Rob #!/usr/bin/perl -w use Disclaimer qw/:standard/;
Re: Comparison of different caching schemes
On Fri, 14 Dec 2001, Perrin Harkins wrote: The thing you were missing is that on an OS with an aggressively caching filesystem (like Linux), frequently read files will end up cached in RAM anyway. The kernel can usually do a better job of managing an efficient cache than your program can. Plus for some reason IPC overhead seems to seriously degrade as the size of the overall shared memory segment increases. I'm not sure why this is but I think the docs for IPC::Shareable mention this. Maybe for _very_ small amounts of data, shared memory might still be a win. -dave /*== www.urth.org We await the New Sun ==*/
Re: Comparison of different caching schemes
So our solution was caching in-process with just a hash, and using a DBI/mysql persistent store. in pseudo code sub get_stuff { if (! $cache{$whatever} ) { if !( $cache{whatever} = dbi_lookup()) { $cache{$whatever}=derive_data_from_original_source($whatever); dbi_save($cache_whatever); } } $cache{$whatever} } That's actually a bit different. That would fail to notice updates between processes until the in-memory cache was cleared. Still very useful for read-only data or data that can be out of sync for some period though. The filesystem based / time sensitive aging is a nice little thing we should put up in CPAN. We've just not done so yet. How does it differ from the other solutions like Cache::FileCache? Is it something you could add to an existing module? - Perrin
Re: Comparison of different caching schemes
On December 14, 2001 03:04 pm, Perrin Harkins wrote: So our solution was caching in-process with just a hash, and using a DBI/mysql persistent store. in pseudo code sub get_stuff { if (! $cache{$whatever} ) { if !( $cache{whatever} = dbi_lookup()) { $cache{$whatever}=derive_data_from_original_source($whatever); dbi_save($cache_whatever); } } $cache{$whatever} } That's actually a bit different. That would fail to notice updates between processes until the in-memory cache was cleared. Still very useful for read-only data or data that can be out of sync for some period though. The filesystem based / time sensitive aging is a nice little thing we should put up in CPAN. We've just not done so yet. How does it differ from the other solutions like Cache::FileCache? Is it something you could add to an existing module? Performance,and key structure, semantics. Its mostly different, not better/worse. =pod =head1 NAME UFMEDIA::CacheOneFile - cache a scalar value in a file on disk =head1 SYNOPSIS use UFMEDIA::CacheOneFile; my $cache = new UFMEDIA::CacheOneFile( cache_file = '/var/cache/myapp/flurge.cache', max_age= 30, refill_sub = sub { recalculate_flurges(31337, 'blue', 42) }, ); my $value = $cache-get_value; =head1 DESCRIPTION UFMEDIA::CacheOneFile enables you to cache a single scalar value in a file on disk. Given a filename under a writable directory, a maximum age, and a reference to a refill subroutine, a cache object will cache the result of the refill subroutine in the file the first time Bget_value() is called, and use the cached value for subsequent calls to Bget_value() until Bmax_age the cach e file is more than Bmax_age seconds old. If multiple processes share a single cache file, the first process that reads the cache file after it has expired will take responsibility for replacing it with an up-to-date copy. Other processes needing up-to-date information will wait for this to finish and will then use the new value. =head1 AUTHOR Mike Lyons [EMAIL PROTECTED] =head1 SEE ALSO perl(1). =cut - Perrin -- Jay yohimbe Thorne [EMAIL PROTECTED] Mgr Sys Tech, Userfriendly.org
Re: Comparison of different caching schemes
At 6:04 PM -0500 12/14/01, Perrin Harkins wrote: That's actually a bit different. That would fail to notice updates between processes until the in-memory cache was cleared. Still very useful for read-only data or data that can be out of sync for some period though. The primary problem with everything mentioned thus far is that they are almost entirely based around the concept of a single server. Caching schemes that are both fast and work across multiple servers and process instances are very hard to find. After reading the eToys article, I decided that BerkeleyDB was worth a look. We discovered that it was very fast and would make a great cache for some of our commonly used database queries. The problem was that we are running on 5 different servers, load balanced by Big/IP. Again, taking my que from the eToys article, I began working on a data exchange system (using perl instead of C) which multicasts data packets to the listening servers. So far, our only problems have been with the deadlocking (solved with db_deadlock), and a few corrupt records. I'm considering uploading to CPAN the stuff I've written for the caching, but haven't had the time to make it generic enough. I also haven't performed any benchmarks other than the Wow, that's a lot faster benchmark. One limitation to the stuff I've written is that the daemons (listeners) are non threaded, non forking. That and it's all based on UDP. Rob -- When I used a Mac, they laughed because I had no command prompt. When I used Linux, they laughed because I had no GUI.
Re: Comparison of different caching schemes
On December 14, 2001 03:53 pm, Robert Landrum wrote: At 6:04 PM -0500 12/14/01, Perrin Harkins wrote: That's actually a bit different. That would fail to notice updates between processes until the in-memory cache was cleared. Still very useful for read-only data or data that can be out of sync for some period though. The primary problem with everything mentioned thus far is that they are almost entirely based around the concept of a single server. Caching schemes that are both fast and work across multiple servers and process instances are very hard to find. After reading the eToys article, I decided that BerkeleyDB was worth a look. We discovered that it was very fast and would make a great cache for some of our commonly used database queries. The problem was that we are running on 5 different servers, load balanced by Big/IP. Again, taking my que from the eToys article, I began working on a data exchange system (using perl instead of C) which multicasts data packets to the listening servers. So far, our only problems have been with the deadlocking (solved with db_deadlock), and a few corrupt records. We did stuff based on kind of an Acceptable differences concept. Using round robinn dns, with predominantly winXX clients, you tend to get the same IP address from one request to tthe next. Changing, IIRC, on the dns cache timeout, from the soa record. With a bigIP or something, you can guarantee this user locality. So that leaves you with *appearing* to be in sync for the one local machine. Using a database with common connections from multiple machines or the one writer db machine, many read db machines concept, using replication or what have you for updates, you can present a pretty consistent database view, with a maximum difference on the read side of the replication time. We found that the database concept, plus using a local cache with a timeout of 5 seconds not only gave us a huge performance increase, but it gave us a maximum of about 6 seconds of data difference between machines. And since your average human viewer hits pages no faster than once every 20 seconds or so, it was pretty much invisible to the individual user. Of course search engines will be faster than this, but does being consistent for search engines matter all that much? I'm considering uploading to CPAN the stuff I've written for the caching, but haven't had the time to make it generic enough. I also haven't performed any benchmarks other than the Wow, that's a lot faster benchmark. One limitation to the stuff I've written is that the daemons (listeners) are non threaded, non forking. That and it's all based on UDP. Rob -- Jay yohimbe Thorne [EMAIL PROTECTED] Mgr Sys Tech, Userfriendly.org
Re: Comparison of different caching schemes
The thing you were missing is that on an OS with an aggressively caching filesystem (like Linux), frequently read files will end up cached in RAM anyway. The kernel can usually do a better job of managing an efficient cache than your program can. For what it's worth, DeWitt Clinton accompanied his first release of File::Cache (the precursor to Cache::FileCache) with a benchmark showing this same thing. That was the reason File::Cache was created. While that's true, there are still some problems with a file cache. Namely to get reasonably performance you have to do the directory hashing structure so that you don't end up with too many files (one for each key) in one directory. Thus for every add to the cache you have to: * stat each directory in the hash path and create it if it doesn't exist * open and create the file and write to it, close the file A similar method is required for reading. All that still takes a bit of time. This is where having some shared memory representation can be a really help since you don't have to traverse, open, read/write, close a file every time. Witness the performance of IPC::MM which seems to be mostly limited by the performance of the Storable module. I'm planning on doing another test which just stores some data without the 'streaming' to see which examples are really limited by Storable and which by their implementation, this might be useful for some people. And ++ on Paul's comments about Devel::DProf and other profilers. Ditto again. I've been using Apache::DProf recently and it's been great at tracking down exactly where time is spent in my program. If you have any performance problems, definitely use it first before making any assumptions. One question though, I have a call tree that looks a bit like: main - calls f1 - calls f2 - calls f3 - calls f2 The two calls to f2 may take completely different times. Using 'dprofpp -I', I can see what percentage of overall time is spent in 'f1 and children', 'f3 and children' and 'f2 and children'. But I can't see an easy way to tell 'time in f2 and children when f2 was called from f1' (or ditto for f3). Does that make sense? Rob
Re: Comparison of different caching schemes
Some more points. I'd like to point out that I don't think the lack of actual concurrency testing is a real problem, at leastfor most single CPU installations. If most of the time is spent doing other stuff in a request (which is most likely the case), then on average when a process goes to access the cache, nothing else will be accessing it, so it won't have to block to wait to get a lock. In that case, one process doing 10*N accesses is the same as N processes doing 10 accesses, and the results are still meaningful. Perrin Harkins pointed out IPC::MM which I've added to the test code. It's based on the MM library (http://www.engelschall.com/sw/mm/mm-1.1.3.tar.gz) and includes a hash and btree tied hash implementation. I've tried the tied hash and it performs extremely well. It seems to be limited mostly by the speed of the Storable module. Package C0 - In process hashSets per sec = 181181Gets per sec = 138242Mixes per sec = 128501Package C1 - Storable freeze/thawSets per sec = 2856Gets per sec = 7079Mixes per sec = 3728Package C2 - Cache::MmapSets per sec = 810Gets per sec = 2956Mixes per sec = 1185Package C3 - Cache::FileCacheSets per sec = 392Gets per sec = 813Mixes per sec = 496Package C4 - DBI with freeze/thawSets per sec = 660Gets per sec = 1923Mixes per sec = 885Package C5 - DBI (use updates with dup) with freeze/thawSets per sec = 676Gets per sec = 1866Mixes per sec = 943Package C6 - MLDBM::Sync::SDBM_FileSets per sec = 340Gets per sec = 1425Mixes per sec = 510Package C7 - Cache::SharedMemoryCacheSets per sec = 31Gets per sec = 21Mixes per sec = 24Package C8 - IPC::MMSets per sec = 2267Gets per sec = 5435Mixes per sec = 2769 Rob
Re: Comparison of different caching schemes
On Wed Dec 12, 2001 at 03:05:33PM +1100, Rob Mueller (fastmail) wrote: I tried out the following systems. * Null reference case (just store in 'in process' hash) * Storable reference case (just store in 'in process' hash after 'freeze') * Cache::Mmap (uses Storable) * Cache::FileCache (uses Storable) * DBI (I used InnoDB), use Storable, always do 'delete' then 'insert' * DBI, use Storable, do 'select' then 'insert' or 'update' * MLDBM::Sync::SDBM_File (uses Storable) * Cache::SharedMemoryCache (uses Storable) [...] Have I missed something obvious? Hi Rob, Maybe you can add my under construction Apache::Cache module to you test suite ? It is availlable on CPAN. Best regards -- ___ O l i v i e rP o i t r e y
Re: Comparison of different caching schemes
On Wed, Dec 12, 2001 at 03:05:33PM +1100, Rob Mueller (fastmail) wrote: I sat down the other day and wrote a test script to try out various caching implementations. The script is pretty basic at the moment, I just wanted to get an idea of the performance of different methods. Rob, wow! This is fantastic work! I'm wondering if you could include a test on the Cache::FileBackend class. The Cache::FileCache class is not optimized for the particular type of gets and sets that you were testing for. In fact, each of those requests will still need to go through the process of checking for object expiration, which may add the overhead that you observe in the benchmark. Not that I'm expecting Cache::FileBackend to do significantly better -- it still does name hashing, directory hashing, taint checking, cloning of objects, etc. But I am curious to see what difference it makes. You could try Cache::SharedMemoryBackend as well. In general the Cache::* modules were designed for clarity and ease of use in mind. For example, the modules tend to require absolutely no set-up work on the end user's part and try to be as fail-safe as possible. Thus there is run-time overhead involved. That said, I'm certainly not against performance. :) These benchmarks are going to be tremendously useful in identifying bottlenecks. However, I won't be able to optimize for these particular benchmarks, as Cache::Cache is designed to do something different than straight gets and sets. Again, thank you, Rob. This is great, -DeWitt
Re: Comparison of different caching schemes
Perrin Harkins wrote: Also, I'd like to see MLDBM + BerkeleyDB (not DB_File) with BerkeleyDB doing automatic locking, and IPC::MM, and IPC::Shareable, and IPC::ShareLite (though it doesn't serialize complex data by itself), and MySQL with standard tables. Of course I could just do them myself, since you were kind enough to provide code. IPC::ShareLite freezes/thaws the whole data structure, rather than just the hash element being accessed, IIRC, so is probably going to have extremely poor scaling characteristics. Worth adding to check, of course. MLDBM+BDB without using MLDBM::Sync is a cool idea that I hadn't thought of. That will certainly be interesting. Another interesting option is mapping a MySQL table data structure directly to the data structure being stored. This can be faster because you don't have to serialise, and also don't have to update the whole structure, just the fields that change (which is common in session handling and similar). Unfortunately this is hard to include in this benchmark, because the benchmark creates random complex structures. Maybe we'll create a custom one for this option for the sake of reference, even although it won't be directly comparable. I'm not sure what a 'standard table' in MySQL is any more... Berkeley, MyISAM, ISAM... I guess we can try all these, but that's benchmarking the DB rather than the caching scheme, and we're not about to try every DB server we can find!
Re: Comparison of different caching schemes
IPC::ShareLite freezes/thaws the whole data structure, rather than just the hash element being accessed, IIRC, so is probably going to have extremely poor scaling characteristics. Worth adding to check, of course. No, it's probably not worth it. It would be worth adding IPC::Shareable though, because people never believe me when I tell them to use something else. Having some numbers would help. Another interesting option is mapping a MySQL table data structure directly to the data structure being stored. That could be useful as part of a comparison for storing non-complex data, i.e. a single scalar value. I'm not sure what a 'standard table' in MySQL is any more... Berkeley, MyISAM, ISAM... I guess we can try all these, but that's benchmarking the DB rather than the caching scheme, and we're not about to try every DB server we can find! No, of course not. It may be that the performance characteristics of these table types are well known already and I just don't follow the MySQL scene well enough to know. I thought maybe the default tables type (MyISAM?) which doesn't support transactions would have better speed for dirt simple storage like this. - Perrin
Re: Comparison of different caching schemes
In general the Cache::* modules were designed for clarity and ease of use in mind. For example, the modules tend to require absolutely no set-up work on the end user's part and try to be as fail-safe as possible. Thus there is run-time overhead involved. That said, I'm certainly not against performance. :) These benchmarks are going to be tremendously useful in identifying bottlenecks. However, I won't be able to optimize for these particular benchmarks, as Cache::Cache is designed to do something different than straight gets and sets. Again, thank you, Rob. This is great, That's a good point. I probably should have added the features that each one can do to help with decisions. Cache::Cache does have the most options with regard to limiting time/size in the cache, so that could be a be factor in someones choice. * Cache::Mmap (uses Storable) - Can indirectly specify the maximum cache size, though purges are uneven depending on how well data hashes into different buckets - Has callback ability on a read/purge so you can move any purged data to a different data store if you want, and automatically retrieve it on next retrieve when it's not in the cache * Cache::FileCache (uses Storable) * Cache::SharedMemoryCache (uses Storable) - Can specify the maximum cache size (Cache::SizeAwareFileCache) and/or maximum time an object is allowed in the cache - Follows the Cache::Cache interface system * DBI (I used InnoDB), use Storable, always do 'delete' then 'insert' * DBI, use Storable, do 'select' then 'insert' or 'update' - Can't specifiy any limits directly - Could add a 'size' and 'timestamp' column to each row and use a daemon to iterate through and cleanup based on time and size * MLDBM::Sync::SDBM_File (uses Storable) * IPC::MM - Can't specifiy any limits directly - Could create secondary tied db/mm hash with key - [ size, timestamp ] mapping and use daemon to iterate through and cleanup based on time and size Rob
Re: Comparison of different caching schemes
I sat down the other day and wrote a test script to try out various caching implementations. Very interesting. Looks like Cache::Mmap deserves more attention (and maybe a Cache::Cache subclass). Have I missed something obvious? Nothing much, but I'd like to see how these numbers vary with the size of the data being written. And it would be good if there were a way to ensure that the data had not been corrupted at the end of a run. Also, I'd like to see MLDBM + BerkeleyDB (not DB_File) with BerkeleyDB doing automatic locking, and IPC::MM, and IPC::Shareable, and IPC::ShareLite (though it doesn't serialize complex data by itself), and MySQL with standard tables. Of course I could just do them myself, since you were kind enough to provide code. - Perrin
Re: Comparison of different caching schemes
Just wanted to add an extra thought that I forgot to include in the previous post. One important aspect missing from my tests is the actual concurrency testing. In mostreal world programs, multiple applications will be reading from/writing to the cache at the same time. Depending on the cache synchronisation scheme, you'll get varying levels of performance degradation: 1 being worst, 3 being the best. 1. Lock entire cache for a request(Cache::SharedMemoryCache, MySQL normal - table locks?) 2. Lock some part of cache for a request (Cache::Mmap buckets, MLDBM pages?) 3. Lock only the key/value for a request (Cache::FileCache, MySQL InnoDB - row locks?) Uggg, to do a complete test you really need to generate an entire modelling system: 1. Number of concurrent processes 2. Average reads/writes to cache per second 3. Ratio of reused/new entries etc Rob
Re: Comparison of different caching schemes
One important aspect missing from my tests is the actual concurrency testing. Oh, I guess I should have checked your code. I thought these were concurrent. That makes a huge difference. 2. Lock some part of cache for a request (Cache::Mmap buckets, MLDBM pages?) MLDBM::Sync locks the whole thing. - Perrin