Bill Moseley wrote:
>
> Hi,
>
> <verbose>
> I'm looking for a little discussion on selecting a data storage method, and
> I'm posting here because Cache::Cache often is discussed here (along with
> Apache::Session). And people here are smart, of course ;).
>
> Basically, I'm trying to understand when to use Cache::Cache, vs. Berkeley
> DB, and locking issues. (Perrin, I've been curious why at etoys you used
> Berkeley DB over other caching options, such as Cache::Cache). I think
> RDBMS is not required as I'm only reading/writing and not doing any kind of
> selects on the data -- also I could end up doing thousands of selects for a
> request. So far, performance has been good with the file system store.
>
Hey Bill,

I'll tell you about using MLDBM::Sync for this, of which I'm the author.
MLDBM::Sync is a wrapper around MLDBM databases, which can be SDBM_File,
GDBM_File, DB_File, and recently Tie::TextDir based. It provides the
locking layer that you need to keep access to the database safe, without
corrupting things or losing data. Depending on the OS, and whether the
dbm is on something like a RAM disk, performance will vary.

MLDBM has long been a simple way to read and write complex data to a dbm
file through an easy tied interface:

    $dbm{$key} = \%data;
    my $data = $dbm{$key};

What you get with MLDBM::Sync is the locking API, plus some other goodies
like RAM caching and auto checksum keys if you like.

> 1) Read/write a single record
> 2) Read anywhere from a few to thousands of records in a request. This
>    is the typical mod_perl-based request. I know the record IDs that I
>    need to read from another source. I basically need a way to get some
>    subset of records fast, by record ID.
> 3) Traverse the data store and read every record.
>

Regarding some of these specific issues ...

I wrote MLDBM::Sync specifically to handle #1 safely.

For #2, there is an API that you can use like

    tied(%hash)->Lock();
    OR tied(%hash)->ReadLock();
    ... do lots of reads/writes ...
    tied(%hash)->UnLock();

that can be used to improve the performance of multiple reads and writes
between requests.

You can use the locking strategy too to do #3 really fast, or slower
without locking. I wrote this using the techniques I had long been using
in Apache::ASP for $Session and $Application support, and recently bolted
MLDBM::Sync in for these. I have been using MLDBM::Sync in production for
something like 6 months to a year now as a standalone module, but only
recently added support for Tie::TextDir.

> When reading (item 2) I have to create a perl data structure from the data,
> which doesn't change. So, I want to store this in my record, using
> Storable.pm.
> That can work with any data store, of course.
>

MLDBM supports this kind of thing natively, via:

    use MLDBM qw(DB_File Storable); # use Storable for serializing

Below are some benchmarks when running bench/bench_sync.pl in the
MLDBM::Sync distribution on my 2.2.14 Linux kernel on ext2 fs. Only in my
.23 dev release have I added the -n & --bundle options you see below. The
bundle option in particular is the # of reads/writes per lock, which is
used to improve performance. I would probably use GDBM_File in your
position, as I am not sure that Tie::TextDir would scale as well past
10000 files/entries.

Happy hacking!

--Josh
_________________________________________________________________
Joshua Chamas                           Chamas Enterprises Inc.
NodeWorks Founder                       Huntington Beach, CA  USA
http://www.nodeworks.com                1-714-625-4051

[MLDBM-Sync-0.21]# perl bench/bench_sync.pl

=== INSERT OF 50 BYTE RECORDS ===
  Time for 100 writes + 100 reads for  SDBM_File                 0.14 seconds     12288 bytes
  Time for 100 writes + 100 reads for  MLDBM::Sync::SDBM_File    0.17 seconds     12288 bytes
  Time for 100 writes + 100 reads for  GDBM_File                 3.00 seconds     18066 bytes
  Time for 100 writes + 100 reads for  DB_File                   4.10 seconds     20480 bytes
  Time for 100 writes + 100 reads for  Tie::TextDir .04          0.24 seconds      9096 bytes

=== INSERT OF 500 BYTE RECORDS ===
  Time for 100 writes + 100 reads for  SDBM_File                 0.24 seconds   1297408 bytes
  Time for 100 writes + 100 reads for  MLDBM::Sync::SDBM_File    0.54 seconds    207872 bytes
  Time for 100 writes + 100 reads for  GDBM_File                 2.98 seconds     63472 bytes
  Time for 100 writes + 100 reads for  DB_File                   4.29 seconds    114688 bytes
  Time for 100 writes + 100 reads for  Tie::TextDir .04          0.27 seconds     54096 bytes

=== INSERT OF 5000 BYTE RECORDS ===
 (skipping test for SDBM_File 1024 byte limit)
  Time for 100 writes + 100 reads for  MLDBM::Sync::SDBM_File    1.35 seconds   1911808 bytes
  Time for 100 writes + 100 reads for  GDBM_File                 4.11 seconds    832400 bytes
  Time for 100 writes + 100 reads for  DB_File                   5.66 seconds    839680 bytes
  Time for 100 writes + 100 reads for  Tie::TextDir .04          0.49 seconds    504096 bytes

=== INSERT OF 20000 BYTE RECORDS ===
 (skipping test for SDBM_File 1024 byte limit)
  Time for 100 writes + 100 reads for  MLDBM::Sync::SDBM_File    4.73 seconds  14994432 bytes
  Time for 100 writes + 100 reads for  GDBM_File                 4.61 seconds   2063912 bytes
  Time for 100 writes + 100 reads for  DB_File                   5.96 seconds   2068480 bytes
  Time for 100 writes + 100 reads for  Tie::TextDir .04          1.24 seconds   2004096 bytes

[MLDBM-Sync-0.23]# perl ./bench/bench_sync.pl -n=10000 --bundle=50

NUMBER OF PROCESSES IN TEST: 4

=== INSERT OF 50 BYTE RECORDS ===
  Time for 10000 writes + 10000 reads for  SDBM_File                 5.44 seconds    6478848 bytes  locks/pid=50
  Time for 10000 writes + 10000 reads for  MLDBM::Sync::SDBM_File    8.34 seconds    6102016 bytes  locks/pid=50
  Time for 10000 writes + 10000 reads for  GDBM_File                29.94 seconds    1032312 bytes  locks/pid=50
  Time for 10000 writes + 10000 reads for  DB_File                  30.45 seconds    1335296 bytes  locks/pid=50
  Time for 10000 writes + 10000 reads for  Tie::TextDir .04         39.37 seconds     901408 bytes  locks/pid=50

=== INSERT OF 500 BYTE RECORDS ===
 (skipping test for SDBM_File 100 byte limit)
 (skipping test for MLDBM::Sync db size > 1M)
  Time for 10000 writes + 10000 reads for  GDBM_File                33.39 seconds    5501948 bytes  locks/pid=50
  Time for 10000 writes + 10000 reads for  DB_File                  60.65 seconds   11427840 bytes  locks/pid=50
  Time for 10000 writes + 10000 reads for  Tie::TextDir .04         44.39 seconds    5401408 bytes  locks/pid=50

=== INSERT OF 5000 BYTE RECORDS ===
 (skipping test for SDBM_File 100 byte limit)
 (skipping test for MLDBM::Sync db size > 1M)
  Time for 10000 writes + 10000 reads for  GDBM_File                85.13 seconds   82449298 bytes  locks/pid=50
  Time for 10000 writes + 10000 reads for  DB_File                 104.70 seconds   82563072 bytes  locks/pid=50
  Time for 10000 writes + 10000 reads for  Tie::TextDir .04         94.12 seconds   50401408 bytes  locks/pid=50

=== INSERT OF 20000 BYTE RECORDS ===
 (skipping test for SDBM_File 100 byte limit)
 (skipping test for MLDBM::Sync db size > 1M)
  Time for 10000 writes + 10000 reads for  GDBM_File               252.91 seconds  205409834 bytes  locks/pid=50
  Time for 10000 writes + 10000 reads for  DB_File                 273.33 seconds  205443072 bytes  locks/pid=50
  Time for 10000 writes + 10000 reads for  Tie::TextDir .04        246.50 seconds  200401408 bytes  locks/pid=50
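
For anyone who wants to see the "bundle" idea from the --bundle option in
isolation, here is a minimal sketch using only core Perl modules
(SDBM_File, Storable, Fcntl). It is not MLDBM::Sync's implementation, and
with_lock() is a hypothetical helper named for this example only. It takes
one lock on a separate lock file, performs a whole batch of reads or writes
of Storable-serialized records, then releases the lock. Records are kept
small because plain SDBM_File has the 1024 byte limit noted in the
benchmarks.

```perl
use strict;
use warnings;
use Fcntl qw(:DEFAULT :flock);
use File::Temp qw(tempdir);
use SDBM_File;
use Storable qw(freeze thaw);

# Hypothetical helper (not part of MLDBM::Sync): lock a side file, tie
# the dbm, run a batch of operations, then untie and release the lock.
# $mode is LOCK_EX for a batch of writes, LOCK_SH for a batch of reads.
sub with_lock {
    my ($file, $mode, $code) = @_;
    sysopen(my $lock_fh, "$file.lock", O_RDWR | O_CREAT)
        or die "open $file.lock: $!";
    flock($lock_fh, $mode) or die "flock: $!";
    tie(my %dbm, 'SDBM_File', $file, O_CREAT | O_RDWR, 0640)
        or die "tie $file: $!";
    my @result = eval { $code->(\%dbm) };
    my $err = $@;
    untie %dbm;        # flush to disk before the lock is released
    close $lock_fh;    # closing the handle drops the flock
    die $err if $err;
    return @result;
}

my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/records";

# Write a bundle of 50 records under a single exclusive lock.
with_lock($file, LOCK_EX, sub {
    my ($dbm) = @_;
    $dbm->{$_} = freeze({ id => $_, tags => ['a', 'b'] }) for 1 .. 50;
});

# Read a known subset of record IDs under a single shared lock.
my @wanted  = (3, 17, 42);
my @records = with_lock($file, LOCK_SH, sub {
    my ($dbm) = @_;
    return map { thaw($dbm->{$_}) } @wanted;
});

print join(',', map { $_->{id} } @records), "\n";   # prints 3,17,42
```

The trade-off is the usual one: a larger bundle means fewer lock
acquisitions per process, but other processes wait longer for their turn.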