Bill Moseley wrote:
>
> Hi,
>
> <verbose>
> I'm looking for a little discussion on selecting a data storage method, and
> I'm posting here because Cache::Cache often is discussed here (along with
> Apache::Session). And people here are smart, of course ;).
>
> Basically, I'm trying to understand when to use Cache::Cache, vs. Berkeley
> DB, and locking issues. (Perrin, I've been curious why at etoys you used
> Berkeley DB over other caching options, such as Cache::Cache). I think
> RDBMS is not required as I'm only reading/writing and not doing any kind of
> selects on the data -- also I could end up doing thousands of selects for a
> request. So far, performance has been good with the file system store.
>
Hey Bill, I'll tell you about using MLDBM::Sync for this, of which I'm
the author. MLDBM::Sync is a wrapper around MLDBM databases, which can
be based on SDBM_File, GDBM_File, DB_File, and, recently, Tie::TextDir.
It provides the locking layer you need to keep access to the database
safe, without corrupting things or losing data. Performance will vary
depending on the OS and on whether the dbm lives on something like a
RAM disk.
MLDBM has long been a simple way to read and write complex data to a
dbm file through an easy tied interface:
  $dbm{$key} = \%data;
  my $data = $dbm{$key};
What you get with MLDBM::Sync is the locking API, plus some optional
goodies like RAM caching and automatic checksum keys.
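The checksum-key idea is easy to picture: hash each key to a fixed-length digest before it reaches the dbm, so a long key can't push a record past a backend size limit such as SDBM_File's roughly 1K. Here is a minimal sketch using the core Digest::MD5 module. Choosing MD5, using a plain hash in place of the tied dbm, and the checksum_key/put_record/get_record helper names are all assumptions for illustration, not MLDBM::Sync's code.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical helper: map any key to a fixed 32-character hex digest.
# Assumes MD5 as the checksum; the original key is NOT recoverable.
sub checksum_key {
    my ($key) = @_;
    return md5_hex($key);
}

# A plain in-memory hash stands in for the tied dbm in this sketch.
my %store;

sub put_record {
    my ($key, $value) = @_;
    $store{ checksum_key($key) } = $value;
}

sub get_record {
    my ($key) = @_;
    return $store{ checksum_key($key) };
}

my $long_key = join '/', ('segment') x 200;   # 1599 characters long
put_record($long_key, 'some value');
printf "%d-char key stored under a %d-char checksum\n",
    length($long_key), length(checksum_key($long_key));
print get_record($long_key), "\n";
```

The trade-off is that `keys` now returns digests rather than your original keys, so a full traversal (your case #3) has to keep the original key inside the stored value if it needs it.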
> 1) Read/write a single record
> 2) Read anywhere from a few to thousands of records in a request. This
> is the typical mod_perl-based request. I know the record IDs that I
> need to read from another source. I basically need a way to get some
> subset of records fast, by record ID.
> 3) Traverse the data store and read every record.
>
Regarding your specific cases: I wrote MLDBM::Sync specifically to
handle #1 safely. For #2, there is an API that you can use like
  tied(%hash)->Lock();      # exclusive lock, for reads and writes
  # OR
  tied(%hash)->ReadLock();  # shared lock, for reads only
  # ... do lots of reads/writes ...
  tied(%hash)->UnLock();
that can be used to improve the performance of many reads and writes
by doing them all under a single lock. You can also use this locking
strategy to do #3 very fast, or more slowly without locking. I wrote
this using the techniques I had long been using in Apache::ASP for
$Session and $Application support, and recently bolted MLDBM::Sync
into Apache::ASP for these. I have been using MLDBM::Sync in
production for something like 6 months to a year now as a standalone
module, but only recently added support for Tie::TextDir.
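MLDBM::Sync's Lock/ReadLock calls do this batching for you. To show the underlying idea with core modules only, here is a minimal sketch that holds a single flock lock on a side lock file for a whole batch of SDBM_File operations: shared for reads (your cases #2 and #3), exclusive for writes. The with_locked_dbm helper, the ".lock" file naming, and tying the dbm only inside the lock are assumptions for illustration, not MLDBM::Sync's internals.

```perl
use strict;
use warnings;
use Fcntl qw(:flock O_RDWR O_CREAT O_RDONLY);
use SDBM_File;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);
my $db  = "$dir/records";   # hypothetical database path

# Hypothetical helper: run $code against the dbm while holding ONE lock on
# a side lock file. LOCK_SH lets many readers in at once; LOCK_EX is for
# writers. The dbm is tied only inside the lock and untied before release,
# so each batch sees what other processes have already flushed.
sub with_locked_dbm {
    my ($mode, $code) = @_;
    open(my $lock_fh, '>>', "$db.lock") or die "Cannot open lock file: $!";
    flock($lock_fh, $mode) or die "Cannot flock: $!";
    my $flags = $mode == LOCK_EX ? (O_RDWR | O_CREAT) : O_RDONLY;
    tie(my %dbm, 'SDBM_File', $db, $flags, 0640) or die "Cannot tie $db: $!";
    my @result = $code->(\%dbm);
    untie %dbm;
    close($lock_fh);    # closing the handle also releases the lock
    return @result;
}

# Load 1000 records under a single exclusive lock.
with_locked_dbm(LOCK_EX, sub {
    my ($dbm) = @_;
    $dbm->{"id$_"} = "record $_" for 1 .. 1000;
    return;
});

# Case #2: fetch a subset of records by known ID under one shared lock.
my @wanted = qw(id3 id42 id999);
my @found = with_locked_dbm(LOCK_SH, sub {
    my ($dbm) = @_;
    return map { $dbm->{$_} } @wanted;
});
print "@found\n";

# Case #3: traverse every record under one shared lock.
my ($count) = with_locked_dbm(LOCK_SH, sub {
    my ($dbm) = @_;
    my $n = 0;
    while (my ($key, $value) = each %$dbm) { $n++ }
    return $n;
});
print "$count records\n";
```

Traversing under one shared lock is the "really fast" path for #3; doing it without the lock means each record access pays its own locking cost, or risks reading while another process writes.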
> When reading (item 2) I have to create a perl data structure from the data,
> which doesn't change. So, I want to store this in my record, using
> Storable.pm. That can work with any data store, of course.
>
MLDBM supports this kind of thing natively, via:
use MLDBM qw(DB_File Storable); # use Storable for serializing
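A serializer like this turns each nested structure into a flat string before it reaches the dbm, and back again on fetch. As a rough sketch of that round trip using only core modules (SDBM_File plus Storable's nfreeze/thaw), and not MLDBM's actual implementation, it looks like this; the store_record/fetch_record helpers and the temporary path are hypothetical.

```perl
use strict;
use warnings;
use Fcntl qw(O_RDWR O_CREAT);
use SDBM_File;
use Storable qw(nfreeze thaw);
use File::Temp qw(tempdir);

# Hypothetical scratch location for the demo database files.
my $dir = tempdir(CLEANUP => 1);

tie(my %raw, 'SDBM_File', "$dir/demo", O_RDWR | O_CREAT, 0640)
    or die "Cannot tie SDBM_File: $!";

# Store: serialize the nested structure to a byte string first,
# because a plain dbm can only hold flat strings.
sub store_record {
    my ($dbm, $key, $data) = @_;
    $dbm->{$key} = nfreeze($data);
}

# Fetch: deserialize back into a fresh Perl data structure.
sub fetch_record {
    my ($dbm, $key) = @_;
    my $frozen = $dbm->{$key};
    return defined $frozen ? thaw($frozen) : undef;
}

store_record(\%raw, 'user:42', { name => 'Bill', tags => ['perl', 'dbm'] });
my $rec = fetch_record(\%raw, 'user:42');
print "$rec->{name} has ", scalar @{ $rec->{tags} }, " tags\n";
```

Because `thaw` builds a fresh copy on every fetch, changing `$rec->{name}` in memory does not touch the dbm. The same holds with MLDBM: to save a change to a nested value, fetch the top-level value, modify it, and assign it back.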
Below are some benchmarks from running bench/bench_sync.pl, from the
MLDBM::Sync distribution, on my Linux 2.2.14 kernel with an ext2
filesystem. The -n and --bundle options you see below exist only in my
0.23 development release. The --bundle option in particular sets the
number of reads/writes per lock, and batching them this way improves
performance. I would probably use GDBM_File in your position, as I am
not sure that Tie::TextDir would scale as well past 10000 files/entries.
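The effect of --bundle is easy to see in isolation. The sketch below is not bench_sync.pl; it is a hypothetical, core-modules-only illustration that performs the same 1000 SDBM_File writes with one exclusive flock per write and then one per bundle of 50, and reports the lock acquisitions (1000 versus 20). The bundled_write helper and the lock-file layout are assumptions.

```perl
use strict;
use warnings;
use Fcntl qw(:flock O_RDWR O_CREAT);
use SDBM_File;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);
my $db  = "$dir/bench";   # hypothetical database path

# Hypothetical helper: write all key/value pairs, taking one exclusive
# lock (and one tie/untie) per bundle of $bundle writes rather than per
# write. Returns how many times the lock was acquired.
sub bundled_write {
    my ($pairs, $bundle) = @_;
    die "bundle must be >= 1" if $bundle < 1;
    my $locks = 0;
    for (my $i = 0; $i < @$pairs; $i += $bundle) {
        my $end = $i + $bundle - 1;
        $end = $#$pairs if $end > $#$pairs;
        open(my $lock_fh, '>>', "$db.lock") or die "Cannot open lock file: $!";
        flock($lock_fh, LOCK_EX) or die "Cannot flock: $!";
        $locks++;
        tie(my %dbm, 'SDBM_File', $db, O_RDWR | O_CREAT, 0640)
            or die "Cannot tie $db: $!";
        $dbm{ $_->[0] } = $_->[1] for @{$pairs}[ $i .. $end ];
        untie %dbm;
        close($lock_fh);   # closing the handle releases the lock
    }
    return $locks;
}

my @pairs = map { [ "key$_", "value $_" ] } 1 .. 1000;
for my $bundle (1, 50) {
    printf "bundle=%d: %d lock acquisitions\n",
        $bundle, bundled_write(\@pairs, $bundle);
}
```

Amortizing the lock (and, in this sketch, the tie) over many operations is the intended saving; a larger bundle also means other processes wait longer for their turn.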
Happy hacking!
--Josh
_________________________________________________________________
Joshua Chamas Chamas Enterprises Inc.
NodeWorks Founder Huntington Beach, CA USA
http://www.nodeworks.com 1-714-625-4051
[MLDBM-Sync-0.21]# perl bench/bench_sync.pl
=== INSERT OF 50 BYTE RECORDS ===
Time for 100 writes + 100 reads for SDBM_File               0.14 seconds    12288 bytes
Time for 100 writes + 100 reads for MLDBM::Sync::SDBM_File  0.17 seconds    12288 bytes
Time for 100 writes + 100 reads for GDBM_File               3.00 seconds    18066 bytes
Time for 100 writes + 100 reads for DB_File                 4.10 seconds    20480 bytes
Time for 100 writes + 100 reads for Tie::TextDir .04        0.24 seconds     9096 bytes
=== INSERT OF 500 BYTE RECORDS ===
Time for 100 writes + 100 reads for SDBM_File               0.24 seconds  1297408 bytes
Time for 100 writes + 100 reads for MLDBM::Sync::SDBM_File  0.54 seconds   207872 bytes
Time for 100 writes + 100 reads for GDBM_File               2.98 seconds    63472 bytes
Time for 100 writes + 100 reads for DB_File                 4.29 seconds   114688 bytes
Time for 100 writes + 100 reads for Tie::TextDir .04        0.27 seconds    54096 bytes
=== INSERT OF 5000 BYTE RECORDS ===
(skipping test for SDBM_File 1024 byte limit)
Time for 100 writes + 100 reads for MLDBM::Sync::SDBM_File  1.35 seconds  1911808 bytes
Time for 100 writes + 100 reads for GDBM_File               4.11 seconds   832400 bytes
Time for 100 writes + 100 reads for DB_File                 5.66 seconds   839680 bytes
Time for 100 writes + 100 reads for Tie::TextDir .04        0.49 seconds   504096 bytes
=== INSERT OF 20000 BYTE RECORDS ===
(skipping test for SDBM_File 1024 byte limit)
Time for 100 writes + 100 reads for MLDBM::Sync::SDBM_File  4.73 seconds 14994432 bytes
Time for 100 writes + 100 reads for GDBM_File               4.61 seconds  2063912 bytes
Time for 100 writes + 100 reads for DB_File                 5.96 seconds  2068480 bytes
Time for 100 writes + 100 reads for Tie::TextDir .04        1.24 seconds  2004096 bytes
[MLDBM-Sync-0.23]# perl ./bench/bench_sync.pl -n=10000 --bundle=50
NUMBER OF PROCESSES IN TEST: 4
=== INSERT OF 50 BYTE RECORDS ===
Time for 10000 writes + 10000 reads for SDBM_File                 5.44 seconds   6478848 bytes  locks/pid=50
Time for 10000 writes + 10000 reads for MLDBM::Sync::SDBM_File    8.34 seconds   6102016 bytes  locks/pid=50
Time for 10000 writes + 10000 reads for GDBM_File                29.94 seconds   1032312 bytes  locks/pid=50
Time for 10000 writes + 10000 reads for DB_File                  30.45 seconds   1335296 bytes  locks/pid=50
Time for 10000 writes + 10000 reads for Tie::TextDir .04         39.37 seconds    901408 bytes  locks/pid=50
=== INSERT OF 500 BYTE RECORDS ===
(skipping test for SDBM_File 100 byte limit)
(skipping test for MLDBM::Sync db size > 1M)
Time for 10000 writes + 10000 reads for GDBM_File                33.39 seconds   5501948 bytes  locks/pid=50
Time for 10000 writes + 10000 reads for DB_File                  60.65 seconds  11427840 bytes  locks/pid=50
Time for 10000 writes + 10000 reads for Tie::TextDir .04         44.39 seconds   5401408 bytes  locks/pid=50
=== INSERT OF 5000 BYTE RECORDS ===
(skipping test for SDBM_File 100 byte limit)
(skipping test for MLDBM::Sync db size > 1M)
Time for 10000 writes + 10000 reads for GDBM_File                85.13 seconds  82449298 bytes  locks/pid=50
Time for 10000 writes + 10000 reads for DB_File                 104.70 seconds  82563072 bytes  locks/pid=50
Time for 10000 writes + 10000 reads for Tie::TextDir .04         94.12 seconds  50401408 bytes  locks/pid=50
=== INSERT OF 20000 BYTE RECORDS ===
(skipping test for SDBM_File 100 byte limit)
(skipping test for MLDBM::Sync db size > 1M)
Time for 10000 writes + 10000 reads for GDBM_File               252.91 seconds 205409834 bytes  locks/pid=50
Time for 10000 writes + 10000 reads for DB_File                 273.33 seconds 205443072 bytes  locks/pid=50
Time for 10000 writes + 10000 reads for Tie::TextDir .04        246.50 seconds 200401408 bytes  locks/pid=50