Bill Moseley wrote:
> 
> Hi,
> 
> <verbose>
> I'm looking for a little discussion on selecting a data storage method, and
> I'm posting here because Cache::Cache often is discussed here (along with
> Apache::Session).  And people here are smart, of course ;).
> 
> Basically, I'm trying to understand when to use Cache::Cache, vs. Berkeley
> DB, and locking issues.  (Perrin, I've been curious why at etoys you used
> Berkeley DB over other caching options, such as Cache::Cache).  I think
> RDBMS is not required as I'm only reading/writing and not doing any kind of
> selects on the data -- also I could end up doing thousands of selects for a
> request.  So far, performance has been good with the file system store.
> 

Hey Bill, I'll tell you about using MLDBM::Sync for this, of which I'm 
the author.  MLDBM::Sync is a wrapper around MLDBM databases, which can
be backed by SDBM_File, GDBM_File, DB_File, or, as of recently, Tie::TextDir.
It provides the locking layer that you need to keep concurrent access to
the database safe, so that data is neither corrupted nor lost.  Performance
will vary with the OS and with whether the dbm sits on something like
a RAM disk.

MLDBM has long been a simple way to read and write complex data to a dbm
file through an easy tied interface:

  $dbm{$key} = \%data;
  my $data = $dbm{$key};

What you get with MLDBM::Sync is the locking API, plus some other goodies
like RAM caching and auto checksum keys if you like.
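
To make that concrete, here is a minimal sketch of tying a hash through
MLDBM::Sync.  It assumes MLDBM and MLDBM::Sync are installed from CPAN; the
temp directory, the file name "cache", and the record contents are made up
for illustration, and SDBM_File is picked only because it ships with Perl.

```perl
use strict;
use warnings;
use Fcntl qw(O_CREAT O_RDWR);
use File::Temp qw(tempdir);
use MLDBM::Sync;                      # the locking wrapper
use MLDBM qw(SDBM_File Storable);     # dbm backend + serializer

# Hypothetical location; any writable path works.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/cache";

tie my %dbm, 'MLDBM::Sync', $file, O_CREAT|O_RDWR, 0640
    or die "cannot tie $file: $!";

# Without explicit locking, each store and fetch is locked on its own.
my %data = (title => 'example', tags => ['a', 'b']);
$dbm{rec1} = \%data;

my $copy = $dbm{rec1};                # a deserialized copy, not an alias
print "$copy->{title}: @{ $copy->{tags} }\n";

untie %dbm;
```

Nothing else is needed for case #1; the lock is taken and released around
each individual read or write.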

> 1) Read/write a single record
> 2) Read anywhere from a few to thousands of records in a request. This
>    is the typical mod_perl-based request.  I know the record IDs that I
>    need to read from another source.  I basically need a way to get some
>    subset of records fast, by record ID.
> 3) Traverse the data store and read every record.
> 

Regarding some of these specific issues ... I wrote MLDBM::Sync
specifically to handle #1 safely.  For #2, there is an API that 
you can use like 

  tied(%hash)->Lock();        # or tied(%hash)->ReadLock() when only reading
    ... do lots of reads/writes ...
  tied(%hash)->UnLock();

that can be used to improve the performance of multiple reads
and writes, since the lock is taken once for the whole batch rather
than once per operation.  You can use the same locking strategy
to do #3 really fast, or do it more slowly without locking.  I wrote this
using the techniques I had long been using in Apache::ASP for $Session
and $Application support, and recently bolted MLDBM::Sync into Apache::ASP
for these.  I have been using MLDBM::Sync in production as a standalone
module for something like 6 months to a year now, but only 
recently added support for Tie::TextDir.
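
Spelled out a bit more, cases #2 and #3 might look like the sketch below.
It assumes MLDBM and MLDBM::Sync are installed from CPAN; the file name,
the "id$_" key scheme, and the list of wanted IDs are hypothetical, not
anything from your application.

```perl
use strict;
use warnings;
use Fcntl qw(O_CREAT O_RDWR);
use File::Temp qw(tempdir);
use MLDBM::Sync;
use MLDBM qw(SDBM_File Storable);

my $dir  = tempdir(CLEANUP => 1);     # hypothetical location
my $sync = tie my %dbm, 'MLDBM::Sync', "$dir/records", O_CREAT|O_RDWR, 0640
    or die "cannot tie: $!";

# One write lock for the whole batch instead of one per store.
$sync->Lock;
$dbm{"id$_"} = { id => $_, body => "record $_" } for 1 .. 100;
$sync->UnLock;

# Case #2: fetch a known subset of record IDs under a single read lock.
my @wanted = (3, 17, 42);             # IDs that came from another source
$sync->ReadLock;
my @records = map { $dbm{"id$_"} } @wanted;
$sync->UnLock;

# Case #3: traverse every record under a single read lock.
my $count = 0;
$sync->ReadLock;
for my $key (keys %dbm) {
    my $rec = $dbm{$key};
    $count++;
}
$sync->UnLock;

print scalar(@records), " fetched, $count traversed\n";
```

ReadLock is enough for the two read-only passes; only the loading step
needs the exclusive Lock.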

> When reading (item 2) I have to create a perl data structure from the data,
> which doesn't change.  So, I want to store this in my record, using
> Storable.pm.  That can work with any data store, of course.
> 

MLDBM supports this kind of thing natively, via:

  use MLDBM qw(DB_File Storable);        # use Storable for serializing
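
As a small sketch of what that buys you, the following stores a nested
structure through plain MLDBM with Storable as the serializer.  It assumes
MLDBM is installed from CPAN, and it swaps in SDBM_File for DB_File only
because SDBM_File ships with Perl; the file name and record are made up.
It also shows one documented MLDBM limitation: assigning to a nested
element does not write back, so you fetch, modify, and store the whole
record instead.

```perl
use strict;
use warnings;
use Fcntl qw(O_CREAT O_RDWR);
use File::Temp qw(tempdir);
use MLDBM qw(SDBM_File Storable);     # Storable does the serializing

my $dir = tempdir(CLEANUP => 1);      # hypothetical location
tie my %dbm, 'MLDBM', "$dir/structs", O_CREAT|O_RDWR, 0640
    or die "cannot tie: $!";

# The whole structure is frozen into a single dbm value.
$dbm{doc1} = { words => [qw(apple pear)], hits => 1 };

# This changes only a temporary deserialized copy, not the stored record.
$dbm{doc1}{hits} = 2;
my $before = $dbm{doc1}{hits};

# Fetch, modify, store: the change is serialized back into the dbm.
my $rec = $dbm{doc1};
$rec->{hits} = 2;
$dbm{doc1} = $rec;
my $after = $dbm{doc1}{hits};

print "nested write: $before, fetch/modify/store: $after\n";

untie %dbm;
```

Since your structure doesn't change once written, the limitation should
rarely bite, but it is worth knowing before you rely on in-place updates.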

Below are some benchmarks from running bench/bench_sync.pl in 
the MLDBM::Sync distribution on my 2.2.14 Linux kernel on ext2 fs.
Only in my .23 dev release have I added the -n & --bundle options you 
see below.  The bundle option in particular sets the # of reads/writes
done per lock, which is used to improve performance.  I would probably
use GDBM_File in your position, as I am not sure that Tie::TextDir
would scale as well past 10000 files/entries.

Happy hacking!

--Josh
_________________________________________________________________
Joshua Chamas                           Chamas Enterprises Inc.
NodeWorks Founder                       Huntington Beach, CA  USA 
http://www.nodeworks.com                1-714-625-4051

[MLDBM-Sync-0.21]# perl bench/bench_sync.pl 

=== INSERT OF 50 BYTE RECORDS ===
  Time for 100 writes + 100 reads for  SDBM_File                  0.14 seconds     12288 bytes
  Time for 100 writes + 100 reads for  MLDBM::Sync::SDBM_File     0.17 seconds     12288 bytes
  Time for 100 writes + 100 reads for  GDBM_File                  3.00 seconds     18066 bytes
  Time for 100 writes + 100 reads for  DB_File                    4.10 seconds     20480 bytes
  Time for 100 writes + 100 reads for  Tie::TextDir .04           0.24 seconds      9096 bytes

=== INSERT OF 500 BYTE RECORDS ===
  Time for 100 writes + 100 reads for  SDBM_File                  0.24 seconds   1297408 bytes
  Time for 100 writes + 100 reads for  MLDBM::Sync::SDBM_File     0.54 seconds    207872 bytes
  Time for 100 writes + 100 reads for  GDBM_File                  2.98 seconds     63472 bytes
  Time for 100 writes + 100 reads for  DB_File                    4.29 seconds    114688 bytes
  Time for 100 writes + 100 reads for  Tie::TextDir .04           0.27 seconds     54096 bytes

=== INSERT OF 5000 BYTE RECORDS ===
 (skipping test for SDBM_File 1024 byte limit)
  Time for 100 writes + 100 reads for  MLDBM::Sync::SDBM_File     1.35 seconds   1911808 bytes
  Time for 100 writes + 100 reads for  GDBM_File                  4.11 seconds    832400 bytes
  Time for 100 writes + 100 reads for  DB_File                    5.66 seconds    839680 bytes
  Time for 100 writes + 100 reads for  Tie::TextDir .04           0.49 seconds    504096 bytes

=== INSERT OF 20000 BYTE RECORDS ===
 (skipping test for SDBM_File 1024 byte limit)
  Time for 100 writes + 100 reads for  MLDBM::Sync::SDBM_File     4.73 seconds  14994432 bytes
  Time for 100 writes + 100 reads for  GDBM_File                  4.61 seconds   2063912 bytes
  Time for 100 writes + 100 reads for  DB_File                    5.96 seconds   2068480 bytes
  Time for 100 writes + 100 reads for  Tie::TextDir .04           1.24 seconds   2004096 bytes

[MLDBM-Sync-0.23]# perl ./bench/bench_sync.pl -n=10000 --bundle=50
NUMBER OF PROCESSES IN TEST: 4

=== INSERT OF 50 BYTE RECORDS ===
  Time for 10000 writes + 10000 reads for  SDBM_File                  5.44 seconds    6478848 bytes locks/pid=50
  Time for 10000 writes + 10000 reads for  MLDBM::Sync::SDBM_File     8.34 seconds    6102016 bytes locks/pid=50
  Time for 10000 writes + 10000 reads for  GDBM_File                 29.94 seconds    1032312 bytes locks/pid=50
  Time for 10000 writes + 10000 reads for  DB_File                   30.45 seconds    1335296 bytes locks/pid=50
  Time for 10000 writes + 10000 reads for  Tie::TextDir .04          39.37 seconds     901408 bytes locks/pid=50

=== INSERT OF 500 BYTE RECORDS ===
 (skipping test for SDBM_File 100 byte limit)
 (skipping test for MLDBM::Sync db size > 1M)
  Time for 10000 writes + 10000 reads for  GDBM_File                 33.39 seconds    5501948 bytes locks/pid=50
  Time for 10000 writes + 10000 reads for  DB_File                   60.65 seconds   11427840 bytes locks/pid=50
  Time for 10000 writes + 10000 reads for  Tie::TextDir .04          44.39 seconds    5401408 bytes locks/pid=50

=== INSERT OF 5000 BYTE RECORDS ===
 (skipping test for SDBM_File 100 byte limit)
 (skipping test for MLDBM::Sync db size > 1M)
  Time for 10000 writes + 10000 reads for  GDBM_File                 85.13 seconds   82449298 bytes locks/pid=50
  Time for 10000 writes + 10000 reads for  DB_File                  104.70 seconds   82563072 bytes locks/pid=50
  Time for 10000 writes + 10000 reads for  Tie::TextDir .04          94.12 seconds   50401408 bytes locks/pid=50

=== INSERT OF 20000 BYTE RECORDS ===
 (skipping test for SDBM_File 100 byte limit)
 (skipping test for MLDBM::Sync db size > 1M)
  Time for 10000 writes + 10000 reads for  GDBM_File                252.91 seconds  205409834 bytes locks/pid=50
  Time for 10000 writes + 10000 reads for  DB_File                  273.33 seconds  205443072 bytes locks/pid=50
  Time for 10000 writes + 10000 reads for  Tie::TextDir .04         246.50 seconds  200401408 bytes locks/pid=50
