Re: [OT] Data store options

2001-11-08 Thread Joachim Zobel

At 08:59 08.11.01 -0800, you wrote:
My specifics are that I have a need to permanently store tens of thousands
of smallish (5K) items.  I'm currently using a simple file system store,
one file per record, all in the same directory.  Clearly, I need to move
into a directory tree for better performance as the number of files increases.

A modern FS like XFS or ReiserFS will handle that many files in one directory
efficiently, without needing a directory tree.

Joachim

--
... a race of inventive dwarfs who can be hired for anything.
- Bertolt Brecht - Leben des Galilei (Life of Galileo)




[OT] Data store options

2001-11-08 Thread Bill Moseley

Hi,

<verbose>
I'm looking for a little discussion on selecting a data storage method, and
I'm posting here because Cache::Cache often is discussed here (along with
Apache::Session).  And people here are smart, of course ;).

Basically, I'm trying to understand when to use Cache::Cache, vs. Berkeley
DB, and locking issues.  (Perrin, I've been curious why at etoys you used
Berkeley DB over other caching options, such as Cache::Cache).  I think
an RDBMS is not required, as I'm only reading/writing and not doing any kind of
selects on the data -- also I could end up doing thousands of selects for a
request.  So far, performance has been good with the file system store.

My specifics are that I have a need to permanently store tens of thousands
of smallish (5K) items.  I'm currently using a simple file system store,
one file per record, all in the same directory.  Clearly, I need to move
into a directory tree for better performance as the number of files increases.

The data is accessed in a few ways:

1) Read/write a single record
2) Read anywhere from a few to thousands of records in a request. This
   is the typical mod_perl-based request.  I know the record IDs that I
   need to read from another source.  I basically need a way to get some
   subset of records fast, by record ID.
3) Traverse the data store and read every record.

I don't need features to automatically expire the records.  They are
permanent.

When reading (item 2) I have to create a perl data structure from the data,
and that structure doesn't change.  So I want to store it in my record,
serialized with Storable.pm.  That can work with any data store, of course.
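
For reference, a minimal sketch of that approach (the store directory and
helper names here are illustrative, not from the original setup), using
Storable's nstore/retrieve:

  use Storable qw(nstore retrieve);

  my $store_dir = '/var/data/records';    # hypothetical location

  sub write_record {
      my ($id, $data) = @_;
      # nstore() writes in network byte order, so files stay portable
      nstore($data, "$store_dir/$id");
  }

  sub read_record {
      my ($id) = @_;
      return retrieve("$store_dir/$id");   # returns the original reference
  }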

It's not a complicated design.  My choices are something like:

1) use Storable and write the files out myself.
2) use Cache::FileCache and have the work done (but can I traverse?)
3) use Berkeley DB (I understand the issues discussed in The Guide)

So, what kind of questions and answers would help me weigh the options?

With regard to locking, IIRC, Cache::Cache doesn't lock; rather, writes go
to a temp file, then there's an atomic rename.  Last in wins.  If updates
to a record are not based on previous content (such as a counter file), is
there any reason this is not a perfectly good method -- as opposed to flock?
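
For illustration, a minimal sketch of that write-to-temp-then-rename
pattern (this shows the general technique, not Cache::Cache's actual
internals; paths are made up):

  use File::Temp qw(tempfile);
  use File::Basename qw(dirname);

  sub atomic_write {
      my ($path, $bytes) = @_;
      # Create the temp file in the same directory so rename() stays atomic
      my ($fh, $tmp) = tempfile(DIR => dirname($path));
      print {$fh} $bytes;
      close $fh or die "close $tmp: $!";
      rename $tmp, $path or die "rename $tmp -> $path: $!";
  }

The last successful rename() wins, which is exactly the "last in wins"
behavior described above.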

Again, I'm really looking more for discussion, not an answer to my specific
needs.  What issues would you use when selecting a data store method, and why?

</verbose>

Thanks very much,





Bill Moseley
mailto:[EMAIL PROTECTED]



Re: [OT] Data store options

2001-11-08 Thread Perrin Harkins

 Basically, I'm trying to understand when to use Cache::Cache, vs. Berkeley
 DB, and locking issues.  (Perrin, I've been curious why at etoys you used
 Berkeley DB over other caching options, such as Cache::Cache).

Cache::Cache didn't exist at the time.  BerkeleyDB seemed easier than
rolling our own file cache, but in retrospect I'm not sure it was the right
way to go.  We spent a fair amount of time figuring out how to work with
BerkeleyDB effectively.

 1) use Storable and write the files out myself.
 2) use Cache::FileCache and have the work done (but can I traverse?)
 3) use Berkeley DB (I understand the issues discussed in The Guide)

If you do use BerkeleyDB, I suggest you just use the simple database-level
lock.  Otherwise, you have to think about deadlocks and I found the deadlock
daemon that comes with it kind of difficult to use.
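
For illustration, one simple way to get a database-level lock is an
external flock() held around the tie (a sketch of the general idea, not
the BerkeleyDB locking subsystem itself; file names are made up):

  use Fcntl qw(:flock O_RDWR O_CREAT);
  use DB_File;

  sub with_db {
      my ($code) = @_;
      open my $lock, '>', '/var/data/records.lock' or die "lock: $!";
      flock($lock, LOCK_EX) or die "flock: $!";

      my %db;
      tie %db, 'DB_File', '/var/data/records.db', O_RDWR|O_CREAT, 0644, $DB_HASH
          or die "tie: $!";
      $code->(\%db);                  # all reads/writes happen under the lock
      untie %db;

      close $lock;                    # releases the lock
  }

  with_db(sub { my $db = shift; $db->{42} = 'frozen record' });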

 So, what kind of questions and answers would help me weigh the options?

If you have too many records, I suppose Cache::FileCache might eat up all
your inodes.  You can probably go pretty far with a modern OS before you hit
a problem with this.

I always wanted to benchmark Cache::FileCache against BerkeleyDB for
different read/write loads and see how they do.  I think FileCache would win
on Linux.

 With regard to locking, IIRC, Cache::Cache doesn't lock, rather writes go
 to a temp file, then there's an atomic rename.  Last in wins.  If updates
 to a record are not based on previous content (such as a counter file) is
 there any reason this is not a perfectly good method -- as opposed to
flock?

That should be fine.

- Perrin




Cache::* and MD5 collisions [was: [OT] Data store options]

2001-11-08 Thread Barrie Slaymaker

On Thu, Nov 08, 2001 at 08:59:55AM -0800, Bill Moseley wrote:
 Hi,
 
 <verbose>
 I'm looking for a little discussion on selecting a data storage method, and
 I'm posting here because Cache::Cache often is discussed here (along with
 Apache::Session).  And people here are smart, of course ;).
 
 Basically, I'm trying to understand when to use Cache::Cache, vs. Berkeley
 DB, and locking issues.

Even a bit more OT: one thing to watch out for, especially if you plan
on caching a *lot* of data, is that the Cache::* modules did not do
collision detection on MD5 collisions the last time I looked.  Forgive
me if that's changed recently.

The probability is extremely low, but they can and do occur in the
wild (I won't bring anyone's name up here :).  In other words, it's a
remote possibility that one URI might serve up another's content if the
two hash keys map to the same MD5 value.

Given an infinite number of web monkeys using Cache::* and an infinite
number of user monkeys clicking on links...

- Barrie



Re: Cache::* and MD5 collisions [was: [OT] Data store options]

2001-11-08 Thread Jeffrey W. Baker

On Thu, 2001-11-08 at 10:11, Barrie Slaymaker wrote:
 On Thu, Nov 08, 2001 at 08:59:55AM -0800, Bill Moseley wrote:
  Hi,
  
  <verbose>
  I'm looking for a little discussion on selecting a data storage method, and
  I'm posting here because Cache::Cache often is discussed here (along with
  Apache::Session).  And people here are smart, of course ;).
  
  Basically, I'm trying to understand when to use Cache::Cache, vs. Berkeley
  DB, and locking issues.
 
 Even a bit more OT: one thing to watch out for, especially if you plan
 on caching a *lot* of data, is that the Cache::* modules did not do
 collision detection on MD5 collisions the last time I looked.  Forgive
 me if that's changed recently.
 
 The probability is extremely low, but they can and do occur in the
 wild (I won't bring anyone's name up here :).  In other words, it's a
 remote possibility that one URI might serve up another's content if the
 two hash keys map to the same MD5 value.
 
 Given an infinite number of web monkeys using Cache::* and an infinite
 number of user monkeys clicking on links...

You could switch to SHA1.  SHA1 collisions will be much more rare than
MD5 collisions.
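
For instance, the swap itself is a one-liner with Digest::SHA1 (a sketch;
Cache::Cache would of course have to be changed to use it):

  use Digest::SHA1 qw(sha1_hex);

  my $key    = 'http://example.com/some/uri';   # illustrative key
  my $digest = sha1_hex($key);    # 160-bit digest, 40 hex chars vs. MD5's 32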

-jwb




Re: Cache::* and MD5 collisions [was: [OT] Data store options]

2001-11-08 Thread DeWitt Clinton

On Thu, Nov 08, 2001 at 01:11:21PM -0500, Barrie Slaymaker wrote:

 Even a bit more OT: one thing to watch out for, especially if you
 plan on caching a *lot* of data, is that the Cache::* modules did
 not do collision detection on MD5 collisions the last time I looked.
 Forgive me if that's changed recently.

I knew that a collision was technically possible, but the odds seemed
astronomically low.  But if security is a concern, then I agree that
MD5-ing the key isn't the best way to create a unique identifier.  The
thing that I liked most about MD5 (other than its convenience) is that
it tends to distribute very evenly.  For example, file system caches
fill their directories roughly equally when their paths are created
from MD5 hashed keys.  Doing something simple and unique like
URL-encoding the key to make a legal identifier (legal in the sense
that it is a valid filename) wouldn't distribute as evenly.
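
A sketch of the kind of path construction being described (directory
depth and the cache root are arbitrary here):

  use Digest::MD5 qw(md5_hex);

  sub cache_path {
      my ($key) = @_;
      my $hex = md5_hex($key);      # 32 hex chars, spread evenly over the keyspace
      # Peel off a couple of 2-character directory levels, then use the full hash
      return join '/', '/var/cache', substr($hex, 0, 2), substr($hex, 2, 2), $hex;
  }

Because the digest is effectively uniform, the directories fill at roughly
the same rate no matter how skewed the keys themselves are.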

Barrie raised this question privately to me earlier this year, and
this is what I wrote at the time:

   Good question.  They would overwrite each other.  You may want to
   check out this RFC for more information about the odds of that:

http://www.faqs.org/rfcs/rfc1321.html 
   Specifically, they say that it is conjectured that the difficulty
   of coming up with two messages having the same message digest is on
   the order of 2^64 operations, and that the difficulty of coming up
   with any message having a given message digest is on the order
   of 2^128 operations.

   This means you would need 18,446,744,073,709,551,616 keys to make
   it 100% likely that there would be a collision.

   Assuming you were only willing to risk a 0.01% chance that there is
   a collision, you could still fit 1,844,674,407,370,955 keys.  Since
   each value stored will take at least 1k in memory, that number of
    keys would require 1,844,674 terabytes of storage.
   
   In other words, it is highly unlikely that you could get a collision
   in real world usage.
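
As a rough cross-check (not part of the original note), the standard
birthday approximation p ~ 1 - exp(-n(n-1)/(2*2^128)) can be evaluated
directly, and it comes out even smaller than the linear estimate above:

  # Collision probability for n random keys over 2^128 possible digests
  my $space = 2 ** 128;
  my $n     = 1_844_674_407_370_955;
  my $p     = 1 - exp( -($n * ($n - 1)) / (2 * $space) );
  printf "p for %g keys: %.3g\n", $n, $p;   # prints roughly 5e-09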


Note that I'm not disagreeing with Barrie at all -- I just took the
easy way out and refuted the likelihood of an infinite number of
monkeys hitting my code.  :)

-DeWitt




Re: Cache::* and MD5 collisions [was: [OT] Data store options]

2001-11-08 Thread Andrew Ho

Hello,

DC> For example, file system caches fill their directories roughly equally
DC> when their paths are created from MD5 hashed keys. Doing something
DC> simple and unique like URL-encoding the key to make a legal identifier
DC> (legal in the sense that it is a valid filename) wouldn't distribute as
DC> evenly.

Let me point out that if you are using MD5 hashes for directory spreading
(i.e. to spread a large number of files across a tree of directories so
that no one directory is filled with too many files for efficient
filesystem access), the easy solution is to just append the unique key
that you are hashing to the files involved.

For example, say your keys are e-mail addresses and you just want to use
an MD5 hash to spread your data files over directories so that no one
directory has too many files in it. Say your original key is
[EMAIL PROTECTED] (hex encoded MD5 hash of this is RfbmPiuRLyPGGt3oHBagt).
Instead of just storing the key in the file
R/Rf/Rfb/Rfbm/RfbmPiuRLyPGGt3oHBagt.dat, store the key in the file
R/Rf/Rfb/Rfbm/[EMAIL PROTECTED].dat.  Presto... collisions are impossible.
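
A sketch of that layout (the data root, suffix, and escaping choice are
illustrative), using the hash only to pick directories and a URL-encoded
copy of the key as the filename:

  use Digest::MD5 qw(md5_hex);
  use URI::Escape qw(uri_escape);

  sub spread_path {
      my ($key) = @_;
      my $hex  = md5_hex($key);       # only used for directory spreading
      my $name = uri_escape($key);    # the key itself names the file
      return join '/', '/var/data',
                       substr($hex, 0, 1), substr($hex, 0, 2), substr($hex, 0, 3),
                       "$name.dat";
  }

  # e.g. /var/data/X/XY/XYZ/joe%40example.com.dat, where X, Y, Z come from
  # the hash and the filename comes from the key itself.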

Humbly,

Andrew

--
Andrew Ho   http://www.tellme.com/   [EMAIL PROTECTED]
Engineer   [EMAIL PROTECTED]  Voice 650-930-9062
Tellme Networks, Inc.   1-800-555-TELLFax 650-930-9101
--






Re: [OT] Data store options

2001-11-08 Thread Andrew Ho

Hello,

PH> If you do use BerkeleyDB, I suggest you just use the simple
PH> database-level lock. Otherwise, you have to think about deadlocks and I
PH> found the deadlock daemon that comes with it kind of difficult to use.

Later versions of BerkeleyDB have a row-level lock available which works
pretty transparently. However, this works using mmap() so it won't work if
your BerkeleyDB file(s) is/are mounted via NFS. flock()'s scalability over
NFS using lockd under Solaris is also questionable so many people end up
implementing their own database-level lock using, say, atomic moves. It's
an icky world out there.
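
For what it's worth, a sketch of that atomic-move style of lock (the
classic link()-a-unique-file trick, often used because link() behaves
atomically even over NFS; names here are illustrative):

  use Sys::Hostname qw(hostname);

  sub try_lock {
      my ($lockfile) = @_;
      my $unique = "$lockfile." . hostname() . ".$$";
      open my $fh, '>', $unique or die "open $unique: $!";
      close $fh;
      # link() either succeeds atomically or fails; on NFS also check the
      # link count in case the success reply was lost on the wire.
      my $ok = link($unique, $lockfile) || ((stat $unique)[3] == 2);
      unlink $unique;
      return $ok;                  # true means we hold the lock
  }

  sub release_lock { unlink $_[0] }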

Humbly,

Andrew

--
Andrew Ho   http://www.tellme.com/   [EMAIL PROTECTED]
Engineer   [EMAIL PROTECTED]  Voice 650-930-9062
Tellme Networks, Inc.   1-800-555-TELLFax 650-930-9101
--




Re: Cache::* and MD5 collisions [was: [OT] Data store options]

2001-11-08 Thread Bill Moseley

At 10:54 AM 11/08/01 -0800, Andrew Ho wrote:
For example, say your keys are e-mail addresses and you just want to use
an MD5 hash to spread your data files over directories so that no one
directory has too many files in it. Say your original key is
[EMAIL PROTECTED] (hex encoded MD5 hash of this is RfbmPiuRLyPGGt3oHBagt).
Instead of just storing the key in the file
R/Rf/Rfb/Rfbm/RfbmPiuRLyPGGt3oHBagt.dat, store the key in the file
R/Rf/Rfb/Rfbm/[EMAIL PROTECTED].dat.  Presto... collisions are impossible.

That has the nice side effect that I can run through the directory tree and
get the key for every file.  I do need a way to read every key in the
store.  Order is not important.
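
A sketch of that traversal (assuming filenames of the form <escaped-key>.dat,
as in the sketch earlier in the thread):

  use File::Find qw(find);
  use URI::Escape qw(uri_unescape);

  my @keys;
  find(sub {
      return unless -f && /\.dat\z/;
      (my $key = $_) =~ s/\.dat\z//;    # drop the suffix...
      push @keys, uri_unescape($key);   # ...and undo the escaping to recover the key
  }, '/var/data');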



Bill Moseley
mailto:[EMAIL PROTECTED]



Re: Cache::* and MD5 collisions [was: [OT] Data store options]

2001-11-08 Thread Barrie Slaymaker

On Thu, Nov 08, 2001 at 10:54:11AM -0800, Andrew Ho wrote:
 Let me point out that if you are using MD5 hashes for directory spreading
 (i.e. to spread a large number of files across a tree of directories so
 that no one directory is filled with too many files for efficient
 filesystem access), the easy solution is to just append the unique key
 that you are hashing to the files involved.

Neat.

That adds directory levels if you do URI-to-filesystem-path mapping for
the key, or runs into name-length limits if you fold / and, say, %
and . into %FF-style escape codes or base64 it.  And name-length
limitations are even more severe on many non-Unixish filesystems :).

I prefer the technique of storing the full text of the hash key in the
stored object's metadata (ie in the cache) and comparing it to the
requested key on retrieval.  When a collision is detected, you can then
do overflow processing (like most hashed dictionary data structures from
CS-land) and seek the cached copy somewhere else in the cache or just
treat it like a cache miss.
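
A sketch of that check layered on top of Cache::FileCache (the wrapper
functions are illustrative, not part of Cache::Cache; a mismatch is
simply treated as a miss rather than doing overflow processing):

  use Cache::FileCache;

  my $cache = Cache::FileCache->new({ namespace => 'pages' });

  sub safe_set {
      my ($key, $value) = @_;
      # Keep the full key next to the value so it can be verified later
      $cache->set($key, { key => $key, value => $value });
  }

  sub safe_get {
      my ($key) = @_;
      my $entry = $cache->get($key);
      return undef unless $entry && $entry->{key} eq $key;   # collision => miss
      return $entry->{value};
  }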

MD5 is a good thing, but relying on its uniqueness for a site that needs
to be reliable is a bit risky in my mind.  YMMV, I just want folks to be
aware of the issues.

- Barrie



Re: [OT] Data store options

2001-11-08 Thread Joshua Chamas

Bill Moseley wrote:
 
 Hi,
 
 <verbose>
 I'm looking for a little discussion on selecting a data storage method, and
 I'm posting here because Cache::Cache often is discussed here (along with
 Apache::Session).  And people here are smart, of course ;).
 
 Basically, I'm trying to understand when to use Cache::Cache, vs. Berkeley
 DB, and locking issues.  (Perrin, I've been curious why at etoys you used
 Berkeley DB over other caching options, such as Cache::Cache).  I think
 RDBMS is not required as I'm only reading/writing and not doing any kind of
 selects on the data -- also I could end up doing thousands of selects for a
 request.  So far, performance has been good with the file system store.
 

Hey Bill, I'll tell you about using MLDBM::Sync for this, of which I'm 
the author.  MLDBM::Sync is a wrapper around MLDBM databases, which can
be SDBM_File, GDBM_File, DB_File, or (recently) Tie::TextDir based.
It provides the locking layer that you need to keep access to the 
database safe, without corrupting things or losing data.  Depending
on the OS, and whether the dbm is on something like a RAM disk, 
performance will vary.

MLDBM has long been a simple way to read and write complex data to a dbm file
through an easy tied interface:

  $dbm{$key} = \%data;
  my $data = $dbm{$key};

What you get with MLDBM::Sync is the locking API, plus some other goodies
like RAM caching and auto checksum keys if you like.
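
A minimal usage sketch along those lines (the file name and record are
made up), following the module's synopsis:

  use MLDBM::Sync;
  use MLDBM qw(MLDBM::Sync::SDBM_File Storable);   # Storable for serializing
  use Fcntl qw(:DEFAULT);

  my %cache;
  tie %cache, 'MLDBM::Sync', '/var/data/records.dbm', O_CREAT|O_RDWR, 0640
      or die "tie: $!";

  $cache{42} = { title => 'example', body => '...' };   # write one record
  my $rec = $cache{42};                                  # read it back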

 1) Read/write a single record
 2) Read anywhere from a few to thousands of records in a request. This
is the typical mod_perl-based request.  I know the record IDs that I
need to read from another source.  I basically need a way to get some
subset of records fast, by record ID.
 3) Traverse the data store and read every record.
 

Regarding some of these specific issues ... I wrote MLDBM::Sync to be
able to specifically handle #1 safely.  For #2, there is an API that 
you can use like 

  tied(%hash)->Lock(); OR tied(%hash)->ReadLock();
... do lots of reads/writes ...
  tied(%hash)->Unlock();

that can be used to improve the performance of multiple reads
and writes between requests.  You can also use the locking strategy
to do #3 really fast, or more slowly without locking.  I wrote this
using the techniques I had long been using in Apache::ASP for $Session
and $Application support, and recently bolted MLDBM::Sync in for
these.  I have been using MLDBM::Sync in production for something
like 6 months to a year now as a stand alone module, but only 
recently added support for Tie::TextDir.

 When reading (item 2) I have to create a perl data structure from the data,
 which doesn't change.  So, I want to store this in my record, using
 Storable.pm.  That can work with any data store, of course.
 

MLDBM supports this kind of thing natively, via:

  use MLDBM qw(DB_File Storable);# use Storable for serializing

Below are some benchmarks when running bench/bench_sync.pl in 
the MLDBM::Sync distribution on my 2.2.14 Linux kernel on an ext2 fs.
Only in my .23 dev release have I added the -n and --bundle options you 
see below.  The bundle option in particular is the # of reads/writes
per lock, which is used to improve performance.  I would probably
use GDBM_File in your position, as I am not sure that Tie::TextDir
would scale as well past 1 files/entries.

Happy hacking!

--Josh
_
Joshua Chamas   Chamas Enterprises Inc.
NodeWorks Founder   Huntington Beach, CA  USA 
http://www.nodeworks.com1-714-625-4051

[MLDBM-Sync-0.21]# perl bench/bench_sync.pl 

=== INSERT OF 50 BYTE RECORDS ===
  Time for 100 writes + 100 reads for  SDBM_File               0.14 seconds    12288 bytes
  Time for 100 writes + 100 reads for  MLDBM::Sync::SDBM_File  0.17 seconds    12288 bytes
  Time for 100 writes + 100 reads for  GDBM_File               3.00 seconds    18066 bytes
  Time for 100 writes + 100 reads for  DB_File                 4.10 seconds    20480 bytes
  Time for 100 writes + 100 reads for  Tie::TextDir .04        0.24 seconds     9096 bytes

=== INSERT OF 500 BYTE RECORDS ===
  Time for 100 writes + 100 reads for  SDBM_File               0.24 seconds  1297408 bytes
  Time for 100 writes + 100 reads for  MLDBM::Sync::SDBM_File  0.54 seconds   207872 bytes
  Time for 100 writes + 100 reads for  GDBM_File               2.98 seconds    63472 bytes
  Time for 100 writes + 100 reads for  DB_File                 4.29 seconds   114688 bytes
  Time for 100 writes + 100 reads for  Tie::TextDir .04        0.27 seconds    54096 bytes

=== INSERT OF 5000 BYTE RECORDS ===
 (skipping test for SDBM_File 1024 byte limit)
  Time for 100 writes + 100 reads for  MLDBM::Sync::SDBM_File  1.35 seconds  1911808 bytes
  Time for 100 writes + 100 reads for  GDBM_File               4.11 seconds   832400 bytes
  Time for 100 writes + 100 reads for