Hi,

<verbose>
I'm looking for a little discussion on selecting a data storage method, and
I'm posting here because Cache::Cache is often discussed here (along with
Apache::Session).  And people here are smart, of course ;).

Basically, I'm trying to understand when to use Cache::Cache vs. Berkeley
DB, and the locking issues involved.  (Perrin, I've been curious why at
etoys you used Berkeley DB over other caching options, such as
Cache::Cache.)  I don't think an RDBMS is required, since I only read and
write records by ID and never query on their contents.  With an RDBMS I
could also end up doing thousands of selects for a single request.  So far,
performance has been good with the file system store.

My specifics are that I have a need to permanently store tens of thousands
of smallish (5K) items.  I'm currently using a simple file system store,
one file per record, all in the same directory.  Clearly, I need to move
into a directory tree for better performance as the number of files increases.
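
As a rough sketch of such a directory tree (in Python purely for
illustration; the function name, the use of MD5, and the two-character
directory names are assumptions, not anything the current store does), the
record ID can be hashed and the leading hex digits used as subdirectories:

```python
import hashlib
import os

def record_path(root, record_id, levels=1):
    """Map a record ID to root/<xx>/.../<record_id>.

    Each level uses two hex digits of the ID's MD5 digest, so every
    level fans out into at most 256 subdirectories.
    """
    digest = hashlib.md5(str(record_id).encode("utf-8")).hexdigest()
    parts = [digest[2 * i:2 * i + 2] for i in range(levels)]
    return os.path.join(root, *parts, str(record_id))
```

With one level, 50,000 records would average about 200 files per directory.
A second level (levels=2) is probably only worth it if the store grows well
beyond that.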

The data is accessed in a few ways:

1) Read/write a single record
2) Read anywhere from a few to thousands of records in a request. This
   is the typical mod_perl-based request.  The record IDs I need come
   from another source, so I basically need a way to fetch some subset
   of records fast, by record ID.
3) Traverse the data store and read every record.
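
To make access patterns 2 and 3 concrete for a file-per-record store, here
is a minimal sketch (Python for illustration only; pickle stands in for
Storable, and read_records, iter_all_records and the path_for callback are
hypothetical names):

```python
import os
import pickle

def read_records(root, record_ids, path_for):
    """Pattern 2: load a known subset of records, keyed by record ID.

    path_for(root, record_id) is a caller-supplied function that returns
    the file path for a record (flat directory, hashed tree, ...).
    """
    records = {}
    for rid in record_ids:
        with open(path_for(root, rid), "rb") as fh:
            records[rid] = pickle.load(fh)
    return records

def iter_all_records(root):
    """Pattern 3: walk the whole tree and yield (file name, record)."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            with open(os.path.join(dirpath, name), "rb") as fh:
                yield name, pickle.load(fh)
```

Traversal is trivial when the files are written directly.  Whether a given
caching module exposes something equivalent is a separate question.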

I don't need features to automatically expire the records.  They are
permanent.

When reading (item 2) I have to build a Perl data structure from the data.
Since the data doesn't change, I want to store that structure in the record
itself, using Storable.pm.  That can work with any data store, of course.

It's not a complicated design.  My choices are something like:

1) use Storable and write the files out myself.
2) use Cache::FileCache and have the work done (but can I traverse?)
3) use Berkeley DB (I understand the issues discussed in The Guide)

So, what kind of questions and answers would help me weigh the options?

With regard to locking, IIRC, Cache::Cache doesn't lock; rather, writes go
to a temp file, then there's an atomic rename.  Last in wins.  If updates
to a record don't depend on its previous content (unlike, say, a counter
file), is there any reason this is not a perfectly good method -- as
opposed to flock?
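
The write-to-temp-then-rename scheme described above can be sketched as
follows.  This is a generic illustration, not Cache::Cache's actual code:
it is in Python, pickle stands in for Storable, and write_record,
read_record and the ".tmp-" prefix are made-up names.

```python
import os
import pickle
import tempfile

def write_record(path, data):
    """Write data to path without locking; the last writer wins."""
    directory = os.path.dirname(path) or "."
    os.makedirs(directory, exist_ok=True)
    # The temp file must live on the same filesystem as the target,
    # otherwise the rename cannot be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as fh:
            pickle.dump(data, fh)
        # On POSIX the rename is atomic: a reader sees either the old
        # file or the complete new one, never a partial write.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise

def read_record(path):
    """Read a record back; no lock is needed for a whole-file read."""
    with open(path, "rb") as fh:
        return pickle.load(fh)
```

This protects readers from half-written files, but it does nothing for
read-modify-write updates such as a counter, where two writers can each
read the old value and one increment is lost.  A traversal of the store
would also need to skip any leftover ".tmp-" files.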

Again, I'm really looking more for discussion, not an answer to my specific
needs.  What issues would you consider when selecting a data store method,
and why?

</verbose>

Thanks very much,

Bill Moseley
mailto:[EMAIL PROTECTED]
