Just putting out a little feeler about a package I started writing last
night.  Wondering about its usefulness, whether something like it already
exists, and just overall interest.  It's designed for mod_perl use; it
doesn't make much sense otherwise.

I don't want to go into too many details here, but File::Redundant takes
some unique word (hopefully guaranteed unique through a database: a mailbox
name, a username, a website, etc.), which I call a thing; a pool of dirs;
and how many copies ($copies) you would like to maintain.  From the pool,
$copies good dirs are chosen, ordered by percent full on their partitions.
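
To make that concrete, here's a rough sketch of construction.  Since the
package only exists as last night's experiment, the argument names below
(thing, pool, copies) are just illustrative, not a settled interface:

    use File::Redundant;   # hypothetical -- the package proposed here

    # The unique word, the pool of dirs, and the copies to maintain:
    my $fr = File::Redundant->new(
        thing  => 'earl_mailbox',                      # unique per database
        pool   => [qw(/mnt/a /mnt/b /mnt/c /mnt/d)],   # candidate dirs
        copies => 2,                                   # good dirs kept in sync
    );
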
When you open a file with my open method (along with close, the only
override methods I have written so far), you get a file handle.  Do what you
like with the file handle.  When you close it with my close method, I
CORE::close the file and use Rob Brown's File::DirSync to sync to all the
directories.  DirSync uses timestamps to very quickly sync changes between
directory trees.
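
A sketch of what close might do internally -- only the CORE::close and the
File::DirSync pass are settled; the primary_dir and good_dirs attributes
are placeholders:

    use File::DirSync;

    sub close {
        my ($self, $fh) = @_;
        CORE::close($fh) or return;

        # Push changes from the dir the handle wrote to out to the others.
        my $sync = File::DirSync->new({ nocache => 1 });
        $sync->src($self->{primary_dir});
        for my $dir (@{ $self->{good_dirs} }) {
            next if $dir eq $self->{primary_dir};
            $sync->dst($dir);
            $sync->dirsync();   # timestamp-based, so only changes move
        }
        return 1;
    }
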
When a dir can't be reached (its box is down, or what have you), $copies
good dirs are re-chosen and the dirsync happens from the good old data to
the new good dirs.  If too much goes down at once, you're sorta outta luck,
but you would have been without my system anyway.
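
The re-selection step might look something like this; Filesys::Df is an
assumption for reading percent full (all I've settled on is that dirs are
ordered by it):

    use Filesys::Df;

    # Pick the $copies least-full dirs still reachable and writable.
    sub choose_good_dirs {
        my ($pool, $copies) = @_;
        my @reachable   = grep { -d $_ && -w _ } @$pool;
        my @by_fullness = sort { df($a)->{per} <=> df($b)->{per} } @reachable;
        die "only " . @by_fullness . " good dirs left\n"
            if @by_fullness < $copies;
        return @by_fullness[0 .. $copies - 1];
    }
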
I would write methods for everything (within reason) you can do to a file:
open, close, unlink, rename, stat, etc.
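
A wrapped unlink, for instance, might just apply the operation to every
good copy (same placeholder attributes as above):

    sub unlink {
        my ($self, $name) = @_;
        my $removed = 0;
        for my $dir (@{ $self->{good_dirs} }) {
            $removed++ if CORE::unlink("$dir/$self->{thing}/$name");
        }
        return $removed;   # how many copies actually went away
    }
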
So who cares?  Well, using this system would make it quite easy to keep
track of an arbitrarily large amount of data.  The pool of dirs could be
mounts from any number of boxes, remote or local, and you could sync
accordingly.  If File::DirSync gets to the point where it can use ftp or
scp, all the better.
There are race conditions all over the place, and I plan on
transactionalizing where I can.  The whole system depends on how long the
dirsync takes, and in my experience dirsync is very fast.  Likely I would
have dirsync'ing daemon(s), dirsync'ing as fast as they can.  In the best
case, the most data that would ever get lost is whatever changed during a
single dirsync (usually less than a second even for very large amounts of
data), and that loss would only happen if you were making changes on a dir
as it went down.  I would try to deal with boxes coming back up and keep
everything clean as best I could.
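
One way such a daemon could look, assuming one master dir and a set of
mirrors (all the paths below are made up):

    use strict;
    use warnings;
    use File::DirSync;

    my $master  = '/mnt/a/earl_mailbox';
    my @mirrors = ('/mnt/b/earl_mailbox', '/mnt/c/earl_mailbox');

    my $sync = File::DirSync->new({ nocache => 1 });
    while (1) {
        for my $dst (@mirrors) {
            $sync->src($master);
            $sync->dst($dst);
            # Keep looping even if one mirror's box is down.
            eval { $sync->dirsync(); 1 } or warn "sync to $dst failed: $@";
        }
    }
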
So it would be a work in progress, and hopefully it would get better as I
went, but I would at least like to give it a shot.
Earl
