Hi David,
I need a little better idea of the entire workload to really answer your
question, but I can talk a little about pros and cons at least.
First, we've actually run a CD farm on top of PVFS (well, PVFS1) before.
Long ago we had a system at Clemson that ripped CDs onto PVFS and then
scheduled encodings on various cluster nodes using a job scheduler (PBS
at the time). A pretty cheesy version of what you're talking about, but
it was cool :). Also, it was relatively fast, once we got cdparanoia to
write in multi-KB blocks (a patch they accepted long ago).
The first thing to think about is reliability. As long as you have
redundant storage (RAID, for example) on each of the servers, and your
rebuild times aren't too long, you will not lose data. If a server fails
for some reason, you
will lose access to data until you get the server running again, but the
data will still be there. There are ways to maintain access in the event
of server failure, but those require SAN hardware that you might not
already have in-house.
Alternatively, you could just split the space into two volumes and
mirror one to the other with rsync or something similar; that's up to
you. It sounds like this would be a viable model for you, and disk is
cheap.
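For what it's worth, the two-volume mirror could be as simple as a
periodic rsync run. Here's a rough sketch in Python. The mount points
/mnt/vol_a and /mnt/vol_b are made up, and the flag choices (-a to
preserve metadata, --delete to propagate removals) are my assumption
about what you'd want, not something we've tested for your setup.

```python
import subprocess


def build_mirror_command(src, dst, delete=True):
    """Build an rsync command line that mirrors src into dst."""
    cmd = ["rsync", "-a"]  # -a: archive mode (recursive, keeps perms/times)
    if delete:
        # Remove files from dst that no longer exist in src.
        cmd.append("--delete")
    # Trailing slash on the source means "copy the contents of src",
    # not "copy the directory src itself into dst".
    cmd += [src.rstrip("/") + "/", dst.rstrip("/") + "/"]
    return cmd


def mirror(src, dst, dry_run=False):
    """Run the mirror; with dry_run=True rsync only reports what it would do."""
    cmd = build_mirror_command(src, dst)
    if dry_run:
        cmd.insert(1, "--dry-run")
    return subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical mount points for the two volumes.
    mirror("/mnt/vol_a", "/mnt/vol_b", dry_run=True)
```

Keep in mind that with --delete an accidental removal on the primary
volume gets propagated on the next run, so this is a mirror rather than
a backup.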
Access from Windows is going to require exporting with Samba or NFS. We
don't usually suggest that and don't test it in-house, but it
should work. Access will be relatively slow because of the lack of
client caching in PVFS.
If the majority of your I/O traffic is to and from Windows boxes, I
would say you should probably find something with better Windows support,
or that caches on clients, so as to get better performance via Samba or
NFS. If a significant part of your I/O traffic is on the Linux side,
then I think PVFS might make sense for you.
Regards,
Rob
David Case wrote:
I am looking at PVFS to replace a system of 45 machines holding about 100 TB of
data, where we currently use NFS and a bunch of symlinks to keep track of
everything. When I was given this thing to administer, I was able to
parallelize parts of it, but having everything under one filesystem would be
really really nice.
This system is basically the music equivalent of a render farm -- we get about
60-100 CDs a day, we encode them in a lossless compressed format, then have a
set of Windows machines that run the various encodings (mp3, aac, wma, etc.,
some of which are unavailable on Linux). And then the files get delivered to
a bunch of downstream partner companies (probably 70 or so). We keep a copy
of all the encodings we do (23 different encodings at this time, more soon).
So ideally we need a fast parallel system that can also serve as an archive
(because when new partner companies come in, we give them the whole catalog).
If we had a catastrophic loss of all the data, I would probably lose my job,
but occasional partial losses are recoverable (we keep a store copy of every
CD, and we have recovered from losing 10,000 albums in a fairly short amount
of time).
So do you think PVFS is suitable? I saw a post on this list that it is best
suited for use as a fast scratchpad. I really like PVFS over Lustre just
from looking at it -- a kernel module is vastly more palatable than patching
the kernel, plus the whole thing is totally free and open, unlike Lustre.
What do you think?
_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users