Hi David,

I'd need a better picture of the entire workload to really answer your question, but I can at least talk through some of the pros and cons.

First, we've actually run a CD farm on top of PVFS (well, PVFS1) before. Long ago we had a system at Clemson that ripped CDs onto PVFS and then scheduled encodings on various cluster nodes using a job scheduler (PBS at the time). A pretty cheesy version of what you're talking about, but it was cool :). It was also relatively fast once we got cdparanoia to write in multi-KB blocks (a patch they accepted long ago).
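To illustrate why the block size mattered: without client caching, every write call turns into a request to the servers, so many tiny writes are much slower than a few large ones. Here is a minimal Python sketch of the idea, not the actual cdparanoia patch. The function name, the 64 KB default, and the 2352-byte read size (one raw audio CD sector) are just assumptions for illustration.

```python
def copy_in_blocks(src, dst, block_size=64 * 1024, read_size=2352):
    """Copy src to dst, coalescing small reads into block_size writes.

    src and dst are binary file-like objects. Returns the number of
    write calls issued. For a real file, open dst with buffering=0 so
    Python's own buffering doesn't hide the effect.
    """
    buf = bytearray()
    writes = 0
    while True:
        chunk = src.read(read_size)  # small read, e.g. one CD sector
        if not chunk:
            break
        buf += chunk
        # Flush only whole blocks; keep the remainder buffered.
        while len(buf) >= block_size:
            dst.write(buf[:block_size])
            del buf[:block_size]
            writes += 1
    if buf:
        # Final partial block.
        dst.write(buf)
        writes += 1
    return writes
```

With 2352-byte reads and a 64 KB block, roughly 28 small reads are folded into each write that actually reaches the file system.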

The first thing to think about is reliability. As long as you have redundant storage on the various servers, and your rebuild times aren't too long, you will not lose data. If a server fails for some reason, you will lose access to data until you get the server running again, but the data will still be there. There are ways to maintain access in the event of server failure, but those require SAN hardware that you might not already have in-house.

Alternatively, you could just split the space into two volumes and use rsync or something similar to mirror one onto the other; that's up to you. It sounds like this would be a viable model for you, and disk is cheap.
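If you go the mirroring route, a rough sketch of the rsync step might look like the following. The mount points are made up, and I'm assuming a one-way mirror where deletions on the primary should propagate to the copy (that is what --delete does, so try it with --dry-run first).

```python
import subprocess


def mirror_command(src, dst, dry_run=False):
    """Build an rsync command that makes dst an exact copy of src."""
    # -a preserves permissions, times, and symlinks; --delete removes
    # files from dst that no longer exist in src.
    cmd = ["rsync", "-a", "--delete"]
    if dry_run:
        cmd.append("--dry-run")
    # A trailing slash on the source means "the contents of src",
    # rather than creating a src directory inside dst.
    cmd += [src.rstrip("/") + "/", dst.rstrip("/") + "/"]
    return cmd


def mirror(src, dst, dry_run=False):
    # check=True raises CalledProcessError if rsync exits non-zero.
    subprocess.run(mirror_command(src, dst, dry_run), check=True)


# Hypothetical usage, e.g. from a nightly cron job:
#   mirror("/mnt/pvfs-primary", "/mnt/pvfs-mirror", dry_run=True)
```

Keep in mind a mirror like this is only as fresh as its last run, so anything written since then would have to be re-encoded from your stored CD copies.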

Access from Windows is going to require exporting the file system with Samba or NFS. We don't usually recommend that and don't test it in-house, but it should work. Access will be relatively slow because PVFS does no client-side caching.

If the majority of your I/O traffic is to and from Windows boxes, I would say you should probably find something with better Windows support, or something that caches on clients, so that you get better performance via Samba or NFS. If a significant part of your I/O traffic is on the Linux side, then I think PVFS might make sense for you.

Regards,

Rob

David Case wrote:
I am looking at PVFS to replace a system of 45 machines holding about 100 TB of data, where we currently use NFS and a bunch of symlinks to keep track of everything. When I was given this thing to administer, I was able to parallelize parts of it, but having everything under one filesystem would be really, really nice.

This system is basically the music equivalent of a render farm -- we get about 60-100 CDs a day, we encode them in a lossless compressed format, then have a set of Windows machines that run the various encodings (mp3, aac, wma, etc., some of which are unavailable on Linux). The files then get delivered to a bunch of downstream partner companies (probably 70 or so). We keep a copy of all the encodings we do (23 different encodings at this time, more soon). So ideally we need a fast parallel system that can also serve as an archive (because when new partner companies come in, we give them the whole catalog).

If we had a catastrophic loss of all the data, I would probably lose my job, but occasional partial losses are recoverable (we keep a store copy of every CD, and we have recovered from losing 10,000 albums in a fairly short amount of time).

So do you think PVFS is suitable? I saw a post on this list that it is best suited for use as a fast scratchpad. I really like PVFS over Lustre just from looking at it -- a kernel module is vastly more palatable than patching the kernel, plus the whole thing is totally free and open, unlike Lustre.

What do you think?

_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
