On 5/13/2010 5:05 PM, Craig Carl wrote:
Jeff -
Two comments/ideas.
1. If you are limited to four pieces of hardware, the minimum for
stripe, and you want to stripe some of the data and just distribute
the other files, there is a way to do that. Ideally you would use your
hardware RAID controllers to create two LUNs on each host, one for
distribute, the other for stripe. If you don't have hardware RAID you
could use LVM2 or ZFS to achieve the same thing (or you could just use
separate folders).
1a. Once you have two file systems created use
glusterfs-volgen to create the vol files for the distribute export
just like you normally would.
1b. Move the files you just created to the storage servers and
clients.
1c. Re-run glusterfs-volgen, this time for the stripe, adding
the -p option to specify a port (something above 1024, but not 6996).
1d. Move the files you just created to the storage servers and
clients.
1e. Start Gluster twice on all the servers, once with each of the
two vol files.
1f. You now have two GlusterFS exports, one distribute, the
other stripe.
1g. You can mount one inside the other on the client if that
makes management easier.
There are advantages to this model: having two separate Gluster
instances significantly improves parallelism on the storage servers,
and you can manage the two instances as if they were on different iron.
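
Steps 1a through 1e can be sketched as a dry run that only builds the
two glusterfs-volgen command lines and executes nothing. The host
names, export paths, volume names and port 7001 are hypothetical. The
--name and --raid options are assumed from the 3.0-era
glusterfs-volgen and should be checked against its --help output; only
-p comes from the steps above.

```python
# Dry-run sketch: builds the command lines for steps 1a-1e, runs nothing.
# Hosts, export paths, volume names and the port are hypothetical.
# --name and --raid are assumptions; -p is the option named in step 1c.

HOSTS = ["stor1", "stor2", "stor3", "stor4"]  # four servers, the stripe minimum
STRIPE_PORT = 7001                            # above 1024 and not 6996


def volgen_command(name, export_dir, raid=None, port=None):
    """Return a glusterfs-volgen argument list for one export."""
    cmd = ["glusterfs-volgen", "--name", name]
    if raid is not None:
        cmd += ["--raid", str(raid)]          # assumed: 0 selects stripe
    if port is not None:
        cmd += ["-p", str(port)]
    cmd += [f"{host}:{export_dir}" for host in HOSTS]
    return cmd


def plan():
    """Return the two volgen invocations: distribute first, then stripe."""
    return [
        volgen_command("dist", "/export/dist"),
        volgen_command("stripe", "/export/stripe", raid=0, port=STRIPE_PORT),
    ]


if __name__ == "__main__":
    for cmd in plan():
        print(" ".join(cmd))
```

Each printed line is what you would run by hand before copying the
generated vol files to the servers and clients (steps 1b and 1d).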
2. The use case for stripe is vanishingly small. If you have very
large files (at least 2X the amount of memory in your storage servers
and a minimum of 50GB) with very limited writes and simultaneous
access from hundreds of clients, then stripe might be appropriate.
Stripe was designed for a specific type of HPC problem solving, not
general file serving. Our video streaming users don't use stripe, even
though that is an obvious use; there are better ways to configure
Gluster for that. If you could share the type of content, the access
methods and the IOPS, we could make some specific suggestions.
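
The rule of thumb in point 2 can be written down as a small check.
This is only a sketch of the criteria as stated above, not an official
sizing tool; the function name is made up, and the cutoff of 200
clients is a hypothetical stand-in for "hundreds".

```python
def stripe_might_fit(file_gb, server_ram_gb, clients, write_heavy,
                     min_clients=200):
    """Rough check of the stripe criteria from point 2.

    min_clients is a hypothetical stand-in for "hundreds of clients".
    """
    # Files must be at least 50 GB and at least twice the server memory.
    big_enough = file_gb >= 50 and file_gb >= 2 * server_ram_gb
    return big_enough and not write_heavy and clients >= min_clients


if __name__ == "__main__":
    # Hypothetical cases: (file size GB, server RAM GB, clients, write heavy)
    for case in [(100, 16, 300, False), (10, 16, 300, False),
                 (60, 48, 300, False), (100, 16, 300, True)]:
        print(case, "->", stripe_might_fit(*case))
```

A multi-GB file that is still under 50GB fails the first test, so by
this rule it belongs on distribute rather than stripe.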
We *are* a quasi-HPC environment. We have 100+ batch compute servers
with 500+ cores, all with GbT interfaces, pounding on an old NAS storage
server. We are trying to replace the old shared staging area with new
hardware. We've been looking at an Isilon solution, which performs well
for the task but costs 4x to 5x what a Gluster solution would price out
at for similar-sized hardware/space.
Some of our users have millions of small files, some have thousands of
large files, some have one or two humongous files. If all the data were
just one size or another all would be well. All files are currently
stored in the same shared staging area. Our users are not HPC
programmers and tend to program in HLLs such as MATLAB, so we try to be
as accommodating as possible, rather than force them to manage the data
distribution.
We'd love a solution that would (a) spread small files over multiple
volumes as well as (b) spread large files over multiple volumes.
Cluster/distribute would work for the former and cluster/stripe for the
latter. A marriage of the two would be great.
Right now I'm trying to patch together a temporary testbed using a bunch
of old machines with two 143GB drives each. The problem is that many
files are multi-GB and unless they are striped they could easily fill up
a volume with poor hash distributions. Likewise many small files could
swamp the low-end disk in a stripe volume.
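
To illustrate the first worry, here is a rough simulation of whole
files being placed on bricks by a hash of their names. It is a sketch
only: the MD5-modulo placement is a stand-in, not Gluster's actual
hash, and the brick count and the workload are hypothetical. Only the
143GB drive size comes from the testbed above.

```python
import hashlib

BRICK_GB = 143   # drive size from the testbed described above
NUM_BRICKS = 8   # hypothetical brick count


def brick_for(name, num_bricks=NUM_BRICKS):
    """Pick a brick from a hash of the file name (stand-in hash)."""
    digest = hashlib.md5(name.encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_bricks


def fill_levels(files, num_bricks=NUM_BRICKS):
    """Sum file sizes (GB) per brick; each whole file lands on one brick."""
    used = [0.0] * num_bricks
    for name, size_gb in files:
        used[brick_for(name, num_bricks)] += size_gb
    return used


if __name__ == "__main__":
    # Hypothetical workload: 60 files of 12 GB each (720 GB in total)
    # against 8 x 143 GB = 1144 GB of raw space.
    files = [(f"run{i:03d}.dat", 12.0) for i in range(60)]
    used = fill_levels(files)
    for brick, gb in enumerate(used):
        flag = "  <-- over capacity" if gb > BRICK_GB else ""
        print(f"brick {brick}: {gb:6.1f} GB of {BRICK_GB}{flag}")
    print(f"mean {sum(used) / len(used):.1f} GB, max {max(used):.1f} GB")
```

The gap between the mean and the maximum is the thing to watch: with
large unstriped files, one brick can fill while the pool as a whole
still has free space.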
I suppose we could create two pools and tell the predominantly small
file users to use one and the predominantly large file users to use the
other, but somehow I would not hold my breath on it working out.
Jeff
_______________________________________________
Gluster-users mailing list
[email protected]
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users