I'm a Gluster newbie trying to get myself up to speed. I've been through the bulk of the website docs, and I'm in the midst of some small-scale (though growing) test setups. But I wanted to poll the list's collective wisdom on how best to fit Gluster into my setup.

As background, I currently have over 550 nodes with over 3000 cores in my (SGE scheduled) cluster, and we expand on a roughly biannual basis. The cluster is all gigabit ethernet -- each rack has a switch, and these switches each have 4-port trunks to our central switch. Despite the number of nodes in each rack, these trunks are not currently oversubscribed. The cluster is shared among many research groups and the vast majority of the jobs are embarrassingly parallel. Our current storage is an active-active pair of NetApp FAS3070s with a total of 8 shelves of disks. Unsurprisingly, it's fairly easy for any one user to flatten either head (or both) of the NetApp.

I'm looking at Gluster for 2 purposes:

1) To host our "database" volume.  This volume has copies of several
   protein and gene databases (PDB, UniProt, etc).  The databases
   generally consist of tens of thousands of small (a few hundred KB at
   most) files.  Users often start array jobs with hundreds or thousands
   of tasks, each task of which accesses many of these files.

2) To host a cluster-wide scratch space.  Users waste a lot of time (and
   bandwidth) copying (often temporary) results back and forth between the
   network storage and the nodes' scratch disks.  And scaling the NetApp
   is difficult, not least because it is hard to convince PIs to spring
   for storage rather than more cores.

For purpose 1, clearly I'm looking at a replicated volume. For purpose 2, I'm assuming that distributed is the way to go (rather than striped), although for reliability reasons I'd likely go replicated and then distributed on top (i.e., a distributed-replicated volume). For storage bricks, I'm looking at something like HP's DL180 G6 with 25 internal SAS disks (or, alternatively, the same number of disks in a SAS-attached external chassis).
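
To make that concrete, here's roughly what I'm picturing, sketched as a small Python script that just prints the gluster CLI calls I think I'd run.  The host names, brick paths, and server counts below are placeholders I made up for illustration, not anything final:

#!/usr/bin/env python3
# Rough sketch only: prints the gluster CLI calls for the two volumes I
# have in mind.  Host names (storage01, storage02, ...) and the brick
# path (/export/raid10) are made-up placeholders.

import shlex

def gluster(*args):
    """Print a gluster CLI command (swap print for subprocess.run to execute)."""
    print(" ".join(shlex.quote(a) for a in ["gluster", *args]))

BRICK_DIR = "/export/raid10"   # assumed mount point of each server's RAID10 set

# Purpose 1: replicated volume for the protein/gene databases.
db_servers = ["storage01", "storage02"]
gluster("volume", "create", "dbvol", "replica", "2", "transport", "tcp",
        *[f"{s}:{BRICK_DIR}/dbvol" for s in db_servers])

# Purpose 2: distributed-replicated scratch volume.  Bricks are grouped
# into replica pairs in the order listed, and files are then distributed
# across those pairs.
scratch_servers = ["storage01", "storage02", "storage03", "storage04"]
gluster("volume", "create", "scratch", "replica", "2", "transport", "tcp",
        *[f"{s}:{BRICK_DIR}/scratch" for s in scratch_servers])

If I'm reading the docs right, that should give one replica pair per two scratch servers, with files distributed across the pairs -- please correct me if I have the brick ordering wrong.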

In addition to any general advice folks could give, I have these specific questions:

1) My initial leaning would be to RAID10 the disks at the server level,
   and then use the RAID volumes as gluster exports.  But I could also see
   running the disks in JBOD mode and doing all the redundancy at the
   Gluster level.  (The sketch after these questions shows what I think
   the brick list would look like in that case.)  The latter would seem
   to make management (and, e.g., hot swap) more difficult, but is it
   preferred from a Gluster perspective?  How difficult would it make
   disk and/or brick maintenance?

2) Is it frowned upon to create 2 volumes out of the same physical set of
   disks?  I'd like to maximize the spindle count in both volumes
   (especially the scratch volume), but will it overly degrade
   performance?  Would it be better to simply create one
   distributed-replicated volume and use it for both of the above
   purposes?

3) Is it crazy to think of doing a distributed (or NUFA) volume with the
   scratch disks in the whole cluster?  Especially given that we have
   nodes of many ages and see not infrequent node crashes due to bad
   memory/HDDs/user code?
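
To put question 1 in concrete terms, here's the sort of brick list I think the JBOD approach would imply for a single mirrored pair of servers (again, the host names and mount points are made up):

#!/usr/bin/env python3
# Rough sketch for question 1: with the 25 disks per server exported as
# individual JBOD bricks, the replica pairing has to be spelled out disk
# by disk.  Host names and mount points are made-up placeholders.

SERVERS = ["storage01", "storage02"]   # a mirrored pair of brick servers
N_DISKS = 25                           # one brick per internal SAS disk

bricks = []
for disk in range(1, N_DISKS + 1):
    for server in SERVERS:             # adjacent bricks form a replica pair
        bricks.append(f"{server}:/export/disk{disk:02d}/scratch")

# 25 replica pairs -> 50 bricks just for these two servers, versus a
# single pair of bricks if each server's disks are first aggregated into
# a RAID10 set.
print("gluster volume create scratch replica 2 transport tcp \\")
print("  " + " \\\n  ".join(bricks))

That bookkeeping (50 bricks per server pair, plus a replace-brick and self-heal on every dead disk) is what pushes me toward RAID10 underneath, but I'd love to hear if that intuition is wrong.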

If you've made it this far, thanks very much for reading. Any and all advice (and/or pointers at more documentation) would be much appreciated.

--
Joshua Baker-LePain
QB3 Shared Cluster Sysadmin
UCSF
