If you can, I'd advocate the route you suggest - multiple RAID groups, each group mapping to a unique LUN, and each LUN serving as an OST. Note that you'll likely want the number of data disks in each RAID group to be a power of 2 (e.g., a 6- or 10-disk RAID6, or a 5- or 9-disk RAID5). Obviously, you'll be spending more spindles on overhead (RAID parity), but performance is more predictable.
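To make the overhead trade-off concrete, here's a trivial sketch of the arithmetic (the disk counts are just the examples above, not a sizing recommendation):

```python
def raid_layout(total_disks, parity_disks):
    """Return (data disks, fraction of spindles spent on parity) for one RAID group."""
    data_disks = total_disks - parity_disks
    return data_disks, parity_disks / total_disks

# 10-disk RAID6: 8 data disks (a power of 2), 20% of spindles on parity.
print(raid_layout(10, 2))   # (8, 0.2)
# 6-disk RAID6: 4 data disks (also a power of 2), but a third of the spindles go to parity.
print(raid_layout(6, 2))
# 9-disk RAID5: 8 data disks, one parity spindle in nine.
print(raid_layout(9, 1))
```

Narrower groups keep the data-disk count a power of 2 at the cost of a higher parity fraction; that's the "wasting more spindles" part of the trade.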
The other way (single RAID group, multiple LUNs, LUN == OST) means the performance of the OSTs isn't independent - they all bottleneck on the same RAID array. If you have enough RAID controller bandwidth this can (in theory) work, but it makes hunting down and fixing performance problems more complex. In Lustre, if a file isn't writing fast enough, you can just stripe over more OSTs. However, if your OSTs aren't really independent, that may or may not help - you'll get different bandwidth depending on how many OSTs are sharing the same pool of physical disks. I'd expect two OSTs that don't share drives to write faster than two that do, and so on.

BTW, if you have multiple controllers, and the LUN platform has a sense of controller affinity (i.e., a LUN uses one controller as "primary" and another as "secondary" or "backup"), try to balance your RAID groups across the two controllers in your array. For instance, stick even-numbered LUNs on one, odd-numbered LUNs on the other. Also, if you're doing multipathing into your OSSes, make sure your multipath drivers are aware of this arrangement and respect it.

Most midrange disk trays will do multipath and cache mirroring between controllers - but if you read the fine print, you often find that access through the secondary controller is MUCH slower. It's usually implemented as a write-through to the primary, or has its cache disabled while the primary is active, etc. Cache mirroring at high speed is hard, complicated, and expensive, so vendors often implement only what's minimally necessary to do failover - even if that means the secondary controller doesn't cache a LUN unless the primary dies. If you have one of these (I've no idea if Dell's MD3200 does this, but the behavior is common enough that I'd think about it), you'll want to split LUNs evenly between controllers to maximize cache use.
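The even/odd split described above is simple enough to sketch (LUN numbering and controller labels here are placeholders, not anything your array mandates):

```python
def assign_controller(lun_id):
    """Place even-numbered LUNs on controller "A", odd-numbered LUNs on "B"."""
    return "A" if lun_id % 2 == 0 else "B"

# With eight LUNs, each controller ends up primary for four,
# so both controllers' caches see steady use instead of one sitting idle.
assignment = {lun: assign_controller(lun) for lun in range(8)}
print(assignment)
```

The point isn't the modulo trick; it's that whatever scheme you pick, each controller should be primary for roughly half the LUNs, and your multipath configuration should agree with that assignment.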
You'll also want to make sure the OSS knows which path is primary for which LUN, so it doesn't send traffic down the wrong path (or worse, down both - round-robin balancing is a bad idea when the paths are asymmetric) unless there's been a hardware failure.

BTW, if you implement a single RAID group and export multiple LUNs, any multi-controller effects can get way more complicated - and are highly implementation-dependent.

TL;DR - multiple RAID groups, with RAID group == LUN == OST. Keep OSTs as independent as you can, and watch your controller and OSS multipath settings (if used).

--
Mike Shuey

On Sat, Mar 9, 2013 at 10:19 AM, Jerome, Ron <ron.jer...@ssc-spc.gc.ca> wrote:
> I am currently having a debate about the best way to carve up Dell MD3200's
> to be used as OST's in a Lustre file system and I invite this community to
> weigh in...
>
> I am of the opinion that it should be setup as multiple raid groups each
> having a single LUN, with each raid group representing an OST, while my
> colleague feels that it should be setup as a single raid group across the
> whole array with multiple LUNs, with each LUN representing an OST.
>
> Does anyone in this group have an opinion (one way or another)?
>
> Regards,
>
> Ron Jerome
> _______________________________________________
> HPDD-discuss mailing list
> hpdd-disc...@lists.01.org
> https://lists.01.org/mailman/listinfo/hpdd-discuss
_______________________________________________
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss