I'm looking at using Lustre to implement centralized storage for several virtualized machines. The key considerations are reliability and ease of increasing or replacing capacity.
However, I'm still quite confused, and I haven't read the manual fully because I keep tripping over this: what exactly happens if a piece of hardware fails? Perhaps it's because I haven't yet tried to set up Lustre, so the terms used don't quite translate for me yet. I'd appreciate some newbie hand-holding here :)

For example, suppose I have a simple 5-machine cluster: one MDS/MDT and one failover MDS/MDT, plus three OSS/OST machines with 4 drives each, configured as 2 MD RAID 1 block devices per machine, for a total of 6 OSTs (if I understand the term correctly). What happens if one of the OSS/OST machines dies, say from a motherboard failure?

Because the manual mentions data striping across multiple OSTs, it sounds like either networked RAID 0 or RAID 5. With network RAID 0, a single machine failure means the whole cluster is dead, and it doesn't seem to make sense for Lustre to fail in this manner. If Lustre instead implemented network RAID 5, the cluster would continue to serve all data despite the dead machine.

Yet the manual warns that Lustre does not have redundancy and relies entirely on some kind of hardware RAID being used, which seems to imply that network RAID 0 is what's implemented. The manual's example of a simple combined MGS/MDT with two OSS/OSTs appears to confirm this: it uses the same fsname "temp" for the OSTs and combines the two 16MB OSTs into a single 30MB block device mounted as /lustre on the client.

Does this then mean that if I want redundancy on the storage, I would basically need a failover machine for every OSS/OST?

I'm also confused because the manual says an OST is a block device such as /dev/sda1, yet an OSS can be configured to provide failover services. But if the OSS machine that houses the OST dies, how would another OSS take over, since it would not be able to access that machine's data? Or is this functionality only available if the OSTs in the cluster are standalone SAN devices?
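To make the RAID 0 versus RAID 5 question concrete, here is a minimal Python sketch of round-robin striping, assuming files are striped across OSTs the way RAID 0 stripes across disks, with no parity. The OST-to-OSS numbering (OSS n hosts OSTs 2n and 2n+1), the 1 MiB stripe size and the helper names are all hypothetical, chosen only to match the 6-OST cluster described above. The sketch computes how much of one file stays reachable when a single OSS dies and no failover is configured.

```python
# Hypothetical model of RAID 0-style striping across the 6 OSTs described above.
# Assumption: OSTs are numbered so that OSS n hosts OSTs 2n and 2n+1.
OSTS_PER_OSS = 2
MiB = 1024 * 1024


def oss_for_ost(ost_index: int) -> int:
    """Return the (hypothetical) OSS that hosts a given OST."""
    return ost_index // OSTS_PER_OSS


def ost_for_offset(offset: int, stripe_size: int, stripe_osts: list[int]) -> int:
    """Round-robin mapping: stripe k of the file lives on stripe_osts[k % count]."""
    stripe_number = offset // stripe_size
    return stripe_osts[stripe_number % len(stripe_osts)]


def readable_fraction(file_size: int, stripe_size: int,
                      stripe_osts: list[int], dead_oss: int) -> float:
    """Fraction of the file's stripes still reachable with one OSS down.

    There is no parity in this model, so a stripe on a dead OSS is simply
    unavailable rather than rebuilt from the surviving stripes.
    """
    chunks = (file_size + stripe_size - 1) // stripe_size  # ceiling division
    alive = sum(
        1
        for k in range(chunks)
        if oss_for_ost(ost_for_offset(k * stripe_size, stripe_size, stripe_osts))
        != dead_oss
    )
    return alive / chunks


if __name__ == "__main__":
    # A 6 MiB file striped over all six OSTs, with the second OSS (index 1) dead.
    print(readable_fraction(6 * MiB, 1 * MiB, list(range(6)), dead_oss=1))
    # A file whose only stripe is on OST 0 (first OSS) is unaffected.
    print(readable_fraction(6 * MiB, 1 * MiB, [0], dead_oss=1))
```

Under this model, a file striped across all 6 OSTs loses a third of its stripes when one OSS dies, while a file that happens to live entirely on a surviving OSS is untouched. In neither case is the missing data reconstructed, which is exactly the RAID 0 behaviour the question is worried about.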
_______________________________________________
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss