I would recommend the heartbeat with Pacemaker setup for fail-over control. The configuration may seem complex at first, but after enough reading (and there are many good sources) it is quite easy to set up. I recently set up a Lustre system with 3 OSSs and 2 MDSs (DRBD with LVM between them) working as a single HA cluster, and it was easy enough. Pacemaker gives you a single point of administration for the Lustre system (starting and stopping the filesystem), and there is a neat GUI for those who want to show something to their managers :)
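The fencing rule Bernd describes in the quoted message (move resources only after the failed node is verifiably reset, with MMP as a second line of defence) can be illustrated with a toy model. This is a minimal sketch, not Pacemaker or Lustre code: `Target`, `fail_over`, and the node names `oss1`/`oss2` are hypothetical, fencing is modelled as a callable returning an exit code (0 meaning the node was verifiably reset), and MMP is reduced to "refuse a mount while another node holds the device", which is far simpler than the real on-disk protocol.

```python
"""Toy model of fencing-gated fail-over for a Lustre target.

All names are hypothetical; this is not Pacemaker or Lustre code.
"""
from dataclasses import dataclass, field
from typing import Callable, Set


@dataclass
class Target:
    """A shared block device (e.g. an OST) visible to both servers."""
    name: str
    mmp_enabled: bool = True
    mounted_on: Set[str] = field(default_factory=set)

    def mount(self, node: str) -> None:
        # Simplified MMP: refuse a mount while another node holds the device.
        if self.mmp_enabled and self.mounted_on - {node}:
            raise RuntimeError(f"{self.name}: MMP refused mount on {node}")
        self.mounted_on.add(node)

    @property
    def corrupted(self) -> bool:
        # Two simultaneous mounts of one device mean filesystem corruption.
        return len(self.mounted_on) > 1


def fail_over(target: Target, failed: str, survivor: str,
              fence: Callable[[str], int]) -> bool:
    """Move `target` to `survivor` only if fencing `failed` succeeded.

    `fence` mimics a STONITH agent: 0 means the node was verifiably
    reset; any other value means its state is unknown.
    """
    if fence(failed) != 0:
        # Unknown peer state: do not start the resource anywhere else,
        # to avoid running it on both nodes (split-brain).
        return False
    target.mounted_on.discard(failed)  # the reset node no longer holds it
    target.mount(survivor)
    return True


if __name__ == "__main__":
    # 1. Fencing succeeds: ost0 moves cleanly from oss1 to oss2.
    ost0 = Target("ost0")
    ost0.mount("oss1")
    print(fail_over(ost0, "oss1", "oss2", fence=lambda node: 0), ost0.mounted_on)

    # 2. Fencing fails: nothing moves, so no split-brain.
    ost1 = Target("ost1")
    ost1.mount("oss1")
    print(fail_over(ost1, "oss1", "oss2", fence=lambda node: 1), ost1.mounted_on)

    # 3. No fencing and MMP accidentally disabled: oss1 is hung but still
    #    has the device mounted, and oss2 mounts it as well.
    ost2 = Target("ost2", mmp_enabled=False)
    ost2.mount("oss1")
    ost2.mount("oss2")
    print("corrupted:", ost2.corrupted)

    # 4. Same situation with MMP enabled: the second mount is refused.
    ost3 = Target("ost3")
    ost3.mount("oss1")
    try:
        ost3.mount("oss2")
    except RuntimeError as err:
        print(err)
```

Scenario 3 is the double mount Bernd warns about; scenario 4 shows why MMP makes fencing "not strictly required, in principle", as long as MMP really is enabled.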
Best regards,

Wojciech

On 10 August 2010 20:47, Bernd Schubert <[email protected]> wrote:
> On Tuesday, August 10, 2010, David Noriega wrote:
> > So your script resets the server so there is no fail-over (i.e. the other
> > server takes over resources from that server?) or there is failover
> > but you then manually return resources back to the server that was
> > reset?
>
> Our DDN IPMI stonith script (external/ipmi_ddn in heartbeat/pacemaker stonith
> terms) only makes absolutely sure the node was really reset. If something
> fails, an error code is reported to pacemaker and then pacemaker (*) will not
> initiate resource fail-over in order to prevent split-brain.
> As Lustre devices use MMP (multiple-mount protection), that is not strictly
> required, in principle. But if something goes wrong, e.g. MMP was accidentally
> not enabled, a double mount could come up and that would cause serious
> filesystem and data corruption...
>
> Cheers,
> Bernd
>
> PS: (*) heartbeat-v1 (and v2/v3 if not in xml/crm mode) also *should* accept
> stonith error codes, but in general, I have seen more than once that
> heartbeat-v1 ran into split-brain and started resources on both cluster nodes.
> That is something where pacemaker does a much better job.
>
> --
> Bernd Schubert
> DataDirect Networks
> _______________________________________________
> Lustre-discuss mailing list
> [email protected]
> http://lists.lustre.org/mailman/listinfo/lustre-discuss

--
Wojciech Turek

Senior System Architect
High Performance Computing Service
University of Cambridge
Email: [email protected]
Tel: (+)44 1223 763517
