Were you able to get monitoring working to detect network failures? (pingd?)
I have it configured, but haven't been able to get it to trigger a failover when an MDS cannot ping the network. (I tried with 1.0 and 2.0 conf files; I am currently using 2.0.) I have a ticket open with the pacemaker project (no ticket system for the HA stuff...) but no resolution. I am considering writing a script to down the node when the ping fails, but don't like the idea.

I would also like to get the hpingd functioning to detect a fiber failure, but there was less available on that solution.

--
Andrew

> -----Original Message-----
> From: Jim Garlick [mailto:garl...@llnl.gov]
> Sent: Monday, July 13, 2009 2:21 PM
> To: Lundgren, Andrew
> Cc: Carlos Santana; lustre-discuss@lists.lustre.org
> Subject: Re: [Lustre-discuss] failover software - heartbeat
>
> We recently put heartbeat v1 in production and along the way
> developed some admin scripts including heartbeat resource agent
> compliant lustre init scripts, a script to initiate failover/failback
> and get detailed status, a powerman stonith interface, and various
> safeguards to ensure MMP is on, devices are present and usable, etc.
> before starting lustre.
>
> If this is of general interest I could post it to a bug for review.
>
> Jim
>
> On Mon, Jul 13, 2009 at 01:45:02PM -0600, Lundgren, Andrew wrote:
> > It is very difficult to find relevant documentation for heartbeat
> > 1/2. I just finished configuring a heartbeat system and would not
> > recommend it because of the documentation. (They seem to have removed
> > portions of the heartbeat documentation from the site.)
> >
> > Pacemaker is not a simple solution to configure either. I played
> > briefly with the RH clustering software. It does not directly support
> > any FS type other than the basic ext2/ext3, and wasn't happy with a
> > lustre type.
> >
> > --
> > Andrew
> >
> > > -----Original Message-----
> > > From: lustre-discuss-boun...@lists.lustre.org
> > > [mailto:lustre-discuss-boun...@lists.lustre.org] On Behalf Of Carlos Santana
> > > Sent: Monday, July 13, 2009 11:42 AM
> > > To: lustre-discuss@lists.lustre.org
> > > Subject: [Lustre-discuss] failover software - heartbeat
> > >
> > > Howdy,
> > >
> > > The lustre manual recommends heartbeat for handling failover.
> > > Pacemaker is the successor of heartbeat version 2. So what's
> > > recommended: should we be using pacemaker or stick to heartbeat?
> > >
> > > -
> > > CS.
_______________________________________________
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss