Hi,

I appreciate that, but it doesn't answer the question.

What I'm getting at is that a system can fail in multiple ways, and in my test scenario I was forcing high load. In normal operation my application would never cause this kind of load unless there was a very serious issue that would warrant failover. So in this scenario I want pacemaker to handle it accordingly, without my having to configure additional services entirely separate from pacemaker.

For example, it's easy to assume the monitor operations on the RAs already handle this. The slave should be initiating a monitor operation against the master to see if its services are still responding. But it seems only the master does this, and of course the master is foobared, so it never responds and failover never occurs.

Surely I'm not the only one who sees this as rather flawed?

Regards,
James

-----Original Message-----
From: linux-ha-boun...@lists.linux-ha.org [mailto:linux-ha-boun...@lists.linux-ha.org] On Behalf Of Florian Haas
Sent: 07 July 2011 11:59
To: General Linux-HA mailing list
Subject: Re: [Linux-HA] Forkbomb not initiating failover

On 2011-07-07 11:59, James Smith wrote:
> Hi,
>
> Summary: Two node cluster running DRBD, IET with a floating IP and stonith
> enabled.
>
> All this works well, I can kernel panic the machine, kill individual
> PIDs (for example IET) which then invoke failover. However, when I forkbomb
> the master, nothing happens.
> The box is dead, the services stop responding etc, but pacemaker does
> not recognise this and therefore failover does not occur.
>
> Very occasionally it will fence and invoke failover after several
> minutes or even longer, which is no good at all.
>
> To me, it seems extremely odd pacemaker itself does not automatically
> incorporate system health checks that can detect such a scenario.
> I've raised this a couple of times, but the suggestion is to run
> watchdog or create an RA to do resource checking. Watchdog certainly does
> its job and is easy to configure, but this seems flawed to me.

Please refer to:
http://www.gossamer-threads.com/lists/linuxha/pacemaker/70081#70081

Cheers,
Florian

_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems