On Thu, 8 Sep 2016 09:51:27 +0000 Shermal Fernando <sherma...@millenniumit.com> wrote:
> Hi Jehan-Guillaume,
>
> Sorry for disturbing you. It is really important for us to pass this test
> of Pacemaker's resiliency and robustness. To my understanding, it's
> pacemakerd that feeds the watchdog. If only the crmd is hung, fencing will
> not work. Am I correct here?

I guess yes. I am talking about a scenario where the server is under high
load (fork bomb, swap storm, ...), not only crmd being hung for some reason.

> -----Original Message-----
> From: Jehan-Guillaume de Rorthais [mailto:j...@dalibo.com]
> Sent: Thursday, September 08, 2016 3:12 PM
> To: Shermal Fernando
> Cc: Cluster Labs - All topics related to open-source clustering welcomed
> Subject: Re: [ClusterLabs] Antw: Re: When the DC crmd is frozen, cluster
> decisions are delayed infinitely
>
> On Thu, 8 Sep 2016 08:58:15 +0000
> Shermal Fernando <sherma...@millenniumit.com> wrote:
>
> > Hi Jehan-Guillaume,
> >
> > Does this mean the watchdog will self-terminate the machine when the crm
> > daemon is frozen?
>
> This means that if the machine is under such a load that Pacemaker is not
> able to feed the watchdog, the watchdog will fence the machine itself.
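
To illustrate the watchdog behaviour described above, here is a minimal
sketch of a watchdog timer as a toy simulation on a logical clock. It is not
Pacemaker, sbd, or kernel code; the names SoftwareWatchdog and run are
hypothetical, and "firing" here only returns a timestamp, whereas a real
hardware watchdog would reset the machine.

```python
class SoftwareWatchdog:
    """Toy model of a watchdog timer driven by a logical clock (seconds)."""

    def __init__(self, timeout):
        self.timeout = timeout   # maximum allowed gap between two feeds
        self.last_fed = 0        # logical time of the last successful feed
        self.fired = False       # True once the timer has expired

    def check(self, now):
        # The watchdog fires when the feeder has been silent for too long.
        if now - self.last_fed > self.timeout:
            self.fired = True
        return self.fired

    def feed(self, now):
        # A feed only counts if the timer has not already expired.
        if not self.check(now):
            self.last_fed = now


def run(feed_times, timeout, horizon):
    """Return the time at which the watchdog fires, or None if it never does."""
    wd = SoftwareWatchdog(timeout)
    feeds = set(feed_times)
    for now in range(horizon + 1):
        if now in feeds:
            wd.feed(now)
        if wd.check(now):
            return now
    return None


# Healthy node: the feeder runs every 5 seconds, so the timer never expires.
healthy = run(range(0, 31, 5), timeout=10, horizon=30)

# Starved node: the feeder stops after t=10 (for example under a fork bomb),
# so the timer expires once more than 10 seconds pass without a feed.
starved = run([0, 5, 10], timeout=10, horizon=30)
```

The point of the sketch is that the watchdog does not care why the feeder
stopped; any condition that keeps the feeding process from running in time
leads to the same self-fencing outcome.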
>
> > -----Original Message-----
> > From: Jehan-Guillaume de Rorthais [mailto:j...@dalibo.com]
> > Sent: Thursday, September 08, 2016 12:52 PM
> > To: Digimer
> > Cc: Cluster Labs - All topics related to open-source clustering
> > welcomed
> > Subject: Re: [ClusterLabs] Antw: Re: When the DC crmd is frozen,
> > cluster decisions are delayed infinitely
> >
> > On Thu, 8 Sep 2016 15:55:50 +0900
> > Digimer <li...@alteeve.ca> wrote:
> >
> > > On 08/09/16 03:47 PM, Ulrich Windl wrote:
> > > >>>> Shermal Fernando <sherma...@millenniumit.com> wrote on
> > > >>>> 08.09.2016 at 06:41 in
> > > > message
> > > > <8ce6e8d87f896546b9c65ed80d30a4336578c...@lg-spmb-mbx02.lseg.stockex.local>:
> > > >> The whole cluster will fail if the DC (crm daemon) is frozen due
> > > >> to CPU starvation or hanging while trying to perform an I/O operation.
> > > >> Please share some thoughts on this issue.
> > > >
> > > > What is "the whole cluster will fail"? If the DC times out, some
> > > > recovery will take place.
> > >
> > > Yup. The starved node should be declared lost by corosync, the
> > > remaining nodes reform and, if they're still quorate, the hung node
> > > should be fenced. Recovery occurs and life goes on.
> >
> > +1
> >
> > And fencing might either come from outside, or just from the server
> > itself using watchdog.

_______________________________________________
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org