On Thu, 2018-09-27 at 13:45 +0530, Prasad Nagaraj wrote:
> Hello - I was trying to understand the behavior of the cluster when
> pacemaker crashes on one of the nodes. So I hard killed pacemakerd
> and its related processes.
>
> -------------------------------------------------------------------
> [root@SG-mysqlold-907 azureuser]# ps -ef | grep pacemaker
> root      74022      1  0 07:53 pts/0    00:00:00 pacemakerd
> 189       74028  74022  0 07:53 ?        00:00:00 /usr/libexec/pacemaker/cib
> root      74029  74022  0 07:53 ?        00:00:00 /usr/libexec/pacemaker/stonithd
> root      74030  74022  0 07:53 ?        00:00:00 /usr/libexec/pacemaker/lrmd
> 189       74031  74022  0 07:53 ?        00:00:00 /usr/libexec/pacemaker/attrd
> 189       74032  74022  0 07:53 ?        00:00:00 /usr/libexec/pacemaker/pengine
> 189       74033  74022  0 07:53 ?        00:00:00 /usr/libexec/pacemaker/crmd
>
> root      75228  50092  0 07:54 pts/0    00:00:00 grep pacemaker
> [root@SG-mysqlold-907 azureuser]# kill -9 74022
>
> [root@SG-mysqlold-907 azureuser]# ps -ef | grep pacemaker
> root      74030      1  0 07:53 ?        00:00:00 /usr/libexec/pacemaker/lrmd
> 189       74032      1  0 07:53 ?        00:00:00 /usr/libexec/pacemaker/pengine
>
> root      75303  50092  0 07:55 pts/0    00:00:00 grep pacemaker
> [root@SG-mysqlold-907 azureuser]# kill -9 74030
> [root@SG-mysqlold-907 azureuser]# kill -9 74032
> [root@SG-mysqlold-907 azureuser]# ps -ef | grep pacemaker
> root      75332  50092  0 07:55 pts/0    00:00:00 grep pacemaker
>
> [root@SG-mysqlold-907 azureuser]# crm satus
> ERROR: status: crm_mon (rc=107): Connection to cluster failed: Transport endpoint is not connected
> -------------------------------------------------------------------
>
> However, this does not seem to have any effect on the cluster
> status as seen from the other nodes:
> -------------------------------------------------------------------
>
> [root@SG-mysqlold-909 azureuser]# crm status
> Last updated: Thu Sep 27 07:56:17 2018          Last change: Thu Sep 27 07:53:43 2018 by root via crm_attribute on SG-mysqlold-909
> Stack: classic openais (with plugin)
> Current DC: SG-mysqlold-908 (version 1.1.14-8.el6_8.1-70404b0) - partition with quorum
> 3 nodes and 3 resources configured, 3 expected votes
>
> Online: [ SG-mysqlold-907 SG-mysqlold-908 SG-mysqlold-909 ]

It most definitely would make the node offline, and if fencing were
configured, the rest of the cluster would fence the node to make sure
it's safely down.

I see you're using the old corosync 1 plugin. I suspect what happened
in this case is that corosync noticed the plugin died and restarted it
quickly enough that it had rejoined by the time you checked the status
elsewhere.

>
> Full list of resources:
>
>  Master/Slave Set: ms_mysql [p_mysql]
>      Masters: [ SG-mysqlold-909 ]
>      Slaves: [ SG-mysqlold-907 SG-mysqlold-908 ]
>
>
> [root@SG-mysqlold-908 azureuser]# crm status
> Last updated: Thu Sep 27 07:56:08 2018          Last change: Thu Sep 27 07:53:43 2018 by root via crm_attribute on SG-mysqlold-909
> Stack: classic openais (with plugin)
> Current DC: SG-mysqlold-908 (version 1.1.14-8.el6_8.1-70404b0) - partition with quorum
> 3 nodes and 3 resources configured, 3 expected votes
>
> Online: [ SG-mysqlold-907 SG-mysqlold-908 SG-mysqlold-909 ]
>
> Full list of resources:
>
>  Master/Slave Set: ms_mysql [p_mysql]
>      Masters: [ SG-mysqlold-909 ]
>      Slaves: [ SG-mysqlold-907 SG-mysqlold-908 ]
>
> -------------------------------------------------------------------
>
> I am a bit surprised that the other nodes are not able to detect that
> pacemaker is down on one of the nodes - SG-mysqlold-907
>
> Even if I kill pacemaker on the node which is the DC - I observe the
> same behavior, with the rest of the nodes not detecting that the DC
> is down.
>
> Could someone explain what the expected behavior is in these cases?
>
> I am using corosync 1.4.7 and pacemaker 1.1.14
>
> Thanks in advance
> Prasad
>
> _______________________________________________
> Users mailing list: Users@clusterlabs.org
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
-- 
Ken Gaillot <kgail...@redhat.com>
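
One way to test the "restarted and rejoined before you looked" theory is to leave a small watcher running on one of the other nodes before killing the processes, so that even a brief change in membership gets logged with a timestamp. The following is only a rough sketch under some assumptions: the function names online_nodes and watch_membership are made up for illustration, it assumes that crm_mon -1 prints an "Online: [ ... ]" line in the same form as the crm status output quoted above, and a one-second poll can still miss a restart that completes faster than that.

```bash
# Hypothetical helper: read crm_mon / "crm status" output on stdin and
# print the node names found between the brackets of the "Online:" line.
online_nodes() {
    sed -n 's/^ *Online: \[ *\(.*[^ ]\) *\]$/\1/p'
}

# Hypothetical helper: poll "crm_mon -1" (one-shot mode) every INTERVAL
# seconds and print a timestamped line whenever the online list changes.
watch_membership() {
    local interval="${1:-1}" prev="__unset__" now
    while true; do
        now="$(crm_mon -1 2>/dev/null | online_nodes)"
        if [ "$now" != "$prev" ]; then
            printf '%s online: %s\n' "$(date '+%H:%M:%S')" \
                "${now:-<no Online line, or crm_mon failed>}"
            prev="$now"
        fi
        sleep "$interval"
    done
}
```

Running watch_membership 1 on SG-mysqlold-908 or SG-mysqlold-909 and then repeating the kill -9 on SG-mysqlold-907 should show whether 907 ever leaves the online list. If nothing changes at this polling rate, the logs on the surviving nodes are probably a better place to look for a short-lived membership change.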