On 06/15/2017 12:38 AM, Jaz Khan wrote:
> Hi,
>
> I have been encountering a serious issue for the past couple of months.
> I have no idea why Pacemaker sends a shutdown signal to the peer node,
> which then goes down. This is very strange and I am quite worried.
>
> This is not happening daily, but it does show this behavior every few
> days.
>
> Version:
> Pacemaker 1.1.16
> Corosync 2.4.2
>
> Please help me out with this bug! Below is the log message.
>
> Jun 14 15:52:23 apex1 crmd[18733]: notice: State transition S_IDLE -> S_POLICY_ENGINE
> Jun 14 15:52:23 apex1 pengine[18732]: notice: On loss of CCM Quorum: Ignore
> Jun 14 15:52:23 apex1 pengine[18732]: notice: Scheduling Node ha-apex2 for shutdown
This is not fencing, but a clean shutdown. Normally this happens only in
response to a user request. Check the logs on both nodes before this point
to find the first indication that the node was going to shut down.

> Jun 14 15:52:23 apex1 pengine[18732]: notice: Move vip#011(Started ha-apex2 -> ha-apex1)
> Jun 14 15:52:23 apex1 pengine[18732]: notice: Move filesystem#011(Started ha-apex2 -> ha-apex1)
> Jun 14 15:52:23 apex1 pengine[18732]: notice: Move samba#011(Started ha-apex2 -> ha-apex1)
> Jun 14 15:52:23 apex1 pengine[18732]: notice: Move database#011(Started ha-apex2 -> ha-apex1)
> Jun 14 15:52:23 apex1 pengine[18732]: notice: Calculated transition 1744, saving inputs in /var/lib/pacemaker/pengine/pe-input-123.bz2
> Jun 14 15:52:23 apex1 crmd[18733]: notice: Initiating stop operation vip_stop_0 on ha-apex2
> Jun 14 15:52:23 apex1 crmd[18733]: notice: Initiating stop operation samba_stop_0 on ha-apex2
> Jun 14 15:52:23 apex1 crmd[18733]: notice: Initiating stop operation database_stop_0 on ha-apex2
> Jun 14 15:52:26 apex1 crmd[18733]: notice: Initiating stop operation filesystem_stop_0 on ha-apex2
> Jun 14 15:52:27 apex1 kernel: drbd apexdata apex2.br: peer( Primary -> Secondary )
> Jun 14 15:52:27 apex1 crmd[18733]: notice: Initiating start operation filesystem_start_0 locally on ha-apex1
>
> Jun 14 15:52:27 apex1 crmd[18733]: notice: do_shutdown of peer ha-apex2 is complete
>
> Jun 14 15:52:27 apex1 attrd[18731]: notice: Node ha-apex2 state is now lost
> Jun 14 15:52:27 apex1 attrd[18731]: notice: Removing all ha-apex2 attributes for peer loss
> Jun 14 15:52:27 apex1 attrd[18731]: notice: Lost attribute writer ha-apex2
> Jun 14 15:52:27 apex1 attrd[18731]: notice: Purged 1 peers with id=2 and/or uname=ha-apex2 from the membership cache
> Jun 14 15:52:27 apex1 stonith-ng[18729]: notice: Node ha-apex2 state is now lost
> Jun 14 15:52:27 apex1 stonith-ng[18729]: notice: Purged 1 peers with id=2 and/or uname=ha-apex2 from the membership cache
> Jun 14 15:52:27 apex1 cib[18728]: notice: Node ha-apex2 state is now lost
> Jun 14 15:52:27 apex1 cib[18728]: notice: Purged 1 peers with id=2 and/or uname=ha-apex2 from the membership cache
>
> Best regards,
> Jaz. K

_______________________________________________
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
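P.S. A rough sketch of the kind of log search meant above. The exact log
path and message wording vary by distribution and Pacemaker version (the
sample lines below are illustrative, not taken from your cluster); the idea
is to look for either the crmd "Requesting shutdown" message on the node
that went down, or the pengine "Scheduling Node ... for shutdown" message
on the DC, just before the event:

```shell
# Stand-in for the relevant excerpt of /var/log/messages on each node
# (hypothetical sample lines; replace with your real log files):
log='Jun 14 15:52:20 apex2 crmd[4211]: notice: Requesting shutdown, upgrade=false
Jun 14 15:52:23 apex1 pengine[18732]: notice: Scheduling Node ha-apex2 for shutdown'

# Search for both kinds of shutdown indicator around the time of the event:
printf '%s\n' "$log" | grep -E 'Requesting shutdown|Scheduling Node .* for shutdown'
```

Whichever node logged the request first is the place to look for what
triggered it (e.g. someone or something stopping the cluster service).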