My understanding is that the node was fenced while it was rebooting. I suggest you look into the fencing logs as well.
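Fence actions usually leave a trace even without debug logging. A quick check to run on the two surviving nodes, assuming the default RHEL 6 cluster log layout:

    # look for fence actions around the time node-1 went down
    grep -i fence /var/log/cluster/fenced.log
    grep -i fence /var/log/messages

If node-1 really was fenced mid-boot, the fence agent should show up there.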
If your fencing logs are not detailed enough, use the following in cluster.conf to enable debug logging for the fencing daemon:

    <logging>
      <logging_daemon name="fenced" debug="on"/>
    </logging>
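For context, here is roughly where that element sits in cluster.conf - a sketch only: the cluster name and version are taken from your cman_tool output, and the rest of the file is elided. Note that config_version (currently 10) must be bumped for the updated file to be accepted:

    <cluster name="ocluster" config_version="11">
      <logging>
        <logging_daemon name="fenced" debug="on"/>
      </logging>
      <!-- clusternodes, fencedevices, rm, ... unchanged -->
    </cluster>

After editing, copy the updated cluster.conf to all three nodes and run "cman_tool version -r" (or restart cman) so the running cluster picks up the new config version.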
Thanks

On Thu, Nov 7, 2013 at 5:34 PM, Yuriy Demchenko <demchenko...@gmail.com> wrote:

> Hi,
>
> I'm trying to set up a 3-node cluster (2 nodes + 1 standby node for
> quorum) with the cman+pacemaker stack, everything according to this
> quickstart article: http://clusterlabs.org/quickstart-redhat.html
>
> The cluster starts, all nodes see each other, quorum is gained, and
> stonith is working, but I've run into a problem with cman: a node can't
> join the cluster after a reboot. cman starts, and "cman_tool nodes"
> reports only that node as a cluster member, while on the other 2 nodes it
> reports 2 nodes as cluster members and the 3rd as offline. A cman
> stop/start/restart on the problem node has no effect - it still sees only
> itself - but if I restart cman on one of the working nodes, everything
> goes back to normal: all 3 nodes join the cluster, and subsequent cman
> service restarts on any node work fine - the node leaves the cluster and
> rejoins successfully. But again - only until the next OS reboot.
>
> For example:
> [1] Working cluster:
>
>> [root@node-1 ~]# cman_tool nodes
>> Node  Sts   Inc   Joined               Name
>>    1   M    592   2013-11-07 15:20:54  node-1.spb.stone.local
>>    2   M    760   2013-11-07 15:20:54  node-2.spb.stone.local
>>    3   M    760   2013-11-07 15:20:54  vnode-3.spb.stone.local
>> [root@node-1 ~]# cman_tool status
>> Version: 6.2.0
>> Config Version: 10
>> Cluster Name: ocluster
>> Cluster Id: 2059
>> Cluster Member: Yes
>> Cluster Generation: 760
>> Membership state: Cluster-Member
>> Nodes: 3
>> Expected votes: 3
>> Total votes: 3
>> Node votes: 1
>> Quorum: 2
>> Active subsystems: 7
>> Flags:
>> Ports Bound: 0
>> Node name: node-1.spb.stone.local
>> Node ID: 1
>> Multicast addresses: 239.192.8.19
>> Node addresses: 192.168.220.21
>
> The picture is the same on all 3 nodes (except for node name and id) -
> same cluster name, cluster id, and multicast address.
>
> [2] I rebooted node-1. After the reboot completed, "cman_tool nodes" on
> node-2 and vnode-3 shows this:
>
>> Node  Sts   Inc   Joined               Name
>>    1   X    760                        node-1.spb.stone.local
>>    2   M    588   2013-11-07 15:11:23  node-2.spb.stone.local
>>    3   M    760   2013-11-07 15:20:54  vnode-3.spb.stone.local
>> [root@node-2 ~]# cman_tool status
>> Version: 6.2.0
>> Config Version: 10
>> Cluster Name: ocluster
>> Cluster Id: 2059
>> Cluster Member: Yes
>> Cluster Generation: 764
>> Membership state: Cluster-Member
>> Nodes: 2
>> Expected votes: 3
>> Total votes: 2
>> Node votes: 1
>> Quorum: 2
>> Active subsystems: 7
>> Flags:
>> Ports Bound: 0
>> Node name: node-2.spb.stone.local
>> Node ID: 2
>> Multicast addresses: 239.192.8.19
>> Node addresses: 192.168.220.22
>
> But on the rebooted node-1 it shows this:
>
>> Node  Sts   Inc   Joined               Name
>>    1   M    764   2013-11-07 15:49:01  node-1.spb.stone.local
>>    2   X      0                        node-2.spb.stone.local
>>    3   X      0                        vnode-3.spb.stone.local
>> [root@node-1 ~]# cman_tool status
>> Version: 6.2.0
>> Config Version: 10
>> Cluster Name: ocluster
>> Cluster Id: 2059
>> Cluster Member: Yes
>> Cluster Generation: 776
>> Membership state: Cluster-Member
>> Nodes: 1
>> Expected votes: 3
>> Total votes: 1
>> Node votes: 1
>> Quorum: 2 Activity blocked
>> Active subsystems: 7
>> Flags:
>> Ports Bound: 0
>> Node name: node-1.spb.stone.local
>> Node ID: 1
>> Multicast addresses: 239.192.8.19
>> Node addresses: 192.168.220.21
>
> So: same cluster name, cluster id, and multicast address - but it can't
> see the other nodes. And there is nothing in /var/log/messages or
> /var/log/cluster/corosync.log on the other two nodes - they don't seem to
> notice node-1 coming back online at all; the last records are about
> node-1 leaving the cluster.
>
> [3] If I now do "service cman restart" on node-2 or vnode-3, everything
> goes back to normal operation as in [1].
> In the logs it shows up as node-2 leaving the cluster (service stop) and
> the simultaneous joining of both node-2 and node-1 (service start):
>
>> Nov  7 11:47:06 vnode-3 corosync[26692]:   [QUORUM] Members[2]: 2 3
>> Nov  7 11:47:06 vnode-3 corosync[26692]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
>> Nov  7 11:47:06 vnode-3 kernel: dlm: closing connection to node 1
>> Nov  7 11:47:06 vnode-3 corosync[26692]:   [CPG   ] chosen downlist: sender r(0) ip(192.168.220.22) ; members(old:3 left:1)
>> Nov  7 11:47:06 vnode-3 corosync[26692]:   [MAIN  ] Completed service synchronization, ready to provide service.
>> Nov  7 11:53:28 vnode-3 corosync[26692]:   [QUORUM] Members[1]: 3
>> Nov  7 11:53:28 vnode-3 corosync[26692]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
>> Nov  7 11:53:28 vnode-3 corosync[26692]:   [CPG   ] chosen downlist: sender r(0) ip(192.168.220.14) ; members(old:2 left:1)
>> Nov  7 11:53:28 vnode-3 corosync[26692]:   [MAIN  ] Completed service synchronization, ready to provide service.
>> Nov  7 11:53:28 vnode-3 kernel: dlm: closing connection to node 2
>> Nov  7 11:53:30 vnode-3 corosync[26692]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
>> Nov  7 11:53:30 vnode-3 corosync[26692]:   [QUORUM] Members[2]: 1 3
>> Nov  7 11:53:30 vnode-3 corosync[26692]:   [QUORUM] Members[2]: 1 3
>> Nov  7 11:53:30 vnode-3 corosync[26692]:   [QUORUM] Members[3]: 1 2 3
>> Nov  7 11:53:30 vnode-3 corosync[26692]:   [QUORUM] Members[3]: 1 2 3
>> Nov  7 11:53:30 vnode-3 corosync[26692]:   [QUORUM] Members[3]: 1 2 3
>> Nov  7 11:53:30 vnode-3 corosync[26692]:   [CPG   ] chosen downlist: sender r(0) ip(192.168.220.21) ; members(old:1 left:0)
>> Nov  7 11:53:30 vnode-3 corosync[26692]:   [MAIN  ] Completed service synchronization, ready to provide service.
>
> I've set up a cluster like this before in much the same configuration and
> never had any problems, but now I'm completely stuck.
> So, what is wrong with my cluster and how do I fix it?
>
> OS is CentOS 6.4 with the latest updates, firewall disabled, selinux
> permissive, all 3 nodes inside the same network. Multicast is working -
> checked with omping.
>
> cman.x86_64       3.0.12.1-49.el6_4.2   @centos6-updates
> corosync.x86_64   1.4.1-15.el6_4.1      @centos6-updates
> pacemaker.x86_64  1.1.10-1.el6_4.4      @centos6-updates
>
> cluster.conf is attached.
>
> --
> Yuriy Demchenko
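One more thing worth re-checking once fencing is ruled out: the omping result you mention predates the reboot, and multicast group membership can be lost across a reboot (for example when IGMP snooping is active on the switch). It may be worth re-running the check from the freshly rebooted node while it is in its "sees only itself" state - a sketch, run concurrently on all three nodes:

    omping node-1.spb.stone.local node-2.spb.stone.local vnode-3.spb.stone.local

If the rebooted node gets unicast responses but no multicast responses, the problem is in the network rather than in cman.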
--
http://linuxmantra.com

--
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster