Re: [Linux-cluster] [cman] can't join cluster after reboot
Thanks, the problem is indeed in multicast. Switching to udpu brought the cluster back to normal operation. Any tips on how to fix multicast operation? IGMP snooping on the switch is disabled, and the firewall is disabled too.

In fact, what confuses me is that the node can't join the cluster after a reboot no matter how long I wait, and no matter how many times I run "service cman restart" on that node - it just doesn't work until cman is restarted on some other node. Another strange thing: I used tcpdump to capture UDP traffic, and there was no UDP traffic at all from node-1 after the reboot, and none after the service restarts. But as soon as the service was restarted on another node, UDP traffic to the multicast address appeared from node-1. (A sketch of the capture and the udpu change follows the quoted message below.)

I also tried switching IGMP snooping on, but that left the cluster not working at all - each node saw only itself. On the switch I saw that the multicast group was created and each corresponding port became a member of that group, but the packet statistics showed only a few "report v3" packets, and no query/leave/error packets.

Yuriy Demchenko

On 11/07/2013 05:47 PM, Christine Caulfield wrote:
On 07/11/13 12:04, Yuriy Demchenko wrote:

Hi,

I'm trying to set up a 3-node cluster (2 nodes + 1 standby node for quorum) with the cman+pacemaker stack, everything according to this quickstart article: http://clusterlabs.org/quickstart-redhat.html

The cluster starts, all nodes see each other, quorum is gained, and stonith works, but I've run into a problem with cman: a node can't join the cluster after a reboot - cman starts, and "cman_tool nodes" reports only that node as a cluster member, while on the other 2 nodes it reports 2 nodes as cluster members and the 3rd as offline. cman stop/start/restart on the problem node has no effect - it still sees only itself. But if I do a cman restart on one of the working nodes, everything goes back to normal: all 3 nodes join the cluster, and subsequent cman service restarts on any node work fine - the node leaves the cluster and rejoins successfully. But again - only until the node's OS is rebooted.

For example:

[1] Working cluster:

[root@node-1 ~]# cman_tool nodes
Node  Sts   Inc   Joined               Name
   1   M    592   2013-11-07 15:20:54  node-1.spb.stone.local
   2   M    760   2013-11-07 15:20:54  node-2.spb.stone.local
   3   M    760   2013-11-07 15:20:54  vnode-3.spb.stone.local

[root@node-1 ~]# cman_tool status
Version: 6.2.0
Config Version: 10
Cluster Name: ocluster
Cluster Id: 2059
Cluster Member: Yes
Cluster Generation: 760
Membership state: Cluster-Member
Nodes: 3
Expected votes: 3
Total votes: 3
Node votes: 1
Quorum: 2
Active subsystems: 7
Flags:
Ports Bound: 0
Node name: node-1.spb.stone.local
Node ID: 1
Multicast addresses: 239.192.8.19
Node addresses: 192.168.220.21

The picture is the same on all 3 nodes (except for node name and id) - same cluster name, cluster id, and multicast address.

[2] I put node-1 into reboot.
After the reboot completes, "cman_tool nodes" on node-2 and vnode-3 shows this:

Node  Sts   Inc   Joined               Name
   1   X    760                        node-1.spb.stone.local
   2   M    588   2013-11-07 15:11:23  node-2.spb.stone.local
   3   M    760   2013-11-07 15:20:54  vnode-3.spb.stone.local

[root@node-2 ~]# cman_tool status
Version: 6.2.0
Config Version: 10
Cluster Name: ocluster
Cluster Id: 2059
Cluster Member: Yes
Cluster Generation: 764
Membership state: Cluster-Member
Nodes: 2
Expected votes: 3
Total votes: 2
Node votes: 1
Quorum: 2
Active subsystems: 7
Flags:
Ports Bound: 0
Node name: node-2.spb.stone.local
Node ID: 2
Multicast addresses: 239.192.8.19
Node addresses: 192.168.220.22

But on the rebooted node-1 it shows this:

Node  Sts   Inc   Joined               Name
   1   M    764   2013-11-07 15:49:01  node-1.spb.stone.local
   2   X      0                        node-2.spb.stone.local
   3   X      0                        vnode-3.spb.stone.local

[root@node-1 ~]# cman_tool status
Version: 6.2.0
Config Version: 10
Cluster Name: ocluster
Cluster Id: 2059
Cluster Member: Yes
Cluster Generation: 776
Membership state: Cluster-Member
Nodes: 1
Expected votes: 3
Total votes: 1
Node votes: 1
Quorum: 2 Activity blocked
Active subsystems: 7
Flags:
Ports Bound: 0
Node name: node-1.spb.stone.local
Node ID: 1
Multicast addresses: 239.192.8.19
Node addresses: 192.168.220.21

So, same cluster name, cluster id, and multicast address - but it can't see the other nodes. And there is nothing in /var/log/messages or /var/log/cluster/corosync.log on the other two nodes - they don't seem to notice node-1 coming back online at all; the last records are about node-1 leaving the cluster.

[3] If I now do "service cman restart" on node-2 or vnode-3, everything goes back to normal operation as in [1]. In the logs it shows as node-2 leaving the cluster (service stop) and the simultaneous joining of both node-2 and node-1 (service start):

Nov  7 11:47:06 vnode-3 corosync[26692]: [QUORUM] Members[2]: 2 3
Nov  7 11:47:06 vnode-3 corosync[26692]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
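Along the lines of what is described above, a minimal sketch of the capture and of the udpu workaround - the interface name eth0 is an assumption, and the exact cluster.conf attribute is my reading of the usual udpu setup; the multicast group, node names, and "service cman restart" are taken from the messages in this thread:

# Watch for corosync multicast traffic on the cluster interface
# (eth0 is a hypothetical interface name; 239.192.8.19 is the group
# reported by "cman_tool status" above).
tcpdump -n -i eth0 udp and host 239.192.8.19

# Multicast reachability check between the three nodes (run the same
# command on each node).
omping -c 30 node-1.spb.stone.local node-2.spb.stone.local vnode-3.spb.stone.local

# Workaround mentioned above: switch corosync to UDP unicast by setting
# transport="udpu" on the <cman/> element in /etc/cluster/cluster.conf
# (and bumping config_version), then restarting cman on every node:
service cman restart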
Re: [Linux-cluster] [cman] can't join cluster after reboot
Nope, nothing in the logs suggests that the node is fenced while rebooting. Moreover, the same behaviour persists with pacemaker started - and I explicitly put the node into standby in pacemaker before the reboot. The same behaviour persists with stonith-enabled=false, and the same behaviour with a manual node fence via "stonith_admin --reboot node-1.spb.stone.local". So I suppose fencing isn't the issue here.

Yuriy Demchenko

On 11/07/2013 05:11 PM, Vishesh kumar wrote:

My understanding is that the node is fenced while rebooting. I suggest you look into the fencing logs as well. If your fencing logs are not detailed, use the following in cluster.conf to enable logging (see the sketch after the quoted message below).

Thanks

On Thu, Nov 7, 2013 at 5:34 PM, Yuriy Demchenko <demchenko...@gmail.com> wrote:

Hi,

I'm trying to set up a 3-node cluster (2 nodes + 1 standby node for quorum) with the cman+pacemaker stack, everything according to this quickstart article: http://clusterlabs.org/quickstart-redhat.html

The cluster starts, all nodes see each other, quorum is gained, and stonith works, but I've run into a problem with cman: a node can't join the cluster after a reboot - cman starts, and "cman_tool nodes" reports only that node as a cluster member, while on the other 2 nodes it reports 2 nodes as cluster members and the 3rd as offline. cman stop/start/restart on the problem node has no effect - it still sees only itself. But if I do a cman restart on one of the working nodes, everything goes back to normal: all 3 nodes join the cluster, and subsequent cman service restarts on any node work fine - the node leaves the cluster and rejoins successfully. But again - only until the node's OS is rebooted.

For example:

[1] Working cluster:

[root@node-1 ~]# cman_tool nodes
Node  Sts   Inc   Joined               Name
   1   M    592   2013-11-07 15:20:54  node-1.spb.stone.local
   2   M    760   2013-11-07 15:20:54  node-2.spb.stone.local
   3   M    760   2013-11-07 15:20:54  vnode-3.spb.stone.local

[root@node-1 ~]# cman_tool status
Version: 6.2.0
Config Version: 10
Cluster Name: ocluster
Cluster Id: 2059
Cluster Member: Yes
Cluster Generation: 760
Membership state: Cluster-Member
Nodes: 3
Expected votes: 3
Total votes: 3
Node votes: 1
Quorum: 2
Active subsystems: 7
Flags:
Ports Bound: 0
Node name: node-1.spb.stone.local
Node ID: 1
Multicast addresses: 239.192.8.19
Node addresses: 192.168.220.21

The picture is the same on all 3 nodes (except for node name and id) - same cluster name, cluster id, and multicast address.

[2] I put node-1 into reboot.
After the reboot completes, "cman_tool nodes" on node-2 and vnode-3 shows this:

Node  Sts   Inc   Joined               Name
   1   X    760                        node-1.spb.stone.local
   2   M    588   2013-11-07 15:11:23  node-2.spb.stone.local
   3   M    760   2013-11-07 15:20:54  vnode-3.spb.stone.local

[root@node-2 ~]# cman_tool status
Version: 6.2.0
Config Version: 10
Cluster Name: ocluster
Cluster Id: 2059
Cluster Member: Yes
Cluster Generation: 764
Membership state: Cluster-Member
Nodes: 2
Expected votes: 3
Total votes: 2
Node votes: 1
Quorum: 2
Active subsystems: 7
Flags:
Ports Bound: 0
Node name: node-2.spb.stone.local
Node ID: 2
Multicast addresses: 239.192.8.19
Node addresses: 192.168.220.22

But on the rebooted node-1 it shows this:

Node  Sts   Inc   Joined               Name
   1   M    764   2013-11-07 15:49:01  node-1.spb.stone.local
   2   X      0                        node-2.spb.stone.local
   3   X      0                        vnode-3.spb.stone.local

[root@node-1 ~]# cman_tool status
Version: 6.2.0
Config Version: 10
Cluster Name: ocluster
Cluster Id: 2059
Cluster Member: Yes
Cluster Generation: 776
Membership state: Cluster-Member
Nodes: 1
Expected votes: 3
Total votes: 1
Node votes: 1
Quorum: 2 Activity blocked
Active subsystems: 7
Flags:
Ports Bound: 0
Node name: node-1.spb.stone.local
Node ID: 1
Multicast addresses: 239.192.8.19
Node addresses: 192.168.220.21

So, same cluster name, cluster id, and multicast address - but it can't see the other nodes. And there is nothing in /var/log/messages or /var/log/cluster/corosync.log on the other two nodes - they don't seem to notice node-1 coming back online at all; the last records are about node-1 leaving the cluster.

[3] If I now do "service cman restart" on node-2 or vnode-3, everything goes back to normal operation as in [1]. In the logs it shows as node-2 leaving the cluster (service stop) and the simultaneous joining of both node-2 and node-1 (service start).
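A minimal sketch of the kind of cluster.conf logging fragment the suggestion above points at - the attribute names are an assumption on my part (the original snippet is not present in the archived message), not necessarily what was posted:

# Add inside <cluster> ... </cluster> in /etc/cluster/cluster.conf
# (attribute names are an assumption based on the cluster.conf logging element):
#   <logging debug="on" to_logfile="yes" logfile="/var/log/cluster/corosync.log"/>
# Bump config_version, then restart cman on the nodes so the change is picked up:
service cman restart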
[Linux-cluster] [cman] can't join cluster after reboot
Nov  7 11:53:30 vnode-3 corosync[26692]: [QUORUM] Members[3]: 1 2 3
Nov  7 11:53:30 vnode-3 corosync[26692]: [QUORUM] Members[3]: 1 2 3
Nov  7 11:53:30 vnode-3 corosync[26692]: [QUORUM] Members[3]: 1 2 3
Nov  7 11:53:30 vnode-3 corosync[26692]: [CPG   ] chosen downlist: sender r(0) ip(192.168.220.21) ; members(old:1 left:0)
Nov  7 11:53:30 vnode-3 corosync[26692]: [MAIN  ] Completed service synchronization, ready to provide service.

I've set up such a cluster before in much the same configuration and never had any problems, but now I'm completely stuck. So, what is wrong with my cluster and how do I fix it?

OS: CentOS 6.4 with latest updates, firewall disabled, selinux permissive, all 3 nodes inside the same network. Multicast is working - checked with omping.

cman.x86_64        3.0.12.1-49.el6_4.2   @centos6-updates
corosync.x86_64    1.4.1-15.el6_4.1      @centos6-updates
pacemaker.x86_64   1.1.10-1.el6_4.4      @centos6-updates

cluster.conf is attached.

--
Yuriy Demchenko

--
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster
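For reference, a few commands that correspond to the environment statements above (firewall disabled, selinux permissive, package versions); they only inspect state and change nothing:

# Firewall reported disabled - expect an empty / ACCEPT-only ruleset:
iptables -L -n
# SELinux reported permissive:
getenforce
# Installed cluster package versions as listed above:
rpm -q cman corosync pacemaker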