Ilia,
Hi,
We are using a 3-node pacemaker/corosync cluster on CentOS 7.
We have several identical setups in our QA/DEV orgs, and a couple of them
continuously spew the following messages on all 3 nodes:
Oct 8 17:18:20 42-hw-rig4-L3-2 corosync[15105]: [TOTEM ] A new membership
(10.1.13.134:1553572) was formed. Members
Oct 8 17:18:20 42-hw-rig4-L3-2 corosync[15105]: [QUORUM] Members[3]: 2 1 3
Oct 8 17:18:20 42-hw-rig4-L3-2 corosync[15105]: [MAIN ] Completed service
synchronization, ready to provide service.
Oct 8 17:18:22 42-hw-rig4-L3-2 corosync[15105]: [TOTEM ] A new membership
(10.1.13.134:1553576) was formed. Members
Oct 8 17:18:22 42-hw-rig4-L3-2 corosync[15105]: [QUORUM] Members[3]: 2 1 3
Oct 8 17:18:22 42-hw-rig4-L3-2 corosync[15105]: [MAIN ] Completed service
synchronization, ready to provide service.
Oct 8 17:18:24 42-hw-rig4-L3-2 corosync[15105]: [TOTEM ] A new membership
(10.1.13.134:1553580) was formed. Members
Oct 8 17:18:24 42-hw-rig4-L3-2 corosync[15105]: [QUORUM] Members[3]: 2 1 3
Oct 8 17:18:24 42-hw-rig4-L3-2 corosync[15105]: [MAIN ] Completed service
synchronization, ready to provide service.
Oct 8 17:18:26 42-hw-rig4-L3-2 corosync[15105]: [TOTEM ] A new membership
(10.1.13.134:1553584) was formed. Members
Oct 8 17:18:26 42-hw-rig4-L3-2 corosync[15105]: [QUORUM] Members[3]: 2 1 3
Oct 8 17:18:26 42-hw-rig4-L3-2 corosync[15105]: [MAIN ] Completed service
synchronization, ready to provide service.
The cluster otherwise seems generally happy (note that the membership ID above keeps incrementing, while the member list never changes):
[root@42-hw-rig4-L3-2 ~]# pcs cluster status
Cluster Status:
Last updated: Thu Oct 8 17:24:02 2015
Last change: Thu Oct 8 16:46:57 2015
Stack: corosync
Current DC: dq-ceph9.clearsky-data.net (3) - partition with quorum
Version: 1.1.12-a14efad
3 Nodes configured
17 Resources configured
PCSD Status:
42-hw-back-1.clearsky-data.net: Online
41-hw-back-1.clearsky-data.net: Online
dq-ceph9.clearsky-data.net: Online
The corosync config is:
totem {
    version: 2
    secauth: off
    cluster_name: L3_cluster
    transport: udpu
}

nodelist {
    node {
        ring0_addr: 42-hw-back-1.clearsky-data.net
        nodeid: 1
    }
    node {
        ring0_addr: 41-hw-back-1.clearsky-data.net
        nodeid: 2
    }
    node {
        ring0_addr: dq-ceph9.clearsky-data.net
        nodeid: 3
    }
}

quorum {
    provider: corosync_votequorum
}

logging {
    to_syslog: yes
    debug: off
}
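For completeness: with transport udpu and no interface section, corosync derives the bind address from the nodelist entries and listens on its default UDP port (5405). The runtime values and ring status can be double-checked with the standard tools (corosync-cmapctl key names may vary between corosync versions):

    # dump the runtime totem settings
    corosync-cmapctl | grep totem
    # print ring status on this node
    corosync-cfgtool -s
    # print quorum and membership state
    corosync-quorumtool -s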
What do these messages mean, and how can we stop them?
Any help would be much appreciated.
Ilia Sokolinski
PS
I tried enabling corosync debug logging and got the following:
Oct 8 16:18:47 42-hw-rig4-L3-2 corosync[15105]: [CFG ] Config reload
requested by node 1
Oct 8 16:18:47 42-hw-rig4-L3-2 corosync[15105]: [QB ] HUP conn
(15105-46913-25)
Oct 8 16:18:47 42-hw-rig4-L3-2 corosync[15105]: [QB ]
qb_ipcs_disconnect(15105-46913-25) state:2
Oct 8 16:18:47 42-hw-rig4-L3-2 corosync[15105]: [QB ] epoll_ctl(del): Bad
file descriptor (9)
Oct 8 16:18:47 42-hw-rig4-L3-2 corosync[15105]: [MAIN ]
cs_ipcs_connection_closed()
Oct 8 16:18:47 42-hw-rig4-L3-2 corosync[15105]: [MAIN ]
cs_ipcs_connection_destroyed()
Oct 8 16:18:47 42-hw-rig4-L3-2 corosync[15105]: [QB ] Free'ing ringbuffer:
/dev/shm/qb-cfg-response-15105-46913-25-header
Oct 8 16:18:47 42-hw-rig4-L3-2 corosync[15105]: [QB ] Free'ing ringbuffer:
/dev/shm/qb-cfg-event-15105-46913-25-header
Oct 8 16:18:47 42-hw-rig4-L3-2 corosync[15105]: [QB ] Free'ing ringbuffer:
/dev/shm/qb-cfg-request-15105-46913-25-header
Oct 8 16:18:47 42-hw-rig4-L3-2 corosync[15105]: [VOTEQ ] got nodeinfo message
from cluster node 3
Oct 8 16:18:47 42-hw-rig4-L3-2 corosync[15105]: [VOTEQ ] nodeinfo message[3]:
votes: 1, expected: 3 flags: 1
Oct 8 16:18:47 42-hw-rig4-L3-2 corosync[15105]: [VOTEQ ] flags: quorate: Yes
Leaving: No WFA Status: No First: No Qdevice: No QdeviceAlive: No
QdeviceCastVote: No QdeviceMasterWins: No
Oct 8 16:18:47 42-hw-rig4-L3-2 corosync[15105]: [VOTEQ ] got nodeinfo message
from cluster node 3
Oct 8 16:18:47 42-hw-rig4-L3-2 corosync[15105]: [VOTEQ ] nodeinfo message[0]:
votes: 0, expected: 0 flags: 0
Oct 8 16:18:47 42-hw-rig4-L3-2 corosync[15105]: [VOTEQ ] got nodeinfo message
from cluster node 2
Oct 8 16:18:47 42-hw-rig4-L3-2 corosync[15105]: [VOTEQ ] nodeinfo message[2]:
votes: 1, expected: 3 flags: 1
Oct 8 16:18:47 42-hw-rig4-L3-2 corosync[15105]: [VOTEQ ] flags: quorate: Yes
Leaving: No WFA Status: No First: No Qdevice: No QdeviceAlive: No
QdeviceCastVote: No QdeviceMasterWins: No
Oct 8 16:18:47 42-hw-rig4-L3-2 corosync[15105]: [VOTEQ ] got nodeinfo message
from cluster node 2
Oct 8 16:18:47 42-hw-rig4-L3-2 corosync[15105]: [VOTEQ ] nodeinfo message[0]:
votes: 0, expected: 0 flags: 0
Oct 8 16:18:47 42-hw-rig4-L3-2 corosync[15105]: [VOTEQ ] got nodeinfo message
from cluster node 1
Oct 8 16:18:47 42-hw-rig4-L3-2 corosync[15105]: [VOTEQ ] nodeinfo message[1]:
votes: 1, expected: 3 flags: 1
Oct 8 16:18:47 42-hw-rig4-L3-2 corosync[15105]: [VOTEQ ] flags: quorate: Yes
Leaving: No WFA Status: No First: No Qdevice: No QdeviceAlive: No
QdeviceCastVote: No QdeviceMasterWins: No
Oct 8 16:18:47 42-hw-rig4-L3-2 corosync[15105]: [VOTEQ ] total_votes=3,
expected_votes=3
Oct 8 16:18:47 42-hw-rig4-L3-2 corosync[15105]: [VOTEQ ] node 1 state=1,
votes=1, expected=3
Oct 8 16:18:47 42-hw-rig4-L3-2 corosync[15105]: [VOTEQ ] node 2 state=1,
votes=1, expected=3
Oct 8 16:18:47 42-hw-rig4-L3-2 corosync[15105]: [VOTEQ ] node 3 state=1,
votes=1, expected=3
Oct 8 16:18:47 42-hw-rig4-L3-2 corosync[15105]: [VOTEQ ] lowest node id: 1 us:
1
Oct 8 16:18:47 42-hw-rig4-L3-2 corosync[15105]: [VOTEQ ] highest node id: 3
us: 1
Oct 8 16:18:47 42-hw-rig4-L3-2 corosync[15105]: [VOTEQ ] got nodeinfo message
from cluster node 1
Oct 8 16:18:47 42-hw-rig4-L3-2 corosync[15105]: [VOTEQ ] nodeinfo message[0]:
votes: 0, expected: 0 flags: 0
Oct 8 16:18:48 42-hw-rig4-L3-2 corosync[15105]: [TOTEM ] entering GATHER state
from 9(merge during operational state).
Oct 8 16:18:49 42-hw-rig4-L3-2 corosync[15105]: [QB ] IPC credentials
authenticated (15105-46923-25)
Oct 8 16:18:49 42-hw-rig4-L3-2 corosync[15105]: [QB ] connecting to client
[46923]
Oct 8 16:18:49 42-hw-rig4-L3-2 corosync[15105]: [QB ] shm size:1048589;
real_size:1052672; rb->word_size:263168
Oct 8 16:18:49 42-hw-rig4-L3-2 corosync[15105]: [QB ] shm size:1048589;
real_size:1052672; rb->word_size:263168
Oct 8 16:18:49 42-hw-rig4-L3-2 corosync[15105]: [QB ] shm size:1048589;
real_size:1052672; rb->word_size:263168
Oct 8 16:18:49 42-hw-rig4-L3-2 corosync[15105]: [MAIN ] connection created
Oct 8 16:18:49 42-hw-rig4-L3-2 corosync[15105]: [CMAP ] lib_init_fn:
conn=0x560430414560
Oct 8 16:18:49 42-hw-rig4-L3-2 corosync[15105]: [QB ] HUP conn
(15105-46923-25)
Oct 8 16:18:49 42-hw-rig4-L3-2 corosync[15105]: [QB ]
qb_ipcs_disconnect(15105-46923-25) state:2
Oct 8 16:18:49 42-hw-rig4-L3-2 corosync[15105]: [QB ] epoll_ctl(del): Bad
file descriptor (9)
Oct 8 16:18:49 42-hw-rig4-L3-2 corosync[15105]: [MAIN ]
cs_ipcs_connection_closed()
Oct 8 16:18:49 42-hw-rig4-L3-2 corosync[15105]: [CMAP ] exit_fn for
conn=0x560430414560
Oct 8 16:18:49 42-hw-rig4-L3-2 corosync[15105]: [MAIN ]
cs_ipcs_connection_destroyed()
Oct 8 16:18:49 42-hw-rig4-L3-2 corosync[15105]: [QB ] Free'ing ringbuffer:
/dev/shm/qb-cmap-response-15105-46923-25-header
Oct 8 16:18:49 42-hw-rig4-L3-2 corosync[15105]: [QB ] Free'ing ringbuffer:
/dev/shm/qb-cmap-event-15105-46923-25-header
Oct 8 16:18:49 42-hw-rig4-L3-2 corosync[15105]: [QB ] Free'ing ringbuffer:
/dev/shm/qb-cmap-request-15105-46923-25-header
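(For reference, debug logging was enabled by setting debug: on in the logging section and reloading the configuration; the "[CFG ] Config reload requested by node 1" line at the top is presumably that reload. On corosync 2.x this looks roughly like:

    logging {
        to_syslog: yes
        debug: on
    }

    # tell all corosync instances to reload corosync.conf:
    corosync-cfgtool -R
)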
Line "entering GATHER state from 9(merge during operational state)". is
interesting. I believe you have different cluster on the network where
you forgot ip addr of one of node in affected cluster. As a simple test,
you can try to add
interface {
ringnumber: 0
mcastport: 5409
}
into corosync.conf totem section on all nodes in one of affected cluster
and see if problem disappear.
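To confirm the theory before changing anything, you can also watch for totem traffic arriving from an address that is not in your nodelist. With udpu and no interface section corosync uses the default port 5405, so something like:

    # watch for corosync traffic from unexpected source addresses
    tcpdump -n -i any udp port 5405

run on one of the affected nodes should show packets from the foreign cluster, if it exists.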
Honza
_______________________________________________
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org