Hi everybody,
We have been using corosync directly to provide clustering for GFS2 on our
CentOS 7.2 pools, with a single network interface per host, and everything has
worked great so far!
We now have a new setup in which every host in the cluster has two network
interfaces:
A -> 1 Gbit (the one we would like corosync to use, 10.220.88.X)
B -> 10 Gbit (used for the iSCSI connection to storage, 10.220.246.X)
When we run corosync in this configuration, the logs are continuously spammed
with messages like these:
[12880] cl15-02 corosyncdebug [TOTEM ] entering GATHER state from 0(consensus timeout).
[12880] cl15-02 corosyncdebug [TOTEM ] Creating commit token because I am the rep.
[12880] cl15-02 corosyncdebug [TOTEM ] Saving state aru 10 high seq received 10
[12880] cl15-02 corosyncdebug [MAIN ] Storing new sequence id for ring 5750
[12880] cl15-02 corosyncdebug [TOTEM ] entering COMMIT state.
[12880] cl15-02 corosyncdebug [TOTEM ] got commit token
[12880] cl15-02 corosyncdebug [TOTEM ] entering RECOVERY state.
[12880] cl15-02 corosyncdebug [TOTEM ] TRANS [0] member 10.220.88.41:
[12880] cl15-02 corosyncdebug [TOTEM ] TRANS [1] member 10.220.88.47:
[12880] cl15-02 corosyncdebug [TOTEM ] position [0] member 10.220.88.41:
[12880] cl15-02 corosyncdebug [TOTEM ] previous ring seq 574c rep 10.220.88.41
[12880] cl15-02 corosyncdebug [TOTEM ] aru 10 high delivered 10 received flag 1
[12880] cl15-02 corosyncdebug [TOTEM ] position [1] member 10.220.88.47:
[12880] cl15-02 corosyncdebug [TOTEM ] previous ring seq 574c rep 10.220.88.41
[12880] cl15-02 corosyncdebug [TOTEM ] aru 10 high delivered 10 received flag 1
[12880] cl15-02 corosyncdebug [TOTEM ] Did not need to originate any messages in recovery.
[12880] cl15-02 corosyncdebug [TOTEM ] got commit token
[12880] cl15-02 corosyncdebug [TOTEM ] Sending initial ORF token
[12880] cl15-02 corosyncdebug [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 0, aru 0
[12880] cl15-02 corosyncdebug [TOTEM ] install seq 0 aru 0 high seq received 0
[12880] cl15-02 corosyncdebug [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 1, aru 0
[12880] cl15-02 corosyncdebug [TOTEM ] install seq 0 aru 0 high seq received 0
[12880] cl15-02 corosyncdebug [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 2, aru 0
[12880] cl15-02 corosyncdebug [TOTEM ] install seq 0 aru 0 high seq received 0
[12880] cl15-02 corosyncdebug [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 3, aru 0
[12880] cl15-02 corosyncdebug [TOTEM ] install seq 0 aru 0 high seq received 0
[12880] cl15-02 corosyncdebug [TOTEM ] retrans flag count 4 token aru 0 install seq 0 aru 0 0
[12880] cl15-02 corosyncdebug [TOTEM ] Resetting old ring state
[12880] cl15-02 corosyncdebug [TOTEM ] recovery to regular 1-0
[12880] cl15-02 corosyncdebug [TOTEM ] waiting_trans_ack changed to 1
Apr 11 16:19:54 [13372] cl15-02 pacemakerd: info: pcmk_quorum_notification: Membership 22352: quorum retained (2)
Apr 11 16:19:54 [13378] cl15-02 crmd: info: pcmk_quorum_notification: Membership 22352: quorum retained (2)
[12880] cl15-02 corosyncdebug [TOTEM ] entering OPERATIONAL state.
[12880] cl15-02 corosyncnotice [TOTEM ] A new membership (10.220.88.41:22352) was formed. Members
[12880] cl15-02 corosyncdebug [SYNC ] Committing synchronization for corosync configuration map access
Apr 11 16:19:54 [13373] cl15-02 cib: info: cib_process_request: Forwarding cib_modify operation for section nodes to master (origin=local/crmd/27157)
[12880] cl15-02 corosyncdebug [CMAP ] Not first sync -> no action
Apr 11 16:19:54 [13373] cl15-02 cib: info: cib_process_request: Forwarding cib_modify operation for section status to master (origin=local/crmd/27158)
[12880] cl15-02 corosyncdebug [CPG ] got joinlist message from node 0x2
[12880] cl15-02 corosyncdebug [CPG ] comparing: sender r(0) ip(10.220.88.41) ; members(old:2 left:0)
[12880] cl15-02 corosyncdebug [CPG ] comparing: sender r(0) ip(10.220.88.47) ; members(old:2 left:0)
[12880] cl15-02 corosyncdebug [CPG ] chosen downlist: sender r(0) ip(10.220.88.41) ; members(old:2 left:0)
[12880] cl15-02 corosyncdebug [CPG ] got joinlist message from node 0x1
[12880] cl15-02 corosyncdebug [SYNC ] Committing synchronization for corosync cluster closed process group service v1.01
Apr 11 16:19:54 [13373] cl15-02 cib: info: cib_process_request: Completed cib_modify operation for section nodes: OK (rc=0, origin=cl15-02/crmd/27157, version=0.18.22)
[12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[0] group:clvmd, ip:r(0) ip(10.220.88.41) , pid:35677
Apr 11 16:19:54 [13373] cl15-02 cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=cl15-02/crmd/27158, version=0.18.22)
[12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[1] group:dlm:ls:clvmd\x00, ip:r(0) ip(10.220.88.41) , pid:34995
[12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[2] group:dlm:controld\x00, ip:r(0) ip(10.220.88.41) , pid:34995
[12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[3] group:crmd\x00, ip:r(0) ip(10.220.88.41) , pid:13378
[12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[4] group:attrd\x00, ip:r(0) ip(10.220.88.41) , pid:13376
[12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[5] group:stonith-ng\x00, ip:r(0) ip(10.220.88.41) , pid:13374
[12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[6] group:cib\x00, ip:r(0) ip(10.220.88.41) , pid:13373
[12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[7] group:pacemakerd\x00, ip:r(0) ip(10.220.88.41) , pid:13372
[12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[8] group:crmd\x00, ip:r(0) ip(10.220.88.47) , pid:12879
[12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[9] group:attrd\x00, ip:r(0) ip(10.220.88.47) , pid:12877
[12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[10] group:stonith-ng\x00, ip:r(0) ip(10.220.88.47) , pid:12875
[12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[11] group:cib\x00, ip:r(0) ip(10.220.88.47) , pid:12874
[12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[12] group:pacemakerd\x00, ip:r(0) ip(10.220.88.47) , pid:12873
[12880] cl15-02 corosyncdebug [VOTEQ ] flags: quorate: Yes Leaving: No WFA Status: No First: No Qdevice: No QdeviceAlive: No QdeviceCastVote: No QdeviceMasterWins: No
[12880] cl15-02 corosyncdebug [VOTEQ ] got nodeinfo message from cluster node 1
[12880] cl15-02 corosyncdebug [VOTEQ ] nodeinfo message[1]: votes: 1, expected: 3 flags: 1
[12880] cl15-02 corosyncdebug [VOTEQ ] flags: quorate: Yes Leaving: No WFA Status: No First: No Qdevice: No QdeviceAlive: No QdeviceCastVote: No QdeviceMasterWins: No
[12880] cl15-02 corosyncdebug [VOTEQ ] total_votes=2, expected_votes=3
[12880] cl15-02 corosyncdebug [VOTEQ ] node 1 state=1, votes=1, expected=3
[12880] cl15-02 corosyncdebug [VOTEQ ] node 2 state=1, votes=1, expected=3
[12880] cl15-02 corosyncdebug [VOTEQ ] node 3 state=2, votes=1, expected=3
[12880] cl15-02 corosyncdebug [VOTEQ ] lowest node id: 1 us: 1
[12880] cl15-02 corosyncdebug [VOTEQ ] highest node id: 2 us: 1
[12880] cl15-02 corosyncdebug [VOTEQ ] got nodeinfo message from cluster node 1
[12880] cl15-02 corosyncdebug [VOTEQ ] nodeinfo message[0]: votes: 0, expected: 0 flags: 0
[12880] cl15-02 corosyncdebug [VOTEQ ] got nodeinfo message from cluster node 2
[12880] cl15-02 corosyncdebug [VOTEQ ] nodeinfo message[2]: votes: 1, expected: 3 flags: 1
[12880] cl15-02 corosyncdebug [VOTEQ ] flags: quorate: Yes Leaving: No WFA Status: No First: No Qdevice: No QdeviceAlive: No QdeviceCastVote: No QdeviceMasterWins: No
[12880] cl15-02 corosyncdebug [VOTEQ ] got nodeinfo message from cluster node 2
[12880] cl15-02 corosyncdebug [VOTEQ ] nodeinfo message[0]: votes: 0, expected: 0 flags: 0
[12880] cl15-02 corosyncdebug [SYNC ] Committing synchronization for corosync vote quorum service v1.0
[12880] cl15-02 corosyncdebug [VOTEQ ] total_votes=2, expected_votes=3
[12880] cl15-02 corosyncdebug [VOTEQ ] node 1 state=1, votes=1, expected=3
[12880] cl15-02 corosyncdebug [VOTEQ ] node 2 state=1, votes=1, expected=3
[12880] cl15-02 corosyncdebug [VOTEQ ] node 3 state=2, votes=1, expected=3
[12880] cl15-02 corosyncdebug [VOTEQ ] lowest node id: 1 us: 1
[12880] cl15-02 corosyncdebug [VOTEQ ] highest node id: 2 us: 1
[12880] cl15-02 corosyncnotice [QUORUM] Members[2]: 1 2
[12880] cl15-02 corosyncdebug [QUORUM] sending quorum notification to (nil), length = 56
[12880] cl15-02 corosyncnotice [MAIN ] Completed service synchronization, ready to provide service.
[12880] cl15-02 corosyncdebug [TOTEM ] waiting_trans_ack changed to 0
[12880] cl15-02 corosyncdebug [QUORUM] got quorate request on 0x7f5a907749a0
[12880] cl15-02 corosyncdebug [TOTEM ] entering GATHER state from 11(merge during join).
We do not see these messages when the hosts have only a single network
interface.
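To put a number on the "continuous" spam, a short Python script like the one below could tally, in the real (unwrapped) corosync.log, how often TOTEM re-enters the GATHER state and for which reason. It also reports the last VOTEQ state seen for each node; in the excerpt above, node 3 is logged with state=2 while nodes 1 and 2 have state=1, and the QUORUM line lists only nodes 1 and 2 as members. This is only a sketch: it assumes one record per line in the format shown above, the function names are hypothetical, and it does not try to interpret the numeric states.

```python
import re
import sys
from collections import Counter

# "entering GATHER state from 11(merge during join)." -> (11, "merge during join")
GATHER_RE = re.compile(r"entering GATHER state from (\d+)\((.*?)\)")
# "[VOTEQ ] node 3 state=2, votes=1, expected=3" -> (3, 2)
NODE_STATE_RE = re.compile(r"\[VOTEQ *\] node (\d+) state=(\d+),")


def count_gather_reasons(lines):
    """Count GATHER transitions per (reason code, reason text)."""
    counts = Counter()
    for line in lines:
        m = GATHER_RE.search(line)
        if m:
            counts[(int(m.group(1)), m.group(2))] += 1
    return counts


def last_node_states(lines):
    """Map node id -> last VOTEQ state value seen in the log."""
    states = {}
    for line in lines:
        m = NODE_STATE_RE.search(line)
        if m:
            states[int(m.group(1))] = int(m.group(2))
    return states


if __name__ == "__main__":
    # Usage: python3 gather_stats.py /var/log/cluster/corosync.log
    for path in sys.argv[1:]:
        with open(path, errors="replace") as log:
            lines = log.readlines()
        for (code, reason), n in count_gather_reasons(lines).most_common():
            print(f"{n:8d}  GATHER from {code} ({reason})")
        for node, state in sorted(last_node_states(lines).items()):
            print(f"node {node}: state={state}")
```

Running it over a few minutes of log would show whether "merge during join" dominates, which is the reason that closes the excerpt above.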
--------------------------------------------------------------------------------------
These are the network configurations on the three hosts:
[root@cl15-02 ~]# ifconfig | grep inet
inet 10.220.88.41 netmask 255.255.248.0 broadcast 10.220.95.255
inet 10.220.246.50 netmask 255.255.255.0 broadcast 10.220.246.255
inet 127.0.0.1 netmask 255.0.0.0
[root@cl15-08 ~]# ifconfig | grep inet
inet 10.220.88.47 netmask 255.255.248.0 broadcast 10.220.95.255
inet 10.220.246.51 netmask 255.255.255.0 broadcast 10.220.246.255
inet 127.0.0.1 netmask 255.0.0.0
[root@cl15-09 ~]# ifconfig | grep inet
inet 10.220.88.48 netmask 255.255.248.0 broadcast 10.220.95.255
inet 10.220.246.59 netmask 255.255.255.0 broadcast 10.220.246.255
inet 127.0.0.1 netmask 255.0.0.0
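The netmasks above put the 1 Gbit addresses in a /21 (255.255.248.0, i.e. 10.220.88.0 to 10.220.95.255, which matches the broadcast address shown) and the 10 Gbit addresses in a /24. A minimal sketch with Python's standard `ipaddress` module can confirm, from values transcribed by hand out of the ifconfig output (the `HOSTS` table), that all three corosync addresses share one network and that the two networks do not overlap:

```python
import ipaddress

# Transcribed from the ifconfig output above: (corosync NIC, iSCSI NIC).
HOSTS = {
    "cl15-02": ("10.220.88.41/255.255.248.0", "10.220.246.50/255.255.255.0"),
    "cl15-08": ("10.220.88.47/255.255.248.0", "10.220.246.51/255.255.255.0"),
    "cl15-09": ("10.220.88.48/255.255.248.0", "10.220.246.59/255.255.255.0"),
}


def networks(index):
    """Set of networks for the interface at position `index` on every host."""
    return {ipaddress.ip_interface(pair[index]).network for pair in HOSTS.values()}


if __name__ == "__main__":
    ring_nets = networks(0)
    iscsi_nets = networks(1)
    print("corosync network(s):", sorted(map(str, ring_nets)))
    print("iSCSI network(s):   ", sorted(map(str, iscsi_nets)))
    overlap = any(a.overlaps(b) for a in ring_nets for b in iscsi_nets)
    print("overlap:", overlap)
```

A single network per set and `overlap: False` mean the addressing itself is consistent: each host has exactly one address on the 10.220.88.0/21 network corosync is meant to use.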
-----------------------------------------------------------------------------------
corosync-quorumtool output:
[root@cl15-02 ~]# corosync-quorumtool
Quorum information
------------------
Date: Mon Apr 11 15:46:26 2016
Quorum provider: corosync_votequorum
Nodes: 3
Node ID: 1
Ring ID: 18952
Quorate: Yes

Votequorum information
----------------------
Expected votes: 3
Highest expected: 3
Total votes: 3
Quorum: 2
Flags: Quorate

Membership information
----------------------
    Nodeid      Votes Name
         1          1 cl15-02 (local)
         2          1 cl15-08
         3          1 cl15-09
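The "Quorum: 2" value follows the simple-majority rule votequorum applies when no special options (such as two_node or last_man_standing) are set: half the expected votes, rounded down, plus one. A small sketch of that arithmetic (not votequorum's own code) shows why the cluster in the log stays quorate even though only two of its three nodes are present ("total_votes=2, expected_votes=3"):

```python
def majority_quorum(expected_votes: int) -> int:
    """Simple-majority quorum: floor(expected / 2) + 1 (no two_node/LMS tweaks)."""
    return expected_votes // 2 + 1


def is_quorate(total_votes: int, expected_votes: int) -> bool:
    """True if the votes present reach the simple-majority quorum."""
    return total_votes >= majority_quorum(expected_votes)


if __name__ == "__main__":
    print(majority_quorum(3))  # 2, matching "Quorum: 2" above
    print(is_quorate(3, 3))    # True: all three nodes present
    print(is_quorate(2, 3))    # True: cl15-09 missing, as in the VOTEQ log lines
```

So the repeated GATHER cycles do not cost quorum by themselves; Pacemaker accordingly logs "quorum retained (2)".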
---------------------------------------------------------------------------
/etc/corosync/corosync.conf:
[root@cl15-02 ~]# cat /etc/corosync/corosync.conf
totem {
    version: 2
    secauth: off
    cluster_name: gfs_cluster
    transport: udpu
}
nodelist {
    node {
        ring0_addr: cl15-02
        nodeid: 1
    }
    node {
        ring0_addr: cl15-08
        nodeid: 2
    }
    node {
        ring0_addr: cl15-09
        nodeid: 3
    }
}
quorum {
    provider: corosync_votequorum
}
logging {
    debug: on
    to_logfile: yes
    logfile: /var/log/cluster/corosync.log
    to_syslog: yes
}
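Because the nodelist identifies nodes by hostname (ring0_addr: cl15-02 and so on), the addresses corosync ends up using depend on how each name resolves on each host. A quick way to see which network each name maps to is a check like the one below, run on every node. It is a sketch that assumes resolution through the system resolver (`socket.gethostbyname`, i.e. /etc/hosts or DNS); the script and function names are hypothetical. corosync.conf(5) also accepts a literal IP address for ringX_addr, which would take name resolution out of the picture.

```python
import ipaddress
import socket
import sys

# The 1 Gbit network corosync is meant to use (from the ifconfig output).
RING_NET = ipaddress.ip_network("10.220.88.0/21")


def check_ring_addrs(names, resolve=socket.gethostbyname):
    """Return {name: (address or None, True if address is on RING_NET)}."""
    results = {}
    for name in names:
        try:
            addr = resolve(name)
        except OSError:  # socket.gaierror / socket.herror derive from OSError
            results[name] = (None, False)
            continue
        results[name] = (addr, ipaddress.ip_address(addr) in RING_NET)
    return results


if __name__ == "__main__":
    # Usage, on each node: python3 check_ring.py cl15-02 cl15-08 cl15-09
    for name, (addr, ok) in check_ring_addrs(sys.argv[1:]).items():
        print(f"{name:10s} -> {addr}  {'ok' if ok else 'NOT on ' + str(RING_NET)}")
```

Any name reported as not on 10.220.88.0/21, or as None, would be worth comparing across the three hosts.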
--
Linux-cluster mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/linux-cluster