I've been trying to build a model cluster using three virtual machines
on my home server.  Each VM boots off its own dedicated partition
(CentOS 7.3).  One partition is designated as the common /home
partition for the VMs (on the real machine it will mount as /cluster).
I intend to run GFS2 on the shared partition, so I need to
configure DLM and corosync.  That's where I'm getting bogged down.

The VMs and the real machine are bridged onto one ethernet interface.
There is another ethernet interface in the main machine on a different
network, but that is not used for clustering.  The bridged port is
connected to a switch, which in turn connects to a BT Home Hub 6.  All
four addresses are static, NetworkManager is off, ssh works across the
nodes without a password, and ping gives sensible times.

--------------%<-------------------
# brctl show
bridge name     bridge id       STP enabled     interfaces
br3             XXXXXXXXX       no              enp3s0
                                                vnet0
                                                vnet1
                                                vnet2
virbr0          XXXXXXXXX       yes             virbr0-nic
--------------%<-------------------

When I start corosync, each node comes up but does not see the others.
For instance, on the node at 192.168.1.52 I see:

--------------%<----------------------
# corosync-quorumtool
Quorum information
------------------
Date:             Sun Sep 10 12:56:56 2017
Quorum provider:  corosync_votequorum
Nodes:            1
Node ID:          3
Ring ID:          3/28648
Quorate:          No

Votequorum information
----------------------
Expected votes:   4
Highest expected: 4
Total votes:      1
Quorum:           3 Activity blocked
Flags:

Membership information
----------------------
    Nodeid      Votes Name
         3          1 192.168.1.52 (local)
----------------%<-------------------

All four nodes (the three VMs plus the real machine) give similar
output, differing only in node ID, IP address and Ring ID.

The documentation warns that not all routers handle multicast
datagrams correctly.  I therefore attempted to force unicast
communication by making the following changes to the distributed
corosync.conf (lines starting with # were commented out):

        transport: udpu
        cluster_name: <set to the same as the domain>
#       crypto_cipher: none
#       crypto_hash: none
#               mcastaddr: 239.255.1.1
#               mcastport: 5405
#               ttl: 1

The following are unchanged:

        version: 2
        secauth: off
                ringnumber: 0
                bindnetaddr: 192.168.1.0
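
Put together, the totem section should read roughly as follows.  This
is a sketch reassembled from the fragments above rather than a paste of
the live file: the nesting is assumed to follow the layout of the
distributed example, and cluster_name is left as a placeholder.

```conf
totem {
        version: 2
        secauth: off
        # placeholder, not the literal value in the file
        cluster_name: <set to the same as the domain>
        # corosync.conf(5) spells the UDP unicast transport "udpu"
        transport: udpu
        interface {
                ringnumber: 0
                # network address of the bridged subnet, not a host address
                bindnetaddr: 192.168.1.0
                # mcastaddr, mcastport and ttl are commented out here
        }
}
```

With udpu, the members are taken from the nodelist rather than from a
multicast group, so the multicast settings should not be needed.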

The nodelist is:

---------%<----------------
nodelist {
        node {
                ring0_addr: 192.168.1.2
                nodeid: 1
        }
        node {
                ring0_addr: 192.168.1.51
                nodeid: 2
        }
        node {
                ring0_addr: 192.168.1.52
                nodeid: 3
        }
        node {
                ring0_addr: 192.168.1.53
                nodeid: 4
        }
}
--------%<------------------

The logging and quorum sections are as supplied.

Any help will be gratefully received.

Regards,
Martin

_______________________________________________
CentOS mailing list
CentOS@centos.org
https://lists.centos.org/mailman/listinfo/centos