On 6/14/19 3:03 PM, Fabian Grünbichler wrote:
> ISSUE 1
> 
> creating a cluster with
> 
>  pvesh create /cluster/config -clustername thomastest -link0 192.168.21.71 
> -link5 10.0.0.71 
> 
> creates an invalid corosync.conf:
> 
> Jun 14 14:23:50 clustertest71 systemd[1]: Starting Corosync Cluster Engine...
> Jun 14 14:23:50 clustertest71 corosync[2160]:   [MAIN  ] Corosync Cluster 
> Engine 3.0.1-dirty starting up
> Jun 14 14:23:50 clustertest71 corosync[2160]:   [MAIN  ] Corosync built-in 
> features: dbus monitoring watchdog systemd xmlconf snmp pie relro bindnow
> Jun 14 14:23:50 clustertest71 corosync[2160]:   [MAIN  ] parse error in 
> config: Not all nodes have the same number of links
> Jun 14 14:23:50 clustertest71 corosync[2160]:   [MAIN  ] Corosync Cluster 
> Engine exiting with status 8 at main.c:1386.
> Jun 14 14:23:50 clustertest71 systemd[1]: corosync.service: Main process 
> exited, code=exited, status=8/n/a
> Jun 14 14:23:50 clustertest71 systemd[1]: corosync.service: Failed with 
> result 'exit-code'.
> Jun 14 14:23:50 clustertest71 systemd[1]: Failed to start Corosync Cluster 
> Engine.
> 
> $ cat /etc/corosync/corosync.conf
> 
> logging {
>   debug: off
>   to_syslog: yes
> }
> 
> nodelist {
>   node {
>     name: clustertest71
>     nodeid: 1
>     quorum_votes: 1
>     ring0_addr: 192.168.21.71
>     ring5_addr: 10.0.0.71
>   }
> }
> 
> quorum {
>   provider: corosync_votequorum
> }
> 
> totem {
>   cluster_name: thomastest
>   config_version: 1
>   interface {
>     linknumber: 0
>   }
>   interface {
>     linknumber: 5
>   }
>   ip_version: ipv4-6
>   link_mode: passive
>   secauth: on
>   version: 2
> }
> 
> doing the same with link0 and link1 instead of link0 and link5 works.
> subsequently changing corosync.conf to have link0 and linkX with X != 1
> also works, although the reload complains with the same error message
> (cmap and corosync-cfgtool show the updated status just fine).
> restarting corosync fails, again with the status shown above.
> 
> haven't checked yet whether that is an issue on our side or corosync,
> but probably worth an investigation ;)

this is a "bug" in corosync.

the following check in exec/totemconfig.c fails:

for (i=0; i<num_configured; i++) {
        if (totem_config->interfaces[i].member_count != members) err...
}

here, num_configured is the correct number of configured interfaces (2),
and the struct entry member_count is 1 (one node, which also seems OK
here), but members is 0...

members is set a bit above with:
members = totem_config->interfaces[0].member_count;


but totem_config->interfaces gets dynamically allocated with:
totem_config->interfaces = malloc (sizeof (struct totem_interface) * 
INTERFACE_MAX);

So the array is not indexed by the configured interfaces (0 being the
lowest one configured, 1 the next, ...) but by the _actual_ link numbers
from 0 to INTERFACE_MAX - 1 (== 7).

So here it _always_ takes the member count from link0; if that one is
not present in the config, it stays at its default of 0...
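To make the indexing mismatch concrete, here is a minimal standalone
sketch (not corosync code; struct totem_interface is reduced to the one
field that matters here, and INTERFACE_MAX is taken as 8) replaying the
reported two-link, one-node setup:

#include <stdio.h>
#include <string.h>

#define INTERFACE_MAX 8   /* link numbers 0..7 */

struct totem_interface {
        int member_count;   /* simplified stand-in for the real struct */
};

int main(void)
{
        /* array indexed by *link number*, as in totemconfig */
        struct totem_interface interfaces[INTERFACE_MAX];
        memset(interfaces, 0, sizeof(interfaces));

        /* one node, configured on link0 and link5 (as in the report) */
        interfaces[0].member_count = 1;
        interfaces[5].member_count = 1;

        int num_configured = 2;                   /* correct count of configured links */
        int members = interfaces[0].member_count; /* always taken from slot 0 */

        /* the questionable check: assumes the configured links occupy
         * slots 0..num_configured-1 */
        for (int i = 0; i < num_configured; i++) {
                if (interfaces[i].member_count != members) {
                        /* trips at i == 1: that slot was never configured, so its
                         * member_count is still the default 0, while the actually
                         * configured link5 is never looked at */
                        printf("Not all nodes have the same number of links\n");
                        return 1;
                }
        }
        printf("check passed\n");
        return 0;
}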

So either link0 isn't as optional as you meant/wished, or they have
at least one (and probably a few more) bugs where they falsely assume
that interfaces[0] is the first *configured* link, not link0...
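
If the check instead walked every possible link slot and skipped the
unconfigured ones, the sparse numbering would not matter. A rough sketch
of that idea, reusing struct totem_interface and INTERFACE_MAX from the
sketch above (treating member_count == 0 as "unconfigured" is an
assumption made here, and this is not the actual corosync fix):

/* Sketch only: returns 0 if all configured links have the same member
 * count, -1 otherwise.  Unconfigured slots (member_count == 0 in this
 * simplified model) are skipped instead of being compared. */
static int check_links_consistent(const struct totem_interface *interfaces)
{
        int members = -1;

        for (int i = 0; i < INTERFACE_MAX; i++) {
                if (interfaces[i].member_count == 0)
                        continue;               /* slot never configured */
                if (members == -1)
                        members = interfaces[i].member_count;
                else if (interfaces[i].member_count != members)
                        return -1;              /* genuine mismatch between links */
        }
        return 0;
}

With the config from the report (links 0 and 5, one node on each), this
returns 0 instead of tripping the "Not all nodes have the same number of
links" error.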


