> The problem is that, if you enable cman on boot, the fenced node will
> try to join the cluster, fail to reach its peer after post_join_delay
> (default 6 seconds, iirc) and fence its peer. That peer reboots,
> starts cman, tries to connect, fences its peer...
>
> The easiest way to avoid this in 2-node clusters is to not let
> cman/rgmanager start automatically. That way, if a node is fenced, it
> will boot back up and you can log into it remotely (assuming it's not
> totally dead). When you know things are fixed, manually start cman.

In my case, however, the node that is trying to join is fully operational and has network access. Also, as the configuration in my original email shows, my post_join_delay is set to 360 seconds (for testing purposes), so a post_join_delay timeout cannot be what triggers the fencing here.
I might be wrong here, but judging from corosync's log file, the other node even joins the cluster successfully before dlm_controld marks it for fencing:

Sep 11 11:14:09 corosync [CLM ] CLM CONFIGURATION CHANGE
Sep 11 11:14:09 corosync [CLM ] New Configuration:
Sep 11 11:14:09 corosync [CLM ] r(0) ip(10.xx.xx.1)
Sep 11 11:14:09 corosync [CLM ] r(0) ip(10.xx.xx.2)
Sep 11 11:14:09 corosync [CLM ] Members Left:
Sep 11 11:14:09 corosync [CLM ] Members Joined:
Sep 11 11:14:09 corosync [CLM ] r(0) ip(10.xx.xx.2)
Sep 11 11:14:09 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
Sep 11 11:14:09 corosync [QUORUM] Members[2]: 1 2
Sep 11 11:14:09 corosync [QUORUM] Members[2]: 1 2
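To check this reading of the log mechanically, here is a minimal Python sketch that pulls the membership change out of lines like the excerpt above. The helper name summarize_membership and the regular expressions are hypothetical, my own for illustration, and the line format is assumed from this excerpt only; other corosync versions may word or pad these lines differently, so treat it as a starting point, not a tool.

```python
import re

# Line format assumed from the excerpt above; other corosync versions may differ.
# Matches ring/address entries such as "r(0) ip(10.xx.xx.2)".
ADDR_RE = re.compile(r"r\(\d+\) ip\(([^)]+)\)")
# Matches quorum lines such as "[QUORUM] Members[2]: 1 2".
QUORUM_RE = re.compile(r"\[QUORUM\] Members\[(\d+)\]:((?: \d+)*)")


def summarize_membership(lines):
    """Hypothetical helper: collect the addresses listed under each CLM
    heading and the node IDs from the last QUORUM line."""
    members = {"new": [], "left": [], "joined": []}
    quorum = []
    section = None  # which CLM list the following address lines belong to
    for line in lines:
        if "New Configuration:" in line:
            section = "new"
        elif "Members Left:" in line:
            section = "left"
        elif "Members Joined:" in line:
            section = "joined"
        elif "[QUORUM]" in line:
            match = QUORUM_RE.search(line)
            if match:
                quorum = [int(n) for n in match.group(2).split()]
        else:
            match = ADDR_RE.search(line)
            if match and section is not None:
                members[section].append(match.group(1))
    return members, quorum


# The relevant lines from the log excerpt above.
LOG = """\
Sep 11 11:14:09 corosync [CLM ] CLM CONFIGURATION CHANGE
Sep 11 11:14:09 corosync [CLM ] New Configuration:
Sep 11 11:14:09 corosync [CLM ] r(0) ip(10.xx.xx.1)
Sep 11 11:14:09 corosync [CLM ] r(0) ip(10.xx.xx.2)
Sep 11 11:14:09 corosync [CLM ] Members Left:
Sep 11 11:14:09 corosync [CLM ] Members Joined:
Sep 11 11:14:09 corosync [CLM ] r(0) ip(10.xx.xx.2)
Sep 11 11:14:09 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
Sep 11 11:14:09 corosync [QUORUM] Members[2]: 1 2
""".splitlines()

if __name__ == "__main__":
    members, quorum = summarize_membership(LOG)
    print("joined:", members["joined"])
    print("quorum node IDs:", quorum)
```

Run against the excerpt, it reports 10.xx.xx.2 under "Members Joined" and node IDs 1 and 2 in the quorum membership, which matches my reading that the node joined the cluster before the fencing request.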
--
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster