On Wed, Mar 14, 2018 at 10:35 AM, Muhammad Sharfuddin <m.sharfud...@nds.com.pk> wrote:
> Hi Andrei,
>> Somehow I missed the corosync configuration in this thread. Do you know
>> wait_for_all is set (how?) or do you just assume it?
>
> Solution found. I was not using the "wait_for_all" option; I was assuming
> that "two_node: 1" would be sufficient:
>
> nodelist {
>     node { ring0_addr: 10.8.9.151 }
>     node { ring0_addr: 10.8.9.152 }
> }
>
> ### previously:
> quorum {
>     two_node: 1
>     provider: corosync_votequorum
> }
>
> ### now/fix:
> quorum {
>     two_node: 1
>     provider: corosync_votequorum
>     wait_for_all: 0
> }
>
> My observation:
> When I was not using "wait_for_all: 0" in corosync.conf, only the ocfs2
> resources were not running; the rest of the resources were running fine
> because:
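For what it's worth, whether wait_for_all is actually in effect can be inspected at runtime with corosync-quorumtool. The transcript below is illustrative only (the exact output varies by corosync version); with the implicit wait_for_all, the Flags line would show something like:

```
# corosync-quorumtool -s
Quorum information
------------------
...
Flags:            2Node Quorate WaitForAll
```

With "wait_for_all: 0" set explicitly, the WaitForAll flag should no longer appear.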
OK, I tested it, and indeed: when wait_for_all is (explicitly) disabled, a single node comes up quorate immediately. It still requests fencing of the other node. So, trying to wrap my head around it:

1. two_node=1 appears to permanently set the "quorate" state for each node. Whether you have 1 or 2 nodes up, you are in quorum. E.g. with expected_votes=2, even if I kill one node, I am left with a single node that believes it is in a "partition with quorum".

2. two_node=1 implicitly sets wait_for_all, which prevents corosync from entering the quorate state until all nodes have been up. Once they have all been up, we remain in quorum.

As long as OCFS2 requires quorum to be attained, this also explains your observation.

> a - "two_node: 1" in the corosync.conf file.
> b - "no-quorum-policy=ignore" in the cib.

If my reasoning above is correct, I question the value of wait_for_all=1 with two_node. This is the difference between "pretending we have quorum" and "ignoring that we have no quorum", but split between different layers. The end effect is the same as long as the corosync quorum state is not queried directly.

> @ Klaus
>> what I tried to point out is that "no-quorum-policy=ignore"
>> is dangerous for services that do require a resource-manager. If you don't
>> have any of those go with a systemd startup.
>
> Running a single node is obviously unacceptable, but say both nodes crash
> and only one node comes back. If I start the resources via systemd, then
> the day the other node comes back I have to stop the services via systemd
> in order to start the resources via the cluster, whereas if a single-node
> cluster was running, the other node simply joins the cluster and no
> downtime occurs.

Exactly. There is simply no other way to sensibly use a two-node cluster without it, and I argue that the notion of quorum is not relevant to most of pacemaker's operation at all, as long as stonith works properly. Again: if you use two_node=1, your cluster is ALWAYS in quorum except during initial startup.
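The interaction described in points 1 and 2 can be sketched as a toy model. This is NOT the real votequorum code, just my reading of the semantics (function and parameter names are mine; see votequorum(5) for the authoritative rules):

```python
# Toy model of corosync votequorum's two_node / wait_for_all interaction.
# Assumption: this mirrors the semantics discussed above, not corosync's
# actual implementation or API.

def is_quorate(nodes_up, expected_votes=2, two_node=True,
               wait_for_all=None, all_nodes_seen=False):
    """Return True if a partition with `nodes_up` members has quorum.

    all_nodes_seen: whether all expected nodes have been up
    simultaneously at least once since cluster start.
    """
    # two_node implicitly enables wait_for_all unless set explicitly
    if wait_for_all is None:
        wait_for_all = two_node
    # wait_for_all blocks quorum until the full membership has been seen
    if wait_for_all and not all_nodes_seen:
        return False
    # with two_node a single vote suffices; otherwise strict majority
    quorum = 1 if two_node else expected_votes // 2 + 1
    return nodes_up >= quorum

# Fresh single-node start, defaults: not quorate (wait_for_all implied)
print(is_quorate(nodes_up=1))                       # False
# Explicit wait_for_all: 0 (the fix above): quorate immediately
print(is_quorate(nodes_up=1, wait_for_all=False))   # True
# Both nodes were up once, then one died: still quorate
print(is_quorate(nodes_up=1, all_nodes_seen=True))  # True
```

The last case is why a surviving node in a two_node cluster always believes it is in a "partition with quorum" after the peer dies.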
So no-quorum-policy=ignore is redundant. It is only needed because of the implicit wait_for_all=1. But if everyone ignores the implicit wait_for_all=1 anyway, what's the point of setting it by default?

_______________________________________________
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org