On 21.04.2022 18:26, john tillman wrote:
>> Dne 20. 04. 22 v 20:21 john tillman napsal(a):
>>>> On 20.04.2022 19:53, john tillman wrote:
>>>>> I have a two-node cluster that won't start any resources if only one
>>>>> node is booted; the pacemaker service does not start.
>>>>>
>>>>> Once the second node boots up, the first node will start pacemaker
>>>>> and the resources are started. All is well. But I would like the
>>>>> resources to start when the first node boots by itself.
>>>>>
>>>>> I thought the problem was with the wait_for_all option but I have it
>>>>> set to "0".
>>>>>
>>>>> On the node that is booted by itself, when I run "corosync-quorumtool"
>>>>> I see:
>>>>>
>>>>> [root@test00 ~]# corosync-quorumtool
>>>>> Quorum information
>>>>> ------------------
>>>>> Date:             Wed Apr 20 16:05:07 2022
>>>>> Quorum provider:  corosync_votequorum
>>>>> Nodes:            1
>>>>> Node ID:          1
>>>>> Ring ID:          1.2f
>>>>> Quorate:          Yes
>>>>>
>>>>> Votequorum information
>>>>> ----------------------
>>>>> Expected votes:   2
>>>>> Highest expected: 2
>>>>> Total votes:      1
>>>>> Quorum:           1
>>>>> Flags:            2Node Quorate
>>>>>
>>>>> Membership information
>>>>> ----------------------
>>>>>     Nodeid      Votes Name
>>>>>          1          1 test00 (local)
>>>>>
>>>>> My config file looks like this:
>>>>>
>>>>> totem {
>>>>>     version: 2
>>>>>     cluster_name: testha
>>>>>     transport: knet
>>>>>     crypto_cipher: aes256
>>>>>     crypto_hash: sha256
>>>>> }
>>>>>
>>>>> nodelist {
>>>>>     node {
>>>>>         ring0_addr: test00
>>>>>         name: test00
>>>>>         nodeid: 1
>>>>>     }
>>>>>
>>>>>     node {
>>>>>         ring0_addr: test01
>>>>>         name: test01
>>>>>         nodeid: 2
>>>>>     }
>>>>> }
>>>>>
>>>>> quorum {
>>>>>     provider: corosync_votequorum
>>>>>     two_node: 1
>>>>>     wait_for_all: 0
>>>>> }
>>>>>
>>>>> logging {
>>>>>     to_logfile: yes
>>>>>     logfile: /var/log/cluster/corosync.log
>>>>>     to_syslog: yes
>>>>>     timestamp: on
>>>>>     debug: on
>>>>>     syslog_priority: debug
>>>>>     logfile_priority: debug
>>>>> }
>>>>>
>>>>> Fencing is disabled.
>>>>>
>>>>
>>>> That won't work.
>>>>
>>>>> I've also looked in "corosync.log" but I don't know what to look for
>>>>> to diagnose this issue. I mean there are many lines similar to:
>>>>> [QUORUM] This node is within the primary component and will provide
>>>>> service.
>>>>> and
>>>>> [VOTEQ ] Sending quorum callback, quorate = 1
>>>>> and
>>>>> [VOTEQ ] flags: quorate: Yes Leaving: No WFA Status: No First: Yes
>>>>> Qdevice: No QdeviceAlive: No QdeviceCastVote: No QdeviceMasterWins: No
>>>>>
>>>>> Is there something specific I should look for in the log?
>>>>>
>>>>> So can a two-node cluster work after booting only one node? Maybe it
>>>>> never will and I am wasting a lot of time, yours and mine.
>>>>>
>>>>> If it can, what else can I investigate further?
>>>>
>>>> Before a node can start handling resources it needs to know the status
>>>> of the other node. Without successful fencing there is no way to
>>>> accomplish that.
>>>>
>>>> Yes, you can tell pacemaker to ignore the unknown status. Depending on
>>>> your resources this could simply prevent normal work or lead to data
>>>> corruption.
>>>
>>> Makes sense. Thank you.
>>>
>>> Perhaps some future enhancement could allow for this situation? I mean,
>>> it might be desirable in some cases to allow a single node to boot,
>>> determine quorum via two_node=1 and wait_for_all=0, and start resources
>>> without ever seeing the other node. Sure, there are dangers of split
>>> brain, but I can see special cases where I want the node to work alone
>>> for a period of time despite the danger.
>>
>> Hi John,
>>
>> How about 'pcs quorum unblock'?
>>
>> Regards,
>> Tomas
>
> Tomas,
>
> Thank you for the suggestion. However it didn't work. It returned:
>     Error: unable to check quorum status
>     crm_mon: Error: cluster is not available on this node
>
> I checked pacemaker, just in case, and it still isn't running.
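For reference, "tell pacemaker to ignore the unknown status" maps to Pacemaker
cluster properties that can be set via pcs. A hedged sketch only: the property
names (no-quorum-policy, startup-fencing) are standard Pacemaker cluster
options, but whether setting them is acceptable is exactly the safety trade-off
warned about above.

```shell
# DANGEROUS: these settings trade safety for availability.
# Run resources even without quorum (risks split brain on a two-node cluster).
pcs property set no-quorum-policy=ignore
# Skip fencing of nodes whose state is unknown at startup; the absent peer
# is simply assumed to be down, which may not be true.
pcs property set startup-fencing=false
```

With both set, a lone node will proceed to start resources at boot, but if the
other node is actually alive and merely unreachable, both nodes may run the
same resources concurrently.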
Either pacemaker or some service it depends upon attempted to start and
failed, or systemd is still waiting for some service that is required
before pacemaker. Check the logs or provide "journalctl -b" output in
this state.

> I'm very curious how I could convince the cluster to start its resources
> on one node in the event that the other node is not able to boot. But I'm
> afraid the answer is either to use fencing, or to add a third node to the
> cluster, or both.
>
> -John
>
>>> Thank you again.

_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/
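Following up on the "journalctl -b" suggestion, this is a sketch of the
commands one might run on the lone node to see why pacemaker did not start.
The unit names assume a typical systemd-based corosync/pacemaker install and
must be run on the affected node.

```shell
# Current state of both daemons and their most recent log lines.
systemctl status corosync.service pacemaker.service
# Every message from this boot for both daemons, oldest first.
journalctl -b -u corosync -u pacemaker
# Queued systemd jobs: shows anything systemd is still waiting on
# before pacemaker's dependencies are satisfied.
systemctl list-jobs
```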