On 27.06.2023 07:21, Priyanka Balotra wrote:
Hi Andrei,
After this, the system went through several more rounds of fencing, and we then saw the following state:

:~ # crm status
Cluster Summary:
   * Stack: corosync
   * Current DC: FILE-2 (version
2.1.2+20211124.ada5c3b36-150400.2.43-2.1.2+20211124.ada5c3b36) - partition
with quorum

It says "partition with quorum", so what exactly is the problem?

   * Last updated: Mon Jun 26 12:44:15 2023
   * Last change:  Mon Jun 26 12:41:12 2023 by root via cibadmin on FILE-2
   * 4 nodes configured
   * 11 resource instances configured

Node List:
   * Node FILE-1: UNCLEAN (offline)
   * Node FILE-4: UNCLEAN (offline)
   * Online: [ FILE-2 ]
   * Online: [ FILE-3 ]

At this stage FILE-1 and FILE-4 were continuously being fenced (we have
device-based STONITH configured, but the stonith resource was not up).
Two nodes were online and two were offline, so quorum was not attained
again.
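
For reference, what corosync itself thinks of the membership and votes at
this point can be checked on a surviving node with the standard
corosync-quorumtool utility:

    corosync-quorumtool -s

Its output includes Quorate, Expected votes and Total votes, which would
show whether last_man_standing ever managed to reduce expected_votes.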
1) For such a scenario, we need help keeping at least one cluster partition live.
2) In cases where only one node of the cluster is up and the others are
down, we need the cluster and its resources to stay up.

Thanks
Priyanka

On Tue, Jun 27, 2023 at 12:25 AM Andrei Borzenkov <arvidj...@gmail.com>
wrote:

On 26.06.2023 21:14, Priyanka Balotra wrote:
Hi All,
We are seeing an issue where we replaced no-quorum-policy=ignore with
other options in corosync.conf in order to simulate the same behaviour:


        wait_for_all: 0
        last_man_standing: 1
        last_man_standing_window: 20000

We also tried another property (auto-tie-breaker), but could not
configure it, as crm did not recognise that property.
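
For context, wait_for_all, last_man_standing and last_man_standing_window
are votequorum settings that live in the quorum section of corosync.conf,
and auto_tie_breaker (with underscores) is likewise a corosync votequorum
option rather than a Pacemaker property, which may be why crm did not
recognise it. Roughly, the section we are aiming for looks like this (an
illustrative sketch, not our actual file):

    quorum {
        provider: corosync_votequorum
        wait_for_all: 0
        last_man_standing: 1
        last_man_standing_window: 20000
        # auto_tie_breaker: 1   <- would go here, not into the crm configuration
    }

(As far as we can tell, the votequorum(5) man page recommends enabling
wait_for_all when last_man_standing is used.)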

But even after using these options, we are seeing that the system is not
quorate unless at least half of the nodes are up.

Some properties from crm config are as follows:



primitive stonith-sbd stonith:external/sbd \
        params pcmk_delay_base=5s
...
property cib-bootstrap-options: \
        have-watchdog=true \
        dc-version="2.1.2+20211124.ada5c3b36-150400.2.43-2.1.2+20211124.ada5c3b36" \
        cluster-infrastructure=corosync \
        cluster-name=FILE \
        stonith-enabled=true \
        stonith-timeout=172 \
        stonith-action=reboot \
        stop-all-resources=false \
        no-quorum-policy=ignore
rsc_defaults build-resource-defaults: \
        resource-stickiness=1
rsc_defaults rsc-options: \
        resource-stickiness=100 \
        migration-threshold=3 \
        failure-timeout=1m \
        cluster-recheck-interval=10min
op_defaults op-options: \
        timeout=600 \
        record-pending=true
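
For reference, these bootstrap options can be inspected and changed with
crmsh along these lines (a generic illustration; the right value is
exactly what is in question here):

    crm configure show cib-bootstrap-options
    crm configure property no-quorum-policy=stop

Valid values for no-quorum-policy include stop (the default), freeze,
ignore, suicide and, in recent Pacemaker versions, demote.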

On a 4-node setup, when the whole cluster is brought up together, we see
error logs like:

2023-06-26T11:35:17.231104+00:00 FILE-1 pacemaker-schedulerd[26359]: warning: Fencing and resource management disabled due to lack of quorum

2023-06-26T11:35:17.231338+00:00 FILE-1 pacemaker-schedulerd[26359]: warning: Ignoring malformed node_state entry without uname

2023-06-26T11:35:17.233771+00:00 FILE-1 pacemaker-schedulerd[26359]: warning: Node FILE-2 is unclean!

2023-06-26T11:35:17.233857+00:00 FILE-1 pacemaker-schedulerd[26359]: warning: Node FILE-3 is unclean!

2023-06-26T11:35:17.233957+00:00 FILE-1 pacemaker-schedulerd[26359]: warning: Node FILE-4 is unclean!


According to this output, FILE-1 lost connection to the three other
nodes, in which case it cannot be quorate.
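
To put numbers on it (assuming the default of one vote per node): with
expected_votes = 4 the quorum threshold is a strict majority, 4/2 + 1 = 3
votes, so a one- or two-node partition can never be quorate on its own.
last_man_standing only lowers expected_votes one step at a time, after
each last_man_standing_window in which the remaining partition is still
quorate; it cannot help a cluster that loses half its nodes at once.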


Kindly help us correct the configuration so that the system functions
normally, with all resources up, even if only one node is up.

Please let me know if any more info is needed.

Thanks
Priyanka


_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/
