Re: [ClusterLabs] 8 node cluster

2021-09-07 Thread Andrei Borzenkov
On Tue, Sep 7, 2021 at 8:37 PM M N S H SNGHL  wrote:
>
> Hello Team,
>
> I am looking for some suggestions here. I have created an 8 node HA cluster 
> on my SuSE hosts.
> Have configured certain group resources on it, which mostly run on a single 
> node.
>
> Everything works fine, but I am at a fix for certain requirements -
>
> 1) The resources should work fine even if 7 nodes go down, which means 
> surviving node should still be running the resources.
> I did set "last_man_standing (and last_man_standing_window) option, with ATB 
> .. but it didn't really work or didn't dynamically reduce the expected votes.
> 2) Another requirement is - If all nodes in the cluster go down, and just one 
> (anyone) comes back up, it should pick up the resources and should run them.
>
> I tried setting ignore-quorum-policy to ignore, and which worked most of the 
> time... (yet to find the case where it doesn't work).. but I am suspecting, 
> wouldn't this setting cause split-brain in some cases?
>

Yes. The only way to do it is to ignore quorum, and to resolve split
brain you must then have working fencing/STONITH between nodes. This
applies to startup as well: if a cluster is incomplete, missing nodes
must be fenced before it starts managing resources.

It is not about "some case" - in general working fencing is necessary
even if you do not ignore quorum.

This means you may have race conditions on startup if nodes come up
with a delay. But it is up to you to decide what is more important.
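
The startup rule above can be sketched as a toy check. This is only an
illustrative model, not Pacemaker's actual logic; the function name and
the node names are hypothetical.

```python
from __future__ import annotations


def may_manage_resources(all_nodes: set[str], online: set[str], fenced: set[str]) -> bool:
    """Toy model: a partition may start resources only once every
    configured node that is not online has been confirmed fenced."""
    missing = all_nodes - online
    # Every missing node must be in the fenced set.
    return missing <= fenced


nodes = {"node1", "node2", "node3"}
# node1 boots alone; node2 is fenced, node3 is in an unknown state.
print(may_manage_resources(nodes, {"node1"}, {"node2"}))
# Both missing nodes are confirmed fenced.
print(may_manage_resources(nodes, {"node1"}, {"node2", "node3"}))
```

Without working fencing the second case can never be reached, which is
why ignoring quorum alone is not enough.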
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] 8 node cluster

2021-09-07 Thread Strahil Nikolov via Users
I would go with a VM hosting all resources and set up a 3-node virtualization 
cluster.

The concept that the cluster should keep your resources up even if another 7 
nodes died is not good -> there could be a network issue or other cases where 
this approach won't (and should not) work.
As Antony mentioned -> you need quorum (a majority that agrees on what is going 
on) and stonith (a way to prevent the rest of the cluster from taking the 
resources).
In your case, you can set up the cluster with last_man_standing and 
last_man_standing_window and it should work. Are you sure you didn't drop more 
than 50% of the nodes simultaneously?
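
Why simultaneous loss matters can be shown with a simplified model of
last_man_standing. It assumes expected votes are lowered to the surviving
node count after each batch of failures (as if last_man_standing_window had
elapsed) and it ignores auto_tie_breaker. The function names are
hypothetical; this is a sketch, not corosync's implementation.

```python
from __future__ import annotations


def quorum_threshold(expected_votes: int) -> int:
    """Votes needed for a strict majority (more than 50%)."""
    return expected_votes // 2 + 1


def survives_sequential_loss(total_nodes: int, losses: list[int]) -> bool:
    """Each entry in `losses` is a batch of nodes failing at the same time.
    The survivors must still be quorate against the *current* expected
    votes; only then is expected votes reduced to the surviving count."""
    expected = total_nodes
    alive = total_nodes
    for batch in losses:
        alive -= batch
        if alive < quorum_threshold(expected):
            return False  # quorum lost before any recalculation could happen
        expected = alive  # recalculation after the window has elapsed
    return True


print(survives_sequential_loss(8, [1] * 6))  # one node at a time, down to 2 nodes
print(survives_sequential_loss(8, [5]))      # 5 of 8 nodes lost at once
```

In this model an 8-node cluster stays quorate while losing nodes one at a
time down to 2 nodes, but not when 4 or more fail at once. The last step,
from 2 nodes to 1, also fails here; the votequorum(5) man page notes that
this step additionally requires auto_tie_breaker.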

Best Regards,
Strahil Nikolov
 
 
On Tue, Sep 7, 2021 at 21:08, Antony Stone wrote:
On Tuesday 07 September 2021 at 19:37:33, M N S H SNGHL wrote:

> I am looking for some suggestions here. I have created an 8 node HA cluster
> on my SuSE hosts.

An even number of nodes is never a good idea.

> 1) The resources should work fine even if 7 nodes go down, which means
> surviving node should still be running the resources.

> I did set "last_man_standing (and last_man_standing_window) option, with
> ATB .. but it didn't really work or didn't dynamically reduce the expected
> votes.

What do the log files (especially on that "last man") tell you happened as you 
gradually reduced the number of nodes online?

> 2) Another requirement is - If all nodes in the cluster go down, and just
> one (anyone) comes back up, it should pick up the resources and should run
> them.

So, how should this one node realise that it is the only node awake and should 
be running the resources, and that there aren't {1..7} other nodes somewhere 
else on the network, all in the same situation, thinking "I can't connect to 
anyone else, but I'm alive, so I'll take on the resources"?

> I tried setting ignore-quorum-policy to ignore, and which worked most of
> the time... (yet to find the case where it doesn't work).. but I am
> suspecting, wouldn't this setting cause split-brain in some cases?

I think you're taking the wrong approach to HA.  Some number of nodes (plural) 
need to be in communication with each other in order for them to decide 
whether they have quorum or not, and can decide to be in charge of the 
resources.

Two basic rules of HA:

1. One node on its own has no clue whatever else is going on with the rest of 
the cluster, and therefore cannot decide to take charge

2. Quorum (unless you override it and really know what you're doing) requires 
>50% of nodes to be in agreement, and an even number of nodes can split into 
50:50, where neither half (literally) is >50%, so everything stops.  This is 
"split brain".

I have two questions:

 - why do you feel you need as many as 8 nodes when the resources will only be 
running on one node?

 - why do you specifically want 8 nodes instead of 7 or 9?


Antony.



Re: [ClusterLabs] 8 node cluster

2021-09-07 Thread Antony Stone
On Tuesday 07 September 2021 at 19:37:33, M N S H SNGHL wrote:

> I am looking for some suggestions here. I have created an 8 node HA cluster
> on my SuSE hosts.

An even number of nodes is never a good idea.

> 1) The resources should work fine even if 7 nodes go down, which means
> surviving node should still be running the resources.

> I did set "last_man_standing (and last_man_standing_window) option, with
> ATB .. but it didn't really work or didn't dynamically reduce the expected
> votes.

What do the log files (especially on that "last man") tell you happened as you 
gradually reduced the number of nodes online?

> 2) Another requirement is - If all nodes in the cluster go down, and just
> one (anyone) comes back up, it should pick up the resources and should run
> them.

So, how should this one node realise that it is the only node awake and should 
be running the resources, and that there aren't {1..7} other nodes somewhere 
else on the network, all in the same situation, thinking "I can't connect to 
anyone else, but I'm alive, so I'll take on the resources"?

> I tried setting ignore-quorum-policy to ignore, and which worked most of
> the time... (yet to find the case where it doesn't work).. but I am
> suspecting, wouldn't this setting cause split-brain in some cases?

I think you're taking the wrong approach to HA.  Some number of nodes (plural) 
need to be in communication with each other in order for them to decide 
whether they have quorum or not, and can decide to be in charge of the 
resources.

Two basic rules of HA:

1. One node on its own has no clue whatever else is going on with the rest of 
the cluster, and therefore cannot decide to take charge

2. Quorum (unless you override it and really know what you're doing) requires 
>50% of nodes to be in agreement, and an even number of nodes can split into 
50:50, where neither half (literally) is >50%, so everything stops.  This is 
"split brain".
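
The majority rule in point 2 comes down to simple arithmetic. A minimal
sketch, with hypothetical function names, assuming one vote per node and
no tie breaker:

```python
def has_quorum(partition_size: int, total_nodes: int) -> bool:
    """A partition is quorate only with strictly more than 50% of the votes."""
    return 2 * partition_size > total_nodes


def even_split_stalls(total_nodes: int) -> bool:
    """True if the cluster can split into two equal halves, neither quorate."""
    if total_nodes % 2 != 0:
        return False  # an odd cluster cannot split 50:50
    half = total_nodes // 2
    return not has_quorum(half, total_nodes)


for n in (7, 8, 9):
    print(n, "nodes: 50:50 stall possible =", even_split_stalls(n))
```

With 8 nodes a 4:4 split leaves both halves without quorum, while 7 or 9
nodes always leave one side with a majority.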

I have two questions:

 - why do you feel you need as many as 8 nodes when the resources will only be 
running on one node?

 - why do you specifically want 8 nodes instead of 7 or 9?


Antony.

-- 
The Royal Society for the Prevention of Cruelty to Animals was formed in 1824.
The National Society for the Prevention of Cruelty to Children was not formed 
until 1884.
That says something about the British.

   Please reply to the list;
 please *don't* CC me.


[ClusterLabs] 8 node cluster

2021-09-07 Thread M N S H SNGHL
Hello Team,

I am looking for some suggestions here. I have created an 8 node HA cluster
on my SuSE hosts.
Have configured certain group resources on it, which mostly run on a single
node.

Everything works fine, but I am in a fix over certain requirements -

1) The resources should work fine even if 7 nodes go down, which means
surviving node should still be running the resources.
I did set the "last_man_standing" (and "last_man_standing_window") options,
with ATB .. but it didn't really work or didn't dynamically reduce the expected
votes.
2) Another requirement is - If all nodes in the cluster go down, and just
one (anyone) comes back up, it should pick up the resources and should run
them.

I tried setting no-quorum-policy to ignore, which worked most of
the time... (yet to find the case where it doesn't work).. but I am
suspecting, wouldn't this setting cause split-brain in some cases?

Experts, please advise.

Thanks