Re: [ClusterLabs] 8 node cluster
On Tue, Sep 7, 2021 at 8:37 PM M N S H SNGHL wrote:
>
> Hello Team,
>
> I am looking for some suggestions here. I have created an 8 node HA cluster
> on my SuSE hosts.
> Have configured certain group resources on it, which mostly run on a single
> node.
>
> Everything works fine, but I am at a fix for certain requirements -
>
> 1) The resources should work fine even if 7 nodes go down, which means
> surviving node should still be running the resources.
> I did set "last_man_standing (and last_man_standing_window) option, with ATB
> .. but it didn't really work or didn't dynamically reduce the expected votes.
> 2) Another requirement is - If all nodes in the cluster go down, and just one
> (anyone) comes back up, it should pick up the resources and should run them.
>
> I tried setting no-quorum-policy to ignore, and which worked most of the
> time... (yet to find the case where it doesn't work).. but I am suspecting,
> wouldn't this setting cause split-brain in some cases?
>

Yes, the only way to do it is to ignore quorum, and to resolve split brain you must have working fencing/STONITH between nodes. This applies to startup as well: if a cluster is incomplete, missing nodes must be fenced before it starts to manage resources. It is not about "some cases": in general, working fencing is necessary even if you do not ignore quorum.

This means you may have race conditions on startup if nodes come up with a delay. But it is up to you to decide what is more important.

___
Manage your subscription: https://lists.clusterlabs.org/mailman/listinfo/users
ClusterLabs home: https://www.clusterlabs.org/
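To illustrate why ignoring quorum needs fencing behind it, here is a minimal Python sketch. It is not Pacemaker code: it assumes one vote per node, models only the "stop" and "ignore" values of the no-quorum-policy property, and the function names quorate and partitions_running are made up for this illustration.

```python
from typing import List


def quorate(partition_size: int, total_nodes: int) -> bool:
    """True if the partition holds a strict majority (assumes one vote per node)."""
    return partition_size > total_nodes / 2


def partitions_running(partitions: List[int], total_nodes: int,
                       policy: str = "stop") -> List[int]:
    """Return the sizes of the partitions that would run resources.

    policy="stop":   only a quorate partition runs resources.
    policy="ignore": every non-empty partition runs them, quorum or not.
    """
    if policy == "ignore":
        return [p for p in partitions if p > 0]
    return [p for p in partitions if quorate(p, total_nodes)]


# 8 nodes split 4:4, quorum enforced: neither half may run resources.
print(partitions_running([4, 4], 8))             # []
# Same split with quorum ignored: both halves run them (split brain).
print(partitions_running([4, 4], 8, "ignore"))   # [4, 4]
# Two isolated nodes booting alone in an 8 node cluster, quorum ignored.
print(partitions_running([1, 1], 8, "ignore"))   # [1, 1]
```

Whenever more than one partition is returned, only fencing of the other side keeps the resources from being active twice.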
Re: [ClusterLabs] 8 node cluster
I would go with a VM hosting all resources and set up a 3-node virtualization cluster.

The idea that the cluster should keep your resources up even after the other 7 nodes have died is not a good one: there could be a network issue, or other cases where this approach won't (and should not) work. As Antony mentioned, you need quorum (a majority that agrees on what is going on) and STONITH (a way to prevent the rest of the cluster from taking the resources).

In your case, you can set up the cluster with last_man_standing and last_man_standing_window and it should work. Are you sure you didn't drop more than 50% of the nodes simultaneously?

Best Regards,
Strahil Nikolov
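On the last_man_standing question, a rough Python model of how expected votes shrink may help. It assumes one vote per node, that each entry in the loss list is separated from the next by more than last_man_standing_window, and that the surviving side wins any 50:50 tie when auto_tie_breaker (ATB) is enabled. The real votequorum logic has more cases (for instance, which node wins a tie), and the helper names here are hypothetical.

```python
from typing import List


def majority(expected_votes: int) -> int:
    """Votes needed for quorum: more than half of the expected votes."""
    return expected_votes // 2 + 1


def survives(total_nodes: int, losses: List[int],
             auto_tie_breaker: bool = False) -> bool:
    """Simplified last_man_standing model.

    losses[i] is the number of nodes lost together in step i. After each
    step the survivors must still be quorate; if they are, expected votes
    are recalculated down to the number of nodes still alive.
    """
    expected = total_nodes
    alive = total_nodes
    for lost in losses:
        alive -= lost
        if alive >= majority(expected):
            expected = alive      # still quorate: shrink expected votes
        elif auto_tie_breaker and alive > 0 and alive * 2 == expected:
            expected = alive      # exact tie, assumed won by the survivors
        else:
            return False          # quorum lost, nothing is recalculated
    return True


print(survives(8, [1] * 6))                         # True  (8 -> 2, one by one)
print(survives(8, [1] * 7))                         # False (2 -> 1 is a tie)
print(survives(8, [1] * 7, auto_tie_breaker=True))  # True
print(survives(8, [5]))                             # False (5 lost at once)
```

In this model, losing more than half of the currently expected votes in one step ends quorum no matter how the options are set, which is why the timing of the node failures matters.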
Re: [ClusterLabs] 8 node cluster
On Tuesday 07 September 2021 at 19:37:33, M N S H SNGHL wrote:

> I am looking for some suggestions here. I have created an 8 node HA cluster
> on my SuSE hosts.

An even number of nodes is never a good idea.

> 1) The resources should work fine even if 7 nodes go down, which means
> surviving node should still be running the resources.
> I did set "last_man_standing (and last_man_standing_window) option, with
> ATB .. but it didn't really work or didn't dynamically reduce the expected
> votes.

What do the log files (especially on that "last man") tell you happened as you gradually reduced the number of nodes online?

> 2) Another requirement is - If all nodes in the cluster go down, and just
> one (anyone) comes back up, it should pick up the resources and should run
> them.

So, how should this one node realise that it is the only node awake and should be running the resources, and that there aren't {1..7} other nodes somewhere else on the network, all in the same situation, thinking "I can't connect to anyone else, but I'm alive, so I'll take on the resources"?

> I tried setting no-quorum-policy to ignore, and which worked most of
> the time... (yet to find the case where it doesn't work).. but I am
> suspecting, wouldn't this setting cause split-brain in some cases?

I think you're taking the wrong approach to HA.

Some number of nodes (plural) need to be in communication with each other in order for them to decide whether they have quorum or not, and can decide to be in charge of the resources.

Two basic rules of HA:

1. One node on its own has no clue whatever else is going on with the rest of the cluster, and therefore cannot decide to take charge

2. Quorum (unless you override it and really know what you're doing) requires >50% of nodes to be in agreement, and an even number of nodes can split into 50:50, where neither half (literally) is >50%, so everything stops. Without that rule both halves could take the resources at the same time, which is "split brain".
I have two questions:

- why do you feel you need as many as 8 nodes when the resources will only be running on one node?

- why do you specifically want 8 nodes instead of 7 or 9?


Antony.

-- 
The Royal Society for the Prevention of Cruelty to Animals was formed in 1824.
The National Society for the Prevention of Cruelty to Children was not formed until 1884.
That says something about the British.

Please reply to the list; please *don't* CC me.
[ClusterLabs] 8 node cluster
Hello Team,

I am looking for some suggestions here. I have created an 8 node HA cluster on my SuSE hosts. I have configured certain group resources on it, which mostly run on a single node.

Everything works fine, but I am in a fix over certain requirements -

1) The resources should work fine even if 7 nodes go down, which means the surviving node should still be running the resources.
I did set the "last_man_standing" (and "last_man_standing_window") options, with ATB (auto_tie_breaker) .. but it didn't really work, or didn't dynamically reduce the expected votes.

2) Another requirement is - if all nodes in the cluster go down, and just one (any one) comes back up, it should pick up the resources and run them.

I tried setting no-quorum-policy to ignore, which worked most of the time... (yet to find the case where it doesn't work).. but I suspect this setting could cause split-brain in some cases?

Experts, please advise.

Thanks