Re: [ClusterLabs] Pacemaker/corosync behavior in case of partial split brain
On Thu, Aug 5, 2021 at 9:25 PM Andrei Borzenkov wrote:
>
> Three nodes A, B, C. Communication between A and B is blocked
> (completely - no packet can come in both direction). A and B can
> communicate with C.
>
> I expected that result will be two partitions - (A, C) and (B, C). To my
> surprise, A went offline leaving (B, C) running. It was always the same
> node (with node id 1 if it matters, out of 1, 2, 3).
>
> How surviving partition is determined in this case?
>

For the sake of the archives - this is how the Totem protocol works. Which node gets isolated is non-deterministic and depends on whether C receives a message from A or from B first. A marks B as unreachable (failed) and sends a message to C; once C gets this message, it marks B as failed and ignores further messages from it (which in turn causes B to mark C as failed). In that case the cluster is split into two partitions - (A, C) and B. B sends exactly the same kind of message, marking A as failed. Both messages are sent after the consensus timeout, so at approximately the same moment.

> Can I be sure the same will also work in case of multiple nodes? I.e. if
> I have two sites with equal number of nodes and the third site as
> witness and connectivity between multi-node sites is lost but each site
> can communicate with witness. Will one site go offline? Which one?

This should work exactly the same, and which site gets isolated is just as non-deterministic. Moreover, it is also non-deterministic when the sites that lose connectivity have different numbers of nodes (at least I do not see anything in Totem that depends on the number of nodes, unless Corosync adds some external knobs here). So with sites A and B of 3 nodes each, site C with 1 node, and site A losing connectivity to C, we may equally end up with a 6+1 split or a 3+4 split.

___
Manage your subscription: https://lists.clusterlabs.org/mailman/listinfo/users
ClusterLabs home: https://www.clusterlabs.org/
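The race described in the reply above can be illustrated with a toy model. This is only a sketch and not Corosync or Totem code: the function names are made up for illustration, and the arrival order of the two "peer failed" messages at C is assumed to be a fair coin flip, which the thread does not claim.

```python
import random


def resolve_partial_split(first_to_reach_c):
    """Return (surviving_partition, isolated_node) for the A/B/C scenario.

    first_to_reach_c is "A" or "B": the node whose "peer failed" message
    C happens to process first (hypothetical parameter for this sketch).
    """
    if first_to_reach_c not in ("A", "B"):
        raise ValueError("expected 'A' or 'B'")
    winner = first_to_reach_c
    loser = "B" if winner == "A" else "A"
    # C adopts the winner's view, marks the loser as failed and ignores
    # its later messages, so the loser ends up alone.
    return frozenset({winner, "C"}), frozenset({loser})


def simulate(trials, seed=0):
    """Count how often each node ends up isolated over many trials."""
    rng = random.Random(seed)
    isolated_counts = {"A": 0, "B": 0}
    for _ in range(trials):
        # Assumption: both messages leave at about the same time (after the
        # consensus timeout), so the arrival order is modelled as random.
        first = "A" if rng.random() < rng.random() else "B"
        _, isolated = resolve_partial_split(first)
        isolated_counts[next(iter(isolated))] += 1
    return isolated_counts


if __name__ == "__main__":
    print(resolve_partial_split("A"))
    print(simulate(1000))
```

In a real cluster the arrival order depends on network and scheduling timing, so one node may lose consistently, as observed in the original report.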
Re: [ClusterLabs] Pacemaker/corosync behavior in case of partial split brain
On 2021-08-05 2:25 p.m., Andrei Borzenkov wrote:
> Three nodes A, B, C. Communication between A and B is blocked
> (completely - no packet can come in both direction). A and B can
> communicate with C.
>
> I expected that result will be two partitions - (A, C) and (B, C). To my
> surprise, A went offline leaving (B, C) running. It was always the same
> node (with node id 1 if it matters, out of 1, 2, 3).
>
> How surviving partition is determined in this case?
>
> Can I be sure the same will also work in case of multiple nodes? I.e. if
> I have two sites with equal number of nodes and the third site as
> witness and connectivity between multi-node sites is lost but each site
> can communicate with witness. Will one site go offline? Which one?

In your case, your nodes were otherwise healthy, so quorum worked. To properly avoid a split brain (when a node is not behaving properly, i.e. lockups, bad RAM/CPU, etc.) you really need actual fencing. In such a case, whichever nodes maintain quorum will fence the lost node (be it because it became inquorate or because it stopped behaving properly).

As for the mechanics of how quorum is determined in your case above, I'll let one of the corosync people answer.

-- 
Digimer
Papers and Projects: https://alteeve.com/w/
"I am, somehow, less interested in the weight and convolutions of Einstein’s brain than in the near certainty that people of equal talent have lived and died in cotton fields and sweatshops." - Stephen Jay Gould
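For reference, the quorum arithmetic behind "whichever nodes maintain quorum" can be sketched as below. It assumes plain majority quorum with one vote per node and no special votequorum options; the function names are made up for illustration.

```python
def quorum_threshold(expected_votes):
    """Smallest number of votes that is a strict majority."""
    return expected_votes // 2 + 1


def quorate_partitions(partition_sizes):
    """For each partition size, report whether that partition keeps quorum.

    Assumes one vote per node and that expected votes equal the total
    number of nodes across all partitions.
    """
    needed = quorum_threshold(sum(partition_sizes))
    return [size >= needed for size in partition_sizes]


if __name__ == "__main__":
    print(quorate_partitions([2, 1]))  # three-node case: (B, C) and A
    print(quorate_partitions([6, 1]))  # seven-node case, 6+1 split
    print(quorate_partitions([3, 4]))  # seven-node case, 3+4 split
```

In every split discussed in this thread exactly one partition holds a majority, so only that partition stays quorate and is in a position to fence the other.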
[ClusterLabs] Pacemaker/corosync behavior in case of partial split brain
Three nodes A, B, C. Communication between A and B is blocked (completely - no packet can come in both direction). A and B can communicate with C.

I expected that result will be two partitions - (A, C) and (B, C). To my surprise, A went offline leaving (B, C) running. It was always the same node (with node id 1 if it matters, out of 1, 2, 3).

How surviving partition is determined in this case?

Can I be sure the same will also work in case of multiple nodes? I.e. if I have two sites with equal number of nodes and the third site as witness and connectivity between multi-node sites is lost but each site can communicate with witness. Will one site go offline? Which one?