Re: [ClusterLabs] Pacemaker/corosync behavior in case of partial split brain

2021-08-06 Thread Andrei Borzenkov
On Thu, Aug 5, 2021 at 9:25 PM Andrei Borzenkov wrote:
>
> Three nodes A, B, C. Communication between A and B is blocked
> (completely - no packets can pass in either direction). A and B can
> communicate with C.
>
> I expected that the result would be two partitions - (A, C) and (B, C). To my
> surprise, A went offline leaving (B, C) running. It was always the same
> node (with node id 1 if it matters, out of 1, 2, 3).
>
> How is the surviving partition determined in this case?
>

For the sake of the archives - this is how the Totem protocol works.
Which node will be isolated is non-deterministic and depends on whether
C receives a message from A or B first. A will mark B as unreachable
(failed) and send a message to C; once C gets this message it marks B
as failed and ignores further messages from it (this in turn causes B
to mark C as failed). So the cluster will be split into two partitions -
(A, C) and B. B sends exactly the same kind of message, marking A as
failed; if that one reaches C first, the result is mirrored - (B, C)
and A. Both messages are sent after the consensus timeout, so at
approximately the same moment.
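
The race described above can be sketched as a toy model. This is only an
illustration, not Corosync code: the function names are hypothetical, and
picking the first sender with a fair coin is an assumption (in a real
cluster the timing may consistently favour one node, as the original
report observed).

```python
import random


def resolve_partial_split(first_sender: str) -> tuple[frozenset, frozenset]:
    """Return (surviving partition, isolated partition) for nodes A, B, C
    when the A-B link is down and C hears from `first_sender` first."""
    if first_sender not in ("A", "B"):
        raise ValueError("first_sender must be 'A' or 'B'")
    # The sender has already marked its unreachable peer as failed.
    failed = "B" if first_sender == "A" else "A"
    # C adopts that view, marks the same node as failed and ignores
    # its later messages, so C ends up in the sender's partition.
    return frozenset({first_sender, "C"}), frozenset({failed})


def simulate(trials: int, seed: int = 0) -> dict[str, int]:
    """Count how often each node ends up isolated.
    ASSUMPTION: either message is equally likely to reach C first."""
    rng = random.Random(seed)
    counts = {"A": 0, "B": 0}
    for _ in range(trials):
        _, isolated = resolve_partial_split(rng.choice(["A", "B"]))
        (node,) = isolated
        counts[node] += 1
    return counts
```

Calling resolve_partial_split("B") gives the outcome seen in the original
report: (B, C) survives and A is isolated.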

> Can I be sure the same will also work in case of multiple nodes? I.e. if
> I have two sites with an equal number of nodes and a third site as
> witness, and connectivity between the multi-node sites is lost but each
> site can communicate with the witness. Will one site go offline? Which one?

This should work exactly the same, and which site gets isolated is just
as non-deterministic. Moreover, it will also be non-deterministic if the
number of nodes on the sites without connectivity is different (at least
I do not see anything in Totem that would depend on the number of nodes,
unless Corosync adds some external knobs here). So in the case of sites A
and B with 3 nodes each, site C with 1 node, and site A losing
connectivity to C, we may equally end up with a 6+1 split or a 3+4
split.
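
To make the arithmetic concrete, here is a small sketch that lists the
partition sizes for each possible outcome. It assumes exactly three sites
with one broken link between two of them, and that one of those two sites
ends up isolated as a whole; the function name and the data layout are
hypothetical.

```python
def possible_splits(sites: dict[str, int],
                    broken_link: tuple[str, str]) -> list[tuple[int, int]]:
    """Return (surviving size, isolated size) for each possible outcome.

    ASSUMPTION: three sites, one broken link, and exactly one of the two
    disconnected sites is dropped while the other two stay together.
    """
    if len(sites) != 3:
        raise ValueError("this sketch only models three sites")
    if len(set(broken_link)) != 2 or not set(broken_link) <= set(sites):
        raise ValueError("broken_link must name two different known sites")
    outcomes = []
    for isolated in broken_link:
        # Everything except the isolated site forms the other partition.
        surviving = sum(n for name, n in sites.items() if name != isolated)
        outcomes.append((surviving, sites[isolated]))
    return outcomes


sites = {"A": 3, "B": 3, "C": 1}
a_c_link_down = possible_splits(sites, ("A", "C"))   # A isolated, or C isolated
a_b_link_down = possible_splits(sites, ("A", "B"))   # witness scenario
```

With the A-C link down this yields the 3+4 and 6+1 splits mentioned above;
with the A-B link down (the witness scenario) both outcomes are 4+3.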
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Pacemaker/corosync behavior in case of partial split brain

2021-08-05 Thread Digimer

  
  
On 2021-08-05 2:25 p.m., Andrei Borzenkov wrote:
> Three nodes A, B, C. Communication between A and B is blocked
> (completely - no packets can pass in either direction). A and B can
> communicate with C.
>
> I expected that the result would be two partitions - (A, C) and (B, C). To my
> surprise, A went offline leaving (B, C) running. It was always the same
> node (with node id 1 if it matters, out of 1, 2, 3).
>
> How is the surviving partition determined in this case?
>
> Can I be sure the same will also work in case of multiple nodes? I.e. if
> I have two sites with an equal number of nodes and a third site as
> witness, and connectivity between the multi-node sites is lost but each
> site can communicate with the witness. Will one site go offline? Which one?



In your case, your nodes were otherwise healthy, so quorum worked.
To properly avoid a split brain (when a node is not behaving
properly, i.e. lockups, bad RAM/CPU, etc.) you really need actual
fencing. In such a case, whichever nodes maintain quorum will
fence the lost node (be it because it became inquorate or stopped
behaving properly).
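
As a rough illustration of "maintaining quorum", the sketch below applies
the plain majority rule to the three-node case. It assumes one vote per
node and no special votequorum options (such as two_node); the function
name is hypothetical.

```python
def has_quorum(partition_votes: int, expected_votes: int) -> bool:
    """Plain majority rule: a partition is quorate when it holds more
    than half of the expected votes.
    ASSUMPTION: one vote per node, no special votequorum options."""
    if not 0 <= partition_votes <= expected_votes:
        raise ValueError("partition_votes must be between 0 and expected_votes")
    return partition_votes >= expected_votes // 2 + 1


# Three-node cluster from the report: (B, C) stays up, A is alone.
bc_quorate = has_quorum(2, 3)   # two of three votes
a_quorate = has_quorum(1, 3)    # one of three votes
```

Here the (B, C) partition is quorate and A is not, so (B, C) would be the
side entitled to fence A.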

As for the mechanics of how quorum is determined in your case
above, I'll let one of the corosync people explain.

-- 
Digimer
Papers and Projects: https://alteeve.com/w/
"I am, somehow, less interested in the weight and convolutions of Einstein’s brain than in the near certainty that people of equal talent have lived and died in cotton fields and sweatshops." - Stephen Jay Gould
  



[ClusterLabs] Pacemaker/corosync behavior in case of partial split brain

2021-08-05 Thread Andrei Borzenkov
Three nodes A, B, C. Communication between A and B is blocked
(completely - no packets can pass in either direction). A and B can
communicate with C.

I expected that the result would be two partitions - (A, C) and (B, C). To my
surprise, A went offline leaving (B, C) running. It was always the same
node (with node id 1 if it matters, out of 1, 2, 3).

How is the surviving partition determined in this case?

Can I be sure the same will also work in case of multiple nodes? I.e. if
I have two sites with an equal number of nodes and a third site as
witness, and connectivity between the multi-node sites is lost but each
site can communicate with the witness. Will one site go offline? Which one?