Hi all,

I don't know if this has been addressed before, but I couldn't find anything quickly.

We have a Corosync cluster that manages an active/passive MySQL service with DRBD underneath. The two servers are actually VMs running on two different XenServer hypervisors. Each hypervisor is connected to a stacked switch through an active-active LACP link.

What happens is that when we reboot one unit of the stack, LACP takes some time to move the established sessions over to the other link. This short glitch is long enough for Corosync to declare a member lost. You can guess the rest: both nodes become master, and when the network comes back, DRBD goes into split-brain.

Is there anything we can do to tolerate outages like this, which last around 20 to 30 seconds?

--
Francois Gaudreault
Architecte de Solution Cloud | Cloud Solutions Architect
[email protected]
514-629-6775
- - -
CloudOps
420 rue Guy
Montréal QC  H3J 1S6
www.cloudops.com
@CloudOps_

_______________________________________________
discuss mailing list
[email protected]
http://lists.corosync.org/mailman/listinfo/discuss
