- **status**: review --> fixed
- **Comment**:

changeset:   7828:afd54406a4ad
tag:         tip
user:        Hans Nordeback <hans.nordeb...@ericsson.com>
date:        Tue Jun 21 09:40:03 2016 +0200
files:       00-README.conf osaf/services/infrastructure/fm/config/fmd.conf 
osaf/services/infrastructure/fm/fms/Makefile.am 
osaf/services/infrastructure/fm/fms/fm_cb.h 
osaf/services/infrastructure/fm/fms/fm_main.c scripts/opensaf_reboot
description:
fm: Add support for remote fencing using STONITH V3 [#1859]


changeset:   7827:c0487080b508
user:        Anders Widell <anders.wid...@ericsson.com>
date:        Wed Aug 03 14:56:00 2016 +0200
files:       osaf/services/infrastructure/fm/config/fmd.conf 
osaf/services/infrastructure/fm/fms/fm_cb.h 
osaf/services/infrastructure/fm/fms/fm_main.c 
osaf/services/infrastructure/fm/fms/fm_mds.c
description:
fm: Add support for self-fencing V2 [#1859]

In situations where remote fencing is not possible, this patch adds support for 
self-fencing.




---

**[tickets:#1859] fm: Add support for STONITH fencing**

**Status:** fixed
**Milestone:** 5.1.FC
**Created:** Wed Jun 01, 2016 01:49 PM UTC by Hans Nordebäck
**Last Updated:** Wed Jun 22, 2016 07:16 AM UTC
**Owner:** Hans Nordebäck
**Attachments:**

- [self_fencing.diff](https://sourceforge.net/p/opensaf/tickets/1859/attachment/self_fencing.diff) (6.1 kB; text/x-patch)


Split brain can occur in OpenSAF if either both links between the two 
controllers are "lost" or one of the controllers "live hangs".

OpenSAF detects and handles split-brain via FM and uses PLM to fence the other 
system controller by rebooting it. However, PLM only supports target 
environments running on particular hardware.

Only a few split-brain cases have been seen, and only when running in 
virtualized environments:
1) Virtual switch problems that isolate the SCs from each other.
2) Both TIPC links between the SCs are "down/lost", e.g. because the TIPC 
tolerance time is too low, the links are non-redundant, or other latencies occur.
3) A system controller in a virtual machine "live hangs" for several 
seconds, e.g. due to live migration or snapshotting.

To be able to do power fencing in a virtualized environment, this ticket 
suggests using STONITH.
When FM on either the active or the standby controller detects that its peer 
is not available in a virtualized environment, the active FM system controller 
will use STONITH to power fence the standby FM system controller. The standby 
FM system controller will also power fence the active FM system controller, 
but with a delay.
This will solve the split-brain cases identified above.
It will also fit well with the roaming feature: e.g. after a standby 
controller has been power fenced, a new standby controller will automatically 
be selected by the roaming feature.
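
The asymmetric timing described above can be illustrated with a minimal sketch. This is not the FM implementation; `Role`, `fence_delay_seconds`, `fencing_order` and the 10-second default are hypothetical names and values chosen for illustration only. The sketch assumes the delay exists so that the active controller's fencing request fires first.

```python
from enum import Enum


class Role(Enum):
    """Hypothetical FM roles; the names are illustrative only."""
    ACTIVE = "active"
    STANDBY = "standby"


def fence_delay_seconds(role: Role, standby_delay: float = 10.0) -> float:
    """Return how long a controller waits before power fencing its peer.

    Assumption: the active controller fences immediately and the standby
    waits `standby_delay` seconds. The default value is made up.
    """
    if standby_delay <= 0:
        raise ValueError("standby_delay must be positive")
    return 0.0 if role is Role.ACTIVE else standby_delay


def fencing_order(standby_delay: float = 10.0) -> list:
    """List the roles in the order their fencing requests would fire."""
    return sorted(Role, key=lambda r: fence_delay_seconds(r, standby_delay))
```

With any positive delay, `fencing_order()` puts the active controller first, so a healthy active controller would fence the standby before the standby's delayed request is issued.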

In situations where remote fencing is not possible, we could also enhance the 
criteria for when a node should self-fence. The attached patch 
self_fencing.diff is a proposal for a mechanism that can detect when a 
controller node suddenly loses contact with all the other nodes in the cluster, 
in which case it will assume the fault is local and reboot itself.
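
One way such a self-fencing criterion could look is sketched below. It is not taken from self_fencing.diff; the function name `should_self_fence`, the `window` parameter and its default are assumptions, and "suddenly" is interpreted here as "all peers were lost within a short time window".

```python
def should_self_fence(loss_times, total_peers, window=2.0):
    """Decide whether a controller should assume a local fault and reboot.

    loss_times:  monotonic timestamps (seconds) at which contact with each
                 peer node was lost, one entry per lost peer.
    total_peers: number of other nodes that were in the cluster.
    window:      hypothetical limit on how far apart the losses may be
                 and still count as "sudden".
    """
    # A node with no peers, or one that still sees some peer, keeps running.
    if total_peers == 0 or len(loss_times) < total_peers:
        return False
    # All peers are gone; treat it as a local fault only if it happened at once.
    return max(loss_times) - min(loss_times) <= window
```

For example, `should_self_fence([10.0, 10.5, 11.0], total_peers=3)` returns `True`, while losing the same three peers over more than a minute does not trigger self-fencing under this assumption.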

