- **Milestone**: 5.19.07 --> future


---

**[tickets:#2936] imm: Select one from multiple headless partitioned cluster to join into one cluster**

**Status:** assigned
**Milestone:** future
**Created:** Thu Oct 04, 2018 08:47 AM UTC by Minh Hon Chau
**Last Updated:** Wed Jul 03, 2019 06:28 AM UTC
**Owner:** Vu Minh Nguyen


In the event of a network split that separates the nodes of a cluster into 
multiple partitions, each partition may have one SC or none. When the network 
merges back, the two SCs self-fence and reboot (current OpenSAF behavior), 
leaving multiple partitions as headless clusters. Multiple headless partitions 
also arise if, before the network merges back, the SC in each partition shuts 
down. Once an SC comes back, multiple headless clusters join into a single 
cluster, and these clusters conflict in terms of IMM data and AMF assignments.

To address this problem, this ticket introduces partition selection in IMM: 
IMM selects the payloads of only one partition to stay alive, and the payloads 
of all other partitions are rebooted when all nodes join into a single cluster.

To do that, each IMMND will hold two additional pieces of cluster-wide 
information: the node id at which its active IMMD is located, and a unique id 
sent by the active IMMD. Together, these let an IMMND tell whether it was in 
the same partition as the coord or in a different one. In particular, when an 
SC comes up from headless, one of the veteran IMMNDs is elected coord, and any 
IMMND whose global data differs from the coord's orders a reboot of its local 
node when it receives the intro rsp from the active IMMD.
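The decision each IMMND makes on receiving the intro rsp can be sketched as a simple comparison of the two remembered values against the coord's. This is a minimal illustration only, not the OpenSAF IMMND code; `PartitionInfo`, `active_immd_node` and `must_reboot` are hypothetical names, and node ids are shown as strings like `"SC1"` for readability.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartitionInfo:
    """Hypothetical record of the cluster-wide data an IMMND remembers."""
    active_immd_node: str  # node id where its active IMMD was located
    unique_id: int         # unique id sent by that active IMMD


def must_reboot(local: PartitionInfo, coord: PartitionInfo) -> bool:
    """Return True if this IMMND should order a reboot of its local node.

    Both fields must match the coord's; a difference in either one means
    the IMMND belonged to a different partition than the coord.
    """
    return local != coord


coord = PartitionInfo("SC1", 2222)
print(must_reboot(PartitionInfo("SC1", 2222), coord))  # False: same partition
print(must_reboot(PartitionInfo("SC1", 1111), coord))  # True: same SC, other partition
```

Comparing the node id alone would not be enough: in the example that follows, two partitions both had SC1 as their active SC and differ only in the unique id.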

For example:
Normal cluster: SC1, SC2, PL3, PL4, PL5, PL6, PL7, PL8
Split network first time: 
    P#1: PL3, PL4 (previously had SC1 as active SC, and unique id: 1111)
    P#2: SC1, SC2, PL5, PL6, PL7, PL8
Split network second time:
    P#1: PL3, PL4 (previously had SC1 as active SC, and unique id: 1111)
    P#2: SC1, PL5, PL6 (has SC1 as active SC, and unique id: 2222)
    P#3: SC2, PL7, PL8 (has SC2 as active SC, and unique id: 3333)
The network merges (both SCs reboot), or both SCs are shut down. Then SC1 comes 
up in the active role and elects the IMMND on PL5 as coord. When the IMMNDs on 
PL3, PL4, PL7 and PL8 request to sync data, their requests are rejected by the 
active IMMD and these nodes are rebooted afterward.
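The example can be replayed as a small simulation: each payload keeps the (active SC, unique id) pair from its last partition, and every payload whose pair differs from the coord's is selected for reboot. The data mirrors the second split above; `PartitionInfo` and `nodes_to_reboot` are hypothetical names used only for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartitionInfo:
    """Hypothetical (active SC node, unique id) pair remembered per IMMND."""
    active_immd_node: str
    unique_id: int


# Data remembered by each payload IMMND after the second split.
remembered = {
    "PL3": PartitionInfo("SC1", 1111),
    "PL4": PartitionInfo("SC1", 1111),
    "PL5": PartitionInfo("SC1", 2222),
    "PL6": PartitionInfo("SC1", 2222),
    "PL7": PartitionInfo("SC2", 3333),
    "PL8": PartitionInfo("SC2", 3333),
}


def nodes_to_reboot(remembered: dict, coord_node: str) -> list:
    """List (sorted) the nodes whose remembered data differs from the coord's."""
    coord_info = remembered[coord_node]
    return sorted(node for node, info in remembered.items() if info != coord_info)


print(nodes_to_reboot(remembered, "PL5"))  # ['PL3', 'PL4', 'PL7', 'PL8']
```

With the IMMND on PL5 as coord, only PL6 shares its partition data, so PL3, PL4, PL7 and PL8 are the nodes that get rebooted, matching the outcome described above.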
   


---

Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is 
subscribed to https://sourceforge.net/p/opensaf/tickets/
