This scenario relates to cluster startup. It is quite possible that an (old) 
node that went down is still in the process of going down (due to long 
application termination timeouts), and such a node could well be a controller 
node.

I think RDE should look out for the presence of such a slow (stranded?) node 
and fail the opensafd startup by returning failure to NID.
One way to detect such slow nodes is to make RDE subscribe to AMFD. If, during 
node startup, RDE decides to become ACTIVE but receives an AMFD MDS UP event 
from a node where no peer RDE is running, then RDE can return failure to NID.
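
A minimal sketch of that check, just to make the idea concrete. All type and 
function names here (rde_startup_t, rde_on_amfd_up, and so on) are hypothetical 
and not taken from the RDE code; the MDS subscription itself is left out and 
the sketch only models what RDE would do with the UP events it receives.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_NODES 8 /* hypothetical limit, only for this sketch */

/* Hypothetical per-node view that RDE would keep during startup. */
typedef struct {
    uint32_t node_id;
    bool peer_rde_up; /* an RDE MDS UP event was seen from this node */
    bool amfd_up;     /* an AMFD MDS UP event was seen from this node */
} node_view_t;

typedef struct {
    uint32_t own_node_id;
    node_view_t nodes[MAX_NODES];
    size_t node_count;
} rde_startup_t;

void rde_startup_init(rde_startup_t *s, uint32_t own_node_id)
{
    assert(s != NULL);
    s->own_node_id = own_node_id;
    s->node_count = 0;
}

/* Return the entry for node_id, creating it if needed; NULL if the table is full. */
static node_view_t *find_or_add(rde_startup_t *s, uint32_t node_id)
{
    for (size_t i = 0; i < s->node_count; i++) {
        if (s->nodes[i].node_id == node_id)
            return &s->nodes[i];
    }
    if (s->node_count == MAX_NODES)
        return NULL;
    node_view_t *n = &s->nodes[s->node_count++];
    n->node_id = node_id;
    n->peer_rde_up = false;
    n->amfd_up = false;
    return n;
}

/* Record an RDE MDS UP event from a remote node. */
void rde_on_rde_up(rde_startup_t *s, uint32_t node_id)
{
    node_view_t *n = find_or_add(s, node_id);
    if (n != NULL)
        n->peer_rde_up = true;
}

/* Record an AMFD MDS UP event from a remote node. */
void rde_on_amfd_up(rde_startup_t *s, uint32_t node_id)
{
    node_view_t *n = find_or_add(s, node_id);
    if (n != NULL)
        n->amfd_up = true;
}

/*
 * True if some other node still runs AMFD but has no RDE, i.e. it looks like
 * a controller that is still shutting down. In that case RDE would refuse to
 * become ACTIVE and report failure to NID.
 */
bool rde_stranded_controller_seen(const rde_startup_t *s)
{
    for (size_t i = 0; i < s->node_count; i++) {
        const node_view_t *n = &s->nodes[i];
        if (n->node_id != s->own_node_id && n->amfd_up && !n->peer_rde_up)
            return true;
    }
    return false;
}
```

RDE would call rde_stranded_controller_seen() at the point where it has 
decided to become ACTIVE. How long it should wait for a possible peer RDE UP 
before trusting the result is an open question that this sketch does not 
address.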



---

**[tickets:#2151] osaf: system is not in correct state during Act controller 
coming up**

**Status:** unassigned
**Milestone:** 5.2.FC
**Created:** Mon Oct 31, 2016 10:54 AM UTC by Nagendra Kumar
**Last Updated:** Tue Nov 01, 2016 10:49 AM UTC
**Owner:** nobody


Steps to reproduce:
1. Start two controllers (SC-1 Act, SC-2 Standby) and two payloads. Configure 
50 components on SC-2 and unlock them. Add a 1 sec delay to each component's 
stop script.
2. Stop SC-1 and, after that, stop SC-2.
3. While SC-2 is going down, start SC-1.

Observed behaviour:
Since the components take time to stop during 'opensafd stop' on SC-2, Amfnd 
hasn't exited. However, all middleware component assignments are stopped. Only 
Amfnd and Amfd are alive, with a few more components still to stop.
Meanwhile SC-1 has come up as far as Amfd, and since two Amfds are now Act, 
the SC-2 Amfd exits, saying "Duplicate ACTIVE detected, exiting".
Until then, the services, including Amfd, are in a bad state because they 
cannot tell whether this is a headless state or a failover. That is 
understandable, as the system is halfway between headless and failover.


Expected behaviour:
In my view:
FMS should stop and not proceed if the peer is going down, i.e. FMS on SC-1 
should figure out that the peer system is going down. It should allow SC-1 to 
proceed only if all services are down, i.e. it gets node down (maybe 
cb->immd_down && cb->immnd_down && cb->amfnd_down && cb->amfd_down && 
cb->fm_down).
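
Roughly, the condition could look like the sketch below. The struct is a 
hypothetical stand-in that carries only the five flags named above; the real 
FM control block and the place where these flags get set may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the FM control block, reduced to the peer-down flags. */
typedef struct {
    bool immd_down;
    bool immnd_down;
    bool amfnd_down;
    bool amfd_down;
    bool fm_down;
} fm_peer_state_t;

/*
 * True only when every tracked service on the peer controller is down,
 * i.e. the peer node as a whole can be treated as down.
 */
bool fm_peer_node_down(const fm_peer_state_t *cb)
{
    assert(cb != NULL);
    return cb->immd_down && cb->immnd_down && cb->amfnd_down &&
           cb->amfd_down && cb->fm_down;
}
```

In the scenario above, Amfnd and Amfd on SC-2 are still alive while the other 
services are gone, so fm_peer_node_down() would return false and SC-1 would 
not be allowed to proceed yet.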




