This scenario is related to cluster startup. It is quite possible that an old
node that went down is still in the process of going down (for example, because
of long application termination timeouts), and such a node could well be a
controller node.
I think RDE should look out for the presence of such a slow (stranded?) node
and should fail the opensafd startup by returning failure to NID.
One way to detect such slow nodes is to make RDE subscribe for AMFD. If, during
node startup, RDE decides to become ACTIVE but receives an AMFD MDS UP event
from a node where no peer RDE is running, then RDE can return failure to NID.
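
The check described above could be sketched roughly as follows. This is only an
illustration of the decision, not existing RDE code: the struct, the enum and
the function name rde_on_amfd_up() are all hypothetical, and the sketch assumes
RDE already tracks whether a peer RDE is up and on which node.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result codes; RDE would map FAIL to a failure reply to NID. */
typedef enum { RDE_STARTUP_OK = 0, RDE_STARTUP_FAIL = 1 } rde_startup_rc_t;

/* Hypothetical startup state; none of these names come from the OpenSAF tree. */
typedef struct {
    uint32_t own_node_id;      /* node this RDE is running on */
    bool     wants_active;     /* RDE has decided to become ACTIVE */
    bool     peer_rde_up;      /* a peer RDE is currently known to be up */
    uint32_t peer_rde_node_id; /* meaningful only when peer_rde_up is true */
} rde_startup_state_t;

/* Evaluate an AMFD MDS UP event received from amfd_node_id during startup. */
rde_startup_rc_t rde_on_amfd_up(const rde_startup_state_t *st,
                                uint32_t amfd_node_id)
{
    /* An AMFD on our own node is not a stranded remote controller. */
    if (amfd_node_id == st->own_node_id)
        return RDE_STARTUP_OK;

    /* The check only matters when this node is about to become ACTIVE. */
    if (!st->wants_active)
        return RDE_STARTUP_OK;

    /* A remote AMFD with a running peer RDE on the same node is normal. */
    if (st->peer_rde_up && st->peer_rde_node_id == amfd_node_id)
        return RDE_STARTUP_OK;

    /* Remote AMFD without a peer RDE: the node is still going down. */
    return RDE_STARTUP_FAIL;
}
```

With this shape, the stranded SC-2 case from the ticket (AMFD still alive, RDE
already gone) makes the starting node give up instead of going ACTIVE.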
---
** [tickets:#2151] osaf: system in not in correct state during Act controller
comming up**
**Status:** unassigned
**Milestone:** 5.2.FC
**Created:** Mon Oct 31, 2016 10:54 AM UTC by Nagendra Kumar
**Last Updated:** Tue Nov 01, 2016 10:49 AM UTC
**Owner:** nobody
Steps to reproduce:
1. Start two controllers (SC-1 Active, SC-2 Standby) and two payloads. Configure
50 components on SC-2 and unlock them. Add a 1-second delay to each component's
stop script.
2. Stop SC-1 and, after that, stop SC-2.
3. While SC-2 is going down, start SC-1.
Observed behaviour:
Because the components take a long time to stop during 'opensafd stop' on SC-2,
Amfnd hasn't exited. However, all middleware component assignments are stopped.
Only Amfnd and Amfd are alive, with a few more components left to stop.
Meanwhile SC-1 has come up as far as Amfd, and since two Amfd instances are now
Active, the SC-2 Amfd exits with "Duplicate ACTIVE detected, exiting".
Until then, the services, including Amfd, are in a bad state because they
cannot tell whether this is a headless state or a failover. That is accurate,
since the system is halfway between headless and failover.
Expected behaviour:
In my view, FMS on SC-1 should figure out that the peer system is going down
and should not proceed with startup while that is the case. It should allow
SC-1 to come up only when all services on the peer are down, i.e. when it gets
node down (maybe cb->immd_down && cb->immnd_down && cb->amfnd_down &&
cb->amfd_down && cb->fm_down).
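
As a minimal sketch of that condition: the five flag names are taken from the
expression above, but the struct peer_svc_cb_t and the function
peer_node_is_down() are hypothetical and only show how the flags might be
combined. How FMS would actually set each flag (for example from MDS DOWN
events) is not covered here.

```c
#include <stdbool.h>

/* Hypothetical control block holding one "down" flag per peer service. */
typedef struct {
    bool immd_down;
    bool immnd_down;
    bool amfnd_down;
    bool amfd_down;
    bool fm_down;
} peer_svc_cb_t;

/* The peer counts as down only when every tracked service is down. */
bool peer_node_is_down(const peer_svc_cb_t *cb)
{
    return cb->immd_down && cb->immnd_down && cb->amfnd_down &&
           cb->amfd_down && cb->fm_down;
}
```

In the reported scenario, amfnd_down and amfd_down would still be false on SC-2
while its components are stopping, so SC-1 would not be allowed to proceed.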