---
** [tickets:#2161] AMF: Unexpected assignment due to headless recovery steps**
**Status:** assigned
**Milestone:** 5.2.FC
**Labels:** headless recovery
**Created:** Thu Nov 03, 2016 02:44 AM UTC by Minh Hon Chau
**Last Updated:** Thu Nov 03, 2016 02:44 AM UTC
**Owner:** Minh Hon Chau
**Attachments:**
- [log.tgz](https://sourceforge.net/p/opensaf/tickets/2161/attachment/log.tgz)
(1.2 MB; application/x-compressed)
An si-swap test case, as follows:
- Issue the si-swap command; PL3 hosts SU1 with the Active assignment, PL4 hosts
SU2 with the Standby assignment
- Delay the QUIESCED csiSet callback on SU1
- Stop both SCs
- Release the QUIESCED csiSet callback on SU1
- Stop PL4
- Restart the SCs
- Observation: SU1 gets a csiRemove callback.
Below is a snippet of the AMFD trace where the problem happens:
Nov 3 12:13:49.902975 osafamfd [475:sgproc.cc:1045] >> avd_su_si_assign_evh:
id:2, node:2030f, act:5, 'safSu=1,safSg=1,safApp=osaftest', '', ha:3, err:1,
single:0
Nov 3 12:13:49.903450 osafamfd [475:sg_2n_fsm.cc:2354] >> susi_success:
'safSu=1,safSg=1,safApp=osaftest' act=5, hastate=3, TEST sg_fsm_state=2
Nov 3 12:13:49.903455 osafamfd [475:sg_2n_fsm.cc:1873] >>
susi_success_su_oper: 'safSu=1,safSg=1,safApp=osaftest' act=5, state=3
Nov 3 12:13:49.903459 osafamfd [475:sg_2n_fsm.cc:0477] >> avd_sg_2n_act_susi:
'safSg=1,safApp=osaftest'
...
Nov 3 12:13:49.903605 osafamfd [475:sg_2n_fsm.cc:0555] << avd_sg_2n_act_susi:
act: 'safSu=1,safSg=1,safApp=osaftest', stdby: 'safSu=2,safSg=1,safApp=osaftest'
Nov 3 12:13:49.903610 osafamfd [475:sgproc.cc:2358] >> avd_sg_su_si_del_snd:
'safSu=1,safSg=1,safApp=osaftest'
...
Nov 3 12:13:49.904102 osafamfd [475:sgproc.cc:2396] << avd_sg_su_si_del_snd
Nov 3 12:13:49.904107 osafamfd [475:sg.cc:1693] >> set_fsm_state
Nov 3 12:13:49.904112 osafamfd [475:sg.cc:1696] TR safSg=1,safApp=osaftest
sg_fsm_state 2 => 1
When AMFD receives the assignment response, SG_2N::susi_success_su_oper is called.
In the normal scenario, the next step is for SU2 to take over the Active
assignment, because SU2 is IN-SERVICE.
In the headless recovery scenario, SU2 has an absent SUSI, so SU2 is
OUT-OF-SERVICE and cannot take the Active assignment.
The root cause of this problem is that when an SU becomes OUT-OF-SERVICE
due to a node reboot, the situation should be handled in node_fail_su_oper,
where the assignment of the OUT-OF-SERVICE SU is removed. In other words,
susi_success_su_oper is not supposed to handle an SU that is OUT-OF-SERVICE
but still has an assignment. When susi_success_su_oper() is called, any SU
having an assignment must be IN-SERVICE.
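The precondition described above can be written down as a small check. This is only an illustrative sketch: MockSu, Readiness and su_oper_precondition_holds are hypothetical names that do not exist in AMFD, and the real SU and SUSI objects carry far more state.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified model of an SU (not the AMFD class).
enum class Readiness { InService, OutOfService };

struct MockSu {
  Readiness readiness;
  int num_assignments;  // number of SUSIs still attached to this SU
};

// Returns true when every SU that still holds an assignment is IN-SERVICE,
// i.e. the condition that susi_success_su_oper() relies on.
bool su_oper_precondition_holds(const std::vector<MockSu>& sus) {
  for (const MockSu& su : sus) {
    if (su.num_assignments > 0 && su.readiness != Readiness::InService) {
      return false;
    }
  }
  return true;
}
```

In the scenario of this ticket, SU2 is OUT-OF-SERVICE yet still has an (absent) SUSI, so a check of this kind would fail at the point where susi_success_su_oper() runs.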
The original cause lies in headless recovery, which currently gives pending
assignments priority over absent assignments: absent assignments are only
failed over when no assignment is in progress. This can be seen in cluster.cc:
    if (i_sg->any_assignment_in_progress() == false) {
      if (i_sg->any_assignment_absent() == false) {
        i_sg->set_fsm_state(AVD_SG_FSM_STABLE);
      } else {
        // failover with ABSENT SUSI, which had already
        // been removed during headless
        i_sg->failover_absent_assignment();
      }
    }
This works in most SG FSM states; however, there is a problem in SG_FSM_SU_OPER,
where the assignment needs to be moved over to another SU.
A solution is for AMFD to reverse the order of headless recovery, so that it
fails over absent assignments first. This change should also work, because the
same situation is already supported in the non-headless case, where a node
restarts while an assignment on another node is still in progress.
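The proposed reordering could look roughly like the sketch below. It is not the actual patch: MockSg is a hypothetical stand-in for the SG object, recover_after_headless is an invented function name, and the enum value AVD_SG_FSM_SU_OPER is assumed here for illustration. Only any_assignment_in_progress(), any_assignment_absent(), set_fsm_state(), failover_absent_assignment() and AVD_SG_FSM_STABLE are taken from the cluster.cc snippet above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the SG FSM states (only STABLE appears in the
// snippet; SU_OPER is assumed for illustration).
enum SgFsmState { AVD_SG_FSM_STABLE, AVD_SG_FSM_SU_OPER };

// Hypothetical mock of the SG object; it records which recovery calls were
// made instead of performing any real failover.
struct MockSg {
  bool in_progress;   // some assignment is still pending
  bool absent;        // some SUSI was lost while headless
  SgFsmState fsm_state;
  std::vector<std::string> actions;

  bool any_assignment_in_progress() const { return in_progress; }
  bool any_assignment_absent() const { return absent; }
  void set_fsm_state(SgFsmState s) {
    fsm_state = s;
    actions.push_back("set_fsm_state");
  }
  void failover_absent_assignment() {
    absent = false;  // assumption: the mock simply clears the absent flag
    actions.push_back("failover_absent_assignment");
  }
};

// Reversed order: absent assignments are failed over first, even while
// another assignment is still in progress. The SG is marked STABLE only
// when nothing is absent and nothing is pending.
void recover_after_headless(MockSg& sg) {
  if (sg.any_assignment_absent()) {
    sg.failover_absent_assignment();
  } else if (!sg.any_assignment_in_progress()) {
    sg.set_fsm_state(AVD_SG_FSM_STABLE);
  }
}
```

With this ordering, the case from this ticket (a pending QUIESCED response on SU1 plus an absent SUSI on SU2) reaches failover_absent_assignment() instead of being skipped because an assignment is in progress.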
---