This is not a crash. The problem is that the opensaf shutdown is timing 
out because amfnd could not clean up all the (proxied) components.
The trace "ER AMF director unexpectedly crashed" appears because opensafd 
kills AMFD after the opensaf shutdown times out.
The cleanup callback timeout for a proxied component is higher than the 
opensafd shutdown timeout.
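
To make the timing relationship concrete, here is a minimal sketch. The 
helper name and the values used with it are hypothetical and are not taken 
from OpenSAF; it only expresses the condition that the shutdown timer can 
fire while a proxied cleanup callback is still allowed to be pending.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper, not an OpenSAF API: reports whether the shutdown
// timer would expire while a proxied cleanup callback may still be pending.
inline bool shutdown_expires_first(std::chrono::seconds cleanup_cbk_timeout,
                                   std::chrono::seconds shutdown_timeout) {
    assert(cleanup_cbk_timeout.count() >= 0 && shutdown_timeout.count() >= 0);
    // If the shutdown budget is smaller than the callback timeout, the
    // shutdown can give up before the cleanup callback has timed out.
    return shutdown_timeout < cleanup_cbk_timeout;
}
```

When this returns true, a slow proxied cleanup alone is enough to make the 
shutdown time out, which matches what the logs show.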

The patch works fine if cleanup of the proxied component is successful. 
But the case of unsuccessful cleanup of a proxied component, in 
avnd_comp_clc_terming_cleanfail_hdler(), is not handled. Even if cleanup 
of the proxied component fails, AMFND should still clean up the proxy so 
that opensaf can shut down gracefully.
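
A rough sketch of the behaviour I have in mind, as a standalone toy model. 
All types and names here (Proxy, CleanupResult, on_proxied_cleanup_done) 
are hypothetical stand-ins and not the amfnd structures; the point is only 
that both the success and the failure path should release the proxy once 
the last proxied component is accounted for.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical stand-ins for illustration only, not OpenSAF types.
enum class CleanupResult { Success, Failed };

struct Proxy {
    std::set<std::string> pending_proxied;  // proxied comps with cleanup outstanding
    bool cleaned = false;                   // proxy cleanup has been started
    bool any_proxied_failed = false;        // remembered for later reporting
};

// Called when a proxied component's cleanup finishes, whether it succeeded
// or failed. Returns true if this call triggered cleanup of the proxy.
inline bool on_proxied_cleanup_done(Proxy& proxy, const std::string& name,
                                    CleanupResult result) {
    if (proxy.cleaned) return false;                  // proxy already handled
    if (proxy.pending_proxied.erase(name) == 0) return false;  // unknown comp
    if (result == CleanupResult::Failed) proxy.any_proxied_failed = true;

    // Failure must not block the proxy: once nothing is pending, clean it.
    if (proxy.pending_proxied.empty()) {
        proxy.cleaned = true;
        assert(proxy.pending_proxied.empty());
        return true;
    }
    return false;
}
```

In terms of the patch, this would correspond to doing in the cleanup 
failure handler the same "last proxied gone, now clean the proxy" step that 
the patch adds to the cleanup success handler.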


Thanks,
Praveen
On 02-May-14 10:15 PM, Alex Jones wrote:
>   osaf/services/saf/amf/amfnd/clc.cc |  27 +++++++++++++++++++++++++++
>   1 files changed, 27 insertions(+), 0 deletions(-)
>
>
> May  2 15:22:11 linux osafamfnd[2990]: NO 'safSu=Management-SU1,safSg=Management-2N,safApp=ManagementApp' Presence State INSTANTIATED => TERMINATION_FAILED
> May  2 15:22:38 linux osafamfnd[2990]: ER AMF director unexpectedly crashed
> May  2 15:22:38 linux osafamfnd[2990]: Rebooting OpenSAF NodeId = 131343 EE Name = safEE=Linux_os_hosting_clm_node,safHE=Stirling_Blade_slot_1,safDomain=Q50chassis, Reason: local AVD down(Adest) or both AVD down(Vdest) received, OwnNodeId = 131343, SupervisionTime = 60
> May  2 15:22:38 linux opensaf_reboot: Rebooting local node; timeout=60
>
> On OpenSAF shutdown, amfnd doesn't wait for the proxied components of a proxy
> to terminate before terminating the proxy.  It terminates the proxy via
> "CLC-CLI cleanup", and the proxied component via the
> SaAmfProxiedComponentCleanupCallbackT in the proxy.  If the proxied component
> takes some time to terminate, amfnd doesn't wait and terminates the proxy.
> amfnd never gets the response from SaAmfProxiedComponentCleanupCallbackT
> (because the proxy has been terminated), and thinks that the termination of
> the proxied component timed out.
>
> amfnd needs to delay terminating the proxy until all the proxy's proxied
> components have terminated.
>
> diff --git a/osaf/services/saf/amf/amfnd/clc.cc b/osaf/services/saf/amf/amfnd/clc.cc
> --- a/osaf/services/saf/amf/amfnd/clc.cc
> +++ b/osaf/services/saf/amf/amfnd/clc.cc
> @@ -1808,6 +1808,13 @@ uint32_t avnd_comp_clc_inst_clean_hdler(
>               avnd_comp_cbq_del(cb, comp, true);
>               /* call the cleanup callback */
>               rc = avnd_comp_cbk_send(cb, comp, AVSV_AMF_PXIED_COMP_CLEAN, 0, 0);
> +     } else if (m_AVND_COMP_TYPE_IS_PROXY(comp) && comp->pxied_list.n_nodes) {
> +             /*
> +              * If there are still outstanding proxied components we need to
> +              * wait for them to terminate first.
> +              */
> +             TRACE_LEAVE();
> +             return rc;
>       } else
>               /* cleanup the comp */
>               rc = avnd_comp_clc_cmd_execute(cb, comp, AVND_COMP_CLC_CMD_TYPE_CLEANUP);
> @@ -2081,6 +2088,26 @@ uint32_t avnd_comp_clc_terming_cleansucc
>               m_AVND_COMP_REG_PARAM_RESET(cb, comp);
>               m_AVND_SEND_CKPT_UPDT_ASYNC_UPDT(cb, comp, AVND_CKPT_COMP_CONFIG);
>       }
> +        else if (AVND_TERM_STATE_OPENSAF_SHUTDOWN_STARTED == cb->term_state) {
> +             /*
> +              * The proxied component has finished, so unregister it and
> +              * check if there are anymore proxied components left.  If not,
> +              * we can now terminate the proxy.
> +              */
> +             AVND_COMP *proxy = comp->pxy_comp;
> +             rc = avnd_comp_unreg_prc(cb, comp, proxy);
> +
> +             /*
> +              * avnd_comp_unreg_prc will remove the proxy label if there are
> +              * no more proxied components left.
> +              */
> +             if (rc == NCSCC_RC_SUCCESS && !m_AVND_COMP_TYPE_IS_PROXY(proxy))
> +             {
> +                     rc = avnd_comp_clc_fsm_run(avnd_cb,
> +                                     proxy,
> +                                     AVND_COMP_CLC_PRES_FSM_EV_CLEANUP);
> +             }
> +     }
>   
>       /* determine if this is a case of component failover */
>       if (m_AVND_COMP_IS_FAILED(comp) && m_AVND_SU_IS_FAILED(su) &&
>
>
> ------------------------------------------------------------------------------
> "Accelerate Dev Cycles with Automated Cross-Browser Testing - For FREE
> Instantly run your Selenium tests across 300+ browser/OS combos.  Get
> unparalleled scalability from the best Selenium testing platform available.
> Simple to use. Nothing to install. Get started now for free."
> http://p.sf.net/sfu/SauceLabs
> _______________________________________________
> Opensaf-devel mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/opensaf-devel

