Hi,
The attached patch works for this ticket. (Note: the AMF terminate callback has to be corrected for the directors as well; I will do that in a separate patch.)

Please note that when running this test for IMM, the immadm or amf-adm commands do not return to the command prompt, even though the command has functionally succeeded, i.e. IMM was successfully restarted. I suspect the reason is either that AMF is not sending the admin-op result to IMM, or that IMM is discarding the result. Neel/Nagendra, could you please confirm whether the issue (the response to the admin op) is in IMM or in AMF? See the snapshot below:

Jul 17 13:08:33 SC-2 osafamfnd[8169]: NO Admin restart requested for 'safComp=IMMND,safSu=SC-2,safSg=NoRed,safApp=OpenSAF'
Jul 17 13:08:33 SC-2 osafimmnd[8457]: NO Received AMF component terminate callback, exiting
Jul 17 13:08:33 SC-2 osafamfd[8159]: NO Re-initializing with IMM
Jul 17 13:08:33 SC-2 osafimmnd[8530]: Started
Jul 17 13:08:34 SC-2 osafimmnd[8530]: NO SERVER STATE: IMM_SERVER_ANONYMOUS --> IMM_SERVER_CLUSTER_WAITING
Jul 17 13:08:34 SC-2 osafimmnd[8530]: NO SERVER STATE: IMM_SERVER_CLUSTER_WAITING --> IMM_SERVER_LOADING_PENDING
Jul 17 13:08:34 SC-2 osafimmnd[8530]: NO SERVER STATE: IMM_SERVER_LOADING_PENDING --> IMM_SERVER_SYNC_PENDING
Jul 17 13:08:34 SC-2 osafimmnd[8530]: NO NODE STATE-> IMM_NODE_ISOLATED
Jul 17 13:08:35 SC-2 osafimmd[8101]: NO Ruling epoch noted as:10 on IMMD standby
Jul 17 13:08:35 SC-2 osafimmd[8101]: NO IMMND coord at 2010f
Jul 17 13:08:35 SC-2 osafimmnd[8530]: NO NODE STATE-> IMM_NODE_W_AVAILABLE
Jul 17 13:08:35 SC-2 osafimmnd[8530]: NO SERVER STATE: IMM_SERVER_SYNC_PENDING --> IMM_SERVER_SYNC_CLIENT
Jul 17 13:08:36 SC-2 osafimmnd[8530]: NO NODE STATE-> IMM_NODE_FULLY_AVAILABLE 2171
Jul 17 13:08:36 SC-2 osafimmnd[8530]: NO RepositoryInitModeT is SA_IMM_INIT_FROM_FILE
Jul 17 13:08:36 SC-2 osafimmnd[8530]: NO Epoch set to 10 in ImmModel
Jul 17 13:08:36 SC-2 immadm: IN Received PROC_STALE_CLIENTS
Jul 17 13:08:36 SC-2 osafimmd[8101]: NO SBY: New Epoch for IMMND process at node 2010f old epoch: 9 new epoch:10
Jul 17 13:08:36 SC-2 osafimmd[8101]: NO IMMND coord at 2010f
Jul 17 13:08:36 SC-2 osafimmd[8101]: NO SBY: New Epoch for IMMND process at node 2020f old epoch: 0 new epoch:10
Jul 17 13:08:36 SC-2 osafimmnd[8530]: NO Implementer connected: 33 (MsgQueueService131599) <283, 2020f>
Jul 17 13:08:36 SC-2 osafimmnd[8530]: NO SERVER STATE: IMM_SERVER_SYNC_CLIENT --> IMM SERVER READY
Jul 17 13:08:36 SC-2 osafimmnd[8530]: NO Implementer (applier) connected: 34 (@safLogService) <511, 2020f>
Jul 17 13:08:36 SC-2 osafimmnd[8530]: NO Implementer (applier) connected: 35 (@safAmfService2020f) <512, 2020f>
Jul 17 13:08:37 SC-2 osafamfd[8159]: NO Finished re-initializing with IMM

Thanks,
Mathi.

From: Mathi Naickan [mailto:[email protected]]
Sent: Tuesday, July 16, 2013 12:36 PM
To: [opensaf:tickets]
Subject: [opensaf:tickets] Re: #501 amf: No node directors register to AMF within time after "#7 cleanup instead of terminate used at component restart"

I checked the NDs. I think we should remove these (legacy) sleeps. Also, the exits should be styled like the daemon_exit()s. We also need to test such exits from the terminate callback for the directors as well, and consider special cases like NTF, where we ought to call the likes of stop_ntfimcn(). Will get back on this.

Thanks,
Mathi.

From: Praveen [mailto:[email protected]]
Sent: Monday, July 15, 2013 9:35 AM
To: [opensaf:tickets]
Subject: [opensaf:tickets] Re: #501 amf: No node directors register to AMF within time after "#7 cleanup instead of terminate used at component restart"

Can sleep(1) be added before giving the response to AMF?

Thanks,
Praveen

On 15-Jul-13 8:10 AM, Nagendra Kumar wrote:
There is no problem with AMF, as AMF is running the instantiate script for all the services (cpnd, glnd, mqnd, smfnd). The problem resides in these services, because they sleep for 1 second after giving the AMF response in the terminate callback.
Ex: cpnd_amf_comp_terminate_callback:

    saAmfResponse(cb->amf_hdl, invocation, saErr);
    ncshm_give_hdl(gl_cpnd_cb_hdl);
    sleep(1);
    LOG_NO("Received AMF component terminate callback, exiting");
    exit(0);

When the instantiate script is executed by AMF, the old process is still up and running (because of the 1-second sleep), so 'start_daemon -p $pidfile $binary $args' becomes ineffective and the process (e.g. cpnd) doesn't start. I tested by removing the sleep and everything worked as expected. So the other services are advised to find out why the 1-second sleep was introduced and how we can get rid of it.

[tickets:#501] http://sourceforge.net/p/opensaf/tickets/501/ amf: No node directors register to AMF within time after "#7 cleanup instead of terminate used at component restart"

Status: unassigned
Created: Thu Jul 11, 2013 07:47 AM UTC by Ingvar Bergström
Last Updated: Thu Jul 11, 2013 07:47 AM UTC
Owner: nobody

After the introduction of the patches solving "#7 cleanup instead of terminate used at component restart", no node director registers with AMF within time, according to the messages log. I have tried SMFND, CPND, GLND and MQND. It seems, however, that the main routines of the node director daemons are not started until 10 seconds after the terminate callback (after the registration timeout). It is very easy to see the fault by entering the command "amf-adm restart safComp=xxxND,safSu=SC-1,safSg=NoRed,safApp=OpenSAF"

Sent from sourceforge.net because [email protected] is subscribed to https://sourceforge.net/p/opensaf/tickets/
To unsubscribe from further messages, a project admin can change settings at https://sourceforge.net/p/opensaf/admin/tickets/options. Or, if this is a mailing list, you can unsubscribe from the mailing list.
501_osaf.patch
_______________________________________________ Opensaf-devel mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/opensaf-devel
