Ok I see. 
Currently resurrect is not done for handles with replies outstanding:


uint32_t imma_proc_resurrect_client(IMMA_CB *cb, SaImmHandleT immHandle,
                                    bool isOm, SaAisErrorT *err_resurrect)
{
.....

        if (cl_node->replyPending) {
                TRACE_4("Can not resurrect client with pending replies, client now exposed");
                /* Catches on-going async admin OM op as well as blocked calls */
                cl_node->exposed = true;
                goto failure;
        }

.....
}


This could be improved to some extent if the use case is important.
But the admin-owner handle used for the admin op must have release-on-finalize
set to false.
Otherwise the admin-owner allocations are torn down with the immd that is going
down.
That is intentional, to avoid lingering resurrectable handles blocking the system
by hogging resources that could otherwise not be deallocated until a future,
uncertain resurrect.
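
As a rough illustration of the check in the excerpt above, here is a simplified,
self-contained sketch. It is not the actual imma code: the struct, the function
name and the `stale` field are hypothetical stand-ins, and only `replyPending`
and `exposed` are taken from the excerpt.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the IMMA client node.
 * Only replyPending and exposed appear in the excerpt above;
 * the stale flag is an assumption made for this illustration. */
struct client_sketch {
    bool replyPending; /* a blocked call or async admin op awaits a reply */
    bool exposed;      /* the loss of the IMMND became visible to the client */
    bool stale;        /* handle lost its IMMND and awaits resurrection */
};

/* Returns true if the handle was resurrected, false otherwise.
 * A handle with an outstanding reply is never resurrected; it is
 * marked exposed instead and stays stale. */
bool try_resurrect_sketch(struct client_sketch *cl)
{
    if (cl->replyPending) {
        cl->exposed = true;
        return false;
    }
    cl->stale = false;
    return true;
}
```

With this model, a client that is blocked in an admin operation when the IMMND
goes down ends up exposed, while an idle client gets its handle back.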


/AndersBj

-----Original Message-----
From: Mathivanan Naickan Palanivelu [mailto:[email protected]] 
Sent: den 18 juli 2013 16:50
To: Anders Björnerstedt; Reddy Neelakanta Reddy Peddavandla; Praveen Malviya
Cc: [email protected]
Subject: RE: [devel] Early patch for #501 for review/testing (was Re: #501 amf: 
No node directors register to AMF within time after "#7 cleanup instead of 
terminate used at component restart")

> 
> You mean there was an immnd crash ?
> Resurrect only deals with crashes of the local immnd.
>

'Like' an immnd crash! In this case, the IMMND was restarted by the 'restart'
AMF admin operation on the IMMND 'component'.
The general use case is an admin restart of a standby or payload IMMND
triggered from any node in the cluster.
The particular case in this mail thread is an OM client on a standby
controller invoking the 'admin restart' command for the IMMND on the same
standby controller!
-Mathi.
 
> /AndersBj
> 
> -----Original Message-----
> From: Mathivanan Naickan Palanivelu [mailto:[email protected]]
> Sent: den 18 juli 2013 16:38
> To: Anders Björnerstedt; Reddy Neelakanta Reddy Peddavandla; Praveen 
> Malviya
> Cc: [email protected]
> Subject: RE: [devel] Early patch for #501 for review/testing (was Re: 
> #501
> amf: No node directors register to AMF within time after "#7 cleanup 
> instead of terminate used at component restart")
> 
> But, the OM client handle would have got resurrected here and that 
> must be the reason why the imm-adm client is waiting/blocked until it 
> eventually times out!?
> -Mathi.
> 
> 
> > -----Original Message-----
> > From: Anders Björnerstedt [mailto:[email protected]]
> > Sent: Thursday, July 18, 2013 6:03 PM
> > To: Mathivanan Naickan Palanivelu; Reddy Neelakanta Reddy 
> > Peddavandla; Praveen Malviya
> > Cc: [email protected]
> > Subject: RE: [devel] Early patch for #501 for review/testing (was Re:
> > #501
> > amf: No node directors register to AMF within time after "#7 cleanup 
> > instead of terminate used at component restart")
> >
> > > It's not as simple as that.
> > > In this case the invoking om-client has moved.
> > > Thus the reply is to be sent to a different om-handle (as seen by
> > > the immsv).
> >
> > /AndersBj
> >
> > -----Original Message-----
> > From: Mathivanan Naickan Palanivelu 
> > [mailto:[email protected]]
> > Sent: den 18 juli 2013 14:36
> > To: Anders Björnerstedt; Reddy Neelakanta Reddy Peddavandla; Praveen 
> > Malviya
> > Cc: [email protected]
> > Subject: RE: [devel] Early patch for #501 for review/testing (was Re:
> > #501
> > amf: No node directors register to AMF within time after "#7 cleanup 
> > instead of terminate used at component restart")
> >
> > Can't IMMSv subscribe for IMMND dests?
> > Thanks,
> > Mathi.
> >
> > > -----Original Message-----
> > > From: Anders Björnerstedt 
> > > [mailto:[email protected]]
> > > Sent: Thursday, July 18, 2013 5:30 PM
> > > To: Neelakanta Reddy; praveen malviya
> > > Cc: [email protected]
> > > Subject: Re: [devel] Early patch for #501 for review/testing (was Re:
> > > #501
> > > amf: No node directors register to AMF within time after "#7 
> > > cleanup instead of terminate used at component restart")
> > >
> > > > Sounds like you may have needed to use the "continuationId"
> > > > parameter to saImmOmAdminOperationInvoke().
> > > > Unfortunately this A.2.1 feature is not yet implemented in the immsv.
> > >
> > > https://sourceforge.net/p/opensaf/tickets/51/
> > >
> > >
> > > /AndersBj
> > >
> > >
> > > -----Original Message-----
> > > From: Neelakanta Reddy [mailto:[email protected]]
> > > Sent: den 18 juli 2013 13:41
> > > To: praveen malviya
> > > Cc: [email protected]
> > > Subject: Re: [devel] Early patch for #501 for review/testing (was Re:
> > > #501
> > > amf: No node directors register to AMF within time after "#7 
> > > cleanup instead of terminate used at component restart")
> > >
> > > > Hi Mathi/Praveen,
> > > >
> > > > I misunderstood the flow of the admin operation related to the component.
> > > >
> > > > After analyzing the logs, the following is the reason why the reply
> > > > cannot be sent:
> > > >
> > > > The admin operation to terminate the IMMND is invoked at the standby.
> > > > The implementer is the active amfd.
> > > >
> > > > The active amfd sends the admin operation result to the local active
> > > > IMMND. The active IMMND tries to send the result to the IMMND (standby)
> > > > where the admin operation was invoked, but the mds adest stored
> > > > in the active IMMND is the adest of the old IMMND (standby).
> > > >
> > > > Because of this, the following error message appears at the active
> > > > controller:
> > > >
> > > > ER Problem in sending to peer IMMND over MDS. Discarding admin op reply.
> > >
> > >
> > > Thanks,
> > > Neel.
> > > On Thursday 18 July 2013 04:52 PM, praveen malviya wrote:
> > > > Hi,
> > > > > For a restart admin operation on any component, AMFD sends an admin
> > > > > operation message to the corresponding AMFND.
> > > > > AMFND will restart the component. While the operation is in
> > > > > progress, the presence state of the component transitions from
> > > > > INSTANTIATED to RESTARTING and then from RESTARTING to INSTANTIATED.
> > > > > AMFND updates the presence state to AMFD whenever it changes, but AMFD
> > > > > will respond to IMM for the completion of the operation only when the
> > > > > component presence state becomes INSTANTIATED.
> > > >
> > > > Thanks,
> > > > Praveen.
> > > > On 17-Jul-13 7:09 PM, Neelakanta Reddy wrote:
> > > >> Hi Mathi,
> > > >>
> > > > >> After giving the terminate message to the local amfnd, amfd
> > > > >> immediately sends the admin operation result.
> > > > >>
> > > > >> The amfnd sends the message to the IMMND, and the IMMND processes it
> > > > >> in immnd_amf_comp_terminate_callback, which terminates the IMMND.
> > > > >> The admin operation result also arrives at the local IMMND. Since
> > > > >> the terminate callback is executed first, the IMMND does not
> > > > >> get the chance to process the admin operation result.
> > > > >>
> > > > >> The admin operation initiated for terminating the immnd will
> > > > >> eventually lead to TIMEOUT.
> > > >>
> > > >> Thanks,
> > > >> Neel.
> > > >>
> > > >>
> > > >> On Wednesday 17 July 2013 01:22 PM, Mathivanan Naickan 
> > > >> Palanivelu
> > > wrote:
> > > >>> Hi,
> > > >>>
> > > > >>> The attached patch works for this ticket. (Note: the AMF terminate
> > > > >>> callback has to be corrected for directors also; I will do that in
> > > > >>> a separate patch.)
> > > > >>>
> > > > >>> Please note that when running this test for IMM, the immadm or
> > > > >>> amf-adm commands do not return to the command prompt, even though
> > > > >>> the command had functionally succeeded, i.e. IMM got successfully
> > > > >>> restarted.
> > > > >>>
> > > > >>> I suspect that the reason could be either that AMF is not
> > > > >>> returning the admin-op result to IMM or that the result is being
> > > > >>> discarded by IMM.
> > > > >>>
> > > > >>> Neel/Nagendra, could you please confirm whether the
> > > > >>> issue (response to admin op) is with IMM or AMF?
> > > >>>
> > > >>> See snapshot below:
> > > >>>
> > > > >>> Jul 17 13:08:33 SC-2 osafamfnd[8169]: NO Admin restart requested for 'safComp=IMMND,safSu=SC-2,safSg=NoRed,safApp=OpenSAF'
> > > > >>> Jul 17 13:08:33 SC-2 osafimmnd[8457]: NO Received AMF component terminate callback, exiting
> > > > >>> Jul 17 13:08:33 SC-2 osafamfd[8159]: NO Re-initializing with IMM
> > > > >>> Jul 17 13:08:33 SC-2 osafimmnd[8530]: Started
> > > > >>> Jul 17 13:08:34 SC-2 osafimmnd[8530]: NO SERVER STATE: IMM_SERVER_ANONYMOUS --> IMM_SERVER_CLUSTER_WAITING
> > > > >>> Jul 17 13:08:34 SC-2 osafimmnd[8530]: NO SERVER STATE: IMM_SERVER_CLUSTER_WAITING --> IMM_SERVER_LOADING_PENDING
> > > > >>> Jul 17 13:08:34 SC-2 osafimmnd[8530]: NO SERVER STATE: IMM_SERVER_LOADING_PENDING --> IMM_SERVER_SYNC_PENDING
> > > > >>> Jul 17 13:08:34 SC-2 osafimmnd[8530]: NO NODE STATE-> IMM_NODE_ISOLATED
> > > > >>> Jul 17 13:08:35 SC-2 osafimmd[8101]: NO Ruling epoch noted as:10 on IMMD standby
> > > > >>> Jul 17 13:08:35 SC-2 osafimmd[8101]: NO IMMND coord at 2010f
> > > > >>> Jul 17 13:08:35 SC-2 osafimmnd[8530]: NO NODE STATE-> IMM_NODE_W_AVAILABLE
> > > > >>> Jul 17 13:08:35 SC-2 osafimmnd[8530]: NO SERVER STATE: IMM_SERVER_SYNC_PENDING --> IMM_SERVER_SYNC_CLIENT
> > > > >>> Jul 17 13:08:36 SC-2 osafimmnd[8530]: NO NODE STATE-> IMM_NODE_FULLY_AVAILABLE 2171
> > > > >>> Jul 17 13:08:36 SC-2 osafimmnd[8530]: NO RepositoryInitModeT is SA_IMM_INIT_FROM_FILE
> > > > >>> Jul 17 13:08:36 SC-2 osafimmnd[8530]: NO Epoch set to 10 in ImmModel
> > > > >>> Jul 17 13:08:36 SC-2 immadm: IN Received PROC_STALE_CLIENTS
> > > > >>> Jul 17 13:08:36 SC-2 osafimmd[8101]: NO SBY: New Epoch for IMMND process at node 2010f old epoch: 9  new epoch:10
> > > > >>> Jul 17 13:08:36 SC-2 osafimmd[8101]: NO IMMND coord at 2010f
> > > > >>> Jul 17 13:08:36 SC-2 osafimmd[8101]: NO SBY: New Epoch for IMMND process at node 2020f old epoch: 0  new epoch:10
> > > > >>> Jul 17 13:08:36 SC-2 osafimmnd[8530]: NO Implementer connected: 33 (MsgQueueService131599) <283, 2020f>
> > > > >>> Jul 17 13:08:36 SC-2 osafimmnd[8530]: NO SERVER STATE: IMM_SERVER_SYNC_CLIENT --> IMM SERVER READY
> > > > >>> Jul 17 13:08:36 SC-2 osafimmnd[8530]: NO Implementer (applier) connected: 34 (@safLogService) <511, 2020f>
> > > > >>> Jul 17 13:08:36 SC-2 osafimmnd[8530]: NO Implementer (applier) connected: 35 (@safAmfService2020f) <512, 2020f>
> > > > >>> Jul 17 13:08:37 SC-2 osafamfd[8159]: NO Finished re-initializing with IMM
> > > >>>
> > > >>> Thanks,
> > > >>>
> > > >>> Mathi.
> > > >>>
> > > >>> *From:*Mathi Naickan [mailto:[email protected]]
> > > >>> *Sent:* Tuesday, July 16, 2013 12:36 PM
> > > >>> *To:* [opensaf:tickets]
> > > >>> *Subject:* [opensaf:tickets] Re: #501 amf: No node directors 
> > > >>> register to AMF within time after "#7 cleanup instead of 
> > > >>> terminate used at component restart"
> > > >>>
> > > >>> I checked the NDs. I think we should remove these sleeps(legacy).
> > > >>>
> > > >>> Also, the exits should be styled like the daemon_exit()s.
> > > >>>
> > > >>> We also need to test such 'exit's from the terminatecallback 
> > > >>> for directors as well and consider special classes like NTF 
> > > >>> where we ought to
> > > >>>
> > > >>> call the likes of stop_ntfimcn().
> > > >>>
> > > >>> Will get back on this.
> > > >>>
> > > >>> Thanks,
> > > >>>
> > > >>> Mathi.
> > > >>>
> > > >>> From: Praveen [mailto:[email protected]]
> > > >>> Sent: Monday, July 15, 2013 9:35 AM
> > > >>> To: [opensaf:tickets]
> > > >>> Subject: [opensaf:tickets] Re: #501 amf: No node directors 
> > > >>> register to AMF within time after "#7 cleanup instead of 
> > > >>> terminate used at component restart"
> > > >>>
> > > >>> Can sleep(1) be added before giving response to AMF?
> > > >>>
> > > >>> Thanks
> > > >>> Praveen
> > > >>> On 15-Jul-13 8:10 AM, Nagendra Kumar wrote:
> > > >>>
> > > > >>> There is no problem with AMF, as amf is running the instantiate
> > > > >>> script for all the services (cpnd, glnd, mqnd, smfnd).
> > > > >>> The problem resides in these services, because each sleeps for
> > > > >>> 1 second after giving the amf response in the terminate callback.
> > > > >>> Ex:
> > > > >>> cpnd_amf_comp_terminate_callback
> > > > >>>
> > > > >>>     saAmfResponse(cb->amf_hdl, invocation, saErr);
> > > > >>>     ncshm_give_hdl(gl_cpnd_cb_hdl);
> > > > >>>     sleep(1);
> > > > >>>     LOG_NO("Received AMF component terminate callback, exiting");
> > > > >>>     exit(0);
> > > > >>>
> > > > >>> When the instantiate script is executed by amf, the process
> > > > >>> is still up and running (because of the 1 second sleep), so
> > > > >>> 'start_daemon -p $pidfile $binary $args' becomes ineffective
> > > > >>> and the process (e.g. cpnd) doesn't start.
> > > > >>>
> > > > >>> I tested by removing the sleep and all worked as expected.
> > > > >>>
> > > > >>> So the other services are advised to find out why the sleep of 1
> > > > >>> second was introduced and how we can get rid of it.
> > > >>>
> > > > >>> [tickets:#501] http://sourceforge.net/p/opensaf/tickets/501/ amf:
> > > > >>> No node directors register to AMF within time after "#7
> > > > >>> cleanup instead of terminate used at component restart"
> > > > >>>
> > > > >>> Status: unassigned
> > > > >>> Created: Thu Jul 11, 2013 07:47 AM UTC by Ingvar Bergström
> > > > >>> Last Updated: Thu Jul 11, 2013 07:47 AM UTC
> > > > >>> Owner: nobody
> > > > >>>
> > > > >>> After introduction of patches solving "#7 cleanup instead of
> > > > >>> terminate used at component restart", no node directors
> > > > >>> register to AMF within time according to messages log.
> > > > >>> I have tried SMFND, CPND, GLND and MQND.
> > > > >>>
> > > > >>> It seems however that the main routines of the node director
> > > > >>> daemons are not started until 10 seconds after the terminate
> > > > >>> callback (after the registration timeout).
> > > > >>>
> > > > >>> It is very easy to see the fault by entering command
> > > > >>> "amf-adm restart safComp=xxxND,safSu=SC-1,safSg=NoRed,safApp=OpenSAF"
> > > >>>
> > > > >>> Opensaf-tickets mailing list
> > > > >>> Opensaf-tickets@lists.sourceforge.net
> > > > >>> https://lists.sourceforge.net/lists/listinfo/opensaf-tickets
> > > >>>
> > > > >>> [tickets:#501] http://sourceforge.net/p/opensaf/tickets/501/ amf: No node
> > > > >>> directors register to AMF within time after "#7 cleanup
> > > > >>> instead of terminate used at component restart"
> > > > >>>
> > > > >>> Status: unassigned
> > > > >>> Created: Thu Jul 11, 2013 07:47 AM UTC by Ingvar Bergström
> > > > >>> Last Updated: Mon Jul 15, 2013 02:42 AM UTC
> > > > >>> Owner: nobody
> > > >>>
> > > >>> After introduction of patches solving "#7 cleanup instead of 
> > > >>> terminate used at component restart", no node directors 
> > > >>> registers to AMF within time according to messages log.
> > > >>> I have tried SMFND, CPND, GLND and MQND.
> > > >>>
> > > >>> It seems however that the main routines of the node director 
> > > >>> daemons are not started until 10 seconds after the terminate 
> > > >>> callback (after the registration timeout).
> > > >>>
> > > >>> It is very easy to see the fault by entering command "amf-adm 
> > > >>> restart safComp=xxxND,safSu=SC-
> 1,safSg=NoRed,safApp=OpenSAF"
> > > >>>
> > > >>> *_*
> > > >>>
> > > >>> Sent from sourceforge.net because you indicated interest in 
> > > >>> https://sourceforge.net/p/opensaf/tickets/501/
> > > >>>
> > > >>> To unsubscribe from further messages, please visit 
> > > >>> https://sourceforge.net/auth/subscriptions/
> > > >>>
> > > >>> --------------------------------------------------------------
> > > >>> --
> > > >>> --
> > > >>> --
> > > >>> ----
> > > >>>
> > > >>>
> > > >>> *[tickets:#501] 
> > > >>> <http://sourceforge.net/p/opensaf/tickets/501/>
> amf:
> > > >>> No node directors register to AMF within time after "#7 
> > > >>> cleanup instead of terminate used at component restart"*
> > > >>>
> > > >>> *Status:* unassigned
> > > >>> *Created:* Thu Jul 11, 2013 07:47 AM UTC by Ingvar Bergström 
> > > >>> *Last
> > > >>> Updated:* Mon Jul 15, 2013 02:42 AM UTC
> > > >>> *Owner:* nobody
> > > >>>
> > > >>> After introduction of patches solving "#7 cleanup instead of 
> > > >>> terminate used at component restart", no node directors 
> > > >>> registers to AMF within time according to messages log.
> > > >>> I have tried SMFND, CPND, GLND and MQND.
> > > >>>
> > > >>> It seems however that the main routines of the node director 
> > > >>> daemons are not started until 10 seconds after the terminate 
> > > >>> callback (after the registration timeout).
> > > >>>
> > > >>> It is very easy to see the fault by entering command "amf-adm 
> > > >>> restart safComp=xxxND,safSu=SC-
> 1,safSg=NoRed,safApp=OpenSAF"
> > > >>>
> > > >>> --------------------------------------------------------------
> > > >>> --
> > > >>> --
> > > >>> --
> > > >>> ----
> > > >>>
> > > >>>
> > > >>> Sent from sourceforge.net because you indicated interest in 
> > > >>> https://sourceforge.net/p/opensaf/tickets/501/
> > > >>>
> > > >>> To unsubscribe from further messages, please visit 
> > > >>> https://sourceforge.net/auth/subscriptions/
> > > >>>
> > > >> ---------------------------------------------------------------
> > > >> --
> > > >> --
> > > >> --
> > > >> ---------
> > > >>
> > > >> See everything from the browser to the database with 
> > > >> AppDynamics Get end-to-end visibility with application 
> > > >> monitoring from AppDynamics Isolate bottlenecks and diagnose root 
> > > >> cause in seconds.
> > > >> Start your free trial of AppDynamics Pro today!
> > > >>
> > >
> http://pubads.g.doubleclick.net/gampad/clk?id=48808831&iu=/4140/ostg.
> > > >> clktrk
> > > >>
> > > >> _______________________________________________
> > > >> Opensaf-devel mailing list
> > > >> [email protected]
> > > >> https://lists.sourceforge.net/lists/listinfo/opensaf-devel
> > > >
> > >
> > >
> > > ------------------------------------------------------------------
> > > --
> > > --
> > > -------- See everything from the browser to the database with 
> > > AppDynamics Get end-to-end visibility with application monitoring 
> > > from AppDynamics Isolate bottlenecks and diagnose root cause in
> seconds.
> > > Start your free trial of AppDynamics Pro today!
> > >
> http://pubads.g.doubleclick.net/gampad/clk?id=48808831&iu=/4140/ostg
> > > .c
> > > lk
> > > trk
> > > _______________________________________________
> > > Opensaf-devel mailing list
> > > [email protected]
> > > https://lists.sourceforge.net/lists/listinfo/opensaf-devel
> > >
> > > ------------------------------------------------------------------
> > > --
> > > --
> > > -------- See everything from the browser to the database with 
> > > AppDynamics Get end-to-end visibility with application monitoring 
> > > from AppDynamics Isolate bottlenecks and diagnose root cause in
> seconds.
> > > Start your free trial of AppDynamics Pro today!
> > >
> http://pubads.g.doubleclick.net/gampad/clk?id=48808831&iu=/4140/ostg
> > > .c
> > > lk
> > > trk
> > > _______________________________________________
> > > Opensaf-devel mailing list
> > > [email protected]
> > > https://lists.sourceforge.net/lists/listinfo/opensaf-devel

_______________________________________________
Opensaf-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-devel
