I've tracked this down to a race condition when using the ADMIN API. I don't think this has anything to do with dynamically created CSIs.
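To make the failure mode described in the next paragraph concrete, here is a toy model in C. It is only a sketch and not the amfnd code: the only identifiers taken from the real source are curr_assign_state and the ASSIGNED state, and every toy_* name and the other two states are made up for illustration. It assumes that removal acts only on CSIs the node believes it assigned, and that the SU-level removal completes only once every CSI has been removed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a per-CSI record on the node director.
 * Hypothetical names throughout; TOY_CSI_ASSIGNED plays the role of
 * AVND_COMP_CSI_ASSIGN_STATE_ASSIGNED mentioned in this thread. */
typedef enum {
    TOY_CSI_UNASSIGNED,
    TOY_CSI_ASSIGNED,
    TOY_CSI_REMOVED
} toy_csi_state;

typedef struct {
    const char *name;
    toy_csi_state curr_assign_state;
} toy_csi_rec;

/* Assumption: removal only acts on CSIs this side thinks it assigned.
 * Returns true if the CSI was removed, false if it was skipped. */
static bool toy_csi_remove(toy_csi_rec *csi)
{
    assert(csi != NULL);
    if (csi->curr_assign_state != TOY_CSI_ASSIGNED)
        return false; /* skipped: "never assigned" from this side */
    csi->curr_assign_state = TOY_CSI_REMOVED;
    return true;
}

/* Assumption: the SU-level removal is complete only when every CSI
 * has reached the REMOVED state. */
static bool toy_su_removal_done(const toy_csi_rec *csis, size_t n)
{
    assert(csis != NULL);
    for (size_t i = 0; i < n; i++) {
        if (csis[i].curr_assign_state != TOY_CSI_REMOVED)
            return false;
    }
    return true;
}
```

In this model, a record that starts in TOY_CSI_ASSIGNED is removed and the SU-level check succeeds. A record left in TOY_CSI_UNASSIGNED is skipped by toy_csi_remove(), so toy_su_removal_done() never returns true, which matches "Removing" being logged without a following "Removed".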
The problem seems to be in amfnd. If you have a component that takes a while to instantiate (in our case ~9 seconds), issue the "UNLOCK-INSTANTIATE" admin command for the SG, and then immediately issue the "UNLOCK" admin command (while the component is still instantiating), the amfnd "curr_assign_state" of avnd_comp_csi_rec on the STANDBY never gets set to AVND_COMP_CSI_ASSIGN_STATE_ASSIGNED, even though amfd has created all the proper SaAmfCSIAssignments in IMM. Then, when you issue "SHUTDOWN" from the admin API, amfnd on the standby doesn't remove the CSIs because it doesn't think it ever assigned them.

Does this ring any bells with anyone?

Alex

On 04/04/2014 06:22 PM, Alex Jones wrote:
> I'm seeing a problem when doing an administrative shutdown on an SG
> with dynamically created CSIs.
>
> I'm assuming that dynamically creating CSIs is supported...
>
> I have an N+1 service group with one component entirely in the imm.xml
> config file including CSIs. There is another component in this SG
> with all its config in the imm.xml except the SaAmfCSI description.
> These are added dynamically after the system is up and running.
>
> Adding these CSIs dynamically seems to be working OK. The active and
> standby assignments are given. I can administratively shutdown the
> active SU, and failover occurs properly, etc.
>
> The problem comes when I try to admin shutdown the SG. All the active
> components are correctly handled, but the standby fails.
> I see this in the log for the standby SU:
>
> Apr 4 21:16:57 linux osafamfnd[4542]: NO Removing 'all (5) SIs' from
> 'safSu=DDD-SU1,safSg=DDD-Np1,safApp=DDDApp'
> Apr 4 21:16:57 linux osafamfnd[4542]: NO Removing
> 'safSi=DDD-Np1-SI-1,safApp=DDDApp' from
> 'safSu=DDD-SU1,safSg=DDD-Np1,safApp=DDDApp'
> Apr 4 21:16:57 linux osafamfnd[4542]: NO Removing
> 'safSi=DDD-Np1-SI-2,safApp=DDDApp' from
> 'safSu=DDD-SU1,safSg=DDD-Np1,safApp=DDDApp'
> Apr 4 21:16:57 linux osafamfnd[4542]: NO Removing
> 'safSi=DDD-Np1-SI-3,safApp=DDDApp' from
> 'safSu=DDD-SU1,safSg=DDD-Np1,safApp=DDDApp'
> Apr 4 21:16:57 linux osafamfnd[4542]: NO Removing
> 'safSi=DDD-Np1-SI-4,safApp=DDDApp' from
> 'safSu=DDD-SU1,safSg=DDD-Np1,safApp=DDDApp'
> Apr 4 21:16:57 linux osafamfnd[4542]: NO Removing
> 'safSi=DDD-Np1-SI-5,safApp=DDDApp' from
> 'safSu=DDD-SU1,safSg=DDD-Np1,safApp=DDDApp'
>
> But, I never see "Removed". It looks like it never finishes. Then any
> other admin operation on this SG results in:
>
> Apr 4 21:19:58 linux osafamfd[4761]: WA SG not in STABLE state
> (safSg=DDD-Np1,safApp=DDDApp)
>
> After doing the admin shutdown the SaAmfCSIAssignment entries for the
> standby SU that failed still exist in IMM. All other
> SaAmfCSIAssignment entries for the active SUs have been deleted, and I
> see this in the osafamfd log on the active controller. I never see
> deletes for the standby.
>
> Apr 4 21:16:57.143902 osafamfd [4761:avd_csi.c:1042] TR Deleting
> safCSIComp=safComp=DddManager\,safSu=DDD-SU5\,safSg=DDD-Np1\,safApp=DDDApp,safCsi=DddManager,safSi=DDD-Np1-SI-4,safApp=DDDApp
>
> It seems like it's a problem with the standby SU.
>
> I know that the components are getting the csiRemove callback, so it
> doesn't seem to be a component issue.
>
> I'll continue looking into this, but any help would be appreciated.
>
> Thanks!
>
> Alex

_______________________________________________
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel