Hi Greg,
There have been many fixes in this area since 4.3.1. I am
listing a few below; please check whether they work for you. If they don't,
please share the syslogs, configuration details and amfnd traces (if possible).
changeset: 4631:7e1a0b3b34c0
branch: opensaf-4.3.x
parent: 4628:83a60fd1b846
user: Nagendra Kumar<[email protected]>
date: Wed Nov 27 15:55:31 2013 +0530
summary: amfd: enable NPI component oper state during admin repair [#182]
changeset: 5119:921e3788932a
branch: opensaf-4.3.x
parent: 5116:f3705e8f90fc
user: Nagendra Kumar<[email protected]>
date: Tue Apr 08 13:37:44 2014 +0530
summary: amfd: return BAD_OP for repair operation for su hosted on absent node [#826]
changeset: 4946:bb7ca5f5c62e
branch: opensaf-4.3.x
parent: 4932:cf600cf778a3
user: [email protected]
date: Tue Feb 11 14:52:47 2014 +0530
summary: amfnd : fix cleanup of assigned comp in shutdown [#767]
changeset: 4918:6a543c603988
branch: opensaf-4.3.x
parent: 4914:ebff2decec5c
user: Nagendra Kumar<[email protected]>
date: Fri Feb 07 12:49:08 2014 +0530
summary: amfnd: correct npi su term failure handling during opensaf shutdown [#765]
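
After moving to a build that contains these changesets, one way to re-check
the SU is to look at its presence state in the amf-state output. The snippet
below is only a minimal sketch: su_presence_state and su_is_failed are
hypothetical helper names (they are not OpenSAF tools), and the
TERMINATION-FAILED spelling is assumed by analogy with the
INSTANTIATION-FAILED value shown in the quoted output below.

```shell
#!/bin/sh
# Hypothetical helpers, not part of OpenSAF.

# Read `amf-state su all <dn>` output on stdin and print the presence
# state name without the numeric suffix, e.g. INSTANTIATION-FAILED.
su_presence_state() {
    sed -n 's/^[[:space:]>]*saAmfSUPresenceState=\([A-Z-]*\).*/\1/p'
}

# Return 0 (true) when the SU on stdin is in a failed presence state.
# TERMINATION-FAILED is an assumed spelling; adjust to what your build prints.
su_is_failed() {
    case "$(su_presence_state)" in
        INSTANTIATION-FAILED|TERMINATION-FAILED) return 0 ;;
        *) return 1 ;;
    esac
}

# Sample taken from the output quoted in this thread.
sample='safSu=active,safSg=active,safApp=App
saAmfSUAdminState=UNLOCKED(1)
saAmfSUOperState=ENABLED(1)
saAmfSUPresenceState=INSTANTIATION-FAILED(6)
saAmfSUReadinessState=IN-SERVICE(2)'

if printf '%s\n' "$sample" | su_is_failed; then
    echo "SU still failed"
else
    echo "SU recovered"
fi
```

On a live node the sample would be replaced by piping the real command into
the helper, for example:
amf-state su all safSu=active,safSg=active,safApp=App | su_is_failed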
Thanks
-Nagu
> -----Original Message-----
> From: Greg Hurlman [mailto:[email protected]]
> Sent: 22 August 2014 11:18
> To: [email protected]
> Subject: [users] Admin commands timeout
>
> Hi,
>
> I have a service unit that transitioned to a failed state during
> unlock-in/unlock, and further admin commands, including
> lock/lock-in/repaired, are failing. This prevents me from stopping the
> OpenSAF service on the node: subsequent admin commands time out, and
> stopping the opensafd service triggers a node reboot after 1 minute. Even
> after the node reboot, the SU remains in the same state. How do I get out
> of this loop?
>
> I have seen many other cases where admin commands time out when an SU
> transitions to a failed state as a result of a contained component's
> instantiation failure, which prevents repair of these SUs. Since the SU
> is hosted on the controller node, this makes things even worse, as I
> have to bring down the whole cluster.
>
> OpenSAF version used: 4.3.1
>
> [root@sc2-active ~]# amf-state su all safSu=active,safSg=active,safApp=App
> safSu=active,safSg=active,safApp=App
> saAmfSUAdminState=UNLOCKED(1)
> saAmfSUOperState=ENABLED(1)
> saAmfSUPresenceState=INSTANTIATION-FAILED(6)
> saAmfSUReadinessState=IN-SERVICE(2)
> [root@sc2-active ~]#
>
> /var/log/messages:
>
> Aug 19 09:19:04 sc2-active abrtd: Init complete, entering main loop
> Aug 19 09:21:03 sc2-active opensafd: Stopping OpenSAF Services
> Aug 19 09:21:03 sc2-active osafamfnd[1451]: NO Shutdown initiated
> Aug 19 09:21:03 sc2-active osafamfnd[1451]: NO Waiting for 'safSi=active,safApp=App' (state 2)
> Aug 19 09:22:03 sc2-active osafamfnd[1451]: ER AMF director unexpectedly crashed
> Aug 19 09:22:03 sc2-active osafamfnd[1451]: Rebooting OpenSAF NodeId = 131343 EE Name = , Reason: local AVD down(Adest) or both AVD down(Vdest) received, OwnNodeId = 131343, SupervisionTime = 60
> Aug 19 09:22:03 sc2-active osafimmnd[1386]: NO Implementer locally disconnected. Marking it as doomed 20 <9, 2010f> (@safAmfService2010f)
> Aug 19 09:22:03 sc2-active osafimmnd[1386]: NO Implementer disconnected 20 <9, 2010f> (@safAmfService2010f)
> Aug 19 09:22:03 sc2-active opensaf_reboot: Rebooting local node; timeout=60
>
> Any help is much appreciated.
>
> Thanks
> Greg
> ------------------------------------------------------------------------------
> Slashdot TV.
> Video for Nerds. Stuff that matters.
> http://tv.slashdot.org/
> _______________________________________________
> Opensaf-users mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/opensaf-users