Hi Folks,

Running a system with 2 controllers and 20+ payload cards on OpenSAF 4.3, I am
getting controller reboots that look similar to tickets 946/955 (safmsgnd: on
payload restart, saImmOiImplementerSet FAILED with 14):

2014-07-24T15:33:56.793734+00:00 scm2 osafimmnd[2293]: WA ERR_BAD_HANDLE: 
Handle use is blocked by pending reply on syncronous call
2014-07-24T15:33:56.793811+00:00 scm2 osafimmnd[2293]: NO Implementer locally 
disconnected. Marking it as doomed 3 <18, 1100f> (safAmfService)
2014-07-24T15:33:56.794381+00:00 scm2 osafamfd[2462]: WA saImmOiRtObjectUpdate 
of 'safSu=SCM2,safSg=2N,safApp=OpenSAF' saAmfSURestartCount failed with 9
2014-07-24T15:33:56.794846+00:00 scm2 osafimmnd[2293]: WA ERR_BAD_HANDLE: 
Client 77309480975 not found in server
2014-07-24T15:33:56.795427+00:00 scm2 osafamfd[2462]: WA saImmOiRtObjectUpdate 
of 'safSu=SCM2,safSg=2N,safApp=OpenSAF' saAmfSUNumCurrStandbySIs failed with 9
2014-07-24T15:33:56.795881+00:00 scm2 osafimmnd[2293]: WA ERR_BAD_HANDLE: 
Client 77309480975 not found in server
2014-07-24T15:33:56.796381+00:00 scm2 osafamfd[2462]: WA saImmOiRtObjectUpdate 
of 'safSu=SCM2,safSg=2N,safApp=OpenSAF' saAmfSUNumCurrActiveSIs failed with 9
2014-07-24T15:33:56.797709+00:00 scm2 osafimmnd[2293]: WA ERR_BAD_HANDLE: 
Client 77309480975 not found in server
2014-07-24T15:33:56.829827+00:00 scm2 osafamfd[2462]: NO Re-initializing with 
IMM
2014-07-24T15:33:56.830303+00:00 scm2 osafimmnd[2293]: WA IMMND - Client Node 
Get Failed for cli_hdl 77309480975
2014-07-24T15:33:56.845047+00:00 scm2 osafamfd[2462]: ER saImmOiImplementerSet 
failed 14
2014-07-24T15:33:56.845157+00:00 scm2 osafamfd[2462]: ER exiting since 
avd_imm_impl_set failed
2014-07-24T15:33:56.853047+00:00 scm2 osafamfnd[3093]: ER AMF director 
unexpectedly crashed
2014-07-24T15:33:56.853120+00:00 scm2 osafamfnd[3093]: Rebooting OpenSAF NodeId 
= 69647 EE Name = , Reason: local AVD down(Adest) or both AVD down(Vdest) 
received, OwnNodeId = 69647, SupervisionTime = 0

It doesn't happen frequently, but it is triggered by a shutdown/power cycle of
a payload card.

While pushing the problem around, I found that lowering the TIPC timeout
(normally set to 10 seconds to account for our network behavior) to 4-5
seconds either avoids the failure or reduces its likelihood enough that it
didn't happen despite my attempts to reproduce it.

I'm not yet familiar enough with the code to have a handle on the details, but
is it reasonable to believe the shorter TIPC timeout forces a cleaner
detection of, and recovery from, the communication loss, thus avoiding the
error? If that is the case, might it be reasonable to increase the synchronous
message timeout as a workaround until there's a proper fix?
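Another thought, purely as a sketch: since the 14 here is presumably
SA_AIS_ERR_EXIST (the old implementer name still registered as "doomed" until
IMMND cleans it up), could the director retry the implementer-set for a
bounded period instead of exiting on the first failure? Below is a
self-contained illustration of that retry shape; the real call would be
saImmOiImplementerSet(), which I've stubbed out here (stub_implementer_set,
set_with_retry, and the counts are all hypothetical, not OpenSAF code):

```c
#include <stdio.h>

/* AIS return values as defined by the SA Forum spec. */
#define SA_AIS_OK         1
#define SA_AIS_ERR_EXIST 14

static int failures_left;  /* stub state: how many times to fail first */

/* Stand-in for saImmOiImplementerSet(): fails with SA_AIS_ERR_EXIST a
 * given number of times (old name not yet garbage-collected), then
 * succeeds. */
static int stub_implementer_set(void) {
    if (failures_left > 0) {
        failures_left--;
        return SA_AIS_ERR_EXIST;
    }
    return SA_AIS_OK;
}

/* Retry while the previous registration is still being cleaned up,
 * rather than exiting the director on the first EXIST. Returns 0 on
 * success, or the last AIS error code after giving up. */
int set_with_retry(int fail_count) {
    failures_left = fail_count;
    const int max_tries = 10;
    for (int i = 0; i < max_tries; i++) {
        int rc = stub_implementer_set();
        if (rc == SA_AIS_OK)
            return 0;
        if (rc != SA_AIS_ERR_EXIST)
            return rc;  /* non-retryable error: give up immediately */
        /* in a real daemon: sleep/reschedule here before retrying */
    }
    return SA_AIS_ERR_EXIST;
}
```

No idea yet whether the director's event loop makes that practical, but it
would decouple the recovery from the TIPC timing.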

Thanks for your insight,
-andy

