Hi,

Please share the complete syslog from both controllers.

What is the TIPC link tolerance currently configured on the setup?
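
A minimal sketch of the timing hypothesis raised in the quoted message below. It assumes (not confirmed from the OpenSAF code) that a synchronous IMM call blocked on a lost peer stays blocked until either TIPC declares the link down or the call's own timeout expires, whichever comes first. The names Timeouts, detection_wins and min_sync_timeout are hypothetical, and the 10-second synchronous-call timeout used in the example is an illustrative assumption. The real value on the setup should be checked; in OpenSAF it is believed to be tunable through the IMMA_SYNCR_TIMEOUT environment variable of the IMM agent, so see the IMM README for its units and default.

```python
from dataclasses import dataclass


@dataclass
class Timeouts:
    """Hypothetical timing model, all values in seconds."""
    link_tolerance_s: float     # TIPC link tolerance: time before a lost peer is declared down
    sync_call_timeout_s: float  # client-side timeout of a synchronous IMM call (assumed fixed)


def detection_wins(t: Timeouts, margin_s: float = 1.0) -> bool:
    """True if link loss is detected at least margin_s before a blocked
    synchronous call would give up (tolerance + margin <= sync timeout).

    Assumption: a call waiting on a lost peer stays blocked until the link
    is declared down or the call times out, whichever happens first.
    """
    return t.link_tolerance_s + margin_s <= t.sync_call_timeout_s


def min_sync_timeout(link_tolerance_s: float, margin_s: float = 1.0) -> float:
    """Smallest sync-call timeout (seconds) that keeps detection ahead in this model."""
    return link_tolerance_s + margin_s


if __name__ == "__main__":
    # Compare the tolerances mentioned in the thread against an assumed 10 s sync timeout.
    for tol in (10.0, 5.0, 4.0):
        print(tol, detection_wins(Timeouts(tol, 10.0)))
```

Under this model, a 10-second tolerance never leaves room ahead of a 10-second synchronous timeout, while 4-5 seconds does. That matches the observation in the quoted message, but it is only a model and should be confirmed against the full syslogs.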


/Neel.

On Friday 25 July 2014 11:29 PM, Andrew Riley wrote:
> Hi Folks,
>
> Running a system with 2 controllers and 20+ payload cards on OpenSAF 4.3,
> I am getting controller reboots that look similar to tickets 946/955
> ("safmsgnd :On payloads restart saImmOiImplementerSet FAILED with 14"):
>
> 2014-07-24T15:33:56.793734+00:00 scm2 osafimmnd[2293]: WA ERR_BAD_HANDLE: Handle use is blocked by pending reply on syncronous call
> 2014-07-24T15:33:56.793811+00:00 scm2 osafimmnd[2293]: NO Implementer locally disconnected. Marking it as doomed 3 <18, 1100f> (safAmfService)
> 2014-07-24T15:33:56.794381+00:00 scm2 osafamfd[2462]: WA saImmOiRtObjectUpdate of 'safSu=SCM2,safSg=2N,safApp=OpenSAF' saAmfSURestartCount failed with 9
> 2014-07-24T15:33:56.794846+00:00 scm2 osafimmnd[2293]: WA ERR_BAD_HANDLE: Client 77309480975 not found in server
> 2014-07-24T15:33:56.795427+00:00 scm2 osafamfd[2462]: WA saImmOiRtObjectUpdate of 'safSu=SCM2,safSg=2N,safApp=OpenSAF' saAmfSUNumCurrStandbySIs failed with 9
> 2014-07-24T15:33:56.795881+00:00 scm2 osafimmnd[2293]: WA ERR_BAD_HANDLE: Client 77309480975 not found in server
> 2014-07-24T15:33:56.796381+00:00 scm2 osafamfd[2462]: WA saImmOiRtObjectUpdate of 'safSu=SCM2,safSg=2N,safApp=OpenSAF' saAmfSUNumCurrActiveSIs failed with 9
> 2014-07-24T15:33:56.797709+00:00 scm2 osafimmnd[2293]: WA ERR_BAD_HANDLE: Client 77309480975 not found in server
> 2014-07-24T15:33:56.829827+00:00 scm2 osafamfd[2462]: NO Re-initializing with IMM
> 2014-07-24T15:33:56.830303+00:00 scm2 osafimmnd[2293]: WA IMMND - Client Node Get Failed for cli_hdl 77309480975
> 2014-07-24T15:33:56.845047+00:00 scm2 osafamfd[2462]: ER saImmOiImplementerSet failed 14
> 2014-07-24T15:33:56.845157+00:00 scm2 osafamfd[2462]: ER exiting since avd_imm_impl_set failed
> 2014-07-24T15:33:56.853047+00:00 scm2 osafamfnd[3093]: ER AMF director unexpectedly crashed
> 2014-07-24T15:33:56.853120+00:00 scm2 osafamfnd[3093]: Rebooting OpenSAF NodeId = 69647 EE Name = , Reason: local AVD down(Adest) or both AVD down(Vdest) received, OwnNodeId = 69647, SupervisionTime = 0
>
> It doesn't happen frequently, but it is triggered by shutting down or
> power-cycling a payload card.
>
> While experimenting, I found that lowering the TIPC timeout (normally set to
> 10 seconds to account for our network behavior) to 4-5 seconds avoids the
> failure, or at least makes it rare enough that I could not reproduce it
> despite repeated attempts.
>
> I'm not yet familiar enough with the code to understand the details, but is
> it reasonable to believe that the shorter TIPC timeout forces cleaner
> detection of, and recovery from, the communication loss, thus avoiding the
> error? If so, would increasing the synchronous message timeout be a
> reasonable workaround until there is a fix?
>
> Thanks for your insight,
> -andy
>
>
> _______________________________________________
> Opensaf-users mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/opensaf-users

