Hi,

I have been trying to start OpenSAF in 2N redundancy mode on Fedora 17 (linux-3.4.44 kernel). I started the first node, which came up normally as ACTIVE, and then the second node, which initially came up normally as STANDBY. However, after some time the IMMD on the STANDBY node lost contact with the peer IMMD, and that node switched to ACTIVE as well. Both nodes then continued to run independently as ACTIVE, i.e. the cluster ended up in a split-brain.
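Since the trouble starts with MDS reporting NCSMDS_RED_DOWN (the standby losing the transport link to the active controller), it may be worth ruling out plain network loss between the two SCs around that time. Below is a minimal sketch of such a check, assuming the TCP transport via osafdtmd; <peer-ip> and eth0 are placeholders for the other controller's address (its DTM_NODE_IP in dtmd.conf, if I am reading that file correctly) and for the interface carrying the cluster traffic:

    # basic IP reachability to the peer system controller
    ping -c 3 <peer-ip>

    # is there still an established TCP session to the peer's dtmd?
    ss -tn | grep <peer-ip>

    # watch the wire around the time NCSMDS_RED_DOWN is logged
    tcpdump -ni eth0 host <peer-ip>

If connectivity looks fine, comparing the keepalive/timeout settings in dtmd.conf on both nodes might also be worthwhile.

The debug messages from fedora1 (the node that came up as STANDBY) are as follows: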
Jun 20 17:01:40 fedora1 osafdtmd[7709]: Started
Jun 20 17:01:40 fedora1 osafrded[7729]: Started
Jun 20 17:01:40 fedora1 osafrded[7729]: rde@2020f has active state => Standby role
Jun 20 17:01:40 fedora1 osaffmd[7745]: Started
Jun 20 17:01:40 fedora1 osafimmd[7761]: Started
Jun 20 17:01:40 fedora1 osafimmd[7761]: Received IMMD service event
Jun 20 17:01:40 fedora1 osafimmd[7761]: Received IMMD service event
Jun 20 17:01:40 fedora1 osafimmd[7761]: Received IMMD service event
Jun 20 17:01:40 fedora1 osafimmnd[7778]: Started
Jun 20 17:01:40 fedora1 osafimmnd[7778]: Director Service is up
Jun 20 17:01:40 fedora1 osafimmnd[7778]: SERVER STATE: IMM_SERVER_ANONYMOUS --> IMM_SERVER_CLUSTER_WAITING
Jun 20 17:01:40 fedora1 osafimmnd[7778]: SERVER STATE: IMM_SERVER_CLUSTER_WAITING --> IMM_SERVER_LOADING_PENDING
Jun 20 17:01:40 fedora1 osafimmnd[7778]: SERVER STATE: IMM_SERVER_LOADING_PENDING --> IMM_SERVER_SYNC_PENDING
Jun 20 17:01:40 fedora1 osafimmnd[7778]: NODE STATE-> IMM_NODE_ISOLATED
Jun 20 17:01:41 fedora1 osafimmd[7761]: Ruling epoch noted as:3 on IMMD standby
Jun 20 17:01:41 fedora1 osafimmd[7761]: IMMND coord at 2020f
Jun 20 17:01:41 fedora1 osafimmnd[7778]: NODE STATE-> IMM_NODE_W_AVAILABLE
Jun 20 17:01:41 fedora1 osafimmnd[7778]: SERVER STATE: IMM_SERVER_SYNC_PENDING --> IMM_SERVER_SYNC_CLIENT
Jun 20 17:01:46 fedora1 osafimmnd[7778]: NODE STATE-> IMM_NODE_FULLY_AVAILABLE 1900
Jun 20 17:01:46 fedora1 osafimmnd[7778]: RepositoryInitModeT is SA_IMM_INIT_FROM_FILE
Jun 20 17:01:46 fedora1 osafimmnd[7778]: Epoch set to 3 in ImmModel
Jun 20 17:01:46 fedora1 osafimmd[7761]: SBY: New Epoch for IMMND process at node 2020f old epoch: 2 new epoch:3
Jun 20 17:01:46 fedora1 osafimmd[7761]: IMMND coord at 2020f
Jun 20 17:01:46 fedora1 osafimmd[7761]: SBY: New Epoch for IMMND process at node 2010f old epoch: 0 new epoch:3
Jun 20 17:01:46 fedora1 osafimmnd[7778]: SERVER STATE: IMM_SERVER_SYNC_CLIENT --> IMM SERVER READY
Jun 20 17:01:46 fedora1 osaflogd[7800]: Started
Jun 20 17:01:46 fedora1 osafntfd[7817]: Started
Jun 20 17:01:46 fedora1 osafclmd[7834]: Started
Jun 20 17:01:46 fedora1 osafclmna[7851]: Started
Jun 20 17:01:46 fedora1 osafclmna[7851]: safNode=fedora1,safCluster=myClmCluster Joined cluster, nodeid=2010f
Jun 20 17:01:46 fedora1 osafamfd[7867]: Started
Jun 20 17:01:47 fedora1 osafimmnd[7778]: Implementer (applier) connected: 6 (@safAmfService2010f) <7, 2010f>
Jun 20 17:01:47 fedora1 osafamfnd[7885]: Started
Jun 20 17:01:47 fedora1 osafamfnd[7885]: 'safSu=SC-1,safSg=NoRed,safApp=OpenSAF' Presence State UNINSTANTIATED => INSTANTIATING
Jun 20 17:01:47 fedora1 osafamfnd[7885]: 'safSu=SC-1,safSg=2N,safApp=OpenSAF' Presence State UNINSTANTIATED => INSTANTIATING
Jun 20 17:01:47 fedora1 osafamfwd[7948]: Started
Jun 20 17:01:48 fedora1 osafckptnd[7988]: Started
Jun 20 17:01:48 fedora1 osafevtd[8008]: Started
Jun 20 17:01:48 fedora1 osafamfnd[7885]: 'safSu=SC-1,safSg=NoRed,safApp=OpenSAF' Presence State INSTANTIATING => INSTANTIATED
Jun 20 17:01:48 fedora1 osafckptnd[7988]: cpnd amf hlth chk start failed
Jun 20 17:01:48 fedora1 osafckptd[8042]: Started
Jun 20 17:01:48 fedora1 osafckptd[8042]: cpd health check start failed
Jun 20 17:01:48 fedora1 osafamfnd[7885]: 'safSu=SC-1,safSg=2N,safApp=OpenSAF' Presence State INSTANTIATING => INSTANTIATED
Jun 20 17:01:50 fedora1 osafamfnd[7885]: Assigning 'safSi=NoRed1,safApp=OpenSAF' ACTIVE to 'safSu=SC-1,safSg=NoRed,safApp=OpenSAF'
Jun 20 17:01:50 fedora1 osafamfnd[7885]: Assigned 'safSi=NoRed1,safApp=OpenSAF' ACTIVE to 'safSu=SC-1,safSg=NoRed,safApp=OpenSAF'
Jun 20 17:01:51 fedora1 osafamfd[7867]: Cold sync complete!
Jun 20 17:01:52 fedora1 osafamfnd[7885]: Assigning 'safSi=SC-2N,safApp=OpenSAF' STANDBY to 'safSu=SC-1,safSg=2N,safApp=OpenSAF'
Jun 20 17:01:52 fedora1 osafamfnd[7885]: Assigned 'safSi=SC-2N,safApp=OpenSAF' STANDBY to 'safSu=SC-1,safSg=2N,safApp=OpenSAF'
Jun 20 17:01:54 fedora1 osafimmnd[7778]: Director Service in NOACTIVE state
Jun 20 17:01:54 fedora1 osafimmd[7761]: Received IMMD service event
Jun 20 17:01:54 fedora1 osafimmd[7761]: Received IMMD service event
Jun 20 17:01:54 fedora1 osafimmd[7761]: IMMD lost contact with peer IMMD (NCSMDS_RED_DOWN)
Jun 20 17:01:54 fedora1 osaffmd[7745]: Role: STANDBY, Node Down for node id: 2020f
Jun 20 17:01:54 fedora1 osaffmd[7745]: Rebooting OpenSAF NodeId = 131599 EE Name = , Reason: Received Node Down for Active peer
Jun 20 17:01:54 fedora1 osafimmnd[7778]: DISCARD DUPLICATE FEVS message:1034
Jun 20 17:01:54 fedora1 osafimmnd[7778]: Error code 2 returned for message type 57 - ignoring
Jun 20 17:01:54 fedora1 osafimmnd[7778]: DISCARD DUPLICATE FEVS message:1035
Jun 20 17:01:54 fedora1 osafimmnd[7778]: Error code 2 returned for message type 57 - ignoring
Jun 20 17:01:54 fedora1 osafimmd[7761]: IMMND DOWN on active controller f2 detected at standby immd!! f1. Possible failover
Jun 20 17:01:54 fedora1 osafimmd[7761]: Skipping re-send of fevs message 1034 since it has recently been resent.
Jun 20 17:01:54 fedora1 osafimmd[7761]: Skipping re-send of fevs message 1035 since it has recently been resent.
Jun 20 17:01:54 fedora1 osafimmnd[7778]: Global discard node received for nodeId:2020f pid:7664
Jun 20 17:01:54 fedora1 osafimmnd[7778]: Implementer disconnected 5 <0, 2020f(down)> (safEvtService)
Jun 20 17:01:54 fedora1 osafimmnd[7778]: Implementer disconnected 4 <0, 2020f(down)> (safCheckPointService)
Jun 20 17:01:54 fedora1 osafimmnd[7778]: Implementer disconnected 3 <0, 2020f(down)> (safAmfService)
Jun 20 17:01:54 fedora1 osafimmnd[7778]: Implementer disconnected 2 <0, 2020f(down)> (safClmService)
Jun 20 17:01:54 fedora1 osafimmnd[7778]: Implementer disconnected 1 <0, 2020f(down)> (safLogService)
Jun 20 17:01:54 fedora1 opensaf_reboot: Rebooting remote node in the absence of PLM is outside the scope of OpenSAF
Jun 20 17:01:54 fedora1 osafrded[7729]: rde_rde_set_role: role set to 1
Jun 20 17:01:54 fedora1 osafimmd[7761]: ACTIVE request
Jun 20 17:01:54 fedora1 osaflogd[7800]: ACTIVE request
Jun 20 17:01:54 fedora1 osafntfd[7817]: ACTIVE request
Jun 20 17:01:54 fedora1 osafclmd[7834]: ACTIVE request
Jun 20 17:01:54 fedora1 osafamfd[7867]: FAILOVER StandBy --> Active
Jun 20 17:01:54 fedora1 osafimmnd[7778]: Director Service Is NEWACTIVE state
Jun 20 17:01:54 fedora1 osafimmd[7761]: New coord elected, resides at 2010f
Jun 20 17:01:54 fedora1 osafimmnd[7778]: This IMMND is now the NEW Coord
Jun 20 17:01:54 fedora1 osafimmd[7761]: Received IMMD service event
Jun 20 17:01:54 fedora1 osafimmd[7761]: Received IMMD service event
Jun 20 17:01:54 fedora1 osafamfnd[7885]: Assigning 'safSi=SC-2N,safApp=OpenSAF' ACTIVE to 'safSu=SC-1,safSg=2N,safApp=OpenSAF'
Jun 20 17:01:54 fedora1 osafimmnd[7778]: Implementer connected: 10 (safCheckPointService) <232, 2010f>

Thanks and Regards,
Aditya Sahay
