Yes, the patch is included in 4.3.2.

/Neel.

On Wednesday 05 March 2014 06:22 PM, Tony Hart wrote:
> Thanks Neel. Is this fix in 4.3.2?
>
> On Mar 5, 2014, at 7:42 AM, Neelakanta Reddy
> <[email protected]> wrote:
>
>> Hi,
>>
>> A similar problem is fixed in
>> http://sourceforge.net/p/opensaf/tickets/600/.
>> The patch is pushed in changeset: 4688 for 4.3.x.
>>
>> Apply the patch and retest.
>>
>> If you still see the problem, please share the following logs:
>>
>> 1. amfd and amfnd traces of controllers and the payload
>>
>> 2. syslog of controllers and payload.
>>
>> 3. mds.log for controllers and payload.
>>
>> /Neel.
>>
>>
>> On Wednesday 05 March 2014 05:25 PM, Tony Hart wrote:
>>>
>>> 5 seconds
>>>
>>> The payload card gets the TIPC timeout logs, but it does not reboot.
>>> This may be timing related, since the link re-establishes quickly
>>> after going down (you can see from the logs that the link
>>> re-established within the same second of going down).
>>>
>>> On Mar 5, 2014, at 6:51 AM, Neelakanta Reddy
>>> <[email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> What is the configured TIPC link tolerance time?
>>>> Depending on the tolerance time, the other node will get service down.
>>>>
>>>> /Neel.
>>>>
>>>> On Tuesday 04 March 2014 08:53 PM, Tony Hart wrote:
>>>>> We’re seeing a problem where there is a loss of connectivity between a
>>>>> payload (cmm02B) and the controller (the connectivity returns but is away
>>>>> just long enough to trigger a TIPC timeout). In this case the payload is
>>>>> dropped from the cluster but doesn’t restart. The payload is
>>>>> flagged as not being in the cluster and its presence state is
>>>>> UNINSTANTIATED. It’s still running the osaf processes though.
>>>>> >>>>> Is this something that’s been fixed in the current release (we’re running >>>>> 4.3.1) >>>>> >>>>> $ immlist safNode=cmm02b,safCluster=myClmCluster >>>>> Name Type Value(s) >>>>> ======================================================================== >>>>> safNode SA_STRING_T >>>>> safNode=cmm02b >>>>> saClmNodeLockCallbackTimeout SA_TIME_T >>>>> 50000000000 (0xba43b7400, Thu Jan 1 00:00:50 1970) >>>>> saClmNodeIsMember SA_UINT32_T 0 (0x0) >>>>> saClmNodeInitialViewNumber SA_UINT64_T 28 (0x1c) >>>>> saClmNodeID SA_UINT32_T 73743 >>>>> (0x1200f) >>>>> saClmNodeEE SA_NAME_T <Empty> >>>>> saClmNodeDisableReboot SA_UINT32_T 0 (0x0) >>>>> saClmNodeCurrAddressFamily SA_UINT32_T <Empty> >>>>> saClmNodeCurrAddress SA_STRING_T <Empty> >>>>> saClmNodeBootTimeStamp SA_TIME_T >>>>> 1393879646000000000 (0x13580e27277a2c00, Mon Mar 3 20:47:26 2014) >>>>> saClmNodeAdminState SA_UINT32_T 1 (0x1) >>>>> saClmNodeAddressFamily SA_UINT32_T <Empty> >>>>> saClmNodeAddress SA_STRING_T <Empty> >>>>> SaImmAttrImplementerName SA_STRING_T >>>>> safClmService >>>>> SaImmAttrClassName SA_STRING_T SaClmNode >>>>> SaImmAttrAdminOwnerName SA_STRING_T IMMLOADER >>>>> >>>>> >>>>> $ immlist $(amf-find node | grep CMM02B) >>>>> Name Type Value(s) >>>>> ======================================================================== >>>>> safAmfNode SA_STRING_T >>>>> safAmfNode=CMM02B >>>>> saAmfNodeSuFailoverMax SA_UINT32_T 2 (0x2) >>>>> saAmfNodeSuFailOverProb SA_TIME_T >>>>> 1200000000000 (0x1176592e000, Thu Jan 1 00:20:00 1970) >>>>> saAmfNodeOperState SA_UINT32_T 2 (0x2) >>>>> saAmfNodeFailfastOnTerminationFailure SA_UINT32_T 0 (0x0) >>>>> saAmfNodeFailfastOnInstantiationFailure SA_UINT32_T 0 (0x0) >>>>> saAmfNodeClmNode SA_NAME_T >>>>> safNode=cmm02b,safCluster=myClmCluster (38) >>>>> saAmfNodeCapacity SA_STRING_T <Empty> >>>>> saAmfNodeAutoRepair SA_UINT32_T 1 (0x1) >>>>> saAmfNodeAdminState SA_UINT32_T 1 (0x1) >>>>> SaImmAttrImplementerName SA_STRING_T >>>>> safAmfService >>>>> 
SaImmAttrClassName SA_STRING_T SaAmfNode >>>>> SaImmAttrAdminOwnerName SA_STRING_T IMMLOADER >>>>> >>>>> >>>>> cmm02b$ ps aux | grep osaf >>>>> root 1417 0.0 0.0 225880 2028 ? Ssl Mar03 0:08 >>>>> /usr/lib64/opensaf/osafamfnd osafamfnd >>>>> root 1429 0.0 0.0 157100 1416 ? Ssl Mar03 0:00 >>>>> /usr/lib64/opensaf/osafsmfnd osafsmfnd >>>>> opensaf 1438 0.0 0.1 174256 5764 ? Ssl Mar03 0:00 >>>>> /usr/lib64/opensaf/osafmsgnd osafmsgnd >>>>> opensaf 1454 0.0 0.0 155732 1448 ? Ssl Mar03 0:00 >>>>> /usr/lib64/opensaf/osaflcknd osaflcknd >>>>> opensaf 1463 0.0 0.0 158148 2296 ? Ssl Mar03 0:00 >>>>> /usr/lib64/opensaf/osafckptnd osafckptnd >>>>> opensaf 1472 0.0 0.0 155020 1392 ? Ssl Mar03 0:02 >>>>> /usr/lib64/opensaf/osafamfwd osafamfwd >>>>> opensaf 4704 0.0 0.3 182240 11992 ? Ssl 14:20 0:01 >>>>> /usr/lib64/opensaf/osafimmnd osafimmnd >>>>> >>>>> >>>>> >>>>> SCM1 (1.1.15) >>>>> ------------------- >>>>> 2014-03-04T14:20:18.808187+00:00 scm1 osafamfd[1771]: NO Node 'PLD0211' >>>>> left the cluster >>>>> 2014-03-04T14:20:18.851318+00:00 scm1 kernel: TIPC: Established link >>>>> <1.1.15:eth2-1.1.27:bond0> on network plane A >>>>> 2014-03-04T14:20:18.852749+00:00 scm1 osafsmfd[1965]: ER >>>>> saClmClusterNodeGet failed, rc=SA_AIS_ERR_NOT_EXIST (12) >>>>> 2014-03-04T14:20:18.858472+00:00 scm1 kernel: TIPC: Established link >>>>> <1.1.15:eth2-1.1.23:bond0> on network plane A >>>>> 2014-03-04T14:20:18.871084+00:00 scm1 osafsmfd[1965]: ER >>>>> saClmClusterNodeGet failed, rc=SA_AIS_ERR_NOT_EXIST (12) >>>>> 2014-03-04T14:20:18.956307+00:00 scm1 kernel: TIPC: Resetting link >>>>> <1.1.15:eth2-1.1.32:eth2>, peer not responding >>>>> 2014-03-04T14:20:18.956330+00:00 scm1 kernel: TIPC: Lost link >>>>> <1.1.15:eth2-1.1.32:eth2> on network plane A >>>>> 2014-03-04T14:20:18.956335+00:00 scm1 kernel: TIPC: Lost contact with >>>>> <1.1.32> >>>>> 2014-03-04T14:20:18.956340+00:00 scm1 kernel: TIPC: Established link >>>>> <1.1.15:eth2-1.1.32:eth2> on network plane A >>>>> 
2014-03-04T14:20:18.958227+00:00 scm1 osafimmnd[1667]: NO Global discard >>>>> node received for nodeId:1200f pid:1347 >>>>> 2014-03-04T14:20:18.958270+00:00 scm1 osafimmnd[1667]: NO Implementer >>>>> disconnected 51 <0, 1200f(down)> (MsgQueueService73743) >>>>> 2014-03-04T14:20:18.965240+00:00 scm1 osafimmnd[1667]: NO Implementer >>>>> connected: 71 (MsgQueueService73743) <92377, 10f0f> >>>>> 2014-03-04T14:20:18.968251+00:00 scm1 osafimmnd[1667]: NO Implementer >>>>> locally disconnected. Marking it as doomed 71 <92377, 10f0f> >>>>> (MsgQueueService73743) >>>>> 2014-03-04T14:20:18.971785+00:00 scm1 osafimmnd[1667]: NO Global discard >>>>> node received for nodeId:1170f pid:0 >>>>> 2014-03-04T14:20:18.973013+00:00 scm1 osafimmnd[1667]: NO Global discard >>>>> node received for nodeId:1200f pid:0 >>>>> 2014-03-04T14:20:18.976586+00:00 scm1 osafimmnd[1667]: NO Implementer >>>>> disconnected 71 <92377, 10f0f> (MsgQueueService73743) >>>>> 2014-03-04T14:20:19.025760+00:00 scm1 osafimmd[1657]: NO Node 11e0f >>>>> request sync sync-pid:23769 epoch:0 >>>>> 2014-03-04T14:20:19.076427+00:00 scm1 osafamfd[1771]: NO Node 'PLD0214' >>>>> left the cluster >>>>> 2014-03-04T14:20:19.215220+00:00 scm1 osafimmd[1657]: NO Node 11c0f >>>>> request sync sync-pid:23629 epoch:0 >>>>> 2014-03-04T14:20:19.296817+00:00 scm1 osafamfd[1771]: WA >>>>> avd_msg_sanity_chk: invalid node ID (11e0f) >>>>> 2014-03-04T14:20:19.300899+00:00 scm1 osafamfd[1771]: WA >>>>> avd_msg_sanity_chk: invalid node ID (11e0f) >>>>> 2014-03-04T14:20:19.305377+00:00 scm1 osafamfd[1771]: NO Node 'CMM02B' >>>>> left the cluster >>>>> 2014-03-04T14:20:19.357458+00:00 scm1 osafimmd[1657]: NO Node 1200f >>>>> request sync sync-pid:4704 epoch:0 >>>>> >>>>> >>>>> cmm02B (1.1.32) >>>>> ----------------------- >>>>> 2014-03-04T14:20:18.495174+00:00 cmm02b kernel: TIPC: Resetting link >>>>> <1.1.32:eth2-1.1.10:bond0>, peer not responding >>>>> 2014-03-04T14:20:18.495203+00:00 cmm02b kernel: TIPC: Lost link >>>>> 
<1.1.32:eth2-1.1.10:bond0> on network plane A >>>>> 2014-03-04T14:20:18.495209+00:00 cmm02b kernel: TIPC: Lost contact with >>>>> <1.1.10> >>>>> 2014-03-04T14:20:18.501981+00:00 cmm02b kernel: TIPC: Resetting link >>>>> <1.1.32:eth2-1.1.15:eth2>, peer not responding >>>>> 2014-03-04T14:20:18.502012+00:00 cmm02b kernel: TIPC: Lost link >>>>> <1.1.32:eth2-1.1.15:eth2> on network plane A >>>>> 2014-03-04T14:20:18.502016+00:00 cmm02b kernel: TIPC: Lost contact with >>>>> <1.1.15> >>>>> 2014-03-04T14:20:18.502020+00:00 cmm02b kernel: TIPC: Resetting link >>>>> <1.1.32:eth2-1.1.11:bond0>, peer not responding >>>>> 2014-03-04T14:20:18.502023+00:00 cmm02b kernel: TIPC: Lost link >>>>> <1.1.32:eth2-1.1.11:bond0> on network plane A >>>>> 2014-03-04T14:20:18.502026+00:00 cmm02b kernel: TIPC: Lost contact with >>>>> <1.1.11> >>>>> 2014-03-04T14:20:18.502110+00:00 cmm02b kernel: TIPC: Resetting link >>>>> <1.1.32:eth2-1.1.1:bond0>, peer not responding >>>>> 2014-03-04T14:20:18.502115+00:00 cmm02b kernel: TIPC: Lost link >>>>> <1.1.32:eth2-1.1.1:bond0> on network plane A >>>>> 2014-03-04T14:20:18.502118+00:00 cmm02b kernel: TIPC: Lost contact with >>>>> <1.1.1> >>>>> 2014-03-04T14:20:18.549154+00:00 cmm02b kernel: TIPC: Resetting link >>>>> <1.1.32:eth2-1.1.14:bond0>, peer not responding >>>>> 2014-03-04T14:20:18.549180+00:00 cmm02b kernel: TIPC: Lost link >>>>> <1.1.32:eth2-1.1.14:bond0> on network plane A >>>>> 2014-03-04T14:20:18.549184+00:00 cmm02b kernel: TIPC: Lost contact with >>>>> <1.1.14> >>>>> 2014-03-04T14:20:18.671107+00:00 cmm02b kernel: TIPC: Established link >>>>> <1.1.32:eth2-1.1.14:bond0> on network plane A >>>>> 2014-03-04T14:20:18.743482+00:00 cmm02b kernel: TIPC: Established link >>>>> <1.1.32:eth2-1.1.11:bond0> on network plane A >>>>> 2014-03-04T14:20:18.866277+00:00 cmm02b kernel: TIPC: Established link >>>>> <1.1.32:eth2-1.1.10:bond0> on network plane A >>>>> 2014-03-04T14:20:18.869280+00:00 cmm02b kernel: TIPC: Established link >>>>> 
<1.1.32:eth2-1.1.1:bond0> on network plane A >>>>> 2014-03-04T14:20:18.954740+00:00 cmm02b kernel: TIPC: Established link >>>>> <1.1.32:eth2-1.1.15:eth2> on network plane A >>>>> 2014-03-04T14:20:18.959226+00:00 cmm02b osafimmnd[1347]: WA MESSAGE:38632 >>>>> OUT OF ORDER my highest processed:38600, exiting >>>>> 2014-03-04T14:20:18.967269+00:00 cmm02b osafamfnd[1417]: NO >>>>> 'safComp=IMMND,safSu=CMM02B,safSg=NoRed,safApp=OpenSAF' faulted due to >>>>> 'avaDown' : Recovery is 'componentRestart' >>>>> 2014-03-04T14:20:19.052569+00:00 cmm02b osafimmnd[4704]: Started >>>>> 2014-03-04T14:20:19.157393+00:00 cmm02b osafimmnd[4704]: NO SERVER STATE: >>>>> IMM_SERVER_ANONYMOUS --> IMM_SERVER_CLUSTER_WAITING >>>>> 2014-03-04T14:20:19.257835+00:00 cmm02b osafimmnd[4704]: NO SERVER STATE: >>>>> IMM_SERVER_CLUSTER_WAITING --> IMM_SERVER_LOADING_PENDING >>>>> 2014-03-04T14:20:19.358134+00:00 cmm02b osafimmnd[4704]: NO SERVER STATE: >>>>> IMM_SERVER_LOADING_PENDING --> IMM_SERVER_SYNC_PENDING >>>>> 2014-03-04T14:20:19.358452+00:00 cmm02b osafimmnd[4704]: NO NODE STATE-> >>>>> IMM_NODE_ISOLATED >>>>> 2014-03-04T14:20:19.955686+00:00 cmm02b osafimmnd[4704]: NO NODE STATE-> >>>>> IMM_NODE_W_AVAILABLE >>>>> 2014-03-04T14:20:20.022473+00:00 cmm02b osafimmnd[4704]: NO SERVER STATE: >>>>> IMM_SERVER_SYNC_PENDING --> IMM_SERVER_SYNC_CLIENT >>>>> 2014-03-04T14:20:26.925158+00:00 cmm02b kernel: TIPC: Resetting link >>>>> <1.1.32:eth2-1.1.31:eth2>, peer not responding >>>>> 2014-03-04T14:20:26.925184+00:00 cmm02b kernel: TIPC: Lost link >>>>> <1.1.32:eth2-1.1.31:eth2> on network plane A >>>>> 2014-03-04T14:20:26.925191+00:00 cmm02b kernel: TIPC: Lost contact with >>>>> <1.1.31> >>>>> 2014-03-04T14:20:27.893115+00:00 cmm02b kernel: TIPC: Resetting link >>>>> <1.1.32:eth2-1.1.27:bond0>, peer not responding >>>>> 2014-03-04T14:20:27.893148+00:00 cmm02b kernel: TIPC: Lost link >>>>> <1.1.32:eth2-1.1.27:bond0> on network plane A >>>>> 2014-03-04T14:20:27.893154+00:00 cmm02b kernel: TIPC: Lost 
contact with >>>>> <1.1.27> >>>>> 2014-03-04T14:20:32.026411+00:00 cmm02b osafimmnd[4704]: NO NODE STATE-> >>>>> IMM_NODE_FULLY_AVAILABLE 2144 >>>>> 2014-03-04T14:20:32.026463+00:00 cmm02b osafimmnd[4704]: NO >>>>> RepositoryInitModeT is SA_IMM_INIT_FROM_FILE >>>>> 2014-03-04T14:20:32.026493+00:00 cmm02b osafimmnd[4704]: NO Epoch set to >>>>> 22 in ImmModel >>>>> 2014-03-04T14:20:32.031737+00:00 cmm02b osafimmnd[4704]: NO Implementer >>>>> connected: 72 (MsgQueueService73743) <67, 1200f> >>>>> 2014-03-04T14:20:32.035966+00:00 cmm02b osafimmnd[4704]: NO Implementer >>>>> connected: 73 (MsgQueueService73231) <0, 11e0f> >>>>> 2014-03-04T14:20:32.041233+00:00 cmm02b osafimmnd[4704]: NO SERVER STATE: >>>>> IMM_SERVER_SYNC_CLIENT --> IMM SERVER READY >>>>> 2014-03-04T14:20:32.042213+00:00 cmm02b osafimmnd[4704]: NO Implementer >>>>> connected: 74 (MsgQueueService72719) <0, 11c0f> >>>>> 2014-03-04T14:20:32.047252+00:00 cmm02b osafimmnd[4704]: NO Implementer >>>>> connected: 75 (MsgQueueService71439) <0, 1170f> >>>>> 2014-03-04T14:20:46.911220+00:00 cmm02b osafimmnd[4704]: NO Implementer >>>>> connected: 76 (MsgQueueService73487) <0, 10f0f> >>>>> 2014-03-04T14:20:46.920751+00:00 cmm02b osafimmnd[4704]: NO Implementer >>>>> disconnected 76 <0, 10f0f> (MsgQueueService73487) >>>>> 2014-03-04T14:20:48.012244+00:00 cmm02b osafimmnd[4704]: NO Implementer >>>>> connected: 77 (MsgQueueService72463) <0, 10f0f> >>>>> 2014-03-04T14:20:48.014779+00:00 cmm02b osafimmnd[4704]: NO Implementer >>>>> disconnected 77 <0, 10f0f> (MsgQueueService72463) >>>>> 2014-03-04T14:21:13.653100+00:00 cmm02b kernel: TIPC: Established link >>>>> <1.1.32:eth2-1.1.31:eth2> on network plane A >>>>> 2014-03-04T14:21:14.052200+00:00 cmm02b osafimmnd[4704]: NO NODE STATE-> >>>>> IMM_NODE_R_AVAILABLE >>>>> 2014-03-04T14:21:20.913433+00:00 cmm02b osafimmnd[4704]: NO NODE STATE-> >>>>> IMM_NODE_FULLY_AVAILABLE 14277 >>>>> 2014-03-04T14:21:20.913963+00:00 cmm02b osafimmnd[4704]: NO Epoch set to >>>>> 23 in 
ImmModel >>>>> 2014-03-04T14:21:21.419157+00:00 cmm02b osafimmnd[4704]: NO Implementer >>>>> connected: 78 (MsgQueueService73487) <0, 11f0f> >>>>> 2014-03-04T14:21:40.874192+00:00 cmm02b kernel: TIPC: Established link >>>>> <1.1.32:eth2-1.1.27:bond0> on network plane A >>>>> 2014-03-04T14:21:42.179625+00:00 cmm02b osafimmnd[4704]: NO NODE STATE-> >>>>> IMM_NODE_R_AVAILABLE >>>>> 2014-03-04T14:21:46.871328+00:00 cmm02b osafimmnd[4704]: NO NODE STATE-> >>>>> IMM_NODE_FULLY_AVAILABLE 14277 >>>>> 2014-03-04T14:21:46.871755+00:00 cmm02b osafimmnd[4704]: NO Epoch set to >>>>> 24 in ImmModel >>>>> 2014-03-04T14:21:47.649858+00:00 cmm02b osafimmnd[4704]: NO Implementer >>>>> connected: 79 (MsgQueueService72463) <0, 11b0f> >>>>> >>>>> ------------------------------------------------------------------------------ >>>>> Subversion Kills Productivity. Get off Subversion & Make the Move to >>>>> Perforce. >>>>> With Perforce, you get hassle-free workflows. Merge that actually works. >>>>> Faster operations. Version large binaries. Built-in WAN optimization and >>>>> the >>>>> freedom to use Git, Perforce or both. Make the move to Perforce. >>>>> http://pubads.g.doubleclick.net/gampad/clk?id=122218951&iu=/4140/ostg.clktrk >>>>> _______________________________________________ >>>>> Opensaf-users mailing list >>>>> [email protected] >>>>> https://lists.sourceforge.net/lists/listinfo/opensaf-users >>>> >>> >> >
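
Since the thread turns on how long each TIPC link was actually down relative to the 5 second tolerance, here is a minimal sketch of one way to measure that from the syslog. It pairs each kernel "Lost link" line with the next "Established link" line for the same link and reports the gap. The function name link_outages, the regular expression, and the SAMPLE list are hypothetical and written for this illustration; they are not part of OpenSAF or of any TIPC tool. The sample lines are taken from the cmm02b log quoted above, with the mail-client line wrapping undone.

```python
import re
from datetime import datetime

# Matches the kernel TIPC "Lost link" / "Established link" syslog lines.
# "Resetting link" lines are deliberately not matched.
LINK_EVENT = re.compile(
    r"^(?P<ts>\S+) \S+ kernel: TIPC: (?P<event>Lost|Established) link <(?P<link>[^>]+)>"
)


def link_outages(lines):
    """Return (link, seconds_down) for each Lost -> Established pair, in log order."""
    lost_at = {}   # link name -> timestamp of the most recent "Lost link"
    outages = []
    for line in lines:
        m = LINK_EVENT.match(line)
        if not m:
            continue
        ts = datetime.fromisoformat(m.group("ts"))
        link = m.group("link")
        if m.group("event") == "Lost":
            lost_at[link] = ts
        elif link in lost_at:
            # An "Established" with no earlier "Lost" in the input is ignored.
            outages.append((link, (ts - lost_at.pop(link)).total_seconds()))
    return outages


# Lines copied from the cmm02b syslog in this thread (unwrapped).
SAMPLE = [
    "2014-03-04T14:20:18.501981+00:00 cmm02b kernel: TIPC: Resetting link <1.1.32:eth2-1.1.15:eth2>, peer not responding",
    "2014-03-04T14:20:18.502012+00:00 cmm02b kernel: TIPC: Lost link <1.1.32:eth2-1.1.15:eth2> on network plane A",
    "2014-03-04T14:20:18.954740+00:00 cmm02b kernel: TIPC: Established link <1.1.32:eth2-1.1.15:eth2> on network plane A",
    "2014-03-04T14:20:26.925184+00:00 cmm02b kernel: TIPC: Lost link <1.1.32:eth2-1.1.31:eth2> on network plane A",
    "2014-03-04T14:21:13.653100+00:00 cmm02b kernel: TIPC: Established link <1.1.32:eth2-1.1.31:eth2> on network plane A",
]

for link, seconds in link_outages(SAMPLE):
    print(f"{link}: down for {seconds:.3f}s")
```

To run it against a whole file, pass an open file handle instead of SAMPLE, for example link_outages(open("/var/log/messages")), assuming the syslog uses the same ISO 8601 timestamp format as the lines above. The script only reports the gap between the two log entries; it says nothing about why the node was or was not rebooted.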
