Yes, the patch is included in 4.3.2.

/Neel.
On Wednesday 05 March 2014 06:22 PM, Tony Hart wrote:
> Thanks Neel. Is this fix in 4.3.2?
>
> On Mar 5, 2014, at 7:42 AM, Neelakanta Reddy <[email protected]> wrote:
>
>> Hi,
>>
>> A similar problem is fixed in http://sourceforge.net/p/opensaf/tickets/600/.
>> The patch was pushed in changeset 4688 for 4.3.x.
>>
>> Apply the patch and retest.
>>
>> If you still see the problem, please share the following logs:
>>
>> 1. amfd and amfnd traces from the controllers and the payload.
>>
>> 2. syslog from the controllers and the payload.
>>
>> 3. mds.log from the controllers and the payload.
>>
>> /Neel.
>>
>>
>> On Wednesday 05 March 2014 05:25 PM, Tony Hart wrote:
>>>
>>> The link tolerance is 5 seconds.
>>>
>>> The payload card logs the TIPC timeout, but it does not reboot.
>>> This may be timing related, since the link re-establishes quickly
>>> after going down (the logs show the link re-established within the
>>> same second it went down).
>>>
>>> On Mar 5, 2014, at 6:51 AM, Neelakanta Reddy <[email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> What is the configured TIPC link tolerance time?
>>>> Depending on the tolerance time, the other node will get a service-down event.
>>>>
>>>> /Neel.
>>>>
>>>> On Tuesday 04 March 2014 08:53 PM, Tony Hart wrote:
>>>>> We’re seeing a problem where there is a loss of connectivity between a
>>>>> payload (cmm02B) and the controller. Connectivity returns, but it is lost
>>>>> just long enough to trigger a TIPC timeout. In this case the payload is
>>>>> dropped from the cluster, but it doesn’t restart. The payload is
>>>>> flagged as not being in the cluster and its presence state is
>>>>> UNINSTANTIATED. It’s still running the osaf processes, though.
>>>>>
>>>>> Is this something that’s been fixed in the current release? (We’re running
>>>>> 4.3.1.)
>>>>>
>>>>> $ immlist safNode=cmm02b,safCluster=myClmCluster
>>>>> Name                                               Type         Value(s)
>>>>> ========================================================================
>>>>> safNode                                            SA_STRING_T  safNode=cmm02b
>>>>> saClmNodeLockCallbackTimeout                       SA_TIME_T    50000000000 (0xba43b7400, Thu Jan  1 00:00:50 1970)
>>>>> saClmNodeIsMember                                  SA_UINT32_T  0 (0x0)
>>>>> saClmNodeInitialViewNumber                         SA_UINT64_T  28 (0x1c)
>>>>> saClmNodeID                                        SA_UINT32_T  73743 (0x1200f)
>>>>> saClmNodeEE                                        SA_NAME_T    <Empty>
>>>>> saClmNodeDisableReboot                             SA_UINT32_T  0 (0x0)
>>>>> saClmNodeCurrAddressFamily                         SA_UINT32_T  <Empty>
>>>>> saClmNodeCurrAddress                               SA_STRING_T  <Empty>
>>>>> saClmNodeBootTimeStamp                             SA_TIME_T    1393879646000000000 (0x13580e27277a2c00, Mon Mar  3 20:47:26 2014)
>>>>> saClmNodeAdminState                                SA_UINT32_T  1 (0x1)
>>>>> saClmNodeAddressFamily                             SA_UINT32_T  <Empty>
>>>>> saClmNodeAddress                                   SA_STRING_T  <Empty>
>>>>> SaImmAttrImplementerName                           SA_STRING_T  safClmService
>>>>> SaImmAttrClassName                                 SA_STRING_T  SaClmNode
>>>>> SaImmAttrAdminOwnerName                            SA_STRING_T  IMMLOADER
>>>>>
>>>>>
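immlist shows SA_TIME_T attributes as raw nanosecond counts and also renders each one as a calendar date, which is why a duration such as saClmNodeLockCallbackTimeout (50 s) appears as "Thu Jan  1 00:00:50 1970". A minimal Python sketch, using hypothetical helper names (not an OpenSAF API) and assuming SA_TIME_T is a nanosecond count (a duration, or nanoseconds since the Unix epoch for timestamps), decodes the values shown above:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical helpers: SA_TIME_T is assumed to be a 64-bit nanosecond count.

def sa_time_to_timedelta(ns):
    """Interpret an SA_TIME_T value as a duration (microsecond precision)."""
    return timedelta(microseconds=ns // 1000)

def sa_time_to_utc(ns):
    """Interpret an SA_TIME_T value as an absolute UTC timestamp."""
    return datetime(1970, 1, 1, tzinfo=timezone.utc) + sa_time_to_timedelta(ns)

# Values copied from the immlist output above.
print(sa_time_to_timedelta(50000000000))    # saClmNodeLockCallbackTimeout -> 0:00:50
print(sa_time_to_timedelta(1200000000000))  # saAmfNodeSuFailOverProb      -> 0:20:00
print(sa_time_to_utc(1393879646000000000))  # saClmNodeBootTimeStamp       -> 2014-03-03 20:47:26+00:00
```

Treating the timeouts as durations rather than dates makes it clear that the lock callback timeout is 50 seconds and the SU failover probation period is 20 minutes, while the boot timestamp confirms cmm02b last booted on Mar 3.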
>>>>> $ immlist $(amf-find node | grep CMM02B)
>>>>> Name                                               Type         Value(s)
>>>>> ========================================================================
>>>>> safAmfNode                                         SA_STRING_T  safAmfNode=CMM02B
>>>>> saAmfNodeSuFailoverMax                             SA_UINT32_T  2 (0x2)
>>>>> saAmfNodeSuFailOverProb                            SA_TIME_T    1200000000000 (0x1176592e000, Thu Jan  1 00:20:00 1970)
>>>>> saAmfNodeOperState                                 SA_UINT32_T  2 (0x2)
>>>>> saAmfNodeFailfastOnTerminationFailure              SA_UINT32_T  0 (0x0)
>>>>> saAmfNodeFailfastOnInstantiationFailure            SA_UINT32_T  0 (0x0)
>>>>> saAmfNodeClmNode                                   SA_NAME_T    safNode=cmm02b,safCluster=myClmCluster (38)
>>>>> saAmfNodeCapacity                                  SA_STRING_T  <Empty>
>>>>> saAmfNodeAutoRepair                                SA_UINT32_T  1 (0x1)
>>>>> saAmfNodeAdminState                                SA_UINT32_T  1 (0x1)
>>>>> SaImmAttrImplementerName                           SA_STRING_T  safAmfService
>>>>> SaImmAttrClassName                                 SA_STRING_T  SaAmfNode
>>>>> SaImmAttrAdminOwnerName                            SA_STRING_T  IMMLOADER
>>>>>
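Several of the immlist values above are raw enum numbers. As a minimal sketch, assuming the value assignments from the SAF AIS AMF and CLM specifications (SaAmfOperationalStateT, SaAmfAdminStateT, SaClmAdminStateT, SaBoolT), a hypothetical Python helper can translate them. Decoded, the node is administratively unlocked in both CLM and AMF, yet it is not a CLM member and the AMF node is operationally disabled, which matches the symptom described above.

```python
# Enum values as assumed from the SAF AIS AMF/CLM specifications.
AMF_OPER_STATE = {1: "SA_AMF_OPERATIONAL_ENABLED", 2: "SA_AMF_OPERATIONAL_DISABLED"}
AMF_ADMIN_STATE = {1: "SA_AMF_ADMIN_UNLOCKED", 2: "SA_AMF_ADMIN_LOCKED",
                   3: "SA_AMF_ADMIN_LOCKED_INSTANTIATION", 4: "SA_AMF_ADMIN_SHUTTING_DOWN"}
CLM_ADMIN_STATE = {1: "SA_CLM_ADMIN_UNLOCKED", 2: "SA_CLM_ADMIN_LOCKED",
                   3: "SA_CLM_ADMIN_SHUTTING_DOWN"}
SA_BOOL = {0: "SA_FALSE", 1: "SA_TRUE"}

# Hypothetical mapping from attribute name to its decoding table.
DECODERS = {
    "saClmNodeIsMember": SA_BOOL,
    "saClmNodeAdminState": CLM_ADMIN_STATE,
    "saAmfNodeOperState": AMF_OPER_STATE,
    "saAmfNodeAdminState": AMF_ADMIN_STATE,
    "saAmfNodeAutoRepair": SA_BOOL,
}

def decode(attrs):
    """Translate known numeric attribute values into their symbolic names."""
    return {name: DECODERS[name].get(value, f"unknown ({value})")
            for name, value in attrs.items() if name in DECODERS}

# Values copied from the two immlist outputs above.
observed = {
    "saClmNodeIsMember": 0,
    "saClmNodeAdminState": 1,
    "saAmfNodeOperState": 2,
    "saAmfNodeAdminState": 1,
    "saAmfNodeAutoRepair": 1,
}
for name, meaning in decode(observed).items():
    print(f"{name:22} {meaning}")
```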
>>>>>
>>>>> cmm02b$ ps aux | grep osaf
>>>>> root      1417  0.0  0.0 225880  2028 ?        Ssl  Mar03   0:08 /usr/lib64/opensaf/osafamfnd osafamfnd
>>>>> root      1429  0.0  0.0 157100  1416 ?        Ssl  Mar03   0:00 /usr/lib64/opensaf/osafsmfnd osafsmfnd
>>>>> opensaf   1438  0.0  0.1 174256  5764 ?        Ssl  Mar03   0:00 /usr/lib64/opensaf/osafmsgnd osafmsgnd
>>>>> opensaf   1454  0.0  0.0 155732  1448 ?        Ssl  Mar03   0:00 /usr/lib64/opensaf/osaflcknd osaflcknd
>>>>> opensaf   1463  0.0  0.0 158148  2296 ?        Ssl  Mar03   0:00 /usr/lib64/opensaf/osafckptnd osafckptnd
>>>>> opensaf   1472  0.0  0.0 155020  1392 ?        Ssl  Mar03   0:02 /usr/lib64/opensaf/osafamfwd osafamfwd
>>>>> opensaf   4704  0.0  0.3 182240 11992 ?        Ssl  14:20   0:01 /usr/lib64/opensaf/osafimmnd osafimmnd
>>>>>
>>>>>
>>>>>
>>>>> SCM1 (1.1.15)
>>>>> -------------------
>>>>> 2014-03-04T14:20:18.808187+00:00 scm1 osafamfd[1771]: NO Node 'PLD0211' left the cluster
>>>>> 2014-03-04T14:20:18.851318+00:00 scm1 kernel: TIPC: Established link <1.1.15:eth2-1.1.27:bond0> on network plane A
>>>>> 2014-03-04T14:20:18.852749+00:00 scm1 osafsmfd[1965]: ER saClmClusterNodeGet failed, rc=SA_AIS_ERR_NOT_EXIST (12)
>>>>> 2014-03-04T14:20:18.858472+00:00 scm1 kernel: TIPC: Established link <1.1.15:eth2-1.1.23:bond0> on network plane A
>>>>> 2014-03-04T14:20:18.871084+00:00 scm1 osafsmfd[1965]: ER saClmClusterNodeGet failed, rc=SA_AIS_ERR_NOT_EXIST (12)
>>>>> 2014-03-04T14:20:18.956307+00:00 scm1 kernel: TIPC: Resetting link <1.1.15:eth2-1.1.32:eth2>, peer not responding
>>>>> 2014-03-04T14:20:18.956330+00:00 scm1 kernel: TIPC: Lost link <1.1.15:eth2-1.1.32:eth2> on network plane A
>>>>> 2014-03-04T14:20:18.956335+00:00 scm1 kernel: TIPC: Lost contact with <1.1.32>
>>>>> 2014-03-04T14:20:18.956340+00:00 scm1 kernel: TIPC: Established link <1.1.15:eth2-1.1.32:eth2> on network plane A
>>>>> 2014-03-04T14:20:18.958227+00:00 scm1 osafimmnd[1667]: NO Global discard node received for nodeId:1200f pid:1347
>>>>> 2014-03-04T14:20:18.958270+00:00 scm1 osafimmnd[1667]: NO Implementer disconnected 51 <0, 1200f(down)> (MsgQueueService73743)
>>>>> 2014-03-04T14:20:18.965240+00:00 scm1 osafimmnd[1667]: NO Implementer connected: 71 (MsgQueueService73743) <92377, 10f0f>
>>>>> 2014-03-04T14:20:18.968251+00:00 scm1 osafimmnd[1667]: NO Implementer locally disconnected. Marking it as doomed 71 <92377, 10f0f> (MsgQueueService73743)
>>>>> 2014-03-04T14:20:18.971785+00:00 scm1 osafimmnd[1667]: NO Global discard node received for nodeId:1170f pid:0
>>>>> 2014-03-04T14:20:18.973013+00:00 scm1 osafimmnd[1667]: NO Global discard node received for nodeId:1200f pid:0
>>>>> 2014-03-04T14:20:18.976586+00:00 scm1 osafimmnd[1667]: NO Implementer disconnected 71 <92377, 10f0f> (MsgQueueService73743)
>>>>> 2014-03-04T14:20:19.025760+00:00 scm1 osafimmd[1657]: NO Node 11e0f request sync sync-pid:23769 epoch:0
>>>>> 2014-03-04T14:20:19.076427+00:00 scm1 osafamfd[1771]: NO Node 'PLD0214' left the cluster
>>>>> 2014-03-04T14:20:19.215220+00:00 scm1 osafimmd[1657]: NO Node 11c0f request sync sync-pid:23629 epoch:0
>>>>> 2014-03-04T14:20:19.296817+00:00 scm1 osafamfd[1771]: WA avd_msg_sanity_chk: invalid node ID (11e0f)
>>>>> 2014-03-04T14:20:19.300899+00:00 scm1 osafamfd[1771]: WA avd_msg_sanity_chk: invalid node ID (11e0f)
>>>>> 2014-03-04T14:20:19.305377+00:00 scm1 osafamfd[1771]: NO Node 'CMM02B' left the cluster
>>>>> 2014-03-04T14:20:19.357458+00:00 scm1 osafimmd[1657]: NO Node 1200f request sync sync-pid:4704 epoch:0
>>>>>
>>>>>
>>>>> cmm02B (1.1.32)
>>>>> -----------------------
>>>>> 2014-03-04T14:20:18.495174+00:00 cmm02b kernel: TIPC: Resetting link <1.1.32:eth2-1.1.10:bond0>, peer not responding
>>>>> 2014-03-04T14:20:18.495203+00:00 cmm02b kernel: TIPC: Lost link <1.1.32:eth2-1.1.10:bond0> on network plane A
>>>>> 2014-03-04T14:20:18.495209+00:00 cmm02b kernel: TIPC: Lost contact with <1.1.10>
>>>>> 2014-03-04T14:20:18.501981+00:00 cmm02b kernel: TIPC: Resetting link <1.1.32:eth2-1.1.15:eth2>, peer not responding
>>>>> 2014-03-04T14:20:18.502012+00:00 cmm02b kernel: TIPC: Lost link <1.1.32:eth2-1.1.15:eth2> on network plane A
>>>>> 2014-03-04T14:20:18.502016+00:00 cmm02b kernel: TIPC: Lost contact with <1.1.15>
>>>>> 2014-03-04T14:20:18.502020+00:00 cmm02b kernel: TIPC: Resetting link <1.1.32:eth2-1.1.11:bond0>, peer not responding
>>>>> 2014-03-04T14:20:18.502023+00:00 cmm02b kernel: TIPC: Lost link <1.1.32:eth2-1.1.11:bond0> on network plane A
>>>>> 2014-03-04T14:20:18.502026+00:00 cmm02b kernel: TIPC: Lost contact with <1.1.11>
>>>>> 2014-03-04T14:20:18.502110+00:00 cmm02b kernel: TIPC: Resetting link <1.1.32:eth2-1.1.1:bond0>, peer not responding
>>>>> 2014-03-04T14:20:18.502115+00:00 cmm02b kernel: TIPC: Lost link <1.1.32:eth2-1.1.1:bond0> on network plane A
>>>>> 2014-03-04T14:20:18.502118+00:00 cmm02b kernel: TIPC: Lost contact with <1.1.1>
>>>>> 2014-03-04T14:20:18.549154+00:00 cmm02b kernel: TIPC: Resetting link <1.1.32:eth2-1.1.14:bond0>, peer not responding
>>>>> 2014-03-04T14:20:18.549180+00:00 cmm02b kernel: TIPC: Lost link <1.1.32:eth2-1.1.14:bond0> on network plane A
>>>>> 2014-03-04T14:20:18.549184+00:00 cmm02b kernel: TIPC: Lost contact with <1.1.14>
>>>>> 2014-03-04T14:20:18.671107+00:00 cmm02b kernel: TIPC: Established link <1.1.32:eth2-1.1.14:bond0> on network plane A
>>>>> 2014-03-04T14:20:18.743482+00:00 cmm02b kernel: TIPC: Established link <1.1.32:eth2-1.1.11:bond0> on network plane A
>>>>> 2014-03-04T14:20:18.866277+00:00 cmm02b kernel: TIPC: Established link <1.1.32:eth2-1.1.10:bond0> on network plane A
>>>>> 2014-03-04T14:20:18.869280+00:00 cmm02b kernel: TIPC: Established link <1.1.32:eth2-1.1.1:bond0> on network plane A
>>>>> 2014-03-04T14:20:18.954740+00:00 cmm02b kernel: TIPC: Established link <1.1.32:eth2-1.1.15:eth2> on network plane A
>>>>> 2014-03-04T14:20:18.959226+00:00 cmm02b osafimmnd[1347]: WA MESSAGE:38632 OUT OF ORDER my highest processed:38600, exiting
>>>>> 2014-03-04T14:20:18.967269+00:00 cmm02b osafamfnd[1417]: NO 'safComp=IMMND,safSu=CMM02B,safSg=NoRed,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'componentRestart'
>>>>> 2014-03-04T14:20:19.052569+00:00 cmm02b osafimmnd[4704]: Started
>>>>> 2014-03-04T14:20:19.157393+00:00 cmm02b osafimmnd[4704]: NO SERVER STATE: IMM_SERVER_ANONYMOUS --> IMM_SERVER_CLUSTER_WAITING
>>>>> 2014-03-04T14:20:19.257835+00:00 cmm02b osafimmnd[4704]: NO SERVER STATE: IMM_SERVER_CLUSTER_WAITING --> IMM_SERVER_LOADING_PENDING
>>>>> 2014-03-04T14:20:19.358134+00:00 cmm02b osafimmnd[4704]: NO SERVER STATE: IMM_SERVER_LOADING_PENDING --> IMM_SERVER_SYNC_PENDING
>>>>> 2014-03-04T14:20:19.358452+00:00 cmm02b osafimmnd[4704]: NO NODE STATE-> IMM_NODE_ISOLATED
>>>>> 2014-03-04T14:20:19.955686+00:00 cmm02b osafimmnd[4704]: NO NODE STATE-> IMM_NODE_W_AVAILABLE
>>>>> 2014-03-04T14:20:20.022473+00:00 cmm02b osafimmnd[4704]: NO SERVER STATE: IMM_SERVER_SYNC_PENDING --> IMM_SERVER_SYNC_CLIENT
>>>>> 2014-03-04T14:20:26.925158+00:00 cmm02b kernel: TIPC: Resetting link <1.1.32:eth2-1.1.31:eth2>, peer not responding
>>>>> 2014-03-04T14:20:26.925184+00:00 cmm02b kernel: TIPC: Lost link <1.1.32:eth2-1.1.31:eth2> on network plane A
>>>>> 2014-03-04T14:20:26.925191+00:00 cmm02b kernel: TIPC: Lost contact with <1.1.31>
>>>>> 2014-03-04T14:20:27.893115+00:00 cmm02b kernel: TIPC: Resetting link <1.1.32:eth2-1.1.27:bond0>, peer not responding
>>>>> 2014-03-04T14:20:27.893148+00:00 cmm02b kernel: TIPC: Lost link <1.1.32:eth2-1.1.27:bond0> on network plane A
>>>>> 2014-03-04T14:20:27.893154+00:00 cmm02b kernel: TIPC: Lost contact with <1.1.27>
>>>>> 2014-03-04T14:20:32.026411+00:00 cmm02b osafimmnd[4704]: NO NODE STATE-> IMM_NODE_FULLY_AVAILABLE 2144
>>>>> 2014-03-04T14:20:32.026463+00:00 cmm02b osafimmnd[4704]: NO RepositoryInitModeT is SA_IMM_INIT_FROM_FILE
>>>>> 2014-03-04T14:20:32.026493+00:00 cmm02b osafimmnd[4704]: NO Epoch set to 22 in ImmModel
>>>>> 2014-03-04T14:20:32.031737+00:00 cmm02b osafimmnd[4704]: NO Implementer connected: 72 (MsgQueueService73743) <67, 1200f>
>>>>> 2014-03-04T14:20:32.035966+00:00 cmm02b osafimmnd[4704]: NO Implementer connected: 73 (MsgQueueService73231) <0, 11e0f>
>>>>> 2014-03-04T14:20:32.041233+00:00 cmm02b osafimmnd[4704]: NO SERVER STATE: IMM_SERVER_SYNC_CLIENT --> IMM SERVER READY
>>>>> 2014-03-04T14:20:32.042213+00:00 cmm02b osafimmnd[4704]: NO Implementer connected: 74 (MsgQueueService72719) <0, 11c0f>
>>>>> 2014-03-04T14:20:32.047252+00:00 cmm02b osafimmnd[4704]: NO Implementer connected: 75 (MsgQueueService71439) <0, 1170f>
>>>>> 2014-03-04T14:20:46.911220+00:00 cmm02b osafimmnd[4704]: NO Implementer connected: 76 (MsgQueueService73487) <0, 10f0f>
>>>>> 2014-03-04T14:20:46.920751+00:00 cmm02b osafimmnd[4704]: NO Implementer disconnected 76 <0, 10f0f> (MsgQueueService73487)
>>>>> 2014-03-04T14:20:48.012244+00:00 cmm02b osafimmnd[4704]: NO Implementer connected: 77 (MsgQueueService72463) <0, 10f0f>
>>>>> 2014-03-04T14:20:48.014779+00:00 cmm02b osafimmnd[4704]: NO Implementer disconnected 77 <0, 10f0f> (MsgQueueService72463)
>>>>> 2014-03-04T14:21:13.653100+00:00 cmm02b kernel: TIPC: Established link <1.1.32:eth2-1.1.31:eth2> on network plane A
>>>>> 2014-03-04T14:21:14.052200+00:00 cmm02b osafimmnd[4704]: NO NODE STATE-> IMM_NODE_R_AVAILABLE
>>>>> 2014-03-04T14:21:20.913433+00:00 cmm02b osafimmnd[4704]: NO NODE STATE-> IMM_NODE_FULLY_AVAILABLE 14277
>>>>> 2014-03-04T14:21:20.913963+00:00 cmm02b osafimmnd[4704]: NO Epoch set to 23 in ImmModel
>>>>> 2014-03-04T14:21:21.419157+00:00 cmm02b osafimmnd[4704]: NO Implementer connected: 78 (MsgQueueService73487) <0, 11f0f>
>>>>> 2014-03-04T14:21:40.874192+00:00 cmm02b kernel: TIPC: Established link <1.1.32:eth2-1.1.27:bond0> on network plane A
>>>>> 2014-03-04T14:21:42.179625+00:00 cmm02b osafimmnd[4704]: NO NODE STATE-> IMM_NODE_R_AVAILABLE
>>>>> 2014-03-04T14:21:46.871328+00:00 cmm02b osafimmnd[4704]: NO NODE STATE-> IMM_NODE_FULLY_AVAILABLE 14277
>>>>> 2014-03-04T14:21:46.871755+00:00 cmm02b osafimmnd[4704]: NO Epoch set to 24 in ImmModel
>>>>> 2014-03-04T14:21:47.649858+00:00 cmm02b osafimmnd[4704]: NO Implementer connected: 79 (MsgQueueService72463) <0, 11b0f>
>>>>>
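The claim that the links came back within the same second can be checked directly from the cmm02B syslog. The Python sketch below (Python 3.7+ for `datetime.fromisoformat`) pairs each "Lost link" kernel line with the next "Established link" line for the same peer and reports how long each link was down. The regex assumes the `<own-addr>:<own-if>-<peer-addr>:<peer-if>` link naming seen in these logs; `outage_durations` is a hypothetical helper, not part of OpenSAF or TIPC tooling.

```python
import re
from datetime import datetime

# TIPC kernel lines copied from the cmm02B syslog above.
LOG = """\
2014-03-04T14:20:18.495203+00:00 cmm02b kernel: TIPC: Lost link <1.1.32:eth2-1.1.10:bond0> on network plane A
2014-03-04T14:20:18.502012+00:00 cmm02b kernel: TIPC: Lost link <1.1.32:eth2-1.1.15:eth2> on network plane A
2014-03-04T14:20:18.502023+00:00 cmm02b kernel: TIPC: Lost link <1.1.32:eth2-1.1.11:bond0> on network plane A
2014-03-04T14:20:18.502115+00:00 cmm02b kernel: TIPC: Lost link <1.1.32:eth2-1.1.1:bond0> on network plane A
2014-03-04T14:20:18.549180+00:00 cmm02b kernel: TIPC: Lost link <1.1.32:eth2-1.1.14:bond0> on network plane A
2014-03-04T14:20:18.671107+00:00 cmm02b kernel: TIPC: Established link <1.1.32:eth2-1.1.14:bond0> on network plane A
2014-03-04T14:20:18.743482+00:00 cmm02b kernel: TIPC: Established link <1.1.32:eth2-1.1.11:bond0> on network plane A
2014-03-04T14:20:18.866277+00:00 cmm02b kernel: TIPC: Established link <1.1.32:eth2-1.1.10:bond0> on network plane A
2014-03-04T14:20:18.869280+00:00 cmm02b kernel: TIPC: Established link <1.1.32:eth2-1.1.1:bond0> on network plane A
2014-03-04T14:20:18.954740+00:00 cmm02b kernel: TIPC: Established link <1.1.32:eth2-1.1.15:eth2> on network plane A
2014-03-04T14:20:26.925184+00:00 cmm02b kernel: TIPC: Lost link <1.1.32:eth2-1.1.31:eth2> on network plane A
2014-03-04T14:20:27.893148+00:00 cmm02b kernel: TIPC: Lost link <1.1.32:eth2-1.1.27:bond0> on network plane A
2014-03-04T14:21:13.653100+00:00 cmm02b kernel: TIPC: Established link <1.1.32:eth2-1.1.31:eth2> on network plane A
2014-03-04T14:21:40.874192+00:00 cmm02b kernel: TIPC: Established link <1.1.32:eth2-1.1.27:bond0> on network plane A
"""

# Captures the timestamp, the event type and the peer address from
# "<own-addr>:<own-if>-<peer-addr>:<peer-if>".
LINK_RE = re.compile(
    r"^(?P<ts>\S+) \S+ kernel: TIPC: (?P<event>Lost|Established) link "
    r"<[\d.]+:[^-]+-(?P<peer>[\d.]+):[^>]+>"
)

def outage_durations(log_text):
    """Seconds from each 'Lost link' to the next 'Established link', per peer."""
    lost_at = {}
    durations = {}
    for line in log_text.splitlines():
        m = LINK_RE.match(line)
        if m is None:
            continue  # not a TIPC link up/down line
        ts = datetime.fromisoformat(m["ts"])
        peer = m["peer"]
        if m["event"] == "Lost":
            lost_at[peer] = ts
        elif peer in lost_at:
            durations[peer] = (ts - lost_at.pop(peer)).total_seconds()
    return durations

for peer, secs in sorted(outage_durations(LOG).items(), key=lambda kv: kv[1]):
    print(f"{peer:8} down for {secs:.3f} s")
```

For the five peers lost at 14:20:18, including SCM1 (1.1.15), each link was re-established in under half a second; only 1.1.31 and 1.1.27 stayed down for tens of seconds. With a 5-second link tolerance, this fits the picture of a link that is declared lost after a period of silence and then comes back almost immediately.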
>>>>> ------------------------------------------------------------------------------
>>>>> Subversion Kills Productivity. Get off Subversion & Make the Move to 
>>>>> Perforce.
>>>>> With Perforce, you get hassle-free workflows. Merge that actually works.
>>>>> Faster operations. Version large binaries.  Built-in WAN optimization and 
>>>>> the
>>>>> freedom to use Git, Perforce or both. Make the move to Perforce.
>>>>> http://pubads.g.doubleclick.net/gampad/clk?id=122218951&iu=/4140/ostg.clktrk
>>>>> _______________________________________________
>>>>> Opensaf-users mailing list
>>>>> [email protected]
>>>>> https://lists.sourceforge.net/lists/listinfo/opensaf-users
>>>>
>>>
>>
>
