[tipc-discussion] tipc: fix socket timer deadlock

2016-06-13 Thread GUNA
Hi Jon,

Please let me know where I could get the patch for this fix.

Thanks,
Guna

___
tipc-discussion mailing list
tipc-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/tipc-discussion


Re: [tipc-discussion] tipc_sk_rcv: Kernel panic on one of the card on 4.4.0

2016-06-02 Thread GUNA
Ying,

Just to clarify: I have not changed anything in tipc_sk_timeout().
The 4.4.0 version already has the jiffies change, and the issue was
seen with this code. I will try to apply the patch once it is
available upstream.

thanks
Guna

On Thu, Jun 2, 2016 at 6:57 AM, Xue, Ying  wrote:
> Hi Guna,
>
> Please see my comments below.
>
> Regards,
> Ying
>
> -----Original Message-----
> From: GUNA [mailto:gbala...@gmail.com]
> Sent: June 1, 2016 23:26
> To: Xue, Ying
> Cc: Jon Maloy; Jon Maloy; tipc-discussion@lists.sourceforge.net; Erik Hugne; 
> Xue Ying (ying.x...@gmail.com)
> Subject: Re: [tipc-discussion] tipc_sk_rcv: Kernel panic on one of the card 
> on 4.4.0
>
> If the issue is reproducible then I could try with Erik's patch even
> though the root cause is unknown. Currently, we are not clear yet on
> root cause of the issue. I could add the patch on production release,
> only if the patch will fix the issue.
>
> [Ying] Understood. I think Erik's patch should be merged upstream whether 
> or not it fixes your issue. But in my opinion, it should be able to fix 
> it.
>
>  Otherwise, I may need to find
> test stream.
>
> As per Erik's proposal,
> ==
> if (sock_owned_by_user(sk))
> we can reschedule timer for a retry in a few jiffies
> ==
>
> I tried to call sk_reset_timer(sk, &sk->sk_timer, (HZ / 20));
> but the code or similar already is in place at tipc_sk_timeout() as
> marked by "<<==" below"
>
> if (tsk->probing_state == TIPC_CONN_PROBING) {
>   if (!sock_owned_by_user(sk))
> ...
>   else
> sk_reset_timer(sk, &sk->sk_timer, (HZ / 20));   <<==
> } else {
>sk_reset_timer(sk, &sk->sk_timer, jiffies + tsk->probing_intv);  <<==
> }
>
> Please let me know If I need to add/modify any.
>
> [Ying] Your change above is right and should work. But I still suggest 
> you adopt Erik's patch ("tipc: fix timer handling when socket is owned"), 
> as it is much better than the solution above.
>
> thanks,
> Guna
>
> On Wed, Jun 1, 2016 at 7:31 AM, Xue, Ying  wrote:
>> Hi GUNA,
>>
>> Thanks for your confirmation, which is very important for us to look into 
>> what happened in 4.4.0 version.
>> Yes, my mentioned Erik's patch is just as Erik said: "tipc: fix timer 
>> handling when socket is owned".
>>
>> I also agree to Erik's solution as its change is more common method to deal 
>> with the case when owner flag is not set in BH.
>>
>> But now we still need to know what root cause is the issue.
>>
>> If possible, please apply Erik's patch on your side to check whether the 
>> issue occurs or not.
>>
>> Regards,
>> Ying
>>
>> -----Original Message-----
>> From: GUNA [mailto:gbala...@gmail.com]
>> Sent: May 31, 2016 23:34
>> To: Xue, Ying
>> Cc: Jon Maloy; Jon Maloy; tipc-discussion@lists.sourceforge.net; Erik Hugne; 
>> Xue Ying (ying.x...@gmail.com)
>> Subject: Re: [tipc-discussion] tipc_sk_rcv: Kernel panic on one of the card 
>> on 4.4.0
>>
>> Just want to clarify, system was upgraded only the kernel from 3.4.2
>> to 4.4.0 + some tipc patches on Fedora distribution. That said, the
>> patch, "net: do not block BH while processing socket backlog" is not
>> part of the 4.4.0. So, the issue is not due to this commit.
>>
>> If the patch, "tipc: block BH in TCP callbacks" could resolve the
>> issue then I could try applying the patch. However the issue is not
>> reproducible. So, we may not get the result right away.
>>
>> Which Erik's patch you are talking about?
>> Is this one, "tipc: fix timer handling when socket is owned" ?
>>
>>
>> /// Guna
>>
>> On Tue, May 31, 2016 at 3:49 AM, Xue, Ying  wrote:
>>> Hi Jon,
>>>
>>> Today, I spent time further analyzing its potential root cause why the soft 
>>> lockup occurred regarding log provided by GUNA. But I don't find some 
>>> valuable hints.
>>>
>>> To be honest, even if CONN_MANAGER/CONN_PROBE message is sent through 
>>> tipc_node_xmit_skb() without holding "owner" flag in tipc_sk_timeout(), 
>>> deadlock should not happen in theory. Before the tipc_sk_rcv() is secondly 
>>> called, destination port and source port of CONN_MANAGER/CONN_PROBE message 
>>> created in tipc_sk_timeout() have been reversed. As a result, the tsk found 
>>> at (*1) is different with another tsk found at 

[tipc-discussion] tipc: name table entry is not matched

2016-06-01 Thread GUNA
I am running kernel 4.4.0 and see a name table mismatch reported by
the "tipc-config" tool. From my analysis, only one entry is
mismatched, as shown below. It is seen only on the card13 CPU and on
no other CPUs (the system has 10 cards in total). The system has been
up for 22 days.

Type  Lower  Upper  Port Identity        Publication  Scope
5     6012   6012   <1.1.12:1619006126>  1619006126   cluster

Was the entry not published to the other cards, or was it not removed
properly on card13?



Re: [tipc-discussion] tipc_sk_rcv: Kernel panic on one of the card on 4.4.0

2016-06-01 Thread GUNA
If the issue is reproducible, I could try Erik's patch even though
the root cause is unknown. At the moment the root cause is still
unclear to us. I can add the patch to the production release only if
it will fix the issue; otherwise, I may need to find a test stream.

As per Erik's proposal,
==
if (sock_owned_by_user(sk))
we can reschedule timer for a retry in a few jiffies
==

I tried to call sk_reset_timer(sk, &sk->sk_timer, (HZ / 20)),
but this code, or something similar, is already in place in
tipc_sk_timeout(), as marked by "<<==" below:

if (tsk->probing_state == TIPC_CONN_PROBING) {
        if (!sock_owned_by_user(sk))
                ...
        else
                sk_reset_timer(sk, &sk->sk_timer, (HZ / 20));   <<==
} else {
        sk_reset_timer(sk, &sk->sk_timer, jiffies + tsk->probing_intv);  <<==
}
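
One detail in the two marked calls may be worth checking. sk_reset_timer()
hands its third argument to mod_timer(), which takes an absolute expiry in
jiffies rather than a delay. The second marked call adds jiffies; the first
one passes a bare HZ / 20. The following is a minimal userspace sketch of
what that difference means. SKETCH_HZ, sketch_time_after() and the other
helper names are invented for illustration and are not kernel symbols.

```c
#include <stdint.h>

/* Illustrative stand-in for the kernel's HZ; not a kernel symbol. */
#define SKETCH_HZ 1000u

/* Wrap-safe "a is after b", in the style of the kernel's time_after(). */
static int sketch_time_after(uint32_t a, uint32_t b)
{
    return (int32_t)(b - a) < 0;
}

/* Ticks until a timer armed with an ABSOLUTE expiry fires.
 * An expiry that is not in the future gives a wait of 0 ticks. */
static uint32_t ticks_until_fire(uint32_t now, uint32_t expires)
{
    return sketch_time_after(expires, now) ? expires - now : 0;
}

/* A relative delay passed where an absolute expiry is expected. */
static uint32_t wait_without_jiffies(uint32_t now)
{
    return ticks_until_fire(now, SKETCH_HZ / 20);
}

/* The delay converted to an absolute expiry first. */
static uint32_t wait_with_jiffies(uint32_t now)
{
    return ticks_until_fire(now, now + SKETCH_HZ / 20);
}
```

In this model the call without the current tick count waits 0 ticks once the
clock has passed SKETCH_HZ / 20, while the call that adds it waits 50 ticks.
Whether the first marked call was meant to read jiffies + HZ / 20 is an
assumption here, not a quote from any patch.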

Please let me know if I need to add or modify anything.

thanks,
Guna

On Wed, Jun 1, 2016 at 7:31 AM, Xue, Ying  wrote:
> Hi GUNA,
>
> Thanks for your confirmation, which is very important for us to look into 
> what happened in 4.4.0 version.
> Yes, my mentioned Erik's patch is just as Erik said: "tipc: fix timer 
> handling when socket is owned".
>
> I also agree to Erik's solution as its change is more common method to deal 
> with the case when owner flag is not set in BH.
>
> But now we still need to know what root cause is the issue.
>
> If possible, please apply Erik's patch on your side to check whether the 
> issue occurs or not.
>
> Regards,
> Ying
>
> -----Original Message-----
> From: GUNA [mailto:gbala...@gmail.com]
> Sent: May 31, 2016 23:34
> To: Xue, Ying
> Cc: Jon Maloy; Jon Maloy; tipc-discussion@lists.sourceforge.net; Erik Hugne; 
> Xue Ying (ying.x...@gmail.com)
> Subject: Re: [tipc-discussion] tipc_sk_rcv: Kernel panic on one of the card 
> on 4.4.0
>
> Just want to clarify, system was upgraded only the kernel from 3.4.2
> to 4.4.0 + some tipc patches on Fedora distribution. That said, the
> patch, "net: do not block BH while processing socket backlog" is not
> part of the 4.4.0. So, the issue is not due to this commit.
>
> If the patch, "tipc: block BH in TCP callbacks" could resolve the
> issue then I could try applying the patch. However the issue is not
> reproducible. So, we may not get the result right away.
>
> Which Erik's patch you are talking about?
> Is this one, "tipc: fix timer handling when socket is owned" ?
>
>
> /// Guna
>
> On Tue, May 31, 2016 at 3:49 AM, Xue, Ying  wrote:
>> Hi Jon,
>>
>> Today, I spent time further analyzing its potential root cause why the soft 
>> lockup occurred regarding log provided by GUNA. But I don't find some 
>> valuable hints.
>>
>> To be honest, even if CONN_MANAGER/CONN_PROBE message is sent through 
>> tipc_node_xmit_skb() without holding "owner" flag in tipc_sk_timeout(), 
>> deadlock should not happen in theory. Before the tipc_sk_rcv() is secondly 
>> called, destination port and source port of CONN_MANAGER/CONN_PROBE message 
>> created in tipc_sk_timeout() have been reversed. As a result, the tsk found 
>> at (*1) is different with another tsk found at (*2) because we use different 
>> destination number to look up tsk instances.
>>
>> tipc_sk_timeout()
>>   create: CONN_MANAGER/CONN_PROBE msg (src port= tsk->portid, dst port = 
>> peer_port)
>>   tipc_node_xmit_skb()
>> tipc_node_xmit()
>>   tipc_sk_rcv()
>> tsk = tipc_sk_lookup(net, dport); // use dst port(peer_port) to look 
>> up tsk, and the tsk is called tsk1 (*1)
>> if (likely(spin_trylock_bh(&sk->sk_lock.slock)))
>> tipc_sk_enqueue()
>>   filter_rcv()
>> tipc_sk_proto_rcv()
>>tipc_sk_respond()
>>  reverse ports: dport = tsk->portid;  src port = 
>> peer_port
>>  tipc_node_xmit_skb()
>>tipc_node_xmit()
>>   tipc_sk_rcv()
>>  tsk = tipc_sk_lookup(net, dport); // use dst 
>> port(portid) to look up tsk, and the tsk is supposed as tsk2 --(*2)
>>  if (likely(spin_trylock_bh(&sk->sk_lock.slock)))
>>
>> So even if "owner" flag of tsk1 is not set, it's safe for us to operate tsk2 
>> in the same BH context.
>>
>> I also agree with you. Although Erik's patch might solve the issue, we still 
>> need to further find its root cause.
>>
>> Additionally, I suspect there is a certai
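
The port-reversal argument in the call chain quoted above can be checked
with a toy model. This is only a sketch: struct sketch_tsk, sketch_lookup()
(standing in for tipc_sk_lookup()), and the message helpers are invented for
illustration, and the port numbers used with them are arbitrary.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a TIPC socket; only the port number matters. */
struct sketch_tsk {
    uint32_t portid;
};

/* Hypothetical message header carrying just the two ports. */
struct sketch_msg {
    uint32_t src_port;
    uint32_t dst_port;
};

/* Linear search standing in for a lookup by destination port. */
static struct sketch_tsk *sketch_lookup(struct sketch_tsk *tbl, size_t n,
                                        uint32_t dport)
{
    for (size_t i = 0; i < n; i++)
        if (tbl[i].portid == dport)
            return &tbl[i];
    return NULL;
}

/* The probe built in the timeout handler: own port -> peer port. */
static struct sketch_msg make_probe(uint32_t own_port, uint32_t peer_port)
{
    struct sketch_msg m = { .src_port = own_port, .dst_port = peer_port };
    return m;
}

/* The response swaps source and destination ports. */
static struct sketch_msg reverse_ports(struct sketch_msg m)
{
    struct sketch_msg r = { .src_port = m.dst_port, .dst_port = m.src_port };
    return r;
}
```

With two sockets on ports 100 and 200, looking up the probe's destination
yields the socket on port 200 (tsk1), and looking up the reversed message's
destination yields the socket on port 100 (tsk2). In this model the two
lookups can only land on the same socket if the own port and the peer port
are equal.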

Re: [tipc-discussion] tipc_sk_rcv: Kernel panic on one of the card on 4.4.0

2016-05-31 Thread GUNA
Could you provide the exact code change for rescheduling, so that I
don't make any mistakes?

Also, could I still apply the patch "tipc: block BH in TCP callbacks"?

On Tue, May 31, 2016 at 12:03 PM, Erik Hugne  wrote:
>
> On May 31, 2016 17:34, "GUNA"  wrote:
>>
>> Which Erik's patch you are talking about?
>> Is this one, "tipc: fix timer handling when socket is owned" ?
>
> I think he was referring to my earlier suggestion to reschedule the timer if
> the socket is owned by user when it fires.
>
> The patch i sent yesterday tries to solve it slightly different. Instead of
> rescheduling the timer, we set a flag and act on that when the sock is
> released by user.
>
> //E
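
The two approaches described in the quoted reply (re-arm the timer and retry,
or set a flag and act when the user releases the socket) can be contrasted
with a small userspace model. This is only a sketch of the control flow. It
is not Erik's patch and not kernel code: struct sketch_sock, its fields, and
every function name below are invented for illustration.

```c
#include <stdbool.h>

/* Hypothetical model of a socket; field names are illustrative only. */
struct sketch_sock {
    bool owned_by_user;  /* user context currently holds the socket */
    bool timer_pending;  /* timeout work deferred until release */
    int  retries_armed;  /* times the timer was re-armed for a retry */
    int  timeouts_run;   /* times the timeout work actually ran */
};

/* Placeholder for the real timeout handling (probe, disconnect, ...). */
static void do_timeout_work(struct sketch_sock *sk)
{
    sk->timeouts_run++;
}

/* Approach A: if the socket is busy, re-arm the timer and retry later. */
static void timer_fires_reschedule(struct sketch_sock *sk)
{
    if (sk->owned_by_user) {
        sk->retries_armed++;
        return;
    }
    do_timeout_work(sk);
}

/* Approach B: if the socket is busy, leave a flag for the owner. */
static void timer_fires_deferred(struct sketch_sock *sk)
{
    if (sk->owned_by_user) {
        sk->timer_pending = true;
        return;
    }
    do_timeout_work(sk);
}

/* Release path for approach B: the owner runs any deferred work. */
static void release_by_user(struct sketch_sock *sk)
{
    sk->owned_by_user = false;
    if (sk->timer_pending) {
        sk->timer_pending = false;
        do_timeout_work(sk);
    }
}
```

In approach A the work runs on a later timer tick, so it may be retried
several times while the user holds the socket. In approach B it runs exactly
once, at the moment the user lets go. In both cases the timeout work never
runs while the socket is owned.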



Re: [tipc-discussion] tipc_sk_rcv: Kernel panic on one of the card on 4.4.0

2016-05-31 Thread GUNA
Just to clarify: only the kernel was upgraded, from 3.4.2 to 4.4.0
plus some TIPC patches, on a Fedora distribution. The patch
"net: do not block BH while processing socket backlog" is not
part of 4.4.0, so the issue is not due to that commit.

If the patch "tipc: block BH in TCP callbacks" could resolve the
issue, I could try applying it. However, the issue is not
reproducible, so we may not get a result right away.

Which of Erik's patches are you talking about?
Is it this one, "tipc: fix timer handling when socket is owned"?


/// Guna

On Tue, May 31, 2016 at 3:49 AM, Xue, Ying  wrote:
> Hi Jon,
>
> Today, I spent time further analyzing its potential root cause why the soft 
> lockup occurred regarding log provided by GUNA. But I don't find some 
> valuable hints.
>
> To be honest, even if CONN_MANAGER/CONN_PROBE message is sent through 
> tipc_node_xmit_skb() without holding "owner" flag in tipc_sk_timeout(), 
> deadlock should not happen in theory. Before the tipc_sk_rcv() is secondly 
> called, destination port and source port of CONN_MANAGER/CONN_PROBE message 
> created in tipc_sk_timeout() have been reversed. As a result, the tsk found 
> at (*1) is different with another tsk found at (*2) because we use different 
> destination number to look up tsk instances.
>
> tipc_sk_timeout()
>   create: CONN_MANAGER/CONN_PROBE msg (src port= tsk->portid, dst port = 
> peer_port)
>   tipc_node_xmit_skb()
> tipc_node_xmit()
>   tipc_sk_rcv()
> tsk = tipc_sk_lookup(net, dport); // use dst port(peer_port) to look 
> up tsk, and the tsk is called tsk1 (*1)
> if (likely(spin_trylock_bh(&sk->sk_lock.slock)))
> tipc_sk_enqueue()
>   filter_rcv()
> tipc_sk_proto_rcv()
>tipc_sk_respond()
>  reverse ports: dport = tsk->portid;  src port = peer_port
>  tipc_node_xmit_skb()
>tipc_node_xmit()
>   tipc_sk_rcv()
>  tsk = tipc_sk_lookup(net, dport); // use dst 
> port(portid) to look up tsk, and the tsk is supposed as tsk2 --(*2)
>  if (likely(spin_trylock_bh(&sk->sk_lock.slock)))
>
> So even if "owner" flag of tsk1 is not set, it's safe for us to operate tsk2 
> in the same BH context.
>
> I also agree with you. Although Erik's patch might solve the issue, we still 
> need to further find its root cause.
>
> Additionally, I suspect there is a certain relationship between the issue and 
> 5413d1babe8f10de13d72496c12b862eef8ba613 (net: do not block BH while 
> processing socket backlog).
> I don't know whether it's easy to reproduce the issue. But I suggest we can 
> revert above commit or apply the following patch to verify whether the issue 
> is related to the commit.
>
> http://www.spinics.net/lists/netdev/msg378109.html
>
> Regards,
> Ying
>
> -----Original Message-----
> From: Jon Maloy [mailto:jon.ma...@ericsson.com]
> Sent: May 30, 2016 22:43
> To: Xue, Ying; GUNA; Jon Maloy; tipc-discussion@lists.sourceforge.net; Erik 
> Hugne; Xue Ying (ying.x...@gmail.com)
> Subject: RE: [tipc-discussion] tipc_sk_rcv: Kernel panic on one of the card 
> on 4.4.0
>
>
>
>> -----Original Message-----
>> From: Xue, Ying [mailto:ying@windriver.com]
>> Sent: Monday, 30 May, 2016 14:15
>> To: Jon Maloy; GUNA; Jon Maloy; tipc-discussion@lists.sourceforge.net; Erik
>> Hugne; Xue Ying (ying.x...@gmail.com)
>> Subject: RE: [tipc-discussion] tipc_sk_rcv: Kernel panic on one of the card 
>> on 4.4.0
>>
>> Hi Jon,
>>
> ...
>
>> In fact we ever proposed several possible solutions of how to well deal with 
>> the
>> concurrent scenario in skb receive path. Especially, we need to stay BH mode 
>> to
>> directly forward back a skb received in BH, which very easily causes the 
>> problem
>> that a skb will be routed back and forth, leading deadlock occurs.
>
> We stay in BH, but we do *not* keep slock during the send/receive sequence, 
> so this cannot be the cause of the problem. On the receive path, this should 
> work just and any other message received in softirq.
>
> Anyway, I am not sure at all this is a deadlock concerning slock (see earlier 
> mail), but rather a message arriving to a socket that has been corrupted by 
> interference between the timer in softirq and tipc_recv_xxx() in user 
> context. Maybe a state change?
>
> I think a slightly improved version of what Erik suggested might solve the 
> issue, but I won't feel comforta

Re: [tipc-discussion] tipc_sk_rcv: Kernel panic on one of the card on 4.4.0

2016-05-27 Thread GUNA
Any update on the issue? Any other thoughts or a possible fix?

The issue was seen only on the slot12 (1.1.12) node. The other slots were up.

The full logs are listed here:


May 19 05:03:01 [SEQ 248049] dcsx5testslot13 /USR/SBIN/CROND[11359]:
(root) CMD (/opt/cpu_ss7gw/current/scripts/mgmt_apache_watchdog)
May 19 05:03:21 [SEQ 249182] dcsx5testslot12 kernel:  [673637.606852]
NMI watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [swapper/2:0]
May 19 05:03:21 [SEQ 249183] dcsx5testslot12 kernel:  [673637.607791]
NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [swapper/3:0]
May 19 05:03:21 [SEQ 249184] dcsx5testslot12 kernel:  [673637.607817]
Modules linked in: nf_log_ipv4 nf_log_common xt_LOG sctp libcrc32c
e1000e tipc udp_tunnel ip6_udp_tunnel 8021q garp iTCO_wdt xt_physdev
br_netfilter bridge stp llc nf_conntrack_ipv4 ipmiq_drv(O) sio_mmc(O)
nf_defrag_ipv4 ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6
nf_defrag_ipv6 xt_state nf_conntrack event_drv(O) ip6table_filter
ip6_tables ddi(O) lockd pt_timer_info(O) grace ixgbe usb_storage igb
pcspkr iTCO_vendor_support i2c_algo_bit i2c_i801 ptp i2c_core ioatdma
lpc_ich mfd_core tpm_tis dca pps_core intel_ips tpm mdio sunrpc
[last unloaded: iTCO_wdt]
May 19 05:03:21 [SEQ 249185] dcsx5testslot12 kernel:  [673637.607819]
CPU: 3 PID: 0 Comm: swapper/3 Tainted: G   O4.4.0 #17
May 19 05:03:21 [SEQ 249186] dcsx5testslot12 kernel:  [673637.607820]
Hardware name: PT AMC124/Base Board Product Name, BIOS
LGNAJFIP.PTI.0012.P15 01/15/2014
May 19 05:03:21 [SEQ 249187] dcsx5testslot12 kernel:  [673637.607821]
task: 880351a88000 ti: 880351a84000 task.ti: 880351a84000
May 19 05:03:21 [SEQ 249188] dcsx5testslot12 kernel:  [673637.607827]
RIP: 0010:[]  [] rhashtable_jhash2+0x0/0xf0
May 19 05:03:21 [SEQ 249189] dcsx5testslot12 kernel:  [673637.607828]
RSP: 0018:88035fc63a60  EFLAGS: 0206
May 19 05:03:21 [SEQ 249190] dcsx5testslot12 kernel:  [673637.607829]
RAX: 880351613a80 RBX: 880347948000 RCX: 880347949060
May 19 05:03:21 [SEQ 249191] dcsx5testslot12 kernel:  [673637.607830]
RDX: 03ff6972 RSI: 0001 RDI: 88035fc63a84
May 19 05:03:21 [SEQ 249192] dcsx5testslot12 kernel:  [673637.607830]
RBP: 88035fc63ab8 R08: 0001 R09: 0004
May 19 05:03:21 [SEQ 249193] dcsx5testslot12 kernel:  [673637.607831]
R10:  R11:  R12: 88035fc63bd0
May 19 05:03:21 [SEQ 249194] dcsx5testslot12 kernel:  [673637.607832]
R13: 88009a4dc000 R14: 8803502f5640 R15: 88035fc63be4
May 19 05:03:21 [SEQ 249195] dcsx5testslot12 kernel:  [673637.607833]
FS:  () GS:88035fc6()
knlGS:
May 19 05:03:21 [SEQ 249196] dcsx5testslot12 kernel:  [673637.607834]
CS:  0010 DS:  ES:  CR0: 8005003b
May 19 05:03:21 [SEQ 249197] dcsx5testslot12 kernel:  [673637.607835]
CR2: 7f6e9c244000 CR3: 01c0a000 CR4: 06e0
May 19 05:03:21 [SEQ 249198] dcsx5testslot12 kernel:  [673637.607835] Stack:
:6600
May 19 05:04:45 [SEQ 252476] dcsx5testslot12 kernel:  [673721.608818]
NMI watchdog: BUG: soft lockup - CPU#3 stuck for 23s! [swapper/3:0]
May 19 05:04:45 [SEQ 252477] dcsx5testslot12 kernel:  [673721.608844]
Modules linked in: nf_log_ipv4 nf_log_common xt_LOG sctp libcrc32c
e1000e tipc udp_tunnel ip6_udp_tunnel 8021q garp iTCO_wdt xt_physdev
br_netfilter bridge stp llc nf_conntrack_ipv4 ipmiq_drv(O) sio_mmc(O)
nf_defrag_ipv4 ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6
nf_defrag_ipv6 xt_state nf_conntrack event_drv(O) ip6table_filter
ip6_tables ddi(O) lockd pt_timer_info(O) grace ixgbe usb_storage igb
pcspkr iTCO_vendor_support i2c_algo_bit i2c_i801 ptp i2c_core ioatdma
lpc_ich mfd_core tpm_tis dca pps_core intel_ips tpm mdio sunrpc
[last unloaded: iTCO_wdt]
May 19 05:04:45 [SEQ 252478] dcsx5testslot12 kernel:  [673721.608847]
CPU: 3 PID: 0 Comm: swapper/3 Tainted: G   O L  4.4.0 #17
May 19 05:04:45 [SEQ 252479] dcsx5testslot12 kernel:  [673721.608848]
Hardware name: PT AMC124/Base Board Product Name, BIOS
LGNAJFIP.PTI.0012.P15 01/15/2014
May 19 05:04:45 [SEQ 252480] dcsx5testslot12 kernel:  [673721.608849]
task: 880351a88000 ti: 880351a84000 task.ti: 880351a84000
May 19 05:04:45 [SEQ 252481] dcsx5testslot12 kernel:  [673721.608858]
RIP: 0010:[]  [] tipc_sk_rcv+0x95/0x490 [tipc]
May 19 05:04:45 [SEQ 252482] dcsx5testslot12 kernel:  [673721.608859]
RSP: 0018:88035fc63ac8  EFLAGS: 0296
May 19 05:04:45 [SEQ 252483] dcsx5testslot12 kernel:  [673721.608860]
RAX: 0301 RBX: 88035fc63b70 RCX: 0001
May 19 05:04:45 [SEQ 252484] dcsx5testslot12 kernel:  [673721.608861]
RDX: b2db9944 RSI: 0200 RDI: 81ce6240
May 19 05:04:45 [SEQ 252485] dcsx5testslot12 kernel:  [673721.608862]
RBP: 88035fc63b38 R08: 0001 R09: 0004
May 19 05:04:45 [SEQ 252486] dcsx5testslot12 kernel:  [673721.608862]
R10:  R11: 000

[tipc-discussion] tipc_sk_rcv: Kernel panic on one of the card on 4.4.0

2016-05-24 Thread GUNA
I suspect a glitch on the switch may have caused the probe or abort
message to be lost. However, even if messages are lost for whatever
reason, shouldn't the TIPC stack shut the connection down gracefully
and release all its resources, instead of panicking or hanging?

Would using lock_sock()/release_sock() in tipc_sk_timeout() fix the issue?

Thanks,
Guna



Re: [tipc-discussion] tipc_sk_rcv: Kernel panic on one of the card on 4.4.0

2016-05-20 Thread GUNA
Thanks, Erik, for your quick analysis.
If this is not a known issue, is there an expert available to
investigate further why this lockup happens? Otherwise, please let me
know the patch or fix information.

// Guna

On Fri, May 20, 2016 at 1:19 AM, Erik Hugne  wrote:
> A little more awake now. Didnt see this yesterday.
> Look at the trace from CPU2 in Guna's initial mail.
>
> TIPC is recursing into the receive loop a second time, and freezes when it
> tries to take slock a second time. this is done in a timer CB, and softirq
> lockup detector kicks in after ~20s.
>
> //E
>
> [686797.257426]  
>
> [686797.257426]  [] _raw_spin_trylock_bh+0x40/0x50
>
> [686797.257430]  [] tipc_sk_rcv+0xbc/0x490 [tipc]
>
> [686797.257432]  [] ? tcp_rcv_established+0x40e/0x760
>
> [686797.257435]  [] tipc_node_xmit+0x11f/0x150 [tipc]
>
> [686797.257437]  [] ? find_busiest_group+0x153/0x980
>
> [686797.257441]  [] tipc_node_xmit_skb+0x37/0x60 [tipc]
>
> [686797.257444]  [] tipc_sk_respond+0x99/0xc0 [tipc]
>
> [686797.257447]  [] filter_rcv+0x4cd/0x550 [tipc]
>
> [686797.257451]  [] tipc_sk_rcv+0x2dd/0x490 [tipc]
>
> [686797.257454]  [] tipc_node_xmit+0x11f/0x150 [tipc]
>
> [686797.257458]  [] ? tipc_recv_stream+0x370/0x370 [tipc]
>
> [686797.257461]  [] tipc_node_xmit_skb+0x37/0x60 [tipc]
>
> [686797.257464]  [] tipc_sk_timeout+0xe0/0x180 [tipc]
>
> On May 19, 2016 21:37, "GUNA"  wrote:
>
> All the CPU cards on the system running the same load.  Seen similar
> issue about 6 weeks back but seen again now on one card compared to
> all cards last time. At this time, there was very light traffic
> (handshake).
>
> I had seen following as part of the log, not sure it contributes the
> issue or not:
>
> [686808.930065] ixgbe :01:00.0 p19p2: initiating reset due to tx timeout
> [686810.062026] INFO: rcu_sched detected stalls on CPUs/tasks:
> [686813.257936] NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s!
>
> // details
> ...
> [686797.890210]  [] ret_from_fork+0x3f/0x70
> [686797.895692]  [] ? flush_kthread_worker+0x90/0x90
> [686797.901951] Code: 00 eb 02 89 c6 f7 c6 00 ff ff ff 75 41 83 fe 01
> 89 ca 89 f0 41 0f 44 d0 f0 0f b1 17 39 f0 75 e3 83 fa 01 75 04 eb 0d
> f3 90 8b 07 <84> c0 75 f8 66 c7 07 01 00 5d c3 8b 37 81 fe 00 01 00 00
> 75 b6
> [686798.930348] ixgbe :01:00.0 p19p2: initiating reset due to tx timeout
> [686803.938207] ixgbe :01:00.0 p19p2: initiating reset due to tx timeout
> [686808.930065] ixgbe :01:00.0 p19p2: initiating reset due to tx timeout
> [686810.062026] INFO: rcu_sched detected stalls on CPUs/tasks:
> [686810.067613] 2-...: (2 GPs behind) idle=a27/1/0
> softirq=70531357/70531357 fqs=4305315
> [686810.075606] (detected by 1, t=13200382 jiffies,
> g=173829751, c=173829750, q=25641590)
> [686810.083624]  880351a83e68 0018 81591bf1
> 880351a83ec8
> [686810.091163]  0002005932b8 00010006 880351a84000
> 81d1ce20
> [686810.098697]  880351a84000 88035fc5d300 81cb2c00
> 880351a83eb8
> [686810.106233] Call Trace:
> [686810.108767]  [] ? cpuidle_enter_state+0x91/0x200
> [686810.115026]  [] ? cpuidle_enter+0x17/0x20
> [686810.120673]  [] ? call_cpuidle+0x37/0x60
> [686810.126234]  [] ? cpuidle_select+0x13/0x20
> [686810.131978]  [] ? cpu_startup_entry+0x211/0x2d0
> [686810.138156]  [] ? start_secondary+0x103/0x130
> [686813.257936] NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s!
> [swapper/3:0]
>
> On Thu, May 19, 2016 at 3:06 PM, Erik Hugne  wrote:
>> On Thu, May 19, 2016 at 10:34:05AM -0400, GUNA wrote:
>>> One of the card in my system is dead and rebooted to recover it.
>>> The system is running on Kernel 4.4.0 + some latest TIPC patches.
>>> Your earliest feedback of the issue is recommended.
>>>
>> At first i thought this might be a spinlock contention problem.
>>
>> CPU2 is receiving TIPC traffic on a socket, and is trying to grab a
>> spinlock in tipc_sk_rcv context (probably sk->sk_lock.slock)
>> First argument to spin_trylock_bh() is passed in RDI: a01546cc
>>
>> CPU3 is sending TIPC data, tipc_node_xmit()->tipc_sk_rcv() indicates
>> that it's traffic between sockets on the same machine.
>> And i think this is the same socket as on CPU2, because we see the same
>> address in RDI: a01546cc
>>
>> But this made me unsure:
>> [686798.930348] ixgbe :01:00.0 p19p2: initiating reset due to tx
>> timeout
>> Is it contributing to the problem, or is it a side effect of a spinlock
>> contention?
>>
>> Driver (or HW) bugs _are_ fatal for a network stack, but why would 

Re: [tipc-discussion] tipc_sk_rcv: Kernel panic on one of the card on 4.4.0

2016-05-19 Thread GUNA
All the CPU cards in the system are running the same load. We saw a
similar issue about 6 weeks ago on all cards; this time it was seen on
only one card. At the time, there was very light traffic (handshake).

I also saw the following in the log; I am not sure whether it
contributes to the issue:

[686808.930065] ixgbe :01:00.0 p19p2: initiating reset due to tx timeout
[686810.062026] INFO: rcu_sched detected stalls on CPUs/tasks:
[686813.257936] NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s!

// details
...
[686797.890210]  [] ret_from_fork+0x3f/0x70
[686797.895692]  [] ? flush_kthread_worker+0x90/0x90
[686797.901951] Code: 00 eb 02 89 c6 f7 c6 00 ff ff ff 75 41 83 fe 01
89 ca 89 f0 41 0f 44 d0 f0 0f b1 17 39 f0 75 e3 83 fa 01 75 04 eb 0d
f3 90 8b 07 <84> c0 75 f8 66 c7 07 01 00 5d c3 8b 37 81 fe 00 01 00 00
75 b6
[686798.930348] ixgbe :01:00.0 p19p2: initiating reset due to tx timeout
[686803.938207] ixgbe :01:00.0 p19p2: initiating reset due to tx timeout
[686808.930065] ixgbe :01:00.0 p19p2: initiating reset due to tx timeout
[686810.062026] INFO: rcu_sched detected stalls on CPUs/tasks:
[686810.067613] 2-...: (2 GPs behind) idle=a27/1/0
softirq=70531357/70531357 fqs=4305315
[686810.075606] (detected by 1, t=13200382 jiffies,
g=173829751, c=173829750, q=25641590)
[686810.083624]  880351a83e68 0018 81591bf1
880351a83ec8
[686810.091163]  0002005932b8 00010006 880351a84000
81d1ce20
[686810.098697]  880351a84000 88035fc5d300 81cb2c00
880351a83eb8
[686810.106233] Call Trace:
[686810.108767]  [] ? cpuidle_enter_state+0x91/0x200
[686810.115026]  [] ? cpuidle_enter+0x17/0x20
[686810.120673]  [] ? call_cpuidle+0x37/0x60
[686810.126234]  [] ? cpuidle_select+0x13/0x20
[686810.131978]  [] ? cpu_startup_entry+0x211/0x2d0
[686810.138156]  [] ? start_secondary+0x103/0x130
[686813.257936] NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s!
[swapper/3:0]

On Thu, May 19, 2016 at 3:06 PM, Erik Hugne  wrote:
> On Thu, May 19, 2016 at 10:34:05AM -0400, GUNA wrote:
>> One of the card in my system is dead and rebooted to recover it.
>> The system is running on Kernel 4.4.0 + some latest TIPC patches.
>> Your earliest feedback of the issue is recommended.
>>
> At first i thought this might be a spinlock contention problem.
>
> CPU2 is receiving TIPC traffic on a socket, and is trying to grab a
> spinlock in tipc_sk_rcv context (probably sk->sk_lock.slock)
> First argument to spin_trylock_bh() is passed in RDI: a01546cc
>
> CPU3 is sending TIPC data, tipc_node_xmit()->tipc_sk_rcv() indicates
> that it's traffic between sockets on the same machine.
> And i think this is the same socket as on CPU2, because we see the same
> address in RDI: a01546cc
>
> But this made me unsure:
> [686798.930348] ixgbe :01:00.0 p19p2: initiating reset due to tx timeout
> Is it contributing to the problem, or is it a side effect of a spinlock 
> contention?
>
> Driver (or HW) bugs _are_ fatal for a network stack, but why would a lock 
> contention
> in a network stack cause NIC TX timeouts?
>
> Does all cards in your system have similar workloads?
> Do you see this on multiple cards?
>
> //E



[tipc-discussion] tipc_sk_rcv: Kernel panic on one of the card on 4.4.0

2016-05-19 Thread GUNA
One of the cards in my system died and had to be rebooted to recover it.
The system is running kernel 4.4.0 plus some of the latest TIPC patches.
Your earliest feedback on the issue would be appreciated.

The cascaded failure logs follow:


[686797.257405] Modules linked in: nf_log_ipv4 nf_log_common xt_LOG
sctp libcrc32c e1000e tipc udp_tunnel ip6_udp_tunnel 8021q garp
iTCO_wdt xt_physdev br_netfilter bridge stp llc nf_conntrack_ipv4
ipmiq_drv(O) sio_mmc(O) nf_defrag_ipv4 ip6t_REJECT nf_reject_ipv6
nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack event_drv(O)
ip6table_filter ip6_tables ddi(O) lockd pt_timer_info(O) grace ixgbe
usb_storage igb pcspkr iTCO_vendor_support i2c_algo_bit i2c_i801 ptp
i2c_core ioatdma lpc_ich mfd_core tpm_tis dca pps_core intel_ips tpm
mdio sunrpc [last unloaded: iTCO_wdt]
[686797.257407] CPU: 2 PID: 0 Comm: swapper/2 Tainted: GW  O L
 4.4.0 #17
[686797.257407] Hardware name: PT AMC124/Base Board Product Name, BIOS
LGNAJFIP.PTI.0012.P15 01/15/2014
[686797.257408] task: 880351a751c0 ti: 880351a8 task.ti:
880351a8
[686797.257411] RIP: 0010:[]  []
__local_bh_enable_ip+0x4c/0x90
[686797.257412] RSP: 0018:88035fc43aa0  EFLAGS: 0296
[686797.257412] RAX: 0301 RBX: fe01 RCX:
0001
[686797.257413] RDX: 0003 RSI: 0200 RDI:
a01546cc
[686797.257414] RBP: 88035fc43aa8 R08: 02d4 R09:
0004
[686797.257415] R10:  R11:  R12:
88035fc43bd0
[686797.257415] R13: ab7500c6 R14: 880350124ec0 R15:
88035fc43be4
[686797.257416] FS:  () GS:88035fc4()
knlGS:
[686797.257417] CS:  0010 DS:  ES:  CR0: 8005003b
[686797.257418] CR2: 7f6e9c244000 CR3: 01c0a000 CR4:
06e0
[686797.257418] Stack:
[686797.257420]  88035fc43b70 88035fc43ab8 816de340
88035fc43b38
[686797.257421]  a01546cc 88034a8fad10 0020
88035fc43af8
[686797.257423]  81ce6240 880350124f48 00012821c002
88035fc43b58
[686797.257423] Call Trace:
[686797.257426]
[686797.257426]  [] _raw_spin_trylock_bh+0x40/0x50
[686797.257430]  [] tipc_sk_rcv+0xbc/0x490 [tipc]
[686797.257432]  [] ? tcp_rcv_established+0x40e/0x760
[686797.257435]  [] tipc_node_xmit+0x11f/0x150 [tipc]
[686797.257437]  [] ? find_busiest_group+0x153/0x980
[686797.257441]  [] tipc_node_xmit_skb+0x37/0x60 [tipc]
[686797.257444]  [] tipc_sk_respond+0x99/0xc0 [tipc]
[686797.257447]  [] filter_rcv+0x4cd/0x550 [tipc]
[686797.257451]  [] tipc_sk_rcv+0x2dd/0x490 [tipc]
[686797.257454]  [] tipc_node_xmit+0x11f/0x150 [tipc]
[686797.257458]  [] ? tipc_recv_stream+0x370/0x370 [tipc]
[686797.257461]  [] tipc_node_xmit_skb+0x37/0x60 [tipc]
[686797.257464]  [] tipc_sk_timeout+0xe0/0x180 [tipc]
[686797.257468]  [] ? tipc_recv_stream+0x370/0x370 [tipc]
[686797.257469]  [] call_timer_fn+0x44/0x110
[686797.257470]  [] ? cascade+0x4a/0x80
[686797.257474]  [] ? tipc_recv_stream+0x370/0x370 [tipc]
[686797.257475]  [] run_timer_softirq+0x22c/0x280
[686797.257477]  [] __do_softirq+0xc8/0x260
[686797.257478]  [] irq_exit+0x83/0xb0
[686797.257480]  [] do_IRQ+0x65/0xf0
[686797.257481]  [] common_interrupt+0x7f/0x7f
[686797.257484]
[686797.257484]  [] ? cpuidle_enter_state+0xad/0x200
[686797.257485]  [] ? cpuidle_enter_state+0x91/0x200
[686797.257486]  [] cpuidle_enter+0x17/0x20
[686797.257488]  [] call_cpuidle+0x37/0x60
[686797.257489]  [] ? cpuidle_select+0x13/0x20
[686797.257490]  [] cpu_startup_entry+0x211/0x2d0
[686797.257491]  [] start_secondary+0x103/0x130
[686797.257506] Code: 80 c7 00 01 75 31 83 eb 01 f7 db 65 01 1d 05 6e
f8 7e 65 8b 05 fe 6d f8 7e a9 00 ff 1f 00 74 31 65 ff 0d f0 6d f8 7e
48 83 c4 08 <5b> 5d c3 9c 58 f6 c4 02 75 d1 eb c6 be 96 00 00 00 48 c7
c7 82

[686797.640076] Modules linked in: nf_log_ipv4 nf_log_common xt_LOG
sctp libcrc32c e1000e tipc udp_tunnel ip6_udp_tunnel 8021q garp
iTCO_wdt xt_physdev br_netfilter bridge stp llc nf_conntrack_ipv4
ipmiq_drv(O) sio_mmc(O) nf_defrag_ipv4 ip6t_REJECT nf_reject_ipv6
nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack event_drv(O)
ip6table_filter ip6_tables ddi(O) lockd pt_timer_info(O) grace ixgbe
usb_storage igb pcspkr iTCO_vendor_support i2c_algo_bit i2c_i801 ptp
i2c_core ioatdma lpc_ich mfd_core tpm_tis dca pps_core intel_ips tpm
mdio sunrpc [last unloaded: iTCO_wdt]
[686797.691099] CPU: 0 PID: 28367 Comm: kworker/u32:2 Tainted: G
 W  O L  4.4.0 #17
[686797.699005] Hardware name: PT AMC124/Base Board Product Name, BIOS
LGNAJFIP.PTI.0012.P15 01/15/2014
[686797.708119] Workqueue: tipc_send tipc_send_work [tipc]
[686797.713356] task: 88034e8ad1c0 ti: 8800998f task.ti:
8800998f
[686797.720906] RIP: 0010:[]  []
queued_spin_lock_slowpath+0x46/0x160
[686797.730220] RSP: 0018:8800998f3c78  EFLAGS: 0202
[686797.735606] RAX: 0101 RB

Re: [tipc-discussion] Tipc: name table mismatch between different cards in a system

2016-05-04 Thread GUNA
I have seen the patch "tipc: move netlink policies to netlink.c".
Does this patch include the tipc-config compatibility fix for the issue
Jon found earlier? I ask because I don't see any change in netlink_compat.c.

From Jon:
"
When I run the "tipc" tool  I see the correct value, i.e., key == (portid + 1).
When I run tipc-config (which is deprecated in the new version), I see
the wrong value key == portid for the same publications!
"

// Guna

On Mon, May 2, 2016 at 1:22 PM, GUNA  wrote:
> Is there any possibility getting the fix soon? Our audit scripts cause
> alarm due to incorrect table mismatch. If you point me the code to be
> fixed then I will fix it in my kernel. I am using kernel 4.4.0 on
> Fedora dist.
> Thanks in advance.
> Guna
>
> On Fri, Apr 29, 2016 at 11:55 AM, Jon Maloy  wrote:
>>
>>
>>> -----Original Message-----
>>> From: GUNA [mailto:gbala...@gmail.com]
>>> Sent: Friday, 29 April, 2016 10:48
>>> To: Jon Maloy
>>> Cc: tipc-discussion@lists.sourceforge.net
>>> Subject: Re: Tipc: name table mismatch between different cards in a system
>>>
>>> The two skb_linearize() calls and the update of ‘hdr' fixes are
>>> already in my load did not solve this issue. The issue remains same
>>> even after today's ACTIVE state fix (before one of link is STANDBY
>>> even same priority)
>>>
>>> // IO card, note this does not run latest kernel or tipc
>>> [root@10 ~]# tipc-config -nt |grep 2334480598
>>>20012  20012  <1.1.12:2334480598>2334480599  
>>> cluster
>>>
>>> // runs latest kernel on all CPU cards.
>>> [root@2 ~]# tipc-config -nt |grep 2334480598
>>> 50009  20012  20012  <1.1.12:2334480598>2334480598  
>>> cluster
>>
>> This was easy to reproduce, and actually looks like another presentation 
>> problem.
>>
>> When I run the "tipc" tool  I see the correct value, i.e., key == (portid + 
>> 1).
>> When I run tipc-config (which is deprecated in the new version), I see the 
>> wrong value key == portid for the same publications!
>>
>> So, your code will probably work correct, but the values presented will be 
>> wrong on the new version and correct on the old one.
>> I think this is something for Richard Alpe, who wrote the new netlink 
>> compatibility code, to take a look at.
>>
>> ///jon
>>
>>>
>>>
>>> On Thu, Apr 28, 2016 at 8:19 PM, GUNA  wrote:
>>> > Thanks Jon. I already applied this patch on currently running module.
>>> >
>>> > On Thursday, April 28, 2016, Jon Maloy  wrote:
>>> >>
>>> >> Here it is. (This is just pasted into Outlook, so don’t try to apply it)
>>> >>
>>> >> If you manually add the two skb_linearize() calls and the update of ‘hdr’
>>> >> you should be safe.
>>> >>
>>> >>
>>> >>
>>> >> Good luck!
>>> >>
>>> >> ///jon
>>> >>
>>> >>
>>> >>
>>> >>
>>> >>
>>> >> commit c7cad0d6f70cd4ce8644ffe528a4df1cdc2e77f5
>>> >>
>>> >> Author: Jon Paul Maloy 
>>> >>
>>> >> Date:   Thu Nov 19 14:30:40 2015 -0500
>>> >>
>>> >>
>>> >>
>>> >> tipc: move linearization of buffers to generic code
>>> >>
>>> >>
>>> >>
>>> >> In commit 5cbb28a4bf65c7e4 ("tipc: linearize arriving NAME_DISTR
>>> >>
>>> >> and LINK_PROTO buffers") we added linearization of NAME_DISTRIBUTOR,
>>> >>
>>> >> LINK_PROTOCOL/RESET and LINK_PROTOCOL/ACTIVATE to the function
>>> >>
>>> >> tipc_udp_recv(). The location of the change was selected in order
>>> >>
>>> >> to make the commit easily appliable to 'net' and 'stable'.
>>> >>
>>> >>
>>> >>
>>> >> We now move this linearization to where it should be done, in the
>>> >>
>>> >> functions tipc_named_rcv() and tipc_link_proto_rcv() respectively.
>>> >>
>>> >>
>>> >>
>>> >> Reviewed-by: Ying Xue 
>>> >>
>>> >> Signed-off-by: Jon Maloy 
>>> >>
>>> >> Signed-off-by: David S. Miller 

Re: [tipc-discussion] tipc: tipc_recv_stream with kernel panic

2016-05-03 Thread GUNA
Thanks Ying.

As you suggested, I will revert the "tipc: avoid packets leaking on
socket receive queue" patch. Since the issue is not reproducible on
demand, I may need to wait and see whether it occurs again with the new
driver.

We do experience traffic throughput problems because TIPC connections
are failing (not with this change). If anyone is aware of such failures,
please let me know.


Background
The system was originally based on kernel 3.4.2 on Fedora 16 and was
stable. Recently, I updated the system to kernel 4.4.0. All stock kernel
drivers are being used, with no customization except that I ported some
of the latest TIPC patches to fix some TIPC issues.

All 6 routing CPUs went down over the course of the weekend. The noted
output is from one CPU; the others are unknown, but are assumed to have
gone down for the same reason. Also note that one of the routing cards
(no output available) went down again on Monday.

The new kernel 4.4.0 has been in use on the system since April 1st, and
we have seen the issue 3 times so far. Each time, the traffic was mostly
heartbeat-type traffic, not heavy traffic.



On Tue, May 3, 2016 at 6:26 AM, Xue, Ying  wrote:
> I agree with Erik too.
>
>
>
> The oops should be caused by socket was freed early. But
>
>
>
> GUNA, can you reproduce the issue? If so, please try to revert the commit
> f4195d1eac954a67adf112dd53404560cc55b942 (“tipc: avoid packets leaking on
> socket receive queue”), and verify whether the issue occurs or not.
>
>
>
> I suspect the commit bring some unknown side effect, leading to the panic.
>
>
>
> Thanks,
>
> Ying
>
> From: Erik Hugne [mailto:erik.hu...@gmail.com]
> Sent: May 3, 2016 13:09
> To: GUNA
> Cc: tipc-discussion@lists.sourceforge.net;
> parthasarathy.bhuvara...@ericsson.com; Richard Alpe; Xue, Ying
> Subject: Re: [tipc-discussion] tipc: tipc_recv_stream with kernel panic
>
>
>
> (On mobile)
>
> At first glance, it seems that the socket was freed, but there was a pending
> wakeup signal for it. Which then causes the subsequent spin_lock_bh() to
> deref freed mem.
>
> //E
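
Erik's reading above (an object freed while a wakeup or timer is still pending for it) is the use-after-free pattern that reference counting is meant to prevent. The block below is a user-space C sketch of that general idea, not TIPC or kernel code: arming the timer takes a reference, and that reference is dropped only when the timer fires, so the last put cannot happen while a callback may still run. All names (struct fake_sock, fs_hold, fs_put, timer_arm, timer_fire) are hypothetical. The kernel's sk_reset_timer()/sk_stop_timer() helpers use a similar hold/put pairing, but whether a missing reference is the cause of this particular oops is not established in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a socket; single-threaded sketch only.
 * Real code would need atomic reference counts and locking. */
struct fake_sock {
    int refcnt;          /* number of outstanding references */
    bool timer_pending;  /* true while a timer callback is still due */
    bool *freed_flag;    /* set when the object is released (for the demo) */
};

struct fake_sock *fs_create(bool *freed_flag)
{
    struct fake_sock *s = malloc(sizeof(*s));

    if (!s)
        return NULL;
    s->refcnt = 1;       /* the creator's reference */
    s->timer_pending = false;
    s->freed_flag = freed_flag;
    *freed_flag = false;
    return s;
}

void fs_hold(struct fake_sock *s)
{
    assert(s->refcnt > 0);
    s->refcnt++;
}

void fs_put(struct fake_sock *s)
{
    assert(s->refcnt > 0);
    if (--s->refcnt == 0) {
        *s->freed_flag = true;
        free(s);
    }
}

/* Arming the timer pins the object until the callback has run. */
void timer_arm(struct fake_sock *s)
{
    if (!s->timer_pending) {
        s->timer_pending = true;
        fs_hold(s);
    }
}

/* The callback may touch *s safely because it still owns a reference. */
void timer_fire(struct fake_sock *s)
{
    assert(s->timer_pending);
    s->timer_pending = false;
    fs_put(s);           /* drop the timer's reference last */
}
```

With this pairing, an owner that calls fs_put() while the timer is armed only drops the count to one; the memory is released when timer_fire() drops the final reference.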
>
> On May 3, 2016 02:43, "GUNA"  wrote
> [...]
>>> [375832.498126] BUG: unable to handle kernel paging request at
>>> 01a400015ff4
>> [375832.505300] IP: []
>> queued_spin_lock_slowpath+0xe6/0x160
>> [375832.512394] PGD 0
>> [375832.514657] Oops: 0002 [#1] SMP
>> [375832.518306] Modules linked in: nf_log_ipv6 nf_log_ipv4
>> nf_log_common xt_LOG sctp libcrc32c e1000e tipc udp_tunnel
>> ip6_udp_tunnel 8021q garp iTCO_wdt xt_physdev br_netfilter bridge stp
>> llc nf_conntrack_ipv4 nf_defrag_ipv4 ipmiq_drv(O) sio_mmc(O)
>> ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state
>> nf_conntrack lockd ip6table_filter event_drv(O) ip6_tables grace
>> pt_timer_info(O) ddi(O) usb_storage ixgbe igb i2c_i801
>> iTCO_vendor_support i2c_algo_bit ioatdma intel_ips i2c_core pcspkr
>> sunrpc ptp mdio dca pps_core lpc_ich tpm_tis mfd_core tpm [last
>> unloaded: iTCO_wdt]
>> [375832.573693] CPU: 4 PID: 0 Comm: swapper/4 Tainted: G   O
>>  4.4.0 #14
>> [375832.581385] Hardware name: PT AMC124/Base Board Product Name, BIOS
>> LGNAJFIP.PTI.0012.P15 01/15/2014
>> [375832.591028] task: 880351a89b40 ti: 880351a9 task.ti:
>> 880351a9
>> [375832.599026] RIP: 0010:[]  []
>> queued_spin_lock_slowpath+0xe6/0x160
>> [375832.608964] RSP: 0018:88035fc83d58  EFLAGS: 00010002
>> [375832.614825] RAX: 1447 RBX: 0292 RCX:
>> 88035fc95fc0
>> [375832.622743] RDX: 01a400015ff4 RSI: 0014 RDI:
>> 880351232f80
>> [375832.630567] RBP: 88035fc83d58 R08: 0101 R09:
>> 0004
>> [375832.638348] R10:  R11:  R12:
>> 01001002
>> [375832.645919] R13: 0001 R14:  R15:
>> 
>> [375832.653610] FS:  () GS:88035fc8()
>> knlGS:
>> [375832.662317] CS:  0010 DS:  ES:  CR0: 8005003b
>> [375832.668483] CR2: 01a400015ff4 CR3: 01c0a000 CR4:
>> 06e0
>> [375832.676133] Stack:
>> [375832.678344]  88035fc83d78 816de2c1 88034a8bba60
>> 880351232f80
>> [375832.686163]  88035fc83db8 810bc592 88035fc83dc8
>> 880351758000
>> [375832.694139]  01001002  b802f4bd
>> a024e6f0
>> [375832.702154] Call Trace:
>> [375832.704844]  
>> [375832.707018]  [] rqsav_raw_spin_lock_ie+0x31/0x40
>
> rqsav_raw_spin_lock_ie??
> Is this some proprietary 

[tipc-discussion] tipc: tipc_recv_stream with kernel panic

2016-05-02 Thread GUNA
The following TIPC traces were collected after the cards were forced to
reboot to recover them.
Kernel 4.4.0 is running, with some of the latest TIPC patches applied.

[   65.954959] sm-msp-queue[1279]: unable to qualify my own domain
name (dcsx5testslot3) -- using short name
[  632.098785] perf interrupt took too long (2505 > 2500), lowering
kernel.perf_event_max_sample_rate to 5
[ 5880.428123] perf interrupt took too long (5585 > 5000), lowering
kernel.perf_event_max_sample_rate to 25000
[17934.014969] CE: hpet increased min_delta_ns to 20115 nsec
[38956.721789] CE: hpet4 increased min_delta_ns to 20115 nsec
[46927.872827] hrtimer: interrupt took 63361 ns
[101662.241093] CE: hpet2 increased min_delta_ns to 20115 nsec
[245973.044600] CE: hpet6 increased min_delta_ns to 20115 nsec
[368639.565040] show_signal_msg: 6 callbacks suppressed
[375832.498126] BUG: unable to handle kernel paging request at 01a400015ff4
[375832.505300] IP: [] queued_spin_lock_slowpath+0xe6/0x160
[375832.512394] PGD 0
[375832.514657] Oops: 0002 [#1] SMP
[375832.518306] Modules linked in: nf_log_ipv6 nf_log_ipv4
nf_log_common xt_LOG sctp libcrc32c e1000e tipc udp_tunnel
ip6_udp_tunnel 8021q garp iTCO_wdt xt_physdev br_netfilter bridge stp
llc nf_conntrack_ipv4 nf_defrag_ipv4 ipmiq_drv(O) sio_mmc(O)
ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state
nf_conntrack lockd ip6table_filter event_drv(O) ip6_tables grace
pt_timer_info(O) ddi(O) usb_storage ixgbe igb i2c_i801
iTCO_vendor_support i2c_algo_bit ioatdma intel_ips i2c_core pcspkr
sunrpc ptp mdio dca pps_core lpc_ich tpm_tis mfd_core tpm [last
unloaded: iTCO_wdt]
[375832.573693] CPU: 4 PID: 0 Comm: swapper/4 Tainted: G   O
 4.4.0 #14
[375832.581385] Hardware name: PT AMC124/Base Board Product Name, BIOS
LGNAJFIP.PTI.0012.P15 01/15/2014
[375832.591028] task: 880351a89b40 ti: 880351a9 task.ti:
880351a9
[375832.599026] RIP: 0010:[]  []
queued_spin_lock_slowpath+0xe6/0x160
[375832.608964] RSP: 0018:88035fc83d58  EFLAGS: 00010002
[375832.614825] RAX: 1447 RBX: 0292 RCX:
88035fc95fc0
[375832.622743] RDX: 01a400015ff4 RSI: 0014 RDI:
880351232f80
[375832.630567] RBP: 88035fc83d58 R08: 0101 R09:
0004
[375832.638348] R10:  R11:  R12:
01001002
[375832.645919] R13: 0001 R14:  R15:

[375832.653610] FS:  () GS:88035fc8()
knlGS:
[375832.662317] CS:  0010 DS:  ES:  CR0: 8005003b
[375832.668483] CR2: 01a400015ff4 CR3: 01c0a000 CR4:
06e0
[375832.676133] Stack:
[375832.678344]  88035fc83d78 816de2c1 88034a8bba60
880351232f80
[375832.686163]  88035fc83db8 810bc592 88035fc83dc8
880351758000
[375832.694139]  01001002  b802f4bd
a024e6f0
[375832.702154] Call Trace:
[375832.704844]  
[375832.707018]  [] rqsav_raw_spin_lock_ie+0x31/0x40
[375832.713970]  [] __wake_up+0x32/0x70
[375832.719444]  [] ? tipc_recv_stream+0x370/0x370 [tipc]
[375832.726589]  [] sock_def_wakeup+0x30/0x40
[375832.732566]  [] tipc_sk_timeout+0x148/0x180 [tipc]
[375832.739388]  [] ? tipc_recv_stream+0x370/0x370 [tipc]
[375832.746507]  [] call_timer_fn+0x44/0x110
[375832.752378]  [] ? cascade+0x4a/0x80
[375832.757848]  [] ? tipc_recv_stream+0x370/0x370 [tipc]
[375832.764871]  [] run_timer_softirq+0x22c/0x280
[375832.771175]  [] __do_softirq+0xc8/0x260
[375832.776958]  [] irq_exit+0x83/0xb0
[375832.782369]  [] do_IRQ+0x65/0xf0
[375832.787607]  [] common_interrupt+0x7f/0x7f
[375832.793709]  
[375832.795803]  [] ? cpuidle_enter_state+0xad/0x200
[375832.802765]  [] ? cpuidle_enter_state+0x91/0x200
[375832.809338]  [] cpuidle_enter+0x17/0x20
[375832.815155]  [] call_cpuidle+0x37/0x60
[375832.821184]  [] ? cpuidle_select+0x13/0x20
[375832.827249]  [] cpu_startup_entry+0x211/0x2d0
[375832.833535]  [] start_secondary+0x103/0x130
[375832.839759] Code: 87 47 02 c1 e0 10 85 c0 74 38 48 89 c2 c1 e8 12
48 c1 ea 0c 83 e8 01 83 e2 30 48 98 48 81 c2 c0 5f 01 00 48 03 14 c5
00 b2 d1 81 <48> 89 0a 8b 41 08 85 c0 75 0d f3 90 8b 41 08 85 c0 74 f7
eb 02
[375832.861151] RIP  [] queued_spin_lock_slowpath+0xe6/0x160
[375832.868607]  RSP 
[375832.872371] CR2: 01a400015ff4
[375832.876408] ---[ end trace f12e0074b180a165 ]---
[375832.881433] Kernel panic - not syncing: Fatal exception in interrupt
[375832.888391] Kernel Offset: disabled
[375832.891968] ---[ end Kernel panic - not syncing: Fatal exception
in interrupt


Re: [tipc-discussion] Tipc: name table mismatch between different cards in a system

2016-05-02 Thread GUNA
Is there any possibility of getting the fix soon? Our audit scripts
raise an alarm because of the table mismatch. If you point me to the
code that needs fixing, I will fix it in my kernel. I am using kernel
4.4.0 on a Fedora distribution.
Thanks in advance.
Guna

On Fri, Apr 29, 2016 at 11:55 AM, Jon Maloy  wrote:
>
>
>> -----Original Message-----
>> From: GUNA [mailto:gbala...@gmail.com]
>> Sent: Friday, 29 April, 2016 10:48
>> To: Jon Maloy
>> Cc: tipc-discussion@lists.sourceforge.net
>> Subject: Re: Tipc: name table mismatch between different cards in a system
>>
>> The two skb_linearize() calls and the update of ‘hdr' fixes are
>> already in my load did not solve this issue. The issue remains same
>> even after today's ACTIVE state fix (before one of link is STANDBY
>> even same priority)
>>
>> // IO card, note this does not run latest kernel or tipc
>> [root@10 ~]# tipc-config -nt |grep 2334480598
>>20012  20012  <1.1.12:2334480598>2334480599  
>> cluster
>>
>> // runs latest kernel on all CPU cards.
>> [root@2 ~]# tipc-config -nt |grep 2334480598
>> 50009  20012  20012  <1.1.12:2334480598>2334480598  
>> cluster
>
> This was easy to reproduce, and actually looks like another presentation 
> problem.
>
> When I run the "tipc" tool  I see the correct value, i.e., key == (portid + 
> 1).
> When I run tipc-config (which is deprecated in the new version), I see the 
> wrong value key == portid for the same publications!
>
> So, your code will probably work correct, but the values presented will be 
> wrong on the new version and correct on the old one.
> I think this is something for Richard Alpe, who wrote the new netlink 
> compatibility code, to take a look at.
>
> ///jon
>
>>
>>
>> On Thu, Apr 28, 2016 at 8:19 PM, GUNA  wrote:
>> > Thanks Jon. I already applied this patch on currently running module.
>> >
>> > On Thursday, April 28, 2016, Jon Maloy  wrote:
>> >>
>> >> Here it is. (This is just pasted into Outlook, so don’t try to apply it)
>> >>
>> >> If you manually add the two skb_linearize() calls and the update of ‘hdr’
>> >> you should be safe.
>> >>
>> >>
>> >>
>> >> Good luck!
>> >>
>> >> ///jon
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> commit c7cad0d6f70cd4ce8644ffe528a4df1cdc2e77f5
>> >>
>> >> Author: Jon Paul Maloy 
>> >>
>> >> Date:   Thu Nov 19 14:30:40 2015 -0500
>> >>
>> >>
>> >>
>> >> tipc: move linearization of buffers to generic code
>> >>
>> >>
>> >>
>> >> In commit 5cbb28a4bf65c7e4 ("tipc: linearize arriving NAME_DISTR
>> >>
>> >> and LINK_PROTO buffers") we added linearization of NAME_DISTRIBUTOR,
>> >>
>> >> LINK_PROTOCOL/RESET and LINK_PROTOCOL/ACTIVATE to the function
>> >>
>> >> tipc_udp_recv(). The location of the change was selected in order
>> >>
>> >> to make the commit easily appliable to 'net' and 'stable'.
>> >>
>> >>
>> >>
>> >> We now move this linearization to where it should be done, in the
>> >>
>> >> functions tipc_named_rcv() and tipc_link_proto_rcv() respectively.
>> >>
>> >>
>> >>
>> >> Reviewed-by: Ying Xue 
>> >>
>> >> Signed-off-by: Jon Maloy 
>> >>
>> >> Signed-off-by: David S. Miller 
>> >>
>> >>
>> >>
>> >> diff --git a/net/tipc/link.c b/net/tipc/link.c
>> >>
>> >> index 9efbdbd..fa452fb 100644
>> >>
>> >> --- a/net/tipc/link.c
>> >>
>> >> +++ b/net/tipc/link.c
>> >>
>> >> @@ -1260,6 +1260,8 @@ static int tipc_link_proto_rcv(struct tipc_link *l,
>> >> struct sk_buff *skb,
>> >>
>> >> /* fall thru' */
>> >>
>> >> case ACTIVATE_MSG:
>> >>
>> >> +   skb_linearize(skb);
>> >>
>> >> +   hdr = buf_msg(skb);
>> >>
>> >> /* Complete own link name with peer's interface name */
>> >>
>> >> if_nam

[tipc-discussion] tipc utility: No tipc binary in iproute2

2016-04-29 Thread GUNA
I have compiled iproute2 on the server as well as on the target. In both
cases, the "tipc" utility is not built; the rest of the utilities build fine.

I tried iproute2 versions 4.4.0 and 4.5.0.

===
make clean
make
...
make[1]: Leaving directory `/root/iproute2-4.4.0/genl'
make[1]: Entering directory `/root/iproute2-4.4.0/tipc'
make[1]: Nothing to be done for `all'.
make[1]: Leaving directory `/root/iproute2-4.4.0/tipc'
make[1]: Entering directory `/root/iproute2-4.4.0/man'

Am I missing anything?

Thank you,
Guna



Re: [tipc-discussion] Tipc: name table mismatch between different cards in a system

2016-04-29 Thread GUNA
Thank you Jon.

Richard, could you please let me know the fix for this?

thanks,
Guna

On Fri, Apr 29, 2016 at 11:55 AM, Jon Maloy  wrote:
>
>
>> -----Original Message-----
>> From: GUNA [mailto:gbala...@gmail.com]
>> Sent: Friday, 29 April, 2016 10:48
>> To: Jon Maloy
>> Cc: tipc-discussion@lists.sourceforge.net
>> Subject: Re: Tipc: name table mismatch between different cards in a system
>>
>> The two skb_linearize() calls and the update of ‘hdr' fixes are
>> already in my load did not solve this issue. The issue remains same
>> even after today's ACTIVE state fix (before one of link is STANDBY
>> even same priority)
>>
>> // IO card, note this does not run latest kernel or tipc
>> [root@10 ~]# tipc-config -nt |grep 2334480598
>>20012  20012  <1.1.12:2334480598>2334480599  
>> cluster
>>
>> // runs latest kernel on all CPU cards.
>> [root@2 ~]# tipc-config -nt |grep 2334480598
>> 50009  20012  20012  <1.1.12:2334480598>2334480598  
>> cluster
>
> This was easy to reproduce, and actually looks like another presentation 
> problem.
>
> When I run the "tipc" tool  I see the correct value, i.e., key == (portid + 
> 1).
> When I run tipc-config (which is deprecated in the new version), I see the 
> wrong value key == portid for the same publications!
>
> So, your code will probably work correct, but the values presented will be 
> wrong on the new version and correct on the old one.
> I think this is something for Richard Alpe, who wrote the new netlink 
> compatibility code, to take a look at.
>
> ///jon
>
>>
>>
>> On Thu, Apr 28, 2016 at 8:19 PM, GUNA  wrote:
>> > Thanks Jon. I already applied this patch on currently running module.
>> >
>> > On Thursday, April 28, 2016, Jon Maloy  wrote:
>> >>
>> >> Here it is. (This is just pasted into Outlook, so don’t try to apply it)
>> >>
>> >> If you manually add the two skb_linearize() calls and the update of ‘hdr’
>> >> you should be safe.
>> >>
>> >>
>> >>
>> >> Good luck!
>> >>
>> >> ///jon
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> commit c7cad0d6f70cd4ce8644ffe528a4df1cdc2e77f5
>> >>
>> >> Author: Jon Paul Maloy 
>> >>
>> >> Date:   Thu Nov 19 14:30:40 2015 -0500
>> >>
>> >>
>> >>
>> >> tipc: move linearization of buffers to generic code
>> >>
>> >>
>> >>
>> >> In commit 5cbb28a4bf65c7e4 ("tipc: linearize arriving NAME_DISTR
>> >>
>> >> and LINK_PROTO buffers") we added linearization of NAME_DISTRIBUTOR,
>> >>
>> >> LINK_PROTOCOL/RESET and LINK_PROTOCOL/ACTIVATE to the function
>> >>
>> >> tipc_udp_recv(). The location of the change was selected in order
>> >>
>> >> to make the commit easily appliable to 'net' and 'stable'.
>> >>
>> >>
>> >>
>> >> We now move this linearization to where it should be done, in the
>> >>
>> >> functions tipc_named_rcv() and tipc_link_proto_rcv() respectively.
>> >>
>> >>
>> >>
>> >> Reviewed-by: Ying Xue 
>> >>
>> >> Signed-off-by: Jon Maloy 
>> >>
>> >> Signed-off-by: David S. Miller 
>> >>
>> >>
>> >>
>> >> diff --git a/net/tipc/link.c b/net/tipc/link.c
>> >>
>> >> index 9efbdbd..fa452fb 100644
>> >>
>> >> --- a/net/tipc/link.c
>> >>
>> >> +++ b/net/tipc/link.c
>> >>
>> >> @@ -1260,6 +1260,8 @@ static int tipc_link_proto_rcv(struct tipc_link *l,
>> >> struct sk_buff *skb,
>> >>
>> >> /* fall thru' */
>> >>
>> >> case ACTIVATE_MSG:
>> >>
>> >> +   skb_linearize(skb);
>> >>
>> >> +   hdr = buf_msg(skb);
>> >>
>> >> /* Complete own link name with peer's interface name */
>> >>
>> >> if_name =  strrchr(l->name, ':') + 1;
>> >>
>> >> diff --git a/net/tipc/name_distr.c b/net/tipc/name_distr.c
>> >>
>> >> index c07612b..f51c8bd

Re: [tipc-discussion] Tipc: name table mismatch between different cards in a system

2016-04-29 Thread GUNA
The two skb_linearize() calls and the update of 'hdr' are already in my
load, but they did not solve this issue. The issue remains the same
even after today's ACTIVE state fix (before that fix, one of the links
showed as STANDBY even though both have the same priority).

// IO card, note this does not run latest kernel or tipc
[root@10 ~]# tipc-config -nt |grep 2334480598
   20012  20012  <1.1.12:2334480598>  2334480599  cluster

// runs latest kernel on all CPU cards.
[root@2 ~]# tipc-config -nt |grep 2334480598
50009  20012  20012  <1.1.12:2334480598>  2334480598  cluster
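
The mismatch in the two outputs above can be checked mechanically. The following is a minimal C sketch, not part of tipc-config or the kernel, that pulls the port reference and the publication key out of one "tipc-config -nt" output line and reports whether the key equals the reference or the reference plus one. The names parse_publication and classify_key are hypothetical, and the assumed line layout (a "<zone.cluster.node:ref>" port identity followed by the key) is taken only from the two sample lines above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* How the publication key relates to the port reference. */
enum key_relation { KEY_EQUALS_REF, KEY_IS_REF_PLUS_ONE, KEY_UNRELATED };

/* Hypothetical helper: extract the port reference and publication key from
 * one line of "tipc-config -nt" output. Returns 0 on success, or -1 if the
 * line does not match the assumed "<z.c.n:ref> key" layout. */
int parse_publication(const char *line, unsigned int *ref, unsigned int *key)
{
    const char *open;
    unsigned int zone, cluster, node;

    assert(line && ref && key);
    open = strchr(line, '<');
    if (!open)
        return -1;
    /* The space in the format also matches zero whitespace characters. */
    if (sscanf(open, "<%u.%u.%u:%u> %u", &zone, &cluster, &node, ref, key) != 5)
        return -1;
    return 0;
}

enum key_relation classify_key(unsigned int ref, unsigned int key)
{
    if (key == ref)
        return KEY_EQUALS_REF;
    if (key == ref + 1u)
        return KEY_IS_REF_PLUS_ONE;
    return KEY_UNRELATED;
}
```

Run against the two lines above, the IO card line classifies as KEY_IS_REF_PLUS_ONE and the CPU card line as KEY_EQUALS_REF, which is the difference the audit scripts trip over.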


On Thu, Apr 28, 2016 at 8:19 PM, GUNA  wrote:
> Thanks Jon. I already applied this patch on currently running module.
>
> On Thursday, April 28, 2016, Jon Maloy  wrote:
>>
>> Here it is. (This is just pasted into Outlook, so don’t try to apply it)
>>
>> If you manually add the two skb_linearize() calls and the update of ‘hdr’
>> you should be safe.
>>
>>
>>
>> Good luck!
>>
>> ///jon
>>
>>
>>
>>
>>
>> commit c7cad0d6f70cd4ce8644ffe528a4df1cdc2e77f5
>>
>> Author: Jon Paul Maloy 
>>
>> Date:   Thu Nov 19 14:30:40 2015 -0500
>>
>>
>>
>> tipc: move linearization of buffers to generic code
>>
>>
>>
>> In commit 5cbb28a4bf65c7e4 ("tipc: linearize arriving NAME_DISTR
>>
>> and LINK_PROTO buffers") we added linearization of NAME_DISTRIBUTOR,
>>
>> LINK_PROTOCOL/RESET and LINK_PROTOCOL/ACTIVATE to the function
>>
>> tipc_udp_recv(). The location of the change was selected in order
>>
>> to make the commit easily appliable to 'net' and 'stable'.
>>
>>
>>
>> We now move this linearization to where it should be done, in the
>>
>> functions tipc_named_rcv() and tipc_link_proto_rcv() respectively.
>>
>>
>>
>> Reviewed-by: Ying Xue 
>>
>> Signed-off-by: Jon Maloy 
>>
>> Signed-off-by: David S. Miller 
>>
>>
>>
>> diff --git a/net/tipc/link.c b/net/tipc/link.c
>>
>> index 9efbdbd..fa452fb 100644
>>
>> --- a/net/tipc/link.c
>>
>> +++ b/net/tipc/link.c
>>
>> @@ -1260,6 +1260,8 @@ static int tipc_link_proto_rcv(struct tipc_link *l,
>> struct sk_buff *skb,
>>
>> /* fall thru' */
>>
>> case ACTIVATE_MSG:
>>
>> +   skb_linearize(skb);
>>
>> +   hdr = buf_msg(skb);
>>
>> /* Complete own link name with peer's interface name */
>>
>> if_name =  strrchr(l->name, ':') + 1;
>>
>> diff --git a/net/tipc/name_distr.c b/net/tipc/name_distr.c
>>
>> index c07612b..f51c8bd 100644
>>
>> --- a/net/tipc/name_distr.c
>>
>> +++ b/net/tipc/name_distr.c
>>
>> @@ -397,6 +397,7 @@ void tipc_named_rcv(struct net *net, struct
>> sk_buff_head *inputq)
>>
>> spin_lock_bh(&tn->nametbl_lock);
>>
>> for (skb = skb_dequeue(inputq); skb; skb = skb_dequeue(inputq)) {
>>
>> +   skb_linearize(skb);
>>
>> msg = buf_msg(skb);
>>
>> mtype = msg_type(msg);
>>
>> item = (struct distr_item *)msg_data(msg);
>>
>> diff --git a/net/tipc/udp_media.c b/net/tipc/udp_media.c
>>
>> index ad2719a..816914e 100644
>>
>> --- a/net/tipc/udp_media.c
>>
>> +++ b/net/tipc/udp_media.c
>>
>> @@ -48,7 +48,6 @@
>>
>> #include 
>>
>> #include "core.h"
>>
>> #include "bearer.h"
>>
>> -#include "msg.h"
>>
>>  /* IANA assigned UDP port */
>>
>> #define UDP_PORT_DEFAULT   6118
>>
>> @@ -221,10 +220,6 @@ static int tipc_udp_recv(struct sock *sk, struct
>> sk_buff *skb)
>>
>> {
>>
>> struct udp_bearer *ub;
>>
>> struct tipc_bearer *b;
>>
>> -   int usr = msg_user(buf_msg(skb));
>>
>> -
>>
>> -   if ((usr == LINK_PROTOCOL) || (usr == NAME_DISTRIBUTOR))
>>
>> -   skb_linearize(skb);
>>
>> ub = rcu_dereference_sk_user_data(sk);
>>
>> if (!ub) {
>>
>>
>>
>> From: GUNA [mailto:gbala...@gmail.com]
>> Sent: Thursday, 28 April, 2016 19:24
>> To: Jon Maloy
>> Cc: tipc-discussion@lists.sourceforge.net
>> Subject: Re: Tipc: name table mismatch between different cards in a system

Re: [tipc-discussion] [PATCH net-next 1/1] tipc: set 'active' state correctly for first established link

2016-04-29 Thread GUNA
I have tested this fix, and both links with the same priority are now in
the ACTIVE state instead of one showing as STANDBY.
Thanks Jon.


On Thu, Apr 28, 2016 at 8:21 PM, GUNA  wrote:

> Thanks Jon. I will try it tomorrow.
>
>
> On Thursday, April 28, 2016, Jon Maloy  wrote:
>
>> This one...
>> ///jon
>>
>>
>> -------- Forwarded Message --------
>> Subject: [PATCH net-next 1/1] tipc: set 'active' state correctly for
>> first established link
>> Date: Thu, 28 Apr 2016 20:16:08 -0400
>> From: Jon Maloy 
>> To: da...@davemloft.net
>> CC: net...@vger.kernel.org, Paul Gortmaker ,
>> parthasarathy.bhuvara...@ericsson.com, richard.a...@ericsson.com,
>> ying@windriver.com, ma...@donjonn.com,
>> tipc-discussion@lists.sourceforge.net, Jon Maloy 
>>
>> When we are displaying statistics for the first link established between
>> two peers, it will always be presented as STANDBY although it in reality
>> is ACTIVE.
>>
>> This happens because we forget to set the 'active' flag in the link
>> instance at the moment it is established. Although this is a bug, it only
>> has impact on the presentation view of the link, not on its actual
>> functionality.
>>
>> Signed-off-by: Jon Maloy 
>> ---
>>  net/tipc/node.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/net/tipc/node.c b/net/tipc/node.c
>> index 68d9f7b..c299156 100644
>> --- a/net/tipc/node.c
>> +++ b/net/tipc/node.c
>> @@ -554,6 +554,7 @@ static void __tipc_node_link_up(struct tipc_node *n, int 
>> bearer_id,
>>  *slot1 = bearer_id;
>>  tipc_node_fsm_evt(n, SELF_ESTABL_CONTACT_EVT);
>>  n->action_flags |= TIPC_NOTIFY_NODE_UP;
>> +tipc_link_set_active(nl, true);
>>  tipc_bcast_add_peer(n->net, nl, xmitq);
>>  return;
>>  }
>> --
>> 1.9.1
>>
>>
>>
>>
>>


Re: [tipc-discussion] Tipc: name table mismatch between different cards in a system

2016-04-28 Thread GUNA
Thanks Jon. I have already applied this patch to the currently running module.

On Thursday, April 28, 2016, Jon Maloy  wrote:

> Here it is. (This is just pasted into Outlook, so don’t try to apply it)
>
> If you manually add the two skb_linearize() calls and the update of ‘hdr’
> you should be safe.
>
>
>
> Good luck!
>
> ///jon
>
>
>
>
>
> commit c7cad0d6f70cd4ce8644ffe528a4df1cdc2e77f5
> Author: Jon Paul Maloy
> Date:   Thu Nov 19 14:30:40 2015 -0500
>
> tipc: move linearization of buffers to generic code
>
> In commit 5cbb28a4bf65c7e4 ("tipc: linearize arriving NAME_DISTR
> and LINK_PROTO buffers") we added linearization of NAME_DISTRIBUTOR,
> LINK_PROTOCOL/RESET and LINK_PROTOCOL/ACTIVATE to the function
> tipc_udp_recv(). The location of the change was selected in order
> to make the commit easily appliable to 'net' and 'stable'.
>
> We now move this linearization to where it should be done, in the
> functions tipc_named_rcv() and tipc_link_proto_rcv() respectively.
>
> Reviewed-by: Ying Xue
> Signed-off-by: Jon Maloy
> Signed-off-by: David S. Miller
>
> diff --git a/net/tipc/link.c b/net/tipc/link.c
> index 9efbdbd..fa452fb 100644
> --- a/net/tipc/link.c
> +++ b/net/tipc/link.c
> @@ -1260,6 +1260,8 @@ static int tipc_link_proto_rcv(struct tipc_link *l, struct sk_buff *skb,
> /* fall thru' */
> case ACTIVATE_MSG:
> +   skb_linearize(skb);
> +   hdr = buf_msg(skb);
> /* Complete own link name with peer's interface name */
> if_name =  strrchr(l->name, ':') + 1;
> diff --git a/net/tipc/name_distr.c b/net/tipc/name_distr.c
> index c07612b..f51c8bd 100644
> --- a/net/tipc/name_distr.c
> +++ b/net/tipc/name_distr.c
> @@ -397,6 +397,7 @@ void tipc_named_rcv(struct net *net, struct sk_buff_head *inputq)
> spin_lock_bh(&tn->nametbl_lock);
> for (skb = skb_dequeue(inputq); skb; skb = skb_dequeue(inputq)) {
> +   skb_linearize(skb);
> msg = buf_msg(skb);
> mtype = msg_type(msg);
> item = (struct distr_item *)msg_data(msg);
> diff --git a/net/tipc/udp_media.c b/net/tipc/udp_media.c
> index ad2719a..816914e 100644
> --- a/net/tipc/udp_media.c
> +++ b/net/tipc/udp_media.c
> @@ -48,7 +48,6 @@
> #include
> #include "core.h"
> #include "bearer.h"
> -#include "msg.h"
>  /* IANA assigned UDP port */
> #define UDP_PORT_DEFAULT   6118
> @@ -221,10 +220,6 @@ static int tipc_udp_recv(struct sock *sk, struct sk_buff *skb)
> {
> struct udp_bearer *ub;
> struct tipc_bearer *b;
> -   int usr = msg_user(buf_msg(skb));
> -
> -   if ((usr == LINK_PROTOCOL) || (usr == NAME_DISTRIBUTOR))
> -   skb_linearize(skb);
> ub = rcu_dereference_sk_user_data(sk);
> if (!ub) {
>
>
>
> *From:* GUNA [mailto:gbala...@gmail.com]
> *Sent:* Thursday, 28 April, 2016 19:24
> *To:* Jon Maloy
> *Cc:* tipc-discussion@lists.sourceforge.net
> *Subject:* Re: Tipc: name table mismatch between different cards in a system
>
>
>
> See inline
>
> On Thursday, April 28, 2016, Jon Maloy  > wrote:
>
>
>
> > -----Original Message-----
> > From: GUNA [mailto:gbala...@gmail.com]
> > Sent: Thursday, 28 April, 2016 17:43
> > To: tipc-discussion@lists.sourceforge.net
> > Subject: [tipc-discussion] Tipc: name table mismatch between different cards in a system
> >
> > After upgrading the CPU cards to the 4.4.0 kernel, there is a name table
> > mismatch between the CPU and IO cards. The IO Publication value = CPU
> > Publication + 1, as you can see in the example below:
> >
> > In CPU (slot 2)
> >
> > Type   Lower  Upper  Port Identity   Publication
> >
> > 16789314 3201 3201 <1.1.6:1540208445>   1540208445
> >
> > 16789823 4 4 <1.1.6:3035967304>   3035967304
> >
> > 16832168 3201 3201 <1.1.6:723652841> 723652841
> >
> > …
> >
> >
> >
> > In IO (slot10)
> >
> > 16789314 3201 3201 <1.1.6:1540208445>  1540208446
> >
> > 16789823 4 4 <1.1.6:3035967304

Re: [tipc-discussion] Kernel 4.4.0 TIPC: links were bouncing and not stable enough

2016-04-28 Thread GUNA
Thanks Jon.

Please see inline


On Thursday, April 28, 2016, Jon Maloy  wrote:

> Hi Guna,
> see below.
>
>
> > -----Original Message-----
> > From: GUNA [mailto:gbala...@gmail.com]
> > Sent: Thursday, 28 April, 2016 10:27
> > To: tipc-discussion@lists.sourceforge.net
> > Subject: [tipc-discussion] Kernel 4.4.0 TIPC: links were bouncing and not stable enough
> >
> > Hi Jon,
> >
> > Back to debugging the table mismatch and standby links issues ...
> >
> > I need to clarify two items first, as described below. Both issues are
> > reported by our audit script, which works fine on kernel 3.4.2 but not on
> > the new kernel 4.4.0.
> >
> > 1. Table mismatch
> > This is due to a bunch of entries with type 2 and "node" scope that differ
> > from each other.
> > Since type "2" is internal and has "node" scope, do we expect it to
> > match the other nodes' tables? Has anything changed in the latest TIPC?
> > // slot3
> >
> > 2  16781314   16781314   <1.1.3:0>  0   node
> > 2  16781314   16781314   <1.1.3:1>  1   node
> > 2  16781324   16781324   <1.1.3:1>  1   node
> > 2  16781324   16781324   <1.1.3:0>  0   node
> > 2  16781325   16781325   <1.1.3:0>  0   node
> > 2  16781325   16781325   <1.1.3:1>  1   node
> >
>
> This is a new feature in 4.0+. It actually shows the working links on this
> node towards other nodes, not only the connectivity to a node, as type "0"
> does.
> I can read from this that you typed the command on node <1.1.3>, and that
> the node has two links towards each of <1.1.2>, <1.1.12> and <1.1.13>
> (16781314 in hex is 1001002, i.e. 1.1.2).
> You will find corresponding entries for the other endpoints of the links
> on the respective nodes.
> The fact that they are present in the table means they are up and working.
>
>
> >
> > // slot2
> > Type   Lower  Upper  Port Identity  Publication  Scope
> >
> > 2  16781315   16781315   <1.1.2:0>  0   node
> > 2  16781315   16781315   <1.1.2:1>  1   node
> > 2  16781324   16781324   <1.1.2:0>  0   node
> > 2  16781324   16781324   <1.1.2:1>  1   node
> > 2  16781325   16781325   <1.1.2:0>  0   node
> > 2  16781325   16781325   <1.1.2:1>  1   node
> >
> >
> See above.
> <1.1.2> has links towards <1.1.3> as expected, and also towards <1.1.12>
> and <1.1.13>.
> So, everything is correct here.
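
For reference, the decimal-to-address conversion Jon describes can be sketched as below. This is only a minimal illustration, assuming the classic 32-bit TIPC address layout (8-bit zone, 12-bit cluster, 12-bit node); the function name decode_tipc_addr is made up for the example and is not part of tipc-config.

```python
def decode_tipc_addr(value: int) -> str:
    """Render a 32-bit TIPC network address as <zone.cluster.node>.

    Assumes the classic layout: 8 bits zone, 12 bits cluster, 12 bits node.
    """
    zone = (value >> 24) & 0xFF
    cluster = (value >> 12) & 0xFFF
    node = value & 0xFFF
    return f"<{zone}.{cluster}.{node}>"


# Lower/Upper values taken from the type 2 rows in the slot3 table.
for lower in (16781314, 16781324, 16781325):
    print(lower, hex(lower), decode_tipc_addr(lower))
```

Applied to the slot3 rows, this yields the three peers <1.1.2>, <1.1.12> and <1.1.13> mentioned above.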


The entries are correct on each node; however, I cannot use them when I compare
the tables between the nodes. Since these have "node" scope, I will filter
out the type 2 entries prior to comparison.
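
A rough sketch of that filtering step, assuming the six-column "tipc-config -nt" layout shown in this thread. The helper names parse_name_table and comparable_rows are hypothetical and not taken from our audit script.

```python
def parse_name_table(text):
    """Parse name table rows into tuples; the header and other lines are skipped."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # A data row has exactly six columns and starts with a numeric type.
        if len(fields) == 6 and fields[0].isdigit():
            type_, lower, upper, port, pub, scope = fields
            rows.append((int(type_), int(lower), int(upper), port, int(pub), scope))
    return rows


def comparable_rows(rows):
    """Drop type 2 rows with "node" scope, since they are local to one node."""
    return [r for r in rows if not (r[0] == 2 and r[5] == "node")]


SAMPLE = """\
Type   Lower  Upper  Port Identity  Publication Scope
2  16781314   16781314   <1.1.3:0>  0   node
2  16781314   16781314   <1.1.3:1>  1   node
0  16781315   16781315   <1.1.3:0>  0   cluster
"""

rows = parse_name_table(SAMPLE)
print(len(rows), len(comparable_rows(rows)))
```

Only the rows that survive comparable_rows() would then be compared between nodes.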


> > 2. Active and standby links.
> > Our system has two bearers, p19p1 and p19p2. Both links are ACTIVE on the
> > 3.4.2 kernel, but on the new kernel one comes up as STANDBY. Both have the
> > same priority.
> > Is this expected behavior in the latest TIPC?
>
> No. If they have the same priority they should both be active.
> I just checked this in the current code, and there is actually a bug, but
> only in the flag used to present the state, not in the state as such.
> So, even if the statistics say STANDBY, it is still ACTIVE, and will take
> its full part of the load sharing. You can safely use it as before.
> I will post a patch to fix this upstream, but it probably won't go into
> kernel 4.4, since this is just a "cosmetic" bug.
> Thank you for reporting this.


If you let me know the fix, I will integrate it into my kernel. Even though the
link is safe to use, a fix is needed to avoid the errors reported by our audit script.
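
Until such a fix is merged, the audit side could recognise the cosmetic case. The sketch below assumes the "Link <...>" statistics layout shown in this thread and assumes interface names contain no "-"; the names parse_links, peer_of and cosmetic_standby are illustrative only. A STANDBY link is treated as cosmetic when its priority equals the best priority towards the same peer.

```python
import re

LINK_RE = re.compile(r"^Link <(?P<name>[^>]+)>\s*$")
STATE_RE = re.compile(r"^\s*(?P<state>ACTIVE|STANDBY)\s+MTU:\d+\s+Priority:(?P<prio>\d+)")


def parse_links(text):
    """Return (link name, state, priority) for each link in the statistics text."""
    links, current = [], None
    for line in text.splitlines():
        m = LINK_RE.match(line)
        if m:
            current = m.group("name")
            continue
        m = STATE_RE.match(line)
        if m and current:
            links.append((current, m.group("state"), int(m.group("prio"))))
            current = None
    return links


def peer_of(link_name):
    # "1.1.2:p19p1-1.1.3:p19p1" -> "1.1.3" (assumes no "-" in interface names)
    return link_name.split("-", 1)[1].split(":", 1)[0]


def cosmetic_standby(links):
    """STANDBY links whose priority equals the best priority towards the same peer."""
    best = {}
    for name, _state, prio in links:
        peer = peer_of(name)
        best[peer] = max(best.get(peer, prio), prio)
    return [name for name, state, prio in links
            if state == "STANDBY" and prio == best[peer_of(name)]]


SAMPLE = """\
Link <1.1.2:p19p1-1.1.3:p19p1>
  ACTIVE  MTU:1500  Priority:10  Tolerance:1200 ms  Window:50 packets
Link <1.1.2:p19p2-1.1.3:p19p2>
  STANDBY  MTU:1500  Priority:10  Tolerance:1200 ms  Window:50 packets
"""

print(cosmetic_standby(parse_links(SAMPLE)))
```

Links returned by cosmetic_standby() could be reported as a warning rather than an error.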


> >
> > If both are expected behavior, I will change our audit script
> > accordingly. Otherwise, I need to debug the issue.
> >
> > Link <1.1.2:p19p1-1.1.3:p19p1>
> >   ACTIVE  MTU:1500  Priority:10  Tolerance:1200 ms  Window:50 packets
> >   RX packets:4294901760 fragments:0/0 bundles:0/0
> >   TX packets:54487 fragments:0/0 bundles:0/0
> >   TX profile sample:164252 packets  average:30 octets
> >   0-64:97% -256:3% -1024:0% -4096:0% -16384:0% -32768:0% -66000:0%
> >   RX states:473939 probes:513 naks:1407 defs:0 dups:0
> >   TX states:999870 probes:472019 naks:0 ac

Re: [tipc-discussion] Tipc: name table mismatch between different cards in a system

2016-04-28 Thread GUNA
See inline

On Thursday, April 28, 2016, Jon Maloy  wrote:

>
>
> > -----Original Message-----
> > From: GUNA [mailto:gbala...@gmail.com]
> > Sent: Thursday, 28 April, 2016 17:43
> > To: tipc-discussion@lists.sourceforge.net
> > Subject: [tipc-discussion] Tipc: name table mismatch between different cards in a system
> >
> > After upgrading the CPU cards to the 4.4.0 kernel, there is a name table
> > mismatch between the CPU and IO cards. The IO Publication value = CPU
> > Publication + 1, as you can see in the example below:
> >
> > In CPU (slot 2)
> >
> > Type      Lower  Upper  Port Identity        Publication
> > 16789314  3201   3201   <1.1.6:1540208445>   1540208445
> > 16789823  4      4      <1.1.6:3035967304>   3035967304
> > 16832168  3201   3201   <1.1.6:723652841>    723652841
> > …
> >
> > In IO (slot 10)
> >
> > 16789314  3201   3201   <1.1.6:1540208445>   1540208446
> > 16789823  4      4      <1.1.6:3035967304>   3035967305
> > 16832168  3201   3201   <1.1.6:723652841>    723652842
> > …
>
> This looks like another artefact of the problem we have in 4.4 with
> corrupted entries, where old entries are not withdrawn, so the new ones
> cannot replace them in "CPU", while it happens to work in "IO".   If  1.1.6
> is "IO" or a third node this could make sense.


1.1.6 is node 6, but I see the same pattern for all other nodes as well, i.e.
1.1.2, 1.1.3 and 1.1.13.



> This problem has been fixed in Ubuntu 16.04, but has not been released
> yet. I would suggest you try 4.5 or 4.6rc4 to see if it disappears.
> According to Rune Torgersen it is definitely working in 4.6.
>
> ///jon
>
>
I cannot move to a new kernel now, but I could rebuild 4.4.0 if I get
the fix.
If you are aware of the fix, please let me know the details.


> >
> >
> > All entries follow a similar pattern on both IO and CPU. The CPU table
> > matches those of the other CPUs.
> >
> >
> > The system did not show this issue when running the previous kernel.
> >
> >
> > There are two links on the IO and CPU cards. Both links are ACTIVE on IO,
> > while on CPU one is ACTIVE and the other STANDBY.
> >
> > Could that cause the issue? If so, is it possible to make both links
> > ACTIVE?
> >
> >
> > // IO
> >
> > Link <1.1.10:eth0-1.1.2:p19p1>
> >
> >   ACTIVE  MTU:1500  Priority:10  Tolerance:1200 ms  Window:50 packets
> >
> > Link <1.1.10:eth1-1.1.2:p19p2>
> >
> >   ACTIVE  MTU:1500  Priority:10  Tolerance:1200 ms  Window:50 packets
> >
> > // CPU
> > Link <1.1.2:p19p1-1.1.10:eth0>
> >   ACTIVE  MTU:1500  Priority:10  Tolerance:1200 ms  Window:50 packets
> >
> > Link <1.1.2:p19p2-1.1.10:eth1>
> >   STANDBY  MTU:1500  Priority:10  Tolerance:1200 ms  Window:50 packets
> >
> > thanks,
> >
> > Guna
> >


[tipc-discussion] Tipc: name table mismatch between different cards in a system

2016-04-28 Thread GUNA
After upgrading the CPU cards to the 4.4.0 kernel, there is a name table mismatch
between the CPU and IO cards. The IO Publication value = CPU Publication + 1, as
you can see in the example below:

In CPU (slot 2)

Type      Lower  Upper  Port Identity        Publication
16789314  3201   3201   <1.1.6:1540208445>   1540208445
16789823  4      4      <1.1.6:3035967304>   3035967304
16832168  3201   3201   <1.1.6:723652841>    723652841
…

In IO (slot 10)

16789314  3201   3201   <1.1.6:1540208445>   1540208446
16789823  4      4      <1.1.6:3035967304>   3035967305
16832168  3201   3201   <1.1.6:723652841>    723652842
…


All entries follow a similar pattern on both IO and CPU. The CPU table matches
those of the other CPUs.
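
The "+1" pattern can be checked mechanically. This is a small sketch using the three rows quoted above; the function name publication_deltas and the tuple layout are assumptions made for the example.

```python
def publication_deltas(reference, other):
    """Map (type, lower, upper, port) -> other key minus reference key.

    Only entries present in both tables whose publication keys differ are returned.
    """
    ref = {row[:4]: row[4] for row in reference}
    return {row[:4]: row[4] - ref[row[:4]]
            for row in other
            if row[:4] in ref and row[4] != ref[row[:4]]}


# Rows as (type, lower, upper, port identity, publication), from the tables above.
cpu = [
    (16789314, 3201, 3201, "<1.1.6:1540208445>", 1540208445),
    (16789823, 4, 4, "<1.1.6:3035967304>", 3035967304),
    (16832168, 3201, 3201, "<1.1.6:723652841>", 723652841),
]
io = [
    (16789314, 3201, 3201, "<1.1.6:1540208445>", 1540208446),
    (16789823, 4, 4, "<1.1.6:3035967304>", 3035967305),
    (16832168, 3201, 3201, "<1.1.6:723652841>", 723652842),
]

deltas = publication_deltas(cpu, io)
print(set(deltas.values()))
```

A single constant delta across all differing entries points to a systematic cause rather than random corruption of individual rows.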


The system did not show this issue when running the previous kernel.


There are two links on the IO and CPU cards. Both links are ACTIVE on IO,
while on CPU one is ACTIVE and the other STANDBY.

Could that cause the issue? If so, is it possible to make both links
ACTIVE?


// IO

Link <1.1.10:eth0-1.1.2:p19p1>

  ACTIVE  MTU:1500  Priority:10  Tolerance:1200 ms  Window:50 packets

Link <1.1.10:eth1-1.1.2:p19p2>

  ACTIVE  MTU:1500  Priority:10  Tolerance:1200 ms  Window:50 packets

// CPU
Link <1.1.2:p19p1-1.1.10:eth0>
  ACTIVE  MTU:1500  Priority:10  Tolerance:1200 ms  Window:50 packets

Link <1.1.2:p19p2-1.1.10:eth1>
  STANDBY  MTU:1500  Priority:10  Tolerance:1200 ms  Window:50 packets

thanks,

Guna


[tipc-discussion] Kernel 4.4.0 TIPC: links were bouncing and not stable enough

2016-04-28 Thread GUNA
Hi Jon,

Back to debugging the table mismatch and standby links issues ...

I need to clarify two items first, as described below. Both issues are
reported by our audit script, which works fine on kernel 3.4.2 but not on
the new kernel 4.4.0.

1. Table mismatch
This is due to a bunch of entries with type 2 and "node" scope that differ
from each other.
Since type "2" is internal and has "node" scope, do we expect it to
match the other nodes' tables? Has anything changed in the latest TIPC?

// slot3

2  16781314   16781314   <1.1.3:0>  0   node
2  16781314   16781314   <1.1.3:1>  1   node
2  16781324   16781324   <1.1.3:1>  1   node
2  16781324   16781324   <1.1.3:0>  0   node
2  16781325   16781325   <1.1.3:0>  0   node
2  16781325   16781325   <1.1.3:1>  1   node


// slot2
Type   Lower  Upper  Port Identity  Publication  Scope

2  16781315   16781315   <1.1.2:0>  0   node
2  16781315   16781315   <1.1.2:1>  1   node
2  16781324   16781324   <1.1.2:0>  0   node
2  16781324   16781324   <1.1.2:1>  1   node
2  16781325   16781325   <1.1.2:0>  0   node
2  16781325   16781325   <1.1.2:1>  1   node


2. Active and standby links.
Our system has two bearers, p19p1 and p19p2. Both links are ACTIVE on the
3.4.2 kernel, but on the new kernel one comes up as STANDBY. Both have the
same priority.
Is this expected behavior in the latest TIPC?

If both are expected behavior, I will change our audit script
accordingly. Otherwise, I need to debug the issue.

Link <1.1.2:p19p1-1.1.3:p19p1>
  ACTIVE  MTU:1500  Priority:10  Tolerance:1200 ms  Window:50 packets
  RX packets:4294901760 fragments:0/0 bundles:0/0
  TX packets:54487 fragments:0/0 bundles:0/0
  TX profile sample:164252 packets  average:30 octets
  0-64:97% -256:3% -1024:0% -4096:0% -16384:0% -32768:0% -66000:0%
  RX states:473939 probes:513 naks:1407 defs:0 dups:0
  TX states:999870 probes:472019 naks:0 acks:0 dups:1407
  Congestion link:0  Send queue max:0 avg:0

Link <1.1.2:p19p2-1.1.3:p19p2>
  STANDBY  MTU:1500  Priority:10  Tolerance:1200 ms  Window:50 packets
  RX packets:4294508544 fragments:0/0 bundles:0/0
  TX packets:43737 fragments:0/0 bundles:0/0
  TX profile sample:255979 packets  average:29 octets
  0-64:98% -256:2% -1024:0% -4096:0% -16384:0% -32768:0% -66000:0%
  RX states:419534 probes:509 naks:14 defs:242 dups:247
  TX states:182 probes:419011 naks:248 acks:0 dups:14
  Congestion link:0  Send queue max:0 avg:0

[root@Slot2 ~]# tipc-config -b
Bearers:
eth:p19p1
eth:p19p2

In our last discussion, you asked me to turn on debugging in node.c. Could
you let me know how to turn it on? Do I need to add printf calls, or just
enable a macro?

I tried to use the latest tipcutils, but I am having issues compiling it.

Thanks,
Guna


[tipc-discussion] NMI watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [kworker/u32:2:19149]

2016-04-11 Thread GUNA
Lots of TIPC traces are seen from startup until the card reboots...

Apr  8 18:32:48 [SEQ 260135] Lab62slot5 kernel:  [12582.205697] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [swapper/1:0]
Apr  8 18:32:48 [SEQ 260136] Lab62slot5 kernel:  [12582.208701] NMI watchdog: BUG: soft lockup - CPU#3 stuck for 23s! [swapper/3:0]
Apr  8 18:32:48 [SEQ 260137] Lab62slot5 kernel:  [12582.208764] Modules linked in: nf_log_ipv4 nf_log_common xt_LOG ipt_REJECT nf_reject_ipv4 sctp e1000e tipc udp_tunnel ip6_udp_tunnel drbd lru_cache libcrc32c 8021q mrp garp iTCO_wdt xt_physdev br_netfilter bridge stp llc nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ipmiq_drv(O) ip6t_REJECT nf_reject_ipv6 sio_mmc(O) nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack event_drv(O) ip6table_filter lockd pt_timer_info(O) grace ddi(O) ixgbe igb usb_storage iTCO_vendor_support pcspkr lpc_ich mfd_core i2c_i801 i2c_algo_bit i2c_core ptp pps_core mdio tpm_tis tpm sunrpc [last unloaded: iTCO_wdt]
Apr  8 18:32:48 [SEQ 260138] Lab62slot5 kernel:  [12582.208770] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G   O 4.4.0 #9
Apr  8 18:32:48 [SEQ 260139] Lab62slot5 kernel:  [12582.208772] Hardware name: PT AMC124/Base Board Product Name, BIOS LGNAJFIP.PTI.0012.P15 01/15/2014
Apr  8 18:32:48 [SEQ 260140] Lab62slot5 kernel:  [12582.208775] task: 880351aa ti: 880351aa8000 task.ti: 880351aa8000
Apr  8 18:32:48 [SEQ 260141] Lab62slot5 kernel:  [12582.208787] RIP: 0010:[]  [] _raw_spin_lock_bh+0x0/0x40
Apr  8 18:32:48 [SEQ 260142] Lab62slot5 kernel:  [12582.208790] RSP: 0018:88035fc63ac0  EFLAGS: 0202
Apr  8 18:32:48 [SEQ 260143] Lab62slot5 kernel:  [12582.208792] RAX:  RBX: 88035fc63b70 RCX: 0001
Apr  8 18:32:48 [SEQ 260144] Lab62slot5 kernel:  [12582.208794] RDX: 0003 RSI: 0200 RDI: 88035fc63be4
Apr  8 18:32:48 [SEQ 260145] Lab62slot5 kernel:  [12582.208796] RBP: 88035fc63b38 R08: 0001 R09: 0004
Apr  8 18:32:48 [SEQ 260146] Lab62slot5 kernel:  [12582.208798] R10:  R11:  R12: 88035fc63bd0
Apr  8 18:32:48 [SEQ 260147] Lab62slot5 kernel:  [12582.208801] R13: 8a363f96 R14: 88034e1a92c0 R15: 88035fc63be4
Apr  8 18:32:48 [SEQ 260148] Lab62slot5 kernel:  [12582.208804] FS: () GS:88035fc6() knlGS:
Apr  8 18:32:48 [SEQ 260149] Lab62slot5 kernel:  [12582.208806] CS: 0010 DS:  ES:  CR0: 8005003b
Apr  8 18:32:48 [SEQ 260150] Lab62slot5 kernel:  [12582.208809] CR2: 7ffe76e7efb8 CR3: 01c0a000 CR4: 000006e0
Apr  8 18:32:48 [SEQ 260151] Lab62slot5 kernel:  [12582.208810] Stack:
Apr  8 18:32:48 [SEQ 260152] Lab62slot5 kernel:  [12582.208815] a019a80a 0001 0296  0001
Apr  8 18:32:48 [SEQ 260153] Lab62slot5 kernel:  [12582.208819] 81ce2800 88034e1a9348 810baeaf 88035fc63b28
Apr  8 18:32:48 [SEQ 260154] Lab62slot5 kernel:  [12582.208823] 88034fbf7800 88035fc63b48 88035fc63b70
Apr  8 18:32:48 [SEQ 260155] Lab62slot5 kernel:  [12582.208824] Call Trace:
Apr  8 18:32:48 [SEQ 260156] Lab62slot5 kernel:  [12582.208841]
Apr  8 18:32:48 [SEQ 260157] Lab62slot5 kernel:  [12582.208841] [] ? tipc_sk_rcv+0x3a/0x490 [tipc]
Apr  8 18:32:48 [SEQ 260158] Lab62slot5 kernel:  [12582.208849] [] ? __wake_up_sync_key+0x5f/0x80
Apr  8 18:32:48 [SEQ 260159] Lab62slot5 kernel:  [12582.208860] [] tipc_node_xmit+0x11f/0x150 [tipc]
Apr  8 18:32:48 [SEQ 260160] Lab62slot5 kernel:  [12582.208864] [] ? find_busiest_group+0x153/0x980
Apr  8 18:32:48 [SEQ 260161] Lab62slot5 kernel:  [12582.208875] [] tipc_node_xmit_skb+0x37/0x60 [tipc]
Apr  8 18:32:48 [SEQ 260162] Lab62slot5 kernel:  [12582.208885] [] tipc_sk_respond+0x99/0xc0 [tipc]
Apr  8 18:32:48 [SEQ 260163] Lab62slot5 kernel:  [12582.208895] [] filter_rcv+0x4cd/0x550 [tipc]
Apr  8 18:32:48 [SEQ 260164] Lab62slot5 kernel:  [12582.208905] [] tipc_sk_rcv+0x2dd/0x490 [tipc]
Apr  8 18:32:48 [SEQ 260165] Lab62slot5 kernel:  [12582.208915] [] tipc_node_xmit+0x11f/0x150 [tipc]
Apr  8 18:32:48 [SEQ 260166] Lab62slot5 kernel:  [12582.208925] [] ? tipc_recv_stream+0x370/0x370 [tipc]
Apr  8 18:32:48 [SEQ 260167] Lab62slot5 kernel:  [12582.208935] [] tipc_node_xmit_skb+0x37/0x60 [tipc]
Apr  8 18:32:48 [SEQ 260168] Lab62slot5 kernel:  [12582.208945] [] tipc_sk_timeout+0xe0/0x180 [tipc]
Apr  8 18:32:48 [SEQ 260169] Lab62slot5 kernel:  [12582.208955] [] ? tipc_recv_stream+0x370/0x370 [tipc]
Apr  8 18:32:48 [SEQ 260170] Lab62slot5 kernel:  [12582.208961] [] call_timer_fn+0x44/0x110
Apr  8 18:32:48 [SEQ 260171] Lab62slot5 kernel:  [12582.208965] [] ? cascade+0x4a/0x80
Apr  8 18:32:48 [SEQ 260172] Lab62slot5 kernel:  [12582.208975] [] ? tipc_recv_stream+0x370/0x370 [tipc]
Apr  8 18:32:48 [SEQ 260173] Lab62slot5 kernel:  [12582.208980] [] run_timer_softirq+0x22c/0x280
Apr  8 18:32:4

[tipc-discussion] Kernel 4.4.0 TIPC: links were bouncing and not stable enough

2016-04-11 Thread GUNA
Jon,
Lab resources are very busy now, I will send you the logs once collected.

Erik,
Thanks for the information about tipcutils. Could I use tipcutils
while the system uses tipc-config? Are there any conflicts?

-Guna



Re: [tipc-discussion] Kernel 4.4.0 TIPC: links were bouncing and not stable enough

2016-04-08 Thread GUNA
I have merged the fixes listed below into the 4.4.0 kernel and rebuilt the tipc module.
Now the links look stable. However, I still see the following alarms:
1. "Name table mismatch ..." alarm
2. "Dropping Name Table update"

Questions:
1. I thought the fixes would resolve the above errors, but they still
exist, although the links are better now. Why did the fixes not
resolve the above errors? Am I missing any other fixes?

// from 4.5
// tipc: move linearization of buffers to generic code
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=c7cad0d6f70cd4ce8644ffe528a4df1cdc2e77f5

// tipc: small cleanup of function tipc_node_check_state()
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=5c10e9794013143eec80d494603d46dcb219970a

// tipc: fix premature addition of node to lookup table
https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-4.4.y&id=692925fe2d42092a99d3532cb03932c8fda57786

// From 4.6
// tipc: don't check link reset on non existing link
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=5c10e9794013143eec80d494603d46dcb219970a



Re: [tipc-discussion] Kernel 4.4.0 TIPC: links were bouncing and not stable enough

2016-04-08 Thread GUNA
Jon,

You indicated that a patch for the "Dropping name table..." issue is
available in the 4.5 kernel. Could you let me know the details of the patch
associated with the fix, so I can manually merge it into my 4.4.0 kernel? At
this time, I am not able to use the 4.5 kernel.

If there are any other relevant fixes for "Name table mismatch...",
please let me know as well.

- Guna



Re: [tipc-discussion] Kernel 4.4.0 TIPC: links were bouncing and not stable enough

2016-04-07 Thread GUNA
Thanks Jon & Richard.

I am using the 4.4.0 kernel and was told tipc-config should be fine for now,
so I will keep using it for a while.

Regarding "ERROR DETECTED...", it seems our audit script reports an
error for links shown as "STANDBY" when it expects "ACTIVE". Since our
system supports redundancy, this is not an error.

>>ERROR DETECTED  2: Link not active <1.1.2:p19p1-1.1.3:p19p1>.
>>ERROR DETECTED  2: Link not active <1.1.2:p19p2-1.1.5:p19p2>.

// name tables mismatch
I also see the following mismatch errors in our audit script's
output. I suspect this is due to the other issue, "Dropping name table
update ...". As per a recent post, Jon found the fix for it and will test
the patch once it is available.

Name table mismatch between 12 and 10.
Name table mismatch between 2 and 13.
Name table mismatch between 5 and 3.

The name tables were checked for mismatches under the following
assumptions. If they are not correct, or if changes are required for the
latest TIPC (kernel 4.4.0 version), please let me know.

1. Lines where the first column is 0 or 1 are TIPC internal, so these
lines are ignored.
2. As of TIPC 2.0, zone support appears to be deprecated, so
differences between zone and cluster scopes are ignored.
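
The two assumptions can be written down as a small sketch. This is not the actual audit script; the names normalize and table_mismatch, and the row tuples, are made up for illustration, with sample values taken from the name table in this thread.

```python
def normalize(rows):
    """Apply the audit assumptions: skip internal types 0 and 1, fold zone into cluster."""
    out = set()
    for type_, lower, upper, port, pub, scope in rows:
        if type_ in (0, 1):
            continue  # assumption 1: TIPC-internal entries are ignored
        if scope == "zone":
            scope = "cluster"  # assumption 2: zone/cluster differences are ignored
        out.add((type_, lower, upper, port, pub, scope))
    return out


def table_mismatch(rows_a, rows_b):
    """Return (entries only in a, entries only in b) after normalization."""
    a, b = normalize(rows_a), normalize(rows_b)
    return sorted(a - b), sorted(b - a)


card_a = [
    (0, 16781314, 16781314, "<1.1.2:0>", 0, "zone"),
    (17075168, 4, 4, "<1.1.9:1415618317>", 1415618317, "cluster"),
    (17075168, 4, 4, "<1.1.10:2146874377>", 2146874377, "zone"),
]
card_b = [
    (17075168, 4, 4, "<1.1.10:2146874377>", 2146874377, "cluster"),
]

only_a, only_b = table_mismatch(card_a, card_b)
print(only_a, only_b)
```

Here only the <1.1.9:...> publication is reported as missing on the second card; the type 0 row and the zone/cluster difference are ignored.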

// sample name table
[root@Lab2slot2 gTmp]# tipc-config -nt
Type       Lower     Upper       Port Identity        Publication  Scope
520093696  18461     2634183778  <1.1.3:0>            0            cluster
0          0         0           <1.1.3:256>          256          cluster
0          0         0           <1.1.3:2181>         2181         cluster
0          16781314  16781314    <1.1.2:0>            0            zone
...

17097678   3201      3201        <1.1.5:829961004>    829961004    cluster
17075168   4         4           <1.1.9:1415618317>   1415618317   cluster
17075168   4         4           <1.1.10:2146874377>  2146874377   cluster

There are around 200 entries on each card, of which around 14 to 20 differ.

- Guna



On Wed, Apr 6, 2016 at 12:02 PM, Jon Maloy  wrote:
>
>
>> -----Original Message-----
>> From: GUNA [mailto:gbala...@gmail.com]
>> Sent: Wednesday, 06 April, 2016 11:36
>> To: tipc-discussion@lists.sourceforge.net
>> Subject: [tipc-discussion] Kernel 4.4.0 TIPC: links were bouncing and not stable enough
>>
>> I upgraded the kernel from 3.4.2 to 4.4.0 and noticed the links were bouncing
>> and not stable.
>>
>> There is a multitude of “Link not active” messages from the audit_tipc.sh
>> script, as you can see below; however, the links are up according to
>> "tipc-config -l".
>>
>> My questions are:
>> 1. Why does audit_tipc.sh indicate "Link not active" while the links
>> seem up? Is the audit_tipc.sh script not parsing the data
>> correctly (a new format, perhaps, for this new kernel)?
>
> I don't know this script, so I will let somebody else respond to this one.
>
>>
>> 2. Is it known issue that "links were bouncing"? Any fix for the
>> issue? Blocking now.
>
> No, there is no known such issue.  If you take your input from the script I 
> think there is every reason to suspect the interaction between the script and 
> the module code.
>
>>
>> 3. I do also see "Dropping name table update ..." from dmesg. Does
>> this cause any issue?
>
> Yes, it does. It means that pre-establishment bindings (those present in the
> issuer's local binding table prior to establishment, and sent out as "bulk"
> in one or more messages when the link is established) are not applied at the
> receiver. It is clear that these particular messages sometimes and
> somehow get corrupted at reception. I am just trouble-shooting this problem
> based on input from Rune Torgersen. If you wait with starting your
> application and doing your bindings until after link establishment you are
> safe, for now.
>
> ///jon
>
>>
>> // audit_tipc.sh
>> For example the script shows all links as being down for slot 2:
>>
>>ERROR DETECTED  2: Link not active <1.1.2:p19p1-1.1.3:p19p1>.
>>ERROR DETECTED  2: Link not active <1.1.2:p19p2-1.1.5:p19p2>.
>>ERROR DETECTED  2: Link not active <1.1.2:p19p2-1.1.6:p19p2>.
>>ERROR DETECTED  2: Link not active <1.1.2:p19p1-1.1.7:eth0>.
>>ERROR DETECTED  2: Link not active <1.1.2:p19p1-1.1.8:eth0>.
>>ERROR DETECTED  2: Link not active <1.1.2:p19p2-1.1.9:p19p2>.
>>ERROR DETECTED  2: Link not active <1.1.2:p19p2-1.1.10:p19p2>.
>>ERROR DETECTED  2: Link not active <1.1.2:p19

[tipc-discussion] Kernel 4.4.0 TIPC: links were bouncing and not stable enough

2016-04-06 Thread GUNA
I upgraded the kernel from 3.4.2 to 4.4.0 and noticed the links were bouncing
and not stable.

There is a multitude of “Link not active” messages from the audit_tipc.sh
script, as you can see below; however, the links are up according to
"tipc-config -l".

My questions are:
1. Why does audit_tipc.sh indicate "Link not active" while the links
seem up? Is the audit_tipc.sh script not parsing the data
correctly (a new format, perhaps, for this new kernel)?

2. Is it a known issue that links are bouncing? Is there a fix for the
issue? It is blocking us now.

3. I also see "Dropping name table update ..." in dmesg. Does
this cause any issue?

// audit_tipc.sh
For example the script shows all links as being down for slot 2:

   ERROR DETECTED  2: Link not active <1.1.2:p19p1-1.1.3:p19p1>.
   ERROR DETECTED  2: Link not active <1.1.2:p19p2-1.1.5:p19p2>.
   ERROR DETECTED  2: Link not active <1.1.2:p19p2-1.1.6:p19p2>.
   ERROR DETECTED  2: Link not active <1.1.2:p19p1-1.1.7:eth0>.
   ERROR DETECTED  2: Link not active <1.1.2:p19p1-1.1.8:eth0>.
   ERROR DETECTED  2: Link not active <1.1.2:p19p2-1.1.9:p19p2>.
   ERROR DETECTED  2: Link not active <1.1.2:p19p2-1.1.10:p19p2>.
   ERROR DETECTED  2: Link not active <1.1.2:p19p1-1.1.12:p19p1>.
   ERROR DETECTED  2: Link not active <1.1.2:p19p1-1.1.13:p19p1>.

But when I check manually I see they are indeed all up for slot 2:

 [root@slot2 ~]# tipc-config -l
[root@slot2 log]# tipc-config -l
Links:
broadcast-link: up
1.1.2:p19p1-1.1.3:p19p1: up
1.1.2:p19p2-1.1.3:p19p2: up
1.1.2:p19p1-1.1.5:p19p1: up
1.1.2:p19p2-1.1.5:p19p2: up
1.1.2:p19p1-1.1.6:p19p1: up
1.1.2:p19p2-1.1.6:p19p2: up
1.1.2:p19p1-1.1.7:eth0: up
1.1.2:p19p2-1.1.7:eth1: up
1.1.2:p19p1-1.1.8:eth0: up
1.1.2:p19p2-1.1.8:eth1: up
1.1.2:p19p1-1.1.9:p19p1: up
1.1.2:p19p2-1.1.9:p19p2: up
1.1.2:p19p1-1.1.10:p19p1: up
1.1.2:p19p2-1.1.10:p19p2: up
1.1.2:p19p1-1.1.12:p19p1: up
1.1.2:p19p2-1.1.12:p19p2: up
1.1.2:p19p1-1.1.13:p19p1: up
1.1.2:p19p2-1.1.13:p19p2: up
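
As a cross-check for the audit script, the "tipc-config -l" listing above can be parsed directly. This is a minimal sketch assuming the "name: state" line format shown here; links_not_up is a hypothetical helper, and the "down" line in the sample is invented to exercise it.

```python
def links_not_up(text):
    """Return link names whose state in the `tipc-config -l` listing is not 'up'."""
    down = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, the "Links:" header and shell prompt lines.
        if not line or line == "Links:" or line.startswith("["):
            continue
        # Link names contain ':' themselves, so split on the last ": ".
        name, sep, state = line.rpartition(": ")
        if sep and state != "up":
            down.append(name)
    return down


SAMPLE = """\
[root@slot2 log]# tipc-config -l
Links:
broadcast-link: up
1.1.2:p19p1-1.1.3:p19p1: up
1.1.2:p19p2-1.1.3:p19p2: down
"""

print(links_not_up(SAMPLE))
```

With the real listing above, the function returns an empty list, which matches the manual check.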

[root@slot2 log]# tipc-config -nt |head -n 15
Type   Lower  Upper  Port Identity  Publication Scope
520093696  18461  2634183778 <1.1.3:0>  0   cluster
0  0  0  <1.1.3:256>256 cluster
0  0  0  <1.1.3:2181>   2181cluster
0  16781314   16781314   <1.1.2:0>  0   zone
0  16781315   16781315   <1.1.3:0>  0   cluster
0  16781317   16781317   <1.1.5:0>  0   cluster
0  16781318   16781318   <1.1.6:0>  0   cluster
0  16781319   16781319   <1.1.7:2654986243> 2654986243  cluster
0  16781320   16781320   <1.1.8:3903692803> 3903692803  cluster
0  16781321   16781321   <1.1.9:0>  0   cluster
...

[root@slot2 log]# tipc-config -s
TIPC version 2.0.0

[root@slot2 log]# tipc-config -V
TIPC configuration tool version 2.0.2


[root@slot2 log]#
[root@slot2 log]# dmesg |grep -i "dropping"
[   72.318895] Dropping name table update (0) of {3868288600, 3020841726, 534120506} from <1.1.3> key=3657291381
[   72.328943] Dropping name table update (0) of {533106125, 3808506331, 327966794} from <1.1.3> key=491181369
[   72.338727] Dropping name table update (0) of {2413453794, 4001531852, 1123674776} from <1.1.3> key=727597829
[   72.348754] Dropping name table update (0) of {4091040499, 3918449564, 12583023} from <1.1.3> key=4294901760
[   72.358687] Dropping name table update (0) of {4294901936, 1269051007, 47296} from <1.1.3> key=4294903941
[   72.368356] Dropping name table update (0) of {407831432, 4294965652, 407831432} from <1.1.3> key=2336883592
[   72.378634] Dropping name table update (0) of {16385, 16128, 0} from <1.1.3> key=407831432
[   72.387304] Dropping name table update (0) of {2189557759, 4294917888, 0} from <1.1.3> key=1246757768
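
To see which peers the dropped updates come from, the dmesg lines can be summarised per node. This is a rough sketch that assumes only the message format shown above; the regex group names are illustrative, and the number in parentheses is captured as-is without interpreting it.

```python
import re
from collections import Counter

DROP_RE = re.compile(
    r"Dropping name table update \((?P<code>\d+)\) of "
    r"\{(?P<type>\d+), (?P<lower>\d+), (?P<upper>\d+)\} "
    r"from <(?P<node>[\d.]+)> key=(?P<key>\d+)"
)


def dropped_updates(dmesg_text):
    """Extract all dropped name table updates as dictionaries of strings."""
    # Mail clients may wrap long dmesg lines, so collapse whitespace first.
    flat = " ".join(dmesg_text.split())
    return [m.groupdict() for m in DROP_RE.finditer(flat)]


SAMPLE = """\
[   72.318895] Dropping name table update (0) of {3868288600,
3020841726, 534120506} from <1.1.3> key=3657291381
[   72.378634] Dropping name table update (0) of {16385, 16128, 0}
from <1.1.3> key=407831432
"""

drops = dropped_updates(SAMPLE)
per_node = Counter(d["node"] for d in drops)
print(per_node)
```

In the log above, every dropped update originates from <1.1.3>.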



[tipc-discussion] list_add corruption. next->prev should be prev (ffff88034b1d1c88), but was (null)

2016-03-29 Thread GUNA
This is based on kernel 4.4.0 plus some of the latest TIPC fixes.
The card did not come up and had to be re-seated.

I am also still seeing "Dropping name table..."


[  255.418898] WARNING: CPU: 5 PID: 110 at lib/list_debug.c:29 __list_add+0x81/0xd0()
[  255.426646] list_add corruption. next->prev should be prev (ffff88034b1d1c88), but was (null). (next=8800369d2280).
[  255.438603] Modules linked in: sctp e1000e tipc udp_tunnel ip6_udp_tunnel drbd lru_cache libcrc32c 8021q mrp garp xt_physdev br_netfilter iTCO_wdt(O) bridge stp llc nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ipmiq_drv(O) sio_mmc(O) ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack event_drv(O) lockd ip6table_filter grace ddi(O) pt_timer_info(O) ixgbe usb_storage igb iTCO_vendor_support pcspkr i2c_i801 lpc_ich mfd_core i2c_algo_bit i2c_core ptp pps_core mdio tpm_tis tpm sunrpc [last unloaded: iTCO_wdt]
[  255.489470] CPU: 5 PID: 110 Comm: kworker/u32:2 Tainted: G W O 4.4.0 #6
[  255.497524] Hardware name: PT AMC124/Base Board Product Name, BIOS LGNAJFIP.PTI.0012.P15 01/15/2014
[  255.507007] Workqueue: tipc_rcv tipc_recv_work [tipc]
[  255.512302]  001d 88034e2dfb68 8134bc35 88035fcaf300
[  255.520297]  88034e2dfbb8 88034e2dfba8 8107e6d7 a02a455d
[  255.528277]  88034e9d3a80 8800369d2280 88034b1d1c88 88033c22f860
[  255.536170] Call Trace:
[  255.538748]  [] dump_stack+0x45/0x60
[  255.544135]  [] warn_slowpath_common+0x97/0xe0
[  255.550484]  [] ? tsk_advance_rx_queue+0x4d/0x60 [tipc]
[  255.557526]  [] warn_slowpath_fmt+0x46/0x50
[  255.563721]  [] __list_add+0x81/0xd0
[  255.569173]  [] tipc_nametbl_subscribe+0x8b/0x170 [tipc]
[  255.576316]  [] tipc_subscrb_rcv_cb+0x1d5/0x310 [tipc]
[  255.583365]  [] tipc_receive_from_sock+0xb2/0x120 [tipc]
[  255.590749]  [] tipc_recv_work+0x2f/0x60 [tipc]
[  255.597239]  [] process_one_work+0x150/0x3e0
[  255.603451]  [] worker_thread+0x111/0x460
[  255.609400]  [] ? create_worker+0x1b0/0x1b0
[  255.615526]  [] kthread+0xc9/0xe0
[  255.620747]  [] ? flush_kthread_worker+0x90/0x90
[  255.627267]  [] ret_from_fork+0x3f/0x70
[  255.632927]  [] ? flush_kthread_worker+0x90/0x90
[  255.639355] ---[ end trace cd13e34ab48cf602 ]---



Re: [tipc-discussion] Kernel 4.4.0 - tipc __list_del_entry

2016-03-29 Thread GUNA
I merged the fixes listed below into my 4.4.0 kernel and tested. The send_msg
crash is gone, but I still see the following:

1. "Dropping name table" ...
2. general protection fault:  [#1] SMP
Mar 23 15:13:25 [SEQ 429781]  kernel:  [ 8289.115539]  []
tipc_nametbl_unsubscribe+0x7e/0x110 [tipc]
Mar 23 15:13:25 [SEQ 429782]  kernel:  [ 8289.123059]  []
? handle_edge_irq+0x93/0x150
Mar 23 15:13:25 [SEQ 429783]  kernel:  [ 8289.128967]  []
? tipc_subscrp_send_event+0xf0/0xf0 [tipc]

If I have missed any fix, please let me know.

Thanks
Guna

On Thu, Mar 24, 2016 at 3:28 PM, GUNA  wrote:

> I am not able to move to 4.5.0, but I could merge all of the following tipc
> changes from 4.5.0 (except the git merge commits). If I need to pick up any
> other fixes, please let me know.
>
> Also, I see there are discussions about the "Dropping name table update
> ..." issue, but I do not see any solution. If there is a solution for this,
> please let me know as well.
>
> thanks.
>
> Age, commit message, author, files, lines (Expand:
> <https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/log/?h=linux-4.5.y&qt=grep&q=tipc&showmsg=1>)
>
> 2016-03-07  tipc: fix nullptr crash during subscription cancel
>             <https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-4.5.y&id=4de13d7ed6ffdcbb34317acaa9236f121176f5f8>
>             Parthasarathy Bhuvaragan, 1 file, -1/+2
> 2016-03-03  tipc: Revert "tipc: use existing sk_write_queue for outgoing
>             packet chain"
>             <https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-4.5.y&id=f214fc402967e1bc94ad7f39faa03db5813d6849>
>             Parthasarathy Bhuvaragan, 1 file, -14/+19
> 2016-02-19  tipc: unlock in error path
>             <https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-4.5.y&id=b53ce3e7d407aa4196877a48b8601181162ab158>
>             Insu Yun, 1 file, -1/+3
> 2016-02-16  tipc: fix premature addition of node to lookup table
>             <https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-4.5.y&id=d5c91fb72f1652ea3026925240a0998a42ddb16b>
>             Jon Paul Maloy, 1 file, -6/+6
> 2016-01-29  tipc: fix connection abort during subscription cancel
>             <https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-4.5.y&id=4d5cfcba2f6ec494d8810b9e3c0a7b06255c8067>
>             Parthasarathy Bhuvaragan, 1 file, -6/+5
>
>
> On Thu, Mar 24, 2016 at 3:11 PM, Jon Maloy  wrote:
>
>> Hi,
>> That is the one. However, I see in a different mail from Rune Torgersen
>> that the problem might still be there, despite the fix.
>> Just try 4.5.0 and see if you get the same problem.
>>
>> Regards
>> ///jon
>>
>> > -Original Message-
>> > From: GUNA [mailto:gbala...@gmail.com]
>> > Sent: Thursday, 24 March, 2016 12:17
>> > To: tipc-discussion@lists.sourceforge.net
>> > Subject: [tipc-discussion] Kernel 4.4.0 - tipc __list_del_entry
>> >

[tipc-discussion] subscr.c: Added new function is removed on latest: why

2016-03-29 Thread GUNA
// file subscr.c
I see that tipc_subscrp_subscribe() was added on Feb 2nd, but the change is
reverted in the March 7th version of the file. Is there a reason why
tipc_subscrp_subscribe() is not in the latest version?

// Feb 2nd
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=7c13c6224123a6424bd3bc60ef982759754501e9

// March 7th
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=4de13d7ed6ffdcbb34317acaa9236f121176f5f8

It seems the March 7th fix may not be needed for the Feb 2nd version, since
that version already checks for NULL. Please confirm.

static void tipc_subscrp_subscribe(struct net *net, struct tipc_subscr *s,
                                   struct tipc_subscriber *subscriber, int swap)
{
        struct tipc_net *tn = net_generic(net, tipc_net_id);
        struct tipc_subscription *sub = NULL;
        u32 timeout;

        sub = tipc_subscrp_create(net, s, swap);
        if (!sub)       /* <== NULL check here */
                return tipc_conn_terminate(tn->topsrv, subscriber->conid);

        spin_lock_bh(&subscriber->lock);
        list_add(&sub->subscrp_list, &subscriber->subscrp_list);
        tipc_subscrb_get(subscriber);
        sub->subscriber = subscriber;
        tipc_nametbl_subscribe(sub);    /* <== at this point, "sub" cannot be NULL */
        spin_unlock_bh(&subscriber->lock);

...

Thanks,
Guna



Re: [tipc-discussion] Kernel 4.4.0 - tipc __list_del_entry

2016-03-24 Thread GUNA
Regarding the "Dropping name table ..." messages:
It seems to work if TIPC_ZONE_SCOPE is replaced by TIPC_CLUSTER_SCOPE.
Could you let me know where the change should be applied (i.e. which file/function)?

On Thu, Mar 24, 2016 at 3:28 PM, GUNA  wrote:

> I am not able to move to 4.5.0, but I could merge all of the following tipc
> changes from 4.5.0 (except the git merge commits). If I need to pick up any
> other fixes, please let me know.
>
> Also, I see there are discussions about the "Dropping name table update
> ..." issue, but I do not see any solution. If there is a solution for this,
> please let me know as well.
>
> thanks.
>
> Age, commit message, author, files, lines (Expand:
> <https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/log/?h=linux-4.5.y&qt=grep&q=tipc&showmsg=1>)
>
> 2016-03-07  tipc: fix nullptr crash during subscription cancel
>             <https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-4.5.y&id=4de13d7ed6ffdcbb34317acaa9236f121176f5f8>
>             Parthasarathy Bhuvaragan, 1 file, -1/+2
> 2016-03-03  tipc: Revert "tipc: use existing sk_write_queue for outgoing
>             packet chain"
>             <https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-4.5.y&id=f214fc402967e1bc94ad7f39faa03db5813d6849>
>             Parthasarathy Bhuvaragan, 1 file, -14/+19
> 2016-02-19  tipc: unlock in error path
>             <https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-4.5.y&id=b53ce3e7d407aa4196877a48b8601181162ab158>
>             Insu Yun, 1 file, -1/+3
> 2016-02-16  tipc: fix premature addition of node to lookup table
>             <https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-4.5.y&id=d5c91fb72f1652ea3026925240a0998a42ddb16b>
>             Jon Paul Maloy, 1 file, -6/+6
> 2016-01-29  tipc: fix connection abort during subscription cancel
>             <https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-4.5.y&id=4d5cfcba2f6ec494d8810b9e3c0a7b06255c8067>
>             Parthasarathy Bhuvaragan, 1 file, -6/+5
>
>
> On Thu, Mar 24, 2016 at 3:11 PM, Jon Maloy  wrote:
>
>> Hi,
>> That is the one. However, I see in a different mail from Rune Torgersen
>> that the problem might still be there, despite the fix.
>> Just try 4.5.0 and see if you get the same problem.
>>
>> Regards
>> ///jon
>>
>> > -Original Message-
>> > From: GUNA [mailto:gbala...@gmail.com]
>> > Sent: Thursday, 24 March, 2016 12:17
>> > To: tipc-discussion@lists.sourceforge.net
>> > Subject: [tipc-discussion] Kernel 4.4.0 - tipc __list_del_entry
>> >

Re: [tipc-discussion] Kernel 4.4.0 - tipc __list_del_entry

2016-03-24 Thread GUNA
I am not able to move to 4.5.0, but I could merge all of the following tipc
changes from 4.5.0 (except the git merge commits). If I need to pick up any
other fixes, please let me know.

Also, I see there are discussions about the "Dropping name table update ..."
issue, but I do not see any solution. If there is a solution for this, please
let me know as well.

thanks.

Age, commit message, author, files, lines (Expand:
<https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/log/?h=linux-4.5.y&qt=grep&q=tipc&showmsg=1>)

2016-03-07  tipc: fix nullptr crash during subscription cancel
            <https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-4.5.y&id=4de13d7ed6ffdcbb34317acaa9236f121176f5f8>
            Parthasarathy Bhuvaragan, 1 file, -1/+2
2016-03-03  tipc: Revert "tipc: use existing sk_write_queue for outgoing
            packet chain"
            <https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-4.5.y&id=f214fc402967e1bc94ad7f39faa03db5813d6849>
            Parthasarathy Bhuvaragan, 1 file, -14/+19
2016-02-19  tipc: unlock in error path
            <https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-4.5.y&id=b53ce3e7d407aa4196877a48b8601181162ab158>
            Insu Yun, 1 file, -1/+3
2016-02-16  tipc: fix premature addition of node to lookup table
            <https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-4.5.y&id=d5c91fb72f1652ea3026925240a0998a42ddb16b>
            Jon Paul Maloy, 1 file, -6/+6
2016-01-29  tipc: fix connection abort during subscription cancel
            <https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-4.5.y&id=4d5cfcba2f6ec494d8810b9e3c0a7b06255c8067>
            Parthasarathy Bhuvaragan, 1 file, -6/+5


On Thu, Mar 24, 2016 at 3:11 PM, Jon Maloy  wrote:

> Hi,
> That is the one. However, I see in a different mail from Rune Torgersen
> that the problem might still be there, despite the fix.
> Just try 4.5.0 and see if you get the same problem.
>
> Regards
> ///jon
>
> > -Original Message-
> > From: GUNA [mailto:gbala...@gmail.com]
> > Sent: Thursday, 24 March, 2016 12:17
> > To: tipc-discussion@lists.sourceforge.net
> > Subject: [tipc-discussion] Kernel 4.4.0 - tipc __list_del_entry
> >

[tipc-discussion] BUG: unable to handle kernel NULL pointer dereference at 0000000000000039 ==> tipc_sendmsg

2016-03-24 Thread GUNA
I am using kernel 4.4.0 and have seen the following panic. Please let me
know whether a fix is available for this issue.

[67670.162758] BUG: unable to handle kernel NULL pointer dereference
at 0039
[67670.170664] IP: [] __tipc_sendmsg+0x1be/0x5a0 [tipc]
[67670.177231] PGD 34de24067 PUD 3505f5067 PMD 0
[67670.181778] Oops: 0002 [#1] SMP
[67670.185061] Modules linked in: 8021q mrp garp sctp libcrc32c e1000e
tipc udp_tunnel ip6_udp_tunnel iTCO_wdt(O) xt_physdev br_netfilter bridge
stp ipmiq_drv(O) llc sio_mmc(O) nf_conntrack_ipv4 nf_defrag_ipv4
iptable_filter ip_tables ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6
nf_defrag_ipv6 xt_state nf_conntrack event_drv(O) lockd ddi(O)
ip6table_filter pt_timer_info(O) grace ixgbe igb usb_storage
iTCO_vendor_support lpc_ich mfd_core pcspkr i2c_i801 i2c_algo_bit
i2c_core ptp pps_core mdio tpm_tis tpm sunrpc [last unloaded: iTCO_wdt]
[67670.234245] CPU: 5 PID: 22259 Comm: yj4flx Tainted: GW  O4.4.0 #6
[67670.241707] Hardware name: PT AMC124/Base Board Product Name, BIOS
LGNAJFIP.PTI.0012.P15 01/15

[67670.251135] task: 8800366d51c0 ti: 88034ef7c000 task.ti:
88034ef7c000
[67670.258773] RIP: 0010:[]  []
__tipc_sendmsg+0x1be/0x5a0 [tipc]
[67670.267931] RSP: 0018:88034ef7fc78  EFLAGS: 00010246
RAX:  RBX: 88034ef7fdb0 RCX: 3d8bb0031109
[67670.280838] RDX:  RSI: 880350167cf0 RDI: 81ce2800
[67670.288071] RBP: 88034ef7fd48 R08: 88034ef7c000 R09: 
[67670.295312] R10: 0079 R11:  R12: 880350167bc0
[67670.302687] R13: 880350f5f200 R14: 7dc4 R15: 7dc4
[67670.309944] FS:  () GS:88035fca(0063)
knlGS:f3affb40
[67670.318173] CS:  0010 DS: 002b ES: 002b CR0: 8005003b
[67670.324081] CR2: 0039 CR3: 00034b04 CR4: 06e0
[67670.331288] Stack:
880305b4 88034ef7fdc0 880350167e98 81ce2800
[67670.341145]  880350167cf0 88034ef7fcb0 88034ef7fcb8
0001
[67670.348870]   7dc4 88034ef7fe08
0001
[67670.356589] Call Trace:
[67670.359138]  [] ? __wake_up+0x53/0x70
[] tipc_sendmsg+0x42/0x70 [tipc]
[67670.370380]  [] sock_sendmsg+0x47/0x50
[] SYSC_sendto+0x103/0x150
[] ? SYSC_getsockname+0x92/0xe0
[67670.387261]  [] ? __audit_syscall_entry+0xb1/0x110
[67670.393627]  [] ? do_audit_syscall_entry+0x66/0x70
[67670.49]  [] ? syscall_trace_enter_phase1+0xf8/0x120
[67670.406797]  [] SyS_sendto+0xe/0x10
[67670.411942]  [] compat_SyS_socketcall+0xef/0x180
[67670.418140]  [] do_fast_syscall_32+0x99/0x130
[67670.424110]  [] sysenter_flags_fixed+0x8/0x12
[67670.430197] Code: 30 01 00 00 48 39 85 50 ff ff ff ba 00 00 00 00
48 8b b5 50 ff ff ff 48 8b bd 48 ff ff ff 48 0f 44 c2 41 0f b6 94 24
50 03 00 00 <88> 50 39 8b 55 cc 41 8b 8c 24 d4 02 00 00 e8 3f e3 ff ff
85 c0
[67670.451072] RIP  [] __tipc_sendmsg+0x1be/0x5a0 [tipc]
[67670.458037]  RSP 



[tipc-discussion] Kernel 4.4.0 - tipc __list_del_entry

2016-03-24 Thread GUNA
Please confirm the following will fix the issue reported below:

// fix in 4.4.4
https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-4.4.y&id=c57e51ffd1d910d595ccb3af3ae70eeeb6d423a2

Mar 23 15:13:25 [SEQ 429763]  kernel:  [ 8289.000919] Workqueue:
tipc_rcv tipc_recv_work [tipc]
Mar 23 15:13:25 [SEQ 429764]  kernel:  [ 8289.006024] task:
88035073 ti: 880036b2c000 task.ti: 880036b2c000
Mar 23 15:13:25 [SEQ 429765]  kernel:  [ 8289.013717] RIP:
0010:[]  []
__list_del_entry+0x29/0xd0
Mar 23 15:13:25 [SEQ 429766]  kernel:  [ 8289.022305] RSP:
0018:88035fc03d88  EFLAGS: 00010203
Mar 23 15:13:25 [SEQ 429767]  kernel:  [ 8289.027745] RAX:
0021000e RBX: 880349c88900 RCX: dead0200
Mar 23 15:13:25 [SEQ 429768]  kernel:  [ 8289.035120] RDX:
002e RSI: 0001 RDI: 880349c88980
Mar 23 15:13:25 [SEQ 429769]  kernel:  [ 8289.042410] RBP:
88035fc03d88 R08: 88034e6d4f50 R09: a029e18b
Mar 23 15:13:25 [SEQ 429770]  kernel:  [ 8289.049705] R10:
0020 R11:  R12: 88034f35f120
Mar 23 15:13:25 [SEQ 429771]  kernel:  [ 8289.056941] R13:
880351317cc0 R14: 880349c88980 R15: 880351317cf8
Mar 23 15:13:25 [SEQ 429772]  kernel:  [ 8289.064313] FS:
() GS:88035fc0()
knlGS:
Mar 23 15:13:25 [SEQ 429773]  kernel:  [ 8289.072544] CS:  0010 DS:
 ES:  CR0: 8005003b
Mar 23 15:13:25 [SEQ 429774]  kernel:  [ 8289.078418] CR2:
f7472fd8 CR3: 00033bb34000 CR4: 06f0
Mar 23 15:13:25 [SEQ 429775]  kernel:  [ 8289.085792] Stack:
Mar 23 15:13:25 [SEQ 429776]  kernel:  [ 8289.087803]
88035fc03dc8 a0292f1e 810d43a3 880349c88900
Mar 23 15:13:25 [SEQ 429777]  kernel:  [ 8289.095547]
88034f35e000 88009b10e708 88035fc0d3b8 a0291d10
Mar 23 15:13:25 [SEQ 429778]  kernel:  [ 8289.103259]
88035fc03de8 a02917f9 880349c88900 88009b10e700
Mar 23 15:13:25 [SEQ 429779]  kernel:  [ 8289.111030] Call Trace:
Mar 23 15:13:25 [SEQ 429780]  kernel:  [ 8289.113533]  
Mar 23 15:13:25 [SEQ 429781]  kernel:  [ 8289.115539]
[] tipc_nametbl_unsubscribe+0x7e/0x110 [tipc]
Mar 23 15:13:25 [SEQ 429782]  kernel:  [ 8289.123059]
[] ? handle_edge_irq+0x93/0x150
Mar 23 15:13:25 [SEQ 429783]  kernel:  [ 8289.128967]
[] ? tipc_subscrp_send_event+0xf0/0xf0 [tipc]
Mar 23 15:13:25 [SEQ 429784]  kernel:  [ 8289.136408]
[] tipc_subscrp_delete+0x39/0x60 [tipc]
Mar 23 15:13:25 [SEQ 429785]  kernel:  [ 8289.143245]
[] tipc_subscrp_timeout+0x50/0x70 [tipc]
Mar 23 15:13:25 [SEQ 429786]  kernel:  [ 8289.149965]
[] ? do_IRQ+0x65/0xf0
Mar 23 15:13:25 [SEQ 429787]  kernel:  [ 8289.155120]
[] ? tipc_subscrp_send_event+0xf0/0xf0 [tipc]
Mar 23 15:13:25 [SEQ 429788]  kernel:  [ 8289.162382]
[] call_timer_fn+0x44/0x110
Mar 23 15:13:25 [SEQ 429789]  kernel:  [ 8289.168033]
[] ? common_interrupt+0x7f/0x7f
Mar 23 15:13:25 [SEQ 429790]  kernel:  [ 8289.174183]
[] ? tipc_subscrp_send_event+0xf0/0xf0 [tipc]
Mar 23 15:13:25 [SEQ 429791]  kernel:  [ 8289.181324]
[] run_timer_softirq+0x22c/0x280
Mar 23 15:13:25 [SEQ 429792]  kernel:  [ 8289.187546]
[] __do_softirq+0xc8/0x260
Mar 23 15:13:25 [SEQ 429793]  kernel:  [ 8289.193127]
[] irq_exit+0x83/0xb0
Mar 23 15:13:25 [SEQ 429794]  kernel:  [ 8289.198383]
[] do_IRQ+0x65/0xf0
Mar 23 15:13:25 [SEQ 429795]  kernel:  [ 8289.203288]
[] common_interrupt+0x7f/0x7f
Mar 23 15:13:25 [SEQ 429796]  kernel:  [ 8289.209300]  
Mar 23 15:13:25 [SEQ 429797]  kernel:  [ 8289.211319]
[] ? _raw_spin_unlock_irqrestore+0xe/0x10
Mar 23 15:13:25 [SEQ 429798]  kernel:  [ 8289.218469]
[] mod_timer+0xf3/0x1d0
Mar 23 15:13:25 [SEQ 429799]  kernel:  [ 8289.223845]
[] tipc_subscrb_rcv_cb+0x1c9/0x310 [tipc]
Mar 23 15:13:25 [SEQ 429800]  kernel:  [ 8289.230728]
[] tipc_receive_from_sock+0xb2/0x120 [tipc]
Mar 23 15:13:25 [SEQ 429801]  kernel:  [ 8289.237951]
[] tipc_recv_work+0x2f/0x60 [tipc]
Mar 23 15:13:25 [SEQ 429802]  kernel:  [ 8289.244237]
[] process_one_work+0x150/0x3e0
Mar 23 15:13:25 [SEQ 429803]  kernel:  [ 8289.250301]
[] worker_thread+0x111/0x460



Re: [tipc-discussion] TIPC utilities v2.0.2 compatibility with latest Linux Kernel 4.4

2016-01-20 Thread GUNA
Thanks Jon.

For the time being, I need to stick with the old Fedora. Could you let me
know where the header files (i.e. tipc.h ...) from kernel 4.4 that need to
be used are located? Are those the user-space header files?

Is there any benefit to using "tipc" over "tipc-config"?

- Guna

On Wed, Jan 20, 2016 at 10:24 AM, Jon Maloy  wrote:

>
>
> > -Original Message-
> > From: GUNA [mailto:gbala...@gmail.com]
> > Sent: Wednesday, 20 January, 2016 10:18
> > To: tipc-discussion@lists.sourceforge.net
> > Subject: [tipc-discussion] TIPC utilities v2.0.2 compatibility with
> latest Linux
> > Kernel 4.4
> >
> > Hello,
> >
> >
> >
> > Currently, system is running with Fedora Core 16 with 3.4.2-1 kernel and
> > using TIPC utilities v2.0.2.  The version of “tipc-config” was compiled
> against
> > 3.4.2-1 kernel.
> >
> >
> >
> > If I upgrade the kernel to either 4.4, could I still use the same
> “tipc_config” ?
>
> The answer is yes. But you should be aware that tipc-config is deprecated
> now, and that we recommend the new tool 'tipc' instead.
> This comes with the package iproute2 in the newest versions of Fedora, but
> you probably will have to build it yourself if you continue to use an old
> Fedora.
>
> ///jon
>
> >
> > If not, where the user space tipc.h and tipc_config.h files are located
> to
> > create new tipc-config based on new Kernel 4.4 ?
> >
> >
> >
> > Thank you,
> >
> > Guna B
> >
>


[tipc-discussion] TIPC utilities v2.0.2 compatibility with latest Linux Kernel 4.4

2016-01-20 Thread GUNA
Hello,

Currently, the system is running Fedora Core 16 with the 3.4.2-1 kernel and
uses TIPC utilities v2.0.2. This version of “tipc-config” was compiled
against the 3.4.2-1 kernel.

If I upgrade the kernel to 4.4, can I still use the same “tipc-config”?

If not, where are the user-space tipc.h and tipc_config.h files located, so
that I can build a new tipc-config against kernel 4.4?



Thank you,

Guna B