Re: Repeatable inet6_dump_fib crash in stock 4.12.0-rc4+

2018-04-17 Thread David Ahern
On 4/17/18 5:29 PM, Ben Greear wrote:
> 
> FYI, problem still happens in 4.16.  I'm going to re-enable my hack below
> for this kernel as well...I had hopes it might be fixed...

Interesting. I was hoping the same.

> 
> BUG: unable to handle kernel NULL pointer dereference at 8
> IP: fib6_walk_continue+0x5b/0x140 [ipv6]
> PGD 8007dfc0c067 P4D 8007dfc0c067 PUD 7e66ff067 PMD 0
> Oops:  [#1] PREEMPT SMP PTI
> Modules linked in: nf_conntrack_netlink nf_conntrack nfnetlink
> nf_defrag_ipv4 libcrc32c vrf]
> CPU: 3 PID: 15117 Comm: ip Tainted: G   O 4.16.0+ #5
> Hardware name: Iron_Systems,Inc CS-CAD-2U-A02/X10SRL-F, BIOS 2.0b
> 05/02/2017
> RIP: 0010:fib6_walk_continue+0x5b/0x140 [ipv6]
> RSP: 0018:c90008c3bc10 EFLAGS: 00010287
> RAX: 88085ac45050 RBX: 8807e03008a0 RCX: 
> RDX:  RSI: c90008c3bc48 RDI: 8232b240
> RBP: 880819167600 R08: 0008 R09: 8807dff10071
> R10: c90008c3bbd0 R11:  R12: 8807e03008a0
> R13: 0002 R14: 8807e05744c8 R15: 8807e08ef000
> FS:  7f2f04342700() GS:88087fcc()
> knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 0008 CR3: 0007e0556002 CR4: 003606e0
> DR0:  DR1:  DR2: 
> DR3:  DR6: fffe0ff0 DR7: 0400
> Call Trace:
>  inet6_dump_fib+0x14b/0x2c0 [ipv6]
>  netlink_dump+0x216/0x2a0
>  netlink_recvmsg+0x254/0x400
>  ? copy_msghdr_from_user+0xb5/0x110
>  ___sys_recvmsg+0xe9/0x230
>  ? find_held_lock+0x3b/0xb0
>  ? __handle_mm_fault+0x617/0x1180
>  ? __audit_syscall_entry+0xb3/0x110
>  ? __sys_recvmsg+0x39/0x70
>  __sys_recvmsg+0x39/0x70
>  do_syscall_64+0x63/0x120
>  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
> RIP: 0033:0x7f2f03a72030
> RSP: 002b:7fffab3de508 EFLAGS: 0246 ORIG_RAX: 002f
> RAX: ffda RBX: 7fffab3e641c RCX: 7f2f03a72030
> RDX:  RSI: 7fffab3de570 RDI: 0004
> RBP:  R08: 7e6c R09: 7fffab3e63a8
> R10: 7fffab3de5b0 R11: 0246 R12: 7fffab3e6608
> R13: 0066b460 R14: 7e6c R15: 
> Code: 85 d2 74 17 f6 40 2a 04 74 11 8b 53 2c 85 d2 0f 84 d7 00 00 00 83
> ea 01 89 53 2c c7 4
> RIP: fib6_walk_continue+0x5b/0x140 [ipv6] RSP: c90008c3bc10
> CR2: 0008
> ---[ end trace bd03458864eb266c ]---
> Kernel panic - not syncing: Fatal exception in interrupt
> Kernel Offset: disabled
> Rebooting in 10 seconds..
> ACPI MEMORY or I/O RESET_REG.
> 


Since you can reproduce, would you mind trying
https://github.com/dsahern/linux.git ipv6/fib6-change-v2

Hopefully these will be committed upstream soon. It changes the game a
bit with the FIB walker. Would be interesting to know if this problem
goes away.


Re: Repeatable inet6_dump_fib crash in stock 4.12.0-rc4+

2018-04-17 Thread Ben Greear

On 01/24/2018 03:59 PM, Ben Greear wrote:

On 06/20/2017 08:03 PM, David Ahern wrote:

On 6/20/17 5:41 PM, Ben Greear wrote:

On 06/20/2017 11:05 AM, Michal Kubecek wrote:

On Tue, Jun 20, 2017 at 07:12:27AM -0700, Ben Greear wrote:

On 06/14/2017 03:25 PM, David Ahern wrote:

On 6/14/17 4:23 PM, Ben Greear wrote:

On 06/13/2017 07:27 PM, David Ahern wrote:


Let's try a targeted debug patch. See attached


I had to change it to pr_err so it would go to our serial console
since the system locked hard on crash,
and that appears to be enough to change the timing where we can no
longer
reproduce the problem.



ok, let's figure out which one is doing that. There are 3 debug
statements. I suspect fib6_del_route is the one setting the state to
FWS_U. Can you remove the debug prints in fib6_repair_tree and
fib6_walk_continue and try again?


We cannot reproduce with just that one printf in the kernel either.  It
must change the timing too much to trigger the bug.


You might try trace_printk() which should have less impact (don't forget
to enable /proc/sys/kernel/ftrace_dump_on_oops).


We cannot reproduce with trace_printk() either.


I think that suggests the walker state is set to FWS_U in
fib6_del_route, and it is the FWS_U case in fib6_walk_continue that
triggers the fault -- the null parent (pn = fn->parent). So we have the
2 areas of code that are interacting.

I'm on a road trip through the end of this week with little time to
focus on this problem. I'll get back to you another suggestion when I can.


FYI, problem still happens in 4.16.  I'm going to re-enable my hack below
for this kernel as well...I had hopes it might be fixed...

BUG: unable to handle kernel NULL pointer dereference at 8
IP: fib6_walk_continue+0x5b/0x140 [ipv6]
PGD 8007dfc0c067 P4D 8007dfc0c067 PUD 7e66ff067 PMD 0
Oops:  [#1] PREEMPT SMP PTI
Modules linked in: nf_conntrack_netlink nf_conntrack nfnetlink nf_defrag_ipv4 
libcrc32c vrf]
CPU: 3 PID: 15117 Comm: ip Tainted: G   O 4.16.0+ #5
Hardware name: Iron_Systems,Inc CS-CAD-2U-A02/X10SRL-F, BIOS 2.0b 05/02/2017
RIP: 0010:fib6_walk_continue+0x5b/0x140 [ipv6]
RSP: 0018:c90008c3bc10 EFLAGS: 00010287
RAX: 88085ac45050 RBX: 8807e03008a0 RCX: 
RDX:  RSI: c90008c3bc48 RDI: 8232b240
RBP: 880819167600 R08: 0008 R09: 8807dff10071
R10: c90008c3bbd0 R11:  R12: 8807e03008a0
R13: 0002 R14: 8807e05744c8 R15: 8807e08ef000
FS:  7f2f04342700() GS:88087fcc() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 0008 CR3: 0007e0556002 CR4: 003606e0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400
Call Trace:
 inet6_dump_fib+0x14b/0x2c0 [ipv6]
 netlink_dump+0x216/0x2a0
 netlink_recvmsg+0x254/0x400
 ? copy_msghdr_from_user+0xb5/0x110
 ___sys_recvmsg+0xe9/0x230
 ? find_held_lock+0x3b/0xb0
 ? __handle_mm_fault+0x617/0x1180
 ? __audit_syscall_entry+0xb3/0x110
 ? __sys_recvmsg+0x39/0x70
 __sys_recvmsg+0x39/0x70
 do_syscall_64+0x63/0x120
 entry_SYSCALL_64_after_hwframe+0x3d/0xa2
RIP: 0033:0x7f2f03a72030
RSP: 002b:7fffab3de508 EFLAGS: 0246 ORIG_RAX: 002f
RAX: ffda RBX: 7fffab3e641c RCX: 7f2f03a72030
RDX:  RSI: 7fffab3de570 RDI: 0004
RBP:  R08: 7e6c R09: 7fffab3e63a8
R10: 7fffab3de5b0 R11: 0246 R12: 7fffab3e6608
R13: 0066b460 R14: 7e6c R15: 
Code: 85 d2 74 17 f6 40 2a 04 74 11 8b 53 2c 85 d2 0f 84 d7 00 00 00 83 ea 01 
89 53 2c c7 4
RIP: fib6_walk_continue+0x5b/0x140 [ipv6] RSP: c90008c3bc10
CR2: 0008
---[ end trace bd03458864eb266c ]---
Kernel panic - not syncing: Fatal exception in interrupt
Kernel Offset: disabled
Rebooting in 10 seconds..
ACPI MEMORY or I/O RESET_REG.



So, though I don't know the right way to fix it, the patch below appears
to make the system not crash.


diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index 68b9cc7..bf19a14 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -1614,6 +1614,12 @@ static int fib6_walk_continue(struct fib6_walker *w)
pn = fn->parent;
w->node = pn;
 #ifdef CONFIG_IPV6_SUBTREES
+   if (WARN_ON_ONCE(!pn)) {
+   pr_err("FWS-U, w: %p  fn: %p  pn: %p\n",
+  w, fn, pn);
+   /* Attempt to work around crash that has been 
here forever. --Ben */
+   return 0;
+   }
if (FIB6_SUBTREE(pn) == fn) {
WARN_ON(!(fn->fn_flags & RTN_ROOT));
w->state = FWS_L;



The printout looks like this (when adding 4000 mac-v

Re: Repeatable inet6_dump_fib crash in stock 4.12.0-rc4+

2018-01-24 Thread Ben Greear

On 06/20/2017 08:03 PM, David Ahern wrote:

On 6/20/17 5:41 PM, Ben Greear wrote:

On 06/20/2017 11:05 AM, Michal Kubecek wrote:

On Tue, Jun 20, 2017 at 07:12:27AM -0700, Ben Greear wrote:

On 06/14/2017 03:25 PM, David Ahern wrote:

On 6/14/17 4:23 PM, Ben Greear wrote:

On 06/13/2017 07:27 PM, David Ahern wrote:


Let's try a targeted debug patch. See attached


I had to change it to pr_err so it would go to our serial console
since the system locked hard on crash,
and that appears to be enough to change the timing where we can no
longer
reproduce the problem.



ok, let's figure out which one is doing that. There are 3 debug
statements. I suspect fib6_del_route is the one setting the state to
FWS_U. Can you remove the debug prints in fib6_repair_tree and
fib6_walk_continue and try again?


We cannot reproduce with just that one printf in the kernel either.  It
must change the timing too much to trigger the bug.


You might try trace_printk() which should have less impact (don't forget
to enable /proc/sys/kernel/ftrace_dump_on_oops).


We cannot reproduce with trace_printk() either.


I think that suggests the walker state is set to FWS_U in
fib6_del_route, and it is the FWS_U case in fib6_walk_continue that
triggers the fault -- the null parent (pn = fn->parent). So we have the
2 areas of code that are interacting.

I'm on a road trip through the end of this week with little time to
focus on this problem. I'll get back to you another suggestion when I can.


So, though I don't know the right way to fix it, the patch below appears
to make the system not crash.


diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index 68b9cc7..bf19a14 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -1614,6 +1614,12 @@ static int fib6_walk_continue(struct fib6_walker *w)
pn = fn->parent;
w->node = pn;
 #ifdef CONFIG_IPV6_SUBTREES
+   if (WARN_ON_ONCE(!pn)) {
+   pr_err("FWS-U, w: %p  fn: %p  pn: %p\n",
+  w, fn, pn);
+   /* Attempt to work around crash that has been 
here forever. --Ben */
+   return 0;
+   }
if (FIB6_SUBTREE(pn) == fn) {
WARN_ON(!(fn->fn_flags & RTN_ROOT));
w->state = FWS_L;



The printout looks like this (when adding 4000 mac-vlans, so it is pretty 
rare).  PN is definitely NULL sometimes:

[root@2u-6n ~]# journalctl -f|grep FWS
Jan 24 15:48:05 2u-6n kernel: IPv6: FWS-U, w: 8807ea121ba0  fn: 
880856a09260  pn:   (null)
Jan 24 15:51:15 2u-6n kernel: IPv6: FWS-U, w: 8807e3963de0  fn: 
880856a09260  pn:   (null)
Jan 24 15:51:15 2u-6n kernel: IPv6: FWS-U, w: 88081ac22de0  fn: 
880856a09260  pn:   (null)
Jan 24 15:53:13 2u-6n kernel: IPv6: FWS-U, w: 8808290c69c0  fn: 
8807e369f920  pn:   (null)
Jan 24 15:53:24 2u-6n kernel: IPv6: FWS-U, w: 8807ea3156c0  fn: 
88082d1eeb60  pn:   (null)



8066 Jan 24 15:48:04 2u-6n kernel: 8021q: adding VLAN 0 to HW filter on device 
eth2#1006
 8067 Jan 24 15:48:05 2u-6n kernel: [ cut here ]
 8068 Jan 24 15:48:05 2u-6n kernel: WARNING: CPU: 5 PID: 3346 at /home/greearb/git/linux-4.13.dev.y/net/ipv6/ip6_fib.c:1617 fib6_walk_continue+ 
0x154/0x1b0 [ipv6]
 8069 Jan 24 15:48:05 2u-6n kernel: Modules linked in: 8021q garp mrp stp llc fuse macvlan wanlink(O) pktgen ipmi_ssif coretemp intel_raplsb_edac 
x86_pkg_temp_thermal intel_powerclamp kvm_intel kvm ath9k irqbypass iTCO_wdt ath9k_common iTCO_vendor_support ath9k_hw ath  i2c_i801 mac80211 joydev 
lpc_ich cfg80211 ioatdma shpchp tpm_tis tpm_tis_core wmi tpm ipmi_si ipmi_devintf ipmi_msghandler acpi_pad acpi_power_meter nfsd auth_rpcgss nfs_acl 
sch_fq_codel lockd grace sunrpc ast drm_kms_helper ttm drm igb hwmon ptp pps_core dca i2c_algo_bit i2c_core ipv6 crc_ccitt

 8070 Jan 24 15:48:05 2u-6n kernel: CPU: 5 PID: 3346 Comm: ip Tainted: G
   O4.13.16+ #22
 8071 Jan 24 15:48:05 2u-6n kernel: Hardware name: Iron_Systems,Inc 
CS-CAD-2U-A02/X10SRL-F, BIOS 2.0b 05/02/2017
 8072 Jan 24 15:48:05 2u-6n kernel: task: 8807e9ef1dc0 task.stack: 
c9002083c000
 8073 Jan 24 15:48:05 2u-6n kernel: RIP: 0010:fib6_walk_continue+0x154/0x1b0 
[ipv6]
 8074 Jan 24 15:48:05 2u-6n kernel: RSP: 0018:c9002083fbc0 EFLAGS: 00010246
 8075 Jan 24 15:48:05 2u-6n kernel: RAX:  RBX: 8807ea121ba0 
RCX: 
 8076 Jan 24 15:48:05 2u-6n kernel: RDX: 880856a09260 RSI: c9002083fc00 
RDI: 81ef2140
 8077 Jan 24 15:48:05 2u-6n kernel: RBP: c9002083fbc8 R08: 0008 
R09: 8807e36f6b25
 8078 Jan 24 15:48:05 2u-6n kernel: R10: c9002083fb70 R11:  
R12: 0002
 8079 Jan 24 15:48:05 2u-6n kernel: R13

Re: Repeatable inet6_dump_fib crash in stock 4.12.0-rc4+

2017-06-25 Thread David Ahern
On 6/20/17 9:03 PM, David Ahern wrote:
> On 6/20/17 5:41 PM, Ben Greear wrote:
>> On 06/20/2017 11:05 AM, Michal Kubecek wrote:
>>> On Tue, Jun 20, 2017 at 07:12:27AM -0700, Ben Greear wrote:
 On 06/14/2017 03:25 PM, David Ahern wrote:
> On 6/14/17 4:23 PM, Ben Greear wrote:
>> On 06/13/2017 07:27 PM, David Ahern wrote:
>>
>>> Let's try a targeted debug patch. See attached
>>
>> I had to change it to pr_err so it would go to our serial console
>> since the system locked hard on crash,
>> and that appears to be enough to change the timing where we can no
>> longer
>> reproduce the problem.
>
>
> ok, let's figure out which one is doing that. There are 3 debug
> statements. I suspect fib6_del_route is the one setting the state to
> FWS_U. Can you remove the debug prints in fib6_repair_tree and
> fib6_walk_continue and try again?

 We cannot reproduce with just that one printf in the kernel either.  It
 must change the timing too much to trigger the bug.
>>>
>>> You might try trace_printk() which should have less impact (don't forget
>>> to enable /proc/sys/kernel/ftrace_dump_on_oops).
>>
>> We cannot reproduce with trace_printk() either.
> 
> I think that suggests the walker state is set to FWS_U in
> fib6_del_route, and it is the FWS_U case in fib6_walk_continue that
> triggers the fault -- the null parent (pn = fn->parent). So we have the
> 2 areas of code that are interacting.
> 
> I'm on a road trip through the end of this week with little time to
> focus on this problem. I'll get back to you another suggestion when I can.
> 

Remove all other changes and try the attached. The fault should not
happen so you need to watch dmesg / console output.
diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index deea901746c8..245941e9db8a 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -372,12 +372,13 @@ static int fib6_dump_table(struct fib6_table *table, 
struct sk_buff *skb,
 
read_lock_bh(&table->tb6_lock);
res = fib6_walk(net, w);
-   read_unlock_bh(&table->tb6_lock);
if (res > 0) {
cb->args[4] = 1;
cb->args[5] = w->root->fn_sernum;
}
+   read_unlock_bh(&table->tb6_lock);
} else {
+   read_lock_bh(&table->tb6_lock);
if (cb->args[5] != w->root->fn_sernum) {
/* Begin at the root if the tree changed */
cb->args[5] = w->root->fn_sernum;
@@ -387,7 +388,6 @@ static int fib6_dump_table(struct fib6_table *table, struct 
sk_buff *skb,
} else
w->skip = 0;
 
-   read_lock_bh(&table->tb6_lock);
res = fib6_walk_continue(w);
read_unlock_bh(&table->tb6_lock);
if (res <= 0) {
@@ -1422,7 +1422,6 @@ static void fib6_del_route(struct fib6_node *fn, struct 
rt6_info **rtp,
 
/* Unlink it */
*rtp = rt->dst.rt6_next;
-   rt->rt6i_node = NULL;
net->ipv6.rt6_stats->fib_rt_entries--;
net->ipv6.rt6_stats->fib_discarded_routes++;
 
@@ -1447,12 +1446,24 @@ static void fib6_del_route(struct fib6_node *fn, struct 
rt6_info **rtp,
if (w->state == FWS_C && w->leaf == rt) {
RT6_TRACE("walker %p adjusted by delroute\n", w);
w->leaf = rt->dst.rt6_next;
-   if (!w->leaf)
-   w->state = FWS_U;
+   if (!w->leaf) {
+   if (!w->node->parent) {
+pr_warn("fib6_del_route: setting walker node to null while deleting route: "
+   "dst %pI6c/%d src %pI6c/%d dev %s siblings %d\n",
+   &rt->rt6i_dst.addr, rt->rt6i_dst.plen, &rt->rt6i_src.addr, 
rt->rt6i_src.plen,
+   rt->dst.dev ? rt->dst.dev->name : "", rt->rt6i_nsiblings);
+
+if (rt->rt6i_node == w->node)
+   pr_warn("fib6_del_route: walker node matches deleted route\n");
+   w->node = NULL;
+   } else
+   w->state = FWS_U;
+   }
}
}
read_unlock(&net->ipv6.fib6_walker_lock);
 
+   rt->rt6i_node = NULL;
rt->dst.rt6_next = NULL;
 
/* If it was last route, expunge its radix tree node */


Re: Repeatable inet6_dump_fib crash in stock 4.12.0-rc4+

2017-06-20 Thread David Ahern
On 6/20/17 5:41 PM, Ben Greear wrote:
> On 06/20/2017 11:05 AM, Michal Kubecek wrote:
>> On Tue, Jun 20, 2017 at 07:12:27AM -0700, Ben Greear wrote:
>>> On 06/14/2017 03:25 PM, David Ahern wrote:
 On 6/14/17 4:23 PM, Ben Greear wrote:
> On 06/13/2017 07:27 PM, David Ahern wrote:
>
>> Let's try a targeted debug patch. See attached
>
> I had to change it to pr_err so it would go to our serial console
> since the system locked hard on crash,
> and that appears to be enough to change the timing where we can no
> longer
> reproduce the problem.


 ok, let's figure out which one is doing that. There are 3 debug
 statements. I suspect fib6_del_route is the one setting the state to
 FWS_U. Can you remove the debug prints in fib6_repair_tree and
 fib6_walk_continue and try again?
>>>
>>> We cannot reproduce with just that one printf in the kernel either.  It
>>> must change the timing too much to trigger the bug.
>>
>> You might try trace_printk() which should have less impact (don't forget
>> to enable /proc/sys/kernel/ftrace_dump_on_oops).
> 
> We cannot reproduce with trace_printk() either.

I think that suggests the walker state is set to FWS_U in
fib6_del_route, and it is the FWS_U case in fib6_walk_continue that
triggers the fault -- the null parent (pn = fn->parent). So we have the
2 areas of code that are interacting.

I'm on a road trip through the end of this week with little time to
focus on this problem. I'll get back to you another suggestion when I can.


Re: Repeatable inet6_dump_fib crash in stock 4.12.0-rc4+

2017-06-20 Thread Ben Greear

On 06/20/2017 11:05 AM, Michal Kubecek wrote:

On Tue, Jun 20, 2017 at 07:12:27AM -0700, Ben Greear wrote:

On 06/14/2017 03:25 PM, David Ahern wrote:

On 6/14/17 4:23 PM, Ben Greear wrote:

On 06/13/2017 07:27 PM, David Ahern wrote:


Let's try a targeted debug patch. See attached


I had to change it to pr_err so it would go to our serial console
since the system locked hard on crash,
and that appears to be enough to change the timing where we can no longer
reproduce the problem.



ok, let's figure out which one is doing that. There are 3 debug
statements. I suspect fib6_del_route is the one setting the state to
FWS_U. Can you remove the debug prints in fib6_repair_tree and
fib6_walk_continue and try again?


We cannot reproduce with just that one printf in the kernel either.  It
must change the timing too much to trigger the bug.


You might try trace_printk() which should have less impact (don't forget
to enable /proc/sys/kernel/ftrace_dump_on_oops).


We cannot reproduce with trace_printk() either.

Thanks,
Ben



Michal Kubecek




--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com



Re: Repeatable inet6_dump_fib crash in stock 4.12.0-rc4+

2017-06-20 Thread Michal Kubecek
On Tue, Jun 20, 2017 at 07:12:27AM -0700, Ben Greear wrote:
> On 06/14/2017 03:25 PM, David Ahern wrote:
> >On 6/14/17 4:23 PM, Ben Greear wrote:
> >>On 06/13/2017 07:27 PM, David Ahern wrote:
> >>
> >>>Let's try a targeted debug patch. See attached
> >>
> >>I had to change it to pr_err so it would go to our serial console
> >>since the system locked hard on crash,
> >>and that appears to be enough to change the timing where we can no longer
> >>reproduce the problem.
> >
> >
> >ok, let's figure out which one is doing that. There are 3 debug
> >statements. I suspect fib6_del_route is the one setting the state to
> >FWS_U. Can you remove the debug prints in fib6_repair_tree and
> >fib6_walk_continue and try again?
> 
> We cannot reproduce with just that one printf in the kernel either.  It
> must change the timing too much to trigger the bug.

You might try trace_printk() which should have less impact (don't forget
to enable /proc/sys/kernel/ftrace_dump_on_oops).

Michal Kubecek



Re: Repeatable inet6_dump_fib crash in stock 4.12.0-rc4+

2017-06-20 Thread Ben Greear



On 06/14/2017 03:25 PM, David Ahern wrote:

On 6/14/17 4:23 PM, Ben Greear wrote:

On 06/13/2017 07:27 PM, David Ahern wrote:


Let's try a targeted debug patch. See attached


I had to change it to pr_err so it would go to our serial console
since the system locked hard on crash,
and that appears to be enough to change the timing where we can no longer
reproduce the problem.



ok, let's figure out which one is doing that. There are 3 debug
statements. I suspect fib6_del_route is the one setting the state to
FWS_U. Can you remove the debug prints in fib6_repair_tree and
fib6_walk_continue and try again?


We cannot reproduce with just that one printf in the kernel either.  It
must change the timing too much to trigger the bug.

Thanks,
Ben

--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com


Re: Repeatable inet6_dump_fib crash in stock 4.12.0-rc4+

2017-06-14 Thread David Ahern
On 6/14/17 4:23 PM, Ben Greear wrote:
> On 06/13/2017 07:27 PM, David Ahern wrote:
> 
>> Let's try a targeted debug patch. See attached
> 
> I had to change it to pr_err so it would go to our serial console
> since the system locked hard on crash,
> and that appears to be enough to change the timing where we can no longer
> reproduce the problem.


ok, let's figure out which one is doing that. There are 3 debug
statements. I suspect fib6_del_route is the one setting the state to
FWS_U. Can you remove the debug prints in fib6_repair_tree and
fib6_walk_continue and try again?


Re: Repeatable inet6_dump_fib crash in stock 4.12.0-rc4+

2017-06-14 Thread Ben Greear

On 06/13/2017 07:27 PM, David Ahern wrote:


Let's try a targeted debug patch. See attached


I had to change it to pr_err so it would go to our serial console
since the system locked hard on crash,
and that appears to be enough to change the timing where we can no longer
reproduce the problem.

Thanks,
Ben

--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com



Re: Repeatable inet6_dump_fib crash in stock 4.12.0-rc4+

2017-06-13 Thread David Ahern
On 6/13/17 3:42 PM, Cong Wang wrote:
> On Tue, Jun 13, 2017 at 1:16 PM, Ben Greear  wrote:
>> On 06/09/2017 02:25 PM, Eric Dumazet wrote:
>>>
>>> On Fri, 2017-06-09 at 07:27 -0600, David Ahern wrote:

 On 6/8/17 11:55 PM, Cong Wang wrote:
> Apparently fn->parent is NULL here for some reason, but
> I don't know if that is expected or not. If a simple NULL check
> is not enough here, we have to trace why it is NULL.


 From my understanding, parent should not be null hence the attempts to
 fix access to table nodes under a lock. ie., figuring out why it is null
 here.
>>
>>
>> If someone has more suggestions, I'll be happy to test.
> 
> Can you enable RT6_TRACE() by changing RT6_DEBUG
> from 2 to 3? We may collect some useful log with it.
> 
> diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
> index d4bf2c6..1941595 100644
> --- a/net/ipv6/ip6_fib.c
> +++ b/net/ipv6/ip6_fib.c
> @@ -37,7 +37,7 @@
>  #include 
>  #include 
> 
> -#define RT6_DEBUG 2
> +#define RT6_DEBUG 3
> 
>  #if RT6_DEBUG >= 3
>  #define RT6_TRACE(x...) pr_debug(x)
> 

Let's try a targeted debug patch. See attached
diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index deea901746c8..367f1284f05b 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -1396,6 +1396,8 @@ static struct fib6_node *fib6_repair_tree(struct net *net,
RT6_TRACE("W %p adjusted by 
delnode 2, s=%d\n", w, w->state);
w->state = w->state >= FWS_C ? 
FWS_U : FWS_INIT;
}
+   if (w->state == FWS_U)
+   pr_warn("fib6_repair_tree: W %p 
adjusted by delnode 2, state FWS_U\n", w, w->state);
}
}
}
@@ -1447,8 +1449,10 @@ static void fib6_del_route(struct fib6_node *fn, struct 
rt6_info **rtp,
if (w->state == FWS_C && w->leaf == rt) {
RT6_TRACE("walker %p adjusted by delroute\n", w);
w->leaf = rt->dst.rt6_next;
-   if (!w->leaf)
+   if (!w->leaf) {
+   pr_warn("fib6_del_route: walker %p adjusted by 
delroute - state FWS_U\n", w);
w->state = FWS_U;
+   }
}
}
read_unlock(&net->ipv6.fib6_walker_lock);
@@ -1591,6 +1595,7 @@ static int fib6_walk_continue(struct fib6_walker *w)
continue;
}
 skip:
+   pr_warn("fib6_walk_continue: set state to FWS_U\n");
w->state = FWS_U;
case FWS_U:
if (fn == w->root)


Re: Repeatable inet6_dump_fib crash in stock 4.12.0-rc4+

2017-06-13 Thread Cong Wang
On Tue, Jun 13, 2017 at 1:16 PM, Ben Greear  wrote:
> On 06/09/2017 02:25 PM, Eric Dumazet wrote:
>>
>> On Fri, 2017-06-09 at 07:27 -0600, David Ahern wrote:
>>>
>>> On 6/8/17 11:55 PM, Cong Wang wrote:
 Apparently fn->parent is NULL here for some reason, but
 I don't know if that is expected or not. If a simple NULL check
 is not enough here, we have to trace why it is NULL.
>>>
>>>
>>> From my understanding, parent should not be null hence the attempts to
>>> fix access to table nodes under a lock. ie., figuring out why it is null
>>> here.
>
>
> If someone has more suggestions, I'll be happy to test.

Can you enable RT6_TRACE() by changing RT6_DEBUG
from 2 to 3? We may collect some useful log with it.

diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index d4bf2c6..1941595 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -37,7 +37,7 @@
 #include 
 #include 

-#define RT6_DEBUG 2
+#define RT6_DEBUG 3

 #if RT6_DEBUG >= 3
 #define RT6_TRACE(x...) pr_debug(x)


Re: Repeatable inet6_dump_fib crash in stock 4.12.0-rc4+

2017-06-13 Thread Ben Greear

On 06/13/2017 01:28 PM, David Ahern wrote:

On 6/13/17 2:16 PM, Ben Greear wrote:

On 06/09/2017 02:25 PM, Eric Dumazet wrote:

On Fri, 2017-06-09 at 07:27 -0600, David Ahern wrote:

On 6/8/17 11:55 PM, Cong Wang wrote:

On Thu, Jun 8, 2017 at 2:27 PM, Ben Greear 
wrote:


As far as I can tell, the patch did not help, or at least we still
reproduce
the
crash easily.


netlink dump is serialized by nlk->cb_mutex so I don't think that
patch makes any sense w.r.t race condition.


From what I can see fn_sernum should be accessed under table lock, so
when saving and checking it during a walk make sure it the lock is held.
That has nothing to do with the netlink dump, but the table changing
during a walk.



Yes, your patch makes total sense, of course.


I guess someone should go ahead and make an official patch and
submit it, even if it doesn't fix my problem.


I can do that; was hoping to root cause the problem first.





(gdb) l *(fib6_walk_continue+0x76)
0x188c6 is in fib6_walk_continue
(/home/greearb/git/linux-2.6/net/ipv6/ip6_fib.c:1593).
1588if (fn == w->root)
1589return 0;
1590pn = fn->parent;
1591w->node = pn;
1592#ifdef CONFIG_IPV6_SUBTREES
1593if (FIB6_SUBTREE(pn) == fn) {


Apparently fn->parent is NULL here for some reason, but
I don't know if that is expected or not. If a simple NULL check
is not enough here, we have to trace why it is NULL.


From my understanding, parent should not be null hence the attempts to
fix access to table nodes under a lock. ie., figuring out why it is null
here.


If someone has more suggestions, I'll be happy to test.


I have looked at the code again and nothing is jumping out. Will look
again later today.



I noticed there is some code to help fix up the walkers when nodes are deleted. 
 They
use lock:   read_lock(&net->ipv6.fib6_walker_lock);

The code you were tweaking uses a different lock:  
read_lock_bh(&table->tb6_lock);

In is certainly not simple code, so I don't know if that is correct or not, but
might possibly be a place to start looking.

I'm going to re-test with a WARN_ON to see if that triggers since previous 
suggestion
was that f->parent was NULL.


diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index 51cd637..86295df 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -1571,6 +1571,10 @@ static int fib6_walk_continue(struct fib6_walker *w)
case FWS_U:
if (fn == w->root)
return 0;
+   if (!fn->parent) {
+   WARN_ON_ONCE(0);
+   return 0;
+   }
pn = fn->parent;
w->node = pn;
 #ifdef CONFIG_IPV6_SUBTREES


Thanks,
Ben

Ben Greear 
Candela Technologies Inc  http://www.candelatech.com



Re: Repeatable inet6_dump_fib crash in stock 4.12.0-rc4+

2017-06-13 Thread David Ahern
On 6/13/17 2:16 PM, Ben Greear wrote:
> On 06/09/2017 02:25 PM, Eric Dumazet wrote:
>> On Fri, 2017-06-09 at 07:27 -0600, David Ahern wrote:
>>> On 6/8/17 11:55 PM, Cong Wang wrote:
 On Thu, Jun 8, 2017 at 2:27 PM, Ben Greear 
 wrote:
>
> As far as I can tell, the patch did not help, or at least we still
> reproduce
> the
> crash easily.

 netlink dump is serialized by nlk->cb_mutex so I don't think that
 patch makes any sense w.r.t race condition.
>>>
>>> From what I can see fn_sernum should be accessed under table lock, so
>>> when saving and checking it during a walk make sure it the lock is held.
>>> That has nothing to do with the netlink dump, but the table changing
>>> during a walk.
>>
>>
>> Yes, your patch makes total sense, of course.
> 
> I guess someone should go ahead and make an official patch and
> submit it, even if it doesn't fix my problem.

I can do that; was hoping to root cause the problem first.


> 
> (gdb) l *(fib6_walk_continue+0x76)
> 0x188c6 is in fib6_walk_continue
> (/home/greearb/git/linux-2.6/net/ipv6/ip6_fib.c:1593).
> 1588if (fn == w->root)
> 1589return 0;
> 1590pn = fn->parent;
> 1591w->node = pn;
> 1592#ifdef CONFIG_IPV6_SUBTREES
> 1593if (FIB6_SUBTREE(pn) == fn) {

 Apparently fn->parent is NULL here for some reason, but
 I don't know if that is expected or not. If a simple NULL check
 is not enough here, we have to trace why it is NULL.
>>>
>>> From my understanding, parent should not be null hence the attempts to
>>> fix access to table nodes under a lock. ie., figuring out why it is null
>>> here.
> 
> If someone has more suggestions, I'll be happy to test.

I have looked at the code again and nothing is jumping out. Will look
again later today.


Re: Repeatable inet6_dump_fib crash in stock 4.12.0-rc4+

2017-06-13 Thread Ben Greear

On 06/09/2017 02:25 PM, Eric Dumazet wrote:

On Fri, 2017-06-09 at 07:27 -0600, David Ahern wrote:

On 6/8/17 11:55 PM, Cong Wang wrote:

On Thu, Jun 8, 2017 at 2:27 PM, Ben Greear  wrote:


As far as I can tell, the patch did not help, or at least we still reproduce
the
crash easily.


netlink dump is serialized by nlk->cb_mutex so I don't think that
patch makes any sense w.r.t race condition.


From what I can see fn_sernum should be accessed under table lock, so
when saving and checking it during a walk make sure it the lock is held.
That has nothing to do with the netlink dump, but the table changing
during a walk.



Yes, your patch makes total sense, of course.


I guess someone should go ahead and make an official patch and
submit it, even if it doesn't fix my problem.


(gdb) l *(fib6_walk_continue+0x76)
0x188c6 is in fib6_walk_continue
(/home/greearb/git/linux-2.6/net/ipv6/ip6_fib.c:1593).
1588if (fn == w->root)
1589return 0;
1590pn = fn->parent;
1591w->node = pn;
1592#ifdef CONFIG_IPV6_SUBTREES
1593if (FIB6_SUBTREE(pn) == fn) {


Apparently fn->parent is NULL here for some reason, but
I don't know if that is expected or not. If a simple NULL check
is not enough here, we have to trace why it is NULL.


From my understanding, parent should not be null hence the attempts to
fix access to table nodes under a lock. ie., figuring out why it is null
here.


If someone has more suggestions, I'll be happy to test.

Thanks,
Ben


--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com



Re: Repeatable inet6_dump_fib crash in stock 4.12.0-rc4+

2017-06-09 Thread Eric Dumazet
On Fri, 2017-06-09 at 07:27 -0600, David Ahern wrote:
> On 6/8/17 11:55 PM, Cong Wang wrote:
> > On Thu, Jun 8, 2017 at 2:27 PM, Ben Greear  wrote:
> >>
> >> As far as I can tell, the patch did not help, or at least we still 
> >> reproduce
> >> the
> >> crash easily.
> > 
> > netlink dump is serialized by nlk->cb_mutex so I don't think that
> > patch makes any sense w.r.t race condition.
> 
> From what I can see fn_sernum should be accessed under table lock, so
> when saving and checking it during a walk make sure it the lock is held.
> That has nothing to do with the netlink dump, but the table changing
> during a walk.


Yes, your patch makes total sense, of course.




> 
> 
> >> (gdb) l *(fib6_walk_continue+0x76)
> >> 0x188c6 is in fib6_walk_continue
> >> (/home/greearb/git/linux-2.6/net/ipv6/ip6_fib.c:1593).
> >> 1588if (fn == w->root)
> >> 1589return 0;
> >> 1590pn = fn->parent;
> >> 1591w->node = pn;
> >> 1592#ifdef CONFIG_IPV6_SUBTREES
> >> 1593if (FIB6_SUBTREE(pn) == fn) {
> > 
> > Apparently fn->parent is NULL here for some reason, but
> > I don't know if that is expected or not. If a simple NULL check
> > is not enough here, we have to trace why it is NULL.
> 
> From my understanding, parent should not be null hence the attempts to
> fix access to table nodes under a lock. ie., figuring out why it is null
> here.




Re: Repeatable inet6_dump_fib crash in stock 4.12.0-rc4+

2017-06-09 Thread David Ahern
On 6/8/17 11:55 PM, Cong Wang wrote:
> On Thu, Jun 8, 2017 at 2:27 PM, Ben Greear  wrote:
>>
>> As far as I can tell, the patch did not help, or at least we still reproduce
>> the
>> crash easily.
> 
> netlink dump is serialized by nlk->cb_mutex so I don't think that
> patch makes any sense w.r.t race condition.

>From what I can see fn_sernum should be accessed under table lock, so
when saving and checking it during a walk make sure it the lock is held.
That has nothing to do with the netlink dump, but the table changing
during a walk.



>> (gdb) l *(fib6_walk_continue+0x76)
>> 0x188c6 is in fib6_walk_continue
>> (/home/greearb/git/linux-2.6/net/ipv6/ip6_fib.c:1593).
>> 1588if (fn == w->root)
>> 1589return 0;
>> 1590pn = fn->parent;
>> 1591w->node = pn;
>> 1592#ifdef CONFIG_IPV6_SUBTREES
>> 1593if (FIB6_SUBTREE(pn) == fn) {
> 
> Apparently fn->parent is NULL here for some reason, but
> I don't know if that is expected or not. If a simple NULL check
> is not enough here, we have to trace why it is NULL.

>From my understanding, parent should not be null hence the attempts to
fix access to table nodes under a lock. ie., figuring out why it is null
here.


Re: Repeatable inet6_dump_fib crash in stock 4.12.0-rc4+

2017-06-08 Thread Cong Wang
On Thu, Jun 8, 2017 at 2:27 PM, Ben Greear  wrote:
>
> As far as I can tell, the patch did not help, or at least we still reproduce
> the
> crash easily.

netlink dump is serialized by nlk->cb_mutex so I don't think that
patch makes any sense w.r.t race condition.


> (gdb) l *(fib6_walk_continue+0x76)
> 0x188c6 is in fib6_walk_continue
> (/home/greearb/git/linux-2.6/net/ipv6/ip6_fib.c:1593).
> 1588if (fn == w->root)
> 1589return 0;
> 1590pn = fn->parent;
> 1591w->node = pn;
> 1592#ifdef CONFIG_IPV6_SUBTREES
> 1593if (FIB6_SUBTREE(pn) == fn) {

Apparently fn->parent is NULL here for some reason, but
I don't know if that is expected or not. If a simple NULL check
is not enough here, we have to trace why it is NULL.


> 1594WARN_ON(!(fn->fn_flags & RTN_ROOT));
> 1595w->state = FWS_L;
> 1596continue;
> 1597}
> (gdb) l *(inet6_dump_fib+0x1ab)
> 0x1939b is in inet6_dump_fib
> (/home/greearb/git/linux-2.6/net/ipv6/ip6_fib.c:392).
> 387 w->skip = w->count;
> 388 } else
> 389 w->skip = 0;
> 390
> 391 res = fib6_walk_continue(w);
> 392 read_unlock_bh(&table->tb6_lock);
> 393 if (res <= 0) {
> 394 fib6_walker_unlink(net, w);
> 395 cb->args[4] = 0;
> 396 }
> (gdb)


Thanks.


Re: Repeatable inet6_dump_fib crash in stock 4.12.0-rc4+

2017-06-08 Thread Ben Greear

On 06/06/2017 09:19 PM, Eric Dumazet wrote:

On Tue, 2017-06-06 at 18:34 -0600, David Ahern wrote:

On 6/6/17 6:27 PM, Eric Dumazet wrote:

Good catch, but it looks like similar fix is needed a few lines before.

diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index 
deea901746c8570c5e801e40592c91e3b62812e0..b214443dc8346cef3690df7f27cc48a864028865
 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -372,12 +372,13 @@ static int fib6_dump_table(struct fib6_table *table, 
struct sk_buff *skb,

read_lock_bh(&table->tb6_lock);
res = fib6_walk(net, w);
-   read_unlock_bh(&table->tb6_lock);
if (res > 0) {
cb->args[4] = 1;
cb->args[5] = w->root->fn_sernum;
}
+   read_unlock_bh(&table->tb6_lock);


indeed. tunnel vision on Ben's problem


BTW, bug was already Ben's problem when Patrick tried to fix it
in commit 2bec5a369ee79 ("ipv6: fib: fix crash when changing large fib
while dumping it")  seven years ago ;)


As far as I can tell, the patch did not help, or at least we still reproduce the
crash easily.

ct524-ffb0 login: BUG: unable to handle kernel NULL pointer dereference at 
0018
IP: fib6_walk_continue+0x76/0x180 [ipv6]
PGD 3ec59a067
P4D 3ec59a067
PUD 3eb939067
PMD 0

Oops:  [#1] PREEMPT SMP
Modules linked in: nf_conntrack_netlink nf_conntrack nfnetlink nf_defrag_ipv4 libcrc32c bridge stp llc veth bnep fuse macvlan pktgen cfg80211 ipmi_ssif iTCO_wdt 
iTCO_vendor_support coretemp intel_rapl x86_pkg_temp_thermal intel_powerclamp kvm_intel kvm irqbypass joydev i2c_i801 ie31200_edac intel_pch_thermal shpchp 
hci_uart ipmi_si btbcm btqca ipmi_devintf btintel ipmi_msghandler pinctrl_sunrisepoint bluetooth intel_lpss_acpi acpi_als video pinctrl_intel intel_lpss 
kfifo_buf tpm_tis tpm_tis_core industrialio acpi_power_meter tpm acpi_pad sch_fq_codel nfsd auth_rpcgss nfs_acl lockd grace sunrpc ast drm_kms_helper ttm drm 
igb hwmon ptp pps_core dca i2c_algo_bit i2c_hid i2c_core ipv6 crc_ccitt [last unloaded: nfnetlink]

CPU: 3 PID: 2185 Comm: ip Not tainted 4.12.0-rc4+ #32
Hardware name: Supermicro Super Server/X11SSM-F, BIOS 1.0b 12/29/2015
task: 8803e87fd940 task.stack: c90009ae8000
RIP: 0010:fib6_walk_continue+0x76/0x180 [ipv6]
RSP: 0018:c90009aebbc0 EFLAGS: 00010287
RAX: 880460df8ca0 RBX: 8803f20a2c60 RCX: 
RDX:  RSI: c90009aebc00 RDI: 81eee280
RBP: c90009aebbc8 R08: 0008 R09: 8803e87b47cd
R10: c90009aebb70 R11:  R12: 0001
R13: 0001 R14: 8803f20a2c60 R15: 8803ec601f80
FS:  7f43520ee700() GS:8804778c() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 0018 CR3: 0003ebb46000 CR4: 003406e0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400
Call Trace:
 inet6_dump_fib+0x1ab/0x2a0 [ipv6]
 netlink_dump+0x11d/0x290
 netlink_recvmsg+0x260/0x3f0
 sock_recvmsg+0x38/0x40
 ___sys_recvmsg+0xe9/0x230
 ? alloc_pages_vma+0x9d/0x260
 ? page_add_new_anon_rmap+0x88/0xc0
 ? lru_cache_add_active_or_unevictable+0x31/0xb0
 ? __handle_mm_fault+0xce3/0xf70
 __sys_recvmsg+0x3d/0x70
 ? __sys_recvmsg+0x3d/0x70
 SyS_recvmsg+0xd/0x20
 do_syscall_64+0x56/0xc0
 entry_SYSCALL64_slow_path+0x25/0x25
RIP: 0033:0x7f4351a23050
RSP: 002b:7ffdb1bfafb8 EFLAGS: 0246 ORIG_RAX: 002f
RAX: ffda RBX:  RCX: 7f4351a23050
RDX:  RSI: 7ffdb1bfb020 RDI: 0004
RBP: 7ffdb1bff044 R08: 3fe4 R09: 
R10: 7ffdb1bfb060 R11: 0246 R12: 0064f360
R13: 7ffdb1bff0b0 R14: 3fe4 R15: 
Code: f6 40 2a 04 74 11 8b 53 30 85 d2 0f 84 02 01 00 00 83 ea 01 89 53 30 c7 43 28 04 00 00 00 48 39 43 10 74 33 48 8b 10 48 89 53 18 <48> 39 42 18 0f 84 a3 00 
00 00 48 39 42 08 0f 84 ae 00 00 00 48

RIP: fib6_walk_continue+0x76/0x180 [ipv6] RSP: c90009aebbc0
CR2: 0018
---[ end trace 06ac9dee8b14db6b ]---
Kernel panic - not syncing: Fatal exception in interrupt
Kernel Offset: disabled



(gdb) l *(fib6_walk_continue+0x76)
0x188c6 is in fib6_walk_continue 
(/home/greearb/git/linux-2.6/net/ipv6/ip6_fib.c:1593).
1588if (fn == w->root)
1589return 0;
1590pn = fn->parent;
1591w->node = pn;
1592#ifdef CONFIG_IPV6_SUBTREES
1593if (FIB6_SUBTREE(pn) == fn) {
1594WARN_ON(!(fn->fn_flags & RTN_ROOT));
1595w->state = FWS_L;
1596continue;
1597}
(gdb) l *(inet6_dump_fib+0x1ab)
0x1939b is in inet6_dump_fib

Re: Repeatable inet6_dump_fib crash in stock 4.12.0-rc4+

2017-06-06 Thread Eric Dumazet
On Tue, 2017-06-06 at 18:34 -0600, David Ahern wrote:
> On 6/6/17 6:27 PM, Eric Dumazet wrote:
> > Good catch, but it looks like similar fix is needed a few lines before.
> > 
> > diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
> > index 
> > deea901746c8570c5e801e40592c91e3b62812e0..b214443dc8346cef3690df7f27cc48a864028865
> >  100644
> > --- a/net/ipv6/ip6_fib.c
> > +++ b/net/ipv6/ip6_fib.c
> > @@ -372,12 +372,13 @@ static int fib6_dump_table(struct fib6_table *table, 
> > struct sk_buff *skb,
> >  
> > read_lock_bh(&table->tb6_lock);
> > res = fib6_walk(net, w);
> > -   read_unlock_bh(&table->tb6_lock);
> > if (res > 0) {
> > cb->args[4] = 1;
> > cb->args[5] = w->root->fn_sernum;
> > }
> > +   read_unlock_bh(&table->tb6_lock);
> 
> indeed. tunnel vision on Ben's problem

BTW, bug was already Ben's problem when Patrick tried to fix it
in commit 2bec5a369ee79 ("ipv6: fib: fix crash when changing large fib
while dumping it")  seven years ago ;)





Re: Repeatable inet6_dump_fib crash in stock 4.12.0-rc4+

2017-06-06 Thread Ben Greear

On 06/06/2017 05:27 PM, Eric Dumazet wrote:

On Tue, 2017-06-06 at 18:00 -0600, David Ahern wrote:

On 6/6/17 3:06 PM, Ben Greear wrote:

This bug has been around forever, and we recently got an intern and
stuck him with
trying to reproduce it on the latest kernel.  It is still here.  I'm not
super excited
about trying to fix this, but we can easily test patches if someone has a
patch to try.


Can you try this (whitespace damaged on paste, but it is moving the lock
ahead of the fn_sernum check):

diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index deea901746c8..7a44c49055c0 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -378,6 +378,7 @@ static int fib6_dump_table(struct fib6_table *table,
struct sk_buff *skb,
cb->args[5] = w->root->fn_sernum;
}
} else {
+   read_lock_bh(&table->tb6_lock);
if (cb->args[5] != w->root->fn_sernum) {
/* Begin at the root if the tree changed */
cb->args[5] = w->root->fn_sernum;
@@ -387,7 +388,6 @@ static int fib6_dump_table(struct fib6_table *table,
struct sk_buff *skb,
} else
w->skip = 0;

-   read_lock_bh(&table->tb6_lock);
res = fib6_walk_continue(w);
read_unlock_bh(&table->tb6_lock);
if (res <= 0) {



Good catch, but it looks like similar fix is needed a few lines before.


We will test this tomorrow.

Thanks,
Ben




diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index 
deea901746c8570c5e801e40592c91e3b62812e0..b214443dc8346cef3690df7f27cc48a864028865
 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -372,12 +372,13 @@ static int fib6_dump_table(struct fib6_table *table, 
struct sk_buff *skb,

read_lock_bh(&table->tb6_lock);
res = fib6_walk(net, w);
-   read_unlock_bh(&table->tb6_lock);
if (res > 0) {
cb->args[4] = 1;
cb->args[5] = w->root->fn_sernum;
}
+   read_unlock_bh(&table->tb6_lock);
} else {
+   read_lock_bh(&table->tb6_lock);
if (cb->args[5] != w->root->fn_sernum) {
/* Begin at the root if the tree changed */
cb->args[5] = w->root->fn_sernum;
@@ -387,7 +388,6 @@ static int fib6_dump_table(struct fib6_table *table, struct 
sk_buff *skb,
} else
w->skip = 0;

-   read_lock_bh(&table->tb6_lock);
res = fib6_walk_continue(w);
read_unlock_bh(&table->tb6_lock);
if (res <= 0) {





--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com



Re: Repeatable inet6_dump_fib crash in stock 4.12.0-rc4+

2017-06-06 Thread David Ahern
On 6/6/17 6:27 PM, Eric Dumazet wrote:
> Good catch, but it looks like similar fix is needed a few lines before.
> 
> diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
> index 
> deea901746c8570c5e801e40592c91e3b62812e0..b214443dc8346cef3690df7f27cc48a864028865
>  100644
> --- a/net/ipv6/ip6_fib.c
> +++ b/net/ipv6/ip6_fib.c
> @@ -372,12 +372,13 @@ static int fib6_dump_table(struct fib6_table *table, 
> struct sk_buff *skb,
>  
>   read_lock_bh(&table->tb6_lock);
>   res = fib6_walk(net, w);
> - read_unlock_bh(&table->tb6_lock);
>   if (res > 0) {
>   cb->args[4] = 1;
>   cb->args[5] = w->root->fn_sernum;
>   }
> + read_unlock_bh(&table->tb6_lock);

indeed. tunnel vision on Ben's problem


Re: Repeatable inet6_dump_fib crash in stock 4.12.0-rc4+

2017-06-06 Thread Eric Dumazet
On Tue, 2017-06-06 at 18:00 -0600, David Ahern wrote:
> On 6/6/17 3:06 PM, Ben Greear wrote:
> > This bug has been around forever, and we recently got an intern and
> > stuck him with
> > trying to reproduce it on the latest kernel.  It is still here.  I'm not
> > super excited
> > about trying to fix this, but we can easily test patches if someone has a
> > patch to try.
> 
> Can you try this (whitespace damaged on paste, but it is moving the lock
> ahead of the fn_sernum check):
> 
> diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
> index deea901746c8..7a44c49055c0 100644
> --- a/net/ipv6/ip6_fib.c
> +++ b/net/ipv6/ip6_fib.c
> @@ -378,6 +378,7 @@ static int fib6_dump_table(struct fib6_table *table,
> struct sk_buff *skb,
> cb->args[5] = w->root->fn_sernum;
> }
> } else {
> +   read_lock_bh(&table->tb6_lock);
> if (cb->args[5] != w->root->fn_sernum) {
> /* Begin at the root if the tree changed */
> cb->args[5] = w->root->fn_sernum;
> @@ -387,7 +388,6 @@ static int fib6_dump_table(struct fib6_table *table,
> struct sk_buff *skb,
> } else
> w->skip = 0;
> 
> -   read_lock_bh(&table->tb6_lock);
> res = fib6_walk_continue(w);
> read_unlock_bh(&table->tb6_lock);
> if (res <= 0) {


Good catch, but it looks like similar fix is needed a few lines before.

diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index 
deea901746c8570c5e801e40592c91e3b62812e0..b214443dc8346cef3690df7f27cc48a864028865
 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -372,12 +372,13 @@ static int fib6_dump_table(struct fib6_table *table, 
struct sk_buff *skb,
 
read_lock_bh(&table->tb6_lock);
res = fib6_walk(net, w);
-   read_unlock_bh(&table->tb6_lock);
if (res > 0) {
cb->args[4] = 1;
cb->args[5] = w->root->fn_sernum;
}
+   read_unlock_bh(&table->tb6_lock);
} else {
+   read_lock_bh(&table->tb6_lock);
if (cb->args[5] != w->root->fn_sernum) {
/* Begin at the root if the tree changed */
cb->args[5] = w->root->fn_sernum;
@@ -387,7 +388,6 @@ static int fib6_dump_table(struct fib6_table *table, struct 
sk_buff *skb,
} else
w->skip = 0;
 
-   read_lock_bh(&table->tb6_lock);
res = fib6_walk_continue(w);
read_unlock_bh(&table->tb6_lock);
if (res <= 0) {




Re: Repeatable inet6_dump_fib crash in stock 4.12.0-rc4+

2017-06-06 Thread David Ahern
On 6/6/17 3:06 PM, Ben Greear wrote:
> This bug has been around forever, and we recently got an intern and
> stuck him with
> trying to reproduce it on the latest kernel.  It is still here.  I'm not
> super excited
> about trying to fix this, but we can easily test patches if someone has a
> patch to try.

Can you try this (whitespace damaged on paste, but it is moving the lock
ahead of the fn_sernum check):

diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index deea901746c8..7a44c49055c0 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -378,6 +378,7 @@ static int fib6_dump_table(struct fib6_table *table,
struct sk_buff *skb,
cb->args[5] = w->root->fn_sernum;
}
} else {
+   read_lock_bh(&table->tb6_lock);
if (cb->args[5] != w->root->fn_sernum) {
/* Begin at the root if the tree changed */
cb->args[5] = w->root->fn_sernum;
@@ -387,7 +388,6 @@ static int fib6_dump_table(struct fib6_table *table,
struct sk_buff *skb,
} else
w->skip = 0;

-   read_lock_bh(&table->tb6_lock);
res = fib6_walk_continue(w);
read_unlock_bh(&table->tb6_lock);
if (res <= 0) {


Repeatable inet6_dump_fib crash in stock 4.12.0-rc4+

2017-06-06 Thread Ben Greear

Hello,

This bug has been around forever, and we recently got an intern and stuck him 
with
trying to reproduce it on the latest kernel.  It is still here.  I'm not super 
excited
about trying to fix this, but we can easily test patches if someone has a
patch to try.

Test case is to create 1000 mac-vlans and bring them up, with user-space
tools running lots of 'dump' related commands as part of bringing up the
interfaces and configuring some special source-based routing tables.

(gdb) l *(inet6_dump_fib+0x109)
0x192f9 is in inet6_dump_fib 
(/home/greearb/git/linux-2.6/net/ipv6/ip6_fib.c:392).
387 } else
388 w->skip = 0;
389 
390 read_lock_bh(&table->tb6_lock);
391 res = fib6_walk_continue(w);
392 read_unlock_bh(&table->tb6_lock);
393 if (res <= 0) {
394 fib6_walker_unlink(net, w);
395 cb->args[4] = 0;
396 }

(gdb) l *(fib6_walk_continue+0x76)
0x188c6 is in fib6_walk_continue 
(/home/greearb/git/linux-2.6/net/ipv6/ip6_fib.c:1593).
1588if (fn == w->root)
1589return 0;
1590pn = fn->parent;
1591w->node = pn;
1592#ifdef CONFIG_IPV6_SUBTREES
1593if (FIB6_SUBTREE(pn) == fn) {
1594WARN_ON(!(fn->fn_flags & RTN_ROOT));
1595w->state = FWS_L;
1596continue;
1597}

[root@ct524-ffb0 ~]# BUG: unable to handle kernel NULL pointer dereference at 
0018
IP: fib6_walk_continue+0x76/0x180 [ipv6]
PGD 3d9226067
P4D 3d9226067
PUD 3d9020067
PMD 0

Oops:  [#1] PREEMPT SMP
Modules linked in: nf_conntrack_netlink nf_conntrack nfnetlink nf_defrag_ipv4 libcrc32c bnep fuse macvlan pktgen cfg80211 ipmi_ssif iTCO_wdt iTCO_vendor_support 
coretemp intel_rapl x86_pkg_temp_thermal intel_powerclamp kvm_intel kvm irqbypass joydev i2c_i801 ie31200_edac intel_pch_thermal shpchp hci_uart ipmi_si btbcm 
btqca ipmi_devintf btintel ipmi_msghandler bluetooth pinctrl_sunrisepoint acpi_als pinctrl_intel video tpm_tis intel_lpss_acpi kfifo_buf tpm_tis_core intel_lpss 
industrialio tpm acpi_pad acpi_power_meter sch_fq_codel nfsd auth_rpcgss nfs_acl lockd grace sunrpc ast drm_kms_helper ttm drm igb hwmon ptp pps_core dca 
i2c_algo_bit i2c_hid i2c_core ipv6 crc_ccitt [last unloaded: nf_conntrack]

CPU: 1 PID: 996 Comm: ip Not tainted 4.12.0-rc4+ #32
Hardware name: Supermicro Super Server/X11SSM-F, BIOS 1.0b 12/29/2015
task: 8803d4d61dc0 task.stack: c9000970c000
RIP: 0010:fib6_walk_continue+0x76/0x180 [ipv6]
RSP: 0018:c9000970fbb8 EFLAGS: 00010283
RAX: 8803de84b020 RBX: 8803e0756f00 RCX: 
RDX:  RSI: c9000970fc00 RDI: 81eee280
RBP: c9000970fbc0 R08: 0008 R09: 8803d4fbbf31
R10: c9000970fb68 R11:  R12: 0001
R13: 0001 R14: 8803e0756f00 R15: 8803d9345b18
FS:  7f32ca4ec700() GS:88047784() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 0018 CR3: 0003ddacc000 CR4: 003406e0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400
Call Trace:
 inet6_dump_fib+0x109/0x290 [ipv6]
 netlink_dump+0x11d/0x290
 netlink_recvmsg+0x260/0x3f0
 sock_recvmsg+0x38/0x40
 ___sys_recvmsg+0xe9/0x230
 ? alloc_pages_vma+0x9d/0x260
 ? page_add_new_anon_rmap+0x88/0xc0
 ? lru_cache_add_active_or_unevictable+0x31/0xb0
 ? __handle_mm_fault+0xce3/0xf70
 __sys_recvmsg+0x3d/0x70
 ? __sys_recvmsg+0x3d/0x70
 SyS_recvmsg+0xd/0x20
 do_syscall_64+0x56/0xc0
 entry_SYSCALL64_slow_path+0x25/0x25
RIP: 0033:0x7f32c9e21050
RSP: 002b:7fff96401de8 EFLAGS: 0246 ORIG_RAX: 002f
RAX: ffda RBX:  RCX: 7f32c9e21050
RDX:  RSI: 7fff96401e50 RDI: 0004
RBP: 7fff96405e74 R08: 3fe4 R09: 
R10: 7fff96401e90 R11: 0246 R12: 0064f3a0
R13: 7fff96405ee0 R14: 3fe4 R15: 
Code: f6 40 2a 04 74 11 8b 53 30 85 d2 0f 84 02 01 00 00 83 ea 01 89 53 30 c7 43 28 04 00 00 00 48 39 43 10 74 33 48 8b 10 48 89 53 18 <48> 39 42 18 0f 84 a3 00 
00 00 48 39 42 08 0f 84 ae 00 00 00 48

RIP: fib6_walk_continue+0x76/0x180 [ipv6] RSP: c9000970fbb8
CR2: 0018
---[ end trace 5ebbc4ee97bea64e ]---
Kernel panic - not syncing: Fatal exception in interrupt
Kernel Offset: disabled
Rebooting in 10 seconds..
ACPI MEMORY or I/O RESET_REG.


--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com