On 06/13/2017 01:28 PM, David Ahern wrote:
On 6/13/17 2:16 PM, Ben Greear wrote:
On 06/09/2017 02:25 PM, Eric Dumazet wrote:
On Fri, 2017-06-09 at 07:27 -0600, David Ahern wrote:
On 6/8/17 11:55 PM, Cong Wang wrote:
On Thu, Jun 8, 2017 at 2:27 PM, Ben Greear <gree...@candelatech.com>
wrote:

As far as I can tell, the patch did not help, or at least we still
reproduce
the
crash easily.

netlink dump is serialized by nlk->cb_mutex so I don't think that
patch makes any sense w.r.t race condition.

From what I can see fn_sernum should be accessed under table lock, so
when saving and checking it during a walk make sure it the lock is held.
That has nothing to do with the netlink dump, but the table changing
during a walk.


Yes, your patch makes total sense, of course.

I guess someone should go ahead and make an official patch and
submit it, even if it doesn't fix my problem.

I can do that; was hoping to root cause the problem first.



(gdb) l *(fib6_walk_continue+0x76)
0x188c6 is in fib6_walk_continue
(/home/greearb/git/linux-2.6/net/ipv6/ip6_fib.c:1593).
1588                            if (fn == w->root)
1589                                    return 0;
1590                            pn = fn->parent;
1591                            w->node = pn;
1592    #ifdef CONFIG_IPV6_SUBTREES
1593                            if (FIB6_SUBTREE(pn) == fn) {

Apparently fn->parent is NULL here for some reason, but
I don't know if that is expected or not. If a simple NULL check
is not enough here, we have to trace why it is NULL.

From my understanding, parent should not be null hence the attempts to
fix access to table nodes under a lock. ie., figuring out why it is null
here.

If someone has more suggestions, I'll be happy to test.

I have looked at the code again and nothing is jumping out. Will look
again later today.


I noticed there is some code to help fix up the walkers when nodes are deleted. 
 They
use lock:       read_lock(&net->ipv6.fib6_walker_lock);

The code you were tweaking uses a different lock:  
read_lock_bh(&table->tb6_lock);

In is certainly not simple code, so I don't know if that is correct or not, but
might possibly be a place to start looking.

I'm going to re-test with a WARN_ON to see if that triggers since previous 
suggestion
was that f->parent was NULL.


diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index 51cd637..86295df 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -1571,6 +1571,10 @@ static int fib6_walk_continue(struct fib6_walker *w)
                case FWS_U:
                        if (fn == w->root)
                                return 0;
+                       if (!fn->parent) {
+                               WARN_ON_ONCE(0);
+                               return 0;
+                       }
                        pn = fn->parent;
                        w->node = pn;
 #ifdef CONFIG_IPV6_SUBTREES


Thanks,
Ben

Ben Greear <gree...@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

Reply via email to