On 06/09/2017 02:25 PM, Eric Dumazet wrote:
On Fri, 2017-06-09 at 07:27 -0600, David Ahern wrote:
On 6/8/17 11:55 PM, Cong Wang wrote:
On Thu, Jun 8, 2017 at 2:27 PM, Ben Greear <gree...@candelatech.com> wrote:

As far as I can tell, the patch did not help, or at least we can still
reproduce the crash easily.

netlink dump is serialized by nlk->cb_mutex so I don't think that
patch makes any sense w.r.t race condition.

From what I can see, fn_sernum should be accessed under the table lock, so
when saving and checking it during a walk, make sure the lock is held.
That has nothing to do with the netlink dump; it is about the table
changing during a walk.
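
To make the locking point concrete, here is a minimal user-space sketch of
the pattern: a resumable walker saves a serial number and re-checks it, and
both happen while the table lock is held. Everything in it (struct table,
struct walker, table_modify(), walker_run_batch(), the use of a pthread
mutex) is a hypothetical stand-in for illustration, not the kernel's fib6
code or its actual lock type.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins; these are NOT the kernel's fib6 structures. */
struct table {
    pthread_mutex_t lock;   /* protects sernum (and, in real code, the tree) */
    int sernum;             /* bumped on every modification */
};

struct walker {
    int saved_sernum;       /* snapshot taken at the end of the last batch */
    int position;           /* how far the walk has progressed */
};

/* Writer side: change the table and bump the serial number under the lock. */
static void table_modify(struct table *t)
{
    assert(t != NULL);
    pthread_mutex_lock(&t->lock);
    t->sernum++;
    pthread_mutex_unlock(&t->lock);
}

/*
 * Walker side: run one batch of the walk. The comparison against the saved
 * serial number, the walk step, and the new snapshot all happen inside one
 * critical section, so a writer cannot slip in between check and use.
 * Returns true if the table changed since the previous batch and the walk
 * had to restart from the beginning.
 */
static bool walker_run_batch(struct table *t, struct walker *w)
{
    bool restarted = false;

    assert(t != NULL && w != NULL);
    pthread_mutex_lock(&t->lock);
    if (w->saved_sernum != t->sernum) {
        w->position = 0;            /* saved position may be stale */
        restarted = true;
    }
    w->position++;                  /* pretend to visit one node */
    w->saved_sernum = t->sernum;    /* snapshot while still locked */
    pthread_mutex_unlock(&t->lock);
    return restarted;
}
```

If the snapshot or the comparison were done outside the lock, a modification
could land between the check and the next step, and the walker would carry
on with a stale position.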


Yes, your patch makes total sense, of course.

I guess someone should go ahead and make an official patch and
submit it, even if it doesn't fix my problem.

(gdb) l *(fib6_walk_continue+0x76)
0x188c6 is in fib6_walk_continue (/home/greearb/git/linux-2.6/net/ipv6/ip6_fib.c:1593).
1588                            if (fn == w->root)
1589                                    return 0;
1590                            pn = fn->parent;
1591                            w->node = pn;
1592    #ifdef CONFIG_IPV6_SUBTREES
1593                            if (FIB6_SUBTREE(pn) == fn) {

Apparently fn->parent is NULL here for some reason, but
I don't know if that is expected or not. If a simple NULL check
is not enough here, we have to trace why it is NULL.

From my understanding, parent should not be NULL, hence the attempts to
fix access to table nodes under a lock, i.e., figuring out why it is NULL
here.
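
For illustration only, this is roughly what a "simple NULL check" on the way
up the tree looks like in a user-space sketch. struct node and walk_up() are
hypothetical names, not the kernel's fib6_node or fib6_walk_continue(). Note
that the check only detects the inconsistency and bails out; it does not
explain how a non-root node ended up without a parent.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node type; NOT the kernel's struct fib6_node. */
struct node {
    struct node *parent;
};

/*
 * Climb from fn towards root, counting the steps taken.
 * Returns 0 when root is reached, or -1 if a NULL parent is found before
 * reaching root, which means the tree is inconsistent.
 */
static int walk_up(const struct node *root, const struct node *fn, int *steps)
{
    assert(root != NULL && fn != NULL && steps != NULL);

    *steps = 0;
    while (fn != root) {
        const struct node *pn = fn->parent;

        /*
         * In the gdb listing, the code dereferences pn right after this
         * point (FIB6_SUBTREE(pn)), so a NULL pn faults there.
         */
        if (pn == NULL)
            return -1;

        fn = pn;
        (*steps)++;
    }
    return 0;
}
```

A caller would treat -1 as "stop the walk and report it" rather than as a
normal end of traversal.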

If someone has more suggestions, I'll be happy to test.

Thanks,
Ben


--
Ben Greear <gree...@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com
