On Fri, 2013-03-29 at 06:01 -0700, Eric Dumazet wrote: > From: Eric Dumazet <eduma...@google.com> > > On Fri, 2013-03-29 at 10:48 +0100, Jiri Pirko wrote: > > > Hmm. I think that this might be issue introduced by: > > commit a9b3cd7f323b2e57593e7215362a7b02fc933e3a > > Author: Stephen Hemminger <shemmin...@vyatta.com> > > Date: Mon Aug 1 16:19:00 2011 +0000 > > > > rcu: convert uses of rcu_assign_pointer(x, NULL) to RCU_INIT_POINTER > > > > > > Because, if rcu_dereference(dev->rx_handler) is null, > > rcu_dereference(dev->rx_handler_data) is never done. Therefore I believe > > we are hitting following scenario: > > > > > > CPU0 CPU1 > > ---- ---- > > dev->rx_handler_data = NULL > > rcu_read_lock() > > dev->rx_handler = NULL > > > > > > CPU0 will see rx_handler set and yet, rx_handler_data nulled. Write > > barrier in rcu_assign_pointer() might prevent this reorder from happening. > > Therefore I suggest: > > > > diff --git a/net/core/dev.c b/net/core/dev.c > > index 0caa38e..c16b829 100644 > > --- a/net/core/dev.c > > +++ b/net/core/dev.c > > @@ -3332,8 +3332,8 @@ void netdev_rx_handler_unregister(struct net_device > > *dev) > > { > > > > ASSERT_RTNL(); > > - RCU_INIT_POINTER(dev->rx_handler, NULL); > > - RCU_INIT_POINTER(dev->rx_handler_data, NULL); > > + rcu_assign_pointer(dev->rx_handler, NULL); > > + rcu_assign_pointer(dev->rx_handler_data, NULL); > > } > > EXPORT_SYMBOL_GPL(netdev_rx_handler_unregister); > > > > > > Nope this changes nothing at all.
Exactly! In fact, the bug triggered on an older kernel that had the original rcu_assign_pointer() > > However, we can fix the bug in a different way, if we want to avoid a > test in fast path. > > With following patch, we can make sure that a reader seeing a non NULL > rx_handler has a guarantee to see a non NULL rx_handler_data > [..] > We can fix bug this in two ways. First is adding a test in > bond_handle_frame() and others to check if rx_handler_data is NULL. > > A second way is adding a synchronize_net() in > netdev_rx_handler_unregister() to make sure that a rcu protected reader > has the guarantee to see a non NULL rx_handler_data. > > The second way is better as it avoids an extra test in fast path. > > Reported-by: Steven Rostedt <rost...@goodmis.org> > Signed-off-by: Eric Dumazet <eduma...@google.com> > Cc: Jiri Pirko <jpi...@redhat.com> > Cc: Paul E. McKenney <paul...@us.ibm.com> > --- > net/core/dev.c | 6 ++++++ > 1 file changed, 6 insertions(+) > > diff --git a/net/core/dev.c b/net/core/dev.c > index b13e5c7..56932a4 100644 > --- a/net/core/dev.c > +++ b/net/core/dev.c > @@ -3314,6 +3314,7 @@ int netdev_rx_handler_register(struct net_device *dev, > if (dev->rx_handler) > return -EBUSY; > > + /* Note: rx_handler_data must be set before rx_handler */ > rcu_assign_pointer(dev->rx_handler_data, rx_handler_data); > rcu_assign_pointer(dev->rx_handler, rx_handler); > > @@ -3334,6 +3335,11 @@ void netdev_rx_handler_unregister(struct net_device > *dev) > > ASSERT_RTNL(); > RCU_INIT_POINTER(dev->rx_handler, NULL); > + /* a reader seeing a non NULL rx_handler in a rcu_read_lock() > + * section has a guarantee to see a non NULL rx_handler_data > + * as well. > + */ > + synchronize_net(); I've thought about this too, but I wasn't sure we wanted two synchronize_*() functions, as the caller does a synchronize as well. That said, I think this is the more robust solution and it lets all rx_handler() functions assume that their rx_handler_data is set. And it removes the check from the fast path which outweighs an added synchronization in the slow path. Acked-by: Steven Rostedt <rost...@goodmis.org> Thanks! -- Steve > RCU_INIT_POINTER(dev->rx_handler_data, NULL); > } > EXPORT_SYMBOL_GPL(netdev_rx_handler_unregister); > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/