On Fri, 2014-05-30 at 22:41 -0400, Neil Horman wrote:
> On Fri, May 30, 2014 at 01:58:33PM -0700, Michael Chan wrote:
> > On Fri, 2014-05-30 at 16:38 -0400, Neil Horman wrote: 
> > > On Fri, May 30, 2014 at 01:13:40PM -0700, Michael Chan wrote:
> > > > On Fri, 2014-05-30 at 16:03 -0400, Neil Horman wrote: 
> > > > > On Fri, May 30, 2014 at 10:58:11AM -0700, Michael Chan wrote:
> > > > > > On Fri, 2014-05-30 at 11:00 -0400, Neil Horman wrote: 
> > > > > > > The Cnic driver handles lots of ulp operations in its netdevice
> > > > > > > event handler.  To do this, it accesses the ulp_ops array, which
> > > > > > > is an rcu protected array.  However, some ulp operations (like
> > > > > > > bnx2fc_indicate_netevent) try to lock mutexes, which might sleep
> > > > > > > (something that you can't do while holding rcu read side locks
> > > > > > > if you've configured non-preemptive rcu).
> > > > > > > 
> > > > > > > Fix this by changing the dereference method.  All accesses to
> > > > > > > the ulp_ops array for a cnic dev are modified under the
> > > > > > > protection of the rtnl lock, and so we can safely just use
> > > > > > > rcu_dereference_rtnl, and remove the rcu_read_lock here.
> > > > > > 
> > > > > > Because the bnx2fc function can sleep, we need a more complete fix
> > > > > > to prevent the ulp_ops from going away when the device is
> > > > > > unregistered.
> > > > > > synchronize_rcu() won't be able to protect it.  I'll post the patch
> > > > > > later today.  Thanks.
> > > > > > 
> > > > > The device can't be unregistered while we hold rtnl, can it?  Since
> > > > > we hold it in this path it seems safe to me, even if we sleep, or am
> > > > > I missing something?
> > > > > Neil
> > > > > 
> > > > The netdev cannot be unregistered of course, but I am talking about
> > > > bnx2fc unregistering the cnic device.  For example if someone does
> > > > fcoeadm -d or bnx2fc gets unloaded.
> > > 
> > > I don't think the latter can happen, as creating an fcoe transport
> > > places a hold on the bnx2fc module (see bnx2fc_create), and the former
> > > operation (fcoeadm -d) will block in bnx2fc_destroy as it requires
> > > holding the rtnl_lock, which will already be held by the netevent
> > > notifier, and confirmed by the rcu_dereference_rtnl in my patch.
> > > 
> > > I really think we're safe here 
> > 
> > Take a look at bnx2fc_mod_exit().  It doesn't look safe to me as it goes
> > through the adapter_list unregistering all cnic devices not under
> > rtnl_lock.
> > 
> Right, but you can't get into the module removal code at all until all
> transports are unregistered.  I suppose if you have no registered
> transports and remove the bnx2fc module while a netdevice event occurs,
> there might be a problem, but I think that problem is bigger than what
> we're talking about here, as you don't want to remove the module at all
> while running a netdevice notifier, as you'll wind up potentially
> executing garbage.

As long as we take care of the race conditions, I don't think there is a
bigger problem.  During bnx2fc module removal, it will unregister all
cnic devices.  If there is a netdev event, we will synchronize and the
unregister call will wait for all pending netdev event handling to be
done before completing.  The alternate patch that I sent out should take
care of this condition.  Thanks.
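
To make the "unregister waits for pending event handling" idea concrete,
here is a rough user-space sketch using pthreads.  It is not the alternate
patch mentioned above and is not cnic code; struct ulp_slot, ulp_dispatch(),
ulp_unregister() and count_event() are hypothetical names invented for
illustration.  The event path pins the ops with an in-flight counter, drops
the lock before calling a handler that may sleep, and the unregister path
clears the pointer and then waits for the counter to reach zero.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for one ulp_ops entry; not the real cnic struct. */
struct ulp_ops {
	void (*indicate_netevent)(void *ctx, unsigned long event);
};

/* Hypothetical per-device slot holding the registered ops. */
struct ulp_slot {
	pthread_mutex_t lock;
	pthread_cond_t idle;        /* signalled when in_flight drops to 0 */
	const struct ulp_ops *ops;  /* NULL once unregistered */
	void *ctx;
	int in_flight;              /* handlers currently running */
};

void ulp_slot_init(struct ulp_slot *s)
{
	pthread_mutex_init(&s->lock, NULL);
	pthread_cond_init(&s->idle, NULL);
	s->ops = NULL;
	s->ctx = NULL;
	s->in_flight = 0;
}

void ulp_register(struct ulp_slot *s, const struct ulp_ops *ops, void *ctx)
{
	pthread_mutex_lock(&s->lock);
	s->ops = ops;
	s->ctx = ctx;
	pthread_mutex_unlock(&s->lock);
}

/* Event path.  Returns 1 if a handler ran, 0 if nothing is registered. */
int ulp_dispatch(struct ulp_slot *s, unsigned long event)
{
	const struct ulp_ops *ops;
	void *ctx;

	pthread_mutex_lock(&s->lock);
	ops = s->ops;
	ctx = s->ctx;
	if (ops == NULL) {
		pthread_mutex_unlock(&s->lock);
		return 0;
	}
	s->in_flight++;                 /* pin ops across the call */
	pthread_mutex_unlock(&s->lock);

	ops->indicate_netevent(ctx, event);  /* may sleep; lock not held */

	pthread_mutex_lock(&s->lock);
	assert(s->in_flight > 0);
	if (--s->in_flight == 0)
		pthread_cond_broadcast(&s->idle);
	pthread_mutex_unlock(&s->lock);
	return 1;
}

/* Unregister path.  Must not be called from inside a handler. */
void ulp_unregister(struct ulp_slot *s)
{
	pthread_mutex_lock(&s->lock);
	s->ops = NULL;                  /* no new handler can start */
	while (s->in_flight > 0)        /* wait for running handlers */
		pthread_cond_wait(&s->idle, &s->lock);
	pthread_mutex_unlock(&s->lock);
}

/* Demo handler: counts events in the int that ctx points to. */
void count_event(void *ctx, unsigned long event)
{
	(void)event;
	(*(int *)ctx)++;
}
```

Once ulp_unregister() returns, no handler is still running and none can
start, so the caller may free whatever the ops refer to.  In the kernel the
same shape would need the driver's own locking and reference counting; the
mutex and condition variable here are only a user-space analogy.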

_______________________________________________
fcoe-devel mailing list
[email protected]
http://lists.open-fcoe.org/mailman/listinfo/fcoe-devel
