On Fri, Mar 25, 2016 at 03:16:36PM -0400, David Miller wrote:
> From: Bjorn Helgaas <helg...@kernel.org>
> Date: Fri, 25 Mar 2016 11:46:39 -0500
> 
> > You're right, there is an issue here.  I reproduced a problem with a
> > bond device.  bond_netpoll_setup() calls __netpoll_setup() directly
> > (not netpoll_setup()).  I'll debug it more; just wanted to let you
> > know there *is* a problem with this patch.
> 
> I bet that's why the assignment to np->dev and the reference counting
> were separated in the first place :-/
> 
> Indeed, commit 30fdd8a082a00126a6feec994e43e8dc12f5bccb:
> 
> commit 30fdd8a082a00126a6feec994e43e8dc12f5bccb
> Author: Jiri Pirko <j...@resnulli.us>
> Date:   Tue Jul 17 05:22:35 2012 +0000
> 
>     netpoll: move np->dev and np->dev_name init into __netpoll_setup()
>     
>     Signed-off-by: Jiri Pirko <j...@resnulli.us>
>     Signed-off-by: David S. Miller <da...@davemloft.net>

We probably just want to balance the setting and clearing of np->dev in
__netpoll_setup(), so that any error return (that would result in a drop of the
refcount in netpoll_setup()) is paired with np->dev being set back to NULL in
__netpoll_setup().  That leaves us with the problem of having to watch for
future imbalances, as you mentioned previously, Dave, but a potential problem
tomorrow seems preferable to an actual problem today.
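
To make the pairing concrete, here is a rough user-space sketch of that first
option.  It is not kernel code: every model_* name is a hypothetical stand-in
(model_inner_setup() for __netpoll_setup(), model_outer_setup() for
netpoll_setup()), and the simulate_error flag just stands in for any of the
real failure paths.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; these are not the kernel structures. */
struct model_dev {
	int refcnt;
};

struct model_netpoll {
	struct model_dev *dev;
};

static void model_dev_hold(struct model_dev *dev)
{
	dev->refcnt++;
}

static void model_dev_put(struct model_dev *dev)
{
	assert(dev->refcnt > 0);	/* catch an unbalanced put */
	dev->refcnt--;
}

/* Models __netpoll_setup(): sets np->dev and clears it again on error. */
static int model_inner_setup(struct model_netpoll *np, struct model_dev *ndev,
			     int simulate_error)
{
	np->dev = ndev;

	if (simulate_error) {
		np->dev = NULL;	/* balance the assignment above */
		return -1;
	}
	return 0;
}

/* Models netpoll_setup(): takes the reference and drops it on error. */
static int model_outer_setup(struct model_netpoll *np, struct model_dev *ndev,
			     int simulate_error)
{
	int err;

	model_dev_hold(ndev);
	err = model_inner_setup(np, ndev, simulate_error);
	if (err)
		model_dev_put(ndev);
	return err;
}
```

With this shape a direct caller of the inner function (the way
bond_netpoll_setup() calls __netpoll_setup()) never sees a stale np->dev after
a failure, and the outer function's hold/put stays balanced on its own.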

Another option would be to move the dev_hold/put into __netpoll_setup(), but I
think doing so would require some additional refactoring in netpoll_setup().
Namely, we would still need a dev_hold/put in netpoll_setup() to prevent the
device from being removed during the period where we release the rtnl lock in
the if (!netif_running(ndev)) clause.  We would have to hold the device, unlock
rtnl, then put the device after re-acquiring rtnl at the end of that if block.
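
The ordering I have in mind for that if block looks roughly like the sketch
below.  Again this is a user-space model, not a patch: the model_* helpers and
the boolean "lock" are hypothetical stand-ins for rtnl_lock()/rtnl_unlock() and
dev_hold()/dev_put(), and the function returns the refcount it observed while
the lock was dropped purely so the pinning can be checked.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for a net_device. */
struct model_dev {
	int refcnt;
	bool running;
};

static bool model_rtnl_held;	/* stand-in for the rtnl lock */

static void model_rtnl_lock(void)
{
	assert(!model_rtnl_held);
	model_rtnl_held = true;
}

static void model_rtnl_unlock(void)
{
	assert(model_rtnl_held);
	model_rtnl_held = false;
}

static void model_dev_hold(struct model_dev *dev)
{
	dev->refcnt++;
}

static void model_dev_put(struct model_dev *dev)
{
	assert(dev->refcnt > 0);
	dev->refcnt--;
}

/*
 * Models the if (!netif_running(ndev)) clause.  Must be called with the
 * model lock held.  Returns the refcount seen while the lock was dropped.
 */
static int model_wait_for_device(struct model_dev *ndev)
{
	int seen = ndev->refcnt;

	assert(model_rtnl_held);

	if (!ndev->running) {
		model_dev_hold(ndev);	/* pin the device before unlocking */
		model_rtnl_unlock();

		seen = ndev->refcnt;	/* real code would wait here */
		ndev->running = true;

		model_rtnl_lock();
		model_dev_put(ndev);	/* drop the pin once rtnl is back */
	}
	return seen;
}
```

The point is only the order: hold before unlock, put after relock, so the
temporary reference never outlives the if block and the longer-lived
reference can be taken elsewhere.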

Neil
