On Fri, Mar 25, 2016 at 03:16:36PM -0400, David Miller wrote: > From: Bjorn Helgaas <helg...@kernel.org> > Date: Fri, 25 Mar 2016 11:46:39 -0500 > > > You're right, there is an issue here. I reproduced a problem with a > > bond device. bond_netpoll_setup() calls __netpoll_setup() directly > > (not netpoll_setup()). I'll debug it more; just wanted to let you > > know there *is* a problem with this patch. > > I bet that's why the assignment to np->dev and the reference counting > were separated in the first place :-/ > > Indeed, commit 30fdd8a082a00126a6feec994e43e8dc12f5bccb: > > commit 30fdd8a082a00126a6feec994e43e8dc12f5bccb > Author: Jiri Pirko <j...@resnulli.us> > Date: Tue Jul 17 05:22:35 2012 +0000 > > netpoll: move np->dev and np->dev_name init into __netpoll_setup() > > Signed-off-by: Jiri Pirko <j...@resnulli.us> > Signed-off-by: David S. Miller <da...@davemloft.net>
We probably just want to balance the setting/clearing of np->dev in __netpoll_setup, so that any error return (that would result in a drop of the refcount in netpoll_setup) correlates to a setting of np->dev to NULL in __netpoll_setup. That leaves us with the problem of having to watch for future imbalances as you mentioned previously Dave, but it seems a potential problem tomorrow is preferable to an actual problem today. Another option would be to move the dev_hold/put into __netpoll_setup, but doing so would I think require some additional refactoring in netpoll_setup. Namely that we would still need a dev_hold/put in netpoll_setup to prevent the device from being removed during the period where we release the rtnl lock in the if (!netif_running(ndev)) clause. We would have to hold the device, unlock rtnl, then put the device after re-aquiring rtnl at the end of that if block. Neil