Re: [PATCH net-next 1/1] hv_netvsc: fix deadlock on hotplug

2017-09-06 Thread Stephen Hemminger
On Wed, 6 Sep 2017 16:23:45 +
Haiyang Zhang  wrote:

> > -Original Message-
> > From: Stephen Hemminger [mailto:step...@networkplumber.org]
> > Sent: Wednesday, September 6, 2017 11:19 AM
> > To: KY Srinivasan ; Haiyang Zhang
> > ; Stephen Hemminger 
> > Cc: de...@linuxdriverproject.org; netdev@vger.kernel.org
> > Subject: [PATCH net-next 1/1] hv_netvsc: fix deadlock on hotplug
> > 
> > When a virtual device is added dynamically (via host console), then
> > the vmbus sends an offer message for the primary channel. The processing
> > of this message for networking causes the network device to then
> > initialize the sub channels.
> > 
> > The problem is that setting up the sub channels needs to wait until
> > the subsequent subchannel offers have been processed. These offers
> > come in on the same ring buffer and work queue as where the primary
> > offer is being processed; leading to a deadlock.
> > 
> > This did not happen in older kernels, because the sub channel waiting
> > logic was broken (it wasn't really waiting).
> > 
> > The solution is to do the sub channel setup in its own work queue
> > context that is scheduled by the primary channel setup; and then
> > happens later.
> > 
> > Fixes: 732e49850c5e ("netvsc: fix race on sub channel creation")
> > Reported-by: Dexuan Cui 
> > Signed-off-by: Stephen Hemminger 
> > ---
> > Should also go to stable, but this version does not apply cleanly
> > to 4.13. Have another patch for that.
> > 
> >  drivers/net/hyperv/hyperv_net.h   |   1 +
> >  drivers/net/hyperv/netvsc_drv.c   |   8 +--
> >  drivers/net/hyperv/rndis_filter.c | 106 ++-
> > ---
> >  3 files changed, 74 insertions(+), 41 deletions(-)  
> 
> The patch looks overall. I just have a question:
> 
> With this patch, after module load and probe is done, there may still be
> subchannels being processed. If rmmod immediately, the subchannel offers 
> may hit half-way removed device structures... Do we also need to add 
> cancel_work_sync(&dev->subchan_work) to the top of netvsc_remove()?
> 
> unregister_netdevice() includes device close, but it's only called later
> in the netvsc_remove() when rndis is already removed.
> 
> Thanks,
> - Haiyang

Good catch.
If the driver called unregister_netdevice first before doing 
rndis_filter_device_remove
that would solve the problem. That wouldn't cause additional problems and it 
makes
sense to close the network layer first.


RE: [PATCH net-next 1/1] hv_netvsc: fix deadlock on hotplug

2017-09-06 Thread Haiyang Zhang


> -Original Message-
> From: Stephen Hemminger [mailto:step...@networkplumber.org]
> Sent: Wednesday, September 6, 2017 11:19 AM
> To: KY Srinivasan ; Haiyang Zhang
> ; Stephen Hemminger 
> Cc: de...@linuxdriverproject.org; netdev@vger.kernel.org
> Subject: [PATCH net-next 1/1] hv_netvsc: fix deadlock on hotplug
> 
> When a virtual device is added dynamically (via host console), then
> the vmbus sends an offer message for the primary channel. The processing
> of this message for networking causes the network device to then
> initialize the sub channels.
> 
> The problem is that setting up the sub channels needs to wait until
> the subsequent subchannel offers have been processed. These offers
> come in on the same ring buffer and work queue as where the primary
> offer is being processed; leading to a deadlock.
> 
> This did not happen in older kernels, because the sub channel waiting
> logic was broken (it wasn't really waiting).
> 
> The solution is to do the sub channel setup in its own work queue
> context that is scheduled by the primary channel setup; and then
> happens later.
> 
> Fixes: 732e49850c5e ("netvsc: fix race on sub channel creation")
> Reported-by: Dexuan Cui 
> Signed-off-by: Stephen Hemminger 
> ---
> Should also go to stable, but this version does not apply cleanly
> to 4.13. Have another patch for that.
> 
>  drivers/net/hyperv/hyperv_net.h   |   1 +
>  drivers/net/hyperv/netvsc_drv.c   |   8 +--
>  drivers/net/hyperv/rndis_filter.c | 106 ++-
> ---
>  3 files changed, 74 insertions(+), 41 deletions(-)

The patch looks overall. I just have a question:

With this patch, after module load and probe is done, there may still be
subchannels being processed. If rmmod immediately, the subchannel offers 
may hit half-way removed device structures... Do we also need to add 
cancel_work_sync(&dev->subchan_work) to the top of netvsc_remove()?

unregister_netdevice() includes device close, but it's only called later
in the netvsc_remove() when rndis is already removed.

Thanks,
- Haiyang