Re: [PATCH] hv_netvsc: Make sure out channel is fully opened on send

2018-09-27 Thread Mohammed Gamal
On Thu, 2018-09-27 at 12:23 +0200, Stephen Hemminger wrote:
> On Thu, 27 Sep 2018 10:57:05 +0200
> Mohammed Gamal  wrote:
> 
> > On Wed, 2018-09-26 at 17:13 +, Haiyang Zhang wrote:
> > > > -Original Message-
> > > > From: Mohammed Gamal 
> > > > Sent: Wednesday, September 26, 2018 12:34 PM
> > > > To: Stephen Hemminger ; net...@vger.ker
> > > > nel.
> > > > org
> > > > Cc: KY Srinivasan ; Haiyang Zhang
> > > > ; vkuznets ;
> > > > ot...@redhat.com; cavery ; linux-
> > > > ker...@vger.kernel.org; de...@linuxdriverproject.org; Mohammed
> > > > Gamal
> > > > 
> > > > Subject: [PATCH] hv_netvsc: Make sure out channel is fully
> > > > opened
> > > > on send
> > > > 
> > > > Dring high network traffic changes to network interface
> > > > parameters
> > > > such as
> > > > number of channels or MTU can cause a kernel panic with a NULL
> > > > pointer
> > > > dereference. This is due to netvsc_device_remove() being called
> > > > and
> > > > deallocating the channel ring buffers, which can then be
> > > > accessed
> > > > by
> > > > netvsc_send_pkt() before they're allocated on calling
> > > > netvsc_device_add()
> > > > 
> > > > The patch fixes this problem by checking the channel state and
> > > > returning
> > > > ENODEV if not yet opened. We also move the call to
> > > > hv_ringbuf_avail_percent()
> > > > which may access the uninitialized ring buffer.
> > > > 
> > > > Signed-off-by: Mohammed Gamal 
> > > > ---
> > > >  drivers/net/hyperv/netvsc.c | 7 ++-
> > > >  1 file changed, 6 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/drivers/net/hyperv/netvsc.c
> > > > b/drivers/net/hyperv/netvsc.c index
> > > > fe01e14..75f1b31 100644
> > > > --- a/drivers/net/hyperv/netvsc.c
> > > > +++ b/drivers/net/hyperv/netvsc.c
> > > > @@ -825,7 +825,12 @@ static inline int netvsc_send_pkt(
> > > >     struct netdev_queue *txq = netdev_get_tx_queue(ndev,
> > > > packet->q_idx);
> > > >     u64 req_id;
> > > >     int ret;
> > > > -   u32 ring_avail =
> > > > hv_get_avail_to_write_percent(_channel-  
> > > > > outbound);  
> > > > 
> > > > +   u32 ring_avail;
> > > > +
> > > > +   if (out_channel->state != CHANNEL_OPENED_STATE)
> > > > +   return -ENODEV;
> > > > +
> > > > +   ring_avail =
> > > > hv_get_avail_to_write_percent(_channel-  
> > > > > outbound);  
> > > 
> > > When you reproducing the NULL ptr panic, does your kernel include
> > > the
> > > following patch?
> > > hv_netvsc: common detach logic
> > > https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.g
> > > it/c
> > > ommit/?id=7b2ee50c0cd513a176a26a71f2989facdd75bfea
> > >   
> > 
> > Yes it is included. And the commit did reduce the occurrence of
> > this
> > race condition, but it still nevertheless occurs albeit rarely.
> > 
> > > We call netif_tx_disable(ndev) and netif_device_detach(ndev)
> > > before
> > > doing the changes 
> > > on MTU or #channels. So there should be no call to start_xmit()
> > > when
> > > channel is not ready.
> > > 
> > > If you see the check for CHANNEL_OPENED_STATE is still necessary
> > > on
> > > upstream kernel (including 
> > > the patch " common detach logic "), we should debug further on
> > > the
> > > code and find out the 
> > > root cause.
> > > 
> > > Thanks,
> > > - Haiyang
> > >   
> > 
> > ___
> > devel mailing list
> > de...@linuxdriverproject.org
> > http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-
> > devel
> 
> Is there some workload, that can be used to reproduce this?
> The stress test from Vitaly with changing parameters while running
> network traffic
> passes now.
> 
> Can you reproduce this with the upstream current kernel?
> 
> Adding the check in start xmit is still racy, and won't cure the
> problem.
> 
> Another solution would be to add a grace period in the netvsc detach
> logic.
> 

Steps to reproduce are listed here:
https://bugzilla.redhat.com/show_bug.cgi?id=1632653

We've also managed to reproduce the same issue upstream. It's more
likely to be reproduced on Windows 2012R2 than 2016.

Regards,
Mohammed
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


Re: [PATCH] hv_netvsc: Make sure out channel is fully opened on send

2018-09-27 Thread Stephen Hemminger
On Thu, 27 Sep 2018 10:57:05 +0200
Mohammed Gamal  wrote:

> On Wed, 2018-09-26 at 17:13 +, Haiyang Zhang wrote:
> > > -Original Message-
> > > From: Mohammed Gamal 
> > > Sent: Wednesday, September 26, 2018 12:34 PM
> > > To: Stephen Hemminger ; netdev@vger.kernel.
> > > org
> > > Cc: KY Srinivasan ; Haiyang Zhang
> > > ; vkuznets ;
> > > ot...@redhat.com; cavery ; linux-
> > > ker...@vger.kernel.org; de...@linuxdriverproject.org; Mohammed
> > > Gamal
> > > 
> > > Subject: [PATCH] hv_netvsc: Make sure out channel is fully opened
> > > on send
> > > 
> > > Dring high network traffic changes to network interface parameters
> > > such as
> > > number of channels or MTU can cause a kernel panic with a NULL
> > > pointer
> > > dereference. This is due to netvsc_device_remove() being called and
> > > deallocating the channel ring buffers, which can then be accessed
> > > by
> > > netvsc_send_pkt() before they're allocated on calling
> > > netvsc_device_add()
> > > 
> > > The patch fixes this problem by checking the channel state and
> > > returning
> > > ENODEV if not yet opened. We also move the call to
> > > hv_ringbuf_avail_percent()
> > > which may access the uninitialized ring buffer.
> > > 
> > > Signed-off-by: Mohammed Gamal 
> > > ---
> > >  drivers/net/hyperv/netvsc.c | 7 ++-
> > >  1 file changed, 6 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/net/hyperv/netvsc.c
> > > b/drivers/net/hyperv/netvsc.c index
> > > fe01e14..75f1b31 100644
> > > --- a/drivers/net/hyperv/netvsc.c
> > > +++ b/drivers/net/hyperv/netvsc.c
> > > @@ -825,7 +825,12 @@ static inline int netvsc_send_pkt(
> > >   struct netdev_queue *txq = netdev_get_tx_queue(ndev,
> > > packet->q_idx);
> > >   u64 req_id;
> > >   int ret;
> > > - u32 ring_avail =
> > > hv_get_avail_to_write_percent(_channel-  
> > > > outbound);  
> > > 
> > > + u32 ring_avail;
> > > +
> > > + if (out_channel->state != CHANNEL_OPENED_STATE)
> > > + return -ENODEV;
> > > +
> > > + ring_avail = hv_get_avail_to_write_percent(_channel-  
> > > >outbound);  
> > 
> > When you reproducing the NULL ptr panic, does your kernel include the
> > following patch?
> > hv_netvsc: common detach logic
> > https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/c
> > ommit/?id=7b2ee50c0cd513a176a26a71f2989facdd75bfea
> >   
> Yes it is included. And the commit did reduce the occurrence of this
> race condition, but it still nevertheless occurs albeit rarely.
> 
> > We call netif_tx_disable(ndev) and netif_device_detach(ndev) before
> > doing the changes 
> > on MTU or #channels. So there should be no call to start_xmit() when
> > channel is not ready.
> > 
> > If you see the check for CHANNEL_OPENED_STATE is still necessary on
> > upstream kernel (including 
> > the patch " common detach logic "), we should debug further on the
> > code and find out the 
> > root cause.
> > 
> > Thanks,
> > - Haiyang
> >   
> ___
> devel mailing list
> de...@linuxdriverproject.org
> http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel

Is there some workload, that can be used to reproduce this?
The stress test from Vitaly with changing parameters while running network 
traffic
passes now.

Can you reproduce this with the upstream current kernel?

Adding the check in start xmit is still racy, and won't cure the problem.

Another solution would be to add a grace period in the netvsc detach logic.

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


Re: [PATCH] hv_netvsc: Make sure out channel is fully opened on send

2018-09-27 Thread Mohammed Gamal
On Wed, 2018-09-26 at 17:13 +, Haiyang Zhang wrote:
> > -Original Message-
> > From: Mohammed Gamal 
> > Sent: Wednesday, September 26, 2018 12:34 PM
> > To: Stephen Hemminger ; netdev@vger.kernel.
> > org
> > Cc: KY Srinivasan ; Haiyang Zhang
> > ; vkuznets ;
> > ot...@redhat.com; cavery ; linux-
> > ker...@vger.kernel.org; de...@linuxdriverproject.org; Mohammed
> > Gamal
> > 
> > Subject: [PATCH] hv_netvsc: Make sure out channel is fully opened
> > on send
> > 
> > Dring high network traffic changes to network interface parameters
> > such as
> > number of channels or MTU can cause a kernel panic with a NULL
> > pointer
> > dereference. This is due to netvsc_device_remove() being called and
> > deallocating the channel ring buffers, which can then be accessed
> > by
> > netvsc_send_pkt() before they're allocated on calling
> > netvsc_device_add()
> > 
> > The patch fixes this problem by checking the channel state and
> > returning
> > ENODEV if not yet opened. We also move the call to
> > hv_ringbuf_avail_percent()
> > which may access the uninitialized ring buffer.
> > 
> > Signed-off-by: Mohammed Gamal 
> > ---
> >  drivers/net/hyperv/netvsc.c | 7 ++-
> >  1 file changed, 6 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/net/hyperv/netvsc.c
> > b/drivers/net/hyperv/netvsc.c index
> > fe01e14..75f1b31 100644
> > --- a/drivers/net/hyperv/netvsc.c
> > +++ b/drivers/net/hyperv/netvsc.c
> > @@ -825,7 +825,12 @@ static inline int netvsc_send_pkt(
> >     struct netdev_queue *txq = netdev_get_tx_queue(ndev,
> > packet->q_idx);
> >     u64 req_id;
> >     int ret;
> > -   u32 ring_avail =
> > hv_get_avail_to_write_percent(_channel-
> > > outbound);
> > 
> > +   u32 ring_avail;
> > +
> > +   if (out_channel->state != CHANNEL_OPENED_STATE)
> > +   return -ENODEV;
> > +
> > +   ring_avail = hv_get_avail_to_write_percent(_channel-
> > >outbound);
> 
> When you reproducing the NULL ptr panic, does your kernel include the
> following patch?
> hv_netvsc: common detach logic
> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/c
> ommit/?id=7b2ee50c0cd513a176a26a71f2989facdd75bfea
> 
Yes it is included. And the commit did reduce the occurrence of this
race condition, but it still nevertheless occurs albeit rarely.

> We call netif_tx_disable(ndev) and netif_device_detach(ndev) before
> doing the changes 
> on MTU or #channels. So there should be no call to start_xmit() when
> channel is not ready.
> 
> If you see the check for CHANNEL_OPENED_STATE is still necessary on
> upstream kernel (including 
> the patch " common detach logic "), we should debug further on the
> code and find out the 
> root cause.
> 
> Thanks,
> - Haiyang
> 
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


RE: [PATCH] hv_netvsc: Make sure out channel is fully opened on send

2018-09-26 Thread Haiyang Zhang



> -Original Message-
> From: Mohammed Gamal 
> Sent: Wednesday, September 26, 2018 12:34 PM
> To: Stephen Hemminger ; net...@vger.kernel.org
> Cc: KY Srinivasan ; Haiyang Zhang
> ; vkuznets ;
> ot...@redhat.com; cavery ; linux-
> ker...@vger.kernel.org; de...@linuxdriverproject.org; Mohammed Gamal
> 
> Subject: [PATCH] hv_netvsc: Make sure out channel is fully opened on send
> 
> Dring high network traffic changes to network interface parameters such as
> number of channels or MTU can cause a kernel panic with a NULL pointer
> dereference. This is due to netvsc_device_remove() being called and
> deallocating the channel ring buffers, which can then be accessed by
> netvsc_send_pkt() before they're allocated on calling
> netvsc_device_add()
> 
> The patch fixes this problem by checking the channel state and returning
> ENODEV if not yet opened. We also move the call to hv_ringbuf_avail_percent()
> which may access the uninitialized ring buffer.
> 
> Signed-off-by: Mohammed Gamal 
> ---
>  drivers/net/hyperv/netvsc.c | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c index
> fe01e14..75f1b31 100644
> --- a/drivers/net/hyperv/netvsc.c
> +++ b/drivers/net/hyperv/netvsc.c
> @@ -825,7 +825,12 @@ static inline int netvsc_send_pkt(
>   struct netdev_queue *txq = netdev_get_tx_queue(ndev, packet->q_idx);
>   u64 req_id;
>   int ret;
> - u32 ring_avail = hv_get_avail_to_write_percent(_channel-
> >outbound);
> + u32 ring_avail;
> +
> + if (out_channel->state != CHANNEL_OPENED_STATE)
> + return -ENODEV;
> +
> + ring_avail = hv_get_avail_to_write_percent(_channel->outbound);

When you reproducing the NULL ptr panic, does your kernel include the following 
patch?
hv_netvsc: common detach logic
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=7b2ee50c0cd513a176a26a71f2989facdd75bfea

We call netif_tx_disable(ndev) and netif_device_detach(ndev) before doing the 
changes 
on MTU or #channels. So there should be no call to start_xmit() when channel is 
not ready.

If you see the check for CHANNEL_OPENED_STATE is still necessary on upstream 
kernel (including 
the patch " common detach logic "), we should debug further on the code and 
find out the 
root cause.

Thanks,
- Haiyang

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


[PATCH] hv_netvsc: Make sure out channel is fully opened on send

2018-09-26 Thread Mohammed Gamal
Dring high network traffic changes to network interface parameters
such as number of channels or MTU can cause a kernel panic with a NULL
pointer dereference. This is due to netvsc_device_remove() being
called and deallocating the channel ring buffers, which can then be
accessed by netvsc_send_pkt() before they're allocated on calling
netvsc_device_add()

The patch fixes this problem by checking the channel state and returning
ENODEV if not yet opened. We also move the call to hv_ringbuf_avail_percent()
which may access the uninitialized ring buffer.

Signed-off-by: Mohammed Gamal 
---
 drivers/net/hyperv/netvsc.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index fe01e14..75f1b31 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -825,7 +825,12 @@ static inline int netvsc_send_pkt(
struct netdev_queue *txq = netdev_get_tx_queue(ndev, packet->q_idx);
u64 req_id;
int ret;
-   u32 ring_avail = hv_get_avail_to_write_percent(_channel->outbound);
+   u32 ring_avail;
+
+   if (out_channel->state != CHANNEL_OPENED_STATE)
+   return -ENODEV;
+
+   ring_avail = hv_get_avail_to_write_percent(_channel->outbound);
 
nvmsg.hdr.msg_type = NVSP_MSG1_TYPE_SEND_RNDIS_PKT;
if (skb)
-- 
1.8.3.1

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


Re: [PATCH] hv_netvsc: Make sure out channel is fully opened on send

2018-03-16 Thread David Miller
From: Mohammed Gamal 
Date: Tue, 13 Mar 2018 20:06:50 +0100

> Dring high network traffic changes to network interface parameters
> such as number of channels or MTU can cause a kernel panic with a NULL
> pointer dereference. This is due to netvsc_device_remove() being
> called and deallocating the channel ring buffers, which can then be
> accessed by netvsc_send_pkt() before they're allocated on calling
> netvsc_device_add()
> 
> The patch fixes this problem by checking the channel state and returning
> ENODEV if not yet opened. We also move the call to hv_ringbuf_avail_percent()
> which may access the uninitialized ring buffer.
> 
> Signed-off-by: Mohammed Gamal 

Based upon the discusion on this patch, it looks like this will be fixed
in some other way.
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


Re: [PATCH] hv_netvsc: Make sure out channel is fully opened on send

2018-03-15 Thread Stephen Hemminger
On Thu, 15 Mar 2018 17:24:13 +0100
Mohammed Gamal  wrote:

> On Wed, 2018-03-14 at 10:22 +0100, Mohammed Gamal wrote:
> > On Tue, 2018-03-13 at 12:35 -0700, Stephen Hemminger wrote:  
> > > On Tue, 13 Mar 2018 20:06:50 +0100
> > > Mohammed Gamal  wrote:
> > >   
> > > > Dring high network traffic changes to network interface
> > > > parameters
> > > > such as number of channels or MTU can cause a kernel panic with a
> > > > NULL
> > > > pointer dereference. This is due to netvsc_device_remove() being
> > > > called and deallocating the channel ring buffers, which can then
> > > > be
> > > > accessed by netvsc_send_pkt() before they're allocated on calling
> > > > netvsc_device_add()
> > > > 
> > > > The patch fixes this problem by checking the channel state and
> > > > returning
> > > > ENODEV if not yet opened. We also move the call to
> > > > hv_ringbuf_avail_percent()
> > > > which may access the uninitialized ring buffer.
> > > > 
> > > > Signed-off-by: Mohammed Gamal 
> > > > ---
> > > >  drivers/net/hyperv/netvsc.c | 5 +++--
> > > >  1 file changed, 3 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/drivers/net/hyperv/netvsc.c
> > > > b/drivers/net/hyperv/netvsc.c
> > > > index 0265d70..44a8358 100644
> > > > --- a/drivers/net/hyperv/netvsc.c
> > > > +++ b/drivers/net/hyperv/netvsc.c
> > > > @@ -757,7 +757,7 @@ static inline int netvsc_send_pkt(
> > > >     struct netdev_queue *txq = netdev_get_tx_queue(ndev,
> > > > packet->q_idx);
> > > >     u64 req_id;
> > > >     int ret;
> > > > -   u32 ring_avail = hv_ringbuf_avail_percent(_channel-  
> > > > > outbound);  
> > > > 
> > > > +   u32 ring_avail;
> > > >  
> > > >     nvmsg.hdr.msg_type = NVSP_MSG1_TYPE_SEND_RNDIS_PKT;
> > > >     if (skb)
> > > > @@ -773,7 +773,7 @@ static inline int netvsc_send_pkt(
> > > >  
> > > >     req_id = (ulong)skb;
> > > >  
> > > > -   if (out_channel->rescind)
> > > > +   if (out_channel->rescind || out_channel->state !=
> > > > CHANNEL_OPENED_STATE)
> > > >     return -ENODEV;
> > > >  
> > > >     if (packet->page_buf_cnt) {
> > > > @@ -791,6 +791,7 @@ static inline int netvsc_send_pkt(
> > > >        VMBUS_DATA_PACKET_FLAG_CO
> > > > MP
> > > > LETION_REQUESTED);
> > > >     }
> > > >  
> > > > +   ring_avail = hv_ringbuf_avail_percent(_channel-  
> > > > > outbound);  
> > > > 
> > > >     if (ret == 0) {
> > > >     atomic_inc_return(>queue_sends);
> > > >    
> > > 
> > > Thanks for your patch. Yes there are races with the current update
> > > logic. The root cause goes higher up in the flow; the send queues
> > > should
> > > be stopped before netvsc_device_remove is called. Solving it where
> > > you tried
> > > to is racy and not going to work reliably.
> > > 
> > > Network patches should go to net...@vger.kernel.org
> > > 
> > > You can't move the ring_avail check until after the
> > > vmbus_sendpacket
> > > because
> > > that will break the flow control logic.
> > >   
> > 
> > Why? I don't see ring_avail being used before that point.  
> 
> Ah, stupid me. vmbus_sendpacket() will write to the ring buffer and
> that means that ring_avail value will be different than the expected.
> 
> >   
> > > Instead, you should just move the avail_read check until just after
> > > the existing rescind
> > > check.
> > > 
> > > Also, you shouldn't need to check for OPENED_STATE, just rescind is
> > > enough.  
> > 
> > That rarely mitigated the race. channel->rescind flag is set on vmbus
> > exit - called on module unload - and when a rescind offer is received
> > from the host, which AFAICT doesn't happen on every call to
> > netvsc_device_remove, so it's quite possible that the ringbuffer is
> > accessed before it's allocated again on channel open and hence the
> > check for OPENED_STAT - which is only set after all vmbus data is
> > initialized.
> >   
> 
> Perhaps I haven't been clear enough. The NULL pointer dereference
> happens in the call to hv_ringbuf_avail_percent() which is used to
> calculate ring_avail. 
> 
> So we need to stop the queues before calling it if the channel's ring
> buffers haven't been allocated yet, but OTOH we should only stop the
> queues based upon the value of ring_avail, so this leads into a chicken
> and egg situation. 
> 
> Is my observation here correct? Please correct me if I am wrong,
> Stephen.

This is a far more drastic work of the shutdown logic which I am still working
on. The current netvsc driver is not doing a good job of handling concurrent
send/receives during ring changes.


From a22da18b41ad5024340dddcc989d918987836f6d Mon Sep 17 00:00:00 2001
From: Stephen Hemminger 
Date: Tue, 6 Feb 2018 15:05:19 -0800
Subject: [PATCH] hv_netvsc: common detach logic

Make common function for detaching internals of device
during changes to MTU and RSS. Make sure no more packets
are 

Re: [PATCH] hv_netvsc: Make sure out channel is fully opened on send

2018-03-15 Thread Mohammed Gamal
On Wed, 2018-03-14 at 10:22 +0100, Mohammed Gamal wrote:
> On Tue, 2018-03-13 at 12:35 -0700, Stephen Hemminger wrote:
> > On Tue, 13 Mar 2018 20:06:50 +0100
> > Mohammed Gamal  wrote:
> > 
> > > Dring high network traffic changes to network interface
> > > parameters
> > > such as number of channels or MTU can cause a kernel panic with a
> > > NULL
> > > pointer dereference. This is due to netvsc_device_remove() being
> > > called and deallocating the channel ring buffers, which can then
> > > be
> > > accessed by netvsc_send_pkt() before they're allocated on calling
> > > netvsc_device_add()
> > > 
> > > The patch fixes this problem by checking the channel state and
> > > returning
> > > ENODEV if not yet opened. We also move the call to
> > > hv_ringbuf_avail_percent()
> > > which may access the uninitialized ring buffer.
> > > 
> > > Signed-off-by: Mohammed Gamal 
> > > ---
> > >  drivers/net/hyperv/netvsc.c | 5 +++--
> > >  1 file changed, 3 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/drivers/net/hyperv/netvsc.c
> > > b/drivers/net/hyperv/netvsc.c
> > > index 0265d70..44a8358 100644
> > > --- a/drivers/net/hyperv/netvsc.c
> > > +++ b/drivers/net/hyperv/netvsc.c
> > > @@ -757,7 +757,7 @@ static inline int netvsc_send_pkt(
> > >   struct netdev_queue *txq = netdev_get_tx_queue(ndev,
> > > packet->q_idx);
> > >   u64 req_id;
> > >   int ret;
> > > - u32 ring_avail = hv_ringbuf_avail_percent(_channel-
> > > > outbound);
> > > 
> > > + u32 ring_avail;
> > >  
> > >   nvmsg.hdr.msg_type = NVSP_MSG1_TYPE_SEND_RNDIS_PKT;
> > >   if (skb)
> > > @@ -773,7 +773,7 @@ static inline int netvsc_send_pkt(
> > >  
> > >   req_id = (ulong)skb;
> > >  
> > > - if (out_channel->rescind)
> > > + if (out_channel->rescind || out_channel->state !=
> > > CHANNEL_OPENED_STATE)
> > >   return -ENODEV;
> > >  
> > >   if (packet->page_buf_cnt) {
> > > @@ -791,6 +791,7 @@ static inline int netvsc_send_pkt(
> > >      VMBUS_DATA_PACKET_FLAG_CO
> > > MP
> > > LETION_REQUESTED);
> > >   }
> > >  
> > > + ring_avail = hv_ringbuf_avail_percent(_channel-
> > > > outbound);
> > > 
> > >   if (ret == 0) {
> > >   atomic_inc_return(>queue_sends);
> > >  
> > 
> > Thanks for your patch. Yes there are races with the current update
> > logic. The root cause goes higher up in the flow; the send queues
> > should
> > be stopped before netvsc_device_remove is called. Solving it where
> > you tried
> > to is racy and not going to work reliably.
> > 
> > Network patches should go to net...@vger.kernel.org
> > 
> > You can't move the ring_avail check until after the
> > vmbus_sendpacket
> > because
> > that will break the flow control logic.
> > 
> 
> Why? I don't see ring_avail being used before that point.

Ah, stupid me. vmbus_sendpacket() will write to the ring buffer and
that means that ring_avail value will be different than the expected.

> 
> > Instead, you should just move the avail_read check until just after
> > the existing rescind
> > check.
> > 
> > Also, you shouldn't need to check for OPENED_STATE, just rescind is
> > enough.
> 
> That rarely mitigated the race. channel->rescind flag is set on vmbus
> exit - called on module unload - and when a rescind offer is received
> from the host, which AFAICT doesn't happen on every call to
> netvsc_device_remove, so it's quite possible that the ringbuffer is
> accessed before it's allocated again on channel open and hence the
> check for OPENED_STAT - which is only set after all vmbus data is
> initialized.
> 

Perhaps I haven't been clear enough. The NULL pointer dereference
happens in the call to hv_ringbuf_avail_percent() which is used to
calculate ring_avail. 

So we need to stop the queues before calling it if the channel's ring
buffers haven't been allocated yet, but OTOH we should only stop the
queues based upon the value of ring_avail, so this leads into a chicken
and egg situation. 

Is my observation here correct? Please correct me if I am wrong,
Stephen.
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


Re: [PATCH] hv_netvsc: Make sure out channel is fully opened on send

2018-03-14 Thread Mohammed Gamal
On Tue, 2018-03-13 at 12:35 -0700, Stephen Hemminger wrote:
> On Tue, 13 Mar 2018 20:06:50 +0100
> Mohammed Gamal  wrote:
> 
> > Dring high network traffic changes to network interface parameters
> > such as number of channels or MTU can cause a kernel panic with a
> > NULL
> > pointer dereference. This is due to netvsc_device_remove() being
> > called and deallocating the channel ring buffers, which can then be
> > accessed by netvsc_send_pkt() before they're allocated on calling
> > netvsc_device_add()
> > 
> > The patch fixes this problem by checking the channel state and
> > returning
> > ENODEV if not yet opened. We also move the call to
> > hv_ringbuf_avail_percent()
> > which may access the uninitialized ring buffer.
> > 
> > Signed-off-by: Mohammed Gamal 
> > ---
> >  drivers/net/hyperv/netvsc.c | 5 +++--
> >  1 file changed, 3 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/net/hyperv/netvsc.c
> > b/drivers/net/hyperv/netvsc.c
> > index 0265d70..44a8358 100644
> > --- a/drivers/net/hyperv/netvsc.c
> > +++ b/drivers/net/hyperv/netvsc.c
> > @@ -757,7 +757,7 @@ static inline int netvsc_send_pkt(
> >     struct netdev_queue *txq = netdev_get_tx_queue(ndev,
> > packet->q_idx);
> >     u64 req_id;
> >     int ret;
> > -   u32 ring_avail = hv_ringbuf_avail_percent(_channel-
> > >outbound);
> > +   u32 ring_avail;
> >  
> >     nvmsg.hdr.msg_type = NVSP_MSG1_TYPE_SEND_RNDIS_PKT;
> >     if (skb)
> > @@ -773,7 +773,7 @@ static inline int netvsc_send_pkt(
> >  
> >     req_id = (ulong)skb;
> >  
> > -   if (out_channel->rescind)
> > +   if (out_channel->rescind || out_channel->state !=
> > CHANNEL_OPENED_STATE)
> >     return -ENODEV;
> >  
> >     if (packet->page_buf_cnt) {
> > @@ -791,6 +791,7 @@ static inline int netvsc_send_pkt(
> >        VMBUS_DATA_PACKET_FLAG_COMP
> > LETION_REQUESTED);
> >     }
> >  
> > +   ring_avail = hv_ringbuf_avail_percent(_channel-
> > >outbound);
> >     if (ret == 0) {
> >     atomic_inc_return(>queue_sends);
> >  
> 
> Thanks for your patch. Yes there are races with the current update
> logic. The root cause goes higher up in the flow; the send queues
> should
> be stopped before netvsc_device_remove is called. Solving it where
> you tried
> to is racy and not going to work reliably.
> 
> Network patches should go to net...@vger.kernel.org
> 
> You can't move the ring_avail check until after the vmbus_sendpacket
> because
> that will break the flow control logic.
> 
Why? I don't see ring_avail being used before that point.

> Instead, you should just move the avail_read check until just after
> the existing rescind
> check.
> 
> Also, you shouldn't need to check for OPENED_STATE, just rescind is
> enough.

That rarely mitigated the race. channel->rescind flag is set on vmbus
exit - called on module unload - and when a rescind offer is received
from the host, which AFAICT doesn't happen on every call to
netvsc_device_remove, so it's quite possible that the ringbuffer is
accessed before it's allocated again on channel open and hence the
check for OPENED_STAT - which is only set after all vmbus data is
initialized.

> 
> 
> 
> 
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


Re: [PATCH] hv_netvsc: Make sure out channel is fully opened on send

2018-03-14 Thread Dan Carpenter
On Tue, Mar 13, 2018 at 08:06:50PM +0100, Mohammed Gamal wrote:
> @@ -791,6 +791,7 @@ static inline int netvsc_send_pkt(
>  
> VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED);
>   }
>  
> + ring_avail = hv_ringbuf_avail_percent(_channel->outbound);
>   if (ret == 0) {
>   atomic_inc_return(>queue_sends);
>  

Could you move the assignment inside the "ret == 0" path closer to where
it's used?

regards,
dan carpenter


___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


Re: [PATCH] hv_netvsc: Make sure out channel is fully opened on send

2018-03-13 Thread Stephen Hemminger
On Tue, 13 Mar 2018 20:06:50 +0100
Mohammed Gamal  wrote:

> Dring high network traffic changes to network interface parameters
> such as number of channels or MTU can cause a kernel panic with a NULL
> pointer dereference. This is due to netvsc_device_remove() being
> called and deallocating the channel ring buffers, which can then be
> accessed by netvsc_send_pkt() before they're allocated on calling
> netvsc_device_add()
> 
> The patch fixes this problem by checking the channel state and returning
> ENODEV if not yet opened. We also move the call to hv_ringbuf_avail_percent()
> which may access the uninitialized ring buffer.
> 
> Signed-off-by: Mohammed Gamal 
> ---
>  drivers/net/hyperv/netvsc.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
> index 0265d70..44a8358 100644
> --- a/drivers/net/hyperv/netvsc.c
> +++ b/drivers/net/hyperv/netvsc.c
> @@ -757,7 +757,7 @@ static inline int netvsc_send_pkt(
>   struct netdev_queue *txq = netdev_get_tx_queue(ndev, packet->q_idx);
>   u64 req_id;
>   int ret;
> - u32 ring_avail = hv_ringbuf_avail_percent(_channel->outbound);
> + u32 ring_avail;
>  
>   nvmsg.hdr.msg_type = NVSP_MSG1_TYPE_SEND_RNDIS_PKT;
>   if (skb)
> @@ -773,7 +773,7 @@ static inline int netvsc_send_pkt(
>  
>   req_id = (ulong)skb;
>  
> - if (out_channel->rescind)
> + if (out_channel->rescind || out_channel->state != CHANNEL_OPENED_STATE)
>   return -ENODEV;
>  
>   if (packet->page_buf_cnt) {
> @@ -791,6 +791,7 @@ static inline int netvsc_send_pkt(
>  
> VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED);
>   }
>  
> + ring_avail = hv_ringbuf_avail_percent(_channel->outbound);
>   if (ret == 0) {
>   atomic_inc_return(>queue_sends);
>  

Thanks for your patch. Yes there are races with the current update
logic. The root cause goes higher up in the flow; the send queues should
be stopped before netvsc_device_remove is called. Solving it where you tried
to is racy and not going to work reliably.

Network patches should go to net...@vger.kernel.org

You can't move the ring_avail check until after the vmbus_sendpacket because
that will break the flow control logic.

Instead, you should just move the avail_read check until just after the 
existing rescind
check.

Also, you shouldn't need to check for OPENED_STATE, just rescind is enough.




___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel