Re: [PATCH] hv_netvsc: Make sure out channel is fully opened on send
On Thu, 2018-09-27 at 12:23 +0200, Stephen Hemminger wrote: > On Thu, 27 Sep 2018 10:57:05 +0200 > Mohammed Gamal wrote: > > > On Wed, 2018-09-26 at 17:13 +, Haiyang Zhang wrote: > > > > -Original Message- > > > > From: Mohammed Gamal > > > > Sent: Wednesday, September 26, 2018 12:34 PM > > > > To: Stephen Hemminger ; net...@vger.ker > > > > nel. > > > > org > > > > Cc: KY Srinivasan ; Haiyang Zhang > > > > ; vkuznets ; > > > > ot...@redhat.com; cavery ; linux- > > > > ker...@vger.kernel.org; de...@linuxdriverproject.org; Mohammed > > > > Gamal > > > > > > > > Subject: [PATCH] hv_netvsc: Make sure out channel is fully > > > > opened > > > > on send > > > > > > > > Dring high network traffic changes to network interface > > > > parameters > > > > such as > > > > number of channels or MTU can cause a kernel panic with a NULL > > > > pointer > > > > dereference. This is due to netvsc_device_remove() being called > > > > and > > > > deallocating the channel ring buffers, which can then be > > > > accessed > > > > by > > > > netvsc_send_pkt() before they're allocated on calling > > > > netvsc_device_add() > > > > > > > > The patch fixes this problem by checking the channel state and > > > > returning > > > > ENODEV if not yet opened. We also move the call to > > > > hv_ringbuf_avail_percent() > > > > which may access the uninitialized ring buffer. > > > > > > > > Signed-off-by: Mohammed Gamal > > > > --- > > > > drivers/net/hyperv/netvsc.c | 7 ++- > > > > 1 file changed, 6 insertions(+), 1 deletion(-) > > > > > > > > diff --git a/drivers/net/hyperv/netvsc.c > > > > b/drivers/net/hyperv/netvsc.c index > > > > fe01e14..75f1b31 100644 > > > > --- a/drivers/net/hyperv/netvsc.c > > > > +++ b/drivers/net/hyperv/netvsc.c > > > > @@ -825,7 +825,12 @@ static inline int netvsc_send_pkt( > > > > struct netdev_queue *txq = netdev_get_tx_queue(ndev, > > > > packet->q_idx); > > > > u64 req_id; > > > > int ret; > > > > - u32 ring_avail = > > > > hv_get_avail_to_write_percent(&out_channel- > > > > > outbound); > > > > > > > > + u32 ring_avail; > > > > + > > > > + if (out_channel->state != CHANNEL_OPENED_STATE) > > > > + return -ENODEV; > > > > + > > > > + ring_avail = > > > > hv_get_avail_to_write_percent(&out_channel- > > > > > outbound); > > > > > > When you reproducing the NULL ptr panic, does your kernel include > > > the > > > following patch? > > > hv_netvsc: common detach logic > > > https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.g > > > it/c > > > ommit/?id=7b2ee50c0cd513a176a26a71f2989facdd75bfea > > > > > > > Yes it is included. And the commit did reduce the occurrence of > > this > > race condition, but it still nevertheless occurs albeit rarely. > > > > > We call netif_tx_disable(ndev) and netif_device_detach(ndev) > > > before > > > doing the changes > > > on MTU or #channels. So there should be no call to start_xmit() > > > when > > > channel is not ready. > > > > > > If you see the check for CHANNEL_OPENED_STATE is still necessary > > > on > > > upstream kernel (including > > > the patch " common detach logic "), we should debug further on > > > the > > > code and find out the > > > root cause. > > > > > > Thanks, > > > - Haiyang > > > > > > > ___ > > devel mailing list > > de...@linuxdriverproject.org > > http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev- > > devel > > Is there some workload, that can be used to reproduce this? > The stress test from Vitaly with changing parameters while running > network traffic > passes now. > > Can you reproduce this with the upstream current kernel? > > Adding the check in start xmit is still racy, and won't cure the > problem. > > Another solution would be to add a grace period in the netvsc detach > logic. > Steps to reproduce are listed here: https://bugzilla.redhat.com/show_bug.cgi?id=1632653 We've also managed to reproduce the same issue upstream. It's more likely to be reproduced on Windows 2012R2 than 2016. Regards, Mohammed ___ devel mailing list de...@linuxdriverproject.org http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
Re: [PATCH] hv_netvsc: Make sure out channel is fully opened on send
On Thu, 27 Sep 2018 10:57:05 +0200 Mohammed Gamal wrote: > On Wed, 2018-09-26 at 17:13 +, Haiyang Zhang wrote: > > > -Original Message- > > > From: Mohammed Gamal > > > Sent: Wednesday, September 26, 2018 12:34 PM > > > To: Stephen Hemminger ; netdev@vger.kernel. > > > org > > > Cc: KY Srinivasan ; Haiyang Zhang > > > ; vkuznets ; > > > ot...@redhat.com; cavery ; linux- > > > ker...@vger.kernel.org; de...@linuxdriverproject.org; Mohammed > > > Gamal > > > > > > Subject: [PATCH] hv_netvsc: Make sure out channel is fully opened > > > on send > > > > > > Dring high network traffic changes to network interface parameters > > > such as > > > number of channels or MTU can cause a kernel panic with a NULL > > > pointer > > > dereference. This is due to netvsc_device_remove() being called and > > > deallocating the channel ring buffers, which can then be accessed > > > by > > > netvsc_send_pkt() before they're allocated on calling > > > netvsc_device_add() > > > > > > The patch fixes this problem by checking the channel state and > > > returning > > > ENODEV if not yet opened. We also move the call to > > > hv_ringbuf_avail_percent() > > > which may access the uninitialized ring buffer. > > > > > > Signed-off-by: Mohammed Gamal > > > --- > > > drivers/net/hyperv/netvsc.c | 7 ++- > > > 1 file changed, 6 insertions(+), 1 deletion(-) > > > > > > diff --git a/drivers/net/hyperv/netvsc.c > > > b/drivers/net/hyperv/netvsc.c index > > > fe01e14..75f1b31 100644 > > > --- a/drivers/net/hyperv/netvsc.c > > > +++ b/drivers/net/hyperv/netvsc.c > > > @@ -825,7 +825,12 @@ static inline int netvsc_send_pkt( > > > struct netdev_queue *txq = netdev_get_tx_queue(ndev, > > > packet->q_idx); > > > u64 req_id; > > > int ret; > > > - u32 ring_avail = > > > hv_get_avail_to_write_percent(&out_channel- > > > > outbound); > > > > > > + u32 ring_avail; > > > + > > > + if (out_channel->state != CHANNEL_OPENED_STATE) > > > + return -ENODEV; > > > + > > > + ring_avail = hv_get_avail_to_write_percent(&out_channel- > > > >outbound); > > > > When you reproducing the NULL ptr panic, does your kernel include the > > following patch? > > hv_netvsc: common detach logic > > https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/c > > ommit/?id=7b2ee50c0cd513a176a26a71f2989facdd75bfea > > > Yes it is included. And the commit did reduce the occurrence of this > race condition, but it still nevertheless occurs albeit rarely. > > > We call netif_tx_disable(ndev) and netif_device_detach(ndev) before > > doing the changes > > on MTU or #channels. So there should be no call to start_xmit() when > > channel is not ready. > > > > If you see the check for CHANNEL_OPENED_STATE is still necessary on > > upstream kernel (including > > the patch " common detach logic "), we should debug further on the > > code and find out the > > root cause. > > > > Thanks, > > - Haiyang > > > ___ > devel mailing list > de...@linuxdriverproject.org > http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel Is there some workload, that can be used to reproduce this? The stress test from Vitaly with changing parameters while running network traffic passes now. Can you reproduce this with the upstream current kernel? Adding the check in start xmit is still racy, and won't cure the problem. Another solution would be to add a grace period in the netvsc detach logic. ___ devel mailing list de...@linuxdriverproject.org http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
Re: [PATCH] hv_netvsc: Make sure out channel is fully opened on send
On Wed, 2018-09-26 at 17:13 +, Haiyang Zhang wrote: > > -Original Message- > > From: Mohammed Gamal > > Sent: Wednesday, September 26, 2018 12:34 PM > > To: Stephen Hemminger ; netdev@vger.kernel. > > org > > Cc: KY Srinivasan ; Haiyang Zhang > > ; vkuznets ; > > ot...@redhat.com; cavery ; linux- > > ker...@vger.kernel.org; de...@linuxdriverproject.org; Mohammed > > Gamal > > > > Subject: [PATCH] hv_netvsc: Make sure out channel is fully opened > > on send > > > > Dring high network traffic changes to network interface parameters > > such as > > number of channels or MTU can cause a kernel panic with a NULL > > pointer > > dereference. This is due to netvsc_device_remove() being called and > > deallocating the channel ring buffers, which can then be accessed > > by > > netvsc_send_pkt() before they're allocated on calling > > netvsc_device_add() > > > > The patch fixes this problem by checking the channel state and > > returning > > ENODEV if not yet opened. We also move the call to > > hv_ringbuf_avail_percent() > > which may access the uninitialized ring buffer. > > > > Signed-off-by: Mohammed Gamal > > --- > > drivers/net/hyperv/netvsc.c | 7 ++- > > 1 file changed, 6 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/net/hyperv/netvsc.c > > b/drivers/net/hyperv/netvsc.c index > > fe01e14..75f1b31 100644 > > --- a/drivers/net/hyperv/netvsc.c > > +++ b/drivers/net/hyperv/netvsc.c > > @@ -825,7 +825,12 @@ static inline int netvsc_send_pkt( > > struct netdev_queue *txq = netdev_get_tx_queue(ndev, > > packet->q_idx); > > u64 req_id; > > int ret; > > - u32 ring_avail = > > hv_get_avail_to_write_percent(&out_channel- > > > outbound); > > > > + u32 ring_avail; > > + > > + if (out_channel->state != CHANNEL_OPENED_STATE) > > + return -ENODEV; > > + > > + ring_avail = hv_get_avail_to_write_percent(&out_channel- > > >outbound); > > When you reproducing the NULL ptr panic, does your kernel include the > following patch? > hv_netvsc: common detach logic > https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/c > ommit/?id=7b2ee50c0cd513a176a26a71f2989facdd75bfea > Yes it is included. And the commit did reduce the occurrence of this race condition, but it still nevertheless occurs albeit rarely. > We call netif_tx_disable(ndev) and netif_device_detach(ndev) before > doing the changes > on MTU or #channels. So there should be no call to start_xmit() when > channel is not ready. > > If you see the check for CHANNEL_OPENED_STATE is still necessary on > upstream kernel (including > the patch " common detach logic "), we should debug further on the > code and find out the > root cause. > > Thanks, > - Haiyang > ___ devel mailing list de...@linuxdriverproject.org http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
RE: [PATCH] hv_netvsc: Make sure out channel is fully opened on send
> -Original Message- > From: Mohammed Gamal > Sent: Wednesday, September 26, 2018 12:34 PM > To: Stephen Hemminger ; net...@vger.kernel.org > Cc: KY Srinivasan ; Haiyang Zhang > ; vkuznets ; > ot...@redhat.com; cavery ; linux- > ker...@vger.kernel.org; de...@linuxdriverproject.org; Mohammed Gamal > > Subject: [PATCH] hv_netvsc: Make sure out channel is fully opened on send > > Dring high network traffic changes to network interface parameters such as > number of channels or MTU can cause a kernel panic with a NULL pointer > dereference. This is due to netvsc_device_remove() being called and > deallocating the channel ring buffers, which can then be accessed by > netvsc_send_pkt() before they're allocated on calling > netvsc_device_add() > > The patch fixes this problem by checking the channel state and returning > ENODEV if not yet opened. We also move the call to hv_ringbuf_avail_percent() > which may access the uninitialized ring buffer. > > Signed-off-by: Mohammed Gamal > --- > drivers/net/hyperv/netvsc.c | 7 ++- > 1 file changed, 6 insertions(+), 1 deletion(-) > > diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c index > fe01e14..75f1b31 100644 > --- a/drivers/net/hyperv/netvsc.c > +++ b/drivers/net/hyperv/netvsc.c > @@ -825,7 +825,12 @@ static inline int netvsc_send_pkt( > struct netdev_queue *txq = netdev_get_tx_queue(ndev, packet->q_idx); > u64 req_id; > int ret; > - u32 ring_avail = hv_get_avail_to_write_percent(&out_channel- > >outbound); > + u32 ring_avail; > + > + if (out_channel->state != CHANNEL_OPENED_STATE) > + return -ENODEV; > + > + ring_avail = hv_get_avail_to_write_percent(&out_channel->outbound); When you reproducing the NULL ptr panic, does your kernel include the following patch? hv_netvsc: common detach logic https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=7b2ee50c0cd513a176a26a71f2989facdd75bfea We call netif_tx_disable(ndev) and netif_device_detach(ndev) before doing the changes on MTU or #channels. So there should be no call to start_xmit() when channel is not ready. If you see the check for CHANNEL_OPENED_STATE is still necessary on upstream kernel (including the patch " common detach logic "), we should debug further on the code and find out the root cause. Thanks, - Haiyang ___ devel mailing list de...@linuxdriverproject.org http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
Re: [PATCH] hv_netvsc: Make sure out channel is fully opened on send
From: Mohammed Gamal Date: Tue, 13 Mar 2018 20:06:50 +0100 > Dring high network traffic changes to network interface parameters > such as number of channels or MTU can cause a kernel panic with a NULL > pointer dereference. This is due to netvsc_device_remove() being > called and deallocating the channel ring buffers, which can then be > accessed by netvsc_send_pkt() before they're allocated on calling > netvsc_device_add() > > The patch fixes this problem by checking the channel state and returning > ENODEV if not yet opened. We also move the call to hv_ringbuf_avail_percent() > which may access the uninitialized ring buffer. > > Signed-off-by: Mohammed Gamal Based upon the discusion on this patch, it looks like this will be fixed in some other way. ___ devel mailing list de...@linuxdriverproject.org http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
Re: [PATCH] hv_netvsc: Make sure out channel is fully opened on send
On Thu, 15 Mar 2018 17:24:13 +0100 Mohammed Gamal wrote: > On Wed, 2018-03-14 at 10:22 +0100, Mohammed Gamal wrote: > > On Tue, 2018-03-13 at 12:35 -0700, Stephen Hemminger wrote: > > > On Tue, 13 Mar 2018 20:06:50 +0100 > > > Mohammed Gamal wrote: > > > > > > > Dring high network traffic changes to network interface > > > > parameters > > > > such as number of channels or MTU can cause a kernel panic with a > > > > NULL > > > > pointer dereference. This is due to netvsc_device_remove() being > > > > called and deallocating the channel ring buffers, which can then > > > > be > > > > accessed by netvsc_send_pkt() before they're allocated on calling > > > > netvsc_device_add() > > > > > > > > The patch fixes this problem by checking the channel state and > > > > returning > > > > ENODEV if not yet opened. We also move the call to > > > > hv_ringbuf_avail_percent() > > > > which may access the uninitialized ring buffer. > > > > > > > > Signed-off-by: Mohammed Gamal > > > > --- > > > > drivers/net/hyperv/netvsc.c | 5 +++-- > > > > 1 file changed, 3 insertions(+), 2 deletions(-) > > > > > > > > diff --git a/drivers/net/hyperv/netvsc.c > > > > b/drivers/net/hyperv/netvsc.c > > > > index 0265d70..44a8358 100644 > > > > --- a/drivers/net/hyperv/netvsc.c > > > > +++ b/drivers/net/hyperv/netvsc.c > > > > @@ -757,7 +757,7 @@ static inline int netvsc_send_pkt( > > > > struct netdev_queue *txq = netdev_get_tx_queue(ndev, > > > > packet->q_idx); > > > > u64 req_id; > > > > int ret; > > > > - u32 ring_avail = hv_ringbuf_avail_percent(&out_channel- > > > > > outbound); > > > > > > > > + u32 ring_avail; > > > > > > > > nvmsg.hdr.msg_type = NVSP_MSG1_TYPE_SEND_RNDIS_PKT; > > > > if (skb) > > > > @@ -773,7 +773,7 @@ static inline int netvsc_send_pkt( > > > > > > > > req_id = (ulong)skb; > > > > > > > > - if (out_channel->rescind) > > > > + if (out_channel->rescind || out_channel->state != > > > > CHANNEL_OPENED_STATE) > > > > return -ENODEV; > > > > > > > > if (packet->page_buf_cnt) { > > > > @@ -791,6 +791,7 @@ static inline int netvsc_send_pkt( > > > > VMBUS_DATA_PACKET_FLAG_CO > > > > MP > > > > LETION_REQUESTED); > > > > } > > > > > > > > + ring_avail = hv_ringbuf_avail_percent(&out_channel- > > > > > outbound); > > > > > > > > if (ret == 0) { > > > > atomic_inc_return(&nvchan->queue_sends); > > > > > > > > > > Thanks for your patch. Yes there are races with the current update > > > logic. The root cause goes higher up in the flow; the send queues > > > should > > > be stopped before netvsc_device_remove is called. Solving it where > > > you tried > > > to is racy and not going to work reliably. > > > > > > Network patches should go to net...@vger.kernel.org > > > > > > You can't move the ring_avail check until after the > > > vmbus_sendpacket > > > because > > > that will break the flow control logic. > > > > > > > Why? I don't see ring_avail being used before that point. > > Ah, stupid me. vmbus_sendpacket() will write to the ring buffer and > that means that ring_avail value will be different than the expected. > > > > > > Instead, you should just move the avail_read check until just after > > > the existing rescind > > > check. > > > > > > Also, you shouldn't need to check for OPENED_STATE, just rescind is > > > enough. > > > > That rarely mitigated the race. channel->rescind flag is set on vmbus > > exit - called on module unload - and when a rescind offer is received > > from the host, which AFAICT doesn't happen on every call to > > netvsc_device_remove, so it's quite possible that the ringbuffer is > > accessed before it's allocated again on channel open and hence the > > check for OPENED_STAT - which is only set after all vmbus data is > > initialized. > > > > Perhaps I haven't been clear enough. The NULL pointer dereference > happens in the call to hv_ringbuf_avail_percent() which is used to > calculate ring_avail. > > So we need to stop the queues before calling it if the channel's ring > buffers haven't been allocated yet, but OTOH we should only stop the > queues based upon the value of ring_avail, so this leads into a chicken > and egg situation. > > Is my observation here correct? Please correct me if I am wrong, > Stephen. This is a far more drastic work of the shutdown logic which I am still working on. The current netvsc driver is not doing a good job of handling concurrent send/receives during ring changes. From a22da18b41ad5024340dddcc989d918987836f6d Mon Sep 17 00:00:00 2001 From: Stephen Hemminger Date: Tue, 6 Feb 2018 15:05:19 -0800 Subject: [PATCH] hv_netvsc: common detach logic Make common function for detaching internals of device during changes to MTU and RSS. Make sure no more packets are transmitted and all packets have been received before doing device t
Re: [PATCH] hv_netvsc: Make sure out channel is fully opened on send
On Wed, 2018-03-14 at 10:22 +0100, Mohammed Gamal wrote: > On Tue, 2018-03-13 at 12:35 -0700, Stephen Hemminger wrote: > > On Tue, 13 Mar 2018 20:06:50 +0100 > > Mohammed Gamal wrote: > > > > > Dring high network traffic changes to network interface > > > parameters > > > such as number of channels or MTU can cause a kernel panic with a > > > NULL > > > pointer dereference. This is due to netvsc_device_remove() being > > > called and deallocating the channel ring buffers, which can then > > > be > > > accessed by netvsc_send_pkt() before they're allocated on calling > > > netvsc_device_add() > > > > > > The patch fixes this problem by checking the channel state and > > > returning > > > ENODEV if not yet opened. We also move the call to > > > hv_ringbuf_avail_percent() > > > which may access the uninitialized ring buffer. > > > > > > Signed-off-by: Mohammed Gamal > > > --- > > > drivers/net/hyperv/netvsc.c | 5 +++-- > > > 1 file changed, 3 insertions(+), 2 deletions(-) > > > > > > diff --git a/drivers/net/hyperv/netvsc.c > > > b/drivers/net/hyperv/netvsc.c > > > index 0265d70..44a8358 100644 > > > --- a/drivers/net/hyperv/netvsc.c > > > +++ b/drivers/net/hyperv/netvsc.c > > > @@ -757,7 +757,7 @@ static inline int netvsc_send_pkt( > > > struct netdev_queue *txq = netdev_get_tx_queue(ndev, > > > packet->q_idx); > > > u64 req_id; > > > int ret; > > > - u32 ring_avail = hv_ringbuf_avail_percent(&out_channel- > > > > outbound); > > > > > > + u32 ring_avail; > > > > > > nvmsg.hdr.msg_type = NVSP_MSG1_TYPE_SEND_RNDIS_PKT; > > > if (skb) > > > @@ -773,7 +773,7 @@ static inline int netvsc_send_pkt( > > > > > > req_id = (ulong)skb; > > > > > > - if (out_channel->rescind) > > > + if (out_channel->rescind || out_channel->state != > > > CHANNEL_OPENED_STATE) > > > return -ENODEV; > > > > > > if (packet->page_buf_cnt) { > > > @@ -791,6 +791,7 @@ static inline int netvsc_send_pkt( > > > VMBUS_DATA_PACKET_FLAG_CO > > > MP > > > LETION_REQUESTED); > > > } > > > > > > + ring_avail = hv_ringbuf_avail_percent(&out_channel- > > > > outbound); > > > > > > if (ret == 0) { > > > atomic_inc_return(&nvchan->queue_sends); > > > > > > > Thanks for your patch. Yes there are races with the current update > > logic. The root cause goes higher up in the flow; the send queues > > should > > be stopped before netvsc_device_remove is called. Solving it where > > you tried > > to is racy and not going to work reliably. > > > > Network patches should go to net...@vger.kernel.org > > > > You can't move the ring_avail check until after the > > vmbus_sendpacket > > because > > that will break the flow control logic. > > > > Why? I don't see ring_avail being used before that point. Ah, stupid me. vmbus_sendpacket() will write to the ring buffer and that means that ring_avail value will be different than the expected. > > > Instead, you should just move the avail_read check until just after > > the existing rescind > > check. > > > > Also, you shouldn't need to check for OPENED_STATE, just rescind is > > enough. > > That rarely mitigated the race. channel->rescind flag is set on vmbus > exit - called on module unload - and when a rescind offer is received > from the host, which AFAICT doesn't happen on every call to > netvsc_device_remove, so it's quite possible that the ringbuffer is > accessed before it's allocated again on channel open and hence the > check for OPENED_STAT - which is only set after all vmbus data is > initialized. > Perhaps I haven't been clear enough. The NULL pointer dereference happens in the call to hv_ringbuf_avail_percent() which is used to calculate ring_avail. So we need to stop the queues before calling it if the channel's ring buffers haven't been allocated yet, but OTOH we should only stop the queues based upon the value of ring_avail, so this leads into a chicken and egg situation. Is my observation here correct? Please correct me if I am wrong, Stephen. ___ devel mailing list de...@linuxdriverproject.org http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
Re: [PATCH] hv_netvsc: Make sure out channel is fully opened on send
On Tue, 2018-03-13 at 12:35 -0700, Stephen Hemminger wrote: > On Tue, 13 Mar 2018 20:06:50 +0100 > Mohammed Gamal wrote: > > > Dring high network traffic changes to network interface parameters > > such as number of channels or MTU can cause a kernel panic with a > > NULL > > pointer dereference. This is due to netvsc_device_remove() being > > called and deallocating the channel ring buffers, which can then be > > accessed by netvsc_send_pkt() before they're allocated on calling > > netvsc_device_add() > > > > The patch fixes this problem by checking the channel state and > > returning > > ENODEV if not yet opened. We also move the call to > > hv_ringbuf_avail_percent() > > which may access the uninitialized ring buffer. > > > > Signed-off-by: Mohammed Gamal > > --- > > drivers/net/hyperv/netvsc.c | 5 +++-- > > 1 file changed, 3 insertions(+), 2 deletions(-) > > > > diff --git a/drivers/net/hyperv/netvsc.c > > b/drivers/net/hyperv/netvsc.c > > index 0265d70..44a8358 100644 > > --- a/drivers/net/hyperv/netvsc.c > > +++ b/drivers/net/hyperv/netvsc.c > > @@ -757,7 +757,7 @@ static inline int netvsc_send_pkt( > > struct netdev_queue *txq = netdev_get_tx_queue(ndev, > > packet->q_idx); > > u64 req_id; > > int ret; > > - u32 ring_avail = hv_ringbuf_avail_percent(&out_channel- > > >outbound); > > + u32 ring_avail; > > > > nvmsg.hdr.msg_type = NVSP_MSG1_TYPE_SEND_RNDIS_PKT; > > if (skb) > > @@ -773,7 +773,7 @@ static inline int netvsc_send_pkt( > > > > req_id = (ulong)skb; > > > > - if (out_channel->rescind) > > + if (out_channel->rescind || out_channel->state != > > CHANNEL_OPENED_STATE) > > return -ENODEV; > > > > if (packet->page_buf_cnt) { > > @@ -791,6 +791,7 @@ static inline int netvsc_send_pkt( > > VMBUS_DATA_PACKET_FLAG_COMP > > LETION_REQUESTED); > > } > > > > + ring_avail = hv_ringbuf_avail_percent(&out_channel- > > >outbound); > > if (ret == 0) { > > atomic_inc_return(&nvchan->queue_sends); > > > > Thanks for your patch. Yes there are races with the current update > logic. The root cause goes higher up in the flow; the send queues > should > be stopped before netvsc_device_remove is called. Solving it where > you tried > to is racy and not going to work reliably. > > Network patches should go to net...@vger.kernel.org > > You can't move the ring_avail check until after the vmbus_sendpacket > because > that will break the flow control logic. > Why? I don't see ring_avail being used before that point. > Instead, you should just move the avail_read check until just after > the existing rescind > check. > > Also, you shouldn't need to check for OPENED_STATE, just rescind is > enough. That rarely mitigated the race. channel->rescind flag is set on vmbus exit - called on module unload - and when a rescind offer is received from the host, which AFAICT doesn't happen on every call to netvsc_device_remove, so it's quite possible that the ringbuffer is accessed before it's allocated again on channel open and hence the check for OPENED_STAT - which is only set after all vmbus data is initialized. > > > > ___ devel mailing list de...@linuxdriverproject.org http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
Re: [PATCH] hv_netvsc: Make sure out channel is fully opened on send
On Tue, Mar 13, 2018 at 08:06:50PM +0100, Mohammed Gamal wrote: > @@ -791,6 +791,7 @@ static inline int netvsc_send_pkt( > > VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED); > } > > + ring_avail = hv_ringbuf_avail_percent(&out_channel->outbound); > if (ret == 0) { > atomic_inc_return(&nvchan->queue_sends); > Could you move the assignment inside the "ret == 0" path closer to where it's used? regards, dan carpenter ___ devel mailing list de...@linuxdriverproject.org http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
Re: [PATCH] hv_netvsc: Make sure out channel is fully opened on send
On Tue, 13 Mar 2018 20:06:50 +0100 Mohammed Gamal wrote: > Dring high network traffic changes to network interface parameters > such as number of channels or MTU can cause a kernel panic with a NULL > pointer dereference. This is due to netvsc_device_remove() being > called and deallocating the channel ring buffers, which can then be > accessed by netvsc_send_pkt() before they're allocated on calling > netvsc_device_add() > > The patch fixes this problem by checking the channel state and returning > ENODEV if not yet opened. We also move the call to hv_ringbuf_avail_percent() > which may access the uninitialized ring buffer. > > Signed-off-by: Mohammed Gamal > --- > drivers/net/hyperv/netvsc.c | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) > > diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c > index 0265d70..44a8358 100644 > --- a/drivers/net/hyperv/netvsc.c > +++ b/drivers/net/hyperv/netvsc.c > @@ -757,7 +757,7 @@ static inline int netvsc_send_pkt( > struct netdev_queue *txq = netdev_get_tx_queue(ndev, packet->q_idx); > u64 req_id; > int ret; > - u32 ring_avail = hv_ringbuf_avail_percent(&out_channel->outbound); > + u32 ring_avail; > > nvmsg.hdr.msg_type = NVSP_MSG1_TYPE_SEND_RNDIS_PKT; > if (skb) > @@ -773,7 +773,7 @@ static inline int netvsc_send_pkt( > > req_id = (ulong)skb; > > - if (out_channel->rescind) > + if (out_channel->rescind || out_channel->state != CHANNEL_OPENED_STATE) > return -ENODEV; > > if (packet->page_buf_cnt) { > @@ -791,6 +791,7 @@ static inline int netvsc_send_pkt( > > VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED); > } > > + ring_avail = hv_ringbuf_avail_percent(&out_channel->outbound); > if (ret == 0) { > atomic_inc_return(&nvchan->queue_sends); > Thanks for your patch. Yes there are races with the current update logic. The root cause goes higher up in the flow; the send queues should be stopped before netvsc_device_remove is called. Solving it where you tried to is racy and not going to work reliably. Network patches should go to net...@vger.kernel.org You can't move the ring_avail check until after the vmbus_sendpacket because that will break the flow control logic. Instead, you should just move the avail_read check until just after the existing rescind check. Also, you shouldn't need to check for OPENED_STATE, just rescind is enough. ___ devel mailing list de...@linuxdriverproject.org http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel