Re: [PATCH net-next,2/2] hv_netvsc: Add range checking for rx packet offset and length

2018-03-23 Thread Vitaly Kuznetsov
Haiyang Zhang  writes:

> From: Haiyang Zhang 
>
> This patch adds range checking for rx packet offset and length.
> It may only happen if there is a host side bug.
>
> Signed-off-by: Haiyang Zhang 
> ---
>  drivers/net/hyperv/hyperv_net.h |  1 +
>  drivers/net/hyperv/netvsc.c | 17 +++--
>  2 files changed, 16 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
> index 0db3bd1ea06f..49c05ac894e5 100644
> --- a/drivers/net/hyperv/hyperv_net.h
> +++ b/drivers/net/hyperv/hyperv_net.h
> @@ -793,6 +793,7 @@ struct netvsc_device {
>
>   /* Receive buffer allocated by us but manages by NetVSP */
>   void *recv_buf;
> + u32 recv_buf_size; /* allocated bytes */
>   u32 recv_buf_gpadl_handle;
>   u32 recv_section_cnt;
>   u32 recv_section_size;
> diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
> index 1ddb2c39b6e4..a6700d65f206 100644
> --- a/drivers/net/hyperv/netvsc.c
> +++ b/drivers/net/hyperv/netvsc.c
> @@ -289,6 +289,8 @@ static int netvsc_init_buf(struct hv_device *device,
>   goto cleanup;
>   }
>
> + net_device->recv_buf_size = buf_size;
> +
>   /*
>* Establish the gpadl handle for this buffer on this
>* channel.  Note: This call uses the vmbus connection rather
> @@ -1095,11 +1097,22 @@ static int netvsc_receive(struct net_device *ndev,
>
>   /* Each range represents 1 RNDIS pkt that contains 1 ethernet frame */
>   for (i = 0; i < count; i++) {
> - void *data = recv_buf
> - + vmxferpage_packet->ranges[i].byte_offset;
> + u32 offset = vmxferpage_packet->ranges[i].byte_offset;
>   u32 buflen = vmxferpage_packet->ranges[i].byte_count;
> + void *data;
>   int ret;
>
> + if (unlikely(offset + buflen > net_device->recv_buf_size)) {
> + status = NVSP_STAT_FAIL;
> + netif_err(net_device_ctx, rx_err, ndev,
> +   "Packet offset:%u + len:%u too big\n",
> +   offset, buflen);

This shouldn't happen, of course, but I'd rather ratelimit this error or
even used something like netdev_WARN_ONCE().

> +
> + continue;
> + }
> +
> + data = recv_buf + offset;
> +
>   trace_rndis_recv(ndev, q_idx, data);
>
>   /* Pass it to the upper layer */

-- 
  Vitaly


Re: [RFC 0/2] hv_netvsc shutdown redo

2018-01-27 Thread Vitaly Kuznetsov
Stephen Hemminger  writes:

> These patches change how teardown of Hyper-V network devices
> is done. These are tested on WS2012 and WS2016.
>
> It moves the tx/rx shutdown into the rndis close handling,
> and that makes earlier gpadl changes unnecsssary.
>

Thank you Stephen,

I gave these a try and they didn't survive my 'death row' test on
WS2016: I run 3 things in parallel:

1) iperf to some external IP
2) while true; do ethtool -L ethX combined 6; ethtool -L ethX combined 8; done
3) while true; do ip link set dev ethX mtu 1400; ip link set dev ethX mtu 1450; 
done

I ended up with a hang:

[ 1226.710034] INFO: task ip:2357 blocked for more than 120 seconds.
[ 1226.712397]   Not tainted 4.15.0-rc9+ #321
[ 1226.714030] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[ 1226.716724] ip  D0  2357   1474 0x
[ 1226.718698] Call Trace:
[ 1226.719588]  ? __schedule+0x1da/0x7b0
[ 1226.720910]  ? get_page_from_freelist+0x106d/0x15c0
[ 1226.722648]  schedule+0x28/0x80
[ 1226.723807]  schedule_preempt_disabled+0xa/0x10
[ 1226.725952]  __mutex_lock.isra.1+0x1a0/0x4e0
[ 1226.727915]  ? rtnetlink_rcv_msg+0x212/0x2d0
[ 1226.729849]  rtnetlink_rcv_msg+0x212/0x2d0
[ 1226.731611]  ? rtnl_calcit.isra.28+0x110/0x110
[ 1226.733824]  netlink_rcv_skb+0x4a/0x120
[ 1226.736916]  netlink_unicast+0x19d/0x250
[ 1226.738907]  netlink_sendmsg+0x2a5/0x3a0
[ 1226.740762]  sock_sendmsg+0x30/0x40
[ 1226.742552]  SYSC_sendto+0x10e/0x140
[ 1226.744310]  ? __do_page_fault+0x26d/0x4c0
[ 1226.746332]  entry_SYSCALL_64_fastpath+0x20/0x83
[ 1226.748730] RIP: 0033:0x7ff2cdc9aa7d
[ 1226.750776] RSP: 002b:7ffd0a3455e8 EFLAGS: 0246
[ 1349.590041] INFO: task kworker/3:6:1586 blocked for more than 120 seconds.
[ 1349.595358]   Not tainted 4.15.0-rc9+ #321
[ 1349.597335] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[ 1349.600638] kworker/3:6 D0  1586  2 0x8000
[ 1349.603335] Workqueue: ipv6_addrconf addrconf_verify_work
[ 1349.605779] Call Trace:
[ 1349.607080]  ? __schedule+0x1da/0x7b0
[ 1349.608856]  ? update_load_avg+0x563/0x6d0
[ 1349.610834]  ? update_curr+0xb9/0x190
[ 1349.613050]  schedule+0x28/0x80
[ 1349.615290]  schedule_preempt_disabled+0xa/0x10
[ 1349.617306]  __mutex_lock.isra.1+0x1a0/0x4e0
[ 1349.619072]  ? addrconf_verify_work+0xa/0x20
[ 1349.621108]  addrconf_verify_work+0xa/0x20
[ 1349.623107]  process_one_work+0x188/0x380
[ 1349.625012]  worker_thread+0x2e/0x390
[ 1349.626976]  ? process_one_work+0x380/0x380
[ 1349.628925]  kthread+0x111/0x130
[ 1349.630498]  ? kthread_create_worker_on_cpu+0x70/0x70
[ 1349.632786]  ? do_group_exit+0x3a/0xa0
[ 1349.634598]  ret_from_fork+0x35/0x40



(I'm not 100% sure this is a _new_ issue btw, it can happen that the
race was always there and it's just easier to trigger it now).

I'll try to do more testing next week.

Thanks,

-- 
  Vitaly


Re: [RFC PATCH] virtio_net: Extend virtio to use VF datapath when available

2017-12-20 Thread Vitaly Kuznetsov
David Miller  writes:

> From: "Samudrala, Sridhar" 
> Date: Tue, 19 Dec 2017 09:41:39 -0800
>
>> This is based on netvsc implementation and here is the commit that
>> added this delay.  Not sure if this needs to be 100ms.
>> 
>> commit 6123c66854c174e4982f98195100c1d990f9e5e6
>> Author: stephen hemminger 
>> Date:   Wed Aug 9 17:46:03 2017 -0700
>> 
>>     netvsc: delay setup of VF device
>> 
>>     When VF device is discovered, delay bring it automatically up in
>>     order to allow userspace to some simple changes (like renaming).
>
> This is kind of bogus, I should have called this out when the patch
> was posted.
>
> Any delay is wrong, there needs to be tight synchronization if a
> userspace operation must occur before proceeding.  If something
> happens and userspace is delayed, this whole thing doesn't work.

Hyper-V's netvsc does exactly the same and when this was first discussed
I suggested to allow renaming of IFF_UP interfaces:
https://patchwork.kernel.org/patch/9890299/

but as far as I understand it was decided that it's too risky and not
worth it. Maybe we need to reconsider this, at least for slave
interfaces (as scripts are not supposed to operate on them).

-- 
  Vitaly


[PATCH net v2] hv_netvsc: preserve hw_features on mtu/channels/ringparam changes

2017-11-15 Thread Vitaly Kuznetsov
rndis_filter_device_add() is called both from netvsc_probe() when we
initially create the device and from set channels/mtu/ringparam
routines where we basically remove the device and add it back.

hw_features is reset in rndis_filter_device_add() and filled with
host data. However, we lose all additional flags which are set outside
of the driver, e.g. register_netdevice() adds NETIF_F_SOFT_FEATURES and
many others.

Unfortunately, calls to rndis_{query_hwcaps(), _set_offload_params()}
calls cannot be avoided on every RNDIS reset: host expects us to set
required features explicitly. Moreover, in theory hardware capabilities
can change and we need to reflect the change in hw_features.

Reset net->hw_features bits according to host data in
rndis_netdev_set_hwcaps(), clear corresponding feature bits
from net->features in case some features went missing (will never happen
in real life I guess but let's be consistent).

Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
---
Changes since v1: call rndis_netdev_set_hwcaps() unconditionally; clear
net->hw_features flags we set and set them again based on host's data.
Clear missing features from net->features.

Unfortunately, not calling rndis_netdev_set_hwcaps() on mtu/channels/
ringparam changes is not an option: we still need to call
rndis_{query_hwcaps(),set_offload_params()} or we will lose features
[Haiyang Zhang]
---
 drivers/net/hyperv/hyperv_net.h   |   4 ++
 drivers/net/hyperv/netvsc_drv.c   |   2 +-
 drivers/net/hyperv/rndis_filter.c | 136 ++
 3 files changed, 83 insertions(+), 59 deletions(-)

diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index 5176be76ca7d..b6cbda2963ad 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -646,6 +646,10 @@ struct nvsp_message {
 #define NETVSC_RECEIVE_BUFFER_ID   0xcafe
 #define NETVSC_SEND_BUFFER_ID  0
 
+#define NETVSC_SUPPORTED_HW_FEATURES (NETIF_F_RXCSUM | NETIF_F_IP_CSUM | \
+ NETIF_F_TSO | NETIF_F_IPV6_CSUM | \
+ NETIF_F_TSO6)
+
 #define VRSS_SEND_TAB_SIZE 16  /* must be power of 2 */
 #define VRSS_CHANNEL_MAX 64
 #define VRSS_CHANNEL_DEFAULT 8
diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index a32ae02e1b6c..fc1d0b4bca22 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -1953,7 +1953,7 @@ static int netvsc_probe(struct hv_device *dev,
 
memcpy(net->dev_addr, device_info.mac_adr, ETH_ALEN);
 
-   /* hw_features computed in rndis_filter_device_add */
+   /* hw_features computed in rndis_netdev_set_hwcaps() */
net->features = net->hw_features |
NETIF_F_HIGHDMA | NETIF_F_SG |
NETIF_F_HW_VLAN_CTAG_TX | NETIF_F_HW_VLAN_CTAG_RX;
diff --git a/drivers/net/hyperv/rndis_filter.c 
b/drivers/net/hyperv/rndis_filter.c
index 065b204d8e17..cf1243347a0f 100644
--- a/drivers/net/hyperv/rndis_filter.c
+++ b/drivers/net/hyperv/rndis_filter.c
@@ -1128,69 +1128,20 @@ void rndis_set_subchannel(struct work_struct *w)
rtnl_unlock();
 }
 
-struct netvsc_device *rndis_filter_device_add(struct hv_device *dev,
- struct netvsc_device_info *device_info)
+static int rndis_netdev_set_hwcaps(struct rndis_device *rndis_device,
+  struct netvsc_device *nvdev)
 {
-   struct net_device *net = hv_get_drvdata(dev);
+   struct net_device *net = rndis_device->ndev;
struct net_device_context *net_device_ctx = netdev_priv(net);
-   struct netvsc_device *net_device;
-   struct rndis_device *rndis_device;
struct ndis_offload hwcaps;
struct ndis_offload_params offloads;
-   struct ndis_recv_scale_cap rsscap;
-   u32 rsscap_size = sizeof(struct ndis_recv_scale_cap);
unsigned int gso_max_size = GSO_MAX_SIZE;
-   u32 mtu, size;
-   const struct cpumask *node_cpu_mask;
-   u32 num_possible_rss_qs;
-   int i, ret;
-
-   rndis_device = get_rndis_device();
-   if (!rndis_device)
-   return ERR_PTR(-ENODEV);
-
-   /*
-* Let the inner driver handle this first to create the netvsc channel
-* NOTE! Once the channel is created, we may get a receive callback
-* (RndisFilterOnReceive()) before this call is completed
-*/
-   net_device = netvsc_device_add(dev, device_info);
-   if (IS_ERR(net_device)) {
-   kfree(rndis_device);
-   return net_device;
-   }
-
-   /* Initialize the rndis device */
-   net_device->max_chn = 1;
-   net_device->num_chn = 1;
-
-   net_device->extension = rndis_device;
-   rndis_device->ndev = net;
-
-   /* Send the rndis initialization message */
-   ret = rndis_filter_init_device(rndis_device, net_device);

Re: [PATCH net] hv_netvsc: preserve hw_features on mtu/channels/ringparam changes

2017-11-15 Thread Vitaly Kuznetsov
Haiyang Zhang <haiya...@microsoft.com> writes:

>> -Original Message-----
>> From: Vitaly Kuznetsov [mailto:vkuzn...@redhat.com]
>> Sent: Tuesday, November 14, 2017 11:58 AM
>> To: Stephen Hemminger <step...@networkplumber.org>
>> Cc: netdev@vger.kernel.org; linux-ker...@vger.kernel.org;
>> de...@linuxdriverproject.org; KY Srinivasan <k...@microsoft.com>; Haiyang
>> Zhang <haiya...@microsoft.com>; Stephen Hemminger
>> <sthem...@microsoft.com>; Mohammed Gamal <mmo...@redhat.com>
>> Subject: Re: [PATCH net] hv_netvsc: preserve hw_features on
>> mtu/channels/ringparam changes
>> 
>> Stephen Hemminger <step...@networkplumber.org> writes:
>> 
>> > On Tue, 14 Nov 2017 16:22:05 +0100
>> > Vitaly Kuznetsov <vkuzn...@redhat.com> wrote:
>> >
>> > Yes, this looks like a real issue.
>> >
>> >> + /* Query hardware capabilities if we're called from netvsc_probe() */
>> >> + if (!net->hw_features) {
>> >> + ret = rndis_netdev_set_hwcaps(net_device, rndis_device);
>> >> + if (ret != 0)
>> >> + goto err_dev_remv;
>> >> + }
>> >> +
>> >
>> > Rather than conditional behavior in rndis_filter_device_add, it would
>> > be cleaner to make the call to get hardware capabilities there.
>> >
>> > Please respin and make the query of host a separate function.
>> 
>> You mean call rndis_netdev_set_hwcaps() from netvsc_probe()? Will do.
>> 
>> One question though: in case we'll be avoiding
>> rndis_filter_set_offload_params() call on mtu/channels/ringparam changes -
>> - can we trust the host to preserve what was there before the RNDIS reset?
>> In case not we'll have to untangle what is
>> rndis_netdev_set_hwcaps() in my patch splitting it into two: hw_features
>> setup and rndis_filter_set_offload_params() and leaving the later in
>> rndis_filter_device_add().
>
> After remove/re-add RNDIS dev, you should pass the parameters to the host
> again.
>

Thanks, this changes a lot. I'll prepare v2.

-- 
  Vitaly


Re: [PATCH net] hv_netvsc: preserve hw_features on mtu/channels/ringparam changes

2017-11-14 Thread Vitaly Kuznetsov
Stephen Hemminger <step...@networkplumber.org> writes:

> On Tue, 14 Nov 2017 16:22:05 +0100
> Vitaly Kuznetsov <vkuzn...@redhat.com> wrote:
>
> Yes, this looks like a real issue.
>
>> +/* Query hardware capabilities if we're called from netvsc_probe() */
>> +if (!net->hw_features) {
>> +ret = rndis_netdev_set_hwcaps(net_device, rndis_device);
>> +if (ret != 0)
>> +goto err_dev_remv;
>> +}
>> +
>
> Rather than conditional behavior in rndis_filter_device_add, it would be 
> cleaner
> to make the call to get hardware capabilities there.
>
> Please respin and make the query of host a separate function.

You mean call rndis_netdev_set_hwcaps() from netvsc_probe()? Will do.

One question though: in case we'll be avoiding
rndis_filter_set_offload_params() call on mtu/channels/ringparam
changes -- can we trust the host to preserve what was there before the
RNDIS reset? In case not we'll have to untangle what is
rndis_netdev_set_hwcaps() in my patch splitting it into two: hw_features
setup and rndis_filter_set_offload_params() and leaving the later in
rndis_filter_device_add().

Thanks,

-- 
  Vitaly


[PATCH net] hv_netvsc: preserve hw_features on mtu/channels/ringparam changes

2017-11-14 Thread Vitaly Kuznetsov
rndis_filter_device_add() is called both from netvsc_probe() when we
initially create the device and from set channels/mtu/ringparam
routines where we basically remove the device and add it back.

hw_features is reset in rndis_filter_device_add() and filled with
host data. However, we lose all additional flags which are set outside
of the driver, e.g. register_netdevice() adds NETIF_F_SOFT_FEATURES and
many others.

Fill hw_features in only when we're called from netvsc_probe() and
hw_features is empty.

Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
---
 drivers/net/hyperv/rndis_filter.c | 127 +-
 1 file changed, 70 insertions(+), 57 deletions(-)

diff --git a/drivers/net/hyperv/rndis_filter.c 
b/drivers/net/hyperv/rndis_filter.c
index 065b204d8e17..e9c12b66bf08 100644
--- a/drivers/net/hyperv/rndis_filter.c
+++ b/drivers/net/hyperv/rndis_filter.c
@@ -1128,69 +1128,20 @@ void rndis_set_subchannel(struct work_struct *w)
rtnl_unlock();
 }
 
-struct netvsc_device *rndis_filter_device_add(struct hv_device *dev,
- struct netvsc_device_info *device_info)
+static int rndis_netdev_set_hwcaps(struct netvsc_device *nvdev,
+  struct rndis_device *rndis_device)
 {
-   struct net_device *net = hv_get_drvdata(dev);
+   struct net_device *net = rndis_device->ndev;
struct net_device_context *net_device_ctx = netdev_priv(net);
-   struct netvsc_device *net_device;
-   struct rndis_device *rndis_device;
struct ndis_offload hwcaps;
struct ndis_offload_params offloads;
-   struct ndis_recv_scale_cap rsscap;
-   u32 rsscap_size = sizeof(struct ndis_recv_scale_cap);
unsigned int gso_max_size = GSO_MAX_SIZE;
-   u32 mtu, size;
-   const struct cpumask *node_cpu_mask;
-   u32 num_possible_rss_qs;
-   int i, ret;
-
-   rndis_device = get_rndis_device();
-   if (!rndis_device)
-   return ERR_PTR(-ENODEV);
-
-   /*
-* Let the inner driver handle this first to create the netvsc channel
-* NOTE! Once the channel is created, we may get a receive callback
-* (RndisFilterOnReceive()) before this call is completed
-*/
-   net_device = netvsc_device_add(dev, device_info);
-   if (IS_ERR(net_device)) {
-   kfree(rndis_device);
-   return net_device;
-   }
-
-   /* Initialize the rndis device */
-   net_device->max_chn = 1;
-   net_device->num_chn = 1;
-
-   net_device->extension = rndis_device;
-   rndis_device->ndev = net;
-
-   /* Send the rndis initialization message */
-   ret = rndis_filter_init_device(rndis_device, net_device);
-   if (ret != 0)
-   goto err_dev_remv;
-
-   /* Get the MTU from the host */
-   size = sizeof(u32);
-   ret = rndis_filter_query_device(rndis_device, net_device,
-   RNDIS_OID_GEN_MAXIMUM_FRAME_SIZE,
-   , );
-   if (ret == 0 && size == sizeof(u32) && mtu < net->mtu)
-   net->mtu = mtu;
-
-   /* Get the mac address */
-   ret = rndis_filter_query_device_mac(rndis_device, net_device);
-   if (ret != 0)
-   goto err_dev_remv;
-
-   memcpy(device_info->mac_adr, rndis_device->hw_mac_adr, ETH_ALEN);
+   int ret;
 
/* Find HW offload capabilities */
-   ret = rndis_query_hwcaps(rndis_device, net_device, );
+   ret = rndis_query_hwcaps(rndis_device, nvdev, );
if (ret != 0)
-   goto err_dev_remv;
+   return ret;
 
/* A value of zero means "no change"; now turn on what we want. */
memset(, 0, sizeof(struct ndis_offload_params));
@@ -1245,10 +1196,72 @@ struct netvsc_device *rndis_filter_device_add(struct 
hv_device *dev,
 
netif_set_gso_max_size(net, gso_max_size);
 
-   ret = rndis_filter_set_offload_params(net, net_device, );
-   if (ret)
+   ret = rndis_filter_set_offload_params(net, nvdev, );
+
+   return ret;
+}
+
+struct netvsc_device *rndis_filter_device_add(struct hv_device *dev,
+ struct netvsc_device_info *device_info)
+{
+   struct net_device *net = hv_get_drvdata(dev);
+   struct netvsc_device *net_device;
+   struct rndis_device *rndis_device;
+   struct ndis_recv_scale_cap rsscap;
+   u32 rsscap_size = sizeof(struct ndis_recv_scale_cap);
+   u32 mtu, size;
+   const struct cpumask *node_cpu_mask;
+   u32 num_possible_rss_qs;
+   int i, ret;
+
+   rndis_device = get_rndis_device();
+   if (!rndis_device)
+   return ERR_PTR(-ENODEV);
+
+   /* Let the inner driver handle this first to create the netvsc channel
+* NOTE! Once the channel is created, we may get a receive callback
+  

Re: [RFC] hv_netvsc: safer orderly shutdown

2017-11-13 Thread Vitaly Kuznetsov
Stephen Hemminger  writes:

>
> The NAPI disable is already handled by rndis close.

Sorry, but I'm probably missing something: I can only see
netif_napi_del() call in netvsc_device_remove() but this happens much
later. And I don see us doing napi_disable() anywhere on the path.
But I'm probably missing something.

-- 
  Vitaly


Re: [RFC] hv_netvsc: safer orderly shutdown

2017-11-10 Thread Vitaly Kuznetsov
Stephen Hemminger  writes:

> Several types of control operations require that the underlying RNDIS
> infrastructure be restarted. This patch changes the ordering of the
> shutdown to avoid race conditions.
> Stop all transmits before doing RNDIS halt. This involves stopping the
> network device transmit queues, then waiting for all outstanding
> sends before informing host to halt.
>
> Also, check for successful restart of the device when after the
> change is done.
>
> For review, not tested on Hyper-V yet.
>
> Signed-off-by: Stephen Hemminger 
> ---
>  drivers/net/hyperv/netvsc_drv.c   | 40 
> ++-
>  drivers/net/hyperv/rndis_filter.c | 23 +++---
>  2 files changed, 42 insertions(+), 21 deletions(-)
>
> diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
> index da216ca4f2b2..3afa082e093d 100644
> --- a/drivers/net/hyperv/netvsc_drv.c
> +++ b/drivers/net/hyperv/netvsc_drv.c
> @@ -855,8 +855,10 @@ static int netvsc_set_channels(struct net_device *net,
>
>   orig = nvdev->num_chn;
>   was_opened = rndis_filter_opened(nvdev);
> - if (was_opened)
> + if (was_opened) {
> + netif_tx_disable(net);
>   rndis_filter_close(nvdev);
> + }

I was also experimenting with this and I think it may also make sense to
add napi_disable() for all queues here.

It also seems that the usual TX disable pattern is 

netif_tx_stop_all_queues(net);
netif_tx_disable(net);

not sure why..

>
>   memset(_info, 0, sizeof(device_info));
>   device_info.num_chn = count;
> @@ -881,8 +883,13 @@ static int netvsc_set_channels(struct net_device *net,
>   }
>   }
>
> - if (was_opened)
> - rndis_filter_open(nvdev);
> + if (was_opened) {
> + ret = rndis_filter_open(nvdev);
> + if (ret)
> + netdev_err(net, "reopening device failed: %d\n", ret);
> + else
> + netif_tx_start_all_queues(net);
> + }
>
>   /* We may have missed link change notifications */
>   net_device_ctx->last_reconfig = 0;
> @@ -971,8 +978,10 @@ static int netvsc_change_mtu(struct net_device *ndev, 
> int mtu)
>
>   netif_device_detach(ndev);
>   was_opened = rndis_filter_opened(nvdev);
> - if (was_opened)
> + if (was_opened) {
> + netif_tx_disable(net);
>   rndis_filter_close(nvdev);
> + }

Shall we just nove netif_tx_disable() & friends to rndis_filter_close()?

>
>   memset(_info, 0, sizeof(device_info));
>   device_info.ring_size = ring_size;
> @@ -1004,8 +1013,13 @@ static int netvsc_change_mtu(struct net_device *ndev, 
> int mtu)
>   }
>   }
>
> - if (was_opened)
> - rndis_filter_open(nvdev);
> + if (was_opened) {
> + ret = rndis_filter_open(nvdev);
> + if (ret)
> + netdev_err(net, "reopening device failed: %d\n", ret);
> + else
> + netif_tx_start_all_queues(net);
> + }

Yea, the main problem here is that we can't do much if we fail, the
device will be completely unusable. That's not something users exepct
doing MTU change...

>
>   netif_device_attach(ndev);
>
> @@ -1547,8 +1561,10 @@ static int netvsc_set_ringparam(struct net_device 
> *ndev,
>
>   netif_device_detach(ndev);
>   was_opened = rndis_filter_opened(nvdev);
> - if (was_opened)
> + if (was_opened) {
> + netif_tx_disable(net);
>   rndis_filter_close(nvdev);
> + }
>
>   rndis_filter_device_remove(hdev, nvdev);
>
> @@ -1566,8 +1582,14 @@ static int netvsc_set_ringparam(struct net_device 
> *ndev,
>   }
>   }
>
> - if (was_opened)
> - rndis_filter_open(nvdev);
> + if (was_opened) {
> + ret = rndis_filter_open(nvdev);
> + if (ret)
> + netdev_err(net, "reopening device failed: %d\n", ret);
> + else
> + netif_tx_start_all_queues(net);
> + }
> +
>   netif_device_attach(ndev);
>
>   /* We may have missed link change notifications */
> diff --git a/drivers/net/hyperv/rndis_filter.c 
> b/drivers/net/hyperv/rndis_filter.c
> index 0648eebda829..164f5ffe9c50 100644
> --- a/drivers/net/hyperv/rndis_filter.c
> +++ b/drivers/net/hyperv/rndis_filter.c
> @@ -948,11 +948,20 @@ static void rndis_filter_halt_device(struct 
> rndis_device *dev)
>   struct net_device_context *net_device_ctx = netdev_priv(dev->ndev);
>   struct netvsc_device *nvdev = rtnl_dereference(net_device_ctx->nvdev);
>
> + /* tell bottom half that deice is being closed */
> + nvdev->destroy = true;
> +
> + /* Force flag to be ordered before waiting */
> + wmb();
> +
> + /* Wait for all send completions */
> + wait_event(nvdev->wait_drain, netvsc_device_idle(nvdev));
> +
>   /* 

Re: [PATCH net-next v2 2/2] hv_netvsc: hide warnings about uninitialized/missing rndis device

2017-11-08 Thread Vitaly Kuznetsov
Stephen Hemminger <step...@networkplumber.org> writes:

> On Nov 2, 2017 19:35, "Vitaly Kuznetsov" <vkuzn...@redhat.com> wrote:
>
>  Hyper-V hosts are known to send RNDIS messages even after we halt the
>  device in rndis_filter_halt_device(). Remove user visible messages
>  as they are not really useful.
>
>  Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
>  ---
>  drivers/net/hyperv/rndis_filter.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
>  diff --git a/drivers/net/hyperv/rndis_filter.c 
> b/drivers/net/hyperv/rndis_filter.c
>  index 0648eebda829..8b1242b8d8ef 100644
>  --- a/drivers/net/hyperv/rndis_filter.c
>  +++ b/drivers/net/hyperv/rndis_filter.c
>  @@ -407,13 +407,13 @@ int rndis_filter_receive(struct net_device *ndev,
>
>  /* Make sure the rndis device state is initialized */
>  if (unlikely(!rndis_dev)) {
>  - netif_err(net_device_ctx, rx_err, ndev,
>  + netif_dbg(net_device_ctx, rx_err, ndev,
>  "got rndis message but no rndis device!\n");
>  return NVSP_STAT_FAIL;
>  }
>
>  if (unlikely(rndis_dev->state == RNDIS_DEV_UNINITIALIZED)) {
>  - netif_err(net_device_ctx, rx_err, ndev,
>  + netif_dbg(net_device_ctx, rx_err, ndev,
>  "got rndis message uninitialized\n");
>  return NVSP_STAT_FAIL;
>  }
>  --
>  2.13.6
>
>  ___
>  devel mailing list
>  de...@linuxdriverproject.org
>  http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
>
> This should never happen since host removal should be setting device
> down which disables NAPI. If this is not working correctly, that needs
> to be fixed (rather than silencing the message).

This happens in between we halt RNDIS device in
rndis_filter_halt_device() and NAPI shutdown from netvsc_device_remove()
while the window is still open.

>
> Don't shoot the messenger

These messages are of no use for a random user: you change MTU on your
device and see 'got rndis message uninitialized' - what are you supposed
to do? I leave them at debugging level for us to actually debug.

-- 
  Vitaly


[PATCH net-next v2 0/2] hv_netvsc: fix a hang on channel/mtu changes

2017-11-02 Thread Vitaly Kuznetsov
It was found that netvsc driver doesn't survive e.g.

# while true; do ethtool -L eth0 combined 4; ethtool -L eth0 combined 8; done"

test. I was able to identify a hang in guest/host communication, it is
fixed by PATCH1 of this series. PATCH2 is a cosmetic change masking
unneeded messages.

Changes since v1:
- Throw away patches 2 and 3 of the original series as one is unneeded and
  the other is not justified [Eric Dumazet, Stephen Hemminger] so I'm only
  fixing the hang now, the crash doesn't reproduce. Will keep an eye on it.

Vitaly Kuznetsov (2):
  hv_netvsc: netvsc_teardown_gpadl() split
  hv_netvsc: hide warnings about uninitialized/missing rndis device

 drivers/net/hyperv/netvsc.c   | 69 ---
 drivers/net/hyperv/rndis_filter.c |  4 +--
 2 files changed, 38 insertions(+), 35 deletions(-)

-- 
2.13.6



[PATCH net-next v2 2/2] hv_netvsc: hide warnings about uninitialized/missing rndis device

2017-11-02 Thread Vitaly Kuznetsov
Hyper-V hosts are known to send RNDIS messages even after we halt the
device in rndis_filter_halt_device(). Remove user visible messages
as they are not really useful.

Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
---
 drivers/net/hyperv/rndis_filter.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/hyperv/rndis_filter.c 
b/drivers/net/hyperv/rndis_filter.c
index 0648eebda829..8b1242b8d8ef 100644
--- a/drivers/net/hyperv/rndis_filter.c
+++ b/drivers/net/hyperv/rndis_filter.c
@@ -407,13 +407,13 @@ int rndis_filter_receive(struct net_device *ndev,
 
/* Make sure the rndis device state is initialized */
if (unlikely(!rndis_dev)) {
-   netif_err(net_device_ctx, rx_err, ndev,
+   netif_dbg(net_device_ctx, rx_err, ndev,
  "got rndis message but no rndis device!\n");
return NVSP_STAT_FAIL;
}
 
if (unlikely(rndis_dev->state == RNDIS_DEV_UNINITIALIZED)) {
-   netif_err(net_device_ctx, rx_err, ndev,
+   netif_dbg(net_device_ctx, rx_err, ndev,
  "got rndis message uninitialized\n");
return NVSP_STAT_FAIL;
}
-- 
2.13.6



[PATCH net-next v2 1/2] hv_netvsc: netvsc_teardown_gpadl() split

2017-11-02 Thread Vitaly Kuznetsov
It was found that in some cases host refuses to teardown GPADL for send/
receive buffers (probably when some work with these buffere is scheduled or
ongoing). Change the teardown logic to be:
1) Send NVSP_MSG1_TYPE_REVOKE_* messages
2) Close the channel
3) Teardown GPADLs.
This seems to work reliably.

Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
---
 drivers/net/hyperv/netvsc.c | 69 +++--
 1 file changed, 36 insertions(+), 33 deletions(-)

diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 5bb6a20072dd..bfc79698b8f4 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -100,12 +100,11 @@ static void free_netvsc_device_rcu(struct netvsc_device 
*nvdev)
call_rcu(>rcu, free_netvsc_device);
 }
 
-static void netvsc_destroy_buf(struct hv_device *device)
+static void netvsc_revoke_buf(struct hv_device *device,
+ struct netvsc_device *net_device)
 {
struct nvsp_message *revoke_packet;
struct net_device *ndev = hv_get_drvdata(device);
-   struct net_device_context *ndc = netdev_priv(ndev);
-   struct netvsc_device *net_device = rtnl_dereference(ndc->nvdev);
int ret;
 
/*
@@ -148,28 +147,6 @@ static void netvsc_destroy_buf(struct hv_device *device)
net_device->recv_section_cnt = 0;
}
 
-   /* Teardown the gpadl on the vsp end */
-   if (net_device->recv_buf_gpadl_handle) {
-   ret = vmbus_teardown_gpadl(device->channel,
-  net_device->recv_buf_gpadl_handle);
-
-   /* If we failed here, we might as well return and have a leak
-* rather than continue and a bugchk
-*/
-   if (ret != 0) {
-   netdev_err(ndev,
-  "unable to teardown receive buffer's 
gpadl\n");
-   return;
-   }
-   net_device->recv_buf_gpadl_handle = 0;
-   }
-
-   if (net_device->recv_buf) {
-   /* Free up the receive buffer */
-   vfree(net_device->recv_buf);
-   net_device->recv_buf = NULL;
-   }
-
/* Deal with the send buffer we may have setup.
 * If we got a  send section size, it means we received a
 * NVSP_MSG1_TYPE_SEND_SEND_BUF_COMPLETE msg (ie sent
@@ -210,7 +187,35 @@ static void netvsc_destroy_buf(struct hv_device *device)
}
net_device->send_section_cnt = 0;
}
-   /* Teardown the gpadl on the vsp end */
+}
+
+static void netvsc_teardown_gpadl(struct hv_device *device,
+ struct netvsc_device *net_device)
+{
+   struct net_device *ndev = hv_get_drvdata(device);
+   int ret;
+
+   if (net_device->recv_buf_gpadl_handle) {
+   ret = vmbus_teardown_gpadl(device->channel,
+  net_device->recv_buf_gpadl_handle);
+
+   /* If we failed here, we might as well return and have a leak
+* rather than continue and a bugchk
+*/
+   if (ret != 0) {
+   netdev_err(ndev,
+  "unable to teardown receive buffer's 
gpadl\n");
+   return;
+   }
+   net_device->recv_buf_gpadl_handle = 0;
+   }
+
+   if (net_device->recv_buf) {
+   /* Free up the receive buffer */
+   vfree(net_device->recv_buf);
+   net_device->recv_buf = NULL;
+   }
+
if (net_device->send_buf_gpadl_handle) {
ret = vmbus_teardown_gpadl(device->channel,
   net_device->send_buf_gpadl_handle);
@@ -420,7 +425,8 @@ static int netvsc_init_buf(struct hv_device *device,
goto exit;
 
 cleanup:
-   netvsc_destroy_buf(device);
+   netvsc_revoke_buf(device, net_device);
+   netvsc_teardown_gpadl(device, net_device);
 
 exit:
return ret;
@@ -539,11 +545,6 @@ static int netvsc_connect_vsp(struct hv_device *device,
return ret;
 }
 
-static void netvsc_disconnect_vsp(struct hv_device *device)
-{
-   netvsc_destroy_buf(device);
-}
-
 /*
  * netvsc_device_remove - Callback when the root bus device is removed
  */
@@ -557,7 +558,7 @@ void netvsc_device_remove(struct hv_device *device)
 
cancel_work_sync(_device->subchan_work);
 
-   netvsc_disconnect_vsp(device);
+   netvsc_revoke_buf(device, net_device);
 
RCU_INIT_POINTER(net_device_ctx->nvdev, NULL);
 
@@ -570,6 +571,8 @@ void netvsc_device_remove(struct hv_device *device)
/* Now, we can close the channel safely */
vmbus_close(device->channel);
 
+   netvsc_teardown_gpadl(device, net_device);
+
 

Re: [PATCH net-next 2/4] hv_netvsc: protect nvdev->extension with RCU

2017-11-02 Thread Vitaly Kuznetsov
Vitaly Kuznetsov <vkuzn...@redhat.com> writes:

>
> Ok, I may have missed something. I'll try reproducing the crash and
> finding a better fine-grained solution.
>

It's been two days and I'm failing to reproduce the crash in
rndis_filter_receive() in my environment so I'll probably just re-send
patches 1 and 4 of this series as v2.

-- 
  Vitaly


Re: [PATCH net-next 2/4] hv_netvsc: protect nvdev->extension with RCU

2017-10-31 Thread Vitaly Kuznetsov
Stephen Hemminger <step...@networkplumber.org> writes:

> On Tue, 31 Oct 2017 14:42:02 +0100
> Vitaly Kuznetsov <vkuzn...@redhat.com> wrote:
>
>> @@ -2002,7 +2002,9 @@ static int netvsc_probe(struct hv_device *dev,
>>  device_info.recv_sections = NETVSC_DEFAULT_RX;
>>  device_info.recv_section_size = NETVSC_RECV_SECTION_SIZE;
>>  
>> +rtnl_lock();
>>  nvdev = rndis_filter_device_add(dev, _info);
>> +rtnl_unlock();
>
> rtnl is not necessary here. probe can not be bothered by other changes.
>

Yes, this is only to support rtnl_dereference() down the stack.

>> --- a/drivers/net/hyperv/rndis_filter.c
>> +++ b/drivers/net/hyperv/rndis_filter.c
>> @@ -402,20 +402,27 @@ int rndis_filter_receive(struct net_device *ndev,
>>   void *data, u32 buflen)
>>  {
>>  struct net_device_context *net_device_ctx = netdev_priv(ndev);
>> -struct rndis_device *rndis_dev = net_dev->extension;
>> +struct rndis_device *rndis_dev;
>>  struct rndis_message *rndis_msg = data;
>> +int ret = 0;
>> +
>> +rcu_read_lock_bh();
>> +
>> +rndis_dev = rcu_dereference_bh(net_dev->extension);
>
> filter_receive is already called only from NAPI only and has RCU lock and soft
> irq disabled. This is not necessary.
>
>> -net_dev->extension = NULL;
>> +rcu_assign_pointer(net_dev->extension, NULL);
>> +
>> +synchronize_rcu();
>
> rcu_assign_pointer with NULL is never a good idea.
> And synchronize_rcu is slow. Since net_device is already protected
> by RCU (for deletion) it should not be necessary.
>

I thought we don't care that much about the speed of this patch as
rndis_filter_device_remove() is only called on device remove/mtu
change/... and we need to interact with the host -- and this is already
slow.

> Thank you for trying to address these races. But it should be
> done carefully not by just slapping RCU everywhere.

Ok, I may have missed something. I'll try reproducing the crash and
finding a better fine-grained solution.

Thanks for the feedback!

-- 
  Vitaly


Re: [PATCH net-next 3/4] hv_netvsc: reset net_device_ctx->nvdev with rcu_assign_pointer()

2017-10-31 Thread Vitaly Kuznetsov
David Miller <da...@davemloft.net> writes:

> From: Vitaly Kuznetsov <vkuzn...@redhat.com>
> Date: Tue, 31 Oct 2017 15:40:06 +0100
>
>> Eric Dumazet <eric.duma...@gmail.com> writes:
>> 
>>> On Tue, 2017-10-31 at 14:42 +0100, Vitaly Kuznetsov wrote:
>>>> RCU_INIT_POINTER() is not suitable here as it doesn't give us ordering
>>>> guarantees (see the comment in rcupdate.h). This is also not a hotpath.
>>>> 
>>>> Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
>>>> ---
>>>>  drivers/net/hyperv/netvsc.c | 2 +-
>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>> 
>>>> diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
>>>> index bfc79698b8f4..12efb3e34775 100644
>>>> --- a/drivers/net/hyperv/netvsc.c
>>>> +++ b/drivers/net/hyperv/netvsc.c
>>>> @@ -560,7 +560,7 @@ void netvsc_device_remove(struct hv_device *device)
>>>>  
>>>>netvsc_revoke_buf(device, net_device);
>>>>  
>>>> -  RCU_INIT_POINTER(net_device_ctx->nvdev, NULL);
>>>> +  rcu_assign_pointer(net_device_ctx->nvdev, NULL);
>>>
>>> I see no point for this patch.
>>>
>>> Setting a NULL pointer needs no barrier at all.
>> 
>> Oh, sorry, I got confused by the comment near RCU_INIT_POINTER() in
>> rcupdate.h. Now looking at their definitions I see.
>> 
>> This patch can of course be dropped from the series.
>
> Any time there is a change to the series, you must resubmit the entire
> series.
>

Sure, will do. Just wanted to give it a couple of days to see if
Microsoft guys or someone else have any comments.

Thanks,

-- 
  Vitaly


Re: [PATCH net-next 3/4] hv_netvsc: reset net_device_ctx->nvdev with rcu_assign_pointer()

2017-10-31 Thread Vitaly Kuznetsov
Eric Dumazet <eric.duma...@gmail.com> writes:

> On Tue, 2017-10-31 at 14:42 +0100, Vitaly Kuznetsov wrote:
>> RCU_INIT_POINTER() is not suitable here as it doesn't give us ordering
>> guarantees (see the comment in rcupdate.h). This is also not a hotpath.
>> 
>> Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
>> ---
>>  drivers/net/hyperv/netvsc.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
>> index bfc79698b8f4..12efb3e34775 100644
>> --- a/drivers/net/hyperv/netvsc.c
>> +++ b/drivers/net/hyperv/netvsc.c
>> @@ -560,7 +560,7 @@ void netvsc_device_remove(struct hv_device *device)
>>  
>>  netvsc_revoke_buf(device, net_device);
>>  
>> -RCU_INIT_POINTER(net_device_ctx->nvdev, NULL);
>> +rcu_assign_pointer(net_device_ctx->nvdev, NULL);
>
> I see no point for this patch.
>
> Setting a NULL pointer needs no barrier at all.

Oh, sorry, I got confused by the comment near RCU_INIT_POINTER() in
rcupdate.h. Now looking at their definitions I see.

This patch can of course be dropped from the series.

-- 
  Vitaly


[PATCH net-next 1/4] hv_netvsc: netvsc_teardown_gpadl() split

2017-10-31 Thread Vitaly Kuznetsov
It was found that in some cases host refuses to teardown GPADL for send/
receive buffers (probably when some work with these buffere is scheduled or
ongoing). Change the teardown logic to be:
1) Send NVSP_MSG1_TYPE_REVOKE_* messages
2) Close the channel
3) Teardown GPADLs.
This seems to work reliably.

Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
---
 drivers/net/hyperv/netvsc.c | 69 +++--
 1 file changed, 36 insertions(+), 33 deletions(-)

diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 5bb6a20072dd..bfc79698b8f4 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -100,12 +100,11 @@ static void free_netvsc_device_rcu(struct netvsc_device 
*nvdev)
call_rcu(>rcu, free_netvsc_device);
 }
 
-static void netvsc_destroy_buf(struct hv_device *device)
+static void netvsc_revoke_buf(struct hv_device *device,
+ struct netvsc_device *net_device)
 {
struct nvsp_message *revoke_packet;
struct net_device *ndev = hv_get_drvdata(device);
-   struct net_device_context *ndc = netdev_priv(ndev);
-   struct netvsc_device *net_device = rtnl_dereference(ndc->nvdev);
int ret;
 
/*
@@ -148,28 +147,6 @@ static void netvsc_destroy_buf(struct hv_device *device)
net_device->recv_section_cnt = 0;
}
 
-   /* Teardown the gpadl on the vsp end */
-   if (net_device->recv_buf_gpadl_handle) {
-   ret = vmbus_teardown_gpadl(device->channel,
-  net_device->recv_buf_gpadl_handle);
-
-   /* If we failed here, we might as well return and have a leak
-* rather than continue and a bugchk
-*/
-   if (ret != 0) {
-   netdev_err(ndev,
-  "unable to teardown receive buffer's 
gpadl\n");
-   return;
-   }
-   net_device->recv_buf_gpadl_handle = 0;
-   }
-
-   if (net_device->recv_buf) {
-   /* Free up the receive buffer */
-   vfree(net_device->recv_buf);
-   net_device->recv_buf = NULL;
-   }
-
/* Deal with the send buffer we may have setup.
 * If we got a  send section size, it means we received a
 * NVSP_MSG1_TYPE_SEND_SEND_BUF_COMPLETE msg (ie sent
@@ -210,7 +187,35 @@ static void netvsc_destroy_buf(struct hv_device *device)
}
net_device->send_section_cnt = 0;
}
-   /* Teardown the gpadl on the vsp end */
+}
+
+static void netvsc_teardown_gpadl(struct hv_device *device,
+ struct netvsc_device *net_device)
+{
+   struct net_device *ndev = hv_get_drvdata(device);
+   int ret;
+
+   if (net_device->recv_buf_gpadl_handle) {
+   ret = vmbus_teardown_gpadl(device->channel,
+  net_device->recv_buf_gpadl_handle);
+
+   /* If we failed here, we might as well return and have a leak
+* rather than continue and a bugchk
+*/
+   if (ret != 0) {
+   netdev_err(ndev,
+  "unable to teardown receive buffer's 
gpadl\n");
+   return;
+   }
+   net_device->recv_buf_gpadl_handle = 0;
+   }
+
+   if (net_device->recv_buf) {
+   /* Free up the receive buffer */
+   vfree(net_device->recv_buf);
+   net_device->recv_buf = NULL;
+   }
+
if (net_device->send_buf_gpadl_handle) {
ret = vmbus_teardown_gpadl(device->channel,
   net_device->send_buf_gpadl_handle);
@@ -420,7 +425,8 @@ static int netvsc_init_buf(struct hv_device *device,
goto exit;
 
 cleanup:
-   netvsc_destroy_buf(device);
+   netvsc_revoke_buf(device, net_device);
+   netvsc_teardown_gpadl(device, net_device);
 
 exit:
return ret;
@@ -539,11 +545,6 @@ static int netvsc_connect_vsp(struct hv_device *device,
return ret;
 }
 
-static void netvsc_disconnect_vsp(struct hv_device *device)
-{
-   netvsc_destroy_buf(device);
-}
-
 /*
  * netvsc_device_remove - Callback when the root bus device is removed
  */
@@ -557,7 +558,7 @@ void netvsc_device_remove(struct hv_device *device)
 
cancel_work_sync(_device->subchan_work);
 
-   netvsc_disconnect_vsp(device);
+   netvsc_revoke_buf(device, net_device);
 
RCU_INIT_POINTER(net_device_ctx->nvdev, NULL);
 
@@ -570,6 +571,8 @@ void netvsc_device_remove(struct hv_device *device)
/* Now, we can close the channel safely */
vmbus_close(device->channel);
 
+   netvsc_teardown_gpadl(device, net_device);
+
 

[PATCH net-next 3/4] hv_netvsc: reset net_device_ctx->nvdev with rcu_assign_pointer()

2017-10-31 Thread Vitaly Kuznetsov
RCU_INIT_POINTER() is not suitable here as it doesn't give us ordering
guarantees (see the comment in rcupdate.h). This is also not a hotpath.

Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
---
 drivers/net/hyperv/netvsc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index bfc79698b8f4..12efb3e34775 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -560,7 +560,7 @@ void netvsc_device_remove(struct hv_device *device)
 
netvsc_revoke_buf(device, net_device);
 
-   RCU_INIT_POINTER(net_device_ctx->nvdev, NULL);
+   rcu_assign_pointer(net_device_ctx->nvdev, NULL);
 
/*
 * At this point, no one should be accessing net_device
-- 
2.13.6



[PATCH net-next 2/4] hv_netvsc: protect nvdev->extension with RCU

2017-10-31 Thread Vitaly Kuznetsov
rndis_filter_receive() is called from interrupt context and may race with
rndis_filter_device_remove() resetting extension pointer. RNDIS_MSG_HALT
does not help, host may still send us messages after it. Protect extension
pointer with RCU.

Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
---
 drivers/net/hyperv/hyperv_net.h   |  2 +-
 drivers/net/hyperv/netvsc_drv.c   | 10 +
 drivers/net/hyperv/rndis_filter.c | 43 +--
 3 files changed, 35 insertions(+), 20 deletions(-)

diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index 4958bb6b7376..4f003e85781c 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -798,7 +798,7 @@ struct netvsc_device {
struct work_struct subchan_work;
wait_queue_head_t subchan_open;
 
-   struct rndis_device *extension;
+   struct rndis_device __rcu *extension;
 
int ring_size;
 
diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index da216ca4f2b2..8ad018305aa5 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -94,7 +94,7 @@ static int netvsc_open(struct net_device *net)
 
netif_tx_wake_all_queues(net);
 
-   rdev = nvdev->extension;
+   rdev = rtnl_dereference(nvdev->extension);
 
if (!rdev->link_state)
netif_carrier_on(net);
@@ -1431,7 +1431,7 @@ static int netvsc_get_rxfh(struct net_device *dev, u32 
*indir, u8 *key,
if (hfunc)
*hfunc = ETH_RSS_HASH_TOP;  /* Toeplitz */
 
-   rndis_dev = ndev->extension;
+   rndis_dev = rtnl_dereference(ndev->extension);
if (indir) {
for (i = 0; i < ITAB_NUM; i++)
indir[i] = rndis_dev->rx_table[i];
@@ -1457,7 +1457,7 @@ static int netvsc_set_rxfh(struct net_device *dev, const 
u32 *indir,
if (hfunc != ETH_RSS_HASH_NO_CHANGE && hfunc != ETH_RSS_HASH_TOP)
return -EOPNOTSUPP;
 
-   rndis_dev = ndev->extension;
+   rndis_dev = rtnl_dereference(ndev->extension);
if (indir) {
for (i = 0; i < ITAB_NUM; i++)
if (indir[i] >= ndev->num_chn)
@@ -1640,7 +1640,7 @@ static void netvsc_link_change(struct work_struct *w)
if (!net_device)
goto out_unlock;
 
-   rdev = net_device->extension;
+   rdev = rtnl_dereference(net_device->extension);
 
next_reconfig = ndev_ctx->last_reconfig + LINKCHANGE_INT;
if (time_is_after_jiffies(next_reconfig)) {
@@ -2002,7 +2002,9 @@ static int netvsc_probe(struct hv_device *dev,
device_info.recv_sections = NETVSC_DEFAULT_RX;
device_info.recv_section_size = NETVSC_RECV_SECTION_SIZE;
 
+   rtnl_lock();
nvdev = rndis_filter_device_add(dev, _info);
+   rtnl_unlock();
if (IS_ERR(nvdev)) {
ret = PTR_ERR(nvdev);
netdev_err(net, "unable to add netvsc device (ret %d)\n", ret);
diff --git a/drivers/net/hyperv/rndis_filter.c 
b/drivers/net/hyperv/rndis_filter.c
index 0648eebda829..1c31e2b0216e 100644
--- a/drivers/net/hyperv/rndis_filter.c
+++ b/drivers/net/hyperv/rndis_filter.c
@@ -402,20 +402,27 @@ int rndis_filter_receive(struct net_device *ndev,
 void *data, u32 buflen)
 {
struct net_device_context *net_device_ctx = netdev_priv(ndev);
-   struct rndis_device *rndis_dev = net_dev->extension;
+   struct rndis_device *rndis_dev;
struct rndis_message *rndis_msg = data;
+   int ret = 0;
+
+   rcu_read_lock_bh();
+
+   rndis_dev = rcu_dereference_bh(net_dev->extension);
 
/* Make sure the rndis device state is initialized */
if (unlikely(!rndis_dev)) {
netif_err(net_device_ctx, rx_err, ndev,
  "got rndis message but no rndis device!\n");
-   return NVSP_STAT_FAIL;
+   ret = NVSP_STAT_FAIL;
+   goto unlock;
}
 
if (unlikely(rndis_dev->state == RNDIS_DEV_UNINITIALIZED)) {
netif_err(net_device_ctx, rx_err, ndev,
  "got rndis message uninitialized\n");
-   return NVSP_STAT_FAIL;
+   ret = NVSP_STAT_FAIL;
+   goto unlock;
}
 
if (netif_msg_rx_status(net_device_ctx))
@@ -423,8 +430,9 @@ int rndis_filter_receive(struct net_device *ndev,
 
switch (rndis_msg->ndis_msg_type) {
case RNDIS_MSG_PACKET:
-   return rndis_filter_receive_data(ndev, rndis_dev, rndis_msg,
-channel, data, buflen);
+   ret = rndis_filter_receive_data(ndev, rndis_dev, rndis_msg,
+   channel, data, buflen);
+   break;
case RNDIS_MSG_INIT_C:
case 

[PATCH net-next 4/4] hv_netvsc: hide warnings about uninitialized/missing rndis device

2017-10-31 Thread Vitaly Kuznetsov
Hyper-V hosts are known to send RNDIS messages even after we halt the
device in rndis_filter_halt_device(). Remove user visible messages
as they are not really useful.

Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
---
 drivers/net/hyperv/rndis_filter.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/hyperv/rndis_filter.c 
b/drivers/net/hyperv/rndis_filter.c
index 1c31e2b0216e..77cea50d0945 100644
--- a/drivers/net/hyperv/rndis_filter.c
+++ b/drivers/net/hyperv/rndis_filter.c
@@ -412,14 +412,14 @@ int rndis_filter_receive(struct net_device *ndev,
 
/* Make sure the rndis device state is initialized */
if (unlikely(!rndis_dev)) {
-   netif_err(net_device_ctx, rx_err, ndev,
+   netif_dbg(net_device_ctx, rx_err, ndev,
  "got rndis message but no rndis device!\n");
ret = NVSP_STAT_FAIL;
goto unlock;
}
 
if (unlikely(rndis_dev->state == RNDIS_DEV_UNINITIALIZED)) {
-   netif_err(net_device_ctx, rx_err, ndev,
+   netif_dbg(net_device_ctx, rx_err, ndev,
  "got rndis message uninitialized\n");
ret = NVSP_STAT_FAIL;
goto unlock;
-- 
2.13.6



[PATCH net-next 0/4] hv_netvsc: fix some crashes and hangs on channel/mtu changes

2017-10-31 Thread Vitaly Kuznetsov
It was found that netvsc driver doesn't survive e.g.

# while true; do ethtool -L eth0 combined 4; ethtool -L eth0 combined 8; done"

test. I was able to identify several issues: hang in guest/host
communication and a couple of crashes. Fix these. While I'm not
convinced I'm fixing everything VMs seem to survive overnight
test. I'll send one more related patch to VMBus core too.

Vitaly Kuznetsov (4):
  hv_netvsc: netvsc_teardown_gpadl() split
  hv_netvsc: protect nvdev->extension with RCU
  hv_netvsc: reset net_device_ctx->nvdev with rcu_assign_pointer()
  hv_netvsc: hide warnings about uninitialized/missing rndis device

 drivers/net/hyperv/hyperv_net.h   |  2 +-
 drivers/net/hyperv/netvsc.c   | 71 ---
 drivers/net/hyperv/netvsc_drv.c   | 10 +++---
 drivers/net/hyperv/rndis_filter.c | 47 --
 4 files changed, 74 insertions(+), 56 deletions(-)

-- 
2.13.6



[PATCH net-next] hinic: don't build the module by default

2017-08-28 Thread Vitaly Kuznetsov
We probably don't want to enable code supporting particular hardware by
default e.g. when someone does 'make defconfig'. Other ethernet modules
don't do it.

Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
---
 drivers/net/ethernet/huawei/hinic/Kconfig | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/ethernet/huawei/hinic/Kconfig 
b/drivers/net/ethernet/huawei/hinic/Kconfig
index 69f2b1fba48d..08db24954f7e 100644
--- a/drivers/net/ethernet/huawei/hinic/Kconfig
+++ b/drivers/net/ethernet/huawei/hinic/Kconfig
@@ -5,7 +5,6 @@
 config HINIC
tristate "Huawei Intelligent PCIE Network Interface Card"
depends on (PCI_MSI && X86)
-   default m
---help---
  This driver supports HiNIC PCIE Ethernet cards.
  To compile this driver as part of the kernel, choose Y here.
-- 
2.13.5



Re: [PATCH RFC net-next] net: Allow name change of IFF_UP interfaces

2017-08-11 Thread Vitaly Kuznetsov
Stephen Hemminger <step...@networkplumber.org> writes:

> On Thu, 10 Aug 2017 10:55:01 -0600
> David Ahern <dsah...@gmail.com> wrote:
>
>> On 8/10/17 10:48 AM, David Miller wrote:
>> > From: Andrew Lunn <and...@lunn.ch>
>> > Date: Thu, 10 Aug 2017 18:27:22 +0200
>> >   
>> >> On Thu, Aug 10, 2017 at 05:24:55PM +0200, Vitaly Kuznetsov wrote:  
>> >>> Andrew Lunn <and...@lunn.ch> writes:
>> >>>  
>> >>>>> We are - rtnetlink_event() does the job. We, however, don't have a
>> >>>>> special IFLA_EVENT_* for name change and end up with IFLA_EVENT_NONE.  
>> >>>>
>> >>>> What is in this event? Old and new name? Just the new name?  
>> >>>
>> >>> Basically, it's everything we know about the interface - type, index,
>> >>> name, mtu, qdisc, ... - see rtnl_fill_ifinfo(). Back to your question -
>> >>> it's only the new name.  
>> >>
>> >> So the program needs to keep track of ifindex to know which interface
>> >> has changed name. Doable.
>> >>
>> >> I still expect this has the potential to break something. You probably
>> >> should be asking on linux-api for the API experts opinion.  
>> > 
>> > But a greater point is that nobody is monitoring device renames
>> > explicitly right now.  
>> 
>> Just to throw in an example:
>>   https://github.com/kobolabs/dhcpcd/blob/kobo/if-linux.c#L761
>> 
>> Learned of its use from a recent regression:
>> https://bugzilla.kernel.org/show_bug.cgi?id=196355
>
> Quagga is another example of what might break. Especially with all the new
> forks..

I see,

even if we don't see right away why the limitation was imposed and who
depends on this 'UP interfaces keep their names' semantics it may not be
worth it to open this pandora box just because of the netvsc driver
change...

-- 
  Vitaly


Re: [PATCH RFC net-next] net: Allow name change of IFF_UP interfaces

2017-08-10 Thread Vitaly Kuznetsov
Andrew Lunn <and...@lunn.ch> writes:

> On Thu, Aug 10, 2017 at 05:24:55PM +0200, Vitaly Kuznetsov wrote:
>> Andrew Lunn <and...@lunn.ch> writes:
>> 
>> >> We are - rtnetlink_event() does the job. We, however, don't have a
>> >> special IFLA_EVENT_* for name change and end up with IFLA_EVENT_NONE.
>> >
>> > What is in this event? Old and new name? Just the new name?
>> 
>> Basically, it's everything we know about the interface - type, index,
>> name, mtu, qdisc, ... - see rtnl_fill_ifinfo(). Back to your question -
>> it's only the new name.
>
> So the program needs to keep track of ifindex to know which interface
> has changed name. Doable.
>

Yes, and I'd expect that's what these daemons do nowdays to track name
changes for down interfaces (if/when they care).

> I still expect this has the potential to break something. You probably
> should be asking on linux-api for the API experts opinion.
>

Good idea, I'll do RFCv2 submission.

-- 
  Vitaly


Re: [PATCH RFC net-next] net: Allow name change of IFF_UP interfaces

2017-08-10 Thread Vitaly Kuznetsov
Andrew Lunn  writes:

>> We are - rtnetlink_event() does the job. We, however, don't have a
>> special IFLA_EVENT_* for name change and end up with IFLA_EVENT_NONE.
>
> What is in this event? Old and new name? Just the new name?

Basically, it's everything we know about the interface - type, index,
name, mtu, qdisc, ... - see rtnl_fill_ifinfo(). Back to your question -
it's only the new name.

-- 
  Vitaly


Re: [PATCH RFC net-next] net: Allow name change of IFF_UP interfaces

2017-08-10 Thread Vitaly Kuznetsov
Eric Dumazet <eric.duma...@gmail.com> writes:

> On Thu, 2017-08-10 at 10:41 +0200, Vitaly Kuznetsov wrote:
>> Andrew Lunn <and...@lunn.ch> writes:
>> 
>> >> I understand the 'legacy' concern but at the same time we don't want to
>> >> have aftificial limitations too. Name change, in particular, doesn't
>> >> happen 'under the hood' -- someone privileged enough needs to request
>> >> the change.
>> >> 
>> >> Can you think of any particular real world scenarios which are broken by
>> >> the change?
>> >
>> > How about:
>> >
>> > man 8 dhclient-script
>> >
>> > The interface name is passed in $interface to the scripts. Do we get
>> > the old name or the new name? I suspect scripts are going to break if
>> > they are given the old name, which no longer exists.
>> 
>> Yes but why would anyone change interface name while dhclient-script is
>> running? Things will also go wrong if you try bringing interface down
>> during the run or do some other configuration, right? Running multiple
>> configuration tools at the same moment is a bad idea, you never know
>> what you're gonna end up with. 
>> 
>> As I see it, checks in kernel we have are meant to protect kernel
>> itself, not to disallow all user<->kernel interactions leading to
>> imperfect result.
>> 
>> (AFAIU) If we remove the check nothing is going to change: udev will
>> still be renaming interfaces before bringing them up. In netvsc case
>> users are not supposed to configure the VF interface at all, it just
>> becomes a slave of netvsc interface.
>
> Are we sending an event if device name is changed ?
>

We are - rtnetlink_event() does the job. We, however, don't have a
special IFLA_EVENT_* for name change and end up with IFLA_EVENT_NONE.

> If yes, your patch is fine.
>
> If not, daemons would not be aware the need to refresh their view of the
> world.

Yes but AFAIU daemons may need to do the same refresh when the interface
is down too (and, hopefully, they do it already).

-- 
  Vitaly


Re: [PATCH RFC net-next] net: Allow name change of IFF_UP interfaces

2017-08-10 Thread Vitaly Kuznetsov
Andrew Lunn  writes:

>> I understand the 'legacy' concern but at the same time we don't want to
>> have aftificial limitations too. Name change, in particular, doesn't
>> happen 'under the hood' -- someone privileged enough needs to request
>> the change.
>> 
>> Can you think of any particular real world scenarios which are broken by
>> the change?
>
> How about:
>
> man 8 dhclient-script
>
> The interface name is passed in $interface to the scripts. Do we get
> the old name or the new name? I suspect scripts are going to break if
> they are given the old name, which no longer exists.

Yes but why would anyone change interface name while dhclient-script is
running? Things will also go wrong if you try bringing interface down
during the run or do some other configuration, right? Running multiple
configuration tools at the same moment is a bad idea, you never know
what you're gonna end up with. 

As I see it, checks in kernel we have are meant to protect kernel
itself, not to disallow all user<->kernel interactions leading to
imperfect result.

(AFAIU) If we remove the check nothing is going to change: udev will
still be renaming interfaces before bringing them up. In netvsc case
users are not supposed to configure the VF interface at all, it just
becomes a slave of netvsc interface.

-- 
  Vitaly


Re: [PATCH RFC net-next] net: Allow name change of IFF_UP interfaces

2017-08-09 Thread Vitaly Kuznetsov
吉藤英明 <hideaki.yoshif...@miraclelinux.com> writes:

> 2017-08-09 19:42 GMT+09:00 Vitaly Kuznetsov <vkuzn...@redhat.com>:
>> What happens is: __netvsc_vf_setup() does dev_open() for the VF device and
>> the consecutive dev_change_name() fails with -EBUSY because of the
>> (dev->flags & IFF_UP) check. The history of this code predates git so I
>> wasn't able to figure out when and why the check was added, everything
>> seems to work fine without it. dev_change_name() has only two call sites,
>> both hold rtnl_lock.
>>
>> Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
>> ---
>>  RFC: I'm probably miossing something obvious and the check can't be just
>>  dropped. Stephen suggested a different solution to the isuue:
>>  https://www.spinics.net/lists/netdev/msg448243.html but it has its own
>>  drawbacks.
>> ---
>>  net/core/dev.c | 2 --
>>  1 file changed, 2 deletions(-)
>>
>> diff --git a/net/core/dev.c b/net/core/dev.c
>> index 1d75499add72..c608e233a78a 100644
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
>> @@ -1186,8 +1186,6 @@ int dev_change_name(struct net_device *dev, const char 
>> *newname)
>> BUG_ON(!dev_net(dev));
>>
>> net = dev_net(dev);
>> -   if (dev->flags & IFF_UP)
>> -   return -EBUSY;
>>
>> write_seqcount_begin(_rename_seq);
>
> I think people expect the name won't change while up
> and I don't think it is a good idea to allow changing the
> name while the interface is up.

I understand the 'legacy' concern but at the same time we don't want to
have aftificial limitations too. Name change, in particular, doesn't
happen 'under the hood' -- someone privileged enough needs to request
the change.

Can you think of any particular real world scenarios which are broken by
the change?

-- 
  Vitaly


[PATCH RFC net-next] net: Allow name change of IFF_UP interfaces

2017-08-09 Thread Vitaly Kuznetsov
Recent 'transparenf VF' changes to netvsc driver made VF interfaces
register as netvsc interface slaves upon appearance. This led to udev
not being able to rename the interface according to the 'predictable
interface names' scheme:

 kernel: mlx4_core 0002:00:02.0 eth2: joined to eth1
 kernel: hv_netvsc 33b7a6f9-6736-451f-8fce-b382eaa50bee eth1: VF
  registering: eth2
 kernel: mlx4_en: eth2: Link Up
 kernel: hv_netvsc 33b7a6f9-6736-451f-8fce-b382eaa50bee eth1: Data path
  switched to VF: eth2
 systemd-udevd[1785]: Error changing net interface name 'eth2' to
  'enP2p0s2': Device or resource busy
 systemd-udevd[1785]: could not rename interface '5' from 'eth2' to
  'enP2p0s2': Device or resource busy

What happens is: __netvsc_vf_setup() does dev_open() for the VF device and
the consecutive dev_change_name() fails with -EBUSY because of the
(dev->flags & IFF_UP) check. The history of this code predates git so I
wasn't able to figure out when and why the check was added, everything
seems to work fine without it. dev_change_name() has only two call sites,
both hold rtnl_lock.

Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
---
 RFC: I'm probably miossing something obvious and the check can't be just
 dropped. Stephen suggested a different solution to the isuue:
 https://www.spinics.net/lists/netdev/msg448243.html but it has its own
 drawbacks.
---
 net/core/dev.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 1d75499add72..c608e233a78a 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1186,8 +1186,6 @@ int dev_change_name(struct net_device *dev, const char 
*newname)
BUG_ON(!dev_net(dev));
 
net = dev_net(dev);
-   if (dev->flags & IFF_UP)
-   return -EBUSY;
 
write_seqcount_begin(_rename_seq);
 
-- 
2.13.4



Re: [PATCH net-next 0/1] netvsc: another VF datapath fix

2017-08-09 Thread Vitaly Kuznetsov
Stephen Hemminger  writes:

> The following would allow udev a chance at the device.
>

This would of course work but the approach is a bit hackish and can
still fail on a loaded system. Raising the pause interval would be an
option, but again, probably not the best one.

Let me try to send an RFC removing the check it dev_change_name() and if
it turns out that it can't be removed we can go back to your patch. But
in case it can we can leave without it.

Thanks,

> From 37fb240a6107834c3dd3f57caede9d73b807f414 Mon Sep 17 00:00:00 2001
> From: Stephen Hemminger 
> Date: Tue, 8 Aug 2017 08:39:24 -0700
> Subject: [PATCH] netvsc: delay setup of VF device
>
> When VF device is discovered, delay bring it automatically up in
> order to allow userspace to some simple changes (like renaming).
>
> Signed-off-by: Stephen Hemminger 
> ---
>  drivers/net/hyperv/hyperv_net.h |  2 +-
>  drivers/net/hyperv/netvsc_drv.c | 11 ++-
>  2 files changed, 7 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
> index d1ea99a12cf2..f620c90307ed 100644
> --- a/drivers/net/hyperv/hyperv_net.h
> +++ b/drivers/net/hyperv/hyperv_net.h
> @@ -723,7 +723,7 @@ struct net_device_context {
>   /* State to manage the associated VF interface. */
>   struct net_device __rcu *vf_netdev;
>   struct netvsc_vf_pcpu_stats __percpu *vf_stats;
> - struct work_struct vf_takeover;
> + struct delayed_work vf_takeover;
>
>   /* 1: allocated, serial number is valid. 0: not allocated */
>   u32 vf_alloc;
> diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
> index 7ebf0e10e62b..1dff160368a3 100644
> --- a/drivers/net/hyperv/netvsc_drv.c
> +++ b/drivers/net/hyperv/netvsc_drv.c
> @@ -47,6 +47,7 @@
>
>  #define RING_SIZE_MIN 64
>  #define LINKCHANGE_INT (2 * HZ)
> +#define VF_TAKEOVER_INT (HZ / 10)
>
>  static int ring_size = 128;
>  module_param(ring_size, int, S_IRUGO);
> @@ -1570,7 +1571,7 @@ static int netvsc_vf_join(struct net_device *vf_netdev,
>   /* set slave flag before open to prevent IPv6 addrconf */
>   vf_netdev->flags |= IFF_SLAVE;
>
> - schedule_work(_ctx->vf_takeover);
> + schedule_delayed_work(_ctx->vf_takeover, VF_TAKEOVER_INT);
>
>   netdev_info(vf_netdev, "joined to %s\n", ndev->name);
>   return 0;
> @@ -1608,12 +1609,12 @@ static void __netvsc_vf_setup(struct net_device *ndev,
>  static void netvsc_vf_setup(struct work_struct *w)
>  {
>   struct net_device_context *ndev_ctx
> - = container_of(w, struct net_device_context, vf_takeover);
> + = container_of(w, struct net_device_context, vf_takeover.work);
>   struct net_device *ndev = hv_get_drvdata(ndev_ctx->device_ctx);
>   struct net_device *vf_netdev;
>
>   if (!rtnl_trylock()) {
> - schedule_work(w);
> + schedule_delayed_work(_ctx->vf_takeover, 0);
>   return;
>   }
>
> @@ -1717,7 +1718,7 @@ static int netvsc_unregister_vf(struct net_device 
> *vf_netdev)
>   return NOTIFY_DONE;
>
>   net_device_ctx = netdev_priv(ndev);
> - cancel_work_sync(_device_ctx->vf_takeover);
> + cancel_delayed_work_sync(_device_ctx->vf_takeover);
>
>   netdev_info(ndev, "VF unregistering: %s\n", vf_netdev->name);
>
> @@ -1759,7 +1760,7 @@ static int netvsc_probe(struct hv_device *dev,
>
>   spin_lock_init(_device_ctx->lock);
>   INIT_LIST_HEAD(_device_ctx->reconfig_events);
> - INIT_WORK(_device_ctx->vf_takeover, netvsc_vf_setup);
> + INIT_DELAYED_WORK(_device_ctx->vf_takeover, netvsc_vf_setup);
>
>   net_device_ctx->vf_stats
>   = netdev_alloc_pcpu_stats(struct netvsc_vf_pcpu_stats);

-- 
  Vitaly


Re: [PATCH net-next 0/1] netvsc: another VF datapath fix

2017-08-08 Thread Vitaly Kuznetsov
Stephen Hemminger <step...@networkplumber.org> writes:

> On Tue, 08 Aug 2017 17:24:03 +0200
> Vitaly Kuznetsov <vkuzn...@redhat.com> wrote:
>
>> Stephen Hemminger <step...@networkplumber.org> writes:
>> 
>> > On Tue, 08 Aug 2017 16:03:56 +0200
>> > Vitaly Kuznetsov <vkuzn...@redhat.com> wrote:
>> >  
>> >> Stephen Hemminger <step...@networkplumber.org> writes:
>> >>   
>> >> > Previous fix was incomplete.
>> >> >
>> >> 
>> >> Not really related to this patch series (which btw fixes my issue,
>> >> thanks!), but I found one addition issue. Systemd fails to rename VF
>> >> interface:
>> >> 
>> >>  kernel: mlx4_core 0002:00:02.0 eth2: joined to eth1
>> >>  kernel: hv_netvsc 33b7a6f9-6736-451f-8fce-b382eaa50bee eth1: VF 
>> >> registering: eth2
>> >>  kernel: hv_netvsc 33b7a6f9-6736-451f-8fce-b382eaa50bee eth1: Data path 
>> >> switched to VF: eth2
>> >>  kernel: mlx4_en: eth2: Link Up
>> >>  NetworkManager[750]:   [1502200557.0821] device (eth2): link 
>> >> connected
>> >>  NetworkManager[750]:   [1502200557.1004] manager: (eth2): new 
>> >> Ethernet device (/org/freedesktop/NetworkManager/Devices/6)
>> >>  systemd-udevd[6942]: Error changing net interface name 'eth2' to 
>> >> 'enP2p0s2': Device or resource busy
>> >>  systemd-udevd[6942]: could not rename interface '6' from 'eth2' to 
>> >> 'enP2p0s2': Device or resource busy
>> >> 
>> >> With some debug added I figured out what's wrong: __netvsc_vf_setup()
>> >> does dev_open() which sets IFF_UP flag on the interface. When systemd
>> >> tries to rename the interface we get into dev_change_name() and this
>> >> fails with -EBUSY when (dev->flags & IFF_UP).
>> >> 
>> >> The issue is of less importance as we're not supposed to configure VF
>> >> interface now. However, we may still want to get a stable name for it.
>> >> 
>> >> Any idea how this can be fixed?  
>> >
>> > The problem is Network Manager should ignore the VF device. I don't run NM 
>> > on these
>> > interfaces because it causes more issues than it helps (dueling userspace).
>> >
>> > The driver needs to have slave track the master interface. Relying on 
>> > userspace
>> > to bring interface up leads to all the issues the bonding script had.
>> >
>> > One option would be to delay the work of bringing up the slave device to 
>> > allow
>> > small window for renaming to run.  
>> 
>> Actually, I tried removing 'if (dev->flags & IFF_UP)' check from
>> dev_change_name() and everything seems to work fine. The history of this
>> code predates git so I have no idea why it's forbiden to change names of
>> IFF_UP interfaces... I can send an RFC removing the check to figure out
>> the truth :-)
>
> That only works because NM by default brings everything up.
> Many users don't use NM on servers.

I'm not sure NetworkManager is related here: renaming is done by
systemd/udev according to the "predictable interface names" scheme:
https://www.freedesktop.org/wiki/Software/systemd/PredictableNetworkInterfaceNames/

Basically, systemd/udev tries to rename all new interfaces in the system
according to this scheme. And systemd/udev is ubiquitous nowdays.

I tried disabling NetworkManager in my VM and the behavior is not any
different:

 kernel: mlx4_core 0002:00:02.0 eth2: joined to eth1
 kernel: hv_netvsc 33b7a6f9-6736-451f-8fce-b382eaa50bee eth1: VF registering: 
eth2
 kernel: mlx4_en: eth2: Link Up
 kernel: hv_netvsc 33b7a6f9-6736-451f-8fce-b382eaa50bee eth1: Data path 
switched to VF: eth2
 systemd-udevd[1785]: Error changing net interface name 'eth2' to 'enP2p0s2': 
Device or resource busy
 systemd-udevd[1785]: could not rename interface '5' from 'eth2' to 'enP2p0s2': 
Device or resource busy

The 'if (dev->flags & IFF_UP)' check in dev_change_name() function
I mentioned prevents renaming interfaces which are already up but I
don't know an obvious reason for it to exist.

-- 
  Vitaly


Re: [PATCH net-next 0/1] netvsc: another VF datapath fix

2017-08-08 Thread Vitaly Kuznetsov
Stephen Hemminger <step...@networkplumber.org> writes:

> On Tue, 08 Aug 2017 16:03:56 +0200
> Vitaly Kuznetsov <vkuzn...@redhat.com> wrote:
>
>> Stephen Hemminger <step...@networkplumber.org> writes:
>> 
>> > Previous fix was incomplete.
>> >  
>> 
>> Not really related to this patch series (which btw fixes my issue,
>> thanks!), but I found one addition issue. Systemd fails to rename VF
>> interface:
>> 
>>  kernel: mlx4_core 0002:00:02.0 eth2: joined to eth1
>>  kernel: hv_netvsc 33b7a6f9-6736-451f-8fce-b382eaa50bee eth1: VF 
>> registering: eth2
>>  kernel: hv_netvsc 33b7a6f9-6736-451f-8fce-b382eaa50bee eth1: Data path 
>> switched to VF: eth2
>>  kernel: mlx4_en: eth2: Link Up
>>  NetworkManager[750]:   [1502200557.0821] device (eth2): link connected
>>  NetworkManager[750]:   [1502200557.1004] manager: (eth2): new 
>> Ethernet device (/org/freedesktop/NetworkManager/Devices/6)
>>  systemd-udevd[6942]: Error changing net interface name 'eth2' to 
>> 'enP2p0s2': Device or resource busy
>>  systemd-udevd[6942]: could not rename interface '6' from 'eth2' to 
>> 'enP2p0s2': Device or resource busy
>> 
>> With some debug added I figured out what's wrong: __netvsc_vf_setup()
>> does dev_open() which sets IFF_UP flag on the interface. When systemd
>> tries to rename the interface we get into dev_change_name() and this
>> fails with -EBUSY when (dev->flags & IFF_UP).
>> 
>> The issue is of less importance as we're not supposed to configure VF
>> interface now. However, we may still want to get a stable name for it.
>> 
>> Any idea how this can be fixed?
>
> The problem is Network Manager should ignore the VF device. I don't run NM on 
> these
> interfaces because it causes more issues than it helps (dueling userspace).
>
> The driver needs to have slave track the master interface. Relying on 
> userspace
> to bring interface up leads to all the issues the bonding script had.
>
> One option would be to delay the work of bringing up the slave device to allow
> small window for renaming to run.

Actually, I tried removing 'if (dev->flags & IFF_UP)' check from
dev_change_name() and everything seems to work fine. The history of this
code predates git so I have no idea why it's forbiden to change names of
IFF_UP interfaces... I can send an RFC removing the check to figure out
the truth :-)

-- 
  Vitaly


Re: [PATCH net-next 0/1] netvsc: another VF datapath fix

2017-08-08 Thread Vitaly Kuznetsov
Stephen Hemminger  writes:

> Previous fix was incomplete.
>

Not really related to this patch series (which btw fixes my issue,
thanks!), but I found one addition issue. Systemd fails to rename VF
interface:

 kernel: mlx4_core 0002:00:02.0 eth2: joined to eth1
 kernel: hv_netvsc 33b7a6f9-6736-451f-8fce-b382eaa50bee eth1: VF registering: 
eth2
 kernel: hv_netvsc 33b7a6f9-6736-451f-8fce-b382eaa50bee eth1: Data path 
switched to VF: eth2
 kernel: mlx4_en: eth2: Link Up
 NetworkManager[750]:   [1502200557.0821] device (eth2): link connected
 NetworkManager[750]:   [1502200557.1004] manager: (eth2): new Ethernet 
device (/org/freedesktop/NetworkManager/Devices/6)
 systemd-udevd[6942]: Error changing net interface name 'eth2' to 'enP2p0s2': 
Device or resource busy
 systemd-udevd[6942]: could not rename interface '6' from 'eth2' to 'enP2p0s2': 
Device or resource busy

With some debug added I figured out what's wrong: __netvsc_vf_setup()
does dev_open() which sets IFF_UP flag on the interface. When systemd
tries to rename the interface we get into dev_change_name() and this
fails with -EBUSY when (dev->flags & IFF_UP).

The issue is of less importance as we're not supposed to configure VF
interface now. However, we may still want to get a stable name for it.

Any idea how this can be fixed?

-- 
  Vitaly


Re: [PATCH net-next 1/1] netvsc: fix rtnl deadlock on unregister of vf

2017-08-07 Thread Vitaly Kuznetsov
Vitaly Kuznetsov <vkuzn...@redhat.com> writes:

> Vitaly Kuznetsov <vkuzn...@redhat.com> writes:
>
>> Stephen Hemminger <step...@networkplumber.org> writes:
>>
>>> With new transparent VF support, it is possible to get a deadlock
>>> when some of the deferred work is running and the unregister_vf
>>> is trying to cancel the work element. The solution is to use
>>> trylock and reschedule (similar to bonding and team device).
>>>
>>> Reported-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
>>> Fixes: 0c195567a8f6 ("netvsc: transparent VF management")
>>> Signed-off-by: Stephen Hemminger <sthem...@microsoft.com>
>>> ---
>>>  drivers/net/hyperv/netvsc_drv.c | 12 ++--
>>>  1 file changed, 10 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/net/hyperv/netvsc_drv.c 
>>> b/drivers/net/hyperv/netvsc_drv.c
>>> index c71728d82049..e75c0f852a63 100644
>>> --- a/drivers/net/hyperv/netvsc_drv.c
>>> +++ b/drivers/net/hyperv/netvsc_drv.c
>>> @@ -1601,7 +1601,11 @@ static void netvsc_vf_setup(struct work_struct *w)
>>> struct net_device *ndev = hv_get_drvdata(ndev_ctx->device_ctx);
>>> struct net_device *vf_netdev;
>>>
>>> -   rtnl_lock();
>>> +   if (!rtnl_trylock()) {
>>> +   schedule_work(w);
>>> +   return;
>>> +   }
>>> +
>>> vf_netdev = rtnl_dereference(ndev_ctx->vf_netdev);
>>> if (vf_netdev)
>>> __netvsc_vf_setup(ndev, vf_netdev);
>>> @@ -1655,7 +1659,11 @@ static void netvsc_vf_update(struct work_struct *w)
>>> struct net_device *vf_netdev;
>>> bool vf_is_up;
>>>
>>> -   rtnl_lock();
>>> +   if (!rtnl_trylock()) {
>>> +   schedule_work(w);
>>> +   return;
>>> +   }
>>> +
>>
>> So in the situation when we're currently in netvsc_unregister_vf() and
>> trying to do
>> cancel_work_sync(_device_ctx->vf_takeover);
>>  cancel_work_sync(_device_ctx->vf_notify);
>>
>> we'll end up not executing netvsc_vf_update() at all, right? Wouldn't it
>> create an issue as nobody is switching the datapath back to netvsc?
>>
>
> Actually, looking more at this I think we have additional issues:
>
> netvsc_unregister_vf() may get executed _before_ netvsc_vf_update() gets
> a chance and we just cancel it so the data path is never switched
> back. I actually have a VM where I suppose it happens ...
>
> [7.235566] hv_netvsc 33b7a6f9-6736-451f-8fce-b382eaa50bee eth1: VF up: 
> enP2p0s2
> [7.235569] hv_netvsc 33b7a6f9-6736-451f-8fce-b382eaa50bee eth1: Datapath 
> switched to VF: enP2p0s2
>
> On VF removal:
>
> [   17.675885] mlx4_en: enP2p0s2: Close port called
> [   17.727005] hv_netvsc 33b7a6f9-6736-451f-8fce-b382eaa50bee eth1: VF 
> unregistering: enP2p0s2
> 
>
> We need to make sure netvsc_vf_update() is always processed on removal.

So the question I have is: why do we need to call netvsc_vf_update()
from a work? I tried calling it directly from netvsc_netdev_event() (and
with rtnl_lock()/unlock() calls dropped from it as we already have it,
of course) and everything seems to work for me.

Shall I send a patch removing the work?

-- 
  Vitaly


Re: [PATCH net-next 1/1] netvsc: fix rtnl deadlock on unregister of vf

2017-08-07 Thread Vitaly Kuznetsov
Vitaly Kuznetsov <vkuzn...@redhat.com> writes:

> Stephen Hemminger <step...@networkplumber.org> writes:
>
>> With new transparent VF support, it is possible to get a deadlock
>> when some of the deferred work is running and the unregister_vf
>> is trying to cancel the work element. The solution is to use
>> trylock and reschedule (similar to bonding and team device).
>>
>> Reported-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
>> Fixes: 0c195567a8f6 ("netvsc: transparent VF management")
>> Signed-off-by: Stephen Hemminger <sthem...@microsoft.com>
>> ---
>>  drivers/net/hyperv/netvsc_drv.c | 12 ++--
>>  1 file changed, 10 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/net/hyperv/netvsc_drv.c 
>> b/drivers/net/hyperv/netvsc_drv.c
>> index c71728d82049..e75c0f852a63 100644
>> --- a/drivers/net/hyperv/netvsc_drv.c
>> +++ b/drivers/net/hyperv/netvsc_drv.c
>> @@ -1601,7 +1601,11 @@ static void netvsc_vf_setup(struct work_struct *w)
>>  struct net_device *ndev = hv_get_drvdata(ndev_ctx->device_ctx);
>>  struct net_device *vf_netdev;
>>
>> -rtnl_lock();
>> +if (!rtnl_trylock()) {
>> +schedule_work(w);
>> +return;
>> +}
>> +
>>  vf_netdev = rtnl_dereference(ndev_ctx->vf_netdev);
>>  if (vf_netdev)
>>  __netvsc_vf_setup(ndev, vf_netdev);
>> @@ -1655,7 +1659,11 @@ static void netvsc_vf_update(struct work_struct *w)
>>  struct net_device *vf_netdev;
>>  bool vf_is_up;
>>
>> -rtnl_lock();
>> +if (!rtnl_trylock()) {
>> +schedule_work(w);
>> +return;
>> +}
>> +
>
> So in the situation when we're currently in netvsc_unregister_vf() and
> trying to do
> cancel_work_sync(_device_ctx->vf_takeover);
>   cancel_work_sync(_device_ctx->vf_notify);
>
> we'll end up not executing netvsc_vf_update() at all, right? Wouldn't it
> create an issue as nobody is switching the datapath back to netvsc?
>

Actually, looking more at this I think we have additional issues:

netvsc_unregister_vf() may get executed _before_ netvsc_vf_update() gets
a chance and we just cancel it so the data path is never switched
back. I actually have a VM where I suppose it happens ...

[7.235566] hv_netvsc 33b7a6f9-6736-451f-8fce-b382eaa50bee eth1: VF up: 
enP2p0s2
[7.235569] hv_netvsc 33b7a6f9-6736-451f-8fce-b382eaa50bee eth1: Datapath 
switched to VF: enP2p0s2

On VF removal:

[   17.675885] mlx4_en: enP2p0s2: Close port called
[   17.727005] hv_netvsc 33b7a6f9-6736-451f-8fce-b382eaa50bee eth1: VF 
unregistering: enP2p0s2


We need to make sure netvsc_vf_update() is always processed on removal.

-- 
  Vitaly


Re: [PATCH net-next 1/1] netvsc: fix rtnl deadlock on unregister of vf

2017-08-07 Thread Vitaly Kuznetsov
Stephen Hemminger <step...@networkplumber.org> writes:

> With new transparent VF support, it is possible to get a deadlock
> when some of the deferred work is running and the unregister_vf
> is trying to cancel the work element. The solution is to use
> trylock and reschedule (similar to bonding and team device).
>
> Reported-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
> Fixes: 0c195567a8f6 ("netvsc: transparent VF management")
> Signed-off-by: Stephen Hemminger <sthem...@microsoft.com>
> ---
>  drivers/net/hyperv/netvsc_drv.c | 12 ++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
> index c71728d82049..e75c0f852a63 100644
> --- a/drivers/net/hyperv/netvsc_drv.c
> +++ b/drivers/net/hyperv/netvsc_drv.c
> @@ -1601,7 +1601,11 @@ static void netvsc_vf_setup(struct work_struct *w)
>   struct net_device *ndev = hv_get_drvdata(ndev_ctx->device_ctx);
>   struct net_device *vf_netdev;
>
> - rtnl_lock();
> + if (!rtnl_trylock()) {
> + schedule_work(w);
> + return;
> + }
> +
>   vf_netdev = rtnl_dereference(ndev_ctx->vf_netdev);
>   if (vf_netdev)
>   __netvsc_vf_setup(ndev, vf_netdev);
> @@ -1655,7 +1659,11 @@ static void netvsc_vf_update(struct work_struct *w)
>   struct net_device *vf_netdev;
>   bool vf_is_up;
>
> - rtnl_lock();
> + if (!rtnl_trylock()) {
> + schedule_work(w);
> + return;
> + }
> +

So in the situation when we're currently in netvsc_unregister_vf() and
trying to do
cancel_work_sync(_device_ctx->vf_takeover);
cancel_work_sync(_device_ctx->vf_notify);

we'll end up not executing netvsc_vf_update() at all, right? Wouldn't it
create an issue as nobody is switching the datapath back to netvsc?

>   vf_netdev = rtnl_dereference(ndev_ctx->vf_netdev);
>   if (!vf_netdev)
>   goto unlock;

-- 
  Vitaly


[PATCH v2] xen-netfront: avoid crashing on resume after a failure in talk_to_netback()

2017-05-11 Thread Vitaly Kuznetsov
Unavoidable crashes in netfront_resume() and netback_changed() after a
previous fail in talk_to_netback() (e.g. when we fail to read MAC from
xenstore) were discovered. The failure path in talk_to_netback() does
unregister/free for netdev but we don't reset drvdata and we try accessing
it after resume.

Fix the bug by removing the whole xen device completely with
device_unregister(), this guarantees we won't have any calls into netfront
after a failure.

Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
---
Changes since v1: instead of cleaning drvdata and checking for it in
netfront_resume() and netback_changed() remove the device completely with
device_unregister() [David Miller]
---
 drivers/net/xen-netfront.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index 6ffc482..7b61adb 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -1934,8 +1934,7 @@ static int talk_to_netback(struct xenbus_device *dev,
xennet_disconnect_backend(info);
xennet_destroy_queues(info);
  out:
-   unregister_netdev(info->netdev);
-   xennet_free_netdev(info->netdev);
+   device_unregister(>dev);
return err;
 }
 
-- 
2.9.3



Re: [PATCH] xen-netfront: avoid crashing on resume after a failure in talk_to_netback()

2017-05-05 Thread Vitaly Kuznetsov
David Miller <da...@davemloft.net> writes:

> From: Vitaly Kuznetsov <vkuzn...@redhat.com>
> Date: Thu,  4 May 2017 14:23:04 +0200
>
>> Unavoidable crashes in netfront_resume() and netback_changed() after a
>> previous fail in talk_to_netback() (e.g. when we fail to read MAC from
>> xenstore) were discovered. The failure path in talk_to_netback() does
>> unregister/free for netdev but we don't reset drvdata and we try accessing
>> it again after resume.
>> 
>> Reset drvdata in netback_changed() the same way we reset it in
>> netfront_probe() and check for NULL in both netfront_resume() and
>> netback_changed() to properly handle the situation.
>> 
>> Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
>
> The circumstances under which netfront_probe() NULLs out the device
> private is different than what you propose here, which is to do it
> on a live device in netback_changed() whilst mutliple susbsytems
> have a reference to this device and can call into the driver still.
>
> It is only legal to do this in the probe function because such
> references and execution possibilities do not exist at that point.
>
> What really needs to happen is that the xenbus_driver must be told to
> unregister this xen device and stop making calls into the driver for
> it before you release the netdev state.
>
> That is the only reasonable way to fix this bug.

True,

after looking at the issue again I realized that removing half of the
device in talk_to_netback() is a mistake - we should either treat errors
as fatal and remove the device completely or leave netdev in place
hoping that it'll magically got fixed later. I'm leaning towards the
former, I tried and the following simple patch does the job:

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index 6ffc482..7b61adb 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -1934,8 +1934,7 @@ static int talk_to_netback(struct xenbus_device *dev,
xennet_disconnect_backend(info);
xennet_destroy_queues(info);
  out:
-   unregister_netdev(info->netdev);
-   xennet_free_netdev(info->netdev);
+   device_unregister(>dev);
return err;
 }

In case noone is against this big hammer I can send this as v2.

Thank you for your feedback, David!

-- 
  Vitaly


[PATCH] xen-netfront: avoid crashing on resume after a failure in talk_to_netback()

2017-05-04 Thread Vitaly Kuznetsov
Unavoidable crashes in netfront_resume() and netback_changed() after a
previous fail in talk_to_netback() (e.g. when we fail to read MAC from
xenstore) were discovered. The failure path in talk_to_netback() does
unregister/free for netdev but we don't reset drvdata and we try accessing
it again after resume.

Reset drvdata in netback_changed() the same way we reset it in
netfront_probe() and check for NULL in both netfront_resume() and
netback_changed() to properly handle the situation.

Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
---
I apologize for sending this during the merge window, I'm not sure if this
needs to go through xen or netdev tree. I think the fix is simple enough
and it would make sense to squeeze it in 4.12 if possible. Thanks.
---
 drivers/net/xen-netfront.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index 6ffc482..92f746c 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -1426,6 +1426,9 @@ static int netfront_resume(struct xenbus_device *dev)
 {
struct netfront_info *info = dev_get_drvdata(>dev);
 
+   if (!info)
+   return 0;
+
dev_dbg(>dev, "%s\n", dev->nodename);
 
xennet_disconnect_backend(info);
@@ -1936,6 +1939,7 @@ static int talk_to_netback(struct xenbus_device *dev,
  out:
unregister_netdev(info->netdev);
xennet_free_netdev(info->netdev);
+   dev_set_drvdata(>dev, NULL);
return err;
 }
 
@@ -1997,7 +2001,12 @@ static void netback_changed(struct xenbus_device *dev,
enum xenbus_state backend_state)
 {
struct netfront_info *np = dev_get_drvdata(>dev);
-   struct net_device *netdev = np->netdev;
+   struct net_device *netdev;
+
+   if (!np)
+   return;
+
+   netdev = np->netdev;
 
dev_dbg(>dev, "%s\n", xenbus_strstate(backend_state));
 
-- 
2.9.3



Re: [PATCH net-next] net/hyperv: remove use of VLAN_TAG_PRESENT

2017-01-04 Thread Vitaly Kuznetsov
Michał Mirosław  writes:

> Signed-off-by: Michał Mirosław 

Can we have a non-empty description please?

> ---
>  drivers/net/hyperv/hyperv_net.h   |  2 +-
>  drivers/net/hyperv/netvsc_drv.c   | 13 ++---
>  drivers/net/hyperv/rndis_filter.c |  4 ++--
>  3 files changed, 9 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
> index 3958adade7eb..b53729e85a79 100644
> --- a/drivers/net/hyperv/hyperv_net.h
> +++ b/drivers/net/hyperv/hyperv_net.h
> @@ -186,7 +186,7 @@ int netvsc_recv_callback(struct hv_device *device_obj,
>   void **data,
>   struct ndis_tcp_ip_checksum_info *csum_info,
>   struct vmbus_channel *channel,
> - u16 vlan_tci);
> + u16 vlan_tci, bool vlan_present);
>  void netvsc_channel_cb(void *context);
>  int rndis_filter_open(struct netvsc_device *nvdev);
>  int rndis_filter_close(struct netvsc_device *nvdev);
> diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
> index c9414c054852..6597d7901929 100644
> --- a/drivers/net/hyperv/netvsc_drv.c
> +++ b/drivers/net/hyperv/netvsc_drv.c
> @@ -595,7 +595,7 @@ void netvsc_linkstatus_callback(struct hv_device 
> *device_obj,
>  static struct sk_buff *netvsc_alloc_recv_skb(struct net_device *net,
>   struct hv_netvsc_packet *packet,
>   struct ndis_tcp_ip_checksum_info *csum_info,
> - void *data, u16 vlan_tci)
> + void *data)
>  {
>   struct sk_buff *skb;
>
> @@ -625,10 +625,6 @@ static struct sk_buff *netvsc_alloc_recv_skb(struct 
> net_device *net,
>   skb->ip_summed = CHECKSUM_UNNECESSARY;
>   }
>
> - if (vlan_tci & VLAN_TAG_PRESENT)
> - __vlan_hwaccel_put_tag(skb, htons(ETH_P_8021Q),
> -vlan_tci);
> -
>   return skb;
>  }
>
> @@ -641,7 +637,7 @@ int netvsc_recv_callback(struct hv_device *device_obj,
>   void **data,
>   struct ndis_tcp_ip_checksum_info *csum_info,
>   struct vmbus_channel *channel,
> - u16 vlan_tci)
> + u16 vlan_tci, bool vlan_present)
>  {
>   struct net_device *net = hv_get_drvdata(device_obj);
>   struct net_device_context *net_device_ctx = netdev_priv(net);
> @@ -664,12 +660,15 @@ int netvsc_recv_callback(struct hv_device *device_obj,
>   net = vf_netdev;
>
>   /* Allocate a skb - TODO direct I/O to pages? */
> - skb = netvsc_alloc_recv_skb(net, packet, csum_info, *data, vlan_tci);
> + skb = netvsc_alloc_recv_skb(net, packet, csum_info, *data);
>   if (unlikely(!skb)) {
>   ++net->stats.rx_dropped;
>   return NVSP_STAT_FAIL;
>   }
>
> + if (vlan_present)
> + __vlan_hwaccel_put_tag(skb, htons(ETH_P_8021Q), vlan_tci);
> +
>   if (net != vf_netdev)
>   skb_record_rx_queue(skb,
>   channel->offermsg.offer.sub_channel_index);
> diff --git a/drivers/net/hyperv/rndis_filter.c 
> b/drivers/net/hyperv/rndis_filter.c
> index 8d90904e0e49..7f7b410a41c2 100644
> --- a/drivers/net/hyperv/rndis_filter.c
> +++ b/drivers/net/hyperv/rndis_filter.c
> @@ -381,13 +381,13 @@ static int rndis_filter_receive_data(struct 
> rndis_device *dev,
>
>   vlan = rndis_get_ppi(rndis_pkt, IEEE_8021Q_INFO);
>   if (vlan) {
> - vlan_tci = VLAN_TAG_PRESENT | vlan->vlanid |
> + vlan_tci = vlan->vlanid |
>   (vlan->pri << VLAN_PRIO_SHIFT);
>   }
>
>   csum_info = rndis_get_ppi(rndis_pkt, TCPIP_CHKSUM_PKTINFO);
>   return netvsc_recv_callback(net_device_ctx->device_ctx, pkt, data,
> - csum_info, channel, vlan_tci);
> + csum_info, channel, vlan_tci, vlan);
>  }
>
>  int rndis_filter_receive(struct hv_device *dev,

-- 
  Vitaly


[PATCH net-next] hv_netvsc: remove excessive logging on MTU change

2016-11-28 Thread Vitaly Kuznetsov
When we change MTU or the number of channels on a netvsc device we get the
following logged:

 hv_netvsc bf5edba8...: net device safe to remove
 hv_netvsc: hv_netvsc channel opened successfully
 hv_netvsc bf5edba8...: Send section size: 6144, Section count:2560
 hv_netvsc bf5edba8...: Device MAC 00:15:5d:1e:91:12 link state up

This information is useful as debug at most.

Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
---
 drivers/net/hyperv/netvsc.c   | 8 
 drivers/net/hyperv/rndis_filter.c | 6 +++---
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 720b5fa..d85da0d 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -410,8 +410,8 @@ static int netvsc_init_buf(struct hv_device *device)
net_device->send_section_cnt =
net_device->send_buf_size / net_device->send_section_size;
 
-   dev_info(>device, "Send section size: %d, Section count:%d\n",
-net_device->send_section_size, net_device->send_section_cnt);
+   netdev_dbg(ndev, "Send section size: %d, Section count:%d\n",
+  net_device->send_section_size, net_device->send_section_cnt);
 
/* Setup state for managing the send buffer. */
net_device->map_words = DIV_ROUND_UP(net_device->send_section_cnt,
@@ -578,7 +578,7 @@ void netvsc_device_remove(struct hv_device *device)
 * At this point, no one should be accessing net_device
 * except in here
 */
-   dev_notice(>device, "net device safe to remove\n");
+   netdev_dbg(ndev, "net device safe to remove\n");
 
/* Now, we can close the channel safely */
vmbus_close(device->channel);
@@ -1380,7 +1380,7 @@ int netvsc_device_add(struct hv_device *device, void 
*additional_info)
}
 
/* Channel is opened */
-   pr_info("hv_netvsc channel opened successfully\n");
+   netdev_dbg(ndev, "hv_netvsc channel opened successfully\n");
 
/* If we're reopening the device we may have multiple queues, fill the
 * chn_table with the default channel to use it before subchannels are
diff --git a/drivers/net/hyperv/rndis_filter.c 
b/drivers/net/hyperv/rndis_filter.c
index 9195d5d..8d90904 100644
--- a/drivers/net/hyperv/rndis_filter.c
+++ b/drivers/net/hyperv/rndis_filter.c
@@ -1059,9 +1059,9 @@ int rndis_filter_device_add(struct hv_device *dev,
 
device_info->link_state = rndis_device->link_state;
 
-   dev_info(>device, "Device MAC %pM link state %s\n",
-rndis_device->hw_mac_adr,
-device_info->link_state ? "down" : "up");
+   netdev_dbg(net, "Device MAC %pM link state %s\n",
+  rndis_device->hw_mac_adr,
+  device_info->link_state ? "down" : "up");
 
if (net_device->nvsp_version < NVSP_PROTOCOL_VERSION_5)
return 0;
-- 
2.9.3



Re: [PATCH net-next] hv_netvsc: fix a race between netvsc_send() and netvsc_init_buf()

2016-10-21 Thread Vitaly Kuznetsov
David Miller <da...@davemloft.net> writes:

> From: Vitaly Kuznetsov <vkuzn...@redhat.com>
> Date: Fri, 21 Oct 2016 13:15:53 +0200
>
>> David Miller <da...@davemloft.net> writes:
>> 
>>> From: Vitaly Kuznetsov <vkuzn...@redhat.com>
>>> Date: Thu, 20 Oct 2016 10:51:04 +0200
>>>
>>>> Stephen Hemminger <sthem...@microsoft.com> writes:
>>>> 
>>>>> Do we need ACCESS_ONCE() here to avoid check/use issues?
>>>>>
>>>> 
>>>> I think we don't: this is the only place in the function where we read
>>>> the variable so we'll get normal read. We're not trying to syncronize
>>>> with netvsc_init_buf() as that would require locking, if we read stale
>>>> NULL value after it was already updated on a different CPU we're fine,
>>>> we'll just return -EAGAIN.
>>>
>>> The concern is if we race with netvsc_destroy_buf() and this pointer
>>> becomes NULL after the test you are adding.
>> 
>> Thanks, this is interesting.
>> 
>> We may reach to netvsc_destroy_buf() by 3 pathes:
>> 
>> 1) cleanup path in netvsc_init_buf(). It is never called with
>> send_section_map being not NULL so it seems we're safe.
>> 
>> 2) from netvsc_remove() when the device is being removed. As far as I
>> understand we can't be on the transmit path after we call
>> unregister_netdev() so we're safe.
>> 
>> 3) from netvsc_change_mtu() and netvsc_set_channels(). These pathes are
>> specific to netvsc driver as basically we need to remove the device and
>> add it back to change mtu/number of channels. The underligning 'struct
>> net_device' stays but the rest is being removed and added back. On both
>> pathes we first call netvsc_close() which does netif_tx_disable() and as
>> far as I understand (I may be wrong) this guarantees that we won't be in
>> netvsc_send().
>> 
>> So *I think* that we can't ever free send_section_map while in
>> netvsc_send() and we can't even get to netvsc_send() after it is freed
>> but I may have missed something.
>
> Ok you may be right.
>
> Can't the device be taken down by asynchronous events as well?  For example
> if the peer end of the interface in the other guest disappears.

The device may be hot removed from host's side. In this case we follow
the following call chain:

... -> vmbus_device_unregister() -> ... -> vmbus_remove() -> netvsc_remove()

 and netvsc_remove() does netif_tx_disable(); unregister_netdev();
before calling rndis_filter_device_remove() leading to netvsc_destroy_buf().

So it seems we can't be in netvsc_send() when the device is
disappearing.

-- 
  Vitaly


Re: [PATCH net-next] hv_netvsc: fix a race between netvsc_send() and netvsc_init_buf()

2016-10-21 Thread Vitaly Kuznetsov
David Miller <da...@davemloft.net> writes:

> From: Vitaly Kuznetsov <vkuzn...@redhat.com>
> Date: Thu, 20 Oct 2016 10:51:04 +0200
>
>> Stephen Hemminger <sthem...@microsoft.com> writes:
>> 
>>> Do we need ACCESS_ONCE() here to avoid check/use issues?
>>>
>> 
>> I think we don't: this is the only place in the function where we read
>> the variable so we'll get normal read. We're not trying to syncronize
>> with netvsc_init_buf() as that would require locking, if we read stale
>> NULL value after it was already updated on a different CPU we're fine,
>> we'll just return -EAGAIN.
>
> The concern is if we race with netvsc_destroy_buf() and this pointer
> becomes NULL after the test you are adding.

Thanks, this is interesting.

We may reach to netvsc_destroy_buf() by 3 pathes:

1) cleanup path in netvsc_init_buf(). It is never called with
send_section_map being not NULL so it seems we're safe.

2) from netvsc_remove() when the device is being removed. As far as I
understand we can't be on the transmit path after we call
unregister_netdev() so we're safe.

3) from netvsc_change_mtu() and netvsc_set_channels(). These pathes are
specific to netvsc driver as basically we need to remove the device and
add it back to change mtu/number of channels. The underligning 'struct
net_device' stays but the rest is being removed and added back. On both
pathes we first call netvsc_close() which does netif_tx_disable() and as
far as I understand (I may be wrong) this guarantees that we won't be in
netvsc_send().

So *I think* that we can't ever free send_section_map while in
netvsc_send() and we can't even get to netvsc_send() after it is freed
but I may have missed something.

-- 
  Vitaly


Re: [PATCH net-next] hv_netvsc: fix a race between netvsc_send() and netvsc_init_buf()

2016-10-20 Thread Vitaly Kuznetsov
Stephen Hemminger <sthem...@microsoft.com> writes:

> Do we need ACCESS_ONCE() here to avoid check/use issues?
>

I think we don't: this is the only place in the function where we read
the variable so we'll get normal read. We're not trying to syncronize
with netvsc_init_buf() as that would require locking, if we read stale
NULL value after it was already updated on a different CPU we're fine,
we'll just return -EAGAIN.

> -Original Message-
> From: Vitaly Kuznetsov [mailto:vkuzn...@redhat.com] 
> Sent: Wednesday, October 19, 2016 2:53 PM
> To: netdev@vger.kernel.org
> Cc: Stephen Hemminger <sthem...@microsoft.com>; de...@linuxdriverproject.org; 
> linux-ker...@vger.kernel.org; KY Srinivasan <k...@microsoft.com>; Haiyang 
> Zhang <haiya...@microsoft.com>
> Subject: [PATCH net-next] hv_netvsc: fix a race between netvsc_send() and 
> netvsc_init_buf()
>
> Fix in commit 880988348270 ("hv_netvsc: set nvdev link after populating
> chn_table") turns out to be incomplete. A crash in
> netvsc_get_next_send_section() is observed on mtu change when the device is 
> under load. The race I identified is: if we get to netvsc_send() after we set 
> net_device_ctx->nvdev link in netvsc_device_add() but before we finish 
> netvsc_connect_vsp()->netvsc_init_buf() send_section_map is not allocated and 
> we crash. Unfortunately we can't set net_device_ctx->nvdev link after the 
> netvsc_init_buf() call as during the negotiation we need to receive packets 
> and on the receive path we check for it. It would probably be possible to 
> split nvdev into a pair of nvdev_in and nvdev_out links and check them 
> accordingly in get_outbound_net_device()/
> get_inbound_net_device() but this looks like an overkill.
>
> Check that send_section_map is allocated in netvsc_send().
>
> Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
> ---
>  drivers/net/hyperv/netvsc.c | 7 +++
>  1 file changed, 7 insertions(+)
>
> diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c index 
> 720b5fa..e2bfaac 100644
> --- a/drivers/net/hyperv/netvsc.c
> +++ b/drivers/net/hyperv/netvsc.c
> @@ -888,6 +888,13 @@ int netvsc_send(struct hv_device *device,
>   if (!net_device)
>   return -ENODEV;
>
> + /* We may race with netvsc_connect_vsp()/netvsc_init_buf() and get
> +  * here before the negotiation with the host is finished and
> +  * send_section_map may not be allocated yet.
> +  */
> + if (!net_device->send_section_map)
> + return -EAGAIN;
> +
>   out_channel = net_device->chn_table[q_idx];
>
>   packet->send_buf_index = NETVSC_INVALID_INDEX;
> --
> 2.7.4

-- 
  Vitaly


[PATCH net-next] hv_netvsc: fix a race between netvsc_send() and netvsc_init_buf()

2016-10-19 Thread Vitaly Kuznetsov
Fix in commit 880988348270 ("hv_netvsc: set nvdev link after populating
chn_table") turns out to be incomplete. A crash in
netvsc_get_next_send_section() is observed on mtu change when the device
is under load. The race I identified is: if we get to netvsc_send() after
we set net_device_ctx->nvdev link in netvsc_device_add() but before we
finish netvsc_connect_vsp()->netvsc_init_buf() send_section_map is not
allocated and we crash. Unfortunately we can't set net_device_ctx->nvdev
link after the netvsc_init_buf() call as during the negotiation we need
to receive packets and on the receive path we check for it. It would
probably be possible to split nvdev into a pair of nvdev_in and nvdev_out
links and check them accordingly in get_outbound_net_device()/
get_inbound_net_device() but this looks like an overkill.

Check that send_section_map is allocated in netvsc_send().

Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
---
 drivers/net/hyperv/netvsc.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 720b5fa..e2bfaac 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -888,6 +888,13 @@ int netvsc_send(struct hv_device *device,
if (!net_device)
return -ENODEV;
 
+   /* We may race with netvsc_connect_vsp()/netvsc_init_buf() and get
+* here before the negotiation with the host is finished and
+* send_section_map may not be allocated yet.
+*/
+   if (!net_device->send_section_map)
+   return -EAGAIN;
+
out_channel = net_device->chn_table[q_idx];
 
packet->send_buf_index = NETVSC_INVALID_INDEX;
-- 
2.7.4



[PATCH net-next v3] xen-netfront: avoid packet loss when ethernet header crosses page boundary

2016-09-19 Thread Vitaly Kuznetsov
Small packet loss is reported on complex multi host network configurations
including tunnels, NAT, ... My investigation led me to the following check
in netback which drops packets:

if (unlikely(txreq.size < ETH_HLEN)) {
netdev_err(queue->vif->dev,
   "Bad packet size: %d\n", txreq.size);
xenvif_tx_err(queue, , extra_count, idx);
break;
}

But this check itself is legitimate. SKBs consist of a linear part (which
has to have the ethernet header) and (optionally) a number of frags.
Netfront transmits the head of the linear part up to the page boundary
as the first request and all the rest becomes frags so when we're
reconstructing the SKB in netback we can't distinguish between original
frags and the 'tail' of the linear part. The first SKB needs to be at
least ETH_HLEN size. So in case we have an SKB with its linear part
starting too close to the page boundary the packet is lost.

I see two ways to fix the issue:
- Change the 'wire' protocol between netfront and netback to start keeping
  the original SKB structure. We'll have to add a flag indicating the fact
  that the particular request is a part of the original linear part and not
  a frag. We'll need to know the length of the linear part to pre-allocate
  memory.
- Avoid transmitting SKBs with linear parts starting too close to the page
  boundary. That seems preferable short-term and shouldn't bring
  significant performance degradation as such packets are rare. That's what
  this patch is trying to achieve with skb_copy().

Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
Acked-by: David Vrabel <david.vra...@citrix.com>
---
- Changes since 'v2':
  Move 'len' calculation after the 'if' instead of re-calculating it
  inside. [David Vrabel]

- Changes since 'v1':
  Recalculate 'len' as in some cases it changes after skb_copy()
  [David Miller]
  I keep David's ACK on my patch as I think the change doesn't
  significantly change the patch, sorry in advance if I'm wrong.
---
 drivers/net/xen-netfront.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index 96ccd4e..e17879d 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -565,6 +565,7 @@ static int xennet_start_xmit(struct sk_buff *skb, struct 
net_device *dev)
struct netfront_queue *queue = NULL;
unsigned int num_queues = dev->real_num_tx_queues;
u16 queue_index;
+   struct sk_buff *nskb;
 
/* Drop the packet if no queues are set up */
if (num_queues < 1)
@@ -593,6 +594,20 @@ static int xennet_start_xmit(struct sk_buff *skb, struct 
net_device *dev)
 
page = virt_to_page(skb->data);
offset = offset_in_page(skb->data);
+
+   /* The first req should be at least ETH_HLEN size or the packet will be
+* dropped by netback.
+*/
+   if (unlikely(PAGE_SIZE - offset < ETH_HLEN)) {
+   nskb = skb_copy(skb, GFP_ATOMIC);
+   if (!nskb)
+   goto drop;
+   dev_kfree_skb_any(skb);
+   skb = nskb;
+   page = virt_to_page(skb->data);
+   offset = offset_in_page(skb->data);
+   }
+
len = skb_headlen(skb);
 
spin_lock_irqsave(>tx_lock, flags);
-- 
2.7.4



Re: [PATCH net-next RESEND] xen-netfront: avoid packet loss when ethernet header crosses page boundary

2016-09-19 Thread Vitaly Kuznetsov
David Vrabel <david.vra...@citrix.com> writes:

> On 19/09/16 11:22, Vitaly Kuznetsov wrote:
>> David Miller <da...@davemloft.net> writes:
>> 
>>> From: Vitaly Kuznetsov <vkuzn...@redhat.com>
>>> Date: Fri, 16 Sep 2016 12:59:14 +0200
>>>
>>>> @@ -595,6 +596,19 @@ static int xennet_start_xmit(struct sk_buff *skb, 
>>>> struct net_device *dev)
>>>>offset = offset_in_page(skb->data);
>>>>len = skb_headlen(skb);
>>>>  
>>>> +  /* The first req should be at least ETH_HLEN size or the packet will be
>>>> +   * dropped by netback.
>>>> +   */
>>>> +  if (unlikely(PAGE_SIZE - offset < ETH_HLEN)) {
>>>> +  nskb = skb_copy(skb, GFP_ATOMIC);
>>>> +  if (!nskb)
>>>> +  goto drop;
>>>> +  dev_kfree_skb_any(skb);
>>>> +  skb = nskb;
>>>> +  page = virt_to_page(skb->data);
>>>> +  offset = offset_in_page(skb->data);
>>>> +  }
>>>> +
>>>>spin_lock_irqsave(>tx_lock, flags);
>>>
>>> I think you also have to recalculate 'len' in this case too, as
>>> skb_headlen() will definitely be different for nskb.
>>>
>>> In fact, I can't see how this code can work properly without that fix.
>> 
>> Thank you for your feedback David,
>> 
>> in my testing (even when I tried doing skb_copy() for all skbs
>> unconditionally) skb_headlen(nskb) always equals 'len' so I was under an
>> impression that both 'skb->len' and 'skb->data_len' remain the same when
>> we do skb_copy(). However, in case you think there are cases when
>> headlen changes, I see no problem with re-calculating 'len' as it won't
>> bring any significant performace penalty compared to the already added
>> skb_copy().
>
> I think you can move the len = skb_headlen(skb) after the if, no need to
> recalculate it.

Sorry, I was too fast with sending 'v2' and did the other way
around. I'll do v3.

-- 
  Vitaly


[PATCH net-next v2] xen-netfront: avoid packet loss when ethernet header crosses page boundary

2016-09-19 Thread Vitaly Kuznetsov
Small packet loss is reported on complex multi host network configurations
including tunnels, NAT, ... My investigation led me to the following check
in netback which drops packets:

if (unlikely(txreq.size < ETH_HLEN)) {
netdev_err(queue->vif->dev,
   "Bad packet size: %d\n", txreq.size);
xenvif_tx_err(queue, , extra_count, idx);
break;
}

But this check itself is legitimate. SKBs consist of a linear part (which
has to have the ethernet header) and (optionally) a number of frags.
Netfront transmits the head of the linear part up to the page boundary
as the first request and all the rest becomes frags so when we're
reconstructing the SKB in netback we can't distinguish between original
frags and the 'tail' of the linear part. The first SKB needs to be at
least ETH_HLEN size. So in case we have an SKB with its linear part
starting too close to the page boundary the packet is lost.

I see two ways to fix the issue:
- Change the 'wire' protocol between netfront and netback to start keeping
  the original SKB structure. We'll have to add a flag indicating the fact
  that the particular request is a part of the original linear part and not
  a frag. We'll need to know the length of the linear part to pre-allocate
  memory.
- Avoid transmitting SKBs with linear parts starting too close to the page
  boundary. That seems preferable short-term and shouldn't bring
  significant performance degradation as such packets are rare. That's what
  this patch is trying to achieve with skb_copy().

Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
Acked-by: David Vrabel <david.vra...@citrix.com>
---
- Changes since 'v1':
  Recalculate 'len' as in some cases it changes after skb_copy()
  [David Miller]
  I keep David's ACK on my patch as I think the change doesn't
  significantly change the patch, sorry in advance if I'm wrong.
---
 drivers/net/xen-netfront.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index 96ccd4e..043dd04 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -565,6 +565,7 @@ static int xennet_start_xmit(struct sk_buff *skb, struct 
net_device *dev)
struct netfront_queue *queue = NULL;
unsigned int num_queues = dev->real_num_tx_queues;
u16 queue_index;
+   struct sk_buff *nskb;
 
/* Drop the packet if no queues are set up */
if (num_queues < 1)
@@ -595,6 +596,20 @@ static int xennet_start_xmit(struct sk_buff *skb, struct 
net_device *dev)
offset = offset_in_page(skb->data);
len = skb_headlen(skb);
 
+   /* The first req should be at least ETH_HLEN size or the packet will be
+* dropped by netback.
+*/
+   if (unlikely(PAGE_SIZE - offset < ETH_HLEN)) {
+   nskb = skb_copy(skb, GFP_ATOMIC);
+   if (!nskb)
+   goto drop;
+   dev_kfree_skb_any(skb);
+   skb = nskb;
+   page = virt_to_page(skb->data);
+   offset = offset_in_page(skb->data);
+   len = skb_headlen(skb);
+   }
+
spin_lock_irqsave(>tx_lock, flags);
 
if (unlikely(!netif_carrier_ok(dev) ||
-- 
2.7.4



Re: [PATCH net-next RESEND] xen-netfront: avoid packet loss when ethernet header crosses page boundary

2016-09-19 Thread Vitaly Kuznetsov
David Miller <da...@davemloft.net> writes:

> From: Vitaly Kuznetsov <vkuzn...@redhat.com>
> Date: Fri, 16 Sep 2016 12:59:14 +0200
>
>> @@ -595,6 +596,19 @@ static int xennet_start_xmit(struct sk_buff *skb, 
>> struct net_device *dev)
>>  offset = offset_in_page(skb->data);
>>  len = skb_headlen(skb);
>>  
>> +/* The first req should be at least ETH_HLEN size or the packet will be
>> + * dropped by netback.
>> + */
>> +if (unlikely(PAGE_SIZE - offset < ETH_HLEN)) {
>> +nskb = skb_copy(skb, GFP_ATOMIC);
>> +if (!nskb)
>> +goto drop;
>> +dev_kfree_skb_any(skb);
>> +skb = nskb;
>> +page = virt_to_page(skb->data);
>> +offset = offset_in_page(skb->data);
>> +}
>> +
>>  spin_lock_irqsave(>tx_lock, flags);
>
> I think you also have to recalculate 'len' in this case too, as
> skb_headlen() will definitely be different for nskb.
>
> In fact, I can't see how this code can work properly without that fix.

Thank you for your feedback David,

in my testing (even when I tried doing skb_copy() for all skbs
unconditionally) skb_headlen(nskb) always equals 'len' so I was under an
impression that both 'skb->len' and 'skb->data_len' remain the same when
we do skb_copy(). However, in case you think there are cases when
headlen changes, I see no problem with re-calculating 'len' as it won't
bring any significant performace penalty compared to the already added
skb_copy().

I'll send 'v2'.

-- 
  Vitaly


[PATCH net-next RESEND] xen-netfront: avoid packet loss when ethernet header crosses page boundary

2016-09-16 Thread Vitaly Kuznetsov
Small packet loss is reported on complex multi host network configurations
including tunnels, NAT, ... My investigation led me to the following check
in netback which drops packets:

if (unlikely(txreq.size < ETH_HLEN)) {
netdev_err(queue->vif->dev,
   "Bad packet size: %d\n", txreq.size);
xenvif_tx_err(queue, , extra_count, idx);
break;
}

But this check itself is legitimate. SKBs consist of a linear part (which
has to have the ethernet header) and (optionally) a number of frags.
Netfront transmits the head of the linear part up to the page boundary
as the first request and all the rest becomes frags so when we're
reconstructing the SKB in netback we can't distinguish between original
frags and the 'tail' of the linear part. The first SKB needs to be at
least ETH_HLEN size. So in case we have an SKB with its linear part
starting too close to the page boundary the packet is lost.

I see two ways to fix the issue:
- Change the 'wire' protocol between netfront and netback to start keeping
  the original SKB structure. We'll have to add a flag indicating the fact
  that the particular request is a part of the original linear part and not
  a frag. We'll need to know the length of the linear part to pre-allocate
  memory.
- Avoid transmitting SKBs with linear parts starting too close to the page
  boundary. That seems preferable short-term and shouldn't bring
  significant performance degradation as such packets are rare. That's what
  this patch is trying to achieve with skb_copy().

Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
Acked-by: David Vrabel <david.vra...@citrix.com>
---
- This is just a RESEND with David's ACK added.
---
 drivers/net/xen-netfront.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index 96ccd4e..28c4a66 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -565,6 +565,7 @@ static int xennet_start_xmit(struct sk_buff *skb, struct 
net_device *dev)
struct netfront_queue *queue = NULL;
unsigned int num_queues = dev->real_num_tx_queues;
u16 queue_index;
+   struct sk_buff *nskb;
 
/* Drop the packet if no queues are set up */
if (num_queues < 1)
@@ -595,6 +596,19 @@ static int xennet_start_xmit(struct sk_buff *skb, struct 
net_device *dev)
offset = offset_in_page(skb->data);
len = skb_headlen(skb);
 
+   /* The first req should be at least ETH_HLEN size or the packet will be
+* dropped by netback.
+*/
+   if (unlikely(PAGE_SIZE - offset < ETH_HLEN)) {
+   nskb = skb_copy(skb, GFP_ATOMIC);
+   if (!nskb)
+   goto drop;
+   dev_kfree_skb_any(skb);
+   skb = nskb;
+   page = virt_to_page(skb->data);
+   offset = offset_in_page(skb->data);
+   }
+
spin_lock_irqsave(>tx_lock, flags);
 
if (unlikely(!netif_carrier_ok(dev) ||
-- 
2.7.4



Re: [PATCH net-next] xen-netfront: avoid packet loss when ethernet header crosses page boundary

2016-09-12 Thread Vitaly Kuznetsov
David Vrabel <david.vra...@citrix.com> writes:

> On 22/08/16 16:42, Vitaly Kuznetsov wrote:
>> Small packet loss is reported on complex multi host network configurations
>> including tunnels, NAT, ... My investigation led me to the following check
>> in netback which drops packets:
>> 
>> if (unlikely(txreq.size < ETH_HLEN)) {
>> netdev_err(queue->vif->dev,
>>"Bad packet size: %d\n", txreq.size);
>> xenvif_tx_err(queue, , extra_count, idx);
>> break;
>> }
>> 
>> But this check itself is legitimate. SKBs consist of a linear part (which
>> has to have the ethernet header) and (optionally) a number of frags.
>> Netfront transmits the head of the linear part up to the page boundary
>> as the first request and all the rest becomes frags so when we're
>> reconstructing the SKB in netback we can't distinguish between original
>> frags and the 'tail' of the linear part. The first SKB needs to be at
>> least ETH_HLEN size. So in case we have an SKB with its linear part
>> starting too close to the page boundary the packet is lost.
>> 
>> I see two ways to fix the issue:
>> - Change the 'wire' protocol between netfront and netback to start keeping
>>   the original SKB structure. We'll have to add a flag indicating the fact
>>   that the particular request is a part of the original linear part and not
>>   a frag. We'll need to know the length of the linear part to pre-allocate
>>   memory.
>> - Avoid transmitting SKBs with linear parts starting too close to the page
>>   boundary. That seems preferable short-term and shouldn't bring
>>   significant performance degradation as such packets are rare. That's what
>>   this patch is trying to achieve with skb_copy().
>> 
>> Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
>
> We should probably fix the backend to handle this (by grant copying a
> minimum amount in the linear area, but since netfront needs to work with
> older netback.
>
> Acked-by: David Vrabel <david.vra...@citrix.com>

David,

could you please pick this up for net-next or are there any concerns
remaining?

Thanks,

-- 
  Vitaly


Re: [Xen-devel] [PATCH net-next] xen-netfront: avoid packet loss when ethernet header crosses page boundary

2016-09-09 Thread Vitaly Kuznetsov
Vitaly Kuznetsov <vkuzn...@redhat.com> writes:

> Vitaly Kuznetsov <vkuzn...@redhat.com> writes:
>
>> David Vrabel <david.vra...@citrix.com> writes:
>>
>>> On 22/08/16 16:42, Vitaly Kuznetsov wrote:
>>>> 
>>>> I see two ways to fix the issue:
>>>> - Change the 'wire' protocol between netfront and netback to start keeping
>>>>   the original SKB structure. We'll have to add a flag indicating the fact
>>>>   that the particular request is a part of the original linear part and not
>>>>   a frag. We'll need to know the length of the linear part to pre-allocate
>>>>   memory.
>>>
>>> I don't think there needs to be a protocol change.  I think the check in
>>> netback is bogus -- it's the total packet length that must be >
>>> HLEN_ETH.  The upper layers will pull any headers from the frags as
>>> needed
>>
>> I'm afraid this is not always true, just removing the check leads us to
>> the following:
>>
>> [  495.442186] kernel BUG at ./include/linux/skbuff.h:1927! 
>> [  495.468789] invalid opcode:  [#1] SMP 
>
> What I wanted to say here is that this test makes me think the
> description of the patch I suggested is correct: an SKB can't have its
> linear part shorter than ETH_HLEN as the header is being pointed directly,
> upper network layers don't assemble it from frags, the check in netback
> is valid.
>
> So, how can we proceed here?

Sorry for the second ping but I'd really like to see this moving
forward...

-- 
  Vitaly


Re: [Xen-devel] [PATCH net-next] xen-netfront: avoid packet loss when ethernet header crosses page boundary

2016-08-29 Thread Vitaly Kuznetsov
Vitaly Kuznetsov <vkuzn...@redhat.com> writes:

> David Vrabel <david.vra...@citrix.com> writes:
>
>> On 22/08/16 16:42, Vitaly Kuznetsov wrote:
>>> 
>>> I see two ways to fix the issue:
>>> - Change the 'wire' protocol between netfront and netback to start keeping
>>>   the original SKB structure. We'll have to add a flag indicating the fact
>>>   that the particular request is a part of the original linear part and not
>>>   a frag. We'll need to know the length of the linear part to pre-allocate
>>>   memory.
>>
>> I don't think there needs to be a protocol change.  I think the check in
>> netback is bogus -- it's the total packet length that must be >
>> HLEN_ETH.  The upper layers will pull any headers from the frags as
>> needed
>
> I'm afraid this is not always true, just removing the check leads us to
> the following:
>
> [  495.442186] kernel BUG at ./include/linux/skbuff.h:1927! 
> [  495.468789] invalid opcode:  [#1] SMP 

What I wanted to say here is that this test makes me think the
description of the patch I suggested is correct: an SKB can't have its
linear part shorter than ETH_HLEN as the header is being pointed directly,
upper network layers don't assemble it from frags, the check in netback
is valid.

So, how can we proceed here?

-- 
  Vitaly


Re: [Xen-devel] [PATCH net-next] xen-netfront: avoid packet loss when ethernet header crosses page boundary

2016-08-23 Thread Vitaly Kuznetsov
David Vrabel <david.vra...@citrix.com> writes:

> On 22/08/16 16:42, Vitaly Kuznetsov wrote:
>> 
>> I see two ways to fix the issue:
>> - Change the 'wire' protocol between netfront and netback to start keeping
>>   the original SKB structure. We'll have to add a flag indicating the fact
>>   that the particular request is a part of the original linear part and not
>>   a frag. We'll need to know the length of the linear part to pre-allocate
>>   memory.
>
> I don't think there needs to be a protocol change.  I think the check in
> netback is bogus -- it's the total packet length that must be >
> HLEN_ETH.  The upper layers will pull any headers from the frags as
> needed

I'm afraid this is not always true, just removing the check leads us to
the following:

[  495.442186] kernel BUG at ./include/linux/skbuff.h:1927! 
[  495.468789] invalid opcode:  [#1] SMP 
[  495.490094] Modules linked in: tun loop bridge stp llc intel_rapl sb_edac 
edac_core x86_pkg_temp_thermal ipmi_ssif igb coretemp iTCO_wdt crct10dif_pclmul 
crc32_pclmul ptp ipmi_si iTCO_vendor_support ghash_clmulni_intel hpwdt 
ipmi_msghandler ioatdma hpilo pps_core lpc_ich acpi_power_meter wmi fjes 
tpm_tis dca shpchp tpm_tis_core tpm nfsd auth_rpcgss nfs_acl lockd xenfs grace 
xen_privcmd sunrpc xfs libcrc32c mgag200 i2c_algo_bit drm_kms_helper ttm drm 
crc32c_intel serio_raw xen_scsiback target_core_mod xen_pciback xen_netback 
xen_blkback xen_gntalloc xen_gntdev xen_evtchn 
[  495.749431] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.8.0-rc3+ #2 
[  495.782648] Hardware name: HP ProLiant DL380e Gen8, BIOS P73 08/20/2012 
[  495.817578] task: 81c0d500 task.stack: 81c0 
[  495.847805] RIP: e030:[]  [] 
eth_type_trans+0xf0/0x130 
[  495.888942] RSP: e02b:880429203d70  EFLAGS: 00010297 
[  495.916005] RAX: 0014 RBX: 88041f7bf200 RCX: 
 
[  495.952133] RDX: 88041ed76c40 RSI: 88041ad6b000 RDI: 
88041f7bf200 
[  495.988919] RBP: 880429203d80 R08:  R09: 
88041ed76cf0 
[  496.025782] R10: 1600 R11: c900041aa2f8 R12: 
000a 
[  496.061351] R13: c900041b0200 R14: 000b R15: 
c900041aa2a0 
[  496.098178] FS:  7fa2b9442880() GS:88042920() 
knlGS: 
[  496.139767] CS:  e033 DS:  ES:  CR0: 80050033 
[  496.169105] CR2: 5558e4d43ea0 CR3: 00042024e000 CR4: 
00042660 
[  496.206816] Stack: 
[  496.216904]  000b 51859c5d87cdd22f 880429203e68 
c002dd59 
[  496.254093]  8155eed0 51859c5d87cdd22f 88041a45 
000a22d66f70 
[  496.292351]  88041a45 c900041ad9e0 c900041aa3c0 
88041f7bf200 
[  496.330823] Call Trace: 
[  496.343397]
[  496.352992]  [] xenvif_tx_action+0x569/0x8b0 [xen_netback] 
[  496.389933]  [] ? scsi_put_command+0x80/0xd0 
[  496.418810]  [] ? __napi_schedule+0x47/0x50 
[  496.449097]  [] ? xenvif_tx_interrupt+0x50/0x60 
[xen_netback] 
[  496.485804]  [] ? __handle_irq_event_percpu+0x8d/0x190 
...

-- 
  Vitaly


[PATCH net-next] xen-netfront: avoid packet loss when ethernet header crosses page boundary

2016-08-22 Thread Vitaly Kuznetsov
Small packet loss is reported on complex multi host network configurations
including tunnels, NAT, ... My investigation led me to the following check
in netback which drops packets:

if (unlikely(txreq.size < ETH_HLEN)) {
netdev_err(queue->vif->dev,
   "Bad packet size: %d\n", txreq.size);
xenvif_tx_err(queue, , extra_count, idx);
break;
}

But this check itself is legitimate. SKBs consist of a linear part (which
has to have the ethernet header) and (optionally) a number of frags.
Netfront transmits the head of the linear part up to the page boundary
as the first request and all the rest becomes frags so when we're
reconstructing the SKB in netback we can't distinguish between original
frags and the 'tail' of the linear part. The first SKB needs to be at
least ETH_HLEN size. So in case we have an SKB with its linear part
starting too close to the page boundary the packet is lost.

I see two ways to fix the issue:
- Change the 'wire' protocol between netfront and netback to start keeping
  the original SKB structure. We'll have to add a flag indicating the fact
  that the particular request is a part of the original linear part and not
  a frag. We'll need to know the length of the linear part to pre-allocate
  memory.
- Avoid transmitting SKBs with linear parts starting too close to the page
  boundary. That seems preferable short-term and shouldn't bring
  significant performance degradation as such packets are rare. That's what
  this patch is trying to achieve with skb_copy().

Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
---
 drivers/net/xen-netfront.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index 96ccd4e..28c4a66 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -565,6 +565,7 @@ static int xennet_start_xmit(struct sk_buff *skb, struct 
net_device *dev)
struct netfront_queue *queue = NULL;
unsigned int num_queues = dev->real_num_tx_queues;
u16 queue_index;
+   struct sk_buff *nskb;
 
/* Drop the packet if no queues are set up */
if (num_queues < 1)
@@ -595,6 +596,19 @@ static int xennet_start_xmit(struct sk_buff *skb, struct 
net_device *dev)
offset = offset_in_page(skb->data);
len = skb_headlen(skb);
 
+   /* The first req should be at least ETH_HLEN size or the packet will be
+* dropped by netback.
+*/
+   if (unlikely(PAGE_SIZE - offset < ETH_HLEN)) {
+   nskb = skb_copy(skb, GFP_ATOMIC);
+   if (!nskb)
+   goto drop;
+   dev_kfree_skb_any(skb);
+   skb = nskb;
+   page = virt_to_page(skb->data);
+   offset = offset_in_page(skb->data);
+   }
+
spin_lock_irqsave(>tx_lock, flags);
 
if (unlikely(!netif_carrier_ok(dev) ||
-- 
2.7.4



[PATCH net v2 4/5] hv_netvsc: protect module refcount by checking net_device_ctx->vf_netdev

2016-08-15 Thread Vitaly Kuznetsov
We're not guaranteed to see NETDEV_REGISTER/NETDEV_UNREGISTER notifications
only once per VF but we increase/decrease module refcount unconditionally.
Check vf_netdev to make sure we don't take/release it twice. We presume
that only one VF per netvsc device may exist.

Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
Acked-by: Haiyang Zhang <haiya...@microsoft.com>
---
 drivers/net/hyperv/netvsc_drv.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 2c90883..62a4e6e 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -1193,7 +1193,7 @@ static int netvsc_register_vf(struct net_device 
*vf_netdev)
 
net_device_ctx = netdev_priv(ndev);
netvsc_dev = net_device_ctx->nvdev;
-   if (netvsc_dev == NULL)
+   if (!netvsc_dev || net_device_ctx->vf_netdev)
return NOTIFY_DONE;
 
netdev_info(ndev, "VF registering: %s\n", vf_netdev->name);
@@ -1312,7 +1312,7 @@ static int netvsc_unregister_vf(struct net_device 
*vf_netdev)
 
net_device_ctx = netdev_priv(ndev);
netvsc_dev = net_device_ctx->nvdev;
-   if (netvsc_dev == NULL)
+   if (!netvsc_dev || !net_device_ctx->vf_netdev)
return NOTIFY_DONE;
netdev_info(ndev, "VF unregistering: %s\n", vf_netdev->name);
netvsc_inject_disable(net_device_ctx);
-- 
2.7.4



[PATCH net v2 1/5] hv_netvsc: don't lose VF information

2016-08-15 Thread Vitaly Kuznetsov
struct netvsc_device is not suitable for storing VF information as this
structure is being destroyed on MTU change / set channel operation (see
rndis_filter_device_remove()). Move all VF related stuff to struct
net_device_context which is persistent.

Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
Acked-by: Haiyang Zhang <haiya...@microsoft.com>
---
 drivers/net/hyperv/hyperv_net.h | 19 
 drivers/net/hyperv/netvsc.c | 19 +++-
 drivers/net/hyperv/netvsc_drv.c | 49 +++--
 3 files changed, 45 insertions(+), 42 deletions(-)

diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index 467fb8b..3b3ecf2 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -647,7 +647,7 @@ struct netvsc_reconfig {
 struct garp_wrk {
struct work_struct dwrk;
struct net_device *netdev;
-   struct netvsc_device *netvsc_dev;
+   struct net_device_context *net_device_ctx;
 };
 
 /* The context of the netvsc device  */
@@ -678,6 +678,15 @@ struct net_device_context {
 
/* the device is going away */
bool start_remove;
+
+   /* State to manage the associated VF interface. */
+   struct net_device *vf_netdev;
+   bool vf_inject;
+   atomic_t vf_use_cnt;
+   /* 1: allocated, serial number is valid. 0: not allocated */
+   u32 vf_alloc;
+   /* Serial number of the VF to team with */
+   u32 vf_serial;
 };
 
 /* Per netvsc device */
@@ -733,15 +742,7 @@ struct netvsc_device {
u32 max_pkt; /* max number of pkt in one send, e.g. 8 */
u32 pkt_align; /* alignment bytes, e.g. 8 */
 
-   /* 1: allocated, serial number is valid. 0: not allocated */
-   u32 vf_alloc;
-   /* Serial number of the VF to team with */
-   u32 vf_serial;
atomic_t open_cnt;
-   /* State to manage the associated VF interface. */
-   bool vf_inject;
-   struct net_device *vf_netdev;
-   atomic_t vf_use_cnt;
 };
 
 static inline struct netvsc_device *
diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 20e0917..410fb8e8 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -77,13 +77,9 @@ static struct netvsc_device *alloc_net_device(void)
init_waitqueue_head(_device->wait_drain);
net_device->destroy = false;
atomic_set(_device->open_cnt, 0);
-   atomic_set(_device->vf_use_cnt, 0);
net_device->max_pkt = RNDIS_MAX_PKT_DEFAULT;
net_device->pkt_align = RNDIS_PKT_ALIGN_DEFAULT;
 
-   net_device->vf_netdev = NULL;
-   net_device->vf_inject = false;
-
return net_device;
 }
 
@@ -1106,16 +1102,16 @@ static void netvsc_send_table(struct hv_device *hdev,
nvscdev->send_table[i] = tab[i];
 }
 
-static void netvsc_send_vf(struct netvsc_device *nvdev,
+static void netvsc_send_vf(struct net_device_context *net_device_ctx,
   struct nvsp_message *nvmsg)
 {
-   nvdev->vf_alloc = nvmsg->msg.v4_msg.vf_assoc.allocated;
-   nvdev->vf_serial = nvmsg->msg.v4_msg.vf_assoc.serial;
+   net_device_ctx->vf_alloc = nvmsg->msg.v4_msg.vf_assoc.allocated;
+   net_device_ctx->vf_serial = nvmsg->msg.v4_msg.vf_assoc.serial;
 }
 
 static inline void netvsc_receive_inband(struct hv_device *hdev,
-struct netvsc_device *nvdev,
-struct nvsp_message *nvmsg)
+struct net_device_context *net_device_ctx,
+struct nvsp_message *nvmsg)
 {
switch (nvmsg->hdr.msg_type) {
case NVSP_MSG5_TYPE_SEND_INDIRECTION_TABLE:
@@ -1123,7 +1119,7 @@ static inline void netvsc_receive_inband(struct hv_device 
*hdev,
break;
 
case NVSP_MSG4_TYPE_SEND_VF_ASSOCIATION:
-   netvsc_send_vf(nvdev, nvmsg);
+   netvsc_send_vf(net_device_ctx, nvmsg);
break;
}
 }
@@ -1136,6 +1132,7 @@ static void netvsc_process_raw_pkt(struct hv_device 
*device,
   struct vmpacket_descriptor *desc)
 {
struct nvsp_message *nvmsg;
+   struct net_device_context *net_device_ctx = netdev_priv(ndev);
 
nvmsg = (struct nvsp_message *)((unsigned long)
desc + (desc->offset8 << 3));
@@ -1150,7 +1147,7 @@ static void netvsc_process_raw_pkt(struct hv_device 
*device,
break;
 
case VM_PKT_DATA_INBAND:
-   netvsc_receive_inband(device, net_device, nvmsg);
+   netvsc_receive_inband(device, net_device_ctx, nvmsg);
break;
 
default:
diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 41bd952..794139b 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c

[PATCH net v2 0/5] hv_netvsc: fixes for VF removal path

2016-08-15 Thread Vitaly Kuznetsov
Kernel crash is reported after VF is removed and detached from netvsc
device. Turns out we have multiple different (but related) issues on the
VF removal path which I'm trying to address with PATCHes 2-5 of this
series. PATCH1 is required to support the change.

Changes since v1:
- Re-arrange patches in the series to not introduce new issues [David Miller]
- Add PATCH5 which fixes a new issue I discovered while testing.
- Add Haiyang' A-b tags to PATCH1-4

With regards to Stephen's suggestion: I believe that switching to using RCU
and eliminating vf_use_cnt/vf_inject is the right thing to do long-term, we
can either put this on top of this series or do it later in net-next.

Vitaly Kuznetsov (5):
  hv_netvsc: don't lose VF information
  hv_netvsc: avoid deadlocks between rtnl lock and vf_use_cnt wait
  hv_netvsc: reset vf_inject on VF removal
  hv_netvsc: protect module refcount by checking
net_device_ctx->vf_netdev
  hv_netvsc: fix bonding devices check in netvsc_netdev_event()

 drivers/net/hyperv/hyperv_net.h |  24 -
 drivers/net/hyperv/netvsc.c |  19 +++-
 drivers/net/hyperv/netvsc_drv.c | 105 +++-
 3 files changed, 66 insertions(+), 82 deletions(-)

-- 
2.7.4



[PATCH net v2 3/5] hv_netvsc: reset vf_inject on VF removal

2016-08-15 Thread Vitaly Kuznetsov
We reset vf_inject on VF going down (netvsc_vf_down()) but we don't on
VF removal (netvsc_unregister_vf()) so vf_inject stays 'true' while
vf_netdev is already NULL and we're trying to inject packets into NULL
net device in netvsc_recv_callback() causing kernel to crash.

Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
Acked-by: Haiyang Zhang <haiya...@microsoft.com>
---
 drivers/net/hyperv/netvsc_drv.c | 26 --
 1 file changed, 16 insertions(+), 10 deletions(-)

diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 70317fa..2c90883 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -1205,6 +1205,19 @@ static int netvsc_register_vf(struct net_device 
*vf_netdev)
return NOTIFY_OK;
 }
 
+static void netvsc_inject_enable(struct net_device_context *net_device_ctx)
+{
+   net_device_ctx->vf_inject = true;
+}
+
+static void netvsc_inject_disable(struct net_device_context *net_device_ctx)
+{
+   net_device_ctx->vf_inject = false;
+
+   /* Wait for currently active users to drain out. */
+   while (atomic_read(_device_ctx->vf_use_cnt) != 0)
+   udelay(50);
+}
 
 static int netvsc_vf_up(struct net_device *vf_netdev)
 {
@@ -1227,7 +1240,7 @@ static int netvsc_vf_up(struct net_device *vf_netdev)
return NOTIFY_DONE;
 
netdev_info(ndev, "VF up: %s\n", vf_netdev->name);
-   net_device_ctx->vf_inject = true;
+   netvsc_inject_enable(net_device_ctx);
 
/*
 * Open the device before switching data path.
@@ -1270,14 +1283,7 @@ static int netvsc_vf_down(struct net_device *vf_netdev)
return NOTIFY_DONE;
 
netdev_info(ndev, "VF down: %s\n", vf_netdev->name);
-   net_device_ctx->vf_inject = false;
-   /*
-* Wait for currently active users to
-* drain out.
-*/
-
-   while (atomic_read(_device_ctx->vf_use_cnt) != 0)
-   udelay(50);
+   netvsc_inject_disable(net_device_ctx);
netvsc_switch_datapath(ndev, false);
netdev_info(ndev, "Data path switched from VF: %s\n", vf_netdev->name);
rndis_filter_close(netvsc_dev);
@@ -1309,7 +1315,7 @@ static int netvsc_unregister_vf(struct net_device 
*vf_netdev)
if (netvsc_dev == NULL)
return NOTIFY_DONE;
netdev_info(ndev, "VF unregistering: %s\n", vf_netdev->name);
-
+   netvsc_inject_disable(net_device_ctx);
net_device_ctx->vf_netdev = NULL;
module_put(THIS_MODULE);
return NOTIFY_OK;
-- 
2.7.4



[PATCH net v2 2/5] hv_netvsc: avoid deadlocks between rtnl lock and vf_use_cnt wait

2016-08-15 Thread Vitaly Kuznetsov
Here is a deadlock scenario:
- netvsc_vf_up() schedules netvsc_notify_peers() work and quits.
- netvsc_vf_down() runs before netvsc_notify_peers() gets executed. As it
  is being executed from netdev notifier chain we hold rtnl lock when we
  get here.
- we enter while (atomic_read(_device_ctx->vf_use_cnt) != 0) loop and
  wait till netvsc_notify_peers() drops vf_use_cnt.
- netvsc_notify_peers() starts on some other CPU but netdev_notify_peers()
  will hang on rtnl_lock().
- deadlock!

Instead of introducing additional synchronization I suggest we drop
gwrk.dwrk completely and call NETDEV_NOTIFY_PEERS directly. As we're
acting under rtnl lock this is legitimate.

Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
Acked-by: Haiyang Zhang <haiya...@microsoft.com>
---
Changes since v1:
- Move the patch ahead in the series to avoid introducing new blocking
  issues and solving them later in the series. [David Miller]
---
 drivers/net/hyperv/hyperv_net.h |  7 ---
 drivers/net/hyperv/netvsc_drv.c | 33 +
 2 files changed, 5 insertions(+), 35 deletions(-)

diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index 3b3ecf2..591af71 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -644,12 +644,6 @@ struct netvsc_reconfig {
u32 event;
 };
 
-struct garp_wrk {
-   struct work_struct dwrk;
-   struct net_device *netdev;
-   struct net_device_context *net_device_ctx;
-};
-
 /* The context of the netvsc device  */
 struct net_device_context {
/* point back to our device context */
@@ -667,7 +661,6 @@ struct net_device_context {
 
struct work_struct work;
u32 msg_enable; /* debug level */
-   struct garp_wrk gwrk;
 
struct netvsc_stats __percpu *tx_stats;
struct netvsc_stats __percpu *rx_stats;
diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 794139b..70317fa 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -1151,17 +1151,6 @@ static void netvsc_free_netdev(struct net_device *netdev)
free_netdev(netdev);
 }
 
-static void netvsc_notify_peers(struct work_struct *wrk)
-{
-   struct garp_wrk *gwrk;
-
-   gwrk = container_of(wrk, struct garp_wrk, dwrk);
-
-   netdev_notify_peers(gwrk->netdev);
-
-   atomic_dec(>net_device_ctx->vf_use_cnt);
-}
-
 static struct net_device *get_netvsc_net_device(char *mac)
 {
struct net_device *dev, *found = NULL;
@@ -1253,15 +1242,8 @@ static int netvsc_vf_up(struct net_device *vf_netdev)
 
netif_carrier_off(ndev);
 
-   /*
-* Now notify peers. We are scheduling work to
-* notify peers; take a reference to prevent
-* the VF interface from vanishing.
-*/
-   atomic_inc(_device_ctx->vf_use_cnt);
-   net_device_ctx->gwrk.netdev = vf_netdev;
-   net_device_ctx->gwrk.net_device_ctx = net_device_ctx;
-   schedule_work(_device_ctx->gwrk.dwrk);
+   /* Now notify peers through VF device. */
+   call_netdevice_notifiers(NETDEV_NOTIFY_PEERS, vf_netdev);
 
return NOTIFY_OK;
 }
@@ -1300,13 +1282,9 @@ static int netvsc_vf_down(struct net_device *vf_netdev)
netdev_info(ndev, "Data path switched from VF: %s\n", vf_netdev->name);
rndis_filter_close(netvsc_dev);
netif_carrier_on(ndev);
-   /*
-* Notify peers.
-*/
-   atomic_inc(_device_ctx->vf_use_cnt);
-   net_device_ctx->gwrk.netdev = ndev;
-   net_device_ctx->gwrk.net_device_ctx = net_device_ctx;
-   schedule_work(_device_ctx->gwrk.dwrk);
+
+   /* Now notify peers through netvsc device. */
+   call_netdevice_notifiers(NETDEV_NOTIFY_PEERS, ndev);
 
return NOTIFY_OK;
 }
@@ -1378,7 +1356,6 @@ static int netvsc_probe(struct hv_device *dev,
 
INIT_DELAYED_WORK(_device_ctx->dwork, netvsc_link_change);
INIT_WORK(_device_ctx->work, do_set_multicast);
-   INIT_WORK(_device_ctx->gwrk.dwrk, netvsc_notify_peers);
 
spin_lock_init(_device_ctx->lock);
INIT_LIST_HEAD(_device_ctx->reconfig_events);
-- 
2.7.4



[PATCH net v2 5/5] hv_netvsc: fix bonding devices check in netvsc_netdev_event()

2016-08-15 Thread Vitaly Kuznetsov
Bonding driver sets IFF_BONDING on both master (the bonding device) and
slave (the real NIC) devices and in netvsc_netdev_event() we want to skip
master devices only. Currently, there is an uncertainty when a slave
interface is removed: if bonding module comes first in netdev_chain it
clears IFF_BONDING flag on the netdev and netvsc_netdev_event() correctly
handles NETDEV_UNREGISTER event, but in case netvsc comes first on the
chain it sees the device with IFF_BONDING still attached and skips it. As
we still hold vf_netdev pointer to the device we crash on the next inject.

Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
---
 drivers/net/hyperv/netvsc_drv.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 62a4e6e..3ba29fc 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -1482,8 +1482,13 @@ static int netvsc_netdev_event(struct notifier_block 
*this,
 {
struct net_device *event_dev = netdev_notifier_info_to_dev(ptr);
 
-   /* Avoid Vlan, Bonding dev with same MAC registering as VF */
-   if (event_dev->priv_flags & (IFF_802_1Q_VLAN | IFF_BONDING))
+   /* Avoid Vlan dev with same MAC registering as VF */
+   if (event_dev->priv_flags & IFF_802_1Q_VLAN)
+   return NOTIFY_DONE;
+
+   /* Avoid Bonding master dev with same MAC registering as VF */
+   if (event_dev->priv_flags & IFF_BONDING &&
+   event_dev->flags & IFF_MASTER)
return NOTIFY_DONE;
 
switch (event) {
-- 
2.7.4



Re: [RFC 2/2] netvsc: use RCU for VF net device reference

2016-08-15 Thread Vitaly Kuznetsov
Stephen Hemminger  writes:

> Rather than keeping a pointer, a flag, and reference count, use RCU and 
> existing
> device reference count to protect the synthetic to VF relationship.

Thanks! I like the idea. Some nitpicks below ...

>
> One other change is that injected packets must be accounted for on the 
> synthetic
> device otherwise the statistics will be lost. The VF device driver (for most 
> devices)
> creates the statistics based on device registers and therefore would ignore 
> any direct
> manipulation of network device stats.
>
> Also, rx_dropped is not atomic_long.
>
> Signed-off-by: Stephen Hemminger 
>
> --- a/drivers/net/hyperv/hyperv_net.h 2016-08-13 11:25:59.764085593 -0700
> +++ b/drivers/net/hyperv/hyperv_net.h 2016-08-13 11:25:59.736085464 -0700
> @@ -689,6 +689,9 @@ struct netvsc_device {
>   wait_queue_head_t wait_drain;
>   bool destroy;
>
> + /* State to manage the associated VF interface. */
> + struct net_device *vf_netdev __rcu;
> +
>   /* Receive buffer allocated by us but manages by NetVSP */
>   void *recv_buf;
>   u32 recv_buf_size;
> @@ -739,10 +742,6 @@ struct netvsc_device {
>   /* Serial number of the VF to team with */
>   u32 vf_serial;
>   atomic_t open_cnt;
> - /* State to manage the associated VF interface. */
> - bool vf_inject;
> - struct net_device *vf_netdev;
> - atomic_t vf_use_cnt;
>  };
>
>  static inline struct netvsc_device *
> --- a/drivers/net/hyperv/netvsc.c 2016-08-13 11:25:59.764085593 -0700
> +++ b/drivers/net/hyperv/netvsc.c 2016-08-13 11:25:59.736085464 -0700
> @@ -77,13 +77,10 @@ static struct netvsc_device *alloc_net_d
>   init_waitqueue_head(_device->wait_drain);
>   net_device->destroy = false;
>   atomic_set(_device->open_cnt, 0);
> - atomic_set(_device->vf_use_cnt, 0);
> +
>   net_device->max_pkt = RNDIS_MAX_PKT_DEFAULT;
>   net_device->pkt_align = RNDIS_PKT_ALIGN_DEFAULT;
>
> - net_device->vf_netdev = NULL;
> - net_device->vf_inject = false;
> -
>   return net_device;
>  }
>
> --- a/drivers/net/hyperv/netvsc_drv.c 2016-08-13 11:25:59.764085593 -0700
> +++ b/drivers/net/hyperv/netvsc_drv.c 2016-08-13 11:31:47.733685146 -0700
> @@ -668,59 +668,45 @@ int netvsc_recv_callback(struct hv_devic
>  {
>   struct net_device *net = hv_get_drvdata(device_obj);
>   struct net_device_context *net_device_ctx = netdev_priv(net);
> - struct sk_buff *skb;
> - struct sk_buff *vf_skb;
> - struct netvsc_stats *rx_stats;
> + struct netvsc_stats *rx_stats = this_cpu_ptr(net_device_ctx->rx_stats);
>   struct netvsc_device *netvsc_dev = net_device_ctx->nvdev;
> - u32 bytes_recvd = packet->total_data_buflen;
> - int ret = 0;
> + struct net_device *vf_netdev;
> + struct sk_buff *skb;
>
>   if (!net || net->reg_state != NETREG_REGISTERED)
>   return NVSP_STAT_FAIL;
>
> - if (READ_ONCE(netvsc_dev->vf_inject)) {
> - atomic_inc(_dev->vf_use_cnt);
> - if (!READ_ONCE(netvsc_dev->vf_inject)) {
> - /*
> -  * We raced; just move on.
> -  */
> - atomic_dec(_dev->vf_use_cnt);
> - goto vf_injection_done;
> - }
> + vf_netdev = rcu_dereference(netvsc_dev->vf_netdev);
> + if (vf_netdev) {
> + /* Inject this packet into the VF interface.  On
> +  * Hyper-V, multicast and broadcast packets are only
> +  * delivered on the synthetic interface (after
> +  * subjecting these to policy filters on the
> +  * host). Deliver these via the VF interface in the
> +  * guest if up, otherwise drop.
> +  */
> + if (!netif_running(vf_netdev))
> + goto drop;

Why drop? In case VF is not running I guess it would be better to
receive the packet through netvsc interface.

>
> - /*
> -  * Inject this packet into the VF inerface.
> -  * On Hyper-V, multicast and brodcast packets
> -  * are only delivered on the synthetic interface
> -  * (after subjecting these to policy filters on
> -  * the host). Deliver these via the VF interface
> -  * in the guest.
> + /* Account for this on the synthetic interface
> +  * otherwise likely to be not accounted for since
> +  * device statistics on the VF are driver dependent.
>*/
> - vf_skb = netvsc_alloc_recv_skb(netvsc_dev->vf_netdev, packet,
> -csum_info, *data, vlan_tci);
> - if (vf_skb != NULL) {
> - ++netvsc_dev->vf_netdev->stats.rx_packets;
> - netvsc_dev->vf_netdev->stats.rx_bytes += bytes_recvd;
> - netif_receive_skb(vf_skb);

Re: [PATCH net 2/4] hv_netvsc: reset vf_inject on VF removal

2016-08-11 Thread Vitaly Kuznetsov
Yuval Mintz  writes:

>> +static void netvsc_inject_enable(struct net_device_context
>> +*net_device_ctx) {
>> +net_device_ctx->vf_inject = true;
>> +}
>> +
>> +static void netvsc_inject_disable(struct net_device_context
>> +*net_device_ctx) {
>> +net_device_ctx->vf_inject = false;
>> +
>> +/* Wait for currently active users to drain out. */
>> +while (atomic_read(_device_ctx->vf_use_cnt) != 0)
>> +udelay(50);
>> +}
>
> That was already the behavior before, but are you certain you
> want to unconditionally block without any possible timeout?

Yes, this is OK. After PATCH4 of this series there is only one place
which takes the vf_use_cnt (netvsc_recv_callback()) and it is an
interrupt handler, there are no sleepable operations there.

-- 
  Vitaly


[PATCH net 4/4] hv_netvsc: avoid deadlocks between rtnl lock and netvsc_inject_disable()

2016-08-11 Thread Vitaly Kuznetsov
Here is a deadlock scenario:
- netvsc_vf_up() schedules netvsc_notify_peers() work and quits.
- netvsc_vf_down() runs before netvsc_notify_peers() gets executed. As it
  is being executed from netdev notifier chain we hold rtnl lock when we
  get here.
- we enter netvsc_inject_disable() and loop and wait till
  netvsc_notify_peers() drops vf_use_cnt.
- netvsc_notify_peers() starts on some other CPU but netdev_notify_peers()
  will hang on rtnl_lock().
- deadlock!

Similar deadlocks are possible between netvsc_vf_{up,down}() and
netvsc_unregister_vf() as it also waits till vf_use_cnt drops to zero.
Instead of introducing additional synchronization I suggest we drop
gwrk.dwrk completely and call NETDEV_NOTIFY_PEERS directly. As we're
acting under rtnl lock this is legitimate.

Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
---
 drivers/net/hyperv/hyperv_net.h |  7 ---
 drivers/net/hyperv/netvsc_drv.c | 33 +
 2 files changed, 5 insertions(+), 35 deletions(-)

diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index 3b3ecf2..591af71 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -644,12 +644,6 @@ struct netvsc_reconfig {
u32 event;
 };
 
-struct garp_wrk {
-   struct work_struct dwrk;
-   struct net_device *netdev;
-   struct net_device_context *net_device_ctx;
-};
-
 /* The context of the netvsc device  */
 struct net_device_context {
/* point back to our device context */
@@ -667,7 +661,6 @@ struct net_device_context {
 
struct work_struct work;
u32 msg_enable; /* debug level */
-   struct garp_wrk gwrk;
 
struct netvsc_stats __percpu *tx_stats;
struct netvsc_stats __percpu *rx_stats;
diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 874829a..62a4e6e 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -1151,17 +1151,6 @@ static void netvsc_free_netdev(struct net_device *netdev)
free_netdev(netdev);
 }
 
-static void netvsc_notify_peers(struct work_struct *wrk)
-{
-   struct garp_wrk *gwrk;
-
-   gwrk = container_of(wrk, struct garp_wrk, dwrk);
-
-   netdev_notify_peers(gwrk->netdev);
-
-   atomic_dec(>net_device_ctx->vf_use_cnt);
-}
-
 static struct net_device *get_netvsc_net_device(char *mac)
 {
struct net_device *dev, *found = NULL;
@@ -1266,15 +1255,8 @@ static int netvsc_vf_up(struct net_device *vf_netdev)
 
netif_carrier_off(ndev);
 
-   /*
-* Now notify peers. We are scheduling work to
-* notify peers; take a reference to prevent
-* the VF interface from vanishing.
-*/
-   atomic_inc(_device_ctx->vf_use_cnt);
-   net_device_ctx->gwrk.netdev = vf_netdev;
-   net_device_ctx->gwrk.net_device_ctx = net_device_ctx;
-   schedule_work(_device_ctx->gwrk.dwrk);
+   /* Now notify peers through VF device. */
+   call_netdevice_notifiers(NETDEV_NOTIFY_PEERS, vf_netdev);
 
return NOTIFY_OK;
 }
@@ -1306,13 +1288,9 @@ static int netvsc_vf_down(struct net_device *vf_netdev)
netdev_info(ndev, "Data path switched from VF: %s\n", vf_netdev->name);
rndis_filter_close(netvsc_dev);
netif_carrier_on(ndev);
-   /*
-* Notify peers.
-*/
-   atomic_inc(_device_ctx->vf_use_cnt);
-   net_device_ctx->gwrk.netdev = ndev;
-   net_device_ctx->gwrk.net_device_ctx = net_device_ctx;
-   schedule_work(_device_ctx->gwrk.dwrk);
+
+   /* Now notify peers through netvsc device. */
+   call_netdevice_notifiers(NETDEV_NOTIFY_PEERS, ndev);
 
return NOTIFY_OK;
 }
@@ -1384,7 +1362,6 @@ static int netvsc_probe(struct hv_device *dev,
 
INIT_DELAYED_WORK(_device_ctx->dwork, netvsc_link_change);
INIT_WORK(_device_ctx->work, do_set_multicast);
-   INIT_WORK(_device_ctx->gwrk.dwrk, netvsc_notify_peers);
 
spin_lock_init(_device_ctx->lock);
INIT_LIST_HEAD(_device_ctx->reconfig_events);
-- 
2.7.4



[PATCH net 3/4] hv_netvsc: protect module refcount by checking net_device_ctx->vf_netdev

2016-08-11 Thread Vitaly Kuznetsov
We're not guaranteed to see NETDEV_REGISTER/NETDEV_UNREGISTER notifications
only once per VF but we increase/decrease module refcount unconditionally.
Check vf_netdev to make sure we don't take/release it twice. We presume
that only one VF per netvsc device may exist.

Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
---
 drivers/net/hyperv/netvsc_drv.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index b3c31e3..874829a 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -1204,7 +1204,7 @@ static int netvsc_register_vf(struct net_device 
*vf_netdev)
 
net_device_ctx = netdev_priv(ndev);
netvsc_dev = net_device_ctx->nvdev;
-   if (netvsc_dev == NULL)
+   if (!netvsc_dev || net_device_ctx->vf_netdev)
return NOTIFY_DONE;
 
netdev_info(ndev, "VF registering: %s\n", vf_netdev->name);
@@ -1334,7 +1334,7 @@ static int netvsc_unregister_vf(struct net_device 
*vf_netdev)
 
net_device_ctx = netdev_priv(ndev);
netvsc_dev = net_device_ctx->nvdev;
-   if (netvsc_dev == NULL)
+   if (!netvsc_dev || !net_device_ctx->vf_netdev)
return NOTIFY_DONE;
netdev_info(ndev, "VF unregistering: %s\n", vf_netdev->name);
netvsc_inject_disable(net_device_ctx);
-- 
2.7.4



[PATCH net 0/4] hv_netvsc: fixes for VF removal path

2016-08-11 Thread Vitaly Kuznetsov
Kernel crash is reported after VF is removed and detached from netvsc
device. My investigation led me to PATCH2 of this series but PATCH1 is
required to support the change. I also noticed a couple of other issues
while debugging and I fix them with PATCH3 and PATCH4.

Please review.

Vitaly Kuznetsov (4):
  hv_netvsc: don't lose VF information
  hv_netvsc: reset vf_inject on VF removal
  hv_netvsc: protect module refcount by checking
net_device_ctx->vf_netdev
  hv_netvsc: avoid deadlocks between rtnl lock and
netvsc_inject_disable()

 drivers/net/hyperv/hyperv_net.h | 24 ---
 drivers/net/hyperv/netvsc.c | 19 
 drivers/net/hyperv/netvsc_drv.c | 96 ++---
 3 files changed, 59 insertions(+), 80 deletions(-)

-- 
2.7.4



[PATCH net 2/4] hv_netvsc: reset vf_inject on VF removal

2016-08-11 Thread Vitaly Kuznetsov
We reset vf_inject on VF going down (netvsc_vf_down()) but we don't on
VF removal (netvsc_unregister_vf()) so vf_inject stays 'true' while
vf_netdev is already NULL and we're trying to inject packets into NULL
net device in netvsc_recv_callback() causing kernel to crash.

Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
---
 drivers/net/hyperv/netvsc_drv.c | 26 --
 1 file changed, 16 insertions(+), 10 deletions(-)

diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 794139b..b3c31e3 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -1216,6 +1216,19 @@ static int netvsc_register_vf(struct net_device 
*vf_netdev)
return NOTIFY_OK;
 }
 
+static void netvsc_inject_enable(struct net_device_context *net_device_ctx)
+{
+   net_device_ctx->vf_inject = true;
+}
+
+static void netvsc_inject_disable(struct net_device_context *net_device_ctx)
+{
+   net_device_ctx->vf_inject = false;
+
+   /* Wait for currently active users to drain out. */
+   while (atomic_read(_device_ctx->vf_use_cnt) != 0)
+   udelay(50);
+}
 
 static int netvsc_vf_up(struct net_device *vf_netdev)
 {
@@ -1238,7 +1251,7 @@ static int netvsc_vf_up(struct net_device *vf_netdev)
return NOTIFY_DONE;
 
netdev_info(ndev, "VF up: %s\n", vf_netdev->name);
-   net_device_ctx->vf_inject = true;
+   netvsc_inject_enable(net_device_ctx);
 
/*
 * Open the device before switching data path.
@@ -1288,14 +1301,7 @@ static int netvsc_vf_down(struct net_device *vf_netdev)
return NOTIFY_DONE;
 
netdev_info(ndev, "VF down: %s\n", vf_netdev->name);
-   net_device_ctx->vf_inject = false;
-   /*
-* Wait for currently active users to
-* drain out.
-*/
-
-   while (atomic_read(_device_ctx->vf_use_cnt) != 0)
-   udelay(50);
+   netvsc_inject_disable(net_device_ctx);
netvsc_switch_datapath(ndev, false);
netdev_info(ndev, "Data path switched from VF: %s\n", vf_netdev->name);
rndis_filter_close(netvsc_dev);
@@ -1331,7 +1337,7 @@ static int netvsc_unregister_vf(struct net_device 
*vf_netdev)
if (netvsc_dev == NULL)
return NOTIFY_DONE;
netdev_info(ndev, "VF unregistering: %s\n", vf_netdev->name);
-
+   netvsc_inject_disable(net_device_ctx);
net_device_ctx->vf_netdev = NULL;
module_put(THIS_MODULE);
return NOTIFY_OK;
-- 
2.7.4



[PATCH net 1/4] hv_netvsc: don't lose VF information

2016-08-11 Thread Vitaly Kuznetsov
struct netvsc_device is not suitable for storing VF information as this
structure is being destroyed on MTU change / set channel operation (see
rndis_filter_device_remove()). Move all VF related stuff to struct
net_device_context which is persistent.

Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
---
 drivers/net/hyperv/hyperv_net.h | 19 
 drivers/net/hyperv/netvsc.c | 19 +++-
 drivers/net/hyperv/netvsc_drv.c | 49 +++--
 3 files changed, 45 insertions(+), 42 deletions(-)

diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index 467fb8b..3b3ecf2 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -647,7 +647,7 @@ struct netvsc_reconfig {
 struct garp_wrk {
struct work_struct dwrk;
struct net_device *netdev;
-   struct netvsc_device *netvsc_dev;
+   struct net_device_context *net_device_ctx;
 };
 
 /* The context of the netvsc device  */
@@ -678,6 +678,15 @@ struct net_device_context {
 
/* the device is going away */
bool start_remove;
+
+   /* State to manage the associated VF interface. */
+   struct net_device *vf_netdev;
+   bool vf_inject;
+   atomic_t vf_use_cnt;
+   /* 1: allocated, serial number is valid. 0: not allocated */
+   u32 vf_alloc;
+   /* Serial number of the VF to team with */
+   u32 vf_serial;
 };
 
 /* Per netvsc device */
@@ -733,15 +742,7 @@ struct netvsc_device {
u32 max_pkt; /* max number of pkt in one send, e.g. 8 */
u32 pkt_align; /* alignment bytes, e.g. 8 */
 
-   /* 1: allocated, serial number is valid. 0: not allocated */
-   u32 vf_alloc;
-   /* Serial number of the VF to team with */
-   u32 vf_serial;
atomic_t open_cnt;
-   /* State to manage the associated VF interface. */
-   bool vf_inject;
-   struct net_device *vf_netdev;
-   atomic_t vf_use_cnt;
 };
 
 static inline struct netvsc_device *
diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 20e0917..410fb8e8 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -77,13 +77,9 @@ static struct netvsc_device *alloc_net_device(void)
init_waitqueue_head(_device->wait_drain);
net_device->destroy = false;
atomic_set(_device->open_cnt, 0);
-   atomic_set(_device->vf_use_cnt, 0);
net_device->max_pkt = RNDIS_MAX_PKT_DEFAULT;
net_device->pkt_align = RNDIS_PKT_ALIGN_DEFAULT;
 
-   net_device->vf_netdev = NULL;
-   net_device->vf_inject = false;
-
return net_device;
 }
 
@@ -1106,16 +1102,16 @@ static void netvsc_send_table(struct hv_device *hdev,
nvscdev->send_table[i] = tab[i];
 }
 
-static void netvsc_send_vf(struct netvsc_device *nvdev,
+static void netvsc_send_vf(struct net_device_context *net_device_ctx,
   struct nvsp_message *nvmsg)
 {
-   nvdev->vf_alloc = nvmsg->msg.v4_msg.vf_assoc.allocated;
-   nvdev->vf_serial = nvmsg->msg.v4_msg.vf_assoc.serial;
+   net_device_ctx->vf_alloc = nvmsg->msg.v4_msg.vf_assoc.allocated;
+   net_device_ctx->vf_serial = nvmsg->msg.v4_msg.vf_assoc.serial;
 }
 
 static inline void netvsc_receive_inband(struct hv_device *hdev,
-struct netvsc_device *nvdev,
-struct nvsp_message *nvmsg)
+struct net_device_context *net_device_ctx,
+struct nvsp_message *nvmsg)
 {
switch (nvmsg->hdr.msg_type) {
case NVSP_MSG5_TYPE_SEND_INDIRECTION_TABLE:
@@ -1123,7 +1119,7 @@ static inline void netvsc_receive_inband(struct hv_device 
*hdev,
break;
 
case NVSP_MSG4_TYPE_SEND_VF_ASSOCIATION:
-   netvsc_send_vf(nvdev, nvmsg);
+   netvsc_send_vf(net_device_ctx, nvmsg);
break;
}
 }
@@ -1136,6 +1132,7 @@ static void netvsc_process_raw_pkt(struct hv_device 
*device,
   struct vmpacket_descriptor *desc)
 {
struct nvsp_message *nvmsg;
+   struct net_device_context *net_device_ctx = netdev_priv(ndev);
 
nvmsg = (struct nvsp_message *)((unsigned long)
desc + (desc->offset8 << 3));
@@ -1150,7 +1147,7 @@ static void netvsc_process_raw_pkt(struct hv_device 
*device,
break;
 
case VM_PKT_DATA_INBAND:
-   netvsc_receive_inband(device, net_device, nvmsg);
+   netvsc_receive_inband(device, net_device_ctx, nvmsg);
break;
 
default:
diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 41bd952..794139b 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -658,20 +658,19 @@ int netvsc_recv_callback(struct hv_device *de

Re: [PATCH v15 net-next 1/1] hv_sock: introduce Hyper-V Sockets

2016-07-08 Thread Vitaly Kuznetsov
Dexuan Cui <de...@microsoft.com> writes:

> Hyper-V Sockets (hv_sock) supplies a byte-stream based communication
> mechanism between the host and the guest. It's somewhat like TCP over
> VMBus, but the transportation layer (VMBus) is much simpler than IP.
>
> With Hyper-V Sockets, applications between the host and the guest can talk
> to each other directly by the traditional BSD-style socket APIs.
>
> Hyper-V Sockets is only available on new Windows hosts, like Windows Server
> 2016. More info is in this article "Make your own integration services":
> https://msdn.microsoft.com/en-us/virtualization/hyperv_on_windows/develop/make_mgmt_service
>
> The patch implements the necessary support in the guest side by introducing
> a new socket address family AF_HYPERV.
>
> Signed-off-by: Dexuan Cui <de...@microsoft.com>
> Cc: "K. Y. Srinivasan" <k...@microsoft.com>
> Cc: Haiyang Zhang <haiya...@microsoft.com>
> Cc: Vitaly Kuznetsov <vkuzn...@redhat.com>

Some comments below. The vast majority of them are really minor, the
only thing which bothers me a little bit is WARN() in hvsock_sendmsg()
which I think shouldn't be there. But I may have missed something.

I didn't do any tests for the code.

> Cc: Cathy Avery <cav...@redhat.com>
> ---
>
> You can also get the patch here (2764221d):
> https://github.com/dcui/linux/commits/decui/hv_sock/net-next/20160708_v15
>
> For the change log before v12, please see https://lkml.org/lkml/2016/5/15/31
>
> In v12, the changes are mainly the following:
>
> 1) remove the module params as David suggested.
>
> 2) use 5 exact pages for VMBus send/recv rings, respectively.
> The host side's design of the feature requires 5 exact pages for recv/send
> rings respectively -- this is suboptimal considering memory consumption,
> however unluckily we have to live with it, before the host comes up with
> a new design in the future. :-(
>
> 3) remove the per-connection static send/recv buffers
> Instead, we allocate and free the buffers dynamically only when we recv/send
> data. This means: when a connection is idle, no memory is consumed as
> recv/send buffers at all.
>
> In v13:
> I return ENOMEM on buffer alllocation failure
>
>Actually "man read/write" says "Other errors may occur, depending on the
> object connected to fd". "man send/recv" indeed lists ENOMEM.
>Considering AF_HYPERV is a new socket type, ENOMEM seems OK here.
>In the long run, I think we should add a new API in the VMBus driver,
> allowing data copy from VMBus ringbuffer into user mode buffer directly.
> This way, we can even eliminate this temporary buffer.
>
> In v14:
> fix some coding style issues pointed out by David.
>
> In v15:
> Just some stylistic changes addressing comments from Joe Perches and
> Olaf Hering -- thank you!
> - add a GPL blurb.
> - define a new macro PAGE_SIZE_4K and use it to replace PAGE_SIZE
> - change sk_to_hvsock/hvsock_to_sk() from macros to inline functions
> - remove a not-very-useful pr_err()
> - fix some typos in comment and coding style issues.
>
> Looking forward to your comments!
>
>  MAINTAINERS |2 +
>  include/linux/hyperv.h  |   13 +
>  include/linux/socket.h  |4 +-
>  include/net/af_hvsock.h |   78 +++
>  include/uapi/linux/hyperv.h |   24 +
>  net/Kconfig |1 +
>  net/Makefile|1 +
>  net/hv_sock/Kconfig |   10 +
>  net/hv_sock/Makefile|3 +
>  net/hv_sock/af_hvsock.c | 1523 
> +++
>  10 files changed, 1658 insertions(+), 1 deletion(-)
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 50f69ba..6eaa26f 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -5514,7 +5514,9 @@ F:  drivers/pci/host/pci-hyperv.c
>  F:   drivers/net/hyperv/
>  F:   drivers/scsi/storvsc_drv.c
>  F:   drivers/video/fbdev/hyperv_fb.c
> +F:   net/hv_sock/
>  F:   include/linux/hyperv.h
> +F:   include/net/af_hvsock.h
>  F:   tools/hv/
>  F:   Documentation/ABI/stable/sysfs-bus-vmbus
>
> diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
> index 50f493e..1cda6ea5 100644
> --- a/include/linux/hyperv.h
> +++ b/include/linux/hyperv.h
> @@ -1508,5 +1508,18 @@ static inline void commit_rd_index(struct 
> vmbus_channel *channel)
>   vmbus_set_event(channel);
>  }
>
> +struct vmpipe_proto_header {
> + u32 pkt_type;
> + u32 data_size;
> +};
> +
> +#define HVSOCK_HEADER_LEN(sizeof(struct vmpacket_descriptor) + \
> +  sizeof(struct vmpipe_proto_header))
> +
> +/* See 'prev_indices' in hv_ringbuf

[PATCH net-next v3] netvsc: get rid of completion timeouts

2016-06-09 Thread Vitaly Kuznetsov
I'm hitting 5 second timeout in rndis_filter_set_rss_param() while setting
RSS parameters for the device. When this happens we end up returning
-ETIMEDOUT from the function and rndis_filter_device_add() falls back to
setting

net_device->max_chn = 1;
net_device->num_chn = 1;
net_device->num_sc_offered = 0;

but after a moment the rndis request succeeds and subchannels start to
appear. netvsc_sc_open() does unconditional nvscdev->num_sc_offered-- and
it becomes U32_MAX-1. Consequent rndis_filter_device_remove() will hang
while waiting for all U32_MAX-1 subchannels to appear and this is not
going to happen.

The immediate issue could be solved by adding num_sc_offered > 0 check to
netvsc_sc_open() but we're getting out of sync with the host and it's not
easy to adjust things later, e.g. in this particular case we'll be creating
queues without a user request for it and races are expected. Same applies
to other parts of the driver which have the same completion timeout.

Following the trend in drivers/hv/* code I suggest we remove all these
timeouts completely. As a guest we can always trust the host we're running
on and if the host screws things up there is no easy way to recover anyway.

Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
Acked-by: Haiyang Zhang <haiya...@microsoft.com>
---
Changes since v2:
- Rebase to current net-next [David Miller]

Changes since v1 RFC:
- Non-RFC
- Restore (net_dev->num_sc_offered > 0) condition in
  rndis_filter_device_remove() as without it we may hang when there are no
  subchannels at all.
- Remove now unused ndev in rndis_filter_set_packet_filter() to fix the build
  warning reported by kbuild robot.
- Added Haiyang's ACK (hopefully it stands with the above mentioned
  changes).
---
 drivers/net/hyperv/netvsc.c   |  14 +
 drivers/net/hyperv/rndis_filter.c | 117 +-
 2 files changed, 31 insertions(+), 100 deletions(-)

diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 96f00c0..6909c32 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -244,7 +244,6 @@ static int netvsc_destroy_buf(struct hv_device *device)
 static int netvsc_init_buf(struct hv_device *device)
 {
int ret = 0;
-   unsigned long t;
struct netvsc_device *net_device;
struct nvsp_message *init_packet;
struct net_device *ndev;
@@ -305,9 +304,7 @@ static int netvsc_init_buf(struct hv_device *device)
goto cleanup;
}
 
-   t = wait_for_completion_timeout(_device->channel_init_wait, 5*HZ);
-   BUG_ON(t == 0);
-
+   wait_for_completion(_device->channel_init_wait);
 
/* Check the response */
if (init_packet->msg.v1_msg.
@@ -390,8 +387,7 @@ static int netvsc_init_buf(struct hv_device *device)
goto cleanup;
}
 
-   t = wait_for_completion_timeout(_device->channel_init_wait, 5*HZ);
-   BUG_ON(t == 0);
+   wait_for_completion(_device->channel_init_wait);
 
/* Check the response */
if (init_packet->msg.v1_msg.
@@ -445,7 +441,6 @@ static int negotiate_nvsp_ver(struct hv_device *device,
 {
struct net_device *ndev = hv_get_drvdata(device);
int ret;
-   unsigned long t;
 
memset(init_packet, 0, sizeof(struct nvsp_message));
init_packet->hdr.msg_type = NVSP_MSG_TYPE_INIT;
@@ -462,10 +457,7 @@ static int negotiate_nvsp_ver(struct hv_device *device,
if (ret != 0)
return ret;
 
-   t = wait_for_completion_timeout(_device->channel_init_wait, 5*HZ);
-
-   if (t == 0)
-   return -ETIMEDOUT;
+   wait_for_completion(_device->channel_init_wait);
 
if (init_packet->msg.init_msg.init_complete.status !=
NVSP_STAT_SUCCESS)
diff --git a/drivers/net/hyperv/rndis_filter.c 
b/drivers/net/hyperv/rndis_filter.c
index 979fa44..8e830f7 100644
--- a/drivers/net/hyperv/rndis_filter.c
+++ b/drivers/net/hyperv/rndis_filter.c
@@ -466,7 +466,6 @@ static int rndis_filter_query_device(struct rndis_device 
*dev, u32 oid,
struct rndis_query_request *query;
struct rndis_query_complete *query_complete;
int ret = 0;
-   unsigned long t;
 
if (!result)
return -EINVAL;
@@ -503,11 +502,7 @@ static int rndis_filter_query_device(struct rndis_device 
*dev, u32 oid,
if (ret != 0)
goto cleanup;
 
-   t = wait_for_completion_timeout(>wait_event, 5*HZ);
-   if (t == 0) {
-   ret = -ETIMEDOUT;
-   goto cleanup;
-   }
+   wait_for_completion(>wait_event);
 
/* Copy the response back */
query_complete = >response_msg.msg.query_complete;
@@ -556,7 +551,6 @@ int rndis_filter_set_device_mac(struct net_device *ndev, 
char *mac)
u32 extlen = sizeof(struct rndis_config_parameter_info) +
 

Re: [PATCH net-next v2] netvsc: get rid of completion timeouts

2016-06-09 Thread Vitaly Kuznetsov
David Miller <da...@davemloft.net> writes:

> From: Vitaly Kuznetsov <vkuzn...@redhat.com>
> Date: Wed,  8 Jun 2016 19:17:41 +0200
>
>> I'm hitting 5 second timeout in rndis_filter_set_rss_param() while setting
>> RSS parameters for the device. When this happens we end up returning
>> -ETIMEDOUT from the function and rndis_filter_device_add() falls back to
>> setting
>> 
>>  net_device->max_chn = 1;
>> net_device->num_chn = 1;
>> net_device->num_sc_offered = 0;
>> 
>> but after a moment the rndis request succeeds and subchannels start to
>> appear. netvsc_sc_open() does unconditional nvscdev->num_sc_offered-- and
>> it becomes U32_MAX-1. Consequent rndis_filter_device_remove() will hang
>> while waiting for all U32_MAX-1 subchannels to appear and this is not
>> going to happen.
>> 
>> The immediate issue could be solved by adding num_sc_offered > 0 check to
>> netvsc_sc_open() but we're getting out of sync with the host and it's not
>> easy to adjust things later, e.g. in this particular case we'll be creating
>> queues without a user request for it and races are expected. Same applies
>> to other parts of the driver which have the same completion timeout.
>> 
>> Following the trend in drivers/hv/* code I suggest we remove all these
>> timeouts completely. As a guest we can always trust the host we're running
>> on and if the host screws things up there is no easy way to recover anyway.
>> 
>> Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
>> Acked-by: Haiyang Zhang <haiya...@microsoft.com>
>
> This doesn't apply cleanly to net-next, I get rejects.

Sorry for the screw up, apparently I forgot about my own cleanups I sent
before. v3 is coming.

-- 
  Vitaly


[PATCH net-next v2] netvsc: get rid of completion timeouts

2016-06-08 Thread Vitaly Kuznetsov
I'm hitting 5 second timeout in rndis_filter_set_rss_param() while setting
RSS parameters for the device. When this happens we end up returning
-ETIMEDOUT from the function and rndis_filter_device_add() falls back to
setting

net_device->max_chn = 1;
net_device->num_chn = 1;
net_device->num_sc_offered = 0;

but after a moment the rndis request succeeds and subchannels start to
appear. netvsc_sc_open() does unconditional nvscdev->num_sc_offered-- and
it becomes U32_MAX-1. Consequent rndis_filter_device_remove() will hang
while waiting for all U32_MAX-1 subchannels to appear and this is not
going to happen.

The immediate issue could be solved by adding num_sc_offered > 0 check to
netvsc_sc_open() but we're getting out of sync with the host and it's not
easy to adjust things later, e.g. in this particular case we'll be creating
queues without a user request for it and races are expected. Same applies
to other parts of the driver which have the same completion timeout.

Following the trend in drivers/hv/* code I suggest we remove all these
timeouts completely. As a guest we can always trust the host we're running
on and if the host screws things up there is no easy way to recover anyway.

Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
Acked-by: Haiyang Zhang <haiya...@microsoft.com>
---
Changes since v1 RFC:
- Non-RFC
- Restore (net_dev->num_sc_offered > 0) condition in
  rndis_filter_device_remove() as without it we may hang when there are no
  subchannels at all.
- Remove now unused ndev in rndis_filter_set_packet_filter() to fix the build
  warning reported by kbuild robot.
- Added Haiyang's ACK (hopefully it stands with the above mentioned
  changes).
---
 drivers/net/hyperv/netvsc.c   |  14 +
 drivers/net/hyperv/rndis_filter.c | 117 +-
 2 files changed, 31 insertions(+), 100 deletions(-)

diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 719cb35..89e0dea 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -249,7 +249,6 @@ static int netvsc_destroy_buf(struct hv_device *device)
 static int netvsc_init_buf(struct hv_device *device)
 {
int ret = 0;
-   unsigned long t;
struct netvsc_device *net_device;
struct nvsp_message *init_packet;
struct net_device *ndev;
@@ -310,9 +309,7 @@ static int netvsc_init_buf(struct hv_device *device)
goto cleanup;
}
 
-   t = wait_for_completion_timeout(_device->channel_init_wait, 5*HZ);
-   BUG_ON(t == 0);
-
+   wait_for_completion(_device->channel_init_wait);
 
/* Check the response */
if (init_packet->msg.v1_msg.
@@ -395,8 +392,7 @@ static int netvsc_init_buf(struct hv_device *device)
goto cleanup;
}
 
-   t = wait_for_completion_timeout(_device->channel_init_wait, 5*HZ);
-   BUG_ON(t == 0);
+   wait_for_completion(_device->channel_init_wait);
 
/* Check the response */
if (init_packet->msg.v1_msg.
@@ -450,7 +446,6 @@ static int negotiate_nvsp_ver(struct hv_device *device,
 {
struct net_device *ndev = hv_get_drvdata(device);
int ret;
-   unsigned long t;
 
memset(init_packet, 0, sizeof(struct nvsp_message));
init_packet->hdr.msg_type = NVSP_MSG_TYPE_INIT;
@@ -467,10 +462,7 @@ static int negotiate_nvsp_ver(struct hv_device *device,
if (ret != 0)
return ret;
 
-   t = wait_for_completion_timeout(_device->channel_init_wait, 5*HZ);
-
-   if (t == 0)
-   return -ETIMEDOUT;
+   wait_for_completion(_device->channel_init_wait);
 
if (init_packet->msg.init_msg.init_complete.status !=
NVSP_STAT_SUCCESS)
diff --git a/drivers/net/hyperv/rndis_filter.c 
b/drivers/net/hyperv/rndis_filter.c
index 97c292b..f5bf267 100644
--- a/drivers/net/hyperv/rndis_filter.c
+++ b/drivers/net/hyperv/rndis_filter.c
@@ -466,7 +466,6 @@ static int rndis_filter_query_device(struct rndis_device 
*dev, u32 oid,
struct rndis_query_request *query;
struct rndis_query_complete *query_complete;
int ret = 0;
-   unsigned long t;
 
if (!result)
return -EINVAL;
@@ -503,11 +502,7 @@ static int rndis_filter_query_device(struct rndis_device 
*dev, u32 oid,
if (ret != 0)
goto cleanup;
 
-   t = wait_for_completion_timeout(>wait_event, 5*HZ);
-   if (t == 0) {
-   ret = -ETIMEDOUT;
-   goto cleanup;
-   }
+   wait_for_completion(>wait_event);
 
/* Copy the response back */
query_complete = >response_msg.msg.query_complete;
@@ -558,7 +553,6 @@ int rndis_filter_set_device_mac(struct hv_device *hdev, 
char *mac)
u32 extlen = sizeof(struct rndis_config_parameter_info) +
2*NWADR_STRLEN + 4*ETH_ALEN;
int ret;
-   un

Re: [PATCH RFC net-next] netvsc: get rid of completion timeouts

2016-06-08 Thread Vitaly Kuznetsov
Vitaly Kuznetsov <vkuzn...@redhat.com> writes:

> I'm hitting 5 second timeout in rndis_filter_set_rss_param() while setting
> RSS parameters for the device. When this happens we end up returning
> -ETIMEDOUT from the function and rndis_filter_device_add() falls back to
> setting
>
>   net_device->max_chn = 1;
> net_device->num_chn = 1;
> net_device->num_sc_offered = 0;
>
> but after a moment the rndis request succeeds and subchannels start to
> appear. netvsc_sc_open() does unconditional nvscdev->num_sc_offered-- and
> it becomes U32_MAX-1. Consequent rndis_filter_device_remove() will hang
> while waiting for all U32_MAX-1 subchannels to appear and this is not
> going to happen.
>
> The immediate issue could be solved by adding num_sc_offered > 0 check to
> netvsc_sc_open() but we're getting out of sync with the host and it's not
> easy to adjust things later, e.g. in this particular case we'll be creating
> queues without a user request for it and races are expected. Same applies
> to other parts of the driver which have the same completion timeout.
>
> Following the trend in drivers/hv/* code I suggest we remove all these
> timeouts completely. As a guest we can always trust the host we're running
> on and if the host screws things up there is no easy way to recover anyway.
>
> Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>

Kbuild test robot reports an unused variable after the patch, I'll fix
this and resend together with a related fix so please don't apply this
RFC to net-next atm.

[skip]

-- 
  Vitaly


[PATCH RFC net-next] netvsc: get rid of completion timeouts

2016-06-08 Thread Vitaly Kuznetsov
I'm hitting 5 second timeout in rndis_filter_set_rss_param() while setting
RSS parameters for the device. When this happens we end up returning
-ETIMEDOUT from the function and rndis_filter_device_add() falls back to
setting

net_device->max_chn = 1;
net_device->num_chn = 1;
net_device->num_sc_offered = 0;

but after a moment the rndis request succeeds and subchannels start to
appear. netvsc_sc_open() does unconditional nvscdev->num_sc_offered-- and
it becomes U32_MAX-1. Consequent rndis_filter_device_remove() will hang
while waiting for all U32_MAX-1 subchannels to appear and this is not
going to happen.

The immediate issue could be solved by adding num_sc_offered > 0 check to
netvsc_sc_open() but we're getting out of sync with the host and it's not
easy to adjust things later, e.g. in this particular case we'll be creating
queues without a user request for it and races are expected. Same applies
to other parts of the driver which have the same completion timeout.

Following the trend in drivers/hv/* code I suggest we remove all these
timeouts completely. As a guest we can always trust the host we're running
on and if the host screws things up there is no easy way to recover anyway.

Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
---
 drivers/net/hyperv/netvsc.c   |  14 +
 drivers/net/hyperv/rndis_filter.c | 115 +-
 2 files changed, 30 insertions(+), 99 deletions(-)

diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 719cb35..89e0dea 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -249,7 +249,6 @@ static int netvsc_destroy_buf(struct hv_device *device)
 static int netvsc_init_buf(struct hv_device *device)
 {
int ret = 0;
-   unsigned long t;
struct netvsc_device *net_device;
struct nvsp_message *init_packet;
struct net_device *ndev;
@@ -310,9 +309,7 @@ static int netvsc_init_buf(struct hv_device *device)
goto cleanup;
}
 
-   t = wait_for_completion_timeout(_device->channel_init_wait, 5*HZ);
-   BUG_ON(t == 0);
-
+   wait_for_completion(_device->channel_init_wait);
 
/* Check the response */
if (init_packet->msg.v1_msg.
@@ -395,8 +392,7 @@ static int netvsc_init_buf(struct hv_device *device)
goto cleanup;
}
 
-   t = wait_for_completion_timeout(_device->channel_init_wait, 5*HZ);
-   BUG_ON(t == 0);
+   wait_for_completion(_device->channel_init_wait);
 
/* Check the response */
if (init_packet->msg.v1_msg.
@@ -450,7 +446,6 @@ static int negotiate_nvsp_ver(struct hv_device *device,
 {
struct net_device *ndev = hv_get_drvdata(device);
int ret;
-   unsigned long t;
 
memset(init_packet, 0, sizeof(struct nvsp_message));
init_packet->hdr.msg_type = NVSP_MSG_TYPE_INIT;
@@ -467,10 +462,7 @@ static int negotiate_nvsp_ver(struct hv_device *device,
if (ret != 0)
return ret;
 
-   t = wait_for_completion_timeout(_device->channel_init_wait, 5*HZ);
-
-   if (t == 0)
-   return -ETIMEDOUT;
+   wait_for_completion(_device->channel_init_wait);
 
if (init_packet->msg.init_msg.init_complete.status !=
NVSP_STAT_SUCCESS)
diff --git a/drivers/net/hyperv/rndis_filter.c 
b/drivers/net/hyperv/rndis_filter.c
index 97c292b..e68c6c4 100644
--- a/drivers/net/hyperv/rndis_filter.c
+++ b/drivers/net/hyperv/rndis_filter.c
@@ -466,7 +466,6 @@ static int rndis_filter_query_device(struct rndis_device 
*dev, u32 oid,
struct rndis_query_request *query;
struct rndis_query_complete *query_complete;
int ret = 0;
-   unsigned long t;
 
if (!result)
return -EINVAL;
@@ -503,11 +502,7 @@ static int rndis_filter_query_device(struct rndis_device 
*dev, u32 oid,
if (ret != 0)
goto cleanup;
 
-   t = wait_for_completion_timeout(>wait_event, 5*HZ);
-   if (t == 0) {
-   ret = -ETIMEDOUT;
-   goto cleanup;
-   }
+   wait_for_completion(>wait_event);
 
/* Copy the response back */
query_complete = >response_msg.msg.query_complete;
@@ -558,7 +553,6 @@ int rndis_filter_set_device_mac(struct hv_device *hdev, 
char *mac)
u32 extlen = sizeof(struct rndis_config_parameter_info) +
2*NWADR_STRLEN + 4*ETH_ALEN;
int ret;
-   unsigned long t;
 
request = get_rndis_request(rdev, RNDIS_MSG_SET,
RNDIS_MESSAGE_SIZE(struct rndis_set_request) + extlen);
@@ -599,21 +593,13 @@ int rndis_filter_set_device_mac(struct hv_device *hdev, 
char *mac)
if (ret != 0)
goto cleanup;
 
-   t = wait_for_completion_timeout(>wait_event, 5*HZ);
-   if (t == 0) {
-   netdev_err(ndev, "timeout before we got a set response..

[PATCH net-next v2 4/5] hv_netvsc: pass struct net_device to rndis_filter_set_device_mac()

2016-06-03 Thread Vitaly Kuznetsov
We unpack 'struct net_device' in netvsc_set_mac_addr() to get to
'struct hv_device' pointer which we use in rndis_filter_set_device_mac()
to get back to 'struct net_device'.

Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
---
 drivers/net/hyperv/hyperv_net.h   | 2 +-
 drivers/net/hyperv/netvsc_drv.c   | 4 +---
 drivers/net/hyperv/rndis_filter.c | 3 +--
 3 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index f650ec1..467fb8b 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -201,7 +201,7 @@ int rndis_filter_receive(struct hv_device *dev,
struct vmbus_channel *channel);
 
 int rndis_filter_set_packet_filter(struct rndis_device *dev, u32 new_filter);
-int rndis_filter_set_device_mac(struct hv_device *hdev, char *mac);
+int rndis_filter_set_device_mac(struct net_device *ndev, char *mac);
 
 void netvsc_switch_datapath(struct net_device *nv_dev, bool vf);
 
diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 591ab58..733dea1 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -982,8 +982,6 @@ static struct rtnl_link_stats64 *netvsc_get_stats64(struct 
net_device *net,
 
 static int netvsc_set_mac_addr(struct net_device *ndev, void *p)
 {
-   struct net_device_context *ndevctx = netdev_priv(ndev);
-   struct hv_device *hdev =  ndevctx->device_ctx;
struct sockaddr *addr = p;
char save_adr[ETH_ALEN];
unsigned char save_aatype;
@@ -996,7 +994,7 @@ static int netvsc_set_mac_addr(struct net_device *ndev, 
void *p)
if (err != 0)
return err;
 
-   err = rndis_filter_set_device_mac(hdev, addr->sa_data);
+   err = rndis_filter_set_device_mac(ndev, addr->sa_data);
if (err != 0) {
/* roll back to saved MAC */
memcpy(ndev->dev_addr, save_adr, ETH_ALEN);
diff --git a/drivers/net/hyperv/rndis_filter.c 
b/drivers/net/hyperv/rndis_filter.c
index 2c2f3b9..f1692bc 100644
--- a/drivers/net/hyperv/rndis_filter.c
+++ b/drivers/net/hyperv/rndis_filter.c
@@ -543,9 +543,8 @@ static int rndis_filter_query_device_mac(struct 
rndis_device *dev)
 #define NWADR_STR "NetworkAddress"
 #define NWADR_STRLEN 14
 
-int rndis_filter_set_device_mac(struct hv_device *hdev, char *mac)
+int rndis_filter_set_device_mac(struct net_device *ndev, char *mac)
 {
-   struct net_device *ndev = hv_get_drvdata(hdev);
struct netvsc_device *nvdev = net_device_to_netvsc_device(ndev);
struct rndis_device *rdev = nvdev->extension;
struct rndis_request *request;
-- 
2.5.5



[PATCH net-next v2 3/5] hv_netvsc: pass struct netvsc_device to rndis_filter_{open,close}()

2016-06-03 Thread Vitaly Kuznetsov
Both rndis_filter_open()/rndis_filter_close() use struct hv_device to
reach to struct netvsc_device only and all callers have it already.
While on it, rename net_device to nvdev in rndis_filter_open() as
net_device is misleading.

Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
---
 drivers/net/hyperv/hyperv_net.h   |  5 +++--
 drivers/net/hyperv/netvsc_drv.c   | 13 +
 drivers/net/hyperv/rndis_filter.c | 14 +-
 3 files changed, 13 insertions(+), 19 deletions(-)

diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index 952cbc8..f650ec1 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -173,6 +173,7 @@ struct rndis_device {
 
 /* Interface */
 struct rndis_message;
+struct netvsc_device;
 int netvsc_device_add(struct hv_device *device, void *additional_info);
 int netvsc_device_remove(struct hv_device *device);
 int netvsc_send(struct hv_device *device,
@@ -189,8 +190,8 @@ int netvsc_recv_callback(struct hv_device *device_obj,
struct vmbus_channel *channel,
u16 vlan_tci);
 void netvsc_channel_cb(void *context);
-int rndis_filter_open(struct hv_device *dev);
-int rndis_filter_close(struct hv_device *dev);
+int rndis_filter_open(struct netvsc_device *nvdev);
+int rndis_filter_close(struct netvsc_device *nvdev);
 int rndis_filter_device_add(struct hv_device *dev,
void *additional_info);
 void rndis_filter_device_remove(struct hv_device *dev);
diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index db8fedf..591ab58 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -98,16 +98,14 @@ static void netvsc_set_multicast_list(struct net_device 
*net)
 
 static int netvsc_open(struct net_device *net)
 {
-   struct net_device_context *net_device_ctx = netdev_priv(net);
-   struct hv_device *device_obj = net_device_ctx->device_ctx;
-   struct netvsc_device *nvdev = net_device_ctx->nvdev;
+   struct netvsc_device *nvdev = net_device_to_netvsc_device(net);
struct rndis_device *rdev;
int ret = 0;
 
netif_carrier_off(net);
 
/* Open up the device */
-   ret = rndis_filter_open(device_obj);
+   ret = rndis_filter_open(nvdev);
if (ret != 0) {
netdev_err(net, "unable to open device (ret %d).\n", ret);
return ret;
@@ -125,7 +123,6 @@ static int netvsc_open(struct net_device *net)
 static int netvsc_close(struct net_device *net)
 {
struct net_device_context *net_device_ctx = netdev_priv(net);
-   struct hv_device *device_obj = net_device_ctx->device_ctx;
struct netvsc_device *nvdev = net_device_ctx->nvdev;
int ret;
u32 aread, awrite, i, msec = 10, retry = 0, retry_max = 20;
@@ -135,7 +132,7 @@ static int netvsc_close(struct net_device *net)
 
/* Make sure netvsc_set_multicast_list doesn't re-enable filter! */
cancel_work_sync(_device_ctx->work);
-   ret = rndis_filter_close(device_obj);
+   ret = rndis_filter_close(nvdev);
if (ret != 0) {
netdev_err(net, "unable to close device (ret %d).\n", ret);
return ret;
@@ -1247,7 +1244,7 @@ static int netvsc_vf_up(struct net_device *vf_netdev)
/*
 * Open the device before switching data path.
 */
-   rndis_filter_open(net_device_ctx->device_ctx);
+   rndis_filter_open(netvsc_dev);
 
/*
 * notify the host to switch the data path.
@@ -1302,7 +1299,7 @@ static int netvsc_vf_down(struct net_device *vf_netdev)
udelay(50);
netvsc_switch_datapath(ndev, false);
netdev_info(ndev, "Data path switched from VF: %s\n", vf_netdev->name);
-   rndis_filter_close(net_device_ctx->device_ctx);
+   rndis_filter_close(netvsc_dev);
netif_carrier_on(ndev);
/*
 * Notify peers.
diff --git a/drivers/net/hyperv/rndis_filter.c 
b/drivers/net/hyperv/rndis_filter.c
index 42c652e..2c2f3b9 100644
--- a/drivers/net/hyperv/rndis_filter.c
+++ b/drivers/net/hyperv/rndis_filter.c
@@ -1216,23 +1216,19 @@ void rndis_filter_device_remove(struct hv_device *dev)
 }
 
 
-int rndis_filter_open(struct hv_device *dev)
+int rndis_filter_open(struct netvsc_device *nvdev)
 {
-   struct netvsc_device *net_device = hv_device_to_netvsc_device(dev);
-
-   if (!net_device)
+   if (!nvdev)
return -EINVAL;
 
-   if (atomic_inc_return(_device->open_cnt) != 1)
+   if (atomic_inc_return(>open_cnt) != 1)
return 0;
 
-   return rndis_filter_open_device(net_device->extension);
+   return rndis_filter_open_device(nvdev->extension);
 }
 
-int rndis_filter_close(struct hv_device *dev)
+int rndis_filter_close(struct netvsc_device *nvdev)
 {
-   struct netvsc_device *nvdev = hv_device_to_

[PATCH net-next v2 1/5] hv_netvsc: remove redundant assignment in netvsc_recv_callback()

2016-06-03 Thread Vitaly Kuznetsov
net_device_ctx is assigned in the very beginning of the function and 'net'
pointer doesn't change.

Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
---
 drivers/net/hyperv/netvsc_drv.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 6a69b5c..db8fedf 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -701,7 +701,6 @@ int netvsc_recv_callback(struct hv_device *device_obj,
}
 
 vf_injection_done:
-   net_device_ctx = netdev_priv(net);
rx_stats = this_cpu_ptr(net_device_ctx->rx_stats);
 
/* Allocate a skb - TODO direct I/O to pages? */
-- 
2.5.5



[PATCH net-next v2 0/5] hv_netvsc: cleanup after untangling the pointer mess

2016-06-03 Thread Vitaly Kuznetsov
Changes since v1:
- resend when net-next is open [David Miller]
- rebased to current net-next.

After we made traveling through our internal structures explicit it became
obvious that some functions take arguments they don't need just to do
redundant pointer travel and get to what they really need while their
callers already have the required information.

This is just a cleanup series with no functional changes intended. It
doesn't pretend to be complete, additional cleanup of other functions may
follow.

Vitaly Kuznetsov (5):
  hv_netvsc: remove redundant assignment in netvsc_recv_callback()
  hv_netvsc: introduce {net,hv}_device_to_netvsc_device() helpers
  hv_netvsc: pass struct netvsc_device to rndis_filter_{open,close}()
  hv_netvsc: pass struct net_device to rndis_filter_set_device_mac()
  hv_netvsc: pass struct net_device to rndis_filter_set_offload_params()

 drivers/net/hyperv/hyperv_net.h   | 19 +++---
 drivers/net/hyperv/netvsc.c   | 11 +++---
 drivers/net/hyperv/netvsc_drv.c   | 18 ++---
 drivers/net/hyperv/rndis_filter.c | 42 ---
 4 files changed, 38 insertions(+), 52 deletions(-)

-- 
2.5.5



[PATCH net-next v2 2/5] hv_netvsc: introduce {net,hv}_device_to_netvsc_device() helpers

2016-06-03 Thread Vitaly Kuznetsov
Make it easier to get 'struct netvsc_device' from 'struct net_device' and
'struct hv_device' by introducing inline helpers.

Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
---
 drivers/net/hyperv/hyperv_net.h   | 12 
 drivers/net/hyperv/netvsc.c   | 11 +++
 drivers/net/hyperv/rndis_filter.c | 24 +++-
 3 files changed, 22 insertions(+), 25 deletions(-)

diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index c270c5a..952cbc8 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -743,6 +743,18 @@ struct netvsc_device {
atomic_t vf_use_cnt;
 };
 
+static inline struct netvsc_device *
+net_device_to_netvsc_device(struct net_device *ndev)
+{
+   return ((struct net_device_context *)netdev_priv(ndev))->nvdev;
+}
+
+static inline struct netvsc_device *
+hv_device_to_netvsc_device(struct hv_device *device)
+{
+   return net_device_to_netvsc_device(hv_get_drvdata(device));
+}
+
 /* NdisInitialize message */
 struct rndis_initialize_request {
u32 req_id;
diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 719cb35..96f00c0 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -95,9 +95,7 @@ static void free_netvsc_device(struct netvsc_device *nvdev)
 
 static struct netvsc_device *get_outbound_net_device(struct hv_device *device)
 {
-   struct net_device *ndev = hv_get_drvdata(device);
-   struct net_device_context *net_device_ctx = netdev_priv(ndev);
-   struct netvsc_device *net_device = net_device_ctx->nvdev;
+   struct netvsc_device *net_device = hv_device_to_netvsc_device(device);
 
if (net_device && net_device->destroy)
net_device = NULL;
@@ -107,9 +105,7 @@ static struct netvsc_device *get_outbound_net_device(struct 
hv_device *device)
 
 static struct netvsc_device *get_inbound_net_device(struct hv_device *device)
 {
-   struct net_device *ndev = hv_get_drvdata(device);
-   struct net_device_context *net_device_ctx = netdev_priv(ndev);
-   struct netvsc_device *net_device = net_device_ctx->nvdev;
+   struct netvsc_device *net_device = hv_device_to_netvsc_device(device);
 
if (!net_device)
goto get_in_err;
@@ -128,8 +124,7 @@ static int netvsc_destroy_buf(struct hv_device *device)
struct nvsp_message *revoke_packet;
int ret = 0;
struct net_device *ndev = hv_get_drvdata(device);
-   struct net_device_context *net_device_ctx = netdev_priv(ndev);
-   struct netvsc_device *net_device = net_device_ctx->nvdev;
+   struct netvsc_device *net_device = net_device_to_netvsc_device(ndev);
 
/*
 * If we got a section count, it means we received a
diff --git a/drivers/net/hyperv/rndis_filter.c 
b/drivers/net/hyperv/rndis_filter.c
index 97c292b..42c652e 100644
--- a/drivers/net/hyperv/rndis_filter.c
+++ b/drivers/net/hyperv/rndis_filter.c
@@ -546,8 +546,7 @@ static int rndis_filter_query_device_mac(struct 
rndis_device *dev)
 int rndis_filter_set_device_mac(struct hv_device *hdev, char *mac)
 {
struct net_device *ndev = hv_get_drvdata(hdev);
-   struct net_device_context *net_device_ctx = netdev_priv(ndev);
-   struct netvsc_device *nvdev = net_device_ctx->nvdev;
+   struct netvsc_device *nvdev = net_device_to_netvsc_device(ndev);
struct rndis_device *rdev = nvdev->extension;
struct rndis_request *request;
struct rndis_set_request *set;
@@ -626,8 +625,7 @@ rndis_filter_set_offload_params(struct hv_device *hdev,
struct ndis_offload_params *req_offloads)
 {
struct net_device *ndev = hv_get_drvdata(hdev);
-   struct net_device_context *net_device_ctx = netdev_priv(ndev);
-   struct netvsc_device *nvdev = net_device_ctx->nvdev;
+   struct netvsc_device *nvdev = net_device_to_netvsc_device(ndev);
struct rndis_device *rdev = nvdev->extension;
struct rndis_request *request;
struct rndis_set_request *set;
@@ -851,8 +849,7 @@ static int rndis_filter_init_device(struct rndis_device 
*dev)
u32 status;
int ret;
unsigned long t;
-   struct net_device_context *net_device_ctx = netdev_priv(dev->ndev);
-   struct netvsc_device *nvdev = net_device_ctx->nvdev;
+   struct netvsc_device *nvdev = net_device_to_netvsc_device(dev->ndev);
 
request = get_rndis_request(dev, RNDIS_MSG_INIT,
RNDIS_MESSAGE_SIZE(struct rndis_initialize_request));
@@ -977,8 +974,7 @@ static void netvsc_sc_open(struct vmbus_channel *new_sc)
 {
struct net_device *ndev =
hv_get_drvdata(new_sc->primary_channel->device_obj);
-   struct net_device_context *net_device_ctx = netdev_priv(ndev);
-   struct netvsc_device *nvscdev = net_device_ctx->nvdev;
+   struct netvsc_device *nvscdev

[PATCH net-next v2 5/5] hv_netvsc: pass struct net_device to rndis_filter_set_offload_params()

2016-06-03 Thread Vitaly Kuznetsov
The only caller rndis_filter_device_add() has 'struct net_device' pointer
already.

Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
---
 drivers/net/hyperv/rndis_filter.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/net/hyperv/rndis_filter.c 
b/drivers/net/hyperv/rndis_filter.c
index f1692bc..979fa44 100644
--- a/drivers/net/hyperv/rndis_filter.c
+++ b/drivers/net/hyperv/rndis_filter.c
@@ -620,10 +620,9 @@ cleanup:
 }
 
 static int
-rndis_filter_set_offload_params(struct hv_device *hdev,
+rndis_filter_set_offload_params(struct net_device *ndev,
struct ndis_offload_params *req_offloads)
 {
-   struct net_device *ndev = hv_get_drvdata(hdev);
struct netvsc_device *nvdev = net_device_to_netvsc_device(ndev);
struct rndis_device *rdev = nvdev->extension;
struct rndis_request *request;
@@ -1083,7 +1082,7 @@ int rndis_filter_device_add(struct hv_device *dev,
offloads.lso_v2_ipv4 = NDIS_OFFLOAD_PARAMETERS_LSOV2_ENABLED;
 
 
-   ret = rndis_filter_set_offload_params(dev, );
+   ret = rndis_filter_set_offload_params(net, );
if (ret)
goto err_dev_remv;
 
-- 
2.5.5



[PATCH net] bnx2x: avoid leaking memory on bnx2x_init_one() failures

2016-05-30 Thread Vitaly Kuznetsov
bnx2x_init_bp() allocates memory with bnx2x_alloc_mem_bp() so if we
fail later in bnx2x_init_one() we need to free this memory
with bnx2x_free_mem_bp() to avoid leakages. E.g. I'm observing memory
leaks reported by kmemleak when a failure (unrelated) happens in
bnx2x_vfpf_acquire().

Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 17 ++---
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c 
b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index 0a5b770..c5fe9158 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -13941,14 +13941,14 @@ static int bnx2x_init_one(struct pci_dev *pdev,
bp->doorbells = bnx2x_vf_doorbells(bp);
rc = bnx2x_vf_pci_alloc(bp);
if (rc)
-   goto init_one_exit;
+   goto init_one_freemem;
} else {
doorbell_size = BNX2X_L2_MAX_CID(bp) * (1 << BNX2X_DB_SHIFT);
if (doorbell_size > pci_resource_len(pdev, 2)) {
dev_err(>pdev->dev,
"Cannot map doorbells, bar size too small, 
aborting\n");
rc = -ENOMEM;
-   goto init_one_exit;
+   goto init_one_freemem;
}
bp->doorbells = ioremap_nocache(pci_resource_start(pdev, 2),
doorbell_size);
@@ -13957,19 +13957,19 @@ static int bnx2x_init_one(struct pci_dev *pdev,
dev_err(>pdev->dev,
"Cannot map doorbell space, aborting\n");
rc = -ENOMEM;
-   goto init_one_exit;
+   goto init_one_freemem;
}
 
if (IS_VF(bp)) {
rc = bnx2x_vfpf_acquire(bp, tx_count, rx_count);
if (rc)
-   goto init_one_exit;
+   goto init_one_freemem;
}
 
/* Enable SRIOV if capability found in configuration space */
rc = bnx2x_iov_init_one(bp, int_mode, BNX2X_MAX_NUM_OF_VFS);
if (rc)
-   goto init_one_exit;
+   goto init_one_freemem;
 
/* calc qm_cid_count */
bp->qm_cid_count = bnx2x_set_qm_cid_count(bp);
@@ -13988,7 +13988,7 @@ static int bnx2x_init_one(struct pci_dev *pdev,
rc = bnx2x_set_int_mode(bp);
if (rc) {
dev_err(>dev, "Cannot set interrupts\n");
-   goto init_one_exit;
+   goto init_one_freemem;
}
BNX2X_DEV_INFO("set interrupts successfully\n");
 
@@ -13996,7 +13996,7 @@ static int bnx2x_init_one(struct pci_dev *pdev,
rc = register_netdev(dev);
if (rc) {
dev_err(>dev, "Cannot register net device\n");
-   goto init_one_exit;
+   goto init_one_freemem;
}
BNX2X_DEV_INFO("device name after netdev register %s\n", dev->name);
 
@@ -14029,6 +14029,9 @@ static int bnx2x_init_one(struct pci_dev *pdev,
 
return 0;
 
+init_one_freemem:
+   bnx2x_free_mem_bp(bp);
+
 init_one_exit:
bnx2x_disable_pcie_error_reporting(bp);
 
-- 
2.5.5



[PATCH net-next 0/5] hv_netvsc: cleanup after untangling the pointer mess

2016-05-23 Thread Vitaly Kuznetsov
After we made traveling through our internal structures explicit it became
obvious that some functions take arguments they don't need just to do
redundant pointer travel and get to what they really need while their
callers already have the required information.

This is just a cleanup series with no functional changes intended. It
doesn't pretend to be complete, additional cleanup of other functions may
follow.

Vitaly Kuznetsov (5):
  hv_netvsc: remove redundant assignment in netvsc_recv_callback()
  hv_netvsc: introduce {net,hv}_device_to_netvsc_device() helpers
  hv_netvsc: pass struct netvsc_device to rndis_filter_{open,close}()
  hv_netvsc: pass struct net_device to rndis_filter_set_device_mac()
  hv_netvsc: pass struct net_device to rndis_filter_set_offload_params()

 drivers/net/hyperv/hyperv_net.h   | 19 +++---
 drivers/net/hyperv/netvsc.c   | 11 +++---
 drivers/net/hyperv/netvsc_drv.c   | 18 ++---
 drivers/net/hyperv/rndis_filter.c | 42 ---
 4 files changed, 38 insertions(+), 52 deletions(-)

-- 
2.5.5



[PATCH net-next 2/5] hv_netvsc: introduce {net,hv}_device_to_netvsc_device() helpers

2016-05-23 Thread Vitaly Kuznetsov
Make it easier to get 'struct netvsc_device' from 'struct net_device' and
'struct hv_device' by introducing inline helpers.

Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
---
 drivers/net/hyperv/hyperv_net.h   | 12 
 drivers/net/hyperv/netvsc.c   | 11 +++
 drivers/net/hyperv/rndis_filter.c | 24 +++-
 3 files changed, 22 insertions(+), 25 deletions(-)

diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index c270c5a..952cbc8 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -743,6 +743,18 @@ struct netvsc_device {
atomic_t vf_use_cnt;
 };
 
+static inline struct netvsc_device *
+net_device_to_netvsc_device(struct net_device *ndev)
+{
+   return ((struct net_device_context *)netdev_priv(ndev))->nvdev;
+}
+
+static inline struct netvsc_device *
+hv_device_to_netvsc_device(struct hv_device *device)
+{
+   return net_device_to_netvsc_device(hv_get_drvdata(device));
+}
+
 /* NdisInitialize message */
 struct rndis_initialize_request {
u32 req_id;
diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 719cb35..96f00c0 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -95,9 +95,7 @@ static void free_netvsc_device(struct netvsc_device *nvdev)
 
 static struct netvsc_device *get_outbound_net_device(struct hv_device *device)
 {
-   struct net_device *ndev = hv_get_drvdata(device);
-   struct net_device_context *net_device_ctx = netdev_priv(ndev);
-   struct netvsc_device *net_device = net_device_ctx->nvdev;
+   struct netvsc_device *net_device = hv_device_to_netvsc_device(device);
 
if (net_device && net_device->destroy)
net_device = NULL;
@@ -107,9 +105,7 @@ static struct netvsc_device *get_outbound_net_device(struct 
hv_device *device)
 
 static struct netvsc_device *get_inbound_net_device(struct hv_device *device)
 {
-   struct net_device *ndev = hv_get_drvdata(device);
-   struct net_device_context *net_device_ctx = netdev_priv(ndev);
-   struct netvsc_device *net_device = net_device_ctx->nvdev;
+   struct netvsc_device *net_device = hv_device_to_netvsc_device(device);
 
if (!net_device)
goto get_in_err;
@@ -128,8 +124,7 @@ static int netvsc_destroy_buf(struct hv_device *device)
struct nvsp_message *revoke_packet;
int ret = 0;
struct net_device *ndev = hv_get_drvdata(device);
-   struct net_device_context *net_device_ctx = netdev_priv(ndev);
-   struct netvsc_device *net_device = net_device_ctx->nvdev;
+   struct netvsc_device *net_device = net_device_to_netvsc_device(ndev);
 
/*
 * If we got a section count, it means we received a
diff --git a/drivers/net/hyperv/rndis_filter.c 
b/drivers/net/hyperv/rndis_filter.c
index 97c292b..42c652e 100644
--- a/drivers/net/hyperv/rndis_filter.c
+++ b/drivers/net/hyperv/rndis_filter.c
@@ -546,8 +546,7 @@ static int rndis_filter_query_device_mac(struct 
rndis_device *dev)
 int rndis_filter_set_device_mac(struct hv_device *hdev, char *mac)
 {
struct net_device *ndev = hv_get_drvdata(hdev);
-   struct net_device_context *net_device_ctx = netdev_priv(ndev);
-   struct netvsc_device *nvdev = net_device_ctx->nvdev;
+   struct netvsc_device *nvdev = net_device_to_netvsc_device(ndev);
struct rndis_device *rdev = nvdev->extension;
struct rndis_request *request;
struct rndis_set_request *set;
@@ -626,8 +625,7 @@ rndis_filter_set_offload_params(struct hv_device *hdev,
struct ndis_offload_params *req_offloads)
 {
struct net_device *ndev = hv_get_drvdata(hdev);
-   struct net_device_context *net_device_ctx = netdev_priv(ndev);
-   struct netvsc_device *nvdev = net_device_ctx->nvdev;
+   struct netvsc_device *nvdev = net_device_to_netvsc_device(ndev);
struct rndis_device *rdev = nvdev->extension;
struct rndis_request *request;
struct rndis_set_request *set;
@@ -851,8 +849,7 @@ static int rndis_filter_init_device(struct rndis_device 
*dev)
u32 status;
int ret;
unsigned long t;
-   struct net_device_context *net_device_ctx = netdev_priv(dev->ndev);
-   struct netvsc_device *nvdev = net_device_ctx->nvdev;
+   struct netvsc_device *nvdev = net_device_to_netvsc_device(dev->ndev);
 
request = get_rndis_request(dev, RNDIS_MSG_INIT,
RNDIS_MESSAGE_SIZE(struct rndis_initialize_request));
@@ -977,8 +974,7 @@ static void netvsc_sc_open(struct vmbus_channel *new_sc)
 {
struct net_device *ndev =
hv_get_drvdata(new_sc->primary_channel->device_obj);
-   struct net_device_context *net_device_ctx = netdev_priv(ndev);
-   struct netvsc_device *nvscdev = net_device_ctx->nvdev;
+   struct netvsc_device *nvscdev

[PATCH net-next 1/5] hv_netvsc: remove redundant assignment in netvsc_recv_callback()

2016-05-23 Thread Vitaly Kuznetsov
net_device_ctx is assigned in the very beginning of the function and 'net'
pointer doesn't change.

Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
---
 drivers/net/hyperv/netvsc_drv.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 6a69b5c..db8fedf 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -701,7 +701,6 @@ int netvsc_recv_callback(struct hv_device *device_obj,
}
 
 vf_injection_done:
-   net_device_ctx = netdev_priv(net);
rx_stats = this_cpu_ptr(net_device_ctx->rx_stats);
 
/* Allocate a skb - TODO direct I/O to pages? */
-- 
2.5.5



[PATCH net-next 3/5] hv_netvsc: pass struct netvsc_device to rndis_filter_{open,close}()

2016-05-23 Thread Vitaly Kuznetsov
Both rndis_filter_open()/rndis_filter_close() use struct hv_device to
reach to struct netvsc_device only and all callers have it already.
While on it, rename net_device to nvdev in rndis_filter_open() as
net_device is misleading.

Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
---
 drivers/net/hyperv/hyperv_net.h   |  5 +++--
 drivers/net/hyperv/netvsc_drv.c   | 13 +
 drivers/net/hyperv/rndis_filter.c | 14 +-
 3 files changed, 13 insertions(+), 19 deletions(-)

diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index 952cbc8..f650ec1 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -173,6 +173,7 @@ struct rndis_device {
 
 /* Interface */
 struct rndis_message;
+struct netvsc_device;
 int netvsc_device_add(struct hv_device *device, void *additional_info);
 int netvsc_device_remove(struct hv_device *device);
 int netvsc_send(struct hv_device *device,
@@ -189,8 +190,8 @@ int netvsc_recv_callback(struct hv_device *device_obj,
struct vmbus_channel *channel,
u16 vlan_tci);
 void netvsc_channel_cb(void *context);
-int rndis_filter_open(struct hv_device *dev);
-int rndis_filter_close(struct hv_device *dev);
+int rndis_filter_open(struct netvsc_device *nvdev);
+int rndis_filter_close(struct netvsc_device *nvdev);
 int rndis_filter_device_add(struct hv_device *dev,
void *additional_info);
 void rndis_filter_device_remove(struct hv_device *dev);
diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index db8fedf..591ab58 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -98,16 +98,14 @@ static void netvsc_set_multicast_list(struct net_device 
*net)
 
 static int netvsc_open(struct net_device *net)
 {
-   struct net_device_context *net_device_ctx = netdev_priv(net);
-   struct hv_device *device_obj = net_device_ctx->device_ctx;
-   struct netvsc_device *nvdev = net_device_ctx->nvdev;
+   struct netvsc_device *nvdev = net_device_to_netvsc_device(net);
struct rndis_device *rdev;
int ret = 0;
 
netif_carrier_off(net);
 
/* Open up the device */
-   ret = rndis_filter_open(device_obj);
+   ret = rndis_filter_open(nvdev);
if (ret != 0) {
netdev_err(net, "unable to open device (ret %d).\n", ret);
return ret;
@@ -125,7 +123,6 @@ static int netvsc_open(struct net_device *net)
 static int netvsc_close(struct net_device *net)
 {
struct net_device_context *net_device_ctx = netdev_priv(net);
-   struct hv_device *device_obj = net_device_ctx->device_ctx;
struct netvsc_device *nvdev = net_device_ctx->nvdev;
int ret;
u32 aread, awrite, i, msec = 10, retry = 0, retry_max = 20;
@@ -135,7 +132,7 @@ static int netvsc_close(struct net_device *net)
 
/* Make sure netvsc_set_multicast_list doesn't re-enable filter! */
cancel_work_sync(_device_ctx->work);
-   ret = rndis_filter_close(device_obj);
+   ret = rndis_filter_close(nvdev);
if (ret != 0) {
netdev_err(net, "unable to close device (ret %d).\n", ret);
return ret;
@@ -1247,7 +1244,7 @@ static int netvsc_vf_up(struct net_device *vf_netdev)
/*
 * Open the device before switching data path.
 */
-   rndis_filter_open(net_device_ctx->device_ctx);
+   rndis_filter_open(netvsc_dev);
 
/*
 * notify the host to switch the data path.
@@ -1302,7 +1299,7 @@ static int netvsc_vf_down(struct net_device *vf_netdev)
udelay(50);
netvsc_switch_datapath(ndev, false);
netdev_info(ndev, "Data path switched from VF: %s\n", vf_netdev->name);
-   rndis_filter_close(net_device_ctx->device_ctx);
+   rndis_filter_close(netvsc_dev);
netif_carrier_on(ndev);
/*
 * Notify peers.
diff --git a/drivers/net/hyperv/rndis_filter.c 
b/drivers/net/hyperv/rndis_filter.c
index 42c652e..2c2f3b9 100644
--- a/drivers/net/hyperv/rndis_filter.c
+++ b/drivers/net/hyperv/rndis_filter.c
@@ -1216,23 +1216,19 @@ void rndis_filter_device_remove(struct hv_device *dev)
 }
 
 
-int rndis_filter_open(struct hv_device *dev)
+int rndis_filter_open(struct netvsc_device *nvdev)
 {
-   struct netvsc_device *net_device = hv_device_to_netvsc_device(dev);
-
-   if (!net_device)
+   if (!nvdev)
return -EINVAL;
 
-   if (atomic_inc_return(_device->open_cnt) != 1)
+   if (atomic_inc_return(>open_cnt) != 1)
return 0;
 
-   return rndis_filter_open_device(net_device->extension);
+   return rndis_filter_open_device(nvdev->extension);
 }
 
-int rndis_filter_close(struct hv_device *dev)
+int rndis_filter_close(struct netvsc_device *nvdev)
 {
-   struct netvsc_device *nvdev = hv_device_to_

[PATCH net-next 5/5] hv_netvsc: pass struct net_device to rndis_filter_set_offload_params()

2016-05-23 Thread Vitaly Kuznetsov
The only caller rndis_filter_device_add() has 'struct net_device' pointer
already.

Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
---
 drivers/net/hyperv/rndis_filter.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/net/hyperv/rndis_filter.c 
b/drivers/net/hyperv/rndis_filter.c
index f1692bc..979fa44 100644
--- a/drivers/net/hyperv/rndis_filter.c
+++ b/drivers/net/hyperv/rndis_filter.c
@@ -620,10 +620,9 @@ cleanup:
 }
 
 static int
-rndis_filter_set_offload_params(struct hv_device *hdev,
+rndis_filter_set_offload_params(struct net_device *ndev,
struct ndis_offload_params *req_offloads)
 {
-   struct net_device *ndev = hv_get_drvdata(hdev);
struct netvsc_device *nvdev = net_device_to_netvsc_device(ndev);
struct rndis_device *rdev = nvdev->extension;
struct rndis_request *request;
@@ -1083,7 +1082,7 @@ int rndis_filter_device_add(struct hv_device *dev,
offloads.lso_v2_ipv4 = NDIS_OFFLOAD_PARAMETERS_LSOV2_ENABLED;
 
 
-   ret = rndis_filter_set_offload_params(dev, );
+   ret = rndis_filter_set_offload_params(net, );
if (ret)
goto err_dev_remv;
 
-- 
2.5.5



[PATCH net-next 4/5] hv_netvsc: pass struct net_device to rndis_filter_set_device_mac()

2016-05-23 Thread Vitaly Kuznetsov
We unpack 'struct net_device' in netvsc_set_mac_addr() to get to
'struct hv_device' pointer which we use in rndis_filter_set_device_mac()
to get back to 'struct net_device'.

Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
---
 drivers/net/hyperv/hyperv_net.h   | 2 +-
 drivers/net/hyperv/netvsc_drv.c   | 4 +---
 drivers/net/hyperv/rndis_filter.c | 3 +--
 3 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index f650ec1..467fb8b 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -201,7 +201,7 @@ int rndis_filter_receive(struct hv_device *dev,
struct vmbus_channel *channel);
 
 int rndis_filter_set_packet_filter(struct rndis_device *dev, u32 new_filter);
-int rndis_filter_set_device_mac(struct hv_device *hdev, char *mac);
+int rndis_filter_set_device_mac(struct net_device *ndev, char *mac);
 
 void netvsc_switch_datapath(struct net_device *nv_dev, bool vf);
 
diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 591ab58..733dea1 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -982,8 +982,6 @@ static struct rtnl_link_stats64 *netvsc_get_stats64(struct 
net_device *net,
 
 static int netvsc_set_mac_addr(struct net_device *ndev, void *p)
 {
-   struct net_device_context *ndevctx = netdev_priv(ndev);
-   struct hv_device *hdev =  ndevctx->device_ctx;
struct sockaddr *addr = p;
char save_adr[ETH_ALEN];
unsigned char save_aatype;
@@ -996,7 +994,7 @@ static int netvsc_set_mac_addr(struct net_device *ndev, 
void *p)
if (err != 0)
return err;
 
-   err = rndis_filter_set_device_mac(hdev, addr->sa_data);
+   err = rndis_filter_set_device_mac(ndev, addr->sa_data);
if (err != 0) {
/* roll back to saved MAC */
memcpy(ndev->dev_addr, save_adr, ETH_ALEN);
diff --git a/drivers/net/hyperv/rndis_filter.c 
b/drivers/net/hyperv/rndis_filter.c
index 2c2f3b9..f1692bc 100644
--- a/drivers/net/hyperv/rndis_filter.c
+++ b/drivers/net/hyperv/rndis_filter.c
@@ -543,9 +543,8 @@ static int rndis_filter_query_device_mac(struct 
rndis_device *dev)
 #define NWADR_STR "NetworkAddress"
 #define NWADR_STRLEN 14
 
-int rndis_filter_set_device_mac(struct hv_device *hdev, char *mac)
+int rndis_filter_set_device_mac(struct net_device *ndev, char *mac)
 {
-   struct net_device *ndev = hv_get_drvdata(hdev);
struct netvsc_device *nvdev = net_device_to_netvsc_device(ndev);
struct rndis_device *rdev = nvdev->extension;
struct rndis_request *request;
-- 
2.5.5



[PATCH net-next v2 0/6] hv_netvsc: avoid races on mtu change/set channels

2016-05-13 Thread Vitaly Kuznetsov
Changes since v1:
- Rebased to net-next [Haiyang Zhang]

Original description:

MTU change and set channels operations are implemented as netvsc device
re-creation destroying internal structures (struct net_device stays). This
is really unfortunate but there is no support from Hyper-V host to do it
in a different way. Such re-creation is unsurprisingly racy, Haiyang
reported a crash when netvsc_change_mtu() is racing with
netvsc_link_change() but I was able to identify additional races upon
investigation. Both netvsc_set_channels() and netvsc_change_mtu() race
against:
1) netvsc_link_change()
2) netvsc_remove()
3) netvsc_send()

To solve these issues without introducing new locks some refactoring is
required. We need to get rid of very complex link graph in all the
internal structures and avoid traveling through structures which are being
removed.

Vitaly Kuznetsov (6):
  hv_netvsc: move start_remove flag to net_device_context
  hv_netvsc: use start_remove flag to protect netvsc_link_change()
  hv_netvsc: untangle the pointer mess
  hv_netvsc: get rid of struct net_device pointer in struct
netvsc_device
  hv_netvsc: synchronize netvsc_change_mtu()/netvsc_set_channels() with
netvsc_remove()
  hv_netvsc: set nvdev link after populating chn_table

 drivers/net/hyperv/hyperv_net.h   |  17 ++--
 drivers/net/hyperv/netvsc.c   | 136 +
 drivers/net/hyperv/netvsc_drv.c   | 178 ++
 drivers/net/hyperv/rndis_filter.c |  82 +-
 4 files changed, 215 insertions(+), 198 deletions(-)

-- 
2.5.5



[PATCH net-next v2 2/6] hv_netvsc: use start_remove flag to protect netvsc_link_change()

2016-05-13 Thread Vitaly Kuznetsov
netvsc_link_change() can race with netvsc_change_mtu() or
netvsc_set_channels() as these functions destroy struct netvsc_device and
rndis filter. Use start_remove flag for syncronization. As
netvsc_change_mtu()/netvsc_set_channels() are called with rtnl lock held
we need to take it before checking start_remove value in
netvsc_link_change().

Reported-by: Haiyang Zhang <haiya...@microsoft.com>
Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
---
 drivers/net/hyperv/netvsc_drv.c | 21 +
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index b3fa2cd..01de2dc 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -838,6 +838,8 @@ static int netvsc_set_channels(struct net_device *net,
  out:
netvsc_open(net);
net_device_ctx->start_remove = false;
+   /* We may have missed link change notifications */
+   schedule_delayed_work(_device_ctx->dwork, 0);
 
return ret;
 
@@ -946,6 +948,9 @@ out:
netvsc_open(ndev);
ndevctx->start_remove = false;
 
+   /* We may have missed link change notifications */
+   schedule_delayed_work(>dwork, 0);
+
return ret;
 }
 
@@ -1066,6 +1071,11 @@ static void netvsc_link_change(struct work_struct *w)
unsigned long flags, next_reconfig, delay;
 
ndev_ctx = container_of(w, struct net_device_context, dwork.work);
+
+   rtnl_lock();
+   if (ndev_ctx->start_remove)
+   goto out_unlock;
+
net_device = hv_get_drvdata(ndev_ctx->device_ctx);
rdev = net_device->extension;
net = net_device->ndev;
@@ -1079,7 +1089,7 @@ static void netvsc_link_change(struct work_struct *w)
delay = next_reconfig - jiffies;
delay = delay < LINKCHANGE_INT ? delay : LINKCHANGE_INT;
schedule_delayed_work(_ctx->dwork, delay);
-   return;
+   goto out_unlock;
}
ndev_ctx->last_reconfig = jiffies;
 
@@ -1093,9 +1103,7 @@ static void netvsc_link_change(struct work_struct *w)
spin_unlock_irqrestore(_ctx->lock, flags);
 
if (!event)
-   return;
-
-   rtnl_lock();
+   goto out_unlock;
 
switch (event->event) {
/* Only the following events are possible due to the check in
@@ -1144,6 +1152,11 @@ static void netvsc_link_change(struct work_struct *w)
 */
if (reschedule)
schedule_delayed_work(_ctx->dwork, LINKCHANGE_INT);
+
+   return;
+
+out_unlock:
+   rtnl_unlock();
 }
 
 static void netvsc_free_netdev(struct net_device *netdev)
-- 
2.5.5



[PATCH net-next v2 3/6] hv_netvsc: untangle the pointer mess

2016-05-13 Thread Vitaly Kuznetsov
We have the following structures keeping netvsc adapter state:
- struct net_device
- struct net_device_context
- struct netvsc_device
- struct rndis_device
- struct hv_device
and there are pointers/dependencies between them:
- struct net_device_context is contained in struct net_device
- struct hv_device has driver_data pointer which points to
  'struct net_device' OR 'struct netvsc_device' depending on driver's
  state (!).
- struct net_device_context has a pointer to 'struct hv_device'.
- struct netvsc_device has pointers to 'struct hv_device' and
  'struct net_device_context'.
- struct rndis_device has a pointer to 'struct netvsc_device'.

Different functions get different structures as parameters and use these
pointers for traveling. The problem is (in addition to keeping in mind
this complex graph) that some of these structures (struct netvsc_device
and struct rndis_device) are being removed and re-created on mtu change
(as we implement it as re-creation of hyper-v device) so our travel using
these pointers is dangerous.

Simplify this to a the following:
- add struct netvsc_device pointer to struct net_device_context (which is
  a part of struct net_device and thus never disappears)
- remove struct hv_device and struct net_device_context pointers from
  struct netvsc_device
- replace pointer to 'struct netvsc_device' with pointer to
  'struct net_device'.
- always keep 'struct net_device' in hv_device driver_data.

We'll end up with the following 'circular' structure:

net_device:
 [net_device_context] -> netvsc_device -> rndis_device -> net_device
  -> hv_device -> net_device

On MTU change we'll be removing the 'netvsc_device -> rndis_device'
branch and re-creating it making the synchronization easier.

There is one additional redundant pointer left, it is struct net_device
link in struct netvsc_device, it is going to be removed in a separate
commit.

Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
---
 drivers/net/hyperv/hyperv_net.h   | 10 ++---
 drivers/net/hyperv/netvsc.c   | 82 ---
 drivers/net/hyperv/netvsc_drv.c   | 51 ++--
 drivers/net/hyperv/rndis_filter.c | 78 +++--
 4 files changed, 99 insertions(+), 122 deletions(-)

diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index 18e9cc8..0f38743 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -158,7 +158,7 @@ enum rndis_device_state {
 };
 
 struct rndis_device {
-   struct netvsc_device *net_dev;
+   struct net_device *ndev;
 
enum rndis_device_state state;
bool link_state;
@@ -173,6 +173,7 @@ struct rndis_device {
 
 /* Interface */
 struct rndis_message;
+struct netvsc_device;
 int netvsc_device_add(struct hv_device *device, void *additional_info);
 int netvsc_device_remove(struct hv_device *device);
 int netvsc_send(struct hv_device *device,
@@ -653,6 +654,8 @@ struct garp_wrk {
 struct net_device_context {
/* point back to our device context */
struct hv_device *device_ctx;
+   /* netvsc_device */
+   struct netvsc_device *nvdev;
/* reconfigure work */
struct delayed_work dwork;
/* last reconfig time */
@@ -679,8 +682,6 @@ struct net_device_context {
 
 /* Per netvsc device */
 struct netvsc_device {
-   struct hv_device *dev;
-
u32 nvsp_version;
 
atomic_t num_outstanding_sends;
@@ -734,9 +735,6 @@ struct netvsc_device {
u32 max_pkt; /* max number of pkt in one send, e.g. 8 */
u32 pkt_align; /* alignment bytes, e.g. 8 */
 
-   /* The net device context */
-   struct net_device_context *nd_ctx;
-
/* 1: allocated, serial number is valid. 0: not allocated */
u32 vf_alloc;
/* Serial number of the VF to team with */
diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 5e2017b..1cd01ad 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -40,7 +40,9 @@
 void netvsc_switch_datapath(struct netvsc_device *nv_dev, bool vf)
 {
struct nvsp_message *init_pkt = _dev->channel_init_pkt;
-   struct hv_device *dev = nv_dev->dev;
+   struct net_device *ndev = nv_dev->ndev;
+   struct net_device_context *net_device_ctx = netdev_priv(ndev);
+   struct hv_device *dev = net_device_ctx->device_ctx;
 
memset(init_pkt, 0, sizeof(struct nvsp_message));
init_pkt->hdr.msg_type = NVSP_MSG4_TYPE_SWITCH_DATA_PATH;
@@ -62,6 +64,7 @@ static struct netvsc_device *alloc_net_device(struct 
hv_device *device)
 {
struct netvsc_device *net_device;
struct net_device *ndev = hv_get_drvdata(device);
+   struct net_device_context *net_device_ctx = netdev_priv(ndev);
 
net_device = kzalloc(sizeof(struct netvsc_device), GFP_KERNEL);
if (!net_device)
@@ -77,7 +80,6 @@ static struct netvsc_device *all

[PATCH net-next v2 1/6] hv_netvsc: move start_remove flag to net_device_context

2016-05-13 Thread Vitaly Kuznetsov
struct netvsc_device is destroyed on mtu change so keeping the
protection flag there is not a good idea. Move it to struct
net_device_context which is preserved.

Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
---
 drivers/net/hyperv/hyperv_net.h |  4 +++-
 drivers/net/hyperv/netvsc.c |  3 +--
 drivers/net/hyperv/netvsc_drv.c | 12 +---
 3 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index 6700a4d..18e9cc8 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -672,6 +672,9 @@ struct net_device_context {
/* Ethtool settings */
u8 duplex;
u32 speed;
+
+   /* the device is going away */
+   bool start_remove;
 };
 
 /* Per netvsc device */
@@ -682,7 +685,6 @@ struct netvsc_device {
 
atomic_t num_outstanding_sends;
wait_queue_head_t wait_drain;
-   bool start_remove;
bool destroy;
 
/* Receive buffer allocated by us but manages by NetVSP */
diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index eddce3c..5e2017b 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -74,7 +74,6 @@ static struct netvsc_device *alloc_net_device(struct 
hv_device *device)
}
 
init_waitqueue_head(_device->wait_drain);
-   net_device->start_remove = false;
net_device->destroy = false;
atomic_set(_device->open_cnt, 0);
atomic_set(_device->vf_use_cnt, 0);
@@ -691,7 +690,7 @@ static void netvsc_send_completion(struct netvsc_device 
*net_device,
wake_up(_device->wait_drain);
 
if (netif_tx_queue_stopped(netdev_get_tx_queue(ndev, q_idx)) &&
-   !net_device->start_remove &&
+   !net_device->nd_ctx->start_remove &&
(hv_ringbuf_avail_percent(>outbound) >
 RING_AVAIL_PERCENT_HIWATER || queue_sends < 1))
netif_tx_wake_queue(netdev_get_tx_queue(
diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index ba3f3f3..b3fa2cd 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -793,7 +793,7 @@ static int netvsc_set_channels(struct net_device *net,
goto out;
 
  do_set:
-   nvdev->start_remove = true;
+   net_device_ctx->start_remove = true;
rndis_filter_device_remove(dev);
 
nvdev->num_chn = channels->combined_count;
@@ -837,6 +837,7 @@ static int netvsc_set_channels(struct net_device *net,
 
  out:
netvsc_open(net);
+   net_device_ctx->start_remove = false;
 
return ret;
 
@@ -927,7 +928,7 @@ static int netvsc_change_mtu(struct net_device *ndev, int 
mtu)
 
num_chn = nvdev->num_chn;
 
-   nvdev->start_remove = true;
+   ndevctx->start_remove = true;
rndis_filter_device_remove(hdev);
 
ndev->mtu = mtu;
@@ -943,6 +944,7 @@ static int netvsc_change_mtu(struct net_device *ndev, int 
mtu)
 
 out:
netvsc_open(ndev);
+   ndevctx->start_remove = false;
 
return ret;
 }
@@ -1358,6 +1360,9 @@ static int netvsc_probe(struct hv_device *dev,
}
 
hv_set_drvdata(dev, net);
+
+   net_device_ctx->start_remove = false;
+
INIT_DELAYED_WORK(_device_ctx->dwork, netvsc_link_change);
INIT_WORK(_device_ctx->work, do_set_multicast);
INIT_WORK(_device_ctx->gwrk.dwrk, netvsc_notify_peers);
@@ -1419,9 +1424,10 @@ static int netvsc_remove(struct hv_device *dev)
return 0;
}
 
-   net_device->start_remove = true;
 
ndev_ctx = netdev_priv(net);
+   ndev_ctx->start_remove = true;
+
cancel_delayed_work_sync(_ctx->dwork);
cancel_work_sync(_ctx->work);
 
-- 
2.5.5



[PATCH net-next v2 5/6] hv_netvsc: synchronize netvsc_change_mtu()/netvsc_set_channels() with netvsc_remove()

2016-05-13 Thread Vitaly Kuznetsov
When netvsc device is removed during mtu change or channels setup we get
into troubles as both paths are trying to remove the device. Synchronize
them with start_remove flag and rtnl lock.

Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
---
 drivers/net/hyperv/netvsc_drv.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 7325d69..6a69b5c 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -759,7 +759,7 @@ static int netvsc_set_channels(struct net_device *net,
int ret = 0;
bool recovering = false;
 
-   if (!nvdev || nvdev->destroy)
+   if (net_device_ctx->start_remove || !nvdev || nvdev->destroy)
return -ENODEV;
 
num_chn = nvdev->num_chn;
@@ -907,7 +907,7 @@ static int netvsc_change_mtu(struct net_device *ndev, int 
mtu)
u32 num_chn;
int ret = 0;
 
-   if (nvdev == NULL || nvdev->destroy)
+   if (ndevctx->start_remove || !nvdev || nvdev->destroy)
return -ENODEV;
 
if (nvdev->nvsp_version >= NVSP_PROTOCOL_VERSION_2)
@@ -1445,7 +1445,12 @@ static int netvsc_remove(struct hv_device *dev)
ndev_ctx = netdev_priv(net);
net_device = ndev_ctx->nvdev;
 
+   /* Avoid racing with netvsc_change_mtu()/netvsc_set_channels()
+* removing the device.
+*/
+   rtnl_lock();
ndev_ctx->start_remove = true;
+   rtnl_unlock();
 
cancel_delayed_work_sync(_ctx->dwork);
cancel_work_sync(_ctx->work);
-- 
2.5.5



[PATCH net-next v2 4/6] hv_netvsc: get rid of struct net_device pointer in struct netvsc_device

2016-05-13 Thread Vitaly Kuznetsov
Simplify netvsvc pointer graph by getting rid of the redundant ndev
pointer. We can always get a pointer to struct net_device from somewhere
else.

Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
---
 drivers/net/hyperv/hyperv_net.h   |  5 +--
 drivers/net/hyperv/netvsc.c   | 36 +++-
 drivers/net/hyperv/netvsc_drv.c   | 91 +++
 drivers/net/hyperv/rndis_filter.c |  4 +-
 4 files changed, 72 insertions(+), 64 deletions(-)

diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index 0f38743..c270c5a 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -173,7 +173,6 @@ struct rndis_device {
 
 /* Interface */
 struct rndis_message;
-struct netvsc_device;
 int netvsc_device_add(struct hv_device *device, void *additional_info);
 int netvsc_device_remove(struct hv_device *device);
 int netvsc_send(struct hv_device *device,
@@ -203,7 +202,7 @@ int rndis_filter_receive(struct hv_device *dev,
 int rndis_filter_set_packet_filter(struct rndis_device *dev, u32 new_filter);
 int rndis_filter_set_device_mac(struct hv_device *hdev, char *mac);
 
-void netvsc_switch_datapath(struct netvsc_device *nv_dev, bool vf);
+void netvsc_switch_datapath(struct net_device *nv_dev, bool vf);
 
 #define NVSP_INVALID_PROTOCOL_VERSION  ((u32)0x)
 
@@ -711,8 +710,6 @@ struct netvsc_device {
struct nvsp_message revoke_packet;
/* unsigned char HwMacAddr[HW_MACADDR_LEN]; */
 
-   struct net_device *ndev;
-
struct vmbus_channel *chn_table[VRSS_CHANNEL_MAX];
u32 send_table[VRSS_SEND_TAB_SIZE];
u32 max_chn;
diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 1cd01ad..96b3c32 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -37,12 +37,12 @@
  * Switch the data path from the synthetic interface to the VF
  * interface.
  */
-void netvsc_switch_datapath(struct netvsc_device *nv_dev, bool vf)
+void netvsc_switch_datapath(struct net_device *ndev, bool vf)
 {
-   struct nvsp_message *init_pkt = _dev->channel_init_pkt;
-   struct net_device *ndev = nv_dev->ndev;
struct net_device_context *net_device_ctx = netdev_priv(ndev);
struct hv_device *dev = net_device_ctx->device_ctx;
+   struct netvsc_device *nv_dev = net_device_ctx->nvdev;
+   struct nvsp_message *init_pkt = _dev->channel_init_pkt;
 
memset(init_pkt, 0, sizeof(struct nvsp_message));
init_pkt->hdr.msg_type = NVSP_MSG4_TYPE_SWITCH_DATA_PATH;
@@ -80,7 +80,6 @@ static struct netvsc_device *alloc_net_device(struct 
hv_device *device)
net_device->destroy = false;
atomic_set(_device->open_cnt, 0);
atomic_set(_device->vf_use_cnt, 0);
-   net_device->ndev = ndev;
net_device->max_pkt = RNDIS_MAX_PKT_DEFAULT;
net_device->pkt_align = RNDIS_PKT_ALIGN_DEFAULT;
 
@@ -263,7 +262,7 @@ static int netvsc_init_buf(struct hv_device *device)
net_device = get_outbound_net_device(device);
if (!net_device)
return -ENODEV;
-   ndev = net_device->ndev;
+   ndev = hv_get_drvdata(device);
 
node = cpu_to_node(device->channel->target_cpu);
net_device->recv_buf = vzalloc_node(net_device->recv_buf_size, node);
@@ -453,6 +452,7 @@ static int negotiate_nvsp_ver(struct hv_device *device,
  struct nvsp_message *init_packet,
  u32 nvsp_ver)
 {
+   struct net_device *ndev = hv_get_drvdata(device);
int ret;
unsigned long t;
 
@@ -486,8 +486,7 @@ static int negotiate_nvsp_ver(struct hv_device *device,
/* NVSPv2 or later: Send NDIS config */
memset(init_packet, 0, sizeof(struct nvsp_message));
init_packet->hdr.msg_type = NVSP_MSG2_TYPE_SEND_NDIS_CONFIG;
-   init_packet->msg.v2_msg.send_ndis_config.mtu = net_device->ndev->mtu +
-  ETH_HLEN;
+   init_packet->msg.v2_msg.send_ndis_config.mtu = ndev->mtu + ETH_HLEN;
init_packet->msg.v2_msg.send_ndis_config.capability.ieee8021q = 1;
 
if (nvsp_ver >= NVSP_PROTOCOL_VERSION_5)
@@ -507,7 +506,6 @@ static int netvsc_connect_vsp(struct hv_device *device)
struct netvsc_device *net_device;
struct nvsp_message *init_packet;
int ndis_version;
-   struct net_device *ndev;
u32 ver_list[] = { NVSP_PROTOCOL_VERSION_1, NVSP_PROTOCOL_VERSION_2,
NVSP_PROTOCOL_VERSION_4, NVSP_PROTOCOL_VERSION_5 };
int i, num_ver = 4; /* number of different NVSP versions */
@@ -515,7 +513,6 @@ static int netvsc_connect_vsp(struct hv_device *device)
net_device = get_outbound_net_device(device);
if (!net_device)
return -ENODEV;
-   ndev = net_device->ndev;
 
init_packet = _device->channel_init_pkt;
 
@

[PATCH net-next v2 6/6] hv_netvsc: set nvdev link after populating chn_table

2016-05-13 Thread Vitaly Kuznetsov
Crash in netvsc_send() is observed when netvsc device is re-created on
mtu change/set channels. The crash is caused by dereferencing of NULL
channel pointer which comes from chn_table. The root cause is a mixture
of two facts:
- we set nvdev pointer in net_device_context in alloc_net_device()
  before we populate chn_table.
- we populate chn_table[0] only.

The issue could be papered over by checking channel != NULL in
netvsc_send() but populating the whole chn_table and writing the
nvdev pointer afterwards seems more appropriate.

Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
---
 drivers/net/hyperv/netvsc.c | 29 ++---
 1 file changed, 18 insertions(+), 11 deletions(-)

diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 96b3c32..719cb35 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -60,11 +60,9 @@ void netvsc_switch_datapath(struct net_device *ndev, bool vf)
 }
 
 
-static struct netvsc_device *alloc_net_device(struct hv_device *device)
+static struct netvsc_device *alloc_net_device(void)
 {
struct netvsc_device *net_device;
-   struct net_device *ndev = hv_get_drvdata(device);
-   struct net_device_context *net_device_ctx = netdev_priv(ndev);
 
net_device = kzalloc(sizeof(struct netvsc_device), GFP_KERNEL);
if (!net_device)
@@ -86,8 +84,6 @@ static struct netvsc_device *alloc_net_device(struct 
hv_device *device)
net_device->vf_netdev = NULL;
net_device->vf_inject = false;
 
-   net_device_ctx->nvdev = net_device;
-
return net_device;
 }
 
@@ -1240,20 +1236,19 @@ void netvsc_channel_cb(void *context)
  */
 int netvsc_device_add(struct hv_device *device, void *additional_info)
 {
-   int ret = 0;
+   int i, ret = 0;
int ring_size =
((struct netvsc_device_info *)additional_info)->ring_size;
struct netvsc_device *net_device;
-   struct net_device *ndev;
+   struct net_device *ndev = hv_get_drvdata(device);
+   struct net_device_context *net_device_ctx = netdev_priv(ndev);
 
-   net_device = alloc_net_device(device);
+   net_device = alloc_net_device();
if (!net_device)
return -ENOMEM;
 
net_device->ring_size = ring_size;
 
-   ndev = hv_get_drvdata(device);
-
/* Initialize the NetVSC channel extension */
init_completion(_device->channel_init_wait);
 
@@ -1272,7 +1267,19 @@ int netvsc_device_add(struct hv_device *device, void 
*additional_info)
/* Channel is opened */
pr_info("hv_netvsc channel opened successfully\n");
 
-   net_device->chn_table[0] = device->channel;
+   /* If we're reopening the device we may have multiple queues, fill the
+* chn_table with the default channel to use it before subchannels are
+* opened.
+*/
+   for (i = 0; i < VRSS_CHANNEL_MAX; i++)
+   net_device->chn_table[i] = device->channel;
+
+   /* Writing nvdev pointer unlocks netvsc_send(), make sure chn_table is
+* populated.
+*/
+   wmb();
+
+   net_device_ctx->nvdev = net_device;
 
/* Connect with the NetVsp */
ret = netvsc_connect_vsp(device);
-- 
2.5.5



Re: Aw: [PATCH 0/6] hv_netvsc: avoid races on mtu change/set channels

2016-05-12 Thread Vitaly Kuznetsov
"Lino Sanfilippo"  writes:

> Hi,
>
>>
>> MTU change and set channels operations are implemented as netvsc device
>> re-creation destroying internal structures (struct net_device stays). This
>> is really unfortunate but there is no support from Hyper-V host to do it
>> in a different way. Such re-creation is unsurprisingly racy, Haiyang
>> reported a crash when netvsc_change_mtu() is racing with
>> netvsc_link_change() but I was able to identify additional races upon
>> investigation. Both netvsc_set_channels() and netvsc_change_mtu() race
>> against:
>> 1) netvsc_link_change()
>> 2) netvsc_remove()
>> 3) netvsc_send()
>> 
>
> after having a look into this driver I got the impression that you are 
> working around an
> unfortunate implementation of the shutdown sequence in the remove function:
> If you do unregister_netdev() first instead of resource cleanup then neither 
> set_channels()
> nor change_mtu() can race with remove(). This is since after 
> unregister_netdev() returns
> the netdev is not longer available from userspace and thus neither 
> set_channels nor
> change_mtu can be called anymore (note that all of these functions are 
> protected by the 
> rtnl_lock).

It's worse: before the patch series we get 'struct hv_device' (as it is
called from core VMBus code and we simply cannot get to 'struct
net_device' we need without traveling through 'struct
netvsc_device'. This structure is removed and re-created by both
netvsc_set_channels() and netvsc_change_mtu().

>
> To avoid the race between netvsc_change_mtu()/netvsc_set_channels() and 
> netvsc_link_change()
> you have to stop the concerning worker thread (dwork) before you call 
> netvsc_close() and
> restart it once the device is up again.

Yes, but we also need to guarantee this won't get rescheduled so we need
a flag for that. The appropriate flag is start_remove but only after we
move it from a structure we remove.

-- 
  Vitaly


[PATCH 0/6] hv_netvsc: avoid races on mtu change/set channels

2016-05-12 Thread Vitaly Kuznetsov
MTU change and set channels operations are implemented as netvsc device
re-creation destroying internal structures (struct net_device stays). This
is really unfortunate but there is no support from Hyper-V host to do it
in a different way. Such re-creation is unsurprisingly racy, Haiyang
reported a crash when netvsc_change_mtu() is racing with
netvsc_link_change() but I was able to identify additional races upon
investigation. Both netvsc_set_channels() and netvsc_change_mtu() race
against:
1) netvsc_link_change()
2) netvsc_remove()
3) netvsc_send()

To solve these issues without introducing new locks some refactoring is
required. We need to get rid of very complex link graph in all the
internal structures and avoid traveling through structures which are being
removed.

Vitaly Kuznetsov (6):
  hv_netvsc: move start_remove flag to net_device_context
  hv_netvsc: use start_remove flag to protect netvsc_link_change()
  hv_netvsc: untangle the pointer mess
  hv_netvsc: get rid of struct net_device pointer in struct
netvsc_device
  hv_netvsc: synchronize netvsc_change_mtu()/netvsc_set_channels() with
netvsc_remove()
  hv_netvsc: set nvdev link after populating chn_table

 drivers/net/hyperv/hyperv_net.h   |  15 ++---
 drivers/net/hyperv/netvsc.c   | 130 --
 drivers/net/hyperv/netvsc_drv.c   | 104 +-
 drivers/net/hyperv/rndis_filter.c |  82 
 4 files changed, 164 insertions(+), 167 deletions(-)

-- 
2.5.5



[PATCH 5/6] hv_netvsc: synchronize netvsc_change_mtu()/netvsc_set_channels() with netvsc_remove()

2016-05-12 Thread Vitaly Kuznetsov
When netvsc device is removed during mtu change or channels setup we get
into troubles as both paths are trying to remove the device. Synchronize
them with start_remove flag and rtnl lock.

Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
---
 drivers/net/hyperv/netvsc_drv.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 393403a..71b9125 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -707,7 +707,7 @@ static int netvsc_set_channels(struct net_device *net,
int ret = 0;
bool recovering = false;
 
-   if (!nvdev || nvdev->destroy)
+   if (net_device_ctx->start_remove || !nvdev || nvdev->destroy)
return -ENODEV;
 
num_chn = nvdev->num_chn;
@@ -855,7 +855,7 @@ static int netvsc_change_mtu(struct net_device *ndev, int 
mtu)
u32 num_chn;
int ret = 0;
 
-   if (nvdev == NULL || nvdev->destroy)
+   if (ndevctx->start_remove || !nvdev || nvdev->destroy)
return -ENODEV;
 
if (nvdev->nvsp_version >= NVSP_PROTOCOL_VERSION_2)
@@ -1206,7 +1206,12 @@ static int netvsc_remove(struct hv_device *dev)
ndev_ctx = netdev_priv(net);
net_device = ndev_ctx->nvdev;
 
+   /* Avoid racing with netvsc_change_mtu()/netvsc_set_channels()
+* removing the device.
+*/
+   rtnl_lock();
ndev_ctx->start_remove = true;
+   rtnl_unlock();
 
cancel_delayed_work_sync(_ctx->dwork);
cancel_work_sync(_ctx->work);
-- 
2.5.5



[PATCH 3/6] hv_netvsc: untangle the pointer mess

2016-05-12 Thread Vitaly Kuznetsov
We have the following structures keeping netvsc adapter state:
- struct net_device
- struct net_device_context
- struct netvsc_device
- struct rndis_device
- struct hv_device
and there are pointers/dependencies between them:
- struct net_device_context is contained in struct net_device
- struct hv_device has driver_data pointer which points to
  'struct net_device' OR 'struct netvsc_device' depending on driver's
  state (!).
- struct net_device_context has a pointer to 'struct hv_device'.
- struct netvsc_device has pointers to 'struct hv_device' and
  'struct net_device_context'.
- struct rndis_device has a pointer to 'struct netvsc_device'.

Different functions get different structures as parameters and use these
pointers for traveling. The problem is (in addition to keeping in mind
this complex graph) that some of these structures (struct netvsc_device
and struct rndis_device) are being removed and re-created on mtu change
(as we implement it as re-creation of hyper-v device) so our travel using
these pointers is dangerous.

Simplify this to a the following:
- add struct netvsc_device pointer to struct net_device_context (which is
  a part of struct net_device and thus never disappears)
- remove struct hv_device and struct net_device_context pointers from
  struct netvsc_device
- replace pointer to 'struct netvsc_device' with pointer to
  'struct net_device'.
- always keep 'struct net_device' in hv_device driver_data.

We'll end up with the following 'circular' structure:

net_device:
 [net_device_context] -> netvsc_device -> rndis_device -> net_device
  -> hv_device -> net_device

On MTU change we'll be removing the 'netvsc_device -> rndis_device'
branch and re-creating it making the synchronization easier.

There is one additional redundant pointer left, it is struct net_device
link in struct netvsc_device, it is going to be removed in a separate
commit.

Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
---
 drivers/net/hyperv/hyperv_net.h   |  9 ++---
 drivers/net/hyperv/netvsc.c   | 78 +++
 drivers/net/hyperv/netvsc_drv.c   | 48 ++--
 drivers/net/hyperv/rndis_filter.c | 78 ---
 4 files changed, 93 insertions(+), 120 deletions(-)

diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index 3f08132..6edc55f 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -158,7 +158,7 @@ enum rndis_device_state {
 };
 
 struct rndis_device {
-   struct netvsc_device *net_dev;
+   struct net_device *ndev;
 
enum rndis_device_state state;
bool link_state;
@@ -645,6 +645,8 @@ struct netvsc_reconfig {
 struct net_device_context {
/* point back to our device context */
struct hv_device *device_ctx;
+   /* netvsc_device */
+   struct netvsc_device *nvdev;
/* reconfigure work */
struct delayed_work dwork;
/* last reconfig time */
@@ -670,8 +672,6 @@ struct net_device_context {
 
 /* Per netvsc device */
 struct netvsc_device {
-   struct hv_device *dev;
-
u32 nvsp_version;
 
atomic_t num_outstanding_sends;
@@ -725,9 +725,6 @@ struct netvsc_device {
u32 max_pkt; /* max number of pkt in one send, e.g. 8 */
u32 pkt_align; /* alignment bytes, e.g. 8 */
 
-   /* The net device context */
-   struct net_device_context *nd_ctx;
-
/* 1: allocated, serial number is valid. 0: not allocated */
u32 vf_alloc;
/* Serial number of the VF to team with */
diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index ba6b304..cc0af6a 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -38,6 +38,7 @@ static struct netvsc_device *alloc_net_device(struct 
hv_device *device)
 {
struct netvsc_device *net_device;
struct net_device *ndev = hv_get_drvdata(device);
+   struct net_device_context *net_device_ctx = netdev_priv(ndev);
 
net_device = kzalloc(sizeof(struct netvsc_device), GFP_KERNEL);
if (!net_device)
@@ -51,12 +52,12 @@ static struct netvsc_device *alloc_net_device(struct 
hv_device *device)
 
init_waitqueue_head(_device->wait_drain);
net_device->destroy = false;
-   net_device->dev = device;
net_device->ndev = ndev;
net_device->max_pkt = RNDIS_MAX_PKT_DEFAULT;
net_device->pkt_align = RNDIS_PKT_ALIGN_DEFAULT;
 
-   hv_set_drvdata(device, net_device);
+   net_device_ctx->nvdev = net_device;
+
return net_device;
 }
 
@@ -68,9 +69,10 @@ static void free_netvsc_device(struct netvsc_device *nvdev)
 
 static struct netvsc_device *get_outbound_net_device(struct hv_device *device)
 {
-   struct netvsc_device *net_device;
+   struct net_device *ndev = hv_get_drvdata(device);
+   struct net_device_context *net_device_ctx = netdev_priv(nd

[PATCH 2/6] hv_netvsc: use start_remove flag to protect netvsc_link_change()

2016-05-12 Thread Vitaly Kuznetsov
netvsc_link_change() can race with netvsc_change_mtu() or
netvsc_set_channels() as these functions destroy struct netvsc_device and
rndis filter. Use start_remove flag for syncronization. As
netvsc_change_mtu()/netvsc_set_channels() are called with rtnl lock held
we need to take it before checking start_remove value in
netvsc_link_change().

Reported-by: Haiyang Zhang <haiya...@microsoft.com>
Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
---
 drivers/net/hyperv/netvsc_drv.c | 21 +
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 48dc81c..9a46b3d 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -787,6 +787,8 @@ static int netvsc_set_channels(struct net_device *net,
  out:
netvsc_open(net);
net_device_ctx->start_remove = false;
+   /* We may have missed link change notifications */
+   schedule_delayed_work(_device_ctx->dwork, 0);
 
return ret;
 
@@ -895,6 +897,9 @@ out:
netvsc_open(ndev);
ndevctx->start_remove = false;
 
+   /* We may have missed link change notifications */
+   schedule_delayed_work(>dwork, 0);
+
return ret;
 }
 
@@ -1015,6 +1020,11 @@ static void netvsc_link_change(struct work_struct *w)
unsigned long flags, next_reconfig, delay;
 
ndev_ctx = container_of(w, struct net_device_context, dwork.work);
+
+   rtnl_lock();
+   if (ndev_ctx->start_remove)
+   goto out_unlock;
+
net_device = hv_get_drvdata(ndev_ctx->device_ctx);
rdev = net_device->extension;
net = net_device->ndev;
@@ -1028,7 +1038,7 @@ static void netvsc_link_change(struct work_struct *w)
delay = next_reconfig - jiffies;
delay = delay < LINKCHANGE_INT ? delay : LINKCHANGE_INT;
schedule_delayed_work(_ctx->dwork, delay);
-   return;
+   goto out_unlock;
}
ndev_ctx->last_reconfig = jiffies;
 
@@ -1042,9 +1052,7 @@ static void netvsc_link_change(struct work_struct *w)
spin_unlock_irqrestore(_ctx->lock, flags);
 
if (!event)
-   return;
-
-   rtnl_lock();
+   goto out_unlock;
 
switch (event->event) {
/* Only the following events are possible due to the check in
@@ -1093,6 +1101,11 @@ static void netvsc_link_change(struct work_struct *w)
 */
if (reschedule)
schedule_delayed_work(_ctx->dwork, LINKCHANGE_INT);
+
+   return;
+
+out_unlock:
+   rtnl_unlock();
 }
 
 static void netvsc_free_netdev(struct net_device *netdev)
-- 
2.5.5



  1   2   >