Re: [PATCH 00/40] Staging: hv: Driver cleanup

2011-06-30 Thread Stephen Hemminger
On Fri, 1 Jul 2011 00:19:38 +
KY Srinivasan  wrote:

> 
> 
> > -Original Message-
> > From: Stephen Hemminger [mailto:shemmin...@vyatta.com]
> > Sent: Thursday, June 30, 2011 7:48 PM
> > To: KY Srinivasan
> > Cc: Christoph Hellwig; de...@linuxdriverproject.org; gre...@suse.de; linux-
> > ker...@vger.kernel.org; virtualizat...@lists.osdl.org
> > Subject: Re: [PATCH 00/40] Staging: hv: Driver cleanup
> > 
> > On Thu, 30 Jun 2011 23:32:34 +
> > KY Srinivasan  wrote:
> > 
> > >
> > > > -Original Message-
> > > > From: Christoph Hellwig [mailto:h...@infradead.org]
> > > > Sent: Thursday, June 30, 2011 3:34 PM
> > > > To: KY Srinivasan
> > > > Cc: gre...@suse.de; linux-ker...@vger.kernel.org;
> > > > de...@linuxdriverproject.org; virtualizat...@lists.osdl.org
> > > > Subject: Re: [PATCH 00/40] Staging: hv: Driver cleanup
> > > >
> > > > On Wed, Jun 29, 2011 at 07:38:21AM -0700, K. Y. Srinivasan wrote:
> > > > > Further cleanup of the hv drivers:
> > > > >
> > > > >   1) Cleanup the reference counting mess for both stor and net 
> > > > > devices.
> > > >
> > > > I really don't understand the need for reference counting on the storage
> > > > side, especially now that you only have a SCSI driver.  The SCSI
> > > > midlayer does proper counting on its objects (Scsi_Host, scsi_device,
> > > > scsi_cmnd), so you'll get that for free given that SCSI drivers just
> > > > piggyback on the midlayer lifetime rules.
> > > >
> > > > For now your patches should probably go in as-is, but mid-term you
> > > > should be able to completely remove that code on the storage side.
> > > >
> > >
> > > Greg,
> > >
> > > I am thinking of going back to my original implementation where I had
> > > one scsi host per IDE device. This will certainly simplify the code.
> > > Let me know what you think. If you agree with this approach, please
> > > drop this patch-set, I will send you a new set of patches.
> > 
> > I think the ref counting on network devices is also unneeded
> > as long as the unregister logic handles RCU correctly. The network layer
> > calls the driver unregister routine after all packets are gone.
> On the networking side, what about incoming packets that may be racing
> with the device destruction. The current ref counting scheme deals with
> that case.

Not sure how the HV driver tells the hypervisor to stop sending packets. But
the destructor is not called until after all other CPUs are done processing
packets from that device.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [RFC] kvm tools: Implement multiple VQ for virtio-net

2011-11-22 Thread Stephen Hemminger
I have been playing with userspace-rcu which has a number of neat
lockless routines for queuing and hashing. But there aren't kernel versions
and several of them may require cmpxchg to work.



Re: [PATCH] macvtap: Fix macvtap_get_queue to use rxhash first

2011-11-28 Thread Stephen Hemminger
On Mon, 28 Nov 2011 12:25:15 +0800
Jason Wang  wrote:

> I'm using ixgbe for testing also, for host, its driver seems provide irq 
> affinity hint, so no binding or irqbalance is needed.

The hint is for irqbalance to use. You still need to set the affinity
manually or use irqbalance.


[PATCH] vhost-net: add module alias

2012-01-10 Thread Stephen Hemminger
By adding a module alias, programs (or users) won't have to explicitly
call modprobe. Vhost-net will always be available if built into the kernel.
It does require assigning a permanent minor number for depmod to work.
Choose one next to TUN since this driver is related to it.

Also, use C99 style initialization.

Signed-off-by: Stephen Hemminger 

---
 drivers/vhost/net.c|8 +---
 include/linux/miscdevice.h |1 +
 2 files changed, 6 insertions(+), 3 deletions(-)

--- a/drivers/vhost/net.c   2012-01-10 10:56:58.883179194 -0800
+++ b/drivers/vhost/net.c   2012-01-10 19:48:23.650225892 -0800
@@ -856,9 +856,9 @@ static const struct file_operations vhos
 };
 
 static struct miscdevice vhost_net_misc = {
-   MISC_DYNAMIC_MINOR,
-   "vhost-net",
-   &vhost_net_fops,
+   .minor = VHOST_NET_MINOR,
+   .name = "vhost-net",
+   .fops = &vhost_net_fops,
 };
 
 static int vhost_net_init(void)
@@ -879,3 +879,5 @@ MODULE_VERSION("0.0.1");
 MODULE_LICENSE("GPL v2");
 MODULE_AUTHOR("Michael S. Tsirkin");
 MODULE_DESCRIPTION("Host kernel accelerator for virtio net");
+MODULE_ALIAS_MISCDEV(VHOST_NET_MINOR);
+MODULE_ALIAS("devname:vhost-net");
--- a/include/linux/miscdevice.h2012-01-10 10:56:59.779189436 -0800
+++ b/include/linux/miscdevice.h2012-01-10 19:49:56.091748210 -0800
@@ -31,6 +31,7 @@
 #define I2O_MINOR  166
 #define MICROCODE_MINOR184
 #define TUN_MINOR  200
+#define VHOST_NET_MINOR201
 #define MWAVE_MINOR219 /* ACP/Mwave Modem */
 #define MPT_MINOR  220
 #define MPT2SAS_MINOR  221



Re: [PATCH] vhost-net: add module alias

2012-01-11 Thread Stephen Hemminger
On Wed, 11 Jan 2012 15:43:42 +0800
Amos Kong  wrote:

> On Wed, Jan 11, 2012 at 12:54 PM, Stephen Hemminger
> wrote:
> 
> > By adding a module alias, programs (or users) won't have to explicitly
> > call modprobe. Vhost-net will always be available if built into the kernel.
> > It does require assigning a permanent minor number for depmod to work.
> > Choose one next to TUN since this driver is related to it.
> >
> > Also, use C99 style initialization.
> >
> > Signed-off-by: Stephen Hemminger 
> >
> > ---
> >  drivers/vhost/net.c|8 +---
> >  include/linux/miscdevice.h |1 +
> >  2 files changed, 6 insertions(+), 3 deletions(-)
> >
:
> /*
>  *  These allocations are managed by dev...@lanana.org. If you use an
>  *  entry that is not in assigned your entry may well be moved and
>  *  reassigned, or set dynamic if a fixed value is not justified.
>  */

I didn't think that mailing address was used any more. Like many places
in the kernel, the comment looked like a historical leftover.




Re: [PATCH] vhost-net: add module alias

2012-01-11 Thread Stephen Hemminger
On Wed, 11 Jan 2012 11:07:47 +0400
Michael Tokarev  wrote:

> On 11.01.2012 08:54, Stephen Hemminger wrote:
> > By adding a module alias, programs (or users) won't have to explicitly
> > call modprobe. Vhost-net will always be available if built into the kernel.
> > It does require assigning a permanent minor number for depmod to work.
> > Choose one next to TUN since this driver is related to it.
> 
> Why do you think a statically-allocated device number will do any good
> at all?  Static /dev is gone almost completely, at least on the systems
> where whole virt stuff makes any sense, so you don't have pre-created
> vhost-net device anymore, and hence this allocation makes no sense.
> Just IMHO anyway.

The statically allocated device number is required for the udev/module
autoloading to work. Probably the udev infrastructure needs a consistent
number to hang off of.

It looks like:
  * driver adds MODULE_ALIAS() for devname and character device
  * depmod scans modules and creates modules.devname (in /lib/modules)
  * udev uses modules.devname to autoload the module

$ /sbin/modinfo vhost_net
filename:   /lib/modules/3.2.0-net+/kernel/drivers/vhost/vhost_net.ko
alias:  devname:vhost-net
alias:  char-major-10-201
description:Host kernel accelerator for virtio net
...

See also: https://lkml.org/lkml/2010/5/21/134
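
The char-major-10-201 alias in the modinfo output above is, as far as I
understand it, just the misc major and the fixed minor pasted together by
the preprocessor. The following is a minimal userspace sketch of that
string construction, not the kernel macro itself: STRINGIFY, MISCDEV_ALIAS
and alias_matches() are hypothetical stand-ins, and 201 is the minor from
the first version of the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Userspace stand-ins for a two-level stringify macro (hypothetical names). */
#define STRINGIFY_1(x) #x
#define STRINGIFY(x)   STRINGIFY_1(x)

#define MISC_MAJOR       10   /* major number shared by misc devices */
#define VHOST_NET_MINOR  201  /* fixed minor proposed in the first patch */

/* Builds the same shape of string that shows up as "alias:" in modinfo. */
#define MISCDEV_ALIAS(minor) \
	"char-major-" STRINGIFY(MISC_MAJOR) "-" STRINGIFY(minor)

static const char vhost_net_alias[] = MISCDEV_ALIAS(VHOST_NET_MINOR);

/* Return 1 if alias names the character device major:minor, else 0. */
static int alias_matches(const char *alias, unsigned int major,
			 unsigned int minor)
{
	char expected[64];
	int n = snprintf(expected, sizeof(expected), "char-major-%u-%u",
			 major, minor);

	/* The formatted string must fit in the local buffer. */
	assert(n > 0 && (size_t)n < sizeof(expected));
	return strcmp(alias, expected) == 0;
}
```

The point of the sketch is that the alias can only be computed at build
time if the minor is a compile-time constant, which a dynamic minor is not.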





[PATCH] vhost-net: add module alias (v2)

2012-01-11 Thread Stephen Hemminger
By adding the correct module alias, programs won't have to explicitly
call modprobe. Vhost-net will always be available if built into the kernel.
It does require assigning a permanent minor number for depmod to work.
Choose one next to TUN since this driver is related to it.

Also, use C99 style initialization.

Signed-off-by: Stephen Hemminger 

---
v2 - document minor number and make sure to not overlap

 Documentation/devices.txt  |2 ++
 drivers/vhost/net.c|8 +---
 include/linux/miscdevice.h |1 +
 3 files changed, 8 insertions(+), 3 deletions(-)

--- a/drivers/vhost/net.c   2012-01-10 10:56:58.883179194 -0800
+++ b/drivers/vhost/net.c   2012-01-10 19:48:23.650225892 -0800
@@ -856,9 +856,9 @@ static const struct file_operations vhos
 };
 
 static struct miscdevice vhost_net_misc = {
-   MISC_DYNAMIC_MINOR,
-   "vhost-net",
-   &vhost_net_fops,
+   .minor = VHOST_NET_MINOR,
+   .name = "vhost-net",
+   .fops = &vhost_net_fops,
 };
 
 static int vhost_net_init(void)
@@ -879,3 +879,5 @@ MODULE_VERSION("0.0.1");
 MODULE_LICENSE("GPL v2");
 MODULE_AUTHOR("Michael S. Tsirkin");
 MODULE_DESCRIPTION("Host kernel accelerator for virtio net");
+MODULE_ALIAS_MISCDEV(VHOST_NET_MINOR);
+MODULE_ALIAS("devname:vhost-net");
--- a/include/linux/miscdevice.h2012-01-10 10:56:59.779189436 -0800
+++ b/include/linux/miscdevice.h2012-01-11 09:13:20.803694316 -0800
@@ -42,6 +42,7 @@
 #define AUTOFS_MINOR   235
 #define MAPPER_CTRL_MINOR  236
 #define LOOP_CTRL_MINOR237
+#define VHOST_NET_MINOR238
 #define MISC_DYNAMIC_MINOR 255
 
 struct device;
--- a/Documentation/devices.txt 2012-01-10 10:56:53.399116518 -0800
+++ b/Documentation/devices.txt 2012-01-11 09:12:49.251197653 -0800
@@ -447,6 +447,8 @@ Your cooperation is appreciated.
234 = /dev/btrfs-controlBtrfs control device
235 = /dev/autofs   Autofs control device
236 = /dev/mapper/control   Device-Mapper control device
+   237 = /dev/vhost-netHost kernel accelerator for virtio net
+
240-254 Reserved for local use
255 Reserved for MISC_DYNAMIC_MINOR
 




[PATCH] vhost-net: add module alias (v2.1)

2012-01-11 Thread Stephen Hemminger
Subject: vhost-net: add module alias (v2.1)

By adding some module aliases, programs (or users) won't have to explicitly
call modprobe. Vhost-net will always be available if built into the kernel.
It does require assigning a permanent minor number for depmod to work.

Also:
  - use C99 style initialization.
  - add missing entry in documentation for loop-control

Signed-off-by: Stephen Hemminger 

---
2.1 - add missing documentation for loop control as well

 Documentation/devices.txt  |3 +++
 drivers/vhost/net.c|8 +---
 include/linux/miscdevice.h |1 +
 3 files changed, 9 insertions(+), 3 deletions(-)

--- a/drivers/vhost/net.c   2012-01-10 10:56:58.883179194 -0800
+++ b/drivers/vhost/net.c   2012-01-10 19:48:23.650225892 -0800
@@ -856,9 +856,9 @@ static const struct file_operations vhos
 };
 
 static struct miscdevice vhost_net_misc = {
-   MISC_DYNAMIC_MINOR,
-   "vhost-net",
-   &vhost_net_fops,
+   .minor = VHOST_NET_MINOR,
+   .name = "vhost-net",
+   .fops = &vhost_net_fops,
 };
 
 static int vhost_net_init(void)
@@ -879,3 +879,5 @@ MODULE_VERSION("0.0.1");
 MODULE_LICENSE("GPL v2");
 MODULE_AUTHOR("Michael S. Tsirkin");
 MODULE_DESCRIPTION("Host kernel accelerator for virtio net");
+MODULE_ALIAS_MISCDEV(VHOST_NET_MINOR);
+MODULE_ALIAS("devname:vhost-net");
--- a/include/linux/miscdevice.h2012-01-10 10:56:59.779189436 -0800
+++ b/include/linux/miscdevice.h2012-01-11 09:13:20.803694316 -0800
@@ -42,6 +42,7 @@
 #define AUTOFS_MINOR   235
 #define MAPPER_CTRL_MINOR  236
 #define LOOP_CTRL_MINOR237
+#define VHOST_NET_MINOR238
 #define MISC_DYNAMIC_MINOR 255
 
 struct device;
--- a/Documentation/devices.txt 2012-01-10 10:56:53.399116518 -0800
+++ b/Documentation/devices.txt 2012-01-11 13:17:07.882113340 -0800
@@ -447,6 +447,9 @@ Your cooperation is appreciated.
234 = /dev/btrfs-controlBtrfs control device
235 = /dev/autofs   Autofs control device
236 = /dev/mapper/control   Device-Mapper control device
+   237 = /dev/loop-control Loopback control device
+   238 = /dev/vhost-netHost kernel accelerator for virtio net
+
240-254 Reserved for local use
255 Reserved for MISC_DYNAMIC_MINOR
 




Re: [PATCH] vhost-net: add module alias (v2.1)

2012-01-16 Thread Stephen Hemminger
On Mon, 16 Jan 2012 12:26:45 +
Alan Cox  wrote:

> > > ACKs, NACKs?  What is happening here?
> > 
> > I would like an Ack from Alan Cox who switched vhost-net
> > to a dynamic minor in the first place, in commit
> > 79907d89c397b8bc2e05b347ec94e928ea919d33.
> 
> Sorry dev...@lanana.org isn't yet back from the kernel hack incident.
> 
> I don't read netdev so someone needs to summarise the issue and send me
> a copy of the patch to look at.
> 
> Alan

Subject: vhost-net: add module alias (v2.1)

By adding some module aliases, programs (or users) won't have to explicitly
call modprobe. Vhost-net will always be available if built into the kernel.
It does require assigning a permanent minor number for depmod to work.

Also:
  - use C99 style initialization.
  - add missing entry in documentation for loop-control

Signed-off-by: Stephen Hemminger 

---
2.1 - add missing documentation for loop control as well

 Documentation/devices.txt  |3 +++
 drivers/vhost/net.c|8 +---
 include/linux/miscdevice.h |1 +
 3 files changed, 9 insertions(+), 3 deletions(-)

--- a/drivers/vhost/net.c   2012-01-12 14:14:25.681815487 -0800
+++ b/drivers/vhost/net.c   2012-01-12 18:09:56.810680816 -0800
@@ -856,9 +856,9 @@ static const struct file_operations vhos
 };
 
 static struct miscdevice vhost_net_misc = {
-   MISC_DYNAMIC_MINOR,
-   "vhost-net",
-   &vhost_net_fops,
+   .minor = VHOST_NET_MINOR,
+   .name = "vhost-net",
+   .fops = &vhost_net_fops,
 };
 
 static int vhost_net_init(void)
@@ -879,3 +879,5 @@ MODULE_VERSION("0.0.1");
 MODULE_LICENSE("GPL v2");
 MODULE_AUTHOR("Michael S. Tsirkin");
 MODULE_DESCRIPTION("Host kernel accelerator for virtio net");
+MODULE_ALIAS_MISCDEV(VHOST_NET_MINOR);
+MODULE_ALIAS("devname:vhost-net");
--- a/include/linux/miscdevice.h2012-01-12 14:14:25.725815981 -0800
+++ b/include/linux/miscdevice.h2012-01-12 18:09:56.810680816 -0800
@@ -42,6 +42,7 @@
 #define AUTOFS_MINOR   235
 #define MAPPER_CTRL_MINOR  236
 #define LOOP_CTRL_MINOR237
+#define VHOST_NET_MINOR238
 #define MISC_DYNAMIC_MINOR 255
 
 struct device;
--- a/Documentation/devices.txt 2012-01-12 14:14:25.701815712 -0800
+++ b/Documentation/devices.txt 2012-01-12 18:09:56.814680860 -0800
@@ -447,6 +447,9 @@ Your cooperation is appreciated.
234 = /dev/btrfs-controlBtrfs control device
235 = /dev/autofs   Autofs control device
236 = /dev/mapper/control   Device-Mapper control device
+   237 = /dev/loop-control Loopback control device
+   238 = /dev/vhost-netHost kernel accelerator for virtio net
+
240-254 Reserved for local use
255 Reserved for MISC_DYNAMIC_MINOR
 


lguest documentation

2012-01-25 Thread Stephen Hemminger
After moving lguest to tools the documentation index was not updated.

Documentation/virtual/00-INDEX still refers to lguest/ when in fact
the documentation is now over in tools/lguest.  Maybe make a symlink,
or change the reference in the index?


Re: [vmw_vmci RFC 01/11] Apply VMCI context code

2012-05-16 Thread Stephen Hemminger
On Tue, 15 May 2012 08:06:58 -0700
"Andrew Stiegmann (stieg)"  wrote:

> Context code maintains state for vmci and allows the driver
> to communicate with multiple VMs.
> 
> Signed-off-by: Andrew Stiegmann (stieg) 

Running checkpatch reveals the usual noise, and the following that
should be addressed.

ERROR: do not use C99 // comments
#272: FILE: drivers/misc/vmw_vmci/vmci_context.c:183:
+static bool ctx_exists_locked(uint32_t cid)// IN

ERROR: "foo * bar" should be "foo *bar"
#304: FILE: drivers/misc/vmw_vmci/vmci_context.c:215:
+ uid_t * user, struct vmci_ctx **outContext)

I don't mind the C99 style comments, but the // IN convention
is pretty useless and should be removed.


Re: [PATCH] virtio-net: fix a race on 32bit arches

2012-06-06 Thread Stephen Hemminger
On Wed, 6 Jun 2012 17:49:42 +0300
"Michael S. Tsirkin"  wrote:

> Sounds good, but I have a question: this relies on counters
> being atomic on 64 bit.
> Would not it be better to always use a seqlock even on 64 bit?
> This way counters would actually be correct and in sync.
> As it is if we want e.g. average packet size,
> we can not rely e.g. on it being bytes/packets.

This has not been a requirement on real physical devices; therefore
the added overhead is not really justified.

Many network cards use counters in hardware to count packets/bytes
and there is no expectation of atomic access there.
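
For reference, the "always use a seqlock" option being discussed would look
roughly like the following. This is a userspace sketch with C11 atomics
under stated assumptions (a single writer per stats block), not the
virtio-net code; struct pkt_stats, stats_add() and stats_read() are
hypothetical names. The reader retries until it sees the same even sequence
number before and after copying, so packets and bytes always come from the
same update.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical per-device counters guarded by a sequence counter. */
struct pkt_stats {
	atomic_uint seq;           /* even: stable, odd: update in progress */
	_Atomic uint64_t packets;
	_Atomic uint64_t bytes;
};

/* Writer side; assumes exactly one writer per stats block. */
static void stats_add(struct pkt_stats *s, uint64_t len)
{
	unsigned int seq = atomic_load_explicit(&s->seq, memory_order_relaxed);
	uint64_t p = atomic_load_explicit(&s->packets, memory_order_relaxed);
	uint64_t b = atomic_load_explicit(&s->bytes, memory_order_relaxed);

	assert((seq & 1) == 0);    /* no update may already be in flight */

	/* Mark the update as started before touching the counters. */
	atomic_store_explicit(&s->seq, seq + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);

	atomic_store_explicit(&s->packets, p + 1, memory_order_relaxed);
	atomic_store_explicit(&s->bytes, b + len, memory_order_relaxed);

	/* Publish the finished update. */
	atomic_store_explicit(&s->seq, seq + 2, memory_order_release);
}

/* Reader side; loops until it gets a snapshot from one stable period. */
static void stats_read(struct pkt_stats *s, uint64_t *packets,
		       uint64_t *bytes)
{
	unsigned int start, end;

	do {
		start = atomic_load_explicit(&s->seq, memory_order_acquire);
		*packets = atomic_load_explicit(&s->packets,
						memory_order_relaxed);
		*bytes = atomic_load_explicit(&s->bytes, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
		end = atomic_load_explicit(&s->seq, memory_order_relaxed);
	} while ((start & 1) || start != end);
}
```

The extra sequence stores and fences on every packet are the overhead in
question; hardware counters give no such pairing guarantee either.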


Re: [net-next RFC V5 5/5] virtio_net: support negotiating the number of queues through ctrl vq

2012-07-05 Thread Stephen Hemminger
On Fri, 06 Jul 2012 11:20:06 +0800
Jason Wang  wrote:

> On 07/05/2012 08:51 PM, Sasha Levin wrote:
> > On Thu, 2012-07-05 at 18:29 +0800, Jason Wang wrote:
> >> @@ -1387,6 +1404,10 @@ static int virtnet_probe(struct virtio_device *vdev)
> >>  if (virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ))
> >>  vi->has_cvq = true;
> >>
> >> +   /* Use single tx/rx queue pair as default */
> >> +   vi->num_queue_pairs = 1;
> >> +   vi->total_queue_pairs = num_queue_pairs;
> > The code is using this "default" even if the amount of queue pairs it
> > wants was specified during initialization. This basically limits any
> > device to use 1 pair when starting up.
> >
> 
> Yes, currently the virtio-net driver would use 1 txq/rxq by default
> since multiqueue may not outperform in all kinds of workload. So it's 
> better to keep it as default and let user enable multiqueue by ethtool -L.
> 

I would prefer that the driver sized the number of queues based on the
number of online CPUs. That is what real hardware does. What kind of workload
are you doing? If it is some DBMS benchmark then maybe the issue is that
some CPUs need to be reserved.
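
A rough sketch of the sizing rule I have in mind, written as userspace C so
it can be tried outside the driver. pick_queue_pairs() and
default_queue_pairs() are hypothetical helpers, and device_max stands for
whatever maximum the device advertises; in the kernel the CPU count would
come from num_online_cpus() rather than sysconf().

```c
#include <assert.h>
#include <unistd.h>

/*
 * Choose the default number of tx/rx queue pairs: one per online CPU,
 * capped at what the device supports, and never fewer than one.
 */
static unsigned int pick_queue_pairs(long online_cpus, unsigned int device_max)
{
	assert(device_max >= 1);

	if (online_cpus < 1)
		return 1;               /* CPU count unknown: fall back to one pair */
	if ((unsigned long)online_cpus > device_max)
		return device_max;      /* more CPUs than queues: use every queue */
	return (unsigned int)online_cpus;
}

/* Same rule, fed with the number of CPUs currently online. */
static unsigned int default_queue_pairs(unsigned int device_max)
{
	return pick_queue_pairs(sysconf(_SC_NPROCESSORS_ONLN), device_max);
}
```

With this default a user who wants fewer queues can still reduce the count
with ethtool -L afterwards.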


Re: [Question]About KVM network zero-copy feature!

2012-08-11 Thread Stephen Hemminger
On Fri, 10 Aug 2012 11:34:32 +0800
"Peter Huang(Peng)"  wrote:

> Hi,All
> 
> I searched from git-log, and found that until now we have vhost TX zero-copy
> experiment feature, how about RX zero-copy?
> 
> For XEN, net-back also only has TX zero-copy, Is there any reason that RX
> zero-copy still not implemented?
> 
There is no guarantee that the packet will ever be read by the receiver. This
means zero-copy could create memory back pressure stalls.


Re: [PATCH net-next V3 3/3] tun: rx batching

2016-12-31 Thread Stephen Hemminger
On Fri, 30 Dec 2016 13:20:51 +0800
Jason Wang  wrote:

> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index cd8e02c..a268ed9 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -75,6 +75,10 @@
>  
>  #include 
>  
> +static int rx_batched;
> +module_param(rx_batched, int, 0444);
> +MODULE_PARM_DESC(rx_batched, "Number of packets batched in rx");
> +
>  /* Uncomment to enable debugging */

I like the concept of rx batching. But controlling it via a module parameter
is one of the worst API choices.  Ethtool would be better to use because that is
how other network devices control batching.

If you do ethtool, you could even extend it to have a number of packets
and max latency value.
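
To illustrate the "number of packets and max latency" idea, here is a small
sketch of the flush decision, loosely modeled on the rx-frames / rx-usecs
pair that ethtool -C exposes for interrupt coalescing. It is plain userspace
C, not tun code; struct rx_batch_cfg, struct rx_batch and rx_batch_add() are
hypothetical names, and the caller is assumed to supply a monotonic
timestamp in microseconds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Per-device tunables, the kind of values ethtool could carry. */
struct rx_batch_cfg {
	uint32_t max_frames;  /* flush once this many packets are queued */
	uint32_t max_usecs;   /* flush once the oldest packet is this old */
};

/* Running state of the current batch. */
struct rx_batch {
	uint32_t queued;      /* packets held in the current batch */
	uint64_t first_usec;  /* arrival time of the oldest queued packet */
};

/*
 * Account for one received packet. Returns true when the batch should be
 * flushed to the stack now, and resets the batch in that case.
 */
static bool rx_batch_add(struct rx_batch *b, const struct rx_batch_cfg *cfg,
			 uint64_t now_usec)
{
	/* Timestamps must not go backwards within a batch. */
	assert(b->queued == 0 || now_usec >= b->first_usec);

	if (b->queued == 0)
		b->first_usec = now_usec;
	b->queued++;

	if (b->queued >= cfg->max_frames ||
	    now_usec - b->first_usec >= cfg->max_usecs) {
		b->queued = 0;
		return true;
	}
	return false;
}
```

Note that max_usecs of 0 flushes on every packet, which gives the
unbatched behavior without needing a separate on/off switch.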


[PATCH net-next] net: make ndo_get_stats64 a void function

2017-01-05 Thread Stephen Hemminger
The network device operation for reading statistics is only called
in one place, and it ignores the return value. Having a structure
return value is potentially confusing because some future driver could
incorrectly assume that the return value was used.

Fix all drivers with ndo_get_stats64 to have a void function.

Signed-off-by: Stephen Hemminger 
---
 drivers/net/bonding/bond_main.c  | 10 --
 drivers/net/dummy.c  |  5 ++---
 drivers/net/ethernet/alacritech/slicoss.c|  6 ++
 drivers/net/ethernet/amazon/ena/ena_netdev.c | 10 --
 drivers/net/ethernet/amd/xgbe/xgbe-drv.c |  6 ++
 drivers/net/ethernet/apm/xgene/xgene_enet_main.c |  4 +---
 drivers/net/ethernet/atheros/alx/main.c  |  6 ++
 drivers/net/ethernet/broadcom/b44.c  |  5 ++---
 drivers/net/ethernet/broadcom/bnx2.c |  3 +--
 drivers/net/ethernet/broadcom/bnxt/bnxt.c|  6 ++
 drivers/net/ethernet/broadcom/tg3.c  |  8 +++-
 drivers/net/ethernet/brocade/bna/bnad.c  |  6 ++
 drivers/net/ethernet/calxeda/xgmac.c |  5 ++---
 drivers/net/ethernet/cavium/thunder/nicvf_main.c |  5 ++---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c  |  7 +++
 drivers/net/ethernet/cisco/enic/enic_main.c  |  8 +++-
 drivers/net/ethernet/ec_bhf.c|  4 +---
 drivers/net/ethernet/emulex/benet/be_main.c  |  5 ++---
 drivers/net/ethernet/freescale/dpaa/dpaa_eth.c   |  6 ++
 drivers/net/ethernet/hisilicon/hns/hns_enet.c|  6 ++
 drivers/net/ethernet/ibm/ehea/ehea_main.c|  5 ++---
 drivers/net/ethernet/intel/e1000e/e1000.h|  4 ++--
 drivers/net/ethernet/intel/e1000e/netdev.c   |  5 ++---
 drivers/net/ethernet/intel/fm10k/fm10k_netdev.c  |  6 ++
 drivers/net/ethernet/intel/i40e/i40e.h   |  5 ++---
 drivers/net/ethernet/intel/i40e/i40e_main.c  | 18 ++
 drivers/net/ethernet/intel/igb/igb_main.c| 10 --
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c|  7 ---
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c|  6 ++
 drivers/net/ethernet/marvell/mvneta.c|  4 +---
 drivers/net/ethernet/marvell/mvpp2.c |  4 +---
 drivers/net/ethernet/marvell/sky2.c  |  6 ++
 drivers/net/ethernet/mediatek/mtk_eth_soc.c  |  6 ++
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c   |  4 +---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c|  3 +--
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c |  3 +--
 drivers/net/ethernet/mellanox/mlxsw/spectrum.c   |  4 +---
 drivers/net/ethernet/mellanox/mlxsw/switchx2.c   |  3 +--
 drivers/net/ethernet/myricom/myri10ge/myri10ge.c |  9 -
 drivers/net/ethernet/neterion/vxge/vxge-main.c   |  4 +---
 drivers/net/ethernet/netronome/nfp/nfp_net_common.c  |  6 ++
 drivers/net/ethernet/nvidia/forcedeth.c  |  4 +---
 drivers/net/ethernet/qlogic/netxen/netxen_nic_main.c | 10 --
 drivers/net/ethernet/qlogic/qede/qede_main.c |  7 ++-
 drivers/net/ethernet/qualcomm/emac/emac.c|  6 ++
 drivers/net/ethernet/realtek/8139too.c   |  9 +++--
 drivers/net/ethernet/realtek/r8169.c |  4 +---
 drivers/net/ethernet/samsung/sxgbe/sxgbe_main.c  |  8 ++--
 drivers/net/ethernet/sfc/efx.c   |  6 ++
 drivers/net/ethernet/sfc/falcon/efx.c|  6 ++
 drivers/net/ethernet/sun/niu.c   |  6 ++
 drivers/net/ethernet/synopsys/dwc_eth_qos.c  |  4 +---
 drivers/net/ethernet/tile/tilepro.c  |  4 ++--
 drivers/net/ethernet/via/via-rhine.c |  8 +++-
 drivers/net/fjes/fjes_main.c |  7 ++-
 drivers/net/hyperv/netvsc_drv.c  |  6 ++
 drivers/net/ifb.c|  6 ++
 drivers/net/ipvlan/ipvlan_main.c |  5 ++---
 drivers/net/loopback.c   |  5 ++---
 drivers/net/macsec.c |  6 ++
 drivers/net/macvlan.c|  5 ++---
 drivers/net/nlmon.c  |  4 +---
 drivers/net/ppp/ppp_generic.c|  4 +---
 drivers/net/slip/slip.c  |  3 +--
 drivers/net/team/team.c  |  3 +--
 drivers/net/tun.c|  3 +--
 drivers/net/veth.c   |  6 ++
 drivers/net/virtio_net.c |  6 ++
 drivers/net/vmxnet3/vmxnet3_ethtool.c|  4 +---
 drivers/net/vmxnet3/vmxnet3_int.h|  4 ++--
 drivers/net/vrf.c

[PATCH 00/14] hyperv: vmbus related patches

2017-02-01 Thread Stephen Hemminger
This is a rebase/resend of earlier patches. I skipped the pure
cosmetic patches for now.  Mostly this is consolidation of earlier
changes, removing dead code etc.  The important part is the
change for allowing a vmbus channel to get its callback directly
in interrupt mode; this is necessary for NAPI support.

Stephen Hemminger (14):
  vmbus: use kernel bitops for traversing interrupt mask
  vmbus: drop no longer used kick_q argument
  vmbus: remove no longer used signal_policy
  vmbus: remove unused kickq argument to sendpacket
  netvsc: remove no longer needed receive staging buffers
  vmbus: remove per channel state
  vmbus: callback is in softirq not workqueue
  vmbus: put related per-cpu variable together
  vmbus: change to per channel tasklet
  vmbus: add direct isr callback mode
  vmbus: remove conditional locking of vmbus_write
  vmbus: expose hv_begin/end_read
  vmbus: constify parameters where possible
  vmbus: replace modulus operation with subtraction

Starting point was top of current char-misc-next branch.

 drivers/hv/channel.c  |  47 +
 drivers/hv/channel_mgmt.c |  41 ++--
 drivers/hv/connection.c   | 134 +-
 drivers/hv/hv.c   | 124 +++
 drivers/hv/hv_util.c  |   3 +-
 drivers/hv/hyperv_vmbus.h |  80 ---
 drivers/hv/ring_buffer.c  |  66 ++-
 drivers/hv/vmbus_drv.c| 115 ++--
 drivers/net/hyperv/hyperv_net.h   |   5 --
 drivers/net/hyperv/netvsc.c   | 104 -
 drivers/net/hyperv/rndis_filter.c |  11 
 drivers/uio/uio_hv_generic.c  |   2 +-
 include/linux/hyperv.h| 134 +-
 13 files changed, 338 insertions(+), 528 deletions(-)

-- 
2.11.0



[PATCH 02/14] vmbus: drop no longer used kick_q argument

2017-02-01 Thread Stephen Hemminger
The flag to cause notification of host is unused after
commit a01a291a282f7c2e ("Drivers: hv: vmbus: Base host signaling
strictly on the ring state"). Therefore remove it from the ring
buffer internal API.

Signed-off-by: Stephen Hemminger 
---
 drivers/hv/channel.c  | 13 -
 drivers/hv/hyperv_vmbus.h |  5 ++---
 drivers/hv/ring_buffer.c  |  8 +++-
 3 files changed, 9 insertions(+), 17 deletions(-)

diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
index a016c5c0e472..e26285cde8e0 100644
--- a/drivers/hv/channel.c
+++ b/drivers/hv/channel.c
@@ -670,9 +670,7 @@ int vmbus_sendpacket_ctl(struct vmbus_channel *channel, void *buffer,
bufferlist[2].iov_base = &aligned_data;
bufferlist[2].iov_len = (packetlen_aligned - packetlen);
 
-   return hv_ringbuffer_write(channel, bufferlist, num_vecs,
-  lock, kick_q);
-
+   return hv_ringbuffer_write(channel, bufferlist, num_vecs, lock);
 }
 EXPORT_SYMBOL(vmbus_sendpacket_ctl);
 
@@ -757,8 +755,7 @@ int vmbus_sendpacket_pagebuffer_ctl(struct vmbus_channel *channel,
bufferlist[2].iov_base = &aligned_data;
bufferlist[2].iov_len = (packetlen_aligned - packetlen);
 
-   return hv_ringbuffer_write(channel, bufferlist, 3,
-  lock, kick_q);
+   return hv_ringbuffer_write(channel, bufferlist, 3, lock);
 }
 EXPORT_SYMBOL_GPL(vmbus_sendpacket_pagebuffer_ctl);
 
@@ -813,8 +810,7 @@ int vmbus_sendpacket_mpb_desc(struct vmbus_channel *channel,
bufferlist[2].iov_base = &aligned_data;
bufferlist[2].iov_len = (packetlen_aligned - packetlen);
 
-   return hv_ringbuffer_write(channel, bufferlist, 3,
-  lock, true);
+   return hv_ringbuffer_write(channel, bufferlist, 3, lock);
 }
 EXPORT_SYMBOL_GPL(vmbus_sendpacket_mpb_desc);
 
@@ -871,8 +867,7 @@ int vmbus_sendpacket_multipagebuffer(struct vmbus_channel *channel,
bufferlist[2].iov_base = &aligned_data;
bufferlist[2].iov_len = (packetlen_aligned - packetlen);
 
-   return hv_ringbuffer_write(channel, bufferlist, 3,
-  lock, true);
+   return hv_ringbuffer_write(channel, bufferlist, 3, lock);
 }
 EXPORT_SYMBOL_GPL(vmbus_sendpacket_multipagebuffer);
 
diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
index 2749a4142889..c375ec89db6f 100644
--- a/drivers/hv/hyperv_vmbus.h
+++ b/drivers/hv/hyperv_vmbus.h
@@ -275,9 +275,8 @@ int hv_ringbuffer_init(struct hv_ring_buffer_info *ring_info,
 void hv_ringbuffer_cleanup(struct hv_ring_buffer_info *ring_info);
 
 int hv_ringbuffer_write(struct vmbus_channel *channel,
-   struct kvec *kv_list,
-   u32 kv_count, bool lock,
-   bool kick_q);
+   struct kvec *kv_list,
+   u32 kv_count, bool lock);
 
 int hv_ringbuffer_read(struct vmbus_channel *channel,
   void *buffer, u32 buflen, u32 *buffer_actual_len,
diff --git a/drivers/hv/ring_buffer.c b/drivers/hv/ring_buffer.c
index 2cd402986858..30ca55aefd24 100644
--- a/drivers/hv/ring_buffer.c
+++ b/drivers/hv/ring_buffer.c
@@ -77,8 +77,7 @@ u32 hv_end_read(struct hv_ring_buffer_info *rbi)
  * host logic is fixed.
  */
 
-static void hv_signal_on_write(u32 old_write, struct vmbus_channel *channel,
-  bool kick_q)
+static void hv_signal_on_write(u32 old_write, struct vmbus_channel *channel)
 {
struct hv_ring_buffer_info *rbi = &channel->outbound;
 
@@ -285,8 +284,7 @@ void hv_ringbuffer_cleanup(struct hv_ring_buffer_info *ring_info)
 
 /* Write to the ring buffer. */
 int hv_ringbuffer_write(struct vmbus_channel *channel,
-   struct kvec *kv_list, u32 kv_count, bool lock,
-   bool kick_q)
+   struct kvec *kv_list, u32 kv_count, bool lock)
 {
int i = 0;
u32 bytes_avail_towrite;
@@ -352,7 +350,7 @@ int hv_ringbuffer_write(struct vmbus_channel *channel,
if (lock)
spin_unlock_irqrestore(&outring_info->ring_lock, flags);
 
-   hv_signal_on_write(old_write, channel, kick_q);
+   hv_signal_on_write(old_write, channel);
 
if (channel->rescind)
return -ENODEV;
-- 
2.11.0



[PATCH 03/14] vmbus: remove no longer used signal_policy

2017-02-01 Thread Stephen Hemminger
The explicit signal policy is no longer used. A different mechanism
will be added later when xmit_more is supported.

Signed-off-by: Stephen Hemminger 
---
 include/linux/hyperv.h | 18 --
 1 file changed, 18 deletions(-)

diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index 85b26f06e172..423fc96cc26a 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -670,11 +670,6 @@ struct hv_input_signal_event_buffer {
struct hv_input_signal_event event;
 };
 
-enum hv_signal_policy {
-   HV_SIGNAL_POLICY_DEFAULT = 0,
-   HV_SIGNAL_POLICY_EXPLICIT,
-};
-
 enum hv_numa_policy {
HV_BALANCED = 0,
HV_LOCALIZED,
@@ -837,13 +832,6 @@ struct vmbus_channel {
 */
struct list_head percpu_list;
/*
-* Host signaling policy: The default policy will be
-* based on the ring buffer state. We will also support
-* a policy where the client driver can have explicit
-* signaling control.
-*/
-   enum hv_signal_policy  signal_policy;
-   /*
 * On the channel send side, many of the VMBUS
 * device drivers explicity serialize access to the
 * outgoing ring buffer. Give more control to the
@@ -904,12 +892,6 @@ static inline bool is_hvsock_channel(const struct vmbus_channel *c)
  VMBUS_CHANNEL_TLNPI_PROVIDER_OFFER);
 }
 
-static inline void set_channel_signal_state(struct vmbus_channel *c,
-   enum hv_signal_policy policy)
-{
-   c->signal_policy = policy;
-}
-
 static inline void set_channel_affinity_state(struct vmbus_channel *c,
  enum hv_numa_policy policy)
 {
-- 
2.11.0


[PATCH 01/14] vmbus: use kernel bitops for traversing interrupt mask

2017-02-01 Thread Stephen Hemminger
Use standard kernel operations for find first set bit to traverse
the channel bit array. This has the added benefit of speeding up
lookup on 64-bit systems, because it uses the find first set
instruction.

Signed-off-by: Stephen Hemminger 
---
 drivers/hv/channel.c  |  8 ++-
 drivers/hv/connection.c   | 55 +++
 drivers/hv/hyperv_vmbus.h | 16 --
 drivers/hv/vmbus_drv.c|  4 +---
 4 files changed, 29 insertions(+), 54 deletions(-)

diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
index be34547cdb68..a016c5c0e472 100644
--- a/drivers/hv/channel.c
+++ b/drivers/hv/channel.c
@@ -47,12 +47,8 @@ void vmbus_setevent(struct vmbus_channel *channel)
 * For channels marked as in "low latency" mode
 * bypass the monitor page mechanism.
 */
-   if ((channel->offermsg.monitor_allocated) &&
-   (!channel->low_latency)) {
-   /* Each u32 represents 32 channels */
-   sync_set_bit(channel->offermsg.child_relid & 31,
-   (unsigned long *) vmbus_connection.send_int_page +
-   (channel->offermsg.child_relid >> 5));
+   if (channel->offermsg.monitor_allocated && !channel->low_latency) {
+   vmbus_send_interrupt(channel->offermsg.child_relid);
 
/* Get the child to parent monitor page */
monitorpage = vmbus_connection.monitor_pages[1];
diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
index 307a5a8937f6..1766ef03e78d 100644
--- a/drivers/hv/connection.c
+++ b/drivers/hv/connection.c
@@ -379,17 +379,11 @@ static void process_chn_event(u32 relid)
  */
 void vmbus_on_event(unsigned long data)
 {
-   u32 dword;
-   u32 maxdword;
-   int bit;
-   u32 relid;
-   u32 *recv_int_page = NULL;
-   void *page_addr;
-   int cpu = smp_processor_id();
-   union hv_synic_event_flags *event;
+   unsigned long *recv_int_page;
+   u32 maxbits, relid;
 
if (vmbus_proto_version < VERSION_WIN8) {
-   maxdword = MAX_NUM_CHANNELS_SUPPORTED >> 5;
+   maxbits = MAX_NUM_CHANNELS_SUPPORTED;
recv_int_page = vmbus_connection.recv_int_page;
} else {
/*
@@ -397,35 +391,24 @@ void vmbus_on_event(unsigned long data)
 * can be directly checked to get the id of the channel
 * that has the interrupt pending.
 */
-   maxdword = HV_EVENT_FLAGS_DWORD_COUNT;
-   page_addr = hv_context.synic_event_page[cpu];
-   event = (union hv_synic_event_flags *)page_addr +
+   int cpu = smp_processor_id();
+   void *page_addr = hv_context.synic_event_page[cpu];
+   union hv_synic_event_flags *event
+   = (union hv_synic_event_flags *)page_addr +
 VMBUS_MESSAGE_SINT;
-   recv_int_page = event->flags32;
-   }
-
 
+   maxbits = HV_EVENT_FLAGS_COUNT;
+   recv_int_page = event->flags;
+   }
 
-   /* Check events */
-   if (!recv_int_page)
+   if (unlikely(!recv_int_page))
return;
-   for (dword = 0; dword < maxdword; dword++) {
-   if (!recv_int_page[dword])
-   continue;
-   for (bit = 0; bit < 32; bit++) {
-   if (sync_test_and_clear_bit(bit,
-   (unsigned long *)&recv_int_page[dword])) {
-   relid = (dword << 5) + bit;
-
-   if (relid == 0)
-   /*
-* Special case - vmbus
-* channel protocol msg
-*/
-   continue;
 
+   for_each_set_bit(relid, recv_int_page, maxbits) {
+   if (sync_test_and_clear_bit(relid, recv_int_page)) {
+   /* Special case - vmbus channel protocol msg */
+   if (relid != 0)
process_chn_event(relid);
-   }
}
}
 }
@@ -491,12 +474,8 @@ void vmbus_set_event(struct vmbus_channel *channel)
 {
u32 child_relid = channel->offermsg.child_relid;
 
-   if (!channel->is_dedicated_interrupt) {
-   /* Each u32 represents 32 channels */
-   sync_set_bit(child_relid & 31,
-   (unsigned long *)vmbus_connection.send_int_page +
-   (child_relid >> 5));
-   }
+   if (!channel->is_dedicated_interrupt)
+   vmbus_send_interrupt(child_relid);
 
hv_do_hypercall(HVCALL_SIGNAL_EVENT, channel->sig_event, NULL);
 }
diff --git a/
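
Editor's note: the traversal described in this commit message can be sketched in userspace as follows. This is only an illustration under stated assumptions, not the kernel code: mark_pending() and drain_pending() are hypothetical names, __builtin_ctzl() (a GCC/Clang builtin) stands in for the find first set instruction, and the bit clearing is not atomic, unlike sync_test_and_clear_bit().

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define WORD_BITS (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical helper: mark channel `relid` as pending in the bitmap. */
static void mark_pending(unsigned long *map, size_t relid)
{
	map[relid / WORD_BITS] |= 1UL << (relid % WORD_BITS);
}

/*
 * Visit every set bit below nbits, doing one find-first-set per set bit
 * instead of testing all bits one by one. Returns the number of relids
 * written to out. Relid 0 is skipped, as in the patch.
 */
static size_t drain_pending(unsigned long *map, size_t nbits,
			    unsigned int *out, size_t out_cap)
{
	size_t found = 0;
	size_t nwords = (nbits + WORD_BITS - 1) / WORD_BITS;

	for (size_t w = 0; w < nwords; w++) {
		while (map[w] != 0) {
			/* Index of the lowest set bit in this word. */
			unsigned int bit = (unsigned int)__builtin_ctzl(map[w]);
			size_t relid = w * WORD_BITS + bit;

			if (relid >= nbits)
				break;
			/* Clear the bit (non-atomic in this sketch). */
			map[w] &= ~(1UL << bit);
			if (relid == 0)
				continue; /* vmbus channel protocol msg */
			assert(found < out_cap);
			out[found++] = (unsigned int)relid;
		}
	}
	return found;
}
```

Words that are entirely zero cost a single comparison, which is where the speedup over the old nested dword/bit loop would come from.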

[PATCH 04/14] vmbus: remove unused kickq argument to sendpacket

2017-02-01 Thread Stephen Hemminger
Since sendpacket no longer uses the kickq argument, remove it.
Remove the no longer used xmit_more handling in sendpacket in netvsc
as well.

Signed-off-by: Stephen Hemminger 
---
 drivers/hv/channel.c| 19 +--
 drivers/net/hyperv/netvsc.c | 21 +++--
 include/linux/hyperv.h  |  6 ++
 3 files changed, 14 insertions(+), 32 deletions(-)

diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
index e26285cde8e0..789c75f6df26 100644
--- a/drivers/hv/channel.c
+++ b/drivers/hv/channel.c
@@ -643,8 +643,8 @@ void vmbus_close(struct vmbus_channel *channel)
 EXPORT_SYMBOL_GPL(vmbus_close);
 
 int vmbus_sendpacket_ctl(struct vmbus_channel *channel, void *buffer,
-  u32 bufferlen, u64 requestid,
-  enum vmbus_packet_type type, u32 flags, bool kick_q)
+u32 bufferlen, u64 requestid,
+enum vmbus_packet_type type, u32 flags)
 {
struct vmpacket_descriptor desc;
u32 packetlen = sizeof(struct vmpacket_descriptor) + bufferlen;
@@ -693,7 +693,7 @@ int vmbus_sendpacket(struct vmbus_channel *channel, void *buffer,
   enum vmbus_packet_type type, u32 flags)
 {
return vmbus_sendpacket_ctl(channel, buffer, bufferlen, requestid,
-   type, flags, true);
+   type, flags);
 }
 EXPORT_SYMBOL(vmbus_sendpacket);
 
@@ -705,11 +705,9 @@ EXPORT_SYMBOL(vmbus_sendpacket);
  * explicitly.
  */
 int vmbus_sendpacket_pagebuffer_ctl(struct vmbus_channel *channel,
-struct hv_page_buffer pagebuffers[],
-u32 pagecount, void *buffer, u32 bufferlen,
-u64 requestid,
-u32 flags,
-bool kick_q)
+   struct hv_page_buffer pagebuffers[],
+   u32 pagecount, void *buffer, u32 bufferlen,
+   u64 requestid, u32 flags)
 {
int i;
struct vmbus_channel_packet_page_buffer desc;
@@ -769,9 +767,10 @@ int vmbus_sendpacket_pagebuffer(struct vmbus_channel *channel,
 u64 requestid)
 {
u32 flags = VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED;
+
return vmbus_sendpacket_pagebuffer_ctl(channel, pagebuffers, pagecount,
-  buffer, bufferlen, requestid,
-  flags, true);
+  buffer, bufferlen,
+  requestid, flags);
 
 }
 EXPORT_SYMBOL_GPL(vmbus_sendpacket_pagebuffer);
diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 5a1cc089acb7..e326e68f9f6d 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -723,8 +723,6 @@ static u32 netvsc_copy_to_send_buf(struct netvsc_device *net_device,
char *dest = start + (section_index * net_device->send_section_size)
 + pend_size;
int i;
-   bool is_data_pkt = (skb != NULL) ? true : false;
-   bool xmit_more = (skb != NULL) ? skb->xmit_more : false;
u32 msg_size = 0;
u32 padding = 0;
u32 remain = packet->total_data_buflen % net_device->pkt_align;
@@ -732,7 +730,7 @@ static u32 netvsc_copy_to_send_buf(struct netvsc_device *net_device,
packet->page_buf_cnt;
 
/* Add padding */
-   if (is_data_pkt && xmit_more && remain &&
+   if (skb && skb->xmit_more && remain &&
!packet->cp_partial) {
padding = net_device->pkt_align - remain;
rndis_msg->msg_len += padding;
@@ -772,7 +770,6 @@ static inline int netvsc_send_pkt(
int ret;
struct hv_page_buffer *pgbuf;
u32 ring_avail = hv_ringbuf_avail_percent(&out_channel->outbound);
-   bool xmit_more = (skb != NULL) ? skb->xmit_more : false;
 
nvmsg.hdr.msg_type = NVSP_MSG1_TYPE_SEND_RNDIS_PKT;
if (skb != NULL) {
@@ -796,16 +793,6 @@ static inline int netvsc_send_pkt(
if (out_channel->rescind)
return -ENODEV;
 
-   /*
-* It is possible that once we successfully place this packet
-* on the ringbuffer, we may stop the queue. In that case, we want
-* to notify the host independent of the xmit_more flag. We don't
-* need to be precise here; in the worst case we may signal the host
-* unnecessarily.
-*/
-   if (ring_avail < (RING_AVAIL_PERCENT_LOWATER + 1))
-   xmit_more = false;
-
if (packet->page_buf_cnt) {
pgbuf = packet->cp_partial ? (*pb) +
packet->rmsg_pgcnt : (*pb);
@@ -815,15 +

[PATCH 05/14] netvsc: remove no longer needed receive staging buffers

2017-02-01 Thread Stephen Hemminger
Since commit aed8c164ca5199 ("Drivers: hv: ring_buffer: count on wrap
around mappings") it is no longer necessary to handle ring wrapping
by having a special receive buffer.

Signed-off-by: Stephen Hemminger 
---
 drivers/net/hyperv/hyperv_net.h   |  5 ---
 drivers/net/hyperv/netvsc.c   | 83 ++-
 drivers/net/hyperv/rndis_filter.c | 11 --
 3 files changed, 11 insertions(+), 88 deletions(-)

diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index 3958adade7eb..cce70ceba6d5 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -748,11 +748,6 @@ struct netvsc_device {
 
int ring_size;
 
-   /* The primary channel callback buffer */
-   unsigned char *cb_buffer;
-   /* The sub channel callback buffer */
-   unsigned char *sub_cb_buf;
-
struct multi_send_data msd[VRSS_CHANNEL_MAX];
u32 max_pkt; /* max number of pkt in one send, e.g. 8 */
u32 pkt_align; /* alignment bytes, e.g. 8 */
diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index e326e68f9f6d..7487498b663c 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -67,12 +67,6 @@ static struct netvsc_device *alloc_net_device(void)
if (!net_device)
return NULL;
 
-   net_device->cb_buffer = kzalloc(NETVSC_PACKET_SIZE, GFP_KERNEL);
-   if (!net_device->cb_buffer) {
-   kfree(net_device);
-   return NULL;
-   }
-
net_device->mrc[0].buf = vzalloc(NETVSC_RECVSLOT_MAX *
 sizeof(struct recv_comp_data));
 
@@ -93,7 +87,6 @@ static void free_netvsc_device(struct netvsc_device *nvdev)
for (i = 0; i < VRSS_CHANNEL_MAX; i++)
vfree(nvdev->mrc[i].buf);
 
-   kfree(nvdev->cb_buffer);
kfree(nvdev);
 }
 
@@ -584,7 +577,6 @@ void netvsc_device_remove(struct hv_device *device)
vmbus_close(device->channel);
 
/* Release all resources */
-   vfree(net_device->sub_cb_buf);
free_netvsc_device(net_device);
 }
 
@@ -1256,16 +1248,11 @@ static void netvsc_process_raw_pkt(struct hv_device *device,
 
 void netvsc_channel_cb(void *context)
 {
-   int ret;
-   struct vmbus_channel *channel = (struct vmbus_channel *)context;
+   struct vmbus_channel *channel = context;
u16 q_idx = channel->offermsg.offer.sub_channel_index;
struct hv_device *device;
struct netvsc_device *net_device;
-   u32 bytes_recvd;
-   u64 request_id;
struct vmpacket_descriptor *desc;
-   unsigned char *buffer;
-   int bufferlen = NETVSC_PACKET_SIZE;
struct net_device *ndev;
bool need_to_commit = false;
 
@@ -1277,65 +1264,19 @@ void netvsc_channel_cb(void *context)
net_device = get_inbound_net_device(device);
if (!net_device)
return;
+
ndev = hv_get_drvdata(device);
-   buffer = get_per_channel_state(channel);
-
-   do {
-   desc = get_next_pkt_raw(channel);
-   if (desc != NULL) {
-   netvsc_process_raw_pkt(device,
-  channel,
-  net_device,
-  ndev,
-  desc->trans_id,
-  desc);
-
-   put_pkt_raw(channel, desc);
-   need_to_commit = true;
-   continue;
-   }
-   if (need_to_commit) {
-   need_to_commit = false;
-   commit_rd_index(channel);
-   }
 
-   ret = vmbus_recvpacket_raw(channel, buffer, bufferlen,
-  &bytes_recvd, &request_id);
-   if (ret == 0) {
-   if (bytes_recvd > 0) {
-   desc = (struct vmpacket_descriptor *)buffer;
-   netvsc_process_raw_pkt(device,
-  channel,
-  net_device,
-  ndev,
-  request_id,
-  desc);
-   } else {
-   /*
-* We are done for this pass.
-*/
-   break;
-   }
-
-   } else if (ret == -ENOBUFS) {
-   if (bufferlen > NETVSC_PACKET_SIZE)
-   kfree(buffer);
-   /* Handle large packet */
-   buffer = kmallo
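
Editor's note: the idea behind dropping the staging buffer can be sketched as below. This is a userspace simulation under loud assumptions: the cited commit title suggests the ring is mapped so that accesses past the end wrap around, and here that is imitated by storing every byte twice in an array of double size. The names mirrored_ring, ring_write() and ring_peek() are hypothetical and do not exist in the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define RING_SIZE 8u

/* Simulated wrap-around mapping: byte i lives at i and at i + RING_SIZE. */
struct mirrored_ring {
	unsigned char data[2 * RING_SIZE];
};

/* Write len bytes starting at ring offset off, keeping both copies in sync. */
static void ring_write(struct mirrored_ring *r, size_t off,
		       const void *src, size_t len)
{
	const unsigned char *p = src;

	assert(len <= RING_SIZE);
	for (size_t i = 0; i < len; i++) {
		size_t pos = (off + i) % RING_SIZE;

		r->data[pos] = p[i];
		r->data[pos + RING_SIZE] = p[i];
	}
}

/*
 * Return a pointer to a packet of len bytes starting at off. The packet
 * is contiguous even when it wraps, so no copy into a separate receive
 * buffer is needed.
 */
static const unsigned char *ring_peek(const struct mirrored_ring *r,
				      size_t off, size_t len)
{
	assert(off < RING_SIZE && len <= RING_SIZE);
	return &r->data[off];
}
```

A real implementation would map the same pages twice rather than copy each byte; the copy here only keeps the sketch portable.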

[PATCH 06/14] vmbus: remove per channel state

2017-02-01 Thread Stephen Hemminger
The netvsc driver no longer needs the per-channel state hook to track
its receive buffer.

Signed-off-by: Stephen Hemminger 
---
 include/linux/hyperv.h | 14 --
 1 file changed, 14 deletions(-)

diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index 8c6a1505b876..39d493ce550d 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -823,10 +823,6 @@ struct vmbus_channel {
 */
struct vmbus_channel *primary_channel;
/*
-* Support per-channel state for use by vmbus drivers.
-*/
-   void *per_channel_state;
-   /*
 * To support per-cpu lookup mapping of relid to channel,
 * link up channels based on their CPU affinity.
 */
@@ -903,16 +899,6 @@ static inline void set_channel_read_state(struct vmbus_channel *c, bool state)
c->batched_reading = state;
 }
 
-static inline void set_per_channel_state(struct vmbus_channel *c, void *s)
-{
-   c->per_channel_state = s;
-}
-
-static inline void *get_per_channel_state(struct vmbus_channel *c)
-{
-   return c->per_channel_state;
-}
-
 static inline void set_channel_pending_send_size(struct vmbus_channel *c,
 u32 size)
 {
-- 
2.11.0


[PATCH 08/14] vmbus: put related per-cpu variable together

2017-02-01 Thread Stephen Hemminger
The hv_context structure had several per-cpu arrays and was also
allocating small structures (tasklet_struct). Instead, use a single
per-cpu array.

Signed-off-by: Stephen Hemminger 
---
 drivers/hv/channel_mgmt.c |  35 -
 drivers/hv/connection.c   |  20 ---
 drivers/hv/hv.c   | 130 --
 drivers/hv/hyperv_vmbus.h |  53 +++
 drivers/hv/vmbus_drv.c|  39 --
 5 files changed, 143 insertions(+), 134 deletions(-)

diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
index de90a9900fee..579ad2560a39 100644
--- a/drivers/hv/channel_mgmt.c
+++ b/drivers/hv/channel_mgmt.c
@@ -353,9 +353,10 @@ static void free_channel(struct vmbus_channel *channel)
 static void percpu_channel_enq(void *arg)
 {
struct vmbus_channel *channel = arg;
-   int cpu = smp_processor_id();
+   struct hv_per_cpu_context *hv_cpu
+   = this_cpu_ptr(hv_context.cpu_context);
 
-   list_add_tail(&channel->percpu_list, &hv_context.percpu_list[cpu]);
+   list_add_tail(&channel->percpu_list, &hv_cpu->chan_list);
 }
 
 static void percpu_channel_deq(void *arg)
@@ -379,19 +380,21 @@ static void vmbus_release_relid(u32 relid)
 
 void hv_event_tasklet_disable(struct vmbus_channel *channel)
 {
-   struct tasklet_struct *tasklet;
-   tasklet = hv_context.event_dpc[channel->target_cpu];
-   tasklet_disable(tasklet);
+   struct hv_per_cpu_context *hv_cpu;
+
+   hv_cpu = per_cpu_ptr(hv_context.cpu_context, channel->target_cpu);
+   tasklet_disable(&hv_cpu->event_dpc);
 }
 
 void hv_event_tasklet_enable(struct vmbus_channel *channel)
 {
-   struct tasklet_struct *tasklet;
-   tasklet = hv_context.event_dpc[channel->target_cpu];
-   tasklet_enable(tasklet);
+   struct hv_per_cpu_context *hv_cpu;
+
+   hv_cpu = per_cpu_ptr(hv_context.cpu_context, channel->target_cpu);
+   tasklet_enable(&hv_cpu->event_dpc);
 
/* In case there is any pending event */
-   tasklet_schedule(tasklet);
+   tasklet_schedule(&hv_cpu->event_dpc);
 }
 
 void hv_process_channel_removal(struct vmbus_channel *channel, u32 relid)
@@ -726,9 +729,12 @@ static void vmbus_wait_for_unload(void)
break;
 
for_each_online_cpu(cpu) {
-   page_addr = hv_context.synic_message_page[cpu];
-   msg = (struct hv_message *)page_addr +
-   VMBUS_MESSAGE_SINT;
+   struct hv_per_cpu_context *hv_cpu
+   = per_cpu_ptr(hv_context.cpu_context, cpu);
+
+   page_addr = hv_cpu->synic_message_page;
+   msg = (struct hv_message *)page_addr
+   + VMBUS_MESSAGE_SINT;
 
message_type = READ_ONCE(msg->header.message_type);
if (message_type == HVMSG_NONE)
@@ -752,7 +758,10 @@ static void vmbus_wait_for_unload(void)
 * messages after we reconnect.
 */
for_each_online_cpu(cpu) {
-   page_addr = hv_context.synic_message_page[cpu];
+   struct hv_per_cpu_context *hv_cpu
+   = per_cpu_ptr(hv_context.cpu_context, cpu);
+
+   page_addr = hv_cpu->synic_message_page;
msg = (struct hv_message *)page_addr + VMBUS_MESSAGE_SINT;
msg->header.message_type = HVMSG_NONE;
}
diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
index 1766ef03e78d..158f12823baf 100644
--- a/drivers/hv/connection.c
+++ b/drivers/hv/connection.c
@@ -93,12 +93,10 @@ static int vmbus_negotiate_version(struct vmbus_channel_msginfo *msginfo,
 * all the CPUs. This is needed for kexec to work correctly where
 * the CPU attempting to connect may not be CPU 0.
 */
-   if (version >= VERSION_WIN8_1) {
-   msg->target_vcpu = hv_context.vp_index[get_cpu()];
-   put_cpu();
-   } else {
+   if (version >= VERSION_WIN8_1)
+   msg->target_vcpu = hv_context.vp_index[smp_processor_id()];
+   else
msg->target_vcpu = 0;
-   }
 
/*
 * Add to list before we send the request since we may
@@ -269,12 +267,12 @@ void vmbus_disconnect(void)
  */
 static struct vmbus_channel *pcpu_relid2channel(u32 relid)
 {
+   struct hv_per_cpu_context *hv_cpu
+   = this_cpu_ptr(hv_context.cpu_context);
+   struct vmbus_channel *found_channel = NULL;
struct vmbus_channel *channel;
-   struct vmbus_channel *found_channel  = NULL;
-   int cpu = smp_processor_id();
-   struct list_head *pcpu_head = &hv_context.percpu_list[cpu];
 
-   list_for_each_entry(channel, pcpu_head, percpu_list) {
+   list_for_each_entry(channel, &
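
Editor's note: the layout change in this patch, several parallel per-cpu arrays replaced by one array of per-cpu records, can be sketched in userspace as follows. The struct and function names are hypothetical and the record is heavily trimmed. The kernel uses its per-cpu allocator with this_cpu_ptr()/per_cpu_ptr(); a plain calloc() array is assumed here only to keep the sketch self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, trimmed-down per-CPU record. */
struct per_cpu_context {
	void *synic_message_page;
	void *synic_event_page;
	unsigned int pending_events;
};

struct context {
	struct per_cpu_context *cpu_context; /* one element per CPU */
	size_t nr_cpus;
};

/* Allocate zeroed state for all CPUs in a single allocation. */
static int context_init(struct context *ctx, size_t nr_cpus)
{
	ctx->cpu_context = calloc(nr_cpus, sizeof(*ctx->cpu_context));
	if (!ctx->cpu_context)
		return -1;
	ctx->nr_cpus = nr_cpus;
	return 0;
}

/* Userspace analogue of per_cpu_ptr(): one lookup yields all CPU state. */
static struct per_cpu_context *context_cpu(struct context *ctx, size_t cpu)
{
	assert(cpu < ctx->nr_cpus);
	return &ctx->cpu_context[cpu];
}

static void context_free(struct context *ctx)
{
	free(ctx->cpu_context);
	ctx->cpu_context = NULL;
	ctx->nr_cpus = 0;
}
```

With this layout, code that previously indexed three or four arrays by the same cpu number fetches one pointer and then uses plain member access.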

[PATCH 07/14] vmbus: callback is in softirq not workqueue

2017-02-01 Thread Stephen Hemminger
The callback is done via tasklet not workqueue.

Signed-off-by: Stephen Hemminger 
---
 include/linux/hyperv.h | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index 39d493ce550d..b30808f740f9 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -32,7 +32,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -729,9 +728,7 @@ struct vmbus_channel {
 
struct vmbus_close_msg close_msg;
 
-   /* Channel callback are invoked in this workqueue context */
-   /* HANDLE dataWorkQueue; */
-
+   /* Channel callback's invoked in softirq context */
void (*onchannel_callback)(void *context);
void *channel_callback_context;
 
-- 
2.11.0


[PATCH 09/14] vmbus: change to per channel tasklet

2017-02-01 Thread Stephen Hemminger
Make the event handling tasklet per channel rather than per-cpu.
This allows for better fairness when getting lots of data on the same
cpu.

Signed-off-by: Stephen Hemminger 
---
 drivers/hv/channel.c  |  2 +-
 drivers/hv/channel_mgmt.c | 16 +-
 drivers/hv/connection.c   | 78 ++-
 drivers/hv/hv.c   |  2 --
 drivers/hv/hyperv_vmbus.h |  1 -
 drivers/hv/vmbus_drv.c| 58 ++-
 include/linux/hyperv.h|  3 +-
 7 files changed, 64 insertions(+), 96 deletions(-)

diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
index 789c75f6df26..18cc1c78260d 100644
--- a/drivers/hv/channel.c
+++ b/drivers/hv/channel.c
@@ -530,7 +530,7 @@ static int vmbus_close_internal(struct vmbus_channel *channel)
int ret;
 
/*
-* process_chn_event(), running in the tasklet, can race
+* vmbus_on_event(), running in the tasklet, can race
 * with vmbus_close_internal() in the case of SMP guest, e.g., when
 * the former is accessing channel->inbound.ring_buffer, the latter
 * could be freeing the ring_buffer pages.
diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
index 579ad2560a39..2f6270d76b79 100644
--- a/drivers/hv/channel_mgmt.c
+++ b/drivers/hv/channel_mgmt.c
@@ -339,6 +339,9 @@ static struct vmbus_channel *alloc_channel(void)
INIT_LIST_HEAD(&channel->sc_list);
INIT_LIST_HEAD(&channel->percpu_list);
 
+   tasklet_init(&channel->callback_event,
+vmbus_on_event, (unsigned long)channel);
+
return channel;
 }
 
@@ -347,6 +350,7 @@ static struct vmbus_channel *alloc_channel(void)
  */
 static void free_channel(struct vmbus_channel *channel)
 {
+   tasklet_kill(&channel->callback_event);
kfree(channel);
 }
 
@@ -380,21 +384,15 @@ static void vmbus_release_relid(u32 relid)
 
 void hv_event_tasklet_disable(struct vmbus_channel *channel)
 {
-   struct hv_per_cpu_context *hv_cpu;
-
-   hv_cpu = per_cpu_ptr(hv_context.cpu_context, channel->target_cpu);
-   tasklet_disable(&hv_cpu->event_dpc);
+   tasklet_disable(&channel->callback_event);
 }
 
 void hv_event_tasklet_enable(struct vmbus_channel *channel)
 {
-   struct hv_per_cpu_context *hv_cpu;
-
-   hv_cpu = per_cpu_ptr(hv_context.cpu_context, channel->target_cpu);
-   tasklet_enable(&hv_cpu->event_dpc);
+   tasklet_enable(&channel->callback_event);
 
/* In case there is any pending event */
-   tasklet_schedule(&hv_cpu->event_dpc);
+   tasklet_schedule(&channel->callback_event);
 }
 
 void hv_process_channel_removal(struct vmbus_channel *channel, u32 relid)
diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
index 158f12823baf..27e72dc07e12 100644
--- a/drivers/hv/connection.c
+++ b/drivers/hv/connection.c
@@ -260,29 +260,6 @@ void vmbus_disconnect(void)
 }
 
 /*
- * Map the given relid to the corresponding channel based on the
- * per-cpu list of channels that have been affinitized to this CPU.
- * This will be used in the channel callback path as we can do this
- * mapping in a lock-free fashion.
- */
-static struct vmbus_channel *pcpu_relid2channel(u32 relid)
-{
-   struct hv_per_cpu_context *hv_cpu
-   = this_cpu_ptr(hv_context.cpu_context);
-   struct vmbus_channel *found_channel = NULL;
-   struct vmbus_channel *channel;
-
-   list_for_each_entry(channel, &hv_cpu->chan_list, percpu_list) {
-   if (channel->offermsg.child_relid == relid) {
-   found_channel = channel;
-   break;
-   }
-   }
-
-   return found_channel;
-}
-
-/*
  * relid2channel - Get the channel object given its
  * child relative id (ie channel id)
  */
@@ -318,25 +295,16 @@ struct vmbus_channel *relid2channel(u32 relid)
 }
 
 /*
- * process_chn_event - Process a channel event notification
+ * vmbus_on_event - Process a channel event notification
  */
-static void process_chn_event(u32 relid)
+void vmbus_on_event(unsigned long data)
 {
-   struct vmbus_channel *channel;
+   struct vmbus_channel *channel = (void *) data;
void *arg;
bool read_state;
u32 bytes_to_read;
 
/*
-* Find the channel based on this relid and invokes the
-* channel callback to process the event
-*/
-   channel = pcpu_relid2channel(relid);
-
-   if (!channel)
-   return;
-
-   /*
 * A channel once created is persistent even when there
 * is no driver handling the device. An unloading driver
 * sets the onchannel_callback to NULL on the same CPU
@@ -344,7 +312,6 @@ static void process_chn_event(u32 relid)
 * Thus, checking and invoking the driver specific callback takes
 * care of orderly unloading of the driver.
 */
-
if (chann
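
Editor's note: with a tasklet per channel, the handler receives the channel itself through the tasklet's data word instead of looking it up by relid. A minimal userspace sketch of that pattern follows. All names (mini_tasklet, mini_channel, and so on) are hypothetical stand-ins, and uintptr_t is used where the kernel uses unsigned long, since unsigned long is not pointer-sized on every userspace ABI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct tasklet_struct. */
struct mini_tasklet {
	void (*func)(uintptr_t);
	uintptr_t data;
};

struct mini_channel {
	unsigned int relid;
	unsigned int events_handled;
	struct mini_tasklet callback_event; /* one tasklet per channel */
};

/* Event handler: recover the channel from the opaque data word. */
static void mini_on_event(uintptr_t data)
{
	struct mini_channel *channel = (struct mini_channel *)data;

	assert(channel != NULL);
	channel->events_handled++;
}

/* Bind the tasklet to its own channel at allocation time. */
static void mini_channel_init(struct mini_channel *channel, unsigned int relid)
{
	channel->relid = relid;
	channel->events_handled = 0;
	channel->callback_event.func = mini_on_event;
	channel->callback_event.data = (uintptr_t)channel;
}

/* Run one scheduled tasklet. */
static void mini_tasklet_run(struct mini_tasklet *t)
{
	t->func(t->data);
}
```

Because each channel owns its tasklet, a busy channel only re-queues its own work, which is the fairness argument made in the commit message.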

[PATCH 11/14] vmbus: remove conditional locking of vmbus_write

2017-02-01 Thread Stephen Hemminger
All current usage of vmbus write uses the acquire_lock flag, therefore
having it be optional is unnecessary. This also fixes a sparse warning
since sparse doesn't like when a function has conditional locking.

Signed-off-by: Stephen Hemminger 
---
 drivers/hv/channel.c  | 13 -
 drivers/hv/channel_mgmt.c |  1 -
 drivers/hv/hyperv_vmbus.h |  3 +--
 drivers/hv/ring_buffer.c  | 11 ---
 include/linux/hyperv.h| 15 ---
 5 files changed, 9 insertions(+), 34 deletions(-)

diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
index 18cc1c78260d..81a80c82f1bd 100644
--- a/drivers/hv/channel.c
+++ b/drivers/hv/channel.c
@@ -651,7 +651,6 @@ int vmbus_sendpacket_ctl(struct vmbus_channel *channel, void *buffer,
u32 packetlen_aligned = ALIGN(packetlen, sizeof(u64));
struct kvec bufferlist[3];
u64 aligned_data = 0;
-   bool lock = channel->acquire_ring_lock;
int num_vecs = ((bufferlen != 0) ? 3 : 1);
 
 
@@ -670,7 +669,7 @@ int vmbus_sendpacket_ctl(struct vmbus_channel *channel, void *buffer,
bufferlist[2].iov_base = &aligned_data;
bufferlist[2].iov_len = (packetlen_aligned - packetlen);
 
-   return hv_ringbuffer_write(channel, bufferlist, num_vecs, lock);
+   return hv_ringbuffer_write(channel, bufferlist, num_vecs);
 }
 EXPORT_SYMBOL(vmbus_sendpacket_ctl);
 
@@ -716,12 +715,10 @@ int vmbus_sendpacket_pagebuffer_ctl(struct vmbus_channel *channel,
u32 packetlen_aligned;
struct kvec bufferlist[3];
u64 aligned_data = 0;
-   bool lock = channel->acquire_ring_lock;
 
if (pagecount > MAX_PAGE_BUFFER_COUNT)
return -EINVAL;
 
-
/*
 * Adjust the size down since vmbus_channel_packet_page_buffer is the
 * largest size we support
@@ -753,7 +750,7 @@ int vmbus_sendpacket_pagebuffer_ctl(struct vmbus_channel *channel,
bufferlist[2].iov_base = &aligned_data;
bufferlist[2].iov_len = (packetlen_aligned - packetlen);
 
-   return hv_ringbuffer_write(channel, bufferlist, 3, lock);
+   return hv_ringbuffer_write(channel, bufferlist, 3);
 }
 EXPORT_SYMBOL_GPL(vmbus_sendpacket_pagebuffer_ctl);
 
@@ -789,7 +786,6 @@ int vmbus_sendpacket_mpb_desc(struct vmbus_channel *channel,
u32 packetlen_aligned;
struct kvec bufferlist[3];
u64 aligned_data = 0;
-   bool lock = channel->acquire_ring_lock;
 
packetlen = desc_size + bufferlen;
packetlen_aligned = ALIGN(packetlen, sizeof(u64));
@@ -809,7 +805,7 @@ int vmbus_sendpacket_mpb_desc(struct vmbus_channel *channel,
bufferlist[2].iov_base = &aligned_data;
bufferlist[2].iov_len = (packetlen_aligned - packetlen);
 
-   return hv_ringbuffer_write(channel, bufferlist, 3, lock);
+   return hv_ringbuffer_write(channel, bufferlist, 3);
 }
 EXPORT_SYMBOL_GPL(vmbus_sendpacket_mpb_desc);
 
@@ -827,7 +823,6 @@ int vmbus_sendpacket_multipagebuffer(struct vmbus_channel *channel,
u32 packetlen_aligned;
struct kvec bufferlist[3];
u64 aligned_data = 0;
-   bool lock = channel->acquire_ring_lock;
u32 pfncount = NUM_PAGES_SPANNED(multi_pagebuffer->offset,
 multi_pagebuffer->len);
 
@@ -866,7 +861,7 @@ int vmbus_sendpacket_multipagebuffer(struct vmbus_channel *channel,
bufferlist[2].iov_base = &aligned_data;
bufferlist[2].iov_len = (packetlen_aligned - packetlen);
 
-   return hv_ringbuffer_write(channel, bufferlist, 3, lock);
+   return hv_ringbuffer_write(channel, bufferlist, 3);
 }
 EXPORT_SYMBOL_GPL(vmbus_sendpacket_multipagebuffer);
 
diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
index b2bb5aafaa2f..f33465d78a02 100644
--- a/drivers/hv/channel_mgmt.c
+++ b/drivers/hv/channel_mgmt.c
@@ -332,7 +332,6 @@ static struct vmbus_channel *alloc_channel(void)
if (!channel)
return NULL;
 
-   channel->acquire_ring_lock = true;
spin_lock_init(&channel->inbound_lock);
spin_lock_init(&channel->lock);
 
diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
index 558a798c407c..6a9b54677218 100644
--- a/drivers/hv/hyperv_vmbus.h
+++ b/drivers/hv/hyperv_vmbus.h
@@ -283,8 +283,7 @@ int hv_ringbuffer_init(struct hv_ring_buffer_info *ring_info,
 void hv_ringbuffer_cleanup(struct hv_ring_buffer_info *ring_info);
 
 int hv_ringbuffer_write(struct vmbus_channel *channel,
-   struct kvec *kv_list,
-   u32 kv_count, bool lock);
+   struct kvec *kv_list, u32 kv_count);
 
 int hv_ringbuffer_read(struct vmbus_channel *channel,
   void *buffer, u32 buflen, u32 *buffer_actual_len,
diff --git a/drivers/hv/ring_buffer.c b/drivers/hv/ring_buffer.c
index 30ca55aefd24..146fd8ab2a2a 100644
--- a/drivers/hv/ring_buffer.c
+++ b/drivers/hv/ring_buffer.c
@@ -284,7 +284

[PATCH 10/14] vmbus: add direct isr callback mode

2017-02-01 Thread Stephen Hemminger
Change the simple boolean batched_reading into a tri-value.
For future NAPI support in the netvsc driver, the callback needs to
occur directly in the interrupt handler.

Batched mode is also changed to disable host interrupts immediately
in the interrupt routine (to avoid unnecessary host signals), and the
tasklet is rescheduled if more data is detected.

Signed-off-by: Stephen Hemminger 
---
 drivers/hv/channel_mgmt.c|  7 ---
 drivers/hv/connection.c  | 27 ---
 drivers/hv/hv_util.c |  3 +--
 drivers/hv/vmbus_drv.c   | 26 --
 drivers/uio/uio_hv_generic.c |  2 +-
 include/linux/hyperv.h   | 31 +--
 6 files changed, 55 insertions(+), 41 deletions(-)

diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
index 2f6270d76b79..b2bb5aafaa2f 100644
--- a/drivers/hv/channel_mgmt.c
+++ b/drivers/hv/channel_mgmt.c
@@ -820,13 +820,6 @@ static void vmbus_onoffer(struct vmbus_channel_message_header *hdr)
}
 
/*
-* By default we setup state to enable batched
-* reading. A specific service can choose to
-* disable this prior to opening the channel.
-*/
-   newchannel->batched_reading = true;
-
-   /*
 * Setup state for signalling the host.
 */
newchannel->sig_event = (struct hv_input_signal_event *)
diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
index 27e72dc07e12..a8366fec1458 100644
--- a/drivers/hv/connection.c
+++ b/drivers/hv/connection.c
@@ -300,9 +300,7 @@ struct vmbus_channel *relid2channel(u32 relid)
 void vmbus_on_event(unsigned long data)
 {
struct vmbus_channel *channel = (void *) data;
-   void *arg;
-   bool read_state;
-   u32 bytes_to_read;
+   void (*callback_fn)(void *);
 
/*
 * A channel once created is persistent even when there
@@ -312,9 +310,13 @@ void vmbus_on_event(unsigned long data)
 * Thus, checking and invoking the driver specific callback takes
 * care of orderly unloading of the driver.
 */
-   if (channel->onchannel_callback != NULL) {
-   arg = channel->channel_callback_context;
-   read_state = channel->batched_reading;
+   callback_fn = READ_ONCE(channel->onchannel_callback);
+   if (unlikely(callback_fn == NULL))
+   return;
+
+   (*callback_fn)(channel->channel_callback_context);
+
+   if (channel->callback_mode == HV_CALL_BATCHED) {
/*
 * This callback reads the messages sent by the host.
 * We can optimize host to guest signaling by ensuring:
@@ -326,16 +328,11 @@ void vmbus_on_event(unsigned long data)
 *state is set we check to see if additional packets are
 *available to read. In this case we repeat the process.
 */
+   if (hv_end_read(&channel->inbound) != 0) {
+   hv_begin_read(&channel->inbound);
 
-   do {
-   if (read_state)
-   hv_begin_read(&channel->inbound);
-   channel->onchannel_callback(arg);
-   if (read_state)
-   bytes_to_read = hv_end_read(&channel->inbound);
-   else
-   bytes_to_read = 0;
-   } while (read_state && (bytes_to_read != 0));
+   tasklet_schedule(&channel->callback_event);
+   }
}
 }
 
diff --git a/drivers/hv/hv_util.c b/drivers/hv/hv_util.c
index d42ede78a9dd..8410191b4992 100644
--- a/drivers/hv/hv_util.c
+++ b/drivers/hv/hv_util.c
@@ -409,8 +409,7 @@ static int util_probe(struct hv_device *dev,
 * Turn off batched reading for all util drivers before we open the
 * channel.
 */
-
-   set_channel_read_state(dev->channel, false);
+   set_channel_read_mode(dev->channel, HV_CALL_DIRECT);
 
hv_set_drvdata(dev, srv);
 
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index eaf1a10b0245..f7f6b9144b07 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -887,6 +887,18 @@ void vmbus_on_msg_dpc(unsigned long data)
 
 
 /*
+ * Direct callback for channels using other deferred processing
+ */
+static void vmbus_channel_isr(struct vmbus_channel *channel)
+{
+   void (*callback_fn)(void *);
+
+   callback_fn = READ_ONCE(channel->onchannel_callback);
+   if (likely(callback_fn != NULL))
+   (*callback_fn)(channel->channel_callback_context);
+}
+
+/*
  * Schedule all channels with events pending
  */
 static void vmbus_chan_sched(struct hv_per_cpu_context *hv_cpu)
@@ -927,9 +939,19 @@ static void vmbus_chan_sched(struct hv_per_cpu_context *hv_cpu)
 
/* Find channel based on rel
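
Editor's note: the batched-mode flow in the vmbus_on_event() hunk above (run the callback once, re-enable host interrupts, and reschedule if data is still pending) can be modelled in userspace as below. This is a simulation under stated assumptions: every sim_* name is hypothetical, the ring is reduced to a packet counter, and only the two modes visible in the excerpt (batched and direct) are modelled, since the third value is cut off in the truncated diff.

```c
#include <assert.h>

enum sim_mode { SIM_BATCHED, SIM_DIRECT };

struct sim_channel {
	enum sim_mode mode;
	unsigned int pending;   /* packets waiting in the inbound ring */
	unsigned int per_call;  /* packets one callback run consumes */
	int masked;             /* host interrupts disabled? */
	unsigned int scheduled; /* queued tasklet runs */
	unsigned int callbacks; /* how often the driver callback ran */
};

/* Driver callback: consume up to per_call packets. */
static void sim_callback(struct sim_channel *ch)
{
	unsigned int n = ch->pending < ch->per_call ? ch->pending : ch->per_call;

	ch->pending -= n;
	ch->callbacks++;
}

/* Mask host interrupts while the ring is being drained. */
static void sim_begin_read(struct sim_channel *ch)
{
	ch->masked = 1;
}

/* Unmask host interrupts and report what is still unread. */
static unsigned int sim_end_read(struct sim_channel *ch)
{
	ch->masked = 0;
	return ch->pending;
}

/* Model of the tasklet body: one callback run, then maybe reschedule. */
static void sim_on_event(struct sim_channel *ch)
{
	sim_callback(ch);

	if (ch->mode == SIM_BATCHED && sim_end_read(ch) != 0) {
		sim_begin_read(ch);
		ch->scheduled++;
	}
}

/* Run queued tasklet invocations until none remain. */
static void sim_run(struct sim_channel *ch)
{
	while (ch->scheduled > 0) {
		ch->scheduled--;
		sim_on_event(ch);
	}
}
```

Compared with the old do/while loop, each pass goes back through the scheduler, so other channels' tasklets can run between passes.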

[PATCH 13/14] vmbus: constify parameters where possible

2017-02-01 Thread Stephen Hemminger
Functions that just query the state of the ring buffer can have their
parameters marked const.

Signed-off-by: Stephen Hemminger 
---
 drivers/hv/hyperv_vmbus.h |  6 +++---
 drivers/hv/ring_buffer.c  | 22 ++
 include/linux/hyperv.h| 12 ++--
 3 files changed, 19 insertions(+), 21 deletions(-)

diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
index e15a130de3c9..884f83bba1ab 100644
--- a/drivers/hv/hyperv_vmbus.h
+++ b/drivers/hv/hyperv_vmbus.h
@@ -283,14 +283,14 @@ int hv_ringbuffer_init(struct hv_ring_buffer_info *ring_info,
 void hv_ringbuffer_cleanup(struct hv_ring_buffer_info *ring_info);
 
 int hv_ringbuffer_write(struct vmbus_channel *channel,
-   struct kvec *kv_list, u32 kv_count);
+   const struct kvec *kv_list, u32 kv_count);
 
 int hv_ringbuffer_read(struct vmbus_channel *channel,
   void *buffer, u32 buflen, u32 *buffer_actual_len,
   u64 *requestid, bool raw);
 
-void hv_ringbuffer_get_debuginfo(struct hv_ring_buffer_info *ring_info,
-   struct hv_ring_buffer_debug_info *debug_info);
+void hv_ringbuffer_get_debuginfo(const struct hv_ring_buffer_info *ring_info,
+struct hv_ring_buffer_debug_info *debug_info);
 
 /*
  * Maximum channels is determined by the size of the interrupt page
diff --git a/drivers/hv/ring_buffer.c b/drivers/hv/ring_buffer.c
index 47ab69089115..ee3e488d9dee 100644
--- a/drivers/hv/ring_buffer.c
+++ b/drivers/hv/ring_buffer.c
@@ -96,11 +96,9 @@ hv_set_next_write_location(struct hv_ring_buffer_info 
*ring_info,
 
 /* Get the next read location for the specified ring buffer. */
 static inline u32
-hv_get_next_read_location(struct hv_ring_buffer_info *ring_info)
+hv_get_next_read_location(const struct hv_ring_buffer_info *ring_info)
 {
-   u32 next = ring_info->ring_buffer->read_index;
-
-   return next;
+   return ring_info->ring_buffer->read_index;
 }
 
 /*
@@ -108,8 +106,8 @@ hv_get_next_read_location(struct hv_ring_buffer_info 
*ring_info)
  * This allows the caller to skip.
  */
 static inline u32
-hv_get_next_readlocation_withoffset(struct hv_ring_buffer_info *ring_info,
-u32 offset)
+hv_get_next_readlocation_withoffset(const struct hv_ring_buffer_info 
*ring_info,
+   u32 offset)
 {
u32 next = ring_info->ring_buffer->read_index;
 
@@ -130,7 +128,7 @@ hv_set_next_read_location(struct hv_ring_buffer_info 
*ring_info,
 
 /* Get the size of the ring buffer. */
 static inline u32
-hv_get_ring_buffersize(struct hv_ring_buffer_info *ring_info)
+hv_get_ring_buffersize(const struct hv_ring_buffer_info *ring_info)
 {
return ring_info->ring_datasize;
 }
@@ -147,7 +145,7 @@ hv_get_ring_bufferindices(struct hv_ring_buffer_info 
*ring_info)
  * Assume there is enough room. Handles wrap-around in src case only!!
  */
 static u32 hv_copyfrom_ringbuffer(
-   struct hv_ring_buffer_info  *ring_info,
+   const struct hv_ring_buffer_info *ring_info,
void*dest,
u32 destlen,
u32 start_read_offset)
@@ -171,7 +169,7 @@ static u32 hv_copyfrom_ringbuffer(
 static u32 hv_copyto_ringbuffer(
struct hv_ring_buffer_info  *ring_info,
u32 start_write_offset,
-   void*src,
+   const void  *src,
u32 srclen)
 {
void *ring_buffer = hv_get_ring_buffer(ring_info);
@@ -186,8 +184,8 @@ static u32 hv_copyto_ringbuffer(
 }
 
 /* Get various debug metrics for the specified ring buffer. */
-void hv_ringbuffer_get_debuginfo(struct hv_ring_buffer_info *ring_info,
-   struct hv_ring_buffer_debug_info *debug_info)
+void hv_ringbuffer_get_debuginfo(const struct hv_ring_buffer_info *ring_info,
+struct hv_ring_buffer_debug_info *debug_info)
 {
u32 bytes_avail_towrite;
u32 bytes_avail_toread;
@@ -264,7 +262,7 @@ void hv_ringbuffer_cleanup(struct hv_ring_buffer_info 
*ring_info)
 
 /* Write to the ring buffer. */
 int hv_ringbuffer_write(struct vmbus_channel *channel,
-   struct kvec *kv_list, u32 kv_count)
+   const struct kvec *kv_list, u32 kv_count)
 {
int i = 0;
u32 bytes_avail_towrite;
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index dc50997a3fba..32a9cbc66b65 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -137,8 +137,8 @@ struct hv_ring_buffer_info {
  * for the specified ring buffer
  */
 static inline void
-hv_get_ringbuffer_availbytes(struct hv_ring_buffer_info *rbi,
- u32 *read, u32 *write)
+hv_get_ringbuffer_availbytes(const struct hv_ring_buffer_info *rbi,
+ 

[PATCH 14/14] vmbus: replace modulus operation with subtraction

2017-02-01 Thread Stephen Hemminger
Takes fewer clock cycles to check for ring wrap and subtract than to
do a modulus instruction.

Signed-off-by: Stephen Hemminger 
---
 drivers/hv/ring_buffer.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/hv/ring_buffer.c b/drivers/hv/ring_buffer.c
index ee3e488d9dee..8ab6298fd5ae 100644
--- a/drivers/hv/ring_buffer.c
+++ b/drivers/hv/ring_buffer.c
@@ -112,7 +112,8 @@ hv_get_next_readlocation_withoffset(const struct 
hv_ring_buffer_info *ring_info,
u32 next = ring_info->ring_buffer->read_index;
 
next += offset;
-   next %= ring_info->ring_datasize;
+   if (next >= ring_info->ring_datasize)
+   next -= ring_info->ring_datasize;
 
return next;
 }
@@ -156,7 +157,8 @@ static u32 hv_copyfrom_ringbuffer(
memcpy(dest, ring_buffer + start_read_offset, destlen);
 
start_read_offset += destlen;
-   start_read_offset %= ring_buffer_size;
+   if (start_read_offset >= ring_buffer_size)
+   start_read_offset -= ring_buffer_size;
 
return start_read_offset;
 }
@@ -178,7 +180,8 @@ static u32 hv_copyto_ringbuffer(
memcpy(ring_buffer + start_write_offset, src, srclen);
 
start_write_offset += srclen;
-   start_write_offset %= ring_buffer_size;
+   if (start_write_offset >= ring_buffer_size)
+   start_write_offset -= ring_buffer_size;
 
return start_write_offset;
 }
-- 
2.11.0
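
A minimal userspace sketch of the wrap check used in the patch above. It is only equivalent to the modulus under an assumption the patch does not state explicitly: the index is already inside the ring and the amount added is at most one ring size, so a single subtraction is enough. The names ring_advance() and ring_advance_mod() are made up for illustration and are not kernel functions.

```c
#include <stdint.h>

/*
 * Advance a ring index without a modulus.
 * Assumption: index < size and offset <= size, so index + offset is
 * below 2 * size and one subtraction brings it back into range.
 */
static inline uint32_t ring_advance(uint32_t index, uint32_t offset,
				    uint32_t size)
{
	uint32_t next = index + offset;

	if (next >= size)
		next -= size;

	return next;
}

/* Reference version using the modulus, kept only for comparison. */
static inline uint32_t ring_advance_mod(uint32_t index, uint32_t offset,
					uint32_t size)
{
	return (index + offset) % size;
}
```

If a caller could ever pass an offset larger than the ring size, the two versions would diverge, so the subtraction form relies on the callers respecting that bound.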

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH 12/14] vmbus: expose hv_begin/end_read

2017-02-01 Thread Stephen Hemminger
In order to implement NAPI in netvsc, the driver needs access to
control the host interrupt mask.

Signed-off-by: Stephen Hemminger 
---
 drivers/hv/hyperv_vmbus.h |  4 
 drivers/hv/ring_buffer.c  | 20 
 include/linux/hyperv.h| 30 ++
 3 files changed, 30 insertions(+), 24 deletions(-)

diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
index 6a9b54677218..e15a130de3c9 100644
--- a/drivers/hv/hyperv_vmbus.h
+++ b/drivers/hv/hyperv_vmbus.h
@@ -292,10 +292,6 @@ int hv_ringbuffer_read(struct vmbus_channel *channel,
 void hv_ringbuffer_get_debuginfo(struct hv_ring_buffer_info *ring_info,
struct hv_ring_buffer_debug_info *debug_info);
 
-void hv_begin_read(struct hv_ring_buffer_info *rbi);
-
-u32 hv_end_read(struct hv_ring_buffer_info *rbi);
-
 /*
  * Maximum channels is determined by the size of the interrupt page
  * which is PAGE_SIZE. 1/2 of PAGE_SIZE is for send endpoint interrupt
diff --git a/drivers/hv/ring_buffer.c b/drivers/hv/ring_buffer.c
index 146fd8ab2a2a..47ab69089115 100644
--- a/drivers/hv/ring_buffer.c
+++ b/drivers/hv/ring_buffer.c
@@ -32,26 +32,6 @@
 
 #include "hyperv_vmbus.h"
 
-void hv_begin_read(struct hv_ring_buffer_info *rbi)
-{
-   rbi->ring_buffer->interrupt_mask = 1;
-   virt_mb();
-}
-
-u32 hv_end_read(struct hv_ring_buffer_info *rbi)
-{
-
-   rbi->ring_buffer->interrupt_mask = 0;
-   virt_mb();
-
-   /*
-* Now check to see if the ring buffer is still empty.
-* If it is not, we raced and we need to process new
-* incoming messages.
-*/
-   return hv_get_bytes_to_read(rbi);
-}
-
 /*
  * When we write to the ring buffer, check if the host needs to
  * be signaled. Here is the details of this protocol:
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index 9b0165b11c5c..dc50997a3fba 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -1473,6 +1473,36 @@ static inline  void hv_signal_on_read(struct 
vmbus_channel *channel)
 }
 
 /*
+ * Mask off host interrupt callback notifications
+ */
+static inline void hv_begin_read(struct hv_ring_buffer_info *rbi)
+{
+   rbi->ring_buffer->interrupt_mask = 1;
+
+   /* make sure mask update is not reordered */
+   virt_mb();
+}
+
+/*
+ * Re-enable host callback and return number of outstanding bytes
+ */
+static inline u32 hv_end_read(struct hv_ring_buffer_info *rbi)
+{
+
+   rbi->ring_buffer->interrupt_mask = 0;
+
+   /* make sure mask update is not reordered */
+   virt_mb();
+
+   /*
+* Now check to see if the ring buffer is still empty.
+* If it is not, we raced and we need to process new
+* incoming messages.
+*/
+   return hv_get_bytes_to_read(rbi);
+}
+
+/*
  * An API to support in-place processing of incoming VMBUS packets.
  */
 #define VMBUS_PKT_TRAILER  8
-- 
2.11.0
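
A rough, single-threaded sketch of how a poll routine might use the begin/end pair exported above: mask notifications, drain, unmask, then re-check for data that arrived in the meantime. Everything here is a hypothetical stand-in (struct demo_ring, demo_begin_read(), demo_end_read(), demo_poll()); it is not the netvsc NAPI code, and a C11 fence is used in place of virt_mb().

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the shared ring state; not the kernel struct. */
struct demo_ring {
	uint32_t interrupt_mask;	/* 1 = producer should not signal */
	uint32_t bytes_to_read;		/* data waiting for the consumer */
};

/* Mask notifications before draining the ring. */
static inline void demo_begin_read(struct demo_ring *r)
{
	r->interrupt_mask = 1;
	/* full fence standing in for virt_mb() */
	atomic_thread_fence(memory_order_seq_cst);
}

/* Unmask and report how much data is still pending. */
static inline uint32_t demo_end_read(struct demo_ring *r)
{
	r->interrupt_mask = 0;
	atomic_thread_fence(memory_order_seq_cst);

	/* non-zero means data arrived while we were masked */
	return r->bytes_to_read;
}

/* Drain the ring and return the number of bytes consumed. */
static uint32_t demo_poll(struct demo_ring *r)
{
	uint32_t consumed = 0;

	do {
		demo_begin_read(r);
		consumed += r->bytes_to_read;
		r->bytes_to_read = 0;
	} while (demo_end_read(r) != 0);

	return consumed;
}
```

The re-check after unmasking is the important part: without it, data written between the last drain and the unmask would sit in the ring with no notification pending.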



Re: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method

2017-02-09 Thread Stephen Hemminger
On Thu, 9 Feb 2017 14:55:50 -0800
Andy Lutomirski  wrote:

> On Thu, Feb 9, 2017 at 12:45 PM, KY Srinivasan  wrote:
> >
> >  
> >> -Original Message-
> >> From: Thomas Gleixner [mailto:t...@linutronix.de]
> >> Sent: Thursday, February 9, 2017 9:08 AM
> >> To: Vitaly Kuznetsov 
> >> Cc: x...@kernel.org; Andy Lutomirski ; Ingo Molnar
> >> ; H. Peter Anvin ; KY Srinivasan
> >> ; Haiyang Zhang ; Stephen
> >> Hemminger ; Dexuan Cui
> >> ; linux-ker...@vger.kernel.org;
> >> de...@linuxdriverproject.org; virtualization@lists.linux-foundation.org
> >> Subject: Re: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read
> >> method
> >>
> >> On Thu, 9 Feb 2017, Vitaly Kuznetsov wrote:  
> >> > +#ifdef CONFIG_HYPERV_TSCPAGE
> >> > +static notrace u64 vread_hvclock(int *mode)
> >> > +{
> >> > +   const struct ms_hyperv_tsc_page *tsc_pg =
> >> > +   (const struct ms_hyperv_tsc_page *)&hvclock_page;
> >> > +   u64 sequence, scale, offset, current_tick, cur_tsc;
> >> > +
> >> > +   while (1) {
> >> > +   sequence = READ_ONCE(tsc_pg->tsc_sequence);
> >> > +   if (!sequence)
> >> > +   break;
> >> > +
> >> > +   scale = READ_ONCE(tsc_pg->tsc_scale);
> >> > +   offset = READ_ONCE(tsc_pg->tsc_offset);
> >> > +   rdtscll(cur_tsc);
> >> > +
> >> > +   current_tick = mul_u64_u64_shr(cur_tsc, scale, 64) + offset;
> >> > +
> >> > +   if (READ_ONCE(tsc_pg->tsc_sequence) == sequence)
> >> > +   return current_tick;  
> >>
> >> That sequence stuff lacks still a sensible explanation. It's fundamentally
> >> different from the sequence counting we do in the kernel, so documentation
> >> for it is really required.  
> >
> > The host is updating multiple fields in this shared TSC page and the 
> > sequence number is
> > used to ensure that the guest sees a consistent set values published. If I 
> > remember
> > correctly, Xen has a similar mechanism.  
> 
> So what's the actual protocol?  When the hypervisor updates the page,
> does it freeze all guest cpus?  If not, how does it maintain
> atomicity?

The protocol looks a lot like the Linux seqlock, but seqlock has an extra
protection which is missing here.

The host needs to update the sequence number twice in order to guarantee ordering.
Otherwise it is possible for the host and guest to race.

Host                                            Guest
Write offset
Write scale
Set tsc_sequence = N
                                                Read sequence = N
                                                Read scale
Write scale
Write offset
                                                Read offset
                                                Check sequence == N
Set tsc_sequence = N + 1

Looks like the current host side protocol is wrong.

The solution that Andi Kleen invented, and that I used in seqlock, was for the
writer to update the sequence at the start and end of the transaction. If the
sequence number is odd, then the reader knows it is looking at stale data.

Host                                            Guest
Write offset
Write scale
Set tsc_sequence = N (end of transaction)
                                                Read sequence = N
                                                Spin until sequence is even (N is even)
                                                Read scale
Set tsc_sequence += 1
Write scale
Write offset
                                                Read offset
                                                Check sequence == N? (fails, it is N + 1)
Set tsc_sequence += 1 (end of transaction)
                                                Read sequence = N + 2
                                                Spin until sequence is even (i.e. N + 2)
                                                Read scale
                                                Read offset
                                                Check sequence == N + 2? (yes, OK)

Also, it is faster to just read scale and offset in this loop and defer
reading the TSC and doing the multiply until after scale/offset have been
acquired.
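
A minimal userspace sketch of the even/odd scheme described above, assuming C11 atomics with their default sequentially consistent ordering (simple rather than fast). The struct and function names (demo_tsc_page, demo_page_update(), demo_page_try_read(), demo_page_read()) are made up for illustration; this is neither the Hyper-V page layout nor the kernel seqlock implementation, and it ignores the special meaning Hyper-V gives to a sequence of 0.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical shared page for illustration; not the Hyper-V ABI. */
struct demo_tsc_page {
	_Atomic uint32_t sequence;	/* odd while an update is in progress */
	_Atomic uint64_t scale;
	_Atomic uint64_t offset;
};

/* Writer: bump the sequence before and after touching the payload. */
static void demo_page_update(struct demo_tsc_page *pg,
			     uint64_t scale, uint64_t offset)
{
	uint32_t seq = atomic_load(&pg->sequence);

	atomic_store(&pg->sequence, seq + 1);	/* odd: update started */
	atomic_store(&pg->scale, scale);
	atomic_store(&pg->offset, offset);
	atomic_store(&pg->sequence, seq + 2);	/* even: update finished */
}

/* Reader: one attempt; returns false if the snapshot cannot be trusted. */
static bool demo_page_try_read(struct demo_tsc_page *pg,
			       uint64_t *scale, uint64_t *offset)
{
	uint32_t start = atomic_load(&pg->sequence);

	if (start & 1)
		return false;		/* writer is mid-update */

	*scale = atomic_load(&pg->scale);
	*offset = atomic_load(&pg->offset);

	/* any change means a writer ran while we were reading */
	return atomic_load(&pg->sequence) == start;
}

/* Reader: retry until a consistent scale/offset pair is obtained. */
static void demo_page_read(struct demo_tsc_page *pg,
			   uint64_t *scale, uint64_t *offset)
{
	while (!demo_page_try_read(pg, scale, offset))
		;
}
```

With a single increment at the end of the update, a reader that starts between two updates can pair an old scale with a new offset and still see an unchanged sequence. The odd value at the start of the update is what closes that window.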






Re: [PATCH 1/2] hyperv: implement hv_get_tsc_page()

2017-02-09 Thread Stephen Hemminger
On Thu, 9 Feb 2017 21:14:25 +0100 (CET)
Thomas Gleixner  wrote:

> On Thu, 9 Feb 2017, Stephen Hemminger wrote:
> 
> > The actual code looks fine, but the style police will not like you.
> > { should be at start of line on functions.
> > And #else should be at start of line,
> > 
> > But maybe this was just more of exchange mangling the mail.  
> 
> Looks like.
> 
> > +struct ms_hyperv_tsc_page *hv_get_tsc_page(void) {
> > +   return tsc_pg;
> > +}
> > +  
> 
> That's how it reads in a proper mail client connected to a proper mail
> server:
> 
> > +struct ms_hyperv_tsc_page *hv_get_tsc_page(void)
> > +{
> > +   return tsc_pg;
> > +}  
> 
> :)


Yup, it looks like the mail server is trying to be "helpful" by eliminating
extra whitespace.


Re: [PATCH v3 2/3] x86/hyperv: move TSC reading method to asm/mshyperv.h

2017-03-03 Thread Stephen Hemminger

Minor coding comments

> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index d324dce..4ff25436 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -178,6 +178,56 @@ void hyperv_cleanup(void);
>  #endif
>  #ifdef CONFIG_HYPERV_TSCPAGE
>  struct ms_hyperv_tsc_page *hv_get_tsc_page(void);
> +static inline u64 hv_read_tsc_page(const struct ms_hyperv_tsc_page *tsc_pg)
> +{
> + u64 scale, offset, current_tick, cur_tsc;
> + u32 sequence;
> +
> + /*
> +  * The protocol for reading Hyper-V TSC page is specified in Hypervisor
> +  * Top-Level Functional Specification ver. 3.0 and above. To get the
> +  * reference time we must do the following:
> +  * - READ ReferenceTscSequence
> +  *   A special '0' value indicates the time source is unreliable and we
> +  *   need to use something else. The currently published specification
> +  *   versions (up to 4.0b) contain a mistake and wrongly claim '-1'
> +  *   instead of '0' as the special value, see commit c35b82ef0294.
> +  * - ReferenceTime =
> +  *((RDTSC() * ReferenceTscScale) >> 64) + ReferenceTscOffset
> +  * - READ ReferenceTscSequence again. In case its value has changed
> +  *   since our first reading we need to discard ReferenceTime and repeat
> +  *   the whole sequence as the hypervisor was updating the page in
> +  *   between.
> +  */
> + while (1) {
> + sequence = READ_ONCE(tsc_pg->tsc_sequence);
> + if (!sequence)
> + break;

It would be clearer to just return U64_MAX here (and not fall out of the loop)
since this is the only case that breaks out. Also, since this failure only
occurs if the host clock is not available, it should probably be marked unlikely().

> + /*
> +  * Make sure we read sequence before we read other values from
> +  * TSC page.
> +  */
> + smp_rmb();
> +
> + scale = READ_ONCE(tsc_pg->tsc_scale);
> + offset = READ_ONCE(tsc_pg->tsc_offset);
> + cur_tsc = rdtsc_ordered();

Since you already have smp_ barriers and rdtsc_ordered is a barrier,
the compiler barriers (READ_ONCE()) shouldn't be necessary.

> +
> + current_tick = mul_u64_u64_shr(cur_tsc, scale, 64) + offset;
> +
> + /*
> +  * Make sure we read sequence after we read all other values
> +  * from TSC page.
> +  */
> + smp_rmb();
> +
> + if (READ_ONCE(tsc_pg->tsc_sequence) == sequence)
> + return current_tick;
> + }

Why not make a do { } while loop out of this?

do {
...
} while (unlikely(READ_ONCE(tsc_pg->tsc_sequence) != sequence));
return current_tick;

Also, there is no need to calculate the tick value until we have good data. As in:

static inline u32 hv_clock_sequence(const struct ms_hyperv_tsc_page *tsc_pg)
{
u32 sequence =
return sequence;
}

static inline u64 hv_read_tsc_page(const struct ms_hyperv_tsc_page *tsc_pg)
{
	u64 scale, offset, cur_tsc;
	u32 start;

	/*
	 * The protocol for reading Hyper-V TSC page is specified in Hypervisor
	 * Top-Level Functional Specification ver. 3.0 and above. To get the
	 * reference time we must do the following:
	 * - READ ReferenceTscSequence
	 *   A special '0' value indicates the time source is unreliable and we
	 *   need to use something else. The currently published specification
	 *   versions (up to 4.0b) contain a mistake and wrongly claim '-1'
	 *   instead of '0' as the special value, see commit c35b82ef0294.
	 * - ReferenceTime =
	 *	((RDTSC() * ReferenceTscScale) >> 64) + ReferenceTscOffset
	 * - READ ReferenceTscSequence again. In case its value has changed
	 *   since our first reading we need to discard ReferenceTime and repeat
	 *   the whole sequence as the hypervisor was updating the page in
	 *   between.
	 */
	do {
		start = READ_ONCE(tsc_pg->tsc_sequence);
		smp_rmb();

		if (unlikely(!start))
			return U64_MAX;

		scale = tsc_pg->tsc_scale;
		offset = tsc_pg->tsc_offset;

		/*
		 * Make sure we read sequence after we read all other values
		 * from TSC page.
		 */
		smp_rmb();
	} while (unlikely(READ_ONCE(tsc_pg->tsc_sequence) != start));

	cur_tsc = rdtsc_ordered();
	return mul_u64_u64_shr(cur_tsc, scale, 64) + offset;
}
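
For reference, the ReferenceTime arithmetic quoted in the comment above can be sketched in userspace as follows. This assumes a compiler with the unsigned __int128 extension (GCC or Clang); demo_ref_time() is a made-up name and only mirrors the formula, it is not the kernel's mul_u64_u64_shr().

```c
#include <stdint.h>

/*
 * ReferenceTime = ((tsc * scale) >> 64) + offset
 * The product needs 128 bits; unsigned __int128 is a GCC/Clang extension.
 */
static inline uint64_t demo_ref_time(uint64_t tsc, uint64_t scale,
				     uint64_t offset)
{
	unsigned __int128 product = (unsigned __int128)tsc * scale;

	/* keep the upper 64 bits of the product, then apply the offset */
	return (uint64_t)(product >> 64) + offset;
}
```

The scale acts as a 64-bit binary fraction: a scale of 2^63 halves the TSC value, so a TSC reading of 1000 with offset 5 gives 505.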



[PATCH net-next 0/4] various sizeof cleanups

2017-08-15 Thread Stephen Hemminger
Noticed some places that were using sizeof as an operator.
This is legal C but is not the convention used in the kernel.

Stephen Hemminger (4):
  tun/tap: use paren's with sizeof
  virtio: put paren around sizeof
  skge: add paren around sizeof arg
  mlx4: sizeof style usage

 drivers/net/ethernet/marvell/skge.c|  2 +-
 drivers/net/ethernet/mellanox/mlx4/alloc.c |  2 +-
 drivers/net/ethernet/mellanox/mlx4/cmd.c   |  4 ++--
 drivers/net/ethernet/mellanox/mlx4/en_resources.c  |  2 +-
 drivers/net/ethernet/mellanox/mlx4/en_rx.c |  2 +-
 drivers/net/ethernet/mellanox/mlx4/en_tx.c |  2 +-
 drivers/net/ethernet/mellanox/mlx4/eq.c| 20 +-
 drivers/net/ethernet/mellanox/mlx4/fw.c|  2 +-
 drivers/net/ethernet/mellanox/mlx4/icm.c   |  2 +-
 drivers/net/ethernet/mellanox/mlx4/icm.h   |  4 ++--
 drivers/net/ethernet/mellanox/mlx4/intf.c  |  2 +-
 drivers/net/ethernet/mellanox/mlx4/main.c  | 12 +--
 drivers/net/ethernet/mellanox/mlx4/mcg.c   | 12 +--
 drivers/net/ethernet/mellanox/mlx4/mr.c| 10 -
 drivers/net/ethernet/mellanox/mlx4/qp.c| 12 +--
 .../net/ethernet/mellanox/mlx4/resource_tracker.c  | 24 +++---
 drivers/net/tap.c  |  2 +-
 drivers/net/tun.c  |  2 +-
 drivers/net/virtio_net.c   |  2 +-
 19 files changed, 60 insertions(+), 60 deletions(-)

-- 
2.11.0
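
A small illustration of the two spellings this series is about. Both are valid C and yield the same value when the operand is an expression; only a type name requires the parentheses. struct demo_hdr and the two helpers are hypothetical and exist only for this example.

```c
#include <stddef.h>

/* Hypothetical header type, used only to have something to measure. */
struct demo_hdr {
	unsigned int len;
	unsigned char flags;
};

/* Operator form: legal C when the operand is an expression. */
static inline size_t demo_size_operator(const struct demo_hdr *hdr)
{
	return sizeof *hdr;
}

/* Function-like form, as preferred by the kernel coding style. */
static inline size_t demo_size_paren(const struct demo_hdr *hdr)
{
	return sizeof(*hdr);
}
```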



[PATCH net-next 2/4] virtio: put paren around sizeof

2017-08-15 Thread Stephen Hemminger
Kernel coding style is to put parentheses around the operand of sizeof.

Signed-off-by: Stephen Hemminger 
---
 drivers/net/virtio_net.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index a3f3c66b4530..4302f313d9a7 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -319,7 +319,7 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
 
hdr_len = vi->hdr_len;
if (vi->mergeable_rx_bufs)
-   hdr_padded_len = sizeof *hdr;
+   hdr_padded_len = sizeof(*hdr);
else
hdr_padded_len = sizeof(struct padded_vnet_hdr);
 
-- 
2.11.0



[PATCH net-next 1/4] tun/tap: use paren's with sizeof

2017-08-15 Thread Stephen Hemminger
Although sizeof is an operator in C, the kernel coding style convention
is to always use it like a function and add parentheses.

Signed-off-by: Stephen Hemminger 
---
 drivers/net/tap.c | 2 +-
 drivers/net/tun.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/tap.c b/drivers/net/tap.c
index 0d039411e64c..21b71ae947fd 100644
--- a/drivers/net/tap.c
+++ b/drivers/net/tap.c
@@ -1215,7 +1215,7 @@ int tap_queue_resize(struct tap_dev *tap)
int n = tap->numqueues;
int ret, i = 0;
 
-   arrays = kmalloc(sizeof *arrays * n, GFP_KERNEL);
+   arrays = kmalloc_array(n, sizeof(*arrays), GFP_KERNEL);
if (!arrays)
return -ENOMEM;
 
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 5892284eb8d0..f5017121cd57 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -2737,7 +2737,7 @@ static int tun_queue_resize(struct tun_struct *tun)
int n = tun->numqueues + tun->numdisabled;
int ret, i;
 
-   arrays = kmalloc(sizeof *arrays * n, GFP_KERNEL);
+   arrays = kmalloc_array(n, sizeof(*arrays), GFP_KERNEL);
if (!arrays)
return -ENOMEM;
 
-- 
2.11.0
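
Beyond the sizeof style, the diff above also switches from an open-coded multiplication to kmalloc_array(), which refuses the allocation when n * size would overflow. A userspace sketch of that kind of check, assuming plain malloc(); demo_alloc_array() is a hypothetical helper, not a kernel API.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Userspace analogue of an overflow-checked array allocation.
 * Returns NULL instead of allocating a too-small buffer when
 * n * size would wrap around SIZE_MAX.
 */
static void *demo_alloc_array(size_t n, size_t size)
{
	if (size != 0 && n > SIZE_MAX / size)
		return NULL;

	return malloc(n * size);
}
```

With the open-coded form, a large enough n makes the product wrap and the allocation silently succeeds with a buffer that is too small; the checked form turns that into an ordinary allocation failure.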



[PATCH net-next 3/4] skge: add paren around sizeof arg

2017-08-15 Thread Stephen Hemminger
Signed-off-by: Stephen Hemminger 
---
 drivers/net/ethernet/marvell/skge.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/marvell/skge.c 
b/drivers/net/ethernet/marvell/skge.c
index 5d7d94de4e00..8a835e82256a 100644
--- a/drivers/net/ethernet/marvell/skge.c
+++ b/drivers/net/ethernet/marvell/skge.c
@@ -3516,7 +3516,7 @@ static const char *skge_board_name(const struct skge_hw 
*hw)
if (skge_chips[i].id == hw->chip_id)
return skge_chips[i].name;
 
-   snprintf(buf, sizeof buf, "chipid 0x%x", hw->chip_id);
+   snprintf(buf, sizeof(buf), "chipid 0x%x", hw->chip_id);
return buf;
 }
 
-- 
2.11.0



[PATCH net-next 4/4] mlx4: sizeof style usage

2017-08-15 Thread Stephen Hemminger
The kernel coding style is to treat sizeof as a function
(i.e. with parentheses), not as an operator.

Also use kcalloc and kmalloc_array.

Signed-off-by: Stephen Hemminger 
---
 drivers/net/ethernet/mellanox/mlx4/alloc.c |  2 +-
 drivers/net/ethernet/mellanox/mlx4/cmd.c   |  4 ++--
 drivers/net/ethernet/mellanox/mlx4/en_resources.c  |  2 +-
 drivers/net/ethernet/mellanox/mlx4/en_rx.c |  2 +-
 drivers/net/ethernet/mellanox/mlx4/en_tx.c |  2 +-
 drivers/net/ethernet/mellanox/mlx4/eq.c| 20 +-
 drivers/net/ethernet/mellanox/mlx4/fw.c|  2 +-
 drivers/net/ethernet/mellanox/mlx4/icm.c   |  2 +-
 drivers/net/ethernet/mellanox/mlx4/icm.h   |  4 ++--
 drivers/net/ethernet/mellanox/mlx4/intf.c  |  2 +-
 drivers/net/ethernet/mellanox/mlx4/main.c  | 12 +--
 drivers/net/ethernet/mellanox/mlx4/mcg.c   | 12 +--
 drivers/net/ethernet/mellanox/mlx4/mr.c| 10 -
 drivers/net/ethernet/mellanox/mlx4/qp.c| 12 +--
 .../net/ethernet/mellanox/mlx4/resource_tracker.c  | 24 +++---
 15 files changed, 56 insertions(+), 56 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/alloc.c 
b/drivers/net/ethernet/mellanox/mlx4/alloc.c
index b651c1210555..6dabd983e7e0 100644
--- a/drivers/net/ethernet/mellanox/mlx4/alloc.c
+++ b/drivers/net/ethernet/mellanox/mlx4/alloc.c
@@ -186,7 +186,7 @@ int mlx4_bitmap_init(struct mlx4_bitmap *bitmap, u32 num, 
u32 mask,
bitmap->effective_len = bitmap->avail;
spin_lock_init(&bitmap->lock);
bitmap->table = kzalloc(BITS_TO_LONGS(bitmap->max) *
-   sizeof (long), GFP_KERNEL);
+   sizeof(long), GFP_KERNEL);
if (!bitmap->table)
return -ENOMEM;
 
diff --git a/drivers/net/ethernet/mellanox/mlx4/cmd.c 
b/drivers/net/ethernet/mellanox/mlx4/cmd.c
index 674773b28b2e..97aed30ead21 100644
--- a/drivers/net/ethernet/mellanox/mlx4/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx4/cmd.c
@@ -2637,7 +2637,7 @@ int mlx4_cmd_use_events(struct mlx4_dev *dev)
int err = 0;
 
priv->cmd.context = kmalloc(priv->cmd.max_cmds *
-  sizeof (struct mlx4_cmd_context),
+  sizeof(struct mlx4_cmd_context),
   GFP_KERNEL);
if (!priv->cmd.context)
return -ENOMEM;
@@ -2695,7 +2695,7 @@ struct mlx4_cmd_mailbox *mlx4_alloc_cmd_mailbox(struct 
mlx4_dev *dev)
 {
struct mlx4_cmd_mailbox *mailbox;
 
-   mailbox = kmalloc(sizeof *mailbox, GFP_KERNEL);
+   mailbox = kmalloc(sizeof(*mailbox), GFP_KERNEL);
if (!mailbox)
return ERR_PTR(-ENOMEM);
 
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_resources.c 
b/drivers/net/ethernet/mellanox/mlx4/en_resources.c
index 86d2d42d658d..5a47f9669621 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_resources.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_resources.c
@@ -44,7 +44,7 @@ void mlx4_en_fill_qp_context(struct mlx4_en_priv *priv, int 
size, int stride,
struct mlx4_en_dev *mdev = priv->mdev;
struct net_device *dev = priv->dev;
 
-   memset(context, 0, sizeof *context);
+   memset(context, 0, sizeof(*context));
context->flags = cpu_to_be32(7 << 16 | rss << MLX4_RSS_QPC_FLAG_OFFSET);
context->pd = cpu_to_be32(mdev->priv_pdn);
context->mtu_msgmax = 0xff;
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c 
b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index bf1638044a7a..dcb8f8f84a97 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -1056,7 +1056,7 @@ static int mlx4_en_config_rss_qp(struct mlx4_en_priv 
*priv, int qpn,
}
qp->event = mlx4_en_sqp_event;
 
-   memset(context, 0, sizeof *context);
+   memset(context, 0, sizeof(*context));
mlx4_en_fill_qp_context(priv, ring->actual_size, ring->stride, 0, 0,
qpn, ring->cqn, -1, context);
context->db_rec_addr = cpu_to_be64(ring->wqres.db.dma);
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c 
b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
index 73faa3d77921..bcf422efd3b8 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
@@ -643,7 +643,7 @@ static void build_inline_wqe(struct mlx4_en_tx_desc 
*tx_desc,
 void *fragptr)
 {
struct mlx4_wqe_inline_seg *inl = &tx_desc->inl;
-   int spc = MLX4_INLINE_ALIGN - CTRL_SIZE - sizeof *inl;
+   int spc = MLX4_INLINE_ALIGN - CTRL_SIZE - sizeof(*inl);
unsigned int hlen = skb_headlen(skb);
 
if (skb->len <= spc) {
diff --git a/drivers/net/ethernet/mellanox/mlx4/eq.c 
b/drivers/net/et

[PATCH] uapi: add SPDX identifier to vm_sockets_diag.h

2017-11-24 Thread Stephen Hemminger
New file seems to have missed the SPDX license scan and update.

Signed-off-by: Stephen Hemminger 
---
 include/uapi/linux/vm_sockets_diag.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/uapi/linux/vm_sockets_diag.h 
b/include/uapi/linux/vm_sockets_diag.h
index 14cd7dc5a187..0b4dd54f3d1e 100644
--- a/include/uapi/linux/vm_sockets_diag.h
+++ b/include/uapi/linux/vm_sockets_diag.h
@@ -1,3 +1,4 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
 /* AF_VSOCK sock_diag(7) interface for querying open sockets */
 
 #ifndef _UAPI__VM_SOCKETS_DIAG_H__
-- 
2.11.0



Re: [RFC] virtio-net: help live migrate SR-IOV devices

2017-11-29 Thread Stephen Hemminger
On Wed, 29 Nov 2017 19:51:38 -0800
Jakub Kicinski  wrote:

> On Thu, 30 Nov 2017 11:29:56 +0800, Jason Wang wrote:
> > On 2017年11月29日 03:27, Jesse Brandeburg wrote:  
> > > Hi, I'd like to get some feedback on a proposal to enhance
> > > virtio-net to ease configuration of a VM and that would enable
> > > live migration of passthrough network SR-IOV devices.
> > >
> > > Today we have SR-IOV network devices (VFs) that can be passed
> > > into a VM in order to enable high performance networking direct
> > > within the VM. The problem I am trying to address is that this
> > > configuration is generally difficult to live-migrate.  There is
> > > documentation [1] indicating that some OS/Hypervisor vendors will
> > > support live migration of a system with a direct assigned
> > > networking device.  The problem I see with these implementations
> > > is that the network configuration requirements that are passed on
> > > to the owner of the VM are quite complicated.  You have to set up
> > > bonding, you have to configure it to enslave two interfaces,
> > > those interfaces (one is virtio-net, the other is SR-IOV
> > > device/driver like ixgbevf) must support MAC address changes
> > > requested in the VM, and on and on...
> > >
> > > So, on to the proposal:
> > > Modify virtio-net driver to be a single VM network device that
> > > enslaves an SR-IOV network device (inside the VM) with the same
> > > MAC address. This would cause the virtio-net driver to appear and
> > > work like a simplified bonding/team driver.  The live migration
> > > problem would be solved just like today's bonding solution, but
> > > the VM user's networking config would be greatly simplified.
> > >
> > > At it's simplest, it would appear something like this in the VM.
> > >
> > > ==
> > > = vnet0  =
> > >   =
> > > (virtio- =   |
> > >   net)=   |
> > >   =  ==
> > >   =  = ixgbef =
> > > ==  ==
> > >
> > > (forgive the ASCII art)
> > >
> > > The fast path traffic would prefer the ixgbevf or other SR-IOV
> > > device path, and fall back to virtio's transmit/receive when
> > > migrating.
> > >
> > > Compared to today's options this proposal would
> > > 1) make virtio-net more sticky, allow fast path traffic at SR-IOV
> > > speeds
> > > 2) simplify end user configuration in the VM (most if not all of
> > > the set up to enable migration would be done in the hypervisor)
> > > 3) allow live migration via a simple link down and maybe a PCI
> > > hot-unplug of the SR-IOV device, with failover to the
> > > virtio-net driver core
> > > 4) allow vendor agnostic hardware acceleration, and live migration
> > > between vendors if the VM os has driver support for all the
> > > required SR-IOV devices.
> > >
> > > Runtime operation proposed:
> > > -  virtio-net driver loads, SR-IOV driver loads
> > > - virtio-net finds other NICs that match it's MAC address by
> > >both examining existing interfaces, and sets up a new device
> > > notifier
> > > - virtio-net enslaves the first NIC with the same MAC address
> > > - virtio-net brings up the slave, and makes it the "preferred"
> > > path
> > > - virtio-net follows the behavior of an active backup bond/team
> > > - virtio-net acts as the interface to the VM
> > > - live migration initiates
> > > - link goes down on SR-IOV, or SR-IOV device is removed
> > > - failover to virtio-net as primary path
> > > - migration continues to new host
> > > - new host is started with virio-net as primary
> > > - if no SR-IOV, virtio-net stays primary
> > > - hypervisor can hot-add SR-IOV NIC, with same MAC addr as virtio
> > > - virtio-net notices new NIC and starts over at enslave step above
> > >
> > > Future ideas (brainstorming):
> > > - Optimize Fast east-west by having special rules to direct
> > > east-west traffic through virtio-net traffic path
> > >
> > > Thanks for reading!
> > > Jesse
> > 
> > Cc netdev.
> > 
> > Interesting, and this method is actually used by netvsc now:
> > 
> > commit 0c195567a8f6e82ea5535cd9f1d54a1626dd233e
> > Author: stephen hemminger 
> > Date:   Tue Aug 1 19:58:53 2017 -0700
&

Re: [RFC] virtio-net: help live migrate SR-IOV devices

2017-12-03 Thread Stephen Hemminger
On Sun, 3 Dec 2017 11:14:37 +0200
achiad shochat  wrote:

> On 3 December 2017 at 07:05, Michael S. Tsirkin  wrote:
> > On Fri, Dec 01, 2017 at 12:08:59PM -0800, Shannon Nelson wrote:  
> >> On 11/30/2017 6:11 AM, Michael S. Tsirkin wrote:  
> >> > On Thu, Nov 30, 2017 at 10:08:45AM +0200, achiad shochat wrote:  
> >> > > Re. problem #2:
> >> > > Indeed the best way to address it seems to be to enslave the VF driver
> >> > > netdev under a persistent anchor netdev.
> >> > > And it's indeed desired to allow (but not enforce) PV netdev and VF
> >> > > netdev to work in conjunction.
> >> > > And it's indeed desired that this enslavement logic work out-of-the 
> >> > > box.
> >> > > But in case of PV+VF some configurable policies must be in place (and
> >> > > they'd better be generic rather than differ per PV technology).
> >> > > For example - based on which characteristics should the PV+VF coupling
> >> > > be done? netvsc uses MAC address, but that might not always be the
> >> > > desire.  
> >> >
> >> > It's a policy but not guest userspace policy.
> >> >
> >> > The hypervisor certainly knows.
> >> >
> >> > Are you concerned that someone might want to create two devices with the
> >> > same MAC for an unrelated reason?  If so, hypervisor could easily set a
> >> > flag in the virtio device to say "this is a backup, use MAC to find
> >> > another device".  
> >>
> >> This is something I was going to suggest: a flag or other configuration on
> >> the virtio device to help control how this new feature is used.  I can
> >> imagine this might be useful to control from either the hypervisor side or
> >> the VM side.
> >>
> >> The hypervisor might want to (1) disable it (force it off), (2) enable it
> >> for VM choice, or (3) force it on for the VM.  In case (2), the VM might be
> >> able to chose whether it wants to make use of the feature, or stick with 
> >> the
> >> bonding solution.
> >>
> >> Either way, the kernel is making a feature available, and the user (VM or
> >> hypervisor) is able to control it by selecting the feature based on the
> >> policy desired.
> >>
> >> sln  
> >
> > I'm not sure what's the feature that is available here.
> >
> > I saw this as a flag that says "this device shares backend with another
> > network device which can be found using MAC, and that backend should be
> > preferred".  kernel then forces configuration which uses that other
> > backend - as long as it exists.
> >
> > However, please Cc virtio-dev mailing list if we are doing this since
> > this is a spec extension.
> >
> > --
> > MST  
> 
> 
> Can someone please explain why assume a virtio device is there at all??
> I specified a case where there isn't any.
> 
> I second Jacob - having a netdev of one device driver enslave a netdev
> of another device driver is an awkward a-symmetric model.
> Regardless of whether they share the same backend device.
> Only I am not sure the Linux Bond is the right choice.
> e.g one may well want to use the virtio device also when the
> pass-through device is available, e.g for multicasts, east-west
> traffic, etc.
> I'm not sure the Linux Bond fits that functionality.
> And, as I hear in this thread, it is hard to make it work out of the box.
> So I think the right thing would be to write a new dedicated module
> for this purpose.
> 
> Re policy -
> Indeed the HV can request a policy from the guest but that's not a
> claim for the virtio device enslaving the pass-through device.
> Any policy can be queried by the upper enslaving device.
> 
> Bottom line - I do not see a single reason to have the virtio netdev
> (nor netvsc or any other PV netdev) enslave another netdev by itself.
> If we'd do it right with netvsc from the beginning we wouldn't need
> this discussion at all...

There are several issues with transparent migration.
The first is that the SR-IOV device needs to be shut off earlier
in the migration process.
Next, the SR-IOV device in the migrated guest environment may be different.
It might not exist at all, it might be at a different PCI address, or it
could even be a different vendor/speed/model.
Keeping a virtual network device around allows connectivity to persist
during the process.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [RFC] virtio-net: help live migrate SR-IOV devices

2017-12-05 Thread Stephen Hemminger
On Tue, 5 Dec 2017 14:29:28 -0800
Jakub Kicinski  wrote:

> On Tue, 5 Dec 2017 11:59:17 +0200, achiad shochat wrote:
> >  I second Jacob - having a netdev of one device driver enslave a netdev
> >  of another device driver is an awkward a-symmetric model.
> >  Regardless of whether they share the same backend device.
> >  Only I am not sure the Linux Bond is the right choice.
> >  e.g one may well want to use the virtio device also when the
> >  pass-through device is available, e.g for multicasts, east-west
> >  traffic, etc.
> >  I'm not sure the Linux Bond fits that functionality.
> >  And, as I hear in this thread, it is hard to make it work out of the 
> >  box.
> >  So I think the right thing would be to write a new dedicated module
> >  for this purpose.
> > >
> > > This part I can sort of agree with. What if we were to look at
> > > providing a way to somehow advertise that the two devices were meant
> > > > to be bonded for virtualization purposes? For now let's call it a
> > > "virt-bond". Basically we could look at providing a means for virtio
> > > and VF drivers to advertise that they want this sort of bond. Then it
> > > would just be a matter of providing some sort of side channel to
> > > indicate where you want things like multicast/broadcast/east-west
> > > traffic to go.  
> > 
> > I like this approach.  
> 
> +1 on a separate driver, just enslaving devices to virtio may break
> existing setups.  If people are bonding from user space today, if they
> update their kernel it may surprise them how things get auto-mangled.
> 
> Is what Alex is suggesting a separate PV device that says "I would
> like to be a bond of those two interfaces"?  That would make the HV
> intent explicit and kernel decisions more understandable.

So far, in my experience it still works.
As long as the kernel slaving happens first, it will work.
The attempt to bond an already slaved device will fail and no scripts seem
to check the error return.
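
As an illustration of that failure mode, here is a minimal user-space sketch
in C. It is not kernel code: sk_netdev and sk_enslave() are hypothetical
names, and the only behaviour assumed is the one described above, namely that
a device which already has a master rejects a second enslave attempt with an
error that the caller is expected to check.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a network device; not the kernel's net_device. */
struct sk_netdev {
    const char *name;
    struct sk_netdev *master;   /* NULL while the device is not enslaved */
};

/*
 * Attach slave to master. Returns 0 on success, or -EBUSY if the slave
 * already has a master (for example because the kernel got there first).
 */
static int sk_enslave(struct sk_netdev *master, struct sk_netdev *slave)
{
    if (slave->master != NULL)
        return -EBUSY;
    slave->master = master;
    return 0;
}
```

A bonding script that runs after the kernel has already enslaved the VF
would get the -EBUSY result here, and the existing master stays in place
whether or not the script looks at the return value.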


Re: [RFC PATCH] virtio_net: Extend virtio to use VF datapath when available

2017-12-19 Thread Stephen Hemminger
On Tue, 19 Dec 2017 09:41:39 -0800
"Samudrala, Sridhar"  wrote:

> On 12/19/2017 7:47 AM, Michael S. Tsirkin wrote:
> > I'll need to look at this more, in particular the feature
> > bit is missing here. For now one question:
> >
> > On Mon, Dec 18, 2017 at 04:40:36PM -0800, Sridhar Samudrala wrote:  
> >> @@ -56,6 +58,8 @@ module_param(napi_tx, bool, 0644);
> >>*/
> >>   DECLARE_EWMA(pkt_len, 0, 64)
> >>   
> >> +#define VF_TAKEOVER_INT   (HZ / 10)
> >> +
> >>   #define VIRTNET_DRIVER_VERSION "1.0.0"
> >>   
> >>   static const unsigned long guest_offloads[] = {  
> > Why is this delay necessary? And why by 100ms?  
> 
> This is based on netvsc implementation and here is the commit that
> added this delay.  Not sure if this needs to be 100ms.
> 
> commit 6123c66854c174e4982f98195100c1d990f9e5e6
> Author: stephen hemminger 
> Date:   Wed Aug 9 17:46:03 2017 -0700
> 
>      netvsc: delay setup of VF device
> 
>      When VF device is discovered, delay bring it automatically up in
>      order to allow userspace to some simple changes (like renaming).
> 
> 
> 

could be 10ms, just enough to let udev do its renaming
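
For reference, the arithmetic behind the two values can be sketched in plain
C. The helper names are hypothetical and hz stands in for the kernel's HZ;
the point is only that HZ / 10 jiffies is 100 ms at any tick rate divisible
by 10, while a 10 ms delay written as HZ / 100 truncates on tick rates that
are not a multiple of 100.

```c
/* Convert a jiffies count to milliseconds for a given tick rate (hz). */
static unsigned int sk_jiffies_to_ms(unsigned int jiffies, unsigned int hz)
{
    return (jiffies * 1000u) / hz;
}

/* HZ / 10 jiffies, as in VF_TAKEOVER_INT from the quoted patch. */
static unsigned int sk_takeover_delay_ms(unsigned int hz)
{
    return sk_jiffies_to_ms(hz / 10, hz);
}

/* HZ / 100 jiffies, one way of writing the 10 ms alternative. */
static unsigned int sk_short_delay_ms(unsigned int hz)
{
    return sk_jiffies_to_ms(hz / 100, hz);
}
```

With hz = 250 the shorter delay comes out as 2 jiffies, which is 8 ms
rather than 10 ms, because of the integer division.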

Re: [RFC PATCH] virtio_net: Extend virtio to use VF datapath when available

2017-12-19 Thread Stephen Hemminger
On Tue, 19 Dec 2017 20:07:01 +0200
"Michael S. Tsirkin"  wrote:

> On Tue, Dec 19, 2017 at 09:55:48AM -0800, Stephen Hemminger wrote:
> > On Tue, 19 Dec 2017 09:41:39 -0800
> > "Samudrala, Sridhar"  wrote:
> >   
> > > On 12/19/2017 7:47 AM, Michael S. Tsirkin wrote:  
> > > > I'll need to look at this more, in particular the feature
> > > > bit is missing here. For now one question:
> > > >
> > > > On Mon, Dec 18, 2017 at 04:40:36PM -0800, Sridhar Samudrala wrote:
> > > >> @@ -56,6 +58,8 @@ module_param(napi_tx, bool, 0644);
> > > >>*/
> > > >>   DECLARE_EWMA(pkt_len, 0, 64)
> > > >>   
> > > >> +#define VF_TAKEOVER_INT   (HZ / 10)
> > > >> +
> > > >>   #define VIRTNET_DRIVER_VERSION "1.0.0"
> > > >>   
> > > >>   static const unsigned long guest_offloads[] = {
> > > > Why is this delay necessary? And why by 100ms?
> > > 
> > > This is based on netvsc implementation and here is the commit that
> > > added this delay.  Not sure if this needs to be 100ms.
> > > 
> > > commit 6123c66854c174e4982f98195100c1d990f9e5e6
> > > Author: stephen hemminger 
> > > Date:   Wed Aug 9 17:46:03 2017 -0700
> > > 
> > >      netvsc: delay setup of VF device
> > > 
> > >      When VF device is discovered, delay bring it automatically up in
> > >      order to allow userspace to some simple changes (like renaming).
> > > 
> > > 
> > >   
> > 
> > could be 10ms, just enough to let udev do its renaming  
> 
> Isn't there a way not to depend on udev completing its thing within a given 
> timeframe?

Not that I know of. The path is quite indirect.

Re: [RFC PATCH] virtio_net: Extend virtio to use VF datapath when available

2017-12-19 Thread Stephen Hemminger
On Tue, 19 Dec 2017 13:21:17 -0500 (EST)
David Miller  wrote:

> From: Stephen Hemminger 
> Date: Tue, 19 Dec 2017 09:55:48 -0800
> 
> > could be 10ms, just enough to let udev do its renaming  
> 
> Please, move to some kind of notification or event based handling of
> this problem.
> 
> No delay is safe, what if userspace gets swapped out or whatever
> else might make userspace stall unexpectedly?
> 

The plan is to remove the delay and do the naming in the kernel.
This was suggested by Lennart since udev is only doing naming policy
because kernel names were not repeatable.

This makes the VF show up as "ethN_vf" on Hyper-V which is user friendly.

Patch is pending.
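
A rough sketch of what deriving such a name could look like, in user-space C.
This is not the pending patch: sk_vf_name() is a hypothetical helper, and it
assumes the VF name is simply the master's name with a "_vf" suffix, which is
one reading of "ethN_vf" above. The buffer size matches the kernel's
IFNAMSIZ (16 bytes including the terminating NUL).

```c
#include <stdio.h>

#define SK_IFNAMSIZ 16  /* same value as the kernel's IFNAMSIZ */

/*
 * Build "<master>_vf" into out. Returns 0 on success, or -1 if the
 * result would not fit into an interface name.
 */
static int sk_vf_name(char out[SK_IFNAMSIZ], const char *master)
{
    int n = snprintf(out, SK_IFNAMSIZ, "%s_vf", master);

    if (n < 0 || n >= SK_IFNAMSIZ)
        return -1;
    return 0;
}
```

The length check matters because a long master name plus the suffix can
exceed the interface name limit.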


Re: [RFC PATCH] virtio_net: Extend virtio to use VF datapath when available

2017-12-19 Thread Stephen Hemminger
On Tue, 19 Dec 2017 11:42:33 -0800
"Samudrala, Sridhar"  wrote:

> On 12/19/2017 10:41 AM, Stephen Hemminger wrote:
> > On Tue, 19 Dec 2017 13:21:17 -0500 (EST)
> > David Miller  wrote:
> >  
> >> From: Stephen Hemminger 
> >> Date: Tue, 19 Dec 2017 09:55:48 -0800
> >>  
> >>> could be 10ms, just enough to let udev do its renaming  
> >> Please, move to some kind of notification or event based handling of
> >> this problem.
> >>
> >> No delay is safe, what if userspace gets swapped out or whatever
> >> else might make userspace stall unexpectedly?
> >>  
> > The plan is to remove the delay and do the naming in the kernel.
> > This was suggested by Lennart since udev is only doing naming policy
> > because kernel names were not repeatable.
> >
> > This makes the VF show up as "ethN_vf" on Hyper-V which is user friendly.
> >
> > Patch is pending.  
> Do we really need to delay the setup until the name is changed?
> Can't we call dev_set_mtu() and dev_open() until dev_change_name() is done?
> 
> Thanks
> Sridhar

You can call dev_set_mtu(), but once dev_open() has been called the device
name cannot be changed by userspace.
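
The ordering constraint can be modelled in a few lines of user-space C. The
sk_dev structure and its helpers are hypothetical; the model assumes, as
described above for kernels of this period, that a rename is refused with
-EBUSY while the interface is up, whereas an MTU change has no such
restriction.

```c
#include <errno.h>
#include <stdio.h>

#define SK_NAME_LEN 16    /* same size as an interface name buffer */
#define SK_IFF_UP   0x1   /* same bit value as IFF_UP */

struct sk_dev {
    char name[SK_NAME_LEN];
    unsigned int flags;
    unsigned int mtu;
};

/* Changing the MTU does not depend on the up/down state in this model. */
static void sk_set_mtu(struct sk_dev *dev, unsigned int mtu)
{
    dev->mtu = mtu;
}

/* Bringing the device up sets the flag that blocks later renames. */
static void sk_open(struct sk_dev *dev)
{
    dev->flags |= SK_IFF_UP;
}

/* Rename the device; refused with -EBUSY once the device is up. */
static int sk_change_name(struct sk_dev *dev, const char *newname)
{
    if (dev->flags & SK_IFF_UP)
        return -EBUSY;
    snprintf(dev->name, sizeof(dev->name), "%s", newname);
    return 0;
}
```

So the MTU can be set at any point, but the rename has to happen before the
open, which is why an automatic open races with udev.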


Re: [RFC PATCH] virtio_net: Extend virtio to use VF datapath when available

2017-12-19 Thread Stephen Hemminger
On Tue, 19 Dec 2017 14:37:50 -0800
"Samudrala, Sridhar"  wrote:

> On 12/19/2017 11:46 AM, Stephen Hemminger wrote:
> > On Tue, 19 Dec 2017 11:42:33 -0800
> > "Samudrala, Sridhar"  wrote:
> >  
> >> On 12/19/2017 10:41 AM, Stephen Hemminger wrote:  
> >>> On Tue, 19 Dec 2017 13:21:17 -0500 (EST)
> >>> David Miller  wrote:
> >>> 
> >>>> From: Stephen Hemminger 
> >>>> Date: Tue, 19 Dec 2017 09:55:48 -0800
> >>>> 
> >>>>> could be 10ms, just enough to let udev do its renaming  
> >>>> Please, move to some kind of notification or event based handling of
> >>>> this problem.
> >>>>
> >>>> No delay is safe, what if userspace gets swapped out or whatever
> >>>> else might make userspace stall unexpectedly?
> >>>> 
> >>> The plan is to remove the delay and do the naming in the kernel.
> >>> This was suggested by Lennart since udev is only doing naming policy
> >>> because kernel names were not repeatable.
> >>>
> >>> This makes the VF show up as "ethN_vf" on Hyper-V which is user friendly.
> >>>
> >>> Patch is pending.  
> >> Do we really need to delay the setup until the name is changed?
> >> Can't we call dev_set_mtu() and dev_open() until dev_change_name() is done?
> >>
> >> Thanks
> >> Sridhar  
> > You can call dev_set_mtu, but when dev_open is done the device name
> > can not be changed by userspace.  
> I did a quick test to remove the delay and also the dev_open() call and 
> i don't see
> any issues with virtio taking over the VF datapath.
> Only the netdev_info() messages may show old device name.
> 
> Any specific scenario where we need to explicitly call  VF's dev_open() 
> in the VF setup process?
> I tried i40evf driver loaded after virtio_net  AND  virtio_net loading 
> after i40evf.
> 
> Thanks
> Sridhar

It happens with hotplug. It is possible on Hyper-V to hotplug SR-IOV on
and off while the guest is running. If SR-IOV is disabled in the host then
the VF device is removed (hot-unplugged), and if it is enabled again the VF
device is hot-added. If the master device is up then the VF device should
be brought up by the master device.
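
A small user-space model of that rule, written in C. All names (hp_dev,
hp_master, hp_vf_added() and so on) are hypothetical, and this is only a
sketch of the behaviour described above, not the netvsc or virtio_net code:
when a VF is hot-added and the master is already up, the master brings the
VF up; when the VF goes away, traffic falls back to the master's own path.

```c
#include <stdbool.h>
#include <stddef.h>

struct hp_dev {
    bool up;
};

struct hp_master {
    struct hp_dev self;     /* the paravirtual device itself */
    struct hp_dev *vf;      /* NULL while no VF is present */
};

/* VF hot-added: attach it and bring it up if the master is up. */
static void hp_vf_added(struct hp_master *m, struct hp_dev *vf)
{
    m->vf = vf;
    if (m->self.up)
        vf->up = true;
}

/* VF hot-removed: detach it. */
static void hp_vf_removed(struct hp_master *m)
{
    m->vf = NULL;
}

/* Transmit path: prefer the VF when it is present and up. */
static const struct hp_dev *hp_tx_dev(const struct hp_master *m)
{
    if (m->vf != NULL && m->vf->up)
        return m->vf;
    return &m->self;
}
```

Without the bring-up step in hp_vf_added(), a VF that appears after the
master is already up would stay down and never carry traffic in this model.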

Re: [PATCH net-next] virtio_net: Add ethtool stats

2017-12-24 Thread Stephen Hemminger
On Wed, 20 Dec 2017 13:40:37 +0900
Toshiaki Makita  wrote:

> +
> +static const struct virtnet_gstats virtnet_gstrings_stats[] = {
> + { "rx_packets", VIRTNET_NETDEV_STAT(rx_packets) },
> + { "tx_packets", VIRTNET_NETDEV_STAT(tx_packets) },
> + { "rx_bytes",   VIRTNET_NETDEV_STAT(rx_bytes) },
> + { "tx_bytes",   VIRTNET_NETDEV_STAT(tx_bytes) },
> + { "rx_dropped", VIRTNET_NETDEV_STAT(rx_dropped) },
> + { "rx_length_errors",   VIRTNET_NETDEV_STAT(rx_length_errors) },
> + { "rx_frame_errors",VIRTNET_NETDEV_STAT(rx_frame_errors) },
> + { "tx_dropped", VIRTNET_NETDEV_STAT(tx_dropped) },
> + { "tx_fifo_errors", VIRTNET_NETDEV_STAT(tx_fifo_errors) },
> +};
> +

Please do not merge pre-existing global stats into ethtool.
It just duplicates existing functionality.


Re: [virtio-dev] [RFC PATCH net-next v2 1/2] virtio_net: Introduce VIRTIO_NET_F_BACKUP feature bit

2018-01-22 Thread Stephen Hemminger
On Mon, 22 Jan 2018 15:27:40 -0800
"Samudrala, Sridhar"  wrote:

> On 1/22/2018 1:31 PM, Michael S. Tsirkin wrote:
> > On Wed, Jan 17, 2018 at 01:49:58PM -0800, Alexander Duyck wrote:  
> >> On Wed, Jan 17, 2018 at 11:57 AM, Michael S. Tsirkin  
> >> wrote:  
> >>> On Wed, Jan 17, 2018 at 11:25:41AM -0800, Samudrala, Sridhar wrote:  
> 
>  On 1/17/2018 11:02 AM, Michael S. Tsirkin wrote:  
> > On Wed, Jan 17, 2018 at 10:15:52AM -0800, Alexander Duyck wrote:  
> >> On Thu, Jan 11, 2018 at 9:58 PM, Sridhar Samudrala
> >>  wrote:  
> >>> This feature bit can be used by hypervisor to indicate virtio_net 
> >>> device to
> >>> act as a backup for another device with the same MAC address.
> >>>
> >>> Signed-off-by: Sridhar Samudrala 
> >>> ---
> >>>drivers/net/virtio_net.c| 2 +-
> >>>include/uapi/linux/virtio_net.h | 3 +++
> >>>2 files changed, 4 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> >>> index 12dfc5fee58e..f149a160a8c5 100644
> >>> --- a/drivers/net/virtio_net.c
> >>> +++ b/drivers/net/virtio_net.c
> >>> @@ -2829,7 +2829,7 @@ static struct virtio_device_id id_table[] = {
> >>>   VIRTIO_NET_F_GUEST_ANNOUNCE, VIRTIO_NET_F_MQ, \
> >>>   VIRTIO_NET_F_CTRL_MAC_ADDR, \
> >>>   VIRTIO_NET_F_MTU, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS, \
> >>> -   VIRTIO_NET_F_SPEED_DUPLEX
> >>> +   VIRTIO_NET_F_SPEED_DUPLEX, VIRTIO_NET_F_BACKUP
> >>>
> >>>static unsigned int features[] = {
> >>>   VIRTNET_FEATURES,
> >>> diff --git a/include/uapi/linux/virtio_net.h 
> >>> b/include/uapi/linux/virtio_net.h
> >>> index 5de6ed37695b..c7c35fd1a5ed 100644
> >>> --- a/include/uapi/linux/virtio_net.h
> >>> +++ b/include/uapi/linux/virtio_net.h
> >>> @@ -57,6 +57,9 @@
> >>>* Steering */
> >>>#define VIRTIO_NET_F_CTRL_MAC_ADDR 23  /* Set MAC address */
> >>>
> >>> +#define VIRTIO_NET_F_BACKUP  62/* Act as backup for another 
> >>> device
> >>> +* with the same MAC.
> >>> +*/
> >>>#define VIRTIO_NET_F_SPEED_DUPLEX 63   /* Device set linkspeed and 
> >>> duplex */
> >>>
> >>>#ifndef VIRTIO_NET_NO_LEGACY  
> >> I'm not a huge fan of the name "backup" since that implies that the
> >> Virtio interface is only used if the VF is not present, and there are
> >> multiple instances such as dealing with east/west or
> >> broadcast/multicast traffic where it may be desirable to use the
> >> para-virtual interface rather then deal with PCI overhead/bottleneck
> >> to send the packet.  
> > Right now hypervisors mostly expect that yes, only one at a time is
> > used.  E.g. if you try to do multicast sending packets on both VF and
> > virtio then you will end up with two copies of each packet.  
>  I think we want to use only 1 interface to  send out any packet. In case 
>  of
>  broadcast/multicasts it would be an optimization to send them via virtio 
>  and
>  this patch series adds that optimization.  
> >>> Right that's what I think we should rather avoid for now.
> >>>
> >>> It's *not* an optimization if there's a single VM on this host,
> >>> or if a specific multicast group does not have any VMs on same
> >>> host.  
> >> Agreed. In my mind this is something that is controlled by the
> >> pass-thru interface once it is enslaved.  
> > It would be pretty tricky to control through the PT
> > interface since a PT interface pretends to be a physical
> > device, which has no concept of VMs.
> >  
> >>> I'd rather we just sent everything out on the PT if that's
> >>> there. The reason we have virtio in the picture is just so
> >>> we can migrate without downtime.  
> >> I wasn't saying we do that in all cases. That would be something that
> >> would have to be decided by the pass-thru interface. Ideally the
> >> virtio would provide just enough information to get itself into the
> >> bond and I see this being the mechanism for it to do so. From there
> >> the complexity mostly lies in the pass-thru interface to configure the
> >> correct transmit modes if for example you have multiple pass-thru
> >> interfaces or a more complex traffic setup due to things like
> >> SwitchDev.
> >>
> >> In my mind we go the bonding route and there are few use cases for all
> >> of this. First is the backup case that is being addressed here. That
> >> becomes your basic "copy netvsc" approach for this which would be
> >> default. It is how we would handle basic pass-thru back-up paths. If
> >> the host decides to send multicast/broadcast traffic from the host up
> >> through it that is a host side decision. I am okay with our default
> >> transmit behavior from the guest being 

RE: [PATCH 05/14] netvsc: remove no longer needed receive staging buffers

2017-02-06 Thread Stephen Hemminger via Virtualization
The netvsc part is already in net-next.  This patch is not needed.
The part that removes the per-channel state can be in another patch.



RE: [PATCH 1/2] hyperv: implement hv_get_tsc_page()

2017-02-09 Thread Stephen Hemminger via Virtualization
The actual code looks fine, but the style police will not like you.
The { should be at the start of the line on functions,
and #else should be at the start of the line.

But maybe this was just Exchange mangling the mail.

-Original Message-
From: Vitaly Kuznetsov [mailto:vkuzn...@redhat.com] 
Sent: Thursday, February 9, 2017 6:11 AM
To: x...@kernel.org; Andy Lutomirski 
Cc: Thomas Gleixner ; Ingo Molnar ; H. 
Peter Anvin ; KY Srinivasan ; Haiyang Zhang 
; Stephen Hemminger ; Dexuan 
Cui ; linux-ker...@vger.kernel.org; 
de...@linuxdriverproject.org; virtualization@lists.linux-foundation.org
Subject: [PATCH 1/2] hyperv: implement hv_get_tsc_page()

To use Hyper-V TSC page clocksource from vDSO we need to make tsc_pg available. 
Implement hv_get_tsc_page() and add CONFIG_HYPERV_TSCPAGE to make #ifdef-s 
simple.

Signed-off-by: Vitaly Kuznetsov 
---
 arch/x86/hyperv/hv_init.c   | 9 +++--
 arch/x86/include/asm/mshyperv.h | 8 
 drivers/hv/Kconfig  | 3 +++
 3 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index b371d0e..0ce8485 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -27,10 +27,15 @@
 #include 
 
 
-#ifdef CONFIG_X86_64
+#ifdef CONFIG_HYPERV_TSCPAGE
 
 static struct ms_hyperv_tsc_page *tsc_pg;
 
+struct ms_hyperv_tsc_page *hv_get_tsc_page(void) {
+   return tsc_pg;
+}
+
 static u64 read_hv_clock_tsc(struct clocksource *arg)  {
u64 current_tick;
@@ -136,7 +141,7 @@ void hyperv_init(void)
/*
 * Register Hyper-V specific clocksource.
 */
-#ifdef CONFIG_X86_64
+#ifdef CONFIG_HYPERV_TSCPAGE
if (ms_hyperv.features & HV_X64_MSR_REFERENCE_TSC_AVAILABLE) {
union hv_x64_msr_hypercall_contents tsc_msr;
 
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index f8dc370..14dd92c 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -173,4 +173,12 @@ void hyperv_report_panic(struct pt_regs *regs);  bool 
hv_is_hypercall_page_setup(void);  void hyperv_cleanup(void);  #endif
+#ifdef CONFIG_HYPERV_TSCPAGE
+struct ms_hyperv_tsc_page *hv_get_tsc_page(void); #else static inline 
+struct ms_hyperv_tsc_page *hv_get_tsc_page(void) {
+   return NULL;
+}
+#endif
 #endif
diff --git a/drivers/hv/Kconfig b/drivers/hv/Kconfig
index 0403b51..c29cd53 100644
--- a/drivers/hv/Kconfig
+++ b/drivers/hv/Kconfig
@@ -7,6 +7,9 @@ config HYPERV
  Select this option to run Linux as a Hyper-V client operating
  system.
 
+config HYPERV_TSCPAGE
+   def_bool HYPERV && X86_64
+
 config HYPERV_UTILS
tristate "Microsoft Hyper-V Utilities driver"
depends on HYPERV && CONNECTOR && NLS
--
2.9.3



RE: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method

2017-02-09 Thread Stephen Hemminger via Virtualization
Why not use the existing seqlocks?

-Original Message-
From: Thomas Gleixner [mailto:t...@linutronix.de] 
Sent: Thursday, February 9, 2017 9:08 AM
To: Vitaly Kuznetsov 
Cc: x...@kernel.org; Andy Lutomirski ; Ingo Molnar 
; H. Peter Anvin ; KY Srinivasan 
; Haiyang Zhang ; Stephen Hemminger 
; Dexuan Cui ; 
linux-ker...@vger.kernel.org; de...@linuxdriverproject.org; 
virtualization@lists.linux-foundation.org
Subject: Re: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method

On Thu, 9 Feb 2017, Vitaly Kuznetsov wrote:
> +#ifdef CONFIG_HYPERV_TSCPAGE
> +static notrace u64 vread_hvclock(int *mode) {
> + const struct ms_hyperv_tsc_page *tsc_pg =
> + (const struct ms_hyperv_tsc_page *)&hvclock_page;
> + u64 sequence, scale, offset, current_tick, cur_tsc;
> +
> + while (1) {
> + sequence = READ_ONCE(tsc_pg->tsc_sequence);
> + if (!sequence)
> + break;
> +
> + scale = READ_ONCE(tsc_pg->tsc_scale);
> + offset = READ_ONCE(tsc_pg->tsc_offset);
> + rdtscll(cur_tsc);
> +
> + current_tick = mul_u64_u64_shr(cur_tsc, scale, 64) + offset;
> +
> + if (READ_ONCE(tsc_pg->tsc_sequence) == sequence)
> + return current_tick;

That sequence stuff lacks still a sensible explanation. It's fundamentally 
different from the sequence counting we do in the kernel, so documentation for 
it is really required.
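
The loop quoted above can be restated as a stand-alone sketch, which may help
with the documentation being asked for. Everything named sk_* is
hypothetical: the field names follow the quoted patch, the field widths are
an assumption, and unsigned __int128 is a GCC/Clang extension used here in
place of mul_u64_u64_shr(). The TSC read is passed in as a function pointer
so the sketch can run without rdtsc, and memory barriers are left out, so
this shows the control flow only. Going by the quoted code, a sequence of 0
means the page is not usable and the caller has to fall back to another
clock source, whereas the kernel's seqcount uses an odd value to mark a
write in progress.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of the three fields used in the quoted patch. */
struct sk_tsc_page {
    volatile uint32_t tsc_sequence;
    volatile uint64_t tsc_scale;
    volatile int64_t  tsc_offset;
};

/* (a * b) >> 64, the same arithmetic as mul_u64_u64_shr(a, b, 64). */
static uint64_t sk_mul_shr64(uint64_t a, uint64_t b)
{
    return (uint64_t)(((unsigned __int128)a * b) >> 64);
}

/* Stub TSC source with a fixed value, only for exercising the sketch. */
static uint64_t sk_fake_tsc(void)
{
    return 1000;
}

/*
 * Returns true and stores the time in *out when a consistent snapshot
 * was read. Returns false when the sequence is 0, in which case the
 * caller must use a different clock source.
 */
static bool sk_read_hvclock(const struct sk_tsc_page *pg,
                            uint64_t (*read_tsc)(void), uint64_t *out)
{
    for (;;) {
        uint32_t seq = pg->tsc_sequence;
        uint64_t scale, tsc, now;
        int64_t offset;

        if (seq == 0)
            return false;

        scale = pg->tsc_scale;
        offset = pg->tsc_offset;
        tsc = read_tsc();
        now = sk_mul_shr64(tsc, scale) + (uint64_t)offset;

        /* Retry if the page changed while it was being read. */
        if (pg->tsc_sequence == seq) {
            *out = now;
            return true;
        }
    }
}
```

The 0 check is the part that a generic seqcount reader does not have, since
there a changed counter only ever means "retry".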

Thanks,

tglx


RE: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method

2017-02-10 Thread Stephen Hemminger via Virtualization
Since the sequence count algorithm is implemented by the hypervisor, it is
better not to reuse seqcount.
I am still concerned that the code is racy.

-Original Message-
From: Thomas Gleixner [mailto:t...@linutronix.de] 
Sent: Friday, February 10, 2017 4:28 AM
To: Vitaly Kuznetsov 
Cc: Stephen Hemminger ; x...@kernel.org; Andy 
Lutomirski ; Ingo Molnar ; H. Peter 
Anvin ; KY Srinivasan ; Haiyang Zhang 
; Dexuan Cui ; 
linux-ker...@vger.kernel.org; de...@linuxdriverproject.org; 
virtualization@lists.linux-foundation.org
Subject: Re: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read method

On Fri, 10 Feb 2017, Vitaly Kuznetsov wrote:

> Stephen Hemminger  writes:
> 
> > Why not use existing seqlock's?
> >
> 
> To be honest I don't quite understand how we could use it -- the 
> sequence locking here is done against the page updated by the 
> hypervisor, we're not creating new structures (so I don't understand 
> how we could use struct seqcount which we don't have) but I may be 
> misunderstanding something.

You can't use seqlock, but you might be able to use seqcount. Though I doubt it 
given the 0 check 

Thanks,

tglx


RE: [PATCH net-next 4/4] mlx4: sizeof style usage

2017-08-20 Thread Stephen Hemminger via Virtualization
Yes, good catch.

-Original Message-
From: Tariq Toukan [mailto:tar...@mellanox.com] 
Sent: Sunday, August 20, 2017 3:27 AM
To: Stephen Hemminger ; mlind...@marvell.com; 
m...@redhat.com; jasow...@redhat.com
Cc: net...@vger.kernel.org; linux-r...@vger.kernel.org; 
virtualization@lists.linux-foundation.org; Stephen Hemminger 

Subject: Re: [PATCH net-next 4/4] mlx4: sizeof style usage


Thanks Stephen.
Sorry for the late reply, I was on vacation.
I know this is already accepted, but still I have one comment.

On 15/08/2017 8:29 PM, Stephen Hemminger wrote:
> The kernel coding style is to treat sizeof as a function
> (ie. with parenthesis) not as an operator.
>
> Also use kcalloc and kmalloc_array
>
> Signed-off-by: Stephen Hemminger 
> ---
> @@ -726,7 +726,7 @@ static int mlx4_eq_int(struct mlx4_dev *dev, struct 
> mlx4_eq *eq)
>   }
>   memcpy(&priv->mfunc.master.comm_arm_bit_vector,
>  eqe->event.comm_channel_arm.bit_vec,
> -sizeof eqe->event.comm_channel_arm.bit_vec);
> +sizeof(eqe)->event.comm_channel_arm.bit_vec);

I think the brackets here are misplaced.
Shouldn't they be as follows?

sizeof(eqe->event.comm_channel_arm.bit_vec));

>   queue_work(priv->mfunc.master.comm_wq,
>  &priv->mfunc.master.comm_work);
>   break;
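
For anyone wondering why the misplaced form compiled at all, here is a small
stand-alone C sketch. The structure and function names are hypothetical and
much simpler than the mlx4 ones. Because eqe is an expression and not a type
name, the compiler parses sizeof(eqe)->member as sizeof applied to the whole
postfix expression (eqe)->member, so both spellings yield the same size; the
misplaced parenthesis is misleading to read rather than a change in
behaviour.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for the event queue entry. */
struct sk_eqe {
    struct {
        unsigned int bit_vec[8];
    } arm;
};

/* Parenthesis around the whole operand, as suggested in the review. */
static size_t sk_size_clear(const struct sk_eqe *eqe)
{
    return sizeof(eqe->arm.bit_vec);
}

/* Parenthesis around the pointer only, as in the quoted hunk. */
static size_t sk_size_misplaced(const struct sk_eqe *eqe)
{
    return sizeof(eqe)->arm.bit_vec;
}

/* What the misplaced form looks like it means: the size of the pointer. */
static size_t sk_size_pointer(const struct sk_eqe *eqe)
{
    return sizeof(eqe);
}
```

The first two functions return the size of the array, and only the third
returns the size of a pointer.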

Thanks,
Tariq

