Re: [PATCH net-next 3/6] net: remove dsa.h include from linux/netdevice.h

2015-10-08 Thread Wei Xu


On 10/8/2015 10:18 AM, Jiri Pirko wrote:
> Thu, Oct 08, 2015 at 11:04:48AM CEST, l...@intel.com wrote:
>> Hi Vivien,
>>
>> [auto build test ERROR on net-next/master -- if it's inappropriate base, 
>> please ignore]
>>
>> config: arm64-allyesconfig (attached as .config)
>> reproduce:
>>wget 
>> https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross
>>  -O ~/bin/make.cross
>>chmod +x ~/bin/make.cross
>># save the attached .config to linux build tree
>>make.cross ARCH=arm64 
>>
>> All errors (new ones prefixed by >>):
>>
>>   In file included from drivers/net/ethernet/hisilicon/hns/hnae.c:15:0:
>> drivers/net/ethernet/hisilicon/hns/hnae.h:465:2: error: unknown type name 'phy_interface_t'
>>  phy_interface_t phy_if;
>>  ^
>>
>> vim +/phy_interface_t +465 drivers/net/ethernet/hisilicon/hns/hnae.h
> 
> 

Hi Jiri,

> Looks like hnae.c needs to do "#include " directly.
> Cc'ing maintainer.
> 

Thanks!
We will send the fix patch soon.
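
For reference, a minimal sketch of what such a fix would presumably look like,
assuming phy_interface_t comes from linux/phy.h as the build error suggests:

/* in drivers/net/ethernet/hisilicon/hns/hnae.h: include the header that
 * defines phy_interface_t directly, instead of relying on an indirect
 * include via linux/netdevice.h (assumption based on the error above) */
#include <linux/phy.h>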

Best Regards,
Wei

> 
>>
>> 6fe6611f huangdaode 2015-09-17  449  struct hnae_ae_dev {
>> 6fe6611f huangdaode 2015-09-17  450  struct device cls_dev; /* the 
>> class dev */
>> 6fe6611f huangdaode 2015-09-17  451  struct device *dev; /* the 
>> presented dev */
>> 6fe6611f huangdaode 2015-09-17  452  struct hnae_ae_ops *ops;
>> 6fe6611f huangdaode 2015-09-17  453  struct list_head node;
>> 6fe6611f huangdaode 2015-09-17  454  struct module *owner; /* the 
>> module who provides this dev */
>> 6fe6611f huangdaode 2015-09-17  455  int id;
>> 6fe6611f huangdaode 2015-09-17  456  char name[AE_NAME_SIZE];
>> 6fe6611f huangdaode 2015-09-17  457  struct list_head handle_list;
>> 6fe6611f huangdaode 2015-09-17  458  spinlock_t lock; /* lock to 
>> protect the handle_list */
>> 6fe6611f huangdaode 2015-09-17  459  };
>> 6fe6611f huangdaode 2015-09-17  460  
>> 6fe6611f huangdaode 2015-09-17  461  struct hnae_handle {
>> 6fe6611f huangdaode 2015-09-17  462  struct device *owner_dev; /* 
>> the device which make use of this handle */
>> 6fe6611f huangdaode 2015-09-17  463  struct hnae_ae_dev *dev;  /* 
>> the device who provides this handle */
>> 6fe6611f huangdaode 2015-09-17  464  struct device_node *phy_node;
>> 6fe6611f huangdaode 2015-09-17 @465  phy_interface_t phy_if;
>> 6fe6611f huangdaode 2015-09-17  466  u32 if_support;
>> 6fe6611f huangdaode 2015-09-17  467  int q_num;
>> 6fe6611f huangdaode 2015-09-17  468  int vf_id;
>> 6fe6611f huangdaode 2015-09-17  469  u32 eport_id;
>> 6fe6611f huangdaode 2015-09-17  470  enum hnae_port_type port_type;
>> 6fe6611f huangdaode 2015-09-17  471  struct list_head node;/* 
>> list to hnae_ae_dev->handle_list */
>> 6fe6611f huangdaode 2015-09-17  472  struct hnae_buf_ops *bops; /* 
>> operation for the buffer */
>> 6fe6611f huangdaode 2015-09-17  473  struct hnae_queue **qs;  /* 
>> array base of all queues */
>>
>> :: The code at line 465 was first introduced by commit
>> :: 6fe6611ff275522a4e4c0359e2f46cdd07780d2f net: add Hisilicon Network 
>> Subsystem hnae framework support
>>
>> :: TO: huangdaode 
>> :: CC: David S. Miller 
>>
>> ---
>> 0-DAY kernel test infrastructureOpen Source Technology Center
>> https://lists.01.org/pipermail/kbuild-all   Intel Corporation
> 
> 
> 
> .
> 

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC V4 PATCH 0/8] Packed ring layout for vhost

2018-05-20 Thread Wei Xu
On Wed, May 16, 2018 at 08:32:13PM +0800, Jason Wang wrote:
> Hi all:
> 
> This RFC implement packed ring layout. The code were tested with
>> Tiwei's RFC V3 at https://lkml.org/lkml/2018/4/25/34. Some fixups and
> tweaks were needed on top of Tiwei's code to make it run for event
> index.

Could you please share the changes you made on top of Tiwei's code, to make it
easier for others to test?

> 
> Pktgen reports about 20% improvement on PPS (event index is off). More
> testing is ongoing.
> 
> Notes for tester:
> 
> - Start from this version, vhost need qemu co-operation to work
>   correctly. Or you can comment out the packed specific code for
>   GET/SET_VRING_BASE.

Do you mean the code in vhost_virtqueue_start/stop? Both Tiwei's series and your v3
happen to work correctly together, which should not be possible, since the packed
ring layout is definitely different from the split ring layout.
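
For reference, a sketch of the packed-ring descriptor layout from the virtio 1.1
proposal (illustrative struct name, not taken from this series), which shows how
different it is from the split ring's desc/avail/used layout:

#include <linux/types.h>

/* illustrative only: packed-ring descriptor as proposed for virtio 1.1 */
struct example_vring_packed_desc {
	__le64 addr;	/* buffer guest-physical address */
	__le32 len;	/* buffer length */
	__le16 id;	/* buffer id, echoed back when the buffer is used */
	__le16 flags;	/* includes the AVAIL/USED wrap-tracking bits */
};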

Wei

> 
> Changes from V3:
> - Fix math on event idx checking
> - Sync last avail wrap counter through GET/SET_VRING_BASE
> - remove desc_event prefix in the driver/device structure
> 
> Changes from V2:
> - do not use & in checking desc_event_flags
> - off should be most significant bit
> - remove the workaround of mergeable buffer for dpdk prototype
> - id should be in the last descriptor in the chain
> - keep _F_WRITE for write descriptor when adding used
> - device flags updating should use ADDR_USED type
> - return error on unexpected unavail descriptor in a chain
> - return false in vhost_ve_avail_empty if descriptor is available
> - track last seen avail_wrap_counter
> - correctly examine available descriptor in get_indirect_packed()
> - vhost_idx_diff should return u16 instead of bool
> 
> Changes from V1:
> 
> - Refactor vhost used elem code to avoid open coding on used elem
> - Event suppression support (compile test only).
> - Indirect descriptor support (compile test only).
> - Zerocopy support.
> - vIOMMU support.
> - SCSI/VSOCK support (compile test only).
> - Fix several bugs
> 
> Jason Wang (8):
>   vhost: move get_rx_bufs to vhost.c
>   vhost: hide used ring layout from device
>   vhost: do not use vring_used_elem
>   vhost_net: do not explicitly manipulate vhost_used_elem
>   vhost: vhost_put_user() can accept metadata type
>   virtio: introduce packed ring defines
>   vhost: packed ring support
>   vhost: event suppression for packed ring
> 
>  drivers/vhost/net.c| 136 ++
>  drivers/vhost/scsi.c   |  62 +--
>  drivers/vhost/vhost.c  | 861 
> -
>  drivers/vhost/vhost.h  |  47 +-
>  drivers/vhost/vsock.c  |  42 +-
>  include/uapi/linux/virtio_config.h |   9 +
>  include/uapi/linux/virtio_ring.h   |  32 ++
>  7 files changed, 928 insertions(+), 261 deletions(-)
> 
> -- 
> 2.7.4
> 


Re: [RFC V4 PATCH 0/8] Packed ring layout for vhost

2018-05-21 Thread Wei Xu
On Mon, May 21, 2018 at 10:33:30AM +0800, Jason Wang wrote:
> 
> 
> On 2018年05月21日 00:25, Wei Xu wrote:
> >On Wed, May 16, 2018 at 08:32:13PM +0800, Jason Wang wrote:
> >>Hi all:
> >>
> >>This RFC implement packed ring layout. The code were tested with
> >>Tiwei's RFC V3 at https://lkml.org/lkml/2018/4/25/34. Some fixups and
> >>tweaks were needed on top of Tiwei's code to make it run for event
> >>index.
> >Could you please show the change based on Tiwei's code to easy other's
> >test?
> 
> Please try Tiwei's V4 instead of just waiting for the fixup. It should work
> unless you don't try zerocopy and vIOMMU.

Yeah, actually the v3 series from both of you work well on my test bed, except for resetting.

> 
> 
> >>Pktgen reports about 20% improvement on PPS (event index is off). More
> >>testing is ongoing.
> >>
> >>Notes for tester:
> >>
> >>- Start from this version, vhost need qemu co-operation to work
> >>   correctly. Or you can comment out the packed specific code for
> >>   GET/SET_VRING_BASE.
> >Do you mean the code in vhost_virtqueue_start/stop?
> 
> For qemu, probably.
> 
> >Both Tiwei's and your v3
> >work fortunately correctly which should be avoided since the ring should be
> >definitely different.
> 
> I don't understand this, you mean reset work?

No, currently we have not handled vhost start/stop separately for split and packed
rings, which is what I am working on now.

Wei

> 
> Thanks
> 
> >
> >Wei
> >
> >>Changes from V3:
> >>- Fix math on event idx checking
> >>- Sync last avail wrap counter through GET/SET_VRING_BASE
> >>- remove desc_event prefix in the driver/device structure
> >>
> >>Changes from V2:
> >>- do not use & in checking desc_event_flags
> >>- off should be most significant bit
> >>- remove the workaround of mergeable buffer for dpdk prototype
> >>- id should be in the last descriptor in the chain
> >>- keep _F_WRITE for write descriptor when adding used
> >>- device flags updating should use ADDR_USED type
> >>- return error on unexpected unavail descriptor in a chain
> >>- return false in vhost_ve_avail_empty if descriptor is available
> >>- track last seen avail_wrap_counter
> >>- correctly examine available descriptor in get_indirect_packed()
> >>- vhost_idx_diff should return u16 instead of bool
> >>
> >>Changes from V1:
> >>
> >>- Refactor vhost used elem code to avoid open coding on used elem
> >>- Event suppression support (compile test only).
> >>- Indirect descriptor support (compile test only).
> >>- Zerocopy support.
> >>- vIOMMU support.
> >>- SCSI/VSOCK support (compile test only).
> >>- Fix several bugs
> >>
> >>Jason Wang (8):
> >>   vhost: move get_rx_bufs to vhost.c
> >>   vhost: hide used ring layout from device
> >>   vhost: do not use vring_used_elem
> >>   vhost_net: do not explicitly manipulate vhost_used_elem
> >>   vhost: vhost_put_user() can accept metadata type
> >>   virtio: introduce packed ring defines
> >>   vhost: packed ring support
> >>   vhost: event suppression for packed ring
> >>
> >>  drivers/vhost/net.c| 136 ++
> >>  drivers/vhost/scsi.c   |  62 +--
> >>  drivers/vhost/vhost.c  | 861 
> >> -
> >>  drivers/vhost/vhost.h  |  47 +-
> >>  drivers/vhost/vsock.c  |  42 +-
> >>  include/uapi/linux/virtio_config.h |   9 +
> >>  include/uapi/linux/virtio_ring.h   |  32 ++
> >>  7 files changed, 928 insertions(+), 261 deletions(-)
> >>
> >>-- 
> >>2.7.4
> >>
> 


Re: [RFC V4 PATCH 7/8] vhost: packed ring support

2018-05-22 Thread Wei Xu
On Wed, May 16, 2018 at 08:32:20PM +0800, Jason Wang wrote:
> Signed-off-by: Jason Wang 
> ---
>  drivers/vhost/net.c   |   3 +-
>  drivers/vhost/vhost.c | 539 
> ++
>  drivers/vhost/vhost.h |   8 +-
>  3 files changed, 513 insertions(+), 37 deletions(-)
> 
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index 8304c30..f2a0f5b 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -1358,6 +1382,8 @@ long vhost_vring_ioctl(struct vhost_dev *d, unsigned 
> int ioctl, void __user *arg
>   break;
>   }
>   vq->last_avail_idx = s.num;
> + if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED))
> + vq->avail_wrap_counter = s.num >> 31;
>   /* Forget the cached index value. */
>   vq->avail_idx = vq->last_avail_idx;
>   break;
> @@ -1366,6 +1392,8 @@ long vhost_vring_ioctl(struct vhost_dev *d, unsigned 
> int ioctl, void __user *arg
>   s.num = vq->last_avail_idx;
>   if (copy_to_user(argp, &s, sizeof s))
>   r = -EFAULT;
> + if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED))
> + s.num |= vq->avail_wrap_counter << 31;
>   break;
>   case VHOST_SET_VRING_ADDR:
>   if (copy_from_user(&a, argp, sizeof a)) {

'last_used_idx' also needs to be saved/restored here.

I have figured out the root cause of the broken device after reloading the
'virtio-net' module: all indices are reset on a reload, but 'last_used_idx'
is not properly reset in this case, which confuses handle_rx()/tx().
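
For reference, a minimal sketch (illustrative helper names, not the actual patch)
of how a ring index and its wrap counter can be packed into and unpacked from the
32-bit 'num' field used by GET/SET_VRING_BASE; the used index would need the same
treatment:

#include <stdbool.h>
#include <linux/types.h>

/* bit 31 carries the wrap counter, the low 16 bits carry the ring index */
static inline __u32 example_pack_ring_base(__u16 idx, bool wrap)
{
	return (__u32)idx | ((__u32)wrap << 31);
}

static inline void example_unpack_ring_base(__u32 num, __u16 *idx, bool *wrap)
{
	*idx  = num & 0xffff;
	*wrap = num >> 31;
}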

Wei



Re: [RFC V4 PATCH 7/8] vhost: packed ring support

2018-05-23 Thread Wei Xu
On Wed, May 23, 2018 at 09:39:28AM +0800, Jason Wang wrote:
> 
> 
> On 2018年05月23日 00:54, Wei Xu wrote:
> >On Wed, May 16, 2018 at 08:32:20PM +0800, Jason Wang wrote:
> >>Signed-off-by: Jason Wang 
> >>---
> >>  drivers/vhost/net.c   |   3 +-
> >>  drivers/vhost/vhost.c | 539 
> >> ++
> >>  drivers/vhost/vhost.h |   8 +-
> >>  3 files changed, 513 insertions(+), 37 deletions(-)
> >>
> >>diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> >>index 8304c30..f2a0f5b 100644
> >>--- a/drivers/vhost/vhost.c
> >>+++ b/drivers/vhost/vhost.c
> >>@@ -1358,6 +1382,8 @@ long vhost_vring_ioctl(struct vhost_dev *d, unsigned 
> >>int ioctl, void __user *arg
> >>break;
> >>}
> >>vq->last_avail_idx = s.num;
> >>+   if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED))
> >>+   vq->avail_wrap_counter = s.num >> 31;
> >>/* Forget the cached index value. */
> >>vq->avail_idx = vq->last_avail_idx;
> >>break;
> >>@@ -1366,6 +1392,8 @@ long vhost_vring_ioctl(struct vhost_dev *d, unsigned 
> >>int ioctl, void __user *arg
> >>s.num = vq->last_avail_idx;
> >>if (copy_to_user(argp, &s, sizeof s))
> >>r = -EFAULT;
> >>+   if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED))
> >>+   s.num |= vq->avail_wrap_counter << 31;
> >>break;
> >>case VHOST_SET_VRING_ADDR:
> >>if (copy_from_user(&a, argp, sizeof a)) {
> >'last_used_idx' also needs to be saved/restored here.
> >
> >I have figured out the root cause of broken device after reloading
> >'virtio-net' module, all indices have been reset for a reloading but
> >'last_used_idx' is not properly reset in this case. This confuses
> >handle_rx()/tx().
> >
> >Wei
> >
> 
> Good catch, so we probably need a new ioctl to sync between qemu and vhost.
> 
> Something like VHOST_SET/GET_USED_BASE.

Sure, or could we extend 'vhost_vring_state' so that they are all handled in one call?
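
To make the question concrete, a purely hypothetical layout could carry both bases
in one structure, with the wrap counters folded into bit 31 as in the patch above:

#include <linux/types.h>

/* hypothetical extension of vhost_vring_state for packed rings --
 * names and layout are illustrative only */
struct example_vhost_vring_packed_state {
	__u32 index;       /* virtqueue index */
	__u32 avail_base;  /* last_avail_idx | (avail_wrap_counter << 31) */
	__u32 used_base;   /* last_used_idx  | (used_wrap_counter  << 31) */
};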

> 
> Thanks
> 


Re: "virtio-net: enable multiqueue by default" in linux-next breaks networking on GCE

2016-12-13 Thread Wei Xu


On 2016年12月13日 07:33, Theodore Ts'o wrote:

Hi,

I was doing a last minute regression test of the ext4 tree before
sending a pull request to Linus, which I do using gce-xfstests[1], and
I found that using networking was broken on GCE on linux-next.  I was
using next-20161209, and after bisecting things, I narrowed down the
commit which causing things to break to commit 449000102901:
"virtio-net: enable multiqueue by default".  Reverting this commit on
top of next-20161209 fixed the problem.

[1] http://thunk.org/gce-xfstests

You can reproduce the problem by building the kernel for Google
Compute Engine --- I use a config such as this [2], and then try to
boot a kernel on a VM.  The way I do this involves booting a test
appliance and then kexec'ing into the kernel to be tested[3], using a
2cpu configuration.  (GCE machine type: n1-standard-2)

[2] 
https://git.kernel.org/cgit/fs/ext2/xfstests-bld.git/tree/kernel-configs/ext4-x86_64-config-4.9
[3] 
https://github.com/tytso/xfstests-bld/blob/master/Documentation/gce-xfstests.md

You can then take a look at serial console using a command such as
"gcloud compute instances get-serial-port-output ", and
you will get something like this (see attached).  The important bit is
that the dhclient command is completely failing to be able to get a
response from the network, from which I deduce that apparently that
either networking send or receive or both seem to be badly affected by
the commit in question.

Please let me know if there's anything I can do to help you debug this
further.


Hi Ted,
Just had a quick try on GCE, sorry for my stupid questions.

Q1:
Which distribution are you using for the GCE instance?

Q2:
Are you running the xfstests appliance as a nested VM (i.e., a VM inside
the GCE instance), or is the kernel built for the instance itself?

Q3:
Can this bug be reproduced with the kvm-xfstests case? I'm trying to set up
a local test bed if that makes sense.



Cheers,

- Ted

Dec 11 23:53:20 xfstests-201612120451 kernel: [0.00] Linux version 
4.9.0-rc8-ext4-06387-g03e5cbd (tytso@tytso-ssd) (gcc version 4.9.2 (Debian 
4.9.2-10) ) #9 SMP Mon Dec 12 04:50:16 UTC 2016
Dec 11 23:53:20 xfstests-201612120451 kernel: [0.00] Command line: 
root=/dev/sda1 ro console=ttyS0,38400n8 elevator=noop console=ttyS0  
fstestcfg=4k fstestset=-g,quick fstestexc= fstestopt=aex fstesttyp=ext4 
fstestapi=1.3
Dec 11 23:53:20 xfstests-201612120451 kernel: [0.00] x86/fpu: 
Supporting XSAVE feature 0x001: 'x87 floating point registers'
Dec 11 23:53:20 xfstests-201612120451 kernel: [0.00] x86/fpu: 
Supporting XSAVE feature 0x002: 'SSE registers'
Dec 11 23:53:20 xfstests-201612120451 kernel: [0.00] x86/fpu: 
Supporting XSAVE feature 0x004: 'AVX registers'
Dec 11 23:53:20 xfstests-201612120451 kernel: [0.00] x86/fpu: 
xstate_offset[2]:  576, xstate_sizes[2]:  256
Dec 11 23:53:20 xfstests-201612120451 kernel: [0.00] x86/fpu: Enabled 
xstate features 0x7, context size is 832 bytes, using 'standard' format.
Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Started Load Kernel Modules.
Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Starting Apply Kernel 
Variables...
Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Mounting Configuration File 
System...
Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Mounting FUSE Control File 
System...
Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Mounted FUSE Control File 
System.
Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Mounted Configuration File 
System.
Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Started Apply Kernel 
Variables.
Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Started Create Static Device 
Nodes in /dev.
Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Starting udev Kernel Device 
Manager...
Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Started udev Kernel Device 
Manager.
Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Started udev Coldplug all 
Devices.
Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Starting udev Wait for 
Complete Device Initialization...
Dec 11 23:53:20 xfstests-201612120451 systemd-fsck[1659]: xfstests-root: clean, 
56268/655360 files, 357439/2620928 blocks
Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Started File System Check on 
Root Device.
Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Starting Remount Root and 
Kernel File Systems...
Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Started Remount Root and 
Kernel File Systems.
Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Started Various fixups to 
make systemd work better on Debian.
Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Starting Load/Save Random 
Seed...
Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Starting Local File Systems 
(Pre).
Dec 11 23:53:20 xfstests-201612120451 systemd[1]: Reached target Local File 
Systems (Pre).
Dec 11 23:53:20 xfstests-201612120451 syst

Re: "virtio-net: enable multiqueue by default" in linux-next breaks networking on GCE

2016-12-13 Thread Wei Xu

On 2016年12月14日 03:44, Theodore Ts'o wrote:

Jason's patch fixed the issue, so I think we have the proper fix, but
to answer your questions:

On Wed, Dec 14, 2016 at 01:46:44AM +0800, Wei Xu wrote:


Q1:
Which distribution are you using for the GCE instance?


The test appliance is based on Debian Jessie.


Q2:
Are you running xfs test as an embedded VM case, which means XFS test
appliance is also a VM inside the GCE instance? Or the kernel is built
for the instance itself?


No, GCE currently doesn't support running nested VM's (e.g., running
VM's inside GCE).  So the kernel is built for the instance itself.
The way the test appliance works is that it initially boots using the
Debian Jessie default kernel and then we kexec into the kernel under
test.


Q3:
Can this bug be reproduced for kvm-xfstests case? I'm trying to set up
a local test bed if it makes sense.


You definitely can't do it out of the box -- you need to build the
image using "gen-image --networking", and then run "kvm-xfstests -N
shell" as root.  But the bug doesn't reproduce on kvm-xfstests, using
a 4.9 host kernel and linux-next guest kernel.



OK, thanks a lot.

BTW, although this is a guest issue, is there any way to view the GCE
host kernel or QEMU (if it is QEMU) version?



Cheers,

- Ted



Re: Regression in throughput between kvm guests over virtual bridge

2017-10-12 Thread Wei Xu
On Thu, Oct 05, 2017 at 04:07:45PM -0400, Matthew Rosato wrote:
> 
> Ping...  Jason, any other ideas or suggestions?

Hi Matthew,
Recently I have been running a similar test on x86 for this patch; here are some
differences between our testbeds.

1. It is nice that you have got an improvement with 50+ instances (or connections?),
which should be quite helpful for addressing the issue, and you have also figured
out the cost (wait/wakeup). A kind reminder: did you pin the uperf client/server
along the whole path, besides the vhost and vcpu threads?

2. It might be useful to shorten the traffic path as a reference. What I am running
is briefly:
pktgen(host kernel) -> tap(x) -> guest(DPDK testpmd)

In my experience the bridge driver (br_forward(), etc.) can impact performance, so I
eventually settled on this simplified testbed, which fully isolates the traffic from
both userspace and the host kernel stack (1 and 50 instances, bridge driver, etc.)
and therefore reduces potential interference.

The downside is that it needs DPDK support in the guest; has that ever been run on
an s390x guest? An alternative approach is to run XDP drop directly on the
virtio-net NIC in the guest (see the sketch after this list), although this requires
compiling XDP inside the guest, which needs a newer distro (Fedora 25+ in my case,
or Ubuntu 16.10, I am not sure).

3. BTW, did you enable hugepages for your guest? It can affect performance more or
less depending on the memory demand when generating traffic; I didn't see a similar
command line in yours.
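
The XDP drop program mentioned in item 2 is roughly the following minimal sketch
(assuming a clang/LLVM BPF build; the section macro is spelled out so the snippet
stays self-contained):

#include <linux/bpf.h>

#define SEC(name) __attribute__((section(name), used))

/* drop every packet as early as possible on the guest's virtio-net NIC */
SEC("xdp")
int xdp_drop_all(struct xdp_md *ctx)
{
	return XDP_DROP;
}

char _license[] SEC("license") = "GPL";

Attaching it to the guest's virtio-net interface keeps the whole receive path inside
the guest kernel, with no userspace involvement.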

Hope this doesn't make things more complicated for you. :) We will keep working on
this and keep you updated.

Thanks,
Wei
 



> 


Re: Regression in throughput between kvm guests over virtual bridge

2017-11-27 Thread Wei Xu
On Mon, Nov 20, 2017 at 02:25:17PM -0500, Matthew Rosato wrote:
> On 11/14/2017 03:11 PM, Matthew Rosato wrote:
> > On 11/12/2017 01:34 PM, Wei Xu wrote:
> >> On Sat, Nov 11, 2017 at 03:59:54PM -0500, Matthew Rosato wrote:
> >>>>> This case should be quite similar with pkgten, if you got improvement 
> >>>>> with
> >>>>> pktgen, usually it was also the same for UDP, could you please try to 
> >>>>> disable
> >>>>> tso, gso, gro, ufo on all host tap devices and guest virtio-net 
> >>>>> devices? Currently
> >>>>> the most significant tests would be like this AFAICT:
> >>>>>
> >>>>> Host->VM 4.124.13
> >>>>>  TCP:
> >>>>>  UDP:
> >>>>> pktgen:
> 
> So, I automated these scenarios for extended overnight runs and started
> experiencing OOM conditions overnight on a 40G system.  I did a bisect
> and it also points to c67df11f.  I can see a leak in at least all of the
> Host->VM testcases (TCP, UDP, pktgen), but the pktgen scenario shows the
> fastest leak.
> 
> I enabled slub_debug on base 4.13 and ran my pktgen scenario in short
> intervals until a large% of host memory was consumed.  Numbers below
> after the last pktgen run completed. The summary is that a very large #
> of active skbuff_head_cache entries can be seen - The sum of alloc/free
> calls match up, but the # of active skbuff_head_cache entries keeps
> growing each time the workload is run and never goes back down in
> between runs.
> 
> free -h:
>  totalusedfree  shared  buff/cache   available
> Mem:   39G 31G6.6G472K1.4G6.8G
> 
>   OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
> 
> 1001952 1000610  99%0.75K  23856 42763392K skbuff_head_cache
> 126192 126153  99%0.36K   2868 44 45888K ksm_rmap_item
> 100485 100435  99%0.41K   1305 77 41760K kernfs_node_cache
>  63294  39598  62%0.48K959 66 30688K dentry
>  31968  31719  99%0.88K888 36 28416K inode_cache
> 
> /sys/kernel/slab/skbuff_head_cache/alloc_calls :
> 259 __alloc_skb+0x68/0x188 age=1/135076/135741 pid=0-11776 cpus=0,2,4,18
> 1000351 __build_skb+0x42/0xb0 age=8114/63172/117830 pid=0-11863 cpus=0,10
> 
> /sys/kernel/slab/skbuff_head_cache/free_calls:
>   13492  age=4295073614 pid=0 cpus=0
>  978298 tun_do_read.part.10+0x18c/0x6a0 age=8532/63624/110571 pid=11733
> cpus=1-19
>   6 skb_free_datagram+0x32/0x78 age=11648/73253/110173 pid=11325
> cpus=4,8,10,12,14
>   3 __dev_kfree_skb_any+0x5e/0x70 age=108957/115043/118269
> pid=0-11605 cpus=5,7,12
>   1 netlink_broadcast_filtered+0x172/0x470 age=136165 pid=1 cpus=4
>   2 netlink_dump+0x268/0x2a8 age=73236/86857/100479 pid=11325 cpus=4,12
>   1 netlink_unicast+0x1ae/0x220 age=12991 pid=9922 cpus=12
>   1 tcp_recvmsg+0x2e2/0xa60 age=0 pid=11776 cpus=6
>   3 unix_stream_read_generic+0x810/0x908 age=15443/50904/118273
> pid=9915-11581 cpus=8,16,18
>   2 tap_do_read+0x16a/0x488 [tap] age=42338/74246/106155
> pid=11605-11699 cpus=2,9
>   1 macvlan_process_broadcast+0x17e/0x1e0 [macvlan] age=18835
> pid=331 cpus=11
>8800 pktgen_thread_worker+0x80a/0x16d8 [pktgen] age=8545/62184/110571
> pid=11863 cpus=0
> 
> 
> By comparison, when running 4.13 with c67df11f reverted, here's the same
> output after the exact same test:
> 
> free -h:
>totalusedfree  shared  buff/cache   available
> Mem: 39G783M 37G472K637M 37G
> 
> slabtop:
>   OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
>714256  35%0.75K 17 42   544K skbuff_head_cache
> 
> /sys/kernel/slab/skbuff_head_cache/alloc_calls:
> 257 __alloc_skb+0x68/0x188 age=0/65252/65507 pid=1-11768 cpus=10,15
> /sys/kernel/slab/skbuff_head_cache/free_calls:
> 255  age=4295003081 pid=0 cpus=0
>   1 netlink_broadcast_filtered+0x2e8/0x4e0 age=65601 pid=1 cpus=15
>   1 tcp_recvmsg+0x2e2/0xa60 age=0 pid=11768 cpus=16
> 

Thanks a lot for the test, and sorry for the late update; I was working on
the code path and didn't find anything helpful for you until today.

I did some tests, and initially it turned out that the bottleneck was on the guest
kernel stack (napi) side. After tracking the traffic footprints, it appeared that
the loss happened when the vring was full and could not be drained out by the guest
fast enough; this then triggered an SKB drop in the vhost driver because there was
no headcount to fill it with. It can be avoided by deferring consuming the SKB until
after having obtained enough headcount.

Re: Regression in throughput between kvm guests over virtual bridge

2017-11-27 Thread Wei Xu
On Tue, Nov 28, 2017 at 09:36:37AM +0800, Jason Wang wrote:
> 
> 
> On 2017年11月28日 00:21, Wei Xu wrote:
> > On Mon, Nov 20, 2017 at 02:25:17PM -0500, Matthew Rosato wrote:
> > > On 11/14/2017 03:11 PM, Matthew Rosato wrote:
> > > > On 11/12/2017 01:34 PM, Wei Xu wrote:
> > > > > On Sat, Nov 11, 2017 at 03:59:54PM -0500, Matthew Rosato wrote:
> > > > > > > > This case should be quite similar with pkgten, if you got 
> > > > > > > > improvement with
> > > > > > > > pktgen, usually it was also the same for UDP, could you please 
> > > > > > > > try to disable
> > > > > > > > tso, gso, gro, ufo on all host tap devices and guest virtio-net 
> > > > > > > > devices? Currently
> > > > > > > > the most significant tests would be like this AFAICT:
> > > > > > > > 
> > > > > > > > Host->VM 4.124.13
> > > > > > > >   TCP:
> > > > > > > >   UDP:
> > > > > > > > pktgen:
> > > So, I automated these scenarios for extended overnight runs and started
> > > experiencing OOM conditions overnight on a 40G system.  I did a bisect
> > > and it also points to c67df11f.  I can see a leak in at least all of the
> > > Host->VM testcases (TCP, UDP, pktgen), but the pktgen scenario shows the
> > > fastest leak.
> > > 
> > > I enabled slub_debug on base 4.13 and ran my pktgen scenario in short
> > > intervals until a large% of host memory was consumed.  Numbers below
> > > after the last pktgen run completed. The summary is that a very large #
> > > of active skbuff_head_cache entries can be seen - The sum of alloc/free
> > > calls match up, but the # of active skbuff_head_cache entries keeps
> > > growing each time the workload is run and never goes back down in
> > > between runs.
> > > 
> > > free -h:
> > >   totalusedfree  shared  buff/cache   available
> > > Mem:   39G 31G6.6G472K1.4G6.8G
> > > 
> > >OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
> > > 
> > > 1001952 1000610  99%0.75K  23856 42763392K 
> > > skbuff_head_cache
> > > 126192 126153  99%0.36K   2868 44 45888K ksm_rmap_item
> > > 100485 100435  99%0.41K   1305 77 41760K kernfs_node_cache
> > >   63294  39598  62%0.48K95966 30688K dentry
> > >   31968  31719  99%0.88K88836 28416K inode_cache
> > > 
> > > /sys/kernel/slab/skbuff_head_cache/alloc_calls :
> > >  259 __alloc_skb+0x68/0x188 age=1/135076/135741 pid=0-11776 
> > > cpus=0,2,4,18
> > > 1000351 __build_skb+0x42/0xb0 age=8114/63172/117830 pid=0-11863 cpus=0,10
> > > 
> > > /sys/kernel/slab/skbuff_head_cache/free_calls:
> > >13492  age=4295073614 pid=0 cpus=0
> > >   978298 tun_do_read.part.10+0x18c/0x6a0 age=8532/63624/110571 pid=11733
> > > cpus=1-19
> > >6 skb_free_datagram+0x32/0x78 age=11648/73253/110173 pid=11325
> > > cpus=4,8,10,12,14
> > >3 __dev_kfree_skb_any+0x5e/0x70 age=108957/115043/118269
> > > pid=0-11605 cpus=5,7,12
> > >1 netlink_broadcast_filtered+0x172/0x470 age=136165 pid=1 cpus=4
> > >2 netlink_dump+0x268/0x2a8 age=73236/86857/100479 pid=11325 
> > > cpus=4,12
> > >1 netlink_unicast+0x1ae/0x220 age=12991 pid=9922 cpus=12
> > >1 tcp_recvmsg+0x2e2/0xa60 age=0 pid=11776 cpus=6
> > >3 unix_stream_read_generic+0x810/0x908 age=15443/50904/118273
> > > pid=9915-11581 cpus=8,16,18
> > >2 tap_do_read+0x16a/0x488 [tap] age=42338/74246/106155
> > > pid=11605-11699 cpus=2,9
> > >1 macvlan_process_broadcast+0x17e/0x1e0 [macvlan] age=18835
> > > pid=331 cpus=11
> > > 8800 pktgen_thread_worker+0x80a/0x16d8 [pktgen] age=8545/62184/110571
> > > pid=11863 cpus=0
> > > 
> > > 
> > > By comparison, when running 4.13 with c67df11f reverted, here's the same
> > > output after the exact same test:
> > > 
> > > free -h:
> > > totalusedfree  shared  buff/cache   available
> > > Mem: 39G783M 37G472K637M 37G
> > > 
> > > slabtop:
> > >OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACH

Re: Regression in throughput between kvm guests over virtual bridge

2017-11-28 Thread Wei Xu
On Mon, Nov 27, 2017 at 09:44:07PM -0500, Matthew Rosato wrote:
> On 11/27/2017 08:36 PM, Jason Wang wrote:
> > 
> > 
> > On 2017年11月28日 00:21, Wei Xu wrote:
> >> On Mon, Nov 20, 2017 at 02:25:17PM -0500, Matthew Rosato wrote:
> >>> On 11/14/2017 03:11 PM, Matthew Rosato wrote:
> >>>> On 11/12/2017 01:34 PM, Wei Xu wrote:
> >>>>> On Sat, Nov 11, 2017 at 03:59:54PM -0500, Matthew Rosato wrote:
> >>>>>>>> This case should be quite similar with pkgten, if you got
> >>>>>>>> improvement with
> >>>>>>>> pktgen, usually it was also the same for UDP, could you please
> >>>>>>>> try to disable
> >>>>>>>> tso, gso, gro, ufo on all host tap devices and guest virtio-net
> >>>>>>>> devices? Currently
> >>>>>>>> the most significant tests would be like this AFAICT:
> >>>>>>>>
> >>>>>>>> Host->VM 4.12    4.13
> >>>>>>>>   TCP:
> >>>>>>>>   UDP:
> >>>>>>>> pktgen:
> >>> So, I automated these scenarios for extended overnight runs and started
> >>> experiencing OOM conditions overnight on a 40G system.  I did a bisect
> >>> and it also points to c67df11f.  I can see a leak in at least all of the
> >>> Host->VM testcases (TCP, UDP, pktgen), but the pktgen scenario shows the
> >>> fastest leak.
> >>>
> >>> I enabled slub_debug on base 4.13 and ran my pktgen scenario in short
> >>> intervals until a large% of host memory was consumed.  Numbers below
> >>> after the last pktgen run completed. The summary is that a very large #
> >>> of active skbuff_head_cache entries can be seen - The sum of alloc/free
> >>> calls match up, but the # of active skbuff_head_cache entries keeps
> >>> growing each time the workload is run and never goes back down in
> >>> between runs.
> >>>
> >>> free -h:
> >>>   total    used    free  shared  buff/cache   available
> >>> Mem:   39G 31G    6.6G    472K    1.4G    6.8G
> >>>
> >>>    OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
> >>>
> >>> 1001952 1000610  99%    0.75K  23856   42    763392K
> >>> skbuff_head_cache
> >>> 126192 126153  99%    0.36K   2868 44 45888K ksm_rmap_item
> >>> 100485 100435  99%    0.41K   1305 77 41760K kernfs_node_cache
> >>>   63294  39598  62%    0.48K    959 66 30688K dentry
> >>>   31968  31719  99%    0.88K    888 36 28416K inode_cache
> >>>
> >>> /sys/kernel/slab/skbuff_head_cache/alloc_calls :
> >>>  259 __alloc_skb+0x68/0x188 age=1/135076/135741 pid=0-11776
> >>> cpus=0,2,4,18
> >>> 1000351 __build_skb+0x42/0xb0 age=8114/63172/117830 pid=0-11863
> >>> cpus=0,10
> >>>
> >>> /sys/kernel/slab/skbuff_head_cache/free_calls:
> >>>    13492  age=4295073614 pid=0 cpus=0
> >>>   978298 tun_do_read.part.10+0x18c/0x6a0 age=8532/63624/110571 pid=11733
> >>> cpus=1-19
> >>>    6 skb_free_datagram+0x32/0x78 age=11648/73253/110173 pid=11325
> >>> cpus=4,8,10,12,14
> >>>    3 __dev_kfree_skb_any+0x5e/0x70 age=108957/115043/118269
> >>> pid=0-11605 cpus=5,7,12
> >>>    1 netlink_broadcast_filtered+0x172/0x470 age=136165 pid=1 cpus=4
> >>>    2 netlink_dump+0x268/0x2a8 age=73236/86857/100479 pid=11325
> >>> cpus=4,12
> >>>    1 netlink_unicast+0x1ae/0x220 age=12991 pid=9922 cpus=12
> >>>    1 tcp_recvmsg+0x2e2/0xa60 age=0 pid=11776 cpus=6
> >>>    3 unix_stream_read_generic+0x810/0x908 age=15443/50904/118273
> >>> pid=9915-11581 cpus=8,16,18
> >>>    2 tap_do_read+0x16a/0x488 [tap] age=42338/74246/106155
> >>> pid=11605-11699 cpus=2,9
> >>>    1 macvlan_process_broadcast+0x17e/0x1e0 [macvlan] age=18835
> >>> pid=331 cpus=11
> >>>     8800 pktgen_thread_worker+0x80a/0x16d8 [pktgen]
> >>> age=8545/62184/110571
> >>> pid=11863 cpus=0
> >>>
> >>>
> >>> By comparison, when running 4.13 with c67df11f reverted, here's the same
> >>> output after the exact same test:
> >>>
> >>> free -h:
> >>>     total    used  

Re: [PATCH net,stable] vhost: fix skb leak in handle_rx()

2017-11-28 Thread Wei Xu
On Wed, Nov 29, 2017 at 01:06:28PM +0800, Jason Wang wrote:
> 
> 
> On 2017年11月29日 09:53, Jason Wang wrote:
> > 
> > 
> > On 2017年11月29日 01:17, w...@redhat.com wrote:
> > > From: Wei Xu 
> > > 
> > > Matthew found a roughly 40% tcp throughput regression with commit
> > > c67df11f(vhost_net: try batch dequing from skb array) as discussed
> > > in the following thread:
> > > https://www.mail-archive.com/netdev@vger.kernel.org/msg187936.html
> > > 
> > > Eventually we figured out that it was a skb leak in handle_rx()
> > > when sending packets to the VM. This usually happens when a guest
> > > can not drain out vq as fast as vhost fills in, afterwards it sets
> > > off the traffic jam and leaks skb(s) which occurs as no headcount
> > > to send on the vq from vhost side.
> > > 
> > > This can be avoided by making sure we have got enough headcount
> > > before actually consuming a skb from the batched rx array while
> > > transmitting, which is simply done by deferring it a moment later
> > > in this patch.
> > > 
> > > Signed-off-by: Wei Xu 
> > > ---
> > >   drivers/vhost/net.c | 4 ++--
> > >   1 file changed, 2 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> > > index 8d626d7..e76535e 100644
> > > --- a/drivers/vhost/net.c
> > > +++ b/drivers/vhost/net.c
> > > @@ -778,8 +778,6 @@ static void handle_rx(struct vhost_net *net)
> > >   /* On error, stop handling until the next kick. */
> > >   if (unlikely(headcount < 0))
> > >   goto out;
> > > -    if (nvq->rx_array)
> > > -    msg.msg_control = vhost_net_buf_consume(&nvq->rxq);
> > >   /* On overrun, truncate and discard */
> > >   if (unlikely(headcount > UIO_MAXIOV)) {
> > 
> > You need do msg.msg_control = vhost_net_buf_consume() here too,
> > otherwise we may still get it leaked.
> > 
> > Thanks
> 
> Not a leak actually, but the packet won't be consumed and we will hit
> UIO_MAXIOV forever in this case.

I see, thanks, will make a v2.

> 
> Thanks


Re: [PATCH net,stable] vhost: fix skb leak in handle_rx()

2017-11-28 Thread Wei Xu
On Tue, Nov 28, 2017 at 07:50:58PM +0200, Michael S. Tsirkin wrote:
> On Tue, Nov 28, 2017 at 12:17:16PM -0500, w...@redhat.com wrote:
> > From: Wei Xu 
> > 
> > Matthew found a roughly 40% tcp throughput regression with commit
> > c67df11f(vhost_net: try batch dequing from skb array) as discussed
> > in the following thread:
> > https://www.mail-archive.com/netdev@vger.kernel.org/msg187936.html
> > 
> > Eventually we figured out that it was a skb leak in handle_rx()
> > when sending packets to the VM. This usually happens when a guest
> > can not drain out vq as fast as vhost fills in, afterwards it sets
> > off the traffic jam and leaks skb(s) which occurs as no headcount
> > to send on the vq from vhost side.
> > 
> > This can be avoided by making sure we have got enough headcount
> > before actually consuming a skb from the batched rx array while
> > transmitting, which is simply done by deferring it a moment later
> > in this patch.
> > 
> > Signed-off-by: Wei Xu 
> 
> Looks like a good way to fix it, but it will still leak if recvmsg
> returns a wrong length (I'm not sure this can happen in practice, but
> best to keep it simple).

Right, it is better to defend against this case; I will include it in v2.

> 
> Also, we need to add this before each recvmsg, including overrun,
> and discard on error.

OK.

Wei

> 
> > ---
> >  drivers/vhost/net.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> > index 8d626d7..e76535e 100644
> > --- a/drivers/vhost/net.c
> > +++ b/drivers/vhost/net.c
> > @@ -778,8 +778,6 @@ static void handle_rx(struct vhost_net *net)
> > /* On error, stop handling until the next kick. */
> > if (unlikely(headcount < 0))
> > goto out;
> > -   if (nvq->rx_array)
> > -   msg.msg_control = vhost_net_buf_consume(&nvq->rxq);
> > /* On overrun, truncate and discard */
> > if (unlikely(headcount > UIO_MAXIOV)) {
> > iov_iter_init(&msg.msg_iter, READ, vq->iov, 1, 1);
> > @@ -809,6 +807,8 @@ static void handle_rx(struct vhost_net *net)
> >  */
> > iov_iter_advance(&msg.msg_iter, vhost_hlen);
> > }
> > +   if (nvq->rx_array)
> > +   msg.msg_control = vhost_net_buf_consume(&nvq->rxq);
> > err = sock->ops->recvmsg(sock, &msg,
> >  sock_len, MSG_DONTWAIT | MSG_TRUNC);
> > /* Userspace might have consumed the packet meanwhile:
> > -- 
> > 1.8.3.1


Re: [PATCH net,stable] vhost: fix skb leak in handle_rx()

2017-11-28 Thread Wei Xu
On Tue, Nov 28, 2017 at 07:53:33PM +0200, Michael S. Tsirkin wrote:
> On Tue, Nov 28, 2017 at 12:17:16PM -0500, w...@redhat.com wrote:
> > From: Wei Xu 
> > 
> > Matthew found a roughly 40% tcp throughput regression with commit
> > c67df11f(vhost_net: try batch dequing from skb array) as discussed
> > in the following thread:
> > https://www.mail-archive.com/netdev@vger.kernel.org/msg187936.html
> > 
> > Eventually we figured out that it was a skb leak in handle_rx()
> > when sending packets to the VM. This usually happens when a guest
> > can not drain out vq as fast as vhost fills in, afterwards it sets
> > off the traffic jam and leaks skb(s) which occurs as no headcount
> > to send on the vq from vhost side.
> > 
> > This can be avoided by making sure we have got enough headcount
> > before actually consuming a skb from the batched rx array while
> > transmitting, which is simply done by deferring it a moment later
> > in this patch.
> > 
> > Signed-off-by: Wei Xu 
> 
> Given the amount of effort Matthew has put into this,
> you definitely want
> 
> Reported-by: Matthew Rosato 
> 
> here.
> 

Absolutely we want that, sorry for missing you here, Matthew. You really figured
out this issue independently all by yourself, with a wide assortment of extremely
quick tests (tools, throughput, slub leak, etc.), and I am happy to have the
opportunity to do the paperwork on your behalf. :) Thanks a lot, Matthew.

Wei

> Let's give credit where credit is due.
> 
> Thanks a lot Matthew!
> 
> > ---
> >  drivers/vhost/net.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> > index 8d626d7..e76535e 100644
> > --- a/drivers/vhost/net.c
> > +++ b/drivers/vhost/net.c
> > @@ -778,8 +778,6 @@ static void handle_rx(struct vhost_net *net)
> > /* On error, stop handling until the next kick. */
> > if (unlikely(headcount < 0))
> > goto out;
> > -   if (nvq->rx_array)
> > -   msg.msg_control = vhost_net_buf_consume(&nvq->rxq);
> > /* On overrun, truncate and discard */
> > if (unlikely(headcount > UIO_MAXIOV)) {
> > iov_iter_init(&msg.msg_iter, READ, vq->iov, 1, 1);
> > @@ -809,6 +807,8 @@ static void handle_rx(struct vhost_net *net)
> >  */
> > iov_iter_advance(&msg.msg_iter, vhost_hlen);
> > }
> > +   if (nvq->rx_array)
> > +   msg.msg_control = vhost_net_buf_consume(&nvq->rxq);
> > err = sock->ops->recvmsg(sock, &msg,
> >  sock_len, MSG_DONTWAIT | MSG_TRUNC);
> > /* Userspace might have consumed the packet meanwhile:
> > -- 
> > 1.8.3.1


Re: [PATCH net,stable v2] vhost: fix skb leak in handle_rx()

2017-11-29 Thread Wei Xu
On Wed, Nov 29, 2017 at 10:43:33PM +0800, Jason Wang wrote:
> 
> 
> On 2017年11月29日 22:23, w...@redhat.com wrote:
> > From: Wei Xu 
> > 
> > Matthew found a roughly 40% tcp throughput regression with commit
> > c67df11f(vhost_net: try batch dequing from skb array) as discussed
> > in the following thread:
> > https://www.mail-archive.com/netdev@vger.kernel.org/msg187936.html
> > 
> > Eventually we figured out that it was a skb leak in handle_rx()
> > when sending packets to the VM. This usually happens when a guest
> > can not drain out vq as fast as vhost fills in, afterwards it sets
> > off the traffic jam and leaks skb(s) which occurs as no headcount
> > to send on the vq from vhost side.
> > 
> > This can be avoided by making sure we have got enough headcount
> > before actually consuming a skb from the batched rx array while
> > transmitting, which is simply done by moving checking the zero
> > headcount a bit ahead.
> > 
> > Also strengthen the small possibility of leak in case of recvmsg()
> > fails by freeing the skb.
> > 
> > Signed-off-by: Wei Xu 
> > Reported-by: Matthew Rosato 
> > ---
> >   drivers/vhost/net.c | 23 +--
> >   1 file changed, 13 insertions(+), 10 deletions(-)
> > 
> > v2:
> > - add Matthew as the reporter, thanks matthew.
> > - moving zero headcount check ahead instead of defer consuming skb
> >due to jason and mst's comment.
> > - add freeing skb in favor of recvmsg() fails.
> > 
> > diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> > index 8d626d7..e302e08 100644
> > --- a/drivers/vhost/net.c
> > +++ b/drivers/vhost/net.c
> > @@ -778,16 +778,6 @@ static void handle_rx(struct vhost_net *net)
> > /* On error, stop handling until the next kick. */
> > if (unlikely(headcount < 0))
> > goto out;
> > -   if (nvq->rx_array)
> > -   msg.msg_control = vhost_net_buf_consume(&nvq->rxq);
> > -   /* On overrun, truncate and discard */
> > -   if (unlikely(headcount > UIO_MAXIOV)) {
> > -   iov_iter_init(&msg.msg_iter, READ, vq->iov, 1, 1);
> > -   err = sock->ops->recvmsg(sock, &msg,
> > -1, MSG_DONTWAIT | MSG_TRUNC);
> > -   pr_debug("Discarded rx packet: len %zd\n", sock_len);
> > -   continue;
> > -   }
> > /* OK, now we need to know about added descriptors. */
> > if (!headcount) {
> > if (unlikely(vhost_enable_notify(&net->dev, vq))) {
> > @@ -800,6 +790,18 @@ static void handle_rx(struct vhost_net *net)
> >  * they refilled. */
> > goto out;
> > }
> > +   if (nvq->rx_array)
> > +   msg.msg_control = vhost_net_buf_consume(&nvq->rxq);
> > +   /* On overrun, truncate and discard */
> > +   if (unlikely(headcount > UIO_MAXIOV)) {
> > +   iov_iter_init(&msg.msg_iter, READ, vq->iov, 1, 1);
> > +   err = sock->ops->recvmsg(sock, &msg,
> > +1, MSG_DONTWAIT | MSG_TRUNC);
> > +   if (unlikely(err != 1))
> > +   kfree_skb((struct sk_buff *)msg.msg_control);
> 
> I think we'd better fix this in tun/tap (better in another patch), otherwise it
> leads to an odd API: in some cases the skb is freed in recvmsg() but the caller
> still needs to deal with the remaining cases.

Right, it is better to handle it in recvmsg().
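
A rough sketch of the ownership rule being proposed (illustrative only, not the
actual tun/tap change): the recvmsg() side consumes the pre-dequeued skb it was
handed via msg_control whenever it fails, so callers such as vhost-net never have
to free it themselves:

#include <linux/skbuff.h>
#include <linux/socket.h>

/* illustrative error-path helper for a recvmsg() implementation that is
 * handed a pre-dequeued skb through msg_control (as vhost-net passes it) */
static int example_recvmsg_error(struct msghdr *m, int err)
{
	struct sk_buff *skb = m->msg_control;

	if (err < 0 && skb) {
		kfree_skb(skb);		/* consume it here rather than leak it */
		m->msg_control = NULL;	/* nothing left for the caller to free */
	}
	return err;
}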

Wei


Re: [PATCH net,stable v2] vhost: fix skb leak in handle_rx()

2017-11-30 Thread Wei Xu
On Thu, Nov 30, 2017 at 10:46:17AM +0800, Jason Wang wrote:
> 
> 
> On 2017年11月29日 23:31, Michael S. Tsirkin wrote:
> > On Wed, Nov 29, 2017 at 09:23:24AM -0500,w...@redhat.com  wrote:
> > > From: Wei Xu
> > > 
> > > Matthew found a roughly 40% tcp throughput regression with commit
> > > c67df11f(vhost_net: try batch dequing from skb array) as discussed
> > > in the following thread:
> > > https://www.mail-archive.com/netdev@vger.kernel.org/msg187936.html
> > > 
> > > Eventually we figured out that it was a skb leak in handle_rx()
> > > when sending packets to the VM. This usually happens when a guest
> > > can not drain out vq as fast as vhost fills in, afterwards it sets
> > > off the traffic jam and leaks skb(s) which occurs as no headcount
> > > to send on the vq from vhost side.
> > > 
> > > This can be avoided by making sure we have got enough headcount
> > > before actually consuming a skb from the batched rx array while
> > > transmitting, which is simply done by moving checking the zero
> > > headcount a bit ahead.
> > > 
> > > Also strengthen the small possibility of leak in case of recvmsg()
> > > fails by freeing the skb.
> > > 
> > > Signed-off-by: Wei Xu
> > > Reported-by: Matthew Rosato
> > > ---
> > >   drivers/vhost/net.c | 23 +--
> > >   1 file changed, 13 insertions(+), 10 deletions(-)
> > > 
> > > v2:
> > > - add Matthew as the reporter, thanks matthew.
> > > - moving zero headcount check ahead instead of defer consuming skb
> > >due to jason and mst's comment.
> > > - add freeing skb in favor of recvmsg() fails.
> > > 
> > > diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> > > index 8d626d7..e302e08 100644
> > > --- a/drivers/vhost/net.c
> > > +++ b/drivers/vhost/net.c
> > > @@ -778,16 +778,6 @@ static void handle_rx(struct vhost_net *net)
> > >   /* On error, stop handling until the next kick. */
> > >   if (unlikely(headcount < 0))
> > >   goto out;
> > > - if (nvq->rx_array)
> > > - msg.msg_control = vhost_net_buf_consume(&nvq->rxq);
> > > - /* On overrun, truncate and discard */
> > > - if (unlikely(headcount > UIO_MAXIOV)) {
> > > - iov_iter_init(&msg.msg_iter, READ, vq->iov, 1, 1);
> > > - err = sock->ops->recvmsg(sock, &msg,
> > > -  1, MSG_DONTWAIT | MSG_TRUNC);
> > > - pr_debug("Discarded rx packet: len %zd\n", sock_len);
> > > - continue;
> > > - }
> > >   /* OK, now we need to know about added descriptors. */
> > >   if (!headcount) {
> > >   if (unlikely(vhost_enable_notify(&net->dev, 
> > > vq))) {
> > > @@ -800,6 +790,18 @@ static void handle_rx(struct vhost_net *net)
> > >* they refilled. */
> > >   goto out;
> > >   }
> > > + if (nvq->rx_array)
> > > + msg.msg_control = vhost_net_buf_consume(&nvq->rxq);
> > > + /* On overrun, truncate and discard */
> > > + if (unlikely(headcount > UIO_MAXIOV)) {
> > > + iov_iter_init(&msg.msg_iter, READ, vq->iov, 1, 1);
> > > + err = sock->ops->recvmsg(sock, &msg,
> > > +  1, MSG_DONTWAIT | MSG_TRUNC);
> > > + if (unlikely(err != 1))
> > Why 1? How is receiving 1 byte special or even possible?
> > Also, I wouldn't put an unlikely here. It's all error handling code anyway.

Vhost is dropping the skb by invoking a 1-byte recvmsg() here. It is admittedly a
bit odd to free the skb afterwards, since it would already have been freed inside
recvmsg() in most cases, and the return value does not mean much here.

> > 
> > > + kfree_skb((struct sk_buff *)msg.msg_control);
> > You do not need a cast here.

Yes, exactly, I missed it.

> > Also, is it really safe to refer to msg_control here?
> > I'd rather keep a copy of the skb pointer and use it than assume
> > caller did not change it. But also see below.

It should be safe since msg is a local variable here, the calle

Re: Regression in throughput between kvm guests over virtual bridge

2017-10-23 Thread Wei Xu
On Wed, Oct 18, 2017 at 04:17:51PM -0400, Matthew Rosato wrote:
> On 10/12/2017 02:31 PM, Wei Xu wrote:
> > On Thu, Oct 05, 2017 at 04:07:45PM -0400, Matthew Rosato wrote:
> >>
> >> Ping...  Jason, any other ideas or suggestions?
> > 
> > Hi Matthew,
> > Recently I am doing similar test on x86 for this patch, here are some,
> > differences between our testbeds.
> > 
> > 1. It is nice you have got improvement with 50+ instances(or connections 
> > here?)
> > which would be quite helpful to address the issue, also you've figured out 
> > the
> > cost(wait/wakeup), kindly reminder did you pin uperf client/server along 
> > the whole
> > path besides vhost and vcpu threads? 
> 
> Was not previously doing any pinning whatsoever, just reproducing an
> environment that one of our testers here was running.  Reducing guest
> vcpu count from 4->1, still see the regression.  Then, pinned each vcpu
> thread and vhost thread to a separate host CPU -- still made no
> difference (regression still present).
> 
> > 
> > 2. It might be useful to short the traffic path as a reference, What I am 
> > running
> > is briefly like:
> > pktgen(host kernel) -> tap(x) -> guest(DPDK testpmd)
> > 
> > The bridge driver(br_forward(), etc) might impact performance due to my 
> > personal
> > experience, so eventually I settled down with this simplified testbed which 
> > fully
> > isolates the traffic from both userspace and host kernel stack(1 and 50 
> > instances,
> > bridge driver, etc), therefore reduces potential interferences.
> > 
> > The down side of this is that it needs DPDK support in guest, has this ever 
> > be
> > run on s390x guest? An alternative approach is to directly run XDP drop on
> > virtio-net nic in guest, while this requires compiling XDP inside guest 
> > which needs
> > a newer distro(Fedora 25+ in my case or Ubuntu 16.10, not sure).
> > 
> 
> I made an attempt at DPDK, but it has not been run on s390x as far as
> I'm aware and didn't seem trivial to get working.
> 
> So instead I took your alternate suggestion & did:
> pktgen(host) -> tap(x) -> guest(xdp_drop)

It is really nice of you to have tried this. I also tried it on x86 with two
Ubuntu 16.04 guests, but unfortunately I couldn't reproduce it either; however, I
did get lower throughput with 50 instances than with one instance (1-4 vcpus).
Is it the same on s390x?

> 
> When running this setup, I am not able to reproduce the regression.  As
> mentioned previously, I am also unable to reproduce when running one end
> of the uperf connection from the host - I have only ever been able to
> reproduce when both ends of the uperf connection are running within a guest.

Did you see an improvement when running uperf from the host, given there was no
regression in that case?

It would be pretty nice to run pktgen from the VM as Jason suggested in another
mail (pktgen(vm1) -> tap1 -> bridge -> tap2 -> vm2); this is very close to your
original test case and can help determine whether the clue lies in TCP or in the
bridge driver.

I am also interested in your hardware platform: how many NUMA nodes do you have,
and what is your binding (vcpu/vhost/pktgen)? In my case I have a server with 4
NUMA nodes and 12 cpus per socket; I explicitly launch qemu from cpu 0, then bind
the vhost (Rx/Tx) threads to cpus 2 & 3, and the vcpus start from cpu 4 (3 vcpus
each).

> 
> > 3. BTW, did you enable hugepage for your guest? It would  performance more
> > or less depends on the memory demand when generating traffic, I didn't see
> > similar command lines in yours.
> > 
> 
> s390x does not currently support passing through hugetlb backing via
> QEMU mem-path.

Okay, thanks for sharing this.

Wei


> 


Re: Regression in throughput between kvm guests over virtual bridge

2017-10-26 Thread Wei Xu
On Wed, Oct 25, 2017 at 04:21:26PM -0400, Matthew Rosato wrote:
> On 10/22/2017 10:06 PM, Jason Wang wrote:
> > 
> > 
> > On 2017年10月19日 04:17, Matthew Rosato wrote:
> >>> 2. It might be useful to short the traffic path as a reference, What
> >>> I am running
> >>> is briefly like:
> >>>  pktgen(host kernel) -> tap(x) -> guest(DPDK testpmd)
> >>>
> >>> The bridge driver(br_forward(), etc) might impact performance due to
> >>> my personal
> >>> experience, so eventually I settled down with this simplified testbed
> >>> which fully
> >>> isolates the traffic from both userspace and host kernel stack(1 and
> >>> 50 instances,
> >>> bridge driver, etc), therefore reduces potential interferences.
> >>>
> >>> The down side of this is that it needs DPDK support in guest, has
> >>> this ever be
> >>> run on s390x guest? An alternative approach is to directly run XDP
> >>> drop on
> >>> virtio-net nic in guest, while this requires compiling XDP inside
> >>> guest which needs
> >>> a newer distro(Fedora 25+ in my case or Ubuntu 16.10, not sure).
> >>>
> >> I made an attempt at DPDK, but it has not been run on s390x as far as
> >> I'm aware and didn't seem trivial to get working.
> >>
> >> So instead I took your alternate suggestion & did:
> >> pktgen(host) -> tap(x) -> guest(xdp_drop)
> >>
> >> When running this setup, I am not able to reproduce the regression.  As
> >> mentioned previously, I am also unable to reproduce when running one end
> >> of the uperf connection from the host - I have only ever been able to
> >> reproduce when both ends of the uperf connection are running within a
> >> guest.
> >>
> > 
> > Thanks for the test. Looking at the code, the only obvious difference
> > when BATCH is 1 is that one spinlock which was previously called by
> > tun_peek_len() was avoided since we can do it locally. I wonder whether
> > or not this speeds up handle_rx() a little more then leads more wakeups
> > during some rates/sizes of TCP stream. To prove this, maybe you can try:
> > 
> > - enable busy polling, using poll-us=1000, and to see if we can still
> > get the regression
> 
> Enabled poll-us=1000 for both guests - drastically reduces throughput,
> but can still see the regression between host 4.12->4.13 running the
> uperf workload
> 
> 
> > - measure the pps pktgen(vm1) -> tap1 -> bridge -> tap2 -> vm2
> > 
> 
> I'm getting apparent stalls when I run pktgen from the guest in this
> manner...  (pktgen thread continues spinning after the first 5000
> packets make it to vm2, but no further packets get sent).  Not sure why yet.
> 


Are you using the same binding as mentioned in your previous mail? It might be
caused by cpu contention between pktgen and vhost; could you please try to run
pktgen from another idle cpu by adjusting the binding?
 
BTW, did you see any improvement when running pktgen from the host, given that no
regression was found there? Since this can be reproduced with only 1 vcpu for the
guest, could you try this binding? It might help simplify the problem.
  vcpu0  -> cpu2
  vhost  -> cpu3
  pktgen -> cpu1

Wei


Re: Regression in throughput between kvm guests over virtual bridge

2017-10-30 Thread Wei Xu
On Thu, Oct 26, 2017 at 01:53:12PM -0400, Matthew Rosato wrote:
> 
> > 
> > Are you using the same binding as mentioned in previous mail sent by you? it
> > might be caused by cpu convention between pktgen and vhost, could you please
> > try to run pktgen from another idle cpu by adjusting the binding? 
> 
> I don't think that's the case -- I can cause pktgen to hang in the guest
> without any cpu binding, and with vhost disabled even.

Yes, I did a test and it also hangs in the guest. Before we figure that out,
maybe you could try UDP with uperf for these cases?

VM   -> Host
Host -> VM
VM   -> VM

> 
> > BTW, did you see any improvement when running pktgen from the host if no 
> > regression was found? Since this can be reproduced with only 1 vcpu for
> > guest, may you try this bind? This might help simplify the problem.
> >   vcpu0  -> cpu2
> >   vhost  -> cpu3
> >   pktgen -> cpu1 
> > 
> 
> Yes -- I ran the pktgen test from host to guest with the binding
> described.  I see an approx 5% increase in throughput from 4.12->4.13.
> Some numbers:
> 
> host-4.12: 1384486.2pps 663.8MB/sec
> host-4.13: 1434598.6pps 688.2MB/sec

That's great, at least we are aligned in this case.

Jason, any thoughts on this? 

Wei

> 


Re: Regression in throughput between kvm guests over virtual bridge

2017-11-04 Thread Wei Xu
On Fri, Nov 03, 2017 at 12:30:12AM -0400, Matthew Rosato wrote:
> On 10/31/2017 03:07 AM, Wei Xu wrote:
> > On Thu, Oct 26, 2017 at 01:53:12PM -0400, Matthew Rosato wrote:
> >>
> >>>
> >>> Are you using the same binding as mentioned in previous mail sent by you? 
> >>> it
> >>> might be caused by cpu convention between pktgen and vhost, could you 
> >>> please
> >>> try to run pktgen from another idle cpu by adjusting the binding? 
> >>
> >> I don't think that's the case -- I can cause pktgen to hang in the guest
> >> without any cpu binding, and with vhost disabled even.
> > 
> > Yes, I did a test and it also hangs in guest, before we figure it out,
> > maybe you try udp with uperf with this case?
> > 
> > VM   -> Host
> > Host -> VM
> > VM   -> VM
> > 
> 
> Here are averaged run numbers (Gbps throughput) across 4.12, 4.13 and
> net-next with and without Jason's recent "vhost_net: conditionally
> enable tx polling" applied (referred to as 'patch' below).  1 uperf
> instance in each case:

Thanks a lot for the test. 

> 
> uperf TCP:
>             4.12    4.13    4.13+patch  net-next  net-next+patch
> -----------------------------------------------------------------
> VM->VM      35.2    16.5    20.84       22.2      24.36

Are you using the same server/test suite? You mentioned the number was around
28Gb for 4.12 and that it dropped about 40% for 4.13; it seems things have changed.
Are there any options for performance tuning on the server to maximize cpu
utilization?

I had a similar experience on x86 servers and desktops before, where the result
numbers fluctuated quite a bit from run to run.

> VM->Host    42.15   43.57   44.90        30.83      32.26
> Host->VM    53.17   41.51   42.18        37.05      37.30

This is a bit odd; I remember you said there was no regression when testing
Host->VM, didn't you?

> 
> uperf UDP:
>             4.12    4.13    4.13+patch   net-next   net-next+patch
> -------------------------------------------------------------------
> VM->VM      24.93   21.63   25.09        8.86       9.62
> VM->Host    40.21   38.21   39.72        8.74       9.35
> Host->VM    31.26   30.18   31.25        7.2        9.26

This case should be quite similar to pktgen; if you saw an improvement with
pktgen, usually the same holds for UDP. Could you please try disabling tso,
gso, gro and ufo on all host tap devices and guest virtio-net devices (a rough
ethtool sketch follows the table below)? Currently the most significant tests
would be these, AFAICT:

Host->VM     4.12    4.13
 TCP:
 UDP:
pktgen:
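
For the offload knobs, something like this should do; tap0 and eth0 below are
just placeholders for your actual host tap and guest virtio-net interface
names, and ufo may already be unavailable on newer kernels:

  # host side, repeat for every tap attached to the bridge
  ethtool -K tap0 tso off gso off gro off ufo off

  # guest side, on the virtio-net interface
  ethtool -K eth0 tso off gso off gro off ufo off

  # double-check what is actually in effect
  ethtool -k tap0 | egrep -i 'segmentation|receive-offload|fragmentation'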

I don't want to bother you too much, so maybe 4.12 & 4.13 without Jason's patch
would be enough, since we have already seen positive numbers for that; you can
also temporarily skip net-next.

If you see UDP and pktgen aligned, then it might be helpful to continue with
the other two cases; otherwise we are stuck at the first step anyway.

> The net is that Jason's recent patch definitely improves things across
> the board at 4.13 as well as at net-next -- But the VM<->VM TCP numbers
> I am observing are still lower than base 4.12.

Cool.

> 
> A separate concern is why my UDP numbers look so bad on net-next (have
> not bisected this yet).

This might be another issue. I am on vacation and will try it on x86 once I am
back at work next Wednesday.

Wei

> 


Re: Regression in throughput between kvm guests over virtual bridge

2017-11-12 Thread Wei Xu
On Tue, Nov 07, 2017 at 08:02:48PM -0500, Matthew Rosato wrote:
> On 11/04/2017 07:35 PM, Wei Xu wrote:
> > On Fri, Nov 03, 2017 at 12:30:12AM -0400, Matthew Rosato wrote:
> >> On 10/31/2017 03:07 AM, Wei Xu wrote:
> >>> On Thu, Oct 26, 2017 at 01:53:12PM -0400, Matthew Rosato wrote:
> >>>>
> >>>>>
> >>>>> Are you using the same binding as mentioned in previous mail sent by 
> >>>>> you? it
> >>>>> might be caused by cpu convention between pktgen and vhost, could you 
> >>>>> please
> >>>>> try to run pktgen from another idle cpu by adjusting the binding? 
> >>>>
> >>>> I don't think that's the case -- I can cause pktgen to hang in the guest
> >>>> without any cpu binding, and with vhost disabled even.
> >>>
> >>> Yes, I did a test and it also hangs in guest, before we figure it out,
> >>> maybe you try udp with uperf with this case?
> >>>
> >>> VM   -> Host
> >>> Host -> VM
> >>> VM   -> VM
> >>>
> >>
> >> Here are averaged run numbers (Gbps throughput) across 4.12, 4.13 and
> >> net-next with and without Jason's recent "vhost_net: conditionally
> >> enable tx polling" applied (referred to as 'patch' below).  1 uperf
> >> instance in each case:
> > 
> > Thanks a lot for the test. 
> > 
> >>
> >> uperf TCP:
> >>             4.12    4.13    4.13+patch   net-next   net-next+patch
> >> -------------------------------------------------------------------
> >> VM->VM      35.2    16.5    20.84        22.2       24.36
> > 
> > Are you using the same server/test suite? You mentioned the number was 
> > around 
> > 28Gb for 4.12 and it dropped about 40% for 4.13, it seems thing changed, are
> > there any options for performance tuning on the server to maximize the cpu
> > utilization? 
> 
> I experience some volatility as I am running on 1 of multiple LPARs
> available to this system (they are sharing physical resources).  But I
> think the real issue was that I left my guest environment set to 4
> vcpus, but was binding assuming there was 1 vcpu (was working on
> something else, forgot to change back).  This likely tainted my most
> recent results, sorry.

Not a problem at all, also thanks for the feedback. :)

> 
> > 
> > I had similar experience on x86 server and desktop before and it made that
> > the result number always went up and down pretty much.
> > 
> >> VM->Host 42.15 43.57   44.90   30.83   32.26
> >> Host->VM 53.17 41.51   42.18   37.05   37.30
> > 
> > This is a bit odd, I remember you said there was no regression while 
> > testing Host>VM, wasn't it? 
> > 
> >>
> >> uperf UDP:
> >>             4.12    4.13    4.13+patch   net-next   net-next+patch
> >> -------------------------------------------------------------------
> >> VM->VM      24.93   21.63   25.09        8.86       9.62
> >> VM->Host    40.21   38.21   39.72        8.74       9.35
> >> Host->VM    31.26   30.18   31.25        7.2        9.26
> > 
> > This case should be quite similar with pkgten, if you got improvement with
> > pktgen, usually it was also the same for UDP, could you please try to 
> > disable
> > tso, gso, gro, ufo on all host tap devices and guest virtio-net devices? 
> > Currently
> > the most significant tests would be like this AFAICT:
> > 
> > Host->VM 4.124.13
> >  TCP:
> >  UDP:
> > pktgen:
> > 
> > Don't want to bother you too much, so maybe 4.12 & 4.13 without Jason's 
> > patch should
> > work since we have seen positive number for that, you can also temporarily 
> > skip
> > net-next as well.
> 
> Here are the requested numbers, averaged over numerous runs --  guest is
> 4GB+1vcpu, host uperf/pktgen bound to 1 host CPU + qemu and vhost thread
> pinned to other unique host CPUs.  tso, gso, gro, ufo disabled on host
> taps / guest virtio-net devs as requested:
> 
> Host->VM   4.12         4.13
> TCP:       9.92Gb/s     6.44Gb/s
> UDP:       5.77Gb/s     6.63Gb/s
> pktgen:    1572403pps   1904265pps
> 
> UDP/pktgen both show improvement from 4.12->4.13.  More interesting,
> however, is that I am seeing the TCP regression for the first time from
> host->VM.  I wonder if the combination of CPU binding + disabling of one
> or more of tso/gso/gro/ufo is related.

Re: Regression in throughput between kvm guests over virtual bridge

2017-11-12 Thread Wei Xu
On Sat, Nov 11, 2017 at 03:59:54PM -0500, Matthew Rosato wrote:
> >> This case should be quite similar with pkgten, if you got improvement with
> >> pktgen, usually it was also the same for UDP, could you please try to 
> >> disable
> >> tso, gso, gro, ufo on all host tap devices and guest virtio-net devices? 
> >> Currently
> >> the most significant tests would be like this AFAICT:
> >>
> >> Host->VM     4.12    4.13
> >>  TCP:
> >>  UDP:
> >> pktgen:
> >>
> >> Don't want to bother you too much, so maybe 4.12 & 4.13 without Jason's 
> >> patch should
> >> work since we have seen positive number for that, you can also temporarily 
> >> skip
> >> net-next as well.
> > 
> > Here are the requested numbers, averaged over numerous runs --  guest is
> > 4GB+1vcpu, host uperf/pktgen bound to 1 host CPU + qemu and vhost thread
> > pinned to other unique host CPUs.  tso, gso, gro, ufo disabled on host
> > taps / guest virtio-net devs as requested:
> > 
> > Host->VM   4.12         4.13
> > TCP:       9.92Gb/s     6.44Gb/s
> > UDP:       5.77Gb/s     6.63Gb/s
> > pktgen:    1572403pps   1904265pps
> > 
> > UDP/pktgen both show improvement from 4.12->4.13.  More interesting,
> > however, is that I am seeing the TCP regression for the first time from
> > host->VM.  I wonder if the combination of CPU binding + disabling of one
> > or more of tso/gso/gro/ufo is related.
> > 
> >>
> >> If you see UDP and pktgen are aligned, then it might be helpful to continue
> >> the other two cases, otherwise we fail in the first place.
> > 
> 
> I continued running many iterations of these tests between 4.12 and
> 4.13..  My throughput findings can be summarized as:

Really nice to have these numbers.

> 
> VM->VM case:
> UDP:  roughly equivalent
> TCP:  Consistent regression (5-10%)
> 
> VM->Host
> Both UDP and TCP traffic are roughly equivalent.

The patch improves Rx performance from the guest's point of view, so there
should be no big difference for Tx, since the Rx packets are far fewer than
the Tx packets in this case.

> 
> Host->VM
> UDP+pktgen: improvement (5-10%), but inconsistent
> TCP: Consistent regression (25-30%)

Maybe we can try to figure out this case first since it is the shortest path.
Can you have a look at the TCP statistics and paste a few outputs taken between
tests (a rough sketch of the commands follows)? I suspect there is some
retransmission, zero-window probing, etc. going on.
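
A sketch of what would help, taken on both sides around a single uperf run
(exact counter names can vary a bit between kernel versions, and the peer IP
below is a placeholder):

  # snapshot the MIB counters around one run
  nstat -n          # no output, just update the history so the next call prints deltas
  # ... run the uperf test ...
  nstat             # look at TcpRetransSegs, TcpExtTCPTimeouts,
                    # the TcpExt*ZeroWindowAdv counters and friends

  # per-connection details while the test is running
  ss -ti dst <peer-ip>

  # or the classic summary
  netstat -s | egrep -i 'retrans|zero window|timeout'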

> 
> Host->VM UDP and pktgen seemed to show improvement in some runs, and in
> others seemed to mirror 4.12-level performance.
> 
> The TCP regression for VM->VM is no surprise, we started with that.
> It's still consistent, but smaller in this specific environment.

Right, there are too many factors that might influence the performance.

> 
> The TCP regression in Host->VM is interesting because I wasn't seeing it
> consistently before binding CPUs + disabling tso/gso/gro/ufo.  Also
> interesting because of how large it is -- By any chance can you see this
> regression on x86 with the same configuration?

I had a quick test and it seems I also see a drop on x86 without tso, gso, etc.
Below is the data with and without tso, gso, ...; I will check out the TCP
statistics and let you know soon.

4.12
---------------------------------------------------------
master   32.34s   112.63GB   29.91Gb/s   40310900.00
master   32.33s    32.58GB    8.66Gb/s   11660140.00
---------------------------------------------------------

4.13
---------------------------------------------------------
master   32.35s   119.17GB   31.64Gb/s   42651900.00
master   32.33s    27.02GB    7.18Gb/s    9670070.00
---------------------------------------------------------

Wei 



[net-next] net: smsc911x: Remove unused variables

2020-09-07 Thread Wei Xu
Fixes the following W=1 kernel build warning(s):

 drivers/net/ethernet/smsc/smsc911x.c: In function ‘smsc911x_rx_fastforward’:
 drivers/net/ethernet/smsc/smsc911x.c:1199:16: warning: variable ‘temp’ set but not used [-Wunused-but-set-variable]

 drivers/net/ethernet/smsc/smsc911x.c: In function ‘smsc911x_eeprom_write_location’:
 drivers/net/ethernet/smsc/smsc911x.c:2058:6: warning: variable ‘temp’ set but not used [-Wunused-but-set-variable]

Signed-off-by: Wei Xu 
---
 drivers/net/ethernet/smsc/smsc911x.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/smsc/smsc911x.c b/drivers/net/ethernet/smsc/smsc911x.c
index fc168f8..823d9a7 100644
--- a/drivers/net/ethernet/smsc/smsc911x.c
+++ b/drivers/net/ethernet/smsc/smsc911x.c
@@ -1196,9 +1196,8 @@ smsc911x_rx_fastforward(struct smsc911x_data *pdata, unsigned int pktwords)
SMSC_WARN(pdata, hw, "Timed out waiting for "
  "RX FFWD to finish, RX_DP_CTRL: 0x%08X", val);
} else {
-   unsigned int temp;
while (pktwords--)
-   temp = smsc911x_reg_read(pdata, RX_DATA_FIFO);
+   smsc911x_reg_read(pdata, RX_DATA_FIFO);
}
 }
 
@@ -2055,7 +2054,6 @@ static int smsc911x_eeprom_write_location(struct smsc911x_data *pdata,
  u8 address, u8 data)
 {
u32 op = E2P_CMD_EPC_CMD_ERASE_ | address;
-   u32 temp;
int ret;
 
SMSC_TRACE(pdata, drv, "address 0x%x, data 0x%x", address, data);
@@ -2066,7 +2064,7 @@ static int smsc911x_eeprom_write_location(struct smsc911x_data *pdata,
smsc911x_reg_write(pdata, E2P_DATA, (u32)data);
 
/* Workaround for hardware read-after-write restriction */
-   temp = smsc911x_reg_read(pdata, BYTE_TEST);
+   smsc911x_reg_read(pdata, BYTE_TEST);
 
ret = smsc911x_eeprom_send_cmd(pdata, op);
}
-- 
2.8.1



[net-next] net: i40e: Use the ARRAY_SIZE macro for aq_to_posix

2020-09-09 Thread Wei Xu
Use the ARRAY_SIZE macro to calculate the size of an array.
This code was detected with the help of Coccinelle.

Signed-off-by: Wei Xu 
---
 drivers/net/ethernet/intel/i40e/i40e_adminq.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_adminq.h b/drivers/net/ethernet/intel/i40e/i40e_adminq.h
index edec3df..11c5fca 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_adminq.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_adminq.h
@@ -120,7 +120,7 @@ static inline int i40e_aq_rc_to_posix(int aq_ret, int aq_rc)
if (aq_ret == I40E_ERR_ADMIN_QUEUE_TIMEOUT)
return -EAGAIN;
 
-   if (!((u32)aq_rc < (sizeof(aq_to_posix) / sizeof((aq_to_posix)[0]))))
+   if (!((u32)aq_rc < ARRAY_SIZE(aq_to_posix)))
return -ERANGE;
 
return aq_to_posix[aq_rc];
-- 
2.8.1



[net-next] net: iavf: Use the ARRAY_SIZE macro for aq_to_posix

2020-09-09 Thread Wei Xu
Use the ARRAY_SIZE macro to calculate the size of an array.
This code was detected with the help of Coccinelle.

Signed-off-by: Wei Xu 
---
 drivers/net/ethernet/intel/iavf/iavf_adminq.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/iavf/iavf_adminq.h b/drivers/net/ethernet/intel/iavf/iavf_adminq.h
index baf2fe2..eead12c 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_adminq.h
+++ b/drivers/net/ethernet/intel/iavf/iavf_adminq.h
@@ -120,7 +120,7 @@ static inline int iavf_aq_rc_to_posix(int aq_ret, int aq_rc)
if (aq_ret == IAVF_ERR_ADMIN_QUEUE_TIMEOUT)
return -EAGAIN;
 
-   if (!((u32)aq_rc < (sizeof(aq_to_posix) / sizeof((aq_to_posix)[0]))))
+   if (!((u32)aq_rc < ARRAY_SIZE(aq_to_posix)))
return -ERANGE;
 
return aq_to_posix[aq_rc];
-- 
2.8.1



[PATCH] net/mlx5e: kTLS, Fix GFP_KERNEL in spinlock context

2020-09-03 Thread Wei Xu
Replace GFP_KERNEL with GFP_ATOMIC while resync_post_get_progress_params
is invoked in a spinlock context.
This code was detected with the help of Coccinelle.

Signed-off-by: Wei Xu 
---
 drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_rx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_rx.c
index acf6d80..1a32435 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_rx.c
@@ -247,7 +247,7 @@ resync_post_get_progress_params(struct mlx5e_icosq *sq,
int err;
u16 pi;
 
-   buf = kzalloc(sizeof(*buf), GFP_KERNEL);
+   buf = kzalloc(sizeof(*buf), GFP_ATOMIC);
if (unlikely(!buf)) {
err = -ENOMEM;
goto err_out;
-- 
2.8.1



Re: [PATCH] net/mlx5e: kTLS, Fix GFP_KERNEL in spinlock context

2020-09-03 Thread Wei Xu
Hi All,

Sorry for the noise; please ignore this patch!
I found that a nearly identical patch was already sent out 2 days ago.

Best Regards,
Wei

On 2020/9/3 19:45, Wei Xu wrote:
> Replace GFP_KERNEL with GFP_ATOMIC while resync_post_get_progress_params
> is invoked in a spinlock context.
> This code was detected with the help of Coccinelle.
> 
> Signed-off-by: Wei Xu 
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_rx.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_rx.c
> index acf6d80..1a32435 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_rx.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_rx.c
> @@ -247,7 +247,7 @@ resync_post_get_progress_params(struct mlx5e_icosq *sq,
>   int err;
>   u16 pi;
>  
> - buf = kzalloc(sizeof(*buf), GFP_KERNEL);
> + buf = kzalloc(sizeof(*buf), GFP_ATOMIC);
>   if (unlikely(!buf)) {
>   err = -ENOMEM;
>   goto err_out;
>