[Kernel-packages] [Bug 1854842] Re: mlx5_core reports hardware checksum error for padded packets on Mellanox NICs

Matthew Ruffell Mon, 03 Feb 2020 15:56:33 -0800

Hi Mohammad,

It seems things have returned to normal for the current SRU cycle,
and the two commits you requested:

net/mlx5e: Rx, Fix checksum calculation for new hardware
net/mlx5e: Rx, Fixup skb checksum for packets with tail padding

have been tagged and built into the 4.15.0-87-generic bionic kernel,
which is currently sitting in -proposed awaiting validation.

Can you please install the kernel from -proposed, run the reproducer,
and check that no kernel splat is generated when you send large IP
packets with padding at the end?

Instructions to install (on a bionic system):
1) Add the -proposed repository by adding the following line to
/etc/apt/sources.list:
deb http://archive.ubuntu.com/ubuntu/ bionic-proposed restricted main multiverse universe
2) sudo apt update
3) sudo apt install linux-image-4.15.0-87-generic \
linux-modules-4.15.0-87-generic \
linux-modules-extra-4.15.0-87-generic \
linux-headers-4.15.0-87 \
linux-headers-4.15.0-87-generic
4) sudo reboot
5) uname -rv
4.15.0-87-generic #87-Ubuntu SMP Fri Jan 31 19:32:37 UTC 2020
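
If you would rather script the check in step 5, something like the
following rough sketch could be used. It is not part of the SRU
process; the helper names parse_release and has_fix are made up for
illustration, and it assumes that any later 4.15.0 ABI build keeps the
two commits.

```python
import platform
import re


def parse_release(release):
    """Split a release such as '4.15.0-87-generic' into (4, 15, 0, 87).

    Returns None when the string does not follow that layout.
    """
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)-(\d+)(?:-.+)?", release)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def has_fix(release, minimum=(4, 15, 0, 87)):
    """True for a 4.15.0 kernel whose ABI number is at least 87.

    Assumption: later 4.15.0 builds retain the two mlx5e commits.
    """
    version = parse_release(release)
    if version is None:
        return False
    return version[:3] == minimum[:3] and version[3] >= minimum[3]


if __name__ == "__main__":
    running = platform.release()  # same value as `uname -r`
    if has_fix(running):
        print(running, "is 4.15.0-87 or later")
    else:
        print(running, "is not a 4.15.0 kernel at ABI 87 or later")
```

The check only looks at the version string, so it says nothing about
whether the reproducer still triggers a splat.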

Hopefully the reproducer shows everything has been fixed.

I apologise again for the delay. The kernel team were really adamant
about having no regressions in the previous SRU cycle, but things should
be back to normal now.

Let me know how the kernel in -proposed goes.

Thanks,
Matthew

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1854842

Title:
  mlx5_core reports hardware checksum error for padded packets on
  Mellanox NICs

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Bionic:
  Fix Committed

Bug description:
  BugLink: https://bugs.launchpad.net/bugs/1854842

  [Impact]

  On machines equipped with Mellanox NICs, in this particular case
  Mellanox 5 series NICs using the mlx5_core driver, there is a kernel
  splat when sending large IP packets which have padding at the end.

  enp6s0f0: hw csum failure
  CPU: 19 PID: 0 Comm: swapper/19 Not tainted 4.15.0-72-generic
  Call Trace:
  <IRQ>
  dump_stack+0x63/0x8e
  netdev_rx_csum_fault+0x38/0x40
  __skb_checksum_complete+0xbc/0xd0
  nf_ip_checksum+0xc3/0xf0
  icmp_error+0x27d/0x310 [nf_conntrack_ipv4]
  nf_conntrack_in+0x15a/0x510 [nf_conntrack]
  ? __skb_checksum+0x68/0x330
  ipv4_conntrack_in+0x1c/0x20 [nf_conntrack_ipv4]
  nf_hook_slow+0x48/0xc0
  ? skb_send_sock+0x50/0x50
  ip_rcv+0x301/0x360
  ? inet_del_offload+0x40/0x40
  __netif_receive_skb_core+0x432/0xb80
  __netif_receive_skb+0x18/0x60
  ? __netif_receive_skb+0x18/0x60
  netif_receive_skb_internal+0x45/0xe0
  napi_gro_receive+0xc5/0xf0
  mlx5e_handle_rx_cqe+0x48d/0x5e0 [mlx5_core]
  ? enqueue_task_rt+0x1b4/0x2e0
  mlx5e_poll_rx_cq+0xd1/0x8c0 [mlx5_core]
  mlx5e_napi_poll+0x9d/0x290 [mlx5_core]
  net_rx_action+0x140/0x3a0
  __do_softirq+0xe4/0x2d4
  irq_exit+0xc5/0xd0
  do_IRQ+0x86/0xe0
  common_interrupt+0x8c/0x8c
  </IRQ>
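
  To illustrate why tail padding matters for a checksum, here is a small
  sketch of the 16-bit one's-complement sum from RFC 1071. It is plain
  arithmetic only, not the mlx5_core code or the logic of either commit.
  The packet bytes are hypothetical, and the helper names are made up
  for the example. The point is that a sum taken with the padding
  differs from a sum taken without it, so whichever layer adds or trims
  the padding has to adjust the running sum by the padding's
  contribution.

```python
def ones_complement_sum(data: bytes) -> int:
    """16-bit one's-complement sum over `data` (RFC 1071 style)."""
    if len(data) % 2:
        data += b"\x00"  # an odd-length buffer is padded with one zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return total


def add16(a: int, b: int) -> int:
    """Add two 16-bit one's-complement sums with end-around carry."""
    total = a + b
    return (total & 0xFFFF) + (total >> 16)


# Hypothetical 20-byte packet (even length) followed by non-zero padding.
packet = bytes(range(1, 21))
padding = b"\xfe" * 4

without_padding = ones_complement_sum(packet)
with_padding = ones_complement_sum(packet + padding)

# The two sums disagree, and the gap is exactly the padding's own sum.
print(hex(without_padding), hex(with_padding))
print(with_padding == add16(without_padding, ones_complement_sum(padding)))
```

  The even packet length keeps the 16-bit words aligned, which is what
  lets the padding's sum be added separately in this sketch.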

  This bug is a further attempt to fix these splats, as there have been
  previous fixes in LP #1840854 and a series of commits which landed in
  4.15.0-67 (LP #1847155) as a part of upstream -stable patches.

  This bug will also fix the same problems on the new Mellanox CX6 and
  Bluefield hardware, which has been enabled already via previous
  upstream -stable patches which landed in LP #1847155.

  [Fix]

  This particular issue was fixed for Mellanox series 5 drivers in the
  following commits:

  commit 0aa1d18615c163f92935b806dcaff9157645233a
  Author: Saeed Mahameed <sae...@mellanox.com>
  Date:   Tue Mar 12 00:24:52 2019 -0700
  Subject: net/mlx5e: Rx, Fixup skb checksum for packets with tail padding

  This commit required a minor backport.

  This commit was selected for upstream -stable in 4.19.76 and 5.0.10.
  This commit appears to be omitted from "Bionic update: upstream stable
  patchset 2019-10-07", which is LP #1847155, probably due to requiring a
  backport.

  commit db849faa9bef993a1379dc510623f750a72fa7ce
  Author: Saeed Mahameed <sae...@mellanox.com>
  Date:   Fri May 3 13:14:59 2019 -0700
  Subject: net/mlx5e: Rx, Fix checksum calculation for new hardware

  This commit required a minor backport.

  This commit was selected for upstream -stable in 5.1.21 and 5.2.4.
  This commit has already been applied to the disco kernel, as part of
  stable updates.

  [Testcase]

  The following scapy script will reproduce this issue. Run from the
  machine with the Mellanox series 5 NIC:

  1)
  
a=Ether(dst='ff:ff:ff:ff:ff:ff')/IP(dst='127.0.0.1')/ICMP()/Padding(load='\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe')

  2) sendp(a, iface='enp6s0f0')

  3) Check dmesg on the receiver side. The example uses localhost, so
  check dmesg on the same machine.
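
  Step 3 can also be scripted. The following is a minimal sketch, not
  part of the original test case: it assumes the splat always starts
  with a line of the form "<iface>: hw csum failure", as in the trace in
  [Impact], and the function names are made up for illustration.
  Reading dmesg may require root on some systems.

```python
import re
import subprocess

# Matches the first line of the splat, e.g. "enp6s0f0: hw csum failure".
CSUM_FAILURE = re.compile(r"(\S+): hw csum failure")


def find_csum_failures(log_text):
    """Return the interface name from every 'hw csum failure' line."""
    return CSUM_FAILURE.findall(log_text)


def read_dmesg():
    """Return the kernel ring buffer as text by running `dmesg`."""
    result = subprocess.run(
        ["dmesg"], capture_output=True, text=True, errors="replace", check=True
    )
    return result.stdout


if __name__ == "__main__":
    try:
        failures = find_csum_failures(read_dmesg())
    except (OSError, subprocess.CalledProcessError) as exc:
        print("could not read dmesg:", exc)
    else:
        if failures:
            print("hw csum failure reported on:", ", ".join(sorted(set(failures))))
        else:
            print("no hw csum failure found in dmesg")
```

  Run it after sendp() in step 2. An empty result on the -proposed
  kernel, together with a hit on the old kernel, is what the fix should
  produce.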

  I have built some test kernels, which are available here:

  https://launchpad.net/~mruffell/+archive/ubuntu/lp1854842-test
  This kernel contains 0aa1d18615c163f92935b806dcaff9157645233a.

  and

  https://launchpad.net/~mruffell/+archive/ubuntu/lp1854842-test-2
  This kernel contains db849faa9bef993a1379dc510623f750a72fa7ce.

  If you install the test kernels, the issue is resolved.

  [Regression Potential]

  The changes are limited to the mlx5_core driver, and only modify how
  packet checksums are calculated when padding is involved.

  Both patches have been accepted and published by upstream -stable, and
  are widely accepted by the community.

  Because of this, I believe the risk of regression is low.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1854842/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
