Performing verification for Focal.

The affected user enabled -proposed and installed 5.4.0-66-generic on a
system fitted with a QLogic FastLinQ QL41000 Series 10/25/40/50GbE
Controller.

They then set all interfaces down, and brought up the QLogic NIC only.

# uname -rv
5.4.0-66-generic #74~18.04.2-Ubuntu SMP Fri Feb 5 11:17:31 UTC 2021

# nslookup internal.kubernetes.domain.example 10.1.0.10
Server: 10.1.0.10
Address: 10.1.0.10#53
Name: internal.kubernetes.domain.example
Address: 10.48.24.11

# ethtool -k eno1 | grep tx-checksumming
tx-checksumming: on
# ethtool -k enp94s0f0 | grep tx-checksumming
tx-checksumming: on

DNS lookups to an internal Kubernetes domain, which are carried in IPIP
tunnelled packets, work as intended with tx checksumming enabled.
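
For repeat checks, the offload state can be pulled out of the ethtool
output with a small helper. This is only a sketch: tx_csum_state is a
made-up name, and it assumes the "tx-checksumming: on|off" line format
shown in the output above.

```shell
# Hypothetical helper, not part of ethtool.
# Reads `ethtool -k <dev>` output on stdin and prints the tx-checksumming
# state ("on" or "off"), dropping any trailing "[fixed]" marker.
tx_csum_state() {
    awk -F': *' '$1 == "tx-checksumming" { split($2, f, " "); print f[1]; exit }'
}

# Usage on a real system (interface name taken from this report):
#   ethtool -k enp94s0f0 | tx_csum_state
```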

The kernel in -proposed fixes the issue. Marking as verified.

** Tags removed: verification-needed-focal
** Tags added: verification-done-focal

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1909062

Title:
  qede: Kubernetes Internal DNS Failure due to QL41xxx NIC not
  supporting IPIP tx csum offload

Status in linux package in Ubuntu:
  Fix Committed
Status in linux source package in Focal:
  Fix Committed
Status in linux source package in Groovy:
  Fix Committed
Status in linux source package in Hirsute:
  Fix Committed

Bug description:
  BugLink: https://bugs.launchpad.net/bugs/1909062

  [Impact]

  For users with QLogic QL41xxx series NICs, such as the FastLinQ
  QL41000 Series 10/25/40/50GbE Controller, Kubernetes internal DNS
  requests fail after upgrading from the 4.15 kernel to the 5.4 kernel,
  because the request packets get corrupted.

  Kubernetes uses IPIP tunnelled packets for internal DNS resolution.
  This packet type is not supported for hardware tx checksum offload,
  so the packets end up corrupted when the qede driver attempts to
  checksum them.

  This only affects internal Kubernetes DNS. Lookups of regular
  external domains succeed, because they do not use IPIP packets.

  [Fix]

  Marvell has developed a fix for the qede driver which checks the
  packet type and, if it is IPPROTO_IPIP, disables csum offloads for
  that socket buffer.

  commit 5d5647dad259bb416fd5d3d87012760386d97530
  Author: Manish Chopra <mani...@marvell.com>
  Date: Mon Dec 21 06:55:30 2020 -0800
  Subject: qede: fix offload for IPIP tunnel packets
  Link: https://github.com/torvalds/linux/commit/5d5647dad259bb416fd5d3d87012760386d97530

  This commit landed in mainline in 5.11-rc3. The commit was accepted
  into upstream stable 4.14.215, 4.19.167, 5.4.89 and 5.10.7.

  Note, this SRU isn't targeted for Bionic due to tx csum offload
  support only landing in 5.0 and onward, meaning the 4.15 kernel still
  works even without this patch. Because of this, Bionic can pick the
  patch up naturally from upstream stable.

  [Testcase]

  The system must have a QLogic QL41xxx series NIC fitted, and needs to
  be a part of a Kubernetes cluster.

  Firstly, get a list of all devices in the system:

  $ sudo ifconfig

  Next, set all devices down with:

  $ sudo ifconfig <device> down

  Next, bring up the QLogic QL41xxx device:

  $ sudo ifconfig <qlogic nic device> up

  Then, attempt to look up an internal Kubernetes domain:

  $ nslookup <internal kubernetes domain address>

  Without the patch, the connection will time out:

  ;; connection timed out; no servers could be reached
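
  The interface steps above could be scripted roughly as follows. This
  is a sketch, not part of the original test: isolate_nic_cmds is a
  hypothetical helper, it uses "ip link" in place of ifconfig, and it
  leaves the loopback device alone, which is an assumption. It only
  prints the commands, since running them cuts network access on every
  other interface.

```shell
# Hypothetical helper: isolate_nic_cmds NIC IFACE...
# Prints (does not run) the commands that set every listed interface
# down, except "lo" and NIC, and then bring NIC up.
isolate_nic_cmds() {
    nic=$1
    shift
    for dev in "$@"; do
        case "$dev" in
            lo|"$nic") continue ;;   # assumption: keep loopback untouched
        esac
        printf 'ip link set dev %s down\n' "$dev"
    done
    printf 'ip link set dev %s up\n' "$nic"
}

# Preview for the NIC named in this report:
#   isolate_nic_cmds enp94s0f0 $(ls /sys/class/net)
# To apply from a local console, pipe the output to a root shell:
#   isolate_nic_cmds enp94s0f0 $(ls /sys/class/net) | sudo sh
```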

  If we look at packet traces with tcpdump, we see the request leave the
  source, but it never arrives at the destination.
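
  To narrow such a capture to the tunnelled traffic, a filter on IP
  protocol number 4 (IPIP) can be used. A sketch only: ipip_capture_cmd
  is a hypothetical helper that prints the tcpdump command line for a
  given interface, to be run as root on both the source and the
  destination node.

```shell
# Hypothetical helper: print a tcpdump command that captures only
# IPIP-encapsulated packets (IP protocol number 4) on the given interface.
ipip_capture_cmd() {
    printf "tcpdump -n -i %s 'ip proto 4'\n" "$1"
}

# Example:
#   ipip_capture_cmd enp94s0f0
```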

  There is a test kernel available in the following ppa:

  https://launchpad.net/~mruffell/+archive/ubuntu/sf297772-test

  If you install it, then Kubernetes internal DNS lookups will succeed.

  [Where problems could occur]

  If a regression were to occur, then users of the qede driver would be
  affected. This is limited to those with QLogic QL41xxx series NICs.
  The patch explicitly checks for IPIP type packets, so only those
  particular packets would be affected.

  IPIP type packets are uncommon, so a regression would not cause a
  total outage, as most traffic is not IPIP tunnelled. It could cause
  problems for users who frequently handle VPN or Kubernetes internal
  DNS traffic.

  A workaround would be to use ethtool to disable tx csum offload for
  all packet types, or to revert to an older kernel.
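
  The ethtool workaround could look something like the following. This
  is a sketch: disable_tx_csum and the ETHTOOL override are hypothetical
  conveniences, and the setting is a runtime one, so it would need to
  be reapplied after a reboot.

```shell
# Hypothetical helper: turn off tx checksum offload on one interface.
# ETHTOOL can be overridden (for example ETHTOOL=echo) to preview the
# arguments without touching the device.
disable_tx_csum() {
    dev=$1
    ${ETHTOOL:-ethtool} -K "$dev" tx off
}

# Usage, as root, for the NIC named in this report:
#   disable_tx_csum enp94s0f0
# Then confirm with:
#   ethtool -k enp94s0f0 | grep tx-checksumming
```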

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1909062/+subscriptions
