** Tags added: kernel-daily-bug
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2071471
Title:
[UBUNTU 24.04] IOMMU DMA mode changed in kernel config causes massive
throughput degradation for PCI-related network workloads
Status in Ubuntu on IBM z Systems:
Fix Released
Status in linux package in Ubuntu:
Fix Released
Status in linux source package in Noble:
Fix Released
Status in linux source package in Oracular:
Fix Released
Bug description:
SRU Justification:
[Impact]
* With the introduction of c76c067e488c "s390/pci: Use dma-iommu layer"
(upstream since kernel v6.7-rc1) there was a move (on s390x only)
to a different dma-iommu implementation.
* And with 92bce97f0c34 "s390/pci: Fix reset of IOMMU software counters"
(again upstream since v6.7-rc1) the IOMMU_DEFAULT_DMA_LAZY kernel config
option should now be set to 'yes' by default for s390x.
* Since CONFIG_IOMMU_DEFAULT_DMA_STRICT and IOMMU_DEFAULT_DMA_LAZY
are related to each other, CONFIG_IOMMU_DEFAULT_DMA_STRICT needs to be
set to "no" by default, which was done upstream by b2b97a62f055
"Revert "s390: update defconfigs"".
* These changes are all upstream, but were not picked up by the Ubuntu
kernel config.
* And not having these config options set properly is causing significant
PCI-related network throughput degradation (up to -72%).
* This shows for almost all workloads and numbers of connections,
deteriorating with the number of connections increasing.
* Especially drastic is the drop for a high number of parallel connections
(50 and 250) and for small and medium-size transactional workloads.
However, also for streaming-type workloads the degradation is clearly
visible (up to 48% degradation).
[Fix]
* The (upstream accepted) fix is to set
IOMMU_DEFAULT_DMA_STRICT=no
and
IOMMU_DEFAULT_DMA_LAZY=y
(which is needed for the changed DMA IOMMU implementation since v6.7).
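In kernel .config syntax, the corrected defaults would read as follows (a sketch of the two affected options only, not the full config):

```
# CONFIG_IOMMU_DEFAULT_DMA_STRICT is not set
CONFIG_IOMMU_DEFAULT_DMA_LAZY=y
```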
[Test Case]
* Set up two Ubuntu Server 24.04 systems (with kernel 6.8)
(one acting as server, the other as client)
that have (PCIe attached) RoCE Express devices attached
and that are connected to each other.
* Verify that the iommu_group type of the used PCI device is DMA-FQ:
cat /sys/bus/pci/devices/<device>\:00\:00.0/iommu_group/type
DMA-FQ
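The mapping from the sysfs type string to the flushing mode it implies can be captured in a small helper (a sketch only; classify_iommu_type is a hypothetical name, not part of any existing tool):

```shell
# Interpret the iommu_group "type" string read from sysfs:
#   DMA-FQ -> lazy (flush-queue) TLB invalidation
#   DMA    -> strict per-unmap TLB invalidation
classify_iommu_type() {
  case "$1" in
    DMA-FQ) echo "lazy" ;;
    DMA)    echo "strict" ;;
    *)      echo "unknown" ;;
  esac
}

classify_iommu_type DMA-FQ   # prints: lazy
```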
* Sample workload rr1c-200x1000-250 with rr1c-200x1000-250.xml:
<?xml version="1.0"?>
<profile name="TCP_RR">
<group nprocs="250">
<transaction iterations="1">
<flowop type="connect" options="remotehost=<remote IP> protocol=tcp tcp_nodelay" />
</transaction>
<transaction duration="300">
<flowop type="write" options="size=200"/>
<flowop type="read" options="size=1000"/>
</transaction>
<transaction iterations="1">
<flowop type="disconnect" />
</transaction>
</group>
</profile>
* Install uperf on both systems, client and server.
* Start uperf at server: uperf -s
* Start uperf at client: uperf -vai 5 -m uperf-profile.xml
* Switch from strict to lazy mode
either using the new kernel (or the test build below)
or using kernel cmd-line parameter iommu.strict=0.
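For a persistent switch (instead of passing the parameter once at boot), iommu.strict=0 can be added to the boot loader configuration; a sketch, assuming the zipl boot loader used on s390x (exact file layout varies per installation, and the parameter values shown are hypothetical):

```
# /etc/zipl.conf (fragment; example values only)
parameters = "root=... iommu.strict=0"
# after editing, re-run zipl and reboot for the change to take effect
```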
* Restart uperf on server and client, like before.
* Verification will be performed by IBM.
[Regression Potential]
* There is a certain regression potential, since the behavior with
the two modified kernel config options will change significantly.
* This may solve the (network) throughput issue with PCI devices,
but may also come with side-effects on other PCIe based devices
(the old compression adapters or the new NVMe carrier cards).
[Other]
* CCW devices are not affected.
* This is s390x-specific only, hence will not affect any other
architecture.
__________
Symptom:
Comparing Ubuntu 24.04 (kernel version 6.8.0-31-generic) against Ubuntu
22.04, all of our PCI-related network measurements on LPAR show massive
throughput degradations (up to -72%). This shows for almost all workloads and
numbers of connections, deteriorating with the number of connections
increasing. Especially drastic is the drop for a high number of parallel
connections (50 and 250) and for small and medium-size transactional workloads.
However, also for streaming-type workloads the degradation is clearly visible
(up to 48% degradation).
Problem:
With kernel config setting CONFIG_IOMMU_DEFAULT_DMA_STRICT=y, IOMMU DMA mode
changed from lazy to strict, causing these massive degradations.
Behavior can also be changed with a kernel command-line parameter
(iommu.strict) for easy verification.
The issue is known and was quickly fixed upstream in December 2023, after
being present for a little less than two weeks.
Upstream fix:
https://github.com/torvalds/linux/commit/b2b97a62f055dd638f7f02087331a8380d8f139a
Repro:
rr1c-200x1000-250 with rr1c-200x1000-250.xml:
<?xml version="1.0"?>
<profile name="TCP_RR">
<group nprocs="250">
<transaction iterations="1">
<flowop type="connect" options="remotehost=<remote IP> protocol=tcp tcp_nodelay" />
</transaction>
<transaction duration="300">
<flowop type="write" options="size=200"/>
<flowop type="read" options="size=1000"/>
</transaction>
<transaction iterations="1">
<flowop type="disconnect" />
</transaction>
</group>
</profile>
0) Install uperf on both systems, client and server.
1) Start uperf at server: uperf -s
2) Start uperf at client: uperf -vai 5 -m uperf-profile.xml
3) Switch from strict to lazy mode using the kernel command-line parameter
iommu.strict=0.
4) Repeat steps 1) and 2).
Example:
For the following example, we chose the workload named above
(rr1c-200x1000-250):
iommu.strict=1 (strict): 233464.914 TPS
iommu.strict=0 (lazy): 835123.193 TPS
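As a quick sanity check, the relative drop implied by these two TPS figures can be recomputed (a throwaway calculation; the numbers are taken from the example above):

```shell
# Strict vs. lazy throughput from the example run; reproduces the
# "up to -72%" degradation quoted in the report.
awk 'BEGIN {
  strict = 233464.914   # TPS with iommu.strict=1
  lazy   = 835123.193   # TPS with iommu.strict=0
  printf "degradation: %.1f%%\n", (1 - strict / lazy) * 100
}'
# prints: degradation: 72.0%
```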