This bug is awaiting verification that the kernel in -proposed solves
the problem. Please test the kernel and update this bug with the
results. If the problem is solved, change the tag 'verification-needed-
groovy' to 'verification-done-groovy'. If the problem still exists,
change the tag 'verification-needed-groovy' to 'verification-failed-
groovy'.

If verification is not done by 5 working days from today, this fix will
be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on
how to enable and use -proposed. Thank you!


** Tags added: verification-needed-groovy

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1907421

Title:
  [UBUNTU 21.04] vfio: pass DMA availability information to userspace

Status in Ubuntu on IBM z Systems:
  Confirmed
Status in linux package in Ubuntu:
  Fix Committed
Status in linux source package in Focal:
  Confirmed
Status in linux source package in Groovy:
  Fix Committed
Status in linux source package in Hirsute:
  Fix Committed

Bug description:
  Description:   vfio: pass DMA availability information to userspace
  Symptom:       vfio-pci device on s390 enters error state
  Problem:       Commit 492855939bdb added a limit to the number of concurrent
                 DMA requests for a vfio container.  However, lazy unmapping in
                 s390 can in fact cause quite a large number of outstanding DMA
                 requests to build up prior to being purged, potentially the
                 entire guest DMA space.  This results in unexpected errors seen
                 in qemu such as 'VFIO_MAP_DMA failed: No space left on device'
  Solution:      The solution requires a change to both kernel and qemu - For
                 the kernel, add the ability to provide the number of allowable
                 DMA requests via the VFIO_IOMMU_GET_INFO ioctl.
  Reproduction:  Put a vfio-pci device on s390 under I/O load
  Upstream-ID:   a717072007e8aedd3f951726d8cf55454860b30d
                 7d6e1329652ed971d1b6e0e7bea66fba5044e271
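
  The interaction described under "Problem" can be illustrated with a toy
  model. This is only a sketch under stated assumptions, not kernel code:
  ToyContainer, run_guest and the tiny limit of 8 are hypothetical names and
  values chosen for illustration. Each map consumes one slot, slots are
  returned only when the guest purges its lazily unmapped entries, and a map
  with no slot left fails with ENOSPC ("No space left on device").

```python
import errno
import os


class ToyContainer:
    """Hypothetical stand-in for a vfio container with a DMA mapping limit."""

    def __init__(self, limit):
        self.dma_avail = limit  # mappings that may still be created

    def map_dma(self):
        if self.dma_avail == 0:
            # Same errno that qemu reports as "No space left on device".
            raise OSError(errno.ENOSPC, os.strerror(errno.ENOSPC))
        self.dma_avail -= 1

    def unmap_dma(self, count=1):
        self.dma_avail += count


def run_guest(container, requests, purge_every):
    """Issue `requests` maps; unmaps are deferred and purged in batches."""
    pending = 0
    for _ in range(requests):
        container.map_dma()
        pending += 1
        if pending == purge_every:
            container.unmap_dma(pending)  # lazy purge returns all slots at once
            pending = 0
    return pending
```

  With a limit of 8, run_guest(ToyContainer(8), 20, 4) completes, while a
  purge interval of 16 raises OSError with errno ENOSPC on the ninth map,
  because the deferred unmaps have not yet returned any slots.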

  Need also to be integrated into 20.10 and 20.04.

  OK, just to clarify: we don't need to fix bionic for this one, but rather
  focal (20.04) and groovy (20.10). Furthermore, for 20.04, 20.10 and 21.04
  ONLY commit 7d6e1329652ed971d1b6e0e7bea66fba5044e271 is needed; the other
  was a pre-req that is already present.
  __________

  SRU Justification:

  [Impact]

  * If a vfio-pci device on s390x is under I/O load, the device may end
  up in an error state.

  * The commit 492855939bdb added a limit to the number of concurrent
  DMA requests for a vfio container.

  * However, lazy unmapping in s390x can in fact cause quite a large
  number of outstanding DMA requests to build up prior to being purged -
  potentially the entire guest DMA space.

  * This results in unexpected errors seen in qemu such as 'VFIO_MAP_DMA
  failed: No space left on device'.

  * The solution requires a change to both kernel and qemu.

  * The kernel side of things is addressed by this SRU.

  * The fix adds the ability to provide the number of allowable DMA
  requests via VFIO_IOMMU_GET_INFO ioctl.

  * The actual fix comes with commit 7d6e1329652e, but another fix
  ccd59dce1a21 is needed to get it cleanly applied.
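
  As a rough illustration of how userspace might read the new value, the
  sketch below walks a capability chain in a buffer shaped like the one
  VFIO_IOMMU_GET_INFO fills in. It does not issue the ioctl; it parses a
  synthetic buffer so it can run anywhere. The field layout and the
  constants (VFIO_IOMMU_INFO_CAPS = 1 << 1, VFIO_IOMMU_TYPE1_INFO_DMA_AVAIL
  = 3) are assumptions written from memory of include/uapi/linux/vfio.h and
  should be checked against the headers of the kernel in use. The names
  find_dma_avail and build_sample are hypothetical helpers.

```python
import struct

# Assumed values; verify against include/uapi/linux/vfio.h.
VFIO_IOMMU_INFO_CAPS = 1 << 1
VFIO_IOMMU_TYPE1_INFO_DMA_AVAIL = 3

INFO_FMT = "=IIQI"    # argsz, flags, iova_pgsizes, cap_offset
CAP_HDR_FMT = "=HHI"  # id, version, next (offset from start of info struct)
CAP_HDR_LEN = struct.calcsize(CAP_HDR_FMT)


def find_dma_avail(buf):
    """Return the 'avail' count from the DMA-available capability, or None."""
    argsz, flags, _pgsizes, offset = struct.unpack_from(INFO_FMT, buf, 0)
    if not flags & VFIO_IOMMU_INFO_CAPS:
        return None  # no capability chain present
    limit = min(argsz, len(buf))
    seen = set()
    while offset and offset not in seen:  # next == 0 ends the chain
        seen.add(offset)                  # guard against a looping chain
        if offset + CAP_HDR_LEN > limit:
            return None
        cap_id, _version, next_off = struct.unpack_from(CAP_HDR_FMT, buf, offset)
        if cap_id == VFIO_IOMMU_TYPE1_INFO_DMA_AVAIL:
            if offset + CAP_HDR_LEN + 4 > limit:
                return None
            (avail,) = struct.unpack_from("=I", buf, offset + CAP_HDR_LEN)
            return avail
        offset = next_off
    return None


def build_sample(avail):
    """Build a synthetic info buffer: one stand-in cap, then the DMA cap."""
    head_len = 24  # 20 bytes of fields plus 4 bytes of padding
    other_cap = struct.pack(CAP_HDR_FMT, 1, 1, head_len + CAP_HDR_LEN)
    dma_cap = struct.pack("=HHII", VFIO_IOMMU_TYPE1_INFO_DMA_AVAIL, 1, 0, avail)
    argsz = head_len + len(other_cap) + len(dma_cap)
    head = struct.pack("=IIQI4x", argsz, VFIO_IOMMU_INFO_CAPS, 0x1000, head_len)
    return head + other_cap + dma_cap
```

  For example, find_dma_avail(build_sample(65535)) returns 65535, and a
  buffer whose flags lack VFIO_IOMMU_INFO_CAPS yields None, which is what
  userspace would see on a kernel without this fix.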

  [Fix]

  * ccd59dce1a21f473518bf273bdf5b182bab955b3 ccd59dce1a21 "vfio/type1:
  Refactor vfio_iommu_type1_ioctl()"

  * 7d6e1329652ed971d1b6e0e7bea66fba5044e271 7d6e1329652e "vfio iommu:
  Add dma available capability"

  [Test Case]

  * IBM Z or LinuxONE hardware with Ubuntu Server 20.10 installed.

  * PCIe adapters in place that provide vfio, like RoCE Express 2.

  * A KVM host needs to be set up, along with a KVM guest (again using
  20.10) that uses vfio.

  * Generate I/O that flows through the vf and watch for errors like
  'VFIO_MAP_DMA failed: No space left on device' in the log.
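
  Checking the log for the error quoted in the test case can be scripted.
  The following is a minimal sketch; find_vfio_map_errors is a hypothetical
  helper, and which log file to feed it (for instance the qemu log of the
  guest) depends on the local setup and is not specified in this bug.

```python
ERROR_TEXT = "VFIO_MAP_DMA failed: No space left on device"


def find_vfio_map_errors(lines):
    """Return (line_number, text) pairs for lines containing the error."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if ERROR_TEXT in line
    ]
```

  It accepts any iterable of lines, so an open file object can be passed
  directly; an empty result after the I/O run means the error did not
  appear in that log.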

  [Regression Potential]

  * The first patch ccd59dce1a21 modifies the common code file
  drivers/vfio/vfio_iommu_type1.c quite significantly.

  * But the reason is not that it introduces a lot of new things; it's a
  refactoring patch.

  * Nevertheless if done in a bad way it can significantly harm the IO
  memory management of virtual function adapters.

  * In the worst case it may break them entirely, instead of 'just'
  exceeding the entire DMA space.

  * Things could also go wrong while mapping and unmapping DMA, which
  may even have an impact beyond vf adapters, harming other DMA devices.

  * The handling of dirty pages is also touched, as is the ioctl itself,
  which is important for keeping control of the devices.

  * But as said before, it is refactoring work, it has been accepted
  upstream since 5.9, and the provenance shows that many engineers had an
  eye on these changes.

  * The second patch 7d6e1329652e - that includes the needed fix - comes
  with far fewer modifications.

  * It also touches drivers/vfio/vfio_iommu_type1.c, but in a way that
  it mainly adds one new function (vfio_iommu_dma_avail_build_caps) and
  an additional if statement in vfio_iommu_type1_get_info.

  * That allows the current allowable number of DMA mappings to be
  provided to userspace via the IOMMU info chain, so that userspace can
  take appropriate mitigation.

  * A potential problem here is that the reported number of allowable
  DMA mappings is wrong: in the best case some mappings are just wasted,
  and in the worst case more are reported than are available in reality,
  which could have a severe impact.

  * What happens in such a case depends somewhat on the userspace.

  * This patch was accepted upstream with kernel 5.10 and was pre-tested
  by IBM.

  * In addition a PPA was created with a patched groovy kernel that was
  shared for further testing.

  [Other]

  * The patch got upstream accepted with kernel v5.10, hence it will
  land in Hirsute once the target kernel 5.10 is in.

  * For 5.4 this will come as upstream stable update 5.4.90/91.

  * For 5.8 (and 5.9) upstream stable support already ended, hence this
  SRU is only needed for groovy.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-z-systems/+bug/1907421/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
