This RFC proposes allowing a vfio-pci device to manipulate the PCI
Express capability of its associated root port, enabling Atomic Op
completer support equivalent to the host's capabilities.  This would
dynamically change capability bits in the config space of the root
port on realize and exit of the vfio-pci device under the following
conditions:

 - The vfio-pci device is directly connected to the root port and
   the root port implements a v2 PCIe capability, thereby supporting
   the DEVCAP2 register.

 - The vfio-pci device is exposed as a single-function device.

 - Atomic completer support is not otherwise reported on the root port.

 - The vfio-pci device reports the VFIO_DEVICE_INFO_CAP_PCI_ATOMIC_COMP
   capability with a non-zero set of supported atomic completer widths.
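
As a rough illustration of how the conditions above combine, here is a
minimal standalone sketch.  It is not the code from this series: the
rp_state and dev_state structs and both helper functions are
hypothetical stand-ins, and the flag values are local definitions that
are assumed to mirror the proposed vfio.h update and the kernel's
pci_regs.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror the vfio.h capability flags; defined locally. */
#define VFIO_PCI_ATOMIC_COMP32         (1u << 0)
#define VFIO_PCI_ATOMIC_COMP64         (1u << 1)
#define VFIO_PCI_ATOMIC_COMP128        (1u << 2)

/* Assumed to mirror the DEVCAP2 completer bits in pci_regs.h. */
#define PCI_EXP_DEVCAP2_ATOMIC_COMP32  0x00000080u
#define PCI_EXP_DEVCAP2_ATOMIC_COMP64  0x00000100u
#define PCI_EXP_DEVCAP2_ATOMIC_COMP128 0x00000200u

/* Hypothetical, simplified view of the root port. */
struct rp_state {
    bool directly_attached;    /* device sits directly below this port */
    unsigned pcie_cap_version; /* version of the PCIe capability */
    uint32_t devcap2;          /* current DEVCAP2 value */
};

/* Hypothetical, simplified view of the vfio-pci device. */
struct dev_state {
    bool multifunction;        /* exposed as a multi-function device */
    uint32_t atomic_flags;     /* completer widths; 0 if cap is absent */
};

/* Translate reported completer widths into DEVCAP2 bits. */
static uint32_t atomic_flags_to_devcap2(uint32_t flags)
{
    uint32_t mask = 0;

    if (flags & VFIO_PCI_ATOMIC_COMP32) {
        mask |= PCI_EXP_DEVCAP2_ATOMIC_COMP32;
    }
    if (flags & VFIO_PCI_ATOMIC_COMP64) {
        mask |= PCI_EXP_DEVCAP2_ATOMIC_COMP64;
    }
    if (flags & VFIO_PCI_ATOMIC_COMP128) {
        mask |= PCI_EXP_DEVCAP2_ATOMIC_COMP128;
    }
    return mask;
}

/* Return the DEVCAP2 bits to set, or 0 if any condition fails. */
static uint32_t atomic_bits_to_enable(const struct rp_state *rp,
                                      const struct dev_state *dev)
{
    const uint32_t all = PCI_EXP_DEVCAP2_ATOMIC_COMP32 |
                         PCI_EXP_DEVCAP2_ATOMIC_COMP64 |
                         PCI_EXP_DEVCAP2_ATOMIC_COMP128;

    if (!rp->directly_attached || rp->pcie_cap_version < 2) {
        return 0; /* no direct attachment or no DEVCAP2 register */
    }
    if (dev->multifunction) {
        return 0; /* only single-function devices qualify */
    }
    if (rp->devcap2 & all) {
        return 0; /* completer support already reported on the port */
    }
    return atomic_flags_to_devcap2(dev->atomic_flags);
}
```

A zero return covers both a failed condition and an empty set of
reported widths, so a caller would only need to test the result before
touching the root port.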

This proposal aims to avoid complications of reporting Atomic Ops
Routing, which can easily escalate into invalid P2P paths.  We also
require a specific VM configuration to avoid dependencies between
devices which may be sourced from dissimilar host paths.

While it's not exactly standard practice to modify root port device
capabilities at runtime, it also does not seem to be precluded by the
PCIe Spec (6.0.1).  The Atomic Op completion bits of the DEVCAP2
register are defined as Read-only:

7.4 Configuration Register Types
 Read-only - Register bits are read-only and cannot be altered by software.
             Where explicitly defined, these bits are used to reflect changing
             hardware state, and as a result bit values can be observed to
             change at run time. Register bit default values and bits that
             cannot change value at run time, are permitted to be hard-coded,
             initialized by system/device firmware, or initialized by hardware
             mechanisms such as pin strapping or nonvolatile storage.
             Initialization by system firmware is permitted only for system-
             integrated devices. If the optional feature that would Set the
             bits is not implemented, the bits must be hardwired to Zero.

Here "altered by software" refers to guest writes to the config space
register, whereas in this implementation we're acting as hardware and
the bits are changing to reflect a change in runtime capabilities.
The spec also defines a HwInit register type, which would restrict the
value from changing at runtime outside of resets, but the Atomic Op
completer bits are defined as Read-only rather than HwInit.  Therefore,
while it would not be advised to update these bits arbitrarily, it does
seem safe and compatible with guest software to update the value on
device attach and detach.
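
One way to picture the attach/detach update is the sketch below.  It
is an assumption-laden model rather than the patch itself: the
root_port_model and vfio_dev_model structs, the helper names, and the
idea of remembering which bits the device added so that exit clears
only those bits are all illustrative choices.

```c
#include <stdint.h>

/* Assumed to mirror the DEVCAP2 completer bits in pci_regs.h. */
#define PCI_EXP_DEVCAP2_ATOMIC_COMP32  0x00000080u
#define PCI_EXP_DEVCAP2_ATOMIC_COMP64  0x00000100u
#define PCI_EXP_DEVCAP2_ATOMIC_COMP128 0x00000200u
#define ATOMIC_COMP_ALL (PCI_EXP_DEVCAP2_ATOMIC_COMP32 | \
                         PCI_EXP_DEVCAP2_ATOMIC_COMP64 | \
                         PCI_EXP_DEVCAP2_ATOMIC_COMP128)

/* Hypothetical model of the root port's DEVCAP2 register. */
struct root_port_model {
    uint32_t devcap2;
};

/* Hypothetical per-device bookkeeping of the bits this device added. */
struct vfio_dev_model {
    uint32_t atomics_added;
};

/* On realize: set completer bits unless the port already reports any. */
static void model_enable_rp_atomics(struct root_port_model *rp,
                                    struct vfio_dev_model *dev,
                                    uint32_t host_mask)
{
    if (rp->devcap2 & ATOMIC_COMP_ALL) {
        return; /* leave support configured elsewhere untouched */
    }
    dev->atomics_added = host_mask & ATOMIC_COMP_ALL;
    rp->devcap2 |= dev->atomics_added;
}

/* On exit: clear only the bits this device set during realize. */
static void model_disable_rp_atomics(struct root_port_model *rp,
                                     struct vfio_dev_model *dev)
{
    rp->devcap2 &= ~dev->atomics_added;
    dev->atomics_added = 0;
}
```

Tracking the added bits keeps exit symmetric with realize: bits that
were present on the root port before the device was attached are never
cleared.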

Note that of the Linux in-kernel drivers that make use of Atomic Ops,
it's not common for the driver to test Atomic Ops support of the device
itself.  Support is assumed, therefore it's fruitless to provide masking
of support at the device rather than the root port.

Also, by allowing this dynamic support, enabling Atomic Ops becomes
transparent to VM management tools.  There is no requirement to
designate specific Atomic Ops capabilities to a root port and impose a
burden on other userspace utilities.

Feedback welcome.  Thanks,

Alex

v2:
 - Don't require cold-plug device, modify RP bits around realize/exit

Alex Williamson (4):
  linux-headers: Update for vfio capability reporting AtomicOps
  vfio: Implement a common device info helper
  pcie: Add a PCIe capability version helper
  vfio/pci: Enable AtomicOps completers on root ports

 hw/pci/pcie.c                 |  7 ++++
 hw/s390x/s390-pci-vfio.c      | 37 +++--------------
 hw/vfio/common.c              | 46 ++++++++++++++++-----
 hw/vfio/pci.c                 | 78 +++++++++++++++++++++++++++++++++++
 hw/vfio/pci.h                 |  1 +
 include/hw/pci/pcie.h         |  1 +
 include/hw/vfio/vfio-common.h |  1 +
 linux-headers/linux/vfio.h    | 14 +++++++
 8 files changed, 142 insertions(+), 43 deletions(-)

-- 
2.39.2

