This RFC proposes to allow a vfio-pci device to manipulate the PCI
Express capability of an associated root port to enable Atomic Op
completer support equivalent to the host's capabilities. This would
dynamically change capability bits in the config space of the root
port on realize and exit of the vfio-pci device under the following
conditions:

 - The vfio-pci device is directly connected to the root port and the
   root port implements a v2 PCIe capability, thereby supporting the
   DEVCAP2 register.

 - The vfio-pci device is exposed as a single-function device.

 - Atomic completer support is not otherwise reported on the root port.

 - The vfio-pci device reports the VFIO_DEVICE_INFO_CAP_PCI_ATOMIC_COMP
   capability with a non-zero set of supported atomic completer widths.

This proposal aims to avoid the complications of reporting Atomic Ops
Routing, which can easily escalate into invalid P2P paths. We also
require a specific VM configuration to avoid dependencies between
devices which may be sourced from dissimilar host paths.

While it's not exactly standard practice to modify root port device
capabilities at runtime, it also does not seem to be precluded by the
PCIe Spec (6.0.1). The Atomic Op completion bits of the DEVCAP2
register are defined as Read-only:

  7.4 Configuration Register Types

    Read-only - Register bits are read-only and cannot be altered by
                software. Where explicitly defined, these bits are used
                to reflect changing hardware state, and as a result bit
                values can be observed to change at run time. Register
                bit default values and bits that cannot change value at
                run time, are permitted to be hard-coded, initialized
                by system/device firmware, or initialized by hardware
                mechanisms such as pin strapping or nonvolatile
                storage. Initialization by system firmware is permitted
                only for system-integrated devices. If the optional
                feature that would Set the bits is not implemented, the
                bits must be hardwired to Zero.

Here "altered by software" is relative to guest writes to the config
space register, whereas in this implementation we're acting as hardware
and the bits are changing to reflect a change in runtime capabilities.
The spec does include a HwInit register type which would restrict the
value from changing at runtime outside of resets.
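
For illustration only, the conditions and the realize/exit behavior
described above can be modeled roughly as follows. This is a standalone
sketch and not the code in this series: the struct and function names
are hypothetical, and the HOST_ATOMIC_* flag values are placeholders
rather than the vfio uAPI definitions. The DEVCAP2 bit positions (7, 8
and 9) are the 32-bit, 64-bit and 128-bit AtomicOp completer bits from
the PCIe spec.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* DEVCAP2 AtomicOp completer bits (PCIe Device Capabilities 2). */
#define DEVCAP2_ATOMIC_COMP32   (1u << 7)
#define DEVCAP2_ATOMIC_COMP64   (1u << 8)
#define DEVCAP2_ATOMIC_COMP128  (1u << 9)
#define DEVCAP2_ATOMIC_COMP_ALL \
    (DEVCAP2_ATOMIC_COMP32 | DEVCAP2_ATOMIC_COMP64 | DEVCAP2_ATOMIC_COMP128)

/* Placeholder width flags standing in for what the host reports. */
#define HOST_ATOMIC_COMP32   (1u << 0)
#define HOST_ATOMIC_COMP64   (1u << 1)
#define HOST_ATOMIC_COMP128  (1u << 2)

/* Hypothetical model of the emulated root port. */
struct model_root_port {
    int pcie_cap_version;   /* version of the PCIe capability */
    uint32_t devcap2;       /* emulated DEVCAP2 register value */
};

/* Hypothetical model of the assigned device. */
struct model_vfio_dev {
    bool directly_below_root_port;
    bool multifunction;
    uint32_t host_atomic_flags; /* 0 if the capability is absent */
    uint32_t enabled_mask;      /* DEVCAP2 bits set at realize */
};

static uint32_t host_flags_to_devcap2(uint32_t flags)
{
    uint32_t mask = 0;

    if (flags & HOST_ATOMIC_COMP32) {
        mask |= DEVCAP2_ATOMIC_COMP32;
    }
    if (flags & HOST_ATOMIC_COMP64) {
        mask |= DEVCAP2_ATOMIC_COMP64;
    }
    if (flags & HOST_ATOMIC_COMP128) {
        mask |= DEVCAP2_ATOMIC_COMP128;
    }
    return mask;
}

/* On realize: set completer bits only if every condition holds. */
static bool model_atomic_realize(struct model_vfio_dev *dev,
                                 struct model_root_port *rp)
{
    uint32_t mask;

    assert(dev && rp);
    dev->enabled_mask = 0;

    if (!dev->directly_below_root_port ||
        rp->pcie_cap_version < 2 ||                 /* no DEVCAP2 */
        dev->multifunction ||
        (rp->devcap2 & DEVCAP2_ATOMIC_COMP_ALL)) {  /* already reported */
        return false;
    }

    mask = host_flags_to_devcap2(dev->host_atomic_flags);
    if (!mask) {
        return false;   /* host reports no completer widths */
    }

    rp->devcap2 |= mask;
    dev->enabled_mask = mask;
    return true;
}

/* On exit: clear only the bits this device set at realize. */
static void model_atomic_exit(struct model_vfio_dev *dev,
                              struct model_root_port *rp)
{
    assert(dev && rp);
    rp->devcap2 &= ~dev->enabled_mask;
    dev->enabled_mask = 0;
}
```

Recording the mask at realize means exit only undoes what this device
changed, so a root port that already reported completer support by
other means is never touched.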

Since these bits are defined as Read-only rather than HwInit, it would
not be advised to update them arbitrarily, but it does seem safe and
compatible with guest software to update the value on device attach
and detach.

Note that of the Linux in-kernel drivers that make use of Atomic Ops,
it's not common for the driver to test Atomic Ops support of the device
itself. Support is assumed, so it's fruitless to provide masking of
support at the device rather than the root port.

Also, by allowing this dynamic support, enabling Atomic Ops becomes
transparent to VM management tools. There is no requirement to
designate specific Atomic Ops capabilities to a root port and impose a
burden on other userspace utilities.

Feedback welcome. Thanks,

Alex

v2:
 - Don't require cold-plug device, modify RP bits around realize/exit

Alex Williamson (4):
  linux-headers: Update for vfio capability reporting AtomicOps
  vfio: Implement a common device info helper
  pcie: Add a PCIe capability version helper
  vfio/pci: Enable AtomicOps completers on root ports

 hw/pci/pcie.c                 |  7 ++++
 hw/s390x/s390-pci-vfio.c      | 37 +++--------------
 hw/vfio/common.c              | 46 ++++++++++++++++-----
 hw/vfio/pci.c                 | 78 +++++++++++++++++++++++++++++++++++
 hw/vfio/pci.h                 |  1 +
 include/hw/pci/pcie.h         |  1 +
 include/hw/vfio/vfio-common.h |  1 +
 linux-headers/linux/vfio.h    | 14 +++++++
 8 files changed, 142 insertions(+), 43 deletions(-)

-- 
2.39.2