Implement passthru of PCI devices to unprivileged virtual machines
(VMs) when Linux is running as a privileged VM on the Microsoft Hyper-V
hypervisor. This support is designed to fit within the VFIO framework,
and any VMM that wants to use it must go through the VFIO subsystem.
Both full device passthru and SR-IOV based VFs are supported.

At a high level, the hypervisor supports traditional mapped iommu domains
that use explicit map and unmap hypercalls for mapping and unmapping guest
RAM into the iommu subsystem. Hyper-V also has a concept of direct attach
devices, whereby the iommu subsystem simply uses the guest HW page table
(EPT/NPT/...). This series adds support for both, and both are made to
work with the VFIO subsystem.
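
To illustrate the difference between the two models, here is a small
userspace toy model. It is only a sketch: every name in it (toy_domain,
toy_domain_map, and so on) is hypothetical and does not appear in the
series, the "hypercalls" are just a counter, and the guest HW page table
is approximated by a single guest RAM size. A mapped domain only allows
DMA to ranges that were explicitly mapped, while a direct attach domain
needs no per-range calls because the device sees whatever the guest page
table covers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; none of these come from the series. */
enum toy_domain_type { TOY_DOMAIN_MAPPED, TOY_DOMAIN_DIRECT_ATTACH };

#define TOY_MAX_MAPPINGS 8

struct toy_mapping {
	uint64_t iova;	/* device-visible start address */
	uint64_t size;	/* length in bytes */
	bool in_use;
};

struct toy_domain {
	enum toy_domain_type type;
	struct toy_mapping maps[TOY_MAX_MAPPINGS]; /* mapped domains only */
	unsigned int hypercalls; /* simulated map/unmap hypercall count */
};

void toy_domain_init(struct toy_domain *d, enum toy_domain_type type)
{
	assert(d != NULL);
	*d = (struct toy_domain){ .type = type };
}

/* Returns 0 on success, -1 if the mapping table is full. */
int toy_domain_map(struct toy_domain *d, uint64_t iova, uint64_t size)
{
	/* Direct attach: nothing to program, the guest page table is used. */
	if (d->type == TOY_DOMAIN_DIRECT_ATTACH)
		return 0;

	for (size_t i = 0; i < TOY_MAX_MAPPINGS; i++) {
		if (!d->maps[i].in_use) {
			d->maps[i] = (struct toy_mapping){ iova, size, true };
			d->hypercalls++; /* stands in for a map hypercall */
			return 0;
		}
	}
	return -1;
}

/* Returns 0 on success, -1 if no mapping starts at iova. */
int toy_domain_unmap(struct toy_domain *d, uint64_t iova)
{
	if (d->type == TOY_DOMAIN_DIRECT_ATTACH)
		return 0;

	for (size_t i = 0; i < TOY_MAX_MAPPINGS; i++) {
		if (d->maps[i].in_use && d->maps[i].iova == iova) {
			d->maps[i].in_use = false;
			d->hypercalls++; /* stands in for an unmap hypercall */
			return 0;
		}
	}
	return -1;
}

/*
 * Would a device DMA to addr be translated? guest_ram_size is a crude
 * stand-in for "what the guest HW page table covers" (an assumption of
 * this sketch, not a statement about Hyper-V internals).
 */
bool toy_domain_dma_allowed(const struct toy_domain *d, uint64_t addr,
			    uint64_t guest_ram_size)
{
	if (d->type == TOY_DOMAIN_DIRECT_ATTACH)
		return addr < guest_ram_size;

	for (size_t i = 0; i < TOY_MAX_MAPPINGS; i++) {
		const struct toy_mapping *m = &d->maps[i];

		if (m->in_use && addr >= m->iova && addr - m->iova < m->size)
			return true;
	}
	return false;
}
```

In this model, mapping one range into a mapped domain costs one simulated
hypercall and unmapping costs another, whereas the same calls on a direct
attach domain leave the counter at zero.
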

While this Part I focuses on memory mappings, the upcoming Part II
will focus on irq bypass along with some minor irq remapping
updates.

Based on: cd9f2e7d6e5b (origin/hyperv-next)

Testing:
 o Most testing done on hyperv-next:e733a9e28180 using Cloud Hypervisor (51).
 o Limited testing on: cd9f2e7d6e5b
 o Tested with impending Part II irq patches.
 o All tests involved PF passthru of devices using MSI-X.
 o Following combinations were tested:
    - L1VH(1): test 1: Mellanox ConnectX-6 Lx passthru
               test 2: NVIDIA Tesla T4 GPU passthru
               test 3: Both of the above passed through simultaneously
    - Baremetal dom0/root: All of the above.

 (1) L1VH: this is a semi privileged VM that runs on Windows root on
     Hyper-V, and allows users to create more child VMs.

Pending (this series is meant to establish a baseline for further
enhancements):
 o arm64: some delta to make this work on arm64 (in progress).
 o Device sleep/wakeup.
 o More stress testing.
 o CH reports it could not unbind the vfio group upon guest shutdown.
   Need to reboot for now.
 o QEMU support (in progress).

Changes in V1:
 o patch 1: Don't tie hyperv-irq.c to CONFIG_HYPERV_IOMMU.
 o patch 4: Redesigned to address a security vulnerability found by Copilot
            with passing tgid as a parameter. Also, set tgid right
            after setting pt_id.
 o patch 5: Remove unused type parameter from mshv_device_ops.device_create.
 o patch 7: mshv_partition_ioctl_create_device cleanup on copy_to_user.
 o patch 10: Add export of hv_build_devid_type_pci here to get rid of
             patch 11.
 o patch 12: Move functions to build device ids from patch 11 here for
             the benefit of arm64. Rename file to: hyperv-iommu-root.c.
 o patch 13: Removed, to be made part of interrupt Part II of this support.
 o patch 14: Get rid of fast path to reduce review noise.
 o New (last) patch to pin RAM regions if a device is passed through to a VM.

Thanks,
-Mukesh

Mukesh R (13):
  iommu/hyperv: rename hyperv-iommu.c to hyperv-irq.c
  x86/hyperv: cosmetic changes in irqdomain.c for readability
  x86/hyperv: add insufficient memory support in irqdomain.c
  mshv: Provide a way to get partition id if running in a VMM process
  mshv: Declarations and definitions for VFIO-MSHV bridge device
  mshv: Implement mshv bridge device for VFIO
  mshv: Add ioctl support for MSHV-VFIO bridge device
  PCI: hv: rename hv_compose_msi_msg to hv_vmbus_compose_msi_msg
  mshv: Import data structs around device passthru from hyperv headers
  PCI: hv: Build device id for a VMBus device, export PCI devid function
  x86/hyperv: Implement hyperv virtual iommu
  mshv: Populate mmio mappings for PCI passthru
  mshv: pin all ram mem regions if partition has device passthru

MAINTAINERS | 3 +-
arch/x86/hyperv/irqdomain.c | 229 +++--
arch/x86/include/asm/mshyperv.h | 4 +
arch/x86/kernel/pci-dma.c | 2 +
drivers/hv/Makefile | 3 +-
drivers/hv/mshv_root.h | 26 +
drivers/hv/mshv_root_main.c | 256 ++++-
drivers/hv/mshv_vfio.c | 211 ++++
drivers/iommu/Kconfig | 5 +-
drivers/iommu/Makefile | 3 +-
drivers/iommu/hyperv-iommu-root.c | 899 ++++++++++++++++++
.../iommu/{hyperv-iommu.c => hyperv-irq.c} | 2 +-
drivers/iommu/irq_remapping.c | 2 +-
drivers/pci/controller/pci-hyperv.c | 120 ++-
include/asm-generic/mshyperv.h | 34 +
include/hyperv/hvgdk_mini.h | 11 +
include/hyperv/hvhdk_mini.h | 112 +++
include/linux/hyperv.h | 6 +
include/uapi/linux/mshv.h | 31 +
19 files changed, 1790 insertions(+), 169 deletions(-)
create mode 100644 drivers/hv/mshv_vfio.c
create mode 100644 drivers/iommu/hyperv-iommu-root.c
rename drivers/iommu/{hyperv-iommu.c => hyperv-irq.c} (99%)
--
2.51.2.vfs.0.1