When using vfio-pci to pass through devices, though it's desired to use its default implementations in most of time, it is also sometimes necessary to call vendors specific operations. For example, in order to do device live migration, the way of dirty pages detection and device state save-restore may be varied from device to device. Vendors may want to add a vendor device region or may want to intercept writes to a BAR region. So, in this series, we introduce a way to allow vendors to provide vendor specific ops for VFIO devices and meanwhile export several vfio-pci interfaces as default implementations to simplify code of vendor driver and avoid duplication.
Vendor driver registration/unregistration goes like this: (1) macros are provided to let vendor drivers register/unregister vfio_pci_vendor_driver_ops to vfio_pci in their module_init() and module_exit(). vfio_pci_vendor_driver_ops contains callbacks probe() and remove() and a pointer to vfio_device_ops. (2) vendor drivers define their module aliases as "vfio-pci:$vendor_id-$device_id". E.g. A vendor module for VF devices of Intel(R) Ethernet Controller XL710 family can define its module alias as MODULE_ALIAS("vfio-pci:8086-154c"). (3) when module vfio_pci is bound to a device, it would call modprobe in user space for modules of alias "vfio-pci:$vendor_id-$device_id", which would trigger unloaded vendor drivers to register their vfio_pci_vendor_driver_ops to vfio_pci. Then it searches registered ops list and calls probe() to test whether this vendor driver supports this physical device. A success probe() would make bind vfio device to vendor provided vfio_device_ops, which would call exported default implementations in vfio_pci_ops if necessary. _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ __________ (un)register vendor ops | ___________ ___________ | | |<----------------------------| VF | | | | vfio-pci | | | vendor | | PF driver | | |__________|---------------------------->| driver | |___________| | probe/remove() | ----------- | | | | | |_ _ _ _ _ _ _ _ _ _ _ _|_ _ _ _| \|/ \|/ ----------- ------------ | VF | | PF | ----------- ------------ a typical usage in SRIOV Ref counts: (1) vendor drivers must be a module and compiled to depend on module vfio_pci. (2) In vfio_pci, a successful register would add refs of itself, and a successful unregister would derefs of itself. (3) In vfio_pci, a successful probe() of a vendor driver would add ref of the vendor module. It derefs of the vendor module after calling remove(). (4) macro provided to make sure vendor module always unregister itself in its module_exit Those are to prevent below conditions: a. vfio_pci is unloaded after a successful register from vendor driver. Though vfio_pci would later call modprobe to ask the vendor module to register again, it cannot help if vendor driver remain as loaded across unloading-loading of vfio_pci. b. vendor driver unregisters itself after successfully probed by vfio_pci. c. circular dependency between vfio_pci and the vendor driver. if vfio_pci adds refs to both vfio_pci and vendor driver on a successful register and if vendor driver only do the unregistration in its module_exit, then it would have no chance to do the unregistration. Patch Overview patches 1-2 provide register/unregister interfaces for vendor drivers Patch 3 exports several members in vdev, including vendor_data, and exports functions in vfio_pci_ops to allow them accessible from vendor drivers. patches 4-5 export some more vdev members to vendor driver to simplify their implementations. patch 6 is from Tina Zhang to define vendor specific Irq type capability. patch 7 introduces a new vendor defined irq type VFIO_IRQ_TYPE_REMAP_BAR_REGION. patches 8-10 use VF live migration driver for Intel's 710 SRIOV devices as an example of how to implement this vendor ops interface. patch 8 first let the vendor ops pass through VFs. patch 9 implements a migration region based on migration protocol defined in [1][2]. (Some dirty page tracking functions are intentionally commented out and would send out later in future.) patch 10 serves as an example of how to define vendor specific irq type. This irq will trigger qemu to dynamic map BAR regions in order to implement software based dirty page track. Changelog: RFC v3- RFC v4: - use exported function to make vendor driver access internal fields of vdev rather than make struct vfio_pci_device public. (Alex) - add a new interface vfio_pci_get_barmap() to call vfio_pci_setup_barma() and let vfio_pci_setup_barmap() still able to return detailed errno. (Alex) - removed sample code to pass through igd devices. instead, use the first patch (patch 8/10) of i40e vf migration as an mere pass-through example. - rebased code to 5.7 and VFIO migration kernel patches v17 and qemu patches v16. - added a demo of vendor defined irq type. RFC v2- RFC v3: - embedding struct vfio_pci_device into struct vfio_pci_device_private. (Alex) RFC v1- RFC v2: - renamed mediate ops to vendor ops - use of request_module and module alias to manage vendor driver load (Alex) - changed from vfio_pci_ops calling vendor ops to vendor ops calling default vfio_pci_ops (Alex) - dropped patches for dynamic traps of BARs. will submit them later. Links: [1] VFIO migration kernel v17: https://patchwork.kernel.org/cover/11466129/ [2] VFIO migration qemu v16: https://patchwork.kernel.org/cover/11456557/ Previous versions: RFC v3: https://lkml.org/lkml/2020/2/11/142 RFC v2: https://lkml.org/lkml/2020/1/30/956 RFC v1: kernel part: https://www.spinics.net/lists/kernel/msg3337337.html. qemu part: https://www.spinics.net/lists/kernel/msg3337337.html. Tina Zhang (1): vfio: Define device specific irq type capability Yan Zhao (9): vfio/pci: register/unregister vfio_pci_vendor_driver_ops vfio/pci: macros to generate module_init and module_exit for vendor modules vfio/pci: export vendor_data, irq_type, num_regions, pdev and functions in vfio_pci_ops vfio/pci: let vfio_pci know number of vendor regions and vendor irqs vfio/pci: export vfio_pci_get_barmap vfio/pci: introduce a new irq type VFIO_IRQ_TYPE_REMAP_BAR_REGION i40e/vf_migration: VF live migration - pass-through VF first i40e/vf_migration: register a migration vendor region i40e/vf_migration: vendor defined irq_type to support dynamic bar map drivers/net/ethernet/intel/Kconfig | 10 + drivers/net/ethernet/intel/i40e/Makefile | 2 + .../ethernet/intel/i40e/i40e_vf_migration.c | 904 ++++++++++++++++++ .../ethernet/intel/i40e/i40e_vf_migration.h | 119 +++ drivers/vfio/pci/vfio_pci.c | 181 +++- drivers/vfio/pci/vfio_pci_private.h | 9 + drivers/vfio/pci/vfio_pci_rdwr.c | 10 + include/linux/vfio.h | 58 ++ include/uapi/linux/vfio.h | 30 +- 9 files changed, 1311 insertions(+), 12 deletions(-) create mode 100644 drivers/net/ethernet/intel/i40e/i40e_vf_migration.c create mode 100644 drivers/net/ethernet/intel/i40e/i40e_vf_migration.h -- 2.17.1