Re: [libvirt] [RFC PATCH 1/9] vfio/pci: introduce mediate ops to intercept vfio-pci ops
On Wed, Dec 11, 2019 at 12:58:24AM +0800, Alex Williamson wrote: > On Mon, 9 Dec 2019 21:44:23 -0500 > Yan Zhao wrote: > > > > > > > Currently, yes, i40e has build dependency on vfio-pci. > > > > > > It's like this, if i40e decides to support SRIOV and compiles in vf > > > > > > related code who depends on vfio-pci, it will also have build > > > > > > dependency > > > > > > on vfio-pci. isn't it natural? > > > > > > > > > > No, this is not natural. There are certainly i40e VF use cases that > > > > > have no interest in vfio and having dependencies between the two > > > > > modules is unacceptable. I think you probably want to modularize the > > > > > i40e vfio support code and then perhaps register a table in vfio-pci > > > > > that the vfio-pci code can perform a module request when using a > > > > > compatible device. Just and idea, there might be better options. I > > > > > will not accept a solution that requires unloading the i40e driver in > > > > > order to unload the vfio-pci driver. It's inconvenient with just one > > > > > NIC driver, imagine how poorly that scales. > > > > > > > > > what about this way: > > > > mediate driver registers a module notifier and every time when > > > > vfio_pci is loaded, register to vfio_pci its mediate ops? > > > > (Just like in below sample code) > > > > This way vfio-pci is free to unload and this registering only gives > > > > vfio-pci a name of what module to request. > > > > After that, > > > > in vfio_pci_open(), vfio-pci requests the mediate driver. (or puts > > > > the mediate driver when mediate driver does not support mediating the > > > > device) > > > > in vfio_pci_release(), vfio-pci puts the mediate driver. > > > > > > > > static void register_mediate_ops(void) > > > > { > > > > int (*func)(struct vfio_pci_mediate_ops *ops) = NULL; > > > > > > > > func = symbol_get(vfio_pci_register_mediate_ops); > > > > > > > > if (func) { > > > > func(&igd_dt_ops); > > > > symbol_put(vfio_pci_register_mediate_ops); > > > > } > > > > } > > > > > > > > static int igd_module_notify(struct notifier_block *self, > > > > unsigned long val, void *data) > > > > { > > > > struct module *mod = data; > > > > int ret = 0; > > > > > > > > switch (val) { > > > > case MODULE_STATE_LIVE: > > > > if (!strcmp(mod->name, "vfio_pci")) > > > > register_mediate_ops(); > > > > break; > > > > case MODULE_STATE_GOING: > > > > break; > > > > default: > > > > break; > > > > } > > > > return ret; > > > > } > > > > > > > > static struct notifier_block igd_module_nb = { > > > > .notifier_call = igd_module_notify, > > > > .priority = 0, > > > > }; > > > > > > > > > > > > > > > > static int __init igd_dt_init(void) > > > > { > > > > ... > > > > register_mediate_ops(); > > > > register_module_notifier(&igd_module_nb); > > > > ... > > > > return 0; > > > > } > > > > > > > > > No, this is bad. Please look at MODULE_ALIAS() and request_module() as > > > used in the vfio-platform for loading reset driver modules. I think > > > the correct approach is that vfio-pci should perform a request_module() > > > based on the device being probed. Having the mediation provider > > > listening for vfio-pci and registering itself regardless of whether we > > > intend to use it assumes that we will want to use it and assumes that > > > the mediation provider module is already loaded. We should be able to > > > support demand loading of modules that may serve no other purpose than > > > providing this mediation. Thanks, > > hi Alex > > Thanks for this message. > > So is it good to create a separate module as mediation provider driver, > > and alias its module name to "vfio-pci-mediate-vid-did". > > Then when vfio-pci probes the device, it requests module of that name ? > > I think this would give us an option to have the mediator as a separate > module, but not require it. Maybe rather than a request_module(), > where if we follow the platform reset example we'd then expect the init > code for the module to register into a list, we could do a > symbol_request(). AIUI, this would give us a reference to the symbol > if the module providing it is already loaded, and request a module > (perhaps via an alias) if it's not already load. Thanks, > ok. got it! Thank you :) Yan -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [RFC PATCH 1/9] vfio/pci: introduce mediate ops to intercept vfio-pci ops
On Mon, 9 Dec 2019 21:44:23 -0500 Yan Zhao wrote: > > > > > Currently, yes, i40e has build dependency on vfio-pci. > > > > > It's like this, if i40e decides to support SRIOV and compiles in vf > > > > > related code who depends on vfio-pci, it will also have build > > > > > dependency > > > > > on vfio-pci. isn't it natural? > > > > > > > > No, this is not natural. There are certainly i40e VF use cases that > > > > have no interest in vfio and having dependencies between the two > > > > modules is unacceptable. I think you probably want to modularize the > > > > i40e vfio support code and then perhaps register a table in vfio-pci > > > > that the vfio-pci code can perform a module request when using a > > > > compatible device. Just and idea, there might be better options. I > > > > will not accept a solution that requires unloading the i40e driver in > > > > order to unload the vfio-pci driver. It's inconvenient with just one > > > > NIC driver, imagine how poorly that scales. > > > > > > > what about this way: > > > mediate driver registers a module notifier and every time when > > > vfio_pci is loaded, register to vfio_pci its mediate ops? > > > (Just like in below sample code) > > > This way vfio-pci is free to unload and this registering only gives > > > vfio-pci a name of what module to request. > > > After that, > > > in vfio_pci_open(), vfio-pci requests the mediate driver. (or puts > > > the mediate driver when mediate driver does not support mediating the > > > device) > > > in vfio_pci_release(), vfio-pci puts the mediate driver. > > > > > > static void register_mediate_ops(void) > > > { > > > int (*func)(struct vfio_pci_mediate_ops *ops) = NULL; > > > > > > func = symbol_get(vfio_pci_register_mediate_ops); > > > > > > if (func) { > > > func(&igd_dt_ops); > > > symbol_put(vfio_pci_register_mediate_ops); > > > } > > > } > > > > > > static int igd_module_notify(struct notifier_block *self, > > > unsigned long val, void *data) > > > { > > > struct module *mod = data; > > > int ret = 0; > > > > > > switch (val) { > > > case MODULE_STATE_LIVE: > > > if (!strcmp(mod->name, "vfio_pci")) > > > register_mediate_ops(); > > > break; > > > case MODULE_STATE_GOING: > > > break; > > > default: > > > break; > > > } > > > return ret; > > > } > > > > > > static struct notifier_block igd_module_nb = { > > > .notifier_call = igd_module_notify, > > > .priority = 0, > > > }; > > > > > > > > > > > > static int __init igd_dt_init(void) > > > { > > > ... > > > register_mediate_ops(); > > > register_module_notifier(&igd_module_nb); > > > ... > > > return 0; > > > } > > > > > > No, this is bad. Please look at MODULE_ALIAS() and request_module() as > > used in the vfio-platform for loading reset driver modules. I think > > the correct approach is that vfio-pci should perform a request_module() > > based on the device being probed. Having the mediation provider > > listening for vfio-pci and registering itself regardless of whether we > > intend to use it assumes that we will want to use it and assumes that > > the mediation provider module is already loaded. We should be able to > > support demand loading of modules that may serve no other purpose than > > providing this mediation. Thanks, > hi Alex > Thanks for this message. > So is it good to create a separate module as mediation provider driver, > and alias its module name to "vfio-pci-mediate-vid-did". > Then when vfio-pci probes the device, it requests module of that name ? I think this would give us an option to have the mediator as a separate module, but not require it. Maybe rather than a request_module(), where if we follow the platform reset example we'd then expect the init code for the module to register into a list, we could do a symbol_request(). AIUI, this would give us a reference to the symbol if the module providing it is already loaded, and request a module (perhaps via an alias) if it's not already load. Thanks, Alex -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [RFC PATCH 1/9] vfio/pci: introduce mediate ops to intercept vfio-pci ops
> > > > Currently, yes, i40e has build dependency on vfio-pci. > > > > It's like this, if i40e decides to support SRIOV and compiles in vf > > > > related code who depends on vfio-pci, it will also have build dependency > > > > on vfio-pci. isn't it natural? > > > > > > No, this is not natural. There are certainly i40e VF use cases that > > > have no interest in vfio and having dependencies between the two > > > modules is unacceptable. I think you probably want to modularize the > > > i40e vfio support code and then perhaps register a table in vfio-pci > > > that the vfio-pci code can perform a module request when using a > > > compatible device. Just and idea, there might be better options. I > > > will not accept a solution that requires unloading the i40e driver in > > > order to unload the vfio-pci driver. It's inconvenient with just one > > > NIC driver, imagine how poorly that scales. > > > > > what about this way: > > mediate driver registers a module notifier and every time when > > vfio_pci is loaded, register to vfio_pci its mediate ops? > > (Just like in below sample code) > > This way vfio-pci is free to unload and this registering only gives > > vfio-pci a name of what module to request. > > After that, > > in vfio_pci_open(), vfio-pci requests the mediate driver. (or puts > > the mediate driver when mediate driver does not support mediating the > > device) > > in vfio_pci_release(), vfio-pci puts the mediate driver. > > > > static void register_mediate_ops(void) > > { > > int (*func)(struct vfio_pci_mediate_ops *ops) = NULL; > > > > func = symbol_get(vfio_pci_register_mediate_ops); > > > > if (func) { > > func(&igd_dt_ops); > > symbol_put(vfio_pci_register_mediate_ops); > > } > > } > > > > static int igd_module_notify(struct notifier_block *self, > > unsigned long val, void *data) > > { > > struct module *mod = data; > > int ret = 0; > > > > switch (val) { > > case MODULE_STATE_LIVE: > > if (!strcmp(mod->name, "vfio_pci")) > > register_mediate_ops(); > > break; > > case MODULE_STATE_GOING: > > break; > > default: > > break; > > } > > return ret; > > } > > > > static struct notifier_block igd_module_nb = { > > .notifier_call = igd_module_notify, > > .priority = 0, > > }; > > > > > > > > static int __init igd_dt_init(void) > > { > > ... > > register_mediate_ops(); > > register_module_notifier(&igd_module_nb); > > ... > > return 0; > > } > > > No, this is bad. Please look at MODULE_ALIAS() and request_module() as > used in the vfio-platform for loading reset driver modules. I think > the correct approach is that vfio-pci should perform a request_module() > based on the device being probed. Having the mediation provider > listening for vfio-pci and registering itself regardless of whether we > intend to use it assumes that we will want to use it and assumes that > the mediation provider module is already loaded. We should be able to > support demand loading of modules that may serve no other purpose than > providing this mediation. Thanks, hi Alex Thanks for this message. So is it good to create a separate module as mediation provider driver, and alias its module name to "vfio-pci-mediate-vid-did". Then when vfio-pci probes the device, it requests module of that name ? Thanks Yan -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [RFC PATCH 1/9] vfio/pci: introduce mediate ops to intercept vfio-pci ops
On Sun, 8 Dec 2019 22:42:25 -0500 Yan Zhao wrote: > On Sat, Dec 07, 2019 at 05:22:26AM +0800, Alex Williamson wrote: > > On Fri, 6 Dec 2019 02:56:55 -0500 > > Yan Zhao wrote: > > > > > On Fri, Dec 06, 2019 at 07:55:19AM +0800, Alex Williamson wrote: > > > > On Wed, 4 Dec 2019 22:25:36 -0500 > > > > Yan Zhao wrote: > > > > > > > > > when vfio-pci is bound to a physical device, almost all the hardware > > > > > resources are passthroughed. > > > > > Sometimes, vendor driver of this physcial device may want to mediate > > > > > some > > > > > hardware resource access for a short period of time, e.g. dirty page > > > > > tracking during live migration. > > > > > > > > > > Here we introduce mediate ops in vfio-pci for this purpose. > > > > > > > > > > Vendor driver can register a mediate ops to vfio-pci. > > > > > But rather than directly bind to the passthroughed device, the > > > > > vendor driver is now either a module that does not bind to any device > > > > > or > > > > > a module binds to other device. > > > > > E.g. when passing through a VF device that is bound to vfio-pci > > > > > modules, > > > > > PF driver that binds to PF device can register to vfio-pci to mediate > > > > > VF's regions, hence supporting VF live migration. > > > > > > > > > > The sequence goes like this: > > > > > 1. Vendor driver register its vfio_pci_mediate_ops to vfio-pci driver > > > > > > > > > > 2. vfio-pci maintains a list of those registered vfio_pci_mediate_ops > > > > > > > > > > 3. Whenever vfio-pci opens a device, it searches the list and call > > > > > vfio_pci_mediate_ops->open() to check whether a vendor driver supports > > > > > mediating this device. > > > > > Upon a success return value of from vfio_pci_mediate_ops->open(), > > > > > vfio-pci will stop list searching and store a mediate handle to > > > > > represent this open into vendor driver. > > > > > (so if multiple vendor drivers support mediating a device through > > > > > vfio_pci_mediate_ops, only one will win, depending on their > > > > > registering > > > > > sequence) > > > > > > > > > > 4. Whenever a VFIO_DEVICE_GET_REGION_INFO ioctl is received in > > > > > vfio-pci > > > > > ops, it will chain into vfio_pci_mediate_ops->get_region_info(), so > > > > > that > > > > > vendor driver is able to override a region's default flags and caps, > > > > > e.g. adding a sparse mmap cap to passthrough only sub-regions of a > > > > > whole > > > > > region. > > > > > > > > > > 5. vfio_pci_rw()/vfio_pci_mmap() first calls into > > > > > vfio_pci_mediate_ops->rw()/vfio_pci_mediate_ops->mmaps(). > > > > > if pt=true is rteturned, vfio_pci_rw()/vfio_pci_mmap() will further > > > > > passthrough this read/write/mmap to physical device, otherwise it just > > > > > returns without touch physical device. > > > > > > > > > > 6. When vfio-pci closes a device, vfio_pci_release() chains into > > > > > vfio_pci_mediate_ops->release() to close the reference in vendor > > > > > driver. > > > > > > > > > > 7. Vendor driver unregister its vfio_pci_mediate_ops when driver exits > > > > > > > > > > Cc: Kevin Tian > > > > > > > > > > Signed-off-by: Yan Zhao > > > > > --- > > > > > drivers/vfio/pci/vfio_pci.c | 146 > > > > > > > > > > drivers/vfio/pci/vfio_pci_private.h | 2 + > > > > > include/linux/vfio.h| 16 +++ > > > > > 3 files changed, 164 insertions(+) > > > > > > > > > > diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c > > > > > index 02206162eaa9..55080ff29495 100644 > > > > > --- a/drivers/vfio/pci/vfio_pci.c > > > > > +++ b/drivers/vfio/pci/vfio_pci.c > > > > > @@ -54,6 +54,14 @@ module_param(disable_idle_d3, bool, S_IRUGO | > > > > > S_IWUSR); > > > > > MODULE_PARM_DESC(disable_idle_d3, > > > > >"Disable using the PCI D3 low power state for idle, > > > > > unused devices"); > > > > > > > > > > +static LIST_HEAD(mediate_ops_list); > > > > > +static DEFINE_MUTEX(mediate_ops_list_lock); > > > > > +struct vfio_pci_mediate_ops_list_entry { > > > > > + struct vfio_pci_mediate_ops *ops; > > > > > + int refcnt; > > > > > + struct list_headnext; > > > > > +}; > > > > > + > > > > > static inline bool vfio_vga_disabled(void) > > > > > { > > > > > #ifdef CONFIG_VFIO_PCI_VGA > > > > > @@ -472,6 +480,10 @@ static void vfio_pci_release(void *device_data) > > > > > if (!(--vdev->refcnt)) { > > > > > vfio_spapr_pci_eeh_release(vdev->pdev); > > > > > vfio_pci_disable(vdev); > > > > > + if (vdev->mediate_ops && vdev->mediate_ops->release) { > > > > > + > > > > > vdev->mediate_ops->release(vdev->mediate_handle); > > > > > + vdev->mediate_ops = NULL; > > > > > + } > > > > > } > > > > > > > > > > mutex_unlock(&vdev->reflck->lock); > > > > > @@ -483,6 +495,7 @@ static int vfio_pci_op
Re: [libvirt] [RFC PATCH 1/9] vfio/pci: introduce mediate ops to intercept vfio-pci ops
On Sat, Dec 07, 2019 at 05:22:26AM +0800, Alex Williamson wrote: > On Fri, 6 Dec 2019 02:56:55 -0500 > Yan Zhao wrote: > > > On Fri, Dec 06, 2019 at 07:55:19AM +0800, Alex Williamson wrote: > > > On Wed, 4 Dec 2019 22:25:36 -0500 > > > Yan Zhao wrote: > > > > > > > when vfio-pci is bound to a physical device, almost all the hardware > > > > resources are passthroughed. > > > > Sometimes, vendor driver of this physcial device may want to mediate > > > > some > > > > hardware resource access for a short period of time, e.g. dirty page > > > > tracking during live migration. > > > > > > > > Here we introduce mediate ops in vfio-pci for this purpose. > > > > > > > > Vendor driver can register a mediate ops to vfio-pci. > > > > But rather than directly bind to the passthroughed device, the > > > > vendor driver is now either a module that does not bind to any device or > > > > a module binds to other device. > > > > E.g. when passing through a VF device that is bound to vfio-pci modules, > > > > PF driver that binds to PF device can register to vfio-pci to mediate > > > > VF's regions, hence supporting VF live migration. > > > > > > > > The sequence goes like this: > > > > 1. Vendor driver register its vfio_pci_mediate_ops to vfio-pci driver > > > > > > > > 2. vfio-pci maintains a list of those registered vfio_pci_mediate_ops > > > > > > > > 3. Whenever vfio-pci opens a device, it searches the list and call > > > > vfio_pci_mediate_ops->open() to check whether a vendor driver supports > > > > mediating this device. > > > > Upon a success return value of from vfio_pci_mediate_ops->open(), > > > > vfio-pci will stop list searching and store a mediate handle to > > > > represent this open into vendor driver. > > > > (so if multiple vendor drivers support mediating a device through > > > > vfio_pci_mediate_ops, only one will win, depending on their registering > > > > sequence) > > > > > > > > 4. Whenever a VFIO_DEVICE_GET_REGION_INFO ioctl is received in vfio-pci > > > > ops, it will chain into vfio_pci_mediate_ops->get_region_info(), so that > > > > vendor driver is able to override a region's default flags and caps, > > > > e.g. adding a sparse mmap cap to passthrough only sub-regions of a whole > > > > region. > > > > > > > > 5. vfio_pci_rw()/vfio_pci_mmap() first calls into > > > > vfio_pci_mediate_ops->rw()/vfio_pci_mediate_ops->mmaps(). > > > > if pt=true is rteturned, vfio_pci_rw()/vfio_pci_mmap() will further > > > > passthrough this read/write/mmap to physical device, otherwise it just > > > > returns without touch physical device. > > > > > > > > 6. When vfio-pci closes a device, vfio_pci_release() chains into > > > > vfio_pci_mediate_ops->release() to close the reference in vendor driver. > > > > > > > > 7. Vendor driver unregister its vfio_pci_mediate_ops when driver exits > > > > > > > > Cc: Kevin Tian > > > > > > > > Signed-off-by: Yan Zhao > > > > --- > > > > drivers/vfio/pci/vfio_pci.c | 146 > > > > drivers/vfio/pci/vfio_pci_private.h | 2 + > > > > include/linux/vfio.h| 16 +++ > > > > 3 files changed, 164 insertions(+) > > > > > > > > diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c > > > > index 02206162eaa9..55080ff29495 100644 > > > > --- a/drivers/vfio/pci/vfio_pci.c > > > > +++ b/drivers/vfio/pci/vfio_pci.c > > > > @@ -54,6 +54,14 @@ module_param(disable_idle_d3, bool, S_IRUGO | > > > > S_IWUSR); > > > > MODULE_PARM_DESC(disable_idle_d3, > > > > "Disable using the PCI D3 low power state for idle, > > > > unused devices"); > > > > > > > > +static LIST_HEAD(mediate_ops_list); > > > > +static DEFINE_MUTEX(mediate_ops_list_lock); > > > > +struct vfio_pci_mediate_ops_list_entry { > > > > + struct vfio_pci_mediate_ops *ops; > > > > + int refcnt; > > > > + struct list_headnext; > > > > +}; > > > > + > > > > static inline bool vfio_vga_disabled(void) > > > > { > > > > #ifdef CONFIG_VFIO_PCI_VGA > > > > @@ -472,6 +480,10 @@ static void vfio_pci_release(void *device_data) > > > > if (!(--vdev->refcnt)) { > > > > vfio_spapr_pci_eeh_release(vdev->pdev); > > > > vfio_pci_disable(vdev); > > > > + if (vdev->mediate_ops && vdev->mediate_ops->release) { > > > > + > > > > vdev->mediate_ops->release(vdev->mediate_handle); > > > > + vdev->mediate_ops = NULL; > > > > + } > > > > } > > > > > > > > mutex_unlock(&vdev->reflck->lock); > > > > @@ -483,6 +495,7 @@ static int vfio_pci_open(void *device_data) > > > > { > > > > struct vfio_pci_device *vdev = device_data; > > > > int ret = 0; > > > > + struct vfio_pci_mediate_ops_list_entry *mentry; > > > > > > > > if (!try_module_get(THIS_MODULE)) > > > > return -ENODEV; > > > > @@ -495,6 +508,30 @@ stati
Re: [libvirt] [RFC PATCH 1/9] vfio/pci: introduce mediate ops to intercept vfio-pci ops
Sorry about that. I'll pay attention to them next time and thank you for pointing them out :) On Sat, Dec 07, 2019 at 07:13:30AM +0800, Eric Blake wrote: > On 12/4/19 9:25 PM, Yan Zhao wrote: > > when vfio-pci is bound to a physical device, almost all the hardware > > resources are passthroughed. > > The intent is obvious, but it sounds awkward to a native speaker. > s/passthroughed/passed through/ > > > Sometimes, vendor driver of this physcial device may want to mediate some > > physical > > > hardware resource access for a short period of time, e.g. dirty page > > tracking during live migration. > > > > Here we introduce mediate ops in vfio-pci for this purpose. > > > > Vendor driver can register a mediate ops to vfio-pci. > > But rather than directly bind to the passthroughed device, the > > passed-through > > -- > Eric Blake, Principal Software Engineer > Red Hat, Inc. +1-919-301-3226 > Virtualization: qemu.org | libvirt.org > -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [RFC PATCH 1/9] vfio/pci: introduce mediate ops to intercept vfio-pci ops
On 12/4/19 9:25 PM, Yan Zhao wrote: when vfio-pci is bound to a physical device, almost all the hardware resources are passthroughed. The intent is obvious, but it sounds awkward to a native speaker. s/passthroughed/passed through/ Sometimes, vendor driver of this physcial device may want to mediate some physical hardware resource access for a short period of time, e.g. dirty page tracking during live migration. Here we introduce mediate ops in vfio-pci for this purpose. Vendor driver can register a mediate ops to vfio-pci. But rather than directly bind to the passthroughed device, the passed-through -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [RFC PATCH 1/9] vfio/pci: introduce mediate ops to intercept vfio-pci ops
On Fri, 6 Dec 2019 02:56:55 -0500 Yan Zhao wrote: > On Fri, Dec 06, 2019 at 07:55:19AM +0800, Alex Williamson wrote: > > On Wed, 4 Dec 2019 22:25:36 -0500 > > Yan Zhao wrote: > > > > > when vfio-pci is bound to a physical device, almost all the hardware > > > resources are passthroughed. > > > Sometimes, vendor driver of this physcial device may want to mediate some > > > hardware resource access for a short period of time, e.g. dirty page > > > tracking during live migration. > > > > > > Here we introduce mediate ops in vfio-pci for this purpose. > > > > > > Vendor driver can register a mediate ops to vfio-pci. > > > But rather than directly bind to the passthroughed device, the > > > vendor driver is now either a module that does not bind to any device or > > > a module binds to other device. > > > E.g. when passing through a VF device that is bound to vfio-pci modules, > > > PF driver that binds to PF device can register to vfio-pci to mediate > > > VF's regions, hence supporting VF live migration. > > > > > > The sequence goes like this: > > > 1. Vendor driver register its vfio_pci_mediate_ops to vfio-pci driver > > > > > > 2. vfio-pci maintains a list of those registered vfio_pci_mediate_ops > > > > > > 3. Whenever vfio-pci opens a device, it searches the list and call > > > vfio_pci_mediate_ops->open() to check whether a vendor driver supports > > > mediating this device. > > > Upon a success return value of from vfio_pci_mediate_ops->open(), > > > vfio-pci will stop list searching and store a mediate handle to > > > represent this open into vendor driver. > > > (so if multiple vendor drivers support mediating a device through > > > vfio_pci_mediate_ops, only one will win, depending on their registering > > > sequence) > > > > > > 4. Whenever a VFIO_DEVICE_GET_REGION_INFO ioctl is received in vfio-pci > > > ops, it will chain into vfio_pci_mediate_ops->get_region_info(), so that > > > vendor driver is able to override a region's default flags and caps, > > > e.g. adding a sparse mmap cap to passthrough only sub-regions of a whole > > > region. > > > > > > 5. vfio_pci_rw()/vfio_pci_mmap() first calls into > > > vfio_pci_mediate_ops->rw()/vfio_pci_mediate_ops->mmaps(). > > > if pt=true is rteturned, vfio_pci_rw()/vfio_pci_mmap() will further > > > passthrough this read/write/mmap to physical device, otherwise it just > > > returns without touch physical device. > > > > > > 6. When vfio-pci closes a device, vfio_pci_release() chains into > > > vfio_pci_mediate_ops->release() to close the reference in vendor driver. > > > > > > 7. Vendor driver unregister its vfio_pci_mediate_ops when driver exits > > > > > > Cc: Kevin Tian > > > > > > Signed-off-by: Yan Zhao > > > --- > > > drivers/vfio/pci/vfio_pci.c | 146 > > > drivers/vfio/pci/vfio_pci_private.h | 2 + > > > include/linux/vfio.h| 16 +++ > > > 3 files changed, 164 insertions(+) > > > > > > diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c > > > index 02206162eaa9..55080ff29495 100644 > > > --- a/drivers/vfio/pci/vfio_pci.c > > > +++ b/drivers/vfio/pci/vfio_pci.c > > > @@ -54,6 +54,14 @@ module_param(disable_idle_d3, bool, S_IRUGO | S_IWUSR); > > > MODULE_PARM_DESC(disable_idle_d3, > > >"Disable using the PCI D3 low power state for idle, unused > > > devices"); > > > > > > +static LIST_HEAD(mediate_ops_list); > > > +static DEFINE_MUTEX(mediate_ops_list_lock); > > > +struct vfio_pci_mediate_ops_list_entry { > > > + struct vfio_pci_mediate_ops *ops; > > > + int refcnt; > > > + struct list_headnext; > > > +}; > > > + > > > static inline bool vfio_vga_disabled(void) > > > { > > > #ifdef CONFIG_VFIO_PCI_VGA > > > @@ -472,6 +480,10 @@ static void vfio_pci_release(void *device_data) > > > if (!(--vdev->refcnt)) { > > > vfio_spapr_pci_eeh_release(vdev->pdev); > > > vfio_pci_disable(vdev); > > > + if (vdev->mediate_ops && vdev->mediate_ops->release) { > > > + vdev->mediate_ops->release(vdev->mediate_handle); > > > + vdev->mediate_ops = NULL; > > > + } > > > } > > > > > > mutex_unlock(&vdev->reflck->lock); > > > @@ -483,6 +495,7 @@ static int vfio_pci_open(void *device_data) > > > { > > > struct vfio_pci_device *vdev = device_data; > > > int ret = 0; > > > + struct vfio_pci_mediate_ops_list_entry *mentry; > > > > > > if (!try_module_get(THIS_MODULE)) > > > return -ENODEV; > > > @@ -495,6 +508,30 @@ static int vfio_pci_open(void *device_data) > > > goto error; > > > > > > vfio_spapr_pci_eeh_open(vdev->pdev); > > > + mutex_lock(&mediate_ops_list_lock); > > > + list_for_each_entry(mentry, &mediate_ops_list, next) { > > > + u64 caps; > > > + u32 handle; > > > > Wouldn't it seem likely that the ops provider might use this handle
Re: [libvirt] [RFC PATCH 1/9] vfio/pci: introduce mediate ops to intercept vfio-pci ops
On Fri, Dec 06, 2019 at 07:55:19AM +0800, Alex Williamson wrote: > On Wed, 4 Dec 2019 22:25:36 -0500 > Yan Zhao wrote: > > > when vfio-pci is bound to a physical device, almost all the hardware > > resources are passthroughed. > > Sometimes, vendor driver of this physcial device may want to mediate some > > hardware resource access for a short period of time, e.g. dirty page > > tracking during live migration. > > > > Here we introduce mediate ops in vfio-pci for this purpose. > > > > Vendor driver can register a mediate ops to vfio-pci. > > But rather than directly bind to the passthroughed device, the > > vendor driver is now either a module that does not bind to any device or > > a module binds to other device. > > E.g. when passing through a VF device that is bound to vfio-pci modules, > > PF driver that binds to PF device can register to vfio-pci to mediate > > VF's regions, hence supporting VF live migration. > > > > The sequence goes like this: > > 1. Vendor driver register its vfio_pci_mediate_ops to vfio-pci driver > > > > 2. vfio-pci maintains a list of those registered vfio_pci_mediate_ops > > > > 3. Whenever vfio-pci opens a device, it searches the list and call > > vfio_pci_mediate_ops->open() to check whether a vendor driver supports > > mediating this device. > > Upon a success return value of from vfio_pci_mediate_ops->open(), > > vfio-pci will stop list searching and store a mediate handle to > > represent this open into vendor driver. > > (so if multiple vendor drivers support mediating a device through > > vfio_pci_mediate_ops, only one will win, depending on their registering > > sequence) > > > > 4. Whenever a VFIO_DEVICE_GET_REGION_INFO ioctl is received in vfio-pci > > ops, it will chain into vfio_pci_mediate_ops->get_region_info(), so that > > vendor driver is able to override a region's default flags and caps, > > e.g. adding a sparse mmap cap to passthrough only sub-regions of a whole > > region. > > > > 5. vfio_pci_rw()/vfio_pci_mmap() first calls into > > vfio_pci_mediate_ops->rw()/vfio_pci_mediate_ops->mmaps(). > > if pt=true is rteturned, vfio_pci_rw()/vfio_pci_mmap() will further > > passthrough this read/write/mmap to physical device, otherwise it just > > returns without touch physical device. > > > > 6. When vfio-pci closes a device, vfio_pci_release() chains into > > vfio_pci_mediate_ops->release() to close the reference in vendor driver. > > > > 7. Vendor driver unregister its vfio_pci_mediate_ops when driver exits > > > > Cc: Kevin Tian > > > > Signed-off-by: Yan Zhao > > --- > > drivers/vfio/pci/vfio_pci.c | 146 > > drivers/vfio/pci/vfio_pci_private.h | 2 + > > include/linux/vfio.h| 16 +++ > > 3 files changed, 164 insertions(+) > > > > diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c > > index 02206162eaa9..55080ff29495 100644 > > --- a/drivers/vfio/pci/vfio_pci.c > > +++ b/drivers/vfio/pci/vfio_pci.c > > @@ -54,6 +54,14 @@ module_param(disable_idle_d3, bool, S_IRUGO | S_IWUSR); > > MODULE_PARM_DESC(disable_idle_d3, > > "Disable using the PCI D3 low power state for idle, unused > > devices"); > > > > +static LIST_HEAD(mediate_ops_list); > > +static DEFINE_MUTEX(mediate_ops_list_lock); > > +struct vfio_pci_mediate_ops_list_entry { > > + struct vfio_pci_mediate_ops *ops; > > + int refcnt; > > + struct list_headnext; > > +}; > > + > > static inline bool vfio_vga_disabled(void) > > { > > #ifdef CONFIG_VFIO_PCI_VGA > > @@ -472,6 +480,10 @@ static void vfio_pci_release(void *device_data) > > if (!(--vdev->refcnt)) { > > vfio_spapr_pci_eeh_release(vdev->pdev); > > vfio_pci_disable(vdev); > > + if (vdev->mediate_ops && vdev->mediate_ops->release) { > > + vdev->mediate_ops->release(vdev->mediate_handle); > > + vdev->mediate_ops = NULL; > > + } > > } > > > > mutex_unlock(&vdev->reflck->lock); > > @@ -483,6 +495,7 @@ static int vfio_pci_open(void *device_data) > > { > > struct vfio_pci_device *vdev = device_data; > > int ret = 0; > > + struct vfio_pci_mediate_ops_list_entry *mentry; > > > > if (!try_module_get(THIS_MODULE)) > > return -ENODEV; > > @@ -495,6 +508,30 @@ static int vfio_pci_open(void *device_data) > > goto error; > > > > vfio_spapr_pci_eeh_open(vdev->pdev); > > + mutex_lock(&mediate_ops_list_lock); > > + list_for_each_entry(mentry, &mediate_ops_list, next) { > > + u64 caps; > > + u32 handle; > > Wouldn't it seem likely that the ops provider might use this handle as > a pointer, so we'd want it to be an opaque void*? > yes, you are right, handle as a pointer is much better. will change it. Thanks :) > > + > > + memset(&caps, 0, sizeof(caps)); > > @caps has no purpose
Re: [libvirt] [RFC PATCH 1/9] vfio/pci: introduce mediate ops to intercept vfio-pci ops
On Wed, 4 Dec 2019 22:25:36 -0500 Yan Zhao wrote: > when vfio-pci is bound to a physical device, almost all the hardware > resources are passthroughed. > Sometimes, vendor driver of this physcial device may want to mediate some > hardware resource access for a short period of time, e.g. dirty page > tracking during live migration. > > Here we introduce mediate ops in vfio-pci for this purpose. > > Vendor driver can register a mediate ops to vfio-pci. > But rather than directly bind to the passthroughed device, the > vendor driver is now either a module that does not bind to any device or > a module binds to other device. > E.g. when passing through a VF device that is bound to vfio-pci modules, > PF driver that binds to PF device can register to vfio-pci to mediate > VF's regions, hence supporting VF live migration. > > The sequence goes like this: > 1. Vendor driver register its vfio_pci_mediate_ops to vfio-pci driver > > 2. vfio-pci maintains a list of those registered vfio_pci_mediate_ops > > 3. Whenever vfio-pci opens a device, it searches the list and call > vfio_pci_mediate_ops->open() to check whether a vendor driver supports > mediating this device. > Upon a success return value of from vfio_pci_mediate_ops->open(), > vfio-pci will stop list searching and store a mediate handle to > represent this open into vendor driver. > (so if multiple vendor drivers support mediating a device through > vfio_pci_mediate_ops, only one will win, depending on their registering > sequence) > > 4. Whenever a VFIO_DEVICE_GET_REGION_INFO ioctl is received in vfio-pci > ops, it will chain into vfio_pci_mediate_ops->get_region_info(), so that > vendor driver is able to override a region's default flags and caps, > e.g. adding a sparse mmap cap to passthrough only sub-regions of a whole > region. > > 5. vfio_pci_rw()/vfio_pci_mmap() first calls into > vfio_pci_mediate_ops->rw()/vfio_pci_mediate_ops->mmaps(). > if pt=true is rteturned, vfio_pci_rw()/vfio_pci_mmap() will further > passthrough this read/write/mmap to physical device, otherwise it just > returns without touch physical device. > > 6. When vfio-pci closes a device, vfio_pci_release() chains into > vfio_pci_mediate_ops->release() to close the reference in vendor driver. > > 7. Vendor driver unregister its vfio_pci_mediate_ops when driver exits > > Cc: Kevin Tian > > Signed-off-by: Yan Zhao > --- > drivers/vfio/pci/vfio_pci.c | 146 > drivers/vfio/pci/vfio_pci_private.h | 2 + > include/linux/vfio.h| 16 +++ > 3 files changed, 164 insertions(+) > > diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c > index 02206162eaa9..55080ff29495 100644 > --- a/drivers/vfio/pci/vfio_pci.c > +++ b/drivers/vfio/pci/vfio_pci.c > @@ -54,6 +54,14 @@ module_param(disable_idle_d3, bool, S_IRUGO | S_IWUSR); > MODULE_PARM_DESC(disable_idle_d3, >"Disable using the PCI D3 low power state for idle, unused > devices"); > > +static LIST_HEAD(mediate_ops_list); > +static DEFINE_MUTEX(mediate_ops_list_lock); > +struct vfio_pci_mediate_ops_list_entry { > + struct vfio_pci_mediate_ops *ops; > + int refcnt; > + struct list_headnext; > +}; > + > static inline bool vfio_vga_disabled(void) > { > #ifdef CONFIG_VFIO_PCI_VGA > @@ -472,6 +480,10 @@ static void vfio_pci_release(void *device_data) > if (!(--vdev->refcnt)) { > vfio_spapr_pci_eeh_release(vdev->pdev); > vfio_pci_disable(vdev); > + if (vdev->mediate_ops && vdev->mediate_ops->release) { > + vdev->mediate_ops->release(vdev->mediate_handle); > + vdev->mediate_ops = NULL; > + } > } > > mutex_unlock(&vdev->reflck->lock); > @@ -483,6 +495,7 @@ static int vfio_pci_open(void *device_data) > { > struct vfio_pci_device *vdev = device_data; > int ret = 0; > + struct vfio_pci_mediate_ops_list_entry *mentry; > > if (!try_module_get(THIS_MODULE)) > return -ENODEV; > @@ -495,6 +508,30 @@ static int vfio_pci_open(void *device_data) > goto error; > > vfio_spapr_pci_eeh_open(vdev->pdev); > + mutex_lock(&mediate_ops_list_lock); > + list_for_each_entry(mentry, &mediate_ops_list, next) { > + u64 caps; > + u32 handle; Wouldn't it seem likely that the ops provider might use this handle as a pointer, so we'd want it to be an opaque void*? > + > + memset(&caps, 0, sizeof(caps)); @caps has no purpose here, add it if/when we do something with it. It's also a standard type, why are we memset'ing it rather than just =0?? > + ret = mentry->ops->open(vdev->pdev, &caps, &handle); > + if (!ret) { > + vdev->mediate_ops = mentry->ops; > +
[libvirt] [RFC PATCH 1/9] vfio/pci: introduce mediate ops to intercept vfio-pci ops
when vfio-pci is bound to a physical device, almost all the hardware resources are passthroughed. Sometimes, vendor driver of this physcial device may want to mediate some hardware resource access for a short period of time, e.g. dirty page tracking during live migration. Here we introduce mediate ops in vfio-pci for this purpose. Vendor driver can register a mediate ops to vfio-pci. But rather than directly bind to the passthroughed device, the vendor driver is now either a module that does not bind to any device or a module binds to other device. E.g. when passing through a VF device that is bound to vfio-pci modules, PF driver that binds to PF device can register to vfio-pci to mediate VF's regions, hence supporting VF live migration. The sequence goes like this: 1. Vendor driver register its vfio_pci_mediate_ops to vfio-pci driver 2. vfio-pci maintains a list of those registered vfio_pci_mediate_ops 3. Whenever vfio-pci opens a device, it searches the list and call vfio_pci_mediate_ops->open() to check whether a vendor driver supports mediating this device. Upon a success return value of from vfio_pci_mediate_ops->open(), vfio-pci will stop list searching and store a mediate handle to represent this open into vendor driver. (so if multiple vendor drivers support mediating a device through vfio_pci_mediate_ops, only one will win, depending on their registering sequence) 4. Whenever a VFIO_DEVICE_GET_REGION_INFO ioctl is received in vfio-pci ops, it will chain into vfio_pci_mediate_ops->get_region_info(), so that vendor driver is able to override a region's default flags and caps, e.g. adding a sparse mmap cap to passthrough only sub-regions of a whole region. 5. vfio_pci_rw()/vfio_pci_mmap() first calls into vfio_pci_mediate_ops->rw()/vfio_pci_mediate_ops->mmaps(). if pt=true is rteturned, vfio_pci_rw()/vfio_pci_mmap() will further passthrough this read/write/mmap to physical device, otherwise it just returns without touch physical device. 6. When vfio-pci closes a device, vfio_pci_release() chains into vfio_pci_mediate_ops->release() to close the reference in vendor driver. 7. Vendor driver unregister its vfio_pci_mediate_ops when driver exits Cc: Kevin Tian Signed-off-by: Yan Zhao --- drivers/vfio/pci/vfio_pci.c | 146 drivers/vfio/pci/vfio_pci_private.h | 2 + include/linux/vfio.h| 16 +++ 3 files changed, 164 insertions(+) diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c index 02206162eaa9..55080ff29495 100644 --- a/drivers/vfio/pci/vfio_pci.c +++ b/drivers/vfio/pci/vfio_pci.c @@ -54,6 +54,14 @@ module_param(disable_idle_d3, bool, S_IRUGO | S_IWUSR); MODULE_PARM_DESC(disable_idle_d3, "Disable using the PCI D3 low power state for idle, unused devices"); +static LIST_HEAD(mediate_ops_list); +static DEFINE_MUTEX(mediate_ops_list_lock); +struct vfio_pci_mediate_ops_list_entry { + struct vfio_pci_mediate_ops *ops; + int refcnt; + struct list_headnext; +}; + static inline bool vfio_vga_disabled(void) { #ifdef CONFIG_VFIO_PCI_VGA @@ -472,6 +480,10 @@ static void vfio_pci_release(void *device_data) if (!(--vdev->refcnt)) { vfio_spapr_pci_eeh_release(vdev->pdev); vfio_pci_disable(vdev); + if (vdev->mediate_ops && vdev->mediate_ops->release) { + vdev->mediate_ops->release(vdev->mediate_handle); + vdev->mediate_ops = NULL; + } } mutex_unlock(&vdev->reflck->lock); @@ -483,6 +495,7 @@ static int vfio_pci_open(void *device_data) { struct vfio_pci_device *vdev = device_data; int ret = 0; + struct vfio_pci_mediate_ops_list_entry *mentry; if (!try_module_get(THIS_MODULE)) return -ENODEV; @@ -495,6 +508,30 @@ static int vfio_pci_open(void *device_data) goto error; vfio_spapr_pci_eeh_open(vdev->pdev); + mutex_lock(&mediate_ops_list_lock); + list_for_each_entry(mentry, &mediate_ops_list, next) { + u64 caps; + u32 handle; + + memset(&caps, 0, sizeof(caps)); + ret = mentry->ops->open(vdev->pdev, &caps, &handle); + if (!ret) { + vdev->mediate_ops = mentry->ops; + vdev->mediate_handle = handle; + + pr_info("vfio pci found mediate_ops %s, caps=%llx, handle=%x for %x:%x\n", + vdev->mediate_ops->name, caps, + handle, vdev->pdev->vendor, + vdev->pdev->device); + /* +* only find the first matching mediate_ops,