Re: [libvirt] [RFC PATCH 1/9] vfio/pci: introduce mediate ops to intercept vfio-pci ops

2019-12-10 Thread Yan Zhao
On Wed, Dec 11, 2019 at 12:58:24AM +0800, Alex Williamson wrote:
> On Mon, 9 Dec 2019 21:44:23 -0500
> Yan Zhao  wrote:
> 
> > > > > > Currently, yes, i40e has build dependency on vfio-pci.
> > > > > > It's like this, if i40e decides to support SRIOV and compiles in vf
> > > > > > related code who depends on vfio-pci, it will also have build 
> > > > > > dependency
> > > > > > on vfio-pci. isn't it natural?
> > > > > 
> > > > > No, this is not natural.  There are certainly i40e VF use cases that
> > > > > have no interest in vfio and having dependencies between the two
> > > > > modules is unacceptable.  I think you probably want to modularize the
> > > > > i40e vfio support code and then perhaps register a table in vfio-pci
> > > > > that the vfio-pci code can perform a module request when using a
> > > > > compatible device.  Just and idea, there might be better options.  I
> > > > > will not accept a solution that requires unloading the i40e driver in
> > > > > order to unload the vfio-pci driver.  It's inconvenient with just one
> > > > > NIC driver, imagine how poorly that scales.
> > > > > 
> > > > what about this way:
> > > > mediate driver registers a module notifier and every time when
> > > > vfio_pci is loaded, register to vfio_pci its mediate ops?
> > > > (Just like in below sample code)
> > > > This way vfio-pci is free to unload and this registering only gives
> > > > vfio-pci a name of what module to request.
> > > > After that,
> > > > in vfio_pci_open(), vfio-pci requests the mediate driver. (or puts
> > > > the mediate driver when mediate driver does not support mediating the
> > > > device)
> > > > in vfio_pci_release(), vfio-pci puts the mediate driver.
> > > > 
> > > > static void register_mediate_ops(void)
> > > > {
> > > > int (*func)(struct vfio_pci_mediate_ops *ops) = NULL;
> > > > 
> > > > func = symbol_get(vfio_pci_register_mediate_ops);
> > > > 
> > > > if (func) {
> > > > func(_dt_ops);
> > > > symbol_put(vfio_pci_register_mediate_ops);
> > > > }
> > > > }
> > > > 
> > > > static int igd_module_notify(struct notifier_block *self,
> > > >   unsigned long val, void *data)
> > > > {
> > > > struct module *mod = data;
> > > > int ret = 0;
> > > > 
> > > > switch (val) {
> > > > case MODULE_STATE_LIVE:
> > > > if (!strcmp(mod->name, "vfio_pci"))
> > > > register_mediate_ops();
> > > > break;
> > > > case MODULE_STATE_GOING:
> > > > break;
> > > > default:
> > > > break;
> > > > }
> > > > return ret;
> > > > }
> > > > 
> > > > static struct notifier_block igd_module_nb = {
> > > > .notifier_call = igd_module_notify,
> > > > .priority = 0,
> > > > };
> > > > 
> > > > 
> > > > 
> > > > static int __init igd_dt_init(void)
> > > > {
> > > > ...
> > > > register_mediate_ops();
> > > > register_module_notifier(_module_nb);
> > > > ...
> > > > return 0;
> > > > }  
> > > 
> > > 
> > > No, this is bad.  Please look at MODULE_ALIAS() and request_module() as
> > > used in the vfio-platform for loading reset driver modules.  I think
> > > the correct approach is that vfio-pci should perform a request_module()
> > > based on the device being probed.  Having the mediation provider
> > > listening for vfio-pci and registering itself regardless of whether we
> > > intend to use it assumes that we will want to use it and assumes that
> > > the mediation provider module is already loaded.  We should be able to
> > > support demand loading of modules that may serve no other purpose than
> > > providing this mediation.  Thanks,  
> > hi Alex
> > Thanks for this message.
> > So is it good to create a separate module as mediation provider driver,
> > and alias its module name to "vfio-pci-mediate-vid-did".
> > Then when vfio-pci probes the device, it requests module of that name ?
> 
> I think this would give us an option to have the mediator as a separate
> module, but not require it.  Maybe rather than a request_module(),
> where if we follow the platform reset example we'd then expect the init
> code for the module to register into a list, we could do a
> symbol_request().  AIUI, this would give us a reference to the symbol
> if the module providing it is already loaded, and request a module
> (perhaps via an alias) if it's not already load.  Thanks,
> 
ok. got it!
Thank you :)

Yan


--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list



Re: [libvirt] [RFC PATCH 1/9] vfio/pci: introduce mediate ops to intercept vfio-pci ops

2019-12-10 Thread Alex Williamson
On Mon, 9 Dec 2019 21:44:23 -0500
Yan Zhao  wrote:

> > > > > Currently, yes, i40e has build dependency on vfio-pci.
> > > > > It's like this, if i40e decides to support SRIOV and compiles in vf
> > > > > related code who depends on vfio-pci, it will also have build 
> > > > > dependency
> > > > > on vfio-pci. isn't it natural?
> > > > 
> > > > No, this is not natural.  There are certainly i40e VF use cases that
> > > > have no interest in vfio and having dependencies between the two
> > > > modules is unacceptable.  I think you probably want to modularize the
> > > > i40e vfio support code and then perhaps register a table in vfio-pci
> > > > that the vfio-pci code can perform a module request when using a
> > > > compatible device.  Just and idea, there might be better options.  I
> > > > will not accept a solution that requires unloading the i40e driver in
> > > > order to unload the vfio-pci driver.  It's inconvenient with just one
> > > > NIC driver, imagine how poorly that scales.
> > > > 
> > > what about this way:
> > > mediate driver registers a module notifier and every time when
> > > vfio_pci is loaded, register to vfio_pci its mediate ops?
> > > (Just like in below sample code)
> > > This way vfio-pci is free to unload and this registering only gives
> > > vfio-pci a name of what module to request.
> > > After that,
> > > in vfio_pci_open(), vfio-pci requests the mediate driver. (or puts
> > > the mediate driver when mediate driver does not support mediating the
> > > device)
> > > in vfio_pci_release(), vfio-pci puts the mediate driver.
> > > 
> > > static void register_mediate_ops(void)
> > > {
> > > int (*func)(struct vfio_pci_mediate_ops *ops) = NULL;
> > > 
> > > func = symbol_get(vfio_pci_register_mediate_ops);
> > > 
> > > if (func) {
> > > func(_dt_ops);
> > > symbol_put(vfio_pci_register_mediate_ops);
> > > }
> > > }
> > > 
> > > static int igd_module_notify(struct notifier_block *self,
> > >   unsigned long val, void *data)
> > > {
> > > struct module *mod = data;
> > > int ret = 0;
> > > 
> > > switch (val) {
> > > case MODULE_STATE_LIVE:
> > > if (!strcmp(mod->name, "vfio_pci"))
> > > register_mediate_ops();
> > > break;
> > > case MODULE_STATE_GOING:
> > > break;
> > > default:
> > > break;
> > > }
> > > return ret;
> > > }
> > > 
> > > static struct notifier_block igd_module_nb = {
> > > .notifier_call = igd_module_notify,
> > > .priority = 0,
> > > };
> > > 
> > > 
> > > 
> > > static int __init igd_dt_init(void)
> > > {
> > >   ...
> > >   register_mediate_ops();
> > >   register_module_notifier(_module_nb);
> > >   ...
> > >   return 0;
> > > }  
> > 
> > 
> > No, this is bad.  Please look at MODULE_ALIAS() and request_module() as
> > used in the vfio-platform for loading reset driver modules.  I think
> > the correct approach is that vfio-pci should perform a request_module()
> > based on the device being probed.  Having the mediation provider
> > listening for vfio-pci and registering itself regardless of whether we
> > intend to use it assumes that we will want to use it and assumes that
> > the mediation provider module is already loaded.  We should be able to
> > support demand loading of modules that may serve no other purpose than
> > providing this mediation.  Thanks,  
> hi Alex
> Thanks for this message.
> So is it good to create a separate module as mediation provider driver,
> and alias its module name to "vfio-pci-mediate-vid-did".
> Then when vfio-pci probes the device, it requests module of that name ?

I think this would give us an option to have the mediator as a separate
module, but not require it.  Maybe rather than a request_module(),
where if we follow the platform reset example we'd then expect the init
code for the module to register into a list, we could do a
symbol_request().  AIUI, this would give us a reference to the symbol
if the module providing it is already loaded, and request a module
(perhaps via an alias) if it's not already load.  Thanks,

Alex

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list



Re: [libvirt] [RFC PATCH 1/9] vfio/pci: introduce mediate ops to intercept vfio-pci ops

2019-12-09 Thread Yan Zhao
> > > > Currently, yes, i40e has build dependency on vfio-pci.
> > > > It's like this, if i40e decides to support SRIOV and compiles in vf
> > > > related code who depends on vfio-pci, it will also have build dependency
> > > > on vfio-pci. isn't it natural?  
> > > 
> > > No, this is not natural.  There are certainly i40e VF use cases that
> > > have no interest in vfio and having dependencies between the two
> > > modules is unacceptable.  I think you probably want to modularize the
> > > i40e vfio support code and then perhaps register a table in vfio-pci
> > > that the vfio-pci code can perform a module request when using a
> > > compatible device.  Just and idea, there might be better options.  I
> > > will not accept a solution that requires unloading the i40e driver in
> > > order to unload the vfio-pci driver.  It's inconvenient with just one
> > > NIC driver, imagine how poorly that scales.
> > >   
> > what about this way:
> > mediate driver registers a module notifier and every time when
> > vfio_pci is loaded, register to vfio_pci its mediate ops?
> > (Just like in below sample code)
> > This way vfio-pci is free to unload and this registering only gives
> > vfio-pci a name of what module to request.
> > After that,
> > in vfio_pci_open(), vfio-pci requests the mediate driver. (or puts
> > the mediate driver when mediate driver does not support mediating the
> > device)
> > in vfio_pci_release(), vfio-pci puts the mediate driver.
> > 
> > static void register_mediate_ops(void)
> > {
> > int (*func)(struct vfio_pci_mediate_ops *ops) = NULL;
> > 
> > func = symbol_get(vfio_pci_register_mediate_ops);
> > 
> > if (func) {
> > func(_dt_ops);
> > symbol_put(vfio_pci_register_mediate_ops);
> > }
> > }
> > 
> > static int igd_module_notify(struct notifier_block *self,
> >   unsigned long val, void *data)
> > {
> > struct module *mod = data;
> > int ret = 0;
> > 
> > switch (val) {
> > case MODULE_STATE_LIVE:
> > if (!strcmp(mod->name, "vfio_pci"))
> > register_mediate_ops();
> > break;
> > case MODULE_STATE_GOING:
> > break;
> > default:
> > break;
> > }
> > return ret;
> > }
> > 
> > static struct notifier_block igd_module_nb = {
> > .notifier_call = igd_module_notify,
> > .priority = 0,
> > };
> > 
> > 
> > 
> > static int __init igd_dt_init(void)
> > {
> > ...
> > register_mediate_ops();
> > register_module_notifier(_module_nb);
> > ...
> > return 0;
> > }
> 
> 
> No, this is bad.  Please look at MODULE_ALIAS() and request_module() as
> used in the vfio-platform for loading reset driver modules.  I think
> the correct approach is that vfio-pci should perform a request_module()
> based on the device being probed.  Having the mediation provider
> listening for vfio-pci and registering itself regardless of whether we
> intend to use it assumes that we will want to use it and assumes that
> the mediation provider module is already loaded.  We should be able to
> support demand loading of modules that may serve no other purpose than
> providing this mediation.  Thanks,
hi Alex
Thanks for this message.
So is it good to create a separate module as mediation provider driver,
and alias its module name to "vfio-pci-mediate-vid-did".
Then when vfio-pci probes the device, it requests module of that name ?

Thanks
Yan


--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list



Re: [libvirt] [RFC PATCH 1/9] vfio/pci: introduce mediate ops to intercept vfio-pci ops

2019-12-09 Thread Alex Williamson
On Sun, 8 Dec 2019 22:42:25 -0500
Yan Zhao  wrote:

> On Sat, Dec 07, 2019 at 05:22:26AM +0800, Alex Williamson wrote:
> > On Fri, 6 Dec 2019 02:56:55 -0500
> > Yan Zhao  wrote:
> >   
> > > On Fri, Dec 06, 2019 at 07:55:19AM +0800, Alex Williamson wrote:  
> > > > On Wed,  4 Dec 2019 22:25:36 -0500
> > > > Yan Zhao  wrote:
> > > > 
> > > > > when vfio-pci is bound to a physical device, almost all the hardware
> > > > > resources are passthroughed.
> > > > > Sometimes, vendor driver of this physcial device may want to mediate 
> > > > > some
> > > > > hardware resource access for a short period of time, e.g. dirty page
> > > > > tracking during live migration.
> > > > > 
> > > > > Here we introduce mediate ops in vfio-pci for this purpose.
> > > > > 
> > > > > Vendor driver can register a mediate ops to vfio-pci.
> > > > > But rather than directly bind to the passthroughed device, the
> > > > > vendor driver is now either a module that does not bind to any device 
> > > > > or
> > > > > a module binds to other device.
> > > > > E.g. when passing through a VF device that is bound to vfio-pci 
> > > > > modules,
> > > > > PF driver that binds to PF device can register to vfio-pci to mediate
> > > > > VF's regions, hence supporting VF live migration.
> > > > > 
> > > > > The sequence goes like this:
> > > > > 1. Vendor driver register its vfio_pci_mediate_ops to vfio-pci driver
> > > > > 
> > > > > 2. vfio-pci maintains a list of those registered vfio_pci_mediate_ops
> > > > > 
> > > > > 3. Whenever vfio-pci opens a device, it searches the list and call
> > > > > vfio_pci_mediate_ops->open() to check whether a vendor driver supports
> > > > > mediating this device.
> > > > > Upon a success return value of from vfio_pci_mediate_ops->open(),
> > > > > vfio-pci will stop list searching and store a mediate handle to
> > > > > represent this open into vendor driver.
> > > > > (so if multiple vendor drivers support mediating a device through
> > > > > vfio_pci_mediate_ops, only one will win, depending on their 
> > > > > registering
> > > > > sequence)
> > > > > 
> > > > > 4. Whenever a VFIO_DEVICE_GET_REGION_INFO ioctl is received in 
> > > > > vfio-pci
> > > > > ops, it will chain into vfio_pci_mediate_ops->get_region_info(), so 
> > > > > that
> > > > > vendor driver is able to override a region's default flags and caps,
> > > > > e.g. adding a sparse mmap cap to passthrough only sub-regions of a 
> > > > > whole
> > > > > region.
> > > > > 
> > > > > 5. vfio_pci_rw()/vfio_pci_mmap() first calls into
> > > > > vfio_pci_mediate_ops->rw()/vfio_pci_mediate_ops->mmaps().
> > > > > if pt=true is rteturned, vfio_pci_rw()/vfio_pci_mmap() will further
> > > > > passthrough this read/write/mmap to physical device, otherwise it just
> > > > > returns without touch physical device.
> > > > > 
> > > > > 6. When vfio-pci closes a device, vfio_pci_release() chains into
> > > > > vfio_pci_mediate_ops->release() to close the reference in vendor 
> > > > > driver.
> > > > > 
> > > > > 7. Vendor driver unregister its vfio_pci_mediate_ops when driver exits
> > > > > 
> > > > > Cc: Kevin Tian 
> > > > > 
> > > > > Signed-off-by: Yan Zhao 
> > > > > ---
> > > > >  drivers/vfio/pci/vfio_pci.c | 146 
> > > > > 
> > > > >  drivers/vfio/pci/vfio_pci_private.h |   2 +
> > > > >  include/linux/vfio.h|  16 +++
> > > > >  3 files changed, 164 insertions(+)
> > > > > 
> > > > > diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> > > > > index 02206162eaa9..55080ff29495 100644
> > > > > --- a/drivers/vfio/pci/vfio_pci.c
> > > > > +++ b/drivers/vfio/pci/vfio_pci.c
> > > > > @@ -54,6 +54,14 @@ module_param(disable_idle_d3, bool, S_IRUGO | 
> > > > > S_IWUSR);
> > > > >  MODULE_PARM_DESC(disable_idle_d3,
> > > > >"Disable using the PCI D3 low power state for idle, 
> > > > > unused devices");
> > > > >  
> > > > > +static LIST_HEAD(mediate_ops_list);
> > > > > +static DEFINE_MUTEX(mediate_ops_list_lock);
> > > > > +struct vfio_pci_mediate_ops_list_entry {
> > > > > + struct vfio_pci_mediate_ops *ops;
> > > > > + int refcnt;
> > > > > + struct list_headnext;
> > > > > +};
> > > > > +
> > > > >  static inline bool vfio_vga_disabled(void)
> > > > >  {
> > > > >  #ifdef CONFIG_VFIO_PCI_VGA
> > > > > @@ -472,6 +480,10 @@ static void vfio_pci_release(void *device_data)
> > > > >   if (!(--vdev->refcnt)) {
> > > > >   vfio_spapr_pci_eeh_release(vdev->pdev);
> > > > >   vfio_pci_disable(vdev);
> > > > > + if (vdev->mediate_ops && vdev->mediate_ops->release) {
> > > > > + 
> > > > > vdev->mediate_ops->release(vdev->mediate_handle);
> > > > > + vdev->mediate_ops = NULL;
> > > > > + }
> > > > >   }
> > > > >  
> > > > >   mutex_unlock(>reflck->lock);
> > > > > @@ -483,6 +495,7 @@ static int 

Re: [libvirt] [RFC PATCH 1/9] vfio/pci: introduce mediate ops to intercept vfio-pci ops

2019-12-08 Thread Yan Zhao
On Sat, Dec 07, 2019 at 05:22:26AM +0800, Alex Williamson wrote:
> On Fri, 6 Dec 2019 02:56:55 -0500
> Yan Zhao  wrote:
> 
> > On Fri, Dec 06, 2019 at 07:55:19AM +0800, Alex Williamson wrote:
> > > On Wed,  4 Dec 2019 22:25:36 -0500
> > > Yan Zhao  wrote:
> > >   
> > > > when vfio-pci is bound to a physical device, almost all the hardware
> > > > resources are passthroughed.
> > > > Sometimes, vendor driver of this physcial device may want to mediate 
> > > > some
> > > > hardware resource access for a short period of time, e.g. dirty page
> > > > tracking during live migration.
> > > > 
> > > > Here we introduce mediate ops in vfio-pci for this purpose.
> > > > 
> > > > Vendor driver can register a mediate ops to vfio-pci.
> > > > But rather than directly bind to the passthroughed device, the
> > > > vendor driver is now either a module that does not bind to any device or
> > > > a module binds to other device.
> > > > E.g. when passing through a VF device that is bound to vfio-pci modules,
> > > > PF driver that binds to PF device can register to vfio-pci to mediate
> > > > VF's regions, hence supporting VF live migration.
> > > > 
> > > > The sequence goes like this:
> > > > 1. Vendor driver register its vfio_pci_mediate_ops to vfio-pci driver
> > > > 
> > > > 2. vfio-pci maintains a list of those registered vfio_pci_mediate_ops
> > > > 
> > > > 3. Whenever vfio-pci opens a device, it searches the list and call
> > > > vfio_pci_mediate_ops->open() to check whether a vendor driver supports
> > > > mediating this device.
> > > > Upon a success return value of from vfio_pci_mediate_ops->open(),
> > > > vfio-pci will stop list searching and store a mediate handle to
> > > > represent this open into vendor driver.
> > > > (so if multiple vendor drivers support mediating a device through
> > > > vfio_pci_mediate_ops, only one will win, depending on their registering
> > > > sequence)
> > > > 
> > > > 4. Whenever a VFIO_DEVICE_GET_REGION_INFO ioctl is received in vfio-pci
> > > > ops, it will chain into vfio_pci_mediate_ops->get_region_info(), so that
> > > > vendor driver is able to override a region's default flags and caps,
> > > > e.g. adding a sparse mmap cap to passthrough only sub-regions of a whole
> > > > region.
> > > > 
> > > > 5. vfio_pci_rw()/vfio_pci_mmap() first calls into
> > > > vfio_pci_mediate_ops->rw()/vfio_pci_mediate_ops->mmaps().
> > > > if pt=true is rteturned, vfio_pci_rw()/vfio_pci_mmap() will further
> > > > passthrough this read/write/mmap to physical device, otherwise it just
> > > > returns without touch physical device.
> > > > 
> > > > 6. When vfio-pci closes a device, vfio_pci_release() chains into
> > > > vfio_pci_mediate_ops->release() to close the reference in vendor driver.
> > > > 
> > > > 7. Vendor driver unregister its vfio_pci_mediate_ops when driver exits
> > > > 
> > > > Cc: Kevin Tian 
> > > > 
> > > > Signed-off-by: Yan Zhao 
> > > > ---
> > > >  drivers/vfio/pci/vfio_pci.c | 146 
> > > >  drivers/vfio/pci/vfio_pci_private.h |   2 +
> > > >  include/linux/vfio.h|  16 +++
> > > >  3 files changed, 164 insertions(+)
> > > > 
> > > > diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> > > > index 02206162eaa9..55080ff29495 100644
> > > > --- a/drivers/vfio/pci/vfio_pci.c
> > > > +++ b/drivers/vfio/pci/vfio_pci.c
> > > > @@ -54,6 +54,14 @@ module_param(disable_idle_d3, bool, S_IRUGO | 
> > > > S_IWUSR);
> > > >  MODULE_PARM_DESC(disable_idle_d3,
> > > >  "Disable using the PCI D3 low power state for idle, 
> > > > unused devices");
> > > >  
> > > > +static LIST_HEAD(mediate_ops_list);
> > > > +static DEFINE_MUTEX(mediate_ops_list_lock);
> > > > +struct vfio_pci_mediate_ops_list_entry {
> > > > +   struct vfio_pci_mediate_ops *ops;
> > > > +   int refcnt;
> > > > +   struct list_headnext;
> > > > +};
> > > > +
> > > >  static inline bool vfio_vga_disabled(void)
> > > >  {
> > > >  #ifdef CONFIG_VFIO_PCI_VGA
> > > > @@ -472,6 +480,10 @@ static void vfio_pci_release(void *device_data)
> > > > if (!(--vdev->refcnt)) {
> > > > vfio_spapr_pci_eeh_release(vdev->pdev);
> > > > vfio_pci_disable(vdev);
> > > > +   if (vdev->mediate_ops && vdev->mediate_ops->release) {
> > > > +   
> > > > vdev->mediate_ops->release(vdev->mediate_handle);
> > > > +   vdev->mediate_ops = NULL;
> > > > +   }
> > > > }
> > > >  
> > > > mutex_unlock(>reflck->lock);
> > > > @@ -483,6 +495,7 @@ static int vfio_pci_open(void *device_data)
> > > >  {
> > > > struct vfio_pci_device *vdev = device_data;
> > > > int ret = 0;
> > > > +   struct vfio_pci_mediate_ops_list_entry *mentry;
> > > >  
> > > > if (!try_module_get(THIS_MODULE))
> > > > return -ENODEV;
> > > > @@ -495,6 +508,30 @@ static int 

Re: [libvirt] [RFC PATCH 1/9] vfio/pci: introduce mediate ops to intercept vfio-pci ops

2019-12-08 Thread Yan Zhao
Sorry about that. I'll pay attention to them next time and thank you for
pointing them out :)

On Sat, Dec 07, 2019 at 07:13:30AM +0800, Eric Blake wrote:
> On 12/4/19 9:25 PM, Yan Zhao wrote:
> > when vfio-pci is bound to a physical device, almost all the hardware
> > resources are passthroughed.
> 
> The intent is obvious, but it sounds awkward to a native speaker.
> s/passthroughed/passed through/
> 
> > Sometimes, vendor driver of this physcial device may want to mediate some
> 
> physical
> 
> > hardware resource access for a short period of time, e.g. dirty page
> > tracking during live migration.
> > 
> > Here we introduce mediate ops in vfio-pci for this purpose.
> > 
> > Vendor driver can register a mediate ops to vfio-pci.
> > But rather than directly bind to the passthroughed device, the
> 
> passed-through
> 
> -- 
> Eric Blake, Principal Software Engineer
> Red Hat, Inc.   +1-919-301-3226
> Virtualization:  qemu.org | libvirt.org
> 


--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list



Re: [libvirt] [RFC PATCH 1/9] vfio/pci: introduce mediate ops to intercept vfio-pci ops

2019-12-06 Thread Eric Blake

On 12/4/19 9:25 PM, Yan Zhao wrote:

when vfio-pci is bound to a physical device, almost all the hardware
resources are passthroughed.


The intent is obvious, but it sounds awkward to a native speaker.
s/passthroughed/passed through/


Sometimes, vendor driver of this physcial device may want to mediate some


physical


hardware resource access for a short period of time, e.g. dirty page
tracking during live migration.

Here we introduce mediate ops in vfio-pci for this purpose.

Vendor driver can register a mediate ops to vfio-pci.
But rather than directly bind to the passthroughed device, the


passed-through

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list



Re: [libvirt] [RFC PATCH 1/9] vfio/pci: introduce mediate ops to intercept vfio-pci ops

2019-12-06 Thread Alex Williamson
On Fri, 6 Dec 2019 02:56:55 -0500
Yan Zhao  wrote:

> On Fri, Dec 06, 2019 at 07:55:19AM +0800, Alex Williamson wrote:
> > On Wed,  4 Dec 2019 22:25:36 -0500
> > Yan Zhao  wrote:
> >   
> > > when vfio-pci is bound to a physical device, almost all the hardware
> > > resources are passthroughed.
> > > Sometimes, vendor driver of this physcial device may want to mediate some
> > > hardware resource access for a short period of time, e.g. dirty page
> > > tracking during live migration.
> > > 
> > > Here we introduce mediate ops in vfio-pci for this purpose.
> > > 
> > > Vendor driver can register a mediate ops to vfio-pci.
> > > But rather than directly bind to the passthroughed device, the
> > > vendor driver is now either a module that does not bind to any device or
> > > a module binds to other device.
> > > E.g. when passing through a VF device that is bound to vfio-pci modules,
> > > PF driver that binds to PF device can register to vfio-pci to mediate
> > > VF's regions, hence supporting VF live migration.
> > > 
> > > The sequence goes like this:
> > > 1. Vendor driver register its vfio_pci_mediate_ops to vfio-pci driver
> > > 
> > > 2. vfio-pci maintains a list of those registered vfio_pci_mediate_ops
> > > 
> > > 3. Whenever vfio-pci opens a device, it searches the list and call
> > > vfio_pci_mediate_ops->open() to check whether a vendor driver supports
> > > mediating this device.
> > > Upon a success return value of from vfio_pci_mediate_ops->open(),
> > > vfio-pci will stop list searching and store a mediate handle to
> > > represent this open into vendor driver.
> > > (so if multiple vendor drivers support mediating a device through
> > > vfio_pci_mediate_ops, only one will win, depending on their registering
> > > sequence)
> > > 
> > > 4. Whenever a VFIO_DEVICE_GET_REGION_INFO ioctl is received in vfio-pci
> > > ops, it will chain into vfio_pci_mediate_ops->get_region_info(), so that
> > > vendor driver is able to override a region's default flags and caps,
> > > e.g. adding a sparse mmap cap to passthrough only sub-regions of a whole
> > > region.
> > > 
> > > 5. vfio_pci_rw()/vfio_pci_mmap() first calls into
> > > vfio_pci_mediate_ops->rw()/vfio_pci_mediate_ops->mmaps().
> > > if pt=true is rteturned, vfio_pci_rw()/vfio_pci_mmap() will further
> > > passthrough this read/write/mmap to physical device, otherwise it just
> > > returns without touch physical device.
> > > 
> > > 6. When vfio-pci closes a device, vfio_pci_release() chains into
> > > vfio_pci_mediate_ops->release() to close the reference in vendor driver.
> > > 
> > > 7. Vendor driver unregister its vfio_pci_mediate_ops when driver exits
> > > 
> > > Cc: Kevin Tian 
> > > 
> > > Signed-off-by: Yan Zhao 
> > > ---
> > >  drivers/vfio/pci/vfio_pci.c | 146 
> > >  drivers/vfio/pci/vfio_pci_private.h |   2 +
> > >  include/linux/vfio.h|  16 +++
> > >  3 files changed, 164 insertions(+)
> > > 
> > > diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> > > index 02206162eaa9..55080ff29495 100644
> > > --- a/drivers/vfio/pci/vfio_pci.c
> > > +++ b/drivers/vfio/pci/vfio_pci.c
> > > @@ -54,6 +54,14 @@ module_param(disable_idle_d3, bool, S_IRUGO | S_IWUSR);
> > >  MODULE_PARM_DESC(disable_idle_d3,
> > >"Disable using the PCI D3 low power state for idle, unused 
> > > devices");
> > >  
> > > +static LIST_HEAD(mediate_ops_list);
> > > +static DEFINE_MUTEX(mediate_ops_list_lock);
> > > +struct vfio_pci_mediate_ops_list_entry {
> > > + struct vfio_pci_mediate_ops *ops;
> > > + int refcnt;
> > > + struct list_headnext;
> > > +};
> > > +
> > >  static inline bool vfio_vga_disabled(void)
> > >  {
> > >  #ifdef CONFIG_VFIO_PCI_VGA
> > > @@ -472,6 +480,10 @@ static void vfio_pci_release(void *device_data)
> > >   if (!(--vdev->refcnt)) {
> > >   vfio_spapr_pci_eeh_release(vdev->pdev);
> > >   vfio_pci_disable(vdev);
> > > + if (vdev->mediate_ops && vdev->mediate_ops->release) {
> > > + vdev->mediate_ops->release(vdev->mediate_handle);
> > > + vdev->mediate_ops = NULL;
> > > + }
> > >   }
> > >  
> > >   mutex_unlock(>reflck->lock);
> > > @@ -483,6 +495,7 @@ static int vfio_pci_open(void *device_data)
> > >  {
> > >   struct vfio_pci_device *vdev = device_data;
> > >   int ret = 0;
> > > + struct vfio_pci_mediate_ops_list_entry *mentry;
> > >  
> > >   if (!try_module_get(THIS_MODULE))
> > >   return -ENODEV;
> > > @@ -495,6 +508,30 @@ static int vfio_pci_open(void *device_data)
> > >   goto error;
> > >  
> > >   vfio_spapr_pci_eeh_open(vdev->pdev);
> > > + mutex_lock(_ops_list_lock);
> > > + list_for_each_entry(mentry, _ops_list, next) {
> > > + u64 caps;
> > > + u32 handle;  
> > 
> > Wouldn't it seem likely that the ops provider might use this handle as
> > a pointer, so 

Re: [libvirt] [RFC PATCH 1/9] vfio/pci: introduce mediate ops to intercept vfio-pci ops

2019-12-06 Thread Yan Zhao
On Fri, Dec 06, 2019 at 07:55:19AM +0800, Alex Williamson wrote:
> On Wed,  4 Dec 2019 22:25:36 -0500
> Yan Zhao  wrote:
> 
> > when vfio-pci is bound to a physical device, almost all the hardware
> > resources are passthroughed.
> > Sometimes, vendor driver of this physcial device may want to mediate some
> > hardware resource access for a short period of time, e.g. dirty page
> > tracking during live migration.
> > 
> > Here we introduce mediate ops in vfio-pci for this purpose.
> > 
> > Vendor driver can register a mediate ops to vfio-pci.
> > But rather than directly bind to the passthroughed device, the
> > vendor driver is now either a module that does not bind to any device or
> > a module binds to other device.
> > E.g. when passing through a VF device that is bound to vfio-pci modules,
> > PF driver that binds to PF device can register to vfio-pci to mediate
> > VF's regions, hence supporting VF live migration.
> > 
> > The sequence goes like this:
> > 1. Vendor driver register its vfio_pci_mediate_ops to vfio-pci driver
> > 
> > 2. vfio-pci maintains a list of those registered vfio_pci_mediate_ops
> > 
> > 3. Whenever vfio-pci opens a device, it searches the list and call
> > vfio_pci_mediate_ops->open() to check whether a vendor driver supports
> > mediating this device.
> > Upon a success return value of from vfio_pci_mediate_ops->open(),
> > vfio-pci will stop list searching and store a mediate handle to
> > represent this open into vendor driver.
> > (so if multiple vendor drivers support mediating a device through
> > vfio_pci_mediate_ops, only one will win, depending on their registering
> > sequence)
> > 
> > 4. Whenever a VFIO_DEVICE_GET_REGION_INFO ioctl is received in vfio-pci
> > ops, it will chain into vfio_pci_mediate_ops->get_region_info(), so that
> > vendor driver is able to override a region's default flags and caps,
> > e.g. adding a sparse mmap cap to passthrough only sub-regions of a whole
> > region.
> > 
> > 5. vfio_pci_rw()/vfio_pci_mmap() first calls into
> > vfio_pci_mediate_ops->rw()/vfio_pci_mediate_ops->mmaps().
> > if pt=true is rteturned, vfio_pci_rw()/vfio_pci_mmap() will further
> > passthrough this read/write/mmap to physical device, otherwise it just
> > returns without touch physical device.
> > 
> > 6. When vfio-pci closes a device, vfio_pci_release() chains into
> > vfio_pci_mediate_ops->release() to close the reference in vendor driver.
> > 
> > 7. Vendor driver unregister its vfio_pci_mediate_ops when driver exits
> > 
> > Cc: Kevin Tian 
> > 
> > Signed-off-by: Yan Zhao 
> > ---
> >  drivers/vfio/pci/vfio_pci.c | 146 
> >  drivers/vfio/pci/vfio_pci_private.h |   2 +
> >  include/linux/vfio.h|  16 +++
> >  3 files changed, 164 insertions(+)
> > 
> > diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> > index 02206162eaa9..55080ff29495 100644
> > --- a/drivers/vfio/pci/vfio_pci.c
> > +++ b/drivers/vfio/pci/vfio_pci.c
> > @@ -54,6 +54,14 @@ module_param(disable_idle_d3, bool, S_IRUGO | S_IWUSR);
> >  MODULE_PARM_DESC(disable_idle_d3,
> >  "Disable using the PCI D3 low power state for idle, unused 
> > devices");
> >  
> > +static LIST_HEAD(mediate_ops_list);
> > +static DEFINE_MUTEX(mediate_ops_list_lock);
> > +struct vfio_pci_mediate_ops_list_entry {
> > +   struct vfio_pci_mediate_ops *ops;
> > +   int refcnt;
> > +   struct list_headnext;
> > +};
> > +
> >  static inline bool vfio_vga_disabled(void)
> >  {
> >  #ifdef CONFIG_VFIO_PCI_VGA
> > @@ -472,6 +480,10 @@ static void vfio_pci_release(void *device_data)
> > if (!(--vdev->refcnt)) {
> > vfio_spapr_pci_eeh_release(vdev->pdev);
> > vfio_pci_disable(vdev);
> > +   if (vdev->mediate_ops && vdev->mediate_ops->release) {
> > +   vdev->mediate_ops->release(vdev->mediate_handle);
> > +   vdev->mediate_ops = NULL;
> > +   }
> > }
> >  
> > mutex_unlock(>reflck->lock);
> > @@ -483,6 +495,7 @@ static int vfio_pci_open(void *device_data)
> >  {
> > struct vfio_pci_device *vdev = device_data;
> > int ret = 0;
> > +   struct vfio_pci_mediate_ops_list_entry *mentry;
> >  
> > if (!try_module_get(THIS_MODULE))
> > return -ENODEV;
> > @@ -495,6 +508,30 @@ static int vfio_pci_open(void *device_data)
> > goto error;
> >  
> > vfio_spapr_pci_eeh_open(vdev->pdev);
> > +   mutex_lock(_ops_list_lock);
> > +   list_for_each_entry(mentry, _ops_list, next) {
> > +   u64 caps;
> > +   u32 handle;
> 
> Wouldn't it seem likely that the ops provider might use this handle as
> a pointer, so we'd want it to be an opaque void*?
>
yes, you are right, handle as a pointer is much better. will change it.
Thanks :)

> > +
> > +   memset(, 0, sizeof(caps));
> 
> @caps has no purpose here, add it if/when we do 

Re: [libvirt] [RFC PATCH 1/9] vfio/pci: introduce mediate ops to intercept vfio-pci ops

2019-12-05 Thread Alex Williamson
On Wed,  4 Dec 2019 22:25:36 -0500
Yan Zhao  wrote:

> when vfio-pci is bound to a physical device, almost all the hardware
> resources are passthroughed.
> Sometimes, vendor driver of this physcial device may want to mediate some
> hardware resource access for a short period of time, e.g. dirty page
> tracking during live migration.
> 
> Here we introduce mediate ops in vfio-pci for this purpose.
> 
> Vendor driver can register a mediate ops to vfio-pci.
> But rather than directly bind to the passthroughed device, the
> vendor driver is now either a module that does not bind to any device or
> a module binds to other device.
> E.g. when passing through a VF device that is bound to vfio-pci modules,
> PF driver that binds to PF device can register to vfio-pci to mediate
> VF's regions, hence supporting VF live migration.
> 
> The sequence goes like this:
> 1. Vendor driver register its vfio_pci_mediate_ops to vfio-pci driver
> 
> 2. vfio-pci maintains a list of those registered vfio_pci_mediate_ops
> 
> 3. Whenever vfio-pci opens a device, it searches the list and call
> vfio_pci_mediate_ops->open() to check whether a vendor driver supports
> mediating this device.
> Upon a success return value of from vfio_pci_mediate_ops->open(),
> vfio-pci will stop list searching and store a mediate handle to
> represent this open into vendor driver.
> (so if multiple vendor drivers support mediating a device through
> vfio_pci_mediate_ops, only one will win, depending on their registering
> sequence)
> 
> 4. Whenever a VFIO_DEVICE_GET_REGION_INFO ioctl is received in vfio-pci
> ops, it will chain into vfio_pci_mediate_ops->get_region_info(), so that
> vendor driver is able to override a region's default flags and caps,
> e.g. adding a sparse mmap cap to passthrough only sub-regions of a whole
> region.
> 
> 5. vfio_pci_rw()/vfio_pci_mmap() first calls into
> vfio_pci_mediate_ops->rw()/vfio_pci_mediate_ops->mmaps().
> if pt=true is rteturned, vfio_pci_rw()/vfio_pci_mmap() will further
> passthrough this read/write/mmap to physical device, otherwise it just
> returns without touch physical device.
> 
> 6. When vfio-pci closes a device, vfio_pci_release() chains into
> vfio_pci_mediate_ops->release() to close the reference in vendor driver.
> 
> 7. Vendor driver unregister its vfio_pci_mediate_ops when driver exits
> 
> Cc: Kevin Tian 
> 
> Signed-off-by: Yan Zhao 
> ---
>  drivers/vfio/pci/vfio_pci.c | 146 
>  drivers/vfio/pci/vfio_pci_private.h |   2 +
>  include/linux/vfio.h|  16 +++
>  3 files changed, 164 insertions(+)
> 
> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> index 02206162eaa9..55080ff29495 100644
> --- a/drivers/vfio/pci/vfio_pci.c
> +++ b/drivers/vfio/pci/vfio_pci.c
> @@ -54,6 +54,14 @@ module_param(disable_idle_d3, bool, S_IRUGO | S_IWUSR);
>  MODULE_PARM_DESC(disable_idle_d3,
>"Disable using the PCI D3 low power state for idle, unused 
> devices");
>  
> +static LIST_HEAD(mediate_ops_list);
> +static DEFINE_MUTEX(mediate_ops_list_lock);
> +struct vfio_pci_mediate_ops_list_entry {
> + struct vfio_pci_mediate_ops *ops;
> + int refcnt;
> + struct list_headnext;
> +};
> +
>  static inline bool vfio_vga_disabled(void)
>  {
>  #ifdef CONFIG_VFIO_PCI_VGA
> @@ -472,6 +480,10 @@ static void vfio_pci_release(void *device_data)
>   if (!(--vdev->refcnt)) {
>   vfio_spapr_pci_eeh_release(vdev->pdev);
>   vfio_pci_disable(vdev);
> + if (vdev->mediate_ops && vdev->mediate_ops->release) {
> + vdev->mediate_ops->release(vdev->mediate_handle);
> + vdev->mediate_ops = NULL;
> + }
>   }
>  
>   mutex_unlock(>reflck->lock);
> @@ -483,6 +495,7 @@ static int vfio_pci_open(void *device_data)
>  {
>   struct vfio_pci_device *vdev = device_data;
>   int ret = 0;
> + struct vfio_pci_mediate_ops_list_entry *mentry;
>  
>   if (!try_module_get(THIS_MODULE))
>   return -ENODEV;
> @@ -495,6 +508,30 @@ static int vfio_pci_open(void *device_data)
>   goto error;
>  
>   vfio_spapr_pci_eeh_open(vdev->pdev);
> + mutex_lock(_ops_list_lock);
> + list_for_each_entry(mentry, _ops_list, next) {
> + u64 caps;
> + u32 handle;

Wouldn't it seem likely that the ops provider might use this handle as
a pointer, so we'd want it to be an opaque void*?

> +
> + memset(, 0, sizeof(caps));

@caps has no purpose here, add it if/when we do something with it.
It's also a standard type, why are we memset'ing it rather than just
=0??

> + ret = mentry->ops->open(vdev->pdev, , );
> + if (!ret)  {
> + vdev->mediate_ops = mentry->ops;
> + vdev->mediate_handle = handle;
> 

[libvirt] [RFC PATCH 1/9] vfio/pci: introduce mediate ops to intercept vfio-pci ops

2019-12-05 Thread Yan Zhao
when vfio-pci is bound to a physical device, almost all the hardware
resources are passthroughed.
Sometimes, vendor driver of this physcial device may want to mediate some
hardware resource access for a short period of time, e.g. dirty page
tracking during live migration.

Here we introduce mediate ops in vfio-pci for this purpose.

Vendor driver can register a mediate ops to vfio-pci.
But rather than directly bind to the passthroughed device, the
vendor driver is now either a module that does not bind to any device or
a module binds to other device.
E.g. when passing through a VF device that is bound to vfio-pci modules,
PF driver that binds to PF device can register to vfio-pci to mediate
VF's regions, hence supporting VF live migration.

The sequence goes like this:
1. Vendor driver register its vfio_pci_mediate_ops to vfio-pci driver

2. vfio-pci maintains a list of those registered vfio_pci_mediate_ops

3. Whenever vfio-pci opens a device, it searches the list and call
vfio_pci_mediate_ops->open() to check whether a vendor driver supports
mediating this device.
Upon a success return value of from vfio_pci_mediate_ops->open(),
vfio-pci will stop list searching and store a mediate handle to
represent this open into vendor driver.
(so if multiple vendor drivers support mediating a device through
vfio_pci_mediate_ops, only one will win, depending on their registering
sequence)

4. Whenever a VFIO_DEVICE_GET_REGION_INFO ioctl is received in vfio-pci
ops, it will chain into vfio_pci_mediate_ops->get_region_info(), so that
vendor driver is able to override a region's default flags and caps,
e.g. adding a sparse mmap cap to passthrough only sub-regions of a whole
region.

5. vfio_pci_rw()/vfio_pci_mmap() first calls into
vfio_pci_mediate_ops->rw()/vfio_pci_mediate_ops->mmaps().
if pt=true is rteturned, vfio_pci_rw()/vfio_pci_mmap() will further
passthrough this read/write/mmap to physical device, otherwise it just
returns without touch physical device.

6. When vfio-pci closes a device, vfio_pci_release() chains into
vfio_pci_mediate_ops->release() to close the reference in vendor driver.

7. Vendor driver unregister its vfio_pci_mediate_ops when driver exits

Cc: Kevin Tian 

Signed-off-by: Yan Zhao 
---
 drivers/vfio/pci/vfio_pci.c | 146 
 drivers/vfio/pci/vfio_pci_private.h |   2 +
 include/linux/vfio.h|  16 +++
 3 files changed, 164 insertions(+)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 02206162eaa9..55080ff29495 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -54,6 +54,14 @@ module_param(disable_idle_d3, bool, S_IRUGO | S_IWUSR);
 MODULE_PARM_DESC(disable_idle_d3,
 "Disable using the PCI D3 low power state for idle, unused 
devices");
 
+static LIST_HEAD(mediate_ops_list);
+static DEFINE_MUTEX(mediate_ops_list_lock);
+struct vfio_pci_mediate_ops_list_entry {
+   struct vfio_pci_mediate_ops *ops;
+   int refcnt;
+   struct list_headnext;
+};
+
 static inline bool vfio_vga_disabled(void)
 {
 #ifdef CONFIG_VFIO_PCI_VGA
@@ -472,6 +480,10 @@ static void vfio_pci_release(void *device_data)
if (!(--vdev->refcnt)) {
vfio_spapr_pci_eeh_release(vdev->pdev);
vfio_pci_disable(vdev);
+   if (vdev->mediate_ops && vdev->mediate_ops->release) {
+   vdev->mediate_ops->release(vdev->mediate_handle);
+   vdev->mediate_ops = NULL;
+   }
}
 
mutex_unlock(>reflck->lock);
@@ -483,6 +495,7 @@ static int vfio_pci_open(void *device_data)
 {
struct vfio_pci_device *vdev = device_data;
int ret = 0;
+   struct vfio_pci_mediate_ops_list_entry *mentry;
 
if (!try_module_get(THIS_MODULE))
return -ENODEV;
@@ -495,6 +508,30 @@ static int vfio_pci_open(void *device_data)
goto error;
 
vfio_spapr_pci_eeh_open(vdev->pdev);
+   mutex_lock(_ops_list_lock);
+   list_for_each_entry(mentry, _ops_list, next) {
+   u64 caps;
+   u32 handle;
+
+   memset(, 0, sizeof(caps));
+   ret = mentry->ops->open(vdev->pdev, , );
+   if (!ret)  {
+   vdev->mediate_ops = mentry->ops;
+   vdev->mediate_handle = handle;
+
+   pr_info("vfio pci found mediate_ops %s, 
caps=%llx, handle=%x for %x:%x\n",
+   vdev->mediate_ops->name, caps,
+   handle, vdev->pdev->vendor,
+   vdev->pdev->device);
+   /*
+* only find the first matching mediate_ops,
+*