Re: [PATCH 1/5] vfio/iommu_type1: Fixes vfio_dma_populate_bitmap to avoid dirty lose

2021-01-13 Thread Kirti Wankhede




On 1/13/2021 2:50 AM, Alex Williamson wrote:

On Thu, 7 Jan 2021 17:28:57 +0800
Keqian Zhu  wrote:


Deferring the check of whether a vfio_dma is fully dirty until
update_user_bitmap makes it easy to lose the dirty log. For example, after
the pinned_scope of vfio_iommu is promoted, the vfio_dma is no longer
considered fully dirty, so we may lose dirty log that was generated before
the vfio_iommu was promoted.

The key point is that pinned-dirty is not a true dirty-tracking mechanism:
it cannot continuously track dirty pages, it only restricts the dirty
scope. It is essentially the same as fully-dirty; fully-dirty covers the
full scope and pinned-dirty covers the pinned scope.

So we must mark pages pinned-dirty or fully-dirty right after we start
dirty tracking or clear the dirty bitmap, to ensure the dirty log is
recorded immediately.


I was initially convinced by these first three patches, but upon
further review, I think the premise is wrong.  AIUI, the concern across
these patches is that our dirty bitmap is only populated with pages
dirtied by pinning and we only take into account the pinned page dirty
scope at the time the bitmap is retrieved by the user.  You suppose
this presents a gap where if a vendor driver has not yet identified
with a page pinning scope that the entire bitmap should be considered
dirty regardless of whether that driver later pins pages prior to the
user retrieving the dirty bitmap.

I don't think this is how we intended the cooperation between the iommu
driver and vendor driver to work.  By pinning pages a vendor driver is
not declaring that only their future dirty page scope is limited to
pinned pages, instead they're declaring themselves as a participant in
dirty page tracking and take responsibility for pinning any necessary
pages.  For example we might extend VFIO_IOMMU_DIRTY_PAGES_FLAG_START
to trigger a blocking notification to groups to not only begin dirty
tracking, but also to synchronously register their current device DMA
footprint.  This patch would require a vendor driver to possibly perform
a gratuitous page pinning in order to set the scope prior to dirty
logging being enabled, or else the initial bitmap will be fully dirty.

Therefore, I don't see that this series is necessary or correct.  Kirti,
does this match your thinking?



That's correct, Alex, and I agree with you.


Thinking about these semantics, it seems there might still be an issue
if a group with non-pinned-page dirty scope is detached with dirty
logging enabled.  


Hot-unplugging a device while the migration process has started - is this 
scenario supported?


Thanks,
Kirti


It seems this should in fact fully populate the dirty
bitmaps at the time it's removed since we don't know the extent of its
previous DMA, nor will the group be present to trigger the full bitmap
when the user retrieves the dirty bitmap.  Creating fully populated
bitmaps at the time tracking is enabled negates our ability to take
advantage of later enlightenment though.  Thanks,

Alex


Fixes: d6a4c185660c ("vfio iommu: Implementation of ioctl for dirty pages tracking")
Signed-off-by: Keqian Zhu 
---
  drivers/vfio/vfio_iommu_type1.c | 33 ++++++++++++++++++++++-----------
  1 file changed, 22 insertions(+), 11 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index bceda5e8baaa..b0a26e8e0adf 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -224,7 +224,7 @@ static void vfio_dma_bitmap_free(struct vfio_dma *dma)
dma->bitmap = NULL;
  }
  
-static void vfio_dma_populate_bitmap(struct vfio_dma *dma, size_t pgsize)
+static void vfio_dma_populate_bitmap_pinned(struct vfio_dma *dma, size_t pgsize)
  {
struct rb_node *p;
unsigned long pgshift = __ffs(pgsize);
@@ -236,6 +236,25 @@ static void vfio_dma_populate_bitmap(struct vfio_dma *dma, size_t pgsize)
}
  }
  
+static void vfio_dma_populate_bitmap_full(struct vfio_dma *dma, size_t pgsize)
+{
+   unsigned long pgshift = __ffs(pgsize);
+   unsigned long nbits = dma->size >> pgshift;
+
+   bitmap_set(dma->bitmap, 0, nbits);
+}
+
+static void vfio_dma_populate_bitmap(struct vfio_iommu *iommu,
+struct vfio_dma *dma)
+{
+   size_t pgsize = (size_t)1 << __ffs(iommu->pgsize_bitmap);
+
+   if (iommu->pinned_page_dirty_scope)
+   vfio_dma_populate_bitmap_pinned(dma, pgsize);
+   else if (dma->iommu_mapped)
+   vfio_dma_populate_bitmap_full(dma, pgsize);
+}
+
  static int vfio_dma_bitmap_alloc_all(struct vfio_iommu *iommu)
  {
struct rb_node *n;
@@ -257,7 +276,7 @@ static int vfio_dma_bitmap_alloc_all(struct vfio_iommu *iommu)
}
return ret;
}
-   vfio_dma_populate_bitmap(dma, pgsize);
+   vfio_dma_populate_bitmap(iommu, dma);
}
return 0;
  }
@@ -987,13 +1006,6 @@ static int update_user_bitmap(u64 __user *bitmap, struct vfio_iommu *iommu,
unsigned long shift = 

Re: [PATCH] vfio: Fix typo of the device_state

2020-09-11 Thread Kirti Wankhede

Oops. Thanks for fixing it.

Reviewed-by: Kirti Wankhede 

On 9/10/2020 5:55 PM, Zenghui Yu wrote:

A typo fix ("_RUNNNG" => "_RUNNING") in comment block of the uapi header.

Signed-off-by: Zenghui Yu 
---
  include/uapi/linux/vfio.h | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 920470502329..d4bd39e124bf 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -462,7 +462,7 @@ struct vfio_region_gfx_edid {
   * 5. Resumed
   *  |->|
   *
- * 0. Default state of VFIO device is _RUNNNG when the user application starts.
+ * 0. Default state of VFIO device is _RUNNING when the user application starts.
   * 1. During normal shutdown of the user application, the user application may
   *    optionally change the VFIO device state from _RUNNING to _STOP. This
   *    transition is optional. The vendor driver must support this transition but



Re: [PATCH] vfio/type1: Fix migration info capability ID

2020-06-19 Thread Kirti Wankhede




On 6/19/2020 12:42 AM, Alex Williamson wrote:

ID 1 is already used by the IOVA range capability, use ID 2.



Oops.
Thanks for fixing it.

Reviewed-by: Kirti Wankhede 


Reported-by: Liu Yi L 
Cc: Kirti Wankhede 
Fixes: ad721705d09c ("vfio iommu: Add migration capability to report supported features")
Signed-off-by: Alex Williamson 
---
  include/uapi/linux/vfio.h |2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index eca6692667a3..920470502329 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -1030,7 +1030,7 @@ struct vfio_iommu_type1_info_cap_iova_range {
   * size in bytes that can be used by user applications when getting the dirty
   * bitmap.
   */
-#define VFIO_IOMMU_TYPE1_INFO_CAP_MIGRATION  1
+#define VFIO_IOMMU_TYPE1_INFO_CAP_MIGRATION  2
  
  struct vfio_iommu_type1_info_cap_migration {

struct  vfio_info_cap_header header;



Re: [PATCH] vfio/mdev: Fix reference count leak in add_mdev_supported_type.

2020-05-29 Thread Kirti Wankhede




On 5/28/2020 12:32 PM, Cornelia Huck wrote:

On Wed, 27 May 2020 21:01:09 -0500
wu000...@umn.edu wrote:


From: Qiushi Wu 

kobject_init_and_add() takes a reference even when it fails.
If this function returns an error, kobject_put() must be called to
properly clean up the memory associated with the object. Thus,
replace kfree() with kobject_put() to fix this issue. The earlier
commit b8eb718348b8 fixed a similar problem.

Fixes: 7b96953bc640 ("vfio: Mediated device Core driver")
Signed-off-by: Qiushi Wu 
---
  drivers/vfio/mdev/mdev_sysfs.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)


Reviewed-by: Cornelia Huck 



Thanks for fixing.

Reviewed-by: Kirti Wankhede 


Re: [PATCH v2 0/2] Simplify mtty driver and mdev core

2019-08-12 Thread Kirti Wankhede



On 8/9/2019 4:32 AM, Alex Williamson wrote:
> On Thu,  8 Aug 2019 09:12:53 -0500
> Parav Pandit  wrote:
> 
>> Currently the mtty sample driver uses mdev state and UUID in a convoluted
>> way to generate an interrupt.
>> It uses several translations from mdev_state to mdev_device to mdev uuid,
>> after which it does a linear search with long uuid comparisons to
>> find the mdev_state in mtty_trigger_interrupt().
>> mdev_state is already available while generating the interrupt, from which
>> all such translations are done to reach back to mdev_state.
>>
>> These translations are done in the interrupt generation path.
>> This is unnecessary and redundant.
> 
> Is the interrupt handling efficiency of this particular sample driver
> really relevant, or is its purpose more to illustrate the API and
> provide a proof of concept?  If we go to the trouble to optimize the
> sample driver and remove this interface from the API, what do we lose?
> 
> This interface was added via commit:
> 
> 99e3123e3d72 vfio-mdev: Make mdev_device private and abstract interfaces
> 
> Where the goal was to create a more formal interface and abstract
> driver access to the struct mdev_device.  In part this served to make
> out-of-tree mdev vendor drivers more supportable; the object is
> considered opaque and access is provided via an API rather than through
> direct structure fields.
> 
> I believe that the NVIDIA GRID mdev driver does make use of this
> interface and it's likely included in the sample driver specifically so
> that there is an in-kernel user for it (ie. specifically to avoid it
> being removed so casually).  An interesting feature of the NVIDIA mdev
> driver is that I believe it has portions that run in userspace.  As we
> know, mdevs are named with a UUID, so I can imagine there are some
> efficiencies to be gained in having direct access to the UUID for a
> device when interacting with userspace, rather than repeatedly parsing
> it from a device name.

That's right.

>  Is that really something we want to make more
> difficult in order to optimize a sample driver?  Knowing that an mdev
> device uses a UUID for it's name, as tools like libvirt and mdevctl
> expect, is it really worthwhile to remove such a trivial API?
> 
>> Hence,
>> Patch-1 simplifies mtty sample driver to directly use mdev_state.
>>
>> Patch-2: since no production driver uses mdev_uuid(), simplifies and
>> removes the redundant mdev_uuid() exported symbol.
> 
> s/no production driver/no in-kernel production driver/
> 
> I'd be interested to hear how the NVIDIA folks make use of this API
> interface.  Thanks,
> 

Yes, the NVIDIA mdev driver does use this interface. I don't agree with
removing the mdev_uuid() interface.

Thanks,
Kirti


> Alex
> 
>> ---
>> Changelog:
>> v1->v2:
>>  - Corrected email of Kirti
>>  - Updated cover letter commit log to address comment from Cornelia
>>  - Added Reviewed-by tag
>> v0->v1:
>>  - Updated commit log
>>
>> Parav Pandit (2):
>>   vfio-mdev/mtty: Simplify interrupt generation
>>   vfio/mdev: Removed unused and redundant API for mdev UUID
>>
>>  drivers/vfio/mdev/mdev_core.c |  6 --
>>  include/linux/mdev.h  |  1 -
>>  samples/vfio-mdev/mtty.c  | 39 +++
>>  3 files changed, 8 insertions(+), 38 deletions(-)
>>
> 


Re: [PATCH v3] mdev: Send uevents around parent device registration

2019-07-10 Thread Kirti Wankhede



On 7/10/2019 11:11 PM, Alex Williamson wrote:
> This allows udev to trigger rules when a parent device is registered
> or unregistered from mdev.
> 
> Reviewed-by: Cornelia Huck 
> Signed-off-by: Alex Williamson 
> ---
> 
> v3: Add Connie's R-b
> Add comment clarifying expected device requirements for unreg
> 

Reviewed-by: Kirti Wankhede 

Thanks,
Kirti

>  drivers/vfio/mdev/mdev_core.c |    9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
> index ae23151442cb..23976db6c6c7 100644
> --- a/drivers/vfio/mdev/mdev_core.c
> +++ b/drivers/vfio/mdev/mdev_core.c
> @@ -146,6 +146,8 @@ int mdev_register_device(struct device *dev, const struct mdev_parent_ops *ops)
>  {
>   int ret;
>   struct mdev_parent *parent;
> + char *env_string = "MDEV_STATE=registered";
> + char *envp[] = { env_string, NULL };
>  
>   /* check for mandatory ops */
>   if (!ops || !ops->create || !ops->remove || !ops->supported_type_groups)
> @@ -197,6 +199,8 @@ int mdev_register_device(struct device *dev, const struct mdev_parent_ops *ops)
>   mutex_unlock(&parent_list_lock);
>  
>   dev_info(dev, "MDEV: Registered\n");
> + kobject_uevent_env(&dev->kobj, KOBJ_CHANGE, envp);
> +
>   return 0;
>  
>  add_dev_err:
> @@ -220,6 +224,8 @@ EXPORT_SYMBOL(mdev_register_device);
>  void mdev_unregister_device(struct device *dev)
>  {
>   struct mdev_parent *parent;
> + char *env_string = "MDEV_STATE=unregistered";
> + char *envp[] = { env_string, NULL };
>  
>   mutex_lock(&parent_list_lock);
>   parent = __find_parent_device(dev);
> @@ -243,6 +249,9 @@ void mdev_unregister_device(struct device *dev)
>   up_write(&parent->unreg_sem);
>  
>   mdev_put_parent(parent);
> +
> + /* We still have the caller's reference to use for the uevent */
> + kobject_uevent_env(&dev->kobj, KOBJ_CHANGE, envp);
>  }
>  EXPORT_SYMBOL(mdev_unregister_device);
>  
> 


Re: [PATCH v2] mdev: Send uevents around parent device registration

2019-07-01 Thread Kirti Wankhede



On 7/2/2019 1:34 AM, Alex Williamson wrote:
> On Mon, 1 Jul 2019 23:20:35 +0530
> Kirti Wankhede  wrote:
> 
>> On 7/1/2019 10:54 PM, Alex Williamson wrote:
>>> On Mon, 1 Jul 2019 22:43:10 +0530
>>> Kirti Wankhede  wrote:
>>>   
>>>> On 7/1/2019 8:24 PM, Alex Williamson wrote:  
>>>>> This allows udev to trigger rules when a parent device is registered
>>>>> or unregistered from mdev.
>>>>>
>>>>> Signed-off-by: Alex Williamson 
>>>>> ---
>>>>>
>>>>> v2: Don't remove the dev_info(), Kirti requested they stay and
>>>>> removing them is only tangential to the goal of this change.
>>>>> 
>>>>
>>>> Thanks.
>>>>
>>>>  
>>>>>  drivers/vfio/mdev/mdev_core.c |    8 ++++++++
>>>>>  1 file changed, 8 insertions(+)
>>>>>
>>>>> diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
>>>>> index ae23151442cb..7fb268136c62 100644
>>>>> --- a/drivers/vfio/mdev/mdev_core.c
>>>>> +++ b/drivers/vfio/mdev/mdev_core.c
>>>>> @@ -146,6 +146,8 @@ int mdev_register_device(struct device *dev, const struct mdev_parent_ops *ops)
>>>>>  {
>>>>>   int ret;
>>>>>   struct mdev_parent *parent;
>>>>> + char *env_string = "MDEV_STATE=registered";
>>>>> + char *envp[] = { env_string, NULL };
>>>>>  
>>>>>   /* check for mandatory ops */
>>>>>   if (!ops || !ops->create || !ops->remove || !ops->supported_type_groups)
>>>>> @@ -197,6 +199,8 @@ int mdev_register_device(struct device *dev, const struct mdev_parent_ops *ops)
>>>>>   mutex_unlock(&parent_list_lock);
>>>>>  
>>>>>   dev_info(dev, "MDEV: Registered\n");
>>>>> + kobject_uevent_env(&dev->kobj, KOBJ_CHANGE, envp);
>>>>> +
>>>>>   return 0;
>>>>>  
>>>>>  add_dev_err:
>>>>> @@ -220,6 +224,8 @@ EXPORT_SYMBOL(mdev_register_device);
>>>>>  void mdev_unregister_device(struct device *dev)
>>>>>  {
>>>>>   struct mdev_parent *parent;
>>>>> + char *env_string = "MDEV_STATE=unregistered";
>>>>> + char *envp[] = { env_string, NULL };
>>>>>  
>>>>>   mutex_lock(&parent_list_lock);
>>>>>   parent = __find_parent_device(dev);
>>>>> @@ -243,6 +249,8 @@ void mdev_unregister_device(struct device *dev)
>>>>>   up_write(&parent->unreg_sem);
>>>>>  
>>>>>   mdev_put_parent(parent);
>>>>> +
>>>>> + kobject_uevent_env(&dev->kobj, KOBJ_CHANGE, envp);
>>>>
>>>> mdev_put_parent() calls put_device(dev). If this is the last instance
>>>> holding device, then on put_device(dev) dev would get freed.
>>>>
>>>> This event should be before mdev_put_parent()  
>>>
>>> So you're suggesting the vendor driver is calling
>>> mdev_unregister_device() without a reference to the struct device that
>>> it's passing to unregister?  Sounds bogus to me.  We take a
>>> reference to the device so that it can't disappear out from under us,
>>> the caller cannot rely on our reference and the caller provided the
>>> struct device.  Thanks,
>>>   
>>
>> 1. The register uevent is sent after mdev takes its reference to the device,
>> so ideally the unregister path should mirror the register path: send the
>> uevent and then release the reference to the device.
> 
> I don't see the relevance here.  We're marking an event, not unwinding
> state of the device from the registration process.  Additionally, the
> event we're trying to mark is the completion of each process, so the
> notion that we need to mirror the ordering between the two is invalid.
> 
>> 2. I agree that a vendor driver shouldn't call mdev_unregister_device()
>> without holding a reference to the device. But to be on the safer side, if
>> such a case ever occurs, it is better to send the event before mdev releases
>> its reference to the device, to avoid a segmentation fault in the kernel.
> 
> I know that get_device() and put_device() are GPL symbols and that's a
> bit of an issue, but I don't think we should be kludging the code for a
> vendor driver that might have problems with that.  A) we're using the
> caller provided device  for the uevent, B) we're only releasing our own
> reference to the device that was acquired during registration, the
> vendor driver must have other references,

Are you going to assume that someone/the vendor driver is always going to do
the right thing?

> C) the parent device
> generally lives on a bus, with a vendor driver, there's an entire
> ecosystem of references to the device below mdev.  Is this a paranoia
> request or are you really concerned that your PCI device suddenly
> disappears when mdev's reference to it disappears. 

The mdev infrastructure is not always used by PCI devices. It is designed to
be generic, so that devices other than PCI devices can also use this
framework.
If the assumption is that users of the mdev framework or vendor drivers
always use mdev in the right way, then there is no need for the mdev core to
hold a reference to the device at all.
This is not a "paranoia request". Ideally, mdev should use the device while
holding its own reference rather than assuming (or relying on) someone else
holding a reference to the device.

Thanks,
Kirti


Re: [PATCH v2] mdev: Send uevents around parent device registration

2019-07-01 Thread Kirti Wankhede



On 7/1/2019 10:54 PM, Alex Williamson wrote:
> On Mon, 1 Jul 2019 22:43:10 +0530
> Kirti Wankhede  wrote:
> 
>> On 7/1/2019 8:24 PM, Alex Williamson wrote:
>>> This allows udev to trigger rules when a parent device is registered
>>> or unregistered from mdev.
>>>
>>> Signed-off-by: Alex Williamson 
>>> ---
>>>
>>> v2: Don't remove the dev_info(), Kirti requested they stay and
>>> removing them is only tangential to the goal of this change.
>>>   
>>
>> Thanks.
>>
>>
>>>  drivers/vfio/mdev/mdev_core.c |    8 ++++++++
>>>  1 file changed, 8 insertions(+)
>>>
>>> diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
>>> index ae23151442cb..7fb268136c62 100644
>>> --- a/drivers/vfio/mdev/mdev_core.c
>>> +++ b/drivers/vfio/mdev/mdev_core.c
>>> @@ -146,6 +146,8 @@ int mdev_register_device(struct device *dev, const struct mdev_parent_ops *ops)
>>>  {
>>> int ret;
>>> struct mdev_parent *parent;
>>> +   char *env_string = "MDEV_STATE=registered";
>>> +   char *envp[] = { env_string, NULL };
>>>  
>>> /* check for mandatory ops */
>>> if (!ops || !ops->create || !ops->remove || !ops->supported_type_groups)
>>> @@ -197,6 +199,8 @@ int mdev_register_device(struct device *dev, const struct mdev_parent_ops *ops)
>>> 	mutex_unlock(&parent_list_lock);
>>>  
>>> 	dev_info(dev, "MDEV: Registered\n");
>>> +	kobject_uevent_env(&dev->kobj, KOBJ_CHANGE, envp);
>>> +
>>> return 0;
>>>  
>>>  add_dev_err:
>>> @@ -220,6 +224,8 @@ EXPORT_SYMBOL(mdev_register_device);
>>>  void mdev_unregister_device(struct device *dev)
>>>  {
>>> struct mdev_parent *parent;
>>> +   char *env_string = "MDEV_STATE=unregistered";
>>> +   char *envp[] = { env_string, NULL };
>>>  
>>> 	mutex_lock(&parent_list_lock);
>>> 	parent = __find_parent_device(dev);
>>> @@ -243,6 +249,8 @@ void mdev_unregister_device(struct device *dev)
>>> 	up_write(&parent->unreg_sem);
>>>  
>>> 	mdev_put_parent(parent);
>>> +
>>> +	kobject_uevent_env(&dev->kobj, KOBJ_CHANGE, envp);  
>>
>> mdev_put_parent() calls put_device(dev). If this is the last instance
>> holding device, then on put_device(dev) dev would get freed.
>>
>> This event should be before mdev_put_parent()
> 
> So you're suggesting the vendor driver is calling
> mdev_unregister_device() without a reference to the struct device that
> it's passing to unregister?  Sounds bogus to me.  We take a
> reference to the device so that it can't disappear out from under us,
> the caller cannot rely on our reference and the caller provided the
> struct device.  Thanks,
> 

1. The register uevent is sent after mdev takes its reference to the device,
so ideally the unregister path should mirror the register path: send the
uevent and then release the reference to the device.

2. I agree that a vendor driver shouldn't call mdev_unregister_device()
without holding a reference to the device. But to be on the safer side, if
such a case ever occurs, it is better to send the event before mdev releases
its reference to the device, to avoid a segmentation fault in the kernel.

Thanks,
Kirti


Re: [PATCH v2] mdev: Send uevents around parent device registration

2019-07-01 Thread Kirti Wankhede



On 7/1/2019 8:24 PM, Alex Williamson wrote:
> This allows udev to trigger rules when a parent device is registered
> or unregistered from mdev.
> 
> Signed-off-by: Alex Williamson 
> ---
> 
> v2: Don't remove the dev_info(), Kirti requested they stay and
> removing them is only tangential to the goal of this change.
> 

Thanks.


>  drivers/vfio/mdev/mdev_core.c |    8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
> index ae23151442cb..7fb268136c62 100644
> --- a/drivers/vfio/mdev/mdev_core.c
> +++ b/drivers/vfio/mdev/mdev_core.c
> @@ -146,6 +146,8 @@ int mdev_register_device(struct device *dev, const struct mdev_parent_ops *ops)
>  {
>   int ret;
>   struct mdev_parent *parent;
> + char *env_string = "MDEV_STATE=registered";
> + char *envp[] = { env_string, NULL };
>  
>   /* check for mandatory ops */
>   if (!ops || !ops->create || !ops->remove || !ops->supported_type_groups)
> @@ -197,6 +199,8 @@ int mdev_register_device(struct device *dev, const struct mdev_parent_ops *ops)
>   mutex_unlock(&parent_list_lock);
>  
>   dev_info(dev, "MDEV: Registered\n");
> + kobject_uevent_env(&dev->kobj, KOBJ_CHANGE, envp);
> +
>   return 0;
>  
>  add_dev_err:
> @@ -220,6 +224,8 @@ EXPORT_SYMBOL(mdev_register_device);
>  void mdev_unregister_device(struct device *dev)
>  {
>   struct mdev_parent *parent;
> + char *env_string = "MDEV_STATE=unregistered";
> + char *envp[] = { env_string, NULL };
>  
>   mutex_lock(_list_lock);
>   parent = __find_parent_device(dev);
> @@ -243,6 +249,8 @@ void mdev_unregister_device(struct device *dev)
>   up_write(&parent->unreg_sem);
>  
>   mdev_put_parent(parent);
> +
> + kobject_uevent_env(&dev->kobj, KOBJ_CHANGE, envp);

mdev_put_parent() calls put_device(dev). If this is the last reference
holding the device, then put_device(dev) would free dev.

This event should be sent before mdev_put_parent().

Thanks,
Kirti

>  }
>  EXPORT_SYMBOL(mdev_unregister_device);
>  
> 
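For reference, userspace could consume these uevents with a udev rule along these lines (a sketch, not part of the patch — the rule file name and script paths are hypothetical; ACTION, ENV and RUN are standard udev match/assignment keys):

```
# /etc/udev/rules.d/99-mdev-parent.rules (hypothetical example)
# React when a parent device announces its mdev registration state.
ACTION=="change", ENV{MDEV_STATE}=="registered",   RUN+="/usr/local/bin/mdev-parent-setup.sh"
ACTION=="change", ENV{MDEV_STATE}=="unregistered", RUN+="/usr/local/bin/mdev-parent-teardown.sh"
```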


Re: [PATCH] mdev: Send uevents around parent device registration

2019-06-27 Thread Kirti Wankhede



On 6/27/2019 1:51 PM, Cornelia Huck wrote:
> On Thu, 27 Jun 2019 00:33:59 +0530
> Kirti Wankhede  wrote:
> 
>> On 6/26/2019 11:35 PM, Alex Williamson wrote:
>>> On Wed, 26 Jun 2019 23:23:00 +0530
>>> Kirti Wankhede  wrote:
>>>   
>>>> On 6/26/2019 7:57 PM, Alex Williamson wrote:  
>>>>> This allows udev to trigger rules when a parent device is registered
>>>>> or unregistered from mdev.
>>>>>
>>>>> Signed-off-by: Alex Williamson 
>>>>> ---
>>>>>  drivers/vfio/mdev/mdev_core.c |   10 --
>>>>>  1 file changed, 8 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
>>>>> index ae23151442cb..ecec2a3b13cb 100644
>>>>> --- a/drivers/vfio/mdev/mdev_core.c
>>>>> +++ b/drivers/vfio/mdev/mdev_core.c
>>>>> @@ -146,6 +146,8 @@ int mdev_register_device(struct device *dev, const 
>>>>> struct mdev_parent_ops *ops)
>>>>>  {
>>>>>   int ret;
>>>>>   struct mdev_parent *parent;
>>>>> + char *env_string = "MDEV_STATE=registered";
>>>>> + char *envp[] = { env_string, NULL };
>>>>>  
>>>>>   /* check for mandatory ops */
>>>>>   if (!ops || !ops->create || !ops->remove || !ops->supported_type_groups)
>>>>> @@ -196,7 +198,8 @@ int mdev_register_device(struct device *dev, const 
>>>>> struct mdev_parent_ops *ops)
>>>>>   list_add(&parent->next, &parent_list);
>>>>>   mutex_unlock(&parent_list_lock);
>>>>>  
>>>>> - dev_info(dev, "MDEV: Registered\n");
>>>>> + kobject_uevent_env(&dev->kobj, KOBJ_CHANGE, envp);
>>>>> +
>>>>
>>>> Its good to have udev event, but don't remove debug print from dmesg.
>>>> Same for unregister.  
>>>
>>> Who consumes these?  They seem noisy.  Thanks,
>>>   
>>
>> I don't think its noisy, its more of logging purpose. This is seen in
>> kernel log only when physical device is registered to mdev.
> 
> Yes; but why do you want to log success? If you need to log it
> somewhere, wouldn't a trace event be a much better choice?
> 

Trace events are not always collected in production environments; there,
the kernel log helps.

Thanks,
Kirti


Re: [PATCH] mdev: Send uevents around parent device registration

2019-06-26 Thread Kirti Wankhede



On 6/26/2019 11:35 PM, Alex Williamson wrote:
> On Wed, 26 Jun 2019 23:23:00 +0530
> Kirti Wankhede  wrote:
> 
>> On 6/26/2019 7:57 PM, Alex Williamson wrote:
>>> This allows udev to trigger rules when a parent device is registered
>>> or unregistered from mdev.
>>>
>>> Signed-off-by: Alex Williamson 
>>> ---
>>>  drivers/vfio/mdev/mdev_core.c |   10 --
>>>  1 file changed, 8 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
>>> index ae23151442cb..ecec2a3b13cb 100644
>>> --- a/drivers/vfio/mdev/mdev_core.c
>>> +++ b/drivers/vfio/mdev/mdev_core.c
>>> @@ -146,6 +146,8 @@ int mdev_register_device(struct device *dev, const 
>>> struct mdev_parent_ops *ops)
>>>  {
>>> int ret;
>>> struct mdev_parent *parent;
>>> +   char *env_string = "MDEV_STATE=registered";
>>> +   char *envp[] = { env_string, NULL };
>>>  
>>> /* check for mandatory ops */
>>> if (!ops || !ops->create || !ops->remove || !ops->supported_type_groups)
>>> @@ -196,7 +198,8 @@ int mdev_register_device(struct device *dev, const 
>>> struct mdev_parent_ops *ops)
>>> list_add(&parent->next, &parent_list);
>>> mutex_unlock(&parent_list_lock);
>>>  
>>> -   dev_info(dev, "MDEV: Registered\n");
>>> +   kobject_uevent_env(&dev->kobj, KOBJ_CHANGE, envp);
>>> +  
>>
>> Its good to have udev event, but don't remove debug print from dmesg.
>> Same for unregister.
> 
> Who consumes these?  They seem noisy.  Thanks,
> 

I don't think it's noisy; it's more for logging purposes. This is seen in
the kernel log only when a physical device is registered to mdev.

Thanks,
Kirti


> Alex
> 
>>> return 0;
>>>  
>>>  add_dev_err:
>>> @@ -220,6 +223,8 @@ EXPORT_SYMBOL(mdev_register_device);
>>>  void mdev_unregister_device(struct device *dev)
>>>  {
>>> struct mdev_parent *parent;
>>> +   char *env_string = "MDEV_STATE=unregistered";
>>> +   char *envp[] = { env_string, NULL };
>>>  
>>> mutex_lock(&parent_list_lock);
>>> parent = __find_parent_device(dev);
>>> @@ -228,7 +233,6 @@ void mdev_unregister_device(struct device *dev)
>>> mutex_unlock(&parent_list_lock);
>>> return;
>>> }
>>> -   dev_info(dev, "MDEV: Unregistering\n");
>>>  
>>> list_del(&parent->next);
>>> mutex_unlock(&parent_list_lock);
>>> @@ -243,6 +247,8 @@ void mdev_unregister_device(struct device *dev)
>>> up_write(&parent->unreg_sem);
>>>  
>>> mdev_put_parent(parent);
>>> +
>>> +   kobject_uevent_env(&dev->kobj, KOBJ_CHANGE, envp);
>>>  }
>>>  EXPORT_SYMBOL(mdev_unregister_device);
>>>  
>>>   
> 


Re: [PATCH] mdev: Send uevents around parent device registration

2019-06-26 Thread Kirti Wankhede



On 6/26/2019 7:57 PM, Alex Williamson wrote:
> This allows udev to trigger rules when a parent device is registered
> or unregistered from mdev.
> 
> Signed-off-by: Alex Williamson 
> ---
>  drivers/vfio/mdev/mdev_core.c |   10 --
>  1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
> index ae23151442cb..ecec2a3b13cb 100644
> --- a/drivers/vfio/mdev/mdev_core.c
> +++ b/drivers/vfio/mdev/mdev_core.c
> @@ -146,6 +146,8 @@ int mdev_register_device(struct device *dev, const struct 
> mdev_parent_ops *ops)
>  {
>   int ret;
>   struct mdev_parent *parent;
> + char *env_string = "MDEV_STATE=registered";
> + char *envp[] = { env_string, NULL };
>  
>   /* check for mandatory ops */
>   if (!ops || !ops->create || !ops->remove || !ops->supported_type_groups)
> @@ -196,7 +198,8 @@ int mdev_register_device(struct device *dev, const struct 
> mdev_parent_ops *ops)
>   list_add(&parent->next, &parent_list);
>   mutex_unlock(&parent_list_lock);
>  
> - dev_info(dev, "MDEV: Registered\n");
> + kobject_uevent_env(&dev->kobj, KOBJ_CHANGE, envp);
> +

It's good to have a udev event, but don't remove the debug print from
dmesg. Same for unregister.

Thanks,
Kirti


>   return 0;
>  
>  add_dev_err:
> @@ -220,6 +223,8 @@ EXPORT_SYMBOL(mdev_register_device);
>  void mdev_unregister_device(struct device *dev)
>  {
>   struct mdev_parent *parent;
> + char *env_string = "MDEV_STATE=unregistered";
> + char *envp[] = { env_string, NULL };
>  
>   mutex_lock(&parent_list_lock);
>   parent = __find_parent_device(dev);
> @@ -228,7 +233,6 @@ void mdev_unregister_device(struct device *dev)
>   mutex_unlock(&parent_list_lock);
>   return;
>   }
> - dev_info(dev, "MDEV: Unregistering\n");
>  
>   list_del(&parent->next);
>   mutex_unlock(&parent_list_lock);
> @@ -243,6 +247,8 @@ void mdev_unregister_device(struct device *dev)
>   up_write(&parent->unreg_sem);
>  
>   mdev_put_parent(parent);
> +
> + kobject_uevent_env(&dev->kobj, KOBJ_CHANGE, envp);
>  }
>  EXPORT_SYMBOL(mdev_unregister_device);
>  
> 


Re: [PATCH 8/8] vfio/mdev: Improve the create/remove sequence

2019-03-28 Thread Kirti Wankhede



On 3/26/2019 9:00 PM, Parav Pandit wrote:
> 
> 
>> -Original Message-----
>> From: Kirti Wankhede 
>> Sent: Tuesday, March 26, 2019 2:06 AM
>> To: Parav Pandit ; k...@vger.kernel.org; linux-
>> ker...@vger.kernel.org; alex.william...@redhat.com
>> Cc: Neo Jia 
>> Subject: Re: [PATCH 8/8] vfio/mdev: Improve the create/remove sequence
>>
>>
>>
>> On 3/23/2019 4:50 AM, Parav Pandit wrote:
>>> There are five problems with current code structure.
>>> 1. mdev device is placed on the mdev bus before it is created in the
>>> vendor driver. Once a device is placed on the mdev bus without
>>> creating its supporting underlying vendor device, an open() can get
>>> triggered by userspace on partially initialized device.
>>> Below ladder diagram highlight it.
>>>
>>>   cpu-0   cpu-1
>>>   -   -
>>>create_store()
>>>  mdev_create_device()
>>>device_register()
>>>   ...
>>>  vfio_mdev_probe()
>>>  ...creates char device
>>> vfio_mdev_open()
>>>   parent->ops->open(mdev)
>>> vfio_ap_mdev_open()
>>>   matrix_mdev = NULL
>>> [...]
>>> parent->ops->create()
>>>   vfio_ap_mdev_create()
>>> mdev_set_drvdata(mdev, matrix_mdev);
>>> /* Valid pointer set above */
>>>
>>
>> VFIO interface uses sysfs path of device or PCI device's BDF where it checks
>> sysfs file for that device exist.
>> In case of VFIO mdev device, above situation will never happen as open will
>> only get called if sysfs entry for that device exist.
>>
>> If you don't use VFIO interface then this situation can arise. In that case
>> probe() can be used for very basic initialization then create actual char
>> device from create().
>>
> I explained you that create() cannot do the heavy lifting work of creating 
> netdev and rdma dev because at that stage driver doesn't know whether its 
> getting used for VM or host.
> create() needs to create the device that probe() can work on in stable manner.
> 

You can identify whether it's getting used by a VM or by the host from
create(). Since probe() happens first, from create() you can check
mdev_dev(mdev)->driver->name: if it's 'vfio_mdev' then it's getting used
by a VM, otherwise it's used by the host.

>>
>>> 2. Current creation sequence is,
>>>parent->ops_create()
>>>groups_register()
>>>
>>> Remove sequence is,
>>>parent->ops->remove()
>>>groups_unregister()
>>> However, remove sequence should be exact mirror of creation sequence.
>>> Once this is achieved, all users of the mdev will be terminated first
>>> before removing underlying vendor device.
>>> (Follow standard linux driver model).
>>> At that point vendor's remove() ops shouldn't failed because device is
>>> taken off the bus that should terminate the users.
>>>
>>
>> If VMM or user space application is using mdev device,
>> parent->ops->remove() can return failure. In that case sysfs files
>> shouldn't be removed. Hence above sequence is followed for remove.
>>
>> Standard linux driver model doesn't allow remove() to fail, but in of mdev
>> framework, interface is defined to handle such error case.
>>
> But the sequence is incorrect for wider use case.
>>
>>> 3. Additionally any new mdev driver that wants to work on mdev device
>>> during probe() routine registered using mdev_register_driver() needs
>>> to get stable mdev structure.
>>>
>>
>> Things that you are trying to handle with mdev structure from probe(),
>> couldn't that be moved to create()?
>>
> No, as explained before and above.
> That approach just doesn't look right.
>

As I mentioned above, you can do that.


>>
>>> 4. In following sequence, child devices created while removing mdev
>>> parent device can be left out, or it may lead to race of removing half
>>> initialized child mdev devices.
>>>
>>> issue-1:
>>> 
>>>cpu-0 cpu-1
>>>- -
>>>   mdev_unregister_device()
>>> 

Re: [PATCH 8/8] vfio/mdev: Improve the create/remove sequence

2019-03-26 Thread Kirti Wankhede



On 3/23/2019 4:50 AM, Parav Pandit wrote:
> There are five problems with current code structure.
> 1. mdev device is placed on the mdev bus before it is created in the
> vendor driver. Once a device is placed on the mdev bus without creating
> its supporting underlying vendor device, an open() can get triggered by
> userspace on partially initialized device.
> Below ladder diagram highlight it.
> 
>   cpu-0   cpu-1
>   -   -
>create_store()
>  mdev_create_device()
>device_register()
>   ...
>  vfio_mdev_probe()
>  ...creates char device
> vfio_mdev_open()
>   parent->ops->open(mdev)
> vfio_ap_mdev_open()
>   matrix_mdev = NULL
> [...]
> parent->ops->create()
>   vfio_ap_mdev_create()
> mdev_set_drvdata(mdev, matrix_mdev);
> /* Valid pointer set above */
> 

The VFIO interface uses the sysfs path of the device or the PCI device's
BDF, where it checks that the sysfs file for that device exists.
In the case of a VFIO mdev device, the above situation will never happen,
as open will only get called if the sysfs entry for that device exists.

If you don't use the VFIO interface then this situation can arise. In
that case probe() can be used for very basic initialization, and the
actual char device can then be created from create().


> 2. Current creation sequence is,
>parent->ops_create()
>groups_register()
> 
> Remove sequence is,
>parent->ops->remove()
>groups_unregister()
> However, remove sequence should be exact mirror of creation sequence.
> Once this is achieved, all users of the mdev will be terminated first
> before removing underlying vendor device.
> (Follow standard linux driver model).
> At that point vendor's remove() ops shouldn't failed because device is
> taken off the bus that should terminate the users.
> 

If a VMM or user space application is using the mdev device,
parent->ops->remove() can return failure. In that case the sysfs files
shouldn't be removed. Hence the above sequence is followed for remove.

The standard Linux driver model doesn't allow remove() to fail, but in
the mdev framework, the interface is defined to handle such an error case.


> 3. Additionally any new mdev driver that wants to work on mdev device
> during probe() routine registered using mdev_register_driver() needs to
> get stable mdev structure.
> 

Couldn't the things that you are trying to handle with the mdev structure
from probe() be moved to create()?


> 4. In following sequence, child devices created while removing mdev parent
> device can be left out, or it may lead to race of removing half
> initialized child mdev devices.
> 
> issue-1:
> 
>cpu-0 cpu-1
>- -
>   mdev_unregister_device()
>  device_for_each_child()
> mdev_device_remove_cb()
> mdev_device_remove()
> create_store()
>   mdev_device_create()   [...]
>device_register()
>   parent_remove_sysfs_files()
>   /* BUG: device added by cpu-0
>* whose parent is getting removed.
>*/
> 
> issue-2:
> 
>cpu-0 cpu-1
>- -
> create_store()
>   mdev_device_create()   [...]
>device_register()
> 
>[...]  mdev_unregister_device()
>  device_for_each_child()
> mdev_device_remove_cb()
> mdev_device_remove()
> 
>mdev_create_sysfs_files()
>/* BUG: create is adding
> * sysfs files for a device
> * which is undergoing removal.
> */
>  parent_remove_sysfs_files()
> 
> 5. Below crash is observed when user initiated remove is in progress
> and mdev_unregister_driver() completes parent unregistration.
> 
>cpu-0 cpu-1
>- -
> remove_store()
>mdev_device_remove()
>active = false;
>   mdev_unregister_device()
> remove type
>[...]
>mdev_remove_ops() crashes.
> 
> This is similar race like create() racing with mdev_unregister_device().
> 
> mtty mtty: MDEV: Registered
> iommu: Adding device 83b8f4f2-509f-382f-3c1e-e6bfe0fa1001 to group 57
> vfio_mdev 83b8f4f2-509f-382f-3c1e-e6bfe0fa1001: MDEV: group_id = 57
> mdev_device_remove sleep started
> mtty mtty: MDEV: 

Re: [PATCH 7/8] vfio/mdev: Fix aborting mdev child device removal if one fails

2019-03-25 Thread Kirti Wankhede



On 3/23/2019 4:50 AM, Parav Pandit wrote:
> device_for_each_child() stops executing callback function for remaining
> child devices, if callback hits an error.
> Each child mdev device is independent of each other.
> While unregistering parent device, mdev core must remove all child mdev
> devices.
> Therefore, mdev_device_remove_cb() always returns success so that
> device_for_each_child doesn't abort if one child removal hits error.
> 

When unregistering the parent device, force_remove is set to true and
mdev_device_remove_ops() always returns success.

> While at it, improve remove and unregister functions for below simplicity.
> 
> There isn't need to pass forced flag pointer during mdev parent
> removal which invokes mdev_device_remove().

There is a need to pass the flag; pasting here the comment above
mdev_device_remove_ops(), which explains why it is needed:

/*
 * mdev_device_remove_ops gets called from sysfs's 'remove' and when parent
 * device is being unregistered from mdev device framework.
 * - 'force_remove' is set to 'false' when called from sysfs's 'remove'
which
 *   indicates that if the mdev device is active, used by VMM or userspace
 *   application, vendor driver could return error then don't remove the
device.
 * - 'force_remove' is set to 'true' when called from
mdev_unregister_device()
 *   which indicate that parent device is being removed from mdev device
 *   framework so remove mdev device forcefully.
 */

Thanks,
Kirti

> So simplify the flow.
> 
> mdev_device_remove() is called from two paths.
> 1. mdev_unregister_driver()
>  mdev_device_remove_cb()
>mdev_device_remove()
> 2. remove_store()
>  mdev_device_remove()
> 
> When device is removed by user using remote_store(), device under
> removal is mdev device.
> When device is removed during parent device removal using generic child
> iterator, mdev check is already done using dev_is_mdev().
> 
> Hence, remove the unnecessary loop in mdev_device_remove().
> 
> Fixes: 7b96953bc640 ("vfio: Mediated device Core driver")
> Signed-off-by: Parav Pandit 
> ---
>  drivers/vfio/mdev/mdev_core.c | 24 +---
>  1 file changed, 5 insertions(+), 19 deletions(-)
> 
> diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
> index ab05464..944a058 100644
> --- a/drivers/vfio/mdev/mdev_core.c
> +++ b/drivers/vfio/mdev/mdev_core.c
> @@ -150,10 +150,10 @@ static int mdev_device_remove_ops(struct mdev_device 
> *mdev, bool force_remove)
>  
>  static int mdev_device_remove_cb(struct device *dev, void *data)
>  {
> - if (!dev_is_mdev(dev))
> - return 0;
> + if (dev_is_mdev(dev))
> + mdev_device_remove(dev, true);
>  
> - return mdev_device_remove(dev, data ? *(bool *)data : true);
> + return 0;
>  }
>  
>  /*
> @@ -241,7 +241,6 @@ int mdev_register_device(struct device *dev, const struct 
> mdev_parent_ops *ops)
>  void mdev_unregister_device(struct device *dev)
>  {
>   struct mdev_parent *parent;
> - bool force_remove = true;
>  
>   mutex_lock(&parent_list_lock);
>   parent = __find_parent_device(dev);
> @@ -255,8 +254,7 @@ void mdev_unregister_device(struct device *dev)
>   list_del(&parent->next);
>   class_compat_remove_link(mdev_bus_compat_class, dev, NULL);
>  
> - device_for_each_child(dev, (void *)&force_remove,
> -   mdev_device_remove_cb);
> + device_for_each_child(dev, NULL, mdev_device_remove_cb);
>  
>   parent_remove_sysfs_files(parent);
>  
> @@ -346,24 +344,12 @@ int mdev_device_create(struct kobject *kobj, struct 
> device *dev, uuid_le uuid)
>  
>  int mdev_device_remove(struct device *dev, bool force_remove)
>  {
> - struct mdev_device *mdev, *tmp;
> + struct mdev_device *mdev;
>   struct mdev_parent *parent;
>   struct mdev_type *type;
>   int ret;
>  
>   mdev = to_mdev_device(dev);
> -
> - mutex_lock(&mdev_list_lock);
> - list_for_each_entry(tmp, &mdev_list, next) {
> - if (tmp == mdev)
> - break;
> - }
> -
> - if (tmp != mdev) {
> - mutex_unlock(&mdev_list_lock);
> - return -ENODEV;
> - }
> -
>   if (!mdev->active) {
>   mutex_unlock(&mdev_list_lock);
>   return -EAGAIN;
> 


Re: [PATCH 5/8] vfio/mdev: Avoid masking error code to EBUSY

2019-03-25 Thread Kirti Wankhede



On 3/23/2019 4:50 AM, Parav Pandit wrote:
> Instead of masking return error to -EBUSY, return actual error
> returned by the driver.
> 
> Signed-off-by: Parav Pandit 
> ---
>  drivers/vfio/mdev/mdev_core.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
> index 3d91f62..ab05464 100644
> --- a/drivers/vfio/mdev/mdev_core.c
> +++ b/drivers/vfio/mdev/mdev_core.c
> @@ -142,7 +142,7 @@ static int mdev_device_remove_ops(struct mdev_device 
> *mdev, bool force_remove)
>*/
>   ret = parent->ops->remove(mdev);
>   if (ret && !force_remove)
> - return -EBUSY;
> + return ret;
>  
>   sysfs_remove_groups(&mdev->dev.kobj, parent->ops->mdev_attr_groups);
>   return 0;
> 

-EBUSY is returned here intentionally. If a VMM or userspace application
is using this mdev device, the vendor driver can return an error. In that
case the sysfs interface should see the -EBUSY error, indicating the
device is still active.

Thanks,
Kirti


Re: [PATCH 4/8] vfio/mdev: Drop redundant extern for exported symbols

2019-03-25 Thread Kirti Wankhede



On 3/23/2019 4:50 AM, Parav Pandit wrote:
> There is no need use 'extern' for exported functions.
> 
> Signed-off-by: Parav Pandit 
> ---
>  include/linux/mdev.h | 21 ++---
>  1 file changed, 10 insertions(+), 11 deletions(-)
> 
> diff --git a/include/linux/mdev.h b/include/linux/mdev.h
> index b6e048e..0924c48 100644
> --- a/include/linux/mdev.h
> +++ b/include/linux/mdev.h
> @@ -118,21 +118,20 @@ struct mdev_driver {
>  
>  #define to_mdev_driver(drv)  container_of(drv, struct mdev_driver, driver)
>  
> -extern void *mdev_get_drvdata(struct mdev_device *mdev);
> -extern void mdev_set_drvdata(struct mdev_device *mdev, void *data);
> -extern uuid_le mdev_uuid(struct mdev_device *mdev);
> +void *mdev_get_drvdata(struct mdev_device *mdev);
> +void mdev_set_drvdata(struct mdev_device *mdev, void *data);
> +uuid_le mdev_uuid(struct mdev_device *mdev);
>  
>  extern struct bus_type mdev_bus_type;
>  
> -extern int  mdev_register_device(struct device *dev,
> -  const struct mdev_parent_ops *ops);
> -extern void mdev_unregister_device(struct device *dev);
> +int mdev_register_device(struct device *dev, const struct mdev_parent_ops 
> *ops);
> +void mdev_unregister_device(struct device *dev);
>  
> -extern int  mdev_register_driver(struct mdev_driver *drv, struct module 
> *owner);
> -extern void mdev_unregister_driver(struct mdev_driver *drv);
> +int mdev_register_driver(struct mdev_driver *drv, struct module *owner);
> +void mdev_unregister_driver(struct mdev_driver *drv);
>  
> -extern struct device *mdev_parent_dev(struct mdev_device *mdev);
> -extern struct device *mdev_dev(struct mdev_device *mdev);
> -extern struct mdev_device *mdev_from_dev(struct device *dev);
> +struct device *mdev_parent_dev(struct mdev_device *mdev);
> +struct device *mdev_dev(struct mdev_device *mdev);
> +struct mdev_device *mdev_from_dev(struct device *dev);
>  
>  #endif /* MDEV_H */
> 

Adding 'extern' to exported symbols is in line with other exported
functions from the device core module, like device_register(),
device_unregister(), get_device(), and put_device().

Thanks,
Kirti



Re: [PATCH 3/8] vfio/mdev: Removed unused kref

2019-03-25 Thread Kirti Wankhede



On 3/23/2019 4:50 AM, Parav Pandit wrote:
> Remove unused kref from the mdev_device structure.
> 
> Fixes: 7b96953bc640 ("vfio: Mediated device Core driver")
> Signed-off-by: Parav Pandit 
> ---
>  drivers/vfio/mdev/mdev_core.c| 1 -
>  drivers/vfio/mdev/mdev_private.h | 1 -
>  2 files changed, 2 deletions(-)
> 
> diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
> index 4f213e4d..3d91f62 100644
> --- a/drivers/vfio/mdev/mdev_core.c
> +++ b/drivers/vfio/mdev/mdev_core.c
> @@ -311,7 +311,6 @@ int mdev_device_create(struct kobject *kobj, struct 
> device *dev, uuid_le uuid)
>   mutex_unlock(_list_lock);
>  
>   mdev->parent = parent;
> - kref_init(&mdev->ref);
>  
>   mdev->dev.parent  = dev;
>   mdev->dev.bus = &mdev_bus_type;
> diff --git a/drivers/vfio/mdev/mdev_private.h 
> b/drivers/vfio/mdev/mdev_private.h
> index b5819b7..84b2b6c 100644
> --- a/drivers/vfio/mdev/mdev_private.h
> +++ b/drivers/vfio/mdev/mdev_private.h
> @@ -30,7 +30,6 @@ struct mdev_device {
>   struct mdev_parent *parent;
>   uuid_le uuid;
>   void *driver_data;
> - struct kref ref;
>   struct list_head next;
>   struct kobject *type_kobj;
>   bool active;
> 

Yes, this should be removed.

Reviewed-by: Kirti Wankhede 

Thanks,
Kirti



Re: [PATCH 2/8] vfio/mdev: Avoid release parent reference during error path

2019-03-25 Thread Kirti Wankhede



On 3/23/2019 4:50 AM, Parav Pandit wrote:
> During mdev parent registration in mdev_register_device(),
> if parent device is duplicate, it releases the reference of existing
> parent device.
> This is incorrect. Existing parent device should not be touched.
> 
> Fixes: 7b96953bc640 ("vfio: Mediated device Core driver")
> Signed-off-by: Parav Pandit 
> ---
>  drivers/vfio/mdev/mdev_core.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
> index 3e5880a..4f213e4d 100644
> --- a/drivers/vfio/mdev/mdev_core.c
> +++ b/drivers/vfio/mdev/mdev_core.c
> @@ -182,6 +182,7 @@ int mdev_register_device(struct device *dev, const struct 
> mdev_parent_ops *ops)
>   /* Check for duplicate */
>   parent = __find_parent_device(dev);
>   if (parent) {
> + parent = NULL;
>   ret = -EEXIST;
>   goto add_dev_err;
>   }
> 

Agreed. Thanks for fixing this.

Reviewed-by: Kirti Wankhede 

Thanks,
Kirti


Re: [PATCH 1/8] vfio/mdev: Fix to not do put_device on device_register failure

2019-03-25 Thread Kirti Wankhede



On 3/23/2019 4:50 AM, Parav Pandit wrote:
> device_register() performs put_device() if device_add() fails.
> This balances with device_initialize().
> 
> mdev core performing put_device() when device_register() fails,
> is an error that puts already released device again.
> Therefore, don't put the device on error.
> 

device_add() doesn't call put_device(dev) on all of its error paths. It
releases the reference to its parent, put_device(parent), but not to the
device itself, put_device(dev).

Thanks,
Kirti


> Fixes: 7b96953bc640 ("vfio: Mediated device Core driver")
> Signed-off-by: Parav Pandit 
> ---
>  drivers/vfio/mdev/mdev_core.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
> index 0212f0e..3e5880a 100644
> --- a/drivers/vfio/mdev/mdev_core.c
> +++ b/drivers/vfio/mdev/mdev_core.c
> @@ -318,10 +318,8 @@ int mdev_device_create(struct kobject *kobj, struct 
> device *dev, uuid_le uuid)
>   dev_set_name(>dev, "%pUl", uuid.b);
>  
>   ret = device_register(&mdev->dev);
> - if (ret) {
> - put_device(&mdev->dev);
> + if (ret)
>   goto mdev_fail;
> - }
>  
>   ret = mdev_device_create_ops(kobj, mdev);
>   if (ret)
> 


Re: [RFC net-next 0/8] Introducing subdev bus and devlink extension

2019-03-08 Thread Kirti Wankhede



On 3/8/2019 4:01 AM, Parav Pandit wrote:
> 
> 
>> -Original Message-----
>> From: Kirti Wankhede 
>> Sent: Thursday, March 7, 2019 4:02 PM
>> To: Parav Pandit ; Jakub Kicinski
>> 
>> Cc: Or Gerlitz ; net...@vger.kernel.org; linux-
>> ker...@vger.kernel.org; michal.l...@markovi.net; da...@davemloft.net;
>> gre...@linuxfoundation.org; Jiri Pirko ; Alex
>> Williamson 
>> Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink extension
>>
>>
>>
>> On 3/8/2019 2:51 AM, Parav Pandit wrote:
>>>
>>>
>>>> -Original Message-
>>>> From: Kirti Wankhede 
>>>> Sent: Thursday, March 7, 2019 3:08 PM
>>>> To: Parav Pandit ; Jakub Kicinski
>>>> 
>>>> Cc: Or Gerlitz ; net...@vger.kernel.org; linux-
>>>> ker...@vger.kernel.org; michal.l...@markovi.net;
>> da...@davemloft.net;
>>>> gre...@linuxfoundation.org; Jiri Pirko ; Alex
>>>> Williamson 
>>>> Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink
>>>> extension
>>>>
>>>>
>>>>
>>>> On 3/8/2019 2:32 AM, Parav Pandit wrote:
>>>>>
>>>>>
>>>>>> -Original Message-
>>>>>> From: Kirti Wankhede 
>>>>>> Sent: Thursday, March 7, 2019 2:54 PM
>>>>>> To: Parav Pandit ; Jakub Kicinski
>>>>>> 
>>>>>> Cc: Or Gerlitz ; net...@vger.kernel.org;
>>>>>> linux- ker...@vger.kernel.org; michal.l...@markovi.net;
>>>> da...@davemloft.net;
>>>>>> gre...@linuxfoundation.org; Jiri Pirko ; Alex
>>>>>> Williamson 
>>>>>> Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink
>>>>>> extension
>>>>>>
>>>>>>
>>>>>>
>>>>>> 
>>>>>>
>>>>>>>>>
>>>>>>>>> Yes. I got my patches to adapt to mdev way. Will be posting RFC
>>>>>>>>> v2
>>>> soon.
>>>>>>>>> Will wait for a day to receive more comments/views from Greg and
>>>>>> others.
>>>>>>>>>
>>>>>>>>> As I explained in this cover-letter and discussion, First use
>>>>>>>>> case is to create and use mdevs in the host (and not in VM).
>>>>>>>>> Later on, I am sure once we have mdevs available, VM users will
>>>>>>>>> likely use
>>>>>>>> it.
>>>>>>>>>
>>>>>>>>> So, mlx5_core driver will have two components as starting point.
>>>>>>>>>
>>>>>>>>> 1. drivers/net/ethernet/mellanox/mlx5/core/mdev/mdev.c
>>>>>>>>> This is mdev device life cycle driver which will do,
>>>>>>>>> mdev_register_device()
>>>>>>>> and implements mlx5_mdev_ops.
>>>>>>>>>
>>>>>>>> Ok. I would suggest not use mdev.c file name, may be add device
>>>>>>>> name, something like mlx_mdev.c or vfio_mlx.c
>>>>>>>>
>>>>>>> mlx5/core is coding convention is not following to prefix mlx to
>>>>>>> its
>>>>>>> 40+
>>>>>> files.
>>>>>>>
>>>>>>> it uses actual subsystem or functionality name, such as, sriov.c
>>>>>>> eswitch.c fw.c en_tc.c (en for Ethernet) lag.c so, mdev.c aligns
>>>>>>> to rest of the 40+ files.
>>>>>>>
>>>>>>>
>>>>>>>>> 2. drivers/net/ethernet/mellanox/mlx5/core/mdev/mdev_driver.c
>>>>>>>>> This is mdev device driver which does mdev_register_driver() and
>>>>>>>>> probe() creates netdev by heavily reusing existing code of the
>>>>>>>>> PF
>>>> device.
>>>>>>>>> These drivers will not be placed under drivers/vfio/mdev,
>>>>>>>>> because this is
>>>>>>>> not a vfio driver.
>>>>>>>>> This is fine, right?
>>>>>>>>>
>>>>>>>>
>>>>>>>> I'm not too familiar with netdev, but can you create netdev on
>>>>>>>> open() call on mlx mdev devi

Re: [RFC net-next 0/8] Introducing subdev bus and devlink extension

2019-03-07 Thread Kirti Wankhede



On 3/8/2019 2:51 AM, Parav Pandit wrote:
> 
> 
>> -Original Message-----
>> From: Kirti Wankhede 
>> Sent: Thursday, March 7, 2019 3:08 PM
>> To: Parav Pandit ; Jakub Kicinski
>> 
>> Cc: Or Gerlitz ; net...@vger.kernel.org; linux-
>> ker...@vger.kernel.org; michal.l...@markovi.net; da...@davemloft.net;
>> gre...@linuxfoundation.org; Jiri Pirko ; Alex
>> Williamson 
>> Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink extension
>>
>>
>>
>> On 3/8/2019 2:32 AM, Parav Pandit wrote:
>>>
>>>
>>>> -Original Message-
>>>> From: Kirti Wankhede 
>>>> Sent: Thursday, March 7, 2019 2:54 PM
Re: [RFC net-next 0/8] Introducing subdev bus and devlink extension

2019-03-07 Thread Kirti Wankhede



On 3/8/2019 2:32 AM, Parav Pandit wrote:
> 
> 
>> -----Original Message-----
>> From: Kirti Wankhede 
>> Sent: Thursday, March 7, 2019 2:54 PM
>> To: Parav Pandit ; Jakub Kicinski
>> 
>> Cc: Or Gerlitz ; net...@vger.kernel.org; linux-
>> ker...@vger.kernel.org; michal.l...@markovi.net; da...@davemloft.net;
>> gre...@linuxfoundation.org; Jiri Pirko ; Alex
>> Williamson 
>> Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink extension
>>
>>
>>
>> 
>>
>>>>>
>>>>> Yes. I got my patches to adapt to mdev way. Will be posting RFC v2 soon.
>>>>> Will wait for a day to receive more comments/views from Greg and
>> others.
>>>>>
>>>>> As I explained in this cover-letter and discussion, First use case
>>>>> is to create and use mdevs in the host (and not in VM).
>>>>> Later on, I am sure once we have mdevs available, VM users will
>>>>> likely use
>>>> it.
>>>>>
>>>>> So, mlx5_core driver will have two components as starting point.
>>>>>
>>>>> 1. drivers/net/ethernet/mellanox/mlx5/core/mdev/mdev.c
>>>>> This is mdev device life cycle driver which will do,
>>>>> mdev_register_device()
>>>> and implements mlx5_mdev_ops.
>>>>>
>>>> Ok. I would suggest not use mdev.c file name, may be add device name,
>>>> something like mlx_mdev.c or vfio_mlx.c
>>>>
>>> mlx5/core's coding convention does not prefix mlx to its 40+ files.
>>>
>>> it uses actual subsystem or functionality name, such as, sriov.c
>>> eswitch.c fw.c en_tc.c (en for Ethernet) lag.c so, mdev.c aligns to
>>> rest of the 40+ files.
>>>
>>>
>>>>> 2. drivers/net/ethernet/mellanox/mlx5/core/mdev/mdev_driver.c
>>>>> This is mdev device driver which does mdev_register_driver() and
>>>>> probe() creates netdev by heavily reusing existing code of the PF device.
>>>>> These drivers will not be placed under drivers/vfio/mdev, because
>>>>> this is
>>>> not a vfio driver.
>>>>> This is fine, right?
>>>>>
>>>>
>>>> I'm not too familiar with netdev, but can you create netdev on open()
>>>> call on mlx mdev device? Then you don't have to write mdev device
>> driver.
>>>>
>>> Who invokes open() and release()?
>>> I believe it is QEMU that would do open(), release(), read/write/mmap?
>>>
>>> Assuming that is the case,
>>> I think it's incorrect to create netdev in open.
>>> Because when we want to map the mdev to VM using above mdev calls, we
>> actually won't be creating netdev in host.
>>> Instead, some queues etc will be setup as part of these calls.
>>>
>>> By default this created mdev is bound to vfio_mdev.
>>> And once we unbind the device from this driver, we need to bind to mlx5
>> driver so that driver can create the netdev etc.
>>>
>>> Or did I get open() and friends call wrong?
>>>
>>
>> In 'struct mdev_parent_ops' there are create() and remove(). When user
>> creates mdev device by writing UUID to create sysfs, vendor driver's
>> create() callback gets called. This should be used to allocate/commit
> Yes. I am already past that stage.
> 
>> resources from parent device and on remove() callback free those resources.
>> So there is no need to bind mlx5 driver to that mdev device.
>>
> If we don't bind mlx5 driver, vfio_mdev driver is bound to it. Such driver 
> won't create netdev.

Doesn't need to.

Create netdev from create() callback.

Thanks,
Kirti

> Again, we do not want to map this mdev to a VM.
> We want to consume it in the host where mdev is created.
> So I am able to detach this mdev from vfio_mdev driver as usual using 
> $ echo mdev_name > ../drivers/vfio_mdev/unbind
> 
> Followed by binding it to mlx5_core driver.
> 
> Below is sample output before binding it to mlx5_core driver.
> When we bind with mlx5_core driver, that driver creates the netdev in host.
> If user wants to map this mdev to VM, user won't bind to mlx5_core driver. 
> instead he will bind to vfio driver and that does usual open/release/...
> 
> 
> lrwxrwxrwx 1 root root 0 Mar  7 14:24 69ea1551-d054-46e9-974d-8edae8f0aefe -> 
> ../../../devices/pci:00/:00:02.2/:05:00.0/69ea1551-d054-46e9-974d-8edae8f0aefe
> [root@sw-mtx-036 net-next]# ls -l 
> /sys/bus/mdev/devices/69ea1551-d054-46e9-974d-8edae8f0aefe/
> total 0
> lrwxrwxrwx 1 root root0 Mar  7 14:24 driver -> 
> ../../../../../bus/mdev/drivers/vfio_mdev
> lrwxrwxrwx 1 root root0 Mar  7 14:24 iommu_group -> 
> ../../../../../kernel/iommu_groups/0
> lrwxrwxrwx 1 root root0 Mar  7 14:24 mdev_type -> 
> ../mdev_supported_types/mlx5_core-mgmt
> drwxr-xr-x 2 root root0 Mar  7 14:24 power
> --w--- 1 root root 4096 Mar  7 14:24 remove
> lrwxrwxrwx 1 root root0 Mar  7 14:24 subsystem -> ../../../../../bus/mdev
> -rw-r--r-- 1 root root 4096 Mar  7 14:24 uevent
> 
>> open/release/read/write/mmap/ioctl are regular file operations for that
>> mdev device.
>>
> 
>> Thanks,
>> Kirti
> 


Re: [RFC net-next 0/8] Introducing subdev bus and devlink extension

2019-03-07 Thread Kirti Wankhede





>>>
>>> Yes. I got my patches to adapt to mdev way. Will be posting RFC v2 soon.
>>> Will wait for a day to receive more comments/views from Greg and others.
>>>
>>> As I explained in this cover-letter and discussion, First use case is
>>> to create and use mdevs in the host (and not in VM).
>>> Later on, I am sure once we have mdevs available, VM users will likely use
>> it.
>>>
>>> So, mlx5_core driver will have two components as starting point.
>>>
>>> 1. drivers/net/ethernet/mellanox/mlx5/core/mdev/mdev.c
>>> This is mdev device life cycle driver which will do, mdev_register_device()
>> and implements mlx5_mdev_ops.
>>>
>> Ok. I would suggest not use mdev.c file name, may be add device name,
>> something like mlx_mdev.c or vfio_mlx.c
>>
> mlx5/core's coding convention does not prefix mlx to its 40+ files.
> 
> it uses actual subsystem or functionality name, such as,
> sriov.c
> eswitch.c
> fw.c
> en_tc.c (en for Ethernet)
> lag.c
> so,
> mdev.c aligns to rest of the 40+ files.
> 
> 
>>> 2. drivers/net/ethernet/mellanox/mlx5/core/mdev/mdev_driver.c
>>> This is mdev device driver which does mdev_register_driver() and
>>> probe() creates netdev by heavily reusing existing code of the PF device.
>>> These drivers will not be placed under drivers/vfio/mdev, because this is
>> not a vfio driver.
>>> This is fine, right?
>>>
>>
>> I'm not too familiar with netdev, but can you create netdev on open() call on
>> mlx mdev device? Then you don't have to write mdev device driver.
>>
> Who invokes open() and release()?
> I believe it is QEMU that would do open(), release(), read/write/mmap?
> 
> Assuming that is the case,
> I think it's incorrect to create netdev in open.
> Because when we want to map the mdev to VM using above mdev calls, we 
> actually won't be creating netdev in host.
> Instead, some queues etc will be setup as part of these calls.
> 
> By default this created mdev is bound to vfio_mdev.
> And once we unbind the device from this driver, we need to bind to mlx5 
> driver so that driver can create the netdev etc.
> 
> Or did I get open() and friends call wrong?
> 

In 'struct mdev_parent_ops' there are create() and remove(). When user
creates mdev device by writing UUID to create sysfs, vendor driver's
create() callback gets called. This should be used to allocate/commit
resources from parent device and on remove() callback free those
resources. So there is no need to bind mlx5 driver to that mdev device.

open/release/read/write/mmap/ioctl are regular file operations for that
mdev device.

Thanks,
Kirti



Re: [RFC net-next 0/8] Introducing subdev bus and devlink extension

2019-03-07 Thread Kirti Wankhede
CC += Alex

On 3/6/2019 11:12 AM, Parav Pandit wrote:
> Hi Kirti,
> 
>> -----Original Message-----
>> From: Kirti Wankhede 
>> Sent: Tuesday, March 5, 2019 9:51 PM
>> To: Parav Pandit ; Jakub Kicinski
>> 
>> Cc: Or Gerlitz ; net...@vger.kernel.org; linux-
>> ker...@vger.kernel.org; michal.l...@markovi.net; da...@davemloft.net;
>> gre...@linuxfoundation.org; Jiri Pirko 
>> Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink extension
>>
>>
>>
>> On 3/6/2019 6:14 AM, Parav Pandit wrote:
>>> Hi Greg, Kirti,
>>>
>>>> -----Original Message-----
>>>> From: Parav Pandit
>>>> Sent: Tuesday, March 5, 2019 5:45 PM
>>>> To: Parav Pandit ; Kirti Wankhede
>>>> ; Jakub Kicinski
>> 
>>>> Cc: Or Gerlitz ; net...@vger.kernel.org; linux-
>>>> ker...@vger.kernel.org; michal.l...@markovi.net;
>> da...@davemloft.net;
>>>> gre...@linuxfoundation.org; Jiri Pirko 
>>>> Subject: RE: [RFC net-next 0/8] Introducing subdev bus and devlink
>>>> extension
>>>>
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: linux-kernel-ow...@vger.kernel.org >>>> ow...@vger.kernel.org> On Behalf Of Parav Pandit
>>>>> Sent: Tuesday, March 5, 2019 5:17 PM
>>>>> To: Kirti Wankhede ; Jakub Kicinski
>>>>> 
>>>>> Cc: Or Gerlitz ; net...@vger.kernel.org;
>>>>> linux- ker...@vger.kernel.org; michal.l...@markovi.net;
>>>>> da...@davemloft.net; gre...@linuxfoundation.org; Jiri Pirko
>>>>> 
>>>>> Subject: RE: [RFC net-next 0/8] Introducing subdev bus and devlink
>>>>> extension
>>>>>
>>>>> Hi Kirti,
>>>>>
>>>>>> -Original Message-
>>>>>> From: Kirti Wankhede 
>>>>>> Sent: Tuesday, March 5, 2019 4:40 PM
>>>>>> To: Parav Pandit ; Jakub Kicinski
>>>>>> 
>>>>>> Cc: Or Gerlitz ; net...@vger.kernel.org;
>>>>>> linux- ker...@vger.kernel.org; michal.l...@markovi.net;
>>>>>> da...@davemloft.net; gre...@linuxfoundation.org; Jiri Pirko
>>>>>> 
>>>>>> Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink
>>>>>> extension
>>>>>>
>>>>>>
>>>>>>
>>>>>>> I am novice at mdev level too. mdev or vfio mdev.
>>>>>>> Currently by default we bind to same vendor driver, but when it
>>>>>>> was
>>>>>> created as passthrough device, vendor driver won't create netdevice
>>>>>> or rdma device for it.
>>>>>>> And vfio/mdev or whatever mature available driver would bind at
>>>>>>> that
>>>>>> point.
>>>>>>>
>>>>>>
>>>>>> Using mdev framework, if you want to partition a physical device
>>>>>> into multiple logic devices, you can bind those devices to same
>>>>>> vendor driver through vfio-mdev, where as if you want to
>>>>>> passthrough the device bind it to vfio-pci. If I understand
>>>>>> correctly, that is what you are
>>>>> looking for.
>>>>>>
>>>>>>
>>>>> We cannot bind a whole PCI device to vfio-pci, reason is, A given
>>>>> PCI device has existing protocol devices on it such as netdevs and rdma
>> dev.
>>>>> This device is partitioned while those protocol devices exist and
>>>>> mlx5_core, mlx5_ib drivers are loaded on it.
>>>>> And we also need to connect these objects rightly to eswitch exposed
>>>>> by devlink interface (net/core/devlink.c) that supports eswitch
>>>>> binding, health, registers, parameters, ports support.
>>>>> It also supports existing PCI VFs.
>>>>>
>>>>> I don’t think we want to replicate all of this again in mdev subsystem 
>>>>> [1].
>>>>>
>>>>> [1]
>>>>> https://www.kernel.org/doc/Documentation/vfio-mediated-device.txt
>>>>>
>>>>> So devlink interface to migrate users from managing VFs to non_VF
>>>>> sub device is natural progression.
>>>>>
>>>>> However, in future, I believe we would be creating mediated devices
>>>>> on user request, to use mdev modu

Re: [RFC net-next 0/8] Introducing subdev bus and devlink extension

2019-03-05 Thread Kirti Wankhede



On 3/6/2019 6:14 AM, Parav Pandit wrote:
> Hi Greg, Kirti,
> 
>> -----Original Message-----
>> From: Parav Pandit
>> Sent: Tuesday, March 5, 2019 5:45 PM
>> To: Parav Pandit ; Kirti Wankhede
>> ; Jakub Kicinski 
>> Cc: Or Gerlitz ; net...@vger.kernel.org; linux-
>> ker...@vger.kernel.org; michal.l...@markovi.net; da...@davemloft.net;
>> gre...@linuxfoundation.org; Jiri Pirko 
>> Subject: RE: [RFC net-next 0/8] Introducing subdev bus and devlink extension
>>
>>
>>
>>> -----Original Message-----
>>> From: linux-kernel-ow...@vger.kernel.org >> ow...@vger.kernel.org> On Behalf Of Parav Pandit
>>> Sent: Tuesday, March 5, 2019 5:17 PM
>>> To: Kirti Wankhede ; Jakub Kicinski
>>> 
>>> Cc: Or Gerlitz ; net...@vger.kernel.org; linux-
>>> ker...@vger.kernel.org; michal.l...@markovi.net; da...@davemloft.net;
>>> gre...@linuxfoundation.org; Jiri Pirko 
>>> Subject: RE: [RFC net-next 0/8] Introducing subdev bus and devlink
>>> extension
>>>
>>> Hi Kirti,
>>>
>>>> -----Original Message-----
>>>> From: Kirti Wankhede 
>>>> Sent: Tuesday, March 5, 2019 4:40 PM
>>>> To: Parav Pandit ; Jakub Kicinski
>>>> 
>>>> Cc: Or Gerlitz ; net...@vger.kernel.org;
>>>> linux- ker...@vger.kernel.org; michal.l...@markovi.net;
>>>> da...@davemloft.net; gre...@linuxfoundation.org; Jiri Pirko
>>>> 
>>>> Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink
>>>> extension
>>>>
>>>>
>>>>
>>>>> I am novice at mdev level too. mdev or vfio mdev.
>>>>> Currently by default we bind to same vendor driver, but when it
>>>>> was
>>>> created as passthrough device, vendor driver won't create netdevice
>>>> or rdma device for it.
>>>>> And vfio/mdev or whatever mature available driver would bind at
>>>>> that
>>>> point.
>>>>>
>>>>
>>>> Using mdev framework, if you want to partition a physical device
>>>> into multiple logic devices, you can bind those devices to same
>>>> vendor driver through vfio-mdev, where as if you want to passthrough
>>>> the device bind it to vfio-pci. If I understand correctly, that is
>>>> what you are
>>> looking for.
>>>>
>>>>
>>> We cannot bind a whole PCI device to vfio-pci, reason is, A given PCI
>>> device has existing protocol devices on it such as netdevs and rdma dev.
>>> This device is partitioned while those protocol devices exist and
>>> mlx5_core, mlx5_ib drivers are loaded on it.
>>> And we also need to connect these objects rightly to eswitch exposed
>>> by devlink interface (net/core/devlink.c) that supports eswitch
>>> binding, health, registers, parameters, ports support.
>>> It also supports existing PCI VFs.
>>>
>>> I don’t think we want to replicate all of this again in mdev subsystem [1].
>>>
>>> [1] https://www.kernel.org/doc/Documentation/vfio-mediated-device.txt
>>>
>>> So devlink interface to migrate users from managing VFs to non_VF sub
>>> device is natural progression.
>>>
>>> However, in future, I believe we would be creating mediated devices on
>>> user request, to use mdev modules and map them to VM.
>>>
>>> Also 'mdev_bus' is created as a class and not as a bus. This limits to
>>> not use devlink interface whose handle is bus+device name.
>>>
>>> So one option is to change mdev from class to bus.
>>> devlink will create mdevs on the bus, mdev driver can probe these
>>> devices on host system by default.
>>> And if told to do passthrough, a different driver exposes them to VM.
>>> How feasible is this?
>>>
>> Wait, I do see a mdev bus and mdevs are created on this bus using
>> mdev_device_create().
>> So how about we create mdevs on this bus using devlink, instead of sysfs?
>> And driver side on host gets the mdev_register_driver()->probe()?
>>
> 
> Thinking more and reviewing more mdev code, I believe mdev fits 
> this need a lot better than new subdev bus, mfd, platform device, or devlink 
> subport.
> For coming future, to map this sub device (mdev) to VM will also be easier by 
> using mdev bus.
> 

Thanks for taking close look at mdev code.

Assigning mdev to VM support is already in place, QEMU and libvirt have
support to assign mdev device to VM.

> I also believe we can us

Re: [RFC net-next 0/8] Introducing subdev bus and devlink extension

2019-03-05 Thread Kirti Wankhede



On 3/6/2019 1:16 AM, Parav Pandit wrote:
> 
> 
>> -----Original Message-----
>> From: Jakub Kicinski 
>> Sent: Monday, March 4, 2019 7:35 PM
>> To: Parav Pandit 
>> Cc: Or Gerlitz ; net...@vger.kernel.org; linux-
>> ker...@vger.kernel.org; michal.l...@markovi.net; da...@davemloft.net;
>> gre...@linuxfoundation.org; Jiri Pirko 
>> Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink extension
>>
>> Parav, please wrap your responses to at most 80 characters.
>> This is hard to read.
>>
> Sorry about it. I will wrap now on.
> 
>> On Mon, 4 Mar 2019 04:41:01 +, Parav Pandit wrote:
 -----Original Message-----
 From: Jakub Kicinski 
 Sent: Friday, March 1, 2019 2:04 PM
 To: Parav Pandit ; Or Gerlitz
 
 Cc: net...@vger.kernel.org; linux-kernel@vger.kernel.org;
 michal.l...@markovi.net; da...@davemloft.net;
 gre...@linuxfoundation.org; Jiri Pirko 
 Subject: Re: [RFC net-next 0/8] Introducing subdev bus and devlink
 extension

 On Thu, 28 Feb 2019 23:37:44 -0600, Parav Pandit wrote:
> Requirements for above use cases:
> 
> 1. We need a generic user interface & core APIs to create sub
> devices from a parent pci device but should be generic enough for
> other parent devices 2. Interface should be vendor agnostic 3.
> User should be able to set device params at creation time 4. In
> future if needed, tool should be able to create passthrough device
> to map to a virtual machine

 Like a mediated device?
>>>
>>> Yes.
>>>
 https://www.kernel.org/doc/Documentation/vfio-mediated-device.txt
 https://www.dpdk.org/wp-content/uploads/sites/35/2018/06/Mediated-
 Devices-Better-Userland-IO.pdf

 Other than pass-through it is entirely unclear to me why you'd need a
>> bus.
 (Or should I say VM pass through or DPDK?)  Could you clarify why
 the need for a bus?

>>> A bus follow standard linux kernel device driver model to attach a
>>> driver to specific device. Platform device with my limited
>>> understanding looks a hack/abuse of it based on documentation [1], but
>>> it can possibly be an alternative to bus if it looks fine to Greg and
>>> others.
>>
>> I grok from this text that the main advantage you see is the ability to 
>> choose
>> a driver for the subdevice.
>>
> Yes.
> 
 My thinking is that we should allow spawning subports in devlink and
 if user specifies "passthrough" the device spawned would be an mdev.
>>>
>>> devlink device is much more comprehensive way to create sub-devices
>>> than sub-ports for at least below reasons.
>>>
>>> 1. devlink device already defines device->port relation which enables
>>> to create multiport device.
>>
>> I presume that by devlink device you mean devlink instance?  Yes, this part
>> I'm following.
>>
> Yes -> 'struct devlink' 
>>> subport breaks that.
>>
>> Breaks what?  The ability to create a devlink instance with multiple ports?
>>
> Right.
> 
>>> 2. With bus model, it enables us to load driver of same vendor or
>>> generic one such as vfio in future.
>>

You can achieve this with mdev as well.

>> Yes, sorry, I'm not an expert on mdevs, but isn't that the goal of those?
>> Could you go into more detail why not just use mdevs?
>>
> I am novice at mdev level too. mdev or vfio mdev.
> Currently by default we bind to same vendor driver, but when it was created 
> as passthrough device, vendor driver won't create netdevice or rdma device 
> for it.
> And vfio/mdev or whatever mature available driver would bind at that point.
> 

Using mdev framework, if you want to partition a physical device into
multiple logic devices, you can bind those devices to same vendor driver
through vfio-mdev, where as if you want to passthrough the device bind
it to vfio-pci. If I understand correctly, that is what you are looking for.


>>> 3. Devices live on the bus, mapping a subport to 'struct device' is
>>> not intuitive.
>>
>> Are you saying that the main devlink instance would not have any port
>> information for the subdevices?
>>
> Right, this newly created devlink device is the control point of its port(s).
> 
>> Devices live on a bus.  Software constructs - depend on how one wants to
>> model them - don't have to.
>>
>>> 4. sub-device allows to use existing devlink port, registers, health
>>> infrastructure to sub devices, which otherwise need to be duplicated
>>> for ports.
>>
>> Health stuff is not tied to a port, I'm not following you.  You can create a
>> reporter per port, per ACL rule or per SB or per whatever your heart 
>> desires..
>>
> Instead of creating multiple reporters and inventing these reporter naming 
> schemes,
> creating devlink instance leverage all health reporting done for a devliink 
> instance.
> So whatever is done for instance A (parent), can be available for instance B 
> (subdev).
> 
>>> 5. Even though current devlink devices are networking devices, there
>>> is nothing restricts it to 

Re: [PATCH v3 2/2] vfio: add edid support to mbochs sample driver

2018-09-28 Thread Kirti Wankhede



On 9/28/2018 11:10 AM, Gerd Hoffmann wrote:
>>> +   case MBOCHS_EDID_REGION_INDEX:
>>> +   ext->base.argsz = sizeof(*ext);
>>> +   ext->base.offset = MBOCHS_EDID_OFFSET;
>>> +   ext->base.size = MBOCHS_EDID_SIZE;
>>> +   ext->base.flags = (VFIO_REGION_INFO_FLAG_READ  |
>>> +  VFIO_REGION_INFO_FLAG_WRITE |
>>> +  VFIO_REGION_INFO_FLAG_CAPS);
>>
>> Any reason to not to use _MMAP flag?
> 
> There is no page backing this.  Also it is not performance-critical,
> edid updates should be rare, so the extra code for mmap support doesn't
> look like it is worth it.
> 
> Also for the virtual registers (especially link_state) it is probably
> useful to have the write callback of the mdev driver called to get
> notified about the change.
> 
>> How would QEMU side code read this region? will it be always trapped?
> 
> qemu uses read & write syscalls (well, pread & pwrite actually).
> 
>> If vendor driver sets _MMAP flag, will QEMU side handle that case as well?
> 
> The current test branch doesn't, it expects read+write to work.
>   https://git.kraxel.org/cgit/qemu/log/?h=sirius/edid-vfio
> 

Ok.
Can you add a comment in vfio.h that this region is non-mmappable?

>> I think since its blob, edid could be read by QEMU using one memcpy
>> rather than adding multiple memcpy of 4 or 8 bytes.
> 
> From qemu it's a single pwrite syscall actually.  mbochs_write() splits
> it into 4 byte writes and calls mbochs_access() for each of them.  One
> could probably add a special case for the EDID blob to mbochs_write().
> But again: doesn't seem worth the effort given that edid updates should
> be a rare event.
> 

Ok.

Thanks,
Kirti

> cheers,
>   Gerd
> 



Re: [PATCH v3 2/2] vfio: add edid support to mbochs sample driver

2018-09-27 Thread Kirti Wankhede



On 9/21/2018 2:00 PM, Gerd Hoffmann wrote:
> Signed-off-by: Gerd Hoffmann 
> ---
>  samples/vfio-mdev/mbochs.c | 136 
> ++---
>  1 file changed, 117 insertions(+), 19 deletions(-)
> 
> diff --git a/samples/vfio-mdev/mbochs.c b/samples/vfio-mdev/mbochs.c
> index 2535c3677c..ca7960adf5 100644
> --- a/samples/vfio-mdev/mbochs.c
> +++ b/samples/vfio-mdev/mbochs.c
> @@ -71,11 +71,19 @@
>  #define MBOCHS_NAME"mbochs"
>  #define MBOCHS_CLASS_NAME  "mbochs"
>  
> +#define MBOCHS_EDID_REGION_INDEX  VFIO_PCI_NUM_REGIONS
> +#define MBOCHS_NUM_REGIONS(MBOCHS_EDID_REGION_INDEX+1)
> +
>  #define MBOCHS_CONFIG_SPACE_SIZE  0xff
>  #define MBOCHS_MMIO_BAR_OFFSET PAGE_SIZE
>  #define MBOCHS_MMIO_BAR_SIZE   PAGE_SIZE
> -#define MBOCHS_MEMORY_BAR_OFFSET  (MBOCHS_MMIO_BAR_OFFSET + \
> +#define MBOCHS_EDID_OFFSET (MBOCHS_MMIO_BAR_OFFSET + \
>  MBOCHS_MMIO_BAR_SIZE)
> +#define MBOCHS_EDID_SIZE   PAGE_SIZE
> +#define MBOCHS_MEMORY_BAR_OFFSET  (MBOCHS_EDID_OFFSET + \
> +MBOCHS_EDID_SIZE)
> +
> +#define MBOCHS_EDID_BLOB_OFFSET   (MBOCHS_EDID_SIZE/2)
>  
>  #define STORE_LE16(addr, val)(*(u16 *)addr = val)
>  #define STORE_LE32(addr, val)(*(u32 *)addr = val)
> @@ -95,16 +103,24 @@ MODULE_PARM_DESC(mem, "megabytes available to " 
> MBOCHS_NAME " devices");
>  static const struct mbochs_type {
>   const char *name;
>   u32 mbytes;
> + u32 max_x;
> + u32 max_y;
>  } mbochs_types[] = {
>   {
>   .name   = MBOCHS_CLASS_NAME "-" MBOCHS_TYPE_1,
>   .mbytes = 4,
> + .max_x  = 800,
> + .max_y  = 600,
>   }, {
>   .name   = MBOCHS_CLASS_NAME "-" MBOCHS_TYPE_2,
>   .mbytes = 16,
> + .max_x  = 1920,
> + .max_y  = 1440,
>   }, {
>   .name   = MBOCHS_CLASS_NAME "-" MBOCHS_TYPE_3,
>   .mbytes = 64,
> + .max_x  = 0,
> + .max_y  = 0,
>   },
>  };
>  
> @@ -115,6 +131,11 @@ static struct cdev   mbochs_cdev;
>  static struct device mbochs_dev;
>  static int   mbochs_used_mbytes;
>  
> +struct vfio_region_info_ext {
> + struct vfio_region_info  base;
> + struct vfio_region_info_cap_type type;
> +};
> +
>  struct mbochs_mode {
>   u32 drm_format;
>   u32 bytepp;
> @@ -144,13 +165,14 @@ struct mdev_state {
>   u32 memory_bar_mask;
>   struct mutex ops_lock;
>   struct mdev_device *mdev;
> - struct vfio_device_info dev_info;
>  
>   const struct mbochs_type *type;
>   u16 vbe[VBE_DISPI_INDEX_COUNT];
>   u64 memsize;
>   struct page **pages;
>   pgoff_t pagecount;
> + struct vfio_region_gfx_edid edid_regs;
> + u8 edid_blob[0x400];
>  
>   struct list_head dmabufs;
>   u32 active_id;
> @@ -342,10 +364,20 @@ static void handle_mmio_read(struct mdev_state 
> *mdev_state, u16 offset,
>char *buf, u32 count)
>  {
>   struct device *dev = mdev_dev(mdev_state->mdev);
> + struct vfio_region_gfx_edid *edid;
>   u16 reg16 = 0;
>   int index;
>  
>   switch (offset) {
> + case 0x000 ... 0x3ff: /* edid block */
> + edid = &mdev_state->edid_regs;
> + if (edid->link_state != VFIO_DEVICE_GFX_LINK_STATE_UP ||
> + offset >= edid->edid_size) {
> + memset(buf, 0, count);
> + break;
> + }
> + memcpy(buf, mdev_state->edid_blob + offset, count);
> + break;
>   case 0x500 ... 0x515: /* bochs dispi interface */
>   if (count != 2)
>   goto unhandled;
> @@ -365,6 +397,44 @@ static void handle_mmio_read(struct mdev_state 
> *mdev_state, u16 offset,
>   }
>  }
>  
> +static void handle_edid_regs(struct mdev_state *mdev_state, u16 offset,
> +  char *buf, u32 count, bool is_write)
> +{
> + char *regs = (void *)&mdev_state->edid_regs;
> +
> + if (offset + count > sizeof(mdev_state->edid_regs))
> + return;
> + if (count != 4)
> + return;
> + if (offset % 4)
> + return;
> +
> + if (is_write) {
> + switch (offset) {
> + case offsetof(struct vfio_region_gfx_edid, link_state):
> + case offsetof(struct vfio_region_gfx_edid, edid_size):
> + memcpy(regs + offset, buf, count);
> + break;
> + default:
> + /* read-only regs */
> + break;
> + }
> + } else {
> + memcpy(buf, regs + offset, count);
> + }
> +}
> +
> +static void handle_edid_blob(struct mdev_state *mdev_state, u16 offset,
> +  char *buf, u32 count, bool is_write)
> +{
> + if (offset + count > mdev_state->edid_regs.edid_max_size)
> + return;
> + 

Re: [PATCH v3 2/2] vfio: add edid support to mbochs sample driver

2018-09-27 Thread Kirti Wankhede



On 9/21/2018 2:00 PM, Gerd Hoffmann wrote:
> Signed-off-by: Gerd Hoffmann 
> ---
>  samples/vfio-mdev/mbochs.c | 136 
> ++---
>  1 file changed, 117 insertions(+), 19 deletions(-)
> 
> diff --git a/samples/vfio-mdev/mbochs.c b/samples/vfio-mdev/mbochs.c
> index 2535c3677c..ca7960adf5 100644
> --- a/samples/vfio-mdev/mbochs.c
> +++ b/samples/vfio-mdev/mbochs.c
> @@ -71,11 +71,19 @@
>  #define MBOCHS_NAME"mbochs"
>  #define MBOCHS_CLASS_NAME  "mbochs"
>  
> +#define MBOCHS_EDID_REGION_INDEX  VFIO_PCI_NUM_REGIONS
> +#define MBOCHS_NUM_REGIONS(MBOCHS_EDID_REGION_INDEX+1)
> +
>  #define MBOCHS_CONFIG_SPACE_SIZE  0xff
>  #define MBOCHS_MMIO_BAR_OFFSET PAGE_SIZE
>  #define MBOCHS_MMIO_BAR_SIZE   PAGE_SIZE
> -#define MBOCHS_MEMORY_BAR_OFFSET  (MBOCHS_MMIO_BAR_OFFSET + \
> +#define MBOCHS_EDID_OFFSET (MBOCHS_MMIO_BAR_OFFSET + \
>  MBOCHS_MMIO_BAR_SIZE)
> +#define MBOCHS_EDID_SIZE   PAGE_SIZE
> +#define MBOCHS_MEMORY_BAR_OFFSET  (MBOCHS_EDID_OFFSET + \
> +MBOCHS_EDID_SIZE)
> +
> +#define MBOCHS_EDID_BLOB_OFFSET   (MBOCHS_EDID_SIZE/2)
>  
>  #define STORE_LE16(addr, val)(*(u16 *)addr = val)
>  #define STORE_LE32(addr, val)(*(u32 *)addr = val)
> @@ -95,16 +103,24 @@ MODULE_PARM_DESC(mem, "megabytes available to " 
> MBOCHS_NAME " devices");
>  static const struct mbochs_type {
>   const char *name;
>   u32 mbytes;
> + u32 max_x;
> + u32 max_y;
>  } mbochs_types[] = {
>   {
>   .name   = MBOCHS_CLASS_NAME "-" MBOCHS_TYPE_1,
>   .mbytes = 4,
> + .max_x  = 800,
> + .max_y  = 600,
>   }, {
>   .name   = MBOCHS_CLASS_NAME "-" MBOCHS_TYPE_2,
>   .mbytes = 16,
> + .max_x  = 1920,
> + .max_y  = 1440,
>   }, {
>   .name   = MBOCHS_CLASS_NAME "-" MBOCHS_TYPE_3,
>   .mbytes = 64,
> + .max_x  = 0,
> + .max_y  = 0,
>   },
>  };
>  
> @@ -115,6 +131,11 @@ static struct cdev   mbochs_cdev;
>  static struct device mbochs_dev;
>  static int   mbochs_used_mbytes;
>  
> +struct vfio_region_info_ext {
> + struct vfio_region_info  base;
> + struct vfio_region_info_cap_type type;
> +};
> +
>  struct mbochs_mode {
>   u32 drm_format;
>   u32 bytepp;
> @@ -144,13 +165,14 @@ struct mdev_state {
>   u32 memory_bar_mask;
>   struct mutex ops_lock;
>   struct mdev_device *mdev;
> - struct vfio_device_info dev_info;
>  
>   const struct mbochs_type *type;
>   u16 vbe[VBE_DISPI_INDEX_COUNT];
>   u64 memsize;
>   struct page **pages;
>   pgoff_t pagecount;
> + struct vfio_region_gfx_edid edid_regs;
> + u8 edid_blob[0x400];
>  
>   struct list_head dmabufs;
>   u32 active_id;
> @@ -342,10 +364,20 @@ static void handle_mmio_read(struct mdev_state 
> *mdev_state, u16 offset,
>char *buf, u32 count)
>  {
>   struct device *dev = mdev_dev(mdev_state->mdev);
> + struct vfio_region_gfx_edid *edid;
>   u16 reg16 = 0;
>   int index;
>  
>   switch (offset) {
> + case 0x000 ... 0x3ff: /* edid block */
> + edid = _state->edid_regs;
> + if (edid->link_state != VFIO_DEVICE_GFX_LINK_STATE_UP ||
> + offset >= edid->edid_size) {
> + memset(buf, 0, count);
> + break;
> + }
> + memcpy(buf, mdev_state->edid_blob + offset, count);
> + break;
>   case 0x500 ... 0x515: /* bochs dispi interface */
>   if (count != 2)
>   goto unhandled;
> @@ -365,6 +397,44 @@ static void handle_mmio_read(struct mdev_state *mdev_state, u16 offset,
>   }
>  }
>  
> +static void handle_edid_regs(struct mdev_state *mdev_state, u16 offset,
> +  char *buf, u32 count, bool is_write)
> +{
> + char *regs = (void *)&mdev_state->edid_regs;
> +
> + if (offset + count > sizeof(mdev_state->edid_regs))
> + return;
> + if (count != 4)
> + return;
> + if (offset % 4)
> + return;
> +
> + if (is_write) {
> + switch (offset) {
> + case offsetof(struct vfio_region_gfx_edid, link_state):
> + case offsetof(struct vfio_region_gfx_edid, edid_size):
> + memcpy(regs + offset, buf, count);
> + break;
> + default:
> + /* read-only regs */
> + break;
> + }
> + } else {
> + memcpy(buf, regs + offset, count);
> + }
> +}
> +
> +static void handle_edid_blob(struct mdev_state *mdev_state, u16 offset,
> +  char *buf, u32 count, bool is_write)
> +{
> + if (offset + count > mdev_state->edid_regs.edid_max_size)
> + return;
> + 
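The quoted hunk is cut off by the archive at this point. As a rough userspace model of how such a blob handler typically completes, the sketch below mirrors the symmetric `handle_edid_regs()` shown above; the write/read arms and the `edid_state` type are my assumptions for illustration, not the literal patch text:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Minimal stand-in for the parts of mdev_state this handler touches. */
struct edid_state {
	uint32_t edid_max_size;       /* advertised maximum blob size */
	uint8_t  edid_blob[0x400];    /* backing storage, as in the patch */
};

static void handle_edid_blob(struct edid_state *s, uint16_t offset,
			     char *buf, uint32_t count, bool is_write)
{
	/* Reject accesses that run past the advertised blob size
	 * (assumed to be <= sizeof(s->edid_blob)). */
	if ((uint32_t)offset + count > s->edid_max_size)
		return;

	/* Assumed completion: plain copy in or out of the blob. */
	if (is_write)
		memcpy(s->edid_blob + offset, buf, count);
	else
		memcpy(buf, s->edid_blob + offset, count);
}
```

A write/read round trip stays within `edid_max_size`; out-of-range accesses are silently ignored, matching the early-return style of `handle_edid_regs()`.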

Re: [PATCH v4 2/2] vfio/mdev: Re-order sysfs attribute creation

2018-05-18 Thread Kirti Wankhede


On 5/19/2018 12:40 AM, Alex Williamson wrote:
> There exists a gap at the end of mdev_device_create() where the device
> is visible to userspace, but we're not yet ready to handle removal, as
> triggered through the 'remove' attribute.  We handle this properly in
> mdev_device_remove() with an -EAGAIN return, but we can marginally
> reduce this gap by adding this attribute as a final step of our sysfs
> setup.
> 
> Signed-off-by: Alex Williamson <alex.william...@redhat.com>

Looks good.

Reviewed-by: Kirti Wankhede <kwankh...@nvidia.com>

> ---
>  drivers/vfio/mdev/mdev_sysfs.c |   14 +++---
>  1 file changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/vfio/mdev/mdev_sysfs.c b/drivers/vfio/mdev/mdev_sysfs.c
> index 802df210929b..249472f05509 100644
> --- a/drivers/vfio/mdev/mdev_sysfs.c
> +++ b/drivers/vfio/mdev/mdev_sysfs.c
> @@ -257,24 +257,24 @@ int mdev_create_sysfs_files(struct device *dev, struct mdev_type *type)
>  {
>   int ret;
>  
> - ret = sysfs_create_files(&dev->kobj, mdev_device_attrs);
> - if (ret)
> - return ret;
> -
>   ret = sysfs_create_link(type->devices_kobj, &dev->kobj, dev_name(dev));
>   if (ret)
> - goto device_link_failed;
> + return ret;
>  
>   ret = sysfs_create_link(&dev->kobj, &type->kobj, "mdev_type");
>   if (ret)
>   goto type_link_failed;
>  
> + ret = sysfs_create_files(&dev->kobj, mdev_device_attrs);
> + if (ret)
> + goto create_files_failed;
> +
>   return ret;
>  
> +create_files_failed:
> + sysfs_remove_link(&dev->kobj, "mdev_type");
>  type_link_failed:
>   sysfs_remove_link(type->devices_kobj, dev_name(dev));
> -device_link_failed:
> - sysfs_remove_files(&dev->kobj, mdev_device_attrs);
>   return ret;
>  }
>  
> 
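The reordering above follows the usual kernel idiom: set up resources in order, unwind them in reverse on failure, and create the user-visible piece (here the `remove` attribute) last, so that every earlier failure path only has invisible state to undo. A standalone C sketch of that pattern, with illustrative step names rather than the kernel API:

```c
#include <assert.h>
#include <string.h>

static char log_buf[64];   /* records the order of operations */

static void note(const char *s) { strcat(log_buf, s); }

/* Each setup step either succeeds (and is logged) or fails. */
static int step(const char *name, int fail)
{
	if (fail)
		return -1;
	note(name);
	return 0;
}

/* Mirrors mdev_create_sysfs_files(): links first, attrs last,
 * with goto-based unwinding in exact reverse order. */
static int create_all(int fail_at_attrs)
{
	if (step("link1;", 0))
		return -1;
	if (step("link2;", 0))
		goto link1_failed;
	if (step("attrs;", fail_at_attrs))
		goto link2_failed;
	return 0;

link2_failed:
	note("undo-link2;");
link1_failed:
	note("undo-link1;");
	return -1;
}
```

If the last step fails, the two earlier steps are undone newest-first; nothing user-visible was ever created, which is the point of moving `sysfs_create_files()` to the end.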


Re: [PATCH v4 0/2] vfio/mdev: Device namespace protection

2018-05-18 Thread Kirti Wankhede


On 5/19/2018 12:40 AM, Alex Williamson wrote:
> v4: Fix the 'create' racing 'remove' gap noted by Kirti by moving
> removal from mdev_list to mdev_device_release().  Fix missing
> mdev_put_parent() cases in mdev_device_create(), also noted
> by Kirti.  Added documentation update regarding serialization as
> noted by Cornelia.  Added additional commit log comment about
> -EAGAIN vs -ENODEV for 'remove' racing 'create'.  Added second
> patch to re-order sysfs attributes, with this my targeted
> scripts can no longer hit the gap where -EAGAIN is returned.
> BTW, the gap where the current code returns -ENODEV in this
> race condition is about 50% easier to hit than it exists in
> this series with patch 1 alone.
> 

Thanks for fixing this. This patch set looks good to me.

Thanks,
Kirti

> Thanks,
> Alex
> 
> ---
> 
> Alex Williamson (2):
>   vfio/mdev: Check globally for duplicate devices
>   vfio/mdev: Re-order sysfs attribute creation
> 
> 
>  Documentation/vfio-mediated-device.txt |5 ++
>  drivers/vfio/mdev/mdev_core.c  |  102 +++-
>  drivers/vfio/mdev/mdev_private.h   |2 -
>  drivers/vfio/mdev/mdev_sysfs.c |   14 ++--
>  4 files changed, 49 insertions(+), 74 deletions(-)
> 


Re: [PATCH v4 1/2] vfio/mdev: Check globally for duplicate devices

2018-05-18 Thread Kirti Wankhede


On 5/19/2018 12:40 AM, Alex Williamson wrote:
> When we create an mdev device, we check for duplicates against the
> parent device and return -EEXIST if found, but the mdev device
> namespace is global since we'll link all devices from the bus.  We do
> catch this later in sysfs_do_create_link_sd() to return -EEXIST, but
> with it comes a kernel warning and stack trace for trying to create
> duplicate sysfs links, which makes it an undesirable response.
> 
> Therefore we should really be looking for duplicates across all mdev
> parent devices, or as implemented here, against our mdev device list.
> Using mdev_list to prevent duplicates means that we can remove
> mdev_parent.lock, but in order not to serialize mdev device creation
> and removal globally, we add mdev_device.active which allows UUIDs to
> be reserved such that we can drop the mdev_list_lock before the mdev
> device is fully in place.
> 
> Two behavioral notes; first, mdev_parent.lock had the side-effect of
> serializing mdev create and remove ops per parent device.  This was
> an implementation detail, not an intentional guarantee provided to
> the mdev vendor drivers.  Vendor drivers can trivially provide this
> serialization internally if necessary.  Second, review comments note
> the new -EAGAIN behavior when the device, and in particular the remove
> attribute, becomes visible in sysfs.  If a remove is triggered prior
> to completion of mdev_device_create() the user will see a -EAGAIN
> error.  While the errno is different, receiving an error during this
> period is not, the previous implementation returned -ENODEV for the
> same condition.  Furthermore, the consistency to the user is improved
> in the case where mdev_device_remove_ops() returns error.  Previously
> concurrent calls to mdev_device_remove() could see the device
> disappear with -ENODEV and return in the case of error.  Now a user
> would see -EAGAIN while the device is in this transitory state.
> 
> Signed-off-by: Alex Williamson <alex.william...@redhat.com>

Looks good to me.

Reviewed-by: Kirti Wankhede <kwankh...@nvidia.com>

> ---
>  Documentation/vfio-mediated-device.txt |5 ++
>  drivers/vfio/mdev/mdev_core.c  |  102 +++-
>  drivers/vfio/mdev/mdev_private.h   |2 -
>  3 files changed, 42 insertions(+), 67 deletions(-)
> 
> diff --git a/Documentation/vfio-mediated-device.txt b/Documentation/vfio-mediated-device.txt
> index 1b3950346532..c3f69bcaf96e 100644
> --- a/Documentation/vfio-mediated-device.txt
> +++ b/Documentation/vfio-mediated-device.txt
> @@ -145,6 +145,11 @@ The functions in the mdev_parent_ops structure are as follows:
>  * create: allocate basic resources in a driver for a mediated device
>  * remove: free resources in a driver when a mediated device is destroyed
>  
> +(Note that mdev-core provides no implicit serialization of create/remove
> +callbacks per mdev parent device, per mdev type, or any other categorization.
> +Vendor drivers are expected to be fully asynchronous in this respect or
> +provide their own internal resource protection.)
> +
>  The callbacks in the mdev_parent_ops structure are as follows:
>  
>  * open: open callback of mediated device
> diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
> index 126991046eb7..0212f0ee8aea 100644
> --- a/drivers/vfio/mdev/mdev_core.c
> +++ b/drivers/vfio/mdev/mdev_core.c
> @@ -66,34 +66,6 @@ uuid_le mdev_uuid(struct mdev_device *mdev)
>  }
>  EXPORT_SYMBOL(mdev_uuid);
>  
> -static int _find_mdev_device(struct device *dev, void *data)
> -{
> - struct mdev_device *mdev;
> -
> - if (!dev_is_mdev(dev))
> - return 0;
> -
> - mdev = to_mdev_device(dev);
> -
> - if (uuid_le_cmp(mdev->uuid, *(uuid_le *)data) == 0)
> - return 1;
> -
> - return 0;
> -}
> -
> -static bool mdev_device_exist(struct mdev_parent *parent, uuid_le uuid)
> -{
> - struct device *dev;
> -
> - dev = device_find_child(parent->dev, &uuid, _find_mdev_device);
> - if (dev) {
> - put_device(dev);
> - return true;
> - }
> -
> - return false;
> -}
> -
>  /* Should be called holding parent_list_lock */
>  static struct mdev_parent *__find_parent_device(struct device *dev)
>  {
> @@ -221,7 +193,6 @@ int mdev_register_device(struct device *dev, const struct mdev_parent_ops *ops)
>   }
>  
>   kref_init(&parent->ref);
> - mutex_init(&parent->lock);
>  
>   parent->dev = dev;
>   parent->ops = ops;
> @@ -297,6 +268,10 @@ static void mdev_device_release(struct device *dev)
>  {
>   struct mdev_device *mdev = to_mdev_device(d
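The `mdev_device.active` scheme described in the commit message can be modeled in userspace roughly as follows: reserve the UUID in the shared list under the lock, drop the lock for the slow setup work, and only then mark the entry active, with remove of a not-yet-active entry reporting -EAGAIN. The names, fixed-size table, and pthread mutex below are illustrative assumptions, not the kernel implementation:

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <string.h>

struct entry {
	char uuid[37];
	bool active;   /* setup finished, visible for remove */
	bool used;     /* slot reserved in the global namespace */
};

static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
static struct entry list[8];   /* stand-in for mdev_list */

/* Reserve a UUID; duplicates are rejected while the lock is held,
 * so the namespace check and the reservation are atomic. */
static struct entry *reserve(const char *uuid)
{
	struct entry *e = NULL;
	int i;

	pthread_mutex_lock(&list_lock);
	for (i = 0; i < 8; i++) {
		if (list[i].used && strcmp(list[i].uuid, uuid) == 0) {
			pthread_mutex_unlock(&list_lock);
			return NULL;          /* duplicate, i.e. -EEXIST */
		}
		if (!list[i].used && !e)
			e = &list[i];
	}
	if (e) {
		e->used = true;
		e->active = false;            /* reserved, not yet visible */
		strcpy(e->uuid, uuid);
	}
	pthread_mutex_unlock(&list_lock);
	return e;  /* caller finishes setup, then sets e->active = true */
}

static int remove_entry(struct entry *e)
{
	int ret = 0;

	pthread_mutex_lock(&list_lock);
	if (!e->active)
		ret = -EAGAIN;   /* create still in flight, retry later */
	else
		e->used = false;
	pthread_mutex_unlock(&list_lock);
	return ret;
}
```

The lock protects only the shared list; it never serializes the per-device setup itself, which is the distinction the discussion below turns on.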

Re: [PATCH v3] vfio/mdev: Check globally for duplicate devices

2018-05-18 Thread Kirti Wankhede


On 5/18/2018 11:00 PM, Alex Williamson wrote:
> On Fri, 18 May 2018 12:34:03 +0530
> Kirti Wankhede <kwankh...@nvidia.com> wrote:
> 
>> On 5/18/2018 3:07 AM, Alex Williamson wrote:
>>> On Fri, 18 May 2018 01:56:50 +0530
>>> Kirti Wankhede <kwankh...@nvidia.com> wrote:
>>>   
>>>> On 5/17/2018 9:50 PM, Alex Williamson wrote:  
>>>>> On Thu, 17 May 2018 21:25:22 +0530
>>>>> Kirti Wankhede <kwankh...@nvidia.com> wrote:
>>>>> 
>>>>>> On 5/17/2018 1:39 PM, Cornelia Huck wrote:
>>>>>>> On Wed, 16 May 2018 21:30:19 -0600
>>>>>>> Alex Williamson <alex.william...@redhat.com> wrote:
>>>>>>>   
>>>>>>>> When we create an mdev device, we check for duplicates against the
>>>>>>>> parent device and return -EEXIST if found, but the mdev device
>>>>>>>> namespace is global since we'll link all devices from the bus.  We do
>>>>>>>> catch this later in sysfs_do_create_link_sd() to return -EEXIST, but
>>>>>>>> with it comes a kernel warning and stack trace for trying to create
>>>>>>>> duplicate sysfs links, which makes it an undesirable response.
>>>>>>>>
>>>>>>>> Therefore we should really be looking for duplicates across all mdev
>>>>>>>> parent devices, or as implemented here, against our mdev device list.
>>>>>>>> Using mdev_list to prevent duplicates means that we can remove
>>>>>>>> mdev_parent.lock, but in order not to serialize mdev device creation
>>>>>>>> and removal globally, we add mdev_device.active which allows UUIDs to
>>>>>>>> be reserved such that we can drop the mdev_list_lock before the mdev
>>>>>>>> device is fully in place.
>>>>>>>>
>>>>>>>> NB. there was never intended to be any serialization guarantee
>>>>>>>> provided by the mdev core with respect to creation and removal of mdev
>>>>>>>> devices, mdev_parent.lock provided this only as a side-effect of the
>>>>>>>> implementation for locking the namespace per parent.  That
>>>>>>>> serialization is now removed.  
>>>>>>>   
>>>>>>
>>>>>> mdev_parent.lock is to serialize create and remove of that mdev device,
>>>>>> that handles race condition that Cornelia mentioned below.
>>>>>
>>>>> Previously it was stated:
>>>>>
>>>>> On Thu, 17 May 2018 01:01:40 +0530
>>>>> Kirti Wankhede <kwankh...@nvidia.com> wrote:
>>>>>> Here lock is not for create/remove routines of vendor driver, its about
>>>>>> mdev device creation and device registration, which is a common code
>>>>>> path, and so is part of mdev core module.
>>>>>
>>>>> So the race condition was handled previously, but as a side-effect of
>>>>> protecting the namespace, aiui.  I'm trying to state above that the
>>>>> serialization of create/remove was never intended as a guarantee
>>>>> provided to mdev vendor drivers.  I don't see that there's a need to
>>>>> protect "mdev device creation and device registration" beyond conflicts
>>>>> in the UUID namespace, which is done here.  Are there others?
>>>>> 
>>>>
>>>> Sorry not being elaborative in my earlier response to
>>>>  
>>>>> If we can
>>>>> show that vendor drivers handle the create/remove paths themselves,
>>>>> perhaps we can refine the locking granularity.
>>>>
>>>> mdev_device_create() function does :
>>>> - create mdev device
>>>> - register device
>>>> - call vendor driver->create
>>>> - create sysfs files.
>>>>
>>>> mdev_device_remove() removes sysfs files, unregister device and delete
>>>> device.
>>>>
>>>> There is common code in mdev_device_create() and mdev_device_remove()
>>>> independent of what vendor driver does in its create()/remove()
>>>> callback. Moving this code to each vendor driver to handle create/remove
>>>> themselves doesn't make sense to me.  
>>>
>>> I don't see where anyone is suggesting that, I'm n

Re: [PATCH v3] vfio/mdev: Check globally for duplicate devices

2018-05-18 Thread Kirti Wankhede


On 5/18/2018 3:07 AM, Alex Williamson wrote:
> On Fri, 18 May 2018 01:56:50 +0530
> Kirti Wankhede <kwankh...@nvidia.com> wrote:
> 
>> On 5/17/2018 9:50 PM, Alex Williamson wrote:
>>> On Thu, 17 May 2018 21:25:22 +0530
>>> Kirti Wankhede <kwankh...@nvidia.com> wrote:
>>>   
>>>> On 5/17/2018 1:39 PM, Cornelia Huck wrote:  
>>>>> On Wed, 16 May 2018 21:30:19 -0600
>>>>> Alex Williamson <alex.william...@redhat.com> wrote:
>>>>> 
>>>>>> When we create an mdev device, we check for duplicates against the
>>>>>> parent device and return -EEXIST if found, but the mdev device
>>>>>> namespace is global since we'll link all devices from the bus.  We do
>>>>>> catch this later in sysfs_do_create_link_sd() to return -EEXIST, but
>>>>>> with it comes a kernel warning and stack trace for trying to create
>>>>>> duplicate sysfs links, which makes it an undesirable response.
>>>>>>
>>>>>> Therefore we should really be looking for duplicates across all mdev
>>>>>> parent devices, or as implemented here, against our mdev device list.
>>>>>> Using mdev_list to prevent duplicates means that we can remove
>>>>>> mdev_parent.lock, but in order not to serialize mdev device creation
>>>>>> and removal globally, we add mdev_device.active which allows UUIDs to
>>>>>> be reserved such that we can drop the mdev_list_lock before the mdev
>>>>>> device is fully in place.
>>>>>>
>>>>>> NB. there was never intended to be any serialization guarantee
>>>>>> provided by the mdev core with respect to creation and removal of mdev
>>>>>> devices, mdev_parent.lock provided this only as a side-effect of the
>>>>>> implementation for locking the namespace per parent.  That
>>>>>> serialization is now removed.
>>>>> 
>>>>
>>>> mdev_parent.lock is to serialize create and remove of that mdev device,
>>>> that handles race condition that Cornelia mentioned below.  
>>>
>>> Previously it was stated:
>>>
>>> On Thu, 17 May 2018 01:01:40 +0530
>>> Kirti Wankhede <kwankh...@nvidia.com> wrote:  
>>>> Here lock is not for create/remove routines of vendor driver, its about
>>>> mdev device creation and device registration, which is a common code
>>>> path, and so is part of mdev core module.  
>>>
>>> So the race condition was handled previously, but as a side-effect of
>>> protecting the namespace, aiui.  I'm trying to state above that the
>>> serialization of create/remove was never intended as a guarantee
>>> provided to mdev vendor drivers.  I don't see that there's a need to
>>> protect "mdev device creation and device registration" beyond conflicts
>>> in the UUID namespace, which is done here.  Are there others?
>>>   
>>
>> Sorry not being elaborative in my earlier response to
>>
>>> If we can
>>> show that vendor drivers handle the create/remove paths themselves,
>>> perhaps we can refine the locking granularity.  
>>
>> mdev_device_create() function does :
>> - create mdev device
>> - register device
>> - call vendor driver->create
>> - create sysfs files.
>>
>> mdev_device_remove() removes sysfs files, unregister device and delete
>> device.
>>
>> There is common code in mdev_device_create() and mdev_device_remove()
>> independent of what vendor driver does in its create()/remove()
>> callback. Moving this code to each vendor driver to handle create/remove
>> themselves doesn't make sense to me.
> 
> I don't see where anyone is suggesting that, I'm not.
>  
>> mdev_parent.lock here does take care of race conditions that could occur
>> during mdev device creation and deletion in this common code path.
> 
> Exactly what races in the common code path is mdev_parent.lock
> preventing?  mdev_device_create() calls:
> 
> device_register()
> mdev_device_create_ops()
>   parent->ops->create()
>   sysfs_create_groups()
> mdev_create_sysfs_files()
>   sysfs_create_files()
>   sysfs_create_link()
>   sysfs_create_link()
> 
> mdev_parent.lock is certainly not serializing all calls across the
> entire kernel to device_register and sysfs_create_{groups,files,link}
> so what is it protecting other than serializing parent->ops->create()?
> Locks pro

Re: [PATCH v3] vfio/mdev: Check globally for duplicate devices

2018-05-17 Thread Kirti Wankhede


On 5/17/2018 9:50 PM, Alex Williamson wrote:
> On Thu, 17 May 2018 21:25:22 +0530
> Kirti Wankhede <kwankh...@nvidia.com> wrote:
> 
>> On 5/17/2018 1:39 PM, Cornelia Huck wrote:
>>> On Wed, 16 May 2018 21:30:19 -0600
>>> Alex Williamson <alex.william...@redhat.com> wrote:
>>>   
>>>> When we create an mdev device, we check for duplicates against the
>>>> parent device and return -EEXIST if found, but the mdev device
>>>> namespace is global since we'll link all devices from the bus.  We do
>>>> catch this later in sysfs_do_create_link_sd() to return -EEXIST, but
>>>> with it comes a kernel warning and stack trace for trying to create
>>>> duplicate sysfs links, which makes it an undesirable response.
>>>>
>>>> Therefore we should really be looking for duplicates across all mdev
>>>> parent devices, or as implemented here, against our mdev device list.
>>>> Using mdev_list to prevent duplicates means that we can remove
>>>> mdev_parent.lock, but in order not to serialize mdev device creation
>>>> and removal globally, we add mdev_device.active which allows UUIDs to
>>>> be reserved such that we can drop the mdev_list_lock before the mdev
>>>> device is fully in place.
>>>>
>>>> NB. there was never intended to be any serialization guarantee
>>>> provided by the mdev core with respect to creation and removal of mdev
>>>> devices, mdev_parent.lock provided this only as a side-effect of the
>>>> implementation for locking the namespace per parent.  That
>>>> serialization is now removed.  
>>>   
>>
>> mdev_parent.lock is to serialize create and remove of that mdev device,
>> that handles race condition that Cornelia mentioned below.
> 
> Previously it was stated:
> 
> On Thu, 17 May 2018 01:01:40 +0530
> Kirti Wankhede <kwankh...@nvidia.com> wrote:
>> Here lock is not for create/remove routines of vendor driver, its about
>> mdev device creation and device registration, which is a common code
>> path, and so is part of mdev core module.
> 
> So the race condition was handled previously, but as a side-effect of
> protecting the namespace, aiui.  I'm trying to state above that the
> serialization of create/remove was never intended as a guarantee
> provided to mdev vendor drivers.  I don't see that there's a need to
> protect "mdev device creation and device registration" beyond conflicts
> in the UUID namespace, which is done here.  Are there others?
> 

Sorry for not being more elaborate in my earlier response to

> If we can
> show that vendor drivers handle the create/remove paths themselves,
> perhaps we can refine the locking granularity.

mdev_device_create() does:
- create the mdev device
- register the device
- call the vendor driver's create callback
- create sysfs files.

mdev_device_remove() removes the sysfs files, unregisters the device and
deletes it.

There is common code in mdev_device_create() and mdev_device_remove()
independent of what the vendor driver does in its create()/remove()
callbacks. Moving this code into each vendor driver so that they handle
create/remove themselves doesn't make sense to me.

mdev_parent.lock here does take care of the race conditions that could
occur during mdev device creation and deletion in this common code path.

What is the urge to remove mdev_parent.lock if it handles all the race
conditions without requiring the user to handle -EAGAIN?

Thanks,
Kirti


>>> This is probably fine; but I noted that documentation on the locking
>>> conventions and serialization guarantees for mdev is a bit sparse, and
>>> this topic also came up during the vfio-ap review.
>>>
>>> We probably want to add some more concrete documentation; would the
>>> kernel doc for the _ops or vfio-mediated-device.txt be a better place
>>> for that?
> 
> I'll look to see where we can add a note within that file, I suspect
> that's the right place to put it.
> 
>>> [Dong Jia, Halil: Can you please take a look whether vfio-ccw is really
>>> ok? I don't think we open up any new races, but I'd appreciate a second
>>> or third opinion.]
>>>   
>>>>
>>>> Signed-off-by: Alex Williamson <alex.william...@redhat.com>
>>>> ---
>>>>
>>>> v3: Rework locking and add a field to mdev_device so we can track
>>>> completed instances vs those added to reserve the namespace.
>>>>
>>>>  drivers/vfio/mdev/mdev_core.c|   94 
>>>> +-
>>>>  drivers/vfio/mdev/mdev_private.h |2 -
>>&

Re: [PATCH v3] vfio/mdev: Check globally for duplicate devices

2018-05-17 Thread Kirti Wankhede


On 5/17/2018 1:39 PM, Cornelia Huck wrote:
> On Wed, 16 May 2018 21:30:19 -0600
> Alex Williamson  wrote:
> 
>> When we create an mdev device, we check for duplicates against the
>> parent device and return -EEXIST if found, but the mdev device
>> namespace is global since we'll link all devices from the bus.  We do
>> catch this later in sysfs_do_create_link_sd() to return -EEXIST, but
>> with it comes a kernel warning and stack trace for trying to create
>> duplicate sysfs links, which makes it an undesirable response.
>>
>> Therefore we should really be looking for duplicates across all mdev
>> parent devices, or as implemented here, against our mdev device list.
>> Using mdev_list to prevent duplicates means that we can remove
>> mdev_parent.lock, but in order not to serialize mdev device creation
>> and removal globally, we add mdev_device.active which allows UUIDs to
>> be reserved such that we can drop the mdev_list_lock before the mdev
>> device is fully in place.
>>
>> NB. there was never intended to be any serialization guarantee
>> provided by the mdev core with respect to creation and removal of mdev
>> devices, mdev_parent.lock provided this only as a side-effect of the
>> implementation for locking the namespace per parent.  That
>> serialization is now removed.
> 

mdev_parent.lock is to serialize create and remove of that mdev device,
that handles race condition that Cornelia mentioned below.

> This is probably fine; but I noted that documentation on the locking
> conventions and serialization guarantees for mdev is a bit sparse, and
> this topic also came up during the vfio-ap review.
> 
> We probably want to add some more concrete documentation; would the
> kernel doc for the _ops or vfio-mediated-device.txt be a better place
> for that?
> 
> [Dong Jia, Halil: Can you please take a look whether vfio-ccw is really
> ok? I don't think we open up any new races, but I'd appreciate a second
> or third opinion.]
> 
>>
>> Signed-off-by: Alex Williamson 
>> ---
>>
>> v3: Rework locking and add a field to mdev_device so we can track
>> completed instances vs those added to reserve the namespace.
>>
>>  drivers/vfio/mdev/mdev_core.c|   94 
>> +-
>>  drivers/vfio/mdev/mdev_private.h |2 -
>>  2 files changed, 34 insertions(+), 62 deletions(-)
>>
>> diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
>> index 126991046eb7..55ea9d34ec69 100644
>> --- a/drivers/vfio/mdev/mdev_core.c
>> +++ b/drivers/vfio/mdev/mdev_core.c
>> @@ -66,34 +66,6 @@ uuid_le mdev_uuid(struct mdev_device *mdev)
>>  }
>>  EXPORT_SYMBOL(mdev_uuid);
>>  
>> -static int _find_mdev_device(struct device *dev, void *data)
>> -{
>> -struct mdev_device *mdev;
>> -
>> -if (!dev_is_mdev(dev))
>> -return 0;
>> -
>> -mdev = to_mdev_device(dev);
>> -
>> -if (uuid_le_cmp(mdev->uuid, *(uuid_le *)data) == 0)
>> -return 1;
>> -
>> -return 0;
>> -}
>> -
>> -static bool mdev_device_exist(struct mdev_parent *parent, uuid_le uuid)
>> -{
>> -struct device *dev;
>> -
>> -dev = device_find_child(parent->dev, &uuid, _find_mdev_device);
>> -if (dev) {
>> -put_device(dev);
>> -return true;
>> -}
>> -
>> -return false;
>> -}
>> -
>>  /* Should be called holding parent_list_lock */
>>  static struct mdev_parent *__find_parent_device(struct device *dev)
>>  {
>> @@ -221,7 +193,6 @@ int mdev_register_device(struct device *dev, const 
>> struct mdev_parent_ops *ops)
>>  }
>>  
>>  kref_init(&parent->ref);
>> -mutex_init(&parent->lock);
>>  
>>  parent->dev = dev;
>>  parent->ops = ops;
>> @@ -304,7 +275,7 @@ static void mdev_device_release(struct device *dev)
>>  int mdev_device_create(struct kobject *kobj, struct device *dev, uuid_le 
>> uuid)
>>  {
>>  int ret;
>> -struct mdev_device *mdev;
>> +struct mdev_device *mdev, *tmp;
>>  struct mdev_parent *parent;
>>  struct mdev_type *type = to_mdev_type(kobj);
>>  
>> @@ -312,21 +283,26 @@ int mdev_device_create(struct kobject *kobj, struct 
>> device *dev, uuid_le uuid)
>>  if (!parent)
>>  return -EINVAL;
>>  
>> -mutex_lock(&parent->lock);
>> +mutex_lock(&mdev_list_lock);
>>  
>>  /* Check for duplicate */
>> -if (mdev_device_exist(parent, uuid)) {
>> -ret = -EEXIST;
>> -goto create_err;
>> +list_for_each_entry(tmp, &mdev_list, next) {
>> +if (!uuid_le_cmp(tmp->uuid, uuid)) {
>> +mutex_unlock(&mdev_list_lock);
>> +return -EEXIST;
>> +}
>>  }
>>  

mdev_put_parent(parent) missing before return.


>>  mdev = kzalloc(sizeof(*mdev), GFP_KERNEL);
>>  if (!mdev) {
>> -ret = -ENOMEM;
>> -goto create_err;
>> +mutex_unlock(&mdev_list_lock);
>> +return -ENOMEM;
>>  }
>>

mdev_put_parent(parent) missing here again.

Thanks,
Kirti


Re: [PATCH v2] vfio/mdev: Check globally for duplicate devices

2018-05-16 Thread Kirti Wankhede


On 5/16/2018 8:53 PM, Alex Williamson wrote:
> When we create an mdev device, we check for duplicates against the
> parent device and return -EEXIST if found, but the mdev device
> namespace is global since we'll link all devices from the bus.  We do
> catch this later in sysfs_do_create_link_sd() to return -EEXIST, but
> with it comes a kernel warning and stack trace for trying to create
> duplicate sysfs links, which makes it an undesirable response.
> 
> Therefore we should really be looking for duplicates across all mdev
> parent devices, or as implemented here, against our mdev device list.
> 
> Notably, mdev_parent.lock really only seems to be serializing device
> creation and removal per parent.  I'm not sure if this is necessary,
> mdev vendor drivers could easily provide this serialization if it
> is required, but a side-effect of holding the mdev_list_lock to
> protect the namespace is actually greater serialization across the
> create and remove paths,

Exactly for this reason a more granular lock is used; that's why
mdev_parent.lock was introduced. Consider the maximum supported config for
vGPU: 8 GPUs in a system with 16 mdev devices on each GPU, i.e. 128 mdev
devices to be created in a system (and this count will increase in
future); with this change all mdev device creation/removal gets
serialized.
I agree with your concern that duplicates across parents are not caught
earlier.

> so mdev_parent.lock is removed.  If we can
> show that vendor drivers handle the create/remove paths themselves,
> perhaps we can refine the locking granularity.
> 

Here the lock is not for the vendor driver's create/remove routines; it's
about mdev device creation and device registration, which is a common code
path and so part of the mdev core module.


> Reviewed-by: Cornelia Huck 
> Signed-off-by: Alex Williamson 
> ---
> 
> v2: Remove unnecessary ret init per Cornelia's review
> 
>  drivers/vfio/mdev/mdev_core.c|   77 
> +-
>  drivers/vfio/mdev/mdev_private.h |1 
>  2 files changed, 19 insertions(+), 59 deletions(-)
> 
> diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
> index 126991046eb7..aaab3ef93e1c 100644
> --- a/drivers/vfio/mdev/mdev_core.c
> +++ b/drivers/vfio/mdev/mdev_core.c
> @@ -66,34 +66,6 @@ uuid_le mdev_uuid(struct mdev_device *mdev)
>  }
>  EXPORT_SYMBOL(mdev_uuid);
>  
> -static int _find_mdev_device(struct device *dev, void *data)
> -{
> - struct mdev_device *mdev;
> -
> - if (!dev_is_mdev(dev))
> - return 0;
> -
> - mdev = to_mdev_device(dev);
> -
> - if (uuid_le_cmp(mdev->uuid, *(uuid_le *)data) == 0)
> - return 1;
> -
> - return 0;
> -}
> -
> -static bool mdev_device_exist(struct mdev_parent *parent, uuid_le uuid)
> -{
> - struct device *dev;
> -
> - dev = device_find_child(parent->dev, &uuid, _find_mdev_device);
> - if (dev) {
> - put_device(dev);
> - return true;
> - }
> -
> - return false;
> -}
> -
>  /* Should be called holding parent_list_lock */
>  static struct mdev_parent *__find_parent_device(struct device *dev)
>  {
> @@ -221,7 +193,6 @@ int mdev_register_device(struct device *dev, const struct 
> mdev_parent_ops *ops)
>   }
>  
>   kref_init(&parent->ref);
> - mutex_init(&parent->lock);
>  
>   parent->dev = dev;
>   parent->ops = ops;
> @@ -304,7 +275,7 @@ static void mdev_device_release(struct device *dev)
>  int mdev_device_create(struct kobject *kobj, struct device *dev, uuid_le 
> uuid)
>  {
>   int ret;
> - struct mdev_device *mdev;
> + struct mdev_device *mdev, *tmp;
>   struct mdev_parent *parent;
>   struct mdev_type *type = to_mdev_type(kobj);
>  
> @@ -312,12 +283,14 @@ int mdev_device_create(struct kobject *kobj, struct 
> device *dev, uuid_le uuid)
>   if (!parent)
>   return -EINVAL;
>  
> - mutex_lock(&parent->lock);
> + mutex_lock(&mdev_list_lock);
>  
>   /* Check for duplicate */
> - if (mdev_device_exist(parent, uuid)) {
> - ret = -EEXIST;
> - goto create_err;
> + list_for_each_entry(tmp, &mdev_list, next) {
> + if (!uuid_le_cmp(tmp->uuid, uuid)) {
> + ret = -EEXIST;
> + goto create_err;
> + }
>   }
>
Is it possible to hold mdev_list_lock for as small a portion as possible?
For example, by adding the mdev device to mdev_list just after:
memcpy(&mdev->uuid, &uuid, sizeof(uuid_le));
and then unlocking mdev_list_lock; but then all later error cases need to
be handled properly in this function.

Thanks,
Kirti


>   mdev = kzalloc(sizeof(*mdev), GFP_KERNEL);
> @@ -354,9 +327,6 @@ int mdev_device_create(struct kobject *kobj, struct 
> device *dev, uuid_le uuid)
>   mdev->type_kobj = kobj;
>   dev_dbg(&mdev->dev, "MDEV: created\n");
>  
> - mutex_unlock(&parent->lock);
> -
> - mutex_lock(&mdev_list_lock);
>   

Re: [PATCH] vfio-mdev/samples: change RDI interrupt condition

2018-03-08 Thread Kirti Wankhede

Thanks for fixing it.
Patch looks good to me.
+Alex to pull this patch.

Reviewed-by: Kirti Wankhede <kwankh...@nvidia.com>

Thanks,
Kirti

On 3/8/2018 12:38 PM, Shunyong Yang wrote:
> When FIFO mode is enabled, the receive data available interrupt
> (UART_IIR_RDI in code) should be triggered when the number of data
> in FIFO is equal or larger than interrupt trigger level.
> 
> This patch changes the trigger level check to ensure multiple bytes
> received from upper layer can trigger RDI interrupt correctly.
> 
> Cc: Joey Zheng <yu.zh...@hxt-semitech.com>
> Signed-off-by: Shunyong Yang <shunyong.y...@hxt-semitech.com>
> ---
>  samples/vfio-mdev/mtty.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/samples/vfio-mdev/mtty.c b/samples/vfio-mdev/mtty.c
> index 09f255bdf3ac..7abb79d8313d 100644
> --- a/samples/vfio-mdev/mtty.c
> +++ b/samples/vfio-mdev/mtty.c
> @@ -534,7 +534,7 @@ static void handle_bar_read(unsigned int index, struct 
> mdev_state *mdev_state,
>  
>   /* Interrupt priority 2: Fifo trigger level reached */
>   if ((ier & UART_IER_RDI) &&
> - (mdev_state->s[index].rxtx.count ==
> + (mdev_state->s[index].rxtx.count >=
> mdev_state->s[index].intr_trigger_level))
>   *buf |= UART_IIR_RDI;
>  
> 


Re: [PATCH] vfio: mdev: make a couple of functions and structure vfio_mdev_driver static

2017-12-26 Thread Kirti Wankhede


On 12/22/2017 4:42 AM, Xiongwei Song wrote:
> The functions vfio_mdev_probe, vfio_mdev_remove and the structure
> vfio_mdev_driver are only used in this file, so make them static.
> 
> Clean up sparse warnings:
> drivers/vfio/mdev/vfio_mdev.c:114:5: warning: no previous prototype
> for 'vfio_mdev_probe' [-Wmissing-prototypes]
> drivers/vfio/mdev/vfio_mdev.c:121:6: warning: no previous prototype
> for 'vfio_mdev_remove' [-Wmissing-prototypes]
> 
> Signed-off-by: Xiongwei Song <sxwj...@gmail.com>
> ---
>  drivers/vfio/mdev/vfio_mdev.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/vfio/mdev/vfio_mdev.c b/drivers/vfio/mdev/vfio_mdev.c
> index fa848a701b8b..d230620fe02d 100644
> --- a/drivers/vfio/mdev/vfio_mdev.c
> +++ b/drivers/vfio/mdev/vfio_mdev.c
> @@ -111,19 +111,19 @@ static const struct vfio_device_ops vfio_mdev_dev_ops = 
> {
>   .mmap   = vfio_mdev_mmap,
>  };
>  
> -int vfio_mdev_probe(struct device *dev)
> +static int vfio_mdev_probe(struct device *dev)
>  {
>   struct mdev_device *mdev = to_mdev_device(dev);
>  
>   return vfio_add_group_dev(dev, &vfio_mdev_dev_ops, mdev);
>  }
>  
> -void vfio_mdev_remove(struct device *dev)
> +static void vfio_mdev_remove(struct device *dev)
>  {
>   vfio_del_group_dev(dev);
>  }
>  
> -struct mdev_driver vfio_mdev_driver = {
> +static struct mdev_driver vfio_mdev_driver = {
>   .name   = "vfio_mdev",
>   .probe  = vfio_mdev_probe,
>   .remove = vfio_mdev_remove,
> 

Reviewed-by: Kirti Wankhede <kwankh...@nvidia.com>


Re: [PATCH] vfio: Simplify capability helper

2017-12-13 Thread Kirti Wankhede


On 12/13/2017 12:31 PM, Zhenyu Wang wrote:
> On 2017.12.13 12:13:34 +1100, Alexey Kardashevskiy wrote:
>> On 13/12/17 06:59, Alex Williamson wrote:
>>> The vfio_info_add_capability() helper requires the caller to pass a
>>> capability ID, which it then uses to fill in header fields, assuming
>>> hard coded versions.  This makes for an awkward and rigid interface.
>>> The only thing we want this helper to do is allocate sufficient
>>> space in the caps buffer and chain this capability into the list.
>>> Reduce it to that simple task.
>>>
>>> Signed-off-by: Alex Williamson <alex.william...@redhat.com>
>>
>>
>> Makes more sense now, thanks. I'll repost mine on top of this.
>>
>>
>> Reviewed-by: Alexey Kardashevskiy <a...@ozlabs.ru>
>>
>>
> 
> Looks good for KVMGT part.
> 
> Acked-by: Zhenyu Wang <zhen...@linux.intel.com>
> 

Looks good to me too.

Reviewed-by: Kirti Wankhede <kwankh...@nvidia.com>

Thanks,
Kirti


>> Below one observation, unrelated to this patch.
>>
>>> ---
>>>  drivers/gpu/drm/i915/gvt/kvmgt.c |   15 +++
>>>  drivers/vfio/pci/vfio_pci.c  |   14 ++
>>>  drivers/vfio/vfio.c  |   52 
>>> +++---
>>>  include/linux/vfio.h |3 +-
>>>  4 files changed, 24 insertions(+), 60 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c 
>>> b/drivers/gpu/drm/i915/gvt/kvmgt.c
>>> index 96060920a6fe..0a7d084da1a2 100644
>>> --- a/drivers/gpu/drm/i915/gvt/kvmgt.c
>>> +++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
>>> @@ -1012,6 +1012,8 @@ static long intel_vgpu_ioctl(struct mdev_device 
>>> *mdev, unsigned int cmd,
>>> if (!sparse)
>>> return -ENOMEM;
>>>  
>>> +   sparse->header.id = VFIO_REGION_INFO_CAP_SPARSE_MMAP;
>>> +   sparse->header.version = 1;
>>> sparse->nr_areas = nr_areas;
>>> cap_type_id = VFIO_REGION_INFO_CAP_SPARSE_MMAP;
>>
>>
>> @cap_type_id is initialized in just one of many cases of switch
>> (info.index) and after the entire switch, there is switch (cap_type_id). I
>> wonder why compiler missed "potentially uninitialized variable here,
>> although there is no bug - @cap_type_id is in sync with @spapse. It would
>> make it cleaner imho just to have vfio_info_add_capability() next to the
>> header initialization.
>>
> 
> yeah, we could clean that up, thanks for pointing out.
> 
>>
>>
>>> sparse->areas[0].offset =
>>> @@ -1033,7 +1035,9 @@ static long intel_vgpu_ioctl(struct mdev_device 
>>> *mdev, unsigned int cmd,
>>> break;
>>> default:
>>> {
>>> -   struct vfio_region_info_cap_type cap_type;
>>> +   struct vfio_region_info_cap_type cap_type = {
>>> +   .header.id = VFIO_REGION_INFO_CAP_TYPE,
>>> +   .header.version = 1 };
>>>  
>>> if (info.index >= VFIO_PCI_NUM_REGIONS +
>>> vgpu->vdev.num_regions)
>>> @@ -1050,8 +1054,8 @@ static long intel_vgpu_ioctl(struct mdev_device 
>>> *mdev, unsigned int cmd,
>>> cap_type.subtype = vgpu->vdev.region[i].subtype;
>>>  
>>> ret = vfio_info_add_capability(&caps,
>>> -   VFIO_REGION_INFO_CAP_TYPE,
>>> -   &cap_type);
>>> +   &cap_type.header,
>>> +   sizeof(cap_type));
>>> if (ret)
>>> return ret;
>>> }
>>> @@ -1061,8 +1065,9 @@ static long intel_vgpu_ioctl(struct mdev_device 
>>> *mdev, unsigned int cmd,
>>> switch (cap_type_id) {
>>> case VFIO_REGION_INFO_CAP_SPARSE_MMAP:
>>> ret = vfio_info_add_capability(&caps,
>>> -   VFIO_REGION_INFO_CAP_SPARSE_MMAP,
>>> -   sparse);
>>> +   &sparse->header, sizeof(*sparse) +
>>> +   (sparse->nr_areas *

Re: [PATCH v18 5/6] vfio: ABI for mdev display dma-buf operation

2017-11-18 Thread Kirti Wankhede
Extremely sorry for the delay.
This works for VFIO_GFX_PLANE_TYPE_REGION. Tested with local changes.

Reviewed-by: Kirti Wankhede <kwankh...@nvidia.com>

Thanks,
Kirti

On 11/18/2017 9:00 PM, Alex Williamson wrote:
> 
> Kirti?
> 
> On Wed, 15 Nov 2017 21:11:42 -0700
> Alex Williamson <alex.william...@redhat.com> wrote:
> 
>> On Thu, 16 Nov 2017 11:21:56 +0800
>> Zhenyu Wang <zhen...@linux.intel.com> wrote:
>>
>>> On 2017.11.15 11:48:42 -0700, Alex Williamson wrote:  
>>>> On Wed, 15 Nov 2017 17:11:54 +0800
>>>> Tina Zhang <tina.zh...@intel.com> wrote:
>>>> 
>>>>> Add VFIO_DEVICE_QUERY_GFX_PLANE ioctl command to let user query and get
>>>>> a plane and its information. So far, two types of buffers are supported:
>>>>> buffers based on dma-buf and buffers based on region.
>>>>>
>>>>> This ioctl can be invoked with:
>>>>> 1) Either DMABUF or REGION flag. Vendor driver returns a plane_info
>>>>> successfully only when the specific kind of buffer is supported.
>>>>> 2) Flag PROBE. And at the same time either DMABUF or REGION must be set,
>>>>> so that vendor driver returns success only when the specific kind of
>>>>> buffer is supported.
>>>>>
>>>>> Add VFIO_DEVICE_GET_GFX_DMABUF ioctl command to let user get a specific
>>>>> dma-buf fd of an exposed MDEV buffer provided by dmabuf_id which was
>>>>> returned in VFIO_DEVICE_QUERY_GFX_PLANE ioctl command.
>>>>>
>>>>> The life cycle of an exposed MDEV buffer is handled by userspace and
>>>>> tracked by kernel space. The returned dmabuf_id in struct vfio_device_
>>>>> query_gfx_plane can be a new id of a new exposed buffer or an old id of
>>>>> a re-exported buffer. Host user can check the value of dmabuf_id to see
>>>>> if it needs to create new resources according to the new exposed buffer
>>>>> or just re-use the existing resource related to the old buffer.
>>>>>
>>>>> v18:
>>>>> - update comments for VFIO_DEVICE_GET_GFX_DMABUF. (Alex)
>>>>>
>>>>> v17:
>>>>> - modify VFIO_DEVICE_GET_GFX_DMABUF interface. (Alex)
>>>>>
>>>>> v16:
>>>>> - add x_hot and y_hot fields. (Gerd)
>>>>> - add comments for VFIO_DEVICE_GET_GFX_DMABUF. (Alex)
>>>>> - rebase to 4.14.0-rc6.
>>>>>
>>>>> v15:
>>>>> - add a ioctl to get a dmabuf for a given dmabuf id. (Gerd)
>>>>>
>>>>> v14:
>>>>> - add PROBE, DMABUF and REGION flags. (Alex)
>>>>>
>>>>> v12:
>>>>> - add drm_format_mod back. (Gerd and Zhenyu)
>>>>> - add region_index. (Gerd)
>>>>>
>>>>> v11:
>>>>> - rename plane_type to drm_plane_type. (Gerd)
>>>>> - move fields of vfio_device_query_gfx_plane to 
>>>>> vfio_device_gfx_plane_info.
>>>>>   (Gerd)
>>>>> - remove drm_format_mod, start fields. (Daniel)
>>>>> - remove plane_id.
>>>>>
>>>>> v10:
>>>>> - refine the ABI API VFIO_DEVICE_QUERY_GFX_PLANE. (Alex) (Gerd)
>>>>>
>>>>> v3:
>>>>> - add a field gvt_plane_info in the drm_i915_gem_obj structure to save
>>>>>   the decoded plane information to avoid look up while need the plane
>>>>>   info. (Gerd)
>>>>>
>>>>> Signed-off-by: Tina Zhang <tina.zh...@intel.com>
>>>>> Cc: Gerd Hoffmann <kra...@redhat.com>
>>>>> Cc: Alex Williamson <alex.william...@redhat.com>
>>>>> Cc: Daniel Vetter <daniel.vet...@ffwll.ch>
>>>>> ---
>>>>>  include/uapi/linux/vfio.h | 62 
>>>>> +++
>>>>>  1 file changed, 62 insertions(+)
>>>>>
>>>>> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
>>>>> index ae46105..5c1cca2 100644
>>>>> --- a/include/uapi/linux/vfio.h
>>>>> +++ b/include/uapi/linux/vfio.h
>>>>> @@ -502,6 +502,68 @@ struct vfio_pci_hot_reset {
>>>>>  
>>>>>  #define VFIO_DEVICE_PCI_HOT_RESET_IO(VFIO_TYPE, VFIO_BASE + 13)
>>>>>  
>>>>> +/**
>>>>> + * VFIO_DEVICE_QUERY_GFX_PLANE - _IOW(VFIO

Re: [RFC]Add new mdev interface for QoS

2017-08-08 Thread Kirti Wankhede


On 8/7/2017 1:11 PM, Gao, Ping A wrote:
> 
> On 2017/8/4 5:11, Alex Williamson wrote:
>> On Thu, 3 Aug 2017 20:26:14 +0800
>> "Gao, Ping A" <ping.a@intel.com> wrote:
>>
>>> On 2017/8/3 0:58, Alex Williamson wrote:
>>>> On Wed, 2 Aug 2017 21:16:28 +0530
>>>> Kirti Wankhede <kwankh...@nvidia.com> wrote:
>>>>  
>>>>> On 8/2/2017 6:29 PM, Gao, Ping A wrote:  
>>>>>> On 2017/8/2 18:19, Kirti Wankhede wrote:
>>>>>>> On 8/2/2017 3:56 AM, Alex Williamson wrote:
>>>>>>>> On Tue, 1 Aug 2017 13:54:27 +0800
>>>>>>>> "Gao, Ping A" <ping.a@intel.com> wrote:
>>>>>>>>
>>>>>>>>> On 2017/7/28 0:00, Gao, Ping A wrote:
>>>>>>>>>> On 2017/7/27 0:43, Alex Williamson wrote:  
>>>>>>>>>>> [cc +libvir-list]
>>>>>>>>>>>
>>>>>>>>>>> On Wed, 26 Jul 2017 21:16:59 +0800
>>>>>>>>>>> "Gao, Ping A" <ping.a@intel.com> wrote:
>>>>>>>>>>>  
>>>>>>>>>>>> The vfio-mdev provide the capability to let different guest share 
>>>>>>>>>>>> the
>>>>>>>>>>>> same physical device through mediate sharing, as result it bring a
>>>>>>>>>>>> requirement about how to control the device sharing, we need a QoS
>>>>>>>>>>>> related interface for mdev to management virtual device resource.
>>>>>>>>>>>>
>>>>>>>>>>>> E.g. In practical use, vGPUs assigned to different guests almost 
>>>>>>>>>>>> has
>>>>>>>>>>>> different performance requirements, some guests may need higher 
>>>>>>>>>>>> priority
>>>>>>>>>>>> for real time usage, some other may need more portion of the GPU
>>>>>>>>>>>> resource to get higher 3D performance, corresponding we can define 
>>>>>>>>>>>> some
>>>>>>>>>>>> interfaces like weight/cap for overall budget control, priority for
>>>>>>>>>>>> single submission control.
>>>>>>>>>>>>
>>>>>>>>>>>> So I suggest to add some common attributes which are vendor 
>>>>>>>>>>>> agnostic in
>>>>>>>>>>>> mdev core sysfs for QoS purpose.  
>>>>>>>>>>> I think what you're asking for is just some standardization of a QoS
>>>>>>>>>>> attribute_group which a vendor can optionally include within the
>>>>>>>>>>> existing mdev_parent_ops.mdev_attr_groups.  The mdev core will
>>>>>>>>>>> transparently enable this, but it really only provides the standard,
>>>>>>>>>>> all of the support code is left for the vendor.  I'm fine with that,
>>>>>>>>>>> but of course the trouble with and sort of standardization is 
>>>>>>>>>>> arriving
>>>>>>>>>>> at an agreed upon standard.  Are there QoS knobs that are generic
>>>>>>>>>>> across any mdev device type?  Are there others that are more 
>>>>>>>>>>> specific
>>>>>>>>>>> to vGPU?  Are there existing examples of this that we can steal 
>>>>>>>>>>> their
>>>>>>>>>>> specification?  
>>>>>>>>>> Yes, you are right, standardization QoS knobs are exactly what I 
>>>>>>>>>> wanted.
>>>>>>>>>> Only when it become a part of the mdev framework and libvirt, then 
>>>>>>>>>> QoS
>>>>>>>>>> such critical feature can be leveraged by cloud usage. HW vendor only
>>>>>>>>>> need to focus on the implementation of the corresponding QoS 
>>>>>>>>>> algorithm
>>>>>>>>>> in their back-end driver.
>>>>>>>>>>
>>>>>>>>>> Vfio-mdev framework provide the 

Re: [RFC]Add new mdev interface for QoS

2017-08-02 Thread Kirti Wankhede


On 8/2/2017 6:29 PM, Gao, Ping A wrote:
> 
> On 2017/8/2 18:19, Kirti Wankhede wrote:
>>
>> On 8/2/2017 3:56 AM, Alex Williamson wrote:
>>> On Tue, 1 Aug 2017 13:54:27 +0800
>>> "Gao, Ping A" <ping.a@intel.com> wrote:
>>>
>>>> On 2017/7/28 0:00, Gao, Ping A wrote:
>>>>> On 2017/7/27 0:43, Alex Williamson wrote:  
>>>>>> [cc +libvir-list]
>>>>>>
>>>>>> On Wed, 26 Jul 2017 21:16:59 +0800
>>>>>> "Gao, Ping A" <ping.a@intel.com> wrote:
>>>>>>  
>>>>>>> The vfio-mdev provide the capability to let different guest share the
>>>>>>> same physical device through mediate sharing, as result it bring a
>>>>>>> requirement about how to control the device sharing, we need a QoS
>>>>>>> related interface for mdev to management virtual device resource.
>>>>>>>
>>>>>>> E.g. In practical use, vGPUs assigned to different guests almost has
>>>>>>> different performance requirements, some guests may need higher priority
>>>>>>> for real time usage, some other may need more portion of the GPU
>>>>>>> resource to get higher 3D performance, corresponding we can define some
>>>>>>> interfaces like weight/cap for overall budget control, priority for
>>>>>>> single submission control.
>>>>>>>
>>>>>>> So I suggest to add some common attributes which are vendor agnostic in
>>>>>>> mdev core sysfs for QoS purpose.  
>>>>>> I think what you're asking for is just some standardization of a QoS
>>>>>> attribute_group which a vendor can optionally include within the
>>>>>> existing mdev_parent_ops.mdev_attr_groups.  The mdev core will
>>>>>> transparently enable this, but it really only provides the standard,
>>>>>> all of the support code is left for the vendor.  I'm fine with that,
>>>>>> but of course the trouble with and sort of standardization is arriving
>>>>>> at an agreed upon standard.  Are there QoS knobs that are generic
>>>>>> across any mdev device type?  Are there others that are more specific
>>>>>> to vGPU?  Are there existing examples of this that we can steal their
>>>>>> specification?  
>>>>> Yes, you are right, standardization QoS knobs are exactly what I wanted.
>>>>> Only when it become a part of the mdev framework and libvirt, then QoS
>>>>> such critical feature can be leveraged by cloud usage. HW vendor only
>>>>> need to focus on the implementation of the corresponding QoS algorithm
>>>>> in their back-end driver.
>>>>>
>>>>> Vfio-mdev framework provide the capability to share the device that lack
>>>>> of HW virtualization support to guests, no matter the device type,
>>>>> mediated sharing actually is a time sharing multiplex method, from this
>>>>> point of view, QoS can be take as a generic way about how to control the
>>>>> time assignment for virtual mdev device that occupy HW. As result we can
>>>>> define QoS knob generic across any device type by this way. Even if HW
>>>>> has build in with some kind of QoS support, I think it's not a problem
>>>>> for back-end driver to convert mdev standard QoS definition to their
>>>>> specification to reach the same performance expectation. Seems there are
>>>>> no examples for us to follow, we need define it from scratch.
>>>>>
>>>>> I proposal universal QoS control interfaces like below:
>>>>>
>>>>> Cap: The cap limits the maximum percentage of time a mdev device can own
>>>>> physical device. e.g. cap=60, means mdev device cannot take over 60% of
>>>>> total physical resource.
>>>>>
>>>>> Weight: The weight define proportional control of the mdev device
>>>>> resource between guests, it’s orthogonal with Cap, to target load
>>>>> balancing. E.g. if guest 1 should take double mdev device resource
>>>>> compare with guest 2, need set weight ratio to 2:1.
>>>>>
>>>>> Priority: The guest who has higher priority will get execution first,
>>>>> target to some real time usage and speeding interactive response.
>>>>>
>>>>> Above QoS 

Re: [RFC]Add new mdev interface for QoS

2017-08-02 Thread Kirti Wankhede


On 8/2/2017 3:56 AM, Alex Williamson wrote:
> On Tue, 1 Aug 2017 13:54:27 +0800
> "Gao, Ping A"  wrote:
> 
>> On 2017/7/28 0:00, Gao, Ping A wrote:
>>> On 2017/7/27 0:43, Alex Williamson wrote:  
 [cc +libvir-list]

 On Wed, 26 Jul 2017 21:16:59 +0800
 "Gao, Ping A"  wrote:
  
> The vfio-mdev provide the capability to let different guest share the
> same physical device through mediate sharing, as result it bring a
> requirement about how to control the device sharing, we need a QoS
> related interface for mdev to management virtual device resource.
>
> E.g. In practical use, vGPUs assigned to different guests almost has
> different performance requirements, some guests may need higher priority
> for real time usage, some other may need more portion of the GPU
> resource to get higher 3D performance, corresponding we can define some
> interfaces like weight/cap for overall budget control, priority for
> single submission control.
>
> So I suggest to add some common attributes which are vendor agnostic in
> mdev core sysfs for QoS purpose.  
 I think what you're asking for is just some standardization of a QoS
 attribute_group which a vendor can optionally include within the
 existing mdev_parent_ops.mdev_attr_groups.  The mdev core will
 transparently enable this, but it really only provides the standard,
 all of the support code is left for the vendor.  I'm fine with that,
 but of course the trouble with and sort of standardization is arriving
 at an agreed upon standard.  Are there QoS knobs that are generic
 across any mdev device type?  Are there others that are more specific
 to vGPU?  Are there existing examples of this that we can steal their
 specification?  
>>> Yes, you are right, standardization QoS knobs are exactly what I wanted.
>>> Only when it become a part of the mdev framework and libvirt, then QoS
>>> such critical feature can be leveraged by cloud usage. HW vendor only
>>> need to focus on the implementation of the corresponding QoS algorithm
>>> in their back-end driver.
>>>
>>> Vfio-mdev framework provide the capability to share the device that lack
>>> of HW virtualization support to guests, no matter the device type,
>>> mediated sharing actually is a time sharing multiplex method, from this
>>> point of view, QoS can be take as a generic way about how to control the
>>> time assignment for virtual mdev device that occupy HW. As result we can
>>> define QoS knob generic across any device type by this way. Even if HW
>>> has build in with some kind of QoS support, I think it's not a problem
>>> for back-end driver to convert mdev standard QoS definition to their
>>> specification to reach the same performance expectation. Seems there are
>>> no examples for us to follow, we need define it from scratch.
>>>
>>> I proposal universal QoS control interfaces like below:
>>>
>>> Cap: The cap limits the maximum percentage of time a mdev device can own
>>> physical device. e.g. cap=60, means mdev device cannot take over 60% of
>>> total physical resource.
>>>
>>> Weight: The weight define proportional control of the mdev device
>>> resource between guests, it’s orthogonal with Cap, to target load
>>> balancing. E.g. if guest 1 should take double mdev device resource
>>> compare with guest 2, need set weight ratio to 2:1.
>>>
>>> Priority: The guest who has higher priority will get execution first,
>>> target to some real time usage and speeding interactive response.
>>>
>>> Above QoS interfaces cover both overall budget control and single
>>> submission control. I will sent out detail design later once get aligned.  
>>
>> Hi Alex,
>> Any comments about the interface mentioned above?
> 
> Not really.
> 
> Kirti, are there any QoS knobs that would be interesting
> for NVIDIA devices?
> 

We have different types of vGPU for different QoS factors.

When mdev devices are created, its resources are allocated irrespective
of which VM/userspace app is going to use that mdev device. Any
parameter we add here should be tied to particular mdev device and not
to the guest/app that are going to use it. 'Cap' and 'Priority' are
along that line. All mdev device might not need/use these parameters,
these can be made optional interfaces.

In the above proposal, I'm not sure how 'Weight' would work for mdev
devices on same physical device.

In the above example, "if guest 1 should take double mdev device
resource compare with guest 2" but what if guest 2 never booted, how
will you calculate resources?

If libvirt/other toolstack decides to do smart allocation based on type
name without taking physical host device as input, guest 1 and guest 2
might get mdev devices created on different physical device. Then would
weightage matter here?

Thanks,
Kirti


> Implementing libvirt support at the same time might be an interesting
> exercise if we don't 

Re: [RFC]Add new mdev interface for QoS

2017-08-02 Thread Kirti Wankhede


On 8/2/2017 3:56 AM, Alex Williamson wrote:
> On Tue, 1 Aug 2017 13:54:27 +0800
> "Gao, Ping A"  wrote:
> 
>> On 2017/7/28 0:00, Gao, Ping A wrote:
>>> On 2017/7/27 0:43, Alex Williamson wrote:  
 [cc +libvir-list]

 On Wed, 26 Jul 2017 21:16:59 +0800
 "Gao, Ping A"  wrote:
  
> The vfio-mdev framework provides the capability to let different guests share
> the same physical device through mediated sharing. As a result, it brings a
> requirement for controlling device sharing: we need a QoS-related interface
> for mdev to manage virtual device resources.
>
> E.g. in practical use, vGPUs assigned to different guests often have
> different performance requirements: some guests may need higher priority
> for real-time usage, while others may need a larger portion of the GPU
> resource to get higher 3D performance. Correspondingly, we can define
> interfaces like weight/cap for overall budget control and priority for
> single-submission control.
>
> So I suggest adding some common, vendor-agnostic attributes to the mdev
> core sysfs for QoS purposes.  
 I think what you're asking for is just some standardization of a QoS
 attribute_group which a vendor can optionally include within the
 existing mdev_parent_ops.mdev_attr_groups.  The mdev core will
 transparently enable this, but it really only provides the standard,
 all of the support code is left for the vendor.  I'm fine with that,
 but of course the trouble with any sort of standardization is arriving
 at an agreed upon standard.  Are there QoS knobs that are generic
 across any mdev device type?  Are there others that are more specific
 to vGPU?  Are there existing examples of this that we can steal their
 specification?  
>>> Yes, you are right, standardized QoS knobs are exactly what I wanted.
>>> Only once they become part of the mdev framework and libvirt can a
>>> critical feature like QoS be leveraged by cloud usage. A HW vendor only
>>> needs to focus on implementing the corresponding QoS algorithm in its
>>> back-end driver.
>>>
>>> The vfio-mdev framework provides the capability to share devices that
>>> lack HW virtualization support with guests. Regardless of the device
>>> type, mediated sharing is essentially a time-sharing multiplexing
>>> method; from this point of view, QoS can be taken as a generic way to
>>> control the time assigned to the virtual mdev devices that occupy the
>>> HW. As a result, we can define QoS knobs that are generic across any
>>> device type. Even if the HW has some kind of built-in QoS support, I
>>> think it's not a problem for the back-end driver to convert the mdev
>>> standard QoS definition to its own specification to reach the same
>>> performance expectation. There seem to be no examples for us to follow,
>>> so we need to define it from scratch.
>>>
>>> I propose universal QoS control interfaces like below:
>>>
>>> Cap: The cap limits the maximum percentage of time an mdev device can
>>> own the physical device. E.g. cap=60 means the mdev device cannot take
>>> more than 60% of the total physical resource.
>>>
>>> Weight: The weight defines proportional control of the mdev device
>>> resource between guests; it's orthogonal to Cap and targets load
>>> balancing. E.g. if guest 1 should take double the mdev device resource
>>> compared with guest 2, set the weight ratio to 2:1.
>>>
>>> Priority: The guest with higher priority gets execution first, targeting
>>> real-time usage and speeding up interactive response.
>>>
>>> The above QoS interfaces cover both overall budget control and single
>>> submission control. I will send out a detailed design later once we get
>>> aligned.  
>>
>> Hi Alex,
>> Any comments about the interface mentioned above?
> 
> Not really.
> 
> Kirti, are there any QoS knobs that would be interesting
> for NVIDIA devices?
> 

We have different types of vGPU for different QoS factors.

When mdev devices are created, their resources are allocated irrespective
of which VM/userspace app is going to use each mdev device. Any
parameter we add here should be tied to a particular mdev device and not
to the guest/app that is going to use it. 'Cap' and 'Priority' are
along that line. Not all mdev devices might need/use these parameters,
so they can be made optional interfaces.

In the above proposal, I'm not sure how 'Weight' would work for mdev
devices on same physical device.

In the above example, "if guest 1 should take double mdev device
resource compare with guest 2", but what if guest 2 never boots? How
will you calculate resources?

If libvirt/another toolstack decides to do smart allocation based on type
name without taking the physical host device as input, guest 1 and guest 2
might get mdev devices created on different physical devices. Would the
weighting matter then?

Thanks,
Kirti


> Implementing libvirt support at the same time might be an interesting
> exercise if we don't have a second user in the kernel to validate
> 

Re: [PATCH v10] vfio: ABI for mdev display dma-buf operation

2017-07-19 Thread Kirti Wankhede


On 7/19/2017 11:55 AM, Gerd Hoffmann wrote:
> On Wed, 2017-07-19 at 00:16 +, Zhang, Tina wrote:
>>> -Original Message-
>>> From: Gerd Hoffmann [mailto:kra...@redhat.com]
>>> Sent: Monday, July 17, 2017 7:03 PM
>>> To: Kirti Wankhede <kwankh...@nvidia.com>; Zhang, Tina
>>> <tina.zh...@intel.com>; Tian, Kevin <kevin.t...@intel.com>; linux-
>>> ker...@vger.kernel.org; intel-...@lists.freedesktop.org;
>>> alex.william...@redhat.com; zhen...@linux.intel.com; chris@chris-
>>> wilson.co.uk; Lv, Zhiyuan <zhiyuan...@intel.com>; intel-gvt-
>>> d...@lists.freedesktop.org; Wang, Zhi A <zhi.a.w...@intel.com>
>>> Subject: Re: [PATCH v10] vfio: ABI for mdev display dma-buf
>>> operation
>>>
>>>   Hi,
>>>
>>>> No need of flag here. If vGPU driver is not loaded in the guest,
>>>> there
>>>> is no surface being managed by vGPU, in that case this size will
>>>> be
>>>> zero.
>>>
>>> Ok, we certainly have the same situation with intel.  When the
>>> guest driver is not
>>> loaded (yet) there is no valid surface.
>>>
>>> We should cleanly define what the ioctl should do in that case, so
>>> all drivers
>>> behave the same way.
>>>
>>> I'd suggest that all fields defining the surface (drm_format,
>>> width, height, stride,
>>> size) should be set to zero in that case.
>>
>> Yeah, it's reasonable. How about the return value? Currently, the
>> ioctl also returns "-ENODEV" in that situation.
> 
> I think it should not return an error.  Querying the plane parameters
> worked fine.
> 

Sounds good to me too.

Thanks,
Kirti


Re: [PATCH v10] vfio: ABI for mdev display dma-buf operation

2017-07-14 Thread Kirti Wankhede


On 7/14/2017 5:35 PM, Gerd Hoffmann wrote:
>   Hi,
> 
>> There could be only two planes, one DRM_PLANE_TYPE_PRIMARY and one
>> DRM_PLANE_TYPE_CURSOR.
>> Steps from gfx_update for region case would be:
>> - VFIO_DEVICE_QUERY_GFX_PLANE with plane_type =
>> DRM_PLANE_TYPE_PRIMARY
> 
>> - if vfio_device_gfx_plane_info.size > 0, read region for primary
>> surface and update console surface
> 
> Why?  I suspect you want to notify the caller whether the surface has
> been updated or not?  If so we should add an explicit flag or field for
> that.
> 

No need of a flag here. If the vGPU driver is not loaded in the guest, there
is no surface being managed by the vGPU; in that case this size will be zero.

Thanks,
Kirti


Re: [PATCH v10] vfio: ABI for mdev display dma-buf operation

2017-07-14 Thread Kirti Wankhede


On 7/14/2017 3:31 PM, Gerd Hoffmann wrote:
>   Hi,
> 
>> In case when VFIO region is used to provide surface to QEMU, plane_id
>> would be region index,
> 
> Then we should name it "region_index" not "plane_id".
> 
>> for example region 10 could be used for primary
>> surface and region 11 could be used for cursor surface. So in that
>> case,
>> mdev vendor driver should return plane_type and its corresponding
>> plane_id.
> 
> Hmm?  plane_type is the input (userspace -> kernel) parameter.
> 

Yes, that's right. Sorry for the confusion. The mdev vendor driver would
return which region to read.

Thanks,
Kirti

> cheers,
>   Gerd
> 


Re: [PATCH v10] vfio: ABI for mdev display dma-buf operation

2017-07-14 Thread Kirti Wankhede


On 7/14/2017 7:00 AM, Zhang, Tina wrote:
> 
> 
>> -Original Message-
>> From: intel-gvt-dev [mailto:intel-gvt-dev-boun...@lists.freedesktop.org] On
>> Behalf Of Kirti Wankhede
>> Sent: Wednesday, July 12, 2017 8:45 PM
>> To: Zhang, Tina <tina.zh...@intel.com>; Gerd Hoffmann
>> <kra...@redhat.com>; Tian, Kevin <kevin.t...@intel.com>; linux-
>> ker...@vger.kernel.org; intel-...@lists.freedesktop.org;
>> alex.william...@redhat.com; zhen...@linux.intel.com; chris@chris-
>> wilson.co.uk; Lv, Zhiyuan <zhiyuan...@intel.com>; intel-gvt-
>> d...@lists.freedesktop.org; Wang, Zhi A <zhi.a.w...@intel.com>
>> Subject: Re: [PATCH v10] vfio: ABI for mdev display dma-buf operation
>>
>>
>>
>> On 7/12/2017 1:10 PM, Daniel Vetter wrote:
>>> On Wed, Jul 12, 2017 at 02:31:40AM +, Zhang, Tina wrote:
>>>>
>>>>
>>>>> -Original Message-
>>>>> From: intel-gvt-dev
>>>>> [mailto:intel-gvt-dev-boun...@lists.freedesktop.org] On Behalf Of
>>>>> Daniel Vetter
>>>>> Sent: Tuesday, July 11, 2017 5:13 PM
>>>>> To: Gerd Hoffmann <kra...@redhat.com>
>>>>> Cc: Tian, Kevin <kevin.t...@intel.com>;
>>>>> linux-kernel@vger.kernel.org; intel- g...@lists.freedesktop.org;
>>>>> alex.william...@redhat.com; zhen...@linux.intel.com;
>>>>> ch...@chris-wilson.co.uk; Kirti Wankhede <kwankh...@nvidia.com>; Lv,
>>>>> Zhiyuan <zhiyuan...@intel.com>; dan...@ffwll.ch; Zhang, Tina
>>>>> <tina.zh...@intel.com>; intel-gvt- d...@lists.freedesktop.org; Wang,
>>>>> Zhi A <zhi.a.w...@intel.com>
>>>>> Subject: Re: [PATCH v10] vfio: ABI for mdev display dma-buf
>>>>> operation
>>>>>
>>>>> On Tue, Jul 11, 2017 at 08:14:08AM +0200, Gerd Hoffmann wrote:
>>>>>>   Hi,
>>>>>>
>>>>>>>> +struct vfio_device_query_gfx_plane {
>>>>>>>> +  __u32 argsz;
>>>>>>>> +  __u32 flags;
>>>>>>>> +  struct vfio_device_gfx_plane_info plane_info;
>>>>>>>> +  __u32 plane_type;
>>>>>>>> +  __s32 fd; /* dma-buf fd */
>>>>>>>> +  __u32 plane_id;
>>>>>>>> +};
>>>>>>>> +
>>>>>>>
>>>>>>> It would be better to have comment here about what are expected
>>>>>>> values for plane_type and plane_id.
>>>>>>
>>>>>> plane_type is DRM_PLANE_TYPE_*.
>>>>>>
>>>>>> yes, a comment saying so would be good, same for drm_format which
>>>>>> is DRM_FORMAT_*.  While looking at these two: renaming plane_type
>>>>>> to drm_plane_type (for consistency) is probably a good idea too.
>>>>>>
>>>>>> plane_id needs a specification.
>>>>>
>>>>> Why do you need plane_type? With universal planes the plane_id along
>>>>> is sufficient to identify a plane on a given drm device instance. I'd just
>> remove it.
>>>>> -Daniel
>>>> The plane_type here is to ask the mdev vendor driver to return the
>> information according to the value in the plane_type field. So, it's an input
>> field.
>>>> The values in the plane_type field are the same as drm_plane_type. And yes,
>> it's better to use drm_plane_type instead of plane_id.
>>>
>>> I have no idea what you mean here, I guess that just shows that
>>> discussing an ioctl struct without solid definitions of what field
>>> does what and why is not all that useful. What exactly it plane_id for then?
>>>
>>
>> plane type could be DRM_PLANE_TYPE_PRIMARY or
>> DRM_PLANE_TYPE_CURSOR.
>>
>> In case when VFIO region is used to provide surface to QEMU, plane_id would
>> be region index, for example region 10 could be used for primary surface and
>> region 11 could be used for cursor surface. So in that case, mdev vendor 
>> driver
>> should return plane_type and its corresponding plane_id.

> Thanks, Kirti, do you mean there will be multiple DRM_PLANE_TYPE_PRIMARY and 
> multiple DRM_PLANE_TYPE_CURSOR planes existing in the same time and region 
> usage needs to use plane_id to distinguish among them? Is it for the multiple 
> output or that's the typical way of region usage? Thanks.

There could be only two planes, one DRM_PLANE_TYPE_PRIMARY and one
DRM_PLANE_TYPE_CURSOR.
Steps from gfx_update for region case would be:
- VFIO_DEVICE_QUERY_GFX_PLANE with plane_type = DRM_PLANE_TYPE_PRIMARY
- if vfio_device_gfx_plane_info.size > 0, read region for primary
surface and update console surface
- VFIO_DEVICE_QUERY_GFX_PLANE with plane_type = DRM_PLANE_TYPE_CURSOR
- if vfio_device_gfx_plane_info.size > 0, read region for cursor surface
update cursor on surface.

Thanks,
Kirti


> 
> Tina
> 
>>
>> Thanks,
>> Kirti
>>
>>> This just confused me more ...
>>> -Daniel
>>>
>> ___
>> intel-gvt-dev mailing list
>> intel-gvt-...@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/intel-gvt-dev


Re: [PATCH v9 5/7] vfio: Define vfio based dma-buf operations

2017-07-12 Thread Kirti Wankhede
Hey Gerd,

Sorry, I missed this mail earlier.

On 6/21/2017 12:52 PM, Gerd Hoffmann wrote:
>   Hi,
> 
>> We don't support cursor for console vnc. Ideally console vnc should
>> be
>> used by admin for configuration or during maintenance, which refresh
>> primary surface at low refresh rate, 10 fps.
> 
> But you surely want a mouse pointer for the admin?
> You render it directly to the primary surface then I guess?
> 

If cursor surface is not provided, a dot for cursor is seen on the
primary surface, which is pretty much usable.

>> Right we need to know this at device initialization time for both
>> cases
>> to initialize VGACommonState structure for that device
> 
> Why do you need a VGACommonState?
> 

We need to create a GRAPHIC_CONSOLE for vGPU device and specify
GraphicHwOps so that from its .gfx_update callback, surface can be
queried and updated.

>> and also need
>> NONE to decide whether to init console vnc or not. We have a
>> mechanism
>> to disable console vnc path and we recommend to disable it for better
>> performance.
> 
> Hmm, maybe we should have an ioctl to configure the refresh rate, or an
> ioctl to allow qemu to ask for a refresh when needed?
> 

What is the default refresh rate of QEMU if VNC is connected?

Thanks,
Kirti

> qemu can throttle the display update rate, which for example happens in
> case no vnc client is connected.  qemu updates the display only once
> every few seconds then.
> 
> cheers,
>   Gerd
> 


Re: [PATCH v10] vfio: ABI for mdev display dma-buf operation

2017-07-12 Thread Kirti Wankhede


On 7/12/2017 1:10 PM, Daniel Vetter wrote:
> On Wed, Jul 12, 2017 at 02:31:40AM +, Zhang, Tina wrote:
>>
>>
>>> -Original Message-
>>> From: intel-gvt-dev [mailto:intel-gvt-dev-boun...@lists.freedesktop.org] On
>>> Behalf Of Daniel Vetter
>>> Sent: Tuesday, July 11, 2017 5:13 PM
>>> To: Gerd Hoffmann <kra...@redhat.com>
>>> Cc: Tian, Kevin <kevin.t...@intel.com>; linux-kernel@vger.kernel.org; intel-
>>> g...@lists.freedesktop.org; alex.william...@redhat.com;
>>> zhen...@linux.intel.com; ch...@chris-wilson.co.uk; Kirti Wankhede
>>> <kwankh...@nvidia.com>; Lv, Zhiyuan <zhiyuan...@intel.com>;
>>> dan...@ffwll.ch; Zhang, Tina <tina.zh...@intel.com>; intel-gvt-
>>> d...@lists.freedesktop.org; Wang, Zhi A <zhi.a.w...@intel.com>
>>> Subject: Re: [PATCH v10] vfio: ABI for mdev display dma-buf operation
>>>
>>> On Tue, Jul 11, 2017 at 08:14:08AM +0200, Gerd Hoffmann wrote:
>>>>   Hi,
>>>>
>>>>>> +struct vfio_device_query_gfx_plane {
>>>>>> +__u32 argsz;
>>>>>> +__u32 flags;
>>>>>> +struct vfio_device_gfx_plane_info plane_info;
>>>>>> +__u32 plane_type;
>>>>>> +__s32 fd; /* dma-buf fd */
>>>>>> +__u32 plane_id;
>>>>>> +};
>>>>>> +
>>>>>
>>>>> It would be better to have comment here about what are expected
>>>>> values for plane_type and plane_id.
>>>>
>>>> plane_type is DRM_PLANE_TYPE_*.
>>>>
>>>> yes, a comment saying so would be good, same for drm_format which is
>>>> DRM_FORMAT_*.  While looking at these two: renaming plane_type to
>>>> drm_plane_type (for consistency) is probably a good idea too.
>>>>
>>>> plane_id needs a specification.
>>>
>>> Why do you need plane_type? With universal planes the plane_id along is
>>> sufficient to identify a plane on a given drm device instance. I'd just 
>>> remove it.
>>> -Daniel
>> The plane_type here is to ask the mdev vendor driver to return the
>> information according to the value in the plane_type field. So, it's an input
>> field.
>> The values in the plane_type field are the same as drm_plane_type. And yes,
>> it's better to use drm_plane_type instead of plane_id.
> 
> I have no idea what you mean here, I guess that just shows that discussing
> an ioctl struct without solid definitions of what field does what and why
> is not all that useful. What exactly is plane_id for then?
> 

plane type could be DRM_PLANE_TYPE_PRIMARY or DRM_PLANE_TYPE_CURSOR.

In case when VFIO region is used to provide surface to QEMU, plane_id
would be region index, for example region 10 could be used for primary
surface and region 11 could be used for cursor surface. So in that case,
mdev vendor driver should return plane_type and its corresponding plane_id.

Thanks,
Kirti

> This just confused me more ...
> -Daniel
> 


Re: [PATCH] vfio: Remove unnecessary uses of vfio_container.group_lock

2017-07-11 Thread Kirti Wankhede
Sounds reasonable to me.

Thanks,
Kirti

On 7/8/2017 3:45 AM, Alex Williamson wrote:
> The original intent of vfio_container.group_lock is to protect
> vfio_container.group_list, however over time it's become a crutch to
> prevent changes in container composition any time we call into the
> iommu driver backend.  This introduces problems when we start to have
> more complex interactions, for example when a user's DMA unmap request
> triggers a notification to an mdev vendor driver, who responds by
> attempting to unpin mappings within that request, re-entering the
> iommu backend.  We incorrectly assume that the use of read-locks here
> allow for this nested locking behavior, but a poorly timed write-lock
> could in fact trigger a deadlock.
> 
> The current use of group_lock seems to fall into the trap of locking
> code, not data.  Correct that by removing uses of group_lock that are
> not directly related to group_list.  Note that the vfio type1 iommu
> backend has its own mutex, vfio_iommu.lock, which it uses to protect
> itself for each of these interfaces anyway.  The group_lock appears to
> be a redundancy for these interfaces and type1 even goes so far as to
> release its mutex to allow for exactly the re-entrant code path above.
> 
> Reported-by: Chuanxiao Dong 
> Signed-off-by: Alex Williamson 
> ---
> 
> Alexey, does the SPAPR/TCE iommu backend have any dependencies on this
> lock?  If so, let's create a lock in the spapr_tce backend like we
> have in type1 to handle it.  I believe the ioctl passthrough is the
> only interface that can reach spapr_tce.  Thanks,
> 
> Alex
> 
>  drivers/vfio/vfio.c |   38 --
>  1 file changed, 38 deletions(-)
> 
> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> index 7597a377eb4e..330d50582f40 100644
> --- a/drivers/vfio/vfio.c
> +++ b/drivers/vfio/vfio.c
> @@ -1175,15 +1175,11 @@ static long vfio_fops_unl_ioctl(struct file *filep,
>   ret = vfio_ioctl_set_iommu(container, arg);
>   break;
>   default:
> - down_read(&container->group_lock);
> -
>   driver = container->iommu_driver;
>   data = container->iommu_data;
>  
>   if (driver) /* passthrough all unrecognized ioctls */
>   ret = driver->ops->ioctl(data, cmd, arg);
> -
> - up_read(&container->group_lock);
>   }
>  
>   return ret;
> @@ -1237,15 +1233,11 @@ static ssize_t vfio_fops_read(struct file *filep, 
> char __user *buf,
>   struct vfio_iommu_driver *driver;
>   ssize_t ret = -EINVAL;
>  
> - down_read(&container->group_lock);
> -
>   driver = container->iommu_driver;
>   if (likely(driver && driver->ops->read))
>   ret = driver->ops->read(container->iommu_data,
>   buf, count, ppos);
>  
> - up_read(&container->group_lock);
> -
>   return ret;
>  }
>  
> @@ -1256,15 +1248,11 @@ static ssize_t vfio_fops_write(struct file *filep, 
> const char __user *buf,
>   struct vfio_iommu_driver *driver;
>   ssize_t ret = -EINVAL;
>  
> - down_read(&container->group_lock);
> -
>   driver = container->iommu_driver;
>   if (likely(driver && driver->ops->write))
>   ret = driver->ops->write(container->iommu_data,
>buf, count, ppos);
>  
> - up_read(&container->group_lock);
> -
>   return ret;
>  }
>  
> @@ -1274,14 +1262,10 @@ static int vfio_fops_mmap(struct file *filep, struct 
> vm_area_struct *vma)
>   struct vfio_iommu_driver *driver;
>   int ret = -EINVAL;
>  
> - down_read(&container->group_lock);
> -
>   driver = container->iommu_driver;
>   if (likely(driver && driver->ops->mmap))
>   ret = driver->ops->mmap(container->iommu_data, vma);
>  
> - up_read(&container->group_lock);
> -
>   return ret;
>  }
>  
> @@ -1993,8 +1977,6 @@ int vfio_pin_pages(struct device *dev, unsigned long 
> *user_pfn, int npage,
>   goto err_pin_pages;
>  
>   container = group->container;
> - down_read(&container->group_lock);
> -
>   driver = container->iommu_driver;
>   if (likely(driver && driver->ops->pin_pages))
>   ret = driver->ops->pin_pages(container->iommu_data, user_pfn,
> @@ -2002,7 +1984,6 @@ int vfio_pin_pages(struct device *dev, unsigned long 
> *user_pfn, int npage,
>   else
>   ret = -ENOTTY;
>  
> - up_read(&container->group_lock);
>   vfio_group_try_dissolve_container(group);
>  
>  err_pin_pages:
> @@ -2042,8 +2023,6 @@ int vfio_unpin_pages(struct device *dev, unsigned long 
> *user_pfn, int npage)
>   goto err_unpin_pages;
>  
>   container = group->container;
> - down_read(&container->group_lock);
> -
>   driver = container->iommu_driver;
>   if (likely(driver && driver->ops->unpin_pages))
>   ret = driver->ops->unpin_pages(container->iommu_data, user_pfn,
> @@ -2051,7 +2030,6 @@ int vfio_unpin_pages(struct device *dev, 

Re: [PATCH v10] vfio: ABI for mdev display dma-buf operation

2017-07-06 Thread Kirti Wankhede


On 7/6/2017 3:59 AM, Tina Zhang wrote:
> Add VFIO_DEVICE_QUERY_GFX_PLANE ioctl command to let user mode query and
> get the plane and its related information.
> 
> The dma-buf's life cycle is handled by user mode and tracked by kernel.
> The returned fd in struct vfio_device_query_gfx_plane can be a new
> fd or an old fd of a re-exported dma-buf. Host user mode can check the
> value of fd to see whether it needs to create a new resource for the
> new fd or just use the existing resource related to the old fd.
> 
> Signed-off-by: Tina Zhang 
> 
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index ae46105..c92bc69 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -502,6 +502,36 @@ struct vfio_pci_hot_reset {
>  
>  #define VFIO_DEVICE_PCI_HOT_RESET_IO(VFIO_TYPE, VFIO_BASE + 13)
>  
> +/**
> + * VFIO_DEVICE_QUERY_GFX_PLANE - _IOW(VFIO_TYPE, VFIO_BASE + 14,
> + *   struct vfio_device_query_gfx_plane)
> + * Return: 0 on success, -errno on failure.
> + */
> +
> +struct vfio_device_gfx_plane_info {
> + __u64 start;
> + __u64 drm_format_mod;
> + __u32 drm_format;
> + __u32 width;
> + __u32 height;
> + __u32 stride;
> + __u32 size;
> + __u32 x_pos;
> + __u32 y_pos;
> +};
> +

Above structure looks good to me.

> +struct vfio_device_query_gfx_plane {
> + __u32 argsz;
> + __u32 flags;
> + struct vfio_device_gfx_plane_info plane_info;
> + __u32 plane_type;
> + __s32 fd; /* dma-buf fd */
> + __u32 plane_id;
> +};
> +

It would be better to have a comment here about the expected values
for plane_type and plane_id.

Thanks,
Kirti

> +#define VFIO_DEVICE_QUERY_GFX_PLANE _IO(VFIO_TYPE, VFIO_BASE + 14)
> +
> +
>  /*  API for Type1 VFIO IOMMU  */
>  
>  /**
> 


Re: [Intel-gfx] [PATCH v9 5/7] vfio: Define vfio based dma-buf operations

2017-06-20 Thread Kirti Wankhede


On 6/20/2017 8:30 PM, Alex Williamson wrote:
> On Tue, 20 Jun 2017 12:57:36 +0200
> Gerd Hoffmann  wrote:
> 
>> On Tue, 2017-06-20 at 08:41 +, Zhang, Tina wrote:
>>> Hi,
>>>
>>> Thanks for all the comments. Here are the summaries:
>>>
>>> 1. Modify the structures to make it more general.
>>> struct vfio_device_gfx_plane_info {
>>> __u64 start;
>>> __u64 drm_format_mod;
>>> __u32 drm_format;
>>> __u32 width;
>>> __u32 height;
>>> __u32 stride;
>>> __u32 size;
>>> __u32 x_pos;
>>> __u32 y_pos;
>>> __u32 generation;
>>> };  
>>
>> Looks good to me.
>>
>>> struct vfio_device_query_gfx_plane {
>>> __u32 argsz;
>>> __u32 flags;
>>> #define VFIO_GFX_PLANE_FLAGS_REGION_ID  (1 << 0)
>>> #define VFIO_GFX_PLANE_FLAGS_PLANE_ID   (1 << 1)
>>> struct vfio_device_gfx_plane_info plane_info;
>>> __u32 id; 
>>> };  
>>
>> I'm not convinced the flags are a great idea.  Whether dmabufs or a
>> region is used is a static property of the device, not of each
>> individual plane.
>>
>>
>> I think we should have this for userspace to figure:
>>
>> enum vfio_device_gfx_type {
>> VFIO_DEVICE_GFX_NONE,
>> VFIO_DEVICE_GFX_DMABUF,
>> VFIO_DEVICE_GFX_REGION,
>> };
>>
>> struct vfio_device_gfx_query_caps {
>> __u32 argsz;
>> __u32 flags;
>> enum vfio_device_gfx_type;
>> };
> 
> We already have VFIO_DEVICE_GET_INFO which returns:
> 
> struct vfio_device_info {
> __u32   argsz;
> __u32   flags;
> #define VFIO_DEVICE_FLAGS_RESET (1 << 0)/* Device supports reset */
> #define VFIO_DEVICE_FLAGS_PCI   (1 << 1)/* vfio-pci device */
> #define VFIO_DEVICE_FLAGS_PLATFORM (1 << 2) /* vfio-platform device */
> #define VFIO_DEVICE_FLAGS_AMBA  (1 << 3)/* vfio-amba device */
> #define VFIO_DEVICE_FLAGS_CCW   (1 << 4)/* vfio-ccw device */
> __u32   num_regions;/* Max region index + 1 */
> __u32   num_irqs;   /* Max IRQ index + 1 */
> };
> 
> We could use two flag bits to indicate dmabuf or graphics region
> support.  vfio_device_gfx_query_caps seems to imply a new ioctl, which
> would be unnecessary.
>

Sounds good to me.

>> Then this to query the plane:
>>
>> struct vfio_device_gfx_query_plane {
>> __u32 argsz;
>> __u32 flags;
>> struct vfio_device_gfx_plane_info plane_info;  /* out */
>> __u32 plane_type;  /* in  */
>> };
> 
> I'm not sure why we're using an enum for something that can currently
> be defined with 2 bits, seems like this would be another good use of
> flags.  We could even embed an enum into the flags if we want to
> leave some expansion room, 4 bits maybe?  Also, I was imagining that a
> device could support multiple graphics regions, that's where specifying
> the "id" as a region index seemed useful.  We lose that ability here
> unless we go back to defining a flag bit to specify how to interpret
> this last field.
> 

Right, as I mentioned in an earlier reply, we need 2 separate fields
- plane type : DRM_PLANE_TYPE_PRIMARY or DRM_PLANE_TYPE_CURSOR
- id : fd for dmabuf or region index for region type


>> 2. Remove dmabuf mgr fd and add these two ioctl commands to the vfio
>> device fd.
>>> VFIO_DEVICE_QUERY_GFX_PLANE : used to query
>>> vfio_device_gfx_plane_info.  
>>
>> Yes.
>>
>>> VFIO_DEVICE_GET_DMABUF_FD: used to create and return the dmabuf fd.  
> 
> I'm not convinced this adds value, but I'll list it as an option:
> 
> VFIO_DEVICE_QUERY(VFIO_DEVICE_GFX_PLANE)
> VFIO_DEVICE_GET_FD(VFIO_DEVICE_GFX_DMABUF_FD)
> 
> The benefit is that it might help to avoid a proliferation of ioctls on
> the device; the pain is that we need to either define a field or section
> of flags which identify what is being queried or what type of device fd
> is being requested.
> 
>> Yes.  The plane might have changed between query-plane and get-dmabuf
>> ioctl calls though, we must make sure we handle that somehow.  Current
>> patches return plane_info on get-dmabuf ioctl too, so userspace can see
>> what it actually got.
>>
>> With the generation we can also do something different:  Pass in
>> plane_type and generation, and have VFIO_DEVICE_GET_DMABUF_FD return
>> an error in case the generation doesn't match.  In that case it doesn't
>> make much sense any more to have a separate plane_info struct, which
>> was added so we don't have to duplicate things in query-plane and get-
>> dmabuf ioctl structs.
> 
> I'm not sure I understand how this works for a region, the region is
> always the current generation, how can the user ever be sure the
> plane_info matches what is exposed in the region?  Thanks,
>

Userspace has to follow the sequence: query plane info
(VFIO_DEVICE_QUERY_GFX_PLANE) and then read the primary surface from the region.
On the kernel side, in the VFIO_DEVICE_QUERY_GFX_PLANE ioctl, the driver
should update the surface which is being exposed by the GFX region, 

Re: [PATCH v9 5/7] vfio: Define vfio based dma-buf operations

2017-06-20 Thread Kirti Wankhede


On 6/20/2017 2:05 PM, Gerd Hoffmann wrote:
>   Hi,
> 
>>> Hmm, plane isn't really an ID, it is a type, with type being either
>>> DRM_PLANE_TYPE_PRIMARY or DRM_PLANE_TYPE_CURSOR, so I don't think
>>> the
>>> flags above make sense.
>>
>> The intention was that ..._REGION_ID and ...PLANE_ID are describing
>> what the vfio_device_query_gfx_plane.id field represents, either a
>> region index or a plane identifier.  The type of plane would be
>> represented within the vfio_device_gfx_plane_info struct.
> 
> The planes don't really have an id, we should rename that to
> plane_type, or maybe drm_plane_type (similar to the drm_format_*
> fields), to avoid that confusion.
> 
> plane_type is set by userspace to specify what kind of plane it asks
> for.
> 

Ok. so there should be two fields:
- plane type : DRM_PLANE_TYPE_PRIMARY or DRM_PLANE_TYPE_CURSOR
- id : fd for dmabuf or region index for region type

Adding reply to Gerd's question from earlier mail:
> What are the nvidia plane for cursor support btw?

We don't support cursor for console vnc. Ideally console vnc should be
used by an admin for configuration or during maintenance; it refreshes the
primary surface at a low refresh rate, 10 fps. We recommend using a
remoting solution for actual use.

>>> Also I think it would be useful to have some way to figure the
>>> device
>>> capabilities as the userspace workflow will look quite different
>>> for
>>> the two cases.
>>
>> In the region case, VFIO_DEVICE_GET_REGION_INFO would include a
>> device
>> specific region with a hopefully common identifier to identify it as
>> a
>> graphics framebuffer.
> 
> Ok, that should work to figure out whether the mdev supports a plane
> region or not.
> 
>> In the dmabuf case,VFIO_DEVICE_QUERY_GFX_PLANE would indicate the
>> plane as a "plane ID" and some sort of
>> VFIO_DEVICE_GET_GFX_PLANE(VFIO_GFX_TYPE_DMABUF) ioctl would be
>> necessary to get a file descriptor to that plane.
>>
>> What else are you thinking we need?  Thanks,
> 
> I need to know whether the mdev supports dmabufs or not, at device
> initialization time (because dmabufs require opengl support), when
> VFIO_DEVICE_QUERY_GFX_PLANE doesn't work due to the guest not having
> the device initialized yet.
> 
> Maybe we should have an error field in the ioctl struct, or we need to
> clearly define error codes so the kernel doesn't just throw EINVAL in
> all cases.
> 
> Or just a VFIO_DEVICE_GFX_CAPS ioctl which returns NONE, REGION or
> DMABUF.
>

Right, we need to know this at device initialization time in both cases
to initialize the VGACommonState structure for that device, and we also need
NONE to decide whether to init console vnc or not. We have a mechanism
to disable the console vnc path and we recommend disabling it for better
performance.

Thanks,
Kirti

> cheers,
>   Gerd
> 


Re: [PATCH v9 5/7] vfio: Define vfio based dma-buf operations

2017-06-20 Thread Kirti Wankhede


On 6/19/2017 8:25 PM, Alex Williamson wrote:
> On Mon, 19 Jun 2017 08:38:32 +0200
> Gerd Hoffmann  wrote:
> 
>>   Hi,
>>
>>> My suggestion was to use vfio device fd for this ioctl and have
>>> dmabuf
>>> mgr fd as member in above query_plane structure, for region type it
>>> would be set to 0.  
>>
>> Region type should be DRM_PLANE_TYPE_PRIMARY
>>
>>> Can't mmap that page to get surface information. There is no way to
>>> synchronize between QEMU reading this mmapped region and vendor
>>> driver
>>> writing it. There could be race condition in these two operations.
>>> Read
>>> on this page should be trapped and blocking, so that the surface in that
>>> region is only updated when it's asked for.  
>>
>> Does it make sense to have a "generation" field in the plane_info
>> struct (which gets increased each time the struct changes) ?
> 
> It seems less cumbersome than checking each field to see if it has
> changed.  Thanks,
> 

Looks good. And the vendor driver should take care of wrapping the value
around when it reaches its max limit.

Thanks,
Kirti


> Alex
> 


Re: [PATCH v9 5/7] vfio: Define vfio based dma-buf operations

2017-06-16 Thread Kirti Wankhede


On 6/16/2017 10:09 PM, Alex Williamson wrote:
> On Fri, 16 Jun 2017 19:02:30 +0530
> Kirti Wankhede <kwankh...@nvidia.com> wrote:
> 
>> On 6/16/2017 2:08 AM, Alex Williamson wrote:
>>> On Thu, 15 Jun 2017 18:00:38 +0200
>>> Gerd Hoffmann <kra...@redhat.com> wrote:
>>>   
>>>>   Hi,
>>>>  
>>>>>> +struct vfio_dmabuf_mgr_plane_info {
>>>>>> +__u64 start;
>>>>>> +__u64 drm_format_mod;
>>>>>> +__u32 drm_format;
>>>>>> +__u32 width;
>>>>>> +__u32 height;
>>>>>> +__u32 stride;
>>>>>> +__u32 size;
>>>>>> +__u32 x_pos;
>>>>>> +__u32 y_pos;
>>>>>> +__u32 padding;
>>>>>> +};
>>>>>> +
>>>>>
>>>>> This structure is generic; we can remove dmabuf from its name and call
>>>>> it vfio_plane_info or vfio_vgpu_surface_info, since this will only be
>>>>> used by vgpu.
>>>>
>>>> Agree.  
>>>
>>> I'm not sure I agree regarding the vgpu statement, maybe this is not
>>> dmabuf specific, but what makes it vgpu specific?  We need to separate
>>> our current usage plans from what it's actually describing and I don't
>>> see that it describes anything vgpu specific.
>>>
>>>>>> +struct vfio_dmabuf_mgr_query_plane {
>>>>>> +__u32 argsz;
>>>>>> +__u32 flags;
>>>>>> +struct vfio_dmabuf_mgr_plane_info plane_info;
>>>>>> +__u32 plane_id;
>>>>>> +};
>>>>>> +
>>>>>> +#define VFIO_DMABUF_MGR_QUERY_PLANE _IO(VFIO_TYPE, VFIO_BASE + 15)
>>>>>> +
>>>>>
>>>>> This same interface can be used to query surface/plane information
>>>>> for
>>>>> both, dmabuf and region, case. Here also 'DMABUF' can be removed and
>>>>> define flags if you want to differentiate query for 'dmabuf' and
>>>>> 'region'.
>>>>
>>>> Hmm, any specific reason why you want use a ioctl for that?  I would
>>>> simply place a "struct vfio_dmabuf_mgr_plane_info" (or whatever the
>>>> final name will be) at the start of the region.  
>>>
>>> Right, these are ioctls on the dmabuf mgr fd, not the vfio device fd,
>>> if you're exposing a region with the info I wouldn't think you'd want
>>> the hassle of managing this separate fd when you could do something
>>> like Gerd suggests with defining the first page of the regions as
>>> containing the structure.  
>>
>> My suggestion was to use the vfio device fd for this ioctl and have the
>> dmabuf mgr fd as a member in the above query_plane structure; for the
>> region type it would be set to 0.
>> Yes, there is another way to query surface information, as Gerd suggested,
>> but my point is: if an ioctl is being added, it could be used for both
>> types, dmabuf and region.
> 
> I think this suggests abandoning the dmabuf manager fd entirely.  That's
> not necessarily a bad thing, but I don't think the idea of the dmabuf
> manager fd stands if we push one of its primary reasons for existing
> back to the device fd.  Reading though previous posts, I think we
> embraced the dmabuf manager as a separate fd primarily for
> consolidation and the potential to use it as a notification point, the
> latter being only theoretically useful.
> 
> So perhaps this becomes:
> 
> struct vfio_device_gfx_plane_info {
>   __u64 start;
>   __u64 drm_format_mod;
>   __u32 drm_format;
>   __u32 width;
>   __u32 height;
>   __u32 stride;
>   __u32 size;
>   __u32 x_pos;
>   __u32 y_pos;
> };
> 
> struct vfio_device_query_gfx_plane {
>   __u32 argsz;
>   __u32 flags;
> #define VFIO_GFX_PLANE_FLAGS_REGION_ID (1 << 0)
> #define VFIO_GFX_PLANE_FLAGS_PLANE_ID (1 << 1)
>   struct vfio_device_gfx_plane_info plane_info;
>   __u32 id; 
> };
> 
> The flag defines the data in the id field as either referring to a
> region (perhaps there could be multiple regions with only one active)
> or a plane ID, which is acquired separately, such as via a dmabuf fd.
> This would be retrieved via an optional VFIO_DEVICE_QUERY_GFX_PLANE
> ioctl on the vfio device, implemented in the vendor driver.
> 
> Would the above, along with the already defined mechanism for defining
> device spe


Re: [PATCH v9 5/7] vfio: Define vfio based dma-buf operations

2017-06-16 Thread Kirti Wankhede


On 6/16/2017 2:08 AM, Alex Williamson wrote:
> On Thu, 15 Jun 2017 18:00:38 +0200
> Gerd Hoffmann  wrote:
> 
>>   Hi,
>>
>>>> +struct vfio_dmabuf_mgr_plane_info {
>>>> +	__u64 start;
>>>> +	__u64 drm_format_mod;
>>>> +	__u32 drm_format;
>>>> +	__u32 width;
>>>> +	__u32 height;
>>>> +	__u32 stride;
>>>> +	__u32 size;
>>>> +	__u32 x_pos;
>>>> +	__u32 y_pos;
>>>> +	__u32 padding;
>>>> +};
>>>> +
>>>
>>> This structure is generic, can remove dmabuf from its name,
>>> vfio_plane_info or vfio_vgpu_surface_info since this will only be
>>> used
>>> by vgpu.  
>>
>> Agree.
> 
> I'm not sure I agree regarding the vgpu statement, maybe this is not
> dmabuf specific, but what makes it vgpu specific?  We need to separate
> our current usage plans from what it's actually describing and I don't
> see that it describes anything vgpu specific.
>  
>>>> +struct vfio_dmabuf_mgr_query_plane {
>>>> +	__u32 argsz;
>>>> +	__u32 flags;
>>>> +	struct vfio_dmabuf_mgr_plane_info plane_info;
>>>> +	__u32 plane_id;
>>>> +};
>>>> +
>>>> +#define VFIO_DMABUF_MGR_QUERY_PLANE _IO(VFIO_TYPE, VFIO_BASE + 15)
>>>> +
>>>
>>> This same interface can be used to query surface/plane information
>>> for
>>> both, dmabuf and region, case. Here also 'DMABUF' can be removed and
>>> define flags if you want to differentiate query for 'dmabuf' and
>>> 'region'.  
>>
>> Hmm, any specific reason why you want use a ioctl for that?  I would
>> simply place a "struct vfio_dmabuf_mgr_plane_info" (or whatever the
>> final name will be) at the start of the region.
> 
> Right, these are ioctls on the dmabuf mgr fd, not the vfio device fd,
> if you're exposing a region with the info I wouldn't think you'd want
> the hassle of managing this separate fd when you could do something
> like Gerd suggests with defining the first page of the regions as
> containing the structure.

My suggestion was to use the vfio device fd for this ioctl and have the
dmabuf mgr fd as a member in the above query_plane structure; for the
region type it would be set to 0.
Yes, there is another way to query surface information, as Gerd suggested,
but my point is: if an ioctl is being added, it could be used for both
types, dmabuf and region.

>  Maybe you could even allow mmap of that page
> to reduce the overhead of getting the current state.  

Can't mmap that page to get surface information. There is no way to
synchronize QEMU reading this mmapped region with the vendor driver
writing it, so there could be a race condition between the two
operations. A read of this page should be trapped and blocking, so that
the surface in that region is only updated when it is asked for.

> For the sake of
> userspace, I'd hope we'd still use the same structure for either the
> ioctl or region mapping.  I'm not really in favor of declaring that
> this particular ioctl might exist on the device fd when such-and-such
> region is present otherwise it might exist on a dmabuf manager fd.

Userspace will always use the vfio device fd for this ioctl; it only has
to set the proper arguments in the structure based on the type.

Thanks,
Kirti

> Thanks,
> 
> Alex
> 


Re: [PATCH v9 5/7] vfio: Define vfio based dma-buf operations

2017-06-15 Thread Kirti Wankhede


On 6/15/2017 1:30 PM, Xiaoguang Chen wrote:
> Here we defined a new ioctl to create a fd for a vfio device based on
> the input type. Now only one type is supported that is a dma-buf
> management fd.
> Two ioctls are defined for the dma-buf management fd: query the vfio
> vgpu's plane information and create a dma-buf for a plane.
> 

I had suggested how we can use common structures for both ways of
querying the surface on the v6 version of your patch:
https://lkml.org/lkml/2017/6/1/890


> Signed-off-by: Xiaoguang Chen 
> ---
>  include/uapi/linux/vfio.h | 57 +++
>  1 file changed, 57 insertions(+)
> 
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index ae46105..7d86101 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -502,6 +502,63 @@ struct vfio_pci_hot_reset {
>  
>  #define VFIO_DEVICE_PCI_HOT_RESET _IO(VFIO_TYPE, VFIO_BASE + 13)
>  
> +/**
> + * VFIO_DEVICE_GET_FD - _IO(VFIO_TYPE, VFIO_BASE + 14, __u32)
> + *
> + * Create a fd for a vfio device based on the input type
> + * Vendor driver should handle this ioctl to create a fd and manage the
> + * life cycle of this fd.
> + *
> + * Return: a fd if vendor support that type, -errno if not supported
> + */
> +
> +#define VFIO_DEVICE_GET_FD   _IO(VFIO_TYPE, VFIO_BASE + 14)
> +
> +#define VFIO_DEVICE_DMABUF_MGR_FD 0 /* Supported fd types */
> +
> +struct vfio_dmabuf_mgr_plane_info {
> + __u64 start;
> + __u64 drm_format_mod;
> + __u32 drm_format;
> + __u32 width;
> + __u32 height;
> + __u32 stride;
> + __u32 size;
> + __u32 x_pos;
> + __u32 y_pos;
> + __u32 padding;
> +};
> +

This structure is generic, so 'dmabuf' can be removed from its name:
use vfio_plane_info, or vfio_vgpu_surface_info since this will only be
used by vgpu.

> +/*
> + * VFIO_DMABUF_MGR_QUERY_PLANE - _IO(VFIO_TYPE, VFIO_BASE + 15,
> + *   struct vfio_dmabuf_mgr_query_plane)
> + * Query plane information
> + */
> +struct vfio_dmabuf_mgr_query_plane {
> + __u32 argsz;
> + __u32 flags;
> + struct vfio_dmabuf_mgr_plane_info plane_info;
> + __u32 plane_id;
> +};
> +
> +#define VFIO_DMABUF_MGR_QUERY_PLANE _IO(VFIO_TYPE, VFIO_BASE + 15)
> +

This same interface can be used to query surface/plane information for
both the dmabuf and region cases. Here also 'DMABUF' can be removed, and
flags can be defined if you want to differentiate a query for 'dmabuf'
from one for 'region'.

Thanks,
Kirti

> +/*
> + * VFIO_DMABUF_MGR_CREATE_DMABUF - _IO(VFIO_TYPE, VFIO_BASE + 16,
> + *   struct vfio_dmabuf_mgr_create_dmabuf)
> + *
> + * Create a dma-buf for a plane
> + */
> +struct vfio_dmabuf_mgr_create_dmabuf {
> + __u32 argsz;
> + __u32 flags;
> + struct vfio_dmabuf_mgr_plane_info plane_info;
> + __u32 plane_id;
> + __s32 fd;
> +};
> +
> +#define VFIO_DMABUF_MGR_CREATE_DMABUF _IO(VFIO_TYPE, VFIO_BASE + 16)
> +
>  /*  API for Type1 VFIO IOMMU  */
>  /**
> 


Re: [PATCH v6 4/6] vfio: Define vfio based vgpu's dma-buf operations

2017-06-05 Thread Kirti Wankhede


On 6/2/2017 2:08 PM, Gerd Hoffmann wrote:
> 
>> struct vfio_vgpu_surface_info {
>> __u64 start;
>> __u32 width;
>> __u32 height;
>> __u32 stride;
>> __u32 size;
>> __u32 x_pos;
>> __u32 y_pos;
>> __u32 padding;
>> /* Only used when VFIO_VGPU_SURFACE_DMABUF_* flags set */
>> __u64 drm_format_mod;
>> __u32 drm_format;
> 
> Why for dmabufs only?  Shouldn't the region specify the format too? 
> Even in case you are using a fixed one (say DRM_FORMAT_XRGB) you
> can explicitly say so in drm_format (and set drm_format_mod to zero).
> 

Definitions for PIXMAN formats and DRM formats are different. I think
we need a flag to specify the format of the surface that the vendor
driver is going to provide, PIXMAN or DRM.
If the surface is provided through a region in a PIXMAN format, existing
functions in QEMU can be used to get the format from the bpp value,
e.g. qemu_default_pixman_format(). Similarly, the display surface can be
updated by QEMU using qemu_create_displaysurface_from() on the mmapped
region.

Thanks,
Kirti

> cheers,
>   Gerd
> 


Re: [PATCH v6 4/6] vfio: Define vfio based vgpu's dma-buf operations

2017-06-01 Thread Kirti Wankhede


On 6/1/2017 10:08 PM, Alex Williamson wrote:
> On Thu, 1 Jun 2017 03:01:28 +
> "Chen, Xiaoguang" <xiaoguang.c...@intel.com> wrote:
> 
>> Hi Kirti,
>>
>>> -----Original Message-----
>>> From: Kirti Wankhede [mailto:kwankh...@nvidia.com]
>>> Sent: Thursday, June 01, 2017 1:23 AM
>>> To: Chen, Xiaoguang <xiaoguang.c...@intel.com>; Gerd Hoffmann
>>> <kra...@redhat.com>; alex.william...@redhat.com; ch...@chris-wilson.co.uk;
>>> intel-...@lists.freedesktop.org; linux-kernel@vger.kernel.org;
>>> zhen...@linux.intel.com; Lv, Zhiyuan <zhiyuan...@intel.com>; intel-gvt-
>>> d...@lists.freedesktop.org; Wang, Zhi A <zhi.a.w...@intel.com>; Tian, Kevin
>>> <kevin.t...@intel.com>
>>> Subject: Re: [PATCH v6 4/6] vfio: Define vfio based vgpu's dma-buf 
>>> operations
>>>
>>>
>>>
>>> On 5/31/2017 11:48 AM, Chen, Xiaoguang wrote:  
>>>> Hi,
>>>>  
>>>>> -----Original Message-----
>>>>> From: Gerd Hoffmann [mailto:kra...@redhat.com]
>>>>> Sent: Monday, May 29, 2017 3:20 PM
>>>>> To: Chen, Xiaoguang <xiaoguang.c...@intel.com>;
>>>>> alex.william...@redhat.com; ch...@chris-wilson.co.uk; intel-
>>>>> g...@lists.freedesktop.org; linux-kernel@vger.kernel.org;
>>>>> zhen...@linux.intel.com; Lv, Zhiyuan <zhiyuan...@intel.com>;
>>>>> intel-gvt- d...@lists.freedesktop.org; Wang, Zhi A
>>>>> <zhi.a.w...@intel.com>; Tian, Kevin <kevin.t...@intel.com>
>>>>> Subject: Re: [PATCH v6 4/6] vfio: Define vfio based vgpu's dma-buf
>>>>> operations
>>>>>  
>>>>>> +struct vfio_vgpu_dmabuf_info {
>>>>>> +__u32 argsz;
>>>>>> +__u32 flags;
>>>>>> +struct vfio_vgpu_plane_info plane_info;
>>>>>> +__s32 fd;
>>>>>> +__u32 pad;
>>>>>> +};  
>>>>>
>>>>> Hmm, now you have argsz and flags twice in vfio_vgpu_dmabuf_info ...
>>>>>
>>>>> I think we should have something like this:
>>>>>
>>>>> struct vfio_vgpu_plane_info {
>>>>> 	__u64 start;
>>>>> 	__u64 drm_format_mod;
>>>>> 	__u32 drm_format;
>>>>> 	__u32 width;
>>>>> 	__u32 height;
>>>>> 	__u32 stride;
>>>>> 	__u32 size;
>>>>> 	__u32 x_pos;
>>>>> 	__u32 y_pos;
>>>>> 	__u32 padding;
>>>>> };
>>>>>
>>>>> struct vfio_vgpu_query_plane {
>>>>> 	__u32 argsz;
>>>>> 	__u32 flags;
>>>>> 	struct vfio_vgpu_plane_info plane_info;
>>>>> 	__u32 plane_id;
>>>>> 	__u32 padding;
>>>>> };
>>>>>
>>>>> struct vfio_vgpu_create_dmabuf {
>>>>> 	__u32 argsz;
>>>>> 	__u32 flags;
>>>>> 	struct vfio_vgpu_plane_info plane_info;
>>>>> 	__u32 plane_id;
>>>>> 	__s32 fd;
>>>>> };
>>>> Good suggestion will apply in the next version.
>>>> Thanks for review :)
>>>>  
>>>
>>> Can you define what are the expected values of 'flags' would be?  
>> Flags is not used in this case.  It is defined to follow the rules of vfio 
>> ioctls.
> 
> An important note about flags, the vendor driver must validate it.  If
> they don't and the user passes an arbitrary value there, then we have a
> backwards compatibility issue with ever attempting to use the flags
> field.  The user passing in a flag unknown to the vendor driver should
> return an -EINVAL response.  In this case, we haven't defined any
> flags, so the vendor driver needs to force the user to pass zero.

There are two ways QEMU can get surface for console:
1. adding a region using region capability
2. dmabuf

In both of the above cases, the surface parameters that need to be
queried from the vendor driver are the same. The structure would be:

struct vfio_vgpu_surface_info {
__u64 start;
__u32 width;
__u32 height;
__u32 stride;
__u32 size;
__u32 x_pos;
__u32 y_pos;
__u32 padding;
/* Only used when VFIO_VGPU_SURFACE_DMABUF_* flags set */
__u64 drm_format_mod;
__u32 drm_format;
};

We can use one ioctl to query surface information from the vendor
driver; the structure would look like:

struct vfio_vgpu_get_surface_info {
__u32 argsz;
__u32 flags;
#define VFIO_VGPU_SURFACE_DMABUF_CREATE (1 << 0) /* Create dmabuf */
#define VFIO_VGPU_SURFACE_DMABUF_QUERY  (1 << 1) /* Query surface info for dmabuf */
#define VFIO_VGPU_SURFACE_REGION_QUERY  (1 << 2) /* Query surface info for REGION type */
struct vfio_vgpu_surface_info surface;
__u32 plane_id;
__s32 fd;
};

#define VFIO_DEVICE_SURFACE_INFO _IO(VFIO_TYPE, VFIO_BASE + 15)

Vendor driver should return -EINVAL if that type of query is not
supported.

I would like to design this interface to support both types, region cap
and dmabuf.

Thanks,
Kirti


Re: [PATCH v6 4/6] vfio: Define vfio based vgpu's dma-buf operations

2017-05-31 Thread Kirti Wankhede


On 5/31/2017 11:48 AM, Chen, Xiaoguang wrote:
> Hi,
> 
>> -----Original Message-----
>> From: Gerd Hoffmann [mailto:kra...@redhat.com]
>> Sent: Monday, May 29, 2017 3:20 PM
>> To: Chen, Xiaoguang ;
>> alex.william...@redhat.com; ch...@chris-wilson.co.uk; intel-
>> g...@lists.freedesktop.org; linux-kernel@vger.kernel.org;
>> zhen...@linux.intel.com; Lv, Zhiyuan ; intel-gvt-
>> d...@lists.freedesktop.org; Wang, Zhi A ; Tian, Kevin
>> 
>> Subject: Re: [PATCH v6 4/6] vfio: Define vfio based vgpu's dma-buf operations
>>
>>> +struct vfio_vgpu_dmabuf_info {
>>> +   __u32 argsz;
>>> +   __u32 flags;
>>> +   struct vfio_vgpu_plane_info plane_info;
>>> +   __s32 fd;
>>> +   __u32 pad;
>>> +};
>>
>> Hmm, now you have argsz and flags twice in vfio_vgpu_dmabuf_info ...
>>
>> I think we should have something like this:
>>
>> struct vfio_vgpu_plane_info {
>> 	__u64 start;
>> 	__u64 drm_format_mod;
>> 	__u32 drm_format;
>> 	__u32 width;
>> 	__u32 height;
>> 	__u32 stride;
>> 	__u32 size;
>> 	__u32 x_pos;
>> 	__u32 y_pos;
>> 	__u32 padding;
>> };
>>
>> struct vfio_vgpu_query_plane {
>> 	__u32 argsz;
>> 	__u32 flags;
>> 	struct vfio_vgpu_plane_info plane_info;
>> 	__u32 plane_id;
>> 	__u32 padding;
>> };
>>
>> struct vfio_vgpu_create_dmabuf {
>> 	__u32 argsz;
>> 	__u32 flags;
>> 	struct vfio_vgpu_plane_info plane_info;
>> 	__u32 plane_id;
>> 	__s32 fd;
>> };
> Good suggestion will apply in the next version.
> Thanks for review :)
> 

Can you define what the expected values of 'flags' would be?

Thanks,
Kirti

> Chenxg.
> 


Re: [PATCH v5 5/5] drm/i915/gvt: Adding interface so user space can get the dma-buf

2017-05-23 Thread Kirti Wankhede


On 5/23/2017 7:39 PM, Gerd Hoffmann wrote:
>   Hi,
> 
>> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
>> index ae46105..285dc16 100644
>> --- a/include/uapi/linux/vfio.h
>> +++ b/include/uapi/linux/vfio.h
>> @@ -502,10 +502,58 @@ struct vfio_pci_hot_reset {
>>  
>>  #define VFIO_DEVICE_PCI_HOT_RESET   _IO(VFIO_TYPE, VFIO_BASE +
>> 13)
>>  
>> +/**
>> + * VFIO_DEVICE_GET_FD - _IO(VFIO_TYPE, VFIO_BASE + 14, __u32)
>> + *
>> + * Create a fd for a vfio device based on the input type
>> + * Vendor driver should handle this ioctl to create a fd and manage
>> the
>> + * life cycle of this fd.
>> + *
>> + * Return: a fd if vendor support that type, -errno if not supported
>> + */
>> +
>> +#define VFIO_DEVICE_GET_FD  _IO(VFIO_TYPE, VFIO_BASE + 14)
>> +
>> +#define VFIO_DEVICE_DMABUF_MGR_FD   0 /* Supported fd types */
>> +
>> +/*
>> + * VFIO_DEVICE_QUERY_PLANE - _IO(VFIO_TYPE, VFIO_BASE + 15, struct
>> plane_info)
>> + * Query plane information for a plane
>> + */
>> +struct plane_info {
> 
> That is a pretty generic name.  vfio_vgpu_plane_info?  Or
> vfio_dmabuf_plane_info?
> 

Agree with Gerd; another suggestion is vfio_vgpu_surface_info, since not
all solutions might use dmabuf.
Another way to provide the surface is by adding a VGA region for the vGPU
device using the region capability, which would be mmapped by QEMU; QEMU
could then use that region directly to get the surface. This structure
could be made generic so that the user can consume it either via dmabuf
or via a separate region.

I also back Gerd's comment on patch 4/5: the change in
include/uapi/linux/vfio.h should be a separate patch.

Thanks,
Kirti

>> +__u32 plane_id;
>> +__u32 drm_format;
>> +__u32 width;
>> +__u32 height;
>> +__u32 stride;
>> +__u32 start;
>> +__u32 x_pos;
>> +__u32 y_pos;
>> +__u32 size;
>> +__u64 drm_format_mod;
>> +};
>> +
>> +#define VFIO_PRIMARY_PLANE  1
>> +#define VFIO_CURSOR_PLANE   2
> 
> I think we should use "enum drm_plane_type" values instead of creating
> something new.
> 
> cheers,
>   Gerd
> 


