Re: [libvirt] [PATCH v7 0/4] Add Mediated device support

2016-09-01 Thread Kirti Wankhede


On 9/2/2016 1:31 AM, Alex Williamson wrote:
> On Thu, 1 Sep 2016 23:52:02 +0530
> Kirti Wankhede  wrote:
> 
>> Alex,
>> Thanks for summarizing the discussion.
>>
>> On 8/31/2016 9:18 PM, Alex Williamson wrote:
>>> On Wed, 31 Aug 2016 15:04:13 +0800
>>> Jike Song  wrote:
>>>   
 On 08/31/2016 02:12 PM, Tian, Kevin wrote:  
>> From: Alex Williamson [mailto:alex.william...@redhat.com]
>> Sent: Wednesday, August 31, 2016 12:17 AM
>>
>> Hi folks,
>>
>> At KVM Forum we had a BoF session primarily around the mediated device
>> sysfs interface.  I'd like to share what I think we agreed on and the
>> "problem areas" that still need some work so we can get the thoughts
>> and ideas from those who weren't able to attend.
>>
>> DanPB expressed some concern about the mdev_supported_types sysfs
>> interface, which exposes a flat csv file with fields like "type",
>> "number of instance", "vendor string", and then a bunch of type
>> specific fields like "framebuffer size", "resolution", "frame rate
>> limit", etc.  This is not entirely machine parsing friendly and sort of
>> abuses the sysfs concept of one value per file.  Example output taken
>> from Neo's libvirt RFC:
>>
>> cat /sys/bus/pci/devices/:86:00.0/mdev_supported_types
>> # vgpu_type_id, vgpu_type, max_instance, num_heads, frl_config, 
>> framebuffer,
>> max_resolution
>> 11  ,"GRID M60-0B",  16,   2,  45, 512M,2560x1600
>> 12  ,"GRID M60-0Q",  16,   2,  60, 512M,2560x1600
>> 13  ,"GRID M60-1B",   8,   2,  45,1024M,2560x1600
>> 14  ,"GRID M60-1Q",   8,   2,  60,1024M,2560x1600
>> 15  ,"GRID M60-2B",   4,   2,  45,2048M,2560x1600
>> 16  ,"GRID M60-2Q",   4,   4,  60,2048M,2560x1600
>> 17  ,"GRID M60-4Q",   2,   4,  60,4096M,3840x2160
>> 18  ,"GRID M60-8Q",   1,   4,  60,8192M,3840x2160
>>
>> The create/destroy then looks like this:
>>
>> echo "$mdev_UUID:vendor_specific_argument_list" >
>>  /sys/bus/pci/devices/.../mdev_create
>>
>> echo "$mdev_UUID:vendor_specific_argument_list" >
>>  /sys/bus/pci/devices/.../mdev_destroy
>>
>> "vendor_specific_argument_list" is nebulous.
>>
>> So the idea to fix this is to explode this into a directory structure,
>> something like:
>>
>> ├── mdev_destroy
>> └── mdev_supported_types
>> ├── 11
>> │   ├── create
>> │   ├── description
>> │   └── max_instances
>> ├── 12
>> │   ├── create
>> │   ├── description
>> │   └── max_instances
>> └── 13
>> ├── create
>> ├── description
>> └── max_instances
>>
>> Note that I'm only exposing the minimal attributes here for simplicity,
>> the other attributes would be included in separate files and we would
>> require vendors to create standard attributes for common device classes. 
>>
>
> I like this idea. All standard attributes are reflected into this 
> hierarchy.
> In the meantime, can we still allow optional vendor string in create 
> interface? libvirt doesn't need to know the meaning, but allows upper
> layer to do some vendor specific tweak if necessary.
> 

 Not sure whether this can done within MDEV framework (attrs provided by
 vendor driver of course), or must be within the vendor driver.  
>>>
>>> The purpose of the sub-directories is that libvirt doesn't need to pass
>>> arbitrary, vendor strings to the create function, the attributes of the
>>> mdev device created are defined by the attributes in the sysfs
>>> directory where the create is done.  The user only provides a uuid for
>>> the device.  Arbitrary vendor parameters are a barrier, libvirt may not
>>> need to know the meaning, but would need to know when to apply them,
>>> which is just as bad.  Ultimately we want libvirt to be able to
>>> interact with sysfs without having an vendor specific knowledge.
>>>   
>>
>> Above directory hierarchy looks fine to me. Along with the fixed set of
>> parameter, a optional field of extra parameter is also required. Such
>> parameters are required for some specific testing or running benchmarks,
>> for example to disable FRL (framerate limiter) or to disable console vnc
>> when not required. Libvirt don't need to know its details, its just a
>> string that user can provide and libvirt need to pass the string as it
>> is to vendor driver, vendor driver would act accordingly.
> 
> Wouldn't it make more sense to enable these through the vendor driver
> which would then provide additional types through the sysfs interface
> that could be selected by libvirt?  Or simply transparently change
> these parameters within the existing types? 

Re: [libvirt] [PATCH v7 0/4] Add Mediated device support

2016-09-01 Thread Alex Williamson
On Thu, 1 Sep 2016 23:52:02 +0530
Kirti Wankhede  wrote:

> Alex,
> Thanks for summarizing the discussion.
> 
> On 8/31/2016 9:18 PM, Alex Williamson wrote:
> > On Wed, 31 Aug 2016 15:04:13 +0800
> > Jike Song  wrote:
> >   
> >> On 08/31/2016 02:12 PM, Tian, Kevin wrote:  
>  From: Alex Williamson [mailto:alex.william...@redhat.com]
>  Sent: Wednesday, August 31, 2016 12:17 AM
> 
>  Hi folks,
> 
>  At KVM Forum we had a BoF session primarily around the mediated device
>  sysfs interface.  I'd like to share what I think we agreed on and the
>  "problem areas" that still need some work so we can get the thoughts
>  and ideas from those who weren't able to attend.
> 
>  DanPB expressed some concern about the mdev_supported_types sysfs
>  interface, which exposes a flat csv file with fields like "type",
>  "number of instance", "vendor string", and then a bunch of type
>  specific fields like "framebuffer size", "resolution", "frame rate
>  limit", etc.  This is not entirely machine parsing friendly and sort of
>  abuses the sysfs concept of one value per file.  Example output taken
>  from Neo's libvirt RFC:
> 
>  cat /sys/bus/pci/devices/:86:00.0/mdev_supported_types
>  # vgpu_type_id, vgpu_type, max_instance, num_heads, frl_config, 
>  framebuffer,
>  max_resolution
>  11  ,"GRID M60-0B",  16,   2,  45, 512M,2560x1600
>  12  ,"GRID M60-0Q",  16,   2,  60, 512M,2560x1600
>  13  ,"GRID M60-1B",   8,   2,  45,1024M,2560x1600
>  14  ,"GRID M60-1Q",   8,   2,  60,1024M,2560x1600
>  15  ,"GRID M60-2B",   4,   2,  45,2048M,2560x1600
>  16  ,"GRID M60-2Q",   4,   4,  60,2048M,2560x1600
>  17  ,"GRID M60-4Q",   2,   4,  60,4096M,3840x2160
>  18  ,"GRID M60-8Q",   1,   4,  60,8192M,3840x2160
> 
>  The create/destroy then looks like this:
> 
>  echo "$mdev_UUID:vendor_specific_argument_list" >
>   /sys/bus/pci/devices/.../mdev_create
> 
>  echo "$mdev_UUID:vendor_specific_argument_list" >
>   /sys/bus/pci/devices/.../mdev_destroy
> 
>  "vendor_specific_argument_list" is nebulous.
> 
>  So the idea to fix this is to explode this into a directory structure,
>  something like:
> 
>  ├── mdev_destroy
>  └── mdev_supported_types
>  ├── 11
>  │   ├── create
>  │   ├── description
>  │   └── max_instances
>  ├── 12
>  │   ├── create
>  │   ├── description
>  │   └── max_instances
>  └── 13
>  ├── create
>  ├── description
>  └── max_instances
> 
>  Note that I'm only exposing the minimal attributes here for simplicity,
>  the other attributes would be included in separate files and we would
>  require vendors to create standard attributes for common device classes. 
> 
> >>>
> >>> I like this idea. All standard attributes are reflected into this 
> >>> hierarchy.
> >>> In the meantime, can we still allow optional vendor string in create 
> >>> interface? libvirt doesn't need to know the meaning, but allows upper
> >>> layer to do some vendor specific tweak if necessary.
> >>> 
> >>
> >> Not sure whether this can done within MDEV framework (attrs provided by
> >> vendor driver of course), or must be within the vendor driver.  
> > 
> > The purpose of the sub-directories is that libvirt doesn't need to pass
> > arbitrary, vendor strings to the create function, the attributes of the
> > mdev device created are defined by the attributes in the sysfs
> > directory where the create is done.  The user only provides a uuid for
> > the device.  Arbitrary vendor parameters are a barrier, libvirt may not
> > need to know the meaning, but would need to know when to apply them,
> > which is just as bad.  Ultimately we want libvirt to be able to
> > interact with sysfs without having an vendor specific knowledge.
> >   
> 
> Above directory hierarchy looks fine to me. Along with the fixed set of
> parameter, a optional field of extra parameter is also required. Such
> parameters are required for some specific testing or running benchmarks,
> for example to disable FRL (framerate limiter) or to disable console vnc
> when not required. Libvirt don't need to know its details, its just a
> string that user can provide and libvirt need to pass the string as it
> is to vendor driver, vendor driver would act accordingly.

Wouldn't it make more sense to enable these through the vendor driver
which would then provide additional types through the sysfs interface
that could be selected by libvirt?  Or simply transparently change
these parameters within the existing types?  I think we really want to
get away from adding any sort of magic vendo

Re: [libvirt] [PATCH v7 0/4] Add Mediated device support

2016-09-01 Thread Kirti Wankhede

Alex,
Thanks for summarizing the discussion.

On 8/31/2016 9:18 PM, Alex Williamson wrote:
> On Wed, 31 Aug 2016 15:04:13 +0800
> Jike Song  wrote:
> 
>> On 08/31/2016 02:12 PM, Tian, Kevin wrote:
 From: Alex Williamson [mailto:alex.william...@redhat.com]
 Sent: Wednesday, August 31, 2016 12:17 AM

 Hi folks,

 At KVM Forum we had a BoF session primarily around the mediated device
 sysfs interface.  I'd like to share what I think we agreed on and the
 "problem areas" that still need some work so we can get the thoughts
 and ideas from those who weren't able to attend.

 DanPB expressed some concern about the mdev_supported_types sysfs
 interface, which exposes a flat csv file with fields like "type",
 "number of instance", "vendor string", and then a bunch of type
 specific fields like "framebuffer size", "resolution", "frame rate
 limit", etc.  This is not entirely machine parsing friendly and sort of
 abuses the sysfs concept of one value per file.  Example output taken
 from Neo's libvirt RFC:

 cat /sys/bus/pci/devices/:86:00.0/mdev_supported_types
 # vgpu_type_id, vgpu_type, max_instance, num_heads, frl_config, 
 framebuffer,
 max_resolution
 11  ,"GRID M60-0B",  16,   2,  45, 512M,2560x1600
 12  ,"GRID M60-0Q",  16,   2,  60, 512M,2560x1600
 13  ,"GRID M60-1B",   8,   2,  45,1024M,2560x1600
 14  ,"GRID M60-1Q",   8,   2,  60,1024M,2560x1600
 15  ,"GRID M60-2B",   4,   2,  45,2048M,2560x1600
 16  ,"GRID M60-2Q",   4,   4,  60,2048M,2560x1600
 17  ,"GRID M60-4Q",   2,   4,  60,4096M,3840x2160
 18  ,"GRID M60-8Q",   1,   4,  60,8192M,3840x2160

 The create/destroy then looks like this:

 echo "$mdev_UUID:vendor_specific_argument_list" >
/sys/bus/pci/devices/.../mdev_create

 echo "$mdev_UUID:vendor_specific_argument_list" >
/sys/bus/pci/devices/.../mdev_destroy

 "vendor_specific_argument_list" is nebulous.

 So the idea to fix this is to explode this into a directory structure,
 something like:

 ├── mdev_destroy
 └── mdev_supported_types
 ├── 11
 │   ├── create
 │   ├── description
 │   └── max_instances
 ├── 12
 │   ├── create
 │   ├── description
 │   └── max_instances
 └── 13
 ├── create
 ├── description
 └── max_instances

 Note that I'm only exposing the minimal attributes here for simplicity,
 the other attributes would be included in separate files and we would
 require vendors to create standard attributes for common device classes.  
>>>
>>> I like this idea. All standard attributes are reflected into this hierarchy.
>>> In the meantime, can we still allow optional vendor string in create 
>>> interface? libvirt doesn't need to know the meaning, but allows upper
>>> layer to do some vendor specific tweak if necessary.
>>>   
>>
>> Not sure whether this can done within MDEV framework (attrs provided by
>> vendor driver of course), or must be within the vendor driver.
> 
> The purpose of the sub-directories is that libvirt doesn't need to pass
> arbitrary, vendor strings to the create function, the attributes of the
> mdev device created are defined by the attributes in the sysfs
> directory where the create is done.  The user only provides a uuid for
> the device.  Arbitrary vendor parameters are a barrier, libvirt may not
> need to know the meaning, but would need to know when to apply them,
> which is just as bad.  Ultimately we want libvirt to be able to
> interact with sysfs without having an vendor specific knowledge.
> 

Above directory hierarchy looks fine to me. Along with the fixed set of
parameter, a optional field of extra parameter is also required. Such
parameters are required for some specific testing or running benchmarks,
for example to disable FRL (framerate limiter) or to disable console vnc
when not required. Libvirt don't need to know its details, its just a
string that user can provide and libvirt need to pass the string as it
is to vendor driver, vendor driver would act accordingly.


 For vGPUs like NVIDIA where we don't support multiple types
 concurrently, this directory structure would update as mdev devices are
 created, removing no longer available types.  I carried forward  
>>>
>>> or keep the type with max_instances cleared to ZERO.
>>>  
>>
>> +1 :)
> 
> Possible yes, but why would the vendor driver report types that the
> user cannot create?  It just seems like superfluous information (well,
> except for the use I discover below).
> 

The directory structure for a physical GPU will be defined when device
is register to mdev module. It would b

Re: [libvirt] [PATCH v7 0/4] Add Mediated device support

2016-08-31 Thread Tian, Kevin
> From: Alex Williamson [mailto:alex.william...@redhat.com]
> Sent: Wednesday, August 31, 2016 11:49 PM
> 
> > >
> > > IGD doesn't have such peer-to-peer resource setup requirement. So
> > > it's sufficient to create/destroy a mdev instance in a single action on
> > > IGD. However I'd expect we still keep the "start/stop" interface (
> > > maybe not exposed as sysfs node, instead being a VFIO API), as
> > > required to support future live migration usage. We've made prototype
> > > working for KVMGT today.
> 
> Great!
> 

btw here is a link to KVMGT live migration demo:

https://www.youtube.com/watch?v=y2SkU5JODIY

Thanks
Kevin

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH v7 0/4] Add Mediated device support

2016-08-31 Thread Tian, Kevin
> From: Alex Williamson [mailto:alex.william...@redhat.com]
> Sent: Wednesday, August 31, 2016 11:49 PM
> 
> On Wed, 31 Aug 2016 15:04:13 +0800
> Jike Song  wrote:
> 
> > On 08/31/2016 02:12 PM, Tian, Kevin wrote:
> > >> From: Alex Williamson [mailto:alex.william...@redhat.com]
> > >> Sent: Wednesday, August 31, 2016 12:17 AM
> > >>
> > >> Hi folks,
> > >>
> > >> At KVM Forum we had a BoF session primarily around the mediated device
> > >> sysfs interface.  I'd like to share what I think we agreed on and the
> > >> "problem areas" that still need some work so we can get the thoughts
> > >> and ideas from those who weren't able to attend.
> > >>
> > >> DanPB expressed some concern about the mdev_supported_types sysfs
> > >> interface, which exposes a flat csv file with fields like "type",
> > >> "number of instance", "vendor string", and then a bunch of type
> > >> specific fields like "framebuffer size", "resolution", "frame rate
> > >> limit", etc.  This is not entirely machine parsing friendly and sort of
> > >> abuses the sysfs concept of one value per file.  Example output taken
> > >> from Neo's libvirt RFC:
> > >>
> > >> cat /sys/bus/pci/devices/:86:00.0/mdev_supported_types
> > >> # vgpu_type_id, vgpu_type, max_instance, num_heads, frl_config, 
> > >> framebuffer,
> > >> max_resolution
> > >> 11  ,"GRID M60-0B",  16,   2,  45, 512M,2560x1600
> > >> 12  ,"GRID M60-0Q",  16,   2,  60, 512M,2560x1600
> > >> 13  ,"GRID M60-1B",   8,   2,  45,1024M,2560x1600
> > >> 14  ,"GRID M60-1Q",   8,   2,  60,1024M,2560x1600
> > >> 15  ,"GRID M60-2B",   4,   2,  45,2048M,2560x1600
> > >> 16  ,"GRID M60-2Q",   4,   4,  60,2048M,2560x1600
> > >> 17  ,"GRID M60-4Q",   2,   4,  60,4096M,3840x2160
> > >> 18  ,"GRID M60-8Q",   1,   4,  60,8192M,3840x2160
> > >>
> > >> The create/destroy then looks like this:
> > >>
> > >> echo "$mdev_UUID:vendor_specific_argument_list" >
> > >>  /sys/bus/pci/devices/.../mdev_create
> > >>
> > >> echo "$mdev_UUID:vendor_specific_argument_list" >
> > >>  /sys/bus/pci/devices/.../mdev_destroy
> > >>
> > >> "vendor_specific_argument_list" is nebulous.
> > >>
> > >> So the idea to fix this is to explode this into a directory structure,
> > >> something like:
> > >>
> > >> ├── mdev_destroy
> > >> └── mdev_supported_types
> > >> ├── 11
> > >> │   ├── create
> > >> │   ├── description
> > >> │   └── max_instances
> > >> ├── 12
> > >> │   ├── create
> > >> │   ├── description
> > >> │   └── max_instances
> > >> └── 13
> > >> ├── create
> > >> ├── description
> > >> └── max_instances
> > >>
> > >> Note that I'm only exposing the minimal attributes here for simplicity,
> > >> the other attributes would be included in separate files and we would
> > >> require vendors to create standard attributes for common device classes.
> > >
> > > I like this idea. All standard attributes are reflected into this 
> > > hierarchy.
> > > In the meantime, can we still allow optional vendor string in create
> > > interface? libvirt doesn't need to know the meaning, but allows upper
> > > layer to do some vendor specific tweak if necessary.
> > >
> >
> > Not sure whether this can done within MDEV framework (attrs provided by
> > vendor driver of course), or must be within the vendor driver.
> 
> The purpose of the sub-directories is that libvirt doesn't need to pass
> arbitrary, vendor strings to the create function, the attributes of the
> mdev device created are defined by the attributes in the sysfs
> directory where the create is done.  The user only provides a uuid for
> the device.  Arbitrary vendor parameters are a barrier, libvirt may not
> need to know the meaning, but would need to know when to apply them,
> which is just as bad.  Ultimately we want libvirt to be able to
> interact with sysfs without having an vendor specific knowledge.

Understand. Today Intel doesn't have such vendor specific parameter
requirement when creating a mdev instance (assuming type definition
is enough to cover our existing parameters).

Just think about future extensibility. Say if a new parameter (say
a QoS parameter like weight or cap) must be statically set before 
created mdev instance starts to work, due to device limitation, such
parameter needs to be exposed as a new attribute under the specific 
mdev instance, e.g.:
/sys/bus/pci/devices//mdev/weight

Then libvirt needs to make sure it's set before open() the instance.

If such flow is acceptable, it should remove necessity of vendor specific
parameter at the create, because any such requirement should be 
converted into sysfs node, if applicable to all vendors, then libvirt
can do asynchronous configurations before starting the instance.

> 
> > >>
> > >> For vGPUs like NVIDIA where we don't support multiple t

Re: [libvirt] [PATCH v7 0/4] Add Mediated device support

2016-08-31 Thread Tian, Kevin
> From: Alex Williamson [mailto:alex.william...@redhat.com]
> Sent: Wednesday, August 31, 2016 12:17 AM
> 
> Hi folks,
> 
> At KVM Forum we had a BoF session primarily around the mediated device
> sysfs interface.  I'd like to share what I think we agreed on and the
> "problem areas" that still need some work so we can get the thoughts
> and ideas from those who weren't able to attend.
> 
> DanPB expressed some concern about the mdev_supported_types sysfs
> interface, which exposes a flat csv file with fields like "type",
> "number of instance", "vendor string", and then a bunch of type
> specific fields like "framebuffer size", "resolution", "frame rate
> limit", etc.  This is not entirely machine parsing friendly and sort of
> abuses the sysfs concept of one value per file.  Example output taken
> from Neo's libvirt RFC:
> 
> cat /sys/bus/pci/devices/:86:00.0/mdev_supported_types
> # vgpu_type_id, vgpu_type, max_instance, num_heads, frl_config, framebuffer,
> max_resolution
> 11  ,"GRID M60-0B",  16,   2,  45, 512M,2560x1600
> 12  ,"GRID M60-0Q",  16,   2,  60, 512M,2560x1600
> 13  ,"GRID M60-1B",   8,   2,  45,1024M,2560x1600
> 14  ,"GRID M60-1Q",   8,   2,  60,1024M,2560x1600
> 15  ,"GRID M60-2B",   4,   2,  45,2048M,2560x1600
> 16  ,"GRID M60-2Q",   4,   4,  60,2048M,2560x1600
> 17  ,"GRID M60-4Q",   2,   4,  60,4096M,3840x2160
> 18  ,"GRID M60-8Q",   1,   4,  60,8192M,3840x2160
> 
> The create/destroy then looks like this:
> 
> echo "$mdev_UUID:vendor_specific_argument_list" >
>   /sys/bus/pci/devices/.../mdev_create
> 
> echo "$mdev_UUID:vendor_specific_argument_list" >
>   /sys/bus/pci/devices/.../mdev_destroy
> 
> "vendor_specific_argument_list" is nebulous.
> 
> So the idea to fix this is to explode this into a directory structure,
> something like:
> 
> ├── mdev_destroy
> └── mdev_supported_types
> ├── 11
> │   ├── create
> │   ├── description
> │   └── max_instances
> ├── 12
> │   ├── create
> │   ├── description
> │   └── max_instances
> └── 13
> ├── create
> ├── description
> └── max_instances
> 
> Note that I'm only exposing the minimal attributes here for simplicity,
> the other attributes would be included in separate files and we would
> require vendors to create standard attributes for common device classes.

I like this idea. All standard attributes are reflected into this hierarchy.
In the meantime, can we still allow optional vendor string in create 
interface? libvirt doesn't need to know the meaning, but allows upper
layer to do some vendor specific tweak if necessary.

> 
> For vGPUs like NVIDIA where we don't support multiple types
> concurrently, this directory structure would update as mdev devices are
> created, removing no longer available types.  I carried forward

or keep the type with max_instances cleared to ZERO.

> max_instances here, but perhaps we really want to copy SR-IOV and
> report a max and current allocation.  Creation and deletion is

right, cur/max_instances look reasonable.

> simplified as we can simply "echo $UUID > create" per type.  I don't
> understand why destroy had a parameter list, so here I imagine we can
> simply do the same... in fact, I'd actually rather see a "remove" sysfs
> entry under each mdev device, so we remove it at the device rather than
> in some central location (any objections?).

OK to me. 

> 
> We discussed how this might look with Intel devices which do allow
> mixed vGPU types concurrently.  We believe, but need confirmation, that
> the vendor driver could still make a finite set of supported types,
> perhaps with additional module options to the vendor driver to enable
> more "exotic" types.  So for instance if IGD vGPUs are based on
> power-of-2 portions of the framebuffer size, then the vendor driver
> could list types with 32MB, 64MB, 128MB, etc in useful and popular
> sizes.  As vGPUs are allocated, the larger sizes may become unavailable.

Yes, Intel can do such type of definition. One thing I'm not sure is 
about impact cross listed types, i.e. when creating a new instance
under a given type, max_instances under other types would be 
dynamically decremented based on available resource. Would it be
a problem for libvirt or upper level stack, since a natural interpretation
of max_instances should be a static number?

An alternative is to make max_instances configurable, so libvirt has
chance to define a pool of available instances with different types
before creating any instance. For example, initially IGD driver may 
report max_instances only for a minimal sharing granularity:
128MB:
max_instances (8)
256MB:
max_instances (0)
512MB:
max_instances (0)

Then libvirt can configure more types as:

Re: [libvirt] [PATCH v7 0/4] Add Mediated device support

2016-08-31 Thread Alex Williamson
On Wed, 31 Aug 2016 15:04:13 +0800
Jike Song  wrote:

> On 08/31/2016 02:12 PM, Tian, Kevin wrote:
> >> From: Alex Williamson [mailto:alex.william...@redhat.com]
> >> Sent: Wednesday, August 31, 2016 12:17 AM
> >>
> >> Hi folks,
> >>
> >> At KVM Forum we had a BoF session primarily around the mediated device
> >> sysfs interface.  I'd like to share what I think we agreed on and the
> >> "problem areas" that still need some work so we can get the thoughts
> >> and ideas from those who weren't able to attend.
> >>
> >> DanPB expressed some concern about the mdev_supported_types sysfs
> >> interface, which exposes a flat csv file with fields like "type",
> >> "number of instance", "vendor string", and then a bunch of type
> >> specific fields like "framebuffer size", "resolution", "frame rate
> >> limit", etc.  This is not entirely machine parsing friendly and sort of
> >> abuses the sysfs concept of one value per file.  Example output taken
> >> from Neo's libvirt RFC:
> >>
> >> cat /sys/bus/pci/devices/:86:00.0/mdev_supported_types
> >> # vgpu_type_id, vgpu_type, max_instance, num_heads, frl_config, 
> >> framebuffer,
> >> max_resolution
> >> 11  ,"GRID M60-0B",  16,   2,  45, 512M,2560x1600
> >> 12  ,"GRID M60-0Q",  16,   2,  60, 512M,2560x1600
> >> 13  ,"GRID M60-1B",   8,   2,  45,1024M,2560x1600
> >> 14  ,"GRID M60-1Q",   8,   2,  60,1024M,2560x1600
> >> 15  ,"GRID M60-2B",   4,   2,  45,2048M,2560x1600
> >> 16  ,"GRID M60-2Q",   4,   4,  60,2048M,2560x1600
> >> 17  ,"GRID M60-4Q",   2,   4,  60,4096M,3840x2160
> >> 18  ,"GRID M60-8Q",   1,   4,  60,8192M,3840x2160
> >>
> >> The create/destroy then looks like this:
> >>
> >> echo "$mdev_UUID:vendor_specific_argument_list" >
> >>/sys/bus/pci/devices/.../mdev_create
> >>
> >> echo "$mdev_UUID:vendor_specific_argument_list" >
> >>/sys/bus/pci/devices/.../mdev_destroy
> >>
> >> "vendor_specific_argument_list" is nebulous.
> >>
> >> So the idea to fix this is to explode this into a directory structure,
> >> something like:
> >>
> >> ├── mdev_destroy
> >> └── mdev_supported_types
> >> ├── 11
> >> │   ├── create
> >> │   ├── description
> >> │   └── max_instances
> >> ├── 12
> >> │   ├── create
> >> │   ├── description
> >> │   └── max_instances
> >> └── 13
> >> ├── create
> >> ├── description
> >> └── max_instances
> >>
> >> Note that I'm only exposing the minimal attributes here for simplicity,
> >> the other attributes would be included in separate files and we would
> >> require vendors to create standard attributes for common device classes.  
> > 
> > I like this idea. All standard attributes are reflected into this hierarchy.
> > In the meantime, can we still allow optional vendor string in create 
> > interface? libvirt doesn't need to know the meaning, but allows upper
> > layer to do some vendor specific tweak if necessary.
> >   
> 
> Not sure whether this can done within MDEV framework (attrs provided by
> vendor driver of course), or must be within the vendor driver.

The purpose of the sub-directories is that libvirt doesn't need to pass
arbitrary, vendor strings to the create function, the attributes of the
mdev device created are defined by the attributes in the sysfs
directory where the create is done.  The user only provides a uuid for
the device.  Arbitrary vendor parameters are a barrier, libvirt may not
need to know the meaning, but would need to know when to apply them,
which is just as bad.  Ultimately we want libvirt to be able to
interact with sysfs without having an vendor specific knowledge.

> >>
> >> For vGPUs like NVIDIA where we don't support multiple types
> >> concurrently, this directory structure would update as mdev devices are
> >> created, removing no longer available types.  I carried forward  
> > 
> > or keep the type with max_instances cleared to ZERO.
> >  
> 
> +1 :)

Possible yes, but why would the vendor driver report types that the
user cannot create?  It just seems like superfluous information (well,
except for the use I discover below).

> >> max_instances here, but perhaps we really want to copy SR-IOV and
> >> report a max and current allocation.  Creation and deletion is  
> > 
> > right, cur/max_instances look reasonable.
> >   
> >> simplified as we can simply "echo $UUID > create" per type.  I don't
> >> understand why destroy had a parameter list, so here I imagine we can
> >> simply do the same... in fact, I'd actually rather see a "remove" sysfs
> >> entry under each mdev device, so we remove it at the device rather than
> >> in some central location (any objections?).  
> > 
> > OK to me.   
> 
> IIUC, "destroy" has a parameter list is only because the previous
> $VM_UUID + instnace implementation. It should be safe to move the "destroy"
> fil

Re: [libvirt] [PATCH v7 0/4] Add Mediated device support

2016-08-31 Thread Jike Song
On 08/31/2016 02:12 PM, Tian, Kevin wrote:
>> From: Alex Williamson [mailto:alex.william...@redhat.com]
>> Sent: Wednesday, August 31, 2016 12:17 AM
>>
>> Hi folks,
>>
>> At KVM Forum we had a BoF session primarily around the mediated device
>> sysfs interface.  I'd like to share what I think we agreed on and the
>> "problem areas" that still need some work so we can get the thoughts
>> and ideas from those who weren't able to attend.
>>
>> DanPB expressed some concern about the mdev_supported_types sysfs
>> interface, which exposes a flat csv file with fields like "type",
>> "number of instance", "vendor string", and then a bunch of type
>> specific fields like "framebuffer size", "resolution", "frame rate
>> limit", etc.  This is not entirely machine parsing friendly and sort of
>> abuses the sysfs concept of one value per file.  Example output taken
>> from Neo's libvirt RFC:
>>
>> cat /sys/bus/pci/devices/:86:00.0/mdev_supported_types
>> # vgpu_type_id, vgpu_type, max_instance, num_heads, frl_config, framebuffer,
>> max_resolution
>> 11  ,"GRID M60-0B",  16,   2,  45, 512M,2560x1600
>> 12  ,"GRID M60-0Q",  16,   2,  60, 512M,2560x1600
>> 13  ,"GRID M60-1B",   8,   2,  45,1024M,2560x1600
>> 14  ,"GRID M60-1Q",   8,   2,  60,1024M,2560x1600
>> 15  ,"GRID M60-2B",   4,   2,  45,2048M,2560x1600
>> 16  ,"GRID M60-2Q",   4,   4,  60,2048M,2560x1600
>> 17  ,"GRID M60-4Q",   2,   4,  60,4096M,3840x2160
>> 18  ,"GRID M60-8Q",   1,   4,  60,8192M,3840x2160
>>
>> The create/destroy then looks like this:
>>
>> echo "$mdev_UUID:vendor_specific_argument_list" >
>>  /sys/bus/pci/devices/.../mdev_create
>>
>> echo "$mdev_UUID:vendor_specific_argument_list" >
>>  /sys/bus/pci/devices/.../mdev_destroy
>>
>> "vendor_specific_argument_list" is nebulous.
>>
>> So the idea to fix this is to explode this into a directory structure,
>> something like:
>>
>> ├── mdev_destroy
>> └── mdev_supported_types
>> ├── 11
>> │   ├── create
>> │   ├── description
>> │   └── max_instances
>> ├── 12
>> │   ├── create
>> │   ├── description
>> │   └── max_instances
>> └── 13
>> ├── create
>> ├── description
>> └── max_instances
>>
>> Note that I'm only exposing the minimal attributes here for simplicity,
>> the other attributes would be included in separate files and we would
>> require vendors to create standard attributes for common device classes.
> 
> I like this idea. All standard attributes are reflected into this hierarchy.
> In the meantime, can we still allow optional vendor string in create 
> interface? libvirt doesn't need to know the meaning, but allows upper
> layer to do some vendor specific tweak if necessary.
> 

Not sure whether this can done within MDEV framework (attrs provided by
vendor driver of course), or must be within the vendor driver.

>>
>> For vGPUs like NVIDIA where we don't support multiple types
>> concurrently, this directory structure would update as mdev devices are
>> created, removing no longer available types.  I carried forward
> 
> or keep the type with max_instances cleared to ZERO.
>

+1 :)

>> max_instances here, but perhaps we really want to copy SR-IOV and
>> report a max and current allocation.  Creation and deletion is
> 
> right, cur/max_instances look reasonable.
> 
>> simplified as we can simply "echo $UUID > create" per type.  I don't
>> understand why destroy had a parameter list, so here I imagine we can
>> simply do the same... in fact, I'd actually rather see a "remove" sysfs
>> entry under each mdev device, so we remove it at the device rather than
>> in some central location (any objections?).
> 
> OK to me. 

IIUC, "destroy" has a parameter list is only because the previous
$VM_UUID + instnace implementation. It should be safe to move the "destroy"
file under mdev now.

>> We discussed how this might look with Intel devices which do allow
>> mixed vGPU types concurrently.  We believe, but need confirmation, that
>> the vendor driver could still make a finite set of supported types,
>> perhaps with additional module options to the vendor driver to enable
>> more "exotic" types.  So for instance if IGD vGPUs are based on
>> power-of-2 portions of the framebuffer size, then the vendor driver
>> could list types with 32MB, 64MB, 128MB, etc in useful and popular
>> sizes.  As vGPUs are allocated, the larger sizes may become unavailable.
>
> Yes, Intel can do such type of definition. One thing I'm not sure is 
> about impact cross listed types, i.e. when creating a new instance
> under a given type, max_instances under other types would be 
> dynamically decremented based on available resource. Would it be
> a problem for libvirt or upper level stack, since a natural interpretation
> of max_instances should be a static number?
>

Re: [libvirt] [PATCH v7 0/4] Add Mediated device support

2016-08-30 Thread Alex Williamson
Hi folks,

At KVM Forum we had a BoF session primarily around the mediated device
sysfs interface.  I'd like to share what I think we agreed on and the
"problem areas" that still need some work so we can get the thoughts
and ideas from those who weren't able to attend.

DanPB expressed some concern about the mdev_supported_types sysfs
interface, which exposes a flat csv file with fields like "type",
"number of instance", "vendor string", and then a bunch of type
specific fields like "framebuffer size", "resolution", "frame rate
limit", etc.  This is not entirely machine parsing friendly and sort of
abuses the sysfs concept of one value per file.  Example output taken
from Neo's libvirt RFC:

cat /sys/bus/pci/devices/:86:00.0/mdev_supported_types
# vgpu_type_id, vgpu_type, max_instance, num_heads, frl_config, framebuffer,
max_resolution
11  ,"GRID M60-0B",  16,   2,  45, 512M,2560x1600
12  ,"GRID M60-0Q",  16,   2,  60, 512M,2560x1600
13  ,"GRID M60-1B",   8,   2,  45,1024M,2560x1600
14  ,"GRID M60-1Q",   8,   2,  60,1024M,2560x1600
15  ,"GRID M60-2B",   4,   2,  45,2048M,2560x1600
16  ,"GRID M60-2Q",   4,   4,  60,2048M,2560x1600
17  ,"GRID M60-4Q",   2,   4,  60,4096M,3840x2160
18  ,"GRID M60-8Q",   1,   4,  60,8192M,3840x2160

The create/destroy then looks like this:

echo "$mdev_UUID:vendor_specific_argument_list" >
/sys/bus/pci/devices/.../mdev_create

echo "$mdev_UUID:vendor_specific_argument_list" >
/sys/bus/pci/devices/.../mdev_destroy

"vendor_specific_argument_list" is nebulous.

So the idea to fix this is to explode this into a directory structure,
something like:

├── mdev_destroy
└── mdev_supported_types
├── 11
│   ├── create
│   ├── description
│   └── max_instances
├── 12
│   ├── create
│   ├── description
│   └── max_instances
└── 13
├── create
├── description
└── max_instances

Note that I'm only exposing the minimal attributes here for simplicity,
the other attributes would be included in separate files and we would
require vendors to create standard attributes for common device classes.

For vGPUs like NVIDIA where we don't support multiple types
concurrently, this directory structure would update as mdev devices are
created, removing no longer available types.  I carried forward
max_instances here, but perhaps we really want to copy SR-IOV and
report a max and current allocation.  Creation and deletion is
simplified as we can simply "echo $UUID > create" per type.  I don't
understand why destroy had a parameter list, so here I imagine we can
simply do the same... in fact, I'd actually rather see a "remove" sysfs
entry under each mdev device, so we remove it at the device rather than
in some central location (any objections?).

We discussed how this might look with Intel devices which do allow
mixed vGPU types concurrently.  We believe, but need confirmation, that
the vendor driver could still make a finite set of supported types,
perhaps with additional module options to the vendor driver to enable
more "exotic" types.  So for instance if IGD vGPUs are based on
power-of-2 portions of the framebuffer size, then the vendor driver
could list types with 32MB, 64MB, 128MB, etc in useful and popular
sizes.  As vGPUs are allocated, the larger sizes may become unavailable.

We still don't have any way for the admin to learn in advance how the
available supported types will change once mdev devices start to be
created.  I'm not sure how we can create a specification for this, so
probing by creating devices may be the most flexible model.

The other issue is the start/stop requirement, which was revealed to
setup peer-to-peer resources between vGPUs which is a limited hardware
resource.  We'd really like to have these happen automatically on the
first open of a vfio mdev device file and final release.  So we
brainstormed how the open/release callbacks could know the other mdev
devices for a given user.  This is where the instance number came into
play previously.  This is an area that needs work.

There was a thought that perhaps on open() the vendor driver could look
at the user pid and use that to associate with other devices, but the
problem here is that we open and begin access to each device, so
devices do this discovery serially rather than in parallel as desired.
(we might not fault in mmio space yet though, so I wonder if open()
could set the association of mdev to pid, then the first mmio fault
would trigger the resource allocation?  Then all the "magic" would live
in the vendor driver.  open() could fail if the pid already has running
mdev devices and the vendor driver chooses not to support hotplug)

One comment was that for a GPU that only supports homogeneous vGPUs,
libvirt may choose to create all the vGPUs in advanc