Re: [libvirt] RFC: Creating mediated devices with libvirt

2017-06-23 Thread Daniel P. Berrange
On Thu, Jun 22, 2017 at 05:57:34PM -0400, John Ferlan wrote:
> 
> 
> On 06/14/2017 06:06 PM, Erik Skultety wrote:
> > Hi all,
> > 
> > so there's been an off-list discussion about finally implementing creation 
> > of
> > mediated devices with libvirt and it's more than desired to get as many 
> > opinions
> > on that as possible, so please do share your ideas. This did come up 
> > already as
> > part of some older threads ([1] for example), so this will be a respin of 
> > the
> > discussions. Long story short, we decided to put device creation off and 
> > focus
> > on the introduction of the framework as such first and build upon that 
> > later,
> > i.e. now.
> > 
> > [1] https://www.redhat.com/archives/libvir-list/2017-February/msg00177.html
> > 
> > 
> > PART 1: NODEDEV-DRIVER
> > 
> > 
> > API-wise, device creation through the nodedev driver should be pretty
> > straightforward and without any issues, since virNodeDevCreateXML takes an 
> > XML
> > and does support flags. Looking at the current device XML:
> > 
> > 
> >   mdev_0cce8709_0640_46ef_bd14_962c7f73cc6f
> >   
> > /sys/devices/pci:00/.../0cce8709-0640-46ef-bd14-962c7f73cc6f
> >   pci__03_00_0
> >   
> > vfio_mdev
> >   
> >   
> > 
> > 
> > UUID 
> >   
> > 
> > 
> > We can ignore ,, elements, since these are useless
> > during creation. We also cannot use  since we don't support arbitrary
> > names and we also can't rely on users providing a name in correct form 
> > which we
> > would need to further parse in order to get the UUID.
> > So since the only thing missing to successfully use create an mdev using 
> > XML is
> > the UUID (if user doesn't want it to be generated automatically), how about
> > having a  subelement under  just like PCIs have  
> > and
> > friends, USBs have  & , interfaces have  to uniquely
> > identify the device even if the name itself is unique.
> > Removal of a device should work as well, although we might want to
> > consider creating a *Flags version of the API.
> 
> 
> Has any thought been put towards creating an mdev pool modeled after the
> Storage Pool? Similar to how vHBA's are created from a Storage Pool XML
> definition.
> 
> That way XML could be defined to keep track of a lot of different things
> that you may need and would require only starting the pool in order to
> access.
> 
> Placed "appropriately" - the mdev's could already be available by the
> time node device state initialization occurs too since the pool would
> conceivably been created/defined using data from the physical device and
> the calls to create the virtual devices would have occurred. Much easier
> to add logic to a new driver/pool mgmt to handle whatever considerations
> there are than adding logic into the existing node device driver.

All those things you describe are possible with the node device API,
once we add the inactive object concept that other APIs have. It is
also more flexible to use the node device concept, because it seemlessly
integrates with the physical PCI device management. We've already seen
with SRIOV NICs that mgmt apps needed the flexibility to choose between
assigning the physical NIC, vs assigning individual functions. I expect
the same to be true of mdevs, where you choose between assigning the
GPU PCI device, vs one of the mdev vGPUs.  In OpenStack what I'm expecting
is that the existing PCI device / SRIOV device mgmt code (that is based
on the node device APIs) is genericised to cover arbitrary types of node
device, not simply those with the pci capability. Thus we'd expect mdev
mgmt to be part of the node device APIs framework, not split off in a
separate set of pool APIs. 

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] RFC: Creating mediated devices with libvirt

2017-06-23 Thread Erik Skultety
[...]
> >> So, just for clarification of the concept, the device with ^this UUID will 
> >> have
> >> had to be defined by the nodedev API by the time we start to edit the 
> >> domain
> >> XML in this manner in which case the only thing the autocreate=yes would 
> >> do is
> >> to actually create the mdev according to the nodedev config, right? 
> >> Continuing
> >> with that thought, if UUID doesn't refer to any of the inactive configs it 
> >> will
> >> be an error I suppose? What about the fact that only one vgpu type can 
> >> live on
> >> the GPU? even if you can successfully identify a device using the UUID in 
> >> this
> >> way, you'll still face the problem, that other types might be currently
> >> occupying the GPU and need to be torn down first, will this be automated as
> >> well in what you suggest? I assume not.
> >
> > Technically we shouldn't need the node device to exist at the time we
> > define the XML - only at the time we start the guest, does the node
> > device have to exist. eg same way you list a virtual network as the
> > source of a guest NIC, but that virtual network doesn't have to actually
> > have been defined & started until the guest starts.
> >
> > If there are constraints that a pGPU can only support a certain combination
> > of vGPUs at any single point in time, doesn't the kernel already  enforce
> > that when you try to create the vGPU in sysfs. IOW, we merely need to try
> > to create the vGPU, and if the kernel mdev driver doesn't allow you to mix
> > that with the other vGPUs that already exist, then we'd just report an
> > error from virNodeDeviceCreate, and that'd get propagated back as the
> > error for the virDomainCreate call.
> >
> >>
> >>> 
> >>> 
> >>>   
> >>>
> >>> In the QEMU driver, then the only change required is
> >>>
> >>>if (def->autocreate)
> >>>virNodeDeviceCreate(dev)
> >>
> >> Aha, so if a device gets torn down on shutdown, we won't face the problem 
> >> with
> >> some other devices being active, all of them will have to be in the 
> >> inactive
> >> state because they got torn down during the last shutdown - that would 
> >> work.
> >
> > I'm not sure what the relationship with other active devices is relevant
> > here. The virNodeDevicePtr we're accesing here is a single vGPU - if other
> > running guests have further vGPUs on the same pGPU, that's not really
> > relevant. Each vGPU is created/deleted as required.
>
> I think he's talking about devices that were previously used by other
> domains that are no longer active. Since they're also automatically
> destroyed, they're not a problem.

Yes, that was exactly my point, anyhow, seems like I got a grasp of Dan's
proposal then, great.

Erik

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] RFC: Creating mediated devices with libvirt

2017-06-22 Thread John Ferlan


On 06/14/2017 06:06 PM, Erik Skultety wrote:
> Hi all,
> 
> so there's been an off-list discussion about finally implementing creation of
> mediated devices with libvirt and it's more than desired to get as many 
> opinions
> on that as possible, so please do share your ideas. This did come up already 
> as
> part of some older threads ([1] for example), so this will be a respin of the
> discussions. Long story short, we decided to put device creation off and focus
> on the introduction of the framework as such first and build upon that later,
> i.e. now.
> 
> [1] https://www.redhat.com/archives/libvir-list/2017-February/msg00177.html
> 
> 
> PART 1: NODEDEV-DRIVER
> 
> 
> API-wise, device creation through the nodedev driver should be pretty
> straightforward and without any issues, since virNodeDevCreateXML takes an XML
> and does support flags. Looking at the current device XML:
> 
> 
>   mdev_0cce8709_0640_46ef_bd14_962c7f73cc6f
>   
> /sys/devices/pci:00/.../0cce8709-0640-46ef-bd14-962c7f73cc6f
>   pci__03_00_0
>   
> vfio_mdev
>   
>   
> 
> 
> UUID 
>   
> 
> 
> We can ignore ,, elements, since these are useless
> during creation. We also cannot use  since we don't support arbitrary
> names and we also can't rely on users providing a name in correct form which 
> we
> would need to further parse in order to get the UUID.
> So since the only thing missing to successfully use create an mdev using XML 
> is
> the UUID (if user doesn't want it to be generated automatically), how about
> having a  subelement under  just like PCIs have  and
> friends, USBs have  & , interfaces have  to uniquely
> identify the device even if the name itself is unique.
> Removal of a device should work as well, although we might want to
> consider creating a *Flags version of the API.


Has any thought been put towards creating an mdev pool modeled after the
Storage Pool? Similar to how vHBA's are created from a Storage Pool XML
definition.

That way XML could be defined to keep track of a lot of different things
that you may need and would require only starting the pool in order to
access.

Placed "appropriately" - the mdev's could already be available by the
time node device state initialization occurs too since the pool would
conceivably been created/defined using data from the physical device and
the calls to create the virtual devices would have occurred. Much easier
to add logic to a new driver/pool mgmt to handle whatever considerations
there are than adding logic into the existing node device driver.

Of course if there's only ever going to be a 1-to-1 relationship between
whatever the mdev parent is and an mdev child, then it's probably
overkill to go with a pool model; however, I was under the impression
that an mdev parent could have many mdev children with various different
configuration options depending on multiple factors.

Thus:


  Happy
  UUID
  

...
  
...


where the parent is then "found" in node device via "mdev_%s",  XML that would define specific
"formats" that could be used and made active/inactive. A bit different
than  XML which is output only based on what's found in the
storage pool source.

My recollection of the whole frame work is not up to par with the latest
information, but I recall there being multiple different ways to have
"something" defined that could then be used by the guest based on one
parent mdev. What those things are were a combination of what the mdev
could support and there could be 1 or many depending on the resultant vGPU.

Maybe we need a virtual white board to help describe the things ;-)

If you wait long enough or perhaps if review pace would pick up, maybe
creating a new driver and vir*obj infrastructure will be easier with a
common virObject instance. Oh and this has a "uuid" and "name" for
searches, so fits nicely.

> 
> =
> PART 2: DOMAIN XML & DEVICE AUTO-CREATION, NO POLICY INVOLVED!
> =
> 
> There were some doubts about auto-creation mentioned in [1], although they
> weren't specified further. So hopefully, we'll get further in the discussion
> this time.
> 
>>From my perspective there are two main reasons/benefits to that:
> 
> 1) Convenience
> For apps like virt-manager, user will want to add a host device transparently,
> "hey libvirt, I want an mdev assigned to my VM, can you do that". Even for
> higher management apps, like oVirt, even they might not care about the parent
> device at all times and considering that they would need to enumerate the
> parents, pick one, create the device XML and pass it to the nodedev driver, 
> IMHO
> it would actually be easier and faster to just do it directly through sysfs,
> bypassing libvirt once again

Using "pool" methodology borrows on existing storage technology except
applying it to 

Re: [libvirt] RFC: Creating mediated devices with libvirt

2017-06-22 Thread John Ferlan


On 06/22/2017 11:52 AM, Pavel Hrdina wrote:
> On Thu, Jun 22, 2017 at 09:28:57AM -0600, Alex Williamson wrote:
>> On Thu, 22 Jun 2017 17:14:48 +0200
>> Erik Skultety  wrote:
>>
>>> [...]
>
> ^this is the thing we constantly keep discussing as everyone has a 
> slightly
> different angle of view - libvirt does not implement any kind of policy,
> therefore the only "configuration" would be the PCI parent placement - 
> you say
> what to do and we do it, no logic in it, that's it. Now, I don't 
> understand
> taking care of the guesswork for the user in the simplest manner possible 
> as
> policy rather as a mere convenience, be it just for developers and 
> testers, but
> even that might apparently be perceived as a policy and therefore 
> unacceptable.
>
> I still stand by idea of having auto-creation as unfortunately, I sort of 
> still
> fail to understand what the negative implications of having it are - is 
> that it
> would get just unnecessarily too complex to maintain in the future that 
> we would
> regret it or that we'd get a huge amount of follow-up requests for 
> extending the
> feature or is it just that simply the interpretation of auto-create == 
> policy?  

 The increasing complexity of the qemu driver is a significant concern with
 adding policy based logic to the code. THinking about this though, if we
 provide the inactive node device feature, then we can avoid essentially
 all new code and complexity QEMU driver, and still support auto-create.

 ie, in the domain XML we just continue to have the exact same XML that
 we already have today for mdevs, but with a single new attribute
 autocreate=yes|no

   
 >>> autocreate="yes">
 
 
>>>
>>> So, just for clarification of the concept, the device with ^this UUID will 
>>> have
>>> had to be defined by the nodedev API by the time we start to edit the domain
>>> XML in this manner in which case the only thing the autocreate=yes would do 
>>> is
>>> to actually create the mdev according to the nodedev config, right? 
>>> Continuing
>>> with that thought, if UUID doesn't refer to any of the inactive configs it 
>>> will
>>> be an error I suppose? What about the fact that only one vgpu type can live 
>>> on
>>> the GPU? even if you can successfully identify a device using the UUID in 
>>> this
>>> way, you'll still face the problem, that other types might be currently
>>> occupying the GPU and need to be torn down first, will this be automated as
>>> well in what you suggest? I assume not.
>>>
 
 
   

 In the QEMU driver, then the only change required is

if (def->autocreate)
virNodeDeviceCreate(dev)  
>>>
>>> Aha, so if a device gets torn down on shutdown, we won't face the problem 
>>> with
>>> some other devices being active, all of them will have to be in the inactive
>>> state because they got torn down during the last shutdown - that would work.
>>
>>
>> I'm not familiar with how inactive devices would be defined in the
>> nodedev API, would someone mind explaining or providing an example
>> please?  I don't understand where the metadata is stored that describes
>> the what and where of a given UUID.  Thanks,
> 
> It would basically copy what we do for domains.  Currently there is
> virNodeDeviceCreateXML() which takes the XML definitions and creates a
> new active node device and virNodeDeviceDestroy() which takes as
> argument an object of existing active node device.

FWIW: (Just in case someone doesn't know yet...) The only current
CreateXML consumer is for NPIV/vHBA devices. As I've pointed out before
I see a lot of similarities w/ mdev because they both have a dependency
on "something else" in order for proper creation. NPIV/vHBA requires an
HBA (scsi_hostN) that has a sysfs structure with a vport_create function
to create the vHBA. The HBA scsi_hostN is instantiated during
udevEnumerateDevices processing while the vHBA scsi_hostM is created
during udevEventHandleCallback.

The CreateXML provides an essentially 'transient' model to describe
a(the) vHBA device(s). After host reboot, one would have to run virsh
nodedev-create file.xml in order to recreate their vHBA.

In order to create more permanent vHBA's, it's possible to define a
storage pool that would create the vHBA when the storage pool is
started. So while there's no DefineXML support, there is a model that
does provide a mechanism to have persistence without needing to have a
DefineXML for node devices.

> 
> We would extend the functionality with new APIs:
> 
>   - virNodeDeviceCreate() which would take as argument an object of
> existing inactive node device.
> 
>   - virNodeDeviceDefineXML() would define the node device as inactive.
> 
> With the virNodeDeviceDefineXML() you would create a list of predefined
> inactive 

Re: [libvirt] RFC: Creating mediated devices with libvirt

2017-06-22 Thread Daniel P. Berrange
On Thu, Jun 22, 2017 at 12:33:16PM -0400, Laine Stump wrote:
> On 06/22/2017 11:28 AM, Alex Williamson wrote:
> > On Thu, 22 Jun 2017 17:14:48 +0200
> > Erik Skultety  wrote:
> > 
> >> [...]
> 
>  ^this is the thing we constantly keep discussing as everyone has a 
>  slightly
>  different angle of view - libvirt does not implement any kind of policy,
>  therefore the only "configuration" would be the PCI parent placement - 
>  you say
>  what to do and we do it, no logic in it, that's it. Now, I don't 
>  understand
>  taking care of the guesswork for the user in the simplest manner 
>  possible as
>  policy rather as a mere convenience, be it just for developers and 
>  testers, but
>  even that might apparently be perceived as a policy and therefore 
>  unacceptable.
> 
>  I still stand by idea of having auto-creation as unfortunately, I sort 
>  of still
>  fail to understand what the negative implications of having it are - is 
>  that it
>  would get just unnecessarily too complex to maintain in the future that 
>  we would
>  regret it or that we'd get a huge amount of follow-up requests for 
>  extending the
>  feature or is it just that simply the interpretation of auto-create == 
>  policy?  
> >>>
> >>> The increasing complexity of the qemu driver is a significant concern with
> >>> adding policy based logic to the code. THinking about this though, if we
> >>> provide the inactive node device feature, then we can avoid essentially
> >>> all new code and complexity QEMU driver, and still support auto-create.
> >>>
> >>> ie, in the domain XML we just continue to have the exact same XML that
> >>> we already have today for mdevs, but with a single new attribute
> >>> autocreate=yes|no
> >>>
> >>>   
> >>>  >>> autocreate="yes">
> >>> 
> >>> 
> >>
> >> So, just for clarification of the concept, the device with ^this UUID will 
> >> have
> >> had to be defined by the nodedev API by the time we start to edit the 
> >> domain
> >> XML in this manner in which case the only thing the autocreate=yes would 
> >> do is
> >> to actually create the mdev according to the nodedev config, right? 
> >> Continuing
> >> with that thought, if UUID doesn't refer to any of the inactive configs it 
> >> will
> >> be an error I suppose? What about the fact that only one vgpu type can 
> >> live on
> >> the GPU? even if you can successfully identify a device using the UUID in 
> >> this
> >> way, you'll still face the problem, that other types might be currently
> >> occupying the GPU and need to be torn down first, will this be automated as
> >> well in what you suggest? I assume not.
> >>
> >>> 
> >>> 
> >>>   
> >>>
> >>> In the QEMU driver, then the only change required is
> >>>
> >>>if (def->autocreate)
> >>>virNodeDeviceCreate(dev)  
> >>
> >> Aha, so if a device gets torn down on shutdown, we won't face the problem 
> >> with
> >> some other devices being active, all of them will have to be in the 
> >> inactive
> >> state because they got torn down during the last shutdown - that would 
> >> work.
> > 
> > 
> > I'm not familiar with how inactive devices would be defined in the
> > nodedev API, would someone mind explaining or providing an example
> > please?  I don't understand where the metadata is stored that describes
> > the what and where of a given UUID.  Thanks,
> 
> You don't understand it because it doesn't exist yet :-)
> 
> The idea is essentially the same that we've talked about, except that
> all the information about parent PCI address, desired type of child, and
> anything else (is there anything else?) is stored in some
> not-yet-specified persistent node device config rather than directly in
> the domain XML. Maybe something like:
> 
>   
> BobLobLaw
> 
>   
> 
> 
>   
> 
> I haven't thought about how it would show the difference between active
> and inactive - didn't get enough coffee today and I have a headache.

The XML doesn't need to show the difference between active & inactive.

That distinction is something you filter on when querying the list
of devices. We'd want to add  a virNodeDeviceIsActive() API like
we have for other objects too, so you can query it afterwards too.


> ... okay, another "shower thought" is coming in... One deficiency of
> this comes to mind - since the domain config references the device by
> uuid, and an existing child device's uuid can't be changed, the unique
> uuid used by a particular domain must be defined on all of the hosts
> that the domain might be moved to. And since other domains can't share
> that uuid (unless you're 100% sure they'll never be active at the same
> time), you won't be able to implement the alternate idea of "pre-create
> all the devices, then assign them to domains as needed"; instead, you'll
> be forced to use the "create-on-demand" model.

You can still 

Re: [libvirt] RFC: Creating mediated devices with libvirt

2017-06-22 Thread Laine Stump
On 06/22/2017 12:15 PM, Daniel P. Berrange wrote:
> On Thu, Jun 22, 2017 at 05:14:48PM +0200, Erik Skultety wrote:
>> [...]

 ^this is the thing we constantly keep discussing as everyone has a slightly
 different angle of view - libvirt does not implement any kind of policy,
 therefore the only "configuration" would be the PCI parent placement - you 
 say
 what to do and we do it, no logic in it, that's it. Now, I don't understand
 taking care of the guesswork for the user in the simplest manner possible 
 as
 policy rather as a mere convenience, be it just for developers and 
 testers, but
 even that might apparently be perceived as a policy and therefore 
 unacceptable.

 I still stand by idea of having auto-creation as unfortunately, I sort of 
 still
 fail to understand what the negative implications of having it are - is 
 that it
 would get just unnecessarily too complex to maintain in the future that we 
 would
 regret it or that we'd get a huge amount of follow-up requests for 
 extending the
 feature or is it just that simply the interpretation of auto-create == 
 policy?
>>>
>>> The increasing complexity of the qemu driver is a significant concern with
>>> adding policy based logic to the code. THinking about this though, if we
>>> provide the inactive node device feature, then we can avoid essentially
>>> all new code and complexity QEMU driver, and still support auto-create.
>>>
>>> ie, in the domain XML we just continue to have the exact same XML that
>>> we already have today for mdevs, but with a single new attribute
>>> autocreate=yes|no
>>>
>>>   
>>> 
>>> 
>>>   
>>
>> So, just for clarification of the concept, the device with ^this UUID will 
>> have
>> had to be defined by the nodedev API by the time we start to edit the domain
>> XML in this manner in which case the only thing the autocreate=yes would do 
>> is
>> to actually create the mdev according to the nodedev config, right? 
>> Continuing
>> with that thought, if UUID doesn't refer to any of the inactive configs it 
>> will
>> be an error I suppose? What about the fact that only one vgpu type can live 
>> on
>> the GPU? even if you can successfully identify a device using the UUID in 
>> this
>> way, you'll still face the problem, that other types might be currently
>> occupying the GPU and need to be torn down first, will this be automated as
>> well in what you suggest? I assume not.
> 
> Technically we shouldn't need the node device to exist at the time we
> define the XML - only at the time we start the guest, does the node
> device have to exist. eg same way you list a virtual network as the
> source of a guest NIC, but that virtual network doesn't have to actually
> have been defined & started until the guest starts.
> 
> If there are constraints that a pGPU can only support a certain combination
> of vGPUs at any single point in time, doesn't the kernel already  enforce
> that when you try to create the vGPU in sysfs. IOW, we merely need to try
> to create the vGPU, and if the kernel mdev driver doesn't allow you to mix
> that with the other vGPUs that already exist, then we'd just report an
> error from virNodeDeviceCreate, and that'd get propagated back as the
> error for the virDomainCreate call.
> 
>>
>>> 
>>> 
>>>   
>>>
>>> In the QEMU driver, then the only change required is
>>>
>>>if (def->autocreate)
>>>virNodeDeviceCreate(dev)
>>
>> Aha, so if a device gets torn down on shutdown, we won't face the problem 
>> with
>> some other devices being active, all of them will have to be in the inactive
>> state because they got torn down during the last shutdown - that would work.
> 
> I'm not sure what the relationship with other active devices is relevant
> here. The virNodeDevicePtr we're accesing here is a single vGPU - if other
> running guests have further vGPUs on the same pGPU, that's not really
> relevant. Each vGPU is created/deleted as required.

I think he's talking about devices that were previously used by other
domains that are no longer active. Since they're also automatically
destroyed, they're not a problem.

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] RFC: Creating mediated devices with libvirt

2017-06-22 Thread Laine Stump
On 06/22/2017 11:28 AM, Alex Williamson wrote:
> On Thu, 22 Jun 2017 17:14:48 +0200
> Erik Skultety  wrote:
> 
>> [...]

 ^this is the thing we constantly keep discussing as everyone has a slightly
 different angle of view - libvirt does not implement any kind of policy,
 therefore the only "configuration" would be the PCI parent placement - you 
 say
 what to do and we do it, no logic in it, that's it. Now, I don't understand
 taking care of the guesswork for the user in the simplest manner possible 
 as
 policy rather as a mere convenience, be it just for developers and 
 testers, but
 even that might apparently be perceived as a policy and therefore 
 unacceptable.

 I still stand by idea of having auto-creation as unfortunately, I sort of 
 still
 fail to understand what the negative implications of having it are - is 
 that it
 would get just unnecessarily too complex to maintain in the future that we 
 would
 regret it or that we'd get a huge amount of follow-up requests for 
 extending the
 feature or is it just that simply the interpretation of auto-create == 
 policy?  
>>>
>>> The increasing complexity of the qemu driver is a significant concern with
>>> adding policy based logic to the code. THinking about this though, if we
>>> provide the inactive node device feature, then we can avoid essentially
>>> all new code and complexity QEMU driver, and still support auto-create.
>>>
>>> ie, in the domain XML we just continue to have the exact same XML that
>>> we already have today for mdevs, but with a single new attribute
>>> autocreate=yes|no
>>>
>>>   
>>> 
>>> 
>>> 
>>
>> So, just for clarification of the concept, the device with ^this UUID will 
>> have
>> had to be defined by the nodedev API by the time we start to edit the domain
>> XML in this manner in which case the only thing the autocreate=yes would do 
>> is
>> to actually create the mdev according to the nodedev config, right? 
>> Continuing
>> with that thought, if UUID doesn't refer to any of the inactive configs it 
>> will
>> be an error I suppose? What about the fact that only one vgpu type can live 
>> on
>> the GPU? even if you can successfully identify a device using the UUID in 
>> this
>> way, you'll still face the problem, that other types might be currently
>> occupying the GPU and need to be torn down first, will this be automated as
>> well in what you suggest? I assume not.
>>
>>> 
>>> 
>>>   
>>>
>>> In the QEMU driver, then the only change required is
>>>
>>>if (def->autocreate)
>>>virNodeDeviceCreate(dev)  
>>
>> Aha, so if a device gets torn down on shutdown, we won't face the problem 
>> with
>> some other devices being active, all of them will have to be in the inactive
>> state because they got torn down during the last shutdown - that would work.
> 
> 
> I'm not familiar with how inactive devices would be defined in the
> nodedev API, would someone mind explaining or providing an example
> please?  I don't understand where the metadata is stored that describes
> the what and where of a given UUID.  Thanks,

You don't understand it because it doesn't exist yet :-)

The idea is essentially the same that we've talked about, except that
all the information about parent PCI address, desired type of child, and
anything else (is there anything else?) is stored in some
not-yet-specified persistent node device config rather than directly in
the domain XML. Maybe something like:

  
BobLobLaw

  


  

I haven't thought about how it would show the difference between active
and inactive - didn't get enough coffee today and I have a headache.

The advantage of this is that it uncouples the  specifics of the child
device from the domain XML - the only thing in the domain XML is the
uuid. So a device config with that uuid would need to exist on every
host where you wanted to run a particular guest, but the details could
be different, yet you wouldn't need to edit the domain XML. This is a
similar concept to the idea of creating libvirt networks that are just
an indirect pointer to a bridge device (which may have a different name
on each host) or to an SRIOV PF (yeah, I know Dan doesn't like that
feature, but I find it very useful, and unobtrusive if management
chooses not to use it).

So from your point of view (I'm talking to Alex here), implementing it
this way would mean that you would need to create the child device
definitions in the nodedev driver once (and possibly/hopefully the uuid
of the devices would be autogenerated, same as we do for uuids in other
parts of libvirt config), then copy that uuid to the domain config one
time. But after doing that once, you would be able to start and stop
domains and the host without any extra action. You could also define
different nodedevices that used the same parent for different child
types, and reference them 

Re: [libvirt] RFC: Creating mediated devices with libvirt

2017-06-22 Thread Daniel P. Berrange
On Thu, Jun 22, 2017 at 05:14:48PM +0200, Erik Skultety wrote:
> [...]
> > >
> > > ^this is the thing we constantly keep discussing as everyone has a 
> > > slightly
> > > different angle of view - libvirt does not implement any kind of policy,
> > > therefore the only "configuration" would be the PCI parent placement - 
> > > you say
> > > what to do and we do it, no logic in it, that's it. Now, I don't 
> > > understand
> > > taking care of the guesswork for the user in the simplest manner possible 
> > > as
> > > policy rather as a mere convenience, be it just for developers and 
> > > testers, but
> > > even that might apparently be perceived as a policy and therefore 
> > > unacceptable.
> > >
> > > I still stand by idea of having auto-creation as unfortunately, I sort of 
> > > still
> > > fail to understand what the negative implications of having it are - is 
> > > that it
> > > would get just unnecessarily too complex to maintain in the future that 
> > > we would
> > > regret it or that we'd get a huge amount of follow-up requests for 
> > > extending the
> > > feature or is it just that simply the interpretation of auto-create == 
> > > policy?
> >
> > The increasing complexity of the qemu driver is a significant concern with
> > adding policy based logic to the code. THinking about this though, if we
> > provide the inactive node device feature, then we can avoid essentially
> > all new code and complexity QEMU driver, and still support auto-create.
> >
> > ie, in the domain XML we just continue to have the exact same XML that
> > we already have today for mdevs, but with a single new attribute
> > autocreate=yes|no
> >
> >   
> > 
> > 
> >   
> 
> So, just for clarification of the concept, the device with ^this UUID will 
> have
> had to be defined by the nodedev API by the time we start to edit the domain
> XML in this manner in which case the only thing the autocreate=yes would do is
> to actually create the mdev according to the nodedev config, right? Continuing
> with that thought, if UUID doesn't refer to any of the inactive configs it 
> will
> be an error I suppose? What about the fact that only one vgpu type can live on
> the GPU? even if you can successfully identify a device using the UUID in this
> way, you'll still face the problem, that other types might be currently
> occupying the GPU and need to be torn down first, will this be automated as
> well in what you suggest? I assume not.

Technically we shouldn't need the node device to exist at the time we
define the XML - only at the time we start the guest, does the node
device have to exist. eg same way you list a virtual network as the
source of a guest NIC, but that virtual network doesn't have to actually
have been defined & started until the guest starts.

If there are constraints that a pGPU can only support a certain combination
of vGPUs at any single point in time, doesn't the kernel already  enforce
that when you try to create the vGPU in sysfs. IOW, we merely need to try
to create the vGPU, and if the kernel mdev driver doesn't allow you to mix
that with the other vGPUs that already exist, then we'd just report an
error from virNodeDeviceCreate, and that'd get propagated back as the
error for the virDomainCreate call.

> 
> > 
> > 
> >   
> >
> > In the QEMU driver, then the only change required is
> >
> >if (def->autocreate)
> >virNodeDeviceCreate(dev)
> 
> Aha, so if a device gets torn down on shutdown, we won't face the problem with
> some other devices being active, all of them will have to be in the inactive
> state because they got torn down during the last shutdown - that would work.

I'm not sure what the relationship with other active devices is relevant
here. The virNodeDevicePtr we're accesing here is a single vGPU - if other
running guests have further vGPUs on the same pGPU, that's not really
relevant. Each vGPU is created/deleted as required.

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] RFC: Creating mediated devices with libvirt

2017-06-22 Thread Pavel Hrdina
On Thu, Jun 22, 2017 at 09:28:57AM -0600, Alex Williamson wrote:
> On Thu, 22 Jun 2017 17:14:48 +0200
> Erik Skultety  wrote:
> 
> > [...]
> > > >
> > > > ^this is the thing we constantly keep discussing as everyone has a 
> > > > slightly
> > > > different angle of view - libvirt does not implement any kind of policy,
> > > > therefore the only "configuration" would be the PCI parent placement - 
> > > > you say
> > > > what to do and we do it, no logic in it, that's it. Now, I don't 
> > > > understand
> > > > taking care of the guesswork for the user in the simplest manner 
> > > > possible as
> > > > policy rather as a mere convenience, be it just for developers and 
> > > > testers, but
> > > > even that might apparently be perceived as a policy and therefore 
> > > > unacceptable.
> > > >
> > > > I still stand by idea of having auto-creation as unfortunately, I sort 
> > > > of still
> > > > fail to understand what the negative implications of having it are - is 
> > > > that it
> > > > would get just unnecessarily too complex to maintain in the future that 
> > > > we would
> > > > regret it or that we'd get a huge amount of follow-up requests for 
> > > > extending the
> > > > feature or is it just that simply the interpretation of auto-create == 
> > > > policy?  
> > >
> > > The increasing complexity of the qemu driver is a significant concern with
> > > adding policy based logic to the code. THinking about this though, if we
> > > provide the inactive node device feature, then we can avoid essentially
> > > all new code and complexity QEMU driver, and still support auto-create.
> > >
> > > ie, in the domain XML we just continue to have the exact same XML that
> > > we already have today for mdevs, but with a single new attribute
> > > autocreate=yes|no
> > >
> > >   
> > >  > > autocreate="yes">
> > > 
> > > 
> > 
> > So, just for clarification of the concept, the device with ^this UUID will 
> > have
> > had to be defined by the nodedev API by the time we start to edit the domain
> > XML in this manner in which case the only thing the autocreate=yes would do 
> > is
> > to actually create the mdev according to the nodedev config, right? 
> > Continuing
> > with that thought, if UUID doesn't refer to any of the inactive configs it 
> > will
> > be an error I suppose? What about the fact that only one vgpu type can live 
> > on
> > the GPU? even if you can successfully identify a device using the UUID in 
> > this
> > way, you'll still face the problem, that other types might be currently
> > occupying the GPU and need to be torn down first, will this be automated as
> > well in what you suggest? I assume not.
> > 
> > > 
> > > 
> > >   
> > >
> > > In the QEMU driver, then the only change required is
> > >
> > >if (def->autocreate)
> > >virNodeDeviceCreate(dev)  
> > 
> > Aha, so if a device gets torn down on shutdown, we won't face the problem 
> > with
> > some other devices being active, all of them will have to be in the inactive
> > state because they got torn down during the last shutdown - that would work.
> 
> 
> I'm not familiar with how inactive devices would be defined in the
> nodedev API, would someone mind explaining or providing an example
> please?  I don't understand where the metadata is stored that describes
> the what and where of a given UUID.  Thanks,

It would basically copy what we do for domains.  Currently there is
virNodeDeviceCreateXML() which takes the XML definitions and creates a
new active node device and virNodeDeviceDestroy() which takes as
argument an object of existing active node device.

We would extend the functionality with new APIs:

  - virNodeDeviceCreate() which would take as argument an object of
existing inactive node device.

  - virNodeDeviceDefineXML() would define the node device as inactive.

With the virNodeDeviceDefineXML() you would create a list of predefined
inactive devices which could be obtained by
virConnectListAllNodeDevices() for example.

Internally we would store XML files the same way as we do for domains,
somewhere in "/etc/libvirt/..." and like with domains the APIs would
work with these files.

In virsh terms there would be similar analogy to the domain commands:

"virsh nodedev-start" could simply map to virNodeDeviceCreate() and
would work like "virsh start" for domains and "virsh nodedev-define"
woudl map to virNodeDeviceDefineXML() and work the same way as
"virsh define".  You could simply list the predefined mdev devices
using "virsh nodedev-list", get UUID of existing mdev device and use it
in a domain.

In virt-manager there could be new type of hostdev device where you
could select on of existing mdev devices from a drop-down list where
virt-manager would show nice user-friendly descriptions of the mdev
devices but under the hood it would put the UUID in the domain XML.

Pavel

> 
> Alex
> 
> --
> libvir-list mailing list
> libvir-list@redhat.com
> 

Re: [libvirt] RFC: Creating mediated devices with libvirt

2017-06-22 Thread Alex Williamson
On Thu, 22 Jun 2017 17:14:48 +0200
Erik Skultety  wrote:

> [...]
> > >
> > > ^this is the thing we constantly keep discussing as everyone has a 
> > > slightly
> > > different angle of view - libvirt does not implement any kind of policy,
> > > therefore the only "configuration" would be the PCI parent placement - 
> > > you say
> > > what to do and we do it, no logic in it, that's it. Now, I don't 
> > > understand
> > > taking care of the guesswork for the user in the simplest manner possible 
> > > as
> > > policy rather as a mere convenience, be it just for developers and 
> > > testers, but
> > > even that might apparently be perceived as a policy and therefore 
> > > unacceptable.
> > >
> > > I still stand by idea of having auto-creation as unfortunately, I sort of 
> > > still
> > > fail to understand what the negative implications of having it are - is 
> > > that it
> > > would get just unnecessarily too complex to maintain in the future that 
> > > we would
> > > regret it or that we'd get a huge amount of follow-up requests for 
> > > extending the
> > > feature or is it just that simply the interpretation of auto-create == 
> > > policy?  
> >
> > The increasing complexity of the qemu driver is a significant concern with
> > adding policy based logic to the code. THinking about this though, if we
> > provide the inactive node device feature, then we can avoid essentially
> > all new code and complexity QEMU driver, and still support auto-create.
> >
> > ie, in the domain XML we just continue to have the exact same XML that
> > we already have today for mdevs, but with a single new attribute
> > autocreate=yes|no
> >
> >   
> > 
> > 
> > 
> 
> So, just for clarification of the concept, the device with ^this UUID will 
> have
> had to be defined by the nodedev API by the time we start to edit the domain
> XML in this manner in which case the only thing the autocreate=yes would do is
> to actually create the mdev according to the nodedev config, right? Continuing
> with that thought, if UUID doesn't refer to any of the inactive configs it 
> will
> be an error I suppose? What about the fact that only one vgpu type can live on
> the GPU? even if you can successfully identify a device using the UUID in this
> way, you'll still face the problem, that other types might be currently
> occupying the GPU and need to be torn down first, will this be automated as
> well in what you suggest? I assume not.
> 
> > 
> > 
> >   
> >
> > In the QEMU driver, then the only change required is
> >
> >if (def->autocreate)
> >virNodeDeviceCreate(dev)  
> 
> Aha, so if a device gets torn down on shutdown, we won't face the problem with
> some other devices being active, all of them will have to be in the inactive
> state because they got torn down during the last shutdown - that would work.


I'm not familiar with how inactive devices would be defined in the
nodedev API, would someone mind explaining or providing an example
please?  I don't understand where the metadata is stored that describes
the what and where of a given UUID.  Thanks,

Alex

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] RFC: Creating mediated devices with libvirt

2017-06-22 Thread Erik Skultety
[...]
> >
> > ^this is the thing we constantly keep discussing as everyone has a slightly
> > different angle of view - libvirt does not implement any kind of policy,
> > therefore the only "configuration" would be the PCI parent placement - you 
> > say
> > what to do and we do it, no logic in it, that's it. Now, I don't understand
> > taking care of the guesswork for the user in the simplest manner possible as
> > policy rather as a mere convenience, be it just for developers and testers, 
> > but
> > even that might apparently be perceived as a policy and therefore 
> > unacceptable.
> >
> > I still stand by idea of having auto-creation as unfortunately, I sort of 
> > still
> > fail to understand what the negative implications of having it are - is 
> > that it
> > would get just unnecessarily too complex to maintain in the future that we 
> > would
> > regret it or that we'd get a huge amount of follow-up requests for 
> > extending the
> > feature or is it just that simply the interpretation of auto-create == 
> > policy?
>
> The increasing complexity of the qemu driver is a significant concern with
> adding policy based logic to the code. THinking about this though, if we
> provide the inactive node device feature, then we can avoid essentially
> all new code and complexity QEMU driver, and still support auto-create.
>
> ie, in the domain XML we just continue to have the exact same XML that
> we already have today for mdevs, but with a single new attribute
> autocreate=yes|no
>
>   
> 
> 
>   

So, just for clarification of the concept, the device with ^this UUID will have
had to be defined by the nodedev API by the time we start to edit the domain
XML in this manner in which case the only thing the autocreate=yes would do is
to actually create the mdev according to the nodedev config, right? Continuing
with that thought, if UUID doesn't refer to any of the inactive configs it will
be an error I suppose? What about the fact that only one vgpu type can live on
the GPU? even if you can successfully identify a device using the UUID in this
way, you'll still face the problem, that other types might be currently
occupying the GPU and need to be torn down first, will this be automated as
well in what you suggest? I assume not.

> 
> 
>   
>
> In the QEMU driver, then the only change required is
>
>if (def->autocreate)
>virNodeDeviceCreate(dev)

Aha, so if a device gets torn down on shutdown, we won't face the problem with
some other devices being active, all of them will have to be in the inactive
state because they got torn down during the last shutdown - that would work.

Erik

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] RFC: Creating mediated devices with libvirt

2017-06-22 Thread Daniel P. Berrange
On Thu, Jun 22, 2017 at 10:41:13AM +0200, Martin Polednik wrote:
> On 16/06/17 18:14 +0100, Daniel P. Berrange wrote:
> > On Fri, Jun 16, 2017 at 06:11:17PM +0100, Daniel P. Berrange wrote:
> > > On Fri, Jun 16, 2017 at 11:02:55AM -0600, Alex Williamson wrote:
> > > > On Fri, 16 Jun 2017 11:32:04 -0400
> > > > Laine Stump  wrote:
> > > >
> > > > > On 06/15/2017 02:42 PM, Alex Williamson wrote:
> > > > > > On Thu, 15 Jun 2017 09:33:01 +0100
> > > > > > "Daniel P. Berrange"  wrote:
> > > > > >
> > > > > >> On Thu, Jun 15, 2017 at 12:06:43AM +0200, Erik Skultety wrote:
> > > > > >>> Hi all,
> > > > > >>>
> > > > > >>> so there's been an off-list discussion about finally implementing 
> > > > > >>> creation of
> > > > > >>> mediated devices with libvirt and it's more than desired to get 
> > > > > >>> as many opinions
> > > > > >>> on that as possible, so please do share your ideas. This did come 
> > > > > >>> up already as
> > > > > >>> part of some older threads ([1] for example), so this will be a 
> > > > > >>> respin of the
> > > > > >>> discussions. Long story short, we decided to put device creation 
> > > > > >>> off and focus
> > > > > >>> on the introduction of the framework as such first and build upon 
> > > > > >>> that later,
> > > > > >>> i.e. now.
> > > > > >>>
> > > > > >>> [1] 
> > > > > >>> https://www.redhat.com/archives/libvir-list/2017-February/msg00177.html
> > > > > >>>
> > > > > >>> 
> > > > > >>> PART 1: NODEDEV-DRIVER
> > > > > >>> 
> > > > > >>>
> > > > > >>> API-wise, device creation through the nodedev driver should be 
> > > > > >>> pretty
> > > > > >>> straightforward and without any issues, since virNodeDevCreateXML 
> > > > > >>> takes an XML
> > > > > >>> and does support flags. Looking at the current device XML:
> > > > > >>>
> > > > > >>> 
> > > > > >>>   mdev_0cce8709_0640_46ef_bd14_962c7f73cc6f
> > > > > >>>   
> > > > > >>> /sys/devices/pci:00/.../0cce8709-0640-46ef-bd14-962c7f73cc6f
> > > > > >>>   pci__03_00_0
> > > > > >>>   
> > > > > >>> vfio_mdev
> > > > > >>>   
> > > > > >>>   
> > > > > >>> 
> > > > > >>> 
> > > > > >>> UUID 
> > > > > >>>   
> > > > > >>> 
> > > > > >>>
> > > > > >>> We can ignore ,, elements, since these 
> > > > > >>> are useless
> > > > > >>> during creation. We also cannot use  since we don't support 
> > > > > >>> arbitrary
> > > > > >>> names and we also can't rely on users providing a name in correct 
> > > > > >>> form which we
> > > > > >>> would need to further parse in order to get the UUID.
> > > > > >>> So since the only thing missing to successfully use create an 
> > > > > >>> mdev using XML is
> > > > > >>> the UUID (if user doesn't want it to be generated automatically), 
> > > > > >>> how about
> > > > > >>> having a  subelement under  just like PCIs have 
> > > > > >>>  and
> > > > > >>> friends, USBs have  & , interfaces have  to 
> > > > > >>> uniquely
> > > > > >>> identify the device even if the name itself is unique.
> > > > > >>> Removal of a device should work as well, although we might want to
> > > > > >>> consider creating a *Flags version of the API.
> > > > > >>>
> > > > > >>> =
> > > > > >>> PART 2: DOMAIN XML & DEVICE AUTO-CREATION, NO POLICY INVOLVED!
> > > > > >>> =
> > > > > >>>
> > > > > >>> There were some doubts about auto-creation mentioned in [1], 
> > > > > >>> although they
> > > > > >>> weren't specified further. So hopefully, we'll get further in the 
> > > > > >>> discussion
> > > > > >>> this time.
> > > > > >>>
> > > > > >>> From my perspective there are two main reasons/benefits to that:
> > > > > >>>
> > > > > >>> 1) Convenience
> > > > > >>> For apps like virt-manager, user will want to add a host device 
> > > > > >>> transparently,
> > > > > >>> "hey libvirt, I want an mdev assigned to my VM, can you do that". 
> > > > > >>> Even for
> > > > > >>> higher management apps, like oVirt, even they might not care 
> > > > > >>> about the parent
> > > > > >>> device at all times and considering that they would need to 
> > > > > >>> enumerate the
> > > > > >>> parents, pick one, create the device XML and pass it to the 
> > > > > >>> nodedev driver, IMHO
> > > > > >>> it would actually  be easier and faster to just do it directly 
> > > > > >>> through sysfs,
> > > > > >>> bypassing libvirt once again
> > > > > >>
> > > > > >> The convenience only works if the policy we've provided in libvirt 
> > > > > >> actually
> > > > > >> matches the policy the application wants. I think it is quite 
> > > > > >> likely that with
> > > > > >> cloud the mdevs will be created out of band from the domain 
> > > > > >> startup process.
> > > > > >> It is possible the app will just have a fixed set of mdevs 
> > > > > >> pre-created when
> > > > > >> 

Re: [libvirt] RFC: Creating mediated devices with libvirt

2017-06-22 Thread Daniel P. Berrange
On Thu, Jun 22, 2017 at 02:05:26PM +0200, Erik Skultety wrote:
> On Thu, Jun 22, 2017 at 10:41:13AM +0200, Martin Polednik wrote:
> > On 16/06/17 18:14 +0100, Daniel P. Berrange wrote:
> > > On Fri, Jun 16, 2017 at 06:11:17PM +0100, Daniel P. Berrange wrote:
> > > >
> > > > I'm fine with libvirt having APIs in the node device APIs to enable
> > > > create/delete with libvirt, as well as using managed=yes in the same
> > > > manner that we do for regular PCI devices (the bind/unbind to vfio
> > > > or pci-back)
> > >
> > > Oh, and we really need to fix the big missing feature in the node
> > > device APIs of persistent, inactive configs. eg we should be able
> > > to record XML configs of mdevs (and npiv devices too), in /etc/libvirt
> > > so they persist across reboots, and can be setup for auto-start on
> > > boot too.
> >
> > That doesn't help mdev in any way though. It doesn't make sense to
> > generate new UUID for given VM at each start. So in case of
> 
> What statement does this^^ refer to? Why would you generate a new UUID for a 
> VM
> at each start, you'd generate it only once and then store it, the same way as
> domain UUIDs work.
> 
> > single host, the persistent file is redundant to the domain XML (as
> > long as uuid+parent is in the xml) and in case of cluster we'd have to
> 
> Right now you don't have any info about the parent device in the domain XML 
> and
> such data would only exist in the XML if we all agreed on auto-creating mdevs,
> in which case persistent configs in nodedev would be unnecessary and 
> vice-versa.
> 
> > copy all possible VM mdev definitions to all the hosts.
> 
> ^For mdev configs, you might be better off with creating them explicitly than
> copying configs, simply because given the information the XML has, you might
> conflict with UUIDs between hosts, so you'd have to take care for that. 
> Parents
> have different PCI addresses that most probably wouldn't match across hosts, 
> so
> from automation point of view, I think writing a stub recreating the whole set
> of devices/configs might actually be easier than copying & handling them
> (solely because the 2 things left - after the ones I mentioned - in the XML 
> are
> the vgpu type and IOMMU group number which AFAIK cannot be requested 
> explicitly).

Yep, separately the mdev config from the domain config is a significant
benefit as it makes the domain config independant of the particular device
you've attached to which can vary across hosts.

> > The idea works nicely if you had such definitions accessible in the
> > cluster and could define a group of devices (gpu+soundcard, single
> > mdev, single vf, ...) that would later be assigned to a VM (let's hope
> > kubevirt can get there).
> >
> > As for automatic creation, I think it's on the "nice to have" level.
> > So far libvirt is close to useless when working with mdevs as all the
> > data is in the same sysfs place where create/delete endpoints are - as
> > mentioned earlier, we can just get the data and do everything directly
> > from there instead of dealing with XML and bunch of new API calls.
> > Having at least some *configurable* auto create policy might add some
> 
> ^this is the thing we constantly keep discussing as everyone has a slightly
> different angle of view - libvirt does not implement any kind of policy,
> therefore the only "configuration" would be the PCI parent placement - you say
> what to do and we do it, no logic in it, that's it. Now, I don't understand
> taking care of the guesswork for the user in the simplest manner possible as
> policy rather as a mere convenience, be it just for developers and testers, 
> but
> even that might apparently be perceived as a policy and therefore 
> unacceptable.
> 
> I still stand by idea of having auto-creation as unfortunately, I sort of 
> still
> fail to understand what the negative implications of having it are - is that 
> it
> would get just unnecessarily too complex to maintain in the future that we 
> would
> regret it or that we'd get a huge amount of follow-up requests for extending 
> the
> feature or is it just that simply the interpretation of auto-create == policy?

The increasing complexity of the qemu driver is a significant concern with
adding policy based logic to the code. THinking about this though, if we
provide the inactive node device feature, then we can avoid essentially
all new code and complexity QEMU driver, and still support auto-create.

ie, in the domain XML we just continue to have the exact same XML that
we already have today for mdevs, but with a single new attribute
autocreate=yes|no

  


  


  

In the QEMU driver, then the only change required is

   if (def->autocreate)
   virNodeDeviceCreate(dev)

and the opposite in shutdown. This avoids pulling all the node device
XML schema into the domain XML schema too which is something I dislike
about the previous proposals too.

The inactive node device concept is also more broadly useful 

Re: [libvirt] RFC: Creating mediated devices with libvirt

2017-06-22 Thread Martin Polednik

On 22/06/17 14:05 +0200, Erik Skultety wrote:

On Thu, Jun 22, 2017 at 10:41:13AM +0200, Martin Polednik wrote:

On 16/06/17 18:14 +0100, Daniel P. Berrange wrote:
> On Fri, Jun 16, 2017 at 06:11:17PM +0100, Daniel P. Berrange wrote:
> > On Fri, Jun 16, 2017 at 11:02:55AM -0600, Alex Williamson wrote:
> > > On Fri, 16 Jun 2017 11:32:04 -0400
> > > Laine Stump  wrote:
> > >
> > > > On 06/15/2017 02:42 PM, Alex Williamson wrote:
> > > > > On Thu, 15 Jun 2017 09:33:01 +0100
> > > > > "Daniel P. Berrange"  wrote:
> > > > >
> > > > >> On Thu, Jun 15, 2017 at 12:06:43AM +0200, Erik Skultety wrote:
> > > > >>> Hi all,
> > > > >>>
> > > > >>> so there's been an off-list discussion about finally implementing 
creation of
> > > > >>> mediated devices with libvirt and it's more than desired to get as 
many opinions
> > > > >>> on that as possible, so please do share your ideas. This did come 
up already as
> > > > >>> part of some older threads ([1] for example), so this will be a 
respin of the
> > > > >>> discussions. Long story short, we decided to put device creation 
off and focus
> > > > >>> on the introduction of the framework as such first and build upon 
that later,
> > > > >>> i.e. now.
> > > > >>>
> > > > >>> [1] 
https://www.redhat.com/archives/libvir-list/2017-February/msg00177.html
> > > > >>>
> > > > >>> 
> > > > >>> PART 1: NODEDEV-DRIVER
> > > > >>> 
> > > > >>>
> > > > >>> API-wise, device creation through the nodedev driver should be 
pretty
> > > > >>> straightforward and without any issues, since virNodeDevCreateXML 
takes an XML
> > > > >>> and does support flags. Looking at the current device XML:
> > > > >>>
> > > > >>> 
> > > > >>>   mdev_0cce8709_0640_46ef_bd14_962c7f73cc6f
> > > > >>>   
/sys/devices/pci:00/.../0cce8709-0640-46ef-bd14-962c7f73cc6f
> > > > >>>   pci__03_00_0
> > > > >>>   
> > > > >>> vfio_mdev
> > > > >>>   
> > > > >>>   
> > > > >>> 
> > > > >>> 
> > > > >>> UUID 
> > > > >>>   
> > > > >>> 
> > > > >>>
> > > > >>> We can ignore ,, elements, since these 
are useless
> > > > >>> during creation. We also cannot use  since we don't support 
arbitrary
> > > > >>> names and we also can't rely on users providing a name in correct 
form which we
> > > > >>> would need to further parse in order to get the UUID.
> > > > >>> So since the only thing missing to successfully use create an mdev 
using XML is
> > > > >>> the UUID (if user doesn't want it to be generated automatically), 
how about
> > > > >>> having a  subelement under  just like PCIs have 
 and
> > > > >>> friends, USBs have  & , interfaces have  to 
uniquely
> > > > >>> identify the device even if the name itself is unique.
> > > > >>> Removal of a device should work as well, although we might want to
> > > > >>> consider creating a *Flags version of the API.
> > > > >>>
> > > > >>> =
> > > > >>> PART 2: DOMAIN XML & DEVICE AUTO-CREATION, NO POLICY INVOLVED!
> > > > >>> =
> > > > >>>
> > > > >>> There were some doubts about auto-creation mentioned in [1], 
although they
> > > > >>> weren't specified further. So hopefully, we'll get further in the 
discussion
> > > > >>> this time.
> > > > >>>
> > > > >>> From my perspective there are two main reasons/benefits to that:
> > > > >>>
> > > > >>> 1) Convenience
> > > > >>> For apps like virt-manager, user will want to add a host device 
transparently,
> > > > >>> "hey libvirt, I want an mdev assigned to my VM, can you do that". 
Even for
> > > > >>> higher management apps, like oVirt, even they might not care about 
the parent
> > > > >>> device at all times and considering that they would need to 
enumerate the
> > > > >>> parents, pick one, create the device XML and pass it to the nodedev 
driver, IMHO
> > > > >>> it would actually   be easier and faster to just do it directly 
through sysfs,
> > > > >>> bypassing libvirt once again
> > > > >>
> > > > >> The convenience only works if the policy we've provided in libvirt 
actually
> > > > >> matches the policy the application wants. I think it is quite likely 
that with
> > > > >> cloud the mdevs will be created out of band from the domain startup 
process.
> > > > >> It is possible the app will just have a fixed set of mdevs 
pre-created when
> > > > >> the host starts up. Or that the mgmt app wants the domain startup 
process to
> > > > >> be a two phase setup, where it first allocates the resources needed, 
and later
> > > > >> then tries to start the guest. This is why I keep saying that 
putting this kind
> > > > >> of "convenient" policy in libvirt is a bad idea - it is essentially 
just putting
> > > > >> a bit of virt-manager code into libvirt - more advanced apps will 
need more
> > > > >> flexibility in this area.
> > > > >>
> > > > >>> 2) 

Re: [libvirt] RFC: Creating mediated devices with libvirt

2017-06-22 Thread Erik Skultety
On Thu, Jun 22, 2017 at 10:41:13AM +0200, Martin Polednik wrote:
> On 16/06/17 18:14 +0100, Daniel P. Berrange wrote:
> > On Fri, Jun 16, 2017 at 06:11:17PM +0100, Daniel P. Berrange wrote:
> > > On Fri, Jun 16, 2017 at 11:02:55AM -0600, Alex Williamson wrote:
> > > > On Fri, 16 Jun 2017 11:32:04 -0400
> > > > Laine Stump  wrote:
> > > >
> > > > > On 06/15/2017 02:42 PM, Alex Williamson wrote:
> > > > > > On Thu, 15 Jun 2017 09:33:01 +0100
> > > > > > "Daniel P. Berrange"  wrote:
> > > > > >
> > > > > >> On Thu, Jun 15, 2017 at 12:06:43AM +0200, Erik Skultety wrote:
> > > > > >>> Hi all,
> > > > > >>>
> > > > > >>> so there's been an off-list discussion about finally implementing 
> > > > > >>> creation of
> > > > > >>> mediated devices with libvirt and it's more than desired to get 
> > > > > >>> as many opinions
> > > > > >>> on that as possible, so please do share your ideas. This did come 
> > > > > >>> up already as
> > > > > >>> part of some older threads ([1] for example), so this will be a 
> > > > > >>> respin of the
> > > > > >>> discussions. Long story short, we decided to put device creation 
> > > > > >>> off and focus
> > > > > >>> on the introduction of the framework as such first and build upon 
> > > > > >>> that later,
> > > > > >>> i.e. now.
> > > > > >>>
> > > > > >>> [1] 
> > > > > >>> https://www.redhat.com/archives/libvir-list/2017-February/msg00177.html
> > > > > >>>
> > > > > >>> 
> > > > > >>> PART 1: NODEDEV-DRIVER
> > > > > >>> 
> > > > > >>>
> > > > > >>> API-wise, device creation through the nodedev driver should be 
> > > > > >>> pretty
> > > > > >>> straightforward and without any issues, since virNodeDevCreateXML 
> > > > > >>> takes an XML
> > > > > >>> and does support flags. Looking at the current device XML:
> > > > > >>>
> > > > > >>> 
> > > > > >>>   mdev_0cce8709_0640_46ef_bd14_962c7f73cc6f
> > > > > >>>   
> > > > > >>> /sys/devices/pci:00/.../0cce8709-0640-46ef-bd14-962c7f73cc6f
> > > > > >>>   pci__03_00_0
> > > > > >>>   
> > > > > >>> vfio_mdev
> > > > > >>>   
> > > > > >>>   
> > > > > >>> 
> > > > > >>> 
> > > > > >>> UUID 
> > > > > >>>   
> > > > > >>> 
> > > > > >>>
> > > > > >>> We can ignore ,, elements, since these 
> > > > > >>> are useless
> > > > > >>> during creation. We also cannot use  since we don't support 
> > > > > >>> arbitrary
> > > > > >>> names and we also can't rely on users providing a name in correct 
> > > > > >>> form which we
> > > > > >>> would need to further parse in order to get the UUID.
> > > > > >>> So since the only thing missing to successfully use create an 
> > > > > >>> mdev using XML is
> > > > > >>> the UUID (if user doesn't want it to be generated automatically), 
> > > > > >>> how about
> > > > > >>> having a  subelement under  just like PCIs have 
> > > > > >>>  and
> > > > > >>> friends, USBs have  & , interfaces have  to 
> > > > > >>> uniquely
> > > > > >>> identify the device even if the name itself is unique.
> > > > > >>> Removal of a device should work as well, although we might want to
> > > > > >>> consider creating a *Flags version of the API.
> > > > > >>>
> > > > > >>> =
> > > > > >>> PART 2: DOMAIN XML & DEVICE AUTO-CREATION, NO POLICY INVOLVED!
> > > > > >>> =
> > > > > >>>
> > > > > >>> There were some doubts about auto-creation mentioned in [1], 
> > > > > >>> although they
> > > > > >>> weren't specified further. So hopefully, we'll get further in the 
> > > > > >>> discussion
> > > > > >>> this time.
> > > > > >>>
> > > > > >>> From my perspective there are two main reasons/benefits to that:
> > > > > >>>
> > > > > >>> 1) Convenience
> > > > > >>> For apps like virt-manager, user will want to add a host device 
> > > > > >>> transparently,
> > > > > >>> "hey libvirt, I want an mdev assigned to my VM, can you do that". 
> > > > > >>> Even for
> > > > > >>> higher management apps, like oVirt, even they might not care 
> > > > > >>> about the parent
> > > > > >>> device at all times and considering that they would need to 
> > > > > >>> enumerate the
> > > > > >>> parents, pick one, create the device XML and pass it to the 
> > > > > >>> nodedev driver, IMHO
> > > > > >>> it would actually  be easier and faster to just do it directly 
> > > > > >>> through sysfs,
> > > > > >>> bypassing libvirt once again
> > > > > >>
> > > > > >> The convenience only works if the policy we've provided in libvirt 
> > > > > >> actually
> > > > > >> matches the policy the application wants. I think it is quite 
> > > > > >> likely that with
> > > > > >> cloud the mdevs will be created out of band from the domain 
> > > > > >> startup process.
> > > > > >> It is possible the app will just have a fixed set of mdevs 
> > > > > >> pre-created when
> > > > > >> 

Re: [libvirt] RFC: Creating mediated devices with libvirt

2017-06-22 Thread Martin Polednik

On 16/06/17 18:14 +0100, Daniel P. Berrange wrote:

On Fri, Jun 16, 2017 at 06:11:17PM +0100, Daniel P. Berrange wrote:

On Fri, Jun 16, 2017 at 11:02:55AM -0600, Alex Williamson wrote:
> On Fri, 16 Jun 2017 11:32:04 -0400
> Laine Stump  wrote:
>
> > On 06/15/2017 02:42 PM, Alex Williamson wrote:
> > > On Thu, 15 Jun 2017 09:33:01 +0100
> > > "Daniel P. Berrange"  wrote:
> > >
> > >> On Thu, Jun 15, 2017 at 12:06:43AM +0200, Erik Skultety wrote:
> > >>> Hi all,
> > >>>
> > >>> so there's been an off-list discussion about finally implementing 
creation of
> > >>> mediated devices with libvirt and it's more than desired to get as many 
opinions
> > >>> on that as possible, so please do share your ideas. This did come up 
already as
> > >>> part of some older threads ([1] for example), so this will be a respin 
of the
> > >>> discussions. Long story short, we decided to put device creation off 
and focus
> > >>> on the introduction of the framework as such first and build upon that 
later,
> > >>> i.e. now.
> > >>>
> > >>> [1] 
https://www.redhat.com/archives/libvir-list/2017-February/msg00177.html
> > >>>
> > >>> 
> > >>> PART 1: NODEDEV-DRIVER
> > >>> 
> > >>>
> > >>> API-wise, device creation through the nodedev driver should be pretty
> > >>> straightforward and without any issues, since virNodeDevCreateXML takes 
an XML
> > >>> and does support flags. Looking at the current device XML:
> > >>>
> > >>> 
> > >>>   mdev_0cce8709_0640_46ef_bd14_962c7f73cc6f
> > >>>   
/sys/devices/pci:00/.../0cce8709-0640-46ef-bd14-962c7f73cc6f
> > >>>   pci__03_00_0
> > >>>   
> > >>> vfio_mdev
> > >>>   
> > >>>   
> > >>> 
> > >>> 
> > >>> UUID 
> > >>>   
> > >>> 
> > >>>
> > >>> We can ignore ,, elements, since these are 
useless
> > >>> during creation. We also cannot use  since we don't support 
arbitrary
> > >>> names and we also can't rely on users providing a name in correct form 
which we
> > >>> would need to further parse in order to get the UUID.
> > >>> So since the only thing missing to successfully use create an mdev 
using XML is
> > >>> the UUID (if user doesn't want it to be generated automatically), how 
about
> > >>> having a  subelement under  just like PCIs have 
 and
> > >>> friends, USBs have  & , interfaces have  to 
uniquely
> > >>> identify the device even if the name itself is unique.
> > >>> Removal of a device should work as well, although we might want to
> > >>> consider creating a *Flags version of the API.
> > >>>
> > >>> =
> > >>> PART 2: DOMAIN XML & DEVICE AUTO-CREATION, NO POLICY INVOLVED!
> > >>> =
> > >>>
> > >>> There were some doubts about auto-creation mentioned in [1], although 
they
> > >>> weren't specified further. So hopefully, we'll get further in the 
discussion
> > >>> this time.
> > >>>
> > >>> From my perspective there are two main reasons/benefits to that:
> > >>>
> > >>> 1) Convenience
> > >>> For apps like virt-manager, user will want to add a host device 
transparently,
> > >>> "hey libvirt, I want an mdev assigned to my VM, can you do that". Even 
for
> > >>> higher management apps, like oVirt, even they might not care about the 
parent
> > >>> device at all times and considering that they would need to enumerate 
the
> > >>> parents, pick one, create the device XML and pass it to the nodedev 
driver, IMHO
> > >>> it would actually be easier and faster to just do it directly 
through sysfs,
> > >>> bypassing libvirt once again
> > >>
> > >> The convenience only works if the policy we've provided in libvirt 
actually
> > >> matches the policy the application wants. I think it is quite likely 
that with
> > >> cloud the mdevs will be created out of band from the domain startup 
process.
> > >> It is possible the app will just have a fixed set of mdevs pre-created 
when
> > >> the host starts up. Or that the mgmt app wants the domain startup 
process to
> > >> be a two phase setup, where it first allocates the resources needed, and 
later
> > >> then tries to start the guest. This is why I keep saying that putting 
this kind
> > >> of "convenient" policy in libvirt is a bad idea - it is essentially just 
putting
> > >> a bit of virt-manager code into libvirt - more advanced apps will need 
more
> > >> flexibility in this area.
> > >>
> > >>> 2) Future domain migration
> > >>> Suppose now that the mdev backing physical devices support state dump 
and
> > >>> reload. Chances are, that the corresponding mdev doesn't even exist or 
has a
> > >>> different UUID on the destination, so libvirt would do its best to 
handle this
> > >>> before the domain could be resumed.
> > >>
> > >> This is not an unusual scenario - there are already many other parts of 
the
> > >> device backend config that need to 

Re: [libvirt] RFC: Creating mediated devices with libvirt

2017-06-16 Thread Alex Williamson
On Fri, 16 Jun 2017 18:11:17 +0100
"Daniel P. Berrange"  wrote:

> On Fri, Jun 16, 2017 at 11:02:55AM -0600, Alex Williamson wrote:
> > On Fri, 16 Jun 2017 11:32:04 -0400
> > Laine Stump  wrote:
> >   
> > > On 06/15/2017 02:42 PM, Alex Williamson wrote:  
> > > > On Thu, 15 Jun 2017 09:33:01 +0100
> > > > "Daniel P. Berrange"  wrote:
> > > > 
> > > >> On Thu, Jun 15, 2017 at 12:06:43AM +0200, Erik Skultety wrote:
> > > >>> Hi all,
> > > >>>
> > > >>> so there's been an off-list discussion about finally implementing 
> > > >>> creation of
> > > >>> mediated devices with libvirt and it's more than desired to get as 
> > > >>> many opinions
> > > >>> on that as possible, so please do share your ideas. This did come up 
> > > >>> already as
> > > >>> part of some older threads ([1] for example), so this will be a 
> > > >>> respin of the
> > > >>> discussions. Long story short, we decided to put device creation off 
> > > >>> and focus
> > > >>> on the introduction of the framework as such first and build upon 
> > > >>> that later,
> > > >>> i.e. now.
> > > >>>
> > > >>> [1] 
> > > >>> https://www.redhat.com/archives/libvir-list/2017-February/msg00177.html
> > > >>>
> > > >>> 
> > > >>> PART 1: NODEDEV-DRIVER
> > > >>> 
> > > >>>
> > > >>> API-wise, device creation through the nodedev driver should be pretty
> > > >>> straightforward and without any issues, since virNodeDevCreateXML 
> > > >>> takes an XML
> > > >>> and does support flags. Looking at the current device XML:
> > > >>>
> > > >>> 
> > > >>>   mdev_0cce8709_0640_46ef_bd14_962c7f73cc6f
> > > >>>   
> > > >>> /sys/devices/pci:00/.../0cce8709-0640-46ef-bd14-962c7f73cc6f
> > > >>>   pci__03_00_0
> > > >>>   
> > > >>> vfio_mdev
> > > >>>   
> > > >>>   
> > > >>> 
> > > >>> 
> > > >>> UUID 
> > > >>>   
> > > >>> 
> > > >>>
> > > >>> We can ignore ,, elements, since these are 
> > > >>> useless
> > > >>> during creation. We also cannot use  since we don't support 
> > > >>> arbitrary
> > > >>> names and we also can't rely on users providing a name in correct 
> > > >>> form which we
> > > >>> would need to further parse in order to get the UUID.
> > > >>> So since the only thing missing to successfully use create an mdev 
> > > >>> using XML is
> > > >>> the UUID (if user doesn't want it to be generated automatically), how 
> > > >>> about
> > > >>> having a  subelement under  just like PCIs have 
> > > >>>  and
> > > >>> friends, USBs have  & , interfaces have  to 
> > > >>> uniquely
> > > >>> identify the device even if the name itself is unique.
> > > >>> Removal of a device should work as well, although we might want to
> > > >>> consider creating a *Flags version of the API.
> > > >>>
> > > >>> =
> > > >>> PART 2: DOMAIN XML & DEVICE AUTO-CREATION, NO POLICY INVOLVED!
> > > >>> =
> > > >>>
> > > >>> There were some doubts about auto-creation mentioned in [1], although 
> > > >>> they
> > > >>> weren't specified further. So hopefully, we'll get further in the 
> > > >>> discussion
> > > >>> this time.
> > > >>>
> > > >>> From my perspective there are two main reasons/benefits to that:
> > > >>>
> > > >>> 1) Convenience
> > > >>> For apps like virt-manager, user will want to add a host device 
> > > >>> transparently,
> > > >>> "hey libvirt, I want an mdev assigned to my VM, can you do that". 
> > > >>> Even for
> > > >>> higher management apps, like oVirt, even they might not care about 
> > > >>> the parent
> > > >>> device at all times and considering that they would need to enumerate 
> > > >>> the
> > > >>> parents, pick one, create the device XML and pass it to the nodedev 
> > > >>> driver, IMHO
> > > >>> it would actually  be easier and faster to just do it directly 
> > > >>> through sysfs,
> > > >>> bypassing libvirt once again  
> > > >>
> > > >> The convenience only works if the policy we've provided in libvirt 
> > > >> actually
> > > >> matches the policy the application wants. I think it is quite likely 
> > > >> that with
> > > >> cloud the mdevs will be created out of band from the domain startup 
> > > >> process.
> > > >> It is possible the app will just have a fixed set of mdevs pre-created 
> > > >> when
> > > >> the host starts up. Or that the mgmt app wants the domain startup 
> > > >> process to
> > > >> be a two phase setup, where it first allocates the resources needed, 
> > > >> and later
> > > >> then tries to start the guest. This is why I keep saying that putting 
> > > >> this kind
> > > >> of "convenient" policy in libvirt is a bad idea - it is essentially 
> > > >> just putting
> > > >> a bit of virt-manager code into libvirt - more advanced apps will need 
> > > >> more
> > > >> flexibility in this area.
> > > >>  

Re: [libvirt] RFC: Creating mediated devices with libvirt

2017-06-16 Thread Daniel P. Berrange
On Fri, Jun 16, 2017 at 06:11:17PM +0100, Daniel P. Berrange wrote:
> On Fri, Jun 16, 2017 at 11:02:55AM -0600, Alex Williamson wrote:
> > On Fri, 16 Jun 2017 11:32:04 -0400
> > Laine Stump  wrote:
> > 
> > > On 06/15/2017 02:42 PM, Alex Williamson wrote:
> > > > On Thu, 15 Jun 2017 09:33:01 +0100
> > > > "Daniel P. Berrange"  wrote:
> > > >   
> > > >> On Thu, Jun 15, 2017 at 12:06:43AM +0200, Erik Skultety wrote:  
> > > >>> Hi all,
> > > >>>
> > > >>> so there's been an off-list discussion about finally implementing 
> > > >>> creation of
> > > >>> mediated devices with libvirt and it's more than desired to get as 
> > > >>> many opinions
> > > >>> on that as possible, so please do share your ideas. This did come up 
> > > >>> already as
> > > >>> part of some older threads ([1] for example), so this will be a 
> > > >>> respin of the
> > > >>> discussions. Long story short, we decided to put device creation off 
> > > >>> and focus
> > > >>> on the introduction of the framework as such first and build upon 
> > > >>> that later,
> > > >>> i.e. now.
> > > >>>
> > > >>> [1] 
> > > >>> https://www.redhat.com/archives/libvir-list/2017-February/msg00177.html
> > > >>>
> > > >>> 
> > > >>> PART 1: NODEDEV-DRIVER
> > > >>> 
> > > >>>
> > > >>> API-wise, device creation through the nodedev driver should be pretty
> > > >>> straightforward and without any issues, since virNodeDevCreateXML 
> > > >>> takes an XML
> > > >>> and does support flags. Looking at the current device XML:
> > > >>>
> > > >>> 
> > > >>>   mdev_0cce8709_0640_46ef_bd14_962c7f73cc6f
> > > >>>   
> > > >>> /sys/devices/pci:00/.../0cce8709-0640-46ef-bd14-962c7f73cc6f
> > > >>>   pci__03_00_0
> > > >>>   
> > > >>> vfio_mdev
> > > >>>   
> > > >>>   
> > > >>> 
> > > >>> 
> > > >>> UUID 
> > > >>>   
> > > >>> 
> > > >>>
> > > >>> We can ignore ,, elements, since these are 
> > > >>> useless
> > > >>> during creation. We also cannot use  since we don't support 
> > > >>> arbitrary
> > > >>> names and we also can't rely on users providing a name in correct 
> > > >>> form which we
> > > >>> would need to further parse in order to get the UUID.
> > > >>> So since the only thing missing to successfully use create an mdev 
> > > >>> using XML is
> > > >>> the UUID (if user doesn't want it to be generated automatically), how 
> > > >>> about
> > > >>> having a  subelement under  just like PCIs have 
> > > >>>  and
> > > >>> friends, USBs have  & , interfaces have  to 
> > > >>> uniquely
> > > >>> identify the device even if the name itself is unique.
> > > >>> Removal of a device should work as well, although we might want to
> > > >>> consider creating a *Flags version of the API.
> > > >>>
> > > >>> =
> > > >>> PART 2: DOMAIN XML & DEVICE AUTO-CREATION, NO POLICY INVOLVED!
> > > >>> =
> > > >>>
> > > >>> There were some doubts about auto-creation mentioned in [1], although 
> > > >>> they
> > > >>> weren't specified further. So hopefully, we'll get further in the 
> > > >>> discussion
> > > >>> this time.
> > > >>>
> > > >>> From my perspective there are two main reasons/benefits to that:
> > > >>>
> > > >>> 1) Convenience
> > > >>> For apps like virt-manager, user will want to add a host device 
> > > >>> transparently,
> > > >>> "hey libvirt, I want an mdev assigned to my VM, can you do that". 
> > > >>> Even for
> > > >>> higher management apps, like oVirt, even they might not care about 
> > > >>> the parent
> > > >>> device at all times and considering that they would need to enumerate 
> > > >>> the
> > > >>> parents, pick one, create the device XML and pass it to the nodedev 
> > > >>> driver, IMHO
> > > >>> it would actually  be easier and faster to just do it directly 
> > > >>> through sysfs,
> > > >>> bypassing libvirt once again
> > > >>
> > > >> The convenience only works if the policy we've provided in libvirt 
> > > >> actually
> > > >> matches the policy the application wants. I think it is quite likely 
> > > >> that with
> > > >> cloud the mdevs will be created out of band from the domain startup 
> > > >> process.
> > > >> It is possible the app will just have a fixed set of mdevs pre-created 
> > > >> when
> > > >> the host starts up. Or that the mgmt app wants the domain startup 
> > > >> process to
> > > >> be a two phase setup, where it first allocates the resources needed, 
> > > >> and later
> > > >> then tries to start the guest. This is why I keep saying that putting 
> > > >> this kind
> > > >> of "convenient" policy in libvirt is a bad idea - it is essentially 
> > > >> just putting
> > > >> a bit of virt-manager code into libvirt - more advanced apps will need 
> > > >> more
> > > >> flexibility in this area.
> > > >>  
> > > >>> 2) Future domain 

Re: [libvirt] RFC: Creating mediated devices with libvirt

2017-06-16 Thread Daniel P. Berrange
On Fri, Jun 16, 2017 at 11:02:55AM -0600, Alex Williamson wrote:
> On Fri, 16 Jun 2017 11:32:04 -0400
> Laine Stump  wrote:
> 
> > On 06/15/2017 02:42 PM, Alex Williamson wrote:
> > > On Thu, 15 Jun 2017 09:33:01 +0100
> > > "Daniel P. Berrange"  wrote:
> > >   
> > >> On Thu, Jun 15, 2017 at 12:06:43AM +0200, Erik Skultety wrote:  
> > >>> Hi all,
> > >>>
> > >>> so there's been an off-list discussion about finally implementing 
> > >>> creation of
> > >>> mediated devices with libvirt and it's more than desired to get as many 
> > >>> opinions
> > >>> on that as possible, so please do share your ideas. This did come up 
> > >>> already as
> > >>> part of some older threads ([1] for example), so this will be a respin 
> > >>> of the
> > >>> discussions. Long story short, we decided to put device creation off 
> > >>> and focus
> > >>> on the introduction of the framework as such first and build upon that 
> > >>> later,
> > >>> i.e. now.
> > >>>
> > >>> [1] 
> > >>> https://www.redhat.com/archives/libvir-list/2017-February/msg00177.html
> > >>>
> > >>> 
> > >>> PART 1: NODEDEV-DRIVER
> > >>> 
> > >>>
> > >>> API-wise, device creation through the nodedev driver should be pretty
> > >>> straightforward and without any issues, since virNodeDevCreateXML takes 
> > >>> an XML
> > >>> and does support flags. Looking at the current device XML:
> > >>>
> > >>> 
> > >>>   mdev_0cce8709_0640_46ef_bd14_962c7f73cc6f
> > >>>   
> > >>> /sys/devices/pci:00/.../0cce8709-0640-46ef-bd14-962c7f73cc6f
> > >>>   pci__03_00_0
> > >>>   
> > >>> vfio_mdev
> > >>>   
> > >>>   
> > >>> 
> > >>> 
> > >>> UUID 
> > >>>   
> > >>> 
> > >>>
> > >>> We can ignore ,, elements, since these are 
> > >>> useless
> > >>> during creation. We also cannot use  since we don't support 
> > >>> arbitrary
> > >>> names and we also can't rely on users providing a name in correct form 
> > >>> which we
> > >>> would need to further parse in order to get the UUID.
> > >>> So since the only thing missing to successfully use create an mdev 
> > >>> using XML is
> > >>> the UUID (if user doesn't want it to be generated automatically), how 
> > >>> about
> > >>> having a  subelement under  just like PCIs have 
> > >>>  and
> > >>> friends, USBs have  & , interfaces have  to 
> > >>> uniquely
> > >>> identify the device even if the name itself is unique.
> > >>> Removal of a device should work as well, although we might want to
> > >>> consider creating a *Flags version of the API.
> > >>>
> > >>> =
> > >>> PART 2: DOMAIN XML & DEVICE AUTO-CREATION, NO POLICY INVOLVED!
> > >>> =
> > >>>
> > >>> There were some doubts about auto-creation mentioned in [1], although 
> > >>> they
> > >>> weren't specified further. So hopefully, we'll get further in the 
> > >>> discussion
> > >>> this time.
> > >>>
> > >>> From my perspective there are two main reasons/benefits to that:
> > >>>
> > >>> 1) Convenience
> > >>> For apps like virt-manager, user will want to add a host device 
> > >>> transparently,
> > >>> "hey libvirt, I want an mdev assigned to my VM, can you do that". Even 
> > >>> for
> > >>> higher management apps, like oVirt, even they might not care about the 
> > >>> parent
> > >>> device at all times and considering that they would need to enumerate 
> > >>> the
> > >>> parents, pick one, create the device XML and pass it to the nodedev 
> > >>> driver, IMHO
> > >>> it would actuallybe easier and faster to just do it directly 
> > >>> through sysfs,
> > >>> bypassing libvirt once again
> > >>
> > >> The convenience only works if the policy we've provided in libvirt 
> > >> actually
> > >> matches the policy the application wants. I think it is quite likely 
> > >> that with
> > >> cloud the mdevs will be created out of band from the domain startup 
> > >> process.
> > >> It is possible the app will just have a fixed set of mdevs pre-created 
> > >> when
> > >> the host starts up. Or that the mgmt app wants the domain startup 
> > >> process to
> > >> be a two phase setup, where it first allocates the resources needed, and 
> > >> later
> > >> then tries to start the guest. This is why I keep saying that putting 
> > >> this kind
> > >> of "convenient" policy in libvirt is a bad idea - it is essentially just 
> > >> putting
> > >> a bit of virt-manager code into libvirt - more advanced apps will need 
> > >> more
> > >> flexibility in this area.
> > >>  
> > >>> 2) Future domain migration
> > >>> Suppose now that the mdev backing physical devices support state dump 
> > >>> and
> > >>> reload. Chances are, that the corresponding mdev doesn't even exist or 
> > >>> has a
> > >>> different UUID on the destination, so libvirt would do its best to 
> > >>> handle this
> > >>> 

Re: [libvirt] RFC: Creating mediated devices with libvirt

2017-06-16 Thread Alex Williamson
On Fri, 16 Jun 2017 11:32:04 -0400
Laine Stump  wrote:

> On 06/15/2017 02:42 PM, Alex Williamson wrote:
> > On Thu, 15 Jun 2017 09:33:01 +0100
> > "Daniel P. Berrange"  wrote:
> >   
> >> On Thu, Jun 15, 2017 at 12:06:43AM +0200, Erik Skultety wrote:  
> >>> Hi all,
> >>>
> >>> so there's been an off-list discussion about finally implementing 
> >>> creation of
> >>> mediated devices with libvirt and it's more than desired to get as many 
> >>> opinions
> >>> on that as possible, so please do share your ideas. This did come up 
> >>> already as
> >>> part of some older threads ([1] for example), so this will be a respin of 
> >>> the
> >>> discussions. Long story short, we decided to put device creation off and 
> >>> focus
> >>> on the introduction of the framework as such first and build upon that 
> >>> later,
> >>> i.e. now.
> >>>
> >>> [1] 
> >>> https://www.redhat.com/archives/libvir-list/2017-February/msg00177.html
> >>>
> >>> 
> >>> PART 1: NODEDEV-DRIVER
> >>> 
> >>>
> >>> API-wise, device creation through the nodedev driver should be pretty
> >>> straightforward and without any issues, since virNodeDevCreateXML takes 
> >>> an XML
> >>> and does support flags. Looking at the current device XML:
> >>>
> >>> 
> >>>   mdev_0cce8709_0640_46ef_bd14_962c7f73cc6f
> >>>   
> >>> /sys/devices/pci:00/.../0cce8709-0640-46ef-bd14-962c7f73cc6f
> >>>   pci__03_00_0
> >>>   
> >>> vfio_mdev
> >>>   
> >>>   
> >>> 
> >>> 
> >>> UUID 
> >>>   
> >>> 
> >>>
> >>> We can ignore ,, elements, since these are 
> >>> useless
> >>> during creation. We also cannot use  since we don't support 
> >>> arbitrary
> >>> names and we also can't rely on users providing a name in correct form 
> >>> which we
> >>> would need to further parse in order to get the UUID.
> >>> So since the only thing missing to successfully use create an mdev using 
> >>> XML is
> >>> the UUID (if user doesn't want it to be generated automatically), how 
> >>> about
> >>> having a  subelement under  just like PCIs have 
> >>>  and
> >>> friends, USBs have  & , interfaces have  to uniquely
> >>> identify the device even if the name itself is unique.
> >>> Removal of a device should work as well, although we might want to
> >>> consider creating a *Flags version of the API.
> >>>
> >>> =
> >>> PART 2: DOMAIN XML & DEVICE AUTO-CREATION, NO POLICY INVOLVED!
> >>> =
> >>>
> >>> There were some doubts about auto-creation mentioned in [1], although they
> >>> weren't specified further. So hopefully, we'll get further in the 
> >>> discussion
> >>> this time.
> >>>
> >>> From my perspective there are two main reasons/benefits to that:
> >>>
> >>> 1) Convenience
> >>> For apps like virt-manager, user will want to add a host device 
> >>> transparently,
> >>> "hey libvirt, I want an mdev assigned to my VM, can you do that". Even for
> >>> higher management apps, like oVirt, even they might not care about the 
> >>> parent
> >>> device at all times and considering that they would need to enumerate the
> >>> parents, pick one, create the device XML and pass it to the nodedev 
> >>> driver, IMHO
> >>> it would actually  be easier and faster to just do it directly through 
> >>> sysfs,
> >>> bypassing libvirt once again
> >>
> >> The convenience only works if the policy we've provided in libvirt actually
> >> matches the policy the application wants. I think it is quite likely that 
> >> with
> >> cloud the mdevs will be created out of band from the domain startup 
> >> process.
> >> It is possible the app will just have a fixed set of mdevs pre-created when
> >> the host starts up. Or that the mgmt app wants the domain startup process 
> >> to
> >> be a two phase setup, where it first allocates the resources needed, and 
> >> later
> >> then tries to start the guest. This is why I keep saying that putting this 
> >> kind
> >> of "convenient" policy in libvirt is a bad idea - it is essentially just 
> >> putting
> >> a bit of virt-manager code into libvirt - more advanced apps will need more
> >> flexibility in this area.
> >>  
> >>> 2) Future domain migration
> >>> Suppose now that the mdev backing physical devices support state dump and
> >>> reload. Chances are, that the corresponding mdev doesn't even exist or 
> >>> has a
> >>> different UUID on the destination, so libvirt would do its best to handle 
> >>> this
> >>> before the domain could be resumed.
> >>
> >> This is not an unusual scenario - there are already many other parts of the
> >> device backend config that need to change prior to migration, especially 
> >> for
> >> anything related to host devices, so apps already have support for doing
> >> this, which is more flexible & convenient becasue it doesn't tie creation 
> >> 

Re: [libvirt] RFC: Creating mediated devices with libvirt

2017-06-16 Thread Daniel P. Berrange
On Fri, Jun 16, 2017 at 11:32:04AM -0400, Laine Stump wrote:
> On 06/15/2017 02:42 PM, Alex Williamson wrote:
> > On Thu, 15 Jun 2017 09:33:01 +0100
> > "Daniel P. Berrange"  wrote:
> > 
> >> On Thu, Jun 15, 2017 at 12:06:43AM +0200, Erik Skultety wrote:
> >>> Hi all,
> >>>
> >>> so there's been an off-list discussion about finally implementing 
> >>> creation of
> >>> mediated devices with libvirt and it's more than desired to get as many 
> >>> opinions
> >>> on that as possible, so please do share your ideas. This did come up 
> >>> already as
> >>> part of some older threads ([1] for example), so this will be a respin of 
> >>> the
> >>> discussions. Long story short, we decided to put device creation off and 
> >>> focus
> >>> on the introduction of the framework as such first and build upon that 
> >>> later,
> >>> i.e. now.
> >>>
> >>> [1] 
> >>> https://www.redhat.com/archives/libvir-list/2017-February/msg00177.html
> >>>
> >>> 
> >>> PART 1: NODEDEV-DRIVER
> >>> 
> >>>
> >>> API-wise, device creation through the nodedev driver should be pretty
> >>> straightforward and without any issues, since virNodeDevCreateXML takes 
> >>> an XML
> >>> and does support flags. Looking at the current device XML:
> >>>
> >>> 
> >>>   mdev_0cce8709_0640_46ef_bd14_962c7f73cc6f
> >>>   
> >>> /sys/devices/pci:00/.../0cce8709-0640-46ef-bd14-962c7f73cc6f
> >>>   pci__03_00_0
> >>>   
> >>> vfio_mdev
> >>>   
> >>>   
> >>> 
> >>> 
> >>> UUID 
> >>>   
> >>> 
> >>>
> >>> We can ignore ,, elements, since these are 
> >>> useless
> >>> during creation. We also cannot use  since we don't support 
> >>> arbitrary
> >>> names and we also can't rely on users providing a name in correct form 
> >>> which we
> >>> would need to further parse in order to get the UUID.
> >>> So since the only thing missing to successfully use create an mdev using 
> >>> XML is
> >>> the UUID (if user doesn't want it to be generated automatically), how 
> >>> about
> >>> having a  subelement under  just like PCIs have 
> >>>  and
> >>> friends, USBs have  & , interfaces have  to uniquely
> >>> identify the device even if the name itself is unique.
> >>> Removal of a device should work as well, although we might want to
> >>> consider creating a *Flags version of the API.
> >>>
> >>> =
> >>> PART 2: DOMAIN XML & DEVICE AUTO-CREATION, NO POLICY INVOLVED!
> >>> =
> >>>
> >>> There were some doubts about auto-creation mentioned in [1], although they
> >>> weren't specified further. So hopefully, we'll get further in the 
> >>> discussion
> >>> this time.
> >>>
> >>> From my perspective there are two main reasons/benefits to that:
> >>>
> >>> 1) Convenience
> >>> For apps like virt-manager, user will want to add a host device 
> >>> transparently,
> >>> "hey libvirt, I want an mdev assigned to my VM, can you do that". Even for
> >>> higher management apps, like oVirt, even they might not care about the 
> >>> parent
> >>> device at all times and considering that they would need to enumerate the
> >>> parents, pick one, create the device XML and pass it to the nodedev 
> >>> driver, IMHO
> >>> it would actually be easier and faster to just do it directly through 
> >>> sysfs,
> >>> bypassing libvirt once again  
> >>
> >> The convenience only works if the policy we've provided in libvirt actually
> >> matches the policy the application wants. I think it is quite likely that 
> >> with
> >> cloud the mdevs will be created out of band from the domain startup 
> >> process.
> >> It is possible the app will just have a fixed set of mdevs pre-created when
> >> the host starts up. Or that the mgmt app wants the domain startup process 
> >> to
> >> be a two phase setup, where it first allocates the resources needed, and 
> >> later
> >> then tries to start the guest. This is why I keep saying that putting this 
> >> kind
> >> of "convenient" policy in libvirt is a bad idea - it is essentially just 
> >> putting
> >> a bit of virt-manager code into libvirt - more advanced apps will need more
> >> flexibility in this area.
> >>
> >>> 2) Future domain migration
> >>> Suppose now that the mdev backing physical devices support state dump and
> >>> reload. Chances are, that the corresponding mdev doesn't even exist or 
> >>> has a
> >>> different UUID on the destination, so libvirt would do its best to handle 
> >>> this
> >>> before the domain could be resumed.  
> >>
> >> This is not an unusual scenario - there are already many other parts of the
> >> device backend config that need to change prior to migration, especially 
> >> for
> >> anything related to host devices, so apps already have support for doing
> >> this, which is more flexible & convenient becasue it doesn't tie creation 
> >> of
> >> the mdevs to 

Re: [libvirt] RFC: Creating mediated devices with libvirt

2017-06-16 Thread Laine Stump
On 06/15/2017 02:42 PM, Alex Williamson wrote:
> On Thu, 15 Jun 2017 09:33:01 +0100
> "Daniel P. Berrange"  wrote:
> 
>> On Thu, Jun 15, 2017 at 12:06:43AM +0200, Erik Skultety wrote:
>>> Hi all,
>>>
>>> so there's been an off-list discussion about finally implementing creation 
>>> of
>>> mediated devices with libvirt and it's more than desired to get as many 
>>> opinions
>>> on that as possible, so please do share your ideas. This did come up 
>>> already as
>>> part of some older threads ([1] for example), so this will be a respin of 
>>> the
>>> discussions. Long story short, we decided to put device creation off and 
>>> focus
>>> on the introduction of the framework as such first and build upon that 
>>> later,
>>> i.e. now.
>>>
>>> [1] https://www.redhat.com/archives/libvir-list/2017-February/msg00177.html
>>>
>>> 
>>> PART 1: NODEDEV-DRIVER
>>> 
>>>
>>> API-wise, device creation through the nodedev driver should be pretty
>>> straightforward and without any issues, since virNodeDevCreateXML takes an 
>>> XML
>>> and does support flags. Looking at the current device XML:
>>>
>>> 
>>>   mdev_0cce8709_0640_46ef_bd14_962c7f73cc6f
>>>   
>>> /sys/devices/pci:00/.../0cce8709-0640-46ef-bd14-962c7f73cc6f
>>>   pci__03_00_0
>>>   
>>> vfio_mdev
>>>   
>>>   
>>> 
>>> 
>>> UUID 
>>>   
>>> 
>>>
>>> We can ignore ,, elements, since these are useless
>>> during creation. We also cannot use  since we don't support arbitrary
>>> names and we also can't rely on users providing a name in correct form 
>>> which we
>>> would need to further parse in order to get the UUID.
>>> So since the only thing missing to successfully use create an mdev using 
>>> XML is
>>> the UUID (if user doesn't want it to be generated automatically), how about
>>> having a  subelement under  just like PCIs have  
>>> and
>>> friends, USBs have  & , interfaces have  to uniquely
>>> identify the device even if the name itself is unique.
>>> Removal of a device should work as well, although we might want to
>>> consider creating a *Flags version of the API.
>>>
>>> =
>>> PART 2: DOMAIN XML & DEVICE AUTO-CREATION, NO POLICY INVOLVED!
>>> =
>>>
>>> There were some doubts about auto-creation mentioned in [1], although they
>>> weren't specified further. So hopefully, we'll get further in the discussion
>>> this time.
>>>
>>> From my perspective there are two main reasons/benefits to that:
>>>
>>> 1) Convenience
>>> For apps like virt-manager, user will want to add a host device 
>>> transparently,
>>> "hey libvirt, I want an mdev assigned to my VM, can you do that". Even for
>>> higher management apps, like oVirt, even they might not care about the 
>>> parent
>>> device at all times and considering that they would need to enumerate the
>>> parents, pick one, create the device XML and pass it to the nodedev driver, 
>>> IMHO
>>> it would actually be easier and faster to just do it directly through sysfs,
>>> bypassing libvirt once again  
>>
>> The convenience only works if the policy we've provided in libvirt actually
>> matches the policy the application wants. I think it is quite likely that 
>> with
>> cloud the mdevs will be created out of band from the domain startup process.
>> It is possible the app will just have a fixed set of mdevs pre-created when
>> the host starts up. Or that the mgmt app wants the domain startup process to
>> be a two phase setup, where it first allocates the resources needed, and 
>> later
>> then tries to start the guest. This is why I keep saying that putting this 
>> kind
>> of "convenient" policy in libvirt is a bad idea - it is essentially just 
>> putting
>> a bit of virt-manager code into libvirt - more advanced apps will need more
>> flexibility in this area.
>>
>>> 2) Future domain migration
>>> Suppose now that the mdev backing physical devices support state dump and
>>> reload. Chances are, that the corresponding mdev doesn't even exist or has a
>>> different UUID on the destination, so libvirt would do its best to handle 
>>> this
>>> before the domain could be resumed.  
>>
>> This is not an unusual scenario - there are already many other parts of the
>> device backend config that need to change prior to migration, especially for
>> anything related to host devices, so apps already have support for doing
>> this, which is more flexible & convenient becasue it doesn't tie creation of
>> the mdevs to running of the migrate command.
>>
>> IOW, I'm still against adding any kind of automatic creation policy for
>> mdevs in libvirt. Just provide the node device API support.
> 
> I'm not super clear on the extent of what you're against here, is it
> all forms of device creation or only a placement policy?  Are you
> against any form of having the XML specify 

Re: [libvirt] RFC: Creating mediated devices with libvirt

2017-06-15 Thread Alex Williamson
On Thu, 15 Jun 2017 09:33:01 +0100
"Daniel P. Berrange"  wrote:

> On Thu, Jun 15, 2017 at 12:06:43AM +0200, Erik Skultety wrote:
> > Hi all,
> > 
> > so there's been an off-list discussion about finally implementing creation 
> > of
> > mediated devices with libvirt and it's more than desired to get as many 
> > opinions
> > on that as possible, so please do share your ideas. This did come up 
> > already as
> > part of some older threads ([1] for example), so this will be a respin of 
> > the
> > discussions. Long story short, we decided to put device creation off and 
> > focus
> > on the introduction of the framework as such first and build upon that 
> > later,
> > i.e. now.
> > 
> > [1] https://www.redhat.com/archives/libvir-list/2017-February/msg00177.html
> > 
> > 
> > PART 1: NODEDEV-DRIVER
> > 
> > 
> > API-wise, device creation through the nodedev driver should be pretty
> > straightforward and without any issues, since virNodeDevCreateXML takes an 
> > XML
> > and does support flags. Looking at the current device XML:
> > 
> > 
> >   mdev_0cce8709_0640_46ef_bd14_962c7f73cc6f
> >   
> > /sys/devices/pci:00/.../0cce8709-0640-46ef-bd14-962c7f73cc6f
> >   pci__03_00_0
> >   
> > vfio_mdev
> >   
> >   
> > 
> > 
> > UUID 
> >   
> > 
> > 
> > We can ignore ,, elements, since these are useless
> > during creation. We also cannot use  since we don't support arbitrary
> > names and we also can't rely on users providing a name in correct form 
> > which we
> > would need to further parse in order to get the UUID.
> > So since the only thing missing to successfully use create an mdev using 
> > XML is
> > the UUID (if user doesn't want it to be generated automatically), how about
> > having a  subelement under  just like PCIs have  
> > and
> > friends, USBs have  & , interfaces have  to uniquely
> > identify the device even if the name itself is unique.
> > Removal of a device should work as well, although we might want to
> > consider creating a *Flags version of the API.
> > 
> > =
> > PART 2: DOMAIN XML & DEVICE AUTO-CREATION, NO POLICY INVOLVED!
> > =
> > 
> > There were some doubts about auto-creation mentioned in [1], although they
> > weren't specified further. So hopefully, we'll get further in the discussion
> > this time.
> > 
> > From my perspective there are two main reasons/benefits to that:
> > 
> > 1) Convenience
> > For apps like virt-manager, user will want to add a host device 
> > transparently,
> > "hey libvirt, I want an mdev assigned to my VM, can you do that". Even for
> > higher management apps, like oVirt, even they might not care about the 
> > parent
> > device at all times and considering that they would need to enumerate the
> > parents, pick one, create the device XML and pass it to the nodedev driver, 
> > IMHO
> > it would actually be easier and faster to just do it directly through sysfs,
> > bypassing libvirt once again  
> 
> The convenience only works if the policy we've provided in libvirt actually
> matches the policy the application wants. I think it is quite likely that with
> cloud the mdevs will be created out of band from the domain startup process.
> It is possible the app will just have a fixed set of mdevs pre-created when
> the host starts up. Or that the mgmt app wants the domain startup process to
> be a two phase setup, where it first allocates the resources needed, and later
> then tries to start the guest. This is why I keep saying that putting this 
> kind
> of "convenient" policy in libvirt is a bad idea - it is essentially just 
> putting
> a bit of virt-manager code into libvirt - more advanced apps will need more
> flexibility in this area.
> 
> > 2) Future domain migration
> > Suppose now that the mdev backing physical devices support state dump and
> > reload. Chances are, that the corresponding mdev doesn't even exist or has a
> > different UUID on the destination, so libvirt would do its best to handle 
> > this
> > before the domain could be resumed.  
> 
> This is not an unusual scenario - there are already many other parts of the
> device backend config that need to change prior to migration, especially for
> anything related to host devices, so apps already have support for doing
> this, which is more flexible & convenient becasue it doesn't tie creation of
> the mdevs to running of the migrate command.
> 
> IOW, I'm still against adding any kind of automatic creation policy for
> mdevs in libvirt. Just provide the node device API support.

I'm not super clear on the extent of what you're against here, is it
all forms of device creation or only a placement policy?  Are you
against any form of having the XML specify the non-instantiated mdev
that it wants?  We've clearly made an important step 

Re: [libvirt] RFC: Creating mediated devices with libvirt

2017-06-15 Thread Daniel P. Berrange
On Thu, Jun 15, 2017 at 12:06:43AM +0200, Erik Skultety wrote:
> Hi all,
> 
> so there's been an off-list discussion about finally implementing creation of
> mediated devices with libvirt and it's more than desired to get as many 
> opinions
> on that as possible, so please do share your ideas. This did come up already 
> as
> part of some older threads ([1] for example), so this will be a respin of the
> discussions. Long story short, we decided to put device creation off and focus
> on the introduction of the framework as such first and build upon that later,
> i.e. now.
> 
> [1] https://www.redhat.com/archives/libvir-list/2017-February/msg00177.html
> 
> 
> PART 1: NODEDEV-DRIVER
> 
> 
> API-wise, device creation through the nodedev driver should be pretty
> straightforward and without any issues, since virNodeDevCreateXML takes an XML
> and does support flags. Looking at the current device XML:
> 
> 
>   mdev_0cce8709_0640_46ef_bd14_962c7f73cc6f
>   
> /sys/devices/pci:00/.../0cce8709-0640-46ef-bd14-962c7f73cc6f
>   pci__03_00_0
>   
> vfio_mdev
>   
>   
> 
> 
> UUID 
>   
> 
> 
> We can ignore ,, elements, since these are useless
> during creation. We also cannot use  since we don't support arbitrary
> names and we also can't rely on users providing a name in correct form which 
> we
> would need to further parse in order to get the UUID.
> So since the only thing missing to successfully use create an mdev using XML 
> is
> the UUID (if user doesn't want it to be generated automatically), how about
> having a  subelement under  just like PCIs have  and
> friends, USBs have  & , interfaces have  to uniquely
> identify the device even if the name itself is unique.
> Removal of a device should work as well, although we might want to
> consider creating a *Flags version of the API.
> 
> =
> PART 2: DOMAIN XML & DEVICE AUTO-CREATION, NO POLICY INVOLVED!
> =
> 
> There were some doubts about auto-creation mentioned in [1], although they
> weren't specified further. So hopefully, we'll get further in the discussion
> this time.
> 
> From my perspective there are two main reasons/benefits to that:
> 
> 1) Convenience
> For apps like virt-manager, user will want to add a host device transparently,
> "hey libvirt, I want an mdev assigned to my VM, can you do that". Even for
> higher management apps, like oVirt, even they might not care about the parent
> device at all times and considering that they would need to enumerate the
> parents, pick one, create the device XML and pass it to the nodedev driver, 
> IMHO
> it would actually be easier and faster to just do it directly through sysfs,
> bypassing libvirt once again

The convenience only works if the policy we've provided in libvirt actually
matches the policy the application wants. I think it is quite likely that with
cloud the mdevs will be created out of band from the domain startup process.
It is possible the app will just have a fixed set of mdevs pre-created when
the host starts up. Or that the mgmt app wants the domain startup process to
be a two phase setup, where it first allocates the resources needed, and later
then tries to start the guest. This is why I keep saying that putting this kind
of "convenient" policy in libvirt is a bad idea - it is essentially just putting
a bit of virt-manager code into libvirt - more advanced apps will need more
flexibility in this area.

> 2) Future domain migration
> Suppose now that the mdev backing physical devices support state dump and
> reload. Chances are, that the corresponding mdev doesn't even exist or has a
> different UUID on the destination, so libvirt would do its best to handle this
> before the domain could be resumed.

This is not an unusual scenario - there are already many other parts of the
device backend config that need to change prior to migration, especially for
anything related to host devices, so apps already have support for doing
this, which is more flexible & convenient becasue it doesn't tie creation of
the mdevs to running of the migrate command.

IOW, I'm still against adding any kind of automatic creation policy for
mdevs in libvirt. Just provide the node device API support.

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


[libvirt] RFC: Creating mediated devices with libvirt

2017-06-14 Thread Erik Skultety
Hi all,

so there's been an off-list discussion about finally implementing creation of
mediated devices with libvirt and it's more than desired to get as many opinions
on that as possible, so please do share your ideas. This did come up already as
part of some older threads ([1] for example), so this will be a respin of the
discussions. Long story short, we decided to put device creation off and focus
on the introduction of the framework as such first and build upon that later,
i.e. now.

[1] https://www.redhat.com/archives/libvir-list/2017-February/msg00177.html


PART 1: NODEDEV-DRIVER


API-wise, device creation through the nodedev driver should be pretty
straightforward and without any issues, since virNodeDevCreateXML takes an XML
and does support flags. Looking at the current device XML:


  mdev_0cce8709_0640_46ef_bd14_962c7f73cc6f
  /sys/devices/pci:00/.../0cce8709-0640-46ef-bd14-962c7f73cc6f
  pci__03_00_0
  
vfio_mdev
  
  


UUID 
  


We can ignore ,, elements, since these are useless
during creation. We also cannot use  since we don't support arbitrary
names and we also can't rely on users providing a name in correct form which we
would need to further parse in order to get the UUID.
So since the only thing missing to successfully use create an mdev using XML is
the UUID (if user doesn't want it to be generated automatically), how about
having a  subelement under  just like PCIs have  and
friends, USBs have  & , interfaces have  to uniquely
identify the device even if the name itself is unique.
Removal of a device should work as well, although we might want to
consider creating a *Flags version of the API.

=
PART 2: DOMAIN XML & DEVICE AUTO-CREATION, NO POLICY INVOLVED!
=

There were some doubts about auto-creation mentioned in [1], although they
weren't specified further. So hopefully, we'll get further in the discussion
this time.

>From my perspective there are two main reasons/benefits to that:

1) Convenience
For apps like virt-manager, user will want to add a host device transparently,
"hey libvirt, I want an mdev assigned to my VM, can you do that". Even for
higher management apps, like oVirt, even they might not care about the parent
device at all times and considering that they would need to enumerate the
parents, pick one, create the device XML and pass it to the nodedev driver, IMHO
it would actually be easier and faster to just do it directly through sysfs,
bypassing libvirt once again

2) Future domain migration
Suppose now that the mdev backing physical devices support state dump and
reload. Chances are, that the corresponding mdev doesn't even exist or has a
different UUID on the destination, so libvirt would do its best to handle this
before the domain could be resumed.
Following what we already have:


  
  

  
  


Instead of trying to somehow extend the  element using more
attributes like 'domain', 'slot', 'function', etc. that would render the whole
element ambiguous, I was thinking about creating a  element nested under
 that would be basically just a nested definition of another host device
re-using all the element we already know, i.e.  for PCI, and of course
others if there happens to be a need for devices other than PCI. So speaking
about XML, we'd end up with something like:


  
  

  


  



  
  


So, this was the first idea off the top of my head, so I'd appreciate any
suggestions, comments, especially from people who have got the 'legacy'
insight into libvirt and can predict potential pitfalls based on experience :).

Thanks,
Erik

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list