Re: [libvirt] RFC: Creating mediated devices with libvirt
On Thu, Jun 22, 2017 at 05:57:34PM -0400, John Ferlan wrote: > > > On 06/14/2017 06:06 PM, Erik Skultety wrote: > > Hi all, > > > > so there's been an off-list discussion about finally implementing creation > > of > > mediated devices with libvirt and it's more than desired to get as many > > opinions > > on that as possible, so please do share your ideas. This did come up > > already as > > part of some older threads ([1] for example), so this will be a respin of > > the > > discussions. Long story short, we decided to put device creation off and > > focus > > on the introduction of the framework as such first and build upon that > > later, > > i.e. now. > > > > [1] https://www.redhat.com/archives/libvir-list/2017-February/msg00177.html > > > > > > PART 1: NODEDEV-DRIVER > > > > > > API-wise, device creation through the nodedev driver should be pretty > > straightforward and without any issues, since virNodeDevCreateXML takes an > > XML > > and does support flags. Looking at the current device XML: > > > > > > mdev_0cce8709_0640_46ef_bd14_962c7f73cc6f > > > > /sys/devices/pci:00/.../0cce8709-0640-46ef-bd14-962c7f73cc6f > > pci__03_00_0 > > > > vfio_mdev > > > > > > > > > > UUID > > > > > > > > We can ignore ,, elements, since these are useless > > during creation. We also cannot use since we don't support arbitrary > > names and we also can't rely on users providing a name in correct form > > which we > > would need to further parse in order to get the UUID. > > So since the only thing missing to successfully use create an mdev using > > XML is > > the UUID (if user doesn't want it to be generated automatically), how about > > having a subelement under just like PCIs have > > and > > friends, USBs have & , interfaces have to uniquely > > identify the device even if the name itself is unique. > > Removal of a device should work as well, although we might want to > > consider creating a *Flags version of the API. 
> >
>
> Has any thought been put towards creating an mdev pool modeled after the
> Storage Pool? Similar to how vHBA's are created from a Storage Pool XML
> definition.
>
> That way XML could be defined to keep track of a lot of different things
> that you may need and would require only starting the pool in order to
> access.
>
> Placed "appropriately" - the mdev's could already be available by the
> time node device state initialization occurs too since the pool would
> conceivably been created/defined using data from the physical device and
> the calls to create the virtual devices would have occurred. Much easier
> to add logic to a new driver/pool mgmt to handle whatever considerations
> there are than adding logic into the existing node device driver.

All those things you describe are possible with the node device API,
once we add the inactive object concept that other APIs have.

It is also more flexible to use the node device concept, because it
seamlessly integrates with the physical PCI device management. We've
already seen with SRIOV NICs that mgmt apps needed the flexibility to
choose between assigning the physical NIC, vs assigning individual
functions. I expect the same to be true of mdevs, where you choose
between assigning the GPU PCI device, vs one of the mdev vGPUs.

In OpenStack what I'm expecting is that the existing PCI device / SRIOV
device mgmt code (that is based on the node device APIs) is genericised
to cover arbitrary types of node device, not simply those with the pci
capability. Thus we'd expect mdev mgmt to be part of the node device
APIs framework, not split off in a separate set of pool APIs.

Regards,
Daniel
--
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] RFC: Creating mediated devices with libvirt
[...] > >> So, just for clarification of the concept, the device with ^this UUID will > >> have > >> had to be defined by the nodedev API by the time we start to edit the > >> domain > >> XML in this manner in which case the only thing the autocreate=yes would > >> do is > >> to actually create the mdev according to the nodedev config, right? > >> Continuing > >> with that thought, if UUID doesn't refer to any of the inactive configs it > >> will > >> be an error I suppose? What about the fact that only one vgpu type can > >> live on > >> the GPU? even if you can successfully identify a device using the UUID in > >> this > >> way, you'll still face the problem, that other types might be currently > >> occupying the GPU and need to be torn down first, will this be automated as > >> well in what you suggest? I assume not. > > > > Technically we shouldn't need the node device to exist at the time we > > define the XML - only at the time we start the guest, does the node > > device have to exist. eg same way you list a virtual network as the > > source of a guest NIC, but that virtual network doesn't have to actually > > have been defined & started until the guest starts. > > > > If there are constraints that a pGPU can only support a certain combination > > of vGPUs at any single point in time, doesn't the kernel already enforce > > that when you try to create the vGPU in sysfs. IOW, we merely need to try > > to create the vGPU, and if the kernel mdev driver doesn't allow you to mix > > that with the other vGPUs that already exist, then we'd just report an > > error from virNodeDeviceCreate, and that'd get propagated back as the > > error for the virDomainCreate call. 
> >
> >>
> >>>
> >>>
> >>>
> >>>
> >>> In the QEMU driver, then the only change required is
> >>>
> >>>    if (def->autocreate)
> >>>        virNodeDeviceCreate(dev)
> >>
> >> Aha, so if a device gets torn down on shutdown, we won't face the
> >> problem with some other devices being active, all of them will have
> >> to be in the inactive state because they got torn down during the
> >> last shutdown - that would work.
> >
> > I'm not sure what the relationship with other active devices is relevant
> > here. The virNodeDevicePtr we're accesing here is a single vGPU - if other
> > running guests have further vGPUs on the same pGPU, that's not really
> > relevant. Each vGPU is created/deleted as required.
>
> I think he's talking about devices that were previously used by other
> domains that are no longer active. Since they're also automatically
> destroyed, they're not a problem.

Yes, that was exactly my point. Anyhow, it seems I have a grasp of Dan's
proposal now, great.

Erik
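To make the lifecycle agreed on above concrete, here is a minimal sketch in Python. It is a toy in-memory model only, not libvirt code: the class and function names are made up for illustration, the methods merely mirror the proposed virNodeDeviceDefineXML() / virNodeDeviceCreate() and the existing virNodeDeviceDestroy(), and the parent name and mdev type are placeholders.

```python
"""Toy model of the proposed node-device lifecycle (NOT the libvirt API)."""


class NodeDeviceRegistry:
    """Tracks persistent (defined) configs and which of them are active."""

    def __init__(self):
        self.defined = {}    # uuid -> config; survives while inactive
        self.active = set()  # uuids of devices that currently exist

    def define(self, uuid, parent, mdev_type):
        # Stand-in for the proposed virNodeDeviceDefineXML(): store config only.
        self.defined[uuid] = {"parent": parent, "type": mdev_type}

    def create(self, uuid):
        # Stand-in for the proposed virNodeDeviceCreate(): activate a config.
        if uuid not in self.defined:
            raise LookupError(f"no node device config with uuid {uuid}")
        self.active.add(uuid)

    def destroy(self, uuid):
        # Stand-in for virNodeDeviceDestroy(): the config stays, the device goes.
        self.active.discard(uuid)

    def is_active(self, uuid):
        return uuid in self.active


def start_domain(registry, hostdev_uuid, autocreate):
    """Start a guest that references one mdev by UUID."""
    if autocreate and not registry.is_active(hostdev_uuid):
        # Any creation error simply propagates to the caller.
        registry.create(hostdev_uuid)
    if not registry.is_active(hostdev_uuid):
        raise RuntimeError("mdev is not active and autocreate is off")
    return "running"


def stop_domain(registry, hostdev_uuid, autocreate):
    """Stop the guest; an auto-created device is torn down again."""
    if autocreate:
        registry.destroy(hostdev_uuid)
    return "shutoff"
```

In this model the device only has to be defined by the time the guest starts, and after shutdown it is back in the inactive state, which is the behaviour described in the exchange above.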
Re: [libvirt] RFC: Creating mediated devices with libvirt
On 06/14/2017 06:06 PM, Erik Skultety wrote: > Hi all, > > so there's been an off-list discussion about finally implementing creation of > mediated devices with libvirt and it's more than desired to get as many > opinions > on that as possible, so please do share your ideas. This did come up already > as > part of some older threads ([1] for example), so this will be a respin of the > discussions. Long story short, we decided to put device creation off and focus > on the introduction of the framework as such first and build upon that later, > i.e. now. > > [1] https://www.redhat.com/archives/libvir-list/2017-February/msg00177.html > > > PART 1: NODEDEV-DRIVER > > > API-wise, device creation through the nodedev driver should be pretty > straightforward and without any issues, since virNodeDevCreateXML takes an XML > and does support flags. Looking at the current device XML: > > > mdev_0cce8709_0640_46ef_bd14_962c7f73cc6f > > /sys/devices/pci:00/.../0cce8709-0640-46ef-bd14-962c7f73cc6f > pci__03_00_0 > > vfio_mdev > > > > > UUID > > > > We can ignore ,, elements, since these are useless > during creation. We also cannot use since we don't support arbitrary > names and we also can't rely on users providing a name in correct form which > we > would need to further parse in order to get the UUID. > So since the only thing missing to successfully use create an mdev using XML > is > the UUID (if user doesn't want it to be generated automatically), how about > having a subelement under just like PCIs have and > friends, USBs have & , interfaces have to uniquely > identify the device even if the name itself is unique. > Removal of a device should work as well, although we might want to > consider creating a *Flags version of the API. Has any thought been put towards creating an mdev pool modeled after the Storage Pool? Similar to how vHBA's are created from a Storage Pool XML definition. 
That way XML could be defined to keep track of a lot of different things
that you may need and would require only starting the pool in order to
access.

Placed "appropriately" - the mdev's could already be available by the
time node device state initialization occurs too, since the pool would
conceivably have been created/defined using data from the physical device
and the calls to create the virtual devices would have occurred. Much
easier to add logic to a new driver/pool mgmt to handle whatever
considerations there are than adding logic into the existing node device
driver.

Of course if there's only ever going to be a 1-to-1 relationship between
whatever the mdev parent is and an mdev child, then it's probably
overkill to go with a pool model; however, I was under the impression
that an mdev parent could have many mdev children with various different
configuration options depending on multiple factors.

Thus:

Happy UUID ... ...

where the parent is then "found" in node device via "mdev_%s", XML that
would define specific "formats" that could be used and made
active/inactive. A bit different than XML which is output only based on
what's found in the storage pool source.

My recollection of the whole framework is not up to par with the latest
information, but I recall there being multiple different ways to have
"something" defined that could then be used by the guest based on one
parent mdev. Those things were a combination of what the mdev could
support, and there could be 1 or many depending on the resultant vGPU.

Maybe we need a virtual whiteboard to help describe the things ;-)

If you wait long enough or perhaps if review pace would pick up, maybe
creating a new driver and vir*obj infrastructure will be easier with a
common virObject instance. Oh and this has a "uuid" and "name" for
searches, so fits nicely.

>
> =
> PART 2: DOMAIN XML & DEVICE AUTO-CREATION, NO POLICY INVOLVED!
> =
>
> There were some doubts about auto-creation mentioned in [1], although they
> weren't specified further. So hopefully, we'll get further in the discussion
> this time.
>
> From my perspective there are two main reasons/benefits to that:
>
> 1) Convenience
> For apps like virt-manager, user will want to add a host device
> transparently, "hey libvirt, I want an mdev assigned to my VM, can you do
> that". Even for higher management apps, like oVirt, even they might not
> care about the parent device at all times and considering that they would
> need to enumerate the parents, pick one, create the device XML and pass it
> to the nodedev driver, IMHO it would actually be easier and faster to just
> do it directly through sysfs, bypassing libvirt once again

Using "pool" methodology borrows on existing storage technology except applying it to
Re: [libvirt] RFC: Creating mediated devices with libvirt
On 06/22/2017 11:52 AM, Pavel Hrdina wrote: > On Thu, Jun 22, 2017 at 09:28:57AM -0600, Alex Williamson wrote: >> On Thu, 22 Jun 2017 17:14:48 +0200 >> Erik Skultetywrote: >> >>> [...] > > ^this is the thing we constantly keep discussing as everyone has a > slightly > different angle of view - libvirt does not implement any kind of policy, > therefore the only "configuration" would be the PCI parent placement - > you say > what to do and we do it, no logic in it, that's it. Now, I don't > understand > taking care of the guesswork for the user in the simplest manner possible > as > policy rather as a mere convenience, be it just for developers and > testers, but > even that might apparently be perceived as a policy and therefore > unacceptable. > > I still stand by idea of having auto-creation as unfortunately, I sort of > still > fail to understand what the negative implications of having it are - is > that it > would get just unnecessarily too complex to maintain in the future that > we would > regret it or that we'd get a huge amount of follow-up requests for > extending the > feature or is it just that simply the interpretation of auto-create == > policy? The increasing complexity of the qemu driver is a significant concern with adding policy based logic to the code. THinking about this though, if we provide the inactive node device feature, then we can avoid essentially all new code and complexity QEMU driver, and still support auto-create. ie, in the domain XML we just continue to have the exact same XML that we already have today for mdevs, but with a single new attribute autocreate=yes|no >>> autocreate="yes"> >>> >>> So, just for clarification of the concept, the device with ^this UUID will >>> have >>> had to be defined by the nodedev API by the time we start to edit the domain >>> XML in this manner in which case the only thing the autocreate=yes would do >>> is >>> to actually create the mdev according to the nodedev config, right? 
>>> Continuing >>> with that thought, if UUID doesn't refer to any of the inactive configs it >>> will >>> be an error I suppose? What about the fact that only one vgpu type can live >>> on >>> the GPU? even if you can successfully identify a device using the UUID in >>> this >>> way, you'll still face the problem, that other types might be currently >>> occupying the GPU and need to be torn down first, will this be automated as >>> well in what you suggest? I assume not. >>> In the QEMU driver, then the only change required is if (def->autocreate) virNodeDeviceCreate(dev) >>> >>> Aha, so if a device gets torn down on shutdown, we won't face the problem >>> with >>> some other devices being active, all of them will have to be in the inactive >>> state because they got torn down during the last shutdown - that would work. >> >> >> I'm not familiar with how inactive devices would be defined in the >> nodedev API, would someone mind explaining or providing an example >> please? I don't understand where the metadata is stored that describes >> the what and where of a given UUID. Thanks, > > It would basically copy what we do for domains. Currently there is > virNodeDeviceCreateXML() which takes the XML definitions and creates a > new active node device and virNodeDeviceDestroy() which takes as > argument an object of existing active node device. FWIW: (Just in case someone doesn't know yet...) The only current CreateXML consumer is for NPIV/vHBA devices. As I've pointed out before I see a lot of similarities w/ mdev because they both have a dependency on "something else" in order for proper creation. NPIV/vHBA requires an HBA (scsi_hostN) that has a sysfs structure with a vport_create function to create the vHBA. The HBA scsi_hostN is instantiated during udevEnumerateDevices processing while the vHBA scsi_hostM is created during udevEventHandleCallback. The CreateXML provides an essentially 'transient' model to describe a(the) vHBA device(s). 
After host reboot, one would have to run 'virsh nodedev-create file.xml'
in order to recreate their vHBA.

In order to create more permanent vHBA's, it's possible to define a
storage pool that would create the vHBA when the storage pool is started.

So while there's no DefineXML support, there is a model that does provide
a mechanism to have persistence without needing to have a DefineXML for
node devices.

>
> We would extend the functionality with new APIs:
>
>   - virNodeDeviceCreate() which would take as argument an object of
>     existing inactive node device.
>
>   - virNodeDeviceDefineXML() would define the node device as inactive.
>
> With the virNodeDeviceDefineXML() you would create a list of predefined
> inactive
Re: [libvirt] RFC: Creating mediated devices with libvirt
On Thu, Jun 22, 2017 at 12:33:16PM -0400, Laine Stump wrote: > On 06/22/2017 11:28 AM, Alex Williamson wrote: > > On Thu, 22 Jun 2017 17:14:48 +0200 > > Erik Skultetywrote: > > > >> [...] > > ^this is the thing we constantly keep discussing as everyone has a > slightly > different angle of view - libvirt does not implement any kind of policy, > therefore the only "configuration" would be the PCI parent placement - > you say > what to do and we do it, no logic in it, that's it. Now, I don't > understand > taking care of the guesswork for the user in the simplest manner > possible as > policy rather as a mere convenience, be it just for developers and > testers, but > even that might apparently be perceived as a policy and therefore > unacceptable. > > I still stand by idea of having auto-creation as unfortunately, I sort > of still > fail to understand what the negative implications of having it are - is > that it > would get just unnecessarily too complex to maintain in the future that > we would > regret it or that we'd get a huge amount of follow-up requests for > extending the > feature or is it just that simply the interpretation of auto-create == > policy? > >>> > >>> The increasing complexity of the qemu driver is a significant concern with > >>> adding policy based logic to the code. THinking about this though, if we > >>> provide the inactive node device feature, then we can avoid essentially > >>> all new code and complexity QEMU driver, and still support auto-create. 
> >>> > >>> ie, in the domain XML we just continue to have the exact same XML that > >>> we already have today for mdevs, but with a single new attribute > >>> autocreate=yes|no > >>> > >>> > >>> >>> autocreate="yes"> > >>> > >>> > >> > >> So, just for clarification of the concept, the device with ^this UUID will > >> have > >> had to be defined by the nodedev API by the time we start to edit the > >> domain > >> XML in this manner in which case the only thing the autocreate=yes would > >> do is > >> to actually create the mdev according to the nodedev config, right? > >> Continuing > >> with that thought, if UUID doesn't refer to any of the inactive configs it > >> will > >> be an error I suppose? What about the fact that only one vgpu type can > >> live on > >> the GPU? even if you can successfully identify a device using the UUID in > >> this > >> way, you'll still face the problem, that other types might be currently > >> occupying the GPU and need to be torn down first, will this be automated as > >> well in what you suggest? I assume not. > >> > >>> > >>> > >>> > >>> > >>> In the QEMU driver, then the only change required is > >>> > >>>if (def->autocreate) > >>>virNodeDeviceCreate(dev) > >> > >> Aha, so if a device gets torn down on shutdown, we won't face the problem > >> with > >> some other devices being active, all of them will have to be in the > >> inactive > >> state because they got torn down during the last shutdown - that would > >> work. > > > > > > I'm not familiar with how inactive devices would be defined in the > > nodedev API, would someone mind explaining or providing an example > > please? I don't understand where the metadata is stored that describes > > the what and where of a given UUID. 
Thanks, > > You don't understand it because it doesn't exist yet :-) > > The idea is essentially the same that we've talked about, except that > all the information about parent PCI address, desired type of child, and > anything else (is there anything else?) is stored in some > not-yet-specified persistent node device config rather than directly in > the domain XML. Maybe something like: > > > BobLobLaw > > > > > > > I haven't thought about how it would show the difference between active > and inactive - didn't get enough coffee today and I have a headache. The XML doesn't need to show the difference between active & inactive. That distinction is something you filter on when querying the list of devices. We'd want to add a virNodeDeviceIsActive() API like we have for other objects too, so you can query it afterwards too. > ... okay, another "shower thought" is coming in... One deficiency of > this comes to mind - since the domain config references the device by > uuid, and an existing child device's uuid can't be changed, the unique > uuid used by a particular domain must be defined on all of the hosts > that the domain might be moved to. And since other domains can't share > that uuid (unless you're 100% sure they'll never be active at the same > time), you won't be able to implement the alternate idea of "pre-create > all the devices, then assign them to domains as needed"; instead, you'll > be forced to use the "create-on-demand" model. You can still
Re: [libvirt] RFC: Creating mediated devices with libvirt
On 06/22/2017 12:15 PM, Daniel P. Berrange wrote: > On Thu, Jun 22, 2017 at 05:14:48PM +0200, Erik Skultety wrote: >> [...] ^this is the thing we constantly keep discussing as everyone has a slightly different angle of view - libvirt does not implement any kind of policy, therefore the only "configuration" would be the PCI parent placement - you say what to do and we do it, no logic in it, that's it. Now, I don't understand taking care of the guesswork for the user in the simplest manner possible as policy rather as a mere convenience, be it just for developers and testers, but even that might apparently be perceived as a policy and therefore unacceptable. I still stand by idea of having auto-creation as unfortunately, I sort of still fail to understand what the negative implications of having it are - is that it would get just unnecessarily too complex to maintain in the future that we would regret it or that we'd get a huge amount of follow-up requests for extending the feature or is it just that simply the interpretation of auto-create == policy? >>> >>> The increasing complexity of the qemu driver is a significant concern with >>> adding policy based logic to the code. THinking about this though, if we >>> provide the inactive node device feature, then we can avoid essentially >>> all new code and complexity QEMU driver, and still support auto-create. >>> >>> ie, in the domain XML we just continue to have the exact same XML that >>> we already have today for mdevs, but with a single new attribute >>> autocreate=yes|no >>> >>> >>> >>> >>> >> >> So, just for clarification of the concept, the device with ^this UUID will >> have >> had to be defined by the nodedev API by the time we start to edit the domain >> XML in this manner in which case the only thing the autocreate=yes would do >> is >> to actually create the mdev according to the nodedev config, right? 
>> Continuing >> with that thought, if UUID doesn't refer to any of the inactive configs it >> will >> be an error I suppose? What about the fact that only one vgpu type can live >> on >> the GPU? even if you can successfully identify a device using the UUID in >> this >> way, you'll still face the problem, that other types might be currently >> occupying the GPU and need to be torn down first, will this be automated as >> well in what you suggest? I assume not. > > Technically we shouldn't need the node device to exist at the time we > define the XML - only at the time we start the guest, does the node > device have to exist. eg same way you list a virtual network as the > source of a guest NIC, but that virtual network doesn't have to actually > have been defined & started until the guest starts. > > If there are constraints that a pGPU can only support a certain combination > of vGPUs at any single point in time, doesn't the kernel already enforce > that when you try to create the vGPU in sysfs. IOW, we merely need to try > to create the vGPU, and if the kernel mdev driver doesn't allow you to mix > that with the other vGPUs that already exist, then we'd just report an > error from virNodeDeviceCreate, and that'd get propagated back as the > error for the virDomainCreate call. > >> >>> >>> >>> >>> >>> In the QEMU driver, then the only change required is >>> >>>if (def->autocreate) >>>virNodeDeviceCreate(dev) >> >> Aha, so if a device gets torn down on shutdown, we won't face the problem >> with >> some other devices being active, all of them will have to be in the inactive >> state because they got torn down during the last shutdown - that would work. > > I'm not sure what the relationship with other active devices is relevant > here. The virNodeDevicePtr we're accesing here is a single vGPU - if other > running guests have further vGPUs on the same pGPU, that's not really > relevant. Each vGPU is created/deleted as required. 
I think he's talking about devices that were previously used by other domains that are no longer active. Since they're also automatically destroyed, they're not a problem.
Re: [libvirt] RFC: Creating mediated devices with libvirt
On 06/22/2017 11:28 AM, Alex Williamson wrote: > On Thu, 22 Jun 2017 17:14:48 +0200 > Erik Skultetywrote: > >> [...] ^this is the thing we constantly keep discussing as everyone has a slightly different angle of view - libvirt does not implement any kind of policy, therefore the only "configuration" would be the PCI parent placement - you say what to do and we do it, no logic in it, that's it. Now, I don't understand taking care of the guesswork for the user in the simplest manner possible as policy rather as a mere convenience, be it just for developers and testers, but even that might apparently be perceived as a policy and therefore unacceptable. I still stand by idea of having auto-creation as unfortunately, I sort of still fail to understand what the negative implications of having it are - is that it would get just unnecessarily too complex to maintain in the future that we would regret it or that we'd get a huge amount of follow-up requests for extending the feature or is it just that simply the interpretation of auto-create == policy? >>> >>> The increasing complexity of the qemu driver is a significant concern with >>> adding policy based logic to the code. THinking about this though, if we >>> provide the inactive node device feature, then we can avoid essentially >>> all new code and complexity QEMU driver, and still support auto-create. >>> >>> ie, in the domain XML we just continue to have the exact same XML that >>> we already have today for mdevs, but with a single new attribute >>> autocreate=yes|no >>> >>> >>> >>> >>> >> >> So, just for clarification of the concept, the device with ^this UUID will >> have >> had to be defined by the nodedev API by the time we start to edit the domain >> XML in this manner in which case the only thing the autocreate=yes would do >> is >> to actually create the mdev according to the nodedev config, right? 
>> Continuing >> with that thought, if UUID doesn't refer to any of the inactive configs it >> will >> be an error I suppose? What about the fact that only one vgpu type can live >> on >> the GPU? even if you can successfully identify a device using the UUID in >> this >> way, you'll still face the problem, that other types might be currently >> occupying the GPU and need to be torn down first, will this be automated as >> well in what you suggest? I assume not. >> >>> >>> >>> >>> >>> In the QEMU driver, then the only change required is >>> >>>if (def->autocreate) >>>virNodeDeviceCreate(dev) >> >> Aha, so if a device gets torn down on shutdown, we won't face the problem >> with >> some other devices being active, all of them will have to be in the inactive >> state because they got torn down during the last shutdown - that would work. > > > I'm not familiar with how inactive devices would be defined in the > nodedev API, would someone mind explaining or providing an example > please? I don't understand where the metadata is stored that describes > the what and where of a given UUID. Thanks, You don't understand it because it doesn't exist yet :-) The idea is essentially the same that we've talked about, except that all the information about parent PCI address, desired type of child, and anything else (is there anything else?) is stored in some not-yet-specified persistent node device config rather than directly in the domain XML. Maybe something like: BobLobLaw I haven't thought about how it would show the difference between active and inactive - didn't get enough coffee today and I have a headache. The advantage of this is that it uncouples the specifics of the child device from the domain XML - the only thing in the domain XML is the uuid. So a device config with that uuid would need to exist on every host where you wanted to run a particular guest, but the details could be different, yet you wouldn't need to edit the domain XML. 
This is a similar concept to the idea of creating libvirt networks that are just an indirect pointer to a bridge device (which may have a different name on each host) or to an SRIOV PF (yeah, I know Dan doesn't like that feature, but I find it very useful, and unobtrusive if management chooses not to use it). So from your point of view (I'm talking to Alex here), implementing it this way would mean that you would need to create the child device definitions in the nodedev driver once (and possibly/hopefully the uuid of the devices would be autogenerated, same as we do for uuids in other parts of libvirt config), then copy that uuid to the domain config one time. But after doing that once, you would be able to start and stop domains and the host without any extra action. You could also define different nodedevices that used the same parent for different child types, and reference them
Re: [libvirt] RFC: Creating mediated devices with libvirt
On Thu, Jun 22, 2017 at 05:14:48PM +0200, Erik Skultety wrote: > [...] > > > > > > ^this is the thing we constantly keep discussing as everyone has a > > > slightly > > > different angle of view - libvirt does not implement any kind of policy, > > > therefore the only "configuration" would be the PCI parent placement - > > > you say > > > what to do and we do it, no logic in it, that's it. Now, I don't > > > understand > > > taking care of the guesswork for the user in the simplest manner possible > > > as > > > policy rather as a mere convenience, be it just for developers and > > > testers, but > > > even that might apparently be perceived as a policy and therefore > > > unacceptable. > > > > > > I still stand by idea of having auto-creation as unfortunately, I sort of > > > still > > > fail to understand what the negative implications of having it are - is > > > that it > > > would get just unnecessarily too complex to maintain in the future that > > > we would > > > regret it or that we'd get a huge amount of follow-up requests for > > > extending the > > > feature or is it just that simply the interpretation of auto-create == > > > policy? > > > > The increasing complexity of the qemu driver is a significant concern with > > adding policy based logic to the code. THinking about this though, if we > > provide the inactive node device feature, then we can avoid essentially > > all new code and complexity QEMU driver, and still support auto-create. > > > > ie, in the domain XML we just continue to have the exact same XML that > > we already have today for mdevs, but with a single new attribute > > autocreate=yes|no > > > > > > > > > > > > So, just for clarification of the concept, the device with ^this UUID will > have > had to be defined by the nodedev API by the time we start to edit the domain > XML in this manner in which case the only thing the autocreate=yes would do is > to actually create the mdev according to the nodedev config, right? 
> Continuing with that thought, if UUID doesn't refer to any of the inactive
> configs it will be an error I suppose? What about the fact that only one vgpu
> type can live on the GPU? even if you can successfully identify a device
> using the UUID in this way, you'll still face the problem, that other types
> might be currently occupying the GPU and need to be torn down first, will
> this be automated as well in what you suggest? I assume not.

Technically we shouldn't need the node device to exist at the time we define the XML. Only at the time we start the guest does the node device have to exist, e.g. the same way you list a virtual network as the source of a guest NIC, but that virtual network doesn't have to actually have been defined & started until the guest starts.

If there are constraints that a pGPU can only support a certain combination of vGPUs at any single point in time, doesn't the kernel already enforce that when you try to create the vGPU in sysfs? IOW, we merely need to try to create the vGPU, and if the kernel mdev driver doesn't allow you to mix that with the other vGPUs that already exist, then we'd just report an error from virNodeDeviceCreate, and that'd get propagated back as the error for the virDomainCreate call.

> > In the QEMU driver, then the only change required is
> >
> >   if (def->autocreate)
> >       virNodeDeviceCreate(dev)
>
> Aha, so if a device gets torn down on shutdown, we won't face the problem
> with some other devices being active, all of them will have to be in the
> inactive state because they got torn down during the last shutdown - that
> would work.

I'm not sure how the relationship with other active devices is relevant here. The virNodeDevicePtr we're accessing here is a single vGPU - if other running guests have further vGPUs on the same pGPU, that's not really relevant. Each vGPU is created/deleted as required.
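To illustrate the "just try it and propagate the error" point, here is a minimal Python sketch. It assumes the kernel mdev sysfs layout in which a parent device exposes mdev_supported_types/<type>/create and a device is created by writing its UUID there. The names create_mdev, start_guest and MdevCreateError are hypothetical and are not libvirt APIs; the parent directory is passed in as a parameter so the sketch makes no claim about where a given parent lives in sysfs.

```python
import os


class MdevCreateError(Exception):
    """Hypothetical error type standing in for a libvirt error report."""


def create_mdev(parent_dir, mdev_type, uuid):
    """Ask the kernel to create an mdev by writing its UUID to the 'create' node.

    Assumed layout: <parent_dir>/mdev_supported_types/<mdev_type>/create.
    No policy is applied here: if the write fails (for example because the
    parent driver refuses the type), the OSError is wrapped and passed up.
    """
    create_node = os.path.join(parent_dir, "mdev_supported_types", mdev_type, "create")
    try:
        with open(create_node, "w") as f:
            f.write(uuid)
    except OSError as exc:
        raise MdevCreateError(
            f"cannot create mdev {uuid} of type {mdev_type}: {exc}"
        ) from exc


def start_guest(parent_dir, mdev_type, uuid):
    """Hypothetical guest start path: create the vGPU first, let failures propagate."""
    create_mdev(parent_dir, mdev_type, uuid)
    return "running"
```

The only decision made in this sketch is to surface the kernel's answer unchanged, which is what keeps policy out of the start path.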
Regards, Daniel -- |: https://berrange.com -o-https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o-https://fstop138.berrange.com :| |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :| -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] RFC: Creating mediated devices with libvirt
On Thu, Jun 22, 2017 at 09:28:57AM -0600, Alex Williamson wrote: > On Thu, 22 Jun 2017 17:14:48 +0200 > Erik Skultetywrote: > > > [...] > > > > > > > > ^this is the thing we constantly keep discussing as everyone has a > > > > slightly > > > > different angle of view - libvirt does not implement any kind of policy, > > > > therefore the only "configuration" would be the PCI parent placement - > > > > you say > > > > what to do and we do it, no logic in it, that's it. Now, I don't > > > > understand > > > > taking care of the guesswork for the user in the simplest manner > > > > possible as > > > > policy rather as a mere convenience, be it just for developers and > > > > testers, but > > > > even that might apparently be perceived as a policy and therefore > > > > unacceptable. > > > > > > > > I still stand by idea of having auto-creation as unfortunately, I sort > > > > of still > > > > fail to understand what the negative implications of having it are - is > > > > that it > > > > would get just unnecessarily too complex to maintain in the future that > > > > we would > > > > regret it or that we'd get a huge amount of follow-up requests for > > > > extending the > > > > feature or is it just that simply the interpretation of auto-create == > > > > policy? > > > > > > The increasing complexity of the qemu driver is a significant concern with > > > adding policy based logic to the code. THinking about this though, if we > > > provide the inactive node device feature, then we can avoid essentially > > > all new code and complexity QEMU driver, and still support auto-create. 
> > > > > > ie, in the domain XML we just continue to have the exact same XML that > > > we already have today for mdevs, but with a single new attribute > > > autocreate=yes|no > > > > > > > > > > > autocreate="yes"> > > > > > > > > > > So, just for clarification of the concept, the device with ^this UUID will > > have > > had to be defined by the nodedev API by the time we start to edit the domain > > XML in this manner in which case the only thing the autocreate=yes would do > > is > > to actually create the mdev according to the nodedev config, right? > > Continuing > > with that thought, if UUID doesn't refer to any of the inactive configs it > > will > > be an error I suppose? What about the fact that only one vgpu type can live > > on > > the GPU? even if you can successfully identify a device using the UUID in > > this > > way, you'll still face the problem, that other types might be currently > > occupying the GPU and need to be torn down first, will this be automated as > > well in what you suggest? I assume not. > > > > > > > > > > > > > > > > > In the QEMU driver, then the only change required is > > > > > >if (def->autocreate) > > >virNodeDeviceCreate(dev) > > > > Aha, so if a device gets torn down on shutdown, we won't face the problem > > with > > some other devices being active, all of them will have to be in the inactive > > state because they got torn down during the last shutdown - that would work. > > > I'm not familiar with how inactive devices would be defined in the > nodedev API, would someone mind explaining or providing an example > please? I don't understand where the metadata is stored that describes > the what and where of a given UUID. Thanks, It would basically copy what we do for domains. Currently there is virNodeDeviceCreateXML() which takes the XML definitions and creates a new active node device and virNodeDeviceDestroy() which takes as argument an object of existing active node device. 
We would extend the functionality with new APIs:

- virNodeDeviceCreate(), which would take as its argument an object of an existing inactive node device.
- virNodeDeviceDefineXML(), which would define the node device as inactive.

With virNodeDeviceDefineXML() you would create a list of predefined inactive devices which could be obtained by virConnectListAllNodeDevices(), for example. Internally we would store XML files the same way as we do for domains, somewhere in "/etc/libvirt/...", and like with domains the APIs would work with these files.

In virsh terms there would be a similar analogy to the domain commands: "virsh nodedev-start" could simply map to virNodeDeviceCreate() and would work like "virsh start" for domains, and "virsh nodedev-define" would map to virNodeDeviceDefineXML() and work the same way as "virsh define".

You could simply list the predefined mdev devices using "virsh nodedev-list", get the UUID of an existing mdev device and use it in a domain. In virt-manager there could be a new type of hostdev device where you could select one of the existing mdev devices from a drop-down list; virt-manager would show nice user-friendly descriptions of the mdev devices, but under the hood it would put the UUID in the domain XML.

Pavel

>
> Alex
>
> --
> libvir-list mailing list
> libvir-list@redhat.com
>
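The define/create/destroy lifecycle described above can be sketched as a toy in-memory model in Python. This is not libvirt code: NodeDeviceStore and its methods are hypothetical names that only mirror the proposed API names, and the choice that destroying a device keeps its stored config is an assumption made by analogy with persistent domains.

```python
class NodeDeviceStore:
    """Toy in-memory model of the proposed define/create/destroy lifecycle."""

    def __init__(self):
        self._configs = {}    # uuid -> XML string (the persistent, inactive config)
        self._active = set()  # uuids of devices that currently exist

    def define_xml(self, uuid, xml):
        """Models virNodeDeviceDefineXML(): store the config, do not start it."""
        self._configs[uuid] = xml

    def create(self, uuid):
        """Models the proposed virNodeDeviceCreate(): start a defined device."""
        if uuid not in self._configs:
            raise KeyError(f"no inactive config for {uuid}")
        self._active.add(uuid)

    def destroy(self, uuid):
        """Models virNodeDeviceDestroy(): stop the device, keep its config (assumed)."""
        self._active.discard(uuid)

    def list_all(self):
        """Return sorted (uuid, state) pairs, similar in spirit to a device listing."""
        return sorted(
            (u, "active" if u in self._active else "inactive")
            for u in self._configs
        )
```

A defined device shows up as inactive until create() is called, and returns to inactive after destroy(), which is the same shape as "virsh define", "virsh start" and "virsh destroy" for a persistent domain.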
Re: [libvirt] RFC: Creating mediated devices with libvirt
On Thu, 22 Jun 2017 17:14:48 +0200 Erik Skultetywrote: > [...] > > > > > > ^this is the thing we constantly keep discussing as everyone has a > > > slightly > > > different angle of view - libvirt does not implement any kind of policy, > > > therefore the only "configuration" would be the PCI parent placement - > > > you say > > > what to do and we do it, no logic in it, that's it. Now, I don't > > > understand > > > taking care of the guesswork for the user in the simplest manner possible > > > as > > > policy rather as a mere convenience, be it just for developers and > > > testers, but > > > even that might apparently be perceived as a policy and therefore > > > unacceptable. > > > > > > I still stand by idea of having auto-creation as unfortunately, I sort of > > > still > > > fail to understand what the negative implications of having it are - is > > > that it > > > would get just unnecessarily too complex to maintain in the future that > > > we would > > > regret it or that we'd get a huge amount of follow-up requests for > > > extending the > > > feature or is it just that simply the interpretation of auto-create == > > > policy? > > > > The increasing complexity of the qemu driver is a significant concern with > > adding policy based logic to the code. THinking about this though, if we > > provide the inactive node device feature, then we can avoid essentially > > all new code and complexity QEMU driver, and still support auto-create. > > > > ie, in the domain XML we just continue to have the exact same XML that > > we already have today for mdevs, but with a single new attribute > > autocreate=yes|no > > > > > > > > > > > > So, just for clarification of the concept, the device with ^this UUID will > have > had to be defined by the nodedev API by the time we start to edit the domain > XML in this manner in which case the only thing the autocreate=yes would do is > to actually create the mdev according to the nodedev config, right? 
Continuing > with that thought, if UUID doesn't refer to any of the inactive configs it > will > be an error I suppose? What about the fact that only one vgpu type can live on > the GPU? even if you can successfully identify a device using the UUID in this > way, you'll still face the problem, that other types might be currently > occupying the GPU and need to be torn down first, will this be automated as > well in what you suggest? I assume not. > > > > > > > > > > > In the QEMU driver, then the only change required is > > > >if (def->autocreate) > >virNodeDeviceCreate(dev) > > Aha, so if a device gets torn down on shutdown, we won't face the problem with > some other devices being active, all of them will have to be in the inactive > state because they got torn down during the last shutdown - that would work. I'm not familiar with how inactive devices would be defined in the nodedev API, would someone mind explaining or providing an example please? I don't understand where the metadata is stored that describes the what and where of a given UUID. Thanks, Alex -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] RFC: Creating mediated devices with libvirt
[...] > > > > ^this is the thing we constantly keep discussing as everyone has a slightly > > different angle of view - libvirt does not implement any kind of policy, > > therefore the only "configuration" would be the PCI parent placement - you > > say > > what to do and we do it, no logic in it, that's it. Now, I don't understand > > taking care of the guesswork for the user in the simplest manner possible as > > policy rather as a mere convenience, be it just for developers and testers, > > but > > even that might apparently be perceived as a policy and therefore > > unacceptable. > > > > I still stand by idea of having auto-creation as unfortunately, I sort of > > still > > fail to understand what the negative implications of having it are - is > > that it > > would get just unnecessarily too complex to maintain in the future that we > > would > > regret it or that we'd get a huge amount of follow-up requests for > > extending the > > feature or is it just that simply the interpretation of auto-create == > > policy? > > The increasing complexity of the qemu driver is a significant concern with > adding policy based logic to the code. THinking about this though, if we > provide the inactive node device feature, then we can avoid essentially > all new code and complexity QEMU driver, and still support auto-create. > > ie, in the domain XML we just continue to have the exact same XML that > we already have today for mdevs, but with a single new attribute > autocreate=yes|no > > > > > So, just for clarification of the concept, the device with ^this UUID will have had to be defined by the nodedev API by the time we start to edit the domain XML in this manner in which case the only thing the autocreate=yes would do is to actually create the mdev according to the nodedev config, right? Continuing with that thought, if UUID doesn't refer to any of the inactive configs it will be an error I suppose? What about the fact that only one vgpu type can live on the GPU? 
Even if you can successfully identify a device using the UUID in this way, you'll still face the problem that other types might currently be occupying the GPU and need to be torn down first. Will this be automated as well in what you suggest? I assume not.

> In the QEMU driver, then the only change required is
>
>   if (def->autocreate)
>       virNodeDeviceCreate(dev)

Aha, so if a device gets torn down on shutdown, we won't face the problem with some other devices being active: all of them will have to be in the inactive state because they got torn down during the last shutdown. That would work.

Erik

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] RFC: Creating mediated devices with libvirt
On Thu, Jun 22, 2017 at 10:41:13AM +0200, Martin Polednik wrote: > On 16/06/17 18:14 +0100, Daniel P. Berrange wrote: > > On Fri, Jun 16, 2017 at 06:11:17PM +0100, Daniel P. Berrange wrote: > > > On Fri, Jun 16, 2017 at 11:02:55AM -0600, Alex Williamson wrote: > > > > On Fri, 16 Jun 2017 11:32:04 -0400 > > > > Laine Stumpwrote: > > > > > > > > > On 06/15/2017 02:42 PM, Alex Williamson wrote: > > > > > > On Thu, 15 Jun 2017 09:33:01 +0100 > > > > > > "Daniel P. Berrange" wrote: > > > > > > > > > > > >> On Thu, Jun 15, 2017 at 12:06:43AM +0200, Erik Skultety wrote: > > > > > >>> Hi all, > > > > > >>> > > > > > >>> so there's been an off-list discussion about finally implementing > > > > > >>> creation of > > > > > >>> mediated devices with libvirt and it's more than desired to get > > > > > >>> as many opinions > > > > > >>> on that as possible, so please do share your ideas. This did come > > > > > >>> up already as > > > > > >>> part of some older threads ([1] for example), so this will be a > > > > > >>> respin of the > > > > > >>> discussions. Long story short, we decided to put device creation > > > > > >>> off and focus > > > > > >>> on the introduction of the framework as such first and build upon > > > > > >>> that later, > > > > > >>> i.e. now. > > > > > >>> > > > > > >>> [1] > > > > > >>> https://www.redhat.com/archives/libvir-list/2017-February/msg00177.html > > > > > >>> > > > > > >>> > > > > > >>> PART 1: NODEDEV-DRIVER > > > > > >>> > > > > > >>> > > > > > >>> API-wise, device creation through the nodedev driver should be > > > > > >>> pretty > > > > > >>> straightforward and without any issues, since virNodeDevCreateXML > > > > > >>> takes an XML > > > > > >>> and does support flags. 
Looking at the current device XML: > > > > > >>> > > > > > >>> > > > > > >>> mdev_0cce8709_0640_46ef_bd14_962c7f73cc6f > > > > > >>> > > > > > >>> /sys/devices/pci:00/.../0cce8709-0640-46ef-bd14-962c7f73cc6f > > > > > >>> pci__03_00_0 > > > > > >>> > > > > > >>> vfio_mdev > > > > > >>> > > > > > >>> > > > > > >>> > > > > > >>> > > > > > >>> UUID > > > > > >>> > > > > > >>> > > > > > >>> > > > > > >>> We can ignore ,, elements, since these > > > > > >>> are useless > > > > > >>> during creation. We also cannot use since we don't support > > > > > >>> arbitrary > > > > > >>> names and we also can't rely on users providing a name in correct > > > > > >>> form which we > > > > > >>> would need to further parse in order to get the UUID. > > > > > >>> So since the only thing missing to successfully use create an > > > > > >>> mdev using XML is > > > > > >>> the UUID (if user doesn't want it to be generated automatically), > > > > > >>> how about > > > > > >>> having a subelement under just like PCIs have > > > > > >>> and > > > > > >>> friends, USBs have & , interfaces have to > > > > > >>> uniquely > > > > > >>> identify the device even if the name itself is unique. > > > > > >>> Removal of a device should work as well, although we might want to > > > > > >>> consider creating a *Flags version of the API. > > > > > >>> > > > > > >>> = > > > > > >>> PART 2: DOMAIN XML & DEVICE AUTO-CREATION, NO POLICY INVOLVED! > > > > > >>> = > > > > > >>> > > > > > >>> There were some doubts about auto-creation mentioned in [1], > > > > > >>> although they > > > > > >>> weren't specified further. So hopefully, we'll get further in the > > > > > >>> discussion > > > > > >>> this time. 
> > > > > >>> > > > > > >>> From my perspective there are two main reasons/benefits to that: > > > > > >>> > > > > > >>> 1) Convenience > > > > > >>> For apps like virt-manager, user will want to add a host device > > > > > >>> transparently, > > > > > >>> "hey libvirt, I want an mdev assigned to my VM, can you do that". > > > > > >>> Even for > > > > > >>> higher management apps, like oVirt, even they might not care > > > > > >>> about the parent > > > > > >>> device at all times and considering that they would need to > > > > > >>> enumerate the > > > > > >>> parents, pick one, create the device XML and pass it to the > > > > > >>> nodedev driver, IMHO > > > > > >>> it would actually be easier and faster to just do it directly > > > > > >>> through sysfs, > > > > > >>> bypassing libvirt once again > > > > > >> > > > > > >> The convenience only works if the policy we've provided in libvirt > > > > > >> actually > > > > > >> matches the policy the application wants. I think it is quite > > > > > >> likely that with > > > > > >> cloud the mdevs will be created out of band from the domain > > > > > >> startup process. > > > > > >> It is possible the app will just have a fixed set of mdevs > > > > > >> pre-created when > > > > > >>
Re: [libvirt] RFC: Creating mediated devices with libvirt
On Thu, Jun 22, 2017 at 02:05:26PM +0200, Erik Skultety wrote: > On Thu, Jun 22, 2017 at 10:41:13AM +0200, Martin Polednik wrote: > > On 16/06/17 18:14 +0100, Daniel P. Berrange wrote: > > > On Fri, Jun 16, 2017 at 06:11:17PM +0100, Daniel P. Berrange wrote: > > > > > > > > I'm fine with libvirt having APIs in the node device APIs to enable > > > > create/delete with libvirt, as well as using managed=yes in the same > > > > manner that we do for regular PCI devices (the bind/unbind to vfio > > > > or pci-back) > > > > > > Oh, and we really need to fix the big missing feature in the node > > > device APIs of persistent, inactive configs. eg we should be able > > > to record XML configs of mdevs (and npiv devices too), in /etc/libvirt > > > so they persist across reboots, and can be setup for auto-start on > > > boot too. > > > > That doesn't help mdev in any way though. It doesn't make sense to > > generate new UUID for given VM at each start. So in case of > > What statement does this^^ refer to? Why would you generate a new UUID for a > VM > at each start, you'd generate it only once and then store it, the same way as > domain UUIDs work. > > > single host, the persistent file is redundant to the domain XML (as > > long as uuid+parent is in the xml) and in case of cluster we'd have to > > Right now you don't have any info about the parent device in the domain XML > and > such data would only exist in the XML if we all agreed on auto-creating mdevs, > in which case persistent configs in nodedev would be unnecessary and > vice-versa. > > > copy all possible VM mdev definitions to all the hosts. > > ^For mdev configs, you might be better off with creating them explicitly than > copying configs, simply because given the information the XML has, you might > conflict with UUIDs between hosts, so you'd have to take care for that. 
> Parents have different PCI addresses that most probably wouldn't match
> across hosts, so from automation point of view, I think writing a stub
> recreating the whole set of devices/configs might actually be easier than
> copying & handling them (solely because the 2 things left - after the ones I
> mentioned - in the XML are the vgpu type and IOMMU group number which AFAIK
> cannot be requested explicitly).

Yep, separating the mdev config from the domain config is a significant benefit, as it makes the domain config independent of the particular device you've attached to, which can vary across hosts.

> > The idea works nicely if you had such definitions accessible in the
> > cluster and could define a group of devices (gpu+soundcard, single
> > mdev, single vf, ...) that would later be assigned to a VM (let's hope
> > kubevirt can get there).
> >
> > As for automatic creation, I think it's on the "nice to have" level.
> > So far libvirt is close to useless when working with mdevs as all the
> > data is in the same sysfs place where create/delete endpoints are - as
> > mentioned earlier, we can just get the data and do everything directly
> > from there instead of dealing with XML and bunch of new API calls.
> > Having at least some *configurable* auto create policy might add some
>
> ^this is the thing we constantly keep discussing as everyone has a slightly
> different angle of view - libvirt does not implement any kind of policy,
> therefore the only "configuration" would be the PCI parent placement - you
> say what to do and we do it, no logic in it, that's it. Now, I don't
> understand taking care of the guesswork for the user in the simplest manner
> possible as policy rather as a mere convenience, be it just for developers
> and testers, but even that might apparently be perceived as a policy and
> therefore unacceptable.
>
> I still stand by idea of having auto-creation as unfortunately, I sort of
> still fail to understand what the negative implications of having it are - is
> that it would get just unnecessarily too complex to maintain in the future
> that we would regret it or that we'd get a huge amount of follow-up requests
> for extending the feature or is it just that simply the interpretation of
> auto-create == policy?

The increasing complexity of the qemu driver is a significant concern with adding policy based logic to the code. Thinking about this though, if we provide the inactive node device feature, then we can avoid essentially all new code and complexity in the QEMU driver, and still support auto-create.

ie, in the domain XML we just continue to have the exact same XML that we already have today for mdevs, but with a single new attribute autocreate=yes|no

In the QEMU driver, then the only change required is

  if (def->autocreate)
      virNodeDeviceCreate(dev)

and the opposite in shutdown. This avoids pulling all the node device XML schema into the domain XML schema, which is something I dislike about the previous proposals too.

The inactive node device concept is also more broadly useful
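The start and shutdown hooks proposed in the message above can be sketched roughly as follows. This is a Python model, not QEMU driver code: HostdevDef, start_domain and shutdown_domain are hypothetical names, and the create/destroy callbacks stand in for whatever the node device driver would provide.

```python
from dataclasses import dataclass


@dataclass
class HostdevDef:
    """Hypothetical stand-in for an mdev hostdev entry in the domain XML."""
    uuid: str
    autocreate: bool = False


def start_domain(hostdevs, create_dev):
    """Before the guest starts, create every device marked autocreate."""
    created = []
    for dev in hostdevs:
        if dev.autocreate:
            create_dev(dev.uuid)  # any error raised here aborts the start
            created.append(dev.uuid)
    return created


def shutdown_domain(hostdevs, destroy_dev):
    """The opposite on shutdown: tear down what autocreate set up."""
    for dev in reversed(hostdevs):
        if dev.autocreate:
            destroy_dev(dev.uuid)
```

Devices without autocreate are left untouched in both directions, so a management application that pre-creates its mdevs out of band sees no change in behaviour.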
Re: [libvirt] RFC: Creating mediated devices with libvirt
On 22/06/17 14:05 +0200, Erik Skultety wrote: On Thu, Jun 22, 2017 at 10:41:13AM +0200, Martin Polednik wrote: On 16/06/17 18:14 +0100, Daniel P. Berrange wrote: > On Fri, Jun 16, 2017 at 06:11:17PM +0100, Daniel P. Berrange wrote: > > On Fri, Jun 16, 2017 at 11:02:55AM -0600, Alex Williamson wrote: > > > On Fri, 16 Jun 2017 11:32:04 -0400 > > > Laine Stumpwrote: > > > > > > > On 06/15/2017 02:42 PM, Alex Williamson wrote: > > > > > On Thu, 15 Jun 2017 09:33:01 +0100 > > > > > "Daniel P. Berrange" wrote: > > > > > > > > > >> On Thu, Jun 15, 2017 at 12:06:43AM +0200, Erik Skultety wrote: > > > > >>> Hi all, > > > > >>> > > > > >>> so there's been an off-list discussion about finally implementing creation of > > > > >>> mediated devices with libvirt and it's more than desired to get as many opinions > > > > >>> on that as possible, so please do share your ideas. This did come up already as > > > > >>> part of some older threads ([1] for example), so this will be a respin of the > > > > >>> discussions. Long story short, we decided to put device creation off and focus > > > > >>> on the introduction of the framework as such first and build upon that later, > > > > >>> i.e. now. > > > > >>> > > > > >>> [1] https://www.redhat.com/archives/libvir-list/2017-February/msg00177.html > > > > >>> > > > > >>> > > > > >>> PART 1: NODEDEV-DRIVER > > > > >>> > > > > >>> > > > > >>> API-wise, device creation through the nodedev driver should be pretty > > > > >>> straightforward and without any issues, since virNodeDevCreateXML takes an XML > > > > >>> and does support flags. 
Looking at the current device XML: > > > > >>> > > > > >>> > > > > >>> mdev_0cce8709_0640_46ef_bd14_962c7f73cc6f > > > > >>> /sys/devices/pci:00/.../0cce8709-0640-46ef-bd14-962c7f73cc6f > > > > >>> pci__03_00_0 > > > > >>> > > > > >>> vfio_mdev > > > > >>> > > > > >>> > > > > >>> > > > > >>> > > > > >>> UUID > > > > >>> > > > > >>> > > > > >>> > > > > >>> We can ignore ,, elements, since these are useless > > > > >>> during creation. We also cannot use since we don't support arbitrary > > > > >>> names and we also can't rely on users providing a name in correct form which we > > > > >>> would need to further parse in order to get the UUID. > > > > >>> So since the only thing missing to successfully use create an mdev using XML is > > > > >>> the UUID (if user doesn't want it to be generated automatically), how about > > > > >>> having a subelement under just like PCIs have and > > > > >>> friends, USBs have & , interfaces have to uniquely > > > > >>> identify the device even if the name itself is unique. > > > > >>> Removal of a device should work as well, although we might want to > > > > >>> consider creating a *Flags version of the API. > > > > >>> > > > > >>> = > > > > >>> PART 2: DOMAIN XML & DEVICE AUTO-CREATION, NO POLICY INVOLVED! > > > > >>> = > > > > >>> > > > > >>> There were some doubts about auto-creation mentioned in [1], although they > > > > >>> weren't specified further. So hopefully, we'll get further in the discussion > > > > >>> this time. > > > > >>> > > > > >>> From my perspective there are two main reasons/benefits to that: > > > > >>> > > > > >>> 1) Convenience > > > > >>> For apps like virt-manager, user will want to add a host device transparently, > > > > >>> "hey libvirt, I want an mdev assigned to my VM, can you do that". 
Even for > > > > >>> higher management apps, like oVirt, even they might not care about the parent > > > > >>> device at all times and considering that they would need to enumerate the > > > > >>> parents, pick one, create the device XML and pass it to the nodedev driver, IMHO > > > > >>> it would actually be easier and faster to just do it directly through sysfs, > > > > >>> bypassing libvirt once again > > > > >> > > > > >> The convenience only works if the policy we've provided in libvirt actually > > > > >> matches the policy the application wants. I think it is quite likely that with > > > > >> cloud the mdevs will be created out of band from the domain startup process. > > > > >> It is possible the app will just have a fixed set of mdevs pre-created when > > > > >> the host starts up. Or that the mgmt app wants the domain startup process to > > > > >> be a two phase setup, where it first allocates the resources needed, and later > > > > >> then tries to start the guest. This is why I keep saying that putting this kind > > > > >> of "convenient" policy in libvirt is a bad idea - it is essentially just putting > > > > >> a bit of virt-manager code into libvirt - more advanced apps will need more > > > > >> flexibility in this area. > > > > >> > > > > >>> 2)
Re: [libvirt] RFC: Creating mediated devices with libvirt
On 16/06/17 18:14 +0100, Daniel P. Berrange wrote: On Fri, Jun 16, 2017 at 06:11:17PM +0100, Daniel P. Berrange wrote: On Fri, Jun 16, 2017 at 11:02:55AM -0600, Alex Williamson wrote: > On Fri, 16 Jun 2017 11:32:04 -0400 > Laine Stump wrote: > > > On 06/15/2017 02:42 PM, Alex Williamson wrote: > > > On Thu, 15 Jun 2017 09:33:01 +0100 > > > "Daniel P. Berrange" wrote: > > > > > >> On Thu, Jun 15, 2017 at 12:06:43AM +0200, Erik Skultety wrote:
Re: [libvirt] RFC: Creating mediated devices with libvirt
On Fri, 16 Jun 2017 18:11:17 +0100 "Daniel P. Berrange" wrote: > On Fri, Jun 16, 2017 at 11:02:55AM -0600, Alex Williamson wrote: > > On Fri, 16 Jun 2017 11:32:04 -0400 > > Laine Stump wrote: > > > > > On 06/15/2017 02:42 PM, Alex Williamson wrote: > > > > On Thu, 15 Jun 2017 09:33:01 +0100 > > > > "Daniel P. Berrange" wrote: > > > > > > > >> On Thu, Jun 15, 2017 at 12:06:43AM +0200, Erik Skultety wrote:
Re: [libvirt] RFC: Creating mediated devices with libvirt
On Fri, Jun 16, 2017 at 06:11:17PM +0100, Daniel P. Berrange wrote: > On Fri, Jun 16, 2017 at 11:02:55AM -0600, Alex Williamson wrote: > > On Fri, 16 Jun 2017 11:32:04 -0400 > > Laine Stump wrote: > > > > > On 06/15/2017 02:42 PM, Alex Williamson wrote: > > > > On Thu, 15 Jun 2017 09:33:01 +0100 > > > > "Daniel P. Berrange" wrote: > > > > > > > >> On Thu, Jun 15, 2017 at 12:06:43AM +0200, Erik Skultety wrote:
Re: [libvirt] RFC: Creating mediated devices with libvirt
On Fri, Jun 16, 2017 at 11:02:55AM -0600, Alex Williamson wrote: > On Fri, 16 Jun 2017 11:32:04 -0400 > Laine Stump wrote: > > > On 06/15/2017 02:42 PM, Alex Williamson wrote: > > > On Thu, 15 Jun 2017 09:33:01 +0100 > > > "Daniel P. Berrange" wrote: > > > > > >> On Thu, Jun 15, 2017 at 12:06:43AM +0200, Erik Skultety wrote:
Re: [libvirt] RFC: Creating mediated devices with libvirt
On Fri, 16 Jun 2017 11:32:04 -0400 Laine Stump wrote: > On 06/15/2017 02:42 PM, Alex Williamson wrote: > > On Thu, 15 Jun 2017 09:33:01 +0100 > > "Daniel P. Berrange" wrote: > > > >> On Thu, Jun 15, 2017 at 12:06:43AM +0200, Erik Skultety wrote:
Re: [libvirt] RFC: Creating mediated devices with libvirt
On Fri, Jun 16, 2017 at 11:32:04AM -0400, Laine Stump wrote: > On 06/15/2017 02:42 PM, Alex Williamson wrote: > > On Thu, 15 Jun 2017 09:33:01 +0100 > > "Daniel P. Berrange" wrote: > > > >> On Thu, Jun 15, 2017 at 12:06:43AM +0200, Erik Skultety wrote:
Re: [libvirt] RFC: Creating mediated devices with libvirt
On 06/15/2017 02:42 PM, Alex Williamson wrote: > On Thu, 15 Jun 2017 09:33:01 +0100 > "Daniel P. Berrange" wrote: > >> On Thu, Jun 15, 2017 at 12:06:43AM +0200, Erik Skultety wrote: >>> Hi all, >>> >>> so there's been an off-list discussion about finally implementing creation >>> of >>> mediated devices with libvirt and it's more than desired to get as many >>> opinions >>> on that as possible, so please do share your ideas. This did come up >>> already as >>> part of some older threads ([1] for example), so this will be a respin of >>> the >>> discussions. Long story short, we decided to put device creation off and >>> focus >>> on the introduction of the framework as such first and build upon that >>> later, >>> i.e. now. >>> >>> [1] https://www.redhat.com/archives/libvir-list/2017-February/msg00177.html >>> >>> >>> PART 1: NODEDEV-DRIVER >>> >>> >>> API-wise, device creation through the nodedev driver should be pretty >>> straightforward and without any issues, since virNodeDevCreateXML takes an >>> XML >>> and does support flags. Looking at the current device XML: >>> >>> >>> mdev_0cce8709_0640_46ef_bd14_962c7f73cc6f >>> >>> /sys/devices/pci:00/.../0cce8709-0640-46ef-bd14-962c7f73cc6f >>> pci__03_00_0 >>> >>> vfio_mdev >>> >>> >>> >>> >>> UUID >>> >>> >>> >>> We can ignore ,, elements, since these are useless >>> during creation. We also cannot use since we don't support arbitrary >>> names and we also can't rely on users providing a name in correct form >>> which we >>> would need to further parse in order to get the UUID. >>> So since the only thing missing to successfully create an mdev using >>> XML is >>> the UUID (if user doesn't want it to be generated automatically), how about >>> having a subelement under just like PCIs have >>> and >>> friends, USBs have & , interfaces have to uniquely >>> identify the device even if the name itself is unique.
>>> Removal of a device should work as well, although we might want to >>> consider creating a *Flags version of the API. >>> >>> = >>> PART 2: DOMAIN XML & DEVICE AUTO-CREATION, NO POLICY INVOLVED! >>> = >>> >>> There were some doubts about auto-creation mentioned in [1], although they >>> weren't specified further. So hopefully, we'll get further in the discussion >>> this time. >>> >>> From my perspective there are two main reasons/benefits to that: >>> >>> 1) Convenience >>> For apps like virt-manager, user will want to add a host device >>> transparently, >>> "hey libvirt, I want an mdev assigned to my VM, can you do that". Even for >>> higher management apps, like oVirt, even they might not care about the >>> parent >>> device at all times and considering that they would need to enumerate the >>> parents, pick one, create the device XML and pass it to the nodedev driver, >>> IMHO >>> it would actually be easier and faster to just do it directly through sysfs, >>> bypassing libvirt once again >> >> The convenience only works if the policy we've provided in libvirt actually >> matches the policy the application wants. I think it is quite likely that >> with >> cloud the mdevs will be created out of band from the domain startup process. >> It is possible the app will just have a fixed set of mdevs pre-created when >> the host starts up. Or that the mgmt app wants the domain startup process to >> be a two phase setup, where it first allocates the resources needed, and >> later >> then tries to start the guest. This is why I keep saying that putting this >> kind >> of "convenient" policy in libvirt is a bad idea - it is essentially just >> putting >> a bit of virt-manager code into libvirt - more advanced apps will need more >> flexibility in this area. >> >>> 2) Future domain migration >>> Suppose now that the mdev backing physical devices support state dump and >>> reload. 
>>> Chances are, that the corresponding mdev doesn't even exist or has a
>>> different UUID on the destination, so libvirt would do its best to
>>> handle this before the domain could be resumed.
>>
>> This is not an unusual scenario - there are already many other parts of
>> the device backend config that need to change prior to migration,
>> especially for anything related to host devices, so apps already have
>> support for doing this, which is more flexible & convenient because it
>> doesn't tie creation of the mdevs to running of the migrate command.
>>
>> IOW, I'm still against adding any kind of automatic creation policy for
>> mdevs in libvirt. Just provide the node device API support.
>
> I'm not super clear on the extent of what you're against here, is it
> all forms of device creation or only a placement policy? Are you
> against any form of having the XML specify
Re: [libvirt] RFC: Creating mediated devices with libvirt
On Thu, 15 Jun 2017 09:33:01 +0100 "Daniel P. Berrange" wrote:

> On Thu, Jun 15, 2017 at 12:06:43AM +0200, Erik Skultety wrote:
> > Hi all,
> >
> > so there's been an off-list discussion about finally implementing
> > creation of mediated devices with libvirt and it's more than desired to
> > get as many opinions on that as possible, so please do share your ideas.
> > This did come up already as part of some older threads ([1] for
> > example), so this will be a respin of the discussions. Long story short,
> > we decided to put device creation off and focus on the introduction of
> > the framework as such first and build upon that later, i.e. now.
> >
> > [1] https://www.redhat.com/archives/libvir-list/2017-February/msg00177.html
> >
> > PART 1: NODEDEV-DRIVER
> >
> > API-wise, device creation through the nodedev driver should be pretty
> > straightforward and without any issues, since virNodeDeviceCreateXML
> > takes an XML and does support flags. Looking at the current device XML:
> >
> >   mdev_0cce8709_0640_46ef_bd14_962c7f73cc6f
> >   /sys/devices/pci:00/.../0cce8709-0640-46ef-bd14-962c7f73cc6f
> >   pci__03_00_0
> >   vfio_mdev
> >   UUID
> >
> > We can ignore ,, elements, since these are useless during creation. We
> > also cannot use since we don't support arbitrary names and we also can't
> > rely on users providing a name in correct form which we would need to
> > further parse in order to get the UUID.
> > So since the only thing missing to successfully create an mdev using XML
> > is the UUID (if user doesn't want it to be generated automatically), how
> > about having a subelement under just like PCIs have and friends, USBs
> > have & , interfaces have to uniquely identify the device even if the
> > name itself is unique.
> > Removal of a device should work as well, although we might want to
> > consider creating a *Flags version of the API.
> >
> > PART 2: DOMAIN XML & DEVICE AUTO-CREATION, NO POLICY INVOLVED!
> >
> > There were some doubts about auto-creation mentioned in [1], although
> > they weren't specified further. So hopefully, we'll get further in the
> > discussion this time.
> >
> > From my perspective there are two main reasons/benefits to that:
> >
> > 1) Convenience
> > For apps like virt-manager, user will want to add a host device
> > transparently, "hey libvirt, I want an mdev assigned to my VM, can you
> > do that". Even for higher management apps, like oVirt, even they might
> > not care about the parent device at all times and considering that they
> > would need to enumerate the parents, pick one, create the device XML and
> > pass it to the nodedev driver, IMHO it would actually be easier and
> > faster to just do it directly through sysfs, bypassing libvirt once
> > again
>
> The convenience only works if the policy we've provided in libvirt
> actually matches the policy the application wants. I think it is quite
> likely that with cloud the mdevs will be created out of band from the
> domain startup process. It is possible the app will just have a fixed set
> of mdevs pre-created when the host starts up. Or that the mgmt app wants
> the domain startup process to be a two phase setup, where it first
> allocates the resources needed, and later then tries to start the guest.
> This is why I keep saying that putting this kind of "convenient" policy
> in libvirt is a bad idea - it is essentially just putting a bit of
> virt-manager code into libvirt - more advanced apps will need more
> flexibility in this area.
>
> > 2) Future domain migration
> > Suppose now that the mdev backing physical devices support state dump
> > and reload. Chances are, that the corresponding mdev doesn't even exist
> > or has a different UUID on the destination, so libvirt would do its best
> > to handle this before the domain could be resumed.
>
> This is not an unusual scenario - there are already many other parts of
> the device backend config that need to change prior to migration,
> especially for anything related to host devices, so apps already have
> support for doing this, which is more flexible & convenient because it
> doesn't tie creation of the mdevs to running of the migrate command.
>
> IOW, I'm still against adding any kind of automatic creation policy for
> mdevs in libvirt. Just provide the node device API support.

I'm not super clear on the extent of what you're against here, is it all
forms of device creation or only a placement policy? Are you against any
form of having the XML specify the non-instantiated mdev that it wants?
We've clearly made an important step
Re: [libvirt] RFC: Creating mediated devices with libvirt
On Thu, Jun 15, 2017 at 12:06:43AM +0200, Erik Skultety wrote:
> Hi all,
>
> so there's been an off-list discussion about finally implementing creation
> of mediated devices with libvirt and it's more than desired to get as many
> opinions on that as possible, so please do share your ideas. This did come
> up already as part of some older threads ([1] for example), so this will
> be a respin of the discussions. Long story short, we decided to put device
> creation off and focus on the introduction of the framework as such first
> and build upon that later, i.e. now.
>
> [1] https://www.redhat.com/archives/libvir-list/2017-February/msg00177.html
>
> PART 1: NODEDEV-DRIVER
>
> API-wise, device creation through the nodedev driver should be pretty
> straightforward and without any issues, since virNodeDeviceCreateXML takes
> an XML and does support flags. Looking at the current device XML:
>
>   mdev_0cce8709_0640_46ef_bd14_962c7f73cc6f
>   /sys/devices/pci:00/.../0cce8709-0640-46ef-bd14-962c7f73cc6f
>   pci__03_00_0
>   vfio_mdev
>   UUID
>
> We can ignore ,, elements, since these are useless during creation. We
> also cannot use since we don't support arbitrary names and we also can't
> rely on users providing a name in correct form which we would need to
> further parse in order to get the UUID.
> So since the only thing missing to successfully create an mdev using XML
> is the UUID (if user doesn't want it to be generated automatically), how
> about having a subelement under just like PCIs have and friends, USBs
> have & , interfaces have to uniquely identify the device even if the name
> itself is unique.
> Removal of a device should work as well, although we might want to
> consider creating a *Flags version of the API.
>
> PART 2: DOMAIN XML & DEVICE AUTO-CREATION, NO POLICY INVOLVED!
>
> There were some doubts about auto-creation mentioned in [1], although they
> weren't specified further.
> So hopefully, we'll get further in the discussion this time.
>
> From my perspective there are two main reasons/benefits to that:
>
> 1) Convenience
> For apps like virt-manager, user will want to add a host device
> transparently, "hey libvirt, I want an mdev assigned to my VM, can you do
> that". Even for higher management apps, like oVirt, even they might not
> care about the parent device at all times and considering that they would
> need to enumerate the parents, pick one, create the device XML and pass it
> to the nodedev driver, IMHO it would actually be easier and faster to just
> do it directly through sysfs, bypassing libvirt once again

The convenience only works if the policy we've provided in libvirt actually
matches the policy the application wants. I think it is quite likely that
with cloud the mdevs will be created out of band from the domain startup
process. It is possible the app will just have a fixed set of mdevs
pre-created when the host starts up. Or that the mgmt app wants the domain
startup process to be a two phase setup, where it first allocates the
resources needed, and later then tries to start the guest. This is why I
keep saying that putting this kind of "convenient" policy in libvirt is a
bad idea - it is essentially just putting a bit of virt-manager code into
libvirt - more advanced apps will need more flexibility in this area.

> 2) Future domain migration
> Suppose now that the mdev backing physical devices support state dump and
> reload. Chances are, that the corresponding mdev doesn't even exist or has
> a different UUID on the destination, so libvirt would do its best to
> handle this before the domain could be resumed.

This is not an unusual scenario - there are already many other parts of the
device backend config that need to change prior to migration, especially for
anything related to host devices, so apps already have support for doing
this, which is more flexible & convenient because it doesn't tie creation of
the mdevs to running of the migrate command.

IOW, I'm still against adding any kind of automatic creation policy for
mdevs in libvirt. Just provide the node device API support.

Regards,
Daniel
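
For illustration, the out-of-band case described in this reply (an app pre-creating a fixed set of mdevs when the host starts, independently of domain startup) could look roughly like the sketch below. It assumes the kernel's mdev sysfs layout, where writing a UUID into a parent's `mdev_supported_types/<type>/create` attribute instantiates a device. The helper names `create_mdev` and `precreate`, the `sysfs_root` parameter, and any parent or type names passed in are hypothetical and exist only so the snippet can be exercised against a scratch directory rather than a real host.

```python
import os
import uuid


def create_mdev(parent, type_id, dev_uuid=None, sysfs_root="/sys"):
    """Ask the kernel to instantiate one mdev under `parent`; return its UUID.

    parent   -- parent device name as listed under <sysfs_root>/class/mdev_bus
    type_id  -- one of the parent's mdev_supported_types entries
    dev_uuid -- UUID to use; a random one is generated when omitted
    """
    # Normalise a caller-supplied UUID, or generate one automatically.
    dev_uuid = str(uuid.UUID(dev_uuid)) if dev_uuid else str(uuid.uuid4())
    create_path = os.path.join(sysfs_root, "class", "mdev_bus", parent,
                               "mdev_supported_types", type_id, "create")
    # The 'create' attribute is provided by the vendor driver, so a missing
    # file means the parent or the type is not known on this host.
    if not os.path.exists(create_path):
        raise FileNotFoundError(create_path)
    with open(create_path, "w") as f:
        f.write(dev_uuid)
    return dev_uuid


def precreate(parent, type_id, count, sysfs_root="/sys"):
    """Pre-create a fixed set of `count` mdevs, e.g. at host startup."""
    return [create_mdev(parent, type_id, sysfs_root=sysfs_root)
            for _ in range(count)]
```

The point of the sketch is only that placement (which parent, which type, how many) stays entirely in the application, which is the flexibility argued for above; nothing here goes through libvirt.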
[libvirt] RFC: Creating mediated devices with libvirt
Hi all,

so there's been an off-list discussion about finally implementing creation
of mediated devices with libvirt and it's more than desired to get as many
opinions on that as possible, so please do share your ideas. This did come
up already as part of some older threads ([1] for example), so this will be
a respin of the discussions. Long story short, we decided to put device
creation off and focus on the introduction of the framework as such first
and build upon that later, i.e. now.

[1] https://www.redhat.com/archives/libvir-list/2017-February/msg00177.html

PART 1: NODEDEV-DRIVER

API-wise, device creation through the nodedev driver should be pretty
straightforward and without any issues, since virNodeDeviceCreateXML takes
an XML and does support flags. Looking at the current device XML:

  mdev_0cce8709_0640_46ef_bd14_962c7f73cc6f
  /sys/devices/pci:00/.../0cce8709-0640-46ef-bd14-962c7f73cc6f
  pci__03_00_0
  vfio_mdev
  UUID

We can ignore ,, elements, since these are useless during creation. We also
cannot use since we don't support arbitrary names and we also can't rely on
users providing a name in correct form which we would need to further parse
in order to get the UUID.
So since the only thing missing to successfully create an mdev using XML is
the UUID (if user doesn't want it to be generated automatically), how about
having a subelement under just like PCIs have and friends, USBs have & ,
interfaces have to uniquely identify the device even if the name itself is
unique.
Removal of a device should work as well, although we might want to consider
creating a *Flags version of the API.

PART 2: DOMAIN XML & DEVICE AUTO-CREATION, NO POLICY INVOLVED!

There were some doubts about auto-creation mentioned in [1], although they
weren't specified further. So hopefully, we'll get further in the discussion
this time.

From my perspective there are two main reasons/benefits to that:

1) Convenience
For apps like virt-manager, user will want to add a host device
transparently, "hey libvirt, I want an mdev assigned to my VM, can you do
that". Even for higher management apps, like oVirt, even they might not care
about the parent device at all times and considering that they would need to
enumerate the parents, pick one, create the device XML and pass it to the
nodedev driver, IMHO it would actually be easier and faster to just do it
directly through sysfs, bypassing libvirt once again.

2) Future domain migration
Suppose now that the mdev backing physical devices support state dump and
reload. Chances are, that the corresponding mdev doesn't even exist or has a
different UUID on the destination, so libvirt would do its best to handle
this before the domain could be resumed.

Following what we already have:

Instead of trying to somehow extend the element using more attributes like
'domain', 'slot', 'function', etc. that would render the whole element
ambiguous, I was thinking about creating a element nested under that would
be basically just a nested definition of another host device re-using all
the element we already know, i.e. for PCI, and of course others if there
happens to be a need for devices other than PCI. So speaking about XML, we'd
end up with something like:

So, this was the first idea off the top of my head, so I'd appreciate any
suggestions, comments, especially from people who have got the 'legacy'
insight into libvirt and can predict potential pitfalls based on experience
:).

Thanks,
Erik
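
To make the PART 1 idea more tangible, here is a minimal sketch of building a creation XML that carries only a parent, an mdev type and an optional UUID, generating the UUID when the caller does not supply one. The element names used (`uuid` as the proposed subelement of `capability`, plus `parent` and `type`) and all helper names are assumptions for illustration, not a settled schema. The `mdev_<uuid>` naming simply mirrors the example device name shown earlier, and the second helper shows the kind of name parsing the proposal wants to avoid relying on.

```python
import uuid
import xml.etree.ElementTree as ET


def mdev_name_from_uuid(dev_uuid):
    """Derive a nodedev-style name of the form mdev_<uuid with underscores>."""
    return "mdev_" + str(uuid.UUID(dev_uuid)).replace("-", "_")


def uuid_from_mdev_name(name):
    """Parse the UUID back out of a name; raises ValueError if malformed."""
    if not name.startswith("mdev_"):
        raise ValueError("not an mdev name: %r" % name)
    return str(uuid.UUID(name[len("mdev_"):].replace("_", "-")))


def build_mdev_xml(parent, type_id, dev_uuid=None):
    """Build a creation XML; the UUID is generated when the caller omits it."""
    dev_uuid = str(uuid.UUID(dev_uuid)) if dev_uuid else str(uuid.uuid4())
    device = ET.Element("device")
    ET.SubElement(device, "parent").text = parent
    cap = ET.SubElement(device, "capability", {"type": "mdev"})
    ET.SubElement(cap, "type", {"id": type_id})
    # Proposed subelement; the name 'uuid' is an assumption of this sketch.
    ET.SubElement(cap, "uuid").text = dev_uuid
    return ET.tostring(device, encoding="unicode")
```

An app could then hand the resulting string to the nodedev creation API (for example through the Python binding's `nodeDeviceCreateXML`), assuming the driver ends up accepting a schema along these lines.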