Re: [Qemu-devel] [PATCH v6 1/4] vfio: Mediated device Core driver
On Tue, Aug 16, 2016 at 02:51:03PM -0600, Alex Williamson wrote:
> On Tue, 16 Aug 2016 13:30:06 -0700
> Neo Jia wrote:
>
> > On Mon, Aug 15, 2016 at 04:47:41PM -0600, Alex Williamson wrote:
> > > On Mon, 15 Aug 2016 12:59:08 -0700
> > > Neo Jia wrote:
> > >
> > > > > > > I'm not sure a comma separated list makes sense here, for both
> > > > > > > simplicity in the kernel and more fine grained error reporting, we
> > > > > > > probably want to start/stop them individually. Actually, why is it
> > > > > > > that we can't use the mediated device being opened and released to
> > > > > > > automatically signal to the backend vendor driver to commit and
> > > > > > > release resources? I don't fully understand why userspace needs
> > > > > > > this interface.
> > >
> > > That doesn't give an individual user the ability to stop and start
> > > their devices though, because in order for a user to have write
> > > permissions there, they get permission to DoS other users by pumping
> > > arbitrary UUIDs into those files. By placing start/stop per mdev, we
> > > have mdev level granularity of granting start/stop privileges. Really
> > > though, do we want QEMU fumbling around through sysfs or do we want an
> > > interface through the vfio API to perform start/stop? Thanks,
> >
> > Hi Alex,
> >
> > I think those two suggestions make sense, so we will move the "start/stop"
> > under mdev sysfs.
> >
> > This will be incorporated in our next v7 patch and by doing that, it will
> > make the locking scheme easier.
>
> Thanks Neo. Also note that the semantics change when we move to per
> device control. It would be redundant to 'echo $UUID' into a start
> file which only controls a single device. So that means we probably
> just want an 'echo 1'. But if we can 'echo 1' then we can also 'echo
> 0', so we can reduce this to a single sysfs file. Sysfs already has a
> common interface for starting and stopping devices, the "online" file.
> So I think we should probably move in that direction. Additionally, an
> "online" file should support a _show() function, so if we have an Intel
> vGPU that perhaps does not need start/stop support, online could report
> "1" after create to show that it's already online, possibly even
> generate an error trying to change the online state. Thanks,
>
> Alex

Agreed. We will adopt a similar syntax and support the _show() function.

Thanks,
Neo
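The "online" semantics Alex describes (echo 1/0 to start/stop, a readable current state via _show()) can be sketched against a mock directory. Note this is only an illustration of the intended userspace-visible behavior: a temp directory stands in for sysfs, and the UUID is an invented example; the real nodes would live under /sys/bus/mdev/devices/ once the mdev core and a vendor driver are loaded.

```shell
# Mock of the proposed per-mdev "online" file. A plain temp directory
# stands in for the real sysfs tree; the UUID is illustrative.
SYSFS=$(mktemp -d)
UUID=12345678-1234-1234-1234-123456789abc
mkdir -p "$SYSFS/$UUID"

echo 1 > "$SYSFS/$UUID/online"   # device can report online right after create
cat "$SYSFS/$UUID/online"        # _show() equivalent: reads back "1"

echo 0 > "$SYSFS/$UUID/online"   # stop: the vendor driver would release resources
cat "$SYSFS/$UUID/online"        # reads back "0"
```

In the real attribute, a vendor driver such as the Intel vGPU case above could reject the write (return an error from the store function) instead of changing state.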
Re: [Qemu-devel] [PATCH v6 1/4] vfio: Mediated device Core driver
On Mon, 15 Aug 2016 23:13:20 -0700 Neo Jia wrote: > On Tue, Aug 16, 2016 at 05:58:54AM +, Tian, Kevin wrote: > > > From: Neo Jia [mailto:c...@nvidia.com] > > > Sent: Tuesday, August 16, 2016 1:44 PM > > > > > > On Tue, Aug 16, 2016 at 04:52:30AM +, Tian, Kevin wrote: > > > > > From: Neo Jia [mailto:c...@nvidia.com] > > > > > Sent: Tuesday, August 16, 2016 12:17 PM > > > > > > > > > > On Tue, Aug 16, 2016 at 03:50:44AM +, Tian, Kevin wrote: > > > > > > > From: Neo Jia [mailto:c...@nvidia.com] > > > > > > > Sent: Tuesday, August 16, 2016 11:46 AM > > > > > > > > > > > > > > On Tue, Aug 16, 2016 at 12:30:25AM +, Tian, Kevin wrote: > > > > > > > > > From: Neo Jia [mailto:c...@nvidia.com] > > > > > > > > > Sent: Tuesday, August 16, 2016 3:59 AM > > > > > > > > > > > > > > > > > > > > > > > > > > > > > For NVIDIA vGPU solution we need to know all devices > > > > > > > > > > > assigned to a VM > > > in > > > > > > > > > > > one shot to commit resources of all vGPUs assigned to a > > > > > > > > > > > VM along with > > > > > > > > > > > some common resources. > > > > > > > > > > > > > > > > > > > > Kirti, can you elaborate the background about above > > > > > > > > > > one-shot commit > > > > > > > > > > requirement? It's hard to understand such a requirement. > > > > > > > > > > > > > > > > > > > > As I relied in another mail, I really hope start/stop > > > > > > > > > > become a per-mdev > > > > > > > > > > attribute instead of global one, e.g.: > > > > > > > > > > > > > > > > > > > > echo "0/1" > > > > > > /sys/class/mdev/12345678-1234-1234-1234-123456789abc/start > > > > > > > > > > > > > > > > > > > > In many scenario the user space client may only want to > > > > > > > > > > talk to mdev > > > > > > > > > > instance directly, w/o need to contact its parent device. > > > > > > > > > > Still take > > > > > > > > > > live migration for example, I don't think Qemu wants to > > > > > > > > > > know parent > > > > > > > > > > device of assigned mdev instances. 
> > > > > > > > > > > > > > > > > > Hi Kevin, > > > > > > > > > > > > > > > > > > Having a global /sys/class/mdev/mdev_start doesn't require > > > > > > > > > anybody to > > > know > > > > > > > > > parent device. you can just do > > > > > > > > > > > > > > > > > > echo "mdev_UUID" > /sys/class/mdev/mdev_start > > > > > > > > > > > > > > > > > > or > > > > > > > > > > > > > > > > > > echo "mdev_UUID" > /sys/class/mdev/mdev_stop > > > > > > > > > > > > > > > > > > without knowing the parent device. > > > > > > > > > > > > > > > > > > > > > > > > > You can look at some existing sysfs example, e.g.: > > > > > > > > > > > > > > > > echo "0/1" > /sys/bus/cpu/devices/cpu1/online > > > > > > > > > > > > > > > > You may also argue why not using a global style: > > > > > > > > > > > > > > > > echo "cpu1" > /sys/bus/cpu/devices/cpu_online > > > > > > > > echo "cpu1" > /sys/bus/cpu/devices/cpu_offline > > > > > > > > > > > > > > > > There are many similar examples... > > > > > > > > > > > > > > Hi Kevin, > > > > > > > > > > > > > > My response above is to your question about using the global > > > > > > > sysfs entry as you > > > > > > > don't want to have the global path because > > > > > > > > > > > > > > "I don't think Qemu wants to know parent device of assigned mdev > > > > > > > instances.". > > > > > > > > > > > > > > So I just want to confirm with you that (in case you miss): > > > > > > > > > > > > > > /sys/class/mdev/mdev_start | mdev_stop > > > > > > > > > > > > > > doesn't require the knowledge of parent device. > > > > > > > > > > > > > > > > > > > Qemu is just one example, where your explanation of parent device > > > > > > makes sense but still it's not good for Qemu to populate > > > > > > /sys/class/mdev > > > > > > directly. Qemu is passed with the actual sysfs path of assigned mdev > > > > > > instance, so any mdev attributes touched by Qemu should be put under > > > > > > that node (e.g. 
start/stop for live migration usage as I explained > > > > > > earlier). > > > > > Exactly, qemu is passed with the actual sysfs path. > > > > > So, QEMU doesn't touch the file /sys/class/mdev/mdev_start | mdev_stop at all. > > > > > QEMU will take the sysfs path as input: > > > > > -device > > > > > vfio-pci,sysfsdev=/sys/bus/mdev/devices/c0b26072-dd1b-4340-84fe-bf338c510818-0,id=vgpu0 > > > > no need of passing "id=vgpu0" here. If necessary you can put id as an > > > > attribute under sysfs mdev node: > > > > /sys/bus/mdev/devices/c0b26072-dd1b-4340-84fe-bf338c510818-0/id > > > I think we have moved away from the device index based on Alex's comment, so the > > > device path will be: > > > /sys/bus/mdev/devices/c0b26072-dd1b-4340-84fe-bf338c510818 > > pass /sys/bus/mdev/devices/c0b26072-dd1b-4340-84fe-bf338c510818 > > as parameter, and then Qemu can access 'id' under that path. You don't need to pass a separate 'id' field. That's my point.
Re: [Qemu-devel] [PATCH v6 1/4] vfio: Mediated device Core driver
On Tue, 16 Aug 2016 13:30:06 -0700
Neo Jia wrote:

> On Mon, Aug 15, 2016 at 04:47:41PM -0600, Alex Williamson wrote:
> > On Mon, 15 Aug 2016 12:59:08 -0700
> > Neo Jia wrote:
> >
> > > > > > I'm not sure a comma separated list makes sense here, for both
> > > > > > simplicity in the kernel and more fine grained error reporting, we
> > > > > > probably want to start/stop them individually. Actually, why is it
> > > > > > that we can't use the mediated device being opened and released to
> > > > > > automatically signal to the backend vendor driver to commit and
> > > > > > release resources? I don't fully understand why userspace needs
> > > > > > this interface.
> >
> > That doesn't give an individual user the ability to stop and start
> > their devices though, because in order for a user to have write
> > permissions there, they get permission to DoS other users by pumping
> > arbitrary UUIDs into those files. By placing start/stop per mdev, we
> > have mdev level granularity of granting start/stop privileges. Really
> > though, do we want QEMU fumbling around through sysfs or do we want an
> > interface through the vfio API to perform start/stop? Thanks,
>
> Hi Alex,
>
> I think those two suggestions make sense, so we will move the "start/stop"
> under mdev sysfs.
>
> This will be incorporated in our next v7 patch and by doing that, it will
> make the locking scheme easier.

Thanks Neo. Also note that the semantics change when we move to per
device control. It would be redundant to 'echo $UUID' into a start
file which only controls a single device. So that means we probably
just want an 'echo 1'. But if we can 'echo 1' then we can also 'echo
0', so we can reduce this to a single sysfs file. Sysfs already has a
common interface for starting and stopping devices, the "online" file.
So I think we should probably move in that direction. Additionally, an
"online" file should support a _show() function, so if we have an Intel
vGPU that perhaps does not need start/stop support, online could report
"1" after create to show that it's already online, possibly even
generate an error trying to change the online state. Thanks,

Alex
Re: [Qemu-devel] [PATCH v6 1/4] vfio: Mediated device Core driver
On Mon, Aug 15, 2016 at 04:47:41PM -0600, Alex Williamson wrote:
> On Mon, 15 Aug 2016 12:59:08 -0700
> Neo Jia wrote:
>
> > > > > I'm not sure a comma separated list makes sense here, for both
> > > > > simplicity in the kernel and more fine grained error reporting, we
> > > > > probably want to start/stop them individually. Actually, why is it
> > > > > that we can't use the mediated device being opened and released to
> > > > > automatically signal to the backend vendor driver to commit and
> > > > > release resources? I don't fully understand why userspace needs
> > > > > this interface.
>
> That doesn't give an individual user the ability to stop and start
> their devices though, because in order for a user to have write
> permissions there, they get permission to DoS other users by pumping
> arbitrary UUIDs into those files. By placing start/stop per mdev, we
> have mdev level granularity of granting start/stop privileges. Really
> though, do we want QEMU fumbling around through sysfs or do we want an
> interface through the vfio API to perform start/stop? Thanks,
>
> Alex

Hi Alex,

I think those two suggestions make sense, so we will move the "start/stop"
under mdev sysfs.

This will be incorporated in our next v7 patch and by doing that, it will
make the locking scheme easier.

Thanks,
Neo
Re: [Qemu-devel] [PATCH v6 1/4] vfio: Mediated device Core driver
On Tue, 16 Aug 2016 04:52:30 + "Tian, Kevin" wrote: > > From: Neo Jia [mailto:c...@nvidia.com] > > Sent: Tuesday, August 16, 2016 12:17 PM > > > > On Tue, Aug 16, 2016 at 03:50:44AM +, Tian, Kevin wrote: > > > > From: Neo Jia [mailto:c...@nvidia.com] > > > > Sent: Tuesday, August 16, 2016 11:46 AM > > > > > > > > On Tue, Aug 16, 2016 at 12:30:25AM +, Tian, Kevin wrote: > > > > > > From: Neo Jia [mailto:c...@nvidia.com] > > > > > > Sent: Tuesday, August 16, 2016 3:59 AM > > > > > > > > > > > > > > > > > > > > For NVIDIA vGPU solution we need to know all devices assigned > > > > > > > > to a VM in > > > > > > > > one shot to commit resources of all vGPUs assigned to a VM > > > > > > > > along with > > > > > > > > some common resources. > > > > > > > > > > > > > > Kirti, can you elaborate the background about above one-shot > > > > > > > commit > > > > > > > requirement? It's hard to understand such a requirement. > > > > > > > > > > > > > > As I relied in another mail, I really hope start/stop become a > > > > > > > per-mdev > > > > > > > attribute instead of global one, e.g.: > > > > > > > > > > > > > > echo "0/1" > > > /sys/class/mdev/12345678-1234-1234-1234-123456789abc/start > > > > > > > > > > > > > > In many scenario the user space client may only want to talk to > > > > > > > mdev > > > > > > > instance directly, w/o need to contact its parent device. Still > > > > > > > take > > > > > > > live migration for example, I don't think Qemu wants to know > > > > > > > parent > > > > > > > device of assigned mdev instances. > > > > > > > > > > > > Hi Kevin, > > > > > > > > > > > > Having a global /sys/class/mdev/mdev_start doesn't require anybody > > > > > > to know > > > > > > parent device. you can just do > > > > > > > > > > > > echo "mdev_UUID" > /sys/class/mdev/mdev_start > > > > > > > > > > > > or > > > > > > > > > > > > echo "mdev_UUID" > /sys/class/mdev/mdev_stop > > > > > > > > > > > > without knowing the parent device. 
> > > > > > > > > > > > > > > > You can look at some existing sysfs example, e.g.: > > > > > > > > > > echo "0/1" > /sys/bus/cpu/devices/cpu1/online > > > > > > > > > > You may also argue why not using a global style: > > > > > > > > > > echo "cpu1" > /sys/bus/cpu/devices/cpu_online > > > > > echo "cpu1" > /sys/bus/cpu/devices/cpu_offline > > > > > > > > > > There are many similar examples... > > > > > > > > Hi Kevin, > > > > > > > > My response above is to your question about using the global sysfs > > > > entry as you > > > > don't want to have the global path because > > > > > > > > "I don't think Qemu wants to know parent device of assigned mdev > > > > instances.". > > > > > > > > So I just want to confirm with you that (in case you miss): > > > > > > > > /sys/class/mdev/mdev_start | mdev_stop > > > > > > > > doesn't require the knowledge of parent device. > > > > > > > > > > Qemu is just one example, where your explanation of parent device > > > makes sense but still it's not good for Qemu to populate /sys/class/mdev > > > directly. Qemu is passed with the actual sysfs path of assigned mdev > > > instance, so any mdev attributes touched by Qemu should be put under > > > that node (e.g. start/stop for live migration usage as I explained > > > earlier). > > > > Exactly, qemu is passed with the actual sysfs path. > > > > So, QEMU doesn't touch the file /sys/class/mdev/mdev_start | mdev_stop at > > all. > > > > QEMU will take the sysfs path as input: > > > > -device > > vfio-pci,sysfsdev=/sys/bus/mdev/devices/c0b26072-dd1b-4340-84fe-bf338c510818-0,i > > d=vgpu0 > > no need of passing "id=vgpu0" here. If necessary you can put id as an > attribute > under sysfs mdev node: > > /sys/bus/mdev/devices/c0b26072-dd1b-4340-84fe-bf338c510818-0/id QEMU needs an id parameter for devices, libvirt gives devices arbitrary names, typically hostdev# for assigned devices. This id is used to reference the device for hmp/qmp commands. 
This is not something the mdev infrastructure should define. Thanks, Alex
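Alex's point about the id parameter can be illustrated with a hypothetical invocation; the flags and the "hostdev0" name below are assumptions in the libvirt style he describes, not a command quoted from the thread (the UUID is the example one used throughout):

```shell
# Illustrative only: "sysfsdev" points QEMU's vfio-pci device at the mdev
# sysfs node, while "id" is QEMU's own handle for the device, later used
# to reference it in hmp/qmp commands (e.g. device_del hostdev0).
qemu-system-x86_64 \
    -machine q35,accel=kvm \
    -device vfio-pci,sysfsdev=/sys/bus/mdev/devices/c0b26072-dd1b-4340-84fe-bf338c510818,id=hostdev0
```

The key separation is that the UUID identifies the mdev to the kernel, while the id exists purely inside QEMU's object model, which is why it cannot live in the mdev sysfs tree.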
Re: [Qemu-devel] [PATCH v6 1/4] vfio: Mediated device Core driver
On Tue, Aug 16, 2016 at 05:58:54AM +, Tian, Kevin wrote: > > From: Neo Jia [mailto:c...@nvidia.com] > > Sent: Tuesday, August 16, 2016 1:44 PM > > > > On Tue, Aug 16, 2016 at 04:52:30AM +, Tian, Kevin wrote: > > > > From: Neo Jia [mailto:c...@nvidia.com] > > > > Sent: Tuesday, August 16, 2016 12:17 PM > > > > > > > > On Tue, Aug 16, 2016 at 03:50:44AM +, Tian, Kevin wrote: > > > > > > From: Neo Jia [mailto:c...@nvidia.com] > > > > > > Sent: Tuesday, August 16, 2016 11:46 AM > > > > > > > > > > > > On Tue, Aug 16, 2016 at 12:30:25AM +, Tian, Kevin wrote: > > > > > > > > From: Neo Jia [mailto:c...@nvidia.com] > > > > > > > > Sent: Tuesday, August 16, 2016 3:59 AM > > > > > > > > > > > > > > > > > > > > > > > > > > For NVIDIA vGPU solution we need to know all devices > > > > > > > > > > assigned to a VM > > in > > > > > > > > > > one shot to commit resources of all vGPUs assigned to a VM > > > > > > > > > > along with > > > > > > > > > > some common resources. > > > > > > > > > > > > > > > > > > Kirti, can you elaborate the background about above one-shot > > > > > > > > > commit > > > > > > > > > requirement? It's hard to understand such a requirement. > > > > > > > > > > > > > > > > > > As I relied in another mail, I really hope start/stop become > > > > > > > > > a per-mdev > > > > > > > > > attribute instead of global one, e.g.: > > > > > > > > > > > > > > > > > > echo "0/1" > > > > > /sys/class/mdev/12345678-1234-1234-1234-123456789abc/start > > > > > > > > > > > > > > > > > > In many scenario the user space client may only want to talk > > > > > > > > > to mdev > > > > > > > > > instance directly, w/o need to contact its parent device. > > > > > > > > > Still take > > > > > > > > > live migration for example, I don't think Qemu wants to know > > > > > > > > > parent > > > > > > > > > device of assigned mdev instances. 
> > > > > > > > > > > > > > > > Hi Kevin, > > > > > > > > > > > > > > > > Having a global /sys/class/mdev/mdev_start doesn't require > > > > > > > > anybody to > > know > > > > > > > > parent device. you can just do > > > > > > > > > > > > > > > > echo "mdev_UUID" > /sys/class/mdev/mdev_start > > > > > > > > > > > > > > > > or > > > > > > > > > > > > > > > > echo "mdev_UUID" > /sys/class/mdev/mdev_stop > > > > > > > > > > > > > > > > without knowing the parent device. > > > > > > > > > > > > > > > > > > > > > > You can look at some existing sysfs example, e.g.: > > > > > > > > > > > > > > echo "0/1" > /sys/bus/cpu/devices/cpu1/online > > > > > > > > > > > > > > You may also argue why not using a global style: > > > > > > > > > > > > > > echo "cpu1" > /sys/bus/cpu/devices/cpu_online > > > > > > > echo "cpu1" > /sys/bus/cpu/devices/cpu_offline > > > > > > > > > > > > > > There are many similar examples... > > > > > > > > > > > > Hi Kevin, > > > > > > > > > > > > My response above is to your question about using the global sysfs > > > > > > entry as you > > > > > > don't want to have the global path because > > > > > > > > > > > > "I don't think Qemu wants to know parent device of assigned mdev > > > > > > instances.". > > > > > > > > > > > > So I just want to confirm with you that (in case you miss): > > > > > > > > > > > > /sys/class/mdev/mdev_start | mdev_stop > > > > > > > > > > > > doesn't require the knowledge of parent device. > > > > > > > > > > > > > > > > Qemu is just one example, where your explanation of parent device > > > > > makes sense but still it's not good for Qemu to populate > > > > > /sys/class/mdev > > > > > directly. Qemu is passed with the actual sysfs path of assigned mdev > > > > > instance, so any mdev attributes touched by Qemu should be put under > > > > > that node (e.g. start/stop for live migration usage as I explained > > > > > earlier). > > > > > > > > Exactly, qemu is passed with the actual sysfs path. 
> > > > So, QEMU doesn't touch the file /sys/class/mdev/mdev_start | mdev_stop
> > > > at all.
> > > >
> > > > QEMU will take the sysfs path as input:
> > > >
> > > > -device
> > > > vfio-pci,sysfsdev=/sys/bus/mdev/devices/c0b26072-dd1b-4340-84fe-bf338c510818-0,id=vgpu0
> > >
> > > no need of passing "id=vgpu0" here. If necessary you can put id as an
> > > attribute under sysfs mdev node:
> > >
> > > /sys/bus/mdev/devices/c0b26072-dd1b-4340-84fe-bf338c510818-0/id
> >
> > I think we have moved away from the device index based on Alex's comment,
> > so the device path will be:
> >
> > /sys/bus/mdev/devices/c0b26072-dd1b-4340-84fe-bf338c510818
>
> pass /sys/bus/mdev/devices/c0b26072-dd1b-4340-84fe-bf338c510818
> as parameter, and then Qemu can access 'id' under that path. You
> don't need to pass a separate 'id' field. That's my point.
>
> > > > As you are saying in live migration, QEMU needs to access "start" and
> > > > "stop". Could you please share more details, such as how QEMU access
> > > > the "start" and "stop" sysfs, when and what triggers that?
Re: [Qemu-devel] [PATCH v6 1/4] vfio: Mediated device Core driver
> From: Neo Jia [mailto:c...@nvidia.com] > Sent: Tuesday, August 16, 2016 1:44 PM > > On Tue, Aug 16, 2016 at 04:52:30AM +, Tian, Kevin wrote: > > > From: Neo Jia [mailto:c...@nvidia.com] > > > Sent: Tuesday, August 16, 2016 12:17 PM > > > > > > On Tue, Aug 16, 2016 at 03:50:44AM +, Tian, Kevin wrote: > > > > > From: Neo Jia [mailto:c...@nvidia.com] > > > > > Sent: Tuesday, August 16, 2016 11:46 AM > > > > > > > > > > On Tue, Aug 16, 2016 at 12:30:25AM +, Tian, Kevin wrote: > > > > > > > From: Neo Jia [mailto:c...@nvidia.com] > > > > > > > Sent: Tuesday, August 16, 2016 3:59 AM > > > > > > > > > > > > > > > > > > > > > > > For NVIDIA vGPU solution we need to know all devices assigned > > > > > > > > > to a VM > in > > > > > > > > > one shot to commit resources of all vGPUs assigned to a VM > > > > > > > > > along with > > > > > > > > > some common resources. > > > > > > > > > > > > > > > > Kirti, can you elaborate the background about above one-shot > > > > > > > > commit > > > > > > > > requirement? It's hard to understand such a requirement. > > > > > > > > > > > > > > > > As I relied in another mail, I really hope start/stop become a > > > > > > > > per-mdev > > > > > > > > attribute instead of global one, e.g.: > > > > > > > > > > > > > > > > echo "0/1" > > > > /sys/class/mdev/12345678-1234-1234-1234-123456789abc/start > > > > > > > > > > > > > > > > In many scenario the user space client may only want to talk to > > > > > > > > mdev > > > > > > > > instance directly, w/o need to contact its parent device. Still > > > > > > > > take > > > > > > > > live migration for example, I don't think Qemu wants to know > > > > > > > > parent > > > > > > > > device of assigned mdev instances. > > > > > > > > > > > > > > Hi Kevin, > > > > > > > > > > > > > > Having a global /sys/class/mdev/mdev_start doesn't require > > > > > > > anybody to > know > > > > > > > parent device. 
you can just do > > > > > > > > > > > > > > echo "mdev_UUID" > /sys/class/mdev/mdev_start > > > > > > > > > > > > > > or > > > > > > > > > > > > > > echo "mdev_UUID" > /sys/class/mdev/mdev_stop > > > > > > > > > > > > > > without knowing the parent device. > > > > > > > > > > > > > > > > > > > You can look at some existing sysfs example, e.g.: > > > > > > > > > > > > echo "0/1" > /sys/bus/cpu/devices/cpu1/online > > > > > > > > > > > > You may also argue why not using a global style: > > > > > > > > > > > > echo "cpu1" > /sys/bus/cpu/devices/cpu_online > > > > > > echo "cpu1" > /sys/bus/cpu/devices/cpu_offline > > > > > > > > > > > > There are many similar examples... > > > > > > > > > > Hi Kevin, > > > > > > > > > > My response above is to your question about using the global sysfs > > > > > entry as you > > > > > don't want to have the global path because > > > > > > > > > > "I don't think Qemu wants to know parent device of assigned mdev > > > > > instances.". > > > > > > > > > > So I just want to confirm with you that (in case you miss): > > > > > > > > > > /sys/class/mdev/mdev_start | mdev_stop > > > > > > > > > > doesn't require the knowledge of parent device. > > > > > > > > > > > > > Qemu is just one example, where your explanation of parent device > > > > makes sense but still it's not good for Qemu to populate /sys/class/mdev > > > > directly. Qemu is passed with the actual sysfs path of assigned mdev > > > > instance, so any mdev attributes touched by Qemu should be put under > > > > that node (e.g. start/stop for live migration usage as I explained > > > > earlier). > > > > > > Exactly, qemu is passed with the actual sysfs path. > > > > > > So, QEMU doesn't touch the file /sys/class/mdev/mdev_start | mdev_stop at > > > all. > > > > > > QEMU will take the sysfs path as input: > > > > > > -device > > > > vfio-pci,sysfsdev=/sys/bus/mdev/devices/c0b26072-dd1b-4340-84fe-bf338c510818-0,i > > > d=vgpu0 > > > > no need of passing "id=vgpu0" here. 
If necessary you can put id as an > > attribute > > under sysfs mdev node: > > > > /sys/bus/mdev/devices/c0b26072-dd1b-4340-84fe-bf338c510818-0/id > > I think we have moved away from the device index based on Alex's comment, so > the > device path will be: > > /sys/bus/mdev/devices/c0b26072-dd1b-4340-84fe-bf338c510818 pass /sys/bus/mdev/devices/c0b26072-dd1b-4340-84fe-bf338c510818 as parameter, and then Qemu can access 'id' under that path. You don't need to pass a separate 'id' field. That's my point. > > > > > > > > > As you are saying in live migration, QEMU needs to access "start" and > > > "stop". Could > you > > > please share more details, such as how QEMU access the "start" and "stop" > > > sysfs, > > > when and what triggers that? > > > > > > > A conceptual flow as below: > > > > 1. Quiescent mdev activity on the parent device (e.g. stop scheduling, wait > > for > > in-flight DMA completed, etc.) > > > > echo "0" > > > /sys/bus/mdev/devices/c0b26072-dd1b-4340-84fe-bf338c510818-0/start > > > > 2. Save mdev state: > > > > cat /sys/bus/mdev/devices
Re: [Qemu-devel] [PATCH v6 1/4] vfio: Mediated device Core driver
On Tue, Aug 16, 2016 at 04:52:30AM +, Tian, Kevin wrote: > > From: Neo Jia [mailto:c...@nvidia.com] > > Sent: Tuesday, August 16, 2016 12:17 PM > > > > On Tue, Aug 16, 2016 at 03:50:44AM +, Tian, Kevin wrote: > > > > From: Neo Jia [mailto:c...@nvidia.com] > > > > Sent: Tuesday, August 16, 2016 11:46 AM > > > > > > > > On Tue, Aug 16, 2016 at 12:30:25AM +, Tian, Kevin wrote: > > > > > > From: Neo Jia [mailto:c...@nvidia.com] > > > > > > Sent: Tuesday, August 16, 2016 3:59 AM > > > > > > > > > > > > > > > > > > > > For NVIDIA vGPU solution we need to know all devices assigned > > > > > > > > to a VM in > > > > > > > > one shot to commit resources of all vGPUs assigned to a VM > > > > > > > > along with > > > > > > > > some common resources. > > > > > > > > > > > > > > Kirti, can you elaborate the background about above one-shot > > > > > > > commit > > > > > > > requirement? It's hard to understand such a requirement. > > > > > > > > > > > > > > As I relied in another mail, I really hope start/stop become a > > > > > > > per-mdev > > > > > > > attribute instead of global one, e.g.: > > > > > > > > > > > > > > echo "0/1" > > > /sys/class/mdev/12345678-1234-1234-1234-123456789abc/start > > > > > > > > > > > > > > In many scenario the user space client may only want to talk to > > > > > > > mdev > > > > > > > instance directly, w/o need to contact its parent device. Still > > > > > > > take > > > > > > > live migration for example, I don't think Qemu wants to know > > > > > > > parent > > > > > > > device of assigned mdev instances. > > > > > > > > > > > > Hi Kevin, > > > > > > > > > > > > Having a global /sys/class/mdev/mdev_start doesn't require anybody > > > > > > to know > > > > > > parent device. you can just do > > > > > > > > > > > > echo "mdev_UUID" > /sys/class/mdev/mdev_start > > > > > > > > > > > > or > > > > > > > > > > > > echo "mdev_UUID" > /sys/class/mdev/mdev_stop > > > > > > > > > > > > without knowing the parent device. 
> > > > > > > > > > > > > > > > You can look at some existing sysfs example, e.g.: > > > > > > > > > > echo "0/1" > /sys/bus/cpu/devices/cpu1/online > > > > > > > > > > You may also argue why not using a global style: > > > > > > > > > > echo "cpu1" > /sys/bus/cpu/devices/cpu_online > > > > > echo "cpu1" > /sys/bus/cpu/devices/cpu_offline > > > > > > > > > > There are many similar examples... > > > > > > > > Hi Kevin, > > > > > > > > My response above is to your question about using the global sysfs > > > > entry as you > > > > don't want to have the global path because > > > > > > > > "I don't think Qemu wants to know parent device of assigned mdev > > > > instances.". > > > > > > > > So I just want to confirm with you that (in case you miss): > > > > > > > > /sys/class/mdev/mdev_start | mdev_stop > > > > > > > > doesn't require the knowledge of parent device. > > > > > > > > > > Qemu is just one example, where your explanation of parent device > > > makes sense but still it's not good for Qemu to populate /sys/class/mdev > > > directly. Qemu is passed with the actual sysfs path of assigned mdev > > > instance, so any mdev attributes touched by Qemu should be put under > > > that node (e.g. start/stop for live migration usage as I explained > > > earlier). > > > > Exactly, qemu is passed with the actual sysfs path. > > > > So, QEMU doesn't touch the file /sys/class/mdev/mdev_start | mdev_stop at > > all. > > > > QEMU will take the sysfs path as input: > > > > -device > > vfio-pci,sysfsdev=/sys/bus/mdev/devices/c0b26072-dd1b-4340-84fe-bf338c510818-0,i > > d=vgpu0 > > no need of passing "id=vgpu0" here. 
If necessary you can put id as an attribute under sysfs mdev node:
>
> /sys/bus/mdev/devices/c0b26072-dd1b-4340-84fe-bf338c510818-0/id

I think we have moved away from the device index based on Alex's comment, so
the device path will be:

/sys/bus/mdev/devices/c0b26072-dd1b-4340-84fe-bf338c510818

> > As you are saying in live migration, QEMU needs to access "start" and
> > "stop". Could you please share more details, such as how QEMU access the
> > "start" and "stop" sysfs, when and what triggers that?
>
> A conceptual flow as below:
>
> 1. Quiesce mdev activity on the parent device (e.g. stop scheduling, wait
> for in-flight DMA completed, etc.)
>
> echo "0" > /sys/bus/mdev/devices/c0b26072-dd1b-4340-84fe-bf338c510818-0/start
>
> 2. Save mdev state:
>
> cat /sys/bus/mdev/devices/c0b26072-dd1b-4340-84fe-bf338c510818-0/state > xxx
>
> 3. xxx will be part of the final VM image and copied to a new machine
>
> 4. Allocate/prepare mdev on the new machine for this VM
>
> 5. Restore mdev state:
>
> cat xxx > /sys/bus/mdev/devices/c0b26072-dd1b-4340-84fe-bf338c510818-0/state
> (might be a different path name)
>
> 6. Start mdev on the new parent device:
>
> echo "1" > /sys/bus/mdev/devices/c0b26072-dd1b-4340-84fe-bf338c510818-0/start

Thanks for the sequence, so based on above live migration, the access of
Re: [Qemu-devel] [PATCH v6 1/4] vfio: Mediated device Core driver
> From: Neo Jia [mailto:c...@nvidia.com] > Sent: Tuesday, August 16, 2016 12:17 PM > > On Tue, Aug 16, 2016 at 03:50:44AM +, Tian, Kevin wrote: > > > From: Neo Jia [mailto:c...@nvidia.com] > > > Sent: Tuesday, August 16, 2016 11:46 AM > > > > > > On Tue, Aug 16, 2016 at 12:30:25AM +, Tian, Kevin wrote: > > > > > From: Neo Jia [mailto:c...@nvidia.com] > > > > > Sent: Tuesday, August 16, 2016 3:59 AM > > > > > > > > > > > > > > > > > For NVIDIA vGPU solution we need to know all devices assigned to > > > > > > > a VM in > > > > > > > one shot to commit resources of all vGPUs assigned to a VM along > > > > > > > with > > > > > > > some common resources. > > > > > > > > > > > > Kirti, can you elaborate the background about above one-shot commit > > > > > > requirement? It's hard to understand such a requirement. > > > > > > > > > > > > As I relied in another mail, I really hope start/stop become a > > > > > > per-mdev > > > > > > attribute instead of global one, e.g.: > > > > > > > > > > > > echo "0/1" > > /sys/class/mdev/12345678-1234-1234-1234-123456789abc/start > > > > > > > > > > > > In many scenario the user space client may only want to talk to mdev > > > > > > instance directly, w/o need to contact its parent device. Still take > > > > > > live migration for example, I don't think Qemu wants to know parent > > > > > > device of assigned mdev instances. > > > > > > > > > > Hi Kevin, > > > > > > > > > > Having a global /sys/class/mdev/mdev_start doesn't require anybody to > > > > > know > > > > > parent device. you can just do > > > > > > > > > > echo "mdev_UUID" > /sys/class/mdev/mdev_start > > > > > > > > > > or > > > > > > > > > > echo "mdev_UUID" > /sys/class/mdev/mdev_stop > > > > > > > > > > without knowing the parent device. 
> > > > > > > > > > > > > You can look at some existing sysfs example, e.g.: > > > > > > > > echo "0/1" > /sys/bus/cpu/devices/cpu1/online > > > > > > > > You may also argue why not using a global style: > > > > > > > > echo "cpu1" > /sys/bus/cpu/devices/cpu_online > > > > echo "cpu1" > /sys/bus/cpu/devices/cpu_offline > > > > > > > > There are many similar examples... > > > > > > Hi Kevin, > > > > > > My response above is to your question about using the global sysfs entry > > > as you > > > don't want to have the global path because > > > > > > "I don't think Qemu wants to know parent device of assigned mdev > > > instances.". > > > > > > So I just want to confirm with you that (in case you miss): > > > > > > /sys/class/mdev/mdev_start | mdev_stop > > > > > > doesn't require the knowledge of parent device. > > > > > > > Qemu is just one example, where your explanation of parent device > > makes sense but still it's not good for Qemu to populate /sys/class/mdev > > directly. Qemu is passed with the actual sysfs path of assigned mdev > > instance, so any mdev attributes touched by Qemu should be put under > > that node (e.g. start/stop for live migration usage as I explained earlier). > > Exactly, qemu is passed with the actual sysfs path. > > So, QEMU doesn't touch the file /sys/class/mdev/mdev_start | mdev_stop at all. > > QEMU will take the sysfs path as input: > > -device > vfio-pci,sysfsdev=/sys/bus/mdev/devices/c0b26072-dd1b-4340-84fe-bf338c510818-0,i > d=vgpu0 no need of passing "id=vgpu0" here. If necessary you can put id as an attribute under sysfs mdev node: /sys/bus/mdev/devices/c0b26072-dd1b-4340-84fe-bf338c510818-0/id > > As you are saying in live migration, QEMU needs to access "start" and "stop". > Could you > please share more details, such as how QEMU access the "start" and "stop" > sysfs, > when and what triggers that? > A conceptual flow as below: 1. Quiescent mdev activity on the parent device (e.g. 
stop scheduling, wait for in-flight DMAs to complete, etc.)

echo "0" > /sys/bus/mdev/devices/c0b26072-dd1b-4340-84fe-bf338c510818-0/start

2. Save mdev state:

cat /sys/bus/mdev/devices/c0b26072-dd1b-4340-84fe-bf338c510818-0/state > xxx

3. xxx will be part of the final VM image and copied to a new machine

4. Allocate/prepare mdev on the new machine for this VM

5. Restore mdev state:

cat xxx > /sys/bus/mdev/devices/c0b26072-dd1b-4340-84fe-bf338c510818-0/state
(might be a different path name)

6. Start mdev on the new parent device:

echo "1" > /sys/bus/mdev/devices/c0b26072-dd1b-4340-84fe-bf338c510818-0/start

Thanks
Kevin
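The six attribute accesses above can be collected into a small helper script. This is only a sketch of the interface being discussed in this thread: the per-mdev "start" and "state" attribute names are proposals, not a merged kernel ABI, and SYSFS is parameterized so the helpers can be exercised against a mock sysfs tree rather than real hardware.

```shell
# Sketch of the save/restore flow above.  The per-mdev "start" and
# "state" attributes are proposals from this thread, not a finalized
# kernel interface.  SYSFS is overridable so the helpers can be run
# against a mock sysfs tree.
SYSFS="${SYSFS:-/sys}"

mdev_dir() {                      # mdev_dir <uuid>
    echo "$SYSFS/bus/mdev/devices/$1"
}

mdev_quiesce() {                  # step 1: stop mdev activity
    echo 0 > "$(mdev_dir "$1")/start"
}

mdev_save() {                     # step 2: dump device state into an image file
    cat "$(mdev_dir "$1")/state" > "$2"
}

mdev_restore() {                  # step 5: load state on the destination mdev
    cat "$2" > "$(mdev_dir "$1")/state"
}

mdev_resume() {                   # step 6: restart mdev activity
    echo 1 > "$(mdev_dir "$1")/start"
}
```

Steps 3-4 (shipping the image and creating the destination mdev) stay with the management stack; only the sysfs attribute accesses are shown.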
Re: [Qemu-devel] [PATCH v6 1/4] vfio: Mediated device Core driver
On Tue, Aug 16, 2016 at 03:50:44AM +, Tian, Kevin wrote: > > From: Neo Jia [mailto:c...@nvidia.com] > > Sent: Tuesday, August 16, 2016 11:46 AM > > > > On Tue, Aug 16, 2016 at 12:30:25AM +, Tian, Kevin wrote: > > > > From: Neo Jia [mailto:c...@nvidia.com] > > > > Sent: Tuesday, August 16, 2016 3:59 AM > > > > > > > > > > > > > > For NVIDIA vGPU solution we need to know all devices assigned to a > > > > > > VM in > > > > > > one shot to commit resources of all vGPUs assigned to a VM along > > > > > > with > > > > > > some common resources. > > > > > > > > > > Kirti, can you elaborate the background about above one-shot commit > > > > > requirement? It's hard to understand such a requirement. > > > > > > > > > > As I relied in another mail, I really hope start/stop become a > > > > > per-mdev > > > > > attribute instead of global one, e.g.: > > > > > > > > > > echo "0/1" > > > > > > /sys/class/mdev/12345678-1234-1234-1234-123456789abc/start > > > > > > > > > > In many scenario the user space client may only want to talk to mdev > > > > > instance directly, w/o need to contact its parent device. Still take > > > > > live migration for example, I don't think Qemu wants to know parent > > > > > device of assigned mdev instances. > > > > > > > > Hi Kevin, > > > > > > > > Having a global /sys/class/mdev/mdev_start doesn't require anybody to > > > > know > > > > parent device. you can just do > > > > > > > > echo "mdev_UUID" > /sys/class/mdev/mdev_start > > > > > > > > or > > > > > > > > echo "mdev_UUID" > /sys/class/mdev/mdev_stop > > > > > > > > without knowing the parent device. > > > > > > > > > > You can look at some existing sysfs example, e.g.: > > > > > > echo "0/1" > /sys/bus/cpu/devices/cpu1/online > > > > > > You may also argue why not using a global style: > > > > > > echo "cpu1" > /sys/bus/cpu/devices/cpu_online > > > echo "cpu1" > /sys/bus/cpu/devices/cpu_offline > > > > > > There are many similar examples... 
> > > > Hi Kevin, > > > > My response above is to your question about using the global sysfs entry as > > you > > don't want to have the global path because > > > > "I don't think Qemu wants to know parent device of assigned mdev > > instances.". > > > > So I just want to confirm with you that (in case you miss): > > > > /sys/class/mdev/mdev_start | mdev_stop > > > > doesn't require the knowledge of parent device. > > > > Qemu is just one example, where your explanation of parent device > makes sense but still it's not good for Qemu to populate /sys/class/mdev > directly. Qemu is passed with the actual sysfs path of assigned mdev > instance, so any mdev attributes touched by Qemu should be put under > that node (e.g. start/stop for live migration usage as I explained earlier). Exactly, qemu is passed with the actual sysfs path. So, QEMU doesn't touch the file /sys/class/mdev/mdev_start | mdev_stop at all. QEMU will take the sysfs path as input: -device vfio-pci,sysfsdev=/sys/bus/mdev/devices/c0b26072-dd1b-4340-84fe-bf338c510818-0,id=vgpu0 As you are saying in live migration, QEMU needs to access "start" and "stop". Could you please share more details, such as how QEMU access the "start" and "stop" sysfs, when and what triggers that? Thanks, Neo > > Thanks > Kevin
Re: [Qemu-devel] [PATCH v6 1/4] vfio: Mediated device Core driver
> From: Neo Jia [mailto:c...@nvidia.com] > Sent: Tuesday, August 16, 2016 11:46 AM > > On Tue, Aug 16, 2016 at 12:30:25AM +, Tian, Kevin wrote: > > > From: Neo Jia [mailto:c...@nvidia.com] > > > Sent: Tuesday, August 16, 2016 3:59 AM > > > > > > > > > > > For NVIDIA vGPU solution we need to know all devices assigned to a VM > > > > > in > > > > > one shot to commit resources of all vGPUs assigned to a VM along with > > > > > some common resources. > > > > > > > > Kirti, can you elaborate the background about above one-shot commit > > > > requirement? It's hard to understand such a requirement. > > > > > > > > As I relied in another mail, I really hope start/stop become a per-mdev > > > > attribute instead of global one, e.g.: > > > > > > > > echo "0/1" > /sys/class/mdev/12345678-1234-1234-1234-123456789abc/start > > > > > > > > In many scenario the user space client may only want to talk to mdev > > > > instance directly, w/o need to contact its parent device. Still take > > > > live migration for example, I don't think Qemu wants to know parent > > > > device of assigned mdev instances. > > > > > > Hi Kevin, > > > > > > Having a global /sys/class/mdev/mdev_start doesn't require anybody to know > > > parent device. you can just do > > > > > > echo "mdev_UUID" > /sys/class/mdev/mdev_start > > > > > > or > > > > > > echo "mdev_UUID" > /sys/class/mdev/mdev_stop > > > > > > without knowing the parent device. > > > > > > > You can look at some existing sysfs example, e.g.: > > > > echo "0/1" > /sys/bus/cpu/devices/cpu1/online > > > > You may also argue why not using a global style: > > > > echo "cpu1" > /sys/bus/cpu/devices/cpu_online > > echo "cpu1" > /sys/bus/cpu/devices/cpu_offline > > > > There are many similar examples... > > Hi Kevin, > > My response above is to your question about using the global sysfs entry as > you > don't want to have the global path because > > "I don't think Qemu wants to know parent device of assigned mdev instances.". 
> > So I just want to confirm with you that (in case you missed it):
> >
> > /sys/class/mdev/mdev_start | mdev_stop
> >
> > doesn't require knowledge of the parent device.
>

Qemu is just one example: your explanation of the parent device makes sense, but it is still not good for Qemu to populate /sys/class/mdev directly. Qemu is passed the actual sysfs path of the assigned mdev instance, so any mdev attributes touched by Qemu should be put under that node (e.g. start/stop for the live migration usage I explained earlier).

Thanks
Kevin
Re: [Qemu-devel] [PATCH v6 1/4] vfio: Mediated device Core driver
On Tue, Aug 16, 2016 at 12:30:25AM +, Tian, Kevin wrote: > > From: Neo Jia [mailto:c...@nvidia.com] > > Sent: Tuesday, August 16, 2016 3:59 AM > > > > > > > > For NVIDIA vGPU solution we need to know all devices assigned to a VM in > > > > one shot to commit resources of all vGPUs assigned to a VM along with > > > > some common resources. > > > > > > Kirti, can you elaborate the background about above one-shot commit > > > requirement? It's hard to understand such a requirement. > > > > > > As I relied in another mail, I really hope start/stop become a per-mdev > > > attribute instead of global one, e.g.: > > > > > > echo "0/1" > /sys/class/mdev/12345678-1234-1234-1234-123456789abc/start > > > > > > In many scenario the user space client may only want to talk to mdev > > > instance directly, w/o need to contact its parent device. Still take > > > live migration for example, I don't think Qemu wants to know parent > > > device of assigned mdev instances. > > > > Hi Kevin, > > > > Having a global /sys/class/mdev/mdev_start doesn't require anybody to know > > parent device. you can just do > > > > echo "mdev_UUID" > /sys/class/mdev/mdev_start > > > > or > > > > echo "mdev_UUID" > /sys/class/mdev/mdev_stop > > > > without knowing the parent device. > > > > You can look at some existing sysfs example, e.g.: > > echo "0/1" > /sys/bus/cpu/devices/cpu1/online > > You may also argue why not using a global style: > > echo "cpu1" > /sys/bus/cpu/devices/cpu_online > echo "cpu1" > /sys/bus/cpu/devices/cpu_offline > > There are many similar examples... Hi Kevin, My response above is to your question about using the global sysfs entry as you don't want to have the global path because "I don't think Qemu wants to know parent device of assigned mdev instances.". So I just want to confirm with you that (in case you miss): /sys/class/mdev/mdev_start | mdev_stop doesn't require the knowledge of parent device. Thanks, Neo > > Thanks > Kevin
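For reference, driving the global entries Neo describes from a management script would look like the sketch below. Both files are only proposals in this thread (the discussion later moves start/stop under the per-mdev node instead), and MDEV_CLASS is parameterized so the helpers can be tested against a mock directory.

```shell
# Global-entry style discussed above: a UUID is written into
# class-level mdev_start/mdev_stop files, with no knowledge of the
# parent device.  These files are proposals from this thread only and
# were not the direction ultimately taken.
MDEV_CLASS="${MDEV_CLASS:-/sys/class/mdev}"

mdev_start() {                    # mdev_start <uuid>
    echo "$1" > "$MDEV_CLASS/mdev_start"
}

mdev_stop() {                     # mdev_stop <uuid>
    echo "$1" > "$MDEV_CLASS/mdev_stop"
}
```

Note the DoS concern raised elsewhere in this thread: anyone with write permission on these global files can stop arbitrary UUIDs, which is one reason per-mdev control won out.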
Re: [Qemu-devel] [PATCH v6 1/4] vfio: Mediated device Core driver
> From: Neo Jia [mailto:c...@nvidia.com] > Sent: Tuesday, August 16, 2016 7:24 AM > > > > > > > > > > > > > > > > > For NVIDIA vGPU solution we need to know all devices assigned to a > > > > > > VM in > > > > > > one shot to commit resources of all vGPUs assigned to a VM along > > > > > > with > > > > > > some common resources. > > > > > > > > > > Kirti, can you elaborate the background about above one-shot commit > > > > > requirement? It's hard to understand such a requirement. > > > > > > > > Agree, I know NVIDIA isn't planning to support hotplug initially, but > > > > this seems like we're precluding hotplug from the design. I don't > > > > understand what's driving this one-shot requirement. > > > > > > Hi Alex, > > > > > > The requirement here is based on how our internal vGPU device model > > > designed and > > > with this we are able to pre-allocate resources required for multiple > > > virtual > > > devices within same domain. > > > > > > And I don't think this syntax will stop us from supporting hotplug at all. > > > > > > For example, you can always create a virtual mdev and then do > > > > > > echo "mdev_UUID" > /sys/class/mdev/mdev_start > > > > > > then use QEMU monitor to add the device for hotplug. > > > > Hi Neo, > > > > I'm still not understanding the advantage you get from the "one-shot" > > approach then if we can always add more mdevs by starting them later. > > Are the hotplug mdevs somehow less capable than the initial set of > > mdevs added in a single shot? If the initial set is allocated > > from the "same domain", does that give them some sort of hardware > > locality/resource benefit? Thanks, > > Hi Alex, > > At least we will not able to guarantee some special hardware resource for the > hotplug devices. > > So from our point of view, we also have dedicated internal SW entity to > manage all > virtual devices for each "domain/virtual machine", and such SW entity will be > created > at virtual device start time. 
> > This is why we need to do this in one-shot to support the multiple
> > virtual devices per VM case.
>

Is pre-allocation of the special hardware resources done one time for all mdev instances? Can it be done one-by-one, as long as each mdev is started early, before the VM is launched?

If such a one-shot requirement is really needed, it would be cleaner to me to introduce an mdev group concept, so mdev instances with one-shot start requirements can be put under a mdev group. Then you can do a one-shot start by:

echo "0/1" > /sys/class/mdev/group/0/start

Thanks
Kevin
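Until (or unless) such a group node exists in the kernel, the one-shot semantics can be approximated from userspace by fanning the write out over the group's members. Both the group file and the per-mdev "start" attribute below are hypothetical, taken only from this thread's proposals.

```shell
# Userspace approximation of the proposed group start: apply one 0/1
# value to every mdev UUID listed (one per line) in a group file.
# The group file format and the per-mdev "start" attribute are
# hypothetical, sketched from this thread's proposals.
SYSFS="${SYSFS:-/sys}"

mdev_group_set() {                # mdev_group_set <group-file> <0|1>
    while read -r uuid; do
        [ -n "$uuid" ] || continue
        echo "$2" > "$SYSFS/bus/mdev/devices/$uuid/start"
    done < "$1"
}
```

Unlike a kernel-side group start, this loop is not atomic: a failure partway through leaves some members started, which is exactly the fine-grained error-reporting trade-off Alex raises against list-based start/stop.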
Re: [Qemu-devel] [PATCH v6 1/4] vfio: Mediated device Core driver
> From: Neo Jia [mailto:c...@nvidia.com] > Sent: Tuesday, August 16, 2016 3:59 AM > > On Mon, Aug 15, 2016 at 09:38:52AM +, Tian, Kevin wrote: > > > From: Kirti Wankhede [mailto:kwankh...@nvidia.com] > > > Sent: Saturday, August 13, 2016 8:37 AM > > > > > > > > > > > > On 8/13/2016 2:46 AM, Alex Williamson wrote: > > > > On Sat, 13 Aug 2016 00:14:39 +0530 > > > > Kirti Wankhede wrote: > > > > > > > >> On 8/10/2016 12:30 AM, Alex Williamson wrote: > > > >>> On Thu, 4 Aug 2016 00:33:51 +0530 > > > >>> Kirti Wankhede wrote: > > > >>> > > > >>> This is used later by mdev_device_start() and mdev_device_stop() to > > > >>> get > > > >>> the parent_device so it can call the start and stop ops callbacks > > > >>> respectively. That seems to imply that all of instances for a given > > > >>> uuid come from the same parent_device. Where is that enforced? I'm > > > >>> still having a hard time buying into the uuid+instance plan when it > > > >>> seems like each mdev_device should have an actual unique uuid. > > > >>> Userspace tools can figure out which uuids to start for a given user, > > > >>> I > > > >>> don't see much value in collecting them to instances within a uuid. > > > >>> > > > >> > > > >> Initially we started discussion with VM_UUID+instance suggestion, where > > > >> instance was introduced to support multiple devices in a VM. > > > > > > > > The instance number was never required in order to support multiple > > > > devices in a VM, IIRC this UUID+instance scheme was to appease NVIDIA > > > > management tools which wanted to re-use the VM UUID by creating vGPU > > > > devices with that same UUID and therefore associate udev events to a > > > > given VM. Only then does an instance number become necessary since the > > > > UUID needs to be static for a vGPUs within a VM. This has always felt > > > > like a very dodgy solution when we should probably just be querying > > > > libvirt to give us a device to VM association. > > > > Agree with Alex here. 
We'd better not assume that UUID will be a VM_UUID > > for mdev in the basic design. It's bound to NVIDIA management stack too > > tightly. > > > > I'm OK to give enough flexibility for various upper level management stacks, > > e.g. instead of $UUID+INDEX style, would $UUID+STRING provide a better > > option where either UUID or STRING could be optional? Upper management > > stack can choose its own policy to identify a mdev: > > > > a) $UUID only, so each mdev is allocated with a unique UUID > > b) STRING only, which could be an index (0, 1, 2, ...), or any combination > > (vgpu0, vgpu1, etc.) > > c) $UUID+STRING, where UUID could be a VM UUID, and STRING could be > > a numeric index > > > > > > > > > >> 'mdev_create' creates device and 'mdev_start' is to commit resources of > > > >> all instances of similar devices assigned to VM. > > > >> > > > >> For example, to create 2 devices: > > > >> # echo "$UUID:0:params" > /sys/devices/../mdev_create > > > >> # echo "$UUID:1:params" > /sys/devices/../mdev_create > > > >> > > > >> "$UUID-0" and "$UUID-1" devices are created. > > > >> > > > >> Commit resources for above devices with single 'mdev_start': > > > >> # echo "$UUID" > /sys/class/mdev/mdev_start > > > >> > > > >> Considering $UUID to be a unique UUID of a device, we don't need > > > >> 'instance', so 'mdev_create' would look like: > > > >> > > > >> # echo "$UUID1:params" > /sys/devices/../mdev_create > > > >> # echo "$UUID2:params" > /sys/devices/../mdev_create > > > >> > > > >> where $UUID1 and $UUID2 would be mdev device's unique UUID and 'params' > > > >> would be vendor specific parameters. 
> > > >> > > > >> Device nodes would be created as "$UUID1" and "$UUID" > > > >> > > > >> Then 'mdev_start' would be: > > > >> # echo "$UUID1, $UUID2" > /sys/class/mdev/mdev_start > > > >> > > > >> Similarly 'mdev_stop' and 'mdev_destroy' would be: > > > >> > > > >> # echo "$UUID1, $UUID2" > /sys/class/mdev/mdev_stop > > > > > > > > I'm not sure a comma separated list makes sense here, for both > > > > simplicity in the kernel and more fine grained error reporting, we > > > > probably want to start/stop them individually. Actually, why is it > > > > that we can't use the mediated device being opened and released to > > > > automatically signal to the backend vendor driver to commit and release > > > > resources? I don't fully understand why userspace needs this interface. > > > > There is a meaningful use of start/stop interface, as required in live > > migration support. Such interface allows vendor driver to quiescent > > mdev activity on source device before mdev hardware state is snapshot, > > and then resume mdev activity on dest device after its state is recovered. > > Intel has implemented experimental live migration support in KVMGT (soon > > to release), based on above two interfaces (plus another two to get/set > > mdev state). > > > > > > > > > > > > For NVIDIA vGPU solution we need to know all devices assigned to a VM in > > > one shot to
Re: [Qemu-devel] [PATCH v6 1/4] vfio: Mediated device Core driver
> From: Alex Williamson [mailto:alex.william...@redhat.com] > Sent: Tuesday, August 16, 2016 6:48 AM > > On Mon, 15 Aug 2016 12:59:08 -0700 > Neo Jia wrote: > > > On Mon, Aug 15, 2016 at 09:38:52AM +, Tian, Kevin wrote: > > > > From: Kirti Wankhede [mailto:kwankh...@nvidia.com] > > > > Sent: Saturday, August 13, 2016 8:37 AM > > > > > > > > > > > > > > > > On 8/13/2016 2:46 AM, Alex Williamson wrote: > > > > > On Sat, 13 Aug 2016 00:14:39 +0530 > > > > > Kirti Wankhede wrote: > > > > > > > > > >> On 8/10/2016 12:30 AM, Alex Williamson wrote: > > > > >>> On Thu, 4 Aug 2016 00:33:51 +0530 > > > > >>> Kirti Wankhede wrote: > > > > >>> > > > > >>> This is used later by mdev_device_start() and mdev_device_stop() to > > > > >>> get > > > > >>> the parent_device so it can call the start and stop ops callbacks > > > > >>> respectively. That seems to imply that all of instances for a given > > > > >>> uuid come from the same parent_device. Where is that enforced? I'm > > > > >>> still having a hard time buying into the uuid+instance plan when it > > > > >>> seems like each mdev_device should have an actual unique uuid. > > > > >>> Userspace tools can figure out which uuids to start for a given > > > > >>> user, I > > > > >>> don't see much value in collecting them to instances within a uuid. > > > > >>> > > > > >> > > > > >> Initially we started discussion with VM_UUID+instance suggestion, > > > > >> where > > > > >> instance was introduced to support multiple devices in a VM. > > > > > > > > > > The instance number was never required in order to support multiple > > > > > devices in a VM, IIRC this UUID+instance scheme was to appease NVIDIA > > > > > management tools which wanted to re-use the VM UUID by creating vGPU > > > > > devices with that same UUID and therefore associate udev events to a > > > > > given VM. Only then does an instance number become necessary since > > > > > the > > > > > UUID needs to be static for a vGPUs within a VM. 
This has always felt > > > > > like a very dodgy solution when we should probably just be querying > > > > > libvirt to give us a device to VM association. > > > > > > Agree with Alex here. We'd better not assume that UUID will be a VM_UUID > > > for mdev in the basic design. It's bound to NVIDIA management stack too > > > tightly. > > > > > > I'm OK to give enough flexibility for various upper level management > > > stacks, > > > e.g. instead of $UUID+INDEX style, would $UUID+STRING provide a better > > > option where either UUID or STRING could be optional? Upper management > > > stack can choose its own policy to identify a mdev: > > > > > > a) $UUID only, so each mdev is allocated with a unique UUID > > > b) STRING only, which could be an index (0, 1, 2, ...), or any combination > > > (vgpu0, vgpu1, etc.) > > > c) $UUID+STRING, where UUID could be a VM UUID, and STRING could be > > > a numeric index > > > > > > > > > > > > >> 'mdev_create' creates device and 'mdev_start' is to commit resources > > > > >> of > > > > >> all instances of similar devices assigned to VM. > > > > >> > > > > >> For example, to create 2 devices: > > > > >> # echo "$UUID:0:params" > /sys/devices/../mdev_create > > > > >> # echo "$UUID:1:params" > /sys/devices/../mdev_create > > > > >> > > > > >> "$UUID-0" and "$UUID-1" devices are created. > > > > >> > > > > >> Commit resources for above devices with single 'mdev_start': > > > > >> # echo "$UUID" > /sys/class/mdev/mdev_start > > > > >> > > > > >> Considering $UUID to be a unique UUID of a device, we don't need > > > > >> 'instance', so 'mdev_create' would look like: > > > > >> > > > > >> # echo "$UUID1:params" > /sys/devices/../mdev_create > > > > >> # echo "$UUID2:params" > /sys/devices/../mdev_create > > > > >> > > > > >> where $UUID1 and $UUID2 would be mdev device's unique UUID and > > > > >> 'params' > > > > >> would be vendor specific parameters. 
> > > > >> > > > > >> Device nodes would be created as "$UUID1" and "$UUID" > > > > >> > > > > >> Then 'mdev_start' would be: > > > > >> # echo "$UUID1, $UUID2" > /sys/class/mdev/mdev_start > > > > >> > > > > >> Similarly 'mdev_stop' and 'mdev_destroy' would be: > > > > >> > > > > >> # echo "$UUID1, $UUID2" > /sys/class/mdev/mdev_stop > > > > > > > > > > I'm not sure a comma separated list makes sense here, for both > > > > > simplicity in the kernel and more fine grained error reporting, we > > > > > probably want to start/stop them individually. Actually, why is it > > > > > that we can't use the mediated device being opened and released to > > > > > automatically signal to the backend vendor driver to commit and > > > > > release > > > > > resources? I don't fully understand why userspace needs this > > > > > interface. > > > > > > There is a meaningful use of start/stop interface, as required in live > > > migration support. Such interface allows vendor driver to quiescent > > > mdev activity on source device before mdev hardware state is snapshot, > > > and then resume
Re: [Qemu-devel] [PATCH v6 1/4] vfio: Mediated device Core driver
On Mon, Aug 15, 2016 at 04:47:41PM -0600, Alex Williamson wrote: > On Mon, 15 Aug 2016 12:59:08 -0700 > Neo Jia wrote: > > > On Mon, Aug 15, 2016 at 09:38:52AM +, Tian, Kevin wrote: > > > > From: Kirti Wankhede [mailto:kwankh...@nvidia.com] > > > > Sent: Saturday, August 13, 2016 8:37 AM > > > > > > > > > > > > > > > > On 8/13/2016 2:46 AM, Alex Williamson wrote: > > > > > On Sat, 13 Aug 2016 00:14:39 +0530 > > > > > Kirti Wankhede wrote: > > > > > > > > > >> On 8/10/2016 12:30 AM, Alex Williamson wrote: > > > > >>> On Thu, 4 Aug 2016 00:33:51 +0530 > > > > >>> Kirti Wankhede wrote: > > > > >>> > > > > >>> This is used later by mdev_device_start() and mdev_device_stop() to > > > > >>> get > > > > >>> the parent_device so it can call the start and stop ops callbacks > > > > >>> respectively. That seems to imply that all of instances for a given > > > > >>> uuid come from the same parent_device. Where is that enforced? I'm > > > > >>> still having a hard time buying into the uuid+instance plan when it > > > > >>> seems like each mdev_device should have an actual unique uuid. > > > > >>> Userspace tools can figure out which uuids to start for a given > > > > >>> user, I > > > > >>> don't see much value in collecting them to instances within a uuid. > > > > >>> > > > > >> > > > > >> Initially we started discussion with VM_UUID+instance suggestion, > > > > >> where > > > > >> instance was introduced to support multiple devices in a VM. > > > > > > > > > > The instance number was never required in order to support multiple > > > > > devices in a VM, IIRC this UUID+instance scheme was to appease NVIDIA > > > > > management tools which wanted to re-use the VM UUID by creating vGPU > > > > > devices with that same UUID and therefore associate udev events to a > > > > > given VM. Only then does an instance number become necessary since > > > > > the > > > > > UUID needs to be static for a vGPUs within a VM. 
This has always felt > > > > > like a very dodgy solution when we should probably just be querying > > > > > libvirt to give us a device to VM association. > > > > > > Agree with Alex here. We'd better not assume that UUID will be a VM_UUID > > > for mdev in the basic design. It's bound to NVIDIA management stack too > > > tightly. > > > > > > I'm OK to give enough flexibility for various upper level management > > > stacks, > > > e.g. instead of $UUID+INDEX style, would $UUID+STRING provide a better > > > option where either UUID or STRING could be optional? Upper management > > > stack can choose its own policy to identify a mdev: > > > > > > a) $UUID only, so each mdev is allocated with a unique UUID > > > b) STRING only, which could be an index (0, 1, 2, ...), or any > > > combination > > > (vgpu0, vgpu1, etc.) > > > c) $UUID+STRING, where UUID could be a VM UUID, and STRING could be > > > a numeric index > > > > > > > > > > > > >> 'mdev_create' creates device and 'mdev_start' is to commit resources > > > > >> of > > > > >> all instances of similar devices assigned to VM. > > > > >> > > > > >> For example, to create 2 devices: > > > > >> # echo "$UUID:0:params" > /sys/devices/../mdev_create > > > > >> # echo "$UUID:1:params" > /sys/devices/../mdev_create > > > > >> > > > > >> "$UUID-0" and "$UUID-1" devices are created. > > > > >> > > > > >> Commit resources for above devices with single 'mdev_start': > > > > >> # echo "$UUID" > /sys/class/mdev/mdev_start > > > > >> > > > > >> Considering $UUID to be a unique UUID of a device, we don't need > > > > >> 'instance', so 'mdev_create' would look like: > > > > >> > > > > >> # echo "$UUID1:params" > /sys/devices/../mdev_create > > > > >> # echo "$UUID2:params" > /sys/devices/../mdev_create > > > > >> > > > > >> where $UUID1 and $UUID2 would be mdev device's unique UUID and > > > > >> 'params' > > > > >> would be vendor specific parameters. 
> > > > >> > > > > >> Device nodes would be created as "$UUID1" and "$UUID" > > > > >> > > > > >> Then 'mdev_start' would be: > > > > >> # echo "$UUID1, $UUID2" > /sys/class/mdev/mdev_start > > > > >> > > > > >> Similarly 'mdev_stop' and 'mdev_destroy' would be: > > > > >> > > > > >> # echo "$UUID1, $UUID2" > /sys/class/mdev/mdev_stop > > > > > > > > > > I'm not sure a comma separated list makes sense here, for both > > > > > simplicity in the kernel and more fine grained error reporting, we > > > > > probably want to start/stop them individually. Actually, why is it > > > > > that we can't use the mediated device being opened and released to > > > > > automatically signal to the backend vendor driver to commit and > > > > > release > > > > > resources? I don't fully understand why userspace needs this > > > > > interface. > > > > > > There is a meaningful use of start/stop interface, as required in live > > > migration support. Such interface allows vendor driver to quiescent > > > mdev activity on source device before mdev hardware state is snapshot, > > > and then resume m
Re: [Qemu-devel] [PATCH v6 1/4] vfio: Mediated device Core driver
On Mon, Aug 15, 2016 at 04:52:39PM -0600, Alex Williamson wrote: > On Mon, 15 Aug 2016 15:09:30 -0700 > Neo Jia wrote: > > > On Mon, Aug 15, 2016 at 09:59:26AM -0600, Alex Williamson wrote: > > > On Mon, 15 Aug 2016 09:38:52 + > > > "Tian, Kevin" wrote: > > > > > > > > From: Kirti Wankhede [mailto:kwankh...@nvidia.com] > > > > > Sent: Saturday, August 13, 2016 8:37 AM > > > > > > > > > > > > > > > > > > > > On 8/13/2016 2:46 AM, Alex Williamson wrote: > > > > > > On Sat, 13 Aug 2016 00:14:39 +0530 > > > > > > Kirti Wankhede wrote: > > > > > > > > > > > >> On 8/10/2016 12:30 AM, Alex Williamson wrote: > > > > > >>> On Thu, 4 Aug 2016 00:33:51 +0530 > > > > > >>> Kirti Wankhede wrote: > > > > > >>> > > > > > >>> This is used later by mdev_device_start() and mdev_device_stop() > > > > > >>> to get > > > > > >>> the parent_device so it can call the start and stop ops callbacks > > > > > >>> respectively. That seems to imply that all of instances for a > > > > > >>> given > > > > > >>> uuid come from the same parent_device. Where is that enforced? > > > > > >>> I'm > > > > > >>> still having a hard time buying into the uuid+instance plan when > > > > > >>> it > > > > > >>> seems like each mdev_device should have an actual unique uuid. > > > > > >>> Userspace tools can figure out which uuids to start for a given > > > > > >>> user, I > > > > > >>> don't see much value in collecting them to instances within a > > > > > >>> uuid. > > > > > >>> > > > > > >> > > > > > >> Initially we started discussion with VM_UUID+instance suggestion, > > > > > >> where > > > > > >> instance was introduced to support multiple devices in a VM. 
On Mon, 15 Aug 2016 09:38:52 + "Tian, Kevin" wrote:

> [...]
>
> There is a meaningful use of the start/stop interface, as required for live
> migration support. Such an interface allows the vendor driver to quiesce
> mdev activity on the source device before the mdev hardware state is
> snapshotted, and then to resume mdev activity on the destination device
> after its state is recovered. Intel has implemented experimental live
> migration support in KVMGT (soon to be released), based on the above two
> interfaces (plus another two to get/set mdev state).

Ok, that's actually an interesting use case for start/stop.

> > For NVIDIA vGPU solution we need to know all devices assigned to a VM in
> > one shot to commit resources of all vGPUs assigned to a VM along with
> > some common resources.
>
> Kirti, can you elaborate on the background of the above one-shot commit
> requirement? It's hard to understand such a requirement.

Agree,
> From: Kirti Wankhede [mailto:kwankh...@nvidia.com]
> Sent: Saturday, August 13, 2016 8:37 AM
>
> [...]
>
> > The instance number was never required in order to support multiple
> > devices in a VM; IIRC this UUID+instance scheme was to appease NVIDIA
> > management tools, which wanted to re-use the VM UUID by creating vGPU
> > devices with that same UUID and therefore associate udev events with a
> > given VM. Only then does an instance number become necessary, since the
> > UUID needs to be static for the vGPUs within a VM. This has always felt
> > like a very dodgy solution when we should probably just be querying
> > libvirt to give us a device-to-VM association.

Agree with Alex here. We'd better not assume that the UUID will be a VM_UUID
for mdev in the basic design; that binds it too tightly to the NVIDIA
management stack.

I'm OK with giving enough flexibility for various upper-level management
stacks. E.g., instead of the $UUID+INDEX style, would $UUID+STRING provide a
better option, where either UUID or STRING could be optional? The upper
management stack can then choose its own policy to identify an mdev:

a) $UUID only, so each mdev is allocated a unique UUID
b) STRING only, which could be an index (0, 1, 2, ...) or any combination
   (vgpu0, vgpu1, etc.)
c) $UUID+STRING, where UUID could be a VM UUID and STRING could be a
   numeric index

> > [...]
> > I'm not sure a comma-separated list makes sense here; for both
> > simplicity in the kernel and more fine-grained error reporting, we
> > probably want to start/stop them individually. Actually, why is it
> > that we can't use the mediated device being opened and released to
> > automatically signal to the backend vendor driver to commit and release
> > resources? I don't fully understand why userspace needs this interface.

There is a meaningful use of the start/stop interface, as required for live
migration support. Such an interface allows the vendor driver to quiesce
mdev activity on the source device before the mdev hardware state is
snapshotted, and then to resume mdev activity on the destination device
after its state is recovered. Intel has implemented experimental live
migration support in KVMGT (soon to be released), based on the above two
interfaces (plus another two to get/set mdev state).

> For NVIDIA vGPU solution we need to know all devices assigned to a VM in
> one shot to commit resources of all vGPUs assigned to a VM along with
> some common resources.

Kirti, can you elaborate on the background of the above one-shot commit
requirement? It's hard to understand such a requirement.

As I replied in another mail, I really hope start/stop becomes a per-mdev
attribute instead of a global one, e.g.:

echo "0/1" > /sys/class/mdev/12345678-1234-1234-1234-123456789abc/start

In many scenarios the user-space client may only want to talk to the mdev
instance directly, without needing to contact its parent device. Still take
live migration for example
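The per-mdev attribute Kevin proposes can be sketched as a short shell sequence. This is a mock-up of the proposed, not-yet-merged sysfs layout: `sysfs_write()` only prints the writes it would perform (the real nodes exist only with an mdev vendor driver loaded), and the UUID is the example one from the mail.

```shell
# Hypothetical per-mdev start/stop flow; paths follow the proposal under
# discussion, not a defined kernel ABI.
sysfs_write() {
    # $1 = value, $2 = sysfs attribute path; print instead of executing
    printf 'echo "%s" > %s\n' "$1" "$2"
}

UUID=12345678-1234-1234-1234-123456789abc
START=/sys/class/mdev/$UUID/start

sysfs_write 1 "$START"   # commit resources for just this mdev
sysfs_write 0 "$START"   # quiesce it again, e.g. before a migration snapshot
```

Because the attribute lives under the individual mdev node, write permission on it grants control over that one device only, which is the DoS concern raised later in the thread.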
> From: Kirti Wankhede [mailto:kwankh...@nvidia.com]
> Sent: Friday, August 05, 2016 2:13 PM
>
> On 8/4/2016 12:51 PM, Tian, Kevin wrote:
> >> From: Kirti Wankhede [mailto:kwankh...@nvidia.com]
> >> Sent: Thursday, August 04, 2016 3:04 AM
> >>
> >> 2. Physical device driver interface
> >> This interface provides the vendor driver a set of APIs to manage
> >> physical-device-related work in its own driver. The APIs are:
> >> - supported_config: provide the configuration list supported by the
> >>   vendor driver
> >> - create: allocate basic resources in the vendor driver for a mediated
> >>   device
> >> - destroy: free resources in the vendor driver when the mediated device
> >>   is destroyed
> >> - reset: free and reallocate resources in the vendor driver during reboot
> >
> > Currently I see the 'reset' callback invoked only from the VFIO ioctl
> > path. Do you think it makes sense to expose a sysfs 'reset' node too,
> > similar to what people see under a PCI device node?
>
> All vendor drivers might not support reset of an mdev from sysfs. But those
> who want to support it can expose a 'reset' node using the
> 'mdev_attr_groups' of 'struct parent_ops'.

Yes, this way it works. I just wonder whether it makes sense to expose the
reset sysfs node by default if a reset callback is provided by the vendor
driver. :-)

Thanks
Kevin
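From userspace, a vendor-exposed per-mdev 'reset' node would be used much like the PCI one Kevin mentions. The sketch below is purely illustrative: the path layout is an assumption modeled on the per-mdev attributes discussed in this thread, not a defined ABI, and `sysfs_write()` prints rather than executes the write.

```shell
# Hypothetical usage of a per-mdev 'reset' attribute published by the
# vendor driver via mdev_attr_groups (path is illustrative only).
sysfs_write() { printf 'echo "%s" > %s\n' "$1" "$2"; }

UUID=12345678-1234-1234-1234-123456789abc
RESET=/sys/class/mdev/$UUID/reset

# Ask the vendor driver to free and reallocate this mdev's resources.
sysfs_write 1 "$RESET"
```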
On 8/13/2016 2:46 AM, Alex Williamson wrote: > On Sat, 13 Aug 2016 00:14:39 +0530 > Kirti Wankhede wrote: > >> On 8/10/2016 12:30 AM, Alex Williamson wrote: >>> On Thu, 4 Aug 2016 00:33:51 +0530 >>> Kirti Wankhede wrote: >>> >>> This is used later by mdev_device_start() and mdev_device_stop() to get >>> the parent_device so it can call the start and stop ops callbacks >>> respectively. That seems to imply that all of instances for a given >>> uuid come from the same parent_device. Where is that enforced? I'm >>> still having a hard time buying into the uuid+instance plan when it >>> seems like each mdev_device should have an actual unique uuid. >>> Userspace tools can figure out which uuids to start for a given user, I >>> don't see much value in collecting them to instances within a uuid. >>> >> >> Initially we started discussion with VM_UUID+instance suggestion, where >> instance was introduced to support multiple devices in a VM. > > The instance number was never required in order to support multiple > devices in a VM, IIRC this UUID+instance scheme was to appease NVIDIA > management tools which wanted to re-use the VM UUID by creating vGPU > devices with that same UUID and therefore associate udev events to a > given VM. Only then does an instance number become necessary since the > UUID needs to be static for a vGPUs within a VM. This has always felt > like a very dodgy solution when we should probably just be querying > libvirt to give us a device to VM association. > >> 'mdev_create' creates device and 'mdev_start' is to commit resources of >> all instances of similar devices assigned to VM. >> >> For example, to create 2 devices: >> # echo "$UUID:0:params" > /sys/devices/../mdev_create >> # echo "$UUID:1:params" > /sys/devices/../mdev_create >> >> "$UUID-0" and "$UUID-1" devices are created. 
>>
>> Commit resources for the above devices with a single 'mdev_start':
>> # echo "$UUID" > /sys/class/mdev/mdev_start
>>
>> Considering $UUID to be a unique UUID of a device, we don't need
>> 'instance', so 'mdev_create' would look like:
>>
>> # echo "$UUID1:params" > /sys/devices/../mdev_create
>> # echo "$UUID2:params" > /sys/devices/../mdev_create
>>
>> where $UUID1 and $UUID2 would be the mdev device's unique UUID and 'params'
>> would be vendor specific parameters.
>>
>> Device nodes would be created as "$UUID1" and "$UUID2".
>>
>> Then 'mdev_start' would be:
>> # echo "$UUID1, $UUID2" > /sys/class/mdev/mdev_start
>>
>> Similarly 'mdev_stop' and 'mdev_destroy' would be:
>>
>> # echo "$UUID1, $UUID2" > /sys/class/mdev/mdev_stop
>
> I'm not sure a comma separated list makes sense here; both for
> simplicity in the kernel and for more fine grained error reporting, we
> probably want to start/stop them individually. Actually, why is it
> that we can't use the mediated device being opened and released to
> automatically signal to the backend vendor driver to commit and release
> resources? I don't fully understand why userspace needs this interface.
>

For the NVIDIA vGPU solution we need to know all the devices assigned to
a VM in one shot, to commit the resources of all vGPUs assigned to a VM
along with some common resources.

For the start callback, I can pass on the list of UUIDs as is to the
vendor driver. Let the vendor driver decide whether to iterate over each
device and commit resources, or do it in one shot.

Thanks,
Kirti

>> and
>>
>> # echo "$UUID1" > /sys/devices/../mdev_destroy
>> # echo "$UUID2" > /sys/devices/../mdev_destroy
>>
>> Does this seem reasonable?
>
> I've been hoping we could drop the instance numbers and create actual
> unique UUIDs per mediated device for a while ;) Thanks,
>
> Alex
Re: [Qemu-devel] [PATCH v6 1/4] vfio: Mediated device Core driver
On Sat, 13 Aug 2016 00:14:39 +0530
Kirti Wankhede wrote:

> On 8/10/2016 12:30 AM, Alex Williamson wrote:
> > On Thu, 4 Aug 2016 00:33:51 +0530
> > Kirti Wankhede wrote:
> >
> > This is used later by mdev_device_start() and mdev_device_stop() to get
> > the parent_device so it can call the start and stop ops callbacks
> > respectively. That seems to imply that all of the instances for a given
> > uuid come from the same parent_device. Where is that enforced? I'm
> > still having a hard time buying into the uuid+instance plan when it
> > seems like each mdev_device should have an actual unique uuid.
> > Userspace tools can figure out which uuids to start for a given user, I
> > don't see much value in collecting them into instances within a uuid.
> >
>
> Initially we started discussion with the VM_UUID+instance suggestion, where
> instance was introduced to support multiple devices in a VM.

The instance number was never required in order to support multiple
devices in a VM. IIRC this UUID+instance scheme was to appease NVIDIA
management tools which wanted to re-use the VM UUID by creating vGPU
devices with that same UUID and therefore associate udev events to a
given VM. Only then does an instance number become necessary since the
UUID needs to be static for the vGPUs within a VM. This has always felt
like a very dodgy solution when we should probably just be querying
libvirt to give us a device to VM association.

> 'mdev_create' creates a device and 'mdev_start' is to commit resources of
> all instances of similar devices assigned to a VM.
>
> For example, to create 2 devices:
> # echo "$UUID:0:params" > /sys/devices/../mdev_create
> # echo "$UUID:1:params" > /sys/devices/../mdev_create
>
> "$UUID-0" and "$UUID-1" devices are created.
>
> Commit resources for the above devices with a single 'mdev_start':
> # echo "$UUID" > /sys/class/mdev/mdev_start
>
> Considering $UUID to be a unique UUID of a device, we don't need
> 'instance', so 'mdev_create' would look like:
>
> # echo "$UUID1:params" > /sys/devices/../mdev_create
> # echo "$UUID2:params" > /sys/devices/../mdev_create
>
> where $UUID1 and $UUID2 would be the mdev device's unique UUID and 'params'
> would be vendor specific parameters.
>
> Device nodes would be created as "$UUID1" and "$UUID2".
>
> Then 'mdev_start' would be:
> # echo "$UUID1, $UUID2" > /sys/class/mdev/mdev_start
>
> Similarly 'mdev_stop' and 'mdev_destroy' would be:
>
> # echo "$UUID1, $UUID2" > /sys/class/mdev/mdev_stop

I'm not sure a comma separated list makes sense here; both for
simplicity in the kernel and for more fine grained error reporting, we
probably want to start/stop them individually. Actually, why is it
that we can't use the mediated device being opened and released to
automatically signal to the backend vendor driver to commit and release
resources? I don't fully understand why userspace needs this interface.

> and
>
> # echo "$UUID1" > /sys/devices/../mdev_destroy
> # echo "$UUID2" > /sys/devices/../mdev_destroy
>
> Does this seem reasonable?

I've been hoping we could drop the instance numbers and create actual
unique UUIDs per mediated device for a while ;) Thanks,

Alex
Re: [Qemu-devel] [PATCH v6 1/4] vfio: Mediated device Core driver
On 8/10/2016 12:30 AM, Alex Williamson wrote:
> On Thu, 4 Aug 2016 00:33:51 +0530
> Kirti Wankhede wrote:
>
> This is used later by mdev_device_start() and mdev_device_stop() to get
> the parent_device so it can call the start and stop ops callbacks
> respectively. That seems to imply that all of the instances for a given
> uuid come from the same parent_device. Where is that enforced? I'm
> still having a hard time buying into the uuid+instance plan when it
> seems like each mdev_device should have an actual unique uuid.
> Userspace tools can figure out which uuids to start for a given user, I
> don't see much value in collecting them into instances within a uuid.
>

Initially we started discussion with the VM_UUID+instance suggestion, where
instance was introduced to support multiple devices in a VM.

'mdev_create' creates a device and 'mdev_start' is to commit resources of
all instances of similar devices assigned to a VM.

For example, to create 2 devices:
# echo "$UUID:0:params" > /sys/devices/../mdev_create
# echo "$UUID:1:params" > /sys/devices/../mdev_create

"$UUID-0" and "$UUID-1" devices are created.

Commit resources for the above devices with a single 'mdev_start':
# echo "$UUID" > /sys/class/mdev/mdev_start

Considering $UUID to be a unique UUID of a device, we don't need
'instance', so 'mdev_create' would look like:

# echo "$UUID1:params" > /sys/devices/../mdev_create
# echo "$UUID2:params" > /sys/devices/../mdev_create

where $UUID1 and $UUID2 would be the mdev device's unique UUID and 'params'
would be vendor specific parameters.

Device nodes would be created as "$UUID1" and "$UUID2".

Then 'mdev_start' would be:
# echo "$UUID1, $UUID2" > /sys/class/mdev/mdev_start

Similarly 'mdev_stop' and 'mdev_destroy' would be:

# echo "$UUID1, $UUID2" > /sys/class/mdev/mdev_stop

and

# echo "$UUID1" > /sys/devices/../mdev_destroy
# echo "$UUID2" > /sys/devices/../mdev_destroy

Does this seem reasonable?

Thanks,
Kirti
Re: [Qemu-devel] [PATCH v6 1/4] vfio: Mediated device Core driver
On Thu, 4 Aug 2016 00:33:51 +0530
Kirti Wankhede wrote:

> Design for Mediated Device Driver:
>
> The main purpose of this driver is to provide a common interface for
> mediated device management that can be used by different drivers of
> different devices.
>
> This module provides a generic interface to create the device, add it to
> the mediated bus, add the device to the IOMMU group and then add it to
> the vfio group.
>
> Below is the high level block diagram, with Nvidia, Intel and IBM devices
> as examples, since these are the devices which are going to actively use
> this module as of now.
>
>  +---------------+
>  |               |
>  | +-----------+ | mdev_register_driver() +--------------+
>  | |           | +<-----------------------+  __init()    |
>  | |  mdev     | |                        |              |
>  | |  bus      | +----------------------->+              |<-> VFIO user
>  | |  driver   | |    probe()/remove()    | vfio_mpci.ko |    APIs
>  | |           | |                        +--------------+
>  | |           | | mdev_register_driver() +--------------+
>  | |           | +<-----------------------+  __init()    |
>  | |           | |                        |              |
>  | |           | +----------------------->+              |<-> VFIO user
>  | +-----------+ |    probe()/remove()    | vfio_mccw.ko |    APIs
>  |               |                        +--------------+
>  |  MDEV CORE    |
>  |   MODULE      |
>  |   mdev.ko     |
>  | +-----------+ | mdev_register_device() +--------------+
>  | |           | +<-----------------------+              |
>  | |           | |                        |  nvidia.ko   |<-> physical
>  | |           | +----------------------->+              |    device
>  | |           | |        callback        +--------------+
>  | | Physical  | |
>  | |  device   | | mdev_register_device() +--------------+
>  | | interface | +<-----------------------+              |
>  | |           | |                        |   i915.ko    |<-> physical
>  | |           | +----------------------->+              |    device
>  | |           | |        callback        +--------------+
>  | |           | |
>  | |           | | mdev_register_device() +--------------+
>  | |           | +<-----------------------+              |
>  | |           | |                        | ccw_device.ko|<-> physical
>  | |           | +----------------------->+              |    device
>  | |           | |        callback        +--------------+
>  | +-----------+ |
>  +---------------+
>
> Core driver provides two types of registration interfaces:
>
> 1. Registration interface for mediated bus driver:
>
> /**
>  * struct mdev_driver - Mediated device's driver
>  * @name:   driver name
>  * @probe:  called when new device created
>  * @remove: called when device removed
>  * @match:  called when new device or driver is added for this bus.
>  *          Return 1 if given device can be handled by given driver and
>  *          zero otherwise.
>  * @driver: device driver structure
>  *
>  **/
> struct mdev_driver {
>         const char *name;
>         int  (*probe)(struct device *dev);
>         void (*remove)(struct device *dev);
>         int  (*match)(struct device *dev);
>         struct device_driver driver;
> };
>
> int  mdev_register_driver(struct mdev_driver *drv, struct module *owner);
> void mdev_unregister_driver(struct mdev_driver *drv);
>
> Mediated device's driver for mdev should use this interface to register
> with the core driver. With this, the mediated device driver for such
> devices is responsible for adding the mediated device to the VFIO group.
>
> 2. Physical device driver interface
>
> This interface provides the vendor driver a set of APIs to manage
> physical device related work in its own driver. The APIs are:
> - supported_config: provide the list of configurations supported by the
>   vendor driver
> - create: to allocate basic resources in the vendor driver for a mediated
>   device.
> - destroy: to free resources in the vendor driver when the mediated
>   device is destroyed.
> - reset: to free and reallocate resources in the vendor driver during
>   reboot
> - start: to initiate the mediated device initialization process from the
>   vendor driver
> - shutdown: to teardown mediated device resources during teardown.
> - read: read emulation callback.
> - write: write emulation callback.
> - set_irqs: send interrupt configuration information that the VMM sets.
> - get_region_info: to provide region size and its flags for the mediated
>   device.
> - validate_map_request: to validate a remap pfn request.
>
> This registration interface should be used by vendor drivers to register
> each physical device with the mdev core driver.
>
> Locks to serialize the above callbacks are removed. If required, the
> vendor driver can have locks to serialize the above APIs in its driver.
>
>
Re: [Qemu-devel] [PATCH v6 1/4] vfio: Mediated device Core driver
On 8/4/2016 12:51 PM, Tian, Kevin wrote:
>> From: Kirti Wankhede [mailto:kwankh...@nvidia.com]
>> Sent: Thursday, August 04, 2016 3:04 AM
>>
>>
>> 2. Physical device driver interface
>> This interface provides the vendor driver a set of APIs to manage
>> physical device related work in its own driver. The APIs are:
>> - supported_config: provide the list of configurations supported by the
>>   vendor driver
>> - create: to allocate basic resources in the vendor driver for a mediated
>>   device.
>> - destroy: to free resources in the vendor driver when the mediated
>>   device is destroyed.
>> - reset: to free and reallocate resources in the vendor driver during
>>   reboot
>
> Currently I see the 'reset' callback invoked only from the VFIO ioctl
> path. Do you think it makes sense to expose a sysfs 'reset' node too,
> similar to what people see under a PCI device node?
>

All vendor drivers might not support reset of an mdev from sysfs. But those
who want to support it can expose a 'reset' node using 'mdev_attr_groups' of
'struct parent_ops'.

>> - start: to initiate the mediated device initialization process from the
>>   vendor driver
>> - shutdown: to teardown mediated device resources during teardown.
>
> I think 'shutdown' should be 'stop' based on the actual code.
>

Thanks for catching that, yes I missed updating it here.

Thanks,
Kirti

> Thanks
> Kevin
>
Re: [Qemu-devel] [PATCH v6 1/4] vfio: Mediated device Core driver
> From: Kirti Wankhede [mailto:kwankh...@nvidia.com]
> Sent: Thursday, August 04, 2016 3:04 AM
>
>
> 2. Physical device driver interface
> This interface provides the vendor driver a set of APIs to manage
> physical device related work in its own driver. The APIs are:
> - supported_config: provide the list of configurations supported by the
>   vendor driver
> - create: to allocate basic resources in the vendor driver for a mediated
>   device.
> - destroy: to free resources in the vendor driver when the mediated
>   device is destroyed.
> - reset: to free and reallocate resources in the vendor driver during
>   reboot

Currently I see the 'reset' callback invoked only from the VFIO ioctl
path. Do you think it makes sense to expose a sysfs 'reset' node too,
similar to what people see under a PCI device node?

> - start: to initiate the mediated device initialization process from the
>   vendor driver
> - shutdown: to teardown mediated device resources during teardown.

I think 'shutdown' should be 'stop' based on the actual code.

Thanks
Kevin
[Qemu-devel] [PATCH v6 1/4] vfio: Mediated device Core driver
Design for Mediated Device Driver:

The main purpose of this driver is to provide a common interface for
mediated device management that can be used by different drivers of
different devices.

This module provides a generic interface to create the device, add it to
the mediated bus, add the device to the IOMMU group and then add it to
the vfio group.

Below is the high level block diagram, with Nvidia, Intel and IBM devices
as examples, since these are the devices which are going to actively use
this module as of now.

 +---------------+
 |               |
 | +-----------+ | mdev_register_driver() +--------------+
 | |           | +<-----------------------+  __init()    |
 | |  mdev     | |                        |              |
 | |  bus      | +----------------------->+              |<-> VFIO user
 | |  driver   | |    probe()/remove()    | vfio_mpci.ko |    APIs
 | |           | |                        +--------------+
 | |           | | mdev_register_driver() +--------------+
 | |           | +<-----------------------+  __init()    |
 | |           | |                        |              |
 | |           | +----------------------->+              |<-> VFIO user
 | +-----------+ |    probe()/remove()    | vfio_mccw.ko |    APIs
 |               |                        +--------------+
 |  MDEV CORE    |
 |   MODULE      |
 |   mdev.ko     |
 | +-----------+ | mdev_register_device() +--------------+
 | |           | +<-----------------------+              |
 | |           | |                        |  nvidia.ko   |<-> physical
 | |           | +----------------------->+              |    device
 | |           | |        callback        +--------------+
 | | Physical  | |
 | |  device   | | mdev_register_device() +--------------+
 | | interface | +<-----------------------+              |
 | |           | |                        |   i915.ko    |<-> physical
 | |           | +----------------------->+              |    device
 | |           | |        callback        +--------------+
 | |           | |
 | |           | | mdev_register_device() +--------------+
 | |           | +<-----------------------+              |
 | |           | |                        | ccw_device.ko|<-> physical
 | |           | +----------------------->+              |    device
 | |           | |        callback        +--------------+
 | +-----------+ |
 +---------------+

Core driver provides two types of registration interfaces:

1. Registration interface for mediated bus driver:

/**
 * struct mdev_driver - Mediated device's driver
 * @name:   driver name
 * @probe:  called when new device created
 * @remove: called when device removed
 * @match:  called when new device or driver is added for this bus.
 *          Return 1 if given device can be handled by given driver and
 *          zero otherwise.
 * @driver: device driver structure
 *
 **/
struct mdev_driver {
        const char *name;
        int  (*probe)(struct device *dev);
        void (*remove)(struct device *dev);
        int  (*match)(struct device *dev);
        struct device_driver driver;
};

int  mdev_register_driver(struct mdev_driver *drv, struct module *owner);
void mdev_unregister_driver(struct mdev_driver *drv);

Mediated device's driver for mdev should use this interface to register
with the core driver. With this, the mediated device driver for such
devices is responsible for adding the mediated device to the VFIO group.

2. Physical device driver interface

This interface provides the vendor driver a set of APIs to manage
physical device related work in its own driver. The APIs are:
- supported_config: provide the list of configurations supported by the
  vendor driver
- create: to allocate basic resources in the vendor driver for a mediated
  device.
- destroy: to free resources in the vendor driver when the mediated
  device is destroyed.
- reset: to free and reallocate resources in the vendor driver during
  reboot
- start: to initiate the mediated device initialization process from the
  vendor driver
- shutdown: to teardown mediated device resources during teardown.
- read: read emulation callback.
- write: write emulation callback.
- set_irqs: send interrupt configuration information that the VMM sets.
- get_region_info: to provide region size and its flags for the mediated
  device.
- validate_map_request: to validate a remap pfn request.

This registration interface should be used by vendor drivers to register
each physical device with the mdev core driver.

Locks to serialize the above callbacks are removed. If required, the
vendor driver can have locks to serialize the above APIs in its driver.

Added support to keep track of physical mappings for each mdev device.
APIs to be used by the mediated device bus driver to add and delete
mappings to the tracking logic:

int mdev_add_phys_mapping(struct mdev_device *mdev, struct addre