Re: [PATCH 0/4] Introduce ARM MTE feature

2023-05-22 Thread Cornelia Huck
On Wed, May 17 2023, Andrea Bolognani  wrote:

> On Wed, May 17, 2023 at 11:19:17AM +0200, Cornelia Huck wrote:
>> This has been through some iterations... we (as in people working on
>> this in QEMU) need to decide on where to go with cpu features, cpu
>> models, etc. on Arm, but for now, it's a virt machine property.
>>
>> I have considered doing a GIC-like configuration, but for starters, the
>> kernel doesn't support configuring the MTE version yet... and I'm not
>> sure if configuring the MTE version (vs enabling/disabling MTE) should
>> not be modeled as a cpu property instead.
>>
>> Note that my patch adds a migration blocker when enabling MTE, so (1)
>> nothing bad regarding migration compatibility should happen yet
>
> Migration is of course the most obvious failure scenario, but one of
> the critical features offered by libvirt is guest ABI compatibility.
>
> If the user needs MTE3 specifically, rather than any MTE version, to
> be available to the guest OS, they'll configure the domain with
> something like
>
>   
>
> and we need to be able to prevent such a VM from running on a host
> that doesn't have MTE3 support.
>
> So the fundamental question is, does the current libvirt interface
> paint us into a corner when it comes to implementing a more granular
> one later?

Given that an interface allowing to actually control the exposed version
is not likely to pop up in the next days, does it make sense to even try
to wire it up in libvirt right now?

>
> Remember that, unlike QEMU, we don't have the luxury of erasing
> mistakes from our public interfaces, ever :)



Re: [libvirt PATCH 0/4] Revert MTE support

2023-05-22 Thread Cornelia Huck
On Mon, May 22 2023, Andrea Bolognani  wrote:

> See the subthread starting at [1] for the rationale. The QEMU commit
> that (temporarily) reverted KVM support is at [2].
>
> [1] https://listman.redhat.com/archives/libvir-list/2023-May/239911.html
> [2] 
> https://gitlab.com/qemu-project/qemu/-/commit/d009607d08d22f91ca399b72828c6693855e7325
>
> Andrea Bolognani (4):
>   Revert "qemu: Generate command line for MTE feature"
>   Revert "qemu: Validate MTE feature"
>   Revert "qemu:: Introduce QEMU_CAPS_MACHINE_VIRT_MTE capability"
>   Revert "conf: Introduce MTE domain feature"
>
>  docs/formatdomain.rst |  7 -
>  src/conf/domain_conf.c|  6 +
>  src/conf/domain_conf.h|  1 -
>  src/conf/schemas/domaincommon.rng |  5 
>  src/qemu/qemu_capabilities.c  |  2 --
>  src/qemu/qemu_capabilities.h  |  1 -
>  src/qemu/qemu_command.c   |  6 -
>  src/qemu/qemu_validate.c  | 26 ++-
>  .../caps_5.2.0_aarch64.xml|  1 -
>  .../caps_6.0.0_aarch64.xml|  1 -
>  .../caps_6.2.0_aarch64.xml|  1 -
>  .../caps_7.0.0_aarch64+hvf.xml|  1 -
>  .../caps_7.0.0_aarch64.xml|  1 -
>  tests/qemuxml2argvdata/aarch64-gic-v3.args|  2 +-
>  tests/qemuxml2argvdata/aarch64-gic-v3.xml |  1 -
>  .../aarch64-gic-v3.aarch64-latest.xml |  1 -
>  16 files changed, 10 insertions(+), 53 deletions(-)

FWIW,
Reviewed-by: Cornelia Huck 



Re: [PATCH 0/4] Introduce ARM MTE feature

2023-05-17 Thread Cornelia Huck
On Wed, May 17 2023, Michal Prívozník  wrote:

> On 5/16/23 18:32, Andrea Bolognani wrote:
>> On Tue, May 16, 2023 at 12:54:12PM +0200, Michal Privoznik wrote:
>>> Michal Prívozník (4):
>>>   conf: Introduce MTE domain feature
>>>   qemu:: Introduce QEMU_CAPS_MACHINE_VIRT_MTE capability
>>>   qemu: Validate MTE feature
>>>   qemu: Generate command line for MTE feature
>> 
>> I wish I'd managed to see this before it got reviewed and merged :/
>
> We can still revert the patches, if needed.
>
>> 
>> For context, I have been following the development of the MTE feature
>> in QEMU for a while, and was planning to work on the libvirt part
>> later down the line. The main reason why I have not done so yet is
>> that there are still some open questions about the interface.
>> 
>> Specifically, MTE is not just a single thing: there are at least two
>> versions that I'm aware of, MTE and MTE3.
>> 
>> Right now, mte=on gives you MTE3 with TCG and whatever the host
>> supports on KVM. Of course the latter is problematic when it comes to
>> guaranteeing a stable guest ABI... I think a reasonable interface
>> would be similar to what we have for GIC, with a 'version' attribute
>> used to explicitly choose between MTE and MTE3, and some logic to
>> fill in a reasonable value for the host by default.
>> 
>> But there's also the question of whether MTE should be a machine
>> property in the first place, rather than a CPU feature?
>
> I admit that my QEMU code base understanding is limited, but the patch
> you've linked doesn't really expose any CPU feature that libvirt could
> set. Instead, it enables MTE for KVM under the same API, i.e. -machine
> virt,mte=on.

This has been through some iterations... we (as in people working on
this in QEMU) need to decide on where to go with cpu features, cpu
models, etc. on Arm, but for now, it's a virt machine property.

I have considered doing a GIC-like configuration, but for starters, the
kernel doesn't support configuring the MTE version yet... and I'm not
sure if configuring the MTE version (vs enabling/disabling MTE) should
not be modeled as a cpu property instead.

Note that my patch adds a migration blocker when enabling MTE, so (1)
nothing bad regarding migration compatibility should happen yet, and (2)
libvirt should probably not turn it on by default (I haven't checked
what the patches actually do ;)

[Also, I don't think there is any readily available hardware supporting
MTE yet; I have been testing my code on the simulator...]



Re: [PATCH] virttools-planet: update location of Cornelia's blog

2021-11-02 Thread Cornelia Huck
On Tue, Nov 02 2021, Daniel P. Berrangé  wrote:

> On Tue, Nov 02, 2021 at 12:10:42PM +0100, Cornelia Huck wrote:
>> Signed-off-by: Cornelia Huck 
>> ---
>>  virt-tools/config.ini | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/virt-tools/config.ini b/virt-tools/config.ini
>> index f56495a26fd8..89b4e5ae6f91 100644
>> --- a/virt-tools/config.ini
>> +++ b/virt-tools/config.ini
>> @@ -162,7 +162,7 @@ name = Cédric Bosdonnat
>>  [http://www.studiopixl.com/feeds/rss]
>>  name = Nathan Gauër
>>  
>> -[https://virtualpenguins.blogspot.com/feeds/posts/default]
>> +[https://people.redhat.com/~cohuck/feed.xml]
>>  name = Cornelia Huck
>>  
>>  [http://www.otubo.net/feeds/posts/default/-/virt/?alt=rss]
>
> Change looks fine, but I forgot to update the instructions to
> request opening a merge request now, instead of sending email.
>
> Could you open a MR against
>
>https://gitlab.com/libvirt/virttools-planet/

Done: https://gitlab.com/libvirt/virttools-planet/-/merge_requests/3



[PATCH] virttools-planet: update location of Cornelia's blog

2021-11-02 Thread Cornelia Huck
Signed-off-by: Cornelia Huck 
---
 virt-tools/config.ini | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/virt-tools/config.ini b/virt-tools/config.ini
index f56495a26fd8..89b4e5ae6f91 100644
--- a/virt-tools/config.ini
+++ b/virt-tools/config.ini
@@ -162,7 +162,7 @@ name = Cédric Bosdonnat
 [http://www.studiopixl.com/feeds/rss]
 name = Nathan Gauër
 
-[https://virtualpenguins.blogspot.com/feeds/posts/default]
+[https://people.redhat.com/~cohuck/feed.xml]
 name = Cornelia Huck
 
 [http://www.otubo.net/feeds/posts/default/-/virt/?alt=rss]
-- 
2.31.1



Re: [PATCH 2/2] docs/about: Unify the subject format

2021-08-20 Thread Cornelia Huck
On Fri, Aug 20 2021, Yanan Wang  wrote:

> Unify the subject format in deprecated.rst to "since X.Y".
> Unify the subject format in removed-features.rst to "removed in X.Y".

It seems unlikely that we will ever deprecate something in a stable
release, and even more unlikely that we'll remove something in one, so
the short versions look like the thing we want to standardize on.

>
> Signed-off-by: Yanan Wang 
> ---
>  docs/about/deprecated.rst   | 56 -
>  docs/about/removed-features.rst | 28 -
>  2 files changed, 42 insertions(+), 42 deletions(-)

Unrelated to your patch, line 143 in removed-features.rst reads

``-vnc ...,tls=...``, ``-vnc ...,x509=...`` & ``-vnc ...,x509verify=...``

and is missing the release it was removed in (presumably 3.1?)

Anyway,

Reviewed-by: Cornelia Huck 



Re: [libvirt PATCH v3 00/21] Add support for persistent mediated devices

2021-02-05 Thread Cornelia Huck
On Tue, 2 Feb 2021 08:28:52 +0100
Erik Skultety  wrote:

> On Mon, Feb 01, 2021 at 04:38:56PM -0700, Alex Williamson wrote:
> > On Mon, 1 Feb 2021 16:57:44 -0600
> > Jonathon Jongsma  wrote:
> >   
> > > On Mon, 1 Feb 2021 11:33:08 +0100
> > > Erik Skultety  wrote:
> > >   
> > > > On Mon, Feb 01, 2021 at 09:48:11AM +, Daniel P. Berrangé wrote:
> > > > > On Mon, Feb 01, 2021 at 10:45:43AM +0100, Erik Skultety wrote:  
> > > > > > On Mon, Feb 01, 2021 at 09:42:32AM +, Daniel P. Berrangé
> > > > > > wrote:  
> > > > > > > On Fri, Jan 29, 2021 at 05:34:36PM -0600, Jonathon Jongsma
> > > > > > > wrote:  
> > > > > > > > On Thu, 7 Jan 2021 17:43:54 +0100
> > > > > > > > Erik Skultety  wrote:
> > > > > > > >   
> > > > > > > > > > Tested with v6.10.0-283-g1948d4e61e.
> > > > > > > > > > 
> > > > > > > > > > 1.Can define/start/destroy mdev device successfully;
> > > > > > > > > > 
> > > > > > > > > > 2.'virsh nodedev-list' has no '--active' option, which is
> > > > > > > > > > inconsistent with the description in the patch:
> > > > > > > > > > # virsh nodedev-list --active
> > > > > > > > > > error: command 'nodedev-list' doesn't support option
> > > > > > > > > > --active
> > > > > > > > > > 
> > > > > > > > > > 3.virsh client hang when trying to destroy a mdev device
> > > > > > > > > > which is using by a vm, and after that all 'virsh nodev*'
> > > > > > > > > > cmds will hang. If restarting llibvirtd after that,
> > > > > > > > > > libvirtd will hang.
> > > > > > > > > 
> > > > > > > > > It hangs because underneath a write to the 'remove' sysfs
> > > > > > > > > attribute is now blocking for some reason and since we're
> > > > > > > > > relying on mdevctl to do it for us, hence "it hangs". I'm
> > > > > > > > > not trying to make an excuse, it's plain wrong. I'd love to
> > > > > > > > > rely on such a basic functionality, but it looks like we'll
> > > > > > > > > have to go with a extremely ugly workaround and try to get
> > > > > > > > > the list of active domains from the nodedev driver and see
> > > > > > > > > whether any of them has the device assigned before we try
> > > > > > > > > to destroy the mdev via the nodedev driver.  
> > > > > > > > 
> > > > > > > > So, I've been trying to figure out a way to do this, but as
> > > > > > > > far as I know, there's no way to get a list of active domains
> > > > > > > > from within the nodedev driver, and I can't think of any
> > > > > > > > better ways to handle it. Any ideas?  
> > > > > > > 
> > > > > > > Correct, the nodedev driver isn't permitted to talk to any of
> > > > > > > the virt drivers.  
> > > > > > 
> > > > > > Oh, not even via secondary connection? What makes nodedev so
> > > > > > special, since we can open a secondary connection from e.g. the
> > > > > > storage driver?  
> > > > > 
> > > > > It is technically possible, but it should never be done, because it
> > > > > introduces a bi-directional dependancy between the daemons which
> > > > > introduces the danger of deadlocking them. None of the secondary
> > > > > drivers should connect to the hypervisor drivers.
> > > > >   
> > > > > > > Is there anything in sysfs which reports whether the device is
> > > > > > > in use ?  
> > > > > > 
> > > > > > Nothing that I know of, the way it used to work was that you
> > > > > > tried to write to sysfs and kernel returned a write error with
> > > > > > "device in use" or something like that, but that has changed
> > > > > > since :(.  
> > > > 
> > > > Without having tried this and since mdevctl is just a Bash script,
> > > > can we bypass mdevctl on destroys a little bit by constructing the
> > > > path to the sysfs attribute ourselves and perform a non-blocking
> > > > write of zero bytes to see if we get an error? If so, we'll skip
> > > > mdevctl and report an error. If we don't, we'll invoke mdevctl to
> > > > remove the device in order to remain consistent. Would that be an
> > > > acceptable workaround (provided it would work)?
> > > 
> > > As far as I can tell, this doesn't work. According to my
> > > tests, attempting to write zero bytes to $(mdev_sysfs_path)/remove
> > > doesn't result in an error if the mdev is in use by a vm. It just
> > > "successfully" writes zero bytes. Adding Alex to cc in case he has an
> > > idea for a workaround here.  
> > 
> > [Cc +Connie]
> > 
> > I'm not really sure why mdevs are unique here.  When we write to
> > remove, the first step is to release the device from the driver, so
> > it's really the same as an unbind for a vfio-pci device.  PCI drivers
> > cannot return an error, an unbind is handled not as a request, but a
> > directive, so when the device is in use we block until the unbind can
> > complete.  With vfio-pci (and added upstream to the mdev core -
> > depending on vendor support), the driver remove callback triggers a
> > virtual interrupt to the user asking to cooperatively return the device
> > (triggering a hot-unplug in 

Re: [PATCH v2] docs: Clarify the documentation of the elements

2021-01-29 Thread Cornelia Huck
On Fri, 29 Jan 2021 15:37:26 +0100
Thomas Huth  wrote:

> The channel subsystem elements describe a channel in the I/O subsystem
> of a s390x machine, and not a normal device (like a disk or network card).
> Reword the documentation here to make it this a little bit clearer.
> 
> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1898074
> Reviewed-by: Boris Fiuczynski  
> Signed-off-by: Thomas Huth 
> ---
>  v2: "channel" -> "subchannel" as suggested by Cornelia
> 
>  docs/formatnode.html.in | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/docs/formatnode.html.in b/docs/formatnode.html.in
> index 1010a37a3d..5c7286df8a 100644
> --- a/docs/formatnode.html.in
> +++ b/docs/formatnode.html.in
> @@ -396,15 +396,15 @@
>  
>
>css
> -  Describes a Channel SubSystem (CSS) device commonly found on
> -  the S390 architecture. Sub-elements include:
> +  Describes a subchannel in the Channel SubSystem (CSS) commonly
> +  found on the S390 architecture. Sub-elements include:
>  
>cssid
>The channel subsystem identifier.
>ssid
>The subchannel-set identifier.
>devno
> -  The device number.
> +  The subchannel number.
>    capability
>
>  This optional element can occur multiple times. If it

Reviewed-by: Cornelia Huck 



Re: [PATCH] docs: Clarify the documentation of the elements

2021-01-29 Thread Cornelia Huck
On Fri, 29 Jan 2021 11:04:44 +0100
Thomas Huth  wrote:

> The channel subsystem elements describe a channel in the I/O subsystem
> of a s390x machine, and not a normal device (like a disk or network card).
> Reword the documentation here to make it this a little bit clearer.
> 
> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1898074
> Signed-off-by: Thomas Huth 
> ---
>  docs/formatnode.html.in | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/docs/formatnode.html.in b/docs/formatnode.html.in
> index 1010a37a3d..8fbd3e6ece 100644
> --- a/docs/formatnode.html.in
> +++ b/docs/formatnode.html.in
> @@ -396,15 +396,15 @@
>  
>
>css
> -  Describes a Channel SubSystem (CSS) device commonly found on
> -  the S390 architecture. Sub-elements include:
> +  Describes a channel in the Channel SubSystem (CSS) commonly 
> found

s/channel/subchannel/, to be a bit nitpicky?

> +  on the S390 architecture. Sub-elements include:
>  
>cssid
>The channel subsystem identifier.
>ssid
>The subchannel-set identifier.
>devno
> -  The device number.
> +  The subchannel number.
>capability
>
>  This optional element can occur multiple times. If it



Re: [PATCH libvirt v4 00/12] Support AP card, AP queues and AP matrix

2020-12-04 Thread Cornelia Huck
On Fri, 4 Dec 2020 16:31:56 +0100
Boris Fiuczynski  wrote:

> On 12/4/20 11:24 AM, Erik Skultety wrote:
> > On Thu, Dec 03, 2020 at 06:59:32PM +0100, Shalini Chellathurai Saroja 
> > wrote:  
> >> Add support for AP card devices, AP queues and AP matrix devices in
> >> libvirt node device driver.
> >> ---
> >> v4:
> >>   - Added virNodeDevAPAdapterParseXML function to extract the adapter
> >> parsing logic.
> >>   - Modified according to review comments.
> >>   - New patch to mention support for AP devices in NEWS.rst.  
> > 
> > Reviewed-by: Erik Skultety 
> > 
> > I had 2 nitpicks which I can fix before merging, but I'd like to give other
> > people a couple more days to express their "final" opinions and if there 
> > are no
> > more comments, then sometime next week I'll merge this.
> > There's one more little thing...Boris linked the s390 AP facility kernel
> > documentation which really helped me during the review, so I think we should
> > link it somewhere too - usually we're not so keen on doing that because 3rd
> > party documentation URLs tend to die or migrate, but in this case it linked
> > directly to the github repo (I think even the generated HTML on kernel.org
> > would be just fine), but I don't know what the right place for this 
> > actually is
> > as it describes the whole facility which we modelled in 3 capabilities. We
> > could put it into the NEWS file, but then again, not sure how often anyone
> > developing libvirt reads the NEWS file.
> > 
> > Regards,
> > Erik
> >   
> 
> Erik,
> thanks for your review.
> How about we put for developers the links into the device originating 
> commit messages:



> Patch 1+3: 
> https://github.com/torvalds/linux/blob/master/Documentation/s390/vfio-ap.rst#ap-architectural-overview

For the link, I would use

https://www.kernel.org/doc/html/latest/s390/vfio-ap.html#ap-architectural-overview

> Patch 6: 
> https://github.com/torvalds/linux/blob/master/Documentation/s390/vfio-ap.rst#the-design

and

https://www.kernel.org/doc/html/latest/s390/vfio-ap.html#the-design



> 
> Another alternative could be to add the links into 
> docs/formatnode.html.in but that might be too much information for the 
> normal libvirt users not interested in these details.
> 



Re: [PATCH v1 05/26] qemu_domain_address: Reformat qemuDomainAssignS390Addresses()

2020-11-30 Thread Cornelia Huck
On Mon, 30 Nov 2020 11:18:20 +0100
Michal Privoznik  wrote:

> On 11/30/20 10:38 AM, Thomas Huth wrote:
> > On 27/11/2020 16.02, Michal Privoznik wrote:  
> >> Signed-off-by: Michal Privoznik 
> >> ---
> >>   src/qemu/qemu_domain_address.c | 10 --
> >>   1 file changed, 4 insertions(+), 6 deletions(-)

> >>   } else if (virQEMUCapsGet(qemuCaps, QEMU_CAPS_VIRTIO_S390)) {  
> > 
> > Not related to your patch, but an idea for a future clean-up: That
> > QEMU_CAPS_VIRTIO_S390 seems to belong to the ancient "s390-virtio" (without
> > ccw) machine that has been removed in QEMU v2.6 already:
> > 
> > https://git.qemu.org/?p=qemu.git;a=commitdiff;h=7b3fdbd9a82
> > https://git.qemu.org/?p=qemu.git;a=commitdiff;h=3538fb6f89d
> > 
> > IIRC, that machine was already considered as deprecated since a couple of
> > earlier QEMU releases, so I really doubt that anybody is still using that in
> > production today.
> > 
> > Thus I think that all code related to QEMU_CAPS_VIRTIO_S390 could likely be
> > removed from libvirt nowadays.  
> 
> That is even better idea. But currently libvirt supports QEMU-1.5.0 and 
> newer. So I think we shouldn't remove that until the minimum version is 
> bumped even though we think feature has no users.
> 
> https://gitlab.com/libvirt/libvirt/-/commit/c1bc9c662b4
> 
> Although, it might be about time to look again what is the oldest QEMU 
> we need to support.

Would be great if you could bump it enough to get rid of the old
virtio-s390 transport :)

FWIW, virtio-ccw was introduced in QEMU 1.4, and became the default
with QEMU 2.4, although it had supplanted virtio-s390 well before that.
What are the criteria for possibly removing support for a feature in
libvirt: that nobody would use it in practice, or that nobody would be
able to use it?



Re: [PATCH 3/3] domain_conf: Allow to look up scsi disks when controller uses a CCW address

2020-11-27 Thread Cornelia Huck
On Wed, 25 Nov 2020 12:06:48 +0100
Thomas Huth  wrote:

> On s390x, devices are attached to the channel IO subsytem by default,

s/attached to the channel IO subsystem/accessed via the channel subsystem/

> so we need to look up scsi controllers via their CCW address there
> instead of using PCI.
> 
> This fixes "virsh domfsinfo" on s390x for virtio-scsi devices (the first
> attempt from commit f8333b3b0a7 did it in the wrong way, reporting the
> device name on the guest side instead of the target name on the host side).
> 
> Fixes: f8333b3b0a ("qemu: Fix domfsinfo for non-PCI device information ...")
> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1858771
> Signed-off-by: Thomas Huth 
> ---
>  src/conf/domain_conf.c | 23 +++
>  src/conf/domain_conf.h |  2 ++
>  2 files changed, 25 insertions(+)
> 
> diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
> index 00c115d453..d617580fb3 100644
> --- a/src/conf/domain_conf.c
> +++ b/src/conf/domain_conf.c
> @@ -17610,6 +17610,12 @@ virDomainDiskIndexByAddress(virDomainDefPtr def,
>  if ((cidx = virDomainControllerFindByPCIAddress(def, pci_address)) >= 0)
>  controller = def->controllers[cidx];
>  
> +if (!controller && ccw_addr) {
> +cidx = virDomainControllerFindByCCWAddress(def, ccw_addr);
> +if (cidx >= 0)
> +controller = def->controllers[cidx];
> +}

Maybe
   if ((cidx = virDomainControllerFindByCCWAddress(def, ccw_addr)) >= 0)
   controller = def->controllers[cidx];

to match the pci case right above?

(I find your way more readable, though.)

> +
>  for (i = 0; i < def->ndisks; i++) {
>  vdisk = def->disks[i];
>  if (vdisk->info.type == VIR_DOMAIN_DEVICE_ADDRESS_TYPE_PCI &&

(...)

LGTM.



Re: [PATCH 2/3] domain_conf: Allow to look up virtio-block devices by their CCW address

2020-11-27 Thread Cornelia Huck
On Wed, 25 Nov 2020 12:06:47 +0100
Thomas Huth  wrote:

> On s390x, devices are attached to the channel IO subsytem by default,

s/attached to the channel IO subsystem/accessed via the channel subsystem/

> so we need to look up the devices via their CCW address there instead
> of using PCI.
> 
> This fixes "virsh domfsinfo" on s390x for virtio-block devices (the first
> attempt from commit f8333b3b0a7 did it in the wrong way, reporting the
> device name on the guest side instead of the target name on the host side).
> 
> Fixes: f8333b3b0a ("qemu: Fix domfsinfo for non-PCI device information ...")
> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1858771
> Signed-off-by: Thomas Huth 
> ---
>  src/conf/domain_conf.c | 10 +-
>  src/conf/domain_conf.h |  2 ++
>  src/qemu/qemu_driver.c |  8 +---
>  3 files changed, 16 insertions(+), 4 deletions(-)

Keeping my limited libvirt knowledge in mind:

Reviewed-by: Cornelia Huck 



Re: [PATCH 1/3] qemu: agent: Store CCW address in qemuAgentDiskInfo if provided by the guest

2020-11-27 Thread Cornelia Huck
On Wed, 25 Nov 2020 12:06:46 +0100
Thomas Huth  wrote:

> Newer versions of the QEMU guest agent will provide the CCW address
> of devices on s390x. Store this information in the qemuAgentDiskInfo
> so that we can use this later.
> 
> We also map the CSSID 0 from the guest to the value 0xfe on the host,
> see https://www.qemu.org/docs/master/system/s390x/css.html for details.
> 
> Signed-off-by: Thomas Huth 
> ---
>  src/qemu/qemu_agent.c | 11 +++
>  src/qemu/qemu_agent.h |  2 ++
>  2 files changed, 13 insertions(+)
> 

(...)

> @@ -1916,6 +1917,16 @@ qemuAgentGetFSInfoFillDisks(virJSONValuePtr jsondisks,
>  GET_DISK_ADDR(pci, >pci_controller.bus, "bus");
>  GET_DISK_ADDR(pci, >pci_controller.slot, "slot");
>  GET_DISK_ADDR(pci, >pci_controller.function, "function");
> +
> +if ((ccw = virJSONValueObjectGet(jsondisk, "ccw-address"))) {
> +disk->has_ccw_address = true;
> +GET_DISK_ADDR(ccw, >ccw_addr.cssid, "cssid");
> +if (disk->ccw_addr.cssid == 0)  /* Guest CSSID 0 is 0xfe on host 
> */
> +disk->ccw_addr.cssid = 0xfe;

This will be true for any guest that doesn't support MCSS-E. I don't
see any MCSS-E enablement coming up in the foreseeable future (in fact,
I get the impression that this feature is rather dead, and QEMU remains
the only implementation anyway), so this should be fine.

> +GET_DISK_ADDR(ccw, >ccw_addr.ssid, "ssid");
> +GET_DISK_ADDR(ccw, >ccw_addr.devno, "devno");
> +}
> +
>  #undef GET_DISK_ADDR
>  }
>  

(...)



Availability of physical devices and migration (was: Re: [RFC PATCH 2/2] vfio-ccw: Connect the device request notifier)

2020-11-20 Thread Cornelia Huck
On Fri, 20 Nov 2020 03:51:07 +0100
Halil Pasic  wrote:

> On Tue, 17 Nov 2020 04:26:05 +0100
> Eric Farman  wrote:
> 
> > Now that the vfio-ccw code has a notifier interface to request that
> > a device be unplugged, let's wire that together.  
> 
> I'm aware of the fact that performing an unplug is what vfio-pci does,
> but I was not aware of this before, and I became curious with regards
> to how is this going to mesh with migration (in the future).
> 
> The sentence 'For this to work, QEMU has to be launched with the same
> arguments the two times.' form docs/devel/migration.rst should not
> be taken literally, I know, but the VM configuration changing not because
> it was changed via a management interface, but because of events that
> may not be under the control of whoever is managing the VM does
> make thinks harder to reason about.
> 
> I suppose, we now basically mandate that whoever manages the VM, either
> a) maintains his own model of the VM and listens to the events, to
> update it if such a device removal happens, or
> b) inspects the VM at some appropriate point in time, to figure out how
> the target VM is supposed to be started.
> 
> I think libvirt does a).

This seems like something that would be of general interest to libvirt
folks, adding the list on cc:.

For virtual devices, QEMU and any management software are in full
control, and can simply make sure that both source and target have the
device available.

For physical devices, we still can make sure that source and target
match when we do the setup, but device failures can happen at
inconvenient times. It may suddenly be no longer possible to access
state etc. Can we propagate changes like "device foobar, don't bother
migrating" even when we already started migration? Should the handling
be different if the target system uses a different (compatible) device?
Should we fail the migration?

> 
> I also suppose, such a device removal may not happen during device
> migration. That is, form the QEMU perspective I  believe taken care
> by the BQL.

Even if the device is not actually removed, it might still be
inaccessible.

> 
> But I'm in the dark regarding the management software/libivrt view. For
> example what happens if we get a req_notification on the target while
> pre-copy memory migration? At that point the destination is already
> receiving pages, thus it is already constructed.
> 
> My questions are not necessarily addressed to you Eric. Maybe the
> folks working on vfio migration can help us out. I'm also adding
> our libvirt guys.
> 
> Regards,
> Halil
> 



Re: [PATCH] news: Mention nodedev support for CSS on S390

2020-10-27 Thread Cornelia Huck
On Tue, 27 Oct 2020 13:52:04 +0100
Boris Fiuczynski  wrote:

> Signed-off-by: Boris Fiuczynski 
> ---
>  NEWS.rst | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/NEWS.rst b/NEWS.rst
> index 5bd8ed1c91..2fef3f706c 100644
> --- a/NEWS.rst
> +++ b/NEWS.rst
> @@ -13,6 +13,12 @@ v6.9.0 (unreleased)
>  
>  * **New features**
>  
> +  * nodedev: Add support for channel subsystem (CSS) devices on S390
> +
> +A CSS device is represented as a parent device of a CCW device.
> +This support allows to create vfio-ccw mediated devices with
> +``virNodeDeviceCreateXML()``.
> +
>* qemu: Implement memory failure event
>  
>  New event is implemented that is emitted whenever a guest encounters a

Reviewed-by: Cornelia Huck 



[PATCH] virDomainNetFindIdx: add support for CCW addresses

2020-09-24 Thread Cornelia Huck
Allow to match with CCW addresses in addition to PCI addresses
(and MAC addresses).

Signed-off-by: Cornelia Huck 
---
 src/conf/device_conf.c   | 12 
 src/conf/device_conf.h   |  3 +++
 src/conf/domain_conf.c   | 23 ++-
 src/libvirt_private.syms |  1 +
 4 files changed, 38 insertions(+), 1 deletion(-)

diff --git a/src/conf/device_conf.c b/src/conf/device_conf.c
index 9398191dfd84..87bf32bbc685 100644
--- a/src/conf/device_conf.c
+++ b/src/conf/device_conf.c
@@ -341,6 +341,18 @@ virDomainDeviceCCWAddressParseXML(xmlNodePtr node,
 return 0;
 }
 
+bool
+virDomainDeviceCCWAddressEqual(virDomainDeviceCCWAddressPtr addr1,
+   virDomainDeviceCCWAddressPtr addr2)
+{
+if (addr1->cssid == addr2->cssid &&
+addr1->ssid == addr2->ssid &&
+addr1->devno == addr2->devno) {
+return true;
+}
+return false;
+}
+
 int
 virDomainDeviceDriveAddressParseXML(xmlNodePtr node,
 virDomainDeviceDriveAddressPtr addr)
diff --git a/src/conf/device_conf.h b/src/conf/device_conf.h
index d7395f220174..a51bdf10ee6e 100644
--- a/src/conf/device_conf.h
+++ b/src/conf/device_conf.h
@@ -207,6 +207,9 @@ void virPCIDeviceAddressFormat(virBufferPtr buf,
 bool virDomainDeviceCCWAddressIsValid(virDomainDeviceCCWAddressPtr addr);
 int virDomainDeviceCCWAddressParseXML(xmlNodePtr node,
   virDomainDeviceCCWAddressPtr addr);
+bool virDomainDeviceCCWAddressEqual(virDomainDeviceCCWAddressPtr addr1,
+virDomainDeviceCCWAddressPtr addr2);
+#define VIR_CCW_DEVICE_ADDRESS_FMT "%x.%x.%04x"
 
 int virDomainDeviceDriveAddressParseXML(xmlNodePtr node,
 virDomainDeviceDriveAddressPtr addr);
diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
index 8d30557bdcbe..a91dbd4aa95b 100644
--- a/src/conf/domain_conf.c
+++ b/src/conf/domain_conf.c
@@ -17816,6 +17816,8 @@ virDomainNetFindIdx(virDomainDefPtr def, 
virDomainNetDefPtr net)
 bool MACAddrSpecified = !net->mac_generated;
 bool PCIAddrSpecified = virDomainDeviceAddressIsValid(>info,
   
VIR_DOMAIN_DEVICE_ADDRESS_TYPE_PCI);
+bool CCWAddrSpecified = virDomainDeviceAddressIsValid(>info,
+  
VIR_DOMAIN_DEVICE_ADDRESS_TYPE_CCW);
 
 for (i = 0; i < def->nnets; i++) {
 if (MACAddrSpecified &&
@@ -17827,9 +17829,14 @@ virDomainNetFindIdx(virDomainDefPtr def, 
virDomainNetDefPtr net)
   >info.addr.pci))
 continue;
 
+if (CCWAddrSpecified &&
+!virDomainDeviceCCWAddressEqual(>nets[i]->info.addr.ccw,
+>info.addr.ccw))
+continue;
+
 if (matchidx >= 0) {
 /* there were multiple matches on mac address, and no
- * qualifying guest-side PCI address was given, so we must
+ * qualifying guest-side PCI/CCW address was given, so we must
  * fail (NB: a USB address isn't adequate, since it may
  * specify only vendor and product ID, and there may be
  * multiples of those.
@@ -17859,6 +17866,14 @@ virDomainNetFindIdx(virDomainDefPtr def, 
virDomainNetDefPtr net)
net->info.addr.pci.bus,
net->info.addr.pci.slot,
net->info.addr.pci.function);
+} else if (MACAddrSpecified && CCWAddrSpecified) {
+virReportError(VIR_ERR_DEVICE_MISSING,
+   _("no device matching MAC address %s found on "
+ VIR_CCW_DEVICE_ADDRESS_FMT),
+   virMacAddrFormat(>mac, mac),
+   net->info.addr.ccw.cssid,
+   net->info.addr.ccw.ssid,
+   net->info.addr.ccw.devno);
 } else if (PCIAddrSpecified) {
 virReportError(VIR_ERR_DEVICE_MISSING,
_("no device found on " VIR_PCI_DEVICE_ADDRESS_FMT),
@@ -17866,6 +17881,12 @@ virDomainNetFindIdx(virDomainDefPtr def, 
virDomainNetDefPtr net)
net->info.addr.pci.bus,
net->info.addr.pci.slot,
net->info.addr.pci.function);
+} else if (CCWAddrSpecified) {
+virReportError(VIR_ERR_DEVICE_MISSING,
+   _("no device found on " VIR_CCW_DEVICE_ADDRESS_FMT),
+   net->info.addr.ccw.cssid,
+   net->info.addr.ccw.ssid,
+   net->info.addr.ccw.devno);
 } else if (MACAddrSpecified) {
  

Re: [PATCH v2] cphp: remove deprecated cpu-add command(s)

2020-09-14 Thread Cornelia Huck
On Mon, 14 Sep 2020 03:46:14 -0400
Igor Mammedov  wrote:

> theses were deprecated since 4.0, remove both HMP and QMP variants.

s/theses/These/

> 
> Users should use device_add command instead. To get list of
> possible CPUs and options, use 'info hotpluggable-cpus' HMP
> or query-hotpluggable-cpus QMP command.
> 
> Signed-off-by: Igor Mammedov 
> Reviewed-by: Thomas Huth 
> Acked-by: Dr. David Alan Gilbert 
> ---
>  include/hw/boards.h |   1 -
>  include/hw/i386/pc.h|   1 -
>  include/monitor/hmp.h   |   1 -
>  docs/system/deprecated.rst  |  25 +
>  hmp-commands.hx |  15 --
>  hw/core/machine-hmp-cmds.c  |  12 -
>  hw/core/machine-qmp-cmds.c  |  12 -
>  hw/i386/pc.c|  27 --
>  hw/i386/pc_piix.c   |   1 -
>  hw/s390x/s390-virtio-ccw.c  |  12 -
>  qapi/machine.json   |  24 -
>  tests/qtest/cpu-plug-test.c | 100 
>  tests/qtest/test-hmp.c  |   1 -
>  13 files changed, 21 insertions(+), 211 deletions(-)

Acked-by: Cornelia Huck 



Re: device compatibility interface for live migration with assigned devices

2020-09-11 Thread Cornelia Huck
On Fri, 11 Sep 2020 08:56:00 +0800
Yan Zhao  wrote:

> On Thu, Sep 10, 2020 at 12:02:44PM -0600, Alex Williamson wrote:
> > On Thu, 10 Sep 2020 13:50:11 +0100
> > Sean Mooney  wrote:
> >   
> > > On Thu, 2020-09-10 at 14:38 +0200, Cornelia Huck wrote:  
> > > > On Wed, 9 Sep 2020 10:13:09 +0800
> > > > Yan Zhao  wrote:
> > > > 
> > > > > > > still, I'd like to put it more explicitly to make ensure it's not 
> > > > > > > missed:
> > > > > > > the reason we want to specify compatible_type as a trait and check
> > > > > > > whether target compatible_type is the superset of source
> > > > > > > compatible_type is for the consideration of backward 
> > > > > > > compatibility.
> > > > > > > e.g.
> > > > > > > an old generation device may have a mdev type xxx-v4-yyy, while a 
> > > > > > > newer
> > > > > > > generation  device may be of mdev type xxx-v5-yyy.
> > > > > > > with the compatible_type traits, the old generation device is 
> > > > > > > still
> > > > > > > able to be regarded as compatible to newer generation device even 
> > > > > > > their
> > > > > > > mdev types are not equal.  
> > > > > > 
> > > > > > If you want to support migration from v4 to v5, can't the 
> > > > > > (presumably
> > > > > > newer) driver that supports v5 simply register the v4 type as well, 
> > > > > > so
> > > > > > that the mdev can be created as v4? (Just like QEMU versioned 
> > > > > > machine
> > > > > > types work.)  
> > > > > 
> > > > > yes, it should work in some conditions.
> > > > > but it may not be that good in some cases when v5 and v4 in the name 
> > > > > string
> > > > > of mdev type identify hardware generation (e.g. v4 for gen8, and v5 
> > > > > for
> > > > > gen9)
> > > > > 
> > > > > e.g.
> > > > > (1). when src mdev type is v4 and target mdev type is v5 as
> > > > > software does not support it initially, and v4 and v5 identify 
> > > > > hardware
> > > > > differences.
> > > > 
> > > > My first hunch here is: Don't introduce types that may be compatible
> > > > later. Either make them compatible, or make them distinct by design,
> > > > and possibly add a different, compatible type later.
> > > > 
> > > > > then after software upgrade, v5 is now compatible to v4, should the
> > > > > software now downgrade mdev type from v5 to v4?
> > > > > not sure if moving hardware generation info into a separate attribute
> > > > > from mdev type name is better. e.g. remove v4, v5 in mdev type, while 
> > > > > use
> > > > > compatible_pci_ids to identify compatibility.
> > > > 
> > > > If the generations are compatible, don't mention it in the mdev type.
> > > > If they aren't, use distinct types, so that management software doesn't
> > > > have to guess. At least that would be my naive approach here.  

[*]
  
> > > yep that is what i would prefer to see too.  
> > > > 
> > > > > 
> > > > > (2) name string of mdev type is composed by "driver_name + type_name".
> > > > > in some devices, e.g. qat, different generations of devices are 
> > > > > binding to
> > > > > drivers of different names, e.g. "qat-v4", "qat-v5".
> > > > > then though type_name is equal, mdev type is not equal. e.g.
> > > > > "qat-v4-type1", "qat-v5-type1".
> > > > 
> > > > I guess that shows a shortcoming of that "driver_name + type_name"
> > > > approach? Or maybe I'm just confused.
> > > yes i really dont like haveing the version in the mdev-type name 
> > > i would stongly perfger just qat-type-1 wehere qat is just there as a way 
> > > of namespacing.
> > > although symmetric-cryto, asymmetric-cryto and compression woudl be a 
> > > better name then type-1, type-2, type-3 if
> > > that is what they would end up mapping too. e.g. qat-compression or 
> > > qat-aes is a much better name then type-1
> &g

Re: device compatibility interface for live migration with assigned devices

2020-09-10 Thread Cornelia Huck
On Wed, 9 Sep 2020 10:13:09 +0800
Yan Zhao  wrote:

> > > still, I'd like to put it more explicitly to make ensure it's not missed:
> > > the reason we want to specify compatible_type as a trait and check
> > > whether target compatible_type is the superset of source
> > > compatible_type is for the consideration of backward compatibility.
> > > e.g.
> > > an old generation device may have a mdev type xxx-v4-yyy, while a newer
> > > generation  device may be of mdev type xxx-v5-yyy.
> > > with the compatible_type traits, the old generation device is still
> > > able to be regarded as compatible to newer generation device even their
> > > mdev types are not equal.  
> > 
> > If you want to support migration from v4 to v5, can't the (presumably
> > newer) driver that supports v5 simply register the v4 type as well, so
> > that the mdev can be created as v4? (Just like QEMU versioned machine
> > types work.)  
> yes, it should work in some conditions.
> but it may not be that good in some cases when v5 and v4 in the name string
> of mdev type identify hardware generation (e.g. v4 for gen8, and v5 for
> gen9)
> 
> e.g.
> (1). when src mdev type is v4 and target mdev type is v5 as
> software does not support it initially, and v4 and v5 identify hardware
> differences.

My first hunch here is: Don't introduce types that may be compatible
later. Either make them compatible, or make them distinct by design,
and possibly add a different, compatible type later.

> then after software upgrade, v5 is now compatible to v4, should the
> software now downgrade mdev type from v5 to v4?
> not sure if moving hardware generation info into a separate attribute
> from mdev type name is better. e.g. remove v4, v5 in mdev type, while use
> compatible_pci_ids to identify compatibility.

If the generations are compatible, don't mention it in the mdev type.
If they aren't, use distinct types, so that management software doesn't
have to guess. At least that would be my naive approach here.

> 
> (2) name string of mdev type is composed by "driver_name + type_name".
> in some devices, e.g. qat, different generations of devices are binding to
> drivers of different names, e.g. "qat-v4", "qat-v5".
> then though type_name is equal, mdev type is not equal. e.g.
> "qat-v4-type1", "qat-v5-type1".

I guess that shows a shortcoming of that "driver_name + type_name"
approach? Or maybe I'm just confused.



Re: [PATCH 5/5] node_device: mdev vfio-ccw support

2020-09-09 Thread Cornelia Huck
On Wed, 9 Sep 2020 14:40:33 +0200
Erik Skultety  wrote:

> On Wed, Sep 09, 2020 at 08:23:39AM +0200, Bjoern Walk wrote:
> > Erik Skultety  [2020-09-08, 06:01PM +0200]:  
> > > On Mon, Aug 24, 2020 at 01:59:15PM +0200, Bjoern Walk wrote:  
> > > > From: Boris Fiuczynski 
> > > > +case VIR_NODE_DEV_CAP_SYSTEM:
> > > > +case VIR_NODE_DEV_CAP_USB_DEV:
> > > > +case VIR_NODE_DEV_CAP_USB_INTERFACE:
> > > > +case VIR_NODE_DEV_CAP_NET:
> > > > +case VIR_NODE_DEV_CAP_SCSI_HOST:
> > > > +case VIR_NODE_DEV_CAP_SCSI_TARGET:
> > > > +case VIR_NODE_DEV_CAP_SCSI:
> > > > +case VIR_NODE_DEV_CAP_STORAGE:
> > > > +case VIR_NODE_DEV_CAP_FC_HOST:
> > > > +case VIR_NODE_DEV_CAP_VPORTS:
> > > > +case VIR_NODE_DEV_CAP_SCSI_GENERIC:
> > > > +case VIR_NODE_DEV_CAP_DRM:
> > > > +case VIR_NODE_DEV_CAP_MDEV_TYPES:
> > > > +case VIR_NODE_DEV_CAP_MDEV:
> > > > +case VIR_NODE_DEV_CAP_CCW_DEV:  
> > >
> > > What about ccw mdevs? I would think that it's CCW end point device that
> > > we would want to assign concurrently, but we're only "slicing" the root 
> > > channel
> > > subsystem device. If you have and CSS mdev, are the CCW subchannels 
> > > separate
> > > for each VM or does it behave kind of like NPIV with its vHBAs? OR I'm
> > > completely off here?  
> >
> > You are correct. This is an unfortunate architectural decision on our
> > platform. For vfio-ccw, we instatiate the mdevs on the CSS layer, which
> > has a 1-to-1 relation to the underlying CCW device. That's why we need
> > the information about the CSS devices in libvirt, to give the user a
> > chance to easily find this relation when he wants to do passthrough with
> > a DASD for example. It's a bit unintuitive and even silly for the
> > end user, but it's architecturally correct and hence it was implemented
> > like this in the vfio-ccw kernel driver. We don't even get the benefits
> > of something like NPIV because for every subchannel there is only one
> > CCW device (at least to my knowledge).  
> 
> Having read both Connie's blog on channels and the I/O chapter in the free
> architectural publication you linked in the other reply, I now have a very
> vague understating of the architecture. One thing that still
> puzzles me though is that if there's a 1:1 subchannel mapping between CSS and 
> an
> CCW device which then communicates via a control unit (which there may be more
> than 1) with CSS over up to 8 channel paths (this at least to me resembles
> NPIV), what does the mdev really solve, can't you assign one of the channel
> paths of one of the several control units the device is connected to to
> plug into the VM? Well, clearly not, I'm just being curious and trying to
> understand the use case here.

I think it's easier to leave control units mostly out of the picture,
as they are not really relevant for the modelling here. And yes, there
is always exactly one subchannel for an I/O device.

Subchannels (mostly a concept describing how to interact with a device)
and channel paths (mostly corresponding to the way data flows to/from
the device) are probably best understood as different ways to describe
how devices are accessed. You don't really want to pass channel paths
(as they are shared between many devices); the subchannel is a more
natural unit to pass. The operating system also has way more control
over subchannels than over individual channel paths.

Regarding mdev, the idea here is less to be able to slice up a device
as you do with GPUs, but more to be able to add a vendor driver that
does the actual interesting work (channel program translations etc.)

As to why we use the subchannel instead of the device, that's mostly
following from how the low-level channel I/O layer in the Linux kernel
is organized: The default I/O subchannel driver does a lot of things
that a normal device driver should not need to deal with (path
grouping, verification, status accumulation, etc.), but which would get
into the way if you wanted to assign the device via vfio. Therefore,
vfio-ccw implements an alternate I/O subchannel driver.

HTH (at least a little bit).



Re: [PATCH 2/5] node_device: detect CSS devices

2020-09-09 Thread Cornelia Huck
On Wed, 9 Sep 2020 14:46:00 +0200
Erik Skultety  wrote:

> > > > Are ^these attributes documented a little more somewhere? I didn't find 
> > > > those
> > > > in [1]. I suppose it is available in the z/Architecture: Principles of  
> >
> > [BTW: what are [1] and [2] referring to?]  
> 
> Urgh...not again...
> [1] 
> https://www.ibm.com/support/knowledgecenter/zosbasics/com.ibm.zos.znetwork/znetwork_59.htm
> [2] https://www.kernel.org/doc/html/latest/driver-api/s390-drivers.html

TBH, I'm not sure how useful [2] is; it's mostly a vehicle for pulling
in kerneldoc documentation of structs and functions. There's also
https://www.kernel.org/doc/html/latest/s390/driver-model.html, not sure
if that is helpful.

> 
> Sorry...
> 
> >  
> > > > Operation publication for which I'd have to get an IBM account.
> > > >  
> > >
> > > Here's a freely available version of the PoP:
> > >
> > > https://www.ibm.com/support/pages/sites/default/files/inline-files/690450_SA22-7832-03.pdf
> > >
> > > I/O is described in chapter 13.  
> >
> > Let me also point to a series of articles I did on my blog:
> > https://virtualpenguins.blogspot.com/2017/02/channel-io-demystified.html  
> 
> Thanks, I've looked at both, it's still fairly cloudy though.

Let me know if I can do anything to make it clearer. This channel I/O
stuff probably can be a bit daunting...



Re: [PATCH 2/5] node_device: detect CSS devices

2020-09-09 Thread Cornelia Huck
On Wed, 9 Sep 2020 08:18:08 +0200
Bjoern Walk  wrote:

> Erik Skultety  [2020-09-08, 05:46PM +0200]:
> > On Mon, Aug 24, 2020 at 01:59:12PM +0200, Bjoern Walk wrote:  
> > > From: Boris Fiuczynski 
> > >
> > > Make channel subsystem (CSS) devices available in the node_device 
> > > driver.o  
> > 
> > Can there be more CSS devices per CPC?  
> 
> Yes, several typically. One for each I/O device attached to the LPAR.
> 
> >   
> > > The CCS devices reside in the computer system and provide CCW devices, 
> > > e.g.:
> > >
> > >   +- css_0_0_003a
> > >   |
> > >   +- ccw_0_0_1a2b
> > >   |
> > >   +- scsi_host0
> > >   |
> > >   +- scsi_target0_0_0
> > >   |
> > >   +- scsi_0_0_0_0
> > >
> > > Reviewed-by: Bjoern Walk 
> > > Signed-off-by: Boris Fiuczynski 
> > > ---
> > >  docs/schemas/nodedev.rng  | 16 ++
> > >  src/conf/node_device_conf.c   |  5 +
> > >  src/conf/node_device_conf.h   |  1 +
> > >  src/conf/virnodedeviceobj.c   |  1 +
> > >  src/node_device/node_device_udev.c| 22 +++
> > >  .../ccw_0_0_1-invalid.xml |  4 ++--
> > >  tests/nodedevschemadata/ccw_0_0_.xml  |  4 ++--
> > >  tests/nodedevschemadata/css_0_0_.xml  | 10 +
> > >  tests/nodedevxml2xmltest.c|  1 +
> > >  tools/virsh-nodedev.c |  1 +
> > >  10 files changed, 61 insertions(+), 4 deletions(-)
> > >  create mode 100644 tests/nodedevschemadata/css_0_0_.xml
> > >
> > > diff --git a/docs/schemas/nodedev.rng b/docs/schemas/nodedev.rng
> > > index 4b2b350f..f7f517b5 100644
> > > --- a/docs/schemas/nodedev.rng
> > > +++ b/docs/schemas/nodedev.rng
> > > @@ -85,6 +85,7 @@
> > >  
> > >  
> > >  
> > > +
> > >
> > >  
> > >
> > > @@ -659,6 +660,21 @@
> > >  
> > >
> > >
> > > +  
> > > +
> > > +  css
> > > +
> > > +  
> > 
> > Is ^this one just a different name for CHPID or those are completely 
> > unrelated?  
> 
> As far as I understood, there is a 1-to-1 relation between CHPIDs and
> channel paths, but they are not the same. For example, there are only
> 256 CHPIDs per system available, compared to 2^16 channel paths.

'CHPID' is short for 'channel path identifier'. You can have up to 256
channel paths per channel subsystem image (a given LPAR only sees one
channel subsystem image[a]). Any subchannel can use up to 8 channel
paths, and a given channel path is usually used by many subchannels.

The 'cssid' is the number of the channel subsystem image for the
subchannel.[b]

> 
> >   
> > > +  
> > > +
> > > +
> > > +  
> > > +
> > > +
> > > +  
> > > +
> > > +  
> > > +  
> > 
> > Are ^these attributes documented a little more somewhere? I didn't find 
> > those
> > in [1]. I suppose it is available in the z/Architecture: Principles of

[BTW: what are [1] and [2] referring to?]

> > Operation publication for which I'd have to get an IBM account.
> >   
> 
> Here's a freely available version of the PoP:
> 
> https://www.ibm.com/support/pages/sites/default/files/inline-files/690450_SA22-7832-03.pdf
> 
> I/O is described in chapter 13.

Let me also point to a series of articles I did on my blog:
https://virtualpenguins.blogspot.com/2017/02/channel-io-demystified.html

> 
> > [snip]
> >   
> > >
> > >
> > > +static int
> > > +udevProcessCSS(struct udev_device *device,
> > > +   virNodeDeviceDefPtr def)
> > > +{
> > > +/* do not process EADM and CHSC devices to keep the list sane */
> > > +if (STREQ(def->driver, "eadm_subchannel") ||
> > > +STREQ(def->driver, "chsc_subchannel"))  
> > 
> > [2] Also mentions message subchannel, although apparently there are no Linux
> > kernel drivers for that yet, IIUC that one would have to be added here as 
> > well
> > as some point, is that correct?  
> 
> We have never seen those, but we would want to add them here if they
> have no relevance for the user.

I think you want to filter message subchannels as well, my assumption
is that libvirt will only care about I/O subchannels.

> 
> >   
> > > +return -1;
> > > +
> > > +if (udevGetCCWAddress(def->sysfs_path, >caps->data) < 0)
> > > +return -1;
> > > +
> > > +if (udevGenerateDeviceName(device, def, NULL) != 0)
> > > +return -1;
> > > +
> > > +return 0;
> > > +}
> > > +
> > >  static int
> > >  udevGetDeviceNodes(struct udev_device *device,
> > > virNodeDeviceDefPtr def)
> > > @@ -1175,6 +1193,8 @@ udevGetDeviceType(struct udev_device *device,
> > >  *type = VIR_NODE_DEV_CAP_MDEV;
> > >  else if (STREQ_NULLABLE(subsystem, "ccw"))
> > >  *type = VIR_NODE_DEV_CAP_CCW_DEV;
> > > +else if (STREQ_NULLABLE(subsystem, "css"))
> > > +*type = 

Re: device compatibility interface for live migration with assigned devices

2020-09-08 Thread Cornelia Huck
On Mon, 31 Aug 2020 12:43:44 +0800
Yan Zhao  wrote:

> On Fri, Aug 28, 2020 at 03:04:12PM +0100, Sean Mooney wrote:
> > On Fri, 2020-08-28 at 15:47 +0200, Cornelia Huck wrote:  
> > > On Wed, 26 Aug 2020 14:41:17 +0800
> > > Yan Zhao  wrote:
> > >   
> > > > previously, we want to regard the two mdevs created with dsa-1dwq x 30 
> > > > and
> > > > dsa-2dwq x 15 as compatible, because the two mdevs consist equal 
> > > > resources.
> > > > 
> > > > But, as it's a burden to upper layer, we agree that if this condition
> > > > happens, we still treat the two as incompatible.
> > > > 
> > > > To fix it, either the driver should expose dsa-1dwq only, or the target
> > > > dsa-2dwq needs to be destroyed and reallocated via dsa-1dwq x 30.  
> > > 
> > > AFAIU, these are mdev types, aren't they? So, basically, any management
> > > software needs to take care to use the matching mdev type on the target
> > > system for device creation?  
> > 
> > or just do the simple thing of use the same mdev type on the source and 
> > dest.
> > matching mdevtypes is not nessiarly trivial. we could do that but we woudl 
> > have
> > to do that in python rather then sql so it would be slower to do at least 
> > today.
> > 
> > we dont currently have the ablity to say the resouce provider must have 1 
> > of these
> > set of traits. just that we must have a specific trait. this is a feature 
> > we have
> > disucssed a couple of times and delayed untill we really really need it but 
> > its not out
> > of the question that we could add it for this usecase. i suspect however we 
> > would do exact
> > match first and explore this later after the inital mdev migration works.  
> 
> Yes, I think it's good.
> 
> still, I'd like to put it more explicitly to make ensure it's not missed:
> the reason we want to specify compatible_type as a trait and check
> whether target compatible_type is the superset of source
> compatible_type is for the consideration of backward compatibility.
> e.g.
> an old generation device may have a mdev type xxx-v4-yyy, while a newer
> generation  device may be of mdev type xxx-v5-yyy.
> with the compatible_type traits, the old generation device is still
> able to be regarded as compatible to newer generation device even their
> mdev types are not equal.

If you want to support migration from v4 to v5, can't the (presumably
newer) driver that supports v5 simply register the v4 type as well, so
that the mdev can be created as v4? (Just like QEMU versioned machine
types work.)



Re: device compatibility interface for live migration with assigned devices

2020-08-28 Thread Cornelia Huck
On Wed, 26 Aug 2020 14:41:17 +0800
Yan Zhao  wrote:

> previously, we want to regard the two mdevs created with dsa-1dwq x 30 and
> dsa-2dwq x 15 as compatible, because the two mdevs consist equal resources.
> 
> But, as it's a burden to upper layer, we agree that if this condition
> happens, we still treat the two as incompatible.
> 
> To fix it, either the driver should expose dsa-1dwq only, or the target
> dsa-2dwq needs to be destroyed and reallocated via dsa-1dwq x 30.

AFAIU, these are mdev types, aren't they? So, basically, any management
software needs to take care to use the matching mdev type on the target
system for device creation?



Re: device compatibility interface for live migration with assigned devices

2020-08-25 Thread Cornelia Huck
On Thu, 20 Aug 2020 11:16:21 +0800
Yan Zhao  wrote:

> On Wed, Aug 19, 2020 at 09:22:34PM -0600, Alex Williamson wrote:
> > On Thu, 20 Aug 2020 08:39:22 +0800
> > Yan Zhao  wrote:
> >   
> > > On Tue, Aug 18, 2020 at 11:36:52AM +0200, Cornelia Huck wrote:  
> > > > On Tue, 18 Aug 2020 10:16:28 +0100
> > > > Daniel P. Berrangé  wrote:
> > > > 
> > > > > On Tue, Aug 18, 2020 at 05:01:51PM +0800, Jason Wang wrote:
> > > > > >On 2020/8/18 下午4:55, Daniel P. Berrangé wrote:
> > > > > > 
> > > > > >  On Tue, Aug 18, 2020 at 11:24:30AM +0800, Jason Wang wrote:
> > > > > > 
> > > > > >  On 2020/8/14 下午1:16, Yan Zhao wrote:
> > > > > > 
> > > > > >  On Thu, Aug 13, 2020 at 12:24:50PM +0800, Jason Wang wrote:
> > > > > > 
> > > > > >  On 2020/8/10 下午3:46, Yan Zhao wrote:  
> > > > > 
> > > > > >  we actually can also retrieve the same information through sysfs, 
> > > > > > .e.g
> > > > > > 
> > > > > >  |- [path to device]
> > > > > > |--- migration
> > > > > > | |--- self
> > > > > > | |   |---device_api
> > > > > > ||   |---mdev_type
> > > > > > ||   |---software_version
> > > > > > ||   |---device_id
> > > > > > ||   |---aggregator
> > > > > > | |--- compatible
> > > > > > | |   |---device_api
> > > > > > ||   |---mdev_type
> > > > > > ||   |---software_version
> > > > > > ||   |---device_id
> > > > > > ||   |---aggregator
> > > > > > 
> > > > > > 
> > > > > >  Yes but:
> > > > > > 
> > > > > >  - You need one file per attribute (one syscall for one attribute)
> > > > > >  - Attribute is coupled with kobject
> > > > 
> > > > Is that really that bad? You have the device with an embedded kobject
> > > > anyway, and you can just put things into an attribute group?
> > > > 
> > > > [Also, I think that self/compatible split in the example makes things
> > > > needlessly complex. Shouldn't semantic versioning and matching already
> > > > cover nearly everything? I would expect very few cases that are more
> > > > complex than that. Maybe the aggregation stuff, but I don't think we
> > > > need that self/compatible split for that, either.]
> > > Hi Cornelia,
> > > 
> > > The reason I want to declare compatible list of attributes is that
> > > sometimes it's not a simple 1:1 matching of source attributes and target 
> > > attributes
> > > as I demonstrated below,
> > > source mdev of (mdev_type i915-GVTg_V5_2 + aggregator 1) is compatible to
> > > target mdev of (mdev_type i915-GVTg_V5_4 + aggregator 2),
> > >(mdev_type i915-GVTg_V5_8 + aggregator 4)
> > > 
> > > and aggragator may be just one of such examples that 1:1 matching does not
> > > fit.  
> > 
> > If you're suggesting that we need a new 'compatible' set for every
> > aggregation, haven't we lost the purpose of aggregation?  For example,
> > rather than having N mdev types to represent all the possible
> > aggregation values, we have a single mdev type with N compatible
> > migration entries, one for each possible aggregation value.  BTW, how do
> > we have multiple compatible directories?  compatible0001,
> > compatible0002? Thanks,
> >   
> do you think the bin_attribute I proposed yesterday good?
> Then we can have a single compatible with a variable in the mdev_type and
> aggregator.
> 
>mdev_type=i915-GVTg_V5_{val1:int:2,4,8}
>aggregator={val1}/2

I'm not really a fan of binary attributes other than in cases where we
have some kind of binary format to begin with.

IIUC, we basically have:
- different partitioning (expressed in the mdev_type)
- different number of partitions (expressed via the aggregator)
- devices being compatible if the partitioning:aggregator ratio is the
  same

(The multiple mdev_type variants seem to come from avoiding extra
creation parameters, IIRC?)

Would it be enough to export
base_type=i915-GVTg_V5
aggregation_ratio=

to express the various combinations that are compatible without the
need for multiple sets of attributes?



Re: [ovirt-devel] Re: device compatibility interface for live migration with assigned devices

2020-08-21 Thread Cornelia Huck
On Fri, 21 Aug 2020 11:14:41 +0800
Jason Wang  wrote:

> On 2020/8/20 下午8:27, Cornelia Huck wrote:
> > On Wed, 19 Aug 2020 17:28:38 +0800
> > Jason Wang  wrote:
> >  
> >> On 2020/8/19 下午4:13, Yan Zhao wrote:  
> >>> On Wed, Aug 19, 2020 at 03:39:50PM +0800, Jason Wang wrote:  
> >>>> On 2020/8/19 下午2:59, Yan Zhao wrote:  
> >>>>> On Wed, Aug 19, 2020 at 02:57:34PM +0800, Jason Wang wrote:  
> >>>>>> On 2020/8/19 上午11:30, Yan Zhao wrote:  
> >>>>>>> hi All,
> >>>>>>> could we decide that sysfs is the interface that every VFIO vendor 
> >>>>>>> driver
> >>>>>>> needs to provide in order to support vfio live migration, otherwise 
> >>>>>>> the
> >>>>>>> userspace management tool would not list the device into the 
> >>>>>>> compatible
> >>>>>>> list?
> >>>>>>>
> >>>>>>> if that's true, let's move to the standardizing of the sysfs 
> >>>>>>> interface.
> >>>>>>> (1) content
> >>>>>>> common part: (must)
> >>>>>>>- software_version: (in major.minor.bugfix scheme)  
> >>>>>> This can not work for devices whose features can be 
> >>>>>> negotiated/advertised
> >>>>>> independently. (E.g virtio devices)  
> > I thought the 'software_version' was supposed to describe kind of a
> > 'protocol version' for the data we transmit? I.e., you add a new field,
> > you bump the version number.  
> 
> 
> Ok, but since we mandate backward compatibility of uABI, is this really 
> worth to have a version for sysfs? (Searching on sysfs shows no examples 
> like this)

I was not thinking about the sysfs interface, but rather about the data
that is sent over while migrating. E.g. we find out that sending some
auxiliary data is a good idea and bump to version 1.1.0; version 1.0.0
cannot deal with the extra data, but version 1.1.0 can deal with the
older data stream.

(...)

> >>>>>>>- device_api: vfio-pci or vfio-ccw ...
> >>>>>>>- type: mdev type for mdev device or
> >>>>>>>a signature for physical device which is a counterpart 
> >>>>>>> for
> >>>>>>>  mdev type.
> >>>>>>>
> >>>>>>> device api specific part: (must)
> >>>>>>>   - pci id: pci id of mdev parent device or pci id of physical pci
> >>>>>>> device (device_api is vfio-pci)API here.  
> >>>>>> So this assumes a PCI device which is probably not true.
> >>>>>> 
> >>>>> for device_api of vfio-pci, why it's not true?
> >>>>>
> >>>>> for vfio-ccw, it's subchannel_type.  
> >>>> Ok but having two different attributes for the same file is not good 
> >>>> idea.
> >>>> How mgmt know there will be a 3rd type?  
> >>> that's why some attributes need to be common. e.g.
> >>> device_api: it's common because mgmt need to know it's a pci device or a
> >>>   ccw device. and the api type is already defined vfio.h.
> >>>   (The field is agreed by and actually suggested by Alex in previous 
> >>> mail)
> >>> type: mdev_type for mdev. if mgmt does not understand it, it would not
> >>> be able to create one compatible mdev device.
> >>> software_version: mgmt can compare the major and minor if it understands
> >>> this fields.  
> >>
> >> I think it would be helpful if you can describe how mgmt is expected to
> >> work step by step with the proposed sysfs API. This can help people to
> >> understand.  
> > My proposal would be:
> > - check that device_api matches
> > - check possible device_api specific attributes
> > - check that type matches [I don't think the combination of mdev types
> >and another attribute to determine compatibility is a good idea;  
> 
> 
> Any reason for this? Actually if we only use mdev type to detect the 
> compatibility, it would be much more easier. Otherwise, we are actually 
> re-inventing mdev types.
> 
> E.g can we have the same mdev types with different device_api and other 
> attributes?

In the end, the mdev type is represented as a string; but I'm not sure
we can expect that two types with th

Re: [ovirt-devel] Re: device compatibility interface for live migration with assigned devices

2020-08-20 Thread Cornelia Huck
On Wed, 19 Aug 2020 17:28:38 +0800
Jason Wang  wrote:

> On 2020/8/19 下午4:13, Yan Zhao wrote:
> > On Wed, Aug 19, 2020 at 03:39:50PM +0800, Jason Wang wrote:  
> >> On 2020/8/19 下午2:59, Yan Zhao wrote:  
> >>> On Wed, Aug 19, 2020 at 02:57:34PM +0800, Jason Wang wrote:  
>  On 2020/8/19 上午11:30, Yan Zhao wrote:  
> > hi All,
> > could we decide that sysfs is the interface that every VFIO vendor 
> > driver
> > needs to provide in order to support vfio live migration, otherwise the
> > userspace management tool would not list the device into the compatible
> > list?
> >
> > if that's true, let's move to the standardizing of the sysfs interface.
> > (1) content
> > common part: (must)
> >   - software_version: (in major.minor.bugfix scheme)  
>  This can not work for devices whose features can be negotiated/advertised
>  independently. (E.g virtio devices)

I thought the 'software_version' was supposed to describe kind of a
'protocol version' for the data we transmit? I.e., you add a new field,
you bump the version number.

>   
> >>> sorry, I don't understand here, why virtio devices need to use vfio 
> >>> interface?  
> >>
> >> I don't see any reason that virtio devices can't be used by VFIO. Do you?
> >>
> >> Actually, virtio devices have been used by VFIO for many years:
> >>
> >> - passthrough a hardware virtio devices to userspace(VM) drivers
> >> - using virtio PMD inside guest
> >>  
> > So, what's different for it vs passing through a physical hardware via 
> > VFIO?  
> 
> 
> The difference is in the guest, the device could be either real hardware 
> or emulated ones.
> 
> 
> > even though the features are negotiated dynamically, could you explain
> > why it would cause software_version not work?  
> 
> 
> Virtio device 1 supports feature A, B, C
> Virtio device 2 supports feature B, C, D
> 
> So you can't migrate a guest from device 1 to device 2. And it's 
> impossible to model the features with versions.

We're talking about the features offered by the device, right? Would it
be sufficient to mandate that the target device supports the same
features or a superset of the features supported by the source device?

> 
> 
> >
> >  
> >>> I think this thread is discussing about vfio related devices.
> >>>  
> >   - device_api: vfio-pci or vfio-ccw ...
> >   - type: mdev type for mdev device or
> >   a signature for physical device which is a counterpart for
> >mdev type.
> >
> > device api specific part: (must)
> >  - pci id: pci id of mdev parent device or pci id of physical pci
> >device (device_api is vfio-pci)API here.  
>  So this assumes a PCI device which is probably not true.
>   
> >>> for device_api of vfio-pci, why it's not true?
> >>>
> >>> for vfio-ccw, it's subchannel_type.  
> >>
> >> Ok but having two different attributes for the same file is not good idea.
> >> How mgmt know there will be a 3rd type?  
> > that's why some attributes need to be common. e.g.
> > device_api: it's common because mgmt need to know it's a pci device or a
> >  ccw device. and the api type is already defined vfio.h.
> > (The field is agreed by and actually suggested by Alex in previous 
> > mail)
> > type: mdev_type for mdev. if mgmt does not understand it, it would not
> >be able to create one compatible mdev device.
> > software_version: mgmt can compare the major and minor if it understands
> >this fields.  
> 
> 
> I think it would be helpful if you can describe how mgmt is expected to 
> work step by step with the proposed sysfs API. This can help people to 
> understand.

My proposal would be:
- check that device_api matches
- check possible device_api specific attributes
- check that type matches [I don't think the combination of mdev types
  and another attribute to determine compatibility is a good idea;
  actually, the current proposal confuses me every time I look at it]
- check that software_version is compatible, assuming semantic
  versioning
- check possible type-specific attributes

> 
> Thanks for the patience. Since sysfs is uABI, when accepted, we need 
> support it forever. That's why we need to be careful.

Nod.

(...)



Re: device compatibility interface for live migration with assigned devices

2020-08-18 Thread Cornelia Huck
On Tue, 18 Aug 2020 10:24:33 +0100
Daniel P. Berrangé  wrote:

> On Tue, Aug 18, 2020 at 11:06:17AM +0200, Cornelia Huck wrote:
> > On Tue, 18 Aug 2020 09:55:27 +0100
> > Daniel P. Berrangé  wrote:
> >   
> > > On Tue, Aug 18, 2020 at 11:24:30AM +0800, Jason Wang wrote:  
> > > > Another point, as we discussed in another thread, it's really hard to 
> > > > make
> > > > sure the above API work for all types of devices and frameworks. So 
> > > > having a
> > > > vendor specific API looks much better.
> > > 
> > > From the POV of userspace mgmt apps doing device compat checking / 
> > > migration,
> > > we certainly do NOT want to use different vendor specific APIs. We want to
> > > have an API that can be used / controlled in a standard manner across 
> > > vendors.  
> > 
> > As we certainly will need to have different things to check for
> > different device types and vendor drivers, would it still be fine to
> > have differing (say) attributes, as long as they are presented (and can
> > be discovered) in a standardized way?  
> 
> Yes, the control API and algorithm to deal with the problem needs to
> have standardization, but the data passed in/out of the APIs can vary.
> 
> Essentially the key is that vendors should be able to create devices
> at the kernel, and those devices should "just work" with the existing
> generic userspace migration / compat checking code, without needing
> extra vendor specific logic to be added.
> 
> Note, I'm not saying that the userspace decisions would be perfectly
> optimal based on generic code. They might be making a simplified
> decision that while functionally safe, is not the ideal solution.
> Adding vendor specific code might be able to optimize the userspace
> decisions, but that should be considered just optimization, not a
> core must have for any opertion.

Yes, that sounds reasonable.



Re: device compatibility interface for live migration with assigned devices

2020-08-18 Thread Cornelia Huck
On Tue, 18 Aug 2020 10:16:28 +0100
Daniel P. Berrangé  wrote:

> On Tue, Aug 18, 2020 at 05:01:51PM +0800, Jason Wang wrote:
> >On 2020/8/18 下午4:55, Daniel P. Berrangé wrote:
> > 
> >  On Tue, Aug 18, 2020 at 11:24:30AM +0800, Jason Wang wrote:
> > 
> >  On 2020/8/14 下午1:16, Yan Zhao wrote:
> > 
> >  On Thu, Aug 13, 2020 at 12:24:50PM +0800, Jason Wang wrote:
> > 
> >  On 2020/8/10 下午3:46, Yan Zhao wrote:  
> 
> >  we actually can also retrieve the same information through sysfs, .e.g
> > 
> >  |- [path to device]
> > |--- migration
> > | |--- self
> > | |   |---device_api
> > ||   |---mdev_type
> > ||   |---software_version
> > ||   |---device_id
> > ||   |---aggregator
> > | |--- compatible
> > | |   |---device_api
> > ||   |---mdev_type
> > ||   |---software_version
> > ||   |---device_id
> > ||   |---aggregator
> > 
> > 
> >  Yes but:
> > 
> >  - You need one file per attribute (one syscall for one attribute)
> >  - Attribute is coupled with kobject

Is that really that bad? You have the device with an embedded kobject
anyway, and you can just put things into an attribute group?

[Also, I think that self/compatible split in the example makes things
needlessly complex. Shouldn't semantic versioning and matching already
cover nearly everything? I would expect very few cases that are more
complex than that. Maybe the aggregation stuff, but I don't think we
need that self/compatible split for that, either.]

> > 
> >  All of above seems unnecessary.
> > 
> >  Another point, as we discussed in another thread, it's really hard to make
> >  sure the above API work for all types of devices and frameworks. So having 
> > a
> >  vendor specific API looks much better.
> > 
> >  From the POV of userspace mgmt apps doing device compat checking / 
> > migration,
> >  we certainly do NOT want to use different vendor specific APIs. We want to
> >  have an API that can be used / controlled in a standard manner across 
> > vendors.
> > 
> >Yes, but it could be hard. E.g vDPA will chose to use devlink (there's a
> >long debate on sysfs vs devlink). So if we go with sysfs, at least two
> >APIs needs to be supported ...  
> 
> NB, I was not questioning devlink vs sysfs directly. If devlink is related
> to netlink, I can't say I'm enthusiastic as IMKE sysfs is easier to deal
> with. I don't know enough about devlink to have much of an opinion though.
> The key point was that I don't want the userspace APIs we need to deal with
> to be vendor specific.

>From what I've seen of devlink, it seems quite nice; but I understand
why sysfs might be easier to deal with (especially as there's likely
already a lot of code using it.)

I understand that some users would like devlink because it is already
widely used for network drivers (and some others), but I don't think
the majority of devices used with vfio are network (although certainly
a lot of them are.)

> 
> What I care about is that we have a *standard* userspace API for performing
> device compatibility checking / state migration, for use by QEMU/libvirt/
> OpenStack, such that we can write code without countless vendor specific
> code paths.
> 
> If there is vendor specific stuff on the side, that's fine as we can ignore
> that, but the core functionality for device compat / migration needs to be
> standardized.

To summarize:
- choose one of sysfs or devlink
- have a common interface, with a standardized way to add
  vendor-specific attributes
?



Re: device compatibility interface for live migration with assigned devices

2020-08-18 Thread Cornelia Huck
On Tue, 18 Aug 2020 09:55:27 +0100
Daniel P. Berrangé  wrote:

> On Tue, Aug 18, 2020 at 11:24:30AM +0800, Jason Wang wrote:
> > Another point, as we discussed in another thread, it's really hard to make
> > sure the above API work for all types of devices and frameworks. So having a
> > vendor specific API looks much better.  
> 
> From the POV of userspace mgmt apps doing device compat checking / migration,
> we certainly do NOT want to use different vendor specific APIs. We want to
> have an API that can be used / controlled in a standard manner across vendors.

As we certainly will need to have different things to check for
different device types and vendor drivers, would it still be fine to
have differing (say) attributes, as long as they are presented (and can
be discovered) in a standardized way?

(See e.g. what I came up with for vfio-ccw in a different branch of
this thread.)

E.g.
version=
.type_specific_value0=
.type_specific_value1=
.vendor_driver_specific_value0=

with a type or vendor driver having some kind of
get_supported_attributes method?



Re: device compatibility interface for live migration with assigned devices

2020-08-17 Thread Cornelia Huck
On Thu, 13 Aug 2020 15:02:53 -0400
Eric Farman  wrote:

> On 8/13/20 11:33 AM, Cornelia Huck wrote:
> > On Fri, 7 Aug 2020 13:59:42 +0200
> > Cornelia Huck  wrote:
> >   
> >> On Wed, 05 Aug 2020 12:35:01 +0100
> >> Sean Mooney  wrote:
> >>  
> >>> On Wed, 2020-08-05 at 12:53 +0200, Jiri Pirko wrote:
> >>>> Wed, Aug 05, 2020 at 11:33:38AM CEST, yan.y.z...@intel.com wrote:  
> >>
> >> (...)
> >>  
> >>>>>software_version: device driver's version.
> >>>>>   in .[.bugfix] scheme, where there is no
> >>>>>compatibility across major versions, minor versions have
> >>>>>forward compatibility (ex. 1-> 2 is ok, 2 -> 1 is not) 
> >>>>> and
> >>>>>bugfix version number indicates some degree of internal
> >>>>>improvement that is not visible to the user in terms of
> >>>>>features or compatibility,
> >>>>>
> >>>>> vendor specific attributes: each vendor may define different attributes
> >>>>>   device id : device id of a physical devices or mdev's parent pci 
> >>>>> device.
> >>>>>   it could be equal to pci id for pci devices
> >>>>>   aggregator: used together with mdev_type. e.g. aggregator=2 together
> >>>>>   with i915-GVTg_V5_4 means 2*1/4=1/2 of a gen9 Intel
> >>>>>graphics device.
> >>>>>   remote_url: for a local NVMe VF, it may be configured with a remote
> >>>>>   url of a remote storage and all data is stored in the
> >>>>>remote side specified by the remote url.
> >>>>>   ...  
> >>> just a minor not that i find ^ much more simmple to understand then
> >>> the current proposal with self and compatiable.
> >>> if i have well defiend attibute that i can parse and understand that allow
> >>> me to calulate the what is and is not compatible that is likely going to
> >>> more useful as you wont have to keep maintianing a list of other 
> >>> compatible
> >>> devices every time a new sku is released.
> >>>
> >>> in anycase thank for actully shareing ^ as it make it simpler to reson 
> >>> about what
> >>> you have previously proposed.
> >>
> >> So, what would be the most helpful format? A 'software_version' field
> >> that follows the conventions outlined above, and other (possibly
> >> optional) fields that have to match?  
> > 
> > Just to get a different perspective, I've been trying to come up with
> > what would be useful for a very different kind of device, namely
> > vfio-ccw. (Adding Eric to cc: for that.)
> > 
> > software_version makes sense for everybody, so it should be a standard
> > attribute.
> > 
> > For the vfio-ccw type, we have only one vendor driver (vfio-ccw_IO).
> > 
> > Given a subchannel A, we want to make sure that subchannel B has a
> > reasonable chance of being compatible. I guess that means:
> > 
> > - same subchannel type (I/O)
> > - same chpid type (e.g. all FICON; I assume there are no 'mixed' setups
> >   -- Eric?)  
> 
> Correct.
> 
> > - same number of chpids? Maybe we can live without that and just inject
> >   some machine checks, I don't know. Same chpid numbers is something we
> >   cannot guarantee, especially if we want to migrate cross-CEC (to
> >   another machine.)  
> 
> I think we'd live without it, because I wouldn't expect it to be
> consistent between systems.

Yes, and the guest needs to be able to deal with changing path
configurations anyway.

> 
> > 
> > Other possibly interesting information is not available at the
> > subchannel level (vfio-ccw is a subchannel driver.)  
> 
> I presume you're alluding to the DASD uid (dasdinfo -x) here?

Yes, or the even more basic Sense ID information.

> 
> > 
> > So, looking at a concrete subchannel on one of my machines, it would
> > look something like the following:
> > 
> > 
> > software_version=1.0.0
> > type=vfio-ccw  <-- would be vfio-pci on the example above
> > 
> > subchannel_type=0
> > 
> > chpid_type=0x1a
> > chpid_mask=0xf0<-- not sure if needed/wanted

Let's just drop the chpid_mask here.

> > 
> > Does that make sense?

Would be interesting if someone could come up with some possible
information for a third type of device.



Re: device compatibility interface for live migration with assigned devices

2020-08-13 Thread Cornelia Huck
On Fri, 7 Aug 2020 13:59:42 +0200
Cornelia Huck  wrote:

> On Wed, 05 Aug 2020 12:35:01 +0100
> Sean Mooney  wrote:
> 
> > On Wed, 2020-08-05 at 12:53 +0200, Jiri Pirko wrote:  
> > > Wed, Aug 05, 2020 at 11:33:38AM CEST, yan.y.z...@intel.com wrote:
> 
> (...)
> 
> > > >software_version: device driver's version.
> > > >   in .[.bugfix] scheme, where there is no
> > > >compatibility across major versions, minor versions have
> > > >forward compatibility (ex. 1-> 2 is ok, 2 -> 1 is not) 
> > > > and
> > > >bugfix version number indicates some degree of internal
> > > >improvement that is not visible to the user in terms of
> > > >features or compatibility,
> > > > 
> > > > vendor specific attributes: each vendor may define different attributes
> > > >   device id : device id of a physical devices or mdev's parent pci 
> > > > device.
> > > >   it could be equal to pci id for pci devices
> > > >   aggregator: used together with mdev_type. e.g. aggregator=2 together
> > > >   with i915-GVTg_V5_4 means 2*1/4=1/2 of a gen9 Intel
> > > >graphics device.
> > > >   remote_url: for a local NVMe VF, it may be configured with a remote
> > > >   url of a remote storage and all data is stored in the
> > > >remote side specified by the remote url.
> > > >   ...
> > just a minor not that i find ^ much more simmple to understand then
> > the current proposal with self and compatiable.
> > if i have well defiend attibute that i can parse and understand that allow
> > me to calulate the what is and is not compatible that is likely going to
> > more useful as you wont have to keep maintianing a list of other compatible
> > devices every time a new sku is released.
> > 
> > in anycase thank for actully shareing ^ as it make it simpler to reson 
> > about what
> > you have previously proposed.  
> 
> So, what would be the most helpful format? A 'software_version' field
> that follows the conventions outlined above, and other (possibly
> optional) fields that have to match?

Just to get a different perspective, I've been trying to come up with
what would be useful for a very different kind of device, namely
vfio-ccw. (Adding Eric to cc: for that.)

software_version makes sense for everybody, so it should be a standard
attribute.

For the vfio-ccw type, we have only one vendor driver (vfio-ccw_IO).

Given a subchannel A, we want to make sure that subchannel B has a
reasonable chance of being compatible. I guess that means:

- same subchannel type (I/O)
- same chpid type (e.g. all FICON; I assume there are no 'mixed' setups
  -- Eric?)
- same number of chpids? Maybe we can live without that and just inject
  some machine checks, I don't know. Same chpid numbers is something we
  cannot guarantee, especially if we want to migrate cross-CEC (to
  another machine.)

Other possibly interesting information is not available at the
subchannel level (vfio-ccw is a subchannel driver.)

So, looking at a concrete subchannel on one of my machines, it would
look something like the following:


software_version=1.0.0
type=vfio-ccw  <-- would be vfio-pci on the example above

subchannel_type=0

chpid_type=0x1a
chpid_mask=0xf0<-- not sure if needed/wanted

Does that make sense?



Re: device compatibility interface for live migration with assigned devices

2020-08-07 Thread Cornelia Huck
On Wed, 05 Aug 2020 12:35:01 +0100
Sean Mooney  wrote:

> On Wed, 2020-08-05 at 12:53 +0200, Jiri Pirko wrote:
> > Wed, Aug 05, 2020 at 11:33:38AM CEST, yan.y.z...@intel.com wrote:  

(...)

> > >software_version: device driver's version.
> > >   in .[.bugfix] scheme, where there is no
> > >  compatibility across major versions, minor versions have
> > >  forward compatibility (ex. 1-> 2 is ok, 2 -> 1 is not) and
> > >  bugfix version number indicates some degree of internal
> > >  improvement that is not visible to the user in terms of
> > >  features or compatibility,
> > > 
> > > vendor specific attributes: each vendor may define different attributes
> > >   device id : device id of a physical devices or mdev's parent pci device.
> > >   it could be equal to pci id for pci devices
> > >   aggregator: used together with mdev_type. e.g. aggregator=2 together
> > >   with i915-GVTg_V5_4 means 2*1/4=1/2 of a gen9 Intel
> > >  graphics device.
> > >   remote_url: for a local NVMe VF, it may be configured with a remote
> > >   url of a remote storage and all data is stored in the
> > >  remote side specified by the remote url.
> > >   ...  
> just a minor not that i find ^ much more simmple to understand then
> the current proposal with self and compatiable.
> if i have well defiend attibute that i can parse and understand that allow
> me to calulate the what is and is not compatible that is likely going to
> more useful as you wont have to keep maintianing a list of other compatible
> devices every time a new sku is released.
> 
> in anycase thank for actully shareing ^ as it make it simpler to reson about 
> what
> you have previously proposed.

So, what would be the most helpful format? A 'software_version' field
that follows the conventions outlined above, and other (possibly
optional) fields that have to match?

(...)

> > Thanks for the explanation, I'm still fuzzy about the details.
> > Anyway, I suggest you to check "devlink dev info" command we have
> > implemented for multiple drivers.  
> 
> is devlink exposed as a filesytem we can read with just open?
> openstack will likely try to leverage libvirt to get this info but when we
> cant its much simpler to read sysfs then it is to take a a depenency on a 
> commandline
> too and have to fork shell to execute it and parse the cli output.
> pyroute2 which we use in some openstack poject has basic python binding for 
> devlink but im not
> sure how complete it is as i think its relitivly new addtion. if we need to 
> take a dependcy
> we will but that would be a drawback fo devlink not that that is a large one 
> just something
> to keep in mind.

A devlinkfs, maybe? At least for reading information (IIUC, "devlink
dev info" is only about information retrieval, right?)



Re: device compatibility interface for live migration with assigned devices

2020-08-04 Thread Cornelia Huck
[sorry about not chiming in earlier]

On Wed, 29 Jul 2020 16:05:03 +0800
Yan Zhao  wrote:

> On Mon, Jul 27, 2020 at 04:23:21PM -0600, Alex Williamson wrote:

(...)

> > Based on the feedback we've received, the previously proposed interface
> > is not viable.  I think there's agreement that the user needs to be
> > able to parse and interpret the version information.  Using json seems
> > viable, but I don't know if it's the best option.  Is there any
> > precedent of markup strings returned via sysfs we could follow?  

I don't think encoding complex information in a sysfs file is a viable
approach. Quoting Documentation/filesystems/sysfs.rst:

"Attributes should be ASCII text files, preferably with only one value  
  
per file. It is noted that it may not be efficient to contain only one  
 
value per file, so it is socially acceptable to express an array of 
 
values of the same type.
 

 
Mixing types, expressing multiple lines of data, and doing fancy
 
formatting of data is heavily frowned upon."

Even though this is an older file, I think these restrictions still
apply.

> I found some examples of using formatted string under /sys, mostly under
> tracing. maybe we can do a similar implementation.
> 
> #cat /sys/kernel/debug/tracing/events/kvm/kvm_mmio/format

Note that this is *not* sysfs (anything under debug/ follows different
rules anyway!)

> 
> name: kvm_mmio
> ID: 32
> format:
> field:unsigned short common_type;   offset:0;   size:2; 
> signed:0;
> field:unsigned char common_flags;   offset:2;   size:1; 
> signed:0;
> field:unsigned char common_preempt_count;   offset:3;   
> size:1; signed:0;
> field:int common_pid;   offset:4;   size:4; signed:1;
> 
> field:u32 type; offset:8;   size:4; signed:0;
> field:u32 len;  offset:12;  size:4; signed:0;
> field:u64 gpa;  offset:16;  size:8; signed:0;
> field:u64 val;  offset:24;  size:8; signed:0;
> 
> print fmt: "mmio %s len %u gpa 0x%llx val 0x%llx", 
> __print_symbolic(REC->type, { 0, "unsatisfied-read" }, { 1, "read" }, { 2, 
> "write" }), REC->len, REC->gpa, REC->val
> 
> 
> #cat /sys/devices/pci:00/:00:02.0/uevent

'uevent' can probably be considered a special case, I would not really
want to copy it.

> DRIVER=vfio-pci
> PCI_CLASS=3
> PCI_ID=8086:591D
> PCI_SUBSYS_ID=8086:2212
> PCI_SLOT_NAME=:00:02.0
> MODALIAS=pci:v8086d591Dsv8086sd2212bc03sc00i00
> 

(...)

> what about a migration_compatible attribute under device node like
> below?
> 
> #cat /sys/bus/pci/devices/\:00\:02.0/UUID1/migration_compatible
> SELF:
>   device_type=pci
>   device_id=8086591d
>   mdev_type=i915-GVTg_V5_2
>   aggregator=1
>   pv_mode="none+ppgtt+context"
>   interface_version=3
> COMPATIBLE:
>   device_type=pci
>   device_id=8086591d
>   mdev_type=i915-GVTg_V5_{val1:int:1,2,4,8}
>   aggregator={val1}/2
>   pv_mode={val2:string:"none+ppgtt","none+context","none+ppgtt+context"} 
>   interface_version={val3:int:2,3}
> COMPATIBLE:
>   device_type=pci
>   device_id=8086591d
>   mdev_type=i915-GVTg_V5_{val1:int:1,2,4,8}
>   aggregator={val1}/2
>   pv_mode=""  #"" meaning empty, could be absent in a compatible device
>   interface_version=1

I'd consider anything of a comparable complexity to be a big no-no. If
anything, this needs to be split into individual files (with many of
them being vendor driver specific anyway.)

I think we can list compatible versions in a range/list format, though.
Something like

cat interface_version 
2.1.3

cat interface_version_compatible
2.0.2-2.0.4,2.1.0-

(indicating that versions 2.0.{2,3,4} and all versions after 2.1.0 are
compatible, considering versions <2 and >2 incompatible by default)

Possible compatibility between different mdev types feels a bit odd to
me, and should not be included by default (only if it makes sense for a
particular vendor driver.)



Re: [PATCH v2 2/2] docs: Improve zPCI section in pci-addresses.rst

2020-04-17 Thread Cornelia Huck
On Fri, 17 Apr 2020 14:27:34 +0200
Boris Fiuczynski  wrote:

> Improving the zPCI example by choosing more distinct values and
> adding explanation for fid.
> 
> Signed-off-by: Boris Fiuczynski 
> ---
>  docs/pci-addresses.rst | 22 --
>  1 file changed, 12 insertions(+), 10 deletions(-)

Reviewed-by: Cornelia Huck 



Re: [PATCH 2/2] docs: Improve zPCI section in pci-addresses.rst

2020-04-17 Thread Cornelia Huck
On Fri, 17 Apr 2020 13:29:43 +0200
Boris Fiuczynski  wrote:

> On 4/17/20 11:02 AM, Cornelia Huck wrote:
> > On Fri, 17 Apr 2020 10:50:02 +0200
> > Boris Fiuczynski  wrote:
> >   
> >> On 4/16/20 6:14 PM, Cornelia Huck wrote:  
> >>> On Thu, 16 Apr 2020 17:56:18 +0200
> >>> Boris Fiuczynski  wrote:
> >>>  
> >>>> Improving the zPCI example by choosing more distinct values and
> >>>> adding explanation for fid.
> >>>>
> >>>> Signed-off-by: Boris Fiuczynski 
> >>>> ---
> >>>>docs/pci-addresses.rst | 15 ---
> >>>>1 file changed, 8 insertions(+), 7 deletions(-)
> >>>>
> >>>> diff --git a/docs/pci-addresses.rst b/docs/pci-addresses.rst
> >>>> index 7c8e9edd73..4492389da5 100644
> >>>> --- a/docs/pci-addresses.rst
> >>>> +++ b/docs/pci-addresses.rst
> >>>> @@ -176,14 +176,14 @@ In the simplest case, the following XML snippet
> >>>>
> >>>>
> >>>> >>>> function='0x0'>
> >>>> -  
> >>>> +
> >>>
> >>> Why this change? The pci-bridge does not show up in the guest anyway.  
> >> My assumption was that uid and fid for this would be autogenerated.
> >> Since uid 0x0001 and fid 0x have been freed up due to the change
> >> below this would be the autogenerated set.  
> > 
> > If that makes the XML look saner, no objection.
> >   
> >>  
> >>>  
> >>>>
> >>>>  
> >>>>  
> >>>>
> >>>>
> >>>> >>>> function='0x0'>
> >>>> -  
> >>>> +  
> >>>>
> >>>>  
> >>>>
> >>>> @@ -191,21 +191,22 @@ will result in the following in a Linux guest:
> >>>>
> >>>>::
> >>>>
> >>>> -  0001:00:00.0 Ethernet controller: Red Hat, Inc. Virtio network device
> >>>> +  0007:00:00.0 Ethernet controller: Red Hat, Inc. Virtio network device
> >>>>
> >>>>Note that the PCI bridge is not visible in the guest; s390x always 
> >>>> has a flat
> >>>> -topology.
> >>>> +topology. Also ``fid`` does not define slot or function of the PCI 
> >>>> address.  
> >>>
> >>> I find the sentence regarding 'fid' confusing. Maybe instead move up
> >>> the explanation from below regarding uid and fid?
> >>>
> >>> "The PCI address in the guest is generated from..."
> >>>  
> >> Lets join your proposal with Andreas and move his rewrite up to here.
> >> Like:
> >> ...topology.
> >> The PCI address in the guest is generated from the information provided
> >> via the ``zpci`` element: more specifically, ``uid`` is used as the PCI
> >> domain.``fid`` doesn't appear in the PCI address itself, but it will be
> >> used in sysfs (``/sys/bus/pci/slots/$fid/...``).  
> > 
> > Sounds good.
> > 
> > (Also the rest of the changes.)
> >   
> Before I break the r-b chain as well... Is that your r-b? :)
> 
> 

I'll add my R-b to a final patch :)



Re: [PATCH v5 0/4] introduction of migration_version attribute for VFIO live migration

2020-04-17 Thread Cornelia Huck
On Fri, 17 Apr 2020 05:52:02 -0400
Yan Zhao  wrote:

> On Fri, Apr 17, 2020 at 04:44:50PM +0800, Cornelia Huck wrote:
> > On Mon, 13 Apr 2020 01:52:01 -0400
> > Yan Zhao  wrote:
> >   
> > > This patchset introduces a migration_version attribute under sysfs of VFIO
> > > Mediated devices.
> > > 
> > > This migration_version attribute is used to check migration compatibility
> > > between two mdev devices.
> > > 
> > > Currently, it has two locations:
> > > (1) under mdev_type node,
> > > which can be used even before device creation, but only for mdev
> > > devices of the same mdev type.
> > > (2) under mdev device node,
> > > which can only be used after the mdev devices are created, but the src
> > > and target mdev devices are not necessarily be of the same mdev type
> > > (The second location is newly added in v5, in order to keep consistent
> > > with the migration_version node for migratable pass-though devices)  
> > 
> > What is the relationship between those two attributes?
> >   
> (1) is for mdev devices specifically, and (2) is provided to keep the same
> sysfs interface as with non-mdev cases. so (2) is for both mdev devices and
> non-mdev devices.
> 
> in future, if we enable vfio-pci vendor ops, (i.e. a non-mdev device
> is binding to vfio-pci, but is able to register migration region and do
> migration transactions from a vendor provided affiliate driver),
> the vendor driver would export (2) directly, under device node.
> It is not able to provide (1) as there're no mdev devices involved.

Ok, creating an alternate attribute for non-mdev devices makes sense.
However, wouldn't that rather be a case (3)? The change here only
refers to mdev devices.

> 
> > Is existence (and compatibility) of (1) a pre-req for possible
> > existence (and compatibility) of (2)?
> >  
> no. (2) does not reply on (1).

Hm. Non-existence of (1) seems to imply "this type does not support
migration". If an mdev created for such a type suddenly does support
migration, it feels a bit odd.

(It obviously cannot be a prereq for what I called (3) above.)

> 
> > Does userspace need to check (1) or can it completely rely on (2), if
> > it so chooses?
> >  
> I think it can completely reply on (2) if compatibility check before
> mdev creation is not required.
> 
> > If devices with a different mdev type are indeed compatible, it seems
> > userspace can only find out after the devices have actually been
> > created, as (1) does not apply?  
> yes, I think so. 

How useful would it be for userspace to even look at (1) in that case?
It only knows if things have a chance of working if it actually goes
ahead and creates devices.

> 
> > One of my worries is that the existence of an attribute with the same
> > name in two similar locations might lead to confusion. But maybe it
> > isn't a problem.
> >  
> Yes, I have the same feeling. but as (2) is for sysfs interface
> consistency, to make it transparent to userspace tools like libvirt,
> I guess the same name is necessary?

What do we actually need here, I wonder? (1) and (2) seem to serve
slightly different purposes, while (2) and what I called (3) have the
same purpose. Is it important to userspace that (1) and (2) have the
same name?



Re: [PATCH 2/2] docs: Improve zPCI section in pci-addresses.rst

2020-04-17 Thread Cornelia Huck
On Fri, 17 Apr 2020 10:50:02 +0200
Boris Fiuczynski  wrote:

> On 4/16/20 6:14 PM, Cornelia Huck wrote:
> > On Thu, 16 Apr 2020 17:56:18 +0200
> > Boris Fiuczynski  wrote:
> >   
> >> Improving the zPCI example by choosing more distinct values and
> >> adding explanation for fid.
> >>
> >> Signed-off-by: Boris Fiuczynski 
> >> ---
> >>   docs/pci-addresses.rst | 15 ---
> >>   1 file changed, 8 insertions(+), 7 deletions(-)
> >>
> >> diff --git a/docs/pci-addresses.rst b/docs/pci-addresses.rst
> >> index 7c8e9edd73..4492389da5 100644
> >> --- a/docs/pci-addresses.rst
> >> +++ b/docs/pci-addresses.rst
> >> @@ -176,14 +176,14 @@ In the simplest case, the following XML snippet
> >>   
> >>   
> >>>> function='0x0'>
> >> -  
> >> +
> > 
> > Why this change? The pci-bridge does not show up in the guest anyway.  
> My assumption was that uid and fid for this would be autogenerated.
> Since uid 0x0001 and fid 0x have been freed up due to the change 
> below this would be the autogenerated set.

If that makes the XML look saner, no objection.

> 
> >   
> >>   
> >> 
> >> 
> >>   
> >>   
> >>>> function='0x0'>
> >> -  
> >> +  
> >>   
> >> 
> >>   
> >> @@ -191,21 +191,22 @@ will result in the following in a Linux guest:
> >>   
> >>   ::
> >>   
> >> -  0001:00:00.0 Ethernet controller: Red Hat, Inc. Virtio network device
> >> +  0007:00:00.0 Ethernet controller: Red Hat, Inc. Virtio network device
> >>   
> >>   Note that the PCI bridge is not visible in the guest; s390x always has a 
> >> flat
> >> -topology.
> >> +topology. Also ``fid`` does not define slot or function of the PCI 
> >> address.  
> > 
> > I find the sentence regarding 'fid' confusing. Maybe instead move up
> > the explanation from below regarding uid and fid?
> > 
> > "The PCI address in the guest is generated from..."
> >   
> Lets join your proposal with Andreas and move his rewrite up to here.
> Like:
> ...topology.
> The PCI address in the guest is generated from the information provided 
> via the ``zpci`` element: more specifically, ``uid`` is used as the PCI 
> domain.``fid`` doesn't appear in the PCI address itself, but it will be
> used in sysfs (``/sys/bus/pci/slots/$fid/...``).

Sounds good.

(Also the rest of the changes.)



Re: [PATCH v5 0/4] introduction of migration_version attribute for VFIO live migration

2020-04-17 Thread Cornelia Huck
On Mon, 13 Apr 2020 01:52:01 -0400
Yan Zhao  wrote:

> This patchset introduces a migration_version attribute under sysfs of VFIO
> Mediated devices.
> 
> This migration_version attribute is used to check migration compatibility
> between two mdev devices.
> 
> Currently, it has two locations:
> (1) under mdev_type node,
> which can be used even before device creation, but only for mdev
> devices of the same mdev type.
> (2) under mdev device node,
> which can only be used after the mdev devices are created, but the src
> and target mdev devices are not necessarily be of the same mdev type
> (The second location is newly added in v5, in order to keep consistent
> with the migration_version node for migratable pass-though devices)

What is the relationship between those two attributes?

Is existence (and compatibility) of (1) a pre-req for possible
existence (and compatibility) of (2)?

Does userspace need to check (1) or can it completely rely on (2), if
it so chooses?

If devices with a different mdev type are indeed compatible, it seems
userspace can only find out after the devices have actually been
created, as (1) does not apply?

One of my worries is that the existence of an attribute with the same
name in two similar locations might lead to confusion. But maybe it
isn't a problem.

> 
> Patch 1 defines migration_version attribute for the first location in
> Documentation/vfio-mediated-device.txt
> 
> Patch 2 uses GVT as an example for patch 1 to show how to expose
> migration_version attribute and check migration compatibility in vendor
> driver.
> 
> Patch 3 defines migration_version attribute for the second location in
> Documentation/vfio-mediated-device.txt
> 
> Patch 4 uses GVT as an example for patch 3 to show how to expose
> migration_version attribute and check migration compatibility in vendor
> driver.
> 
> (The previous "Reviewed-by" and "Acked-by" for patch 1 and patch 2 are
> kept in v5, as there are only small changes to commit messages of the two
> patches.)
> 
> v5:
> added patch 2 and 4 for mdev device part of migration_version attribute.
> 
> v4:
> 1. fixed indentation/spell errors, reworded several error messages
> 2. added a missing memory free for error handling in patch 2
> 
> v3:
> 1. renamed version to migration_version
> 2. let errno to be freely defined by vendor driver
> 3. let checking mdev_type be prerequisite of migration compatibility check
> 4. reworded most part of patch 1
> 5. print detailed error log in patch 2 and generate migration_version
> string at init time
> 
> v2:
> 1. renamed patched 1
> 2. made definition of device version string completely private to vendor
> driver
> 3. reverted changes to sample mdev drivers
> 4. described intent and usage of version attribute more clearly.
> 
> 
> Yan Zhao (4):
>   vfio/mdev: add migration_version attribute for mdev (under mdev_type
> node)
>   drm/i915/gvt: export migration_version to mdev sysfs (under mdev_type
> node)
>   vfio/mdev: add migration_version attribute for mdev (under mdev device
> node)
>   drm/i915/gvt: export migration_version to mdev sysfs (under mdev
> device node)
> 
>  .../driver-api/vfio-mediated-device.rst   | 183 ++
>  drivers/gpu/drm/i915/gvt/Makefile |   2 +-
>  drivers/gpu/drm/i915/gvt/gvt.c|  39 
>  drivers/gpu/drm/i915/gvt/gvt.h|   7 +
>  drivers/gpu/drm/i915/gvt/kvmgt.c  |  55 ++
>  drivers/gpu/drm/i915/gvt/migration_version.c  | 170 
>  drivers/gpu/drm/i915/gvt/vgpu.c   |  13 +-
>  7 files changed, 466 insertions(+), 3 deletions(-)
>  create mode 100644 drivers/gpu/drm/i915/gvt/migration_version.c
> 



Re: [PATCH 2/2] docs: Improve zPCI section in pci-addresses.rst

2020-04-16 Thread Cornelia Huck
On Thu, 16 Apr 2020 17:56:18 +0200
Boris Fiuczynski  wrote:

> Improving the zPCI example by choosing more distinct values and
> adding explanation for fid.
> 
> Signed-off-by: Boris Fiuczynski 
> ---
>  docs/pci-addresses.rst | 15 ---
>  1 file changed, 8 insertions(+), 7 deletions(-)
> 
> diff --git a/docs/pci-addresses.rst b/docs/pci-addresses.rst
> index 7c8e9edd73..4492389da5 100644
> --- a/docs/pci-addresses.rst
> +++ b/docs/pci-addresses.rst
> @@ -176,14 +176,14 @@ In the simplest case, the following XML snippet
>  
>  
>   function='0x0'>
> -  
> +  

Why this change? The pci-bridge does not show up in the guest anyway.

>  
>
>
>  
>  
>   function='0x0'>
> -  
> +  
>  
>
>  
> @@ -191,21 +191,22 @@ will result in the following in a Linux guest:
>  
>  ::
>  
> -  0001:00:00.0 Ethernet controller: Red Hat, Inc. Virtio network device
> +  0007:00:00.0 Ethernet controller: Red Hat, Inc. Virtio network device
>  
>  Note that the PCI bridge is not visible in the guest; s390x always has a flat
> -topology.
> +topology. Also ``fid`` does not define slot or function of the PCI address.

I find the sentence regarding 'fid' confusing. Maybe instead move up
the explanation from below regarding uid and fid?

"The PCI address in the guest is generated from..."

>  
>  Neither are any changes in the PCI address visible in the guest; replacing

The 'neither' is now a bit confusing; what about

"Any changes in the PCI address are not visible in the guest..." ?

>  the PCI address for the ``virtio-net`` device with
>  
>  ::
>  
> -  
> +  
>  
>  will result in the exactly same view in the guest, as the addresses there

If you move the fid/uid stuff up, make this

"as the fid and uid values in the zpci element remain unchanged." ?

> -are generated from the information provided via the ``zpci`` element (in
> -fact, from the ``uid``).
> +are generated from the information provided via the ``zpci`` element:
> +the ``uid`` is used as PCI domain, and the ``fid`` is used as the PCI devices

s/devices/device's/

> +slot in the sysfs.

s/the sysfs./sysfs (it does not influence the PCI device address.)/
*



>  
>  
>  Device assignment



Re: [PATCH 1/2] docs: Update introduction in pci-addresses.rst

2020-04-16 Thread Cornelia Huck
On Thu, 16 Apr 2020 17:56:17 +0200
Boris Fiuczynski  wrote:

> Changing the introduction to bring the idea of this document better across.
> 
> Signed-off-by: Andrea Bolognani 
> Reviewed-by: Boris Fiuczynski 
> ---
>  docs/pci-addresses.rst | 9 ++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/docs/pci-addresses.rst b/docs/pci-addresses.rst
> index 1d2dc8e5fc..7c8e9edd73 100644
> --- a/docs/pci-addresses.rst
> +++ b/docs/pci-addresses.rst
> @@ -4,9 +4,12 @@ PCI addresses in domain XML and guest OS
>  
>  .. contents::
>  
> -When discussing PCI addresses, it's important to understand the
> -relationship between the addresses that can be seen in the domain XML
> -and those that are visible inside the guest OS.
> +Looking at the configuration for a guest, it would be reasonable
> +to expect that each PCI device would show up in the guest OS with
> +a PCI address that matches the one present in the corresponding
> + element of the domain XML, but that's not guaranteed
> +to happen and will in fact not be the case in all but the simplest
> +scenarios.
>  
>  
>  Simple cases

Reviewed-by: Cornelia Huck 

...although I'm not sure about the s-o-b chain here :)



Re: [libvirt PATCH 4/4] docs: Remove one example from pci-addresses.rst

2020-04-16 Thread Cornelia Huck
On Thu, 16 Apr 2020 09:28:58 +0200
Andrea Bolognani  wrote:

> On Wed, 2020-04-15 at 19:47 +0200, Cornelia Huck wrote:
> > On Wed, 15 Apr 2020 19:31:36 +0200
> > Andrea Bolognani  wrote:  
> > > -Therefore, replacing the virtio-net device definition with the following 
> > > XML
> > > -snippet
> > > -
> > > -::
> > > -
> > > -  
> > > -
> > > -
> > > - > > function='0x3'>
> > > -  
> > > -
> > > -  
> > > -
> > > -will yield the following result in a Linux guest:
> > > -
> > > -::
> > > -
> > > -  0007:00:00.0 Ethernet controller: Red Hat, Inc. Virtio network device  
> > 
> > Hm, should that rather go somewhere else? What I wanted to show is "you
> > can have the same PCI address in the XML and still get a different PCI
> > address in the guest, if you change the zpci values", as that might be
> > another source of confusion.  
> 
> I think the previous example, specifically the last bit where you
> explain how changing the PCI address completely in the domain XML
> would not change what the guest OS sees because the latter is derived
> from uid and fid, already drives the point home. It's really two
> sides of the same coin.
> 
> Additionally, as I explained elsewhere, this document is not meant to
> list every possible situation in which PCI addresses in the domain
> XML and in the guest OS are out of sync, but merely to show that such
> cases exist. It's valuable to mention the zPCI scenario, but we don't
> need to show more than one variation of it in my opinion.
> 

Fair enough.

Reviewed-by: Cornelia Huck 



Re: [libvirt PATCH 4/4] docs: Remove one example from pci-addresses.rst

2020-04-15 Thread Cornelia Huck
On Wed, 15 Apr 2020 19:31:36 +0200
Andrea Bolognani  wrote:

> The idea behind this document is to show, with actual examples,
> that users should not expect PCI addresses in the domain XML and
> in the guest OS to match.
> 
> The first zPCI example already serves this purpose perfectly, so
> in the interest of keeping the page as brief and easy to digest
> as possible the second one is removed.
> 
> Signed-off-by: Andrea Bolognani 
> ---
>  docs/pci-addresses.rst | 19 ---
>  1 file changed, 19 deletions(-)
> 
> diff --git a/docs/pci-addresses.rst b/docs/pci-addresses.rst
> index 86a41df6ce..1d2dc8e5fc 100644
> --- a/docs/pci-addresses.rst
> +++ b/docs/pci-addresses.rst
> @@ -204,25 +204,6 @@ will result in the exactly same view in the guest, as 
> the addresses there
>  are generated from the information provided via the ``zpci`` element (in
>  fact, from the ``uid``).
>  
> -Therefore, replacing the virtio-net device definition with the following XML
> -snippet
> -
> -::
> -
> -  
> -
> -
> - function='0x3'>
> -  
> -
> -  
> -
> -will yield the following result in a Linux guest:
> -
> -::
> -
> -  0007:00:00.0 Ethernet controller: Red Hat, Inc. Virtio network device
> -
>  
>  Device assignment
>  =

Hm, should that rather go somewhere else? What I wanted to show is "you
can have the same PCI address in the XML and still get a different PCI
address in the guest, if you change the zpci values", as that might be
another source of confusion.



Re: [libvirt PATCH 3/4] docs: Remove MAC addresses from pci-addresses.rst

2020-04-15 Thread Cornelia Huck
On Wed, 15 Apr 2020 19:31:35 +0200
Andrea Bolognani  wrote:

> Signed-off-by: Andrea Bolognani 
> ---
>  docs/pci-addresses.rst | 2 --
>  1 file changed, 2 deletions(-)

Yeah, not that useful.

Reviewed-by: Cornelia Huck 



Re: [libvirt PATCH 1/4] docs: Use consistent style in pci-addresses.rst

2020-04-15 Thread Cornelia Huck
On Wed, 15 Apr 2020 19:31:33 +0200
Andrea Bolognani  wrote:

> Indent all code snippets by the same number of spaces, and don't
> embed the :: marker in the line preceding a code block.
> 
> This commit is best viewed with 'git show -w'.
> 
> Signed-off-by: Andrea Bolognani 
> ---
>  docs/pci-addresses.rst | 60 ++
>  1 file changed, 32 insertions(+), 28 deletions(-)

Reviewed-by: Cornelia Huck 



Re: [libvirt PATCH 2/4] docs: Move sections around in pci-addresses.rst

2020-04-15 Thread Cornelia Huck
On Wed, 15 Apr 2020 19:31:34 +0200
Andrea Bolognani  wrote:

> The section about VFIO devices is kept separate from the rest
> because it's less about domain XML and guest OS disagreeing on the
> PCI address of a device, and more about which of the two PCI
> addresses in the domain XML is even relevant to the guest OS.
> 
> The section on zPCI addresses, on the other hand, falls squarely
> in the "more complex cases" category, so it should live in the
> corresponding section.
> 
> Signed-off-by: Andrea Bolognani 
> ---
>  docs/pci-addresses.rst | 57 +-
>  1 file changed, 28 insertions(+), 29 deletions(-)

Not including a 'wonky cases' category is fine with me :)

Reviewed-by: Cornelia Huck 



Re: [PATCH] docs: add zpci information to pci-addresses.rst

2020-04-15 Thread Cornelia Huck
On Wed, 15 Apr 2020 18:38:06 +0200
Andrea Bolognani  wrote:

> On Wed, 2020-04-15 at 16:45 +0200, Cornelia Huck wrote:
> > On Wed, 15 Apr 2020 16:23:46 +0200
> > Boris Fiuczynski  wrote:  
> > > Once I understand you confusion above I will provide a patch...  
> > 
> > I'd say just go ahead.  
> 
> While I appreciate the sentiment, I would rather not have an
> excessive amount of detail added to this page.
> 
> The idea behind it was to show users that they shouldn't expect the
> address in the domain XML to match the one in the guest OS, with a
> few real-life examples to illustrate the point. So, I don't think
> the details of how exactly zPCI IDs translate to guest-visible PCI
> addresses is in scope.
> 
> It would be great information to have in some other document, though!
> Is there something like that in qemu.git, or in the QEMU wiki? Those
> are the places where I would expect it to live, since it's not really
> tied to libvirt...

I don't think there's much in QEMU documentation yet; someone(tm) wrote
https://virtualpenguins.blogspot.com/2018/02/notes-on-pci-on-s390x.html,
which might provide a starting point.



Re: [PATCH] docs: add zpci information to pci-addresses.rst

2020-04-15 Thread Cornelia Huck
On Wed, 15 Apr 2020 16:23:46 +0200
Boris Fiuczynski  wrote:

> On 4/15/20 3:42 PM, Cornelia Huck wrote:
> > On Tue, 14 Apr 2020 23:06:47 +0200
> > Boris Fiuczynski  wrote:
> >   
> >> On 4/15/20 12:51 PM, Cornelia Huck wrote:  

> >>> +In the simplest case, the following XML snippet
> >>> +
> >>> +::
> >>> +
> >>> +  
> >>> +  
> >>> +
> >>> +
> >>> + >>> function='0x0'>
> >>> +  
> >>> +
> >>> +  
> >>> +  
> >>> +
> >>> +
> >>> +
> >>> + >>> function='0x0'>
> >>> +  
> >>> +
> >>> +  
> >>> +
> >>> +will result in the following in a Linux guest::
> >>> +
> >>> +  0001:00:00.0 Ethernet controller: Red Hat, Inc. Virtio network device
> >>> +
> >>> +Note that the PCI bridge is not visible in the guest; s390x always has a 
> >>> flat
> >>> +topology.
> >>> +
> >>> +Neither are any changes in the PCI address visible in the guest; 
> >>> replacing
> >>> +the PCI address for the virtio-net device with
> >>> +
> >>> +::
> >>> +
> >>> +   >>> function='0x3'>
> >>> +
> >>> +will result in the exactly same view in the guest, as the addresses there
> >>> +are generated from the information provided via the ``zpci`` element (in
> >>> +fact, from the ``uid``).
> >>> +  
> >> This explains that the uid is used by the guest to define the pci domain
> >> of the guest device.
> >> How about an addition for the fid? Something like:
> >>
> >> The function id 'fid' defines the slot address of the pci card in the
> >> guest. It can be observed in the guest at /sys/bus/pci/slots. In the
> >> example given above the card would be at /sys/bus/pci/slots/.  
> > 
> > Hm, is it intentional that this does not show up in the actual pci
> > address? But maybe I'm confused.  
> Are you referring to the slot of the pci address?

Yes.

> 
> If yes, the pci slot does not provided the required address space for 
> the function id. Also we once said that the pci bus in qemu would be 
> like the pci bus of the CEC and the firmware does provide a a function 
> id for every pci function. The zpci fid does one allow to do the same.

Ok, now I dimly remember something like that. Still confusing, but
makes sense.

> 
> >   
> >>
> >>  
> >>> +Therefore, replacing the virtio-net device definition with the following 
> >>> XML
> >>> +snippet
> >>> +
> >>> +::
> >>> +
> >>> +  
> >>> +
> >>> +
> >>> +
> >>> + >>> function='0x3'>
> >>> +  
> >>> +
> >>> +  
> >>> +
> >>> +will yield the following result in a Linux guest::
> >>> +
> >>> +  0007:00:00.0 Ethernet controller: Red Hat, Inc. Virtio network device
> >>>  
> >> and the card would be at /sys/bus/pci/slots/0003.  
> > 
> > Do you want to explain the fid/slot stuff in a patch on top? Despite
> > your attempts at time travel, this patch has already been pushed :)   
> It seems to be a very pushy time... :( no matter how many breaks are 
> produced... ;)  11 minutes from patch production time to commit time. 
> Why even send it for review? :D

Well, I don't have commit rights O:-)

> Once I understand you confusion above I will provide a patch...

I'd say just go ahead.



Re: [PATCH] docs: add zpci information to pci-addresses.rst

2020-04-15 Thread Cornelia Huck
On Tue, 14 Apr 2020 23:06:47 +0200
Boris Fiuczynski  wrote:

> On 4/15/20 12:51 PM, Cornelia Huck wrote:

> > +zPCI addresses
> > +==
> > +
> > +For s390x machines, PCI addresses are handled yet differently. No
> > +topology information is relayed in the PCI addresses; instead, the
> > +``fid`` and ``uid`` elements of the ``zpci`` device convey information.  
> Should it be mentioned that qemu uses the pci address internally to plug 
> the device into its pci bus since the pci address as far as I know must 
> still be properly provided or generated.

Not sure how much is autogenerated; however, as this document talks
about how pci addresses in the xml translate to pci addresses in the
guest, I think we can simply ignore it?

> 
> 
> > +In the simplest case, the following XML snippet
> > +
> > +::
> > +
> > +  
> > +  
> > +
> > +
> > + > function='0x0'>
> > +  
> > +
> > +  
> > +  
> > +
> > +
> > +
> > + > function='0x0'>
> > +  
> > +
> > +  
> > +
> > +will result in the following in a Linux guest::
> > +
> > +  0001:00:00.0 Ethernet controller: Red Hat, Inc. Virtio network device
> > +
> > +Note that the PCI bridge is not visible in the guest; s390x always has a 
> > flat
> > +topology.
> > +
> > +Neither are any changes in the PCI address visible in the guest; replacing
> > +the PCI address for the virtio-net device with
> > +
> > +::
> > +
> > +   > function='0x3'>
> > +
> > +will result in the exactly same view in the guest, as the addresses there
> > +are generated from the information provided via the ``zpci`` element (in
> > +fact, from the ``uid``).
> > +  
> This explains that the uid is used by the guest to define the pci domain 
> of the guest device.
> How about an addition for the fid? Something like:
> 
> The function id 'fid' defines the slot address of the pci card in the 
> guest. It can be observed in the guest at /sys/bus/pci/slots. In the 
> example given above the card would be at /sys/bus/pci/slots/.

Hm, is it intentional that this does not show up in the actual pci
address? But maybe I'm confused.

> 
> 
> > +Therefore, replacing the virtio-net device definition with the following 
> > XML
> > +snippet
> > +
> > +::
> > +
> > +  
> > +
> > +
> > +
> > + > function='0x3'>
> > +  
> > +
> > +  
> > +
> > +will yield the following result in a Linux guest::
> > +
> > +  0007:00:00.0 Ethernet controller: Red Hat, Inc. Virtio network device
> >   
> and the card would be at /sys/bus/pci/slots/0003.

Do you want to explain the fid/slot stuff in a patch on top? Despite
your attempts at time travel, this patch has already been pushed :)



[PATCH] docs: add zpci information to pci-addresses.rst

2020-04-15 Thread Cornelia Huck
Add some information on how pci address work on s390x.

Signed-off-by: Cornelia Huck 
---
 docs/pci-addresses.rst | 63 ++
 1 file changed, 63 insertions(+)

diff --git a/docs/pci-addresses.rst b/docs/pci-addresses.rst
index 923783a151b0..9e241a24fcfb 100644
--- a/docs/pci-addresses.rst
+++ b/docs/pci-addresses.rst
@@ -184,3 +184,66 @@ guest OS rather than as ``0001:08:00.1``, which is the 
address of the
 device on the host.
 
 Of course, all the rules and behaviors described above still apply.
+
+zPCI addresses
+==
+
+For s390x machines, PCI addresses are handled yet differently. No
+topology information is relayed in the PCI addresses; instead, the
+``fid`` and ``uid`` elements of the ``zpci`` device convey information.
+In the simplest case, the following XML snippet
+
+::
+
+  
+  
+
+
+
+  
+
+  
+  
+
+
+
+
+  
+
+  
+
+will result in the following in a Linux guest::
+
+  0001:00:00.0 Ethernet controller: Red Hat, Inc. Virtio network device
+
+Note that the PCI bridge is not visible in the guest; s390x always has a flat
+topology.
+
+Neither are any changes in the PCI address visible in the guest; replacing
+the PCI address for the virtio-net device with
+
+::
+
+  
+
+will result in the exactly same view in the guest, as the addresses there
+are generated from the information provided via the ``zpci`` element (in
+fact, from the ``uid``).
+
+Therefore, replacing the virtio-net device definition with the following XML
+snippet
+
+::
+
+  
+
+
+
+
+  
+
+  
+
+will yield the following result in a Linux guest::
+
+  0007:00:00.0 Ethernet controller: Red Hat, Inc. Virtio network device
-- 
2.21.1



Re: [libvirt PATCH] docs: Add pci-addresses.rst

2020-04-15 Thread Cornelia Huck
On Tue, 14 Apr 2020 19:53:05 +0200
Andrea Bolognani  wrote:

> This document describes the relationship between PCI addresses as
> seen in the domain XML and by the guest OS, which is a topic that
> people get confused by time and time again.
> 
> Signed-off-by: Andrea Bolognani 
> ---
>  docs/formatdomain.html.in |   6 +-
>  docs/pci-addresses.rst| 184 ++
>  2 files changed, 189 insertions(+), 1 deletion(-)
>  create mode 100644 docs/pci-addresses.rst
> 

(...)

> diff --git a/docs/pci-addresses.rst b/docs/pci-addresses.rst
> new file mode 100644
> index 00..96c6466899
> --- /dev/null
> +++ b/docs/pci-addresses.rst
> @@ -0,0 +1,184 @@
> +
> +PCI addresses in domain XML and guest OS
> +
> +
> +.. contents::
> +
> +When discussing PCI addresses, it's important to understand the the
> +relationship between the addresses that can be seen in the domain XML
> +and those that are visible inside the guest OS.
> +
> +
> +Simple cases
> +

(...)

> +More complex cases
> +==

(...)

I'm wondering whether it is worth mentioning zPCI under 'More complex
cases', or maybe under 'Completely wacky cases', as the PCI addresses a
Linux guest will generate do not have any relation to whatever
addresses are used in the XML at all, but only to the zPCI attributes?



Re: [libvirt] libvirt mdev migration, mdevctl integration

2019-12-10 Thread Cornelia Huck
On Tue, 10 Dec 2019 10:36:36 +
Daniel P. Berrangé  wrote:

> On Tue, Dec 10, 2019 at 11:24:44AM +0100, Cornelia Huck wrote:
> > On Tue, 10 Dec 2019 10:09:34 +
> > Daniel P. Berrangé  wrote:
> >   
> > > On Mon, Dec 09, 2019 at 02:23:38PM -0600, Jonathon Jongsma wrote:  
> > > > mdevctl also supports assigning arbitrary sysfs attributes to a device.
> > > > These attributes have an explicit ordering and are written to sysfs in
> > > > the specified order when a device is started. This might be the only
> > > > thing that doesn't fit into the current xml format.
> > 
> > Not sure how much the 'explicit ordering' is actually required by the
> > devices currently supporting this. It's probably a good idea to keep
> > this, though, as future device types might end up having such a
> > requirement.
> >   
> > > Well we need to define a schema, but there will need to be some kind
> > > of validation added because. AFAICT, mdevctl does no validation, so a
> > > plain passthrough of this allows arbitrary writing of files anywhere
> > > on the host given a suitable malicious attribute name.  
> > 
> > Uh, we really should do something about that in mdevctl as well. Writes
> > outside the sysfs hierarchy should not be allowed.  
> 
> I'm pretty worried about overall safety/reliability of the mdevctrl
> tool in general. Given that it is written in shell, it is really hard
> to ensure that it isn't vulnerable to any shell quoting / meta character
> flaws, whether from malicious or accidental data input.

I'm not sure I'm trusting myself too much to get that right, either...
review obviously welcome, but this is shell, as you say.

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] libvirt mdev migration, mdevctl integration

2019-12-10 Thread Cornelia Huck
On Tue, 10 Dec 2019 10:09:34 +
Daniel P. Berrangé  wrote:

> On Mon, Dec 09, 2019 at 02:23:38PM -0600, Jonathon Jongsma wrote:
> > mdevctl also supports assigning arbitrary sysfs attributes to a device.
> > These attributes have an explicit ordering and are written to sysfs in
> > the specified order when a device is started. This might be the only
> > thing that doesn't fit into the current xml format.  

Not sure how much the 'explicit ordering' is actually required by the
devices currently supporting this. It's probably a good idea to keep
this, though, as future device types might end up having such a
requirement.

> Well we need to define a schema, but there will need to be some kind
> of validation added because. AFAICT, mdevctl does no validation, so a
> plain passthrough of this allows arbitrary writing of files anywhere
> on the host given a suitable malicious attribute name.

Uh, we really should do something about that in mdevctl as well. Writes
outside the sysfs hierarchy should not be allowed.

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] libvirt mdev migration, mdevctl integration

2019-11-26 Thread Cornelia Huck
On Tue, 26 Nov 2019 10:54:59 +0100
Boris Fiuczynski  wrote:

> On 11/25/19 6:14 PM, Cornelia Huck wrote:
> > Also, I'm wondering if we need special care for vfio-ap, although I'm
> > not sure if it is feasible to add migration support for it all. We
> > currently have a matrix device (always same parent) defined by the
> > UUID, and adapters/domains configured for this matrix device (which is
> > handled as extra parameters in the mdevctl device config). I'm not sure
> > how different adapters/domains translate between systems we want to
> > migrate between. Not sure how much sense it makes to dwell on this at  
> Aside from the card preparation with the appropriate masterkeys the 
> adapter/domain configuration (including the card types) for an mdev 
> needs to remain the same since there is no virtualization of 
> adapter/domain addresses in the current vfio-ap driver implementation. 
> As a result a currently possible migration scenario is cross-CEC.

Ok, given the non-virtualization of queue addresses, we need an exact
match on both sides.

> 
>  From libvirts perspective:
> Assuming that mdevs on the source and target system exist, would a 
> matching UUID be enough assurance that these two host resources match 
> for a migration? If not, is a check performed on the configuration of 
> the two mdevs? What is in that case considered migration save? Where are 
> these checks implemented? Does the checking for migratablity go beyond 
> the configuration data of mdev devices, e.g. vfio-ap: check for 
> existence of masterkeys, card type equivalency or as Connie mentioned 
> before on vfio-ccw the equivalency of the child ccw device of the 
> subchannels.

Entrusting a management layer with setting up the other side probably
makes the most sense, at the very least for an initial implementation.

One concern I have: How easy is it to find out that the management
layer has messed things up? Ideally, we want to find out as early as
possible that the other side does not match and abort the migration.
Limping on with subtle errors would be the worst case.

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list



Re: [libvirt] libvirt mdev migration, mdevctl integration

2019-11-26 Thread Cornelia Huck
On Mon, 25 Nov 2019 17:47:26 +
Daniel P. Berrangé  wrote:

> On Mon, Nov 25, 2019 at 06:14:33PM +0100, Cornelia Huck wrote:
> > On Mon, 18 Nov 2019 19:00:25 +
> > Daniel P. Berrangé  wrote:
> >   
> > > On Mon, Nov 18, 2019 at 10:06:34AM -0700, Alex Williamson wrote:  

> > > > An issue here though is that the json will also include the parent
> > > > device, which we obviously cannot assume is the same (particularly the
> > > > bus address) on the migration target.  We can allow commandline
> > > > overrides for the parent just as we do above for the UUID when defining
> > > > the mdev device from json, but it's an open issue who is going to be
> > > > smart enough (perhaps dumb enough) to claim this responsibility.  It
> > > > would be interesting to understand how libvirt handles other host
> > > > specific information during migration, for instance if node or processor
> > > > affinities are part of the VM XML, how is that translated to the
> > > > target?  I could imagine that we could introduce a simple "first
> > > > available" placement in mdevctl, but maybe there should minimally be a
> > > > node allocation preference with optional enforcement (similar to
> > > > numactl), or maybe something above libvirt needs to take this
> > > > responsibility to prepare the target before we get ourselves into
> > > > trouble.
> > > 
> > > I don't think we need to solve placement in libvirt.
> > > 
> > > The guest XML will just reference the mdev via a UUID that
> > > was used with virNodeDeviceDefineXML. 
> > > 
> > > The virNodeDeviceDefineXML call where the mdev is first defined
> > > will set the details of the mdev creation for this specific host.
> > > The XML used with virNodeDeviceDefineXML can be different on the
> > > source + target hosts. As long as the UUID is the same in both
> > > hosts, the VM will associate with it correctly.  
> > 
> > I wonder how to sync up with different placements, but maybe I'm just
> > missing something.
> > 
> > Looking at this from the vfio-ccw angle, we can easily have the same
> > device (as identified by the device number) on different subchannels
> > (parents). To find out the device number, you need to look at the child
> > ccw device of the subchannel while it is *not* bound to vfio-ccw, but
> > to the normal I/O subchannel driver instead. Or ask your admin for the
> > system definition...  
> 
> This just means that whoever/whatever is invoking "virDomainDeviceDefinXML"
> or "mdevctl create" will pass different parameters on each host. When
> migrating a guest the mgmt app can indicate which device should be used
> for the guest on each host. This is similar issue to migrating a guest
> which uses a ethNNN device that's got different name on each host ,or
> a /dev/sdNNN that's different on each host, etc

Ok, so the burden will be on a management layer resp. the admin to make
sure that the correct device is in place, even if it resides in
different places in the topology? Makes sense, I guess.

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] libvirt mdev migration, mdevctl integration

2019-11-25 Thread Cornelia Huck
On Mon, 18 Nov 2019 19:00:25 +
Daniel P. Berrangé  wrote:

> On Mon, Nov 18, 2019 at 10:06:34AM -0700, Alex Williamson wrote:
> > Hey folks,
> > 
> > We had some discussions at KVM Forum around mdev live migration and
> > what that might mean for libvirt handling of mdev devices and
> > potential libvirt/mdevctl[1] flows.  I believe the current situation is
> > that libvirt knows nothing about an mdev beyond the UUID in the XML.
> > It expects the mdev to exist on the system prior to starting the VM.
> > The intention is for mdevctl to step in here by providing persistence
> > for mdev devices such that these pre-defined mdevs are potentially not
> > just ephemeral, for example, we can tag specific mdevs for automatic
> > startup on each boot.
> > 
> > It seems the next step in this journey is to figure out if libvirt can
> > interact with mdevctl to "manage" a device.  I believe we've avoided
> > defining managed='yes' behavior for mdev hostdevs up to this point
> > because creating an mdev device involves policy decisions.  For
> > example, which parent device hosts the mdev, are there optimal NUMA
> > considerations, are there performance versus power considerations, what
> > is the nature of the mdev, etc.  mdevctl doesn't necessarily want to
> > make placement decisions either, but it does understand how to create
> > and remove an mdev, what it's type is, associate it to a fixed
> > parent, apply attributes, etc.  So would it be reasonable that for a
> > manage='yes' mdev hostdev device, libvirt might attempt to use mdevctl
> > to start an mdev by UUID and stop it when the VM is shutdown?  This
> > assumes the mdev referenced by the UUID is already defined and known to
> > mdevct.  I'd expect semantics much like managed='yes' around vfio-pci
> > binding, ex. start/stop if it doesn't exist, leave it alone if it
> > already exists.
> > 
> > If that much seems reasonable, and someone is willing to invest some
> > development time to support it, what are then the next steps to enable
> > migration?  
> 
> The first step is to deal with our virNodeDevice APIs.
> 
> Currently we have
> 
>  - Listing devices via ( virConnectListAllNodeDevices )
>  - Create transient device ( virNodeDeviceCreateXML )
>  - Delete transient device ( virNodeDeviceDestroy )
> 
> The create/delete APIs only deal with NPIV HBAs right now, so we need
> to extend that to deal with mdevs as first step.

I assume the listing function already deals with all device types
supported by libvirt?

> 
> This entails defining an XML format that can represent the information
> we need about an mdev. We'll then have to convert from this XML into
> the JSON format and invoke mdevctl as needed to create/delete the
> devices.
> 
> During startup we'll also want to query mdevctl to detect any existing
> devices, parse their JSON, so we can report them with
> virConnectListAllNodeDevices.
> 
> If we allow people to create/delete mdevs behind libvirt's back, then
> we'll need to be able to watch for those coming/going, so that we are
> up2date when people call virConnectListAllNodeDevices.
> 
> 
> Transient devices are fine if an external mgmt app (OpenStack, etc)
> wants to explicitly create things at time of guest boot. Not everyone
> will want that though, so we'll need persistent device support.
> 
> This will means creating new APIs in libvirt
> 
>   - Define a persistent device virNodeDeviceDefineXML
>   - Create a persistent device  virNodeDeviceCreate
>   - Undefine a persistent device virNodeDeviceUndefine
> 
> again these will mostly just be wired through to mdevctl, converting
> our XML into JSON. We dont need to store libvirt's own XML format on
> disk anywhere, since mdevctl defines a storage format we can use.

That sounds good.

> 
> 
> With this done in the virNodeDevice APIs we have the building blocks
> needed to support "managed=yes" in the domain XML.
> 
> > AIUI, libvirt blindly assumes hostdev devices cannot be migrated.  This
> > may already be getting some work due to Jens' network failover support
> > where the attached hostdev doesn't really migrate, but it allows the
> > migration to proceed in a partially detached state so that it can jump
> > back into action should the migration fail.  Long term we expect that
> > not only some mdev hostdevs might be migratable, but possibly some
> > regular vfio-pci hostdevs as well.  I think libvirt will need to remove
> > any assumptions around hostdev migration and rather rely on
> > introspection of the QEMU process to determine if any devices hold
> > migration blockers (or simply try the migration and let QEMU fail
> > quickly if there are blockers).  
> 
> Does QEMU provide any way to report that yet ? If not, we'll need to
> get planning on how to report migratability of devices via QMP.

I'm not aware of any incantation to get that info.

We have two ways of marking a device unmigratable:
- Add a migration blocker (kept in a list for the whole machine). While
  a test 

Re: [libvirt] [PATCH 0/6] VFIO mdev aggregated resources handling

2019-11-07 Thread Cornelia Huck
On Wed, 6 Nov 2019 11:44:40 -0700
Alex Williamson  wrote:

> On Wed, 6 Nov 2019 12:20:31 +0800
> Zhenyu Wang  wrote:
> 
> > On 2019.11.05 14:10:42 -0700, Alex Williamson wrote:  
> > > On Thu, 24 Oct 2019 13:08:23 +0800
> > > Zhenyu Wang  wrote:
> > > 
> > > > Hi,
> > > > 
> > > > This is a refresh for previous send of this series. I got impression 
> > > > that
> > > > some SIOV drivers would still deploy their own create and config method 
> > > > so
> > > > stopped effort on this. But seems this would still be useful for some 
> > > > other
> > > > SIOV driver which may simply want capability to aggregate resources. So 
> > > > here's
> > > > refreshed series.
> > > > 
> > > > Current mdev device create interface depends on fixed mdev type, which 
> > > > get uuid
> > > > from user to create instance of mdev device. If user wants to use 
> > > > customized
> > > > number of resource for mdev device, then only can create new mdev type 
> > > > for that
> > > > which may not be flexible. This requirement comes not only from to be 
> > > > able to
> > > > allocate flexible resources for KVMGT, but also from Intel scalable IO
> > > > virtualization which would use vfio/mdev to be able to allocate 
> > > > arbitrary
> > > > resources on mdev instance. More info on [1] [2] [3].
> > > > 
> > > > To allow to create user defined resources for mdev, it trys to extend 
> > > > mdev
> > > > create interface by adding new "aggregate=xxx" parameter following 
> > > > UUID, for
> > > > target mdev type if aggregation is supported, it can create new mdev 
> > > > device
> > > > which contains resources combined by number of instances, e.g
> > > > 
> > > > echo ",aggregate=10" > create
> > > > 
> > > > VM manager e.g libvirt can check mdev type with "aggregation" attribute 
> > > > which
> > > > can support this setting. If no "aggregation" attribute found for mdev 
> > > > type,
> > > > previous behavior is still kept for one instance allocation. And new 
> > > > sysfs
> > > > attribute "aggregated_instances" is created for each mdev device to 
> > > > show allocated number.
> > > 
> > > Given discussions we've had recently around libvirt interacting with
> > > mdev, I think that libvirt would rather have an abstract interface via
> > > mdevctl[1].  Therefore can you evaluate how mdevctl would support this
> > > creation extension?  It seems like it would fit within the existing
> > > mdev and mdevctl framework if aggregation were simply a sysfs attribute
> > > for the device.  For example, the mdevctl steps might look like this:
> > > 
> > > mdevctl define -u UUID -p PARENT -t TYPE
> > > mdevctl modify -u UUID --addattr=mdev/aggregation --value=2
> > > mdevctl start -u UUID
> > > 
> > > When mdevctl starts the mdev, it will first create it using the
> > > existing mechanism, then apply aggregation attribute, which can consume
> > > the necessary additional instances from the parent device, or return an
> > > error, which would unwind and return a failure code to the caller
> > > (libvirt).  I think the vendor driver would then have freedom to decide
> > > when the attribute could be modified, for instance it would be entirely
> > > reasonable to return -EBUSY if the user attempts to modify the
> > > attribute while the mdev device is in-use.  Effectively aggregation
> > > simply becomes a standardized attribute with common meaning.  Thoughts?
> > > [cc libvirt folks for their impression] Thanks,
> > 
> > I think one problem is that before mdevctl start to create mdev you
> > don't know what vendor attributes are, as we apply mdev attributes
> > after create. You may need some lookup depending on parent.. I think
> > making aggregation like other vendor attribute for mdev might be the
> > simplest way, but do we want to define its behavior in formal? e.g
> > like previous discussed it should show maxium instances for aggregation, 
> > etc.  
> 
> Yes, we'd still want to standardize how we enable and discover
> aggregation since we expect multiple users.  Even if libvirt were to
> use mdevctl as it's mdev interface, higher level tools should have an
> introspection mechanism available.  Possibly the sysfs interfaces
> proposed in this series remains largely the same, but I think perhaps
> the implementation of them moves out to the vendor driver.  In fact,
> perhaps the only change to mdev core is to define the standard.  For
> example, the "aggregation" attribute on the type is potentially simply
> a defined, optional, per type attribute, similar to "name" and
> "description".  For "aggregated_instances" we already have the
> mdev_attr_groups of the mdev_parent_ops, we could define an
> attribute_group with .name = "mdev" as a set of standardized
> attributes, such that vendors could provide both their own vendor
> specific attributes and per device attributes with a common meaning and
> semantic defined in the mdev ABI.

+1 to standardizing this. While not every vendor driver will support
aggregation, 

Re: [libvirt] Call for volunteers: LWN.net articles about KVM Forum talks

2019-09-18 Thread Cornelia Huck
On Tue, 17 Sep 2019 14:02:59 +0100
Stefan Hajnoczi  wrote:

> Hi,
> LWN.net is a popular open source news site that covers Linux and other
> open source communities (Python, GNOME, Debian, etc).  It has published
> a few KVM articles in the past too.
> 
> Let's raise awareness of QEMU, KVM, and libvirt by submitting articles 
> covering
> KVM Forum.

Great idea!

> 
> I am looking for ~5 volunteers who are attending KVM Forum to write an article
> about a talk they find interesting.
> 
> Please pick a talk you'd like to cover and reply to this email thread.
> I will then send an email to LWN with a heads-up so they can let us know
> if they are interested in publishing a KVM Forum special.  I will not
> ask LWN.net for money.
> 
> KVM Forum schedule:
> https://events.linuxfoundation.org/events/kvm-forum-2019/program/schedule/

I think it might make sense to cover "Managing Matryoshkas: Testing
Nested Guests" (Marc Hartmayer) and "Nesting" (Vitaly
Kuznetsov) in one article, and I volunteer for that.
> 
> LWN.net guidelines:
> https://lwn.net/op/AuthorGuide.lwn
> "Our general guideline is for articles to be around 1500 words in
> length, though somewhat longer or shorter can work too. The best
> articles cover a fairly narrow topic completely, without any big
> omissions or any extra padding."
> 
> I volunteer to cover Michael Tsirkin's "VirtIO without the Virt -
> Towards Implementations in Hardware" talk.
> 
> Thanks,
> Stefan

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] mdevctl: A shoestring mediated device management and persistence utility

2019-07-01 Thread Cornelia Huck
On Mon, 1 Jul 2019 08:40:51 -0600
Alex Williamson  wrote:

> On Mon, 1 Jul 2019 10:20:43 +0200
> Cornelia Huck  wrote:
> 
> > On Fri, 28 Jun 2019 11:05:46 -0600
> > Alex Williamson  wrote:
> >   
> > > On Fri, 28 Jun 2019 11:06:48 +0200
> > > Cornelia Huck  wrote:
> >   
> > > > What do you think of a way to specify JSON for the attributes directly
> > > > on the command line? Or would it be better to just edit the config
> > > > files directly?  
> > > 
> > > Supplying json on the command like seems difficult, even doing so with
> > > with jq requires escaping quotes.  It's not a very friendly
> > > experience.  Maybe something more like how virsh allows snippets of xml
> > > to be included, we could use jq to validate a json snippet provided
> > > as a file and add it to the attributes... of course if we need to allow
> > > libvirt to modify the json config files directly, the user could do
> > > that as well.  Is there a use case you're thinking of?  Maybe we could
> > > augment the 'list' command to take a --uuid and --dumpjson option and
> > > the 'define' command to accept a --jsonfile.  Maybe the 'start' command
> > > could accept the same, so a transient device could define attributes
> > > w/o excessive command line options.  Thanks,
> > > 
> > > Alex
> > 
> > I was mostly thinking about complex configurations where writing a JSON
> > config would be simpler than adding a lot of command line options.
> > Something like dumping a JSON file and allowing to refer to a JSON file
> > as you suggested could be useful; but then, those very complex use
> > cases are probably already covered by editing the config file directly.
> > Not sure if it is worth the effort; maybe just leave it as it is for
> > now.  
> 
> Well, I already did it.  It seems useful for creating transient devices
> with attribute specifications.  If it's too ugly we can drop it.

I should probably look at the repository before I reply :)

Anyway, this doesn't look too ugly to me; but I think it would benefit
from some usage examples (which I just sent you a pull request for :)

> Thanks,
> 
> Alex

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] mdevctl: A shoestring mediated device management and persistence utility

2019-07-01 Thread Cornelia Huck
On Fri, 28 Jun 2019 11:05:46 -0600
Alex Williamson  wrote:

> On Fri, 28 Jun 2019 11:06:48 +0200
> Cornelia Huck  wrote:

> > What do you think of a way to specify JSON for the attributes directly
> > on the command line? Or would it be better to just edit the config
> > files directly?  
> 
> Supplying json on the command like seems difficult, even doing so with
> with jq requires escaping quotes.  It's not a very friendly
> experience.  Maybe something more like how virsh allows snippets of xml
> to be included, we could use jq to validate a json snippet provided
> as a file and add it to the attributes... of course if we need to allow
> libvirt to modify the json config files directly, the user could do
> that as well.  Is there a use case you're thinking of?  Maybe we could
> augment the 'list' command to take a --uuid and --dumpjson option and
> the 'define' command to accept a --jsonfile.  Maybe the 'start' command
> could accept the same, so a transient device could define attributes
> w/o excessive command line options.  Thanks,
> 
> Alex

I was mostly thinking about complex configurations where writing a JSON
config would be simpler than adding a lot of command line options.
Something like dumping a JSON file and allowing to refer to a JSON file
as you suggested could be useful; but then, those very complex use
cases are probably already covered by editing the config file directly.
Not sure if it is worth the effort; maybe just leave it as it is for
now.

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] mdevctl: A shoestring mediated device management and persistence utility

2019-06-28 Thread Cornelia Huck
On Thu, 27 Jun 2019 19:57:04 -0600
Alex Williamson  wrote:

> On Thu, 27 Jun 2019 15:15:02 -0600
> Alex Williamson  wrote:
> 
> > On Thu, 27 Jun 2019 09:38:32 -0600
> > Alex Williamson  wrote:  
> > > > On 6/27/19 8:26 AM, Cornelia Huck wrote:  
> > > > > 
> > > > > {
> > > > >   "foo": "1",
> > > > >   "bar": "42",
> > > > >   "baz": {
> > > > > "depends": ["foo", "bar"],
> > > > > "value": "plahh"
> > > > >   }
> > > > > }
> > > > > 
> > > > > Something like that?  
> > > 
> > > I'm not sure yet.  I think we need to look at what's feasible (and
> > > easy) with jq.  Thanks,
> > 
> > I think it's not too much trouble to remove and insert into arrays, so
> > what if we were to define the config as:
> > 
> > {
> >   "mdev_type":"vendor-type",
> >   "start":"auto",
> >   "attrs": [
> >   {"attrX":["Xvalue1","Xvalue2"]},
> >   {"dir/attrY": "Yvalue1"},
> >   {"attrX": "Xvalue3"}
> > ]
> > }
> > 
> > "attr" here would define sysfs attributes under the device.  The array
> > would be processed in order, so in the above example we'd do the
> > following:
> > 
> >  1. echo Xvalue1 > attrX
> >  2. echo Xvalue2 > attrX
> >  3. echo Yvalue1 > dir/attrY
> >  4. echo Xvalue3 > attrX
> > 
> > When starting the device mdevctl would simply walk the array, if the
> > attribute key exists write the value(s).  If a write fails or the
> > attribute doesn't exist, remove the device and report error.

Yes, I think it makes sense to fail the startup of a device where we
cannot set all attributes to the requested values.

> > 
> > I think it's easiest with jq to manipulate arrays by removing and
> > inserting by index.  Also if we end up with something like above, it's
> > ambiguous if we reference the "attrX" key.  So perhaps we add the
> > following options to the modify command:
> > 
> > --addattr=ATTRIBUTE --delattr --index=INDEX --value=VALUE1[,VALUE2]
> > 
> > We could handle it like a stack, so if --index is not supplied, add to
> > the end or remove from the end.  If --index is provided, delete that
> > index or add the attribute at that index.  So if you had the above and
> > wanted to remove Xvalue1 but keep the ordering, you'd do:
> > 
> > --delattr --index=0
> > --addattr --index=0 --value=Xvalue2
> > 
> > Which should results in:
> > 
> >   "attrs": [
> >   {"attrX": "Xvalue2"},
> >   {"dir/attrY": "Yvalue1"},
> >   {"attrX": "Xvalue3"}
> > ]

Modifying by index looks reasonable; I just sent a pull request to
print the index of an attribute out as well, so it is easier to specify
the right attribute to modify.

> > 
> > If we want to modify a running device, I'm thinking we probably want a
> > new command and options --attr=ATTRIBUTE --value=VALUE might suffice.
> > 
> > Do we need to support something like this for the 'start' command or
> > should we leave that for simple devices and require a sequence of:
> > 
> > # mdevctl define ...
> > # mdevctl modify --addattr...
> > ...
> > # mdevctl start
> > # mdevctl undefine
> > 
> > This is effectively the long way to get a transient device.  Otherwise
> > we'd need to figure out how to have --attr --value appear multiple
> > times on the start command line.  Thanks,  

What do you think of a way to specify JSON for the attributes directly
on the command line? Or would it be better to just edit the config
files directly?

> 
> This is now implemented, and yes you can specify '--addattr remove
> --value 1' and mdevctl will immediately remove the device after it's
> created (more power to the admin).  Listing defined devices also lists

Fun ;)

> any attributes defined for easy inspection.  It is also possible to
> override the conversion of comma separated values into an array by
> encoding and escaping the comma.  It's a little cumbersome, but
> possible in case a driver isn't fully on board with the one attribute,
> one value rule of sysfs.  Does this work for vfio-ap?  I also still

I do not have ap devices to actually test this with; but defining a
device and adding attributes seems to work.

> need to check if this allows an NVIDIA vGPU mdev to be configured such
> that the framerate limiter can be automatically controlled.  Thanks,
> 
> Alex

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] mdevctl: A shoestring mediated device management and persistence utility

2019-06-27 Thread Cornelia Huck
On Wed, 26 Jun 2019 19:53:50 -0600
Alex Williamson  wrote:

> On Wed, 26 Jun 2019 08:37:20 -0600
> Alex Williamson  wrote:
> 
> > On Wed, 26 Jun 2019 11:58:06 +0200
> > Cornelia Huck  wrote:
> >   
> > > On Tue, 25 Jun 2019 16:52:51 -0600
> > > Alex Williamson  wrote:
> > > 
> > > > Hi,
> > > > 
> > > > Based on the discussions we've had, I've rewritten the bulk of
> > > > mdevctl.  I think it largely does everything we want now, modulo
> > > > devices that will need some sort of 1:N values per key for
> > > > configuration in the config file versus the 1:1 key:value setup we
> > > > currently have (so don't consider the format final just yet).  
> > > 
> > > We might want to factor out that config format handling while we're
> > > trying to finalize it.
> > > 
> > > cc:ing Matt for his awareness. I'm currently not quite sure how to
> > > handle those vfio-ap "write several values to an attribute one at a
> > > time" requirements. Maybe 1:N key:value is the way to go; maybe we
> > > need/want JSON or something like that.
> > 
> > Maybe we should just do JSON for future flexibility.  I assume there
> > are lots of helpers that should make it easy even from a bash script.
> > I'll look at that next.  
> 
> Done.  Throw away any old mdev config files, we use JSON now. 

The code changes look quite straightforward, thanks.

> The per
> mdev config now looks like this:
> 
> {
>   "mdev_type": "i915-GVTg_V4_8",
>   "start": "auto"
> }
> 
> My expectation, and what I've already pre-enabled support in set_key
> and get_key functions, is that we'd use arrays for values, so we might
> have:
> 
>   "new_key": ["value1", "value2"]
> 
> set_key will automatically convert a comma separated list of values
> into such an array, so I'm thinking this would be specified by the user
> as:
> 
> # mdevctl modify -u UUID --key=new_key --value=value1,value2

Looks sensible.

For vfio-ap, we'd probably end up with something like the following:

{
  "mdev_type": "vfio_ap-passthrough",
  "start": "auto",
  "assign_adapter": ["5", "6"],
  "assign_domain": ["4", "0xab"]
}

(following the Guest1 example in the kernel documentation)



> 
> We should think about whether ordering is important and maybe
> incorporate that into key naming conventions or come up with some
> syntax for specifying startup blocks.  Thanks,
> 
> Alex

Hm...

{
  "foo": "1",
  "bar": "42",
  "baz": {
"depends": ["foo", "bar"],
"value": "plahh"
  }
}

Something like that?

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] mdevctl: A shoestring mediated device management and persistence utility

2019-06-26 Thread Cornelia Huck
On Tue, 25 Jun 2019 16:52:51 -0600
Alex Williamson  wrote:

> Hi,
> 
> Based on the discussions we've had, I've rewritten the bulk of
> mdevctl.  I think it largely does everything we want now, modulo
> devices that will need some sort of 1:N values per key for
> configuration in the config file versus the 1:1 key:value setup we
> currently have (so don't consider the format final just yet).

We might want to factor out that config format handling while we're
trying to finalize it.

cc:ing Matt for his awareness. I'm currently not quite sure how to
handle those vfio-ap "write several values to an attribute one at a
time" requirements. Maybe 1:N key:value is the way to go; maybe we
need/want JSON or something like that.

> 
> We now support "transient" devices and there's no distinction or
> difference in handling of such devices whether they're created by
> mdevctl or externally.  All devices will also have systemd management,
> though systemd is no longer required, it's used if available.  The
> instance name used for systemd device units has also changed to allow
> us to use BindsTo= such that services are not only created, but are
> also removed if the device is removed.  Unfortunately it's not a simple
> UUID via the systemd route any longer.

That's a bit unfortunate; however, making it workable without systemd
certainly is a good thing :)

> 
> Since the original posting, the project has moved from my personal
> github to here:
> 
> https://github.com/mdevctl/mdevctl
> 
> Please see the README there for overview of the new commands and
> example of their usage.  There is no attempt to maintain backwards
> compatibility with previous versions, this tool is in its infancy.
> Also since the original posting, RPM packaging is included, so simply
> run 'make rpm' and install the resulting package.

Nice.

> 
> Highlights of this version include proper argument parsing via getopts
> so that options can be provided in any order.  I'm still using the
> format 'mdevctl {command} [options]' but now it's consistent that all
> the options come after the command, in any order.  I think this is
> relatively consistent with a variety of other tools.

Parsing via getops is also very nice.

> 
> Devices are no longer automatically persisted, we handle them as
> transient, but we also can promote them to persistent through the
> 'define' command.  The define, undefine, and modify commands all
> operate only on the config file, so that we can define separate from
> creating.  When promoting from a transient to defined device, we can
> use the existing device to create the config.  Both the type and the
> startup of a device can be modified in the config, without affecting
> the running device.
> 
> Starting an mdev device no longer relies exclusively on a saved config,
> the device can be fully specified via options to create a transient
> device.  Specifying only a uuid is also sufficient for a defined
> device.  Some consideration has also been given to uuid collisions.
> The mdev interface in the kernel prevents multiple mdevs with the same
> uuid running concurrently, but mdevctl allows mdevs to be defined with
> the same uuid under separate parent devices.  Some options therefore
> allow both a uuid and parent to be specified and require this if the
> uuid alone is ambiguous.  Clearly starting two such devices at the same
> time will fail and is left to higher level tools to manage, just like
> the ability to define more devices than there are available instances on
> the host system.

I still have to look into the details of this.

> 
> The stop and list commands are largely the same ideas as previous
> though the semantics are completely different.  Listing running devices
> now notes which are defined versus transient.  Perhaps it might also be
> useful when listing defined devices to note which are running.

Yes, I think it would be useful.

> 
> The sbin/libexec split of mdevctl has been squashed.  There are some
> commands in the script that are currently only intended to be used from
> udev or systemd, these are simply excluded from the help.  It's
> possible we may want to promote the start-parent-mdevs command out of
> this class, but the rest are specifically systemd helpers.
> 
> I'll include the current help test message below for further semantic
> details, but please have a look-see, or better yet give it a try.

Had a quick look, sent two pull requests with some minor tweaks. Still
have to do a proper review, and actually try the thing on an s390.

> Thanks,
> 
> Alex
> 
> PS - I'm looking at adding udev change events when a device registers
> or unregisters with the mdev core, which should help us know when to
> trigger creation of persistent, auto started devices.  That support is
> included here with the MDEV_STATE="registered|unregistered" environment
> values.  Particularly, kvmgt now supports dynamic loading an unloading,
> so as long as the enable_gvt=1 option is provided to the i915 driver

Re: [libvirt] mdevctl: A shoestring mediated device management and persistence utility

2019-06-19 Thread Cornelia Huck
On Wed, 19 Jun 2019 11:04:15 +0200
Sylvain Bauza  wrote:

> On Wed, Jun 19, 2019 at 12:27 AM Alex Williamson 
> wrote:
> 
> > On Tue, 18 Jun 2019 14:48:11 +0200
> > Sylvain Bauza  wrote:
> >  
> > > On Tue, Jun 18, 2019 at 1:01 PM Cornelia Huck  wrote:

> > > > I think we need to reach consensus about the actual scope of the
> > > > mdevctl tool.
> > > >
> > > >  
> > > Thanks Cornelia, my thoughts:
> > >
> > > - Is it supposed to be responsible for managing *all* mdev devices in  
> > > >   the system, or is it more supposed to be a convenience helper for
> > > >   users/software wanting to manage mdevs?
> > > >  
> > >
> > > The latter. If an operator (or some software) wants to create mdevs by  
> > not  
> > > using mdevctl (and rather directly calling the sysfs), I think it's OK.
> > > That said, mdevs created by mdevctl would be supported by systemctl,  
> > while  
> > > the others not but I think it's okay.  
> >
> > I agree (sort of), and I'm hearing that we should drop any sort of
> > automatic persistence of mdevs created outside of mdevctl.  The problem
> > comes when we try to draw the line between unmanaged and manged
> > devices.  For instance, if we have a command to list mdevs it would
> > feel incomplete if it didn't list all mdevs both those managed by
> > mdevctl and those created elsewhere.  For managed devices, I expect
> > we'll also have commands that allow the mode of the device to be
> > switched between transient, saved, and persistent.  Should a user then
> > be allowed to promote an unmanaged device to one of these modes via the
> > same command?  Should they be allowed to stop an unmanaged device
> > through driverctl?  Through systemctl?  These all seem like reasonable
> > things to do, so what then is the difference between transient and
> > unmanaged mdev and is mdevctl therefore managing all mdevs, not just
> > those it has created?
> >
> >  
> Well, IMHO, mdevs created by mdevctl could all be persisted or transient
> just by adding an option when calling mdevctl, like :
> "mdevctl create-mdev [--transient]   " where default
> would be persisting the mdev.

This sounds useful; the caller can avoid fiddling with sysfs entries
directly, while not committing to a permanent configuration.

> 
> For mdevs *not* created by mdevctl, then a usecase could be "I'd like to
> ask mdevctl to manage mdevs I already created" and if so, a mdevctl command
> like :
> "mdevctl manage-mdev [--transient] "

What kind of 'managing' would this actually enable? If we rely on
mdevctl working with sysfs directly for transient devices, I can't
really think of anything...

> 
> Of course, that would mean that when you list mdevs by "mdev list-all" you
> wouldn't get mdevs managed by mdevctl.
> Thoughts ?
> 
> > - Do we want mdevctl to manage config files for individual mdevs, or  
> > > >   are they supposed to be in a common format that can also be managed
> > > >   by e.g. libvirt?
> > > >  
> > >
> > > Unless I misunderstand, I think mdevctl just helps to create mdevs for
> > > being used by guests created either by libvirt or QEMU or even others.
> > > How a guest would allocate a mdev (ie. saying "I'll use this specific  
> > mdev  
> > > UUID") is IMHO not something for mdevctl.  
> >
> > Right, mdevctl isn't concerned with how a specific mdev is used, but I
> > think what Connie is after is more the proposal from Daniel where
> > libvirt can essentially manage mdevctl config files itself and then
> > only invoke mdevctl for the dirty work of creating and deleting
> > devices.  In fact, assuming systemd, libvirt could avoid direct
> > interaction with mdevctl entirely, instead using systemctl device units
> > to start and stop the mdevs.  Maybe where that proposal takes a turn is
> > when we again consider non-systemd hosts, where maybe mdevctl needs to
> > write out an init script per mdev and libvirt injecting itself into
> > manipulation of the config files would either need to perform the same
> > or fall back to mdevctl.  Unfortunately there seems to be an ultimatum
> > to either condone external config file manipulation or expand the scope
> > of the project into becoming a library.
> >
> >  
> Well, like I said, I think it's maybe another user case : just using
> libvirt when you want a guest having vGPUs and then libvirt would create
> mdevs (so users wouln't need to know at that).
> That sai

Re: [libvirt] mdevctl: A shoestring mediated device management and persistence utility

2019-06-19 Thread Cornelia Huck
On Wed, 19 Jun 2019 08:28:02 +0100
Daniel P. Berrangé  wrote:

> On Tue, Jun 18, 2019 at 04:12:10PM -0600, Alex Williamson wrote:
> > On Tue, 18 Jun 2019 14:48:11 +0200
> > Sylvain Bauza  wrote:
> >   
> > > On Tue, Jun 18, 2019 at 1:01 PM Cornelia Huck  wrote:

> > > > I think we need to reach consensus about the actual scope of the
> > > > mdevctl tool.
> > > >
> > > >
> > > Thanks Cornelia, my thoughts:
> > > 
> > > - Is it supposed to be responsible for managing *all* mdev devices in  
> > > >   the system, or is it more supposed to be a convenience helper for
> > > >   users/software wanting to manage mdevs?
> > > >
> > > 
> > > The latter. If an operator (or some software) wants to create mdevs by not
> > > using mdevctl (and rather directly calling the sysfs), I think it's OK.
> > > That said, mdevs created by mdevctl would be supported by systemctl, while
> > > the others not but I think it's okay.  
> > 
> > I agree (sort of), and I'm hearing that we should drop any sort of
> > automatic persistence of mdevs created outside of mdevctl.  The problem
> > comes when we try to draw the line between unmanaged and manged
> > devices.  For instance, if we have a command to list mdevs it would
> > feel incomplete if it didn't list all mdevs both those managed by
> > mdevctl and those created elsewhere.  For managed devices, I expect
> > we'll also have commands that allow the mode of the device to be
> > switched between transient, saved, and persistent.  Should a user then

Hm, what's the difference between 'saved' and 'persistent'? That
'saved' devices are not necessarily present?

> > be allowed to promote an unmanaged device to one of these modes via the
> > same command?  Should they be allowed to stop an unmanaged device
> > through driverctl?  Through systemctl?  These all seem like reasonable
> > things to do, so what then is the difference between transient and
> > unmanaged mdev and is mdevctl therefore managing all mdevs, not just
> > those it has created?  
> 
> To my mind there shouldn't really need to be a difference between
> transient mdevs created by mdevctrl and mdevs created by an user
> directly using sysfs. Both are mdevs on the running system with
> no config file that you have to enumerate by looking at sysfs.
> This ties back to my belief that we shouldn't need to have any
> config on disk for a transient mdev, just discover them all
> dynamically when required.

So mdevctl can potentially interact with any mdev device on the system,
it just has to be instructed by a user or software to do so? I think we
can work with that.

>  
> > > - Do we want mdevctl to manage config files for individual mdevs, or  
> > > >   are they supposed to be in a common format that can also be managed
> > > >   by e.g. libvirt?
> > > >
> > > 
> > > Unless I misunderstand, I think mdevctl just helps to create mdevs for
> > > being used by guests created either by libvirt or QEMU or even others.
> > > How a guest would allocate a mdev (ie. saying "I'll use this specific mdev
> > > UUID") is IMHO not something for mdevctl.  
> > 
> > Right, mdevctl isn't concerned with how a specific mdev is used, but I
> > think what Connie is after is more the proposal from Daniel where
> > libvirt can essentially manage mdevctl config files itself and then
> > only invoke mdevctl for the dirty work of creating and deleting
> > devices.  In fact, assuming systemd, libvirt could avoid direct
> > interaction with mdevctl entirely, instead using systemctl device units
> > to start and stop the mdevs.  Maybe where that proposal takes a turn is
> > when we again consider non-systemd hosts, where maybe mdevctl needs to
> > write out an init script per mdev and libvirt injecting itself into
> > manipulation of the config files would either need to perform the same
> > or fall back to mdevctl.  Unfortunately there seems to be an ultimatum
> > to either condone external config file manipulation or expand the scope
> > of the project into becoming a library.  
> 
> Is mdevctl really tackling a problem that is complex enough that we
> will gain significantly by keeping the config files private and forcing
> use of a CLI or Library to access them ?  The amount of information
> that we need to store per mdev looks pretty small, with minimal
> compound structure. To me it feels like we can easily define a standard
> config format without suffering any serious long term pain, as the chances
> we'd need to radic

Re: [libvirt] [PATCH RFC 1/1] allow to specify additional config data

2019-06-18 Thread Cornelia Huck
On Thu, 6 Jun 2019 10:15:52 -0600
Alex Williamson  wrote:

> On Thu, 6 Jun 2019 09:32:24 -0600
> Alex Williamson  wrote:
> 
> > On Thu,  6 Jun 2019 16:44:17 +0200
> > Cornelia Huck  wrote:
> >   
> > > Add a rough implementation for vfio-ap.
> > > 
> > > Signed-off-by: Cornelia Huck 
> > > ---
> > >  mdevctl.libexec | 25 ++
> > >  mdevctl.sbin| 56 -
> > >  2 files changed, 80 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/mdevctl.libexec b/mdevctl.libexec
> > > index 804166b5086d..cc0546142924 100755
> > > --- a/mdevctl.libexec
> > > +++ b/mdevctl.libexec
> > > @@ -54,6 +54,19 @@ wait_for_supported_types () {
> > >  fi
> > >  }
> > >  
> > > +# configure vfio-ap devices  
> > > +configure_ap_devices() {
> > > +list="`echo "${config[$1]}" | sed 's/,/ /'`"
> > > +[ -z "$list" ] && return
> > > +for a in $list; do
> > > +echo "$a" > 
> > > "$supported_types/${config[mdev_type]}/devices/$uuid/$2"
> > > +if [ $? -ne 0 ]; then
> > > +echo "Error writing '$a' to '$uuid/$2'" >&2
> > > +exit 1
> > > +fi
> > > +done
> > > +}
> > > +
> > >  case ${1} in
> > >  start-mdev|stop-mdev)
> > >  if [ $# -ne 2 ]; then
> > > @@ -148,6 +161,18 @@ case ${cmd} in
> > >  echo "Error creating mdev type ${config[mdev_type]} on 
> > > $parent" >&2
> > >  exit 1
> > >  fi
> > > +
> > > +# some types may specify additional config data
> > > +case ${config[mdev_type]} in
> > > +vfio_ap-passthrough)
> > 
> > I think this could have some application beyond ap too, I know NVIDIA
> > GRID vGPUs do have some controls under the vendor hierarchy of the
> > device, ex. setting the frame rate limiter.  The implementation here is
> > a bit rigid, we know a specific protocol for a specific mdev type, but
> > for supporting arbitrary vendor options we'd really just want to try to
> > apply whatever options are provided.  If we didn't care about ordering,
> > we could just look for keys for every file in the device's immediate
> > sysfs hierarchy and apply any value we find, independent of the
> > mdev_type, ex. intel_vgpu/foo=bar  Thanks,  
> 
> For example:
> 
> for key in find -P $mdev_base/$uuid/ \( -path
> "$mdev_base/$uuid/power/*" -o -path $mdev_base/$uuid/uevent -o -path 
> $mdev_base/$uuid/remove \) -prune -o -type f -print | sed -e 
> "s|$mdev_base/$uuid/||g"); do
>   [ -z ${config[$key]} ] && continue
>   ... parse value(s) and iteratively apply to key
> done
> 
> The find is a little ugly to exclude stuff, maybe we just let people do
> screwy stuff like specify remove=1 in their config.  Also need to think
> about whether we're imposing a delimiter to apply multiple values to a
> key that conflicts with the attribute usage.  Thanks,
> 
> Alex

Hm, so I tried to write something like that, but there's an obvious
drawback for the vfio-ap use case: we want to specify a list of values
to be written to an attribute. We would have to model that as a list of
key=value pairs; but that would make it harder to remove a specific
value. (I currently overwrite a given key=value pair with a new value
or delete it.) We could specify something like ^key=value to cancel out
key=value.

Does it make sense to write *all* values specified for a specific key
to that attribute in sequence, or may this have surprising
consequences? Can we live with those possible surprises?

> 
> > > +configure_ap_devices ap_adapters assign_adapter
> > > +configure_ap_devices ap_domains assign_domain
> > > +configure_ap_devices ap_control_domains 
> > > assign_control_domain
> > > +# TODO: is assigning idempotent? Should we unwind on 
> > > error?
> > > +;;
> > > +*)
> > > +;;
> > > +esac
> > >  ;;
> > >  
> > >  add-mdev)

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] mdevctl: A shoestring mediated device management and persistence utility

2019-06-18 Thread Cornelia Huck
On Mon, 17 Jun 2019 11:05:17 -0600
Alex Williamson  wrote:

> On Mon, 17 Jun 2019 16:10:30 +0100
> Daniel P. Berrangé  wrote:
> 
> > On Mon, Jun 17, 2019 at 08:54:38AM -0600, Alex Williamson wrote:  
> > > On Mon, 17 Jun 2019 15:00:00 +0100
> > > Daniel P. Berrangé  wrote:
> > > 
> > > > On Thu, May 23, 2019 at 05:20:01PM -0600, Alex Williamson wrote:
> > > > > Hi,
> > > > > 
> > > > > Currently mediated device management, much like SR-IOV VF management,
> > > > > is largely left as an exercise for the user.  This is an attempt to
> > > > > provide something and see where it goes.  I doubt we'll solve
> > > > > everyone's needs on the first pass, but maybe we'll solve enough and
> > > > > provide helpers for the rest.  Without further ado, I'll point to what
> > > > > I have so far:
> > > > > 
> > > > > https://github.com/awilliam/mdevctl
> > > > > 
> > > > > This is inspired by driverctl, which is also a bash utility.  mdevctl
> > > > > uses udev and systemd to record and recreate mdev devices for
> > > > > persistence and provides a command line utility for querying, listing,
> > > > > starting, stopping, adding, and removing mdev devices.  Currently, for
> > > > > better or worse, it considers anything created to be persistent.  I 
> > > > > can
> > > > > imagine a global configuration option that might disable this and
> > > > > perhaps an autostart flag per mdev device, such that mdevctl might
> > > > > simply "know" about some mdevs but not attempt to create them
> > > > > automatically.  Clearly command line usage help, man pages, and
> > > > > packaging are lacking as well, release early, release often, plus this
> > > > > is a discussion starter to see if perhaps this is sufficient to meet
> > > > > some needs.  
> > > > 
> > > > I think from libvirt's POV, we would *not* want devices to be made
> > > > unconditionally persistent. We usually wish to expose a choice to
> > > > applications whether to have resources be transient or persistent.
> > > > 
> > > > So from that POV, a global config option to turn off persistence
> > > > is not workable either. We would want control per-device, with
> > > > autostart control per device too.
> > > 
> > > The code has progressed somewhat in the past 3+ weeks, we still persist
> > > all devices, but the start-up mode can be selected per device or with a
> > > global default mode.  Devices configured with 'auto' start-up
> > > automatically while 'manual' devices are simply known and available to
> > > be started.  I imagine we could add a 'transient' mode where we purge
> > > the information about the device when it is removed or the next time
> > > the parent device is added.
> > 
> > Having a pesistent config written out & then purged later is still
> > problematic. If the host crashes, nothing will purge the config file,
> > so it will become a persistent device. Also when listing devices we
> > want to be able to report whether it is persistent or transient. The
> > obvious way todo that is to simply look if a config file exists or
> > not.  
> 
> I was thinking that the config file would identify the device as
> transient, therefore if the system crashed we'd have the opportunity to
> purge those entries on the next boot as we're processing the entries
> for that parent device.  Clearly it has yet to be implemented, but I
> expect there are some advantages to tracking devices via a transient
> config entry or else we're constantly re-discovering foreign mdevs.

I think we need to reach consensus about the actual scope of the
mdevctl tool.

- Is it supposed to be responsible for managing *all* mdev devices in
  the system, or is it more supposed to be a convenience helper for
  users/software wanting to manage mdevs?
- Do we want mdevctl to manage config files for individual mdevs, or
  are they supposed to be in a common format that can also be managed
  by e.g. libvirt?
- Should mdevctl be a stand-alone tool, provide library functions, or
  both? Related: should it keep any internal state that is not written
  to disk? (I think that also plays into the transient vs. persistent
  question.)

My personal opinion is that mdevctl should be able to tolerate mdevs
being configured by other means, but probably should not try to impose
its own configuration if it detects that (unless explicitly asked to do
so). Not sure how feasible that goal is.

A well-defined config file format is probably a win, even if it only
ends up being used by mdevctl itself.

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] mdevctl: A shoestring mediated device management and persistence utility

2019-06-17 Thread Cornelia Huck
On Fri, 14 Jun 2019 10:04:18 -0600
Alex Williamson  wrote:

> On Fri, 14 Jun 2019 17:06:15 +0200
> Christophe de Dinechin  wrote:
> 
> > > On 14 Jun 2019, at 16:23, Alex Williamson  
> > > wrote:
> > > 
> > > On Fri, 14 Jun 2019 11:54:42 +0200
> > > Christophe de Dinechin  wrote:

> > > Where is the parent/type ownership implied?
> > 
> > I did not imply it, but I read some concern about ownership
> > on your part in "they need to guess that an mdev device
> > with the same parent and type is *theirs*.” (emphasis mine)
> > 
> > I personally see no change on the “need to guess” implied
> > by the fact that you run uuidgen inside the script, so
> > that’s why I tried to guess what you meant.  
> 
> As I noted in the reply to the pull request, putting `uuidgen` inline
> was probably a bad example. 

FWIW, I just sent a pull req to get rid of that inline `uuidgen` in the
example.

> However, the difference is that the user
> has imposed the race on themselves if they invoke mdevctl like this,
> they've provided a uuid but they didn't record what it is.  This is the
> user's problem.  Pushing uuid selection into mdevctl makes it mdevctl's
> problem because the interface is fundamentally broken.
> 
> > > The intended semantics are
> > > "try to create this type of device under this parent”.
> > 
> > Agreed. Which is why I don’t see why trying to create
> > with some new UUID introduces any race (as long as
> > the script prints out that UUID, which I admit my patch
> > entirely failed to to)  
> 
> And that's the piece that makes it fundamentally broken.  Beyond that,
> it seems unnecessary.  I don't see this as the primary invocation of
> mdevctl and the functionality it adds is trivially accomplished in a
> wrapper, so what's the value?
> 
> > >>> How do you resolve two instances of this happening in parallel and both
> > >>> coming to the same conclusion which is their device.  If a user wants
> > >>> this sort of headache they can call mdevctl with `uuidgen` but I don't
> > >>> think we should encourage it further.  
> > >> 
> > >> I agree there is a race, but if anything, having a usage where you don’t
> > >> pass the UUID on the command line is a step in the right direction.
> > >> It leaves the door open for the create-mdev script to do smarter things,
> > >> like deferring the allocation of the mdevs to an entity that has slightly
> > >> more knowledge of the global system state than uuidgen.
> > > 
> > > A user might (likely) require a specific uuid to match their VM
> > > configuration.  I can only think of very niche use cases where a user
> > > doesn't care what uuid they get.
> > 
> > They do care. But I typically copy-paste my UUIDs, and then
> > 
> > 1. copy-pasting at the end is always faster than between
> > the command and other arguments (3-args case). 
> > 
> > 2. copy-pasting the output of the previous command is faster
> > than having one extra step where I need to copy the same thing twice
> > (2-args case).
> > 
> > So to me, if the script is intended to be used by humans, my
> > proposal makes it slightly more comfortable to use. Nothing more.  
> 
> This is your preference, but I wouldn't call it universal.  Specifying
> the uuid last seems backwards to me, we're creating an object so let's
> first name that object.  We then specify where that object should be
> created and what type it has.  This seems very logical to me, besides,
> it's also the exact same order we use when listing mdevs :P
> 
> Clearly there's personal preference here, so let's not arbitrarily pick
> a different preference.  If copy/paste order is more important to you
> then submit a patch to give mdevctl real argument processing so you can
> specify --uuid, --parent, --type in whatever order you want.

I agree that these are personal preferences :) Real argument processing
makes sense, however.

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH RFC 1/1] allow to specify additional config data

2019-06-13 Thread Cornelia Huck
On Tue, 11 Jun 2019 10:19:29 -0400
Tony Krowiak  wrote:

> On 6/7/19 4:03 PM, Alex Williamson wrote:
> > On Fri, 7 Jun 2019 14:26:13 -0400
> > Tony Krowiak  wrote:
> >   
> >> On 6/6/19 12:15 PM, Alex Williamson wrote:  
> >>> On Thu, 6 Jun 2019 09:32:24 -0600
> >>> Alex Williamson  wrote:
> >>>      
> >>>> On Thu,  6 Jun 2019 16:44:17 +0200
> >>>> Cornelia Huck  wrote:
> >>>> 
> >>>>> Add a rough implementation for vfio-ap.
> >>>>>
> >>>>> Signed-off-by: Cornelia Huck 
> >>>>> ---
> >>>>>mdevctl.libexec | 25 ++
> >>>>>mdevctl.sbin| 56 
> >>>>> -
> >>>>>2 files changed, 80 insertions(+), 1 deletion(-)
> >>>>>
> >>>>> diff --git a/mdevctl.libexec b/mdevctl.libexec
> >>>>> index 804166b5086d..cc0546142924 100755
> >>>>> --- a/mdevctl.libexec
> >>>>> +++ b/mdevctl.libexec
> >>>>> @@ -54,6 +54,19 @@ wait_for_supported_types () {
> >>>>>fi
> >>>>>}
> >>>>>
> >>>>> +# configure vfio-ap devices  
> >>>>> +configure_ap_devices() {
> >>>>> +list="`echo "${config[$1]}" | sed 's/,/ /'`"
> >>>>> +[ -z "$list" ] && return
> >>>>> +for a in $list; do
> >>>>> +echo "$a" > 
> >>>>> "$supported_types/${config[mdev_type]}/devices/$uuid/$2"
> >>>>> +if [ $? -ne 0 ]; then
> >>>>> +echo "Error writing '$a' to '$uuid/$2'" >&2
> >>>>> +exit 1
> >>>>> +fi
> >>>>> +done
> >>>>> +}
> >>>>> +
> >>>>>case ${1} in
> >>>>>start-mdev|stop-mdev)
> >>>>>if [ $# -ne 2 ]; then
> >>>>> @@ -148,6 +161,18 @@ case ${cmd} in
> >>>>>echo "Error creating mdev type ${config[mdev_type]} on 
> >>>>> $parent" >&2
> >>>>>exit 1
> >>>>>fi
> >>>>> +
> >>>>> +# some types may specify additional config data
> >>>>> +case ${config[mdev_type]} in
> >>>>> +vfio_ap-passthrough)  
> >>>>
> >>>> I think this could have some application beyond ap too, I know NVIDIA
> >>>> GRID vGPUs do have some controls under the vendor hierarchy of the
> >>>> device, ex. setting the frame rate limiter.  The implementation here is
> >>>> a bit rigid, we know a specific protocol for a specific mdev type, but
> >>>> for supporting arbitrary vendor options we'd really just want to try to
> >>>> apply whatever options are provided.  If we didn't care about ordering,
> >>>> we could just look for keys for every file in the device's immediate
> >>>> sysfs hierarchy and apply any value we find, independent of the
> >>>> mdev_type, ex. intel_vgpu/foo=bar  Thanks,  
> >>>
> >>> For example:
> >>>
> >>> for key in find -P $mdev_base/$uuid/ \( -path
> >>> "$mdev_base/$uuid/power/*" -o -path $mdev_base/$uuid/uevent -o -path 
> >>> $mdev_base/$uuid/remove \) -prune -o -type f -print | sed -e 
> >>> "s|$mdev_base/$uuid/||g"); do
> >>> [ -z ${config[$key]} ] && continue
> >>> ... parse value(s) and iteratively apply to key
> >>> done
> >>>
> >>> The find is a little ugly to exclude stuff, maybe we just let people do
> >>> screwy stuff like specify remove=1 in their config.  Also need to think
> >>> about whether we're imposing a delimiter to apply multiple values to a
> >>> key that conflicts with the attribute usage.  Thanks,
> >>>
> >>> Alex  

One thing that this does is limiting us to things that can be expressed
with "if you encounter key=value, take value (possibly decomposed) and
write it to /key". A problem with this generic approach is that
the code cannot decide itself whether value should be decomposed (and
if yes, with which delimiter), or not. We also cannot cover a

Re: [libvirt] [PATCH RFC 0/1] mdevctl: further config for vfio-ap

2019-06-13 Thread Cornelia Huck
On Fri, 7 Jun 2019 14:30:48 -0400
Tony Krowiak  wrote:

> On 6/7/19 10:56 AM, Cornelia Huck wrote:
> > On Thu, 6 Jun 2019 12:45:29 -0400
> > Matthew Rosato  wrote:
> >   
> >> On 6/6/19 10:44 AM, Cornelia Huck wrote:  
> >>> This patch adds a very rough implementation of additional config data
> >>> for mdev devices. The idea is to make it possible to specify some
> >>> type-specific key=value pairs in the config file for an mdev device.
> >>> If a device is started automatically, the device is stopped and restarted
> >>> after applying the config.
> >>>
> >>> The code has still some problems, like not doing a lot of error handling
> >>> and being ugly in general; but most importantly, I can't really test it,
> >>> as I don't have the needed hardware. Feedback welcome; would be good to
> >>> know if the direction is sensible in general.  
> >>
> >> Hi Connie,
> >>
> >> This is very similar to what I was looking to do in zdev (config via
> >> key=value pairs), so I like your general approach.
> >>
> >> I pulled your code and took it for a spin on an LPAR with access to
> >> crypto cards:
> >>
> >> # mdevctl create-mdev `uuidgen` matrix vfio_ap-passthrough
> >> # mdevctl set-additional-config  ap_adapters=0x4,0x5
> >> # mdevctl set-additional-config  ap_domains=0x36
> >> # mdevctl set-additional-config  ap_control_domains=0x37
> >>
> >> Assuming all valid inputs, this successfully creates the appropriate
> >> mdev and what looks to be a valid mdevctl.d entry.  A subsequent reboot
> >> successfully brings the same vfio_ap-passthrough device up again.  
> > 
> > Cool, thanks for checking!  
> 
> I also confirmed this. I realize this is a very early proof of concept,
> if you will, but error handling could be an interesting endeavor in
> the case of vfio_ap given the layers of configuration involved; 

I agree; that's why I mostly ignored it for now :)

> for
> example:
> 
> * The necessity for the vfio_ap module to be installed
> * The necessity that the /sys/bus/ap/apmask and /sys/bus/ap/aqmask must
>be appropriately configured

What do you think about outsourcing that configuration to some
s390-specific tool (probably something in s390-tools)? While we can
(and should) rely on driverctl for vfio-ccw, this does not look like
something that can be easily served by some generic tooling.

> 
> >   
> >>
> >> Matt
> >>  
> >>>
> >>> Also available at
> >>>
> >>> https://github.com/cohuck/mdevctl conf-data
> >>>
> >>> Cornelia Huck (1):
> >>>allow to specify additional config data
> >>>
> >>>   mdevctl.libexec | 25 ++
> >>>   mdevctl.sbin| 56 -
> >>>   2 files changed, 80 insertions(+), 1 deletion(-)
> >>>  
> >>  
> >   
> 

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] mdevctl: A shoestring mediated device management and persistence utility

2019-06-13 Thread Cornelia Huck
On Wed, 12 Jun 2019 17:54:34 +0200
Halil Pasic  wrote:

> On Wed, 12 Jun 2019 09:14:39 +0200
> Cornelia Huck  wrote:

> > Ok, looked at driverctl. Extending this one for non-PCI seems like a
> > reasonable path. However, we would also need to extend any non-PCI
> > device type we want to support with a driver_override attribute like
> > you did for PCI in 782a985d7af26db39e86070d28f987cad2 -- so this is
> > only for newer kernels. Adding that attribute for subchannels looks
> > feasible at a glance, but I have not tried to actually do it :)
> > 
> > Halil, do you think that would make sense?  
> 
> Looks doable. Did not quite figure out the details yet, but it seems
> that for PCI driver_override has more benefits than for cio (compared
> to simple unbind/bind), as matching and probing seems to be more
> elaborate for PCI. The benefit I see are
> 1) the ability to exclude the device form driver binding, and
> 2) having the same mechanism and thus consistent experience for pci and
> cio.

Yes, we should provide the same mechanism, even if it is much simpler
for the css bus.

> 
> What we IMHO should not do is make driver_override the override the
> sch->st == id->type check.

Agreed. The number of possible ids is much lower on the css bus, and a
driver wanting to match to any device may simply specify all of them
(not that this looks very useful).

I'm currently playing with this change; will send out a patch when I
have it in reasonable shape.

> 
> Regards,
> Halil
> 
> > 
> > [This might also help with the lcs vs. ctc confusion on a certain 3088
> > cu model if this is added for ccw devices as well; but I'm not sure if
> > these are still out in the wild at all. Probably not worth the effort
> > for that.]  
> 

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] mdevctl: A shoestring mediated device management and persistence utility

2019-06-12 Thread Cornelia Huck
On Tue, 11 Jun 2019 14:28:22 -0600
Alex Williamson  wrote:

> On Tue, 11 Jun 2019 21:45:08 +0200
> Cornelia Huck  wrote:
> 
> > On Fri, 7 Jun 2019 18:06:30 +0200
> > Halil Pasic  wrote:

> > > I guess for vfio-ccw one needs to make sure that the ccw device is bound
> > > to the vfio-ccw driver first, and only after that can one use  
> > > create-mdev to create the mdev on top of the subchannel.
> > > 
> > > So to make this work persistently (survive a reboot) one would need to
> > > take care of the subchannel getting bound to the right vfio_ccw driver
> > > before mdevctl is called. Right?
> > > 
> > > BTW how does this concurrence situation between the drivers io_subchannel
> > > and vfio_ccw work? Especially if both are build in?
> > 
> > If you have two drivers that match to the same device type, you'll
> > always have the issue that the driver that is first matched with the
> > device will bind to it and you have to do the unbind/rebind dance to
> > get it bound to the correct device driver. (I guess that this was the
> > basic motivation behind the ap bus default driver infrastructure,
> > right?) I think that in our case the io_subchannel driver will be
> > called first (alphabetical order and the fact that vfio-ccw will often
> > be a module). I'm not sure if it is within the scope of mdevctl to
> > ensure that the device is bound to the correct driver, or if it rather
> > should work with devices already bound to the correct driver only.
> > Maybe a separate udev-rules generator?  
> 
> Getting a device bound to a specific driver is exactly the domain of
> driverctl.  Implement the sysfs interfaces driverctl uses and see if it
> works.  Driverctl defaults to PCI and knows some extra things about
> PCI, but appears to be written to be generally bus agnostic.  Thanks,
> 
> Alex

Ok, looked at driverctl. Extending this one for non-PCI seems like a
reasonable path. However, we would also need to extend any non-PCI
device type we want to support with a driver_override attribute like
you did for PCI in 782a985d7af26db39e86070d28f987cad2 -- so this is
only for newer kernels. Adding that attribute for subchannels looks
feasible at a glance, but I have not tried to actually do it :)

Halil, do you think that would make sense?

[This might also help with the lcs vs. ctc confusion on a certain 3088
cu model if this is added for ccw devices as well; but I'm not sure if
these are still out in the wild at all. Probably not worth the effort
for that.]

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] mdevctl: A shoestring mediated device management and persistence utility

2019-06-11 Thread Cornelia Huck
On Fri, 7 Jun 2019 18:06:30 +0200
Halil Pasic  wrote:

> On Fri, 24 May 2019 12:11:06 +0200
> Cornelia Huck  wrote:
> 
> > On Thu, 23 May 2019 17:20:01 -0600
> > Alex Williamson  wrote:
> >   
> > > Hi,
> > >   
> 
> [..]
> 
> > > 
> > > It would be really useful if s390 folks could help me understand
> > > whether it's possible to glean all the information necessary to
> > > recreate a ccw or ap mdev device from sysfs.  I expect the file where
> > > we currently only store the mdev_type to evolve into something that
> > > includes more information to facilitate more complicated devices.  For
> > > now I make no claims to maintaining compatibility of recorded mdev
> > > devices, it will absolutely change, but I didn't want to get bogged
> > > down in making sure I don't accidentally source a root kit hidden in an
> > > mdev config file.  
> > 
> > I played a bit with it on my LPAR, and it is at least not obviously
> > broken with vfio-ccw :) I don't have any ap devices to play with,
> > though.
> >   
> 
> Sorry for being late...
> 
> I guess for vfio-ccw one needs to make sure that the ccw device is bound
> to the vfio-ccw driver first, and only after that can one use  
> create-mdev to create the mdev on top of the subchannel.
> 
> So to make this work persistently (survive a reboot) one would need to
> take care of the subchannel getting bound to the right vfio_ccw driver
> before mdevctl is called. Right?
> 
> BTW how does this concurrence situation between the drivers io_subchannel
> and vfio_ccw work? Especially if both are build in?

If you have two drivers that match to the same device type, you'll
always have the issue that the driver that is first matched with the
device will bind to it and you have to do the unbind/rebind dance to
get it bound to the correct device driver. (I guess that this was the
basic motivation behind the ap bus default driver infrastructure,
right?) I think that in our case the io_subchannel driver will be
called first (alphabetical order and the fact that vfio-ccw will often
be a module). I'm not sure if it is within the scope of mdevctl to
ensure that the device is bound to the correct driver, or if it rather
should work with devices already bound to the correct driver only.
Maybe a separate udev-rules generator?

There's also the question where that automatic configuration should
stop. Should cio_ignore handling be part of it as well? [That's a
non-generic interface, of course. Tooling within s390-tools, maybe?]

> > > 
> > > I'm also curious how or if libvirt or openstack might use this.
> > > If nothing else, it makes libvirt hook scripts easier to write,
> > > especially if we add an option not to autostart mdevs, or if
> > > users don't mind persistent mdevs, maybe there's nothing more to
> > > do. 
> 
> +1
> 
> @Alex: I'm curious what is the big management picture for non-auto
> looks like.
> 
> Regards,
> Halil
> 
> [..]
> 

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH RFC 0/1] mdevctl: further config for vfio-ap

2019-06-07 Thread Cornelia Huck
On Thu, 6 Jun 2019 12:45:29 -0400
Matthew Rosato  wrote:

> On 6/6/19 10:44 AM, Cornelia Huck wrote:
> > This patch adds a very rough implementation of additional config data
> > for mdev devices. The idea is to make it possible to specify some
> > type-specific key=value pairs in the config file for an mdev device.
> > If a device is started automatically, the device is stopped and restarted
> > after applying the config.
> > 
> > The code has still some problems, like not doing a lot of error handling
> > and being ugly in general; but most importantly, I can't really test it,
> > as I don't have the needed hardware. Feedback welcome; would be good to
> > know if the direction is sensible in general.  
> 
> Hi Connie,
> 
> This is very similar to what I was looking to do in zdev (config via
> key=value pairs), so I like your general approach.
> 
> I pulled your code and took it for a spin on an LPAR with access to
> crypto cards:
> 
> # mdevctl create-mdev `uuidgen` matrix vfio_ap-passthrough
> # mdevctl set-additional-config  ap_adapters=0x4,0x5
> # mdevctl set-additional-config  ap_domains=0x36
> # mdevctl set-additional-config  ap_control_domains=0x37
> 
> Assuming all valid inputs, this successfully creates the appropriate
> mdev and what looks to be a valid mdevctl.d entry.  A subsequent reboot
> successfully brings the same vfio_ap-passthrough device up again.

Cool, thanks for checking!

> 
> Matt
> 
> > 
> > Also available at
> > 
> > https://github.com/cohuck/mdevctl conf-data
> > 
> > Cornelia Huck (1):
> >   allow to specify additional config data
> > 
> >  mdevctl.libexec | 25 ++
> >  mdevctl.sbin| 56 -
> >  2 files changed, 80 insertions(+), 1 deletion(-)
> >   
> 

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


[libvirt] [PATCH RFC 1/1] allow to specify additional config data

2019-06-06 Thread Cornelia Huck
Add a rough implementation for vfio-ap.

Signed-off-by: Cornelia Huck 
---
 mdevctl.libexec | 25 ++
 mdevctl.sbin| 56 -
 2 files changed, 80 insertions(+), 1 deletion(-)

diff --git a/mdevctl.libexec b/mdevctl.libexec
index 804166b5086d..cc0546142924 100755
--- a/mdevctl.libexec
+++ b/mdevctl.libexec
@@ -54,6 +54,19 @@ wait_for_supported_types () {
 fi
 }
 
+# configure vfio-ap devices  
+configure_ap_devices() {
+list="`echo "${config[$1]}" | sed 's/,/ /'`"
+[ -z "$list" ] && return
+for a in $list; do
+echo "$a" > "$supported_types/${config[mdev_type]}/devices/$uuid/$2"
+if [ $? -ne 0 ]; then
+echo "Error writing '$a' to '$uuid/$2'" >&2
+exit 1
+fi
+done
+}
+
 case ${1} in
 start-mdev|stop-mdev)
 if [ $# -ne 2 ]; then
@@ -148,6 +161,18 @@ case ${cmd} in
 echo "Error creating mdev type ${config[mdev_type]} on $parent" >&2
 exit 1
 fi
+
+# some types may specify additional config data
+case ${config[mdev_type]} in
+vfio_ap-passthrough)
+configure_ap_devices ap_adapters assign_adapter
+configure_ap_devices ap_domains assign_domain
+configure_ap_devices ap_control_domains assign_control_domain
+# TODO: is assigning idempotent? Should we unwind on error?
+;;
+*)
+;;
+esac
 ;;
 
 add-mdev)
diff --git a/mdevctl.sbin b/mdevctl.sbin
index 276cf6ddc817..eb5ee0091879 100755
--- a/mdevctl.sbin
+++ b/mdevctl.sbin
@@ -33,6 +33,8 @@ usage() {
 echo "set-start : change mdev start policy, if no option 
specified," >&2
 echo "   system default policy is used" >&2
 echo "   options: [--auto] [--manual]" >&2
+echo "set-additional-config  {fmt...}: supply additional 
configuration" >&2
+echo "show-additional-config-format :  prints the format 
expected by the device" >&2
 echo "list-all: list all possible mdev types supported in the system" >&2
 echo "list-available: list all mdev types currently available" >&2
 echo "list-mdevs: list currently configured mdevs" >&2
@@ -48,7 +50,7 @@ while (($# > 0)); do
 --manual)
 config[start]=manual
 ;;
-start-mdev|stop-mdev|remove-mdev|set-start)
+
start-mdev|stop-mdev|remove-mdev|set-start|show-additional-config-format)
 [ $# -ne 2 ] && usage
 cmd=$1
 uuid=$2
@@ -67,6 +69,14 @@ while (($# > 0)); do
 cmd=$1
 break
 ;;
+set-additional-config)
+[ $# -le 2 ] && usage
+cmd=$1
+uuid=$2
+shift 2
+addtl_config="$*"
+break
+;;
 *)
 usage
 ;;
@@ -114,6 +124,50 @@ case ${cmd} in
 fi
 ;;
 
+set-additional-config)
+file=$(find $persist_base -name $uuid -type f)
+if [ -w "$file" ]; then
+read_config "$file"
+if [ ${config[start]} == "auto" ]; then
+systemctl stop mdev@$uuid.service
+fi
+# FIXME: validate input!
+for i in $addtl_config; do
+key="`echo "$i" | cut -d '=' -f 1`"
+value="`echo "$i" | cut -d '=' -f 2-`"
+if grep -q ^$key $file; then
+if [ -z "$value" ]; then
+sed -i "s/^$key=.*//g" $file
+else
+sed -i "s/^$key=.*/$key=$value/g" $file
+fi
+else
+echo "$i" >> "$file"
+fi
+done
+if [ ${config[start]} == "auto" ]; then
+systemctl start mdev@$uuid.service
+fi
+else
+exit 1
+fi
+;;
+
+show-additional-config-format)
+file=$(find $persist_base -name $uuid -type f)
+read_config "$file"
+case ${config[mdev_type]} in
+vfio_ap-passthrough)
+echo "ap_adapters="
+echo "ap_domains="
+echo "ap_control_domains="
+;;
+*)
+echo "no additional configuration defined"
+;;
+esac
+;;
+
 list-mdevs)
 for mdev in $(find $mdev_base/ -maxdepth 1 -mindepth 1 -type l); do
 uuid=$(basename $mdev)
-- 
2.20.1

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


[libvirt] [PATCH RFC 0/1] mdevctl: further config for vfio-ap

2019-06-06 Thread Cornelia Huck
This patch adds a very rough implementation of additional config data
for mdev devices. The idea is to make it possible to specify some
type-specific key=value pairs in the config file for an mdev device.
If a device is started automatically, the device is stopped and restarted
after applying the config.

The code has still some problems, like not doing a lot of error handling
and being ugly in general; but most importantly, I can't really test it,
as I don't have the needed hardware. Feedback welcome; would be good to
know if the direction is sensible in general.

Also available at

https://github.com/cohuck/mdevctl conf-data

Cornelia Huck (1):
  allow to specify additional config data

 mdevctl.libexec | 25 ++
 mdevctl.sbin| 56 -
 2 files changed, 80 insertions(+), 1 deletion(-)

-- 
2.20.1

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [PATCH v3 2/2] drm/i915/gvt: export migration_version to mdev sysfs for Intel vGPU

2019-05-28 Thread Cornelia Huck
On Sun, 26 May 2019 23:44:37 -0400
Yan Zhao  wrote:

> This feature implements the migration_version attribute for Intel's vGPU
> mdev devices.
> 
> migration_version attribute is rw.
> It's used to check migration compatibility for two mdev devices of the
> same mdev type.
> migration_version string is defined by vendor driver and opaque to
> userspace.
> 
> For Intel vGPU of gen8 and gen9, the format of migration_version string
> is:
>   ---.
> 
> For future platforms, the format of migration_version string is to be
> expanded to include more meta data to identify Intel vGPUs for live
> migration compatibility check
> 
> For old platforms, and for GVT not supporting vGPU live migration
> feature, -ENODEV is returned on read(2)/write(2) of migration_version
> attribute.
> For vGPUs running old GVT who do not expose migration_version
> attribute, live migration is regarded as not supported for those vGPUs.
> 
> Cc: Alex Williamson 
> Cc: Erik Skultety 
> Cc: "Dr. David Alan Gilbert" 
> Cc: Cornelia Huck 
> Cc: "Tian, Kevin" 
> Cc: Zhenyu Wang 
> Cc: "Wang, Zhi A" 
> c: Neo Jia 
> Cc: Kirti Wankhede 
> 
> Signed-off-by: Yan Zhao 
> 
> ---
> v3:
> 1. renamed version to migration_version
> (Christophe de Dinechin, Cornelia Huck, Alex Williamson)
> 2. instead of generating migration version strings each time, storing
> them in vgpu types generated during initialization.
> (Zhenyu Wang, Cornelia Huck)
> 3. replaced multiple snprintf to one big snprintf in
> intel_gvt_get_vfio_migration_version()
> (Dr. David Alan Gilbert)
> 4. printed detailed error log
> (Alex Williamson, Erik Skultety, Cornelia Huck, Dr. David Alan Gilbert)
> 5. incorporated  into migration_version string
> (Alex Williamson)
> 6. do not use ifndef macro to switch off migration_version attribute
> (Zhenyu Wang)
> 
> v2:
> 1. removed 32 common part of version string
> (Alex Williamson)
> 2. do not register version attribute for GVT not supporting live
> migration.(Cornelia Huck)
> 3. for platforms out of gen8, gen9, return -EINVAL --> -ENODEV for
> incompatible. (Cornelia Huck)
> ---
>  drivers/gpu/drm/i915/gvt/Makefile|   2 +-
>  drivers/gpu/drm/i915/gvt/gvt.c   |  39 +
>  drivers/gpu/drm/i915/gvt/gvt.h   |   5 +
>  drivers/gpu/drm/i915/gvt/migration_version.c | 167 +++
>  drivers/gpu/drm/i915/gvt/vgpu.c  |  13 +-
>  5 files changed, 223 insertions(+), 3 deletions(-)
>  create mode 100644 drivers/gpu/drm/i915/gvt/migration_version.c
> 

(...)

> diff --git a/drivers/gpu/drm/i915/gvt/gvt.c b/drivers/gpu/drm/i915/gvt/gvt.c
> index 43f4242062dd..be2980e8ac75 100644
> --- a/drivers/gpu/drm/i915/gvt/gvt.c
> +++ b/drivers/gpu/drm/i915/gvt/gvt.c
> @@ -105,14 +105,53 @@ static ssize_t description_show(struct kobject *kobj, 
> struct device *dev,
>  type->weight);
>  }
>  
> +static ssize_t migration_version_show(struct kobject *kobj, struct device 
> *dev,
> + char *buf)

Indentation looks a bit odd? (Also below.)

> +{
> + struct intel_vgpu_type *type;
> + void *gvt = kdev_to_i915(dev)->gvt;
> +
> + type = intel_gvt_find_vgpu_type(gvt, kobject_name(kobj));
> + if (!type || !type->migration_version) {
> + gvt_err("Does not support migraion on type %s. Please search 
> previous detailed log\n",

s/migraion/migration/ (also below)

Or reword to "Migration not supported on type %s."?

> + kobject_name(kobj));
> + return -ENODEV;
> + }
> +
> + return snprintf(buf, strlen(type->migration_version) + 2,
> + "%s\n", type->migration_version);
> +}
> +
> +static ssize_t migration_version_store(struct kobject *kobj, struct device 
> *dev,
> + const char *buf, size_t count)
> +{
> + int ret = 0;
> + struct intel_vgpu_type *type;
> + void *gvt = kdev_to_i915(dev)->gvt;
> +
> + type = intel_gvt_find_vgpu_type(gvt, kobject_name(kobj));
> + if (!type || !type->migration_version) {
> + gvt_err("Does not support migraion on type %s. Please search 
> previous detailed log\n",
> + kobject_name(kobj));
> + return -ENODEV;
> + }
> +
> + ret = intel_gvt_check_vfio_migration_version(gvt,
> + type->migration_version, buf);
> +
> + return (ret < 0 ? ret : count);
> +}
> +
>  static MDEV_TYPE_ATTR_RO(available_instances);
>  static MDEV_TYPE_ATTR_RO(device_api);
>  static MDEV_TYPE_ATTR_RO(description);
> +static

Re: [libvirt] [PATCH v3 1/2] vfio/mdev: add migration_version attribute for mdev device

2019-05-28 Thread Cornelia Huck
On Sun, 26 May 2019 23:43:42 -0400
Yan Zhao  wrote:

> migration_version attribute is used to check migration compatibility
> between two mdev device of the same mdev type.

s/device/devices/

> The key is that it's rw and its data is opaque to userspace.
> 
> Userspace reads migration_version of mdev device at source side and
> writes the value to migration_version attribute of mdev device at target
> side. It judges migration compatibility according to whether the read
> and write operations succeed or fail.
> 
> As this attribute is under mdev_type node, userspace is able to know
> whether two mdev devices are compatible before a mdev device is created.
> 
> userspace needs to check whether the two mdev devices are of the same
> mdev type before checking the migration_version attribute. It also needs
> to check device creation parameters if aggregation is supported in
> future.
> 
>  __userspace
>   /\  \
>  / \write
> / read  \
>/__   ___\|/_
>   | migration_version | | migration_version |-->check migration
>   - -   compatibility
> mdev device A   mdev device B
> 
> Cc: Alex Williamson 
> Cc: Erik Skultety 
> Cc: "Dr. David Alan Gilbert" 
> Cc: Cornelia Huck 
> Cc: "Tian, Kevin" 
> Cc: Zhenyu Wang 
> Cc: "Wang, Zhi A" 
> Cc: Neo Jia 
> Cc: Kirti Wankhede 
> Cc: Daniel P. Berrangé 
> Cc: Christophe de Dinechin 
> 
> Signed-off-by: Yan Zhao 
> 
> ---
> v3:
> 1. renamed version to migration_version
> (Christophe de Dinechin, Cornelia Huck, Alex Williamson)
> 2. let errno to be freely defined by vendor driver
> (Alex Williamson, Erik Skultety, Cornelia Huck, Dr. David Alan Gilbert)
> 3. let checking mdev_type be prerequisite of migration compatibility
> check. (Alex Williamson)
> 4. reworded example usage section.
> (most of this section came from Alex Williamson)
> 5. reworded attribute intention section (Cornelia Huck)
> 
> v2:
> 1. added detailed intent and usage
> 2. made definition of version string completely private to vendor driver
>(Alex Williamson)
> 3. abandoned changes to sample mdev drivers (Alex Williamson)
> 4. mandatory --> optional (Cornelia Huck)
> 5. added description for errno (Cornelia Huck)
> ---
>  Documentation/vfio-mediated-device.txt | 113 +
>  1 file changed, 113 insertions(+)
> 

While I probably would have written a more compact description, your
version is fine with me as well.

Reviewed-by: Cornelia Huck 

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] mdevctl: A shoestring mediated device management and persistence utility

2019-05-24 Thread Cornelia Huck
On Thu, 23 May 2019 17:20:01 -0600
Alex Williamson  wrote:

> Hi,
> 
> Currently mediated device management, much like SR-IOV VF management,
> is largely left as an exercise for the user.  This is an attempt to
> provide something and see where it goes.  I doubt we'll solve
> everyone's needs on the first pass, but maybe we'll solve enough and
> provide helpers for the rest.  Without further ado, I'll point to what
> I have so far:
> 
> https://github.com/awilliam/mdevctl
> 
> This is inspired by driverctl, which is also a bash utility.  mdevctl
> uses udev and systemd to record and recreate mdev devices for
> persistence and provides a command line utility for querying, listing,
> starting, stopping, adding, and removing mdev devices.  Currently, for
> better or worse, it considers anything created to be persistent.  I can
> imagine a global configuration option that might disable this and
> perhaps an autostart flag per mdev device, such that mdevctl might
> simply "know" about some mdevs but not attempt to create them
> automatically.  Clearly command line usage help, man pages, and
> packaging are lacking as well, release early, release often, plus this
> is a discussion starter to see if perhaps this is sufficient to meet
> some needs.
> 
> Originally I thought about making a utility to manage both mdev and
> SR-IOV VFs all in one, but it seemed more natural to start here
> (besides, I couldn't think of a good name for the combined utility).
> If this seems useful, maybe I'll start on a vfctl for SR-IOV and we'll
> see whether they have enough synergy to become one.
> 
> It would be really useful if s390 folks could help me understand
> whether it's possible to glean all the information necessary to
> recreate a ccw or ap mdev device from sysfs.  I expect the file where
> we currently only store the mdev_type to evolve into something that
> includes more information to facilitate more complicated devices.  For
> now I make no claims to maintaining compatibility of recorded mdev
> devices, it will absolutely change, but I didn't want to get bogged
> down in making sure I don't accidentally source a root kit hidden in an
> mdev config file.

I played a bit with it on my LPAR, and it is at least not obviously
broken with vfio-ccw :) I don't have any ap devices to play with,
though.

> 
> I'm also curious how or if libvirt or openstack might use this.  If
> nothing else, it makes libvirt hook scripts easier to write, especially
> if we add an option not to autostart mdevs, or if users don't mind
> persistent mdevs, maybe there's nothing more to do.
> 
> BTW, feel free to clean up by bash, I'm a brute force and ignorance
> shell coder ;)  

Not that I'm a good shell coder, but I sent you a pull req at least
adding a basic help text ;)

I have not yet looked at most of the code, though.

> Thanks,
> 
> Alex

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [PATCH v2 1/2] vfio/mdev: add version attribute for mdev device

2019-05-14 Thread Cornelia Huck
On Tue, 14 May 2019 12:01:45 +0100
"Dr. David Alan Gilbert"  wrote:

> * Cornelia Huck (coh...@redhat.com) wrote:
> > On Tue, 14 May 2019 03:47:36 -0400
> > Yan Zhao  wrote:

> > > hi Cornelia and Dave,
> > > do you also agree on:
> > > 1. "not to define the specific errno returned for a specific situation,
> > > let the vendor driver decide, userspace simply needs to know that an 
> > > errno on
> > > read indicates the device does not support migration version comparison 
> > > and
> > > that an errno on write indicates the devices are incompatible or the 
> > > target
> > > doesn't support migration versions. "
> > > 2. vendor driver should log detailed error reasons in kernel log.  
> > 
> > Two questions:
> > - How reasonable is it to refer to the system log in order to find out
> >   what exactly went wrong?
> > - If detailed error reporting is basically done to the syslog, do
> >   different error codes still provide useful information? Or should the
> >   vendor driver decide what it wants to do?  
> 
> I don't see error codes as being that helpful; if we can't actually get
> an error message back up the stack (which was my preference), then I guess
> syslog is as good as it will get.

Ok, so letting the vendor driver simply return an(y) error and possibly
dumping an error message into the syslog seems to be the most
reasonable approach.


Re: [PATCH v2 1/2] vfio/mdev: add version attribute for mdev device

2019-05-14 Thread Cornelia Huck
On Tue, 14 May 2019 03:47:36 -0400
Yan Zhao  wrote:

> On Tue, May 14, 2019 at 03:43:44PM +0800, Erik Skultety wrote:
> > On Tue, May 14, 2019 at 03:32:19AM -0400, Yan Zhao wrote:  
> > > On Tue, May 14, 2019 at 03:20:40PM +0800, Erik Skultety wrote:  

> > > > That said, from libvirt POV as a consumer, I'd expect there to be truly 
> > > > only 2
> > > > errors (I believe Alex has mentioned something similar in one of his 
> > > > responses
> > > > in one of the threads):
> > > > a) read error indicating that an mdev type doesn't support migration
> > > > - I assume if one type doesn't support migration, none of the 
> > > > other
> > > >   types exposed on the parent device do, is that a fair 
> > > > assumption?

Probably; but there might be cases where the migratability depends not
on the device type, but how the partitioning has been done... or is
that too contrived?

> > > > b) write error indicating that the mdev types are incompatible for
> > > > migration
> > > >
> > > > Regards,
> > > > Erik  
> > > Thanks for this explanation.
> > > so, can we arrive at below agreements?
> > >
> > > 1. "not to define the specific errno returned for a specific situation,
> > > let the vendor driver decide, userspace simply needs to know that an 
> > > errno on
> > > read indicates the device does not support migration version comparison 
> > > and
> > > that an errno on write indicates the devices are incompatible or the 
> > > target
> > > doesn't support migration versions. "
> > > 2. vendor driver should log detailed error reasons in kernel log.  
> > 
> > That would be my take on this, yes, but I open to hear any other 
> > suggestions and
> > ideas I couldn't think of as well.

So, read to find out whether migration is supported at all, write to
find out whether it is supported for that concrete pairing is
reasonable for libvirt?

> > 
> > Erik  
> got it. thanks a lot!
> 
> hi Cornelia and Dave,
> do you also agree on:
> 1. "not to define the specific errno returned for a specific situation,
> let the vendor driver decide, userspace simply needs to know that an errno on
> read indicates the device does not support migration version comparison and
> that an errno on write indicates the devices are incompatible or the target
> doesn't support migration versions. "
> 2. vendor driver should log detailed error reasons in kernel log.

Two questions:
- How reasonable is it to refer to the system log in order to find out
  what exactly went wrong?
- If detailed error reporting is basically done to the syslog, do
  different error codes still provide useful information? Or should the
  vendor driver decide what it wants to do?


Re: [libvirt] [PATCH v2 1/2] vfio/mdev: add version attribute for mdev device

2019-05-14 Thread Cornelia Huck
On Tue, 14 May 2019 02:12:35 -0400
Yan Zhao  wrote:

> On Mon, May 13, 2019 at 09:28:04PM +0800, Erik Skultety wrote:

> > In case of libvirt checking the compatibility, it won't matter how good the
> > error message in the kernel log is and regardless of how many error states 
> > you
> > want to handle, libvirt's only limited to errno here, since we're going to 
> > do
> > plain read/write, so our internal error message returned to the user is only
> > going to contain what the errno says - okay, of course we can (and we DO)
> > provide libvirt specific string, further specifying the error but like I
> > mentioned, depending on how many error cases we want to distinguish this 
> > may be
> > hard for anyone to figure out solely on the error code, as apps will most
> > probably not parse the
> > logs.
> > 
> > Regards,
> > Erik  
> hi Erik
> do you mean you are agreeing on defining common errors and only returning 
> errno?
> 
> e.g.
> #define ENOMIGRATION 140  /* device not supporting migration */
> #define EUNATCH  49  /* software version not match */
> #define EHWNM142  /* hardware not matching*/

Defining custom error codes is probably not such a good idea... can we
match to common error codes instead? Do we have a good idea about
common error categories, anyway?

(Btw: does libvirt do a generic error-to-description translation, or
does it match to the context? I.e., can libvirt translate well-defined
error codes to a useful message for a specific case?)

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH v2 1/2] vfio/mdev: add version attribute for mdev device

2019-05-10 Thread Cornelia Huck
On Fri, 10 May 2019 10:36:09 +0100
"Dr. David Alan Gilbert"  wrote:

> * Cornelia Huck (coh...@redhat.com) wrote:
> > On Thu, 9 May 2019 17:48:26 +0100
> > "Dr. David Alan Gilbert"  wrote:
> >   
> > > * Cornelia Huck (coh...@redhat.com) wrote:  
> > > > On Thu, 9 May 2019 16:48:57 +0100
> > > > "Dr. David Alan Gilbert"  wrote:
> > > > 
> > > > > * Cornelia Huck (coh...@redhat.com) wrote:
> > > > > > On Tue, 7 May 2019 15:18:26 -0600
> > > > > > Alex Williamson  wrote:
> > > > > >   
> > > > > > > On Sun,  5 May 2019 21:49:04 -0400
> > > > > > > Yan Zhao  wrote:  
> > > > > >   
> > > > > > > > +  Errno:
> > > > > > > > +  If vendor driver wants to claim a mdev device incompatible 
> > > > > > > > to all other mdev
> > > > > > > > +  devices, it should not register version attribute for this 
> > > > > > > > mdev device. But if
> > > > > > > > +  a vendor driver has already registered version attribute and 
> > > > > > > > it wants to claim
> > > > > > > > +  a mdev device incompatible to all other mdev devices, it 
> > > > > > > > needs to return
> > > > > > > > +  -ENODEV on access to this mdev device's version attribute.
> > > > > > > > +  If a mdev device is only incompatible to certain mdev 
> > > > > > > > devices, write of
> > > > > > > > +  incompatible mdev devices's version strings to its version 
> > > > > > > > attribute should
> > > > > > > > +  return -EINVAL;
> > > > > > > 
> > > > > > > I think it's best not to define the specific errno returned for a
> > > > > > > specific situation, let the vendor driver decide, userspace simply
> > > > > > > needs to know that an errno on read indicates the device does not
> > > > > > > support migration version comparison and that an errno on write
> > > > > > > indicates the devices are incompatible or the target doesn't 
> > > > > > > support
> > > > > > > migration versions.  
> > > > > > 
> > > > > > I think I have to disagree here: It's probably valuable to have an
> > > > > > agreed error for 'cannot migrate at all' vs 'cannot migrate between
> > > > > > those two particular devices'. Userspace might want to do different
> > > > > > things (e.g. trying with different device pairs).  
> > > > > 
> > > > > Trying to stuff these things down an errno seems a bad idea; we can't
> > > > > get much information that way.
> > > > 
> > > > So, what would be a reasonable approach? Userspace should first read
> > > > the version attributes on both devices (to find out whether migration
> > > > is supported at all), and only then figure out via writing whether they
> > > > are compatible?
> > > > 
> > > > (Or just go ahead and try, if it does not care about the reason.)
> > > 
> > > Well, I'm OK with something like writing to test whether it's
> > > compatible, it's just we need a better way of saying 'no'.
> > > I'm not sure if that involves reading back from somewhere after
> > > the write or what.  
> > 
> > Hm, so I basically see two ways of doing that:
> > - standardize on some error codes... problem: error codes can be hard
> >   to fit to reasons
> > - make the error available in some attribute that can be read
> > 
> > I'm not sure how we can serialize the readback with the last write,
> > though (this looks inherently racy).
> > 
> > How important is detailed error reporting here?  
> 
> I think we need something, otherwise we're just going to get vague
> user reports of 'but my VM doesn't migrate'; I'd like the error to be
> good enough to point most users to something they can understand
> (e.g. wrong card family/too old a driver etc).

Ok, that sounds like a reasonable point. Not that I have a better idea
how to achieve that, though... we could also log a more verbose error
message to the kernel log, but that's not necessarily where a user will
look first.

Ideally, we'd want to have the user space program setting up things
querying the general compatibility for migration (so that it becomes
their problem on how to alert the user to problems :), but I'm not sure
how to eliminate the race between asking the vendor driver for
compatibility and getting the result of that operation.

Unless we introduce an interface that can retrieve _all_ results
together with the written value? Or is that not going to be much of a
problem in practice?

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH v2 1/2] vfio/mdev: add version attribute for mdev device

2019-05-10 Thread Cornelia Huck
On Thu, 9 May 2019 17:48:26 +0100
"Dr. David Alan Gilbert"  wrote:

> * Cornelia Huck (coh...@redhat.com) wrote:
> > On Thu, 9 May 2019 16:48:57 +0100
> > "Dr. David Alan Gilbert"  wrote:
> >   
> > > * Cornelia Huck (coh...@redhat.com) wrote:  
> > > > On Tue, 7 May 2019 15:18:26 -0600
> > > > Alex Williamson  wrote:
> > > > 
> > > > > On Sun,  5 May 2019 21:49:04 -0400
> > > > > Yan Zhao  wrote:
> > > > 
> > > > > > +  Errno:
> > > > > > +  If vendor driver wants to claim a mdev device incompatible to 
> > > > > > all other mdev
> > > > > > +  devices, it should not register version attribute for this mdev 
> > > > > > device. But if
> > > > > > +  a vendor driver has already registered version attribute and it 
> > > > > > wants to claim
> > > > > > +  a mdev device incompatible to all other mdev devices, it needs 
> > > > > > to return
> > > > > > +  -ENODEV on access to this mdev device's version attribute.
> > > > > > +  If a mdev device is only incompatible to certain mdev devices, 
> > > > > > write of
> > > > > > +  incompatible mdev devices's version strings to its version 
> > > > > > attribute should
> > > > > > +  return -EINVAL;  
> > > > > 
> > > > > I think it's best not to define the specific errno returned for a
> > > > > specific situation, let the vendor driver decide, userspace simply
> > > > > needs to know that an errno on read indicates the device does not
> > > > > support migration version comparison and that an errno on write
> > > > > indicates the devices are incompatible or the target doesn't support
> > > > > migration versions.
> > > > 
> > > > I think I have to disagree here: It's probably valuable to have an
> > > > agreed error for 'cannot migrate at all' vs 'cannot migrate between
> > > > those two particular devices'. Userspace might want to do different
> > > > things (e.g. trying with different device pairs).
> > > 
> > > Trying to stuff these things down an errno seems a bad idea; we can't
> > > get much information that way.  
> > 
> > So, what would be a reasonable approach? Userspace should first read
> > the version attributes on both devices (to find out whether migration
> > is supported at all), and only then figure out via writing whether they
> > are compatible?
> > 
> > (Or just go ahead and try, if it does not care about the reason.)  
> 
> Well, I'm OK with something like writing to test whether it's
> compatible, it's just we need a better way of saying 'no'.
> I'm not sure if that involves reading back from somewhere after
> the write or what.

Hm, so I basically see two ways of doing that:
- standardize on some error codes... problem: error codes can be hard
  to fit to reasons
- make the error available in some attribute that can be read

I'm not sure how we can serialize the readback with the last write,
though (this looks inherently racy).

How important is detailed error reporting here?

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH v2 1/2] vfio/mdev: add version attribute for mdev device

2019-05-09 Thread Cornelia Huck
On Thu, 9 May 2019 16:48:57 +0100
"Dr. David Alan Gilbert"  wrote:

> * Cornelia Huck (coh...@redhat.com) wrote:
> > On Tue, 7 May 2019 15:18:26 -0600
> > Alex Williamson  wrote:
> >   
> > > On Sun,  5 May 2019 21:49:04 -0400
> > > Yan Zhao  wrote:  
> >   
> > > > +  Errno:
> > > > +  If vendor driver wants to claim a mdev device incompatible to all 
> > > > other mdev
> > > > +  devices, it should not register version attribute for this mdev 
> > > > device. But if
> > > > +  a vendor driver has already registered version attribute and it 
> > > > wants to claim
> > > > +  a mdev device incompatible to all other mdev devices, it needs to 
> > > > return
> > > > +  -ENODEV on access to this mdev device's version attribute.
> > > > +  If a mdev device is only incompatible to certain mdev devices, write 
> > > > of
> > > > +  incompatible mdev devices's version strings to its version attribute 
> > > > should
> > > > +  return -EINVAL;
> > > 
> > > I think it's best not to define the specific errno returned for a
> > > specific situation, let the vendor driver decide, userspace simply
> > > needs to know that an errno on read indicates the device does not
> > > support migration version comparison and that an errno on write
> > > indicates the devices are incompatible or the target doesn't support
> > > migration versions.  
> > 
> > I think I have to disagree here: It's probably valuable to have an
> > agreed error for 'cannot migrate at all' vs 'cannot migrate between
> > those two particular devices'. Userspace might want to do different
> > things (e.g. trying with different device pairs).  
> 
> Trying to stuff these things down an errno seems a bad idea; we can't
> get much information that way.

So, what would be a reasonable approach? Userspace should first read
the version attributes on both devices (to find out whether migration
is supported at all), and only then figure out via writing whether they
are compatible?

(Or just go ahead and try, if it does not care about the reason.)

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH v2 1/2] vfio/mdev: add version attribute for mdev device

2019-05-09 Thread Cornelia Huck
On Tue, 7 May 2019 15:18:26 -0600
Alex Williamson  wrote:

> On Sun,  5 May 2019 21:49:04 -0400
> Yan Zhao  wrote:

> > +  Errno:
> > +  If vendor driver wants to claim a mdev device incompatible to all other 
> > mdev
> > +  devices, it should not register version attribute for this mdev device. 
> > But if
> > +  a vendor driver has already registered version attribute and it wants to 
> > claim
> > +  a mdev device incompatible to all other mdev devices, it needs to return
> > +  -ENODEV on access to this mdev device's version attribute.
> > +  If a mdev device is only incompatible to certain mdev devices, write of
> > +  incompatible mdev devices's version strings to its version attribute 
> > should
> > +  return -EINVAL;  
> 
> I think it's best not to define the specific errno returned for a
> specific situation, let the vendor driver decide, userspace simply
> needs to know that an errno on read indicates the device does not
> support migration version comparison and that an errno on write
> indicates the devices are incompatible or the target doesn't support
> migration versions.

I think I have to disagree here: It's probably valuable to have an
agreed error for 'cannot migrate at all' vs 'cannot migrate between
those two particular devices'. Userspace might want to do different
things (e.g. trying with different device pairs).

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH v2 1/2] vfio/mdev: add version attribute for mdev device

2019-05-09 Thread Cornelia Huck
On Wed, 8 May 2019 07:57:05 -0400
Yan Zhao  wrote:

> On Tue, May 07, 2019 at 05:19:54PM +0800, Cornelia Huck wrote:
> > On Sun,  5 May 2019 21:49:04 -0400
> > Yan Zhao  wrote:
> >   
> > > version attribute is used to check two mdev devices' compatibility.
> > > 
> > > The key point of this version attribute is that it's rw.
> > > User space has no need to understand internal of device version and no
> > > need to compare versions by itself.
> > > Compared to reading version strings from both two mdev devices being
> > > checked, user space only reads from one mdev device's version attribute.
> > > After getting its version string, user space writes this string into the
> > > other mdev device's version attribute. Vendor driver of mdev device
> > > whose version attribute being written will check device compatibility of
> > > the two mdev devices for user space and return success for compatibility
> > > or errno for incompatibility.  
> > 
> > I'm still missing a bit _what_ is actually supposed to be
> > compatible/incompatible. I'd assume some internal state descriptions
> > (even if this is not actually limited to migration).
> >  
> right.
> originally, I thought this attribute should only contain a device's hardware
> compatibility info. But seems also including vendor specific software 
> migration
> version is more reasonable, because general VFIO migration code cannot know
> version of vendor specific software migration code until migration data is
> transferring to the target vm. Then renaming it to migration_version is more
> appropriate.
> :)

Nod.

(...)

> > > @@ -246,6 +249,143 @@ Directories and files under the sysfs for Each 
> > > Physical Device
> > >This attribute should show the number of devices of type  
> > > that can be
> > >created.
> > >  
> > > +* version
> > > +
> > > +  This attribute is rw, and is optional.
> > > +  It is used to check device compatibility between two mdev devices and 
> > > is
> > > +  accessed in pairs between the two mdev devices being checked.
> > > +  The intent of this attribute is to make an mdev device's version 
> > > opaque to
> > > +  user space, so instead of reading two mdev devices' version strings and
> > > +  comparing in userspace, user space should only read one mdev device's 
> > > version
> > > +  attribute, and writes this version string into the other mdev device's 
> > > version
> > > +  attribute. Then vendor driver of mdev device whose version attribute 
> > > being
> > > +  written would check the incoming version string and tell user space 
> > > whether
> > > +  the two mdev devices are compatible via return value. That's why this
> > > +  attribute is writable.  
> > 
> > I would reword this a bit:
> > 
> > "This attribute provides a way to check device compatibility between
> > two mdev devices from userspace. The intended usage is for userspace to
> > read the version attribute from one mdev device and then writing that
> > value to the version attribute of the other mdev device. The second
> > mdev device indicates compatibility via the return code of the write
> > operation. This makes compatibility between mdev devices completely
> > vendor-defined and opaque to userspace."
> > 
> > We still should explain _what_ compatibility we're talking about here,
> > though.
> >   
> Thanks. It's much better than mine:) 
> Then I'll change compatibility --> migration compatibility.

Ok, with that it should be clear enough.

> 
> > > +
> > > +  when reading this attribute, it should show device version string of
> > > +  the device of type .
> > > +
> > > +  This string is private to vendor driver itself. Vendor driver is able 
> > > to
> > > +  freely define format and length of device version string.
> > > +  e.g. It can use a combination of pciid of parent device + mdev type.
> > > +
> > > +  When writing a string to this attribute, vendor driver should analyze 
> > > this
> > > +  string and check whether the mdev device being identified by this 
> > > string is
> > > +  compatible with the mdev device for this attribute. vendor driver 
> > > should then
> > > +  return written string's length if it regards the two mdev devices are
> > > +  compatible; vendor driver should return negative errno if it regards 
> > > the two
> > > +  mdev devices

Re: [libvirt] [PATCH v2 2/2] drm/i915/gvt: export mdev device version to sysfs for Intel vGPU

2019-05-07 Thread Cornelia Huck
On Sun,  5 May 2019 21:51:02 -0400
Yan Zhao  wrote:

> This feature implements the version attribute for Intel's vGPU mdev
> devices.
> 
> version attribute is rw.
> It's used to check device compatibility for two mdev devices.
> version string format and length are private for vendor driver. vendor
> driver is able to define them freely.
> 
> For Intel vGPU of gen8 and gen9, the mdev device version
> consists of 3 fields: "vendor id" + "device id" + "mdev type".
> 
> Reading from a vGPU's version attribute, a string is returned in below
> format: --. e.g.
> 8086-193b-i915-GVTg_V5_2.
> 
> Writing a string to a vGPU's version attribute will trigger GVT to check
> whether a vGPU identified by the written string is compatible with
> current vGPU owning this version attribute. errno is returned if the two
> vGPUs are incompatible. The length of written string is returned in
> compatible case.
> 
> For other platforms, and for GVT not supporting vGPU live migration
> feature, errnos are returned when read/write of mdev devices' version
> attributes.
> 
> For old GVT versions where no version attributes exposed in sysfs, it is
> regarded as not supporting vGPU live migration.
> 
> For future platforms, besides the current 2 fields in vendor proprietary
> part, more fields may be added to identify Intel vGPU well for live
> migration purpose.
> 
> v2:
> 1. removed 32 common part of version string
> (Alex Williamson)
> 2. do not register version attribute for GVT not supporting live
> migration.(Cornelia Huck)
> 3. for platforms out of gen8, gen9, return -EINVAL --> -ENODEV for
> incompatible. (Cornelia Huck)

Should go below '---'.

> 
> Cc: Alex Williamson 
> Cc: Erik Skultety 
> Cc: "Dr. David Alan Gilbert" 
> Cc: Cornelia Huck 
> Cc: "Tian, Kevin" 
> Cc: Zhenyu Wang 
> Cc: "Wang, Zhi A" 
> c: Neo Jia 
> Cc: Kirti Wankhede 
> 
> Signed-off-by: Yan Zhao 
> ---
>  drivers/gpu/drm/i915/gvt/Makefile |  2 +-
>  drivers/gpu/drm/i915/gvt/device_version.c | 87 +++
>  drivers/gpu/drm/i915/gvt/gvt.c| 51 +
>  drivers/gpu/drm/i915/gvt/gvt.h|  6 ++
>  4 files changed, 145 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/gpu/drm/i915/gvt/device_version.c
> 

(...)

> diff --git a/drivers/gpu/drm/i915/gvt/device_version.c 
> b/drivers/gpu/drm/i915/gvt/device_version.c
> new file mode 100644
> index ..bd4cdcbdba95
> --- /dev/null
> +++ b/drivers/gpu/drm/i915/gvt/device_version.c
> @@ -0,0 +1,87 @@
> +/*
> + * Copyright(c) 2011-2017 Intel Corporation. All rights reserved.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 
> FROM,
> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN 
> THE
> + * SOFTWARE.
> + *
> + * Authors:
> + *Yan Zhao 
> + */
> +#include 
> +#include "i915_drv.h"
> +
> +static bool is_compatible(const char *self, const char *remote)
> +{
> + if (strlen(remote) != strlen(self))
> + return false;
> +
> + return (strncmp(self, remote, strlen(self))) ? false : true;
> +}
> +
> +ssize_t intel_gvt_get_vfio_device_version_len(struct drm_i915_private 
> *dev_priv)
> +{
> + if (!IS_GEN(dev_priv, 8) && !IS_GEN(dev_priv, 9))
> + return -ENODEV;
> +
> + return PAGE_SIZE;
> +}
> +
> +ssize_t intel_gvt_get_vfio_device_version(struct drm_i915_private *dev_priv,
> + char *buf, const char *mdev_type)
> +{
> + int cnt = 0, ret = 0;
> + const char *str = NULL;
> +
> + /* currently only gen8 & gen9 ar

Re: [libvirt] [PATCH v2 1/2] vfio/mdev: add version attribute for mdev device

2019-05-07 Thread Cornelia Huck
On Sun,  5 May 2019 21:49:04 -0400
Yan Zhao  wrote:

> version attribute is used to check two mdev devices' compatibility.
> 
> The key point of this version attribute is that it's rw.
> User space has no need to understand internal of device version and no
> need to compare versions by itself.
> Compared to reading version strings from both two mdev devices being
> checked, user space only reads from one mdev device's version attribute.
> After getting its version string, user space writes this string into the
> other mdev device's version attribute. Vendor driver of mdev device
> whose version attribute being written will check device compatibility of
> the two mdev devices for user space and return success for compatibility
> or errno for incompatibility.

I'm still missing a bit _what_ is actually supposed to be
compatible/incompatible. I'd assume some internal state descriptions
(even if this is not actually limited to migration).

> So two readings of version attributes + checking in user space are now
> changed to one reading + one writing of version attributes + checking in
> vendor driver.

I'm not sure that needs to go into the patch description (sounds like
it is rather a change log?)

> Format and length of version strings are now private to vendor driver
> who can define them freely.

Same here; simply drop the 'now'?

> 
>  __ user space
>   /\  \
>  / \write
> / read  \
>  __/__   ___\|/___
> | version | | version |-->check compatibility
> --- ---
> mdev device A   mdev device B
> 
> This version attribute is optional. If a mdev device does not provide
> with a version attribute, this mdev device is incompatible to all other
> mdev devices.

Again, I'd like an explanation here what kind of compatibility we're
talking about.

> 
> Live migration is able to take advantage of this version attribute.
> Before user space actually starts live migration, it can first check
> whether two mdev devices are compatible.
> 
> v2:
> 1. added detailed intent and usage
> 2. made definition of version string completely private to vendor driver
>    (Alex Williamson)
> 3. abandoned changes to sample mdev drivers (Alex Williamson)
> 4. mandatory --> optional (Cornelia Huck)
> 5. added description for errno (Cornelia Huck)

This changelog should go below the ---, so that it does not actually
show up in the patch description later :)

> 
> Cc: Alex Williamson 
> Cc: Erik Skultety 
> Cc: "Dr. David Alan Gilbert" 
> Cc: Cornelia Huck 
> Cc: "Tian, Kevin" 
> Cc: Zhenyu Wang 
> Cc: "Wang, Zhi A" 
> Cc: Neo Jia 
> Cc: Kirti Wankhede 
> Cc: Daniel P. Berrangé 
> Cc: Christophe de Dinechin 
> 
> Signed-off-by: Yan Zhao 
> ---
>  Documentation/vfio-mediated-device.txt | 140 +
>  1 file changed, 140 insertions(+)
> 
> diff --git a/Documentation/vfio-mediated-device.txt 
> b/Documentation/vfio-mediated-device.txt
> index c3f69bcaf96e..013a764968eb 100644
> --- a/Documentation/vfio-mediated-device.txt
> +++ b/Documentation/vfio-mediated-device.txt
> @@ -202,6 +202,7 @@ Directories and files under the sysfs for Each Physical 
> Device
>| |   |--- available_instances
>| |   |--- device_api
>| |   |--- description
> +  | |   |--- version
>| |   |--- [devices]
>| |--- []
>| |   |--- create
> @@ -209,6 +210,7 @@ Directories and files under the sysfs for Each Physical 
> Device
>| |   |--- available_instances
>| |   |--- device_api
>| |   |--- description
> +  | |   |--- version
>| |   |--- [devices]
>| |--- []
>|  |--- create
> @@ -216,6 +218,7 @@ Directories and files under the sysfs for Each Physical 
> Device
>|  |--- available_instances
>|  |--- device_api
>|  |--- description
> +  |  |--- version
>|  |--- [devices]
>  
>  * [mdev_supported_types]
> @@ -246,6 +249,143 @@ Directories and files under the sysfs for Each Physical 
> Device
>This attribute should show the number of devices of type  that 
> can be
>created.
>  
> +* version
> +
> +  This attribute is rw, and is optional.
> +  It is used to check device compatibility between two mdev devices and is
> +  accessed in pairs between the two mdev devices being checked.
> +  The intent of this attribute is to make an mdev device's version opaque to
> +  user space, so instead of reading two mdev devices' version strings and
> +  comparing in userspace, user space should only read one mdev dev

Re: [libvirt] [PATCH 1/2] vfio/mdev: add version field as mandatory attribute for mdev device

2019-05-07 Thread Cornelia Huck
On Tue, 7 May 2019 01:39:13 -0400
Yan Zhao  wrote:

> On Tue, Apr 30, 2019 at 11:29:08PM +0800, Cornelia Huck wrote:

> > If I followed the discussion correctly, I think you plan to drop this
> > format, don't you? I'd be happy if a vendor driver can use a simple
> > number without any prefixes if it so chooses.
> > 
> > I also like the idea of renaming this "migration_version" so that it is
> > clear we're dealing with versioning of the migration capability (and
> > not a version of the device or so).  
> hi Cornelia,
> sorry I just saw this mail after sending v2 of this patch set...
> yes, I dropped the common part and vendor driver now can define whatever it
> wishes to identify a device version.

Ok, I'll look at v2.

> However, I don't agree to rename it to "migration_version", as it still may
> bring some kind of confusing with the migration version a vendor driver is
> using, e.g. vendor driver changes migration code and increases that migration
> version.
> In fact, what info we want to get from this attribute is whether this mdev
> device is compatible with another mdev device, which is tied to device, and 
> not
> necessarily bound to migration.
> 
> do you think so?

I'm not 100% convinced; but we can continue the discussion on v2.

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH 1/2] vfio/mdev: add version field as mandatory attribute for mdev device

2019-04-30 Thread Cornelia Huck
On Wed, 24 Apr 2019 04:15:58 -0400
Yan Zhao  wrote:

> On Wed, Apr 24, 2019 at 03:56:24PM +0800, Cornelia Huck wrote:
> > On Tue, 23 Apr 2019 23:10:37 -0400
> > Yan Zhao  wrote:
> >   
> > > On Tue, Apr 23, 2019 at 05:59:32PM +0800, Cornelia Huck wrote:  
> > > > On Fri, 19 Apr 2019 04:35:04 -0400
> > > > Yan Zhao  wrote:

> > > > > @@ -225,6 +228,8 @@ Directories and files under the sysfs for Each 
> > > > > Physical Device
> > > > >[], device_api, and available_instances are mandatory 
> > > > > attributes
> > > > >that should be provided by vendor driver.
> > > > >
> > > > > +  version is a mandatory attribute if a mdev device supports live 
> > > > > migration.
> > > > 
> > > > What about "An mdev device wishing to support live migration must
> > > > provide the version attribute."?
> > > yes, I just want to keep consistent with the line above it 
> > > " [], device_api, and available_instances are mandatory 
> > > attributes
> > >   that should be provided by vendor driver."
> > > what about below one?
> > >   "version is a mandatory attribute if a mdev device wishing to support 
> > > live
> > >   migration."  
> > 
> > My point is that an attribute is not mandatory if it can be left out :)
> > (I'm not a native speaker, though; maybe this makes perfect sense
> > after all?)
> > 
> > Maybe "version is a required attribute if live migration is supported
> > for an mdev device"?
> >   
> you are right, "mandatory" may bring some confusion.
> Maybe
> "vendor driver must provide version attribute for an mdev device wishing to
> support live migration." ?
> based on your first version :)

"The vendor driver must provide the version attribute for any mdev
device it wishes to support live migration for." ?

> 
> > > 
> > >   
> > > > > +
> > > > >  * []
> > > > >
> > > > >The [] name is created by adding the device driver string 
> > > > > as a prefix
> > > > > @@ -246,6 +251,35 @@ Directories and files under the sysfs for Each 
> > > > > Physical Device
> > > > >This attribute should show the number of devices of type  
> > > > > that can be
> > > > >created.
> > > > >
> > > > > +* version
> > > > > +
> > > > > +  This attribute is rw. It is used to check whether two devices are 
> > > > > compatible
> > > > > +  for live migration. If this attribute is missing, then the 
> > > > > corresponding mdev
> > > > > +  device is regarded as not supporting live migration.
> > > > > +
> > > > > +  It consists of two parts: common part and vendor proprietary part.
> > > > > +  common part: 32 bit. lower 16 bits is vendor id and higher 16 bits 
> > > > > identifies
> > > > > +   device type. e.g., for pci device, it is
> > > > > +   "pci vendor id" | (VFIO_DEVICE_FLAGS_PCI << 16).
> > > > > +  vendor proprietary part: this part is varied in length. vendor 
> > > > > driver can
> > > > > +   specify any string to identify a device.
> > > > > +
> > > > > +  When reading this attribute, it should show device version string 
> > > > > of the device
> > > > > +  of type . If a device does not support live migration, it 
> > > > > should
> > > > > +  return errno.
> > > > > +  When writing a string to this attribute, it returns errno for 
> > > > > incompatibility
> > > > > +  or returns written string length in compatibility case. If a 
> > > > > device does not
> > > > > +  support live migration, it always returns errno.
> > > > 
> > > > I'm not sure whether a device that does not support live migration
> > > > should expose this attribute in the first place. Or is that to cover
> > > > cases where a driver supports live migration only for some of the
> > > > devices it supports?
> > > yes, driver returning error code is to cover the cases where only part of 
> > > devices it
> > > supports can be migrated.
> > > 
> > >   
> >

Re: [libvirt] [PATCH 1/2] vfio/mdev: add version field as mandatory attribute for mdev device

2019-04-24 Thread Cornelia Huck
On Tue, 23 Apr 2019 23:10:37 -0400
Yan Zhao  wrote:

> On Tue, Apr 23, 2019 at 05:59:32PM +0800, Cornelia Huck wrote:
> > On Fri, 19 Apr 2019 04:35:04 -0400
> > Yan Zhao  wrote:
> >   
> > > device version attribute in mdev sysfs is used by user space software
> > > (e.g. libvirt) to query device compatibility for live migration of VFIO
> > > mdev devices. This attribute is mandatory if a mdev device supports live
> > > migration.
> > >
> > > It consists of two parts: common part and vendor proprietary part.
> > > common part: 32 bit. lower 16 bits is vendor id and higher 16 bits
> > >  identifies device type. e.g., for pci device, it is
> > >  "pci vendor id" | (VFIO_DEVICE_FLAGS_PCI << 16).
> > > vendor proprietary part: this part is varied in length. vendor driver can
> > >  specify any string to identify a device.
> > >
> > > When reading this attribute, it should show device version string of the
> > > device of type . If a device does not support live migration, it
> > > should return errno.  
> > 
> > It might make more sense if the driver does not register the attribute
> > for the device in that case at all.
> >   
> yes. what about rephrase like this:
> "
> When reading this attribute, it should show device version string of the
> device of type .
> If a device does not support live migration, it has two choices:
> 1. do not register this version attribute
> 2. return -ENODEV on rw this version attribute

"return -ENODEV when accessing the version attribute" ?

> Choice 1 is preferred.
> "
> 
> 
> > > When writing a string to this attribute, it returns errno for
> > > incompatibility or returns written string length in compatibility case.
> > > If a device does not support live migration, it always returns errno.
> > >
> > > For user space software to use:
> > > 1.
> > > Before starting live migration, user space software first reads source 
> > > side
> > > mdev device's version. e.g.
> > > "#cat \
> > > /sys/bus/pci/devices/\:00\:02.0/5ac1fb20-2bbf-4842-bb7e-36c58c3be9cd/mdev_type/version"
> > > 00028086-193b-i915-GVTg_V5_4
> > >
> > > 2.
> > > Then, user space software writes the source side returned version string
> > > to device version attribute in target side, and checks the return value.
> > > If a negative errno is returned in the target side, then mdev devices in
> > > source and target sides are not compatible;
> > > If a positive number is returned and it equals to the length of written
> > > string, then the two mdev devices in source and target side are 
> > > compatible.
> > > e.g.
> > > (a) compatibility case
> > > "# echo 00028086-193b-i915-GVTg_V5_4 >
> > > /sys/bus/pci/devices/\:00\:02.0/882cc4da-dede-11e7-9180-078a62063ab1/mdev_type/version"
> > >
> > > (b) incompatibility case
> > > "#echo 00028086-193b-i915-GVTg_V5_1 >
> > > /sys/bus/pci/devices/\:00\:02.0/882cc4da-dede-11e7-9180-078a62063ab1/mdev_type/version"
> > > -bash: echo: write error: Invalid argument
> > >
> > > 3. if two mdev devices are compatible, user space software can start
> > > live migration, and vice versa.
> > >
> > > Note: if a mdev device does not support live migration, it either does
> > > not provide a version attribute, or always returns errno when its version
> > > attribute is read/written.
> > >
> > > Cc: Alex Williamson 
> > > Cc: Erik Skultety 
> > > Cc: "Dr. David Alan Gilbert" 
> > > Cc: Cornelia Huck 
> > > Cc: "Tian, Kevin" 
> > > Cc: Zhenyu Wang 
> > > Cc: "Wang, Zhi A" 
> > > Cc: Neo Jia 
> > > Cc: Kirti Wankhede 
> > >
> > > Signed-off-by: Yan Zhao 
> > > ---
> > >  Documentation/vfio-mediated-device.txt | 36 ++
> > >  samples/vfio-mdev/mbochs.c | 17 
> > >  samples/vfio-mdev/mdpy.c   | 16 
> > >  samples/vfio-mdev/mtty.c   | 16 
> > >  4 files changed, 85 insertions(+)
> > >
> > > diff --git a/Documentation/vfio-mediated-device.txt 
> > > b/Documentation/vfio-mediated-device.txt
> > > index c3f69bcaf96e..bc28471c0667 100644
> > > --- a/Documentation/vfio-mediated-device.txt

Re: [libvirt] [PATCH 2/2] drm/i915/gvt: export mdev device version to sysfs for Intel vGPU

2019-04-23 Thread Cornelia Huck
On Fri, 19 Apr 2019 04:35:59 -0400
Yan Zhao  wrote:

> This feature implements the version attribute for Intel's vGPU mdev
> devices.
> 
> version attribute is rw. It is queried by userspace software like libvirt
> to check whether two vGPUs are compatible for live migration.
> 
> It consists of two parts: common part and vendor proprietary part.
> common part: 32 bit. lower 16 bits is vendor id and higher 16 bits
>  identifies device type. e.g., for pci device, it is
>  "pci vendor id" | (VFIO_DEVICE_FLAGS_PCI << 16).
> vendor proprietary part: this part is varied in length. vendor driver can
>  specify any string to identify a device.
> 
> For Intel vGPU of gen8 and gen9, the vendor proprietary part currently
> consists of 2 fields: "device id" + "mdev type".
> 
> Reading from a vGPU's version attribute, a string is returned in below
> format: 00028086--. e.g.
> 00028086-193b-i915-GVTg_V5_2.
> 
> Writing a string to a vGPU's version attribute will trigger GVT to check
> whether a vGPU identified by the written string is compatible with
> current vGPU owning this version attribute. errno is returned if the two
> vGPUs are incompatible. The length of written string is returned in
> compatible case.
> 
> For other platforms, and for GVT not supporting vGPU live migration
> feature, errnos are returned when read/write of mdev devices' version
> attributes.
> 
> For old GVT versions where no version attributes exposed in sysfs, it is
> regarded as not supporting vGPU live migration.
> 
> For future platforms, besides the current 2 fields in vendor proprietary
> part, more fields may be added to identify Intel vGPU well for live
> migration purpose.
> 
> Cc: Alex Williamson 
> Cc: Erik Skultety 
> Cc: "Dr. David Alan Gilbert" 
> Cc: Cornelia Huck 
> Cc: "Tian, Kevin" 
> Cc: Zhenyu Wang 
> Cc: "Wang, Zhi A" 
> c: Neo Jia 
> Cc: Kirti Wankhede 
> 
> Signed-off-by: Yan Zhao 
> ---
>  drivers/gpu/drm/i915/gvt/Makefile |  2 +-
>  drivers/gpu/drm/i915/gvt/device_version.c | 94 +++
>  drivers/gpu/drm/i915/gvt/gvt.c| 55 +
>  drivers/gpu/drm/i915/gvt/gvt.h|  6 ++
>  4 files changed, 156 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/gpu/drm/i915/gvt/device_version.c
> 

(...)

> +static bool is_compatible(const char *self, const char *remote)
> +{
> + if (strlen(remote) != strlen(self))
> + return false;
> +
> + return (strncmp(self, remote, strlen(self))) ? false : true;
> +}
> +
> +ssize_t intel_gvt_get_vfio_device_version_len(struct drm_i915_private 
> *dev_priv)
> +{
> + if (!IS_GEN(dev_priv, 8) && !IS_GEN(dev_priv, 9))
> + return -ENODEV;
> +
> + return PAGE_SIZE;
> +}
> +
> +ssize_t intel_gvt_get_vfio_device_version(struct drm_i915_private *dev_priv,
> + char *buf, const char *mdev_type)
> +{
> + int cnt = 0, ret = 0;
> + const char *str = NULL;
> +
> + /* currently only gen8 & gen9 are supported */
> + if (!IS_GEN(dev_priv, 8) && !IS_GEN(dev_priv, 9))
> + return -ENODEV;
> +
> + /* first 32 bit common part specifying vendor id and it's a pci
> +  * device
> +  */
> + cnt = snprintf(buf, GVT_DEVICE_VERSION_COMMON_LEN + 1,
> + "%08x", GVT_VFIO_DEVICE_VENDOR_ID);
> + buf += cnt;
> + ret += cnt;
> +
> + /* vendor proprietary part: device id + mdev type */
> + /* device id */
> + cnt = snprintf(buf, GVT_DEVICE_VERSION_DEVICE_ID_LEN + 2,
> + "-%04x", INTEL_DEVID(dev_priv));
> + buf += cnt;
> + ret += cnt;
> +
> + /* mdev type */
> + str = mdev_type;
> + cnt = snprintf(buf, strlen(str) + 3, "-%s\n", mdev_type);
> + buf += cnt;
> + ret += cnt;
> +
> + return ret;
> +}

Looking at this handling, it seems much easier to me to simply use a
numeric value instead of a string: You don't have to build it via
sprintf, there are generic functions for parsing a string input into a
simple number, and you have more options for compatibility (e.g.
"version must be between m and n" instead of an exact match).

If you still need to encode the device id here, you should be able to
easily do something like (device_id << 16) | migration_version -- do
you think that could work?

> +
> +ssize_t intel_gvt_check_vfio_device_version(struct drm_i915_private 
> *dev_priv,
> + const char *self, const char *remote)
> +{
> +
> + /* currently only gen8 & gen9 are supporte

Re: [libvirt] [PATCH 1/2] vfio/mdev: add version field as mandatory attribute for mdev device

2019-04-23 Thread Cornelia Huck
On Fri, 19 Apr 2019 04:35:04 -0400
Yan Zhao  wrote:

> device version attribute in mdev sysfs is used by user space software
> (e.g. libvirt) to query device compatibility for live migration of VFIO
> mdev devices. This attribute is mandatory if a mdev device supports live
> migration.
> 
> It consists of two parts: common part and vendor proprietary part.
> common part: 32 bit. lower 16 bits is vendor id and higher 16 bits
>  identifies device type. e.g., for pci device, it is
>  "pci vendor id" | (VFIO_DEVICE_FLAGS_PCI << 16).
> vendor proprietary part: this part is varied in length. vendor driver can
>  specify any string to identify a device.
> 
> When reading this attribute, it should show device version string of the
> device of type . If a device does not support live migration, it
> should return errno.

It might make more sense if the driver does not register the attribute
for the device in that case at all.

> When writing a string to this attribute, it returns errno for
> incompatibility or returns written string length in compatibility case.
> If a device does not support live migration, it always returns errno.
> 
> For user space software to use:
> 1.
> Before starting live migration, user space software first reads source side
> mdev device's version. e.g.
> "#cat \
> /sys/bus/pci/devices/\:00\:02.0/5ac1fb20-2bbf-4842-bb7e-36c58c3be9cd/mdev_type/version"
> 00028086-193b-i915-GVTg_V5_4
> 
> 2.
> Then, user space software writes the source side returned version string
> to device version attribute in target side, and checks the return value.
> If a negative errno is returned in the target side, then mdev devices in
> source and target sides are not compatible;
> If a positive number is returned and it equals to the length of written
> string, then the two mdev devices in source and target side are compatible.
> e.g.
> (a) compatibility case
> "# echo 00028086-193b-i915-GVTg_V5_4 >
> /sys/bus/pci/devices/\:00\:02.0/882cc4da-dede-11e7-9180-078a62063ab1/mdev_type/version"
> 
> (b) incompatibility case
> "#echo 00028086-193b-i915-GVTg_V5_1 >
> /sys/bus/pci/devices/\:00\:02.0/882cc4da-dede-11e7-9180-078a62063ab1/mdev_type/version"
> -bash: echo: write error: Invalid argument
> 
> 3. if two mdev devices are compatible, user space software can start
> live migration, and vice versa.
> 
> Note: if a mdev device does not support live migration, it either does
> not provide a version attribute, or always returns errno when its version
> attribute is read/written.
> 
> Cc: Alex Williamson 
> Cc: Erik Skultety 
> Cc: "Dr. David Alan Gilbert" 
> Cc: Cornelia Huck 
> Cc: "Tian, Kevin" 
> Cc: Zhenyu Wang 
> Cc: "Wang, Zhi A" 
> Cc: Neo Jia 
> Cc: Kirti Wankhede 
> 
> Signed-off-by: Yan Zhao 
> ---
>  Documentation/vfio-mediated-device.txt | 36 ++
>  samples/vfio-mdev/mbochs.c | 17 
>  samples/vfio-mdev/mdpy.c   | 16 
>  samples/vfio-mdev/mtty.c   | 16 
>  4 files changed, 85 insertions(+)
> 
> diff --git a/Documentation/vfio-mediated-device.txt 
> b/Documentation/vfio-mediated-device.txt
> index c3f69bcaf96e..bc28471c0667 100644
> --- a/Documentation/vfio-mediated-device.txt
> +++ b/Documentation/vfio-mediated-device.txt
> @@ -202,6 +202,7 @@ Directories and files under the sysfs for Each Physical 
> Device
>| |   |--- available_instances
>| |   |--- device_api
>| |   |--- description
> +  | |   |--- version
>| |   |--- [devices]
>| |--- []
>| |   |--- create
> @@ -209,6 +210,7 @@ Directories and files under the sysfs for Each Physical 
> Device
>| |   |--- available_instances
>| |   |--- device_api
>| |   |--- description
> +  | |   |--- version
>| |   |--- [devices]
>| |--- []
>|  |--- create
> @@ -216,6 +218,7 @@ Directories and files under the sysfs for Each Physical 
> Device
>|  |--- available_instances
>|  |--- device_api
>|  |--- description
> +  |  |--- version
>|  |--- [devices]
>  
>  * [mdev_supported_types]
> @@ -225,6 +228,8 @@ Directories and files under the sysfs for Each Physical 
> Device
>[], device_api, and available_instances are mandatory attributes
>that should be provided by vendor driver.
>  
> +  version is a mandatory attribute if a mdev device supports live migration.

What about "An mdev device wishing to support live migration must
provide the version attribute.

Re: [libvirt] [PATCH 1/2] vfio/mdev: add version field as mandatory attribute for mdev device

2019-04-23 Thread Cornelia Huck
On Tue, 23 Apr 2019 01:41:57 -0400
Yan Zhao  wrote:

> On Tue, Apr 23, 2019 at 09:21:00AM +0800, Alex Williamson wrote:
> > On Mon, 22 Apr 2019 21:01:52 -0400
> > Yan Zhao  wrote:
> >   
> > > On Mon, Apr 22, 2019 at 10:39:50PM +0800, Alex Williamson wrote:  
> > > > On Fri, 19 Apr 2019 04:35:04 -0400
> > > > Yan Zhao  wrote:
> > > > 
> > > > > device version attribute in mdev sysfs is used by user space software
> > > > > (e.g. libvirt) to query device compatibility for live migration of 
> > > > > VFIO
> > > > > mdev devices. This attribute is mandatory if a mdev device supports 
> > > > > live
> > > > > migration.
> > > > 
> > > > The Subject: doesn't quite match what's being proposed here.
> > > >  
> > > > > It consists of two parts: common part and vendor proprietary part.
> > > > > common part: 32 bit. lower 16 bits is vendor id and higher 16 bits
> > > > >  identifies device type. e.g., for pci device, it is
> > > > >  "pci vendor id" | (VFIO_DEVICE_FLAGS_PCI << 16).
> > > > 
> > > > What purpose does this serve?  If it's intended as some sort of
> > > > namespace feature, shouldn't we first assume that we can only support
> > > > migration to devices of the same type?  Therefore each type would
> > > > already have its own namespace.  Also that would make the trailing bit
> > > > of the version string listed below in the example redundant.  A vendor
> > > > is still welcome to include this in their version string if they wish,
> > > > but I think the string should be entirely vendor defined.
> > > >
> > > hi Alex,
> > > This common part is a kind of namespace.
> > > Because if version string is entirely defined by vendors, I'm worried 
> > > about
> > > if there is a case that one vendor's version string happens to deceive and
> > > interfere with another vendor's version checking?
> > > e.g.
> > > vendor A has a version string like: vendor id + device id + mdev type
> > > vendor B has a version string like: device id + vendor id + mdev type
> > > but vendor A's vendor id is 0x8086, device id is 0x1217
> > > vendor B's vendor id is 0x1217, device id is 0x8086.
> > > 
> > > In this corner case, the two vendors may regard the two device is
> > > migratable but actually they are not.
> > > 
> > > That's the reason for this common part that serve as a kind of namespace
> > > that all vendors will comply with to avoid overlap.  
> > 
> > If we assume that migration can only occur between matching mdev types,
> > this is redundant, each type already has their own namespace.
> >  
> hi Alex,
> do you mean user space software like libvirt needs to first check whether
> mdev type is matching and then check whether version is matching?
> 
> if user space software only checks version for migration, it means vendor
> driver has to include mdev type in their vendor proprietary part string,
> right?

Can't userspace simply check for the driver in use and only then check
the version attribute?

> Another thing is that could there be any future mdev parent driver that
> applies to all mdev devices, just like vfio-pci? like Yi's vfio-pci-mdev
> driver (https://lkml.org/lkml/2019/3/13/114)?

Hm, I think that the vfio-pci-mdev driver then needs to expose
information regarding compatibility (and not the core).

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH for-4.0 v3 2/2] virtio: Provide version-specific variants of virtio PCI devices

2018-11-27 Thread Cornelia Huck
On Tue, 27 Nov 2018 00:49:59 -0200
Eduardo Habkost  wrote:

> Many of the current virtio-*-pci device types actually represent
> 3 different types of devices:
> * virtio 1.0 non-transitional devices
> * virtio 1.0 transitional devices
> * virtio 0.9 ("legacy device" in virtio 1.0 terminology)
> 
> That would be just an annoyance if it didn't break our device/bus
> compatibility QMP interfaces.  With these multi-purpose device
> types, there's no way to tell management software that
> transitional devices and legacy devices require a Conventional
> PCI bus.
> 
> The multi-purpose device types would also prevent us from telling
> management software what's the PCI vendor/device ID for them,
> because their PCI IDs change at runtime depending on the bus
> where they were plugged.
> 
> This patch adds separate device types for each of those virtio
> device flavors:
> 
> - virtio-*-pci: the existing multi-purpose device types
>   - Configurable using `disable-legacy` and `disable-modern`
> properties
>   - Legacy driver support is automatically enabled/disabled
> depending on the bus where it is plugged
>   - Supports Conventional PCI and PCI Express buses
> (but Conventional PCI is incompatible with
> disable-legacy=off)
>   - Changes PCI vendor/device IDs at runtime
> - virtio-*-pci-transitional: virtio-1.0 device supporting legacy drivers
>   - Supports Conventional PCI buses only, because
> it has a PIO BAR
> - virtio-*-pci-non-transitional: modern-only
>   - Supports both Conventional PCI and PCI Express buses
> 
> The existing TYPE_* macros for these types will point to an
> abstract base type, so existing casts in the code will keep
> working for all variants.
> 
> A simple test script (tests/acceptance/virtio_version.py) is
> included, to check if the new device types are equivalent to
> using the `disable-legacy` and `disable-modern` options.
> 
> Acked-by: Andrea Bolognani 
> Signed-off-by: Eduardo Habkost 
> ---
> Changes v2 -> v3:
> * Split into two separate patches (type registration helper
>   and introduction of new types)
> * Include full type names as literals in the code instead
>   of generating type names automatically
> 
> Changes v1 -> v2:
> * Removed *-0.9 devices.  Nobody will want to use them, if
>   transitional devices work with legacy drivers
>   (Gerd Hoffmann, Michael S. Tsirkin)
> * Drop virtio version from name: rename -1.0-transitional to
>   -transitional (Michael S. Tsirkin)
> * Renamed -1.0 to -non-transitional
> * Don't add any extra variants to modern-only device types
>   (they don't need it)
> * Fix typo on TYPE_VIRTIO_INPUT_HOST_PCI (crash reported by
>   Cornelia Huck)
> * No need to change cast macros for modern-only devices
> * Rename virtio_register_types() to virtio_pci_types_register()
> ---
>  hw/virtio/virtio-pci.h |  24 ++--
>  hw/virtio/virtio-pci.c |  60 --
>  tests/acceptance/virtio_version.py | 177 +
>  3 files changed, 237 insertions(+), 24 deletions(-)
>  create mode 100644 tests/acceptance/virtio_version.py
> 

(...)

> diff --git a/tests/acceptance/virtio_version.py 
> b/tests/acceptance/virtio_version.py
> new file mode 100644
> index 00..a8d0b20b75
> --- /dev/null
> +++ b/tests/acceptance/virtio_version.py
> @@ -0,0 +1,177 @@
> +"""
> +Check compatibility of virtio device types
> +"""
> +# Copyright (c) 2018 Red Hat, Inc.
> +#
> +# Author:
> +#  Eduardo Habkost 
> +#
> +# This work is licensed under the terms of the GNU GPL, version 2 or
> +# later.  See the COPYING file in the top-level directory.
> +import sys
> +import os
> +
> +sys.path.append(os.path.join(os.path.dirname(__file__), "..", "..", 
> "scripts"))
> +from qemu import QEMUMachine
> +from avocado_qemu import Test
> +
> +# Virtio Device IDs:
> +VIRTIO_NET = 1
> +VIRTIO_BLOCK = 2
> +VIRTIO_CONSOLE = 3
> +VIRTIO_RNG = 4
> +VIRTIO_BALLOON = 5
> +VIRTIO_RPMSG = 7
> +VIRTIO_SCSI = 8
> +VIRTIO_9P = 9
> +VIRTIO_RPROC_SERIAL = 11
> +VIRTIO_CAIF = 12
> +VIRTIO_GPU = 16
> +VIRTIO_INPUT = 18
> +VIRTIO_VSOCK = 19
> +VIRTIO_CRYPTO = 20

We may want to include additional device types when they are added; but
I expect them to be modern-only, so I don't think we'd miss problems if
they aren't covered here.

(...)

Review of the test case was more superficial; but have a

Reviewed-by: Cornelia Huck 

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH for-4.0 v3 1/2] virtio: Helper for registering virtio device types

2018-11-27 Thread Cornelia Huck
On Tue, 27 Nov 2018 00:49:58 -0200
Eduardo Habkost  wrote:

> Introduce a helper for registering different flavours of virtio
> devices.  Convert code to use the helper, but keep only the
> existing generic types.  Transitional and non-transitional device
> types will be added by another patch.
> 
> Acked-by: Andrea Bolognani 
> Signed-off-by: Eduardo Habkost 
> ---
> Changes v2 -> v3:
> * Split into two separate patches (type registration helper
>   and introduction of new types)
> * Rewrote comments describing each flavour
> * Rewrote virtio_pci_types_register() completely:
>   * Replaced magic generation of type names with explicit fields in
> VirtioPCIDeviceTypeInfo
>   * Removed modern_only field (won't be used anymore)
>   * Don't register a separate base type unless it's required by
> the caller
> ---
>  hw/virtio/virtio-pci.h|  54 
>  hw/display/virtio-gpu-pci.c   |   7 +-
>  hw/display/virtio-vga.c   |   7 +-
>  hw/virtio/virtio-crypto-pci.c |   7 +-
>  hw/virtio/virtio-pci.c| 231 --
>  5 files changed, 228 insertions(+), 78 deletions(-)
> 
> diff --git a/hw/virtio/virtio-pci.h b/hw/virtio/virtio-pci.h
> index 813082b0d7..b26731a305 100644
> --- a/hw/virtio/virtio-pci.h
> +++ b/hw/virtio/virtio-pci.h
> @@ -417,4 +417,58 @@ struct VirtIOCryptoPCI {
>  /* Virtio ABI version, if we increment this, we break the guest driver. */
>  #define VIRTIO_PCI_ABI_VERSION  0
>  
> +/* Input for virtio_pci_types_register() */
> +typedef struct VirtioPCIDeviceTypeInfo {
> +/*
> + * Common base class for the subclasses below.
> + *
> + * Required only if transitional_name or non_transitional_name is set.
> + *
> + * We need a separate base type instead of making all types
> + * inherit from generic_name for two reasons:
> + * 1) generic_name implements INTERFACE_PCIE_DEVICE, but
> + *transitional_name don't.

s/don't/does not/

> + * 2) generic_name has the "disable-legacy" and "disable-modern"
> + *properties, transitional_name and non_transitional name don't.
> + */
> +const char *base_name;
> +/*
> + * Generic device type.  Optional.
> + *
> + * Supports both transitional and non-transitional modes,
> + * using the disable-legacy and disable-modern properties.
> + * If disable-legacy=auto, (non-)transitional mode is selected
> + * depending on the bus where the device is plugged.
> + *
> + * Implements both INTERFACE_PCIE_DEVICE and 
> INTERFACE_CONVENTIONAL_PCI_DEVICE,
> + * but PCI Express is supported only in non-transitional mode.
> + *
> + * The only type implemented by QEMU 3.1 and older.
> + */
> +const char *generic_name;
> +/*
> + * The non-transitional device type.  Optional.

That's transitional...

> + *
> + * Implements both INTERFACE_PCIE_DEVICE and 
> INTERFACE_CONVENTIONAL_PCI_DEVICE.
> + */
> +const char *transitional_name;
> +/*
> + * The transitional device type.  Optional.

...and that's non-transitional :)

> + *
> + * Implements INTERFACE_CONVENTIONAL_PCI_DEVICE only.
> + */
> +const char *non_transitional_name;
> +
> +/* Parent type.  If NULL, TYPE_VIRTIO_PCI is used */
> +const char *parent;
> +
> +/* Same as TypeInfo fields: */
> +size_t instance_size;
> +void (*instance_init)(Object *obj);
> +void (*class_init)(ObjectClass *klass, void *data);
> +} VirtioPCIDeviceTypeInfo;

That's a bit of boilerplate, but the end result seems to be more
greppable.

> +
> +/* Register virtio-pci type(s).  @t must be static. */
> +void virtio_pci_types_register(const VirtioPCIDeviceTypeInfo *t);
> +
>  #endif

(...)

Looks sane to me. With the typos fixed,

Reviewed-by: Cornelia Huck 

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH for-4.0 v2] virtio: Provide version-specific variants of virtio PCI devices

2018-11-20 Thread Cornelia Huck
On Mon, 19 Nov 2018 22:44:54 -0200
Eduardo Habkost  wrote:

> On Thu, Nov 15, 2018 at 11:50:56AM +0100, Cornelia Huck wrote:
> > On Thu, 15 Nov 2018 10:05:59 +
> > Daniel P. Berrangé  wrote:

> > > If libvirt did this compatibility approach, can you
> > > confirm this would be live migration state compatible.
> > > 
> > > ie can live migrate virtio-*-pci -> virtio-*-pci-transitional,
> > > provided only PCI bus was used.  
> > 
> > It also needs to make sure that neither disable-legacy nor
> > disable-modern is set. Then this would have a compatible state AFAICS.
> >   
> > >   
> > > > - virtio-*-pci-non-transitional: modern-only
> > > >   - Supports both Conventional PCI and PCI Express buses
> > > 
> > > IIUC, libvirt can again provide compatibility with old
> > > QEMU by simply using the existing device type and setting
> > > disable-legacy ?  Can you confirm this would be live
> > > migration compatible
> > > 
> > >   virtio-*-pci + disable-legacy -> virtio-*pci-non-transitional  
> > 
> > I think yes.  
> 
> This is exactly how it is implemented internally, but I'm not
> promising that this will be kept compatible forever.  And I
> wouldn't like to make that promise unless there's an important
> use case for that.

Shouldn't we be able to ensure compatibility by normal virtio feature
bit handling, as we have already done in the past?

> 
> We could easily ensure it will be compatible in pc-4.0 and older,
> though.  Would that be enough for the use case we have in mind?
> 


--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH for-4.0 v2] virtio: Provide version-specific variants of virtio PCI devices

2018-11-20 Thread Cornelia Huck
On Mon, 19 Nov 2018 19:32:32 -0200
Eduardo Habkost  wrote:

> However, I wish this kind of usability magic didn't automatically
> imposed us the burden of keeping guest ABI compatibility too.
> Keeping ABI compatibility on the machine-friendly device types and
> interfaces is already hard enough.

Hm, what are you thinking about here? We can fence off new features in
the hw compat section (and we already do so today), so the remaining
compat woe is what you are addressing with this patch set, isn't it?

> 
> We already have aliases that automatically select a virtio device
> type at qdev-monitor.c:qdev_alias_table[], and I don't know if
> they are supposed to keep a stable guest ABI.

As the comment there states, it was a bad idea... when you are using
the alias, you can't pass any properties that are unique to the
transport anyway. Hopefully the compat code dealt with the
disable-modern for old machine types correctly in this case; but I
don't think that helps for the bus dilemma.

I'm not sure we want to worry about the aliases too much; it might even
be a good idea to deprecate them.

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


  1   2   >