date:20200715

Re: [PATCH 12/18] qemu: capabilities: Add QEMU_CAPS_BLOCKDEV_HOSTDEV_SCSI

2020-07-15 Thread Peter Krempa

On Thu, Jul 16, 2020 at 00:16:23 +0200, JÃ¡n Tomko wrote:
> On a Friday in 2020, Peter Krempa wrote:
> > We want to instantiate hostdevs via -blockdev too. Add a separate
> > capability for them for a clean transition. The new capability will be
> > enabled when QEMU_CAPS_BLOCKDEV is present once all code is prepared.
> > 
> 
> What is the benefit here compared to using QEMU_CAPS_BLOCKDEV directly?

It encodes whether the particular VM instance was already started with
-blockdev used for hostdevs. The benefit ... or rather necessity is that
we need to handle cases when the VM was started with -drive differently
on hot-unplug of the hostdev.

I'd have to encode that bit of information somehow regardless of which
approach is used to actually store it.

> Seems more like a libvirt capability than a QEMU capability.

It is a capability of that particular instance of the VM the same way as
others. A VM started with pre-blockdev libvirt but modern qemu will not
use -blockdev for hotplug after libvirt upgrade and this is exactly te
same situation.

> Jano
> 
> > Signed-off-by: Peter Krempa 
> > ---
> > src/qemu/qemu_capabilities.c | 1 +
> > src/qemu/qemu_capabilities.h | 1 +
> > 2 files changed, 2 insertions(+)

Re: device compatibility interface for live migration with assigned devices

2020-07-15 Thread Jason Wang




On 2020/7/14 上午7:29, Yan Zhao wrote:

hi folks,
we are defining a device migration compatibility interface that helps upper
layer stack like openstack/ovirt/libvirt to check if two devices are
live migration compatible.
The "devices" here could be MDEVs, physical devices, or hybrid of the two.
e.g. we could use it to check whether
- a src MDEV can migrate to a target MDEV,
- a src VF in SRIOV can migrate to a target VF in SRIOV,
- a src MDEV can migration to a target VF in SRIOV.
   (e.g. SIOV/SRIOV backward compatibility case)

The upper layer stack could use this interface as the last step to check
if one device is able to migrate to another device before triggering a real
live migration procedure.
we are not sure if this interface is of value or help to you. please don't
hesitate to drop your valuable comments.


(1) interface definition
The interface is defined in below way:

  __userspace
   /\  \
  / \write
 / read  \
/__   ___\|/_
   | migration_version | | migration_version |-->check migration
   - -   compatibility
  device Adevice B


a device attribute named migration_version is defined under each device's
sysfs node. e.g. 
(/sys/bus/pci/devices/\:00\:02.0/$mdev_UUID/migration_version).



Are you aware of the devlink based device management interface that is 
proposed upstream? I think it has many advantages over sysfs, do you 
consider to switch to that?




userspace tools read the migration_version as a string from the source device,
and write it to the migration_version sysfs attribute in the target device.

The userspace should treat ANY of below conditions as two devices not 
compatible:
- any one of the two devices does not have a migration_version attribute
- error when reading from migration_version attribute of one device
- error when writing migration_version string of one device to
   migration_version attribute of the other device

The string read from migration_version attribute is defined by device vendor
driver and is completely opaque to the userspace.



My understanding is that something opaque to userspace is not the 
philosophy of Linux. Instead of having a generic API but opaque value, 
why not do in a vendor specific way like:


1) exposing the device capability in a vendor specific way via 
sysfs/devlink or other API
2) management read capability in both src and dst and determine whether 
we can do the migration


This is the way we plan to do with vDPA.

Thanks



for a Intel vGPU, string format can be defined like
"parent device PCI ID" + "version of gvt driver" + "mdev type" + "aggregator 
count".

for an NVMe VF connecting to a remote storage. it could be
"PCI ID" + "driver version" + "configured remote storage URL"

for a QAT VF, it may be
"PCI ID" + "driver version" + "supported encryption set".

(to avoid namespace confliction from each vendor, we may prefix a driver name to
each migration_version string. e.g. i915-v1-8086-591d-i915-GVTg_V5_8-1)


(2) backgrounds

The reason we hope the migration_version string is opaque to the userspace
is that it is hard to generalize standard comparing fields and comparing
methods for different devices from different vendors.
Though userspace now could still do a simple string compare to check if
two devices are compatible, and result should also be right, it's still
too limited as it excludes the possible candidate whose migration_version
string fails to be equal.
e.g. an MDEV with mdev_type_1, aggregator count 3 is probably compatible
with another MDEV with mdev_type_3, aggregator count 1, even their
migration_version strings are not equal.
(assumed mdev_type_3 is of 3 times equal resources of mdev_type_1).

besides that, driver version + configured resources are all elements demanding
to take into account.

So, we hope leaving the freedom to vendor driver and let it make the final 
decision
in a simple reading from source side and writing for test in the target side 
way.


we then think the device compatibility issues for live migration with assigned
devices can be divided into two steps:
a. management tools filter out possible migration target devices.
Tags could be created according to info from product specification.
we think openstack/ovirt may have vendor proprietary components to create
those customized tags for each product from each vendor.
e.g.
for Intel vGPU, with a vGPU(a MDEV device) in source side, the tags to
search target vGPU are like:
a tag for compatible parent PCI IDs,
a tag for a range of gvt driver versions,
a tag for a range of mdev type + aggregator count

for NVMe VF, the tags to search target VF may be like:
a tag for compatible PCI IDs,
a tag for a range of driver versions,
a tag for URL of configured remote storage.

b. with the output from st

Re: [RFC v2 1/1] memory: Delete assertion in memory_region_unregister_iommu_notifier

2020-07-15 Thread Jason Wang




On 2020/7/16 上午9:00, Peter Xu wrote:

On Mon, Jul 13, 2020 at 12:04:16PM +0800, Jason Wang wrote:

On 2020/7/10 下午9:30, Peter Xu wrote:

On Fri, Jul 10, 2020 at 02:34:11PM +0800, Jason Wang wrote:

On 2020/7/9 下午10:10, Peter Xu wrote:

On Thu, Jul 09, 2020 at 01:58:33PM +0800, Jason Wang wrote:

- If we care the performance, it's better to implement the MAP event for
vhost, otherwise it could be a lot of IOTLB miss

I feel like these are two things.

So far what we are talking about is whether vt-d should have knowledge about
what kind of events one iommu notifier is interested in.  I still think we
should keep this as answered in question 1.

The other question is whether we want to switch vhost from UNMAP to MAP/UNMAP
events even without vDMA, so that vhost can establish the mapping even before
IO starts.  IMHO it's doable, but only if the guest runs DPDK workloads.  When
the guest is using dynamic iommu page mappings, I feel like that can be even
slower, because then the worst case is for each IO we'll need to vmexit twice:

  - The first vmexit caused by an invalidation to MAP the page tables, so 
vhost
will setup the page table before IO starts

  - IO/DMA triggers and completes

  - The second vmexit caused by another invalidation to UNMAP the page 
tables

So it seems to be worse than when vhost only uses UNMAP like right now.  At
least we only have one vmexit (when UNMAP).  We'll have a vhost translate()
request from kernel to userspace, but IMHO that's cheaper than the vmexit.

Right but then I would still prefer to have another notifier.

Since vtd_page_walk has nothing to do with device IOTLB. IOMMU have a
dedicated command for flushing device IOTLB. But the check for
vtd_as_has_map_notifier is used to skip the device which can do demand
paging via ATS or device specific way. If we have two different notifiers,
vhost will be on the device iotlb notifier so we don't need it at all?

But we can still have iommu notifier that only registers to UNMAP even after we
introduce dev-iotlb notifier?  We don't want to do page walk for them as well.
TCG should be the only one so far, but I don't know.. maybe there can still be
new ones?

I think you're right. But looking at the codes, it looks like the check of
vtd_as_has_map_notifier() was only used in:

1) vtd_iommu_replay()
2) vtd_iotlb_page_invalidate_notify() (PSI)

For the replay, it's expensive anyhow. For PSI, I think it's just about one
or few mappings, not sure it will have obvious performance impact.

And I had two questions:

1) The codes doesn't check map for DSI or GI, does this match what spec
said? (It looks to me the spec is unclear in this part)

Both DSI/GI should cover maps too?  E.g. vtd_sync_shadow_page_table() in
vtd_iotlb_domain_invalidate().


I meant the code doesn't check whether there's an MAP notifier :)

It's actually checked, because it loops over vtd_as_with_notifiers, and only
MAP notifiers register to that. :)



I may miss something but I don't find the code to block UNMAP notifiers?

vhost_iommu_region_add()
    memory_region_register_iommu_notifier()
        memory_region_update_iommu_notify_flags()
            imrc->notify_flag_changed()
                vtd_iommu_notify_flag_changed()

?




But I agree with you that it should be cleaner to introduce the dev-iotlb
notifier type.



Yes, it can solve the issues of duplicated UNMAP issue of vhost.





2) for the replay() I don't see other implementations (either spapr or
generic one) that did unmap (actually they skip unmap explicitly), any
reason for doing this in intel IOMMU?

I could be wrong, but I'd guess it's because vt-d implemented the caching mode
by leveraging the same invalidation strucuture, so it's harder to make all
things right (IOW, we can't clearly identify MAP with UNMAP when we receive an
invalidation request, because MAP/UNMAP requests look the same).

I didn't check others, but I believe spapr is doing it differently by using
some hypercalls to deliver IOMMU map/unmap requests, which seems a bit close to
what virtio-iommu is doing.  Anyway, the point is if we have explicit MAP/UNMAP
from the guest, logically the replay indeed does not need to do any unmap
because we don't need to call replay() on an already existing device but only
for e.g. hot plug.


But this looks conflict with what memory_region_iommu_replay( ) did, for
IOMMU that doesn't have a replay method, it skips UNMAP request:

     for (addr = 0; addr < memory_region_size(mr); addr += granularity) {
     iotlb = imrc->translate(iommu_mr, addr, IOMMU_NONE, n->iommu_idx);
     if (iotlb.perm != IOMMU_NONE) {
     n->notify(n, &iotlb);
     }

I guess there's no knowledge of whether guest have an explicit MAP/UMAP for
this generic code. Or replay implies that guest doesn't have explicit
MAP/UNMAP?

I think it matches exactly with a hot plug case?  Note that when IOMMU_NONE
could also mean the translation does not exist.  So it's actually trying to m

Re: [RFC v2 1/1] memory: Delete assertion in memory_region_unregister_iommu_notifier

2020-07-15 Thread Peter Xu

On Mon, Jul 13, 2020 at 12:04:16PM +0800, Jason Wang wrote:
> 
> On 2020/7/10 下午9:30, Peter Xu wrote:
> > On Fri, Jul 10, 2020 at 02:34:11PM +0800, Jason Wang wrote:
> > > On 2020/7/9 下午10:10, Peter Xu wrote:
> > > > On Thu, Jul 09, 2020 at 01:58:33PM +0800, Jason Wang wrote:
> > > > > > > - If we care the performance, it's better to implement the MAP 
> > > > > > > event for
> > > > > > > vhost, otherwise it could be a lot of IOTLB miss
> > > > > > I feel like these are two things.
> > > > > > 
> > > > > > So far what we are talking about is whether vt-d should have 
> > > > > > knowledge about
> > > > > > what kind of events one iommu notifier is interested in.  I still 
> > > > > > think we
> > > > > > should keep this as answered in question 1.
> > > > > > 
> > > > > > The other question is whether we want to switch vhost from UNMAP to 
> > > > > > MAP/UNMAP
> > > > > > events even without vDMA, so that vhost can establish the mapping 
> > > > > > even before
> > > > > > IO starts.  IMHO it's doable, but only if the guest runs DPDK 
> > > > > > workloads.  When
> > > > > > the guest is using dynamic iommu page mappings, I feel like that 
> > > > > > can be even
> > > > > > slower, because then the worst case is for each IO we'll need to 
> > > > > > vmexit twice:
> > > > > > 
> > > > > >  - The first vmexit caused by an invalidation to MAP the page 
> > > > > > tables, so vhost
> > > > > >will setup the page table before IO starts
> > > > > > 
> > > > > >  - IO/DMA triggers and completes
> > > > > > 
> > > > > >  - The second vmexit caused by another invalidation to UNMAP 
> > > > > > the page tables
> > > > > > 
> > > > > > So it seems to be worse than when vhost only uses UNMAP like right 
> > > > > > now.  At
> > > > > > least we only have one vmexit (when UNMAP).  We'll have a vhost 
> > > > > > translate()
> > > > > > request from kernel to userspace, but IMHO that's cheaper than the 
> > > > > > vmexit.
> > > > > Right but then I would still prefer to have another notifier.
> > > > > 
> > > > > Since vtd_page_walk has nothing to do with device IOTLB. IOMMU have a
> > > > > dedicated command for flushing device IOTLB. But the check for
> > > > > vtd_as_has_map_notifier is used to skip the device which can do demand
> > > > > paging via ATS or device specific way. If we have two different 
> > > > > notifiers,
> > > > > vhost will be on the device iotlb notifier so we don't need it at all?
> > > > But we can still have iommu notifier that only registers to UNMAP even 
> > > > after we
> > > > introduce dev-iotlb notifier?  We don't want to do page walk for them 
> > > > as well.
> > > > TCG should be the only one so far, but I don't know.. maybe there can 
> > > > still be
> > > > new ones?
> > > 
> > > I think you're right. But looking at the codes, it looks like the check of
> > > vtd_as_has_map_notifier() was only used in:
> > > 
> > > 1) vtd_iommu_replay()
> > > 2) vtd_iotlb_page_invalidate_notify() (PSI)
> > > 
> > > For the replay, it's expensive anyhow. For PSI, I think it's just about 
> > > one
> > > or few mappings, not sure it will have obvious performance impact.
> > > 
> > > And I had two questions:
> > > 
> > > 1) The codes doesn't check map for DSI or GI, does this match what spec
> > > said? (It looks to me the spec is unclear in this part)
> > Both DSI/GI should cover maps too?  E.g. vtd_sync_shadow_page_table() in
> > vtd_iotlb_domain_invalidate().
> 
> 
> I meant the code doesn't check whether there's an MAP notifier :)

It's actually checked, because it loops over vtd_as_with_notifiers, and only
MAP notifiers register to that. :)

But I agree with you that it should be cleaner to introduce the dev-iotlb
notifier type.

> 
> 
> > 
> > > 2) for the replay() I don't see other implementations (either spapr or
> > > generic one) that did unmap (actually they skip unmap explicitly), any
> > > reason for doing this in intel IOMMU?
> > I could be wrong, but I'd guess it's because vt-d implemented the caching 
> > mode
> > by leveraging the same invalidation strucuture, so it's harder to make all
> > things right (IOW, we can't clearly identify MAP with UNMAP when we receive 
> > an
> > invalidation request, because MAP/UNMAP requests look the same).
> > 
> > I didn't check others, but I believe spapr is doing it differently by using
> > some hypercalls to deliver IOMMU map/unmap requests, which seems a bit 
> > close to
> > what virtio-iommu is doing.  Anyway, the point is if we have explicit 
> > MAP/UNMAP
> > from the guest, logically the replay indeed does not need to do any unmap
> > because we don't need to call replay() on an already existing device but 
> > only
> > for e.g. hot plug.
> 
> 
> But this looks conflict with what memory_region_iommu_replay( ) did, for
> IOMMU that doesn't have a replay method, it skips UNMAP request:
> 
>     for (addr = 0; addr < memory_region_size(mr); addr += granularity) {
>     iotlb = imrc->translate(iommu_mr, addr, IOMM

Re: [PATCH 03/11] virDomainHostdevDefFormatSubsys: Split out formatting of PCI subsystem

2020-07-15 Thread Ján Tomko


On a Tuesday in 2020, Peter Krempa wrote:

Similarly to previous commit split out formatting of the PCI subsystem
related stuff.

Signed-off-by: Peter Krempa 
---
src/conf/domain_conf.c | 74 ++
1 file changed, 46 insertions(+), 28 deletions(-)



Reviewed-by: Ján Tomko 

Jano


signature.asc
Description: PGP signature

Re: [PATCH 02/11] virDomainHostdevDefFormatSubsys: Split out formatting of USB subsystem

2020-07-15 Thread Ján Tomko


On a Tuesday in 2020, Peter Krempa wrote:

Separate out bits related to USB so that the logic isn't entangled in
multiple conditional statements.

Signed-off-by: Peter Krempa 
---
src/conf/domain_conf.c | 73 +-
1 file changed, 51 insertions(+), 22 deletions(-)



Reviewed-by: Ján Tomko 

Jano


signature.asc
Description: PGP signature

Re: [PATCH 01/11] virDomainHostdevDefFormatSubsys: Use virXMLFormatElement

2020-07-15 Thread Ján Tomko


On a Tuesday in 2020, Peter Krempa wrote:

Refactor the formatter to the new multiple buffer based approach so that
we can easily separate it into formatters per subsys type.

Signed-off-by: Peter Krempa 
---
src/conf/domain_conf.c | 65 ++
1 file changed, 27 insertions(+), 38 deletions(-)


Reviewed-by: Ján Tomko 

Jano


signature.asc
Description: PGP signature

Re: [PATCH 17/18] qemu: caps: Enable QEMU_CAPS_BLOCKDEV_HOSTDEV_SCSI

2020-07-15 Thread Ján Tomko


On a Friday in 2020, Peter Krempa wrote:

Enable it when regular QEMU_CAPS_BLOCKDEV is present.

Signed-off-by: Peter Krempa 
---
src/qemu/qemu_capabilities.c  |  3 ++
.../caps_4.2.0.aarch64.xml|  1 +
.../qemucapabilitiesdata/caps_4.2.0.s390x.xml |  1 +
.../caps_4.2.0.x86_64.xml |  1 +
.../caps_5.0.0.aarch64.xml|  1 +
.../qemucapabilitiesdata/caps_5.0.0.ppc64.xml |  1 +
.../caps_5.0.0.riscv64.xml|  1 +
.../caps_5.0.0.x86_64.xml |  1 +
.../caps_5.1.0.x86_64.xml |  1 +
.../hostdev-scsi-lsi.x86_64-latest.args   | 52 +++
...ostdev-scsi-virtio-scsi.x86_64-latest.args | 46 
11 files changed, 65 insertions(+), 44 deletions(-)



Reviewed-by: Ján Tomko 

Jano


signature.asc
Description: PGP signature

Re: [PATCH 12/18] qemu: capabilities: Add QEMU_CAPS_BLOCKDEV_HOSTDEV_SCSI

2020-07-15 Thread Ján Tomko


On a Friday in 2020, Peter Krempa wrote:

We want to instantiate hostdevs via -blockdev too. Add a separate
capability for them for a clean transition. The new capability will be
enabled when QEMU_CAPS_BLOCKDEV is present once all code is prepared.



What is the benefit here compared to using QEMU_CAPS_BLOCKDEV directly?

Seems more like a libvirt capability than a QEMU capability.

Jano


Signed-off-by: Peter Krempa 
---
src/qemu/qemu_capabilities.c | 1 +
src/qemu/qemu_capabilities.h | 1 +
2 files changed, 2 insertions(+)


signature.asc
Description: PGP signature

Re: [PATCH 18/18] qemuBuildSCSIHostdevDrvStr: unexport

2020-07-15 Thread Ján Tomko


On a Friday in 2020, Peter Krempa wrote:

The function is no longer called from other modules.

Signed-off-by: Peter Krempa 
---
src/qemu/qemu_command.c | 2 +-
src/qemu/qemu_command.h | 3 ---
2 files changed, 1 insertion(+), 4 deletions(-)



Reviewed-by: Ján Tomko 

Jano


signature.asc
Description: PGP signature

Re: [PATCH 16/18] qemuDomainRemoveHostDevice: Use new infrastructure for (i)SCSI

2020-07-15 Thread Ján Tomko


On a Friday in 2020, Peter Krempa wrote:

Similarly to previous commits, modify the hostdev detach code to use
blockdev infrastructure to detach (i)SCSI hostdevs.

Signed-off-by: Peter Krempa 
---
src/qemu/qemu_hotplug.c | 20 +++-
1 file changed, 3 insertions(+), 17 deletions(-)



Reviewed-by: Ján Tomko 

Jano


signature.asc
Description: PGP signature

Re: [PATCH 15/18] qemuDomainAttachHostSCSIDevice: Use new infrastructure

2020-07-15 Thread Ján Tomko


On a Friday in 2020, Peter Krempa wrote:

Similarly to command line creation, use the blockdev helpers when
hotplugging an (i)SCSI hostdev.

Signed-off-by: Peter Krempa 
---
src/qemu/qemu_hotplug.c | 45 +++--
1 file changed, 7 insertions(+), 38 deletions(-)



Reviewed-by: Ján Tomko 

Jano


signature.asc
Description: PGP signature

Re: [PATCH 14/18] qemuBuildHostdevSCSICommandLine: Use new infrastructure

2020-07-15 Thread Ján Tomko


On a Friday in 2020, Peter Krempa wrote:

In preparation for instantiating (i)SCSI hostdevs via -blockdev,
refactor qemuBuildHostdevSCSICommandLine to use the new infrastructure
which will do it automatically.

Signed-off-by: Peter Krempa 
---
src/qemu/qemu_command.c | 44 -
1 file changed, 4 insertions(+), 40 deletions(-)



Reviewed-by: Ján Tomko 

Jano


signature.asc
Description: PGP signature

Re: [PATCH 13/18] qemu: command: Create qemuBlockStorageSourceAttachData for (i)SCSI hostdevs

2020-07-15 Thread Ján Tomko


On a Friday in 2020, Peter Krempa wrote:

Add convertor for creating qemuBlockStorageSourceAttachData which will
allow reusing the infrastructure which we have for attaching disks also
for hostdevs.

Signed-off-by: Peter Krempa 
---
src/qemu/qemu_command.c | 83 +
src/qemu/qemu_command.h |  8 
2 files changed, 91 insertions(+)



Reviewed-by: Ján Tomko 

Jano


signature.asc
Description: PGP signature

Re: [PATCH 11/18] qemuBuildSCSIHostdevDevStr: Pass in backend alias

2020-07-15 Thread Ján Tomko


On a Friday in 2020, Peter Krempa wrote:

Don't (re)generate the backend alias (alias of the -drive backend for
now) internally but rather pass it in. Later on it will be replaced by
the nodename when blockdev is used depending on the capabilities.

Signed-off-by: Peter Krempa 
---
src/qemu/qemu_command.c | 14 --
src/qemu/qemu_command.h |  4 +++-
src/qemu/qemu_hotplug.c |  2 +-
3 files changed, 12 insertions(+), 8 deletions(-)



Reviewed-by: Ján Tomko 

Jano


signature.asc
Description: PGP signature

Re: [PATCH 10/18] qemuBuildHostdevCommandLine: Extract (i)SCSI code

2020-07-15 Thread Ján Tomko


On a Friday in 2020, Peter Krempa wrote:

Move all (i)SCSI related code into a new function named
'qemuBuildHostdevSCSICommandLine'.

Signed-off-by: Peter Krempa 
---
src/qemu/qemu_command.c | 61 +
1 file changed, 38 insertions(+), 23 deletions(-)



Reviewed-by: Ján Tomko 

Jano


signature.asc
Description: PGP signature

Re: [PATCH 08/18] qemu: domain: Regenerate hostdev source private data

2020-07-15 Thread Ján Tomko


On a Friday in 2020, Peter Krempa wrote:

When upgrading from a libvirt which didn't format private data of a
virStorageSource representing an iSCSI hostdev source, we might need to
generate some internal data so that the code still works as if it was
present in the status XML.

Signed-off-by: Peter Krempa 
---
src/qemu/qemu_domain.c| 54 ++-
.../disk-secinfo-upgrade-in.xml   | 20 +++
.../disk-secinfo-upgrade-out.xml  | 30 +++
3 files changed, 102 insertions(+), 2 deletions(-)



Reviewed-by: Ján Tomko 

Jano


signature.asc
Description: PGP signature

Re: [PATCH 09/18] qemu: hotplug: Don't regenerate iSCSI secret alias

2020-07-15 Thread Ján Tomko


On a Friday in 2020, Peter Krempa wrote:

We now store the alias of the secrets in the status XML so there's no
need to generate it again.

Signed-off-by: Peter Krempa 
---
src/qemu/qemu_hotplug.c | 19 +++
1 file changed, 7 insertions(+), 12 deletions(-)



Reviewed-by: Ján Tomko 

Jano


signature.asc
Description: PGP signature

Re: [PATCH 07/18] qemuDomainSecretHostdevDestroy: Don't clear secinfo alias

2020-07-15 Thread Ján Tomko


On a Friday in 2020, Peter Krempa wrote:

We need the alias to deal with hot-unplug of the hostdev. Use
qemuDomainSecretInfoDestroy which clears only the secrets and not the
alias. The same function is used also for handling disk secrets.

Signed-off-by: Peter Krempa 
---
src/qemu/qemu_domain.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)



Reviewed-by: Ján Tomko 

Jano


signature.asc
Description: PGP signature

Re: [PATCH 06/18] qemustatusxml2xmltest: Add tests for iSCSI hostdev private data handling

2020-07-15 Thread Ján Tomko


On a Friday in 2020, Peter Krempa wrote:

Signed-off-by: Peter Krempa 
---
tests/qemustatusxml2xmldata/modern-in.xml | 18 ++
1 file changed, 18 insertions(+)



Reviewed-by: Ján Tomko 

Jano


signature.asc
Description: PGP signature

Re: [PATCH 05/18] virDomainHostdevSubsysSCSIiSCSIDefParseXML: Parse private data of virStorageSource

2020-07-15 Thread Ján Tomko


On a Friday in 2020, Peter Krempa wrote:

We store the config of an iSCSI hostdev in a virStorageSource structure.
Parse the private data portion.

Signed-off-by: Peter Krempa 
---
src/conf/domain_conf.c | 39 +--
1 file changed, 29 insertions(+), 10 deletions(-)

diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
index bda9375f13..ceaf73772d 100644
--- a/src/conf/domain_conf.c
+++ b/src/conf/domain_conf.c
@@ -8283,7 +8283,9 @@ virDomainHostdevSubsysSCSIHostDefParseXML(xmlNodePtr 
sourcenode,
static int
virDomainHostdevSubsysSCSIiSCSIDefParseXML(xmlNodePtr sourcenode,
   virDomainHostdevSubsysSCSIPtr def,
-   xmlXPathContextPtr ctxt)
+   xmlXPathContextPtr ctxt,
+   unsigned int flags,
+   virDomainXMLOptionPtr xmlopt)
{
int auth_secret_usage = -1;
xmlNodePtr cur;
@@ -8348,13 +8350,27 @@ virDomainHostdevSubsysSCSIiSCSIDefParseXML(xmlNodePtr 
sourcenode,
}
cur = cur->next;
}
+
+if ((flags & VIR_DOMAIN_DEF_PARSE_STATUS) &&


Extra parentheses.


+xmlopt && xmlopt->privateData.storageParse) {
+VIR_XPATH_NODE_AUTORESTORE(ctxt);
+
+ctxt->node = sourcenode;
+
+if ((ctxt->node = virXPathNode("./privateData", ctxt)) &&
+xmlopt->privateData.storageParse(ctxt, iscsisrc->src) < 0)
+return -1;
+}
+
return 0;
}



Reviewed-by: Ján Tomko 

Jano


signature.asc
Description: PGP signature

Re: [PATCH 04/18] virDomainHostdevDefFormatSubsys: Format private data for a virStorageSource

2020-07-15 Thread Ján Tomko


On a Friday in 2020, Peter Krempa wrote:

iSCSI subsystem hostdevs store the data as a virStorageSource. Format
the private data part of the virStorageSource in the appropriate place.

Signed-off-by: Peter Krempa 
---
src/conf/domain_conf.c | 32 
1 file changed, 20 insertions(+), 12 deletions(-)



Reviewed-by: Ján Tomko 

Jano


signature.asc
Description: PGP signature

Re: [PATCH 02/18] qemuBlockStorageSourceGetBackendProps: Allow skipping "discard":"unmap"

2020-07-15 Thread Ján Tomko


On a Friday in 2020, Peter Krempa wrote:

It doesn't make sense to format "discard" when doing a -blockdev backend
of scsi-generic used with SCSI hostdevs. Add a way to skip it.

Signed-off-by: Peter Krempa 
---
src/qemu/qemu_block.c | 8 
src/qemu/qemu_block.h | 1 +
2 files changed, 9 insertions(+)

diff --git a/src/qemu/qemu_block.c b/src/qemu/qemu_block.c
index 10ddf53b3b..bdd07d9925 100644
--- a/src/qemu/qemu_block.c
+++ b/src/qemu/qemu_block.c
@@ -1061,6 +1061,9 @@ 
qemuBlockStorageSourceGetBlockdevGetCacheProps(virStorageSourcePtr src,
 *  omit any data which does not identify the image itself
 *  QEMU_BLOCK_STORAGE_SOURCE_BACKEND_PROPS_AUTO_READONLY:
 *  use the auto-read-only feature of qemu
+ *  QEMU_BLOCK_STORAGE_SOURCE_BACKEND_PROPS_SKIP_UNMAP:
+ *  don't enable 'discard:unmap' option for passing through dicards


discards


+ *  (note that this is disabled also for _LEGACY and _TARGET_ONLY options)
 *


So outside of tests, this is (currently) used by one out of three
callers. Would it make sense to negate the option?


 * Creates a JSON object describing the underlying storage or protocol of a
 * storage source. Returns NULL on error and reports an appropriate error 
message.


Reviewed-by: Ján Tomko 

Jano


signature.asc
Description: PGP signature

Re: [PATCH 03/18] qemuBlockStorageSourceAttachData: Add field for ad-hoc storage node name

2020-07-15 Thread Ján Tomko


On a Friday in 2020, Peter Krempa wrote:

SCSI hostdevs don't have a virStorageSource associated with the backend
in certain cases. Adding a separate field to hold memory for a copy of
the nodename of the storage backend will allow reusing the blockdev
machinery also for SCSI hostdevs.

Signed-off-by: Peter Krempa 
---
src/qemu/qemu_block.c | 1 +
src/qemu/qemu_block.h | 1 +
2 files changed, 2 insertions(+)



Reviewed-by: Ján Tomko 

Jano


signature.asc
Description: PGP signature

Re: [PATCH 01/18] qemuBlockStorageSourceGetBackendProps: Convert boolean arguments to flags

2020-07-15 Thread Ján Tomko


On a Friday in 2020, Peter Krempa wrote:

Upcoming commit will need to add another flag for the function so
convert it to a bitwise-or'd array of flags to prevent having 4
booleans.



false true  false true  false true  false false
false true  true  false true  false false false
false true  true  false false false false true
false true  true  false true  true  true  false
false true  true  false true  false true  true
false false true  false false false false false
false true  true  true  true  false false true
false true  true  false true  true  true  true
false true  true  true  false true  false true
false false true  false false false false true


Signed-off-by: Peter Krempa 
---
src/qemu/qemu_block.c   | 32 +---
src/qemu/qemu_block.h   | 10 +++---
src/qemu/qemu_command.c |  3 ++-
tests/qemublocktest.c   | 18 --
4 files changed, 42 insertions(+), 21 deletions(-)

diff --git a/src/qemu/qemu_block.c b/src/qemu/qemu_block.c
index a2eabbcd64..10ddf53b3b 100644
--- a/src/qemu/qemu_block.c
+++ b/src/qemu/qemu_block.c
@@ -1052,26 +1052,32 @@ 
qemuBlockStorageSourceGetBlockdevGetCacheProps(virStorageSourcePtr src,
/**
 * qemuBlockStorageSourceGetBackendProps:
 * @src: disk source
- * @legacy: use legacy formatting of attributes (for -drive / old qemus)
- * @onlytarget: omit any data which does not identify the image itself
- * @autoreadonly: use the auto-read-only feature of qemu
+ * @flags: bitwise-or of qemuBlockStorageSourceBackendPropsFlags
+ *
+ * Flags:
+ *   QEMU_BLOCK_STORAGE_SOURCE_BACKEND_PROPS_LEGACY:


This flag has three spaces in front of it.


+ *  use legacy formatting of attributes (for -drive / old qemus)
+ *  QEMU_BLOCK_STORAGE_SOURCE_BACKEND_PROPS_TARGET_ONLY:
+ *  omit any data which does not identify the image itself
+ *  QEMU_BLOCK_STORAGE_SOURCE_BACKEND_PROPS_AUTO_READONLY:
+ *  use the auto-read-only feature of qemu
 *


Reviewed-by: Ján Tomko 

Jano


signature.asc
Description: PGP signature

Re: [PATCH 1/3] virNetSocketCheckProtocols: Separate out checking family via getaddrinfo()

2020-07-15 Thread Ján Tomko


On a Wednesday in 2020, Michal Privoznik wrote:

The virNetSocketCheckProtocols() function is supposed to tell
caller whether IPv4 and/or IPv6 is supported on the system. In
the initial round, it uses getifaddrs() to see if an interface
has IPv4/IPv6 address assigned and then to double check IPv6 it
uses getaddrinfo() to lookup IPv6 loopback address. Separate out
this latter code because it is going to be reused.

Since the original code lived under an #ifdef and the new
function doesn't it is marked as unused - because on some systems
it may be so.

Signed-off-by: Michal Privoznik 
---
src/rpc/virnetsocket.c | 63 ++
1 file changed, 39 insertions(+), 24 deletions(-)

diff --git a/src/rpc/virnetsocket.c b/src/rpc/virnetsocket.c
index d1f4c531aa..b6bc3edc3b 100644
--- a/src/rpc/virnetsocket.c
+++ b/src/rpc/virnetsocket.c
@@ -141,16 +141,48 @@ static int virNetSocketForkDaemon(const char *binary)
}
#endif

+
+static int G_GNUC_UNUSED
+virNetSocketCheckProtocolByLookup(const char *address,
+  int family,
+  bool *hasFamily)
+{
+struct addrinfo hints;
+struct addrinfo *ai = NULL;
+int gaierr;
+
+memset(&hints, 0, sizeof(hints));
+hints.ai_family = family;
+hints.ai_flags = AI_PASSIVE | AI_ADDRCONFIG;
+hints.ai_socktype = SOCK_STREAM;
+
+if ((gaierr = getaddrinfo(address, NULL, &hints, &ai)) != 0) {
+*hasFamily = false;
+
+if (gaierr == EAI_FAMILY ||
+#ifdef EAI_ADDRFAMILY
+gaierr == EAI_ADDRFAMILY ||
+#endif
+gaierr == EAI_NONAME) {
+} else {
+virReportError(VIR_ERR_INTERNAL_ERROR,
+   _("Cannot resolve ::1 address: %s"),


s/::1/'%s'/


+   gai_strerror(gaierr));
+return -1;
+}
+} else {
+*hasFamily = true;
+}


Jano


signature.asc
Description: PGP signature

Re: [PATCH 0/3] Fix virnetsocket failure in IPv4 disabled environment

2020-07-15 Thread Ján Tomko


On a Wednesday in 2020, Michal Privoznik wrote:

This imperfection was reported by Andrea here:

https://www.redhat.com/archives/libvir-list/2020-July/msg00753.html

Michal Prívozník (3):
 virNetSocketCheckProtocols: Separate out checking family via
   getaddrinfo()
 virNetSocketCheckProtocols: lookup IPv6 only if suspecting IPv6
 virNetSocketCheckProtocols: Confirm IPv4 by lookup too

src/rpc/virnetsocket.c | 67 +++---
1 file changed, 43 insertions(+), 24 deletions(-)



Reviewed-by: Ján Tomko 

Jano


signature.asc
Description: PGP signature

[libvirt PATCH] network: split out networkSetIPv6Sysctl

2020-07-15 Thread Ján Tomko

Refactor networkSetIPv6Sysctls to remove repetition and reuse
of the 'field' variable.

Signed-off-by: Ján Tomko 
---
 src/network/bridge_driver.c | 71 ++---
 1 file changed, 35 insertions(+), 36 deletions(-)

diff --git a/src/network/bridge_driver.c b/src/network/bridge_driver.c
index 713763130b..d817b5cbd6 100644
--- a/src/network/bridge_driver.c
+++ b/src/network/bridge_driver.c
@@ -2249,37 +2249,51 @@ networkEnableIPForwarding(bool enableIPv4,
 }
 
 
+static int
+networkSetIPv6Sysctl(const char *bridge,
+ const char *sysctl_field,
+ const char *sysctl_setting,
+ bool ignoreMissing)
+{
+g_autofree char *field = g_strdup_printf(SYSCTL_PATH 
"/net/ipv6/conf/%s/%s",
+ bridge, sysctl_field);
+
+if (ignoreMissing && access(field, W_OK) < 0 && errno == ENOENT)
+return -2;
+
+if (virFileWriteStr(field, sysctl_setting, 0) < 0) {
+virReportSystemError(errno,
+ _("cannot write to '%s' on bridge '%s'"),
+ field, bridge);
+return -1;
+}
+
+return 0;
+}
+
+
 static int
 networkSetIPv6Sysctls(virNetworkObjPtr obj)
 {
 virNetworkDefPtr def = virNetworkObjGetDef(obj);
-char *field = NULL;
-int ret = -1;
 bool enableIPv6 = !!virNetworkDefGetIPByIndex(def, AF_INET6, 0);
+int rc;
 
 /* set disable_ipv6 if there are no ipv6 addresses defined for the
  * network. But also unset it if there *are* ipv6 addresses, as we
  * can't be sure of its default value.
  */
-field = g_strdup_printf(SYSCTL_PATH "/net/ipv6/conf/%s/disable_ipv6",
-def->bridge);
-
-if (access(field, W_OK) < 0 && errno == ENOENT) {
+rc = networkSetIPv6Sysctl(def->bridge, "disable_ipv6",
+  enableIPv6 ? "0" : "1", true);
+if (rc == -2) {
 if (!enableIPv6)
 VIR_DEBUG("ipv6 appears to already be disabled on %s",
   def->bridge);
-ret = 0;
-goto cleanup;
+return 0;
+} else if (rc < 0) {
+return -1;
 }
 
-if (virFileWriteStr(field, enableIPv6 ? "0" : "1", 0) < 0) {
-virReportSystemError(errno,
- _("cannot write to %s to enable/disable IPv6 "
-   "on bridge %s"), field, def->bridge);
-goto cleanup;
-}
-VIR_FREE(field);
-
 /* The rest of the ipv6 sysctl tunables should always be set the
  * same, whether or not we're using ipv6 on this bridge.
  */
@@ -2287,31 +2301,16 @@ networkSetIPv6Sysctls(virNetworkObjPtr obj)
 /* Prevent guests from hijacking the host network by sending out
  * their own router advertisements.
  */
-field = g_strdup_printf(SYSCTL_PATH "/net/ipv6/conf/%s/accept_ra",
-def->bridge);
-
-if (virFileWriteStr(field, "0", 0) < 0) {
-virReportSystemError(errno,
- _("cannot disable %s"), field);
-goto cleanup;
-}
-VIR_FREE(field);
+if (networkSetIPv6Sysctl(def->bridge, "accept_ra", "0", false) < 0)
+return -1;
 
 /* All interfaces used as a gateway (which is what this is, by
  * definition), must always have autoconf=0.
  */
-field = g_strdup_printf(SYSCTL_PATH "/net/ipv6/conf/%s/autoconf", 
def->bridge);
+if (networkSetIPv6Sysctl(def->bridge, "autoconf", "0", false) < 0)
+return -1;
 
-if (virFileWriteStr(field, "0", 0) < 0) {
-virReportSystemError(errno,
- _("cannot disable %s"), field);
-goto cleanup;
-}
-
-ret = 0;
- cleanup:
-VIR_FREE(field);
-return ret;
+return 0;
 }
 
 
-- 
2.25.4

Re: [libvirt PATCH v2 05/15] network: use g_auto wherever appropriate

2020-07-15 Thread Laine Stump


On 7/15/20 11:10 AM, Ján Tomko wrote:

On a Tuesday in 2020, Laine Stump wrote:

This includes standard g_autofree() as well as other objects that have
a cleanup function defined to use via g_autoptr (virCommand,
virJSONValue)

Signed-off-by: Laine Stump 
---
src/network/bridge_driver.c   | 206 ++
src/network/bridge_driver_linux.c |   7 +-
src/network/leaseshelper.c    |  16 +--
3 files changed, 76 insertions(+), 153 deletions(-)

diff --git a/src/network/bridge_driver.c b/src/network/bridge_driver.c
index ab359acdb5..31bd0dd92c 100644
--- a/src/network/bridge_driver.c
+++ b/src/network/bridge_driver.c


[...]


@@ -1095,7 +1081,6 @@ networkDnsmasqConfContents(virNetworkObjPtr obj,
    bool wantDNS = dns->enable != VIR_TRISTATE_BOOL_NO;
    virNetworkIPDefPtr tmpipdef, ipdef, ipv4def, ipv6def;
    bool ipv6SLAAC;
-    char *saddr = NULL, *eaddr = NULL;

    *configstr = NULL;



[...]


@@ -1414,6 +1396,8 @@ networkDnsmasqConfContents(virNetworkObjPtr obj,
    int thisRange;
    virNetworkDHCPRangeDef range = ipdef->ranges[r];
    g_autofree char *leasetime = NULL;
+    g_autofree char *saddr = NULL;
+    g_autofree char *eaddr = NULL;


300 lines below the original location. Long function is long. :)



At least there were no unrelated changes in be... oh, wait. Nevermind.


A long time ago (1988) in a galaxy far far away (Lake City, Minnesota) I 
worked with a guy who told me that any function that wouldn't fit on a 
single screen was too long and needed to be broken up (this was in the 
80x25 ASCII terminal days). He would probably rip out his moustache and 
scream if he saw some of the functions in libvirt.







    if (!(saddr = virSocketAddrFormat(&range.addr.start)) ||
    !(eaddr = virSocketAddrFormat(&range.addr.end)))


[...]


@@ -2248,7 +2206,7 @@ static int
networkSetIPv6Sysctls(virNetworkObjPtr obj)
{
    virNetworkDefPtr def = virNetworkObjGetDef(obj);
-    char *field = NULL;
+    g_autofree char *field = NULL;


Last time I tried manually freeing an autofree'd variable, I was told
not to do that O:-)



Yeah, there's a few places where we re-use a pointer for "temporary" 
strings. In a different patch I sent a few weeks ago, I fixed it by just 
declaring multiple separate autofree variables, one for each usage, and 
I think that is the least future-bug-prone method of dealing with it.



(I hadn't seen anyone scolding about not manually freeing and autofree'd 
variable, but since doing so made me uneasy anyway, I'm happy to jump on 
the bandwagon :-)





The clean way here seems to be refactoring the function. I can put that
somewhere into the depths of my TODO list.



If you really want to, you can. Otherwise I can send a patch for that to 
be pushed along with this series, just so that I won't have the icky 
feeling of leaving a job not quite done.






    int ret = -1;
    bool enableIPv6 = !!virNetworkDefGetIPByIndex(def, AF_INET6, 0);

@@ -2273,7 +2231,6 @@ networkSetIPv6Sysctls(virNetworkObjPtr obj)
   "on bridge %s"), field, def->bridge);
    goto cleanup;
    }
-    VIR_FREE(field);

    /* The rest of the ipv6 sysctl tunables should always be set the
 * same, whether or not we're using ipv6 on this bridge.
@@ -2282,6 +2239,7 @@ networkSetIPv6Sysctls(virNetworkObjPtr obj)
    /* Prevent guests from hijacking the host network by sending out
 * their own router advertisements.
 */
+    VIR_FREE(field);
    field = g_strdup_printf(SYSCTL_PATH "/net/ipv6/conf/%s/accept_ra",
    def->bridge);

@@ -2290,11 +2248,11 @@ networkSetIPv6Sysctls(virNetworkObjPtr obj)
 _("cannot disable %s"), field);
    goto cleanup;
    }
-    VIR_FREE(field);

    /* All interfaces used as a gateway (which is what this is, by
 * definition), must always have autoconf=0.
 */
+    VIR_FREE(field);
    field = g_strdup_printf(SYSCTL_PATH "/net/ipv6/conf/%s/autoconf", 
def->bridge);


    if (virFileWriteStr(field, "0", 0) < 0) {
@@ -2305,7 +2263,6 @@ networkSetIPv6Sysctls(virNetworkObjPtr obj)

    ret = 0;
 cleanup:
-    VIR_FREE(field);
    return ret;
}



[...]

@@ -3276,8 +3221,6 @@ 
networkFindUnusedBridgeName(virNetworkObjListPtr nets,

   MAX_BRIDGE_ID);
    ret = 0;


So this function returned 0 even on failure.
Introduced by a28d3e485f01d16320af15780bc935f54782a45d


 cleanup:
-    if (ret < 0)
-    VIR_FREE(newname);
    return ret;
}



Without the networkSetIPv6Sysctls changes:
Reviewed-by: Ján Tomko 

Jano

Re: [libvirt-jenkins-ci PATCH 7/7] lcitool: Create and expose ccache wrappers

2020-07-15 Thread Andrea Bolognani

On Wed, 2020-07-15 at 16:57 +0100, Daniel P. Berrangé wrote:
> On Fri, Mar 27, 2020 at 08:34:59PM +0100, Andrea Bolognani wrote:
> > +commands.extend([
> > +"mkdir -p /usr/local/share/ccache-wrappers",
> > +])
> > +
> > +if cross_arch:
> > +commands.extend([
> > +"ln -s {ccache} 
> > /usr/local/share/ccache-wrappers/{cross_abi}-cc",
> > +"ln -s {ccache} 
> > /usr/local/share/ccache-wrappers/{cross_abi}-$(basename {cc})",
> > +])
> > +else:
> > +commands.extend([
> > +"ln -s {ccache} /usr/local/share/ccache-wrappers/cc",
> > +"ln -s {ccache} 
> > /usr/local/share/ccache-wrappers/$(basename {cc})",
> > +])
> > +
> >  script = "\nRUN " + (" && \\\n".join(commands)) + "\n"
> >  sys.stdout.write(script.format(**varmap))
> 
> I've just realized that this addition has prevented the caching and
> reuse the base layer across the cross images. The first "RUN" cmmand
> was supposed to have stuff that is common across cross images, but
> we've accidentally included the "cross_abi" in the path for ccache
> here.
> 
> We need to put the ccache setup in the second RUN command, for cross
> containers. Only native images can have it in the first RUN command.

Yeah, I realized that a while ago but always forgot to send out a
patch :) Do you want to take a stab at it?

-- 
Andrea Bolognani / Red Hat / Virtualization

Re: [libvirt-jenkins-ci PATCH 7/7] lcitool: Create and expose ccache wrappers

2020-07-15 Thread Daniel P . Berrangé

On Fri, Mar 27, 2020 at 08:34:59PM +0100, Andrea Bolognani wrote:
> VM-based builds have used ccache by default for a very long time,
> and now container-based builds will too.
> 
> Signed-off-by: Andrea Bolognani 
> ---
>  guests/lcitool | 18 ++
>  1 file changed, 18 insertions(+)
> 
> diff --git a/guests/lcitool b/guests/lcitool
> index 117e1a5..011fc07 100755
> --- a/guests/lcitool
> +++ b/guests/lcitool
> @@ -651,6 +651,8 @@ class Application:
>  varmap = self._dockerfile_build_varmap_rpm(facts, mappings, 
> pip_mappings, projects, cross_arch)
>  
>  varmap["package_manager"] = facts["package_manager"]
> +varmap["cc"] = facts["cc"]
> +varmap["ccache"] = facts["ccache"]
>  varmap["make"] = facts["make"]
>  varmap["ninja"] = facts["ninja"]
>  varmap["python"] = facts["python"]
> @@ -864,6 +866,21 @@ class Application:
>  "{package_manager} clean all -y",
>  ])
>  
> +commands.extend([
> +"mkdir -p /usr/local/share/ccache-wrappers",
> +])
> +
> +if cross_arch:
> +commands.extend([
> +"ln -s {ccache} 
> /usr/local/share/ccache-wrappers/{cross_abi}-cc",
> +"ln -s {ccache} 
> /usr/local/share/ccache-wrappers/{cross_abi}-$(basename {cc})",
> +])
> +else:
> +commands.extend([
> +"ln -s {ccache} /usr/local/share/ccache-wrappers/cc",
> +"ln -s {ccache} /usr/local/share/ccache-wrappers/$(basename 
> {cc})",
> +])
> +
>  script = "\nRUN " + (" && \\\n".join(commands)) + "\n"
>  sys.stdout.write(script.format(**varmap))

I've just realized that this addition has prevented the caching and
reuse the base layer across the cross images. The first "RUN" cmmand
was supposed to have stuff that is common across cross images, but
we've accidentally included the "cross_abi" in the path for ccache
here.

We need to put the ccache setup in the second RUN command, for cross
containers. Only native images can have it in the first RUN command.

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: [libvirt PATCH v2 15/15] nwfilter: convert remaining VIR_FREE() to g_free()

2020-07-15 Thread Ján Tomko


On a Tuesday in 2020, Laine Stump wrote:

Signed-off-by: Laine Stump 
---
src/nwfilter/nwfilter_dhcpsnoop.c | 16 
src/nwfilter/nwfilter_driver.c| 10 +-
src/nwfilter/nwfilter_ebiptables_driver.c |  2 +-
src/nwfilter/nwfilter_gentech_driver.c|  6 +++---
src/nwfilter/nwfilter_learnipaddr.c   |  8 
5 files changed, 21 insertions(+), 21 deletions(-)

diff --git a/src/nwfilter/nwfilter_dhcpsnoop.c 
b/src/nwfilter/nwfilter_dhcpsnoop.c
index 64671af135..aafa6de322 100644
--- a/src/nwfilter/nwfilter_dhcpsnoop.c
+++ b/src/nwfilter/nwfilter_dhcpsnoop.c
@@ -314,7 +314,7 @@ virNWFilterSnoopCancel(char **threadKey)
virNWFilterSnoopActiveLock();

ignore_value(virHashRemoveEntry(virNWFilterSnoopState.active, *threadKey));
-VIR_FREE(*threadKey);
+g_free(*threadKey);


This one should use g_clear_pointer.



virNWFilterSnoopActiveUnlock();
}
diff --git a/src/nwfilter/nwfilter_driver.c b/src/nwfilter/nwfilter_driver.c
index 39d0a2128e..7853ad59fa 100644
--- a/src/nwfilter/nwfilter_driver.c
+++ b/src/nwfilter/nwfilter_driver.c
@@ -303,7 +303,7 @@ nwfilterStateInitialize(bool privileged,

 err_free_driverstate:
virNWFilterObjListFree(driver->nwfilters);
-VIR_FREE(driver);
+g_free(driver);


Same here.



return VIR_DRV_STATE_INIT_ERROR;
}
@@ -379,7 +379,7 @@ nwfilterStateCleanup(void)
virNWFilterObjListFree(driver->nwfilters);

virMutexDestroy(&driver->lock);
-VIR_FREE(driver);
+g_free(driver);



Possibly here.


return 0;
}


(I have not verified whether the double use of the pointer may
practically happen, but we do a non-NULL check of these in a few cases)

Reviewed-by: Ján Tomko 

Jano


signature.asc
Description: PGP signature

Re: [libvirt PATCH v2 14/15] nwfilter: convert local pointers to use g_auto*

2020-07-15 Thread Ján Tomko


On a Tuesday in 2020, Laine Stump wrote:

Signed-off-by: Laine Stump 
---
src/nwfilter/nwfilter_dhcpsnoop.c | 91 +++
src/nwfilter/nwfilter_ebiptables_driver.c | 75 +++
src/nwfilter/nwfilter_gentech_driver.c| 15 ++--
src/nwfilter/nwfilter_learnipaddr.c   |  7 +-
4 files changed, 61 insertions(+), 127 deletions(-)



Reviewed-by: Ján Tomko 

Jano


signature.asc
Description: PGP signature

Re: [libvirt PATCH v2 08/15] nwfilter: remove unnecessary code from ebtablesGetSubChainInsts()

2020-07-15 Thread Ján Tomko


On a Tuesday in 2020, Laine Stump wrote:

On failure, this function would clear out and free the list of
subchains it had been called with. This is unnecessary, because the
*only* caller of this function will also clear out and free the list
of subchains if it gets a failure from ebtablesGetSubChainInsts().

(It also makes more logical sense for the function that is creating
the entire list to be the one freeing the entire list, rather than
having a function whose purpose is only to create *one item* on the
list freeing the entire list).


This is the function creating the list, I think it makes sense
to not leave anything allocated in case of failure.

Jano



Signed-off-by: Laine Stump 
---
src/nwfilter/nwfilter_ebiptables_driver.c | 6 --
1 file changed, 6 deletions(-)



signature.asc
Description: PGP signature

Re: [libvirt PATCH v2 13/15] nwfilter: replace VIR_ALLOC with g_new0

2020-07-15 Thread Ján Tomko


On a Tuesday in 2020, Laine Stump wrote:

Signed-off-by: Laine Stump 
---
src/nwfilter/nwfilter_dhcpsnoop.c | 9 +++--
src/nwfilter/nwfilter_driver.c| 3 +--
src/nwfilter/nwfilter_ebiptables_driver.c | 3 +--
src/nwfilter/nwfilter_gentech_driver.c| 3 +--
src/nwfilter/nwfilter_learnipaddr.c   | 6 ++
5 files changed, 8 insertions(+), 16 deletions(-)



Reviewed-by: Ján Tomko 

Jano


signature.asc
Description: PGP signature

Re: [libvirt PATCH v2 12/15] nwfilter: use standard label names when reasonable

2020-07-15 Thread Ján Tomko


On a Tuesday in 2020, Laine Stump wrote:

Rather than having labels named exit, done, exit_snooprequnlock,
skip_rename, etc, use the standard "cleanup" label. And instead of
err_exit, malformed, tear_down_tmpebchains, use "error".

Signed-off-by: Laine Stump 
---
src/nwfilter/nwfilter_dhcpsnoop.c | 36 +++
src/nwfilter/nwfilter_ebiptables_driver.c | 12 
src/nwfilter/nwfilter_gentech_driver.c| 32 ++--
src/nwfilter/nwfilter_learnipaddr.c   | 24 +++
4 files changed, 52 insertions(+), 52 deletions(-)



Reviewed-by: Ján Tomko 

Jano


signature.asc
Description: PGP signature

Re: [libvirt PATCH v2 11/15] nwfilter: transform logic in virNWFilterRuleInstSort to eliminate label

2020-07-15 Thread Ján Tomko


On a Tuesday in 2020, Laine Stump wrote:

This rewrite of a nested conditional produces the same results, but
eliminate a goto and corresponding label.

Signed-off-by: Laine Stump 
---
src/nwfilter/nwfilter_ebiptables_driver.c | 11 +--
1 file changed, 5 insertions(+), 6 deletions(-)



Reviewed-by: Ján Tomko 

Jano


signature.asc
Description: PGP signature

Re: [libvirt PATCH v2 09/15] nwfilter: clear nrules when resetting virNWFilterInst

2020-07-15 Thread Ján Tomko


On a Tuesday in 2020, Laine Stump wrote:

It's possible/probable the callers to virNWFilterInstReset() make it
unnecessary to set the object's nrules to 0 after freeing all its
rules, but that same function is setting nfilters to 0, so let's do
the same for the sake of consistency.

Signed-off-by: Laine Stump 
---
src/nwfilter/nwfilter_gentech_driver.c | 1 +
1 file changed, 1 insertion(+)



Reviewed-by: Ján Tomko 

Jano


signature.asc
Description: PGP signature

Re: [libvirt PATCH v2 10/15] nwfilter: define a typedef for struct ebtablesSubChainInst

2020-07-15 Thread Ján Tomko


On a Tuesday in 2020, Laine Stump wrote:

Signed-off-by: Laine Stump 
---
src/nwfilter/nwfilter_ebiptables_driver.c | 14 --
1 file changed, 8 insertions(+), 6 deletions(-)



Reviewed-by: Ján Tomko 

Jano


signature.asc
Description: PGP signature

[PATCH v4 1/2] conf: add 'isa' controller type

2020-07-15 Thread Roman Bogorodskiy

Introduce 'isa' controller type. In domain XML it looks this way:

...

  

...

Currently, this is needed for the bhyve driver to allow choosing a
specific PCI address for that. In bhyve, this controller is used to
attach serial ports and a boot ROM.

Signed-off-by: Roman Bogorodskiy 
---
 docs/schemas/domaincommon.rng  | 6 ++
 src/conf/domain_conf.c | 9 +
 src/conf/domain_conf.h | 8 
 src/qemu/qemu_command.c| 1 +
 src/qemu/qemu_domain.c | 1 +
 src/qemu/qemu_domain_address.c | 1 +
 src/qemu/qemu_validate.c   | 1 +
 src/vbox/vbox_common.c | 1 +
 8 files changed, 28 insertions(+)

diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng
index 4b4aa60c66..7bed99b161 100644
--- a/docs/schemas/domaincommon.rng
+++ b/docs/schemas/domaincommon.rng
@@ -2416,6 +2416,12 @@
   
 
   
+  
+  
+
+  isa
+
+  
   
   
 
diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
index d14485f18d..6befe4bbd8 100644
--- a/src/conf/domain_conf.c
+++ b/src/conf/domain_conf.c
@@ -399,6 +399,7 @@ VIR_ENUM_IMPL(virDomainController,
   "usb",
   "pci",
   "xenbus",
+  "isa",
 );
 
 VIR_ENUM_IMPL(virDomainControllerModelPCI,
@@ -444,6 +445,9 @@ VIR_ENUM_IMPL(virDomainControllerModelSCSI,
   "virtio-non-transitional",
 );
 
+VIR_ENUM_IMPL(virDomainControllerModelISA, 
VIR_DOMAIN_CONTROLLER_MODEL_ISA_LAST,
+);
+
 VIR_ENUM_IMPL(virDomainControllerModelUSB,
   VIR_DOMAIN_CONTROLLER_MODEL_USB_LAST,
   "piix3-uhci",
@@ -2312,6 +2316,7 @@ virDomainControllerDefNew(virDomainControllerType type)
 case VIR_DOMAIN_CONTROLLER_TYPE_SCSI:
 case VIR_DOMAIN_CONTROLLER_TYPE_SATA:
 case VIR_DOMAIN_CONTROLLER_TYPE_CCID:
+case VIR_DOMAIN_CONTROLLER_TYPE_ISA:
 case VIR_DOMAIN_CONTROLLER_TYPE_LAST:
 break;
 }
@@ -10975,6 +10980,8 @@ virDomainControllerModelTypeFromString(const 
virDomainControllerDef *def,
 return virDomainControllerModelIDETypeFromString(model);
 else if (def->type == VIR_DOMAIN_CONTROLLER_TYPE_VIRTIO_SERIAL)
 return virDomainControllerModelVirtioSerialTypeFromString(model);
+else if (def->type == VIR_DOMAIN_CONTROLLER_TYPE_ISA)
+return virDomainControllerModelISATypeFromString(model);
 
 return -1;
 }
@@ -10994,6 +11001,8 @@ 
virDomainControllerModelTypeToString(virDomainControllerDefPtr def,
 return virDomainControllerModelIDETypeToString(model);
 else if (def->type == VIR_DOMAIN_CONTROLLER_TYPE_VIRTIO_SERIAL)
 return virDomainControllerModelVirtioSerialTypeToString(model);
+else if (def->type == VIR_DOMAIN_CONTROLLER_TYPE_ISA)
+return virDomainControllerModelISATypeToString(model);
 
 return NULL;
 }
diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h
index 6a737591e2..188385cc6b 100644
--- a/src/conf/domain_conf.h
+++ b/src/conf/domain_conf.h
@@ -595,6 +595,7 @@ typedef enum {
 VIR_DOMAIN_CONTROLLER_TYPE_USB,
 VIR_DOMAIN_CONTROLLER_TYPE_PCI,
 VIR_DOMAIN_CONTROLLER_TYPE_XENBUS,
+VIR_DOMAIN_CONTROLLER_TYPE_ISA,
 
 VIR_DOMAIN_CONTROLLER_TYPE_LAST
 } virDomainControllerType;
@@ -686,6 +687,12 @@ typedef enum {
 VIR_DOMAIN_CONTROLLER_MODEL_VIRTIO_SERIAL_LAST
 } virDomainControllerModelVirtioSerial;
 
+typedef enum {
+VIR_DOMAIN_CONTROLLER_MODEL_ISA_DEFAULT = -1,
+
+VIR_DOMAIN_CONTROLLER_MODEL_ISA_LAST
+} virDomainControllerModelISA;
+
 #define IS_USB2_CONTROLLER(ctrl) \
 (((ctrl)->type == VIR_DOMAIN_CONTROLLER_TYPE_USB) && \
  ((ctrl)->model == VIR_DOMAIN_CONTROLLER_MODEL_USB_ICH9_EHCI1 || \
@@ -3547,6 +3554,7 @@ VIR_ENUM_DECL(virDomainControllerModelSCSI);
 VIR_ENUM_DECL(virDomainControllerModelUSB);
 VIR_ENUM_DECL(virDomainControllerModelIDE);
 VIR_ENUM_DECL(virDomainControllerModelVirtioSerial);
+VIR_ENUM_DECL(virDomainControllerModelISA);
 VIR_ENUM_DECL(virDomainFS);
 VIR_ENUM_DECL(virDomainFSDriver);
 VIR_ENUM_DECL(virDomainFSAccessMode);
diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c
index 24e99e13b8..d4e190a1ae 100644
--- a/src/qemu/qemu_command.c
+++ b/src/qemu/qemu_command.c
@@ -2699,6 +2699,7 @@ qemuBuildControllerDevStr(const virDomainDef *domainDef,
 case VIR_DOMAIN_CONTROLLER_TYPE_IDE:
 case VIR_DOMAIN_CONTROLLER_TYPE_FDC:
 case VIR_DOMAIN_CONTROLLER_TYPE_XENBUS:
+case VIR_DOMAIN_CONTROLLER_TYPE_ISA:
 case VIR_DOMAIN_CONTROLLER_TYPE_LAST:
 virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
_("Unsupported controller type: %s"),
diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
index 2d9d8226d6..f5dbadb323 100644
--- a/src/qemu/qemu_domain.c
+++ b/src/qemu/qemu_domain.c
@@ -5054,6 +5054,7 @@ 
qemuDomainControllerDefPostParse(virDomainControllerDefPtr cont,
 case VIR_DOMAIN_CONTROLLER_TYPE_IDE:

[PATCH v4 2/2] bhyve: support 'isa' controller for LPC

2020-07-15 Thread Roman Bogorodskiy

Support modeling of the 'isa' controller for bhyve. User can manually
define any PCI slot for the 'isa' controller, including PCI slot 1,
but other devices are not allowed to use this address.

When domain configuration requires the 'isa' controller to be present,
automatically add it on domain post-parse stage.

Now, as this controller is always available when needed, it's not
necessary to implicitly add it to the bhyve command line, so remove
bhyveBuildLPCArgStr().

Also, make bhyveDomainDefNeedsISAController() static as it's no longer
used outside of bhyve_domain.c.

As more than one ISA controller is not supported by bhyve,
and multiple controllers with the same index are forbidden,
so forbid ISA controllers with non-zero index for bhyve.

Signed-off-by: Roman Bogorodskiy 
---
 src/bhyve/bhyve_command.c | 27 +++---
 src/bhyve/bhyve_device.c  | 23 +---
 src/bhyve/bhyve_domain.c  | 25 +++--
 src/bhyve/bhyve_domain.h  |  2 --
 ...ml2argv-addr-isa-controller-on-slot-1.args | 10 ++
 ...2argv-addr-isa-controller-on-slot-1.ldargs |  3 ++
 ...xml2argv-addr-isa-controller-on-slot-1.xml | 26 ++
 ...l2argv-addr-isa-controller-on-slot-31.args | 10 ++
 ...argv-addr-isa-controller-on-slot-31.ldargs |  3 ++
 ...ml2argv-addr-isa-controller-on-slot-31.xml | 26 ++
 ...argv-addr-non-isa-controller-on-slot-1.xml | 23 
 .../bhyvexml2argv-console.args|  2 +-
 .../bhyvexml2argv-isa-controller.args | 10 ++
 .../bhyvexml2argv-isa-controller.ldargs   |  3 ++
 .../bhyvexml2argv-isa-controller.xml  | 24 +
 ...bhyvexml2argv-isa-multiple-controllers.xml | 25 +
 .../bhyvexml2argv-serial-grub-nocons.args |  2 +-
 .../bhyvexml2argv-serial-grub.args|  2 +-
 .../bhyvexml2argv-serial.args |  2 +-
 .../bhyvexml2argvdata/bhyvexml2argv-uefi.args |  4 +--
 .../bhyvexml2argv-vnc-autoport.args   |  4 +--
 .../bhyvexml2argv-vnc-vgaconf-io.args |  4 +--
 .../bhyvexml2argv-vnc-vgaconf-off.args|  4 +--
 .../bhyvexml2argv-vnc-vgaconf-on.args |  4 +--
 .../bhyvexml2argvdata/bhyvexml2argv-vnc.args  |  4 +--
 tests/bhyvexml2argvtest.c |  5 +++
 ...l2xmlout-addr-isa-controller-on-slot-1.xml | 36 +++
 ...2xmlout-addr-isa-controller-on-slot-31.xml | 36 +++
 .../bhyvexml2xmlout-console.xml   |  3 ++
 .../bhyvexml2xmlout-isa-controller.xml| 36 +++
 .../bhyvexml2xmlout-serial-grub-nocons.xml|  3 ++
 .../bhyvexml2xmlout-serial-grub.xml   |  3 ++
 .../bhyvexml2xmlout-serial.xml|  3 ++
 .../bhyvexml2xmlout-vnc-autoport.xml  |  3 ++
 .../bhyvexml2xmlout-vnc-vgaconf-io.xml|  3 ++
 .../bhyvexml2xmlout-vnc-vgaconf-off.xml   |  3 ++
 .../bhyvexml2xmlout-vnc-vgaconf-on.xml|  3 ++
 .../bhyvexml2xmlout-vnc.xml   |  3 ++
 tests/bhyvexml2xmltest.c  |  3 ++
 39 files changed, 378 insertions(+), 37 deletions(-)
 create mode 100644 
tests/bhyvexml2argvdata/bhyvexml2argv-addr-isa-controller-on-slot-1.args
 create mode 100644 
tests/bhyvexml2argvdata/bhyvexml2argv-addr-isa-controller-on-slot-1.ldargs
 create mode 100644 
tests/bhyvexml2argvdata/bhyvexml2argv-addr-isa-controller-on-slot-1.xml
 create mode 100644 
tests/bhyvexml2argvdata/bhyvexml2argv-addr-isa-controller-on-slot-31.args
 create mode 100644 
tests/bhyvexml2argvdata/bhyvexml2argv-addr-isa-controller-on-slot-31.ldargs
 create mode 100644 
tests/bhyvexml2argvdata/bhyvexml2argv-addr-isa-controller-on-slot-31.xml
 create mode 100644 
tests/bhyvexml2argvdata/bhyvexml2argv-addr-non-isa-controller-on-slot-1.xml
 create mode 100644 tests/bhyvexml2argvdata/bhyvexml2argv-isa-controller.args
 create mode 100644 tests/bhyvexml2argvdata/bhyvexml2argv-isa-controller.ldargs
 create mode 100644 tests/bhyvexml2argvdata/bhyvexml2argv-isa-controller.xml
 create mode 100644 
tests/bhyvexml2argvdata/bhyvexml2argv-isa-multiple-controllers.xml
 create mode 100644 
tests/bhyvexml2xmloutdata/bhyvexml2xmlout-addr-isa-controller-on-slot-1.xml
 create mode 100644 
tests/bhyvexml2xmloutdata/bhyvexml2xmlout-addr-isa-controller-on-slot-31.xml
 create mode 100644 tests/bhyvexml2xmloutdata/bhyvexml2xmlout-isa-controller.xml

diff --git a/src/bhyve/bhyve_command.c b/src/bhyve/bhyve_command.c
index 22d0b24ec4..2a3e10d649 100644
--- a/src/bhyve/bhyve_command.c
+++ b/src/bhyve/bhyve_command.c
@@ -329,7 +329,8 @@ bhyveBuildControllerArgStr(const virDomainDef *def,
virDomainControllerDefPtr controller,
bhyveConnPtr driver,
virCommandPtr cmd,
-   unsigned *nusbcontrollers)
+   unsigned *nusbcontrollers,
+   unsigned *nisacontrollers)
 {
 switch (c

[PATCH v4 0/2] conf: add 'isa' controller type

2020-07-15 Thread Roman Bogorodskiy

No code changes, just re-arranged commits. Now commits are more
granular than v3, but still less granular as v2. Specifically, now
there are a 'conf' part as a separate commit and 'bhyve' part as a
separate commit. My thought was that automatic addition of the 'isa'
controller and controller index validation patches are small compared to
the main code, so should not increase reviewing difficulty much. Please
let me know if it's still desired to make more granular commits.

I plan to submit corresponding news and docs entries as a separate
series.

Roman Bogorodskiy (2):
  conf: add 'isa' controller type
  bhyve: support 'isa' controller for LPC

 docs/schemas/domaincommon.rng |  6 
 src/bhyve/bhyve_command.c | 27 +++---
 src/bhyve/bhyve_device.c  | 23 +---
 src/bhyve/bhyve_domain.c  | 25 +++--
 src/bhyve/bhyve_domain.h  |  2 --
 src/conf/domain_conf.c|  9 +
 src/conf/domain_conf.h|  8 +
 src/qemu/qemu_command.c   |  1 +
 src/qemu/qemu_domain.c|  1 +
 src/qemu/qemu_domain_address.c|  1 +
 src/qemu/qemu_validate.c  |  1 +
 src/vbox/vbox_common.c|  1 +
 ..ml2argv-addr-isa-controller-on-slot-1.args | 10 ++
 ...2argv-addr-isa-controller-on-slot-1.ldargs |  3 ++
 ...xml2argv-addr-isa-controller-on-slot-1.xml | 26 ++
 ...l2argv-addr-isa-controller-on-slot-31.args | 10 ++
 ...argv-addr-isa-controller-on-slot-31.ldargs |  3 ++
 ...ml2argv-addr-isa-controller-on-slot-31.xml | 26 ++
 ...argv-addr-non-isa-controller-on-slot-1.xml | 23 
 .../bhyvexml2argv-console.args|  2 +-
 .../bhyvexml2argv-isa-controller.args | 10 ++
 .../bhyvexml2argv-isa-controller.ldargs   |  3 ++
 .../bhyvexml2argv-isa-controller.xml  | 24 +
 ...bhyvexml2argv-isa-multiple-controllers.xml | 25 +
 .../bhyvexml2argv-serial-grub-nocons.args |  2 +-
 .../bhyvexml2argv-serial-grub.args|  2 +-
 .../bhyvexml2argv-serial.args |  2 +-
 .../bhyvexml2argvdata/bhyvexml2argv-uefi.args |  4 +--
 .../bhyvexml2argv-vnc-autoport.args   |  4 +--
 .../bhyvexml2argv-vnc-vgaconf-io.args |  4 +--
 .../bhyvexml2argv-vnc-vgaconf-off.args|  4 +--
 .../bhyvexml2argv-vnc-vgaconf-on.args |  4 +--
 .../bhyvexml2argvdata/bhyvexml2argv-vnc.args  |  4 +--
 tests/bhyvexml2argvtest.c |  5 +++
 ...l2xmlout-addr-isa-controller-on-slot-1.xml | 36 +++
 ...2xmlout-addr-isa-controller-on-slot-31.xml | 36 +++
 .../bhyvexml2xmlout-console.xml   |  3 ++
 .../bhyvexml2xmlout-isa-controller.xml| 36 +++
 .../bhyvexml2xmlout-serial-grub-nocons.xml|  3 ++
 .../bhyvexml2xmlout-serial-grub.xml   |  3 ++
 .../bhyvexml2xmlout-serial.xml|  3 ++
 .../bhyvexml2xmlout-vnc-autoport.xml  |  3 ++
 .../bhyvexml2xmlout-vnc-vgaconf-io.xml|  3 ++
 .../bhyvexml2xmlout-vnc-vgaconf-off.xml   |  3 ++
 .../bhyvexml2xmlout-vnc-vgaconf-on.xml|  3 ++
 .../bhyvexml2xmlout-vnc.xml   |  3 ++
 tests/bhyvexml2xmltest.c  |  3 ++
 47 files changed, 406 insertions(+), 37 deletions(-)
 create mode 100644 
tests/bhyvexml2argvdata/bhyvexml2argv-addr-isa-controller-on-slot-1.args
 create mode 100644 
tests/bhyvexml2argvdata/bhyvexml2argv-addr-isa-controller-on-slot-1.ldargs
 create mode 100644 
tests/bhyvexml2argvdata/bhyvexml2argv-addr-isa-controller-on-slot-1.xml
 create mode 100644 
tests/bhyvexml2argvdata/bhyvexml2argv-addr-isa-controller-on-slot-31.args
 create mode 100644 
tests/bhyvexml2argvdata/bhyvexml2argv-addr-isa-controller-on-slot-31.ldargs
 create mode 100644 
tests/bhyvexml2argvdata/bhyvexml2argv-addr-isa-controller-on-slot-31.xml
 create mode 100644 
tests/bhyvexml2argvdata/bhyvexml2argv-addr-non-isa-controller-on-slot-1.xml
 create mode 100644 tests/bhyvexml2argvdata/bhyvexml2argv-isa-controller.args
 create mode 100644 tests/bhyvexml2argvdata/bhyvexml2argv-isa-controller.ldargs
 create mode 100644 tests/bhyvexml2argvdata/bhyvexml2argv-isa-controller.xml
 create mode 100644 
tests/bhyvexml2argvdata/bhyvexml2argv-isa-multiple-controllers.xml
 create mode 100644 
tests/bhyvexml2xmloutdata/bhyvexml2xmlout-addr-isa-controller-on-slot-1.xml
 create mode 100644 
tests/bhyvexml2xmloutdata/bhyvexml2xmlout-addr-isa-controller-on-slot-31.xml
 create mode 100644 tests/bhyvexml2xmloutdata/bhyvexml2xmlout-isa-controller.xml

-- 
2.27.0

Re: [libvirt PATCH v2 07/15] network: use g_free() in place of remaining VIR_FREE()

2020-07-15 Thread Ján Tomko


On a Tuesday in 2020, Laine Stump wrote:

Signed-off-by: Laine Stump 
---
src/network/bridge_driver.c | 45 +++--
1 file changed, 23 insertions(+), 22 deletions(-)



Reviewed-by: Ján Tomko 

Jano


signature.asc
Description: PGP signature

Re: [libvirt PATCH v2 06/15] network: eliminate unnecessary labels

2020-07-15 Thread Ján Tomko


On a Tuesday in 2020, Laine Stump wrote:

All these cleanup/error labels were reduced to having just "return
ret" by a previous patch, so get rid of them and return directly.

Signed-off-by: Laine Stump 
---
src/network/bridge_driver.c   | 264 --
src/network/bridge_driver_linux.c |  15 +-
2 files changed, 113 insertions(+), 166 deletions(-)

@@ -3190,7 +3143,7 @@ static int
networkFindUnusedBridgeName(virNetworkObjListPtr nets,
virNetworkDefPtr def)
{
-int ret = -1, id = 0;
+int id = 0;
const char *templ = "virbr%d";
const char *p;

@@ -3211,17 +3164,14 @@ networkFindUnusedBridgeName(virNetworkObjListPtr nets,
  virNetDevExists(newname) == 1)) {
VIR_FREE(def->bridge); /*could contain template */
def->bridge = g_steal_pointer(&newname);
-ret = 0;
-goto cleanup;
+return 0;
}
} while (++id <= MAX_BRIDGE_ID);

virReportError(VIR_ERR_INTERNAL_ERROR,
   _("Bridge generation exceeded max id %d"),
   MAX_BRIDGE_ID);
-ret = 0;
- cleanup:
-return ret;
+return -1;


This fix deserves a separate commit. Or at least a mention in the commit
message.


}




Reviewed-by: Ján Tomko 

Jano


signature.asc
Description: PGP signature

Re: [libvirt PATCH v2 05/15] network: use g_auto wherever appropriate

2020-07-15 Thread Ján Tomko


On a Tuesday in 2020, Laine Stump wrote:

This includes standard g_autofree() as well as other objects that have
a cleanup function defined to use via g_autoptr (virCommand,
virJSONValue)

Signed-off-by: Laine Stump 
---
src/network/bridge_driver.c   | 206 ++
src/network/bridge_driver_linux.c |   7 +-
src/network/leaseshelper.c|  16 +--
3 files changed, 76 insertions(+), 153 deletions(-)

diff --git a/src/network/bridge_driver.c b/src/network/bridge_driver.c
index ab359acdb5..31bd0dd92c 100644
--- a/src/network/bridge_driver.c
+++ b/src/network/bridge_driver.c


[...]


@@ -1095,7 +1081,6 @@ networkDnsmasqConfContents(virNetworkObjPtr obj,
bool wantDNS = dns->enable != VIR_TRISTATE_BOOL_NO;
virNetworkIPDefPtr tmpipdef, ipdef, ipv4def, ipv6def;
bool ipv6SLAAC;
-char *saddr = NULL, *eaddr = NULL;

*configstr = NULL;



[...]


@@ -1414,6 +1396,8 @@ networkDnsmasqConfContents(virNetworkObjPtr obj,
int thisRange;
virNetworkDHCPRangeDef range = ipdef->ranges[r];
g_autofree char *leasetime = NULL;
+g_autofree char *saddr = NULL;
+g_autofree char *eaddr = NULL;


300 lines below the original location. Long function is long. :)



if (!(saddr = virSocketAddrFormat(&range.addr.start)) ||
!(eaddr = virSocketAddrFormat(&range.addr.end)))


[...]


@@ -2248,7 +2206,7 @@ static int
networkSetIPv6Sysctls(virNetworkObjPtr obj)
{
virNetworkDefPtr def = virNetworkObjGetDef(obj);
-char *field = NULL;
+g_autofree char *field = NULL;


Last time I tried manually freeing an autofree'd variable, I was told
not to do that O:-)

The clean way here seems to be refactoring the function. I can put that
somewhere into the depths of my TODO list.


int ret = -1;
bool enableIPv6 = !!virNetworkDefGetIPByIndex(def, AF_INET6, 0);

@@ -2273,7 +2231,6 @@ networkSetIPv6Sysctls(virNetworkObjPtr obj)
   "on bridge %s"), field, def->bridge);
goto cleanup;
}
-VIR_FREE(field);

/* The rest of the ipv6 sysctl tunables should always be set the
 * same, whether or not we're using ipv6 on this bridge.
@@ -2282,6 +2239,7 @@ networkSetIPv6Sysctls(virNetworkObjPtr obj)
/* Prevent guests from hijacking the host network by sending out
 * their own router advertisements.
 */
+VIR_FREE(field);
field = g_strdup_printf(SYSCTL_PATH "/net/ipv6/conf/%s/accept_ra",
def->bridge);

@@ -2290,11 +2248,11 @@ networkSetIPv6Sysctls(virNetworkObjPtr obj)
 _("cannot disable %s"), field);
goto cleanup;
}
-VIR_FREE(field);

/* All interfaces used as a gateway (which is what this is, by
 * definition), must always have autoconf=0.
 */
+VIR_FREE(field);
field = g_strdup_printf(SYSCTL_PATH "/net/ipv6/conf/%s/autoconf", 
def->bridge);

if (virFileWriteStr(field, "0", 0) < 0) {
@@ -2305,7 +2263,6 @@ networkSetIPv6Sysctls(virNetworkObjPtr obj)

ret = 0;
 cleanup:
-VIR_FREE(field);
return ret;
}



[...]


@@ -3276,8 +3221,6 @@ networkFindUnusedBridgeName(virNetworkObjListPtr nets,
   MAX_BRIDGE_ID);
ret = 0;


So this function returned 0 even on failure.
Introduced by a28d3e485f01d16320af15780bc935f54782a45d


 cleanup:
-if (ret < 0)
-VIR_FREE(newname);
return ret;
}



Without the networkSetIPv6Sysctls changes:
Reviewed-by: Ján Tomko 

Jano


signature.asc
Description: PGP signature

[libvirt PATCH] Partially revert "qemu: fix missing error reports in capabilities probing"

2020-07-15 Thread Daniel P . Berrangé

This partially reverts commit 5331c4804f4f419b9e75741084f926e52413d3a1.

The original commit mistakenly thought virFileCacheLookup did not set
an error. In fact the only case it doesn't set an error for is when
the cache key is NULL. This in fact the fault of the caller for passing
an invalid cache key, so doesn't need to be handled.

This caller bug was fixed by checking for a NULL binary in the
virQEMUCapsCacheLookupDefault method.

Signed-off-by: Daniel P. Berrangé 
---
 src/qemu/qemu_capabilities.c | 5 -
 src/qemu/qemu_domain.c   | 4 +---
 2 files changed, 1 insertion(+), 8 deletions(-)

diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c
index b27a5e5c81..7264fe300f 100644
--- a/src/qemu/qemu_capabilities.c
+++ b/src/qemu/qemu_capabilities.c
@@ -5636,11 +5636,6 @@ virQEMUCapsCacheLookup(virFileCachePtr cache,
 priv->microcodeVersion = virHostCPUGetMicrocodeVersion();
 
 ret = virFileCacheLookup(cache, binary);
-if (!ret) {
-virReportError(VIR_ERR_INTERNAL_ERROR,
-   _("no capabilities available for %s"), binary);
-return NULL;
-}
 
 VIR_DEBUG("Returning caps %p for %s", ret, binary);
 return ret;
diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
index 4a2daffc0a..614781 100644
--- a/src/qemu/qemu_domain.c
+++ b/src/qemu/qemu_domain.c
@@ -5256,10 +5256,8 @@ qemuDomainPostParseDataAlloc(const virDomainDef *def,
 virQEMUDriverPtr driver = opaque;
 
 if (!(*parseOpaque = virQEMUCapsCacheLookup(driver->qemuCapsCache,
-def->emulator))) {
-virResetLastError();
+def->emulator)))
 return 1;
-}
 
 return 0;
 }
-- 
2.26.2

Re: [libvirt PATCH] qemu: fix missing error reports in capabilities probing

2020-07-15 Thread Daniel P . Berrangé

On Mon, Jul 13, 2020 at 09:44:07PM +0200, Michal Privoznik wrote:
> On 6/15/20 3:56 PM, Daniel P. Berrangé wrote:
> > The "virsh domcapabilities --arch ppc64" command will fail with no
> > error message set if qemu-system-ppc64 is not currently installed.
> > 
> > This is because virQEMUCapsCacheLookup() does not report any error
> > message if not capabilities can be obtained from the cache. Almost
> > all methods calling this expected an error to be set on failure.
> > 
> > Once that's fixed though, we see a further bug which is that
> > virQEMUCapsCacheLookupDefault() is passing a NULL binary path to
> > virQEMUCapsCacheLookup(), so we need to catch that too.
> > 
> > Signed-off-by: Daniel P. Berrangé 
> > ---
> >   src/qemu/qemu_capabilities.c | 11 +++
> >   src/qemu/qemu_domain.c   |  4 +++-
> >   2 files changed, 14 insertions(+), 1 deletion(-)
> > 
> 
> 
> > diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
> > index 2dad823a86..97096073e6 100644
> > --- a/src/qemu/qemu_domain.c
> > +++ b/src/qemu/qemu_domain.c
> > @@ -6068,8 +6068,10 @@ qemuDomainPostParseDataAlloc(const virDomainDef *def,
> >   virQEMUDriverPtr driver = opaque;
> >   if (!(*parseOpaque = virQEMUCapsCacheLookup(driver->qemuCapsCache,
> > -def->emulator)))
> > +def->emulator))) {
> > +virResetLastError();
> >   return 1;
> > +}
> >   return 0;
> >   }
> > 
> 
> I think we need to revisit this patch. In my testing, I have qemu built on a
> side from git and one domain that runs it. As I updated my system, but not
> rebuilt the qemu from git it no longer runs (fails to link):
> 
> ~/work/qemu/qemu.git/x86_64-softmmu/qemu-system-x86_64: error while loading
> shared libraries: libnettle.so.7: cannot open shared object file: No such
> file or directory
> 
> This is expected. But, virsh start fails with:
> 
> error: Failed to start domain fedora
> error: An error occurred, but the cause is unknown
> 
> I've tracked this down to the virResetLastError() in the hunk above. And it
> kind of makes sense - we failed to load capabilities on daemon startup (oh,
> yeah, daemon runs from git too, but it's been rebuilt) so we try again on
> domain startup. But now that I am reading the commit message it doesn't make
> much sense to me either. 'virsh domcapabilities' has nothing to do with
> PostParse callbacks, does it? Can it be that this error reset call is just
> misplaced?

I was attempting to preserve what I thought was existing behaviour.

I was seeing that  virFileCacheLookup returned NULL, but didn't set
any error. So I added an error report in virQEMUCapsCacheLookup,
and thus in reurn clearer the eror in virQEMUCapsCacheLookup.

It looks like i was mistaken about virFileCacheLookup tough - it
does indeed set an error, but not in all cases. So I thin we need
to revert part of my commit.

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: [libvirt PATCH v2 04/15] network: replace VIR_ALLOC/REALLOC with g_new0/g_renew

2020-07-15 Thread Ján Tomko


On a Tuesday in 2020, Laine Stump wrote:

Signed-off-by: Laine Stump 
---
src/network/bridge_driver.c | 29 ++---
1 file changed, 10 insertions(+), 19 deletions(-)

diff --git a/src/network/bridge_driver.c b/src/network/bridge_driver.c
index 713763130b..ab359acdb5 100644
--- a/src/network/bridge_driver.c
+++ b/src/network/bridge_driver.c
@@ -177,8 +177,7 @@ 
networkDnsmasqDefNamespaceParseOptions(networkDnsmasqXmlNsDefPtr nsdef,
if (nnodes == 0)
return 0;

-if (VIR_ALLOC_N(nsdef->options, nnodes) < 0)
-return -1;
+nsdef->options = g_new0(char *, nnodes);

for (i = 0; i < nnodes; i++) {
if (!(nsdef->options[nsdef->noptions++] = virXMLPropString(nodes[i], 
"value"))) {
@@ -196,12 +195,9 @@ static int
networkDnsmasqDefNamespaceParse(xmlXPathContextPtr ctxt,
void **data)
{
-networkDnsmasqXmlNsDefPtr nsdata = NULL;
+networkDnsmasqXmlNsDefPtr nsdata = g_new0(networkDnsmasqXmlNsDef, 1);
int ret = -1;

-if (VIR_ALLOC(nsdata) < 0)
-return -1;
-
if (networkDnsmasqDefNamespaceParseOptions(nsdata, ctxt))
goto cleanup;

@@ -733,8 +729,7 @@ networkStateInitialize(bool privileged,
return -1;
}

-if (VIR_ALLOC(network_driver) < 0)
-goto error;
+network_driver = g_new0(virNetworkDriverState, 1);

network_driver->lockFD = -1;
if (virMutexInit(&network_driver->lock) < 0) {
@@ -2753,8 +2748,7 @@ networkCreateInterfacePool(virNetworkDefPtr netdef)
goto cleanup;
}

-if (VIR_ALLOC_N(netdef->forward.ifs, numVirtFns) < 0)
-goto cleanup;
+netdef->forward.ifs = g_new0(virNetworkForwardIfDef, numVirtFns);

for (i = 0; i < numVirtFns; i++) {
virPCIDeviceAddressPtr thisVirtFn = virtFns[i];
@@ -4323,8 +4317,7 @@ networkGetDHCPLeases(virNetworkPtr net,
continue;

if (need_results) {
-if (VIR_ALLOC(lease) < 0)
-goto error;
+lease = g_new0(virNetworkDHCPLease, 1);

lease->expirytime = expirytime_tmp;

@@ -4378,9 +4371,8 @@ networkGetDHCPLeases(virNetworkPtr net,

if (leases_ret) {
/* NULL terminated array */
-ignore_value(VIR_REALLOC_N(leases_ret, nleases + 1));
-*leases = leases_ret;
-leases_ret = NULL;
+leases_ret = g_renew(virNetworkDHCPLeasePtr, leases_ret,  nleases + 1);


Double space before nleases.

Also, this is a faithful conversion, but neither VIR_REALLOC_N nor
g_renew guarantee that the memory will be zeroed.


+*leases = g_steal_pointer(&leases_ret);
}

rv = nleases;


With the double space removed:
Reviewed-by: Ján Tomko 

Jano


signature.asc
Description: PGP signature

Re: [libvirt PATCH v2 03/15] define g_autoptr cleanup function for virNetworkDHCPLease

2020-07-15 Thread Ján Tomko


On a Tuesday in 2020, Laine Stump wrote:

virNetworkDHCPLease and virNetworkDHCPLeaseFree() are declared in the
public API file libvirt-network.h, and we can't pollute that with glib
macro invocations, so put this in src/datatypes.h next to the other
virNetwork items.

Signed-off-by: Laine Stump 
---
src/datatypes.h | 2 ++
1 file changed, 2 insertions(+)



Reviewed-by: Ján Tomko 

Jano


signature.asc
Description: PGP signature

Re: [libvirt PATCH v2 02/15] util: define g_autoptr cleanups for a couple dnsmasq objects

2020-07-15 Thread Ján Tomko


On a Tuesday in 2020, Laine Stump wrote:

Signed-off-by: Laine Stump 
---
src/util/virdnsmasq.h | 4 
1 file changed, 4 insertions(+)



Reviewed-by: Ján Tomko 

Jano


signature.asc
Description: PGP signature

Re: [libvirt PATCH v2 01/15] replace g_new() with g_new0() for consistency

2020-07-15 Thread Ján Tomko


On a Tuesday in 2020, Laine Stump wrote:

g_new() is used in only 3 places. Switching them to g_new0() will do
no harm, reduces confusion, and helps me sleep better at night knowing


Sweet dreams.


that all allocated memory is initialized to 0 :-) (Yes, I *know* that
in all three cases the associated memory is immediately assigned some
other value. Today.)

Signed-off-by: Laine Stump 
---
src/qemu/qemu_backup.c  | 2 +-
src/util/virutil.c  | 2 +-
tests/qemuhotplugmock.c | 2 +-
3 files changed, 3 insertions(+), 3 deletions(-)



Reviewed-by: Ján Tomko 

Jano


signature.asc
Description: PGP signature

Re: [PATCH 00/10] resolve hangs/crashes on libvirtd shutdown

2020-07-15 Thread Daniel P . Berrangé

On Wed, Jul 15, 2020 at 08:51:03AM +0300, Nikolay Shirokovskiy wrote:
> 
> 
> On 14.07.2020 17:53, Daniel Henrique Barboza wrote:
> > As far as code goes:
> > 
> > 
> > Reviewed-by: Daniel Henrique Barboza 
> > 
> > 
> > About the design I have a question about the timeout. Patch 5/10 is setting 
> > a
> > 15 second timeout. How did you reach this value? Reading the bug, specially
> > this comment from Daniel:
> > 
> > https://bugzilla.redhat.com/show_bug.cgi?id=1828207#c6
> > 
> > He mentions "give it 5 seconds of running before shutting it down".
> 
> I guess 5 seconds is time for libvirtd to finish startup. This time has
> different nature than time for libvirtd to finish it's work on shutdown
> so it can be different.
> 
> > 
> > 5 seconds before shutdown is something that most users can be slightly 
> > annoyed
> > but in the end don't mind that much, but 15 seconds is something that will
> > cause bugs to be opened because "Libvirt is taking too long to shutdown".
> > Besides, it's a fair assumption that a transaction that takes more than
> > 5 or so seconds to finish is already compromised* - might as well shutdown
> > the daemon and deal with the errors.
> 
> 15 seconds was mentioned by Daniel in [1] when he first proposed the approach
> so I used this value without any extra thought. However I missed that in
> the last John's series [2] the default for waiting time is 0s. May be this
> is the current decision on waiting time. Let's wait for others to join
> the review.

Don't read too much into the precise numbers I mentioned, they would just
be plucked out of the air :-)

If there is some job taking place wrt a VM that is taking a long time to
complete and thus blocking shutdown, I think it is important to give it a
fair opportunity to finish gracefully.  systemd itself gives services
something like 90 seconds to exit before it gives up on them.

On a heavily loaded host, 5 seconds is almost certainly too short. 15
seconds is not bad, but I wouldn't object to 30 seconds either, as long
as we're emitting some log message warning that we're delayed.

In the "normal" case these timeouts won't be hit, so we're only delayed
in the scenarios where we're likely to be doing something important for
a VM.


Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: [libvirt PATCH 9/9] rpc: use new virt-nc binary for remote tunnelling

2020-07-15 Thread Andrea Bolognani

On Wed, 2020-07-15 at 14:25 +0100, Daniel P. Berrangé wrote:
> On Wed, Jul 15, 2020 at 02:25:14PM +0200, Andrea Bolognani wrote:
> > Mh, that makes sense but I'm still wary of using "proxy" due to the
> > potential for confusion, since in this case the proxy is on the
> > opposite side of the connection than one would probably expect it
> > to be. Something like "remoteproxy" or "serverproxy", perhaps?
> 
> I don't think there's really any problem of confusion here unless
> someone doesn't read the docs at all, in which case they won't
> even know about this parameter. So I don't think using a more
> verbose term is any benefit.

Okay.

> > Going back to the name of the command itself, since it's an internal
> > implementation details, and as such it's not intended to be invoked
> > by users and accordingly we're installing it under $(libexecdir)
> > along with existing helpers, what about following the established
> > naming convention and calling it 'libvirt_proxyhelper'?
> 
> Installing it to libexecdir is actually a mistake in this version. It
> needs to be installed into bindir, as it must be present in $PATH.

Why is that so? Is it because otherwise we can't guarantee that ssh
on the remote end will find it, seeing as $(libexecdir) can be
changed at configure time and is usually not part of $PATH anyway?

If the binary will show up in $PATH, then I think it's even more
important to ensure it's very apparent that it's for internal use
and not to be invoked directly. Including a message explaining this
in the help output as well as the usage message that is printed when
a URI is not passed on the command line would be a great start.

As for the name of the binary itself, longer and more descriptive is
clearly better to avoid confusion. What about 'virt-proxy-helper'?

-- 
Andrea Bolognani / Red Hat / Virtualization

Re: [libvirt PATCH 9/9] rpc: use new virt-nc binary for remote tunnelling

2020-07-15 Thread Daniel P . Berrangé

On Wed, Jul 15, 2020 at 02:25:14PM +0200, Andrea Bolognani wrote:
> On Wed, 2020-07-15 at 11:00 +0100, Daniel P. Berrangé wrote:
> > On Fri, Jul 10, 2020 at 07:21:47PM +0200, Andrea Bolognani wrote:
> > > Just a couple of comments about the UI: would it make sense to use
> > > something like
> > > 
> > >   qemu+ssh://host/system?tunnelcmd=virt-tunnel
> > > 
> > > instead? virt-nc could be seen as a bit of a misnomer, considering
> > > that (by design) it doesn't do nearly as much as the actual netcat
> > > does; as for the 'proxy' argument, I'm afraid it might lead people
> > > to believe it's used for HTTP proxying or some other form of
> > > proxying *between the client and the host*, whereas it's really
> > > something that only affects operations on the host itself - not to
> > > mention that we also have a virtproxyd now, further increasing the
> > > potential for confusion...
> > 
> > I chose proxy not tunnel, because SSH is providing the tunnel here.
> > virt-nc is a proxy linking the tunnel to the daemon. virtproxyd is
> > conceptually similar, again linking a libvirt client to the real
> > daemon.
> 
> Mh, that makes sense but I'm still wary of using "proxy" due to the
> potential for confusion, since in this case the proxy is on the
> opposite side of the connection than one would probably expect it
> to be. Something like "remoteproxy" or "serverproxy", perhaps?

I don't think there's really any problem of confusion here unless
someone doesn't read the docs at all, in which case they won't
even know about this parameter. So I don't think using a more
verbose term is any benefit.

> 
> > I probably shouldn't mention "virt-nc" by name in the URI and instead
> > have "proxy=netcat" vs "proxy=native", as users don't get to choose
> > the actual binary here - they are providing an enum string, which
> > gets mapped to the desired binary.
> 
> Yeah, that's much better.
> 
> Going back to the name of the command itself, since it's an internal
> implementation details, and as such it's not intended to be invoked
> by users and accordingly we're installing it under $(libexecdir)
> along with existing helpers, what about following the established
> naming convention and calling it 'libvirt_proxyhelper'?

Installing it to libexecdir is actually a mistake in this version. It
needs to be installed into bindir, as it must be present in $PATH.

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

[PATCH 07/10] virStorageSourceFindByNodeName: Remove unused 'idx' argument

2020-07-15 Thread Peter Krempa

None of the callers actually use it.

Signed-off-by: Peter Krempa 
---
 src/qemu/qemu_domain.c|  9 -
 src/util/virstoragefile.c | 16 +++-
 src/util/virstoragefile.h |  3 +--
 3 files changed, 8 insertions(+), 20 deletions(-)

diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
index 64bd52c51f..ed7ec77ed4 100644
--- a/src/qemu/qemu_domain.c
+++ b/src/qemu/qemu_domain.c
@@ -2498,15 +2498,15 @@ 
qemuDomainObjPrivateXMLParseBlockjobNodename(qemuBlockJobDataPtr job,
 return;

 if (job->disk &&
-(*src = virStorageSourceFindByNodeName(job->disk->src, nodename, 
NULL)))
+(*src = virStorageSourceFindByNodeName(job->disk->src, nodename)))
 return;

 if (job->chain &&
-(*src = virStorageSourceFindByNodeName(job->chain, nodename, NULL)))
+(*src = virStorageSourceFindByNodeName(job->chain, nodename)))
 return;

 if (job->mirrorChain &&
-(*src = virStorageSourceFindByNodeName(job->mirrorChain, nodename, 
NULL)))
+(*src = virStorageSourceFindByNodeName(job->mirrorChain, nodename)))
 return;

 /* the node was in the XML but was not found in the job definitions */
@@ -11486,8 +11486,7 @@ qemuDomainDiskLookupByNodename(virDomainDefPtr def,
 *src = NULL;

 for (i = 0; i < def->ndisks; i++) {
-if ((tmp = virStorageSourceFindByNodeName(def->disks[i]->src,
-  nodename, NULL))) {
+if ((tmp = virStorageSourceFindByNodeName(def->disks[i]->src, 
nodename))) {
 if (src)
 *src = tmp;

diff --git a/src/util/virstoragefile.c b/src/util/virstoragefile.c
index 274883c4bd..00d8e16ef9 100644
--- a/src/util/virstoragefile.c
+++ b/src/util/virstoragefile.c
@@ -4589,33 +4589,23 @@ virStorageSourceIsRelative(virStorageSourcePtr src)
  * virStorageSourceFindByNodeName:
  * @top: backing chain top
  * @nodeName: node name to find in backing chain
- * @index: if provided the index in the backing chain
  *
  * Looks up the given storage source in the backing chain and returns the
- * pointer to it. If @index is passed then it's filled by the index in the
- * backing chain. On failure NULL is returned and no error is reported.
+ * pointer to it.
+ * On failure NULL is returned and no error is reported.
  */
 virStorageSourcePtr
 virStorageSourceFindByNodeName(virStorageSourcePtr top,
-   const char *nodeName,
-   unsigned int *idx)
+   const char *nodeName)
 {
 virStorageSourcePtr tmp;

-if (idx)
-*idx = 0;
-
 for (tmp = top; virStorageSourceIsBacking(tmp); tmp = tmp->backingStore) {
 if ((tmp->nodeformat && STREQ(tmp->nodeformat, nodeName)) ||
 (tmp->nodestorage && STREQ(tmp->nodestorage, nodeName)))
 return tmp;
-
-if (idx)
-(*idx)++;
 }

-if (idx)
-*idx = 0;
 return NULL;
 }

diff --git a/src/util/virstoragefile.h b/src/util/virstoragefile.h
index c68bdc9680..f73b3ee005 100644
--- a/src/util/virstoragefile.h
+++ b/src/util/virstoragefile.h
@@ -526,8 +526,7 @@ bool virStorageSourceIsRelative(virStorageSourcePtr src);

 virStorageSourcePtr
 virStorageSourceFindByNodeName(virStorageSourcePtr top,
-   const char *nodeName,
-   unsigned int *index)
+   const char *nodeName)
 ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2);

 void
-- 
2.26.2

[PATCH 10/10] virDomainSetBlockThreshold: Mention that the event can be registered for

2020-07-15 Thread Peter Krempa

The infrastructure supports setting the threshold also for the .
Mention it in the docs.

https://bugzilla.redhat.com/show_bug.cgi?id=1807741

Signed-off-by: Peter Krempa 
---
 src/libvirt-domain.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/libvirt-domain.c b/src/libvirt-domain.c
index 6c5ff5b0db..a4e73d5480 100644
--- a/src/libvirt-domain.c
+++ b/src/libvirt-domain.c
@@ -12381,6 +12381,9 @@ int virDomainGetGuestInfo(virDomainPtr domain,
  * live VM XML for 'backingStore' or 'source' elements of a disk. If index is
  * given the threshold is set for the corresponding image.
  *
+ * Note that the threshold event can be registered also for destinations of a
+ * 'virDomainBlockCopy' destination by using the 'index' of the 'mirror' 
source.
+ *
  * Hypervisors report the last written sector of an image in the bulk stats API
  * (virConnectGetAllDomainStats/virDomainListGetStats) as
  * "block..allocation" in the VIR_DOMAIN_STATS_BLOCK group. The current
-- 
2.26.2

[PATCH 05/10] virDomainSetBlockThreshold: Clarify values of @dev the event is fired for

2020-07-15 Thread Peter Krempa

Top level image may get two events, one with the disk target (vda) and
one with disk target with index (vda[3]) if the top level image has an
index.

Signed-off-by: Peter Krempa 
---
 src/libvirt-domain.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/src/libvirt-domain.c b/src/libvirt-domain.c
index ba30d18f65..6c5ff5b0db 100644
--- a/src/libvirt-domain.c
+++ b/src/libvirt-domain.c
@@ -12370,6 +12370,10 @@ int virDomainGetGuestInfo(virDomainPtr domain,
  * described by @dev is written beyond the set threshold level. The threshold
  * level is unset once the event fires. The event might not be delivered at all
  * if libvirtd was not running at the moment when the threshold was reached.
+ * Note that if the threshold level is reached for a top level image the event
+ * is emitted for @dev corresponding to the disk target, and may also be 
reported
+ * with @dev corresponding to the disk target with an index corresponding to 
the
+ * 'index' attribute of 'source' in the live VM XML if the atribute is present.
  *
  * @dev can either be a disk target name (vda, sda) or disk target with index (
  * vda[4]). Without the index the top image in the backing chain will have the
-- 
2.26.2

[PATCH 01/10] virDomainSetBlockThreshold: Document values of '@dev' better

2020-07-15 Thread Peter Krempa

Mention where to obtain the index and how it's treated.

Signed-off-by: Peter Krempa 
---
 src/libvirt-domain.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/src/libvirt-domain.c b/src/libvirt-domain.c
index fe4dab7cdf..ba30d18f65 100644
--- a/src/libvirt-domain.c
+++ b/src/libvirt-domain.c
@@ -12371,6 +12371,12 @@ int virDomainGetGuestInfo(virDomainPtr domain,
  * level is unset once the event fires. The event might not be delivered at all
  * if libvirtd was not running at the moment when the threshold was reached.
  *
+ * @dev can either be a disk target name (vda, sda) or disk target with index (
+ * vda[4]). Without the index the top image in the backing chain will have the
+ * threshold set. The index corresponds to the 'index' attribute reported in 
the
+ * live VM XML for 'backingStore' or 'source' elements of a disk. If index is
+ * given the threshold is set for the corresponding image.
+ *
  * Hypervisors report the last written sector of an image in the bulk stats API
  * (virConnectGetAllDomainStats/virDomainListGetStats) as
  * "block..allocation" in the VIR_DOMAIN_STATS_BLOCK group. The current
-- 
2.26.2

[PATCH 03/10] qemuDomainDiskBackingStoreGetName: Eliminate temp variable

2020-07-15 Thread Peter Krempa

We can return the formatted string directly.

Signed-off-by: Peter Krempa 
---
 src/qemu/qemu_domain.c | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
index 3d136a6b8a..cfdd9270da 100644
--- a/src/qemu/qemu_domain.c
+++ b/src/qemu/qemu_domain.c
@@ -11517,14 +11517,10 @@ char *
 qemuDomainDiskBackingStoreGetName(virDomainDiskDefPtr disk,
   unsigned int idx)
 {
-char *ret = NULL;
-
 if (idx)
-ret = g_strdup_printf("%s[%d]", disk->dst, idx);
+return g_strdup_printf("%s[%d]", disk->dst, idx);
 else
-ret = g_strdup(disk->dst);
-
-return ret;
+return g_strdup(disk->dst);
 }


-- 
2.26.2

[PATCH 02/10] qemuDomainDiskBackingStoreGetName: Remove unused argument

2020-07-15 Thread Peter Krempa

Signed-off-by: Peter Krempa 
---
 src/qemu/qemu_domain.c  | 1 -
 src/qemu/qemu_domain.h  | 1 -
 src/qemu/qemu_process.c | 2 +-
 3 files changed, 1 insertion(+), 3 deletions(-)

diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
index 4a2daffc0a..3d136a6b8a 100644
--- a/src/qemu/qemu_domain.c
+++ b/src/qemu/qemu_domain.c
@@ -11515,7 +11515,6 @@ qemuDomainDiskLookupByNodename(virDomainDefPtr def,
  */
 char *
 qemuDomainDiskBackingStoreGetName(virDomainDiskDefPtr disk,
-  virStorageSourcePtr src G_GNUC_UNUSED,
   unsigned int idx)
 {
 char *ret = NULL;
diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h
index e524fd0002..6944b37ff7 100644
--- a/src/qemu/qemu_domain.h
+++ b/src/qemu/qemu_domain.h
@@ -963,7 +963,6 @@ virDomainDiskDefPtr 
qemuDomainDiskLookupByNodename(virDomainDefPtr def,
unsigned int *idx);

 char *qemuDomainDiskBackingStoreGetName(virDomainDiskDefPtr disk,
-virStorageSourcePtr src,
 unsigned int idx);

 virStorageSourcePtr qemuDomainGetStorageSourceByDevstr(const char *devstr,
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index 70fc24b993..2ee778c606 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -1514,7 +1514,7 @@ qemuProcessHandleBlockThreshold(qemuMonitorPtr mon 
G_GNUC_UNUSED,
 if (virStorageSourceIsLocalStorage(src))
 path = src->path;

-if ((dev = qemuDomainDiskBackingStoreGetName(disk, src, idx)))
+if ((dev = qemuDomainDiskBackingStoreGetName(disk, idx)))
 event = virDomainEventBlockThresholdNewFromObj(vm, dev, path,
threshold, excess);
 }
-- 
2.26.2

[PATCH 06/10] qemuDomainDiskLookupByNodename: Remove unused 'idx'

2020-07-15 Thread Peter Krempa

All callers pass NULL as the value. Remove the argument.

Signed-off-by: Peter Krempa 
---
 src/qemu/qemu_domain.c  | 12 ++--
 src/qemu/qemu_domain.h  |  3 +--
 src/qemu/qemu_process.c |  4 ++--
 3 files changed, 5 insertions(+), 14 deletions(-)

diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
index cfdd9270da..64bd52c51f 100644
--- a/src/qemu/qemu_domain.c
+++ b/src/qemu/qemu_domain.c
@@ -11469,7 +11469,6 @@ qemuDomainNamespaceTeardownInput(virDomainObjPtr vm,
  * @def: domain definition to look for the disk
  * @nodename: block backend node name to find
  * @src: filled with the specific backing store element if provided
- * @idx: index of @src in the backing chain, if provided
  *
  * Looks up the disk in the domain via @nodename and returns its definition.
  * Optionally fills @src and @idx if provided with the specific backing chain
@@ -11478,24 +11477,17 @@ qemuDomainNamespaceTeardownInput(virDomainObjPtr vm,
 virDomainDiskDefPtr
 qemuDomainDiskLookupByNodename(virDomainDefPtr def,
const char *nodename,
-   virStorageSourcePtr *src,
-   unsigned int *idx)
+   virStorageSourcePtr *src)
 {
 size_t i;
-unsigned int srcindex;
 virStorageSourcePtr tmp = NULL;

-if (!idx)
-idx = &srcindex;
-
 if (src)
 *src = NULL;

-*idx = 0;
-
 for (i = 0; i < def->ndisks; i++) {
 if ((tmp = virStorageSourceFindByNodeName(def->disks[i]->src,
-  nodename, idx))) {
+  nodename, NULL))) {
 if (src)
 *src = tmp;

diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h
index 6944b37ff7..d75fbc0af3 100644
--- a/src/qemu/qemu_domain.h
+++ b/src/qemu/qemu_domain.h
@@ -959,8 +959,7 @@ int qemuDomainNamespaceTeardownInput(virDomainObjPtr vm,

 virDomainDiskDefPtr qemuDomainDiskLookupByNodename(virDomainDefPtr def,
const char *nodename,
-   virStorageSourcePtr *src,
-   unsigned int *idx);
+   virStorageSourcePtr *src);

 char *qemuDomainDiskBackingStoreGetName(virDomainDiskDefPtr disk,
 unsigned int idx);
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index 9b6dc8e68b..580abb0fb4 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -880,7 +880,7 @@ qemuProcessHandleIOError(qemuMonitorPtr mon G_GNUC_UNUSED,
 if (diskAlias)
 disk = qemuProcessFindDomainDiskByAliasOrQOM(vm, diskAlias, NULL);
 else if (nodename)
-disk = qemuDomainDiskLookupByNodename(vm->def, nodename, NULL, NULL);
+disk = qemuDomainDiskLookupByNodename(vm->def, nodename, NULL);
 else
 disk = NULL;

@@ -1509,7 +1509,7 @@ qemuProcessHandleBlockThreshold(qemuMonitorPtr mon 
G_GNUC_UNUSED,
   "threshold '%llu' exceeded by '%llu'",
   nodename, vm, vm->def->name, threshold, excess);

-if ((disk = qemuDomainDiskLookupByNodename(vm->def, nodename, &src, 
NULL))) {
+if ((disk = qemuDomainDiskLookupByNodename(vm->def, nodename, &src))) {
 if (virStorageSourceIsLocalStorage(src))
 path = src->path;

-- 
2.26.2

[PATCH 08/10] qemuDomainDiskLookupByNodename: Look also for 'mirror' node names

2020-07-15 Thread Peter Krempa

When doing a block copy, there is another chain of images attached to a
disk. Consider them as well when looking up a disk using nodename.

Signed-off-by: Peter Krempa 
---
 src/qemu/qemu_domain.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
index ed7ec77ed4..18fd445e30 100644
--- a/src/qemu/qemu_domain.c
+++ b/src/qemu/qemu_domain.c
@@ -11492,6 +11492,14 @@ qemuDomainDiskLookupByNodename(virDomainDefPtr def,

 return def->disks[i];
 }
+
+if (def->disks[i]->mirror &&
+(tmp = virStorageSourceFindByNodeName(def->disks[i]->mirror, 
nodename))) {
+if (src)
+*src = tmp;
+
+return def->disks[i];
+}
 }

 return NULL;
-- 
2.26.2

[PATCH 09/10] qemuDomainGetStorageSourceByDevstr: Look also in 'mirror' chain

2020-07-15 Thread Peter Krempa

A disk can have a mirror, look also in it's backing chain.

Signed-off-by: Peter Krempa 
---
 src/qemu/qemu_domain.c | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
index 18fd445e30..ebf18a546e 100644
--- a/src/qemu/qemu_domain.c
+++ b/src/qemu/qemu_domain.c
@@ -11553,11 +11553,16 @@ qemuDomainGetStorageSourceByDevstr(const char *devstr,
 }

 if (idx == 0)
-src = disk->src;
-else
-src = virStorageFileChainLookup(disk->src, NULL, NULL, idx, NULL);
+return disk->src;
+
+if ((src = virStorageFileChainLookup(disk->src, NULL, NULL, idx, NULL)))
+return src;

-return src;
+if (disk->mirror &&
+(src = virStorageFileChainLookup(disk->mirror, NULL, NULL, idx, NULL)))
+return src;
+
+return NULL;
 }


-- 
2.26.2

[PATCH 00/10] Fix device names reported by 'write_threshold' event and add support for monitoring 'mirror'

2020-07-15 Thread Peter Krempa

Allow monitoring the 'mirror' image write threshold and also report
correct aliases. Clean up some leftovers and improve docs while at it.

Peter Krempa (10):
  virDomainSetBlockThreshold: Document values of '@dev' better
  qemuDomainDiskBackingStoreGetName: Remove unused argument
  qemuDomainDiskBackingStoreGetName: Eliminate temp variable
  qemuProcessHandleBlockThreshold: Report correct indexes
  virDomainSetBlockThreshold: Clarify values of @dev the event is fired
for
  qemuDomainDiskLookupByNodename: Remove unused 'idx'
  virStorageSourceFindByNodeName: Remove unused 'idx' argument
  qemuDomainDiskLookupByNodename: Look also for 'mirror' node names
  qemuDomainGetStorageSourceByDevstr: Look also in 'mirror' chain
  virDomainSetBlockThreshold: Mention that the event can be registered
for 

 src/libvirt-domain.c  | 13 +++
 src/qemu/qemu_domain.c| 49 +++
 src/qemu/qemu_domain.h|  4 +---
 src/qemu/qemu_process.c   | 28 +++---
 src/util/virstoragefile.c | 16 +++--
 src/util/virstoragefile.h |  3 +--
 6 files changed, 61 insertions(+), 52 deletions(-)

-- 
2.26.2

[PATCH 04/10] qemuProcessHandleBlockThreshold: Report correct indexes

2020-07-15 Thread Peter Krempa

The index returned by qemuDomainDiskLookupByNodename is the position in
the backing chain rather than the index we report in the XML.

Since with -blockdev they differ now and additionally the disk source
also has an index we need to fix the 'threshold' evens we report:

1) If it's the top level image we must always trigger the event without
   any suffix as we did until now

2) We must report the correct index

3) We must report the correct index also for the top level image, when
   blockdev is used.

This means that we need to potentially emit 2 events, one for the device
without the index and then when blockdev is used and the top level image
has an idex we must do it also with the index.

This will fix it for blockdev cases, while also not removing previous
semantics.

https://bugzilla.redhat.com/show_bug.cgi?id=1857204

Signed-off-by: Peter Krempa 
---
 src/qemu/qemu_process.c | 26 ++
 1 file changed, 18 insertions(+), 8 deletions(-)

diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index 2ee778c606..9b6dc8e68b 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -1497,10 +1497,10 @@ qemuProcessHandleBlockThreshold(qemuMonitorPtr mon 
G_GNUC_UNUSED,
 void *opaque)
 {
 virQEMUDriverPtr driver = opaque;
-virObjectEventPtr event = NULL;
+virObjectEventPtr eventSource = NULL;
+virObjectEventPtr eventDevice = NULL;
 virDomainDiskDefPtr disk;
 virStorageSourcePtr src;
-unsigned int idx;
 const char *path = NULL;

 virObjectLock(vm);
@@ -1509,18 +1509,28 @@ qemuProcessHandleBlockThreshold(qemuMonitorPtr mon 
G_GNUC_UNUSED,
   "threshold '%llu' exceeded by '%llu'",
   nodename, vm, vm->def->name, threshold, excess);

-if ((disk = qemuDomainDiskLookupByNodename(vm->def, nodename, &src, 
&idx))) {
-g_autofree char *dev = NULL;
+if ((disk = qemuDomainDiskLookupByNodename(vm->def, nodename, &src, 
NULL))) {
 if (virStorageSourceIsLocalStorage(src))
 path = src->path;

-if ((dev = qemuDomainDiskBackingStoreGetName(disk, idx)))
-event = virDomainEventBlockThresholdNewFromObj(vm, dev, path,
-   threshold, excess);
+if (src == disk->src) {
+g_autofree char *dev = qemuDomainDiskBackingStoreGetName(disk, 0);
+
+eventDevice = virDomainEventBlockThresholdNewFromObj(vm, dev, path,
+ threshold, 
excess);
+}
+
+if (src->id != 0) {
+g_autofree char *dev = qemuDomainDiskBackingStoreGetName(disk, 
src->id);
+
+eventSource = virDomainEventBlockThresholdNewFromObj(vm, dev, path,
+ threshold, 
excess);
+}
 }

 virObjectUnlock(vm);
-virObjectEventStateQueue(driver->domainEventState, event);
+virObjectEventStateQueue(driver->domainEventState, eventDevice);
+virObjectEventStateQueue(driver->domainEventState, eventSource);

 return 0;
 }
-- 
2.26.2

Re: [PATCH] Substitute security_context_t with char *

2020-07-15 Thread Andrea Bolognani

On Wed, 2020-07-15 at 13:45 +0200, Michal Privoznik wrote:
> Historically, we've used security_context_t for variables passed
> to libselinux APIs. But almost 7 years ago, libselinux developers
> admitted in their API that in fact, it's just a 'char *' type
> [1]. Ever since then the APIs accept 'char *' instead, but they
> kept the old alias just for API stability. Well, not anymore [2].
> 
> 1: 
> https://github.com/SELinuxProject/selinux/commit/9eb9c9327563014ad6a807814e7975424642d5b9
> 2: 
> https://github.com/SELinuxProject/selinux/commit/7a124ca2758136f49cc38efc26fb1a2d385ecfd9
> 
> Signed-off-by: Michal Privoznik 
> ---
>  src/libvirt-lxc.c|  2 +-
>  src/rpc/virnetsocket.c   |  2 +-
>  src/security/security_selinux.c  | 26 +-
>  src/storage/storage_util.c   |  2 +-
>  src/util/viridentity.c   |  2 +-
>  tests/securityselinuxhelper.c| 16 
>  tests/securityselinuxlabeltest.c |  4 ++--
>  tests/securityselinuxtest.c  |  2 +-
>  tests/viridentitytest.c  |  2 +-
>  9 files changed, 29 insertions(+), 29 deletions(-)

Reviewed-by: Andrea Bolognani 

-- 
Andrea Bolognani / Red Hat / Virtualization

Re: [libvirt PATCH 9/9] rpc: use new virt-nc binary for remote tunnelling

2020-07-15 Thread Andrea Bolognani

On Wed, 2020-07-15 at 11:00 +0100, Daniel P. Berrangé wrote:
> On Fri, Jul 10, 2020 at 07:21:47PM +0200, Andrea Bolognani wrote:
> > Just a couple of comments about the UI: would it make sense to use
> > something like
> > 
> >   qemu+ssh://host/system?tunnelcmd=virt-tunnel
> > 
> > instead? virt-nc could be seen as a bit of a misnomer, considering
> > that (by design) it doesn't do nearly as much as the actual netcat
> > does; as for the 'proxy' argument, I'm afraid it might lead people
> > to believe it's used for HTTP proxying or some other form of
> > proxying *between the client and the host*, whereas it's really
> > something that only affects operations on the host itself - not to
> > mention that we also have a virtproxyd now, further increasing the
> > potential for confusion...
> 
> I chose proxy not tunnel, because SSH is providing the tunnel here.
> virt-nc is a proxy linking the tunnel to the daemon. virtproxyd is
> conceptually similar, again linking a libvirt client to the real
> daemon.

Mh, that makes sense but I'm still wary of using "proxy" due to the
potential for confusion, since in this case the proxy is on the
opposite side of the connection than one would probably expect it
to be. Something like "remoteproxy" or "serverproxy", perhaps?

> I probably shouldn't mention "virt-nc" by name in the URI and instead
> have "proxy=netcat" vs "proxy=native", as users don't get to choose
> the actual binary here - they are providing an enum string, which
> gets mapped to the desired binary.

Yeah, that's much better.

Going back to the name of the command itself, since it's an internal
implementation details, and as such it's not intended to be invoked
by users and accordingly we're installing it under $(libexecdir)
along with existing helpers, what about following the established
naming convention and calling it 'libvirt_proxyhelper'?

-- 
Andrea Bolognani / Red Hat / Virtualization

[PATCH 0/3] Fix virnetsocket failure in IPv4 disabled environment

2020-07-15 Thread Michal Privoznik

This imperfection was reported by Andrea here:

https://www.redhat.com/archives/libvir-list/2020-July/msg00753.html

Michal Prívozník (3):
  virNetSocketCheckProtocols: Separate out checking family via
getaddrinfo()
  virNetSocketCheckProtocols: lookup IPv6 only if suspecting IPv6
  virNetSocketCheckProtocols: Confirm IPv4 by lookup too

 src/rpc/virnetsocket.c | 67 +++---
 1 file changed, 43 insertions(+), 24 deletions(-)

-- 
2.26.2

[PATCH 2/3] virNetSocketCheckProtocols: lookup IPv6 only if suspecting IPv6

2020-07-15 Thread Michal Privoznik

There is not much sense trying to disprove host is IPv6 capable
if we know after first round (getifaddrs()) that is is not.

Signed-off-by: Michal Privoznik 
---
 src/rpc/virnetsocket.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/rpc/virnetsocket.c b/src/rpc/virnetsocket.c
index b6bc3edc3b..b0d63f0f2c 100644
--- a/src/rpc/virnetsocket.c
+++ b/src/rpc/virnetsocket.c
@@ -205,7 +205,8 @@ int virNetSocketCheckProtocols(bool *hasIPv4,
 freeifaddrs(ifaddr);
 
 
-if (virNetSocketCheckProtocolByLookup("::1", AF_INET6, hasIPv6) < 0)
+if (hasIPv6 &&
+virNetSocketCheckProtocolByLookup("::1", AF_INET6, hasIPv6) < 0)
 return -1;
 
 VIR_DEBUG("Protocols: v4 %d v6 %d", *hasIPv4, *hasIPv6);
-- 
2.26.2

[PATCH 3/3] virNetSocketCheckProtocols: Confirm IPv4 by lookup too

2020-07-15 Thread Michal Privoznik

Historically, if we found an interface with an IPv6 address we
did not blindly trust that host is IPv6 capable (as in we can
successfully translate IPv4 addresses) but used getaddrinfo() to
confirm it. Turns out, we have use the same argument for IPv4.
For instance, in an namespace created by the following steps,
getaddrinfo("127.0.0.1", ...) fails (demonstrating by "Socket
TCP/IPv4 Accept" test case failing in virnetsockettest):

  unshare -n
  ip link set lo up
  ip link add dummy0 type dummy
  ip link set dummy0 up

Signed-off-by: Michal Privoznik 
---
 src/rpc/virnetsocket.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/rpc/virnetsocket.c b/src/rpc/virnetsocket.c
index b0d63f0f2c..c62c2fb3fc 100644
--- a/src/rpc/virnetsocket.c
+++ b/src/rpc/virnetsocket.c
@@ -204,6 +204,9 @@ int virNetSocketCheckProtocols(bool *hasIPv4,
 
 freeifaddrs(ifaddr);
 
+if (hasIPv4 &&
+virNetSocketCheckProtocolByLookup("127.0.0.1", AF_INET, hasIPv4) < 0)
+return -1;
 
 if (hasIPv6 &&
 virNetSocketCheckProtocolByLookup("::1", AF_INET6, hasIPv6) < 0)
-- 
2.26.2

[PATCH 1/3] virNetSocketCheckProtocols: Separate out checking family via getaddrinfo()

2020-07-15 Thread Michal Privoznik

The virNetSocketCheckProtocols() function is supposed to tell
caller whether IPv4 and/or IPv6 is supported on the system. In
the initial round, it uses getifaddrs() to see if an interface
has IPv4/IPv6 address assigned and then to double check IPv6 it
uses getaddrinfo() to lookup IPv6 loopback address. Separate out
this latter code because it is going to be reused.

Since the original code lived under an #ifdef and the new
function doesn't it is marked as unused - because on some systems
it may be so.

Signed-off-by: Michal Privoznik 
---
 src/rpc/virnetsocket.c | 63 ++
 1 file changed, 39 insertions(+), 24 deletions(-)

diff --git a/src/rpc/virnetsocket.c b/src/rpc/virnetsocket.c
index d1f4c531aa..b6bc3edc3b 100644
--- a/src/rpc/virnetsocket.c
+++ b/src/rpc/virnetsocket.c
@@ -141,16 +141,48 @@ static int virNetSocketForkDaemon(const char *binary)
 }
 #endif
 
+
+static int G_GNUC_UNUSED
+virNetSocketCheckProtocolByLookup(const char *address,
+  int family,
+  bool *hasFamily)
+{
+struct addrinfo hints;
+struct addrinfo *ai = NULL;
+int gaierr;
+
+memset(&hints, 0, sizeof(hints));
+hints.ai_family = family;
+hints.ai_flags = AI_PASSIVE | AI_ADDRCONFIG;
+hints.ai_socktype = SOCK_STREAM;
+
+if ((gaierr = getaddrinfo(address, NULL, &hints, &ai)) != 0) {
+*hasFamily = false;
+
+if (gaierr == EAI_FAMILY ||
+#ifdef EAI_ADDRFAMILY
+gaierr == EAI_ADDRFAMILY ||
+#endif
+gaierr == EAI_NONAME) {
+} else {
+virReportError(VIR_ERR_INTERNAL_ERROR,
+   _("Cannot resolve ::1 address: %s"),
+   gai_strerror(gaierr));
+return -1;
+}
+} else {
+*hasFamily = true;
+}
+
+freeaddrinfo(ai);
+return 0;
+}
+
 int virNetSocketCheckProtocols(bool *hasIPv4,
bool *hasIPv6)
 {
 #ifdef HAVE_IFADDRS_H
 struct ifaddrs *ifaddr = NULL, *ifa;
-struct addrinfo hints;
-struct addrinfo *ai = NULL;
-int gaierr;
-
-memset(&hints, 0, sizeof(hints));
 
 *hasIPv4 = *hasIPv6 = false;
 
@@ -172,26 +204,9 @@ int virNetSocketCheckProtocols(bool *hasIPv4,
 
 freeifaddrs(ifaddr);
 
-hints.ai_flags = AI_PASSIVE | AI_ADDRCONFIG;
-hints.ai_family = AF_INET6;
-hints.ai_socktype = SOCK_STREAM;
 
-if ((gaierr = getaddrinfo("::1", NULL, &hints, &ai)) != 0) {
-if (gaierr == EAI_FAMILY ||
-# ifdef EAI_ADDRFAMILY
-gaierr == EAI_ADDRFAMILY ||
-# endif
-gaierr == EAI_NONAME) {
-*hasIPv6 = false;
-} else {
-virReportError(VIR_ERR_INTERNAL_ERROR,
-   _("Cannot resolve ::1 address: %s"),
-   gai_strerror(gaierr));
-return -1;
-}
-}
-
-freeaddrinfo(ai);
+if (virNetSocketCheckProtocolByLookup("::1", AF_INET6, hasIPv6) < 0)
+return -1;
 
 VIR_DEBUG("Protocols: v4 %d v6 %d", *hasIPv4, *hasIPv6);
 
-- 
2.26.2

[PATCH] Substitute security_context_t with char *

2020-07-15 Thread Michal Privoznik

Historically, we've used security_context_t for variables passed
to libselinux APIs. But almost 7 years ago, libselinux developers
admitted in their API that in fact, it's just a 'char *' type
[1]. Ever since then the APIs accept 'char *' instead, but they
kept the old alias just for API stability. Well, not anymore [2].

1: 
https://github.com/SELinuxProject/selinux/commit/9eb9c9327563014ad6a807814e7975424642d5b9
2: 
https://github.com/SELinuxProject/selinux/commit/7a124ca2758136f49cc38efc26fb1a2d385ecfd9

Signed-off-by: Michal Privoznik 
---
 src/libvirt-lxc.c|  2 +-
 src/rpc/virnetsocket.c   |  2 +-
 src/security/security_selinux.c  | 26 +-
 src/storage/storage_util.c   |  2 +-
 src/util/viridentity.c   |  2 +-
 tests/securityselinuxhelper.c| 16 
 tests/securityselinuxlabeltest.c |  4 ++--
 tests/securityselinuxtest.c  |  2 +-
 tests/viridentitytest.c  |  2 +-
 9 files changed, 29 insertions(+), 29 deletions(-)

diff --git a/src/libvirt-lxc.c b/src/libvirt-lxc.c
index 47a06a39f2..25f1cfc5f7 100644
--- a/src/libvirt-lxc.c
+++ b/src/libvirt-lxc.c
@@ -204,7 +204,7 @@ virDomainLxcEnterSecurityLabel(virSecurityModelPtr model,
 if (STREQ(model->model, "selinux")) {
 #ifdef WITH_SELINUX
 if (oldlabel) {
-security_context_t ctx;
+char *ctx;
 
 if (getcon(&ctx) < 0) {
 virReportSystemError(errno,
diff --git a/src/rpc/virnetsocket.c b/src/rpc/virnetsocket.c
index c62c2fb3fc..9aaabb4577 100644
--- a/src/rpc/virnetsocket.c
+++ b/src/rpc/virnetsocket.c
@@ -1612,7 +1612,7 @@ int virNetSocketGetUNIXIdentity(virNetSocketPtr sock 
G_GNUC_UNUSED,
 int virNetSocketGetSELinuxContext(virNetSocketPtr sock,
   char **context)
 {
-security_context_t seccon = NULL;
+char *seccon = NULL;
 int ret = -1;
 
 *context = NULL;
diff --git a/src/security/security_selinux.c b/src/security/security_selinux.c
index 1d28430035..cc8fb1099c 100644
--- a/src/security/security_selinux.c
+++ b/src/security/security_selinux.c
@@ -198,7 +198,7 @@ virSecuritySELinuxTransactionAppend(const char *path,
 
 static int
 virSecuritySELinuxRememberLabel(const char *path,
-const security_context_t con)
+const char *con)
 {
 return virSecuritySetRememberedLabel(SECURITY_SELINUX_NAME,
  path, con);
@@ -207,7 +207,7 @@ virSecuritySELinuxRememberLabel(const char *path,
 
 static int
 virSecuritySELinuxRecallLabel(const char *path,
-  security_context_t *con)
+  char **con)
 {
 int rv;
 
@@ -431,7 +431,7 @@ virSecuritySELinuxMCSGetProcessRange(char **sens,
  int *catMin,
  int *catMax)
 {
-security_context_t ourSecContext = NULL;
+char *ourSecContext = NULL;
 context_t ourContext = NULL;
 char *cat = NULL;
 char *tmp;
@@ -530,8 +530,8 @@ virSecuritySELinuxMCSGetProcessRange(char **sens,
 }
 
 static char *
-virSecuritySELinuxContextAddRange(security_context_t src,
-  security_context_t dst)
+virSecuritySELinuxContextAddRange(char *src,
+  char *dst)
 {
 char *str = NULL;
 char *ret = NULL;
@@ -575,7 +575,7 @@ virSecuritySELinuxGenNewContext(const char *basecontext,
 context_t context = NULL;
 char *ret = NULL;
 char *str;
-security_context_t ourSecContext = NULL;
+char *ourSecContext = NULL;
 context_t ourContext = NULL;
 
 VIR_DEBUG("basecontext=%s mcs=%s isObjectContext=%d",
@@ -955,7 +955,7 @@ virSecuritySELinuxReserveLabel(virSecurityManagerPtr mgr,
virDomainDefPtr def,
pid_t pid)
 {
-security_context_t pctx;
+char *pctx;
 context_t ctx = NULL;
 const char *mcs;
 int rv;
@@ -1203,7 +1203,7 @@ virSecuritySELinuxGetProcessLabel(virSecurityManagerPtr 
mgr G_GNUC_UNUSED,
   pid_t pid,
   virSecurityLabelPtr sec)
 {
-security_context_t ctx;
+char *ctx;
 
 if (getpidcon_raw(pid, &ctx) == -1) {
 virReportSystemError(errno,
@@ -1316,7 +1316,7 @@ virSecuritySELinuxSetFilecon(virSecurityManagerPtr mgr,
  bool remember)
 {
 bool privileged = virSecurityManagerGetPrivileged(mgr);
-security_context_t econ = NULL;
+char *econ = NULL;
 int refcount;
 int rc;
 bool rollback = false;
@@ -1426,7 +1426,7 @@ virSecuritySELinuxFSetFilecon(int fd, char *tcon)
 /* Set fcon to the appropriate label for path and mode, or return -1.  */
 static int
 getContext(virSecurityManagerPtr mgr G_GNUC_UNUSED,
-   const char *newpath, mode_t mode, security_context_t *fcon)
+   const char *newpath, mode_t mo

Re: [libvirt PATCH 0/2] tests: Don't assume IPv4 connectivity is available

2020-07-15 Thread Andrea Bolognani

On Wed, 2020-07-15 at 12:38 +0200, Michal Privoznik wrote:
> On 7/14/20 10:32 PM, Andrea Bolognani wrote:
> > I started looking into this after seeing
> > 
> >FAIL: virnetsockettest
> >==
> > 
> >TEST: virnetsockettest
> > 1) Socket TCP/IPv4 Accept   ... libvirt: XML-RPC 
> > error : Unable to resolve address '127.0.0.1' service '5672': Address 
> > family for hostname not supported
> >FAILED
> > during a Debian package build.
> > 
> > Full log:
> > 
> >
> > https://buildd.debian.org/status/fetch.php?pkg=libvirt&arch=armel&ver=6.4.0-2&stamp=1594738948&raw=0
> > 
> > Just a few days ago, this issue was discussed in
> > 
> >https://lists.debian.org/debian-devel/2020/07/msg00070.html
> > 
> > and libvirt was mentioned explicitly as a package that could be
> > affected by it.
> 
> Indeed. I'm able to reproduce and working on a fix as we speak.
> The problem is that our test assumes that if there is an interface with 
> IPv4 address (as returned by getifaddrs()) then getaddrinfo() of an IPv4 
> address succeeds. We've seen with IPv6 that it is not true - that's why 
> virNetSocketCheckProtocols() does explicit getaddrinfo() for an IPv6 
> address. We need to do the same for IPv4.

That's what I thought needed to happen, but I lack the networking
know-how to write the patches myself O:-)

> Anyway, for these two:
> 
> Reviewed-by: Michal Privoznik 

Thanks, pushed now.

-- 
Andrea Bolognani / Red Hat / Virtualization

Re: [libvirt PATCH 0/2] tests: Don't assume IPv4 connectivity is available

2020-07-15 Thread Michal Privoznik


On 7/14/20 10:32 PM, Andrea Bolognani wrote:

I started looking into this after seeing

   FAIL: virnetsockettest
   ==

   TEST: virnetsockettest
1) Socket TCP/IPv4 Accept   ... libvirt: XML-RPC error 
: Unable to resolve address '127.0.0.1' service '5672': Address family for 
hostname not supported
   FAILED




during a Debian package build.

Full log:

   
https://buildd.debian.org/status/fetch.php?pkg=libvirt&arch=armel&ver=6.4.0-2&stamp=1594738948&raw=0

Just a few days ago, this issue was discussed in

   https://lists.debian.org/debian-devel/2020/07/msg00070.html

and libvirt was mentioned explicitly as a package that could be
affected by it.


Indeed. I'm able to reproduce and working on a fix as we speak.
The problem is that our test assumes that if there is an interface with 
IPv4 address (as returned by getifaddrs()) then getaddrinfo() of an IPv4 
address succeeds. We've seen with IPv6 that it is not true - that's why 
virNetSocketCheckProtocols() does explicit getaddrinfo() for an IPv6 
address. We need to do the same for IPv4.


Anyway, for these two:

Reviewed-by: Michal Privoznik 

Michal

Re: Re: [RFC 01/21] build-aux: Add a tool to generate xml parse/format functions

2020-07-15 Thread Daniel P . Berrangé

On Wed, Jul 01, 2020 at 12:06:36AM +0800, Shi Lei wrote:
> >On Wed, Jun 10, 2020 at 09:20:29AM +0800, Shi Lei wrote:
> >> This tool is used to generate parsexml/formatbuf functions for structs.
> >> It is based on libclang and its python-binding.
> >> Some directives (such as genparse, xmlattr, etc.) need to be added on
> >> the declarations of structs to direct the tool.
> >>
> >> Signed-off-by: Shi Lei 
> >> ---
> >>  build-aux/generator/directive.py | 839 +++
> >>  build-aux/generator/go   |  14 +
> >>  build-aux/generator/main.py  | 416 +++
> >>  build-aux/generator/utils.py | 100 

> >
> >> +if __name__ == "__main__":
> >
> >> +
> >> +    libclang_path = os.environ.get('libclang_path')
> >> +    assert libclang_path, 'No libclang library.'
> >> +    Config.set_library_file(libclang_path)
> >
> >I'm wondering if we really need this at all ? AFAICT, the libclang
> >python library works fine without it, so this is only required if
> >trying to use a non-standard libclang, which I think ought to be
> >possible already by setting LD_LIBRARY_PATH
> > 
> 
> I'm working on Ubuntu. On Ubuntu 20.04 LTS and 18.04, I find:
> 
> # ldconfig -p | grep libclang
>         libclang-10.so.1 (libc6,x86-64) => 
> /lib/x86_64-linux-gnu/libclang-10.so.1
>         libclang-10.so (libc6,x86-64) => /lib/x86_64-linux-gnu/libclang-10.so
> 
> There's no libclang.so!
> 
> If I install python3 bindings from the standard apt repository, it works!
> By default, this version just find 'libclang-10.so'.
> 
> But if I use 'pip3' to install python3 bindings, it doesn't work.
> Because it is another verison. This version find 'libclang.so' and it can't 
> find it!
> 
> So I think we can retain these lines to adapt to the distinction among 
> platforms.

Generally libvirt only cares about working against the distro provided
versions of packages.  So if the apt installed python binding works
that is sufficient.

If we want to support an alternative libclang though, I'd suggest that
instead of trying to figure it out magically, just have an optional
arg to configure, eg  --with-libclang=/lib/x86_64-linux-gnu/libclang-10.so

By default don't set any libclang in the python binding - just let it
use its built-in default.

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: [libvirt PATCH 9/9] rpc: use new virt-nc binary for remote tunnelling

2020-07-15 Thread Daniel P . Berrangé

On Fri, Jul 10, 2020 at 07:21:47PM +0200, Andrea Bolognani wrote:
> On Thu, 2020-07-09 at 19:36 +0100, Daniel P. Berrangé wrote:
> > This wires up support for using the new virt-nc binary with the ssh,
> > libssh and libssh2 protocols.
> > 
> > The new binary will be used preferentially if it is available in $PATH,
> > otherwise we fall back to traditional netcat.
> > 
> > The "proxy" URI parameter can be used to force use of netcat e.g.
> > 
> >   qemu+ssh://host/system?proxy=netcat
> > 
> > or the disable fallback e.g.
> > 
> >   qemu+ssh://host/system?proxy=virt-nc
> > 
> > With use of virt-nc, we can now support remote session URIs
> > 
> >   qemu+ssh://host/session
> > 
> > and this will only use virt-nc, with no fallback. This also lets the
> > libvirtd process be auto-started.
> 
> Just a couple of comments about the UI: would it make sense to use
> something like
> 
>   qemu+ssh://host/system?tunnelcmd=virt-tunnel
> 
> instead? virt-nc could be seen as a bit of a misnomer, considering
> that (by design) it doesn't do nearly as much as the actual netcat
> does; as for the 'proxy' argument, I'm afraid it might lead people
> to believe it's used for HTTP proxying or some other form of
> proxying *between the client and the host*, whereas it's really
> something that only affects operations on the host itself - not to
> mention that we also have a virtproxyd now, further increasing the
> potential for confusion...

I chose proxy not tunnel, because SSH is providing the tunnel here.
virt-nc is a proxy linking the tunnel to the daemon. virtproxyd is
conceptually similar, again linking a libvirt client to the real
daemon.

I probably shouldn't mention "virt-nc" by name in the URI and instead
have "proxy=netcat" vs "proxy=native", as users don't get to choose
the actual binary here - they are providing an enum string, which
gets mapped to the desired binary.


Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: [libvirt PATCH 7/9] remote: introduce virtd-nc helper binary

2020-07-15 Thread Daniel P . Berrangé

On Fri, Jul 10, 2020 at 02:04:00PM +0200, Michal Privoznik wrote:
> On 7/9/20 8:36 PM, Daniel P. Berrangé wrote:
> > When accessing libvirtd over a SSH tunnel, the remote driver must spawn
> > the remote 'nc' process, pointing it to the libvirtd socket path. This
> > is problematic for a number of reasons:
> > 
> >   - The socket path varies according to the --prefix chosen at build
> > time. The remote client is seeing the local prefix, but what we
> > need is the remote prefix
> > 
> >   - The socket path varies according to remote env variables, such as
> > the XDG_RUNTIME_DIR location. Again we see the local XDG_RUNTIME_DIR
> > value, but what we need is the remote value (if any)
> > 
> >   - We can not able to autospawn the libvirtd daemon for session mode
> > access
> > 
> > To address these problems this patch introduces the 'virtd-nc' helper
> > program which takes the URI for the remote driver as a CLI parameter.
> > It then figures out the socket path to connect to using the same
> > code as the remote driver does on the remote host.
> > 
> > Signed-off-by: Daniel P. Berrangé 
> > ---
> >   build-aux/syntax-check.mk  |   2 +-
> >   po/POTFILES.in |   1 +
> >   src/remote/Makefile.inc.am |  30 +++
> >   src/remote/remote_nc.c | 424 +
> >   src/rpc/virnetsocket.h |   1 +
> >   5 files changed, 457 insertions(+), 1 deletion(-)
> >   create mode 100644 src/remote/remote_nc.c
> 
> The spec file needs to be updated too.

Oh right, of course.

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: [GSoC][PATCH v4 4/4] qemu_domainjob: introduce `privateData` for `qemuDomainJobInfo`

2020-07-15 Thread Prathamesh Chavan

On Tue, Jul 14, 2020 at 10:25 PM Michal Privoznik  wrote:
>
> On 7/14/20 5:14 PM, Prathamesh Chavan wrote:
> > Currently, domainJobInfo also uses "stats" as one of the job specific
> > parameters. To remove this dependency, a privateData structure is
> > introduced.
> >
> > The plan is to even have this structure renamed as
> > `virDomainJobInfoInternal` as there already exists a
> > `virDomainJobInfo'.
>
> I see. Well, I'm not that sure about virDomainJobInfoInternal (mostly
> because this qemuDomainJobInfo structure looks too qemu specific).
> How about moving it under qemuDomainJobObj privateData? I mean,
> qemuDomainJobPrivate structure you propose in 3/4?

Yes, this can be done. This shall be sufficient to remove
qemu_domainjob dependency on other files. Also, since libxl simply
uses the virDomainJobInfo, I think we can skip creating the
virDomainJobInfoInternal altogether.

Thanks.

Re: device compatibility interface for live migration with assigned devices

2020-07-15 Thread Alex Xu

Yan Zhao  于2020年7月15日周三 下午4:32写道：

> On Tue, Jul 14, 2020 at 02:59:48PM -0600, Alex Williamson wrote:
> > On Tue, 14 Jul 2020 18:19:46 +0100
> > "Dr. David Alan Gilbert"  wrote:
> >
> > > * Alex Williamson (alex.william...@redhat.com) wrote:
> > > > On Tue, 14 Jul 2020 11:21:29 +0100
> > > > Daniel P. BerrangÃ©  wrote:
> > > >
> > > > > On Tue, Jul 14, 2020 at 07:29:57AM +0800, Yan Zhao wrote:
> > > > > > hi folks,
> > > > > > we are defining a device migration compatibility interface that
> helps upper
> > > > > > layer stack like openstack/ovirt/libvirt to check if two devices
> are
> > > > > > live migration compatible.
> > > > > > The "devices" here could be MDEVs, physical devices, or hybrid
> of the two.
> > > > > > e.g. we could use it to check whether
> > > > > > - a src MDEV can migrate to a target MDEV,
> > > > > > - a src VF in SRIOV can migrate to a target VF in SRIOV,
> > > > > > - a src MDEV can migration to a target VF in SRIOV.
> > > > > >   (e.g. SIOV/SRIOV backward compatibility case)
> > > > > >
> > > > > > The upper layer stack could use this interface as the last step
> to check
> > > > > > if one device is able to migrate to another device before
> triggering a real
> > > > > > live migration procedure.
> > > > > > we are not sure if this interface is of value or help to you.
> please don't
> > > > > > hesitate to drop your valuable comments.
> > > > > >
> > > > > >
> > > > > > (1) interface definition
> > > > > > The interface is defined in below way:
> > > > > >
> > > > > >  __userspace
> > > > > >   /\  \
> > > > > >  / \write
> > > > > > / read  \
> > > > > >/__   ___\|/_
> > > > > >   | migration_version | | migration_version |-->check
> migration
> > > > > >   - -   compatibility
> > > > > >  device Adevice B
> > > > > >
> > > > > >
> > > > > > a device attribute named migration_version is defined under each
> device's
> > > > > > sysfs node. e.g.
> (/sys/bus/pci/devices/\:00\:02.0/$mdev_UUID/migration_version).
> > > > > > userspace tools read the migration_version as a string from the
> source device,
> > > > > > and write it to the migration_version sysfs attribute in the
> target device.
> > > > > >
> > > > > > The userspace should treat ANY of below conditions as two
> devices not compatible:
> > > > > > - any one of the two devices does not have a migration_version
> attribute
> > > > > > - error when reading from migration_version attribute of one
> device
> > > > > > - error when writing migration_version string of one device to
> > > > > >   migration_version attribute of the other device
> > > > > >
> > > > > > The string read from migration_version attribute is defined by
> device vendor
> > > > > > driver and is completely opaque to the userspace.
> > > > > > for a Intel vGPU, string format can be defined like
> > > > > > "parent device PCI ID" + "version of gvt driver" + "mdev type" +
> "aggregator count".
> > > > > >
> > > > > > for an NVMe VF connecting to a remote storage. it could be
> > > > > > "PCI ID" + "driver version" + "configured remote storage URL"
> > > > > >
> > > > > > for a QAT VF, it may be
> > > > > > "PCI ID" + "driver version" + "supported encryption set".
> > > > > >
> > > > > > (to avoid namespace confliction from each vendor, we may prefix
> a driver name to
> > > > > > each migration_version string. e.g.
> i915-v1-8086-591d-i915-GVTg_V5_8-1)
> > > >
> > > > It's very strange to define it as opaque and then proceed to describe
> > > > the contents of that opaque string.  The point is that its contents
> > > > are defined by the vendor driver to describe the device, driver
> version,
> > > > and possibly metadata about the configuration of the device.  One
> > > > instance of a device might generate a different string from another.
> > > > The string that a device produces is not necessarily the only string
> > > > the vendor driver will accept, for example the driver might support
> > > > backwards compatible migrations.
> > >
> > > (As I've said in the previous discussion, off one of the patch series)
> > >
> > > My view is it makes sense to have a half-way house on the opaqueness of
> > > this string; I'd expect to have an ID and version that are human
> > > readable, maybe a device ID/name that's human interpretable and then a
> > > bunch of other cruft that maybe device/vendor/version specific.
> > >
> > > I'm thinking that we want to be able to report problems and include the
> > > string and the user to be able to easily identify the device that was
> > > complaining and notice a difference in versions, and perhaps also use
> > > it in compatibility patterns to find compatible hosts; but that does
> > > get tricky when it's a 'ask the device if it's compatible'.
> >
> > In the reply I just sent to Dan, I gave this example of what a
> > "

Re: device compatibility interface for live migration with assigned devices

2020-07-15 Thread Daniel P . Berrangé

On Tue, Jul 14, 2020 at 02:47:15PM -0600, Alex Williamson wrote:
> On Tue, 14 Jul 2020 17:47:22 +0100
> Daniel P. Berrangé  wrote:

> > I'm sure OpenStack maintainers can speak to this more, as they've put
> > alot of work into their scheduling engine to optimize the way it places
> > VMs largely driven from simple structured data reported from hosts.
> 
> I think we've weeded out that our intended approach is not worthwhile,
> testing a compatibility string at a device is too much overhead, we
> need to provide enough information to the management engine to predict
> the response without interaction beyond the initial capability probing.

Just to clarify in case people mis-interpreted my POV...

I think that testing a compatibility string at a device *is* useful, as
it allows for a final accurate safety check to be performed before the
migration stream starts. Libvirt could use that reasonably easily I
believe.

It just isn't sufficient for a complete solution.

In parallel with the device level test in sysfs, we need something else
to support the host placement selection problems in an efficient way, as
you are trying to address in the remainder of your mail.

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

RE: device compatibility interface for live migration with assigned devices

2020-07-15 Thread Feng, Shaohe




-Original Message-
From: Zhao, Yan Y  
Sent: 2020骞�7���15��� 16:21
To: Alex Williamson 
Cc: Dr. David Alan Gilbert ; Daniel P. Berrang茅 
; de...@ovirt.org; openstack-disc...@lists.openstack.org; 
libvir-list@redhat.com; intel-gvt-...@lists.freedesktop.org; 
k...@vger.kernel.org; qemu-de...@nongnu.org; smoo...@redhat.com; 
eskul...@redhat.com; coh...@redhat.com; dinec...@redhat.com; cor...@lwn.net; 
kwankh...@nvidia.com; eau...@redhat.com; Ding, Jian-feng 
; Xu, Hejie ; Tian, Kevin 
; zhen...@linux.intel.com; bao.yum...@zte.com.cn; Wang, 
Xin-ran ; Feng, Shaohe 
Subject: Re: device compatibility interface for live migration with assigned 
devices

On Tue, Jul 14, 2020 at 02:59:48PM -0600, Alex Williamson wrote:
> On Tue, 14 Jul 2020 18:19:46 +0100
> "Dr. David Alan Gilbert"  wrote:
> 
> > * Alex Williamson (alex.william...@redhat.com) wrote:
> > > On Tue, 14 Jul 2020 11:21:29 +0100 Daniel P. Berrang茅 
> > >  wrote:
> > >   
> > > > On Tue, Jul 14, 2020 at 07:29:57AM +0800, Yan Zhao wrote:  
> > > > > hi folks,
> > > > > we are defining a device migration compatibility interface 
> > > > > that helps upper layer stack like openstack/ovirt/libvirt to 
> > > > > check if two devices are live migration compatible.
> > > > > The "devices" here could be MDEVs, physical devices, or hybrid of the 
> > > > > two.
> > > > > e.g. we could use it to check whether
> > > > > - a src MDEV can migrate to a target MDEV,
> > > > > - a src VF in SRIOV can migrate to a target VF in SRIOV,
> > > > > - a src MDEV can migration to a target VF in SRIOV.
> > > > >   (e.g. SIOV/SRIOV backward compatibility case)
> > > > > 
> > > > > The upper layer stack could use this interface as the last 
> > > > > step to check if one device is able to migrate to another 
> > > > > device before triggering a real live migration procedure.
> > > > > we are not sure if this interface is of value or help to you. 
> > > > > please don't hesitate to drop your valuable comments.
> > > > > 
> > > > > 
> > > > > (1) interface definition
> > > > > The interface is defined in below way:
> > > > > 
> > > > >  __userspace
> > > > >   /\  \
> > > > >  / \write
> > > > > / read  \
> > > > >/__   ___\|/_
> > > > >   | migration_version | | migration_version |-->check migration
> > > > >   - -   compatibility
> > > > >  device Adevice B
> > > > > 
> > > > > 
> > > > > a device attribute named migration_version is defined under 
> > > > > each device's sysfs node. e.g. 
> > > > > (/sys/bus/pci/devices/\:00\:02.0/$mdev_UUID/migration_version).
> > > > > userspace tools read the migration_version as a string from 
> > > > > the source device, and write it to the migration_version sysfs 
> > > > > attribute in the target device.
> > > > > 
> > > > > The userspace should treat ANY of below conditions as two devices not 
> > > > > compatible:
> > > > > - any one of the two devices does not have a migration_version 
> > > > > attribute
> > > > > - error when reading from migration_version attribute of one 
> > > > > device
> > > > > - error when writing migration_version string of one device to
> > > > >   migration_version attribute of the other device
> > > > > 
> > > > > The string read from migration_version attribute is defined by 
> > > > > device vendor driver and is completely opaque to the userspace.
> > > > > for a Intel vGPU, string format can be defined like "parent 
> > > > > device PCI ID" + "version of gvt driver" + "mdev type" + "aggregator 
> > > > > count".
> > > > > 
> > > > > for an NVMe VF connecting to a remote storage. it could be 
> > > > > "PCI ID" + "driver version" + "configured remote storage URL"
> > > > > 
> > > > > for a QAT VF, it may be
> > > > > "PCI ID" + "driver version" + "supported encryption set".
> > > > > 
> > > > > (to avoid namespace confliction from each vendor, we may 
> > > > > prefix a driver name to each migration_version string. e.g. 
> > > > > i915-v1-8086-591d-i915-GVTg_V5_8-1)
> > > 
> > > It's very strange to define it as opaque and then proceed to 
> > > describe the contents of that opaque string.  The point is that 
> > > its contents are defined by the vendor driver to describe the 
> > > device, driver version, and possibly metadata about the 
> > > configuration of the device.  One instance of a device might generate a 
> > > different string from another.
> > > The string that a device produces is not necessarily the only 
> > > string the vendor driver will accept, for example the driver might 
> > > support backwards compatible migrations.
> > 
> > (As I've said in the previous discussion, off one of the patch 
> > series)
> > 
> > My view is it makes sense to have a half-way house on the opaqueness 
> > of this string; I'd expect to have an ID and version that are human 
> > readable, maybe a de

Re: [libvirt PATCH 1/2] ci: Drop Debian 9 jobs

2020-07-15 Thread Daniel P . Berrangé

On Tue, Jul 14, 2020 at 07:59:07PM +0200, Andrea Bolognani wrote:
> The esisting cross-compilation jobs are carefully redistributed

existing.

> among Debian 10 and Debian sid to ensure we don't use the latter
> for aarch64, mipsel or mips64el, since those architectures are
> currently broken.
> 
> Signed-off-by: Andrea Bolognani 
> ---
>  .gitlab-ci.yml | 74 --
>  1 file changed, 12 insertions(+), 62 deletions(-)

Reviewed-by: Daniel P. Berrangé 


Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: [libvirt PATCH 2/2] ci: Drop Debian 9 containers

2020-07-15 Thread Daniel P . Berrangé

On Tue, Jul 14, 2020 at 07:59:08PM +0200, Andrea Bolognani wrote:
> The corresponding libvirt-ci commit is 5abf5e7e2326.
> 
> Signed-off-by: Andrea Bolognani 
> ---
>  .../libvirt-debian-9-cross-aarch64.Dockerfile | 128 --
>  .../libvirt-debian-9-cross-armv6l.Dockerfile  | 126 -
>  .../libvirt-debian-9-cross-armv7l.Dockerfile  | 127 -
>  .../libvirt-debian-9-cross-mips.Dockerfile| 127 -
>  ...libvirt-debian-9-cross-mips64el.Dockerfile | 127 -
>  .../libvirt-debian-9-cross-mipsel.Dockerfile  | 127 -
>  .../libvirt-debian-9-cross-ppc64le.Dockerfile | 127 -
>  .../libvirt-debian-9-cross-s390x.Dockerfile   | 127 -
>  ci/containers/libvirt-debian-9.Dockerfile | 118 
>  9 files changed, 1134 deletions(-)
>  delete mode 100644 ci/containers/libvirt-debian-9-cross-aarch64.Dockerfile
>  delete mode 100644 ci/containers/libvirt-debian-9-cross-armv6l.Dockerfile
>  delete mode 100644 ci/containers/libvirt-debian-9-cross-armv7l.Dockerfile
>  delete mode 100644 ci/containers/libvirt-debian-9-cross-mips.Dockerfile
>  delete mode 100644 ci/containers/libvirt-debian-9-cross-mips64el.Dockerfile
>  delete mode 100644 ci/containers/libvirt-debian-9-cross-mipsel.Dockerfile
>  delete mode 100644 ci/containers/libvirt-debian-9-cross-ppc64le.Dockerfile
>  delete mode 100644 ci/containers/libvirt-debian-9-cross-s390x.Dockerfile
>  delete mode 100644 ci/containers/libvirt-debian-9.Dockerfile

Reviewed-by: Daniel P. Berrangé 


Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: device compatibility interface for live migration with assigned devices

2020-07-15 Thread Yan Zhao

On Tue, Jul 14, 2020 at 02:59:48PM -0600, Alex Williamson wrote:
> On Tue, 14 Jul 2020 18:19:46 +0100
> "Dr. David Alan Gilbert"  wrote:
> 
> > * Alex Williamson (alex.william...@redhat.com) wrote:
> > > On Tue, 14 Jul 2020 11:21:29 +0100
> > > Daniel P. BerrangÃÂ©  wrote:
> > >   
> > > > On Tue, Jul 14, 2020 at 07:29:57AM +0800, Yan Zhao wrote:  
> > > > > hi folks,
> > > > > we are defining a device migration compatibility interface that helps 
> > > > > upper
> > > > > layer stack like openstack/ovirt/libvirt to check if two devices are
> > > > > live migration compatible.
> > > > > The "devices" here could be MDEVs, physical devices, or hybrid of the 
> > > > > two.
> > > > > e.g. we could use it to check whether
> > > > > - a src MDEV can migrate to a target MDEV,
> > > > > - a src VF in SRIOV can migrate to a target VF in SRIOV,
> > > > > - a src MDEV can migration to a target VF in SRIOV.
> > > > >   (e.g. SIOV/SRIOV backward compatibility case)
> > > > > 
> > > > > The upper layer stack could use this interface as the last step to 
> > > > > check
> > > > > if one device is able to migrate to another device before triggering 
> > > > > a real
> > > > > live migration procedure.
> > > > > we are not sure if this interface is of value or help to you. please 
> > > > > don't
> > > > > hesitate to drop your valuable comments.
> > > > > 
> > > > > 
> > > > > (1) interface definition
> > > > > The interface is defined in below way:
> > > > > 
> > > > >  __userspace
> > > > >   /\  \
> > > > >  / \write
> > > > > / read  \
> > > > >/__   ___\|/_
> > > > >   | migration_version | | migration_version |-->check migration
> > > > >   - -   compatibility
> > > > >  device Adevice B
> > > > > 
> > > > > 
> > > > > a device attribute named migration_version is defined under each 
> > > > > device's
> > > > > sysfs node. e.g. 
> > > > > (/sys/bus/pci/devices/\:00\:02.0/$mdev_UUID/migration_version).
> > > > > userspace tools read the migration_version as a string from the 
> > > > > source device,
> > > > > and write it to the migration_version sysfs attribute in the target 
> > > > > device.
> > > > > 
> > > > > The userspace should treat ANY of below conditions as two devices not 
> > > > > compatible:
> > > > > - any one of the two devices does not have a migration_version 
> > > > > attribute
> > > > > - error when reading from migration_version attribute of one device
> > > > > - error when writing migration_version string of one device to
> > > > >   migration_version attribute of the other device
> > > > > 
> > > > > The string read from migration_version attribute is defined by device 
> > > > > vendor
> > > > > driver and is completely opaque to the userspace.
> > > > > for a Intel vGPU, string format can be defined like
> > > > > "parent device PCI ID" + "version of gvt driver" + "mdev type" + 
> > > > > "aggregator count".
> > > > > 
> > > > > for an NVMe VF connecting to a remote storage. it could be
> > > > > "PCI ID" + "driver version" + "configured remote storage URL"
> > > > > 
> > > > > for a QAT VF, it may be
> > > > > "PCI ID" + "driver version" + "supported encryption set".
> > > > > 
> > > > > (to avoid namespace confliction from each vendor, we may prefix a 
> > > > > driver name to
> > > > > each migration_version string. e.g. 
> > > > > i915-v1-8086-591d-i915-GVTg_V5_8-1)  
> > > 
> > > It's very strange to define it as opaque and then proceed to describe
> > > the contents of that opaque string.  The point is that its contents
> > > are defined by the vendor driver to describe the device, driver version,
> > > and possibly metadata about the configuration of the device.  One
> > > instance of a device might generate a different string from another.
> > > The string that a device produces is not necessarily the only string
> > > the vendor driver will accept, for example the driver might support
> > > backwards compatible migrations.  
> > 
> > (As I've said in the previous discussion, off one of the patch series)
> > 
> > My view is it makes sense to have a half-way house on the opaqueness of
> > this string; I'd expect to have an ID and version that are human
> > readable, maybe a device ID/name that's human interpretable and then a
> > bunch of other cruft that maybe device/vendor/version specific.
> > 
> > I'm thinking that we want to be able to report problems and include the
> > string and the user to be able to easily identify the device that was
> > complaining and notice a difference in versions, and perhaps also use
> > it in compatibility patterns to find compatible hosts; but that does
> > get tricky when it's a 'ask the device if it's compatible'.
> 
> In the reply I just sent to Dan, I gave this example of what a
> "compatibility string" might look like represented as json:

Re: device compatibility interface for live migration with assigned devices

2020-07-15 Thread Dr. David Alan Gilbert

* Alex Williamson (alex.william...@redhat.com) wrote:
> On Tue, 14 Jul 2020 18:19:46 +0100
> "Dr. David Alan Gilbert"  wrote:
> 
> > * Alex Williamson (alex.william...@redhat.com) wrote:
> > > On Tue, 14 Jul 2020 11:21:29 +0100
> > > Daniel P. BerrangÃ©  wrote:
> > >   
> > > > On Tue, Jul 14, 2020 at 07:29:57AM +0800, Yan Zhao wrote:  
> > > > > hi folks,
> > > > > we are defining a device migration compatibility interface that helps 
> > > > > upper
> > > > > layer stack like openstack/ovirt/libvirt to check if two devices are
> > > > > live migration compatible.
> > > > > The "devices" here could be MDEVs, physical devices, or hybrid of the 
> > > > > two.
> > > > > e.g. we could use it to check whether
> > > > > - a src MDEV can migrate to a target MDEV,
> > > > > - a src VF in SRIOV can migrate to a target VF in SRIOV,
> > > > > - a src MDEV can migration to a target VF in SRIOV.
> > > > >   (e.g. SIOV/SRIOV backward compatibility case)
> > > > > 
> > > > > The upper layer stack could use this interface as the last step to 
> > > > > check
> > > > > if one device is able to migrate to another device before triggering 
> > > > > a real
> > > > > live migration procedure.
> > > > > we are not sure if this interface is of value or help to you. please 
> > > > > don't
> > > > > hesitate to drop your valuable comments.
> > > > > 
> > > > > 
> > > > > (1) interface definition
> > > > > The interface is defined in below way:
> > > > > 
> > > > >  __userspace
> > > > >   /\  \
> > > > >  / \write
> > > > > / read  \
> > > > >/__   ___\|/_
> > > > >   | migration_version | | migration_version |-->check migration
> > > > >   - -   compatibility
> > > > >  device Adevice B
> > > > > 
> > > > > 
> > > > > a device attribute named migration_version is defined under each 
> > > > > device's
> > > > > sysfs node. e.g. 
> > > > > (/sys/bus/pci/devices/\:00\:02.0/$mdev_UUID/migration_version).
> > > > > userspace tools read the migration_version as a string from the 
> > > > > source device,
> > > > > and write it to the migration_version sysfs attribute in the target 
> > > > > device.
> > > > > 
> > > > > The userspace should treat ANY of below conditions as two devices not 
> > > > > compatible:
> > > > > - any one of the two devices does not have a migration_version 
> > > > > attribute
> > > > > - error when reading from migration_version attribute of one device
> > > > > - error when writing migration_version string of one device to
> > > > >   migration_version attribute of the other device
> > > > > 
> > > > > The string read from migration_version attribute is defined by device 
> > > > > vendor
> > > > > driver and is completely opaque to the userspace.
> > > > > for a Intel vGPU, string format can be defined like
> > > > > "parent device PCI ID" + "version of gvt driver" + "mdev type" + 
> > > > > "aggregator count".
> > > > > 
> > > > > for an NVMe VF connecting to a remote storage. it could be
> > > > > "PCI ID" + "driver version" + "configured remote storage URL"
> > > > > 
> > > > > for a QAT VF, it may be
> > > > > "PCI ID" + "driver version" + "supported encryption set".
> > > > > 
> > > > > (to avoid namespace confliction from each vendor, we may prefix a 
> > > > > driver name to
> > > > > each migration_version string. e.g. 
> > > > > i915-v1-8086-591d-i915-GVTg_V5_8-1)  
> > > 
> > > It's very strange to define it as opaque and then proceed to describe
> > > the contents of that opaque string.  The point is that its contents
> > > are defined by the vendor driver to describe the device, driver version,
> > > and possibly metadata about the configuration of the device.  One
> > > instance of a device might generate a different string from another.
> > > The string that a device produces is not necessarily the only string
> > > the vendor driver will accept, for example the driver might support
> > > backwards compatible migrations.  
> > 
> > (As I've said in the previous discussion, off one of the patch series)
> > 
> > My view is it makes sense to have a half-way house on the opaqueness of
> > this string; I'd expect to have an ID and version that are human
> > readable, maybe a device ID/name that's human interpretable and then a
> > bunch of other cruft that maybe device/vendor/version specific.
> > 
> > I'm thinking that we want to be able to report problems and include the
> > string and the user to be able to easily identify the device that was
> > complaining and notice a difference in versions, and perhaps also use
> > it in compatibility patterns to find compatible hosts; but that does
> > get tricky when it's a 'ask the device if it's compatible'.
> 
> In the reply I just sent to Dan, I gave this example of what a
> "compatibility string" might look like represented as json:
> 
> {
>   "

Re: device compatibility interface for live migration with assigned devices

2020-07-15 Thread Alex Xu

Alex Williamson  于2020年7月15日周三 上午5:00写道：

> On Tue, 14 Jul 2020 18:19:46 +0100
> "Dr. David Alan Gilbert"  wrote:
>
> > * Alex Williamson (alex.william...@redhat.com) wrote:
> > > On Tue, 14 Jul 2020 11:21:29 +0100
> > > Daniel P. Berrangé  wrote:
> > >
> > > > On Tue, Jul 14, 2020 at 07:29:57AM +0800, Yan Zhao wrote:
> > > > > hi folks,
> > > > > we are defining a device migration compatibility interface that
> helps upper
> > > > > layer stack like openstack/ovirt/libvirt to check if two devices
> are
> > > > > live migration compatible.
> > > > > The "devices" here could be MDEVs, physical devices, or hybrid of
> the two.
> > > > > e.g. we could use it to check whether
> > > > > - a src MDEV can migrate to a target MDEV,
> > > > > - a src VF in SRIOV can migrate to a target VF in SRIOV,
> > > > > - a src MDEV can migration to a target VF in SRIOV.
> > > > >   (e.g. SIOV/SRIOV backward compatibility case)
> > > > >
> > > > > The upper layer stack could use this interface as the last step to
> check
> > > > > if one device is able to migrate to another device before
> triggering a real
> > > > > live migration procedure.
> > > > > we are not sure if this interface is of value or help to you.
> please don't
> > > > > hesitate to drop your valuable comments.
> > > > >
> > > > >
> > > > > (1) interface definition
> > > > > The interface is defined in below way:
> > > > >
> > > > >  __userspace
> > > > >   /\  \
> > > > >  / \write
> > > > > / read  \
> > > > >/__   ___\|/_
> > > > >   | migration_version | | migration_version |-->check migration
> > > > >   - -   compatibility
> > > > >  device Adevice B
> > > > >
> > > > >
> > > > > a device attribute named migration_version is defined under each
> device's
> > > > > sysfs node. e.g.
> (/sys/bus/pci/devices/\:00\:02.0/$mdev_UUID/migration_version).
> > > > > userspace tools read the migration_version as a string from the
> source device,
> > > > > and write it to the migration_version sysfs attribute in the
> target device.
> > > > >
> > > > > The userspace should treat ANY of below conditions as two devices
> not compatible:
> > > > > - any one of the two devices does not have a migration_version
> attribute
> > > > > - error when reading from migration_version attribute of one device
> > > > > - error when writing migration_version string of one device to
> > > > >   migration_version attribute of the other device
> > > > >
> > > > > The string read from migration_version attribute is defined by
> device vendor
> > > > > driver and is completely opaque to the userspace.
> > > > > for a Intel vGPU, string format can be defined like
> > > > > "parent device PCI ID" + "version of gvt driver" + "mdev type" +
> "aggregator count".
> > > > >
> > > > > for an NVMe VF connecting to a remote storage. it could be
> > > > > "PCI ID" + "driver version" + "configured remote storage URL"
> > > > >
> > > > > for a QAT VF, it may be
> > > > > "PCI ID" + "driver version" + "supported encryption set".
> > > > >
> > > > > (to avoid namespace confliction from each vendor, we may prefix a
> driver name to
> > > > > each migration_version string. e.g.
> i915-v1-8086-591d-i915-GVTg_V5_8-1)
> > >
> > > It's very strange to define it as opaque and then proceed to describe
> > > the contents of that opaque string.  The point is that its contents
> > > are defined by the vendor driver to describe the device, driver
> version,
> > > and possibly metadata about the configuration of the device.  One
> > > instance of a device might generate a different string from another.
> > > The string that a device produces is not necessarily the only string
> > > the vendor driver will accept, for example the driver might support
> > > backwards compatible migrations.
> >
> > (As I've said in the previous discussion, off one of the patch series)
> >
> > My view is it makes sense to have a half-way house on the opaqueness of
> > this string; I'd expect to have an ID and version that are human
> > readable, maybe a device ID/name that's human interpretable and then a
> > bunch of other cruft that maybe device/vendor/version specific.
> >
> > I'm thinking that we want to be able to report problems and include the
> > string and the user to be able to easily identify the device that was
> > complaining and notice a difference in versions, and perhaps also use
> > it in compatibility patterns to find compatible hosts; but that does
> > get tricky when it's a 'ask the device if it's compatible'.
>
> In the reply I just sent to Dan, I gave this example of what a
> "compatibility string" might look like represented as json:
>
> {
>   "device_api": "vfio-pci",
>   "vendor": "vendor-driver-name",
>   "version": {
> "major": 0,
> "minor": 1
>   },
>

The OpenStack Placement service doesn't support

Re: device compatibility interface for live migration with assigned devices

2020-07-15 Thread Alex Xu

Alex Williamson  于2020年7月15日周三 上午12:16写道：

> On Tue, 14 Jul 2020 11:21:29 +0100
> Daniel P. Berrangé  wrote:
>
> > On Tue, Jul 14, 2020 at 07:29:57AM +0800, Yan Zhao wrote:
> > > hi folks,
> > > we are defining a device migration compatibility interface that helps
> upper
> > > layer stack like openstack/ovirt/libvirt to check if two devices are
> > > live migration compatible.
> > > The "devices" here could be MDEVs, physical devices, or hybrid of the
> two.
> > > e.g. we could use it to check whether
> > > - a src MDEV can migrate to a target MDEV,
> > > - a src VF in SRIOV can migrate to a target VF in SRIOV,
> > > - a src MDEV can migration to a target VF in SRIOV.
> > >   (e.g. SIOV/SRIOV backward compatibility case)
> > >
> > > The upper layer stack could use this interface as the last step to
> check
> > > if one device is able to migrate to another device before triggering a
> real
> > > live migration procedure.
> > > we are not sure if this interface is of value or help to you. please
> don't
> > > hesitate to drop your valuable comments.
> > >
> > >
> > > (1) interface definition
> > > The interface is defined in below way:
> > >
> > >  __userspace
> > >   /\  \
> > >  / \write
> > > / read  \
> > >/__   ___\|/_
> > >   | migration_version | | migration_version |-->check migration
> > >   - -   compatibility
> > >  device Adevice B
> > >
> > >
> > > a device attribute named migration_version is defined under each
> device's
> > > sysfs node. e.g.
> (/sys/bus/pci/devices/\:00\:02.0/$mdev_UUID/migration_version).
> > > userspace tools read the migration_version as a string from the source
> device,
> > > and write it to the migration_version sysfs attribute in the target
> device.
> > >
> > > The userspace should treat ANY of below conditions as two devices not
> compatible:
> > > - any one of the two devices does not have a migration_version
> attribute
> > > - error when reading from migration_version attribute of one device
> > > - error when writing migration_version string of one device to
> > >   migration_version attribute of the other device
> > >
> > > The string read from migration_version attribute is defined by device
> vendor
> > > driver and is completely opaque to the userspace.
> > > for a Intel vGPU, string format can be defined like
> > > "parent device PCI ID" + "version of gvt driver" + "mdev type" +
> "aggregator count".

> >
> > > for an NVMe VF connecting to a remote storage. it could be
> > > "PCI ID" + "driver version" + "configured remote storage URL"
>

If the "configured remote storage URL" is something configuration setting
before the usage, then it isn't something we need for migration compatible
check. Openstack only needs to know the target device's driver and hardware
compatible for migration, then the scheduler will choose a host which such
device, and then Openstack will pre-configure the target host and target
device before the migration, then openstack will configure the correct
remote storage URL to the device. If we want, we can do a sanity check
after the live migration with the os.


> > >
> > > for a QAT VF, it may be
> > > "PCI ID" + "driver version" + "supported encryption set".
> > >
> > > (to avoid namespace confliction from each vendor, we may prefix a
> driver name to
> > > each migration_version string. e.g. i915-v1-8086-591d-i915-GVTg_V5_8-1)
>
> It's very strange to define it as opaque and then proceed to describe
> the contents of that opaque string.  The point is that its contents
> are defined by the vendor driver to describe the device, driver version,
> and possibly metadata about the configuration of the device.  One
> instance of a device might generate a different string from another.
> The string that a device produces is not necessarily the only string
> the vendor driver will accept, for example the driver might support
> backwards compatible migrations.
>
> > > (2) backgrounds
> > >
> > > The reason we hope the migration_version string is opaque to the
> userspace
> > > is that it is hard to generalize standard comparing fields and
> comparing
> > > methods for different devices from different vendors.
> > > Though userspace now could still do a simple string compare to check if
> > > two devices are compatible, and result should also be right, it's still
> > > too limited as it excludes the possible candidate whose
> migration_version
> > > string fails to be equal.
> > > e.g. an MDEV with mdev_type_1, aggregator count 3 is probably
> compatible
> > > with another MDEV with mdev_type_3, aggregator count 1, even their
> > > migration_version strings are not equal.
> > > (assumed mdev_type_3 is of 3 times equal resources of mdev_type_1).
> > >
> > > besides that, driver version + configured resources are all elements
> demanding
> > > t

87 matches

Mail list logo