Re: [PATCH 12/18] qemu: capabilities: Add QEMU_CAPS_BLOCKDEV_HOSTDEV_SCSI
On Thu, Jul 16, 2020 at 00:16:23 +0200, Ján Tomko wrote: > On a Friday in 2020, Peter Krempa wrote: > > We want to instantiate hostdevs via -blockdev too. Add a separate > > capability for them for a clean transition. The new capability will be > > enabled when QEMU_CAPS_BLOCKDEV is present once all code is prepared. > > > > What is the benefit here compared to using QEMU_CAPS_BLOCKDEV directly? It encodes whether the particular VM instance was already started with -blockdev used for hostdevs. The benefit ... or rather necessity is that we need to handle cases when the VM was started with -drive differently on hot-unplug of the hostdev. I'd have to encode that bit of information somehow regardless of which approach is used to actually store it. > Seems more like a libvirt capability than a QEMU capability. It is a capability of that particular instance of the VM the same way as others. A VM started with pre-blockdev libvirt but modern qemu will not use -blockdev for hotplug after libvirt upgrade and this is exactly te same situation. > Jano > > > Signed-off-by: Peter Krempa > > --- > > src/qemu/qemu_capabilities.c | 1 + > > src/qemu/qemu_capabilities.h | 1 + > > 2 files changed, 2 insertions(+)
Re: device compatibility interface for live migration with assigned devices
On 2020/7/14 上午7:29, Yan Zhao wrote: hi folks, we are defining a device migration compatibility interface that helps upper layer stack like openstack/ovirt/libvirt to check if two devices are live migration compatible. The "devices" here could be MDEVs, physical devices, or hybrid of the two. e.g. we could use it to check whether - a src MDEV can migrate to a target MDEV, - a src VF in SRIOV can migrate to a target VF in SRIOV, - a src MDEV can migration to a target VF in SRIOV. (e.g. SIOV/SRIOV backward compatibility case) The upper layer stack could use this interface as the last step to check if one device is able to migrate to another device before triggering a real live migration procedure. we are not sure if this interface is of value or help to you. please don't hesitate to drop your valuable comments. (1) interface definition The interface is defined in below way: __userspace /\ \ / \write / read \ /__ ___\|/_ | migration_version | | migration_version |-->check migration - - compatibility device Adevice B a device attribute named migration_version is defined under each device's sysfs node. e.g. (/sys/bus/pci/devices/\:00\:02.0/$mdev_UUID/migration_version). Are you aware of the devlink based device management interface that is proposed upstream? I think it has many advantages over sysfs, do you consider to switch to that? userspace tools read the migration_version as a string from the source device, and write it to the migration_version sysfs attribute in the target device. The userspace should treat ANY of below conditions as two devices not compatible: - any one of the two devices does not have a migration_version attribute - error when reading from migration_version attribute of one device - error when writing migration_version string of one device to migration_version attribute of the other device The string read from migration_version attribute is defined by device vendor driver and is completely opaque to the userspace. My understanding is that something opaque to userspace is not the philosophy of Linux. Instead of having a generic API but opaque value, why not do in a vendor specific way like: 1) exposing the device capability in a vendor specific way via sysfs/devlink or other API 2) management read capability in both src and dst and determine whether we can do the migration This is the way we plan to do with vDPA. Thanks for a Intel vGPU, string format can be defined like "parent device PCI ID" + "version of gvt driver" + "mdev type" + "aggregator count". for an NVMe VF connecting to a remote storage. it could be "PCI ID" + "driver version" + "configured remote storage URL" for a QAT VF, it may be "PCI ID" + "driver version" + "supported encryption set". (to avoid namespace confliction from each vendor, we may prefix a driver name to each migration_version string. e.g. i915-v1-8086-591d-i915-GVTg_V5_8-1) (2) backgrounds The reason we hope the migration_version string is opaque to the userspace is that it is hard to generalize standard comparing fields and comparing methods for different devices from different vendors. Though userspace now could still do a simple string compare to check if two devices are compatible, and result should also be right, it's still too limited as it excludes the possible candidate whose migration_version string fails to be equal. e.g. an MDEV with mdev_type_1, aggregator count 3 is probably compatible with another MDEV with mdev_type_3, aggregator count 1, even their migration_version strings are not equal. (assumed mdev_type_3 is of 3 times equal resources of mdev_type_1). besides that, driver version + configured resources are all elements demanding to take into account. So, we hope leaving the freedom to vendor driver and let it make the final decision in a simple reading from source side and writing for test in the target side way. we then think the device compatibility issues for live migration with assigned devices can be divided into two steps: a. management tools filter out possible migration target devices. Tags could be created according to info from product specification. we think openstack/ovirt may have vendor proprietary components to create those customized tags for each product from each vendor. e.g. for Intel vGPU, with a vGPU(a MDEV device) in source side, the tags to search target vGPU are like: a tag for compatible parent PCI IDs, a tag for a range of gvt driver versions, a tag for a range of mdev type + aggregator count for NVMe VF, the tags to search target VF may be like: a tag for compatible PCI IDs, a tag for a range of driver versions, a tag for URL of configured remote storage. b. with the output from st
Re: [RFC v2 1/1] memory: Delete assertion in memory_region_unregister_iommu_notifier
On 2020/7/16 上午9:00, Peter Xu wrote: On Mon, Jul 13, 2020 at 12:04:16PM +0800, Jason Wang wrote: On 2020/7/10 下午9:30, Peter Xu wrote: On Fri, Jul 10, 2020 at 02:34:11PM +0800, Jason Wang wrote: On 2020/7/9 下午10:10, Peter Xu wrote: On Thu, Jul 09, 2020 at 01:58:33PM +0800, Jason Wang wrote: - If we care the performance, it's better to implement the MAP event for vhost, otherwise it could be a lot of IOTLB miss I feel like these are two things. So far what we are talking about is whether vt-d should have knowledge about what kind of events one iommu notifier is interested in. I still think we should keep this as answered in question 1. The other question is whether we want to switch vhost from UNMAP to MAP/UNMAP events even without vDMA, so that vhost can establish the mapping even before IO starts. IMHO it's doable, but only if the guest runs DPDK workloads. When the guest is using dynamic iommu page mappings, I feel like that can be even slower, because then the worst case is for each IO we'll need to vmexit twice: - The first vmexit caused by an invalidation to MAP the page tables, so vhost will setup the page table before IO starts - IO/DMA triggers and completes - The second vmexit caused by another invalidation to UNMAP the page tables So it seems to be worse than when vhost only uses UNMAP like right now. At least we only have one vmexit (when UNMAP). We'll have a vhost translate() request from kernel to userspace, but IMHO that's cheaper than the vmexit. Right but then I would still prefer to have another notifier. Since vtd_page_walk has nothing to do with device IOTLB. IOMMU have a dedicated command for flushing device IOTLB. But the check for vtd_as_has_map_notifier is used to skip the device which can do demand paging via ATS or device specific way. If we have two different notifiers, vhost will be on the device iotlb notifier so we don't need it at all? But we can still have iommu notifier that only registers to UNMAP even after we introduce dev-iotlb notifier? We don't want to do page walk for them as well. TCG should be the only one so far, but I don't know.. maybe there can still be new ones? I think you're right. But looking at the codes, it looks like the check of vtd_as_has_map_notifier() was only used in: 1) vtd_iommu_replay() 2) vtd_iotlb_page_invalidate_notify() (PSI) For the replay, it's expensive anyhow. For PSI, I think it's just about one or few mappings, not sure it will have obvious performance impact. And I had two questions: 1) The codes doesn't check map for DSI or GI, does this match what spec said? (It looks to me the spec is unclear in this part) Both DSI/GI should cover maps too? E.g. vtd_sync_shadow_page_table() in vtd_iotlb_domain_invalidate(). I meant the code doesn't check whether there's an MAP notifier :) It's actually checked, because it loops over vtd_as_with_notifiers, and only MAP notifiers register to that. :) I may miss something but I don't find the code to block UNMAP notifiers? vhost_iommu_region_add() memory_region_register_iommu_notifier() memory_region_update_iommu_notify_flags() imrc->notify_flag_changed() vtd_iommu_notify_flag_changed() ? But I agree with you that it should be cleaner to introduce the dev-iotlb notifier type. Yes, it can solve the issues of duplicated UNMAP issue of vhost. 2) for the replay() I don't see other implementations (either spapr or generic one) that did unmap (actually they skip unmap explicitly), any reason for doing this in intel IOMMU? I could be wrong, but I'd guess it's because vt-d implemented the caching mode by leveraging the same invalidation strucuture, so it's harder to make all things right (IOW, we can't clearly identify MAP with UNMAP when we receive an invalidation request, because MAP/UNMAP requests look the same). I didn't check others, but I believe spapr is doing it differently by using some hypercalls to deliver IOMMU map/unmap requests, which seems a bit close to what virtio-iommu is doing. Anyway, the point is if we have explicit MAP/UNMAP from the guest, logically the replay indeed does not need to do any unmap because we don't need to call replay() on an already existing device but only for e.g. hot plug. But this looks conflict with what memory_region_iommu_replay( ) did, for IOMMU that doesn't have a replay method, it skips UNMAP request: for (addr = 0; addr < memory_region_size(mr); addr += granularity) { iotlb = imrc->translate(iommu_mr, addr, IOMMU_NONE, n->iommu_idx); if (iotlb.perm != IOMMU_NONE) { n->notify(n, &iotlb); } I guess there's no knowledge of whether guest have an explicit MAP/UMAP for this generic code. Or replay implies that guest doesn't have explicit MAP/UNMAP? I think it matches exactly with a hot plug case? Note that when IOMMU_NONE could also mean the translation does not exist. So it's actually trying to m
Re: [RFC v2 1/1] memory: Delete assertion in memory_region_unregister_iommu_notifier
On Mon, Jul 13, 2020 at 12:04:16PM +0800, Jason Wang wrote: > > On 2020/7/10 下午9:30, Peter Xu wrote: > > On Fri, Jul 10, 2020 at 02:34:11PM +0800, Jason Wang wrote: > > > On 2020/7/9 下午10:10, Peter Xu wrote: > > > > On Thu, Jul 09, 2020 at 01:58:33PM +0800, Jason Wang wrote: > > > > > > > - If we care the performance, it's better to implement the MAP > > > > > > > event for > > > > > > > vhost, otherwise it could be a lot of IOTLB miss > > > > > > I feel like these are two things. > > > > > > > > > > > > So far what we are talking about is whether vt-d should have > > > > > > knowledge about > > > > > > what kind of events one iommu notifier is interested in. I still > > > > > > think we > > > > > > should keep this as answered in question 1. > > > > > > > > > > > > The other question is whether we want to switch vhost from UNMAP to > > > > > > MAP/UNMAP > > > > > > events even without vDMA, so that vhost can establish the mapping > > > > > > even before > > > > > > IO starts. IMHO it's doable, but only if the guest runs DPDK > > > > > > workloads. When > > > > > > the guest is using dynamic iommu page mappings, I feel like that > > > > > > can be even > > > > > > slower, because then the worst case is for each IO we'll need to > > > > > > vmexit twice: > > > > > > > > > > > > - The first vmexit caused by an invalidation to MAP the page > > > > > > tables, so vhost > > > > > >will setup the page table before IO starts > > > > > > > > > > > > - IO/DMA triggers and completes > > > > > > > > > > > > - The second vmexit caused by another invalidation to UNMAP > > > > > > the page tables > > > > > > > > > > > > So it seems to be worse than when vhost only uses UNMAP like right > > > > > > now. At > > > > > > least we only have one vmexit (when UNMAP). We'll have a vhost > > > > > > translate() > > > > > > request from kernel to userspace, but IMHO that's cheaper than the > > > > > > vmexit. > > > > > Right but then I would still prefer to have another notifier. > > > > > > > > > > Since vtd_page_walk has nothing to do with device IOTLB. IOMMU have a > > > > > dedicated command for flushing device IOTLB. But the check for > > > > > vtd_as_has_map_notifier is used to skip the device which can do demand > > > > > paging via ATS or device specific way. If we have two different > > > > > notifiers, > > > > > vhost will be on the device iotlb notifier so we don't need it at all? > > > > But we can still have iommu notifier that only registers to UNMAP even > > > > after we > > > > introduce dev-iotlb notifier? We don't want to do page walk for them > > > > as well. > > > > TCG should be the only one so far, but I don't know.. maybe there can > > > > still be > > > > new ones? > > > > > > I think you're right. But looking at the codes, it looks like the check of > > > vtd_as_has_map_notifier() was only used in: > > > > > > 1) vtd_iommu_replay() > > > 2) vtd_iotlb_page_invalidate_notify() (PSI) > > > > > > For the replay, it's expensive anyhow. For PSI, I think it's just about > > > one > > > or few mappings, not sure it will have obvious performance impact. > > > > > > And I had two questions: > > > > > > 1) The codes doesn't check map for DSI or GI, does this match what spec > > > said? (It looks to me the spec is unclear in this part) > > Both DSI/GI should cover maps too? E.g. vtd_sync_shadow_page_table() in > > vtd_iotlb_domain_invalidate(). > > > I meant the code doesn't check whether there's an MAP notifier :) It's actually checked, because it loops over vtd_as_with_notifiers, and only MAP notifiers register to that. :) But I agree with you that it should be cleaner to introduce the dev-iotlb notifier type. > > > > > > > 2) for the replay() I don't see other implementations (either spapr or > > > generic one) that did unmap (actually they skip unmap explicitly), any > > > reason for doing this in intel IOMMU? > > I could be wrong, but I'd guess it's because vt-d implemented the caching > > mode > > by leveraging the same invalidation strucuture, so it's harder to make all > > things right (IOW, we can't clearly identify MAP with UNMAP when we receive > > an > > invalidation request, because MAP/UNMAP requests look the same). > > > > I didn't check others, but I believe spapr is doing it differently by using > > some hypercalls to deliver IOMMU map/unmap requests, which seems a bit > > close to > > what virtio-iommu is doing. Anyway, the point is if we have explicit > > MAP/UNMAP > > from the guest, logically the replay indeed does not need to do any unmap > > because we don't need to call replay() on an already existing device but > > only > > for e.g. hot plug. > > > But this looks conflict with what memory_region_iommu_replay( ) did, for > IOMMU that doesn't have a replay method, it skips UNMAP request: > > for (addr = 0; addr < memory_region_size(mr); addr += granularity) { > iotlb = imrc->translate(iommu_mr, addr, IOMM
Re: [PATCH 03/11] virDomainHostdevDefFormatSubsys: Split out formatting of PCI subsystem
On a Tuesday in 2020, Peter Krempa wrote: Similarly to previous commit split out formatting of the PCI subsystem related stuff. Signed-off-by: Peter Krempa --- src/conf/domain_conf.c | 74 ++ 1 file changed, 46 insertions(+), 28 deletions(-) Reviewed-by: Ján Tomko Jano signature.asc Description: PGP signature
Re: [PATCH 02/11] virDomainHostdevDefFormatSubsys: Split out formatting of USB subsystem
On a Tuesday in 2020, Peter Krempa wrote: Separate out bits related to USB so that the logic isn't entangled in multiple conditional statements. Signed-off-by: Peter Krempa --- src/conf/domain_conf.c | 73 +- 1 file changed, 51 insertions(+), 22 deletions(-) Reviewed-by: Ján Tomko Jano signature.asc Description: PGP signature
Re: [PATCH 01/11] virDomainHostdevDefFormatSubsys: Use virXMLFormatElement
On a Tuesday in 2020, Peter Krempa wrote: Refactor the formatter to the new multiple buffer based approach so that we can easily separate it into formatters per subsys type. Signed-off-by: Peter Krempa --- src/conf/domain_conf.c | 65 ++ 1 file changed, 27 insertions(+), 38 deletions(-) Reviewed-by: Ján Tomko Jano signature.asc Description: PGP signature
Re: [PATCH 17/18] qemu: caps: Enable QEMU_CAPS_BLOCKDEV_HOSTDEV_SCSI
On a Friday in 2020, Peter Krempa wrote: Enable it when regular QEMU_CAPS_BLOCKDEV is present. Signed-off-by: Peter Krempa --- src/qemu/qemu_capabilities.c | 3 ++ .../caps_4.2.0.aarch64.xml| 1 + .../qemucapabilitiesdata/caps_4.2.0.s390x.xml | 1 + .../caps_4.2.0.x86_64.xml | 1 + .../caps_5.0.0.aarch64.xml| 1 + .../qemucapabilitiesdata/caps_5.0.0.ppc64.xml | 1 + .../caps_5.0.0.riscv64.xml| 1 + .../caps_5.0.0.x86_64.xml | 1 + .../caps_5.1.0.x86_64.xml | 1 + .../hostdev-scsi-lsi.x86_64-latest.args | 52 +++ ...ostdev-scsi-virtio-scsi.x86_64-latest.args | 46 11 files changed, 65 insertions(+), 44 deletions(-) Reviewed-by: Ján Tomko Jano signature.asc Description: PGP signature
Re: [PATCH 12/18] qemu: capabilities: Add QEMU_CAPS_BLOCKDEV_HOSTDEV_SCSI
On a Friday in 2020, Peter Krempa wrote: We want to instantiate hostdevs via -blockdev too. Add a separate capability for them for a clean transition. The new capability will be enabled when QEMU_CAPS_BLOCKDEV is present once all code is prepared. What is the benefit here compared to using QEMU_CAPS_BLOCKDEV directly? Seems more like a libvirt capability than a QEMU capability. Jano Signed-off-by: Peter Krempa --- src/qemu/qemu_capabilities.c | 1 + src/qemu/qemu_capabilities.h | 1 + 2 files changed, 2 insertions(+) signature.asc Description: PGP signature
Re: [PATCH 18/18] qemuBuildSCSIHostdevDrvStr: unexport
On a Friday in 2020, Peter Krempa wrote: The function is no longer called from other modules. Signed-off-by: Peter Krempa --- src/qemu/qemu_command.c | 2 +- src/qemu/qemu_command.h | 3 --- 2 files changed, 1 insertion(+), 4 deletions(-) Reviewed-by: Ján Tomko Jano signature.asc Description: PGP signature
Re: [PATCH 16/18] qemuDomainRemoveHostDevice: Use new infrastructure for (i)SCSI
On a Friday in 2020, Peter Krempa wrote: Similarly to previous commits, modify the hostdev detach code to use blockdev infrastructure to detach (i)SCSI hostdevs. Signed-off-by: Peter Krempa --- src/qemu/qemu_hotplug.c | 20 +++- 1 file changed, 3 insertions(+), 17 deletions(-) Reviewed-by: Ján Tomko Jano signature.asc Description: PGP signature
Re: [PATCH 15/18] qemuDomainAttachHostSCSIDevice: Use new infrastructure
On a Friday in 2020, Peter Krempa wrote: Similarly to command line creation, use the blockdev helpers when hotplugging an (i)SCSI hostdev. Signed-off-by: Peter Krempa --- src/qemu/qemu_hotplug.c | 45 +++-- 1 file changed, 7 insertions(+), 38 deletions(-) Reviewed-by: Ján Tomko Jano signature.asc Description: PGP signature
Re: [PATCH 14/18] qemuBuildHostdevSCSICommandLine: Use new infrastructure
On a Friday in 2020, Peter Krempa wrote: In preparation for instantiating (i)SCSI hostdevs via -blockdev, refactor qemuBuildHostdevSCSICommandLine to use the new infrastructure which will do it automatically. Signed-off-by: Peter Krempa --- src/qemu/qemu_command.c | 44 - 1 file changed, 4 insertions(+), 40 deletions(-) Reviewed-by: Ján Tomko Jano signature.asc Description: PGP signature
Re: [PATCH 13/18] qemu: command: Create qemuBlockStorageSourceAttachData for (i)SCSI hostdevs
On a Friday in 2020, Peter Krempa wrote: Add convertor for creating qemuBlockStorageSourceAttachData which will allow reusing the infrastructure which we have for attaching disks also for hostdevs. Signed-off-by: Peter Krempa --- src/qemu/qemu_command.c | 83 + src/qemu/qemu_command.h | 8 2 files changed, 91 insertions(+) Reviewed-by: Ján Tomko Jano signature.asc Description: PGP signature
Re: [PATCH 11/18] qemuBuildSCSIHostdevDevStr: Pass in backend alias
On a Friday in 2020, Peter Krempa wrote: Don't (re)generate the backend alias (alias of the -drive backend for now) internally but rather pass it in. Later on it will be replaced by the nodename when blockdev is used depending on the capabilities. Signed-off-by: Peter Krempa --- src/qemu/qemu_command.c | 14 -- src/qemu/qemu_command.h | 4 +++- src/qemu/qemu_hotplug.c | 2 +- 3 files changed, 12 insertions(+), 8 deletions(-) Reviewed-by: Ján Tomko Jano signature.asc Description: PGP signature
Re: [PATCH 10/18] qemuBuildHostdevCommandLine: Extract (i)SCSI code
On a Friday in 2020, Peter Krempa wrote: Move all (i)SCSI related code into a new function named 'qemuBuildHostdevSCSICommandLine'. Signed-off-by: Peter Krempa --- src/qemu/qemu_command.c | 61 + 1 file changed, 38 insertions(+), 23 deletions(-) Reviewed-by: Ján Tomko Jano signature.asc Description: PGP signature
Re: [PATCH 08/18] qemu: domain: Regenerate hostdev source private data
On a Friday in 2020, Peter Krempa wrote: When upgrading from a libvirt which didn't format private data of a virStorageSource representing an iSCSI hostdev source, we might need to generate some internal data so that the code still works as if it was present in the status XML. Signed-off-by: Peter Krempa --- src/qemu/qemu_domain.c| 54 ++- .../disk-secinfo-upgrade-in.xml | 20 +++ .../disk-secinfo-upgrade-out.xml | 30 +++ 3 files changed, 102 insertions(+), 2 deletions(-) Reviewed-by: Ján Tomko Jano signature.asc Description: PGP signature
Re: [PATCH 09/18] qemu: hotplug: Don't regenerate iSCSI secret alias
On a Friday in 2020, Peter Krempa wrote: We now store the alias of the secrets in the status XML so there's no need to generate it again. Signed-off-by: Peter Krempa --- src/qemu/qemu_hotplug.c | 19 +++ 1 file changed, 7 insertions(+), 12 deletions(-) Reviewed-by: Ján Tomko Jano signature.asc Description: PGP signature
Re: [PATCH 07/18] qemuDomainSecretHostdevDestroy: Don't clear secinfo alias
On a Friday in 2020, Peter Krempa wrote: We need the alias to deal with hot-unplug of the hostdev. Use qemuDomainSecretInfoDestroy which clears only the secrets and not the alias. The same function is used also for handling disk secrets. Signed-off-by: Peter Krempa --- src/qemu/qemu_domain.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) Reviewed-by: Ján Tomko Jano signature.asc Description: PGP signature
Re: [PATCH 06/18] qemustatusxml2xmltest: Add tests for iSCSI hostdev private data handling
On a Friday in 2020, Peter Krempa wrote: Signed-off-by: Peter Krempa --- tests/qemustatusxml2xmldata/modern-in.xml | 18 ++ 1 file changed, 18 insertions(+) Reviewed-by: Ján Tomko Jano signature.asc Description: PGP signature
Re: [PATCH 05/18] virDomainHostdevSubsysSCSIiSCSIDefParseXML: Parse private data of virStorageSource
On a Friday in 2020, Peter Krempa wrote: We store the config of an iSCSI hostdev in a virStorageSource structure. Parse the private data portion. Signed-off-by: Peter Krempa --- src/conf/domain_conf.c | 39 +-- 1 file changed, 29 insertions(+), 10 deletions(-) diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index bda9375f13..ceaf73772d 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -8283,7 +8283,9 @@ virDomainHostdevSubsysSCSIHostDefParseXML(xmlNodePtr sourcenode, static int virDomainHostdevSubsysSCSIiSCSIDefParseXML(xmlNodePtr sourcenode, virDomainHostdevSubsysSCSIPtr def, - xmlXPathContextPtr ctxt) + xmlXPathContextPtr ctxt, + unsigned int flags, + virDomainXMLOptionPtr xmlopt) { int auth_secret_usage = -1; xmlNodePtr cur; @@ -8348,13 +8350,27 @@ virDomainHostdevSubsysSCSIiSCSIDefParseXML(xmlNodePtr sourcenode, } cur = cur->next; } + +if ((flags & VIR_DOMAIN_DEF_PARSE_STATUS) && Extra parentheses. +xmlopt && xmlopt->privateData.storageParse) { +VIR_XPATH_NODE_AUTORESTORE(ctxt); + +ctxt->node = sourcenode; + +if ((ctxt->node = virXPathNode("./privateData", ctxt)) && +xmlopt->privateData.storageParse(ctxt, iscsisrc->src) < 0) +return -1; +} + return 0; } Reviewed-by: Ján Tomko Jano signature.asc Description: PGP signature
Re: [PATCH 04/18] virDomainHostdevDefFormatSubsys: Format private data for a virStorageSource
On a Friday in 2020, Peter Krempa wrote: iSCSI subsystem hostdevs store the data as a virStorageSource. Format the private data part of the virStorageSource in the appropriate place. Signed-off-by: Peter Krempa --- src/conf/domain_conf.c | 32 1 file changed, 20 insertions(+), 12 deletions(-) Reviewed-by: Ján Tomko Jano signature.asc Description: PGP signature
Re: [PATCH 02/18] qemuBlockStorageSourceGetBackendProps: Allow skipping "discard":"unmap"
On a Friday in 2020, Peter Krempa wrote: It doesn't make sense to format "discard" when doing a -blockdev backend of scsi-generic used with SCSI hostdevs. Add a way to skip it. Signed-off-by: Peter Krempa --- src/qemu/qemu_block.c | 8 src/qemu/qemu_block.h | 1 + 2 files changed, 9 insertions(+) diff --git a/src/qemu/qemu_block.c b/src/qemu/qemu_block.c index 10ddf53b3b..bdd07d9925 100644 --- a/src/qemu/qemu_block.c +++ b/src/qemu/qemu_block.c @@ -1061,6 +1061,9 @@ qemuBlockStorageSourceGetBlockdevGetCacheProps(virStorageSourcePtr src, * omit any data which does not identify the image itself * QEMU_BLOCK_STORAGE_SOURCE_BACKEND_PROPS_AUTO_READONLY: * use the auto-read-only feature of qemu + * QEMU_BLOCK_STORAGE_SOURCE_BACKEND_PROPS_SKIP_UNMAP: + * don't enable 'discard:unmap' option for passing through dicards discards + * (note that this is disabled also for _LEGACY and _TARGET_ONLY options) * So outside of tests, this is (currently) used by one out of three callers. Would it make sense to negate the option? * Creates a JSON object describing the underlying storage or protocol of a * storage source. Returns NULL on error and reports an appropriate error message. Reviewed-by: Ján Tomko Jano signature.asc Description: PGP signature
Re: [PATCH 03/18] qemuBlockStorageSourceAttachData: Add field for ad-hoc storage node name
On a Friday in 2020, Peter Krempa wrote: SCSI hostdevs don't have a virStorageSource associated with the backend in certain cases. Adding a separate field to hold memory for a copy of the nodename of the storage backend will allow reusing the blockdev machinery also for SCSI hostdevs. Signed-off-by: Peter Krempa --- src/qemu/qemu_block.c | 1 + src/qemu/qemu_block.h | 1 + 2 files changed, 2 insertions(+) Reviewed-by: Ján Tomko Jano signature.asc Description: PGP signature
Re: [PATCH 01/18] qemuBlockStorageSourceGetBackendProps: Convert boolean arguments to flags
On a Friday in 2020, Peter Krempa wrote: Upcoming commit will need to add another flag for the function so convert it to a bitwise-or'd array of flags to prevent having 4 booleans. false true false true false true false false false true true false true false false false false true true false false false false true false true true false true true true false false true true false true false true true false false true false false false false false false true true true true false false true false true true false true true true true false true true true false true false true false false true false false false false true Signed-off-by: Peter Krempa --- src/qemu/qemu_block.c | 32 +--- src/qemu/qemu_block.h | 10 +++--- src/qemu/qemu_command.c | 3 ++- tests/qemublocktest.c | 18 -- 4 files changed, 42 insertions(+), 21 deletions(-) diff --git a/src/qemu/qemu_block.c b/src/qemu/qemu_block.c index a2eabbcd64..10ddf53b3b 100644 --- a/src/qemu/qemu_block.c +++ b/src/qemu/qemu_block.c @@ -1052,26 +1052,32 @@ qemuBlockStorageSourceGetBlockdevGetCacheProps(virStorageSourcePtr src, /** * qemuBlockStorageSourceGetBackendProps: * @src: disk source - * @legacy: use legacy formatting of attributes (for -drive / old qemus) - * @onlytarget: omit any data which does not identify the image itself - * @autoreadonly: use the auto-read-only feature of qemu + * @flags: bitwise-or of qemuBlockStorageSourceBackendPropsFlags + * + * Flags: + * QEMU_BLOCK_STORAGE_SOURCE_BACKEND_PROPS_LEGACY: This flag has three spaces in front of it. + * use legacy formatting of attributes (for -drive / old qemus) + * QEMU_BLOCK_STORAGE_SOURCE_BACKEND_PROPS_TARGET_ONLY: + * omit any data which does not identify the image itself + * QEMU_BLOCK_STORAGE_SOURCE_BACKEND_PROPS_AUTO_READONLY: + * use the auto-read-only feature of qemu * Reviewed-by: Ján Tomko Jano signature.asc Description: PGP signature
Re: [PATCH 1/3] virNetSocketCheckProtocols: Separate out checking family via getaddrinfo()
On a Wednesday in 2020, Michal Privoznik wrote: The virNetSocketCheckProtocols() function is supposed to tell caller whether IPv4 and/or IPv6 is supported on the system. In the initial round, it uses getifaddrs() to see if an interface has IPv4/IPv6 address assigned and then to double check IPv6 it uses getaddrinfo() to lookup IPv6 loopback address. Separate out this latter code because it is going to be reused. Since the original code lived under an #ifdef and the new function doesn't it is marked as unused - because on some systems it may be so. Signed-off-by: Michal Privoznik --- src/rpc/virnetsocket.c | 63 ++ 1 file changed, 39 insertions(+), 24 deletions(-) diff --git a/src/rpc/virnetsocket.c b/src/rpc/virnetsocket.c index d1f4c531aa..b6bc3edc3b 100644 --- a/src/rpc/virnetsocket.c +++ b/src/rpc/virnetsocket.c @@ -141,16 +141,48 @@ static int virNetSocketForkDaemon(const char *binary) } #endif + +static int G_GNUC_UNUSED +virNetSocketCheckProtocolByLookup(const char *address, + int family, + bool *hasFamily) +{ +struct addrinfo hints; +struct addrinfo *ai = NULL; +int gaierr; + +memset(&hints, 0, sizeof(hints)); +hints.ai_family = family; +hints.ai_flags = AI_PASSIVE | AI_ADDRCONFIG; +hints.ai_socktype = SOCK_STREAM; + +if ((gaierr = getaddrinfo(address, NULL, &hints, &ai)) != 0) { +*hasFamily = false; + +if (gaierr == EAI_FAMILY || +#ifdef EAI_ADDRFAMILY +gaierr == EAI_ADDRFAMILY || +#endif +gaierr == EAI_NONAME) { +} else { +virReportError(VIR_ERR_INTERNAL_ERROR, + _("Cannot resolve ::1 address: %s"), s/::1/'%s'/ + gai_strerror(gaierr)); +return -1; +} +} else { +*hasFamily = true; +} Jano signature.asc Description: PGP signature
Re: [PATCH 0/3] Fix virnetsocket failure in IPv4 disabled environment
On a Wednesday in 2020, Michal Privoznik wrote: This imperfection was reported by Andrea here: https://www.redhat.com/archives/libvir-list/2020-July/msg00753.html Michal Prívozník (3): virNetSocketCheckProtocols: Separate out checking family via getaddrinfo() virNetSocketCheckProtocols: lookup IPv6 only if suspecting IPv6 virNetSocketCheckProtocols: Confirm IPv4 by lookup too src/rpc/virnetsocket.c | 67 +++--- 1 file changed, 43 insertions(+), 24 deletions(-) Reviewed-by: Ján Tomko Jano signature.asc Description: PGP signature
[libvirt PATCH] network: split out networkSetIPv6Sysctl
Refactor networkSetIPv6Sysctls to remove repetition and reuse of the 'field' variable. Signed-off-by: Ján Tomko --- src/network/bridge_driver.c | 71 ++--- 1 file changed, 35 insertions(+), 36 deletions(-) diff --git a/src/network/bridge_driver.c b/src/network/bridge_driver.c index 713763130b..d817b5cbd6 100644 --- a/src/network/bridge_driver.c +++ b/src/network/bridge_driver.c @@ -2249,37 +2249,51 @@ networkEnableIPForwarding(bool enableIPv4, } +static int +networkSetIPv6Sysctl(const char *bridge, + const char *sysctl_field, + const char *sysctl_setting, + bool ignoreMissing) +{ +g_autofree char *field = g_strdup_printf(SYSCTL_PATH "/net/ipv6/conf/%s/%s", + bridge, sysctl_field); + +if (ignoreMissing && access(field, W_OK) < 0 && errno == ENOENT) +return -2; + +if (virFileWriteStr(field, sysctl_setting, 0) < 0) { +virReportSystemError(errno, + _("cannot write to '%s' on bridge '%s'"), + field, bridge); +return -1; +} + +return 0; +} + + static int networkSetIPv6Sysctls(virNetworkObjPtr obj) { virNetworkDefPtr def = virNetworkObjGetDef(obj); -char *field = NULL; -int ret = -1; bool enableIPv6 = !!virNetworkDefGetIPByIndex(def, AF_INET6, 0); +int rc; /* set disable_ipv6 if there are no ipv6 addresses defined for the * network. But also unset it if there *are* ipv6 addresses, as we * can't be sure of its default value. */ -field = g_strdup_printf(SYSCTL_PATH "/net/ipv6/conf/%s/disable_ipv6", -def->bridge); - -if (access(field, W_OK) < 0 && errno == ENOENT) { +rc = networkSetIPv6Sysctl(def->bridge, "disable_ipv6", + enableIPv6 ? "0" : "1", true); +if (rc == -2) { if (!enableIPv6) VIR_DEBUG("ipv6 appears to already be disabled on %s", def->bridge); -ret = 0; -goto cleanup; +return 0; +} else if (rc < 0) { +return -1; } -if (virFileWriteStr(field, enableIPv6 ? "0" : "1", 0) < 0) { -virReportSystemError(errno, - _("cannot write to %s to enable/disable IPv6 " - "on bridge %s"), field, def->bridge); -goto cleanup; -} -VIR_FREE(field); - /* The rest of the ipv6 sysctl tunables should always be set the * same, whether or not we're using ipv6 on this bridge. */ @@ -2287,31 +2301,16 @@ networkSetIPv6Sysctls(virNetworkObjPtr obj) /* Prevent guests from hijacking the host network by sending out * their own router advertisements. */ -field = g_strdup_printf(SYSCTL_PATH "/net/ipv6/conf/%s/accept_ra", -def->bridge); - -if (virFileWriteStr(field, "0", 0) < 0) { -virReportSystemError(errno, - _("cannot disable %s"), field); -goto cleanup; -} -VIR_FREE(field); +if (networkSetIPv6Sysctl(def->bridge, "accept_ra", "0", false) < 0) +return -1; /* All interfaces used as a gateway (which is what this is, by * definition), must always have autoconf=0. */ -field = g_strdup_printf(SYSCTL_PATH "/net/ipv6/conf/%s/autoconf", def->bridge); +if (networkSetIPv6Sysctl(def->bridge, "autoconf", "0", false) < 0) +return -1; -if (virFileWriteStr(field, "0", 0) < 0) { -virReportSystemError(errno, - _("cannot disable %s"), field); -goto cleanup; -} - -ret = 0; - cleanup: -VIR_FREE(field); -return ret; +return 0; } -- 2.25.4
Re: [libvirt PATCH v2 05/15] network: use g_auto wherever appropriate
On 7/15/20 11:10 AM, Ján Tomko wrote: On a Tuesday in 2020, Laine Stump wrote: This includes standard g_autofree() as well as other objects that have a cleanup function defined to use via g_autoptr (virCommand, virJSONValue) Signed-off-by: Laine Stump --- src/network/bridge_driver.c | 206 ++ src/network/bridge_driver_linux.c | 7 +- src/network/leaseshelper.c | 16 +-- 3 files changed, 76 insertions(+), 153 deletions(-) diff --git a/src/network/bridge_driver.c b/src/network/bridge_driver.c index ab359acdb5..31bd0dd92c 100644 --- a/src/network/bridge_driver.c +++ b/src/network/bridge_driver.c [...] @@ -1095,7 +1081,6 @@ networkDnsmasqConfContents(virNetworkObjPtr obj, bool wantDNS = dns->enable != VIR_TRISTATE_BOOL_NO; virNetworkIPDefPtr tmpipdef, ipdef, ipv4def, ipv6def; bool ipv6SLAAC; - char *saddr = NULL, *eaddr = NULL; *configstr = NULL; [...] @@ -1414,6 +1396,8 @@ networkDnsmasqConfContents(virNetworkObjPtr obj, int thisRange; virNetworkDHCPRangeDef range = ipdef->ranges[r]; g_autofree char *leasetime = NULL; + g_autofree char *saddr = NULL; + g_autofree char *eaddr = NULL; 300 lines below the original location. Long function is long. :) At least there were no unrelated changes in be... oh, wait. Nevermind. A long time ago (1988) in a galaxy far far away (Lake City, Minnesota) I worked with a guy who told me that any function that wouldn't fit on a single screen was too long and needed to be broken up (this was in the 80x25 ASCII terminal days). He would probably rip out his moustache and scream if he saw some of the functions in libvirt. if (!(saddr = virSocketAddrFormat(&range.addr.start)) || !(eaddr = virSocketAddrFormat(&range.addr.end))) [...] @@ -2248,7 +2206,7 @@ static int networkSetIPv6Sysctls(virNetworkObjPtr obj) { virNetworkDefPtr def = virNetworkObjGetDef(obj); - char *field = NULL; + g_autofree char *field = NULL; Last time I tried manually freeing an autofree'd variable, I was told not to do that O:-) Yeah, there's a few places where we re-use a pointer for "temporary" strings. In a different patch I sent a few weeks ago, I fixed it by just declaring multiple separate autofree variables, one for each usage, and I think that is the least future-bug-prone method of dealing with it. (I hadn't seen anyone scolding about not manually freeing and autofree'd variable, but since doing so made me uneasy anyway, I'm happy to jump on the bandwagon :-) The clean way here seems to be refactoring the function. I can put that somewhere into the depths of my TODO list. If you really want to, you can. Otherwise I can send a patch for that to be pushed along with this series, just so that I won't have the icky feeling of leaving a job not quite done. int ret = -1; bool enableIPv6 = !!virNetworkDefGetIPByIndex(def, AF_INET6, 0); @@ -2273,7 +2231,6 @@ networkSetIPv6Sysctls(virNetworkObjPtr obj) "on bridge %s"), field, def->bridge); goto cleanup; } - VIR_FREE(field); /* The rest of the ipv6 sysctl tunables should always be set the * same, whether or not we're using ipv6 on this bridge. @@ -2282,6 +2239,7 @@ networkSetIPv6Sysctls(virNetworkObjPtr obj) /* Prevent guests from hijacking the host network by sending out * their own router advertisements. */ + VIR_FREE(field); field = g_strdup_printf(SYSCTL_PATH "/net/ipv6/conf/%s/accept_ra", def->bridge); @@ -2290,11 +2248,11 @@ networkSetIPv6Sysctls(virNetworkObjPtr obj) _("cannot disable %s"), field); goto cleanup; } - VIR_FREE(field); /* All interfaces used as a gateway (which is what this is, by * definition), must always have autoconf=0. */ + VIR_FREE(field); field = g_strdup_printf(SYSCTL_PATH "/net/ipv6/conf/%s/autoconf", def->bridge); if (virFileWriteStr(field, "0", 0) < 0) { @@ -2305,7 +2263,6 @@ networkSetIPv6Sysctls(virNetworkObjPtr obj) ret = 0; cleanup: - VIR_FREE(field); return ret; } [...] @@ -3276,8 +3221,6 @@ networkFindUnusedBridgeName(virNetworkObjListPtr nets, MAX_BRIDGE_ID); ret = 0; So this function returned 0 even on failure. Introduced by a28d3e485f01d16320af15780bc935f54782a45d cleanup: - if (ret < 0) - VIR_FREE(newname); return ret; } Without the networkSetIPv6Sysctls changes: Reviewed-by: Ján Tomko Jano
Re: [libvirt-jenkins-ci PATCH 7/7] lcitool: Create and expose ccache wrappers
On Wed, 2020-07-15 at 16:57 +0100, Daniel P. Berrangé wrote: > On Fri, Mar 27, 2020 at 08:34:59PM +0100, Andrea Bolognani wrote: > > +commands.extend([ > > +"mkdir -p /usr/local/share/ccache-wrappers", > > +]) > > + > > +if cross_arch: > > +commands.extend([ > > +"ln -s {ccache} > > /usr/local/share/ccache-wrappers/{cross_abi}-cc", > > +"ln -s {ccache} > > /usr/local/share/ccache-wrappers/{cross_abi}-$(basename {cc})", > > +]) > > +else: > > +commands.extend([ > > +"ln -s {ccache} /usr/local/share/ccache-wrappers/cc", > > +"ln -s {ccache} > > /usr/local/share/ccache-wrappers/$(basename {cc})", > > +]) > > + > > script = "\nRUN " + (" && \\\n".join(commands)) + "\n" > > sys.stdout.write(script.format(**varmap)) > > I've just realized that this addition has prevented the caching and > reuse the base layer across the cross images. The first "RUN" cmmand > was supposed to have stuff that is common across cross images, but > we've accidentally included the "cross_abi" in the path for ccache > here. > > We need to put the ccache setup in the second RUN command, for cross > containers. Only native images can have it in the first RUN command. Yeah, I realized that a while ago but always forgot to send out a patch :) Do you want to take a stab at it? -- Andrea Bolognani / Red Hat / Virtualization
Re: [libvirt-jenkins-ci PATCH 7/7] lcitool: Create and expose ccache wrappers
On Fri, Mar 27, 2020 at 08:34:59PM +0100, Andrea Bolognani wrote: > VM-based builds have used ccache by default for a very long time, > and now container-based builds will too. > > Signed-off-by: Andrea Bolognani > --- > guests/lcitool | 18 ++ > 1 file changed, 18 insertions(+) > > diff --git a/guests/lcitool b/guests/lcitool > index 117e1a5..011fc07 100755 > --- a/guests/lcitool > +++ b/guests/lcitool > @@ -651,6 +651,8 @@ class Application: > varmap = self._dockerfile_build_varmap_rpm(facts, mappings, > pip_mappings, projects, cross_arch) > > varmap["package_manager"] = facts["package_manager"] > +varmap["cc"] = facts["cc"] > +varmap["ccache"] = facts["ccache"] > varmap["make"] = facts["make"] > varmap["ninja"] = facts["ninja"] > varmap["python"] = facts["python"] > @@ -864,6 +866,21 @@ class Application: > "{package_manager} clean all -y", > ]) > > +commands.extend([ > +"mkdir -p /usr/local/share/ccache-wrappers", > +]) > + > +if cross_arch: > +commands.extend([ > +"ln -s {ccache} > /usr/local/share/ccache-wrappers/{cross_abi}-cc", > +"ln -s {ccache} > /usr/local/share/ccache-wrappers/{cross_abi}-$(basename {cc})", > +]) > +else: > +commands.extend([ > +"ln -s {ccache} /usr/local/share/ccache-wrappers/cc", > +"ln -s {ccache} /usr/local/share/ccache-wrappers/$(basename > {cc})", > +]) > + > script = "\nRUN " + (" && \\\n".join(commands)) + "\n" > sys.stdout.write(script.format(**varmap)) I've just realized that this addition has prevented the caching and reuse the base layer across the cross images. The first "RUN" cmmand was supposed to have stuff that is common across cross images, but we've accidentally included the "cross_abi" in the path for ccache here. We need to put the ccache setup in the second RUN command, for cross containers. Only native images can have it in the first RUN command. Regards, Daniel -- |: https://berrange.com -o-https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o-https://fstop138.berrange.com :| |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|
Re: [libvirt PATCH v2 15/15] nwfilter: convert remaining VIR_FREE() to g_free()
On a Tuesday in 2020, Laine Stump wrote: Signed-off-by: Laine Stump --- src/nwfilter/nwfilter_dhcpsnoop.c | 16 src/nwfilter/nwfilter_driver.c| 10 +- src/nwfilter/nwfilter_ebiptables_driver.c | 2 +- src/nwfilter/nwfilter_gentech_driver.c| 6 +++--- src/nwfilter/nwfilter_learnipaddr.c | 8 5 files changed, 21 insertions(+), 21 deletions(-) diff --git a/src/nwfilter/nwfilter_dhcpsnoop.c b/src/nwfilter/nwfilter_dhcpsnoop.c index 64671af135..aafa6de322 100644 --- a/src/nwfilter/nwfilter_dhcpsnoop.c +++ b/src/nwfilter/nwfilter_dhcpsnoop.c @@ -314,7 +314,7 @@ virNWFilterSnoopCancel(char **threadKey) virNWFilterSnoopActiveLock(); ignore_value(virHashRemoveEntry(virNWFilterSnoopState.active, *threadKey)); -VIR_FREE(*threadKey); +g_free(*threadKey); This one should use g_clear_pointer. virNWFilterSnoopActiveUnlock(); } diff --git a/src/nwfilter/nwfilter_driver.c b/src/nwfilter/nwfilter_driver.c index 39d0a2128e..7853ad59fa 100644 --- a/src/nwfilter/nwfilter_driver.c +++ b/src/nwfilter/nwfilter_driver.c @@ -303,7 +303,7 @@ nwfilterStateInitialize(bool privileged, err_free_driverstate: virNWFilterObjListFree(driver->nwfilters); -VIR_FREE(driver); +g_free(driver); Same here. return VIR_DRV_STATE_INIT_ERROR; } @@ -379,7 +379,7 @@ nwfilterStateCleanup(void) virNWFilterObjListFree(driver->nwfilters); virMutexDestroy(&driver->lock); -VIR_FREE(driver); +g_free(driver); Possibly here. return 0; } (I have not verified whether the double use of the pointer may practically happen, but we do a non-NULL check of these in a few cases) Reviewed-by: Ján Tomko Jano signature.asc Description: PGP signature
Re: [libvirt PATCH v2 14/15] nwfilter: convert local pointers to use g_auto*
On a Tuesday in 2020, Laine Stump wrote: Signed-off-by: Laine Stump --- src/nwfilter/nwfilter_dhcpsnoop.c | 91 +++ src/nwfilter/nwfilter_ebiptables_driver.c | 75 +++ src/nwfilter/nwfilter_gentech_driver.c| 15 ++-- src/nwfilter/nwfilter_learnipaddr.c | 7 +- 4 files changed, 61 insertions(+), 127 deletions(-) Reviewed-by: Ján Tomko Jano signature.asc Description: PGP signature
Re: [libvirt PATCH v2 08/15] nwfilter: remove unnecessary code from ebtablesGetSubChainInsts()
On a Tuesday in 2020, Laine Stump wrote: On failure, this function would clear out and free the list of subchains it had been called with. This is unnecessary, because the *only* caller of this function will also clear out and free the list of subchains if it gets a failure from ebtablesGetSubChainInsts(). (It also makes more logical sense for the function that is creating the entire list to be the one freeing the entire list, rather than having a function whose purpose is only to create *one item* on the list freeing the entire list). This is the function creating the list, I think it makes sense to not leave anything allocated in case of failure. Jano Signed-off-by: Laine Stump --- src/nwfilter/nwfilter_ebiptables_driver.c | 6 -- 1 file changed, 6 deletions(-) signature.asc Description: PGP signature
Re: [libvirt PATCH v2 13/15] nwfilter: replace VIR_ALLOC with g_new0
On a Tuesday in 2020, Laine Stump wrote: Signed-off-by: Laine Stump --- src/nwfilter/nwfilter_dhcpsnoop.c | 9 +++-- src/nwfilter/nwfilter_driver.c| 3 +-- src/nwfilter/nwfilter_ebiptables_driver.c | 3 +-- src/nwfilter/nwfilter_gentech_driver.c| 3 +-- src/nwfilter/nwfilter_learnipaddr.c | 6 ++ 5 files changed, 8 insertions(+), 16 deletions(-) Reviewed-by: Ján Tomko Jano signature.asc Description: PGP signature
Re: [libvirt PATCH v2 12/15] nwfilter: use standard label names when reasonable
On a Tuesday in 2020, Laine Stump wrote: Rather than having labels named exit, done, exit_snooprequnlock, skip_rename, etc, use the standard "cleanup" label. And instead of err_exit, malformed, tear_down_tmpebchains, use "error". Signed-off-by: Laine Stump --- src/nwfilter/nwfilter_dhcpsnoop.c | 36 +++ src/nwfilter/nwfilter_ebiptables_driver.c | 12 src/nwfilter/nwfilter_gentech_driver.c| 32 ++-- src/nwfilter/nwfilter_learnipaddr.c | 24 +++ 4 files changed, 52 insertions(+), 52 deletions(-) Reviewed-by: Ján Tomko Jano signature.asc Description: PGP signature
Re: [libvirt PATCH v2 11/15] nwfilter: transform logic in virNWFilterRuleInstSort to eliminate label
On a Tuesday in 2020, Laine Stump wrote: This rewrite of a nested conditional produces the same results, but eliminate a goto and corresponding label. Signed-off-by: Laine Stump --- src/nwfilter/nwfilter_ebiptables_driver.c | 11 +-- 1 file changed, 5 insertions(+), 6 deletions(-) Reviewed-by: Ján Tomko Jano signature.asc Description: PGP signature
Re: [libvirt PATCH v2 09/15] nwfilter: clear nrules when resetting virNWFilterInst
On a Tuesday in 2020, Laine Stump wrote: It's possible/probable the callers to virNWFilterInstReset() make it unnecessary to set the object's nrules to 0 after freeing all its rules, but that same function is setting nfilters to 0, so let's do the same for the sake of consistency. Signed-off-by: Laine Stump --- src/nwfilter/nwfilter_gentech_driver.c | 1 + 1 file changed, 1 insertion(+) Reviewed-by: Ján Tomko Jano signature.asc Description: PGP signature
Re: [libvirt PATCH v2 10/15] nwfilter: define a typedef for struct ebtablesSubChainInst
On a Tuesday in 2020, Laine Stump wrote: Signed-off-by: Laine Stump --- src/nwfilter/nwfilter_ebiptables_driver.c | 14 -- 1 file changed, 8 insertions(+), 6 deletions(-) Reviewed-by: Ján Tomko Jano signature.asc Description: PGP signature
[PATCH v4 1/2] conf: add 'isa' controller type
Introduce 'isa' controller type. In domain XML it looks this way: ... ... Currently, this is needed for the bhyve driver to allow choosing a specific PCI address for that. In bhyve, this controller is used to attach serial ports and a boot ROM. Signed-off-by: Roman Bogorodskiy --- docs/schemas/domaincommon.rng | 6 ++ src/conf/domain_conf.c | 9 + src/conf/domain_conf.h | 8 src/qemu/qemu_command.c| 1 + src/qemu/qemu_domain.c | 1 + src/qemu/qemu_domain_address.c | 1 + src/qemu/qemu_validate.c | 1 + src/vbox/vbox_common.c | 1 + 8 files changed, 28 insertions(+) diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng index 4b4aa60c66..7bed99b161 100644 --- a/docs/schemas/domaincommon.rng +++ b/docs/schemas/domaincommon.rng @@ -2416,6 +2416,12 @@ + + + + isa + + diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index d14485f18d..6befe4bbd8 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -399,6 +399,7 @@ VIR_ENUM_IMPL(virDomainController, "usb", "pci", "xenbus", + "isa", ); VIR_ENUM_IMPL(virDomainControllerModelPCI, @@ -444,6 +445,9 @@ VIR_ENUM_IMPL(virDomainControllerModelSCSI, "virtio-non-transitional", ); +VIR_ENUM_IMPL(virDomainControllerModelISA, VIR_DOMAIN_CONTROLLER_MODEL_ISA_LAST, +); + VIR_ENUM_IMPL(virDomainControllerModelUSB, VIR_DOMAIN_CONTROLLER_MODEL_USB_LAST, "piix3-uhci", @@ -2312,6 +2316,7 @@ virDomainControllerDefNew(virDomainControllerType type) case VIR_DOMAIN_CONTROLLER_TYPE_SCSI: case VIR_DOMAIN_CONTROLLER_TYPE_SATA: case VIR_DOMAIN_CONTROLLER_TYPE_CCID: +case VIR_DOMAIN_CONTROLLER_TYPE_ISA: case VIR_DOMAIN_CONTROLLER_TYPE_LAST: break; } @@ -10975,6 +10980,8 @@ virDomainControllerModelTypeFromString(const virDomainControllerDef *def, return virDomainControllerModelIDETypeFromString(model); else if (def->type == VIR_DOMAIN_CONTROLLER_TYPE_VIRTIO_SERIAL) return virDomainControllerModelVirtioSerialTypeFromString(model); +else if (def->type == VIR_DOMAIN_CONTROLLER_TYPE_ISA) +return virDomainControllerModelISATypeFromString(model); return -1; } @@ -10994,6 +11001,8 @@ virDomainControllerModelTypeToString(virDomainControllerDefPtr def, return virDomainControllerModelIDETypeToString(model); else if (def->type == VIR_DOMAIN_CONTROLLER_TYPE_VIRTIO_SERIAL) return virDomainControllerModelVirtioSerialTypeToString(model); +else if (def->type == VIR_DOMAIN_CONTROLLER_TYPE_ISA) +return virDomainControllerModelISATypeToString(model); return NULL; } diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index 6a737591e2..188385cc6b 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -595,6 +595,7 @@ typedef enum { VIR_DOMAIN_CONTROLLER_TYPE_USB, VIR_DOMAIN_CONTROLLER_TYPE_PCI, VIR_DOMAIN_CONTROLLER_TYPE_XENBUS, +VIR_DOMAIN_CONTROLLER_TYPE_ISA, VIR_DOMAIN_CONTROLLER_TYPE_LAST } virDomainControllerType; @@ -686,6 +687,12 @@ typedef enum { VIR_DOMAIN_CONTROLLER_MODEL_VIRTIO_SERIAL_LAST } virDomainControllerModelVirtioSerial; +typedef enum { +VIR_DOMAIN_CONTROLLER_MODEL_ISA_DEFAULT = -1, + +VIR_DOMAIN_CONTROLLER_MODEL_ISA_LAST +} virDomainControllerModelISA; + #define IS_USB2_CONTROLLER(ctrl) \ (((ctrl)->type == VIR_DOMAIN_CONTROLLER_TYPE_USB) && \ ((ctrl)->model == VIR_DOMAIN_CONTROLLER_MODEL_USB_ICH9_EHCI1 || \ @@ -3547,6 +3554,7 @@ VIR_ENUM_DECL(virDomainControllerModelSCSI); VIR_ENUM_DECL(virDomainControllerModelUSB); VIR_ENUM_DECL(virDomainControllerModelIDE); VIR_ENUM_DECL(virDomainControllerModelVirtioSerial); +VIR_ENUM_DECL(virDomainControllerModelISA); VIR_ENUM_DECL(virDomainFS); VIR_ENUM_DECL(virDomainFSDriver); VIR_ENUM_DECL(virDomainFSAccessMode); diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index 24e99e13b8..d4e190a1ae 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -2699,6 +2699,7 @@ qemuBuildControllerDevStr(const virDomainDef *domainDef, case VIR_DOMAIN_CONTROLLER_TYPE_IDE: case VIR_DOMAIN_CONTROLLER_TYPE_FDC: case VIR_DOMAIN_CONTROLLER_TYPE_XENBUS: +case VIR_DOMAIN_CONTROLLER_TYPE_ISA: case VIR_DOMAIN_CONTROLLER_TYPE_LAST: virReportError(VIR_ERR_CONFIG_UNSUPPORTED, _("Unsupported controller type: %s"), diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 2d9d8226d6..f5dbadb323 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -5054,6 +5054,7 @@ qemuDomainControllerDefPostParse(virDomainControllerDefPtr cont, case VIR_DOMAIN_CONTROLLER_TYPE_IDE:
[PATCH v4 2/2] bhyve: support 'isa' controller for LPC
Support modeling of the 'isa' controller for bhyve. User can manually define any PCI slot for the 'isa' controller, including PCI slot 1, but other devices are not allowed to use this address. When domain configuration requires the 'isa' controller to be present, automatically add it on domain post-parse stage. Now, as this controller is always available when needed, it's not necessary to implicitly add it to the bhyve command line, so remove bhyveBuildLPCArgStr(). Also, make bhyveDomainDefNeedsISAController() static as it's no longer used outside of bhyve_domain.c. As more than one ISA controller is not supported by bhyve, and multiple controllers with the same index are forbidden, so forbid ISA controllers with non-zero index for bhyve. Signed-off-by: Roman Bogorodskiy --- src/bhyve/bhyve_command.c | 27 +++--- src/bhyve/bhyve_device.c | 23 +--- src/bhyve/bhyve_domain.c | 25 +++-- src/bhyve/bhyve_domain.h | 2 -- ...ml2argv-addr-isa-controller-on-slot-1.args | 10 ++ ...2argv-addr-isa-controller-on-slot-1.ldargs | 3 ++ ...xml2argv-addr-isa-controller-on-slot-1.xml | 26 ++ ...l2argv-addr-isa-controller-on-slot-31.args | 10 ++ ...argv-addr-isa-controller-on-slot-31.ldargs | 3 ++ ...ml2argv-addr-isa-controller-on-slot-31.xml | 26 ++ ...argv-addr-non-isa-controller-on-slot-1.xml | 23 .../bhyvexml2argv-console.args| 2 +- .../bhyvexml2argv-isa-controller.args | 10 ++ .../bhyvexml2argv-isa-controller.ldargs | 3 ++ .../bhyvexml2argv-isa-controller.xml | 24 + ...bhyvexml2argv-isa-multiple-controllers.xml | 25 + .../bhyvexml2argv-serial-grub-nocons.args | 2 +- .../bhyvexml2argv-serial-grub.args| 2 +- .../bhyvexml2argv-serial.args | 2 +- .../bhyvexml2argvdata/bhyvexml2argv-uefi.args | 4 +-- .../bhyvexml2argv-vnc-autoport.args | 4 +-- .../bhyvexml2argv-vnc-vgaconf-io.args | 4 +-- .../bhyvexml2argv-vnc-vgaconf-off.args| 4 +-- .../bhyvexml2argv-vnc-vgaconf-on.args | 4 +-- .../bhyvexml2argvdata/bhyvexml2argv-vnc.args | 4 +-- tests/bhyvexml2argvtest.c | 5 +++ ...l2xmlout-addr-isa-controller-on-slot-1.xml | 36 +++ ...2xmlout-addr-isa-controller-on-slot-31.xml | 36 +++ .../bhyvexml2xmlout-console.xml | 3 ++ .../bhyvexml2xmlout-isa-controller.xml| 36 +++ .../bhyvexml2xmlout-serial-grub-nocons.xml| 3 ++ .../bhyvexml2xmlout-serial-grub.xml | 3 ++ .../bhyvexml2xmlout-serial.xml| 3 ++ .../bhyvexml2xmlout-vnc-autoport.xml | 3 ++ .../bhyvexml2xmlout-vnc-vgaconf-io.xml| 3 ++ .../bhyvexml2xmlout-vnc-vgaconf-off.xml | 3 ++ .../bhyvexml2xmlout-vnc-vgaconf-on.xml| 3 ++ .../bhyvexml2xmlout-vnc.xml | 3 ++ tests/bhyvexml2xmltest.c | 3 ++ 39 files changed, 378 insertions(+), 37 deletions(-) create mode 100644 tests/bhyvexml2argvdata/bhyvexml2argv-addr-isa-controller-on-slot-1.args create mode 100644 tests/bhyvexml2argvdata/bhyvexml2argv-addr-isa-controller-on-slot-1.ldargs create mode 100644 tests/bhyvexml2argvdata/bhyvexml2argv-addr-isa-controller-on-slot-1.xml create mode 100644 tests/bhyvexml2argvdata/bhyvexml2argv-addr-isa-controller-on-slot-31.args create mode 100644 tests/bhyvexml2argvdata/bhyvexml2argv-addr-isa-controller-on-slot-31.ldargs create mode 100644 tests/bhyvexml2argvdata/bhyvexml2argv-addr-isa-controller-on-slot-31.xml create mode 100644 tests/bhyvexml2argvdata/bhyvexml2argv-addr-non-isa-controller-on-slot-1.xml create mode 100644 tests/bhyvexml2argvdata/bhyvexml2argv-isa-controller.args create mode 100644 tests/bhyvexml2argvdata/bhyvexml2argv-isa-controller.ldargs create mode 100644 tests/bhyvexml2argvdata/bhyvexml2argv-isa-controller.xml create mode 100644 tests/bhyvexml2argvdata/bhyvexml2argv-isa-multiple-controllers.xml create mode 100644 tests/bhyvexml2xmloutdata/bhyvexml2xmlout-addr-isa-controller-on-slot-1.xml create mode 100644 tests/bhyvexml2xmloutdata/bhyvexml2xmlout-addr-isa-controller-on-slot-31.xml create mode 100644 tests/bhyvexml2xmloutdata/bhyvexml2xmlout-isa-controller.xml diff --git a/src/bhyve/bhyve_command.c b/src/bhyve/bhyve_command.c index 22d0b24ec4..2a3e10d649 100644 --- a/src/bhyve/bhyve_command.c +++ b/src/bhyve/bhyve_command.c @@ -329,7 +329,8 @@ bhyveBuildControllerArgStr(const virDomainDef *def, virDomainControllerDefPtr controller, bhyveConnPtr driver, virCommandPtr cmd, - unsigned *nusbcontrollers) + unsigned *nusbcontrollers, + unsigned *nisacontrollers) { switch (c
[PATCH v4 0/2] conf: add 'isa' controller type
No code changes, just re-arranged commits. Now commits are more granular than v3, but still less granular as v2. Specifically, now there are a 'conf' part as a separate commit and 'bhyve' part as a separate commit. My thought was that automatic addition of the 'isa' controller and controller index validation patches are small compared to the main code, so should not increase reviewing difficulty much. Please let me know if it's still desired to make more granular commits. I plan to submit corresponding news and docs entries as a separate series. Roman Bogorodskiy (2): conf: add 'isa' controller type bhyve: support 'isa' controller for LPC docs/schemas/domaincommon.rng | 6 src/bhyve/bhyve_command.c | 27 +++--- src/bhyve/bhyve_device.c | 23 +--- src/bhyve/bhyve_domain.c | 25 +++-- src/bhyve/bhyve_domain.h | 2 -- src/conf/domain_conf.c| 9 + src/conf/domain_conf.h| 8 + src/qemu/qemu_command.c | 1 + src/qemu/qemu_domain.c| 1 + src/qemu/qemu_domain_address.c| 1 + src/qemu/qemu_validate.c | 1 + src/vbox/vbox_common.c| 1 + ..ml2argv-addr-isa-controller-on-slot-1.args | 10 ++ ...2argv-addr-isa-controller-on-slot-1.ldargs | 3 ++ ...xml2argv-addr-isa-controller-on-slot-1.xml | 26 ++ ...l2argv-addr-isa-controller-on-slot-31.args | 10 ++ ...argv-addr-isa-controller-on-slot-31.ldargs | 3 ++ ...ml2argv-addr-isa-controller-on-slot-31.xml | 26 ++ ...argv-addr-non-isa-controller-on-slot-1.xml | 23 .../bhyvexml2argv-console.args| 2 +- .../bhyvexml2argv-isa-controller.args | 10 ++ .../bhyvexml2argv-isa-controller.ldargs | 3 ++ .../bhyvexml2argv-isa-controller.xml | 24 + ...bhyvexml2argv-isa-multiple-controllers.xml | 25 + .../bhyvexml2argv-serial-grub-nocons.args | 2 +- .../bhyvexml2argv-serial-grub.args| 2 +- .../bhyvexml2argv-serial.args | 2 +- .../bhyvexml2argvdata/bhyvexml2argv-uefi.args | 4 +-- .../bhyvexml2argv-vnc-autoport.args | 4 +-- .../bhyvexml2argv-vnc-vgaconf-io.args | 4 +-- .../bhyvexml2argv-vnc-vgaconf-off.args| 4 +-- .../bhyvexml2argv-vnc-vgaconf-on.args | 4 +-- .../bhyvexml2argvdata/bhyvexml2argv-vnc.args | 4 +-- tests/bhyvexml2argvtest.c | 5 +++ ...l2xmlout-addr-isa-controller-on-slot-1.xml | 36 +++ ...2xmlout-addr-isa-controller-on-slot-31.xml | 36 +++ .../bhyvexml2xmlout-console.xml | 3 ++ .../bhyvexml2xmlout-isa-controller.xml| 36 +++ .../bhyvexml2xmlout-serial-grub-nocons.xml| 3 ++ .../bhyvexml2xmlout-serial-grub.xml | 3 ++ .../bhyvexml2xmlout-serial.xml| 3 ++ .../bhyvexml2xmlout-vnc-autoport.xml | 3 ++ .../bhyvexml2xmlout-vnc-vgaconf-io.xml| 3 ++ .../bhyvexml2xmlout-vnc-vgaconf-off.xml | 3 ++ .../bhyvexml2xmlout-vnc-vgaconf-on.xml| 3 ++ .../bhyvexml2xmlout-vnc.xml | 3 ++ tests/bhyvexml2xmltest.c | 3 ++ 47 files changed, 406 insertions(+), 37 deletions(-) create mode 100644 tests/bhyvexml2argvdata/bhyvexml2argv-addr-isa-controller-on-slot-1.args create mode 100644 tests/bhyvexml2argvdata/bhyvexml2argv-addr-isa-controller-on-slot-1.ldargs create mode 100644 tests/bhyvexml2argvdata/bhyvexml2argv-addr-isa-controller-on-slot-1.xml create mode 100644 tests/bhyvexml2argvdata/bhyvexml2argv-addr-isa-controller-on-slot-31.args create mode 100644 tests/bhyvexml2argvdata/bhyvexml2argv-addr-isa-controller-on-slot-31.ldargs create mode 100644 tests/bhyvexml2argvdata/bhyvexml2argv-addr-isa-controller-on-slot-31.xml create mode 100644 tests/bhyvexml2argvdata/bhyvexml2argv-addr-non-isa-controller-on-slot-1.xml create mode 100644 tests/bhyvexml2argvdata/bhyvexml2argv-isa-controller.args create mode 100644 tests/bhyvexml2argvdata/bhyvexml2argv-isa-controller.ldargs create mode 100644 tests/bhyvexml2argvdata/bhyvexml2argv-isa-controller.xml create mode 100644 tests/bhyvexml2argvdata/bhyvexml2argv-isa-multiple-controllers.xml create mode 100644 tests/bhyvexml2xmloutdata/bhyvexml2xmlout-addr-isa-controller-on-slot-1.xml create mode 100644 tests/bhyvexml2xmloutdata/bhyvexml2xmlout-addr-isa-controller-on-slot-31.xml create mode 100644 tests/bhyvexml2xmloutdata/bhyvexml2xmlout-isa-controller.xml -- 2.27.0
Re: [libvirt PATCH v2 07/15] network: use g_free() in place of remaining VIR_FREE()
On a Tuesday in 2020, Laine Stump wrote: Signed-off-by: Laine Stump --- src/network/bridge_driver.c | 45 +++-- 1 file changed, 23 insertions(+), 22 deletions(-) Reviewed-by: Ján Tomko Jano signature.asc Description: PGP signature
Re: [libvirt PATCH v2 06/15] network: eliminate unnecessary labels
On a Tuesday in 2020, Laine Stump wrote: All these cleanup/error labels were reduced to having just "return ret" by a previous patch, so get rid of them and return directly. Signed-off-by: Laine Stump --- src/network/bridge_driver.c | 264 -- src/network/bridge_driver_linux.c | 15 +- 2 files changed, 113 insertions(+), 166 deletions(-) @@ -3190,7 +3143,7 @@ static int networkFindUnusedBridgeName(virNetworkObjListPtr nets, virNetworkDefPtr def) { -int ret = -1, id = 0; +int id = 0; const char *templ = "virbr%d"; const char *p; @@ -3211,17 +3164,14 @@ networkFindUnusedBridgeName(virNetworkObjListPtr nets, virNetDevExists(newname) == 1)) { VIR_FREE(def->bridge); /*could contain template */ def->bridge = g_steal_pointer(&newname); -ret = 0; -goto cleanup; +return 0; } } while (++id <= MAX_BRIDGE_ID); virReportError(VIR_ERR_INTERNAL_ERROR, _("Bridge generation exceeded max id %d"), MAX_BRIDGE_ID); -ret = 0; - cleanup: -return ret; +return -1; This fix deserves a separate commit. Or at least a mention in the commit message. } Reviewed-by: Ján Tomko Jano signature.asc Description: PGP signature
Re: [libvirt PATCH v2 05/15] network: use g_auto wherever appropriate
On a Tuesday in 2020, Laine Stump wrote: This includes standard g_autofree() as well as other objects that have a cleanup function defined to use via g_autoptr (virCommand, virJSONValue) Signed-off-by: Laine Stump --- src/network/bridge_driver.c | 206 ++ src/network/bridge_driver_linux.c | 7 +- src/network/leaseshelper.c| 16 +-- 3 files changed, 76 insertions(+), 153 deletions(-) diff --git a/src/network/bridge_driver.c b/src/network/bridge_driver.c index ab359acdb5..31bd0dd92c 100644 --- a/src/network/bridge_driver.c +++ b/src/network/bridge_driver.c [...] @@ -1095,7 +1081,6 @@ networkDnsmasqConfContents(virNetworkObjPtr obj, bool wantDNS = dns->enable != VIR_TRISTATE_BOOL_NO; virNetworkIPDefPtr tmpipdef, ipdef, ipv4def, ipv6def; bool ipv6SLAAC; -char *saddr = NULL, *eaddr = NULL; *configstr = NULL; [...] @@ -1414,6 +1396,8 @@ networkDnsmasqConfContents(virNetworkObjPtr obj, int thisRange; virNetworkDHCPRangeDef range = ipdef->ranges[r]; g_autofree char *leasetime = NULL; +g_autofree char *saddr = NULL; +g_autofree char *eaddr = NULL; 300 lines below the original location. Long function is long. :) if (!(saddr = virSocketAddrFormat(&range.addr.start)) || !(eaddr = virSocketAddrFormat(&range.addr.end))) [...] @@ -2248,7 +2206,7 @@ static int networkSetIPv6Sysctls(virNetworkObjPtr obj) { virNetworkDefPtr def = virNetworkObjGetDef(obj); -char *field = NULL; +g_autofree char *field = NULL; Last time I tried manually freeing an autofree'd variable, I was told not to do that O:-) The clean way here seems to be refactoring the function. I can put that somewhere into the depths of my TODO list. int ret = -1; bool enableIPv6 = !!virNetworkDefGetIPByIndex(def, AF_INET6, 0); @@ -2273,7 +2231,6 @@ networkSetIPv6Sysctls(virNetworkObjPtr obj) "on bridge %s"), field, def->bridge); goto cleanup; } -VIR_FREE(field); /* The rest of the ipv6 sysctl tunables should always be set the * same, whether or not we're using ipv6 on this bridge. @@ -2282,6 +2239,7 @@ networkSetIPv6Sysctls(virNetworkObjPtr obj) /* Prevent guests from hijacking the host network by sending out * their own router advertisements. */ +VIR_FREE(field); field = g_strdup_printf(SYSCTL_PATH "/net/ipv6/conf/%s/accept_ra", def->bridge); @@ -2290,11 +2248,11 @@ networkSetIPv6Sysctls(virNetworkObjPtr obj) _("cannot disable %s"), field); goto cleanup; } -VIR_FREE(field); /* All interfaces used as a gateway (which is what this is, by * definition), must always have autoconf=0. */ +VIR_FREE(field); field = g_strdup_printf(SYSCTL_PATH "/net/ipv6/conf/%s/autoconf", def->bridge); if (virFileWriteStr(field, "0", 0) < 0) { @@ -2305,7 +2263,6 @@ networkSetIPv6Sysctls(virNetworkObjPtr obj) ret = 0; cleanup: -VIR_FREE(field); return ret; } [...] @@ -3276,8 +3221,6 @@ networkFindUnusedBridgeName(virNetworkObjListPtr nets, MAX_BRIDGE_ID); ret = 0; So this function returned 0 even on failure. Introduced by a28d3e485f01d16320af15780bc935f54782a45d cleanup: -if (ret < 0) -VIR_FREE(newname); return ret; } Without the networkSetIPv6Sysctls changes: Reviewed-by: Ján Tomko Jano signature.asc Description: PGP signature
[libvirt PATCH] Partially revert "qemu: fix missing error reports in capabilities probing"
This partially reverts commit 5331c4804f4f419b9e75741084f926e52413d3a1. The original commit mistakenly thought virFileCacheLookup did not set an error. In fact the only case it doesn't set an error for is when the cache key is NULL. This in fact the fault of the caller for passing an invalid cache key, so doesn't need to be handled. This caller bug was fixed by checking for a NULL binary in the virQEMUCapsCacheLookupDefault method. Signed-off-by: Daniel P. Berrangé --- src/qemu/qemu_capabilities.c | 5 - src/qemu/qemu_domain.c | 4 +--- 2 files changed, 1 insertion(+), 8 deletions(-) diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c index b27a5e5c81..7264fe300f 100644 --- a/src/qemu/qemu_capabilities.c +++ b/src/qemu/qemu_capabilities.c @@ -5636,11 +5636,6 @@ virQEMUCapsCacheLookup(virFileCachePtr cache, priv->microcodeVersion = virHostCPUGetMicrocodeVersion(); ret = virFileCacheLookup(cache, binary); -if (!ret) { -virReportError(VIR_ERR_INTERNAL_ERROR, - _("no capabilities available for %s"), binary); -return NULL; -} VIR_DEBUG("Returning caps %p for %s", ret, binary); return ret; diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 4a2daffc0a..614781 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -5256,10 +5256,8 @@ qemuDomainPostParseDataAlloc(const virDomainDef *def, virQEMUDriverPtr driver = opaque; if (!(*parseOpaque = virQEMUCapsCacheLookup(driver->qemuCapsCache, -def->emulator))) { -virResetLastError(); +def->emulator))) return 1; -} return 0; } -- 2.26.2
Re: [libvirt PATCH] qemu: fix missing error reports in capabilities probing
On Mon, Jul 13, 2020 at 09:44:07PM +0200, Michal Privoznik wrote: > On 6/15/20 3:56 PM, Daniel P. Berrangé wrote: > > The "virsh domcapabilities --arch ppc64" command will fail with no > > error message set if qemu-system-ppc64 is not currently installed. > > > > This is because virQEMUCapsCacheLookup() does not report any error > > message if not capabilities can be obtained from the cache. Almost > > all methods calling this expected an error to be set on failure. > > > > Once that's fixed though, we see a further bug which is that > > virQEMUCapsCacheLookupDefault() is passing a NULL binary path to > > virQEMUCapsCacheLookup(), so we need to catch that too. > > > > Signed-off-by: Daniel P. Berrangé > > --- > > src/qemu/qemu_capabilities.c | 11 +++ > > src/qemu/qemu_domain.c | 4 +++- > > 2 files changed, 14 insertions(+), 1 deletion(-) > > > > > > diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c > > index 2dad823a86..97096073e6 100644 > > --- a/src/qemu/qemu_domain.c > > +++ b/src/qemu/qemu_domain.c > > @@ -6068,8 +6068,10 @@ qemuDomainPostParseDataAlloc(const virDomainDef *def, > > virQEMUDriverPtr driver = opaque; > > if (!(*parseOpaque = virQEMUCapsCacheLookup(driver->qemuCapsCache, > > -def->emulator))) > > +def->emulator))) { > > +virResetLastError(); > > return 1; > > +} > > return 0; > > } > > > > I think we need to revisit this patch. In my testing, I have qemu built on a > side from git and one domain that runs it. As I updated my system, but not > rebuilt the qemu from git it no longer runs (fails to link): > > ~/work/qemu/qemu.git/x86_64-softmmu/qemu-system-x86_64: error while loading > shared libraries: libnettle.so.7: cannot open shared object file: No such > file or directory > > This is expected. But, virsh start fails with: > > error: Failed to start domain fedora > error: An error occurred, but the cause is unknown > > I've tracked this down to the virResetLastError() in the hunk above. And it > kind of makes sense - we failed to load capabilities on daemon startup (oh, > yeah, daemon runs from git too, but it's been rebuilt) so we try again on > domain startup. But now that I am reading the commit message it doesn't make > much sense to me either. 'virsh domcapabilities' has nothing to do with > PostParse callbacks, does it? Can it be that this error reset call is just > misplaced? I was attempting to preserve what I thought was existing behaviour. I was seeing that virFileCacheLookup returned NULL, but didn't set any error. So I added an error report in virQEMUCapsCacheLookup, and thus in reurn clearer the eror in virQEMUCapsCacheLookup. It looks like i was mistaken about virFileCacheLookup tough - it does indeed set an error, but not in all cases. So I thin we need to revert part of my commit. Regards, Daniel -- |: https://berrange.com -o-https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o-https://fstop138.berrange.com :| |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|
Re: [libvirt PATCH v2 04/15] network: replace VIR_ALLOC/REALLOC with g_new0/g_renew
On a Tuesday in 2020, Laine Stump wrote: Signed-off-by: Laine Stump --- src/network/bridge_driver.c | 29 ++--- 1 file changed, 10 insertions(+), 19 deletions(-) diff --git a/src/network/bridge_driver.c b/src/network/bridge_driver.c index 713763130b..ab359acdb5 100644 --- a/src/network/bridge_driver.c +++ b/src/network/bridge_driver.c @@ -177,8 +177,7 @@ networkDnsmasqDefNamespaceParseOptions(networkDnsmasqXmlNsDefPtr nsdef, if (nnodes == 0) return 0; -if (VIR_ALLOC_N(nsdef->options, nnodes) < 0) -return -1; +nsdef->options = g_new0(char *, nnodes); for (i = 0; i < nnodes; i++) { if (!(nsdef->options[nsdef->noptions++] = virXMLPropString(nodes[i], "value"))) { @@ -196,12 +195,9 @@ static int networkDnsmasqDefNamespaceParse(xmlXPathContextPtr ctxt, void **data) { -networkDnsmasqXmlNsDefPtr nsdata = NULL; +networkDnsmasqXmlNsDefPtr nsdata = g_new0(networkDnsmasqXmlNsDef, 1); int ret = -1; -if (VIR_ALLOC(nsdata) < 0) -return -1; - if (networkDnsmasqDefNamespaceParseOptions(nsdata, ctxt)) goto cleanup; @@ -733,8 +729,7 @@ networkStateInitialize(bool privileged, return -1; } -if (VIR_ALLOC(network_driver) < 0) -goto error; +network_driver = g_new0(virNetworkDriverState, 1); network_driver->lockFD = -1; if (virMutexInit(&network_driver->lock) < 0) { @@ -2753,8 +2748,7 @@ networkCreateInterfacePool(virNetworkDefPtr netdef) goto cleanup; } -if (VIR_ALLOC_N(netdef->forward.ifs, numVirtFns) < 0) -goto cleanup; +netdef->forward.ifs = g_new0(virNetworkForwardIfDef, numVirtFns); for (i = 0; i < numVirtFns; i++) { virPCIDeviceAddressPtr thisVirtFn = virtFns[i]; @@ -4323,8 +4317,7 @@ networkGetDHCPLeases(virNetworkPtr net, continue; if (need_results) { -if (VIR_ALLOC(lease) < 0) -goto error; +lease = g_new0(virNetworkDHCPLease, 1); lease->expirytime = expirytime_tmp; @@ -4378,9 +4371,8 @@ networkGetDHCPLeases(virNetworkPtr net, if (leases_ret) { /* NULL terminated array */ -ignore_value(VIR_REALLOC_N(leases_ret, nleases + 1)); -*leases = leases_ret; -leases_ret = NULL; +leases_ret = g_renew(virNetworkDHCPLeasePtr, leases_ret, nleases + 1); Double space before nleases. Also, this is a faithful conversion, but neither VIR_REALLOC_N nor g_renew guarantee that the memory will be zeroed. +*leases = g_steal_pointer(&leases_ret); } rv = nleases; With the double space removed: Reviewed-by: Ján Tomko Jano signature.asc Description: PGP signature
Re: [libvirt PATCH v2 03/15] define g_autoptr cleanup function for virNetworkDHCPLease
On a Tuesday in 2020, Laine Stump wrote: virNetworkDHCPLease and virNetworkDHCPLeaseFree() are declared in the public API file libvirt-network.h, and we can't pollute that with glib macro invocations, so put this in src/datatypes.h next to the other virNetwork items. Signed-off-by: Laine Stump --- src/datatypes.h | 2 ++ 1 file changed, 2 insertions(+) Reviewed-by: Ján Tomko Jano signature.asc Description: PGP signature
Re: [libvirt PATCH v2 02/15] util: define g_autoptr cleanups for a couple dnsmasq objects
On a Tuesday in 2020, Laine Stump wrote: Signed-off-by: Laine Stump --- src/util/virdnsmasq.h | 4 1 file changed, 4 insertions(+) Reviewed-by: Ján Tomko Jano signature.asc Description: PGP signature
Re: [libvirt PATCH v2 01/15] replace g_new() with g_new0() for consistency
On a Tuesday in 2020, Laine Stump wrote: g_new() is used in only 3 places. Switching them to g_new0() will do no harm, reduces confusion, and helps me sleep better at night knowing Sweet dreams. that all allocated memory is initialized to 0 :-) (Yes, I *know* that in all three cases the associated memory is immediately assigned some other value. Today.) Signed-off-by: Laine Stump --- src/qemu/qemu_backup.c | 2 +- src/util/virutil.c | 2 +- tests/qemuhotplugmock.c | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) Reviewed-by: Ján Tomko Jano signature.asc Description: PGP signature
Re: [PATCH 00/10] resolve hangs/crashes on libvirtd shutdown
On Wed, Jul 15, 2020 at 08:51:03AM +0300, Nikolay Shirokovskiy wrote: > > > On 14.07.2020 17:53, Daniel Henrique Barboza wrote: > > As far as code goes: > > > > > > Reviewed-by: Daniel Henrique Barboza > > > > > > About the design I have a question about the timeout. Patch 5/10 is setting > > a > > 15 second timeout. How did you reach this value? Reading the bug, specially > > this comment from Daniel: > > > > https://bugzilla.redhat.com/show_bug.cgi?id=1828207#c6 > > > > He mentions "give it 5 seconds of running before shutting it down". > > I guess 5 seconds is time for libvirtd to finish startup. This time has > different nature than time for libvirtd to finish it's work on shutdown > so it can be different. > > > > > 5 seconds before shutdown is something that most users can be slightly > > annoyed > > but in the end don't mind that much, but 15 seconds is something that will > > cause bugs to be opened because "Libvirt is taking too long to shutdown". > > Besides, it's a fair assumption that a transaction that takes more than > > 5 or so seconds to finish is already compromised* - might as well shutdown > > the daemon and deal with the errors. > > 15 seconds was mentioned by Daniel in [1] when he first proposed the approach > so I used this value without any extra thought. However I missed that in > the last John's series [2] the default for waiting time is 0s. May be this > is the current decision on waiting time. Let's wait for others to join > the review. Don't read too much into the precise numbers I mentioned, they would just be plucked out of the air :-) If there is some job taking place wrt a VM that is taking a long time to complete and thus blocking shutdown, I think it is important to give it a fair opportunity to finish gracefully. systemd itself gives services something like 90 seconds to exit before it gives up on them. On a heavily loaded host, 5 seconds is almost certainly too short. 15 seconds is not bad, but I wouldn't object to 30 seconds either, as long as we're emitting some log message warning that we're delayed. In the "normal" case these timeouts won't be hit, so we're only delayed in the scenarios where we're likely to be doing something important for a VM. Regards, Daniel -- |: https://berrange.com -o-https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o-https://fstop138.berrange.com :| |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|
Re: [libvirt PATCH 9/9] rpc: use new virt-nc binary for remote tunnelling
On Wed, 2020-07-15 at 14:25 +0100, Daniel P. Berrangé wrote: > On Wed, Jul 15, 2020 at 02:25:14PM +0200, Andrea Bolognani wrote: > > Mh, that makes sense but I'm still wary of using "proxy" due to the > > potential for confusion, since in this case the proxy is on the > > opposite side of the connection than one would probably expect it > > to be. Something like "remoteproxy" or "serverproxy", perhaps? > > I don't think there's really any problem of confusion here unless > someone doesn't read the docs at all, in which case they won't > even know about this parameter. So I don't think using a more > verbose term is any benefit. Okay. > > Going back to the name of the command itself, since it's an internal > > implementation details, and as such it's not intended to be invoked > > by users and accordingly we're installing it under $(libexecdir) > > along with existing helpers, what about following the established > > naming convention and calling it 'libvirt_proxyhelper'? > > Installing it to libexecdir is actually a mistake in this version. It > needs to be installed into bindir, as it must be present in $PATH. Why is that so? Is it because otherwise we can't guarantee that ssh on the remote end will find it, seeing as $(libexecdir) can be changed at configure time and is usually not part of $PATH anyway? If the binary will show up in $PATH, then I think it's even more important to ensure it's very apparent that it's for internal use and not to be invoked directly. Including a message explaining this in the help output as well as the usage message that is printed when a URI is not passed on the command line would be a great start. As for the name of the binary itself, longer and more descriptive is clearly better to avoid confusion. What about 'virt-proxy-helper'? -- Andrea Bolognani / Red Hat / Virtualization
Re: [libvirt PATCH 9/9] rpc: use new virt-nc binary for remote tunnelling
On Wed, Jul 15, 2020 at 02:25:14PM +0200, Andrea Bolognani wrote: > On Wed, 2020-07-15 at 11:00 +0100, Daniel P. Berrangé wrote: > > On Fri, Jul 10, 2020 at 07:21:47PM +0200, Andrea Bolognani wrote: > > > Just a couple of comments about the UI: would it make sense to use > > > something like > > > > > > qemu+ssh://host/system?tunnelcmd=virt-tunnel > > > > > > instead? virt-nc could be seen as a bit of a misnomer, considering > > > that (by design) it doesn't do nearly as much as the actual netcat > > > does; as for the 'proxy' argument, I'm afraid it might lead people > > > to believe it's used for HTTP proxying or some other form of > > > proxying *between the client and the host*, whereas it's really > > > something that only affects operations on the host itself - not to > > > mention that we also have a virtproxyd now, further increasing the > > > potential for confusion... > > > > I chose proxy not tunnel, because SSH is providing the tunnel here. > > virt-nc is a proxy linking the tunnel to the daemon. virtproxyd is > > conceptually similar, again linking a libvirt client to the real > > daemon. > > Mh, that makes sense but I'm still wary of using "proxy" due to the > potential for confusion, since in this case the proxy is on the > opposite side of the connection than one would probably expect it > to be. Something like "remoteproxy" or "serverproxy", perhaps? I don't think there's really any problem of confusion here unless someone doesn't read the docs at all, in which case they won't even know about this parameter. So I don't think using a more verbose term is any benefit. > > > I probably shouldn't mention "virt-nc" by name in the URI and instead > > have "proxy=netcat" vs "proxy=native", as users don't get to choose > > the actual binary here - they are providing an enum string, which > > gets mapped to the desired binary. > > Yeah, that's much better. > > Going back to the name of the command itself, since it's an internal > implementation details, and as such it's not intended to be invoked > by users and accordingly we're installing it under $(libexecdir) > along with existing helpers, what about following the established > naming convention and calling it 'libvirt_proxyhelper'? Installing it to libexecdir is actually a mistake in this version. It needs to be installed into bindir, as it must be present in $PATH. Regards, Daniel -- |: https://berrange.com -o-https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o-https://fstop138.berrange.com :| |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|
[PATCH 07/10] virStorageSourceFindByNodeName: Remove unused 'idx' argument
None of the callers actually use it. Signed-off-by: Peter Krempa --- src/qemu/qemu_domain.c| 9 - src/util/virstoragefile.c | 16 +++- src/util/virstoragefile.h | 3 +-- 3 files changed, 8 insertions(+), 20 deletions(-) diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 64bd52c51f..ed7ec77ed4 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -2498,15 +2498,15 @@ qemuDomainObjPrivateXMLParseBlockjobNodename(qemuBlockJobDataPtr job, return; if (job->disk && -(*src = virStorageSourceFindByNodeName(job->disk->src, nodename, NULL))) +(*src = virStorageSourceFindByNodeName(job->disk->src, nodename))) return; if (job->chain && -(*src = virStorageSourceFindByNodeName(job->chain, nodename, NULL))) +(*src = virStorageSourceFindByNodeName(job->chain, nodename))) return; if (job->mirrorChain && -(*src = virStorageSourceFindByNodeName(job->mirrorChain, nodename, NULL))) +(*src = virStorageSourceFindByNodeName(job->mirrorChain, nodename))) return; /* the node was in the XML but was not found in the job definitions */ @@ -11486,8 +11486,7 @@ qemuDomainDiskLookupByNodename(virDomainDefPtr def, *src = NULL; for (i = 0; i < def->ndisks; i++) { -if ((tmp = virStorageSourceFindByNodeName(def->disks[i]->src, - nodename, NULL))) { +if ((tmp = virStorageSourceFindByNodeName(def->disks[i]->src, nodename))) { if (src) *src = tmp; diff --git a/src/util/virstoragefile.c b/src/util/virstoragefile.c index 274883c4bd..00d8e16ef9 100644 --- a/src/util/virstoragefile.c +++ b/src/util/virstoragefile.c @@ -4589,33 +4589,23 @@ virStorageSourceIsRelative(virStorageSourcePtr src) * virStorageSourceFindByNodeName: * @top: backing chain top * @nodeName: node name to find in backing chain - * @index: if provided the index in the backing chain * * Looks up the given storage source in the backing chain and returns the - * pointer to it. If @index is passed then it's filled by the index in the - * backing chain. On failure NULL is returned and no error is reported. + * pointer to it. + * On failure NULL is returned and no error is reported. */ virStorageSourcePtr virStorageSourceFindByNodeName(virStorageSourcePtr top, - const char *nodeName, - unsigned int *idx) + const char *nodeName) { virStorageSourcePtr tmp; -if (idx) -*idx = 0; - for (tmp = top; virStorageSourceIsBacking(tmp); tmp = tmp->backingStore) { if ((tmp->nodeformat && STREQ(tmp->nodeformat, nodeName)) || (tmp->nodestorage && STREQ(tmp->nodestorage, nodeName))) return tmp; - -if (idx) -(*idx)++; } -if (idx) -*idx = 0; return NULL; } diff --git a/src/util/virstoragefile.h b/src/util/virstoragefile.h index c68bdc9680..f73b3ee005 100644 --- a/src/util/virstoragefile.h +++ b/src/util/virstoragefile.h @@ -526,8 +526,7 @@ bool virStorageSourceIsRelative(virStorageSourcePtr src); virStorageSourcePtr virStorageSourceFindByNodeName(virStorageSourcePtr top, - const char *nodeName, - unsigned int *index) + const char *nodeName) ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2); void -- 2.26.2
[PATCH 10/10] virDomainSetBlockThreshold: Mention that the event can be registered for
The infrastructure supports setting the threshold also for the . Mention it in the docs. https://bugzilla.redhat.com/show_bug.cgi?id=1807741 Signed-off-by: Peter Krempa --- src/libvirt-domain.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/src/libvirt-domain.c b/src/libvirt-domain.c index 6c5ff5b0db..a4e73d5480 100644 --- a/src/libvirt-domain.c +++ b/src/libvirt-domain.c @@ -12381,6 +12381,9 @@ int virDomainGetGuestInfo(virDomainPtr domain, * live VM XML for 'backingStore' or 'source' elements of a disk. If index is * given the threshold is set for the corresponding image. * + * Note that the threshold event can be registered also for destinations of a + * 'virDomainBlockCopy' destination by using the 'index' of the 'mirror' source. + * * Hypervisors report the last written sector of an image in the bulk stats API * (virConnectGetAllDomainStats/virDomainListGetStats) as * "block..allocation" in the VIR_DOMAIN_STATS_BLOCK group. The current -- 2.26.2
[PATCH 05/10] virDomainSetBlockThreshold: Clarify values of @dev the event is fired for
Top level image may get two events, one with the disk target (vda) and one with disk target with index (vda[3]) if the top level image has an index. Signed-off-by: Peter Krempa --- src/libvirt-domain.c | 4 1 file changed, 4 insertions(+) diff --git a/src/libvirt-domain.c b/src/libvirt-domain.c index ba30d18f65..6c5ff5b0db 100644 --- a/src/libvirt-domain.c +++ b/src/libvirt-domain.c @@ -12370,6 +12370,10 @@ int virDomainGetGuestInfo(virDomainPtr domain, * described by @dev is written beyond the set threshold level. The threshold * level is unset once the event fires. The event might not be delivered at all * if libvirtd was not running at the moment when the threshold was reached. + * Note that if the threshold level is reached for a top level image the event + * is emitted for @dev corresponding to the disk target, and may also be reported + * with @dev corresponding to the disk target with an index corresponding to the + * 'index' attribute of 'source' in the live VM XML if the atribute is present. * * @dev can either be a disk target name (vda, sda) or disk target with index ( * vda[4]). Without the index the top image in the backing chain will have the -- 2.26.2
[PATCH 01/10] virDomainSetBlockThreshold: Document values of '@dev' better
Mention where to obtain the index and how it's treated. Signed-off-by: Peter Krempa --- src/libvirt-domain.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/src/libvirt-domain.c b/src/libvirt-domain.c index fe4dab7cdf..ba30d18f65 100644 --- a/src/libvirt-domain.c +++ b/src/libvirt-domain.c @@ -12371,6 +12371,12 @@ int virDomainGetGuestInfo(virDomainPtr domain, * level is unset once the event fires. The event might not be delivered at all * if libvirtd was not running at the moment when the threshold was reached. * + * @dev can either be a disk target name (vda, sda) or disk target with index ( + * vda[4]). Without the index the top image in the backing chain will have the + * threshold set. The index corresponds to the 'index' attribute reported in the + * live VM XML for 'backingStore' or 'source' elements of a disk. If index is + * given the threshold is set for the corresponding image. + * * Hypervisors report the last written sector of an image in the bulk stats API * (virConnectGetAllDomainStats/virDomainListGetStats) as * "block..allocation" in the VIR_DOMAIN_STATS_BLOCK group. The current -- 2.26.2
[PATCH 03/10] qemuDomainDiskBackingStoreGetName: Eliminate temp variable
We can return the formatted string directly. Signed-off-by: Peter Krempa --- src/qemu/qemu_domain.c | 8 ++-- 1 file changed, 2 insertions(+), 6 deletions(-) diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 3d136a6b8a..cfdd9270da 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -11517,14 +11517,10 @@ char * qemuDomainDiskBackingStoreGetName(virDomainDiskDefPtr disk, unsigned int idx) { -char *ret = NULL; - if (idx) -ret = g_strdup_printf("%s[%d]", disk->dst, idx); +return g_strdup_printf("%s[%d]", disk->dst, idx); else -ret = g_strdup(disk->dst); - -return ret; +return g_strdup(disk->dst); } -- 2.26.2
[PATCH 02/10] qemuDomainDiskBackingStoreGetName: Remove unused argument
Signed-off-by: Peter Krempa --- src/qemu/qemu_domain.c | 1 - src/qemu/qemu_domain.h | 1 - src/qemu/qemu_process.c | 2 +- 3 files changed, 1 insertion(+), 3 deletions(-) diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 4a2daffc0a..3d136a6b8a 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -11515,7 +11515,6 @@ qemuDomainDiskLookupByNodename(virDomainDefPtr def, */ char * qemuDomainDiskBackingStoreGetName(virDomainDiskDefPtr disk, - virStorageSourcePtr src G_GNUC_UNUSED, unsigned int idx) { char *ret = NULL; diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h index e524fd0002..6944b37ff7 100644 --- a/src/qemu/qemu_domain.h +++ b/src/qemu/qemu_domain.h @@ -963,7 +963,6 @@ virDomainDiskDefPtr qemuDomainDiskLookupByNodename(virDomainDefPtr def, unsigned int *idx); char *qemuDomainDiskBackingStoreGetName(virDomainDiskDefPtr disk, -virStorageSourcePtr src, unsigned int idx); virStorageSourcePtr qemuDomainGetStorageSourceByDevstr(const char *devstr, diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 70fc24b993..2ee778c606 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -1514,7 +1514,7 @@ qemuProcessHandleBlockThreshold(qemuMonitorPtr mon G_GNUC_UNUSED, if (virStorageSourceIsLocalStorage(src)) path = src->path; -if ((dev = qemuDomainDiskBackingStoreGetName(disk, src, idx))) +if ((dev = qemuDomainDiskBackingStoreGetName(disk, idx))) event = virDomainEventBlockThresholdNewFromObj(vm, dev, path, threshold, excess); } -- 2.26.2
[PATCH 06/10] qemuDomainDiskLookupByNodename: Remove unused 'idx'
All callers pass NULL as the value. Remove the argument. Signed-off-by: Peter Krempa --- src/qemu/qemu_domain.c | 12 ++-- src/qemu/qemu_domain.h | 3 +-- src/qemu/qemu_process.c | 4 ++-- 3 files changed, 5 insertions(+), 14 deletions(-) diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index cfdd9270da..64bd52c51f 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -11469,7 +11469,6 @@ qemuDomainNamespaceTeardownInput(virDomainObjPtr vm, * @def: domain definition to look for the disk * @nodename: block backend node name to find * @src: filled with the specific backing store element if provided - * @idx: index of @src in the backing chain, if provided * * Looks up the disk in the domain via @nodename and returns its definition. * Optionally fills @src and @idx if provided with the specific backing chain @@ -11478,24 +11477,17 @@ qemuDomainNamespaceTeardownInput(virDomainObjPtr vm, virDomainDiskDefPtr qemuDomainDiskLookupByNodename(virDomainDefPtr def, const char *nodename, - virStorageSourcePtr *src, - unsigned int *idx) + virStorageSourcePtr *src) { size_t i; -unsigned int srcindex; virStorageSourcePtr tmp = NULL; -if (!idx) -idx = &srcindex; - if (src) *src = NULL; -*idx = 0; - for (i = 0; i < def->ndisks; i++) { if ((tmp = virStorageSourceFindByNodeName(def->disks[i]->src, - nodename, idx))) { + nodename, NULL))) { if (src) *src = tmp; diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h index 6944b37ff7..d75fbc0af3 100644 --- a/src/qemu/qemu_domain.h +++ b/src/qemu/qemu_domain.h @@ -959,8 +959,7 @@ int qemuDomainNamespaceTeardownInput(virDomainObjPtr vm, virDomainDiskDefPtr qemuDomainDiskLookupByNodename(virDomainDefPtr def, const char *nodename, - virStorageSourcePtr *src, - unsigned int *idx); + virStorageSourcePtr *src); char *qemuDomainDiskBackingStoreGetName(virDomainDiskDefPtr disk, unsigned int idx); diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 9b6dc8e68b..580abb0fb4 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -880,7 +880,7 @@ qemuProcessHandleIOError(qemuMonitorPtr mon G_GNUC_UNUSED, if (diskAlias) disk = qemuProcessFindDomainDiskByAliasOrQOM(vm, diskAlias, NULL); else if (nodename) -disk = qemuDomainDiskLookupByNodename(vm->def, nodename, NULL, NULL); +disk = qemuDomainDiskLookupByNodename(vm->def, nodename, NULL); else disk = NULL; @@ -1509,7 +1509,7 @@ qemuProcessHandleBlockThreshold(qemuMonitorPtr mon G_GNUC_UNUSED, "threshold '%llu' exceeded by '%llu'", nodename, vm, vm->def->name, threshold, excess); -if ((disk = qemuDomainDiskLookupByNodename(vm->def, nodename, &src, NULL))) { +if ((disk = qemuDomainDiskLookupByNodename(vm->def, nodename, &src))) { if (virStorageSourceIsLocalStorage(src)) path = src->path; -- 2.26.2
[PATCH 08/10] qemuDomainDiskLookupByNodename: Look also for 'mirror' node names
When doing a block copy, there is another chain of images attached to a disk. Consider them as well when looking up a disk using nodename. Signed-off-by: Peter Krempa --- src/qemu/qemu_domain.c | 8 1 file changed, 8 insertions(+) diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index ed7ec77ed4..18fd445e30 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -11492,6 +11492,14 @@ qemuDomainDiskLookupByNodename(virDomainDefPtr def, return def->disks[i]; } + +if (def->disks[i]->mirror && +(tmp = virStorageSourceFindByNodeName(def->disks[i]->mirror, nodename))) { +if (src) +*src = tmp; + +return def->disks[i]; +} } return NULL; -- 2.26.2
[PATCH 09/10] qemuDomainGetStorageSourceByDevstr: Look also in 'mirror' chain
A disk can have a mirror, look also in it's backing chain. Signed-off-by: Peter Krempa --- src/qemu/qemu_domain.c | 13 + 1 file changed, 9 insertions(+), 4 deletions(-) diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 18fd445e30..ebf18a546e 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -11553,11 +11553,16 @@ qemuDomainGetStorageSourceByDevstr(const char *devstr, } if (idx == 0) -src = disk->src; -else -src = virStorageFileChainLookup(disk->src, NULL, NULL, idx, NULL); +return disk->src; + +if ((src = virStorageFileChainLookup(disk->src, NULL, NULL, idx, NULL))) +return src; -return src; +if (disk->mirror && +(src = virStorageFileChainLookup(disk->mirror, NULL, NULL, idx, NULL))) +return src; + +return NULL; } -- 2.26.2
[PATCH 00/10] Fix device names reported by 'write_threshold' event and add support for monitoring 'mirror'
Allow monitoring the 'mirror' image write threshold and also report correct aliases. Clean up some leftovers and improve docs while at it. Peter Krempa (10): virDomainSetBlockThreshold: Document values of '@dev' better qemuDomainDiskBackingStoreGetName: Remove unused argument qemuDomainDiskBackingStoreGetName: Eliminate temp variable qemuProcessHandleBlockThreshold: Report correct indexes virDomainSetBlockThreshold: Clarify values of @dev the event is fired for qemuDomainDiskLookupByNodename: Remove unused 'idx' virStorageSourceFindByNodeName: Remove unused 'idx' argument qemuDomainDiskLookupByNodename: Look also for 'mirror' node names qemuDomainGetStorageSourceByDevstr: Look also in 'mirror' chain virDomainSetBlockThreshold: Mention that the event can be registered for src/libvirt-domain.c | 13 +++ src/qemu/qemu_domain.c| 49 +++ src/qemu/qemu_domain.h| 4 +--- src/qemu/qemu_process.c | 28 +++--- src/util/virstoragefile.c | 16 +++-- src/util/virstoragefile.h | 3 +-- 6 files changed, 61 insertions(+), 52 deletions(-) -- 2.26.2
[PATCH 04/10] qemuProcessHandleBlockThreshold: Report correct indexes
The index returned by qemuDomainDiskLookupByNodename is the position in the backing chain rather than the index we report in the XML. Since with -blockdev they differ now and additionally the disk source also has an index we need to fix the 'threshold' evens we report: 1) If it's the top level image we must always trigger the event without any suffix as we did until now 2) We must report the correct index 3) We must report the correct index also for the top level image, when blockdev is used. This means that we need to potentially emit 2 events, one for the device without the index and then when blockdev is used and the top level image has an idex we must do it also with the index. This will fix it for blockdev cases, while also not removing previous semantics. https://bugzilla.redhat.com/show_bug.cgi?id=1857204 Signed-off-by: Peter Krempa --- src/qemu/qemu_process.c | 26 ++ 1 file changed, 18 insertions(+), 8 deletions(-) diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 2ee778c606..9b6dc8e68b 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -1497,10 +1497,10 @@ qemuProcessHandleBlockThreshold(qemuMonitorPtr mon G_GNUC_UNUSED, void *opaque) { virQEMUDriverPtr driver = opaque; -virObjectEventPtr event = NULL; +virObjectEventPtr eventSource = NULL; +virObjectEventPtr eventDevice = NULL; virDomainDiskDefPtr disk; virStorageSourcePtr src; -unsigned int idx; const char *path = NULL; virObjectLock(vm); @@ -1509,18 +1509,28 @@ qemuProcessHandleBlockThreshold(qemuMonitorPtr mon G_GNUC_UNUSED, "threshold '%llu' exceeded by '%llu'", nodename, vm, vm->def->name, threshold, excess); -if ((disk = qemuDomainDiskLookupByNodename(vm->def, nodename, &src, &idx))) { -g_autofree char *dev = NULL; +if ((disk = qemuDomainDiskLookupByNodename(vm->def, nodename, &src, NULL))) { if (virStorageSourceIsLocalStorage(src)) path = src->path; -if ((dev = qemuDomainDiskBackingStoreGetName(disk, idx))) -event = virDomainEventBlockThresholdNewFromObj(vm, dev, path, - threshold, excess); +if (src == disk->src) { +g_autofree char *dev = qemuDomainDiskBackingStoreGetName(disk, 0); + +eventDevice = virDomainEventBlockThresholdNewFromObj(vm, dev, path, + threshold, excess); +} + +if (src->id != 0) { +g_autofree char *dev = qemuDomainDiskBackingStoreGetName(disk, src->id); + +eventSource = virDomainEventBlockThresholdNewFromObj(vm, dev, path, + threshold, excess); +} } virObjectUnlock(vm); -virObjectEventStateQueue(driver->domainEventState, event); +virObjectEventStateQueue(driver->domainEventState, eventDevice); +virObjectEventStateQueue(driver->domainEventState, eventSource); return 0; } -- 2.26.2
Re: [PATCH] Substitute security_context_t with char *
On Wed, 2020-07-15 at 13:45 +0200, Michal Privoznik wrote: > Historically, we've used security_context_t for variables passed > to libselinux APIs. But almost 7 years ago, libselinux developers > admitted in their API that in fact, it's just a 'char *' type > [1]. Ever since then the APIs accept 'char *' instead, but they > kept the old alias just for API stability. Well, not anymore [2]. > > 1: > https://github.com/SELinuxProject/selinux/commit/9eb9c9327563014ad6a807814e7975424642d5b9 > 2: > https://github.com/SELinuxProject/selinux/commit/7a124ca2758136f49cc38efc26fb1a2d385ecfd9 > > Signed-off-by: Michal Privoznik > --- > src/libvirt-lxc.c| 2 +- > src/rpc/virnetsocket.c | 2 +- > src/security/security_selinux.c | 26 +- > src/storage/storage_util.c | 2 +- > src/util/viridentity.c | 2 +- > tests/securityselinuxhelper.c| 16 > tests/securityselinuxlabeltest.c | 4 ++-- > tests/securityselinuxtest.c | 2 +- > tests/viridentitytest.c | 2 +- > 9 files changed, 29 insertions(+), 29 deletions(-) Reviewed-by: Andrea Bolognani -- Andrea Bolognani / Red Hat / Virtualization
Re: [libvirt PATCH 9/9] rpc: use new virt-nc binary for remote tunnelling
On Wed, 2020-07-15 at 11:00 +0100, Daniel P. Berrangé wrote: > On Fri, Jul 10, 2020 at 07:21:47PM +0200, Andrea Bolognani wrote: > > Just a couple of comments about the UI: would it make sense to use > > something like > > > > qemu+ssh://host/system?tunnelcmd=virt-tunnel > > > > instead? virt-nc could be seen as a bit of a misnomer, considering > > that (by design) it doesn't do nearly as much as the actual netcat > > does; as for the 'proxy' argument, I'm afraid it might lead people > > to believe it's used for HTTP proxying or some other form of > > proxying *between the client and the host*, whereas it's really > > something that only affects operations on the host itself - not to > > mention that we also have a virtproxyd now, further increasing the > > potential for confusion... > > I chose proxy not tunnel, because SSH is providing the tunnel here. > virt-nc is a proxy linking the tunnel to the daemon. virtproxyd is > conceptually similar, again linking a libvirt client to the real > daemon. Mh, that makes sense but I'm still wary of using "proxy" due to the potential for confusion, since in this case the proxy is on the opposite side of the connection than one would probably expect it to be. Something like "remoteproxy" or "serverproxy", perhaps? > I probably shouldn't mention "virt-nc" by name in the URI and instead > have "proxy=netcat" vs "proxy=native", as users don't get to choose > the actual binary here - they are providing an enum string, which > gets mapped to the desired binary. Yeah, that's much better. Going back to the name of the command itself, since it's an internal implementation details, and as such it's not intended to be invoked by users and accordingly we're installing it under $(libexecdir) along with existing helpers, what about following the established naming convention and calling it 'libvirt_proxyhelper'? -- Andrea Bolognani / Red Hat / Virtualization
[PATCH 0/3] Fix virnetsocket failure in IPv4 disabled environment
This imperfection was reported by Andrea here: https://www.redhat.com/archives/libvir-list/2020-July/msg00753.html Michal Prívozník (3): virNetSocketCheckProtocols: Separate out checking family via getaddrinfo() virNetSocketCheckProtocols: lookup IPv6 only if suspecting IPv6 virNetSocketCheckProtocols: Confirm IPv4 by lookup too src/rpc/virnetsocket.c | 67 +++--- 1 file changed, 43 insertions(+), 24 deletions(-) -- 2.26.2
[PATCH 2/3] virNetSocketCheckProtocols: lookup IPv6 only if suspecting IPv6
There is not much sense trying to disprove host is IPv6 capable if we know after first round (getifaddrs()) that is is not. Signed-off-by: Michal Privoznik --- src/rpc/virnetsocket.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/rpc/virnetsocket.c b/src/rpc/virnetsocket.c index b6bc3edc3b..b0d63f0f2c 100644 --- a/src/rpc/virnetsocket.c +++ b/src/rpc/virnetsocket.c @@ -205,7 +205,8 @@ int virNetSocketCheckProtocols(bool *hasIPv4, freeifaddrs(ifaddr); -if (virNetSocketCheckProtocolByLookup("::1", AF_INET6, hasIPv6) < 0) +if (hasIPv6 && +virNetSocketCheckProtocolByLookup("::1", AF_INET6, hasIPv6) < 0) return -1; VIR_DEBUG("Protocols: v4 %d v6 %d", *hasIPv4, *hasIPv6); -- 2.26.2
[PATCH 3/3] virNetSocketCheckProtocols: Confirm IPv4 by lookup too
Historically, if we found an interface with an IPv6 address we did not blindly trust that host is IPv6 capable (as in we can successfully translate IPv4 addresses) but used getaddrinfo() to confirm it. Turns out, we have use the same argument for IPv4. For instance, in an namespace created by the following steps, getaddrinfo("127.0.0.1", ...) fails (demonstrating by "Socket TCP/IPv4 Accept" test case failing in virnetsockettest): unshare -n ip link set lo up ip link add dummy0 type dummy ip link set dummy0 up Signed-off-by: Michal Privoznik --- src/rpc/virnetsocket.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/src/rpc/virnetsocket.c b/src/rpc/virnetsocket.c index b0d63f0f2c..c62c2fb3fc 100644 --- a/src/rpc/virnetsocket.c +++ b/src/rpc/virnetsocket.c @@ -204,6 +204,9 @@ int virNetSocketCheckProtocols(bool *hasIPv4, freeifaddrs(ifaddr); +if (hasIPv4 && +virNetSocketCheckProtocolByLookup("127.0.0.1", AF_INET, hasIPv4) < 0) +return -1; if (hasIPv6 && virNetSocketCheckProtocolByLookup("::1", AF_INET6, hasIPv6) < 0) -- 2.26.2
[PATCH 1/3] virNetSocketCheckProtocols: Separate out checking family via getaddrinfo()
The virNetSocketCheckProtocols() function is supposed to tell caller whether IPv4 and/or IPv6 is supported on the system. In the initial round, it uses getifaddrs() to see if an interface has IPv4/IPv6 address assigned and then to double check IPv6 it uses getaddrinfo() to lookup IPv6 loopback address. Separate out this latter code because it is going to be reused. Since the original code lived under an #ifdef and the new function doesn't it is marked as unused - because on some systems it may be so. Signed-off-by: Michal Privoznik --- src/rpc/virnetsocket.c | 63 ++ 1 file changed, 39 insertions(+), 24 deletions(-) diff --git a/src/rpc/virnetsocket.c b/src/rpc/virnetsocket.c index d1f4c531aa..b6bc3edc3b 100644 --- a/src/rpc/virnetsocket.c +++ b/src/rpc/virnetsocket.c @@ -141,16 +141,48 @@ static int virNetSocketForkDaemon(const char *binary) } #endif + +static int G_GNUC_UNUSED +virNetSocketCheckProtocolByLookup(const char *address, + int family, + bool *hasFamily) +{ +struct addrinfo hints; +struct addrinfo *ai = NULL; +int gaierr; + +memset(&hints, 0, sizeof(hints)); +hints.ai_family = family; +hints.ai_flags = AI_PASSIVE | AI_ADDRCONFIG; +hints.ai_socktype = SOCK_STREAM; + +if ((gaierr = getaddrinfo(address, NULL, &hints, &ai)) != 0) { +*hasFamily = false; + +if (gaierr == EAI_FAMILY || +#ifdef EAI_ADDRFAMILY +gaierr == EAI_ADDRFAMILY || +#endif +gaierr == EAI_NONAME) { +} else { +virReportError(VIR_ERR_INTERNAL_ERROR, + _("Cannot resolve ::1 address: %s"), + gai_strerror(gaierr)); +return -1; +} +} else { +*hasFamily = true; +} + +freeaddrinfo(ai); +return 0; +} + int virNetSocketCheckProtocols(bool *hasIPv4, bool *hasIPv6) { #ifdef HAVE_IFADDRS_H struct ifaddrs *ifaddr = NULL, *ifa; -struct addrinfo hints; -struct addrinfo *ai = NULL; -int gaierr; - -memset(&hints, 0, sizeof(hints)); *hasIPv4 = *hasIPv6 = false; @@ -172,26 +204,9 @@ int virNetSocketCheckProtocols(bool *hasIPv4, freeifaddrs(ifaddr); -hints.ai_flags = AI_PASSIVE | AI_ADDRCONFIG; -hints.ai_family = AF_INET6; -hints.ai_socktype = SOCK_STREAM; -if ((gaierr = getaddrinfo("::1", NULL, &hints, &ai)) != 0) { -if (gaierr == EAI_FAMILY || -# ifdef EAI_ADDRFAMILY -gaierr == EAI_ADDRFAMILY || -# endif -gaierr == EAI_NONAME) { -*hasIPv6 = false; -} else { -virReportError(VIR_ERR_INTERNAL_ERROR, - _("Cannot resolve ::1 address: %s"), - gai_strerror(gaierr)); -return -1; -} -} - -freeaddrinfo(ai); +if (virNetSocketCheckProtocolByLookup("::1", AF_INET6, hasIPv6) < 0) +return -1; VIR_DEBUG("Protocols: v4 %d v6 %d", *hasIPv4, *hasIPv6); -- 2.26.2
[PATCH] Substitute security_context_t with char *
Historically, we've used security_context_t for variables passed to libselinux APIs. But almost 7 years ago, libselinux developers admitted in their API that in fact, it's just a 'char *' type [1]. Ever since then the APIs accept 'char *' instead, but they kept the old alias just for API stability. Well, not anymore [2]. 1: https://github.com/SELinuxProject/selinux/commit/9eb9c9327563014ad6a807814e7975424642d5b9 2: https://github.com/SELinuxProject/selinux/commit/7a124ca2758136f49cc38efc26fb1a2d385ecfd9 Signed-off-by: Michal Privoznik --- src/libvirt-lxc.c| 2 +- src/rpc/virnetsocket.c | 2 +- src/security/security_selinux.c | 26 +- src/storage/storage_util.c | 2 +- src/util/viridentity.c | 2 +- tests/securityselinuxhelper.c| 16 tests/securityselinuxlabeltest.c | 4 ++-- tests/securityselinuxtest.c | 2 +- tests/viridentitytest.c | 2 +- 9 files changed, 29 insertions(+), 29 deletions(-) diff --git a/src/libvirt-lxc.c b/src/libvirt-lxc.c index 47a06a39f2..25f1cfc5f7 100644 --- a/src/libvirt-lxc.c +++ b/src/libvirt-lxc.c @@ -204,7 +204,7 @@ virDomainLxcEnterSecurityLabel(virSecurityModelPtr model, if (STREQ(model->model, "selinux")) { #ifdef WITH_SELINUX if (oldlabel) { -security_context_t ctx; +char *ctx; if (getcon(&ctx) < 0) { virReportSystemError(errno, diff --git a/src/rpc/virnetsocket.c b/src/rpc/virnetsocket.c index c62c2fb3fc..9aaabb4577 100644 --- a/src/rpc/virnetsocket.c +++ b/src/rpc/virnetsocket.c @@ -1612,7 +1612,7 @@ int virNetSocketGetUNIXIdentity(virNetSocketPtr sock G_GNUC_UNUSED, int virNetSocketGetSELinuxContext(virNetSocketPtr sock, char **context) { -security_context_t seccon = NULL; +char *seccon = NULL; int ret = -1; *context = NULL; diff --git a/src/security/security_selinux.c b/src/security/security_selinux.c index 1d28430035..cc8fb1099c 100644 --- a/src/security/security_selinux.c +++ b/src/security/security_selinux.c @@ -198,7 +198,7 @@ virSecuritySELinuxTransactionAppend(const char *path, static int virSecuritySELinuxRememberLabel(const char *path, -const security_context_t con) +const char *con) { return virSecuritySetRememberedLabel(SECURITY_SELINUX_NAME, path, con); @@ -207,7 +207,7 @@ virSecuritySELinuxRememberLabel(const char *path, static int virSecuritySELinuxRecallLabel(const char *path, - security_context_t *con) + char **con) { int rv; @@ -431,7 +431,7 @@ virSecuritySELinuxMCSGetProcessRange(char **sens, int *catMin, int *catMax) { -security_context_t ourSecContext = NULL; +char *ourSecContext = NULL; context_t ourContext = NULL; char *cat = NULL; char *tmp; @@ -530,8 +530,8 @@ virSecuritySELinuxMCSGetProcessRange(char **sens, } static char * -virSecuritySELinuxContextAddRange(security_context_t src, - security_context_t dst) +virSecuritySELinuxContextAddRange(char *src, + char *dst) { char *str = NULL; char *ret = NULL; @@ -575,7 +575,7 @@ virSecuritySELinuxGenNewContext(const char *basecontext, context_t context = NULL; char *ret = NULL; char *str; -security_context_t ourSecContext = NULL; +char *ourSecContext = NULL; context_t ourContext = NULL; VIR_DEBUG("basecontext=%s mcs=%s isObjectContext=%d", @@ -955,7 +955,7 @@ virSecuritySELinuxReserveLabel(virSecurityManagerPtr mgr, virDomainDefPtr def, pid_t pid) { -security_context_t pctx; +char *pctx; context_t ctx = NULL; const char *mcs; int rv; @@ -1203,7 +1203,7 @@ virSecuritySELinuxGetProcessLabel(virSecurityManagerPtr mgr G_GNUC_UNUSED, pid_t pid, virSecurityLabelPtr sec) { -security_context_t ctx; +char *ctx; if (getpidcon_raw(pid, &ctx) == -1) { virReportSystemError(errno, @@ -1316,7 +1316,7 @@ virSecuritySELinuxSetFilecon(virSecurityManagerPtr mgr, bool remember) { bool privileged = virSecurityManagerGetPrivileged(mgr); -security_context_t econ = NULL; +char *econ = NULL; int refcount; int rc; bool rollback = false; @@ -1426,7 +1426,7 @@ virSecuritySELinuxFSetFilecon(int fd, char *tcon) /* Set fcon to the appropriate label for path and mode, or return -1. */ static int getContext(virSecurityManagerPtr mgr G_GNUC_UNUSED, - const char *newpath, mode_t mode, security_context_t *fcon) + const char *newpath, mode_t mo
Re: [libvirt PATCH 0/2] tests: Don't assume IPv4 connectivity is available
On Wed, 2020-07-15 at 12:38 +0200, Michal Privoznik wrote: > On 7/14/20 10:32 PM, Andrea Bolognani wrote: > > I started looking into this after seeing > > > >FAIL: virnetsockettest > >== > > > >TEST: virnetsockettest > > 1) Socket TCP/IPv4 Accept ... libvirt: XML-RPC > > error : Unable to resolve address '127.0.0.1' service '5672': Address > > family for hostname not supported > >FAILED > > during a Debian package build. > > > > Full log: > > > > > > https://buildd.debian.org/status/fetch.php?pkg=libvirt&arch=armel&ver=6.4.0-2&stamp=1594738948&raw=0 > > > > Just a few days ago, this issue was discussed in > > > >https://lists.debian.org/debian-devel/2020/07/msg00070.html > > > > and libvirt was mentioned explicitly as a package that could be > > affected by it. > > Indeed. I'm able to reproduce and working on a fix as we speak. > The problem is that our test assumes that if there is an interface with > IPv4 address (as returned by getifaddrs()) then getaddrinfo() of an IPv4 > address succeeds. We've seen with IPv6 that it is not true - that's why > virNetSocketCheckProtocols() does explicit getaddrinfo() for an IPv6 > address. We need to do the same for IPv4. That's what I thought needed to happen, but I lack the networking know-how to write the patches myself O:-) > Anyway, for these two: > > Reviewed-by: Michal Privoznik Thanks, pushed now. -- Andrea Bolognani / Red Hat / Virtualization
Re: [libvirt PATCH 0/2] tests: Don't assume IPv4 connectivity is available
On 7/14/20 10:32 PM, Andrea Bolognani wrote: I started looking into this after seeing FAIL: virnetsockettest == TEST: virnetsockettest 1) Socket TCP/IPv4 Accept ... libvirt: XML-RPC error : Unable to resolve address '127.0.0.1' service '5672': Address family for hostname not supported FAILED during a Debian package build. Full log: https://buildd.debian.org/status/fetch.php?pkg=libvirt&arch=armel&ver=6.4.0-2&stamp=1594738948&raw=0 Just a few days ago, this issue was discussed in https://lists.debian.org/debian-devel/2020/07/msg00070.html and libvirt was mentioned explicitly as a package that could be affected by it. Indeed. I'm able to reproduce and working on a fix as we speak. The problem is that our test assumes that if there is an interface with IPv4 address (as returned by getifaddrs()) then getaddrinfo() of an IPv4 address succeeds. We've seen with IPv6 that it is not true - that's why virNetSocketCheckProtocols() does explicit getaddrinfo() for an IPv6 address. We need to do the same for IPv4. Anyway, for these two: Reviewed-by: Michal Privoznik Michal
Re: Re: [RFC 01/21] build-aux: Add a tool to generate xml parse/format functions
On Wed, Jul 01, 2020 at 12:06:36AM +0800, Shi Lei wrote: > >On Wed, Jun 10, 2020 at 09:20:29AM +0800, Shi Lei wrote: > >> This tool is used to generate parsexml/formatbuf functions for structs. > >> It is based on libclang and its python-binding. > >> Some directives (such as genparse, xmlattr, etc.) need to be added on > >> the declarations of structs to direct the tool. > >> > >> Signed-off-by: Shi Lei > >> --- > >> build-aux/generator/directive.py | 839 +++ > >> build-aux/generator/go | 14 + > >> build-aux/generator/main.py | 416 +++ > >> build-aux/generator/utils.py | 100 > > > >> +if __name__ == "__main__": > > > >> + > >> + libclang_path = os.environ.get('libclang_path') > >> + assert libclang_path, 'No libclang library.' > >> + Config.set_library_file(libclang_path) > > > >I'm wondering if we really need this at all ? AFAICT, the libclang > >python library works fine without it, so this is only required if > >trying to use a non-standard libclang, which I think ought to be > >possible already by setting LD_LIBRARY_PATH > > > > I'm working on Ubuntu. On Ubuntu 20.04 LTS and 18.04, I find: > > # ldconfig -p | grep libclang > libclang-10.so.1 (libc6,x86-64) => > /lib/x86_64-linux-gnu/libclang-10.so.1 > libclang-10.so (libc6,x86-64) => /lib/x86_64-linux-gnu/libclang-10.so > > There's no libclang.so! > > If I install python3 bindings from the standard apt repository, it works! > By default, this version just find 'libclang-10.so'. > > But if I use 'pip3' to install python3 bindings, it doesn't work. > Because it is another verison. This version find 'libclang.so' and it can't > find it! > > So I think we can retain these lines to adapt to the distinction among > platforms. Generally libvirt only cares about working against the distro provided versions of packages. So if the apt installed python binding works that is sufficient. If we want to support an alternative libclang though, I'd suggest that instead of trying to figure it out magically, just have an optional arg to configure, eg --with-libclang=/lib/x86_64-linux-gnu/libclang-10.so By default don't set any libclang in the python binding - just let it use its built-in default. Regards, Daniel -- |: https://berrange.com -o-https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o-https://fstop138.berrange.com :| |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|
Re: [libvirt PATCH 9/9] rpc: use new virt-nc binary for remote tunnelling
On Fri, Jul 10, 2020 at 07:21:47PM +0200, Andrea Bolognani wrote: > On Thu, 2020-07-09 at 19:36 +0100, Daniel P. Berrangé wrote: > > This wires up support for using the new virt-nc binary with the ssh, > > libssh and libssh2 protocols. > > > > The new binary will be used preferentially if it is available in $PATH, > > otherwise we fall back to traditional netcat. > > > > The "proxy" URI parameter can be used to force use of netcat e.g. > > > > qemu+ssh://host/system?proxy=netcat > > > > or the disable fallback e.g. > > > > qemu+ssh://host/system?proxy=virt-nc > > > > With use of virt-nc, we can now support remote session URIs > > > > qemu+ssh://host/session > > > > and this will only use virt-nc, with no fallback. This also lets the > > libvirtd process be auto-started. > > Just a couple of comments about the UI: would it make sense to use > something like > > qemu+ssh://host/system?tunnelcmd=virt-tunnel > > instead? virt-nc could be seen as a bit of a misnomer, considering > that (by design) it doesn't do nearly as much as the actual netcat > does; as for the 'proxy' argument, I'm afraid it might lead people > to believe it's used for HTTP proxying or some other form of > proxying *between the client and the host*, whereas it's really > something that only affects operations on the host itself - not to > mention that we also have a virtproxyd now, further increasing the > potential for confusion... I chose proxy not tunnel, because SSH is providing the tunnel here. virt-nc is a proxy linking the tunnel to the daemon. virtproxyd is conceptually similar, again linking a libvirt client to the real daemon. I probably shouldn't mention "virt-nc" by name in the URI and instead have "proxy=netcat" vs "proxy=native", as users don't get to choose the actual binary here - they are providing an enum string, which gets mapped to the desired binary. Regards, Daniel -- |: https://berrange.com -o-https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o-https://fstop138.berrange.com :| |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|
Re: [libvirt PATCH 7/9] remote: introduce virtd-nc helper binary
On Fri, Jul 10, 2020 at 02:04:00PM +0200, Michal Privoznik wrote: > On 7/9/20 8:36 PM, Daniel P. Berrangé wrote: > > When accessing libvirtd over a SSH tunnel, the remote driver must spawn > > the remote 'nc' process, pointing it to the libvirtd socket path. This > > is problematic for a number of reasons: > > > > - The socket path varies according to the --prefix chosen at build > > time. The remote client is seeing the local prefix, but what we > > need is the remote prefix > > > > - The socket path varies according to remote env variables, such as > > the XDG_RUNTIME_DIR location. Again we see the local XDG_RUNTIME_DIR > > value, but what we need is the remote value (if any) > > > > - We can not able to autospawn the libvirtd daemon for session mode > > access > > > > To address these problems this patch introduces the 'virtd-nc' helper > > program which takes the URI for the remote driver as a CLI parameter. > > It then figures out the socket path to connect to using the same > > code as the remote driver does on the remote host. > > > > Signed-off-by: Daniel P. Berrangé > > --- > > build-aux/syntax-check.mk | 2 +- > > po/POTFILES.in | 1 + > > src/remote/Makefile.inc.am | 30 +++ > > src/remote/remote_nc.c | 424 + > > src/rpc/virnetsocket.h | 1 + > > 5 files changed, 457 insertions(+), 1 deletion(-) > > create mode 100644 src/remote/remote_nc.c > > The spec file needs to be updated too. Oh right, of course. Regards, Daniel -- |: https://berrange.com -o-https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o-https://fstop138.berrange.com :| |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|
Re: [GSoC][PATCH v4 4/4] qemu_domainjob: introduce `privateData` for `qemuDomainJobInfo`
On Tue, Jul 14, 2020 at 10:25 PM Michal Privoznik wrote: > > On 7/14/20 5:14 PM, Prathamesh Chavan wrote: > > Currently, domainJobInfo also uses "stats" as one of the job specific > > parameters. To remove this dependency, a privateData structure is > > introduced. > > > > The plan is to even have this structure renamed as > > `virDomainJobInfoInternal` as there already exists a > > `virDomainJobInfo'. > > I see. Well, I'm not that sure about virDomainJobInfoInternal (mostly > because this qemuDomainJobInfo structure looks too qemu specific). > How about moving it under qemuDomainJobObj privateData? I mean, > qemuDomainJobPrivate structure you propose in 3/4? Yes, this can be done. This shall be sufficient to remove qemu_domainjob dependency on other files. Also, since libxl simply uses the virDomainJobInfo, I think we can skip creating the virDomainJobInfoInternal altogether. Thanks.
Re: device compatibility interface for live migration with assigned devices
Yan Zhao 于2020年7月15日周三 下午4:32写道: > On Tue, Jul 14, 2020 at 02:59:48PM -0600, Alex Williamson wrote: > > On Tue, 14 Jul 2020 18:19:46 +0100 > > "Dr. David Alan Gilbert" wrote: > > > > > * Alex Williamson (alex.william...@redhat.com) wrote: > > > > On Tue, 14 Jul 2020 11:21:29 +0100 > > > > Daniel P. Berrangé wrote: > > > > > > > > > On Tue, Jul 14, 2020 at 07:29:57AM +0800, Yan Zhao wrote: > > > > > > hi folks, > > > > > > we are defining a device migration compatibility interface that > helps upper > > > > > > layer stack like openstack/ovirt/libvirt to check if two devices > are > > > > > > live migration compatible. > > > > > > The "devices" here could be MDEVs, physical devices, or hybrid > of the two. > > > > > > e.g. we could use it to check whether > > > > > > - a src MDEV can migrate to a target MDEV, > > > > > > - a src VF in SRIOV can migrate to a target VF in SRIOV, > > > > > > - a src MDEV can migration to a target VF in SRIOV. > > > > > > (e.g. SIOV/SRIOV backward compatibility case) > > > > > > > > > > > > The upper layer stack could use this interface as the last step > to check > > > > > > if one device is able to migrate to another device before > triggering a real > > > > > > live migration procedure. > > > > > > we are not sure if this interface is of value or help to you. > please don't > > > > > > hesitate to drop your valuable comments. > > > > > > > > > > > > > > > > > > (1) interface definition > > > > > > The interface is defined in below way: > > > > > > > > > > > > __userspace > > > > > > /\ \ > > > > > > / \write > > > > > > / read \ > > > > > >/__ ___\|/_ > > > > > > | migration_version | | migration_version |-->check > migration > > > > > > - - compatibility > > > > > > device Adevice B > > > > > > > > > > > > > > > > > > a device attribute named migration_version is defined under each > device's > > > > > > sysfs node. e.g. > (/sys/bus/pci/devices/\:00\:02.0/$mdev_UUID/migration_version). > > > > > > userspace tools read the migration_version as a string from the > source device, > > > > > > and write it to the migration_version sysfs attribute in the > target device. > > > > > > > > > > > > The userspace should treat ANY of below conditions as two > devices not compatible: > > > > > > - any one of the two devices does not have a migration_version > attribute > > > > > > - error when reading from migration_version attribute of one > device > > > > > > - error when writing migration_version string of one device to > > > > > > migration_version attribute of the other device > > > > > > > > > > > > The string read from migration_version attribute is defined by > device vendor > > > > > > driver and is completely opaque to the userspace. > > > > > > for a Intel vGPU, string format can be defined like > > > > > > "parent device PCI ID" + "version of gvt driver" + "mdev type" + > "aggregator count". > > > > > > > > > > > > for an NVMe VF connecting to a remote storage. it could be > > > > > > "PCI ID" + "driver version" + "configured remote storage URL" > > > > > > > > > > > > for a QAT VF, it may be > > > > > > "PCI ID" + "driver version" + "supported encryption set". > > > > > > > > > > > > (to avoid namespace confliction from each vendor, we may prefix > a driver name to > > > > > > each migration_version string. e.g. > i915-v1-8086-591d-i915-GVTg_V5_8-1) > > > > > > > > It's very strange to define it as opaque and then proceed to describe > > > > the contents of that opaque string. The point is that its contents > > > > are defined by the vendor driver to describe the device, driver > version, > > > > and possibly metadata about the configuration of the device. One > > > > instance of a device might generate a different string from another. > > > > The string that a device produces is not necessarily the only string > > > > the vendor driver will accept, for example the driver might support > > > > backwards compatible migrations. > > > > > > (As I've said in the previous discussion, off one of the patch series) > > > > > > My view is it makes sense to have a half-way house on the opaqueness of > > > this string; I'd expect to have an ID and version that are human > > > readable, maybe a device ID/name that's human interpretable and then a > > > bunch of other cruft that maybe device/vendor/version specific. > > > > > > I'm thinking that we want to be able to report problems and include the > > > string and the user to be able to easily identify the device that was > > > complaining and notice a difference in versions, and perhaps also use > > > it in compatibility patterns to find compatible hosts; but that does > > > get tricky when it's a 'ask the device if it's compatible'. > > > > In the reply I just sent to Dan, I gave this example of what a > > "
Re: device compatibility interface for live migration with assigned devices
On Tue, Jul 14, 2020 at 02:47:15PM -0600, Alex Williamson wrote: > On Tue, 14 Jul 2020 17:47:22 +0100 > Daniel P. Berrangé wrote: > > I'm sure OpenStack maintainers can speak to this more, as they've put > > alot of work into their scheduling engine to optimize the way it places > > VMs largely driven from simple structured data reported from hosts. > > I think we've weeded out that our intended approach is not worthwhile, > testing a compatibility string at a device is too much overhead, we > need to provide enough information to the management engine to predict > the response without interaction beyond the initial capability probing. Just to clarify in case people mis-interpreted my POV... I think that testing a compatibility string at a device *is* useful, as it allows for a final accurate safety check to be performed before the migration stream starts. Libvirt could use that reasonably easily I believe. It just isn't sufficient for a complete solution. In parallel with the device level test in sysfs, we need something else to support the host placement selection problems in an efficient way, as you are trying to address in the remainder of your mail. Regards, Daniel -- |: https://berrange.com -o-https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o-https://fstop138.berrange.com :| |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|
RE: device compatibility interface for live migration with assigned devices
-Original Message- From: Zhao, Yan Y Sent: 2020骞�7���15��� 16:21 To: Alex Williamson Cc: Dr. David Alan Gilbert ; Daniel P. Berrang茅 ; de...@ovirt.org; openstack-disc...@lists.openstack.org; libvir-list@redhat.com; intel-gvt-...@lists.freedesktop.org; k...@vger.kernel.org; qemu-de...@nongnu.org; smoo...@redhat.com; eskul...@redhat.com; coh...@redhat.com; dinec...@redhat.com; cor...@lwn.net; kwankh...@nvidia.com; eau...@redhat.com; Ding, Jian-feng ; Xu, Hejie ; Tian, Kevin ; zhen...@linux.intel.com; bao.yum...@zte.com.cn; Wang, Xin-ran ; Feng, Shaohe Subject: Re: device compatibility interface for live migration with assigned devices On Tue, Jul 14, 2020 at 02:59:48PM -0600, Alex Williamson wrote: > On Tue, 14 Jul 2020 18:19:46 +0100 > "Dr. David Alan Gilbert" wrote: > > > * Alex Williamson (alex.william...@redhat.com) wrote: > > > On Tue, 14 Jul 2020 11:21:29 +0100 Daniel P. Berrang茅 > > > wrote: > > > > > > > On Tue, Jul 14, 2020 at 07:29:57AM +0800, Yan Zhao wrote: > > > > > hi folks, > > > > > we are defining a device migration compatibility interface > > > > > that helps upper layer stack like openstack/ovirt/libvirt to > > > > > check if two devices are live migration compatible. > > > > > The "devices" here could be MDEVs, physical devices, or hybrid of the > > > > > two. > > > > > e.g. we could use it to check whether > > > > > - a src MDEV can migrate to a target MDEV, > > > > > - a src VF in SRIOV can migrate to a target VF in SRIOV, > > > > > - a src MDEV can migration to a target VF in SRIOV. > > > > > (e.g. SIOV/SRIOV backward compatibility case) > > > > > > > > > > The upper layer stack could use this interface as the last > > > > > step to check if one device is able to migrate to another > > > > > device before triggering a real live migration procedure. > > > > > we are not sure if this interface is of value or help to you. > > > > > please don't hesitate to drop your valuable comments. > > > > > > > > > > > > > > > (1) interface definition > > > > > The interface is defined in below way: > > > > > > > > > > __userspace > > > > > /\ \ > > > > > / \write > > > > > / read \ > > > > >/__ ___\|/_ > > > > > | migration_version | | migration_version |-->check migration > > > > > - - compatibility > > > > > device Adevice B > > > > > > > > > > > > > > > a device attribute named migration_version is defined under > > > > > each device's sysfs node. e.g. > > > > > (/sys/bus/pci/devices/\:00\:02.0/$mdev_UUID/migration_version). > > > > > userspace tools read the migration_version as a string from > > > > > the source device, and write it to the migration_version sysfs > > > > > attribute in the target device. > > > > > > > > > > The userspace should treat ANY of below conditions as two devices not > > > > > compatible: > > > > > - any one of the two devices does not have a migration_version > > > > > attribute > > > > > - error when reading from migration_version attribute of one > > > > > device > > > > > - error when writing migration_version string of one device to > > > > > migration_version attribute of the other device > > > > > > > > > > The string read from migration_version attribute is defined by > > > > > device vendor driver and is completely opaque to the userspace. > > > > > for a Intel vGPU, string format can be defined like "parent > > > > > device PCI ID" + "version of gvt driver" + "mdev type" + "aggregator > > > > > count". > > > > > > > > > > for an NVMe VF connecting to a remote storage. it could be > > > > > "PCI ID" + "driver version" + "configured remote storage URL" > > > > > > > > > > for a QAT VF, it may be > > > > > "PCI ID" + "driver version" + "supported encryption set". > > > > > > > > > > (to avoid namespace confliction from each vendor, we may > > > > > prefix a driver name to each migration_version string. e.g. > > > > > i915-v1-8086-591d-i915-GVTg_V5_8-1) > > > > > > It's very strange to define it as opaque and then proceed to > > > describe the contents of that opaque string. The point is that > > > its contents are defined by the vendor driver to describe the > > > device, driver version, and possibly metadata about the > > > configuration of the device. One instance of a device might generate a > > > different string from another. > > > The string that a device produces is not necessarily the only > > > string the vendor driver will accept, for example the driver might > > > support backwards compatible migrations. > > > > (As I've said in the previous discussion, off one of the patch > > series) > > > > My view is it makes sense to have a half-way house on the opaqueness > > of this string; I'd expect to have an ID and version that are human > > readable, maybe a de
Re: [libvirt PATCH 1/2] ci: Drop Debian 9 jobs
On Tue, Jul 14, 2020 at 07:59:07PM +0200, Andrea Bolognani wrote: > The esisting cross-compilation jobs are carefully redistributed existing. > among Debian 10 and Debian sid to ensure we don't use the latter > for aarch64, mipsel or mips64el, since those architectures are > currently broken. > > Signed-off-by: Andrea Bolognani > --- > .gitlab-ci.yml | 74 -- > 1 file changed, 12 insertions(+), 62 deletions(-) Reviewed-by: Daniel P. Berrangé Regards, Daniel -- |: https://berrange.com -o-https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o-https://fstop138.berrange.com :| |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|
Re: [libvirt PATCH 2/2] ci: Drop Debian 9 containers
On Tue, Jul 14, 2020 at 07:59:08PM +0200, Andrea Bolognani wrote: > The corresponding libvirt-ci commit is 5abf5e7e2326. > > Signed-off-by: Andrea Bolognani > --- > .../libvirt-debian-9-cross-aarch64.Dockerfile | 128 -- > .../libvirt-debian-9-cross-armv6l.Dockerfile | 126 - > .../libvirt-debian-9-cross-armv7l.Dockerfile | 127 - > .../libvirt-debian-9-cross-mips.Dockerfile| 127 - > ...libvirt-debian-9-cross-mips64el.Dockerfile | 127 - > .../libvirt-debian-9-cross-mipsel.Dockerfile | 127 - > .../libvirt-debian-9-cross-ppc64le.Dockerfile | 127 - > .../libvirt-debian-9-cross-s390x.Dockerfile | 127 - > ci/containers/libvirt-debian-9.Dockerfile | 118 > 9 files changed, 1134 deletions(-) > delete mode 100644 ci/containers/libvirt-debian-9-cross-aarch64.Dockerfile > delete mode 100644 ci/containers/libvirt-debian-9-cross-armv6l.Dockerfile > delete mode 100644 ci/containers/libvirt-debian-9-cross-armv7l.Dockerfile > delete mode 100644 ci/containers/libvirt-debian-9-cross-mips.Dockerfile > delete mode 100644 ci/containers/libvirt-debian-9-cross-mips64el.Dockerfile > delete mode 100644 ci/containers/libvirt-debian-9-cross-mipsel.Dockerfile > delete mode 100644 ci/containers/libvirt-debian-9-cross-ppc64le.Dockerfile > delete mode 100644 ci/containers/libvirt-debian-9-cross-s390x.Dockerfile > delete mode 100644 ci/containers/libvirt-debian-9.Dockerfile Reviewed-by: Daniel P. Berrangé Regards, Daniel -- |: https://berrange.com -o-https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o-https://fstop138.berrange.com :| |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|
Re: device compatibility interface for live migration with assigned devices
On Tue, Jul 14, 2020 at 02:59:48PM -0600, Alex Williamson wrote: > On Tue, 14 Jul 2020 18:19:46 +0100 > "Dr. David Alan Gilbert" wrote: > > > * Alex Williamson (alex.william...@redhat.com) wrote: > > > On Tue, 14 Jul 2020 11:21:29 +0100 > > > Daniel P. Berrangé wrote: > > > > > > > On Tue, Jul 14, 2020 at 07:29:57AM +0800, Yan Zhao wrote: > > > > > hi folks, > > > > > we are defining a device migration compatibility interface that helps > > > > > upper > > > > > layer stack like openstack/ovirt/libvirt to check if two devices are > > > > > live migration compatible. > > > > > The "devices" here could be MDEVs, physical devices, or hybrid of the > > > > > two. > > > > > e.g. we could use it to check whether > > > > > - a src MDEV can migrate to a target MDEV, > > > > > - a src VF in SRIOV can migrate to a target VF in SRIOV, > > > > > - a src MDEV can migration to a target VF in SRIOV. > > > > > (e.g. SIOV/SRIOV backward compatibility case) > > > > > > > > > > The upper layer stack could use this interface as the last step to > > > > > check > > > > > if one device is able to migrate to another device before triggering > > > > > a real > > > > > live migration procedure. > > > > > we are not sure if this interface is of value or help to you. please > > > > > don't > > > > > hesitate to drop your valuable comments. > > > > > > > > > > > > > > > (1) interface definition > > > > > The interface is defined in below way: > > > > > > > > > > __userspace > > > > > /\ \ > > > > > / \write > > > > > / read \ > > > > >/__ ___\|/_ > > > > > | migration_version | | migration_version |-->check migration > > > > > - - compatibility > > > > > device Adevice B > > > > > > > > > > > > > > > a device attribute named migration_version is defined under each > > > > > device's > > > > > sysfs node. e.g. > > > > > (/sys/bus/pci/devices/\:00\:02.0/$mdev_UUID/migration_version). > > > > > userspace tools read the migration_version as a string from the > > > > > source device, > > > > > and write it to the migration_version sysfs attribute in the target > > > > > device. > > > > > > > > > > The userspace should treat ANY of below conditions as two devices not > > > > > compatible: > > > > > - any one of the two devices does not have a migration_version > > > > > attribute > > > > > - error when reading from migration_version attribute of one device > > > > > - error when writing migration_version string of one device to > > > > > migration_version attribute of the other device > > > > > > > > > > The string read from migration_version attribute is defined by device > > > > > vendor > > > > > driver and is completely opaque to the userspace. > > > > > for a Intel vGPU, string format can be defined like > > > > > "parent device PCI ID" + "version of gvt driver" + "mdev type" + > > > > > "aggregator count". > > > > > > > > > > for an NVMe VF connecting to a remote storage. it could be > > > > > "PCI ID" + "driver version" + "configured remote storage URL" > > > > > > > > > > for a QAT VF, it may be > > > > > "PCI ID" + "driver version" + "supported encryption set". > > > > > > > > > > (to avoid namespace confliction from each vendor, we may prefix a > > > > > driver name to > > > > > each migration_version string. e.g. > > > > > i915-v1-8086-591d-i915-GVTg_V5_8-1) > > > > > > It's very strange to define it as opaque and then proceed to describe > > > the contents of that opaque string. The point is that its contents > > > are defined by the vendor driver to describe the device, driver version, > > > and possibly metadata about the configuration of the device. One > > > instance of a device might generate a different string from another. > > > The string that a device produces is not necessarily the only string > > > the vendor driver will accept, for example the driver might support > > > backwards compatible migrations. > > > > (As I've said in the previous discussion, off one of the patch series) > > > > My view is it makes sense to have a half-way house on the opaqueness of > > this string; I'd expect to have an ID and version that are human > > readable, maybe a device ID/name that's human interpretable and then a > > bunch of other cruft that maybe device/vendor/version specific. > > > > I'm thinking that we want to be able to report problems and include the > > string and the user to be able to easily identify the device that was > > complaining and notice a difference in versions, and perhaps also use > > it in compatibility patterns to find compatible hosts; but that does > > get tricky when it's a 'ask the device if it's compatible'. > > In the reply I just sent to Dan, I gave this example of what a > "compatibility string" might look like represented as json:
Re: device compatibility interface for live migration with assigned devices
* Alex Williamson (alex.william...@redhat.com) wrote: > On Tue, 14 Jul 2020 18:19:46 +0100 > "Dr. David Alan Gilbert" wrote: > > > * Alex Williamson (alex.william...@redhat.com) wrote: > > > On Tue, 14 Jul 2020 11:21:29 +0100 > > > Daniel P. Berrangé wrote: > > > > > > > On Tue, Jul 14, 2020 at 07:29:57AM +0800, Yan Zhao wrote: > > > > > hi folks, > > > > > we are defining a device migration compatibility interface that helps > > > > > upper > > > > > layer stack like openstack/ovirt/libvirt to check if two devices are > > > > > live migration compatible. > > > > > The "devices" here could be MDEVs, physical devices, or hybrid of the > > > > > two. > > > > > e.g. we could use it to check whether > > > > > - a src MDEV can migrate to a target MDEV, > > > > > - a src VF in SRIOV can migrate to a target VF in SRIOV, > > > > > - a src MDEV can migration to a target VF in SRIOV. > > > > > (e.g. SIOV/SRIOV backward compatibility case) > > > > > > > > > > The upper layer stack could use this interface as the last step to > > > > > check > > > > > if one device is able to migrate to another device before triggering > > > > > a real > > > > > live migration procedure. > > > > > we are not sure if this interface is of value or help to you. please > > > > > don't > > > > > hesitate to drop your valuable comments. > > > > > > > > > > > > > > > (1) interface definition > > > > > The interface is defined in below way: > > > > > > > > > > __userspace > > > > > /\ \ > > > > > / \write > > > > > / read \ > > > > >/__ ___\|/_ > > > > > | migration_version | | migration_version |-->check migration > > > > > - - compatibility > > > > > device Adevice B > > > > > > > > > > > > > > > a device attribute named migration_version is defined under each > > > > > device's > > > > > sysfs node. e.g. > > > > > (/sys/bus/pci/devices/\:00\:02.0/$mdev_UUID/migration_version). > > > > > userspace tools read the migration_version as a string from the > > > > > source device, > > > > > and write it to the migration_version sysfs attribute in the target > > > > > device. > > > > > > > > > > The userspace should treat ANY of below conditions as two devices not > > > > > compatible: > > > > > - any one of the two devices does not have a migration_version > > > > > attribute > > > > > - error when reading from migration_version attribute of one device > > > > > - error when writing migration_version string of one device to > > > > > migration_version attribute of the other device > > > > > > > > > > The string read from migration_version attribute is defined by device > > > > > vendor > > > > > driver and is completely opaque to the userspace. > > > > > for a Intel vGPU, string format can be defined like > > > > > "parent device PCI ID" + "version of gvt driver" + "mdev type" + > > > > > "aggregator count". > > > > > > > > > > for an NVMe VF connecting to a remote storage. it could be > > > > > "PCI ID" + "driver version" + "configured remote storage URL" > > > > > > > > > > for a QAT VF, it may be > > > > > "PCI ID" + "driver version" + "supported encryption set". > > > > > > > > > > (to avoid namespace confliction from each vendor, we may prefix a > > > > > driver name to > > > > > each migration_version string. e.g. > > > > > i915-v1-8086-591d-i915-GVTg_V5_8-1) > > > > > > It's very strange to define it as opaque and then proceed to describe > > > the contents of that opaque string. The point is that its contents > > > are defined by the vendor driver to describe the device, driver version, > > > and possibly metadata about the configuration of the device. One > > > instance of a device might generate a different string from another. > > > The string that a device produces is not necessarily the only string > > > the vendor driver will accept, for example the driver might support > > > backwards compatible migrations. > > > > (As I've said in the previous discussion, off one of the patch series) > > > > My view is it makes sense to have a half-way house on the opaqueness of > > this string; I'd expect to have an ID and version that are human > > readable, maybe a device ID/name that's human interpretable and then a > > bunch of other cruft that maybe device/vendor/version specific. > > > > I'm thinking that we want to be able to report problems and include the > > string and the user to be able to easily identify the device that was > > complaining and notice a difference in versions, and perhaps also use > > it in compatibility patterns to find compatible hosts; but that does > > get tricky when it's a 'ask the device if it's compatible'. > > In the reply I just sent to Dan, I gave this example of what a > "compatibility string" might look like represented as json: > > { > "
Re: device compatibility interface for live migration with assigned devices
Alex Williamson 于2020年7月15日周三 上午5:00写道: > On Tue, 14 Jul 2020 18:19:46 +0100 > "Dr. David Alan Gilbert" wrote: > > > * Alex Williamson (alex.william...@redhat.com) wrote: > > > On Tue, 14 Jul 2020 11:21:29 +0100 > > > Daniel P. Berrangé wrote: > > > > > > > On Tue, Jul 14, 2020 at 07:29:57AM +0800, Yan Zhao wrote: > > > > > hi folks, > > > > > we are defining a device migration compatibility interface that > helps upper > > > > > layer stack like openstack/ovirt/libvirt to check if two devices > are > > > > > live migration compatible. > > > > > The "devices" here could be MDEVs, physical devices, or hybrid of > the two. > > > > > e.g. we could use it to check whether > > > > > - a src MDEV can migrate to a target MDEV, > > > > > - a src VF in SRIOV can migrate to a target VF in SRIOV, > > > > > - a src MDEV can migration to a target VF in SRIOV. > > > > > (e.g. SIOV/SRIOV backward compatibility case) > > > > > > > > > > The upper layer stack could use this interface as the last step to > check > > > > > if one device is able to migrate to another device before > triggering a real > > > > > live migration procedure. > > > > > we are not sure if this interface is of value or help to you. > please don't > > > > > hesitate to drop your valuable comments. > > > > > > > > > > > > > > > (1) interface definition > > > > > The interface is defined in below way: > > > > > > > > > > __userspace > > > > > /\ \ > > > > > / \write > > > > > / read \ > > > > >/__ ___\|/_ > > > > > | migration_version | | migration_version |-->check migration > > > > > - - compatibility > > > > > device Adevice B > > > > > > > > > > > > > > > a device attribute named migration_version is defined under each > device's > > > > > sysfs node. e.g. > (/sys/bus/pci/devices/\:00\:02.0/$mdev_UUID/migration_version). > > > > > userspace tools read the migration_version as a string from the > source device, > > > > > and write it to the migration_version sysfs attribute in the > target device. > > > > > > > > > > The userspace should treat ANY of below conditions as two devices > not compatible: > > > > > - any one of the two devices does not have a migration_version > attribute > > > > > - error when reading from migration_version attribute of one device > > > > > - error when writing migration_version string of one device to > > > > > migration_version attribute of the other device > > > > > > > > > > The string read from migration_version attribute is defined by > device vendor > > > > > driver and is completely opaque to the userspace. > > > > > for a Intel vGPU, string format can be defined like > > > > > "parent device PCI ID" + "version of gvt driver" + "mdev type" + > "aggregator count". > > > > > > > > > > for an NVMe VF connecting to a remote storage. it could be > > > > > "PCI ID" + "driver version" + "configured remote storage URL" > > > > > > > > > > for a QAT VF, it may be > > > > > "PCI ID" + "driver version" + "supported encryption set". > > > > > > > > > > (to avoid namespace confliction from each vendor, we may prefix a > driver name to > > > > > each migration_version string. e.g. > i915-v1-8086-591d-i915-GVTg_V5_8-1) > > > > > > It's very strange to define it as opaque and then proceed to describe > > > the contents of that opaque string. The point is that its contents > > > are defined by the vendor driver to describe the device, driver > version, > > > and possibly metadata about the configuration of the device. One > > > instance of a device might generate a different string from another. > > > The string that a device produces is not necessarily the only string > > > the vendor driver will accept, for example the driver might support > > > backwards compatible migrations. > > > > (As I've said in the previous discussion, off one of the patch series) > > > > My view is it makes sense to have a half-way house on the opaqueness of > > this string; I'd expect to have an ID and version that are human > > readable, maybe a device ID/name that's human interpretable and then a > > bunch of other cruft that maybe device/vendor/version specific. > > > > I'm thinking that we want to be able to report problems and include the > > string and the user to be able to easily identify the device that was > > complaining and notice a difference in versions, and perhaps also use > > it in compatibility patterns to find compatible hosts; but that does > > get tricky when it's a 'ask the device if it's compatible'. > > In the reply I just sent to Dan, I gave this example of what a > "compatibility string" might look like represented as json: > > { > "device_api": "vfio-pci", > "vendor": "vendor-driver-name", > "version": { > "major": 0, > "minor": 1 > }, > The OpenStack Placement service doesn't support
Re: device compatibility interface for live migration with assigned devices
Alex Williamson 于2020年7月15日周三 上午12:16写道: > On Tue, 14 Jul 2020 11:21:29 +0100 > Daniel P. Berrangé wrote: > > > On Tue, Jul 14, 2020 at 07:29:57AM +0800, Yan Zhao wrote: > > > hi folks, > > > we are defining a device migration compatibility interface that helps > upper > > > layer stack like openstack/ovirt/libvirt to check if two devices are > > > live migration compatible. > > > The "devices" here could be MDEVs, physical devices, or hybrid of the > two. > > > e.g. we could use it to check whether > > > - a src MDEV can migrate to a target MDEV, > > > - a src VF in SRIOV can migrate to a target VF in SRIOV, > > > - a src MDEV can migration to a target VF in SRIOV. > > > (e.g. SIOV/SRIOV backward compatibility case) > > > > > > The upper layer stack could use this interface as the last step to > check > > > if one device is able to migrate to another device before triggering a > real > > > live migration procedure. > > > we are not sure if this interface is of value or help to you. please > don't > > > hesitate to drop your valuable comments. > > > > > > > > > (1) interface definition > > > The interface is defined in below way: > > > > > > __userspace > > > /\ \ > > > / \write > > > / read \ > > >/__ ___\|/_ > > > | migration_version | | migration_version |-->check migration > > > - - compatibility > > > device Adevice B > > > > > > > > > a device attribute named migration_version is defined under each > device's > > > sysfs node. e.g. > (/sys/bus/pci/devices/\:00\:02.0/$mdev_UUID/migration_version). > > > userspace tools read the migration_version as a string from the source > device, > > > and write it to the migration_version sysfs attribute in the target > device. > > > > > > The userspace should treat ANY of below conditions as two devices not > compatible: > > > - any one of the two devices does not have a migration_version > attribute > > > - error when reading from migration_version attribute of one device > > > - error when writing migration_version string of one device to > > > migration_version attribute of the other device > > > > > > The string read from migration_version attribute is defined by device > vendor > > > driver and is completely opaque to the userspace. > > > for a Intel vGPU, string format can be defined like > > > "parent device PCI ID" + "version of gvt driver" + "mdev type" + > "aggregator count". > > > > > for an NVMe VF connecting to a remote storage. it could be > > > "PCI ID" + "driver version" + "configured remote storage URL" > If the "configured remote storage URL" is something configuration setting before the usage, then it isn't something we need for migration compatible check. Openstack only needs to know the target device's driver and hardware compatible for migration, then the scheduler will choose a host which such device, and then Openstack will pre-configure the target host and target device before the migration, then openstack will configure the correct remote storage URL to the device. If we want, we can do a sanity check after the live migration with the os. > > > > > > for a QAT VF, it may be > > > "PCI ID" + "driver version" + "supported encryption set". > > > > > > (to avoid namespace confliction from each vendor, we may prefix a > driver name to > > > each migration_version string. e.g. i915-v1-8086-591d-i915-GVTg_V5_8-1) > > It's very strange to define it as opaque and then proceed to describe > the contents of that opaque string. The point is that its contents > are defined by the vendor driver to describe the device, driver version, > and possibly metadata about the configuration of the device. One > instance of a device might generate a different string from another. > The string that a device produces is not necessarily the only string > the vendor driver will accept, for example the driver might support > backwards compatible migrations. > > > > (2) backgrounds > > > > > > The reason we hope the migration_version string is opaque to the > userspace > > > is that it is hard to generalize standard comparing fields and > comparing > > > methods for different devices from different vendors. > > > Though userspace now could still do a simple string compare to check if > > > two devices are compatible, and result should also be right, it's still > > > too limited as it excludes the possible candidate whose > migration_version > > > string fails to be equal. > > > e.g. an MDEV with mdev_type_1, aggregator count 3 is probably > compatible > > > with another MDEV with mdev_type_3, aggregator count 1, even their > > > migration_version strings are not equal. > > > (assumed mdev_type_3 is of 3 times equal resources of mdev_type_1). > > > > > > besides that, driver version + configured resources are all elements > demanding > > > t