On Mon, Aug 9, 2021 at 4:01 PM Nir Soffer <nsof...@redhat.com> wrote:
>
> On Mon, Aug 9, 2021 at 2:42 PM Yedidyah Bar David <d...@redhat.com> wrote:
> >
> > On Mon, Aug 9, 2021 at 1:43 PM Nir Soffer <nsof...@redhat.com> wrote:
> > >
> > > On Mon, Aug 9, 2021 at 10:35 AM Yedidyah Bar David <d...@redhat.com>
> > > wrote:
> > > >
> > > > On Sun, Aug 8, 2021 at 5:42 PM Code Review <ger...@ovirt.org> wrote:
> > > > >
> > > > > From Jenkins CI <jenk...@ovirt.org>:
> > > > >
> > > > > Jenkins CI has posted comments on this change. (
> > > > > https://gerrit.ovirt.org/c/ovirt-system-tests/+/115392 )
> > > > >
> > > > > Change subject: HE: Use node image
> > > > > ......................................................................
> > > > >
> > > > > Patch Set 13: Continuous-Integration-1
> > > > >
> > > > > Build Failed
> > > >
> > > > While trying to deactivate a host, the engine wanted to migrate a VM
> > > > (vm0) from host-0 to host-1. The vdsm log of host-0 says:
> > > >
> > > > 2021-08-08 14:31:10,076+0000 ERROR (migsrc/cde311f9) [virt.vm]
> > > > (vmId='cde311f9-9a33-4eb9-8338-fa22ff49edc2') Failed to migrate
> > > > (migration:503)
> > > > Traceback (most recent call last):
> > > >   File "/usr/lib/python3.6/site-packages/vdsm/virt/migration.py", line 477, in _regular_run
> > > >     time.time(), machineParams
> > > >   File "/usr/lib/python3.6/site-packages/vdsm/virt/migration.py", line 578, in _startUnderlyingMigration
> > > >     self._perform_with_conv_schedule(duri, muri)
> > > >   File "/usr/lib/python3.6/site-packages/vdsm/virt/migration.py", line 667, in _perform_with_conv_schedule
> > > >     self._perform_migration(duri, muri)
> > > >   File "/usr/lib/python3.6/site-packages/vdsm/virt/migration.py", line 596, in _perform_migration
> > > >     self._migration_flags)
> > > >   File "/usr/lib/python3.6/site-packages/vdsm/virt/virdomain.py", line 159, in call
> > > >     return getattr(self._vm._dom, name)(*a, **kw)
> > > >   File "/usr/lib/python3.6/site-packages/vdsm/virt/virdomain.py", line 101, in f
> > > >     ret = attr(*args, **kwargs)
> > > >   File "/usr/lib/python3.6/site-packages/vdsm/common/libvirtconnection.py", line 131, in wrapper
> > > >     ret = f(*args, **kwargs)
> > > >   File "/usr/lib/python3.6/site-packages/vdsm/common/function.py", line 94, in wrapper
> > > >     return func(inst, *args, **kwargs)
> > > >   File "/usr/lib64/python3.6/site-packages/libvirt.py", line 2126, in migrateToURI3
> > > >     raise libvirtError('virDomainMigrateToURI3() failed')
> > > > libvirt.libvirtError: Unsafe migration: Migration without shared
> > > > storage is unsafe
> > >
> > > Please share the vm xml:
> > >
> > >     sudo virsh -r dumpxml vm-name
> >
> > I think you should be able to find a dump of it in vdsm.log:
> >
> > https://jenkins.ovirt.org/job/ovirt-system-tests_standard-check-patch/18650/artifact/check-patch.he-basic_suite_master.el8.x86_64/test_logs/ost-he-basic-suite-master-host-0/var/log/vdsm/vdsm.log
> >
> > I think the first line of starting a migration is:
> >
> > 2021-08-08 14:31:08,350+0000 DEBUG (jsonrpc/4) [jsonrpc.JsonRpcServer]
> > Calling 'VM.migrate' in bridge with {'vmID':
> > 'cde311f9-9a33-4eb9-8338-fa22ff49edc2', 'params':
> >
> > A few lines later:
> >
> > 2021-08-08 14:31:08,387+0000 DEBUG (migsrc/cde311f9)
> > [virt.metadata.Descriptor] dumped metadata for
> > cde311f9-9a33-4eb9-8338-fa22ff49edc2: <?xml version='1.0' encoding='utf-8'?>
> > <vm>
> >     <balloonTarget type="int">98304</balloonTarget>
>
> This is not the vm xml but the metadata xml.
>
> Looking at the logs on both hosts:
>
> [nsoffer@sparse ost]$ head -1 *vdsm.log
> ==> host0-vdsm.log <==
> 2021-08-08 13:16:04,676+0000 INFO (MainThread) [vds] (PID: 65169) I
> am the actual vdsm 4.40.80.3.12.git6d67b935b
> ost-he-basic-suite-master-host-0 (4.18.0-326.el8.x86_64) (vdsmd:162)
>
> ==> host1-vdsm.log <==
> 2021-08-08 15:40:54,367+0200 INFO (MainThread) [vds] (PID: 23005) I
> am the actual vdsm 4.40.80.4.5.git4309a3949
> ost-he-basic-suite-master-host-1 (4.18.0-326.el8.x86_64) (vdsmd:162)
>
> - The hosts' clocks use different time zones (+0000 vs +0200) - is
>   this intended?
> - You are testing two different versions of vdsm - is this intended?
>
> We have about 60 errors:
>
> $ grep 'Migration without shared storage is unsafe' host0-vdsm.log | wc -l
> 60
>
> Looking at the first migration that failed, the vm xml is here:
>
> 2021-08-08 14:20:34,127+0000 INFO (vm/cde311f9) [virt.vm]
> (vmId='cde311f9-9a33-4eb9-8338-fa22ff49edc2') <?xml version='1.0' encoding='utf-8'?>
> <domain xmlns:ns0="http://libvirt.org/schemas/domain/qemu/1.0"
> xmlns:ovirt-vm="http://ovirt.org/vm/1.0" type="kvm">
>     <name>vm0</name>
>     <uuid>cde311f9-9a33-4eb9-8338-fa22ff49edc2</uuid>
> ...
>
> The relevant parts for storage are:
>
>     <disk device="disk" snapshot="no" type="file">
>         <driver cache="none" error_policy="stop" io="threads"
>                 iothread="1" name="qemu" type="qcow2" />
>         <source file="/rhev/data-center/7d97ea80-f849-11eb-ac79-5452d501341a/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412-aa2a-3f7bad2f3a87">
>             <seclabel model="dac" relabel="no" />
>         </source>
>         <target bus="virtio" dev="vda" />
>         <serial>20002ad2-4a97-4d2f-b3fc-c103477b5b91</serial>
>         <boot order="1" />
>         <alias name="ua-20002ad2-4a97-4d2f-b3fc-c103477b5b91" />
>         <address bus="0x05" domain="0x0000" function="0x0" slot="0x00" type="pci" />
>     </disk>
>     <disk device="disk" snapshot="no" type="block">
>         <driver cache="none" error_policy="stop" io="native"
>                 name="qemu" type="raw" />
>         <source dev="/dev/mapper/36001405bc9d94e4419b4b80a2f702e2f">
>             <seclabel model="dac" relabel="no" />
>         </source>
>         <target bus="scsi" dev="sda" />
>         <serial>738c8486-8929-44ec-9083-69327bde9c65</serial>
>         <alias name="ua-738c8486-8929-44ec-9083-69327bde9c65" />
>         <address bus="0" controller="0" target="0" type="drive" unit="0" />
>     </disk>
>
> So we have one qcow2 disk on file storage, and one direct LUN.
>
> On the destination, the first migration attempt is here:
>
> 2021-08-08 16:31:08,437+0200 DEBUG (jsonrpc/2) [jsonrpc.JsonRpcServer]
> Calling 'VM.migrationCreate' in bridge with {'vmID':
> 'cde311f9-9a33-4eb9-8338-fa22ff49edc2', 'params': {'_srcDomXML':
> '<domain type=\'kvm\' id=\'6\' xmlns:qemu=\'http://libvirt.org/
> ...
>
> We prepare the qcow2 disk:
>
> 2021-08-08 16:31:09,313+0200 INFO (vm/cde311f9) [vdsm.api] FINISH
> prepareImage return={'path':
> '/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412-aa2a-3f7bad2f3a87',
> 'info': {'type': 'file', 'path':
> '/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412-aa2a-3f7bad2f3a87'},
> 'imgVolumesInfo': [{'domainID': '46fa5761-bb9e-46be-8f1c-35f4b03d0203',
> 'imageID': '20002ad2-4a97-4d2f-b3fc-c103477b5b91', 'volumeID':
> '1d3f07dc-b481-492f-a2a6-7c46689d82ba', 'path':
> '/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/1d3f07dc-b481-492f-a2a6-7c46689d82ba',
> 'leasePath':
> '/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/1d3f07dc-b481-492f-a2a6-7c46689d82ba.lease',
> 'leaseOffset': 0}, {'domainID': '46fa5761-bb9e-46be-8f1c-35f4b03d0203',
> 'imageID': '20002ad2-4a97-4d2f-b3fc-c103477b5b91', 'volumeID':
> '614abd56-4d4f-4412-aa2a-3f7bad2f3a87', 'path':
> '/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412-aa2a-3f7bad2f3a87',
> 'leasePath':
> '/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412-aa2a-3f7bad2f3a87.lease',
> 'leaseOffset': 0}, {'domainID': '46fa5761-bb9e-46be-8f1c-35f4b03d0203',
> 'imageID': '20002ad2-4a97-4d2f-b3fc-c103477b5b91', 'volumeID':
> 'a4309ef3-01bb-45db-8bf7-0f9498a7feeb', 'path':
> '/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/a4309ef3-01bb-45db-8bf7-0f9498a7feeb',
> 'leasePath':
> '/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/a4309ef3-01bb-45db-8bf7-0f9498a7feeb.lease',
> 'leaseOffset': 0}]} from=internal,
> task_id=f2dfbacb-154c-4f5b-b57d-affcbf419691 (api:54)
> 2021-08-08 16:31:09,314+0200 INFO (vm/cde311f9) [vds] prepared volume
> path: /rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412-aa2a-3f7bad2f3a87
> (clientIF:518)
>
> And the direct LUN:
>
> 2021-08-08 16:31:09,315+0200 INFO (vm/cde311f9) [vdsm.api] START
> appropriateDevice(guid='36001405bc9d94e4419b4b80a2f702e2f',
> thiefId='cde311f9-9a33-4eb9-8338-fa22ff49edc2', deviceType='mpath')
> from=internal, task_id=220f1c1f-caec-4327-a157-7a4fab3b54a5 (api:48)
> 2021-08-08 16:31:09,550+0200 INFO (vm/cde311f9) [vdsm.api] FINISH
> appropriateDevice return={'truesize': '21474836480', 'apparentsize':
> '21474836480', 'path': '/dev/mapper/36001405bc9d94e4419b4b80a2f702e2f'}
> from=internal, task_id=220f1c1f-caec-4327-a157-7a4fab3b54a5 (api:54)
> 2021-08-08 16:31:09,550+0200 INFO (vm/cde311f9) [vds] prepared volume
> path: /dev/mapper/36001405bc9d94e4419b4b80a2f702e2f (clientIF:518)
>
> The interesting thing is that the qcow2 disk uses a different path on
> the source and destination VMs:
>
> source:
> /rhev/data-center/7d97ea80-f849-11eb-ac79-5452d501341a/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412-aa2a-3f7bad2f3a87
>
> destination:
> /rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412-aa2a-3f7bad2f3a87
>
> On the source we have:
> /rhev/data-center/pool-id/domain-id/images/image-id/volume-id
>
> On the destination we have:
> /rhev/data-center/mnt/mountpoint/domain-id/images/image-id/volume-id
>
> Both lead to the same disk, but libvirt probably compared the strings
> and decided that we don't have shared storage.
>
> It may be a new validation in libvirt, or maybe the engine changed the
> way the disk path is added to the xml recently?
>
> In vdsm we use os.path.realpath() when we compare disk paths, since we
> know that we can have different paths to the same volume.
>
> This kind of issue is likely to be reproducible without OST.
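The string comparison problem is easy to demonstrate (a minimal sketch,
not the actual libvirt check; the two strings below are the paths from
the logs above):

    import os.path

    # Disk path in the source vm xml - pool-id based, goes through
    # symlinks under /rhev/data-center/<pool-id>/.
    src = ("/rhev/data-center/7d97ea80-f849-11eb-ac79-5452d501341a"
           "/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images"
           "/20002ad2-4a97-4d2f-b3fc-c103477b5b91"
           "/614abd56-4d4f-4412-aa2a-3f7bad2f3a87")

    # Disk path prepared on the destination - mountpoint based.
    dst = ("/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1"
           "/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images"
           "/20002ad2-4a97-4d2f-b3fc-c103477b5b91"
           "/614abd56-4d4f-4412-aa2a-3f7bad2f3a87")

    # Comparing the strings says "different storage":
    print(src == dst)  # False

    # Resolving symlinks first, as vdsm does, shows both refer to the
    # same volume - but only when run on a host where these paths
    # actually exist:
    print(os.path.realpath(src) == os.path.realpath(dst))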
Trying to understand why the drive path is different on the source:

The source path was created by a snapshot:

2021-08-08 14:19:40,219+0000 INFO (tasks/9) [storage.Volume] Request to
create snapshot 20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412-aa2a-3f7bad2f3a87
of volume 20002ad2-4a97-4d2f-b3fc-c103477b5b91/1d3f07dc-b481-492f-a2a6-7c46689d82ba
with capacity 117440512 (fileVolume:528)

Vdsm reports the volume as:

2021-08-08 14:19:56,944+0000 INFO (jsonrpc/6) [vdsm.api] FINISH
getVolumeInfo return={'info': {'uuid': '614abd56-4d4f-4412-aa2a-3f7bad2f3a87',
'type': 'SPARSE', 'format': 'COW', 'disktype': 'DATA', 'voltype': 'LEAF',
'capacity': '117440512', 'parent': '1d3f07dc-b481-492f-a2a6-7c46689d82ba',
'description': '', 'pool': '', 'domain': '46fa5761-bb9e-46be-8f1c-35f4b03d0203',
'image': '20002ad2-4a97-4d2f-b3fc-c103477b5b91', 'ctime': '1628432381',
'mtime': '0', 'legality': 'LEGAL', 'generation': 0, 'apparentsize': '196616',
'truesize': '200704', 'status': 'OK', 'lease': {'path':
'/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412-aa2a-3f7bad2f3a87.lease',
'offset': 0, 'owners': [], 'version': None}, 'children': []}}
from=::ffff:192.168.200.61,44352, flow_id=fe1c52bf-a1ea-462f-ac32-14cd8f16c411,
task_id=14914c43-a38e-4692-9753-78ac2ddfbe5d (api:54)

(Using the /rhev/data-center/mnt/mountpoint/... path.)

The image was prepared later:

2021-08-08 14:20:32,047+0000 INFO (vm/cde311f9) [vdsm.api] START
prepareImage(sdUUID='46fa5761-bb9e-46be-8f1c-35f4b03d0203',
spUUID='7d97ea80-f849-11eb-ac79-5452d501341a',
imgUUID='20002ad2-4a97-4d2f-b3fc-c103477b5b91', leafUUID='614abd56-4d4f-4412-aa
...
2021-08-08 14:20:32,726+0000 INFO (vm/cde311f9) [vdsm.api] FINISH
prepareImage return={'path':
'/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412-aa2a-3f7bad2f3a87',
'info': {'type': 'file', 'path':
'/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412-aa2a-3f7bad2f3a87'},
'imgVolumesInfo': [{'domainID': '46fa5761-bb9e-46be-8f1c-35f4b03d0203',
'imageID': '20002ad2-4a97-4d2f-b3fc-c103477b5b91', 'volumeID':
'1d3f07dc-b481-492f-a2a6-7c46689d82ba', 'path':
'/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/1d3f07dc-b481-492f-a2a6-7c46689d82ba',
'leasePath':
'/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/1d3f07dc-b481-492f-a2a6-7c46689d82ba.lease',
'leaseOffset': 0}, {'domainID': '46fa5761-bb9e-46be-8f1c-35f4b03d0203',
'imageID': '20002ad2-4a97-4d2f-b3fc-c103477b5b91', 'volumeID':
'614abd56-4d4f-4412-aa2a-3f7bad2f3a87', 'path':
'/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412-aa2a-3f7bad2f3a87',
'leasePath':
'/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412-aa2a-3f7bad2f3a87.lease',
'leaseOffset': 0}, {'domainID': '46fa5761-bb9e-46be-8f1c-35f4b03d0203',
'imageID': '20002ad2-4a97-4d2f-b3fc-c103477b5b91', 'volumeID':
'a4309ef3-01bb-45db-8bf7-0f9498a7feeb', 'path':
'/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/a4309ef3-01bb-45db-8bf7-0f9498a7feeb',
'leasePath':
'/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/a4309ef3-01bb-45db-8bf7-0f9498a7feeb.lease',
'leaseOffset': 0}]} from=internal,
task_id=92af0386-5c0f-4495-9af2-9d5c47d02c48 (api:54)
2021-08-08 14:20:32,726+0000 INFO (vm/cde311f9) [vds] prepared volume
path: /rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412-aa2a-3f7bad2f3a87
(clientIF:518)

And prepareImage returns the /rhev/data-center/mnt/... path - this path
should be used by the VM!

But then we see this suspicious log:

2021-08-08 14:20:33,975+0000 INFO (vm/cde311f9) [virt.vm]
(vmId='cde311f9-9a33-4eb9-8338-fa22ff49edc2') drive 'vda' path:
'file=/rhev/data-center/7d97ea80-f849-11eb-ac79-5452d501341a/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412-aa2a-3f7bad2f3a87'
-> '*file=/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412-aa2a-3f7bad2f3a87'
(storagexml:341)

And we see the wrong paths in the vm xml:

    <disk device="disk" snapshot="no" type="file">
        <driver cache="none" error_policy="stop" io="threads"
                iothread="1" name="qemu" type="qcow2" />
        <source file="/rhev/data-center/7d97ea80-f849-11eb-ac79-5452d501341a/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412-aa2a-3f7bad2f3a87">
            <seclabel model="dac" relabel="no" />
        </source>

So it looks like the code in storagexml is *changing* the path returned
from storage (/rhev/data-center/mnt/...), replacing it with another path
(/rhev/data-center/pool-id/...), which is incorrect and confuses libvirt.
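To check this quickly on other runs, the disk paths on both sides can be
extracted from the domain xml (a throwaway helper, not part of vdsm; the
script name and file names are just whatever the source and destination
xml were saved to):

    import sys
    import xml.etree.ElementTree as ET

    # Usage: python3 disk-paths.py source-vm.xml destination-vm.xml
    for filename in sys.argv[1:]:
        tree = ET.parse(filename)
        for disk in tree.iter("disk"):
            source = disk.find("source")
            if source is None:
                continue
            # File-based disks use the "file" attribute, block-based
            # disks (like the direct lun) use "dev".
            print(filename, source.get("file") or source.get("dev"))

If the printed paths for the same disk differ between the two files, we
hit this bug.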
I don't think we changed anything in this area in vdsm, so this is
likely a new validation added in libvirt, revealing an old bug in
vdsm/engine.

Nir
_______________________________________________
Devel mailing list -- devel@ovirt.org
To unsubscribe send an email to devel-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/devel@ovirt.org/message/U2ODXS2UGGZFAA72EL5BPDLGHOOFACUD/