On Mon, Aug 9, 2021 at 4:01 PM Nir Soffer <nsof...@redhat.com> wrote:
>
> On Mon, Aug 9, 2021 at 2:42 PM Yedidyah Bar David <d...@redhat.com> wrote:
> >
> > On Mon, Aug 9, 2021 at 1:43 PM Nir Soffer <nsof...@redhat.com> wrote:
> > >
> > > On Mon, Aug 9, 2021 at 10:35 AM Yedidyah Bar David <d...@redhat.com> 
> > > wrote:
> > > >
> > > > On Sun, Aug 8, 2021 at 5:42 PM Code Review <ger...@ovirt.org> wrote:
> > > > >
> > > > > From Jenkins CI <jenk...@ovirt.org>:
> > > > >
> > > > > Jenkins CI has posted comments on this change. ( 
> > > > > https://gerrit.ovirt.org/c/ovirt-system-tests/+/115392 )
> > > > >
> > > > > Change subject: HE: Use node image
> > > > > ......................................................................
> > > > >
> > > > >
> > > > > Patch Set 13: Continuous-Integration-1
> > > > >
> > > > > Build Failed
> > > >
> > > > While trying to deactivate a host, the engine wanted to migrate a VM
> > > > (vm0) from host-0 to host-1. vdsm log of host-0 says:
> > > >
> > > > 2021-08-08 14:31:10,076+0000 ERROR (migsrc/cde311f9) [virt.vm]
> > > > (vmId='cde311f9-9a33-4eb9-8338-fa22ff49edc2') Failed to migrate
> > > > (migration:503)
> > > > Traceback (most recent call last):
> > > >   File "/usr/lib/python3.6/site-packages/vdsm/virt/migration.py", line
> > > > 477, in _regular_run
> > > >     time.time(), machineParams
> > > >   File "/usr/lib/python3.6/site-packages/vdsm/virt/migration.py", line
> > > > 578, in _startUnderlyingMigration
> > > >     self._perform_with_conv_schedule(duri, muri)
> > > >   File "/usr/lib/python3.6/site-packages/vdsm/virt/migration.py", line
> > > > 667, in _perform_with_conv_schedule
> > > >     self._perform_migration(duri, muri)
> > > >   File "/usr/lib/python3.6/site-packages/vdsm/virt/migration.py", line
> > > > 596, in _perform_migration
> > > >     self._migration_flags)
> > > >   File "/usr/lib/python3.6/site-packages/vdsm/virt/virdomain.py", line
> > > > 159, in call
> > > >     return getattr(self._vm._dom, name)(*a, **kw)
> > > >   File "/usr/lib/python3.6/site-packages/vdsm/virt/virdomain.py", line 
> > > > 101, in f
> > > >     ret = attr(*args, **kwargs)
> > > >   File 
> > > > "/usr/lib/python3.6/site-packages/vdsm/common/libvirtconnection.py",
> > > > line 131, in wrapper
> > > >     ret = f(*args, **kwargs)
> > > >   File "/usr/lib/python3.6/site-packages/vdsm/common/function.py",
> > > > line 94, in wrapper
> > > >     return func(inst, *args, **kwargs)
> > > >   File "/usr/lib64/python3.6/site-packages/libvirt.py", line 2126, in
> > > > migrateToURI3
> > > >     raise libvirtError('virDomainMigrateToURI3() failed')
> > > > libvirt.libvirtError: Unsafe migration: Migration without shared
> > > > storage is unsafe
> > >
> > > Please share the vm xml:
> > >
> > >     sudo virsh -r dumpxml vm-name
> >
> > I think you should be able to find a dump of it in vdsm.log:
> >
> > https://jenkins.ovirt.org/job/ovirt-system-tests_standard-check-patch/18650/artifact/check-patch.he-basic_suite_master.el8.x86_64/test_logs/ost-he-basic-suite-master-host-0/var/log/vdsm/vdsm.log
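> >
> > For example, a rough sketch for pulling the dumped domain XML out of a
> > downloaded vdsm.log (it just prints everything between a '<domain' open tag
> > and the next '</domain>', so it may emit more than one dump, including
> > escaped copies embedded in other log lines):
> >
> >     import sys
> >
> >     printing = False
> >     with open(sys.argv[1]) as log:      # e.g. the vdsm.log linked above
> >         for line in log:
> >             if "<domain " in line:
> >                 printing = True
> >             if printing:
> >                 sys.stdout.write(line)
> >             if "</domain>" in line:
> >                 printing = False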
> >
> > I think the first line of starting a migration is:
> >
> > 2021-08-08 14:31:08,350+0000 DEBUG (jsonrpc/4) [jsonrpc.JsonRpcServer]
> > Calling 'VM.migrate' in bridge with {'vmID':
> > 'cde311f9-9a33-4eb9-8338-fa22ff49edc2', 'params':
> >
> > A few lines later:
> >
> > 2021-08-08 14:31:08,387+0000 DEBUG (migsrc/cde311f9)
> > [virt.metadata.Descriptor] dumped metadata for
> > cde311f9-9a33-4eb9-8338-fa22ff49edc2: <?xml version='1.0'
> > encoding='utf-8'?>
> > <vm>
> >     <balloonTarget type="int">98304</balloonTarget>
>
> This is not the vm xml but the metadata xml.
>
> Looking at the logs on both hosts:
>
> [nsoffer@sparse ost]$ head -1 *vdsm.log
> ==> host0-vdsm.log <==
> 2021-08-08 13:16:04,676+0000 INFO  (MainThread) [vds] (PID: 65169) I
> am the actual vdsm 4.40.80.3.12.git6d67b935b
> ost-he-basic-suite-master-host-0 (4.18.0-326.el8.x86_64) (vdsmd:162)
>
> ==> host1-vdsm.log <==
> 2021-08-08 15:40:54,367+0200 INFO  (MainThread) [vds] (PID: 23005) I
> am the actual vdsm 4.40.80.4.5.git4309a3949
> ost-he-basic-suite-master-host-1 (4.18.0-326.el8.x86_64) (vdsmd:162)
>
> - The hosts' clocks are using different time zones (+0000 vs +0200) - is
> this intended?
> - You are testing different versions of vdsm - is this intended?
>
> We have about 60 errors:
> $ grep 'Migration without shared storage is unsafe' host0-vdsm.log | wc -l
> 60
>
> Looking at the first migration that failed, the vm xml is here:
>
> 2021-08-08 14:20:34,127+0000 INFO  (vm/cde311f9) [virt.vm]
> (vmId='cde311f9-9a33-4eb9-8338-fa22ff49edc2') <?xml version='1.0' e
> ncoding='utf-8'?>
> <domain xmlns:ns0="http://libvirt.org/schemas/domain/qemu/1.0"
> xmlns:ovirt-vm="http://ovirt.org/vm/1.0" type="kvm">
>     <name>vm0</name>
>     <uuid>cde311f9-9a33-4eb9-8338-fa22ff49edc2</uuid>
> ...
>
> The relevant parts for storage are:
>
>         <disk device="disk" snapshot="no" type="file">
>             <driver cache="none" error_policy="stop" io="threads"
> iothread="1" name="qemu" type="qcow2" />
>             <source
> file="/rhev/data-center/7d97ea80-f849-11eb-ac79-5452d501341a/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412-aa2a-3f7bad2f3a87">
>                 <seclabel model="dac" relabel="no" />
>             </source>
>             <target bus="virtio" dev="vda" />
>             <serial>20002ad2-4a97-4d2f-b3fc-c103477b5b91</serial>
>             <boot order="1" />
>             <alias name="ua-20002ad2-4a97-4d2f-b3fc-c103477b5b91" />
>             <address bus="0x05" domain="0x0000" function="0x0"
> slot="0x00" type="pci" />
>         </disk>
>         <disk device="disk" snapshot="no" type="block">
>             <driver cache="none" error_policy="stop" io="native"
> name="qemu" type="raw" />
>             <source dev="/dev/mapper/36001405bc9d94e4419b4b80a2f702e2f">
>                 <seclabel model="dac" relabel="no" />
>             </source>
>             <target bus="scsi" dev="sda" />
>             <serial>738c8486-8929-44ec-9083-69327bde9c65</serial>
>             <alias name="ua-738c8486-8929-44ec-9083-69327bde9c65" />
>             <address bus="0" controller="0" target="0" type="drive" unit="0" 
> />
>         </disk>
>
> So we have one qcow2 disk on file storage, and one direct lun.
>
> On the destination, the first migration attempt is here:
>
> 2021-08-08 16:31:08,437+0200 DEBUG (jsonrpc/2) [jsonrpc.JsonRpcServer]
> Calling 'VM.migrationCreate' in bridge with {'vmID': 'c
> de311f9-9a33-4eb9-8338-fa22ff49edc2', 'params': {'_srcDomXML':
> '<domain type=\'kvm\' id=\'6\' xmlns:qemu=\'http://libvirt.org/
> ...
>
> We prepare the qcow2 disk:
>
> 2021-08-08 16:31:09,313+0200 INFO  (vm/cde311f9) [vdsm.api] FINISH
> prepareImage return={'path':
> '/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412-aa2a-3f7bad2f3a87',
> 'info': {'type': 'file', 'path':
> '/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412-aa2a-3f7bad2f3a87'},
> 'imgVolumesInfo': [{'domainID':
> '46fa5761-bb9e-46be-8f1c-35f4b03d0203', 'imageID':
> '20002ad2-4a97-4d2f-b3fc-c103477b5b91', 'volumeID':
> '1d3f07dc-b481-492f-a2a6-7c46689d82ba', 'path':
> '/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/1d3f07dc-b481-492f-a2a6-7c46689d82ba',
> 'leasePath': 
> '/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/1d3f07dc-b481-492f-a2a6-7c46689d82ba.lease',
> 'leaseOffset': 0}, {'domainID':
> '46fa5761-bb9e-46be-8f1c-35f4b03d0203', 'imageID':
> '20002ad2-4a97-4d2f-b3fc-c103477b5b91', 'volumeID':
> '614abd56-4d4f-4412-aa2a-3f7bad2f3a87', 'path':
> '/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412-aa2a-3f7bad2f3a87',
> 'leasePath': 
> '/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412-aa2a-3f7bad2f3a87.lease',
> 'leaseOffset': 0}, {'domainID':
> '46fa5761-bb9e-46be-8f1c-35f4b03d0203', 'imageID':
> '20002ad2-4a97-4d2f-b3fc-c103477b5b91', 'volumeID':
> 'a4309ef3-01bb-45db-8bf7-0f9498a7feeb', 'path':
> '/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/a4309ef3-01bb-45db-8bf7-0f9498a7feeb',
> 'leasePath': 
> '/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/a4309ef3-01bb-45db-8bf7-0f9498a7feeb.lease',
> 'leaseOffset': 0}]} from=internal,
> task_id=f2dfbacb-154c-4f5b-b57d-affcbf419691 (api:54)
> 2021-08-08 16:31:09,314+0200 INFO  (vm/cde311f9) [vds] prepared volume
> path: 
> /rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412-aa2a-3f7bad2f3a87
> (clientIF:518)
>
> And the direct lun:
>
> 2021-08-08 16:31:09,315+0200 INFO  (vm/cde311f9) [vdsm.api] START
> appropriateDevice(guid='36001405bc9d94e4419b4b80a2f702e2f',
> thiefId='cde311f9-9a33-4eb9-8338-fa22ff49edc2', deviceType='mpath')
> from=internal, task_id=220f1c1f-caec-4327-a157-7a4fab3b54a5 (api:48)
> 2021-08-08 16:31:09,550+0200 INFO  (vm/cde311f9) [vdsm.api] FINISH
> appropriateDevice return={'truesize': '21474836480', 'apparentsize':
> '21474836480', 'path':
> '/dev/mapper/36001405bc9d94e4419b4b80a2f702e2f'} from=internal,
> task_id=220f1c1f-caec-4327-a157-7a4fab3b54a5 (api:54)
> 2021-08-08 16:31:09,550+0200 INFO  (vm/cde311f9) [vds] prepared volume
> path: /dev/mapper/36001405bc9d94e4419b4b80a2f702e2f (clientIF:518)
>
> The interesting thing is that the qcow2 disk is using a different path on
> the source and destination VMs:
>
> source:
> /rhev/data-center/7d97ea80-f849-11eb-ac79-5452d501341a/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412-aa2a-3f7bad2f3a87
>
> destination:
> /rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412-aa2a-3f7bad2f3a87
>
> On the source we have:
> /rhev/data-center/pool-id/domain-id/images/image-id/volume-id
>
> On the destination we have:
> /rhev/data-center/mnt/mountpoint/domain-id/images/image-id/volume-id
>
> Both lead to the same disk, but libvirt probably compared the strings
> and decided that
> we don't have shared storage.
>
> It may be a new validation in libvirt, or maybe the engine recently changed
> the way the disk path is added to the XML?
>
> In vdsm we use os.path.realpath() when we compare disk paths since we know
> that we can have different paths to the same volume.
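> 
> A minimal sketch of the difference (paths copied from the logs in this
> thread; the realpath() comparison would only return True on the OST host
> itself, where the /rhev/data-center symlinks exist):
> 
>     import os
> 
>     # Two strings referring to the same qcow2 volume (from the logs above).
>     src = "/rhev/data-center/7d97ea80-f849-11eb-ac79-5452d501341a/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412-aa2a-3f7bad2f3a87"
>     dst = "/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412-aa2a-3f7bad2f3a87"
> 
>     # Naive string comparison - what libvirt seems to be doing here.
>     print(src == dst)                                      # False
> 
>     # Resolving the /rhev/data-center/pool-id symlink first - what vdsm does.
>     print(os.path.realpath(src) == os.path.realpath(dst))  # True on the host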
>
> This kind of issue is likely to be reproducible without ost.

Trying to understand why the drive path is different on the source:

The source path was created by a snapshot:

2021-08-08 14:19:40,219+0000 INFO  (tasks/9) [storage.Volume] Request
to create snapshot 20002ad2-4a97-4d2f-b3fc-c103477b5b91/
614abd56-4d4f-4412-aa2a-3f7bad2f3a87 of volume
20002ad2-4a97-4d2f-b3fc-c103477b5b91/1d3f07dc-b481-492f-a2a6-7c46689d82ba
with
capacity 117440512 (fileVolume:528)

Vdsm reports the volume as:

2021-08-08 14:19:56,944+0000 INFO  (jsonrpc/6) [vdsm.api] FINISH
getVolumeInfo return={'info': {'uuid':
'614abd56-4d4f-4412-aa2a-3f7bad2f3a87', 'type': 'SPARSE', 'format':
'COW', 'disktype': 'DATA', 'voltype': 'LEAF', 'capacity': '117440512',
'parent': '1d3f07dc-b481-492f-a2a6-7c46689d82ba', 'description': '',
'pool': '', 'domain': '46fa5761-bb9e-46be-8f1c-35f4b03d0203', 'image':
'20002ad2-4a97-4d2f-b3fc-c103477b5b91', 'ctime': '1628432381',
'mtime': '0', 'legality': 'LEGAL', 'generation': 0, 'apparentsize':
'196616', 'truesize': '200704', 'status': 'OK', 'lease': {'path':
'/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412-aa2a-3f7bad2f3a87.lease',
'offset': 0, 'owners': [], 'version': None}, 'children': []}}
from=::ffff:192.168.200.61,44352,
flow_id=fe1c52bf-a1ea-462f-ac32-14cd8f16c411,
task_id=14914c43-a38e-4692-9753-78ac2ddfbe5d (api:54)

(Using the /rhev/data-center/mnt/mountpoint/...)

The image was prepared later:

2021-08-08 14:20:32,047+0000 INFO  (vm/cde311f9) [vdsm.api] START
prepareImage(sdUUID='46fa5761-bb9e-46be-8f1c-35f4b03d0203',
spUUID='7d97ea80-f849-11eb-ac79-5452d501341a',
imgUUID='20002ad2-4a97-4d2f-b3fc-c103477b5b91',
leafUUID='614abd56-4d4f-4412-aa
...
2021-08-08 14:20:32,726+0000 INFO  (vm/cde311f9) [vdsm.api] FINISH
prepareImage return={'path': '/rhev/data-center/mnt/192.168
.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412
-aa2a-3f7bad2f3a87', 'info': {'type': 'file', 'path':
'/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-4
6be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412-aa2a-3f7bad2f3a87'},
'imgVolumesInfo': [{
'domainID': '46fa5761-bb9e-46be-8f1c-35f4b03d0203', 'imageID':
'20002ad2-4a97-4d2f-b3fc-c103477b5b91', 'volumeID': '1d3f07dc-b
481-492f-a2a6-7c46689d82ba', 'path':
'/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d
0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/1d3f07dc-b481-492f-a2a6-7c46689d82ba',
'leasePath': '/rhev/data-center/mnt/19
2.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/1d3f07dc-b481
-492f-a2a6-7c46689d82ba.lease', 'leaseOffset': 0}, {'domainID':
'46fa5761-bb9e-46be-8f1c-35f4b03d0203', 'imageID': '20002ad2-4
a97-4d2f-b3fc-c103477b5b91', 'volumeID':
'614abd56-4d4f-4412-aa2a-3f7bad2f3a87', 'path':
'/rhev/data-center/mnt/192.168.200.2:
_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412-aa2a-3
f7bad2f3a87', 'leasePath':
'/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/image
s/20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412-aa2a-3f7bad2f3a87.lease',
'leaseOffset': 0}, {'domainID': '46fa5761-
bb9e-46be-8f1c-35f4b03d0203', 'imageID':
'20002ad2-4a97-4d2f-b3fc-c103477b5b91', 'volumeID':
'a4309ef3-01bb-45db-8bf7-0f9498a7
feeb', 'path': 
'/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4
a97-4d2f-b3fc-c103477b5b91/a4309ef3-01bb-45db-8bf7-0f9498a7feeb',
'leasePath': '/rhev/data-center/mnt/192.168.200.2:_exports_n
fs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/a4309ef3-01bb-45db-8bf7-0f9498a7fee
b.lease', 'leaseOffset': 0}]} from=internal,
task_id=92af0386-5c0f-4495-9af2-9d5c47d02c48 (api:54)
2021-08-08 14:20:32,726+0000 INFO  (vm/cde311f9) [vds] prepared volume
path: /rhev/data-center/mnt/192.168.200.2:_exports_nfs_
share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412-aa2a-3f7bad2f3a87
(
clientIF:518)

And prepareImage returns the /rhev/data-center/mnt/... path - this path should
be used by the VM!

But then we see this suspicious log:

2021-08-08 14:20:33,975+0000 INFO  (vm/cde311f9) [virt.vm]
(vmId='cde311f9-9a33-4eb9-8338-fa22ff49edc2') drive 'vda' path:
'file=/rhev/data-center/7d97ea80-f849-11eb-ac79-5452d501341a/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412-aa2a-3f7bad2f3a87'
-> 
'*file=/rhev/data-center/mnt/192.168.200.2:_exports_nfs_share1/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412-aa2a-3f7bad2f3a87'
(storagexml:341)

And we see the wrong paths in the vm xml:

        <disk device="disk" snapshot="no" type="file">
            <driver cache="none" error_policy="stop" io="threads"
iothread="1" name="qemu" type="qcow2" />
            <source
file="/rhev/data-center/7d97ea80-f849-11eb-ac79-5452d501341a/46fa5761-bb9e-46be-8f1c-35f4b03d0203/images/20002ad2-4a97-4d2f-b3fc-c103477b5b91/614abd56-4d4f-4412-aa2a-3f7bad2f3a87">
                <seclabel model="dac" relabel="no" />
            </source>


So it looks like code in storagexml is *replacing* the path returned from
storage (/rhev/data-center/mnt/...) with another path
(/rhev/data-center/pool-id/...), which is incorrect and confuses libvirt.
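
To check how many drives get this rewrite, a rough sketch (assuming the
single-line "drive '<dev>' path: '<old>' -> '<new>'" message format shown
above):

    import sys

    # Print every drive path rewrite logged by storagexml, e.g. in
    # host0-vdsm.log, to check whether all file-based disks are affected.
    with open(sys.argv[1]) as log:
        for line in log:
            if "drive '" in line and "' path: " in line:
                sys.stdout.write(line)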

I don't think we changed anything in this area in vdsm, so this is likely a new
validation added in libvirt, revealing an old bug in vdsm/engine.

Nir