Re: [Qemu-block] [VDSM] travis tests fail consistently since Apr 14
On 06/11/2018 06:07 PM, Nir Soffer wrote: Correct image files should be aligned to 512 byte sectors, so something is wrong with your image to start with (hard disks don't have half sectors). Ideally, a guest will never see an unaligned image size: qemu should round UP when creating an image, and then truncate DOWN when opening an unaligned image that it cannot resize. qemu-img DOES round the guest virtual size up on creation; and thus, for the raw format, a file created by qemu-img will be aligned, and you can only get an unaligned image by manual efforts. But for qcow2 files, the qcow2 spec is clear that an incomplete final cluster (whether or not that last cluster end is also sector-aligned) is well-formed with the tail reading as zeroes; if the last cluster is guest-visible, qemu-img will have rounded it up to sector (but not necessarily cluster) alignment. But if the last cluster is something else, like the refcount, L1, or L2 table, then yes, it is very common that the consumed disk size is not sector aligned. $ qemu-img create -f qcow2 test3 123 Formatting 'test3', fmt=qcow2 size=123 cluster_size=65536 lazy_refcounts=off refcount_bits=16 $ qemu-img info test3 image: test3 file format: qcow2 virtual size: 512 (512 bytes) disk size: 196K cluster_size: 65536 Format specific information: compat: 1.1 lazy refcounts: false refcount bits: 16 corrupt: false $ ls -l test3 -rw-r--r--. 1 eblake eblake 196616 Jun 11 20:32 test3 $ echo $((196616/512*512)) 196608 So, you can see that the guest-visible size was rounded up (123 became 512), which is now sector-aligned but not cluster-aligned; and that the final cluster occupies only 8 bytes at the moment (the difference between 196616 and 196608). Maybe we should improve qemu to always create cluster-aligned qcow2 images, but that's a bigger audit. Right. We indeed forbid uploads on unaligned images in the UI. But we found that qemu-img creates unaligned images: $ qemu-img create -f qcow2 empty.raw 10g Formatting 'empty.raw', fmt=qcow2 size=10737418240 cluster_size=65536 lazy_refcounts=off refcount_bits=16 $ ls -l empty.raw -rw-r--r--. 1 nsoffer nsoffer 196768 Jun 12 01:59 empty.raw $ python -c "print 196768 % 512" 160 $ qemu-img map -f raw --output json test.raw Wait; where did test.raw come from; given that your earlier commands were dealing with empty.raw? And mapping a qcow2 file as though it were raw is indeed confusing (you don't want to do that for a guest using the image). The image becomes aligned once you write anything into it, but I still find it strange that qemu-img create such files. Shouldn't it create always 3 complete clusters? Because historical versions of qemu do not, we already have to cope with unaligned images (when opening them as qcow2; but not when transliterating and opening them as raw, since it took this long to find the regression); we could make future versions of qemu always create cluster-aligned images, but you'd still have to cope with existing images that aren't. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org
Re: [Qemu-block] [VDSM] travis tests fail consistently since Apr 14
On Mon, Jun 11, 2018 at 7:20 PM Kevin Wolf wrote: > Am 11.06.2018 um 12:43 hat Nir Soffer geschrieben: > > On Mon, Jun 11, 2018 at 1:03 PM Maor Lipchuk > wrote: > > > > > On Tue, Jun 5, 2018 at 9:40 AM, Dan Kenigsberg > wrote: > > > > > >> On Mon, Jun 4, 2018 at 7:14 PM, Nir Soffer > wrote: > > >> > On Mon, Jun 4, 2018 at 6:56 PM Dan Kenigsberg > > >> wrote: > > >> >> > > >> >> On Tue, May 8, 2018 at 11:59 AM, Nir Soffer > > >> wrote: > > >> >> > There are several issues: > > >> >> > > > >> >> > 1. coverage fail after this patch: > > >> >> > > > >> >> > > > >> > https://github.com/oVirt/vdsm/commit/6b905c2c134bcf344961d28eefbd05f2838d2ec8 > > >> >> > > > >> >> > https://travis-ci.org/oVirt/vdsm/builds/366574414 > > >> >> > ... > > >> >> > pwd > > >> >> > /vdsm/tests > > >> >> > ls .cov* > > >> >> > ls: cannot access .cov*: No such file or directory > > >> >> > make[1]: *** [check] Error 2 > > >> >> > make[1]: Leaving directory `/vdsm/tests' > > >> >> > > >> >> That was me, sorry. This should solve it: > > >> >> https://gerrit.ovirt.org/#/c/91925/ > > >> > > > >> > Thanks! > > >> > > > >> >> BTW, on my fc28 I see TestCountClusters.test_multiple_blocks > failing > > >> with > > >> >> > > >> >> E Error: Command ['/usr/bin/qemu-img', 'map', '--output', > > >> >> 'json', '/var/tmp/vdsm/test_multiple_blocks0/test'] failed with > rc=-6 > > >> >> out='' err="qemu-img: > > >> >> /builddir/build/BUILD/qemu-2.12.0-rc1/qemu-img.c:2680: > > >> >> get_block_status: Assertion `bytes' failed.\n" > > >> >> > > >> >> Any idea what's that? > > >> > > > >> > > > >> > Looks like qemu-img bug. > > >> > > > >> > Can you file a qemu-img bug? > > >> > > >> I hope Maor can translate the test to qemu-img speak. > > >> > > > > > > Opened the following bug: > > > https://bugzilla.redhat.com/1589738 > > > > > > > Adding qemu-block > > It's related to the unaligned image size. Right, I think we forgot the -1 in the test setup: f.seek(16 * 1024 - 1) f.write(b"x") But it is good that we made this error :-) > Correct image files should be > aligned to 512 byte sectors, so something is wrong with your image to > start with (hard disks don't have half sectors). > Right. We indeed forbid uploads on unaligned images in the UI. But we found that qemu-img creates unaligned images: $ qemu-img create -f qcow2 empty.raw 10g Formatting 'empty.raw', fmt=qcow2 size=10737418240 cluster_size=65536 lazy_refcounts=off refcount_bits=16 $ ls -l empty.raw -rw-r--r--. 1 nsoffer nsoffer 196768 Jun 12 01:59 empty.raw $ python -c "print 196768 % 512" 160 $ qemu-img map -f raw --output json test.raw [{ "start": 0, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": 0}, { "start": 4096, "length": 12288, "depth": 0, "zero": true, "data": false, "offset": 4096}, { "start": 16384, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": 16384}, qemu-img: /builddir/build/BUILD/qemu-2.12.0-rc1/qemu-img.c:2680: get_block_status: Assertion `bytes' failed. Aborted (core dumped) The image becomes aligned once you write anything into it, but I still find it strange that qemu-img create such files. Shouldn't it create always 3 complete clusters? > > Anyway, git bisects points to this one: > > a290f085901b528265787cd27ebda19c970be4ee is the first bad commit > commit a290f085901b528265787cd27ebda19c970be4ee > Author: Eric Blake > Date: Tue Feb 13 14:26:44 2018 -0600 > > file-posix: Switch to .bdrv_co_block_status() > > We are gradually moving away from sector-based interfaces, towards > byte-based. Update the file protocol driver accordingly. > > In want_zero mode, we continue to report fine-grained hole > information (the caller wants as much mapping detail as possible); > but when not in that mode, the caller prefers larger *pnum and > merely cares about what offsets are allocated at this layer, rather > than where the holes live. Since holes still read as zeroes at > this layer (rather than deferring to a backing layer), we can take > the shortcut of skipping lseek(), and merely state that all bytes > are allocated. > > We can also drop redundant bounds checks that are already > guaranteed by the block layer. > > Signed-off-by: Eric Blake > Reviewed-by: Vladimir Sementsov-Ogievskiy > Reviewed-by: Fam Zheng > Signed-off-by: Kevin Wolf > > I think the problem is a bit higher up the call stack, but I'm not > completely sure yet. It manifests in img_map(), in this code: > > while (curr.start + curr.length < length) { > ... > n = QEMU_ALIGN_DOWN(MIN(1 << 30, length - offset), > BDRV_SECTOR_SIZE); > ret = get_block_status(bs, offset, n, ); > > The loop condition is still true because a single byte is left to be > processed, but n is aligned down to 0. I'm not sure why the > QEMU_ALIGN_DOWN() is even there. > > Eric, would just removing the QEMU_ALIGN_DOWN() be correct? > > Kevin >
Re: [Qemu-block] [VDSM] travis tests fail consistently since Apr 14
On 06/11/2018 11:19 AM, Kevin Wolf wrote: Opened the following bug: https://bugzilla.redhat.com/1589738 Adding qemu-block It's related to the unaligned image size. Correct image files should be aligned to 512 byte sectors, so something is wrong with your image to start with (hard disks don't have half sectors). Anyway, git bisects points to this one: a290f085901b528265787cd27ebda19c970be4ee is the first bad commit commit a290f085901b528265787cd27ebda19c970be4ee Author: Eric Blake Date: Tue Feb 13 14:26:44 2018 -0600 file-posix: Switch to .bdrv_co_block_status() Hmm, definitely fallout from my changes. I think the problem is a bit higher up the call stack, but I'm not completely sure yet. It manifests in img_map(), in this code: while (curr.start + curr.length < length) { ... n = QEMU_ALIGN_DOWN(MIN(1 << 30, length - offset), BDRV_SECTOR_SIZE); ret = get_block_status(bs, offset, n, ); The loop condition is still true because a single byte is left to be processed, but n is aligned down to 0. I'm not sure why the QEMU_ALIGN_DOWN() is even there. Eric, would just removing the QEMU_ALIGN_DOWN() be correct? I think so, but I'm testing it now. If so, the real culprit was that I added the rounding in commit 5e344dd8 when I switched qemu-img.c get_block_status() to take bytes but operate in sectors, but didn't remove it when I later removed sector-based limitations in commit 237d78f8. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org
Re: [Qemu-block] [VDSM] travis tests fail consistently since Apr 14
Am 11.06.2018 um 12:43 hat Nir Soffer geschrieben: > On Mon, Jun 11, 2018 at 1:03 PM Maor Lipchuk wrote: > > > On Tue, Jun 5, 2018 at 9:40 AM, Dan Kenigsberg wrote: > > > >> On Mon, Jun 4, 2018 at 7:14 PM, Nir Soffer wrote: > >> > On Mon, Jun 4, 2018 at 6:56 PM Dan Kenigsberg > >> wrote: > >> >> > >> >> On Tue, May 8, 2018 at 11:59 AM, Nir Soffer > >> wrote: > >> >> > There are several issues: > >> >> > > >> >> > 1. coverage fail after this patch: > >> >> > > >> >> > > >> https://github.com/oVirt/vdsm/commit/6b905c2c134bcf344961d28eefbd05f2838d2ec8 > >> >> > > >> >> > https://travis-ci.org/oVirt/vdsm/builds/366574414 > >> >> > ... > >> >> > pwd > >> >> > /vdsm/tests > >> >> > ls .cov* > >> >> > ls: cannot access .cov*: No such file or directory > >> >> > make[1]: *** [check] Error 2 > >> >> > make[1]: Leaving directory `/vdsm/tests' > >> >> > >> >> That was me, sorry. This should solve it: > >> >> https://gerrit.ovirt.org/#/c/91925/ > >> > > >> > Thanks! > >> > > >> >> BTW, on my fc28 I see TestCountClusters.test_multiple_blocks failing > >> with > >> >> > >> >> E Error: Command ['/usr/bin/qemu-img', 'map', '--output', > >> >> 'json', '/var/tmp/vdsm/test_multiple_blocks0/test'] failed with rc=-6 > >> >> out='' err="qemu-img: > >> >> /builddir/build/BUILD/qemu-2.12.0-rc1/qemu-img.c:2680: > >> >> get_block_status: Assertion `bytes' failed.\n" > >> >> > >> >> Any idea what's that? > >> > > >> > > >> > Looks like qemu-img bug. > >> > > >> > Can you file a qemu-img bug? > >> > >> I hope Maor can translate the test to qemu-img speak. > >> > > > > Opened the following bug: > > https://bugzilla.redhat.com/1589738 > > > > Adding qemu-block It's related to the unaligned image size. Correct image files should be aligned to 512 byte sectors, so something is wrong with your image to start with (hard disks don't have half sectors). Anyway, git bisects points to this one: a290f085901b528265787cd27ebda19c970be4ee is the first bad commit commit a290f085901b528265787cd27ebda19c970be4ee Author: Eric Blake Date: Tue Feb 13 14:26:44 2018 -0600 file-posix: Switch to .bdrv_co_block_status() We are gradually moving away from sector-based interfaces, towards byte-based. Update the file protocol driver accordingly. In want_zero mode, we continue to report fine-grained hole information (the caller wants as much mapping detail as possible); but when not in that mode, the caller prefers larger *pnum and merely cares about what offsets are allocated at this layer, rather than where the holes live. Since holes still read as zeroes at this layer (rather than deferring to a backing layer), we can take the shortcut of skipping lseek(), and merely state that all bytes are allocated. We can also drop redundant bounds checks that are already guaranteed by the block layer. Signed-off-by: Eric Blake Reviewed-by: Vladimir Sementsov-Ogievskiy Reviewed-by: Fam Zheng Signed-off-by: Kevin Wolf I think the problem is a bit higher up the call stack, but I'm not completely sure yet. It manifests in img_map(), in this code: while (curr.start + curr.length < length) { ... n = QEMU_ALIGN_DOWN(MIN(1 << 30, length - offset), BDRV_SECTOR_SIZE); ret = get_block_status(bs, offset, n, ); The loop condition is still true because a single byte is left to be processed, but n is aligned down to 0. I'm not sure why the QEMU_ALIGN_DOWN() is even there. Eric, would just removing the QEMU_ALIGN_DOWN() be correct? Kevin
Re: [Qemu-block] [VDSM] travis tests fail consistently since Apr 14
On Mon, Jun 11, 2018 at 1:03 PM Maor Lipchuk wrote: > On Tue, Jun 5, 2018 at 9:40 AM, Dan Kenigsberg wrote: > >> On Mon, Jun 4, 2018 at 7:14 PM, Nir Soffer wrote: >> > On Mon, Jun 4, 2018 at 6:56 PM Dan Kenigsberg >> wrote: >> >> >> >> On Tue, May 8, 2018 at 11:59 AM, Nir Soffer >> wrote: >> >> > There are several issues: >> >> > >> >> > 1. coverage fail after this patch: >> >> > >> >> > >> https://github.com/oVirt/vdsm/commit/6b905c2c134bcf344961d28eefbd05f2838d2ec8 >> >> > >> >> > https://travis-ci.org/oVirt/vdsm/builds/366574414 >> >> > ... >> >> > pwd >> >> > /vdsm/tests >> >> > ls .cov* >> >> > ls: cannot access .cov*: No such file or directory >> >> > make[1]: *** [check] Error 2 >> >> > make[1]: Leaving directory `/vdsm/tests' >> >> >> >> That was me, sorry. This should solve it: >> >> https://gerrit.ovirt.org/#/c/91925/ >> > >> > Thanks! >> > >> >> BTW, on my fc28 I see TestCountClusters.test_multiple_blocks failing >> with >> >> >> >> E Error: Command ['/usr/bin/qemu-img', 'map', '--output', >> >> 'json', '/var/tmp/vdsm/test_multiple_blocks0/test'] failed with rc=-6 >> >> out='' err="qemu-img: >> >> /builddir/build/BUILD/qemu-2.12.0-rc1/qemu-img.c:2680: >> >> get_block_status: Assertion `bytes' failed.\n" >> >> >> >> Any idea what's that? >> > >> > >> > Looks like qemu-img bug. >> > >> > Can you file a qemu-img bug? >> >> I hope Maor can translate the test to qemu-img speak. >> > > Opened the following bug: > https://bugzilla.redhat.com/1589738 > Adding qemu-block