Re: [ovirt-devel] Disk sizes not updated on unmap/discard

2020-10-02 Thread Eric Blake
On 10/2/20 3:41 AM, Kevin Wolf wrote:

>> Kevin, is this the expected behavior or a bug in qemu?
>>
>> The disk I tested is a single qcow2 image without the backing file, so
>> theoretically qemu can deallocate all the discarded clusters.
> 
> This is expected. Discard just frees the cluster wherever it is stored,
> but it doesn't compact the image, i.e. move data at higher offsets to
> lower offsets (which would be a rather expensive operation).
> 
> If your storage supports thin provisioning/hole punching (the most
> common case of this is sparse files on a filesystem), then you can use
> the freed space for something else. If it doesn't, it's just marked free
> on the qcow2 level and future writes to the image will allocate the
> freed space first instead of growing the image, but you won't be able to
> use it for things outside of the image.
> 
> In contrast, 'qemu-img convert' starts with an empty file and only
> writes what needs to be written, so it will result in a compacted image
> file that doesn't have holes and is as short as it can be.

Of course, writing a tool to defragment qcow2 files in-place is not a
bad idea, if someone wants a potentially fun project.  But it's not the
highest priority task (since copying to fresh storage gets the same
effect, albeit with a temporarily larger storage requirement), so I
won't hold my breath on someone jumping into such a task in the near future.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org





Re: [ovirt-devel] Disk sizes not updated on unmap/discard

2020-10-02 Thread Kevin Wolf
On 02.10.2020 at 00:57, Nir Soffer wrote:
> After sparsifying disk:
> 
> storage:
> $ qemu-img check /var/tmp/download.qcow2
> No errors were found on the image.
> 170/163840 = 0.10% allocated, 0.59% fragmented, 0.00% compressed clusters
> Image end offset: 11927552
> 
> $ ls -lhs /home/target/2/00
> 2.1G -rw-r--r--. 1 root root 100G Oct  2 01:14 /home/target/2/00
> 
> host:
> 
> # qemu-img check
> /dev/27f2b637-ffb1-48f9-8f68-63ed227392b9/42cf66df-43ad-4cfa-ab57-a943516155d1
> No errors were found on the image.
> 170/163840 = 0.10% allocated, 0.59% fragmented, 0.00% compressed clusters
> Image end offset: 4822138880
> 
> Allocation decreased from 50% to 0.1%, but image end offset
> decreased only from 5381423104 to 4822138880 (-10.5%).
> 
> I don't know if this is a behavior change in virt-sparsify or qemu, or
> if it was always like that.
> 
> We had an old and unused sparsifyVolume API in vdsm before 4.4. This
> did not use --in-place and was very complicated because of this. But I
> think it would work in this case, since qemu-img convert will drop the
> unallocated areas.
> 
> For example after downloading the sparsified disk, we get:
> 
> $ qemu-img check download.qcow2
> No errors were found on the image.
> 170/163840 = 0.10% allocated, 0.59% fragmented, 0.00% compressed clusters
> Image end offset: 11927552
> 
> 
> Kevin, is this the expected behavior or a bug in qemu?
> 
> The disk I tested is a single qcow2 image without the backing file, so
> theoretically qemu can deallocate all the discarded clusters.

This is expected. Discard just frees the cluster wherever it is stored,
but it doesn't compact the image, i.e. move data at higher offsets to
lower offsets (which would be a rather expensive operation).

If your storage supports thin provisioning/hole punching (the most
common case of this is sparse files on a filesystem), then you can use
the freed space for something else. If it doesn't, it's just marked free
on the qcow2 level and future writes to the image will allocate the
freed space first instead of growing the image, but you won't be able to
use it for things outside of the image.

In contrast, 'qemu-img convert' starts with an empty file and only
writes what needs to be written, so it will result in a compacted image
file that doesn't have holes and is as short as it can be.
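
As a minimal illustration, using the end offsets quoted above (the file
name 'disk.qcow2' is only a placeholder):

  $ qemu-img check disk.qcow2            (after in-place sparsify/discard)
  ...
  Image end offset: 4822138880           (clusters freed, file not shortened)

  $ qemu-img convert -O qcow2 disk.qcow2 disk-compact.qcow2
  $ qemu-img check disk-compact.qcow2
  ...
  Image end offset: 11927552             (fresh copy, no holes, minimal length)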

Kevin




Re: [ovirt-devel] Disk sizes not updated on unmap/discard

2020-10-02 Thread Richard W.M. Jones
On Fri, Oct 02, 2020 at 01:57:04AM +0300, Nir Soffer wrote:
> On Wed, Sep 30, 2020 at 1:49 PM Tomáš Golembiovský  
> wrote:
> > Hi,
> >
> > currently, when we run virt-sparsify on a VM, or a user runs a VM with
> > discard enabled, and the disk is on block storage in qcow format, the
> > results are not reflected in oVirt. The blocks get discarded, the storage
> > can reuse them and reports correct allocation statistics, but oVirt does
> > not. In oVirt one can still see the original allocation for the disk and
> > storage domain as it was before the blocks were discarded. This is
> > super-confusing to users, because when they check after running
> > virt-sparsify and see the same values, they think sparsification is not
> > working, which is not true.
> 
> This may be a documentation issue. This is a known limitation of oVirt thin
> provisioned storage. We allocate space as needed, but we release the
> space only when a volume is deleted.
> 
> > It all seems to be because of the LVM layout that we have on the storage
> > domain. The feature page for discard [1] suggests it could be solved by
> > running lvreduce. But this does not seem to be true. When blocks are
> > discarded, the QCOW image does not necessarily change its apparent size;
> > the blocks don't have to be removed from the end of the disk. So running
> > lvreduce is likely to remove valuable data.
> 
> We have an API to (safely) reduce a volume to optimal size:
> http://ovirt.github.io/ovirt-engine-api-model/master/#services/disk/methods/reduce
> 
> Reducing images depends on the qcow2 image-end-offset. We can tell which
> is the highest offset used by an inactive disk:
> https://github.com/oVirt/vdsm/blob/24f646383acb615b090078fc7aeddaf7097afe57/lib/vdsm/storage/blockVolume.py#L403
> 
> and reduce the logical volume to this size.
> 
> But this will not work, since the qcow2 image-end-offset is not decreased by
> 
> virt-sparsify --in-place

Right - this doesn't "defragment" the qcow2 file, i.e. move clusters
to the beginning - so (except by accident) it won't make the qcow2
file smaller.

Virt-sparsify in copying mode will actually do what you want, but it is
obviously much more heavyweight and complex to use.
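
For reference, a rough sketch of the two modes (the paths are only
placeholders; copying mode also needs temporary scratch space, and the
new image then has to replace the old one):

  $ virt-sparsify --in-place /path/to/disk.qcow2
        (frees unused clusters inside the existing image; the image end
         offset - and hence the size the LV could be reduced to - stays put)

  $ virt-sparsify /path/to/disk.qcow2 /path/to/disk-sparse.qcow2
        (copying mode: writes a new, compacted image that is only as long
         as the allocated data requires)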

> So it is true that sparsify releases unused space on the storage level,
> but it does not decrease the qcow2 image allocation, so we cannot reduce
> the logical volumes.
> 
> > At the moment I don't see how we could achieve the correct values. If
> > anyone has any idea feel free to entertain me. The only option seems to
> > be to switch to LVM thin pools. Do we have any plans on doing that?
> 
> No, thin pools do not support clustering; they can be used only on a
> single host. oVirt LVM-based volumes are accessed on multiple hosts at
> the same time.
> 
> Here is an example sparsify test showing the issue:
> 
> Before writing data to new disk
> 
> guest:
> 
> # df -h /data
> Filesystem  Size  Used Avail Use% Mounted on
> /dev/sda1    10G  104M  9.9G   2% /data
> 
> storage:
> 
> $ ls -lhs /home/target/2/00
> 2.1G -rw-r--r--. 1 root root 100G Oct  2 00:57 /home/target/2/00
> 
> host:
> 
> # qemu-img info
> /dev/27f2b637-ffb1-48f9-8f68-63ed227392b9/42cf66df-43ad-4cfa-ab57-a943516155d1
> image: 
> /dev/27f2b637-ffb1-48f9-8f68-63ed227392b9/42cf66df-43ad-4cfa-ab57-a943516155d1
> file format: qcow2
> virtual size: 10 GiB (10737418240 bytes)
> disk size: 0 B
> cluster_size: 65536
> Format specific information:
> compat: 1.1
> compression type: zlib
> lazy refcounts: false
> refcount bits: 16
> corrupt: false
> 
> # qemu-img check
> /dev/27f2b637-ffb1-48f9-8f68-63ed227392b9/42cf66df-43ad-4cfa-ab57-a943516155d1
> No errors were found on the image.
> 168/163840 = 0.10% allocated, 0.60% fragmented, 0.00% compressed clusters
> Image end offset: 12582912
> 
> 
> After writing 5g file to file system on this disk in the guest:
> 
> guest:
> 
> $ dd if=/dev/zero bs=8M count=640 of=/data/test oflag=direct
> conv=fsync status=progress
> 
> # df -h /data
> Filesystem  Size  Used Avail Use% Mounted on
> /dev/sda1    10G  5.2G  4.9G  52% /data
> 
> storage:
> 
> $ ls -lhs /home/target/2/00
> 7.1G -rw-r--r--. 1 root root 100G Oct  2 01:06 /home/target/2/00
> 
> host:
> 
> # qemu-img check
> /dev/27f2b637-ffb1-48f9-8f68-63ed227392b9/42cf66df-43ad-4cfa-ab57-a943516155d1
> No errors were found on the image.
> 82088/163840 = 50.10% allocated, 5.77% fragmented, 0.00% compressed clusters
> Image end offset: 5381423104
> 
> 
> After deleting the 5g file:
> 
> guest:
> 
> # df -h /data
> Filesystem  Size  Used Avail Use% Mounted on
> /dev/sda1    10G  104M  9.9G   2% /data
> 
> storage:
> 
> $ ls -lhs /home/target/2/00
> 7.1G -rw-r--r--. 1 root root 100G Oct  2 01:12 /home/target/2/00
> 
> host:
> 
> # qemu-img check
> /dev/27f2b637-ffb1-48f9-8f68-63ed227392b9/42cf66df-43ad-4cfa-ab57-a943516155d1
> No errors were found on the image.
> 82088/163840 = 50.10% allocated, 5.77% fragmented, 0.00% compressed clusters
> Image end offset: 5381423104
> 
> 
> After 

Re: [ovirt-devel] Disk sizes not updated on unmap/discard

2020-10-01 Thread Nir Soffer
On Wed, Sep 30, 2020 at 1:49 PM Tomáš Golembiovský  wrote:
>
> Hi,
>
> currently, when we run virt-sparsify on a VM, or a user runs a VM with
> discard enabled, and the disk is on block storage in qcow format, the
> results are not reflected in oVirt. The blocks get discarded, the storage
> can reuse them and reports correct allocation statistics, but oVirt does
> not. In oVirt one can still see the original allocation for the disk and
> storage domain as it was before the blocks were discarded. This is
> super-confusing to users, because when they check after running
> virt-sparsify and see the same values, they think sparsification is not
> working, which is not true.

This may be a documentation issue. This is a known limitation of oVirt thin
provisioned storage. We allocate space as needed, but we release the
space only when a volume is deleted.

> It all seems to be because of the LVM layout that we have on the storage
> domain. The feature page for discard [1] suggests it could be solved by
> running lvreduce. But this does not seem to be true. When blocks are
> discarded, the QCOW image does not necessarily change its apparent size;
> the blocks don't have to be removed from the end of the disk. So running
> lvreduce is likely to remove valuable data.

We have an API to (safely) reduce a volume to optimal size:
http://ovirt.github.io/ovirt-engine-api-model/master/#services/disk/methods/reduce

Reducing images depends on the qcow2 image-end-offset. We can tell which
is the highest offset used by an inactive disk:
https://github.com/oVirt/vdsm/blob/24f646383acb615b090078fc7aeddaf7097afe57/lib/vdsm/storage/blockVolume.py#L403

and reduce the logical volume to this size.

But this will not work, since the qcow2 image-end-offset is not decreased by

virt-sparsify --in-place

So it is true that sparsify releases unused space on the storage level, but
it does not decrease the qcow2 image allocation, so we cannot reduce the
logical volumes.
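
For illustration, this is roughly where that offset comes from (jq is used
here only to pull the field out of the JSON output; the device path is the
one from the test below):

  $ qemu-img check --output=json \
      /dev/27f2b637-ffb1-48f9-8f68-63ed227392b9/42cf66df-43ad-4cfa-ab57-a943516155d1 \
      | jq '."image-end-offset"'
  4822138880

The LV could in principle be reduced to this value rounded up to the VG
extent size - this is the value the reduce API is built around - but since
in-place sparsify does not lower the offset, there is little to reclaim.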

> At the moment I don't see how we could achieve the correct values. If
> anyone has any idea feel free to entertain me. The only option seems to
> be to switch to LVM thin pools. Do we have any plans on doing that?

No, thin pools do not support clustering; they can be used only on a single
host. oVirt LVM-based volumes are accessed on multiple hosts at the same
time.

Here is an example sparsify test showing the issue:

Before writing data to new disk

guest:

# df -h /data
Filesystem  Size  Used Avail Use% Mounted on
/dev/sda1    10G  104M  9.9G   2% /data

storage:

$ ls -lhs /home/target/2/00
2.1G -rw-r--r--. 1 root root 100G Oct  2 00:57 /home/target/2/00

host:

# qemu-img info
/dev/27f2b637-ffb1-48f9-8f68-63ed227392b9/42cf66df-43ad-4cfa-ab57-a943516155d1
image: 
/dev/27f2b637-ffb1-48f9-8f68-63ed227392b9/42cf66df-43ad-4cfa-ab57-a943516155d1
file format: qcow2
virtual size: 10 GiB (10737418240 bytes)
disk size: 0 B
cluster_size: 65536
Format specific information:
compat: 1.1
compression type: zlib
lazy refcounts: false
refcount bits: 16
corrupt: false

# qemu-img check
/dev/27f2b637-ffb1-48f9-8f68-63ed227392b9/42cf66df-43ad-4cfa-ab57-a943516155d1
No errors were found on the image.
168/163840 = 0.10% allocated, 0.60% fragmented, 0.00% compressed clusters
Image end offset: 12582912


After writing 5g file to file system on this disk in the guest:

guest:

$ dd if=/dev/zero bs=8M count=640 of=/data/test oflag=direct
conv=fsync status=progress

# df -h /data
Filesystem  Size  Used Avail Use% Mounted on
/dev/sda1    10G  5.2G  4.9G  52% /data

storage:

$ ls -lhs /home/target/2/00
7.1G -rw-r--r--. 1 root root 100G Oct  2 01:06 /home/target/2/00

host:

# qemu-img check
/dev/27f2b637-ffb1-48f9-8f68-63ed227392b9/42cf66df-43ad-4cfa-ab57-a943516155d1
No errors were found on the image.
82088/163840 = 50.10% allocated, 5.77% fragmented, 0.00% compressed clusters
Image end offset: 5381423104


After deleting the 5g file:

guest:

# df -h /data
Filesystem  Size  Used Avail Use% Mounted on
/dev/sda1    10G  104M  9.9G   2% /data

storage:

$ ls -lhs /home/target/2/00
7.1G -rw-r--r--. 1 root root 100G Oct  2 01:12 /home/target/2/00

host:

# qemu-img check
/dev/27f2b637-ffb1-48f9-8f68-63ed227392b9/42cf66df-43ad-4cfa-ab57-a943516155d1
No errors were found on the image.
82088/163840 = 50.10% allocated, 5.77% fragmented, 0.00% compressed clusters
Image end offset: 5381423104


After sparsifying disk:

storage:
$ qemu-img check /var/tmp/download.qcow2
No errors were found on the image.
170/163840 = 0.10% allocated, 0.59% fragmented, 0.00% compressed clusters
Image end offset: 11927552

$ ls -lhs /home/target/2/00
2.1G -rw-r--r--. 1 root root 100G Oct  2 01:14 /home/target/2/00

host:

# qemu-img check
/dev/27f2b637-ffb1-48f9-8f68-63ed227392b9/42cf66df-43ad-4cfa-ab57-a943516155d1
No errors were found on the image.
170/163840 = 0.10% allocated, 0.59% fragmented, 0.00% compressed clusters
Image end offset: 4822138880

Allocation decreased from 50% to 0.1%, but image end offset
decreased only from 5381423104 to 4822138880 (-10.5%).