On 22.06.2020 at 17:50, Nir Soffer wrote:
> On Mon, Jun 22, 2020 at 12:47 PM Max Reitz <mre...@redhat.com> wrote:
> >
> > On 22.06.20 00:25, Nir Soffer wrote:
> > > On Fri, Jun 19, 2020 at 1:40 PM Max Reitz <mre...@redhat.com> wrote:
> > >>
> > >> Hi,
> > >>
> > >> As discussed here:
> > >>
> > >> https://lists.nongnu.org/archive/html/qemu-block/2020-02/msg00644.html
> > >> https://lists.nongnu.org/archive/html/qemu-block/2020-04/msg00329.html
> > >> https://lists.nongnu.org/archive/html/qemu-block/2020-06/msg00240.html
> > >>
> > >> I think that qcow2 images with data-file-raw should always have
> > >> preallocated 1:1 L1/L2 tables, so that the image always looks the same
> > >> whether you respect or ignore the qcow2 metadata.
> > >
> > > I don't know the internals of qcow2 data_file, but are we really using
> > > qcow2 metadata when accessing the data file?
> >
> > Yes.
> >
> > > This may have unwanted performance consequences.
> >
> > I don’t think so, because in practice normal lookups of L1/L2 mappings
> > generally don’t cost that much performance.
> >
> > > If I understand correctly, qcow2 metadata is needed only for keeping
> > > bitmaps (or maybe future extensions) for raw data file, and reading
> > > from the qcow2 image should be read directly from the raw file without
> > > any extra work.
> > >
> > > Writing to the data file should also bypass the qcow2 metadata, since
> > > the bitmap is updated in memory.
> >
> > Well, with this series, writing would no longer update the metadata at
> > least, because it would always be preallocated already.
> >
> > >> The easiest way to
> > >> achieve that is to enforce at least metadata preallocation whenever
> > >> data-file-raw is given.
> > >
> > > But preallocation is not free, even on file systems, it can be even
> > > slow (NFS < 4.2).
> >
> > Metadata preallocation with an external data file should be the same
> > speed on every file system. We only need to create the metadata
> > structures, which, with the default cluster size (64k) take up a bit
> > more than 1/8192 of the full image size.
> >
> > Sure, it’s not free. But if we decide we should indeed fully ignore the
> > L1/L2 tables for data-file-raw images, the qcow2 spec must be amended.
> > As I can read it, it currently doesn’t say so.
> >
> > (By the way, this is not a trivial change. Right now, data-file-raw is
> > an autoclear flag: If a version of qemu that doesn’t support it accesses
> > the image, it will automatically clear the flag, but the image stays
> > valid. If we decide to completely ignore the L1/L2 tables (i.e. not
> > even create them), then this can no longer be an autoclear flag. We’d
> > need a new incompatible flag. (Because without L1/L2 tables, the image
> > becomes useless to older qemu versions.))
> >
> > > With block storage this means you need to allocate the entire image
> > > size on storage for writing the metadata.
> > >
> > > While oVirt does not use qcow2 with data_file, having preallocated qcow2
> > > will make this very hard to use, for example for 500 GiB disk we will
> > > have to allocate 500 GiB disk for the raw data file and 500 GiB disk
> > > for the qcow2 metadata disk which will be 99% unused.
> >
> > I don’t understand this. When you use an external data file, the qcow2
> > file will only contain the metadata:
> >
> > $ qemu-img create -f qcow2 \
> >     -o data_file=foo.data,data_file_raw=on,preallocation=metadata \
> >     foo.qcow2 8G
> > Formatting 'foo.qcow2', fmt=qcow2 size=8589934592 data_file=foo.data
> > data_file_raw=on cluster_size=65536 preallocation=metadata
> > lazy_refcounts=off refcount_bits=16
> > $ ls -l foo.qcow2
> > ... 1310720 ... foo.qcow2
> > $ ls -l foo.data
> > ... 8589934592 ... foo.data
>
> When allocating metadata in regular qcow2, we need to allocate the
> entire device (+ extra space for metadata overhead):
>
> # qemu-img create -f qcow2 -o preallocation=metadata foo.qcow2 500g
> Formatting 'foo.qcow2', fmt=qcow2 size=536870912000 cluster_size=65536
> preallocation=metadata lazy_refcounts=off refcount_bits=16
>
> # qemu-img check foo.qcow2
> No errors were found on the image.
> 8192000/8192000 = 100.00% allocated, 0.00% fragmented, 0.00% compressed
> clusters
> Image end offset: 536953094144
I think we shouldn't really call this "allocating" because we don't
actually reserve space for it yet. On a filesystem, you get a large file
size, but it's almost completely sparse. On block devices, it depends on
whether the storage has thin provisioning.

> But I see that with metadata file we allocate much less:
>
> # qemu-img create -f qcow2 -o
> data_file=foo.data,data_file_raw=on,preallocation=metadata foo.qcow2
> 500g
> Formatting 'foo.qcow2', fmt=qcow2 size=536870912000 data_file=foo.data
> data_file_raw=on cluster_size=65536 preallocation=metadata
> lazy_refcounts=off refcount_bits=16
>
> # qemu-img check foo.qcow2
> No errors were found on the image.
> 8192000/8192000 = 100.00% allocated, 0.00% fragmented, 0.00% compressed
> clusters
> Image end offset: 65798144

Actually, this is not much less, but just split in two places. You still
have the 500 GB data file. The metadata is small, but it was already
small before: 536953094144 - 536870912000 = ~78 MB. Not exactly sure why
it's more than the 64 MB you get for an external data file, maybe some
alignment thing, but not significant anyway.

Kevin
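
The size arithmetic in this thread can be reproduced with a small Python
sketch. It is a rough model, not qemu's actual allocator: the function
name estimate_metadata_clusters is hypothetical, and the model assumes one
header cluster, 8-byte L1/L2 and refcount-table entries, and (the key
assumption) that with an external data file only the metadata clusters in
the qcow2 file are refcounted, while a regular qcow2 image also refcounts
every data cluster. Under those assumptions the estimates match the three
sizes quoted above, which suggests, though it is only an inference from
the numbers, that the difference between ~78 MB and ~64 MB comes from
refcount blocks rather than alignment.

```python
# Rough estimate of qcow2 metadata size for a fully mapped image.
# A sketch under stated assumptions, not qemu's real allocation logic.

CLUSTER_SIZE = 65536   # cluster_size=65536, from the qemu-img output
ENTRY_SIZE = 8         # bytes per L1/L2/refcount-table entry
REFCOUNT_BITS = 16     # refcount_bits=16, from the qemu-img output


def ceil_div(a, b):
    """Integer division rounding up."""
    return -(-a // b)


def estimate_metadata_clusters(virtual_size, external_data_file):
    """Estimate the number of metadata clusters (hypothetical helper)."""
    data_clusters = ceil_div(virtual_size, CLUSTER_SIZE)
    l2_tables = ceil_div(data_clusters, CLUSTER_SIZE // ENTRY_SIZE)
    l1_clusters = ceil_div(l2_tables * ENTRY_SIZE, CLUSTER_SIZE)
    fixed = 1 + l1_clusters + l2_tables  # header + L1 table + L2 tables

    # Assumption: data clusters need refcounts only when they live
    # inside the qcow2 file itself.
    refcounted = fixed if external_data_file else fixed + data_clusters

    refs_per_block = CLUSTER_SIZE * 8 // REFCOUNT_BITS
    blocks, table = 0, 0
    # Refcount structures are refcounted too, so iterate to a fixed point.
    while True:
        total = refcounted + blocks + table
        new_blocks = ceil_div(total, refs_per_block)
        new_table = ceil_div(new_blocks * ENTRY_SIZE, CLUSTER_SIZE)
        if (new_blocks, new_table) == (blocks, table):
            break
        blocks, table = new_blocks, new_table
    return fixed + blocks + table


if __name__ == "__main__":
    size_500g = 500 * 1024**3  # 536870912000, as in the quoted output
    external = estimate_metadata_clusters(size_500g, True) * CLUSTER_SIZE
    internal = estimate_metadata_clusters(size_500g, False) * CLUSTER_SIZE
    print(external)             # 65798144
    print(internal)             # 82182144
    print(internal - external)  # 16384000 (250 clusters)
```

The same model gives 1310720 bytes for the 8G image with an external data
file, matching the ls output quoted earlier.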