Re: Exporting qcow2 images as raw data from ova file with qemu-nbd

Kevin Wolf Mon, 29 Jun 2020 06:28:29 -0700

On 29.06.2020 at 15:08, Nir Soffer wrote:
> On Mon, Jun 29, 2020 at 3:06 PM Kevin Wolf <kw...@redhat.com> wrote:
> >
> > On 26.06.2020 at 21:42, Nir Soffer wrote:
> > > On Tue, Jun 23, 2020 at 1:21 AM Nir Soffer <nsof...@redhat.com> wrote:
> > > >
> > > > I'm trying to export qcow2 images from ova format using qemu-nbd.
> > > >
> > > > I create 2 compressed qcow2 images, with different data:
> > > >
> > > > $ qemu-img info disk1.qcow2
> > > > image: disk1.qcow2
> > > > file format: qcow2
> > > > virtual size: 200 MiB (209715200 bytes)
> > > > disk size: 384 KiB
> > > > ...
> > > >
> > > > $ qemu-img info disk2.qcow2
> > > > image: disk2.qcow2
> > > > file format: qcow2
> > > > virtual size: 200 MiB (209715200 bytes)
> > > > disk size: 384 KiB
> > > > ...
> > > >
> > > > And packed them in a tar file. This is not a valid ova but good enough
> > > > for this test:
> > > >
> > > > $ tar tvf vm.ova
> > > > -rw-r--r-- nsoffer/nsoffer 454144 2020-06-22 21:34 disk1.qcow2
> > > > -rw-r--r-- nsoffer/nsoffer 454144 2020-06-22 21:34 disk2.qcow2
> > > >
> > > > To get info about the disks in ova file, we can use:
> > > >
> > > > $ python -c 'import tarfile; print(list({"name": m.name, "offset":
> > > > m.offset_data, "size": m.size} for m in tarfile.open("vm.ova")))'
> > > > [{'name': 'disk1.qcow2', 'offset': 512, 'size': 454144}, {'name':
> > > > 'disk2.qcow2', 'offset': 455168, 'size': 454144}]
> > > >
> > > > First I tried the obvious:
> > > >
> > > > $ qemu-nbd --persistent --socket=/tmp/nbd.sock --read-only --offset=512 vm.ova
> > > >
> > > > And it works, but it exposes the qcow2 data. I want the raw data so I
> > > > can upload the guest data to oVirt, where it may be converted to
> > > > qcow2 format.
> > > >
> > > > $ qemu-img info --output json "nbd+unix://?socket=/tmp/nbd.sock"
> > > > {
> > > >     "virtual-size": 209715200,
> > > >     "filename": "nbd+unix://?socket=/tmp/nbd.sock",
> > > >     "format": "qcow2",
> > > >  ...
> > > > }
> > > >
> > > > Looking at the qemu manual and qapi/block-core.json, I could construct
> > > > this command:
> > > >
> > > > $ qemu-nbd --persistent --socket=/tmp/nbd.sock --read-only \
> > > > 'json:{"driver": "qcow2", "file": {"driver": "raw", "offset": 512,
> > > > "size": 454144, "file": {"driver": "file", "filename": "vm.ova"}}}'
> > > >
> > > > And it works:
> > > >
> > > > $ qemu-img info --output json "nbd+unix://?socket=/tmp/nbd.sock"
> > > > {
> > > >     "virtual-size": 209715200,
> > > >     "filename": "nbd+unix://?socket=/tmp/nbd.sock",
> > > >     "format": "raw"
> > > > }
> > > >
> > > > $ qemu-img map --output json "nbd+unix://?socket=/tmp/nbd.sock"
> > > > [{ "start": 0, "length": 104857600, "depth": 0, "zero": false, "data": true, "offset": 0},
> > > > { "start": 104857600, "length": 104857600, "depth": 0, "zero": true, "data": false, "offset": 104857600}]
> > > >
> > > > $ qemu-img map --output json disk1.qcow2
> > > > [{ "start": 0, "length": 104857600, "depth": 0, "zero": false, "data": true},
> > > > { "start": 104857600, "length": 104857600, "depth": 0, "zero": true, "data": false}]
> > > >
> > > > $ qemu-img convert -f raw -O raw nbd+unix://?socket=/tmp/nbd.sock disk1.raw
> > > >
> > > > $ qemu-img info disk1.raw
> > > > image: disk1.raw
> > > > file format: raw
> > > > virtual size: 200 MiB (209715200 bytes)
> > > > disk size: 100 MiB
> > > >
> > > > $ qemu-img compare disk1.raw disk1.qcow2
> > > > Images are identical.
> > > >
> > > > I wonder if this is the best way to stack a qcow2 driver on top of a
> > > > raw driver exposing a range from a tar file.
> >
> > Yes, if you want to specify an offset and a size to access only part of
> > a file as the disk image, sticking a raw driver in the middle is the way
> > to go.
> >
> > > Other related challenges with this are:
> > >
> > > 1. probing image format
> > >
> > > With standalone images, we probe image format using:
> > >
> > >     qemu-img info image
> > >
> > > I know probing is considered dangerous, but I think this is OK when a
> > > user runs this code on their own machine, on an image they want to
> > > upload to oVirt. On a hypervisor we use prlimit to limit the resources
> > > used by qemu-img, so we can use the same solution when the code is run
> > > by a user, if needed.
> > >
> > > However not being able to probe image format is a usability issue. It
> > > does not make sense that qemu-img cannot probe image format safely, at
> > > least for qcow2 format.
> > >
> > > I can get image info using:
> > >
> > > $ qemu-img info 'json:{"driver": "qcow2", "file": {"driver": "raw",
> > > "offset": 1536, "file": {"driver": "file", "filename":
> > > "fedora-32.ova"}}}'
> > > image: json:{"driver": "qcow2", "file": {"offset": 1536, "driver":
> > > "raw", "file": {"driver": "file", "filename": "fedora-32.ova"}}}
> > > file format: qcow2
> > > virtual size: 6 GiB (6442450944 bytes)
> > > disk size: 645 MiB
> > > cluster_size: 65536
> > > Format specific information:
> > >     compat: 1.1
> > >     lazy refcounts: false
> > >     refcount bits: 16
> > >     corrupt: false
> > >
> > > But there is no way to probe the format, unless I try first with
> > > qcow2, and consider the image as raw otherwise.
> >
> > Just leave out the top-level "driver" option. This isn't -blockdev
> > (which does indeed require a "driver"), but uses the same logic as
> > -drive and therefore supports format probing:
> >
> > $ ./qemu-img info 'json:{"file":{"driver":"raw","offset":512,"size":2424832,"file":{"filename":"/tmp/test.ova"}}}'
> > image: json:{"driver": "qcow2", "file": {"offset": 512, "driver": "raw", "size": 2424832, "file": {"driver": "file", "filename": "/tmp/test.ova"}}}
> 
> Nice!
> 
> > file format: qcow2
> > virtual size: 64 MiB (67108864 bytes)
> > disk size: 2.32 MiB
> > cluster_size: 65536
> > Format specific information:
> >     compat: 1.1
> >     compression type: zlib
> >     lazy refcounts: false
> >     refcount bits: 16
> >     corrupt: false
> >
> > > We can parse the qcow2 header manually, as we already do in oVirt
> > > engine UI in javascript:
> > > https://github.com/oVirt/ovirt-engine/blob/9d48ea6274fdd1bef3fc8e309f9161be3b540890/frontend/webadmin/modules/uicommonweb/src/main/java/org/ovirt/engine/ui/uicommonweb/models/storage/ImageInfoModel.java#L103
> > >
> > > We have used this code for 5 years and had no issues with it yet.
> > >
> > > In the worst case, if we fail to detect the format, or let the user
> > > upload a qcow2 file oVirt does not support, the upload will fail at
> > > the end, in the verification step, when we check the uploaded image
> > > using "qemu-img info". This is done using prlimit since we treat this
> > > image as untrusted.
> > >
> > > I think it would be useful if the qemu project published libraries
> > > in C/python/javascript supporting format probing for the qcow2 format.
> > >
> > > 2. getting image virtual size
> > >
> > > So we can use qemu-img info with a custom json: filename, but this is
> > > very complicated and error prone.
> >
> > How is this complicated and error prone? I would understand the
> > reasoning for human use (maybe not really error prone, but the syntax is
> > somewhat hard to remember), but isn't the context here use by a machine?
> 
> Yes, this is complicated for humans, meaning that someone needs to hide the
> complexity from the user. For a machine the json syntax is great.
> 
> It would be even nicer if we could also get the block graph in qemu as json
> for debugging and understanding how things are wired up under the hood.
> 
> > > 3. measuring image required size when converting to qcow2 image on block device
> > >
> > > This works if we know the image format:
> > >
> > > $ qemu-img measure -O qcow2 'json:{"driver": "qcow2", "file":
> > > {"driver": "raw", "offset": 1536, "file": {"driver": "file",
> > > "filename": "fedora-32.ova"}}}'
> > > required size: 1381302272
> > > fully allocated size: 6443696128
> > >
> > > But it is complicated.
> > >
> > > Can we have better support in qemu-img/qemu-nbd for accessing images
> > > in a tar file?
> > >
> > > Maybe something like:
> > >
> > >     qemu-img info tar://vm.ova?member=fedora-32.qcow2
> >
> > The problem with such convenient shortcut URLs is that they always fail
> > to cover more than the simplest cases. For example, how would you
> > express that you want to use a file from a tar file accessed through NBD
> > or HTTP?
> >
> > Of course, even if you have to revert to JSON (or the equivalent dotted
> > key syntax) for these cases, you would still save the work to find out
> > the right offsets yourself, so the idea does have some merit.
> 
> A tar driver can parse the tar file, find the requested file and use the right
> offset and size.
> 
> So we can have:
> 
>     {"file": {"driver": "tar",
>               "file-name": "disk1.qcow2",
>               "file": {"driver": "curl",
>                        "url": ...


Yes, as I said, JSON or dotted keys are expressive enough to cover these
cases. It's just not a nice URL any more then, but maybe these cases are
exotic enough that it doesn't really matter.

Kevin
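
Editorial sketch, not part of the original message: the thread builds the
json: filename by hand from the offset and size that Python's tarfile module
reports. A minimal way to automate that might look like the code below. The
helper names (member_image_opts, json_filename, dotted_keys) are hypothetical,
the tar file is assumed to be uncompressed, and the dotted-key rendering
assumes values contain no commas (a comma would need escaping). Leaving fmt
unset omits the top-level "driver", which is what lets qemu-img probe the
format as described above.

```python
import json
import tarfile


def member_image_opts(ova_path, member_name, fmt=None):
    """Build block options exposing one tar member as a disk image.

    fmt=None leaves out the top-level "driver" so the format is probed.
    """
    # "r:" refuses compressed archives, where data offsets would not
    # correspond to byte positions in the file on disk.
    with tarfile.open(ova_path, "r:") as tar:
        member = tar.getmember(member_name)  # KeyError if the member is missing
        opts = {
            "file": {
                "driver": "raw",
                "offset": member.offset_data,
                "size": member.size,
                "file": {"driver": "file", "filename": ova_path},
            }
        }
    if fmt is not None:
        opts["driver"] = fmt
    return opts


def json_filename(opts):
    """Render the options as a json: pseudo-filename."""
    return "json:" + json.dumps(opts)


def dotted_keys(opts, prefix=""):
    """Render the same options in dotted key=value form."""
    parts = []
    for key, value in opts.items():
        name = prefix + key
        if isinstance(value, dict):
            parts.append(dotted_keys(value, name + "."))
        else:
            parts.append(f"{name}={value}")
    return ",".join(parts)
```

The string returned by json_filename() can be passed wherever the thread
passes a hand-written json: filename. The dotted form is intended for use with
the --image-opts option of qemu-img and qemu-nbd.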
