* Max Reitz (mre...@redhat.com) wrote:
> On 2018-06-06 13:19, Michal Suchánek wrote:
> > On Wed, 6 Jun 2018 13:02:53 +0200
> > Max Reitz <mre...@redhat.com> wrote:
> >
> >> On 2018-06-06 12:32, Michal Suchánek wrote:
> >>> On Tue, 29 May 2018 12:14:15 +0200
> >>> Max Reitz <mre...@redhat.com> wrote:
> >>>
> >>>> On 2018-05-29 08:44, Kevin Wolf wrote:
> >>>>> On 28.05.2018 at 23:25, Richard W.M. Jones wrote:
> >>>>>> On Mon, May 28, 2018 at 10:20:54PM +0100, Richard W.M. Jones
> >>>>>> wrote:
> >>>>>>> On Mon, May 28, 2018 at 08:38:33PM +0200, Kevin Wolf wrote:
> >>>>>>>> Just accessing the image file within a tar archive is possible
> >>>>>>>> and we could write a block driver for that (I actually think we
> >>>>>>>> should do this), but it restricts you because certain
> >>>>>>>> operations like resizing aren't really possible in tar.
> >>>>>>>> Unfortunately, resizing is a really common operation for
> >>>>>>>> non-raw image formats.
> >>>>>>>
> >>>>>>> We do this already in virt-v2v (using file.offset and file.size
> >>>>>>> parameters in the raw driver).
> >>>>>>>
> >>>>>>> For virt-v2v we only need to read the source so resizing isn't
> >>>>>>> an issue. For most of the cases we're talking about the
> >>>>>>> downloaded image would also be a template / base image, so I
> >>>>>>> suppose only reading would be required too.
> >>>>>>>
> >>>>>>> I also wrote an nbdkit tar file driver (supports writes, but not
> >>>>>>> resizing).
> >>>>>>> https://manpages.debian.org/testing/nbdkit-plugin-perl/nbdkit-tar-plugin.1.en.html
> >>>>>>>
> >>>>>>
> >>>>>> I should add the other thorny issue with OVA files is that the
> >>>>>> metadata contains a checksum (SHA1 or SHA256) of the disk images.
> >>>>>> If you modify the disk images in-place in the tar file then you
> >>>>>> need to recalculate those.
> >>>>>
> >>>>> All of this means that OVA isn't really well suited to be used as
> >>>>> a native format for VM configuration + images.
> >>>>> It's just for sharing read-only images that are converted into
> >>>>> another native format before they are used.
> >>>>>
> >>>>> Which is probably fair for the use case it was made for, but means
> >>>>> that we need something else to solve our problem.
> >>>>
> >>>> Maybe we should first narrow down our problem. Maybe you have done
> >>>> that already, but I'm quite in the dark still.
> >>>>
> >>>> The original problem was that you need to supply a machine type to
> >>>> qemu, and that multiple common architectures now have multiple
> >>>> machine types and not necessarily all work with a single image. So
> >>>> far so good, but I have two issues here already:
> >>>>
> >>>> (1) How is qemu supposed to interpret that information? If it's
> >>>> stored in the image file, I don't see a nice way of retrieving it
> >>>> before the machine is initialized, at least not with qemu's current
> >>>> architecture. Once we support configuring qemu solely through QMP,
> >>>> sure, you can do a blockdev-add and then build the machine
> >>>> accordingly. But that is not here today, and I'm not sure this is
> >>>> a good idea either, because that would mean automagic defaults for
> >>>> the machine-building QMP commands derived from the blockdev-add
> >>>> earlier, which should get a plain "No". Also, having to use QMP to
> >>>> build your machine wouldn't make anything easier; at least not
> >>>> easier than just supplying a configuration file along with the
> >>>> image.
> >>>>
> >>>> (Building the magic into -blockdev might be less horrible, but such
> >>>> magic (adding block devices influences machine defaults) to me
> >>>> still doesn't seem worth not having to supply a config file along
> >>>> with the disk image.)
> >>>>
> >>>> (2) Again, I personally just really don't like saving such
> >>>> information in a disk image. One actual argument I can bring up
> >>>> for that distaste is this: Suppose, you have multiple images
> >>>> attached to your VM.
> >>>> Now the VM wants to store the machine type.
> >>>> Where does it go? Into all of them? But some of those images may
> >>>> only contain data and might be intended to be shared between
> >>>> multiple VMs. So those shouldn't receive the mark. Only disks
> >>>> with binaries should receive them. But what if those binaries are
> >>>> just cross-compiled binaries for some other VM? Oh no, so not
> >>>> even binaries are a sure indicator... So I have no idea where the
> >>>> information is supposed to be stored. In any case, "the first
> >>>> image" just gets an outright "no" from me, and "all images" gets
> >>>> an "I don't think this is a good idea".
> >>>>
> >>>> Loading is fun, too. OK, so you attach multiple disk images to a
> >>>> VM. Oops, they have varying machine type information... Now
> >>>> what? Use the information from the first one? Definitely no.
> >>>> Just ignore all of the information in such a case and have the
> >>>> user supply the machine type again? Possible, but it seems weird
> >>>> to me that qemu would usually guess the machine type, but once you
> >>>> attach some random other image to it, it suddenly fails to do
> >>>> that. But maybe it's just me who thinks this is weird.
> >>>>
> >>>>
> >>>> OK, so let's go a step further. We have stored the machine type
> >>>> information in order to not have to supply a config file with the
> >>>> qcow2 image -- because if we did, it could just contain the machine
> >>>> type and that would be it.
> >>>>
> >>>> So to me it follows naturally that just storing the machine type
> >>>> doesn't make much sense if we cannot also store more VM
> >>>> configuration in a qcow2 file, because I don't see why you should
> >>>> be able to ship an image without a config file only if all you
> >>>> need to supply is a machine type. Often, you also need to supply
> >>>> how much memory the VM needs (which depends on the OS on the
> >>>> image) or what storage controller to use (does the OS have virtio
> >>>> drivers?
> >>>> (to be fair, it usually does, because you're supplying a
> >>>> VM image in the first place)).
> >>>>
> >>>> So I think if we decide to store the machine type, that is kind of
> >>>> a slippery slope and then there are good arguments for storing
> >>>> even more configuration options in the file, too. But I really,
> >>>> really don't like that.
> >>>>
> >>>> For one thing, I suspect it to get really ugly implementation-wise.
> >>>> Getting the machine type out of a disk image and actually
> >>>> interpreting it automatically is bad enough, but getting possibly
> >>>> everything out of it? It's not going to be any better.
> >>>>
> >>>> For another, how do we store the data? key-value seems wrong if we
> >>>> want to store everything. JSON might be fine. But eventually we
> >>>> just want basically a qemu configuration file in there, I would
> >>>> think (which may support JSON at some point?). So basically we
> >>>> would store the data as a binary blob and let the rest of qemu do
> >>>> its thing with it. But then please tell me why I fought so
> >>>> valiantly against storing random bitmaps in qcow2 files.
> >>>
> >>> Yes, I wonder. Why did you?
> >>
> >> That was mostly directed at Kevin.
> >>
> >> My reasoning was that a qcow2 file is a disk image. All data stored
> >> therein should be immediately associated with the stored data.
> >> Another reason was that from the perspective of qcow2 you don't lose
> >> anything by tying the bitmaps directly to that data; all we lost was
> >> the capability of storing bitmaps for unrelated raw files.
> >>
> >> (And the reasoning for that is "if you want features, use qcow2" --
> >> although R/W backing files may loosen that phrase.)
> >>
> >>>> I hate the idea of making qcow2 a random archive format.
> >>>
> >>> What's wrong with that?
> >>
> >> The fact that qcow2 isn't.
> >>
> >> From my perspective it would increase the format's complexity to a
> >> point where you could just create a new format altogether.
> >> Well, actually, all you do is design a filesystem (or reuse an
> >> existing one).
> >>
> >>>> We have tar for that.
> >>>
> >>> It does not support expanding the stored files.
> >>
> >> Nor does qcow2, because it does not support storing files at all.
> >
> > AFAICT from the previous discussion it already does allow storing
> > multiple data streams that can be changed independently so it basically
> > is an archive format or filesystem except the streams are not named nor
> > easily accessible separately outside of qemu.
>
> I don't quite understand what you are referring to. We have snapshots,
> we have bitmaps, yes, but all of that are related directly to the stored
> guest disk data.
>
> The only thing we currently have in qcow2 that is opaque is the VM state
> that can be stored in snapshots (and don't hold me responsible for that).
>
> >> Secondly, that completely depends on how you use it. You can freely
> >> expand the last file in the archive, for instance. Also I've seen
> >> people store files in chunks so they can indeed resize it.
> >>
> >> (I'm wondering if we could write a block driver that could provide
> >> such a chunk allocation transparently to qcow2... Note that a qcow2
> >> file does not need to be continuous, so you could in theory indeed
> >> store the qcow2 file and its data in completely separate places in a
> >> tar file.)
> >
> > Which basically invents another new filesystem on top of tar for no
> > good reason. Especially when we have already support for storage format
> > that is capable enough.
>
> No different from inventing a filesystem on top of qcow2.
>
> I don't think qcow2 is any more capable than tar.
>
> >> What I'm trying to get at is that qcow2 was not designed to be a
> >> container format for arbitrary files. If you want to make it such,
> >> I'm sure there are existing formats that work better.
> >
> > Such as?
>
> ext2?
>
> It seems to me that you want to make qcow2 a filesystem.
> Sure, the FS we'd end up with would probably be simpler than ext2, but I
> assume thanks to feature creep we'd eventually end up with a qcow2 format
> that is a worse FS than real FS (especially performance-wise), but that is
> similarly complex.
>
> >>>> Unless I have got something terribly wrong (which is indeed a
> >>>> possibility!), to me this proposal means basically to turn qcow2
> >>>> into (1) a VM description format for qemu, and (2) to turn it into
> >>>> an archive format on the way.
> >>>
> >>> And if you go all the way you can store multiple disks along with
> >>> the VM definition so you can have the whole appliance in one file.
> >>> It conveniently solves the problem of synchronizing snapshots across
> >>> multiple disk images and the question where to store the machine
> >>> state if you want to suspend it.
> >>
> >> Yeah, but why make qcow2 that format? That's what I completely fail
> >> to understand.
> >>
> >> If you want to have a single VM description file that contains the VM
> >> configuration and some qcow2/raw/whatever files along with it for the
> >> guest disk data, sure, go ahead. But why does the format of the whole
> >> thing need to be qcow2?
> >
> > Because then qemu can access the disk data from the image directly
> > without any need for extraction, copying to different file, etc.
>
> This does not explain why it needs to be qcow2. There is absolutely no
> reason why you couldn't use qcow2 files in-place inside of another file.

Because then we'd have to change the whole stack to take advantage
of that. Adding a feature into qcow2 means nothing else changes.

Dave

> Max
>
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK