On 05.09.2015 18:43, Vladimir Sementsov-Ogievskiy wrote: > Persistent dirty bitmaps will be saved into qcow2 files. It may be used > as 'internal' bitmaps (for qcow2 drives) or as 'external' bitmaps for > other drives (there may be qcow2 file with zero disk size but with > several dirty bitmaps for other drives). > > Signed-off-by: Vladimir Sementsov-Ogievskiy <vsement...@virtuozzo.com> > --- > docs/specs/qcow2.txt | 127 > ++++++++++++++++++++++++++++++++++++++++++++++++++- > 1 file changed, 126 insertions(+), 1 deletion(-)
Overall: I'm strongly against putting dirty bitmaps into qcow2 files, at least not as it is envisioned by this series. If you don't feel like reading why, and you'd rather read what I'd do if you really, really want to put them into qcow2, files, skip ahead until the "RANT OVER" line. The first indication of why that is the case is that this patch does not add any explanation to the qcow2 specification what these dirty bitmaps are. Therefore, there are basically just binary data that is given a name and dumped into a qcow2 file as if it were a tar file. One could argue that this is qemu and we know what dirty bitmaps are. But qcow2.txt is located in docs/specs/, not just in docs/. It is not an explanation, but a *specification*, and as such it should explain everything related to qcow2. As a side notice, we already have a binary data dump in qcow2 files, and that is the VM state. This is bad enough and if it would have been up to me, it would have never been there. That's because it's something only qemu can make use of, and not even different versions of qemu are compatible there, so it was (in my opinion) a pretty bad idea to put it into qcow2. So what this specification is definitely lacking is an explanation on how any independent program (i.e. *not qemu*) is to interpret the dirty bitmaps. I do believe this is possible, as opposed to the VM state. The VM state, nobody can do anything with it, it's even difficult for qemu itself sometimes. So let's imagine this specification would contain an explanation on what dirty bitmaps are and what they mean. Actually, now that I think about it, I cannot really imagine it, because I'm lacking that explanation. What do they mean? As far as I can see from the series, they actually don't mean anything. It's just a dump of data into a qcow2 file, and it can be any bitmap, be it associated with the file itself or not. This is further pointed to by your feature proposal "Allow qcow2 images without l1_table and other staff but only with dirty bitmaps with minimum overhead". There is a file format for exactly that, and it's called tar (yes, you are missing some metadata, but just add a JSON description file to the archive and you're done). By the way: I heard John briefly touch this in his talk at KVM Forum when he explained that this would make qcow2 files something like better tar files, and I didn't like the idea back then either. I was hoping that it would actually be differently, and was waiting for some discussion to appear, but I didn't notice this series, because it doesn't have "qcow2" in the cover letter's subject (and I wasn't CC'd, but I don't really see why I should have been, as I'm not mentioned in the MAINTAINERS file (what a lucky man I am!)). I only just noticed today when I saw a lone reply from John on qemu-block to a patch with a "qcow2:" prefix. So, what you are apparently planning to do is to dump dirty bitmaps into any available qcow2 file. If the image you are operating on is a qcow2 file, great! If it isn't, you create some empty qcow2 file and dump the bitmaps there. Then, I'm asking myself why you don't use tar files in the second case, and then, why you don't use tar files in the first case. I do remember John saying that there was a dicussion about it, but I don't know about it, so I don't know why you dropped that idea in favor of making qcow2 files tar archives. The only reason I can think of off the top of my head is that we have infrastructure for reading qcow2 files, but not for tar files. However, this series is like just appending a tar file to a qcow2 file, and then implementing a reader for tar archives inside of the qcow2 driver, so it doesn't seem to be much simpler in practice. In any case, if my assumptions so far are more or less correct, no outside program can do anything with the dirty bitmaps contained in the qcow2 file, because they are just binary data which does not necessarily have any connection to the qcow2 file itself. Not even qemu can make sense of them, it appears, it needs the user or the management tool to do so. I am strongly against putting binary data into a qcow2 file which does not have any visible connection to the file's contents. Obviously, it is possible that there is some connection which I am just not seeing, though. --- RANT OVER --- Okay, that was enough destructive criticism, now to get some constructive arguments and ideas. So, there are two points I don't like: First, it's binary data which isn't explained in the qcow2 specification. This can easily be fixed. Second, there is no obvious connection between the qcow2 file and a dirty bitmap. I'd drop the idea of "If you use anything else than qcow2, we create an empty qcow2 file and put the dirty bitmaps there". Please don't do that. If you are using something else and want this feature, that's your problem. If you need features, you use qcow2. That's it. If you really want to support it for other file formats, but the data into tar archives and not into qcow2 files. For comparison, this is like using a qcow2 file for implementing backing files for raw images. The cluster offsets in L2 tables would then point to offsets in the raw image (and the host offset would have to match the guest offset), and by looking at which L2 table entries are unused, one could deduce which sectors are to be read from the backing file. We don't support that either, because you should just use qcow2 if you want backing files. Next we need to know for every dirty bitmap what the reference disk is. Since generally that reference disk is stored in some image file somewhere, I'd add a filename for each of the dirty bitmaps which is the base file in respect to which these clusters are considered dirty. As a measurement on how well you have done to associate a dirty bitmap with a qcow2 file, imagine the following scenario: You are writing a program independent of qemu, and that program is to make use of the dirty bitmaps for incremental backups. With my proposal above, it would open the qcow2 file and pick some bitmap based on name, base image, user choice or maybe some property of the bitmap itself (e.g. lowest dirty bit count). Then, it would create a new overlay file (the backup image), let's say a qcow2 file, and use the base image filename of the selected dirty bitmap as the filename of the backing file for the backup image. Then, it would copy all dirty clusters from the original qcow2 file to the backup image, and that's it. Right now, with this patch alone, the tool has no idea what the base image is, and some bitmaps may not even be related to the very qcow2 file they are in at all. With that fixed, I could be moved to accept the concept of dirty bitmaps in qcow2 files grudgingly. Maybe happily, if you give me a good reason why we should not put them into tar files. And I have some other comments in regards to the specification: > diff --git a/docs/specs/qcow2.txt b/docs/specs/qcow2.txt > index 121dfc8..5fc0365 100644 > --- a/docs/specs/qcow2.txt > +++ b/docs/specs/qcow2.txt > @@ -103,7 +103,13 @@ in the description of a field. > write to an image with unknown auto-clear features if it > clears the respective bits from this field first. > > - Bits 0-63: Reserved (set to 0) > + Bit 0: Dirty bitmaps bit. If this bit is set then > + there is a _consistent_ Dirty bitmaps > extension > + in the image. If it is not set, but there is > a > + Dirty bitmaps extension, its data should be > + considered as inconsistent. > + > + Bits 1-63: Reserved (set to 0) > > 96 - 99: refcount_order > Describes the width of a reference count block entry > (width > @@ -123,6 +129,7 @@ be stored. Each extension has a structure like the > following: > 0x00000000 - End of the header extension area > 0xE2792ACA - Backing file format name > 0x6803f857 - Feature name table > + 0x23852875 - Dirty bitmaps > other - Unknown header extension, can be safely > ignored > > @@ -166,6 +173,24 @@ the header extension data. Each entry look like this: > terminated if it has full length) > > > +== Dirty bitmaps == > + > +Dirty bitmaps is an optional header extension. It provides an ability to > store > +dirty bitmaps in a qcow2 image. The fields are: > + > + 0 - 3: nb_dirty_bitmaps > + The number of dirty bitmaps contained in the image. Valid > + values: 0 - 65535. Why? Because that's what qemu supports? That's not a real reason. If so, you may make a note of that (see the cluster_bits documentation), or just omit it; for years, qemu only supported refcount_order = 4, but the specification did not make a note of that. It was just a limitation of qemu, but not of the format. > + > + 4 - 7: dirty_bitmap_directory_size > + Size of the Dirty Bitmap Directory in bytes. Valid values: > + 0 - 67108864 (= 1024 * nb_dirty_bitmaps). Same here. > + > + 8 - 15: dirty_bitmap_directory_offset > + Offset into the image file at which the Dirty Bitmap > + Directory starts. Must be aligned to a cluster boundary. > + > + > == Host cluster management == > > qcow2 manages the allocation of host clusters by maintaining a reference > count > @@ -360,3 +385,103 @@ Snapshot table entry: > > variable: Padding to round up the snapshot table entry size to the > next multiple of 8. > + > + > +== Dirty bitmaps == > + > +The feature supports storing dirty bitmaps in a qcow2 image. I think I've made my point clear enough in the huge wall of text above, but I'll just repeat it once more: This should explain what dirty bitmaps are and how they are to be interpreted. > + > +=== Cluster mapping === > + > +Dirty bitmaps are stored using a ONE-level structure for the mapping of > +bitmaps to host clusters. It is called Dirty Bitmap Table. > + > +The Dirty Bitmap Table has a variable size (stored in the Dirty Bitmap > +Directory Entry) and may use multiple clusters, however it must be contiguous > +in the image file. > + > +Given an offset (in bytes) into the bitmap, the offset into the image file > can > +be obtained as follows: > + > + byte_offset = > + dirty_bitmap_table[offset / cluster_size] + (offset % cluster_size) > + > +Taking into accout the granularity of the bitmap, an offset in bits into the > +image file can be obtained like this: > + > + bit_offset = > + byte_offset(bit_nr / granularity / 8) * 8 + (bit_nr / granularity) % > 8 > + > +Here bit_nr is a number of "virtual" bit of the bitmap, which is covered by > +"physical" bit with number (bit_nr / granularity). > + > +Dirty Bitmap Table entry: > + > + Bit 0 - 8: Reserved > + > + 9 - 55: Bits 9-55 of host cluster offset. Must be aligned to a > + cluster boundary. If the offset is 0, the cluster is > + unallocated, and should be read as all zeros. > + > + 56 - 63: Reserved > + > +=== Dirty Bitmap Directory === > + > +Each dirty bitmap, saved in the image is described in the Dirty Bitmap > +Directory entry. Dirty Bitmap Directory is a contiguous area in the image > file, > +whose starting offset and length are given by the header extension fields > +dirty_bitmap_directory_offset and dirty_bitmap_directory_size. The entries of > +the bitmap directory have variable length, depending on the length of the > +bitmap name. > + > +Dirty Bitmap Directory Entry: > + > + Byte 0 - 7: dirty_bitmap_table_offset > + Offset into the image file at which the Dirty Bitmap > Table > + for the bitmap starts. Must be aligned to a cluster > + boundary. > + > + 8 - 15: nb_virtual_bits > + Number of "virtual" bits in the bitmap. Number of > + "physical" bits would be: > + (nb_virtual_bits + granularity - 1) / granularity > + > + 16 - 19: dirty_bitmap_table_size > + Number of entries in the Dirty Bitmap Table of the > bitmap. > + Valid values: 0 - 0x8000000. > + Also, (dirty_bitmap_table_size * cluster_size) should not > + be greater than 0x20000000 (512 MB) Again, is this a qemu limitation or is there another reason? Also, you should decide between the two limitations. The second one automatically limits the number of values to 0 - 1048575 at maximum (512 byte clusters). > + > + 20 - 23: granularity_bits > + Granularity bits. Valid values are: 0 - 63. > + > + Granularity is calculated as > + granularity = 1 << granularity_bits > + > + Granularity of the bitmap is how many "virtual" bits > + accounts for one "physical" bit. > + > + 24 - 27: flags > + Bit > + 0: in_use > + The bitmap is in use and may be inconsistent. What does "in use" mean? You are not supposed to use a qcow2 file which is in use by qemu anyway. > + > + 1: self > + The bitmap is a dirty bitmap for the containing > image. As I said, I don't see why we should support this ever being not set, so I am very much in favor of dropping this. > + > + 2: auto > + The bitmap should be autoloaded as block dirty > bitmap. > + Only available if bit 1 (self) is set. The phrasing is too qemu-specific. Remember that this is not an explanation for how qemu is to interpret qcow2 files, but a *specification* of qcow2 files for *any* tool. So if I understand the intention behind this flag, a more general expression would be "The default bitmap". Then it is qemu's decision to always auto-load this default bitmap. > + > + 3: read_only > + The bitmap should not be rewritten. > + > + Bits 4 - 31 are reserved. > + > + 28 - 29: name_size > + Size of the bitmap name. Valid values: 0 - 1023. > + > + variable: The name of the bitmap (not null terminated). > + > + variable: Padding to round up the Dirty Bitmap Directory Entry > size to > + the next multiple of 8. > The interesting thing is that I have written a huge wall of text above and all my comments (except for "just put it into tar") can be addressed relatively easy. Just add documentation for what dirty bitmaps are, and a "variable: base_filename" field here, and that would be it. But there is a reason why I'm keeping the wall of text there: I feel like while these are very minor changes, they are fundamental design differences. Without these changes, you just add a binary data dump extension to qcow2, which is of no use to anyone but qemu (and not even qemu alone, it needs the user or a management tool to tell it what to do with it, unless the @auto flag is set). With these changes, it suddenly actually becomes an integral part of the qcow2 file which can be interpreted and used in a meaningful way by tools other than qemu itself. Max
signature.asc
Description: OpenPGP digital signature