On 12/14/2015 11:18 PM, Fam Zheng wrote:
> On Mon, 12/14 21:05, Max Reitz wrote:
>> On 14.12.2015 18:43, Vladimir Sementsov-Ogievskiy wrote:
>>> The new feature for qcow2: storing dirty bitmaps.
>>>
>>> Only dirty bitmaps relative to this qcow2 image should be stored in it.
>>>
>>> Strings started from +# are RFC-strings, not to be commited of course.
>>>
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsement...@virtuozzo.com>
>>> ---
>>>
>>>  docs/specs/qcow2.txt | 151 
>>> ++++++++++++++++++++++++++++++++++++++++++++++++++-
>>>  1 file changed, 150 insertions(+), 1 deletion(-)
>>
>> Overall: Looks better to me. Good enough for me to ACK it, but I still
>> have some issues with it.
>>
>> Let's evaluate the main point of critique I had: I really want this not
>> to be qemu-specific but potentially useful to all programs.
>>
>> Pretty good: You do implicitly describe what a (dirty) bitmap looks like
>> by describing how to obtain the bit offset of a certain byte guest
>> offset. So it's not an opaque binary data dump anymore.
>>
>> (Why only "pretty good"? I find the description to be a bit too
>> "implicit", I think a separate section describing the bitmap structure
>> would be better.)
>>
>> Good: The bitmap actually describes the qcow2 file.
>>
>> Not so good: While now any program knows how to read the bitmap and that
>> it does refer to this qcow2 file, it's interpretation is not so easy
>> still. Generally, a dirty bitmap has some reference point, that is the
>> state of the disk when the bitmap was cleared or created. For instance,
>> for incremental backups, whenever you create a backup based on a dirty
>> bitmap, the dirty bitmap is cleared and the backup target is then said
>> reference point.
>> I think it would be nice to put that reference point (i.e. the name of
>> an image file that contains the clean image) into the dirty bitmap
>> header, if possible.
>>
>>
>> (Note: I won't comment on orthography, because I feel like that is
>> something a native speaker should do. O:-))
>>
>>> diff --git a/docs/specs/qcow2.txt b/docs/specs/qcow2.txt
>>> index 121dfc8..3c89580 100644
>>> --- a/docs/specs/qcow2.txt
>>> +++ b/docs/specs/qcow2.txt
>>> @@ -103,7 +103,17 @@ in the description of a field.
>>>                      write to an image with unknown auto-clear features if 
>>> it
>>>                      clears the respective bits from this field first.
>>>  
>>> -                    Bits 0-63:  Reserved (set to 0)
>>> +                    Bit 0:      Dirty bitmaps bit.
>>> +                                This bit is responsible for Dirty bitmaps
>>> +                                extension consistency.
>>> +                                If it is set, but there is no Dirty bitmaps
>>> +                                extensions, this should be considered as an
>>> +                                error.
>>> +                                If it is not set, but there is a Dirty 
>>> bitmaps
>>> +                                extension, its data should be considered as
>>> +                                inconsistent.
>>> +
>>> +                    Bits 1-63:  Reserved (set to 0)
>>>  
>>>           96 -  99:  refcount_order
>>>                      Describes the width of a reference count block entry 
>>> (width
>>> @@ -123,6 +133,7 @@ be stored. Each extension has a structure like the 
>>> following:
>>>                          0x00000000 - End of the header extension area
>>>                          0xE2792ACA - Backing file format name
>>>                          0x6803f857 - Feature name table
>>> +                        0x23852875 - Dirty bitmaps
>>>                          other      - Unknown header extension, can be 
>>> safely
>>>                                       ignored
>>>  
>>> @@ -166,6 +177,31 @@ the header extension data. Each entry look like this:
>>>                      terminated if it has full length)
>>>  
>>>  
>>> +== Dirty bitmaps ==
>>> +
>>> +Dirty bitmaps is an optional header extension. It provides an ability to 
>>> store
>>> +dirty bitmaps in a qcow2 image. The data of this extension should be 
>>> considered
>>> +as consistent only if corresponding auto-clear feature bit is set (see
>>> +autoclear_features above).
>>> +The fields of Dirty bitmaps extension are:
>>> +
>>> +          0 -  3:  nb_dirty_bitmaps
>>> +                   The number of dirty bitmaps contained in the image. 
>>> Valid
>>> +                   values: 1 - 65535.
>>
>> Again, I don't see a reason for why we should impose a strict upper
>> limit here. I'd prefer "Note that qemu currently only supports up to
>> 65535 dirty bitmaps per image."
>>
>>> +# Let's be strict, the feature should be deleted with deleting last bitmap.
> 
> Do you mean unsetting the auto-clear feature bit? Yes, I think that makes 
> sense.
> 

I assumed he meant the entire bitmap header. If there's no bitmaps,
there's no reason to store the extension anymore.

>>> +
>>> +          4 -  7:  dirty_bitmap_directory_size
>>> +                   Size of the Dirty Bitmap Directory in bytes. It should 
>>> be
>>> +                   equal to sum of sizes of all (nb_dirty_bitmaps) dirty 
>>> bitmap
>>> +                   headers.
>>
>> No, it "should" not be equal, it *must* be equal. But I think you can
>> just omit that last sentence, that would be just as fine.
>>
>>> +# This field is necessary to effectively read Dirty Bitmap Directory, 
>>> because
>>> +# it's entries (which are dirty bitmap headers) may have different lengths.
>>> +
>>> +          8 - 15:  dirty_bitmap_directory_offset
>>> +                   Offset into the image file at which the Dirty Bitmap
>>> +                   Directory starts. Must be aligned to a cluster boundary.
>>> +
>>> +
>>>  == Host cluster management ==
>>>  
>>>  qcow2 manages the allocation of host clusters by maintaining a reference 
>>> count
>>> @@ -360,3 +396,116 @@ Snapshot table entry:
>>>  
>>>          variable:   Padding to round up the snapshot table entry size to 
>>> the
>>>                      next multiple of 8.
>>> +
>>> +
>>> +== Dirty bitmaps ==
>>> +
>>> +The feature supports storing dirty bitmaps in a qcow2 image. All dirty 
>>> bitmaps
>>> +are relating to the virtual disk, stored in this image.
>>> +
>>> +=== Dirty Bitmap Directory ===
>>> +
>>> +Each dirty bitmap saved in the image is described in a Dirty Bitmap 
>>> Directory
>>> +entry. Dirty Bitmap Directory is a contiguous area in the image file, whose
>>> +starting offset and length are given by the header extension fields
>>> +dirty_bitmap_directory_offset and dirty_bitmap_directory_size. The entries 
>>> of
>>> +the bitmap directory have variable length, depending on the length of the
>>> +bitmap name. These entries are also called dirty bitmap headers.
>>> +
>>> +Dirty Bitmap Directory Entry:
>>> +
>>> +    Byte 0 -  7:    dirty_bitmap_table_offset
>>> +                    Offset into the image file at which the Dirty Bitmap 
>>> Table
>>> +                    (described below) for the bitmap starts. Must be 
>>> aligned to
>>> +                    a cluster boundary.
>>> +
>>> +         8 - 11:    dirty_bitmap_table_size
>>> +                    Number of entries in the Dirty Bitmap Table of the 
>>> bitmap.
>>> +
>>> +        12 - 15:    flags
>>> +                    Bit
>>> +                      0: in_use
>>> +                         The bitmap was not saved correctly and may be
>>> +                         inconsistent.
>>> +
>>> +                      1: auto
>>> +                         The bitmap should be autoloaded as block dirty 
>>> bitmap
>>> +                         and tracking should be started. Type of the bitmap
>>> +                         should be 'Dirty Tracking Bitmap'.
>>
>> I find the wording a bit too qemu-specific. How about this:
>>
>> This bitmap is the default dirty bitmap for the virtual disk represented
>> by this qcow2 image. It should track all write accesses immediately
>> after the image has been opened.
>>
>> And I find the "should" in "Type of the bitmap should be..." a bit too
>> weak. Maybe "This flag is valid only for Dirty Tracking Bitmaps" is better.
>>
>>> +
>>> +                    Bits 2 - 31 are reserved and must be 0.
>>> +
>>> +        16 - 17:    name_size
>>> +                    Size of the bitmap name. Valid values: 1 - 1023.
>>> +
>>> +             18:    type
>>> +                    This field describes the sort of the bitmap.
>>> +                    Values:
>>> +                      0: Dirty Tracking Bitmap
>>
>> If we allow different kinds of bitmaps, it should not be called "dirty
>> bitmap" everywhere anymore.
>>
>>> +
>>> +                    Values 1 - 255 are reserved.
>>> +# This is mostly for error checking and information in qemu-img info 
>>> output.
>>> +# The other types may be, for example, "Backup Bitmap" - to make it 
>>> possible
>>> +# stop backup job on vm stop and resume it later. The another one is 
>>> "Sector
>>> +# Alloction Bitmap" (Fam, John, please comment).
>>
>> I'm waiting for their comments because that sounds like "refcount table
>> with refcount_bits=1" to me. :-)
> 
> Allocation information for qcow2 can be obtained from L1/L2/refcount tables, 
> so
> maybe it's not worth a type here.  I am not quite sure about the "backup
> bitmap" type either, because it is QEMU specific; alternatively we can add an
> "enabled" flag to each dirty bitmap, so that the "drive-backup" bitmap can be
> saved as a disabled dirty tracking bitmap.
> 
> But the idea of having the type field makes sense to me in general.
> 

Right -- our plans for other types are still nebulous, but this gives us
some wiggle room.

>>
>>> +             19:    granularity_bits
>>> +                    Granularity bits. Valid values are: 0 - 31.
>>> +# Now, qemu allows creating bitmaps with granularity as a 32bit value. And
>>> +# there are no reasons of increasing it.
>>
>> Good (implicit) question. I can't imagine any reason for wanting to have
>> a coarser granularity than 2 GB, but I do think there may be a need in
>> the future for some people.
>>
>> Once again, I think we should discriminate between what is generally a
>> useful limitation and what is simply due to qemu not supporting anything
>> else right now.
>>
>> Thus, I think it would be better to increase the range to 0 - 63 and
>> make a note that qemu only supports values up to 31 right now.
>>
>>> +
>>> +                    Granularity is calculated as
>>> +                        granularity = 1 << granularity_bits
>>> +
>>> +                    Granularity of the bitmap is how many bytes of the 
>>> image
>>> +                    accounts for one bit of the bitmap.
>>> +# To be closer to qcow2 and its reality, I've decided to use 
>>> byte-granularity
>>> +# here, not sector-granularity.
>>
>> I like that. But do note that qcow2 does align everything at least to
>> 512 bytes, so having used sector granularity wouldn't have been too bad.
> 
> I would then suggest limiting the valid values to 9 - 63, at least as a note
> for QEMU support.
> 

>From qcow2.txt:

'20 - 23:   cluster_bits
            Number of bits that are used for addressing an offset
            within a cluster (1 << cluster_bits is the cluster size).
            Must not be less than 9 (i.e. 512 byte clusters).'

If we wanted to make the argument for it, we actually could feasibly
introduce 9 as a minimum, arguing that there's simply no use for a
bitmap that tracks granularities any smaller than the qcow2 can even
support.

We could also simply make the case that the bitmap granularity can never
be any smaller than the qcow2's current sector size, but that would
require some small adjustments in QEMU currently to reject bitmaps
configured to be smaller than such.

It's also fine as-is, as having a smaller granularity won't actually
break anything, it's just redundantly verbose and wasteful.

>>
>>> +
>>> +        variable:   The name of the bitmap (not null terminated). Should be
>>> +                    unique among all dirty bitmap names within the Dirty
>>> +                    bitmaps extension.
>>> +
>>> +        variable:   Padding to round up the Dirty Bitmap Directory Entry 
>>> size
>>> +                    to the next multiple of 8.
>>
>> What I'd like here is variable additional information based on the
>> bitmap type. For some types, this may be absolutely necessary; for dirty
>> tracking bitmaps it depends on what we do about the reference point thing.
>>
>> The reference point thing is the following: As mentioned at the top, I'd
>> like there to be some kind of description of what the clean state was.
>> As far as I know, this is generally a backup in the form of a file. In
>> that case, we could put that filename here.
>>
>> I don't think not having a reference point description is a serious show
>> stopper. qemu itself does rely on the management layer to know which
>> bitmap to use when. But I think it would be pretty nice to have it here.
>>
>>> +
>>> +=== Dirty Bitmap Table ===
>>> +
>>> +Dirty bitmaps are stored using a one-level (not two-level like refcounts 
>>> and
>>> +guest clusters mapping) structure for the mapping of bitmaps to host 
>>> clusters.
>>> +It is called Dirty Bitmap Table.
>>> +
>>> +Each Dirty Bitmap Table has a variable size (stored in the Dirty Bitmap
>>> +Directory Entry) and may use multiple clusters, however it must be 
>>> contiguous
>>> +in the image file.
>>> +
>>> +Given an offset (in bytes) into the bitmap, the offset into the image file 
>>> can
>>> +be obtained as follows:
>>> +
>>> +    byte_offset =
>>> +        dirty_bitmap_table[offset / cluster_size] + (offset % cluster_size)
>>> +
>>> +Taking into account the granularity of the bitmap, an offset in bits into 
>>> the
>>> +image file, corresponding to byte number byte_nr of the image can be 
>>> calculated
>>> +like this:
>>> +
>>> +    bit_offset =
>>> +        byte_offset(byte_nr / granularity / 8) * 8 + (byte_nr / 
>>> granularity) % 8
>>> +
>>> +Note: the last formula is only for understanding the things, it is 
>>> unlikely for
>>> +it to be useful in a program.
>>
>> I think this note is superfluous. All the pseudo-code in this file is
>> only that, pseudo-code. ;-)
>>
>> Apart from that, I think this last formula should be in its own section
>> ("Dirty Bitmaps" or simply "Bitmaps") which describes the structure of a
>> bitmap. Putting this term there should basically suffice.
>>
>> I was about to say I'd like it to define the bit order, too (i.e. "bit
>> offset 0 is the LSb"), but, well, it just uses the bit order used
>> everywhere in qcow2.
>>
>>> +
>>> +Dirty Bitmap Table entry:
>>> +
>>> +    Bit       0:    Reserved and should be zero if bits 9 - 55 are 
>>> non-zero.
>>> +                    If bits 9 - 55 are zero:
>>> +                      0: Cluster should be read as all zeros.
>>> +                      1: Cluster should be read as all ones.
>>> +
>>> +         1 -  8:    Reserved and must be zero.
>>> +
>>> +         9 - 55:    Bits 9 - 55 of host cluster offset. Must be aligned to 
>>> a
>>> +                    cluster boundary. If the offset is 0, the cluster is
>>> +                    unallocated, see bit 0 description.
>>> +
>>> +        56 - 63:    Reserved, must be 0.
>>>
>>
>> Looks good.
>>
> 
> Apart from all above, is it worth documenting what will happen with all the
> existing dirty bitmaps when internal snapshots are created/removed/switched?
> 
> Fam
> 

The bitmaps need to be marked appropriately dirty for any cluster(s)
that are changed as a result of the operation.

The degenerate case if it's too hard to tell is to just mark the entire
bitmap dirty. Not sure if deleting any bitmaps is the right answer,
since management apps might expect to see them ...

Even though a drive can be "rolled back" via a snapshot, there's not
really an equivalent operation for a backup chain, and I don't even
think it would be possible to implement.

Unless the .qcow2 had a special knowledge of "This bitmap was last
synced when we made a snapshot" QEMU wouldn't even be able to notify the
management application that it should consider pointing to [some
incremental] as the now most current one. Incrementals are very
external-state dependent features.

So even if we decided to keep backup copies of bitmaps, they wouldn't
necessarily even be of much use to anyone, I think. Heaven help you if
you delete a snapshot not created near any particular incremental.

So the degenerate case of "You deleted a snapshot, that was an
unfortunate choice, your bitmaps have been marked dirty" seems
appropriate, even though the incremental backup chain now has a weird
"old2 -> old1 -> deleted -> old1.1" style timeflow to it now.

This is worth hashing out and including a blurb, I think.

--js

Reply via email to