I'm trying to clarify that it is not up to the qcow2 driver what "used"
or "unused" *means* but rather that only the qcow2 driver itself can
know which segments are used or unused, as these are semantic
distinctions that belong to the format driver layer. Does that make
sense?
That's the spirit of my suggestion.
+# the feature is Qcow2 and for this case 'used' are clusters with positive
+# refcount and unused a clusters with zero refcount. Described portions include
+# all format file allocations, not only virtual disk data (metadata, internal
+# snapshots, etc. are included).
+#
"For now, the only format driver supporting this feature is Qcow2 which
is a cluster-based format. Clusters considered in-use by qcow2 are those
with a non-zero refcount in the format metadata. All other clusters, if
present, are considered unused."
(Your original description is actually pretty clear.)
I might add some further examples to illustrate the abstraction boundary
we're targeting:
"Examples of unused allocations for the Qcow2 format are leaked
clusters, pre-allocated clusters, and recently freed clusters."
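To make the refcount rule concrete, here is a minimal Python sketch of how I read it. The list of refcounts and the helper name `split_by_refcount` are hypothetical, made up for illustration only; this is not how the qcow2 driver stores or walks its metadata.

```python
def split_by_refcount(refcounts):
    """Split cluster indices into (used, unused) by refcount.

    `refcounts` is a hypothetical list holding one refcount per cluster
    present in the image file; it is not a real qcow2 structure.
    """
    used = [i for i, rc in enumerate(refcounts) if rc > 0]
    unused = [i for i, rc in enumerate(refcounts) if rc == 0]
    return used, unused


# Clusters 1 and 3 have a zero refcount, so they count as unused.
used, unused = split_by_refcount([1, 0, 2, 0])
```

The point of the sketch is only that the used/unused decision needs format metadata; nothing in the protocol layer could make it.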
+# For the underlying file there are native block-status types of the portions:
How about "underlying protocol file" or "underlying storage protocol" or
something that uses the word "protocol" to make it very clear about when
we're talking about a format (qcow2) and when we're talking about the
storage/protocol file itself (raw posix)?
+# - data: allocated data
ACK. My favorite kind of data. Easy to understand for idiots like me.
+# - zero: read-as-zero holes
This is not *necessarily* a hole, at the discretion of the protocol.
Depending on how we've backed the qcow2 we might not actually know how
the zeroes are stored, or if they are stored. All we know is that the
storage protocol here knows that this data happens to be zero.
I find the usage of "hole" here to be misleading, as it naturally
suggests not only filesystem sparse allocations (which is correct,
incidentally) but also qcow2 holes, which don't necessarily have
anything to do with zeroes.
+# - discarded: not allocated
This might be OK; I don't have a better suggestion. "not allocated" is
again protocol-dependent, but I can't think of a better way to phrase
this, actually...
+# 4th additional type is 'overrun', which is for the format file portions beyond
+# the end of the underlying file.
+#
"Which is data referenced by the format driver located beyond EOF of the
protocol file."
The key thing I am trying to illustrate in the phrase is that the format
file specifies or alludes to the existence of data that is beyond the
EOF for the protocol file.
I think -- though I cannot prove -- that this is almost certainly a
special case of read-as-zero. If that is the case, perhaps we could
mention as much.
An example here would be really illustrative:
"For example, a partially allocated cluster at the end of a QCOW2 file,
where Qcow2 generally operates on complete clusters."
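A rough sketch of the arithmetic I have in mind for that partial cluster. The helper name `overrun_bytes` and the sizes are hypothetical, picked only to make the numbers easy to follow; the patch may well compute this differently.

```python
def overrun_bytes(cluster_offset, cluster_size, file_length):
    """Bytes of one cluster that lie beyond EOF of the protocol file."""
    cluster_end = cluster_offset + cluster_size
    # Only the part of the cluster past EOF counts, and never more
    # than the cluster itself.
    return min(cluster_size, max(0, cluster_end - file_length))


# A 64 KiB cluster starting at 128 KiB in a file that ends at 160 KiB:
# the first 32 KiB are inside the file, the last 32 KiB are overrun.
partial = overrun_bytes(128 * 1024, 64 * 1024, 160 * 1024)
```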
+# So, the fields are:
+#
+# @used-data: used by the format file and backed by data in the underlying file
+#
+# @used-zero: used by the format file and backed by a hole in the underlying
+# file
+#
Maybe "backed by zeroes in the underlying file, which may be a
filesystem hole, e.g. for POSIX files."
+# @used-discarded: used by the format file but actually unallocated in the
+# underlying file
+#
Which would almost certainly be an error, right? Mentioning as much
might be good.
+# @used-overrun: used by the format file beyond the end of the underlying file
+#
Which may or may not be an error, depending on how the protocol file
(for the format driver?) handles reads to areas out of bounds.
+# @unused-data: allocated data in the underlying file not used by the format
+#
+# @unused-zero: holes in the underlying file not used by the format file
+#
+# @unused-discarded: unallocated areas in the underlying file not used by the
+# format file
+#
+# Note: sum of 6 fields {used,unused}-{data,zero,discarded} is equal to the
+# length of the underlying file.
+#
+# Since: 2.10
+#
+##
+{ 'struct': 'BlockFormatAllocInfo',
+ 'data': {'used-data': 'uint64',
+ 'used-zero': 'uint64',
+ 'used-discarded': 'uint64',
+ 'used-overrun': 'uint64',
+ 'unused-data': 'uint64',
+ 'unused-zero': 'uint64',
+ 'unused-discarded': 'uint64' } }
+
+##
# @ImageCheck:
#
# Information about a QEMU image file check
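For what it's worth, this is how I picture the seven counters and the note about the six in-file fields fitting together. It is a Python sketch under my own assumptions: the `(length, used, status)` segment tuples and the helpers `tally` and `in_file_total` are hypothetical and do not correspond to anything in the patch.

```python
FIELDS = ["used-data", "used-zero", "used-discarded", "used-overrun",
          "unused-data", "unused-zero", "unused-discarded"]


def tally(segments):
    """Sum segment lengths into BlockFormatAllocInfo-style counters.

    `segments` is a hypothetical list of (length, used, status) tuples,
    where status is "data", "zero", "discarded" or "overrun".
    """
    info = dict.fromkeys(FIELDS, 0)
    for length, used, status in segments:
        key = ("used-" if used else "unused-") + status
        if key not in info:  # e.g. "unused-overrun" does not exist
            raise ValueError("no such field: " + key)
        info[key] += length
    return info


def in_file_total(info):
    """Sum of the six fields that the note says cover the file."""
    return sum(v for k, v in info.items() if k != "used-overrun")


info = tally([(4096, True, "data"), (2048, True, "zero"),
              (1024, False, "discarded"), (512, True, "overrun")])
```

Here `in_file_total(info)` would have to match the length of the protocol file, while the overrun bytes are counted separately because they lie past its end.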
All of my suggestions here are purely on phrasings. The mechanics of
this patch are now clear to me and I think it is useful information to
have in qemu.
Thanks for putting up with my questions!