On 12.03.21 13:46, Vladimir Sementsov-Ogievskiy wrote:
12.03.2021 15:32, Vladimir Sementsov-Ogievskiy wrote:
12.03.2021 14:17, Max Reitz wrote:
On 12.03.21 10:09, Vladimir Sementsov-Ogievskiy wrote:
11.03.2021 22:58, Max Reitz wrote:
On 05.03.21 18:35, Vladimir Sementsov-Ogievskiy wrote:
There is a bug in qcow2: host cluster can be discarded (refcount
becomes 0) and reused during data write. In this case data write may
[..]
@@ -885,6 +1019,13 @@ static int QEMU_WARN_UNUSED_RESULT
update_refcount(BlockDriverState *bs,
if (refcount == 0) {
void *table;
+ Qcow2InFlightRefcount *infl = find_infl_wr(s,
cluster_index);
+
+ if (infl) {
+ infl->refcount_zero = true;
+ infl->type = type;
+ continue;
+ }
I don’t understand what this is supposed to do exactly. It seems
like it wants to keep metadata structures in the cache that are
still in use (because dropping them from the caches is what happens
next), but users of metadata structures won’t set in-flight
counters for those metadata structures, will they?
I don't follow.
We want the code in "if (refcount == 0)" to be triggered only when
full reference count of the host cluster becomes 0, including
inflight-write-cnt. So, if at this point inflight-write-cnt is not
0, we postpone freeing the host cluster, it will be done later from
"slow path" in update_inflight_write_cnt().
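A minimal sketch of the postponement described above, as a toy model rather than the actual patch. All names here (ToyCluster, toy_unref, toy_write_begin, toy_write_end) are hypothetical; only the idea that a cluster is released once both the refcount and the in-flight write count reach zero is taken from the discussion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: hypothetical names, not the qcow2 structures. */
typedef struct ToyCluster {
    uint64_t refcount;         /* stands in for the qcow2 refcount */
    uint64_t inflight_writes;  /* stands in for inflight-write-cnt */
    bool refcount_zero;        /* refcount hit 0 while writes were in flight */
    bool freed;                /* cluster may be handed out again */
} ToyCluster;

/* Drop one reference; release the cluster only if no write is in flight. */
void toy_unref(ToyCluster *c)
{
    assert(c->refcount > 0);
    c->refcount--;
    if (c->refcount == 0) {
        if (c->inflight_writes > 0) {
            /* Postpone: the last finishing write completes the free. */
            c->refcount_zero = true;
        } else {
            c->freed = true;
        }
    }
}

/* A data write to the cluster starts. */
void toy_write_begin(ToyCluster *c)
{
    c->inflight_writes++;
}

/* A data write finishes; this is the "slow path" that frees late. */
void toy_write_end(ToyCluster *c)
{
    assert(c->inflight_writes > 0);
    c->inflight_writes--;
    if (c->inflight_writes == 0 && c->refcount_zero) {
        c->refcount_zero = false;
        c->freed = true;
    }
}
```

With one reference and one write in flight, toy_unref() leaves the cluster unfreed and sets refcount_zero; the matching toy_write_end() is what finally marks it freed.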
But the code under “if (refcount == 0)” doesn’t free anything, does
it? All I can see is code to remove metadata structures from the
metadata caches (if the discarded cluster was an L2 table or a
refblock), and finally the discard on the underlying file. I don’t
see how that protocol-level discard has anything to do with our
problem, though.
Hmm. Still, if we do this discard, and then our in-flight write, we'll
have data instead of a hole. Not a big deal, but seems better to
postpone discard.
On the other hand, clearing caches is OK, as it's related only to
qcow2-refcount, not to inflight-write-cnt.
As far as I understand, the freeing happens immediately above the “if
(refcount == 0)” block by s->set_refcount() setting the refcount to
0. (including updating s->free_cluster_index if the refcount is 0).
Hmm.. And that (setting s->free_cluster_index) is what I should actually
prevent until the total reference count becomes zero.
And about s->set_refcount(): it only updates the refcount itself, and
doesn't free anything.
So, it is more correct like this:
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 464d133368..1da282446d 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1012,21 +1012,12 @@ static int QEMU_WARN_UNUSED_RESULT
update_refcount(BlockDriverState *bs,
} else {
refcount += addend;
}
- if (refcount == 0 && cluster_index < s->free_cluster_index) {
- s->free_cluster_index = cluster_index;
- }
s->set_refcount(refcount_block, block_index, refcount);
if (refcount == 0) {
void *table;
Qcow2InFlightRefcount *infl = find_infl_wr(s, cluster_index);
- if (infl) {
- infl->refcount_zero = true;
- infl->type = type;
- continue;
- }
-
table = qcow2_cache_is_table_offset(s->refcount_block_cache,
offset);
if (table != NULL) {
@@ -1040,6 +1031,16 @@ static int QEMU_WARN_UNUSED_RESULT
update_refcount(BlockDriverState *bs,
qcow2_cache_discard(s->l2_table_cache, table);
}
+ if (infl) {
+ infl->refcount_zero = true;
+ infl->type = type;
+ continue;
+ }
+
+ if (cluster_index < s->free_cluster_index) {
+ s->free_cluster_index = cluster_index;
+ }
+
if (s->discard_passthrough[type]) {
update_refcount_discard(bs, cluster_offset,
s->cluster_size);
}
I don’t think I like using s->free_cluster_index as a protection against
allocating something before it.
First, it comes back to the problem I just described in my mail from 15:58
GMT+1, which is that you’re changing the definition of what a free
cluster is. With this proposal, you’re proposing yet another definition:
A free cluster is anything with refcount == 0 after free_cluster_index.
Now looking only at the allocation functions, it may look like that kind
of is the definition already. But I don’t think that was the intention
when free_cluster_index was introduced, so we’d have to check every
place that sets free_cluster_index, to see whether it adheres to this
definition.
And I think it’s clear that there is a place that won’t adhere to this
definition, and that is this very place here, in update_refcount(). Say
free_cluster_index is 42. Then you free cluster 39, but there is a
write to it, so free_cluster_index isn’t updated. Then you free cluster
38, and there are no writes to that cluster, so free_cluster_index is
updated to 38. Suddenly, 39 is free to be allocated, too.
(The precise problem is that with this new definition decreasing
free_cluster_index suddenly has the power to free any cluster between
its new and old value. With the old definition, changing
free_cluster_index would never free any cluster. So when you decrease
free_cluster_index, you suddenly have to be sure that all clusters
between the new and old value that have refcount 0 are indeed to be
considered free.)
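To make the 42/39/38 scenario concrete, here is a small toy model, not the qcow2 code. It assumes a simplified allocator that scans upward from free_cluster_index for the first refcount == 0 entry; ToyState, toy_init, toy_free and toy_alloc are hypothetical names, and the single inflight flag per cluster is a stand-in for the in-flight write tracking in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_NB_CLUSTERS 64

/* Toy model only: hypothetical names, not the qcow2 state. */
typedef struct ToyState {
    uint16_t refcount[TOY_NB_CLUSTERS];
    bool inflight[TOY_NB_CLUSTERS];   /* a data write is in flight */
    uint64_t free_cluster_index;
} ToyState;

/* Start with every cluster in use and the given free_cluster_index. */
void toy_init(ToyState *s, uint64_t free_cluster_index)
{
    for (uint64_t i = 0; i < TOY_NB_CLUSTERS; i++) {
        s->refcount[i] = 1;
        s->inflight[i] = false;
    }
    s->free_cluster_index = free_cluster_index;
}

/*
 * Mirrors the proposed ordering: the refcount drops to 0 first, and
 * free_cluster_index is lowered only if no write is in flight.
 */
void toy_free(ToyState *s, uint64_t cluster_index)
{
    s->refcount[cluster_index] = 0;
    if (s->inflight[cluster_index]) {
        return; /* postponed: free_cluster_index is left alone */
    }
    if (cluster_index < s->free_cluster_index) {
        s->free_cluster_index = cluster_index;
    }
}

/* Simplified allocator: first refcount == 0 at or after the index. */
int64_t toy_alloc(ToyState *s)
{
    for (uint64_t i = s->free_cluster_index; i < TOY_NB_CLUSTERS; i++) {
        if (s->refcount[i] == 0) {
            s->refcount[i] = 1;
            s->free_cluster_index = i + 1;
            return (int64_t)i;
        }
    }
    return -1; /* nothing free in this toy image */
}
```

Starting from toy_init(&s, 42) with inflight[39] set, toy_free(&s, 39) leaves the index at 42, and toy_free(&s, 38) lowers it to 38. Two toy_alloc() calls then hand out 38 and 39, the second while the write to 39 is still in flight.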
Max