Hello Hanna, Sorry for the delay and thanks for your thorough and detailed review.
On 10/31/23 17:53, Hanna Czenczek wrote: > On 20.10.23 23:56, Andrey Drobyshev wrote: >> This helper simply obtains the l2 table parameters of the cluster which >> contains the given subclusters range. Right now this info is being >> obtained and used by zero_l2_subclusters(). As we're about to introduce >> the subclusters discard operation, this helper would let us avoid code >> duplication. >> >> Also introduce struct SubClusterRangeInfo, which would contain all the >> needed params. >> >> Signed-off-by: Andrey Drobyshev <andrey.drobys...@virtuozzo.com> >> --- >> block/qcow2-cluster.c | 90 +++++++++++++++++++++++++++++-------------- >> 1 file changed, 62 insertions(+), 28 deletions(-) >> >> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c >> index 904f00d1b3..8801856b93 100644 >> --- a/block/qcow2-cluster.c >> +++ b/block/qcow2-cluster.c >> @@ -32,6 +32,13 @@ >> #include "qemu/memalign.h" >> #include "trace.h" >> +typedef struct SubClusterRangeInfo { >> + uint64_t *l2_slice; > > We should document that this is a strong reference that must be returned > via qcow2_cache_put(). Maybe you could also define a clean-up function > using G_DEFINE_AUTO_CLEANUP_CLEAR_FUNC() that does this, allowing this > type to be used with g_auto(). > >> + int l2_index; >> + uint64_t l2_entry; >> + uint64_t l2_bitmap; >> +} SubClusterRangeInfo; >> + >> int coroutine_fn qcow2_shrink_l1_table(BlockDriverState *bs, >> uint64_t exact_size) >> { >> @@ -1892,6 +1899,50 @@ again: >> return 0; >> } >> +static int get_sc_range_info(BlockDriverState *bs, uint64_t offset, >> + unsigned nb_subclusters, >> + SubClusterRangeInfo *scri) > > It would be good to have documentation for this function, for example > that it only works on a single cluster, i.e. that the range denoted by > @offset and @nb_subclusters must not cross a cluster boundary, and that > @offset must be aligned to subclusters. > I figured out those restrictions are derived from the number of asserts in the function body itself, much like it's done in zero_l2_subclusters() and friends. But of course I don't mind documenting this. > In general, it is unclear to me at this point what this function does. The sole purpose of this function is to avoid the code duplication, since both zero_l2_subclusters() (pre-existing) and discard_l2_subclusters() (newly introduced) need to obtain the same info about the subclusters range they're working with. > OK, it gets the SCRI for all subclusters in the cluster at @offset (this > is what its name implies), but then it also has this loop that checks > whether there are compressed clusters among the @nb_subclusters. Look at the pre-existing implementation of zero_l2_subclusters(): it also checks that the subcluster range we're dealing with isn't contained within a compressed cluster (otherwise there's no point zeroizing individual subclusters). So the logic isn't changed here. The only reason I decided to loop through the subclusters (instead of calling qcow2_get_cluster_type() for the whole cluster) is so that I could detect invalid subclusters and return -EINVAL right away. > It has a comment about being unable to zero/discard subclusters in compressed > clusters, but the function name says nothing about this scope of > zeroing/discarding. > Maybe rename it then to stress out that we're dealing with the regular subclusters only? get_normal_sc_range_info()? >> +{ >> + BDRVQcow2State *s = bs->opaque; >> + int ret, sc_cleared = 0, sc_index = offset_to_sc_index(s, offset); >> + QCow2SubclusterType sctype; >> + >> + /* Here we only work with the subclusters within single cluster. */ >> + assert(nb_subclusters > 0 && nb_subclusters < >> s->subclusters_per_cluster); >> + assert(sc_index + nb_subclusters <= s->subclusters_per_cluster); >> + assert(offset_into_subcluster(s, offset) == 0); >> + >> + ret = get_cluster_table(bs, offset, &scri->l2_slice, >> &scri->l2_index); >> + if (ret < 0) { >> + return ret; >> + } >> + >> + scri->l2_entry = get_l2_entry(s, scri->l2_slice, scri->l2_index); >> + scri->l2_bitmap = get_l2_bitmap(s, scri->l2_slice, scri->l2_index); >> + >> + do { >> + qcow2_get_subcluster_range_type(bs, scri->l2_entry, >> scri->l2_bitmap, >> + sc_index, &sctype); > > I think there’s a `ret = ` missing here. > Of course there is, thanks for catching this. >> + if (ret < 0) { >> + return ret; >> + } >> + >> + switch (sctype) { >> + case QCOW2_SUBCLUSTER_COMPRESSED: >> + /* We cannot partially zeroize/discard compressed >> clusters. */ >> + return -ENOTSUP; >> + case QCOW2_SUBCLUSTER_INVALID: >> + return -EINVAL; >> + default: >> + break; >> + } >> + >> + sc_cleared += ret; >> + } while (sc_cleared < nb_subclusters); >> + >> + return 0; >> +} >> + >> /* >> * This discards as many clusters of nb_clusters as possible at once >> (i.e. >> * all clusters in the same L2 slice) and returns the number of >> discarded >> @@ -2097,44 +2148,27 @@ zero_l2_subclusters(BlockDriverState *bs, >> uint64_t offset, >> unsigned nb_subclusters) >> { >> BDRVQcow2State *s = bs->opaque; >> - uint64_t *l2_slice; >> - uint64_t old_l2_bitmap, l2_bitmap; >> - int l2_index, ret, sc = offset_to_sc_index(s, offset); >> + uint64_t new_l2_bitmap; >> + int ret, sc = offset_to_sc_index(s, offset); >> + SubClusterRangeInfo scri = { 0 }; >> - /* For full clusters use zero_in_l2_slice() instead */ >> - assert(nb_subclusters > 0 && nb_subclusters < >> s->subclusters_per_cluster); >> - assert(sc + nb_subclusters <= s->subclusters_per_cluster); >> - assert(offset_into_subcluster(s, offset) == 0); >> - >> - ret = get_cluster_table(bs, offset, &l2_slice, &l2_index); >> + ret = get_sc_range_info(bs, offset, nb_subclusters, &scri); >> if (ret < 0) { >> - return ret; >> - } >> - >> - switch (qcow2_get_cluster_type(bs, get_l2_entry(s, l2_slice, >> l2_index))) { >> - case QCOW2_CLUSTER_COMPRESSED: >> - ret = -ENOTSUP; /* We cannot partially zeroize compressed >> clusters */ >> goto out; >> - case QCOW2_CLUSTER_NORMAL: >> - case QCOW2_CLUSTER_UNALLOCATED: >> - break; >> - default: >> - g_assert_not_reached(); >> } >> - old_l2_bitmap = l2_bitmap = get_l2_bitmap(s, l2_slice, l2_index); >> - >> - l2_bitmap |= QCOW_OFLAG_SUB_ZERO_RANGE(sc, sc + nb_subclusters); >> - l2_bitmap &= ~QCOW_OFLAG_SUB_ALLOC_RANGE(sc, sc + nb_subclusters); >> + new_l2_bitmap = scri.l2_bitmap; >> + new_l2_bitmap |= QCOW_OFLAG_SUB_ZERO_RANGE(sc, sc + >> nb_subclusters); >> + new_l2_bitmap &= ~QCOW_OFLAG_SUB_ALLOC_RANGE(sc, sc + >> nb_subclusters); >> - if (old_l2_bitmap != l2_bitmap) { >> - set_l2_bitmap(s, l2_slice, l2_index, l2_bitmap); >> - qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice); >> + if (new_l2_bitmap != scri.l2_bitmap) { >> + set_l2_bitmap(s, scri.l2_slice, scri.l2_index, new_l2_bitmap); >> + qcow2_cache_entry_mark_dirty(s->l2_table_cache, scri.l2_slice); >> } >> ret = 0; >> out: >> - qcow2_cache_put(s->l2_table_cache, (void **) &l2_slice); >> + qcow2_cache_put(s->l2_table_cache, (void **) &scri.l2_slice); > > qcow2_cache_put() doesn’t look like it’s safe to be called if the table > is NULL (i.e. `scri.l2_slice == NULL`). We can get here if > get_sc_range_info() fails, so that may happen. I think you should either: > > (A) Implement G_DEFINE_AUTO_CLEANUP_CLEAR_FUNC() for the SCRI type so it > isn’t an issue (at least I assume that could solve it), or > > (B) Call qcow2_cache_put() here only if `scri.l2_slice != NULL`. > Since in patch 5 I add the code path zero_l2_subclusters() -> discard_l2_subclusters(), option (A) would only be valid if we manage to skip SCRI param on this call. If I understand correctly, you're suggesting adding a destructor for the SCRI type so that we call qcow2_cache_put() in one place and one place only when this structure goes out of the current scope. But even in this case we'd have to perform the '!= NULL' check in that destructor, the only benefit is that we do this in one place. > In any case, I think it also makes sense to have get_sc_range_info() > only return a valid table if it returns success, i.e. if there’s any > error in get_sc_range_info(), it should clean up `scri.l2_slice` itself. > Yes, this seems like a sane behaviour. > Hanna > >> return ret; >> } >