On Mon, Jun 01, 2026 at 11:44:04PM +0200, Sam Li wrote: > By adding zone operations and zoned metadata, the zoned emulation > capability enables full emulation support of zoned device using > a qcow2 file. The zoned device metadata includes zone type, > zoned device state and write pointer (WP) of each zone, which is > stored to an array of unsigned integers. > > WP accessor (qcow2_rw_wp_at) routes reads and writes of an 8-byte > WP slot through the write pointer cache. The write pointer cache is > written to disk after the qcow2 metadata is written, thus guaranteeing > that the write pointer is updated after the corresponding data is > written. Per-completion cache flush is deferred. The WP cluster reaches > disk on the next flush. > > Each zone of a zoned device makes state transitions following > the zone state machine. The zone state machine mainly describes > five states, IMPLICIT OPEN, EXPLICIT OPEN, FULL, EMPTY and CLOSED. > READ ONLY and OFFLINE states will generally be affected by device > internal events. The operations on zones cause corresponding state > changing. > > Zoned devices have limits on zone resources, which put constraints on > write operations on zones. It is managed by active zone queues > following LRU policy. > > Signed-off-by: Sam Li <[email protected]> > --- > block/qcow2-cache.c | 8 + > block/qcow2-refcount.c | 7 + > block/qcow2.c | 1137 +++++++++++++++++++++++++++++++++++++++- > block/trace-events | 2 + > 4 files changed, 1149 insertions(+), 5 deletions(-) > > diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c > index 23d9588b08..bdfb11ce88 100644 > --- a/block/qcow2-cache.c > +++ b/block/qcow2-cache.c > @@ -275,6 +275,14 @@ int qcow2_cache_set_dependency(BlockDriverState *bs, > Qcow2Cache *c, > { > int ret; > > + /* > + * If the dependency graph is unchanged, nothing to do. This avoids > + * a synchronous flush on every call below. > + */ > + if (c->depends == dependency) { > + return 0; > + }
This makes part of the expression below tautologous:
if (c->depends && (c->depends != dependency)) {
^^^^^^^^^^^^^^^^^^^^^^^^^^
ret = qcow2_cache_flush_dependency(bs, c);
if (ret < 0) {
return ret;
}
}
That sub-expression could be dropped, but it makes me worry that the
earlier if (dependency->depends) statement is needed even when
c->depends == dependency.
Kevin: Any thoughts on this?
> +
> if (dependency->depends) {
> ret = qcow2_cache_flush_dependency(bs, dependency);
> if (ret < 0) {
> diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
> index 6512cda407..f551726609 100644
> --- a/block/qcow2-refcount.c
> +++ b/block/qcow2-refcount.c
> @@ -1239,6 +1239,13 @@ int qcow2_write_caches(BlockDriverState *bs)
> }
> }
>
> + if (s->wp_cache) {
> + ret = qcow2_cache_write(bs, s->wp_cache);
> + if (ret < 0) {
> + return ret;
> + }
> + }
> +
> return 0;
> }
>
> diff --git a/block/qcow2.c b/block/qcow2.c
> index 29eec33e34..bdc8923b71 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -195,6 +195,300 @@ qcow2_extract_crypto_opts(QemuOpts *opts, const char
> *fmt, Error **errp)
> return cryptoopts_qdict;
> }
>
> +#define QCOW2_ZT_IS_CONV(wp) (wp & 1ULL << 59)
> +#define QCOW2_GET_WP(wp) ((wp << 5) >> 5)
> +
> +/*
> + * To emulate a real zoned device, closed, empty and full states are
> + * preserved after a power cycle. The open states are in-memory and will
> + * be lost after closing the device. Read-only and offline states are
> + * device-internal events, which are not considered for simplicity.
> + */
> +static inline BlockZoneState qcow2_get_zone_state(BlockDriverState *bs,
> + uint32_t index)
I guess this function requires bs->wps->colock or s->lock, otherwise the
TAILQ accesses could race? Please check and document the locking
requirements.
I/O requests may be processed in multiple threads simultaneously.
s->lock protects qcow2 state.
> +{
> + BDRVQcow2State *s = bs->opaque;
> + Qcow2ZoneListEntry *zone_entry = &s->zone_list_entries[index];
> + uint64_t zone_wp = bs->wps->wp[index];
> + uint64_t zone_start;
> +
> + if (QCOW2_ZT_IS_CONV(zone_wp)) {
> + return BLK_ZS_NOT_WP;
> + }
> +
> + if (QTAILQ_IN_USE(zone_entry, exp_open_zone_entry)) {
> + return BLK_ZS_EOPEN;
> + }
> + if (QTAILQ_IN_USE(zone_entry, imp_open_zone_entry)) {
> + return BLK_ZS_IOPEN;
> + }
> +
> + zone_start = index * bs->bl.zone_size;
This is a uint32_t * uint32_t multiplication that can overflow. Avoid
that with:
zone_start = (uint64_t)index * bs->bl.zone_size;
> + if (zone_wp == zone_start) {
> + return BLK_ZS_EMPTY;
> + }
> + if (zone_wp >= zone_start + bs->bl.zone_capacity) {
> + return BLK_ZS_FULL;
> + }
> + if (zone_wp > zone_start) {
> + if (!QTAILQ_IN_USE(zone_entry, closed_zone_entry)) {
> + /*
> + * The number of closed zones is not always updated in time when
> + * the device is closed. However, it only matters when doing
> + * zone report. Refresh the count and list of closed zones to
> + * provide correct zone states for zone report.
> + */
> + QTAILQ_INSERT_HEAD(&s->closed_zones, zone_entry,
> closed_zone_entry);
> + s->nr_zones_closed++;
> + }
> + return BLK_ZS_CLOSED;
> + }
> + return BLK_ZS_NOT_WP;
> +}
> +
> +static void qcow2_rm_exp_open_zone(BDRVQcow2State *s,
> + uint32_t index)
Locking requirements here and in the functions that follow?
> +{
> + Qcow2ZoneListEntry *zone_entry = &s->zone_list_entries[index];
> +
> + QTAILQ_REMOVE(&s->exp_open_zones, zone_entry, exp_open_zone_entry);
> + s->nr_zones_exp_open--;
> +}
> +
> +static void qcow2_rm_imp_open_zone(BDRVQcow2State *s,
> + int32_t index)
> +{
> + Qcow2ZoneListEntry *zone_entry;
> + if (index < 0) {
> + /* Apply LRU when the index is not specified. */
> + zone_entry = QTAILQ_LAST(&s->imp_open_zones);
> + } else {
> + zone_entry = &s->zone_list_entries[index];
> + }
> +
> + QTAILQ_REMOVE(&s->imp_open_zones, zone_entry, imp_open_zone_entry);
> + s->nr_zones_imp_open--;
> +}
> +
> +static void qcow2_rm_open_zone(BDRVQcow2State *s,
> + uint32_t index)
> +{
> + Qcow2ZoneListEntry *zone_entry = &s->zone_list_entries[index];
> +
> + if (QTAILQ_IN_USE(zone_entry, exp_open_zone_entry)) {
> + qcow2_rm_exp_open_zone(s, index);
> + } else if (QTAILQ_IN_USE(zone_entry, imp_open_zone_entry)) {
> + qcow2_rm_imp_open_zone(s, index);
> + }
> +}
> +
> +static void qcow2_rm_closed_zone(BDRVQcow2State *s,
> + uint32_t index)
> +{
> + Qcow2ZoneListEntry *zone_entry = &s->zone_list_entries[index];
> +
> + QTAILQ_REMOVE(&s->closed_zones, zone_entry, closed_zone_entry);
> + s->nr_zones_closed--;
> +}
> +
> +static void qcow2_do_imp_open_zone(BDRVQcow2State *s,
> + uint32_t index,
> + BlockZoneState zs)
> +{
> + Qcow2ZoneListEntry *zone_entry = &s->zone_list_entries[index];
> +
> + switch (zs) {
> + case BLK_ZS_EMPTY:
> + break;
> + case BLK_ZS_CLOSED:
> + qcow2_rm_closed_zone(s, index);
> + break;
> + case BLK_ZS_IOPEN:
> + /*
> + * The LRU policy: update the zone that is most recently
> + * used to the head of the zone list
> + */
> + if (zone_entry == QTAILQ_FIRST(&s->imp_open_zones)) {
> + return;
> + }
> + QTAILQ_REMOVE(&s->imp_open_zones, zone_entry, imp_open_zone_entry);
> + s->nr_zones_imp_open--;
> + break;
> + default:
> + return;
> + }
> +
> + QTAILQ_INSERT_HEAD(&s->imp_open_zones, zone_entry, imp_open_zone_entry);
> + s->nr_zones_imp_open++;
> +}
> +
> +static void qcow2_do_exp_open_zone(BDRVQcow2State *s,
> + uint32_t index)
> +{
> + Qcow2ZoneListEntry *zone_entry = &s->zone_list_entries[index];
> +
> + QTAILQ_INSERT_HEAD(&s->exp_open_zones, zone_entry, exp_open_zone_entry);
> + s->nr_zones_exp_open++;
> +}
> +
> +/*
> + * The list of zones is managed using an LRU policy: the last
> + * zone of the list is always the one that was least recently used
> + * for writing and is chosen as the zone to close to be able to
> + * implicitly open another zone.
> + *
> + * We can only close the open zones. The index is not specified
> + * when it is less than 0.
> + */
> +static void qcow2_do_close_zone(BlockDriverState *bs,
> + int32_t index,
> + BlockZoneState zs)
> +{
> + BDRVQcow2State *s = bs->opaque;
> + Qcow2ZoneListEntry *zone_entry;
> +
> + if (index >= 0) {
> + zone_entry = &s->zone_list_entries[index];
> + } else {
> + /* before removal of the last implicitly open zone */
> + zone_entry = QTAILQ_LAST(&s->imp_open_zones);
There is an assumption that zone_entry is no NULL when zs ==
BLK_ZS_IOPEN? I think that make sense and it means NULL dereferences
cannot happen, but I wanted to check. You could add an assert(zone_entry
!= NULL) here to make that explicit.
> + }
> +
> + if (zs == BLK_ZS_IOPEN) {
> + qcow2_rm_imp_open_zone(s, index);
> + goto close_zone;
> + }
> +
> + if (index >= 0 && zs == BLK_ZS_EOPEN) {
> + qcow2_rm_exp_open_zone(s, index);
> + /*
> + * The zone state changes when the zone is removed from the list of
> + * open zones (explicitly open -> empty). The closed zone list is
> + * refreshed during get_zone_state().
> + */
> + qcow2_get_zone_state(bs, index);
> + }
> + return;
> +
> +close_zone:
> + QTAILQ_INSERT_HEAD(&s->closed_zones, zone_entry, closed_zone_entry);
> + s->nr_zones_closed++;
Is the goto and label necessary? Maybe move this inside the if
statement instead to simplify the function.
> +}
I've reviewed up to here for now.
signature.asc
Description: PGP signature
