Stefan Hajnoczi <stefa...@redhat.com> 于2023年8月23日周三 03:48写道: > > On Mon, Aug 14, 2023 at 04:58:01PM +0800, Sam Li wrote: > > By adding zone operations and zoned metadata, the zoned emulation > > capability enables full emulation support of zoned device using > > a qcow2 file. The zoned device metadata includes zone type, > > zoned device state and write pointer of each zone, which is stored > > to an array of unsigned integers. > > > > Each zone of a zoned device makes state transitions following > > the zone state machine. The zone state machine mainly describes > > five states, IMPLICIT OPEN, EXPLICIT OPEN, FULL, EMPTY and CLOSED. > > READ ONLY and OFFLINE states will generally be affected by device > > internal events. The operations on zones cause corresponding state > > changing. > > > > Zoned devices have a limit on zone resources, which puts constraints on > > write operations into zones. > > > > Signed-off-by: Sam Li <faithilike...@gmail.com> > > --- > > block/qcow2.c | 676 ++++++++++++++++++++++++++++++++++++++++- > > block/qcow2.h | 2 + > > docs/interop/qcow2.txt | 2 + > > 3 files changed, 678 insertions(+), 2 deletions(-) > > > > diff --git a/block/qcow2.c b/block/qcow2.c > > index c1077c4a4a..5ccf79cbe7 100644 > > --- a/block/qcow2.c > > +++ b/block/qcow2.c > > @@ -194,6 +194,164 @@ qcow2_extract_crypto_opts(QemuOpts *opts, const char > > *fmt, Error **errp) > > return cryptoopts_qdict; > > } > > > > +#define QCOW2_ZT_IS_CONV(wp) (wp & 1ULL << 59) > > + > > +static inline int qcow2_get_wp(uint64_t wp) > > +{ > > + /* clear state and type information */ > > + return ((wp << 5) >> 5); > > +} > > + > > +static inline int qcow2_get_zs(uint64_t wp) > > +{ > > + return (wp >> 60); > > +} > > + > > +static inline void qcow2_set_wp(uint64_t *wp, BlockZoneState zs) > > +{ > > + uint64_t addr = qcow2_get_wp(*wp); > > + addr |= ((uint64_t)zs << 60); > > + *wp = addr; > > +} > > + > > +/* > > + * File wp tracking: reset zone, finish zone and append zone can > > + * change the value of write pointer. All zone operations will change > > + * the state of that/those zone. > > + * */ > > +static inline void qcow2_wp_tracking_helper(int index, uint64_t wp) { > > + /* format: operations, the wp. */ > > + printf("wps[%d]: 0x%x\n", index, qcow2_get_wp(wp)>>BDRV_SECTOR_BITS); > > +} > > + > > +/* > > + * Perform a state assignment and a flush operation that writes the new wp > > + * value to the dedicated location of the disk file. > > + */ > > +static int qcow2_write_wp_at(BlockDriverState *bs, uint64_t *wp, > > + uint32_t index, BlockZoneState zs) { > > + BDRVQcow2State *s = bs->opaque; > > + int ret; > > + > > + qcow2_set_wp(wp, zs); > > + ret = bdrv_pwrite(bs->file, s->zoned_header.zonedmeta_offset > > + + sizeof(uint64_t) * index, sizeof(uint64_t), wp, 0); > > + > > + if (ret < 0) { > > + goto exit; > > Should *wp be restored to its original value to undo the effect of > qcow2_set_wp()? > > > + } > > + qcow2_wp_tracking_helper(index, *wp); > > + return ret; > > + > > +exit: > > + error_report("Failed to write metadata with file"); > > + return ret; > > +} > > + > > +static int qcow2_check_active(BlockDriverState *bs) > > Please rename this to qcow2_check_active_zones() to avoid confusion with > other uses "active" in qcow2. > > > +{ > > + BDRVQcow2State *s = bs->opaque; > > + > > + if (!s->zoned_header.max_active_zones) { > > + return 0; > > + } > > + > > + if (s->nr_zones_exp_open + s->nr_zones_imp_open + s->nr_zones_closed > > + < s->zoned_header.max_active_zones) { > > + return 0; > > + } > > + > > + return -1; > > +} > > (This function could return a bool instead of 0/-1 since it doesn't > really need an int.) > > > + > > +static int qcow2_check_open(BlockDriverState *bs) > > qcow2_check_open_zones() or, even better, qcow2_can_open_zone(). > > > +{ > > + BDRVQcow2State *s = bs->opaque; > > + int ret; > > + > > + if (!s->zoned_header.max_open_zones) { > > + return 0; > > + } > > + > > + if (s->nr_zones_exp_open + s->nr_zones_imp_open > > + < s->zoned_header.max_open_zones) { > > + return 0; > > + } > > + > > + if(s->nr_zones_imp_open) { > > + ret = qcow2_check_active(bs); > > + if (ret == 0) { > > + /* TODO: it takes O(n) time complexity (n = nr_zones). > > + * Optimizations required. */ > > One solution is to keep an implicitly open list. Then this operation is > O(1).
Yes, I'll add this maybe in the v4. > > > + /* close one implicitly open zones to make it available */ > > + for (int i = s->zoned_header.zone_nr_conv; > > + i < bs->bl.nr_zones; ++i) { > > + uint64_t *wp = &s->wps->wp[i]; > > + if (qcow2_get_zs(*wp) == BLK_ZS_IOPEN) { > > + ret = qcow2_write_wp_at(bs, wp, i, BLK_ZS_CLOSED); > > I'm wondering if it's correct to store the zone state persistently in > the qcow2 file. If the guest or QEMU crashes, then zones will be left in > states like EOPEN. Since the guest software will have forgotten about > explicitly opened zones, the guest would need to recover zone states. > I'm not sure if existing software is designed to do that. > > Damien: Should the zone state be persistent? Pin? > > > + if (ret < 0) { > > + return ret; > > + } > > + s->wps->wp[i] = *wp; > > + s->nr_zones_imp_open--; > > + s->nr_zones_closed++; > > + break; > > + } > > + } > > + return 0; > > This returns 0 if there are no IOPEN zones available. It needs to return > an error when there are not enough resources available. > > > + } > > + return ret; > > + } > > + > > + return -1; > > This function mixes 0/-1 with -errno return values. -1 is -EPERM, but I > think that's not what you want.