Hi Damien, Damien Le Moal <damien.lem...@opensource.wdc.com> 于2022年6月1日周三 13:47写道: > > On 6/1/22 11:57, Sam Li wrote: > > Hi Stefan, > > > > Stefan Hajnoczi <stefa...@gmail.com> 于2022年5月30日周一 19:19写道: > > > > > >> > >> On Mon, 30 May 2022 at 06:09, Sam Li <faithilike...@gmail.com> wrote: > >>> > >>> Hi everyone, > >>> I'm Sam Li, working on the Outreachy project which is to add zoned > >>> device support to QEMU's virtio-blk emulation. > >>> > >>> For the first goal, adding QEMU block layer APIs resembling Linux ZBD > >>> ioctls, I think the naive approach would be to introduce a new stable > >>> struct zbd_zone descriptor for the library function interface. More > >>> specifically, what I'd like to add to the BlockDriver struct are: > >>> 1. zbd_info as zone block device information: includes numbers of > >>> zones, size of logical blocks, and physical blocks. > >>> 2. zbd_zone_type and zbd_zone_state > >>> 3. zbd_dev_model: host-managed zbd, host-aware zbd > >>> With those basic structs, we can start to implement new functions as > >>> bdrv*() APIs for BLOCK*ZONE ioctls. > >>> > >>> I'll start to finish this task based on the above description. If > >>> there is any problem or something I may miss in the design, please let > >>> me know. > >> > >> Hi Sam, > >> Can you propose function prototypes for the new BlockDriver callbacks > >> needed for zoned devices? > > > > I have made some modifications based on Damien's device in design part > > 1 and added the function prototypes in design part 2. If there is any > > problem or part I missed, please let me know. > > > > Design of Block Layer APIs in BlockDriver: > > 1. introduce a new stable struct zbd_zone descriptor for the library > > function interface. > > a. zbd_info as zone block device information: includes numbers of > > zones, size of blocks, write granularity in byte(minimal write size > > and alignment > > - write granularity: 512e SMRs: writes in units of physical block > > size, 4096 bytes; NVMe ZNS write granularity is equal to the block > > size. > > - zone descriptor: start, length, capacity, write pointer, zone type > > b. zbd_zone_type > > - zone type: conventional, sequential write required, sequential > > write preferred > > c. zbd_dev_model: host-managed zbd, host-aware zbd > > This explanation is a little hard to understand. It seems to be mixing up > device level information and per-zone information. I think it would be a > lot simpler to write a struct definition to directly illustrate what you > are planning. > > It is something like this ? > > struct zbd_zone { > enum zone_type type; > enum zone_cond cond; > uint64_t start; > uint32_t length; > uint32_t cap; > uint64_t wp; > }; > > strcut zbd_dev { > enum zone_model model; > uint32_t block_size; > uint32_t write_granularity; > uint32_t nr_zones > struct zbd_zone *zones; /* array of zones */ > }; >
Yes! Sorry that I was not very clear in the descriptions. > If yes, then my comments are as follows. > > For the device struct: It may be good to have also the maximum number of > open zones and the maximum number of active zones. > > For the zone struct: You may need to add a read-write lock per zone to be > able to write lock zones to ensure a sequential write pattern (virtio > devices can be multi-queue and so writes may be coming in from different > contexts) and to correctly emulate zone append operations with an atomic > update of the wp field. > Yes, I haven't thought through the thread-safety problem but I'll come up with an approach. > These need to be integrated into the generic block driver interface in > include/block/block_int-common.h or include/block/block-common.h. > > > > > 2. implement new functions as bdrv*() APIs for BLK*ZONE ioctls > > a. support basic operations: get the APIs working when executing > > the zone operations from a guest > > - zone information access: report > > - zone manipulation: reset,open,close,finish > > These operations are generally referred to as "zone management" operations. > > > b. support zone append operation: zone capacity, write pointer > > positions of all zones(excluded for now) > > - can track the zone state we need: zone is full or not. > > > > More specifically, the function prototypes for 2a are as follows: > > > > int zbd_report_zones(int fd, off_t offset, off_t len, enum > > zbd_report_opetion ro, struct zbd_zone *zones, unsigned int > > *nr_zones); > > The current virtio zone specification draft does not have a reporting > option field for the report zones command. However, it has a "partial" > field that will need to be passed to this function so that the correct > report zones reply can be built by the driver. > > > int zbd_reset_zones(int fd, off_t offset, off_t len); > > int zbd_open_zones(int fd, off_t offset, off_t len); > > int zbd_close_zones(int fd, off_t offset, off_t len); > > int zbd_finish_zones(int fd, off_t offset, off_t len); > > These 4 functions have the exact same signature, modulo the function name. > So we should simplify here to reduce the number of functions. Something like: > > int zbd_zone_mgmt(int fd, enum zone_op op, off_t offset, off_t len); > > where op can be BDRV_ZONE_RESET, BDRV_ZONE_OPEN, etc can aggregate all 4 > functions into one. > Thanks for the suggestions. It would be better to use one general function supporting four operations rather than four. > In any case, if you look at how block drivers are defined (an example is > the one for raw files in qemu/block/file-posix.c) using the BlockDriver > data type (defined as a struct in qemu/include/block/block_int-common.h), > you will see that the driver callback functions do not use a file > descriptor (fd) but a pointer to some data structure (most of the time a > BlockDriverState pointer). > Yes, I'll use it instead. Meanwhile, the BlockDriver callbacks can be(with Damien's suggestion): -> virtio_blk_handle_zone_mngmt() -> blk_aio_zone_mngmt() -> blk_aio_prwv() + blk_aio_zone_mngmt_entry() -> bdrv_co_do_zone_mngmt() -> bdrv_co_zone_mngmt(). I am still on the way to understand the BlockDriver structures. So the above may need a second thought. > Another thing: you will need one more callback to get the device > information related to zones: maximum number of open and active zones at > least (the number of zones can be discovered with a report zones call). > Is this callback zbd_get_sysfs_attrr(char *devname, const char *attr, char *str, int str_len) in libzbd? Thanks for reviewing! Have a good night. Sam > Cheers. > > -- > Damien Le Moal > Western Digital Research