Re: [PATCH v3 4/8] m25p80: Add the mx25l25635f SFDP table
On 10/7/22 16:44, Francisco Iglesias wrote: On [2022 Jul 22] Fri 08:35:58, Cédric Le Goater wrote: The mx25l25635e and mx25l25635f chips have the same JEDEC id but the mx25l25635f has more capabilities reported in the SFDP table. Support for 4B opcodes is of interest because it is exploited by the Linux kernel. The SFDP table size is 0x200 bytes long. The mandatory table for basic features is available at byte 0x30 and an extra Macronix specific table is available at 0x60. Signed-off-by: Cédric Le Goater --- hw/block/m25p80_sfdp.h | 1 + hw/block/m25p80.c | 2 ++ hw/block/m25p80_sfdp.c | 68 ++ 3 files changed, 71 insertions(+) diff --git a/hw/block/m25p80_sfdp.h b/hw/block/m25p80_sfdp.h index 0c46e669b335..87690a173c78 100644 --- a/hw/block/m25p80_sfdp.h +++ b/hw/block/m25p80_sfdp.h @@ -18,6 +18,7 @@ extern uint8_t m25p80_sfdp_n25q256a(uint32_t addr); extern uint8_t m25p80_sfdp_mx25l25635e(uint32_t addr); +extern uint8_t m25p80_sfdp_mx25l25635f(uint32_t addr); (optional -extern above) #endif diff --git a/hw/block/m25p80.c b/hw/block/m25p80.c index 028b026d8ba2..6b120ce65212 100644 --- a/hw/block/m25p80.c +++ b/hw/block/m25p80.c @@ -234,6 +234,8 @@ static const FlashPartInfo known_devices[] = { { INFO("mx25l12855e", 0xc22618, 0, 64 << 10, 256, 0) }, { INFO6("mx25l25635e", 0xc22019, 0xc22019, 64 << 10, 512, 0), .sfdp_read = m25p80_sfdp_mx25l25635e }, +{ INFO6("mx25l25635f", 0xc22019, 0xc22019, 64 << 10, 512, 0), I think I'm not seeing the extended id part in the datasheet I've found so might be that you can switch to just INFO and _ext_id 0 above This was added by commit 6bbe036f32dc ("m25p80: Return the JEDEC ID twice for mx25l25635e") to fix a real breakage on HW. Thanks, C. (might be the same in the previous patch with the similar flash). 
Otherwise looks good to me: Reviewed-by: Francisco Iglesias + .sfdp_read = m25p80_sfdp_mx25l25635f }, { INFO("mx25l25655e", 0xc22619, 0, 64 << 10, 512, 0) }, { INFO("mx66l51235f", 0xc2201a, 0, 64 << 10, 1024, ER_4K | ER_32K) }, { INFO("mx66u51235f", 0xc2253a, 0, 64 << 10, 1024, ER_4K | ER_32K) }, diff --git a/hw/block/m25p80_sfdp.c b/hw/block/m25p80_sfdp.c index 6499c4c39954..70c13aea7c63 100644 --- a/hw/block/m25p80_sfdp.c +++ b/hw/block/m25p80_sfdp.c @@ -82,3 +82,71 @@ static const uint8_t sfdp_mx25l25635e[] = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, }; define_sfdp_read(mx25l25635e) + +static const uint8_t sfdp_mx25l25635f[] = { +0x53, 0x46, 0x44, 0x50, 0x00, 0x01, 0x01, 0xff, +0x00, 0x00, 0x01, 0x09, 0x30, 0x00, 0x00, 0xff, +0xc2, 0x00, 0x01, 0x04, 0x60, 0x00, 0x00, 0xff, +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, +0xe5, 0x20, 0xf3, 0xff, 0xff, 0xff, 0xff, 0x0f, +0x44, 0xeb, 0x08, 0x6b, 0x08, 0x3b, 0x04, 0xbb, +0xfe, 0xff, 0xff, 0xff, 0xff, 0xff, 0x00, 0xff, +0xff, 0xff, 0x44, 0xeb, 0x0c, 0x20, 0x0f, 0x52, +0x10, 0xd8, 0x00, 0xff, 0xff, 0xff, 0xff, 0xff, +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, +0x00, 0x36, 0x00, 0x27, 0x9d, 0xf9, 0xc0, 0x64, +0x85, 0xcb, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, +0xff, 0xff, 0xff, 0xff, 0xc2, 0xf5, 0x08, 0x0a, +0x08, 0x04, 0x03, 0x06, 0x00, 0x00, 0x07, 0x29, +0x17, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, +0xff, 0xff, 0xff, 0xff, 
0xff, 0xff, 0xff, 0xff, +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, +0xff
Re: [PATCH v11 7/7] docs/zoned-storage: add zoned device documentation
On 10/10/22 04:21, Sam Li wrote: Add the documentation about the zoned device support to virtio-blk emulation. Signed-off-by: Sam Li Reviewed-by: Stefan Hajnoczi --- docs/devel/zoned-storage.rst | 40 ++ docs/system/qemu-block-drivers.rst.inc | 6 2 files changed, 46 insertions(+) create mode 100644 docs/devel/zoned-storage.rst diff --git a/docs/devel/zoned-storage.rst b/docs/devel/zoned-storage.rst new file mode 100644 index 00..deaa4ce99b --- /dev/null +++ b/docs/devel/zoned-storage.rst @@ -0,0 +1,40 @@ += +zoned-storage += + +Zoned Block Devices (ZBDs) devide the LBA space into block regions called zones divide +that are larger than the LBA size. They can only allow sequential writes, which +can reduce write amplification in SSDs, and potentially lead to higher +throughput and increased capacity. More details about ZBDs can be found at: + +https://zonedstorage.io/docs/introduction/zoned-storage + +1. Block layer APIs for zoned storage +- +QEMU block layer has three zoned storage model: +- BLK_Z_HM: This model only allows sequential writes access. It supports a set +of ZBD-specific I/O request that used by the host to manage device zones. Maybe: This model only allow for sequential write access to zones. It supports ZBD-specific I/O requests to manage device zones. +- BLK_Z_HA: It deals with both sequential writes and random writes access. Maybe better: This model allows sequential and random writes to zones. It supports ZBD-specific I/O requests to manage device zones. +- BLK_Z_NONE: Regular block devices and drive-managed ZBDs are treated as +non-zoned devices. Maybe: This is the default model with no zones support; it includes both regular and drive-managed ZBD devices. ZBD-specific I/O requests are not supported. + +The block device information resides inside BlockDriverState. QEMU uses +BlockLimits struct(BlockDriverState::bl) that is continuously accessed by the +block layer while processing I/O requests. 
A BlockBackend has a root pointer to +a BlockDriverState graph(for example, raw format on top of file-posix). The +zoned storage information can be propagated from the leaf BlockDriverState all +the way up to the BlockBackend. If the zoned storage model in file-posix is +set to BLK_Z_HM, then block drivers will declare support for zoned host device. + +The block layer APIs support commands needed for zoned storage devices, +including report zones, four zone operations, and zone append. + +2. Emulating zoned storage controllers +-- +When the BlockBackend's BlockLimits model reports a zoned storage device, users +like the virtio-blk emulation or the qemu-io-cmds.c utility can use block layer +APIs for zoned storage emulation or testing. + +For example, to test zone_report on a null_blk device using qemu-io is: +$ path/to/qemu-io --image-opts -n driver=zoned_host_device,filename=/dev/nullb0 +-c "zrp offset nr_zones" diff --git a/docs/system/qemu-block-drivers.rst.inc b/docs/system/qemu-block-drivers.rst.inc index dfe5d2293d..0b97227fd9 100644 --- a/docs/system/qemu-block-drivers.rst.inc +++ b/docs/system/qemu-block-drivers.rst.inc @@ -430,6 +430,12 @@ Hard disks you may corrupt your host data (use the ``-snapshot`` command line option or modify the device permissions accordingly). +Zoned block devices + Zoned block devices can be passed through to the guest if the emulated storage + controller supports zoned storage. Use ``--blockdev zoned_host_device, + node-name=drive0,filename=/dev/nullb0`` to pass through ``/dev/nullb0`` + as ``drive0``. + Windows ^^^ Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew Myers, Andrew McDonald, Martje Boudien Moerman
Re: [PATCH v3 2/8] m25p80: Add the n25q256a SFDP table
On 10/7/22 16:03, Francisco Iglesias wrote: On [2022 Jul 22] Fri 08:35:56, Cédric Le Goater wrote: The same values were collected on 4 differents OpenPower systems, palmettos, romulus and tacoma. The SFDP table size is defined as being 0x100 bytes but it could be bigger. Only the mandatory table for basic features is available at byte 0x30. Signed-off-by: Cédric Le Goater --- hw/block/m25p80_sfdp.h | 2 ++ hw/block/m25p80.c | 8 +++--- hw/block/m25p80_sfdp.c | 58 ++ hw/block/meson.build | 1 + 4 files changed, 66 insertions(+), 3 deletions(-) create mode 100644 hw/block/m25p80_sfdp.c diff --git a/hw/block/m25p80_sfdp.h b/hw/block/m25p80_sfdp.h index 230b07ef3308..d3a0a778ae84 100644 --- a/hw/block/m25p80_sfdp.h +++ b/hw/block/m25p80_sfdp.h @@ -15,4 +15,6 @@ */ #define M25P80_SFDP_MAX_SIZE (1 << 24) +extern uint8_t m25p80_sfdp_n25q256a(uint32_t addr); (-extern above if we would like) + #endif diff --git a/hw/block/m25p80.c b/hw/block/m25p80.c index abdc4c0b0da7..13e7b28fd2b0 100644 --- a/hw/block/m25p80.c +++ b/hw/block/m25p80.c @@ -247,13 +247,15 @@ static const FlashPartInfo known_devices[] = { { INFO("n25q128a11", 0x20bb18, 0, 64 << 10, 256, ER_4K) }, { INFO("n25q128a13", 0x20ba18, 0, 64 << 10, 256, ER_4K) }, { INFO("n25q256a11", 0x20bb19, 0, 64 << 10, 512, ER_4K) }, -{ INFO("n25q256a13", 0x20ba19, 0, 64 << 10, 512, ER_4K) }, +{ INFO("n25q256a13", 0x20ba19, 0, 64 << 10, 512, ER_4K), + .sfdp_read = m25p80_sfdp_n25q256a }, { INFO("n25q512a11", 0x20bb20, 0, 64 << 10, 1024, ER_4K) }, { INFO("n25q512a13", 0x20ba20, 0, 64 << 10, 1024, ER_4K) }, { INFO("n25q128", 0x20ba18, 0, 64 << 10, 256, 0) }, { INFO("n25q256a",0x20ba19, 0, 64 << 10, 512, - ER_4K | HAS_SR_BP3_BIT6 | HAS_SR_TB) }, -{ INFO("n25q512a",0x20ba20, 0, 64 << 10, 1024, ER_4K) }, + ER_4K | HAS_SR_BP3_BIT6 | HAS_SR_TB), + .sfdp_read = m25p80_sfdp_n25q256a }, + { INFO("n25q512a",0x20ba20, 0, 64 << 10, 1024, ER_4K) }, { INFO("n25q512ax3", 0x20ba20, 0x1000, 64 << 10, 1024, ER_4K) }, { INFO("mt25ql512ab", 0x20ba20, 
0x1044, 64 << 10, 1024, ER_4K | ER_32K) }, { INFO_STACKED("mt35xu01g", 0x2c5b1b, 0x104100, 128 << 10, 1024, diff --git a/hw/block/m25p80_sfdp.c b/hw/block/m25p80_sfdp.c new file mode 100644 index ..24ec05de79a1 --- /dev/null +++ b/hw/block/m25p80_sfdp.c @@ -0,0 +1,58 @@ +/* + * M25P80 Serial Flash Discoverable Parameter (SFDP) + * + * Copyright (c) 2020, IBM Corporation. + * + * This code is licensed under the GPL version 2 or later. See the + * COPYING file in the top-level directory. + */ + +#include "qemu/osdep.h" +#include "qemu/host-utils.h" +#include "m25p80_sfdp.h" + +#define define_sfdp_read(model) \ +uint8_t m25p80_sfdp_##model(uint32_t addr)\ +{ \ +assert(is_power_of_2(sizeof(sfdp_##model))); \ +return sfdp_##model[addr & (sizeof(sfdp_##model) - 1)]; \ +} + +/* + * Micron + */ +static const uint8_t sfdp_n25q256a[] = { The datasheets I found wasn't completetly as this table but I can't argue with the hw read out of 4 flashes. It is mentioned there : http://datasheet.octopart.com/N25Q256A13E1241F-Micron-datasheet-11552757.pdf C. 
Reviewed-by: Francisco Iglesias +0x53, 0x46, 0x44, 0x50, 0x00, 0x01, 0x00, 0xff, +0x00, 0x00, 0x01, 0x09, 0x30, 0x00, 0x00, 0xff, +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, +0xe5, 0x20, 0xfb, 0xff, 0xff, 0xff, 0xff, 0x0f, +0x29, 0xeb, 0x27, 0x6b, 0x08, 0x3b, 0x27, 0xbb, +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x27, 0xbb, +0xff, 0xff, 0x29, 0xeb, 0x0c, 0x20, 0x10, 0xd8, +0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff, +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, +
Re: [PATCH v3 3/8] m25p80: Add the mx25l25635e SFDP table
Hello Francisco On 10/7/22 15:59, Francisco Iglesias wrote: Hi Cedric, On [2022 Jul 22] Fri 08:35:57, Cédric Le Goater wrote: The SFDP table is 0x80 bytes long. The mandatory table for basic features is available at byte 0x30 and an extra Macronix specific table is available at 0x60. 4B opcodes are not supported. Signed-off-by: Cédric Le Goater --- hw/block/m25p80_sfdp.h | 3 +++ hw/block/m25p80.c | 3 ++- hw/block/m25p80_sfdp.c | 26 ++ 3 files changed, 31 insertions(+), 1 deletion(-) diff --git a/hw/block/m25p80_sfdp.h b/hw/block/m25p80_sfdp.h index d3a0a778ae84..0c46e669b335 100644 --- a/hw/block/m25p80_sfdp.h +++ b/hw/block/m25p80_sfdp.h @@ -17,4 +17,7 @@ extern uint8_t m25p80_sfdp_n25q256a(uint32_t addr); +extern uint8_t m25p80_sfdp_mx25l25635e(uint32_t addr); We could be without 'extern' in above hdr if we like (also the other patches), Yes. I dropped all of them in v4. Thanks, C. either way: Reviewed-by: Francisco Iglesias + + #endif diff --git a/hw/block/m25p80.c b/hw/block/m25p80.c index 13e7b28fd2b0..028b026d8ba2 100644 --- a/hw/block/m25p80.c +++ b/hw/block/m25p80.c @@ -232,7 +232,8 @@ static const FlashPartInfo known_devices[] = { { INFO("mx25l6405d", 0xc22017, 0, 64 << 10, 128, 0) }, { INFO("mx25l12805d", 0xc22018, 0, 64 << 10, 256, 0) }, { INFO("mx25l12855e", 0xc22618, 0, 64 << 10, 256, 0) }, -{ INFO6("mx25l25635e", 0xc22019, 0xc22019, 64 << 10, 512, 0) }, +{ INFO6("mx25l25635e", 0xc22019, 0xc22019, 64 << 10, 512, 0), + .sfdp_read = m25p80_sfdp_mx25l25635e }, { INFO("mx25l25655e", 0xc22619, 0, 64 << 10, 512, 0) }, { INFO("mx66l51235f", 0xc2201a, 0, 64 << 10, 1024, ER_4K | ER_32K) }, { INFO("mx66u51235f", 0xc2253a, 0, 64 << 10, 1024, ER_4K | ER_32K) }, diff --git a/hw/block/m25p80_sfdp.c b/hw/block/m25p80_sfdp.c index 24ec05de79a1..6499c4c39954 100644 --- a/hw/block/m25p80_sfdp.c +++ b/hw/block/m25p80_sfdp.c @@ -56,3 +56,29 @@ static const uint8_t sfdp_n25q256a[] = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, }; define_sfdp_read(n25q256a); + + +/* + 
* Matronix + */ + +/* mx25l25635e. No 4B opcodes */ +static const uint8_t sfdp_mx25l25635e[] = { +0x53, 0x46, 0x44, 0x50, 0x00, 0x01, 0x01, 0xff, +0x00, 0x00, 0x01, 0x09, 0x30, 0x00, 0x00, 0xff, +0xc2, 0x00, 0x01, 0x04, 0x60, 0x00, 0x00, 0xff, +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, +0xe5, 0x20, 0xf3, 0xff, 0xff, 0xff, 0xff, 0x0f, +0x44, 0xeb, 0x08, 0x6b, 0x08, 0x3b, 0x04, 0xbb, +0xee, 0xff, 0xff, 0xff, 0xff, 0xff, 0x00, 0xff, +0xff, 0xff, 0x00, 0xff, 0x0c, 0x20, 0x0f, 0x52, +0x10, 0xd8, 0x00, 0xff, 0xff, 0xff, 0xff, 0xff, +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, +0x00, 0x36, 0x00, 0x27, 0xf7, 0x4f, 0xff, 0xff, +0xd9, 0xc8, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, +0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, +}; +define_sfdp_read(mx25l25635e) -- 2.35.3
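For readers following the tables in this series, here is a minimal sketch of how the power-of-two masking used by define_sfdp_read() behaves, and of where the parameter table pointers (0x30 and 0x60) sit in the header bytes. The array below copies only the first header rows quoted above and pads them to 32 bytes; all `example_` names are hypothetical stand-ins, not QEMU functions, and the header layout (signature in bytes 0-3, 8-byte parameter headers with a 3-byte little-endian pointer at offset 4) is assumed from the JEDEC SFDP convention.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* First three header rows as quoted in the patch, padded to 32 bytes. */
static const uint8_t example_sfdp[32] = {
    0x53, 0x46, 0x44, 0x50, 0x00, 0x01, 0x01, 0xff, /* "SFDP", rev, NPH */
    0x00, 0x00, 0x01, 0x09, 0x30, 0x00, 0x00, 0xff, /* basic table at 0x30 */
    0xc2, 0x00, 0x01, 0x04, 0x60, 0x00, 0x00, 0xff, /* vendor table at 0x60 */
    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
};

/* Local stand-in for a power-of-two check (hypothetical helper). */
static int example_is_power_of_2(size_t v)
{
    return v != 0 && (v & (v - 1)) == 0;
}

/* Same shape as the macro body: out-of-range addresses wrap around. */
static uint8_t example_sfdp_read(uint32_t addr)
{
    assert(example_is_power_of_2(sizeof(example_sfdp)));
    return example_sfdp[addr & (sizeof(example_sfdp) - 1)];
}

/* Check the 4-byte "SFDP" signature at the start of the table. */
static int example_sfdp_has_signature(void)
{
    return example_sfdp_read(0) == 'S' && example_sfdp_read(1) == 'F' &&
           example_sfdp_read(2) == 'D' && example_sfdp_read(3) == 'P';
}

/* Byte offset of the n-th parameter table, read from its 8-byte header. */
static uint32_t example_sfdp_table_offset(unsigned n)
{
    uint32_t base = 8 + 8 * n;

    return (uint32_t)example_sfdp_read(base + 4) |
           ((uint32_t)example_sfdp_read(base + 5) << 8) |
           ((uint32_t)example_sfdp_read(base + 6) << 16);
}
```

The mask only works because the array size is a power of two, which is why the macro asserts it; a table of another size would need a modulo or a bounds check instead.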
Re: [PATCH v11 5/7] config: add check to block layer
On 10/10/22 04:21, Sam Li wrote: Putting zoned/non-zoned BlockDrivers on top of each other is not allowed. Signed-off-by: Sam Li Reviewed-by: Stefan Hajnoczi --- block.c | 17 + block/file-posix.c | 13 + block/raw-format.c | 1 + include/block/block_int-common.h | 5 + 4 files changed, 36 insertions(+) mode change 100644 => 100755 block.c mode change 100644 => 100755 block/file-posix.c diff --git a/block.c b/block.c old mode 100644 new mode 100755 index bc85f46eed..bf2f2918e7 --- a/block.c +++ b/block.c @@ -7947,6 +7947,23 @@ void bdrv_add_child(BlockDriverState *parent_bs, BlockDriverState *child_bs, return; } +/* + * Non-zoned block drivers do not follow zoned storage constraints + * (i.e. sequential writes to zones). Refuse mixing zoned and non-zoned + * drivers in a graph. + */ +if (!parent_bs->drv->supports_zoned_children && +/* The host-aware model allows zoned storage constraints and random + * write. Allow mixing host-aware and non-zoned drivers. Using + * host-aware device as a regular device. */ It's a very unusual style to put comments inside a condition. Please move it before or after the condition to keep the condition together. +child_bs->bl.zoned == BLK_Z_HM) { +error_setg(errp, "Cannot add a %s child to a %s parent", + child_bs->bl.zoned == BLK_Z_HM ? "zoned" : "non-zoned", + parent_bs->drv->supports_zoned_children ? + "support zoned children" : "not support zoned children"); +return; +} + if (!QLIST_EMPTY(&child_bs->parents)) { error_setg(errp, "The node %s already has a parent", child_bs->node_name); diff --git a/block/file-posix.c b/block/file-posix.c old mode 100644 new mode 100755 index 226f5d48f5..a9d347292e --- a/block/file-posix.c +++ b/block/file-posix.c @@ -778,6 +778,19 @@ static int raw_open_common(BlockDriverState *bs, QDict *options, goto fail; } } +#ifdef CONFIG_BLKZONED +/* + * The kernel page cache does not reliably work for writes to SWR zones + * of zoned block device because it can not guarantee the order of writes. 
+ */ +if (strcmp(bs->drv->format_name, "zoned_host_device") == 0) { +if (!(s->open_flags & O_DIRECT)) { You can join these conditions with '&&' and safe one level of intendation. +error_setg(errp, "driver=zoned_host_device was specified, but it " + "requires cache.direct=on, which was not specified."); +return -EINVAL; /* No host kernel page cache */ +} +} +#endif if (S_ISBLK(st.st_mode)) { #ifdef BLKDISCARDZEROES diff --git a/block/raw-format.c b/block/raw-format.c index 618c6b1ec2..b885688434 100644 --- a/block/raw-format.c +++ b/block/raw-format.c @@ -614,6 +614,7 @@ static void raw_child_perm(BlockDriverState *bs, BdrvChild *c, BlockDriver bdrv_raw = { .format_name = "raw", .instance_size= sizeof(BDRVRawState), +.supports_zoned_children = true, .bdrv_probe = &raw_probe, .bdrv_reopen_prepare = &raw_reopen_prepare, .bdrv_reopen_commit = &raw_reopen_commit, diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h index cdc06e77a6..37dddc603c 100644 --- a/include/block/block_int-common.h +++ b/include/block/block_int-common.h @@ -127,6 +127,11 @@ struct BlockDriver { */ bool is_format; +/* + * Set to true if the BlockDriver supports zoned children. + */ +bool supports_zoned_children; + /* * Drivers not implementing bdrv_parse_filename nor bdrv_open should have * this field set to true, except ones that are defined only by their The remainder looks good. Once you fixed the minor editing issues you can add: Reviewed-by: Hannes Reinecke Cheers, Hannes -- Dr. Hannes ReineckeKernel Storage Architect h...@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew Myers, Andrew McDonald, Martje Boudien Moerman
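As a rough illustration of the two review comments above (keep the comment outside the condition, and join the nested checks with '&&'), the checks could be shaped like the sketch below. This is a standalone model with hypothetical `example_`/`EX_` names, not the QEMU code; it only mirrors the logic described in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the three zoned models named in the series. */
typedef enum {
    EX_Z_NONE, /* regular or drive-managed device */
    EX_Z_HM,   /* host-managed: sequential writes only */
    EX_Z_HA    /* host-aware: sequential and random writes */
} ExampleZoneModel;

/*
 * Host-aware children may be used like regular devices, so only a
 * host-managed child needs a parent that supports zoned children.
 * The explanation lives here, before the condition, not inside it.
 */
static bool example_can_attach(bool parent_supports_zoned_children,
                               ExampleZoneModel child_model)
{
    if (!parent_supports_zoned_children && child_model == EX_Z_HM) {
        return false;
    }
    return true;
}

/* The two nested checks from file-posix joined into one condition. */
static bool example_missing_direct_io(const char *format_name, bool o_direct)
{
    return strcmp(format_name, "zoned_host_device") == 0 && !o_direct;
}
```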
Re: [PATCH v11 4/7] raw-format: add zone operations to pass through requests
On 10/10/22 04:21, Sam Li wrote:

raw-format driver usually sits on top of file-posix driver. It needs to
pass through requests of zone commands.

Signed-off-by: Sam Li
Reviewed-by: Stefan Hajnoczi
Reviewed-by: Damien Le Moal
---
 block/raw-format.c | 13 +
 1 file changed, 13 insertions(+)

Reviewed-by: Hannes Reinecke

Cheers,

Hannes
Re: [PATCH v11 1/7] include: add zoned device structs
On 10/10/22 04:21, Sam Li wrote:

Signed-off-by: Sam Li
Reviewed-by: Stefan Hajnoczi
Reviewed-by: Damien Le Moal
---
 include/block/block-common.h | 43
 1 file changed, 43 insertions(+)

Reviewed-by: Hannes Reinecke

Cheers,

Hannes
[PATCH v6 1/2] block: Ignore close() failure in get_tmp_filename()
The temporary file has been created and is ready for use. Checking
return value of close() does not seem useful. The file descriptor is
almost certainly closed; see close(2) under "Dealing with error returns
from close()".

Let's simply ignore close() failure here.

Suggested-by: Markus Armbruster
Signed-off-by: Bin Meng
Reviewed-by: Markus Armbruster
---

(no changes since v5)

Changes in v5:
- new patch: "block: Ignore close() failure in get_tmp_filename()"

 block.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/block.c b/block.c
index bc85f46eed..582c205307 100644
--- a/block.c
+++ b/block.c
@@ -886,10 +886,7 @@ int get_tmp_filename(char *filename, int size)
     if (fd < 0) {
         return -errno;
     }
-    if (close(fd) != 0) {
-        unlink(filename);
-        return -errno;
-    }
+    close(fd);
     return 0;
 #endif
 }
--
2.25.1
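A minimal standalone sketch of the behaviour this patch settles on, assuming a POSIX host: the file is created with mkstemp(), the descriptor is closed without inspecting the result, and the file is left in place for the caller. The function name `example_create_empty_tmp` is hypothetical and not part of QEMU.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Create an empty temporary file from a template ending in "XXXXXX".
 * Returns 0 on success or a negative errno value if creation fails.
 */
static int example_create_empty_tmp(char *template_path)
{
    int fd = mkstemp(template_path);

    if (fd < 0) {
        return -errno;
    }
    /*
     * The file already exists at this point. A close() error would not
     * change that, so the return value is deliberately ignored.
     */
    close(fd);
    return 0;
}
```

The caller owns the resulting file and is expected to unlink() it when done.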
[PATCH v6 2/2] block: Refactor get_tmp_filename()
At present there are two callers of get_tmp_filename() and they are
inconsistent.

One does:

    /* TODO: extra byte is a hack to ensure MAX_PATH space on Windows. */
    char *tmp_filename = g_malloc0(PATH_MAX + 1);
    ...
    ret = get_tmp_filename(tmp_filename, PATH_MAX + 1);

while the other does:

    s->qcow_filename = g_malloc(PATH_MAX);
    ret = get_tmp_filename(s->qcow_filename, PATH_MAX);

As we can see different 'size' arguments are passed. There are also
platform specific implementations inside the function, and the use
of snprintf is really undesirable.

The function name is also misleading. It creates a temporary file,
not just a filename.

Refactor this routine by changing its name and signature to:

    char *create_tmp_file(Error **errp)

and use g_get_tmp_dir() / g_mkstemp() for a consistent implementation.

While we are here, add some comments to mention that /var/tmp is
preferred over /tmp on non-win32 hosts.

Signed-off-by: Bin Meng
---

Changes in v6:
- use g_mkstemp() and stick to use /var/tmp for non-win32 hosts

Changes in v5:
- minor change in the commit message
- add some notes in the function comment block
- add g_autofree for tmp_filename

Changes in v4:
- Rename the function to create_tmp_file() and take "Error **errp" as
  a parameter, so that callers can pass errp all the way down to this
  routine.
- Commit message updated to reflect the latest change

Changes in v3:
- Do not use errno directly, instead still let get_tmp_filename() return
  a negative number to indicate error

Changes in v2:
- Use g_autofree and g_steal_pointer

 include/block/block_int-common.h | 2 +-
 block.c | 56 +---
 block/vvfat.c | 7 ++--
 3 files changed, 34 insertions(+), 31 deletions(-)

diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
index 8947abab76..d7c0a7e96f 100644
--- a/include/block/block_int-common.h
+++ b/include/block/block_int-common.h
@@ -1230,7 +1230,7 @@ static inline BlockDriverState *child_bs(BdrvChild *child)
 }

 int bdrv_check_request(int64_t offset, int64_t bytes, Error **errp);
-int get_tmp_filename(char *filename, int size);
+char *create_tmp_file(Error **errp);

 void bdrv_parse_filename_strip_prefix(const char *filename, const char *prefix,
                                       QDict *options);
diff --git a/block.c b/block.c
index 582c205307..8eeaa5623e 100644
--- a/block.c
+++ b/block.c
@@ -860,35 +860,42 @@ int bdrv_probe_geometry(BlockDriverState *bs, HDGeometry *geo)

 /*
  * Create a uniquely-named empty temporary file.
- * Return 0 upon success, otherwise a negative errno value.
+ * Return the actual file name used upon success, otherwise NULL.
+ * This string should be freed with g_free() when not needed any longer.
+ *
+ * Note: creating a temporary file for the caller to (re)open is
+ * inherently racy. Use g_file_open_tmp() instead whenever practical.
  */
-int get_tmp_filename(char *filename, int size)
+char *create_tmp_file(Error **errp)
 {
-#ifdef _WIN32
-    char temp_dir[MAX_PATH];
-    /* GetTempFileName requires that its output buffer (4th param)
-       have length MAX_PATH or greater. */
-    assert(size >= MAX_PATH);
-    return (GetTempPath(MAX_PATH, temp_dir)
-            && GetTempFileName(temp_dir, "qem", 0, filename)
-            ? 0 : -GetLastError());
-#else
     int fd;
     const char *tmpdir;
-    tmpdir = getenv("TMPDIR");
-    if (!tmpdir) {
+    g_autofree char *filename = NULL;
+
+    tmpdir = g_get_tmp_dir();
+#ifndef _WIN32
+    /*
+     * See commit 69bef79 ("block: use /var/tmp instead of /tmp for -snapshot")
+     *
+     * This function is used to create temporary disk images (like -snapshot),
+     * so the files can become very large. /tmp is often a tmpfs where as
+     * /var/tmp is usually on a disk, so more appropriate for disk images.
+     */
+    if (!g_strcmp0(tmpdir, "/tmp")) {
         tmpdir = "/var/tmp";
     }
-    if (snprintf(filename, size, "%s/vl.XXXXXX", tmpdir) >= size) {
-        return -EOVERFLOW;
-    }
-    fd = mkstemp(filename);
+#endif
+
+    filename = g_strdup_printf("%s/vl.XXXXXX", tmpdir);
+    fd = g_mkstemp(filename);
     if (fd < 0) {
-        return -errno;
+        error_setg_errno(errp, -errno, "Could not open temporary file '%s'",
+                         filename);
+        return NULL;
     }
     close(fd);
-    return 0;
-#endif
+
+    return g_steal_pointer(&filename);
 }

 /*
@@ -3714,8 +3721,7 @@ static BlockDriverState *bdrv_append_temp_snapshot(BlockDriverState *bs,
                                                    QDict *snapshot_options,
                                                    Error **errp)
 {
-    /* TODO: extra byte is a hack to ensure MAX_PATH space on Windows. */
-    char *tmp_filename = g_malloc0(PATH_MAX + 1);
+    g_autofree char *tmp_filename = NULL;
     int64_t total_size;
     QemuOpts *opts = NULL;
     BlockDriverState *bs_snapshot = NULL;
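To illustrate the signature change described in the commit message (the callee allocates and returns the name, so callers no longer pass a buffer and a 'size'), here is a plain-libc sketch for POSIX hosts. It deliberately swaps GLib's g_strdup_printf()/g_mkstemp() for malloc()/snprintf()/mkstemp() so it builds without GLib; `example_create_tmp_file` and the "example." prefix are hypothetical and not what QEMU uses.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Create an empty temporary file inside tmpdir.
 * Returns a malloc()ed path the caller must free(), or NULL on failure.
 */
static char *example_create_tmp_file(const char *tmpdir)
{
    static const char suffix[] = "/example.XXXXXX";
    size_t len = strlen(tmpdir) + sizeof(suffix); /* sizeof counts the NUL */
    char *path = malloc(len);
    int fd;

    if (!path) {
        return NULL;
    }
    snprintf(path, len, "%s%s", tmpdir, suffix);

    fd = mkstemp(path); /* replaces the XXXXXX part in place */
    if (fd < 0) {
        free(path);
        return NULL;
    }
    close(fd);
    return path;
}
```

Because the buffer is sized from the actual directory name, there is no fixed PATH_MAX-style limit for callers to disagree on.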
[PATCH v3 3/3] qemu-iotests: test zone append operation
This test is mainly a helper to indicate that append writes in the block
layer behave as expected.

Signed-off-by: Sam Li
---
 qemu-io-cmds.c | 62 ++
 tests/qemu-iotests/tests/zoned.out | 7
 tests/qemu-iotests/tests/zoned.sh | 9 +
 3 files changed, 78 insertions(+)

diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
index e56c8d1c30..6cb86de35b 100644
--- a/qemu-io-cmds.c
+++ b/qemu-io-cmds.c
@@ -1855,6 +1855,67 @@ static const cmdinfo_t zone_reset_cmd = {
     .oneline = "reset a zone write pointer in zone block device",
 };

+static int do_aio_zone_append(BlockBackend *blk, QEMUIOVector *qiov,
+                              int64_t *offset, int flags, int *total)
+{
+    int async_ret = NOT_DONE;
+
+    blk_aio_zone_append(blk, offset, qiov, flags, aio_rw_done, &async_ret);
+    while (async_ret == NOT_DONE) {
+        main_loop_wait(false);
+    }
+
+    *total = qiov->size;
+    return async_ret < 0 ? async_ret : 1;
+}
+
+static int zone_append_f(BlockBackend *blk, int argc, char **argv) {
+    int ret;
+    int flags = 0;
+    int total = 0;
+    int64_t offset;
+    char *buf;
+    int nr_iov;
+    int pattern = 0xcd;
+    QEMUIOVector qiov;
+
+    if (optind > argc - 2) {
+        return -EINVAL;
+    }
+    optind++;
+    offset = cvtnum(argv[optind]);
+    if (offset < 0) {
+        print_cvtnum_err(offset, argv[optind]);
+        return offset;
+    }
+    optind++;
+    nr_iov = argc - optind;
+    buf = create_iovec(blk, &qiov, &argv[optind], nr_iov, pattern);
+    if (buf == NULL) {
+        return -EINVAL;
+    }
+    ret = do_aio_zone_append(blk, &qiov, &offset, flags, &total);
+    if (ret < 0) {
+        printf("zone append failed: %s\n", strerror(-ret));
+        goto out;
+    }
+
+out:
+    qemu_iovec_destroy(&qiov);
+    qemu_io_free(buf);
+    return ret;
+}
+
+static const cmdinfo_t zone_append_cmd = {
+    .name = "zone_append",
+    .altname = "zap",
+    .cfunc = zone_append_f,
+    .argmin = 3,
+    .argmax = 3,
+    .args = "offset len [len..]",
+    .oneline = "append write a number of bytes at a specified offset",
+};
+
 static int truncate_f(BlockBackend *blk, int argc, char **argv);
 static const cmdinfo_t truncate_cmd = {
     .name = "truncate",
@@ -2652,6 +2713,7 @@ static void __attribute((constructor)) init_qemuio_commands(void)
     qemuio_add_command(&zone_close_cmd);
     qemuio_add_command(&zone_finish_cmd);
     qemuio_add_command(&zone_reset_cmd);
+    qemuio_add_command(&zone_append_cmd);
     qemuio_add_command(&truncate_cmd);
     qemuio_add_command(&length_cmd);
     qemuio_add_command(&info_cmd);
diff --git a/tests/qemu-iotests/tests/zoned.out b/tests/qemu-iotests/tests/zoned.out
index 0c8f96deb9..b3b139b4ec 100644
--- a/tests/qemu-iotests/tests/zoned.out
+++ b/tests/qemu-iotests/tests/zoned.out
@@ -50,4 +50,11 @@ start: 0x8, len 0x8, cap 0x8, wptr 0x10, zcond:14, [type: 2]
 (5) resetting the second zone
 After resetting a zone:
 start: 0x8, len 0x8, cap 0x8, wptr 0x8, zcond:1, [type: 2]
+
+
+(6) append write
+After appending the first zone:
+start: 0x0, len 0x8, cap 0x8, wptr 0x18, zcond:2, [type: 2]
+After appending the second zone:
+start: 0x8, len 0x8, cap 0x8, wptr 0x80018, zcond:2, [type: 2]
 *** done
diff --git a/tests/qemu-iotests/tests/zoned.sh b/tests/qemu-iotests/tests/zoned.sh
index fced0194c5..888711eef2 100755
--- a/tests/qemu-iotests/tests/zoned.sh
+++ b/tests/qemu-iotests/tests/zoned.sh
@@ -79,6 +79,15 @@ echo "(5) resetting the second zone"
 sudo $QEMU_IO $IMG -c "zrs 268435456 268435456"
 echo "After resetting a zone:"
 sudo $QEMU_IO $IMG -c "zrp 268435456 1"
+echo
+echo
+echo "(6) append write" # physical block size of the device is 4096
+sudo $QEMU_IO $IMG -c "zap 0 0x1000 0x2000"
+echo "After appending the first zone:"
+sudo $QEMU_IO $IMG -c "zrp 0 1"
+sudo $QEMU_IO $IMG -c "zap 268435456 0x1000 0x2000"
+echo "After appending the second zone:"
+sudo $QEMU_IO $IMG -c "zrp 268435456 1"

 # success, all done
 echo "*** done"
--
2.37.3
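The append-write test above passes the start of a zone as the offset and then checks the reported write pointer. A toy model of that behaviour may help when reading the expected output; it is only a sketch under the assumption that an append names the zone start, lands at the current write pointer, and reports the landing position back. `ExampleZone` and `example_zone_append` are hypothetical names, and this is not how the block layer implements it.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint64_t start; /* byte offset of the first block in the zone */
    uint64_t len;   /* zone size in bytes */
    uint64_t wp;    /* write pointer, as an absolute byte offset */
} ExampleZone;

/*
 * Append nbytes to the zone. On entry *offset must name the zone start;
 * on success *offset is updated to where the data actually landed.
 * Returns 0 on success, -1 if the offset is wrong or the zone is full.
 */
static int example_zone_append(ExampleZone *z, uint64_t *offset,
                               uint64_t nbytes)
{
    if (*offset != z->start) {
        return -1;
    }
    if (nbytes > z->start + z->len - z->wp) {
        return -1;
    }
    *offset = z->wp; /* report the position back to the caller */
    z->wp += nbytes; /* advance the write pointer */
    return 0;
}
```

Two appends of 0x1000 and 0x2000 bytes to an empty zone leave the write pointer 0x3000 bytes past the zone start, with the second append reported at 0x1000.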
[PATCH v3 2/3] block: introduce zone append write for zoned devices
A zone append command is a write operation that specifies the first logical block of a zone as the write position. When writing to a zoned block device using zone append, the byte offset of writes is pointing to the write pointer of that zone. Upon completion the device will respond with the position the data has been written in the zone. Signed-off-by: Sam Li --- block/block-backend.c | 64 +++ block/file-posix.c| 64 --- block/io.c| 21 ++ block/raw-format.c| 7 include/block/block-io.h | 3 ++ include/block/block_int-common.h | 3 ++ include/block/raw-aio.h | 4 +- include/sysemu/block-backend-io.h | 9 + 8 files changed, 168 insertions(+), 7 deletions(-) diff --git a/block/block-backend.c b/block/block-backend.c index ddc569e3ac..bfdb719bc8 100644 --- a/block/block-backend.c +++ b/block/block-backend.c @@ -1439,6 +1439,9 @@ typedef struct BlkRwCo { struct { BlockZoneOp op; } zone_mgmt; +struct { +int64_t *append_sector; +} zone_append; }; } BlkRwCo; @@ -1869,6 +1872,46 @@ BlockAIOCB *blk_aio_zone_mgmt(BlockBackend *blk, BlockZoneOp op, return &acb->common; } +static void coroutine_fn blk_aio_zone_append_entry(void *opaque) { +BlkAioEmAIOCB *acb = opaque; +BlkRwCo *rwco = &acb->rwco; + +rwco->ret = blk_co_zone_append(rwco->blk, rwco->zone_append.append_sector, + rwco->iobuf, rwco->flags); +blk_aio_complete(acb); +} + +BlockAIOCB *blk_aio_zone_append(BlockBackend *blk, int64_t *offset, +QEMUIOVector *qiov, BdrvRequestFlags flags, +BlockCompletionFunc *cb, void *opaque) { +BlkAioEmAIOCB *acb; +Coroutine *co; +IO_CODE(); + +blk_inc_in_flight(blk); +acb = blk_aio_get(&blk_aio_em_aiocb_info, blk, cb, opaque); +acb->rwco = (BlkRwCo) { +.blk= blk, +.ret= NOT_DONE, +.flags = flags, +.iobuf = qiov, +.zone_append = { +.append_sector = offset, +}, +}; +acb->has_returned = false; + +co = qemu_coroutine_create(blk_aio_zone_append_entry, acb); +bdrv_coroutine_enter(blk_bs(blk), co); +acb->has_returned = true; +if (acb->rwco.ret != NOT_DONE) { 
+replay_bh_schedule_oneshot_event(blk_get_aio_context(blk), + blk_aio_complete_bh, acb); +} + +return &acb->common; +} + /* * Send a zone_report command. * offset is a byte offset from the start of the device. No alignment @@ -1921,6 +1964,27 @@ int coroutine_fn blk_co_zone_mgmt(BlockBackend *blk, BlockZoneOp op, return ret; } +/* + * Send a zone_append command. + */ +int coroutine_fn blk_co_zone_append(BlockBackend *blk, int64_t *offset, +QEMUIOVector *qiov, BdrvRequestFlags flags) +{ +int ret; +IO_CODE(); + +blk_inc_in_flight(blk); +blk_wait_while_drained(blk); +if (!blk_is_available(blk)) { +blk_dec_in_flight(blk); +return -ENOMEDIUM; +} + +ret = bdrv_co_zone_append(blk_bs(blk), offset, qiov, flags); +blk_dec_in_flight(blk); +return ret; +} + void blk_drain(BlockBackend *blk) { BlockDriverState *bs = blk_bs(blk); diff --git a/block/file-posix.c b/block/file-posix.c index 17c0b58158..08ab164df4 100755 --- a/block/file-posix.c +++ b/block/file-posix.c @@ -1657,7 +1657,7 @@ static ssize_t handle_aiocb_rw_vector(RawPosixAIOData *aiocb) ssize_t len; do { -if (aiocb->aio_type & QEMU_AIO_WRITE) +if (aiocb->aio_type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) len = qemu_pwritev(aiocb->aio_fildes, aiocb->io.iov, aiocb->io.niov, @@ -1687,7 +1687,7 @@ static ssize_t handle_aiocb_rw_linear(RawPosixAIOData *aiocb, char *buf) ssize_t len; while (offset < aiocb->aio_nbytes) { -if (aiocb->aio_type & QEMU_AIO_WRITE) { +if (aiocb->aio_type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) { len = pwrite(aiocb->aio_fildes, (const char *)buf + offset, aiocb->aio_nbytes - offset, @@ -1731,7 +1731,7 @@ static int handle_aiocb_rw(void *opaque) * The offset of regular writes, append writes is the wp location * of that zone. 
*/ -if (aiocb->aio_type & QEMU_AIO_WRITE) { +if (aiocb->aio_type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) { if (aiocb->bs->bl.zone_size > 0) { BlockZoneWps *wps = aiocb->bs->bl.wps; qemu_mutex_lock(&wps->lock); @@ -1794,7 +1794,7 @@ static int handle_aiocb_rw(void *opaque) } nbytes = handle_aiocb_rw_linear(aiocb, buf); -if (!(aiocb->aio_type & QEMU_AIO_WRITE)) { +if (!(aiocb-
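For readers following the zone append semantics described in the commit message above, here is a minimal standalone sketch of the emulation idea: the caller names a zone by its start offset, the data lands at that zone's write pointer, and the landing position is handed back through the same pointer. Everything here (SketchZones, sketch_zones_init, sketch_zone_append, the four-zone device, and the choice of -EINVAL/-ENOSPC) is hypothetical and only for illustration; it is not the code in this series and does no real I/O.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define SKETCH_NR_ZONES 4   /* hypothetical tiny device */

/* Hypothetical in-memory model of per-zone write pointers; not a QEMU type. */
typedef struct SketchZones {
    uint64_t zone_size;             /* bytes per zone */
    uint64_t wp[SKETCH_NR_ZONES];   /* write pointer of each zone, in bytes */
} SketchZones;

static void sketch_zones_init(SketchZones *z, uint64_t zone_size)
{
    assert(zone_size > 0);
    z->zone_size = zone_size;
    for (uint64_t i = 0; i < SKETCH_NR_ZONES; i++) {
        z->wp[i] = i * zone_size;   /* empty zone: wp sits at the zone start */
    }
}

/*
 * Emulated zone append. On input *offset is the start of the target zone;
 * on success *offset is rewritten to the byte position where the data
 * landed, and the zone's write pointer advances by len.
 */
static int sketch_zone_append(SketchZones *z, int64_t *offset, uint64_t len)
{
    if (*offset < 0 || (uint64_t)*offset % z->zone_size != 0) {
        return -EINVAL;             /* must name the first byte of a zone */
    }
    uint64_t idx = (uint64_t)*offset / z->zone_size;
    if (idx >= SKETCH_NR_ZONES) {
        return -EINVAL;
    }
    uint64_t zone_end = (idx + 1) * z->zone_size;
    if (len > zone_end - z->wp[idx]) {
        return -ENOSPC;             /* sketch choice: refuse to cross the zone end */
    }
    *offset = (int64_t)z->wp[idx];  /* report where the data was written */
    z->wp[idx] += len;
    return 0;
}
```

Two consecutive 512-byte appends to the zone starting at 4096 would report positions 4096 and 4608 in this model, which is the "respond with the position" behaviour the commit message describes.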
[PATCH v3 1/3] file-posix:add the tracking of the zones write pointers
Since Linux doesn't have a user API to issue zone append operations to
zoned devices from user space, the file-posix driver is modified to add
zone append emulation using regular writes. To do this, the file-posix
driver tracks the wp location of all zones of the device. It uses an
array of uint64_t. The most significant bit of each wp location indicates
whether the zone is a conventional zone.

The wp of a zone can be changed by the following operations:
- zone reset: change the wp to the start offset of that zone
- zone finish: change the wp to the end location of that zone
- write to a zone
- zone append

Signed-off-by: Sam Li
---
 block/file-posix.c               | 158 +++
 include/block/block-common.h     | 14 +++
 include/block/block_int-common.h | 5 +
 3 files changed, 177 insertions(+)

diff --git a/block/file-posix.c b/block/file-posix.c index a9d347292e..17c0b58158 100755 --- a/block/file-posix.c +++ b/block/file-posix.c @@ -206,6 +206,7 @@ typedef struct RawPosixAIOData { struct { struct iovec *iov; int niov; +int64_t *append_sector; } io; struct { uint64_t cmd; @@ -226,6 +227,7 @@ typedef struct RawPosixAIOData { struct { unsigned long zone_op; const char *zone_op_name; +bool all; } zone_mgmt; }; } RawPosixAIOData; @@ -1331,6 +1333,67 @@ static int hdev_get_max_segments(int fd, struct stat *st) { #endif } +#if defined(CONFIG_BLKZONED) +static int get_zones_wp(int64_t offset, int fd, BlockZoneWps *wps, +unsigned int nrz) { +struct blk_zone *blkz; +int64_t rep_size; +int64_t sector = offset >> BDRV_SECTOR_BITS; +int ret, n = 0, i = 0; +rep_size = sizeof(struct blk_zone_report) + nrz * sizeof(struct blk_zone); +g_autofree struct blk_zone_report *rep = NULL; + +rep = g_malloc(rep_size); +blkz = (struct blk_zone *)(rep + 1); +while (n < nrz) { +memset(rep, 0, rep_size); +rep->sector = sector; +rep->nr_zones = nrz - n; + +do { +ret = ioctl(fd, BLKREPORTZONE, rep); +} while (ret != 0 && errno == EINTR); +if (ret != 0) { +error_report("%d: ioctl BLKREPORTZONE at %" PRId64 " failed %d", 
+fd, offset, errno); +return -errno; +} + +if (!rep->nr_zones) { +break; +} + +for (i = 0; i < rep->nr_zones; i++, n++) { +/* + * The wp tracking cares only about sequential writes required and + * sequential write preferred zones so that the wp can advance to + * the right location. + * Use the most significant bit of the wp location to indicate the + * zone type: 0 for SWR/SWP zones and 1 for conventional zones. + */ +if (!(blkz[i].type != BLK_ZONE_TYPE_CONVENTIONAL)) { +wps->wp[i] += 1ULL << 63; +} else { +wps->wp[i] = blkz[i].wp << BDRV_SECTOR_BITS; +} +} +sector = blkz[i-1].start + blkz[i-1].len; +} + +return 0; +} + +static void update_zones_wp(int64_t offset, int fd, BlockZoneWps *wps, +unsigned int nrz) { +qemu_mutex_lock(&wps->lock); +if (get_zones_wp(offset, fd, wps, nrz) < 0) { +error_report("report zone wp failed"); +return; +} +qemu_mutex_unlock(&wps->lock); +} +#endif + static void raw_refresh_limits(BlockDriverState *bs, Error **errp) { BDRVRawState *s = bs->opaque; @@ -1414,6 +1477,19 @@ static void raw_refresh_limits(BlockDriverState *bs, Error **errp) error_report("Invalid device capacity %" PRId64 " bytes ", bs->bl.capacity); return; } + +ret = get_sysfs_long_val(&st, "physical_block_size"); +if (ret >= 0) { +bs->bl.write_granularity = ret; +} + +bs->bl.wps = g_malloc(sizeof(BlockZoneWps) + sizeof(int64_t) * ret); +if (get_zones_wp(0, s->fd, bs->bl.wps, ret) < 0){ +error_report("report wps failed"); +g_free(bs->bl.wps); +return; +} +qemu_mutex_init(&bs->bl.wps->lock); } } @@ -1651,6 +1727,20 @@ static int handle_aiocb_rw(void *opaque) ssize_t nbytes; char *buf; +/* + * The offset of regular writes, append writes is the wp location + * of that zone. 
+ */ +if (aiocb->aio_type & QEMU_AIO_WRITE) { +if (aiocb->bs->bl.zone_size > 0) { +BlockZoneWps *wps = aiocb->bs->bl.wps; +qemu_mutex_lock(&wps->lock); +aiocb->aio_offset = wps->wp[aiocb->aio_offset / +aiocb->bs->bl.zone_size]; +qemu_mutex_unlock(&wps->lock); +} +} + if (!(aiocb->aio_type & QEMU_AIO_MISALIGNED)) { /* * If there is just
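As a side note on the wp encoding described in this patch (one uint64_t per zone, with the most significant bit marking conventional zones), the following is a minimal sketch of how such a value could be packed and unpacked, and where the wp would move on reset and finish. The helper names and the SKETCH_WP_CONV_FLAG macro are hypothetical, and the sketch sets the flag with a bitwise OR, which may differ from what the patch itself does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name for the "conventional zone" marker bit (bit 63). */
#define SKETCH_WP_CONV_FLAG (1ULL << 63)

/* Pack a byte offset and the zone kind into one 64-bit entry. */
static uint64_t sketch_wp_encode(uint64_t wp_bytes, bool conventional)
{
    uint64_t v = wp_bytes & ~SKETCH_WP_CONV_FLAG;
    return conventional ? (v | SKETCH_WP_CONV_FLAG) : v;
}

/* True when the entry describes a conventional (random-write) zone. */
static bool sketch_wp_is_conventional(uint64_t entry)
{
    return (entry & SKETCH_WP_CONV_FLAG) != 0;
}

/* Byte offset stored in the entry, with the marker bit masked off. */
static uint64_t sketch_wp_offset(uint64_t entry)
{
    return entry & ~SKETCH_WP_CONV_FLAG;
}

/* zone reset: the wp returns to the start offset of the zone. */
static uint64_t sketch_wp_after_reset(uint64_t zone_index, uint64_t zone_size)
{
    return zone_index * zone_size;
}

/* zone finish: the wp moves to the end location of the zone. */
static uint64_t sketch_wp_after_finish(uint64_t zone_index, uint64_t zone_size)
{
    return (zone_index + 1) * zone_size;
}
```

Keeping the flag in the top bit means a single array lookup under the mutex yields both the zone kind and the wp, at the cost of having to mask the bit before using the value as an offset.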
[PATCH v3 0/3] Add zone append write for zoned device
v3:
- only read wps when it is locked [Damien]
- allow last smaller zone case [Damien]
- add zone type and state checks in zone_mgmt command [Damien]
- fix RESET_ALL related problems

v2:
- split patch to two patches for better reviewing
- change BlockZoneWps's structure to an array of integers
- use only mutex lock on locking conditions of zone wps
- coding styles and clean-ups

v1:
- introduce zone append write

Sam Li (3):
  file-posix:add the tracking of the zones write pointers
  block: introduce zone append write for zoned devices
  qemu-iotests: test zone append operation

 block/block-backend.c              | 64 +
 block/file-posix.c                 | 216 -
 block/io.c                         | 21 +++
 block/raw-format.c                 | 7 +
 include/block/block-common.h       | 14 ++
 include/block/block-io.h           | 3 +
 include/block/block_int-common.h   | 8 ++
 include/block/raw-aio.h            | 4 +-
 include/sysemu/block-backend-io.h  | 9 ++
 qemu-io-cmds.c                     | 62 +
 tests/qemu-iotests/tests/zoned.out | 7 +
 tests/qemu-iotests/tests/zoned.sh  | 9 ++
 12 files changed, 420 insertions(+), 4 deletions(-)

-- 
2.37.3
[PATCH v11 5/7] config: add check to block layer
Putting zoned/non-zoned BlockDrivers on top of each other is not allowed. Signed-off-by: Sam Li Reviewed-by: Stefan Hajnoczi --- block.c | 17 + block/file-posix.c | 13 + block/raw-format.c | 1 + include/block/block_int-common.h | 5 + 4 files changed, 36 insertions(+) mode change 100644 => 100755 block.c mode change 100644 => 100755 block/file-posix.c diff --git a/block.c b/block.c old mode 100644 new mode 100755 index bc85f46eed..bf2f2918e7 --- a/block.c +++ b/block.c @@ -7947,6 +7947,23 @@ void bdrv_add_child(BlockDriverState *parent_bs, BlockDriverState *child_bs, return; } +/* + * Non-zoned block drivers do not follow zoned storage constraints + * (i.e. sequential writes to zones). Refuse mixing zoned and non-zoned + * drivers in a graph. + */ +if (!parent_bs->drv->supports_zoned_children && +/* The host-aware model allows zoned storage constraints and random + * write. Allow mixing host-aware and non-zoned drivers. Using + * host-aware device as a regular device. */ +child_bs->bl.zoned == BLK_Z_HM) { +error_setg(errp, "Cannot add a %s child to a %s parent", + child_bs->bl.zoned == BLK_Z_HM ? "zoned" : "non-zoned", + parent_bs->drv->supports_zoned_children ? + "support zoned children" : "not support zoned children"); +return; +} + if (!QLIST_EMPTY(&child_bs->parents)) { error_setg(errp, "The node %s already has a parent", child_bs->node_name); diff --git a/block/file-posix.c b/block/file-posix.c old mode 100644 new mode 100755 index 226f5d48f5..a9d347292e --- a/block/file-posix.c +++ b/block/file-posix.c @@ -778,6 +778,19 @@ static int raw_open_common(BlockDriverState *bs, QDict *options, goto fail; } } +#ifdef CONFIG_BLKZONED +/* + * The kernel page cache does not reliably work for writes to SWR zones + * of zoned block device because it can not guarantee the order of writes. 
+ */ +if (strcmp(bs->drv->format_name, "zoned_host_device") == 0) { +if (!(s->open_flags & O_DIRECT)) { +error_setg(errp, "driver=zoned_host_device was specified, but it " + "requires cache.direct=on, which was not specified."); +return -EINVAL; /* No host kernel page cache */ +} +} +#endif if (S_ISBLK(st.st_mode)) { #ifdef BLKDISCARDZEROES diff --git a/block/raw-format.c b/block/raw-format.c index 618c6b1ec2..b885688434 100644 --- a/block/raw-format.c +++ b/block/raw-format.c @@ -614,6 +614,7 @@ static void raw_child_perm(BlockDriverState *bs, BdrvChild *c, BlockDriver bdrv_raw = { .format_name = "raw", .instance_size= sizeof(BDRVRawState), +.supports_zoned_children = true, .bdrv_probe = &raw_probe, .bdrv_reopen_prepare = &raw_reopen_prepare, .bdrv_reopen_commit = &raw_reopen_commit, diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h index cdc06e77a6..37dddc603c 100644 --- a/include/block/block_int-common.h +++ b/include/block/block_int-common.h @@ -127,6 +127,11 @@ struct BlockDriver { */ bool is_format; +/* + * Set to true if the BlockDriver supports zoned children. + */ +bool supports_zoned_children; + /* * Drivers not implementing bdrv_parse_filename nor bdrv_open should have * this field set to true, except ones that are defined only by their -- 2.37.3
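To make the stacking rule in this patch easier to reason about, here is a minimal standalone sketch of the decision as a pure function: a child is refused only when it is host-managed and the parent driver does not declare support for zoned children. The enum and function names are hypothetical mirrors of BlockZoneModel and the supports_zoned_children flag, not QEMU code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the three zoned models used in this series. */
typedef enum SketchZoneModel {
    SKETCH_Z_NONE = 0,  /* regular block device */
    SKETCH_Z_HM = 1,    /* host-managed: sequential writes are mandatory */
    SKETCH_Z_HA = 2,    /* host-aware: random writes are still accepted */
} SketchZoneModel;

/*
 * Returns true when the child may be attached under the parent.
 * Host-aware and non-zoned children are always accepted, because they
 * tolerate random writes; host-managed children need a parent that
 * understands zoned storage constraints.
 */
static bool sketch_can_add_child(bool parent_supports_zoned_children,
                                 SketchZoneModel child_model)
{
    return parent_supports_zoned_children || child_model != SKETCH_Z_HM;
}
```

With this rule, the raw driver (which sets the flag in the patch) can sit on top of a host-managed device, while a driver without the flag can only sit on top of host-aware or regular devices.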
[PATCH v11 6/7] qemu-iotests: test new zone operations
We have added new block layer APIs of zoned block devices. Test it with: Create a null_blk device, run each zone operation on it and see whether reporting right zone information. Signed-off-by: Sam Li Reviewed-by: Stefan Hajnoczi --- tests/qemu-iotests/tests/zoned.out | 53 ++ tests/qemu-iotests/tests/zoned.sh | 86 ++ 2 files changed, 139 insertions(+) create mode 100644 tests/qemu-iotests/tests/zoned.out create mode 100755 tests/qemu-iotests/tests/zoned.sh diff --git a/tests/qemu-iotests/tests/zoned.out b/tests/qemu-iotests/tests/zoned.out new file mode 100644 index 00..0c8f96deb9 --- /dev/null +++ b/tests/qemu-iotests/tests/zoned.out @@ -0,0 +1,53 @@ +QA output created by zoned.sh +Testing a null_blk device: +Simple cases: if the operations work +(1) report the first zone: +start: 0x0, len 0x8, cap 0x8, wptr 0x0, zcond:1, [type: 2] + +report the first 10 zones +start: 0x0, len 0x8, cap 0x8, wptr 0x0, zcond:1, [type: 2] +start: 0x8, len 0x8, cap 0x8, wptr 0x8, zcond:1, [type: 2] +start: 0x10, len 0x8, cap 0x8, wptr 0x10, zcond:1, [type: 2] +start: 0x18, len 0x8, cap 0x8, wptr 0x18, zcond:1, [type: 2] +start: 0x20, len 0x8, cap 0x8, wptr 0x20, zcond:1, [type: 2] +start: 0x28, len 0x8, cap 0x8, wptr 0x28, zcond:1, [type: 2] +start: 0x30, len 0x8, cap 0x8, wptr 0x30, zcond:1, [type: 2] +start: 0x38, len 0x8, cap 0x8, wptr 0x38, zcond:1, [type: 2] +start: 0x40, len 0x8, cap 0x8, wptr 0x40, zcond:1, [type: 2] +start: 0x48, len 0x8, cap 0x8, wptr 0x48, zcond:1, [type: 2] + +report the last zone: +start: 0x1f38, len 0x8, cap 0x8, wptr 0x1f38, zcond:1, [type: 2] + + +(2) opening the first zone +report after: +start: 0x0, len 0x8, cap 0x8, wptr 0x0, zcond:3, [type: 2] + +opening the second zone +report after: +start: 0x8, len 0x8, cap 0x8, wptr 0x8, zcond:3, [type: 2] + +opening the last zone +report after: +start: 0x1f38, len 0x8, cap 0x8, wptr 0x1f38, zcond:3, [type: 2] + + +(3) closing the first zone +report after: +start: 0x0, len 0x8, cap 0x8, wptr 0x0, zcond:1, [type: 
2] + +closing the last zone +report after: +start: 0x1f38, len 0x8, cap 0x8, wptr 0x1f38, zcond:1, [type: 2] + + +(4) finishing the second zone +After finishing a zone: +start: 0x8, len 0x8, cap 0x8, wptr 0x10, zcond:14, [type: 2] + + +(5) resetting the second zone +After resetting a zone: +start: 0x8, len 0x8, cap 0x8, wptr 0x8, zcond:1, [type: 2] +*** done diff --git a/tests/qemu-iotests/tests/zoned.sh b/tests/qemu-iotests/tests/zoned.sh new file mode 100755 index 00..fced0194c5 --- /dev/null +++ b/tests/qemu-iotests/tests/zoned.sh @@ -0,0 +1,86 @@ +#!/usr/bin/env bash +# +# Test zone management operations. +# + +seq="$(basename $0)" +echo "QA output created by $seq" +status=1 # failure is the default! + +_cleanup() +{ + _cleanup_test_img + sudo rmmod null_blk +} +trap "_cleanup; exit \$status" 0 1 2 3 15 + +# get standard environment, filters and checks +. ./common.rc +. ./common.filter +. ./common.qemu + +# This test only runs on Linux hosts with raw image files. +_supported_fmt raw +_supported_proto file +_supported_os Linux + +QEMU_IO="build/qemu-io" +IMG="--image-opts -n driver=zoned_host_device,filename=/dev/nullb0" +QEMU_IO_OPTIONS=$QEMU_IO_OPTIONS_NO_FMT + +echo "Testing a null_blk device:" +echo "case 1: if the operations work" +sudo modprobe null_blk nr_devices=1 zoned=1 + +echo "(1) report the first zone:" +sudo $QEMU_IO $IMG -c "zrp 0 1" +echo +echo "report the first 10 zones" +sudo $QEMU_IO $IMG -c "zrp 0 10" +echo +echo "report the last zone:" +sudo $QEMU_IO $IMG -c "zrp 0x3e7000 2" # 0x3e7000 / 512 = 0x1f38 +echo +echo +echo "(2) opening the first zone" +sudo $QEMU_IO $IMG -c "zo 0 268435456" # 268435456 / 512 = 524288 +echo "report after:" +sudo $QEMU_IO $IMG -c "zrp 0 1" +echo +echo "opening the second zone" +sudo $QEMU_IO $IMG -c "zo 268435456 268435456" # +echo "report after:" +sudo $QEMU_IO $IMG -c "zrp 268435456 1" +echo +echo "opening the last zone" +sudo $QEMU_IO $IMG -c "zo 0x3e7000 268435456" +echo "report after:" +sudo $QEMU_IO $IMG -c 
"zrp 0x3e7000 2" +echo +echo +echo "(3) closing the first zone" +sudo $QEMU_IO $IMG -c "zc 0 268435456" +echo "report after:" +sudo $QEMU_IO $IMG -c "zrp 0 1" +echo +echo "closing the last zone" +sudo $QEMU_IO $IMG -c "zc 0x3e7000 268435456" +echo "report after:" +sudo $QEMU_IO $IMG -c "zrp 0x3e7000 2" +echo +echo +echo "(4) finishing the second zone" +sudo $QEMU_IO $IMG -c "zf 268435456 268435456" +echo "After finishing a zone:" +sudo $QEMU_IO $IMG -c "zrp 268435456 1" +echo +echo +echo "(5) resetting the second zone" +sudo $QEMU_IO $IMG -c "zrs 268435456 268435456" +echo "After resetting a zone:" +sudo $QEMU_IO $I
[PATCH v11 7/7] docs/zoned-storage: add zoned device documentation
Add the documentation about the zoned device support to virtio-blk
emulation.

Signed-off-by: Sam Li
Reviewed-by: Stefan Hajnoczi
---
 docs/devel/zoned-storage.rst           | 40 ++
 docs/system/qemu-block-drivers.rst.inc | 6
 2 files changed, 46 insertions(+)
 create mode 100644 docs/devel/zoned-storage.rst

diff --git a/docs/devel/zoned-storage.rst b/docs/devel/zoned-storage.rst
new file mode 100644
index 00..deaa4ce99b
--- /dev/null
+++ b/docs/devel/zoned-storage.rst
@@ -0,0 +1,40 @@
+=============
+zoned-storage
+=============
+
+Zoned Block Devices (ZBDs) divide the LBA space into block regions called zones
+that are larger than the LBA size. They can only allow sequential writes, which
+can reduce write amplification in SSDs, and potentially lead to higher
+throughput and increased capacity. More details about ZBDs can be found at:
+
+https://zonedstorage.io/docs/introduction/zoned-storage
+
+1. Block layer APIs for zoned storage
+-------------------------------------
+QEMU block layer has three zoned storage models:
+- BLK_Z_HM: This model only allows sequential write access. It supports a set
+of ZBD-specific I/O requests that are used by the host to manage device zones.
+- BLK_Z_HA: It deals with both sequential write and random write access.
+- BLK_Z_NONE: Regular block devices and drive-managed ZBDs are treated as
+non-zoned devices.
+
+The block device information resides inside BlockDriverState. QEMU uses the
+BlockLimits struct (BlockDriverState::bl) that is continuously accessed by the
+block layer while processing I/O requests. A BlockBackend has a root pointer to
+a BlockDriverState graph (for example, raw format on top of file-posix). The
+zoned storage information can be propagated from the leaf BlockDriverState all
+the way up to the BlockBackend. If the zoned storage model in file-posix is
+set to BLK_Z_HM, then block drivers will declare support for zoned host device.
+
+The block layer APIs support commands needed for zoned storage devices,
+including report zones, four zone operations, and zone append.
+
+2. Emulating zoned storage controllers
+--------------------------------------
+When the BlockBackend's BlockLimits model reports a zoned storage device, users
+like the virtio-blk emulation or the qemu-io-cmds.c utility can use block layer
+APIs for zoned storage emulation or testing.
+
+For example, to test zone_report on a null_blk device using qemu-io:
+$ path/to/qemu-io --image-opts -n driver=zoned_host_device,filename=/dev/nullb0
+-c "zrp offset nr_zones"
diff --git a/docs/system/qemu-block-drivers.rst.inc b/docs/system/qemu-block-drivers.rst.inc
index dfe5d2293d..0b97227fd9 100644
--- a/docs/system/qemu-block-drivers.rst.inc
+++ b/docs/system/qemu-block-drivers.rst.inc
@@ -430,6 +430,12 @@ Hard disks
   you may corrupt your host data (use the ``-snapshot`` command line
   option or modify the device permissions accordingly).
 
+Zoned block devices
+  Zoned block devices can be passed through to the guest if the emulated storage
+  controller supports zoned storage. Use ``--blockdev zoned_host_device,
+  node-name=drive0,filename=/dev/nullb0`` to pass through ``/dev/nullb0``
+  as ``drive0``.
+
 Windows
 ^^^^^^^
 
-- 
2.37.3
[PATCH v11 3/7] block: add block layer APIs resembling Linux ZonedBlockDevice ioctls
Add a new zoned_host_device BlockDriver. The zoned_host_device option
accepts only zoned host block devices. By adding zone management
operations in this new BlockDriver, users can use the new block layer
APIs, including Report Zone and the zone management operations (open,
close, finish, reset, reset_all).

qemu-io uses the new APIs to perform zoned storage commands on the
device: zone_report(zrp), zone_open(zo), zone_close(zc),
zone_reset(zrs), zone_finish(zf).

For example, to test zone_report, use the following command:
$ ./build/qemu-io --image-opts -n driver=zoned_host_device,filename=/dev/nullb0 -c "zrp offset nr_zones"

Signed-off-by: Sam Li
Reviewed-by: Hannes Reinecke
---
 block/block-backend.c             | 146 +
 block/file-posix.c                | 329 ++
 block/io.c                        | 41
 include/block/block-common.h      | 1 +
 include/block/block-io.h          | 7 +
 include/block/block_int-common.h  | 24 +++
 include/block/raw-aio.h           | 6 +-
 include/sysemu/block-backend-io.h | 17 ++
 meson.build                       | 4 +
 qapi/block-core.json              | 8 +-
 qemu-io-cmds.c                    | 148 ++
 11 files changed, 728 insertions(+), 3 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c index d4a5df2ac2..ddc569e3ac 100644 --- a/block/block-backend.c +++ b/block/block-backend.c @@ -1431,6 +1431,15 @@ typedef struct BlkRwCo { void *iobuf; int ret; BdrvRequestFlags flags; +union { +struct { +unsigned int *nr_zones; +BlockZoneDescriptor *zones; +} zone_report; +struct { +BlockZoneOp op; +} zone_mgmt; +}; } BlkRwCo; int blk_make_zero(BlockBackend *blk, BdrvRequestFlags flags) @@ -1775,6 +1784,143 @@ int coroutine_fn blk_co_flush(BlockBackend *blk) return ret; } +static void coroutine_fn blk_aio_zone_report_entry(void *opaque) { +BlkAioEmAIOCB *acb = opaque; +BlkRwCo *rwco = &acb->rwco; + +rwco->ret = blk_co_zone_report(rwco->blk, rwco->offset, + rwco->zone_report.nr_zones, + rwco->zone_report.zones); +blk_aio_complete(acb); +} + +BlockAIOCB *blk_aio_zone_report(BlockBackend *blk, int64_t offset, +unsigned int *nr_zones, +BlockZoneDescriptor *zones, 
+BlockCompletionFunc *cb, void *opaque) +{ +BlkAioEmAIOCB *acb; +Coroutine *co; +IO_CODE(); + +blk_inc_in_flight(blk); +acb = blk_aio_get(&blk_aio_em_aiocb_info, blk, cb, opaque); +acb->rwco = (BlkRwCo) { +.blk= blk, +.offset = offset, +.ret= NOT_DONE, +.zone_report = { +.zones = zones, +.nr_zones = nr_zones, +}, +}; +acb->has_returned = false; + +co = qemu_coroutine_create(blk_aio_zone_report_entry, acb); +bdrv_coroutine_enter(blk_bs(blk), co); + +acb->has_returned = true; +if (acb->rwco.ret != NOT_DONE) { +replay_bh_schedule_oneshot_event(blk_get_aio_context(blk), + blk_aio_complete_bh, acb); +} + +return &acb->common; +} + +static void coroutine_fn blk_aio_zone_mgmt_entry(void *opaque) { +BlkAioEmAIOCB *acb = opaque; +BlkRwCo *rwco = &acb->rwco; + +rwco->ret = blk_co_zone_mgmt(rwco->blk, rwco->zone_mgmt.op, + rwco->offset, acb->bytes); +blk_aio_complete(acb); +} + +BlockAIOCB *blk_aio_zone_mgmt(BlockBackend *blk, BlockZoneOp op, + int64_t offset, int64_t len, + BlockCompletionFunc *cb, void *opaque) { +BlkAioEmAIOCB *acb; +Coroutine *co; +IO_CODE(); + +blk_inc_in_flight(blk); +acb = blk_aio_get(&blk_aio_em_aiocb_info, blk, cb, opaque); +acb->rwco = (BlkRwCo) { +.blk= blk, +.offset = offset, +.ret= NOT_DONE, +.zone_mgmt = { +.op = op, +}, +}; +acb->bytes = len; +acb->has_returned = false; + +co = qemu_coroutine_create(blk_aio_zone_mgmt_entry, acb); +bdrv_coroutine_enter(blk_bs(blk), co); + +acb->has_returned = true; +if (acb->rwco.ret != NOT_DONE) { +replay_bh_schedule_oneshot_event(blk_get_aio_context(blk), + blk_aio_complete_bh, acb); +} + +return &acb->common; +} + +/* + * Send a zone_report command. + * offset is a byte offset from the start of the device. No alignment + * required for offset. + * nr_zones represents IN maximum and OUT actual. + */ +int coroutine_fn blk_co_zone_report(BlockBackend *blk, int64_t offset, +unsigned int *nr_zones, +BlockZoneDescriptor *zones) +{ +int ret; +IO_CODE(); + +blk_inc_in_flight(blk); /* inc
[PATCH v11 4/7] raw-format: add zone operations to pass through requests
raw-format driver usually sits on top of file-posix driver. It needs to pass through requests of zone commands. Signed-off-by: Sam Li Reviewed-by: Stefan Hajnoczi Reviewed-by: Damien Le Moal --- block/raw-format.c | 13 + 1 file changed, 13 insertions(+) diff --git a/block/raw-format.c b/block/raw-format.c index c7278e348e..618c6b1ec2 100644 --- a/block/raw-format.c +++ b/block/raw-format.c @@ -314,6 +314,17 @@ static int coroutine_fn raw_co_pdiscard(BlockDriverState *bs, return bdrv_co_pdiscard(bs->file, offset, bytes); } +static int coroutine_fn raw_co_zone_report(BlockDriverState *bs, int64_t offset, + unsigned int *nr_zones, + BlockZoneDescriptor *zones) { +return bdrv_co_zone_report(bs->file->bs, offset, nr_zones, zones); +} + +static int coroutine_fn raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op, + int64_t offset, int64_t len) { +return bdrv_co_zone_mgmt(bs->file->bs, op, offset, len); +} + static int64_t raw_getlength(BlockDriverState *bs) { int64_t len; @@ -614,6 +625,8 @@ BlockDriver bdrv_raw = { .bdrv_co_pwritev = &raw_co_pwritev, .bdrv_co_pwrite_zeroes = &raw_co_pwrite_zeroes, .bdrv_co_pdiscard = &raw_co_pdiscard, +.bdrv_co_zone_report = &raw_co_zone_report, +.bdrv_co_zone_mgmt = &raw_co_zone_mgmt, .bdrv_co_block_status = &raw_co_block_status, .bdrv_co_copy_range_from = &raw_co_copy_range_from, .bdrv_co_copy_range_to = &raw_co_copy_range_to, -- 2.37.3
[PATCH v11 2/7] file-posix: introduce helper functions for sysfs attributes
Use get_sysfs_str_val() to get the string value of device zoned model. Then get_sysfs_zoned_model() can convert it to BlockZoneModel type of QEMU. Use get_sysfs_long_val() to get the long value of zoned device information. Signed-off-by: Sam Li Reviewed-by: Hannes Reinecke Reviewed-by: Stefan Hajnoczi Reviewed-by: Damien Le Moal --- block/file-posix.c | 121 ++- include/block/block_int-common.h | 3 + 2 files changed, 88 insertions(+), 36 deletions(-) diff --git a/block/file-posix.c b/block/file-posix.c index 66fdb07820..0db4b04e8a 100644 --- a/block/file-posix.c +++ b/block/file-posix.c @@ -1210,66 +1210,109 @@ static int hdev_get_max_hw_transfer(int fd, struct stat *st) #endif } -static int hdev_get_max_segments(int fd, struct stat *st) -{ +/* + * Get a sysfs attribute value as character string. + */ +static int get_sysfs_str_val(struct stat *st, const char *attribute, + char **val) { #ifdef CONFIG_LINUX -char buf[32]; -const char *end; -char *sysfspath = NULL; +g_autofree char *sysfspath = NULL; int ret; -int sysfd = -1; -long max_segments; +size_t len; -if (S_ISCHR(st->st_mode)) { -if (ioctl(fd, SG_GET_SG_TABLESIZE, &ret) == 0) { -return ret; -} +if (!S_ISBLK(st->st_mode)) { return -ENOTSUP; } -if (!S_ISBLK(st->st_mode)) { -return -ENOTSUP; +sysfspath = g_strdup_printf("/sys/dev/block/%u:%u/queue/%s", +major(st->st_rdev), minor(st->st_rdev), +attribute); +ret = g_file_get_contents(sysfspath, val, &len, NULL); +if (ret == -1) { +return -ENOENT; } -sysfspath = g_strdup_printf("/sys/dev/block/%u:%u/queue/max_segments", -major(st->st_rdev), minor(st->st_rdev)); -sysfd = open(sysfspath, O_RDONLY); -if (sysfd == -1) { -ret = -errno; -goto out; +/* The file is ended with '\n' */ +char *p; +p = *val; +if (*(p + len - 1) == '\n') { +*(p + len - 1) = '\0'; } -do { -ret = read(sysfd, buf, sizeof(buf) - 1); -} while (ret == -1 && errno == EINTR); +return ret; +#else +return -ENOTSUP; +#endif +} + +static int get_sysfs_zoned_model(struct stat *st, BlockZoneModel *zoned) { 
+g_autofree char *val = NULL; +int ret; + +ret = get_sysfs_str_val(st, "zoned", &val); if (ret < 0) { -ret = -errno; -goto out; -} else if (ret == 0) { -ret = -EIO; -goto out; +return ret; } -buf[ret] = 0; -/* The file is ended with '\n', pass 'end' to accept that. */ -ret = qemu_strtol(buf, &end, 10, &max_segments); -if (ret == 0 && end && *end == '\n') { -ret = max_segments; + +if (strcmp(val, "host-managed") == 0) { +*zoned = BLK_Z_HM; +} else if (strcmp(val, "host-aware") == 0) { +*zoned = BLK_Z_HA; +} else if (strcmp(val, "none") == 0) { +*zoned = BLK_Z_NONE; +} else { +return -ENOTSUP; } +return 0; +} -out: -if (sysfd != -1) { -close(sysfd); +/* + * Get a sysfs attribute value as a long integer. + */ +static long get_sysfs_long_val(struct stat *st, const char *attribute) { +#ifdef CONFIG_LINUX +g_autofree char *str = NULL; +const char *end; +long val; +int ret; + +ret = get_sysfs_str_val(st, attribute, &str); +if (ret < 0) { +return ret; +} + +/* The file is ended with '\n', pass 'end' to accept that. 
*/ +ret = qemu_strtol(str, &end, 10, &val); +if (ret == 0 && end && *end == '\0') { +ret = val; } -g_free(sysfspath); return ret; #else return -ENOTSUP; #endif } +static int hdev_get_max_segments(int fd, struct stat *st) { +#ifdef CONFIG_LINUX +int ret; + +if (S_ISCHR(st->st_mode)) { +if (ioctl(fd, SG_GET_SG_TABLESIZE, &ret) == 0) { +return ret; +} +return -ENOTSUP; +} +return get_sysfs_long_val(st, "max_segments"); +#else +return -ENOTSUP; +#endif +} + static void raw_refresh_limits(BlockDriverState *bs, Error **errp) { BDRVRawState *s = bs->opaque; struct stat st; +int ret; +BlockZoneModel zoned; s->needs_alignment = raw_needs_alignment(bs); raw_probe_alignment(bs, s->fd, errp); @@ -1307,6 +1350,12 @@ static void raw_refresh_limits(BlockDriverState *bs, Error **errp) bs->bl.max_hw_iov = ret; } } + +ret = get_sysfs_zoned_model(&st, &zoned); +if (ret < 0) { +zoned = BLK_Z_NONE; +} +bs->bl.zoned = zoned; } static int check_for_dasd(int fd) diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h index 8947abab76..7f7863cc9e 100644 --- a/include/block/block_int-common.h +++ b/include/block/block_int-common.h @@ -825,6 +825,9 @@ typedef struct BlockLimits {
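The string handling in these sysfs helpers can be tried out without a block device. Below is a minimal sketch, assuming the attribute contents have already been read into a buffer: it strips the trailing newline that sysfs files end with and maps the "zoned" attribute text to a model. The names sketch_strip_newline, sketch_parse_zoned_model and SketchZoneModel are hypothetical stand-ins, not the functions added by this patch.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical mirror of the zoned models; values follow the series. */
typedef enum SketchZoneModel {
    SKETCH_Z_NONE = 0,
    SKETCH_Z_HM = 1,
    SKETCH_Z_HA = 2,
} SketchZoneModel;

/* sysfs attribute files end with '\n'; drop it in place if present. */
static void sketch_strip_newline(char *s)
{
    size_t len = strlen(s);
    if (len > 0 && s[len - 1] == '\n') {
        s[len - 1] = '\0';
    }
}

/*
 * Map the text of the queue/zoned attribute to a model.
 * Returns 0 on success and -ENOTSUP for an unrecognised string.
 */
static int sketch_parse_zoned_model(const char *val, SketchZoneModel *out)
{
    if (strcmp(val, "host-managed") == 0) {
        *out = SKETCH_Z_HM;
    } else if (strcmp(val, "host-aware") == 0) {
        *out = SKETCH_Z_HA;
    } else if (strcmp(val, "none") == 0) {
        *out = SKETCH_Z_NONE;
    } else {
        return -ENOTSUP;
    }
    return 0;
}
```

Separating the parsing from the file read keeps the mapping testable on hosts that have no zoned device at all.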
[PATCH v11 1/7] include: add zoned device structs
Signed-off-by: Sam Li Reviewed-by: Stefan Hajnoczi Reviewed-by: Damien Le Moal --- include/block/block-common.h | 43 1 file changed, 43 insertions(+) diff --git a/include/block/block-common.h b/include/block/block-common.h index fdb7306e78..36bd0e480e 100644 --- a/include/block/block-common.h +++ b/include/block/block-common.h @@ -49,6 +49,49 @@ typedef struct BlockDriver BlockDriver; typedef struct BdrvChild BdrvChild; typedef struct BdrvChildClass BdrvChildClass; +typedef enum BlockZoneOp { +BLK_ZO_OPEN, +BLK_ZO_CLOSE, +BLK_ZO_FINISH, +BLK_ZO_RESET, +} BlockZoneOp; + +typedef enum BlockZoneModel { +BLK_Z_NONE = 0x0, /* Regular block device */ +BLK_Z_HM = 0x1, /* Host-managed zoned block device */ +BLK_Z_HA = 0x2, /* Host-aware zoned block device */ +} BlockZoneModel; + +typedef enum BlockZoneCondition { +BLK_ZS_NOT_WP = 0x0, +BLK_ZS_EMPTY = 0x1, +BLK_ZS_IOPEN = 0x2, +BLK_ZS_EOPEN = 0x3, +BLK_ZS_CLOSED = 0x4, +BLK_ZS_RDONLY = 0xD, +BLK_ZS_FULL = 0xE, +BLK_ZS_OFFLINE = 0xF, +} BlockZoneCondition; + +typedef enum BlockZoneType { +BLK_ZT_CONV = 0x1, /* Conventional random writes supported */ +BLK_ZT_SWR = 0x2, /* Sequential writes required */ +BLK_ZT_SWP = 0x3, /* Sequential writes preferred */ +} BlockZoneType; + +/* + * Zone descriptor data structure. + * Provides information on a zone with all position and size values in bytes. + */ +typedef struct BlockZoneDescriptor { +uint64_t start; +uint64_t length; +uint64_t cap; +uint64_t wp; +BlockZoneType type; +BlockZoneCondition cond; +} BlockZoneDescriptor; + typedef struct BlockDriverInfo { /* in bytes, 0 if irrelevant */ int cluster_size; -- 2.37.3
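Since the zone descriptor keeps all positions and sizes in bytes, a short sketch may help show how a caller could derive common quantities from it. The struct below is a hypothetical, trimmed mirror of BlockZoneDescriptor (only the four byte-valued fields), and both helpers are illustrative assumptions rather than functions from this series; the 512-byte sector size is assumed to match the "268435456 / 512 = 524288" arithmetic used in the iotest of this series.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_SECTOR_BITS 9    /* assumed 512-byte sectors */

/* Hypothetical, trimmed mirror of BlockZoneDescriptor; all values in bytes. */
typedef struct SketchZoneDescriptor {
    uint64_t start;     /* first byte of the zone */
    uint64_t length;    /* zone size */
    uint64_t cap;       /* writable capacity, at most length */
    uint64_t wp;        /* write pointer position */
} SketchZoneDescriptor;

/* Bytes that can still be written sequentially before the zone is full. */
static uint64_t sketch_zone_remaining(const SketchZoneDescriptor *z)
{
    uint64_t writable_end = z->start + z->cap;
    return z->wp >= writable_end ? 0 : writable_end - z->wp;
}

/* Convert a byte value to 512-byte sectors, e.g. for a sector-based report. */
static uint64_t sketch_bytes_to_sectors(uint64_t bytes)
{
    return bytes >> SKETCH_SECTOR_BITS;
}
```

For an empty 256 MiB zone the remaining space equals the capacity, and finishing the zone (wp at the zone end) brings it to zero.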
[PATCH v11 0/7] Add support for zoned device
Zoned Block Devices (ZBDs) devide the LBA space to block regions called zones that are larger than the LBA size. It can only allow sequential writes, which reduces write amplification in SSD, leading to higher throughput and increased capacity. More details about ZBDs can be found at: https://zonedstorage.io/docs/introduction/zoned-storage The zoned device support aims to let guests (virtual machines) access zoned storage devices on the host (hypervisor) through a virtio-blk device. This involves extending QEMU's block layer and virtio-blk emulation code. In its current status, the virtio-blk device is not aware of ZBDs but the guest sees host-managed drives as regular drive that will runs correctly under the most common write workloads. This patch series extend the block layer APIs with the minimum set of zoned commands that are necessary to support zoned devices. The commands are - Report Zones, four zone operations and Zone Append (developing). It can be tested on a null_blk device using qemu-io or qemu-iotests. 
For example, to test zone report using qemu-io:

$ path/to/qemu-io --image-opts -n driver=zoned_host_device,filename=/dev/nullb0 -c "zrp offset nr_zones"

v11:
- address review comments
  * fix possible BLKZONED config compiling warnings [Stefan]
  * fix capacity field compiling warnings on older kernels [Stefan,Damien]

v10:
- address review comments
  * deal with the last small zone case in zone_mgmt operations [Damien]
  * handle the capacity field being outdated in old kernels (before 5.9) [Damien]
  * use byte units in the block layer to be consistent with QEMU [Eric]
  * fix coding style related problems [Stefan]

v9:
- address review comments
  * specify units of zone command requests [Stefan]
  * fix some error handling in file-posix [Stefan]
  * introduce zoned_host_device in the commit message [Markus]

v8:
- address review comments
  * solve patch conflicts and merge sysfs helper functions into one patch
  * add cache.direct=on check in config

v7:
- address review comments
  * modify sysfs attribute helper functions
  * move the input validation and error checking into raw_co_zone_* function
  * fix checks in config

v6:
- drop virtio-blk emulation changes
- address Stefan's review comments
  * fix CONFIG_BLKZONED configs in related functions
  * replace reading fd by g_file_get_contents() in get_sysfs_str_val()
  * rewrite documentation for zoned storage

v5:
- add zoned storage emulation to virtio-blk device
- add documentation for zoned storage
- address review comments
  * fix qemu-iotests
  * fix check to block layer
  * modify interfaces of sysfs helper functions
  * rename zoned device structs according to QEMU styles
  * reorder patches

v4:
- add virtio-blk headers for zoned device
- add configurations for zoned host device
- add zone operations for raw-format
- address review comments
  * fix memory leak bug in zone_report
  * add checks to block layers
  * fix qemu-iotests format
  * fix sysfs helper functions

v3:
- add helper functions to get sysfs attributes
- address review comments
  * fix zone report bugs
  * fix the qemu-io code path
  * use thread pool to avoid blocking ioctl() calls

v2:
- add qemu-io sub-commands
- address review comments
  * modify interfaces of APIs

v1:
- add block layer APIs resembling Linux ZonedBlockDevice ioctls

Sam Li (7):
  include: add zoned device structs
  file-posix: introduce helper functions for sysfs attributes
  block: add block layer APIs resembling Linux ZonedBlockDevice ioctls
  raw-format: add zone operations to pass through requests
  config: add check to block layer
  qemu-iotests: test new zone operations
  docs/zoned-storage: add zoned device documentation

 block.c                                |  17 +
 block/block-backend.c                  | 146
 block/file-posix.c                     | 463 +++--
 block/io.c                             |  41 +++
 block/raw-format.c                     |  14 +
 docs/devel/zoned-storage.rst           |  40 +++
 docs/system/qemu-block-drivers.rst.inc |   6 +
 include/block/block-common.h           |  44 +++
 include/block/block-io.h               |   7 +
 include/block/block_int-common.h       |  32 ++
 include/block/raw-aio.h                |   6 +-
 include/sysemu/block-backend-io.h      |  17 +
 meson.build                            |   4 +
 qapi/block-core.json                   |   8 +-
 qemu-io-cmds.c                         | 148
 tests/qemu-iotests/tests/zoned.out     |  53 +++
 tests/qemu-iotests/tests/zoned.sh      |  86 +
 17 files changed, 1093 insertions(+), 39 deletions(-)
 mode change 100644 => 100755 block.c
 mode change 100644 => 100755 block/file-posix.c
 create mode 100644 docs/devel/zoned-storage.rst
 create mode 100644 tests/qemu-iotests/tests/zoned.out
 create mode 100755 tests/qemu-iotests/tests/zoned.sh

-- 
2.37.3
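
As an illustration of the sequential-write rule described in the cover letter above, here is a minimal sketch in C of a single zone tracked by a write pointer. It is not code from this series: the DemoZone type and the demo_zone_* helpers are hypothetical names invented for the example, and the model ignores zone states, alignment, and I/O. Offsets and lengths are in bytes, following the unit the changelog says the block layer uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical, simplified model of one sequential-write zone.
 * These names are illustrative only; they are not QEMU or Linux types.
 */
typedef struct {
    uint64_t start;     /* first byte of the zone */
    uint64_t capacity;  /* number of writable bytes in the zone */
    uint64_t wp;        /* write pointer: next byte that may be written */
} DemoZone;

static void demo_zone_init(DemoZone *z, uint64_t start, uint64_t capacity)
{
    assert(capacity > 0);
    z->start = start;
    z->capacity = capacity;
    z->wp = start;      /* an empty zone starts with wp at its first byte */
}

/* A regular write is accepted only at the write pointer and only if it fits. */
static bool demo_zone_write(DemoZone *z, uint64_t offset, uint64_t len)
{
    uint64_t remaining = z->start + z->capacity - z->wp;

    if (offset != z->wp || len > remaining) {
        return false;
    }
    z->wp += len;
    return true;
}

/*
 * Append: the caller does not choose the offset. The data lands at the
 * current write pointer and that location is reported back.
 */
static bool demo_zone_append(DemoZone *z, uint64_t len, uint64_t *written_at)
{
    uint64_t at = z->wp;

    if (!demo_zone_write(z, at, len)) {
        return false;
    }
    *written_at = at;
    return true;
}

/* Reset rewinds the write pointer so the zone can be written again. */
static void demo_zone_reset(DemoZone *z)
{
    z->wp = z->start;
}
```

In this model a second write to an offset that has already been written is rejected, which is the property that lets a device avoid in-place updates. The append helper shows why an append-style command is convenient for concurrent writers: each caller learns where its data ended up instead of having to predict the write pointer.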
Re: [PATCH v2 00/13] ppc/e500: Add support for two types of flash, cleanup
On 10/9/22 00:30, Bin Meng wrote:
> On Sun, Oct 9, 2022 at 12:11 AM Bernhard Beschow wrote:
>> On 4 October 2022 12:43:35 UTC, Daniel Henrique Barboza wrote:
>>> Hey,
>>>
>>> On 10/3/22 18:27, Philippe Mathieu-Daudé wrote:
>>>> Hi Daniel,
>>>>
>>>> On 3/10/22 22:31, Bernhard Beschow wrote:
>>>>> Cover letter:
>>>>> ~~~~~~~~~~~~~
>>>>>
>>>>> This series adds support for -pflash and direct SD card access to the
>>>>> PPC e500 boards. The idea is to increase compatibility with "real"
>>>>> firmware images where only the bare minimum of drivers is compiled in.
>>>>>
>>>>> Bernhard Beschow (13):
>>>>>   hw/ppc/meson: Allow e500 boards to be enabled separately
>>>>>   hw/gpio/meson: Introduce dedicated config switch for hw/gpio/mpc8xxx
>>>>>   docs/system/ppc/ppce500: Add heading for networking chapter
>>>>>   hw/ppc/e500: Reduce usage of sysbus API
>>>>>   hw/ppc/mpc8544ds: Rename wrongly named method
>>>>>   hw/ppc/mpc8544ds: Add platform bus
>>>>>   hw/ppc/e500: Remove if statement which is now always true
>>>>
>>>> This first part is mostly reviewed and can already go via your
>>>> ppc-next queue.
>>>
>>> We're missing an ACK in patch 6/13:
>>>
>>> hw/ppc/mpc8544ds: Add platform bus
>>
>> Bin: Ping?
>
> Sorry for the delay. I have provided the R-b to this patch.

Thanks for the review. Patches 1-7 queued in gitlab.com/danielhb/qemu/tree/ppc-next.

Daniel

> Regards,
> Bin