Re: [PATCH v8 3/7] block: add block layer APIs resembling Linux ZonedBlockDevice ioctls

2022-09-29 Thread Markus Armbruster
Damien Le Moal  writes:

> On 9/1/22 23:57, Markus Armbruster wrote:
>> Sam Li  writes:
>> 
>>> Markus Armbruster  于2022年8月31日周三 16:35写道:

 Sam Li  writes:

> Markus Armbruster  于2022年8月30日周二 19:57写道:

[...]

> Zoned_host_device is basically host_device + zone operations. It
> serves for a simple purpose: if the host device is zoned, register
> zoned_host_device driver; else, register host_device.

 Why would I ever want to use host_device instead of zoned_host_device?

 To answer this question, we need to understand how their behavior
 differs.

 We can ignore the legacy protocol prefix / string filename part.

 All that's left seems to be "if the host device is zoned, then using the
 zoned_host_device driver gets you the zoned features, whereas using the
 host_device driver doesn't".  What am I missing?
>>>
>>> I think that's basically what users need to know about.
>> 
>> Now answer my previous question, please: why would I ever want to use
>> host_device instead of zoned_host_device?
>> 
>> Or in other words, why would I ever want to present a zoned host device
>> to a guest as non-zoned device?
>> 
>> Notably common is .bdrv_file_open = hdev_open.  What happens when you
>> try to create a zoned_host_device where the @filename argument is not in
>> fact a zoned device?
>
> If the device is a regular block device, QEMU will still open the
> device. For instance, I use a loopback device to test zone_report in
> qemu-io. It returns ENOTTY which indicates Inappropriate ioctl for the
> device. Meanwhile, if using a regular block device when emulation a
> zoned device on a guest os, the best case is that the guest can boot
> but has no emulated block device. In some cases, QEMU just terminates
> because the block device has not met the alignment requirements.

 I'm not sure I understand all of this.  I'm also not sure I have to :)
>>>
>>> Maybe I didn't explain it very well. Which part would you like to know
>>> more about?
>> 
>> Let's try more specific questions.  Say I configure a zoned_host_device
>> backed by a host device that isn't zoned.
>> 
>> 1. Is this configuration accepted?
>
> If we assume we have the special zoned_host_device driver, with the
> associated command line zoned_host_device option explicitly calling for
> it, then no, I do not think this should be allowed at all and an error
> should be returned on startup. That would be consistent with the fact that
> the options zoned_host_device and host_device are different to make sure
> we can check that the user knows what he/she/them is doing.
>
> If we have only host_device as a setup option and driver, the driver
> methods can be trivially adjusted to do the right thing based on the
> device type (i.e. zoned vs regular/not zoned). However, that would prevent
> an interesting future extension of this work to implement a full zone
> emulation on top of a regular (not zoned) host block device.
>
> With this in mind, we currently have the following:
>
> 1) host_device option -> accept only regular non-zoned host block devices
> 2) zoned_host_device option -> accept only zoned host block devices

2) matches my intuitive expectations for this driver name.

However, if host_device works even with a zoned host device before the
patch presenting it as non-zoned to the guest, then it needs to continue
to do so.

> And in the future, we can have:
>
> 1) host_device option -> accept only regular non-zoned host block devices
> 2) zoned_host_device option -> accept any host block device type
>   a) Use native zone kernel API for zoned host block devices
>   b) Use full zone emulation for regular host block devices

Understood.

> But sure, internally, we could have a single driver structure with methods
> adjusted to do the correct thing based on the device type and option
> specified. Having a 1:1 mapping between the driver name and driver
> structure does clarify things I think (even though there are indeed a lot
> of methods that are identical).

I think this is basically a matter of user interface design.  Let's
review what we have: host_device and host_cdrom.  I'm only passingly
familiar with them, so please correct my misunderstandings, if any.

host_device and host_cdrom let you "pass through" a host device to a
guest.

host_cdrom presents a removable device to the guest.  I appears to
accept any host block device, even a non-removable one.  What happens
when you try to use a non-removable host device as removable guest
device I don't know.

host_device presents a non-removable device to the guest.  It accepts
any host block device, even a removable one (as long as it has a
medium).

host_device detects whether the host device is a SCSI generic device.
Guest devices scsi-hd and scsi-cd reject a SCSI generic host device.
Guest device scsi-block requires one (I think).

On the one hand, there is precedence for using 

Re: [PATCH v8 3/7] block: add block layer APIs resembling Linux ZonedBlockDevice ioctls

2022-09-01 Thread Damien Le Moal
On 9/1/22 23:57, Markus Armbruster wrote:
> Sam Li  writes:
> 
>> Markus Armbruster  于2022年8月31日周三 16:35写道:
>>>
>>> Sam Li  writes:
>>>
 Markus Armbruster  于2022年8月30日周二 19:57写道:
>
> Sam Li  writes:
>
>> By adding zone management operations in BlockDriver, storage controller
>> emulation can use the new block layer APIs including Report Zone and
>> four zone management operations (open, close, finish, reset).
>>
>> Add zoned storage commands of the device: zone_report(zrp), 
>> zone_open(zo),
>> zone_close(zc), zone_reset(zrs), zone_finish(zf).
>>
>> For example, to test zone_report, use following command:
>> $ ./build/qemu-io --image-opts driver=zoned_host_device, 
>> filename=/dev/nullb0
>> -c "zrp offset nr_zones"
>>
>> Signed-off-by: Sam Li 
>> Reviewed-by: Hannes Reinecke 
>
> [...]
>
>> diff --git a/block/file-posix.c b/block/file-posix.c
>> index 0a8b4b426e..e3efba6db7 100644
>> --- a/block/file-posix.c
>> +++ b/block/file-posix.c
>
> [...]
>
>> @@ -3752,6 +4025,54 @@ static BlockDriver bdrv_host_device = {
>>  #endif
>>  };
>>
>> +#if defined(CONFIG_BLKZONED)
>> +static BlockDriver bdrv_zoned_host_device = {
>> +.format_name = "zoned_host_device",
>
> Indentation should be 4, not 8.
>
>> +.protocol_name = "zoned_host_device",
>> +.instance_size = sizeof(BDRVRawState),
>> +.bdrv_needs_filename = true,
>> +.bdrv_probe_device  = hdev_probe_device,
>> +.bdrv_file_open = hdev_open,
>> +.bdrv_close = raw_close,
>> +.bdrv_reopen_prepare = raw_reopen_prepare,
>> +.bdrv_reopen_commit  = raw_reopen_commit,
>> +.bdrv_reopen_abort   = raw_reopen_abort,
>> +.bdrv_co_create_opts = bdrv_co_create_opts_simple,
>> +.create_opts = _create_opts_simple,
>> +.mutable_opts= mutable_opts,
>> +.bdrv_co_invalidate_cache = raw_co_invalidate_cache,
>> +.bdrv_co_pwrite_zeroes = hdev_co_pwrite_zeroes,
>> +
>> +.bdrv_co_preadv = raw_co_preadv,
>> +.bdrv_co_pwritev= raw_co_pwritev,
>> +.bdrv_co_flush_to_disk  = raw_co_flush_to_disk,
>> +.bdrv_co_pdiscard   = hdev_co_pdiscard,
>> +.bdrv_co_copy_range_from = raw_co_copy_range_from,
>> +.bdrv_co_copy_range_to  = raw_co_copy_range_to,
>> +.bdrv_refresh_limits = raw_refresh_limits,
>> +.bdrv_io_plug = raw_aio_plug,
>> +.bdrv_io_unplug = raw_aio_unplug,
>> +.bdrv_attach_aio_context = raw_aio_attach_aio_context,
>> +
>> +.bdrv_co_truncate   = raw_co_truncate,
>> +.bdrv_getlength = raw_getlength,
>> +.bdrv_get_info = raw_get_info,
>> +.bdrv_get_allocated_file_size
>> += raw_get_allocated_file_size,
>> +.bdrv_get_specific_stats = hdev_get_specific_stats,
>> +.bdrv_check_perm = raw_check_perm,
>> +.bdrv_set_perm   = raw_set_perm,
>> +.bdrv_abort_perm_update = raw_abort_perm_update,
>> +.bdrv_probe_blocksizes = hdev_probe_blocksizes,
>> +.bdrv_probe_geometry = hdev_probe_geometry,
>> +.bdrv_co_ioctl = hdev_co_ioctl,
>> +
>> +/* zone management operations */
>> +.bdrv_co_zone_report = raw_co_zone_report,
>> +.bdrv_co_zone_mgmt = raw_co_zone_mgmt,
>> +};
>
> Differences to bdrv_host_device:
>
> * .bdrv_parse_filename is not set
>
> * .bdrv_co_ioctl is not set
>
> * .bdrv_co_zone_report and .bdrv_co_zone_mgmt are set

 As Stefan mentioned, zoned_host_device is a new driver that doesn't
 work with string filenames. .bdrv_parse_filename() helps legacy
 drivers strip the optional protocol prefix off the filename and no use
 here. Therefore it can be dropped.
>>>
>>> Makes sense.
>>>
 .bdrv_co_ioctl is set actually.
>>>
>>> You're right; I diffed the two and misread the result.
>>>
 Zoned_host_device is basically host_device + zone operations. It
 serves for a simple purpose: if the host device is zoned, register
 zoned_host_device driver; else, register host_device.
>>>
>>> Why would I ever want to use host_device instead of zoned_host_device?
>>>
>>> To answer this question, we need to understand how their behavior
>>> differs.
>>>
>>> We can ignore the legacy protocol prefix / string filename part.
>>>
>>> All that's left seems to be "if the host device is zoned, then using the
>>> zoned_host_device driver gets you the zoned features, whereas using the
>>> host_device driver doesn't".  What am I missing?
>>
>> I think that's basically what users need to know about.
> 
> Now answer my previous question, please: why would 

Re: [PATCH v8 3/7] block: add block layer APIs resembling Linux ZonedBlockDevice ioctls

2022-09-01 Thread Markus Armbruster
Markus Armbruster  writes:

> Sam Li  writes:
>
>> Markus Armbruster  于2022年8月31日周三 16:35写道:
>>>
>>> Sam Li  writes:
>>>
>>> > Markus Armbruster  于2022年8月30日周二 19:57写道:
>>> >>
>>> >> Sam Li  writes:
>>> >>
>>> >> > By adding zone management operations in BlockDriver, storage controller
>>> >> > emulation can use the new block layer APIs including Report Zone and
>>> >> > four zone management operations (open, close, finish, reset).
>>> >> >
>>> >> > Add zoned storage commands of the device: zone_report(zrp), 
>>> >> > zone_open(zo),
>>> >> > zone_close(zc), zone_reset(zrs), zone_finish(zf).
>>> >> >
>>> >> > For example, to test zone_report, use following command:
>>> >> > $ ./build/qemu-io --image-opts driver=zoned_host_device, 
>>> >> > filename=/dev/nullb0
>>> >> > -c "zrp offset nr_zones"
>>> >> >
>>> >> > Signed-off-by: Sam Li 
>>> >> > Reviewed-by: Hannes Reinecke 
>>> >>
>>> >> [...]
>>> >>
>>> >> > diff --git a/block/file-posix.c b/block/file-posix.c
>>> >> > index 0a8b4b426e..e3efba6db7 100644
>>> >> > --- a/block/file-posix.c
>>> >> > +++ b/block/file-posix.c
>>> >>
>>> >> [...]
>>> >>
>>> >> > @@ -3752,6 +4025,54 @@ static BlockDriver bdrv_host_device = {
>>> >> >  #endif
>>> >> >  };
>>> >> >
>>> >> > +#if defined(CONFIG_BLKZONED)
>>> >> > +static BlockDriver bdrv_zoned_host_device = {
>>> >> > +.format_name = "zoned_host_device",
>>> >>
>>> >> Indentation should be 4, not 8.
>>> >>
>>> >> > +.protocol_name = "zoned_host_device",
>>> >> > +.instance_size = sizeof(BDRVRawState),
>>> >> > +.bdrv_needs_filename = true,
>>> >> > +.bdrv_probe_device  = hdev_probe_device,
>>> >> > +.bdrv_file_open = hdev_open,
>>> >> > +.bdrv_close = raw_close,
>>> >> > +.bdrv_reopen_prepare = raw_reopen_prepare,
>>> >> > +.bdrv_reopen_commit  = raw_reopen_commit,
>>> >> > +.bdrv_reopen_abort   = raw_reopen_abort,
>>> >> > +.bdrv_co_create_opts = bdrv_co_create_opts_simple,
>>> >> > +.create_opts = _create_opts_simple,
>>> >> > +.mutable_opts= mutable_opts,
>>> >> > +.bdrv_co_invalidate_cache = raw_co_invalidate_cache,
>>> >> > +.bdrv_co_pwrite_zeroes = hdev_co_pwrite_zeroes,
>>> >> > +
>>> >> > +.bdrv_co_preadv = raw_co_preadv,
>>> >> > +.bdrv_co_pwritev= raw_co_pwritev,
>>> >> > +.bdrv_co_flush_to_disk  = raw_co_flush_to_disk,
>>> >> > +.bdrv_co_pdiscard   = hdev_co_pdiscard,
>>> >> > +.bdrv_co_copy_range_from = raw_co_copy_range_from,
>>> >> > +.bdrv_co_copy_range_to  = raw_co_copy_range_to,
>>> >> > +.bdrv_refresh_limits = raw_refresh_limits,
>>> >> > +.bdrv_io_plug = raw_aio_plug,
>>> >> > +.bdrv_io_unplug = raw_aio_unplug,
>>> >> > +.bdrv_attach_aio_context = raw_aio_attach_aio_context,
>>> >> > +
>>> >> > +.bdrv_co_truncate   = raw_co_truncate,
>>> >> > +.bdrv_getlength = raw_getlength,
>>> >> > +.bdrv_get_info = raw_get_info,
>>> >> > +.bdrv_get_allocated_file_size
>>> >> > += raw_get_allocated_file_size,
>>> >> > +.bdrv_get_specific_stats = hdev_get_specific_stats,
>>> >> > +.bdrv_check_perm = raw_check_perm,
>>> >> > +.bdrv_set_perm   = raw_set_perm,
>>> >> > +.bdrv_abort_perm_update = raw_abort_perm_update,
>>> >> > +.bdrv_probe_blocksizes = hdev_probe_blocksizes,
>>> >> > +.bdrv_probe_geometry = hdev_probe_geometry,
>>> >> > +.bdrv_co_ioctl = hdev_co_ioctl,
>>> >> > +
>>> >> > +/* zone management operations */
>>> >> > +.bdrv_co_zone_report = raw_co_zone_report,
>>> >> > +.bdrv_co_zone_mgmt = raw_co_zone_mgmt,
>>> >> > +};
>>> >>
>>> >> Differences to bdrv_host_device:
>>> >>
>>> >> * .bdrv_parse_filename is not set
>>> >>
>>> >> * .bdrv_co_ioctl is not set
>>> >>
>>> >> * .bdrv_co_zone_report and .bdrv_co_zone_mgmt are set
>>> >
>>> > As Stefan mentioned, zoned_host_device is a new driver that doesn't
>>> > work with string filenames. .bdrv_parse_filename() helps legacy
>>> > drivers strip the optional protocol prefix off the filename and no use
>>> > here. Therefore it can be dropped.
>>>
>>> Makes sense.
>>>
>>> > .bdrv_co_ioctl is set actually.
>>>
>>> You're right; I diffed the two and misread the result.
>>>
>>> > Zoned_host_device is basically host_device + zone operations. It
>>> > serves for a simple purpose: if the host device is zoned, register
>>> > zoned_host_device driver; else, register host_device.
>>>
>>> Why would I ever want to use host_device instead of zoned_host_device?
>>>
>>> To answer this question, we need to understand how their behavior
>>> differs.
>>>
>>> We can ignore the legacy protocol prefix / string filename part.
>>>
>>> All that's left seems to be "if the host device is zoned, then using the
>>> zoned_host_device driver gets you the zoned features, whereas using the
>>> host_device 

Re: [PATCH v8 3/7] block: add block layer APIs resembling Linux ZonedBlockDevice ioctls

2022-09-01 Thread Markus Armbruster
Sam Li  writes:

> Markus Armbruster  于2022年8月31日周三 16:35写道:
>>
>> Sam Li  writes:
>>
>> > Markus Armbruster  于2022年8月30日周二 19:57写道:
>> >>
>> >> Sam Li  writes:
>> >>
>> >> > By adding zone management operations in BlockDriver, storage controller
>> >> > emulation can use the new block layer APIs including Report Zone and
>> >> > four zone management operations (open, close, finish, reset).
>> >> >
>> >> > Add zoned storage commands of the device: zone_report(zrp), 
>> >> > zone_open(zo),
>> >> > zone_close(zc), zone_reset(zrs), zone_finish(zf).
>> >> >
>> >> > For example, to test zone_report, use following command:
>> >> > $ ./build/qemu-io --image-opts driver=zoned_host_device, 
>> >> > filename=/dev/nullb0
>> >> > -c "zrp offset nr_zones"
>> >> >
>> >> > Signed-off-by: Sam Li 
>> >> > Reviewed-by: Hannes Reinecke 
>> >>
>> >> [...]
>> >>
>> >> > diff --git a/block/file-posix.c b/block/file-posix.c
>> >> > index 0a8b4b426e..e3efba6db7 100644
>> >> > --- a/block/file-posix.c
>> >> > +++ b/block/file-posix.c
>> >>
>> >> [...]
>> >>
>> >> > @@ -3752,6 +4025,54 @@ static BlockDriver bdrv_host_device = {
>> >> >  #endif
>> >> >  };
>> >> >
>> >> > +#if defined(CONFIG_BLKZONED)
>> >> > +static BlockDriver bdrv_zoned_host_device = {
>> >> > +.format_name = "zoned_host_device",
>> >>
>> >> Indentation should be 4, not 8.
>> >>
>> >> > +.protocol_name = "zoned_host_device",
>> >> > +.instance_size = sizeof(BDRVRawState),
>> >> > +.bdrv_needs_filename = true,
>> >> > +.bdrv_probe_device  = hdev_probe_device,
>> >> > +.bdrv_file_open = hdev_open,
>> >> > +.bdrv_close = raw_close,
>> >> > +.bdrv_reopen_prepare = raw_reopen_prepare,
>> >> > +.bdrv_reopen_commit  = raw_reopen_commit,
>> >> > +.bdrv_reopen_abort   = raw_reopen_abort,
>> >> > +.bdrv_co_create_opts = bdrv_co_create_opts_simple,
>> >> > +.create_opts = _create_opts_simple,
>> >> > +.mutable_opts= mutable_opts,
>> >> > +.bdrv_co_invalidate_cache = raw_co_invalidate_cache,
>> >> > +.bdrv_co_pwrite_zeroes = hdev_co_pwrite_zeroes,
>> >> > +
>> >> > +.bdrv_co_preadv = raw_co_preadv,
>> >> > +.bdrv_co_pwritev= raw_co_pwritev,
>> >> > +.bdrv_co_flush_to_disk  = raw_co_flush_to_disk,
>> >> > +.bdrv_co_pdiscard   = hdev_co_pdiscard,
>> >> > +.bdrv_co_copy_range_from = raw_co_copy_range_from,
>> >> > +.bdrv_co_copy_range_to  = raw_co_copy_range_to,
>> >> > +.bdrv_refresh_limits = raw_refresh_limits,
>> >> > +.bdrv_io_plug = raw_aio_plug,
>> >> > +.bdrv_io_unplug = raw_aio_unplug,
>> >> > +.bdrv_attach_aio_context = raw_aio_attach_aio_context,
>> >> > +
>> >> > +.bdrv_co_truncate   = raw_co_truncate,
>> >> > +.bdrv_getlength = raw_getlength,
>> >> > +.bdrv_get_info = raw_get_info,
>> >> > +.bdrv_get_allocated_file_size
>> >> > += raw_get_allocated_file_size,
>> >> > +.bdrv_get_specific_stats = hdev_get_specific_stats,
>> >> > +.bdrv_check_perm = raw_check_perm,
>> >> > +.bdrv_set_perm   = raw_set_perm,
>> >> > +.bdrv_abort_perm_update = raw_abort_perm_update,
>> >> > +.bdrv_probe_blocksizes = hdev_probe_blocksizes,
>> >> > +.bdrv_probe_geometry = hdev_probe_geometry,
>> >> > +.bdrv_co_ioctl = hdev_co_ioctl,
>> >> > +
>> >> > +/* zone management operations */
>> >> > +.bdrv_co_zone_report = raw_co_zone_report,
>> >> > +.bdrv_co_zone_mgmt = raw_co_zone_mgmt,
>> >> > +};
>> >>
>> >> Differences to bdrv_host_device:
>> >>
>> >> * .bdrv_parse_filename is not set
>> >>
>> >> * .bdrv_co_ioctl is not set
>> >>
>> >> * .bdrv_co_zone_report and .bdrv_co_zone_mgmt are set
>> >
>> > As Stefan mentioned, zoned_host_device is a new driver that doesn't
>> > work with string filenames. .bdrv_parse_filename() helps legacy
>> > drivers strip the optional protocol prefix off the filename and no use
>> > here. Therefore it can be dropped.
>>
>> Makes sense.
>>
>> > .bdrv_co_ioctl is set actually.
>>
>> You're right; I diffed the two and misread the result.
>>
>> > Zoned_host_device is basically host_device + zone operations. It
>> > serves for a simple purpose: if the host device is zoned, register
>> > zoned_host_device driver; else, register host_device.
>>
>> Why would I ever want to use host_device instead of zoned_host_device?
>>
>> To answer this question, we need to understand how their behavior
>> differs.
>>
>> We can ignore the legacy protocol prefix / string filename part.
>>
>> All that's left seems to be "if the host device is zoned, then using the
>> zoned_host_device driver gets you the zoned features, whereas using the
>> host_device driver doesn't".  What am I missing?
>
> I think that's basically what users need to know about.

Now answer my previous question, please: why would I 

Re: [PATCH v8 3/7] block: add block layer APIs resembling Linux ZonedBlockDevice ioctls

2022-08-31 Thread Sam Li
Markus Armbruster  于2022年8月31日周三 16:35写道:
>
> Sam Li  writes:
>
> > Markus Armbruster  于2022年8月30日周二 19:57写道:
> >>
> >> Sam Li  writes:
> >>
> >> > By adding zone management operations in BlockDriver, storage controller
> >> > emulation can use the new block layer APIs including Report Zone and
> >> > four zone management operations (open, close, finish, reset).
> >> >
> >> > Add zoned storage commands of the device: zone_report(zrp), 
> >> > zone_open(zo),
> >> > zone_close(zc), zone_reset(zrs), zone_finish(zf).
> >> >
> >> > For example, to test zone_report, use following command:
> >> > $ ./build/qemu-io --image-opts driver=zoned_host_device, 
> >> > filename=/dev/nullb0
> >> > -c "zrp offset nr_zones"
> >> >
> >> > Signed-off-by: Sam Li 
> >> > Reviewed-by: Hannes Reinecke 
> >>
> >> [...]
> >>
> >> > diff --git a/block/file-posix.c b/block/file-posix.c
> >> > index 0a8b4b426e..e3efba6db7 100644
> >> > --- a/block/file-posix.c
> >> > +++ b/block/file-posix.c
> >>
> >> [...]
> >>
> >> > @@ -3752,6 +4025,54 @@ static BlockDriver bdrv_host_device = {
> >> >  #endif
> >> >  };
> >> >
> >> > +#if defined(CONFIG_BLKZONED)
> >> > +static BlockDriver bdrv_zoned_host_device = {
> >> > +.format_name = "zoned_host_device",
> >>
> >> Indentation should be 4, not 8.
> >>
> >> > +.protocol_name = "zoned_host_device",
> >> > +.instance_size = sizeof(BDRVRawState),
> >> > +.bdrv_needs_filename = true,
> >> > +.bdrv_probe_device  = hdev_probe_device,
> >> > +.bdrv_file_open = hdev_open,
> >> > +.bdrv_close = raw_close,
> >> > +.bdrv_reopen_prepare = raw_reopen_prepare,
> >> > +.bdrv_reopen_commit  = raw_reopen_commit,
> >> > +.bdrv_reopen_abort   = raw_reopen_abort,
> >> > +.bdrv_co_create_opts = bdrv_co_create_opts_simple,
> >> > +.create_opts = _create_opts_simple,
> >> > +.mutable_opts= mutable_opts,
> >> > +.bdrv_co_invalidate_cache = raw_co_invalidate_cache,
> >> > +.bdrv_co_pwrite_zeroes = hdev_co_pwrite_zeroes,
> >> > +
> >> > +.bdrv_co_preadv = raw_co_preadv,
> >> > +.bdrv_co_pwritev= raw_co_pwritev,
> >> > +.bdrv_co_flush_to_disk  = raw_co_flush_to_disk,
> >> > +.bdrv_co_pdiscard   = hdev_co_pdiscard,
> >> > +.bdrv_co_copy_range_from = raw_co_copy_range_from,
> >> > +.bdrv_co_copy_range_to  = raw_co_copy_range_to,
> >> > +.bdrv_refresh_limits = raw_refresh_limits,
> >> > +.bdrv_io_plug = raw_aio_plug,
> >> > +.bdrv_io_unplug = raw_aio_unplug,
> >> > +.bdrv_attach_aio_context = raw_aio_attach_aio_context,
> >> > +
> >> > +.bdrv_co_truncate   = raw_co_truncate,
> >> > +.bdrv_getlength = raw_getlength,
> >> > +.bdrv_get_info = raw_get_info,
> >> > +.bdrv_get_allocated_file_size
> >> > += raw_get_allocated_file_size,
> >> > +.bdrv_get_specific_stats = hdev_get_specific_stats,
> >> > +.bdrv_check_perm = raw_check_perm,
> >> > +.bdrv_set_perm   = raw_set_perm,
> >> > +.bdrv_abort_perm_update = raw_abort_perm_update,
> >> > +.bdrv_probe_blocksizes = hdev_probe_blocksizes,
> >> > +.bdrv_probe_geometry = hdev_probe_geometry,
> >> > +.bdrv_co_ioctl = hdev_co_ioctl,
> >> > +
> >> > +/* zone management operations */
> >> > +.bdrv_co_zone_report = raw_co_zone_report,
> >> > +.bdrv_co_zone_mgmt = raw_co_zone_mgmt,
> >> > +};
> >>
> >> Differences to bdrv_host_device:
> >>
> >> * .bdrv_parse_filename is not set
> >>
> >> * .bdrv_co_ioctl is not set
> >>
> >> * .bdrv_co_zone_report and .bdrv_co_zone_mgmt are set
> >
> > As Stefan mentioned, zoned_host_device is a new driver that doesn't
> > work with string filenames. .bdrv_parse_filename() helps legacy
> > drivers strip the optional protocol prefix off the filename and no use
> > here. Therefore it can be dropped.
>
> Makes sense.
>
> > .bdrv_co_ioctl is set actually.
>
> You're right; I diffed the two and misread the result.
>
> > Zoned_host_device is basically host_device + zone operations. It
> > serves for a simple purpose: if the host device is zoned, register
> > zoned_host_device driver; else, register host_device.
>
> Why would I ever want to use host_device instead of zoned_host_device?
>
> To answer this question, we need to understand how their behavior
> differs.
>
> We can ignore the legacy protocol prefix / string filename part.
>
> All that's left seems to be "if the host device is zoned, then using the
> zoned_host_device driver gets you the zoned features, whereas using the
> host_device driver doesn't".  What am I missing?

I think that's basically what users need to know about.

>
> >> Notably common is .bdrv_file_open = hdev_open.  What happens when you
> >> try to create a zoned_host_device where the @filename argument is not in
> >> fact a zoned device?
> >
> > If the 

Re: [PATCH v8 3/7] block: add block layer APIs resembling Linux ZonedBlockDevice ioctls

2022-08-31 Thread Markus Armbruster
Sam Li  writes:

> Markus Armbruster  于2022年8月30日周二 19:57写道:
>>
>> Sam Li  writes:
>>
>> > By adding zone management operations in BlockDriver, storage controller
>> > emulation can use the new block layer APIs including Report Zone and
>> > four zone management operations (open, close, finish, reset).
>> >
>> > Add zoned storage commands of the device: zone_report(zrp), zone_open(zo),
>> > zone_close(zc), zone_reset(zrs), zone_finish(zf).
>> >
>> > For example, to test zone_report, use following command:
>> > $ ./build/qemu-io --image-opts driver=zoned_host_device, 
>> > filename=/dev/nullb0
>> > -c "zrp offset nr_zones"
>> >
>> > Signed-off-by: Sam Li 
>> > Reviewed-by: Hannes Reinecke 
>>
>> [...]
>>
>> > diff --git a/block/file-posix.c b/block/file-posix.c
>> > index 0a8b4b426e..e3efba6db7 100644
>> > --- a/block/file-posix.c
>> > +++ b/block/file-posix.c
>>
>> [...]
>>
>> > @@ -3752,6 +4025,54 @@ static BlockDriver bdrv_host_device = {
>> >  #endif
>> >  };
>> >
>> > +#if defined(CONFIG_BLKZONED)
>> > +static BlockDriver bdrv_zoned_host_device = {
>> > +.format_name = "zoned_host_device",
>>
>> Indentation should be 4, not 8.
>>
>> > +.protocol_name = "zoned_host_device",
>> > +.instance_size = sizeof(BDRVRawState),
>> > +.bdrv_needs_filename = true,
>> > +.bdrv_probe_device  = hdev_probe_device,
>> > +.bdrv_file_open = hdev_open,
>> > +.bdrv_close = raw_close,
>> > +.bdrv_reopen_prepare = raw_reopen_prepare,
>> > +.bdrv_reopen_commit  = raw_reopen_commit,
>> > +.bdrv_reopen_abort   = raw_reopen_abort,
>> > +.bdrv_co_create_opts = bdrv_co_create_opts_simple,
>> > +.create_opts = _create_opts_simple,
>> > +.mutable_opts= mutable_opts,
>> > +.bdrv_co_invalidate_cache = raw_co_invalidate_cache,
>> > +.bdrv_co_pwrite_zeroes = hdev_co_pwrite_zeroes,
>> > +
>> > +.bdrv_co_preadv = raw_co_preadv,
>> > +.bdrv_co_pwritev= raw_co_pwritev,
>> > +.bdrv_co_flush_to_disk  = raw_co_flush_to_disk,
>> > +.bdrv_co_pdiscard   = hdev_co_pdiscard,
>> > +.bdrv_co_copy_range_from = raw_co_copy_range_from,
>> > +.bdrv_co_copy_range_to  = raw_co_copy_range_to,
>> > +.bdrv_refresh_limits = raw_refresh_limits,
>> > +.bdrv_io_plug = raw_aio_plug,
>> > +.bdrv_io_unplug = raw_aio_unplug,
>> > +.bdrv_attach_aio_context = raw_aio_attach_aio_context,
>> > +
>> > +.bdrv_co_truncate   = raw_co_truncate,
>> > +.bdrv_getlength = raw_getlength,
>> > +.bdrv_get_info = raw_get_info,
>> > +.bdrv_get_allocated_file_size
>> > += raw_get_allocated_file_size,
>> > +.bdrv_get_specific_stats = hdev_get_specific_stats,
>> > +.bdrv_check_perm = raw_check_perm,
>> > +.bdrv_set_perm   = raw_set_perm,
>> > +.bdrv_abort_perm_update = raw_abort_perm_update,
>> > +.bdrv_probe_blocksizes = hdev_probe_blocksizes,
>> > +.bdrv_probe_geometry = hdev_probe_geometry,
>> > +.bdrv_co_ioctl = hdev_co_ioctl,
>> > +
>> > +/* zone management operations */
>> > +.bdrv_co_zone_report = raw_co_zone_report,
>> > +.bdrv_co_zone_mgmt = raw_co_zone_mgmt,
>> > +};
>>
>> Differences to bdrv_host_device:
>>
>> * .bdrv_parse_filename is not set
>>
>> * .bdrv_co_ioctl is not set
>>
>> * .bdrv_co_zone_report and .bdrv_co_zone_mgmt are set
>
> As Stefan mentioned, zoned_host_device is a new driver that doesn't
> work with string filenames. .bdrv_parse_filename() helps legacy
> drivers strip the optional protocol prefix off the filename and no use
> here. Therefore it can be dropped.

Makes sense.

> .bdrv_co_ioctl is set actually.

You're right; I diffed the two and misread the result.

> Zoned_host_device is basically host_device + zone operations. It
> serves for a simple purpose: if the host device is zoned, register
> zoned_host_device driver; else, register host_device.

Why would I ever want to use host_device instead of zoned_host_device?

To answer this question, we need to understand how their behavior
differs.

We can ignore the legacy protocol prefix / string filename part.

All that's left seems to be "if the host device is zoned, then using the
zoned_host_device driver gets you the zoned features, whereas using the
host_device driver doesn't".  What am I missing?

>> Notably common is .bdrv_file_open = hdev_open.  What happens when you
>> try to create a zoned_host_device where the @filename argument is not in
>> fact a zoned device?
>
> If the device is a regular block device, QEMU will still open the
> device. For instance, I use a loopback device to test zone_report in
> qemu-io. It returns ENOTTY which indicates Inappropriate ioctl for the
> device. Meanwhile, if using a regular block device when emulation a
> zoned device on a guest os, the best case is that the guest can boot
> 

Re: [PATCH v8 3/7] block: add block layer APIs resembling Linux ZonedBlockDevice ioctls

2022-08-30 Thread Sam Li
Markus Armbruster  于2022年8月30日周二 23:09写道:
>
> Sam Li  writes:
>
> > Markus Armbruster  于2022年8月30日周二 19:57写道:
> >>
> >> Sam Li  writes:
> >>
> >> > By adding zone management operations in BlockDriver, storage controller
> >> > emulation can use the new block layer APIs including Report Zone and
> >> > four zone management operations (open, close, finish, reset).
> >> >
> >> > Add zoned storage commands of the device: zone_report(zrp), 
> >> > zone_open(zo),
> >> > zone_close(zc), zone_reset(zrs), zone_finish(zf).
> >> >
> >> > For example, to test zone_report, use following command:
> >> > $ ./build/qemu-io --image-opts driver=zoned_host_device, 
> >> > filename=/dev/nullb0
> >> > -c "zrp offset nr_zones"
> >> >
> >> > Signed-off-by: Sam Li 
> >> > Reviewed-by: Hannes Reinecke 
>
> [...]
>
> >> > diff --git a/qapi/block-core.json b/qapi/block-core.json
> >> > index 2173e7734a..c6bbb7a037 100644
> >> > --- a/qapi/block-core.json
> >> > +++ b/qapi/block-core.json
> >> > @@ -2942,6 +2942,7 @@
> >> >  # @compress: Since 5.0
> >> >  # @copy-before-write: Since 6.2
> >> >  # @snapshot-access: Since 7.0
> >> > +# @zoned_host_device: Since 7.2
> >> >  #
> >> >  # Since: 2.9
> >> >  ##
> >> > @@ -2955,7 +2956,8 @@
> >> >  'luks', 'nbd', 'nfs', 'null-aio', 'null-co', 'nvme', 
> >> > 'parallels',
> >> >  'preallocate', 'qcow', 'qcow2', 'qed', 'quorum', 'raw', 
> >> > 'rbd',
> >> >  { 'name': 'replication', 'if': 'CONFIG_REPLICATION' },
> >> > -'ssh', 'throttle', 'vdi', 'vhdx', 'vmdk', 'vpc', 'vvfat' ] }
> >> > +'ssh', 'throttle', 'vdi', 'vhdx', 'vmdk', 'vpc', 'vvfat',
> >> > +{ 'name': 'zoned_host_device', 'if': 'CONFIG_BLKZONED' } ] }
> >>
> >> QAPI naming conventions ask for 'zoned-host-device'.  We may choose to
> >> ignore them to stay closer to existing 'host_device'.
> >
> > I am not sure why should ignore zoned_host_device. Can you be more specific?
>
> "them" = QAPI naming conventions.  Clear now?

Ok, I thought "them" means 'zoned_host_device'.

>
> [...]
>



Re: [PATCH v8 3/7] block: add block layer APIs resembling Linux ZonedBlockDevice ioctls

2022-08-30 Thread Markus Armbruster
Sam Li  writes:

> Markus Armbruster  于2022年8月30日周二 19:57写道:
>>
>> Sam Li  writes:
>>
>> > By adding zone management operations in BlockDriver, storage controller
>> > emulation can use the new block layer APIs including Report Zone and
>> > four zone management operations (open, close, finish, reset).
>> >
>> > Add zoned storage commands of the device: zone_report(zrp), zone_open(zo),
>> > zone_close(zc), zone_reset(zrs), zone_finish(zf).
>> >
>> > For example, to test zone_report, use following command:
>> > $ ./build/qemu-io --image-opts driver=zoned_host_device, 
>> > filename=/dev/nullb0
>> > -c "zrp offset nr_zones"
>> >
>> > Signed-off-by: Sam Li 
>> > Reviewed-by: Hannes Reinecke 

[...]

>> > diff --git a/qapi/block-core.json b/qapi/block-core.json
>> > index 2173e7734a..c6bbb7a037 100644
>> > --- a/qapi/block-core.json
>> > +++ b/qapi/block-core.json
>> > @@ -2942,6 +2942,7 @@
>> >  # @compress: Since 5.0
>> >  # @copy-before-write: Since 6.2
>> >  # @snapshot-access: Since 7.0
>> > +# @zoned_host_device: Since 7.2
>> >  #
>> >  # Since: 2.9
>> >  ##
>> > @@ -2955,7 +2956,8 @@
>> >  'luks', 'nbd', 'nfs', 'null-aio', 'null-co', 'nvme', 
>> > 'parallels',
>> >  'preallocate', 'qcow', 'qcow2', 'qed', 'quorum', 'raw', 'rbd',
>> >  { 'name': 'replication', 'if': 'CONFIG_REPLICATION' },
>> > -'ssh', 'throttle', 'vdi', 'vhdx', 'vmdk', 'vpc', 'vvfat' ] }
>> > +'ssh', 'throttle', 'vdi', 'vhdx', 'vmdk', 'vpc', 'vvfat',
>> > +{ 'name': 'zoned_host_device', 'if': 'CONFIG_BLKZONED' } ] }
>>
>> QAPI naming conventions ask for 'zoned-host-device'.  We may choose to
>> ignore them to stay closer to existing 'host_device'.
>
> I am not sure why should ignore zoned_host_device. Can you be more specific?

"them" = QAPI naming conventions.  Clear now?

[...]




Re: [PATCH v8 3/7] block: add block layer APIs resembling Linux ZonedBlockDevice ioctls

2022-08-30 Thread Sam Li
Markus Armbruster  于2022年8月30日周二 19:57写道:
>
> Sam Li  writes:
>
> > By adding zone management operations in BlockDriver, storage controller
> > emulation can use the new block layer APIs including Report Zone and
> > four zone management operations (open, close, finish, reset).
> >
> > Add zoned storage commands of the device: zone_report(zrp), zone_open(zo),
> > zone_close(zc), zone_reset(zrs), zone_finish(zf).
> >
> > For example, to test zone_report, use following command:
> > $ ./build/qemu-io --image-opts driver=zoned_host_device, 
> > filename=/dev/nullb0
> > -c "zrp offset nr_zones"
> >
> > Signed-off-by: Sam Li 
> > Reviewed-by: Hannes Reinecke 
>
> [...]
>
> > diff --git a/block/file-posix.c b/block/file-posix.c
> > index 0a8b4b426e..e3efba6db7 100644
> > --- a/block/file-posix.c
> > +++ b/block/file-posix.c
>
> [...]
>
> > @@ -3752,6 +4025,54 @@ static BlockDriver bdrv_host_device = {
> >  #endif
> >  };
> >
> > +#if defined(CONFIG_BLKZONED)
> > +static BlockDriver bdrv_zoned_host_device = {
> > +.format_name = "zoned_host_device",
>
> Indentation should be 4, not 8.
>
> > +.protocol_name = "zoned_host_device",
> > +.instance_size = sizeof(BDRVRawState),
> > +.bdrv_needs_filename = true,
> > +.bdrv_probe_device  = hdev_probe_device,
> > +.bdrv_file_open = hdev_open,
> > +.bdrv_close = raw_close,
> > +.bdrv_reopen_prepare = raw_reopen_prepare,
> > +.bdrv_reopen_commit  = raw_reopen_commit,
> > +.bdrv_reopen_abort   = raw_reopen_abort,
> > +.bdrv_co_create_opts = bdrv_co_create_opts_simple,
> > +.create_opts = _create_opts_simple,
> > +.mutable_opts= mutable_opts,
> > +.bdrv_co_invalidate_cache = raw_co_invalidate_cache,
> > +.bdrv_co_pwrite_zeroes = hdev_co_pwrite_zeroes,
> > +
> > +.bdrv_co_preadv = raw_co_preadv,
> > +.bdrv_co_pwritev= raw_co_pwritev,
> > +.bdrv_co_flush_to_disk  = raw_co_flush_to_disk,
> > +.bdrv_co_pdiscard   = hdev_co_pdiscard,
> > +.bdrv_co_copy_range_from = raw_co_copy_range_from,
> > +.bdrv_co_copy_range_to  = raw_co_copy_range_to,
> > +.bdrv_refresh_limits = raw_refresh_limits,
> > +.bdrv_io_plug = raw_aio_plug,
> > +.bdrv_io_unplug = raw_aio_unplug,
> > +.bdrv_attach_aio_context = raw_aio_attach_aio_context,
> > +
> > +.bdrv_co_truncate   = raw_co_truncate,
> > +.bdrv_getlength = raw_getlength,
> > +.bdrv_get_info = raw_get_info,
> > +.bdrv_get_allocated_file_size
> > += raw_get_allocated_file_size,
> > +.bdrv_get_specific_stats = hdev_get_specific_stats,
> > +.bdrv_check_perm = raw_check_perm,
> > +.bdrv_set_perm   = raw_set_perm,
> > +.bdrv_abort_perm_update = raw_abort_perm_update,
> > +.bdrv_probe_blocksizes = hdev_probe_blocksizes,
> > +.bdrv_probe_geometry = hdev_probe_geometry,
> > +.bdrv_co_ioctl = hdev_co_ioctl,
> > +
> > +/* zone management operations */
> > +.bdrv_co_zone_report = raw_co_zone_report,
> > +.bdrv_co_zone_mgmt = raw_co_zone_mgmt,
> > +};
>
> Differences to bdrv_host_device:
>
> * .bdrv_parse_filename is not set
>
> * .bdrv_co_ioctl is not set
>
> * .bdrv_co_zone_report and .bdrv_co_zone_mgmt are set

As Stefan mentioned, zoned_host_device is a new driver that doesn't
work with string filenames. .bdrv_parse_filename() helps legacy
drivers strip the optional protocol prefix off the filename and no use
here. Therefore it can be dropped.

.bdrv_co_ioctl is set actually.

Zoned_host_device is basically host_device + zone operations. It
serves for a simple purpose: if the host device is zoned, register
zoned_host_device driver; else, register host_device.

>
> Notably common is .bdrv_file_open = hdev_open.  What happens when you
> try to create a zoned_host_device where the @filename argument is not in
> fact a zoned device?

If the device is a regular block device, QEMU will still open the
device. For instance, I use a loopback device to test zone_report in
qemu-io. It returns ENOTTY which indicates Inappropriate ioctl for the
device. Meanwhile, if using a regular block device when emulation a
zoned device on a guest os, the best case is that the guest can boot
but has no emulated block device. In some cases, QEMU just terminates
because the block device has not met the alignment requirements.

>
> Do we really need a separate, but almost identical BlockDriver?  Could
> the existing one provide zoned functionality exactly when the underlying
> host device does?

I did use the existing one host device to add zoned commands at first.
But then, we decided to change that and use a separate BlockDriver.
Though the existing one can provide zoned functionality, a new
BlockDriver makes it clear when mixing block drivers, adding more
configurations/constraints, etc. For 

Re: [PATCH v8 3/7] block: add block layer APIs resembling Linux ZonedBlockDevice ioctls

2022-08-30 Thread Markus Armbruster
Sam Li  writes:

> By adding zone management operations in BlockDriver, storage controller
> emulation can use the new block layer APIs including Report Zone and
> four zone management operations (open, close, finish, reset).
>
> Add zoned storage commands of the device: zone_report(zrp), zone_open(zo),
> zone_close(zc), zone_reset(zrs), zone_finish(zf).
>
> For example, to test zone_report, use following command:
> $ ./build/qemu-io --image-opts driver=zoned_host_device, filename=/dev/nullb0
> -c "zrp offset nr_zones"
>
> Signed-off-by: Sam Li 
> Reviewed-by: Hannes Reinecke 

[...]

> diff --git a/block/file-posix.c b/block/file-posix.c
> index 0a8b4b426e..e3efba6db7 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c

[...]

> @@ -3752,6 +4025,54 @@ static BlockDriver bdrv_host_device = {
>  #endif
>  };
>  
> +#if defined(CONFIG_BLKZONED)
> +static BlockDriver bdrv_zoned_host_device = {
> +.format_name = "zoned_host_device",

Indentation should be 4, not 8.

> +.protocol_name = "zoned_host_device",
> +.instance_size = sizeof(BDRVRawState),
> +.bdrv_needs_filename = true,
> +.bdrv_probe_device  = hdev_probe_device,
> +.bdrv_file_open = hdev_open,
> +.bdrv_close = raw_close,
> +.bdrv_reopen_prepare = raw_reopen_prepare,
> +.bdrv_reopen_commit  = raw_reopen_commit,
> +.bdrv_reopen_abort   = raw_reopen_abort,
> +.bdrv_co_create_opts = bdrv_co_create_opts_simple,
> +.create_opts = _create_opts_simple,
> +.mutable_opts= mutable_opts,
> +.bdrv_co_invalidate_cache = raw_co_invalidate_cache,
> +.bdrv_co_pwrite_zeroes = hdev_co_pwrite_zeroes,
> +
> +.bdrv_co_preadv = raw_co_preadv,
> +.bdrv_co_pwritev= raw_co_pwritev,
> +.bdrv_co_flush_to_disk  = raw_co_flush_to_disk,
> +.bdrv_co_pdiscard   = hdev_co_pdiscard,
> +.bdrv_co_copy_range_from = raw_co_copy_range_from,
> +.bdrv_co_copy_range_to  = raw_co_copy_range_to,
> +.bdrv_refresh_limits = raw_refresh_limits,
> +.bdrv_io_plug = raw_aio_plug,
> +.bdrv_io_unplug = raw_aio_unplug,
> +.bdrv_attach_aio_context = raw_aio_attach_aio_context,
> +
> +.bdrv_co_truncate   = raw_co_truncate,
> +.bdrv_getlength = raw_getlength,
> +.bdrv_get_info = raw_get_info,
> +.bdrv_get_allocated_file_size
> += raw_get_allocated_file_size,
> +.bdrv_get_specific_stats = hdev_get_specific_stats,
> +.bdrv_check_perm = raw_check_perm,
> +.bdrv_set_perm   = raw_set_perm,
> +.bdrv_abort_perm_update = raw_abort_perm_update,
> +.bdrv_probe_blocksizes = hdev_probe_blocksizes,
> +.bdrv_probe_geometry = hdev_probe_geometry,
> +.bdrv_co_ioctl = hdev_co_ioctl,
> +
> +/* zone management operations */
> +.bdrv_co_zone_report = raw_co_zone_report,
> +.bdrv_co_zone_mgmt = raw_co_zone_mgmt,
> +};

Differences to bdrv_host_device:

* .bdrv_parse_filename is not set

* .bdrv_co_ioctl is not set

* .bdrv_co_zone_report and .bdrv_co_zone_mgmt are set

Notably common is .bdrv_file_open = hdev_open.  What happens when you
try to create a zoned_host_device where the @filename argument is not in
fact a zoned device?

Do we really need a separate, but almost identical BlockDriver?  Could
the existing one provide zoned functionality exactly when the underlying
host device does?

Forgive me if these are ignorant questions, or have been discussed
before.

> +#endif
> +
>  #if defined(__linux__) || defined(__FreeBSD__) || defined(__FreeBSD_kernel__)
>  static void cdrom_parse_filename(const char *filename, QDict *options,
>   Error **errp)
> @@ -4012,6 +4333,9 @@ static void bdrv_file_init(void)
>  bdrv_register(_file);
>  #if defined(HAVE_HOST_BLOCK_DEVICE)
>  bdrv_register(_host_device);
> +#if defined(CONFIG_BLKZONED)
> +bdrv_register(_zoned_host_device);
> +#endif
>  #ifdef __linux__
>  bdrv_register(_host_cdrom);
>  #endif

[...]

> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index 2173e7734a..c6bbb7a037 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -2942,6 +2942,7 @@
>  # @compress: Since 5.0
>  # @copy-before-write: Since 6.2
>  # @snapshot-access: Since 7.0
> +# @zoned_host_device: Since 7.2
>  #
>  # Since: 2.9
>  ##
> @@ -2955,7 +2956,8 @@
>  'luks', 'nbd', 'nfs', 'null-aio', 'null-co', 'nvme', 'parallels',
>  'preallocate', 'qcow', 'qcow2', 'qed', 'quorum', 'raw', 'rbd',
>  { 'name': 'replication', 'if': 'CONFIG_REPLICATION' },
> -'ssh', 'throttle', 'vdi', 'vhdx', 'vmdk', 'vpc', 'vvfat' ] }
> +'ssh', 'throttle', 'vdi', 'vhdx', 'vmdk', 'vpc', 'vvfat',
> +{ 'name': 'zoned_host_device', 'if': 'CONFIG_BLKZONED' } ] }

QAPI naming conventions ask for 

Re: [PATCH v8 3/7] block: add block layer APIs resembling Linux ZonedBlockDevice ioctls

2022-08-29 Thread Stefan Hajnoczi
On Sat, Aug 27, 2022 at 12:17:04AM +0800, Sam Li wrote:
> +/*
> + * Send a zone_management command.
> + * op is the zone operation.
> + * offset is the starting zone specified as a sector offset.

Does "sector offset" mean "byte offset from the start of the device" or
does it mean in 512B sector units? For consistency this should be in
bytes.

> + * len is the maximum number of sectors the command should operate on. It
> + * should be aligned with the zone sector size.

Please use bytes for consistency with QEMU's block layer APIs.

> @@ -3022,6 +3183,118 @@ static void raw_account_discard(BDRVRawState *s, 
> uint64_t nbytes, int ret)
>  }
>  }
>  
> +/*
> + * zone report - Get a zone block device's information in the form
> + * of an array of zone descriptors.
> + *
> + * @param bs: passing zone block device file descriptor
> + * @param zones: an array of zone descriptors to hold zone
> + * information on reply
> + * @param offset: offset can be any byte within the zone size.

This isn't an offset within a zone, it's an offset within the entire
device, so I think "zone size" is confusing here.

> + * @param len: (not sure yet.

Please remove this and document nr_zones instead.

> + * @return 0 on success, -1 on failure
> + */
> +static int coroutine_fn raw_co_zone_report(BlockDriverState *bs, int64_t 
> offset,
> +   unsigned int *nr_zones,
> +   BlockZoneDescriptor *zones) {
> +#if defined(CONFIG_BLKZONED)
> +BDRVRawState *s = bs->opaque;
> +RawPosixAIOData acb;
> +
> +acb = (RawPosixAIOData) {
> +.bs = bs,
> +.aio_fildes = s->fd,
> +.aio_type   = QEMU_AIO_ZONE_REPORT,
> +.aio_offset = offset,
> +.zone_report= {
> +.nr_zones   = nr_zones,
> +.zones  = zones,
> +},
> +};
> +
> +return raw_thread_pool_submit(bs, handle_aiocb_zone_report, );
> +#else
> +return -ENOTSUP;
> +#endif
> +}
> +
> +/*
> + * zone management operations - Execute an operation on a zone
> + */
> +static int coroutine_fn raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp 
> op,
> +int64_t offset, int64_t len) {
> +#if defined(CONFIG_BLKZONED)
> +BDRVRawState *s = bs->opaque;
> +RawPosixAIOData acb;
> +int64_t zone_sector, zone_sector_mask;
> +const char *ioctl_name;
> +unsigned long zone_op;
> +int ret;
> +
> +struct stat st;
> +if (fstat(s->fd, ) < 0) {
> +ret = -errno;
> +return ret;
> +}

st is not used and can be removed.

> +zone_sector = bs->bl.zone_sectors;
> +zone_sector_mask = zone_sector - 1;
> +if (offset & zone_sector_mask) {
> +error_report("sector offset %" PRId64 " is not aligned to zone size "
> + "%" PRId64 "", offset, zone_sector);
> +return -EINVAL;
> +}
> +
> +if (len & zone_sector_mask) {
> +error_report("number of sectors %" PRId64 " is not aligned to zone 
> size"
> +  " %" PRId64 "", len, zone_sector);
> +return -EINVAL;
> +}
> +
> +switch (op) {
> +case BLK_ZO_OPEN:
> +ioctl_name = "BLKOPENZONE";
> +zone_op = BLKOPENZONE;
> +break;
> +case BLK_ZO_CLOSE:
> +ioctl_name = "BLKCLOSEZONE";
> +zone_op = BLKCLOSEZONE;
> +break;
> +case BLK_ZO_FINISH:
> +ioctl_name = "BLKFINISHZONE";
> +zone_op = BLKFINISHZONE;
> +break;
> +case BLK_ZO_RESET:
> +ioctl_name = "BLKRESETZONE";
> +zone_op = BLKRESETZONE;
> +break;
> +default:
> +error_report("Invalid zone operation 0x%x", op);
> +return -EINVAL;
> +}
> +
> +acb = (RawPosixAIOData) {
> +.bs = bs,
> +.aio_fildes = s->fd,
> +.aio_type   = QEMU_AIO_ZONE_MGMT,
> +.aio_offset = offset,
> +.aio_nbytes = len,
> +.zone_mgmt  = {
> +.zone_op = zone_op,
> +},
> +};
> +
> +ret = raw_thread_pool_submit(bs, handle_aiocb_zone_mgmt, );
> +if (ret != 0) {
> +error_report("ioctl %s failed %d", ioctl_name, errno);
> +return -errno;

ret contains a negative errno value. The errno variable is not used by
raw_thread_pool_submit().

I suggest simplifying it to:

  return raw_thread_pool_submit(bs, handle_aiocb_zone_mgmt, );

That's what most of the other raw_thread_pool_submit() callers.


signature.asc
Description: PGP signature


Re: [PATCH v8 3/7] block: add block layer APIs resembling Linux ZonedBlockDevice ioctls

2022-08-29 Thread Sam Li
Sam Li  于2022年8月29日周一 20:53写道:
>
> By adding zone management operations in BlockDriver, storage controller
> emulation can use the new block layer APIs including Report Zone and
> four zone management operations (open, close, finish, reset).
>
> Add zoned storage commands of the device: zone_report(zrp), zone_open(zo),
> zone_close(zc), zone_reset(zrs), zone_finish(zf).
>
> For example, to test zone_report, use following command:
> $ ./build/qemu-io --image-opts driver=zoned_host_device, filename=/dev/nullb0
> -c "zrp offset nr_zones"
>
> Signed-off-by: Sam Li 
> Reviewed-by: Hannes Reinecke 
> ---
>  block/block-backend.c |  51 +
>  block/file-posix.c| 326 +-
>  block/io.c|  41 
>  include/block/block-io.h  |   7 +
>  include/block/block_int-common.h  |  21 ++
>  include/block/raw-aio.h   |   6 +-
>  include/sysemu/block-backend-io.h |  17 ++
>  meson.build   |   1 +
>  qapi/block-core.json  |   8 +-
>  qemu-io-cmds.c| 143 +
>  10 files changed, 617 insertions(+), 4 deletions(-)
>
> diff --git a/block/block-backend.c b/block/block-backend.c
> index d4a5df2ac2..c5798651df 100644
> --- a/block/block-backend.c
> +++ b/block/block-backend.c
> @@ -1775,6 +1775,57 @@ int coroutine_fn blk_co_flush(BlockBackend *blk)
>  return ret;
>  }
>
> +/*
> + * Send a zone_report command.
> + * offset is a byte offset from the start of the device. No alignment
> + * required for offset.
> + * nr_zones represents IN maximum and OUT actual.
> + */
> +int coroutine_fn blk_co_zone_report(BlockBackend *blk, int64_t offset,
> +unsigned int *nr_zones,
> +BlockZoneDescriptor *zones)
> +{
> +int ret;
> +IO_CODE();
> +
> +blk_inc_in_flight(blk); /* increase before waiting */
> +blk_wait_while_drained(blk);
> +if (!blk_is_available(blk)) {
> +blk_dec_in_flight(blk);
> +return -ENOMEDIUM;
> +}
> +ret = bdrv_co_zone_report(blk_bs(blk), offset, nr_zones, zones);
> +blk_dec_in_flight(blk);
> +return ret;
> +}
> +
> +/*
> + * Send a zone_management command.
> + * op is the zone operation.
> + * offset is the starting zone specified as a sector offset.
> + * len is the maximum number of sectors the command should operate on. It
> + * should be aligned with the zone sector size.
> + */
> +int coroutine_fn blk_co_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
> +int64_t offset, int64_t len)
> +{
> +int ret;
> +IO_CODE();
> +
> +
> +blk_inc_in_flight(blk);
> +blk_wait_while_drained(blk);
> +
> +ret = blk_check_byte_request(blk, offset, len);
> +if (ret < 0) {
> +return ret;
> +}
> +
> +ret = bdrv_co_zone_mgmt(blk_bs(blk), op, offset, len);
> +blk_dec_in_flight(blk);
> +return ret;
> +}
> +
>  void blk_drain(BlockBackend *blk)
>  {
>  BlockDriverState *bs = blk_bs(blk);
> diff --git a/block/file-posix.c b/block/file-posix.c
> index 0a8b4b426e..e3efba6db7 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -67,6 +67,9 @@
>  #include 
>  #include 
>  #include 
> +#if defined(CONFIG_BLKZONED)
> +#include 
> +#endif
>  #include 
>  #include 
>  #include 
> @@ -216,6 +219,13 @@ typedef struct RawPosixAIOData {
>  PreallocMode prealloc;
>  Error **errp;
>  } truncate;
> +struct {
> +unsigned int *nr_zones;
> +BlockZoneDescriptor *zones;
> +} zone_report;
> +struct {
> +unsigned long zone_op;
> +} zone_mgmt;
>  };
>  } RawPosixAIOData;
>
> @@ -1339,7 +1349,7 @@ static void raw_refresh_limits(BlockDriverState *bs, 
> Error **errp)
>  #endif
>
>  if (bs->sg || S_ISBLK(st.st_mode)) {
> -int ret = hdev_get_max_hw_transfer(s->fd, );
> +ret = hdev_get_max_hw_transfer(s->fd, );
>
>  if (ret > 0 && ret <= BDRV_REQUEST_MAX_BYTES) {
>  bs->bl.max_hw_transfer = ret;
> @@ -1356,6 +1366,27 @@ static void raw_refresh_limits(BlockDriverState *bs, 
> Error **errp)
>  zoned = BLK_Z_NONE;
>  }
>  bs->bl.zoned = zoned;
> +if (zoned != BLK_Z_NONE) {
> +ret = get_sysfs_long_val(, "chunk_sectors");
> +if (ret > 0) {
> +bs->bl.zone_sectors = ret;
> +}
> +
> +ret = get_sysfs_long_val(, "zone_append_max_bytes");
> +if (ret > 0) {
> +bs->bl.zone_append_max_bytes = ret;
> +}
> +
> +ret = get_sysfs_long_val(, "max_open_zones");
> +if (ret >= 0) {
> +bs->bl.max_open_zones = ret;
> +}
> +
> +ret = get_sysfs_long_val(, "max_active_zones");
> +if (ret >= 0) {
> +bs->bl.max_active_zones = ret;
> +}
> +}
>  }
>
>  static int check_for_dasd(int fd)
> @@ -1850,6 +1881,136 @@ static off_t copy_file_range(int in_fd, 

[PATCH v8 3/7] block: add block layer APIs resembling Linux ZonedBlockDevice ioctls

2022-08-29 Thread Sam Li
By adding zone management operations in BlockDriver, storage controller
emulation can use the new block layer APIs including Report Zone and
four zone management operations (open, close, finish, reset).

Add zoned storage commands of the device: zone_report(zrp), zone_open(zo),
zone_close(zc), zone_reset(zrs), zone_finish(zf).

For example, to test zone_report, use following command:
$ ./build/qemu-io --image-opts driver=zoned_host_device, filename=/dev/nullb0
-c "zrp offset nr_zones"

Signed-off-by: Sam Li 
Reviewed-by: Hannes Reinecke 
---
 block/block-backend.c |  51 +
 block/file-posix.c| 326 +-
 block/io.c|  41 
 include/block/block-io.h  |   7 +
 include/block/block_int-common.h  |  21 ++
 include/block/raw-aio.h   |   6 +-
 include/sysemu/block-backend-io.h |  17 ++
 meson.build   |   1 +
 qapi/block-core.json  |   8 +-
 qemu-io-cmds.c| 143 +
 10 files changed, 617 insertions(+), 4 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index d4a5df2ac2..c5798651df 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1775,6 +1775,57 @@ int coroutine_fn blk_co_flush(BlockBackend *blk)
 return ret;
 }
 
+/*
+ * Send a zone_report command.
+ * offset is a byte offset from the start of the device. No alignment
+ * required for offset.
+ * nr_zones represents IN maximum and OUT actual.
+ */
+int coroutine_fn blk_co_zone_report(BlockBackend *blk, int64_t offset,
+unsigned int *nr_zones,
+BlockZoneDescriptor *zones)
+{
+int ret;
+IO_CODE();
+
+blk_inc_in_flight(blk); /* increase before waiting */
+blk_wait_while_drained(blk);
+if (!blk_is_available(blk)) {
+blk_dec_in_flight(blk);
+return -ENOMEDIUM;
+}
+ret = bdrv_co_zone_report(blk_bs(blk), offset, nr_zones, zones);
+blk_dec_in_flight(blk);
+return ret;
+}
+
+/*
+ * Send a zone_management command.
+ * op is the zone operation.
+ * offset is the starting zone specified as a sector offset.
+ * len is the maximum number of sectors the command should operate on. It
+ * should be aligned with the zone sector size.
+ */
+int coroutine_fn blk_co_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
+int64_t offset, int64_t len)
+{
+int ret;
+IO_CODE();
+
+
+blk_inc_in_flight(blk);
+blk_wait_while_drained(blk);
+
+ret = blk_check_byte_request(blk, offset, len);
+if (ret < 0) {
+return ret;
+}
+
+ret = bdrv_co_zone_mgmt(blk_bs(blk), op, offset, len);
+blk_dec_in_flight(blk);
+return ret;
+}
+
 void blk_drain(BlockBackend *blk)
 {
 BlockDriverState *bs = blk_bs(blk);
diff --git a/block/file-posix.c b/block/file-posix.c
index 0a8b4b426e..e3efba6db7 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -67,6 +67,9 @@
 #include 
 #include 
 #include 
+#if defined(CONFIG_BLKZONED)
+#include 
+#endif
 #include 
 #include 
 #include 
@@ -216,6 +219,13 @@ typedef struct RawPosixAIOData {
 PreallocMode prealloc;
 Error **errp;
 } truncate;
+struct {
+unsigned int *nr_zones;
+BlockZoneDescriptor *zones;
+} zone_report;
+struct {
+unsigned long zone_op;
+} zone_mgmt;
 };
 } RawPosixAIOData;
 
@@ -1339,7 +1349,7 @@ static void raw_refresh_limits(BlockDriverState *bs, 
Error **errp)
 #endif
 
 if (bs->sg || S_ISBLK(st.st_mode)) {
-int ret = hdev_get_max_hw_transfer(s->fd, );
+ret = hdev_get_max_hw_transfer(s->fd, );
 
 if (ret > 0 && ret <= BDRV_REQUEST_MAX_BYTES) {
 bs->bl.max_hw_transfer = ret;
@@ -1356,6 +1366,27 @@ static void raw_refresh_limits(BlockDriverState *bs, 
Error **errp)
 zoned = BLK_Z_NONE;
 }
 bs->bl.zoned = zoned;
+if (zoned != BLK_Z_NONE) {
+ret = get_sysfs_long_val(, "chunk_sectors");
+if (ret > 0) {
+bs->bl.zone_sectors = ret;
+}
+
+ret = get_sysfs_long_val(, "zone_append_max_bytes");
+if (ret > 0) {
+bs->bl.zone_append_max_bytes = ret;
+}
+
+ret = get_sysfs_long_val(, "max_open_zones");
+if (ret >= 0) {
+bs->bl.max_open_zones = ret;
+}
+
+ret = get_sysfs_long_val(, "max_active_zones");
+if (ret >= 0) {
+bs->bl.max_active_zones = ret;
+}
+}
 }
 
 static int check_for_dasd(int fd)
@@ -1850,6 +1881,136 @@ static off_t copy_file_range(int in_fd, off_t *in_off, 
int out_fd,
 }
 #endif
 
+/*
+ * parse_zone - Fill a zone descriptor
+ */
+#if defined(CONFIG_BLKZONED)
+static inline void parse_zone(struct BlockZoneDescriptor *zone,
+  const struct blk_zone *blkz) {
+zone->start = blkz->start;
+zone->length = blkz->len;
+zone->cap = 

[PATCH v8 3/7] block: add block layer APIs resembling Linux ZonedBlockDevice ioctls

2022-08-26 Thread Sam Li
By adding zone management operations in BlockDriver, storage controller
emulation can use the new block layer APIs including Report Zone and
four zone management operations (open, close, finish, reset).

Add zoned storage commands of the device: zone_report(zrp), zone_open(zo),
zone_close(zc), zone_reset(zrs), zone_finish(zf).

For example, to test zone_report, use following command:
$ ./build/qemu-io --image-opts driver=zoned_host_device, filename=/dev/nullb0
-c "zrp offset nr_zones"

Signed-off-by: Sam Li 
Reviewed-by: Hannes Reinecke 
---
 block/block-backend.c |  51 +
 block/file-posix.c| 326 +-
 block/io.c|  41 
 include/block/block-io.h  |   7 +
 include/block/block_int-common.h  |  21 ++
 include/block/raw-aio.h   |   6 +-
 include/sysemu/block-backend-io.h |  17 ++
 meson.build   |   1 +
 qapi/block-core.json  |   8 +-
 qemu-io-cmds.c| 143 +
 10 files changed, 617 insertions(+), 4 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index d4a5df2ac2..c5798651df 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1775,6 +1775,57 @@ int coroutine_fn blk_co_flush(BlockBackend *blk)
 return ret;
 }
 
+/*
+ * Send a zone_report command.
+ * offset is a byte offset from the start of the device. No alignment
+ * required for offset.
+ * nr_zones represents IN maximum and OUT actual.
+ */
+int coroutine_fn blk_co_zone_report(BlockBackend *blk, int64_t offset,
+unsigned int *nr_zones,
+BlockZoneDescriptor *zones)
+{
+int ret;
+IO_CODE();
+
+blk_inc_in_flight(blk); /* increase before waiting */
+blk_wait_while_drained(blk);
+if (!blk_is_available(blk)) {
+blk_dec_in_flight(blk);
+return -ENOMEDIUM;
+}
+ret = bdrv_co_zone_report(blk_bs(blk), offset, nr_zones, zones);
+blk_dec_in_flight(blk);
+return ret;
+}
+
+/*
+ * Send a zone_management command.
+ * op is the zone operation.
+ * offset is the starting zone specified as a sector offset.
+ * len is the maximum number of sectors the command should operate on. It
+ * should be aligned with the zone sector size.
+ */
+int coroutine_fn blk_co_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
+int64_t offset, int64_t len)
+{
+int ret;
+IO_CODE();
+
+
+blk_inc_in_flight(blk);
+blk_wait_while_drained(blk);
+
+ret = blk_check_byte_request(blk, offset, len);
+if (ret < 0) {
+return ret;
+}
+
+ret = bdrv_co_zone_mgmt(blk_bs(blk), op, offset, len);
+blk_dec_in_flight(blk);
+return ret;
+}
+
 void blk_drain(BlockBackend *blk)
 {
 BlockDriverState *bs = blk_bs(blk);
diff --git a/block/file-posix.c b/block/file-posix.c
index 0a8b4b426e..e3efba6db7 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -67,6 +67,9 @@
 #include 
 #include 
 #include 
+#if defined(CONFIG_BLKZONED)
+#include 
+#endif
 #include 
 #include 
 #include 
@@ -216,6 +219,13 @@ typedef struct RawPosixAIOData {
 PreallocMode prealloc;
 Error **errp;
 } truncate;
+struct {
+unsigned int *nr_zones;
+BlockZoneDescriptor *zones;
+} zone_report;
+struct {
+unsigned long zone_op;
+} zone_mgmt;
 };
 } RawPosixAIOData;
 
@@ -1339,7 +1349,7 @@ static void raw_refresh_limits(BlockDriverState *bs, 
Error **errp)
 #endif
 
 if (bs->sg || S_ISBLK(st.st_mode)) {
-int ret = hdev_get_max_hw_transfer(s->fd, );
+ret = hdev_get_max_hw_transfer(s->fd, );
 
 if (ret > 0 && ret <= BDRV_REQUEST_MAX_BYTES) {
 bs->bl.max_hw_transfer = ret;
@@ -1356,6 +1366,27 @@ static void raw_refresh_limits(BlockDriverState *bs, 
Error **errp)
 zoned = BLK_Z_NONE;
 }
 bs->bl.zoned = zoned;
+if (zoned != BLK_Z_NONE) {
+ret = get_sysfs_long_val(, "chunk_sectors");
+if (ret > 0) {
+bs->bl.zone_sectors = ret;
+}
+
+ret = get_sysfs_long_val(, "zone_append_max_bytes");
+if (ret > 0) {
+bs->bl.zone_append_max_bytes = ret;
+}
+
+ret = get_sysfs_long_val(, "max_open_zones");
+if (ret >= 0) {
+bs->bl.max_open_zones = ret;
+}
+
+ret = get_sysfs_long_val(, "max_active_zones");
+if (ret >= 0) {
+bs->bl.max_active_zones = ret;
+}
+}
 }
 
 static int check_for_dasd(int fd)
@@ -1850,6 +1881,136 @@ static off_t copy_file_range(int in_fd, off_t *in_off, 
int out_fd,
 }
 #endif
 
+/*
+ * parse_zone - Fill a zone descriptor
+ */
+#if defined(CONFIG_BLKZONED)
+static inline void parse_zone(struct BlockZoneDescriptor *zone,
+  const struct blk_zone *blkz) {
+zone->start = blkz->start;
+zone->length = blkz->len;
+zone->cap =