[PATCH RESEND 0/2] block/qapi: Dead code cleanup

2023-09-01 Thread Fabiano Rosas
Hi,

I'm resending a couple of already reviewed patches that were part of a
larger series[1].

Thanks

1- https://lore.kernel.org/r/20230609201910.12100-1-faro...@suse.de

Fabiano Rosas (2):
  block: Remove bdrv_query_block_node_info
  block: Remove unnecessary variable in bdrv_block_device_info

 block/qapi.c | 32 ++--
 include/block/qapi.h |  3 ---
 2 files changed, 2 insertions(+), 33 deletions(-)

-- 
2.35.3




[PATCH RESEND 1/2] block: Remove bdrv_query_block_node_info

2023-09-01 Thread Fabiano Rosas
The last call site of this function has been removed by commit
c04d0ab026 ("qemu-img: Let info print block graph").

Reviewed-by: Claudio Fontana 
Signed-off-by: Fabiano Rosas 
---
 block/qapi.c | 27 ---
 include/block/qapi.h |  3 ---
 2 files changed, 30 deletions(-)

diff --git a/block/qapi.c b/block/qapi.c
index f34f95e0ef..79bf80c503 100644
--- a/block/qapi.c
+++ b/block/qapi.c
@@ -309,33 +309,6 @@ out:
 aio_context_release(bdrv_get_aio_context(bs));
 }
 
-/**
- * bdrv_query_block_node_info:
- * @bs: block node to examine
- * @p_info: location to store node information
- * @errp: location to store error information
- *
- * Store image information about @bs in @p_info.
- *
- * @p_info will be set only on success. On error, store error in @errp.
- */
-void bdrv_query_block_node_info(BlockDriverState *bs,
-BlockNodeInfo **p_info,
-Error **errp)
-{
-BlockNodeInfo *info;
-ERRP_GUARD();
-
-info = g_new0(BlockNodeInfo, 1);
-bdrv_do_query_node_info(bs, info, errp);
-if (*errp) {
-qapi_free_BlockNodeInfo(info);
-return;
-}
-
-*p_info = info;
-}
-
 /**
  * bdrv_query_image_info:
  * @bs: block node to examine
diff --git a/include/block/qapi.h b/include/block/qapi.h
index 18d48ddb70..8663971c58 100644
--- a/include/block/qapi.h
+++ b/include/block/qapi.h
@@ -36,9 +36,6 @@ BlockDeviceInfo *bdrv_block_device_info(BlockBackend *blk,
 int bdrv_query_snapshot_info_list(BlockDriverState *bs,
   SnapshotInfoList **p_list,
   Error **errp);
-void bdrv_query_block_node_info(BlockDriverState *bs,
-BlockNodeInfo **p_info,
-Error **errp);
 void bdrv_query_image_info(BlockDriverState *bs,
ImageInfo **p_info,
bool flat,
-- 
2.35.3




[PATCH RESEND 2/2] block: Remove unnecessary variable in bdrv_block_device_info

2023-09-01 Thread Fabiano Rosas
The commit 5d8813593f ("block/qapi: Let bdrv_query_image_info()
recurse") removed the loop where we set the 'bs0' variable, so now it
is just the same as 'bs'.

Signed-off-by: Fabiano Rosas 
Reviewed-by: Philippe Mathieu-Daudé 
---
 block/qapi.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/block/qapi.c b/block/qapi.c
index 79bf80c503..1cbb0935ff 100644
--- a/block/qapi.c
+++ b/block/qapi.c
@@ -48,7 +48,7 @@ BlockDeviceInfo *bdrv_block_device_info(BlockBackend *blk,
 {
 ImageInfo **p_image_info;
 ImageInfo *backing_info;
-BlockDriverState *bs0, *backing;
+BlockDriverState *backing;
 BlockDeviceInfo *info;
 ERRP_GUARD();
 
@@ -145,7 +145,6 @@ BlockDeviceInfo *bdrv_block_device_info(BlockBackend *blk,
 
 info->write_threshold = bdrv_write_threshold_get(bs);
 
-bs0 = bs;
 p_image_info = &info->image;
 info->backing_file_depth = 0;
 
@@ -153,7 +152,7 @@ BlockDeviceInfo *bdrv_block_device_info(BlockBackend *blk,
  * Skip automatically inserted nodes that the user isn't aware of for
  * query-block (blk != NULL), but not for query-named-block-nodes
  */
-bdrv_query_image_info(bs0, p_image_info, flat, blk != NULL, errp);
+bdrv_query_image_info(bs, p_image_info, flat, blk != NULL, errp);
 if (*errp) {
 qapi_free_BlockDeviceInfo(info);
 return NULL;
-- 
2.35.3




Re: zlib-ng as a compat replacement for zlib

2023-09-01 Thread Richard W.M. Jones
On Fri, Sep 01, 2023 at 06:45:22PM +0200, Florian Weimer wrote:
> * Richard W. M. Jones:
> 
> > I tested the speed of decompression using:
> >
> >   $ hyperfine 'qemu-img convert -W -m 16 -f qcow2 test.qcow2.XXX -O raw 
> > test.out'
> >   (qemu 8.0.0-4.fc39.x86_64)
> >
> >   $ hyperfine 'nbdkit -U - --filter=qcow2dec file test.qcow2.XXX --run 
> > '\''nbdcopy --request-size "$uri" test.out'\'' '
> >   (nbdkit-1.35.11-2.fc40.x86_64)
> 
> How realistic is that?  Larger cluster sizes will make random access
> perform noticeably worse in some cases.  Think about reading a few bytes
> towards the end of the cluster.  It makes a difference whether you have
> to decompress 64 KiB for that, or 2 MiB.  As far as I understand it,
> the above commands consume all of the decompressed data, so they don't
> suffer from this issue (particularly with read-ahead to deal with
> unfortunate cluster boundaries).
> 
> Time to first HTTP request served after boot or something like that
> might be a better comparison.

Yes, this is a good point.

Current Fedora images use 64k cluster size which I think is also the
default:

$ qemu-img info https://download.fedoraproject.org/pub/fedora/linux/releases/38/Cloud/x86_64/images/Fedora-Cloud-Base-38-1.6.x86_64.qcow2
image: https://download.fedoraproject.org/pub/fedora/linux/releases/38/Cloud/x86_64/images/Fedora-Cloud-Base-38-1.6.x86_64.qcow2
file format: qcow2
virtual size: 5 GiB (5368709120 bytes)
disk size: unavailable
cluster_size: 65536
Format specific information:
compat: 0.10
compression type: zlib
refcount bits: 16

And the tests showed there's barely any performance gain above the
default cluster size anyway.  Only very small clusters should be
avoided, but there are good reasons to avoid those already, such as
metadata overhead and small maximum image size.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-top is 'top' for virtual machines.  Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://people.redhat.com/~rjones/virt-top




Re: [PATCH v2 4/4] block-coroutine-wrapper: use qemu_get_current_aio_context()

2023-09-01 Thread Kevin Wolf
On 24.08.2023 at 01:59, Stefan Hajnoczi wrote:
> Use qemu_get_current_aio_context() in mixed wrappers and coroutine
> wrappers so that code runs in the caller's AioContext instead of moving
> to the BlockDriverState's AioContext. This change is necessary for the
> multi-queue block layer where any thread can call into the block layer.
> 
> Most wrappers are IO_CODE where it's safe to use the current AioContext
> nowadays. BlockDrivers and the core block layer use their own locks and
> no longer depend on the AioContext lock for thread-safety.
> 
> The bdrv_create() wrapper invokes GLOBAL_STATE code. Using the current
> AioContext is safe because this code is only called with the BQL held
> from the main loop thread.
> 
> The output of qemu-iotests 051 is sensitive to event loop activity.
> Update the output because the monitor BH runs at a different time,
> causing prompts to be printed differently in the output.
> 
> Signed-off-by: Stefan Hajnoczi 

The update for 051 is actually missing from this patch, and so the test
fails.

I missed the dependency on your qio_channel series, so 281 ran into an
abort() for me (see below for the stack trace). I expect that the other
series actually fixes this, but this kind of interaction wasn't really
obvious. How did you make sure that there aren't other places that don't
like this change?

Kevin

(gdb) bt
#0  0x7f8ef0d2fe5c in __pthread_kill_implementation () at /lib64/libc.so.6
#1  0x7f8ef0cdfa76 in raise () at /lib64/libc.so.6
#2  0x7f8ef0cc97fc in abort () at /lib64/libc.so.6
#3  0x7f8ef0cc971b in _nl_load_domain.cold () at /lib64/libc.so.6
#4  0x7f8ef0cd8656 in  () at /lib64/libc.so.6
#5  0x55fd19da6af3 in qio_channel_yield (ioc=0x7f8eeb70, 
condition=G_IO_IN) at ../io/channel.c:583
#6  0x55fd19e0382f in nbd_read_eof (bs=0x55fd1b681350, ioc=0x7f8eeb70, 
buffer=0x55fd1b680da0, size=4, errp=0x0) at ../nbd/client.c:1454
#7  0x55fd19e03612 in nbd_receive_reply (bs=0x55fd1b681350, 
ioc=0x7f8eeb70, reply=0x55fd1b680da0, errp=0x0) at ../nbd/client.c:1491
#8  0x55fd19e40575 in nbd_receive_replies (s=0x55fd1b680b00, cookie=1) at 
../block/nbd.c:461
#9  0x55fd19e3fec4 in nbd_co_do_receive_one_chunk
(s=0x55fd1b680b00, cookie=1, only_structured=true, 
request_ret=0x7f8ee8bff91c, qiov=0x7f8ee8bfff10, payload=0x7f8ee8bff9d0, 
errp=0x7f8ee8bff910) at ../block/nbd.c:844
#10 0x55fd19e3fd55 in nbd_co_receive_one_chunk
(s=0x55fd1b680b00, cookie=1, only_structured=true, 
request_ret=0x7f8ee8bff91c, qiov=0x7f8ee8bfff10, reply=0x7f8ee8bff9f0, 
payload=0x7f8ee8bff9d0, errp=0x7f8ee8bff910)
at ../block/nbd.c:925
#11 0x55fd19e3f7b5 in nbd_reply_chunk_iter_receive (s=0x55fd1b680b00, 
iter=0x7f8ee8bff9d8, cookie=1, qiov=0x7f8ee8bfff10, reply=0x7f8ee8bff9f0, 
payload=0x7f8ee8bff9d0)
at ../block/nbd.c:1008
#12 0x55fd19e3ecf7 in nbd_co_receive_cmdread_reply (s=0x55fd1b680b00, 
cookie=1, offset=0, qiov=0x7f8ee8bfff10, request_ret=0x7f8ee8bffad4, 
errp=0x7f8ee8bffac8) at ../block/nbd.c:1074
#13 0x55fd19e3c804 in nbd_client_co_preadv (bs=0x55fd1b681350, offset=0, 
bytes=131072, qiov=0x7f8ee8bfff10, flags=0) at ../block/nbd.c:1258
#14 0x55fd19e33547 in bdrv_driver_preadv (bs=0x55fd1b681350, offset=0, 
bytes=131072, qiov=0x7f8ee8bfff10, qiov_offset=0, flags=0) at ../block/io.c:1005
#15 0x55fd19e2c8bb in bdrv_aligned_preadv (child=0x55fd1c282d90, 
req=0x7f8ee8bffd90, offset=0, bytes=131072, align=1, qiov=0x7f8ee8bfff10, 
qiov_offset=0, flags=0) at ../block/io.c:1398
#16 0x55fd19e2bf7d in bdrv_co_preadv_part (child=0x55fd1c282d90, offset=0, 
bytes=131072, qiov=0x7f8ee8bfff10, qiov_offset=0, flags=0) at ../block/io.c:1815
#17 0x55fd19e176bd in blk_co_do_preadv_part (blk=0x55fd1c269c00, offset=0, 
bytes=131072, qiov=0x7f8ee8bfff10, qiov_offset=0, flags=0) at 
../block/block-backend.c:1344
#18 0x55fd19e17588 in blk_co_preadv (blk=0x55fd1c269c00, offset=0, 
bytes=131072, qiov=0x7f8ee8bfff10, flags=0) at ../block/block-backend.c:1369
#19 0x55fd19e17514 in blk_co_pread (blk=0x55fd1c269c00, offset=0, 
bytes=131072, buf=0x55fd1c16d000, flags=0) at ../block/block-backend.c:1358
#20 0x55fd19ddcc91 in blk_co_pread_entry (opaque=0x7ffc4bbdd9a0) at 
block/block-gen.c:1519
#21 0x55fd19feb2a1 in coroutine_trampoline (i0=460835072, i1=22013) at 
../util/coroutine-ucontext.c:177
#22 0x7f8ef0cf5c20 in __start_context () at /lib64/libc.so.6

io/channel.c:583 is this:

577 void coroutine_fn qio_channel_yield(QIOChannel *ioc,
578 GIOCondition condition)
579 {
580 AioContext *ioc_ctx = ioc->ctx ?: qemu_get_aio_context();
581
582 assert(qemu_in_coroutine());
583 assert(in_aio_context_home_thread(ioc_ctx));
584
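
For reference, the helper that fails in that assertion checks whether
the current thread "owns" the context.  A paraphrase of its logic (see
include/block/aio.h for the authoritative version):

    static inline bool in_aio_context_home_thread(AioContext *ctx)
    {
        if (ctx == qemu_get_current_aio_context()) {
            return true;
        }

        /* the main context counts as home while the BQL is held */
        if (ctx == qemu_get_aio_context()) {
            return qemu_mutex_iothread_locked();
        }

        return false;
    }

So the abort() fires because the coroutine now yields from a thread
whose current AioContext is not the channel's home context.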




Re: zlib-ng as a compat replacement for zlib

2023-09-01 Thread Florian Weimer
* Richard W. M. Jones:

> I tested the speed of decompression using:
>
>   $ hyperfine 'qemu-img convert -W -m 16 -f qcow2 test.qcow2.XXX -O raw 
> test.out'
>   (qemu 8.0.0-4.fc39.x86_64)
>
>   $ hyperfine 'nbdkit -U - --filter=qcow2dec file test.qcow2.XXX --run 
> '\''nbdcopy --request-size "$uri" test.out'\'' '
>   (nbdkit-1.35.11-2.fc40.x86_64)

How realistic is that?  Larger cluster sizes will make random access
perform noticeably worse in some cases.  Think about reading a few bytes
towards the end of the cluster.  It makes a difference whether you have
to decompress 64 KiB for that, or 2 MiB.  As far as I understand it,
the above commands consume all of the decompressed data, so they don't
suffer from this issue (particularly with read-ahead to deal with
unfortunate cluster boundaries).

Time to first HTTP request served after boot or something like that
might be a better comparison.

Thanks,
Florian




Re: [PATCH 7/7] qobject atomics osdep: Make a few macros more hygienic

2023-09-01 Thread Cédric Le Goater

On 9/1/23 16:50, Markus Armbruster wrote:

Cédric Le Goater  writes:


On 8/31/23 16:30, Eric Blake wrote:

On Thu, Aug 31, 2023 at 03:25:46PM +0200, Markus Armbruster wrote:
[This paragraph written last: Bear with my stream of consciousness
review below, where I end up duplicating some of the conclusions you
reached before the point where I saw where the patch was headed]


Variables declared in macros can shadow other variables.  Much of the
time, this is harmless, e.g.:

  #define _FDT(exp)  \
  do {   \
  int ret = (exp);   \
  if (ret < 0) { \
  error_report("error creating device tree: %s: %s",   \
  #exp, fdt_strerror(ret));  \
  exit(1);   \
  }  \
  } while (0)

Which is why I've seen some projects require a strict namespace
separation: if all macro parameters and any identifiers declared in
macros use either a leading or a trailing _ (I prefer a trailing one,
to avoid risking conflicts with libc reserved namespace; but leading
is usually okay), and all other identifiers avoid that namespace, then
you will never have shadowing by calling a macro from normal code.


I started fixing the _FDT() macro since it is quite noisy at compile
time.  Same for qemu_fdt_setprop_cells().  So are we OK with names like
'ret_' and 'i_'?  I used a 'local_' prefix for now but I can change.


I believe identifiers with a '_' suffix are just fine in macros.  We
have quite a few of them already.


ok


I also have a bunch of fixes for ppc.


Appreciated!


count me in for the ppc files:

 hw/ppc/pnv_psi.c
 hw/ppc/spapr.c
 hw/ppc/spapr_drc.c
 hw/ppc/spapr_pci.c
 include/hw/ppc/fdt.h

and surely some other files under target/ppc/

This one was taken care of by Phil:

 include/sysemu/device_tree.h

C.



Re: [PATCH 7/7] qobject atomics osdep: Make a few macros more hygienic

2023-09-01 Thread Markus Armbruster
Cédric Le Goater  writes:

> On 8/31/23 16:30, Eric Blake wrote:
>> On Thu, Aug 31, 2023 at 03:25:46PM +0200, Markus Armbruster wrote:
>> [This paragraph written last: Bear with my stream of consciousness
>> review below, where I end up duplicating some of the conslusions you
>> reached before the point where I saw where the patch was headed]
>> 
>>> Variables declared in macros can shadow other variables.  Much of the
>>> time, this is harmless, e.g.:
>>>
>>>  #define _FDT(exp)  \
>>>  do {   \
>>>  int ret = (exp);   \
>>>  if (ret < 0) { \
>>>  error_report("error creating device tree: %s: %s",   \
>>>  #exp, fdt_strerror(ret));  \
>>>  exit(1);   \
>>>  }  \
>>>  } while (0)
>> Which is why I've seen some projects require a strict namespace
>> separation: if all macro parameters and any identifiers declared in
>> macros use either a leading or a trailing _ (I prefer a trailing one,
>> to avoid risking conflicts with libc reserved namespace; but leading
>> is usually okay), and all other identifiers avoid that namespace, then
>> you will never have shadowing by calling a macro from normal code.
>
> I started fixing the _FDT() macro since it is quite noisy at compile
> time.  Same for qemu_fdt_setprop_cells().  So are we OK with names like
> 'ret_' and 'i_'?  I used a 'local_' prefix for now but I can change.

I believe identifiers with a '_' suffix are just fine in macros.  We
have quite a few of them already.

> I also have a bunch of fixes for ppc.

Appreciated!
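
For concreteness, a sketch of what the renamed-variable approach could
look like for _FDT() (illustrative only; the exact suffix is whatever
the series settles on):

    #define _FDT(exp)                                              \
        do {                                                       \
            int ret_ = (exp);                                      \
            if (ret_ < 0) {                                        \
                error_report("error creating device tree: %s: %s", \
                             #exp, fdt_strerror(ret_));            \
                exit(1);                                           \
            }                                                      \
        } while (0)

With 'ret_' reserved for macro-local use, a caller's own 'ret' can no
longer be captured.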




Re: [PATCH 1/2] hw/ide/core.c (cmd_read_native_max): Avoid limited device parameters

2023-09-01 Thread Alexander Bulekov
On 230112 0412, Lev Kujawski wrote:
> 
> John Snow writes:
> 
> > On Mon, Oct 10, 2022 at 4:52 AM Lev Kujawski  wrote:
> >>
> >> Always use the native CHS device parameters for the ATA commands READ
> >> NATIVE MAX ADDRESS and READ NATIVE MAX ADDRESS EXT, not those limited
> >> by the ATA command INITIALIZE_DEVICE_PARAMETERS (introduced in patch
> >> 176e4961, hw/ide/core.c: Implement ATA INITIALIZE_DEVICE_PARAMETERS
> >> command, 2022-07-07.)
> >>
> >> As stated by the ATA/ATAPI specification, "[t]he native maximum is the
> >> highest address accepted by the device in the factory default
> >> condition."  Therefore this patch substitutes the native values in
> >> drive_heads and drive_sectors before calling ide_set_sector().
> >>
> >> One consequence of the prior behavior was that setting zero sectors
> >> per track could lead to an FPE within ide_set_sector().  Thanks to
> >> Alexander Bulekov for reporting this issue.
> >>
> >> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1243
> >> Signed-off-by: Lev Kujawski 
> >
> > Does this need attention?
> >
> > --js
> >
> 
> Hi John,
> 
> This patch needs to be merged to mitigate issue 1243, which is still
> present within QEMU master as of aa96ab7c9d.
> 
> Thanks, Lev
> 

Ping. oss-fuzz re-discovered this bug.



Re: [PATCH 7/7] qobject atomics osdep: Make a few macros more hygienic

2023-09-01 Thread Philippe Mathieu-Daudé

On 1/9/23 14:59, Cédric Le Goater wrote:

On 8/31/23 16:30, Eric Blake wrote:

On Thu, Aug 31, 2023 at 03:25:46PM +0200, Markus Armbruster wrote:

[This paragraph written last: Bear with my stream of consciousness
review below, where I end up duplicating some of the conclusions you
reached before the point where I saw where the patch was headed]


Variables declared in macros can shadow other variables.  Much of the
time, this is harmless, e.g.:

 #define _FDT(exp)  \
 do {   \
 int ret = (exp);   \
 if (ret < 0) { \
 error_report("error creating device tree: %s: %s",   \
 #exp, fdt_strerror(ret));  \
 exit(1);   \
 }  \
 } while (0)


Which is why I've seen some projects require a strict namespace
separation: if all macro parameters and any identifiers declared in
macros use either a leading or a trailing _ (I prefer a trailing one,
to avoid risking conflicts with libc reserved namespace; but leading
is usually okay), and all other identifiers avoid that namespace, then
you will never have shadowing by calling a macro from normal code.


I started fixing the _FDT() macro since it is quite noisy at compile
time.  Same for qemu_fdt_setprop_cells().  So are we OK with names like
'ret_' and 'i_'?  I used a 'local_' prefix for now but I can change.


See 
https://lore.kernel.org/qemu-devel/20230831225607.30829-12-phi...@linaro.org/





Re: [PATCH] fix bdrv_open_child return value check

2023-09-01 Thread Kevin Wolf
On 01.09.2023 at 13:26, Дмитрий Фролов wrote:
> Hello, Kevin.
> 
> This was just cleanup, based on the inspection.

Thanks for clarifying. I've updated the subject line to "vmdk: Clean up
bdrv_open_child() return value check" to avoid misunderstandings and
applied this to my block branch.

Kevin




Re: [PATCH 7/7] qobject atomics osdep: Make a few macros more hygienic

2023-09-01 Thread Eric Blake
On Fri, Sep 01, 2023 at 10:48:26AM +0200, Markus Armbruster wrote:
> > Indeed, not fully understanding the preprocessor makes for some
> > interesting death traps.
> 
> We use ALL_CAPS for macros to signal "watch out for traps".
> 

> >> -#define QOBJECT(obj) ({ \
> >> -typeof(obj) _obj = (obj);   \
> >> -_obj ? container_of(&(_obj)->base, QObject, base) : NULL;   \
> >> +#define QOBJECT_INTERNAL(obj, l) ({ \
> >> +typeof(obj) PASTE(_obj, l) = (obj); \
> >> +PASTE(_obj, l)  \
> >> +? container_of(&(PASTE(_obj, l))->base, QObject, base) : NULL;  \
> >>  })
> >> +#define QOBJECT(obj) QOBJECT_INTERNAL((obj), __COUNTER__)
> >
> > Slick!  Every call to QOBJECT() defines a unique temporary variable
> > name.  But as written, QOBJECT(o) expands to this (when __COUNTER__ is
> > 1):
> >
> > ({
> >   typeof((o)) _obj1 = ((o));
> >   _obj1 ? container_of(&(_obj1)->base, QObject, base) : NULL;
> > })
> >
> > which has three sets of redundant parens that could be elided.  Why do
> > you need to add () around obj when forwarding to QOBJECT_INTERNAL()?
> 
> Habit: enclose macro arguments in parenthesis unless there's a specific
> need not to.
> 
> > The only way the expansion of the text passed through the 'obj'
> > parameter can contain a comma is if the user has already parenthesized
> > it on their end (not that I expect a comma expression to be a frequent
> > argument to QOBJECT(), but who knows).  Arguing that it is to silence
> > a syntax checker is weak; since we must NOT add parens around the
> > parameters to QOBJECT_INTERNAL (calling PASTE((_obj), (l)) is
> > obviously wrong).
> >
> > Meanwhile, the repetition of three calls to PASTE() is annoying.  How
> > about:
> >
> > #define QOBJECT_INTERNAL_2(_obj, _tmp) ({ \
> >   typeof _obj _tmp = _obj; \
> >   _tmp ? container_of(&_tmp->base, QObject, base) : NULL; \
> >   })
> > #define QOBJECT_INTERNAL_1(_arg, _tmp) QOBJECT_INTERNAL_2(_arg, _tmp)
> > #define QOBJECT_INTERNAL(_arg, _ctr) \
> >   QOBJECT_INTERNAL_1(_arg, PASTE(_obj, _ctr))
> > #define QOBJECT(obj) QOBJECT_INTERNAL((obj), __COUNTER__)
> >
> > or:
> >
> > #define QOBJECT_INTERNAL_2(_obj, _tmp) ({ \
> >   typeof(_obj) _tmp = (_obj); \
> >   _tmp ? container_of(&_tmp->base, QObject, base) : NULL; \
> >   })
> > #define QOBJECT_INTERNAL_1(_arg, _tmp) QOBJECT_INTERNAL_2(_arg, _tmp)
> > #define QOBJECT_INTERNAL(_arg, _ctr) \
> >   QOBJECT_INTERNAL_1(_arg, PASTE(_obj, _ctr))
> > #define QOBJECT(obj) QOBJECT_INTERNAL(obj, __COUNTER__)
> >
> > where, in either case, QOBJECT(o) should then expand to
> >
> > ({
> >   typeof (o) _obj1 = (o);
> >   _obj1 ? container_of(&_obj1->base, QObject, base) : NULL;
> > })
> 
> The idea is to have the innermost macro take the variable name instead
> of the counter.  "PASTE(_obj, l)" then becomes "_tmp" there, which is
> more legible.  However, we pay for it by going through two more macros.
> 
> Can you explain why you need two more?

Likewise habit, on my part (and laziness - I wrote the previous email
without testing anything). I've learned over the years that pasting is
hard (you can't mix ## with a macro that you want expanded without 2
layers of indirection), so I started out by adding two layers of
QOBJECT_INTERNAL, then wrote QOBJECT to forward to QOBJECT_INTERNAL.
But now that you asked, I actually spent the time to test with the
preprocessor, and your hunch is indeed correct: I over-compensated
because of my habit.

$ cat foo.c
#define PASTE(_a, _b) _a##_b

#define QOBJECT_INTERNAL_2(_obj, _tmp) ({   \
  typeof _obj _tmp = _obj; \
  _tmp ? container_of(&_tmp->base, QObject, base) : NULL; \
  })
#define QOBJECT_INTERNAL_1(_arg, _tmp) QOBJECT_INTERNAL_2(_arg, _tmp)
#define QOBJECT_INTERNAL(_arg, _ctr) \
  QOBJECT_INTERNAL_1(_arg, PASTE(_obj, _ctr))
#define QOBJECT(obj) QOBJECT_INTERNAL((obj), __COUNTER__)

QOBJECT(o)

#define Q_INTERNAL_1(_obj, _tmp) ({ \
  typeof _obj _tmp = _obj; \
  _tmp ? container_of(&_tmp->base, QObject, base) : NULL; \
  })
#define Q_INTERNAL(_arg, _ctr) \
  Q_INTERNAL_1(_arg, PASTE(_obj, _ctr))
#define Q(obj) Q_INTERNAL((obj), __COUNTER__)

Q(o)
$ gcc -E foo.c
# 0 "foo.c"
# 0 ""
# 0 ""
# 1 "/usr/include/stdc-predef.h" 1 3 4
# 0 "" 2
# 1 "foo.c"
# 12 "foo.c"
({ typeof (o) _obj0 = (o); _obj0 ? container_of(&_obj0->base, QObject, base) : 
NULL; })
# 22 "foo.c"
({ typeof (o) _obj1 = (o); _obj1 ? container_of(&_obj1->base, QObject, base) : 
NULL; })

So the important part was merely ensuring that __COUNTER__ has a
chance to be expanded PRIOR to the point where ## gets its argument,
and one less layer of indirection was still sufficient for that.
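
The underlying rule, reduced to a minimal demo (hypothetical macro
names): ## suppresses macro expansion of its operands, so exactly one
forwarding layer is needed to let __COUNTER__ turn into a number before
the paste happens:

    #define PASTE(a, b)  a##b          /* operands pasted as-is */
    #define XPASTE(a, b) PASTE(a, b)   /* operands expanded first */

    PASTE(_obj, __COUNTER__)    /* expands to: _obj__COUNTER__ */
    XPASTE(_obj, __COUNTER__)   /* expands to e.g.: _obj0 */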

> 
> Complication: the innermost macro needs to take all its variable names
> as arguments.  The macro above has just one.  Macros below have two.
> 
> Not quite sure which version is easier 

Re: [PATCH 7/7] qobject atomics osdep: Make a few macros more hygienic

2023-09-01 Thread Cédric Le Goater

On 8/31/23 16:30, Eric Blake wrote:

On Thu, Aug 31, 2023 at 03:25:46PM +0200, Markus Armbruster wrote:

[This paragraph written last: Bear with my stream of consciousness
review below, where I end up duplicating some of the conclusions you
reached before the point where I saw where the patch was headed]


Variables declared in macros can shadow other variables.  Much of the
time, this is harmless, e.g.:

 #define _FDT(exp)  \
 do {   \
 int ret = (exp);   \
 if (ret < 0) { \
 error_report("error creating device tree: %s: %s",   \
 #exp, fdt_strerror(ret));  \
 exit(1);   \
 }  \
 } while (0)


Which is why I've seen some projects require a strict namespace
separation: if all macro parameters and any identifiers declared in
macros use either a leading or a trailing _ (I prefer a trailing one,
to avoid risking conflicts with libc reserved namespace; but leading
is usually okay), and all other identifiers avoid that namespace, then
you will never have shadowing by calling a macro from normal code.


I started fixing the _FDT() macro since it is quite noisy at compile.
Same for qemu_fdt_setprop_cells(). So are we ok with names like 'ret_'
and 'i_' ? I used a 'local_' prefix for now but I can change.

I also have a bunch of fixes for ppc.

C.





Harmless shadowing in h_client_architecture_support():

 target_ulong ret;

 [...]

 ret = do_client_architecture_support(cpu, spapr, vec, fdt_bufsize);
 if (ret == H_SUCCESS) {
 _FDT((fdt_pack(spapr->fdt_blob)));
 [...]
 }

 return ret;

However, we can get in trouble when the shadowed variable is used in a
macro argument:

 #define QOBJECT(obj) ({ \
 typeof(obj) o = (obj);  \
 o ? container_of(&(o)->base, QObject, base) : NULL; \
  })

QOBJECT(o) expands into

 ({
--->typeof(o) o = (o);
 o ? container_of(&(o)->base, QObject, base) : NULL;
 })

Unintended variable name capture at --->.  We'd be saved by
-Winit-self.  But I could certainly construct more elaborate death
traps that don't trigger it.


Indeed, not fully understanding the preprocessor makes for some
interesting death traps.



To reduce the risk of trapping ourselves, we use variable names in
macros that no sane person would use elsewhere.  Here's our actual
definition of QOBJECT():

 #define QOBJECT(obj) ({ \
 typeof(obj) _obj = (obj);   \
 _obj ? container_of(&(_obj)->base, QObject, base) : NULL;   \
 })


Yep, goes back to the policy I've seen of enforcing naming conventions
that ensure macros can't shadow normal code.



Works well enough until we nest macro calls.  For instance, with

 #define qobject_ref(obj) ({ \
 typeof(obj) _obj = (obj);   \
 qobject_ref_impl(QOBJECT(_obj));\
 _obj;   \
 })

the expression qobject_ref(obj) expands into

 ({
 typeof(obj) _obj = (obj);
 qobject_ref_impl(
 ({
--->typeof(_obj) _obj = (_obj);
 _obj ? container_of(&(_obj)->base, QObject, base) : NULL;
 }));
 _obj;
 })

Unintended variable name capture at --->.


Yep, you've just proven how macros calling macros requires care, as we
no longer have the namespace protections.  If macro calls can only
form a DAG, it is possible to choose unique intermediate declarations
for every macro (although remembering to do that, and still keeping
the macro definition legible, may not be easy).  But if you can have
macros that form any sort of nesting loop (and we WANT to allow things
like MIN(MIN(a, b), c) - which is such a loop), dynamic naming becomes
the only solution.
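
A sketch of what that dynamic naming looks like for MIN() specifically
(hypothetical macro names, same __COUNTER__ trick as in this patch):

    #define PASTE(a, b)  a##b
    #define XPASTE(a, b) PASTE(a, b)

    #define MIN_INTERNAL(a, b, _a, _b)  \
        ({                              \
            typeof(a) _a = (a);         \
            typeof(b) _b = (b);         \
            _a < _b ? _a : _b;          \
        })
    #define MIN_1(a, b, l) MIN_INTERNAL(a, b, XPASTE(a_, l), XPASTE(b_, l))
    #define MIN(a, b) MIN_1(a, b, __COUNTER__)

Every expansion of MIN() gets fresh temporaries (a_0/b_0, a_1/b_1, ...),
so MIN(MIN(a, b), c) can no longer capture its own variables.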



The only reliable way to prevent unintended variable name capture is
-Wshadow.


Yes, I would love to have that enabled eventually.



One blocker for enabling it is shadowing hiding in function-like
macros like

  qdict_put(dict, "name", qobject_ref(...))

qdict_put() wraps its last argument in QOBJECT(), and the last
argument here contains another QOBJECT().

Use dark preprocessor sorcery to make the macros that give us this
problem use different variable names on every call.


Sounds foreboding; hopefully not many macros are affected...



Signed-off-by: Markus Armbruster 
---
  include/qapi/qmp/qobject.h |  8 +---
  include/qemu/atomic.h  | 11 ++-
  include/qem

Re: [PATCH v3 2/2] vhost: Add Error parameter to vhost_scsi_common_start()

2023-09-01 Thread Markus Armbruster
Li Feng  writes:

> Add an Error parameter to report the real error, like vhost-user-blk.
>
> Signed-off-by: Li Feng 
> ---
>  hw/scsi/vhost-scsi-common.c   | 16 +---
>  hw/scsi/vhost-scsi.c  |  5 +++--
>  hw/scsi/vhost-user-scsi.c | 14 --
>  include/hw/virtio/vhost-scsi-common.h |  2 +-
>  4 files changed, 21 insertions(+), 16 deletions(-)
>
> diff --git a/hw/scsi/vhost-scsi-common.c b/hw/scsi/vhost-scsi-common.c
> index a61cd0e907..4c8637045d 100644
> --- a/hw/scsi/vhost-scsi-common.c
> +++ b/hw/scsi/vhost-scsi-common.c
> @@ -16,6 +16,7 @@
>   */
>  
>  #include "qemu/osdep.h"
> +#include "qapi/error.h"
>  #include "qemu/error-report.h"
>  #include "qemu/module.h"
>  #include "hw/virtio/vhost.h"
> @@ -25,7 +26,7 @@
>  #include "hw/virtio/virtio-access.h"
>  #include "hw/fw-path-provider.h"
>  
> -int vhost_scsi_common_start(VHostSCSICommon *vsc)
> +int vhost_scsi_common_start(VHostSCSICommon *vsc, Error **errp)
>  {
>  int ret, i;
>  VirtIODevice *vdev = VIRTIO_DEVICE(vsc);
> @@ -35,18 +36,19 @@ int vhost_scsi_common_start(VHostSCSICommon *vsc)
>  VirtIOSCSICommon *vs = (VirtIOSCSICommon *)vsc;
>  
>  if (!k->set_guest_notifiers) {
> -error_report("binding does not support guest notifiers");
> +error_setg(errp, "binding does not support guest notifiers");
>  return -ENOSYS;
>  }
>  
>  ret = vhost_dev_enable_notifiers(&vsc->dev, vdev);
>  if (ret < 0) {
> +error_setg_errno(errp, -ret, "Error enabling host notifiers");

Looks like the error is silent before your patch.  Correct?

>  return ret;
>  }
>  
>  ret = k->set_guest_notifiers(qbus->parent, vsc->dev.nvqs, true);
>  if (ret < 0) {
> -error_report("Error binding guest notifier");
> +error_setg_errno(errp, -ret, "Error binding guest notifier");

Error message now provides more detail.

>  goto err_host_notifiers;
>  }
>  
> @@ -54,7 +56,7 @@ int vhost_scsi_common_start(VHostSCSICommon *vsc)
>  
>  ret = vhost_dev_prepare_inflight(&vsc->dev, vdev);
>  if (ret < 0) {
> -error_report("Error setting inflight format: %d", -ret);
> +error_setg_errno(errp, -ret, "Error setting inflight format");

Error message now shows errno in human-readable form.

>  goto err_guest_notifiers;
>  }
>  
> @@ -64,21 +66,21 @@ int vhost_scsi_common_start(VHostSCSICommon *vsc)
>  vs->conf.virtqueue_size,
>  vsc->inflight);
>  if (ret < 0) {
> -error_report("Error getting inflight: %d", -ret);
> +error_setg_errno(errp, -ret, "Error getting inflight");

Likewise.

>  goto err_guest_notifiers;
>  }
>  }
>  
>  ret = vhost_dev_set_inflight(&vsc->dev, vsc->inflight);
>  if (ret < 0) {
> -error_report("Error setting inflight: %d", -ret);
> +error_setg_errno(errp, -ret, "Error setting inflight");

Likewise.

>  goto err_guest_notifiers;
>  }
>  }
>  
>  ret = vhost_dev_start(&vsc->dev, vdev, true);
>  if (ret < 0) {
> -error_report("Error start vhost dev");
> +error_setg_errno(errp, -ret, "Error starting vhost dev");

Likewise.

>  goto err_guest_notifiers;
>  }
>  
> diff --git a/hw/scsi/vhost-scsi.c b/hw/scsi/vhost-scsi.c
> index 443f67daa4..01a3ab4277 100644
> --- a/hw/scsi/vhost-scsi.c
> +++ b/hw/scsi/vhost-scsi.c
> @@ -75,6 +75,7 @@ static int vhost_scsi_start(VHostSCSI *s)
>  int ret, abi_version;
>  VHostSCSICommon *vsc = VHOST_SCSI_COMMON(s);
>  const VhostOps *vhost_ops = vsc->dev.vhost_ops;
> +Error *local_err = NULL;
>  
>  ret = vhost_ops->vhost_scsi_get_abi_version(&vsc->dev, &abi_version);
>  if (ret < 0) {
> @@ -88,14 +89,14 @@ static int vhost_scsi_start(VHostSCSI *s)
>  return -ENOSYS;
>  }
>  
> -ret = vhost_scsi_common_start(vsc);
> +ret = vhost_scsi_common_start(vsc, &local_err);
>  if (ret < 0) {
>  return ret;
>  }
>  
>  ret = vhost_scsi_set_endpoint(s);
>  if (ret < 0) {
> -error_report("Error setting vhost-scsi endpoint");
> +error_reportf_err(local_err, "Error setting vhost-scsi endpoint");
>  vhost_scsi_common_stop(vsc);
>  }
>  
> diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c
> index e931df9f5b..62fc98bb1c 100644
> --- a/hw/scsi/vhost-user-scsi.c
> +++ b/hw/scsi/vhost-user-scsi.c
> @@ -43,12 +43,12 @@ enum VhostUserProtocolFeature {
>  VHOST_USER_PROTOCOL_F_RESET_DEVICE = 13,
>  };
>  
> -static int vhost_user_scsi_start(VHostUserSCSI *s)
> +static int vhost_user_scsi_start(VHostUserSCSI *s, Error **errp)
>  {
>  VHostSCSICommon *vsc = VHOST_SCSI_COMMON(s);
>  int ret;
>  
> -ret = vhost_scsi_common_start(vsc);
> +ret = vhost_scsi_common_start(vsc, errp);
>  s->s

Re: [PATCH v3 4/5] vhost-user-scsi: support reconnect to backend

2023-09-01 Thread Markus Armbruster
Li Feng  writes:

> If the backend crashes and restarts, the device is broken.
> This patch adds reconnect for vhost-user-scsi.
>
> Tested with spdk backend.
>
> Signed-off-by: Li Feng 
> ---
>  hw/scsi/vhost-user-scsi.c   | 199 +---
>  include/hw/virtio/vhost-user-scsi.h |   4 +
>  2 files changed, 184 insertions(+), 19 deletions(-)
>
> diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c
> index ee99b19e7a..5bf012461b 100644
> --- a/hw/scsi/vhost-user-scsi.c
> +++ b/hw/scsi/vhost-user-scsi.c
> @@ -43,26 +43,54 @@ enum VhostUserProtocolFeature {
>  VHOST_USER_PROTOCOL_F_RESET_DEVICE = 13,
>  };
>  
> +static int vhost_user_scsi_start(VHostUserSCSI *s)
> +{
> +VHostSCSICommon *vsc = VHOST_SCSI_COMMON(s);
> +int ret;
> +
> +ret = vhost_scsi_common_start(vsc);
> +s->started_vu = (ret < 0 ? false : true);
> +
> +return ret;
> +}
> +
> +static void vhost_user_scsi_stop(VHostUserSCSI *s)
> +{
> +VHostSCSICommon *vsc = VHOST_SCSI_COMMON(s);
> +
> +if (!s->started_vu) {
> +return;
> +}
> +s->started_vu = false;
> +
> +vhost_scsi_common_stop(vsc);
> +}
> +
>  static void vhost_user_scsi_set_status(VirtIODevice *vdev, uint8_t status)
>  {
>  VHostUserSCSI *s = (VHostUserSCSI *)vdev;
> +DeviceState *dev = &s->parent_obj.parent_obj.parent_obj.parent_obj;
>  VHostSCSICommon *vsc = VHOST_SCSI_COMMON(s);
> -bool start = (status & VIRTIO_CONFIG_S_DRIVER_OK) && vdev->vm_running;
> +VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(dev);
> +bool should_start = virtio_device_should_start(vdev, status);
> +int ret;
>  
> -if (vhost_dev_is_started(&vsc->dev) == start) {
> +if (!s->connected) {
>  return;
>  }
>  
> -if (start) {
> -int ret;
> +if (vhost_dev_is_started(&vsc->dev) == should_start) {
> +return;
> +}
>  
> -ret = vhost_scsi_common_start(vsc);
> +if (should_start) {
> +ret = vhost_user_scsi_start(s);
>  if (ret < 0) {
>  error_report("unable to start vhost-user-scsi: %s", 
> strerror(-ret));
> -exit(1);
> +qemu_chr_fe_disconnect(&vs->conf.chardev);
>  }
>  } else {
> -vhost_scsi_common_stop(vsc);
> +vhost_user_scsi_stop(s);
>  }
>  }
>  
> @@ -89,14 +117,126 @@ static void vhost_dummy_handle_output(VirtIODevice 
> *vdev, VirtQueue *vq)
>  {
>  }
>  
> +static int vhost_user_scsi_connect(DeviceState *dev, Error **errp)
> +{
> +VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> +VHostUserSCSI *s = VHOST_USER_SCSI(vdev);
> +VHostSCSICommon *vsc = VHOST_SCSI_COMMON(s);
> +VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(dev);
> +int ret = 0;
> +
> +if (s->connected) {
> +return 0;
> +}
> +s->connected = true;
> +
> +vsc->dev.num_queues = vs->conf.num_queues;
> +vsc->dev.nvqs = VIRTIO_SCSI_VQ_NUM_FIXED + vs->conf.num_queues;
> +vsc->dev.vqs = s->vhost_vqs;
> +vsc->dev.vq_index = 0;
> +vsc->dev.backend_features = 0;
> +
> +ret = vhost_dev_init(&vsc->dev, &s->vhost_user, VHOST_BACKEND_TYPE_USER, 
> 0,
> + errp);
> +if (ret < 0) {
> +return ret;
> +}
> +
> +/* restore vhost state */
> +if (virtio_device_started(vdev, vdev->status)) {
> +ret = vhost_user_scsi_start(s);
> +if (ret < 0) {
> +return ret;
> +}

Fails without setting an error, violating the function's (tacit)
postcondition.  Callers:

* vhost_user_scsi_event()

  Dereferences the null error and crashes.

* vhost_user_scsi_realize_connect()

  Also fails without setting an error.  Caller:

  - vhost_user_scsi_realize()

1. Doesn't emit the "Reconnecting after error: " message.  See
   also below.

2. Succeeds instead of failing!
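
A minimal sketch of the convention being violated (hypothetical wrapper;
on failure the callee must both return an error value and set *errp, so
callers can rely on the Error being non-null):

    static int vhost_user_scsi_start_checked(VHostUserSCSI *s,
                                             Error **errp)
    {
        int ret = vhost_user_scsi_start(s);
        if (ret < 0) {
            error_setg_errno(errp, -ret,
                             "unable to start vhost-user-scsi");
        }
        return ret;
    }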

> +}
> +
> +return 0;
> +}
> +
> +static void vhost_user_scsi_event(void *opaque, QEMUChrEvent event);
> +
> +static void vhost_user_scsi_disconnect(DeviceState *dev)
> +{
> +VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> +VHostUserSCSI *s = VHOST_USER_SCSI(vdev);
> +VHostSCSICommon *vsc = VHOST_SCSI_COMMON(s);
> +VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(dev);
> +
> +if (!s->connected) {
> +return;
> +}
> +s->connected = false;
> +
> +vhost_user_scsi_stop(s);
> +
> +vhost_dev_cleanup(&vsc->dev);
> +
> +/* Re-instate the event handler for new connections */
> +qemu_chr_fe_set_handlers(&vs->conf.chardev, NULL, NULL,
> + vhost_user_scsi_event, NULL, dev, NULL, true);
> +}
> +
> +static void vhost_user_scsi_event(void *opaque, QEMUChrEvent event)
> +{
> +DeviceState *dev = opaque;
> +VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> +VHostUserSCSI *s = VHOST_USER_SCSI(vdev);
> +VHostSCSICommon *vsc = VHOST_SCSI_COMMON(s);
> +VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(dev);
> +Error *local_err = NULL;
> +
> +switch (event) {
> +case CH

Re: zlib-ng as a compat replacement for zlib

2023-09-01 Thread Richard W.M. Jones
On Fri, Sep 01, 2023 at 01:10:16PM +0200, Kevin Wolf wrote:
> And nbdkit seems to get worse instead of better with larger cluster
> size, no matter whether zlib or zstd is used.

It's caused by nbdcopy's default request size being 256k.  Increasing
it to 2M cures the scaling problem - see updated results below.  (Note
nbdkit & nbdcopy are being used together so we're copying between two
programs.  The request size is the size of NBD requests between the two.)

> If you think using more threads is the key for the remaining difference
> at 64k, would increasing QCOW2_MAX_THREADS (currently only 4) help on
> the qemu-img side?

Results:

  qemu800 = qemu-img-8.0.0-4.fc39.x86_64 [previous results]
  qemugit = qemu @ 17780edd81d
  qemuthr = qemu @ 17780edd81d with QCOW2_MAX_THREADS changed from 4 to 16
  nbdkit = nbdkit-1.35.11-2.fc40.x86_64 [previous results]
  nbdkit2M = nbdkit with nbdcopy --request-size=$((2*1024*1024))


Cluster  Compression  Compressed size  Prog  Decompression speed

4k   zlib 3228811264   qemu800   5.921 s ±  0.074 s
4k   zstd 3258097664   qemu800   5.189 s ±  0.158 s

4k   zlib 3228811264   qemugit   7.021 s ±  0.234 s
4k   zstd 3258097664   qemugit   6.594 s ±  0.170 s

4k   zlib 3228811264   qemuthr   6.744 s ±  0.111 s
4k   zstd 3258097664   qemuthr   6.428 s ±  0.206 s

4k   zlib 3228811264   nbdkit1.390 s ±  0.094 s
4k   zstd 3258097664   nbdkit1.328 s ±  0.055 s


64k  zlib 3164667904   qemu800   3.579 s ±  0.094 s
64k  zstd 3132686336   qemu800   1.770 s ±  0.060 s

64k  zlib 3164667904   qemugit   3.644 s ±  0.018 s
64k  zstd 3132686336   qemugit   1.814 s ±  0.098 s

64k  zlib 3164667904   qemuthr   1.356 s ±  0.058 s
64k  zstd 3132686336   qemuthr   1.266 s ±  0.064 s

64k  zlib 3164667904   nbdkit1.254 s ±  0.065 s
64k  zstd 3132686336   nbdkit1.315 s ±  0.037 s


512k zlib 3158744576   qemu800   4.008 s ±  0.058 s
512k zstd 3032697344   qemu800   1.503 s ±  0.072 s

512k zlib 3158744576   qemugit   4.015 s ±  0.040 s
512k zstd 3032697344   qemugit   1.557 s ±  0.025 s

512k zlib 3158744576   qemuthr   1.233 s ±  0.050 s
512k zstd 3032697344   qemuthr   1.149 s ±  0.032 s

512k zlib 3158744576   nbdkit1.702 s ±  0.026 s
512k zstd 3032697344   nbdkit1.593 s ±  0.039 s


2048kzlib 3197569024   qemu800   4.327 s ±  0.051 s
2048kzstd 2995143168   qemu800   1.465 s ±  0.085 s

2048kzlib 3197569024   qemugit   4.323 s ±  0.031 s
2048kzstd 2995143168   qemugit   1.484 s ±  0.067 s

2048kzlib 3197569024   qemuthr   1.299 s ±  0.055 s
2048kzstd 2995143168   qemuthr   1.229 s ±  0.046 s

2048kzlib 3197569024   nbdkit2M  1.636 s ±  0.071 s
2048kzstd 2995143168   nbdkit2M  1.644 s ±  0.040 s


Increasing the number of threads makes a big difference, so I think
changing the default (or making it run-time adjustable somehow) is a
good idea, also an easy win.

Increased qcow2 threads + zlib-ng would be _very_ interesting.
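
For anyone reproducing the qemuthr numbers, the change is a one-line
edit (location from memory, so double-check against the tree):

  /* block/qcow2.h */
  #define QCOW2_MAX_THREADS 16   /* upstream default is 4 */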

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
nbdkit - Flexible, fast NBD server with plugins
https://gitlab.com/nbdkit/nbdkit




Re: [PATCH v3 5/5] vhost-user-scsi: start vhost when guest kicks

2023-09-01 Thread Markus Armbruster
Li Feng  writes:

> Let's keep the same behavior as vhost-user-blk.
>
> Some old guests kick the virtqueue before setting VIRTIO_CONFIG_S_DRIVER_OK.
>
> Signed-off-by: Li Feng 
> ---
>  hw/scsi/vhost-user-scsi.c | 48 +++
>  1 file changed, 44 insertions(+), 4 deletions(-)
>
> diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c
> index 5bf012461b..a7fa8e8df2 100644
> --- a/hw/scsi/vhost-user-scsi.c
> +++ b/hw/scsi/vhost-user-scsi.c
> @@ -113,8 +113,48 @@ static void vhost_user_scsi_reset(VirtIODevice *vdev)
>  }
>  }
>  
> -static void vhost_dummy_handle_output(VirtIODevice *vdev, VirtQueue *vq)
> +static void vhost_user_scsi_handle_output(VirtIODevice *vdev, VirtQueue *vq)
>  {
> +VHostUserSCSI *s = (VHostUserSCSI *)vdev;
> +DeviceState *dev = &s->parent_obj.parent_obj.parent_obj.parent_obj;
> +VHostSCSICommon *vsc = VHOST_SCSI_COMMON(s);
> +VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(dev);
> +
> +Error *local_err = NULL;
> +int i, ret;
> +
> +if (!vdev->start_on_kick) {
> +return;
> +}
> +
> +if (!s->connected) {
> +return;
> +}
> +
> +if (vhost_dev_is_started(&vsc->dev)) {
> +return;
> +}
> +
> +/*
> + * Some guests kick before setting VIRTIO_CONFIG_S_DRIVER_OK so start
> + * vhost here instead of waiting for .set_status().
> + */
> +ret = vhost_user_scsi_start(s);
> +if (ret < 0) {
> +error_reportf_err(local_err, "vhost-user-scsi: vhost start failed: 
> ");

Crashes, since @local_err is null.  Please test your error paths.

Obvious fix: drop this call.
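
I.e. a sketch of the fixed error path, mirroring the plain
error_report() used in the .set_status() handler earlier in the series:

    ret = vhost_user_scsi_start(s);
    if (ret < 0) {
        error_report("vhost-user-scsi: vhost start failed: %s",
                     strerror(-ret));
        qemu_chr_fe_disconnect(&vs->conf.chardev);
        return;
    }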

> +qemu_chr_fe_disconnect(&vs->conf.chardev);
> +return;
> +}
> +
> +/* Kick right away to begin processing requests already in vring */
> +for (i = 0; i < vsc->dev.nvqs; i++) {
> +VirtQueue *kick_vq = virtio_get_queue(vdev, i);
> +
> +if (!virtio_queue_get_desc_addr(vdev, i)) {
> +continue;
> +}
> +event_notifier_set(virtio_queue_get_host_notifier(kick_vq));
> +}
>  }
>  
>  static int vhost_user_scsi_connect(DeviceState *dev, Error **errp)
> @@ -243,9 +283,9 @@ static void vhost_user_scsi_realize(DeviceState *dev, 
> Error **errp)
>  return;
>  }
>  
> -virtio_scsi_common_realize(dev, vhost_dummy_handle_output,
> -   vhost_dummy_handle_output,
> -   vhost_dummy_handle_output, &err);
> +virtio_scsi_common_realize(dev, vhost_user_scsi_handle_output,
> +   vhost_user_scsi_handle_output,
> +   vhost_user_scsi_handle_output, &err);
>  if (err != NULL) {
>  error_propagate(errp, err);
>  return;




Re: [PATCH 04/11] target/m68k: Clean up local variable shadowing

2023-09-01 Thread Peter Maydell
On Thu, 31 Aug 2023 at 23:58, Philippe Mathieu-Daudé  wrote:
>
> Fix:
>
>   target/m68k/translate.c:828:18: error: declaration shadows a local variable 
> [-Werror,-Wshadow]
> TCGv tmp = tcg_temp_new();
>  ^
>   target/m68k/translate.c:801:15: note: previous declaration is here
> TCGv reg, tmp, result;
>   ^
>
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  target/m68k/translate.c | 2 +-

Reviewed-by: Peter Maydell 

thanks
-- PMM



Re: [PATCH 06/11] hw/arm/allwinner: Clean up local variable shadowing

2023-09-01 Thread Peter Maydell
On Thu, 31 Aug 2023 at 23:56, Philippe Mathieu-Daudé  wrote:
>
> Fix:
>
>   hw/arm/allwinner-r40.c:412:14: error: declaration shadows a local variable 
> [-Werror,-Wshadow]
> for (int i = 0; i < AW_R40_NUM_MMCS; i++) {
>  ^
>   hw/arm/allwinner-r40.c:299:14: note: previous declaration is here
> unsigned i;
>  ^
>
> Signed-off-by: Philippe Mathieu-Daudé 
> ---

Reviewed-by: Peter Maydell 

thanks
-- PMM



Re: [PATCH 05/11] hw/arm/virt: Clean up local variable shadowing

2023-09-01 Thread Peter Maydell
On Thu, 31 Aug 2023 at 23:56, Philippe Mathieu-Daudé  wrote:
>
> Fix:
>
>   hw/arm/virt.c:821:22: error: declaration shadows a local variable 
> [-Werror,-Wshadow]
> qemu_irq irq = qdev_get_gpio_in(vms->gic,
>  ^
>   hw/arm/virt.c:803:13: note: previous declaration is here
> int irq;
> ^
>
> Signed-off-by: Philippe Mathieu-Daudé 
> ---

Reviewed-by: Peter Maydell 

thanks
-- PMM



Re: [PATCH] fix bdrv_open_child return value check

2023-09-01 Thread Дмитрий Фролов

Hello, Kevin.

This was just cleanup, based on the inspection.

Dmitry.

On 01.09.2023 at 14:15, Kevin Wolf wrote:

On 31.08.2023 at 14:59, Dmitry Frolov wrote:

bdrv_open_child() may return NULL.
The return value of this function is usually checked, and checking it
is more reliable.

Fixes: 24bc15d1f6 ("vmdk: Use BdrvChild instead of BDS for references to 
extents")

Signed-off-by: Dmitry Frolov 

Did you actually see some failure because of this? If so, what is the
specific case that failed?

Or is this just cleanup based on code inspection?

Kevin






Re: [PATCH] fix bdrv_open_child return value check

2023-09-01 Thread Kevin Wolf
On 31.08.2023 at 14:59, Dmitry Frolov wrote:
> bdrv_open_child() may return NULL.
> The return value of this function is usually checked, and checking it
> is more reliable.
> 
> Fixes: 24bc15d1f6 ("vmdk: Use BdrvChild instead of BDS for references to 
> extents")
> 
> Signed-off-by: Dmitry Frolov 

Did you actually see some failure because of this? If so, what is the
specific case that failed?

Or is this just cleanup based on code inspection?

Kevin




Re: zlib-ng as a compat replacement for zlib

2023-09-01 Thread Kevin Wolf
On 01.09.2023 at 12:03, Richard W.M. Jones wrote:
> On Fri, Sep 01, 2023 at 10:55:50AM +0100, Daniel P. Berrangé wrote:
> > On Fri, Sep 01, 2023 at 10:42:16AM +0100, Richard W.M. Jones wrote:
> > > On Fri, Sep 01, 2023 at 10:48:14AM +0200, Kevin Wolf wrote:
> > > > I understand the context and didn't really think about file size at all.
> > > > 
> > > > My question was essentially if decompressing many small blocks (as we do
> > > > today) performs significantly different from decompressing fewer larger
> > > > blocks (as we would do with a larger cluster size).
> > > 
> > > I did a quick test just by adjusting the cluster size of a qcow2 file:
> > > 
> > >   $ virt-builder fedora-36
> > >   $ ls -lsh fedora-36.img 
> > >   1.2G -rw-r--r--. 1 rjones rjones 6.0G Sep  1 09:53 fedora-36.img
> > >   $ cat fedora-36.img fedora-36.img fedora-36.img fedora-36.img  > 
> > > test.raw
> > >   $ ls -lsh test.raw 
> > >   4.7G -rw-r--r--. 1 rjones rjones 24G Sep  1 09:53 test.raw
> > >   $ qemu-img convert -f raw test.raw -O qcow2 test.qcow2.zlib.4k -c -o 
> > > compression_type=zlib,cluster_size=4096
> > > 
> > > (for cluster sizes 4k, 64k, 512k, 2048k, and
> > > compression types zlib & zstd)
> > > 
> > > I tested the speed of decompression using:
> > > 
> > >   $ hyperfine 'qemu-img convert -W -m 16 -f qcow2 test.qcow2.XXX -O raw 
> > > test.out'
> > >   (qemu 8.0.0-4.fc39.x86_64)
> > > 
> > >   $ hyperfine 'nbdkit -U - --filter=qcow2dec file test.qcow2.XXX --run 
> > > '\''nbdcopy --request-size "$uri" test.out'\'' '
> > >   (nbdkit-1.35.11-2.fc40.x86_64)
> > > 
> > > Results:
> > > 
> > >   Cluster  Compression  Compressed size  Prog   Decompression speed
> > > 
> > >   4k   zlib 3228811264   qemu   5.921 s ±  0.074 s
> > >   4k   zstd 3258097664   qemu   5.189 s ±  0.158 s
> > > 
> > >   4k   zlib/zstd nbdkit failed, bug!!
> > > 
> > >   64k  zlib 3164667904   qemu   3.579 s ±  0.094 s
> > >   64k  zstd 3132686336   qemu   1.770 s ±  0.060 s
> > > 
> > >   64k  zlib 3164667904   nbdkit 1.254 s ±  0.065 s
> > >   64k  zstd 3132686336   nbdkit 1.315 s ±  0.037 s
> > > 
> > >   512k zlib 3158744576   qemu   4.008 s ±  0.058 s
> > >   512k zstd 3032697344   qemu   1.503 s ±  0.072 s
> > > 
> > >   512k zlib 3158744576   nbdkit 1.702 s ±  0.026 s
> > >   512k zstd 3032697344   nbdkit 1.593 s ±  0.039 s
> > > 
> > >   2048kzlib 3197569024   qemu   4.327 s ±  0.051 s
> > >   2048kzstd 2995143168   qemu   1.465 s ±  0.085 s
> > > 
> > >   2048kzlib 3197569024   nbdkit 3.660 s ±  0.011 s
> > >   2048kzstd 2995143168   nbdkit 3.368 s ±  0.057 s
> > > 
> > > No great surprises - very small cluster size is inefficient, but
> > > scaling up the cluster size gains performance, and zstd performs better
> > > than zlib once the cluster size is sufficiently large.

It's interesting that for zstd, qemu-img keeps getting better with
increasing cluster size, while zlib numbers get worse again above 64k.
And nbdkit seems to get worse instead of better with larger cluster
size, no matter whether zlib or zstd is used.

> > The default qcow2 cluster size is 64k, which means we've already
> > got the vast majority of the performance and file size win. Going
> > beyond 64k defaults doesn't seem massively compelling.
> > 
> > zstd does have a small space win over zlib as expected, but again
> > nothing so drastic that it seems compelling to change - that win
> > will be line noise in the overall bigger picture of image storage
> > and download times.
> 
> Yeah, I was a bit surprised by this.  I expected zstd files to be
> significantly smaller than zlib's even though that's not what zstd is
> optimized for, not that they'd be about the same.
> 
> > The major difference here is that zstd is much faster than zlib
> > at decompress. I'd be curious if zlib-ng closes that gap ?
> 
> It's quite hard to use zlib-ng in Fedora (currently) since it requires
> changes to the source code.  That is what the pull request being
> discussed would change, as you could simply install zlib-ng-compat
> which would replace libz.so.  But anyway I can't easily get results
> for qemu + zlib-ng, but we expect it would be ~ 40% faster at
> decompression, and decompression is what is taking most of the time in
> the qemu numbers above.
> 
> I forgot to say that nbdkit is using zlib-ng, since I made the source
> level changes a few weeks back (but most of the nbdkit performance
> improvement comes from being able to use lots of threads).

Ah, that's actually a very important detail. I was wondering why zlib
was performing so much better in nbdkit when zstd is more or less
comparable in both.

If you think using more threads is the key for the remaining difference
at 64k, would increasing QCOW2_MAX_THREADS (currently only 4) help on
the qemu-img side?

Re: [PATCH v3 2/4] qcow2: add configurations for zoned format extension

2023-09-01 Thread Markus Armbruster
Sam Li  writes:

> To configure the zoned format feature on the qcow2 driver, it
> requires the following arguments: the device size, zoned profile,

"Zoned profile" is gone in v3.

> zone model, zone size, zone capacity, number of conventional
> zones, limits on zone resources (max append sectors, max open
> zones, and max_active_zones).
>
> To create a qcow2 file with zoned format, use command like this:
> $ qemu-img create -f qcow2 test.qcow2 -o size=768M -o
> zone_size=64M -o zone_capacity=64M -o nr_conv_zones=0 -o
> max_append_sectors=512 -o max_open_zones=0 -o max_active_zones=0
> -o zone_model=1
>
> Signed-off-by: Sam Li 

[...]

> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index 2b1d493d6e..0d8f9e0a88 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -5021,6 +5021,27 @@
>  # @compression-type: The image cluster compression method
>  # (default: zlib, since 5.1)
>  #
> +# @zone-model: Zoned device model, 1 for host-managed and 0 for

Why is this encoded as a number?

If it's fundamentally a flag, use bool.

If more models could appear in the future, make it an enum.

> +# non-zoned devices (default: 0, since 8.0)

Since 8.2.  More of the same below.

> +#
> +# @zone-size: Total number of logical blocks within zones in bytes
> +# (since 8.0)
> +#
> +# @zone-capacity: The number of usable logical blocks within zones
> +# in bytes. A zone capacity is always smaller or equal to the
> +# zone size. (since 8.0)

Two spaces between sentences for consistency, please.

> +#
> +# @nr-conv-zones: The number of conventional zones of the zoned device
> +# (since 8.0)

I think @conventional-zones would be more obvious.

> +#
> +# @max-open-zones: The maximal allowed open zones (since 8.0)

Maybe "The maximum number of open zones".

> +#
> +# @max-active-zones: The limit of the zones that have the implicit
> +# open, explicit open or closed state (since 8.0)

Maybe "The maximum number of zones in the implicit open, explicit open
or closed state".

> +#
> +# @max-append-sectors: The maximal data size in sectors of a zone
> +# append request that can be issued to the device. (since 8.0)

What's the sector size, and how can the user determine it?  Why can't we
use bytes here?
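
For reference, the block layer's sector unit is fixed at 512 bytes (a
paraphrase from include/block/block-common.h; verify against the tree):

    #define BDRV_SECTOR_BITS 9
    #define BDRV_SECTOR_SIZE (1ULL << BDRV_SECTOR_BITS)   /* 512 */

which is presumably why a byte-based limit would be the clearer
interface here.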

> +#
>  # Since: 2.12
>  ##
>  { 'struct': 'BlockdevCreateOptionsQcow2',
> @@ -5037,7 +5058,14 @@
>  '*preallocation':   'PreallocMode',
>  '*lazy-refcounts':  'bool',
>  '*refcount-bits':   'int',
> -'*compression-type':'Qcow2CompressionType' } }
> +'*compression-type':'Qcow2CompressionType',
> +'*zone-model': 'uint8',
> +'*zone-size':  'size',
> +'*zone-capacity':  'size',
> +'*nr-conv-zones':  'uint32',
> +'*max-open-zones': 'uint32',
> +'*max-active-zones':   'uint32',
> +'*max-append-sectors': 'uint32' } }
>  
>  ##
>  # @BlockdevCreateOptionsQed:




Re: [PATCH 02/11] target/arm: Clean up local variable shadowing

2023-09-01 Thread Peter Maydell
On Thu, 31 Aug 2023 at 23:56, Philippe Mathieu-Daudé  wrote:
>
> Fix:
>
>   target/arm/tcg/translate-m-nocp.c:509:18: error: declaration shadows a 
> local variable [-Werror,-Wshadow]
> TCGv_i32 tmp = load_cpu_field(v7m.fpdscr[M_REG_NS]);
>  ^
>   target/arm/tcg/translate-m-nocp.c:433:14: note: previous declaration is here
> TCGv_i32 tmp;
>  ^
>   target/arm/tcg/mve_helper.c:2463:17: error: declaration shadows a local 
> variable [-Werror,-Wshadow]
> int64_t extval = sextract64(src << shift, 0, 48);
> ^
>   target/arm/tcg/mve_helper.c:2443:18: note: previous declaration is here
> int64_t val, extval;
>  ^
>   target/arm/hvf/hvf.c:1936:13: error: declaration shadows a local variable 
> [-Werror,-Wshadow]
> int ret = 0;
> ^
>   target/arm/hvf/hvf.c:1807:9: note: previous declaration is here
> int ret;
> ^
>
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  target/arm/hvf/hvf.c  | 1 -
>  target/arm/tcg/mve_helper.c   | 8 
>  target/arm/tcg/translate-m-nocp.c | 2 +-
>  3 files changed, 5 insertions(+), 6 deletions(-)
>
> diff --git a/target/arm/hvf/hvf.c b/target/arm/hvf/hvf.c
> index 486f90be1d..20d534faef 100644
> --- a/target/arm/hvf/hvf.c
> +++ b/target/arm/hvf/hvf.c
> @@ -1933,7 +1933,6 @@ int hvf_vcpu_exec(CPUState *cpu)
>  uint32_t rt = (syndrome >> 5) & 0x1f;
>  uint32_t reg = syndrome & SYSREG_MASK;
>  uint64_t val;
> -int ret = 0;
>
>  if (isread) {
>  ret = hvf_sysreg_read(cpu, reg, rt);

I'm not sure this is correct.

The hvf_vcpu_exec() function is not documented, but in practice
its caller expects it to return either EXCP_DEBUG (for "this was
a guest debug exception you need to deal with") or something else
(presumably the intention being 0 for OK).

The hvf_sysreg_read() and hvf_sysreg_write() functions are also not
documented, but they return 0 on success, or 1 for a completely
unrecognized sysreg where we've raised the UNDEF exception (but
not if we raised an UNDEF exception for an unrecognized GIC sysreg --
I think this is a bug). We use this return value to decide whether
we need to advance the PC past the insn or not. It's not the same
as the return value we want to return from hvf_vcpu_exec().

So I think the correct fix here is to retain the variable as
locally scoped but give it a name that doesn't clash with the
other function-scoped variable.
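
Something like this sketch (hypothetical name):

    uint32_t rt = (syndrome >> 5) & 0x1f;
    uint32_t reg = syndrome & SYSREG_MASK;
    uint64_t val;
    int sysreg_ret = 0;   /* renamed so it no longer shadows 'ret' */

    if (isread) {
        sysreg_ret = hvf_sysreg_read(cpu, reg, rt);
    }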

> diff --git a/target/arm/tcg/mve_helper.c b/target/arm/tcg/mve_helper.c
> index 403b345ea3..32087b6f0a 100644
> --- a/target/arm/tcg/mve_helper.c
> +++ b/target/arm/tcg/mve_helper.c
> @@ -924,8 +924,8 @@ DO_1OP_IMM(vorri, DO_ORRI)
>  bool qc = false;\
>  for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {  \
>  bool sat = false;   \
> -TYPE r = FN(n[H##ESIZE(e)], m[H##ESIZE(e)], &sat);  \
> -mergemask(&d[H##ESIZE(e)], r, mask);\
> +TYPE r_ = FN(n[H##ESIZE(e)], m[H##ESIZE(e)], &sat); \
> +mergemask(&d[H##ESIZE(e)], r_, mask);   \
>  qc |= sat & mask & 1;   \
>  }   \
>  if (qc) {   \

The commit message doesn't list an error message relating to
this change and it's not immediately obvious to me what 'r'
would be shadowing here.

> @@ -2460,7 +2460,7 @@ static inline int64_t do_sqrshl48_d(int64_t src, 
> int64_t shift,
>  return extval;
>  }
>  } else if (shift < 48) {
> -int64_t extval = sextract64(src << shift, 0, 48);
> +extval = sextract64(src << shift, 0, 48);
>  if (!sat || src == (extval >> shift)) {
>  return extval;
>  }
> @@ -2492,7 +2492,7 @@ static inline uint64_t do_uqrshl48_d(uint64_t src, 
> int64_t shift,
>  return extval;
>  }
>  } else if (shift < 48) {
> -uint64_t extval = extract64(src << shift, 0, 48);
> +extval = extract64(src << shift, 0, 48);
>  if (!sat || src == (extval >> shift)) {
>  return extval;
>  }

These two parts are good, but one of them is missing from the
listed error messages in the commit message.

> diff --git a/target/arm/tcg/translate-m-nocp.c 
> b/target/arm/tcg/translate-m-nocp.c
> index 33f6478bb9..42308c4db5 100644
> --- a/target/arm/tcg/translate-m-nocp.c
> +++ b/target/arm/tcg/translate-m-nocp.c
> @@ -506,7 +506,7 @@ static bool gen_M_fp_sysreg_read(DisasContext *s, int 
> regno,
>
>  gen_branch_fpInactive(s, TCG_COND_EQ, lab_active);
>  /* fpInactive case: reads as FPDSCR_NS */
> -TCGv_i32 tmp = load_cpu_field(v7m.fpdscr[M_REG_NS]);
> +tmp = 

Re: [PATCH v3 10/20] hw/virtio: add config support to vhost-user-device

2023-09-01 Thread Erik Schilling
On Thu Aug 31, 2023 at 5:47 PM CEST, Alex Bennée wrote:
>
> Albert Esteve  writes:
>
> > Sorry to bring up this post, it's been a while since you posted.
> > But I have been testing the patch the last couple of days.
> >
> > On Mon, Jul 10, 2023 at 9:58 PM Michael S. Tsirkin  wrote:
> >
> >  On Mon, Jul 10, 2023 at 04:35:12PM +0100, Alex Bennée wrote:
> >  > To use the generic device the user will need to provide the config
> >  > region size via the command line. We also add a notifier so the guest
> >  > can be pinged if the remote daemon updates the config.
> >  > 
> >  > With these changes:
> >  > 
> >  >   -device vhost-user-device-pci,virtio-id=41,num_vqs=2,config_size=8
> >  > 
> >  > is equivalent to:
> >  > 
> >  >   -device vhost-user-gpio-pci
> >  > 
> >  > Signed-off-by: Alex Bennée 
> >
> >  This one I think it's best to defer until we get a better
> >  handle on how we want the configuration to look.
> >
> >  > ---
> >  >  include/hw/virtio/vhost-user-device.h |  1 +
> >  >  hw/virtio/vhost-user-device.c | 58 ++-
> >  >  2 files changed, 58 insertions(+), 1 deletion(-)
> >  > 
> >  > diff --git a/include/hw/virtio/vhost-user-device.h 
> > b/include/hw/virtio/vhost-user-device.h
> >  > index 9105011e25..3ddf88a146 100644
> >  > --- a/include/hw/virtio/vhost-user-device.h
> >  > +++ b/include/hw/virtio/vhost-user-device.h
> >  > @@ -22,6 +22,7 @@ struct VHostUserBase {
> >  >  CharBackend chardev;
> >  >  uint16_t virtio_id;
> >  >  uint32_t num_vqs;
> >  > +uint32_t config_size;
> >  >  /* State tracking */
> >  >  VhostUserState vhost_user;
> >  >  struct vhost_virtqueue *vhost_vq;
> >  > diff --git a/hw/virtio/vhost-user-device.c 
> > b/hw/virtio/vhost-user-device.c
> >  > index b0239fa033..2b028cae08 100644
> >  > --- a/hw/virtio/vhost-user-device.c
> >  > +++ b/hw/virtio/vhost-user-device.c
> >  > @@ -117,6 +117,42 @@ static uint64_t vub_get_features(VirtIODevice *vdev,
> >  >  return vub->vhost_dev.features & ~(1ULL << 
> > VHOST_USER_F_PROTOCOL_FEATURES);
> >  >  }
> >  >  
> >  > +/*
> >  > + * To handle VirtIO config we need to know the size of the config
> >  > + * space. We don't cache the config but re-fetch it from the guest
> >  > + * every time in case something has changed.
> >  > + */
> >  > +static void vub_get_config(VirtIODevice *vdev, uint8_t *config)
> >  > +{
> >  > +VHostUserBase *vub = VHOST_USER_BASE(vdev);
> >  > +Error *local_err = NULL;
> >  > +
> >  > +/*
> >  > + * There will have been a warning during vhost_dev_init, but lets
> >  > + * assert here as nothing will go right now.
> >  > + */
> >  > +g_assert(vub->config_size && vub->vhost_user.supports_config == 
> > true);
> >  > +
> >  > +if (vhost_dev_get_config(&vub->vhost_dev, config,
> >  > + vub->config_size, &local_err)) {
> >  > +error_report_err(local_err);
> >  > +}
> >  > +}
> >  > +
> >  > +/*
> >  > + * When the daemon signals an update to the config we just need to
> >  > + * signal the guest as we re-read the config on demand above.
> >  > + */
> >  > +static int vub_config_notifier(struct vhost_dev *dev)
> >  > +{
> >  > +virtio_notify_config(dev->vdev);
> >  > +return 0;
> >  > +}
> >  > +
> >  > +const VhostDevConfigOps vub_config_ops = {
> >  > +.vhost_dev_config_notifier = vub_config_notifier,
> >  > +};
> >  > +
> >  >  static void vub_handle_output(VirtIODevice *vdev, VirtQueue *vq)
> >  >  {
> >  >  /*
> >  > @@ -141,12 +177,21 @@ static int vub_connect(DeviceState *dev)
> >  >  {
> >  >  VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> >  >  VHostUserBase *vub = VHOST_USER_BASE(vdev);
> >  > +struct vhost_dev *vhost_dev = &vub->vhost_dev;
> >  >  
> >  >  if (vub->connected) {
> >  >  return 0;
> >  >  }
> >  >  vub->connected = true;
> >  >  
> >  > +/*
> >  > + * If we support VHOST_USER_GET_CONFIG we must enable the notifier
> >  > + * so we can ping the guest when it updates.
> >  > + */
> >  > +if (vub->vhost_user.supports_config) {
> >  > +vhost_dev_set_config_notifier(vhost_dev, &vub_config_ops);
> >  > +}
> >  > +
> >  >  /* restore vhost state */
> >  >  if (virtio_device_started(vdev, vdev->status)) {
> >  >  vub_start(vdev);
> >  > @@ -214,11 +259,20 @@ static void vub_device_realize(DeviceState *dev, 
> > Error **errp)
> >  >  vub->num_vqs = 1; /* reasonable default? */
> >  >  }
> >  >  
> >  > +/*
> >  > + * We can't handle config requests unless we know the size of the
> >  > + * config region, specialisations of the vhost-user-device will be
> >  > + * able to set this.
> >  > + */
> >  > +if (vub->config_size) {
> >  > +vub->vhost_user.supports_config = true;
> >  > +}
> >
> > Shouldn't the `supports_config = true' be set before we call 
> > vhost_dev_init() a few lines above?
> > Otherwise, we end up che

Re: [PATCH] qemu-img: Update documentation for compressed images

2023-09-01 Thread Richard W.M. Jones
On Fri, Sep 01, 2023 at 12:24:30PM +0200, Kevin Wolf wrote:
> Document the 'compression_type' option for qcow2, and mention that
> streamOptimized vmdk supports compression, too.
> 
> Reported-by: Richard W.M. Jones 
> Signed-off-by: Kevin Wolf 

Looks good, so:

Reviewed-by: Richard W.M. Jones 

> ---
>  docs/tools/qemu-img.rst | 19 +--
>  1 file changed, 17 insertions(+), 2 deletions(-)
> 
> diff --git a/docs/tools/qemu-img.rst b/docs/tools/qemu-img.rst
> index 15aeddc6d8..ca5a2773cf 100644
> --- a/docs/tools/qemu-img.rst
> +++ b/docs/tools/qemu-img.rst
> @@ -106,7 +106,11 @@ by the used format or see the format descriptions below 
> for details.
>  
>  .. option:: -c
>  
> -  Indicates that target image must be compressed (qcow format only).
> +  Indicates that target image must be compressed (qcow/qcow2 and vmdk with
> +  streamOptimized subformat only).
> +
> +  For qcow2, the compression algorithm can be specified with the ``-o
> +  compression_type=...`` option (see below).
>  
>  .. option:: -h
>  
> @@ -776,7 +780,7 @@ Supported image file formats:
>  
>QEMU image format, the most versatile format. Use it to have smaller
>images (useful if your filesystem does not supports holes, for example
> -  on Windows), optional AES encryption, zlib based compression and
> +  on Windows), optional AES encryption, zlib or zstd based compression and
>support of multiple VM snapshots.
>  
>Supported options:
> @@ -794,6 +798,17 @@ Supported image file formats:
>``backing_fmt``
>  Image format of the base image
>  
> +  ``compression_type``
> +This option configures which compression algorithm will be used for
> +compressed clusters on the image. Note that setting this option doesn't 
> yet
> +cause the image to actually receive compressed writes. It is most 
> commonly
> +used with the ``-c`` option of ``qemu-img convert``, but can also be used
> +with the ``compress`` filter driver or backup block jobs with compression
> +enabled.
> +
> +Valid values are ``zlib`` and ``zstd``. For images that use
> +``compat=0.10``, only ``zlib`` compression is available.
> +
>``encryption``
>  If this option is set to ``on``, the image is encrypted with
>  128-bit AES-CBC.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
libguestfs lets you edit virtual machines.  Supports shell scripting,
bindings from many languages.  http://libguestfs.org




Re: zlib-ng as a compat replacement for zlib

2023-09-01 Thread Richard W.M. Jones
On Fri, Sep 01, 2023 at 11:06:20AM +0100, Daniel P. Berrangé wrote:
> On Fri, Sep 01, 2023 at 11:03:54AM +0100, Richard W.M. Jones wrote:
> > I forgot to say that nbdkit is using zlib-ng, since I made the source
> > level changes a few weeks back (but most of the nbdkit performance
> > improvement comes from being able to use lots of threads).
> 
> Ah that last point is interesting. If we look at nbdkit results we can
> see that while zstd is clearly faster, the margin of the win is massively
> lower. So I presume we can infer similar margins if qemu-img were
> switched too.

FWIW here's the nbdkit commit which added zlib-ng support (assuming
you don't want to wait for a compat package).  It's not a massive
change, so something similar might be done for qemu:

https://gitlab.com/nbdkit/nbdkit/-/commit/1b67e323e998a5d719f1afe43d5be84e45c6739b
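
For qemu the change would presumably take the same shape; a rough
sketch only (the HAVE_ZLIB_NG guard is hypothetical and would have to
come from meson, and the wrapper names are made up):

#ifdef HAVE_ZLIB_NG
#include <zlib-ng.h>
typedef zng_stream qcow2_zstream;
#define qcow2_inflate_init(s) zng_inflateInit2((s), -12) /* raw deflate */
#define qcow2_inflate(s)      zng_inflate((s), Z_NO_FLUSH)
#define qcow2_inflate_end(s)  zng_inflateEnd(s)
#else
#include <zlib.h>
typedef z_stream qcow2_zstream;
#define qcow2_inflate_init(s) inflateInit2((s), -12)
#define qcow2_inflate(s)      inflate((s), Z_NO_FLUSH)
#define qcow2_inflate_end(s)  inflateEnd(s)
#endif

The decompression code would then call the qcow2_* wrappers everywhere
instead of the zlib functions directly.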

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
libguestfs lets you edit virtual machines.  Supports shell scripting,
bindings from many languages.  http://libguestfs.org




[PATCH] qemu-img: Update documentation for compressed images

2023-09-01 Thread Kevin Wolf
Document the 'compression_type' option for qcow2, and mention that
streamOptimized vmdk supports compression, too.

Reported-by: Richard W.M. Jones 
Signed-off-by: Kevin Wolf 
---
 docs/tools/qemu-img.rst | 19 +--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/docs/tools/qemu-img.rst b/docs/tools/qemu-img.rst
index 15aeddc6d8..ca5a2773cf 100644
--- a/docs/tools/qemu-img.rst
+++ b/docs/tools/qemu-img.rst
@@ -106,7 +106,11 @@ by the used format or see the format descriptions below 
for details.
 
 .. option:: -c
 
-  Indicates that target image must be compressed (qcow format only).
+  Indicates that target image must be compressed (qcow/qcow2 and vmdk with
+  streamOptimized subformat only).
+
+  For qcow2, the compression algorithm can be specified with the ``-o
+  compression_type=...`` option (see below).
 
 .. option:: -h
 
@@ -776,7 +780,7 @@ Supported image file formats:
 
   QEMU image format, the most versatile format. Use it to have smaller
   images (useful if your filesystem does not supports holes, for example
-  on Windows), optional AES encryption, zlib based compression and
+  on Windows), optional AES encryption, zlib or zstd based compression and
   support of multiple VM snapshots.
 
   Supported options:
@@ -794,6 +798,17 @@ Supported image file formats:
   ``backing_fmt``
 Image format of the base image
 
+  ``compression_type``
+This option configures which compression algorithm will be used for
+compressed clusters on the image. Note that setting this option doesn't yet
+cause the image to actually receive compressed writes. It is most commonly
+used with the ``-c`` option of ``qemu-img convert``, but can also be used
+with the ``compress`` filter driver or backup block jobs with compression
+enabled.
+
+Valid values are ``zlib`` and ``zstd``. For images that use
+``compat=0.10``, only ``zlib`` compression is available.
+
   ``encryption``
 If this option is set to ``on``, the image is encrypted with
 128-bit AES-CBC.
-- 
2.41.0




Re: zlib-ng as a compat replacement for zlib

2023-09-01 Thread Daniel P . Berrangé
On Fri, Sep 01, 2023 at 11:03:54AM +0100, Richard W.M. Jones wrote:
> On Fri, Sep 01, 2023 at 10:55:50AM +0100, Daniel P. Berrangé wrote:
> > On Fri, Sep 01, 2023 at 10:42:16AM +0100, Richard W.M. Jones wrote:
> > > On Fri, Sep 01, 2023 at 10:48:14AM +0200, Kevin Wolf wrote:
> > > > Am 31.08.2023 um 11:20 hat Richard W.M. Jones geschrieben:
> > > > > On Thu, Aug 31, 2023 at 11:05:55AM +0200, Kevin Wolf wrote:
> > > > > > [ Cc: qemu-block ]
> > > > > > 
> > > > > > Am 30.08.2023 um 20:26 hat Richard W.M. Jones geschrieben:
> > > > > > > On Tue, Aug 29, 2023 at 05:49:24PM -, Daniel Alley wrote:
> > > > > > > > > The background to this is I've spent far too long trying to 
> > > > > > > > > optimize
> > > > > > > > > the conversion of qcow2 files to raw files.  Most existing 
> > > > > > > > > qcow2 files
> > > > > > > > > that you can find online are zlib compressed, including the 
> > > > > > > > > qcow2
> > > > > > > > > images provided by Fedora.  Each cluster in the file is 
> > > > > > > > > separately
> > > > > > > > > compressed as a zlib stream, and qemu uses zlib library 
> > > > > > > > > functions to
> > > > > > > > > decompress them.  When downloading and decompressing these 
> > > > > > > > > files, I
> > > > > > > > > measured 40%+ of the total CPU time is doing zlib 
> > > > > > > > > decompression.
> > > > > > > > > 
> > > > > > > > > [You don't need to tell me how great Zstd is, qcow2 supports 
> > > > > > > > > this for
> > > > > > > > > compression also, but it is not widely used by existing 
> > > > > > > > > content.]
> > > > > > 
> > > > > > You make it sound like compressing each cluster individually has a 
> > > > > > big
> > > > > > impact. If so, does increasing the cluster size make a difference, 
> > > > > > too?
> > > > > > That could be a change with fewer compatibility concerns.
> > > > > 
> > > > > The issue we're discussing in the original thread is speed of
> > > > > decompression.  We noted that using zlib-ng (a not-quite drop-in
> > > > > replacement for zlib) improves decompression speed by 40% or more.
> > > > > 
> > > > > Original thread:
> > > > > https://lists.fedoraproject.org/archives/list/de...@lists.fedoraproject.org/thread/CDNPJ4SOTRQMYVCDI3ZSY4SP4FYESCWD/
> > > > > zlib-ng proposed change:
> > > > > https://src.fedoraproject.org/rpms/zlib-ng/pull-request/3
> > > > > 
> > > > > Size of the compressed file is also a concern, but wasn't discussed.
> > > > 
> > > > I understand the context and didn't really think about file size at all.
> > > > 
> > > > My question was essentially if decompressing many small blocks (as we do
> > > > today) performs significantly different from decompressing fewer larger
> > > > blocks (as we would do with a larger cluster size).
> > > 
> > > I did a quick test just by adjusting the cluster size of a qcow2 file:
> > > 
> > >   $ virt-builder fedora-36
> > >   $ ls -lsh fedora-36.img 
> > >   1.2G -rw-r--r--. 1 rjones rjones 6.0G Sep  1 09:53 fedora-36.img
> > >   $ cat fedora-36.img fedora-36.img fedora-36.img fedora-36.img  > 
> > > test.raw
> > >   $ ls -lsh test.raw 
> > >   4.7G -rw-r--r--. 1 rjones rjones 24G Sep  1 09:53 test.raw
> > >   $ qemu-img convert -f raw test.raw -O qcow2 test.qcow2.zlib.4k -c -o 
> > > compression_type=zlib,cluster_size=4096
> > > 
> > > (for cluster sizes 4k, 64k, 512k, 2048k, and
> > > compression types zlib & zstd)
> > > 
> > > I tested the speed of decompression using:
> > > 
> > >   $ hyperfine 'qemu-img convert -W -m 16 -f qcow2 test.qcow2.XXX -O raw 
> > > test.out'
> > >   (qemu 8.0.0-4.fc39.x86_64)
> > > 
> > >   $ hyperfine 'nbdkit -U - --filter=qcow2dec file test.qcow2.XXX --run 
> > > '\''nbdcopy --request-size "$uri" test.out'\'' '
> > >   (nbdkit-1.35.11-2.fc40.x86_64)
> > > 
> > > Results:
> > > 
> > >   Cluster  Compression  Compressed size  Prog   Decompression speed
> > > 
> > >   4k   zlib 3228811264   qemu   5.921 s ±  0.074 s
> > >   4k   zstd 3258097664   qemu   5.189 s ±  0.158 s
> > > 
> > >   4k   zlib/zstd nbdkit failed, bug!!
> > > 
> > >   64k  zlib 3164667904   qemu   3.579 s ±  0.094 s
> > >   64k  zstd 3132686336   qemu   1.770 s ±  0.060 s
> > > 
> > >   64k  zlib 3164667904   nbdkit 1.254 s ±  0.065 s
> > >   64k  zstd 3132686336   nbdkit 1.315 s ±  0.037 s
> > > 
> > >   512k zlib 3158744576   qemu   4.008 s ±  0.058 s
> > >   512k zstd 3032697344   qemu   1.503 s ±  0.072 s
> > > 
> > >   512k zlib 3158744576   nbdkit 1.702 s ±  0.026 s
> > >   512k zstd 3032697344   nbdkit 1.593 s ±  0.039 s
> > > 
> > >   2048k zlib 3197569024   qemu   4.327 s ±  0.051 s
> > >   2048k zstd 2995143168   qemu   1.465 s ±  0.085 s
> > > 
> > >   2048k zlib 3197569024   nbdkit 3.660 s ±  0.011 s
> > >   2048k zstd 2995143168   nbdkit 3.368 s ±  0.057 s

Re: zlib-ng as a compat replacement for zlib

2023-09-01 Thread Richard W.M. Jones
On Fri, Sep 01, 2023 at 10:55:50AM +0100, Daniel P. Berrangé wrote:
> On Fri, Sep 01, 2023 at 10:42:16AM +0100, Richard W.M. Jones wrote:
> > On Fri, Sep 01, 2023 at 10:48:14AM +0200, Kevin Wolf wrote:
> > > Am 31.08.2023 um 11:20 hat Richard W.M. Jones geschrieben:
> > > > On Thu, Aug 31, 2023 at 11:05:55AM +0200, Kevin Wolf wrote:
> > > > > [ Cc: qemu-block ]
> > > > > 
> > > > > Am 30.08.2023 um 20:26 hat Richard W.M. Jones geschrieben:
> > > > > > On Tue, Aug 29, 2023 at 05:49:24PM -, Daniel Alley wrote:
> > > > > > > > The background to this is I've spent far too long trying to 
> > > > > > > > optimize
> > > > > > > > the conversion of qcow2 files to raw files.  Most existing 
> > > > > > > > qcow2 files
> > > > > > > > that you can find online are zlib compressed, including the 
> > > > > > > > qcow2
> > > > > > > > images provided by Fedora.  Each cluster in the file is 
> > > > > > > > separately
> > > > > > > > compressed as a zlib stream, and qemu uses zlib library 
> > > > > > > > functions to
> > > > > > > > decompress them.  When downloading and decompressing these 
> > > > > > > > files, I
> > > > > > > > measured 40%+ of the total CPU time is doing zlib decompression.
> > > > > > > > 
> > > > > > > > [You don't need to tell me how great Zstd is, qcow2 supports 
> > > > > > > > this for
> > > > > > > > compression also, but it is not widely used by existing 
> > > > > > > > content.]
> > > > > 
> > > > > You make it sound like compressing each cluster individually has a big
> > > > > impact. If so, does increasing the cluster size make a difference, 
> > > > > too?
> > > > > That could be a change with fewer compatibility concerns.
> > > > 
> > > > The issue we're discussing in the original thread is speed of
> > > > decompression.  We noted that using zlib-ng (a not-quite drop-in
> > > > replacement for zlib) improves decompression speed by 40% or more.
> > > > 
> > > > Original thread:
> > > > https://lists.fedoraproject.org/archives/list/de...@lists.fedoraproject.org/thread/CDNPJ4SOTRQMYVCDI3ZSY4SP4FYESCWD/
> > > > zlib-ng proposed change:
> > > > https://src.fedoraproject.org/rpms/zlib-ng/pull-request/3
> > > > 
> > > > Size of the compressed file is also a concern, but wasn't discussed.
> > > 
> > > I understand the context and didn't really think about file size at all.
> > > 
> > > My question was essentially if decompressing many small blocks (as we do
> > > today) performs significantly different from decompressing fewer larger
> > > blocks (as we would do with a larger cluster size).
> > 
> > I did a quick test just by adjusting the cluster size of a qcow2 file:
> > 
> >   $ virt-builder fedora-36
> >   $ ls -lsh fedora-36.img 
> >   1.2G -rw-r--r--. 1 rjones rjones 6.0G Sep  1 09:53 fedora-36.img
> >   $ cat fedora-36.img fedora-36.img fedora-36.img fedora-36.img  > test.raw
> >   $ ls -lsh test.raw 
> >   4.7G -rw-r--r--. 1 rjones rjones 24G Sep  1 09:53 test.raw
> >   $ qemu-img convert -f raw test.raw -O qcow2 test.qcow2.zlib.4k -c -o 
> > compression_type=zlib,cluster_size=4096
> > 
> > (for cluster sizes 4k, 64k, 512k, 2048k, and
> > compression types zlib & zstd)
> > 
> > I tested the speed of decompression using:
> > 
> >   $ hyperfine 'qemu-img convert -W -m 16 -f qcow2 test.qcow2.XXX -O raw 
> > test.out'
> >   (qemu 8.0.0-4.fc39.x86_64)
> > 
> >   $ hyperfine 'nbdkit -U - --filter=qcow2dec file test.qcow2.XXX --run 
> > '\''nbdcopy --request-size "$uri" test.out'\'' '
> >   (nbdkit-1.35.11-2.fc40.x86_64)
> > 
> > Results:
> > 
> >   Cluster  Compression  Compressed size  Prog   Decompression speed
> > 
> >   4k   zlib 3228811264   qemu   5.921 s ±  0.074 s
> >   4k   zstd 3258097664   qemu   5.189 s ±  0.158 s
> > 
> >   4k   zlib/zstd nbdkit failed, bug!!
> > 
> >   64k  zlib 3164667904   qemu   3.579 s ±  0.094 s
> >   64k  zstd 3132686336   qemu   1.770 s ±  0.060 s
> > 
> >   64k  zlib 3164667904   nbdkit 1.254 s ±  0.065 s
> >   64k  zstd 3132686336   nbdkit 1.315 s ±  0.037 s
> > 
> >   512k zlib 3158744576   qemu   4.008 s ±  0.058 s
> >   512k zstd 3032697344   qemu   1.503 s ±  0.072 s
> > 
> >   512k zlib 3158744576   nbdkit 1.702 s ±  0.026 s
> >   512k zstd 3032697344   nbdkit 1.593 s ±  0.039 s
> > 
> >   2048k zlib 3197569024   qemu   4.327 s ±  0.051 s
> >   2048k zstd 2995143168   qemu   1.465 s ±  0.085 s
> > 
> >   2048k zlib 3197569024   nbdkit 3.660 s ±  0.011 s
> >   2048k zstd 2995143168   nbdkit 3.368 s ±  0.057 s
> > 
> > No great surprises - very small cluster size is inefficient, but
> > scaling up the cluster size gains performance, and zstd performs better
> > than zlib once the cluster size is sufficiently large.
> 
> The default qcow2 cluster size is 64k, which means we've already
> got the vast maj

Re: [RFC PATCH v3 20/20] hw/virtio: allow vhost-user-device to be driven by backend

2023-09-01 Thread Albert Esteve
On Mon, Jul 10, 2023 at 6:44 PM Alex Bennée  wrote:

> Instead of requiring all the information up front allow the
> vhost_dev_init to complete and then see what information we have from
> the backend.
>
> This does change the order around somewhat.
>
> Signed-off-by: Alex Bennée 
> ---
>  hw/virtio/vhost-user-device.c | 45 +--
>  1 file changed, 32 insertions(+), 13 deletions(-)
>
> diff --git a/hw/virtio/vhost-user-device.c b/hw/virtio/vhost-user-device.c
> index 0109d4829d..b30b6265fb 100644
> --- a/hw/virtio/vhost-user-device.c
> +++ b/hw/virtio/vhost-user-device.c
> @@ -243,7 +243,6 @@ static void vub_device_realize(DeviceState *dev, Error
> **errp)
>  {
>  VirtIODevice *vdev = VIRTIO_DEVICE(dev);
>  VHostUserBase *vub = VHOST_USER_BASE(dev);
> -int ret;
>
>  if (!vub->chardev.chr) {
>  error_setg(errp, "vhost-user-device: missing chardev");
> @@ -254,13 +253,43 @@ static void vub_device_realize(DeviceState *dev,
> Error **errp)
>  return;
>  }
>
> +if (vhost_dev_init(&vub->vhost_dev, &vub->vhost_user,
> +   VHOST_BACKEND_TYPE_USER, 0, errp)!=0) {
> +error_setg(errp, "vhost-user-device: unable to start connection");
> +return;
> +}
> +
> +if (vub->vhost_dev.specs.device_id) {
> +if (vub->virtio_id && vub->virtio_id !=
> vub->vhost_dev.specs.device_id) {
> +error_setg(errp, "vhost-user-device: backend id %d doesn't
> match cli %d",
> +   vub->vhost_dev.specs.device_id, vub->virtio_id);
> +return;
> +}
> +vub->virtio_id = vub->vhost_dev.specs.device_id;
> +}
> +
>  if (!vub->virtio_id) {
> -error_setg(errp, "vhost-user-device: need to define device id");
> +error_setg(errp, "vhost-user-device: need to define or be told
> device id");
>  return;
>  }
>
> +if (vub->vhost_dev.specs.min_vqs) {
> +if (vub->num_vqs) {
> +if (vub->num_vqs < vub->vhost_dev.specs.min_vqs ||
> +vub->num_vqs > vub->vhost_dev.specs.max_vqs) {
> +error_setg(errp,
> +   "vhost-user-device: selected nvqs (%d) out of
> bounds (%d->%d)",
> +   vub->num_vqs,
> +   vub->vhost_dev.specs.min_vqs,
> vub->vhost_dev.specs.max_vqs);
> +return;
> +}
> +} else {
> +vub->num_vqs = vub->vhost_dev.specs.min_vqs;
> +}
> +}
> +
>  if (!vub->num_vqs) {
> -vub->num_vqs = 1; /* reasonable default? */
> +error_setg(errp, "vhost-user-device: need to define number of
> vqs");
>  }
>
>  /*
> @@ -287,16 +316,6 @@ static void vub_device_realize(DeviceState *dev,
> Error **errp)
>  virtio_add_queue(vdev, 4, vub_handle_output));
>  }
>
> -vub->vhost_dev.nvqs = vub->num_vqs;
>

Who sets `vub->vhost_dev.nvqs` after removing this line?
Why have `vub->num_vqs` in the first place? In vub_start, for example, we
still use `vub->vhost_dev.nvqs`, and we pass `vhost_dev` to other functions
that use its `nvqs`, so `num_vqs` is redundant and requires extra logic to
copy/initialise `vhost_dev.nvqs`.

Maybe it would be better to initialise `nvqs` through a function in the
device file, so that instead of doing:
`vub->num_vqs = 2;`
we could have:
`vub_set_nvqs(vub, 2);`
or something along those lines, as sketched below. The function would then
contain all the internal logic in this commit, i.e. checking the bounds,
setting the `vhost_dev.nvqs` value, reporting the error, etc. That way we
save the extra variable and the logic to copy the value into the device.
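
A rough, untested sketch of what I mean (helper name and exact
signature just for illustration):

static bool vub_set_nvqs(VHostUserBase *vub, int nvqs, Error **errp)
{
    struct vhost_dev *vhost_dev = &vub->vhost_dev;

    /* bounds come from the backend, if it reported any */
    if (vhost_dev->specs.min_vqs &&
        (nvqs < vhost_dev->specs.min_vqs ||
         nvqs > vhost_dev->specs.max_vqs)) {
        error_setg(errp,
                   "vhost-user-device: selected nvqs (%d) out of bounds (%d->%d)",
                   nvqs, vhost_dev->specs.min_vqs, vhost_dev->specs.max_vqs);
        return false;
    }
    vhost_dev->nvqs = nvqs;
    return true;
}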


> -
> -/* connect to backend */
> -ret = vhost_dev_init(&vub->vhost_dev, &vub->vhost_user,
> - VHOST_BACKEND_TYPE_USER, 0, errp);
> -
> -if (ret < 0) {
> -do_vhost_user_cleanup(vdev, vub);
> -}
> -
>  qemu_chr_fe_set_handlers(&vub->chardev, NULL, NULL, vub_event, NULL,
>   dev, NULL, true);
>  }
> --
> 2.39.2
>
>
>


Re: zlib-ng as a compat replacement for zlib

2023-09-01 Thread Richard W.M. Jones
On Fri, Sep 01, 2023 at 10:42:16AM +0100, Richard W.M. Jones wrote:
> Results:
> 
>   Cluster  Compression  Compressed size  Prog   Decompression speed
> 
>   4k   zlib 3228811264   qemu   5.921 s ±  0.074 s
>   4k   zstd 3258097664   qemu   5.189 s ±  0.158 s
> 
>   4k   zlib/zstd nbdkit failed, bug!!

Stupid bounds checking bug, fixed now [1]:

4k   zlib 3228811264   nbdkit 1.390 s ±  0.094 s
4k   zstd 3258097664   nbdkit 1.328 s ±  0.055 s

>   64k  zlib 3164667904   qemu   3.579 s ±  0.094 s
>   64k  zstd 3132686336   qemu   1.770 s ±  0.060 s
> 
>   64k  zlib 3164667904   nbdkit 1.254 s ±  0.065 s
>   64k  zstd 3132686336   nbdkit 1.315 s ±  0.037 s
> 
>   512k zlib 3158744576   qemu   4.008 s ±  0.058 s
>   512k zstd 3032697344   qemu   1.503 s ±  0.072 s
> 
>   512k zlib 3158744576   nbdkit 1.702 s ±  0.026 s
>   512k zstd 3032697344   nbdkit 1.593 s ±  0.039 s
> 
>   2048k zlib 3197569024   qemu   4.327 s ±  0.051 s
>   2048k zstd 2995143168   qemu   1.465 s ±  0.085 s
> 
>   2048k zlib 3197569024   nbdkit 3.660 s ±  0.011 s
>   2048k zstd 2995143168   nbdkit 3.368 s ±  0.057 s

[1] 
https://gitlab.com/nbdkit/nbdkit/-/commit/73b58dc13dd7a3bbc7e7b3b718a535ee362caa92

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-builder quickly builds VMs from scratch
http://libguestfs.org/virt-builder.1.html




Re: zlib-ng as a compat replacement for zlib

2023-09-01 Thread Daniel P . Berrangé
On Fri, Sep 01, 2023 at 10:42:16AM +0100, Richard W.M. Jones wrote:
> On Fri, Sep 01, 2023 at 10:48:14AM +0200, Kevin Wolf wrote:
> > Am 31.08.2023 um 11:20 hat Richard W.M. Jones geschrieben:
> > > On Thu, Aug 31, 2023 at 11:05:55AM +0200, Kevin Wolf wrote:
> > > > [ Cc: qemu-block ]
> > > > 
> > > > Am 30.08.2023 um 20:26 hat Richard W.M. Jones geschrieben:
> > > > > On Tue, Aug 29, 2023 at 05:49:24PM -, Daniel Alley wrote:
> > > > > > > The background to this is I've spent far too long trying to 
> > > > > > > optimize
> > > > > > > the conversion of qcow2 files to raw files.  Most existing qcow2 
> > > > > > > files
> > > > > > > that you can find online are zlib compressed, including the qcow2
> > > > > > > images provided by Fedora.  Each cluster in the file is separately
> > > > > > > compressed as a zlib stream, and qemu uses zlib library functions 
> > > > > > > to
> > > > > > > decompress them.  When downloading and decompressing these files, 
> > > > > > > I
> > > > > > > measured 40%+ of the total CPU time is doing zlib decompression.
> > > > > > > 
> > > > > > > [You don't need to tell me how great Zstd is, qcow2 supports this 
> > > > > > > for
> > > > > > > compression also, but it is not widely used by existing content.]
> > > > 
> > > > You make it sound like compressing each cluster individually has a big
> > > > impact. If so, does increasing the cluster size make a difference, too?
> > > > That could be a change with fewer compatibility concerns.
> > > 
> > > The issue we're discussing in the original thread is speed of
> > > decompression.  We noted that using zlib-ng (a not-quite drop-in
> > > replacement for zlib) improves decompression speed by 40% or more.
> > > 
> > > Original thread:
> > > https://lists.fedoraproject.org/archives/list/de...@lists.fedoraproject.org/thread/CDNPJ4SOTRQMYVCDI3ZSY4SP4FYESCWD/
> > > zlib-ng proposed change:
> > > https://src.fedoraproject.org/rpms/zlib-ng/pull-request/3
> > > 
> > > Size of the compressed file is also a concern, but wasn't discussed.
> > 
> > I understand the context and didn't really think about file size at all.
> > 
> > My question was essentially if decompressing many small blocks (as we do
> > today) performs significantly different from decompressing fewer larger
> > blocks (as we would do with a larger cluster size).
> 
> I did a quick test just by adjusting the cluster size of a qcow2 file:
> 
>   $ virt-builder fedora-36
>   $ ls -lsh fedora-36.img 
>   1.2G -rw-r--r--. 1 rjones rjones 6.0G Sep  1 09:53 fedora-36.img
>   $ cat fedora-36.img fedora-36.img fedora-36.img fedora-36.img  > test.raw
>   $ ls -lsh test.raw 
>   4.7G -rw-r--r--. 1 rjones rjones 24G Sep  1 09:53 test.raw
>   $ qemu-img convert -f raw test.raw -O qcow2 test.qcow2.zlib.4k -c -o 
> compression_type=zlib,cluster_size=4096
> 
> (for cluster sizes 4k, 64k, 512k, 2048k, and
> compression types zlib & zstd)
> 
> I tested the speed of decompression using:
> 
>   $ hyperfine 'qemu-img convert -W -m 16 -f qcow2 test.qcow2.XXX -O raw 
> test.out'
>   (qemu 8.0.0-4.fc39.x86_64)
> 
>   $ hyperfine 'nbdkit -U - --filter=qcow2dec file test.qcow2.XXX --run 
> '\''nbdcopy --request-size "$uri" test.out'\'' '
>   (nbdkit-1.35.11-2.fc40.x86_64)
> 
> Results:
> 
>   Cluster  Compression  Compressed size  Prog   Decompression speed
> 
>   4k   zlib 3228811264   qemu   5.921 s ±  0.074 s
>   4k   zstd 3258097664   qemu   5.189 s ±  0.158 s
> 
>   4k   zlib/zstd nbdkit failed, bug!!
> 
>   64k  zlib 3164667904   qemu   3.579 s ±  0.094 s
>   64k  zstd 3132686336   qemu   1.770 s ±  0.060 s
> 
>   64k  zlib 3164667904   nbdkit 1.254 s ±  0.065 s
>   64k  zstd 3132686336   nbdkit 1.315 s ±  0.037 s
> 
>   512k zlib 3158744576   qemu   4.008 s ±  0.058 s
>   512k zstd 3032697344   qemu   1.503 s ±  0.072 s
> 
>   512k zlib 3158744576   nbdkit 1.702 s ±  0.026 s
>   512k zstd 3032697344   nbdkit 1.593 s ±  0.039 s
> 
>   2048k zlib 3197569024   qemu   4.327 s ±  0.051 s
>   2048k zstd 2995143168   qemu   1.465 s ±  0.085 s
> 
>   2048k zlib 3197569024   nbdkit 3.660 s ±  0.011 s
>   2048k zstd 2995143168   nbdkit 3.368 s ±  0.057 s
> 
> No great surprises - very small cluster size is inefficient, but
> scaling up the cluster size gains performance, and zstd performs better
> than zlib once the cluster size is sufficiently large.

The default qcow2 cluster size is 64k, which means we've already
got the vast majority of the performance and file size win. Going
beyond the 64k default doesn't seem massively compelling.

zstd does have a small space win over zlib as expected, but again
nothing so drastic that it seems compelling to change - that win
will be line noise in the overall bigger picture of image storage
and download times.

The major

Re: zlib-ng as a compat replacement for zlib

2023-09-01 Thread Richard W.M. Jones
On Fri, Sep 01, 2023 at 10:42:16AM +0100, Richard W.M. Jones wrote:
>   $ hyperfine 'nbdkit -U - --filter=qcow2dec file test.qcow2.XXX --run 
> '\''nbdcopy --request-size "$uri" test.out'\'' '

Sorry, copy and paste error, the command is:

hyperfine 'nbdkit -U - --filter=qcow2dec file test.qcow2.XXX --run '\''nbdcopy 
"$uri" test.out'\'' '

unless you want to adjust the --request-size parameter to fix
the scaling problem at the largest cluster size.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-p2v converts physical machines to virtual machines.  Boot with a
live CD or over the network (PXE) and turn machines into KVM guests.
http://libguestfs.org/virt-v2v




Re: zlib-ng as a compat replacement for zlib

2023-09-01 Thread Richard W.M. Jones
On Fri, Sep 01, 2023 at 10:48:14AM +0200, Kevin Wolf wrote:
> Am 31.08.2023 um 11:20 hat Richard W.M. Jones geschrieben:
> > On Thu, Aug 31, 2023 at 11:05:55AM +0200, Kevin Wolf wrote:
> > > [ Cc: qemu-block ]
> > > 
> > > Am 30.08.2023 um 20:26 hat Richard W.M. Jones geschrieben:
> > > > On Tue, Aug 29, 2023 at 05:49:24PM -, Daniel Alley wrote:
> > > > > > The background to this is I've spent far too long trying to optimize
> > > > > > the conversion of qcow2 files to raw files.  Most existing qcow2 
> > > > > > files
> > > > > > that you can find online are zlib compressed, including the qcow2
> > > > > > images provided by Fedora.  Each cluster in the file is separately
> > > > > > compressed as a zlib stream, and qemu uses zlib library functions to
> > > > > > decompress them.  When downloading and decompressing these files, I
> > > > > > measured 40%+ of the total CPU time is doing zlib decompression.
> > > > > > 
> > > > > > [You don't need to tell me how great Zstd is, qcow2 supports this 
> > > > > > for
> > > > > > compression also, but it is not widely used by existing content.]
> > > 
> > > You make it sound like compressing each cluster individually has a big
> > > impact. If so, does increasing the cluster size make a difference, too?
> > > That could be a change with fewer compatibility concerns.
> > 
> > The issue we're discussing in the original thread is speed of
> > decompression.  We noted that using zlib-ng (a not-quite drop-in
> > replacement for zlib) improves decompression speed by 40% or more.
> > 
> > Original thread:
> > https://lists.fedoraproject.org/archives/list/de...@lists.fedoraproject.org/thread/CDNPJ4SOTRQMYVCDI3ZSY4SP4FYESCWD/
> > zlib-ng proposed change:
> > https://src.fedoraproject.org/rpms/zlib-ng/pull-request/3
> > 
> > Size of the compressed file is also a concern, but wasn't discussed.
> 
> I understand the context and didn't really think about file size at all.
> 
> My question was essentially if decompressing many small blocks (as we do
> today) performs significantly different from decompressing fewer larger
> blocks (as we would do with a larger cluster size).

I did a quick test just by adjusting the cluster size of a qcow2 file:

  $ virt-builder fedora-36
  $ ls -lsh fedora-36.img 
  1.2G -rw-r--r--. 1 rjones rjones 6.0G Sep  1 09:53 fedora-36.img
  $ cat fedora-36.img fedora-36.img fedora-36.img fedora-36.img  > test.raw
  $ ls -lsh test.raw 
  4.7G -rw-r--r--. 1 rjones rjones 24G Sep  1 09:53 test.raw
  $ qemu-img convert -f raw test.raw -O qcow2 test.qcow2.zlib.4k -c -o 
compression_type=zlib,cluster_size=4096

(for cluster sizes 4k, 64k, 512k, 2048k, and
compression types zlib & zstd)

I tested the speed of decompression using:

  $ hyperfine 'qemu-img convert -W -m 16 -f qcow2 test.qcow2.XXX -O raw 
test.out'
  (qemu 8.0.0-4.fc39.x86_64)

  $ hyperfine 'nbdkit -U - --filter=qcow2dec file test.qcow2.XXX --run 
'\''nbdcopy --request-size "$uri" test.out'\'' '
  (nbdkit-1.35.11-2.fc40.x86_64)

Results:

  Cluster  Compression  Compressed size  Prog   Decompression speed

  4k   zlib 3228811264   qemu   5.921 s ±  0.074 s
  4k   zstd 3258097664   qemu   5.189 s ±  0.158 s

  4k   zlib/zstd nbdkit failed, bug!!

  64k  zlib 3164667904   qemu   3.579 s ±  0.094 s
  64k  zstd 3132686336   qemu   1.770 s ±  0.060 s

  64k  zlib 3164667904   nbdkit 1.254 s ±  0.065 s
  64k  zstd 3132686336   nbdkit 1.315 s ±  0.037 s

  512k zlib 3158744576   qemu   4.008 s ±  0.058 s
  512k zstd 3032697344   qemu   1.503 s ±  0.072 s

  512k zlib 3158744576   nbdkit 1.702 s ±  0.026 s
  512k zstd 3032697344   nbdkit 1.593 s ±  0.039 s

  2048k zlib 3197569024   qemu   4.327 s ±  0.051 s
  2048k zstd 2995143168   qemu   1.465 s ±  0.085 s

  2048k zlib 3197569024   nbdkit 3.660 s ±  0.011 s
  2048k zstd 2995143168   nbdkit 3.368 s ±  0.057 s

No great surprises - very small cluster size is inefficient, but
scaling up the cluster size gains performance, and zstd performs better
than zlib once the cluster size is sufficiently large.

Something strange happens with 2048k clusters + zlib - the file gets
larger and decompression gets slower again.

nbdkit has several issues: it fails completely at 4k and has scaling
issues at the largest cluster size.  The scaling issue turns out to be
caused by the default request size used by nbdcopy (256k; increasing it
cures the problem).

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
Fedora Windows cross-compiler. Compile Windows programs, test, and
build Windows installers. Over 100 libraries supported.
http://fedoraproject.org/wiki/MinGW




Re: [PATCH 7/7] qobject atomics osdep: Make a few macros more hygienic

2023-09-01 Thread Markus Armbruster
Eric Blake  writes:

> On Thu, Aug 31, 2023 at 03:25:46PM +0200, Markus Armbruster wrote:
>
> [This paragraph written last: Bear with my stream of consciousness
> review below, where I end up duplicating some of the conclusions you
> reached before the point where I saw where the patch was headed]
>
>> Variables declared in macros can shadow other variables.  Much of the
>> time, this is harmless, e.g.:
>> 
>> #define _FDT(exp)  \
>> do {   \
>> int ret = (exp);   \
>> if (ret < 0) { \
>> error_report("error creating device tree: %s: %s",   \
>> #exp, fdt_strerror(ret));  \
>> exit(1);   \
>> }  \
>> } while (0)
>
> Which is why I've seen some projects require a strict namespace
> separation: if all macro parameters and any identifiers declared in
> macros use either a leading or a trailing _ (I prefer a trailing one,
> to avoid risking conflicts with libc reserved namespace; but leading
> is usually okay), and all other identifiers avoid that namespace, then
> you will never have shadowing by calling a macro from normal code.
>
>> 
>> Harmless shadowing in h_client_architecture_support():
>> 
>> target_ulong ret;
>> 
>> [...]
>> 
>> ret = do_client_architecture_support(cpu, spapr, vec, fdt_bufsize);
>> if (ret == H_SUCCESS) {
>> _FDT((fdt_pack(spapr->fdt_blob)));
>> [...]
>> }
>> 
>> return ret;
>> 
>> However, we can get in trouble when the shadowed variable is used in a
>> macro argument:
>> 
>> #define QOBJECT(obj) ({ \
>> typeof(obj) o = (obj);  \
>> o ? container_of(&(o)->base, QObject, base) : NULL; \
>>  })
>> 
>> QOBJECT(o) expands into
>> 
>> ({
>> --->typeof(o) o = (o);
>> o ? container_of(&(o)->base, QObject, base) : NULL;
>> })
>> 
>> Unintended variable name capture at --->.  We'd be saved by
>> -Winit-self.  But I could certainly construct more elaborate death
>> traps that don't trigger it.
>
> Indeed, not fully understanding the preprocessor makes for some
> interesting death traps.

We use ALL_CAPS for macros to signal "watch out for traps".

>> To reduce the risk of trapping ourselves, we use variable names in
>> macros that no sane person would use elsewhere.  Here's our actual
>> definition of QOBJECT():
>> 
>> #define QOBJECT(obj) ({ \
>> typeof(obj) _obj = (obj);   \
>> _obj ? container_of(&(_obj)->base, QObject, base) : NULL;   \
>> })
>
> Yep, goes back to the policy I've seen of enforcing naming conventions
> that ensure macros can't shadow normal code.
>
>> 
>> Works well enough until we nest macro calls.  For instance, with
>> 
>> #define qobject_ref(obj) ({ \
>> typeof(obj) _obj = (obj);   \
>> qobject_ref_impl(QOBJECT(_obj));\
>> _obj;   \
>> })
>> 
>> the expression qobject_ref(obj) expands into
>> 
>> ({
>> typeof(obj) _obj = (obj);
>> qobject_ref_impl(
>> ({
>> --->typeof(_obj) _obj = (_obj);
>> _obj ? container_of(&(_obj)->base, QObject, base) : NULL;
>> }));
>> _obj;
>> })
>> 
>> Unintended variable name capture at --->.
>
> Yep, you've just proven how macros calling macros requires care, as we
> no longer have the namespace protections.  If macro calls can only
> form a DAG, it is possible to choose unique intermediate declarations
> for every macro (although remembering to do that, and still keeping
> the macro definition legible, may not be easy).  But if you can have
> macros that form any sort of nesting loop (and we WANT to allow things
> like MIN(MIN(a, b), c) - which is such a loop), dynamic naming becomes
> the only solution.
>
>> 
>> The only reliable way to prevent unintended variable name capture is
>> -Wshadow.
>
> Yes, I would love to have that enabled eventually.
>
>> 
>> One blocker for enabling it is shadowing hiding in function-like
>> macros like
>> 
>>  qdict_put(dict, "name", qobject_ref(...))
>> 
>> qdict_put() wraps its last argument in QOBJECT(), and the last
>> argument here contains another QOBJECT().
>> 
>> Use dark preprocessor sorcery to make the macros that give us this
>> problem use different variable names on every call.
>
> Sounds foreboding; hopefully not many macros are affected...
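
Just a handful, in the qobject headers.  The basic trick, as a minimal
sketch (the actual patch spells it slightly differently):

#define GLUE_(a, b) a##b
#define GLUE(a, b)  GLUE_(a, b)

#define QOBJECT_INTERNAL(obj, l) ({                                 \
        typeof(obj) l = (obj);                                      \
        l ? container_of(&(l)->base, QObject, base) : NULL;         \
    })
/* __COUNTER__ makes every expansion declare a distinct local */
#define QOBJECT(obj) QOBJECT_INTERNAL(obj, GLUE(q_obj, __COUNTER__))

Nested calls such as qobject_ref(QOBJECT(...)) then expand with
different local names, so no capture can occur.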
>
>> 
>> Signed-off-by: Markus Armbruster 
>> ---
>>  include/qapi/qmp/qobjec

Re: zlib-ng as a compat replacement for zlib

2023-09-01 Thread Kevin Wolf
Am 31.08.2023 um 11:20 hat Richard W.M. Jones geschrieben:
> On Thu, Aug 31, 2023 at 11:05:55AM +0200, Kevin Wolf wrote:
> > [ Cc: qemu-block ]
> > 
> > Am 30.08.2023 um 20:26 hat Richard W.M. Jones geschrieben:
> > > On Tue, Aug 29, 2023 at 05:49:24PM -, Daniel Alley wrote:
> > > > > The background to this is I've spent far too long trying to optimize
> > > > > the conversion of qcow2 files to raw files.  Most existing qcow2 files
> > > > > that you can find online are zlib compressed, including the qcow2
> > > > > images provided by Fedora.  Each cluster in the file is separately
> > > > > compressed as a zlib stream, and qemu uses zlib library functions to
> > > > > decompress them.  When downloading and decompressing these files, I
> > > > > measured 40%+ of the total CPU time is doing zlib decompression.
> > > > > 
> > > > > [You don't need to tell me how great Zstd is, qcow2 supports this for
> > > > > compression also, but it is not widely used by existing content.]
> > 
> > You make it sound like compressing each cluster individually has a big
> > impact. If so, does increasing the cluster size make a difference, too?
> > That could be a change with fewer compatibility concerns.
> 
> The issue we're discussing in the original thread is speed of
> decompression.  We noted that using zlib-ng (a not-quite drop-in
> replacement for zlib) improves decompression speed by 40% or more.
> 
> Original thread:
> https://lists.fedoraproject.org/archives/list/de...@lists.fedoraproject.org/thread/CDNPJ4SOTRQMYVCDI3ZSY4SP4FYESCWD/
> zlib-ng proposed change:
> https://src.fedoraproject.org/rpms/zlib-ng/pull-request/3
> 
> Size of the compressed file is also a concern, but wasn't discussed.

I understand the context and didn't really think about file size at all.

My question was essentially if decompressing many small blocks (as we do
today) performs significantly different from decompressing fewer larger
blocks (as we would do with a larger cluster size).

Kevin




Re: [PATCH v3 10/20] hw/virtio: add config support to vhost-user-device

2023-09-01 Thread Albert Esteve
On Thu, Aug 31, 2023 at 6:03 PM Alex Bennée  wrote:

>
> Albert Esteve  writes:
>
> > Sorry to bring up this post, it's been a while since you posted.
> > But I have been testing the patch the last couple of days.
> >
> > On Mon, Jul 10, 2023 at 9:58 PM Michael S. Tsirkin 
> wrote:
> >
> >  On Mon, Jul 10, 2023 at 04:35:12PM +0100, Alex Bennée wrote:
> >  > To use the generic device the user will need to provide the config
> >  > region size via the command line. We also add a notifier so the guest
> >  > can be pinged if the remote daemon updates the config.
> >  >
> >  > With these changes:
> >  >
> >  >   -device vhost-user-device-pci,virtio-id=41,num_vqs=2,config_size=8
> >  >
> >  > is equivalent to:
> >  >
> >  >   -device vhost-user-gpio-pci
> >  >
> >  > Signed-off-by: Alex Bennée 
> >
> >  This one I think it's best to defer until we get a better
> >  handle on how we want the configuration to look.
> >
> >  > ---
> >  >  include/hw/virtio/vhost-user-device.h |  1 +
> >  >  hw/virtio/vhost-user-device.c | 58
> ++-
> >  >  2 files changed, 58 insertions(+), 1 deletion(-)
> >  >
> >  > diff --git a/include/hw/virtio/vhost-user-device.h
> b/include/hw/virtio/vhost-user-device.h
> >  > index 9105011e25..3ddf88a146 100644
> >  > --- a/include/hw/virtio/vhost-user-device.h
> >  > +++ b/include/hw/virtio/vhost-user-device.h
> >  > @@ -22,6 +22,7 @@ struct VHostUserBase {
> >  >  CharBackend chardev;
> >  >  uint16_t virtio_id;
> >  >  uint32_t num_vqs;
> >  > +uint32_t config_size;
> >  >  /* State tracking */
> >  >  VhostUserState vhost_user;
> >  >  struct vhost_virtqueue *vhost_vq;
> >  > diff --git a/hw/virtio/vhost-user-device.c
> b/hw/virtio/vhost-user-device.c
> >  > index b0239fa033..2b028cae08 100644
> >  > --- a/hw/virtio/vhost-user-device.c
> >  > +++ b/hw/virtio/vhost-user-device.c
> >  > @@ -117,6 +117,42 @@ static uint64_t vub_get_features(VirtIODevice
> *vdev,
> >  >  return vub->vhost_dev.features & ~(1ULL <<
> VHOST_USER_F_PROTOCOL_FEATURES);
> >  >  }
> >  >
> >  > +/*
> >  > + * To handle VirtIO config we need to know the size of the config
> >  > + * space. We don't cache the config but re-fetch it from the guest
> >  > + * every time in case something has changed.
> >  > + */
> >  > +static void vub_get_config(VirtIODevice *vdev, uint8_t *config)
> >  > +{
> >  > +VHostUserBase *vub = VHOST_USER_BASE(vdev);
> >  > +Error *local_err = NULL;
> >  > +
> >  > +/*
> >  > + * There will have been a warning during vhost_dev_init, but lets
> >  > + * assert here as nothing will go right now.
> >  > + */
> >  > +g_assert(vub->config_size && vub->vhost_user.supports_config ==
> true);
> >  > +
> >  > +if (vhost_dev_get_config(&vub->vhost_dev, config,
> >  > + vub->config_size, &local_err)) {
> >  > +error_report_err(local_err);
> >  > +}
> >  > +}
> >  > +
> >  > +/*
> >  > + * When the daemon signals an update to the config we just need to
> >  > + * signal the guest as we re-read the config on demand above.
> >  > + */
> >  > +static int vub_config_notifier(struct vhost_dev *dev)
> >  > +{
> >  > +virtio_notify_config(dev->vdev);
> >  > +return 0;
> >  > +}
> >  > +
> >  > +const VhostDevConfigOps vub_config_ops = {
> >  > +.vhost_dev_config_notifier = vub_config_notifier,
> >  > +};
> >  > +
> >  >  static void vub_handle_output(VirtIODevice *vdev, VirtQueue *vq)
> >  >  {
> >  >  /*
> >  > @@ -141,12 +177,21 @@ static int vub_connect(DeviceState *dev)
> >  >  {
> >  >  VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> >  >  VHostUserBase *vub = VHOST_USER_BASE(vdev);
> >  > +struct vhost_dev *vhost_dev = &vub->vhost_dev;
> >  >
> >  >  if (vub->connected) {
> >  >  return 0;
> >  >  }
> >  >  vub->connected = true;
> >  >
> >  > +/*
> >  > + * If we support VHOST_USER_GET_CONFIG we must enable the
> notifier
> >  > + * so we can ping the guest when it updates.
> >  > + */
> >  > +if (vub->vhost_user.supports_config) {
> >  > +vhost_dev_set_config_notifier(vhost_dev, &vub_config_ops);
> >  > +}
> >  > +
> >  >  /* restore vhost state */
> >  >  if (virtio_device_started(vdev, vdev->status)) {
> >  >  vub_start(vdev);
> >  > @@ -214,11 +259,20 @@ static void vub_device_realize(DeviceState
> *dev, Error **errp)
> >  >  vub->num_vqs = 1; /* reasonable default? */
> >  >  }
> >  >
> >  > +/*
> >  > + * We can't handle config requests unless we know the size of the
> >  > + * config region, specialisations of the vhost-user-device will
> be
> >  > + * able to set this.
> >  > + */
> >  > +if (vub->config_size) {
> >  > +vub->vhost_user.supports_config = true;
> >  > +}
> >
> > Shouldn't the `supports_config = true' be set before we call
> vhost_dev_init() a few lines above?
> > Otherwise, we end up checking the `supports_confi

[PULL 09/14] block/throttle-groups: Use ThrottleDirection instread of bool is_write

2023-09-01 Thread Hanna Czenczek
From: zhenwei pi 

The 'bool is_write' style is obsolete in the throttle framework; adapt
the block throttle groups to the new style:
- use ThrottleDirection instead of 'bool is_write'. Ex,
  schedule_next_request(ThrottleGroupMember *tgm, bool is_write)
  -> schedule_next_request(ThrottleGroupMember *tgm, ThrottleDirection 
direction)

- use THROTTLE_MAX instead of hard code. Ex, ThrottleGroupMember *tokens[2]
  -> ThrottleGroupMember *tokens[THROTTLE_MAX]

- use ThrottleDirection instead of hard code on iteration. Ex, (i = 0; i < 2; 
i++)
  -> for (dir = THROTTLE_READ; dir < THROTTLE_MAX; dir++)

Use a simple Python script to test the new style:
 #!/usr/bin/python3
import subprocess
import random
import time

commands = ['virsh blkdeviotune jammy vda --write-bytes-sec ', \
'virsh blkdeviotune jammy vda --write-iops-sec ', \
'virsh blkdeviotune jammy vda --read-bytes-sec ', \
'virsh blkdeviotune jammy vda --read-iops-sec ']

for loop in range(1, 1000):
    time.sleep(random.randrange(3, 5))
    command = commands[random.randrange(0, len(commands))] + str(random.randrange(0, 100))
    subprocess.run(command, shell=True, check=True)

This works fine.

Signed-off-by: zhenwei pi 
Message-Id: <20230728022006.1098509-10-pizhen...@bytedance.com>
Reviewed-by: Hanna Czenczek 
Signed-off-by: Hanna Czenczek 
---
 include/block/throttle-groups.h |   6 +-
 block/block-backend.c   |   4 +-
 block/throttle-groups.c | 161 
 block/throttle.c|   8 +-
 4 files changed, 90 insertions(+), 89 deletions(-)

diff --git a/include/block/throttle-groups.h b/include/block/throttle-groups.h
index ff282fc0f8..2355e8d9de 100644
--- a/include/block/throttle-groups.h
+++ b/include/block/throttle-groups.h
@@ -37,7 +37,7 @@ typedef struct ThrottleGroupMember {
 AioContext   *aio_context;
 /* throttled_reqs_lock protects the CoQueues for throttled requests.  */
 CoMutex  throttled_reqs_lock;
-CoQueue  throttled_reqs[2];
+CoQueue  throttled_reqs[THROTTLE_MAX];
 
 /* Nonzero if the I/O limits are currently being ignored; generally
  * it is zero.  Accessed with atomic operations.
@@ -54,7 +54,7 @@ typedef struct ThrottleGroupMember {
  * throttle_state tells us if I/O limits are configured. */
 ThrottleState *throttle_state;
 ThrottleTimers throttle_timers;
-unsigned   pending_reqs[2];
+unsigned   pending_reqs[THROTTLE_MAX];
 QLIST_ENTRY(ThrottleGroupMember) round_robin;
 
 } ThrottleGroupMember;
@@ -78,7 +78,7 @@ void throttle_group_restart_tgm(ThrottleGroupMember *tgm);
 
 void coroutine_fn throttle_group_co_io_limits_intercept(ThrottleGroupMember 
*tgm,
 int64_t bytes,
-bool is_write);
+ThrottleDirection 
direction);
 void throttle_group_attach_aio_context(ThrottleGroupMember *tgm,
AioContext *new_context);
 void throttle_group_detach_aio_context(ThrottleGroupMember *tgm);
diff --git a/block/block-backend.c b/block/block-backend.c
index 4009ed5fed..47d360c97a 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1341,7 +1341,7 @@ blk_co_do_preadv_part(BlockBackend *blk, int64_t offset, 
int64_t bytes,
 /* throttling disk I/O */
 if (blk->public.throttle_group_member.throttle_state) {
 
throttle_group_co_io_limits_intercept(&blk->public.throttle_group_member,
-bytes, false);
+bytes, THROTTLE_READ);
 }
 
 ret = bdrv_co_preadv_part(blk->root, offset, bytes, qiov, qiov_offset,
@@ -1415,7 +1415,7 @@ blk_co_do_pwritev_part(BlockBackend *blk, int64_t offset, 
int64_t bytes,
 /* throttling disk I/O */
 if (blk->public.throttle_group_member.throttle_state) {
 
throttle_group_co_io_limits_intercept(&blk->public.throttle_group_member,
-bytes, true);
+bytes, THROTTLE_WRITE);
 }
 
 if (!blk->enable_write_cache) {
diff --git a/block/throttle-groups.c b/block/throttle-groups.c
index 3847d27801..3eda4c4e3d 100644
--- a/block/throttle-groups.c
+++ b/block/throttle-groups.c
@@ -37,7 +37,7 @@
 
 static void throttle_group_obj_init(Object *obj);
 static void throttle_group_obj_complete(UserCreatable *obj, Error **errp);
-static void timer_cb(ThrottleGroupMember *tgm, bool is_write);
+static void timer_cb(ThrottleGroupMember *tgm, ThrottleDirection direction);
 
 /* The ThrottleGroup structure (with its ThrottleState) is shared
  * among different ThrottleGroupMembers and it's independent from
@@ -73,8 +73,8 @@ struct ThrottleGroup {
 QemuMutex lock; /* This lock protects the following four fields */
 ThrottleState ts;
 QLIST_HEAD(, ThrottleGroupMember) head;
-ThrottleGroupMember *tokens[2];
-bool any_timer_armed[2];
+ThrottleGroupMember *tokens[THROTTLE_MAX

[PULL 13/14] file-posix: Simplify raw_co_prw's 'out' zone code

2023-09-01 Thread Hanna Czenczek
We duplicate the same condition three times here; pull it out to the top
level.

Signed-off-by: Hanna Czenczek 
Message-Id: <20230824155345.109765-5-hre...@redhat.com>
Reviewed-by: Sam Li 
---
 block/file-posix.c | 18 +-
 1 file changed, 5 insertions(+), 13 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index a050682e97..aa89789737 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -2506,11 +2506,10 @@ static int coroutine_fn raw_co_prw(BlockDriverState 
*bs, uint64_t offset,
 
 out:
 #if defined(CONFIG_BLKZONED)
-{
-BlockZoneWps *wps = bs->wps;
-if (ret == 0) {
-if ((type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) &&
-bs->bl.zoned != BLK_Z_NONE) {
+if ((type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) &&
+bs->bl.zoned != BLK_Z_NONE) {
+BlockZoneWps *wps = bs->wps;
+if (ret == 0) {
 uint64_t *wp = &wps->wp[offset / bs->bl.zone_size];
 if (!BDRV_ZT_IS_CONV(*wp)) {
 if (type & QEMU_AIO_ZONE_APPEND) {
@@ -2523,19 +2522,12 @@ out:
 *wp = offset + bytes;
 }
 }
-}
-} else {
-if ((type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) &&
-bs->bl.zoned != BLK_Z_NONE) {
+} else {
 update_zones_wp(bs, s->fd, 0, 1);
 }
-}
 
-if ((type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) &&
-bs->blk.zoned != BLK_Z_NONE) {
 qemu_co_mutex_unlock(&wps->colock);
 }
-}
 #endif
 return ret;
 }
-- 
2.41.0




[PULL 07/14] throttle: use THROTTLE_MAX/ARRAY_SIZE for hard code

2023-09-01 Thread Hanna Czenczek
From: zhenwei pi 

The first dimension of both to_check and
bucket_types_size/bucket_types_units is used as the throttle direction;
use THROTTLE_MAX instead of a hard-coded number. Also use ARRAY_SIZE()
to avoid a hard-coded number for the second dimension.

Hanna noticed that the two arrays should be static. Yes, turn them
into static variables.

Reviewed-by: Hanna Czenczek 
Signed-off-by: zhenwei pi 
Message-Id: <20230728022006.1098509-8-pizhen...@bytedance.com>
Signed-off-by: Hanna Czenczek 
---
 util/throttle.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/util/throttle.c b/util/throttle.c
index 7d3eb6032f..9582899da3 100644
--- a/util/throttle.c
+++ b/util/throttle.c
@@ -142,7 +142,8 @@ int64_t throttle_compute_wait(LeakyBucket *bkt)
 static int64_t throttle_compute_wait_for(ThrottleState *ts,
  ThrottleDirection direction)
 {
-BucketType to_check[2][4] = { {THROTTLE_BPS_TOTAL,
+static const BucketType to_check[THROTTLE_MAX][4] = {
+  {THROTTLE_BPS_TOTAL,
THROTTLE_OPS_TOTAL,
THROTTLE_BPS_READ,
THROTTLE_OPS_READ},
@@ -153,7 +154,7 @@ static int64_t throttle_compute_wait_for(ThrottleState *ts,
 int64_t wait, max_wait = 0;
 int i;
 
-for (i = 0; i < 4; i++) {
+for (i = 0; i < ARRAY_SIZE(to_check[THROTTLE_READ]); i++) {
 BucketType index = to_check[direction][i];
 wait = throttle_compute_wait(&ts->cfg.buckets[index]);
 if (wait > max_wait) {
@@ -469,11 +470,11 @@ bool throttle_schedule_timer(ThrottleState *ts,
 void throttle_account(ThrottleState *ts, ThrottleDirection direction,
   uint64_t size)
 {
-const BucketType bucket_types_size[2][2] = {
+static const BucketType bucket_types_size[THROTTLE_MAX][2] = {
 { THROTTLE_BPS_TOTAL, THROTTLE_BPS_READ },
 { THROTTLE_BPS_TOTAL, THROTTLE_BPS_WRITE }
 };
-const BucketType bucket_types_units[2][2] = {
+static const BucketType bucket_types_units[THROTTLE_MAX][2] = {
 { THROTTLE_OPS_TOTAL, THROTTLE_OPS_READ },
 { THROTTLE_OPS_TOTAL, THROTTLE_OPS_WRITE }
 };
@@ -486,7 +487,7 @@ void throttle_account(ThrottleState *ts, ThrottleDirection 
direction,
 units = (double) size / ts->cfg.op_size;
 }
 
-for (i = 0; i < 2; i++) {
+for (i = 0; i < ARRAY_SIZE(bucket_types_size[THROTTLE_READ]); i++) {
 LeakyBucket *bkt;
 
 bkt = &ts->cfg.buckets[bucket_types_size[direction][i]];
-- 
2.41.0




[PULL 10/14] file-posix: Clear bs->bl.zoned on error

2023-09-01 Thread Hanna Czenczek
bs->bl.zoned is what indicates whether the zone information is present
and valid; it is the only thing that raw_refresh_zoned_limits() sets if
CONFIG_BLKZONED is not defined, and it is also the only thing that it
sets if CONFIG_BLKZONED is defined but there are no zones.

Make sure that it is always set to BLK_Z_NONE if there is an error
anywhere in raw_refresh_zoned_limits() so that we do not accidentally
announce zones while our information is incomplete or invalid.

This also fixes a memory leak in the last error path in
raw_refresh_zoned_limits().

Signed-off-by: Hanna Czenczek 
Message-Id: <20230824155345.109765-2-hre...@redhat.com>
Reviewed-by: Sam Li 
---
 block/file-posix.c | 21 -
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index b16e9c21a1..2b88b9eefa 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -1412,11 +1412,9 @@ static void raw_refresh_zoned_limits(BlockDriverState 
*bs, struct stat *st,
 BlockZoneModel zoned;
 int ret;
 
-bs->bl.zoned = BLK_Z_NONE;
-
 ret = get_sysfs_zoned_model(st, &zoned);
 if (ret < 0 || zoned == BLK_Z_NONE) {
-return;
+goto no_zoned;
 }
 bs->bl.zoned = zoned;
 
@@ -1437,10 +1435,10 @@ static void raw_refresh_zoned_limits(BlockDriverState *bs, struct stat *st,
 if (ret < 0) {
 error_setg_errno(errp, -ret, "Unable to read chunk_sectors "
  "sysfs attribute");
-return;
+goto no_zoned;
 } else if (!ret) {
 error_setg(errp, "Read 0 from chunk_sectors sysfs attribute");
-return;
+goto no_zoned;
 }
 bs->bl.zone_size = ret << BDRV_SECTOR_BITS;
 
@@ -1448,10 +1446,10 @@ static void raw_refresh_zoned_limits(BlockDriverState *bs, struct stat *st,
 if (ret < 0) {
 error_setg_errno(errp, -ret, "Unable to read nr_zones "
  "sysfs attribute");
-return;
+goto no_zoned;
 } else if (!ret) {
 error_setg(errp, "Read 0 from nr_zones sysfs attribute");
-return;
+goto no_zoned;
 }
 bs->bl.nr_zones = ret;
 
@@ -1472,10 +1470,15 @@ static void raw_refresh_zoned_limits(BlockDriverState *bs, struct stat *st,
 ret = get_zones_wp(bs, s->fd, 0, bs->bl.nr_zones, 0);
 if (ret < 0) {
 error_setg_errno(errp, -ret, "report wps failed");
-bs->wps = NULL;
-return;
+goto no_zoned;
 }
 qemu_co_mutex_init(&bs->wps->colock);
+return;
+
+no_zoned:
+bs->bl.zoned = BLK_Z_NONE;
+g_free(bs->wps);
+bs->wps = NULL;
 }
 #else /* !defined(CONFIG_BLKZONED) */
 static void raw_refresh_zoned_limits(BlockDriverState *bs, struct stat *st,
-- 
2.41.0




[PULL 06/14] throttle: use enum ThrottleDirection instead of bool is_write

2023-09-01 Thread Hanna Czenczek
From: zhenwei pi 

enum ThrottleDirection is already there; use ThrottleDirection instead
of 'bool is_write' in the throttle API, and adapt the related code in
block, fsdev, cryptodev and tests.

Reviewed-by: Hanna Czenczek 
Signed-off-by: zhenwei pi 
Message-Id: <20230728022006.1098509-7-pizhen...@bytedance.com>
Signed-off-by: Hanna Czenczek 
---
 include/qemu/throttle.h |  5 +++--
 backends/cryptodev.c|  9 +
 block/throttle-groups.c |  6 --
 fsdev/qemu-fsdev-throttle.c |  8 +---
 tests/unit/test-throttle.c  |  4 ++--
 util/throttle.c | 31 +--
 6 files changed, 36 insertions(+), 27 deletions(-)

diff --git a/include/qemu/throttle.h b/include/qemu/throttle.h
index 9ca5ab8197..181245d29b 100644
--- a/include/qemu/throttle.h
+++ b/include/qemu/throttle.h
@@ -154,9 +154,10 @@ void throttle_config_init(ThrottleConfig *cfg);
 /* usage */
 bool throttle_schedule_timer(ThrottleState *ts,
  ThrottleTimers *tt,
- bool is_write);
+ ThrottleDirection direction);
 
-void throttle_account(ThrottleState *ts, bool is_write, uint64_t size);
+void throttle_account(ThrottleState *ts, ThrottleDirection direction,
+  uint64_t size);
 void throttle_limits_to_config(ThrottleLimits *arg, ThrottleConfig *cfg,
Error **errp);
 void throttle_config_to_limits(ThrottleConfig *cfg, ThrottleLimits *var);
diff --git a/backends/cryptodev.c b/backends/cryptodev.c
index c2356550c8..e5006bd215 100644
--- a/backends/cryptodev.c
+++ b/backends/cryptodev.c
@@ -252,10 +252,11 @@ static void cryptodev_backend_throttle_timer_cb(void *opaque)
 continue;
 }
 
-throttle_account(&backend->ts, true, ret);
+throttle_account(&backend->ts, THROTTLE_WRITE, ret);
 cryptodev_backend_operation(backend, op_info);
 if (throttle_enabled(&backend->tc) &&
-throttle_schedule_timer(&backend->ts, &backend->tt, true)) {
+throttle_schedule_timer(&backend->ts, &backend->tt,
+THROTTLE_WRITE)) {
 break;
 }
 }
@@ -271,7 +272,7 @@ int cryptodev_backend_crypto_operation(
 goto do_account;
 }
 
-if (throttle_schedule_timer(&backend->ts, &backend->tt, true) ||
+if (throttle_schedule_timer(&backend->ts, &backend->tt, THROTTLE_WRITE) ||
 !QTAILQ_EMPTY(&backend->opinfos)) {
 QTAILQ_INSERT_TAIL(&backend->opinfos, op_info, next);
 return 0;
@@ -283,7 +284,7 @@ do_account:
 return ret;
 }
 
-throttle_account(&backend->ts, true, ret);
+throttle_account(&backend->ts, THROTTLE_WRITE, ret);
 
 return cryptodev_backend_operation(backend, op_info);
 }
diff --git a/block/throttle-groups.c b/block/throttle-groups.c
index fb203c3ced..3847d27801 100644
--- a/block/throttle-groups.c
+++ b/block/throttle-groups.c
@@ -270,6 +270,7 @@ static bool throttle_group_schedule_timer(ThrottleGroupMember *tgm,
 ThrottleState *ts = tgm->throttle_state;
 ThrottleGroup *tg = container_of(ts, ThrottleGroup, ts);
 ThrottleTimers *tt = &tgm->throttle_timers;
+ThrottleDirection direction = is_write ? THROTTLE_WRITE : THROTTLE_READ;
 bool must_wait;
 
 if (qatomic_read(&tgm->io_limits_disabled)) {
@@ -281,7 +282,7 @@ static bool throttle_group_schedule_timer(ThrottleGroupMember *tgm,
 return true;
 }
 
-must_wait = throttle_schedule_timer(ts, tt, is_write);
+must_wait = throttle_schedule_timer(ts, tt, direction);
 
 /* If a timer just got armed, set tgm as the current token */
 if (must_wait) {
@@ -364,6 +365,7 @@ void coroutine_fn throttle_group_co_io_limits_intercept(ThrottleGroupMember *tgm
 bool must_wait;
 ThrottleGroupMember *token;
 ThrottleGroup *tg = container_of(tgm->throttle_state, ThrottleGroup, ts);
+ThrottleDirection direction = is_write ? THROTTLE_WRITE : THROTTLE_READ;
 
 assert(bytes >= 0);
 
@@ -386,7 +388,7 @@ void coroutine_fn throttle_group_co_io_limits_intercept(ThrottleGroupMember *tgm
 }
 
 /* The I/O will be executed, so do the accounting */
-throttle_account(tgm->throttle_state, is_write, bytes);
+throttle_account(tgm->throttle_state, direction, bytes);
 
 /* Schedule the next request */
 schedule_next_request(tgm, is_write);
diff --git a/fsdev/qemu-fsdev-throttle.c b/fsdev/qemu-fsdev-throttle.c
index 5c83a1cc09..1c137d6f0f 100644
--- a/fsdev/qemu-fsdev-throttle.c
+++ b/fsdev/qemu-fsdev-throttle.c
@@ -97,16 +97,18 @@ void fsdev_throttle_init(FsThrottle *fst)
 void coroutine_fn fsdev_co_throttle_request(FsThrottle *fst, bool is_write,
 struct iovec *iov, int iovcnt)
 {
+ThrottleDirection direction = is_write ? THROTTLE_WRITE : THROTTLE_READ;
+
 if (throttle_enabled(&fst->cfg)) {
-if (throttle_schedule_timer(&fst->ts, &fst->tt, i

[PULL 11/14] file-posix: Check bs->bl.zoned for zone info

2023-09-01 Thread Hanna Czenczek
Instead of checking bs->wps or bs->bl.zone_size for whether zone
information is present, check bs->bl.zoned.  That is the flag that
raw_refresh_zoned_limits() reliably sets to indicate zone support.  If
it is set to something other than BLK_Z_NONE, other values and objects
like bs->wps and bs->bl.zone_size must be non-null/zero and valid; if it
is not, we cannot rely on their validity.
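
Put differently, callers should key off the one flag that is guaranteed
to be consistent; a minimal sketch with hypothetical names:

    struct limits { int zoned; int zone_size; };   /* zoned == 0: no zones */

    static long zone_index(const struct limits *bl, long offset)
    {
        if (bl->zoned == 0) {
            return -1;   /* no zone info: do not touch any zone state */
        }
        /* zoned != 0 implies zone_size (and friends) are valid and non-zero */
        return offset / bl->zone_size;
    }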

Signed-off-by: Hanna Czenczek 
Message-Id: <20230824155345.109765-3-hre...@redhat.com>
Reviewed-by: Sam Li 
---
 block/file-posix.c | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index 2b88b9eefa..46e22403fe 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -2455,9 +2455,10 @@ static int coroutine_fn raw_co_prw(BlockDriverState *bs, uint64_t offset,
 if (fd_open(bs) < 0)
 return -EIO;
 #if defined(CONFIG_BLKZONED)
-if ((type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) && bs->wps) {
+if ((type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) &&
+bs->bl.zoned != BLK_Z_NONE) {
 qemu_co_mutex_lock(&bs->wps->colock);
-if (type & QEMU_AIO_ZONE_APPEND && bs->bl.zone_size) {
+if (type & QEMU_AIO_ZONE_APPEND) {
 int index = offset / bs->bl.zone_size;
 offset = bs->wps->wp[index];
 }
@@ -2508,8 +2509,8 @@ out:
 {
 BlockZoneWps *wps = bs->wps;
 if (ret == 0) {
-if ((type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND))
-&& wps && bs->bl.zone_size) {
+if ((type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) &&
+bs->bl.zoned != BLK_Z_NONE) {
 uint64_t *wp = &wps->wp[offset / bs->bl.zone_size];
 if (!BDRV_ZT_IS_CONV(*wp)) {
 if (type & QEMU_AIO_ZONE_APPEND) {
@@ -2529,7 +2530,8 @@ out:
 }
 }
 
-if ((type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) && wps) {
+if ((type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) &&
+bs->bl.zoned != BLK_Z_NONE) {
 qemu_co_mutex_unlock(&wps->colock);
 }
 }
-- 
2.41.0




[PULL 14/14] tests/file-io-error: New test

2023-09-01 Thread Hanna Czenczek
This is a regression test for
https://bugzilla.redhat.com/show_bug.cgi?id=2234374.

All this test needs to do is trigger an I/O error inside of file-posix
(specifically raw_co_prw()).  One reliable way to do this without
requiring special privileges is to use a FUSE export, which allows us to
inject any error that we want, e.g. via blkdebug.

Signed-off-by: Hanna Czenczek 
Message-Id: <20230824155345.109765-6-hre...@redhat.com>
[hreitz: Fixed test to be skipped when there is no FUSE support, to
 suppress fusermount's allow_other warning, and to be skipped
 with $IMGOPTSSYNTAX enabled]
Signed-off-by: Hanna Czenczek 
---
 tests/qemu-iotests/tests/file-io-error | 119 +
 tests/qemu-iotests/tests/file-io-error.out |  33 ++
 2 files changed, 152 insertions(+)
 create mode 100755 tests/qemu-iotests/tests/file-io-error
 create mode 100644 tests/qemu-iotests/tests/file-io-error.out

diff --git a/tests/qemu-iotests/tests/file-io-error b/tests/qemu-iotests/tests/file-io-error
new file mode 100755
index 00..88ee5f670c
--- /dev/null
+++ b/tests/qemu-iotests/tests/file-io-error
@@ -0,0 +1,119 @@
+#!/usr/bin/env bash
+# group: rw
+#
+# Produce an I/O error in file-posix, and hope that it is not catastrophic.
+# Regression test for: https://bugzilla.redhat.com/show_bug.cgi?id=2234374
+#
+# Copyright (C) 2023 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+seq=$(basename "$0")
+echo "QA output created by $seq"
+
+status=1   # failure is the default!
+
+_cleanup()
+{
+_cleanup_qemu
+rm -f "$TEST_DIR/fuse-export"
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ../common.rc
+. ../common.filter
+. ../common.qemu
+
+# Format-agnostic (we do not use any), but we do test the file protocol
+_supported_proto file
+_require_drivers blkdebug null-co
+
+if [ "$IMGOPTSSYNTAX" = "true" ]; then
+# We need `$QEMU_IO -f file` to work; IMGOPTSSYNTAX uses --image-opts,
+# breaking -f.
+_unsupported_fmt $IMGFMT
+fi
+
+# This is a regression test of a bug in which file-posix would access zone
+# information in case of an I/O error even when there is no zone information,
+# resulting in a division by zero.
+# To reproduce the problem, we need to trigger an I/O error inside of
+# file-posix, which can be done (rootless) by providing a FUSE export that
+# presents only errors when accessed.
+
+_launch_qemu
+_send_qemu_cmd $QEMU_HANDLE \
+"{'execute': 'qmp_capabilities'}" \
+'return'
+
+_send_qemu_cmd $QEMU_HANDLE \
+"{'execute': 'blockdev-add',
+  'arguments': {
+  'driver': 'blkdebug',
+  'node-name': 'node0',
+  'inject-error': [{'event': 'none'}],
+  'image': {
+  'driver': 'null-co'
+  }
+  }}" \
+'return'
+
+# FUSE mountpoint must exist and be a regular file
+touch "$TEST_DIR/fuse-export"
+
+# The grep -v to filter fusermount's (benign) error when /etc/fuse.conf does
+# not contain user_allow_other and the subsequent check for missing FUSE support
+# have both been taken from iotest 308.
+output=$(_send_qemu_cmd $QEMU_HANDLE \
+"{'execute': 'block-export-add',
+  'arguments': {
+  'id': 'exp0',
+  'type': 'fuse',
+  'node-name': 'node0',
+  'mountpoint': '$TEST_DIR/fuse-export',
+  'writable': true
+  }}" \
+'return' \
+| grep -v 'option allow_other only allowed if')
+
+if echo "$output" | grep -q "Parameter 'type' does not accept value 'fuse'"; then
+_notrun 'No FUSE support'
+fi
+echo "$output"
+
+echo
+# This should fail, but gracefully, i.e. just print an I/O error, not crash.
+$QEMU_IO -f file -c 'write 0 64M' "$TEST_DIR/fuse-export" | _filter_qemu_io
+echo
+
+_send_qemu_cmd $QEMU_HANDLE \
+"{'execute': 'block-export-del',
+  'arguments': {'id': 'exp0'}}" \
+'return'
+
+_send_qemu_cmd $QEMU_HANDLE \
+'' \
+'BLOCK_EXPORT_DELETED'
+
+_send_qemu_cmd $QEMU_HANDLE \
+"{'execute': 'blockdev-del',
+  'arguments': {'node-name': 'node0'}}" \
+'return'
+
+# success, all done
+echo "*** done"
+rm -f $seq.full
+status=0
diff --git a/tests/qemu-iotests/tests/file-io-error.out b/tests/qemu-iotests/tests/file-io-error.out
new file mode 100644
index 00..0f46455a94
--- /dev/null
+++ b/tests/qemu-iotests/tests/file-io-error.out
@@ -0,0 +1,3

[PULL 12/14] file-posix: Fix zone update in I/O error path

2023-09-01 Thread Hanna Czenczek
We must check that zone information is present before running
update_zones_wp().

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=2234374
Fixes: Coverity CID 1512459
Signed-off-by: Hanna Czenczek 
Message-Id: <20230824155345.109765-4-hre...@redhat.com>
Reviewed-by: Sam Li 
---
 block/file-posix.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index 46e22403fe..a050682e97 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -2525,7 +2525,8 @@ out:
 }
 }
 } else {
-if (type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) {
+if ((type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) &&
+bs->bl.zoned != BLK_Z_NONE) {
 update_zones_wp(bs, s->fd, 0, 1);
 }
 }
-- 
2.41.0




[PULL 08/14] fsdev: Use ThrottleDirection instead of bool is_write

2023-09-01 Thread Hanna Czenczek
From: zhenwei pi 

The 'bool is_write' style is obsolete in the throttle framework; adapt
fsdev to the new style.

Cc: Greg Kurz 
Reviewed-by: Hanna Czenczek 
Signed-off-by: zhenwei pi 
Message-Id: <20230728022006.1098509-9-pizhen...@bytedance.com>
Reviewed-by: Greg Kurz 
Signed-off-by: Hanna Czenczek 
---
 fsdev/qemu-fsdev-throttle.h |  4 ++--
 fsdev/qemu-fsdev-throttle.c | 14 +++---
 hw/9pfs/cofile.c|  4 ++--
 3 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/fsdev/qemu-fsdev-throttle.h b/fsdev/qemu-fsdev-throttle.h
index a21aecddc7..daa8ca2494 100644
--- a/fsdev/qemu-fsdev-throttle.h
+++ b/fsdev/qemu-fsdev-throttle.h
@@ -23,14 +23,14 @@ typedef struct FsThrottle {
 ThrottleState ts;
 ThrottleTimers tt;
 ThrottleConfig cfg;
-CoQueue  throttled_reqs[2];
+CoQueue  throttled_reqs[THROTTLE_MAX];
 } FsThrottle;
 
 int fsdev_throttle_parse_opts(QemuOpts *, FsThrottle *, Error **);
 
 void fsdev_throttle_init(FsThrottle *);
 
-void coroutine_fn fsdev_co_throttle_request(FsThrottle *, bool ,
+void coroutine_fn fsdev_co_throttle_request(FsThrottle *, ThrottleDirection ,
 struct iovec *, int);
 
 void fsdev_throttle_cleanup(FsThrottle *);
diff --git a/fsdev/qemu-fsdev-throttle.c b/fsdev/qemu-fsdev-throttle.c
index 1c137d6f0f..d912da906d 100644
--- a/fsdev/qemu-fsdev-throttle.c
+++ b/fsdev/qemu-fsdev-throttle.c
@@ -94,22 +94,22 @@ void fsdev_throttle_init(FsThrottle *fst)
 }
 }
 
-void coroutine_fn fsdev_co_throttle_request(FsThrottle *fst, bool is_write,
+void coroutine_fn fsdev_co_throttle_request(FsThrottle *fst,
+ThrottleDirection direction,
 struct iovec *iov, int iovcnt)
 {
-ThrottleDirection direction = is_write ? THROTTLE_WRITE : THROTTLE_READ;
-
+assert(direction < THROTTLE_MAX);
 if (throttle_enabled(&fst->cfg)) {
 if (throttle_schedule_timer(&fst->ts, &fst->tt, direction) ||
-!qemu_co_queue_empty(&fst->throttled_reqs[is_write])) {
-qemu_co_queue_wait(&fst->throttled_reqs[is_write], NULL);
+!qemu_co_queue_empty(&fst->throttled_reqs[direction])) {
+qemu_co_queue_wait(&fst->throttled_reqs[direction], NULL);
 }
 
 throttle_account(&fst->ts, direction, iov_size(iov, iovcnt));
 
-if (!qemu_co_queue_empty(&fst->throttled_reqs[is_write]) &&
+if (!qemu_co_queue_empty(&fst->throttled_reqs[direction]) &&
 !throttle_schedule_timer(&fst->ts, &fst->tt, direction)) {
-qemu_co_queue_next(&fst->throttled_reqs[is_write]);
+qemu_co_queue_next(&fst->throttled_reqs[direction]);
 }
 }
 }
diff --git a/hw/9pfs/cofile.c b/hw/9pfs/cofile.c
index 9c5344039e..71174c3e4a 100644
--- a/hw/9pfs/cofile.c
+++ b/hw/9pfs/cofile.c
@@ -252,7 +252,7 @@ int coroutine_fn v9fs_co_pwritev(V9fsPDU *pdu, V9fsFidState *fidp,
 if (v9fs_request_cancelled(pdu)) {
 return -EINTR;
 }
-fsdev_co_throttle_request(s->ctx.fst, true, iov, iovcnt);
+fsdev_co_throttle_request(s->ctx.fst, THROTTLE_WRITE, iov, iovcnt);
 v9fs_co_run_in_worker(
 {
 err = s->ops->pwritev(&s->ctx, &fidp->fs, iov, iovcnt, offset);
@@ -272,7 +272,7 @@ int coroutine_fn v9fs_co_preadv(V9fsPDU *pdu, V9fsFidState *fidp,
 if (v9fs_request_cancelled(pdu)) {
 return -EINTR;
 }
-fsdev_co_throttle_request(s->ctx.fst, false, iov, iovcnt);
+fsdev_co_throttle_request(s->ctx.fst, THROTTLE_READ, iov, iovcnt);
 v9fs_co_run_in_worker(
 {
 err = s->ops->preadv(&s->ctx, &fidp->fs, iov, iovcnt, offset);
-- 
2.41.0




[PULL 05/14] cryptodev: use NULL throttle timer cb for read direction

2023-09-01 Thread Hanna Czenczek
From: zhenwei pi 

Operations on a cryptodev are considered *write* only; the read-direction
callback is never invoked. Use NULL instead of an unreachable path
(cryptodev_backend_throttle_timer_cb in the read direction).

The dummy read timer (never invoked) is removed here, which means the
'FIXME' tag is no longer needed.
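
A self-contained sketch of the idea (not the QEMU API): machinery is only
set up for the directions that actually have a callback.

    #include <stdio.h>

    typedef void (*timer_cb)(void *opaque);

    enum { TH_READ, TH_WRITE, TH_MAX };

    struct timers { timer_cb cb[TH_MAX]; int armed[TH_MAX]; };

    static void timers_init(struct timers *t, timer_cb rd, timer_cb wr)
    {
        t->cb[TH_READ] = rd;
        t->cb[TH_WRITE] = wr;
        for (int d = 0; d < TH_MAX; d++) {
            t->armed[d] = (t->cb[d] != NULL);   /* no callback, no timer */
        }
    }

    static void write_cb(void *opaque) { (void)opaque; puts("throttled"); }

    int main(void)
    {
        struct timers t;

        timers_init(&t, NULL, write_cb);   /* the cryptodev case: write only */
        printf("read: %d, write: %d\n", t.armed[TH_READ], t.armed[TH_WRITE]);
        return 0;
    }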

Reviewed-by: Alberto Garcia 
Reviewed-by: Hanna Czenczek 
Signed-off-by: zhenwei pi 
Message-Id: <20230728022006.1098509-6-pizhen...@bytedance.com>
Signed-off-by: Hanna Czenczek 
---
 backends/cryptodev.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/backends/cryptodev.c b/backends/cryptodev.c
index 4d183f7237..c2356550c8 100644
--- a/backends/cryptodev.c
+++ b/backends/cryptodev.c
@@ -341,8 +341,7 @@ static void cryptodev_backend_set_throttle(CryptoDevBackend *backend, int field,
 if (!enabled) {
 throttle_init(&backend->ts);
 throttle_timers_init(&backend->tt, qemu_get_aio_context(),
- QEMU_CLOCK_REALTIME,
- cryptodev_backend_throttle_timer_cb, /* FIXME */
+ QEMU_CLOCK_REALTIME, NULL,
  cryptodev_backend_throttle_timer_cb, backend);
 }
 
-- 
2.41.0




[PULL 01/14] throttle: introduce enum ThrottleDirection

2023-09-01 Thread Hanna Czenczek
From: zhenwei pi 

Use enum ThrottleDirection instead of a numeric index.

Reviewed-by: Alberto Garcia 
Reviewed-by: Hanna Czenczek 
Signed-off-by: zhenwei pi 
Message-Id: <20230728022006.1098509-2-pizhen...@bytedance.com>
Signed-off-by: Hanna Czenczek 
---
 include/qemu/throttle.h | 11 ---
 util/throttle.c | 16 +---
 2 files changed, 17 insertions(+), 10 deletions(-)

diff --git a/include/qemu/throttle.h b/include/qemu/throttle.h
index 05f6346137..9ca5ab8197 100644
--- a/include/qemu/throttle.h
+++ b/include/qemu/throttle.h
@@ -99,13 +99,18 @@ typedef struct ThrottleState {
 int64_t previous_leak;/* timestamp of the last leak done */
 } ThrottleState;
 
+typedef enum {
+THROTTLE_READ = 0,
+THROTTLE_WRITE,
+THROTTLE_MAX
+} ThrottleDirection;
+
 typedef struct ThrottleTimers {
-QEMUTimer *timers[2]; /* timers used to do the throttling */
+QEMUTimer *timers[THROTTLE_MAX];/* timers used to do the throttling */
 QEMUClockType clock_type; /* the clock used */
 
 /* Callbacks */
-QEMUTimerCB *read_timer_cb;
-QEMUTimerCB *write_timer_cb;
+QEMUTimerCB *timer_cb[THROTTLE_MAX];
 void *timer_opaque;
 } ThrottleTimers;
 
diff --git a/util/throttle.c b/util/throttle.c
index 81f247a8d1..5642e61763 100644
--- a/util/throttle.c
+++ b/util/throttle.c
@@ -199,10 +199,12 @@ static bool throttle_compute_timer(ThrottleState *ts,
 void throttle_timers_attach_aio_context(ThrottleTimers *tt,
 AioContext *new_context)
 {
-tt->timers[0] = aio_timer_new(new_context, tt->clock_type, SCALE_NS,
-  tt->read_timer_cb, tt->timer_opaque);
-tt->timers[1] = aio_timer_new(new_context, tt->clock_type, SCALE_NS,
-  tt->write_timer_cb, tt->timer_opaque);
+tt->timers[THROTTLE_READ] =
+aio_timer_new(new_context, tt->clock_type, SCALE_NS,
+  tt->timer_cb[THROTTLE_READ], tt->timer_opaque);
+tt->timers[THROTTLE_WRITE] =
+aio_timer_new(new_context, tt->clock_type, SCALE_NS,
+  tt->timer_cb[THROTTLE_WRITE], tt->timer_opaque);
 }
 
 /*
@@ -236,8 +238,8 @@ void throttle_timers_init(ThrottleTimers *tt,
 memset(tt, 0, sizeof(ThrottleTimers));
 
 tt->clock_type = clock_type;
-tt->read_timer_cb = read_timer_cb;
-tt->write_timer_cb = write_timer_cb;
+tt->timer_cb[THROTTLE_READ] = read_timer_cb;
+tt->timer_cb[THROTTLE_WRITE] = write_timer_cb;
 tt->timer_opaque = timer_opaque;
 throttle_timers_attach_aio_context(tt, aio_context);
 }
@@ -256,7 +258,7 @@ void throttle_timers_detach_aio_context(ThrottleTimers *tt)
 {
 int i;
 
-for (i = 0; i < 2; i++) {
+for (i = 0; i < THROTTLE_MAX; i++) {
 throttle_timer_destroy(&tt->timers[i]);
 }
 }
-- 
2.41.0




[PULL 04/14] test-throttle: test read only and write only

2023-09-01 Thread Hanna Czenczek
From: zhenwei pi 

Reviewed-by: Alberto Garcia 
Reviewed-by: Hanna Czenczek 
Signed-off-by: zhenwei pi 
Message-Id: <20230728022006.1098509-5-pizhen...@bytedance.com>
Signed-off-by: Hanna Czenczek 
---
 tests/unit/test-throttle.c | 66 ++
 1 file changed, 66 insertions(+)

diff --git a/tests/unit/test-throttle.c b/tests/unit/test-throttle.c
index a60b5fe22e..5547837a58 100644
--- a/tests/unit/test-throttle.c
+++ b/tests/unit/test-throttle.c
@@ -184,6 +184,70 @@ static void test_init(void)
 throttle_timers_destroy(tt);
 }
 
+static void test_init_readonly(void)
+{
+int i;
+
+tt = &tgm.throttle_timers;
+
+/* fill the structures with crap */
+memset(&ts, 1, sizeof(ts));
+memset(tt, 1, sizeof(*tt));
+
+/* init structures */
+throttle_init(&ts);
+throttle_timers_init(tt, ctx, QEMU_CLOCK_VIRTUAL,
+ read_timer_cb, NULL, &ts);
+
+/* check initialized fields */
+g_assert(tt->clock_type == QEMU_CLOCK_VIRTUAL);
+g_assert(tt->timers[THROTTLE_READ]);
+g_assert(!tt->timers[THROTTLE_WRITE]);
+
+/* check other fields were cleared */
+g_assert(!ts.previous_leak);
+g_assert(!ts.cfg.op_size);
+for (i = 0; i < BUCKETS_COUNT; i++) {
+g_assert(!ts.cfg.buckets[i].avg);
+g_assert(!ts.cfg.buckets[i].max);
+g_assert(!ts.cfg.buckets[i].level);
+}
+
+throttle_timers_destroy(tt);
+}
+
+static void test_init_writeonly(void)
+{
+int i;
+
+tt = &tgm.throttle_timers;
+
+/* fill the structures with crap */
+memset(&ts, 1, sizeof(ts));
+memset(tt, 1, sizeof(*tt));
+
+/* init structures */
+throttle_init(&ts);
+throttle_timers_init(tt, ctx, QEMU_CLOCK_VIRTUAL,
+ NULL, write_timer_cb, &ts);
+
+/* check initialized fields */
+g_assert(tt->clock_type == QEMU_CLOCK_VIRTUAL);
+g_assert(!tt->timers[THROTTLE_READ]);
+g_assert(tt->timers[THROTTLE_WRITE]);
+
+/* check other fields were cleared */
+g_assert(!ts.previous_leak);
+g_assert(!ts.cfg.op_size);
+for (i = 0; i < BUCKETS_COUNT; i++) {
+g_assert(!ts.cfg.buckets[i].avg);
+g_assert(!ts.cfg.buckets[i].max);
+g_assert(!ts.cfg.buckets[i].level);
+}
+
+throttle_timers_destroy(tt);
+}
+
 static void test_destroy(void)
 {
 int i;
@@ -752,6 +816,8 @@ int main(int argc, char **argv)
 g_test_add_func("/throttle/leak_bucket",test_leak_bucket);
 g_test_add_func("/throttle/compute_wait",   test_compute_wait);
 g_test_add_func("/throttle/init",   test_init);
+g_test_add_func("/throttle/init_readonly",  test_init_readonly);
+g_test_add_func("/throttle/init_writeonly", test_init_writeonly);
 g_test_add_func("/throttle/destroy",test_destroy);
 g_test_add_func("/throttle/have_timer", test_have_timer);
 g_test_add_func("/throttle/detach_attach",  test_detach_attach);
-- 
2.41.0




[PULL 02/14] test-throttle: use enum ThrottleDirection

2023-09-01 Thread Hanna Czenczek
From: zhenwei pi 

Use enum ThrottleDirection in the throttle test code as well.

Reviewed-by: Alberto Garcia 
Reviewed-by: Hanna Czenczek 
Signed-off-by: zhenwei pi 
Message-Id: <20230728022006.1098509-3-pizhen...@bytedance.com>
Signed-off-by: Hanna Czenczek 
---
 tests/unit/test-throttle.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tests/unit/test-throttle.c b/tests/unit/test-throttle.c
index 7adb5e6652..a60b5fe22e 100644
--- a/tests/unit/test-throttle.c
+++ b/tests/unit/test-throttle.c
@@ -169,8 +169,8 @@ static void test_init(void)
 
 /* check initialized fields */
 g_assert(tt->clock_type == QEMU_CLOCK_VIRTUAL);
-g_assert(tt->timers[0]);
-g_assert(tt->timers[1]);
+g_assert(tt->timers[THROTTLE_READ]);
+g_assert(tt->timers[THROTTLE_WRITE]);
 
 /* check other fields were cleared */
 g_assert(!ts.previous_leak);
@@ -191,7 +191,7 @@ static void test_destroy(void)
 throttle_timers_init(tt, ctx, QEMU_CLOCK_VIRTUAL,
  read_timer_cb, write_timer_cb, &ts);
 throttle_timers_destroy(tt);
-for (i = 0; i < 2; i++) {
+for (i = 0; i < THROTTLE_MAX; i++) {
 g_assert(!tt->timers[i]);
 }
 }
-- 
2.41.0




[PULL 00/14] Block patches

2023-09-01 Thread Hanna Czenczek
The following changes since commit f5fe7c17ac4e309e47e78f0f9761aebc8d2f2c81:

  Merge tag 'pull-tcg-20230823-2' of https://gitlab.com/rth7680/qemu into 
staging (2023-08-28 16:07:04 -0400)

are available in the Git repository at:

  https://gitlab.com/hreitz/qemu.git tags/pull-block-2023-09-01

for you to fetch changes up to 380448464dd89291cf7fd7434be6c225482a334d:

  tests/file-io-error: New test (2023-08-29 13:01:24 +0200)


Block patches

- Fix for file-posix's zoning code crashing on I/O errors
- Throttling refactoring


Hanna Czenczek (5):
  file-posix: Clear bs->bl.zoned on error
  file-posix: Check bs->bl.zoned for zone info
  file-posix: Fix zone update in I/O error path
  file-posix: Simplify raw_co_prw's 'out' zone code
  tests/file-io-error: New test

Zhenwei Pi (9):
  throttle: introduce enum ThrottleDirection
  test-throttle: use enum ThrottleDirection
  throttle: support read-only and write-only
  test-throttle: test read only and write only
  cryptodev: use NULL throttle timer cb for read direction
  throttle: use enum ThrottleDirection instead of bool is_write
  throttle: use THROTTLE_MAX/ARRAY_SIZE for hard code
  fsdev: Use ThrottleDirection instead of bool is_write
  block/throttle-groups: Use ThrottleDirection instead of bool is_write

 fsdev/qemu-fsdev-throttle.h|   4 +-
 include/block/throttle-groups.h|   6 +-
 include/qemu/throttle.h|  16 +-
 backends/cryptodev.c   |  12 +-
 block/block-backend.c  |   4 +-
 block/file-posix.c |  42 +++---
 block/throttle-groups.c| 163 +++--
 block/throttle.c   |   8 +-
 fsdev/qemu-fsdev-throttle.c|  18 ++-
 hw/9pfs/cofile.c   |   4 +-
 tests/unit/test-throttle.c |  76 +-
 util/throttle.c|  84 +++
 tests/qemu-iotests/tests/file-io-error | 119 +++
 tests/qemu-iotests/tests/file-io-error.out |  33 +
 14 files changed, 418 insertions(+), 171 deletions(-)
 create mode 100755 tests/qemu-iotests/tests/file-io-error
 create mode 100644 tests/qemu-iotests/tests/file-io-error.out

-- 
2.41.0




[PULL 03/14] throttle: support read-only and write-only

2023-09-01 Thread Hanna Czenczek
From: zhenwei pi 

Only one direction is necessary in several scenarios:
- a read-only disk
- operations on a device that are considered *write* only. For example,
  encrypt/decrypt/sign/verify operations on a cryptodev use a single
  *write* timer (the read timer callback is defined, but never invoked).

Allow a single direction in throttle; this reduces memory usage, and the
upper layer no longer needs a dummy callback.
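
With only one timer present, the scheduling path must never be asked to
throttle the direction that was not configured; a short sketch of that
invariant (hypothetical names, not the QEMU code):

    #include <assert.h>

    enum { TH_READ, TH_WRITE, TH_MAX };

    struct timers { void *timer[TH_MAX]; };

    static void schedule(struct timers *t, int direction)
    {
        void *timer = t->timer[direction];

        /* A caller bug, not a runtime condition: this direction was never
         * configured, so there is nothing to arm. */
        assert(timer != NULL);
        (void)timer;   /* ... compute the deadline and arm 'timer' ... */
    }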

Reviewed-by: Alberto Garcia 
Reviewed-by: Hanna Czenczek 
Signed-off-by: zhenwei pi 
Message-Id: <20230728022006.1098509-4-pizhen...@bytedance.com>
Signed-off-by: Hanna Czenczek 
---
 util/throttle.c | 42 --
 1 file changed, 28 insertions(+), 14 deletions(-)

diff --git a/util/throttle.c b/util/throttle.c
index 5642e61763..0439028d21 100644
--- a/util/throttle.c
+++ b/util/throttle.c
@@ -199,12 +199,15 @@ static bool throttle_compute_timer(ThrottleState *ts,
 void throttle_timers_attach_aio_context(ThrottleTimers *tt,
 AioContext *new_context)
 {
-tt->timers[THROTTLE_READ] =
-aio_timer_new(new_context, tt->clock_type, SCALE_NS,
-  tt->timer_cb[THROTTLE_READ], tt->timer_opaque);
-tt->timers[THROTTLE_WRITE] =
-aio_timer_new(new_context, tt->clock_type, SCALE_NS,
-  tt->timer_cb[THROTTLE_WRITE], tt->timer_opaque);
+ThrottleDirection dir;
+
+for (dir = THROTTLE_READ; dir < THROTTLE_MAX; dir++) {
+if (tt->timer_cb[dir]) {
+tt->timers[dir] =
+aio_timer_new(new_context, tt->clock_type, SCALE_NS,
+  tt->timer_cb[dir], tt->timer_opaque);
+}
+}
 }
 
 /*
@@ -235,6 +238,7 @@ void throttle_timers_init(ThrottleTimers *tt,
   QEMUTimerCB *write_timer_cb,
   void *timer_opaque)
 {
+assert(read_timer_cb || write_timer_cb);
 memset(tt, 0, sizeof(ThrottleTimers));
 
 tt->clock_type = clock_type;
@@ -247,7 +251,9 @@ void throttle_timers_init(ThrottleTimers *tt,
 /* destroy a timer */
 static void throttle_timer_destroy(QEMUTimer **timer)
 {
-assert(*timer != NULL);
+if (*timer == NULL) {
+return;
+}
 
 timer_free(*timer);
 *timer = NULL;
@@ -256,10 +262,10 @@ static void throttle_timer_destroy(QEMUTimer **timer)
 /* Remove timers from event loop */
 void throttle_timers_detach_aio_context(ThrottleTimers *tt)
 {
-int i;
+ThrottleDirection dir;
 
-for (i = 0; i < THROTTLE_MAX; i++) {
-throttle_timer_destroy(&tt->timers[i]);
+for (dir = THROTTLE_READ; dir < THROTTLE_MAX; dir++) {
+throttle_timer_destroy(&tt->timers[dir]);
 }
 }
 
@@ -272,8 +278,12 @@ void throttle_timers_destroy(ThrottleTimers *tt)
 /* is any throttling timer configured */
 bool throttle_timers_are_initialized(ThrottleTimers *tt)
 {
-if (tt->timers[0]) {
-return true;
+ThrottleDirection dir;
+
+for (dir = THROTTLE_READ; dir < THROTTLE_MAX; dir++) {
+if (tt->timers[dir]) {
+return true;
+}
 }
 
 return false;
@@ -424,8 +434,12 @@ bool throttle_schedule_timer(ThrottleState *ts,
 {
 int64_t now = qemu_clock_get_ns(tt->clock_type);
 int64_t next_timestamp;
+QEMUTimer *timer;
 bool must_wait;
 
+timer = is_write ? tt->timers[THROTTLE_WRITE] : tt->timers[THROTTLE_READ];
+assert(timer);
+
 must_wait = throttle_compute_timer(ts,
is_write,
now,
@@ -437,12 +451,12 @@ bool throttle_schedule_timer(ThrottleState *ts,
 }
 
 /* request throttled and timer pending -> do nothing */
-if (timer_pending(tt->timers[is_write])) {
+if (timer_pending(timer)) {
 return true;
 }
 
 /* request throttled and timer not pending -> arm timer */
-timer_mod(tt->timers[is_write], next_timestamp);
+timer_mod(timer, next_timestamp);
 return true;
 }
 
-- 
2.41.0




Re: [PATCH 7/7] qobject atomics osdep: Make a few macros more hygienic

2023-09-01 Thread Markus Armbruster
Richard Henderson  writes:

> On 8/31/23 06:25, Markus Armbruster wrote:
>> +#define PASTE(a, b) a##b
>
> We already have glue() in qemu/compiler.h.

Missed it, will fix.
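
For reference, the technique behind both macros is token pasting with one
level of indirection, so that arguments are macro-expanded before being
pasted.  A small sketch (GLUE/XGLUE are stand-ins, and __COUNTER__ is a
gcc/clang extension):

    #define XGLUE(a, b) a##b
    #define GLUE(a, b)  XGLUE(a, b)

    /* Mint a unique temporary per expansion so that nested uses of the
     * macro do not shadow each other's locals: */
    #define SWAP_INT_(a, b, tmp) \
        do { int tmp = (a); (a) = (b); (b) = tmp; } while (0)
    #define SWAP_INT(a, b) SWAP_INT_(a, b, GLUE(swap_tmp_, __COUNTER__))

Each expansion of SWAP_INT() then declares swap_tmp_0, swap_tmp_1, and so
on, instead of one fixed name.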

> The rest of it looks quite sensible.

Thanks!




Re: [PATCH 3/7] ui: Clean up local variable shadowing

2023-09-01 Thread Markus Armbruster
Peter Maydell  writes:

> On Thu, 31 Aug 2023 at 14:25, Markus Armbruster  wrote:
>>
>> Local variables shadowing other local variables or parameters make the
>> code needlessly hard to understand.  Tracked down with -Wshadow=local.
>> Clean up: delete inner declarations when they are actually redundant,
>> else rename variables.
>>
>> Signed-off-by: Markus Armbruster 
>
>
>> diff --git a/ui/vnc-enc-zrle.c.inc b/ui/vnc-enc-zrle.c.inc
>> index c107d8affc..edf42d4a6a 100644
>> --- a/ui/vnc-enc-zrle.c.inc
>> +++ b/ui/vnc-enc-zrle.c.inc
>> @@ -153,11 +153,12 @@ static void ZRLE_ENCODE_TILE(VncState *vs, ZRLE_PIXEL 
>> *data, int w, int h,
>>  }
>>
>>  if (use_rle) {
>> -ZRLE_PIXEL *ptr = data;
>> -ZRLE_PIXEL *end = ptr + w * h;
>>  ZRLE_PIXEL *run_start;
>>  ZRLE_PIXEL pix;
>>
>> +ptr = data;
>> +end = ptr + w * h;
>> +
>>  while (ptr < end) {
>>  int len;
>>  int index = 0;
>> @@ -198,7 +199,7 @@ static void ZRLE_ENCODE_TILE(VncState *vs, ZRLE_PIXEL 
>> *data, int w, int h,
>>  }
>>  } else if (use_palette) { /* no RLE */
>>  int bppp;
>> -ZRLE_PIXEL *ptr = data;
>> +ptr = data;
>>
>>  /* packed pixels */
>>
>> @@ -241,8 +242,6 @@ static void ZRLE_ENCODE_TILE(VncState *vs, ZRLE_PIXEL 
>> *data, int w, int h,
>>  #endif
>>  {
>>  #ifdef ZRLE_COMPACT_PIXEL
>> -ZRLE_PIXEL *ptr;
>> -
>>  for (ptr = data; ptr < data + w * h; ptr++) {
>>  ZRLE_WRITE_PIXEL(vs, *ptr);
>>  }
>
> For this one I'm tempted to suggest instead moving the
> pix and end currently at whole-function scope into their
> own block, so it's clear these are actually four
> completely independent uses of ptr/end. But either way

You have a point.  However, we'd need to splice in a block just for
that, which involved some reindenting.  Can do if the assigned
maintainers (Gerd and Marc-André) prefer it.  Else, I'll stay with
minimally invasive.
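
For comparison, the block-scoped variant being suggested would look
roughly like this (a sketch, not the actual vnc-enc-zrle code):

    static void encode_tile(int *data, int w, int h, int use_rle)
    {
        if (use_rle) {
            /* own block: this ptr/end pair is independent of the others */
            int *ptr = data;
            int *end = data + (long)w * h;

            while (ptr < end) {
                ptr++;   /* ... RLE encoding ... */
            }
        } else {
            int *ptr = data;   /* a separate, equally independent use */

            (void)ptr;         /* ... palette / raw encoding ... */
        }
    }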

> Reviewed-by: Peter Maydell 

Thanks!




Re: [PATCH 0/7] Steps towards enabling -Wshadow=local

2023-09-01 Thread Markus Armbruster
Markus Armbruster  writes:

> Local variables shadowing other local variables or parameters make the
> code needlessly hard to understand.  Bugs love to hide in such code.
> Evidence: PATCH 1.
>
> Enabling -Wshadow would prevent bugs like this one.  But we'd have to
> clean up all the offenders first.  We got a lot of them.
>
> Enabling -Wshadow=local should be less work for almost as much gain.
> I took a stab at it.  There's a small, exciting part, and a large,
> boring part.
>
> The exciting part is dark preprocessor sorcery to let us nest macro
> calls without shadowing: PATCH 7.
>
> The boring part is cleaning up all the other warnings.  I did some
> [PATCH 2-6], but ran out of steam long before finishing the job.  Some
> 160 unique warnings remain.
>
> To see them, enable -Wshadow=local like so:
>
> diff --git a/meson.build b/meson.build
> index 98e68ef0b1..9fc4c7ac9d 100644
> --- a/meson.build
> +++ b/meson.build
> @@ -466,6 +466,9 @@ warn_flags = [
>'-Wno-tautological-type-limit-compare',
>'-Wno-psabi',
>'-Wno-gnu-variable-sized-type-not-at-end',
> +  '-Wshadow=local',
> +  '-Wno-error=shadow=local',
> +  '-Wno-error=shadow=compatible-local',
>  ]
>  
>  if targetos != 'darwin'
>
> You may want to drop the -Wno-error lines.
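
As a reminder of the class of bug this catches, a minimal illustration
(hypothetical, not one of the warnings listed below):

    int total_len(const int *chunks, int n)
    {
        int len = 0;

        for (int i = 0; i < n; i++) {
            int len = len + chunks[i];   /* shadows the outer len; the
                                            initializer even reads the
                                            uninitialized inner len */
            (void)len;
        }
        return len;   /* always 0: the accumulation went nowhere */
    }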
>
> Subsystems with -Wshadow=local warnings:
>
> virtio-gpu
> virtio
> Device Tree
> Overall TCG CPUs

Philippe's "[PATCH 00/11] (few more) Steps towards enabling -Wshadow"
takes care of this one.

> Overall Audio backends
> Open Sound System (OSS) Audio backend
> vhost
> vhost-user-gpu
> Cryptography
> M68K TCG CPUs
> Dump
> ACPI/SMBIOS
> Allwinner-a10

Likewise.

> ARM TCG CPUs
> MPS2
> ASPEED BMCs
> ARM SMMU
> Virt
> Machine core
> PC Chipset
> X86 TCG CPUs
> PC
> VT-d Emulation
> IDE

Likewise.

> ARM cores
> OpenPIC interrupt controller
> q800

Likewise.

> petalogix_ml605
> MicroBlaze TCG CPUs
> Versatile PB
> Network devices
> NiosII TCG CPUs
> nvme
> PowerNV (Non-Virtualized)
> sPAPR (pseries)
> OpenTitan
> RISC-V TCG CPUs
> SCSI
> USB
> Linux user
> Network packet abstractions

Likewise.

> Network device backends

Likewise.

> Network Block Device (NBD)
> Semihosting
> Memory API
> Seccomp
> Main loop
> Hexagon TCG CPUs
> X86 KVM CPUs
> MIPS TCG CPUs

Likewise.

> PowerPC TCG CPUs
> TriCore TCG CPUs
> Common TCG code

Likewise.

> qtest
> Throttling infrastructure
> Vhost-user block device backend server
>
> Files with -Wshadow=local warnings:
>
> accel/tcg/tb-maint.c

Likewise.

> audio/audio.c
> audio/ossaudio.c
> contrib/vhost-user-gpu/vhost-user-gpu.c
> contrib/vhost-user-gpu/vugpu.h
> crypto/cipher-gnutls.c.inc
> crypto/tls-cipher-suites.c
> disas/m68k.c
> dump/dump.c
> hw/acpi/cpu_hotplug.c
> hw/arm/allwinner-r40.c

Likewise.

> hw/arm/armsse.c
> hw/arm/armv7m.c
> hw/arm/aspeed_ast2600.c

Likewise.

> hw/arm/smmuv3-internal.h
> hw/arm/smmuv3.c
> hw/arm/virt.c

Likewise.

> hw/core/machine.c
> hw/i2c/aspeed_i2c.c
> hw/i2c/pm_smbus.c
> hw/i386/acpi-build.c
> hw/i386/acpi-microvm.c
> hw/i386/intel_iommu.c
> hw/i386/pc.c
> hw/i386/x86.c
> hw/ide/ahci.c

Likewise.

> hw/intc/arm_gicv3_its.c
> hw/intc/openpic.c
> hw/loongarch/virt.c
> hw/m68k/bootinfo.h

Likewise.

> hw/microblaze/petalogix_ml605_mmu.c
> hw/misc/arm_sysctl.c
> hw/misc/aspeed_i3c.c
> hw/net/vhost_net.c
> hw/nios2/10m50_devboard.c
> hw/nvme/ns.c
> hw/ppc/pnv_psi.c
> hw/ppc/spapr.c
> hw/ppc/spapr_drc.c
> hw/ppc/spapr_pci.c
> hw/riscv/opentitan.c
> hw/scsi/mptsas.c
> hw/smbios/smbios.c
> hw/usb/desc.c
> hw/usb/dev-hub.c
> hw/usb/dev-storage.c
> hw/usb/hcd-xhci.c
> hw/usb/host-libusb.c
> hw/virtio/vhost.c
> hw/virtio/virtio-pci.c
> include/hw/cxl/cxl_device.h
> include/hw/ppc/fdt.h
> include/hw/virtio/virtio-gpu.h
> include/sysemu/device_tree.h

Likewise.

> linux-user/flatload.c
> linux-user/mmap.c
> linux-user/strace.c
> linux-user/syscall.c
> net/eth.c

Likewise.

> qemu-nbd.c
> semihosting/arm-compat-semi.c
> softmmu/device_tree.c
> softmmu/memory.c
> softmmu/physmem.c
> softmmu/qemu-seccomp.c
> softmmu/vl.c

  target/arm/hvf/hvf.c

Likewise.

> target/arm/tcg/mve_helper.c

Likewise.

> target/arm/tcg/translate-m-nocp.c

Likewise.

> target/hexagon/helper_funcs_generated.c.inc

This is actually

  target/hexagon/gen_helper_funcs.py

> target/hexagon/mmvec/macros.h
> target/hexagon/op_helper.c
> target/hexagon/translate.c
> target/i386/cpu.c
> target/i386/kvm/kvm.c
> target/i386/tcg/seg_helper.c
> target/i386/tcg/sysemu/svm_helper.c
> target/i386/tcg/translate.c

Re: [PATCH 00/11] (few more) Steps towards enabling -Wshadow

2023-09-01 Thread Markus Armbruster
Philippe Mathieu-Daudé  writes:

> For rational see Markus cover on
> https://lore.kernel.org/qemu-devel/20230831132546.3525721-1-arm...@redhat.com/
>
> This series contains few more, my take.
>
> Based-on: <20230831132546.3525721-1-arm...@redhat.com>

Awesome, thanks!




Re: [PATCH 10/11] net/eth: Clean up local variable shadowing

2023-09-01 Thread Akihiko Odaki

On 2023/09/01 7:56, Philippe Mathieu-Daudé wrote:

Fix:

   net/eth.c:435:20: error: declaration shadows a local variable [-Werror,-Wshadow]
 size_t input_size = iov_size(pkt, pkt_frags);
^
   net/eth.c:413:16: note: previous declaration is here
 size_t input_size = iov_size(pkt, pkt_frags);
^

Signed-off-by: Philippe Mathieu-Daudé 
---
  net/eth.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/eth.c b/net/eth.c
index 649e66bb1f..cf030eed7b 100644
--- a/net/eth.c
+++ b/net/eth.c
@@ -432,7 +432,7 @@ _eth_get_rss_ex_src_addr(const struct iovec *pkt, int pkt_frags,
  }
  
  if (opthdr.type == IP6_OPT_HOME) {

-size_t input_size = iov_size(pkt, pkt_frags);
+input_size = iov_size(pkt, pkt_frags);


You can just remove this statement.

  
  if (input_size < opt_offset + sizeof(opthdr)) {

  return false;




Re: [PATCH v2 12/12] hw/vmapple/vmapple: Add vmapple machine type

2023-09-01 Thread Mark Cave-Ayland

On 30/08/2023 17:14, Alexander Graf wrote:

Hi Alex,


Apple defines a new "vmapple" machine type as part of its proprietary
macOS Virtualization.Framework vmm. This machine type is similar to the
virt one, but with subtle differences in base devices, a few special
vmapple device additions and a vastly different boot chain.

This patch reimplements this machine type in QEMU. To use it, you
have to have a readily installed version of macOS for VMApple,
run on macOS with -accel hvf, pass the Virtualization.Framework
boot rom (AVPBooter) in via -bios, pass the aux and root volume as pflash
and pass aux and root volume as virtio drives. In addition, you also
need to find the machine UUID and pass that as -M vmapple,uuid= parameter:

$ qemu-system-aarch64 -accel hvf -M vmapple,uuid=0x1234 -m 4G \
 -bios 
/System/Library/Frameworks/Virtualization.framework/Versions/A/Resources/AVPBooter.vmapple2.bin
 -drive file=aux,if=pflash,format=raw \
 -drive file=root,if=pflash,format=raw \
 -drive file=aux,if=none,id=aux,format=raw \
 -device vmapple-virtio-aux,drive=aux \
 -drive file=root,if=none,id=root,format=raw \
 -device vmapple-virtio-root,drive=root

With all these in place, you should be able to see macOS booting
successfully.

Signed-off-by: Alexander Graf 

---

v1 -> v2:

   - Adapt to system_ss meson.build target
   - Add documentation
---
  MAINTAINERS |   1 +
  docs/system/arm/vmapple.rst |  63 
  docs/system/target-arm.rst  |   1 +
  hw/vmapple/vmapple.c| 661 
  hw/vmapple/Kconfig  |  19 ++
  hw/vmapple/meson.build  |   1 +
  6 files changed, 746 insertions(+)
  create mode 100644 docs/system/arm/vmapple.rst
  create mode 100644 hw/vmapple/vmapple.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 3104e58eff..1d3b1e0034 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2578,6 +2578,7 @@ M: Alexander Graf 
  S: Maintained
  F: hw/vmapple/*
  F: include/hw/vmapple/*
+F: docs/system/arm/vmapple.rst
  
  Subsystems

  --
diff --git a/docs/system/arm/vmapple.rst b/docs/system/arm/vmapple.rst
new file mode 100644
index 00..c7486b21d9
--- /dev/null
+++ b/docs/system/arm/vmapple.rst
@@ -0,0 +1,63 @@
+VMApple machine emulation
+
+
+VMApple is the device model that the macOS built-in hypervisor called 
"Virtualization.framework"
+exposes to Apple Silicon macOS guests. The "vmapple" machine model in QEMU 
implements the same
+device model, but does not use any code from Virtualization.Framework.
+
+Prerequisites
+-
+
+To run the vmapple machine model, you need to
+
+ * Run on Apple Silicon
+ * Run on macOS 12.0 or above
+ * Have an already installed copy of a Virtualization.Framework macOS virtual 
machine. I will
+   assume that you installed it using the macosvm CLI.
+
+First, we need to extract the UUID from the virtual machine that you 
installed. You can do this
+by running the following shell script:
+
+.. code-block:: bash
+  :caption: uuid.sh script to extract the UUID from a macosvm.json file
+
+  #!/bin/bash
+
+  MID=$(cat "$1" | python3 -c 'import json,sys;obj=json.load(sys.stdin);print(obj["machineId"]);')
+  echo "$MID" | base64 -d | plutil -extract ECID raw -
+
+Now we also need to trim the aux partition. It contains metadata that we can 
just discard:
+
+.. code-block:: bash
+  :caption: Command to trim the aux file
+
+  $ dd if="aux.img" of="aux.img.trimmed" bs=$(( 0x4000 )) skip=1
+
+How to run
+--
+
+Then, we can launch QEMU with the Virtualization.Framework pre-boot 
environment and the readily
+installed target disk images. I recommend to port forward the VM's ssh and vnc 
ports to the host
+to get better interactive access into the target system:
+
+.. code-block:: bash
+  :caption: Example execution command line
+
+  $ UUID=$(uuid.sh macosvm.json)
+  $ AVPBOOTER=/System/Library/Frameworks/Virtualization.framework/Resources/AVPBooter.vmapple2.bin
+  $ AUX=aux.img.trimmed
+  $ DISK=disk.img
+  $ qemu-system-aarch64 \
+   -serial mon:stdio \
+   -m 4G \
+   -accel hvf \
+   -M vmapple,uuid=$UUID \
+   -bios $AVPBOOTER \
+-drive file="$AUX",if=pflash,format=raw \
+-drive file="$DISK",if=pflash,format=raw \
+   -drive file="$AUX",if=none,id=aux,format=raw \
+   -drive file="$DISK",if=none,id=root,format=raw \
+   -device vmapple-virtio-aux,drive=aux \
+   -device vmapple-virtio-root,drive=root \
+   -net user,ipv6=off,hostfwd=tcp::-:22,hostfwd=tcp::5901-:5900 \
+   -net nic,model=virtio \
diff --git a/docs/system/target-arm.rst b/docs/system/target-arm.rst
index 790ac1b8a2..bf663df4a6 100644
--- a/docs/system/target-arm.rst
+++ b/docs/system/target-arm.rst
@@ -106,6 +106,7 @@ undocumented; you can get a complete list by running
 arm/stellaris
 arm/stm32
 arm/virt
+   arm/vmapple
 arm/xlnx-versal-virt

Re: [PATCH v2 11/12] hw/vmapple/virtio-blk: Add support for apple virtio-blk

2023-09-01 Thread Mark Cave-Ayland

On 30/08/2023 17:14, Alexander Graf wrote:

Hi Alex,


Apple has its own virtio-blk PCI device ID where it deviates from the
official virtio-pci spec slightly: It puts a new "apple type"
field at a static offset in config space and introduces a new barrier
command.

This patch first creates a mechanism for virtio-blk downstream classes to
handle unknown commands. It then creates such a downstream class and a new
vmapple-virtio-blk-pci class which support the additional apple type config
identifier as well as the barrier command.

It then exposes 2 subclasses from that that we can use to expose root and
aux virtio-blk devices: "vmapple-virtio-root" and "vmapple-virtio-aux".
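
The hook itself is a plain class callback; a self-contained sketch of the
shape (hypothetical names and request types, not the actual QOM code):

    #include <stdbool.h>
    #include <stdio.h>

    typedef struct Req { unsigned type; } Req;

    typedef struct BlkClass {
        /* returns true if the subclass consumed the request */
        bool (*handle_unknown_request)(Req *req);
    } BlkClass;

    static bool base_handle(const BlkClass *k, Req *req)
    {
        switch (req->type) {
        case 0:   /* ... the known virtio-blk commands ... */
            return true;
        default:  /* defer anything unknown to the subclass, if any */
            return k->handle_unknown_request && k->handle_unknown_request(req);
        }
    }

    /* e.g. an Apple-specific barrier command (type value made up here) */
    static bool barrier_handler(Req *req) { return req->type == 0x10000; }

    int main(void)
    {
        BlkClass apple = { .handle_unknown_request = barrier_handler };
        Req barrier = { .type = 0x10000 };

        printf("handled: %d\n", base_handle(&apple, &barrier));
        return 0;
    }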

Signed-off-by: Alexander Graf 

---

v1 -> v2:

   - Rework to make all vmapple virtio-blk logic a subclass
---
  include/hw/pci/pci_ids.h|   1 +
  include/hw/virtio/virtio-blk.h  |  12 +-
  include/hw/vmapple/virtio-blk.h |  39 ++
  hw/block/virtio-blk.c   |  19 ++-
  hw/vmapple/virtio-blk.c | 212 
  hw/vmapple/Kconfig  |   3 +
  hw/vmapple/meson.build  |   1 +
  7 files changed, 282 insertions(+), 5 deletions(-)
  create mode 100644 include/hw/vmapple/virtio-blk.h
  create mode 100644 hw/vmapple/virtio-blk.c

diff --git a/include/hw/pci/pci_ids.h b/include/hw/pci/pci_ids.h
index e4386ebb20..74e589a298 100644
--- a/include/hw/pci/pci_ids.h
+++ b/include/hw/pci/pci_ids.h
@@ -188,6 +188,7 @@
  #define PCI_DEVICE_ID_APPLE_UNI_N_AGP0x0020
  #define PCI_DEVICE_ID_APPLE_U3_AGP   0x004b
  #define PCI_DEVICE_ID_APPLE_UNI_N_GMAC   0x0021
+#define PCI_DEVICE_ID_APPLE_VIRTIO_BLK   0x1a00
  
  #define PCI_VENDOR_ID_SUN0x108e

  #define PCI_DEVICE_ID_SUN_EBUS   0x1000
diff --git a/include/hw/virtio/virtio-blk.h b/include/hw/virtio/virtio-blk.h
index dafec432ce..381a906410 100644
--- a/include/hw/virtio/virtio-blk.h
+++ b/include/hw/virtio/virtio-blk.h
@@ -23,7 +23,7 @@
  #include "qom/object.h"
  
  #define TYPE_VIRTIO_BLK "virtio-blk-device"

-OBJECT_DECLARE_SIMPLE_TYPE(VirtIOBlock, VIRTIO_BLK)
+OBJECT_DECLARE_TYPE(VirtIOBlock, VirtIOBlkClass, VIRTIO_BLK)
  
  /* This is the last element of the write scatter-gather list */

  struct virtio_blk_inhdr
@@ -91,6 +91,16 @@ typedef struct MultiReqBuffer {
  bool is_write;
  } MultiReqBuffer;
  
+typedef struct VirtIOBlkClass {

+/*< private >*/
+VirtioDeviceClass parent;


This should be parent_class for consistency.


+/*< public >*/
+bool (*handle_unknown_request)(VirtIOBlockReq *req, MultiReqBuffer *mrb,
+   uint32_t type);
+} VirtIOBlkClass;
+


Same as before: you can drop the typedef and the public/private comments, and add a
newline after parent_class.
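
In other words, roughly this shape (a sketch of the requested style,
relying on OBJECT_DECLARE_TYPE() for the typedef):

    struct VirtIOBlkClass {
        VirtioDeviceClass parent_class;

        bool (*handle_unknown_request)(VirtIOBlockReq *req, MultiReqBuffer *mrb,
                                       uint32_t type);
    };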



  void virtio_blk_handle_vq(VirtIOBlock *s, VirtQueue *vq);
+void virtio_blk_free_request(VirtIOBlockReq *req);
+void virtio_blk_req_complete(VirtIOBlockReq *req, unsigned char status);
  
  #endif

diff --git a/include/hw/vmapple/virtio-blk.h b/include/hw/vmapple/virtio-blk.h
new file mode 100644
index 00..b23106a3df
--- /dev/null
+++ b/include/hw/vmapple/virtio-blk.h
@@ -0,0 +1,39 @@
+/*
+ * VMApple specific VirtIO Block implementation
+ *
+ * Copyright © 2023 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef HW_VMAPPLE_CFG_H
+#define HW_VMAPPLE_CFG_H
+
+#include "hw/sysbus.h"
+#include "qom/object.h"
+#include "hw/virtio/virtio-pci.h"
+#include "hw/virtio/virtio-blk.h"
+
+#define TYPE_VMAPPLE_VIRTIO_BLK "vmapple-virtio-blk"
+#define TYPE_VMAPPLE_VIRTIO_ROOT "vmapple-virtio-root"
+#define TYPE_VMAPPLE_VIRTIO_AUX "vmapple-virtio-aux"
+
+OBJECT_DECLARE_TYPE(VMAppleVirtIOBlk, VMAppleVirtIOBlkClass, 
VMAPPLE_VIRTIO_BLK)
+
+typedef struct VMAppleVirtIOBlkClass {
+/*< private >*/
+VirtIOBlkClass parent;
+/*< public >*/
+void (*get_config)(VirtIODevice *vdev, uint8_t *config);
+} VMAppleVirtIOBlkClass;


Same here.


+typedef struct VMAppleVirtIOBlk {
+/*< private >*/
+VirtIOBlock parent_obj;
+
+/*< public >*/
+uint32_t apple_type;
+} VMAppleVirtIOBlk;


And here.


+#endif /* HW_VMAPPLE_CFG_H */
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 39e7f23fab..1645cdccbe 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -48,12 +48,12 @@ static void virtio_blk_init_request(VirtIOBlock *s, VirtQueue *vq,
  req->mr_next = NULL;
  }
  
-static void virtio_blk_free_request(VirtIOBlockReq *req)

+void virtio_blk_free_request(VirtIOBlockReq *req)
  {
  g_free(req);
  }
  
-static void virtio_blk_req_complete(VirtIOBlockReq *req, unsigned char status)

+void virtio_blk_req_complete(VirtIOBlockReq *req, unsigned char status)
  {
  VirtIOBlock *s = req->dev;
  VirtIODevice *vdev = VIRTIO_DEVICE(s);
@@ -1121,8 +1121,18 @@ static int virtio_blk_handle_request(VirtIOBlockReq *req, Multi