Re: [PULL V2 00/17] Net patches

2023-09-19 Thread Stefan Hajnoczi
Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/8.2 for any 
user-visible changes.




npcm7xx_timer-test.c is unreliable

2023-09-19 Thread Stefan Hajnoczi
Hi,
The npcm7xx_timer-test qtest fails intermittently:
https://gitlab.com/qemu-project/qemu/-/jobs/5121787250

38/96 qemu:qtest+qtest-arm / qtest-arm/npcm7xx_timer-test   ERROR   
0.95s   exit status 1
>>> QTEST_QEMU_BINARY=./qemu-system-arm 
>>> QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon 
>>> G_TEST_DBUS_DAEMON=/builds/qemu-project/qemu/tests/dbus-vmstate-daemon.sh 
>>> QTEST_QEMU_IMG=./qemu-img MALLOC_PERTURB_=103 
>>> /builds/qemu-project/qemu/build/tests/qtest/npcm7xx_timer-test --tap -k
― ✀  ―
stderr:
**
ERROR:../tests/qtest/npcm7xx_timer-test.c:475:test_periodic_interrupt: 
assertion failed (tim_read(td, TISR) == tim_timer_bit(td)): (0x == 
0x0004)
**
ERROR:../tests/qtest/npcm7xx_timer-test.c:476:test_periodic_interrupt: 
'qtest_get_irq(global_qtest, tim_timer_irq(td))' should be TRUE
(test program exited with status code 1)
――

When I reran the CI job, it passed.

Please investigate why this test is unreliable and fix it. Thanks!

There is a GitLab issue tracking this:
https://gitlab.com/qemu-project/qemu/-/issues/1897

Stefan




Re: [PULL 00/28] Block layer patches

2023-09-19 Thread Stefan Hajnoczi
On Tue, 19 Sept 2023 at 06:26, Kevin Wolf  wrote:
>
> Am 18.09.2023 um 20:56 hat Stefan Hajnoczi geschrieben:
> > Hi Kevin,
> > I believe that my own commit "block-coroutine-wrapper: use
> > qemu_get_current_aio_context()" breaks this test. The failure is
> > non-deterministic (happens about 1 out of 4 runs).
> >
> > It seems the job hangs and the test times out in vm.run_job('job1', 
> > wait=5.0).
> >
> > I haven't debugged it yet but wanted to share this information to save
> > some time. Tomorrow I'll investigate further.
>
> Yes, it's relatively easily reproducible if I run the test in a loop,
> and I can't seem to reproduce it without the last patch. Should I
> unstage the full series again, or do you think that the last patch is
> really optional this time?

Please drop the last patch. I'm not aware of anything else in the series that depends on it.

> However, I'm unsure how the stack traces I'm seeing are related to your
> patch. Maybe it just made an existing bug more likely to be triggered?

I'll share my thoughts once I've looked into the failure today.

Regarding AioContext lock removal: I'll work on that and see what
still depends on the lock.

Stefan

> What I'm seeing is that the reader lock is held by an iothread that is
> waiting for its AioContext lock to make progress:
>
> Thread 3 (Thread 0x7f811e9346c0 (LWP 26390) "qemu-system-x86"):
> #0  0x7f81250aaf80 in __lll_lock_wait () at /lib64/libc.so.6
> #1  0x7f81250b149a in pthread_mutex_lock@@GLIBC_2.2.5 () at 
> /lib64/libc.so.6
> #2  0x55b7b170967e in qemu_mutex_lock_impl (mutex=0x55b7b34e3080, 
> file=0x55b7b199e1f7 "../util/async.c", line=728) at 
> ../util/qemu-thread-posix.c:94
> #3  0x55b7b1709953 in qemu_rec_mutex_lock_impl (mutex=0x55b7b34e3080, 
> file=0x55b7b199e1f7 "../util/async.c", line=728) at 
> ../util/qemu-thread-posix.c:149
> #4  0x55b7b1728318 in aio_context_acquire (ctx=0x55b7b34e3020) at 
> ../util/async.c:728
> #5  0x55b7b1727c49 in co_schedule_bh_cb (opaque=0x55b7b34e3020) at 
> ../util/async.c:565
> #6  0x55b7b1726f1c in aio_bh_call (bh=0x55b7b34e2e70) at 
> ../util/async.c:169
> #7  0x55b7b17270ee in aio_bh_poll (ctx=0x55b7b34e3020) at 
> ../util/async.c:216
> #8  0x55b7b170351d in aio_poll (ctx=0x55b7b34e3020, blocking=true) at 
> ../util/aio-posix.c:722
> #9  0x55b7b1518604 in iothread_run (opaque=0x55b7b2904460) at 
> ../iothread.c:63
> #10 0x55b7b170a955 in qemu_thread_start (args=0x55b7b34e36b0) at 
> ../util/qemu-thread-posix.c:541
> #11 0x7f81250ae15d in start_thread () at /lib64/libc.so.6
> #12 0x7f812512fc00 in clone3 () at /lib64/libc.so.6
>
> On the other hand, the main thread wants to acquire the writer lock,
> but it holds the AioContext lock of the iothread (it takes it in
> job_prepare_locked()):
>
> Thread 1 (Thread 0x7f811f4b7b00 (LWP 26388) "qemu-system-x86"):
> #0  0x7f8125122356 in ppoll () at /lib64/libc.so.6
> #1  0x55b7b172eae0 in qemu_poll_ns (fds=0x55b7b34ec910, nfds=1, 
> timeout=-1) at ../util/qemu-timer.c:339
> #2  0x55b7b1704ebd in fdmon_poll_wait (ctx=0x55b7b3269210, 
> ready_list=0x7ffc90b05680, timeout=-1) at ../util/fdmon-poll.c:79
> #3  0x55b7b1703284 in aio_poll (ctx=0x55b7b3269210, blocking=true) at 
> ../util/aio-posix.c:670
> #4  0x55b7b1567c3b in bdrv_graph_wrlock (bs=0x0) at 
> ../block/graph-lock.c:145
> #5  0x55b7b1554c1c in blk_remove_bs (blk=0x55b7b4425800) at 
> ../block/block-backend.c:916
> #6  0x55b7b1554779 in blk_delete (blk=0x55b7b4425800) at 
> ../block/block-backend.c:497
> #7  0x55b7b1554133 in blk_unref (blk=0x55b7b4425800) at 
> ../block/block-backend.c:557
> #8  0x55b7b157a149 in mirror_exit_common (job=0x55b7b4419000) at 
> ../block/mirror.c:696
> #9  0x55b7b1577015 in mirror_prepare (job=0x55b7b4419000) at 
> ../block/mirror.c:807
> #10 0x55b7b153a1a7 in job_prepare_locked (job=0x55b7b4419000) at 
> ../job.c:988
> #11 0x55b7b153a0d9 in job_txn_apply_locked (job=0x55b7b4419000, 
> fn=0x55b7b153a110 ) at ../job.c:191
> #12 0x55b7b1538b6d in job_do_finalize_locked (job=0x55b7b4419000) at 
> ../job.c:1011
> #13 0x55b7b153a886 in job_completed_txn_success_locked 
> (job=0x55b7b4419000) at ../job.c:1068
> #14 0x55b7b1539372 in job_completed_locked (job=0x55b7b4419000) at 
> ../job.c:1082
> #15 0x55b7b153a71b in job_exit (opaque=0x55b7b4419000) at ../job.c:1103
> #16 0x55b7b1726f1c in aio_bh_call (bh=0x7f8110005470) at 
> ../util/async.c:169
> #17 0x55b7b17270ee in aio_bh_poll (ctx=0x55b7b3269210) at 
> ../util/async.c:216
> #18 0x55b7b1702c05 in ai

Re: [PULL 0/8] Hppa btlb patches

2023-09-19 Thread Stefan Hajnoczi
Please take a look at the following CI failure and resend when you
have fixed the error:

mipsel-linux-gnu-gcc -Ilibqemu-hppa-softmmu.fa.p -I. -I..
-Itarget/hppa -I../target/hppa -Iqapi -Itrace -Iui -Iui/shader
-I/usr/include/pixman-1 -I/usr/include/capstone
-I/usr/include/spice-server -I/usr/include/spice-1
-I/usr/include/glib-2.0 -I/usr/lib/mipsel-linux-gnu/glib-2.0/include
-fdiagnostics-color=auto -Wall -Winvalid-pch -Werror -std=gnu11 -O2 -g
-fstack-protector-strong -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -Wundef
-Wwrite-strings -Wmissing-prototypes -Wstrict-prototypes
-Wredundant-decls -Wold-style-declaration -Wold-style-definition
-Wtype-limits -Wformat-security -Wformat-y2k -Winit-self
-Wignored-qualifiers -Wempty-body -Wnested-externs -Wendif-labels
-Wexpansion-to-defined -Wimplicit-fallthrough=2
-Wmissing-format-attribute -Wno-missing-include-dirs
-Wno-shift-negative-value -Wno-psabi -isystem
/builds/qemu-project/qemu/linux-headers -isystem linux-headers -iquote
. -iquote /builds/qemu-project/qemu -iquote
/builds/qemu-project/qemu/include -iquote
/builds/qemu-project/qemu/host/include/generic -iquote
/builds/qemu-project/qemu/tcg/mips -pthread -D_GNU_SOURCE
-D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -fno-strict-aliasing
-fno-common -fwrapv -fPIE -isystem../linux-headers
-isystemlinux-headers -DNEED_CPU_H
'-DCONFIG_TARGET="hppa-softmmu-config-target.h"'
'-DCONFIG_DEVICES="hppa-softmmu-config-devices.h"' -MD -MQ
libqemu-hppa-softmmu.fa.p/target_hppa_mem_helper.c.o -MF
libqemu-hppa-softmmu.fa.p/target_hppa_mem_helper.c.o.d -o
libqemu-hppa-softmmu.fa.p/target_hppa_mem_helper.c.o -c
../target/hppa/mem_helper.c
In file included from ../target/hppa/mem_helper.c:21:
../target/hppa/mem_helper.c: In function ‘helper_diag_btlb’:
../target/hppa/mem_helper.c:461:36: error: format ‘%lx’ expects
argument of type ‘long unsigned int’, but argument 4 has type
‘uint64_t’ {aka ‘long long unsigned int’} [-Werror=format=]
461 | qemu_log_mask(CPU_LOG_MMU, "PDC_BLOCK_TLB: PDC_BTLB_INSERT "
| ^
..
466 | virt_page, phys_page, len, slot);
| ~
| |
| uint64_t {aka long long unsigned int}
../include/qemu/log.h:55:22: note: in definition of macro ‘qemu_log_mask’
55 | qemu_log(FMT, ## __VA_ARGS__); \
| ^~~
cc1: all warnings being treated as errors
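
For reference, one common fix for this class of warning is to print
uint64_t values with the PRIx64/PRIu64 macros instead of %lx, so the
format matches on 32-bit hosts such as this mipsel cross build. The
sketch below is illustrative only and is not the actual hunk from the
pull request; the helper name and message text are invented:

#include "qemu/osdep.h"
#include "qemu/log.h"

/* Hypothetical helper, for illustration only. */
static void log_btlb_insert(uint64_t virt_page, uint64_t phys_page,
                            uint64_t len, uint64_t slot)
{
    qemu_log_mask(CPU_LOG_MMU,
                  "PDC_BLOCK_TLB: PDC_BTLB_INSERT vpage 0x%" PRIx64
                  " phys 0x%" PRIx64 " len %" PRIu64 " slot %" PRIu64 "\n",
                  virt_page, phys_page, len, slot);
}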

Thanks,
Stefan

On Sat, 16 Sept 2023 at 15:33,  wrote:
>
> From: Helge Deller 
>
> The following changes since commit 9ef497755afc252fb8e060c9ea6b0987abfd20b6:
>
>   Merge tag 'pull-vfio-20230911' of https://github.com/legoater/qemu into 
> staging (2023-09-11 09:13:08 -0400)
>
> are available in the Git repository at:
>
>   https://github.com/hdeller/qemu-hppa.git tags/hppa-btlb-pull-request
>
> for you to fetch changes up to 303b1febe3dcd519314d6ed80d97a706cdd21f64:
>
>   linux-user/hppa: lock both words of function descriptor (2023-09-16 
> 21:13:08 +0200)
>
> 
> Block-TLB support and linux-user fixes for hppa target
>
> All 32-bit hppa CPUs allow a fixed number of TLB entries to have a
> different page size than the default 4k.
> Those are called "Block-TLBs" and are created at startup by the
> operating system and managed by the firmware of hppa machines
> through the firmware PDC_BLOCK_TLB call.
>
> This patchset adds the necessary glue to SeaBIOS-hppa and
> qemu to allow up to 16 BTLB entries in the emulation.
>
> Two patches from Mikulas Patocka fix signal delivery issues
> in linux-user on hppa.
>
> 
>
> Helge Deller (6):
>   target/hppa: Update to SeaBIOS-hppa version 9
>   target/hppa: Allow up to 16 BTLB entries
>   target/hppa: Report and clear BTLBs via fw_cfg at startup
>   target/hppa: Add BTLB support to hppa TLB functions
>   target/hppa: Extract diagnose immediate value
>   target/hppa: Wire up diag instruction to support BTLB
>
> Mikulas Patocka (2):
>   linux-user/hppa: clear the PSW 'N' bit when delivering signals
>   linux-user/hppa: lock both words of function descriptor
>
>  hw/hppa/machine.c |  10 +--
>  linux-user/hppa/signal.c  |   6 +-
>  pc-bios/hppa-firmware.img | Bin 720216 -> 732376 bytes
>  roms/seabios-hppa |   2 +-
>  target/hppa/cpu.h |  11 ++-
>  target/hppa/helper.h  |   1 +
>  target/hppa/insns.decode  |   2 +-
>  target/hppa/int_helper.c  |   2 +-
>  target/hppa/mem_helper.c  | 179 --
>  target/hppa/op_helper.c   |   3 +-
>  target/hppa/translate.c   |  15 +++-
>  11 files changed, 188 insertions(+), 43 deletions(-)
>
> --
> 2.41.0
>
>



Re: [PULL v2 0/9] testing updates (back to green!)

2023-09-19 Thread Stefan Hajnoczi
On Tue, 19 Sept 2023 at 12:00, Alex Bennée  wrote:
>
>
> Stefan Hajnoczi  writes:
>
> > There is some funny business with tests/lcitool/libvirt-ci. Please
> > rebase on master and send a v3. Sorry for the trouble, I am afraid I
> > would mess something up with the submodule if I attempted to resolve
> > it myself.
> >
> > (If you don't see a conflict when rebasing, please wait until the end
> > of the day when the other pull requests queued on the staging branch
> > are pushed to master.)
>
> That's weird, was there another PR in flight which touched libvirt-ci?

It's probably a conflict with Ilya Maximets' patches in Jason Wang's
net pull request:

https://lore.kernel.org/qemu-devel/20230918083132.55423-1-jasow...@redhat.com/

>
> >
> > Thanks!
> >
> > Auto-merging tests/docker/dockerfiles/debian-amd64-cross.docker
> > Auto-merging tests/docker/dockerfiles/debian-amd64.docker
> > Auto-merging tests/docker/dockerfiles/debian-arm64-cross.docker
> > Auto-merging tests/docker/dockerfiles/debian-armhf-cross.docker
> > Auto-merging tests/docker/dockerfiles/debian-ppc64el-cross.docker
> > Auto-merging tests/docker/dockerfiles/debian-s390x-cross.docker
> > Failed to merge submodule tests/lcitool/libvirt-ci (not checked out)
> > CONFLICT (submodule): Merge conflict in tests/lcitool/libvirt-ci
> > Recursive merging with submodules currently only supports trivial cases.
> > Please manually handle the merging of each conflicted submodule.
> > This can be accomplished with the following steps:
> >  - come back to superproject and run:
> >
> >   git add tests/lcitool/libvirt-ci
> >
> >to record the above merge or update
> >  - resolve any other conflicts in the superproject
> >  - commit the resulting index in the superproject
> > Automatic merge failed; fix conflicts and then commit the result.
> >
> > Stefan
> >
> > On Tue, 19 Sept 2023 at 02:59, Alex Bennée  wrote:
> >>
> >> The following changes since commit 
> >> 13d6b1608160de40ec65ae4c32419e56714bbadf:
> >>
> >>   Merge tag 'pull-crypto-20230915' of https://gitlab.com/rth7680/qemu into 
> >> staging (2023-09-18 11:04:21 -0400)
> >>
> >> are available in the Git repository at:
> >>
> >>   https://gitlab.com/stsquad/qemu.git tags/pull-maintainer-ominbus-190923-1
> >>
> >> for you to fetch changes up to bb3c01212b54595f5bbdbe235cb353b220f94943:
> >>
> >>   tests/avocado: Disable MIPS Malta tests due to GitLab issue #1884 
> >> (2023-09-19 07:46:02 +0100)
> >>
> >> 
> >> testing updates:
> >>
> >>   - update most Debian to bookworm
> >>   - fix some typos
> >>   - update loongarch toolchain
> >>   - fix microbit test
> >>   - handle GitLab/Cirrus timeout discrepancy
> >>   - improve avocado console handling
> >>   - disable mips avocado images pending bugfix
> >>
> >> 
> >> Alex Bennée (2):
> >>   tests: update most Debian images to Bookworm
> >>   gitlab: fix typo/spelling in comments
> >>
> >> Daniel P. Berrangé (4):
> >>   microbit: add missing qtest_quit() call
> >>   qtest: kill orphaned qtest QEMU processes on FreeBSD
> >>   gitlab: make Cirrus CI timeout explicit
> >>   gitlab: make Cirrus CI jobs gating
> >>
> >> Nicholas Piggin (1):
> >>   tests/avocado: Fix console data loss
> >>
> >> Philippe Mathieu-Daudé (1):
> >>   tests/avocado: Disable MIPS Malta tests due to GitLab issue #1884
> >>
> >> Richard Henderson (1):
> >>   tests/docker: Update docker-loongarch-cross toolchain
> >>
> >>  tests/qtest/libqtest.c|  7 +++
> >>  tests/qtest/microbit-test.c   |  2 ++
> >>  .gitlab-ci.d/base.yml |  2 +-
> >>  .gitlab-ci.d/cirrus.yml   |  4 +++-
> >>  .gitlab-ci.d/cirrus/build.yml |  2 ++
> >>  python/qemu/machine/machine.py| 19 
> >> +++
> >>  tests/avocado/avocado_qemu/__init__.py|  2 +-
> >>  tests/avocado/boot_linux_console.py   |  7 +++
> >>  tests/avocado/machine_mips_malta.py   |  6 ++
>

Re: [PULL v2 0/9] testing updates (back to green!)

2023-09-19 Thread Stefan Hajnoczi
There is some funny business with tests/lcitool/libvirt-ci. Please
rebase on master and send a v3. Sorry for the trouble, I am afraid I
would mess something up with the submodule if I attempted to resolve
it myself.

(If you don't see a conflict when rebasing, please wait until the end
of the day when the other pull requests queued on the staging branch
are pushed to master.)

Thanks!

Auto-merging tests/docker/dockerfiles/debian-amd64-cross.docker
Auto-merging tests/docker/dockerfiles/debian-amd64.docker
Auto-merging tests/docker/dockerfiles/debian-arm64-cross.docker
Auto-merging tests/docker/dockerfiles/debian-armhf-cross.docker
Auto-merging tests/docker/dockerfiles/debian-ppc64el-cross.docker
Auto-merging tests/docker/dockerfiles/debian-s390x-cross.docker
Failed to merge submodule tests/lcitool/libvirt-ci (not checked out)
CONFLICT (submodule): Merge conflict in tests/lcitool/libvirt-ci
Recursive merging with submodules currently only supports trivial cases.
Please manually handle the merging of each conflicted submodule.
This can be accomplished with the following steps:
 - come back to superproject and run:

  git add tests/lcitool/libvirt-ci

   to record the above merge or update
 - resolve any other conflicts in the superproject
 - commit the resulting index in the superproject
Automatic merge failed; fix conflicts and then commit the result.

Stefan

On Tue, 19 Sept 2023 at 02:59, Alex Bennée  wrote:
>
> The following changes since commit 13d6b1608160de40ec65ae4c32419e56714bbadf:
>
>   Merge tag 'pull-crypto-20230915' of https://gitlab.com/rth7680/qemu into 
> staging (2023-09-18 11:04:21 -0400)
>
> are available in the Git repository at:
>
>   https://gitlab.com/stsquad/qemu.git tags/pull-maintainer-ominbus-190923-1
>
> for you to fetch changes up to bb3c01212b54595f5bbdbe235cb353b220f94943:
>
>   tests/avocado: Disable MIPS Malta tests due to GitLab issue #1884 
> (2023-09-19 07:46:02 +0100)
>
> 
> testing updates:
>
>   - update most Debian to bookworm
>   - fix some typos
>   - update loongarch toolchain
>   - fix microbit test
>   - handle GitLab/Cirrus timeout discrepancy
>   - improve avocado console handling
>   - disable mips avocado images pending bugfix
>
> 
> Alex Bennée (2):
>   tests: update most Debian images to Bookworm
>   gitlab: fix typo/spelling in comments
>
> Daniel P. Berrangé (4):
>   microbit: add missing qtest_quit() call
>   qtest: kill orphaned qtest QEMU processes on FreeBSD
>   gitlab: make Cirrus CI timeout explicit
>   gitlab: make Cirrus CI jobs gating
>
> Nicholas Piggin (1):
>   tests/avocado: Fix console data loss
>
> Philippe Mathieu-Daudé (1):
>   tests/avocado: Disable MIPS Malta tests due to GitLab issue #1884
>
> Richard Henderson (1):
>   tests/docker: Update docker-loongarch-cross toolchain
>
>  tests/qtest/libqtest.c|  7 +++
>  tests/qtest/microbit-test.c   |  2 ++
>  .gitlab-ci.d/base.yml |  2 +-
>  .gitlab-ci.d/cirrus.yml   |  4 +++-
>  .gitlab-ci.d/cirrus/build.yml |  2 ++
>  python/qemu/machine/machine.py| 19 
> +++
>  tests/avocado/avocado_qemu/__init__.py|  2 +-
>  tests/avocado/boot_linux_console.py   |  7 +++
>  tests/avocado/machine_mips_malta.py   |  6 ++
>  tests/avocado/replay_kernel.py|  7 +++
>  tests/avocado/tuxrun_baselines.py |  4 
>  tests/docker/dockerfiles/debian-amd64-cross.docker| 10 +++---
>  tests/docker/dockerfiles/debian-amd64.docker  | 10 +++---
>  tests/docker/dockerfiles/debian-arm64-cross.docker| 10 +++---
>  tests/docker/dockerfiles/debian-armel-cross.docker|  2 +-
>  tests/docker/dockerfiles/debian-armhf-cross.docker| 10 +++---
>  .../docker/dockerfiles/debian-loongarch-cross.docker  |  2 +-
>  tests/docker/dockerfiles/debian-ppc64el-cross.docker  | 10 +++---
>  tests/docker/dockerfiles/debian-s390x-cross.docker| 10 +++---
>  tests/docker/dockerfiles/ubuntu2004.docker|  2 +-
>  tests/docker/dockerfiles/ubuntu2204.docker|  2 +-
>  tests/lcitool/libvirt-ci  |  2 +-
>  tests/lcitool/refresh | 17 +
>  23 files changed, 91 insertions(+), 58 deletions(-)
>
> --
> 2.39.2
>
>



Re: [PULL 00/28] Block layer patches

2023-09-18 Thread Stefan Hajnoczi
Hi Kevin,
I believe that my own commit "block-coroutine-wrapper: use
qemu_get_current_aio_context()" breaks this test. The failure is
non-deterministic (happens about 1 out of 4 runs).

It seems the job hangs and the test times out in vm.run_job('job1', wait=5.0).

I haven't debugged it yet but wanted to share this information to save
some time. Tomorrow I'll investigate further.

Stefan



Re: [PULL 00/19] crypto: Provide clmul.h and host accel

2023-09-18 Thread Stefan Hajnoczi
Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/8.2 for any 
user-visible changes.




[PATCH v2 1/2] qdev: add IOThreadVirtQueueMappingList property type

2023-09-18 Thread Stefan Hajnoczi
virtio-blk and virtio-scsi devices will need a way to specify the
mapping between IOThreads and virtqueues. At the moment all virtqueues
are assigned to a single IOThread or the main loop. This single thread
can be a CPU bottleneck, so it is necessary to allow finer-grained
assignment to spread the load.

Introduce DEFINE_PROP_IOTHREAD_VQ_MAPPING_LIST() so devices can take a
parameter that maps virtqueues to IOThreads. The command-line syntax for
this new property is as follows:

  --device 
'{"driver":"foo","iothread-vq-mapping":[{"iothread":"iothread0","vqs":[0,1,2]},...]}'

IOThreads are specified by name and virtqueues are specified by 0-based
index.

It will be common to simply assign virtqueues round-robin across a set
of IOThreads. A convenient syntax that does not require specifying
individual virtqueue indices is available:

  --device 
'{"driver":"foo","iothread-vq-mapping":[{"iothread":"iothread0"},{"iothread":"iothread1"},...]}'

Signed-off-by: Stefan Hajnoczi 
---
 qapi/virtio.json| 30 ++
 include/hw/qdev-properties-system.h |  4 +++
 hw/core/qdev-properties-system.c| 47 +
 3 files changed, 81 insertions(+)

diff --git a/qapi/virtio.json b/qapi/virtio.json
index e6dcee7b83..cb341ae596 100644
--- a/qapi/virtio.json
+++ b/qapi/virtio.json
@@ -928,3 +928,33 @@
   'data': { 'path': 'str', 'queue': 'uint16', '*index': 'uint16' },
   'returns': 'VirtioQueueElement',
   'features': [ 'unstable' ] }
+
+##
+# @IOThreadVirtQueueMapping:
+#
+# Describes the subset of virtqueues assigned to an IOThread.
+#
+# @iothread: the id of IOThread object
+# @vqs: an optional array of virtqueue indices that will be handled by this
+#   IOThread. When absent, virtqueues are assigned round-robin across all
+#   IOThreadVirtQueueMappings provided. Either all
+#   IOThreadVirtQueueMappings must have @vqs or none of them must have it.
+#
+# Since: 8.2
+#
+##
+
+{ 'struct': 'IOThreadVirtQueueMapping',
+  'data': { 'iothread': 'str', '*vqs': ['uint16'] } }
+
+##
+# @IOThreadVirtQueueMappings:
+#
+# IOThreadVirtQueueMapping list. This struct is not actually used but the
+# IOThreadVirtQueueMappingList type it generates is!
+#
+# Since: 8.2
+##
+
+{ 'struct': 'IOThreadVirtQueueMappings',
+  'data': { 'mappings': ['IOThreadVirtQueueMapping'] } }
diff --git a/include/hw/qdev-properties-system.h 
b/include/hw/qdev-properties-system.h
index 0ac327ae60..c526e502c8 100644
--- a/include/hw/qdev-properties-system.h
+++ b/include/hw/qdev-properties-system.h
@@ -22,6 +22,7 @@ extern const PropertyInfo qdev_prop_audiodev;
 extern const PropertyInfo qdev_prop_off_auto_pcibar;
 extern const PropertyInfo qdev_prop_pcie_link_speed;
 extern const PropertyInfo qdev_prop_pcie_link_width;
+extern const PropertyInfo qdev_prop_iothread_vq_mapping_list;
 
 #define DEFINE_PROP_PCI_DEVFN(_n, _s, _f, _d)   \
 DEFINE_PROP_SIGNED(_n, _s, _f, _d, qdev_prop_pci_devfn, int32_t)
@@ -73,5 +74,8 @@ extern const PropertyInfo qdev_prop_pcie_link_width;
 #define DEFINE_PROP_UUID_NODEFAULT(_name, _state, _field) \
 DEFINE_PROP(_name, _state, _field, qdev_prop_uuid, QemuUUID)
 
+#define DEFINE_PROP_IOTHREAD_VQ_MAPPING_LIST(_name, _state, _field) \
+DEFINE_PROP(_name, _state, _field, qdev_prop_iothread_vq_mapping_list, \
+IOThreadVirtQueueMappingList *)
 
 #endif
diff --git a/hw/core/qdev-properties-system.c b/hw/core/qdev-properties-system.c
index 6d5d43eda2..831796e106 100644
--- a/hw/core/qdev-properties-system.c
+++ b/hw/core/qdev-properties-system.c
@@ -18,6 +18,7 @@
 #include "qapi/qapi-types-block.h"
 #include "qapi/qapi-types-machine.h"
 #include "qapi/qapi-types-migration.h"
+#include "qapi/qapi-visit-virtio.h"
 #include "qapi/qmp/qerror.h"
 #include "qemu/ctype.h"
 #include "qemu/cutils.h"
@@ -1147,3 +1148,49 @@ const PropertyInfo qdev_prop_uuid = {
 .set   = set_uuid,
 .set_default_value = set_default_uuid_auto,
 };
+
+/* --- IOThreadVirtQueueMappingList --- */
+
+static void get_iothread_vq_mapping_list(Object *obj, Visitor *v,
+const char *name, void *opaque, Error **errp)
+{
+IOThreadVirtQueueMappingList **prop_ptr =
+object_field_prop_ptr(obj, opaque);
+IOThreadVirtQueueMappingList *list = *prop_ptr;
+
+visit_type_IOThreadVirtQueueMappingList(v, name, &list, errp);
+}
+
+static void set_iothread_vq_mapping_list(Object *obj, Visitor *v,
+const char *name, void *opaque, Error **errp)
+{
+IOThreadVirtQueueMappingList **prop_ptr =
+  

[PATCH v2 2/2] virtio-blk: add iothread-vq-mapping parameter

2023-09-18 Thread Stefan Hajnoczi
Add the iothread-vq-mapping parameter to assign virtqueues to IOThreads.
Store the vq:AioContext mapping in the new struct
VirtIOBlockDataPlane->vq_aio_context[] field and refactor the code to
use the per-vq AioContext instead of the BlockDriverState's AioContext.

Reimplement --device virtio-blk-pci,iothread= and non-IOThread mode by
assigning all virtqueues to the IOThread and main loop's AioContext in
vq_aio_context[], respectively.

The comment in struct VirtIOBlockDataPlane about EventNotifiers is
stale. Remove it.
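
For orientation, here is an editorial sketch of the fallback behaviour
described above. It is not the literal patch hunk (the diff below is
truncated before this point); the function name is invented, but the
other identifiers follow the patch:

/* Editorial sketch: choose one AioContext per virtqueue. */
static void sketch_init_vq_aio_context(VirtIOBlockDataPlane *s,
                                       uint16_t num_queues)
{
    VirtIOBlkConf *conf = s->conf;

    if (conf->iothread_vq_mapping_list) {
        /* Explicit or round-robin mapping from the new property */
        apply_vq_mapping(conf->iothread_vq_mapping_list, s->vq_aio_context,
                         num_queues);
    } else {
        /* Legacy iothread= property, or no IOThread at all */
        AioContext *ctx = conf->iothread
                          ? iothread_get_aio_context(conf->iothread)
                          : qemu_get_aio_context();

        for (uint16_t i = 0; i < num_queues; i++) {
            s->vq_aio_context[i] = ctx;
        }
    }
}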

Signed-off-by: Stefan Hajnoczi 
---
 hw/block/dataplane/virtio-blk.h |   3 +
 include/hw/virtio/virtio-blk.h  |   2 +
 hw/block/dataplane/virtio-blk.c | 163 
 hw/block/virtio-blk.c   |  92 +++---
 4 files changed, 206 insertions(+), 54 deletions(-)

diff --git a/hw/block/dataplane/virtio-blk.h b/hw/block/dataplane/virtio-blk.h
index 5e18bb99ae..1a806fe447 100644
--- a/hw/block/dataplane/virtio-blk.h
+++ b/hw/block/dataplane/virtio-blk.h
@@ -28,4 +28,7 @@ void virtio_blk_data_plane_notify(VirtIOBlockDataPlane *s, 
VirtQueue *vq);
 int virtio_blk_data_plane_start(VirtIODevice *vdev);
 void virtio_blk_data_plane_stop(VirtIODevice *vdev);
 
+void virtio_blk_data_plane_detach(VirtIOBlockDataPlane *s);
+void virtio_blk_data_plane_attach(VirtIOBlockDataPlane *s);
+
 #endif /* HW_DATAPLANE_VIRTIO_BLK_H */
diff --git a/include/hw/virtio/virtio-blk.h b/include/hw/virtio/virtio-blk.h
index 9881009c22..5e4091e4da 100644
--- a/include/hw/virtio/virtio-blk.h
+++ b/include/hw/virtio/virtio-blk.h
@@ -21,6 +21,7 @@
 #include "sysemu/block-backend.h"
 #include "sysemu/block-ram-registrar.h"
 #include "qom/object.h"
+#include "qapi/qapi-types-virtio.h"
 
 #define TYPE_VIRTIO_BLK "virtio-blk-device"
 OBJECT_DECLARE_SIMPLE_TYPE(VirtIOBlock, VIRTIO_BLK)
@@ -37,6 +38,7 @@ struct VirtIOBlkConf
 {
 BlockConf conf;
 IOThread *iothread;
+IOThreadVirtQueueMappingList *iothread_vq_mapping_list;
 char *serial;
 uint32_t request_merging;
 uint16_t num_queues;
diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index f83bb0f116..7c933ed800 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -32,13 +32,11 @@ struct VirtIOBlockDataPlane {
 VirtIOBlkConf *conf;
 VirtIODevice *vdev;
 
-/* Note that these EventNotifiers are assigned by value.  This is
- * fine as long as you do not call event_notifier_cleanup on them
- * (because you don't own the file descriptor or handle; you just
- * use it).
+/*
+ * The AioContext for each virtqueue. The BlockDriverState will use the
+ * first element as its AioContext.
  */
-IOThread *iothread;
-AioContext *ctx;
+AioContext **vq_aio_context;
 };
 
 /* Raise an interrupt to signal guest, if necessary */
@@ -47,6 +45,45 @@ void virtio_blk_data_plane_notify(VirtIOBlockDataPlane *s, 
VirtQueue *vq)
 virtio_notify_irqfd(s->vdev, vq);
 }
 
+/* Generate vq:AioContext mappings from a validated iothread-vq-mapping list */
+static void
+apply_vq_mapping(IOThreadVirtQueueMappingList *iothread_vq_mapping_list,
+ AioContext **vq_aio_context, uint16_t num_queues)
+{
+IOThreadVirtQueueMappingList *node;
+size_t num_iothreads = 0;
+size_t cur_iothread = 0;
+
+for (node = iothread_vq_mapping_list; node; node = node->next) {
+num_iothreads++;
+}
+
+for (node = iothread_vq_mapping_list; node; node = node->next) {
+IOThread *iothread = iothread_by_id(node->value->iothread);
+AioContext *ctx = iothread_get_aio_context(iothread);
+
+/* Released in virtio_blk_data_plane_destroy() */
+object_ref(OBJECT(iothread));
+
+if (node->value->vqs) {
+uint16List *vq;
+
+/* Explicit vq:IOThread assignment */
+for (vq = node->value->vqs; vq; vq = vq->next) {
+vq_aio_context[vq->value] = ctx;
+}
+} else {
+/* Round-robin vq:IOThread assignment */
+for (unsigned i = cur_iothread; i < num_queues;
+ i += num_iothreads) {
+vq_aio_context[i] = ctx;
+}
+}
+
+cur_iothread++;
+}
+}
+
 /* Context: QEMU global mutex held */
 bool virtio_blk_data_plane_create(VirtIODevice *vdev, VirtIOBlkConf *conf,
   VirtIOBlockDataPlane **dataplane,
@@ -58,7 +95,7 @@ bool virtio_blk_data_plane_create(VirtIODevice *vdev, 
VirtIOBlkConf *conf,
 
 *dataplane = NULL;
 
-if (conf->iothread) {
+if (conf->iothread || conf->iothread_vq_mapping_list) {
 if (!k->set_guest_notifiers || !k->ioeventfd_assign) {
 error_setg(errp,
"device is incompatible with iothread "
@@ -86,13 +123,24 @@ bool virtio_blk_data_p

[PATCH v2 0/2] virtio-blk: add iothread-vq-mapping parameter

2023-09-18 Thread Stefan Hajnoczi
virtio-blk and virtio-scsi devices need a way to specify the mapping between
IOThreads and virtqueues. At the moment all virtqueues are assigned to a single
IOThread or the main loop. This single thread can be a CPU bottleneck, so it is
necessary to allow finer-grained assignment to spread the load. With this
series applied, "pidstat -t 1" shows that guests with -smp 2 or higher are able
to exploit multiple IOThreads.

This series introduces the new iothread-vq-mapping property. Its
command-line syntax is as follows:

  --device 
'{"driver":"virtio-blk-pci","iothread-vq-mapping":[{"iothread":"iothread0","vqs":[0,1,2]},...]},...'

IOThreads are specified by name and virtqueues are specified by 0-based
index.

It will be common to simply assign virtqueues round-robin across a set
of IOThreads. A convenient syntax that does not require specifying
individual virtqueue indices is available:

  --device 
'{"driver":"virtio-blk-pci","iothread-vq-mapping":[{"iothread":"iothread0"},{"iothread":"iothread1"},...]},...'

There is no way to reassign virtqueues at runtime and I expect that to be a
very rare requirement.

Note that JSON --device syntax is required for the iothread-vq-mapping
parameter because it's non-scalar.

Based-on: 20230912231037.826804-1-stefa...@redhat.com ("[PATCH v3 0/5] 
block-backend: process I/O in the current AioContext")

Stefan Hajnoczi (2):
  qdev: add IOThreadVirtQueueMappingList property type
  virtio-blk: add iothread-vq-mapping parameter

 qapi/virtio.json|  30 +
 hw/block/dataplane/virtio-blk.h |   3 +
 include/hw/qdev-properties-system.h |   4 +
 include/hw/virtio/virtio-blk.h  |   2 +
 hw/block/dataplane/virtio-blk.c | 163 +---
 hw/block/virtio-blk.c   |  92 ++--
 hw/core/qdev-properties-system.c|  47 
 7 files changed, 287 insertions(+), 54 deletions(-)

-- 
2.41.0




Re: [PULL 00/28] Block layer patches

2023-09-18 Thread Stefan Hajnoczi
Hi Kevin,
The following CI failure looks like it is related to this pull
request. Please take a look:
https://gitlab.com/qemu-project/qemu/-/jobs/5112083994

▶ 823/840 qcow2 iothreads-commit-active FAIL
823/840 qemu:block / io-qcow2-iothreads-commit-active ERROR 6.16s exit status 1
>>> MALLOC_PERTURB_=184 
>>> PYTHON=/home/gitlab-runner/builds/E8PpwMky/0/qemu-project/qemu/build/pyvenv/bin/python3
>>>  
>>> /home/gitlab-runner/builds/E8PpwMky/0/qemu-project/qemu/build/pyvenv/bin/python3
>>>  
>>> /home/gitlab-runner/builds/E8PpwMky/0/qemu-project/qemu/build/../tests/qemu-iotests/check
>>>  -tap -qcow2 iothreads-commit-active --source-dir 
>>> /home/gitlab-runner/builds/E8PpwMky/0/qemu-project/qemu/tests/qemu-iotests 
>>> --build-dir 
>>> /home/gitlab-runner/builds/E8PpwMky/0/qemu-project/qemu/build/tests/qemu-iotests
― ✀ ―
stderr:
--- 
/home/gitlab-runner/builds/E8PpwMky/0/qemu-project/qemu/tests/qemu-iotests/tests/iothreads-commit-active.out
+++ 
/home/gitlab-runner/builds/E8PpwMky/0/qemu-project/qemu/build/scratch/qcow2-file-iothreads-commit-active/iothreads-commit-active.out.bad
@@ -18,6 +18,35 @@
{"execute": "job-complete", "arguments": {"id": "job1"}}
{"return": {}}
{"data": {"device": "job1", "len": 131072, "offset": 131072, "speed":
0, "type": "commit"}, "event": "BLOCK_JOB_READY", "timestamp":
{"microseconds": "USECS", "seconds": "SECS"}}
-{"data": {"device": "job1", "len": 131072, "offset": 131072, "speed":
0, "type": "commit"}, "event": "BLOCK_JOB_COMPLETED", "timestamp":
{"microseconds": "USECS", "seconds": "SECS"}}
-{"execute": "job-dismiss", "arguments": {"id": "job1"}}
-{"return": {}}
+Traceback (most recent call last):
+ File 
"/home/gitlab-runner/builds/E8PpwMky/0/qemu-project/qemu/python/qemu/qmp/events.py",
line 557, in get
+ return await self._queue.get()
+ File "/usr/lib/python3.10/asyncio/queues.py", line 159, in get
+ await getter
+asyncio.exceptions.CancelledError
+
+During handling of the above exception, another exception occurred:
+
+Traceback (most recent call last):
+ File "/usr/lib/python3.10/asyncio/tasks.py", line 456, in wait_for
+ return fut.result()
+asyncio.exceptions.CancelledError
+
+The above exception was the direct cause of the following exception:

On Fri, 15 Sept 2023 at 10:45, Kevin Wolf  wrote:
>
> The following changes since commit 005ad32358f12fe9313a4a01918a55e60d4f39e5:
>
>   Merge tag 'pull-tpm-2023-09-12-3' of 
> https://github.com/stefanberger/qemu-tpm into staging (2023-09-13 13:41:57 
> -0400)
>
> are available in the Git repository at:
>
>   https://repo.or.cz/qemu/kevin.git tags/for-upstream
>
> for you to fetch changes up to 5d96864b73225ee61b0dad7e928f0cddf14270fc:
>
>   block-coroutine-wrapper: use qemu_get_current_aio_context() (2023-09-15 
> 15:49:14 +0200)
>
> 
> Block layer patches
>
> - Graph locking part 4 (node management)
> - qemu-img map: report compressed data blocks
> - block-backend: process I/O in the current AioContext
>
> 
> Andrey Drobyshev via (2):
>   block: add BDRV_BLOCK_COMPRESSED flag for bdrv_block_status()
>   qemu-img: map: report compressed data blocks
>
> Kevin Wolf (21):
>   block: Remove unused BlockReopenQueueEntry.perms_checked
>   preallocate: Factor out preallocate_truncate_to_real_size()
>   preallocate: Don't poll during permission updates
>   block: Take AioContext lock for bdrv_append() more consistently
>   block: Introduce bdrv_schedule_unref()
>   block-coroutine-wrapper: Add no_co_wrapper_bdrv_wrlock functions
>   block-coroutine-wrapper: Allow arbitrary parameter names
>   block: Mark bdrv_replace_child_noperm() GRAPH_WRLOCK
>   block: Mark bdrv_replace_child_tran() GRAPH_WRLOCK
>   block: Mark bdrv_attach_child_common() GRAPH_WRLOCK
>   block: Call transaction callbacks with lock held
>   block: Mark bdrv_attach_child() GRAPH_WRLOCK
>   block: Mark bdrv_parent_perms_conflict() and callers GRAPH_RDLOCK
>   block: Mark bdrv_get_cumulative_perm() and callers GRAPH_RDLOCK
>   block: Mark bdrv_child_perm() GRAPH_RDLOCK
>   block: Mark bdrv_parent_cb_chan

Re: [PULL 0/9] testing updates (back to green!)

2023-09-18 Thread Stefan Hajnoczi
On Fri, 15 Sept 2023 at 11:10, Alex Bennée  wrote:
>
> The following changes since commit 005ad32358f12fe9313a4a01918a55e60d4f39e5:
>
>   Merge tag 'pull-tpm-2023-09-12-3' of 
> https://github.com/stefanberger/qemu-tpm into staging (2023-09-13 13:41:57 
> -0400)
>
> are available in the Git repository at:
>
>   https://gitlab.com/stsquad/qemu.git tags/pull-maintainer-ominbus-150923-1
>
> for you to fetch changes up to 5acd4bf25dc9becd05b8772b94982722e1fa76a3:
>
>   tests/avocado: Disable MIPS Malta tests due to GitLab issue #1884 
> (2023-09-15 15:17:52 +0100)
>
> 
> testing updates:
>
>   - update most Debian to bookworm

This breaks the armel-debian-cross-container job:
https://gitlab.com/qemu-project/qemu/-/jobs/508339

I have dropped this pull request for now. Please take a look.

Thanks,
Stefan

>   - fix some typos
>   - update loongarch toolchain
>   - fix microbit test
>   - handle GitLab/Cirrus timeout discrepancy
>   - improve avocado console handling
>   - disable mips avocado images pending bugfix
>
> 
> Alex Bennée (2):
>   tests: update Debian images to Bookworm
>   gitlab: fix typo/spelling in comments
>
> Daniel P. Berrangé (4):
>   microbit: add missing qtest_quit() call
>   qtest: kill orphaned qtest QEMU processes on FreeBSD
>   gitlab: make Cirrus CI timeout explicit
>   gitlab: make Cirrus CI jobs gating
>
> Nicholas Piggin (1):
>   tests/avocado: Fix console data loss
>
> Philippe Mathieu-Daudé (1):
>   tests/avocado: Disable MIPS Malta tests due to GitLab issue #1884
>
> Richard Henderson (1):
>   tests/docker: Update docker-loongarch-cross toolchain
>
>  tests/qtest/libqtest.c|  7 +++
>  tests/qtest/microbit-test.c   |  2 ++
>  .gitlab-ci.d/base.yml |  2 +-
>  .gitlab-ci.d/cirrus.yml   |  4 +++-
>  .gitlab-ci.d/cirrus/build.yml |  2 ++
>  python/qemu/machine/machine.py| 19 
> +++
>  tests/avocado/avocado_qemu/__init__.py|  2 +-
>  tests/avocado/boot_linux_console.py   |  7 +++
>  tests/avocado/machine_mips_malta.py   |  6 ++
>  tests/avocado/replay_kernel.py|  7 +++
>  tests/avocado/tuxrun_baselines.py |  4 
>  tests/docker/dockerfiles/debian-amd64-cross.docker| 10 +++---
>  tests/docker/dockerfiles/debian-amd64.docker  | 10 +++---
>  tests/docker/dockerfiles/debian-arm64-cross.docker| 10 +++---
>  tests/docker/dockerfiles/debian-armel-cross.docker| 10 +++---
>  tests/docker/dockerfiles/debian-armhf-cross.docker| 10 +++---
>  .../docker/dockerfiles/debian-loongarch-cross.docker  |  2 +-
>  tests/docker/dockerfiles/debian-ppc64el-cross.docker  | 10 +++---
>  tests/docker/dockerfiles/debian-s390x-cross.docker| 10 +++---
>  tests/docker/dockerfiles/ubuntu2004.docker|  2 +-
>  tests/docker/dockerfiles/ubuntu2204.docker|  2 +-
>  tests/lcitool/libvirt-ci  |  2 +-
>  tests/lcitool/refresh | 18 +-
>  23 files changed, 93 insertions(+), 65 deletions(-)
>
>
> --
> 2.39.2
>
>



Re: [PULL 0/3] Firmware/seabios 20230912 patches

2023-09-18 Thread Stefan Hajnoczi
On Mon, 18 Sept 2023 at 06:00, Gerd Hoffmann  wrote:
>
> > Hi Gerd,
> > I think either this pull request or your edk2 pull request causes the
> > following CI failure:
> >
> > >>> G_TEST_DBUS_DAEMON=/builds/qemu-project/qemu/tests/dbus-vmstate-daemon.sh
> > >>>  QTEST_QEMU_BINARY=./qemu-system-aarch64 MALLOC_PERTURB_=199 
> > >>> /builds/qemu-project/qemu/build/tests/qtest/bios-tables-test --tap -k
> > ― ✀ 
> > ―
>
> Address change in ACPI tables (edk2 PR):
>
>  DefinitionBlock ("", "SSDT", 1, "BOCHS ", "NVDIMM", 0x0001)
>  {
>  Scope (\_SB)
>  {
>  Device (NVDR)
>  {
>  Name (_HID, "ACPI0012" /* NVDIMM Root Device */)  // _HID: 
> Hardware ID
>  [ ... ]
>  }
>  }
>
> -Name (MEMA, 0x43D1)
> +Name (MEMA, 0x43C9)
>  }
>
> seabios PR is fine and passes "make check".

I'm still seeing a CI failure:

3/61 qemu:qtest+qtest-x86_64 / qtest-x86_64/bios-tables-test ERROR
19.18s killed by signal 6 SIGABRT
― ✀ ―
stderr:
acpi-test: Warning! DSDT binary file mismatch. Actual
[aml:/var/folders/76/zy5ktkns50v6gt5g8r0sf6scgn/T/aml-SW7IB2],
Expected [aml:tests/data/acpi/q35/DSDT.mmio64].
See source file tests/qtest/bios-tables-test.c for instructions on how
to update expected files.
to see ASL diff between mismatched files install IASL, rebuild QEMU
from scratch and re-run tests with V=1 environment variable set**
ERROR:../tests/qtest/bios-tables-test.c:535:test_acpi_asl: assertion
failed: (all_tables_match)
(test program exited with status code -6)

https://gitlab.com/qemu-project/qemu/-/jobs/5110608123

>
> take care,
>   Gerd
>


Re: [PATCH v3 2/5] softmmu: Support concurrent bounce buffers

2023-09-15 Thread Stefan Hajnoczi
On Fri, 15 Sept 2023 at 05:55, Mattias Nissler  wrote:
>
> On Thu, Sep 14, 2023 at 8:49 PM Stefan Hajnoczi  wrote:
> >
> > On Thu, Sep 07, 2023 at 06:04:07AM -0700, Mattias Nissler wrote:
> > > When DMA memory can't be directly accessed, as is the case when
> > > running the device model in a separate process without shareable DMA
> > > file descriptors, bounce buffering is used.
> > >
> > > It is not uncommon for device models to request mapping of several DMA
> > > regions at the same time. Examples include:
> > >  * net devices, e.g. when transmitting a packet that is split across
> > >several TX descriptors (observed with igb)
> > >  * USB host controllers, when handling a packet with multiple data TRBs
> > >(observed with xhci)
> > >
> > > Previously, qemu only provided a single bounce buffer per AddressSpace
> > > and would fail DMA map requests while the buffer was already in use. In
> > > turn, this would cause DMA failures that ultimately manifest as hardware
> > > errors from the guest perspective.
> > >
> > > This change allocates DMA bounce buffers dynamically instead of
> > > supporting only a single buffer. Thus, multiple DMA mappings work
> > > correctly also when RAM can't be mmap()-ed.
> > >
> > > The total bounce buffer allocation size is limited individually for each
> > > AddressSpace. The default limit is 4096 bytes, matching the previous
> > > maximum buffer size. A new x-max-bounce-buffer-size parameter is
> > > provided to configure the limit for PCI devices.
> > >
> > > Signed-off-by: Mattias Nissler 
> > > ---
> > >  hw/pci/pci.c|  8 
> > >  include/exec/memory.h   | 14 ++
> > >  include/hw/pci/pci_device.h |  3 ++
> > >  softmmu/memory.c|  3 +-
> > >  softmmu/physmem.c   | 94 +
> > >  5 files changed, 80 insertions(+), 42 deletions(-)
> > >
> > > diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> > > index 881d774fb6..8c4541b394 100644
> > > --- a/hw/pci/pci.c
> > > +++ b/hw/pci/pci.c
> > > @@ -85,6 +85,8 @@ static Property pci_props[] = {
> > >  QEMU_PCIE_ERR_UNC_MASK_BITNR, true),
> > >  DEFINE_PROP_BIT("x-pcie-ari-nextfn-1", PCIDevice, cap_present,
> > >  QEMU_PCIE_ARI_NEXTFN_1_BITNR, false),
> > > +DEFINE_PROP_SIZE("x-max-bounce-buffer-size", PCIDevice,
> > > + max_bounce_buffer_size, 4096),
> > >  DEFINE_PROP_END_OF_LIST()
> > >  };
> > >
> > > @@ -1208,6 +1210,8 @@ static PCIDevice *do_pci_register_device(PCIDevice 
> > > *pci_dev,
> > > "bus master container", UINT64_MAX);
> > >  address_space_init(&pci_dev->bus_master_as,
> > > &pci_dev->bus_master_container_region, 
> > > pci_dev->name);
> > > +pci_dev->bus_master_as.max_bounce_buffer_size =
> > > +pci_dev->max_bounce_buffer_size;
> > >
> > >  if (phase_check(PHASE_MACHINE_READY)) {
> > >  pci_init_bus_master(pci_dev);
> > > @@ -2664,6 +2668,10 @@ static void pci_device_class_init(ObjectClass 
> > > *klass, void *data)
> > >  k->unrealize = pci_qdev_unrealize;
> > >  k->bus_type = TYPE_PCI_BUS;
> > >  device_class_set_props(k, pci_props);
> > > +object_class_property_set_description(
> > > +klass, "x-max-bounce-buffer-size",
> > > +"Maximum buffer size allocated for bounce buffers used for 
> > > mapped "
> > > +"access to indirect DMA memory");
> > >  }
> > >
> > >  static void pci_device_class_base_init(ObjectClass *klass, void *data)
> > > diff --git a/include/exec/memory.h b/include/exec/memory.h
> > > index 7d68936157..5577542b5e 100644
> > > --- a/include/exec/memory.h
> > > +++ b/include/exec/memory.h
> > > @@ -1081,14 +1081,6 @@ typedef struct AddressSpaceMapClient {
> > >  QLIST_ENTRY(AddressSpaceMapClient) link;
> > >  } AddressSpaceMapClient;
> > >
> > > -typedef struct {
> > > -MemoryRegion *mr;
> > > -void *buffer;
> > > -hwaddr addr;
> > > -hwaddr len;
> > > -bool in_use;
> > > -} BounceBuffer;
> > > -
> > >  /**
>

Re: [RFC PATCH 0/8] i386/sev: Use C API of Rust SEV library

2023-09-15 Thread Stefan Hajnoczi
On Fri, 15 Sept 2023 at 09:50, Peter Maydell  wrote:
>
> On Fri, 15 Sept 2023 at 10:54, Daniel P. Berrangé  wrote:
> > My summary is that I'd personally be in favour of opening the door
> > to Rust code as a mandatory pre-requisite for QEMU, at the very least
> > for system emulators. Not because this particular series is compelling,
> > but because I think Rust could be more beneficial to QEMU over the long
> > term than we expect. In terms of consuming it though, if we're going
> > to replace existing QEMU functionality, then I think we need to bundle
> > the Rust code and natively integrate it into the build system, as we
> > have recently started doing with our python deps, to detach ourselves
> > from the limits of what distros ship.
>
> I'm not against this, but there is a fair amount of work here
> in figuring out how exactly to integrate Rust components
> into the build system, questions like what our minimum required
> rust version would be, liasing with downstream distros to
> check that what we're proposing isn't a nightmare for them
> to package, etc.

Those details are similar to what librsvg2, libblkio, and other
libraries (like the sev crate in this patch series) have had to solve.

libblkio uses meson as the build system and has C tests that cover the
C API. Cargo is still used to build the Rust code. It is possible to
integrate the two and I think QEMU could take that approach. It's a
little ugly to glue together the two build systems, but it has been
shown to work.

Finding the minimum Rust version across QEMU's support matrix is doable.

Stefan



Re: [RFC PATCH 0/8] i386/sev: Use C API of Rust SEV library

2023-09-15 Thread Stefan Hajnoczi
On Fri, 15 Sept 2023 at 05:54, Daniel P. Berrangé  wrote:
>
> On Thu, Sep 14, 2023 at 01:58:27PM -0400, Tyler Fanelli wrote:
> > These patches are submitted as an RFC mainly because I'm a relative
> > newcomer to QEMU with no knowledge of the community's views on
> > including Rust code, nor it's preference of using library APIs for
> > ioctls that were previously implemented in QEMU directly.
>
> We've talked about Rust alot, but thus far most focus has been on
> areas peripheral to QEMU. Projects that might have been part of
> QEMU in the past, and now being done as separate efforts, and
> have been taking advantage of Rust. eg virtiofsd Rust replacing
> QEMU's in-tree C impl. eg passt providing an alternative to
> slirp. eg the dbus display in QEMU allowing a remote display
> frontend to be provided, written in rust. eg libblkio providing
> a block backend in Rust.
>
> The libblkio work is likely closest to what you've proposed
> here, in that it is a Rust create exposed as a C shared library
> for apps to consume. In theory apps don't need to care that it
> is written in Rust, as it is opaque.
>
> The one key difference though is that it was not replacing
> existing functionality, it was adding a new feature. So users
> who didn't have libblkio or who want to avoid Rust dependencies
> didn't lose anything they were already using.
>
> If we use the libsev.so we create a hard dependency on the Rust
> sev crate, otherwise users lose the SEV feature in QEMU. Right
> now the sev crate C library is not present in *any* distro that
> I can see.
>
> If we treat 'sev' as just another opaque 3rd party library to be
> provided by the distro, this creates a problem. Our support
> policy is that we usually won't drop features in existing distros,
> but that is what would happen if we applied this patchset today.
> We did bend that rule slightly with virtiofsd, but that was already
> a separate binary and we followed our deprecation path before
> deleting it, giving distros time to adapt.
>
>
> If we rollback the curtain, however, and decide to expose Rust
> directly to QEMU we could address this problem. We could bundle
> the dependant Rust crates directly with QEMU tarballs, and
> generate the FFI C library as part of QEMU build and static
> link the library. Distros would not have to do anything, though
> they could have the choice of dyn linking if they really wanted
> to.
>
> If we directly exposed the notion of Rust to QEMU, then we are
> also not limited by whether a Rust crate provides a C FFI itself.
> QEMU could provide C FFI glue for any Rust crate it sees as
> useful to its code.
>
> This all forces us, however, to have the difficult discussion
> about whether we're willing to make Rust a mandatory dependency
> of QEMU and permit (or even welcome) its use /anywhere/ in the
> QEMU tree that looks relevant.
>
> We've already queried whether Rust will actually benefit the
> core QEMU codebase, or whether we'll end up punching too many
> holes in its safety net to make it worthwhile. My opinion is
> that we probably shouldn't obsess over that as I think it is
> hard to predict the future, it has a habit of surprising us.
> Your patch series here doesn't demonstrate an obvious safety
> benefit, since we have existing working code and that code is
> not especially complex. Once we open the doors to Rust code
> in QEMU though, we will probably surprise ourselves with the
> range of benefits we'll see 2, 3, 5 years down the road.
>
> IOW, we shouldn't judge future benefits based on this patch
> series. It is great that this series is actually quite simple,
> because it lets us focus on how we might integrate Rust more
> directly into QEMU, without worrying much about the actual
> code being replaced.
>
> > This series looks to explore the possibility of using the library and
> > show a bit of what it would look like. I'm looking for comments
> > regarding if this feature is desired.
>
> My summary is that I'd personally be in favour of opening the door
> to Rust code as a mandatory pre-requisite for QEMU, at the very least
> for system emulators. Not because this particular series is compelling,
> but because I think Rust could be more beneficial to QEMU over the long
> term than we expect. In terms of consuming it though, if we're going
> to replace existing QEMU functionality, then I think we need to bundle
> the Rust code and natively integrate it into the build system, as we
> have recently started doing with our python deps, to detach ourselves
> from the limits of what distros ship.

I support using Rust directly within QEMU.

David Gibson looked at Rust's operating system and CPU architecture
coverage a few years ago:
https://wiki.qemu.org/RustInQemu

Please update that support matrix to check that depending on Rust in
core QEMU code really works everywhere QEMU is supported today.
This is probably just a formality at this stage since Rust has become
widely used over the past few years.

The lib

Re: [PATCH v4 02/14] simpletrace: annotate magic constants from QEMU code

2023-09-14 Thread Stefan Hajnoczi
On Wed, Aug 23, 2023 at 10:54:17AM +0200, Mads Ynddal wrote:
> From: Mads Ynddal 
> 
> It wasn't clear where the constants and structs came from, so I added
> comments to help.
> 
> Signed-off-by: Mads Ynddal 
> ---
>  scripts/simpletrace.py | 5 +
>  1 file changed, 5 insertions(+)

Reviewed-by: Stefan Hajnoczi 




Re: [PATCH v4 01/14] simpletrace: add __all__ to define public interface

2023-09-14 Thread Stefan Hajnoczi
On Wed, Aug 23, 2023 at 10:54:16AM +0200, Mads Ynddal wrote:
> From: Mads Ynddal 
> 
> It was unclear what was the supported public interface. I.e. when
> refactoring the code, what functions/classes are important to retain.
> 
> Signed-off-by: Mads Ynddal 
> ---
>  scripts/simpletrace.py | 2 ++
>  1 file changed, 2 insertions(+)

Reviewed-by: Stefan Hajnoczi 




Re: [PATCH v3 5/5] vfio-user: Fix config space access byte order

2023-09-14 Thread Stefan Hajnoczi
On Thu, Sep 07, 2023 at 06:04:10AM -0700, Mattias Nissler wrote:
> PCI config space is little-endian, so on a big-endian host we need to
> perform byte swaps for values as they are passed to and received from
> the generic PCI config space access machinery.
> 
> Signed-off-by: Mattias Nissler 
> ---
>  hw/remote/vfio-user-obj.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

After some discussion about PCI Configuration Space endianness on IRC
with aw, mcayland, and f4bug I am now happy with this patch:

1. Configuration space can only be accessed in 1-, 2-, or 4-byte
   accesses.
2. If it's a 2- or 4-byte access then your patch adds the missing
   little-endian conversion.
3. If it's a 1-byte access then there is (effectively) no byteswap in
   the code path and the pci_dev->config[] array is already
   little-endian.
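
A tiny editorial sketch of points 2 and 3, using QEMU's ldn_le_p() helper
from "qemu/bswap.h" (not part of the patch; the function name is made up):

#include "qemu/osdep.h"
#include "qemu/bswap.h"

/* Illustration only: len == 1 is a plain byte load, so no swap happens;
 * len == 2 or 4 converts from the config space's little-endian layout to
 * host order, which is exactly the conversion the patch adds for
 * big-endian hosts.
 */
static uint64_t cfg_read_le(const uint8_t *config, unsigned offset,
                            unsigned len)
{
    assert(len == 1 || len == 2 || len == 4);
    return ldn_le_p(config + offset, len);
}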

Reviewed-by: Stefan Hajnoczi 




Re: [PATCH v3 5/5] vfio-user: Fix config space access byte order

2023-09-14 Thread Stefan Hajnoczi
On Thu, Sep 07, 2023 at 06:04:10AM -0700, Mattias Nissler wrote:
> PCI config space is little-endian, so on a big-endian host we need to
> perform byte swaps for values as they are passed to and received from
> the generic PCI config space access machinery.

Byteswapping only works when registers are accessed with their natural
size, even with this patch.

If there is something like a PCI Capability structure, then it needs to
be read one register at a time to get back valid data. It cannot simply
be copied in a single multi-DWORD access.

I'm not sure if this fix is sufficient. Maybe pci_host_*() needs to be
extended to support little-endian accesses instead?

> 
> Signed-off-by: Mattias Nissler 
> ---
>  hw/remote/vfio-user-obj.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
> index cee5e615a9..d38b4700f3 100644
> --- a/hw/remote/vfio-user-obj.c
> +++ b/hw/remote/vfio-user-obj.c
> @@ -281,7 +281,7 @@ static ssize_t vfu_object_cfg_access(vfu_ctx_t *vfu_ctx, 
> char * const buf,
>  while (bytes > 0) {
>  len = (bytes > pci_access_width) ? pci_access_width : bytes;
>  if (is_write) {
> -memcpy(&val, ptr, len);
> +val = ldn_le_p(ptr, len);
>  pci_host_config_write_common(o->pci_dev, offset,
>   pci_config_size(o->pci_dev),
>   val, len);
> @@ -289,7 +289,7 @@ static ssize_t vfu_object_cfg_access(vfu_ctx_t *vfu_ctx, 
> char * const buf,
>  } else {
>  val = pci_host_config_read_common(o->pci_dev, offset,
>pci_config_size(o->pci_dev), 
> len);
> -memcpy(ptr, &val, len);
> +stn_le_p(ptr, len, val);
>  trace_vfu_cfg_read(offset, val);
>  }
>  offset += len;
> -- 
> 2.34.1
> 




Re: [PATCH v3 4/5] vfio-user: Message-based DMA support

2023-09-14 Thread Stefan Hajnoczi
On Thu, Sep 07, 2023 at 06:04:09AM -0700, Mattias Nissler wrote:
> Wire up support for DMA for the case where the vfio-user client does not
> provide mmap()-able file descriptors, but DMA requests must be performed
> via the VFIO-user protocol. This installs an indirect memory region,
> which already works for pci_dma_{read,write}, and pci_dma_map works
> thanks to the existing DMA bounce buffering support.
> 
> Note that while simple scenarios work with this patch, there's a known
> race condition in libvfio-user that will mess up the communication
> channel. See https://github.com/nutanix/libvfio-user/issues/279 for
> details as well as a proposed fix.
> 
> Signed-off-by: Mattias Nissler 
> ---
>  hw/remote/trace-events|  2 +
>  hw/remote/vfio-user-obj.c | 84 +++
>  2 files changed, 79 insertions(+), 7 deletions(-)
> 
> diff --git a/hw/remote/trace-events b/hw/remote/trace-events
> index 0d1b7d56a5..358a68fb34 100644
> --- a/hw/remote/trace-events
> +++ b/hw/remote/trace-events
> @@ -9,6 +9,8 @@ vfu_cfg_read(uint32_t offset, uint32_t val) "vfu: cfg: 0x%x 
> -> 0x%x"
>  vfu_cfg_write(uint32_t offset, uint32_t val) "vfu: cfg: 0x%x <- 0x%x"
>  vfu_dma_register(uint64_t gpa, size_t len) "vfu: registering GPA 
> 0x%"PRIx64", %zu bytes"
>  vfu_dma_unregister(uint64_t gpa) "vfu: unregistering GPA 0x%"PRIx64""
> +vfu_dma_read(uint64_t gpa, size_t len) "vfu: DMA read 0x%"PRIx64", %zu bytes"
> +vfu_dma_write(uint64_t gpa, size_t len) "vfu: DMA write 0x%"PRIx64", %zu 
> bytes"
>  vfu_bar_register(int i, uint64_t addr, uint64_t size) "vfu: BAR %d: addr 
> 0x%"PRIx64" size 0x%"PRIx64""
>  vfu_bar_rw_enter(const char *op, uint64_t addr) "vfu: %s request for BAR 
> address 0x%"PRIx64""
>  vfu_bar_rw_exit(const char *op, uint64_t addr) "vfu: Finished %s of BAR 
> address 0x%"PRIx64""
> diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
> index 8b10c32a3c..cee5e615a9 100644
> --- a/hw/remote/vfio-user-obj.c
> +++ b/hw/remote/vfio-user-obj.c
> @@ -300,6 +300,63 @@ static ssize_t vfu_object_cfg_access(vfu_ctx_t *vfu_ctx, 
> char * const buf,
>  return count;
>  }
>  
> +static MemTxResult vfu_dma_read(void *opaque, hwaddr addr, uint64_t *val,
> +unsigned size, MemTxAttrs attrs)
> +{
> +MemoryRegion *region = opaque;
> +VfuObject *o = VFU_OBJECT(region->owner);
> +uint8_t buf[sizeof(uint64_t)];
> +
> +trace_vfu_dma_read(region->addr + addr, size);
> +
> +dma_sg_t *sg = alloca(dma_sg_size());

Variable-length arrays have recently been removed from QEMU and
alloca(3) is a similar case. An example is commit
b3c8246750b7077add335559341268f2956f6470 ("hw/nvme: Avoid dynamic stack
allocation").

libvfio-user returns a sane sizeof(struct dma_sg) value, so we don't need
to worry about bogus values and the risk is low here.

However, it's hard to scan for and forbid the dangerous alloca(3) calls
when exceptions are made for some alloca(3) uses.

I would avoid alloca(3) and instead use:

  g_autofree dma_sg_t *sg = g_malloc(dma_sg_size());

> +vfu_dma_addr_t vfu_addr = (vfu_dma_addr_t)(region->addr + addr);
> +if (vfu_addr_to_sgl(o->vfu_ctx, vfu_addr, size, sg, 1, PROT_READ) < 0 ||
> +vfu_sgl_read(o->vfu_ctx, sg, 1, buf) != 0) {
> +return MEMTX_ERROR;
> +}
> +
> +*val = ldn_he_p(buf, size);
> +
> +return MEMTX_OK;
> +}
> +
> +static MemTxResult vfu_dma_write(void *opaque, hwaddr addr, uint64_t val,
> + unsigned size, MemTxAttrs attrs)
> +{
> +MemoryRegion *region = opaque;
> +VfuObject *o = VFU_OBJECT(region->owner);
> +uint8_t buf[sizeof(uint64_t)];
> +
> +trace_vfu_dma_write(region->addr + addr, size);
> +
> +stn_he_p(buf, size, val);
> +
> +dma_sg_t *sg = alloca(dma_sg_size());

Same here.

> +vfu_dma_addr_t vfu_addr = (vfu_dma_addr_t)(region->addr + addr);
> +if (vfu_addr_to_sgl(o->vfu_ctx, vfu_addr, size, sg, 1, PROT_WRITE) < 0 ||
> +vfu_sgl_write(o->vfu_ctx, sg, 1, buf) != 0)  {
> +return MEMTX_ERROR;
> +}
> +
> +return MEMTX_OK;
> +}
> +
> +static const MemoryRegionOps vfu_dma_ops = {
> +.read_with_attrs = vfu_dma_read,
> +.write_with_attrs = vfu_dma_write,
> +.endianness = DEVICE_HOST_ENDIAN,
> +.valid = {
> +.min_access_size = 1,
> +.max_access_size = 8,
> +.unaligned = true,
> +},
> +.impl = {
> +.min_access_size = 1,
> +.max_access_size = 8,
> +},
> +};
> +
>  static void dma_register(vfu_ctx_t *vfu_ctx, vfu_dma_info_t *info)
>  {
>  VfuObject *o = vfu_get_private(vfu_ctx);
> @@ -308,17 +365,30 @@ static void dma_register(vfu_ctx_t *vfu_ctx, 
> vfu_dma_info_t *info)
>  g_autofree char *name = NULL;
>  struct iovec *iov = &info->iova;
>  
> -if (!info->vaddr) {
> -return;
> -}
> -
>  name = g_strdup_printf("mem-%s-%"PRIx64"", o->device,
> -   (uint64_t)info->vaddr);
> +   

Re: [PATCH v3 2/5] softmmu: Support concurrent bounce buffers

2023-09-14 Thread Stefan Hajnoczi
On Thu, Sep 07, 2023 at 06:04:07AM -0700, Mattias Nissler wrote:
> When DMA memory can't be directly accessed, as is the case when
> running the device model in a separate process without shareable DMA
> file descriptors, bounce buffering is used.
> 
> It is not uncommon for device models to request mapping of several DMA
> regions at the same time. Examples include:
>  * net devices, e.g. when transmitting a packet that is split across
>several TX descriptors (observed with igb)
>  * USB host controllers, when handling a packet with multiple data TRBs
>(observed with xhci)
> 
> Previously, qemu only provided a single bounce buffer per AddressSpace
> and would fail DMA map requests while the buffer was already in use. In
> turn, this would cause DMA failures that ultimately manifest as hardware
> errors from the guest perspective.
> 
> This change allocates DMA bounce buffers dynamically instead of
> supporting only a single buffer. Thus, multiple DMA mappings work
> correctly also when RAM can't be mmap()-ed.
> 
> The total bounce buffer allocation size is limited individually for each
> AddressSpace. The default limit is 4096 bytes, matching the previous
> maximum buffer size. A new x-max-bounce-buffer-size parameter is
> provided to configure the limit for PCI devices.
> 
> Signed-off-by: Mattias Nissler 
> ---
>  hw/pci/pci.c|  8 
>  include/exec/memory.h   | 14 ++
>  include/hw/pci/pci_device.h |  3 ++
>  softmmu/memory.c|  3 +-
>  softmmu/physmem.c   | 94 +
>  5 files changed, 80 insertions(+), 42 deletions(-)
> 
> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> index 881d774fb6..8c4541b394 100644
> --- a/hw/pci/pci.c
> +++ b/hw/pci/pci.c
> @@ -85,6 +85,8 @@ static Property pci_props[] = {
>  QEMU_PCIE_ERR_UNC_MASK_BITNR, true),
>  DEFINE_PROP_BIT("x-pcie-ari-nextfn-1", PCIDevice, cap_present,
>  QEMU_PCIE_ARI_NEXTFN_1_BITNR, false),
> +DEFINE_PROP_SIZE("x-max-bounce-buffer-size", PCIDevice,
> + max_bounce_buffer_size, 4096),
>  DEFINE_PROP_END_OF_LIST()
>  };
>  
> @@ -1208,6 +1210,8 @@ static PCIDevice *do_pci_register_device(PCIDevice 
> *pci_dev,
> "bus master container", UINT64_MAX);
>  address_space_init(&pci_dev->bus_master_as,
> &pci_dev->bus_master_container_region, pci_dev->name);
> +pci_dev->bus_master_as.max_bounce_buffer_size =
> +pci_dev->max_bounce_buffer_size;
>  
>  if (phase_check(PHASE_MACHINE_READY)) {
>  pci_init_bus_master(pci_dev);
> @@ -2664,6 +2668,10 @@ static void pci_device_class_init(ObjectClass *klass, 
> void *data)
>  k->unrealize = pci_qdev_unrealize;
>  k->bus_type = TYPE_PCI_BUS;
>  device_class_set_props(k, pci_props);
> +object_class_property_set_description(
> +klass, "x-max-bounce-buffer-size",
> +"Maximum buffer size allocated for bounce buffers used for mapped "
> +"access to indirect DMA memory");
>  }
>  
>  static void pci_device_class_base_init(ObjectClass *klass, void *data)
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index 7d68936157..5577542b5e 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -1081,14 +1081,6 @@ typedef struct AddressSpaceMapClient {
>  QLIST_ENTRY(AddressSpaceMapClient) link;
>  } AddressSpaceMapClient;
>  
> -typedef struct {
> -MemoryRegion *mr;
> -void *buffer;
> -hwaddr addr;
> -hwaddr len;
> -bool in_use;
> -} BounceBuffer;
> -
>  /**
>   * struct AddressSpace: describes a mapping of addresses to #MemoryRegion 
> objects
>   */
> @@ -1106,8 +1098,10 @@ struct AddressSpace {
>  QTAILQ_HEAD(, MemoryListener) listeners;
>  QTAILQ_ENTRY(AddressSpace) address_spaces_link;
>  
> -/* Bounce buffer to use for this address space. */
> -BounceBuffer bounce;
> +/* Maximum DMA bounce buffer size used for indirect memory map requests 
> */
> +uint64_t max_bounce_buffer_size;
> +/* Total size of bounce buffers currently allocated, atomically accessed 
> */
> +uint64_t bounce_buffer_size;
>  /* List of callbacks to invoke when buffers free up */
>  QemuMutex map_client_list_lock;
>  QLIST_HEAD(, AddressSpaceMapClient) map_client_list;
> diff --git a/include/hw/pci/pci_device.h b/include/hw/pci/pci_device.h
> index d3dd0f64b2..f4027c5379 100644
> --- a/include/hw/pci/pci_device.h
> +++ b/include/hw/pci/pci_device.h
> @@ -160,6 +160,9 @@ struct PCIDevice {
>  /* ID of standby device in net_failover pair */
>  char *failover_pair_id;
>  uint32_t acpi_index;
> +
> +/* Maximum DMA bounce buffer size used for indirect memory map requests 
> */
> +uint64_t max_bounce_buffer_size;
>  };
>  
>  static inline int pci_intx(PCIDevice *pci_dev)
> diff --git a/softmmu/memory.c b/softmmu/memory.c
> index 5c9622c3d6..e02799359c 100644
> --- a/softmmu/memory.c
> ++

Re: [PATCH v3 0/5] Support message-based DMA in vfio-user server

2023-09-14 Thread Stefan Hajnoczi
On Thu, Sep 07, 2023 at 06:04:05AM -0700, Mattias Nissler wrote:
> This series adds basic support for message-based DMA in qemu's vfio-user
> server. This is useful for cases where the client does not provide file
> descriptors for accessing system memory via memory mappings. My motivating use
> case is to hook up device models as PCIe endpoints to a hardware design. This
> works by bridging the PCIe transaction layer to vfio-user, and the endpoint
> does not access memory directly, but sends memory requests TLPs to the 
> hardware
> design in order to perform DMA.
> 
> Note that there is some more work required on top of this series to get
> message-based DMA to really work well:
> 
> * libvfio-user has a long-standing issue where socket communication gets 
> messed
>   up when messages are sent from both ends at the same time. See
>   https://github.com/nutanix/libvfio-user/issues/279 for more details. I've
>   been engaging there and a fix is in review.
> 
> * qemu currently breaks down DMA accesses into chunks of size 8 bytes at
>   maximum, each of which will be handled in a separate vfio-user DMA request
>   message. This is quite terrible for large DMA accesses, such as when nvme
>   reads and writes page-sized blocks for example. Thus, I would like to 
> improve
>   qemu to be able to perform larger accesses, at least for indirect memory
>   regions. I have something working locally, but since this will likely result
>   in more involved surgery and discussion, I am leaving this to be addressed 
> in
>   a separate patch.

Have you tried setting mr->ops->valid.max_access_size to something like
64 KB?
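
Concretely, the tweak I have in mind is just bumping the limit in the
vfu_dma_ops definition from the DMA patch, something like the following
(untested, and it may well need matching changes on the memory core side
to actually produce larger accesses):

  .valid = {
      .min_access_size = 1,
      .max_access_size = 64 * KiB,
      .unaligned = true,
  },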

Paolo: Any suggestions for increasing DMA transaction sizes?

Stefan

> 
> Changes from v1:
> 
> * Address Stefan's review comments. In particular, enforce an allocation limit
>   and don't drop the map client callbacks given that map requests can fail 
> when
>   hitting size limits.
> 
> * libvfio-user version bump now included in the series.
> 
> * Tested as well on big-endian s390x. This uncovered another byte order issue
>   in vfio-user server code that I've included a fix for.
> 
> Changes from v2:
> 
> * Add a preparatory patch to make bounce buffering an AddressSpace-specific
>   concept.
> 
> * The total buffer size limit parameter is now per AdressSpace and can be
>   configured for PCIDevice via a property.
> 
> * Store a magic value in first bytes of bounce buffer struct as a best effort
>   measure to detect invalid pointers in address_space_unmap.
> 
> Mattias Nissler (5):
>   softmmu: Per-AddressSpace bounce buffering
>   softmmu: Support concurrent bounce buffers
>   Update subprojects/libvfio-user
>   vfio-user: Message-based DMA support
>   vfio-user: Fix config space access byte order
> 
>  hw/pci/pci.c  |   8 ++
>  hw/remote/trace-events|   2 +
>  hw/remote/vfio-user-obj.c |  88 +--
>  include/exec/cpu-common.h |   2 -
>  include/exec/memory.h |  39 -
>  include/hw/pci/pci_device.h   |   3 +
>  softmmu/dma-helpers.c |   4 +-
>  softmmu/memory.c  |   4 +
>  softmmu/physmem.c | 155 ++
>  subprojects/libvfio-user.wrap |   2 +-
>  10 files changed, 220 insertions(+), 87 deletions(-)
> 
> -- 
> 2.34.1
> 




[PATCH 1/4] block/file-posix: set up Linux AIO and io_uring in the current thread

2023-09-14 Thread Stefan Hajnoczi
The file-posix block driver currently only sets up Linux AIO and
io_uring in the BDS's AioContext. In the multi-queue block layer we must
be able to submit I/O requests in AioContexts that do not have Linux AIO
and io_uring set up yet since any thread can call into the block driver.

Set up Linux AIO and io_uring for the current AioContext during request
submission. We lose the ability to return an error from
.bdrv_file_open() when Linux AIO and io_uring setup fails (e.g. due to
resource limits). Instead the user only gets warnings and we fall back
to aio=threads. This is still better than a fatal error after startup.

Signed-off-by: Stefan Hajnoczi 
---
 block/file-posix.c | 99 ++
 1 file changed, 47 insertions(+), 52 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index 4757914ac0..e9dbb87c57 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -713,17 +713,11 @@ static int raw_open_common(BlockDriverState *bs, QDict 
*options,
 
 #ifdef CONFIG_LINUX_AIO
  /* Currently Linux does AIO only for files opened with O_DIRECT */
-if (s->use_linux_aio) {
-if (!(s->open_flags & O_DIRECT)) {
-error_setg(errp, "aio=native was specified, but it requires "
- "cache.direct=on, which was not specified.");
-ret = -EINVAL;
-goto fail;
-}
-if (!aio_setup_linux_aio(bdrv_get_aio_context(bs), errp)) {
-error_prepend(errp, "Unable to use native AIO: ");
-goto fail;
-}
+if (s->use_linux_aio && !(s->open_flags & O_DIRECT)) {
+error_setg(errp, "aio=native was specified, but it requires "
+ "cache.direct=on, which was not specified.");
+ret = -EINVAL;
+goto fail;
 }
 #else
 if (s->use_linux_aio) {
@@ -734,14 +728,7 @@ static int raw_open_common(BlockDriverState *bs, QDict 
*options,
 }
 #endif /* !defined(CONFIG_LINUX_AIO) */
 
-#ifdef CONFIG_LINUX_IO_URING
-if (s->use_linux_io_uring) {
-if (!aio_setup_linux_io_uring(bdrv_get_aio_context(bs), errp)) {
-error_prepend(errp, "Unable to use io_uring: ");
-goto fail;
-}
-}
-#else
+#ifndef CONFIG_LINUX_IO_URING
 if (s->use_linux_io_uring) {
 error_setg(errp, "aio=io_uring was specified, but is not supported "
  "in this build.");
@@ -2442,6 +2429,44 @@ static bool bdrv_qiov_is_aligned(BlockDriverState *bs, 
QEMUIOVector *qiov)
 return true;
 }
 
+static inline bool raw_check_linux_io_uring(BDRVRawState *s)
+{
+Error *local_err = NULL;
+AioContext *ctx;
+
+if (!s->use_linux_io_uring) {
+return false;
+}
+
+ctx = qemu_get_current_aio_context();
+if (unlikely(!aio_setup_linux_io_uring(ctx, &local_err))) {
+error_reportf_err(local_err, "Unable to use linux io_uring, "
+ "falling back to thread pool: ");
+s->use_linux_io_uring = false;
+return false;
+}
+return true;
+}
+
+static inline bool raw_check_linux_aio(BDRVRawState *s)
+{
+Error *local_err = NULL;
+AioContext *ctx;
+
+if (!s->use_linux_aio) {
+return false;
+}
+
+ctx = qemu_get_current_aio_context();
+if (unlikely(!aio_setup_linux_aio(ctx, &local_err))) {
+error_reportf_err(local_err, "Unable to use Linux AIO, "
+ "falling back to thread pool: ");
+s->use_linux_aio = false;
+return false;
+}
+return true;
+}
+
 static int coroutine_fn raw_co_prw(BlockDriverState *bs, uint64_t offset,
uint64_t bytes, QEMUIOVector *qiov, int 
type)
 {
@@ -2470,13 +2495,13 @@ static int coroutine_fn raw_co_prw(BlockDriverState 
*bs, uint64_t offset,
 if (s->needs_alignment && !bdrv_qiov_is_aligned(bs, qiov)) {
 type |= QEMU_AIO_MISALIGNED;
 #ifdef CONFIG_LINUX_IO_URING
-} else if (s->use_linux_io_uring) {
+} else if (raw_check_linux_io_uring(s)) {
 assert(qiov->size == bytes);
 ret = luring_co_submit(bs, s->fd, offset, qiov, type);
 goto out;
 #endif
 #ifdef CONFIG_LINUX_AIO
-} else if (s->use_linux_aio) {
+} else if (raw_check_linux_aio(s)) {
 assert(qiov->size == bytes);
 ret = laio_co_submit(s->fd, offset, qiov, type,
   s->aio_max_batch);
@@ -2566,39 +2591,13 @@ static int coroutine_fn 
raw_co_flush_to_disk(BlockDriverState *bs)
 };
 
 #ifdef CONFIG_LINUX_IO_URING
-if (s->use_linux_io_uring) {
+if (raw_check_linux_io_uring(s)) {
 return luring_co_submit(bs, s->fd, 0, NULL, QEMU_AIO_FLUSH);
 }
 #endif
 return raw_thread_pool_submit

[PATCH 0/4] virtio-blk: prepare for the multi-queue block layer

2023-09-14 Thread Stefan Hajnoczi
The virtio-blk device will soon be able to assign virtqueues to IOThreads,
eliminating the single IOThread bottleneck. In order to do that, the I/O code
path must support running in multiple threads.

This patch series removes the AioContext lock from the virtio-blk I/O code
path, adds thread-safety where it is required, and ensures that Linux AIO and
io_uring are available regardless of which thread calls into the block driver.
With these changes virtio-blk is ready for the iothread-vq-mapping feature,
which will be introduced in the next patch series.

Based-on: 20230913200045.1024233-1-stefa...@redhat.com ("[PATCH v3 0/4] 
virtio-blk: use blk_io_plug_call() instead of notification BH")
Based-on: 20230912231037.826804-1-stefa...@redhat.com ("[PATCH v3 0/5] 
block-backend: process I/O in the current AioContext")

Stefan Hajnoczi (4):
  block/file-posix: set up Linux AIO and io_uring in the current thread
  virtio-blk: add lock to protect s->rq
  virtio-blk: don't lock AioContext in the completion code path
  virtio-blk: don't lock AioContext in the submission code path

 include/hw/virtio/virtio-blk.h |   3 +-
 block/file-posix.c |  99 +++---
 hw/block/virtio-blk.c  | 106 +++--
 3 files changed, 98 insertions(+), 110 deletions(-)

-- 
2.41.0




[PATCH 4/4] virtio-blk: don't lock AioContext in the submission code path

2023-09-14 Thread Stefan Hajnoczi
There is no need to acquire the AioContext lock around blk_aio_*() or
blk_get_geometry() anymore. I/O plugging (defer_call()) also does not
require the AioContext lock anymore.

Signed-off-by: Stefan Hajnoczi 
---
 hw/block/virtio-blk.c | 5 -
 1 file changed, 5 deletions(-)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index f5315df042..e110f9718b 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -,7 +,6 @@ void virtio_blk_handle_vq(VirtIOBlock *s, VirtQueue *vq)
 MultiReqBuffer mrb = {};
 bool suppress_notifications = virtio_queue_get_notification(vq);
 
-aio_context_acquire(blk_get_aio_context(s->blk));
 defer_call_begin();
 
 do {
@@ -1137,7 +1136,6 @@ void virtio_blk_handle_vq(VirtIOBlock *s, VirtQueue *vq)
 }
 
 defer_call_end();
-aio_context_release(blk_get_aio_context(s->blk));
 }
 
 static void virtio_blk_handle_output(VirtIODevice *vdev, VirtQueue *vq)
@@ -1168,7 +1166,6 @@ static void virtio_blk_dma_restart_bh(void *opaque)
 s->rq = NULL;
 }
 
-aio_context_acquire(blk_get_aio_context(s->conf.conf.blk));
 while (req) {
 VirtIOBlockReq *next = req->next;
 if (virtio_blk_handle_request(req, &mrb)) {
@@ -1192,8 +1189,6 @@ static void virtio_blk_dma_restart_bh(void *opaque)
 
 /* Paired with inc in virtio_blk_dma_restart_cb() */
 blk_dec_in_flight(s->conf.conf.blk);
-
-aio_context_release(blk_get_aio_context(s->conf.conf.blk));
 }
 
 static void virtio_blk_dma_restart_cb(void *opaque, bool running,
-- 
2.41.0




[PATCH 2/4] virtio-blk: add lock to protect s->rq

2023-09-14 Thread Stefan Hajnoczi
s->rq is accessed from IO_CODE and GLOBAL_STATE_CODE. Introduce a lock
to protect s->rq and eliminate reliance on the AioContext lock.

Signed-off-by: Stefan Hajnoczi 
---
 include/hw/virtio/virtio-blk.h |  3 +-
 hw/block/virtio-blk.c  | 67 +++---
 2 files changed, 47 insertions(+), 23 deletions(-)

diff --git a/include/hw/virtio/virtio-blk.h b/include/hw/virtio/virtio-blk.h
index dafec432ce..9881009c22 100644
--- a/include/hw/virtio/virtio-blk.h
+++ b/include/hw/virtio/virtio-blk.h
@@ -54,7 +54,8 @@ struct VirtIOBlockReq;
 struct VirtIOBlock {
 VirtIODevice parent_obj;
 BlockBackend *blk;
-void *rq;
+QemuMutex rq_lock;
+void *rq; /* protected by rq_lock */
 VirtIOBlkConf conf;
 unsigned short sector_mask;
 bool original_wce;
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index a1f8e15522..ee38e089bc 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -82,8 +82,11 @@ static int virtio_blk_handle_rw_error(VirtIOBlockReq *req, 
int error,
 /* Break the link as the next request is going to be parsed from the
  * ring again. Otherwise we may end up doing a double completion! */
 req->mr_next = NULL;
-req->next = s->rq;
-s->rq = req;
+
+WITH_QEMU_LOCK_GUARD(&s->rq_lock) {
+req->next = s->rq;
+s->rq = req;
+}
 } else if (action == BLOCK_ERROR_ACTION_REPORT) {
 virtio_blk_req_complete(req, VIRTIO_BLK_S_IOERR);
 if (acct_failed) {
@@ -1183,10 +1186,13 @@ static void virtio_blk_dma_restart_bh(void *opaque)
 {
 VirtIOBlock *s = opaque;
 
-VirtIOBlockReq *req = s->rq;
+VirtIOBlockReq *req;
 MultiReqBuffer mrb = {};
 
-s->rq = NULL;
+WITH_QEMU_LOCK_GUARD(&s->rq_lock) {
+req = s->rq;
+s->rq = NULL;
+}
 
 aio_context_acquire(blk_get_aio_context(s->conf.conf.blk));
 while (req) {
@@ -1238,22 +1244,29 @@ static void virtio_blk_reset(VirtIODevice *vdev)
 AioContext *ctx;
 VirtIOBlockReq *req;
 
+/* Dataplane has stopped... */
+assert(!s->dataplane_started);
+
+/* ...but requests may still be in flight. */
 ctx = blk_get_aio_context(s->blk);
 aio_context_acquire(ctx);
 blk_drain(s->blk);
+aio_context_release(ctx);
 
 /* We drop queued requests after blk_drain() because blk_drain() itself can
  * produce them. */
-while (s->rq) {
-req = s->rq;
-s->rq = req->next;
-virtqueue_detach_element(req->vq, &req->elem, 0);
-virtio_blk_free_request(req);
+WITH_QEMU_LOCK_GUARD(&s->rq_lock) {
+while (s->rq) {
+req = s->rq;
+s->rq = req->next;
+
+/* No other threads can access req->vq here */
+virtqueue_detach_element(req->vq, &req->elem, 0);
+
+virtio_blk_free_request(req);
+}
 }
 
-aio_context_release(ctx);
-
-assert(!s->dataplane_started);
 blk_set_enable_write_cache(s->blk, s->original_wce);
 }
 
@@ -1443,18 +1456,22 @@ static void virtio_blk_set_status(VirtIODevice *vdev, 
uint8_t status)
 static void virtio_blk_save_device(VirtIODevice *vdev, QEMUFile *f)
 {
 VirtIOBlock *s = VIRTIO_BLK(vdev);
-VirtIOBlockReq *req = s->rq;
 
-while (req) {
-qemu_put_sbyte(f, 1);
+WITH_QEMU_LOCK_GUARD(&s->rq_lock) {
+VirtIOBlockReq *req = s->rq;
 
-if (s->conf.num_queues > 1) {
-qemu_put_be32(f, virtio_get_queue_index(req->vq));
+while (req) {
+qemu_put_sbyte(f, 1);
+
+if (s->conf.num_queues > 1) {
+qemu_put_be32(f, virtio_get_queue_index(req->vq));
+}
+
+qemu_put_virtqueue_element(vdev, f, &req->elem);
+req = req->next;
 }
-
-qemu_put_virtqueue_element(vdev, f, &req->elem);
-req = req->next;
 }
+
 qemu_put_sbyte(f, 0);
 }
 
@@ -1480,8 +1497,11 @@ static int virtio_blk_load_device(VirtIODevice *vdev, 
QEMUFile *f,
 
 req = qemu_get_virtqueue_element(vdev, f, sizeof(VirtIOBlockReq));
 virtio_blk_init_request(s, virtio_get_queue(vdev, vq_idx), req);
-req->next = s->rq;
-s->rq = req;
+
+WITH_QEMU_LOCK_GUARD(&s->rq_lock) {
+req->next = s->rq;
+s->rq = req;
+}
 }
 
 return 0;
@@ -1628,6 +1648,8 @@ static void virtio_blk_device_realize(DeviceState *dev, 
Error **errp)
 s->host_features);
 virtio_init(vdev, VIRTIO_ID_BLOCK, s->config_size);
 
+qemu_mutex_init(&s->rq_lock);
+
 s->blk = conf->conf.blk;
 s->rq = NULL;
 s->sector_mask = (s->conf.conf.logical_block_size / BDRV_SECTOR_SIZE) - 1;
@@

[PATCH 3/4] virtio-blk: don't lock AioContext in the completion code path

2023-09-14 Thread Stefan Hajnoczi
Nothing in the completion code path relies on the AioContext lock
anymore. Virtqueues are only accessed from one thread at any moment and
the s->rq global state is protected by its own lock now.

Signed-off-by: Stefan Hajnoczi 
---
 hw/block/virtio-blk.c | 34 --
 1 file changed, 4 insertions(+), 30 deletions(-)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index ee38e089bc..f5315df042 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -105,7 +105,6 @@ static void virtio_blk_rw_complete(void *opaque, int ret)
 VirtIOBlock *s = next->dev;
 VirtIODevice *vdev = VIRTIO_DEVICE(s);
 
-aio_context_acquire(blk_get_aio_context(s->conf.conf.blk));
 while (next) {
 VirtIOBlockReq *req = next;
 next = req->mr_next;
@@ -138,7 +137,6 @@ static void virtio_blk_rw_complete(void *opaque, int ret)
 block_acct_done(blk_get_stats(s->blk), &req->acct);
 virtio_blk_free_request(req);
 }
-aio_context_release(blk_get_aio_context(s->conf.conf.blk));
 }
 
 static void virtio_blk_flush_complete(void *opaque, int ret)
@@ -146,19 +144,13 @@ static void virtio_blk_flush_complete(void *opaque, int 
ret)
 VirtIOBlockReq *req = opaque;
 VirtIOBlock *s = req->dev;
 
-aio_context_acquire(blk_get_aio_context(s->conf.conf.blk));
-if (ret) {
-if (virtio_blk_handle_rw_error(req, -ret, 0, true)) {
-goto out;
-}
+if (ret && virtio_blk_handle_rw_error(req, -ret, 0, true)) {
+return;
 }
 
 virtio_blk_req_complete(req, VIRTIO_BLK_S_OK);
 block_acct_done(blk_get_stats(s->blk), &req->acct);
 virtio_blk_free_request(req);
-
-out:
-aio_context_release(blk_get_aio_context(s->conf.conf.blk));
 }
 
 static void virtio_blk_discard_write_zeroes_complete(void *opaque, int ret)
@@ -168,11 +160,8 @@ static void virtio_blk_discard_write_zeroes_complete(void 
*opaque, int ret)
 bool is_write_zeroes = (virtio_ldl_p(VIRTIO_DEVICE(s), &req->out.type) &
 ~VIRTIO_BLK_T_BARRIER) == 
VIRTIO_BLK_T_WRITE_ZEROES;
 
-aio_context_acquire(blk_get_aio_context(s->conf.conf.blk));
-if (ret) {
-if (virtio_blk_handle_rw_error(req, -ret, false, is_write_zeroes)) {
-goto out;
-}
+if (ret && virtio_blk_handle_rw_error(req, -ret, false, is_write_zeroes)) {
+return;
 }
 
 virtio_blk_req_complete(req, VIRTIO_BLK_S_OK);
@@ -180,9 +169,6 @@ static void virtio_blk_discard_write_zeroes_complete(void 
*opaque, int ret)
 block_acct_done(blk_get_stats(s->blk), &req->acct);
 }
 virtio_blk_free_request(req);
-
-out:
-aio_context_release(blk_get_aio_context(s->conf.conf.blk));
 }
 
 #ifdef __linux__
@@ -229,10 +215,8 @@ static void virtio_blk_ioctl_complete(void *opaque, int 
status)
 virtio_stl_p(vdev, &scsi->data_len, hdr->dxfer_len);
 
 out:
-aio_context_acquire(blk_get_aio_context(s->conf.conf.blk));
 virtio_blk_req_complete(req, status);
 virtio_blk_free_request(req);
-aio_context_release(blk_get_aio_context(s->conf.conf.blk));
 g_free(ioctl_req);
 }
 
@@ -672,7 +656,6 @@ static void virtio_blk_zone_report_complete(void *opaque, 
int ret)
 {
 ZoneCmdData *data = opaque;
 VirtIOBlockReq *req = data->req;
-VirtIOBlock *s = req->dev;
 VirtIODevice *vdev = VIRTIO_DEVICE(req->dev);
 struct iovec *in_iov = data->in_iov;
 unsigned in_num = data->in_num;
@@ -763,10 +746,8 @@ static void virtio_blk_zone_report_complete(void *opaque, 
int ret)
 }
 
 out:
-aio_context_acquire(blk_get_aio_context(s->conf.conf.blk));
 virtio_blk_req_complete(req, err_status);
 virtio_blk_free_request(req);
-aio_context_release(blk_get_aio_context(s->conf.conf.blk));
 g_free(data->zone_report_data.zones);
 g_free(data);
 }
@@ -829,10 +810,8 @@ static void virtio_blk_zone_mgmt_complete(void *opaque, 
int ret)
 err_status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
 }
 
-aio_context_acquire(blk_get_aio_context(s->conf.conf.blk));
 virtio_blk_req_complete(req, err_status);
 virtio_blk_free_request(req);
-aio_context_release(blk_get_aio_context(s->conf.conf.blk));
 }
 
 static int virtio_blk_handle_zone_mgmt(VirtIOBlockReq *req, BlockZoneOp op)
@@ -882,7 +861,6 @@ static void virtio_blk_zone_append_complete(void *opaque, 
int ret)
 {
 ZoneCmdData *data = opaque;
 VirtIOBlockReq *req = data->req;
-VirtIOBlock *s = req->dev;
 VirtIODevice *vdev = VIRTIO_DEVICE(req->dev);
 int64_t append_sector, n;
 uint8_t err_status = VIRTIO_BLK_S_OK;
@@ -905,10 +883,8 @@ static void virtio_blk_zone_append_complete(void *opaque, 
int ret)
 trace_virtio_blk_zone_append_complete(vdev, req, append_sector, ret);
 
 out:
-aio_context_acquire(blk_get_aio_context(s->conf.conf.blk))

Re: [PATCH] qdev-properties: alias all object class properties

2023-09-14 Thread Stefan Hajnoczi
Paolo: ping?

On Thu, 3 Aug 2023 at 15:51, Stefan Hajnoczi  wrote:
>
> qdev_alias_all_properties() aliases a DeviceState's qdev properties onto
> an Object. This is used for VirtioPCIProxy types so that --device
> virtio-blk-pci has properties of its embedded --device virtio-blk-device
> object.
>
> Currently this function is implemented using qdev properties. Change the
> function to use QOM object class properties instead. This works because
> qdev properties create QOM object class properties, but it also catches
> any QOM object class-only properties that have no qdev properties.
>
> This change ensures that properties of devices are shown with --device
> foo,\? even if they are QOM object class properties.
>
> Signed-off-by: Stefan Hajnoczi 
> ---
>  hw/core/qdev-properties.c | 18 ++
>  1 file changed, 10 insertions(+), 8 deletions(-)
>
> diff --git a/hw/core/qdev-properties.c b/hw/core/qdev-properties.c
> index 357b8761b5..fbf3969d3c 100644
> --- a/hw/core/qdev-properties.c
> +++ b/hw/core/qdev-properties.c
> @@ -959,16 +959,18 @@ void device_class_set_props(DeviceClass *dc, Property 
> *props)
>  void qdev_alias_all_properties(DeviceState *target, Object *source)
>  {
>  ObjectClass *class;
> -Property *prop;
> +ObjectPropertyIterator iter;
> +ObjectProperty *prop;
>
>  class = object_get_class(OBJECT(target));
> -do {
> -DeviceClass *dc = DEVICE_CLASS(class);
>
> -for (prop = dc->props_; prop && prop->name; prop++) {
> -object_property_add_alias(source, prop->name,
> -  OBJECT(target), prop->name);
> +object_class_property_iter_init(&iter, class);
> +while ((prop = object_property_iter_next(&iter))) {
> +if (object_property_find(source, prop->name)) {
> +continue; /* skip duplicate properties */
>  }
> -class = object_class_get_parent(class);
> -} while (class != object_class_by_name(TYPE_DEVICE));
> +
> +object_property_add_alias(source, prop->name,
> +  OBJECT(target), prop->name);
> +}
>  }
> --
> 2.41.0
>
>



Re: Various changes "backportability"

2023-09-14 Thread Stefan Hajnoczi
On Wed, Sep 13, 2023 at 05:44:38PM +0300, Michael Tokarev wrote:
> 13.09.2023 17:27, Stefan Hajnoczi wrote:
> ...
> > > For example, recent tpm bugfix, which is trivial by its own,
> > > uses RETRY_ON_EINTR helper which were introduced recently and
> > > which is now used everywhere.  coroutine_fn et al markers is
> > > another example, translator_io_start is yet another, and so
> > > on and so on.
> 
> > The general concept makes sense to me but I'm not sure what the
> > specific issue with adding (?) coroutine_fn was. Can you link to the
> > patch that caused difficulties so I can review it?
> 
> There's nothing really exciting here, and coroutine_fn example isn't
> a best one really.  I'm talking about this:
> 
> https://gitlab.com/mjt0k/qemu/-/commit/c5034f827726f5876234bf4c6a0fab648fd8b020
> 
> which is a current back-port of 92e2e6a867334a990f8d29f07ca34e3162fdd6ec
> "virtio: Drop out of coroutine context in virtio_load()":
> 
> https://gitlab.com/mjt0k/qemu/-/commit/92e2e6a867334a990f8d29f07ca34e3162fdd6ec
> 
> This is a bugfix which I tried to cherry-pick (btw, I dunno yet if it should
> go to 8.0 or 7.2 to begin with, asked this in another email, but it still
> serves as an example).  Original patch adds coroutine_mixed_fn to some 
> existing
> functions and to a newly added function.
> 
> The patch introducing coroutine_mixed_fn marker is v7.2.0-909-g0f3de970 .
> This is actually a very good example of a way how things are done best,
> an excellent example of what I'm talking here, - this 0f3de970 only introduces
> the new concept (to be widely used), not converting everything to it
> right away.  So it's a good example of how things can be done right.
> 
> But this 0f3de970 change is based on earlier change which split things up
> and moved stuff from one place to another, and which is too large to
> backport.  So even if 0f3de970 did an excellent job, it is still of no
> use in this context.
> 
> I decided to drop coroutine_mixed_fn markings in the fix for 7.2 in this
> context, - again, if this particular fix is needed there to begin with,
> which is a question unrelated to this topic.
> 
> 
> A better example is a trivial thing with RETRY_ON_EINTR introduction.
> A trivial macro which replaced TFR in
> 
> commit 37b0b24e933c18269dddbf6b83f91823cacf8105
> Author: Nikita Ivanov 
> Date:   Sun Oct 23 12:04:22 2022 +0300
> 
> error handling: Use RETRY_ON_EINTR() macro where applicable
> 
> if this change were split into two, first introducing the new macro
> and second converting existing code & removing old macro, it'd be
> possible to just cherry-pick the first part and thered' be no need
> to modify further cherry-picks which uses RETRY_ON_EINTR.
> 
> But once again, this all is definitely not as important as getting
> good code into main :)

I see, thank you!

Stefan




Re: [PATCH v3 2/5] test-bdrv-drain: avoid race with BH in IOThread drain test

2023-09-14 Thread Stefan Hajnoczi
On Wed, Sep 13, 2023 at 11:08:54AM -0500, Eric Blake wrote:
> On Tue, Sep 12, 2023 at 07:10:34PM -0400, Stefan Hajnoczi wrote:
> > This patch fixes a race condition in test-bdrv-drain that is difficult
> > to reproduce. test-bdrv-drain sometimes fails without an error message
> > on the block pull request sent by Kevin Wolf on Sep 4, 2023. I was able
> > to reproduce it locally and found that "block-backend: process I/O in
> > the current AioContext" (in this patch series) is the first commit where
> > it reproduces.
> > 
> > I do not know why "block-backend: process I/O in the current AioContext"
> > exposes this bug. It might be related to the fact that the test's preadv
> > request runs in the main thread instead of IOThread a after my commit.
> 
> In reading the commit message before the impacted code, my first
> thought was that you had a typo of an extra word (that is, something
> to fix by s/a //), but reading further, a better fix would be calling
> attention to the fact that you are referencing a specific named
> thread, as in s/IOThread a/IOThread A/...
> 
> > That might simply change the timing of the test.
> > 
> > Now on to the race condition in test-bdrv-drain. The main thread
> > schedules a BH in IOThread a and then drains the BDS:
> 
> ...and another spot with the same parse issue...
> 
> > 
> >   aio_bh_schedule_oneshot(ctx_a, test_iothread_main_thread_bh, &data);
> > 
> >   /* The request is running on the IOThread a. Draining its block device
> 
> ...but here you were quoting from the existing code base, which is
> where I finally realized it was more than just your commit message.
> 
> >* will make sure that it has completed as far as the BDS is concerned,
> >* but the drain in this thread can continue immediately after
> >* bdrv_dec_in_flight() and aio_ret might be assigned only slightly
> >* later. */
> >   do_drain_begin(drain_type, bs);
> > 
> > If the BH completes before do_drain_begin() then there is nothing to
> > worry about.
> > 
> > If the BH invokes bdrv_flush() before do_drain_begin(), then
> > do_drain_begin() waits for it to complete.
> > 
> > The problematic case is when do_drain_begin() runs before the BH enters
> > bdrv_flush(). Then do_drain_begin() misses the BH and the drain
> > mechanism has failed in quiescing I/O.
> > 
> > Fix this by incrementing the in_flight counter so that do_drain_begin()
> > waits for test_iothread_main_thread_bh().
> > 
> > Signed-off-by: Stefan Hajnoczi 
> > ---
> >  tests/unit/test-bdrv-drain.c | 8 
> >  1 file changed, 8 insertions(+)
> > 
> > diff --git a/tests/unit/test-bdrv-drain.c b/tests/unit/test-bdrv-drain.c
> > index ccc453c29e..67a79aa3f0 100644
> > --- a/tests/unit/test-bdrv-drain.c
> > +++ b/tests/unit/test-bdrv-drain.c
> > @@ -512,6 +512,7 @@ static void test_iothread_main_thread_bh(void *opaque)
> >   * executed during drain, otherwise this would deadlock. */
> >  aio_context_acquire(bdrv_get_aio_context(data->bs));
> >  bdrv_flush(data->bs);
> > +bdrv_dec_in_flight(data->bs); /* incremented by test_iothread_common() 
> > */
> >  aio_context_release(bdrv_get_aio_context(data->bs));
> >  }
> >  
> > @@ -583,6 +584,13 @@ static void test_iothread_common(enum drain_type 
> > drain_type, int drain_thread)
> >  aio_context_acquire(ctx_a);
> >  }
> >  
> > +/*
> > + * Increment in_flight so that do_drain_begin() waits for
> > + * test_iothread_main_thread_bh(). This prevents the race between
> > + * test_iothread_main_thread_bh() in IOThread a and 
> > do_drain_begin() in
> > + * this thread. test_iothread_main_thread_bh() decrements 
> > in_flight.
> > + */
> > +bdrv_inc_in_flight(bs);
> >  aio_bh_schedule_oneshot(ctx_a, test_iothread_main_thread_bh, 
> > &data);
> >  
> >  /* The request is running on the IOThread a. Draining its block 
> > device
> 
> and indeed, your commit message is consistent with the current code's
> naming convention.  If you have reason to respin, a pre-req patch to
> change the case before adding more references might be nice, but I
> won't insist.
> 
> Reviewed-by: Eric Blake 

Sorry about that. It is confusing.

Stefan




Re: [PATCH v3 3/4] qcow2: add zoned emulation capability

2023-09-13 Thread Stefan Hajnoczi
On Mon, Aug 28, 2023 at 11:09:54PM +0800, Sam Li wrote:
> By adding zone operations and zoned metadata, the zoned emulation
> capability enables full emulation support of zoned device using
> a qcow2 file. The zoned device metadata includes zone type,
> zoned device state and write pointer of each zone, which is stored
> to an array of unsigned integers.
> 
> Each zone of a zoned device makes state transitions following
> the zone state machine. The zone state machine mainly describes
> five states, IMPLICIT OPEN, EXPLICIT OPEN, FULL, EMPTY and CLOSED.
> READ ONLY and OFFLINE states will generally be affected by device
> internal events. The operations on zones cause corresponding state
> changing.
> 
> Zoned devices have a limit on zone resources, which puts constraints on
> write operations into zones.
> 
> Signed-off-by: Sam Li 
> ---
>  block/qcow2.c  | 657 -
>  block/qcow2.h  |   2 +
>  block/trace-events |   1 +
>  docs/interop/qcow2.txt |   6 +
>  4 files changed, 664 insertions(+), 2 deletions(-)
> 
> diff --git a/block/qcow2.c b/block/qcow2.c
> index 7074bfc620..bc98d98c8e 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -194,6 +194,153 @@ qcow2_extract_crypto_opts(QemuOpts *opts, const char 
> *fmt, Error **errp)
>  return cryptoopts_qdict;
>  }
>  
> +#define QCOW2_ZT_IS_CONV(wp)(wp & 1ULL << 59)
> +
> +static inline int qcow2_get_wp(uint64_t wp)
> +{
> +/* clear state and type information */
> +return ((wp << 5) >> 5);
> +}
> +
> +static inline int qcow2_get_zs(uint64_t wp)
> +{
> +return (wp >> 60);
> +}
> +
> +static inline void qcow2_set_zs(uint64_t *wp, BlockZoneState zs)
> +{
> +uint64_t addr = qcow2_get_wp(*wp);
> +addr |= ((uint64_t)zs << 60);
> +*wp = addr;
> +}
> +
> +/*
> + * Perform a state assignment and a flush operation that writes the new wp
> + * value to the dedicated location of the disk file.
> + */
> +static int qcow2_write_wp_at(BlockDriverState *bs, uint64_t *wp,
> + uint32_t index, BlockZoneState zs) {
> +BDRVQcow2State *s = bs->opaque;
> +uint64_t wpv = *wp;
> +int ret;
> +
> +qcow2_set_zs(wp, zs);
> +ret = bdrv_pwrite(bs->file, s->zoned_header.zonedmeta_offset
> ++ sizeof(uint64_t) * index, sizeof(uint64_t), wp, 0);
> +
> +if (ret < 0) {
> +goto exit;
> +}
> +trace_qcow2_wp_tracking(index, qcow2_get_wp(*wp) >> BDRV_SECTOR_BITS);
> +return ret;
> +
> +exit:
> +*wp = wpv;
> +error_report("Failed to write metadata with file");
> +return ret;
> +}
> +
> +static bool qcow2_check_active_zones(BlockDriverState *bs)
> +{
> +BDRVQcow2State *s = bs->opaque;
> +
> +if (!s->zoned_header.max_active_zones) {
> +return true;
> +}
> +
> +if (s->nr_zones_exp_open + s->nr_zones_imp_open + s->nr_zones_closed
> +< s->zoned_header.max_active_zones) {
> +return true;
> +}
> +
> +return false;
> +}
> +
> +static int qcow2_check_open_zones(BlockDriverState *bs)
> +{
> +BDRVQcow2State *s = bs->opaque;
> +int ret;
> +
> +if (!s->zoned_header.max_open_zones) {
> +return 0;
> +}
> +
> +if (s->nr_zones_exp_open + s->nr_zones_imp_open
> +< s->zoned_header.max_open_zones) {
> +return 0;
> +}
> +
> +if(s->nr_zones_imp_open && qcow2_check_active_zones(bs)) {
> +/* TODO: it takes O(n) time complexity (n = nr_zones).
> + * Optimizations required. */
> +/* close one implicitly open zones to make it available */
> +for (int i = s->zoned_header.nr_conv_zones;

Please use uint32_t to keep the types consistent.

> +i < bs->bl.nr_zones; ++i) {
> +uint64_t *wp = &bs->wps->wp[i];
> +if (qcow2_get_zs(*wp) == BLK_ZS_IOPEN) {
> +ret = qcow2_write_wp_at(bs, wp, i, BLK_ZS_CLOSED);
> +if (ret < 0) {
> +return ret;
> +}
> +bs->wps->wp[i] = *wp;

This assignment is unnecessary since wp points to bs->wps->wp[i]. The
value has already been updated.

> +s->nr_zones_imp_open--;
> +s->nr_zones_closed++;
> +break;
> +}
> +}
> +return 0;

If this for loop completes with i == bs->bl.nr_zones then we've failed
to close an IOPEN zone. The return value should be an error like -EBUSY.
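
Something along these lines (rough, untested sketch that also folds in the
uint32_t and redundant assignment comments above):

  for (uint32_t i = s->zoned_header.nr_conv_zones;
       i < bs->bl.nr_zones; ++i) {
      uint64_t *wp = &bs->wps->wp[i];
      if (qcow2_get_zs(*wp) == BLK_ZS_IOPEN) {
          ret = qcow2_write_wp_at(bs, wp, i, BLK_ZS_CLOSED);
          if (ret < 0) {
              return ret;
          }
          s->nr_zones_imp_open--;
          s->nr_zones_closed++;
          return 0; /* made room for another open zone */
      }
  }
  return -EBUSY; /* no implicitly open zone could be closed */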

> +}
> +
> +return -EINVAL;

Which case does this -EINVAL cover? Won't we get here when no zones are
open yet?




Re: [PATCH v3 2/4] qcow2: add configurations for zoned format extension

2023-09-13 Thread Stefan Hajnoczi
On Mon, Aug 28, 2023 at 11:09:53PM +0800, Sam Li wrote:
> To configure the zoned format feature on the qcow2 driver, it
> requires following arguments: the device size, zoned profile,
> zone model, zone size, zone capacity, number of conventional
> zones, limits on zone resources (max append sectors, max open
> zones, and max_active_zones).
> 
> To create a qcow2 file with zoned format, use command like this:
> $ qemu-img create -f qcow2 test.qcow2 -o size=768M -o
> zone_size=64M -o zone_capacity=64M -o nr_conv_zones=0 -o
> max_append_sectors=512 -o max_open_zones=0 -o max_active_zones=0
> -o zone_model=1
> 
> Signed-off-by: Sam Li 
> ---
>  block/qcow2.c| 176 ++-
>  block/qcow2.h|  20 
>  docs/interop/qcow2.txt   |  36 +++
>  include/block/block_int-common.h |  13 +++
>  qapi/block-core.json |  30 +-
>  5 files changed, 273 insertions(+), 2 deletions(-)
> 
> diff --git a/block/qcow2.c b/block/qcow2.c
> index c51388e99d..7074bfc620 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -73,6 +73,7 @@ typedef struct {
>  #define  QCOW2_EXT_MAGIC_CRYPTO_HEADER 0x0537be77
>  #define  QCOW2_EXT_MAGIC_BITMAPS 0x23852875
>  #define  QCOW2_EXT_MAGIC_DATA_FILE 0x44415441
> +#define  QCOW2_EXT_MAGIC_ZONED_FORMAT 0x7a6264
>  
>  static int coroutine_fn
>  qcow2_co_preadv_compressed(BlockDriverState *bs,
> @@ -210,6 +211,7 @@ qcow2_read_extensions(BlockDriverState *bs, uint64_t 
> start_offset,
>  uint64_t offset;
>  int ret;
>  Qcow2BitmapHeaderExt bitmaps_ext;
> +Qcow2ZonedHeaderExtension zoned_ext;
>  
>  if (need_update_header != NULL) {
>  *need_update_header = false;
> @@ -431,6 +433,55 @@ qcow2_read_extensions(BlockDriverState *bs, uint64_t 
> start_offset,
>  break;
>  }
>  
> +case QCOW2_EXT_MAGIC_ZONED_FORMAT:
> +{
> +if (ext.len != sizeof(zoned_ext)) {
> +error_setg(errp, "zoned_ext: Invalid extension length");
> +return -EINVAL;
> +}
> +ret = bdrv_pread(bs->file, offset, ext.len, &zoned_ext, 0);
> +if (ret < 0) {
> +error_setg_errno(errp, -ret, "zoned_ext: "
> + "Could not read ext header");
> +return ret;
> +}
> +
> +zoned_ext.zone_size = be32_to_cpu(zoned_ext.zone_size);
> +zoned_ext.zone_capacity = be32_to_cpu(zoned_ext.zone_capacity);
> +zoned_ext.nr_conv_zones = be32_to_cpu(zoned_ext.nr_conv_zones);
> +zoned_ext.nr_zones = be32_to_cpu(zoned_ext.nr_zones);
> +zoned_ext.max_open_zones = be32_to_cpu(zoned_ext.max_open_zones);
> +zoned_ext.max_active_zones =
> +be32_to_cpu(zoned_ext.max_active_zones);
> +zoned_ext.max_append_sectors =
> +be32_to_cpu(zoned_ext.max_append_sectors);
> +s->zoned_header = zoned_ext;
> +
> +/* refuse to open broken images */
> +if (zoned_ext.zone_size == 0) {
> +error_setg(errp, "Zoned extension header zone_size field "
> + "can not be 0");
> +return -EINVAL;
> +}
> +if (zoned_ext.zone_capacity > zoned_ext.zone_size) {
> +error_setg(errp, "Zoned extension header zone_capacity field 
> "
> + "can not be larger that zone_size field");
> +return -EINVAL;
> +}
> +if (zoned_ext.nr_zones != DIV_ROUND_UP(
> +bs->total_sectors * BDRV_SECTOR_SIZE, zoned_ext.zone_size)) {
> +error_setg(errp, "Zoned extension header nr_zones field "
> + "gets wrong");

"gets" -> "is"

> +return -EINVAL;
> +}
> +
> +#ifdef DEBUG_EXT
> +printf("Qcow2: Got zoned format extension: "
> +   "offset=%" PRIu32 "\n", offset);
> +#endif
> +break;
> +}
> +
>  default:
>  /* unknown magic - save it in case we need to rewrite the header 
> */
>  /* If you add a new feature, make sure to also update the fast
> @@ -1967,6 +2018,14 @@ static void qcow2_refresh_limits(BlockDriverState *bs, 
> Error **errp)
>  }
>  bs->bl.pwrite_zeroes_alignment = s->subcluster_size;
>  bs->bl.pdiscard_alignment = s->cluster_size;
> +bs->bl.zoned = s->zoned_header.zoned;
> +bs->bl.nr_zones = s->zoned_header.nr_zones;
> +bs->wps = s->wps;
> +bs->bl.max_append_sectors = s->zoned_header.max_append_sectors;
> +bs->bl.max_active_zones = s->zoned_header.max_active_zones;
> +bs->bl.max_open_zones = s->zoned_header.max_open_zones;
> +bs->bl.zone_size = s->zoned_header.zone_size;
> +bs->bl.write_granularity = BDRV_SECTOR_SIZE;
>  }
>  
>  static int qcow2_reopen_prepare(BDRVReopenState *

[PATCH v3 1/4] block: rename blk_io_plug_call() API to defer_call()

2023-09-13 Thread Stefan Hajnoczi
Prepare to move the blk_io_plug_call() API out of the block layer so
that other subsystems can use this deferred call mechanism. Rename it
to defer_call() but leave the code in block/plug.c.

The next commit will move the code out of the block layer.

Suggested-by: Ilya Maximets 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Paul Durrant 
Signed-off-by: Stefan Hajnoczi 
---
 include/sysemu/block-backend-io.h |   6 +-
 block/blkio.c |   8 +--
 block/io_uring.c  |   4 +-
 block/linux-aio.c |   4 +-
 block/nvme.c  |   4 +-
 block/plug.c  | 109 +++---
 hw/block/dataplane/xen-block.c|  10 +--
 hw/block/virtio-blk.c |   4 +-
 hw/scsi/virtio-scsi.c |   6 +-
 9 files changed, 76 insertions(+), 79 deletions(-)

diff --git a/include/sysemu/block-backend-io.h 
b/include/sysemu/block-backend-io.h
index be4dcef59d..cfcfd85c1d 100644
--- a/include/sysemu/block-backend-io.h
+++ b/include/sysemu/block-backend-io.h
@@ -100,9 +100,9 @@ void blk_iostatus_set_err(BlockBackend *blk, int error);
 int blk_get_max_iov(BlockBackend *blk);
 int blk_get_max_hw_iov(BlockBackend *blk);
 
-void blk_io_plug(void);
-void blk_io_unplug(void);
-void blk_io_plug_call(void (*fn)(void *), void *opaque);
+void defer_call_begin(void);
+void defer_call_end(void);
+void defer_call(void (*fn)(void *), void *opaque);
 
 AioContext *blk_get_aio_context(BlockBackend *blk);
 BlockAcctStats *blk_get_stats(BlockBackend *blk);
diff --git a/block/blkio.c b/block/blkio.c
index 1dd495617c..7cf6d61f47 100644
--- a/block/blkio.c
+++ b/block/blkio.c
@@ -312,10 +312,10 @@ static void blkio_detach_aio_context(BlockDriverState *bs)
 }
 
 /*
- * Called by blk_io_unplug() or immediately if not plugged. Called without
- * blkio_lock.
+ * Called by defer_call_end() or immediately if not in a deferred section.
+ * Called without blkio_lock.
  */
-static void blkio_unplug_fn(void *opaque)
+static void blkio_deferred_fn(void *opaque)
 {
 BDRVBlkioState *s = opaque;
 
@@ -332,7 +332,7 @@ static void blkio_submit_io(BlockDriverState *bs)
 {
 BDRVBlkioState *s = bs->opaque;
 
-blk_io_plug_call(blkio_unplug_fn, s);
+defer_call(blkio_deferred_fn, s);
 }
 
 static int coroutine_fn
diff --git a/block/io_uring.c b/block/io_uring.c
index 69d9820928..8429f341be 100644
--- a/block/io_uring.c
+++ b/block/io_uring.c
@@ -306,7 +306,7 @@ static void ioq_init(LuringQueue *io_q)
 io_q->blocked = false;
 }
 
-static void luring_unplug_fn(void *opaque)
+static void luring_deferred_fn(void *opaque)
 {
 LuringState *s = opaque;
 trace_luring_unplug_fn(s, s->io_q.blocked, s->io_q.in_queue,
@@ -367,7 +367,7 @@ static int luring_do_submit(int fd, LuringAIOCB *luringcb, 
LuringState *s,
 return ret;
 }
 
-blk_io_plug_call(luring_unplug_fn, s);
+defer_call(luring_deferred_fn, s);
 }
 return 0;
 }
diff --git a/block/linux-aio.c b/block/linux-aio.c
index 1a51503271..49a37174c2 100644
--- a/block/linux-aio.c
+++ b/block/linux-aio.c
@@ -353,7 +353,7 @@ static uint64_t laio_max_batch(LinuxAioState *s, uint64_t 
dev_max_batch)
 return max_batch;
 }
 
-static void laio_unplug_fn(void *opaque)
+static void laio_deferred_fn(void *opaque)
 {
 LinuxAioState *s = opaque;
 
@@ -393,7 +393,7 @@ static int laio_do_submit(int fd, struct qemu_laiocb 
*laiocb, off_t offset,
 if (s->io_q.in_queue >= laio_max_batch(s, dev_max_batch)) {
 ioq_submit(s);
 } else {
-blk_io_plug_call(laio_unplug_fn, s);
+defer_call(laio_deferred_fn, s);
 }
 }
 
diff --git a/block/nvme.c b/block/nvme.c
index b6e95f0b7e..dfbd1085fd 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -476,7 +476,7 @@ static void nvme_trace_command(const NvmeCmd *cmd)
 }
 }
 
-static void nvme_unplug_fn(void *opaque)
+static void nvme_deferred_fn(void *opaque)
 {
 NVMeQueuePair *q = opaque;
 
@@ -503,7 +503,7 @@ static void nvme_submit_command(NVMeQueuePair *q, 
NVMeRequest *req,
 q->need_kick++;
 qemu_mutex_unlock(&q->lock);
 
-blk_io_plug_call(nvme_unplug_fn, q);
+defer_call(nvme_deferred_fn, q);
 }
 
 static void nvme_admin_cmd_sync_cb(void *opaque, int ret)
diff --git a/block/plug.c b/block/plug.c
index 98a155d2f4..f26173559c 100644
--- a/block/plug.c
+++ b/block/plug.c
@@ -1,24 +1,21 @@
 /* SPDX-License-Identifier: GPL-2.0-or-later */
 /*
- * Block I/O plugging
+ * Deferred calls
  *
  * Copyright Red Hat.
  *
- * This API defers a function call within a blk_io_plug()/blk_io_unplug()
+ * This API defers a function call within a defer_call_begin()/defer_call_end()
  * section, allowing multiple calls to batch up. This is a performance
  * optimization that is used in the block layer to submit several I/O requests
  * at once instead of individually:
  *
- *   blk_io_plug(); <-- start of plugged region
+ *   d

[PATCH v3 3/4] virtio: use defer_call() in virtio_irqfd_notify()

2023-09-13 Thread Stefan Hajnoczi
virtio-blk and virtio-scsi invoke virtio_irqfd_notify() to send Used
Buffer Notifications from an IOThread. This involves an eventfd
write(2) syscall. Calling this repeatedly when completing multiple I/O
requests in a row is wasteful.

Use the defer_call() API to batch together virtio_irqfd_notify() calls
made during thread pool (aio=threads), Linux AIO (aio=native), and
io_uring (aio=io_uring) completion processing.

Behavior is unchanged for emulated devices that do not use
defer_call_begin()/defer_call_end() since defer_call() immediately
invokes the callback when called outside a
defer_call_begin()/defer_call_end() region.

fio rw=randread bs=4k iodepth=64 numjobs=8 IOPS increases by ~9% with a
single IOThread and 8 vCPUs. iodepth=1 decreases by ~1% but this could
be noise. Detailed performance data and configuration specifics are
available here:
https://gitlab.com/stefanha/virt-playbooks/-/tree/blk_io_plug-irqfd

This duplicates the BH that virtio-blk uses for batching. The next
commit will remove it.

Reviewed-by: Eric Blake 
Signed-off-by: Stefan Hajnoczi 
---
 block/io_uring.c   |  6 ++
 block/linux-aio.c  |  4 
 hw/virtio/virtio.c | 13 -
 util/thread-pool.c |  5 +
 hw/virtio/trace-events |  1 +
 5 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/block/io_uring.c b/block/io_uring.c
index 3a1e1f45b3..7cdd00e9f1 100644
--- a/block/io_uring.c
+++ b/block/io_uring.c
@@ -125,6 +125,9 @@ static void luring_process_completions(LuringState *s)
 {
 struct io_uring_cqe *cqes;
 int total_bytes;
+
+defer_call_begin();
+
 /*
  * Request completion callbacks can run the nested event loop.
  * Schedule ourselves so the nested event loop will "see" remaining
@@ -217,7 +220,10 @@ end:
 aio_co_wake(luringcb->co);
 }
 }
+
 qemu_bh_cancel(s->completion_bh);
+
+defer_call_end();
 }
 
 static int ioq_submit(LuringState *s)
diff --git a/block/linux-aio.c b/block/linux-aio.c
index a2670b3e46..ec05d946f3 100644
--- a/block/linux-aio.c
+++ b/block/linux-aio.c
@@ -205,6 +205,8 @@ static void qemu_laio_process_completions(LinuxAioState *s)
 {
 struct io_event *events;
 
+defer_call_begin();
+
 /* Reschedule so nested event loops see currently pending completions */
 qemu_bh_schedule(s->completion_bh);
 
@@ -231,6 +233,8 @@ static void qemu_laio_process_completions(LinuxAioState *s)
  * own `for` loop.  If we are the last all counters dropped to zero. */
 s->event_max = 0;
 s->event_idx = 0;
+
+defer_call_end();
 }
 
 static void qemu_laio_process_completions_and_submit(LinuxAioState *s)
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 969c25f4cf..d9aeed7012 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -15,6 +15,7 @@
 #include "qapi/error.h"
 #include "qapi/qapi-commands-virtio.h"
 #include "trace.h"
+#include "qemu/defer-call.h"
 #include "qemu/error-report.h"
 #include "qemu/log.h"
 #include "qemu/main-loop.h"
@@ -2426,6 +2427,16 @@ static bool virtio_should_notify(VirtIODevice *vdev, 
VirtQueue *vq)
 }
 }
 
+/* Batch irqs while inside a defer_call_begin()/defer_call_end() section */
+static void virtio_notify_irqfd_deferred_fn(void *opaque)
+{
+EventNotifier *notifier = opaque;
+VirtQueue *vq = container_of(notifier, VirtQueue, guest_notifier);
+
+trace_virtio_notify_irqfd_deferred_fn(vq->vdev, vq);
+event_notifier_set(notifier);
+}
+
 void virtio_notify_irqfd(VirtIODevice *vdev, VirtQueue *vq)
 {
 WITH_RCU_READ_LOCK_GUARD() {
@@ -2452,7 +2463,7 @@ void virtio_notify_irqfd(VirtIODevice *vdev, VirtQueue 
*vq)
  * to an atomic operation.
  */
 virtio_set_isr(vq->vdev, 0x1);
-event_notifier_set(&vq->guest_notifier);
+defer_call(virtio_notify_irqfd_deferred_fn, &vq->guest_notifier);
 }
 
 static void virtio_irq(VirtQueue *vq)
diff --git a/util/thread-pool.c b/util/thread-pool.c
index e3d8292d14..d84961779a 100644
--- a/util/thread-pool.c
+++ b/util/thread-pool.c
@@ -15,6 +15,7 @@
  * GNU GPL, version 2 or (at your option) any later version.
  */
 #include "qemu/osdep.h"
+#include "qemu/defer-call.h"
 #include "qemu/queue.h"
 #include "qemu/thread.h"
 #include "qemu/coroutine.h"
@@ -175,6 +176,8 @@ static void thread_pool_completion_bh(void *opaque)
 ThreadPool *pool = opaque;
 ThreadPoolElement *elem, *next;
 
+defer_call_begin(); /* cb() may use defer_call() to coalesce work */
+
 restart:
 QLIST_FOREACH_SAFE(elem, &pool->head, all, next) {
 if (elem->state != THREAD_DONE) {
@@ -208,6 +211,8 @@ restart:
 qemu_aio_unref(elem);
 }
 }
+
+defer_call_end();
 }
 
 static void thread_pool_cancel(BlockAIOCB *acb)
diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
index 7109cf1a3

[PATCH v3 0/4] virtio-blk: use blk_io_plug_call() instead of notification BH

2023-09-13 Thread Stefan Hajnoczi
v3:
- Add comment pointing to API documentation in .c file [Philippe]
- Add virtio_notify_irqfd_deferred_fn trace event [Ilya]
- Remove outdated #include [Ilya]
v2:
- Rename blk_io_plug() to defer_call() and move it to util/ so the net
  subsystem can use it [Ilya]
- Add defer_call_begin()/end() to thread_pool_completion_bh() to match Linux
  AIO and io_uring completion batching

Replace the seldom-used virtio-blk notification BH mechanism with
blk_io_plug(). This is part of an effort to enable the multi-queue block layer
in virtio-blk. The notification BH was not multi-queue friendly.

The blk_io_plug() mechanism improves fio rw=randread bs=4k iodepth=64 numjobs=8
IOPS by ~9% with a single IOThread and 8 vCPUs (this is not even a multi-queue
block layer configuration) compared to no completion batching. iodepth=1
decreases by ~1% but this could be noise. Benchmark details are available here:
https://gitlab.com/stefanha/virt-playbooks/-/tree/blk_io_plug-irqfd

Stefan Hajnoczi (4):
  block: rename blk_io_plug_call() API to defer_call()
  util/defer-call: move defer_call() to util/
  virtio: use defer_call() in virtio_irqfd_notify()
  virtio-blk: remove batch notification BH

 MAINTAINERS   |   3 +-
 include/qemu/defer-call.h |  16 +++
 include/sysemu/block-backend-io.h |   4 -
 block/blkio.c |   9 +-
 block/io_uring.c  |  11 ++-
 block/linux-aio.c |   9 +-
 block/nvme.c  |   5 +-
 block/plug.c  | 159 --
 hw/block/dataplane/virtio-blk.c   |  48 +
 hw/block/dataplane/xen-block.c|  11 ++-
 hw/block/virtio-blk.c |   5 +-
 hw/scsi/virtio-scsi.c |   7 +-
 hw/virtio/virtio.c|  13 ++-
 util/defer-call.c | 156 +
 util/thread-pool.c|   5 +
 block/meson.build |   1 -
 hw/virtio/trace-events|   1 +
 util/meson.build  |   1 +
 18 files changed, 231 insertions(+), 233 deletions(-)
 create mode 100644 include/qemu/defer-call.h
 delete mode 100644 block/plug.c
 create mode 100644 util/defer-call.c

-- 
2.41.0




[PATCH v3 4/4] virtio-blk: remove batch notification BH

2023-09-13 Thread Stefan Hajnoczi
There is a batching mechanism for virtio-blk Used Buffer Notifications
that is no longer needed because the previous commit added batching to
virtio_notify_irqfd().

Note that this mechanism was rarely used in practice because it is only
enabled when EVENT_IDX is not negotiated by the driver. Modern drivers
enable EVENT_IDX.

Reviewed-by: Eric Blake 
Signed-off-by: Stefan Hajnoczi 
---
 hw/block/dataplane/virtio-blk.c | 48 +
 1 file changed, 1 insertion(+), 47 deletions(-)

diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index da36fcfd0b..f83bb0f116 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -31,9 +31,6 @@ struct VirtIOBlockDataPlane {
 
 VirtIOBlkConf *conf;
 VirtIODevice *vdev;
-QEMUBH *bh; /* bh for guest notification */
-unsigned long *batch_notify_vqs;
-bool batch_notifications;
 
 /* Note that these EventNotifiers are assigned by value.  This is
  * fine as long as you do not call event_notifier_cleanup on them
@@ -47,36 +44,7 @@ struct VirtIOBlockDataPlane {
 /* Raise an interrupt to signal guest, if necessary */
 void virtio_blk_data_plane_notify(VirtIOBlockDataPlane *s, VirtQueue *vq)
 {
-if (s->batch_notifications) {
-set_bit(virtio_get_queue_index(vq), s->batch_notify_vqs);
-qemu_bh_schedule(s->bh);
-} else {
-virtio_notify_irqfd(s->vdev, vq);
-}
-}
-
-static void notify_guest_bh(void *opaque)
-{
-VirtIOBlockDataPlane *s = opaque;
-unsigned nvqs = s->conf->num_queues;
-unsigned long bitmap[BITS_TO_LONGS(nvqs)];
-unsigned j;
-
-memcpy(bitmap, s->batch_notify_vqs, sizeof(bitmap));
-memset(s->batch_notify_vqs, 0, sizeof(bitmap));
-
-for (j = 0; j < nvqs; j += BITS_PER_LONG) {
-unsigned long bits = bitmap[j / BITS_PER_LONG];
-
-while (bits != 0) {
-unsigned i = j + ctzl(bits);
-VirtQueue *vq = virtio_get_queue(s->vdev, i);
-
-virtio_notify_irqfd(s->vdev, vq);
-
-bits &= bits - 1; /* clear right-most bit */
-}
-}
+virtio_notify_irqfd(s->vdev, vq);
 }
 
 /* Context: QEMU global mutex held */
@@ -126,9 +94,6 @@ bool virtio_blk_data_plane_create(VirtIODevice *vdev, 
VirtIOBlkConf *conf,
 } else {
 s->ctx = qemu_get_aio_context();
 }
-s->bh = aio_bh_new_guarded(s->ctx, notify_guest_bh, s,
-   &DEVICE(vdev)->mem_reentrancy_guard);
-s->batch_notify_vqs = bitmap_new(conf->num_queues);
 
 *dataplane = s;
 
@@ -146,8 +111,6 @@ void virtio_blk_data_plane_destroy(VirtIOBlockDataPlane *s)
 
 vblk = VIRTIO_BLK(s->vdev);
 assert(!vblk->dataplane_started);
-g_free(s->batch_notify_vqs);
-qemu_bh_delete(s->bh);
 if (s->iothread) {
 object_unref(OBJECT(s->iothread));
 }
@@ -173,12 +136,6 @@ int virtio_blk_data_plane_start(VirtIODevice *vdev)
 
 s->starting = true;
 
-if (!virtio_vdev_has_feature(vdev, VIRTIO_RING_F_EVENT_IDX)) {
-s->batch_notifications = true;
-} else {
-s->batch_notifications = false;
-}
-
 /* Set up guest notifier (irq) */
 r = k->set_guest_notifiers(qbus->parent, nvqs, true);
 if (r != 0) {
@@ -370,9 +327,6 @@ void virtio_blk_data_plane_stop(VirtIODevice *vdev)
 
 aio_context_release(s->ctx);
 
-qemu_bh_cancel(s->bh);
-notify_guest_bh(s); /* final chance to notify guest */
-
 /* Clean up guest notifier (irq) */
 k->set_guest_notifiers(qbus->parent, nvqs, false);
 
-- 
2.41.0




[PATCH v3 2/4] util/defer-call: move defer_call() to util/

2023-09-13 Thread Stefan Hajnoczi
The networking subsystem may wish to use defer_call(), so move the code
to util/ where it can be reused.

As a reminder of what defer_call() does:

This API defers a function call within a defer_call_begin()/defer_call_end()
section, allowing multiple calls to batch up. This is a performance
optimization that is used in the block layer to submit several I/O requests
at once instead of individually:

  defer_call_begin(); <-- start of section
  ...
  defer_call(my_func, my_obj); <-- deferred my_func(my_obj) call
  defer_call(my_func, my_obj); <-- another
  defer_call(my_func, my_obj); <-- another
  ...
  defer_call_end(); <-- end of section, my_func(my_obj) is called once
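
To make the contract concrete, here is a minimal, single-threaded sketch of
the begin/batch/end semantics. It is an illustration of the behaviour
described above, not the code in util/defer-call.c; only the three function
names are taken from the API, everything else is made up for the example:

  /*
   * Illustrative sketch only -- not the util/defer-call.c implementation.
   * Calls queued inside a begin/end section are deduplicated by (fn, opaque)
   * and flushed once when the outermost defer_call_end() runs.
   */
  #include <assert.h>
  #include <stddef.h>
  #include <stdio.h>

  typedef struct {
      void (*fn)(void *);
      void *opaque;
  } DeferredCall;

  static DeferredCall deferred[16];
  static size_t num_deferred;
  static unsigned section_depth;

  static void defer_call_begin(void)
  {
      section_depth++;
  }

  static void defer_call(void (*fn)(void *), void *opaque)
  {
      if (section_depth == 0) {
          fn(opaque); /* outside a section: call immediately (assumption) */
          return;
      }
      for (size_t i = 0; i < num_deferred; i++) {
          if (deferred[i].fn == fn && deferred[i].opaque == opaque) {
              return; /* already queued, batch with the earlier call */
          }
      }
      assert(num_deferred < sizeof(deferred) / sizeof(deferred[0]));
      deferred[num_deferred++] = (DeferredCall){ fn, opaque };
  }

  static void defer_call_end(void)
  {
      if (--section_depth > 0) {
          return; /* still inside an outer nested section */
      }
      for (size_t i = 0; i < num_deferred; i++) {
          deferred[i].fn(deferred[i].opaque);
      }
      num_deferred = 0;
  }

  static void submit(void *opaque)
  {
      printf("submit %s\n", (char *)opaque);
  }

  int main(void)
  {
      static char queue0[] = "queue0";

      defer_call_begin();
      defer_call(submit, queue0);
      defer_call(submit, queue0); /* batched with the first call */
      defer_call_end();           /* prints "submit queue0" exactly once */
      return 0;
  }

A real implementation would also have to be thread-safe and cope with an
unbounded number of deferred calls; the sketch only demonstrates that
repeated defer_call(my_func, my_obj) calls within one section collapse into
a single my_func(my_obj) call at defer_call_end().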

Suggested-by: Ilya Maximets 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Stefan Hajnoczi 
---
 MAINTAINERS   |  3 ++-
 include/qemu/defer-call.h | 16 
 include/sysemu/block-backend-io.h |  4 
 block/blkio.c |  1 +
 block/io_uring.c  |  1 +
 block/linux-aio.c |  1 +
 block/nvme.c  |  1 +
 hw/block/dataplane/xen-block.c|  1 +
 hw/block/virtio-blk.c |  1 +
 hw/scsi/virtio-scsi.c |  1 +
 block/plug.c => util/defer-call.c |  2 +-
 block/meson.build |  1 -
 util/meson.build  |  1 +
 13 files changed, 27 insertions(+), 7 deletions(-)
 create mode 100644 include/qemu/defer-call.h
 rename block/plug.c => util/defer-call.c (99%)

diff --git a/MAINTAINERS b/MAINTAINERS
index 00562f924f..acda735326 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2685,12 +2685,13 @@ S: Supported
 F: util/async.c
 F: util/aio-*.c
 F: util/aio-*.h
+F: util/defer-call.c
 F: util/fdmon-*.c
 F: block/io.c
-F: block/plug.c
 F: migration/block*
 F: include/block/aio.h
 F: include/block/aio-wait.h
+F: include/qemu/defer-call.h
 F: scripts/qemugdb/aio.py
 F: tests/unit/test-fdmon-epoll.c
 T: git https://github.com/stefanha/qemu.git block
diff --git a/include/qemu/defer-call.h b/include/qemu/defer-call.h
new file mode 100644
index 00..e2c1d24572
--- /dev/null
+++ b/include/qemu/defer-call.h
@@ -0,0 +1,16 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Deferred calls
+ *
+ * Copyright Red Hat.
+ */
+
+#ifndef QEMU_DEFER_CALL_H
+#define QEMU_DEFER_CALL_H
+
+/* See documentation in util/defer-call.c */
+void defer_call_begin(void);
+void defer_call_end(void);
+void defer_call(void (*fn)(void *), void *opaque);
+
+#endif /* QEMU_DEFER_CALL_H */
diff --git a/include/sysemu/block-backend-io.h 
b/include/sysemu/block-backend-io.h
index cfcfd85c1d..d174275a5c 100644
--- a/include/sysemu/block-backend-io.h
+++ b/include/sysemu/block-backend-io.h
@@ -100,10 +100,6 @@ void blk_iostatus_set_err(BlockBackend *blk, int error);
 int blk_get_max_iov(BlockBackend *blk);
 int blk_get_max_hw_iov(BlockBackend *blk);
 
-void defer_call_begin(void);
-void defer_call_end(void);
-void defer_call(void (*fn)(void *), void *opaque);
-
 AioContext *blk_get_aio_context(BlockBackend *blk);
 BlockAcctStats *blk_get_stats(BlockBackend *blk);
 void *blk_aio_get(const AIOCBInfo *aiocb_info, BlockBackend *blk,
diff --git a/block/blkio.c b/block/blkio.c
index 7cf6d61f47..0a0a6c0f5f 100644
--- a/block/blkio.c
+++ b/block/blkio.c
@@ -13,6 +13,7 @@
 #include "block/block_int.h"
 #include "exec/memory.h"
 #include "exec/cpu-common.h" /* for qemu_ram_get_fd() */
+#include "qemu/defer-call.h"
 #include "qapi/error.h"
 #include "qemu/error-report.h"
 #include "qapi/qmp/qdict.h"
diff --git a/block/io_uring.c b/block/io_uring.c
index 8429f341be..3a1e1f45b3 100644
--- a/block/io_uring.c
+++ b/block/io_uring.c
@@ -15,6 +15,7 @@
 #include "block/block.h"
 #include "block/raw-aio.h"
 #include "qemu/coroutine.h"
+#include "qemu/defer-call.h"
 #include "qapi/error.h"
 #include "sysemu/block-backend.h"
 #include "trace.h"
diff --git a/block/linux-aio.c b/block/linux-aio.c
index 49a37174c2..a2670b3e46 100644
--- a/block/linux-aio.c
+++ b/block/linux-aio.c
@@ -14,6 +14,7 @@
 #include "block/raw-aio.h"
 #include "qemu/event_notifier.h"
 #include "qemu/coroutine.h"
+#include "qemu/defer-call.h"
 #include "qapi/error.h"
 #include "sysemu/block-backend.h"
 
diff --git a/block/nvme.c b/block/nvme.c
index dfbd1085fd..96b3f8f2fa 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -16,6 +16,7 @@
 #include "qapi/error.h"
 #include "qapi/qmp/qdict.h"
 #include "qapi/qmp/qstring.h"
+#include "qemu/defer-call.h"
 #include "qemu/error-report.h"
 #include "qemu/main-loop.h"
 #include "qemu/module.h"
diff --git a/hw/block/dataplane/xen-block.c b/hw/block/dataplane/xen-block.c
index e9dd8f8a99..c4bb28c66f 100644
--- a/hw/block/dataplane/xen-block.c
+++ b/hw/

Re: QEMU migration-test CI intermittent failure

2023-09-13 Thread Stefan Hajnoczi
On Wed, 13 Sept 2023 at 15:44, Fabiano Rosas  wrote:
>
> Stefan Hajnoczi  writes:
>
> > Hi,
> > The following intermittent failure occurred in the CI and I have filed
> > an Issue for it:
> > https://gitlab.com/qemu-project/qemu/-/issues/1886
> >
> > Output:
> >
> >   >>> QTEST_QEMU_IMG=./qemu-img MALLOC_PERTURB_=116 
> > QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon 
> > G_TEST_DBUS_DAEMON=/builds/qemu-project/qemu/tests/dbus-vmstate-daemon.sh 
> > QTEST_QEMU_BINARY=./qemu-system-x86_64 
> > /builds/qemu-project/qemu/build/tests/qtest/migration-test --tap -k
> >   ― ✀  
> > ―
> >   stderr:
> >   qemu-system-x86_64: Unable to read from socket: Connection reset by peer
> >   Memory content inconsistency at 5b43000 first_byte = bd last_byte = bc 
> > current = 4f hit_edge = 1
> >   **
> >   ERROR:../tests/qtest/migration-test.c:300:check_guests_ram: assertion 
> > failed: (bad == 0)
> >   (test program exited with status code -6)
> >
> > You can find the full output here:
> > https://gitlab.com/qemu-project/qemu/-/jobs/5080200417
>
> This is the postcopy return path issue that I'm addressing here:
>
> https://lore.kernel.org/r/20230911171320.24372-1-faro...@suse.de
> Subject: [PATCH v6 00/10] Fix segfault on migration return path
> Message-ID: <20230911171320.24372-1-faro...@suse.de>

Awesome, thanks!

Stefan



Re: [PATCH v2 2/4] util/defer-call: move defer_call() to util/

2023-09-13 Thread Stefan Hajnoczi
On Fri, Aug 18, 2023 at 10:31:40AM +0200, Philippe Mathieu-Daudé wrote:
> Hi Stefan,
> 
> On 17/8/23 17:58, Stefan Hajnoczi wrote:
> > The networking subsystem may wish to use defer_call(), so move the code
> > to util/ where it can be reused.
> > 
> > As a reminder of what defer_call() does:
> > 
> > This API defers a function call within a defer_call_begin()/defer_call_end()
> > section, allowing multiple calls to batch up. This is a performance
> > optimization that is used in the block layer to submit several I/O requests
> > at once instead of individually:
> > 
> >defer_call_begin(); <-- start of section
> >...
> >defer_call(my_func, my_obj); <-- deferred my_func(my_obj) call
> >defer_call(my_func, my_obj); <-- another
> >defer_call(my_func, my_obj); <-- another
> >...
> >defer_call_end(); <-- end of section, my_func(my_obj) is called once
> > 
> > Suggested-by: Ilya Maximets 
> > Signed-off-by: Stefan Hajnoczi 
> > ---
> >   MAINTAINERS   |  3 ++-
> >   include/qemu/defer-call.h | 15 +++
> >   include/sysemu/block-backend-io.h |  4 
> >   block/blkio.c |  1 +
> >   block/io_uring.c  |  1 +
> >   block/linux-aio.c |  1 +
> >   block/nvme.c  |  1 +
> >   hw/block/dataplane/xen-block.c|  1 +
> >   hw/block/virtio-blk.c |  1 +
> >   hw/scsi/virtio-scsi.c |  1 +
> >   block/plug.c => util/defer-call.c |  2 +-
> >   block/meson.build |  1 -
> >   util/meson.build  |  1 +
> >   13 files changed, 26 insertions(+), 7 deletions(-)
> >   create mode 100644 include/qemu/defer-call.h
> >   rename block/plug.c => util/defer-call.c (99%)
> > 
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index 6111b6b4d9..7cd7132ffc 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -2676,12 +2676,13 @@ S: Supported
> >   F: util/async.c
> >   F: util/aio-*.c
> >   F: util/aio-*.h
> > +F: util/defer-call.c
> 
> If used by network/other backends, maybe worth adding a
> brand new section instead, rather than "Block I/O path".

Changes to defer-call.c will go through my block tree. We don't split
out the event loop (async.c, aio-*.c, etc) either even though it's
shared by other subsystems. The important thing is that
scripts/get_maintainer.pl identifies the maintainers.

I'd rather not create lots of micro-subsystems in MAINTAINERS that
duplicate my email and block git repo URL.

> 
> >   F: util/fdmon-*.c
> >   F: block/io.c
> > -F: block/plug.c
> >   F: migration/block*
> >   F: include/block/aio.h
> >   F: include/block/aio-wait.h
> > +F: include/qemu/defer-call.h
> >   F: scripts/qemugdb/aio.py
> >   F: tests/unit/test-fdmon-epoll.c
> >   T: git https://github.com/stefanha/qemu.git block
> > diff --git a/include/qemu/defer-call.h b/include/qemu/defer-call.h
> > new file mode 100644
> > index 00..291f86c987
> > --- /dev/null
> > +++ b/include/qemu/defer-call.h
> > @@ -0,0 +1,15 @@
> > +/* SPDX-License-Identifier: GPL-2.0-or-later */
> > +/*
> > + * Deferred calls
> > + *
> > + * Copyright Red Hat.
> > + */
> > +
> > +#ifndef QEMU_DEFER_CALL_H
> > +#define QEMU_DEFER_CALL_H
> > +
> 
> Please add smth like:
> 
>/* See documentation in util/defer-call.c */

Sure, will fix.

> 
> > +void defer_call_begin(void);
> > +void defer_call_end(void);
> > +void defer_call(void (*fn)(void *), void *opaque);
> > +
> > +#endif /* QEMU_DEFER_CALL_H */
> 
> Reviewed-by: Philippe Mathieu-Daudé 
> 


signature.asc
Description: PGP signature


QEMU migration-test CI intermittent failure

2023-09-13 Thread Stefan Hajnoczi
Hi,
The following intermittent failure occurred in the CI and I have filed
an Issue for it:
https://gitlab.com/qemu-project/qemu/-/issues/1886

Output:

  >>> QTEST_QEMU_IMG=./qemu-img MALLOC_PERTURB_=116 
QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon 
G_TEST_DBUS_DAEMON=/builds/qemu-project/qemu/tests/dbus-vmstate-daemon.sh 
QTEST_QEMU_BINARY=./qemu-system-x86_64 
/builds/qemu-project/qemu/build/tests/qtest/migration-test --tap -k
  ― ✀  ―
  stderr:
  qemu-system-x86_64: Unable to read from socket: Connection reset by peer
  Memory content inconsistency at 5b43000 first_byte = bd last_byte = bc 
current = 4f hit_edge = 1
  **
  ERROR:../tests/qtest/migration-test.c:300:check_guests_ram: assertion failed: 
(bad == 0)
  (test program exited with status code -6)

You can find the full output here:
https://gitlab.com/qemu-project/qemu/-/jobs/5080200417

Please take a look!

Thanks,
Stefan


signature.asc
Description: PGP signature


Re: [PULL 0/2] hw/nvme: updates

2023-09-13 Thread Stefan Hajnoczi
Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/8.2 for any 
user-visible changes.


signature.asc
Description: PGP signature


Re: [PULL 0/4] Build fix patches for 2023-09-13

2023-09-13 Thread Stefan Hajnoczi
Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/8.2 for any 
user-visible changes.


signature.asc
Description: PGP signature


Re: [PULL v3 0/1] Merge tpm 2023/09/12 v3

2023-09-13 Thread Stefan Hajnoczi
Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/8.2 for any 
user-visible changes.


signature.asc
Description: PGP signature


Re: Assertion `dpy_ui_info_supported(con)' failed

2023-09-13 Thread Stefan Hajnoczi
Oops, I forgot to CC qemu-devel.

On Wed, 13 Sept 2023 at 15:06, Stefan Hajnoczi  wrote:
>
> I can't start a qemu.git build with a GTK UI but -display vnc=:0 works:
>
> $ ./configure --target-list=x86_64-softmmu
> $ build/qemu-system-x86_64
> qemu-system-x86_64: ../ui/console.c:818: dpy_get_ui_info: Assertion
> `dpy_ui_info_supported(con)' failed.
>
> Here is the configure output:
>   Build environment
> Build directory  : /home/stefanha/qemu/build
> Source path  : /home/stefanha/qemu
> Download dependencies: YES
>
>   Directories
> Build directory  : /home/stefanha/qemu/build
> Source path  : /home/stefanha/qemu
> Download dependencies: YES
> Install prefix   : /usr/local
> BIOS directory   : share/qemu
> firmware path: share/qemu-firmware
> binary directory : /usr/local/bin
> library directory: /usr/local/lib64
> module directory : lib64/qemu
> libexec directory: /usr/local/libexec
> include directory: /usr/local/include
> config directory : /usr/local/etc
> local state directory: /var/local
> Manual directory : /usr/local/share/man
> Doc directory: /usr/local/share/doc
>
>   Host binaries
> python   :
> /home/stefanha/qemu/build/pyvenv/bin/python3 (version: 3.11)
> sphinx-build :
> /home/stefanha/qemu/build/pyvenv/bin/sphinx-build
> gdb  : /usr/bin/gdb
> iasl : NO
> genisoimage  : /usr/bin/genisoimage
> smbd : /usr/sbin/smbd
>
>   Configurable features
> Documentation: YES
> system-mode emulation: YES
> user-mode emulation  : NO
> block layer  : YES
> Install blobs: YES
> module support   : NO
> fuzzing support  : NO
> Audio drivers: pa oss
> Trace backends   : log
> D-Bus display: YES
> QOM debugging: YES
> vhost-kernel support : YES
> vhost-net support: YES
> vhost-user support   : YES
> vhost-user-crypto support: YES
> vhost-user-blk server support: YES
> vhost-vdpa support   : YES
> build guest agent: YES
>
>   Compilation
> host CPU : x86_64
> host endianness  : little
> C compiler   : cc -m64 -mcx16
> Host C compiler  : cc -m64 -mcx16
> C++ compiler : NO
> CFLAGS   : -g -O2
> QEMU_CFLAGS  : -D_GNU_SOURCE
> -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -fno-strict-aliasing
> -fno-common -fwrapv -fstack-protector-strong -U_FORTIFY_SOURCE
> -D_FORTIFY_SOURCE=2
> QEMU_LDFLAGS :
> -fstack-protector-strong -Wl,-z,relro -Wl,-z,now -Wl,--warn-common
> link-time optimization (LTO) : NO
> PIE  : YES
> static build : NO
> malloc trim support  : YES
> membarrier   : NO
> debug graph lock : NO
> debug stack usage: NO
> mutex debugging  : NO
> memory allocator : system
> avx2 optimization: YES
> avx512bw optimization: YES
> avx512f optimization : NO
> gprof: NO
> gcov 

Re: Various changes "backportability"

2023-09-13 Thread Stefan Hajnoczi
On Wed, 13 Sept 2023 at 04:13, Michael Tokarev  wrote:
>
> [Added some more active patch reviewers to Cc]
>
> Hi!
>
> Yesterday I wrote an email about picking up changes from master
> for previous stable release(s).  What's interesting is that
> yesterday, basically in a single day, we've faced numerous
> examples of subsystem changes which make such backporting
> significantly more difficult than it might be.
>
> For example, the recent tpm bugfix, which is trivial on its own,
> uses the RETRY_ON_EINTR helper which was introduced recently and
> which is now used everywhere.  The coroutine_fn et al markers are
> another example, translator_io_start is yet another, and so
> on and so on.
>
> When adding such subsystems/helpers which are to be used widely,
> please split the initial implementation patch out of a single
> "introduce foo; convert everything to use it" change.  Instead,
> add the feature in a small patch first, and convert all users
> tree-wide to it in a second, subsequent patch, maybe removing
> the old version in that second patch too.  Where it makes sense
> of course - sometimes it is not possible or just complicated to
> do that, like when old and new implementations can't be supported
> in parallel.
>
> Just by splitting "introduce" from "convert", especially for
> something simple which will be used all around, you'll greatly
> simplify stable trees maintenance.

The general concept makes sense to me but I'm not sure what the
specific issue with adding (?) coroutine_fn was. Can you link to the
patch that caused difficulties so I can review it?

Thanks,
Stefan



Re: CI container image interference between staging and staging-7.2

2023-09-13 Thread Stefan Hajnoczi
On Wed, Sep 13, 2023, 03:26 Michael Tokarev  wrote:

> 13.09.2023 02:07, Stefan Hajnoczi wrote:
> > Hi,
> > TL;DR Michael: Please check that the staging-7.2 branch has Dan's
> > commit e28112d00703abd136e2411d23931f4f891c9244 ("gitlab: stable
> > staging branches publish containers in a separate tag").
> ...
>
> Mea culpa, Stefan.  I'm always forgetting about the fact that CI controls
> don't work on older branches in one way or another. Sigh.
>
> The patch(es) you're talking about - I didn't pick them up for 7.2 (which
> was the branch in question this time, which interfered with your testing),
> thinking it would be ok.  Yes, after this fiasco (which is the first one
> actually), it looks like I should re-consider doing this.
>
> It needs quite a few changes in there. And one of them is to actually
> look at QEMU_CI={0,1,2} variable when pushing staging-N.M branches.  Right
> now - and this is what I forgot this time once again, - I used QEMU_CI=1
> so the job does not auto-start, but forgot that in 7.2 it auto-starts
> regardless of QEMU_CI value.
>
> I don't push staging-N.M branches often, usually doing all the CI on
> a my gitlab repository instead. And when I do push to qemu-project,
> I either intend to skip automatic job run, to run just the tests I'm
> interested in, or push it at a time when no other pipelines are to be
> run (which is easy due to time zone differences).
>
> But actually I'm a bit surprised this issue happened to begin with.
> Maybe something else is missing still.  The thing is that after
> Daniel's changes, qemu/staging container tags should be named differently,
> no?   Ah. No. Staging didn't change, it was staging-N.M which were
> renamed.  Once again, I'm sorry for not thinking well enough about this, -
> after container tags renaming I was kinda sure main staging tags were
> different from old staging-N.M, which is not the case..
>
> Please excuse me for this trouble.  Things like these usually take quite
> some time to figure out.. :(  I'll make sure this won't happen again,
> one way or another.
>

No worries!

Stefan

>


[PATCH v3 1/5] block: remove AIOCBInfo->get_aio_context()

2023-09-12 Thread Stefan Hajnoczi
The synchronous bdrv_aio_cancel() function needs the acb's AioContext so
it can call aio_poll() to wait for cancellation.

It turns out that all users run under the BQL in the main AioContext, so
this callback is not needed.

Remove the callback, mark bdrv_aio_cancel() GLOBAL_STATE_CODE just like
its blk_aio_cancel() caller, and poll the main loop AioContext.

The purpose of this cleanup is to identify bdrv_aio_cancel() as an API
that does not work with the multi-queue block layer.

Signed-off-by: Stefan Hajnoczi 
---
 include/block/aio.h|  1 -
 include/block/block-global-state.h |  2 ++
 include/block/block-io.h   |  1 -
 block/block-backend.c  | 17 -
 block/io.c | 23 ---
 hw/nvme/ctrl.c |  7 ---
 softmmu/dma-helpers.c  |  8 
 util/thread-pool.c |  8 
 8 files changed, 10 insertions(+), 57 deletions(-)

diff --git a/include/block/aio.h b/include/block/aio.h
index 32042e8905..bcc165c974 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -31,7 +31,6 @@ typedef void BlockCompletionFunc(void *opaque, int ret);
 
 typedef struct AIOCBInfo {
 void (*cancel_async)(BlockAIOCB *acb);
-AioContext *(*get_aio_context)(BlockAIOCB *acb);
 size_t aiocb_size;
 } AIOCBInfo;
 
diff --git a/include/block/block-global-state.h 
b/include/block/block-global-state.h
index f347199bff..ac2a605ef5 100644
--- a/include/block/block-global-state.h
+++ b/include/block/block-global-state.h
@@ -185,6 +185,8 @@ void bdrv_drain_all_begin_nopoll(void);
 void bdrv_drain_all_end(void);
 void bdrv_drain_all(void);
 
+void bdrv_aio_cancel(BlockAIOCB *acb);
+
 int bdrv_has_zero_init_1(BlockDriverState *bs);
 int bdrv_has_zero_init(BlockDriverState *bs);
 BlockDriverState *bdrv_find_node(const char *node_name);
diff --git a/include/block/block-io.h b/include/block/block-io.h
index 6db48f2d35..f1c796a1ce 100644
--- a/include/block/block-io.h
+++ b/include/block/block-io.h
@@ -101,7 +101,6 @@ bdrv_co_delete_file_noerr(BlockDriverState *bs);
 
 
 /* async block I/O */
-void bdrv_aio_cancel(BlockAIOCB *acb);
 void bdrv_aio_cancel_async(BlockAIOCB *acb);
 
 /* sg packet commands */
diff --git a/block/block-backend.c b/block/block-backend.c
index 4009ed5fed..a77295a198 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -33,8 +33,6 @@
 
 #define NOT_DONE 0x7fff /* used while emulated sync operation in progress 
*/
 
-static AioContext *blk_aiocb_get_aio_context(BlockAIOCB *acb);
-
 typedef struct BlockBackendAioNotifier {
 void (*attached_aio_context)(AioContext *new_context, void *opaque);
 void (*detach_aio_context)(void *opaque);
@@ -103,7 +101,6 @@ typedef struct BlockBackendAIOCB {
 } BlockBackendAIOCB;
 
 static const AIOCBInfo block_backend_aiocb_info = {
-.get_aio_context = blk_aiocb_get_aio_context,
 .aiocb_size = sizeof(BlockBackendAIOCB),
 };
 
@@ -1545,16 +1542,8 @@ typedef struct BlkAioEmAIOCB {
 bool has_returned;
 } BlkAioEmAIOCB;
 
-static AioContext *blk_aio_em_aiocb_get_aio_context(BlockAIOCB *acb_)
-{
-BlkAioEmAIOCB *acb = container_of(acb_, BlkAioEmAIOCB, common);
-
-return blk_get_aio_context(acb->rwco.blk);
-}
-
 static const AIOCBInfo blk_aio_em_aiocb_info = {
 .aiocb_size = sizeof(BlkAioEmAIOCB),
-.get_aio_context= blk_aio_em_aiocb_get_aio_context,
 };
 
 static void blk_aio_complete(BlkAioEmAIOCB *acb)
@@ -2434,12 +2423,6 @@ AioContext *blk_get_aio_context(BlockBackend *blk)
 return blk->ctx;
 }
 
-static AioContext *blk_aiocb_get_aio_context(BlockAIOCB *acb)
-{
-BlockBackendAIOCB *blk_acb = DO_UPCAST(BlockBackendAIOCB, common, acb);
-return blk_get_aio_context(blk_acb->blk);
-}
-
 int blk_set_aio_context(BlockBackend *blk, AioContext *new_context,
 Error **errp)
 {
diff --git a/block/io.c b/block/io.c
index ba23a9bcd3..209a6da0c8 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2950,25 +2950,18 @@ int bdrv_load_vmstate(BlockDriverState *bs, uint8_t 
*buf,
 /**/
 /* async I/Os */
 
+/**
+ * Synchronously cancels an acb. Must be called with the BQL held and the acb
+ * must be processed with the BQL held too (IOThreads are not allowed).
+ *
+ * Use bdrv_aio_cancel_async() instead when possible.
+ */
 void bdrv_aio_cancel(BlockAIOCB *acb)
 {
-IO_CODE();
+GLOBAL_STATE_CODE();
 qemu_aio_ref(acb);
 bdrv_aio_cancel_async(acb);
-while (acb->refcnt > 1) {
-if (acb->aiocb_info->get_aio_context) {
-aio_poll(acb->aiocb_info->get_aio_context(acb), true);
-} else if (acb->bs) {
-/* qemu_aio_ref and qemu_aio_unref are not thread-safe, so
- * assert that we're not using an I/O thread.  Thread-safe
- * code should use bdrv_aio_cancel_async exclusively.
- */
-   

[PATCH v3 4/5] block-backend: process zoned requests in the current AioContext

2023-09-12 Thread Stefan Hajnoczi
Process zoned requests in the current thread's AioContext instead of in
the BlockBackend's AioContext.

There is no need to use the BlockBackend's AioContext thanks to CoMutex
bs->wps->colock, which protects zone metadata.

Signed-off-by: Stefan Hajnoczi 
---
 block/block-backend.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index 4863be5691..427ebcc0e4 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1890,11 +1890,11 @@ BlockAIOCB *blk_aio_zone_report(BlockBackend *blk, 
int64_t offset,
 acb->has_returned = false;
 
 co = qemu_coroutine_create(blk_aio_zone_report_entry, acb);
-aio_co_enter(blk_get_aio_context(blk), co);
+aio_co_enter(qemu_get_current_aio_context(), co);
 
 acb->has_returned = true;
 if (acb->rwco.ret != NOT_DONE) {
-replay_bh_schedule_oneshot_event(blk_get_aio_context(blk),
+replay_bh_schedule_oneshot_event(qemu_get_current_aio_context(),
  blk_aio_complete_bh, acb);
 }
 
@@ -1931,11 +1931,11 @@ BlockAIOCB *blk_aio_zone_mgmt(BlockBackend *blk, 
BlockZoneOp op,
 acb->has_returned = false;
 
 co = qemu_coroutine_create(blk_aio_zone_mgmt_entry, acb);
-aio_co_enter(blk_get_aio_context(blk), co);
+aio_co_enter(qemu_get_current_aio_context(), co);
 
 acb->has_returned = true;
 if (acb->rwco.ret != NOT_DONE) {
-replay_bh_schedule_oneshot_event(blk_get_aio_context(blk),
+replay_bh_schedule_oneshot_event(qemu_get_current_aio_context(),
  blk_aio_complete_bh, acb);
 }
 
@@ -1971,10 +1971,10 @@ BlockAIOCB *blk_aio_zone_append(BlockBackend *blk, 
int64_t *offset,
 acb->has_returned = false;
 
 co = qemu_coroutine_create(blk_aio_zone_append_entry, acb);
-aio_co_enter(blk_get_aio_context(blk), co);
+aio_co_enter(qemu_get_current_aio_context(), co);
 acb->has_returned = true;
 if (acb->rwco.ret != NOT_DONE) {
-replay_bh_schedule_oneshot_event(blk_get_aio_context(blk),
+replay_bh_schedule_oneshot_event(qemu_get_current_aio_context(),
  blk_aio_complete_bh, acb);
 }
 
-- 
2.41.0




[PATCH v3 0/5] block-backend: process I/O in the current AioContext

2023-09-12 Thread Stefan Hajnoczi
v3
- Add Patch 2 to fix a race condition in test-bdrv-drain. This was the CI
  failure that bumped this patch series from Kevin's pull request.
- Add missing 051.pc.out file. I tried qemu-system-aarch64 to see if 051.out
  also needs to be updated, but no changes were necessary. [Kevin]
v2
- Add patch to remove AIOCBInfo->get_aio_context() [Kevin]
- Add patch to use qemu_get_current_aio_context() in block-coroutine-wrapper so
  that the wrappers use the current AioContext instead of
  bdrv_get_aio_context().

Switch blk_aio_*() APIs over to multi-queue by using
qemu_get_current_aio_context() instead of blk_get_aio_context(). This change
will allow devices to process I/O in multiple IOThreads in the future.

The final patch requires my QIOChannel AioContext series to pass
tests/qemu-iotests/check -qcow2 281 because the nbd block driver is now
accessed from the main loop thread in addition to the IOThread:
https://lore.kernel.org/qemu-devel/20230823234504.1387239-1-stefa...@redhat.com/T/#t

Based-on: 20230823234504.1387239-1-stefa...@redhat.com

Stefan Hajnoczi (5):
  block: remove AIOCBInfo->get_aio_context()
  test-bdrv-drain: avoid race with BH in IOThread drain test
  block-backend: process I/O in the current AioContext
  block-backend: process zoned requests in the current AioContext
  block-coroutine-wrapper: use qemu_get_current_aio_context()

 include/block/aio.h|  1 -
 include/block/block-global-state.h |  2 ++
 include/block/block-io.h   |  1 -
 block/block-backend.c  | 35 --
 block/io.c | 23 +++-
 hw/nvme/ctrl.c |  7 --
 softmmu/dma-helpers.c  |  8 ---
 tests/unit/test-bdrv-drain.c   |  8 +++
 util/thread-pool.c |  8 ---
 scripts/block-coroutine-wrapper.py |  6 ++---
 tests/qemu-iotests/051.pc.out  |  4 ++--
 11 files changed, 31 insertions(+), 72 deletions(-)

-- 
2.41.0




[PATCH v3 2/5] test-bdrv-drain: avoid race with BH in IOThread drain test

2023-09-12 Thread Stefan Hajnoczi
This patch fixes a race condition in test-bdrv-drain that is difficult
to reproduce. test-bdrv-drain sometimes fails without an error message
on the block pull request sent by Kevin Wolf on Sep 4, 2023. I was able
to reproduce it locally and found that "block-backend: process I/O in
the current AioContext" (in this patch series) is the first commit where
it reproduces.

I do not know why "block-backend: process I/O in the current AioContext"
exposes this bug. It might be related to the fact that the test's preadv
request runs in the main thread instead of IOThread a after my commit.
That might simply change the timing of the test.

Now on to the race condition in test-bdrv-drain. The main thread
schedules a BH in IOThread a and then drains the BDS:

  aio_bh_schedule_oneshot(ctx_a, test_iothread_main_thread_bh, &data);

  /* The request is running on the IOThread a. Draining its block device
   * will make sure that it has completed as far as the BDS is concerned,
   * but the drain in this thread can continue immediately after
   * bdrv_dec_in_flight() and aio_ret might be assigned only slightly
   * later. */
  do_drain_begin(drain_type, bs);

If the BH completes before do_drain_begin() then there is nothing to
worry about.

If the BH invokes bdrv_flush() before do_drain_begin(), then
do_drain_begin() waits for it to complete.

The problematic case is when do_drain_begin() runs before the BH enters
bdrv_flush(). Then do_drain_begin() misses the BH and the drain
mechanism has failed in quiescing I/O.

Fix this by incrementing the in_flight counter so that do_drain_begin()
waits for test_iothread_main_thread_bh().
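
The idea can be boiled down to a standalone sketch outside of QEMU (plain
C11 atomics and pthreads instead of AioContext and BHs; all names are
hypothetical): incrementing the counter before the work is scheduled is what
closes the window in which the drainer could miss it.

  #include <pthread.h>
  #include <stdatomic.h>
  #include <stdio.h>
  #include <unistd.h>

  static atomic_int in_flight;

  /* Stands in for the BH: does its work, then drops the reference that
   * was taken on its behalf before it was scheduled. */
  static void *worker(void *arg)
  {
      (void)arg;
      usleep(1000);
      atomic_fetch_sub(&in_flight, 1);
      return NULL;
  }

  /* Stands in for do_drain_begin(): waits until all scheduled work is done. */
  static void drain(void)
  {
      while (atomic_load(&in_flight) > 0) {
          usleep(100);
      }
  }

  int main(void)
  {
      pthread_t t;

      atomic_fetch_add(&in_flight, 1); /* account for the worker up front */
      pthread_create(&t, NULL, worker, NULL);

      drain(); /* cannot return before the worker has finished */
      printf("drained\n");

      pthread_join(&t, NULL);
      return 0;
  }

If the increment happened inside worker() instead, drain() could observe
in_flight == 0 before the worker ran and return too early - the same window
this patch closes by calling bdrv_inc_in_flight() before
aio_bh_schedule_oneshot().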

Signed-off-by: Stefan Hajnoczi 
---
 tests/unit/test-bdrv-drain.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/tests/unit/test-bdrv-drain.c b/tests/unit/test-bdrv-drain.c
index ccc453c29e..67a79aa3f0 100644
--- a/tests/unit/test-bdrv-drain.c
+++ b/tests/unit/test-bdrv-drain.c
@@ -512,6 +512,7 @@ static void test_iothread_main_thread_bh(void *opaque)
  * executed during drain, otherwise this would deadlock. */
 aio_context_acquire(bdrv_get_aio_context(data->bs));
 bdrv_flush(data->bs);
+bdrv_dec_in_flight(data->bs); /* incremented by test_iothread_common() */
 aio_context_release(bdrv_get_aio_context(data->bs));
 }
 
@@ -583,6 +584,13 @@ static void test_iothread_common(enum drain_type 
drain_type, int drain_thread)
 aio_context_acquire(ctx_a);
 }
 
+/*
+ * Increment in_flight so that do_drain_begin() waits for
+ * test_iothread_main_thread_bh(). This prevents the race between
+ * test_iothread_main_thread_bh() in IOThread a and do_drain_begin() in
+ * this thread. test_iothread_main_thread_bh() decrements in_flight.
+ */
+bdrv_inc_in_flight(bs);
 aio_bh_schedule_oneshot(ctx_a, test_iothread_main_thread_bh, &data);
 
 /* The request is running on the IOThread a. Draining its block device
-- 
2.41.0




[PATCH v3 5/5] block-coroutine-wrapper: use qemu_get_current_aio_context()

2023-09-12 Thread Stefan Hajnoczi
Use qemu_get_current_aio_context() in mixed wrappers and coroutine
wrappers so that code runs in the caller's AioContext instead of moving
to the BlockDriverState's AioContext. This change is necessary for the
multi-queue block layer where any thread can call into the block layer.

Most wrappers are IO_CODE where it's safe to use the current AioContext
nowadays. BlockDrivers and the core block layer use their own locks and
no longer depend on the AioContext lock for thread-safety.

The bdrv_create() wrapper invokes GLOBAL_STATE code. Using the current
AioContext is safe because this code is only called with the BQL held
from the main loop thread.

The output of qemu-iotests 051 is sensitive to event loop activity.
Update the output because the monitor BH runs at a different time,
causing prompts to be printed differently in the output.

Signed-off-by: Stefan Hajnoczi 
---
 scripts/block-coroutine-wrapper.py | 6 ++
 tests/qemu-iotests/051.pc.out  | 4 ++--
 2 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/scripts/block-coroutine-wrapper.py 
b/scripts/block-coroutine-wrapper.py
index d4a183db61..f93fe154c3 100644
--- a/scripts/block-coroutine-wrapper.py
+++ b/scripts/block-coroutine-wrapper.py
@@ -88,8 +88,6 @@ def __init__(self, wrapper_type: str, return_type: str, name: 
str,
 raise ValueError(f"no_co function can't be rdlock: 
{self.name}")
 self.target_name = f'{subsystem}_{subname}'
 
-self.ctx = self.gen_ctx()
-
 self.get_result = 's->ret = '
 self.ret = 'return s.ret;'
 self.co_ret = 'return '
@@ -162,7 +160,7 @@ def create_mixed_wrapper(func: FuncDecl) -> str:
 {func.co_ret}{name}({ func.gen_list('{name}') });
 }} else {{
 {struct_name} s = {{
-.poll_state.ctx = {func.ctx},
+.poll_state.ctx = qemu_get_current_aio_context(),
 .poll_state.in_progress = true,
 
 { func.gen_block('.{name} = {name},') }
@@ -186,7 +184,7 @@ def create_co_wrapper(func: FuncDecl) -> str:
 {func.return_type} {func.name}({ func.gen_list('{decl}') })
 {{
 {struct_name} s = {{
-.poll_state.ctx = {func.ctx},
+.poll_state.ctx = qemu_get_current_aio_context(),
 .poll_state.in_progress = true,
 
 { func.gen_block('.{name} = {name},') }
diff --git a/tests/qemu-iotests/051.pc.out b/tests/qemu-iotests/051.pc.out
index 4d4af5a486..650cfed8e2 100644
--- a/tests/qemu-iotests/051.pc.out
+++ b/tests/qemu-iotests/051.pc.out
@@ -177,11 +177,11 @@ QEMU_PROG: -device virtio-blk-pci,drive=disk,share-rw=on: 
Cannot change iothread
 
 Testing: -drive file=TEST_DIR/t.qcow2,if=none,node-name=disk -object 
iothread,id=thread0 -device virtio-scsi,iothread=thread0,id=virtio-scsi0 
-device scsi-hd,bus=virtio-scsi0.0,drive=disk,share-rw=on -device 
lsi53c895a,id=lsi0 -device scsi-hd,bus=lsi0.0,drive=disk,share-rw=on
 QEMU X.Y.Z monitor - type 'help' for more information
-(qemu) QEMU_PROG: -device scsi-hd,bus=lsi0.0,drive=disk,share-rw=on: HBA does 
not support iothreads
+QEMU_PROG: -device scsi-hd,bus=lsi0.0,drive=disk,share-rw=on: HBA does not 
support iothreads
 
 Testing: -drive file=TEST_DIR/t.qcow2,if=none,node-name=disk -object 
iothread,id=thread0 -device virtio-scsi,iothread=thread0,id=virtio-scsi0 
-device scsi-hd,bus=virtio-scsi0.0,drive=disk,share-rw=on -device 
virtio-scsi,id=virtio-scsi1 -device 
scsi-hd,bus=virtio-scsi1.0,drive=disk,share-rw=on
 QEMU X.Y.Z monitor - type 'help' for more information
-(qemu) QEMU_PROG: -device scsi-hd,bus=virtio-scsi1.0,drive=disk,share-rw=on: 
Cannot change iothread of active block backend
+QEMU_PROG: -device scsi-hd,bus=virtio-scsi1.0,drive=disk,share-rw=on: Cannot 
change iothread of active block backend
 
 Testing: -drive file=TEST_DIR/t.qcow2,if=none,node-name=disk -object 
iothread,id=thread0 -device virtio-scsi,iothread=thread0,id=virtio-scsi0 
-device scsi-hd,bus=virtio-scsi0.0,drive=disk,share-rw=on -device 
virtio-blk-pci,drive=disk,iothread=thread0,share-rw=on
 QEMU X.Y.Z monitor - type 'help' for more information
-- 
2.41.0




[PATCH v3 3/5] block-backend: process I/O in the current AioContext

2023-09-12 Thread Stefan Hajnoczi
Switch blk_aio_*() APIs over to multi-queue by using
qemu_get_current_aio_context() instead of blk_get_aio_context(). This
change will allow devices to process I/O in multiple IOThreads in the
future.

I audited existing blk_aio_*() callers:
- migration/block.c: blk_mig_lock() protects the data accessed by the
  completion callback.
- The remaining emulated devices and exports run with
  qemu_get_aio_context() == blk_get_aio_context().

Signed-off-by: Stefan Hajnoczi 
---
 block/block-backend.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index a77295a198..4863be5691 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1530,7 +1530,7 @@ BlockAIOCB *blk_abort_aio_request(BlockBackend *blk,
 acb->blk = blk;
 acb->ret = ret;
 
-replay_bh_schedule_oneshot_event(blk_get_aio_context(blk),
+replay_bh_schedule_oneshot_event(qemu_get_current_aio_context(),
  error_callback_bh, acb);
 return &acb->common;
 }
@@ -1584,11 +1584,11 @@ static BlockAIOCB *blk_aio_prwv(BlockBackend *blk, 
int64_t offset,
 acb->has_returned = false;
 
 co = qemu_coroutine_create(co_entry, acb);
-aio_co_enter(blk_get_aio_context(blk), co);
+aio_co_enter(qemu_get_current_aio_context(), co);
 
 acb->has_returned = true;
 if (acb->rwco.ret != NOT_DONE) {
-replay_bh_schedule_oneshot_event(blk_get_aio_context(blk),
+replay_bh_schedule_oneshot_event(qemu_get_current_aio_context(),
  blk_aio_complete_bh, acb);
 }
 
-- 
2.41.0




CI container image interference between staging and staging-7.2

2023-09-12 Thread Stefan Hajnoczi
Hi,
TL;DR Michael: Please check that the staging-7.2 branch has Dan's
commit e28112d00703abd136e2411d23931f4f891c9244 ("gitlab: stable
staging branches publish containers in a separate tag").

I couldn't explain a check-cfi-x86_64 failure
(https://gitlab.com/qemu-project/qemu/-/jobs/5072006964), so I reran
build-cfi-x86_64 to see if it has an effect on its dependencies.

To my surprise the rerun of build-cfi-x86_64 failed:
https://gitlab.com/qemu-project/qemu/-/jobs/5072087783

The first run was successful:
https://gitlab.com/qemu-project/qemu/-/jobs/5071532799

Diffing the output shows that the software versions are different. The
successful run has Python 3.11.5 and meson 1.0.1 while the failed run
has Python 3.10.8 and meson 0.63.3.

I think staging and staging-7.2 pipelines are interfering with each
other. My understanding is that build-cfi-x86_64 uses
registry.gitlab.com/qemu-project/qemu/qemu/fedora:latest and that
should be built from fedora:38. Python 3.10.8 is what Fedora 35 uses.
The staging-7.2 branch's fedora.docker file uses fedora:35.

Stefan



Re: [PATCH v2 4/4] block-coroutine-wrapper: use qemu_get_current_aio_context()

2023-09-12 Thread Stefan Hajnoczi
On Fri, Sep 01, 2023 at 07:01:37PM +0200, Kevin Wolf wrote:
> Am 24.08.2023 um 01:59 hat Stefan Hajnoczi geschrieben:
> > Use qemu_get_current_aio_context() in mixed wrappers and coroutine
> > wrappers so that code runs in the caller's AioContext instead of moving
> > to the BlockDriverState's AioContext. This change is necessary for the
> > multi-queue block layer where any thread can call into the block layer.
> > 
> > Most wrappers are IO_CODE where it's safe to use the current AioContext
> > nowadays. BlockDrivers and the core block layer use their own locks and
> > no longer depend on the AioContext lock for thread-safety.
> > 
> > The bdrv_create() wrapper invokes GLOBAL_STATE code. Using the current
> > AioContext is safe because this code is only called with the BQL held
> > from the main loop thread.
> > 
> > The output of qemu-iotests 051 is sensitive to event loop activity.
> > Update the output because the monitor BH runs at a different time,
> > causing prompts to be printed differently in the output.
> > 
> > Signed-off-by: Stefan Hajnoczi 
> 
> The update for 051 is actually missing from this patch, and so the test
> fails.
> 
> I missed the dependency on your qio_channel series, so 281 ran into an
> abort() for me (see below for the stack trace). I expect that the other
> series actually fixes this, but this kind of interaction wasn't really
> obvious. How did you make sure that there aren't other places that don't
> like this change?

Only by running qemu-iotests.

Stefan

> 
> Kevin
> 
> (gdb) bt
> #0  0x7f8ef0d2fe5c in __pthread_kill_implementation () at /lib64/libc.so.6
> #1  0x7f8ef0cdfa76 in raise () at /lib64/libc.so.6
> #2  0x7f8ef0cc97fc in abort () at /lib64/libc.so.6
> #3  0x7f8ef0cc971b in _nl_load_domain.cold () at /lib64/libc.so.6
> #4  0x7f8ef0cd8656 in  () at /lib64/libc.so.6
> #5  0x55fd19da6af3 in qio_channel_yield (ioc=0x7f8eeb70, 
> condition=G_IO_IN) at ../io/channel.c:583
> #6  0x55fd19e0382f in nbd_read_eof (bs=0x55fd1b681350, 
> ioc=0x7f8eeb70, buffer=0x55fd1b680da0, size=4, errp=0x0) at 
> ../nbd/client.c:1454
> #7  0x55fd19e03612 in nbd_receive_reply (bs=0x55fd1b681350, 
> ioc=0x7f8eeb70, reply=0x55fd1b680da0, errp=0x0) at ../nbd/client.c:1491
> #8  0x55fd19e40575 in nbd_receive_replies (s=0x55fd1b680b00, cookie=1) at 
> ../block/nbd.c:461
> #9  0x55fd19e3fec4 in nbd_co_do_receive_one_chunk
> (s=0x55fd1b680b00, cookie=1, only_structured=true, 
> request_ret=0x7f8ee8bff91c, qiov=0x7f8ee8bfff10, payload=0x7f8ee8bff9d0, 
> errp=0x7f8ee8bff910) at ../block/nbd.c:844
> #10 0x55fd19e3fd55 in nbd_co_receive_one_chunk
> (s=0x55fd1b680b00, cookie=1, only_structured=true, 
> request_ret=0x7f8ee8bff91c, qiov=0x7f8ee8bfff10, reply=0x7f8ee8bff9f0, 
> payload=0x7f8ee8bff9d0, errp=0x7f8ee8bff910)
> at ../block/nbd.c:925
> #11 0x55fd19e3f7b5 in nbd_reply_chunk_iter_receive (s=0x55fd1b680b00, 
> iter=0x7f8ee8bff9d8, cookie=1, qiov=0x7f8ee8bfff10, reply=0x7f8ee8bff9f0, 
> payload=0x7f8ee8bff9d0)
> at ../block/nbd.c:1008
> #12 0x55fd19e3ecf7 in nbd_co_receive_cmdread_reply (s=0x55fd1b680b00, 
> cookie=1, offset=0, qiov=0x7f8ee8bfff10, request_ret=0x7f8ee8bffad4, 
> errp=0x7f8ee8bffac8) at ../block/nbd.c:1074
> #13 0x55fd19e3c804 in nbd_client_co_preadv (bs=0x55fd1b681350, offset=0, 
> bytes=131072, qiov=0x7f8ee8bfff10, flags=0) at ../block/nbd.c:1258
> #14 0x55fd19e33547 in bdrv_driver_preadv (bs=0x55fd1b681350, offset=0, 
> bytes=131072, qiov=0x7f8ee8bfff10, qiov_offset=0, flags=0) at 
> ../block/io.c:1005
> #15 0x55fd19e2c8bb in bdrv_aligned_preadv (child=0x55fd1c282d90, 
> req=0x7f8ee8bffd90, offset=0, bytes=131072, align=1, qiov=0x7f8ee8bfff10, 
> qiov_offset=0, flags=0) at ../block/io.c:1398
> #16 0x55fd19e2bf7d in bdrv_co_preadv_part (child=0x55fd1c282d90, 
> offset=0, bytes=131072, qiov=0x7f8ee8bfff10, qiov_offset=0, flags=0) at 
> ../block/io.c:1815
> #17 0x55fd19e176bd in blk_co_do_preadv_part (blk=0x55fd1c269c00, 
> offset=0, bytes=131072, qiov=0x7f8ee8bfff10, qiov_offset=0, flags=0) at 
> ../block/block-backend.c:1344
> #18 0x55fd19e17588 in blk_co_preadv (blk=0x55fd1c269c00, offset=0, 
> bytes=131072, qiov=0x7f8ee8bfff10, flags=0) at ../block/block-backend.c:1369
> #19 0x55fd19e17514 in blk_co_pread (blk=0x55fd1c269c00, offset=0, 
> bytes=131072, buf=0x55fd1c16d000, flags=0) at ../block/block-backend.c:1358
> #20 0x55fd19ddcc91 in blk_co_pread_entry (opaque=0x7ffc4bbdd9a0) at 
> block/block-gen.c:1519
> #21 0x55fd19feb2a1 in coroutine_trampoline (i0=460835072, i1=22013) at 
> ../util/coroutine-ucontext.c:177
> #22 0x7

Re: [PATCH 0/4] ci: fix hang of FreeBSD CI jobs

2023-09-12 Thread Stefan Hajnoczi
On Tue, 12 Sept 2023 at 14:41, Daniel P. Berrangé  wrote:
>
> This addresses
>
>   https://gitlab.com/qemu-project/qemu/-/issues/1882
>
> Which turned out to be a genuine flaw which we missed during merge
> as the patch hitting master coincided with the FreeBSD CI job
> having a temporary outage due to a changed release image version.
>
> Daniel P. Berrangé (4):
>   microbit: add missing qtest_quit() call
>   qtest: kill orphaned qtest QEMU processes on FreeBSD
>   gitlab: make Cirrus CI timeout explicit
>   gitlab: make Cirrus CI jobs gating
>
>  .gitlab-ci.d/cirrus.yml   | 4 +++-
>  .gitlab-ci.d/cirrus/build.yml | 2 ++
>  tests/qtest/libqtest.c| 7 +++
>  tests/qtest/microbit-test.c   | 2 ++
>  4 files changed, 14 insertions(+), 1 deletion(-)

Thank you!

Reviewed-by: Stefan Hajnoczi 



Re: [PATCH] gitlab: remove unreliable avocado CI jobs

2023-09-12 Thread Stefan Hajnoczi
On Tue, 12 Sept 2023 at 14:36, Alex Bennée  wrote:
>
>
> Stefan Hajnoczi  writes:
>
> > On Tue, Sep 12, 2023, 12:14 Daniel P. Berrangé  wrote:
> >
> >  On Tue, Sep 12, 2023 at 05:01:26PM +0100, Alex Bennée wrote:
> >  >
> >  > Daniel P. Berrangé  writes:
> >  >
> >  > > On Tue, Sep 12, 2023 at 11:06:11AM -0400, Stefan Hajnoczi wrote:
> >  > >> The avocado-system-alpine, avocado-system-fedora, and
> >  > >> avocado-system-ubuntu jobs are unreliable. I identified them while
> >  > >> looking over CI failures from the past week:
> >  > >> https://gitlab.com/qemu-project/qemu/-/jobs/5058610614
> >  > >> https://gitlab.com/qemu-project/qemu/-/jobs/5058610654
> >  > >> https://gitlab.com/qemu-project/qemu/-/jobs/5030428571
> >  > >>
> >  > >> Thomas Huth suggested on IRC today that there may be a legitimate 
> > failure
> >  > >> in there:
> >  > >>
> >  > >>   th_huth: f4bug, yes, seems like it does not start at all correctly 
> > on
> >  > >>   alpine anymore ... and it's broken since ~ 2 weeks already, so if 
> > nobody
> >  > >>   noticed this by now, this is worrying
> >  > >>
> >  > >> It crept in because the jobs were already unreliable.
> >  > >>
> >  > >> I don't know how to interpret the job output, so all I can do is to
> >  > >> propose removing these jobs. A useful CI job has two outcomes: pass or
> >  > >> fail. Timeouts and other in-between states are not useful because they
> >  > >> require constant triaging by someone who understands the details of 
> > the
> >  > >> tests and they can occur when run against pull requests that have
> >  > >> nothing to do with the area covered by the test.
> >  > >>
> >  > >> Hopefully test owners will be able to identify the root causes and 
> > solve
> >  > >> them so that these jobs can stay. In their current state the jobs are
> >  > >> not useful since I cannot tell whether job failures are real or
> >  > >> just intermittent when merging qemu.git pull requests.
> >  > >>
> >  > >> If you are a test owner, please take a look.
> >  > >>
> >  > >> It is likely that other avocado-system-* CI jobs have similar failures
> >  > >> from time to time, but I'll leave them as long as they are passing.
> >  > >>
> >  > >> Buglink: https://gitlab.com/qemu-project/qemu/-/issues/1884
> >  > >> Signed-off-by: Stefan Hajnoczi 
> >  > >> ---
> >  > >>  .gitlab-ci.d/buildtest.yml | 27 ---
> >  > >>  1 file changed, 27 deletions(-)
> >  > >>
> >  > >> diff --git a/.gitlab-ci.d/buildtest.yml b/.gitlab-ci.d/buildtest.yml
> >  > >> index aee9101507..83ce448c4d 100644
> >  > >> --- a/.gitlab-ci.d/buildtest.yml
> >  > >> +++ b/.gitlab-ci.d/buildtest.yml
> >  > >> @@ -22,15 +22,6 @@ check-system-alpine:
> >  > >>  IMAGE: alpine
> >  > >>  MAKE_CHECK_ARGS: check-unit check-qtest
> >  > >>
> >  > >> -avocado-system-alpine:
> >  > >> -  extends: .avocado_test_job_template
> >  > >> -  needs:
> >  > >> -- job: build-system-alpine
> >  > >> -  artifacts: true
> >  > >> -  variables:
> >  > >> -IMAGE: alpine
> >  > >> -MAKE_CHECK_ARGS: check-avocado
> >  > >
> >  > > Instead of entirely deleting, I'd suggest adding
> >  > >
> >  > ># Disabled due to frequent random failures
> >  > ># https://gitlab.com/qemu-project/qemu/-/issues/1884
> >  > >when: manual
> >  > >
> >  > > See example: https://docs.gitlab.com/ee/ci/yaml/#when
> >  > >
> >  > > This disables the job from running unless someone explicitly
> >  > > tells it to run
> >  >
> >  > What I don't understand is why we didn't gate the release back when they
> >  > first tripped. We should have noticed between:
> >  >
> >  >   https://gitlab.com/qemu-project/qemu/-/pipelines/956543770
> >  >
> >  > and
> >  >
> >  >   https://gitlab.com/qemu-project/qemu/-/pipelines/957154381
> >  >
> >  > that the system

Re: [RFC 3/3] qmp: make qmp_device_add() a coroutine

2023-09-12 Thread Stefan Hajnoczi
On Tue, 12 Sept 2023 at 12:47, Kevin Wolf  wrote:
>
> Am 06.09.2023 um 21:01 hat Stefan Hajnoczi geschrieben:
> > It is not safe to call drain_call_rcu() from qmp_device_add() because
> > some call stacks are not prepared for drain_call_rcu() to drop the Big
> > QEMU Lock (BQL).
> >
> > For example, device emulation code is protected by the BQL but when it
> > calls aio_poll() -> ... -> qmp_device_add() -> drain_call_rcu() then the
> > BQL is dropped. See bz#2215192 below for a concrete bug of this type.
> >
> > Another limitation of drain_call_rcu() is that it cannot be invoked
> > within an RCU read-side critical section since the reclamation phase
> > cannot complete until the end of the critical section. Unfortunately,
> > call stacks have been seen where this happens (see bz#2214985 below).
> >
> > Switch to drain_call_rcu_co() to avoid these problems. This requires
> > making qmp_device_add() a coroutine. qdev_device_add() is not designed
> > to be called from coroutines, so it must be invoked from a BH and then
> > switch back to the coroutine.
> >
> > Fixes: 7bed89958bfbf40df9ca681cefbdca63abdde39d ("device_core: use 
> > drain_call_rcu in in qmp_device_add")
> > Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2215192
> > Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2214985
> > Signed-off-by: Stefan Hajnoczi 
>
> Can you please include the relevant information directly in the commit
> message instead of only referencing Bugzilla? Both bugs only contain
> half of the story - I'm not even sure if the link with the stack trace
> is publically accessible - and then I think you got some information
> only from reproducing it yourself, and this information is missing from
> the bug reports. (The other question is how long the information will
> still be available in Bugzilla.)

Yes, I'll include the details in the commit description.

>
> >  qapi/qdev.json |  1 +
> >  include/monitor/qdev.h |  3 ++-
> >  monitor/qmp-cmds.c |  2 +-
> >  softmmu/qdev-monitor.c | 34 ++
> >  hmp-commands.hx|  1 +
> >  5 files changed, 35 insertions(+), 6 deletions(-)
> >
> > diff --git a/qapi/qdev.json b/qapi/qdev.json
> > index 6bc5a733b8..78e9d7f7b8 100644
> > --- a/qapi/qdev.json
> > +++ b/qapi/qdev.json
> > @@ -79,6 +79,7 @@
> >  ##
> >  { 'command': 'device_add',
> >'data': {'driver': 'str', '*bus': 'str', '*id': 'str'},
> > +  'coroutine': true,
> >'gen': false, # so we can get the additional arguments
> >'features': ['json-cli', 'json-cli-hotplug'] }
> >
> > diff --git a/include/monitor/qdev.h b/include/monitor/qdev.h
> > index 1d57bf6577..1fed9eb9ea 100644
> > --- a/include/monitor/qdev.h
> > +++ b/include/monitor/qdev.h
> > @@ -5,7 +5,8 @@
> >
> >  void hmp_info_qtree(Monitor *mon, const QDict *qdict);
> >  void hmp_info_qdm(Monitor *mon, const QDict *qdict);
> > -void qmp_device_add(QDict *qdict, QObject **ret_data, Error **errp);
> > +void coroutine_fn
> > +qmp_device_add(QDict *qdict, QObject **ret_data, Error **errp);
> >
> >  int qdev_device_help(QemuOpts *opts);
> >  DeviceState *qdev_device_add(QemuOpts *opts, Error **errp);
> > diff --git a/monitor/qmp-cmds.c b/monitor/qmp-cmds.c
> > index b0f948d337..a7419226fe 100644
> > --- a/monitor/qmp-cmds.c
> > +++ b/monitor/qmp-cmds.c
> > @@ -202,7 +202,7 @@ static void __attribute__((__constructor__)) 
> > monitor_init_qmp_commands(void)
> >  qmp_init_marshal(&qmp_commands);
> >
> >  qmp_register_command(&qmp_commands, "device_add",
> > - qmp_device_add, 0, 0);
> > + qmp_device_add, QCO_COROUTINE, 0);
> >
> >  QTAILQ_INIT(&qmp_cap_negotiation_commands);
> >  qmp_register_command(&qmp_cap_negotiation_commands, "qmp_capabilities",
> > diff --git a/softmmu/qdev-monitor.c b/softmmu/qdev-monitor.c
> > index 74f4e41338..85ae62f7cf 100644
> > --- a/softmmu/qdev-monitor.c
> > +++ b/softmmu/qdev-monitor.c
> > @@ -839,8 +839,28 @@ void hmp_info_qdm(Monitor *mon, const QDict *qdict)
> >  qdev_print_devinfos(true);
> >  }
> >
> > -void qmp_device_add(QDict *qdict, QObject **ret_data, Error **errp)
> > +typedef struct {
> > +Coroutine *co;
> > +QemuOpts *opts;
> > +Error **er

Re: [RFC 2/3] rcu: add drain_call_rcu_co() API

2023-09-12 Thread Stefan Hajnoczi
On Tue, 12 Sept 2023 at 12:37, Kevin Wolf  wrote:
>
> Am 06.09.2023 um 21:01 hat Stefan Hajnoczi geschrieben:
> > drain_call_rcu() has limitations that make it unsuitable for use in
> > qmp_device_add().
>
> This sounds a bit vague with only alluding to some unnamed limitations.
> I assume that you mean the two points you add to rcu.txt. If so, maybe
> it would be better to add a reference to that in the commit message.

Yes, exactly. I will add a reference to the commit message.

>
> > Introduce a new coroutine version of drain_call_rcu()
> > with the same functionality but that does not drop the BQL. The next
> > patch will use it to fix qmp_device_add().
> >
> > Signed-off-by: Stefan Hajnoczi 
>
> I don't understand the reasoning here. How does yielding from the
> coroutine not effectively release the BQL, too? It's just that you won't
> have explicit code here, but the mainloop will do it for you while
> waiting for new events.
>
> Is this about not dropping the BQL specifically in nested event loops,
> but letting the coroutine wait until we return to the real main loop
> where dropping the BQL is hopefully not a problem?

Yes.

Stefan



Re: [PATCH v2 00/21] Graph locking part 4 (node management)

2023-09-12 Thread Stefan Hajnoczi
On Mon, Sep 11, 2023 at 11:45:59AM +0200, Kevin Wolf wrote:
> The previous parts of the graph locking changes focussed mostly on the
> BlockDriver side and taking reader locks while performing I/O. This
> series focusses more on the functions managing the graph structure, i.e
> adding, removing and replacing nodes and updating their permissions.
> 
> Many of these places actually need to take the writer lock to avoid
> readers seeing an inconsistent half-updated graph state. Therefore
> taking the writer lock is now moved from the very low-level function
> bdrv_replace_child_noperm() into its more high level callers.
> 
> v2:
> - Patch 5: Improved comments, added one for bdrv_schedule_unref()
> 
> Kevin Wolf (21):
>   block: Remove unused BlockReopenQueueEntry.perms_checked
>   preallocate: Factor out preallocate_truncate_to_real_size()
>   preallocate: Don't poll during permission updates
>   block: Take AioContext lock for bdrv_append() more consistently
>   block: Introduce bdrv_schedule_unref()
>   block-coroutine-wrapper: Add no_co_wrapper_bdrv_wrlock functions
>   block-coroutine-wrapper: Allow arbitrary parameter names
>   block: Mark bdrv_replace_child_noperm() GRAPH_WRLOCK
>   block: Mark bdrv_replace_child_tran() GRAPH_WRLOCK
>   block: Mark bdrv_attach_child_common() GRAPH_WRLOCK
>   block: Call transaction callbacks with lock held
>   block: Mark bdrv_attach_child() GRAPH_WRLOCK
>   block: Mark bdrv_parent_perms_conflict() and callers GRAPH_RDLOCK
>   block: Mark bdrv_get_cumulative_perm() and callers GRAPH_RDLOCK
>   block: Mark bdrv_child_perm() GRAPH_RDLOCK
>   block: Mark bdrv_parent_cb_change_media() GRAPH_RDLOCK
>   block: Take graph rdlock in bdrv_drop_intermediate()
>   block: Take graph rdlock in bdrv_change_aio_context()
>   block: Mark bdrv_root_unref_child() GRAPH_WRLOCK
>   block: Mark bdrv_unref_child() GRAPH_WRLOCK
>   block: Mark bdrv_add/del_child() and caller GRAPH_WRLOCK
> 
>  include/block/block-common.h|   4 +
>  include/block/block-global-state.h  |  30 +-
>  include/block/block_int-common.h|  34 +-
>  include/block/block_int-global-state.h  |  14 +-
>  include/sysemu/block-backend-global-state.h |   4 +-
>  block.c | 348 ++--
>  block/blklogwrites.c|   4 +
>  block/blkverify.c   |   2 +
>  block/block-backend.c   |  29 +-
>  block/copy-before-write.c   |  10 +-
>  block/crypto.c  |   6 +-
>  block/graph-lock.c  |  26 +-
>  block/mirror.c  |   8 +
>  block/preallocate.c | 133 +---
>  block/qcow2.c   |   4 +-
>  block/quorum.c  |  23 +-
>  block/replication.c |   9 +
>  block/snapshot.c|   2 +
>  block/stream.c  |  20 +-
>  block/vmdk.c|  13 +
>  blockdev.c  |  23 +-
>  blockjob.c  |   2 +
>  tests/unit/test-bdrv-drain.c|  23 +-
>  tests/unit/test-bdrv-graph-mod.c|  20 ++
>  tests/unit/test-block-iothread.c|   3 +
>  scripts/block-coroutine-wrapper.py  |  18 +-
>  tests/qemu-iotests/051.pc.out   |   6 +-
>  27 files changed, 591 insertions(+), 227 deletions(-)

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature


Re: [PATCH v2 18/21] block: Take graph rdlock in bdrv_change_aio_context()

2023-09-12 Thread Stefan Hajnoczi
On Mon, Sep 11, 2023 at 11:46:17AM +0200, Kevin Wolf wrote:
> The function reads the parents list, so it needs to hold the graph lock.
> 
> Signed-off-by: Kevin Wolf 
> Reviewed-by: Emanuele Giuseppe Esposito 
> ---
>  block.c | 4 
>  1 file changed, 4 insertions(+)

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature


Re: [PATCH v2 17/21] block: Take graph rdlock in bdrv_drop_intermediate()

2023-09-12 Thread Stefan Hajnoczi
On Mon, Sep 11, 2023 at 11:46:16AM +0200, Kevin Wolf wrote:
> The function reads the parents list, so it needs to hold the graph lock.
> 
> Signed-off-by: Kevin Wolf 
> Reviewed-by: Emanuele Giuseppe Esposito 
> ---
>  block.c | 2 ++
>  1 file changed, 2 insertions(+)

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature


Re: [PATCH v2 05/21] block: Introduce bdrv_schedule_unref()

2023-09-12 Thread Stefan Hajnoczi
On Mon, Sep 11, 2023 at 11:46:04AM +0200, Kevin Wolf wrote:
> bdrv_unref() is called by a lot of places that need to hold the graph
> lock (it naturally happens in the context of operations that change the
> graph). However, bdrv_unref() takes the graph writer lock internally, so
> it can't actually be called while already holding a graph lock without
> causing a deadlock.
> 
> bdrv_unref() also can't just become GRAPH_WRLOCK because it drains the
> node before closing it, and draining requires that the graph is
> unlocked.
> 
> The solution is to defer deleting the node until we don't hold the lock
> any more and draining is possible again.
> 
> Note that keeping images open for longer than necessary can create
> problems, too: You can't open an image again before it is really closed
> (if image locking didn't prevent it, it would cause corruption).
> Reopening an image immediately happens at least during bdrv_open() and
> bdrv_co_create().
> 
> In order to solve this problem, make sure to run the deferred unref in
> bdrv_graph_wrunlock(), i.e. the first possible place where we can drain
> again. This is also why bdrv_schedule_unref() is marked GRAPH_WRLOCK.
> 
> The output of iotest 051 is updated because the additional polling
> changes the order of HMP output, resulting in a new "(qemu)" prompt in
> the test output that was previously on a separate line and filtered out.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  include/block/block-global-state.h |  1 +
>  block.c| 17 +
>  block/graph-lock.c | 26 +++---
>  tests/qemu-iotests/051.pc.out  |  6 +++---
>  4 files changed, 40 insertions(+), 10 deletions(-)

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature


Re: [PATCH] gitlab: remove unreliable avocado CI jobs

2023-09-12 Thread Stefan Hajnoczi
On Tue, Sep 12, 2023, 12:14 Daniel P. Berrangé  wrote:

> On Tue, Sep 12, 2023 at 05:01:26PM +0100, Alex Bennée wrote:
> >
> > Daniel P. Berrangé  writes:
> >
> > > On Tue, Sep 12, 2023 at 11:06:11AM -0400, Stefan Hajnoczi wrote:
> > >> The avocado-system-alpine, avocado-system-fedora, and
> > >> avocado-system-ubuntu jobs are unreliable. I identified them while
> > >> looking over CI failures from the past week:
> > >> https://gitlab.com/qemu-project/qemu/-/jobs/5058610614
> > >> https://gitlab.com/qemu-project/qemu/-/jobs/5058610654
> > >> https://gitlab.com/qemu-project/qemu/-/jobs/5030428571
> > >>
> > >> Thomas Huth suggested on IRC today that there may be a legitimate
> > >> failure in there:
> > >>
> > >>   th_huth: f4bug, yes, seems like it does not start at all correctly on
> > >>   alpine anymore ... and it's broken since ~ 2 weeks already, so if nobody
> > >>   noticed this by now, this is worrying
> > >>
> > >> It crept in because the jobs were already unreliable.
> > >>
> > >> I don't know how to interpret the job output, so all I can do is to
> > >> propose removing these jobs. A useful CI job has two outcomes: pass or
> > >> fail. Timeouts and other in-between states are not useful because they
> > >> require constant triaging by someone who understands the details of the
> > >> tests and they can occur when run against pull requests that have
> > >> nothing to do with the area covered by the test.
> > >>
> > >> Hopefully test owners will be able to identify the root causes and solve
> > >> them so that these jobs can stay. In their current state the jobs are
> > >> not useful since I cannot tell whether job failures are real or
> > >> just intermittent when merging qemu.git pull requests.
> > >>
> > >> If you are a test owner, please take a look.
> > >>
> > >> It is likely that other avocado-system-* CI jobs have similar failures
> > >> from time to time, but I'll leave them as long as they are passing.
> > >>
> > >> Buglink: https://gitlab.com/qemu-project/qemu/-/issues/1884
> > >> Signed-off-by: Stefan Hajnoczi 
> > >> ---
> > >>  .gitlab-ci.d/buildtest.yml | 27 ---
> > >>  1 file changed, 27 deletions(-)
> > >>
> > >> diff --git a/.gitlab-ci.d/buildtest.yml b/.gitlab-ci.d/buildtest.yml
> > >> index aee9101507..83ce448c4d 100644
> > >> --- a/.gitlab-ci.d/buildtest.yml
> > >> +++ b/.gitlab-ci.d/buildtest.yml
> > >> @@ -22,15 +22,6 @@ check-system-alpine:
> > >>  IMAGE: alpine
> > >>  MAKE_CHECK_ARGS: check-unit check-qtest
> > >>
> > >> -avocado-system-alpine:
> > >> -  extends: .avocado_test_job_template
> > >> -  needs:
> > >> -- job: build-system-alpine
> > >> -  artifacts: true
> > >> -  variables:
> > >> -IMAGE: alpine
> > >> -MAKE_CHECK_ARGS: check-avocado
> > >
> > > Instead of entirely deleting, I'd suggest adding
> > >
> > ># Disabled due to frequent random failures
> > ># https://gitlab.com/qemu-project/qemu/-/issues/1884
> > >when: manual
> > >
> > > See example: https://docs.gitlab.com/ee/ci/yaml/#when
> > >
> > > This disables the job from running unless someone explicitly
> > > tells it to run
> >
> > What I don't understand is why we didn't gate the release back when they
> > first tripped. We should have noticed between:
> >
> >   https://gitlab.com/qemu-project/qemu/-/pipelines/956543770
> >
> > and
> >
> >   https://gitlab.com/qemu-project/qemu/-/pipelines/957154381
> >
> > that the system tests were regressing. Yet we merged the changes
> > anyway.
>
> I think that green series is misleading, based on Richard's
> mail on list wrt the TCG pull series:
>
>   https://lists.gnu.org/archive/html/qemu-devel/2023-08/msg04014.html
>
>   "It's some sort of timing issue, which sometimes goes away
>when re-run. I was re-running tests *a lot* in order to
>get them to go green while running the 8.1 release. "
>
>
> Essentially I'd put this down to the tests being soo non-deterministic
> that we've given up trusting them.
>

Yes.

Stefan


> With regards,
> Daniel
> --
> |: https://berrange.com  -o-
> https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org -o-
> https://fstop138.berrange.com :|
> |: https://entangle-photo.org-o-
> https://www.instagram.com/dberrange :|
>
>
>


[PATCH] gitlab: remove unreliable avocado CI jobs

2023-09-12 Thread Stefan Hajnoczi
The avocado-system-alpine, avocado-system-fedora, and
avocado-system-ubuntu jobs are unreliable. I identified them while
looking over CI failures from the past week:
https://gitlab.com/qemu-project/qemu/-/jobs/5058610614
https://gitlab.com/qemu-project/qemu/-/jobs/5058610654
https://gitlab.com/qemu-project/qemu/-/jobs/5030428571

Thomas Huth suggested on IRC today that there may be a legitimate failure
in there:

  th_huth: f4bug, yes, seems like it does not start at all correctly on
  alpine anymore ... and it's broken since ~ 2 weeks already, so if nobody
  noticed this by now, this is worrying

It crept in because the jobs were already unreliable.

I don't know how to interpret the job output, so all I can do is to
propose removing these jobs. A useful CI job has two outcomes: pass or
fail. Timeouts and other in-between states are not useful because they
require constant triaging by someone who understands the details of the
tests and they can occur when run against pull requests that have
nothing to do with the area covered by the test.

Hopefully test owners will be able to identify the root causes and solve
them so that these jobs can stay. In their current state the jobs are
not useful since I cannot tell whether job failures are real or
just intermittent when merging qemu.git pull requests.

If you are a test owner, please take a look.

It is likely that other avocado-system-* CI jobs have similar failures
from time to time, but I'll leave them as long as they are passing.

Buglink: https://gitlab.com/qemu-project/qemu/-/issues/1884
Signed-off-by: Stefan Hajnoczi 
---
 .gitlab-ci.d/buildtest.yml | 27 ---
 1 file changed, 27 deletions(-)

diff --git a/.gitlab-ci.d/buildtest.yml b/.gitlab-ci.d/buildtest.yml
index aee9101507..83ce448c4d 100644
--- a/.gitlab-ci.d/buildtest.yml
+++ b/.gitlab-ci.d/buildtest.yml
@@ -22,15 +22,6 @@ check-system-alpine:
 IMAGE: alpine
 MAKE_CHECK_ARGS: check-unit check-qtest
 
-avocado-system-alpine:
-  extends: .avocado_test_job_template
-  needs:
-- job: build-system-alpine
-  artifacts: true
-  variables:
-IMAGE: alpine
-MAKE_CHECK_ARGS: check-avocado
-
 build-system-ubuntu:
   extends:
 - .native_build_job_template
@@ -53,15 +44,6 @@ check-system-ubuntu:
 IMAGE: ubuntu2204
 MAKE_CHECK_ARGS: check
 
-avocado-system-ubuntu:
-  extends: .avocado_test_job_template
-  needs:
-- job: build-system-ubuntu
-  artifacts: true
-  variables:
-IMAGE: ubuntu2204
-MAKE_CHECK_ARGS: check-avocado
-
 build-system-debian:
   extends:
 - .native_build_job_template
@@ -127,15 +109,6 @@ check-system-fedora:
 IMAGE: fedora
 MAKE_CHECK_ARGS: check
 
-avocado-system-fedora:
-  extends: .avocado_test_job_template
-  needs:
-- job: build-system-fedora
-  artifacts: true
-  variables:
-IMAGE: fedora
-MAKE_CHECK_ARGS: check-avocado
-
 crash-test-fedora:
   extends: .native_test_job_template
   needs:
-- 
2.41.0




Re: [PATCH] gitlab-ci/cirrus: Increase timeout to 100 minutes

2023-09-12 Thread Stefan Hajnoczi
Thank you!

Stefan

On Tue, 12 Sept 2023 at 10:15, Daniel P. Berrangé  wrote:
>
> On Tue, Sep 12, 2023 at 10:02:17AM -0400, Stefan Hajnoczi wrote:
> > On Tue, 12 Sept 2023 at 09:53, Daniel P. Berrangé  
> > wrote:
> > >
> > > On Tue, Sep 12, 2023 at 09:38:29AM -0400, Stefan Hajnoczi wrote:
> > > > The 80m timeout is not enough:
> > > >
> > > >   672/832 qemu:block / io-qcow2-041  OK 39.77s   1 
> > > > subtests passed
> > > >   Timed out!
> > >
> > > IIUC, that 'timed out' message is coming from Cirrus CI logs, which
> > > we can see over on the cirrus task:
> > >
> > >   https://cirrus-ci.com/task/6462328380588032
> > >
> > > > https://gitlab.com/qemu-project/qemu/-/jobs/5058610599
> > >
> > > This reports duration "64 minutes", vs a GitLab timeout of 1hr20.
> > >
> > > IOW, we're not hitting the gitlab timeout, we're hitting the
> > > Cirrus CI timeout, which defaults to 60 minutes.  The other
> > > 4 minutes gitlab reports is likely because Cirrus queued the
> > > job for 4 minutes before starting execution.
> >
> > I'm glad you spotted that. I'm not familiar with Cirrus. Could you
> > send a patch that sets 'timeout_in'?
>
> Yes, testing now
>
>   
> https://gitlab.com/berrange/qemu/-/commit/c15d677de5ed2965464bc6212f049ed9785c4434
>
>   https://gitlab.com/berrange/qemu/-/jobs/5069195895
>
>   https://cirrus-ci.com/task/5135339078025216
>
> The cirrus CI job page looks to be picking up the elevated timeout.
>
>
> With regards,
> Daniel
> --
> |: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org -o-https://fstop138.berrange.com :|
> |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|
>



Re: [PATCH] tests/qtest/netdev-socket: Raise connection timeout to 120 seconds

2023-09-12 Thread Stefan Hajnoczi
Here is a log from the CI, but I don't think it has much information:
https://gitlab.com/qemu-project/qemu/-/jobs/5020899550

Is it possible to detect the crash? Timeouts are hard to diagnose, so
it would be better for the test to detect a terminated child process
and print an error.
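
For illustration, something along these lines would turn the silent timeout
into an actionable error (a minimal POSIX sketch, not tied to the existing
libqtest helpers; the function name is made up):

  #include <stdbool.h>
  #include <stdio.h>
  #include <sys/types.h>
  #include <sys/wait.h>

  /* Hypothetical helper: has the spawned QEMU server already exited? */
  bool server_has_died(pid_t qemu_pid)
  {
      int status;

      if (waitpid(qemu_pid, &status, WNOHANG) == qemu_pid) {
          if (WIFEXITED(status)) {
              fprintf(stderr, "server exited with status %d before the "
                      "client could connect\n", WEXITSTATUS(status));
          } else if (WIFSIGNALED(status)) {
              fprintf(stderr, "server killed by signal %d before the "
                      "client could connect\n", WTERMSIG(status));
          }
          return true;
      }
      return false;
  }

Calling something like this from the connection retry loop would make a
crashed server show up as its own failure instead of a bare timeout.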

Stefan

On Tue, 12 Sept 2023 at 10:08, Laurent Vivier  wrote:
>
> On 9/12/23 15:42, Daniel P. Berrangé wrote:
> > On Tue, Sep 12, 2023 at 09:33:10AM -0400, Stefan Hajnoczi wrote:
> >> The test still fails intermittently with a 60 second timeout in the
> >> GitLab CI environment. Raise the timeout to 120 seconds.
> >>
> >>576/839 ERROR:../tests/qtest/netdev-socket.c:293:test_stream_unix: 
> >> assertion failed (resp == expect): ("st0: index=0,type=stream,connection 
> >> error\r\n" == "st0: 
> >> index=0,type=stream,unix:/tmp/netdev-socket.UW5IA2/stream_unix\r\n") ERROR
> >>576/839 qemu:qtest+qtest-sh4 / qtest-sh4/netdev-socket  
> >>   ERROR  62.85s   killed by signal 6 SIGABRT
> >>>>> MALLOC_PERTURB_=249 QTEST_QEMU_BINARY=./qemu-system-sh4 
> >> QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon 
> >> G_TEST_DBUS_DAEMON=/home/gitlab-runner/builds/-LCfcJ2T/0/qemu-project/qemu/tests/dbus-vmstate-daemon.sh
> >>  QTEST_QEMU_IMG=./qemu-img 
> >> /home/gitlab-runner/builds/-LCfcJ2T/0/qemu-project/qemu/build/tests/qtest/netdev-socket
> >>  --tap -k
> >>― ✀  
> >> ―
> >>stderr:
> >>**
> >>ERROR:../tests/qtest/netdev-socket.c:293:test_stream_unix: assertion 
> >> failed (resp == expect): ("st0: index=0,type=stream,connection error\r\n" 
> >> == "st0: 
> >> index=0,type=stream,unix:/tmp/netdev-socket.UW5IA2/stream_unix\r\n")
> >>(test program exited with status code -6)
> >>
> >> Buglink: https://gitlab.com/qemu-project/qemu/-/issues/1881
> >> Fixes: 417296c8d858 ("tests/qtest/netdev-socket: Raise connection timeout 
> >> to 60 seconds")
> >> Signed-off-by: Stefan Hajnoczi 
> >
> > That bumped the timeout from 5 seconds to 60 seconds to
> > cope with intermittent failures, which was a x12
> > increase. I'm concerned that it would still be failing
> > in largely the same way after that, and possibly we are
> > instead hitting a race condition causing setup to fail,
> > which masquerades as a timeout.
> >
> >> ---
> >>   tests/qtest/netdev-socket.c | 2 +-
> >>   1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/tests/qtest/netdev-socket.c b/tests/qtest/netdev-socket.c
> >> index 8eed54801f..b2501d72a1 100644
> >> --- a/tests/qtest/netdev-socket.c
> >> +++ b/tests/qtest/netdev-socket.c
> >> @@ -16,7 +16,7 @@
> >>   #include "qapi/qobject-input-visitor.h"
> >>   #include "qapi/qapi-visit-sockets.h"
> >>
> >> -#define CONNECTION_TIMEOUT60
> >> +#define CONNECTION_TIMEOUT120
> >>
> >>   #define EXPECT_STATE(q, e, t) \
> >>   do {  \
> >
> > I'll add
> >
> > Reviewed-by: Daniel P. Berrangé 
> >
> > but with the caveat that I'm only 50/50 on whether this is actually
> > the right fix. Doesn't hurt to try it, but if 120 seconds still shows
> > failures I'd say we're hitting a functional race not a timeout.
>
> It can also happen if the first QEMU (server) crashes. Do we have some traces 
> from this side?
>
> Reviewed-by: Laurent Vivier 
>
> Thanks,
> Laurent
>
>



Re: [PATCH] gitlab-ci/cirrus: Increase timeout to 100 minutes

2023-09-12 Thread Stefan Hajnoczi
On Tue, 12 Sept 2023 at 09:53, Daniel P. Berrangé  wrote:
>
> On Tue, Sep 12, 2023 at 09:38:29AM -0400, Stefan Hajnoczi wrote:
> > The 80m timeout is not enough:
> >
> >   672/832 qemu:block / io-qcow2-041  OK 39.77s   1 
> > subtests passed
> >   Timed out!
>
> IIUC, that 'timed out' message is coming from Cirrus CI logs, which
> we can see over on the cirrus task:
>
>   https://cirrus-ci.com/task/6462328380588032
>
> > https://gitlab.com/qemu-project/qemu/-/jobs/5058610599
>
> This reports duration "64 minutes", vs a GitLab timeout of 1hr20.
>
> IOW, we're not hitting the gitlab timeout, we're hitting the
> Cirrus CI timeout, which defaults to 60 minutes.  The other
> 4 minutes gitlab reports is likely because Cirrus queued the
> job for 4 minutes before starting execution.

I'm glad you spotted that. I'm not familiar with Cirrus. Could you
send a patch that sets 'timeout_in'?

Thanks,
Stefan

>
> >
> > Buglink: https://gitlab.com/qemu-project/qemu/-/issues/1882
> > Fixes: d06f3bf92267 ("gitlab-ci/cirrus: Increase timeout to 80 minutes")
> > Signed-off-by: Stefan Hajnoczi 
> > ---
> >  .gitlab-ci.d/cirrus.yml | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/.gitlab-ci.d/cirrus.yml b/.gitlab-ci.d/cirrus.yml
> > index 41d64d6680..d19633f758 100644
> > --- a/.gitlab-ci.d/cirrus.yml
> > +++ b/.gitlab-ci.d/cirrus.yml
> > @@ -15,7 +15,7 @@
> >stage: build
> >image: registry.gitlab.com/libvirt/libvirt-ci/cirrus-run:master
> >needs: []
> > -  timeout: 80m
> > +  timeout: 100m
> >allow_failure: true
> >script:
> >  - source .gitlab-ci.d/cirrus/$NAME.vars
>
> IIUC, we need to put a 'timeout_in' setting someone in
> .gitlab-ci.d/cirrus/build.yml instead, to override
> Cirrus 60 minute limit:
>
> https://cirrus-ci.org/faq/#instance-timed-out
>
>
> With regards,
> Daniel
> --
> |: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org -o-https://fstop138.berrange.com :|
> |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|
>
>



Re: cherry-picking something to -stable which might require other changes

2023-09-12 Thread Stefan Hajnoczi
When I backport patches into RHEL, the general process I follow is:
1. For context conflicts, just adjust the patch to resolve them.
2. For real dependencies, backport the dependencies, if possible.
3. If backporting the dependencies is not possible, think of a
downstream-only solution. This should be rare.

People make different backporting decisions (just like structuring
patch series). It can be a matter of taste.

Stefan



[PATCH] gitlab-ci/cirrus: Increase timeout to 100 minutes

2023-09-12 Thread Stefan Hajnoczi
The 80m timeout is not enough:

  672/832 qemu:block / io-qcow2-041  OK 39.77s   1 subtests 
passed
  Timed out!

https://gitlab.com/qemu-project/qemu/-/jobs/5058610599

Buglink: https://gitlab.com/qemu-project/qemu/-/issues/1882
Fixes: d06f3bf92267 ("gitlab-ci/cirrus: Increase timeout to 80 minutes")
Signed-off-by: Stefan Hajnoczi 
---
 .gitlab-ci.d/cirrus.yml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/.gitlab-ci.d/cirrus.yml b/.gitlab-ci.d/cirrus.yml
index 41d64d6680..d19633f758 100644
--- a/.gitlab-ci.d/cirrus.yml
+++ b/.gitlab-ci.d/cirrus.yml
@@ -15,7 +15,7 @@
   stage: build
   image: registry.gitlab.com/libvirt/libvirt-ci/cirrus-run:master
   needs: []
-  timeout: 80m
+  timeout: 100m
   allow_failure: true
   script:
 - source .gitlab-ci.d/cirrus/$NAME.vars
-- 
2.41.0




[PATCH] tests/qtest/netdev-socket: Raise connection timeout to 120 seconds

2023-09-12 Thread Stefan Hajnoczi
The test still fails intermittently with a 60 second timeout in the
GitLab CI environment. Raise the timeout to 120 seconds.

  576/839 ERROR:../tests/qtest/netdev-socket.c:293:test_stream_unix: assertion 
failed (resp == expect): ("st0: index=0,type=stream,connection error\r\n" == 
"st0: index=0,type=stream,unix:/tmp/netdev-socket.UW5IA2/stream_unix\r\n") ERROR
  576/839 qemu:qtest+qtest-sh4 / qtest-sh4/netdev-socket
ERROR  62.85s   killed by signal 6 SIGABRT
  >>> MALLOC_PERTURB_=249 QTEST_QEMU_BINARY=./qemu-system-sh4 
QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon 
G_TEST_DBUS_DAEMON=/home/gitlab-runner/builds/-LCfcJ2T/0/qemu-project/qemu/tests/dbus-vmstate-daemon.sh
 QTEST_QEMU_IMG=./qemu-img 
/home/gitlab-runner/builds/-LCfcJ2T/0/qemu-project/qemu/build/tests/qtest/netdev-socket
 --tap -k
  ― ✀  ―
  stderr:
  **
  ERROR:../tests/qtest/netdev-socket.c:293:test_stream_unix: assertion failed 
(resp == expect): ("st0: index=0,type=stream,connection error\r\n" == "st0: 
index=0,type=stream,unix:/tmp/netdev-socket.UW5IA2/stream_unix\r\n")
  (test program exited with status code -6)

Buglink: https://gitlab.com/qemu-project/qemu/-/issues/1881
Fixes: 417296c8d858 ("tests/qtest/netdev-socket: Raise connection timeout to 60 
seconds")
Signed-off-by: Stefan Hajnoczi 
---
 tests/qtest/netdev-socket.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/qtest/netdev-socket.c b/tests/qtest/netdev-socket.c
index 8eed54801f..b2501d72a1 100644
--- a/tests/qtest/netdev-socket.c
+++ b/tests/qtest/netdev-socket.c
@@ -16,7 +16,7 @@
 #include "qapi/qobject-input-visitor.h"
 #include "qapi/qapi-visit-sockets.h"
 
-#define CONNECTION_TIMEOUT60
+#define CONNECTION_TIMEOUT120
 
 #define EXPECT_STATE(q, e, t) \
 do {  \
-- 
2.41.0




Re: [PULL 0/3] Firmware/seabios 20230912 patches

2023-09-12 Thread Stefan Hajnoczi
On Tue, 12 Sept 2023 at 06:55, Gerd Hoffmann  wrote:
>
> The following changes since commit c5ea91da443b458352c1b629b490ee6631775cb4:
>
>   Merge tag 'pull-trivial-patches' of https://gitlab.com/mjt0k/qemu into 
> staging (2023-09-08 10:06:25 -0400)
>
> are available in the Git repository at:
>
>   https://gitlab.com/kraxel/qemu.git 
> tags/firmware/seabios-20230912-pull-request
>
> for you to fetch changes up to a14c30fc3d38d569415259a2d877c36a0b8de058:
>
>   seabios: update binaries to git snapshot (2023-09-11 17:32:44 +0200)
>
> 
> seabios: update to git snapshot
>
> Give seabios updates some testing coverage before
> tagging a new release.  Update to release code
> will follow later in the 8.2 devel cycle.
>
> 
>
> Gerd Hoffmann (3):
>   seabios: update submodule to git snapshot
>   seabios: turn off CONFIG_APMBIOS for 128k build
>   seabios: update binaries to git snapshot
>
>  pc-bios/bios-256k.bin | Bin 262144 -> 262144 bytes
>  pc-bios/bios-microvm.bin  | Bin 131072 -> 131072 bytes
>  pc-bios/bios.bin  | Bin 131072 -> 131072 bytes
>  pc-bios/vgabios-ati.bin   | Bin 39936 -> 39424 bytes
>  pc-bios/vgabios-bochs-display.bin | Bin 28672 -> 28672 bytes
>  pc-bios/vgabios-cirrus.bin| Bin 39424 -> 38912 bytes
>  pc-bios/vgabios-qxl.bin   | Bin 39936 -> 39424 bytes
>  pc-bios/vgabios-ramfb.bin | Bin 29184 -> 28672 bytes
>  pc-bios/vgabios-stdvga.bin| Bin 39936 -> 39424 bytes
>  pc-bios/vgabios-virtio.bin| Bin 39936 -> 39424 bytes
>  pc-bios/vgabios-vmware.bin| Bin 39936 -> 39424 bytes
>  pc-bios/vgabios.bin   | Bin 39424 -> 38912 bytes
>  roms/config.seabios-128k  |   1 +
>  roms/seabios  |   2 +-
>  14 files changed, 2 insertions(+), 1 deletion(-)

Hi Gerd,
I think either this pull request or your edk2 pull request causes the
following CI failure:

>>> G_TEST_DBUS_DAEMON=/builds/qemu-project/qemu/tests/dbus-vmstate-daemon.sh 
>>> QTEST_QEMU_BINARY=./qemu-system-aarch64 MALLOC_PERTURB_=199 
>>> /builds/qemu-project/qemu/build/tests/qtest/bios-tables-test --tap -k
― ✀ ―
stderr:
acpi-test: Warning! SSDT binary file mismatch. Actual
[aml:/tmp/aml-IO0CB2], Expected [aml:tests/data/acpi/virt/SSDT.memhp].
See source file tests/qtest/bios-tables-test.c for instructions on how
to update expected files.
to see ASL diff between mismatched files install IASL, rebuild QEMU
from scratch and re-run tests with V=1 environment variable set**
ERROR:../tests/qtest/bios-tables-test.c:535:test_acpi_asl: assertion
failed: (all_tables_match)
(test program exited with status code -6)

https://gitlab.com/qemu-project/qemu/-/jobs/5067995448

I have dropped this pull request for now. Please take a look.

Stefan



Re: [PATCH] vdpa: fix gcc cvq_isolated uninitialized variable warning

2023-09-12 Thread Stefan Hajnoczi
On Tue, 12 Sept 2023 at 02:19, Philippe Mathieu-Daudé  wrote:
>
> On 11/9/23 23:54, Stefan Hajnoczi wrote:
> > gcc 13.2.1 emits the following warning:
> >
> >net/vhost-vdpa.c: In function ‘net_vhost_vdpa_init.constprop’:
> >net/vhost-vdpa.c:1394:25: error: ‘cvq_isolated’ may be used 
> > uninitialized [-Werror=maybe-uninitialized]
> > 1394 | s->cvq_isolated = cvq_isolated;
> >  | ^~
> >net/vhost-vdpa.c:1355:9: note: ‘cvq_isolated’ was declared here
> > 1355 | int cvq_isolated;
> >  | ^~~~
> >cc1: all warnings being treated as errors
> >
> > Cc: Eugenio Pérez 
> > Cc: Michael S. Tsirkin 
> > Cc: Jason Wang 
> > Signed-off-by: Stefan Hajnoczi 
> > ---
> >   net/vhost-vdpa.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> > index 34202ca009..7eaee841aa 100644
> > --- a/net/vhost-vdpa.c
> > +++ b/net/vhost-vdpa.c
> > @@ -1352,7 +1352,7 @@ static NetClientState 
> > *net_vhost_vdpa_init(NetClientState *peer,
> >   VhostVDPAState *s;
> >   int ret = 0;
> >   assert(name);
> > -int cvq_isolated;
> > +int cvq_isolated = 0;
> >
> >   if (is_datapath) {
> >   nc = qemu_new_net_client(&net_vhost_vdpa_info, peer, device,
>
> Alternatively:
>
> -- >8 --
> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> index 34202ca009..218fe0c305 100644
> --- a/net/vhost-vdpa.c
> +++ b/net/vhost-vdpa.c
> @@ -1352,13 +1352,12 @@ static NetClientState
> *net_vhost_vdpa_init(NetClientState *peer,
>   VhostVDPAState *s;
>   int ret = 0;
>   assert(name);
> -int cvq_isolated;
>
>   if (is_datapath) {
>   nc = qemu_new_net_client(&net_vhost_vdpa_info, peer, device,
>name);
>   } else {
> -cvq_isolated = vhost_vdpa_probe_cvq_isolation(vdpa_device_fd,
> features,
> +int cvq_isolated =
> vhost_vdpa_probe_cvq_isolation(vdpa_device_fd, features,
> queue_pair_index
> * 2,
> errp);
>   if (unlikely(cvq_isolated < 0)) {
> @@ -1391,7 +1390,7 @@ static NetClientState
> *net_vhost_vdpa_init(NetClientState *peer,
>
>   s->vhost_vdpa.shadow_vq_ops = &vhost_vdpa_net_svq_ops;
>   s->vhost_vdpa.shadow_vq_ops_opaque = s;
> -s->cvq_isolated = cvq_isolated;
> +s->cvq_isolated = true;

This is incorrect because the return value of
vhost_vdpa_probe_cvq_isolation() is -errno for errors, 0 for no cvq
isolation, and 1 for cvq isolation. A variable is still needed to
distinguish between no cvq isolation and cvq isolation.
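
To make the three-way convention concrete, here is a toy model of the logic
(not the actual QEMU code, just the shape of it):

  #include <stdbool.h>
  #include <stdio.h>

  /* Stand-in for vhost_vdpa_probe_cvq_isolation(): <0 error, 0 no, 1 yes */
  static int probe_cvq_isolation(bool fail, bool isolated)
  {
      return fail ? -1 : (isolated ? 1 : 0);
  }

  int main(void)
  {
      int cvq_isolated = 0;     /* default when the probe never runs */
      int ret = probe_cvq_isolation(false, false);

      if (ret < 0) {            /* -errno in the real code */
          fprintf(stderr, "probe failed\n");
          return 1;
      }
      cvq_isolated = ret;       /* keep 0 vs 1, don't collapse to true */
      printf("cvq_isolated=%d\n", cvq_isolated);
      return 0;
  }

Hardcoding s->cvq_isolated = true would claim isolation even when the probe
returned 0.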

>
> ---
>
> Whichever you prefer:
>
> Reviewed-by: Philippe Mathieu-Daudé 
>
>



[PATCH] vdpa: fix gcc cvq_isolated uninitialized variable warning

2023-09-11 Thread Stefan Hajnoczi
gcc 13.2.1 emits the following warning:

  net/vhost-vdpa.c: In function ‘net_vhost_vdpa_init.constprop’:
  net/vhost-vdpa.c:1394:25: error: ‘cvq_isolated’ may be used uninitialized 
[-Werror=maybe-uninitialized]
   1394 | s->cvq_isolated = cvq_isolated;
| ^~
  net/vhost-vdpa.c:1355:9: note: ‘cvq_isolated’ was declared here
   1355 | int cvq_isolated;
| ^~~~
  cc1: all warnings being treated as errors

Cc: Eugenio Pérez 
Cc: Michael S. Tsirkin 
Cc: Jason Wang 
Signed-off-by: Stefan Hajnoczi 
---
 net/vhost-vdpa.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 34202ca009..7eaee841aa 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -1352,7 +1352,7 @@ static NetClientState *net_vhost_vdpa_init(NetClientState 
*peer,
 VhostVDPAState *s;
 int ret = 0;
 assert(name);
-int cvq_isolated;
+int cvq_isolated = 0;
 
 if (is_datapath) {
 nc = qemu_new_net_client(&net_vhost_vdpa_info, peer, device,
-- 
2.41.0




Re: [PULL v2 00/15] Block layer patches

2023-09-11 Thread Stefan Hajnoczi
Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/8.2 for any 
user-visible changes.


signature.asc
Description: PGP signature


Re: [PULL v2 00/45] riscv-to-apply queue

2023-09-11 Thread Stefan Hajnoczi
Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/8.2 for any 
user-visible changes.


signature.asc
Description: PGP signature


Re: [PULL 00/13] vfio queue

2023-09-11 Thread Stefan Hajnoczi
Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/8.2 for any 
user-visible changes.


signature.asc
Description: PGP signature


Re: [PULL 00/26] target-arm queue

2023-09-11 Thread Stefan Hajnoczi
Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/8.2 for any 
user-visible changes.


signature.asc
Description: PGP signature


Re: [PULL 00/65] riscv-to-apply queue

2023-09-11 Thread Stefan Hajnoczi
On Mon, 11 Sept 2023 at 09:38, Daniel Henrique Barboza
 wrote:
>
> Hi Stefan,
>
> On 9/8/23 08:06, Stefan Hajnoczi wrote:
> > Hi Alistair,
> > Please take a look at the following CI failure:
> >
> > https://gitlab.com/qemu-project/qemu/-/jobs/5045998521
> >
> > /usr/bin/ld: libqemu-riscv64-softmmu.fa.p/target_riscv_cpu.c.o: in
> > function `riscv_cpu_add_kvm_properties':
> > /home/gitlab-runner/builds/E8PpwMky/0/qemu-project/qemu/build/../target/riscv/cpu.c:2146:
> > undefined reference to `kvm_riscv_init_user_properties'
>
> I do not have the 'ubuntu-22.04-aarch64-alldbg' runner enabled when running 
> the gitlab CI.
> The CI on my end is all green:
>
> https://gitlab.com/danielhb/qemu/-/pipelines/999487372
>
> IIUC this runner is one of the custom runners from 
> .gitlab-ci.d/custom-runners that aren't
> run by default. I'm not sure if it's possible to trigger it manually on my end 
> or if it's
> triggered only when attempting a merge to master.
>
> I managed to reproduce the problem by reading the test log and copying the 
> build opts. I
> fixed it on my machine but, to be really sure that it's indeed fixed, it 
> would be nice
> to execute this particular runner somehow.

Hi Daniel,
I think you can reproduce this locally using "make
EXTRA_CONFIGURE_OPTS=--enable-debug vm-build-ubuntu.aarch64". It
builds QEMU in an Ubuntu 22.04 aarch64 VM. That seems to be equivalent
to the CI job.

Stefan

>
>
> Thanks,
>
> Daniel
>
> >
> > Stefan
> >
> > On Fri, 8 Sept 2023 at 03:10, Alistair Francis  wrote:
> >>
> >> The following changes since commit 
> >> 03a3a62fbd0aa5227e978eef3c67d3978aec9e5f:
> >>
> >>Merge tag 'for-upstream' of https://gitlab.com/bonzini/qemu into 
> >> staging (2023-09-07 10:29:06 -0400)
> >>
> >> are available in the Git repository at:
> >>
> >>https://github.com/alistair23/qemu.git tags/pull-riscv-to-apply-20230908
> >>
> >> for you to fetch changes up to 69749970db9f1b05c8cd77a7bbb45e4e156f7d33:
> >>
> >>target/riscv/cpu.c: consider user option with RVG (2023-09-08 15:57:39 
> >> +1000)
> >>
> >> 
> >> First RISC-V PR for 8.2
> >>
> >>   * Remove 'host' CPU from TCG
> >>   * riscv_htif Fixup printing on big endian hosts
> >>   * Add zmmul isa string
> >>   * Add smepmp isa string
> >>   * Fix page_check_range use in fault-only-first
> >>   * Use existing lookup tables for MixColumns
> >>   * Add RISC-V vector cryptographic instruction set support
> >>   * Implement WARL behaviour for mcountinhibit/mcounteren
> >>   * Add Zihintntl extension ISA string to DTS
> >>   * Fix zfa fleq.d and fltq.d
> >>   * Fix upper/lower mtime write calculation
> >>   * Make rtc variable names consistent
> >>   * Use abi type for linux-user target_ucontext
> >>   * Add RISC-V KVM AIA Support
> >>   * Fix riscv,pmu DT node path in the virt machine
> >>   * Update CSR bits name for svadu extension
> >>   * Mark zicond non-experimental
> >>   * Fix satp_mode_finalize() when satp_mode.supported = 0
> >>   * Fix non-KVM --enable-debug build
> >>   * Add new extensions to hwprobe
> >>   * Use accelerated helper for AES64KS1I
> >>   * Allocate itrigger timers only once
> >>   * Respect mseccfg.RLB for pmpaddrX changes
> >>   * Align the AIA model to v1.0 ratified spec
> >>   * Don't read the CSR in riscv_csrrw_do64
> >>   * Add the 'max' CPU, detect user choice in TCG
> >>
> >> 
> >> Akihiko Odaki (1):
> >>target/riscv: Allocate itrigger timers only once
> >>
> >> Ard Biesheuvel (2):
> >>target/riscv: Use existing lookup tables for MixColumns
> >>target/riscv: Use accelerated helper for AES64KS1I
> >>
> >> Conor Dooley (1):
> >>hw/riscv: virt: Fix riscv,pmu DT node path
> >>
> >> Daniel Henrique Barboza (26):
> >>target/riscv/cpu.c: do not run 'host' CPU with TCG
> >>target/riscv/cpu.c: add zmmul isa string
> >>target/riscv/cpu.c: add smepmp isa string
> >>target/riscv: fix satp_mode_finalize() when satp_mode.supported = 0
> >>hw/riscv/virt.c: fix non-KVM --enable-debug build
> >>  

Re: [PATCH] target/i386: Re-introduce few KVM stubs for Clang debug builds

2023-09-11 Thread Stefan Hajnoczi
Or instead of using linker behavior, maybe just change the #ifdef so it
only applies when KVM is disabled. I didn't look at the code to see if this
is possible, but it would be nice to avoid the very specific #ifdef
condition in this patch.
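
Roughly what I have in mind, purely as an untested sketch (the exact guard
and whether all the types are visible in a linux-user-only build are exactly
the details I haven't checked):

  #include <stdbool.h>
  #include <stdint.h>

  typedef struct KVMState KVMState;   /* already defined elsewhere in QEMU */

  #ifdef CONFIG_KVM
  uint32_t kvm_arch_get_supported_cpuid(KVMState *s, uint32_t function,
                                        uint32_t index, int reg);
  bool kvm_hv_vpindex_settable(void);
  #else
  /* Inline stubs keep the dead calls that -O0 leaves behind linkable. */
  static inline uint32_t kvm_arch_get_supported_cpuid(KVMState *s,
                                                      uint32_t function,
                                                      uint32_t index, int reg)
  {
      return 0;
  }

  static inline bool kvm_hv_vpindex_settable(void)
  {
      return false;
  }
  #endif

That keys the stubs on the feature being disabled rather than on a specific
compiler at a specific optimization level.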

Stefan

On Mon, Sep 11, 2023, 07:15 Stefan Hajnoczi  wrote:

> On Mon, 11 Sept 2023 at 06:39, Philippe Mathieu-Daudé 
> wrote:
> >
> > Since commits 3adce820cf..ef1cf6890f, when building on
> > an x86 host configured as:
> >
> >   $ ./configure --cc=clang \
> > --target-list=x86_64-linux-user,x86_64-softmmu \
> > --enable-debug
> >
> > we get:
> >
> >   [71/71] Linking target qemu-x86_64
> >   FAILED: qemu-x86_64
> >   /usr/bin/ld: libqemu-x86_64-linux-user.fa.p/target_i386_cpu.c.o: in
> function `cpu_x86_cpuid':
> >   cpu.c:(.text+0x1374): undefined reference to
> `kvm_arch_get_supported_cpuid'
> >   /usr/bin/ld: libqemu-x86_64-linux-user.fa.p/target_i386_cpu.c.o: in
> function `x86_cpu_filter_features':
> >   cpu.c:(.text+0x81c2): undefined reference to
> `kvm_arch_get_supported_cpuid'
> >   /usr/bin/ld: cpu.c:(.text+0x81da): undefined reference to
> `kvm_arch_get_supported_cpuid'
> >   /usr/bin/ld: cpu.c:(.text+0x81f2): undefined reference to
> `kvm_arch_get_supported_cpuid'
> >   /usr/bin/ld: cpu.c:(.text+0x820a): undefined reference to
> `kvm_arch_get_supported_cpuid'
> >   /usr/bin/ld:
> libqemu-x86_64-linux-user.fa.p/target_i386_cpu.c.o:cpu.c:(.text+0x8225):
> more undefined references to `kvm_arch_get_supported_cpuid' follow
> >   clang: error: linker command failed with exit code 1 (use -v to see
> invocation)
> >   ninja: build stopped: subcommand failed.
> >
> > '--enable-debug' disables optimizations (CFLAGS=-O0).
> >
> > While at this (un)optimization level GCC eliminates the
> > following dead code:
> >
> >   if (0 && foo()) {
> >   ...
> >   }
> >
> > Clang does not. Therefore restore a pair of stubs for
> > unoptimized Clang builds.
> >
> > Reported-by: Kevin Wolf 
> > Fixes: 3adce820cf ("target/i386: Remove unused KVM stubs")
> > Fixes: ef1cf6890f ("target/i386: Allow elision of
> kvm_hv_vpindex_settable()")
> > Signed-off-by: Philippe Mathieu-Daudé 
> > ---
> >  target/i386/kvm/kvm_i386.h | 21 ++---
> >  1 file changed, 18 insertions(+), 3 deletions(-)
> >
> > diff --git a/target/i386/kvm/kvm_i386.h b/target/i386/kvm/kvm_i386.h
> > index 55d4e68c34..0b62ac628f 100644
> > --- a/target/i386/kvm/kvm_i386.h
> > +++ b/target/i386/kvm/kvm_i386.h
> > @@ -32,7 +32,6 @@
> >
> >  bool kvm_has_smm(void);
> >  bool kvm_enable_x2apic(void);
> > -bool kvm_hv_vpindex_settable(void);
> >  bool kvm_has_pit_state2(void);
> >
> >  bool kvm_enable_sgx_provisioning(KVMState *s);
> > @@ -41,8 +40,6 @@ bool kvm_hyperv_expand_features(X86CPU *cpu, Error
> **errp);
> >  void kvm_arch_reset_vcpu(X86CPU *cs);
> >  void kvm_arch_after_reset_vcpu(X86CPU *cpu);
> >  void kvm_arch_do_init_vcpu(X86CPU *cs);
> > -uint32_t kvm_arch_get_supported_cpuid(KVMState *env, uint32_t function,
> > -  uint32_t index, int reg);
> >  uint64_t kvm_arch_get_supported_msr_feature(KVMState *s, uint32_t
> index);
> >
> >  void kvm_set_max_apic_id(uint32_t max_apic_id);
> > @@ -60,6 +57,10 @@ void kvm_put_apicbase(X86CPU *cpu, uint64_t value);
> >
> >  bool kvm_has_x2apic_api(void);
> >  bool kvm_has_waitpkg(void);
> > +bool kvm_hv_vpindex_settable(void);
> > +
> > +uint32_t kvm_arch_get_supported_cpuid(KVMState *env, uint32_t function,
> > +  uint32_t index, int reg);
> >
> >  uint64_t kvm_swizzle_msi_ext_dest_id(uint64_t address);
> >  void kvm_update_msi_routes_all(void *private, bool global,
> > @@ -76,6 +77,20 @@ typedef struct kvm_msr_handlers {
> >  bool kvm_filter_msr(KVMState *s, uint32_t msr, QEMURDMSRHandler *rdmsr,
> >  QEMUWRMSRHandler *wrmsr);
> >
> > +#elif defined(__clang__) && !defined(__OPTIMIZE__)
>
> Another approach is a static library with a .o file containing the
> stubs so the linker only includes it in the executable if the compiler
> emitted the symbols. That way there is no need for defined(__clang__)
> && !defined(__OPTIMIZE__) and it will work with other
> compilers/optimization levels. It's more work to set up though.
>
> Reviewed-by: Stefan Hajnoczi 
>


Re: [PATCH] target/i386: Re-introduce few KVM stubs for Clang debug builds

2023-09-11 Thread Stefan Hajnoczi
On Mon, 11 Sept 2023 at 06:39, Philippe Mathieu-Daudé  wrote:
>
> Since commits 3adce820cf..ef1cf6890f, when building on
> an x86 host configured as:
>
>   $ ./configure --cc=clang \
> --target-list=x86_64-linux-user,x86_64-softmmu \
> --enable-debug
>
> we get:
>
>   [71/71] Linking target qemu-x86_64
>   FAILED: qemu-x86_64
>   /usr/bin/ld: libqemu-x86_64-linux-user.fa.p/target_i386_cpu.c.o: in 
> function `cpu_x86_cpuid':
>   cpu.c:(.text+0x1374): undefined reference to `kvm_arch_get_supported_cpuid'
>   /usr/bin/ld: libqemu-x86_64-linux-user.fa.p/target_i386_cpu.c.o: in 
> function `x86_cpu_filter_features':
>   cpu.c:(.text+0x81c2): undefined reference to `kvm_arch_get_supported_cpuid'
>   /usr/bin/ld: cpu.c:(.text+0x81da): undefined reference to 
> `kvm_arch_get_supported_cpuid'
>   /usr/bin/ld: cpu.c:(.text+0x81f2): undefined reference to 
> `kvm_arch_get_supported_cpuid'
>   /usr/bin/ld: cpu.c:(.text+0x820a): undefined reference to 
> `kvm_arch_get_supported_cpuid'
>   /usr/bin/ld: 
> libqemu-x86_64-linux-user.fa.p/target_i386_cpu.c.o:cpu.c:(.text+0x8225): more 
> undefined references to `kvm_arch_get_supported_cpuid' follow
>   clang: error: linker command failed with exit code 1 (use -v to see 
> invocation)
>   ninja: build stopped: subcommand failed.
>
> '--enable-debug' disables optimizations (CFLAGS=-O0).
>
> While at this (un)optimization level GCC eliminates the
> following dead code:
>
>   if (0 && foo()) {
>   ...
>   }
>
> Clang does not. Therefore restore a pair of stubs for
> unoptimized Clang builds.
>
> Reported-by: Kevin Wolf 
> Fixes: 3adce820cf ("target/i386: Remove unused KVM stubs")
> Fixes: ef1cf6890f ("target/i386: Allow elision of kvm_hv_vpindex_settable()")
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  target/i386/kvm/kvm_i386.h | 21 ++---
>  1 file changed, 18 insertions(+), 3 deletions(-)
>
> diff --git a/target/i386/kvm/kvm_i386.h b/target/i386/kvm/kvm_i386.h
> index 55d4e68c34..0b62ac628f 100644
> --- a/target/i386/kvm/kvm_i386.h
> +++ b/target/i386/kvm/kvm_i386.h
> @@ -32,7 +32,6 @@
>
>  bool kvm_has_smm(void);
>  bool kvm_enable_x2apic(void);
> -bool kvm_hv_vpindex_settable(void);
>  bool kvm_has_pit_state2(void);
>
>  bool kvm_enable_sgx_provisioning(KVMState *s);
> @@ -41,8 +40,6 @@ bool kvm_hyperv_expand_features(X86CPU *cpu, Error **errp);
>  void kvm_arch_reset_vcpu(X86CPU *cs);
>  void kvm_arch_after_reset_vcpu(X86CPU *cpu);
>  void kvm_arch_do_init_vcpu(X86CPU *cs);
> -uint32_t kvm_arch_get_supported_cpuid(KVMState *env, uint32_t function,
> -  uint32_t index, int reg);
>  uint64_t kvm_arch_get_supported_msr_feature(KVMState *s, uint32_t index);
>
>  void kvm_set_max_apic_id(uint32_t max_apic_id);
> @@ -60,6 +57,10 @@ void kvm_put_apicbase(X86CPU *cpu, uint64_t value);
>
>  bool kvm_has_x2apic_api(void);
>  bool kvm_has_waitpkg(void);
> +bool kvm_hv_vpindex_settable(void);
> +
> +uint32_t kvm_arch_get_supported_cpuid(KVMState *env, uint32_t function,
> +  uint32_t index, int reg);
>
>  uint64_t kvm_swizzle_msi_ext_dest_id(uint64_t address);
>  void kvm_update_msi_routes_all(void *private, bool global,
> @@ -76,6 +77,20 @@ typedef struct kvm_msr_handlers {
>  bool kvm_filter_msr(KVMState *s, uint32_t msr, QEMURDMSRHandler *rdmsr,
>  QEMUWRMSRHandler *wrmsr);
>
> +#elif defined(__clang__) && !defined(__OPTIMIZE__)

Another approach is a static library with a .o file containing the
stubs so the linker only includes it in the executable if the compiler
emitted the symbols. That way there is no need for defined(__clang__)
&& !defined(__OPTIMIZE__) and it will work with other
compilers/optimization levels. It's more work to set up though.
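
For completeness, the stub itself would be trivial and could follow the
usual stubs pattern (file name and include paths below are illustrative,
not a tested patch):

  /* hypothetical stub file, e.g. target/i386/kvm/kvm-stub.c */
  #include "qemu/osdep.h"
  #include "kvm_i386.h"

  uint32_t kvm_arch_get_supported_cpuid(KVMState *s, uint32_t function,
                                        uint32_t index, int reg)
  {
      /* only reachable from the if (0 && ...) branches that -O0 keeps */
      g_assert_not_reached();
  }

  bool kvm_hv_vpindex_settable(void)
  {
      return false;
  }

Because it sits in a static library, the object is only pulled in when the
compiler actually emits a reference to one of these symbols.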

Reviewed-by: Stefan Hajnoczi 



Re: [PULL 00/51] Build system, i386 changes for 2023-09-07

2023-09-11 Thread Stefan Hajnoczi
On Mon, 11 Sept 2023 at 06:10, Philippe Mathieu-Daudé  wrote:
>
> On 8/9/23 17:47, Stefan Hajnoczi wrote:
> > I wonder how it passed CI?
> > https://gitlab.com/qemu-project/qemu/-/pipelines/996175923/
>
> The conditions are:
> - x86 host
> - both system / user emulation enabled
> - KVM disabled
> - debug enabled
>
> We have jobs with 3 of the 4, but none with
> all the 4.
>
> Is it worth testing it?

I think so.

Kevin: Can you confirm your configuration matches what Philippe described?

Stefan

>
> >
> > Stefan
> >
> > On Fri, 8 Sept 2023 at 11:02, Kevin Wolf  wrote:
> >>
> >> Am 07.09.2023 um 17:44 hat Stefan Hajnoczi geschrieben:
> >>> Applied, thanks.
> >>>
> >>> Please update the changelog at https://wiki.qemu.org/ChangeLog/8.2 for 
> >>> any user-visible changes.
> >>
> >> Something in this has broken the build for me, it seems to be the
> >> linux-user binary that doesn't link any more:
> >>
> >>/usr/bin/ld: libqemu-x86_64-linux-user.fa.p/target_i386_cpu.c.o: in 
> >> function `cpu_x86_cpuid':
> >>/home/kwolf/source/qemu/build-clang/../target/i386/cpu.c:6180: 
> >> undefined reference to `kvm_arch_get_supported_cpuid'
> >>/usr/bin/ld: libqemu-x86_64-linux-user.fa.p/target_i386_cpu.c.o: in 
> >> function `x86_cpu_filter_features':
> >>/home/kwolf/source/qemu/build-clang/../target/i386/cpu.c:7158: 
> >> undefined reference to `kvm_arch_get_supported_cpuid'
> >>/usr/bin/ld: 
> >> /home/kwolf/source/qemu/build-clang/../target/i386/cpu.c:7159: undefined 
> >> reference to `kvm_arch_get_supported_cpuid'
> >>/usr/bin/ld: 
> >> /home/kwolf/source/qemu/build-clang/../target/i386/cpu.c:7160: undefined 
> >> reference to `kvm_arch_get_supported_cpuid'
> >>/usr/bin/ld: 
> >> /home/kwolf/source/qemu/build-clang/../target/i386/cpu.c:7161: undefined 
> >> reference to `kvm_arch_get_supported_cpuid'
> >>/usr/bin/ld: 
> >> libqemu-x86_64-linux-user.fa.p/target_i386_cpu.c.o:/home/kwolf/source/qemu/build-clang/../target/i386/cpu.c:7162:
> >>  more undefined references to `kvm_arch_get_supported_cpuid' follow
> >>clang-15: error: linker command failed with exit code 1 (use -v to see 
> >> invocation)
> >>
> >> In case it makes a difference, I'm using clang on F37.
> >>
> >> Kevin
> >
>



Re: [PULL 0/5] Block patches

2023-09-10 Thread Stefan Hajnoczi
Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/8.2 for any 
user-visible changes.


signature.asc
Description: PGP signature


Re: [PULL v2 00/13] NBD patches for 2023-09-07

2023-09-10 Thread Stefan Hajnoczi
Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/8.2 for any 
user-visible changes.


signature.asc
Description: PGP signature


Re: [PULL v2 00/22] Trivial patches for 2023-09-08

2023-09-10 Thread Stefan Hajnoczi
Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/8.2 for any 
user-visible changes.


signature.asc
Description: PGP signature


Re: [PULL 00/51] Build system, i386 changes for 2023-09-07

2023-09-08 Thread Stefan Hajnoczi
I wonder how it passed CI?
https://gitlab.com/qemu-project/qemu/-/pipelines/996175923/

Stefan

On Fri, 8 Sept 2023 at 11:02, Kevin Wolf  wrote:
>
> Am 07.09.2023 um 17:44 hat Stefan Hajnoczi geschrieben:
> > Applied, thanks.
> >
> > Please update the changelog at https://wiki.qemu.org/ChangeLog/8.2 for any 
> > user-visible changes.
>
> Something in this has broken the build for me, it seems to be the
> linux-user binary that doesn't link any more:
>
>   /usr/bin/ld: libqemu-x86_64-linux-user.fa.p/target_i386_cpu.c.o: in 
> function `cpu_x86_cpuid':
>   /home/kwolf/source/qemu/build-clang/../target/i386/cpu.c:6180: undefined 
> reference to `kvm_arch_get_supported_cpuid'
>   /usr/bin/ld: libqemu-x86_64-linux-user.fa.p/target_i386_cpu.c.o: in 
> function `x86_cpu_filter_features':
>   /home/kwolf/source/qemu/build-clang/../target/i386/cpu.c:7158: undefined 
> reference to `kvm_arch_get_supported_cpuid'
>   /usr/bin/ld: /home/kwolf/source/qemu/build-clang/../target/i386/cpu.c:7159: 
> undefined reference to `kvm_arch_get_supported_cpuid'
>   /usr/bin/ld: /home/kwolf/source/qemu/build-clang/../target/i386/cpu.c:7160: 
> undefined reference to `kvm_arch_get_supported_cpuid'
>   /usr/bin/ld: /home/kwolf/source/qemu/build-clang/../target/i386/cpu.c:7161: 
> undefined reference to `kvm_arch_get_supported_cpuid'
>   /usr/bin/ld: 
> libqemu-x86_64-linux-user.fa.p/target_i386_cpu.c.o:/home/kwolf/source/qemu/build-clang/../target/i386/cpu.c:7162:
>  more undefined references to `kvm_arch_get_supported_cpuid' follow
>   clang-15: error: linker command failed with exit code 1 (use -v to see 
> invocation)
>
> In case it makes a difference, I'm using clang on F37.
>
> Kevin



Re: [RFC PATCH v2] docs/interop: define PROBE feature for vhost-user VirtIO devices

2023-09-08 Thread Stefan Hajnoczi
On Fri, Sep 08, 2023 at 01:03:26PM +0100, Alex Bennée wrote:
> 
> Stefan Hajnoczi  writes:
> 
> > On Fri, Sep 01, 2023 at 12:00:18PM +0100, Alex Bennée wrote:
> >> Currently QEMU has to know some details about the VirtIO device
> >> supported by a vhost-user daemon to be able to setup the guest. This
> >> makes it hard for QEMU to add support for additional vhost-user
> >> daemons without adding specific stubs for each additional VirtIO
> >> device.
> >> 
> >> This patch suggests a new feature flag (VHOST_USER_PROTOCOL_F_PROBE)
> >> which the back-end can advertise which allows a probe message to be
> >> sent to get all the details QEMU needs to know in one message.
> >> 
> >> Together with the existing features VHOST_USER_PROTOCOL_F_STATUS and
> >> VHOST_USER_PROTOCOL_F_CONFIG we can create "standalone" vhost-user
> >> daemons which are capable of handling all aspects of the VirtIO
> >> transactions with only a generic stub on the QEMU side. These daemons
> >> can also be used without QEMU in situations where there isn't a full
> >> VMM managing their setup.
> >> 
> >> Signed-off-by: Alex Bennée 
> >
> > I think the mindset for this change should be "vhost-user is becoming a
> > VIRTIO Transport". VIRTIO Transports have a reasonably well-defined
> > feature set in the VIRTIO specification. The goal should be to cover
> > every VIRTIO Transport operation via vhost-user protocol messages so
> > that the VIRTIO device model can be fully conveyed over vhost-user.
> 
> Is it though? The transport is a guest visible construct whereas
> vhost-user is purely a backend implementation detail that should be
> invisible to the guest.

No, the transport is not necessarily guest-visible. The vhost-user model
is that the front-end emulates a VIRTIO device and some aspects of that
device are delegated to the vhost-user back-end.

In other words, the vhost-user device is not the same as the VIRTIO
device that the guest sees, but it's still important for the vhost-user
back-end to be a VIRTIO Transport because that's how we can be sure it
supports the VIRTIO device model properly.

> 
> Also the various backends do things in different ways. The
> differences between MMIO and PCI are mostly around where config space is
> and how IRQs are handled. For CCW we do actually have a set of commands
> we can look at:
> 
>   #define CCW_CMD_SET_VQ 0x13 
>   #define CCW_CMD_VDEV_RESET 0x33 
>   #define CCW_CMD_SET_IND 0x43 
>   #define CCW_CMD_SET_CONF_IND 0x53 
>   #define CCW_CMD_SET_IND_ADAPTER 0x73 
>   #define CCW_CMD_READ_FEAT 0x12 
>   #define CCW_CMD_WRITE_FEAT 0x11 
>   #define CCW_CMD_READ_CONF 0x22 
>   #define CCW_CMD_WRITE_CONF 0x21 
>   #define CCW_CMD_WRITE_STATUS 0x31 
>   #define CCW_CMD_READ_VQ_CONF 0x32 
>   #define CCW_CMD_SET_VIRTIO_REV 0x83 
>   #define CCW_CMD_READ_STATUS 0x72
> 
> which I think we already have mappings for.

Yes, there are differences between the transports. vhost-user uses
eventfds (callfd/kickfd) instead of interrupts.
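
As a small illustration of that difference: from the back-end's point of
view both directions are plain file descriptor operations (minimal sketch,
not the libvhost-user code):

  #include <stdint.h>
  #include <unistd.h>

  /* A guest "kick" shows up as a counter increment on kickfd. */
  void drain_kick(int kickfd)
  {
      uint64_t count;

      if (read(kickfd, &count, sizeof(count)) != (ssize_t)sizeof(count)) {
          /* non-blocking fd with nothing pending, or a spurious wakeup */
      }
  }

  /* The back-end "interrupts" the guest by bumping callfd. */
  void notify_guest(int callfd)
  {
      uint64_t one = 1;

      if (write(callfd, &one, sizeof(one)) != (ssize_t)sizeof(one)) {
          /* best effort in this sketch */
      }
  }

It is the front-end (or the VMM's transport code) that turns the callfd
event into whatever interrupt mechanism the guest-visible transport uses.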

> > Anything less is yet another ad-hoc protocol extension that will lead to
> > more bugs and hacks when it turns out some VIRTIO devices cannot be
> > expressed due to limitations in the protocol.
> 
> I agree we want to do this right.
> 
> > This requires going through the VIRTIO spec to find a correspondence
> > between virtio-pci/virtio-mmio/virtio-ccw's interfaces and vhost-user
> > protocol messages. In most cases vhost-user already offers messages and
> > your patch adds more of what is missing. I think this effort is already
> > very close but missing the final check that it really matches the VIRTIO
> > spec.
> >
> > Please do the comparison against the VIRTIO Transports and then adjust
> > this patch to make it clear that the back-end is becoming a full-fledged
> > VIRTIO Transport:
> > - The name of the patch series should reflect that.
> > - The vhost-user protocol feature should be named F_TRANSPORT.
> > - The messages added in this patch should have a 1:1 correspondence with
> >   the VIRTIO spec including using the same terminology for consistency.
> >
> > Sorry for the hassle, but I think this is a really crucial point where
> > we have the chance to make vhost-user work smoothly in the future...but
> > only if we can faithfully expose VIRTIO Transport semantics.
> 
> I wonder if this should first be handled by cleaning up the VirtIO spec to
> make it clear what capabilities each transport needs to support?

It's a fair point that the VIRTIO spec does not provide an interface
definition for th

Re: [virtio-dev] [RFC PATCH v2] docs/interop: define PROBE feature for vhost-user VirtIO devices

2023-09-08 Thread Stefan Hajnoczi
A QEMU built-in VIRTIO device will also call virtio_add_queue() for
the maximum number of virtqueues.

I'm not sure what the concern is about adding as few virtqueues as possible?

If the front-end's implementation is inefficient, then it should be
optimized so that untouched virtqueues don't consume resources. I
don't see the need to try to add a special message to vhost-user to
try to reduce the number of virtqueues.
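
For comparison, the built-in pattern is simply this (rough sketch, not any
specific device's realize function; the includes and the queue size of 256
are illustrative):

  #include "qemu/osdep.h"
  #include "hw/virtio/virtio.h"

  /* Allocate every queue up front; queues the driver never touches stay
   * cheap because the guest never programs their vrings. */
  void my_device_add_queues(VirtIODevice *vdev, int max_queues,
                            VirtIOHandleOutput handler)
  {
      for (int i = 0; i < max_queues; i++) {
          virtio_add_queue(vdev, 256, handler);
      }
  }

So registering the maximum number of virtqueues up front is already how the
built-in devices behave.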

Stefan

On Fri, 8 Sept 2023 at 08:03, Alex Bennée  wrote:
>
>
> Stefan Hajnoczi  writes:
>
> > On Fri, 8 Sept 2023 at 02:43, Alex Bennée  wrote:
> >>
> >>
> >> Stefan Hajnoczi  writes:
> >>
> >> > On Tue, Sep 05, 2023 at 10:34:11AM +0100, Alex Bennée wrote:
> >> >>
> >> >> Albert Esteve  writes:
> >> >>
> >> >> > This looks great! Thanks for this proposal.
> >> >> >
> >> >> > On Fri, Sep 1, 2023 at 1:00 PM Alex Bennée  
> >> >> > wrote:
> >> >> >
> >> >> >  Currently QEMU has to know some details about the VirtIO device
> >> >> >  supported by a vhost-user daemon to be able to setup the guest. This
> >> >> >  makes it hard for QEMU to add support for additional vhost-user
> >> >> >  daemons without adding specific stubs for each additional VirtIO
> >> >> >  device.
> >> >> >
> >> >> >  This patch suggests a new feature flag (VHOST_USER_PROTOCOL_F_PROBE)
> >> >> >  which the back-end can advertise which allows a probe message to be
> >> >> >  sent to get all the details QEMU needs to know in one message.
> >> >> >
> >> >> >  Together with the existing features VHOST_USER_PROTOCOL_F_STATUS and
> >> >> >  VHOST_USER_PROTOCOL_F_CONFIG we can create "standalone" vhost-user
> >> >> >  daemons which are capable of handling all aspects of the VirtIO
> >> >> >  transactions with only a generic stub on the QEMU side. These daemons
> >> >> >  can also be used without QEMU in situations where there isn't a full
> >> >> >  VMM managing their setup.
> >> >> >
> >> >> >  Signed-off-by: Alex Bennée 
> >> >> >
> >> >> >  ---
> >> >> >  v2
> >> >> >- dropped F_STANDALONE in favour of F_PROBE
> >> >> >- split probe details across several messages
> >> >> >- probe messages don't automatically imply a standalone daemon
> >> >> >- add wording where probe details interact (F_MQ/F_CONFIG)
> >> >> >- define VMM and make clear QEMU is only one of many potential VMMs
> >> >> >- reword commit message
> >> >> >  ---
> >> >> >   docs/interop/vhost-user.rst | 90 
> >> >> > -
> >> >> >   hw/virtio/vhost-user.c  |  8 
> >> >> >   2 files changed, 88 insertions(+), 10 deletions(-)
> >> >> >
> >> >> >  diff --git a/docs/interop/vhost-user.rst 
> >> >> > b/docs/interop/vhost-user.rst
> >> >> >  index 5a070adbc1..ba3b5e07b7 100644
> >> >> >  --- a/docs/interop/vhost-user.rst
> >> >> >  +++ b/docs/interop/vhost-user.rst
> >> >> >  @@ -7,6 +7,7 @@ Vhost-user Protocol
> >> >> >   ..
> >> >> > Copyright 2014 Virtual Open Systems Sarl.
> >> >> > Copyright 2019 Intel Corporation
> >> >> >  +  Copyright 2023 Linaro Ltd
> >> >> > Licence: This work is licensed under the terms of the GNU GPL,
> >> >> >  version 2 or later. See the COPYING file in the top-level
> >> >> >  directory.
> >> >> >  @@ -27,17 +28,31 @@ The protocol defines 2 sides of the 
> >> >> > communication, *front-end* and
> >> >> >   *back-end*. The *front-end* is the application that shares its 
> >> >> > virtqueues, in
> >> >> >   our case QEMU. The *back-end* is the consumer of the virtqueues.
> >> >> >
> >> >> >  -In the current implementation QEMU is the *front-end*, and the 
> >> >> > *back-end*
> >> >> >  -is the external process consuming the virtio queues, for example a
> >> >> >  -software Ethernet switch running in user

Re: [PULL 00/17] Net patches

2023-09-08 Thread Stefan Hajnoczi
Hi Ilya and Jason,
There is a CI failure related to a missing Debian libxdp-dev package:
https://gitlab.com/qemu-project/qemu/-/jobs/5046139967

I think the issue is that the debian-amd64 container image that QEMU
uses for testing is based on Debian 11 ("bullseye" aka "oldstable")
and libxdp is not available on that release:
https://packages.debian.org/search?keywords=libxdp&searchon=names&suite=oldstable§ion=all

If we need to support Debian 11 CI then either XDP could be disabled
for that distro or libxdp could be compiled from source.

I have CCed Daniel Berrangé, because I think he knows how lcitool and
QEMU's minimum distro requirements work. Maybe he can suggest a way
forward.

Thanks,
Stefan



Re: [PULL 00/65] riscv-to-apply queue

2023-09-08 Thread Stefan Hajnoczi
Hi Alistair,
Please take a look at the following CI failure:

https://gitlab.com/qemu-project/qemu/-/jobs/5045998521

/usr/bin/ld: libqemu-riscv64-softmmu.fa.p/target_riscv_cpu.c.o: in
function `riscv_cpu_add_kvm_properties':
/home/gitlab-runner/builds/E8PpwMky/0/qemu-project/qemu/build/../target/riscv/cpu.c:2146:
undefined reference to `kvm_riscv_init_user_properties'

Stefan

On Fri, 8 Sept 2023 at 03:10, Alistair Francis  wrote:
>
> The following changes since commit 03a3a62fbd0aa5227e978eef3c67d3978aec9e5f:
>
>   Merge tag 'for-upstream' of https://gitlab.com/bonzini/qemu into staging 
> (2023-09-07 10:29:06 -0400)
>
> are available in the Git repository at:
>
>   https://github.com/alistair23/qemu.git tags/pull-riscv-to-apply-20230908
>
> for you to fetch changes up to 69749970db9f1b05c8cd77a7bbb45e4e156f7d33:
>
>   target/riscv/cpu.c: consider user option with RVG (2023-09-08 15:57:39 
> +1000)
>
> 
> First RISC-V PR for 8.2
>
>  * Remove 'host' CPU from TCG
>  * riscv_htif Fixup printing on big endian hosts
>  * Add zmmul isa string
>  * Add smepmp isa string
>  * Fix page_check_range use in fault-only-first
>  * Use existing lookup tables for MixColumns
>  * Add RISC-V vector cryptographic instruction set support
>  * Implement WARL behaviour for mcountinhibit/mcounteren
>  * Add Zihintntl extension ISA string to DTS
>  * Fix zfa fleq.d and fltq.d
>  * Fix upper/lower mtime write calculation
>  * Make rtc variable names consistent
>  * Use abi type for linux-user target_ucontext
>  * Add RISC-V KVM AIA Support
>  * Fix riscv,pmu DT node path in the virt machine
>  * Update CSR bits name for svadu extension
>  * Mark zicond non-experimental
>  * Fix satp_mode_finalize() when satp_mode.supported = 0
>  * Fix non-KVM --enable-debug build
>  * Add new extensions to hwprobe
>  * Use accelerated helper for AES64KS1I
>  * Allocate itrigger timers only once
>  * Respect mseccfg.RLB for pmpaddrX changes
>  * Align the AIA model to v1.0 ratified spec
>  * Don't read the CSR in riscv_csrrw_do64
>  * Add the 'max' CPU, detect user choice in TCG
>
> 
> Akihiko Odaki (1):
>   target/riscv: Allocate itrigger timers only once
>
> Ard Biesheuvel (2):
>   target/riscv: Use existing lookup tables for MixColumns
>   target/riscv: Use accelerated helper for AES64KS1I
>
> Conor Dooley (1):
>   hw/riscv: virt: Fix riscv,pmu DT node path
>
> Daniel Henrique Barboza (26):
>   target/riscv/cpu.c: do not run 'host' CPU with TCG
>   target/riscv/cpu.c: add zmmul isa string
>   target/riscv/cpu.c: add smepmp isa string
>   target/riscv: fix satp_mode_finalize() when satp_mode.supported = 0
>   hw/riscv/virt.c: fix non-KVM --enable-debug build
>   hw/intc/riscv_aplic.c fix non-KVM --enable-debug build
>   target/riscv/cpu.c: split CPU options from riscv_cpu_extensions[]
>   target/riscv/cpu.c: skip 'bool' check when filtering KVM props
>   target/riscv/cpu.c: split kvm prop handling to its own helper
>   target/riscv: add DEFINE_PROP_END_OF_LIST() to riscv_cpu_options[]
>   target/riscv/cpu.c: split non-ratified exts from riscv_cpu_extensions[]
>   target/riscv/cpu.c: split vendor exts from riscv_cpu_extensions[]
>   target/riscv/cpu.c: add riscv_cpu_add_qdev_prop_array()
>   target/riscv/cpu.c: add riscv_cpu_add_kvm_unavail_prop_array()
>   target/riscv/cpu.c: limit cfg->vext_spec log message
>   target/riscv: add 'max' CPU type
>   avocado, risc-v: add tuxboot tests for 'max' CPU
>   target/riscv: deprecate the 'any' CPU type
>   target/riscv/cpu.c: use offset in isa_ext_is_enabled/update_enabled
>   target/riscv: make CPUCFG() macro public
>   target/riscv/cpu.c: introduce cpu_cfg_ext_auto_update()
>   target/riscv/cpu.c: use cpu_cfg_ext_auto_update() during realize()
>   target/riscv/cpu.c: introduce RISCVCPUMultiExtConfig
>   target/riscv: use isa_ext_update_enabled() in init_max_cpu_extensions()
>   target/riscv/cpu.c: honor user choice in cpu_cfg_ext_auto_update()
>   target/riscv/cpu.c: consider user option with RVG
>
> Dickon Hood (2):
>   target/riscv: Refactor translation of vector-widening instruction
>   target/riscv: Add Zvbb ISA extension support
>
> Jason Chien (3):
>   target/riscv: Add Zihintntl extension ISA string to DTS
>   hw/intc: Fix upper/lower mtime write calculation
>   hw/intc: Make rtc variable names consistent
>
> Kiran Ostrolenk (4):
>   target/riscv: Refactor some of the generic vector functionality
>   target/riscv: Refactor vector-vector translation macro
>   target/riscv: Refactor some of the generic vector functionality
>   target/riscv: Add Zvknh ISA extension support
>
> LIU Zhiwei (3):
>   target/riscv: Fix page_check_range use in fault-only-first
>   target/riscv: Fix zfa fleq.d and fltq.d
>   linu

Re: [PULL 12/13] qemu-nbd: Restore "qemu-nbd -v --fork" output

2023-09-08 Thread Stefan Hajnoczi
Please resolve the following CI failure:

https://gitlab.com/qemu-project/qemu/-/jobs/5045998355

ninja: job failed: cc -m64 -mcx16 -Iqemu-nbd.p -I. -I.. -Iqapi -Itrace
-Iui -Iui/shader -I/usr/include/p11-kit-1 -I/usr/include/glib-2.0
-I/usr/lib/glib-2.0/include -fdiagnostics-color=auto -Wall
-Winvalid-pch -Werror -std=gnu11 -O2 -g -fstack-protector-strong
-U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -Wundef -Wwrite-strings
-Wmissing-prototypes -Wstrict-prototypes -Wredundant-decls
-Wold-style-declaration -Wold-style-definition -Wtype-limits
-Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers
-Wempty-body -Wnested-externs -Wendif-labels -Wexpansion-to-defined
-Wimplicit-fallthrough=2 -Wmissing-format-attribute
-Wno-missing-include-dirs -Wno-shift-negative-value -Wno-psabi
-isystem /builds/qemu-project/qemu/linux-headers -isystem
linux-headers -iquote . -iquote /builds/qemu-project/qemu -iquote
/builds/qemu-project/qemu/include -iquote
/builds/qemu-project/qemu/host/include/x86_64 -iquote
/builds/qemu-project/qemu/host/include/generic -iquote
/builds/qemu-project/qemu/tcg/i386 -pthread -D_GNU_SOURCE
-D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -fno-strict-aliasing
-fno-common -fwrapv -fPIE -MD -MQ qemu-nbd.p/qemu-nbd.c.o -MF
qemu-nbd.p/qemu-nbd.c.o.d -o qemu-nbd.p/qemu-nbd.c.o -c ../qemu-nbd.c
In file included from /usr/include/fortify/stdio.h:23,
from ../include/qemu/osdep.h:110,
from ../qemu-nbd.c:19:
../qemu-nbd.c: In function 'nbd_client_thread':
../qemu-nbd.c:340:39: error: expected identifier before '(' token
340 | nbd_client_release_pipe(opts->stderr);
| ^~
../qemu-nbd.c: In function 'main':
../qemu-nbd.c:605:10: error: expected identifier before '(' token
605 | .stderr = STDOUT_FILENO,
| ^~
../qemu-nbd.c:962:22: error: expected identifier before '(' token
962 | opts.stderr = dup(STDERR_FILENO);
| ^~
../qemu-nbd.c:963:26: error: expected identifier before '(' token
963 | if (opts.stderr < 0) {
| ^~
../qemu-nbd.c:1200:38: error: expected identifier before '(' token
1200 | nbd_client_release_pipe(opts.stderr);
| ^~
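
For context: these diagnostics are the typical symptom of a libc whose <stdio.h>
defines stderr as a parenthesized object-like macro (the fortify headers used in
this CI image appear to do so; glibc's plain "#define stderr stderr" is harmless),
so any use of a struct member named stderr no longer parses. A minimal,
hypothetical reproduction - not the actual qemu-nbd code:

    #include <stdio.h>        /* suppose the libc does: #define stderr (stderr) */

    struct opts {
        int stderr;           /* the declaration still parses: "int (stderr);" */
    };

    int main(void)
    {
        struct opts o = { .stderr = 2 };   /* ".(stderr)" -> "expected identifier
                                            * before '(' token" */
        return o.stderr;                   /* "o.(stderr)" -> same error */
    }

Renaming the member sidesteps the macro entirely; the dup()/close() logic itself
is not at fault.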

On Thu, 7 Sept 2023 at 21:37, Eric Blake  wrote:
>
> From: "Denis V. Lunev" 
>
> Closing stderr earlier is good for daemonized qemu-nbd under ssh
> earlier, but breaks the case where -v is being used to track what is
> happening in the server, as in iotest 233.
>
> When we know we are verbose, we should preserve original stderr and
> restore it once the setup stage is done. This commit restores the
> original behavior with -v option. In this case original output
> inside the test is kept intact.
>
> Reported-by: Kevin Wolf 
> Signed-off-by: Denis V. Lunev 
> CC: Eric Blake 
> CC: Vladimir Sementsov-Ogievskiy 
> CC: Hanna Reitz 
> CC: Mike Maslenkin 
> Fixes: 5c56dd27a2 ("qemu-nbd: fix regression with qemu-nbd --fork run over 
> ssh")
> Message-ID: <20230906093210.339585-7-...@openvz.org>
> Reviewed-by: Eric Blake 
> Tested-by: Eric Blake 
> Signed-off-by: Eric Blake 
> ---
>  qemu-nbd.c | 24 
>  1 file changed, 20 insertions(+), 4 deletions(-)
>
> diff --git a/qemu-nbd.c b/qemu-nbd.c
> index 7c4e22def17..1cdc41ed292 100644
> --- a/qemu-nbd.c
> +++ b/qemu-nbd.c
> @@ -255,18 +255,23 @@ struct NbdClientOpts {
>  char *device;
>  char *srcpath;
>  SocketAddress *saddr;
> +int stderr;
>  bool fork_process;
>  bool verbose;
>  };
>
> -static void nbd_client_release_pipe(void)
> +static void nbd_client_release_pipe(int old_stderr)
>  {
>  /* Close stderr so that the qemu-nbd process exits.  */
> -if (dup2(STDOUT_FILENO, STDERR_FILENO) < 0) {
> +if (dup2(old_stderr, STDERR_FILENO) < 0) {
>  error_report("Could not release pipe to parent: %s",
>   strerror(errno));
>  exit(EXIT_FAILURE);
>  }
> +if (old_stderr != STDOUT_FILENO && close(old_stderr) < 0) {
> +error_report("Could not release qemu-nbd: %s", strerror(errno));
> +exit(EXIT_FAILURE);
> +}
>  }
>
>  #if HAVE_NBD_DEVICE
> @@ -332,7 +337,7 @@ static void *nbd_client_thread(void *arg)
>  fprintf(stderr, "NBD device %s is now connected to %s\n",
>  opts->device, opts->srcpath);
>  } else {
> -nbd_client_release_pipe();
> +nbd_client_release_pipe(opts->stderr);
>  }
>
>  if (nbd_client(fd) < 0) {
> @@ -597,6 +602,7 @@ int main(int argc, char **argv)
>  .device = NULL,
>  .srcpath = NULL,
>  .saddr = NULL,
> +.stderr = STDOUT_FILENO,
>  };
>
>  #ifdef CONFIG_POSIX
> @@ -951,6 +957,16 @@ int main(int argc, char **argv)
>
>  close(stderr_fd[0]);
>
> +/* Remember parent's stderr if we will be restoring it. */
> +if (opts.verbose /* fork_process is set */) {
> +opts.stderr = dup(STDERR_FILENO);
> +if (opts.stderr < 0) {
> +error_report("Could not dup original stderr: %s",
> + strerror(errno));
> + 

Re: [PATCH v3 4/4] io: follow coroutine AioContext in qio_channel_yield()

2023-09-08 Thread Stefan Hajnoczi
On Thu, Sep 07, 2023 at 03:41:08PM -0500, Eric Blake wrote:
> On Wed, Aug 30, 2023 at 06:48:02PM -0400, Stefan Hajnoczi wrote:
> > The ongoing QEMU multi-queue block layer effort makes it possible for 
> > multiple
> > threads to process I/O in parallel. The nbd block driver is not compatible 
> > with
> > the multi-queue block layer yet because QIOChannel cannot be used easily 
> > from
> > coroutines running in multiple threads. This series changes the QIOChannel 
> > API
> > to make that possible.
> > 
> ...
> > 
> > This API change allows the nbd block driver to use QIOChannel from any 
> > thread.
> > It's important to keep in mind that the block driver already synchronizes
> > QIOChannel access and ensures that two coroutines never read simultaneously 
> > or
> > write simultaneously.
> > 
> > This patch updates all users of qio_channel_attach_aio_context() to the
> > new API. Most conversions are simple, but vhost-user-server requires a
> > new qemu_coroutine_yield() call to quiesce the vu_client_trip()
> > coroutine when not attached to any AioContext.
> > 
> > While the API has become simpler, there is one wart: QIOChannel has a
> > special case for the iohandler AioContext (used for handlers that must not 
> > run
> > in nested event loops). I didn't find an elegant way to preserve that 
> > behavior, so
> > I added a new API called qio_channel_set_follow_coroutine_ctx(ioc, 
> > true|false)
> > for opting in to the new AioContext model. By default QIOChannel uses the
> > iohandler AioContext. Code that formerly called
> > qio_channel_attach_aio_context() now calls
> > qio_channel_set_follow_coroutine_ctx(ioc, true) once after the QIOChannel is
> > created.
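
(For illustration, the conversion described above amounts to the following
sketch; ioc and ctx are placeholder variables, not code from the series.)

    /* before: handlers were pinned to one AioContext chosen by the caller */
    qio_channel_attach_aio_context(ioc, ctx);

    /* after: opt in once, right after creating the channel; read/write
     * handlers then follow whatever AioContext the calling coroutine runs in */
    qio_channel_set_follow_coroutine_ctx(ioc, true);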
> > 
> > Signed-off-by: Stefan Hajnoczi 
> > Reviewed-by: Eric Blake 
> > Acked-by: Daniel P. Berrangé 
> > ---
> >  include/io/channel-util.h|  23 ++
> >  include/io/channel.h |  69 --
> >  include/qemu/vhost-user-server.h |   1 +
> >  block/nbd.c  |  11 +--
> >  io/channel-command.c |  10 ++-
> >  io/channel-file.c|   9 ++-
> >  io/channel-null.c|   3 +-
> >  io/channel-socket.c  |   9 ++-
> >  io/channel-tls.c |   6 +-
> >  io/channel-util.c|  24 +++
> >  io/channel.c | 120 ++-
> >  migration/channel-block.c|   3 +-
> >  nbd/server.c |  14 +---
> >  scsi/qemu-pr-helper.c|   4 +-
> >  util/vhost-user-server.c |  27 +--
> >  15 files changed, 216 insertions(+), 117 deletions(-)
> 
> Looks like migration/rdma.c is also impacted:
> 
> ../migration/rdma.c: In function ‘qio_channel_rdma_class_init’:
> ../migration/rdma.c:4037:38: error: assignment to ‘void (*)(QIOChannel *, 
> AioContext *, void (*)(void *), AioContext *, void (*)(void *), void *)’ from 
> incompatible pointer type ‘void (*)(QIOChannel *, AioContext *, void (*)(void 
> *), void (*)(void *), void *)’ [-Werror=incompatible-pointer-types]
>  4037 | ioc_klass->io_set_aio_fd_handler = 
> qio_channel_rdma_set_aio_fd_handler;
>   |  ^
> 
> I'm squashing this in:
> 
> diff --git i/migration/rdma.c w/migration/rdma.c
> index ca430d319d9..a2a3db35b1d 100644
> --- i/migration/rdma.c
> +++ w/migration/rdma.c
> @@ -3103,22 +3103,23 @@ static GSource 
> *qio_channel_rdma_create_watch(QIOChannel *ioc,
>  }
> 
>  static void qio_channel_rdma_set_aio_fd_handler(QIOChannel *ioc,
> -  AioContext *ctx,
> -  IOHandler *io_read,
> -  IOHandler *io_write,
> -  void *opaque)
> +AioContext *read_ctx,
> +IOHandler *io_read,
> +AioContext *write_ctx,
> +IOHandler *io_write,
> +void *opaque)
>  {
>  QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
>  if (io_read) {
> -aio_set_fd_handler(ctx, rioc->rdmain->recv_comp_channel->fd, io_read,
> -   io_write, NULL, NULL, opaque);
> -aio_set_fd_handler(ctx, rioc->rdmain->send_comp_channel->fd, io_read,
> -

Re: [PATCH 0/2] virtio: Drop out of coroutine context in virtio_load()

2023-09-08 Thread Stefan Hajnoczi
On Tue, Sep 05, 2023 at 04:50:00PM +0200, Kevin Wolf wrote:
> This fixes a recently introduced assertion failure that was reported to
> happen when migrating virtio-net with a failover. The latent bug that
> we're executing code in coroutine context that was never supposed to run
> there has existed for a long time. However, the new assertion that
> callers of bdrv_graph_rdlock_main_loop() don't run in coroutine context
> makes it very visible because it's now always a crash.
> 
> Kevin Wolf (2):
>   vmstate: Mark VMStateInfo.get/put() coroutine_mixed_fn
>   virtio: Drop out of coroutine context in virtio_load()
> 
>  include/migration/vmstate.h |  8 ---
>  hw/virtio/virtio.c  | 45 -
>  2 files changed, 45 insertions(+), 8 deletions(-)
> 
> -- 
> 2.41.0
> 

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature


Re: [PATCH 2/2] virtio: Drop out of coroutine context in virtio_load()

2023-09-08 Thread Stefan Hajnoczi
On Fri, Sep 08, 2023 at 10:59:48AM +0200, Kevin Wolf wrote:
> Am 07.09.2023 um 20:40 hat Stefan Hajnoczi geschrieben:
> > On Tue, Sep 05, 2023 at 04:50:02PM +0200, Kevin Wolf wrote:
> > > virtio_load() as a whole should run in coroutine context because it
> > > reads from the migration stream and we don't want this to block.
> > 
> > Is that "should" a "must" or a "can"?
> > 
> > If it's a "must" then virtio_load() needs assert(qemu_in_coroutine()).
> > 
> > But the previous patch mentioned that loadvm for snapshots calls it
> > outside coroutine context. So maybe it's a "can"?
> 
> Where this makes a difference is when the function indirectly calls into
> QIOChannel. When called from a coroutine, it yields while waiting for
> I/O, and outside of a coroutine it blocks. Yielding is always
> preferable, but in cases like HMP savevm/loadvm we also don't really
> care because it's synchronous anyway.
> 
> Whether that makes it a MAY or a SHOULD in the RFC sense, you decide.
> If you wanted to make it a MUST, you'd need to check all callers first
> and change some of them.

Thanks for clarifying. It is "can".

Stefan
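
To spell the distinction out in code (a sketch only, nothing from the series;
coroutine_fn, coroutine_mixed_fn and qemu_in_coroutine() are the existing QEMU
annotations/helpers, the function names are made up):

    /* MUST: callers are required to be in coroutine context */
    static void coroutine_fn load_state_must(void)
    {
        assert(qemu_in_coroutine());
        /* any wait for I/O here yields back to the event loop */
    }

    /* CAN: callable from both contexts, which is what coroutine_mixed_fn documents */
    static void coroutine_mixed_fn load_state_can(void)
    {
        if (qemu_in_coroutine()) {
            /* waits for I/O yield; the event loop keeps running */
        } else {
            /* the same waits block the calling thread */
        }
    }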


signature.asc
Description: PGP signature


Re: [virtio-dev] [RFC PATCH v2] docs/interop: define PROBE feature for vhost-user VirtIO devices

2023-09-08 Thread Stefan Hajnoczi
On Fri, 8 Sept 2023 at 02:43, Alex Bennée  wrote:
>
>
> Stefan Hajnoczi  writes:
>
> > On Tue, Sep 05, 2023 at 10:34:11AM +0100, Alex Bennée wrote:
> >>
> >> Albert Esteve  writes:
> >>
> >> > This looks great! Thanks for this proposal.
> >> >
> >> > On Fri, Sep 1, 2023 at 1:00 PM Alex Bennée  
> >> > wrote:
> >> >
> >> >  Currently QEMU has to know some details about the VirtIO device
> >> >  supported by a vhost-user daemon to be able to setup the guest. This
> >> >  makes it hard for QEMU to add support for additional vhost-user
> >> >  daemons without adding specific stubs for each additional VirtIO
> >> >  device.
> >> >
> >> >  This patch suggests a new feature flag (VHOST_USER_PROTOCOL_F_PROBE)
> >> >  which the back-end can advertise which allows a probe message to be
> >> >  sent to get all the details QEMU needs to know in one message.
> >> >
> >> >  Together with the existing features VHOST_USER_PROTOCOL_F_STATUS and
> >> >  VHOST_USER_PROTOCOL_F_CONFIG we can create "standalone" vhost-user
> >> >  daemons which are capable of handling all aspects of the VirtIO
> >> >  transactions with only a generic stub on the QEMU side. These daemons
> >> >  can also be used without QEMU in situations where there isn't a full
> >> >  VMM managing their setup.
> >> >
> >> >  Signed-off-by: Alex Bennée 
> >> >
> >> >  ---
> >> >  v2
> >> >- dropped F_STANDALONE in favour of F_PROBE
> >> >- split probe details across several messages
> >> >- probe messages don't automatically imply a standalone daemon
> >> >- add wording where probe details interact (F_MQ/F_CONFIG)
> >> >- define VMM and make clear QEMU is only one of many potential VMMs
> >> >- reword commit message
> >> >  ---
> >> >   docs/interop/vhost-user.rst | 90 -
> >> >   hw/virtio/vhost-user.c  |  8 
> >> >   2 files changed, 88 insertions(+), 10 deletions(-)
> >> >
> >> >  diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst
> >> >  index 5a070adbc1..ba3b5e07b7 100644
> >> >  --- a/docs/interop/vhost-user.rst
> >> >  +++ b/docs/interop/vhost-user.rst
> >> >  @@ -7,6 +7,7 @@ Vhost-user Protocol
> >> >   ..
> >> > Copyright 2014 Virtual Open Systems Sarl.
> >> > Copyright 2019 Intel Corporation
> >> >  +  Copyright 2023 Linaro Ltd
> >> > Licence: This work is licensed under the terms of the GNU GPL,
> >> >  version 2 or later. See the COPYING file in the top-level
> >> >  directory.
> >> >  @@ -27,17 +28,31 @@ The protocol defines 2 sides of the communication, 
> >> > *front-end* and
> >> >   *back-end*. The *front-end* is the application that shares its 
> >> > virtqueues, in
> >> >   our case QEMU. The *back-end* is the consumer of the virtqueues.
> >> >
> >> >  -In the current implementation QEMU is the *front-end*, and the 
> >> > *back-end*
> >> >  -is the external process consuming the virtio queues, for example a
> >> >  -software Ethernet switch running in user space, such as Snabbswitch,
> >> >  -or a block device back-end processing read & write to a virtual
> >> >  -disk. In order to facilitate interoperability between various back-end
> >> >  -implementations, it is recommended to follow the :ref:`Backend program
> >> >  -conventions `.
> >> >  +In the current implementation a Virtual Machine Manager (VMM) such as
> >> >  +QEMU is the *front-end*, and the *back-end* is the external process
> >> >  +consuming the virtio queues, for example a software Ethernet switch
> >> >  +running in user space, such as Snabbswitch, or a block device back-end
> >> >  +processing read & write to a virtual disk. In order to facilitate
> >> >  +interoperability between various back-end implementations, it is
> >> >  +recommended to follow the :ref:`Backend program conventions
> >> >  +`.
> >> >
> >> >   The *front-end* and *back-end* can be either a client (i.e. 
> >> > connecting) or
> >> >   server (listening) in the socket communication.
> >> >
> >>

Re: [RFC 1/3] hmp: avoid the nested event loop in handle_hmp_command()

2023-09-07 Thread Stefan Hajnoczi
On Thu, 7 Sept 2023 at 16:53, Dr. David Alan Gilbert  wrote:
>
> * Stefan Hajnoczi (stefa...@gmail.com) wrote:
> > On Thu, 7 Sept 2023 at 10:07, Dr. David Alan Gilbert  
> > wrote:
> > >
> > > * Stefan Hajnoczi (stefa...@redhat.com) wrote:
> > > > On Thu, Sep 07, 2023 at 01:06:39AM +, Dr. David Alan Gilbert wrote:
> > > > > * Stefan Hajnoczi (stefa...@redhat.com) wrote:
> > > > > > Coroutine HMP commands currently run to completion in a nested event
> > > > > > loop with the Big QEMU Lock (BQL) held. The call_rcu thread also 
> > > > > > uses
> > > > > > the BQL and cannot process work while the coroutine monitor command 
> > > > > > is
> > > > > > running. A deadlock occurs when monitor commands attempt to wait for
> > > > > > call_rcu work to finish.
> > > > >
> > > > > I hate to think if there's anywhere else that ends up doing that
> > > > > other than the monitors.
> > > >
> > > > Luckily drain_call_rcu() has few callers: just
> > > > xen_block_device_destroy() and qmp_device_add(). We only need to worry
> > > > about their call stacks.
> > > >
> > > > I haven't looked at the Xen code.
> > > >
> > > > >
> > > > > But, not knowing the semantics of the rcu code, it looks kind of OK to
> > > > > me from the monitor.
> > > > >
> > > > > (Do you ever get anything like qemu quitting from one of the other
> > > > > monitors while this coroutine hasn't been run?)
> > > >
> > > > Not sure what you mean?
> > >
> > > Imagine that just after you create your coroutine, a vCPU does a
> > > shutdown and qemu is configured to quit, or on another monitor someone
> > > does a quit;  does your coroutine get executed or not?
> >
> > I think the answer is that it depends.
> >
> > A coroutine can run for a while and then yield while waiting for a
> > timer, BH, fd handler, etc. If the coroutine has yielded then I think
> > QEMU could terminate.
> >
> > The behavior of entering a coroutine for the first time depends on the
> > API that is used (e.g. qemu_coroutine_enter()/aio_co_enter()/etc).
> > qemu_coroutine_enter() is immediate but aio_co_enter() contains
> > indirect code paths like scheduling a BH.
> >
> > To summarize: ¯\_(ツ)_/¯
>
> That does mean you leave your g_new'd data and qdict allocated at
> exit - meh
>
> I'm not sure if it means you're making any other assumptions about what
> happens if the coroutine gets run during the exit path; although I guess
> there are plenty of other cases like that.

Yes, I think QEMU has some resources (memory, file descriptors, etc)
that are not freed on exit.

Stefan
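
For reference, the two entry paths mentioned above differ roughly as follows
(a sketch, not part of the patch; handle_hmp_command_co and data are from the
quoted diff, the rest are existing QEMU APIs, and the two calls are
alternatives rather than a sequence):

    Coroutine *co = qemu_coroutine_create(handle_hmp_command_co, data);

    /* immediate: the coroutine starts running right here, in the calling
     * thread, before qemu_coroutine_enter() returns */
    qemu_coroutine_enter(co);

    /* indirect: may enter directly or go through a scheduled BH, depending
     * on which thread/AioContext this is called from - so the coroutine can
     * still be pending when QEMU decides to exit */
    aio_co_enter(qemu_get_aio_context(), co);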

>
> Dave
>
> > Stefan
> >
> > >
> > > Dave
> > >
> > > > Stefan
> > > >
> > > > >
> > > > > Dave
> > > > >
> > > > > > This patch refactors the HMP monitor to use the existing event loop
> > > > > > instead of creating a nested event loop. This will allow the next
> > > > > > patches to rely on draining call_rcu work.
> > > > > >
> > > > > > Signed-off-by: Stefan Hajnoczi 
> > > > > > ---
> > > > > >  monitor/hmp.c | 28 +++-
> > > > > >  1 file changed, 15 insertions(+), 13 deletions(-)
> > > > > >
> > > > > > diff --git a/monitor/hmp.c b/monitor/hmp.c
> > > > > > index 69c1b7e98a..6cff2810aa 100644
> > > > > > --- a/monitor/hmp.c
> > > > > > +++ b/monitor/hmp.c
> > > > > > @@ -,15 +,17 @@ typedef struct HandleHmpCommandCo {
> > > > > >  Monitor *mon;
> > > > > >  const HMPCommand *cmd;
> > > > > >  QDict *qdict;
> > > > > > -bool done;
> > > > > >  } HandleHmpCommandCo;
> > > > > >
> > > > > > -static void handle_hmp_command_co(void *opaque)
> > > > > > +static void coroutine_fn handle_hmp_command_co(void *opaque)
> > > > > >  {
> > > > > >  HandleHmpCommandCo *data = opaque;
> > > > > > +
> > > > > >  handle_hmp_command_exec(data->mon, data->cmd, data->

Re: [virtio-dev] [RFC PATCH v2] docs/interop: define PROBE feature for vhost-user VirtIO devices

2023-09-07 Thread Stefan Hajnoczi
On Tue, Sep 05, 2023 at 10:34:11AM +0100, Alex Bennée wrote:
> 
> Albert Esteve  writes:
> 
> > This looks great! Thanks for this proposal.
> >
> > On Fri, Sep 1, 2023 at 1:00 PM Alex Bennée  wrote:
> >
> >  Currently QEMU has to know some details about the VirtIO device
> >  supported by a vhost-user daemon to be able to setup the guest. This
> >  makes it hard for QEMU to add support for additional vhost-user
> >  daemons without adding specific stubs for each additional VirtIO
> >  device.
> >
> >  This patch suggests a new feature flag (VHOST_USER_PROTOCOL_F_PROBE)
> >  which the back-end can advertise which allows a probe message to be
> >  sent to get all the details QEMU needs to know in one message.
> >
> >  Together with the existing features VHOST_USER_PROTOCOL_F_STATUS and
> >  VHOST_USER_PROTOCOL_F_CONFIG we can create "standalone" vhost-user
> >  daemons which are capable of handling all aspects of the VirtIO
> >  transactions with only a generic stub on the QEMU side. These daemons
> >  can also be used without QEMU in situations where there isn't a full
> >  VMM managing their setup.
> >
> >  Signed-off-by: Alex Bennée 
> >
> >  ---
> >  v2
> >- dropped F_STANDALONE in favour of F_PROBE
> >- split probe details across several messages
> >- probe messages don't automatically imply a standalone daemon
> >- add wording where probe details interact (F_MQ/F_CONFIG)
> >- define VMM and make clear QEMU is only one of many potential VMMs
> >- reword commit message
> >  ---
> >   docs/interop/vhost-user.rst | 90 -
> >   hw/virtio/vhost-user.c  |  8 
> >   2 files changed, 88 insertions(+), 10 deletions(-)
> >
> >  diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst
> >  index 5a070adbc1..ba3b5e07b7 100644
> >  --- a/docs/interop/vhost-user.rst
> >  +++ b/docs/interop/vhost-user.rst
> >  @@ -7,6 +7,7 @@ Vhost-user Protocol
> >   ..
> > Copyright 2014 Virtual Open Systems Sarl.
> > Copyright 2019 Intel Corporation
> >  +  Copyright 2023 Linaro Ltd
> > Licence: This work is licensed under the terms of the GNU GPL,
> >  version 2 or later. See the COPYING file in the top-level
> >  directory.
> >  @@ -27,17 +28,31 @@ The protocol defines 2 sides of the communication, 
> > *front-end* and
> >   *back-end*. The *front-end* is the application that shares its 
> > virtqueues, in
> >   our case QEMU. The *back-end* is the consumer of the virtqueues.
> >
> >  -In the current implementation QEMU is the *front-end*, and the *back-end*
> >  -is the external process consuming the virtio queues, for example a
> >  -software Ethernet switch running in user space, such as Snabbswitch,
> >  -or a block device back-end processing read & write to a virtual
> >  -disk. In order to facilitate interoperability between various back-end
> >  -implementations, it is recommended to follow the :ref:`Backend program
> >  -conventions `.
> >  +In the current implementation a Virtual Machine Manager (VMM) such as
> >  +QEMU is the *front-end*, and the *back-end* is the external process
> >  +consuming the virtio queues, for example a software Ethernet switch
> >  +running in user space, such as Snabbswitch, or a block device back-end
> >  +processing read & write to a virtual disk. In order to facilitate
> >  +interoperability between various back-end implementations, it is
> >  +recommended to follow the :ref:`Backend program conventions
> >  +`.
> >
> >   The *front-end* and *back-end* can be either a client (i.e. connecting) or
> >   server (listening) in the socket communication.
> >
> >  +Probing device details
> >  +--
> >  +
> >  +Traditionally the vhost-user daemon *back-end* shares configuration
> >  +responsibilities with the VMM *front-end* which needs to know certain
> >  +key bits of information about the device. This means the VMM needs to
> >  +define at least a minimal stub for each VirtIO device it wants to
> >  +support. If the daemon supports the right set of protocol features the
> >  +VMM can probe the daemon for the information it needs to setup the
> >  +device. See :ref:`Probing features for standalone daemons
> >  +` for more details.
> >  +
> >  +
> >   Support for platforms other than Linux
> >   --
> >
> >  @@ -316,6 +331,7 @@ replies. Here is a list of the ones that do:
> >   * ``VHOST_USER_GET_VRING_BASE``
> >   * ``VHOST_USER_SET_LOG_BASE`` (if ``VHOST_USER_PROTOCOL_F_LOG_SHMFD``)
> >   * ``VHOST_USER_GET_INFLIGHT_FD`` (if 
> > ``VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD``)
> >  +* ``VHOST_USER_GET_BACKEND_SPECS`` (if 
> > ``VHOST_USER_PROTOCOL_F_STANDALONE``)
> >
> >   .. seealso::
> >
> >  @@ -396,9 +412,10 @@ must support changing some configuration aspects on 
> > the fly.
> >   Multiple queue support
> >   --
> >
> >  -Many devices have a fixed number of virtqueues.  In this case the 
> > front-end
> >  -

Re: [RFC PATCH v2] docs/interop: define PROBE feature for vhost-user VirtIO devices

2023-09-07 Thread Stefan Hajnoczi
On Fri, Sep 01, 2023 at 12:00:18PM +0100, Alex Bennée wrote:
> Currently QEMU has to know some details about the VirtIO device
> supported by a vhost-user daemon to be able to setup the guest. This
> makes it hard for QEMU to add support for additional vhost-user
> daemons without adding specific stubs for each additional VirtIO
> device.
> 
> This patch suggests a new feature flag (VHOST_USER_PROTOCOL_F_PROBE)
> which the back-end can advertise which allows a probe message to be
> sent to get all the details QEMU needs to know in one message.
> 
> Together with the existing features VHOST_USER_PROTOCOL_F_STATUS and
> VHOST_USER_PROTOCOL_F_CONFIG we can create "standalone" vhost-user
> daemons which are capable of handling all aspects of the VirtIO
> transactions with only a generic stub on the QEMU side. These daemons
> can also be used without QEMU in situations where there isn't a full
> VMM managing their setup.
> 
> Signed-off-by: Alex Bennée 

I think the mindset for this change should be "vhost-user is becoming a
VIRTIO Transport". VIRTIO Transports have a reasonably well-defined
feature set in the VIRTIO specification. The goal should be to cover
every VIRTIO Transport operation via vhost-user protocol messages so
that the VIRTIO device model can be fully conveyed over vhost-user.

Anything less is yet another ad-hoc protocol extension that will lead to
more bugs and hacks when it turns out some VIRTIO devices cannot be
expressed due to limitations in the protocol.

This requires going through the VIRTIO spec to find a correspondence
between virtio-pci/virtio-mmio/virtio-ccw's interfaces and vhost-user
protocol messages. In most cases vhost-user already offers messages and
your patch adds more of what is missing. I think this effort is already
very close but missing the final check that it really matches the VIRTIO
spec.

Please do the comparison against the VIRTIO Transports and then adjust
this patch to make it clear that the back-end is becoming a full-fledged
VIRTIO Transport:
- The name of the patch series should reflect that.
- The vhost-user protocol feature should be named F_TRANSPORT.
- The messages added in this patch should have a 1:1 correspondence with
  the VIRTIO spec including using the same terminology for consistency.

Sorry for the hassle, but I think this is a really crucial point where
we have the chance to make vhost-user work smoothly in the future...but
only if we can faithfully expose VIRTIO Transport semantics.
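
To make the 1:1 idea concrete, the shape being asked for is roughly the
following (every name here is invented for illustration and exists neither in
the VIRTIO spec nor in the vhost-user protocol today):

    /* one vhost-user message per VIRTIO transport operation, nothing ad-hoc */
    enum {
        VHOST_USER_XPORT_GET_DEVICE_ID,        /* transport: device type       */
        VHOST_USER_XPORT_GET_NUM_QUEUES,       /* transport: max virtqueues    */
        VHOST_USER_XPORT_GET_CONFIG_SIZE,      /* transport: config space size */
        VHOST_USER_XPORT_GET_DEVICE_FEATURES,  /* transport: feature bits      */
    };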

> 
> ---
> v2
>   - dropped F_STANDALONE in favour of F_PROBE
>   - split probe details across several messages
>   - probe messages don't automatically imply a standalone daemon
>   - add wording where probe details interact (F_MQ/F_CONFIG)
>   - define VMM and make clear QEMU is only one of many potential VMMs
>   - reword commit message
> ---
>  docs/interop/vhost-user.rst | 90 -
>  hw/virtio/vhost-user.c  |  8 
>  2 files changed, 88 insertions(+), 10 deletions(-)
> 
> diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst
> index 5a070adbc1..ba3b5e07b7 100644
> --- a/docs/interop/vhost-user.rst
> +++ b/docs/interop/vhost-user.rst
> @@ -7,6 +7,7 @@ Vhost-user Protocol
>  ..
>Copyright 2014 Virtual Open Systems Sarl.
>Copyright 2019 Intel Corporation
> +  Copyright 2023 Linaro Ltd
>Licence: This work is licensed under the terms of the GNU GPL,
> version 2 or later. See the COPYING file in the top-level
> directory.
> @@ -27,17 +28,31 @@ The protocol defines 2 sides of the communication, 
> *front-end* and
>  *back-end*. The *front-end* is the application that shares its virtqueues, in
>  our case QEMU. The *back-end* is the consumer of the virtqueues.
>  
> -In the current implementation QEMU is the *front-end*, and the *back-end*
> -is the external process consuming the virtio queues, for example a
> -software Ethernet switch running in user space, such as Snabbswitch,
> -or a block device back-end processing read & write to a virtual
> -disk. In order to facilitate interoperability between various back-end
> -implementations, it is recommended to follow the :ref:`Backend program
> -conventions `.
> +In the current implementation a Virtual Machine Manager (VMM) such as
> +QEMU is the *front-end*, and the *back-end* is the external process
> +consuming the virtio queues, for example a software Ethernet switch
> +running in user space, such as Snabbswitch, or a block device back-end
> +processing read & write to a virtual disk. In order to facilitate
> +interoperability between various back-end implementations, it is
> +recommended to follow the :ref:`Backend program conventions
> +`.
>  
>  The *front-end* and *back-end* can be either a client (i.e. connecting) or
>  server (listening) in the socket communication.
>  
> +Probing device details
> +--
> +
> +Traditionally the vhost-user daemon *back-end* shares configuration
> +responsibilities with 

Re: [PATCH 0/2] virtio: Drop out of coroutine context in virtio_load()

2023-09-07 Thread Stefan Hajnoczi
On Tue, Sep 05, 2023 at 04:50:00PM +0200, Kevin Wolf wrote:
> This fixes a recently introduced assertion failure that was reported to
> happen when migrating virtio-net with a failover. The latent bug that
> we're executing code in coroutine context that was never supposed to run
> there has existed for a long time. However, the new assertion that
> callers of bdrv_graph_rdlock_main_loop() don't run in coroutine context
> makes it very visible because it's now always a crash.
> 
> Kevin Wolf (2):
>   vmstate: Mark VMStateInfo.get/put() coroutine_mixed_fn
>   virtio: Drop out of coroutine context in virtio_load()
> 
>  include/migration/vmstate.h |  8 ---
>  hw/virtio/virtio.c  | 45 -
>  2 files changed, 45 insertions(+), 8 deletions(-)

This looks like a bandaid for a specific instance of this problem rather
than a solution that takes care of the root cause.

Is it possible to make VMStateInfo.get/put() consistently coroutine_fn?

Stefan
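
As a generic sketch of what "dropping out of coroutine context" usually looks
like in QEMU (this is not the actual virtio_load() change; the function names
are invented): schedule the non-coroutine work as a bottom half, yield, and let
the BH re-enter the coroutine once it is done.

    static void do_noncoroutine_work_bh(void *opaque)
    {
        Coroutine *co = opaque;

        /* runs from the event loop, outside coroutine context */
        aio_co_wake(co);                     /* resume the yielded coroutine */
    }

    static void coroutine_fn some_load_path(void)
    {
        /* the BH runs in this thread's event loop, so it cannot fire
         * before the yield below */
        aio_bh_schedule_oneshot(qemu_get_current_aio_context(),
                                do_noncoroutine_work_bh,
                                qemu_coroutine_self());
        qemu_coroutine_yield();              /* woken by aio_co_wake() above */
    }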


signature.asc
Description: PGP signature

