[PATCH v2] virtio-iommu: add error check before assert

2024-06-12 Thread Manos Pitsidianakis
A fuzzer case discovered by Zheyu Ma causes an assert failure.

Add a check before the assert, and respond with an error before moving
on to the next queue element.

To reproduce the failure:

cat << EOF | \
qemu-system-x86_64 \
-display none -machine accel=qtest -m 512M -machine q35 -nodefaults \
-device virtio-iommu -qtest stdio
outl 0xcf8 0x8804
outw 0xcfc 0x06
outl 0xcf8 0x8820
outl 0xcfc 0xe0004000
write 0x1e 0x1 0x01
write 0xe0004020 0x4 0x1000
write 0xe0004028 0x4 0x00101000
write 0xe000401c 0x1 0x01
write 0x106000 0x1 0x05
write 0x11 0x1 0x60
write 0x12 0x1 0x10
write 0x19 0x1 0x04
write 0x1c 0x1 0x01
write 0x100018 0x1 0x04
write 0x10001c 0x1 0x02
write 0x101003 0x1 0x01
write 0xe0007001 0x1 0x00
EOF

Reported-by: Zheyu Ma 
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2359
Signed-off-by: Manos Pitsidianakis 
---
Range-diff against v1:
1:  a665c6e73d ! 1:  8e50c1b00e virtio-iommu: add error check before assert
@@ Commit message
 
  ## hw/virtio/virtio-iommu.c ##
@@ hw/virtio/virtio-iommu.c: static void virtio_iommu_handle_command(VirtIODevice *vdev, VirtQueue *vq)
+ iov = elem->out_sg;
+ sz = iov_to_buf(iov, iov_cnt, 0, &head, sizeof(head));
+ if (unlikely(sz != sizeof(head))) {
++qemu_log_mask(LOG_GUEST_ERROR,
++              "%s: read %zu bytes from command head "
++              "but expected %zu\n", __func__, sz, sizeof(head));
+ tail.status = VIRTIO_IOMMU_S_DEVERR;
+ goto out;
+ }
+@@ hw/virtio/virtio-iommu.c: static void virtio_iommu_handle_command(VirtIODevice *vdev, VirtQueue *vq)
  out:
  sz = iov_from_buf(elem->in_sg, elem->in_num, 0,
                    buf ? buf : &tail, output_size);
 +if (unlikely(sz != output_size)) {
++qemu_log_mask(LOG_GUEST_ERROR,
++              "%s: wrote %zu bytes to command response "
++              "but response size is %zu\n",
++              __func__, sz, output_size);
 +tail.status = VIRTIO_IOMMU_S_DEVERR;
-+/* We checked that tail can fit earlier */
++/*
++ * We checked that sizeof(tail) can fit into elem->in_sg at the
++ * beginning of the loop
++ */
 +output_size = sizeof(tail);
 +g_free(buf);
 +buf = NULL;

 hw/virtio/virtio-iommu.c | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
index 1326c6ec41..9d801fb180 100644
--- a/hw/virtio/virtio-iommu.c
+++ b/hw/virtio/virtio-iommu.c
@@ -782,6 +782,9 @@ static void virtio_iommu_handle_command(VirtIODevice *vdev, VirtQueue *vq)
 iov = elem->out_sg;
 sz = iov_to_buf(iov, iov_cnt, 0, &head, sizeof(head));
 if (unlikely(sz != sizeof(head))) {
+qemu_log_mask(LOG_GUEST_ERROR,
+  "%s: read %zu bytes from command head "
+  "but expected %zu\n", __func__, sz, sizeof(head));
 tail.status = VIRTIO_IOMMU_S_DEVERR;
 goto out;
 }
@@ -818,6 +821,25 @@ static void virtio_iommu_handle_command(VirtIODevice *vdev, VirtQueue *vq)
 out:
 sz = iov_from_buf(elem->in_sg, elem->in_num, 0,
   buf ? buf : &tail, output_size);
+if (unlikely(sz != output_size)) {
+qemu_log_mask(LOG_GUEST_ERROR,
+  "%s: wrote %zu bytes to command response "
+  "but response size is %zu\n",
+  __func__, sz, output_size);
+tail.status = VIRTIO_IOMMU_S_DEVERR;
+/*
+ * We checked that sizeof(tail) can fit into elem->in_sg at the
+ * beginning of the loop
+ */
+output_size = sizeof(tail);
+g_free(buf);
+buf = NULL;
+sz = iov_from_buf(elem->in_sg,
+  elem->in_num,
+  0,
+  &tail,
+  output_size);
+}
 assert(sz == output_size);
 
 virtqueue_push(vq, elem, sz);

base-commit: f3e8cc47de2bc537d4991e883a85208e4e1c0f98
-- 
γαῖα πυρί μιχθήτω




Re: [RFC PATCH 02/16] accel/tcg: memory access from CPU will pass access_type to IOMMU

2024-06-12 Thread Ethan Chen via
On Wed, Jun 12, 2024 at 04:14:02PM +0800, Jim Shu wrote:
> It is a preparation patch for the upcoming RISC-V wgChecker device.
> 
> Since the RISC-V wgChecker can permit accesses with RO/WO permission, the
> IOMMUMemoryRegion could return a different section for read & write
> accesses. The memory access from the CPU should also pass the access_type
> to the IOMMU translate function so that the IOMMU can return the correct
> section for the specified access_type.
> 

Hi Jim,

Does this method take into account the situation where the CPU access type
differs from the access type used when the IOTLB entry was created? I think
the section might be wrong in that situation.

Thanks,
Ethan
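
For reference, the hook at issue is the translate callback in QEMU's
IOMMUMemoryRegionClass (include/exec/memory.h), which already carries the
access type; a sketch of the current signature:

    /* @flag distinguishes IOMMU_RO/IOMMU_WO/IOMMU_RW (or IOMMU_NONE for
     * address-only lookups), so an implementation can return a different
     * section per access type. */
    IOMMUTLBEntry (*translate)(IOMMUMemoryRegion *iommu, hwaddr addr,
                               IOMMUAccessFlags flag, int iommu_idx);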



Re: [PULL 00/25] target/i386, SCSI changes for 2024-06-11

2024-06-12 Thread Richard Henderson

On 6/11/24 07:24, Paolo Bonzini wrote:

The following changes since commit 80e8f0602168f451a93e71cbb1d59e93d745e62e:

   Merge tag 'bsd-user-misc-2024q2-pull-request' of gitlab.com:bsdimp/qemu into 
staging (2024-06-09 11:21:55 -0700)

are available in the Git repository at:

   https://gitlab.com/bonzini/qemu.git  tags/for-upstream

for you to fetch changes up to 58ab5e809ad66a02b6fa273ba11ed35b8b2fea60:

   target/i386: SEV: do not assume machine->cgs is SEV (2024-06-11 14:29:23 
+0200)


* i386: fix issue with cache topology passthrough
* scsi-disk: migrate emulated requests
* i386/sev: fix Coverity issues
* i386/tcg: more conversions to new decoder


Fails testing:

https://gitlab.com/qemu-project/qemu/-/jobs/7087568710
https://gitlab.com/qemu-project/qemu/-/jobs/7087568705
https://gitlab.com/qemu-project/qemu/-/jobs/7087568701
https://gitlab.com/qemu-project/qemu/-/jobs/7087568694


r~



Re: [PATCH v1] virtio-iommu: add error check before assert

2024-06-12 Thread Manos Pitsidianakis

On Wed, 12 Jun 2024 11:56, Alex Bennée  wrote:

Manos Pitsidianakis  writes:


On Tue, 11 Jun 2024 at 18:01, Philippe Mathieu-Daudé  wrote:


On 11/6/24 14:23, Manos Pitsidianakis wrote:
> A fuzzer case discovered by Zheyu Ma causes an assert failure.
>
> Add a check before the assert, and respond with an error before moving
> on to the next queue element.
>
> To reproduce the failure:
>
> cat << EOF | \
> qemu-system-x86_64 \
> -display none -machine accel=qtest -m 512M -machine q35 -nodefaults \
> -device virtio-iommu -qtest stdio
> outl 0xcf8 0x8804
> outw 0xcfc 0x06
> outl 0xcf8 0x8820
> outl 0xcfc 0xe0004000
> write 0x1e 0x1 0x01
> write 0xe0004020 0x4 0x1000
> write 0xe0004028 0x4 0x00101000
> write 0xe000401c 0x1 0x01
> write 0x106000 0x1 0x05
> write 0x11 0x1 0x60
> write 0x12 0x1 0x10
> write 0x19 0x1 0x04
> write 0x1c 0x1 0x01
> write 0x100018 0x1 0x04
> write 0x10001c 0x1 0x02
> write 0x101003 0x1 0x01
> write 0xe0007001 0x1 0x00
> EOF
>
> Reported-by: Zheyu Ma 
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2359
> Signed-off-by: Manos Pitsidianakis 
> ---
>   hw/virtio/virtio-iommu.c | 12 
>   1 file changed, 12 insertions(+)
>
> diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
> index 1326c6ec41..9b99def39f 100644
> --- a/hw/virtio/virtio-iommu.c
> +++ b/hw/virtio/virtio-iommu.c
> @@ -818,6 +818,18 @@ static void virtio_iommu_handle_command(VirtIODevice *vdev, VirtQueue *vq)
>   out:
>   sz = iov_from_buf(elem->in_sg, elem->in_num, 0,
> buf ? buf : &tail, output_size);
> +if (unlikely(sz != output_size)) {

Is this a normal guest behavior? Should we log it as GUEST_ERROR?


It's not; it'd be a virtio spec (implementation) misuse by the guest.
The internal device error (VIRTIO_IOMMU_S_DEVERR) would be logged by
the kernel; should we log it as well?


Yes, logging guest errors is useful when attempting to work out, in the
future, whether guests are buggy or QEMU is.


Thanks Philippe and Alex, will send a v2 with a log print.



Re: [PATCH v1] virtio-iommu: add error check before assert

2024-06-12 Thread Manos Pitsidianakis

On Wed, 12 Jun 2024 12:46, Alex Bennée  wrote:

Manos Pitsidianakis  writes:


A fuzzer case discovered by Zheyu Ma causes an assert failure.

Add a check before the assert, and respond with an error before moving
on to the next queue element.

To reproduce the failure:

cat << EOF | \
qemu-system-x86_64 \
-display none -machine accel=qtest -m 512M -machine q35 -nodefaults \
-device virtio-iommu -qtest stdio
outl 0xcf8 0x8804
outw 0xcfc 0x06
outl 0xcf8 0x8820
outl 0xcfc 0xe0004000
write 0x1e 0x1 0x01
write 0xe0004020 0x4 0x1000
write 0xe0004028 0x4 0x00101000
write 0xe000401c 0x1 0x01
write 0x106000 0x1 0x05
write 0x11 0x1 0x60
write 0x12 0x1 0x10
write 0x19 0x1 0x04
write 0x1c 0x1 0x01
write 0x100018 0x1 0x04
write 0x10001c 0x1 0x02
write 0x101003 0x1 0x01
write 0xe0007001 0x1 0x00
EOF

Reported-by: Zheyu Ma 
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2359
Signed-off-by: Manos Pitsidianakis 
---
 hw/virtio/virtio-iommu.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
index 1326c6ec41..9b99def39f 100644
--- a/hw/virtio/virtio-iommu.c
+++ b/hw/virtio/virtio-iommu.c
@@ -818,6 +818,18 @@ static void virtio_iommu_handle_command(VirtIODevice *vdev, VirtQueue *vq)
 out:
 sz = iov_from_buf(elem->in_sg, elem->in_num, 0,
   buf ? buf : &tail, output_size);
+if (unlikely(sz != output_size)) {
+tail.status = VIRTIO_IOMMU_S_DEVERR;
+/* We checked that tail can fit earlier */
+output_size = sizeof(tail);
+g_free(buf);
+buf = NULL;


Hmm this is a similar pattern I noticed yesterday in:

 Message-ID: <20240527133140.218300-2-fro...@swemel.ru>
 Date: Mon, 27 May 2024 16:31:41 +0300
 Subject: [PATCH] hw/net/virtio-net.c: fix crash in iov_copy()
 From: Dmitry Frolov 

And I wonder if the same comment applies. Could we clean up the loop
with autofrees to avoid having to make sure all the g_free() calls are
properly lined up?
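
A minimal sketch of that pattern, assuming the buffer is only used within
one loop iteration (illustrative only, not the posted patch):

    for (;;) {
        VirtQueueElement *elem = virtqueue_pop(vq, sizeof(VirtQueueElement));
        g_autofree uint8_t *buf = NULL;

        if (!elem) {
            return;
        }
        /* ... read the command head, allocate buf for larger replies ... */
        /* buf is freed automatically on every path out of this iteration,
         * so no g_free(buf) calls need to be lined up before each exit. */
        virtqueue_push(vq, elem, sz);
        virtio_notify(vdev, vq);
        g_free(elem);
    }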



The virtio-net.c patch adds an iov_size check for the virtqueue element 
to make sure it can fit a header length. In this function, 
virtio_iommu_handle_command, a similar check is performed after popping 
the element from the queue. That's what the "we checked that tail can 
fit earlier" comment refers to.  Is this what you were referring to by 
any chance?





+sz = iov_from_buf(elem->in_sg,
+  elem->in_num,
+  0,
+  &tail,
+  output_size);
+}


Isn't this the next element? Could we continue; instead?


It's not; the element is popped at the beginning of the for loop. I 
think we should not continue, because we have written a VIRTIO error 
value for the guest and have to give it back as a response.






 assert(sz == output_size);
 
 virtqueue_push(vq, elem, sz);


base-commit: 80e8f0602168f451a93e71cbb1d59e93d745e62e


--
Alex Bennée
Virtualisation Tech Lead @ Linaro




Re: Qemu License question

2024-06-12 Thread Markus Armbruster
Manos Pitsidianakis  writes:

> On Thu, 13 Jun 2024 06:26, Peng Fan  wrote:
>>Hi All,
>>
>>The following files are marked as GPL-3.0-or-later. Will these
>>Conflict with Qemu LICENSE?
>>
>>Should we update the files to GPL-2.0?
>>
>>./tests/tcg/aarch64/semicall.h:7: * SPDX-License-Identifier: GPL-3.0-or-later
>>./tests/tcg/x86_64/system/boot.S:13: * SPDX-License-Identifier: 
>>GPL-3.0-or-later
>>./tests/tcg/riscv64/semicall.h:7: * SPDX-License-Identifier: GPL-3.0-or-later
>>./tests/tcg/multiarch/float_convs.c:6: * SPDX-License-Identifier: 
>>GPL-3.0-or-later
>>./tests/tcg/multiarch/float_helpers.h:6: * SPDX-License-Identifier: 
>>GPL-3.0-or-later
>>./tests/tcg/multiarch/libs/float_helpers.c:10: * SPDX-License-Identifier: 
>>GPL-3.0-or-later
>>./tests/tcg/multiarch/arm-compat-semi/semihosting.c:7: * 
>>SPDX-License-Identifier: GPL-3.0-or-later
>>./tests/tcg/multiarch/arm-compat-semi/semiconsole.c:7: * 
>>SPDX-License-Identifier: GPL-3.0-or-later
>>./tests/tcg/multiarch/float_convd.c:6: * SPDX-License-Identifier: 
>>GPL-3.0-or-later
>>./tests/tcg/multiarch/float_madds.c:6: * SPDX-License-Identifier: 
>>GPL-3.0-or-later
>>./tests/tcg/i386/system/boot.S:10: * SPDX-License-Identifier: GPL-3.0-or-later
>>./tests/tcg/arm/semicall.h:7: * SPDX-License-Identifier: GPL-3.0-or-later
>>
>>Thanks,
>>Peng.
>
>
> Hello Peng,
>
> These are all actually GPL-2.0-or-later; in fact, I can't find the string 
> GPL-3.0-or-later in the current master at all.

See commit 542b10bd148a (tests/tcg: update licenses to GPLv2 as intended).




[PATCH] Update event idx if guest has made extra buffers during double check

2024-06-12 Thread thomas
Fixes: 06b12970174 ("virtio-net: fix network stall under load")

If the guest has made some buffers available during the double check,
but the total buffer size available is lower than @bufsize, notify the
guest with the latest available idx (event idx) seen by the host.
---
 hw/net/virtio-net.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 9c7e85caea..23c6c8c898 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -1654,6 +1654,7 @@ static int virtio_net_has_buffers(VirtIONetQueue *q, int 
bufsize)
 if (virtio_queue_empty(q->rx_vq) ||
 (n->mergeable_rx_bufs &&
  !virtqueue_avail_bytes(q->rx_vq, bufsize, 0))) {
+virtio_queue_set_notification(q->rx_vq, 1);
 return 0;
 }
 }
-- 
2.39.0




Re: [RFC PATCH v2 0/5] Implement ARM PL011 in Rust

2024-06-12 Thread Manos Pitsidianakis

Good morning Daniel,

On Wed, 12 Jun 2024 11:37, "Daniel P. Berrangé"  wrote:

On Tue, Jun 11, 2024 at 01:33:29PM +0300, Manos Pitsidianakis wrote:



 .gitignore |   2 +
 .gitlab-ci.d/buildtest.yml |  64 ++--
 MAINTAINERS|  13 +
 configure  |  12 +
 hw/arm/virt.c  |   4 +
 meson.build| 102 ++
 meson_options.txt  |   4 +
 rust/meson.build   |  93 ++
 rust/pl011/.cargo/config.toml  |   2 +
 rust/pl011/.gitignore  |   2 +
 rust/pl011/Cargo.lock  | 120 +++
 rust/pl011/Cargo.toml  |  66 
 rust/pl011/README.md   |  42 +++
 rust/pl011/build.rs|  44 +++
 rust/pl011/deny.toml   |  57 
 rust/pl011/meson.build |   7 +
 rust/pl011/rustfmt.toml|   1 +
 rust/pl011/src/definitions.rs  |  95 ++
 rust/pl011/src/device.rs   | 531 ++
 rust/pl011/src/device_class.rs |  95 ++
 rust/pl011/src/generated.rs|   5 +
 rust/pl011/src/lib.rs  | 581 +
 rust/pl011/src/memory_ops.rs   |  38 +++
 rust/rustfmt.toml  |   7 +
 rust/wrapper.h |  39 +++
 scripts/cargo_wrapper.py   | 221 +
 scripts/meson-buildoptions.sh  |   6 +


Given the priority of getting the build system correct, what's missing
here is updates/integration into our standard GitLab CI pipeline. If
that can be shown to be working, that'll give a lot more confidence in
the overall solution.

Ideally this should not require anything more than updating the docker
container definitions to add in the rust toolchain, plus the appropriate
std library build for the given target - we cross-compile for every
arch we officially care about.

Most of our dockerfiles these days are managed by lcitool, and it has
nearly sufficient support for cross compiling with the rust std library.
So to start with, this series should modify tests/lcitool/projects/qemu.yml
to add

 - rust
 - rust-std

to the package list, and run 'make lcitool-refresh' to re-create the
dockerfiles - see the docs/devel/testing.rst for more info about
lcitool if needed.
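
Concretely, that would look something like this (a sketch; exact placement
in the package list may differ):

    --- a/tests/lcitool/projects/qemu.yml
    +++ b/tests/lcitool/projects/qemu.yml
    @@ packages:
    + - rust
    + - rust-std

followed by 'make lcitool-refresh' to regenerate the dockerfiles.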

Assuming these 2 rust packages are in the container, I would then
expect QEMU to just "do the right thing" when building this rust
code. If it does not, then that's a sign of gaps that need closing.

Getting rid of the need to use --rust-target-triple will be the
immediate gap that needs fixing, as CI just passes --cross-prefix
for cross-builds and expects everything to be set from that.

The main gap we have is that for Windows I need to update lcitool
to pull in the mingw std lib target for rust, which is something I
missed when adding rust cross compiler support.




Thanks very much for the pointers! I will start dealing with this in the 
next RFC version.


Re: the target triple, I agree 100%. In fact, it wasn't my addition; I 
kept it from the previous Rust RFC patchset that was posted on the list 
some years ago. It should be possible to construct the triples 
ourselves and let the user override them if they want, as mentioned in 
another email.


The Rust project has official Docker images; do you think that's something 
we could use, or is it unnecessary?


https://github.com/rust-lang/docker-rust

Thanks,
Manos



Re: [RFC PATCH v2 3/5] rust: add PL011 device model

2024-06-12 Thread Manos Pitsidianakis

Good morning Paolo,

On Thu, 13 Jun 2024 00:27, Paolo Bonzini  wrote:
On Wed, 12 Jun 2024, 22:58 Manos Pitsidianakis 
<manos.pitsidiana...@linaro.org> wrote:


In any case, it is out of scope for this RFC. Introducing wrappers 
would be a gradual process.




Sure, how would you feel about such bindings being developed on list, 
and maintained in a (somewhat) long-lived experimental branch?


Hm, the only thing that worries me is keeping it synced and postponing 
the merge indefinitely. If we declare the Rust parts as "experimental" we 
could evolve them quickly even on master.  What do the other maintainers 
think?
think?


Thanks,
Manos



Re: Qemu License question

2024-06-12 Thread Manos Pitsidianakis

On Thu, 13 Jun 2024 06:26, Peng Fan  wrote:

Hi All,

The following files are marked as GPL-3.0-or-later. Will these
Conflict with Qemu LICENSE?

Should we update the files to GPL-2.0?

./tests/tcg/aarch64/semicall.h:7: * SPDX-License-Identifier: GPL-3.0-or-later
./tests/tcg/x86_64/system/boot.S:13: * SPDX-License-Identifier: GPL-3.0-or-later
./tests/tcg/riscv64/semicall.h:7: * SPDX-License-Identifier: GPL-3.0-or-later
./tests/tcg/multiarch/float_convs.c:6: * SPDX-License-Identifier: 
GPL-3.0-or-later
./tests/tcg/multiarch/float_helpers.h:6: * SPDX-License-Identifier: 
GPL-3.0-or-later
./tests/tcg/multiarch/libs/float_helpers.c:10: * SPDX-License-Identifier: 
GPL-3.0-or-later
./tests/tcg/multiarch/arm-compat-semi/semihosting.c:7: * 
SPDX-License-Identifier: GPL-3.0-or-later
./tests/tcg/multiarch/arm-compat-semi/semiconsole.c:7: * 
SPDX-License-Identifier: GPL-3.0-or-later
./tests/tcg/multiarch/float_convd.c:6: * SPDX-License-Identifier: 
GPL-3.0-or-later
./tests/tcg/multiarch/float_madds.c:6: * SPDX-License-Identifier: 
GPL-3.0-or-later
./tests/tcg/i386/system/boot.S:10: * SPDX-License-Identifier: GPL-3.0-or-later
./tests/tcg/arm/semicall.h:7: * SPDX-License-Identifier: GPL-3.0-or-later

Thanks,
Peng.



Hello Peng,

These are all actually GPL-2.0-or-later; in fact, I can't find the string 
GPL-3.0-or-later in the current master at all.


Manos



Qemu License question

2024-06-12 Thread Peng Fan
Hi All,

The following files are marked as GPL-3.0-or-later. Will these
Conflict with Qemu LICENSE?

Should we update the files to GPL-2.0?

./tests/tcg/aarch64/semicall.h:7: * SPDX-License-Identifier: GPL-3.0-or-later
./tests/tcg/x86_64/system/boot.S:13: * SPDX-License-Identifier: GPL-3.0-or-later
./tests/tcg/riscv64/semicall.h:7: * SPDX-License-Identifier: GPL-3.0-or-later
./tests/tcg/multiarch/float_convs.c:6: * SPDX-License-Identifier: 
GPL-3.0-or-later
./tests/tcg/multiarch/float_helpers.h:6: * SPDX-License-Identifier: 
GPL-3.0-or-later
./tests/tcg/multiarch/libs/float_helpers.c:10: * SPDX-License-Identifier: 
GPL-3.0-or-later
./tests/tcg/multiarch/arm-compat-semi/semihosting.c:7: * 
SPDX-License-Identifier: GPL-3.0-or-later
./tests/tcg/multiarch/arm-compat-semi/semiconsole.c:7: * 
SPDX-License-Identifier: GPL-3.0-or-later
./tests/tcg/multiarch/float_convd.c:6: * SPDX-License-Identifier: 
GPL-3.0-or-later
./tests/tcg/multiarch/float_madds.c:6: * SPDX-License-Identifier: 
GPL-3.0-or-later
./tests/tcg/i386/system/boot.S:10: * SPDX-License-Identifier: GPL-3.0-or-later
./tests/tcg/arm/semicall.h:7: * SPDX-License-Identifier: GPL-3.0-or-later

Thanks,
Peng.



RE: [PATCH] Hexagon: lldb read/write predicate registers p0/p1/p2/p3

2024-06-12 Thread ltaylorsimpson



> -Original Message-
> From: Matheus Tavares Bernardino 
> Sent: Wednesday, June 12, 2024 12:30 PM
> To: ltaylorsimp...@gmail.com
> Cc: qemu-devel@nongnu.org; bc...@quicinc.com; tedw...@quicinc.com;
> alex.ben...@linaro.org; quic_mathb...@quicinc.com;
> sidn...@quicinc.com; quic_mlie...@quicinc.com;
> richard.hender...@linaro.org; phi...@linaro.org; a...@rev.ng; a...@rev.ng
> Subject: Re: [PATCH] Hexagon: lldb read/write predicate registers
> p0/p1/p2/p3
> 
> On Wed, 12 Jun 2024 10:42:39 -0600 Taylor Simpson
>  wrote:
> >
> > diff --git a/target/hexagon/gdbstub.c b/target/hexagon/gdbstub.c index
> > 502c6987f0..e67e627fc9 100644
> > --- a/target/hexagon/gdbstub.c
> > +++ b/target/hexagon/gdbstub.c
> > @@ -56,6 +64,15 @@ int hexagon_gdb_write_register(CPUState *cs, uint8_t *mem_buf, int n)
> >  return sizeof(target_ulong);
> >  }
> >
> > +n -= TOTAL_PER_THREAD_REGS;
> > +
> > +if (n < NUM_PREGS) {
> > +env->pred[n] = ldtul_p(mem_buf);
> > +return sizeof(uint8_t);
> 
> I wonder, shouldn't this be sizeof(target_ulong) since we wrote a
> target_ulong?

Good question.

From the architecture point of view, predicates are 8 bits (Section 2.2.5 of
the v73 Hexagon PRM).  However, we model them in QEMU as target_ulong
because TCG variables must be either 32 bits or 64 bits.  There isn't an
option for 8 bits.  Whenever we write to a predicate, we do an "and" with
0xff first to ensure only 8 bits are written (see gen_log_pred_write in
target/hexagon/genptr.c).
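
In TCG terms the write boils down to something like this (a minimal sketch
of the masking; the real gen_log_pred_write() in target/hexagon/genptr.c
also handles multiple predicate writes within a packet):

    /* Store val into the predicate's TCG variable, keeping only the
     * architectural 8 bits even though the variable is 32/64 bits wide. */
    static void write_pred_sketch(TCGv pred, TCGv val)
    {
        tcg_gen_andi_tl(pred, val, 0xff);
    }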

I did some more digging and here is what I found:
- Since we have bitsize="8" in hexagon-core.xml, lldb will reject any
attempt to write something larger.
  (lldb) reg write p1 0x1ff
  error: Failed to write register 'p1' with value '0x1ff': value 0x1ff is
too large to fit in a 1 byte unsigned integer value
- For the lldb "reg write" command, the return value from
hexagon_gdb_write_register isn't used.
- The only place the return value is used is in handle_write_all_regs.  This
function is called in response to a "G" packet from the debugger.  I don't
know if/when lldb uses this packet, but it seems like it would count on it
being 8 bits since that's what is in hexagon-core.xml.

Ted , when would lldb generate a "G" packet, and what
assumptions will it make about the size of predicate registers?

Thanks,
Taylor






Re: [RFC PATCH] memory: Introduce memory region fetch operation

2024-06-12 Thread Ethan Chen via
On Wed, Jun 12, 2024 at 01:43:41PM +0100, Peter Maydell wrote:
> 
> On Wed, 12 Jun 2024 at 10:02, Ethan Chen via  wrote:
> >
> > Allow the memory region to have different behaviors for read and fetch
> > operations.
> >
> > For example, RISC-V IOPMP will raise an interrupt when the CPU tries to
> > fetch from a non-executable region.
> 
> It actually raises an interrupt rather than it being a permissions fault?

The device can return a bus error, an interrupt, or success with fabricated data.

> 
> > If the fetch operation of a memory region is not implemented, it still uses
> > the read operation for fetches.
> 
> This patch should probably be part of the series with the device that
> needs it.

I will add this patch to IOPMP patch series.

Thanks,
Ethan
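
For readers following along, a hypothetical sketch of the shape such an
operation could take (the .fetch member and the handler names are invented
for illustration; they are not upstream API):

    static const MemoryRegionOps iopmp_ops_sketch = {
        .read  = iopmp_read,    /* data loads */
        .write = iopmp_write,   /* data stores */
        /* Hypothetical: a separate handler for instruction fetches, so the
         * device can return a bus error, raise an interrupt, or fabricate
         * data for non-executable regions.  Left NULL, fetches would fall
         * back to .read, preserving today's behavior. */
        .fetch = iopmp_fetch,
        .endianness = DEVICE_NATIVE_ENDIAN,
    };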



[PATCH v3 1/3] hw/dma: Enhance error handling in loading description

2024-06-12 Thread Fea.Wang
Loading a description from memory may cause a bus error. In this
case, the DMA should stop working, set the error flag, and return
the failure value.

When calling the description-loading function, be aware that it may
return a failure value. Breaking the loop in that case is one of the
possible ways to handle it.

Signed-off-by: Fea.Wang 
Reviewed-by: Frank Chang 
---
 hw/dma/xilinx_axidma.c | 30 ++
 1 file changed, 26 insertions(+), 4 deletions(-)

diff --git a/hw/dma/xilinx_axidma.c b/hw/dma/xilinx_axidma.c
index 0ae056ed06..ad307994c2 100644
--- a/hw/dma/xilinx_axidma.c
+++ b/hw/dma/xilinx_axidma.c
@@ -71,8 +71,11 @@ enum {
 enum {
 DMASR_HALTED = 1,
 DMASR_IDLE  = 2,
+DMASR_SLVERR = 1 << 5,
+DMASR_DECERR = 1 << 6,
 DMASR_IOC_IRQ  = 1 << 12,
 DMASR_DLY_IRQ  = 1 << 13,
+DMASR_ERR_IRQ  = 1 << 14,
 
 DMASR_IRQ_MASK = 7 << 12
 };
@@ -190,17 +193,32 @@ static inline int streamid_from_addr(hwaddr addr)
 return sid;
 }
 
-static void stream_desc_load(struct Stream *s, hwaddr addr)
+static MemTxResult stream_desc_load(struct Stream *s, hwaddr addr)
 {
 struct SDesc *d = &s->desc;
 
-address_space_read(&s->dma->as, addr, MEMTXATTRS_UNSPECIFIED, d, sizeof *d);
+MemTxResult result = address_space_read(&s->dma->as,
+addr, MEMTXATTRS_UNSPECIFIED,
+d, sizeof *d);
+if (result != MEMTX_OK) {
+if (result == MEMTX_DECODE_ERROR) {
+s->regs[R_DMASR] |= DMASR_DECERR;
+} else {
+s->regs[R_DMASR] |= DMASR_SLVERR;
+}
+
+s->regs[R_DMACR] &= ~DMACR_RUNSTOP;
+s->regs[R_DMASR] |= DMASR_HALTED;
+s->regs[R_DMASR] |= DMASR_ERR_IRQ;
+return result;
+}
 
 /* Convert from LE into host endianness.  */
 d->buffer_address = le64_to_cpu(d->buffer_address);
 d->nxtdesc = le64_to_cpu(d->nxtdesc);
 d->control = le32_to_cpu(d->control);
 d->status = le32_to_cpu(d->status);
+return result;
 }
 
 static void stream_desc_store(struct Stream *s, hwaddr addr)
@@ -279,7 +297,9 @@ static void stream_process_mem2s(struct Stream *s, StreamSink *tx_data_dev,
 }
 
 while (1) {
-stream_desc_load(s, s->regs[R_CURDESC]);
+if (MEMTX_OK != stream_desc_load(s, s->regs[R_CURDESC])) {
+break;
+}
 
 if (s->desc.status & SDESC_STATUS_COMPLETE) {
 s->regs[R_DMASR] |= DMASR_HALTED;
@@ -336,7 +356,9 @@ static size_t stream_process_s2mem(struct Stream *s, unsigned char *buf,
 }
 
 while (len) {
-stream_desc_load(s, s->regs[R_CURDESC]);
+if (MEMTX_OK != stream_desc_load(s, s->regs[R_CURDESC])) {
+break;
+}
 
 if (s->desc.status & SDESC_STATUS_COMPLETE) {
 s->regs[R_DMASR] |= DMASR_HALTED;
-- 
2.34.1




[PATCH v3 2/3] hw/dma: Add a trace log for a description loading failure

2024-06-12 Thread Fea.Wang
Add a trace log for description loading failures, to make observing
the DMA behavior easy.

Signed-off-by: Fea.Wang 
Reviewed-by: Edgar E. Iglesias 
Reviewed-by: Frank Chang 
---
 hw/dma/trace-events| 3 +++
 hw/dma/xilinx_axidma.c | 3 +++
 2 files changed, 6 insertions(+)

diff --git a/hw/dma/trace-events b/hw/dma/trace-events
index 3c47df54e4..4c09790f9a 100644
--- a/hw/dma/trace-events
+++ b/hw/dma/trace-events
 pl330_debug_exec_stall(void) "stall of debug instruction not implemented"
 pl330_iomem_write(uint32_t offset, uint32_t value) "addr: 0x%08"PRIx32" data: 0x%08"PRIx32
 pl330_iomem_write_clr(int i) "event interrupt lowered %d"
 pl330_iomem_read(uint32_t addr, uint32_t data) "addr: 0x%08"PRIx32" data: 0x%08"PRIx32
+
+# xilinx_axidma.c
+xilinx_axidma_loading_desc_fail(uint32_t res) "error:%u"
diff --git a/hw/dma/xilinx_axidma.c b/hw/dma/xilinx_axidma.c
index ad307994c2..c9cfc3169b 100644
--- a/hw/dma/xilinx_axidma.c
+++ b/hw/dma/xilinx_axidma.c
@@ -36,6 +36,7 @@
 #include "sysemu/dma.h"
 #include "hw/stream.h"
 #include "qom/object.h"
+#include "trace.h"
 
 #define D(x)
 
@@ -201,6 +202,8 @@ static MemTxResult stream_desc_load(struct Stream *s, hwaddr addr)
 addr, MEMTXATTRS_UNSPECIFIED,
 d, sizeof *d);
 if (result != MEMTX_OK) {
+trace_xilinx_axidma_loading_desc_fail(result);
+
 if (result == MEMTX_DECODE_ERROR) {
 s->regs[R_DMASR] |= DMASR_DECERR;
 } else {
-- 
2.34.1




[PATCH v3 0/3] hw/dma: Add error handling for loading descriptions failing

2024-06-12 Thread Fea.Wang
The original code assumes that memory transmission is always successful,
but in some cases it gets a bus error.

Add error handling for DMA description-loading failures: apply some
reasonable settings, return the corrected transmission size, and
finally return the error status.

* Fix the trace format for an unsigned variable

base-commit: d82f37faf5643897b2e61abb229499d64a51aa26

[v2]
* Add DMASR_DECERR case
* Squash the two commits to one

base-commit: 915758c537b5fe09575291f4acd87e2d377a93de


[v1]
base-commit: 1806da76cb81088ea026ca3441551782b850e393

Fea.Wang (3):
  hw/dma: Enhance error handling in loading description
  hw/dma: Add a trace log for a description loading failure
  hw/net: Fix the transmission return size

 hw/dma/trace-events |  3 +++
 hw/dma/xilinx_axidma.c  | 33 +
 hw/net/xilinx_axienet.c |  2 +-
 3 files changed, 33 insertions(+), 5 deletions(-)

-- 
2.34.1




[PATCH v3 3/3] hw/net: Fix the transmission return size

2024-06-12 Thread Fea.Wang
Fix the transmission return size, because not all bytes may be
transmitted successfully. Return the length actually received instead
of a constant value.

Signed-off-by: Fea.Wang 
Reviewed-by: Edgar E. Iglesias 
Reviewed-by: Frank Chang 
---
 hw/net/xilinx_axienet.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/net/xilinx_axienet.c b/hw/net/xilinx_axienet.c
index 7d1fd37b4a..05d41bd548 100644
--- a/hw/net/xilinx_axienet.c
+++ b/hw/net/xilinx_axienet.c
@@ -847,7 +847,7 @@ static ssize_t eth_rx(NetClientState *nc, const uint8_t *buf, size_t size)
 axienet_eth_rx_notify(s);
 
 enet_update_irq(s);
-return size;
+return s->rxpos;
 }
 
 static size_t
-- 
2.34.1




Re: [PATCH v2 2/3] hw/dma: Add a trace log for a description loading failure

2024-06-12 Thread Fea Wang
Sure, I will fix it in the next patch series.
Thank you

Sincerely,
Fea

On Mon, Jun 10, 2024 at 7:49 PM Philippe Mathieu-Daudé 
wrote:

> Hi Fea,
>
> On 4/6/24 09:15, Fea.Wang wrote:
> > Due to a description loading failure, adding a trace log makes observing
> > the DMA behavior easy.
> >
> > Signed-off-by: Fea.Wang 
> > Reviewed-by: Edgar E. Iglesias 
> > Reviewed-by: Frank Chang 
> > ---
> >   hw/dma/trace-events| 3 +++
> >   hw/dma/xilinx_axidma.c | 3 +++
> >   2 files changed, 6 insertions(+)
>
>
> > +# xilinx_axidma.c
> > +xilinx_axidma_loading_desc_fail(uint32_t res) "error:%d"
>
> Unsigned format is "%u".
>
> Regards,
>
> Phil.
>


RE: [PATCH v8 4/7] migration/multifd: add qpl compression method

2024-06-12 Thread Liu, Yuan1
> -Original Message-
> From: Fabiano Rosas 
> Sent: Wednesday, June 12, 2024 10:31 PM
> To: Liu, Yuan1 ; pet...@redhat.com;
> pbonz...@redhat.com; marcandre.lur...@redhat.com; berra...@redhat.com;
> th...@redhat.com; phi...@linaro.org
> Cc: qemu-devel@nongnu.org; Liu, Yuan1 ; Zou, Nanhai
> ; shameerali.kolothum.th...@huawei.com
> Subject: Re: [PATCH v8 4/7] migration/multifd: add qpl compression method
> 
> Fabiano Rosas  writes:
> 
> > Yuan Liu  writes:
> >
> >> add the Query Processing Library (QPL) compression method
> >>
> >> Introduce the qpl as a new multifd migration compression method, it can
> >> use In-Memory Analytics Accelerator(IAA) to accelerate compression and
> >> decompression, which can not only reduce network bandwidth requirement
> >> but also reduce host compression and decompression CPU overhead.
> >>
> >> How to enable qpl compression during migration:
> >> migrate_set_parameter multifd-compression qpl
> >>
> >> There is no qpl compression level parameter added since it only
> supports
> >> level one, users do not need to specify the qpl compression level.
> >>
> >> Signed-off-by: Yuan Liu 
> >> Reviewed-by: Nanhai Zou 
> >> Reviewed-by: Peter Xu 
> >> Reviewed-by: Fabiano Rosas 
> >
> > I don't think I ever reviewed this patch. Please drop this when you
> > resubmit.
> 
> Actually, just leave it. I thought you'd need to fix the output size on
> 6/7, but I saw you just moved it elsewhere. So no need to respin. I'll
> queue this version shortly unless anyone else has comments.

Got it and thank you for your review.
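
For reference, the enablement flow from the quoted commit message, as HMP
monitor commands (the destination URI is a placeholder):

    (qemu) migrate_set_capability multifd on
    (qemu) migrate_set_parameter multifd-compression qpl
    (qemu) migrate -d tcp:<dest-host>:<port>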



RE: [PATCH v8 4/7] migration/multifd: add qpl compression method

2024-06-12 Thread Liu, Yuan1
> -Original Message-
> From: Fabiano Rosas 
> Sent: Wednesday, June 12, 2024 10:27 PM
> To: Liu, Yuan1 ; pet...@redhat.com;
> pbonz...@redhat.com; marcandre.lur...@redhat.com; berra...@redhat.com;
> th...@redhat.com; phi...@linaro.org
> Cc: qemu-devel@nongnu.org; Liu, Yuan1 ; Zou, Nanhai
> ; shameerali.kolothum.th...@huawei.com
> Subject: Re: [PATCH v8 4/7] migration/multifd: add qpl compression method
> 
> Yuan Liu  writes:
> 
> > add the Query Processing Library (QPL) compression method
> >
> > Introduce the qpl as a new multifd migration compression method, it can
> > use In-Memory Analytics Accelerator(IAA) to accelerate compression and
> > decompression, which can not only reduce network bandwidth requirement
> > but also reduce host compression and decompression CPU overhead.
> >
> > How to enable qpl compression during migration:
> > migrate_set_parameter multifd-compression qpl
> >
> > There is no qpl compression level parameter added since it only supports
> > level one, users do not need to specify the qpl compression level.
> >
> > Signed-off-by: Yuan Liu 
> > Reviewed-by: Nanhai Zou 
> > Reviewed-by: Peter Xu 
> > Reviewed-by: Fabiano Rosas 
> 
> I don't think I ever reviewed this patch. Please drop this when you
> resubmit.

You are right, this is my mistake, I am very sorry.



Re: [RFC PATCH v2 0/2] ui/gtk: Introduce new param - Connectors

2024-06-12 Thread Kim, Dongwon

On 6/11/2024 11:42 PM, Marc-André Lureau wrote:

Hi

On Tue, Jun 11, 2024 at 10:28 PM Kim, Dongwon  wrote:


Hi Marc-André,

On 6/5/2024 12:26 AM, Marc-André Lureau wrote:
 > Hi
 >
 > On Tue, Jun 4, 2024 at 9:59 PM Kim, Dongwon <dongwon@intel.com> wrote:

 > Xorg may not be going away soon, but it's used less and less. As
one of
 > the developers, I am no longer running/testing it for a long time. I
 > wish we would just drop its support tbh.

There are features offered by Xorg that are not offered by Wayland
compositors and, again, we have customers that rely on these features.
One of them is the ability to position the window via
gtk_window_set_position(). There are strong arguments
made on either side when it comes to window positioning:
https://gitlab.freedesktop.org/wayland/wayland-protocols/-/merge_requests/247

Until there is a way to do this with Wayland compositors, we have to
unfortunately rely on Gnome + Xorg.


It's a smaller and smaller number of users. The potential/future users 
are greater if we focus on Wayland.


Right, but until Gtk + Wayland offers the same feature parity and 
customization as that of Gtk + Xorg, there will be distros/users that 
will keep it alive.


Fwiw, GNOME (and RHEL) is set to drop Xorg support 
(https://gitlab.gnome.org/GNOME/gnome-session/-/merge_requests/98)


Doesn't look like it is going to happen anytime soon given the massive 
pushback.




Btw, there is a similar monitor-mapping functionality implemented in 
virt-viewer/remote-viewer:
https://www.mankier.com/1/virt-viewer#Configuration. Is this something 
that those users could use instead?


It looks a bit similar and interesting, but one difference is that our 
feature uses monitor labels such as DP-1 and HDMI-2, which is a bit more 
intuitive. The other key difference is that our feature includes 
"hotplug" functionality, where a guest display/window is deeply tied to a
physical monitor to make it appear to the guest that it is dealing with 
a real physical monitor.


In other words, when the physical monitor is unplugged, the associated 
guest display/window gets destroyed/hidden and gets recreated/shown when 
the monitor is hotplugged again.





--
Marc-André Lureau





Re: [PULL 0/6] Tracing patches

2024-06-12 Thread Richard Henderson

On 6/10/24 10:13, Stefan Hajnoczi wrote:

The following changes since commit 80e8f0602168f451a93e71cbb1d59e93d745e62e:

   Merge tag 'bsd-user-misc-2024q2-pull-request' of gitlab.com:bsdimp/qemu into 
staging (2024-06-09 11:21:55 -0700)

are available in the Git repository at:

   https://gitlab.com/stefanha/qemu.git  tags/tracing-pull-request

for you to fetch changes up to 4c2b6f328742084a5bd770af7c3a2ef07828c41c:

   tracetool: Forbid newline character in event format (2024-06-10 13:05:27 
-0400)


Pull request

Cleanups from Philippe Mathieu-Daudé.


Applied, thanks.  Please update https://wiki.qemu.org/ChangeLog/9.1 as 
appropriate.


r~




Re: [PATCH 21/32] hw/sd: Add mmc switch function support

2024-06-12 Thread Philippe Mathieu-Daudé

On 3/7/23 15:24, Cédric Le Goater wrote:

From: Sai Pavan Boddu 

The switch operation in MMC cards updates the ext_csd register to
request changes in card operations. Here we implement a similar
sequence, but requests are mostly dummy and make no change.

Implement SWITCH_ERROR if the write operation offset goes beyond the
length of ext_csd.

Signed-off-by: Sai Pavan Boddu 
Signed-off-by: Edgar E. Iglesias 
[ clg: - ported on SDProto framework ]
Signed-off-by: Cédric Le Goater 
---
  hw/sd/sd.c | 52 
  1 file changed, 52 insertions(+)




+static void mmc_function_switch(SDState *sd, uint32_t arg)
+{
+uint32_t access = extract32(arg, 24, 2);
+uint32_t index = extract32(arg, 16, 8);
+uint32_t value = extract32(arg, 8, 8);
+uint8_t b = sd->ext_csd[index];


This field is added in the next patch :)

../../hw/sd/sd.c:927:21: error: no member named 'ext_csd' in 'struct SDState'
uint8_t b = sd->ext_csd[index];
~~  ^
../../hw/sd/sd.c:949:9: error: no member named 'ext_csd' in 'struct SDState'
sd->ext_csd[index] = b;
~~  ^

No need to respin, as I'm integrating your work.


+switch (access) {
+case MMC_CMD6_ACCESS_COMMAND_SET:
+qemu_log_mask(LOG_UNIMP, "MMC Command set switching not supported\n");
+return;
+case MMC_CMD6_ACCESS_SET_BITS:
+b |= value;
+break;
+case MMC_CMD6_ACCESS_CLEAR_BITS:
+b &= ~value;
+break;
+case MMC_CMD6_ACCESS_WRITE_BYTE:
+b = value;
+break;
+}
+
+if (index >= 192) {
+sd->card_status |= R_CSR_SWITCH_ERROR_MASK;
+return;
+}
+
+sd->ext_csd[index] = b;
+}





Re: [PATCH 20/32] hw/sd: Add CMD21 tuning sequence

2024-06-12 Thread Philippe Mathieu-Daudé

On 13/6/24 00:15, Philippe Mathieu-Daudé wrote:

On 3/7/23 15:24, Cédric Le Goater wrote:

From: Sai Pavan Boddu 

MMC cards support a different tuning sequence for entering HS200 mode.

Signed-off-by: Sai Pavan Boddu 
Signed-off-by: Edgar E. Iglesias 
[ clg: - ported on QEMU 7.0 ]
Signed-off-by: Cédric Le Goater 
---
  hw/sd/sd.c | 40 
  1 file changed, 40 insertions(+)

diff --git a/hw/sd/sd.c b/hw/sd/sd.c
index 4b4a4cda2e68..7332f7a18435 100644
--- a/hw/sd/sd.c
+++ b/hw/sd/sd.c
@@ -2017,6 +2017,30 @@ static const uint8_t sd_tuning_block_pattern[SD_TUNING_BLOCK_SIZE] = {

  0xbb, 0xff, 0xf7, 0xff, 0xf7, 0x7f, 0x7b, 0xde,
  };
+#define EXCSD_BUS_WIDTH_OFFSET 183
+#define BUS_WIDTH_8_MASK    0x4
+#define BUS_WIDTH_4_MASK    0x2
+#define MMC_TUNING_BLOCK_SIZE   128
+
+static const uint8_t mmc_tuning_block_pattern[128] = {
+   0xff, 0xff, 0x00, 0xff, 0xff, 0xff, 0x00, 0x00,
+   0xff, 0xff, 0xcc, 0xcc, 0xcc, 0x33, 0xcc, 0xcc,
+   0xcc, 0x33, 0x33, 0xcc, 0xcc, 0xcc, 0xff, 0xff,
+   0xff, 0xee, 0xff, 0xff, 0xff, 0xee, 0xee, 0xff,
+   0xff, 0xff, 0xdd, 0xff, 0xff, 0xff, 0xdd, 0xdd,
+   0xff, 0xff, 0xff, 0xbb, 0xff, 0xff, 0xff, 0xbb,
+   0xbb, 0xff, 0xff, 0xff, 0x77, 0xff, 0xff, 0xff,
+   0x77, 0x77, 0xff, 0x77, 0xbb, 0xdd, 0xee, 0xff,
+   0xff, 0xff, 0xff, 0x00, 0xff, 0xff, 0xff, 0x00,
+   0x00, 0xff, 0xff, 0xcc, 0xcc, 0xcc, 0x33, 0xcc,
+   0xcc, 0xcc, 0x33, 0x33, 0xcc, 0xcc, 0xcc, 0xff,
+   0xff, 0xff, 0xee, 0xff, 0xff, 0xff, 0xee, 0xee,
+   0xff, 0xff, 0xff, 0xdd, 0xff, 0xff, 0xff, 0xdd,
+   0xdd, 0xff, 0xff, 0xff, 0xbb, 0xff, 0xff, 0xff,
+   0xbb, 0xbb, 0xff, 0xff, 0xff, 0x77, 0xff, 0xff,
+   0xff, 0x77, 0x77, 0xff, 0x77, 0xbb, 0xdd, 0xee,
+};
+
  uint8_t sd_read_byte(SDState *sd)
  {
  /* TODO: Append CRCs */
@@ -2103,6 +2127,22 @@ uint8_t sd_read_byte(SDState *sd)
  ret = sd_tuning_block_pattern[sd->data_offset++];
  break;
+    case 21:    /* CMD21: SEND_TUNING_BLOCK (MMC) */


This can be accessed in SPI/SD modes; should we check for eMMC then?


This could do:

-- >8 --
diff --git a/hw/sd/sd.c b/hw/sd/sd.c
index 3c12ba2ad3..5bad19c766 100644
--- a/hw/sd/sd.c
+++ b/hw/sd/sd.c
@@ -160,12 +160,18 @@ static const struct SDProto *sd_proto(SDState *sd)
 }

 static const SDProto sd_proto_spi;
+static const SDProto sd_proto_emmc;

 static bool sd_is_spi(SDState *sd)
 {
return sd_proto(sd) == &sd_proto_spi;
 }

+static bool sd_is_emmc(SDState *sd)
+{
+return sd_proto(sd) == &sd_proto_emmc;
+}
+
 static const char *sd_version_str(enum SDPhySpecificationVersion version)
 {
 static const char *sdphy_version[] = {
@@ -2389,7 +2395,9 @@ uint8_t sd_read_byte(SDState *sd)
 break;

 case 21:/* CMD21: SEND_TUNING_BLOCK (MMC) */
+if (!sd_is_emmc(sd)) {
+return 0x00;
+}
 if (sd->data_offset >= MMC_TUNING_BLOCK_SIZE - 1) {
 sd->state = sd_transfer_state;
 }
---


Similarly, other cases previous to the eMMC introduction only expect
SPI/SD but don't check for it. I need to think a bit more about how to
handle that.


+    if (sd->data_offset >= MMC_TUNING_BLOCK_SIZE - 1) {
+    sd->state = sd_transfer_state;
+    }
+    if (sd->ext_csd[EXCSD_BUS_WIDTH_OFFSET] & BUS_WIDTH_8_MASK) {
+    ret = mmc_tuning_block_pattern[sd->data_offset++];
+    } else {
+    /*
+ * Return LSB Nibbles of two byte from the 8bit tuning
+ * block for 4bit mode
+ */
+    ret = mmc_tuning_block_pattern[sd->data_offset++] & 0x0F;
+    ret |= (mmc_tuning_block_pattern[sd->data_offset++] & 0x0F) << 4;

+    }
+    break;
+
  case 22:  /* ACMD22: SEND_NUM_WR_BLOCKS */
  ret = sd->data[sd->data_offset ++];







Re: [PATCH 26/32] hw/sd: Fix SET_BLOCK_COUNT command argument

2024-06-12 Thread Philippe Mathieu-Daudé

On 3/7/23 15:25, Cédric Le Goater wrote:

The number of blocks is defined in the lower bits [15:0].

TODO: This needs to be more precise on the spec version.

Signed-off-by: Cédric Le Goater 
---
  hw/sd/sd.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/sd/sd.c b/hw/sd/sd.c
index c4c9e9ee7999..7f07d0e99d15 100644
--- a/hw/sd/sd.c
+++ b/hw/sd/sd.c
@@ -1282,7 +1282,7 @@ static sd_rsp_type_t sd_cmd_SET_BLOCK_COUNT(SDState *sd, SDRequest req)
  return sd_invalid_state_for_cmd(sd, req);
  }
  
-sd->multi_blk_cnt = req.arg;
+sd->multi_blk_cnt = req.arg & 0xffff;


In the SD Physical Layer spec v9.10 this field is still 32-bit
(see table 4-24, p. 104).

Should we use a sd_is_emmc() helper similar to sd_is_spi()?

  
  return sd_r1;

  }
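
A sketch of how such a helper could gate the mask (hypothetical; assumes
the sd_is_emmc() helper suggested above):

    /* eMMC defines the block count in bits [15:0]; plain SD keeps the
     * full 32-bit argument per Physical Layer spec v9.10, table 4-24. */
    sd->multi_blk_cnt = sd_is_emmc(sd) ? (req.arg & 0xffff) : req.arg;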





Re: [PATCH 20/32] hw/sd: Add CMD21 tuning sequence

2024-06-12 Thread Philippe Mathieu-Daudé

On 3/7/23 15:24, Cédric Le Goater wrote:

From: Sai Pavan Boddu 

MMC cards support a different tuning sequence for entering HS200 mode.

Signed-off-by: Sai Pavan Boddu 
Signed-off-by: Edgar E. Iglesias 
[ clg: - ported on QEMU 7.0 ]
Signed-off-by: Cédric Le Goater 
---
  hw/sd/sd.c | 40 
  1 file changed, 40 insertions(+)

diff --git a/hw/sd/sd.c b/hw/sd/sd.c
index 4b4a4cda2e68..7332f7a18435 100644
--- a/hw/sd/sd.c
+++ b/hw/sd/sd.c
@@ -2017,6 +2017,30 @@ static const uint8_t sd_tuning_block_pattern[SD_TUNING_BLOCK_SIZE] = {
  0xbb, 0xff, 0xf7, 0xff, 0xf7, 0x7f, 0x7b, 0xde,
  };
  
+#define EXCSD_BUS_WIDTH_OFFSET 183

+#define BUS_WIDTH_8_MASK0x4
+#define BUS_WIDTH_4_MASK0x2
+#define MMC_TUNING_BLOCK_SIZE   128
+
+static const uint8_t mmc_tuning_block_pattern[128] = {
+   0xff, 0xff, 0x00, 0xff, 0xff, 0xff, 0x00, 0x00,
+   0xff, 0xff, 0xcc, 0xcc, 0xcc, 0x33, 0xcc, 0xcc,
+   0xcc, 0x33, 0x33, 0xcc, 0xcc, 0xcc, 0xff, 0xff,
+   0xff, 0xee, 0xff, 0xff, 0xff, 0xee, 0xee, 0xff,
+   0xff, 0xff, 0xdd, 0xff, 0xff, 0xff, 0xdd, 0xdd,
+   0xff, 0xff, 0xff, 0xbb, 0xff, 0xff, 0xff, 0xbb,
+   0xbb, 0xff, 0xff, 0xff, 0x77, 0xff, 0xff, 0xff,
+   0x77, 0x77, 0xff, 0x77, 0xbb, 0xdd, 0xee, 0xff,
+   0xff, 0xff, 0xff, 0x00, 0xff, 0xff, 0xff, 0x00,
+   0x00, 0xff, 0xff, 0xcc, 0xcc, 0xcc, 0x33, 0xcc,
+   0xcc, 0xcc, 0x33, 0x33, 0xcc, 0xcc, 0xcc, 0xff,
+   0xff, 0xff, 0xee, 0xff, 0xff, 0xff, 0xee, 0xee,
+   0xff, 0xff, 0xff, 0xdd, 0xff, 0xff, 0xff, 0xdd,
+   0xdd, 0xff, 0xff, 0xff, 0xbb, 0xff, 0xff, 0xff,
+   0xbb, 0xbb, 0xff, 0xff, 0xff, 0x77, 0xff, 0xff,
+   0xff, 0x77, 0x77, 0xff, 0x77, 0xbb, 0xdd, 0xee,
+};
+
  uint8_t sd_read_byte(SDState *sd)
  {
  /* TODO: Append CRCs */
@@ -2103,6 +2127,22 @@ uint8_t sd_read_byte(SDState *sd)
  ret = sd_tuning_block_pattern[sd->data_offset++];
  break;
  
+case 21:/* CMD21: SEND_TUNING_BLOCK (MMC) */


This can be accessed in SPI/SD modes; should we check for eMMC then?

Similarly, other cases previous to the eMMC introduction only expect
SPI/SD but don't check for it. I need to think a bit more about how to
handle that.


+if (sd->data_offset >= MMC_TUNING_BLOCK_SIZE - 1) {
+sd->state = sd_transfer_state;
+}
+if (sd->ext_csd[EXCSD_BUS_WIDTH_OFFSET] & BUS_WIDTH_8_MASK) {
+ret = mmc_tuning_block_pattern[sd->data_offset++];
+} else {
+/*
+ * Return LSB Nibbles of two byte from the 8bit tuning
+ * block for 4bit mode
+ */
+ret = mmc_tuning_block_pattern[sd->data_offset++] & 0x0F;
+ret |= (mmc_tuning_block_pattern[sd->data_offset++] & 0x0F) << 4;
+}
+break;
+
  case 22:  /* ACMD22: SEND_NUM_WR_BLOCKS */
  ret = sd->data[sd->data_offset ++];
  





[PULL 1/5] xen: mapcache: Make MCACHE_BUCKET_SHIFT runtime configurable

2024-06-12 Thread Stefano Stabellini
From: "Edgar E. Iglesias" 

Make MCACHE_BUCKET_SHIFT runtime configurable per cache instance.

Signed-off-by: Edgar E. Iglesias 
Reviewed-by: Stefano Stabellini 
Reviewed-by: Philippe Mathieu-Daudé 
---
 hw/xen/xen-mapcache.c | 54 ++-
 1 file changed, 33 insertions(+), 21 deletions(-)

diff --git a/hw/xen/xen-mapcache.c b/hw/xen/xen-mapcache.c
index fa6813b1ad..bc860f4373 100644
--- a/hw/xen/xen-mapcache.c
+++ b/hw/xen/xen-mapcache.c
@@ -23,13 +23,10 @@
 
 
 #if HOST_LONG_BITS == 32
-#  define MCACHE_BUCKET_SHIFT 16
 #  define MCACHE_MAX_SIZE (1UL<<31) /* 2GB Cap */
 #else
-#  define MCACHE_BUCKET_SHIFT 20
 #  define MCACHE_MAX_SIZE (1UL<<35) /* 32GB Cap */
 #endif
-#define MCACHE_BUCKET_SIZE (1UL << MCACHE_BUCKET_SHIFT)
 
 /* This is the size of the virtual address space reserve to QEMU that will not
  * be use by MapCache.
@@ -65,7 +62,8 @@ typedef struct MapCache {
 /* For most cases (>99.9%), the page address is the same. */
 MapCacheEntry *last_entry;
 unsigned long max_mcache_size;
-unsigned int mcache_bucket_shift;
+unsigned int bucket_shift;
+unsigned long bucket_size;
 
 phys_offset_to_gaddr_t phys_offset_to_gaddr;
 QemuMutex lock;
@@ -95,11 +93,14 @@ static inline int test_bits(int nr, int size, const unsigned long *addr)
 
 static MapCache *xen_map_cache_init_single(phys_offset_to_gaddr_t f,
void *opaque,
+   unsigned int bucket_shift,
unsigned long max_size)
 {
 unsigned long size;
 MapCache *mc;
 
+assert(bucket_shift >= XC_PAGE_SHIFT);
+
 mc = g_new0(MapCache, 1);
 
 mc->phys_offset_to_gaddr = f;
@@ -108,12 +109,14 @@ static MapCache *xen_map_cache_init_single(phys_offset_to_gaddr_t f,
 
QTAILQ_INIT(&mc->locked_entries);
 
+mc->bucket_shift = bucket_shift;
+mc->bucket_size = 1UL << bucket_shift;
 mc->max_mcache_size = max_size;
 
 mc->nr_buckets =
 (((mc->max_mcache_size >> XC_PAGE_SHIFT) +
-  (1UL << (MCACHE_BUCKET_SHIFT - XC_PAGE_SHIFT)) - 1) >>
- (MCACHE_BUCKET_SHIFT - XC_PAGE_SHIFT));
+  (1UL << (bucket_shift - XC_PAGE_SHIFT)) - 1) >>
+ (bucket_shift - XC_PAGE_SHIFT));
 
 size = mc->nr_buckets * sizeof(MapCacheEntry);
 size = (size + XC_PAGE_SIZE - 1) & ~(XC_PAGE_SIZE - 1);
@@ -126,6 +129,13 @@ void xen_map_cache_init(phys_offset_to_gaddr_t f, void *opaque)
 {
 struct rlimit rlimit_as;
 unsigned long max_mcache_size;
+unsigned int bucket_shift;
+
+if (HOST_LONG_BITS == 32) {
+bucket_shift = 16;
+} else {
+bucket_shift = 20;
+}
 
 if (geteuid() == 0) {
 rlimit_as.rlim_cur = RLIM_INFINITY;
@@ -146,7 +156,9 @@ void xen_map_cache_init(phys_offset_to_gaddr_t f, void *opaque)
 }
 }
 
-mapcache = xen_map_cache_init_single(f, opaque, max_mcache_size);
+mapcache = xen_map_cache_init_single(f, opaque,
+ bucket_shift,
+ max_mcache_size);
setrlimit(RLIMIT_AS, &rlimit_as);
 }
 
@@ -195,7 +207,7 @@ static void xen_remap_bucket(MapCache *mc,
 entry->valid_mapping = NULL;
 
 for (i = 0; i < nb_pfn; i++) {
-pfns[i] = (address_index << (MCACHE_BUCKET_SHIFT-XC_PAGE_SHIFT)) + i;
+pfns[i] = (address_index << (mc->bucket_shift - XC_PAGE_SHIFT)) + i;
 }
 
 /*
@@ -266,8 +278,8 @@ static uint8_t *xen_map_cache_unlocked(MapCache *mc,
 bool dummy = false;
 
 tryagain:
-address_index  = phys_addr >> MCACHE_BUCKET_SHIFT;
-address_offset = phys_addr & (MCACHE_BUCKET_SIZE - 1);
+address_index  = phys_addr >> mc->bucket_shift;
+address_offset = phys_addr & (mc->bucket_size - 1);
 
 trace_xen_map_cache(phys_addr);
 
@@ -294,14 +306,14 @@ tryagain:
 return mc->last_entry->vaddr_base + address_offset;
 }
 
-/* size is always a multiple of MCACHE_BUCKET_SIZE */
+/* size is always a multiple of mc->bucket_size */
 if (size) {
 cache_size = size + address_offset;
-if (cache_size % MCACHE_BUCKET_SIZE) {
-cache_size += MCACHE_BUCKET_SIZE - (cache_size % MCACHE_BUCKET_SIZE);
+if (cache_size % mc->bucket_size) {
+cache_size += mc->bucket_size - (cache_size % mc->bucket_size);
 }
 } else {
-cache_size = MCACHE_BUCKET_SIZE;
+cache_size = mc->bucket_size;
 }
 
entry = &mc->entry[address_index % mc->nr_buckets];
@@ -422,7 +434,7 @@ static ram_addr_t xen_ram_addr_from_mapcache_single(MapCache *mc, void *ptr)
 trace_xen_ram_addr_from_mapcache_not_in_cache(ptr);
 raddr = RAM_ADDR_INVALID;
 } else {
-raddr = (reventry->paddr_index << MCACHE_BUCKET_SHIFT) +
+raddr = (reventry->paddr_index << mc->bucket_shift) +
  ((unsigned long) ptr - (unsigned long) entry->vaddr_base);
 }
 

Re: [PATCH 0/2] Bug fixes for plugins

2024-06-12 Thread Alex Bennée
Pierrick Bouvier  writes:

> Those two commits are meant to be applied on top of current series
> <20240612153508.1532940-1-alex.ben...@linaro.org>
>
> The first fixes issue with windows plugins, while the second is a bug that
> concerns memory callbacks, and due to a previous refactoring.

Queued to maintainer/june-2024-omnibus, thanks.

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro



[PULL 2/5] xen: mapcache: Unmap first entries in buckets

2024-06-12 Thread Stefano Stabellini
From: "Edgar E. Iglesias" 

When invalidating memory ranges, if we happen to hit the first
entry in a bucket we were never unmapping it. This was harmless
for foreign mappings but now that we're looking to reuse the
mapcache for transient grant mappings, we must unmap entries
when invalidated.

Signed-off-by: Edgar E. Iglesias 
Reviewed-by: Stefano Stabellini 
---
 hw/xen/xen-mapcache.c | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/hw/xen/xen-mapcache.c b/hw/xen/xen-mapcache.c
index bc860f4373..ec95445696 100644
--- a/hw/xen/xen-mapcache.c
+++ b/hw/xen/xen-mapcache.c
@@ -491,18 +491,23 @@ static void xen_invalidate_map_cache_entry_unlocked(MapCache *mc,
 return;
 }
 entry->lock--;
-if (entry->lock > 0 || pentry == NULL) {
+if (entry->lock > 0) {
 return;
 }
 
-pentry->next = entry->next;
 ram_block_notify_remove(entry->vaddr_base, entry->size, entry->size);
 if (munmap(entry->vaddr_base, entry->size) != 0) {
 perror("unmap fails");
 exit(-1);
 }
+
 g_free(entry->valid_mapping);
-g_free(entry);
+if (pentry) {
+pentry->next = entry->next;
+g_free(entry);
+} else {
+memset(entry, 0, sizeof *entry);
+}
 }
 
 typedef struct XenMapCacheData {
-- 
2.25.1




[PULL 5/5] hw/arm: xen: Enable use of grant mappings

2024-06-12 Thread Stefano Stabellini
From: "Edgar E. Iglesias" 

Signed-off-by: Edgar E. Iglesias 
Reviewed-by: Stefano Stabellini 
Reviewed-by: Manos Pitsidianakis 
Reviewed-by: Philippe Mathieu-Daudé 
---
 hw/arm/xen_arm.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/hw/arm/xen_arm.c b/hw/arm/xen_arm.c
index 15fa7dfa84..6fad829ede 100644
--- a/hw/arm/xen_arm.c
+++ b/hw/arm/xen_arm.c
@@ -125,6 +125,11 @@ static void xen_init_ram(MachineState *machine)
  GUEST_RAM1_BASE, ram_size[1]);
memory_region_add_subregion(sysmem, GUEST_RAM1_BASE, &ram_hi);
 }
+
+/* Setup support for grants.  */
+memory_region_init_ram(&xen_grants, NULL, "xen.grants", block_len,
+   &error_fatal);
+memory_region_add_subregion(sysmem, XEN_GRANT_ADDR_OFF, &xen_grants);
 }
 
 void arch_handle_ioreq(XenIOState *state, ioreq_t *req)
-- 
2.25.1




[PULL 4/5] xen: mapcache: Add support for grant mappings

2024-06-12 Thread Stefano Stabellini
From: "Edgar E. Iglesias" 

Add a second mapcache for grant mappings. The mapcache for
grants needs to work with XC_PAGE_SIZE granularity since
we can't map larger ranges than what has been granted to us.

Like with foreign mappings (xen_memory), machines using grants
are expected to initialize the xen_grants MR and map it
into their address-map accordingly.

CC: Manos Pitsidianakis 
Signed-off-by: Edgar E. Iglesias 
Reviewed-by: Stefano Stabellini 
---
 hw/xen/xen-hvm-common.c |  12 ++-
 hw/xen/xen-mapcache.c   | 165 +---
 include/hw/xen/xen-hvm-common.h |   3 +
 include/sysemu/xen.h|   1 +
 4 files changed, 144 insertions(+), 37 deletions(-)

diff --git a/hw/xen/xen-hvm-common.c b/hw/xen/xen-hvm-common.c
index a0a0252da0..b8ace1c368 100644
--- a/hw/xen/xen-hvm-common.c
+++ b/hw/xen/xen-hvm-common.c
@@ -10,12 +10,18 @@
 #include "hw/boards.h"
 #include "hw/xen/arch_hvm.h"
 
-MemoryRegion xen_memory;
+MemoryRegion xen_memory, xen_grants;
 
-/* Check for xen memory.  */
+/* Check for any kind of xen memory, foreign mappings or grants.  */
 bool xen_mr_is_memory(MemoryRegion *mr)
 {
-return mr == &xen_memory;
+return mr == &xen_memory || mr == &xen_grants;
+}
+
+/* Check specifically for grants.  */
+bool xen_mr_is_grants(MemoryRegion *mr)
+{
+return mr == &xen_grants;
 }
 
 void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t size, MemoryRegion *mr,
diff --git a/hw/xen/xen-mapcache.c b/hw/xen/xen-mapcache.c
index a07c47b0b1..5f23b0adbe 100644
--- a/hw/xen/xen-mapcache.c
+++ b/hw/xen/xen-mapcache.c
@@ -14,6 +14,7 @@
 
 #include 
 
+#include "hw/xen/xen-hvm-common.h"
 #include "hw/xen/xen_native.h"
 #include "qemu/bitmap.h"
 
@@ -21,6 +22,8 @@
 #include "sysemu/xen-mapcache.h"
 #include "trace.h"
 
+#include 
+#include 
 
 #if HOST_LONG_BITS == 32
 #  define MCACHE_MAX_SIZE (1UL<<31) /* 2GB Cap */
@@ -41,6 +44,7 @@ typedef struct MapCacheEntry {
 unsigned long *valid_mapping;
 uint32_t lock;
 #define XEN_MAPCACHE_ENTRY_DUMMY (1 << 0)
+#define XEN_MAPCACHE_ENTRY_GRANT (1 << 1)
 uint8_t flags;
 hwaddr size;
 struct MapCacheEntry *next;
@@ -71,6 +75,8 @@ typedef struct MapCache {
 } MapCache;
 
 static MapCache *mapcache;
+static MapCache *mapcache_grants;
+static xengnttab_handle *xen_region_gnttabdev;
 
 static inline void mapcache_lock(MapCache *mc)
 {
@@ -131,6 +137,12 @@ void xen_map_cache_init(phys_offset_to_gaddr_t f, void *opaque)
 unsigned long max_mcache_size;
 unsigned int bucket_shift;
 
+xen_region_gnttabdev = xengnttab_open(NULL, 0);
+if (xen_region_gnttabdev == NULL) {
+error_report("mapcache: Failed to open gnttab device");
+exit(EXIT_FAILURE);
+}
+
 if (HOST_LONG_BITS == 32) {
 bucket_shift = 16;
 } else {
@@ -159,6 +171,15 @@ void xen_map_cache_init(phys_offset_to_gaddr_t f, void *opaque)
 mapcache = xen_map_cache_init_single(f, opaque,
  bucket_shift,
  max_mcache_size);
+
+/*
+ * Grant mappings must use XC_PAGE_SIZE granularity since we can't
+ * map anything beyond the number of pages granted to us.
+ */
+mapcache_grants = xen_map_cache_init_single(f, opaque,
+XC_PAGE_SHIFT,
+max_mcache_size);
+
setrlimit(RLIMIT_AS, &rlimit_as);
 }
 
@@ -168,17 +189,24 @@ static void xen_remap_bucket(MapCache *mc,
  hwaddr size,
  hwaddr address_index,
  bool dummy,
+ bool grant,
+ bool is_write,
  ram_addr_t ram_offset)
 {
 uint8_t *vaddr_base;
-xen_pfn_t *pfns;
-int *err;
+g_autofree uint32_t *refs = NULL;
+g_autofree xen_pfn_t *pfns = NULL;
+g_autofree int *err;
 unsigned int i;
 hwaddr nb_pfn = size >> XC_PAGE_SHIFT;
 
 trace_xen_remap_bucket(address_index);
 
-pfns = g_new0(xen_pfn_t, nb_pfn);
+if (grant) {
+refs = g_new0(uint32_t, nb_pfn);
+} else {
+pfns = g_new0(xen_pfn_t, nb_pfn);
+}
 err = g_new0(int, nb_pfn);
 
 if (entry->vaddr_base != NULL) {
@@ -207,21 +235,51 @@ static void xen_remap_bucket(MapCache *mc,
 g_free(entry->valid_mapping);
 entry->valid_mapping = NULL;
 
-for (i = 0; i < nb_pfn; i++) {
-pfns[i] = (address_index << (mc->bucket_shift - XC_PAGE_SHIFT)) + i;
+if (grant) {
+hwaddr grant_base = address_index - (ram_offset >> XC_PAGE_SHIFT);
+
+for (i = 0; i < nb_pfn; i++) {
+refs[i] = grant_base + i;
+}
+} else {
+for (i = 0; i < nb_pfn; i++) {
+pfns[i] = (address_index << (mc->bucket_shift - XC_PAGE_SHIFT)) + i;
+}
 }
 
-/*
- * If the caller has requested the mapping at a specific address use
- * MAP_FIXED to make sure 

[PULL 3/5] xen: mapcache: Pass the ram_addr offset to xen_map_cache()

2024-06-12 Thread Stefano Stabellini
From: "Edgar E. Iglesias" 

Pass the ram_addr offset to xen_map_cache.
This is in preparation for adding grant mappings that need
to compute the address within the RAMBlock.

No functional changes.

Signed-off-by: Edgar E. Iglesias 
Reviewed-by: David Hildenbrand 
Reviewed-by: Stefano Stabellini 
Reviewed-by: Philippe Mathieu-Daudé 
---
 hw/xen/xen-mapcache.c | 16 +++-
 include/sysemu/xen-mapcache.h |  2 ++
 system/physmem.c  |  9 +
 3 files changed, 18 insertions(+), 9 deletions(-)

diff --git a/hw/xen/xen-mapcache.c b/hw/xen/xen-mapcache.c
index ec95445696..a07c47b0b1 100644
--- a/hw/xen/xen-mapcache.c
+++ b/hw/xen/xen-mapcache.c
@@ -167,7 +167,8 @@ static void xen_remap_bucket(MapCache *mc,
  void *vaddr,
  hwaddr size,
  hwaddr address_index,
- bool dummy)
+ bool dummy,
+ ram_addr_t ram_offset)
 {
 uint8_t *vaddr_base;
 xen_pfn_t *pfns;
@@ -266,6 +267,7 @@ static void xen_remap_bucket(MapCache *mc,
 
 static uint8_t *xen_map_cache_unlocked(MapCache *mc,
hwaddr phys_addr, hwaddr size,
+   ram_addr_t ram_offset,
uint8_t lock, bool dma, bool is_write)
 {
 MapCacheEntry *entry, *pentry = NULL,
@@ -337,14 +339,16 @@ tryagain:
 if (!entry) {
 entry = g_new0(MapCacheEntry, 1);
 pentry->next = entry;
-xen_remap_bucket(mc, entry, NULL, cache_size, address_index, dummy);
+xen_remap_bucket(mc, entry, NULL, cache_size, address_index, dummy,
+ ram_offset);
 } else if (!entry->lock) {
 if (!entry->vaddr_base || entry->paddr_index != address_index ||
 entry->size != cache_size ||
 !test_bits(address_offset >> XC_PAGE_SHIFT,
 test_bit_size >> XC_PAGE_SHIFT,
 entry->valid_mapping)) {
-xen_remap_bucket(mc, entry, NULL, cache_size, address_index, dummy);
+xen_remap_bucket(mc, entry, NULL, cache_size, address_index, dummy,
+ ram_offset);
 }
 }
 
@@ -391,13 +395,15 @@ tryagain:
 
 uint8_t *xen_map_cache(MemoryRegion *mr,
hwaddr phys_addr, hwaddr size,
+   ram_addr_t ram_addr_offset,
uint8_t lock, bool dma,
bool is_write)
 {
 uint8_t *p;
 
 mapcache_lock(mapcache);
-p = xen_map_cache_unlocked(mapcache, phys_addr, size, lock, dma, is_write);
+p = xen_map_cache_unlocked(mapcache, phys_addr, size, ram_addr_offset,
+   lock, dma, is_write);
 mapcache_unlock(mapcache);
 return p;
 }
@@ -632,7 +638,7 @@ static uint8_t *xen_replace_cache_entry_unlocked(MapCache *mc,
 trace_xen_replace_cache_entry_dummy(old_phys_addr, new_phys_addr);
 
 xen_remap_bucket(mc, entry, entry->vaddr_base,
- cache_size, address_index, false);
+ cache_size, address_index, false, old_phys_addr);
 if (!test_bits(address_offset >> XC_PAGE_SHIFT,
 test_bit_size >> XC_PAGE_SHIFT,
 entry->valid_mapping)) {
diff --git a/include/sysemu/xen-mapcache.h b/include/sysemu/xen-mapcache.h
index 1ec9e66752..b5e3ea1bc0 100644
--- a/include/sysemu/xen-mapcache.h
+++ b/include/sysemu/xen-mapcache.h
@@ -19,6 +19,7 @@ typedef hwaddr (*phys_offset_to_gaddr_t)(hwaddr phys_offset,
 void xen_map_cache_init(phys_offset_to_gaddr_t f,
 void *opaque);
 uint8_t *xen_map_cache(MemoryRegion *mr, hwaddr phys_addr, hwaddr size,
+   ram_addr_t ram_addr_offset,
uint8_t lock, bool dma,
bool is_write);
 ram_addr_t xen_ram_addr_from_mapcache(void *ptr);
@@ -37,6 +38,7 @@ static inline void xen_map_cache_init(phys_offset_to_gaddr_t f,
 static inline uint8_t *xen_map_cache(MemoryRegion *mr,
  hwaddr phys_addr,
  hwaddr size,
+ ram_addr_t ram_addr_offset,
  uint8_t lock,
  bool dma,
  bool is_write)
diff --git a/system/physmem.c b/system/physmem.c
index b7847db1a2..33d09f7571 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -2231,13 +2231,14 @@ static void *qemu_ram_ptr_length(RAMBlock *block, ram_addr_t addr,
  */
 if (xen_mr_is_memory(block->mr)) {
 return xen_map_cache(block->mr, block->offset + addr,
- len, lock, lock,
- is_write);
+ len, block->offset,
+ lock, lock, 

[PULL 0/5] virtio-grants-v8-tag

2024-06-12 Thread Stefano Stabellini
The following changes since commit 80e8f0602168f451a93e71cbb1d59e93d745e62e:

  Merge tag 'virtio-grants-v8-tag' into staging (2024-06-09 11:21:55 -0700)

are available in the Git repository at:

  https://gitlab.com/sstabellini/qemu.git 

for you to fetch changes up to 6d87a2a311fe4a8363143e6914d000697ea0cd83:

  hw/arm: xen: Enable use of grant mappings (2024-06-09 20:16:14 +0200)


Edgar E. Iglesias (5):
  xen: mapcache: Make MCACHE_BUCKET_SHIFT runtime configurable
  xen: mapcache: Unmap first entries in buckets
  xen: mapcache: Pass the ram_addr offset to xen_map_cache()
  xen: mapcache: Add support for grant mappings
  hw/arm: xen: Enable use of grant mappings

 hw/arm/xen_arm.c|   5 +
 hw/xen/xen-hvm-common.c |  12 ++-
 hw/xen/xen-mapcache.c   | 234 ++--
 include/hw/xen/xen-hvm-common.h |   3 +
 include/sysemu/xen-mapcache.h   |   2 +
 include/sysemu/xen.h|   1 +
 system/physmem.c|   9 +-
 7 files changed, 202 insertions(+), 64 deletions(-)



Re: [RFC PATCH v2 3/5] rust: add PL011 device model

2024-06-12 Thread Paolo Bonzini
On Wed, Jun 12, 2024 at 22:58, Manos Pitsidianakis <
manos.pitsidiana...@linaro.org> wrote:

> In any case, it is out of scope for this RFC. Introducing wrappers would
> be a gradual process.
>

Sure, how would you feel about such bindings being developed on list, and
maintained in a (somewhat) long-lived experimental branch?

Paolo


> Thanks,
> Manos
>
>


Re: [PATCH 9/9] contrib/plugins: add ips plugin example for cost modeling

2024-06-12 Thread Dr. David Alan Gilbert
* Alex Bennée (alex.ben...@linaro.org) wrote:
> From: Pierrick Bouvier 
> 
> This plugin uses the new time control interface to make decisions
> about the state of time during the emulation. The algorithm is
> currently very simple. The user specifies an ips rate which applies
> per core. If the core runs ahead of its allocated execution time the
> plugin sleeps for a bit to let real time catch up. Either way time is
> updated for the emulation as a function of total executed instructions
> with some adjustments for cores that idle.

A few random thoughts:
  a) Are there any definitions of what a plugin that controls time
 should do with a live migration?
  b) The sleep in migration/dirtyrate.c points out g_usleep might
 sleep for longer, so it reads the actual wall clock time to
 figure out a new 'now'.
  c) A fun thing to do with this would be to follow an external simulation
 or 2nd qemu, trying to keep the two from running too far past
 each other.

Dave

> Examples
> 
> 
> Slow down execution of /bin/true:
> $ num_insn=$(./build/qemu-x86_64 -plugin ./build/tests/plugin/libinsn.so -d plugin /bin/true |& grep total | sed -e 's/.*: //')
> $ time ./build/qemu-x86_64 -plugin ./build/contrib/plugins/libips.so,ips=$(($num_insn/4)) /bin/true
> real 4.000s
> 
> Boot a Linux kernel simulating a 250MHz cpu:
> $ /build/qemu-system-x86_64 -kernel /boot/vmlinuz-6.1.0-21-amd64 -append "console=ttyS0" -plugin ./build/contrib/plugins/libips.so,ips=$((250*1000*1000)) -smp 1 -m 512
> check time until kernel panic on serial0
> 
> Tested in system mode by booting a full debian system, and using:
> $ sysbench cpu run
> Performance decreases linearly with the given number of ips.
> 
> Signed-off-by: Pierrick Bouvier 
> Message-Id: <20240530220610.1245424-7-pierrick.bouv...@linaro.org>
> ---
>  contrib/plugins/ips.c| 164 +++
>  contrib/plugins/Makefile |   1 +
>  2 files changed, 165 insertions(+)
>  create mode 100644 contrib/plugins/ips.c
> 
> diff --git a/contrib/plugins/ips.c b/contrib/plugins/ips.c
> new file mode 100644
> index 00..db77729264
> --- /dev/null
> +++ b/contrib/plugins/ips.c
> @@ -0,0 +1,164 @@
> +/*
> + * ips rate limiting plugin.
> + *
> + * This plugin can be used to restrict the execution of a system to a
> + * particular number of Instructions Per Second (ips). This controls
> + * time as seen by the guest, so while wall-clock time may be longer,
> + * from the guest's point of view time will pass at the normal rate.
> + *
> + * This uses the new plugin API which allows the plugin to control
> + * system time.
> + *
> + * Copyright (c) 2023 Linaro Ltd
> + *
> + * SPDX-License-Identifier: GPL-2.0-or-later
> + */
> +
> +#include 
> +#include 
> +#include 
> +
> +QEMU_PLUGIN_EXPORT int qemu_plugin_version = QEMU_PLUGIN_VERSION;
> +
> +/* how many times do we update time per sec */
> +#define NUM_TIME_UPDATE_PER_SEC 10
> +#define NSEC_IN_ONE_SEC (1000 * 1000 * 1000)
> +
> +static GMutex global_state_lock;
> +
> +static uint64_t max_insn_per_second = 1000 * 1000 * 1000; /* ips per core, per second */
> +static uint64_t max_insn_per_quantum; /* trap every N instructions */
> +static int64_t virtual_time_ns; /* last set virtual time */
> +
> +static const void *time_handle;
> +
> +typedef struct {
> +uint64_t total_insn;
> +uint64_t quantum_insn; /* insn in last quantum */
> +int64_t last_quantum_time; /* time when last quantum started */
> +} vCPUTime;
> +
> +struct qemu_plugin_scoreboard *vcpus;
> +
> +/* return epoch time in ns */
> +static int64_t now_ns(void)
> +{
> +return g_get_real_time() * 1000;
> +}
> +
> +static uint64_t num_insn_during(int64_t elapsed_ns)
> +{
> +double num_secs = elapsed_ns / (double) NSEC_IN_ONE_SEC;
> +return num_secs * (double) max_insn_per_second;
> +}
> +
> +static int64_t time_for_insn(uint64_t num_insn)
> +{
> +double num_secs = (double) num_insn / (double) max_insn_per_second;
> +return num_secs * (double) NSEC_IN_ONE_SEC;
> +}
> +
> +static void update_system_time(vCPUTime *vcpu)
> +{
> +int64_t elapsed_ns = now_ns() - vcpu->last_quantum_time;
> +uint64_t max_insn = num_insn_during(elapsed_ns);
> +
> +if (vcpu->quantum_insn >= max_insn) {
> +/* this vcpu ran faster than expected, so it has to sleep */
> +uint64_t insn_advance = vcpu->quantum_insn - max_insn;
> +uint64_t time_advance_ns = time_for_insn(insn_advance);
> +int64_t sleep_us = time_advance_ns / 1000;
> +g_usleep(sleep_us);
> +}
> +
> +vcpu->total_insn += vcpu->quantum_insn;
> +vcpu->quantum_insn = 0;
> +vcpu->last_quantum_time = now_ns();
> +
> +/* based on total number of instructions, what should be the new time? */
> +int64_t new_virtual_time = time_for_insn(vcpu->total_insn);
> +
> +g_mutex_lock(&global_state_lock);
> +
> +/* Time only moves forward. Another vcpu might have updated it already. */
> +if 

Re: [RFC PATCH v2 3/5] rust: add PL011 device model

2024-06-12 Thread Manos Pitsidianakis

On Wed, 12 Jun 2024 19:06, "Daniel P. Berrangé"  wrote:

On Wed, Jun 12, 2024 at 05:14:56PM +0300, Manos Pitsidianakis wrote:

On Wed, 12 Jun 2024 15:29, Paolo Bonzini  wrote:
> I think this is extremely useful to show where we could go in the task
> of creating more idiomatic bindings.
> 
> On Tue, Jun 11, 2024 at 12:34 PM Manos Pitsidianakis

>  wrote:
> > +pub const PL011_ARM_INFO: TypeInfo = TypeInfo {
> > +name: TYPE_PL011.as_ptr(),
> > +parent: TYPE_SYS_BUS_DEVICE.as_ptr(),
> > +instance_size: core::mem::size_of::<PL011State>(),
> > +instance_align: core::mem::align_of::<PL011State>(),
> > +instance_init: Some(pl011_init),
> > +instance_post_init: None,
> > +instance_finalize: None,
> > +abstract_: false,
> > +class_size: 0,
> > +class_init: Some(pl011_class_init),
> > +class_base_init: None,
> > +class_data: core::ptr::null_mut(),
> > +interfaces: core::ptr::null_mut(),
> > +};
> 
> QOM is certainly a major part of what we need to do for idiomatic

> bindings. This series includes both using QOM objects (chardev) and
> defining them.
> 
> For using QOM objects, there is only one strategy that we need to

> implement for both Chardev and DeviceState/SysBusDevice which is nice.
> We can probably take inspiration from glib-rs, see for example
> - https://docs.rs/glib/latest/glib/object/trait.Cast.html
> - https://docs.rs/glib/latest/glib/object/trait.ObjectType.html
> - https://docs.rs/glib/latest/glib/object/struct.ObjectRef.html


There was consensus in the community call that we won't be writing Rust APIs
for internal C QEMU interfaces; or at least, that's not the goal


I think that is over-stating things. This was only mentioned in passing
and my feeling was that we didn't have that detailed of a discussion
because at this stage such a discussion is a bit like putting the cart
before the horse.

While the initial focus might be to just consume a Rust API that is a
1:1 mapping of the C API, I expect that over time we'll end up writing
various higher level Rust wrappers around the C API. If we didn't do that,
then in effect we'd be making ourselves write pseudo-C code in Rust,
undermining many of the benefits we could get.



In any case, it is out of scope for this RFC. Introducing wrappers would 
be a gradual process.


Thanks,
Manos



Re: [PATCH] accel/tcg/plugin: Fix inject_mem_cb rw masking

2024-06-12 Thread Pierrick Bouvier

On 6/5/24 15:39, Pierrick Bouvier wrote:

Thanks for catching this Richard.

I made the same mistake in plugins/core.c as well; could you fix that
as part of this patch?

For context, the rw values are enums:
R is 0b01, W is 0b10, and RW is 0b11, so the check works as expected with &.
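
A standalone sketch of the difference (illustrative, not QEMU code):

#include <assert.h>

/* Values mirror the plugin rw enum described above. */
enum mem_rw { MEM_R = 1, MEM_W = 2, MEM_RW = 3 };

int main(void)
{
    enum mem_rw curr = MEM_R;    /* the access being instrumented */
    enum mem_rw wanted = MEM_W;  /* the callback only wants writes */

    assert(curr && wanted);        /* '&&': true, callback fires wrongly */
    assert((curr & wanted) == 0);  /* '&': masks don't overlap, skipped */
    return 0;
}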

Thanks,
Pierrick

Reviewed-by: Pierrick Bouvier 

On 6/5/24 15:25, Richard Henderson wrote:

These are not booleans, but masks.

Fixes: f86fd4d8721 ("plugins: distinct types for callbacks")
Signed-off-by: Richard Henderson 
---
   accel/tcg/plugin-gen.c | 4 ++--
   1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/accel/tcg/plugin-gen.c b/accel/tcg/plugin-gen.c
index cc1634e7a6..b6bae32b99 100644
--- a/accel/tcg/plugin-gen.c
+++ b/accel/tcg/plugin-gen.c
@@ -240,13 +240,13 @@ static void inject_mem_cb(struct qemu_plugin_dyn_cb *cb,
   {
   switch (cb->type) {
   case PLUGIN_CB_MEM_REGULAR:
-if (rw && cb->regular.rw) {
+if (rw & cb->regular.rw) {
   gen_mem_cb(&cb->regular, meminfo, addr);
   }
   break;
   case PLUGIN_CB_INLINE_ADD_U64:
   case PLUGIN_CB_INLINE_STORE_U64:
-if (rw && cb->inline_insn.rw) {
+if (rw & cb->inline_insn.rw) {
   inject_cb(cb);
   }
   break;


I sent a new series with your fix + another one in plugins/core.c, so no
action is needed from you.

(20240612195147.93121-1-pierrick.bouv...@linaro.org)

Thanks,
Pierrick



Re: [PATCH 8/9] plugins: add time control API

2024-06-12 Thread Pierrick Bouvier

On 6/12/24 12:37, Alex Bennée wrote:

Pierrick Bouvier  writes:


Hi Alex,

I noticed the new symbols lack QEMU_PLUGIN_API qualifier in
include/qemu/qemu-plugin.h:
- qemu_plugin_update_ns
- qemu_plugin_request_time_control

So it would be impossible to use those symbols on windows.
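
(For context: on Windows, plugin-visible symbols need an import
attribute. A rough sketch of what the qualifier expands to, from memory
of include/qemu/qemu-plugin.h, so treat the details as an assumption:)

#ifdef WIN32
# define QEMU_PLUGIN_API __declspec(dllimport)
#else
# define QEMU_PLUGIN_API /* default ELF visibility is enough */
#endif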

I kept a reminder to send a new patch after you pulled this, but if we
go to a new series, it could be as fast for you to just add this
directly.


Sure if you send the patch I'll just merge it into the series.



I posted the series <20240612195147.93121-1-pierrick.bouv...@linaro.org>
with this fix, and a second one for the issue reported by Richard.


If you could integrate those in current series, that would be perfect :)

Thanks!
Pierrick


[PATCH 0/2] Bug fixes for plugins

2024-06-12 Thread Pierrick Bouvier
Those two commits are meant to be applied on top of current series
<20240612153508.1532940-1-alex.ben...@linaro.org>

The first fixes an issue with Windows plugins, while the second fixes a bug
with memory callbacks, introduced by a previous refactoring.

Pierrick Bouvier (2):
  plugins: missing QEMU_PLUGIN_API for time control
  plugins: fix inject_mem_cb rw masking

 include/qemu/qemu-plugin.h | 2 ++
 accel/tcg/plugin-gen.c | 4 ++--
 plugins/core.c | 4 ++--
 3 files changed, 6 insertions(+), 4 deletions(-)

-- 
2.39.2




[PATCH 2/2] plugins: fix inject_mem_cb rw masking

2024-06-12 Thread Pierrick Bouvier
These are not booleans, but masks.
Issue found by Richard Henderson.

Fixes: f86fd4d8721 ("plugins: distinct types for callbacks")
Signed-off-by: Richard Henderson 
Signed-off-by: Pierrick Bouvier 
---
 accel/tcg/plugin-gen.c | 4 ++--
 plugins/core.c | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/accel/tcg/plugin-gen.c b/accel/tcg/plugin-gen.c
index cc1634e7a6b..b6bae32b997 100644
--- a/accel/tcg/plugin-gen.c
+++ b/accel/tcg/plugin-gen.c
@@ -240,13 +240,13 @@ static void inject_mem_cb(struct qemu_plugin_dyn_cb *cb,
 {
 switch (cb->type) {
 case PLUGIN_CB_MEM_REGULAR:
-if (rw && cb->regular.rw) {
+if (rw & cb->regular.rw) {
 gen_mem_cb(&cb->regular, meminfo, addr);
 }
 break;
 case PLUGIN_CB_INLINE_ADD_U64:
 case PLUGIN_CB_INLINE_STORE_U64:
-if (rw && cb->inline_insn.rw) {
+if (rw & cb->inline_insn.rw) {
 inject_cb(cb);
 }
 break;
diff --git a/plugins/core.c b/plugins/core.c
index badede28cf9..9d737d82787 100644
--- a/plugins/core.c
+++ b/plugins/core.c
@@ -589,7 +589,7 @@ void qemu_plugin_vcpu_mem_cb(CPUState *cpu, uint64_t vaddr,
 
 switch (cb->type) {
 case PLUGIN_CB_MEM_REGULAR:
-if (rw && cb->regular.rw) {
+if (rw & cb->regular.rw) {
 cb->regular.f.vcpu_mem(cpu->cpu_index,
make_plugin_meminfo(oi, rw),
vaddr, cb->regular.userp);
@@ -597,7 +597,7 @@ void qemu_plugin_vcpu_mem_cb(CPUState *cpu, uint64_t vaddr,
 break;
 case PLUGIN_CB_INLINE_ADD_U64:
 case PLUGIN_CB_INLINE_STORE_U64:
-if (rw && cb->inline_insn.rw) {
+if (rw & cb->inline_insn.rw) {
 exec_inline_op(cb->type, >inline_insn, cpu->cpu_index);
 }
 break;
-- 
2.39.2




[PATCH 1/2] plugins: missing QEMU_PLUGIN_API for time control

2024-06-12 Thread Pierrick Bouvier
Signed-off-by: Pierrick Bouvier 
---
 include/qemu/qemu-plugin.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/qemu/qemu-plugin.h b/include/qemu/qemu-plugin.h
index 80b1637cede..310ee10f301 100644
--- a/include/qemu/qemu-plugin.h
+++ b/include/qemu/qemu-plugin.h
@@ -670,6 +670,7 @@ void qemu_plugin_register_vcpu_mem_inline_per_vcpu(
  *
  * Returns an opaque handle or NULL if fails
  */
+QEMU_PLUGIN_API
 const void *qemu_plugin_request_time_control(void);
 
 /**
@@ -682,6 +683,7 @@ const void *qemu_plugin_request_time_control(void);
  *
  * Start time is 0.
  */
+QEMU_PLUGIN_API
 void qemu_plugin_update_ns(const void *handle, int64_t time);
 
 typedef void
-- 
2.39.2




Re: [PATCH] hw/usb/hcd-ohci: Fix ohci_service_td: accept valid TDs

2024-06-12 Thread Cord Amfmgm
On Wed, Jun 12, 2024 at 2:36 PM Alex Bennée  wrote:

> Cord Amfmgm  writes:
>
> > On Wed, Jun 12, 2024 at 9:21 AM Alex Bennée 
> wrote:
> >
> >  David Hubbard  writes:
> >
> >  > From: Cord Amfmgm 
> >  >
> >  > This changes the way the ohci emulation handles a Transfer Descriptor
> with
> >  > "Current Buffer Pointer" set to "Buffer End" + 1.
> >  >
> >  > The OHCI spec 4.3.1.2 Table 4-2 allows td.cbp to be one byte more
> than td.be
> >  > to signal the buffer has zero length. Currently qemu only accepts
> zero-length
> >  > Transfer Descriptors if the td.cbp is equal to 0, while actual OHCI
> hardware
> >  > accepts both cases.
> >  >
> >  > The qemu ohci emulation has a regression in ohci_service_td. Version
> 4.2
> >  > and earlier matched the spec. (I haven't taken the time to bisect
> exactly
> >  > where the logic was changed.)
> >
> >  I find it hard to characterise this as a regression because we've
> >  basically gone from no checks to actually doing bounds checks:
> >
> >1328fe0c32 (hw: usb: hcd-ohci: check len and frame_number variables)
> >
> >  The argument here seems to be that real hardware is laxer than the specs
> >  in what it accepts.
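
(A minimal sketch of the condition under discussion; the helper is
illustrative, not the actual hcd-ohci code:)

#include <stdbool.h>
#include <stdint.h>

/* A TD describes a zero-length buffer either with cbp == 0 (what the
 * spec mandates) or with cbp == be + 1 (what real controllers accept). */
static bool ohci_td_zero_len(uint32_t cbp, uint32_t be)
{
    return cbp == 0 || cbp == be + 1;
}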
> >
> 
> >
> >  With the updated commit message:
> >
> >  Reviewed-by: Alex Bennée 
> >
> > Please forgive my lack of experience on this mailing list. I don't see a
> suggested commit message from Alex but in case that
> > was what is being considered, here is one. Feedback welcome, also if
> this is not what is wanted, please just say it.
> >
>
> Something along the lines of what you suggest here (where did this come
> from?)
>
> >> From: Cord Amfmgm 
> >>
> >> This changes the way the ohci emulation handles a Transfer Descriptor
> with
> >> "Buffer End" set to "Current Buffer Pointer" - 1, specifically in the
> case of a
> >> zero-length packet.
> >>
> >> The OHCI spec 4.3.1.2 Table 4-2 specifies td.cbp to be zero for a
> zero-length
> >> packet. Peter Maydell tracked down commit
> >> 1328fe0c32 (hw: usb: hcd-ohci: check len and frame_number variables)
> >> where qemu started checking this according to the spec.
> >>
> >> What this patch does is loosen the qemu ohci implementation to allow a
> >> zero-length packet if td.be (Buffer End) is set to td.cbp - 1, and
> with a
> >> non-zero td.cbp value.
> >>
> >> Is this correct? It appears to not follow the spec, so no. But actual hw
> >> seems to be ok with it.
> >>
> >> Does any OS rely on this behavior? There have been no reports to
> >> qemu-devel of this problem.
> >>
> >> This is attempting to have qemu behave like actual hardware,
> >> but this is just a minor change.
> >>
> >> With a tiny OS[1] that boots and executes a test, the behavior is
> >> reproducible:
> >>
> >> * OS that sends USB requests to a USB mass storage device
> >>   but sends td.be = td.cbp - 1
> >> * qemu 4.2
> >> * qemu HEAD (4e66a0854)
> >> * Actual OHCI controller (hardware)
> >>
> >> Command line:
> >> qemu-system-x86_64 -m 20 \
> >>  -device pci-ohci,id=ohci \
> >>  -drive if=none,format=raw,id=d,file=testmbr.raw \
> >>  -device usb-storage,bus=ohci.0,drive=d \
> >>  --trace "usb_*" --trace "ohci_*" -D qemu.log
> >>
> >> Results are:
> >>
> >>  qemu 4.2   | qemu HEAD  | actual HW
> >> ++
> >>  works fine | ohci_die() | works fine
> >>
> >> Tip: if the flags "-serial pty -serial stdio" are added to the command
> line
> >> the test will output USB requests like this:
> >>
> >> Testing qemu HEAD:
> >>
> >>> Free mem 2M ohci port2 conn FS
> >>> setup { 80 6 0 1 0 0 8 0 }
> >>> ED info=8 { mps=8 en=0 d=0 } tail=c20920
> >>>   td0 c20880 nxt=c20960 f200 setup cbp=c20900 be=c20907
> >>>   td1 c20960 nxt=c20980 f314in cbp=c20908 be=c2090f
> >>>   td2 c20980 nxt=c20920 f308   out cbp=c20910 be=c2090f ohci20
> host err
> >>> usb stopped
> >>
> >> And in qemu.log:
> >>
> >> usb_ohci_iso_td_bad_cc_overrun ISO_TD start_offset=0x00c20910 >
> next_offset=0x00c2090f
> >>
> >> Testing qemu 4.2:
> >>
> >>> Free mem 2M ohci port2 conn FS
> >>> setup { 80 6 0 1 0 0 8 0 }
> >>> ED info=8 { mps=8 en=0 d=0 } tail=620920
> >>>   td0 620880 nxt=620960 f200 setup cbp=620900 be=620907
>  cbp=0 be=620907
> >>>   td1 620960 nxt=620980 f314in cbp=620908 be=62090f
>  cbp=0 be=62090f
> >>>   td2 620980 nxt=620920 f308   out cbp=620910 be=62090f
>  cbp=0 be=62090f
> >>>rx { 12 1 0 2 0 0 0 8 }
> >>> setup { 0 5 1 0 0 0 0 0 } tx {}
> >>> ED info=8 { mps=8 en=0 d=0 } tail=620880
> >>>   td0 620920 nxt=620960 f200 setup cbp=620900 be=620907
>  cbp=0 be=620907
> >>>   td1 620960 nxt=620880 f310in cbp=620908 be=620907
>  cbp=0 be=620907
> >>> setup { 80 6 0 1 0 0 12 0 }
> >>> ED info=80001 { mps=8 en=0 d=1 } tail=620960
> >>>   td0 620880 nxt=6209c0 f200 setup cbp=620920 be=620927
>  cbp=0 be=620927
> >>>   td1 6209c0 nxt=6209e0 f314in cbp=620928 be=620939
>  cbp=0 be=620939
> >>>   td2 6209e0 nxt=620960 f308   out cbp=62093a be=620939
>  cbp=0 

Re: [PATCH 8/9] plugins: add time control API

2024-06-12 Thread Alex Bennée
Pierrick Bouvier  writes:

> Hi Alex,
>
> I noticed the new symbols lack QEMU_PLUGIN_API qualifier in
> include/qemu/qemu-plugin.h:
> - qemu_plugin_update_ns
> - qemu_plugin_request_time_control
>
> So it would be impossible to use those symbols on windows.
>
> I kept a reminder to send a new patch after you pulled this, but if we
> go to a new series, it could be as fast for you to just add this
> directly.

Sure if you send the patch I'll just merge it into the series.

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro



Re: [PATCH] hw/usb/hcd-ohci: Fix ohci_service_td: accept valid TDs

2024-06-12 Thread Alex Bennée
Cord Amfmgm  writes:

> On Wed, Jun 12, 2024 at 9:21 AM Alex Bennée  wrote:
>
>  David Hubbard  writes:
>
>  > From: Cord Amfmgm 
>  >
>  > This changes the way the ohci emulation handles a Transfer Descriptor with
>  > "Current Buffer Pointer" set to "Buffer End" + 1.
>  >
>  > The OHCI spec 4.3.1.2 Table 4-2 allows td.cbp to be one byte more than 
> td.be
>  > to signal the buffer has zero length. Currently qemu only accepts 
> zero-length
>  > Transfer Descriptors if the td.cbp is equal to 0, while actual OHCI 
> hardware
>  > accepts both cases.
>  >
>  > The qemu ohci emulation has a regression in ohci_service_td. Version 4.2
>  > and earlier matched the spec. (I haven't taken the time to bisect exactly
>  > where the logic was changed.)
>
>  I find it hard to characterise this as a regression because we've
>  basically gone from no checks to actually doing bounds checks:
>
>1328fe0c32 (hw: usb: hcd-ohci: check len and frame_number variables)
>
>  The argument here seems to be that real hardware is laxer than the specs
>  in what it accepts.
>

>
>  With the updated commit message:
>
>  Reviewed-by: Alex Bennée 
>
> Please forgive my lack of experience on this mailing list. I don't see a 
> suggested commit message from Alex but in case that
> was what is being considered, here is one. Feedback welcome, also if this is 
> not what is wanted, please just say it.
>

Something along the lines of what you suggest here (where did this come
from?)

>> From: Cord Amfmgm 
>>
>> This changes the way the ohci emulation handles a Transfer Descriptor with
>> "Buffer End" set to "Current Buffer Pointer" - 1, specifically in the case 
>> of a
>> zero-length packet.
>>
>> The OHCI spec 4.3.1.2 Table 4-2 specifies td.cbp to be zero for a zero-length
>> packet. Peter Maydell tracked down commit
>> 1328fe0c32 (hw: usb: hcd-ohci: check len and frame_number variables)
>> where qemu started checking this according to the spec.
>>
>> What this patch does is loosen the qemu ohci implementation to allow a
>> zero-length packet if td.be (Buffer End) is set to td.cbp - 1, and with a
>> non-zero td.cbp value.
>>
>> Is this correct? It appears to not follow the spec, so no. But actual hw
>> seems to be ok with it.
>>
>> Does any OS rely on this behavior? There have been no reports to
>> qemu-devel of this problem.
>>
>> This is attempting to have qemu behave like actual hardware,
>> but this is just a minor change.
>>
>> With a tiny OS[1] that boots and executes a test, the behavior is
>> reproducible:
>>
>> * OS that sends USB requests to a USB mass storage device
>>   but sends td.be = td.cbp - 1
>> * qemu 4.2
>> * qemu HEAD (4e66a0854)
>> * Actual OHCI controller (hardware)
>>
>> Command line:
>> qemu-system-x86_64 -m 20 \
>>  -device pci-ohci,id=ohci \
>>  -drive if=none,format=raw,id=d,file=testmbr.raw \
>>  -device usb-storage,bus=ohci.0,drive=d \
>>  --trace "usb_*" --trace "ohci_*" -D qemu.log
>>
>> Results are:
>>
>>  qemu 4.2   | qemu HEAD  | actual HW
>> ++
>>  works fine | ohci_die() | works fine
>>
>> Tip: if the flags "-serial pty -serial stdio" are added to the command line
>> the test will output USB requests like this:
>>
>> Testing qemu HEAD:
>>
>>> Free mem 2M ohci port2 conn FS
>>> setup { 80 6 0 1 0 0 8 0 }
>>> ED info=8 { mps=8 en=0 d=0 } tail=c20920
>>>   td0 c20880 nxt=c20960 f200 setup cbp=c20900 be=c20907
>>>   td1 c20960 nxt=c20980 f314in cbp=c20908 be=c2090f
>>>   td2 c20980 nxt=c20920 f308   out cbp=c20910 be=c2090f ohci20 host err
>>> usb stopped
>>
>> And in qemu.log:
>>
>> usb_ohci_iso_td_bad_cc_overrun ISO_TD start_offset=0x00c20910 > 
>> next_offset=0x00c2090f
>>
>> Testing qemu 4.2:
>>
>>> Free mem 2M ohci port2 conn FS
>>> setup { 80 6 0 1 0 0 8 0 }
>>> ED info=8 { mps=8 en=0 d=0 } tail=620920
>>>   td0 620880 nxt=620960 f200 setup cbp=620900 be=620907   cbp=0 
>>> be=620907
>>>   td1 620960 nxt=620980 f314in cbp=620908 be=62090f   cbp=0 
>>> be=62090f
>>>   td2 620980 nxt=620920 f308   out cbp=620910 be=62090f   cbp=0 
>>> be=62090f
>>>rx { 12 1 0 2 0 0 0 8 }
>>> setup { 0 5 1 0 0 0 0 0 } tx {}
>>> ED info=8 { mps=8 en=0 d=0 } tail=620880
>>>   td0 620920 nxt=620960 f200 setup cbp=620900 be=620907   cbp=0 
>>> be=620907
>>>   td1 620960 nxt=620880 f310in cbp=620908 be=620907   cbp=0 
>>> be=620907
>>> setup { 80 6 0 1 0 0 12 0 }
>>> ED info=80001 { mps=8 en=0 d=1 } tail=620960
>>>   td0 620880 nxt=6209c0 f200 setup cbp=620920 be=620927   cbp=0 
>>> be=620927
>>>   td1 6209c0 nxt=6209e0 f314in cbp=620928 be=620939   cbp=0 
>>> be=620939
>>>   td2 6209e0 nxt=620960 f308   out cbp=62093a be=620939   cbp=0 
>>> be=620939
>>>rx { 12 1 0 2 0 0 0 8 f4 46 1 0 0 0 1 2 3 1 }
>>> setup { 80 6 0 2 0 0 0 1 }
>>> ED info=80001 { mps=8 en=0 d=1 } tail=620880
>>>   td0 620960 nxt=6209a0 f200 setup cbp=620a20 be=620a27   cbp=0 
>>> be=620a27
>>>   td1 

Re: [PATCH v4 5/5] iotests: add backup-discard-source

2024-06-12 Thread Vladimir Sementsov-Ogievskiy

On 11.06.24 20:49, Kevin Wolf wrote:

Am 13.03.2024 um 16:28 hat Vladimir Sementsov-Ogievskiy geschrieben:

Add test for a new backup option: discard-source.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Fiona Ebner 
Tested-by: Fiona Ebner 


This test fails for me, and it already does so after this commit that
introduced it. I haven't checked what get_actual_size() does, but I'm running
on XFS, so its preallocation could be causing this. We generally avoid
checking the number of allocated blocks in image files for this reason.



Hmm, right, I see that relying on the allocated size is a bad idea. Better to
check the block status, to see how many qcow2 clusters are allocated.

Do we have some QMP command to get such information? The simplest way I see is
to add a dirty bitmap to the temp block-node, and then check its dirty count
through query-named-block-nodes.

--
Best regards,
Vladimir




[PATCH 0/4] i386/cpu: Add support for perfmon-v2, RAS bits and EPYC-Turin CPU model

2024-06-12 Thread Babu Moger


This series adds support for the following features in QEMU.
1. RAS feature bits (SUCCOR, McaOverflowRecov)
2. perfmon-v2
3. Update EPYC-Genoa to support perfmon-v2 and RAS bits
4. Add support for EPYC-Turin


Babu Moger (3):
  i386/cpu: Add RAS feature bits on EPYC CPU models
  i386/cpu: Enable perfmon-v2 and RAS feature bits on EPYC-Genoa
  i386/cpu: Add support for EPYC-Turin model

Sandipan Das (1):
  i386/cpu: Add PerfMonV2 feature bit

 target/i386/cpu.c | 202 ++
 target/i386/cpu.h |   4 +
 2 files changed, 206 insertions(+)

-- 
2.34.1




[PATCH 3/4] i386/cpu: Enable perfmon-v2 and RAS feature bits on EPYC-Genoa

2024-06-12 Thread Babu Moger
The following feature bits are added to the EPYC-Genoa-v2 model.

perfmon-v2: Allows guests to make use of the PerfMonV2 features.

SUCCOR: Software uncorrectable error containment and recovery capability.
The processor supports software containment of uncorrectable errors
through context synchronizing data poisoning and deferred error
interrupts.

McaOverflowRecov: MCA overflow recovery support.

The feature details are available in APM listed below [1].
[1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
Publication # 24593 Revision 3.41.

Signed-off-by: Babu Moger 
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
---
 target/i386/cpu.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 7f1837cdc9..64e6dc62e2 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -5272,6 +5272,21 @@ static const X86CPUDefinition builtin_x86_defs[] = {
.xlevel = 0x80000022,
 .model_id = "AMD EPYC-Genoa Processor",
.cache_info = &epyc_genoa_cache_info,
+.versions = (X86CPUVersionDefinition[]) {
+{ .version = 1 },
+{
+.version = 2,
+.props = (PropValue[]) {
+{ "overflow-recov", "on" },
+{ "succor", "on" },
+{ "perfmon-v2", "on" },
+{ "model-id",
+  "AMD EPYC-Genoa-v2 Processor" },
+{ /* end of list */ }
+},
+},
+{ /* end of list */ }
+}
 },
 };
 
-- 
2.34.1




[PATCH 1/4] i386/cpu: Add RAS feature bits on EPYC CPU models

2024-06-12 Thread Babu Moger
Add support for the following RAS feature bits on AMD guests.

SUCCOR: Software uncorrectable error containment and recovery capability.
The processor supports software containment of uncorrectable errors
through context synchronizing data poisoning and deferred error
interrupts.

McaOverflowRecov: MCA overflow recovery support.

Signed-off-by: Babu Moger 
---
 target/i386/cpu.c | 30 ++
 1 file changed, 30 insertions(+)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 165b982c8c..86a90b1405 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -4939,6 +4939,16 @@ static const X86CPUDefinition builtin_x86_defs[] = {
 },
.cache_info = &epyc_v4_cache_info
 },
+{
+.version = 5,
+.props = (PropValue[]) {
+{ "overflow-recov", "on" },
+{ "succor", "on" },
+{ "model-id",
+  "AMD EPYC-v5 Processor" },
+{ /* end of list */ }
+},
+},
 { /* end of list */ }
 }
 },
@@ -5077,6 +5087,16 @@ static const X86CPUDefinition builtin_x86_defs[] = {
 { /* end of list */ }
 },
 },
+{
+.version = 5,
+.props = (PropValue[]) {
+{ "overflow-recov", "on" },
+{ "succor", "on" },
+{ "model-id",
+  "AMD EPYC-Rome-v5 Processor" },
+{ /* end of list */ }
+},
+},
 { /* end of list */ }
 }
 },
@@ -5152,6 +5172,16 @@ static const X86CPUDefinition builtin_x86_defs[] = {
 },
.cache_info = &epyc_milan_v2_cache_info
 },
+{
+.version = 3,
+.props = (PropValue[]) {
+{ "overflow-recov", "on" },
+{ "succor", "on" },
+{ "model-id",
+  "AMD EPYC-Milan-v3 Processor" },
+{ /* end of list */ }
+},
+},
 { /* end of list */ }
 }
 },
-- 
2.34.1




[PATCH 2/4] i386/cpu: Add PerfMonV2 feature bit

2024-06-12 Thread Babu Moger
From: Sandipan Das 

CPUID leaf 0x80000022, i.e. ExtPerfMonAndDbg, advertises new performance
monitoring features for AMD processors. Bit 0 of EAX indicates support
for Performance Monitoring Version 2 (PerfMonV2) features. If found to
be set during PMU initialization, the EBX bits can be used to determine
the number of available counters for different PMUs. It also denotes the
availability of global control and status registers.

Add the required CPUID feature word and feature bit to allow guests to
make use of the PerfMonV2 features.
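
(A minimal guest-side detection sketch, assuming GCC/Clang's <cpuid.h>;
not part of this patch:)

#include <cpuid.h>
#include <stdio.h>

int main(void)
{
    unsigned int eax, ebx, ecx, edx;

    /* Leaf 0x80000022 (ExtPerfMonAndDbg): EAX bit 0 advertises
     * PerfMonV2, EBX bits 3:0 the number of core counters. */
    if (__get_cpuid_count(0x80000022, 0, &eax, &ebx, &ecx, &edx) &&
        (eax & 1)) {
        printf("PerfMonV2 available, %u core counters\n", ebx & 0xf);
    } else {
        printf("PerfMonV2 not available\n");
    }
    return 0;
}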

Signed-off-by: Sandipan Das 
Signed-off-by: Babu Moger 
---
 target/i386/cpu.c | 26 ++
 target/i386/cpu.h |  4 
 2 files changed, 30 insertions(+)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 86a90b1405..7f1837cdc9 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -1228,6 +1228,22 @@ FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
 .tcg_features = 0,
 .unmigratable_flags = 0,
 },
+[FEAT_8000_0022_EAX] = {
+.type = CPUID_FEATURE_WORD,
+.feat_names = {
+"perfmon-v2", NULL, NULL, NULL,
+NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL,
+},
.cpuid = { .eax = 0x80000022, .reg = R_EAX, },
+.tcg_features = 0,
+.unmigratable_flags = 0,
+},
 [FEAT_XSAVE] = {
 .type = CPUID_FEATURE_WORD,
 .feat_names = {
@@ -6998,6 +7014,16 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
 *edx = 0;
 }
 break;
+case 0x80000022:
+*eax = *ebx = *ecx = *edx = 0;
+/* AMD Extended Performance Monitoring and Debug */
+if (kvm_enabled() && cpu->enable_pmu &&
+(env->features[FEAT_8000_0022_EAX] & CPUID_8000_0022_EAX_PERFMON_V2)) {
+*eax = CPUID_8000_0022_EAX_PERFMON_V2;
+*ebx = kvm_arch_get_supported_cpuid(cs->kvm_state, index, count,
+R_EBX) & 0xf;
+}
+break;
 case 0xC0000000:
 *eax = env->cpuid_xlevel2;
 *ebx = 0;
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index ba7f740392..03378da8fa 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -611,6 +611,7 @@ typedef enum FeatureWord {
 FEAT_8000_0007_EDX, /* CPUID[8000_0007].EDX */
 FEAT_8000_0008_EBX, /* CPUID[8000_0008].EBX */
 FEAT_8000_0021_EAX, /* CPUID[8000_0021].EAX */
+FEAT_8000_0022_EAX, /* CPUID[8000_0022].EAX */
 FEAT_C000_0001_EDX, /* CPUID[C000_0001].EDX */
 FEAT_KVM,   /* CPUID[4000_0001].EAX (KVM_CPUID_FEATURES) */
 FEAT_KVM_HINTS, /* CPUID[4000_0001].EDX */
@@ -986,6 +987,9 @@ uint64_t x86_cpu_get_supported_feature_word(FeatureWord w,
 /* Automatic IBRS */
 #define CPUID_8000_0021_EAX_AUTO_IBRS   (1U << 8)
 
+/* Performance Monitoring Version 2 */
+#define CPUID_8000_0022_EAX_PERFMON_V2  (1U << 0)
+
 #define CPUID_XSAVE_XSAVEOPT   (1U << 0)
 #define CPUID_XSAVE_XSAVEC (1U << 1)
 #define CPUID_XSAVE_XGETBV1(1U << 2)
-- 
2.34.1




[PATCH 4/4] i386/cpu: Add support for EPYC-Turin model

2024-06-12 Thread Babu Moger
Adds support for AMD EPYC Zen 5 processors (EPYC-Turin).

Adds the following new feature bits on top of the feature bits from
the previous generation EPYC models.

movdiri: Move Doubleword as Direct Store Instruction
movdir64b  : Move 64 Bytes as Direct Store Instruction
avx512-vp2intersect: AVX512 Vector Pair Intersection to a Pair
 of Mask Registers
avx-vnni   : AVX VNNI Instruction

Signed-off-by: Babu Moger 
---
 target/i386/cpu.c | 131 ++
 1 file changed, 131 insertions(+)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 64e6dc62e2..213b5f12f0 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -2370,6 +2370,60 @@ static const CPUCaches epyc_genoa_cache_info = {
 },
 };
 
+static const CPUCaches epyc_turin_cache_info = {
+.l1d_cache = &(CPUCacheInfo) {
+.type = DATA_CACHE,
+.level = 1,
+.size = 48 * KiB,
+.line_size = 64,
+.associativity = 12,
+.partitions = 1,
+.sets = 64,
+.lines_per_tag = 1,
+.self_init = 1,
+.no_invd_sharing = true,
+.share_level = CPU_TOPO_LEVEL_CORE,
+},
+.l1i_cache = &(CPUCacheInfo) {
+.type = INSTRUCTION_CACHE,
+.level = 1,
+.size = 32 * KiB,
+.line_size = 64,
+.associativity = 8,
+.partitions = 1,
+.sets = 64,
+.lines_per_tag = 1,
+.self_init = 1,
+.no_invd_sharing = true,
+.share_level = CPU_TOPO_LEVEL_CORE,
+},
+.l2_cache = &(CPUCacheInfo) {
+.type = UNIFIED_CACHE,
+.level = 2,
+.size = 1 * MiB,
+.line_size = 64,
+.associativity = 16,
+.partitions = 1,
+.sets = 1024,
+.lines_per_tag = 1,
+.share_level = CPU_TOPO_LEVEL_CORE,
+},
+.l3_cache = &(CPUCacheInfo) {
+.type = UNIFIED_CACHE,
+.level = 3,
+.size = 32 * MiB,
+.line_size = 64,
+.associativity = 16,
+.partitions = 1,
+.sets = 32768,
+.lines_per_tag = 1,
+.self_init = true,
+.inclusive = true,
+.complex_indexing = false,
+.share_level = CPU_TOPO_LEVEL_DIE,
+},
+};
+
 /* The following VMX features are not supported by KVM and are left out in the
  * CPU definitions:
  *
@@ -5288,6 +5342,83 @@ static const X86CPUDefinition builtin_x86_defs[] = {
 { /* end of list */ }
 }
 },
+{
+.name = "EPYC-Turin",
+.level = 0xd,
+.vendor = CPUID_VENDOR_AMD,
+.family = 26,
+.model = 0,
+.stepping = 0,
+.features[FEAT_1_ECX] =
+CPUID_EXT_RDRAND | CPUID_EXT_F16C | CPUID_EXT_AVX |
+CPUID_EXT_XSAVE | CPUID_EXT_AES |  CPUID_EXT_POPCNT |
+CPUID_EXT_MOVBE | CPUID_EXT_SSE42 | CPUID_EXT_SSE41 |
+CPUID_EXT_PCID | CPUID_EXT_CX16 | CPUID_EXT_FMA |
+CPUID_EXT_SSSE3 | CPUID_EXT_MONITOR | CPUID_EXT_PCLMULQDQ |
+CPUID_EXT_SSE3,
+.features[FEAT_1_EDX] =
+CPUID_SSE2 | CPUID_SSE | CPUID_FXSR | CPUID_MMX | CPUID_CLFLUSH |
+CPUID_PSE36 | CPUID_PAT | CPUID_CMOV | CPUID_MCA | CPUID_PGE |
+CPUID_MTRR | CPUID_SEP | CPUID_APIC | CPUID_CX8 | CPUID_MCE |
+CPUID_PAE | CPUID_MSR | CPUID_TSC | CPUID_PSE | CPUID_DE |
+CPUID_VME | CPUID_FP87,
+.features[FEAT_6_EAX] =
+CPUID_6_EAX_ARAT,
+.features[FEAT_7_0_EBX] =
+CPUID_7_0_EBX_FSGSBASE | CPUID_7_0_EBX_BMI1 | CPUID_7_0_EBX_AVX2 |
+CPUID_7_0_EBX_SMEP | CPUID_7_0_EBX_BMI2 | CPUID_7_0_EBX_ERMS |
+CPUID_7_0_EBX_INVPCID | CPUID_7_0_EBX_AVX512F |
+CPUID_7_0_EBX_AVX512DQ | CPUID_7_0_EBX_RDSEED | CPUID_7_0_EBX_ADX |
+CPUID_7_0_EBX_SMAP | CPUID_7_0_EBX_AVX512IFMA |
+CPUID_7_0_EBX_CLFLUSHOPT | CPUID_7_0_EBX_CLWB |
+CPUID_7_0_EBX_AVX512CD | CPUID_7_0_EBX_SHA_NI |
+CPUID_7_0_EBX_AVX512BW | CPUID_7_0_EBX_AVX512VL,
+.features[FEAT_7_0_ECX] =
+CPUID_7_0_ECX_AVX512_VBMI | CPUID_7_0_ECX_UMIP | CPUID_7_0_ECX_PKU |
+CPUID_7_0_ECX_AVX512_VBMI2 | CPUID_7_0_ECX_GFNI |
+CPUID_7_0_ECX_VAES | CPUID_7_0_ECX_VPCLMULQDQ |
+CPUID_7_0_ECX_AVX512VNNI | CPUID_7_0_ECX_AVX512BITALG |
+CPUID_7_0_ECX_AVX512_VPOPCNTDQ | CPUID_7_0_ECX_LA57 |
+CPUID_7_0_ECX_RDPID | CPUID_7_0_ECX_MOVDIRI |
+CPUID_7_0_ECX_MOVDIR64B,
+.features[FEAT_7_0_EDX] =
+CPUID_7_0_EDX_FSRM | CPUID_7_0_EDX_AVX512_VP2INTERSECT,
+.features[FEAT_7_1_EAX] =
+CPUID_7_1_EAX_AVX_VNNI | CPUID_7_1_EAX_AVX512_BF16,
+.features[FEAT_8000_0001_ECX] =
+CPUID_EXT3_OSVW | CPUID_EXT3_3DNOWPREFETCH |
+CPUID_EXT3_MISALIGNSSE | CPUID_EXT3_SSE4A | CPUID_EXT3_ABM |
+CPUID_EXT3_CR8LEG | CPUID_EXT3_SVM | 

Re: [PATCH 2/4] migration: Rename thread debug names

2024-06-12 Thread Fabiano Rosas
Peter Xu  writes:

> The postcopy thread names on dest QEMU are slightly confusing, partly I'll
> need to blame myself on 36f62f11e4 ("migration: Postcopy preemption
> preparation on channel creation").  E.g., "fault-fast" reads like a fast
> version of "fault-default", but it's actually the fast version of
> "postcopy/listen".
>
> Taking this chance, rename all the migration threads with proper rules.
> Considering we only have 15 chars usable, prefix all threads with "mig/",
> meanwhile identify src/dst threads properly this time.  So now most thread
> names will look like "mig/DIR/xxx", where DIR will be "src"/"dst", except
> the bg-snapshot thread which doesn't have a direction.
>
> For multifd threads, making them "mig/{src|dst}/{send|recv}_%d".
>
> We used to have "live_migration" thread for a very long time, now it's
> called "mig/src/main".  We may hope to have "mig/dst/main" soon but not
> yet.
>
> Signed-off-by: Peter Xu 

Reviewed-by: Fabiano Rosas 



Re: [PATCH] ui/gtk: Wait until the current guest frame is rendered before switching to RUN_STATE_SAVE_VM

2024-06-12 Thread Kim, Dongwon

On 6/11/2024 10:44 PM, Marc-André Lureau wrote:

Hi

On Wed, Jun 12, 2024 at 5:29 AM Kim, Dongwon wrote:


Hi,

From: Marc-André Lureau <marcandre.lur...@gmail.com>
Sent: Wednesday, June 5, 2024 12:56 AM
To: Kim, Dongwon <dongwon@intel.com>
Cc: qemu-devel@nongnu.org; Peter Xu <pet...@redhat.com>
Subject: Re: [PATCH] ui/gtk: Wait until the current guest frame is
rendered before switching to RUN_STATE_SAVE_VM

Hi

On Tue, Jun 4, 2024 at 9:49 PM Kim, Dongwon wrote:
On 6/4/2024 4:12 AM, Marc-André Lureau wrote:
 > Hi
 >
 > On Thu, May 30, 2024 at 2:44 AM <dongwon@intel.com> wrote:
 >
 >     From: Dongwon <dongwon@intel.com>
 >
 >     Make sure rendering of the current frame is finished before switching
 >     the run state to RUN_STATE_SAVE_VM by waiting for egl-sync object to be
 >     signaled.
 >
 >
 > Can you expand on what this solves?

In the current scheme, the guest waits for the fence to be signaled for
each frame it submits before moving to the next frame. If the guest's
state is saved while it is still waiting for the fence, then on restore
the guest will keep waiting for a fence that was signaled long ago. One
way to prevent this is to let the guest finish the current frame before
changing the run state.

After the UI sets a fence, hw_ops->gl_block(true) gets called, which
will block virtio-gpu/virgl from processing commands (until the
fence is signaled and gl_block(false) is called again).

But this "blocking" state is not saved. So how does this affect
save/restore? Please give more details, thanks

Yeah, sure. The "blocking" state is not saved, but the guest's state is
saved while it is still waiting for the response to its last
resource-flush virtio msg. This virtio response, by the way, is sent to
the guest when the pipeline is unblocked (when the fence is signaled).
Once the guest's state is saved, the current guest instance continues
and receives the response as usual. The problem happens when we restore
the saved guest state, because the restored guest will be waiting for a
response that was already sent to the original instance.


Where is the pending response saved? Can you detail how you test this?



There is no pending response for the guest's restored point, which is a 
problem. The response is sent out after saving is done.


Normal cycle :

resource-flush (scanout flush) -> gl block -> render -> gl unblock 
(after fence is signaled) -> pending response sent out to the guest -> 
guest (virtio-gpu drv) processes the next scanout frame -> (next cycle) 
resource-flush -> gl block ..


When vm state is saved in the middle :

resource-flush (scanout-flush) -> gl block -> saving vm-state -> render 
-> gl unblock -> pending response (resp #1) sent out to the guest -> 
guest (virtio-gpu drv) processes the next scanout frame -> (next cycle) 
resource-flush -> gl block ..


Now, we restore the vm-state we saved

vm-state is restored -> guest (virtio-gpu drv) can't move on as this 
state is still waiting for the response (resp #1)


So we need to make sure vm-state is saved after the cycle is completed.
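
(To make that concrete, a hedged sketch of waiting on the frame's EGL
fence before the run state change; the helper and its callers are
assumptions, not the actual ui/gtk code:)

#include <epoxy/egl.h>

/* Block until the scanout's fence signals, so the pending response can
 * go out before switching to RUN_STATE_SAVE_VM.  One-second timeout. */
static void wait_for_frame_fence(EGLDisplay dpy, EGLSyncKHR fence)
{
    if (fence != EGL_NO_SYNC_KHR) {
        eglClientWaitSyncKHR(dpy, fence,
                             EGL_SYNC_FLUSH_COMMANDS_BIT_KHR,
                             1000000000ULL /* 1s in ns */);
    }
}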

This situation only happens if you use blob=true with the virtio-gpu drv
as KMS on the Linux guest. Do you have a similar setup?



thanks

--
Marc-André Lureau





Re: [PATCH v2 18/18] migration/ram: Add direct-io support to precopy file migration

2024-06-12 Thread Fabiano Rosas
Peter Xu  writes:

> On Wed, Jun 12, 2024 at 07:15:19PM +0100, Daniel P. Berrangé wrote:
>> IIUC, with the "fixed-ram" feature, the on-disk format of a saved VM
>> should end up the same whether we're using traditional migration, or
>> multifd migration. Use of multifd is simply an optimization that lets
>> us write RAM in parallel to the file, with direct-io further optimizing.
>> 
>> There's also a clear break with libvirt between the existing on-disk
>> format libvirt uses, and the new fixed-ram format. So we have no backwards
>> compatibilty concerns added from multifd, beyond what we already have to
>> figure out when deciding on use of 'fixed-ram'. 
>> 
>> Thus I believe there is no downside to always using multifd for save
>> images with fixed-ram, even if we only want nchannels=1.
>
> That sounds good.
>
> Just to double check with all of us: so we allow mapped-ram to be used in
> whatever case when !dio, however we restrict dio only when with multifd=on,
> am I right?

Yes. The restricting part is not yet in place. I'll add a multifd check
to migrate_direct_io():

bool migrate_direct_io(void)
{
MigrationState *s = migrate_get_current();

return s->parameters.direct_io &&
s->capabilities[MIGRATION_CAPABILITY_MAPPED_RAM] &&
s->capabilities[MIGRATION_CAPABILITY_MULTIFD];
}



Sample 'qapi-schema-diff' output for v9.0.0 vs origin/master

2024-06-12 Thread John Snow
Hiya, here's some draft output of a script I'm working on to summarize QMP
wire format differences between QEMU versions.

This script works in terms of the QMP *wire format*, not the QAPI
*specification*. As a consequence, *almost* all type names are stripped
from this output and all nested structures are fully inlined - so changes
to shared data structures, enums, etc will manifest as many independent
changes. Similarly, changes to type names and type factorings that do not
change the wire format will not appear in this report at all.

This is still a WIP: if conditionals and features may not be fully
represented in this summary report.

Here's today's diff output, see if you think this format is "intuitive" or
makes sense to some extent; or, if you're feeling bored, if you believe
it's fully accurate:

jsnow@scv ~/s/qemu (master)> qapi-schema-diff qapi-compiled-v9.0.0.json
qapi-compiled-v9.0.0-1388-g80e8f060216.json
###
v9.0.0 ==> v9.0.0-1388-g80e8f060216
###


commands


Removed
===
x-query-rdma

Modified

blockdev-backup (arguments)
++ arguments.discard-source: Optional
drive-backup (arguments)
++ arguments.discard-source: Optional
migrate (arguments)
-- arguments.blk: Optional
-- arguments.inc: Optional
migrate-incoming (arguments)
++ arguments.exit-on-error: Optional
migrate-set-capabilities (arguments)
·· arguments.capabilities[].capability: enum
-- 'block'
-- 'compress'
migrate-set-parameters (arguments)
-- arguments.block-incremental: Optional
-- arguments.compress-level: Optional
-- arguments.compress-threads: Optional
-- arguments.compress-wait-thread: Optional
-- arguments.decompress-threads: Optional
object-add (arguments)
·· arguments.qom-type: enum
++ 'sev-snp-guest'
++ arguments.legacy-vm-type: Optional
++ arguments.author-key-enabled:
Optional
++ arguments.cbitpos: Optional
++ arguments.guest-visible-workarounds:
Optional
++ arguments.host-data: Optional
++ arguments.id-auth: Optional
++ arguments.id-block: Optional
++ arguments.kernel-hashes: Optional
++ arguments.policy: Optional
++ arguments.reduced-phys-bits: integer
++ arguments.sev-device: Optional
++ arguments.vcek-disabled: Optional
query-cpu-model-baseline (returns, arguments)
++ arguments.modela.deprecated-props: Optional
++ arguments.modela.deprecated-props[]: string
++ arguments.modelb.deprecated-props: Optional
++ arguments.modelb.deprecated-props[]: string
++ returns.model.deprecated-props: Optional
++ returns.model.deprecated-props[]: string
query-cpu-model-comparison (arguments)
++ arguments.modela.deprecated-props: Optional
++ arguments.modela.deprecated-props[]: string
++ arguments.modelb.deprecated-props: Optional
++ arguments.modelb.deprecated-props[]: string
query-cpu-model-expansion (returns, arguments)
++ arguments.model.deprecated-props: Optional
++ arguments.model.deprecated-props[]: string
++ returns.model.deprecated-props: Optional
++ returns.model.deprecated-props[]: string
query-cpus-fast (returns)
++ returns[].props.module-id: Optional
·· returns[].target: enum
-- 'nios2'
query-hotpluggable-cpus (returns)
++ returns[].props.module-id: Optional
query-machines (returns, arguments)
++ arguments.compat-props: Optional
++ returns[].compat-props: Optional
++ returns[].compat-props[]: object
++ returns[].compat-props[].property: string
++ returns[].compat-props[].qom-type: string
++ returns[].compat-props[].value: string
query-migrate (returns)
-- returns.compression: Optional
-- returns.compression.busy: integer
-- returns.compression.busy-rate: number
-- returns.compression.compressed-size: integer
-- returns.compression.compression-rate: number
-- returns.compression.pages: integer
-- returns.disk: Optional
-- returns.disk.dirty-pages-rate: integer
-- returns.disk.dirty-sync-count: integer
-- returns.disk.dirty-sync-missed-zero-copy: integer
-- returns.disk.downtime-bytes: integer
-- returns.disk.duplicate: integer
-- returns.disk.mbps: number
-- returns.disk.multifd-bytes: integer
-- returns.disk.normal: integer
-- returns.disk.normal-bytes: integer
-- returns.disk.page-size: integer
-- returns.disk.pages-per-second: integer
-- returns.disk.postcopy-bytes: integer
-- returns.disk.postcopy-requests: integer
-- returns.disk.precopy-bytes: integer
-- returns.disk.remaining: integer
-- returns.disk.skipped: integer
-- returns.disk.total: integer
-- returns.disk.transferred: integer
-- returns.ram.skipped: integer
query-migrate-capabilities (returns)
·· returns[].capability: enum
-- 'block'
-- 'compress'
query-migrate-parameters (returns)
-- returns.block-incremental: Optional
-- 

Re: [PATCH] Hexagon: lldb read/write predicate registers p0/p1/p2/p3

2024-06-12 Thread Matheus Tavares Bernardino
On Wed, 12 Jun 2024 10:42:39 -0600 Taylor Simpson  
wrote:
>
> diff --git a/target/hexagon/gdbstub.c b/target/hexagon/gdbstub.c
> index 502c6987f0..e67e627fc9 100644
> --- a/target/hexagon/gdbstub.c
> +++ b/target/hexagon/gdbstub.c
> @@ -56,6 +64,15 @@ int hexagon_gdb_write_register(CPUState *cs, uint8_t *mem_buf, int n)
>  return sizeof(target_ulong);
>  }
>  
> +n -= TOTAL_PER_THREAD_REGS;
> +
> +if (n < NUM_PREGS) {
> +env->pred[n] = ldtul_p(mem_buf);
> +return sizeof(uint8_t);

I wonder, shouldn't this be sizeof(target_ulong) since we wrote a target_ulong?

> +}
> +
> +n -= NUM_PREGS;
> +
>  g_assert_not_reached();
>  }



Re: [PATCH v2 18/18] migration/ram: Add direct-io support to precopy file migration

2024-06-12 Thread Peter Xu
On Wed, Jun 12, 2024 at 07:15:19PM +0100, Daniel P. Berrangé wrote:
> IIUC, with the "fixed-ram" feature, the on-disk format of a saved VM
> should end up the same whether we're using traditional migration, or
> multifd migration. Use of multifd is simply an optimization that lets
> us write RAM in parallel to the file, with direct-io further optimizing.
> 
> There's also a clear break with libvirt between the existing on-disk
> format libvirt uses, and the new fixed-ram format. So we have no backwards
> compatibilty concerns added from multifd, beyond what we already have to
> figure out when deciding on use of 'fixed-ram'. 
> 
> Thus I believe there is no downside to always using multifd for save
> images with fixed-ram, even if we only want nchannels=1.

That sounds good.

Just to double check with all of us: so we allow mapped-ram to be used in
whatever case when !dio, however we restrict dio only when with multifd=on,
am I right?

I'd personally like that, and it also pretty much matches with what we have
with tcp zerocopy send too. After all they're really similar stuff to me on
pinning implications and locked_vm restrictions.  It's just that the target
of the data movement is different here, either to NIC, or to/from a file.

Thanks,

-- 
Peter Xu




Re: [PATCH v2 18/18] migration/ram: Add direct-io support to precopy file migration

2024-06-12 Thread Daniel P . Berrangé
On Wed, Jun 12, 2024 at 03:08:02PM -0300, Fabiano Rosas wrote:
> Fabiano Rosas  writes:
> 
> > Peter Xu  writes:
> >
> >> On Mon, Jun 10, 2024 at 02:45:53PM -0300, Fabiano Rosas wrote:
> I looked into this and it's cumbersome:
> 
> - We'd need to check migrate_direct_io() several times, once to get the
>   second fd and during every IO to know to use the fd.
> 
> - Even getting the second fd is not straight forward, we need to create
>   a new ioc for it with qio_channel_new_path(). But QEMUFile is generic
>   code, so we'd probably need to call this channel-file specific
>   function from migration_channel_connect().
> 
> - With the new ioc, do we put it in QEMUFile, or do we take the fd only?
>   Or maybe an ioc with two fds? Or a new QIOChannelDirect? All options
>   look bad to me.
> 
> So I suggest we proceed with the 1 multifd channel approach,
> passing 2 fds into QEMU just like we do for the n channels. Is that ok
> from libvirt's perspective? I assume libvirt users are mostly interested
> in _enabling_ parallelism with --parallel, instead of _avoiding_ it with
> the omission of the option, so main thread + 1 channel should not be a
> bad thing.

IIUC, with the "fixed-ram" feature, the on-disk format of a saved VM
should end up the same whether we're using traditional migration, or
multifd migration. Use of multifd is simply an optimization that lets
us write RAM in parallel to the file, with direct-io further optimizing.

There's also a clear break with libvirt between the existing on-disk
format libvirt uses, and the new fixed-ram format. So we have no backwards
compatibilty concerns added from multifd, beyond what we already have to
figure out when deciding on use of 'fixed-ram'. 

Thus I believe there is no downside to always using multifd for save
images with fixed-ram, even if we only want nchannels=1.


> Choosing to use 1 multifd channel now is also a gentler introduction for
> when we finally move all of the vmstate migration into multifd (I've
> been looking into this, but don't hold your breaths).

Yes, future proofing is a good idea.

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH v2 18/18] migration/ram: Add direct-io support to precopy file migration

2024-06-12 Thread Fabiano Rosas
Fabiano Rosas  writes:

> Peter Xu  writes:
>
>> On Mon, Jun 10, 2024 at 02:45:53PM -0300, Fabiano Rosas wrote:
>>> >> AIUI, the issue here that users are already allowed to specify in
>>> >> libvirt the equivalent to direct-io and multifd independent of each
>>> >> other (bypass-cache, parallel). To start requiring both together now in
>>> >> some situations would be a regression. I confess I don't know libvirt
>>> >> code to know whether this can be worked around somehow, but as I said,
>>> >> it's a relatively simple change from the QEMU side.
>>> >
>>> > Firstly, I definitely want to already avoid all the calls to either
>>> > migration_direct_io_start() or *_finish(), now we already need to
>>> > explicitly call them in three paths, and that's not intuitive and less
>>> > readable, just like the hard coded rdma codes.
>>> 
>>> Right, but that's just a side-effect of how the code is structured and
>>> the fact that writes to the stream happen in small chunks. Setting
>>> O_DIRECT needs to happen around aligned IO. We could move the calls
>>> further down into qemu_put_buffer_at(), but that would be four fcntl()
>>> calls for every page.
>>
>> Hmm.. why we need four fcntl()s instead of two?
>
> Because we need to first get the flags before flipping the O_DIRECT
> bit. And we do this once to enable and once to disable.
>
> int flags = fcntl(fioc->fd, F_GETFL);
> if (enabled) {
> flags |= O_DIRECT;
> } else {
> flags &= ~O_DIRECT;
> }
> fcntl(fioc->fd, F_SETFL, flags);
>
>>> 
>>> A tangent:
>>>  one thing that occured to me now is that we may be able to restrict
>>>  calls to qemu_fflush() to internal code like add_to_iovec() and maybe
>>>  use that function to gather the correct amount of data before writing,
>>>  making sure it disables O_DIRECT in case alignment is about to be
>>>  broken?
>>
>> IIUC dio doesn't require alignment if we don't care about perf?  I meant it
>> should be legal to write(fd, buffer, 5) even if O_DIRECT?
>
> No, we may get an -EINVAL. See Daniel's reply.
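
(A quick standalone demonstration of that constraint; the required block
size is filesystem-dependent, 4096 here is an assumption:)

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    void *buf;
    int fd = open("dio.img", O_WRONLY | O_CREAT | O_DIRECT, 0644);

    if (fd < 0) {
        perror("open");
        return 1;
    }
    /* Unaligned buffer/length: typically fails with EINVAL. */
    if (write(fd, "hello", 5) < 0) {
        perror("unaligned write");
    }
    /* Aligned buffer, offset and length: accepted. */
    if (posix_memalign(&buf, 4096, 4096) == 0) {
        memset(buf, 0, 4096);
        if (write(fd, buf, 4096) < 0) {
            perror("aligned write");
        }
        free(buf);
    }
    close(fd);
    return 0;
}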
>
>>
>> I just noticed the asserts you added in previous patch, I think that's
>> better indeed, but still I'm wondering whether we can avoid enabling it on
>> qemufile.
>>
>> It makes me feel slightly nervous when introducing dio to QEMUFile rather
>> than iochannels - the API design of QEMUFile seems to easily encourage
>> breaking things in dio worlds with a default and static buffering. And if
>> we're going to blacklist most of the API anyway except the new one for
>> mapped-ram, I start to wonder too why bother on top of QEMUFile anyway.
>>
>> IIRC you also mentioned in the previous doc patch so that libvirt should
>> always pass in two fds anyway to the fdset if dio is enabled.  I wonder
>> whether it's also true for multifd=off and directio=on, then would it be
>> possible to use the dio for guest pages with one fd, while keeping the
>> normal stream to use !dio with the other fd.  I'm not sure whether it's
>> easy to avoid qemufile in the dio fd, even if not looks like we may avoid
>> frequent fcntl()s?
>
> Hm, sounds like a good idea. We'd need a place to put that new ioc
> though. Either QEMUFile.direct_ioc and then make use of it in
> qemu_put_buffer_at() or a more transparent QIOChannelFile.direct_fd that
> gets set somewhere during file_start_outgoing_migration(). Let me try to
> come up with something.

I looked into this and it's cumbersome:

- We'd need to check migrate_direct_io() several times, once to get the
  second fd and during every IO to know to use the fd.

- Even getting the second fd is not straightforward, we need to create
  a new ioc for it with qio_channel_new_path(). But QEMUFile is generic
  code, so we'd probably need to call this channel-file specific
  function from migration_channel_connect().

- With the new ioc, do we put it in QEMUFile, or do we take the fd only?
  Or maybe an ioc with two fds? Or a new QIOChannelDirect? All options
  look bad to me.

So I suggest we proceed with the 1 multifd channel approach, passing 2
fds into QEMU just like we do for the n channels. Is that ok from
libvirt's perspective? I assume libvirt users are mostly interested in
_enabling_ parallelism with --parallel, instead of _avoiding_ it with
the omission of the option, so main thread + 1 channel should not be a
bad thing.

Choosing to use 1 multifd channel now is also a gentler introduction for
when we finally move all of the vmstate migration into multifd (I've
been looking into this, but don't hold your breaths).



Re: [PATCH v3] hw/arm/virt: Avoid unexpected warning from Linux guest on host with Fujitsu CPUs

2024-06-12 Thread Robin Murphy

On 2024-06-12 1:50 pm, Philippe Mathieu-Daudé wrote:

On 12/6/24 14:48, Peter Maydell wrote:
On Wed, 12 Jun 2024 at 13:33, Philippe Mathieu-Daudé 
 wrote:


Hi Zhenyu,

On 12/6/24 04:05, Zhenyu Zhang wrote:

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 3c93c0c0a6..3cefac6d43 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -271,6 +271,17 @@ static void create_fdt(VirtMachineState *vms)
   qemu_fdt_setprop_cell(fdt, "/", "#size-cells", 0x2);
   qemu_fdt_setprop_string(fdt, "/", "model", "linux,dummy-virt");

+    /*
+     * For QEMU, all DMA is coherent. Advertising this in the root node
+     * has two benefits:
+     *
+     * - It avoids potential bugs where we forget to mark a DMA
+     *   capable device as being dma-coherent
+     * - It avoids spurious warnings from the Linux kernel about
+     *   devices which can't do DMA at all
+     */
+    qemu_fdt_setprop(fdt, "/", "dma-coherent", NULL, 0);


OK, but why restrict that to the Aarch64 virt machine?
Shouldn't we advertise this generically in create_device_tree()?
Or otherwise at least in the other virt machines?


create_device_tree() creates an empty device tree, not one
with stuff in it. It seems reasonable to me for this property
on the root to be set in the same place we set other properties
of the root node.


OK. Still the question about other virt machines remains
unanswered :)


From the DT consumer point of view, the interpretation and assumptions 
around coherency *are* generally architecture- or platform-specific. For 
instance on RISC-V, many platforms want to assume coherency by default 
(and potentially use "dma-noncoherent" to mark individual devices that 
aren't), while others may still want to do the opposite and use 
"dma-coherent" in the same manner as Arm and AArch64. Neither property 
existed back in ePAPR, so typical PowerPC systems wouldn't even be 
looking and will just make their own assumptions by other means.
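
As a hedged illustration of that point (not part of the patch; the node
path here is made up), a platform that assumes coherency by default would
instead tag only the exceptional device with the opposite property:

/* mark a single non-coherent device, using the same FDT helper */
qemu_fdt_setprop(fdt, "/soc/dma-controller@10000000",
                 "dma-noncoherent", NULL, 0);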


Thanks,
Robin.



Re: [PATCH] hw/usb/hcd-ohci: Fix ohci_service_td: accept valid TDs

2024-06-12 Thread Cord Amfmgm
On Wed, Jun 12, 2024 at 9:21 AM Alex Bennée  wrote:

> David Hubbard  writes:
>
> > From: Cord Amfmgm 
> >
> > This changes the way the ohci emulation handles a Transfer Descriptor with
> > "Current Buffer Pointer" set to "Buffer End" + 1.
> >
> > The OHCI spec 4.3.1.2 Table 4-2 allows td.cbp to be one byte more than td.be
> > to signal the buffer has zero length. Currently qemu only accepts zero-length
> > Transfer Descriptors if the td.cbp is equal to 0, while actual OHCI hardware
> > accepts both cases.
> >
> > The qemu ohci emulation has a regression in ohci_service_td. Version 4.2
> > and earlier matched the spec. (I haven't taken the time to bisect exactly
> > where the logic was changed.)
>
> I find it hard to characterise this as a regression because we've
> basically gone from no checks to actually doing bounds checks:
>
>   1328fe0c32 (hw: usb: hcd-ohci: check len and frame_number variables)
>
> The argument here seems to be that real hardware is laxer than the specs
> in what it accepts.
>
> > With a tiny OS[1] that boots and executes a test, the issue can be seen:
> >
> > * OS that sends USB requests to a USB mass storage device
> >   but sends td.cbp = td.be + 1
> > * qemu 4.2
> > * qemu HEAD (4e66a0854)
> > * Actual OHCI controller (hardware)
> >
> > Command line:
> > qemu-system-x86_64 -m 20 \
> >  -device pci-ohci,id=ohci \
> >  -drive if=none,format=raw,id=d,file=testmbr.raw \
> >  -device usb-storage,bus=ohci.0,drive=d \
> >  --trace "usb_*" --trace "ohci_*" -D qemu.log
> >
> > Results are:
> >
> >  qemu 4.2   | qemu HEAD  | actual HW
> > ++
> >  works fine | ohci_die() | works fine
> >
> > Tip: if the flags "-serial pty -serial stdio" are added to the command line
> > the test will output USB requests like this:
> >
> > Testing qemu HEAD:
> >
> >> Free mem 2M ohci port2 conn FS
> >> setup { 80 6 0 1 0 0 8 0 }
> >> ED info=8 { mps=8 en=0 d=0 } tail=c20920
> >>   td0 c20880 nxt=c20960 f200 setup cbp=c20900 be=c20907
> >>   td1 c20960 nxt=c20980 f314in cbp=c20908 be=c2090f
> >>   td2 c20980 nxt=c20920 f308   out cbp=c20910 be=c2090f ohci20 host err
> >> usb stopped
> >
> > And in qemu.log:
> >
> > usb_ohci_iso_td_bad_cc_overrun ISO_TD start_offset=0x00c20910 > next_offset=0x00c2090f
> >
> > Testing qemu 4.2:
> >
> >> Free mem 2M ohci port2 conn FS
> >> setup { 80 6 0 1 0 0 8 0 }
> >> ED info=8 { mps=8 en=0 d=0 } tail=620920
> >>   td0 620880 nxt=620960 f200 setup cbp=620900 be=620907   cbp=0 be=620907
> >>   td1 620960 nxt=620980 f314in cbp=620908 be=62090f   cbp=0 be=62090f
> >>   td2 620980 nxt=620920 f308   out cbp=620910 be=62090f   cbp=0 be=62090f
> >>rx { 12 1 0 2 0 0 0 8 }
> >> setup { 0 5 1 0 0 0 0 0 } tx {}
> >> ED info=8 { mps=8 en=0 d=0 } tail=620880
> >>   td0 620920 nxt=620960 f200 setup cbp=620900 be=620907   cbp=0 be=620907
> >>   td1 620960 nxt=620880 f310in cbp=620908 be=620907   cbp=0 be=620907
> >> setup { 80 6 0 1 0 0 12 0 }
> >> ED info=80001 { mps=8 en=0 d=1 } tail=620960
> >>   td0 620880 nxt=6209c0 f200 setup cbp=620920 be=620927   cbp=0 be=620927
> >>   td1 6209c0 nxt=6209e0 f314in cbp=620928 be=620939   cbp=0 be=620939
> >>   td2 6209e0 nxt=620960 f308   out cbp=62093a be=620939   cbp=0 be=620939
> >>rx { 12 1 0 2 0 0 0 8 f4 46 1 0 0 0 1 2 3 1 }
> >> setup { 80 6 0 2 0 0 0 1 }
> >> ED info=80001 { mps=8 en=0 d=1 } tail=620880
> >>   td0 620960 nxt=6209a0 f200 setup cbp=620a20 be=620a27   cbp=0 be=620a27
> >>   td1 6209a0 nxt=6209c0 f3140004in cbp=620a28 be=620b27   cbp=620a48 be=620b27
> >>   td2 6209c0 nxt=620880 f308   out cbp=620b28 be=620b27   cbp=0 be=620b27
> >>rx { 9 2 20 0 1 1 4 c0 0 9 4 0 0 2 8 6 50 0 7 5 81 2 40 0 0 7 5 2 2 40 0 0 }
> >> setup { 0 9 1 0 0 0 0 0 } tx {}
> >> ED info=80001 { mps=8 en=0 d=1 } tail=620900
> >>   td0 620880 nxt=620940 f200 setup cbp=620a00 be=620a07   cbp=0 be=620a07
> >>   td1 620940 nxt=620900 f310in cbp=620a08 be=620a07   cbp=0 be=620a07
> >
> > [1] The OS disk image has been emailed to phi...@linaro.org,
> m...@tls.msk.ru,
> > and kra...@redhat.com:
> >
> > * testCbpOffBy1.img.xz
> > * sha256: f87baddcb86de845de12f002c698670a426affb40946025cc32694f9daa3abed
> >
> > Signed-off-by: Cord Amfmgm 
> > ---
> >  hw/usb/hcd-ohci.c   | 4 ++--
> >  hw/usb/trace-events | 1 +
> >  2 files changed, 3 insertions(+), 2 deletions(-)
> >
> > diff --git a/hw/usb/hcd-ohci.c b/hw/usb/hcd-ohci.c
> > index acd6016980..71b54914d3 100644
> > --- a/hw/usb/hcd-ohci.c
> > +++ b/hw/usb/hcd-ohci.c
> > @@ -941,8 +941,8 @@ static int ohci_service_td(OHCIState *ohci, struct
> ohci_ed *ed)
> >  if ((td.cbp & 0xfffff000) != (td.be & 0xfffff000)) {
> >  len = (td.be & 0xfff) + 0x1001 - (td.cbp & 0xfff);
> >  } else {
> > -if (td.cbp > td.be) {
> > -
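
(The hunk is cut off in the archive. As a hedged sketch of the fix being
described, not the verbatim patch, the zero-length case from OHCI spec
4.3.1.2 would be accepted instead of triggering ohci_die():)

if (td.cbp == td.be + 1) {
    len = 0;                  /* "Buffer End" + 1 signals zero length */
} else if (td.cbp > td.be) {
    ohci_die(ohci);           /* genuinely inconsistent TD */
    return 1;
} else {
    len = (td.be - td.cbp) + 1;
}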

Re: [PATCH 0/5] Reinstate ability to use Qemu on pre-SSE4.1 x86 hosts

2024-06-12 Thread Paolo Bonzini
On Wed, Jun 12, 2024 at 7:00 PM Daniel P. Berrangé  wrote:
> > I guess that, because these helpers are called by TCG, you wouldn't
> > pay the price of the indirect call. However, adding all this
> > infrastructure for 13-15 year old CPUs is not very enthralling.
>
> Rather than re-introducing a runtime check again for everyone, could
> we make it a configure time argument whether to assume x86_64-v2 ?

Fair enough, I'll work on it.

Paolo




Re: [PATCH 0/5] Reinstate ability to use Qemu on pre-SSE4.1 x86 hosts

2024-06-12 Thread Daniel P . Berrangé
On Wed, Jun 12, 2024 at 04:09:29PM +0100, Daniel P. Berrangé wrote:
> On Wed, Jun 12, 2024 at 01:21:26PM +0100, Daniel P. Berrangé wrote:
> > On Wed, Jun 12, 2024 at 01:51:31PM +0200, Paolo Bonzini wrote:
> > > On Wed, Jun 12, 2024 at 1:38 PM Daniel P. Berrangé  
> > > wrote:
> > > > This isn't anything to do with the distro installer. The use case is 
> > > > that
> > > > the distro wants all its software to be able to run on the x86_64 
> > > > baseline
> > > > it has chosen to build with.
> > > 
> > > Sure, and they can patch the packages if their wish is not shared by
> > > upstream. Alternatively they can live with the fact that not all users
> > > will be able to use all packages, which is probably already the case.
> > 
> > Yep, there are almost certainly scientific packages that have done
> > optimizations in their builds. QEMU is slightly more special
> > though because it is classed as a "critical path" package for
> > the distro. Even the QEMU linux-user pieces are now critical path,
> > since they're leveraged by docker & podman for running foreign arch
> > containers.
> > 
> > > Or drop QEMU, I guess. Has FESCo ever expressed how strict they are
> > > and which of the three options they'd pick?
> > 
> > I don't know - i'm going to raise this question to find out if
> > there's any guidance.
> 
> I learnt that FESCo approved a surprisingly loose rule saying
> 
>   "Libraries packaged in Fedora may require ISA extensions,
>however any packaged application must not crash on any
>officially supported architecture, either by providing
>a generic fallback implementation OR by cleanly exiting
>when the requisite hardware support is unavailable."
>

..snip..

I queried the looseness of this wording, and it is suggested
it wasn't intended to apply to existing packages, just newly
added ones. By that interpretation it wouldn't be valid for
QEMU, and we'd be pushed towards the revert downstream, to
retain a runtime check for the feature. I really hate the
idea of keeping a revert of these patches downstream though,
as it would be an indefinite rebase headache.

With regards,
Daniel
-- 
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|




[PATCH] cpu: fix memleak of 'halt_cond' and 'thread'

2024-06-12 Thread Matheus Tavares Bernardino
Since a4c2735f35 (cpu: move Qemu[Thread|Cond] setup into common code,
2024-05-30) these fields are now allocated at cpu_common_initfn(). So
let's make sure we also free them at cpu_common_finalize().

Furthermore, the round-robin code also frees these when sharing the
single TCG thread, but we missed 'halt_cond'.

Signed-off-by: Matheus Tavares Bernardino 
---
 accel/tcg/tcg-accel-ops-rr.c | 1 +
 hw/core/cpu-common.c | 3 +++
 2 files changed, 4 insertions(+)

diff --git a/accel/tcg/tcg-accel-ops-rr.c b/accel/tcg/tcg-accel-ops-rr.c
index 84c36c1450..48c38714bd 100644
--- a/accel/tcg/tcg-accel-ops-rr.c
+++ b/accel/tcg/tcg-accel-ops-rr.c
@@ -329,6 +329,7 @@ void rr_start_vcpu_thread(CPUState *cpu)
 /* we share the thread, dump spare data */
 g_free(cpu->thread);
 qemu_cond_destroy(cpu->halt_cond);
+g_free(cpu->halt_cond);
 cpu->thread = single_tcg_cpu_thread;
 cpu->halt_cond = single_tcg_halt_cond;
 
diff --git a/hw/core/cpu-common.c b/hw/core/cpu-common.c
index bf1a7b8892..f131cde2c0 100644
--- a/hw/core/cpu-common.c
+++ b/hw/core/cpu-common.c
@@ -286,6 +286,9 @@ static void cpu_common_finalize(Object *obj)
 g_array_free(cpu->gdb_regs, TRUE);
 qemu_lockcnt_destroy(&cpu->in_ioctl_lock);
 qemu_mutex_destroy(&cpu->work_mutex);
+qemu_cond_destroy(cpu->halt_cond);
+g_free(cpu->halt_cond);
+g_free(cpu->thread);
 }
 
 static int64_t cpu_common_get_arch_id(CPUState *cpu)
-- 
2.37.2




Re: [PATCH 0/5] Reinstate ability to use Qemu on pre-SSE4.1 x86 hosts

2024-06-12 Thread Daniel P . Berrangé
On Wed, Jun 12, 2024 at 01:51:31PM +0200, Paolo Bonzini wrote:
> On Wed, Jun 12, 2024 at 1:38 PM Daniel P. Berrangé  
> wrote:
> > If we want to use POPCNT in the TCG code, can we not do a runtime check
> > and selectively build pieces of code with  
> > __attribute__((target("popcnt"))),
> > as we've done historically for the bufferiszero.c code, rather than
> > changing the entire QEMU baseline ?
> 
> bufferiszero.c has a very quick check in front of the indirect call
> and runs for several hundred clock cycles, so the tradeoff is
> different there.
> 
> I guess that, because these helpers are called by TCG, you wouldn't
> pay the price of the indirect call. However, adding all this
> infrastructure for 13-15 year old CPUs is not very enthralling.

Ah, so the distinction is that the old code had a runtime check
on 'have_popcnt' (and similar), whereas now that check is eliminated
at compile time, since the condition is a constant.
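
As a minimal sketch of the two patterns (names are illustrative, not
QEMU's actual helpers):

#include <stdint.h>

/* Variant compiled with POPCNT available, regardless of the baseline. */
__attribute__((target("popcnt")))
static int ctpop32_popcnt(uint32_t v)
{
    return __builtin_popcount(v);
}

/* Generic fallback that runs on any x86_64 CPU. */
static int ctpop32_generic(uint32_t v)
{
    v = v - ((v >> 1) & 0x55555555u);
    v = (v & 0x33333333u) + ((v >> 2) & 0x33333333u);
    return (((v + (v >> 4)) & 0x0f0f0f0fu) * 0x01010101u) >> 24;
}

/* Runtime dispatch: the kind of check the series removed in favour of a
 * compile-time constant. */
static int ctpop32(uint32_t v)
{
    return __builtin_cpu_supports("popcnt") ? ctpop32_popcnt(v)
                                            : ctpop32_generic(v);
}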

Rather than re-introducing a runtime check for everyone, could
we make it a configure time argument whether to assume x86_64-v2?
So those who are happy with an increased baseline can achieve the
maximum performance with all checks eliminated at compile time,
while still allowing the tradeoff of a dynamic check for those who
prefer compatibility over peak performance?

With regards,
Daniel
-- 
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|




[PATCH] Hexagon: lldb read/write predicate registers p0/p1/p2/p3

2024-06-12 Thread Taylor Simpson
hexagon-core.xml only exposes register p3_0 which is an alias that
aggregates the predicate registers.  It is more convenient for users
to interact directly with the predicate registers.

Tested with lldb downloaded from this location
https://github.com/llvm/llvm-project/releases/download/llvmorg-18.1.4/clang+llvm-18.1.4-x86_64-linux-gnu-ubuntu-18.04.tar.xz

BEFORE:
(lldb) reg read p3_0
p3_0 = 0x
(lldb) reg read p0
error: Invalid register name 'p0'.
(lldb) reg write p1 0xf
error: Register not found for 'p1'.

AFTER:
(lldb) reg read p3_0
p3_0 = 0x
(lldb) reg read p0
  p0 = 0x00
(lldb) reg read -s 1
Predicate Registers:
p0 = 0x00
p1 = 0x00
p2 = 0x00
p3 = 0x00

(lldb) reg write p1 0xf
(lldb) reg read p3_0
p3_0 = 0x0f00
(lldb) reg write p3_0 0xff00ff00
(lldb) reg read -s 1
Predicate Registers:
p0 = 0x00
p1 = 0xff
p2 = 0x00
p3 = 0xff

Signed-off-by: Taylor Simpson 
---
 target/hexagon/gdbstub.c | 19 ++-
 gdb-xml/hexagon-core.xml |  6 +-
 2 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/target/hexagon/gdbstub.c b/target/hexagon/gdbstub.c
index 502c6987f0..e67e627fc9 100644
--- a/target/hexagon/gdbstub.c
+++ b/target/hexagon/gdbstub.c
@@ -1,5 +1,5 @@
 /*
- *  Copyright(c) 2019-2021 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *  Copyright(c) 2019-2024 Qualcomm Innovation Center, Inc. All Rights Reserved.
  *
  *  This program is free software; you can redistribute it and/or modify
  *  it under the terms of the GNU General Public License as published by
@@ -36,6 +36,14 @@ int hexagon_gdb_read_register(CPUState *cs, GByteArray *mem_buf, int n)
 return gdb_get_regl(mem_buf, env->gpr[n]);
 }
 
+n -= TOTAL_PER_THREAD_REGS;
+
+if (n < NUM_PREGS) {
+return gdb_get_reg8(mem_buf, env->pred[n]);
+}
+
+n -= NUM_PREGS;
+
 g_assert_not_reached();
 }
 
@@ -56,6 +64,15 @@ int hexagon_gdb_write_register(CPUState *cs, uint8_t *mem_buf, int n)
 return sizeof(target_ulong);
 }
 
+n -= TOTAL_PER_THREAD_REGS;
+
+if (n < NUM_PREGS) {
+env->pred[n] = ldtul_p(mem_buf);
+return sizeof(uint8_t);
+}
+
+n -= NUM_PREGS;
+
 g_assert_not_reached();
 }
 
diff --git a/gdb-xml/hexagon-core.xml b/gdb-xml/hexagon-core.xml
index e181163cff..b94378112a 100644
--- a/gdb-xml/hexagon-core.xml
+++ b/gdb-xml/hexagon-core.xml
@@ -1,6 +1,6 @@
 
 

Re: [PATCH 0/5] Reinstate ability to use Qemu on pre-SSE4.1 x86 hosts

2024-06-12 Thread Daniel P . Berrangé
On Wed, Jun 12, 2024 at 06:40:09PM +0300, Alexander Monakov wrote:
> 
> On Wed, 12 Jun 2024, Daniel P. Berrangé wrote:
> 
> > I learnt that FESCo approved a surprisingly loose rule saying
> > 
> >   "Libraries packaged in Fedora may require ISA extensions,
> >however any packaged application must not crash on any
> >officially supported architecture, either by providing
> >a generic fallback implementation OR by cleanly exiting
> >when the requisite hardware support is unavailable."
> > 
> > This might suggest we could put a runtime feature check in main(),
> > print a warning and then exit(1), however, QEMU has alot of code
> > that is triggered from ELF constructors. If we're building the
> > entire of QEMU codebase with extra features enabled, I worry that
> > the constructors could potentially cause a illegal instruction
> > crash before main() runs ?
> 
> Are you literally suggesting to find a solution that satisfies the letter
> of Fedora rules, and not what's good for the spirit of a wider community?

I'm interested in exploring what the options are. Personally I still
think QEMU ought to maintain compat with the original x86_64 ABI, since
very few distros have moved to requiring -v2, but if that doesn't happen
I want to understand the implications for Fedora since that's where I'm
a maintainer.

With regards,
Daniel
-- 
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|




Re: [RFC PATCH v2 3/5] rust: add PL011 device model

2024-06-12 Thread Daniel P . Berrangé
On Wed, Jun 12, 2024 at 05:14:56PM +0300, Manos Pitsidianakis wrote:
> On Wed, 12 Jun 2024 15:29, Paolo Bonzini  wrote:
> > I think this is extremely useful to show where we could go in the task
> > of creating more idiomatic bindings.
> > 
> > On Tue, Jun 11, 2024 at 12:34 PM Manos Pitsidianakis
> >  wrote:
> > > +pub const PL011_ARM_INFO: TypeInfo = TypeInfo {
> > > +name: TYPE_PL011.as_ptr(),
> > > +parent: TYPE_SYS_BUS_DEVICE.as_ptr(),
> > > +instance_size: core::mem::size_of::<PL011State>(),
> > > +instance_align: core::mem::align_of::<PL011State>(),
> > > +instance_init: Some(pl011_init),
> > > +instance_post_init: None,
> > > +instance_finalize: None,
> > > +abstract_: false,
> > > +class_size: 0,
> > > +class_init: Some(pl011_class_init),
> > > +class_base_init: None,
> > > +class_data: core::ptr::null_mut(),
> > > +interfaces: core::ptr::null_mut(),
> > > +};
> > 
> > QOM is certainly a major part of what we need to do for idiomatic
> > bindings. This series includes both using QOM objects (chardev) and
> > defining them.
> > 
> > For using QOM objects, there is only one strategy that we need to
> > implement for both Chardev and DeviceState/SysBusDevice which is nice.
> > We can probably take inspiration from glib-rs, see for example
> > - https://docs.rs/glib/latest/glib/object/trait.Cast.html
> > - https://docs.rs/glib/latest/glib/object/trait.ObjectType.html
> > - https://docs.rs/glib/latest/glib/object/struct.ObjectRef.html
> 
> 
> There was consensus in the community call that we won't be writing Rust APIs
> for internal C QEMU interfaces; or at least, that's not the goal

I think that is over-stating things. This was only mentioned in passing
and my feeling was that we didn't have that detailed of a discussion
because at this stage such a discussion is a bit like putting the cart
before the horse.

While the initial focus might be to just consume a Rust API that is a
1:1 mapping of the C API, I expect that over time we'll end up writing
various higher level Rust wrappers around the C API. If we didn't do that,
then in effect we'd be making ourselves write pseudo-C code in Rust,
undermining many of the benefits we could get.

With regards,
Daniel
-- 
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|




Re: [PATCH 3/9] plugins: Ensure register handles are not NULL

2024-06-12 Thread Pierrick Bouvier

On 6/12/24 08:35, Alex Bennée wrote:

From: Akihiko Odaki 

Ensure register handles are not NULL so that a plugin can assume NULL is
invalid as a register handle.

Signed-off-by: Akihiko Odaki 
Message-Id: <20240229-null-v1-1-e716501d9...@daynix.com>
Signed-off-by: Alex Bennée 
---
  plugins/api.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/plugins/api.c b/plugins/api.c
index 5a0a7f8c71..6bdb26bbe3 100644
--- a/plugins/api.c
+++ b/plugins/api.c
@@ -507,7 +507,7 @@ static GArray *create_register_handles(GArray *gdbstub_regs)
  }
  
  /* Create a record for the plugin */

-desc.handle = GINT_TO_POINTER(grd->gdb_reg);
+desc.handle = GINT_TO_POINTER(grd->gdb_reg + 1);
  desc.name = g_intern_string(grd->name);
  desc.feature = g_intern_string(grd->feature_name);
  g_array_append_val(find_data, desc);
@@ -528,7 +528,7 @@ int qemu_plugin_read_register(struct qemu_plugin_register 
*reg, GByteArray *buf)
  {
  g_assert(current_cpu);
  
-return gdb_read_register(current_cpu, buf, GPOINTER_TO_INT(reg));

+return gdb_read_register(current_cpu, buf, GPOINTER_TO_INT(reg) - 1);
  }
  
  struct qemu_plugin_scoreboard *qemu_plugin_scoreboard_new(size_t element_size)


Reviewed-by: Pierrick Bouvier 


Re: [PATCH 2/9] gdbstub: move enums into separate header

2024-06-12 Thread Pierrick Bouvier

On 6/12/24 08:35, Alex Bennée wrote:

This is an experiment to further reduce the amount we throw into the
exec headers. It might not be as useful as I initially thought because
just under half of the users also need gdbserver_start().

Signed-off-by: Alex Bennée 
---
  include/exec/gdbstub.h|  9 -
  include/gdbstub/enums.h   | 21 +
  accel/hvf/hvf-accel-ops.c |  2 +-
  accel/kvm/kvm-all.c   |  2 +-
  accel/tcg/tcg-accel-ops.c |  2 +-
  gdbstub/user.c|  1 +
  monitor/hmp-cmds.c|  3 ++-
  system/vl.c   |  1 +
  target/arm/hvf/hvf.c  |  2 +-
  target/arm/hyp_gdbstub.c  |  2 +-
  target/arm/kvm.c  |  2 +-
  target/i386/kvm/kvm.c |  2 +-
  target/ppc/kvm.c  |  2 +-
  target/s390x/kvm/kvm.c|  2 +-
  14 files changed, 34 insertions(+), 19 deletions(-)
  create mode 100644 include/gdbstub/enums.h

diff --git a/include/exec/gdbstub.h b/include/exec/gdbstub.h
index 008a92198a..1bd2c4ec2a 100644
--- a/include/exec/gdbstub.h
+++ b/include/exec/gdbstub.h
@@ -1,15 +1,6 @@
  #ifndef GDBSTUB_H
  #define GDBSTUB_H
  
-#define DEFAULT_GDBSTUB_PORT "1234"

-
-/* GDB breakpoint/watchpoint types */
-#define GDB_BREAKPOINT_SW0
-#define GDB_BREAKPOINT_HW1
-#define GDB_WATCHPOINT_WRITE 2
-#define GDB_WATCHPOINT_READ  3
-#define GDB_WATCHPOINT_ACCESS4
-
  typedef struct GDBFeature {
  const char *xmlname;
  const char *xml;
diff --git a/include/gdbstub/enums.h b/include/gdbstub/enums.h
new file mode 100644
index 00..c4d54a1d08
--- /dev/null
+++ b/include/gdbstub/enums.h
@@ -0,0 +1,21 @@
+/*
+ * gdbstub enums
+ *
+ * Copyright (c) 2024 Linaro Ltd
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#ifndef GDBSTUB_ENUMS_H
+#define GDBSTUB_ENUMS_H
+
+#define DEFAULT_GDBSTUB_PORT "1234"
+
+/* GDB breakpoint/watchpoint types */
+#define GDB_BREAKPOINT_SW0
+#define GDB_BREAKPOINT_HW1
+#define GDB_WATCHPOINT_WRITE 2
+#define GDB_WATCHPOINT_READ  3
+#define GDB_WATCHPOINT_ACCESS4
+
+#endif /* GDBSTUB_ENUMS_H */
diff --git a/accel/hvf/hvf-accel-ops.c b/accel/hvf/hvf-accel-ops.c
index b2a37a2229..ac08cfb9f3 100644
--- a/accel/hvf/hvf-accel-ops.c
+++ b/accel/hvf/hvf-accel-ops.c
@@ -52,7 +52,7 @@
  #include "qemu/main-loop.h"
  #include "exec/address-spaces.h"
  #include "exec/exec-all.h"
-#include "exec/gdbstub.h"
+#include "gdbstub/enums.h"
  #include "sysemu/cpus.h"
  #include "sysemu/hvf.h"
  #include "sysemu/hvf_int.h"
diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 009b49de44..5680cd157e 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -27,7 +27,7 @@
  #include "hw/pci/msi.h"
  #include "hw/pci/msix.h"
  #include "hw/s390x/adapter.h"
-#include "exec/gdbstub.h"
+#include "gdbstub/enums.h"
  #include "sysemu/kvm_int.h"
  #include "sysemu/runstate.h"
  #include "sysemu/cpus.h"
diff --git a/accel/tcg/tcg-accel-ops.c b/accel/tcg/tcg-accel-ops.c
index 1433e38f40..3c19e68a79 100644
--- a/accel/tcg/tcg-accel-ops.c
+++ b/accel/tcg/tcg-accel-ops.c
@@ -35,7 +35,7 @@
  #include "exec/exec-all.h"
  #include "exec/hwaddr.h"
  #include "exec/tb-flush.h"
-#include "exec/gdbstub.h"
+#include "gdbstub/enums.h"
  
  #include "hw/core/cpu.h"
  
diff --git a/gdbstub/user.c b/gdbstub/user.c
index edeb72efeb..e34b58b407 100644
--- a/gdbstub/user.c
+++ b/gdbstub/user.c
@@ -18,6 +18,7 @@
  #include "exec/gdbstub.h"
  #include "gdbstub/syscalls.h"
  #include "gdbstub/user.h"
+#include "gdbstub/enums.h"
  #include "hw/core/cpu.h"
  #include "trace.h"
  #include "internals.h"
diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
index ea79148ee8..067152589b 100644
--- a/monitor/hmp-cmds.c
+++ b/monitor/hmp-cmds.c
@@ -15,8 +15,9 @@
  
  #include "qemu/osdep.h"

  #include "exec/address-spaces.h"
-#include "exec/gdbstub.h"
  #include "exec/ioport.h"
+#include "exec/gdbstub.h"
+#include "gdbstub/enums.h"
  #include "monitor/hmp.h"
  #include "qemu/help_option.h"
  #include "monitor/monitor-internal.h"
diff --git a/system/vl.c b/system/vl.c
index a3eede5fa5..cfcb674425 100644
--- a/system/vl.c
+++ b/system/vl.c
@@ -68,6 +68,7 @@
  #include "sysemu/numa.h"
  #include "sysemu/hostmem.h"
  #include "exec/gdbstub.h"
+#include "gdbstub/enums.h"
  #include "qemu/timer.h"
  #include "chardev/char.h"
  #include "qemu/bitmap.h"
diff --git a/target/arm/hvf/hvf.c b/target/arm/hvf/hvf.c
index 45e2218be5..ef9bc42738 100644
--- a/target/arm/hvf/hvf.c
+++ b/target/arm/hvf/hvf.c
@@ -33,7 +33,7 @@
  #include "trace/trace-target_arm_hvf.h"
  #include "migration/vmstate.h"
  
-#include "exec/gdbstub.h"

+#include "gdbstub/enums.h"
  
  #define MDSCR_EL1_SS_SHIFT  0

  #define MDSCR_EL1_MDE_SHIFT 15
diff --git a/target/arm/hyp_gdbstub.c b/target/arm/hyp_gdbstub.c
index ebde2899cd..f120d55caa 100644
--- a/target/arm/hyp_gdbstub.c
+++ b/target/arm/hyp_gdbstub.c
@@ -12,7 +12,7 @@
  #include "qemu/osdep.h"
  #include "cpu.h"
  #include "internals.h"
-#include "exec/gdbstub.h"
+#include "gdbstub/enums.h"

Re: [PATCH 1/9] include/exec: add missing include guard comment

2024-06-12 Thread Pierrick Bouvier

On 6/12/24 08:35, Alex Bennée wrote:

Signed-off-by: Alex Bennée 
---
  include/exec/gdbstub.h | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/exec/gdbstub.h b/include/exec/gdbstub.h
index eb14b91139..008a92198a 100644
--- a/include/exec/gdbstub.h
+++ b/include/exec/gdbstub.h
@@ -144,4 +144,4 @@ void gdb_set_stop_cpu(CPUState *cpu);
  /* in gdbstub-xml.c, generated by scripts/feature_to_c.py */
  extern const GDBFeature gdb_static_features[];
  
-#endif

+#endif /* GDBSTUB_H */


Reviewed-by: Pierrick Bouvier 


Re: [PATCH 8/9] plugins: add time control API

2024-06-12 Thread Pierrick Bouvier

Hi Alex,

I noticed the new symbols lack the QEMU_PLUGIN_API qualifier in
include/qemu/qemu-plugin.h:

- qemu_plugin_update_ns
- qemu_plugin_request_time_control

So it would be impossible to use those symbols on Windows.
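
A minimal sketch of the missing annotations (their exact placement in
the header is an assumption):

/* include/qemu/qemu-plugin.h: mark the new entry points for DLL export,
 * as the other qemu_plugin_* declarations do */
QEMU_PLUGIN_API
const void *qemu_plugin_request_time_control(void);

QEMU_PLUGIN_API
void qemu_plugin_update_ns(const void *handle, int64_t time);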

I kept a reminder to send a new patch after you pulled this, but if we
go for a new series, it might be just as fast for you to add this directly.


Thanks,
Pierrick

On 6/12/24 08:35, Alex Bennée wrote:

Expose the ability to control time through the plugin API. Only one
plugin can control time so it has to request control when loaded.
There are probably more corner cases to catch here.

From: Alex Bennée 
Signed-off-by: Pierrick Bouvier 
[AJB: tweaked user-mode handling]
Signed-off-by: Alex Bennée 
Message-Id: <20240530220610.1245424-6-pierrick.bouv...@linaro.org>

---
plugins/next
   - make qemu_plugin_update_ns a NOP in user-mode
---
  include/qemu/qemu-plugin.h   | 25 +
  plugins/api.c| 35 +++
  plugins/qemu-plugins.symbols |  2 ++
  3 files changed, 62 insertions(+)

diff --git a/include/qemu/qemu-plugin.h b/include/qemu/qemu-plugin.h
index 95703d8fec..db4d67529e 100644
--- a/include/qemu/qemu-plugin.h
+++ b/include/qemu/qemu-plugin.h
@@ -661,6 +661,31 @@ void qemu_plugin_register_vcpu_mem_inline_per_vcpu(
  qemu_plugin_u64 entry,
  uint64_t imm);
  
+/**

+ * qemu_plugin_request_time_control() - request the ability to control time
+ *
+ * This grants the plugin the ability to control system time. Only one
+ * plugin can control time so if multiple plugins request the ability
+ * all but the first will fail.
+ *
+ * Returns an opaque handle or NULL if fails
+ */
+const void *qemu_plugin_request_time_control(void);
+
+/**
+ * qemu_plugin_update_ns() - update system emulation time
+ * @handle: opaque handle returned by qemu_plugin_request_time_control()
+ * @time: time in nanoseconds
+ *
+ * This allows an appropriately authorised plugin (i.e. holding the
+ * time control handle) to move system time forward to @time. For
+ * user-mode emulation the time is not changed by this as all reported
+ * time comes from the host kernel.
+ *
+ * Start time is 0.
+ */
+void qemu_plugin_update_ns(const void *handle, int64_t time);
+
  typedef void
  (*qemu_plugin_vcpu_syscall_cb_t)(qemu_plugin_id_t id, unsigned int vcpu_index,
   int64_t num, uint64_t a1, uint64_t a2,
diff --git a/plugins/api.c b/plugins/api.c
index 6bdb26bbe3..4431a0ea7e 100644
--- a/plugins/api.c
+++ b/plugins/api.c
@@ -39,6 +39,7 @@
  #include "qemu/main-loop.h"
  #include "qemu/plugin.h"
  #include "qemu/log.h"
+#include "qemu/timer.h"
  #include "tcg/tcg.h"
  #include "exec/exec-all.h"
  #include "exec/gdbstub.h"
@@ -583,3 +584,37 @@ uint64_t qemu_plugin_u64_sum(qemu_plugin_u64 entry)
  }
  return total;
  }
+
+/*
+ * Time control
+ */
+static bool has_control;
+
+const void *qemu_plugin_request_time_control(void)
+{
+if (!has_control) {
+has_control = true;
+return &has_control;
+}
+return NULL;
+}
+
+#ifdef CONFIG_SOFTMMU
+static void advance_virtual_time__async(CPUState *cpu, run_on_cpu_data data)
+{
+int64_t new_time = data.host_ulong;
+qemu_clock_advance_virtual_time(new_time);
+}
+#endif
+
+void qemu_plugin_update_ns(const void *handle, int64_t new_time)
+{
+#ifdef CONFIG_SOFTMMU
+if (handle == &has_control) {
+/* Need to execute out of cpu_exec, so bql can be locked. */
+async_run_on_cpu(current_cpu,
+ advance_virtual_time__async,
+ RUN_ON_CPU_HOST_ULONG(new_time));
+}
+#endif
+}
diff --git a/plugins/qemu-plugins.symbols b/plugins/qemu-plugins.symbols
index aa0a77a319..ca773d8d9f 100644
--- a/plugins/qemu-plugins.symbols
+++ b/plugins/qemu-plugins.symbols
@@ -38,6 +38,7 @@
qemu_plugin_register_vcpu_tb_exec_cond_cb;
qemu_plugin_register_vcpu_tb_exec_inline_per_vcpu;
qemu_plugin_register_vcpu_tb_trans_cb;
+  qemu_plugin_request_time_control;
qemu_plugin_reset;
qemu_plugin_scoreboard_free;
qemu_plugin_scoreboard_find;
@@ -51,5 +52,6 @@
qemu_plugin_u64_set;
qemu_plugin_u64_sum;
qemu_plugin_uninstall;
+  qemu_plugin_update_ns;
qemu_plugin_vcpu_for_each;
  };


Re: [PATCH v3 2/4] usb/hub: mark as deprecated

2024-06-12 Thread Alex Bennée
Daniel P. Berrangé  writes:

> On Thu, Jun 06, 2024 at 04:30:08PM +0200, Gerd Hoffmann wrote:
>> The hub supports only USB 1.1.  When running out of usb ports it is in
>> almost all cases the much better choice to add another usb host adapter
>> (or increase the number of root ports when using xhci) instead of using
>> the usb hub.
>
> Is that actually a strong enough reason to delete this device though ?
> This reads like its merely something we don't expect to be commonly
> used, rather than something we would actively want to delete.

This does seem quite aggressive because there may be cases when users
explicitly want to use old devices. Maybe there is a need for a third
state (better_alternatives?) so we can steer users away from old command
lines they may have picked up from the web to the modern alternative?


-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro



Re: [PATCH v2 8/9] target/arm: Add aarch64_tcg_ops

2024-06-12 Thread Richard Henderson

On 6/12/24 07:36, Alex Bennée wrote:

What happens when the CPU is running mixed mode code and jumping between
64 and 32 bit? Wouldn't it be easier to have a helper that routes to the
correct unwinder, c.f. gen_intermediate_code


GDB can't switch modes, so there is *never* any mode switching.


r~



Re: [PATCH v3] accel/tcg: Fix typo causing tb->page_addr[1] to not be recorded

2024-06-12 Thread Alex Bennée
Anton Johansson via  writes:

> For TBs crossing page boundaries, the 2nd page will never be
> recorded/removed, as the index of the 2nd page is computed from the
> address of the 1st page. This is due to a typo, fix it.
>
> Cc: qemu-sta...@nongnu.org
> Fixes: deba78709a ("accel/tcg: Always lock pages before translation")
> Signed-off-by: Anton Johansson 
> Reviewed-by: Manos Pitsidianakis 
> Reviewed-by: Philippe Mathieu-Daudé 

Reviewed-by: Alex Bennée 

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro



Re: [PATCH 0/5] Reinstate ability to use Qemu on pre-SSE4.1 x86 hosts

2024-06-12 Thread Alexander Monakov

On Wed, 12 Jun 2024, Daniel P. Berrangé wrote:

> I learnt that FESCo approved a surprisingly loose rule saying
> 
>   "Libraries packaged in Fedora may require ISA extensions,
>however any packaged application must not crash on any
>officially supported architecture, either by providing
>a generic fallback implementation OR by cleanly exiting
>when the requisite hardware support is unavailable."
> 
> This might suggest we could put a runtime feature check in main(),
> print a warning and then exit(1), however, QEMU has alot of code
> that is triggered from ELF constructors. If we're building the
> entire of QEMU codebase with extra features enabled, I worry that
> the constructors could potentially cause a illegal instruction
> crash before main() runs ?

Are you literally suggesting to find a solution that satisfies the letter
of Fedora rules, and not what's good for the spirit of a wider community?

Alexander

[PATCH 2/9] gdbstub: move enums into separate header

2024-06-12 Thread Alex Bennée
This is an experiment to further reduce the amount we throw into the
exec headers. It might not be as useful as I initially thought because
just under half of the users also need gdbserver_start().
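
As a hedged illustration of the split, a consumer that only needs the
constants can now avoid exec/gdbstub.h entirely:

#include "gdbstub/enums.h"

/* e.g. an accelerator classifying its own debug exit reason */
static bool is_hw_breakpoint(int gdb_type)
{
    return gdb_type == GDB_BREAKPOINT_HW;
}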

Signed-off-by: Alex Bennée 
---
 include/exec/gdbstub.h|  9 -
 include/gdbstub/enums.h   | 21 +
 accel/hvf/hvf-accel-ops.c |  2 +-
 accel/kvm/kvm-all.c   |  2 +-
 accel/tcg/tcg-accel-ops.c |  2 +-
 gdbstub/user.c|  1 +
 monitor/hmp-cmds.c|  3 ++-
 system/vl.c   |  1 +
 target/arm/hvf/hvf.c  |  2 +-
 target/arm/hyp_gdbstub.c  |  2 +-
 target/arm/kvm.c  |  2 +-
 target/i386/kvm/kvm.c |  2 +-
 target/ppc/kvm.c  |  2 +-
 target/s390x/kvm/kvm.c|  2 +-
 14 files changed, 34 insertions(+), 19 deletions(-)
 create mode 100644 include/gdbstub/enums.h

diff --git a/include/exec/gdbstub.h b/include/exec/gdbstub.h
index 008a92198a..1bd2c4ec2a 100644
--- a/include/exec/gdbstub.h
+++ b/include/exec/gdbstub.h
@@ -1,15 +1,6 @@
 #ifndef GDBSTUB_H
 #define GDBSTUB_H
 
-#define DEFAULT_GDBSTUB_PORT "1234"
-
-/* GDB breakpoint/watchpoint types */
-#define GDB_BREAKPOINT_SW0
-#define GDB_BREAKPOINT_HW1
-#define GDB_WATCHPOINT_WRITE 2
-#define GDB_WATCHPOINT_READ  3
-#define GDB_WATCHPOINT_ACCESS4
-
 typedef struct GDBFeature {
 const char *xmlname;
 const char *xml;
diff --git a/include/gdbstub/enums.h b/include/gdbstub/enums.h
new file mode 100644
index 00..c4d54a1d08
--- /dev/null
+++ b/include/gdbstub/enums.h
@@ -0,0 +1,21 @@
+/*
+ * gdbstub enums
+ *
+ * Copyright (c) 2024 Linaro Ltd
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#ifndef GDBSTUB_ENUMS_H
+#define GDBSTUB_ENUMS_H
+
+#define DEFAULT_GDBSTUB_PORT "1234"
+
+/* GDB breakpoint/watchpoint types */
+#define GDB_BREAKPOINT_SW0
+#define GDB_BREAKPOINT_HW1
+#define GDB_WATCHPOINT_WRITE 2
+#define GDB_WATCHPOINT_READ  3
+#define GDB_WATCHPOINT_ACCESS4
+
+#endif /* GDBSTUB_ENUMS_H */
diff --git a/accel/hvf/hvf-accel-ops.c b/accel/hvf/hvf-accel-ops.c
index b2a37a2229..ac08cfb9f3 100644
--- a/accel/hvf/hvf-accel-ops.c
+++ b/accel/hvf/hvf-accel-ops.c
@@ -52,7 +52,7 @@
 #include "qemu/main-loop.h"
 #include "exec/address-spaces.h"
 #include "exec/exec-all.h"
-#include "exec/gdbstub.h"
+#include "gdbstub/enums.h"
 #include "sysemu/cpus.h"
 #include "sysemu/hvf.h"
 #include "sysemu/hvf_int.h"
diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 009b49de44..5680cd157e 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -27,7 +27,7 @@
 #include "hw/pci/msi.h"
 #include "hw/pci/msix.h"
 #include "hw/s390x/adapter.h"
-#include "exec/gdbstub.h"
+#include "gdbstub/enums.h"
 #include "sysemu/kvm_int.h"
 #include "sysemu/runstate.h"
 #include "sysemu/cpus.h"
diff --git a/accel/tcg/tcg-accel-ops.c b/accel/tcg/tcg-accel-ops.c
index 1433e38f40..3c19e68a79 100644
--- a/accel/tcg/tcg-accel-ops.c
+++ b/accel/tcg/tcg-accel-ops.c
@@ -35,7 +35,7 @@
 #include "exec/exec-all.h"
 #include "exec/hwaddr.h"
 #include "exec/tb-flush.h"
-#include "exec/gdbstub.h"
+#include "gdbstub/enums.h"
 
 #include "hw/core/cpu.h"
 
diff --git a/gdbstub/user.c b/gdbstub/user.c
index edeb72efeb..e34b58b407 100644
--- a/gdbstub/user.c
+++ b/gdbstub/user.c
@@ -18,6 +18,7 @@
 #include "exec/gdbstub.h"
 #include "gdbstub/syscalls.h"
 #include "gdbstub/user.h"
+#include "gdbstub/enums.h"
 #include "hw/core/cpu.h"
 #include "trace.h"
 #include "internals.h"
diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
index ea79148ee8..067152589b 100644
--- a/monitor/hmp-cmds.c
+++ b/monitor/hmp-cmds.c
@@ -15,8 +15,9 @@
 
 #include "qemu/osdep.h"
 #include "exec/address-spaces.h"
-#include "exec/gdbstub.h"
 #include "exec/ioport.h"
+#include "exec/gdbstub.h"
+#include "gdbstub/enums.h"
 #include "monitor/hmp.h"
 #include "qemu/help_option.h"
 #include "monitor/monitor-internal.h"
diff --git a/system/vl.c b/system/vl.c
index a3eede5fa5..cfcb674425 100644
--- a/system/vl.c
+++ b/system/vl.c
@@ -68,6 +68,7 @@
 #include "sysemu/numa.h"
 #include "sysemu/hostmem.h"
 #include "exec/gdbstub.h"
+#include "gdbstub/enums.h"
 #include "qemu/timer.h"
 #include "chardev/char.h"
 #include "qemu/bitmap.h"
diff --git a/target/arm/hvf/hvf.c b/target/arm/hvf/hvf.c
index 45e2218be5..ef9bc42738 100644
--- a/target/arm/hvf/hvf.c
+++ b/target/arm/hvf/hvf.c
@@ -33,7 +33,7 @@
 #include "trace/trace-target_arm_hvf.h"
 #include "migration/vmstate.h"
 
-#include "exec/gdbstub.h"
+#include "gdbstub/enums.h"
 
 #define MDSCR_EL1_SS_SHIFT  0
 #define MDSCR_EL1_MDE_SHIFT 15
diff --git a/target/arm/hyp_gdbstub.c b/target/arm/hyp_gdbstub.c
index ebde2899cd..f120d55caa 100644
--- a/target/arm/hyp_gdbstub.c
+++ b/target/arm/hyp_gdbstub.c
@@ -12,7 +12,7 @@
 #include "qemu/osdep.h"
 #include "cpu.h"
 #include "internals.h"
-#include "exec/gdbstub.h"
+#include "gdbstub/enums.h"
 
 /* Maximum and current break/watch point counts */
 int max_hw_bps, max_hw_wps;
diff --git 

[PATCH 0/9] maintainer updates (gdbstub, plugins, time control)

2024-06-12 Thread Alex Bennée
Hi,

This is the current state of my maintainer trees. The gdbstub patches
are just minor clean-ups. The main feature this brings in is the
ability for plugins to control time. This has been discussed before
but represents the first time plugins can "control" the execution of
the core. The idea would be to eventually deprecate the icount auto
modes in favour of a plugin and just use icount for deterministic
execution and record/replay.

Alex.

Akihiko Odaki (1):
  plugins: Ensure register handles are not NULL

Alex Bennée (6):
  include/exec: add missing include guard comment
  gdbstub: move enums into separate header
  sysemu: add set_virtual_time to accel ops
  qtest: use cpu interface in qtest_clock_warp
  sysemu: generalise qtest_warp_clock as qemu_clock_advance_virtual_time
  plugins: add time control API

Pierrick Bouvier (2):
  qtest: move qtest_{get, set}_virtual_clock to accel/qtest/qtest.c
  contrib/plugins: add ips plugin example for cost modeling

 include/exec/gdbstub.h|  11 +-
 include/gdbstub/enums.h   |  21 +++
 include/qemu/qemu-plugin.h|  25 +++
 include/qemu/timer.h  |  15 ++
 include/sysemu/accel-ops.h|  18 +-
 include/sysemu/cpu-timers.h   |   3 +-
 include/sysemu/qtest.h|   2 -
 accel/hvf/hvf-accel-ops.c |   2 +-
 accel/kvm/kvm-all.c   |   2 +-
 accel/qtest/qtest.c   |  13 ++
 accel/tcg/tcg-accel-ops.c |   2 +-
 contrib/plugins/ips.c | 164 ++
 gdbstub/user.c|   1 +
 monitor/hmp-cmds.c|   3 +-
 plugins/api.c |  39 -
 ...t-virtual-clock.c => cpus-virtual-clock.c} |   5 +
 system/cpus.c |  11 ++
 system/qtest.c|  37 +---
 system/vl.c   |   1 +
 target/arm/hvf/hvf.c  |   2 +-
 target/arm/hyp_gdbstub.c  |   2 +-
 target/arm/kvm.c  |   2 +-
 target/i386/kvm/kvm.c |   2 +-
 target/ppc/kvm.c  |   2 +-
 target/s390x/kvm/kvm.c|   2 +-
 util/qemu-timer.c |  26 +++
 contrib/plugins/Makefile  |   1 +
 plugins/qemu-plugins.symbols  |   2 +
 stubs/meson.build |   2 +-
 29 files changed, 357 insertions(+), 61 deletions(-)
 create mode 100644 include/gdbstub/enums.h
 create mode 100644 contrib/plugins/ips.c
 rename stubs/{cpus-get-virtual-clock.c => cpus-virtual-clock.c} (68%)

-- 
2.39.2




[PATCH 8/9] plugins: add time control API

2024-06-12 Thread Alex Bennée
Expose the ability to control time through the plugin API. Only one
plugin can control time so it has to request control when loaded.
There are probably more corner cases to catch here.
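
(A hedged usage sketch, not part of the patch; the install boilerplate
follows the existing plugin API:)

#include <qemu-plugin.h>

QEMU_PLUGIN_EXPORT int qemu_plugin_version = QEMU_PLUGIN_VERSION;

static const void *handle;

QEMU_PLUGIN_EXPORT int qemu_plugin_install(qemu_plugin_id_t id,
                                           const qemu_info_t *info,
                                           int argc, char **argv)
{
    handle = qemu_plugin_request_time_control();
    if (!handle) {
        return -1;                 /* another plugin already owns time */
    }
    qemu_plugin_update_ns(handle, 1000 * 1000);  /* advance to t = 1ms */
    return 0;
}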

From: Alex Bennée 
Signed-off-by: Pierrick Bouvier 
[AJB: tweaked user-mode handling]
Signed-off-by: Alex Bennée 
Message-Id: <20240530220610.1245424-6-pierrick.bouv...@linaro.org>

---
plugins/next
  - make qemu_plugin_update_ns a NOP in user-mode
---
 include/qemu/qemu-plugin.h   | 25 +
 plugins/api.c| 35 +++
 plugins/qemu-plugins.symbols |  2 ++
 3 files changed, 62 insertions(+)

diff --git a/include/qemu/qemu-plugin.h b/include/qemu/qemu-plugin.h
index 95703d8fec..db4d67529e 100644
--- a/include/qemu/qemu-plugin.h
+++ b/include/qemu/qemu-plugin.h
@@ -661,6 +661,31 @@ void qemu_plugin_register_vcpu_mem_inline_per_vcpu(
 qemu_plugin_u64 entry,
 uint64_t imm);
 
+/**
+ * qemu_plugin_request_time_control() - request the ability to control time
+ *
+ * This grants the plugin the ability to control system time. Only one
+ * plugin can control time so if multiple plugins request the ability
+ * all but the first will fail.
+ *
+ * Returns an opaque handle or NULL if fails
+ */
+const void *qemu_plugin_request_time_control(void);
+
+/**
+ * qemu_plugin_update_ns() - update system emulation time
+ * @handle: opaque handle returned by qemu_plugin_request_time_control()
+ * @time: time in nanoseconds
+ *
+ * This allows an appropriately authorised plugin (i.e. holding the
+ * time control handle) to move system time forward to @time. For
+ * user-mode emulation the time is not changed by this as all reported
+ * time comes from the host kernel.
+ *
+ * Start time is 0.
+ */
+void qemu_plugin_update_ns(const void *handle, int64_t time);
+
 typedef void
 (*qemu_plugin_vcpu_syscall_cb_t)(qemu_plugin_id_t id, unsigned int vcpu_index,
  int64_t num, uint64_t a1, uint64_t a2,
diff --git a/plugins/api.c b/plugins/api.c
index 6bdb26bbe3..4431a0ea7e 100644
--- a/plugins/api.c
+++ b/plugins/api.c
@@ -39,6 +39,7 @@
 #include "qemu/main-loop.h"
 #include "qemu/plugin.h"
 #include "qemu/log.h"
+#include "qemu/timer.h"
 #include "tcg/tcg.h"
 #include "exec/exec-all.h"
 #include "exec/gdbstub.h"
@@ -583,3 +584,37 @@ uint64_t qemu_plugin_u64_sum(qemu_plugin_u64 entry)
 }
 return total;
 }
+
+/*
+ * Time control
+ */
+static bool has_control;
+
+const void *qemu_plugin_request_time_control(void)
+{
+if (!has_control) {
+has_control = true;
+return &has_control;
+}
+return NULL;
+}
+
+#ifdef CONFIG_SOFTMMU
+static void advance_virtual_time__async(CPUState *cpu, run_on_cpu_data data)
+{
+int64_t new_time = data.host_ulong;
+qemu_clock_advance_virtual_time(new_time);
+}
+#endif
+
+void qemu_plugin_update_ns(const void *handle, int64_t new_time)
+{
+#ifdef CONFIG_SOFTMMU
+if (handle == &has_control) {
+/* Need to execute out of cpu_exec, so bql can be locked. */
+async_run_on_cpu(current_cpu,
+ advance_virtual_time__async,
+ RUN_ON_CPU_HOST_ULONG(new_time));
+}
+#endif
+}
diff --git a/plugins/qemu-plugins.symbols b/plugins/qemu-plugins.symbols
index aa0a77a319..ca773d8d9f 100644
--- a/plugins/qemu-plugins.symbols
+++ b/plugins/qemu-plugins.symbols
@@ -38,6 +38,7 @@
   qemu_plugin_register_vcpu_tb_exec_cond_cb;
   qemu_plugin_register_vcpu_tb_exec_inline_per_vcpu;
   qemu_plugin_register_vcpu_tb_trans_cb;
+  qemu_plugin_request_time_control;
   qemu_plugin_reset;
   qemu_plugin_scoreboard_free;
   qemu_plugin_scoreboard_find;
@@ -51,5 +52,6 @@
   qemu_plugin_u64_set;
   qemu_plugin_u64_sum;
   qemu_plugin_uninstall;
+  qemu_plugin_update_ns;
   qemu_plugin_vcpu_for_each;
 };
-- 
2.39.2




[PATCH 5/9] qtest: use cpu interface in qtest_clock_warp

2024-06-12 Thread Alex Bennée
This generalises the qtest_clock_warp code to use the AccelOps
handlers for updating its own sense of time. This will make the next
patch which moves the warp code closer to pure code motion.

From: Alex Bennée 
Acked-by: Thomas Huth 
Signed-off-by: Pierrick Bouvier 
Signed-off-by: Alex Bennée 
Message-Id: <20240530220610.1245424-3-pierrick.bouv...@linaro.org>
---
 include/sysemu/qtest.h | 1 +
 accel/qtest/qtest.c| 1 +
 system/qtest.c | 6 +++---
 3 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/include/sysemu/qtest.h b/include/sysemu/qtest.h
index b5d5fd3463..45f3b7e1df 100644
--- a/include/sysemu/qtest.h
+++ b/include/sysemu/qtest.h
@@ -36,6 +36,7 @@ void qtest_server_set_send_handler(void (*send)(void *, const char *),
 void qtest_server_inproc_recv(void *opaque, const char *buf);
 
 int64_t qtest_get_virtual_clock(void);
+void qtest_set_virtual_clock(int64_t count);
 #endif
 
 #endif
diff --git a/accel/qtest/qtest.c b/accel/qtest/qtest.c
index f6056ac836..53182e6c2a 100644
--- a/accel/qtest/qtest.c
+++ b/accel/qtest/qtest.c
@@ -52,6 +52,7 @@ static void qtest_accel_ops_class_init(ObjectClass *oc, void *data)
 
 ops->create_vcpu_thread = dummy_start_vcpu_thread;
 ops->get_virtual_clock = qtest_get_virtual_clock;
+ops->set_virtual_clock = qtest_set_virtual_clock;
 };
 
 static const TypeInfo qtest_accel_ops_type = {
diff --git a/system/qtest.c b/system/qtest.c
index 507a358f3b..5be66b0140 100644
--- a/system/qtest.c
+++ b/system/qtest.c
@@ -332,14 +332,14 @@ int64_t qtest_get_virtual_clock(void)
 return qatomic_read_i64(&qtest_clock_counter);
 }
 
-static void qtest_set_virtual_clock(int64_t count)
+void qtest_set_virtual_clock(int64_t count)
 {
 qatomic_set_i64(&qtest_clock_counter, count);
 }
 
 static void qtest_clock_warp(int64_t dest)
 {
-int64_t clock = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
+int64_t clock = cpus_get_virtual_clock();
 AioContext *aio_context;
 assert(qtest_enabled());
 aio_context = qemu_get_aio_context();
@@ -348,7 +348,7 @@ static void qtest_clock_warp(int64_t dest)
   QEMU_TIMER_ATTR_ALL);
 int64_t warp = qemu_soonest_timeout(dest - clock, deadline);
 
-qtest_set_virtual_clock(qtest_get_virtual_clock() + warp);
+cpus_set_virtual_clock(cpus_get_virtual_clock() + warp);
 
 qemu_clock_run_timers(QEMU_CLOCK_VIRTUAL);
 timerlist_run_timers(aio_context->tlg.tl[QEMU_CLOCK_VIRTUAL]);
-- 
2.39.2




[PATCH 7/9] qtest: move qtest_{get, set}_virtual_clock to accel/qtest/qtest.c

2024-06-12 Thread Alex Bennée
From: Pierrick Bouvier 

Signed-off-by: Pierrick Bouvier 
Reviewed-by: Philippe Mathieu-Daudé 
Message-Id: <20240530220610.1245424-5-pierrick.bouv...@linaro.org>
---
 include/sysemu/qtest.h |  3 ---
 accel/qtest/qtest.c| 12 
 system/qtest.c | 12 
 3 files changed, 12 insertions(+), 15 deletions(-)

diff --git a/include/sysemu/qtest.h b/include/sysemu/qtest.h
index 45f3b7e1df..c161d75165 100644
--- a/include/sysemu/qtest.h
+++ b/include/sysemu/qtest.h
@@ -34,9 +34,6 @@ void qtest_server_init(const char *qtest_chrdev, const char *qtest_log, Error **
 void qtest_server_set_send_handler(void (*send)(void *, const char *),
  void *opaque);
 void qtest_server_inproc_recv(void *opaque, const char *buf);
-
-int64_t qtest_get_virtual_clock(void);
-void qtest_set_virtual_clock(int64_t count);
 #endif
 
 #endif
diff --git a/accel/qtest/qtest.c b/accel/qtest/qtest.c
index 53182e6c2a..bf14032d29 100644
--- a/accel/qtest/qtest.c
+++ b/accel/qtest/qtest.c
@@ -24,6 +24,18 @@
 #include "qemu/main-loop.h"
 #include "hw/core/cpu.h"
 
+static int64_t qtest_clock_counter;
+
+static int64_t qtest_get_virtual_clock(void)
+{
+return qatomic_read_i64(&qtest_clock_counter);
+}
+
+static void qtest_set_virtual_clock(int64_t count)
+{
+qatomic_set_i64(&qtest_clock_counter, count);
+}
+
 static int qtest_init_accel(MachineState *ms)
 {
 return 0;
diff --git a/system/qtest.c b/system/qtest.c
index 8cb98966b4..12703a2045 100644
--- a/system/qtest.c
+++ b/system/qtest.c
@@ -325,18 +325,6 @@ static void qtest_irq_handler(void *opaque, int n, int level)
 }
 }
 
-static int64_t qtest_clock_counter;
-
-int64_t qtest_get_virtual_clock(void)
-{
-return qatomic_read_i64(&qtest_clock_counter);
-}
-
-void qtest_set_virtual_clock(int64_t count)
-{
-qatomic_set_i64(&qtest_clock_counter, count);
-}
-
 static bool (*process_command_cb)(CharBackend *chr, gchar **words);
 
 void qtest_set_command_cb(bool (*pc_cb)(CharBackend *chr, gchar **words))
-- 
2.39.2




[PATCH 1/9] include/exec: add missing include guard comment

2024-06-12 Thread Alex Bennée
Signed-off-by: Alex Bennée 
---
 include/exec/gdbstub.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/exec/gdbstub.h b/include/exec/gdbstub.h
index eb14b91139..008a92198a 100644
--- a/include/exec/gdbstub.h
+++ b/include/exec/gdbstub.h
@@ -144,4 +144,4 @@ void gdb_set_stop_cpu(CPUState *cpu);
 /* in gdbstub-xml.c, generated by scripts/feature_to_c.py */
 extern const GDBFeature gdb_static_features[];
 
-#endif
+#endif /* GDBSTUB_H */
-- 
2.39.2




[PATCH 9/9] contrib/plugins: add ips plugin example for cost modeling

2024-06-12 Thread Alex Bennée
From: Pierrick Bouvier 

This plugin uses the new time control interface to make decisions
about the state of time during the emulation. The algorithm is
currently very simple. The user specifies an ips rate which applies
per core. If the core runs ahead of its allocated execution time the
plugin sleeps for a bit to let real time catch up. Either way time is
updated for the emulation as a function of total executed instructions
with some adjustments for cores that idle.

Examples


Slow down execution of /bin/true:
$ num_insn=$(./build/qemu-x86_64 -plugin ./build/tests/plugin/libinsn.so -d plugin /bin/true |& grep total | sed -e 's/.*: //')
$ time ./build/qemu-x86_64 -plugin ./build/contrib/plugins/libips.so,ips=$(($num_insn/4)) /bin/true
real 4.000s

Boot a Linux kernel simulating a 250MHz cpu:
$ ./build/qemu-system-x86_64 -kernel /boot/vmlinuz-6.1.0-21-amd64 -append "console=ttyS0" -plugin ./build/contrib/plugins/libips.so,ips=$((250*1000*1000)) -smp 1 -m 512
check time until kernel panic on serial0

Tested in system mode by booting a full debian system, and using:
$ sysbench cpu run
Performance decreases linearly with the given number of ips.

Signed-off-by: Pierrick Bouvier 
Message-Id: <20240530220610.1245424-7-pierrick.bouv...@linaro.org>
---
 contrib/plugins/ips.c| 164 +++
 contrib/plugins/Makefile |   1 +
 2 files changed, 165 insertions(+)
 create mode 100644 contrib/plugins/ips.c

diff --git a/contrib/plugins/ips.c b/contrib/plugins/ips.c
new file mode 100644
index 00..db77729264
--- /dev/null
+++ b/contrib/plugins/ips.c
@@ -0,0 +1,164 @@
+/*
+ * ips rate limiting plugin.
+ *
+ * This plugin can be used to restrict the execution of a system to a
+ * particular number of Instructions Per Second (ips). This controls
+ * time as seen by the guest so while wall-clock time may be longer
+ * from the guests point of view time will pass at the normal rate.
+ *
+ * This uses the new plugin API which allows the plugin to control
+ * system time.
+ *
+ * Copyright (c) 2023 Linaro Ltd
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include <stdio.h>
+#include <glib.h>
+#include <qemu-plugin.h>
+
+QEMU_PLUGIN_EXPORT int qemu_plugin_version = QEMU_PLUGIN_VERSION;
+
+/* how many times do we update time per sec */
+#define NUM_TIME_UPDATE_PER_SEC 10
+#define NSEC_IN_ONE_SEC (1000 * 1000 * 1000)
+
+static GMutex global_state_lock;
+
+static uint64_t max_insn_per_second = 1000 * 1000 * 1000; /* ips per core, per second */
+static uint64_t max_insn_per_quantum; /* trap every N instructions */
+static int64_t virtual_time_ns; /* last set virtual time */
+
+static const void *time_handle;
+
+typedef struct {
+uint64_t total_insn;
+uint64_t quantum_insn; /* insn in last quantum */
+int64_t last_quantum_time; /* time when last quantum started */
+} vCPUTime;
+
+struct qemu_plugin_scoreboard *vcpus;
+
+/* return epoch time in ns */
+static int64_t now_ns(void)
+{
+return g_get_real_time() * 1000;
+}
+
+static uint64_t num_insn_during(int64_t elapsed_ns)
+{
+double num_secs = elapsed_ns / (double) NSEC_IN_ONE_SEC;
+return num_secs * (double) max_insn_per_second;
+}
+
+static int64_t time_for_insn(uint64_t num_insn)
+{
+double num_secs = (double) num_insn / (double) max_insn_per_second;
+return num_secs * (double) NSEC_IN_ONE_SEC;
+}
+
+static void update_system_time(vCPUTime *vcpu)
+{
+int64_t elapsed_ns = now_ns() - vcpu->last_quantum_time;
+uint64_t max_insn = num_insn_during(elapsed_ns);
+
+if (vcpu->quantum_insn >= max_insn) {
+/* this vcpu ran faster than expected, so it has to sleep */
+uint64_t insn_advance = vcpu->quantum_insn - max_insn;
+uint64_t time_advance_ns = time_for_insn(insn_advance);
+int64_t sleep_us = time_advance_ns / 1000;
+g_usleep(sleep_us);
+}
+
+vcpu->total_insn += vcpu->quantum_insn;
+vcpu->quantum_insn = 0;
+vcpu->last_quantum_time = now_ns();
+
+/* based on total number of instructions, what should be the new time? */
+int64_t new_virtual_time = time_for_insn(vcpu->total_insn);
+
+g_mutex_lock(&global_state_lock);
+
+/* Time only moves forward. Another vcpu might have updated it already. */
+if (new_virtual_time > virtual_time_ns) {
+qemu_plugin_update_ns(time_handle, new_virtual_time);
+virtual_time_ns = new_virtual_time;
+}
+
+g_mutex_unlock(&global_state_lock);
+}
+
+static void vcpu_init(qemu_plugin_id_t id, unsigned int cpu_index)
+{
+vCPUTime *vcpu = qemu_plugin_scoreboard_find(vcpus, cpu_index);
+vcpu->total_insn = 0;
+vcpu->quantum_insn = 0;
+vcpu->last_quantum_time = now_ns();
+}
+
+static void vcpu_exit(qemu_plugin_id_t id, unsigned int cpu_index)
+{
+vCPUTime *vcpu = qemu_plugin_scoreboard_find(vcpus, cpu_index);
+update_system_time(vcpu);
+}
+
+static void every_quantum_insn(unsigned int cpu_index, void *udata)
+{
+vCPUTime *vcpu = qemu_plugin_scoreboard_find(vcpus, cpu_index);
+
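
(The message is truncated here in the archive. A hedged guess at how the
per-quantum callback finishes, reusing the helpers defined above:)

static void every_quantum_insn(unsigned int cpu_index, void *udata)
{
    vCPUTime *vcpu = qemu_plugin_scoreboard_find(vcpus, cpu_index);
    /* the inline counter has reached the quantum; resync virtual time */
    g_assert(vcpu->quantum_insn >= max_insn_per_quantum);
    update_system_time(vcpu);
}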

[PATCH 4/9] sysemu: add set_virtual_time to accel ops

2024-06-12 Thread Alex Bennée
We are about to remove direct calls to individual accelerators for
this information and will need a central point for plugins to hook
into time changes.

From: Alex Bennée 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Pierrick Bouvier 
Signed-off-by: Alex Bennée 
Message-Id: <20240530220610.1245424-2-pierrick.bouv...@linaro.org>
---
 include/sysemu/accel-ops.h | 18 +-
 include/sysemu/cpu-timers.h|  3 ++-
 ...et-virtual-clock.c => cpus-virtual-clock.c} |  5 +
 system/cpus.c  | 11 +++
 stubs/meson.build  |  2 +-
 5 files changed, 36 insertions(+), 3 deletions(-)
 rename stubs/{cpus-get-virtual-clock.c => cpus-virtual-clock.c} (68%)

diff --git a/include/sysemu/accel-ops.h b/include/sysemu/accel-ops.h
index ef91fc28bb..a088672230 100644
--- a/include/sysemu/accel-ops.h
+++ b/include/sysemu/accel-ops.h
@@ -20,7 +20,12 @@
 typedef struct AccelOpsClass AccelOpsClass;
 DECLARE_CLASS_CHECKERS(AccelOpsClass, ACCEL_OPS, TYPE_ACCEL_OPS)
 
-/* cpus.c operations interface */
+/**
+ * struct AccelOpsClass - accelerator interfaces
+ *
+ * This structure is used to abstract accelerator differences from the
+ * core CPU code. Not all have to be implemented.
+ */
 struct AccelOpsClass {
 /*< private >*/
 ObjectClass parent_class;
@@ -44,7 +49,18 @@ struct AccelOpsClass {
 
 void (*handle_interrupt)(CPUState *cpu, int mask);
 
+/**
+ * @get_virtual_clock: fetch virtual clock
+ * @set_virtual_clock: set virtual clock
+ *
+ * These allow the timer subsystem to defer to the accelerator to
+ * fetch time. The set function is needed if the accelerator wants
+ * to track the changes to time as the timer is warped through
+ * various timer events.
+ */
 int64_t (*get_virtual_clock)(void);
+void (*set_virtual_clock)(int64_t time);
+
 int64_t (*get_elapsed_ticks)(void);
 
 /* gdbstub hooks */
diff --git a/include/sysemu/cpu-timers.h b/include/sysemu/cpu-timers.h
index d86738a378..7bfa960fbd 100644
--- a/include/sysemu/cpu-timers.h
+++ b/include/sysemu/cpu-timers.h
@@ -96,8 +96,9 @@ int64_t cpu_get_clock(void);
 
 void qemu_timer_notify_cb(void *opaque, QEMUClockType type);
 
-/* get the VIRTUAL clock and VM elapsed ticks via the cpus accel interface */
+/* get/set VIRTUAL clock and VM elapsed ticks via the cpus accel interface */
 int64_t cpus_get_virtual_clock(void);
+void cpus_set_virtual_clock(int64_t new_time);
 int64_t cpus_get_elapsed_ticks(void);
 
 #endif /* SYSEMU_CPU_TIMERS_H */
diff --git a/stubs/cpus-get-virtual-clock.c b/stubs/cpus-virtual-clock.c
similarity index 68%
rename from stubs/cpus-get-virtual-clock.c
rename to stubs/cpus-virtual-clock.c
index fd447d53f3..af7c1a1d40 100644
--- a/stubs/cpus-get-virtual-clock.c
+++ b/stubs/cpus-virtual-clock.c
@@ -6,3 +6,8 @@ int64_t cpus_get_virtual_clock(void)
 {
 return cpu_get_clock();
 }
+
+void cpus_set_virtual_clock(int64_t new_time)
+{
+/* do nothing */
+}
diff --git a/system/cpus.c b/system/cpus.c
index f8fa78f33d..d3640c9503 100644
--- a/system/cpus.c
+++ b/system/cpus.c
@@ -229,6 +229,17 @@ int64_t cpus_get_virtual_clock(void)
 return cpu_get_clock();
 }
 
+/*
+ * Signal the new virtual time to the accelerator. This is only needed
+ * by accelerators that need to track the changes as we warp time.
+ */
+void cpus_set_virtual_clock(int64_t new_time)
+{
+if (cpus_accel && cpus_accel->set_virtual_clock) {
+cpus_accel->set_virtual_clock(new_time);
+}
+}
+
 /*
  * return the time elapsed in VM between vm_start and vm_stop.  Unless
  * icount is active, cpus_get_elapsed_ticks() uses units of the host CPU cycle
diff --git a/stubs/meson.build b/stubs/meson.build
index f15b48d01f..772a3e817d 100644
--- a/stubs/meson.build
+++ b/stubs/meson.build
@@ -29,7 +29,7 @@ endif
 if have_block or have_ga
   stub_ss.add(files('replay-tools.c'))
   # stubs for hooks in util/main-loop.c, util/async.c etc.
-  stub_ss.add(files('cpus-get-virtual-clock.c'))
+  stub_ss.add(files('cpus-virtual-clock.c'))
   stub_ss.add(files('icount.c'))
   stub_ss.add(files('graph-lock.c'))
   if linux_io_uring.found()
-- 
2.39.2
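
As a rough sketch of how an accelerator backend might use the new hook
(hypothetical accelerator names, mirroring the existing get_virtual_clock
assignment; not code from this series):

    static void my_accel_ops_class_init(ObjectClass *oc, void *data)
    {
        AccelOpsClass *ops = ACCEL_OPS_CLASS(oc);

        ops->get_virtual_clock = my_accel_get_virtual_clock;
        /* optional: only needed to observe time warps */
        ops->set_virtual_clock = my_accel_set_virtual_clock;
    }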




[PATCH 3/9] plugins: Ensure register handles are not NULL

2024-06-12 Thread Alex Bennée
From: Akihiko Odaki 

Ensure register handles are not NULL so that a plugin can assume NULL is
invalid as a register handle.

Signed-off-by: Akihiko Odaki 
Message-Id: <20240229-null-v1-1-e716501d9...@daynix.com>
Signed-off-by: Alex Bennée 
---
 plugins/api.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/plugins/api.c b/plugins/api.c
index 5a0a7f8c71..6bdb26bbe3 100644
--- a/plugins/api.c
+++ b/plugins/api.c
@@ -507,7 +507,7 @@ static GArray *create_register_handles(GArray *gdbstub_regs)
 }
 
 /* Create a record for the plugin */
-desc.handle = GINT_TO_POINTER(grd->gdb_reg);
+desc.handle = GINT_TO_POINTER(grd->gdb_reg + 1);
 desc.name = g_intern_string(grd->name);
 desc.feature = g_intern_string(grd->feature_name);
 g_array_append_val(find_data, desc);
@@ -528,7 +528,7 @@ int qemu_plugin_read_register(struct qemu_plugin_register *reg, GByteArray *buf)
 {
 g_assert(current_cpu);
 
-return gdb_read_register(current_cpu, buf, GPOINTER_TO_INT(reg));
+return gdb_read_register(current_cpu, buf, GPOINTER_TO_INT(reg) - 1);
 }
 
 struct qemu_plugin_scoreboard *qemu_plugin_scoreboard_new(size_t element_size)
-- 
2.39.2
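
A minimal sketch of the off-by-one encoding above: gdb register numbers
start at 0, so an unbiased handle for register 0 would be NULL; biasing
by one keeps NULL free as an invalid-handle sentinel.

    /* encode: register 0 becomes handle 1, never NULL */
    void *handle = GINT_TO_POINTER(reg_num + 1);
    /* decode: undo the bias before talking to the gdbstub */
    int reg_back = GPOINTER_TO_INT(handle) - 1;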




[PATCH 6/9] sysemu: generalise qtest_warp_clock as qemu_clock_advance_virtual_time

2024-06-12 Thread Alex Bennée
Move the key functionality of moving time forward into the clock
sub-system itself. This will allow us to plumb in time control into
plugins.

From: Alex Bennée 
Signed-off-by: Pierrick Bouvier 
Signed-off-by: Alex Bennée 
Message-Id: <20240530220610.1245424-4-pierrick.bouv...@linaro.org>
---
 include/qemu/timer.h | 15 +++
 system/qtest.c   | 25 +++--
 util/qemu-timer.c| 26 ++
 3 files changed, 44 insertions(+), 22 deletions(-)

diff --git a/include/qemu/timer.h b/include/qemu/timer.h
index 9a366e551f..910587d829 100644
--- a/include/qemu/timer.h
+++ b/include/qemu/timer.h
@@ -245,6 +245,21 @@ bool qemu_clock_run_timers(QEMUClockType type);
  */
 bool qemu_clock_run_all_timers(void);
 
+/**
+ * qemu_clock_advance_virtual_time(): advance the virtual time tick
+ * @dest: target time in nanoseconds
+ *
+ * This function is used where the control of the flow of time has
+ * been delegated to outside the clock subsystem (be it qtest, icount
+ * or some other external source). You can ask the clock system to
+ * return early at the first expired timer.
+ *
+ * Time can only move forward, attempts to reverse time would lead to
+ * an error.
+ *
+ * Returns: new virtual time.
+ */
+int64_t qemu_clock_advance_virtual_time(int64_t dest);
 
 /*
  * QEMUTimerList
diff --git a/system/qtest.c b/system/qtest.c
index 5be66b0140..8cb98966b4 100644
--- a/system/qtest.c
+++ b/system/qtest.c
@@ -337,26 +337,6 @@ void qtest_set_virtual_clock(int64_t count)
 qatomic_set_i64(&qtest_clock_counter, count);
 }
 
-static void qtest_clock_warp(int64_t dest)
-{
-int64_t clock = cpus_get_virtual_clock();
-AioContext *aio_context;
-assert(qtest_enabled());
-aio_context = qemu_get_aio_context();
-while (clock < dest) {
-int64_t deadline = qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL,
-  QEMU_TIMER_ATTR_ALL);
-int64_t warp = qemu_soonest_timeout(dest - clock, deadline);
-
-cpus_set_virtual_clock(cpus_get_virtual_clock() + warp);
-
-qemu_clock_run_timers(QEMU_CLOCK_VIRTUAL);
-timerlist_run_timers(aio_context->tlg.tl[QEMU_CLOCK_VIRTUAL]);
-clock = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
-}
-qemu_clock_notify(QEMU_CLOCK_VIRTUAL);
-}
-
 static bool (*process_command_cb)(CharBackend *chr, gchar **words);
 
 void qtest_set_command_cb(bool (*pc_cb)(CharBackend *chr, gchar **words))
@@ -751,7 +731,8 @@ static void qtest_process_command(CharBackend *chr, gchar **words)
 ns = qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL,
 QEMU_TIMER_ATTR_ALL);
 }
-qtest_clock_warp(qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + ns);
+qemu_clock_advance_virtual_time(
+qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + ns);
 qtest_send_prefix(chr);
 qtest_sendf(chr, "OK %"PRIi64"\n",
 (int64_t)qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL));
@@ -777,7 +758,7 @@ static void qtest_process_command(CharBackend *chr, gchar **words)
 g_assert(words[1]);
 ret = qemu_strtoi64(words[1], NULL, 0, &ns);
 g_assert(ret == 0);
-qtest_clock_warp(ns);
+qemu_clock_advance_virtual_time(ns);
 qtest_send_prefix(chr);
 qtest_sendf(chr, "OK %"PRIi64"\n",
 (int64_t)qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL));
diff --git a/util/qemu-timer.c b/util/qemu-timer.c
index 6a0de33dd2..213114be68 100644
--- a/util/qemu-timer.c
+++ b/util/qemu-timer.c
@@ -645,6 +645,11 @@ int64_t qemu_clock_get_ns(QEMUClockType type)
 }
 }
 
+static void qemu_virtual_clock_set_ns(int64_t time)
+{
+return cpus_set_virtual_clock(time);
+}
+
 void init_clocks(QEMUTimerListNotifyCB *notify_cb)
 {
 QEMUClockType type;
@@ -675,3 +680,24 @@ bool qemu_clock_run_all_timers(void)
 
 return progress;
 }
+
+int64_t qemu_clock_advance_virtual_time(int64_t dest)
+{
+int64_t clock = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
+AioContext *aio_context;
+aio_context = qemu_get_aio_context();
+while (clock < dest) {
+int64_t deadline = qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL,
+  QEMU_TIMER_ATTR_ALL);
+int64_t warp = qemu_soonest_timeout(dest - clock, deadline);
+
+qemu_virtual_clock_set_ns(qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + warp);
+
+qemu_clock_run_timers(QEMU_CLOCK_VIRTUAL);
+timerlist_run_timers(aio_context->tlg.tl[QEMU_CLOCK_VIRTUAL]);
+clock = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
+}
+qemu_clock_notify(QEMU_CLOCK_VIRTUAL);
+
+return clock;
+}
-- 
2.39.2
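
A hedged usage sketch of the new helper (hypothetical caller, mirroring
the qtest usage above):

    /* Advance the virtual clock by 1ms; timers expiring on the way are
     * run, and the new virtual time is returned. */
    int64_t now = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
    int64_t new_time = qemu_clock_advance_virtual_time(now + 1000000);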




Re: [RFC PATCH] migration/savevm: do not schedule snapshot_save_job_bh in qemu_aio_context

2024-06-12 Thread Stefan Hajnoczi
On Wed, 12 Jun 2024 at 05:21, Fiona Ebner  wrote:
>
> Am 11.06.24 um 16:04 schrieb Stefan Hajnoczi:
> > On Tue, Jun 11, 2024 at 02:08:49PM +0200, Fiona Ebner wrote:
> >> Am 06.06.24 um 20:36 schrieb Stefan Hajnoczi:
> >>> On Wed, Jun 05, 2024 at 02:08:48PM +0200, Fiona Ebner wrote:
>  The fact that the snapshot_save_job_bh() is scheduled in the main
>  loop's qemu_aio_context AioContext means that it might get executed
>  during a vCPU thread's aio_poll(). But saving of the VM state cannot
>  happen while the guest or devices are active and can lead to assertion
>  failures. See issue #2111 for two examples. Avoid the problem by
>  scheduling the snapshot_save_job_bh() in the iohandler AioContext,
>  which is not polled by vCPU threads.
> 
>  Solves Issue #2111.
> 
>  This change also solves the following issue:
> 
>  Since commit effd60c878 ("monitor: only run coroutine commands in
>  qemu_aio_context"), the 'snapshot-save' QMP call would not respond
>  right after starting the job anymore, but only after the job finished,
>  which can take a long time. The reason is that after commit
>  effd60c878, do_qmp_dispatch_bh() runs in the iohandler AioContext.
>  When do_qmp_dispatch_bh() wakes the qmp_dispatch() coroutine, the
>  coroutine cannot be entered immediately anymore, but needs to be
>  scheduled to the main loop's qemu_aio_context AioContext. But
>  snapshot_save_job_bh() was scheduled first to the same AioContext and
>  thus gets executed first.
> 
>  Buglink: https://gitlab.com/qemu-project/qemu/-/issues/2111
>  Signed-off-by: Fiona Ebner 
>  ---
> 
>  While initial smoke testing seems fine, I'm not familiar enough with
>  this to rule out any pitfalls with the approach. Any reason why
>  scheduling to the iohandler AioContext could be wrong here?
> >>>
> >>> If something waits for a BlockJob to finish using aio_poll() from
> >>> qemu_aio_context then a deadlock is possible since the iohandler_ctx
> >>> won't get a chance to execute. The only suspicious code path I found was
> >>> job_completed_txn_abort_locked() -> job_finish_sync_locked() but I'm not
> >>> sure whether it triggers this scenario. Please check that code path.
> >>>
> >>
> >> Sorry, I don't understand. Isn't executing the scheduled BH the only
> >> additional progress that the iohandler_ctx needs to make compared to
> >> before the patch? How exactly would that cause issues when waiting for a
> >> BlockJob?
> >>
> >> Or do you mean something waiting for the SnapshotJob from
> >> qemu_aio_context before snapshot_save_job_bh had the chance to run?
> >
> > Yes, exactly. job_finish_sync_locked() will hang since iohandler_ctx has
> > no chance to execute. But I haven't audited the code to understand
> > whether this can happen.
> So job_finish_sync_locked() is executed in
> job_completed_txn_abort_locked() when the following branch is taken
>
> > if (!job_is_completed_locked(other_job))
>
> and there is no other job in the transaction, so we can assume other_job
> being the snapshot-save job itself.
>
> The callers of job_completed_txn_abort_locked():
>
> 1. in job_do_finalize_locked() if job->ret is non-zero. The callers of
> which are:
>
> 1a. in job_finalize_locked() if JOB_VERB_FINALIZE is allowed, meaning
> job->status is JOB_STATUS_PENDING, so job_is_completed_locked() will be
> true.
>
> 1b. job_completed_txn_success_locked() sets job->status to
> JOB_STATUS_WAITING before, so job_is_completed_locked() will be true.
>
> 2. in job_completed_locked() it is only done if job->ret is non-zero, in
> which case job->status was set to JOB_STATUS_ABORTING by the preceding
> job_update_rc_locked(), and thus job_is_completed_locked() will be true.
>
> 3. in job_cancel_locked() if job->deferred_to_main_loop is true, which
> is set in job_co_entry() before job_exit() is scheduled as a BH and is
> also set in job_do_dismiss_locked(). In the former case, the
> snapshot_save_job_bh has already been executed. In the latter case,
> job_is_completed_locked() will be true (since job_early_fail() is not
> used for the snapshot job).
>
>
> However, job_finish_sync_locked() is also executed via
> job_cancel_sync_all() during qemu_cleanup(). With bad timing there, I
> suppose the issue could arise.
>
> In fact, I could trigger it with the following hack on top:
>
> > diff --git a/migration/savevm.c b/migration/savevm.c
> > index 0086b76ab0..42c93176ba 100644
> > --- a/migration/savevm.c
> > +++ b/migration/savevm.c
> > @@ -3429,6 +3429,17 @@ static void snapshot_load_job_bh(void *opaque)
> >
> >  static void snapshot_save_job_bh(void *opaque)
> >  {
> > +static int n = 0;
> > +n++;
> > +if (n < 1000) {
> > +aio_bh_schedule_oneshot(iohandler_get_aio_context(),
> > +snapshot_save_job_bh, opaque);
> > +if (n % 100 == 0) {
> > +error_report("iteration %d", 
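
For reference, a minimal sketch of the scheduling change under discussion
(assuming the existing iohandler_get_aio_context() helper; "job" stands in
for the real SnapshotJob pointer):

    /* Schedule the BH in the iohandler AioContext, which vCPU threads'
     * aio_poll() on qemu_aio_context never drains. */
    aio_bh_schedule_oneshot(iohandler_get_aio_context(),
                            snapshot_save_job_bh, job);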

Re: [PATCH 0/5] Reinstate ability to use Qemu on pre-SSE4.1 x86 hosts

2024-06-12 Thread Paolo Bonzini
On Wed, Jun 12, 2024 at 5:09 PM Daniel P. Berrangé  wrote:
> This might suggest we could put a runtime feature check in main(),
> print a warning and then exit(1), however, QEMU has a lot of code
> that is triggered from ELF constructors. If we're building the
> entire QEMU codebase with extra features enabled, I worry that
> the constructors could potentially cause an illegal instruction
> crash before main() runs ?

And I learnt that one can simply add -mneeded to the compiler command
line to achieve that, at least on glibc systems:

$ gcc f.c -mneeded -mpopcnt
$ qemu-x86_64 -cpu core2duo ./a.out
./a.out: CPU ISA level is lower than required
$ qemu-x86_64 ./a.out
1234

$ gcc f.c -mneeded
$ qemu-x86_64 -cpu core2duo ./a.out
1234

Using "readelf -n" on the executable unveils the magic:

Displaying notes found in: .note.gnu.property
  Owner                Data size        Description
  GNU                  0x0030           NT_GNU_PROPERTY_TYPE_0
      Properties: x86 ISA needed: x86-64-baseline, x86-64-v2
          x86 feature used: x86
          x86 ISA used:

I'm actually amazed. :)

Paolo




Re: [RFC PATCH v2 3/5] rust: add PL011 device model

2024-06-12 Thread Paolo Bonzini
On Wed, Jun 12, 2024 at 4:42 PM Manos Pitsidianakis
 wrote:
> There was consensus in the community call that we won't be writing Rust
> APIs for internal C QEMU interfaces; or at least, that's not the goal

I disagree with that. We need _some_ kind of bindings, otherwise we
have too much unsafe code, and the benefit of Rust becomes so much
lower that I doubt the utility.

If something is used by only one device then fine, but when some kind
of unsafe code repeats across most if not all devices, that is a
problem. It can be macros, it can be smart pointers, that remains to
be seen---but repetition should be a warning signal that _something_
is necessary.

> >For definining new classes I think it's okay if Rust does not support
> >writing superclasses yet, only leaves.
> >
> >I would make a QOM class written in Rust a struct that only contains
> >the new fields. The struct must implement Default and possibly Drop
> >(for finalize).
>
> The object is allocated and freed from C, hence it is not Dropped. We're
> only ever accessing it from a reference retrieved from a QEMU provided
> raw pointer. If the struct gains heap object fields like Box or Vec or
> String, they'd have to be dropped manually on _unrealize.

That's my point, if you have

  struct MyDevice_Inner {
data: Vec,
  }

  struct MyDevice {
parent_obj: qemu::bindings::SysBusDevice,
private: ManuallyDrop<MyDevice_Inner>,
  }

then the instance_finalize method can simply do

  pub fn instance_finalize(obj: *mut c_void)
  {
      let dev = obj as *mut MyDevice;
      unsafe { ManuallyDrop::drop(&mut (*dev).private) }
  }

Don't do it on _unrealize, create macros that do it for you.

> >and then a macro defines a wrapper struct that includes just two
> >fields, one for the superclass and one for the Rust struct.
> >instance_init can initialize the latter with Default::default().
> >
> >  struct PL011 {
> >parent_obj: qemu::bindings::SysBusDevice,
> >private: PL011_Inner,
> >  }
>
> a nested struct is not necessary for using the Default trait

Agreed, but a nested struct is nice anyway in my opinion as a boundary
between the C-ish and Rust idiomatic code.

> >"private" probably should be RefCell, avoiding the unsafe
> >
> >state.as_mut().read(addr, size)
>
>
> RefCell etc are not FFI safe.

Why does it matter? Everything after the SysBusDevice is private.

> Also, nested fields must be visible so that the offset_of! macro works, for 
> QOM properties.

Note that QOM properties do not use offset_of; qdev properties do.
Using qdev properties is much easier because they hide visitors, but
again - not necessary, sometimes going lower-level can be easier if
the API you wrap is less C-ish.

Also, you can define constants (including properties) in contexts
where non-public fields are visible:

use std::mem;
pub struct Foo {
_x: i32,
y: i32,
}
impl Foo {
pub const OFFSET_Y: usize = mem::offset_of!(Foo, y);
}
fn main() {
println!("{}", Foo::OFFSET_Y);
}

Any offset needed to go past the SysBusDevice and any other fields
before MyDevice_Inner can be added via macros. Also note that it
doesn't _have_ to be RefCell; RefCell isn't particularly magic. We can
implement our own interior mutability thingy that is more friendly to
qdev properties, or that includes the ManuallyDrop<> thing from above,
or both.

For example you could have

  type PL011 = QOMImpl;

and all the magic (for example Borrow, the TypeInfo, the
instance_init and instance_finalize function) would be in QOMImpl.

My point is: let's not focus on having a C-like API. It's the easiest
thing to do but not the target.

> Finally,
>
>  state.as_mut().read(addr, size)
>
> Is safe since we receive a valid pointer from QEMU. This fact cannot be
> derived by the compiler, which is why it has an `unsafe` keyword. That
> does not mean that the use here is unsafe.

Yes, it is safe otherwise it would be undefined behavior, but there
are no checks of the kind that you have in Rust whenever you have &mut.


state.as_mut() implies that no other references to state are in use;
but there are (you pass it as the opaque value to both the
MemoryRegionOps and the chardev frontend callbacks). This is why I
think something like RefCell is needed to go from a shared reference
to an exclusive one (interior mutability).
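
A sketch of that suggestion (illustrative names, not an existing QEMU
API): RefCell turns the shared reference handed to the callbacks into a
run-time-checked exclusive borrow.

    use std::cell::RefCell;

    struct PL011Inner { read_count: u32 }

    fn mmio_read(state: &RefCell<PL011Inner>) -> u32 {
        // borrow_mut() panics on aliasing instead of silently allowing it
        let mut s = state.borrow_mut();
        s.read_count += 1;
        s.read_count
    }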

> >There should also be macros to define the wrappers for MMIO MemoryRegions.
>
> Do you mean the MemoryRegionOps?

Yes.

> I wanted to focus on the build system integration for the first RFC
> which is why there are some macros but not in every place it makes
> sense.

Yes, absolutely. We need to start somewhere.

Paolo




Re: [PATCH 0/5] Reinstate ability to use Qemu on pre-SSE4.1 x86 hosts

2024-06-12 Thread Daniel P . Berrangé
On Wed, Jun 12, 2024 at 01:21:26PM +0100, Daniel P. Berrangé wrote:
> On Wed, Jun 12, 2024 at 01:51:31PM +0200, Paolo Bonzini wrote:
> > On Wed, Jun 12, 2024 at 1:38 PM Daniel P. Berrangé  
> > wrote:
> > > This isn't anything to do with the distro installer. The use case is that
> > > the distro wants all its software to be able to run on the x86_64 baseline
> > > it has chosen to build with.
> > 
> > Sure, and they can patch the packages if their wish is not shared by
> > upstream. Alternatively they can live with the fact that not all users
> > will be able to use all packages, which is probably already the case.
> 
> Yep, there's almost certainly scientific packages that have done
> optimizations in their builds. QEMU is slightly more special
> though because it is classed as a "critical path" package for
> the distro. Even the QEMU linux-user pieces are now critical path,
> since they're leveraged by docker & podman for running foreign arch
> containers.
> 
> > Or drop QEMU, I guess. Has FESCo ever expressed how strict they are
> > and which of the three options they'd pick?
> 
> I don't know - i'm going to raise this question to find out if
> there's any guidance.

I learnt that FESCo approved a surprisingly loose rule saying

  "Libraries packaged in Fedora may require ISA extensions,
   however any packaged application must not crash on any
   officially supported architecture, either by providing
   a generic fallback implementation OR by cleanly exiting
   when the requisite hardware support is unavailable."

This might suggest we could put a runtime feature check in main(),
print a warning and then exit(1), however, QEMU has a lot of code
that is triggered from ELF constructors. If we're building the
entire QEMU codebase with extra features enabled, I worry that
the constructors could potentially cause an illegal instruction
crash before main() runs ?
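
A minimal sketch of such a runtime check (assuming GCC/clang's
__builtin_cpu_supports(); note it cannot protect code that already runs
in ELF constructors before main()):

    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        if (!__builtin_cpu_supports("sse4.1")) {
            fprintf(stderr, "%s: this build requires SSE4.1\n", argv[0]);
            exit(1);
        }
        /* ... normal startup ... */
        return 0;
    }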

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH 3/4] migration: Use MigrationStatus instead of int

2024-06-12 Thread Peter Xu
On Wed, Jun 12, 2024 at 10:42:27AM -0400, Peter Xu wrote:
> @@ -1544,7 +1545,7 @@ bool migration_in_postcopy(void)
>  }
>  }
>  
> -bool migration_postcopy_is_alive(int state)
> +bool migration_postcopy_is_alive(MigrationStatus state)
>  {
>  switch (state) {
>  case MIGRATION_STATUS_POSTCOPY_ACTIVE:
> @@ -1598,7 +1599,7 @@ bool migration_is_idle(void)
>  case MIGRATION_STATUS_DEVICE:
>  case MIGRATION_STATUS_WAIT_UNPLUG:
>  return false;
> -case MIGRATION_STATUS__MAX:
> +default:
>  g_assert_not_reached();
>  }

This survives the tests, but I just found that it's risky, as it's not
covering all the states..

I'll squash below when I send v2 instead.

===8<===
From 1fc42c76294044c0ccca8841e482472486de5459 Mon Sep 17 00:00:00 2001
From: Peter Xu 
Date: Wed, 12 Jun 2024 10:57:26 -0400
Subject: [PATCH] fixup! migration: Use MigrationStatus instead of int

Signed-off-by: Peter Xu 
---
 migration/migration.c | 13 +
 1 file changed, 1 insertion(+), 12 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 9475dce7dc..706cee1b69 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1637,20 +1637,9 @@ bool migration_is_idle(void)
 case MIGRATION_STATUS_COMPLETED:
 case MIGRATION_STATUS_FAILED:
 return true;
-case MIGRATION_STATUS_SETUP:
-case MIGRATION_STATUS_CANCELLING:
-case MIGRATION_STATUS_ACTIVE:
-case MIGRATION_STATUS_POSTCOPY_ACTIVE:
-case MIGRATION_STATUS_COLO:
-case MIGRATION_STATUS_PRE_SWITCHOVER:
-case MIGRATION_STATUS_DEVICE:
-case MIGRATION_STATUS_WAIT_UNPLUG:
-return false;
 default:
-g_assert_not_reached();
+return false;
 }
-
-return false;
 }
 
 bool migration_is_active(void)
-- 
2.45.0

===8<===

-- 
Peter Xu




Re: [RFC PATCH v2 3/5] rust: add PL011 device model

2024-06-12 Thread Manos Pitsidianakis

On Wed, 12 Jun 2024 15:29, Paolo Bonzini  wrote:

I think this is extremely useful to show where we could go in the task
of creating more idiomatic bindings.

On Tue, Jun 11, 2024 at 12:34 PM Manos Pitsidianakis
 wrote:

+fn main() {
+println!("cargo::rerun-if-env-changed=MESON_BUILD_DIR");
+println!("cargo::rerun-if-env-changed=MESON_BUILD_ROOT");
+println!("cargo::rerun-if-changed=src/generated.rs.inc");


Why do you need this rerun-if-changed?


To show an error from build.rs in case the file is deleted and the build 
is not via meson





+pub const PL011_ARM_INFO: TypeInfo = TypeInfo {
+name: TYPE_PL011.as_ptr(),
+parent: TYPE_SYS_BUS_DEVICE.as_ptr(),
+instance_size: core::mem::size_of::<PL011State>(),
+instance_align: core::mem::align_of::<PL011State>(),
+instance_init: Some(pl011_init),
+instance_post_init: None,
+instance_finalize: None,
+abstract_: false,
+class_size: 0,
+class_init: Some(pl011_class_init),
+class_base_init: None,
+class_data: core::ptr::null_mut(),
+interfaces: core::ptr::null_mut(),
+};


QOM is certainly a major part of what we need to do for idiomatic
bindings. This series includes both using QOM objects (chardev) and
defining them.

For using QOM objects, there is only one strategy that we need to
implement for both Chardev and DeviceState/SysBusDevice which is nice.
We can probably take inspiration from glib-rs, see for example
- https://docs.rs/glib/latest/glib/object/trait.Cast.html
- https://docs.rs/glib/latest/glib/object/trait.ObjectType.html
- https://docs.rs/glib/latest/glib/object/struct.ObjectRef.html



There was consensus in the community call that we won't be writing Rust 
APIs for internal C QEMU interfaces; or at least, that's not the goal


It's not necessary to make bindings to write idiomatic Rust. If you 
notice, most callbacks QEMU calls cast the pointer into the Rust struct 
which then calls its idiomatic type methods. I like abstraction a lot, 
but it has diminishing returns. We'll see what future Rust devices look
like and we can then decide if extra code for abstractions is a good
trade-off.




For definining new classes I think it's okay if Rust does not support
writing superclasses yet, only leaves.

I would make a QOM class written in Rust a struct that only contains
the new fields. The struct must implement Default and possibly Drop
(for finalize).


The object is allocated and freed from C, hence it is not Dropped. We're 
only ever accessing it from a reference retrieved from a QEMU provided 
raw pointer. If the struct gains heap object fields like Box or Vec or 
String, they'd have to be dropped manually on _unrealize.





 struct PL011_Inner {
   iomem: MemoryRegion,
   readbuff: u32,
   ...
 }

and then a macro defines a wrapper struct that includes just two
fields, one for the superclass and one for the Rust struct.
instance_init can initialize the latter with Default::default().

 struct PL011 {
   parent_obj: qemu::bindings::SysBusDevice,
   private: PL011_Inner,
 }


a nested struct is not necessary for using the Default trait. Consider a 
Default impl that sets parent_obj as a field memset to zero; on instance 
initialization we can do


 *self = Self {
   parent_obj: self.parent_obj,
   ..Self::default()
 };


"private" probably should be RefCell, avoiding the unsafe

   state.as_mut().read(addr, size)



RefCell etc are not FFI safe. Also, nested fields must be visible so 
that the offset_of! macro works, for QOM properties. Finally, 


state.as_mut().read(addr, size)

Is safe since we receive a valid pointer from QEMU. This fact cannot be 
derived by the compiler, which is why it has an `unsafe` keyword. That 
does not mean that the use here is unsafe.




There should also be macros to define the wrappers for MMIO MemoryRegions.



Do you mean the MemoryRegionOps?





+pub fn realize(&mut self) {
+unsafe {
+qemu_chr_fe_set_handlers(
+addr_of_mut!(self.char_backend),
+Some(pl011_can_receive),
+Some(pl011_receive),
+Some(pl011_event),
+None,
+addr_of_mut!(*self).cast::<c_void>(),
+core::ptr::null_mut(),
+true,
+);
+}


Probably each set of callbacks (MemoryRegion, Chardev) needs to be a
struct that implements a trait.
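
(A sketch of that idea, with illustrative names rather than an existing
QEMU API:)

    pub trait CharBackendOps {
        fn can_receive(&mut self) -> u32;
        fn receive(&mut self, buf: &[u8]);
        fn event(&mut self, event: i32);
    }

A blanket impl could then generate the extern "C" trampolines passed to
qemu_chr_fe_set_handlers() from any type implementing the trait.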


+#[macro_export]
+macro_rules! define_property {
+($name:expr, $state:ty, $field:expr, $prop:expr, $type:expr, default = $defval:expr) => {
+$crate::generated::Property {
+name: $name,
+info: $prop,
+offset: ::core::mem::offset_of!($state, $field) as _,
+bitnr: 0,
+bitmask: 0,
+set_default: true,
+defval: $crate::generated::Property__bindgen_ty_1 { u: $defval.into() },
+arrayoffset: 0,
+arrayinfo: ::core::ptr::null(),
+arrayfieldsize: 0,
+link_type: ::core::ptr::null(),
+ 

[PATCH 2/4] migration: Rename thread debug names

2024-06-12 Thread Peter Xu
The postcopy thread names on dest QEMU are slightly confusing, partly I'll
need to blame myself on 36f62f11e4 ("migration: Postcopy preemption
preparation on channel creation").  E.g., "fault-fast" reads like a fast
version of "fault-default", but it's actually the fast version of
"postcopy/listen".

Taking this chance, rename all the migration threads with proper rules.
Considering we only have 15 chars usable, prefix all threads with "mig/",
meanwhile identify src/dst threads properly this time.  So now most thread
names will look like "mig/DIR/xxx", where DIR will be "src"/"dst", except
the bg-snapshot thread which doesn't have a direction.

For multifd threads, making them "mig/{src|dst}/{send|recv}_%d".

We used to have "live_migration" thread for a very long time, now it's
called "mig/src/main".  We may hope to have "mig/dst/main" soon but not
yet.
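
(As an aside, the 15-char limit comes from the kernel: thread names are
capped at TASK_COMM_LEN, 16 bytes including the trailing NUL. A hedged
illustration with raw pthreads on Linux:)

    char name[16];
    /* "mig/src/send_0" is 14 visible chars, so it fits; anything over
     * 15 would make pthread_setname_np() fail with ERANGE on Linux. */
    pthread_setname_np(pthread_self(), "mig/src/send_0");
    pthread_getname_np(pthread_self(), name, sizeof(name));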

Signed-off-by: Peter Xu 
---
 migration/colo.c | 2 +-
 migration/migration.c| 6 +++---
 migration/multifd.c  | 6 +++---
 migration/postcopy-ram.c | 4 ++--
 migration/savevm.c   | 2 +-
 5 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/migration/colo.c b/migration/colo.c
index f96c2ee069..6449490221 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -935,7 +935,7 @@ void coroutine_fn colo_incoming_co(void)
 assert(bql_locked());
 assert(migration_incoming_colo_enabled());
 
-qemu_thread_create(, "COLO incoming", colo_process_incoming_thread,
+qemu_thread_create(, "mig/dst/colo", colo_process_incoming_thread,
mis, QEMU_THREAD_JOINABLE);
 
 mis->colo_incoming_co = qemu_coroutine_self();
diff --git a/migration/migration.c b/migration/migration.c
index e1b269624c..d41e00ed4c 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2408,7 +2408,7 @@ static int open_return_path_on_source(MigrationState *ms)
 
 trace_open_return_path_on_source();
 
-qemu_thread_create(&ms->rp_state.rp_thread, "return path",
+qemu_thread_create(&ms->rp_state.rp_thread, "mig/src/rp-thr",
source_return_path_thread, ms, QEMU_THREAD_JOINABLE);
 ms->rp_state.rp_thread_created = true;
 
@@ -3747,10 +3747,10 @@ void migrate_fd_connect(MigrationState *s, Error *error_in)
 }
 
 if (migrate_background_snapshot()) {
-qemu_thread_create(>thread, "bg_snapshot",
+qemu_thread_create(>thread, "mig/snapshot",
 bg_migration_thread, s, QEMU_THREAD_JOINABLE);
 } else {
-qemu_thread_create(>thread, "live_migration",
+qemu_thread_create(>thread, "mig/src/main",
 migration_thread, s, QEMU_THREAD_JOINABLE);
 }
 s->migration_thread_running = true;
diff --git a/migration/multifd.c b/migration/multifd.c
index f317bff077..7afc0965f6 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -1059,7 +1059,7 @@ static bool multifd_tls_channel_connect(MultiFDSendParams *p,
 args->p = p;
 
 p->tls_thread_created = true;
-qemu_thread_create(&p->tls_thread, "multifd-tls-handshake-worker",
+qemu_thread_create(&p->tls_thread, "mig/src/tls",
multifd_tls_handshake_thread, args,
QEMU_THREAD_JOINABLE);
 return true;
@@ -1185,7 +1185,7 @@ bool multifd_send_setup(void)
 } else {
 p->iov = g_new0(struct iovec, page_count);
 }
-p->name = g_strdup_printf("multifdsend_%d", i);
+p->name = g_strdup_printf("mig/src/send_%d", i);
 p->page_size = qemu_target_page_size();
 p->page_count = page_count;
 p->write_flags = 0;
@@ -1601,7 +1601,7 @@ int multifd_recv_setup(Error **errp)
 + sizeof(uint64_t) * page_count;
 p->packet = g_malloc0(p->packet_len);
 }
-p->name = g_strdup_printf("multifdrecv_%d", i);
+p->name = g_strdup_printf("mig/dst/recv_%d", i);
 p->iov = g_new0(struct iovec, page_count);
 p->normal = g_new0(ram_addr_t, page_count);
 p->zero = g_new0(ram_addr_t, page_count);
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 3419779548..97701e6bb2 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -1238,7 +1238,7 @@ int postcopy_ram_incoming_setup(MigrationIncomingState *mis)
 return -1;
 }
 
-postcopy_thread_create(mis, &mis->fault_thread, "fault-default",
+postcopy_thread_create(mis, &mis->fault_thread, "mig/dst/fault",
postcopy_ram_fault_thread, QEMU_THREAD_JOINABLE);
 mis->have_fault_thread = true;
 
@@ -1258,7 +1258,7 @@ int postcopy_ram_incoming_setup(MigrationIncomingState *mis)
  * This thread needs to be created after the temp pages because
  * it'll fetch RAM_CHANNEL_POSTCOPY PostcopyTmpPage immediately.
  */
-postcopy_thread_create(mis, &mis->postcopy_prio_thread, "fault-fast",
+postcopy_thread_create(mis, &mis->postcopy_prio_thread, "mig/dst/preempt",

[PATCH 4/4] migration/postcopy: Add postcopy-recover-setup phase

2024-06-12 Thread Peter Xu
This patch adds a migration state on src called "postcopy-recover-setup".
The new state will describe the intermediate step starting from when the
src QEMU started an postcopy recovery request, until the migration channels
are properly established, but before the recovery process take place.

The request came from Libvirt, which currently relies on the migration
state events to detect migration state changes.  That works for most of the
migration process, except for postcopy recovery failures at the beginning.

Currently postcopy recovery only has two major states:

  - postcopy-paused: this is the state that both sides of QEMU will be in
for a long time as long as the migration channel was interrupted.

  - postcopy-recover: this is the state where both sides of QEMU handshake
with each other, preparing for a continuation of postcopy which used to
be interrupted.

The issue here is that when the recovery port is invalid, the src QEMU will
take the URI/channels, notice that the ports are not valid, and silently
stay in the postcopy-paused state, with no event sent to Libvirt.  In this
case, the only thing Libvirt can do is to poll the migration status at some
interval, which is suboptimal.

Considering that this is the only case where Libvirt won't get a
notification from QEMU on such events, let's add postcopy-recover-setup
state to mimic what we used to have with the "setup" state of a newly
initialized migration, describing the phase of connection establishment.

With that, postcopy recovery now has two paths to go, and either path
guarantees that an event is generated.  Now the events will look like this
during a recovery process on src QEMU:

  - Initially when the recovery is initiated on src, QEMU will go from
"postcopy-paused" -> "postcopy-recover-setup".  Old QEMUs don't have
this event.

  - Depending on whether the channel re-establishment is succeeded:

- In succeeded case, src QEMU will move from "postcopy-recover-setup"
  to "postcopy-recover".  Old QEMUs also have this event.

- In failure case, src QEMU will move from "postcopy-recover-setup" to
  "postcopy-paused" again.  Old QEMUs don't have this event.

This guarantees that Libvirt will always receive a proper notification for
the recovery process.
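
Illustratively, the event stream Libvirt would now observe on the source
for a failed recovery attempt looks like (sketch, other fields trimmed):

    {"event": "MIGRATION", "data": {"status": "postcopy-paused"}}
    {"event": "MIGRATION", "data": {"status": "postcopy-recover-setup"}}
    {"event": "MIGRATION", "data": {"status": "postcopy-paused"}}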

One thing to mention is that the new status is only needed on the src QEMU,
not both sides.  On the dest QEMU, the state machine doesn't change, hence
the events don't change either.  It's done like so because the dest QEMU
may not have an explicit point where setup starts.  E.g., the dest QEMU may
not use the migrate-recover command to switch to a new URI/channel; the old
URI/channels can be reused in the recovery, in which case the old ports
simply work again after the network routes are fixed up.

The patch has some touch-ups in the dest path too, because there was some
unclarity in how migrate_set_state() is used; the change should make it
crystal clear by always checking the current status.  The next step from
that POV would be making migrate_set_state() always update the status
rather than using cmpxchg(), but that's for later.

Cc: Jiri Denemark 
Cc: Fabiano Rosas 
Cc: Prasad Pandit 
Buglink: https://issues.redhat.com/browse/RHEL-38485
Signed-off-by: Peter Xu 
---
 qapi/migration.json  |  4 +++
 migration/postcopy-ram.h |  3 ++
 migration/migration.c| 66 +++-
 migration/postcopy-ram.c |  6 
 migration/savevm.c   |  4 +--
 5 files changed, 73 insertions(+), 10 deletions(-)

diff --git a/qapi/migration.json b/qapi/migration.json
index a351fd3714..a135bbcd96 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -142,6 +142,9 @@
 #
 # @postcopy-paused: during postcopy but paused.  (since 3.0)
 #
+# @postcopy-recover-setup: setup phase for a postcopy recover process,
+# preparing for a recover phase to start.  (since 9.1)
+#
 # @postcopy-recover: trying to recover from a paused postcopy.  (since
 # 3.0)
 #
@@ -166,6 +169,7 @@
 { 'enum': 'MigrationStatus',
   'data': [ 'none', 'setup', 'cancelling', 'cancelled',
 'active', 'postcopy-active', 'postcopy-paused',
+'postcopy-recover-setup',
 'postcopy-recover', 'completed', 'failed', 'colo',
 'pre-switchover', 'device', 'wait-unplug' ] }
 ##
diff --git a/migration/postcopy-ram.h b/migration/postcopy-ram.h
index ecae941211..a6df1b2811 100644
--- a/migration/postcopy-ram.h
+++ b/migration/postcopy-ram.h
@@ -13,6 +13,8 @@
 #ifndef QEMU_POSTCOPY_RAM_H
 #define QEMU_POSTCOPY_RAM_H
 
+#include "qapi/qapi-types-migration.h"
+
 /* Return true if the host supports everything we need to do postcopy-ram */
 bool postcopy_ram_supported_by_host(MigrationIncomingState *mis,
 Error **errp);
@@ -193,5 +195,6 @@ enum PostcopyChannels {
 void postcopy_preempt_new_channel(MigrationIncomingState *mis, QEMUFile *file);
 void postcopy_preempt_setup(MigrationState *s);
 int 

[PATCH 0/4] migration: New postcopy state, and some cleanups

2024-06-12 Thread Peter Xu
The major goal of this patchset is patch 4, which introduced a new postcopy
state so that we will send an event in postcopy reconnect failures that
Libvirt would prefer to have.  There's more information for that issue in
the commit message alone.

Patch 1-2 are cleanups that are not directly relevant but I found/stored
that could be good to have.  I made it simple by putting them together in
one thread to make patch management easier, but I can send them separately
when necessary.

Patch 3 is also a cleanup, but will be needed for patch 4 as dependency.

Comments welcomed, thanks.

CI: https://gitlab.com/peterx/qemu/-/pipelines/1328309702
(msys2-64bit is failing, but doesn't look relevant)

Peter Xu (4):
  migration/multifd: Avoid the final FLUSH in complete()
  migration: Rename thread debug names
  migration: Use MigrationStatus instead of int
  migration/postcopy: Add postcopy-recover-setup phase

 qapi/migration.json  |  4 ++
 migration/migration.h|  9 +++--
 migration/postcopy-ram.h |  3 ++
 migration/colo.c |  2 +-
 migration/migration.c| 85 
 migration/multifd.c  |  6 +--
 migration/postcopy-ram.c | 10 -
 migration/ram.c  |  4 --
 migration/savevm.c   |  6 +--
 9 files changed, 95 insertions(+), 34 deletions(-)

-- 
2.45.0




[PATCH 1/4] migration/multifd: Avoid the final FLUSH in complete()

2024-06-12 Thread Peter Xu
We always do the flush when finishing one round of scan, and during
complete() phase we should scan one more round making sure no dirty page
existed.  In that case we shouldn't need one explicit FLUSH at the end of
complete(), as when reaching there all pages should have been flushed.

Signed-off-by: Peter Xu 
---
 migration/ram.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index ceea586b06..edec1a2d07 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3300,10 +3300,6 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
 }
 }
 
-if (migrate_multifd() && !migrate_multifd_flush_after_each_section() &&
-!migrate_mapped_ram()) {
-qemu_put_be64(f, RAM_SAVE_FLAG_MULTIFD_FLUSH);
-}
 qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
 return qemu_fflush(f);
 }
-- 
2.45.0




[PATCH 3/4] migration: Use MigrationStatus instead of int

2024-06-12 Thread Peter Xu
QEMU uses "int" in most cases even if it stores MigrationStatus.  I don't
know why, so let's try to do that right and see what blows up..

Signed-off-by: Peter Xu 
---
 migration/migration.h |  9 +
 migration/migration.c | 13 +++--
 2 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/migration/migration.h b/migration/migration.h
index 6af01362d4..38aa1402d5 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -160,7 +160,7 @@ struct MigrationIncomingState {
 /* PostCopyFD's for external userfaultfds & handlers of shared memory */
 GArray   *postcopy_remote_fds;
 
-int state;
+MigrationStatus state;
 
 /*
  * The incoming migration coroutine, non-NULL during qemu_loadvm_state().
@@ -301,7 +301,7 @@ struct MigrationState {
 /* params from 'migrate-set-parameters' */
 MigrationParameters parameters;
 
-int state;
+MigrationStatus state;
 
 /* State related to return path */
 struct {
@@ -459,7 +459,8 @@ struct MigrationState {
 bool rdma_migration;
 };
 
-void migrate_set_state(int *state, int old_state, int new_state);
+void migrate_set_state(MigrationStatus *state, MigrationStatus old_state,
+   MigrationStatus new_state);
 
 void migration_fd_process_incoming(QEMUFile *f);
 void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp);
@@ -479,7 +480,7 @@ int migrate_init(MigrationState *s, Error **errp);
 bool migration_is_blocked(Error **errp);
 /* True if outgoing migration has entered postcopy phase */
 bool migration_in_postcopy(void);
-bool migration_postcopy_is_alive(int state);
+bool migration_postcopy_is_alive(MigrationStatus state);
 MigrationState *migrate_get_current(void);
 bool migration_has_failed(MigrationState *);
 bool migrate_mode_is_cpr(MigrationState *);
diff --git a/migration/migration.c b/migration/migration.c
index d41e00ed4c..bfbd657035 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -390,7 +390,7 @@ void migration_incoming_state_destroy(void)
 yank_unregister_instance(MIGRATION_YANK_INSTANCE);
 }
 
-static void migrate_generate_event(int new_state)
+static void migrate_generate_event(MigrationStatus new_state)
 {
 if (migrate_events()) {
 qapi_event_send_migration(new_state);
@@ -1273,8 +1273,6 @@ static void fill_destination_migration_info(MigrationInfo *info)
 }
 
 switch (mis->state) {
-case MIGRATION_STATUS_NONE:
-return;
 case MIGRATION_STATUS_SETUP:
 case MIGRATION_STATUS_CANCELLING:
 case MIGRATION_STATUS_CANCELLED:
@@ -1290,6 +1288,8 @@ static void fill_destination_migration_info(MigrationInfo *info)
 info->has_status = true;
 fill_destination_postcopy_migration_info(info);
 break;
+default:
+return;
 }
 info->status = mis->state;
 
@@ -1337,7 +1337,8 @@ void qmp_migrate_start_postcopy(Error **errp)
 
 /* shared migration helpers */
 
-void migrate_set_state(int *state, int old_state, int new_state)
+void migrate_set_state(MigrationStatus *state, MigrationStatus old_state,
+   MigrationStatus new_state)
 {
 assert(new_state < MIGRATION_STATUS__MAX);
 if (qatomic_cmpxchg(state, old_state, new_state) == old_state) {
@@ -1544,7 +1545,7 @@ bool migration_in_postcopy(void)
 }
 }
 
-bool migration_postcopy_is_alive(int state)
+bool migration_postcopy_is_alive(MigrationStatus state)
 {
 switch (state) {
 case MIGRATION_STATUS_POSTCOPY_ACTIVE:
@@ -1598,7 +1599,7 @@ bool migration_is_idle(void)
 case MIGRATION_STATUS_DEVICE:
 case MIGRATION_STATUS_WAIT_UNPLUG:
 return false;
-case MIGRATION_STATUS__MAX:
+default:
 g_assert_not_reached();
 }
 
-- 
2.45.0




Re: [PATCH v2 0/9] plugins: Use unwind info for special gdb registers

2024-06-12 Thread Alex Bennée
Richard Henderson  writes:

> This is an attempt to fix
> https://gitlab.com/qemu-project/qemu/-/issues/2208
> ("PC is not updated for each instruction in TCG plugins")
>
> I have only updated target/{i386,arm} so far, but basically all
> targets need updating for the new callbacks.  Extra points to
> anyone who sees how to avoid the extra code duplication.  :-)

I've made a few comments but yeah I think we just have to live with the
extra helpers. The only other option would be pre-notifying the gdb
subsystem about which registers are "lazy" which I think amounts to the
same thing.

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro



Re: [PATCH v2 8/9] target/arm: Add aarch64_tcg_ops

2024-06-12 Thread Alex Bennée
Richard Henderson  writes:

> For the moment, this is an exact copy of arm_tcg_ops.
> Export arm_cpu_exec_interrupt for the cross-file reference.
>
> Signed-off-by: Richard Henderson 
> ---
>  target/arm/internals.h |  1 +
>  target/arm/cpu.c   |  2 +-
>  target/arm/cpu64.c | 30 ++
>  3 files changed, 32 insertions(+), 1 deletion(-)
>
> diff --git a/target/arm/internals.h b/target/arm/internals.h
> index 11b5da2562..dc53d86249 100644
> --- a/target/arm/internals.h
> +++ b/target/arm/internals.h
> @@ -364,6 +364,7 @@ void arm_restore_state_to_opc(CPUState *cs,
>  
>  #ifdef CONFIG_TCG
>  void arm_cpu_synchronize_from_tb(CPUState *cs, const TranslationBlock *tb);
> +bool arm_cpu_exec_interrupt(CPUState *cs, int interrupt_request);
>  #endif /* CONFIG_TCG */
>  
>  typedef enum ARMFPRounding {
> diff --git a/target/arm/cpu.c b/target/arm/cpu.c
> index 35fa281f1b..3cd4711064 100644
> --- a/target/arm/cpu.c
> +++ b/target/arm/cpu.c
> @@ -824,7 +824,7 @@ static inline bool arm_excp_unmasked(CPUState *cs, unsigned int excp_idx,
>  return unmasked || pstate_unmasked;
>  }
>  
> -static bool arm_cpu_exec_interrupt(CPUState *cs, int interrupt_request)
> +bool arm_cpu_exec_interrupt(CPUState *cs, int interrupt_request)
>  {
>  CPUClass *cc = CPU_GET_CLASS(cs);
>  CPUARMState *env = cpu_env(cs);
> diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
> index 262a1d6c0b..7ba80099af 100644
> --- a/target/arm/cpu64.c
> +++ b/target/arm/cpu64.c
> @@ -31,6 +31,9 @@
>  #include "hvf_arm.h"
>  #include "qapi/visitor.h"
>  #include "hw/qdev-properties.h"
> +#ifdef CONFIG_TCG
> +#include "hw/core/tcg-cpu-ops.h"
> +#endif
>  #include "internals.h"
>  #include "cpu-features.h"
>  #include "cpregs.h"
> @@ -793,6 +796,29 @@ static const gchar *aarch64_gdb_arch_name(CPUState *cs)
>  return "aarch64";
>  }
>  
> +#ifdef CONFIG_TCG
> +static const TCGCPUOps aarch64_tcg_ops = {
> +.initialize = arm_translate_init,
> +.synchronize_from_tb = arm_cpu_synchronize_from_tb,
> +.debug_excp_handler = arm_debug_excp_handler,
> +.restore_state_to_opc = arm_restore_state_to_opc,
> +
> +#ifdef CONFIG_USER_ONLY
> +.record_sigsegv = arm_cpu_record_sigsegv,
> +.record_sigbus = arm_cpu_record_sigbus,
> +#else
> +.tlb_fill = arm_cpu_tlb_fill,
> +.cpu_exec_interrupt = arm_cpu_exec_interrupt,
> +.do_interrupt = arm_cpu_do_interrupt,
> +.do_transaction_failed = arm_cpu_do_transaction_failed,
> +.do_unaligned_access = arm_cpu_do_unaligned_access,
> +.adjust_watchpoint_address = arm_adjust_watchpoint_address,
> +.debug_check_watchpoint = arm_debug_check_watchpoint,
> +.debug_check_breakpoint = arm_debug_check_breakpoint,
> +#endif /* !CONFIG_USER_ONLY */
> +};
> +#endif /* CONFIG_TCG */
> +

My principal concern is that duplicating an otherwise identical structure
just gives us more opportunity to miss a change.

>  static void aarch64_cpu_class_init(ObjectClass *oc, void *data)
>  {
>  CPUClass *cc = CPU_CLASS(oc);
> @@ -802,6 +828,10 @@ static void aarch64_cpu_class_init(ObjectClass *oc, void 
> *data)
>  cc->gdb_core_xml_file = "aarch64-core.xml";
>  cc->gdb_arch_name = aarch64_gdb_arch_name;
>  
> +#ifdef CONFIG_TCG
> +cc->tcg_ops = &aarch64_tcg_ops;
> +#endif
> +

What happens when the CPU is running mixed mode code and jumping between
64 and 32 bit? Wouldn't it be easier to have a helper that routes to the
correct unwinder, c.f. gen_intermediate_code

  #ifdef TARGET_AARCH64
  if (EX_TBFLAG_ANY(tb_flags, AARCH64_STATE)) {
  ops = &aarch64_translator_ops;
  }
  #endif

which I guess for a runtime helper would be:

   if (is_a64(cpu_env(cs))) {
  aarch64_plugin_need_unwind_for_reg(...);
   } else {
  arm_plugin_need_unwind_for_reg(...);
   }


etc...


>  object_class_property_add_bool(oc, "aarch64", aarch64_cpu_get_aarch64,
> aarch64_cpu_set_aarch64);
>  object_class_property_set_description(oc, "aarch64",

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro



Re: assert fails in s390x TCG

2024-06-12 Thread Philippe Mathieu-Daudé

On 12/6/24 15:08, Claudio Fontana wrote:

On 6/12/24 14:41, Philippe Mathieu-Daudé wrote:

On 28/7/23 18:43, Richard Henderson wrote:

On 7/28/23 09:05, Richard Henderson wrote:

It's the page containing both code and a page table entry that
concerns me.  It seems like a kernel bug, though obviously we
shouldn't crash.  I'm not sure what to do about it.


Bah.  Of course it's not a kernel bug, since the store is to LowCore.
And of course LowCore is part of a larger page, which easily has other
stuff.


Maybe related to
https://lore.kernel.org/qemu-devel/20240611215814.32752-1-a...@rev.ng/



Hi Philippe,

this was already fixed by Ilya's commit:

commit 791b2b6a930273db694b9ba48bbb406e78715927
Author: Ilya Leoshkevich 
Date:   Sat Aug 5 01:03:18 2023 +0200

 target/s390x: Fix the "ignored match" case in VSTRS
 
 Currently the emulation of VSTRS recognizes partial matches in presence
 of \0 in the haystack, which, according to PoP, is not correct:
 
 If the ZS flag is one and a zero byte was detected
 in the second operand, then there can not be a
 partial match ...
 
 Add a check for this. While at it, fold a number of explicitly handled
 special cases into the generic logic.
 
 Cc: qemu-sta...@nongnu.org

 Reported-by: Claudio Fontana 
 Closes: https://lists.gnu.org/archive/html/qemu-devel/2023-08/msg00633.html
 Fixes: 1d706f314191 ("target/s390x: vxeh2: vector string search")
 Signed-off-by: Ilya Leoshkevich 
 Message-Id: <20230804233748.218935-3-...@linux.ibm.com>
 Tested-by: Claudio Fontana 
 Acked-by: David Hildenbrand 
 Signed-off-by: Thomas Huth 


Ah great, thanks Ilya ;)



Re: [PATCH 0/3] memory: Constify IOMMUTLBEvent in memory_region_notify_iommu*()

2024-06-12 Thread Peter Xu
On Wed, Jun 12, 2024 at 03:25:28PM +0200, Philippe Mathieu-Daudé wrote:
> Trivial patches using const IOMMUTLBEvent.
> 
> Philippe Mathieu-Daudé (3):
>   memory: Constify IOMMUTLBEvent in memory_region_notify_iommu_one()
>   memory: Constify IOMMUTLBEvent in memory_region_notify_iommu()
>   hw/i386/iommu: Constify IOMMUTLBEvent in vtd_page_walk_hook prototype

Reviewed-by: Peter Xu 

-- 
Peter Xu



