Re: [PULL 09/20] target/tricore: Fix out-of-bounds index in imask instruction

2023-06-22 Thread Michael Tokarev

22.06.2023 17:51, Bastian Koppelmann wrote:
..

Is this -stable material?


Yes. If you pick this up, make sure you also pick up 
https://lore.kernel.org/qemu-devel/20230621161422.1652151-1-kbast...@mail.uni-paderborn.de/T/#md18391dd165c4fc2e60ddefb886f3522e715f487
which applies the same fix to other instructions.


Aha. "Add CHECK_REG_PAIR() for insn accessing 64 bit regs".
This subject suggests the patch's adding this macro, instead
of using it. If it were worded like "Use CHECK.. for.." instead, I'd
notice this one too.

Picked up both, thank you!

Is there anything else in this series worth picking up for stable, eg:

 Fix helper_ret() not correctly restoring PSW
 Fix RR_JLI clobbering reg A[11]

or maybe others?

Please, in the future, add Cc: qemu-sta...@nongnu.org for patches
worth having in -stable.

Thanks!

/mjt



Re: [PATCH v4 08/17] tcg: Fix temporary variable in tcg_gen_gvec_andcs

2023-06-22 Thread Richard Henderson

On 6/22/23 19:30, Daniel Henrique Barboza wrote:



On 6/22/23 13:16, Max Chou wrote:

The 5th parameter of tcg_gen_gvec_2s should be replaces by the temporary


s/replaces/replaced



tmp variable in the tcg_gen_gvec_andcs function.

Signed-off-by: Max Chou 
---


Reviewed-by: Daniel Henrique Barboza 


Queued to tcg-next with the typo fixed.


r~




  tcg/tcg-op-gvec.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c
index 95a588d6d2..a062239804 100644
--- a/tcg/tcg-op-gvec.c
+++ b/tcg/tcg-op-gvec.c
@@ -2774,7 +2774,7 @@ void tcg_gen_gvec_andcs(unsigned vece, uint32_t dofs, uint32_t aofs,
  TCGv_i64 tmp = tcg_temp_ebb_new_i64();
  tcg_gen_dup_i64(vece, tmp, c);
-    tcg_gen_gvec_2s(dofs, aofs, oprsz, maxsz, c, &g);
+    tcg_gen_gvec_2s(dofs, aofs, oprsz, maxsz, tmp, &g);
  tcg_temp_free_i64(tmp);
  }





Re: [PATCH v3 0/4] Virtio shared dma-buf

2023-06-22 Thread Michael S. Tsirkin
On Wed, May 24, 2023 at 11:13:29AM +0200, Albert Esteve wrote:
> v1 link -> https://lists.gnu.org/archive/html/qemu-devel/2023-05/msg00598.html
> v2 link -> https://lists.gnu.org/archive/html/qemu-devel/2023-05/msg04530.html
> v2 -> v3:
> - Change UUID hash function strategy to djb
> - Add qemu_uuid_is_equal wrapper

Posted some minor comments. Pls address and I'll merge.


> This patch covers the required steps to add support for virtio
> cross-device resource sharing[1], support for which is already
> available in the kernel.
> 
> The main usecase will be sharing dma buffers from virtio-gpu devices
> (as the exporter - see VIRTIO_GPU_CMD_RESOURCE_ASSIGN_UUID in [2]) to
> virtio-video devices (under discussion) (as the buffer-user or
> importer). Therefore, even though the virtio specs talk about
> resources or objects[3], this patch adds the infrastructure with
> dma-bufs in mind. Note that the virtio specs let the devices
> themselves define what a virtio object is.
> 
> These are the main parts that are covered in the patch:
> 
> - Add hash_func and key_equal_func to uuid
> - Shared resources table, to hold all resources that can be shared in
>   the host and their assigned UUID
> - Internal shared table API for virtio devices to add, lookup and
>   remove resources
> - Unit test to verify the API.
> - New message to the vhost-user protocol to allow backend to interact
>   with the shared table API through the control socket
> 
> Applies cleanly to 1c12355
> 
> [1] - https://lwn.net/Articles/828988/
> [2] - 
> https://docs.oasis-open.org/virtio/virtio/v1.2/csd01/virtio-v1.2-csd01.html#x1-3730006
> [3] - 
> https://docs.oasis-open.org/virtio/virtio/v1.2/csd01/virtio-v1.2-csd01.html#x1-10500011
> 
> Albert Esteve (4):
>   uuid: add hash_func and equal_func
>   virtio-dmabuf: introduce virtio-dmabuf
>   vhost-user: add shared_object msg
>   vhost-user: refactor send_resp code
> 
>  MAINTAINERS   |   7 ++
>  docs/interop/vhost-user.rst   |  15 +++
>  hw/display/meson.build|   1 +
>  hw/display/virtio-dmabuf.c|  90 +
>  hw/virtio/vhost-user.c|  90 ++---
>  include/hw/virtio/virtio-dmabuf.h |  59 
>  include/qemu/uuid.h   |   2 +
>  subprojects/libvhost-user/libvhost-user.c |  88 +
>  subprojects/libvhost-user/libvhost-user.h |  56 +++
>  tests/unit/meson.build|   1 +
>  tests/unit/test-uuid.c|  27 ++
>  tests/unit/test-virtio-dmabuf.c   | 112 ++
>  util/uuid.c   |  14 +++
>  13 files changed, 549 insertions(+), 13 deletions(-)
>  create mode 100644 hw/display/virtio-dmabuf.c
>  create mode 100644 include/hw/virtio/virtio-dmabuf.h
>  create mode 100644 tests/unit/test-virtio-dmabuf.c
> 
> -- 
> 2.40.0




Re: [PATCH v3 4/4] vhost-user: refactor send_resp code

2023-06-22 Thread Michael S. Tsirkin
On Wed, May 24, 2023 at 11:13:33AM +0200, Albert Esteve wrote:
> Refactor the code that sends response messages so that both the
> common REPLY_ACK case and other data responses can call it, avoiding
> code repetition.
> 
> Signed-off-by: Albert Esteve 
> ---
>  hw/virtio/vhost-user.c | 52 +++---
>  1 file changed, 24 insertions(+), 28 deletions(-)
> 
> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> index 5ac5f0eafd..b888f2c177 100644
> --- a/hw/virtio/vhost-user.c
> +++ b/hw/virtio/vhost-user.c
> @@ -1619,28 +1619,36 @@ static int vhost_user_backend_handle_shared_object(VhostUserShared *object)
>  return 0;
>  }
>  
> -static bool
> -vhost_user_backend_send_dmabuf_fd(QIOChannel *ioc, VhostUserHeader *hdr,
> -  VhostUserPayload *payload)
> +static bool vhost_user_send_resp(QIOChannel *ioc, VhostUserHeader *hdr,
> + VhostUserPayload *payload)
>  {
>  Error *local_err = NULL;
>  struct iovec iov[2];


As long as you are refactoring, please add an empty line here
after the variable declaration.

Also, can't we initialize it here?
struct iovec iov[] = {
{ .iov_base = hdr },
{ .iov_base = payload }
};

will also avoid the need for explicit size.


> -if (hdr->flags & VHOST_USER_NEED_REPLY_MASK) {
> -hdr->flags &= ~VHOST_USER_NEED_REPLY_MASK;
> -hdr->flags |= VHOST_USER_REPLY_MASK;
> +hdr->flags &= ~VHOST_USER_NEED_REPLY_MASK;
> +hdr->flags |= VHOST_USER_REPLY_MASK;
>  
> -hdr->size = sizeof(payload->object);
> +iov[0].iov_base = hdr;
> +iov[0].iov_len = VHOST_USER_HDR_SIZE;
> +iov[1].iov_base = payload;
> +iov[1].iov_len = hdr->size;
> +
> +if (qio_channel_writev_all(ioc, iov, ARRAY_SIZE(iov), &local_err)) {
> +error_report_err(local_err);
> +return false;
> +}
>  
> -iov[0].iov_base = hdr;
> -iov[0].iov_len = VHOST_USER_HDR_SIZE;
> -iov[1].iov_base = payload;
> -iov[1].iov_len = hdr->size;
> +return true;
> +}
>  
> -if (qio_channel_writev_all(ioc, iov, ARRAY_SIZE(iov), &local_err)) {
> -error_report_err(local_err);
> -return false;
> -}
> +static bool
> +vhost_user_backend_send_dmabuf_fd(QIOChannel *ioc, VhostUserHeader *hdr,
> +  VhostUserPayload *payload)
> +{
> +if (hdr->flags & VHOST_USER_NEED_REPLY_MASK) {
> +hdr->size = sizeof(payload->object);
> +return vhost_user_send_resp(ioc, hdr, payload);
>  }
> +
>  return true;
>  }
>  
> @@ -1717,22 +1725,10 @@ static gboolean slave_read(QIOChannel *ioc, GIOCondition condition,
>   * directly in their request handlers.
>   */
>  if (hdr.flags & VHOST_USER_NEED_REPLY_MASK) {
> -struct iovec iovec[2];
> -
> -
> -hdr.flags &= ~VHOST_USER_NEED_REPLY_MASK;
> -hdr.flags |= VHOST_USER_REPLY_MASK;
> -
>  payload.u64 = !!ret;
>  hdr.size = sizeof(payload.u64);
>  
> -iovec[0].iov_base = &hdr;
> -iovec[0].iov_len = VHOST_USER_HDR_SIZE;
> -iovec[1].iov_base = &payload;
> -iovec[1].iov_len = hdr.size;
> -
> -if (qio_channel_writev_all(ioc, iovec, ARRAY_SIZE(iovec), &local_err)) {
> -error_report_err(local_err);
> +if (!vhost_user_send_resp(ioc, &hdr, &payload)) {
>  goto err;
>  }
>  }
> -- 
> 2.40.0




Re: [PATCH v3 3/4] vhost-user: add shared_object msg

2023-06-22 Thread Michael S. Tsirkin
On Wed, May 24, 2023 at 11:13:32AM +0200, Albert Esteve wrote:
> Add new vhost-user protocol message
> `VHOST_USER_BACKEND_SHARED_OBJECT`. This new
> message is sent from vhost-user back-ends
> to interact with the virtio-dmabuf table
> in order to add, remove, or look up
> virtio dma-buf shared objects.
> 
> The action taken in the front-end depends
> on the type stored in the payload struct.
> 
> In the libvhost-user library add helper
> functions to allow sending messages to
> interact with the virtio shared objects
> hash table.
> 
> Signed-off-by: Albert Esteve 
> ---
>  docs/interop/vhost-user.rst   | 15 
>  hw/virtio/vhost-user.c| 68 ++
>  subprojects/libvhost-user/libvhost-user.c | 88 +++
>  subprojects/libvhost-user/libvhost-user.h | 56 +++
>  4 files changed, 227 insertions(+)
> 
> diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst
> index 5a070adbc1..d3d8db41e5 100644
> --- a/docs/interop/vhost-user.rst
> +++ b/docs/interop/vhost-user.rst
> @@ -1528,6 +1528,21 @@ is sent by the front-end.
>  
>The state.num field is currently reserved and must be set to 0.
>  
> +``VHOST_USER_BACKEND_SHARED_OBJECT``
> +  :id: 6
> +  :equivalent ioctl: N/A
> +  :request payload: ``struct VhostUserShared``
> +  :reply payload: ``struct VhostUserShared`` (only for ``LOOKUP`` requests)
> +
> +  Backends that need to interact with the virtio-dmabuf shared table API
> +  can send this message. The operation is determined by the ``type`` member
> +  of the payload struct. The valid values for the operation type are
> +  ``VHOST_SHARED_OBJECT_*`` members, i.e., ``ADD``, ``LOOKUP``, and ``REMOVE``.
> +  ``LOOKUP`` operations require the ``VHOST_USER_NEED_REPLY_MASK`` flag to be
> +  set by the back-end, and the front-end will then send the dma-buf fd as
> +  a response if the UUID matches an object in the table, or a negative value
> +  otherwise.
> +
>  .. _reply_ack:
>  
>  VHOST_USER_PROTOCOL_F_REPLY_ACK
> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> index 74a2a28663..5ac5f0eafd 100644
> --- a/hw/virtio/vhost-user.c
> +++ b/hw/virtio/vhost-user.c
> @@ -10,6 +10,7 @@
>  
>  #include "qemu/osdep.h"
>  #include "qapi/error.h"
> +#include "hw/virtio/virtio-dmabuf.h"
>  #include "hw/virtio/vhost.h"
>  #include "hw/virtio/vhost-user.h"
>  #include "hw/virtio/vhost-backend.h"
> @@ -20,6 +21,7 @@
>  #include "sysemu/kvm.h"
>  #include "qemu/error-report.h"
>  #include "qemu/main-loop.h"
> +#include "qemu/uuid.h"
>  #include "qemu/sockets.h"
>  #include "sysemu/runstate.h"
>  #include "sysemu/cryptodev.h"
> @@ -128,6 +130,7 @@ typedef enum VhostUserSlaveRequest {
>  VHOST_USER_BACKEND_IOTLB_MSG = 1,
>  VHOST_USER_BACKEND_CONFIG_CHANGE_MSG = 2,
>  VHOST_USER_BACKEND_VRING_HOST_NOTIFIER_MSG = 3,
> +VHOST_USER_BACKEND_SHARED_OBJECT = 6,
>  VHOST_USER_BACKEND_MAX
>  }  VhostUserSlaveRequest;
>  
> @@ -190,6 +193,18 @@ typedef struct VhostUserInflight {
>  uint16_t queue_size;
>  } VhostUserInflight;
>  
> +typedef enum VhostUserSharedType {
> +VHOST_SHARED_OBJECT_ADD = 0,
> +VHOST_SHARED_OBJECT_LOOKUP,
> +VHOST_SHARED_OBJECT_REMOVE,
> +} VhostUserSharedType;
> +
> +typedef struct VhostUserShared {
> +unsigned char uuid[16];
> +VhostUserSharedType type;
> +int dmabuf_fd;
> +} VhostUserShared;
> +
>  typedef struct {
>  VhostUserRequest request;
>  
> @@ -214,6 +229,7 @@ typedef union {
>  VhostUserCryptoSession session;
>  VhostUserVringArea area;
>  VhostUserInflight inflight;
> +VhostUserShared object;
>  } VhostUserPayload;
>  
>  typedef struct VhostUserMsg {
> @@ -1582,6 +1598,52 @@ static int vhost_user_slave_handle_vring_host_notifier(struct vhost_dev *dev,
>  return 0;
>  }
>  
> +static int vhost_user_backend_handle_shared_object(VhostUserShared *object)
> +{
> +QemuUUID uuid;

Can we initialize it here? uuid = { .data = object->uuid } ?

Also, please put an empty line after the variable declaration.

> +memcpy(uuid.data, object->uuid, sizeof(object->uuid));
> +
> +switch (object->type) {
> +case VHOST_SHARED_OBJECT_ADD:
> +return virtio_add_dmabuf(&uuid, object->dmabuf_fd);
> +case VHOST_SHARED_OBJECT_LOOKUP:
> +object->dmabuf_fd = virtio_lookup_dmabuf(&uuid);
> +if (object->dmabuf_fd < 0) {
> +return object->dmabuf_fd;
> +}
> +break;
> +case VHOST_SHARED_OBJECT_REMOVE:
> +return virtio_remove_resource(&uuid);
> +}
> +

I couldn't figure out why, but if I commit this then run checkpatch,
like this
./scripts/checkpatch.pl HEAD~1..HEAD

then it is unhappy about the : in case. Any idea why?

> +return 0;
> +}
> +
> +static bool
> +vhost_user_backend_send_dmabuf_fd(QIOChannel *ioc, VhostUserHeader *hdr,
> +  VhostUserPayload *payload)
> +{
> +Error *local_err = NULL;
> +struct iovec iov[2];
>

Re: [PATCH v2 5/7] spapr: TCG allow up to 8-thread SMT on POWER8 and newer CPUs

2023-06-22 Thread Cédric Le Goater

On 6/22/23 12:49, Cédric Le Goater wrote:

On 6/22/23 12:06, Cédric Le Goater wrote:

On 6/22/23 11:33, Nicholas Piggin wrote:

PPC TCG supports SMT CPU configurations for non-hypervisor state, so
permit POWER8-10 pseries machines to enable SMT.

This requires PIR and TIR be set, because that's how sibling thread
matching is done by TCG.

spapr's nested-HV capability does not currently coexist with SMT, so
that combination is prohibited (interestingly somewhat analogous to
LPAR-per-core mode on real hardware which also does not support KVM).

Signed-off-by: Nicholas Piggin 
---
  hw/ppc/spapr.c  | 16 
  hw/ppc/spapr_caps.c | 14 ++
  hw/ppc/spapr_cpu_core.c |  7 +--
  3 files changed, 31 insertions(+), 6 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 8e7d497f25..677b5eef9d 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -2525,10 +2525,18 @@ static void spapr_set_vsmt_mode(SpaprMachineState *spapr, Error **errp)
  int ret;
  unsigned int smp_threads = ms->smp.threads;
-    if (tcg_enabled() && (smp_threads > 1)) {
-    error_setg(errp, "TCG cannot support more than 1 thread/core "
-   "on a pseries machine");
-    return;
+    if (tcg_enabled()) {


I will add :

     if (smp_threads > 1 &&

No need to resend for that.


and


Reviewed-by: Cédric Le Goater 

Thanks,

C.




[PATCH v1] virtio-gpu: Make non-gl display updates work again when blob=true

2023-06-22 Thread Vivek Kasireddy
In the case where the console does not have gl capability, and
if blob is set to true, make sure that the display updates still
work. Commit e86a93f55463 accidentally broke this by misplacing
the return statement (in resource_flush) causing the updates to
be silently ignored.

Fixes: e86a93f55463 ("virtio-gpu: splitting one extended mode guest fb into n-scanouts")
Cc: Gerd Hoffmann 
Cc: Marc-André Lureau 
Cc: Dongwon Kim 
Signed-off-by: Vivek Kasireddy 
---
 hw/display/virtio-gpu.c | 27 ++-
 1 file changed, 22 insertions(+), 5 deletions(-)

diff --git a/hw/display/virtio-gpu.c b/hw/display/virtio-gpu.c
index 66cddd94d9..97cd987cf3 100644
--- a/hw/display/virtio-gpu.c
+++ b/hw/display/virtio-gpu.c
@@ -498,6 +498,8 @@ static void virtio_gpu_resource_flush(VirtIOGPU *g,
 struct virtio_gpu_resource_flush rf;
 struct virtio_gpu_scanout *scanout;
 pixman_region16_t flush_region;
+bool within_bounds = false;
+bool update_submitted = false;
 int i;
 
 VIRTIO_GPU_FILL_CMD(rf);
@@ -518,13 +520,28 @@ static void virtio_gpu_resource_flush(VirtIOGPU *g,
 rf.r.x < scanout->x + scanout->width &&
 rf.r.x + rf.r.width >= scanout->x &&
 rf.r.y < scanout->y + scanout->height &&
-rf.r.y + rf.r.height >= scanout->y &&
-console_has_gl(scanout->con)) {
-dpy_gl_update(scanout->con, 0, 0, scanout->width,
-  scanout->height);
+rf.r.y + rf.r.height >= scanout->y) {
+within_bounds = true;
+
+if (console_has_gl(scanout->con)) {
+dpy_gl_update(scanout->con, 0, 0, scanout->width,
+  scanout->height);
+update_submitted = true;
+}
 }
 }
-return;
+
+if (update_submitted) {
+return;
+}
+if (!within_bounds) {
+qemu_log_mask(LOG_GUEST_ERROR, "%s: flush bounds outside scanouts"
+  " bounds for flush %d: %d %d %d %d\n",
+  __func__, rf.resource_id, rf.r.x, rf.r.y,
+  rf.r.width, rf.r.height);
+cmd->error = VIRTIO_GPU_RESP_ERR_INVALID_PARAMETER;
+return;
+}
 }
 
 if (!res->blob &&
-- 
2.39.2




Re: [PATCH v1 0/5] virtio-mem: Device unplug support

2023-06-22 Thread Michael S. Tsirkin
On Tue, Jun 13, 2023 at 05:02:05PM +0200, David Hildenbrand wrote:
> One limitation of virtio-mem is that we cannot currently unplug virtio-mem
> devices that have all memory unplugged from the VM.
> 
> Let's properly handle forced unplug (as can be triggered by the VM) and
> add support for ordinary unplug (requests) of virtio-mem devices that are
> in a compatible state (no legacy mode, no plugged memory, no plug request).
> 
> Briefly tested on both, x86_64 and aarch64.
> 
> Cc: Peter Maydell 
> Cc: Paolo Bonzini 
> Cc: Richard Henderson 
> Cc: Eduardo Habkost 
> Cc: "Michael S. Tsirkin" 
> Cc: Marcel Apfelbaum 
> Cc: Igor Mammedov 
> Cc: qemu-...@nongnu.org
> Cc: Gavin Shan 

Lots of duplication between pc and arm. That's not new, but do we have
to keep growing it? Can't we at least put the new common code somewhere
shared?
What do ARM maintainers think about it?


> David Hildenbrand (5):
>   pc: Properly handle unplug of virtio based memory devices
>   arm/virt: Properly handle unplug of virtio based memory devices
>   virtio-mem: Prepare for unplug support of virtio-mem-pci devices
>   pc: Support unplug of virtio-mem-pci devices
>   arm/virt: Support unplug of virtio-mem-pci devices
> 
>  hw/arm/virt.c  | 60 +--
>  hw/i386/pc.c   | 66 ++
>  hw/virtio/virtio-mem-pci.c | 42 --
>  hw/virtio/virtio-mem-pci.h |  2 ++
>  hw/virtio/virtio-mem.c | 24 +
>  include/hw/virtio/virtio-mem.h |  2 ++
>  6 files changed, 183 insertions(+), 13 deletions(-)
> 
> -- 
> 2.40.1




Re: [PATCH v3 00/12] Start replacing target_ulong with vaddr

2023-06-22 Thread Richard Henderson

On 6/21/23 15:56, Anton Johansson wrote:

This is the first patchset in removing target_ulong from non-target/
directories.  As use of target_ulong is spread across the codebase, we
are attempting to target as few maintainers as possible with each
patchset in order to ease reviewing.

The following instances of target_ulong remain in accel/ and tcg/
 - atomic helpers (atomic_common.c.inc), cpu_atomic_*()
   (atomic_template.h,) and cpu_[st|ld]*()
   (cputlb.c/ldst_common.c.inc) are only used in target/ and can
   be pulled out into a separate target-specific file;

 - walk_memory_regions() is used in user-exec.c and
   linux-user/elfload.c;

 - kvm_find_sw_breakpoint() in kvm-all.c used in target/;

Changes in v2:
 - addr argument in tb_invalidate_phys_addr() changed from vaddr
   to hwaddr;

 - Removed previous patch:

 "[PATCH 4/8] accel/tcg: Replace target_ulong with vaddr in 
helper_unaligned_*()"

   as these functions are removed by Richard's patches;

 - First patch:

 "[PATCH 1/8] accel: Replace `target_ulong` with `vaddr` in TB/TLB"

   has been split into patches 1-7 to ease reviewing;

 - Pulled in target/ changes to cpu_get_tb_cpu_state() into this
   patchset.  This was done to avoid pointer casts to target_ulong *
   which would break for 32-bit targets on a 64-bit BE host;

   Note the small target/ changes are collected in a single
   patch to not break bisection.  If it's still desirable to split
   based on maintainer, let me know;

 - `last` argument of pageflags_[find|next] changed from target_long
   to vaddr.  This change was left out of the last patchset due to
   triggering a "Bad ram pointer" error (softmmu/physmem.c:2273)
   when running make check for an i386-softmmu target.

   I was not able to recreate this on master or post rebase on
   Richard's tcg-once branch.

Changes in v3:
 - Rebased on master

Finally, the grand goal is to allow for heterogeneous QEMU binaries
consisting of multiple frontends.

RFC: https://lists.nongnu.org/archive/html/qemu-devel/2022-12/msg04518.html

Anton Johansson (12):
   accel: Replace target_ulong in tlb_*()
   accel/tcg/translate-all.c: Widen pc and cs_base
   target: Widen pc/cs_base in cpu_get_tb_cpu_state
   accel/tcg/cputlb.c: Widen CPUTLBEntry access functions
   accel/tcg/cputlb.c: Widen addr in MMULookupPageData
   accel/tcg/cpu-exec.c: Widen pc to vaddr
   accel/tcg: Widen pc to vaddr in CPUJumpCache
   accel: Replace target_ulong with vaddr in probe_*()
   accel/tcg: Replace target_ulong with vaddr in *_mmu_lookup()
   accel/tcg: Replace target_ulong with vaddr in translator_*()
   accel/tcg: Replace target_ulong with vaddr in page_*()
   cpu: Replace target_ulong with hwaddr in tb_invalidate_phys_addr()

  accel/stubs/tcg-stub.c   |   6 +-
  accel/tcg/cpu-exec.c |  43 ---
  accel/tcg/cputlb.c   | 233 +--
  accel/tcg/internal.h |   6 +-
  accel/tcg/tb-hash.h  |  12 +-
  accel/tcg/tb-jmp-cache.h |   2 +-
  accel/tcg/tb-maint.c |   2 +-
  accel/tcg/translate-all.c|  13 +-
  accel/tcg/translator.c   |  10 +-
  accel/tcg/user-exec.c|  58 +
  cpu.c|   2 +-
  include/exec/cpu-all.h   |  10 +-
  include/exec/cpu-defs.h  |   4 +-
  include/exec/cpu_ldst.h  |  10 +-
  include/exec/exec-all.h  |  95 +++---
  include/exec/translate-all.h |   2 +-
  include/exec/translator.h|   6 +-
  include/qemu/plugin-memory.h |   2 +-
  target/alpha/cpu.h   |   4 +-
  target/arm/cpu.h |   4 +-
  target/arm/helper.c  |   4 +-
  target/avr/cpu.h |   4 +-
  target/cris/cpu.h|   4 +-
  target/hexagon/cpu.h |   4 +-
  target/hppa/cpu.h|   5 +-
  target/i386/cpu.h|   4 +-
  target/loongarch/cpu.h   |   6 +-
  target/m68k/cpu.h|   4 +-
  target/microblaze/cpu.h  |   4 +-
  target/mips/cpu.h|   4 +-
  target/nios2/cpu.h   |   4 +-
  target/openrisc/cpu.h|   5 +-
  target/ppc/cpu.h |   8 +-
  target/ppc/helper_regs.c |   4 +-
  target/riscv/cpu.h   |   4 +-
  target/riscv/cpu_helper.c|   4 +-
  target/rx/cpu.h  |   4 +-
  target/s390x/cpu.h   |   4 +-
  target/sh4/cpu.h |   4 +-
  target/sparc/cpu.h   |   4 +-
  target/tricore/cpu.h |   4 +-
  target/xtensa/cpu.h  |   4 +-
  42 files changed, 307 insertions(+), 313 deletions(-)

--
2.41.0


Queued to tcg-next, thanks.


r~



Re: [RFC PATCH] softfloat: use QEMU_FLATTEN to avoid mistaken isra inlining

2023-06-22 Thread Richard Henderson

On 6/22/23 22:55, BALATON Zoltan wrote:

Hello,

What happened to this patch? Will this be merged by somebody?


Thanks for the reminder.  Queued to tcg-next.

r~



Regards,
BALATON Zoltan

On Tue, 23 May 2023, BALATON Zoltan wrote:

On Tue, 23 May 2023, Alex Bennée wrote:

Balton discovered that asserts for the extract/deposit calls had a


Missing an a in my name, and my given name is Zoltan. (First name and
last name are the other way around in Hungarian.) Maybe just add a
Reported-by tag instead if you want to record it.



significant impact on a lame benchmark on qemu-ppc. Replicating with:

 ./qemu-ppc64 ~/lsrc/tests/lame.git-svn/builds/ppc64/frontend/lame \
   -h pts-trondheim-3.wav pts-trondheim-3.mp3

showed up the pack/unpack routines not eliding the assert checks as
they should have done, causing them to prominently figure in the profile:

 11.44%  qemu-ppc64  qemu-ppc64   [.] unpack_raw64.isra.0
 11.03%  qemu-ppc64  qemu-ppc64   [.] parts64_uncanon_normal
  8.26%  qemu-ppc64  qemu-ppc64   [.] helper_compute_fprf_float64
  6.75%  qemu-ppc64  qemu-ppc64   [.] do_float_check_status
  5.34%  qemu-ppc64  qemu-ppc64   [.] parts64_muladd
  4.75%  qemu-ppc64  qemu-ppc64   [.] pack_raw64.isra.0
  4.38%  qemu-ppc64  qemu-ppc64   [.] parts64_canonicalize
   3.62%  qemu-ppc64  qemu-ppc64   [.] float64r32_round_pack_canonical

After this patch the same test runs 31 seconds faster with a profile
where the generated code dominates more:

+   14.12% 0.00%  qemu-ppc64  [unknown]    [.] 0x004000619420
+   13.30% 0.00%  qemu-ppc64  [unknown]    [.] 0x004000616850
+   12.58%    12.19%  qemu-ppc64  qemu-ppc64   [.] parts64_uncanon_normal
+   10.62% 0.00%  qemu-ppc64  [unknown]    [.] 0x00400061bf70
+    9.91% 9.73%  qemu-ppc64  qemu-ppc64   [.] helper_compute_fprf_float64
+    7.84% 7.82%  qemu-ppc64  qemu-ppc64   [.] do_float_check_status
+    6.47% 5.78%  qemu-ppc64  qemu-ppc64   [.] parts64_canonicalize.constprop.0
+    6.46% 0.00%  qemu-ppc64  [unknown]    [.] 0x004000620130
+    6.42% 0.00%  qemu-ppc64  [unknown]    [.] 0x004000619400
+    6.17% 6.04%  qemu-ppc64  qemu-ppc64   [.] parts64_muladd
+    5.85% 0.00%  qemu-ppc64  [unknown]    [.] 0x0040006167e0
+    5.74% 0.00%  qemu-ppc64  [unknown]    [.] 0xb693fcd3
+    5.45% 4.78%  qemu-ppc64  qemu-ppc64   [.] float64r32_round_pack_canonical


Suggested-by: Richard Henderson 
Message-Id: 
[AJB: Patchified rth's suggestion]
Signed-off-by: Alex Bennée 
Cc: BALATON Zoltan 


Replace Cc: with
Tested-by: BALATON Zoltan 

This solves the softfloat related usages; the rest probably have lower
overhead, as I could not measure any further improvement from removing
asserts on top of this patch. I still have these functions high in my
profiling result:


children  self    command  symbol
11.40%    10.86%  qemu-system-ppc  helper_compute_fprf_float64
11.25% 0.61%  qemu-system-ppc  helper_fmadds
10.01% 3.23%  qemu-system-ppc  float64r32_round_pack_canonical
8.59% 1.80%  qemu-system-ppc  helper_float_check_status
8.34% 7.23%  qemu-system-ppc  parts64_muladd
8.16% 0.67%  qemu-system-ppc  helper_fmuls
8.08% 0.43%  qemu-system-ppc  parts64_uncanon
7.49% 1.78%  qemu-system-ppc  float64r32_mul
7.32% 7.32%  qemu-system-ppc  parts64_uncanon_normal
6.48% 0.52%  qemu-system-ppc  helper_fadds
6.31% 6.31%  qemu-system-ppc  do_float_check_status
5.99% 1.14%  qemu-system-ppc  float64r32_add

Any idea on those?

Unrelated to this patch, I also started to see random crashes with a DSI
on a dcbz instruction, which did not happen before (or not frequently
enough for me to notice). I did not bisect that as it happens randomly,
but I wonder if it could be related to the recent unaligned access
changes or some other TCG change? Any idea what to check?


Regards,
BALATON Zoltan


---
fpu/softfloat.c | 22 +++---
1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 108f9cb224..42e6c188b4 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -593,27 +593,27 @@ static void unpack_raw64(FloatParts64 *r, const FloatFmt *fmt, uint64_t raw)

    };
}

-static inline void float16_unpack_raw(FloatParts64 *p, float16 f)
+static void QEMU_FLATTEN float16_unpack_raw(FloatParts64 *p, float16 f)
{
    unpack_raw64(p, &float16_params, f);
}

-static inline void bfloat16_unpack_raw(FloatParts64 *p, bfloat16 f)
+static void QEMU_FLATTEN bfloat16_unpack_raw(FloatParts64 *p, bfloat16 f)
{
    unpack_raw64(p, &bfloat16_params, f);
}

-static inline void float32_unpack_raw(FloatParts64 *p, float32 f)
+static void QEMU_FLATTEN float32_unpack_raw(FloatParts64 *p, float32 f)

Re: [PATCH v2 1/2] qmp: remove virtio_list, search QOM tree instead

2023-06-22 Thread Michael S. Tsirkin
On Fri, Jun 09, 2023 at 09:20:39AM -0400, Jonah Palmer wrote:
> The virtio_list duplicates information about virtio devices that already
> exist in the QOM composition tree. Instead of creating this list of
> realized virtio devices, search the QOM composition tree instead.
> 
> This patch modifies the QMP command qmp_x_query_virtio to instead search
> the partial paths of '/machine/peripheral/' &
> '/machine/peripheral-anon/' in the QOM composition tree for virtio
> devices.
> 
> A device is found to be a valid virtio device if (1) its canonical path
> is of 'TYPE_VIRTIO_DEVICE' and (2) the device has been realized.
> 
> [Jonah: In the previous commit I had written that a device is found to
>  be a valid virtio device if (1) it has a canonical path ending with
>  'virtio-backend'.
> 
>  The code now determines if it's a virtio device by appending
>  'virtio-backend' (if needed) to a given canonical path and then
>  checking that path to see if the device is of type
> 'TYPE_VIRTIO_DEVICE'.
> 
>  The patch also now checks that it's a virtio device before
>  attempting to check whether the device is realized.]
> 
> Signed-off-by: Jonah Palmer 


Could one of QMP maintainers comment on this please?

> ---
>  hw/virtio/virtio-qmp.c | 128 ++---
>  hw/virtio/virtio-qmp.h |   8 +--
>  hw/virtio/virtio.c |   6 --
>  3 files changed, 82 insertions(+), 60 deletions(-)
> 
> diff --git a/hw/virtio/virtio-qmp.c b/hw/virtio/virtio-qmp.c
> index b5e1835299..e936cc8ce5 100644
> --- a/hw/virtio/virtio-qmp.c
> +++ b/hw/virtio/virtio-qmp.c
> @@ -668,67 +668,101 @@ VirtioDeviceFeatures *qmp_decode_features(uint16_t 
> device_id, uint64_t bitmap)
>  VirtioInfoList *qmp_x_query_virtio(Error **errp)
>  {
>  VirtioInfoList *list = NULL;
> -VirtioInfo *node;
> -VirtIODevice *vdev;
>  
> -QTAILQ_FOREACH(vdev, &virtio_list, next) {
> -DeviceState *dev = DEVICE(vdev);
> -Error *err = NULL;
> -QObject *obj = qmp_qom_get(dev->canonical_path, "realized", &err);
> -
> -if (err == NULL) {
> -GString *is_realized = qobject_to_json_pretty(obj, true);
> -/* virtio device is NOT realized, remove it from list */
> -if (!strncmp(is_realized->str, "false", 4)) {
> -QTAILQ_REMOVE(&virtio_list, vdev, next);
> -} else {
> -node = g_new(VirtioInfo, 1);
> -node->path = g_strdup(dev->canonical_path);
> -node->name = g_strdup(vdev->name);
> -QAPI_LIST_PREPEND(list, node);
> +/* Query the QOM composition tree for virtio devices */
> +qmp_set_virtio_device_list("/machine/peripheral/", &list);
> +qmp_set_virtio_device_list("/machine/peripheral-anon/", &list);

How sure are we these will forever be the only two places where virtio
can live?

> +if (list == NULL) {
> +error_setg(errp, "No virtio devices found");
> +return NULL;
> +}
> +return list;
> +}
> +
> +/* qmp_set_virtio_device_list:
> + * @ppath: An incomplete peripheral path to search from.
> + * @list: A list of realized virtio devices.
> + * Searches a given incomplete peripheral path (e.g. '/machine/peripheral/'
> + * or '/machine/peripheral-anon/') for realized virtio devices and adds them
> + * to a given list of virtio devices.
> + */
> +void qmp_set_virtio_device_list(const char *ppath, VirtioInfoList **list)
> +{
> +ObjectPropertyInfoList *plist;
> +VirtioInfoList *node;
> +Error *err = NULL;
> +
> +/* Search an incomplete path for virtio devices */
> +plist = qmp_qom_list(ppath, &err);
> +if (err == NULL) {
> +ObjectPropertyInfoList *start = plist;
> +while (plist != NULL) {
> +ObjectPropertyInfo *value = plist->value;
> +GString *path = g_string_new(ppath);
> +g_string_append(path, value->name);
> +g_string_append(path, "/virtio-backend");
> +
> +/* Determine if full path is a realized virtio device */
> +VirtIODevice *vdev = qmp_find_virtio_device(path->str);
> +if (vdev != NULL) {
> +node = g_new0(VirtioInfoList, 1);
> +node->value = g_new(VirtioInfo, 1);
> +node->value->path = g_strdup(path->str);
> +node->value->name = g_strdup(vdev->name);
> +QAPI_LIST_PREPEND(*list, node->value);
>  }
> -   g_string_free(is_realized, true);
> +g_string_free(path, true);
> +plist = plist->next;
>  }
> -qobject_unref(obj);
> +qapi_free_ObjectPropertyInfoList(start);
>  }
> -
> -return list;
>  }
>  
>  VirtIODevice *qmp_find_virtio_device(const char *path)
>  {
> -VirtIODevice *vdev;
> -
> -QTAILQ_FOREACH(vdev, &virtio_list, next) {
> -DeviceState *dev = DEVICE(vdev);
> -
> -if (strcmp(dev->canonical_path, path) != 0) {
> -  

Re: [PULL 00/30] Next patches

2023-06-22 Thread Richard Henderson

On 6/22/23 18:54, Juan Quintela wrote:

The following changes since commit b455ce4c2f300c8ba47cba7232dd03261368a4cb:

   Merge tag 'q800-for-8.1-pull-request' of https://github.com/vivier/qemu-m68k into staging (2023-06-22 10:18:32 +0200)

are available in the Git repository at:

   https://gitlab.com/juan.quintela/qemu.git  tags/next-pull-request

for you to fetch changes up to 23e4307eadc1497bd0a11ca91041768f15963b68:

   migration/rdma: Split qemu_fopen_rdma() into input/output functions (2023-06-22 18:11:58 +0200)


Migration Pull request (20230621) take 2

In this pull request the only change is fixing a 32-bit compilation issue.

Please apply.

[take 1]
- fix for multifd thread creation (fabiano)
- dirtylimit (hyman)
   * migration-test will go on next PULL request, as it has failures.
- Improve error description (tejus)
- improve -incoming and set parameters before calling incoming (wei)
- migration atomic counters reviewed patches (quintela)
- migration-test refactoring reviewed (quintela)


New failure with check-cfi-x86_64:

https://gitlab.com/qemu-project/qemu/-/jobs/4527202764#L188

/builds/qemu-project/qemu/build/pyvenv/bin/meson test  --no-rebuild -t 0  --num-processes 
1 --print-errorlogs
  1/350 qemu:qtest+qtest-x86_64 / qtest-x86_64/qom-test   OK 
6.55s   8 subtests passed
▶   2/350 ERROR:../tests/qtest/migration-test.c:320:check_guests_ram: assertion failed: 
(bad == 0) ERROR
  2/350 qemu:qtest+qtest-x86_64 / qtest-x86_64/migration-test ERROR 
151.99s   killed by signal 6 SIGABRT
>>> G_TEST_DBUS_DAEMON=/builds/qemu-project/qemu/tests/dbus-vmstate-daemon.sh 
MALLOC_PERTURB_=3 QTEST_QEMU_IMG=./qemu-img 
QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon 
QTEST_QEMU_BINARY=./qemu-system-x86_64 
/builds/qemu-project/qemu/build/tests/qtest/migration-test --tap -k

― ✀  ―
stderr:
qemu-system-x86_64: Unable to read from socket: Connection reset by peer
Memory content inconsistency at 4f65000 first_byte = 30 last_byte = 2f current = 88 
hit_edge = 1

**
ERROR:../tests/qtest/migration-test.c:320:check_guests_ram: assertion failed: 
(bad == 0)

(test program exited with status code -6)
――


r~



Re: [PATCH v2 2/2] qmp: update virtio feature maps, vhost-user-gpio instrospection

2023-06-22 Thread Michael S. Tsirkin
On Fri, Jun 09, 2023 at 09:20:40AM -0400, Jonah Palmer wrote:
> Add new virtio transport feature to transport feature map:
>  - VIRTIO_F_RING_RESET
> 
> Add new vhost-user protocol feature to vhost-user protocol feature map
> and enumeration:
>  - VHOST_USER_PROTOCOL_F_STATUS
> 
> Add new virtio device features for several virtio devices to their
> respective feature mappings:
> 
> virtio-blk:
>  - VIRTIO_BLK_F_SECURE_ERASE
> 
> virtio-net:
>  - VIRTIO_NET_F_NOTF_COAL
>  - VIRTIO_NET_F_GUEST_USO4
>  - VIRTIO_NET_F_GUEST_USO6
>  - VIRTIO_NET_F_HOST_USO
> 
> virtio/vhost-user-gpio:
>  - VIRTIO_GPIO_F_IRQ
>  - VHOST_F_LOG_ALL
>  - VHOST_USER_F_PROTOCOL_FEATURES
> 
> virtio-bt:
>  - VIRTIO_BT_F_VND_HCI
>  - VIRTIO_BT_F_MSFT_EXT
>  - VIRTIO_BT_F_AOSP_EXT
>  - VIRTIO_BT_F_CONFIG_V2
> 
> virtio-scmi:
>  - VIRTIO_SCMI_F_P2A_CHANNELS
>  - VIRTIO_SCMI_F_SHARED_MEMORY
> 
> Add support for introspection on vhost-user-gpio devices.
> 
> Signed-off-by: Jonah Palmer 

Thanks for the patch! Some comments:


> ---
>  hw/virtio/vhost-user-gpio.c |  7 
>  hw/virtio/virtio-qmp.c  | 79 +++--
>  2 files changed, 83 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/virtio/vhost-user-gpio.c b/hw/virtio/vhost-user-gpio.c
> index d6927b610a..e88ca5370f 100644
> --- a/hw/virtio/vhost-user-gpio.c
> +++ b/hw/virtio/vhost-user-gpio.c
> @@ -205,6 +205,12 @@ static void vu_gpio_guest_notifier_mask(VirtIODevice 
> *vdev, int idx, bool mask)
>  vhost_virtqueue_mask(&gpio->vhost_dev, vdev, idx, mask);
>  }
>  
> +static struct vhost_dev *vu_gpio_get_vhost(VirtIODevice *vdev)
> +{
> +VHostUserGPIO *gpio = VHOST_USER_GPIO(vdev);
> +return &gpio->vhost_dev;
> +}
> +
>  static void do_vhost_user_cleanup(VirtIODevice *vdev, VHostUserGPIO *gpio)
>  {
>  virtio_delete_queue(gpio->command_vq);
> @@ -413,6 +419,7 @@ static void vu_gpio_class_init(ObjectClass *klass, void 
> *data)
>  vdc->get_config = vu_gpio_get_config;
>  vdc->set_status = vu_gpio_set_status;
>  vdc->guest_notifier_mask = vu_gpio_guest_notifier_mask;
> +vdc->get_vhost = vu_gpio_get_vhost;
>  }
>  
>  static const TypeInfo vu_gpio_info = {
> diff --git a/hw/virtio/virtio-qmp.c b/hw/virtio/virtio-qmp.c
> index e936cc8ce5..140c420d87 100644
> --- a/hw/virtio/virtio-qmp.c
> +++ b/hw/virtio/virtio-qmp.c
> @@ -53,6 +53,7 @@ enum VhostUserProtocolFeature {
>  VHOST_USER_PROTOCOL_F_RESET_DEVICE = 13,
>  VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS = 14,
>  VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS = 15,
> +VHOST_USER_PROTOCOL_F_STATUS = 16,
>  VHOST_USER_PROTOCOL_F_MAX
>  };
>

OMG I just realized that by now we have accumulated each value
in 4 places! This is really badly asking to be moved
to a header. Not sure what to do about the document yet
but that will at least get us down to two.
  
> @@ -79,6 +80,8 @@ static const qmp_virtio_feature_map_t 
> virtio_transport_map[] = {
>  "VIRTIO_F_ORDER_PLATFORM: Memory accesses ordered by platform"),
>  FEATURE_ENTRY(VIRTIO_F_SR_IOV, \
>  "VIRTIO_F_SR_IOV: Device supports single root I/O 
> virtualization"),
> +FEATURE_ENTRY(VIRTIO_F_RING_RESET, \
> +"VIRTIO_F_RING_RESET: Driver can reset individual VQs"),
>  /* Virtio ring transport features */
>  FEATURE_ENTRY(VIRTIO_RING_F_INDIRECT_DESC, \
>  "VIRTIO_RING_F_INDIRECT_DESC: Indirect descriptors supported"),
> @@ -134,6 +137,9 @@ static const qmp_virtio_feature_map_t 
> vhost_user_protocol_map[] = {
>  FEATURE_ENTRY(VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS, \
>  "VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS: Configuration for "
>  "memory slots supported"),
> +FEATURE_ENTRY(VHOST_USER_PROTOCOL_F_STATUS, \
> +"VHOST_USER_PROTOCOL_F_STATUS: Querying and notifying back-end "
> +"device statuses supported"),

status - there's only one per device

>  { -1, "" }
>  };
>  
> @@ -176,6 +182,8 @@ static const qmp_virtio_feature_map_t 
> virtio_blk_feature_map[] = {
>  "VIRTIO_BLK_F_DISCARD: Discard command supported"),
>  FEATURE_ENTRY(VIRTIO_BLK_F_WRITE_ZEROES, \
>  "VIRTIO_BLK_F_WRITE_ZEROES: Write zeroes command supported"),
> +FEATURE_ENTRY(VIRTIO_BLK_F_SECURE_ERASE, \
> +"VIRTIO_BLK_F_SECURE_ERASE: Secure erase supported"),
>  FEATURE_ENTRY(VIRTIO_BLK_F_ZONED, \
>  "VIRTIO_BLK_F_ZONED: Zoned block devices"),
>  #ifndef VIRTIO_BLK_NO_LEGACY

> @@ -299,6 +307,14 @@ static const qmp_virtio_feature_map_t 
> virtio_net_feature_map[] = {
>  FEATURE_ENTRY(VIRTIO_NET_F_CTRL_MAC_ADDR, \
>  "VIRTIO_NET_F_CTRL_MAC_ADDR: MAC address set through control "
>  "channel"),
> +FEATURE_ENTRY(VIRTIO_NET_F_NOTF_COAL, \
> +"VIRTIO_NET_F_NOTF_COAL: Device supports coalescing 
> notifications"),
> +FEATURE_ENTRY(VIRTIO_NET_F_GUEST_USO4, \
> +"VIRTIO_NET_F_GUEST_USO4: Driver can receive USOv4"),

Re: [PATCH] vhost: fix vhost_dev_enable_notifiers() error case

2023-06-22 Thread Michael S. Tsirkin
On Wed, Jun 07, 2023 at 12:32:31PM +0300, Michael Tokarev wrote:
> 02.06.2023 19:27, Laurent Vivier wrote:
> > in vhost_dev_enable_notifiers(), if virtio_bus_set_host_notifier(true)
> > fails, we call vhost_dev_disable_notifiers() that executes
> > virtio_bus_set_host_notifier(false) on all queues, even on queues that
> > have failed to be initialized.
> > 
> > This triggers a core dump in memory_region_del_eventfd():
> > 
> >   virtio_bus_set_host_notifier: unable to init event notifier: Too many 
> > open files (-24)
> >   vhost VQ 1 notifier binding failed: 24
> >   .../softmmu/memory.c:2611: memory_region_del_eventfd: Assertion `i != 
> > mr->ioeventfd_nb' failed.
> > 
> > Fix the problem by providing to vhost_dev_disable_notifiers() the
> > number of queues to disable.
> > 
> > Fixes: 8771589b6f81 ("vhost: simplify vhost_dev_enable_notifiers")
> > Cc: longpe...@huawei.com
> > Signed-off-by: Laurent Vivier 
> > ---
> >   hw/virtio/vhost.c | 65 ++-
> >   1 file changed, 36 insertions(+), 29 deletions(-)
> 
> Is this one a candidate for -stable?
> 
> The diffstat is somewhat large but it is just moving a bit of code around.

I'd say so, yes.

> Thanks,
> 
> /mjt




[PATCH] gdbstub: Permit reverse step/break to provide stop response

2023-06-22 Thread Nicholas Piggin
The final part of the reverse step and break handling is to bring
the machine back to a debug stop state. gdb expects a response.

A gdb 'rsi' command hangs forever because the gdbstub filters out
the response (also observable with reverse_debugging.py avocado
tests).

Fix by setting allow_stop_reply for the gdb backward packets.

Fixes: 758370052fb ("gdbstub: only send stop-reply packets when allowed to")
Cc: qemu-sta...@nongnu.org
Cc: Matheus Tavares Bernardino 
Cc: Alex Bennée 
Cc: Taylor Simpson 
Signed-off-by: Nicholas Piggin 
---
 gdbstub/gdbstub.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gdbstub/gdbstub.c b/gdbstub/gdbstub.c
index be18568d0a..9496d7b175 100644
--- a/gdbstub/gdbstub.c
+++ b/gdbstub/gdbstub.c
@@ -1814,6 +1814,7 @@ static int gdb_handle_packet(const char *line_buf)
 .handler = handle_backward,
 .cmd = "b",
 .cmd_startswith = 1,
+.allow_stop_reply = true,
 .schema = "o0"
 };
 cmd_parser = &backward_cmd_desc;
-- 
2.40.1




Re: How do you represent a host gcc and a cross gcc in lcitool?

2023-06-22 Thread Alistair Francis
On Thu, Jun 1, 2023 at 4:58 AM Alex Bennée  wrote:
>
>
> Brian Cain  writes:
>
> >> -Original Message-
> >> From: Alex Bennée 
> >> Sent: Wednesday, May 31, 2023 6:24 AM
> >> To: Daniel P.Berrangé 
> >> Cc: qemu-devel ; Michael Tokarev
> >> ; Erik Skultety ; Brian Cain
> >> ; Palmer Dabbelt ; Alistair Francis
> >> ; Bin Meng 
> >> Subject: How do you represent a host gcc and a cross gcc in lcitool?
> >>
> >> WARNING: This email originated from outside of Qualcomm. Please be wary of
> >> any links or attachments, and do not enable macros.
> >>
> >> Hi,
> >>
> >> While trying to convert the debian-riscv64-cross docker container to an
> >> lcitool based one I ran into a problem building QEMU. The configure step
> >> fails because despite cross compiling we still need a host compiler to
> >> build the hexagon codegen tooling.
> >
> > I thought we'd fixed this container definition so that we only
> > downloaded the hexagon toolchain instead? Do we really need a host
> > compiler for that container build?
> >
> > Or am I misunderstanding and you're referring to features required to
> > support idef parser? Does "hexagon codegen" refer to hexagon's TCG
> > generation or hexagon code itself (required by tests/tcg)?
>
> I think so:
>
> #
> #  Step 1
> #  We use a C program to create semantics_generated.pyinc
> #
> gen_semantics = executable(
> 'gen_semantics',
> 'gen_semantics.c',
> native: true, build_by_default: false)
>
> semantics_generated = custom_target(
> 'semantics_generated.pyinc',
> output: 'semantics_generated.pyinc',
> command: [gen_semantics, '@OUTPUT@'],
> )
> hexagon_ss.add(semantics_generated)
>
>
> >
> >> After scratching my head for a while I discovered we did have host GCC's
> >> in our cross images despite there being no explicit request for them in
> >> the docker description. It turned out that the gcovr requirement pulled
> >> in lcov which itself had a dependency on gcc. However this is a bug:
> >>
> >>   https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=987818
> >>
> >> which has been fixed in bookworm (and of course sid which is the only
> >> way we can get a riscv64 build of QEMU at the moment). Hence my hacky
> >> attempts to get gcc via side effect of another package failed.
> >>
> >> Hence the question in $SUBJECT. I tried to add a mapping to lcitool for
> >> a pseudo hostgcc package:
> >>
> >> +  hostgcc:
> >> +default: gcc
> >> +pkg:
> >> +MacOS:
> >> +cross-policy-default: skip
> >>
> >> however this didn't work. Do we need a new mechanism for this or am I
> >> missing a way to do this?
> >>
> >> RiscV guys,
> >>
> >> It's clear that relying on Debian Sid for the QEMU cross build for RiscV
> >> is pretty flakey. Are you guys aware of any other distros that better
> >> support cross compiling to a riscv64 target or is Debian still the best
> >> bet? Could you be persuaded to build a binary docker image with the
> >> cross compilers and libraries required for a decent cross build as an
> >> alternative?

It's probably not very helpful, but I find Arch based distros to be
the best bet for this.

Are you still looking for a Docker image? I could try and get something working

Alistair

> >>
> >> Thanks,
> >>
> >> --
> >> Alex Bennée
> >> Virtualisation Tech Lead @ Linaro
>
>
> --
> Alex Bennée
> Virtualisation Tech Lead @ Linaro
>



Re: [PATCH] linux-user/riscv: Add syscall riscv_hwprobe

2023-06-22 Thread Alistair Francis
On Mon, Jun 19, 2023 at 6:25 PM Robbin Ehn  wrote:
>
> This patch adds the new syscall for the
> "RISC-V Hardware Probing Interface"
> (https://docs.kernel.org/riscv/hwprobe.html).
>
> Reviewed-by: Palmer Dabbelt 
> Signed-off-by: Robbin Ehn 

Thanks for the patch!

Do you mind re-sending this when 6.4 comes out? I would like it in a
kernel release before we pick it up

Alistair

> ---
> v1->v2: Moved to syscall.c
> v2->v3: Separate function, get/put user
> v3->patch
> ---
>  linux-user/riscv/syscall32_nr.h |   1 +
>  linux-user/riscv/syscall64_nr.h |   1 +
>  linux-user/syscall.c| 146 
>  3 files changed, 148 insertions(+)
>
> diff --git a/linux-user/riscv/syscall32_nr.h b/linux-user/riscv/syscall32_nr.h
> index 1327d7dffa..412e58e5b2 100644
> --- a/linux-user/riscv/syscall32_nr.h
> +++ b/linux-user/riscv/syscall32_nr.h
> @@ -228,6 +228,7 @@
>  #define TARGET_NR_accept4 242
>  #define TARGET_NR_arch_specific_syscall 244
>  #define TARGET_NR_riscv_flush_icache (TARGET_NR_arch_specific_syscall + 15)
> +#define TARGET_NR_riscv_hwprobe (TARGET_NR_arch_specific_syscall + 14)
>  #define TARGET_NR_prlimit64 261
>  #define TARGET_NR_fanotify_init 262
>  #define TARGET_NR_fanotify_mark 263
> diff --git a/linux-user/riscv/syscall64_nr.h b/linux-user/riscv/syscall64_nr.h
> index 6659751933..29e1eb2075 100644
> --- a/linux-user/riscv/syscall64_nr.h
> +++ b/linux-user/riscv/syscall64_nr.h
> @@ -251,6 +251,7 @@
>  #define TARGET_NR_recvmmsg 243
>  #define TARGET_NR_arch_specific_syscall 244
>  #define TARGET_NR_riscv_flush_icache (TARGET_NR_arch_specific_syscall + 15)
> +#define TARGET_NR_riscv_hwprobe (TARGET_NR_arch_specific_syscall + 14)
>  #define TARGET_NR_wait4 260
>  #define TARGET_NR_prlimit64 261
>  #define TARGET_NR_fanotify_init 262
> diff --git a/linux-user/syscall.c b/linux-user/syscall.c
> index f2cb101d83..55becf3666 100644
> --- a/linux-user/syscall.c
> +++ b/linux-user/syscall.c
> @@ -8874,6 +8874,147 @@ static int do_getdents64(abi_long dirfd, abi_long 
> arg2, abi_long count)
>  }
>  #endif /* TARGET_NR_getdents64 */
>
> +#if defined(TARGET_NR_riscv_hwprobe)
> +
> +#define RISCV_HWPROBE_KEY_MVENDORID 0
> +#define RISCV_HWPROBE_KEY_MARCHID   1
> +#define RISCV_HWPROBE_KEY_MIMPID2
> +
> +#define RISCV_HWPROBE_KEY_BASE_BEHAVIOR 3
> +#define RISCV_HWPROBE_BASE_BEHAVIOR_IMA (1 << 0)
> +
> +#define RISCV_HWPROBE_KEY_IMA_EXT_0 4
> +#define RISCV_HWPROBE_IMA_FD   (1 << 0)
> +#define RISCV_HWPROBE_IMA_C(1 << 1)
> +
> +#define RISCV_HWPROBE_KEY_CPUPERF_0 5
> +#define RISCV_HWPROBE_MISALIGNED_UNKNOWN (0 << 0)
> +#define RISCV_HWPROBE_MISALIGNED_EMULATED(1 << 0)
> +#define RISCV_HWPROBE_MISALIGNED_SLOW(2 << 0)
> +#define RISCV_HWPROBE_MISALIGNED_FAST(3 << 0)
> +#define RISCV_HWPROBE_MISALIGNED_UNSUPPORTED (4 << 0)
> +#define RISCV_HWPROBE_MISALIGNED_MASK(7 << 0)
> +
> +struct riscv_hwprobe {
> +abi_llong  key;
> +abi_ullong value;
> +};
> +
> +static void risc_hwprobe_fill_pairs(CPURISCVState *env,
> +struct riscv_hwprobe *pair,
> +size_t pair_count)
> +{
> +const RISCVCPUConfig *cfg = riscv_cpu_cfg(env);
> +
> +for (; pair_count > 0; pair_count--, pair++) {
> +abi_llong key;
> +abi_ullong value;
> +__put_user(0, &pair->value);
> +__get_user(key, &pair->key);
> +switch (key) {
> +case RISCV_HWPROBE_KEY_MVENDORID:
> +__put_user(cfg->mvendorid, &pair->value);
> +break;
> +case RISCV_HWPROBE_KEY_MARCHID:
> +__put_user(cfg->marchid, &pair->value);
> +break;
> +case RISCV_HWPROBE_KEY_MIMPID:
> +__put_user(cfg->mimpid, &pair->value);
> +break;
> +case RISCV_HWPROBE_KEY_BASE_BEHAVIOR:
> +value = riscv_has_ext(env, RVI) &&
> +riscv_has_ext(env, RVM) &&
> +riscv_has_ext(env, RVA) ?
> +RISCV_HWPROBE_BASE_BEHAVIOR_IMA : 0;
> +__put_user(value, &pair->value);
> +break;
> +case RISCV_HWPROBE_KEY_IMA_EXT_0:
> +value = riscv_has_ext(env, RVF) &&
> +riscv_has_ext(env, RVD) ?
> +RISCV_HWPROBE_IMA_FD : 0;
> +value |= riscv_has_ext(env, RVC) ?
> + RISCV_HWPROBE_IMA_C : pair->value;
> +__put_user(value, &pair->value);
> +break;
> +case RISCV_HWPROBE_KEY_CPUPERF_0:
> +__put_user(RISCV_HWPROBE_MISALIGNED_FAST, &pair->value);
> +break;
> +default:
> +__put_user(-1, &pair->key);
> +break;
> +}
> +}
> +}
> +
> +static int cpu_set_valid(abi_long arg3, abi_long arg4)
> +{
> +int ret, i, tmp;
> +size_t host_mask_size, target_mask_size;
> +unsigned long *host_mask;
> +
> + 

Re: [PATCH 0/2] target/riscv: Fix the xlen for data address when MPRV=1

2023-06-22 Thread Alistair Francis
On Wed, Jun 14, 2023 at 1:27 PM Weiwei Li  wrote:
>
> Currently, we use the current env->xl as the xlen for addresses. However, the
> xlen for data addresses should be changed to the xlen associated with the
> privilege mode in MPP when MPRV=1.
>
> The port is available here:
> https://github.com/plctlab/plct-qemu/tree/plct-addr-xl-upstream
>
> Weiwei Li (2):
>   target/riscv: Add additional xlen for address when MPRV=1
>   target/riscv: update cur_pmbase/pmmask based on mode affected by MPRV

Thanks!

Applied to riscv-to-apply.next

Alistair

>
>  target/riscv/cpu.h| 49 +--
>  target/riscv/cpu_helper.c |  8 +--
>  target/riscv/csr.c| 27 +++--
>  target/riscv/translate.c  | 13 ++-
>  4 files changed, 80 insertions(+), 17 deletions(-)
>
> --
> 2.25.1
>
>



Re: [PATCH] target/riscv: fix the issue of guest reboot then no response or crash in kvm-mode

2023-06-22 Thread Alistair Francis
On Mon, Jun 12, 2023 at 11:07 PM liguang.zhang <18622748...@163.com> wrote:
>
> From: "liguang.zhang" 

Hello, thanks for the patch

>
> There is an issue with guest reboot in kvm-mode:
> 1. In the guest shell, just run reboot: the guest can't reboot successfully,
> and the host kvm stops scheduling the vcpu.
> 2. For an SMP guest, use ctrl+a c to switch to the qemu monitor and use the
> system_reset command to reset the guest; the vcpus then crash

There are two issues when rebooting a guest using KVM
 1. When the guest initiates a reboot the host is unable to stop the vcpu
 2. When running a SMP guest the qemu monitor system_reset causes a vcpu crash

This can be fixed by clearing the CSR values at reset and syncing the
MPSTATE with the host.

>
> kernel log
> ```shell
> $reboot
>
> The system is going down NOW!
> Sent SIGTERM to all processes
> logout
> Sent SIGKILL to all processes
> Requesting system reboot
>
> ```
> then no response
>
> for qemu command:
> $system_reset:
>
> kernel log:
> ```shell
> [   53.739556] kvm [150]: VCPU exit error -95
> [   53.739563] kvm [148]: VCPU exit error -95
> [   53.739557] kvm [149]: VCPU exit error -95
> [   53.740957] kvm [149]: SEPC=0x0 SSTATUS=0x24120 HSTATUS=0x2002001c0
> [   53.740957] kvm [148]: SEPC=0x0 SSTATUS=0x24120 HSTATUS=0x2002001c0
> [   53.741054] kvm [148]: SCAUSE=0x14 STVAL=0x0 HTVAL=0x0 HTINST=0x0
> [   53.741058] kvm [149]: SCAUSE=0x14 STVAL=0x0 HTVAL=0x0 HTINST=0x0
> [   53.756187] kvm [150]: SEPC=0x0 SSTATUS=0x24120 HSTATUS=0x2002001c0
> [   53.757797] kvm [150]: SCAUSE=0x14 STVAL=0x0 HTVAL=0x0 HTINST=0x0
> ```
>
> solution:
>
> add resetting of CSRs and context for the riscv vcpu
> have qemu ioctl reset the vcpu->arch.power_off state in kvm
>
> tests:
>
> qemu-system-riscv64 -M virt -bios none -kernel Image \
>-smp 4 -enable-kvm \
>-append "rootwait root=/dev/vda ro" \
>-drive file=rootfs.ext2,format=raw,id=hd0 \
>-device virtio-blk-device,drive=hd0
>
> in guest shell:
> $reboot
>
> qemu command:
> $system_reset
>
> ---
> v2:
> - update submit description
>
> Signed-off-by: liguang.zhang 
> ---
>  target/riscv/kvm.c   | 43 
>  target/riscv/kvm_riscv.h |  1 +
>  2 files changed, 44 insertions(+)
>
> diff --git a/target/riscv/kvm.c b/target/riscv/kvm.c
> index 0f932a5b96..c6a7824c9e 100644
> --- a/target/riscv/kvm.c
> +++ b/target/riscv/kvm.c
> @@ -42,6 +42,8 @@
>  #include "migration/migration.h"
>  #include "sysemu/runstate.h"
>
> +static bool cap_has_mp_state;
> +
>  static uint64_t kvm_riscv_reg_id(CPURISCVState *env, uint64_t type,
>   uint64_t idx)
>  {
> @@ -335,6 +337,25 @@ int kvm_arch_get_registers(CPUState *cs)
>  return ret;
>  }
>
> +int kvm_riscv_set_mpstate_to_kvm(RISCVCPU *cpu, int state)

This should probably be called:

kvm_riscv_sync_mpstate_to_kvm()

instead

> +{
> +if (cap_has_mp_state) {
> +

No newline required

Otherwise the patch looks good

Alistair

> +struct kvm_mp_state mp_state = {
> +.mp_state = state
> +};
> +
> +int ret = kvm_vcpu_ioctl(CPU(cpu), KVM_SET_MP_STATE, &mp_state);
> +if (ret) {
> +fprintf(stderr, "%s: failed to set MP_STATE %d/%s\n",
> +__func__, ret, strerror(-ret));
> +return -1;
> +}
> +}
> +
> +return 0;
> +}
> +
>  int kvm_arch_put_registers(CPUState *cs, int level)
>  {
>  int ret = 0;
> @@ -354,6 +375,18 @@ int kvm_arch_put_registers(CPUState *cs, int level)
>  return ret;
>  }
>
> +if (KVM_PUT_RESET_STATE == level) {
> +RISCVCPU *cpu = RISCV_CPU(cs);
> +if (cs->cpu_index == 0) {
> +ret = kvm_riscv_set_mpstate_to_kvm(cpu, KVM_MP_STATE_RUNNABLE);
> +} else {
> +ret = kvm_riscv_set_mpstate_to_kvm(cpu, KVM_MP_STATE_STOPPED);
> +}
> +if (ret) {
> +return ret;
> +}
> +}
> +
>  return ret;
>  }
>
> @@ -428,6 +461,7 @@ int kvm_arch_add_msi_route_post(struct 
> kvm_irq_routing_entry *route,
>
>  int kvm_arch_init(MachineState *ms, KVMState *s)
>  {
> +cap_has_mp_state = kvm_check_extension(s, KVM_CAP_MP_STATE);
>  return 0;
>  }
>
> @@ -506,10 +540,19 @@ void kvm_riscv_reset_vcpu(RISCVCPU *cpu)
>  if (!kvm_enabled()) {
>  return;
>  }
> +for (int i=0; i<32; i++)
> +env->gpr[i] = 0;
>  env->pc = cpu->env.kernel_addr;
>  env->gpr[10] = kvm_arch_vcpu_id(CPU(cpu)); /* a0 */
>  env->gpr[11] = cpu->env.fdt_addr;  /* a1 */
>  env->satp = 0;
> +env->mie = 0;
> +env->stvec = 0;
> +env->sscratch = 0;
> +env->sepc = 0;
> +env->scause = 0;
> +env->stval = 0;
> +env->mip = 0;
>  }
>
>  void kvm_riscv_set_irq(RISCVCPU *cpu, int irq, int level)
> diff --git a/target/riscv/kvm_riscv.h b/target/riscv/kvm_riscv.h
> index ed281bdce0..4a4c262820 100644
> --- a/target/riscv/kvm_riscv.h
> +++ b/target/riscv/kvm_riscv.h
> @@ -21,5 +21,6 @@
>
>  void kvm_riscv_reset_vcpu(RI

Re: [PATCH] STM32F100: add support for external memory via FSMC

2023-06-22 Thread Alistair Francis
On Wed, Jun 21, 2023 at 5:44 AM Lucas Villa Real  wrote:
>
> Add support for FSMC on high-density STM32F100 devices and enable
> mapping of additional memory via the `-m SIZE` command-line option.
> FSMC Bank1 can address up to 4x64MB of PSRAM memory at 0x60000000.

Thanks for the patches!

>
> RCC is needed to enable peripheral clock for FSMC; this commit
> implements support for RCC through the MMIO interface.

This should be a separate commit. The idea is to break commits up as
small as possible and send a patch series, this makes review much
easier. Each new feature should be its own commit.

>
> Last, high-density devices support up to 32KB of static SRAM, so
> adjust SRAM_SIZE accordingly.

Also, can you include a link to the documentation in the commit message?

>
> Signed-off-by: Lucas C. Villa Real 
> ---
>  docs/system/arm/stm32.rst|  12 ++-
>  hw/arm/Kconfig   |   1 +
>  hw/arm/stm32f100_soc.c   | 102 +++-
>  hw/arm/stm32f1_generic.c |  12 +++
>  hw/misc/Kconfig  |   3 +
>  hw/misc/meson.build  |   1 +
>  hw/misc/stm32f1xx_fsmc.c | 155 +++
>  include/hw/arm/stm32f100_soc.h   |  24 -
>  include/hw/misc/stm32f1xx_fsmc.h |  62 +
>  9 files changed, 368 insertions(+), 4 deletions(-)
>  create mode 100644 hw/misc/stm32f1xx_fsmc.c
>  create mode 100644 include/hw/misc/stm32f1xx_fsmc.h
>
> diff --git a/docs/system/arm/stm32.rst b/docs/system/arm/stm32.rst
> index d0a3b1a7eb..40de58ed04 100644
> --- a/docs/system/arm/stm32.rst
> +++ b/docs/system/arm/stm32.rst
> @@ -40,6 +40,8 @@ Supported devices
>   * SPI controller
>   * System configuration (SYSCFG)
>   * Timer controller (TIMER)
> + * Reset and Clock Controller (RCC)
> + * Flexible static memory controller (FSMC)
>
>  Missing devices
>  ---
> @@ -57,7 +59,6 @@ Missing devices
>   * Power supply configuration (PWR)
>   * Random Number Generator (RNG)
>   * Real-Time Clock (RTC) controller
> - * Reset and Clock Controller (RCC)
>   * Secure Digital Input/Output (SDIO) interface
>   * USB OTG
>   * Watchdog controller (IWDG, WWDG)
> @@ -78,4 +79,11 @@ to select the device density line.  The following values 
> are supported:
>
>  .. code-block:: bash
>
> -  $ qemu-system-arm -M stm32f1-generic -global stm32f100-soc.density=medium 
> ...
> \ No newline at end of file
> +  $ qemu-system-arm -M stm32f1-generic -global stm32f100-soc.density=medium 
> ...
> +
> +High-density devices can also enable up to 256 MB of external memory using
> +the `-m SIZE` option. The memory is mapped at address 0x60000000. Example:
> +
> +.. code-block:: bash
> +
> +  $ qemu-system-arm -M stm32f1-generic -m 64M ...
> \ No newline at end of file
> diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig
> index 822441945c..dd48068108 100644
> --- a/hw/arm/Kconfig
> +++ b/hw/arm/Kconfig
> @@ -433,6 +433,7 @@ config RASPI
>  config STM32F100_SOC
>  bool
>  select ARM_V7M
> +select STM32F1XX_FSMC
>  select STM32F2XX_USART
>  select STM32F2XX_SPI
>
> diff --git a/hw/arm/stm32f100_soc.c b/hw/arm/stm32f100_soc.c
> index c157ffd644..a2b863d309 100644
> --- a/hw/arm/stm32f100_soc.c
> +++ b/hw/arm/stm32f100_soc.c
> @@ -26,6 +26,7 @@
>  #include "qemu/osdep.h"
>  #include "qapi/error.h"
>  #include "qemu/module.h"
> +#include "qemu/log.h"
>  #include "hw/arm/boot.h"
>  #include "exec/address-spaces.h"
>  #include "hw/arm/stm32f100_soc.h"
> @@ -40,9 +41,85 @@ static const uint32_t usart_addr[STM_NUM_USARTS] = { 
> 0x40013800, 0x40004400,
>  0x40004800 };
>  static const uint32_t spi_addr[STM_NUM_SPIS] = { 0x40013000, 0x40003800,
>  0x40003C00 };
> +static const uint32_t fsmc_addr = 0xA0000000;
>
>  static const int usart_irq[STM_NUM_USARTS] = {37, 38, 39};
>  static const int spi_irq[STM_NUM_SPIS] = {35, 36, 51};
> +static const int fsmc_irq = 48;
> +
> +static uint64_t stm32f100_rcc_read(void *h, hwaddr offset, unsigned size)
> +{
> +STM32F100State *s = (STM32F100State *) h;
> +switch (offset) {
> +case 0x00:
> +return s->rcc.cr;
> +case 0x04:
> +return s->rcc.cfgr;
> +case 0x08:
> +return s->rcc.cir;
> +case 0x0C:
> +return s->rcc.apb2rstr;
> +case 0x10:
> +return s->rcc.apb1rstr;
> +case 0x14:
> +return s->rcc.ahbenr;
> +case 0x18:
> +return s->rcc.apb2enr;
> +case 0x1C:
> +return s->rcc.apb1enr;
> +case 0x20:
> +return s->rcc.bdcr;
> +case 0x24:
> +return s->rcc.csr;
> +case 0x2C:
> +return s->rcc.cfgr2;
> +default:
> +qemu_log_mask(LOG_GUEST_ERROR,
> +  "%s: Bad offset 0x%"HWADDR_PRIx"\n", __func__, offset);
> +}
> +return 0;
> +}
> +
> +static void stm32f100_rcc_write(void *h, hwaddr offset, uint64_t value64,
> +unsigned size)
> +{
> +STM32F100State *s = (STM32F100State *) h;
> +uint32_t value = value64 &

Re: [PATCH] STM32F100: support different density lines

2023-06-22 Thread Alistair Francis
On Tue, Jun 20, 2023 at 8:20 AM Lucas Villa Real  wrote:
>
> This patch adds support for the emulation of different density lines
> (low, medium, and high). A new class property stm32f100-soc.density=
> has been introduced to allow users to state the desired configuration.
> That property is recognized by a new machine, stm32f1-generic. The SOC
> is configured according to the following:
>
>density=low   32 KB FLASH, 2 SPIs
>density=medium   128 KB FLASH, 2 SPIs
>density=high 512 KB FLASH, 3 SPIs
>
> With this code change we should be able to introduce richer features
> to STM32F100, such as support for FSMC (so that a machine with more
> RAM capacity can be properly emulated). FSMC is supported on high
> density line devices only.
>
> Signed-off-by: Lucas C. Villa Real 
> ---
>  configs/devices/arm-softmmu/default.mak |  1 +
>  docs/system/arm/stm32.rst   | 14 
>  hw/arm/Kconfig  |  6 ++
>  hw/arm/meson.build  |  1 +
>  hw/arm/stm32f100_soc.c  | 92 +
>  hw/arm/stm32f1_generic.c| 70 +++
>  hw/arm/stm32vldiscovery.c   |  3 +-
>  include/hw/arm/stm32f100_soc.h  | 18 -
>  8 files changed, 189 insertions(+), 16 deletions(-)
>  create mode 100644 hw/arm/stm32f1_generic.c
>
> diff --git a/configs/devices/arm-softmmu/default.mak 
> b/configs/devices/arm-softmmu/default.mak
> index 980c48a7d9..4f0f2e99c0 100644
> --- a/configs/devices/arm-softmmu/default.mak
> +++ b/configs/devices/arm-softmmu/default.mak
> @@ -19,6 +19,7 @@ CONFIG_ARM_VIRT=y
>  # CONFIG_NSERIES=n
>  # CONFIG_STELLARIS=n
>  # CONFIG_STM32VLDISCOVERY=n
> +# CONFIG_STM32F1_GENERIC=n
>  # CONFIG_REALVIEW=n
>  # CONFIG_VERSATILE=n
>  # CONFIG_VEXPRESS=n
> diff --git a/docs/system/arm/stm32.rst b/docs/system/arm/stm32.rst
> index d7265b763d..d0a3b1a7eb 100644
> --- a/docs/system/arm/stm32.rst
> +++ b/docs/system/arm/stm32.rst
> @@ -10,6 +10,12 @@ The STM32F1 series is based on ARM Cortex-M3 core. The 
> following machines are
>  based on this chip :
>
>  - ``stm32vldiscovery``  STM32VLDISCOVERY board with STM32F100RBT6 
> microcontroller
> +- ``stm32f1-generic``   Generic STM32F1 board supporting low, medium and high
> +density devices. Low-density emulates a 32KB FLASH;
> +medium-density emulates a 128KB FLASH; high-density
> +emulates a 512KB FLASH. The density also affects the
> +number of peripherals exposed by QEMU for the 
> emulated
> +device. See ``Boot options`` below for more details.
>
>  The STM32F2 series is based on ARM Cortex-M3 core. The following machines are
>  based on this chip :
> @@ -65,3 +71,11 @@ firmware. Example:
>  .. code-block:: bash
>
>$ qemu-system-arm -M stm32vldiscovery -kernel firmware.bin
> +
> +Additionally, the ``stm32f1-generic`` board supports the ``density`` option
> +to select the device density line.  The following values are supported:
> +``low``, ``medium``, ``high``. Example:
> +
> +.. code-block:: bash
> +
> +  $ qemu-system-arm -M stm32f1-generic -global stm32f100-soc.density=medium 
> ...
> \ No newline at end of file

You are missing a new line here

> diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig
> index 2159de3ce6..822441945c 100644
> --- a/hw/arm/Kconfig
> +++ b/hw/arm/Kconfig
> @@ -301,6 +301,12 @@ config STM32VLDISCOVERY
>  depends on TCG && ARM
>  select STM32F100_SOC
>
> +config STM32F1_GENERIC
> +bool
> +default y
> +depends on TCG && ARM
> +select STM32F100_SOC
> +
>  config STRONGARM
>  bool
>  select PXA2XX
> diff --git a/hw/arm/meson.build b/hw/arm/meson.build
> index 870ec67376..f88b5fe3c8 100644
> --- a/hw/arm/meson.build
> +++ b/hw/arm/meson.build
> @@ -23,6 +23,7 @@ arm_ss.add(when: 'CONFIG_REALVIEW', if_true: 
> files('realview.c'))
>  arm_ss.add(when: 'CONFIG_SBSA_REF', if_true: files('sbsa-ref.c'))
>  arm_ss.add(when: 'CONFIG_STELLARIS', if_true: files('stellaris.c'))
>  arm_ss.add(when: 'CONFIG_STM32VLDISCOVERY', if_true: 
> files('stm32vldiscovery.c'))
> +arm_ss.add(when: 'CONFIG_STM32F1_GENERIC', if_true: 
> files('stm32f1_generic.c'))
>  arm_ss.add(when: 'CONFIG_COLLIE', if_true: files('collie.c'))
>  arm_ss.add(when: 'CONFIG_VERSATILE', if_true: files('versatilepb.c'))
>  arm_ss.add(when: 'CONFIG_VEXPRESS', if_true: files('vexpress.c'))
> diff --git a/hw/arm/stm32f100_soc.c b/hw/arm/stm32f100_soc.c
> index f7b344ba9f..c157ffd644 100644
> --- a/hw/arm/stm32f100_soc.c
> +++ b/hw/arm/stm32f100_soc.c
> @@ -38,10 +38,11 @@
>
>  static const uint32_t usart_addr[STM_NUM_USARTS] = { 0x40013800, 0x40004400,
>  0x40004800 };
> -static const uint32_t spi_addr[STM_NUM_SPIS] = { 0x40013000, 0x40003800 };
> +static const uint32_t spi_addr[STM_NUM_SPIS] = { 0x40013000, 0x40003800,
> +0x40003C00 };
>
>  static const int usart_irq[STM_NUM_USARTS] = {37, 38, 39};
> 

Re: [PATCH v4 0/2] vhost: register and change IOMMU flag depending on ATS state

2023-06-22 Thread Michael S. Tsirkin
On Thu, May 25, 2023 at 03:57:40PM +0300, Viktor Prutyanov wrote:
> When IOMMU and vhost are enabled together, QEMU tracks IOTLB or
> Device-TLB unmap events depending on whether Device-TLB is enabled. But
> even if Device-TLB and PCI ATS are enabled, the guest can decline to use
> them. For example, this situation appears when Windows Server 2022 is
> running with intel-iommu with device-iotlb=on and virtio-net-pci with
> vhost=on. The guest assumes that no address translation info is cached in
> the device IOTLB and doesn't send device IOTLB invalidation commands. So
> it ends up with stale address translations in vhost-net in the host
> kernel. As a result, network frames from the guest on the host tap
> interface contain wrong payload data.
> 
> This series adds checking of ATS state for proper unmap flag register
> (IOMMU_NOTIFIER_UNMAP or IOMMU_NOTIFIER_DEVIOTLB_UNMAP).
> 
> Tested on Windows Server 2022, Windows 11 and Fedora guests with
>  -device virtio-net-pci,bus=pci.3,netdev=nd0,iommu_platform=on,ats=on
>  -netdev tap,id=nd0,ifname=tap1,script=no,downscript=no,vhost=on
>  -device intel-iommu,intremap=on,eim=on,device-iotlb=on/off
> Tested on Fedora guest with
>  -device virtio-iommu

Fails build on windows hosts:

https://gitlab.com/mstredhat/qemu/-/pipelines/909063579/failures

C:\GitLab-Runner\builds\mstredhat\qemu\output/../hw/net/virtio-net.c:3954: 
undefined reference to `vhost_toggle_device_iotlb'




> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2001312
> 
> v4: call vhost_toggle_device_iotlb regardless of vhost backend,
> move vhost_started check to generic part
> v3: call virtio_pci_ats_ctrl_trigger directly, remove
> IOMMU_NOTIFIER_UNMAP fallbacks
> v2: remove memory_region_iommu_notify_flags_changed, move trigger to
> VirtioDeviceClass, use vhost_ops, use device_iotlb name
> 
> Viktor Prutyanov (2):
>   vhost: register and change IOMMU flag depending on Device-TLB state
>   virtio-net: pass Device-TLB enable/disable events to vhost
> 
>  hw/net/virtio-net.c   |  1 +
>  hw/virtio/vhost.c | 38 ++
>  include/hw/virtio/vhost.h |  1 +
>  3 files changed, 28 insertions(+), 12 deletions(-)
> 
> -- 
> 2.21.0




Re: [PATCH v4 00/15] vfio: VFIO migration support with vIOMMU

2023-06-22 Thread Joao Martins
On 22/06/2023 22:48, Joao Martins wrote:
> Hey,
> 
> This series introduces support for vIOMMU with VFIO device migration,
> particularly related to how we do the dirty page tracking.
> 
> Today vIOMMUs serve two purposes: 1) enable interrupt remapping 2)
> provide DMA translation services for guests, to provide some form of
> guest-kernel-managed DMA e.g. for nested-virt-based usage; (1) is especially
> required for big VMs with VFs with more than 255 vcpus. We tackle both
> and remove the migration blocker when vIOMMU is present provided the
> conditions are met. I have both use-cases here in one series, but I am happy
> to tackle them in separate series.
> 
> As I found out we don't necessarily need to expose the whole vIOMMU
> functionality in order to just support interrupt remapping. x86 IOMMUs
> on Windows Server 2018[2] and Linux >=5.10, with qemu 7.1+ (or really
> Linux guests with commit c40c10 and since qemu commit 8646d9c773d8)
> can instantiate a IOMMU just for interrupt remapping without needing to
> be advertised/support DMA translation. AMD IOMMU in theory can provide
> the same, but Linux doesn't quite support the IR-only part there yet,
> only intel-iommu.
> 
> The series is organized as following:
> 
> Patches 1-5: Today we can't gather vIOMMU details before the guest
> establishes their first DMA mapping via the vIOMMU. So these first four
> patches add a way for vIOMMUs to be asked of their properties at start
> of day. I chose the least-churn way for now (as opposed to a
> treewide conversion), allowing easy conversion a posteriori. As
> suggested by Peter Xu[7], I have resurrected Yi's patches[5][6] which
> allows us to fetch PCI backing vIOMMU attributes, without necessarily
> tying the caller (VFIO or anyone else) to an IOMMU MR like I
> was doing in v3.
> 
> Patches 6-8: Handle configs with vIOMMU interrupt remapping but without
> DMA translation allowed. Today the 'dma-translation' attribute is
> x86-iommu only, but the way this series is structured nothing stops from
> other vIOMMUs supporting it too as long as they use
> pci_setup_iommu_ops() and the necessary IOMMU MR get_attr attributes
> are handled. The blocker is thus relaxed when vIOMMUs are able to
> toggle/report the DMA_TRANSLATION attribute. With the patches up to this set,
> we've then tackled item (1) of the second paragraph.
> 
> Patches 9-15: Simplified a lot from v2 (patch 9) to only track the complete
> IOVA address space, leveraging the logic we use to compose the dirty ranges.
> The blocker is once again relaxed for vIOMMUs that advertise their IOVA
> addressing limits. This tackles item (2). So far I mainly use it with
> intel-iommu, although I have a small set of patches for virtio-iommu per
> Alex's suggestion in v2.
> 
> Comments, suggestions welcome. Thanks for the review!
> 

By mistake, I've spuriously sent this a little too early. There are some styling
errors in patch 1, 6 and 10. I've fixed the problems already, but I won't respin
the series as I don't wanna patch bomb folks again. I will give at least a week
or 2 before I do that. My apologies :/

Meanwhile, here's the diff of those fixes:

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 989993e303a6..7fad59126215 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -3880,7 +3880,7 @@ static int vtd_iommu_get_attr(IOMMUMemoryRegion *iommu_mr,
 {
 hwaddr *max_iova = (hwaddr *)(uintptr_t) data;

-*max_iova = MAKE_64BIT_MASK(0, s->aw_bits);;
+*max_iova = MAKE_64BIT_MASK(0, s->aw_bits);
 break;
 }
 default:
@@ -4071,8 +4071,9 @@ static int vtd_get_iommu_attr(PCIBus *bus, void *opaque,
int32_t devfn,
 assert(0 <= devfn && devfn < PCI_DEVFN_MAX);

 vtd_as = vtd_find_add_as(s, bus, devfn, PCI_NO_PASID);
-if (!vtd_as)
-   return -EINVAL;
+if (!vtd_as) {
+return -EINVAL;
+}

 return memory_region_iommu_get_attr(&vtd_as->iommu, attr, data);
 }
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 91ba6f0927a4..0cf000a9c1ff 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -2700,10 +2700,10 @@ AddressSpace *pci_device_iommu_address_space(PCIDevice 
*dev)
 pci_device_get_iommu_bus_devfn(dev, &bus, &iommu_bus, &devfn);
 if (!pci_bus_bypass_iommu(bus) && iommu_bus) {
 if (iommu_bus->iommu_fn) {
-   return iommu_bus->iommu_fn(bus, iommu_bus->iommu_opaque, devfn);
+return iommu_bus->iommu_fn(bus, iommu_bus->iommu_opaque, devfn);
 } else if (iommu_bus->iommu_ops &&
iommu_bus->iommu_ops->get_address_space) {
-   return iommu_bus->iommu_ops->get_address_space(bus,
+return iommu_bus->iommu_ops->get_address_space(bus,
iommu_bus->iommu_opaque, devfn);
 }
 }



[PATCH] net: add initial support for AF_XDP network backend

2023-06-22 Thread Ilya Maximets
AF_XDP is a network socket family that allows communication directly
with the network device driver in the kernel, bypassing most or all
of the kernel networking stack.  In essence, the technology is
pretty similar to netmap.  But, unlike netmap, AF_XDP is Linux-native
and works with any network interfaces without driver modifications.
Unlike vhost-based backends (kernel, user, vdpa), AF_XDP doesn't
require access to character devices or unix sockets.  Only access to
the network interface itself is necessary.

This patch implements a network backend that communicates with the
kernel by creating an AF_XDP socket.  A chunk of userspace memory
is shared between QEMU and the host kernel.  4 ring buffers (Tx, Rx,
Fill and Completion) are placed in that memory along with a pool of
memory buffers for the packet data.  Data transmission is done by
allocating one of the buffers, copying packet data into it and
placing the pointer into Tx ring.  After transmission, device will
return the buffer via Completion ring.  On Rx, device will take
a buffer from a pre-populated Fill ring, write the packet data into
it and place the buffer into Rx ring.

AF_XDP network backend takes on the communication with the host
kernel and the network interface and forwards packets to/from the
peer device in QEMU.

Usage example:

  -device virtio-net-pci,netdev=guest1,mac=00:16:35:AF:AA:5C
  -netdev af-xdp,ifname=ens6f1np1,id=guest1,mode=native,queues=1

XDP program bridges the socket with a network interface.  It can be
attached to the interface in 2 different modes:

1. skb - this mode should work for any interface and doesn't require
 driver support.  With a caveat of lower performance.

2. native - this does require support from the driver and allows
    bypassing skb allocation in the kernel and potentially using
    zero-copy while getting packets in/out of userspace.

By default, QEMU will try to use native mode and fall back to skb.
Mode can be forced via 'mode' option.  To force 'copy' even in native
mode, use 'force-copy=on' option.  This might be useful if there is
some issue with the driver.

Option 'queues=N' allows specifying how many device queues should
be open.  Note that all the queues that are not open are still
functional and can receive traffic, but it will not be delivered to
QEMU.  So, the number of device queues should generally match the
QEMU configuration, unless the device is shared with something
else and the traffic re-direction to appropriate queues is correctly
configured on a device level (e.g. with ethtool -N).
'start-queue=M' option can be used to specify from which queue id
QEMU should start configuring 'N' queues.  It might also be necessary
to use this option with certain NICs, e.g. MLX5 NICs.  See the docs
for examples.

In a general case QEMU will need CAP_NET_ADMIN and CAP_SYS_ADMIN
capabilities in order to load default XSK/XDP programs to the
network interface and configure BTF maps.  It is possible, however,
to run only with CAP_NET_RAW.  For that to work, an external process
with admin capabilities will need to pre-load default XSK program
and pass an open file descriptor for this program's 'xsks_map' to
QEMU process on startup.  Network backend will need to be configured
with 'inhibit=on' to avoid loading of the programs.  The file
descriptor for 'xsks_map' can be passed via 'xsks-map-fd=N' option.

There are a few performance challenges with the current network backends.

First is that they do not support IO threads.  This means that data
path is handled by the main thread in QEMU and may slow down other
work or may be slowed down by some other work.  This also means that
taking advantage of multi-queue is generally not possible today.

Another thing is that data path is going through the device emulation
code, which is not really optimized for performance.  The fastest
"frontend" device is virtio-net.  But it's not optimized for heavy
traffic either, because it expects such use-cases to be handled via
some implementation of vhost (user, kernel, vdpa).  In practice, we
have virtio notifications and rcu lock/unlock on a per-packet basis
and not very efficient accesses to the guest memory.  Communication
channels between backend and frontend devices do not allow passing
more than one packet at a time as well.

Some of these challenges can be avoided in the future by adding better
batching into device emulation or by implementing vhost-af-xdp variant.

There are also a few kernel limitations.  AF_XDP sockets do not
support any kinds of checksum or segmentation offloading.  Buffers
are limited to a page size (4K), i.e. MTU is limited.  Multi-buffer
support is not implemented for AF_XDP today.  Also, transmission in
all non-zero-copy modes is synchronous, i.e. done in a syscall.
That doesn't allow high packet rates on virtual interfaces.

However, keeping in mind all of these challenges, current implementation
of the AF_XDP backend shows a decent performance while running on t

Re: [PATCH RFC 0/6] Switch iotests to pyvenv

2023-06-22 Thread Paolo Bonzini
Il gio 22 giu 2023, 23:18 John Snow  ha scritto:

> Possibly I could teach mkvenv a new trick, like "mkvenv init iotests"
> and have the mkvenv script DTRT at that point, whatever that is --
> ideally exiting very quickly without doing anything.
>

Or maybe check itself should do the bootstrap if it's invoked from the
venv?!?

Paolo

>


[PATCH v4 12/15] vfio/common: Support device dirty page tracking with vIOMMU

2023-06-22 Thread Joao Martins
Currently, device dirty page tracking with vIOMMU is not supported,
and a blocker is added and the migration is prevented.

When vIOMMU is used, IOVA ranges are DMA mapped/unmapped on the fly as
requested by the vIOMMU. These IOVA ranges can potentially be mapped
anywhere in the vIOMMU IOVA space as advertised by the VMM.

To support device dirty tracking when vIOMMU is enabled, instead create
the dirty ranges based on the vIOMMU-provided limits, which leads to
tracking the whole IOVA space regardless of what the devices use.

Signed-off-by: Avihai Horon 
Signed-off-by: Joao Martins 
---
 include/hw/vfio/vfio-common.h |  1 +
 hw/vfio/common.c  | 58 +--
 hw/vfio/pci.c |  7 +
 3 files changed, 56 insertions(+), 10 deletions(-)

diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index f41860988d6b..c4bafad084b4 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -71,6 +71,7 @@ typedef struct VFIOMigration {
 typedef struct VFIOAddressSpace {
 AddressSpace *as;
 bool no_dma_translation;
+hwaddr max_iova;
 QLIST_HEAD(, VFIOContainer) containers;
 QLIST_ENTRY(VFIOAddressSpace) list;
 } VFIOAddressSpace;
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index ecfb9afb3fb6..85fddef24026 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -428,6 +428,25 @@ static bool vfio_viommu_preset(void)
 return false;
 }
 
+static int vfio_viommu_get_max_iova(hwaddr *max_iova)
+{
+VFIOAddressSpace *space;
+
+*max_iova = 0;
+
+QLIST_FOREACH(space, &vfio_address_spaces, list) {
+if (space->as == &address_space_memory) {
+continue;
+}
+
+if (*max_iova < space->max_iova) {
+*max_iova = space->max_iova;
+}
+}
+
+return *max_iova == 0;
+}
+
 int vfio_block_giommu_migration(Error **errp)
 {
 int ret;
@@ -1464,10 +1483,11 @@ static const MemoryListener 
vfio_dirty_tracking_listener = {
 .region_add = vfio_listener_dirty_tracking_update,
 };
 
-static void vfio_dirty_tracking_init(VFIOContainer *container,
+static int vfio_dirty_tracking_init(VFIOContainer *container,
  VFIODirtyRanges *ranges)
 {
 VFIODirtyRangesListener dirty;
+int ret;
 
 memset(&dirty, 0, sizeof(dirty));
 dirty.ranges.min32 = UINT32_MAX;
@@ -1475,17 +1495,29 @@ static void vfio_dirty_tracking_init(VFIOContainer 
*container,
 dirty.listener = vfio_dirty_tracking_listener;
 dirty.container = container;
 
-memory_listener_register(&dirty.listener,
- container->space->as);
+if (vfio_viommu_preset()) {
+hwaddr iommu_max_iova;
+
+ret = vfio_viommu_get_max_iova(&iommu_max_iova);
+if (ret) {
+return -EINVAL;
+}
+
+vfio_dirty_tracking_update(0, iommu_max_iova, &dirty.ranges);
+} else {
+memory_listener_register(&dirty.listener,
+ container->space->as);
+/*
+ * The memory listener is synchronous, and used to calculate the range
+ * to dirty tracking. Unregister it after we are done as we are not
+ * interested in any follow-up updates.
+ */
+memory_listener_unregister(&dirty.listener);
+}
 
 *ranges = dirty.ranges;
 
-/*
- * The memory listener is synchronous, and used to calculate the range
- * to dirty tracking. Unregister it after we are done as we are not
- * interested in any follow-up updates.
- */
-memory_listener_unregister(&dirty.listener);
+return 0;
 }
 
 static void vfio_devices_dma_logging_stop(VFIOContainer *container)
@@ -1590,7 +1622,13 @@ static int vfio_devices_dma_logging_start(VFIOContainer 
*container)
 VFIOGroup *group;
 int ret = 0;
 
-vfio_dirty_tracking_init(container, &ranges);
+ret = vfio_dirty_tracking_init(container, &ranges);
+if (ret) {
+error_report("Failed to init DMA logging ranges, err %d",
+  ret);
+return -EOPNOTSUPP;
+}
+
 feature = vfio_device_feature_dma_logging_start_create(container,
&ranges);
 if (!feature) {
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 8a98e6ffc480..3bda5618c5b5 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2974,6 +2974,13 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
   &dma_translation);
 space->no_dma_translation = !dma_translation;
 
+/*
+ * Support for advertised IOMMU address space boundaries is optional.
+ * By default, it is not advertised i.e. space::max_iova is 0.
+ */
+pci_device_iommu_get_attr(pdev, IOMMU_ATTR_MAX_IOVA,
+  &space->max_iova);
+
 QLIST_FOREACH(vbasedev_iter, &group->device_list, next) {
 if (strcmp(vbasedev_iter->name, vbasedev->name) == 0) {

[PATCH v4 10/15] intel-iommu: Implement IOMMU_ATTR_MAX_IOVA get_attr() attribute

2023-06-22 Thread Joao Martins
From: Avihai Horon 

Implement get_attr() method and use the address width property to report
the IOMMU_ATTR_MAX_IOVA attribute.

Signed-off-by: Avihai Horon 
Signed-off-by: Joao Martins 
---
 hw/i386/intel_iommu.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index ed2a46e008df..989993e303a6 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -3876,6 +3876,13 @@ static int vtd_iommu_get_attr(IOMMUMemoryRegion 
*iommu_mr,
 *enabled = s->dma_translation;
 break;
 }
+case IOMMU_ATTR_MAX_IOVA:
+{
+hwaddr *max_iova = (hwaddr *)(uintptr_t) data;
+
+*max_iova = MAKE_64BIT_MASK(0, s->aw_bits);;
+break;
+}
 default:
 ret = -EINVAL;
 break;
-- 
2.17.2




[PATCH v4 11/15] vfio/common: Move dirty tracking ranges update to helper

2023-06-22 Thread Joao Martins
Separate the code that updates the ranges from the listener, to
make it reusable in preparation for expanding its use to vIOMMU support.

Signed-off-by: Joao Martins 
---
 hw/vfio/common.c | 36 +---
 1 file changed, 21 insertions(+), 15 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 17c1d882e221..ecfb9afb3fb6 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -1413,20 +1413,10 @@ typedef struct VFIODirtyRangesListener {
 MemoryListener listener;
 } VFIODirtyRangesListener;
 
-static void vfio_dirty_tracking_update(MemoryListener *listener,
-   MemoryRegionSection *section)
+static void vfio_dirty_tracking_update(hwaddr iova, hwaddr end,
+   VFIODirtyRanges *range)
 {
-VFIODirtyRangesListener *dirty = container_of(listener,
-  VFIODirtyRangesListener,
-  listener);
-VFIODirtyRanges *range = &dirty->ranges;
-hwaddr iova, end, *min, *max;
-
-if (!vfio_listener_valid_section(section, "tracking_update") ||
-!vfio_get_section_iova_range(dirty->container, section,
- &iova, &end, NULL)) {
-return;
-}
+hwaddr *min, *max;
 
 /*
  * The address space passed to the dirty tracker is reduced to two ranges:
@@ -1450,12 +1440,28 @@ static void vfio_dirty_tracking_update(MemoryListener 
*listener,
 }
 
 trace_vfio_device_dirty_tracking_update(iova, end, *min, *max);
-return;
+}
+
+static void vfio_listener_dirty_tracking_update(MemoryListener *listener,
+MemoryRegionSection *section)
+{
+VFIODirtyRangesListener *dirty = container_of(listener,
+  VFIODirtyRangesListener,
+  listener);
+hwaddr iova, end;
+
+if (!vfio_listener_valid_section(section, "tracking_update") ||
+!vfio_get_section_iova_range(dirty->container, section,
+ &iova, &end, NULL)) {
+return;
+}
+
+vfio_dirty_tracking_update(iova, end, &dirty->ranges);
 }
 
 static const MemoryListener vfio_dirty_tracking_listener = {
 .name = "vfio-tracking",
-.region_add = vfio_dirty_tracking_update,
+.region_add = vfio_listener_dirty_tracking_update,
 };
 
 static void vfio_dirty_tracking_init(VFIOContainer *container,
-- 
2.17.2




[PATCH v4 14/15] vfio/common: Optimize device dirty page tracking with vIOMMU

2023-06-22 Thread Joao Martins
From: Avihai Horon 

When vIOMMU is enabled, syncing dirty page bitmaps is done by replaying
the vIOMMU mappings and querying the dirty bitmap for each mapping.

With device dirty tracking this causes a lot of overhead, since the HW
is queried many times (even with a small idle guest this can end up with
thousands of calls to HW).

Optimize this by de-coupling dirty bitmap query from vIOMMU replay.
Now a single dirty bitmap is queried per vIOMMU MR section, which is
then used for all corresponding vIOMMU mappings within that MR section.

Signed-off-by: Avihai Horon 
Signed-off-by: Joao Martins 
---
 hw/vfio/common.c | 74 ++--
 1 file changed, 72 insertions(+), 2 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index c530e9d87f21..62f91e8e102d 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -1832,8 +1832,36 @@ out:
 typedef struct {
 IOMMUNotifier n;
 VFIOGuestIOMMU *giommu;
+VFIOBitmap vbmap;
 } vfio_giommu_dirty_notifier;
 
+static int vfio_iommu_set_dirty_bitmap(VFIOContainer *container,
+   vfio_giommu_dirty_notifier *gdn,
+   hwaddr iova, hwaddr size,
+   ram_addr_t ram_addr)
+{
+VFIOBitmap *vbmap = &gdn->vbmap;
+VFIOBitmap dst_vbmap;
+hwaddr start_iova = REAL_HOST_PAGE_ALIGN(gdn->n.start);
+hwaddr copy_offset;
+int ret;
+
+ret = vfio_bitmap_alloc(&dst_vbmap, size);
+if (ret) {
+return -ENOMEM;
+}
+
+copy_offset = (iova - start_iova) / qemu_real_host_page_size();
+bitmap_copy_with_src_offset(dst_vbmap.bitmap, vbmap->bitmap, copy_offset,
+dst_vbmap.pages);
+
+cpu_physical_memory_set_dirty_lebitmap(dst_vbmap.bitmap, ram_addr,
+   dst_vbmap.pages);
+g_free(dst_vbmap.bitmap);
+
+return 0;
+}
+
 static void vfio_iommu_map_dirty_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
 {
 vfio_giommu_dirty_notifier *gdn = container_of(n,
@@ -1854,8 +1882,15 @@ static void vfio_iommu_map_dirty_notify(IOMMUNotifier 
*n, IOMMUTLBEntry *iotlb)
 
 rcu_read_lock();
 if (vfio_get_xlat_addr(iotlb, NULL, &translated_addr, NULL)) {
-ret = vfio_get_dirty_bitmap(container, iova, iotlb->addr_mask + 1,
-translated_addr);
+if (gdn->vbmap.bitmap) {
+ret = vfio_iommu_set_dirty_bitmap(container, gdn, iova,
+  iotlb->addr_mask + 1,
+  translated_addr);
+} else {
+ret = vfio_get_dirty_bitmap(container, iova, iotlb->addr_mask + 1,
+translated_addr);
+}
+
 if (ret) {
 error_report("vfio_iommu_map_dirty_notify(%p, 0x%"HWADDR_PRIx", "
  "0x%"HWADDR_PRIx") = %d (%s)",
@@ -1936,6 +1971,7 @@ static int vfio_sync_iommu_dirty_bitmap(VFIOContainer 
*container,
 }
 
 gdn.giommu = giommu;
+gdn.vbmap.bitmap = NULL;
 idx = memory_region_iommu_attrs_to_index(giommu->iommu_mr,
  MEMTXATTRS_UNSPECIFIED);
 
@@ -1943,10 +1979,44 @@ static int vfio_sync_iommu_dirty_bitmap(VFIOContainer 
*container,
section->size);
 llend = int128_sub(llend, int128_one());
 
+/*
+ * Optimize device dirty tracking if the MR section is at least partially
+ * tracked. Optimization is done by querying a single dirty bitmap for the
+ * entire range instead of querying dirty bitmap for each vIOMMU mapping.
+ */
+if (vfio_devices_all_device_dirty_tracking(container)) {
+hwaddr start = REAL_HOST_PAGE_ALIGN(section->offset_within_region);
+hwaddr end = int128_get64(llend);
+hwaddr iommu_max_iova;
+hwaddr size;
+int ret;
+
+ret = vfio_viommu_get_max_iova(&iommu_max_iova);
+if (ret) {
+return -EINVAL;
+}
+
+size = REAL_HOST_PAGE_ALIGN(MIN(iommu_max_iova, end) - start);
+
+ret = vfio_bitmap_alloc(&gdn.vbmap, size);
+if (ret) {
+return -ENOMEM;
+}
+
+ret = vfio_devices_query_dirty_bitmap(container, &gdn.vbmap,
+  start, size);
+if (ret) {
+g_free(gdn.vbmap.bitmap);
+
+return ret;
+}
+}
+
 iommu_notifier_init(&gdn.n, vfio_iommu_map_dirty_notify, 
IOMMU_NOTIFIER_MAP,
 section->offset_within_region, int128_get64(llend),
 idx);
 memory_region_iommu_replay(giommu->iommu_mr, &gdn.n);
+g_free(gdn.vbmap.bitmap);
 
 return 0;
 }
-- 
2.17.2




[PATCH v4 13/15] vfio/common: Extract vIOMMU code from vfio_sync_dirty_bitmap()

2023-06-22 Thread Joao Martins
From: Avihai Horon 

Extract vIOMMU code from vfio_sync_dirty_bitmap() to a new function and
restructure the code.

This is done in preparation for optimizing vIOMMU device dirty page
tracking. No functional changes intended.

Signed-off-by: Avihai Horon 
Signed-off-by: Joao Martins 
---
 hw/vfio/common.c | 63 +---
 1 file changed, 38 insertions(+), 25 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 85fddef24026..c530e9d87f21 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -1914,37 +1914,50 @@ static int 
vfio_sync_ram_discard_listener_dirty_bitmap(VFIOContainer *container,
 &vrdl);
 }
 
+static int vfio_sync_iommu_dirty_bitmap(VFIOContainer *container,
+MemoryRegionSection *section)
+{
+VFIOGuestIOMMU *giommu;
+bool found = false;
+Int128 llend;
+vfio_giommu_dirty_notifier gdn;
+int idx;
+
+QLIST_FOREACH(giommu, &container->giommu_list, giommu_next) {
+if (MEMORY_REGION(giommu->iommu_mr) == section->mr &&
+giommu->n.start == section->offset_within_region) {
+found = true;
+break;
+}
+}
+
+if (!found) {
+return 0;
+}
+
+gdn.giommu = giommu;
+idx = memory_region_iommu_attrs_to_index(giommu->iommu_mr,
+ MEMTXATTRS_UNSPECIFIED);
+
+llend = int128_add(int128_make64(section->offset_within_region),
+   section->size);
+llend = int128_sub(llend, int128_one());
+
+iommu_notifier_init(&gdn.n, vfio_iommu_map_dirty_notify, 
IOMMU_NOTIFIER_MAP,
+section->offset_within_region, int128_get64(llend),
+idx);
+memory_region_iommu_replay(giommu->iommu_mr, &gdn.n);
+
+return 0;
+}
+
 static int vfio_sync_dirty_bitmap(VFIOContainer *container,
   MemoryRegionSection *section)
 {
 ram_addr_t ram_addr;
 
 if (memory_region_is_iommu(section->mr)) {
-VFIOGuestIOMMU *giommu;
-
-QLIST_FOREACH(giommu, &container->giommu_list, giommu_next) {
-if (MEMORY_REGION(giommu->iommu_mr) == section->mr &&
-giommu->n.start == section->offset_within_region) {
-Int128 llend;
-vfio_giommu_dirty_notifier gdn = { .giommu = giommu };
-int idx = memory_region_iommu_attrs_to_index(giommu->iommu_mr,
-   MEMTXATTRS_UNSPECIFIED);
-
-llend = 
int128_add(int128_make64(section->offset_within_region),
-   section->size);
-llend = int128_sub(llend, int128_one());
-
-iommu_notifier_init(&gdn.n,
-vfio_iommu_map_dirty_notify,
-IOMMU_NOTIFIER_MAP,
-section->offset_within_region,
-int128_get64(llend),
-idx);
-memory_region_iommu_replay(giommu->iommu_mr, &gdn.n);
-break;
-}
-}
-return 0;
+return vfio_sync_iommu_dirty_bitmap(container, section);
 } else if (memory_region_has_ram_discard_manager(section->mr)) {
 return vfio_sync_ram_discard_listener_dirty_bitmap(container, section);
 }
-- 
2.17.2




[PATCH v4 02/15] hw/pci: Refactor pci_device_iommu_address_space()

2023-06-22 Thread Joao Martins
From: Yi Liu 

Refactor pci_device_iommu_address_space() and move the
code that fetches the device bus and iommu bus into its
own private helper pci_device_get_iommu_bus_devfn().

This is in preparation to introduce pci_device_iommu_get_attr()
which will need to use it too.

Signed-off-by: Yi Liu 
[joao: Commit message, and better splitting]
Signed-off-by: Joao Martins 
---
Splitted from v1:
https://lore.kernel.org/all/20210302203827.437645-6-yi.l@intel.com/
---
 hw/pci/pci.c | 16 ++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 4e32c09e81d6..90ae92a43d85 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -2632,8 +2632,8 @@ static void pci_device_class_base_init(ObjectClass 
*klass, void *data)
 assert(conventional || pcie || cxl);
 }
 }
-
-AddressSpace *pci_device_iommu_address_space(PCIDevice *dev)
+static void pci_device_get_iommu_bus_devfn(PCIDevice *dev, PCIBus **pdevbus,
+   PCIBus **pbus, uint8_t *pdevfn)
 {
 PCIBus *bus = pci_get_bus(dev);
 PCIBus *iommu_bus = bus;
@@ -2686,6 +2686,18 @@ AddressSpace *pci_device_iommu_address_space(PCIDevice 
*dev)
 
 iommu_bus = parent_bus;
 }
+
+*pdevbus = bus;
+*pbus = iommu_bus;
+*pdevfn = devfn;
+}
+
+AddressSpace *pci_device_iommu_address_space(PCIDevice *dev)
+{
+PCIBus *bus, *iommu_bus;
+uint8_t devfn;
+
+pci_device_get_iommu_bus_devfn(dev, &bus, &iommu_bus, &devfn);
 if (!pci_bus_bypass_iommu(bus) && iommu_bus) {
 if (iommu_bus->iommu_fn) {
return iommu_bus->iommu_fn(bus, iommu_bus->iommu_opaque, devfn);
-- 
2.17.2




[PATCH v4 09/15] memory/iommu: Add IOMMU_ATTR_MAX_IOVA attribute

2023-06-22 Thread Joao Martins
From: Avihai Horon 

Add a new IOMMU attribute IOMMU_ATTR_MAX_IOVA which indicates the
maximal IOVA that an IOMMU can use.

This attribute will be used by VFIO device dirty page tracking so it can
track the entire IOVA space when needed (i.e. when vIOMMU is enabled).

Signed-off-by: Avihai Horon 
Signed-off-by: Joao Martins 
Acked-by: Peter Xu 
---
 include/exec/memory.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 5d6c2ab1f397..742bff82dc77 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -321,6 +321,7 @@ typedef struct MemoryRegionClass {
 enum IOMMUMemoryRegionAttr {
 IOMMU_ATTR_SPAPR_TCE_FD,
 IOMMU_ATTR_DMA_TRANSLATION,
+IOMMU_ATTR_MAX_IOVA,
 };
 
 /*
-- 
2.17.2




[PATCH v4 05/15] memory/iommu: Add IOMMU_ATTR_DMA_TRANSLATION attribute

2023-06-22 Thread Joao Martins
Add a new IOMMU attribute IOMMU_ATTR_DMA_TRANSLATION which indicates
whether the IOMMU supports DMA Translation.

This attribute will be used by VFIO device dirty page tracking so it can
restrict the IOVA under tracking to the memory map when the vIOMMU is
enabled only for interrupt remapping, without DMA translation enabled.

Signed-off-by: Joao Martins 
---
 include/exec/memory.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 47c2e0221c35..5d6c2ab1f397 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -319,7 +319,8 @@ typedef struct MemoryRegionClass {
 
 
 enum IOMMUMemoryRegionAttr {
-IOMMU_ATTR_SPAPR_TCE_FD
+IOMMU_ATTR_SPAPR_TCE_FD,
+IOMMU_ATTR_DMA_TRANSLATION,
 };
 
 /*
-- 
2.17.2




[PATCH v4 03/15] hw/pci: Introduce pci_device_iommu_get_attr()

2023-06-22 Thread Joao Martins
From: Yi Liu 

Introduce pci_device_iommu_get_attr() to get vIOMMU attributes
from the PCI device.

This is in preparation for asking whether the vIOMMU has DMA translation
enabled and also for getting the IOVA boundaries.

Signed-off-by: Yi Liu 
[joao: Massage commit message; add one more argument in
 pci_device_get_iommu_bus_devfn(); rename to pci_device_iommu_get_attr()
 to align with the other already namespaced function. ]
Signed-off-by: Joao Martins 
---
follow-up version from:
https://lore.kernel.org/all/20210302203827.437645-6-yi.l@intel.com/
---
 include/hw/pci/pci.h |  4 
 hw/pci/pci.c | 16 
 2 files changed, 20 insertions(+)

diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index f59aef5a329a..10c81287b6b3 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -372,8 +372,12 @@ typedef struct PCIIOMMUOps PCIIOMMUOps;
 struct PCIIOMMUOps {
 AddressSpace * (*get_address_space)(PCIBus *bus,
 void *opaque, int32_t devfn);
+int (*get_iommu_attr)(PCIBus *bus, void *opaque, int32_t devfn,
+  enum IOMMUMemoryRegionAttr attr, void *data);
 };
 void pci_setup_iommu_ops(PCIBus *bus, const PCIIOMMUOps *iommu_ops, void 
*opaque);
+int pci_device_iommu_get_attr(PCIDevice *dev, enum IOMMUMemoryRegionAttr attr,
+  void *data);
 
 pcibus_t pci_bar_address(PCIDevice *d,
  int reg, uint8_t type, pcibus_t size);
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 90ae92a43d85..91ba6f0927a4 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -2710,6 +2710,22 @@ AddressSpace *pci_device_iommu_address_space(PCIDevice 
*dev)
 return &address_space_memory;
 }
 
+int pci_device_iommu_get_attr(PCIDevice *dev, enum IOMMUMemoryRegionAttr attr,
+  void *data)
+{
+PCIBus *bus, *iommu_bus;
+uint8_t devfn;
+
+pci_device_get_iommu_bus_devfn(dev, &bus, &iommu_bus, &devfn);
+if (!pci_bus_bypass_iommu(bus) && iommu_bus &&
+iommu_bus->iommu_ops && iommu_bus->iommu_ops->get_iommu_attr) {
+return iommu_bus->iommu_ops->get_iommu_attr(bus, 
iommu_bus->iommu_opaque,
+devfn, attr, data);
+}
+
+return -ENOENT;
+}
+
 void pci_setup_iommu(PCIBus *bus, PCIIOMMUFunc fn, void *opaque)
 {
 bus->iommu_fn = fn;
-- 
2.17.2




[PATCH v4 15/15] vfio/common: Block migration with vIOMMUs without address width limits

2023-06-22 Thread Joao Martins
Only block the case when the underlying vIOMMU model does not report any
address space limits, in addition to DMA translation being off or no
vIOMMU being present. The limits are needed such that we can define the
IOVA limits that arm the device dirty tracker.

Additionally, reword the migration blocker error message to clarify that
the configured vIOMMU does not support migration, as opposed to
implying that its mere presence blocks migration.

Signed-off-by: Joao Martins 
---
 hw/vfio/common.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 62f91e8e102d..c3cc0dd47044 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -449,15 +449,18 @@ static int vfio_viommu_get_max_iova(hwaddr *max_iova)
 
 int vfio_block_giommu_migration(Error **errp)
 {
+hwaddr max;
 int ret;
 
 if (giommu_migration_blocker ||
-!vfio_viommu_preset()) {
+!vfio_viommu_preset() ||
+(vfio_viommu_preset() && !vfio_viommu_get_max_iova(&max))) {
 return 0;
 }
 
 error_setg(&giommu_migration_blocker,
-   "Migration is currently not supported with vIOMMU enabled");
+   "Migration with vIOMMU is currently not supported "
+   "without vIOMMU address space boundaries");
 ret = migrate_add_blocker(giommu_migration_blocker, errp);
 if (ret < 0) {
 error_free(giommu_migration_blocker);
-- 
2.17.2




[PATCH v4 08/15] vfio/common: Relax vIOMMU detection when DMA translation is off

2023-06-22 Thread Joao Martins
Relax the vIOMMU migration blocker when the underlying IOMMU reports DMA
translation disabled. When it is disabled there will be no DMA mappings
via the vIOMMU and the guest can only use it for Interrupt Remapping.

The latter is done via the vfio_viommu_preset() return value, where in
addition to validating that the address space is memory, we also check
whether the vIOMMU backing the PCI device has DMA translation on. It
is assumed to be enabled, if the IOMMU model does not support toggling
on/off the dma-translation property.

Intel IOMMU is right now the only case supporting this, although AMD IOMMU
can in theory provide the same functionality.

Signed-off-by: Joao Martins 
---
 hw/vfio/common.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index fa8fd949b1cf..17c1d882e221 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -419,7 +419,8 @@ static bool vfio_viommu_preset(void)
 VFIOAddressSpace *space;
 
 QLIST_FOREACH(space, &vfio_address_spaces, list) {
-if (space->as != &address_space_memory) {
+if ((space->as != &address_space_memory) &&
+!space->no_dma_translation) {
 return true;
 }
 }
-- 
2.17.2




[PATCH v4 06/15] intel-iommu: Implement get_attr() method

2023-06-22 Thread Joao Martins
Implement IOMMU MR get_attr() method and use the dma_translation
property to report the IOMMU_ATTR_DMA_TRANSLATION attribute.
Additionally add the necessary get_iommu_attr into the PCIIOMMUOps to
support pci_device_iommu_get_attr().

The callback in there acts as an IOMMU-specific address space walker
which will call get_attr in the IOMMUMemoryRegion backing the device to
fetch the desired attribute.

Signed-off-by: Avihai Horon 
Signed-off-by: Joao Martins 
---
 hw/i386/intel_iommu.c | 40 
 1 file changed, 40 insertions(+)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 1606d1b952d0..ed2a46e008df 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -3861,6 +3861,29 @@ static void vtd_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n)
 return;
 }
 
+static int vtd_iommu_get_attr(IOMMUMemoryRegion *iommu_mr,
+  enum IOMMUMemoryRegionAttr attr, void *data)
+{
+VTDAddressSpace *vtd_as = container_of(iommu_mr, VTDAddressSpace, iommu);
+IntelIOMMUState *s = vtd_as->iommu_state;
+int ret = 0;
+
+switch (attr) {
+case IOMMU_ATTR_DMA_TRANSLATION:
+{
+bool *enabled = (bool *)(uintptr_t) data;
+
+*enabled = s->dma_translation;
+break;
+}
+default:
+ret = -EINVAL;
+break;
+}
+
+return ret;
+}
+
 /* Do the initialization. It will also be called when reset, so pay
  * attention when adding new initialization stuff.
  */
@@ -4032,8 +4055,24 @@ static AddressSpace *vtd_host_dma_iommu(PCIBus *bus, void *opaque, int devfn)
 return &vtd_as->as;
 }
 
+static int vtd_get_iommu_attr(PCIBus *bus, void *opaque, int32_t devfn,
+  enum IOMMUMemoryRegionAttr attr, void *data)
+{
+IntelIOMMUState *s = opaque;
+VTDAddressSpace *vtd_as;
+
+assert(0 <= devfn && devfn < PCI_DEVFN_MAX);
+
+vtd_as = vtd_find_add_as(s, bus, devfn, PCI_NO_PASID);
+if (!vtd_as)
+   return -EINVAL;
+
+return memory_region_iommu_get_attr(&vtd_as->iommu, attr, data);
+}
+
 static PCIIOMMUOps vtd_iommu_ops = {
 .get_address_space = vtd_host_dma_iommu,
+.get_iommu_attr = vtd_get_iommu_attr,
 };
 
 static bool vtd_decide_config(IntelIOMMUState *s, Error **errp)
@@ -4197,6 +4236,7 @@ static void vtd_iommu_memory_region_class_init(ObjectClass *klass,
 imrc->translate = vtd_iommu_translate;
 imrc->notify_flag_changed = vtd_iommu_notify_flag_changed;
 imrc->replay = vtd_iommu_replay;
+imrc->get_attr = vtd_iommu_get_attr;
 }
 
 static const TypeInfo vtd_iommu_memory_region_info = {
-- 
2.17.2
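The get_attr() shape used in this patch — one switch over the requested attribute, with unknown attributes rejected with -EINVAL — can be reduced to a stand-alone sketch. The types and names below are illustrative mocks under that assumption, not QEMU code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical reduced model of a vtd_iommu_get_attr()-style provider. */
enum Attr { ATTR_DMA_TRANSLATION, ATTR_UNSUPPORTED };

struct IOMMUState {
    bool dma_translation;   /* mirrors the dma-translation property */
};

int iommu_get_attr(const struct IOMMUState *s, enum Attr attr, void *data)
{
    switch (attr) {
    case ATTR_DMA_TRANSLATION:
        /* copy the state out through the caller-supplied pointer */
        *(bool *)data = s->dma_translation;
        return 0;
    default:
        /* attributes this model does not implement */
        return -EINVAL;
    }
}
```

A caller that cares about a specific attribute can thus distinguish "attribute reported" (0) from "model does not know about it" (-EINVAL) and keep a sensible default in the latter case.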




[PATCH v4 00/15] vfio: VFIO migration support with vIOMMU

2023-06-22 Thread Joao Martins
Hey,

This series introduces support for vIOMMU with VFIO device migration,
particularly related to how we do the dirty page tracking.

Today vIOMMUs serve two purposes: 1) enable interrupt remapping 2)
provide dma translation services for guests to support some form of
guest kernel managed DMA e.g. for nested virt based usage; (1) is especially
required for big VMs with VFs with more than 255 vcpus. We tackle both
and remove the migration blocker when vIOMMU is present provided the
conditions are met. I have both use-cases here in one series, but I am happy
to tackle them in separate series.

As I found out we don't necessarily need to expose the whole vIOMMU
functionality in order to just support interrupt remapping. x86 IOMMUs
on Windows Server 2018[2] and Linux >=5.10, with qemu 7.1+ (or really
Linux guests with commit c40c10 and since qemu commit 8646d9c773d8)
can instantiate a IOMMU just for interrupt remapping without needing to
be advertised/support DMA translation. AMD IOMMU in theory can provide
the same, but Linux doesn't quite support the IR-only part there yet,
only intel-iommu.

The series is organized as following:

Patches 1-5: Today we can't gather vIOMMU details before the guest
establishes their first DMA mapping via the vIOMMU. So these first four
patches add a way for vIOMMUs to be asked about their properties at start
of day. I choose the least churn possible way for now (as opposed to a
treewide conversion) and allow easy conversion a posteriori. As
suggested by Peter Xu[7], I have resurrected Yi's patches[5][6] which
allow us to fetch PCI backing vIOMMU attributes, without necessarily
tying the caller (VFIO or anyone else) to an IOMMU MR like I
was doing in v3.

Patches 6-8: Handle configs with vIOMMU interrupt remapping but without
DMA translation allowed. Today the 'dma-translation' attribute is
x86-iommu only, but the way this series is structured nothing stops from
other vIOMMUs supporting it too as long as they use
pci_setup_iommu_ops() and the necessary IOMMU MR get_attr attributes
are handled. The blocker is thus relaxed when vIOMMUs are able to
toggle/report the DMA_TRANSLATION attribute. With the patches up to this set,
we've then tackled item (1) of the second paragraph.

Patches 9-15: Simplified a lot from v2 (patch 9) to only track the complete
IOVA address space, leveraging the logic we use to compose the dirty ranges.
The blocker is once again relaxed for vIOMMUs that advertise their IOVA
addressing limits. This tackles item (2). So far I mainly use it with
intel-iommu, although I have a small set of patches for virtio-iommu per
Alex's suggestion in v2.

Comments, suggestions welcome. Thanks for the review!

Regards,
Joao

Changes since v3[8]:
* Pick up Yi's patches[5][6], and rework the first four patches.
  These are a bit better split, and make the new iommu_ops *optional*
  as opposed to a treewide conversion. Rather than returning an IOMMU MR
  and let VFIO operate on it to fetch attributes, we instead let the
  underlying IOMMU driver fetch the desired IOMMU MR and ask for the
  desired IOMMU attribute. Callers only care about PCI Device backing
  vIOMMU attributes regardless of its topology/association. (Peter Xu)
  These patches are a bit better split compared to the original ones,
  and I've kept all the same authorship and note the changes from
  original where applicable.
* Because of the rework of the first four patches, switch to
  individual attributes in the VFIOSpace that track dma_translation
  and the max_iova. All are expected to be unused when zero to retain
  the defaults of today in common code.
* Improve the migration blocker message of the last patch to be
  more obvious that vIOMMU migration blocker is added when no vIOMMU
  address space limits are advertised. (Patch 15)
* Cast to uintptr_t in IOMMUAttr data in intel-iommu (Philippe).
* Switch to MAKE_64BIT_MASK() instead of plain left shift (Philippe).
* Change diffstat of patches with scripts/git.orderfile (Philippe).

Changes since v2[3]:
* New patches 1-9 to be able to handle vIOMMUs without DMA translation, and
introduce ways to know various IOMMU model attributes via the IOMMU MR. This
is partly meant to address a comment in previous versions where we can't
access the IOMMU MR prior to the DMA mapping happening. Before this series
vfio giommu_list is only tracking 'mapped GIOVA' and that is controlled by the
guest. As well as better tackling of the IOMMU usage for interrupt-remapping
only purposes. 
* Dropped Peter Xu ack on patch 9 given that the code changed a bit.
* Adjust patch 14 to account for the VFIO bitmaps no longer being pointers.
* The patches that existed in v2 of vIOMMU dirty tracking are mostly
  untouched, except patch 12 which was greatly simplified.

Changes since v1[4]:
- Rebased on latest master branch. As part of it, made some changes in
  pre-copy to adjust it to Juan's new patches:
  1. Added a new patch that passes threshold_size parameter to
 

[PATCH v4 04/15] intel-iommu: Switch to pci_setup_iommu_ops()

2023-06-22 Thread Joao Martins
From: Yi Liu 

Use the PCI IOMMU setup function that supplies a PCIIOMMUOps
argument. This is in preparation to support fetching vIOMMU
information via pci_device_get_iommu_attr() which will require
switching the driver to pci_setup_iommu_ops().

Signed-off-by: Yi Liu 
[joao: Split from the original patch]
Signed-off-by: Joao Martins 
---
Split from:
https://lore.kernel.org/all/20210302203827.437645-5-yi.l@intel.com/#Z2e.:20210302203827.437645-5-yi.l.liu::40intel.com:1hw:i386:intel_iommu.c
---
 hw/i386/intel_iommu.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 94d52f4205d2..1606d1b952d0 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -4032,6 +4032,10 @@ static AddressSpace *vtd_host_dma_iommu(PCIBus *bus, void *opaque, int devfn)
 return &vtd_as->as;
 }
 
+static PCIIOMMUOps vtd_iommu_ops = {
+.get_address_space = vtd_host_dma_iommu,
+};
+
 static bool vtd_decide_config(IntelIOMMUState *s, Error **errp)
 {
 X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(s);
@@ -4155,7 +4159,7 @@ static void vtd_realize(DeviceState *dev, Error **errp)
   g_free, g_free);
 vtd_init(s);
 sysbus_mmio_map(SYS_BUS_DEVICE(s), 0, Q35_HOST_BRIDGE_IOMMU_ADDR);
-pci_setup_iommu(bus, vtd_host_dma_iommu, dev);
+pci_setup_iommu_ops(bus, &vtd_iommu_ops, dev);
 /* Pseudo address space under root PCI bus. */
 x86ms->ioapic_as = vtd_host_dma_iommu(bus, s, Q35_PSEUDO_DEVFN_IOAPIC);
 qemu_add_machine_init_done_notifier(&vtd_machine_done_notify);
-- 
2.17.2




[PATCH v4 07/15] vfio/common: Track whether DMA Translation is enabled on the vIOMMU

2023-06-22 Thread Joao Martins
vfio_get_group() allocates and fills the group/container/space on
success, which will store the AddressSpace inside the VFIOAddressSpace struct.
Use the newly added pci_device_iommu_get_attr() to see if DMA
translation is enabled or not. Assume that by default it is enabled.

Today, this means only intel-iommu supports it.

Signed-off-by: Joao Martins 
---
 include/hw/vfio/vfio-common.h |  1 +
 hw/vfio/pci.c | 15 ++-
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index eed244f25f34..f41860988d6b 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -70,6 +70,7 @@ typedef struct VFIOMigration {
 
 typedef struct VFIOAddressSpace {
 AddressSpace *as;
+bool no_dma_translation;
 QLIST_HEAD(, VFIOContainer) containers;
 QLIST_ENTRY(VFIOAddressSpace) list;
 } VFIOAddressSpace;
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 73874a94de12..8a98e6ffc480 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2900,6 +2900,7 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
 VFIOPCIDevice *vdev = VFIO_PCI(pdev);
 VFIODevice *vbasedev = &vdev->vbasedev;
 VFIODevice *vbasedev_iter;
+VFIOAddressSpace *space;
 VFIOGroup *group;
 char *tmp, *subsys, group_path[PATH_MAX], *group_name;
 Error *err = NULL;
@@ -2907,7 +2908,7 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
 struct stat st;
 int groupid;
 int i, ret;
-bool is_mdev;
+bool is_mdev, dma_translation;
 char uuid[UUID_FMT_LEN];
 char *name;
 
@@ -2961,6 +2962,18 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
 goto error;
 }
 
+space = group->container->space;
+
+/*
+ * Support for toggling DMA translation is optional.
+ * By default, DMA translation is assumed to be enabled i.e.
+ * space::no_dma_translation is 0.
+ */
+dma_translation = true;
+pci_device_iommu_get_attr(pdev, IOMMU_ATTR_DMA_TRANSLATION,
+  &dma_translation);
+space->no_dma_translation = !dma_translation;
+
 QLIST_FOREACH(vbasedev_iter, &group->device_list, next) {
 if (strcmp(vbasedev_iter->name, vbasedev->name) == 0) {
 error_setg(errp, "device is already attached");
-- 
2.17.2




[PATCH v4 01/15] hw/pci: Add a pci_setup_iommu_ops() helper

2023-06-22 Thread Joao Martins
From: Yi Liu 

Add a pci_setup_iommu_ops() that uses a newly added structure
(PCIIOMMUOps) instead of using PCIIOMMUFunc. The old pci_setup_iommu()
that uses PCIIOMMUFunc is still kept for other IOMMUs to get an
address space for a PCI device in a vendor specific way.

In preparation to expand to supplying vIOMMU attributes, add an
alternate helper pci_setup_iommu_ops() to set up the PCI device IOMMU.
For now the PCIIOMMUOps just defines the address_space, but it will
be extended to have another callback.

Signed-off-by: Yi Liu 
[joao: Massage commit message and subject, and make it a complementary
rather than changing every single consumer of pci_setup_iommu()]
Signed-off-by: Joao Martins 
---
v1: https://lore.kernel.org/all/20210302203827.437645-5-yi.l@intel.com/
---
 include/hw/pci/pci.h |  7 +++
 include/hw/pci/pci_bus.h |  1 +
 hw/pci/pci.c | 26 +++---
 3 files changed, 31 insertions(+), 3 deletions(-)

diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index e6d0574a2999..f59aef5a329a 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -368,6 +368,13 @@ typedef AddressSpace *(*PCIIOMMUFunc)(PCIBus *, void *, int);
 AddressSpace *pci_device_iommu_address_space(PCIDevice *dev);
 void pci_setup_iommu(PCIBus *bus, PCIIOMMUFunc fn, void *opaque);
 
+typedef struct PCIIOMMUOps PCIIOMMUOps;
+struct PCIIOMMUOps {
+AddressSpace * (*get_address_space)(PCIBus *bus,
+void *opaque, int32_t devfn);
+};
+void pci_setup_iommu_ops(PCIBus *bus, const PCIIOMMUOps *iommu_ops, void *opaque);
+
 pcibus_t pci_bar_address(PCIDevice *d,
  int reg, uint8_t type, pcibus_t size);
 
diff --git a/include/hw/pci/pci_bus.h b/include/hw/pci/pci_bus.h
index 56531759578f..fb770b236d69 100644
--- a/include/hw/pci/pci_bus.h
+++ b/include/hw/pci/pci_bus.h
@@ -35,6 +35,7 @@ struct PCIBus {
 enum PCIBusFlags flags;
 PCIIOMMUFunc iommu_fn;
 void *iommu_opaque;
+const PCIIOMMUOps *iommu_ops;
 uint8_t devfn_min;
 uint32_t slot_reserved_mask;
 pci_set_irq_fn set_irq;
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index bf38905b7dc0..4e32c09e81d6 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -2639,7 +2639,15 @@ AddressSpace *pci_device_iommu_address_space(PCIDevice *dev)
 PCIBus *iommu_bus = bus;
 uint8_t devfn = dev->devfn;
 
-while (iommu_bus && !iommu_bus->iommu_fn && iommu_bus->parent_dev) {
+/*
+ * get_address_space() callback is mandatory when iommu uses
+ * pci_setup_iommu_ops(), so needs to ensure its presence in
+ * the iommu_bus search.
+ */
+while (iommu_bus &&
+   !(iommu_bus->iommu_fn ||
+(iommu_bus->iommu_ops && iommu_bus->iommu_ops->get_address_space)) &&
+   iommu_bus->parent_dev) {
 PCIBus *parent_bus = pci_get_bus(iommu_bus->parent_dev);
 
 /*
@@ -2678,8 +2686,14 @@ AddressSpace *pci_device_iommu_address_space(PCIDevice *dev)
 
 iommu_bus = parent_bus;
 }
-if (!pci_bus_bypass_iommu(bus) && iommu_bus && iommu_bus->iommu_fn) {
-return iommu_bus->iommu_fn(bus, iommu_bus->iommu_opaque, devfn);
+if (!pci_bus_bypass_iommu(bus) && iommu_bus) {
+if (iommu_bus->iommu_fn) {
+   return iommu_bus->iommu_fn(bus, iommu_bus->iommu_opaque, devfn);
+} else if (iommu_bus->iommu_ops &&
+   iommu_bus->iommu_ops->get_address_space) {
+   return iommu_bus->iommu_ops->get_address_space(bus,
+   iommu_bus->iommu_opaque, devfn);
+}
 }
 return &address_space_memory;
 }
@@ -2690,6 +2704,12 @@ void pci_setup_iommu(PCIBus *bus, PCIIOMMUFunc fn, void *opaque)
 bus->iommu_opaque = opaque;
 }
 
+void pci_setup_iommu_ops(PCIBus *bus, const PCIIOMMUOps *ops, void *opaque)
+{
+bus->iommu_ops = ops;
+bus->iommu_opaque = opaque;
+}
+
 static void pci_dev_get_w64(PCIBus *b, PCIDevice *dev, void *opaque)
 {
 Range *range = opaque;
-- 
2.17.2
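The lookup this patch modifies — prefer the legacy per-bus function pointer, then the new ops table, then fall back to the system memory address space — can be modelled with a minimal stand-alone mock. Everything below (Bus, IOMMUOps, bus_iommu_address_space) is a hypothetical reduction for illustration, not QEMU's actual API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mock types. */
typedef struct AddressSpace { const char *name; } AddressSpace;

typedef AddressSpace *(*IOMMUFunc)(void *opaque);

typedef struct IOMMUOps {
    AddressSpace *(*get_address_space)(void *opaque);
} IOMMUOps;

typedef struct Bus {
    IOMMUFunc iommu_fn;        /* legacy pci_setup_iommu() style */
    const IOMMUOps *iommu_ops; /* new pci_setup_iommu_ops() style */
    void *opaque;
} Bus;

AddressSpace address_space_memory = { "memory" };
AddressSpace iommu_as = { "iommu" };

/* Example new-style provider, for demonstration only. */
AddressSpace *as_from_ops(void *opaque)
{
    (void)opaque;
    return &iommu_as;
}

/* Mirrors the fallback order of the patched
 * pci_device_iommu_address_space(): legacy callback first, then the
 * ops table, then the default system memory address space. */
AddressSpace *bus_iommu_address_space(Bus *bus)
{
    if (bus->iommu_fn) {
        return bus->iommu_fn(bus->opaque);
    }
    if (bus->iommu_ops && bus->iommu_ops->get_address_space) {
        return bus->iommu_ops->get_address_space(bus->opaque);
    }
    return &address_space_memory;
}
```

Keeping both entry points side by side is what lets the series convert drivers one at a time rather than treewide.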




Re: [PATCH V1 3/3] tests/qtest: live migration suspended state

2023-06-22 Thread Steven Sistare
On 6/21/2023 4:00 PM, Peter Xu wrote:
> On Wed, Jun 21, 2023 at 03:39:44PM -0400, Steven Sistare wrote:
 -jmp mainloop
 +# should this test suspend?
 +mov (suspend_me),%eax
 +cmp $0,%eax
 +je mainloop
 +
 +# are we waking after suspend?  do not suspend again.
 +mov $suspended,%eax
>>>
>>> So IIUC then it'll use 4 bytes over 100MB range which means we need at
>>> least 100MB+4bytes.. not obvious for a HIGH_ADDR definition to me..
>>>
>>> Could we just define a variable inside the section like suspend_me?
>>
>> No, because modifications to this memory backing the boot block are not
>> copied to the destination.  The dest reads a clean copy of the boot block
>> from disk, as specified by the qemu command line arguments.
> 
> Oh okay, can we use HIGH_ADDR-4, then?  I just still think it'll be nice if
> we can keep HIGH_ADDR the high bar of the whole range.

Sure.  I'll use LOW_ADDR + 4, and add a comment.

- Steve



Re: [PATCH RFC 0/6] Switch iotests to pyvenv

2023-06-22 Thread John Snow
On Thu, Jun 22, 2023 at 5:12 PM Paolo Bonzini  wrote:
>
> On Thu, Jun 22, 2023 at 11:08 PM John Snow  wrote:
> >
> > On Thu, Jun 22, 2023 at 5:05 PM Paolo Bonzini  wrote:
> > >
> > > On Thu, Jun 22, 2023 at 11:03 PM John Snow  wrote:
> > > > If we always install it in editable mode, and the path where it is
> > > > "installed" is what we expect it to be, it shouldn't have any problems
> > > > with being out of date I think. We could conceivably use the
> > > > "faux" package version the internal package has to signal when the
> > > > script needs to re-install it.
> > >
> > > Stupid question, why not treat it just like avocado?
> > >
> >
> > How do you mean? (i.e. installing it on-demand in reaction to "make
> > check-avocado"?)
>
> Yes, installing it on-demand the first time "make check-iotests" is
> run, using a "depend:" keyword argument in
> tests/qemu-iotests/meson.build.
>
> BTW,
>
> from distlib.scripts import ScriptMaker
> ScriptMaker('..', '.').make('foo.py')
>
> Seems to do the right thing as long as foo.py includes a shebang (I
> tested it inside a virtual environment).
>
> Paolo

That's possible, but it means that it will break if you run configure
and then immediately go to invoke iotests, unless we have a way to
have iotests bootstrap itself. Which I think can't be done through the
makefile, because we don't know which "make" to run in order to get
that to happen. (Or at least, I don't!)

Possibly I could teach mkvenv a new trick, like "mkvenv init iotests"
and have the mkvenv script DTRT at that point, whatever that is --
ideally exiting very quickly without doing anything.




Re: [PATCH v4 1/1] target/riscv: Add RVV registers to log

2023-06-22 Thread Daniel Henrique Barboza




On 6/22/23 06:43, Ivan Klokov wrote:

Print RvV extesion register to log if VPU option is enabled.


Typo: extesion -> extension



Signed-off-by: Ivan Klokov 
---
v4:
- General part of patch has been merged, rebase riscv part and resend.
---
  target/riscv/cpu.c | 56 +-
  1 file changed, 55 insertions(+), 1 deletion(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index fb8458bf74..b23f3fde0d 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -183,6 +183,14 @@ const char * const riscv_fpr_regnames[] = {
  "f30/ft10", "f31/ft11"
  };
  
+const char * const riscv_rvv_regnames[] = {

+  "v0",  "v1",  "v2",  "v3",  "v4",  "v5",  "v6",
+  "v7",  "v8",  "v9",  "v10", "v11", "v12", "v13",
+  "v14", "v15", "v16", "v17", "v18", "v19", "v20",
+  "v21", "v22", "v23", "v24", "v25", "v26", "v27",
+  "v28", "v29", "v30", "v31"
+};
+
  static const char * const riscv_excp_names[] = {
  "misaligned_fetch",
  "fault_fetch",
@@ -611,7 +619,8 @@ static void riscv_cpu_dump_state(CPUState *cs, FILE *f, int flags)
  {
  RISCVCPU *cpu = RISCV_CPU(cs);
  CPURISCVState *env = &cpu->env;
-int i;
+int i, j;
+uint8_t *p;
  
  #if !defined(CONFIG_USER_ONLY)

  if (riscv_has_ext(env, RVH)) {
@@ -695,6 +704,51 @@ static void riscv_cpu_dump_state(CPUState *cs, FILE *f, int flags)
  }
  }
  }
+if (riscv_has_ext(env, RVV) && (flags & CPU_DUMP_VPU)) {
+static const int dump_rvv_csrs[] = {
+CSR_VSTART,
+CSR_VXSAT,
+CSR_VXRM,
+CSR_VCSR,
+CSR_VL,
+CSR_VTYPE,
+CSR_VLENB,
+};
+for (int i = 0; i < ARRAY_SIZE(dump_rvv_csrs); ++i) {
+int csrno = dump_rvv_csrs[i];
+target_ulong val = 0;
+RISCVException res = riscv_csrrw_debug(env, csrno, &val, 0, 0);
+
+/*
+ * Rely on the smode, hmode, etc, predicates within csr.c
+ * to do the filtering of the registers that are present.
+ */
+if (res == RISCV_EXCP_NONE) {
+qemu_fprintf(f, " %-8s " TARGET_FMT_lx "\n",
+ csr_ops[csrno].name, val);
+}
+}
+uint16_t vlenb = env_archcpu(env)->cfg.vlen >> 3;


We have a "RISCVCPU *cpu" pointer available at the start of the function. Use
that to access the cfg obj and avoid an unneeded env_archcpu() call:


+uint16_t vlenb = cpu->cfg.vlen >> 3;




+
+/*
+ * From vector_helper.c
+ * Note that vector data is stored in host-endian 64-bit chunks,
+ * so addressing bytes needs a host-endian fixup.
+ */
+#if HOST_BIG_ENDIAN
+#define BYTE(x)   ((x) ^ 7)
+#else
+#define BYTE(x)   (x)
+#endif


Usually we don't declare new macros in the middle of functions. I suggest
moving this #define block to outside of riscv_cpu_dump_state() to keep the
code more in line with the current code style we use. Thanks,


Daniel



+for (i = 0; i < 32; i++) {
+qemu_fprintf(f, " %-8s ", riscv_rvv_regnames[i]);
+p = (uint8_t *)env->vreg;
+for (j = vlenb - 1 ; j >= 0; j--) {
+qemu_fprintf(f, "%02x", *(p + i * vlenb + BYTE(j)));
+}
+qemu_fprintf(f, "\n");
+}
+}
  }
  
  static void riscv_cpu_set_pc(CPUState *cs, vaddr value)
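A side note on the BYTE() macro quoted above: because XOR-ing an index with 7 only flips its low three bits, applying it to a byte offset stays within the containing 64-bit chunk, which is what makes the host-endian fixup work. A small stand-alone demonstration follows (illustrative only, not QEMU code; the endianness probe assumes GCC/Clang's __BYTE_ORDER__ predefined macro):

```c
#include <assert.h>
#include <stdint.h>

/* Same fixup idea as QEMU's vector_helper.c: vector data is stored as
 * host-endian 64-bit chunks, so logical byte i lives at offset i on
 * little-endian hosts and at offset i ^ 7 (within its chunk) on
 * big-endian hosts. */
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#define BYTE(x)   ((x) ^ 7)
#else
#define BYTE(x)   (x)
#endif

/* Return logical byte i of a vector stored as host-endian u64 chunks.
 * BYTE() only touches bits 0-2, so the access never leaves the chunk. */
uint8_t vreg_byte(const uint64_t *vreg, unsigned i)
{
    const uint8_t *p = (const uint8_t *)vreg;
    return p[BYTE(i)];
}
```

The accessor returns the same logical byte regardless of host endianness, which is exactly the property the dump loop above relies on when printing vector registers most-significant byte first.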




Re: [PATCH RFC 0/6] Switch iotests to pyvenv

2023-06-22 Thread Paolo Bonzini
On Thu, Jun 22, 2023 at 11:08 PM John Snow  wrote:
>
> On Thu, Jun 22, 2023 at 5:05 PM Paolo Bonzini  wrote:
> >
> > On Thu, Jun 22, 2023 at 11:03 PM John Snow  wrote:
> > > If we always install it in editable mode, and the path where it is
> > > "installed" is what we expect it to be, it shouldn't have any problems
> > > with being out of date I think. We could conceivably use the
> > > "faux" package version the internal package has to signal when the
> > > script needs to re-install it.
> >
> > Stupid question, why not treat it just like avocado?
> >
>
> How do you mean? (i.e. installing it on-demand in reaction to "make
> check-avocado"?)

Yes, installing it on-demand the first time "make check-iotests" is
run, using a "depend:" keyword argument in
tests/qemu-iotests/meson.build.

BTW,

from distlib.scripts import ScriptMaker
ScriptMaker('..', '.').make('foo.py')

Seems to do the right thing as long as foo.py includes a shebang (I
tested it inside a virtual environment).

Paolo




Re: [PATCH RFC 0/6] Switch iotests to pyvenv

2023-06-22 Thread John Snow
On Thu, Jun 22, 2023 at 5:05 PM Paolo Bonzini  wrote:
>
> On Thu, Jun 22, 2023 at 11:03 PM John Snow  wrote:
> > If we always install it in editable mode, and the path where it is
> > "installed" is what we expect it to be, it shouldn't have any problems
> > with being out of date I think. We could conceivably use the
> > "faux" package version the internal package has to signal when the
> > script needs to re-install it.
>
> Stupid question, why not treat it just like avocado?
>

How do you mean? (i.e. installing it on-demand in reaction to "make
check-avocado"?)




Re: [PATCH RFC 0/6] Switch iotests to pyvenv

2023-06-22 Thread Paolo Bonzini
On Thu, Jun 22, 2023 at 11:03 PM John Snow  wrote:
> If we always install it in editable mode, and the path where it is
> "installed" is what we expect it to be, it shouldn't have any problems
> with being out of date I think. We could conceivably use the
> "faux" package version the internal package has to signal when the
> script needs to re-install it.

Stupid question, why not treat it just like avocado?




Re: [PATCH RFC 0/6] Switch iotests to pyvenv

2023-06-22 Thread John Snow
On Thu, Jun 22, 2023 at 5:24 AM Paolo Bonzini  wrote:
>
> On Wed, Jun 21, 2023 at 9:08 AM Paolo Bonzini  wrote:
> > Maybe patch 4 can use distlib.scripts as well to create the check script in 
> > the build directory? (Yes that's another mkvenv functionality...) On a 
> > phone and don't have the docs at hand, so I am not sure. If not, your 
> > solution is good enough.
> >

Yeah, that's a possibility... we could "install" the iotests script.
That might keep things simple. I'll investigate it.

> > Apart from this the only issue is the speed. IIRC having a prebuilt .whl 
> > would fix it, I think for Meson we observed that the slow part was building 
> > the wheel. Possibilities:
> >
> > 1) using --no-pep517 if that also speeds it up?
> >
> > 2) already removing the sources to qemu.qmp since that's the plan anyway; 
> > and then, if you want editability you can install the package with --user 
> > --editable, i.e. outside the venv
>
Nope, it's 3 seconds always and 1.5 even with the wheel.
>
> Maybe replace qemu.qmp with a wheel and leaving PYTHONPATH for the rest?
>
> Paolo
>

Hm, I guess so. It's just disappointing because I was really hoping to
be able to use "pip install" to handle dependencies like a normal
package instead of trying to shoulder that burden with an increasing
amount of custom logic that's hard for anyone but me (or you, now) to
maintain.

It kind of defeats the point of having formatted it as a package to begin with.

Maybe there's a sane way to amortize the cost of installation by not
re-creating it after every call to configure instead -- the rest of
the script is fast enough, perhaps we could default clear to *False*
from now on and use the _get_version() bits to detect if the local
internal package is already installed or not -- and if it is, just
leave it alone.

If we always install it in editable mode, and the path where it is
"installed" is what we expect it to be, it shouldn't have any problems
with being out of date I think. We could conceivably use the
"faux" package version the internal package has to signal when the
script needs to re-install it.

Something like that?

--js




Re: [PATCH v3 01/15] hw/pci: Refactor pci_device_iommu_address_space()

2023-06-22 Thread Joao Martins



On 22/06/2023 21:50, Michael S. Tsirkin wrote:
> On Wed, May 31, 2023 at 11:03:23AM +0100, Joao Martins wrote:
>> On 30/05/2023 23:04, Philippe Mathieu-Daudé wrote:
>>> Hi Joao,
>>>
>>> On 30/5/23 19:59, Joao Martins wrote:
 Rename pci_device_iommu_address_space() into pci_device_iommu_info().
 In the new function return a new type PCIAddressSpace that encapsulates
 the AddressSpace pointer that originally was returned.

 The new type is added in preparation to expanding it to include the IOMMU
 memory region as a new field, such that we are able to fetch attributes of
 the vIOMMU e.g. at vfio migration setup.

 Signed-off-by: Joao Martins 
 ---
   hw/pci/pci.c |  9 ++---
   include/hw/pci/pci.h | 21 -
>>>
>>> Please consider using scripts/git.orderfile.
>>>
>> Will do -- wasn't aware of that script.
>>
   2 files changed, 26 insertions(+), 4 deletions(-)

 diff --git a/hw/pci/pci.c b/hw/pci/pci.c
 index 1cc7c89036b5..ecf8a543aa77 100644
 --- a/hw/pci/pci.c
 +++ b/hw/pci/pci.c
 @@ -2633,11 +2633,12 @@ static void pci_device_class_base_init(ObjectClass *klass, void *data)
   }
   }
   -AddressSpace *pci_device_iommu_address_space(PCIDevice *dev)
 +PCIAddressSpace pci_device_iommu_info(PCIDevice *dev)
   {
>>>
>>> This function is PCI specific, ...
>>>
   }
     void pci_setup_iommu(PCIBus *bus, PCIIOMMUFunc fn, void *opaque)
 diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
 index e6d0574a2999..9ffaf47fe2ab 100644
 --- a/include/hw/pci/pci.h
 +++ b/include/hw/pci/pci.h
 @@ -363,9 +363,28 @@ void pci_bus_get_w64_range(PCIBus *bus, Range *range);
     void pci_device_deassert_intx(PCIDevice *dev);
   +typedef struct PCIAddressSpace {
 +    AddressSpace *as;
>>>
>>> ... but here I fail to understand what is PCI specific in this
>>> structure. You are just trying to an AS with a IOMMU MR, right?
>>>
>> Right. The patch is trying to better split the changes to use one function to
>> return everything (via pci_device_iommu_info) with the PCIAddressSpace
>> intermediate structure as retval, such that patch 3 just adds a
>> IOMMUMemoryRegion* in the latter for usage with the
>> pci_device_iommu_memory_region().
>>
>> I've named the structure with a 'PCI' prefix, because it seemed to me that
>> it is the only case (AIUI) that cares about whether a PCI device has a
>> different address space than the memory map.
> 
> 
> yea keep that pls. It should be possible to figure out the header
> from the name.
> 

OK.

I am about to respin the v4 series. It mainly reworks the first four patches entirely.
Essentially I'm following Peter's suggestion of picking Yi's old patches[0][1]
and avoid the direct manipulation of an IOMMU MR. The structure is very similar,
but the difference is avoiding the direct manipulation of an IOMMU MR[2]. The end
goal in hw/pci is similar, fetching the backing IOMMU attribute from a PCI 
device.

[0] https://lore.kernel.org/all/20210302203827.437645-5-yi.l@intel.com/
[1] https://lore.kernel.org/all/20210302203827.437645-6-yi.l@intel.com/
[2] https://lore.kernel.org/qemu-devel/ZH9Kr6mrKNqUgcYs@x1n/

 +} PCIAddressSpace;
 +
   typedef AddressSpace *(*PCIIOMMUFunc)(PCIBus *, void *, int);
 +static inline PCIAddressSpace as_to_pci_as(AddressSpace *as)
 +{
 +    PCIAddressSpace ret = { .as = as };
 +
 +    return ret;
 +}
 +static inline AddressSpace *pci_as_to_as(PCIAddressSpace pci_as)
 +{
 +    return pci_as.as;
 +}
 +
 +PCIAddressSpace pci_device_iommu_info(PCIDevice *dev);
 +static inline AddressSpace *pci_device_iommu_address_space(PCIDevice *dev)
 +{
 +    return pci_as_to_as(pci_device_iommu_info(dev));
 +}
   -AddressSpace *pci_device_iommu_address_space(PCIDevice *dev);
   void pci_setup_iommu(PCIBus *bus, PCIIOMMUFunc fn, void *opaque);
     pcibus_t pci_bar_address(PCIDevice *d,
>>>
> 



Re: [RFC PATCH] softfloat: use QEMU_FLATTEN to avoid mistaken isra inlining

2023-06-22 Thread BALATON Zoltan

Hello,

What happened to this patch? Will this be merged by somebody?

Regards,
BALATON Zoltan

On Tue, 23 May 2023, BALATON Zoltan wrote:

On Tue, 23 May 2023, Alex Bennée wrote:

Balton discovered that asserts for the extract/deposit calls had a


Missing an a in my name and my given name is Zoltan. (First name and last 
name is in the other way in Hungarian.) Maybe just add a Reported-by instead 
of here if you want to record it.



significant impact on a lame benchmark on qemu-ppc. Replicating with:

 ./qemu-ppc64 ~/lsrc/tests/lame.git-svn/builds/ppc64/frontend/lame \
   -h pts-trondheim-3.wav pts-trondheim-3.mp3

showed up the pack/unpack routines not eliding the assert checks as it
should have done causing them to prominently figure in the profile:

 11.44%  qemu-ppc64  qemu-ppc64   [.] unpack_raw64.isra.0
 11.03%  qemu-ppc64  qemu-ppc64   [.] parts64_uncanon_normal
  8.26%  qemu-ppc64  qemu-ppc64   [.] helper_compute_fprf_float64
  6.75%  qemu-ppc64  qemu-ppc64   [.] do_float_check_status
  5.34%  qemu-ppc64  qemu-ppc64   [.] parts64_muladd
  4.75%  qemu-ppc64  qemu-ppc64   [.] pack_raw64.isra.0
  4.38%  qemu-ppc64  qemu-ppc64   [.] parts64_canonicalize
  3.62%  qemu-ppc64  qemu-ppc64   [.] float64r32_round_pack_canonical


After this patch the same test runs 31 seconds faster with a profile
where the generated code dominates more:

+   14.12% 0.00%  qemu-ppc64  [unknown][.] 0x004000619420
+   13.30% 0.00%  qemu-ppc64  [unknown][.] 0x004000616850
+   12.58%12.19%  qemu-ppc64  qemu-ppc64   [.] parts64_uncanon_normal
+   10.62% 0.00%  qemu-ppc64  [unknown][.] 0x00400061bf70
+    9.91% 9.73%  qemu-ppc64  qemu-ppc64   [.] helper_compute_fprf_float64
+    7.84% 7.82%  qemu-ppc64  qemu-ppc64   [.] do_float_check_status
+    6.47% 5.78%  qemu-ppc64  qemu-ppc64   [.] parts64_canonicalize.constprop.0
+    6.46% 0.00%  qemu-ppc64  [unknown][.] 0x004000620130
+    6.42% 0.00%  qemu-ppc64  [unknown][.] 0x004000619400
+    6.17% 6.04%  qemu-ppc64  qemu-ppc64   [.] parts64_muladd
+    5.85% 0.00%  qemu-ppc64  [unknown][.] 0x0040006167e0
+    5.74% 0.00%  qemu-ppc64  [unknown][.] 0xb693fcd3
+    5.45% 4.78%  qemu-ppc64  qemu-ppc64   [.] float64r32_round_pack_canonical


Suggested-by: Richard Henderson 
Message-Id: 
[AJB: Patchified rth's suggestion]
Signed-off-by: Alex Bennée 
Cc: BALATON Zoltan 


Replace Cc: with
Tested-by: BALATON Zoltan 

This solves the softfloat-related usages; the rest probably have lower
overhead, as I could not measure any further improvement from removing
asserts on top of this patch. I still have these functions high in my
profiling results:


children    self    command          symbol
 11.40%    10.86%   qemu-system-ppc  helper_compute_fprf_float64
 11.25%     0.61%   qemu-system-ppc  helper_fmadds
 10.01%     3.23%   qemu-system-ppc  float64r32_round_pack_canonical
  8.59%     1.80%   qemu-system-ppc  helper_float_check_status
  8.34%     7.23%   qemu-system-ppc  parts64_muladd
  8.16%     0.67%   qemu-system-ppc  helper_fmuls
  8.08%     0.43%   qemu-system-ppc  parts64_uncanon
  7.49%     1.78%   qemu-system-ppc  float64r32_mul
  7.32%     7.32%   qemu-system-ppc  parts64_uncanon_normal
  6.48%     0.52%   qemu-system-ppc  helper_fadds
  6.31%     6.31%   qemu-system-ppc  do_float_check_status
  5.99%     1.14%   qemu-system-ppc  float64r32_add

Any idea on those?

Unrelated to this patch I also started to see random crashes with a DSI on a 
dcbz instruction now which did not happen before (or not frequently enough 
for me to notice). I did not bisect that as it happens randomly but I wonder 
if it could be related to recent unaligned access changes or some other TCG 
change? Any idea what to check?


Regards,
BALATON Zoltan


---
fpu/softfloat.c | 22 +++---
1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 108f9cb224..42e6c188b4 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -593,27 +593,27 @@ static void unpack_raw64(FloatParts64 *r, const FloatFmt *fmt, uint64_t raw)
 };
 }

-static inline void float16_unpack_raw(FloatParts64 *p, float16 f)
+static void QEMU_FLATTEN float16_unpack_raw(FloatParts64 *p, float16 f)
{
unpack_raw64(p, &float16_params, f);
}

-static inline void bfloat16_unpack_raw(FloatParts64 *p, bfloat16 f)
+static void QEMU_FLATTEN bfloat16_unpack_raw(FloatParts64 *p, bfloat16 f)
{
unpack_raw64(p, &bfloat16_params, f);
}

-static inline void float32_unpack_raw(FloatParts64 *p, float32 f)
+static void QEMU_FLATTEN float32_unpack_raw(FloatParts64 *p, float32 f)
{
unpack_raw64(p, &float32_params, f);
}

-static inline void float64_unpack_raw

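For readers following this thread: QEMU_FLATTEN expands (on GCC and Clang) to the `flatten` function attribute, which asks the compiler to inline every call made inside the annotated function. A minimal, self-contained sketch of the pattern the patch applies; the helper name and field widths below are illustrative, not QEMU's actual softfloat code:

```c
#include <assert.h>
#include <stdint.h>

/* Approximation of QEMU_FLATTEN from include/qemu/compiler.h. */
#define QEMU_FLATTEN __attribute__((flatten))

/* Illustrative stand-in for unpack_raw64(): the real helper asserts
 * on its field parameters, a check that only folds away once the
 * call is inlined with constant arguments. */
static uint64_t unpack_field(uint64_t raw, unsigned pos, unsigned len)
{
    assert(pos + len <= 64);   /* elided after inlining + const-prop */
    return (raw >> pos) & (len == 64 ? ~0ULL : (1ULL << len) - 1);
}

/* flatten forces unpack_field() into this body, so the compiler sees
 * constant pos/len and can drop the assert, instead of emitting a
 * partial ".isra" clone that still performs the check at run time. */
static uint64_t QEMU_FLATTEN float16_frac(uint64_t f)
{
    return unpack_field(f, 0, 10);   /* fraction field of binary16 */
}
```

Built with -O2, the constant-argument check can fold away entirely, which is the effect that removes the `.isra.0` clones from the profiles quoted above.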
Re: [PATCH v3 01/15] hw/pci: Refactor pci_device_iommu_address_space()

2023-06-22 Thread Michael S. Tsirkin
On Wed, May 31, 2023 at 11:03:23AM +0100, Joao Martins wrote:
> On 30/05/2023 23:04, Philippe Mathieu-Daudé wrote:
> > Hi Joao,
> > 
> > On 30/5/23 19:59, Joao Martins wrote:
> >> Rename pci_device_iommu_address_space() into pci_device_iommu_info().
> >> In the new function return a new type PCIAddressSpace that encapsulates
> >> the AddressSpace pointer that originally was returned.
> >>
> >> The new type is added in preparation to expanding it to include the IOMMU
> >> memory region as a new field, such that we are able to fetch attributes of
> >> the vIOMMU e.g. at vfio migration setup.
> >>
> >> Signed-off-by: Joao Martins 
> >> ---
> >>   hw/pci/pci.c |  9 ++---
> >>   include/hw/pci/pci.h | 21 -
> > 
> > Please consider using scripts/git.orderfile.
> > 
> Will do -- wasn't aware of that script.
> 
> >>   2 files changed, 26 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> >> index 1cc7c89036b5..ecf8a543aa77 100644
> >> --- a/hw/pci/pci.c
> >> +++ b/hw/pci/pci.c
> >> @@ -2633,11 +2633,12 @@ static void pci_device_class_base_init(ObjectClass
> >> *klass, void *data)
> >>   }
> >>   }
> >>   -AddressSpace *pci_device_iommu_address_space(PCIDevice *dev)
> >> +PCIAddressSpace pci_device_iommu_info(PCIDevice *dev)
> >>   {
> > 
> > This function is PCI specific, ...
> > 
> >>   }
> >>     void pci_setup_iommu(PCIBus *bus, PCIIOMMUFunc fn, void *opaque)
> >> diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
> >> index e6d0574a2999..9ffaf47fe2ab 100644
> >> --- a/include/hw/pci/pci.h
> >> +++ b/include/hw/pci/pci.h
> >> @@ -363,9 +363,28 @@ void pci_bus_get_w64_range(PCIBus *bus, Range *range);
> >>     void pci_device_deassert_intx(PCIDevice *dev);
> >>   +typedef struct PCIAddressSpace {
> >> +    AddressSpace *as;
> > 
> > ... but here I fail to understand what is PCI specific in this
> > structure. You are just trying to an AS with a IOMMU MR, right?
> > 
> Right. The patch is trying to better split the changes: use one function to
> return everything (via pci_device_iommu_info) with the PCIAddressSpace
> intermediate structure as retval, such that patch 3 just adds an
> IOMMUMemoryRegion* in the latter for usage with
> pci_device_iommu_memory_region().
> 
> I've named the structure with a 'PCI' prefix, because it seemed to me that
> it is the only case (AIUI) that cares about whether a PCI device has a
> different address space than the memory map.


yea keep that pls. It should be possible to figure out the header
from the name.

> >> +} PCIAddressSpace;
> >> +
> >>   typedef AddressSpace *(*PCIIOMMUFunc)(PCIBus *, void *, int);
> >> +static inline PCIAddressSpace as_to_pci_as(AddressSpace *as)
> >> +{
> >> +    PCIAddressSpace ret = { .as = as };
> >> +
> >> +    return ret;
> >> +}
> >> +static inline AddressSpace *pci_as_to_as(PCIAddressSpace pci_as)
> >> +{
> >> +    return pci_as.as;
> >> +}
> >> +
> >> +PCIAddressSpace pci_device_iommu_info(PCIDevice *dev);
> >> +static inline AddressSpace *pci_device_iommu_address_space(PCIDevice *dev)
> >> +{
> >> +    return pci_as_to_as(pci_device_iommu_info(dev));
> >> +}
> >>   -AddressSpace *pci_device_iommu_address_space(PCIDevice *dev);
> >>   void pci_setup_iommu(PCIBus *bus, PCIIOMMUFunc fn, void *opaque);
> >>     pcibus_t pci_bar_address(PCIDevice *d,
> > 



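The API shape under review, a small struct returned by value so that more vIOMMU attributes can be added later without changing existing callers, can be sketched independently of QEMU. The stub AddressSpace below is an assumption made only so the example is self-contained; the two converter names mirror the patch:

```c
#include <assert.h>

typedef struct AddressSpace { const char *name; } AddressSpace;  /* stub */

/* By-value wrapper: today it only carries the AddressSpace pointer,
 * but a later patch can add an IOMMUMemoryRegion * field without
 * touching existing pci_device_iommu_address_space() callers. */
typedef struct PCIAddressSpace {
    AddressSpace *as;
} PCIAddressSpace;

static PCIAddressSpace as_to_pci_as(AddressSpace *as)
{
    PCIAddressSpace ret = { .as = as };
    return ret;
}

static AddressSpace *pci_as_to_as(PCIAddressSpace pci_as)
{
    return pci_as.as;
}

/* Round-trip check: wrapping and unwrapping yields the same pointer. */
static int roundtrip_ok(void)
{
    static AddressSpace as = { "pci-dev" };
    return pci_as_to_as(as_to_pci_as(&as)) == &as;
}
```

Because the struct is returned by value, growing it later changes no caller code, only the ABI of the one function that produces it.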

Re: [PATCH V2] migration: file URI

2023-06-22 Thread Steven Sistare
On 6/22/2023 8:20 AM, Fabiano Rosas wrote:
> Steve Sistare  writes:
> 
>> Extend the migration URI to support file:.  This can be used for
>> any migration scenario that does not require a reverse path.  It can be used
>> as an alternative to 'exec:cat > file' in minimized containers that do not
>> contain /bin/sh, and it is easier to use than the fd: URI.  It can
>> be used in HMP commands, and as a qemu command-line parameter.
>>
>> Signed-off-by: Steve Sistare 
> 
> Reviewed-by: Fabiano Rosas 
> 
> I'm ok with using this version over mine. I based my series on top of
> this and it works fine.
> 
> I'm preparing a couple of patches with the test case. We'll need a fix
> to common migration code before it can work due to the latest
> migration-test.c changes.

Hi Fabiano,
  I re-submitted my patch, along with an offset parameter requested by Daniel.
Perhaps you can add a test case using the offset?

- Steve



[PATCH V3 1/2] migration: file URI

2023-06-22 Thread Steve Sistare
Extend the migration URI to support file:.  This can be used for
any migration scenario that does not require a reverse path.  It can be
used as an alternative to 'exec:cat > file' in minimized containers that
do not contain /bin/sh, and it is easier to use than the fd: URI.
It can be used in HMP commands, and as a qemu command-line parameter.

For best performance, guest ram should be shared and x-ignore-shared
should be true, so guest pages are not written to the file, in which case
the guest may remain running.  If ram is not so configured, then the user
is advised to stop the guest first.  Otherwise, a busy guest may re-dirty
the same page, causing it to be appended to the file multiple times,
and the file may grow unboundedly.  That issue is being addressed in the
"fixed-ram" patch series.

Signed-off-by: Steve Sistare 
Reviewed-by: Fabiano Rosas 
---
 migration/file.c   | 62 ++
 migration/file.h   | 14 
 migration/meson.build  |  1 +
 migration/migration.c  |  5 
 migration/trace-events |  4 
 qemu-options.hx|  6 -
 6 files changed, 91 insertions(+), 1 deletion(-)
 create mode 100644 migration/file.c
 create mode 100644 migration/file.h

diff --git a/migration/file.c b/migration/file.c
new file mode 100644
index 000..8e35827
--- /dev/null
+++ b/migration/file.c
@@ -0,0 +1,62 @@
+/*
+ * Copyright (c) 2021-2023 Oracle and/or its affiliates.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "channel.h"
+#include "file.h"
+#include "migration.h"
+#include "io/channel-file.h"
+#include "io/channel-util.h"
+#include "trace.h"
+
+void file_start_outgoing_migration(MigrationState *s, const char *filename,
+   Error **errp)
+{
+g_autoptr(QIOChannelFile) fioc = NULL;
+QIOChannel *ioc;
+
+trace_migration_file_outgoing(filename);
+
+fioc = qio_channel_file_new_path(filename, O_CREAT | O_WRONLY | O_TRUNC,
+ 0600, errp);
+if (!fioc) {
+return;
+}
+
+ioc = QIO_CHANNEL(fioc);
+qio_channel_set_name(ioc, "migration-file-outgoing");
+migration_channel_connect(s, ioc, NULL, NULL);
+}
+
+static gboolean file_accept_incoming_migration(QIOChannel *ioc,
+   GIOCondition condition,
+   gpointer opaque)
+{
+migration_channel_process_incoming(ioc);
+object_unref(OBJECT(ioc));
+return G_SOURCE_REMOVE;
+}
+
+void file_start_incoming_migration(const char *filename, Error **errp)
+{
+QIOChannelFile *fioc = NULL;
+QIOChannel *ioc;
+
+trace_migration_file_incoming(filename);
+
+fioc = qio_channel_file_new_path(filename, O_RDONLY, 0, errp);
+if (!fioc) {
+return;
+}
+
+ioc = QIO_CHANNEL(fioc);
+qio_channel_set_name(QIO_CHANNEL(ioc), "migration-file-incoming");
+qio_channel_add_watch_full(ioc, G_IO_IN,
+   file_accept_incoming_migration,
+   NULL, NULL,
+   g_main_context_get_thread_default());
+}
diff --git a/migration/file.h b/migration/file.h
new file mode 100644
index 000..841b94a
--- /dev/null
+++ b/migration/file.h
@@ -0,0 +1,14 @@
+/*
+ * Copyright (c) 2021-2023 Oracle and/or its affiliates.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef QEMU_MIGRATION_FILE_H
+#define QEMU_MIGRATION_FILE_H
+void file_start_incoming_migration(const char *filename, Error **errp);
+
+void file_start_outgoing_migration(MigrationState *s, const char *filename,
+   Error **errp);
+#endif
diff --git a/migration/meson.build b/migration/meson.build
index 8ba6e42..3af817e 100644
--- a/migration/meson.build
+++ b/migration/meson.build
@@ -16,6 +16,7 @@ softmmu_ss.add(files(
   'dirtyrate.c',
   'exec.c',
   'fd.c',
+  'file.c',
   'global_state.c',
   'migration-hmp-cmds.c',
   'migration.c',
diff --git a/migration/migration.c b/migration/migration.c
index dc05c6f..cfbde86 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -20,6 +20,7 @@
 #include "migration/blocker.h"
 #include "exec.h"
 #include "fd.h"
+#include "file.h"
 #include "socket.h"
 #include "sysemu/runstate.h"
 #include "sysemu/sysemu.h"
@@ -442,6 +443,8 @@ static void qemu_start_incoming_migration(const char *uri, Error **errp)
 exec_start_incoming_migration(p, errp);
 } else if (strstart(uri, "fd:", &p)) {
 fd_start_incoming_migration(p, errp);
+} else if (strstart(uri, "file:", &p)) {
+file_start_incoming_migration(p, errp);
 } else {
 error_setg(errp, "unknown migration protocol: %s", uri);
 }
@@ -1662,6 +1665,8 @@ void qmp_migrate(const char *uri, bool has_

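The URI dispatch added to qemu_start_incoming_migration() above relies on QEMU's strstart() helper from util/cutils.c. For readers outside the tree, a hedged standalone equivalent of the prefix match plus remainder extraction:

```c
#include <assert.h>
#include <string.h>

/* Equivalent of QEMU's strstart(): report whether str begins with
 * prefix and, if so, optionally hand back the remainder. */
static int strstart(const char *str, const char *prefix, const char **ptr)
{
    size_t len = strlen(prefix);

    if (strncmp(str, prefix, len) != 0) {
        return 0;
    }
    if (ptr) {
        *ptr = str + len;
    }
    return 1;
}

/* Mirrors the URI dispatch: return the payload after "file:", or
 * NULL when some other transport was requested. */
static const char *file_uri_payload(const char *uri)
{
    const char *p = NULL;

    return strstart(uri, "file:", &p) ? p : NULL;
}
```

So "file:/tmp/vm.sav" dispatches to file_start_incoming_migration() with "/tmp/vm.sav" as the filename, exactly like the existing exec:/fd: branches.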
[PATCH V3 2/2] migration: file URI offset

2023-06-22 Thread Steve Sistare
Allow an offset option to be specified as part of the file URI, in
the form "file:filename,offset=offset", where offset accepts the common
size suffixes, or the 0x prefix, but not both.  Migration data is written
to and read from the file starting at offset.  If unspecified, it defaults
to 0.

This is needed by libvirt to store its own data at the head of the file.

Suggested-by: Daniel P. Berrange 
Signed-off-by: Steve Sistare 
---
 migration/file.c | 45 +++--
 qemu-options.hx  |  7 ---
 2 files changed, 47 insertions(+), 5 deletions(-)

diff --git a/migration/file.c b/migration/file.c
index 8e35827..8960be9 100644
--- a/migration/file.c
+++ b/migration/file.c
@@ -6,6 +6,8 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/cutils.h"
+#include "qapi/error.h"
 #include "channel.h"
 #include "file.h"
 #include "migration.h"
@@ -13,14 +15,41 @@
 #include "io/channel-util.h"
 #include "trace.h"
 
-void file_start_outgoing_migration(MigrationState *s, const char *filename,
+#define OFFSET_OPTION ",offset="
+
+/* Remove the offset option from @filespec and return it in @offsetp. */
+
+static int file_parse_offset(char *filespec, uint64_t *offsetp, Error **errp)
+{
+char *option = strstr(filespec, OFFSET_OPTION);
+int ret;
+
+if (option) {
+*option = 0;
+option += sizeof(OFFSET_OPTION) - 1;
+ret = qemu_strtosz(option, NULL, offsetp);
+if (ret) {
+error_setg_errno(errp, ret, "file URI has bad offset %s", option);
+return -1;
+}
+}
+return 0;
+}
+
+void file_start_outgoing_migration(MigrationState *s, const char *filespec,
Error **errp)
 {
+g_autofree char *filename = g_strdup(filespec);
 g_autoptr(QIOChannelFile) fioc = NULL;
+uint64_t offset = 0;
 QIOChannel *ioc;
 
 trace_migration_file_outgoing(filename);
 
+if (file_parse_offset(filename, &offset, errp)) {
+return;
+}
+
 fioc = qio_channel_file_new_path(filename, O_CREAT | O_WRONLY | O_TRUNC,
  0600, errp);
 if (!fioc) {
@@ -28,6 +57,9 @@ void file_start_outgoing_migration(MigrationState *s, const char *filename,
 }
 
 ioc = QIO_CHANNEL(fioc);
+if (offset && qio_channel_io_seek(ioc, offset, SEEK_SET, errp) < 0) {
+return;
+}
 qio_channel_set_name(ioc, "migration-file-outgoing");
 migration_channel_connect(s, ioc, NULL, NULL);
 }
@@ -41,19 +73,28 @@ static gboolean file_accept_incoming_migration(QIOChannel *ioc,
 return G_SOURCE_REMOVE;
 }
 
-void file_start_incoming_migration(const char *filename, Error **errp)
+void file_start_incoming_migration(const char *filespec, Error **errp)
 {
+g_autofree char *filename = g_strdup(filespec);
 QIOChannelFile *fioc = NULL;
+uint64_t offset = 0;
 QIOChannel *ioc;
 
 trace_migration_file_incoming(filename);
 
+if (file_parse_offset(filename, &offset, errp)) {
+return;
+}
+
 fioc = qio_channel_file_new_path(filename, O_RDONLY, 0, errp);
 if (!fioc) {
 return;
 }
 
 ioc = QIO_CHANNEL(fioc);
+if (offset && qio_channel_io_seek(ioc, offset, SEEK_SET, errp) < 0) {
+return;
+}
 qio_channel_set_name(QIO_CHANNEL(ioc), "migration-file-incoming");
 qio_channel_add_watch_full(ioc, G_IO_IN,
file_accept_incoming_migration,
diff --git a/qemu-options.hx b/qemu-options.hx
index 5aab8fb..5a92210 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -4622,7 +4622,7 @@ DEF("incoming", HAS_ARG, QEMU_OPTION_incoming, \
 "prepare for incoming migration, listen on\n" \
 "specified protocol and socket address\n" \
 "-incoming fd:fd\n" \
-"-incoming file:filename\n" \
+"-incoming file:filename[,offset=offset]\n" \
 "-incoming exec:cmdline\n" \
 "accept incoming migration on given file descriptor\n" \
 "or from given external command\n" \
@@ -4641,8 +4641,9 @@ SRST
 ``-incoming fd:fd``
 Accept incoming migration from a given file descriptor.
 
-``-incoming file:filename``
-Accept incoming migration from a given file.
+``-incoming file:filename[,offset=offset]``
+Accept incoming migration from a given file starting at offset.
+offset allows the common size suffixes, or a 0x prefix, but not both.
 
 ``-incoming exec:cmdline``
 Accept incoming migration as an output from specified external
-- 
1.8.3.1
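As a reading aid for file_parse_offset() above, here is a self-contained approximation: strtoull with a short suffix table stands in for QEMU's qemu_strtosz(), and only the k/M/G suffixes are modeled, so treat it as a sketch rather than a drop-in:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define OFFSET_OPTION ",offset="

/* Cut ",offset=..." off filespec (modified in place) and parse the
 * value: a k/M/G suffix or a 0x prefix is accepted, but not both. */
static int parse_offset(char *filespec, uint64_t *offsetp)
{
    char *option = strstr(filespec, OFFSET_OPTION);
    char *end;
    int is_hex;

    if (!option) {
        return 0;                      /* no option: leave offset alone */
    }
    *option = '\0';                    /* terminate the filename part */
    option += sizeof(OFFSET_OPTION) - 1;

    is_hex = option[0] == '0' && (option[1] == 'x' || option[1] == 'X');
    *offsetp = strtoull(option, &end, 0);   /* base 0 accepts 0x... */
    if (end == option) {
        return -1;                     /* no digits at all */
    }
    if (!is_hex) {                     /* suffix and 0x are exclusive */
        switch (*end) {
        case 'k': case 'K': *offsetp <<= 10; end++; break;
        case 'M':           *offsetp <<= 20; end++; break;
        case 'G':           *offsetp <<= 30; end++; break;
        default:            break;
        }
    }
    return *end == '\0' ? 0 : -1;      /* reject trailing junk */
}

/* Convenience wrapper for exercising the parser on string literals. */
static uint64_t parsed_offset(const char *spec)
{
    char buf[64];
    uint64_t off = 0;

    strcpy(buf, spec);
    return parse_offset(buf, &off) == 0 ? off : (uint64_t)-1;
}
```

With this, "vm.sav,offset=4k" yields the filename "vm.sav" and offset 4096, while "0x1k" is rejected because it mixes a hex prefix with a size suffix.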




[PATCH V3 0/2] migration file URI

2023-06-22 Thread Steve Sistare
Add the migration URI "file:filename[,offset=offset]".

Fabiano Rosas has also written preliminary patches for the file uri, and
he will submit the unit test(s).

Steve Sistare (2):
  migration: file URI
  migration: file URI offset

 migration/file.c   | 103 +
 migration/file.h   |  14 +++
 migration/meson.build  |   1 +
 migration/migration.c  |   5 +++
 migration/trace-events |   4 ++
 qemu-options.hx|   7 +++-
 6 files changed, 133 insertions(+), 1 deletion(-)
 create mode 100644 migration/file.c
 create mode 100644 migration/file.h

-- 
1.8.3.1




Re: [PATCH 0/4] hw: Minor simplifications using proper QOM getter macros

2023-06-22 Thread Michael S. Tsirkin
On Thu, Jun 08, 2023 at 09:41:58PM +0300, Michael Tokarev wrote:
> 23.05.2023 09:12, Philippe Mathieu-Daudé wrote:
> > Enforce QOM style. Besides, using the proper QOM macros
> > slightly simplifies the code.
> 
> Applied to my trivial-patches branch (Maybe it's time to resurrect it).
> 
> Thanks,
> 
> /mjt

pci things:

Reviewed-by: Michael S. Tsirkin 




Re: [PATCH v4 0/5] Support x2APIC mode with TCG accelerator

2023-06-22 Thread Michael S. Tsirkin
On Mon, May 22, 2023 at 11:31:52PM +0700, Bui Quang Minh wrote:
> Hi everyone,
> 
> This series implements x2APIC mode in userspace local APIC and the
> RDMSR/WRMSR helper to access x2APIC registers in x2APIC mode. Intel iommu
> and AMD iommu are adjusted to support x2APIC interrupt remapping. With this
> series, we can now boot Linux kernel into x2APIC mode with TCG accelerator
> using either Intel or AMD iommu.
> 
> Testing by booting my own built Linux 6.3.0-rc2, the kernel successfully
> boots with x2APIC enabled and can enumerate a CPU with APIC ID 257.
> Using Intel IOMMU
> 
> qemu/build/qemu-system-x86_64 \
>   -smp 2,maxcpus=260 \
>   -cpu qemu64,x2apic=on \
>   -machine q35 \
>   -device intel-iommu,intremap=on,eim=on \
>   -device qemu64-x86_64-cpu,x2apic=on,core-id=257,socket-id=0,thread-id=0 \
>   -m 2G \
>   -kernel $KERNEL_DIR \
>   -append "nokaslr console=ttyS0 root=/dev/sda earlyprintk=serial net.ifnames=0" \
>   -drive file=$IMAGE_DIR,format=raw \
>   -nographic \
>   -s
> 
> Using AMD IOMMU
> 
> qemu/build/qemu-system-x86_64 \
>   -smp 2,maxcpus=260 \
>   -cpu qemu64,x2apic=on \
>   -machine q35 \
>   -device amd-iommu,intremap=on,xtsup=on \
>   -device qemu64-x86_64-cpu,x2apic=on,core-id=257,socket-id=0,thread-id=0 \
>   -m 2G \
>   -kernel $KERNEL_DIR \
>   -append "nokaslr console=ttyS0 root=/dev/sda earlyprintk=serial net.ifnames=0" \
>   -drive file=$IMAGE_DIR,format=raw \
>   -nographic \
>   -s
> 
> Testing the emulated userspace APIC with kvm-unit-tests, disabling the test
> device with this patch:
> 
> diff --git a/lib/x86/fwcfg.c b/lib/x86/fwcfg.c
> index 1734afb..f56fe1c 100644
> --- a/lib/x86/fwcfg.c
> +++ b/lib/x86/fwcfg.c
> @@ -27,6 +27,7 @@ static void read_cfg_override(void)
>  
> if ((str = getenv("TEST_DEVICE")))
> no_test_device = !atol(str);
> +   no_test_device = true;
>  
> if ((str = getenv("MEMLIMIT")))
> fw_override[FW_CFG_MAX_RAM] = atol(str) * 1024 * 1024;
> 
> ~ env QEMU=/home/minh/Desktop/oss/qemu/build/qemu-system-x86_64 ACCEL=tcg \
> ./run_tests.sh -v -g apic 
> 
> TESTNAME=apic-split TIMEOUT=90s ACCEL=tcg ./x86/run x86/apic.flat -smp 2 -cpu qemu64,+x2apic,+tsc-deadline -machine kernel_irqchip=split
> FAIL apic-split (54 tests, 8 unexpected failures, 1 skipped)
> TESTNAME=ioapic-split TIMEOUT=90s ACCEL=tcg ./x86/run x86/ioapic.flat -smp 1 -cpu qemu64 -machine kernel_irqchip=split
> PASS ioapic-split (19 tests)
> TESTNAME=x2apic TIMEOUT=30 ACCEL=tcg ./x86/run x86/apic.flat -smp 2 -cpu qemu64,+x2apic,+tsc-deadline
> FAIL x2apic (54 tests, 8 unexpected failures, 1 skipped)
> TESTNAME=xapic TIMEOUT=60 ACCEL=tcg ./x86/run x86/apic.flat -smp 2 -cpu qemu64,-x2apic,+tsc-deadline -machine pit=off
> FAIL xapic (43 tests, 6 unexpected failures, 2 skipped)
> 
>   FAIL: apic_disable: *0xfee00030: 50014
>   FAIL: apic_disable: *0xfee00080: f0
>   FAIL: apic_disable: *0xfee00030: 50014
>   FAIL: apic_disable: *0xfee00080: f0 
>   FAIL: apicbase: relocate apic
> 
> These errors are because we don't disable the MMIO region when switching to
> x2APIC and don't support relocating the MMIO region yet. This is a problem
> because the MMIO region is the same for all CPUs; to support these we need
> to figure out how to allocate and manage different MMIO regions for each CPU.

Oh interesting point.
Paolo what do you say? Can memory core support something like this?

> This can be an improvement in the future.
> 
>   FAIL: nmi-after-sti
>   FAIL: multiple nmi
> 
> These errors are in the way we handle CPU_INTERRUPT_NMI in core TCG.
> 
>   FAIL: TMCCT should stay at zero
> 
> This error is related to APIC timer which should be addressed in separate
> patch.
> 
> Version 4 changes,
> - Patch 5:
>   + Instead of replacing IVHD type 0x10 with type 0x11, export both types
>   for backward compatibility with old guest operating system
>   + Flip the xtsup feature check condition in amdvi_int_remap_ga for
>   readability
> 
> Version 3 changes,
> - Patch 2:
>   + Allow APIC ID > 255 only when x2APIC feature is supported on CPU
>   + Make physical destination mode IPI which has destination id 0x
>   a broadcast to xAPIC CPUs
>   + Make cluster address 0xf in cluster model of xAPIC logical destination
>   mode a broadcast to all clusters
>   + Create new extended_log_dest to store APIC_LDR information in x2APIC
>   instead of extending log_dest for backward compatibility in vmstate
> 
> Version 2 changes,
> - Add support for APIC ID larger than 255
> - Adjust AMD iommu for x2APIC suuport
> - Reorganize and split patch 1,2 into patch 1,2,3 in version 2
> 
> Thanks,
> Quang Minh.
> 
> Bui Quang Minh (5):
>   i386/tcg: implement x2APIC registers MSR access
>   apic: add support for x2APIC mode
>   apic, i386/tcg: add x2apic transitions
>   intel_iommu: allow Extended Interrupt Mode when using userspace APIC
>   amd_iommu: report x2APIC support to the operating system
> 
>  hw/i386/acpi-build.c | 127 +
>  hw/i386/am
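For reference while reading the destination-mode changes above: in x2APIC mode the Logical Destination Register is not programmed by software but derived from the APIC ID (cluster model, per the Intel SDM). A standalone sketch of that derivation and the match rule; this is illustrative, not the series' code:

```c
#include <assert.h>
#include <stdint.h>

/* x2APIC logical destination (cluster) model: the LDR is derived from
 * the APIC ID as cluster = id >> 4 in bits 31:16, plus a one-hot bit
 * (id & 0xf) selecting the CPU within the cluster. */
static uint32_t x2apic_ldr(uint32_t apic_id)
{
    return ((apic_id >> 4) << 16) | (1u << (apic_id & 0xf));
}

/* A logical IPI hits a CPU when the cluster ids match and the
 * destination's low 16 bits select the CPU's one-hot bit. */
static int x2apic_logical_match(uint32_t dest, uint32_t ldr)
{
    return (dest >> 16) == (ldr >> 16) && (dest & ldr & 0xffff) != 0;
}
```

Using the cover letter's APIC ID 257 (0x101): it lands in cluster 16 with intra-cluster bit 1, so its derived LDR is 0x00100002.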

Re: [PATCH v4 5/5] amd_iommu: report x2APIC support to the operating system

2023-06-22 Thread Michael S. Tsirkin
On Mon, May 22, 2023 at 11:31:57PM +0700, Bui Quang Minh wrote:
> This commit adds an XTSup configuration to let the user choose whether to
> enable this feature or not. When XTSup is enabled, additional bytes in the
> IRTE with guest virtual VAPIC enabled are used to support a 32-bit
> destination id.
> 
> Additionally, this commit exports IVHD type 0x11 besides the old IVHD type
> 0x10 in the ACPI table. IVHD type 0x10 does not report the full set of IOMMU
> features, only the legacy ones, so an operating system (e.g. Linux) may only
> detect x2APIC support if IVHD type 0x11 is available. IVHD type 0x10
> is kept so that old operating systems that only parse type 0x10 can detect
> the IOMMU device.
> 
> Signed-off-by: Bui Quang Minh 
> ---
>  hw/i386/acpi-build.c | 127 ++-
>  hw/i386/amd_iommu.c  |  21 ++-
>  hw/i386/amd_iommu.h  |  16 --
>  3 files changed, 108 insertions(+), 56 deletions(-)
> 
> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> index 512162003b..4459122e56 100644
> --- a/hw/i386/acpi-build.c
> +++ b/hw/i386/acpi-build.c
> @@ -2339,30 +2339,23 @@ static void
>  build_amd_iommu(GArray *table_data, BIOSLinker *linker, const char *oem_id,
>  const char *oem_table_id)
>  {
> -int ivhd_table_len = 24;
>  AMDVIState *s = AMD_IOMMU_DEVICE(x86_iommu_get_default());
>  GArray *ivhd_blob = g_array_new(false, true, 1);
>  AcpiTable table = { .sig = "IVRS", .rev = 1, .oem_id = oem_id,
>  .oem_table_id = oem_table_id };
> +uint64_t feature_report;
>  
>  acpi_table_begin(&table, table_data);
>  /* IVinfo - IO virtualization information common to all
>   * IOMMU units in a system
>   */
> -build_append_int_noprefix(table_data, 40UL << 8/* PASize */, 4);
> +build_append_int_noprefix(table_data,
> + (1UL << 0) | /* EFRSup */
> + (40UL << 8), /* PASize */
> + 4);
>  /* reserved */
>  build_append_int_noprefix(table_data, 0, 8);
>  
> -/* IVHD definition - type 10h */
> -build_append_int_noprefix(table_data, 0x10, 1);
> -/* virtualization flags */
> -build_append_int_noprefix(table_data,
> - (1UL << 0) | /* HtTunEn  */
> - (1UL << 4) | /* iotblSup */
> - (1UL << 6) | /* PrefSup  */
> - (1UL << 7),  /* PPRSup   */
> - 1);
> -
>  /*
>   * A PCI bus walk, for each PCI host bridge, is necessary to create a
>   * complete set of IVHD entries.  Do this into a separate blob so that we
> @@ -2382,56 +2375,92 @@ build_amd_iommu(GArray *table_data, BIOSLinker *linker, const char *oem_id,
>  build_append_int_noprefix(ivhd_blob, 0x001, 4);
>  }
>  
> -ivhd_table_len += ivhd_blob->len;
> -
>  /*
>   * When interrupt remapping is supported, we add a special IVHD device
> - * for type IO-APIC.
> - */
> -if (x86_iommu_ir_supported(x86_iommu_get_default())) {
> -ivhd_table_len += 8;
> -}
> -
> -/* IVHD length */
> -build_append_int_noprefix(table_data, ivhd_table_len, 2);
> -/* DeviceID */
> -build_append_int_noprefix(table_data,
> -  object_property_get_int(OBJECT(&s->pci), 
> "addr",
> -  &error_abort), 2);
> -/* Capability offset */
> -build_append_int_noprefix(table_data, s->pci.capab_offset, 2);
> -/* IOMMU base address */
> -build_append_int_noprefix(table_data, s->mmio.addr, 8);
> -/* PCI Segment Group */
> -build_append_int_noprefix(table_data, 0, 2);
> -/* IOMMU info */
> -build_append_int_noprefix(table_data, 0, 2);
> -/* IOMMU Feature Reporting */
> -build_append_int_noprefix(table_data,
> - (48UL << 30) | /* HATS   */
> - (48UL << 28) | /* GATS   */
> - (1UL << 2)   | /* GTSup  */
> - (1UL << 6),/* GASup  */
> - 4);
> -
> -/* IVHD entries as found above */
> -g_array_append_vals(table_data, ivhd_blob->data, ivhd_blob->len);
> -g_array_free(ivhd_blob, TRUE);
> -
> -/*
> - * Add a special IVHD device type.
> + * for type IO-APIC
>   * Refer to spec - Table 95: IVHD device entry type codes
>   *
>   * Linux IOMMU driver checks for the special IVHD device (type IO-APIC).
>   * See Linux kernel commit 'c2ff5cf5294bcbd7fa50f7d860e90a66db7e5059'
>   */
>  if (x86_iommu_ir_supported(x86_iommu_get_default())) {
> -build_append_int_noprefix(table_data,
> +build_append_int_noprefix(ivhd_blob,
>   (0x1ull << 56) |   /* type IOAPIC */
>   (IOAPIC_SB_DEVID << 40) |  /* IOAP

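The IVRS construction above is a chain of build_append_int_noprefix() calls packing bit fields into little-endian words. A simplified, self-contained model of that helper and of the IVinfo word being built; the fixed buffer is a stand-in for QEMU's GArray:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified model of QEMU's build_append_int_noprefix(): append a
 * size-byte little-endian integer field to a growing table blob. */
static size_t append_int_le(uint8_t *buf, size_t pos, uint64_t value, int size)
{
    int i;

    for (i = 0; i < size; i++) {
        buf[pos + i] = (uint8_t)(value >> (8 * i));
    }
    return pos + size;
}

/* Fields are packed by OR-ing shifted values, as in the IVinfo word
 * above: EFRSup in bit 0, PASize (40) in bits 15:8. */
static uint32_t ivinfo_word(void)
{
    return (1u << 0) |   /* EFRSup */
           (40u << 8);   /* PASize */
}

/* Check the on-disk layout: 0x2801 serializes as 01 28 00 00. */
static int le_layout_ok(void)
{
    uint8_t buf[8] = { 0 };
    size_t n = append_int_le(buf, 0, ivinfo_word(), 4);

    return n == 4 && buf[0] == 0x01 && buf[1] == 0x28 &&
           buf[2] == 0x00 && buf[3] == 0x00;
}
```

The same OR-of-shifted-fields idiom produces the IVHD feature-report word and the special IVHD device entries in the hunk above.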
Re: [PATCH v3 3/4] vhost-user: add shared_object msg

2023-06-22 Thread Marc-André Lureau
Hi

On Wed, May 24, 2023 at 11:13 AM Albert Esteve  wrote:

> Add new vhost-user protocol message
> `VHOST_USER_BACKEND_SHARED_OBJECT`. This new
> message is sent from vhost-user back-ends
> to interact with the virtio-dmabuf table
> in order to add, remove, or look up
> virtio dma-buf shared objects.
>
> The action taken in the front-end depends
> on the type stored in the payload struct.
>
> In the libvhost-user library add helper
> functions to allow sending messages to
> interact with the virtio shared objects
> hash table.
>
> Signed-off-by: Albert Esteve 
> ---
>  docs/interop/vhost-user.rst   | 15 
>  hw/virtio/vhost-user.c| 68 ++
>  subprojects/libvhost-user/libvhost-user.c | 88 +++
>  subprojects/libvhost-user/libvhost-user.h | 56 +++
>  4 files changed, 227 insertions(+)
>
> diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst
> index 5a070adbc1..d3d8db41e5 100644
> --- a/docs/interop/vhost-user.rst
> +++ b/docs/interop/vhost-user.rst
> @@ -1528,6 +1528,21 @@ is sent by the front-end.
>
>The state.num field is currently reserved and must be set to 0.
>
> +``VHOST_USER_BACKEND_SHARED_OBJECT``
> +  :id: 6
> +  :equivalent ioctl: N/A
> +  :request payload: ``struct VhostUserShared``
> +  :reply payload: ``struct VhostUserShared`` (only for ``LOOKUP``
> requests)
>

only for LOOKUP, ahah...


> +
> +  Backends that need to interact with the virtio-dmabuf shared table API
> +  can send this message. The operation is determined by the ``type``
> member
> +  of the payload struct. The valid values for the operation type are
> +  ``VHOST_SHARED_OBJECT_*`` members, i.e., ``ADD``, ``LOOKUP``, and
> ``REMOVE``.
>

...why not use specific messages instead of this extra "type"?


> +  ``LOOKUP`` operations require the ``VHOST_USER_NEED_REPLY_MASK`` flag
> to be
> +  set by the back-end, and the front-end will then send the dma-buf fd as
> +  a response if the UUID matches an object in the table, or a negative
> value
> +  otherwise.
>

This new message(s) should be initially negotiated with a protocol feature
flag.


> +
>  .. _reply_ack:
>
>  VHOST_USER_PROTOCOL_F_REPLY_ACK
> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> index 74a2a28663..5ac5f0eafd 100644
> --- a/hw/virtio/vhost-user.c
> +++ b/hw/virtio/vhost-user.c
> @@ -10,6 +10,7 @@
>
>  #include "qemu/osdep.h"
>  #include "qapi/error.h"
> +#include "hw/virtio/virtio-dmabuf.h"
>  #include "hw/virtio/vhost.h"
>  #include "hw/virtio/vhost-user.h"
>  #include "hw/virtio/vhost-backend.h"
> @@ -20,6 +21,7 @@
>  #include "sysemu/kvm.h"
>  #include "qemu/error-report.h"
>  #include "qemu/main-loop.h"
> +#include "qemu/uuid.h"
>  #include "qemu/sockets.h"
>  #include "sysemu/runstate.h"
>  #include "sysemu/cryptodev.h"
> @@ -128,6 +130,7 @@ typedef enum VhostUserSlaveRequest {
>  VHOST_USER_BACKEND_IOTLB_MSG = 1,
>  VHOST_USER_BACKEND_CONFIG_CHANGE_MSG = 2,
>  VHOST_USER_BACKEND_VRING_HOST_NOTIFIER_MSG = 3,
> +VHOST_USER_BACKEND_SHARED_OBJECT = 6,
>  VHOST_USER_BACKEND_MAX
>  }  VhostUserSlaveRequest;
>
> @@ -190,6 +193,18 @@ typedef struct VhostUserInflight {
>  uint16_t queue_size;
>  } VhostUserInflight;
>
> +typedef enum VhostUserSharedType {
> +VHOST_SHARED_OBJECT_ADD = 0,
> +VHOST_SHARED_OBJECT_LOOKUP,
> +VHOST_SHARED_OBJECT_REMOVE,
> +} VhostUserSharedType;
> +
> +typedef struct VhostUserShared {
> +unsigned char uuid[16];
> +VhostUserSharedType type;
> +int dmabuf_fd;
> +} VhostUserShared;
> +
>  typedef struct {
>  VhostUserRequest request;
>
> @@ -214,6 +229,7 @@ typedef union {
>  VhostUserCryptoSession session;
>  VhostUserVringArea area;
>  VhostUserInflight inflight;
> +VhostUserShared object;
>  } VhostUserPayload;
>
>  typedef struct VhostUserMsg {
> @@ -1582,6 +1598,52 @@ static int vhost_user_slave_handle_vring_host_notifier(struct vhost_dev *dev,
>  return 0;
>  }
>
> +static int vhost_user_backend_handle_shared_object(VhostUserShared
> *object)
> +{
> +QemuUUID uuid;
> +memcpy(uuid.data, object->uuid, sizeof(object->uuid));
> +
> +switch (object->type) {
> +case VHOST_SHARED_OBJECT_ADD:
> +return virtio_add_dmabuf(&uuid, object->dmabuf_fd);
> +case VHOST_SHARED_OBJECT_LOOKUP:
> +object->dmabuf_fd = virtio_lookup_dmabuf(&uuid);
> +if (object->dmabuf_fd < 0) {
> +return object->dmabuf_fd;
> +}
> +break;
> +case VHOST_SHARED_OBJECT_REMOVE:
> +return virtio_remove_resource(&uuid);
> +}
> +
> +return 0;
> +}
> +
> +static bool
> +vhost_user_backend_send_dmabuf_fd(QIOChannel *ioc, VhostUserHeader *hdr,
> +  VhostUserPayload *payload)
> +{
> +Error *local_err = NULL;
> +struct iovec iov[2];
> +if (hdr->flags & VHOST_USER_NEED_REPLY_MASK) {
> +hdr->flags &= ~VHOST_USER_NEED_REPLY_MASK;
> +

Re: [PATCH v2 00/17] Support smp.clusters for x86

2023-06-22 Thread Michael S. Tsirkin
On Mon, May 29, 2023 at 08:30:44PM +0800, Zhao Liu wrote:
> From: Zhao Liu 
> 
> Hi list,
> 
> This is our v2 patch series, rebased on the master branch at the
> commit ac84b57b4d74 ("Merge tag 'for-upstream' of
> https://gitlab.com/bonzini/qemu into staging").
> 
> Compared with v1 [1], v2 mainly reorganizes the patches and does some
> cleanup.
> 
> This series adds cluster support for the x86 PC machine, which allows
> x86 to use smp.clusters to configure the module-level CPU topology
> of x86.
> 
> And due to the compatibility issue (see section: ## Why not share L2
> cache in cluster directly), this series also introduces a new command
> to adjust the topology of the x86 L2 cache.
> 
> Welcome your comments!

PC things:

Acked-by: Michael S. Tsirkin 



> 
> # Background
> 
> The "clusters" parameter in "smp" was introduced by ARM [2], but x86
> doesn't support it yet.
> 
> At present, x86 defaults to the L2 cache being shared in one core, but
> this is not enough. There are some platforms where multiple cores share
> the same L2 cache, e.g., Alder Lake-P shares one L2 cache per module of
> Atom cores [3], that is, every four Atom cores share one L2 cache.
> Therefore, we need the new CPU topology level (cluster/module).
> 
> Another reason is hybrid architecture. Cluster support not only
> provides another level of topology definition in x86, but would also
> provide the code changes required for our future hybrid topology
> support.
> 
> 
> # Overview
> 
> ## Introduction of module level for x86
> 
> "cluster" in smp is the CPU topology level which is between "core" and
> die.
> 
> For x86, the "cluster" in smp corresponds to the module level [4],
> which is above the core level. So we use "module" rather than "cluster"
> in the x86 code.
> 
> And please note that x86 already has a CPU topology level also named
> "cluster" [4]; that level sits above the package. Here, the cluster in
> the x86 CPU topology is completely different from "clusters" as the smp
> parameter. After the module level is introduced, the cluster as the smp
> parameter will actually refer to the module level of x86.
> 
> 
> ## Why not share L2 cache in cluster directly
> 
> Though "clusters" was introduced to help define L2 cache topology
> [2], using cluster to define x86's L2 cache topology will cause the
> compatibility problem:
> 
> Currently, x86 defaults to the L2 cache being shared in one core, which
> actually implies a default setting "cores per L2 cache is 1" and
> therefore implicitly defaults to having as many L2 caches as cores.
> 
> For example (i386 PC machine):
> -smp 16,sockets=2,dies=2,cores=2,threads=2,maxcpus=16 (*)
> 
> Considering the topology of the L2 cache, this (*) implicitly means "1
> core per L2 cache" and "2 L2 caches per die".
> 
> If we use cluster to configure L2 cache topology with the new default
> setting "clusters per L2 cache is 1", the above semantics will change
> to "2 cores per cluster" and "1 cluster per L2 cache", that is, "2
> cores per L2 cache".
> 
> So the same command (*) will cause changes in the L2 cache topology,
> further affecting the performance of the virtual machine.
> 
> Therefore, x86 should only treat cluster as a cpu topology level and
> avoid using it to change L2 cache by default for compatibility.
> 
> 
> ## module level in CPUID
> 
> Currently, we don't expose module level in CPUID.1FH because currently
> linux (v6.2-rc6) doesn't support module level. And exposing module and
> die levels at the same time in CPUID.1FH will cause linux to calculate
> wrong die_id. The module level should not be exposed until a real
> machine has the module level in CPUID.1FH.
> 
> We can configure CPUID.04H.02H (L2 cache topology) with module level by
> a new command:
> 
> "-cpu,x-l2-cache-topo=cluster"
> 
> More information about this command, please see the section: "## New
> property: x-l2-cache-topo".
> 
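If the property lands as described, a full invocation might look like the following; the machine type, CPU model, and the exact `-cpu` syntax shown above are assumptions taken from the cover letter, not a tested command:

```shell
# Hypothetical invocation: 16 vCPUs, 2 clusters (modules) per die,
# L2 cache shared per cluster via the proposed x-l2-cache-topo property.
qemu-system-x86_64 -machine q35 \
    -smp 16,sockets=1,dies=2,clusters=2,cores=2,threads=2 \
    -cpu host,x-l2-cache-topo=cluster
```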
> 
> ## New cache topology info in CPUCacheInfo
> 
> Currently, by default, the cache topology is encoded as:
> 1. i/d cache is shared in one core.
> 2. L2 cache is shared in one core.
> 3. L3 cache is shared in one die.
> 
> This default general setting has caused a misunderstanding, that is, the
> cache topology is completely equated with a specific cpu topology, such
> as the connection between L2 cache and core level, and the connection
> between L3 cache and die level.
> 
> In fact, the settings of these topologies depend on the specific
> platform and are not static. For example, on Alder Lake-P, every
> four Atom cores share the same L2 cache [2].
> 
> Thus, in this patch set, we explicitly define the corresponding cache
> topology for different cpu models and this has two benefits:
> 1. Easy to expand to new CPU models in the future, which has different
>cache topology.
> 2. It can easily support custom cache topology by some command (e.g.,
>x-l2-cache-topo).
> 
> 
> ## New property: x-l2-cache-topo
> 
> The property l2-cache-topo will be used to change the L2 ca

Re: [PATCH 00/10] memory-device: Some cleanups

2023-06-22 Thread Michael S. Tsirkin
On Tue, May 30, 2023 at 01:38:28PM +0200, David Hildenbrand wrote:
> Working on adding multi-memslot support for virtio-mem (teaching memory
> device code about memory devices that can consume multiple memslots), I
> have some preparatory cleanups in my queue that make sense independent of
> the actual memory-device/virtio-mem extensions.

pc/acpi things:

Acked-by: Michael S. Tsirkin 


> v1 -> v2:
> - Allocate ms->device_memory only if the size > 0.
> - Split it up and include more cleanups
> 
> David Hildenbrand (10):
>   memory-device: Unify enabled vs. supported error messages
>   memory-device: Introduce memory_devices_init()
>   hw/arm/virt: Use memory_devices_init()
>   hw/ppc/spapr: Use memory_devices_init()
>   hw/loongarch/virt: Use memory_devices_init()
>   hw/i386/pc: Use memory_devices_init()
>   hw/i386/acpi-build: Rely on machine->device_memory when building SRAT
>   hw/i386/pc: Remove PC_MACHINE_DEVMEM_REGION_SIZE
>   memory-device: Refactor memory_device_pre_plug()
>   memory-device: Track used region size in DeviceMemoryState
> 
>  hw/arm/virt.c  |  9 +
>  hw/i386/acpi-build.c   |  9 ++---
>  hw/i386/pc.c   | 36 +++---
>  hw/loongarch/virt.c| 14 ++-
>  hw/mem/memory-device.c | 69 +++---
>  hw/ppc/spapr.c | 37 +-
>  hw/ppc/spapr_hcall.c   |  2 +-
>  include/hw/boards.h|  2 +
>  include/hw/i386/pc.h   |  1 -
>  include/hw/mem/memory-device.h |  2 +
>  10 files changed, 68 insertions(+), 113 deletions(-)
> 
> -- 
> 2.40.1




Re: [PATCH 0/4] tests/qtest: Check for devices before using them

2023-06-22 Thread Michael S. Tsirkin
On Thu, May 25, 2023 at 10:10:12AM +0200, Thomas Huth wrote:
> Here are some more patches that are required for running the qtests
> with builds that have been configured with "--without-default-devices".
> We need to check whether the required devices are really available
> in the binaries before we can use them, otherwise the tests will
> fail.
> 
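The spirit of those guards can be sketched at the shell level too. `qemu_has_device` below is a made-up helper, not the series' actual libqtest code, and the binary/device names are just examples:

```shell
# Sketch: decide whether a test can run by asking the QEMU binary
# which devices were compiled in ("--without-default-devices" builds
# may lack some).
qemu_has_device() {
    "$1" -device help 2>/dev/null | grep -q "name \"$2\""
}

if ! qemu_has_device qemu-system-x86_64 rtl8139; then
    echo "rtl8139 not available in this build, skipping"
fi
```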
> Thomas Huth (4):
>   tests/qtest/usb-hcd-uhci-test: Check whether "usb-storage" is
> available
>   tests/qtest: Check for virtio-blk before using -cdrom with the arm
> virt machine
>   tests/qtest/rtl8139-test: Check whether the rtl8139 device is
> available
>   tests/qtest/usb-hcd-ehci-test: Check for EHCI and UHCI HCDs before
> using them
> 
>  tests/qtest/bios-tables-test.c  | 2 +-
>  tests/qtest/cdrom-test.c| 6 +-
>  tests/qtest/rtl8139-test.c  | 4 
>  tests/qtest/usb-hcd-ehci-test.c | 5 +
>  tests/qtest/usb-hcd-uhci-test.c | 4 +++-
>  5 files changed, 18 insertions(+), 3 deletions(-)

I am worried that if an unintentional change disables some devices
by default, our CI will no longer catch this.
Any way to address this? E.g. maybe introduce a "for CI" or
"test all" configure flag and then make test fail if something
hasn't been configured?

> -- 
> 2.31.1




Re: [PATCH v3 3/4] vhost-user: add shared_object msg

2023-06-22 Thread Michael S. Tsirkin
On Wed, May 24, 2023 at 11:13:32AM +0200, Albert Esteve wrote:
> Add new vhost-user protocol message
> `VHOST_USER_BACKEND_SHARED_OBJECT`. This new
> message is sent from vhost-user back-ends
> to interact with the virtio-dmabuf table
> in order to add, remove, or lookup for
> virtio dma-buf shared objects.
> 
> The action taken in the front-end depends
> on the type stored in the payload struct.
> 
> In the libvhost-user library add helper
> functions to allow sending messages to
> interact with the virtio shared objects
> hash table.
> 
> Signed-off-by: Albert Esteve 
> ---
>  docs/interop/vhost-user.rst   | 15 
>  hw/virtio/vhost-user.c| 68 ++
>  subprojects/libvhost-user/libvhost-user.c | 88 +++
>  subprojects/libvhost-user/libvhost-user.h | 56 +++
>  4 files changed, 227 insertions(+)
> 
> diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst
> index 5a070adbc1..d3d8db41e5 100644
> --- a/docs/interop/vhost-user.rst
> +++ b/docs/interop/vhost-user.rst
> @@ -1528,6 +1528,21 @@ is sent by the front-end.
>  
>The state.num field is currently reserved and must be set to 0.
>  
> +``VHOST_USER_BACKEND_SHARED_OBJECT``
> +  :id: 6
> +  :equivalent ioctl: N/A
> +  :request payload: ``struct VhostUserShared``
> +  :reply payload: ``struct VhostUserShared`` (only for ``LOOKUP`` requests)
> +
> +  Backends that need to interact with the virtio-dmabuf shared table API
> +  can send this message. The operation is determined by the ``type`` member
> +  of the payload struct. The valid values for the operation type are
> +  ``VHOST_SHARED_OBJECT_*`` members, i.e., ``ADD``, ``LOOKUP``, and 
> ``REMOVE``.
> +  ``LOOKUP`` operations require the ``VHOST_USER_NEED_REPLY_MASK`` flag to be
> +  set by the back-end, and the front-end will then send the dma-buf fd as
> +  a response if the UUID matches an object in the table, or a negative value
> +  otherwise.
> +
>  .. _reply_ack:
>  
>  VHOST_USER_PROTOCOL_F_REPLY_ACK
> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> index 74a2a28663..5ac5f0eafd 100644
> --- a/hw/virtio/vhost-user.c
> +++ b/hw/virtio/vhost-user.c
> @@ -10,6 +10,7 @@
>  
>  #include "qemu/osdep.h"
>  #include "qapi/error.h"
> +#include "hw/virtio/virtio-dmabuf.h"
>  #include "hw/virtio/vhost.h"
>  #include "hw/virtio/vhost-user.h"
>  #include "hw/virtio/vhost-backend.h"
> @@ -20,6 +21,7 @@
>  #include "sysemu/kvm.h"
>  #include "qemu/error-report.h"
>  #include "qemu/main-loop.h"
> +#include "qemu/uuid.h"
>  #include "qemu/sockets.h"
>  #include "sysemu/runstate.h"
>  #include "sysemu/cryptodev.h"
> @@ -128,6 +130,7 @@ typedef enum VhostUserSlaveRequest {
>  VHOST_USER_BACKEND_IOTLB_MSG = 1,
>  VHOST_USER_BACKEND_CONFIG_CHANGE_MSG = 2,
>  VHOST_USER_BACKEND_VRING_HOST_NOTIFIER_MSG = 3,
> +VHOST_USER_BACKEND_SHARED_OBJECT = 6,
>  VHOST_USER_BACKEND_MAX
>  }  VhostUserSlaveRequest;
>  
> @@ -190,6 +193,18 @@ typedef struct VhostUserInflight {
>  uint16_t queue_size;
>  } VhostUserInflight;
>  
> +typedef enum VhostUserSharedType {
> +VHOST_SHARED_OBJECT_ADD = 0,
> +VHOST_SHARED_OBJECT_LOOKUP,
> +VHOST_SHARED_OBJECT_REMOVE,
> +} VhostUserSharedType;
> +
> +typedef struct VhostUserShared {
> +unsigned char uuid[16];
> +VhostUserSharedType type;
> +int dmabuf_fd;
> +} VhostUserShared;
> +
>  typedef struct {
>  VhostUserRequest request;
>  
> @@ -214,6 +229,7 @@ typedef union {
>  VhostUserCryptoSession session;
>  VhostUserVringArea area;
>  VhostUserInflight inflight;
> +VhostUserShared object;
>  } VhostUserPayload;
>  
>  typedef struct VhostUserMsg {
> @@ -1582,6 +1598,52 @@ static int 
> vhost_user_slave_handle_vring_host_notifier(struct vhost_dev *dev,
>  return 0;
>  }
>  
> +static int vhost_user_backend_handle_shared_object(VhostUserShared *object)
> +{
> +QemuUUID uuid;
> +memcpy(uuid.data, object->uuid, sizeof(object->uuid));
> +
> +switch (object->type) {
> +case VHOST_SHARED_OBJECT_ADD:
> +return virtio_add_dmabuf(&uuid, object->dmabuf_fd);
> +case VHOST_SHARED_OBJECT_LOOKUP:
> +object->dmabuf_fd = virtio_lookup_dmabuf(&uuid);
> +if (object->dmabuf_fd < 0) {
> +return object->dmabuf_fd;
> +}
> +break;
> +case VHOST_SHARED_OBJECT_REMOVE:
> +return virtio_remove_resource(&uuid);
> +}
> +
> +return 0;
> +}
> +
> +static bool
> +vhost_user_backend_send_dmabuf_fd(QIOChannel *ioc, VhostUserHeader *hdr,
> +  VhostUserPayload *payload)
> +{
> +Error *local_err = NULL;
> +struct iovec iov[2];
> +if (hdr->flags & VHOST_USER_NEED_REPLY_MASK) {
> +hdr->flags &= ~VHOST_USER_NEED_REPLY_MASK;
> +hdr->flags |= VHOST_USER_REPLY_MASK;
> +
> +hdr->size = sizeof(payload->object);
> +
> +iov[0].iov_base = hdr;
> +iov[0].iov_len = VHOST_USER_H

Re: [PATCH v3 0/4] Virtio shared dma-buf

2023-06-22 Thread Michael S. Tsirkin
On Wed, Jun 21, 2023 at 10:20:25AM +0200, Albert Esteve wrote:
> Hi!
> 
> It has been a month since I sent this patch, so I'll give it a bump to get 
> some
> attention back.
> 
> @mst and @Fam any comments? What would be the next steps to take to move this
> forward?
> 
> BR,
> Albert

No one seems to be worried by this patchset so I queued it.

-- 
MST




[PATCH v2 5/5] migration: Deprecate old compression method

2023-06-22 Thread Juan Quintela
Signed-off-by: Juan Quintela 
---
 docs/about/deprecated.rst |   8 +++
 qapi/migration.json   | 102 --
 migration/options.c   |  13 +
 3 files changed, 86 insertions(+), 37 deletions(-)

diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst
index 2d7c48185e..792de61c8b 100644
--- a/docs/about/deprecated.rst
+++ b/docs/about/deprecated.rst
@@ -457,3 +457,11 @@ Please see "QMP invocation for live storage migration with
 ``drive-mirror`` + NBD" in docs/interop/live-block-operations.rst for
 a detailed explanation.
 
+old compression method (since 8.1)
+''
+
+The compression method fails too often.  Too many races.  We are going to
+remove it if nobody fixes it.  For starters, the migration-test
+compression tests are disabled because they fail randomly.  If you need
+compression, use multifd compression methods.
+
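The multifd alternative mentioned above is driven via QMP roughly like this (a sketch; the host, port, and channel count are placeholder values):

```json
{ "execute": "migrate-set-capabilities",
  "arguments": { "capabilities": [
      { "capability": "multifd", "state": true } ] } }
{ "execute": "migrate-set-parameters",
  "arguments": { "multifd-channels": 4,
                 "multifd-compression": "zlib" } }
{ "execute": "migrate",
  "arguments": { "uri": "tcp:destination-host:4444" } }
```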
diff --git a/qapi/migration.json b/qapi/migration.json
index 08dee855cb..11f759b90b 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -244,7 +244,9 @@
 #
 # @compression: migration compression statistics, only returned if
 # compression feature is on and status is 'active' or 'completed'
-# (Since 3.1)
+# This feature is unreliable and not tested. It is recommended to
+# use multifd migration instead, which offers an alternative
+# reliable and tested compression implementation.  (Since 3.1)
 #
 # @socket-address: Only used for tcp, to know what the real port is
 # (Since 4.0)
@@ -272,8 +274,11 @@
 #
 # Features:
 #
-# @deprecated: @disk migration is deprecated.  Use drive-mirror
-# with NBD instead.
+# @deprecated: @disk migration is deprecated.  Use drive-mirror with
+# NBD instead.  @compression is unreliable and untested. It is
+# recommended to use multifd migration, which offers an
+# alternative compression implementation that is reliable and
+# tested.
 #
 # Since: 0.14
 ##
@@ -291,7 +296,7 @@
'*blocked-reasons': ['str'],
'*postcopy-blocktime': 'uint32',
'*postcopy-vcpu-blocktime': ['uint32'],
-   '*compression': 'CompressionStats',
+   '*compression': { 'type': 'CompressionStats', 'features': 
['deprecated'] },
'*socket-address': ['SocketAddress'],
'*dirty-limit-throttle-time-per-round': 'uint64',
'*dirty-limit-ring-full-time': 'uint64'} }
@@ -446,7 +451,8 @@
 # compress and xbzrle are both on, compress only takes effect in
 # the ram bulk stage, after that, it will be disabled and only
 # xbzrle takes effect, this can help to minimize migration
-# traffic.  The feature is disabled by default.  (since 2.4 )
+# traffic.  The feature is disabled by default.  Obsolete.  Use
+# multifd compression methods if needed. (since 2.4 )
 #
 # @events: generate events for each migration state change (since 2.4
 # )
@@ -525,8 +531,9 @@
 #
 # Features:
 #
-# @deprecated: @block migration is deprecated.  Use drive-mirror
-# with NBD instead.
+# @deprecated: @block migration is deprecated.  Use drive-mirror with
+# NBD instead. @compress is obsolete, use multifd compression
+# methods instead.
 #
 # @unstable: Members @x-colo and @x-ignore-shared are experimental.
 #
@@ -534,7 +541,8 @@
 ##
 { 'enum': 'MigrationCapability',
   'data': ['xbzrle', 'rdma-pin-all', 'auto-converge', 'zero-blocks',
-   'compress', 'events', 'postcopy-ram',
+   { 'name': 'compress', 'features': [ 'deprecated' ] },
+   'events', 'postcopy-ram',
{ 'name': 'x-colo', 'features': [ 'unstable' ] },
'release-ram',
{ 'name': 'block', 'features': [ 'deprecated' ] },
@@ -694,22 +702,24 @@
 # migration, the compression level is an integer between 0 and 9,
 # where 0 means no compression, 1 means the best compression
 # speed, and 9 means best compression ratio which will consume
-# more CPU.
+# more CPU. Obsolete, see multifd compression if needed.
 #
 # @compress-threads: Set compression thread count to be used in live
 # migration, the compression thread count is an integer between 1
-# and 255.
+# and 255. Obsolete, see multifd compression if needed.
 #
 # @compress-wait-thread: Controls behavior when all compression
 # threads are currently busy.  If true (default), wait for a free
 # compression thread to become available; otherwise, send the page
-# uncompressed.  (Since 3.1)
+# uncompressed. Obsolete, see multifd compression if
+# needed. (Since 3.1)
 #
 # @decompress-threads: Set decompression thread count to be used in
 # live migration, the decompression thread count is an integer
 # between 1 and 255. Usually, decompression is at least 4 times as
 # fast as compression, so set the decompress-threads to the number
-# about 1/4 of compress-threads is adequate.
+# about 1/4 of compress-threads is adequate. Obsolete, see multifd
+# co

[PATCH v2 1/5] migration: Use proper indentation for migration.json

2023-06-22 Thread Juan Quintela
We broke it with dirtyrate limit patches.

Signed-off-by: Juan Quintela 
---
 qapi/migration.json | 67 ++---
 1 file changed, 33 insertions(+), 34 deletions(-)

diff --git a/qapi/migration.json b/qapi/migration.json
index 6ff39157ba..ad8cc57071 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -258,17 +258,17 @@
 # blocked.  Present and non-empty when migration is blocked.
 # (since 6.0)
 #
-# @dirty-limit-throttle-time-per-round: Maximum throttle time (in 
microseconds) of virtual
-#   CPUs each dirty ring full round, which 
shows how
-#   MigrationCapability dirty-limit 
affects the guest
-#   during live migration. (since 8.1)
+# @dirty-limit-throttle-time-per-round: Maximum throttle time (in
+# microseconds) of virtual CPUs each dirty ring full round, which
+# shows how MigrationCapability dirty-limit affects the guest
+# during live migration. (since 8.1)
 #
-# @dirty-limit-ring-full-time: Estimated average dirty ring full time (in 
microseconds)
-#  each dirty ring full round, note that the value 
equals
-#  dirty ring memory size divided by average dirty 
page rate
-#  of virtual CPU, which can be used to observe 
the average
-#  memory load of virtual CPU indirectly. Note 
that zero
-#  means guest doesn't dirty memory (since 8.1)
+# @dirty-limit-ring-full-time: Estimated average dirty ring full time
+# (in microseconds) each dirty ring full round, note that the
+# value equals dirty ring memory size divided by average dirty
+# page rate of virtual CPU, which can be used to observe the
+# average memory load of virtual CPU indirectly. Note that zero
+# means guest doesn't dirty memory (since 8.1)
 #
 # Since: 0.14
 ##
@@ -510,14 +510,13 @@
 # (since 7.1)
 #
 # @dirty-limit: If enabled, migration will use the dirty-limit algo to
-#   throttle down guest instead of auto-converge algo.
-#   Throttle algo only works when vCPU's dirtyrate greater
-#   than 'vcpu-dirty-limit', read processes in guest os
-#   aren't penalized any more, so this algo can improve
-#   performance of vCPU during live migration. This is an
-#   optional performance feature and should not affect the
-#   correctness of the existing auto-converge algo.
-#   (since 8.1)
+# throttle down guest instead of auto-converge algo.  Throttle
+# algo only works when vCPU's dirtyrate greater than
+# 'vcpu-dirty-limit', read processes in guest os aren't penalized
+# any more, so this algo can improve performance of vCPU during
+# live migration. This is an optional performance feature and
+# should not affect the correctness of the existing auto-converge
+# algo.  (since 8.1)
 #
 # Features:
 #
@@ -811,17 +810,17 @@
 # Nodes are mapped to their block device name if there is one, and
 # to their node name otherwise.  (Since 5.2)
 #
-# @x-vcpu-dirty-limit-period: Periodic time (in milliseconds) of dirty limit 
during
-# live migration. Should be in the range 1 to 
1000ms,
-# defaults to 1000ms. (Since 8.1)
+# @x-vcpu-dirty-limit-period: Periodic time (in milliseconds) of dirty
+# limit during live migration. Should be in the range 1 to 1000ms,
+# defaults to 1000ms. (Since 8.1)
 #
 # @vcpu-dirty-limit: Dirtyrate limit (MB/s) during live migration.
-#Defaults to 1. (Since 8.1)
+# Defaults to 1. (Since 8.1)
 #
 # Features:
 #
 # @unstable: Members @x-checkpoint-delay and @x-vcpu-dirty-limit-period
-#are experimental.
+# are experimental.
 #
 # Since: 2.4
 ##
@@ -977,17 +976,17 @@
 # Nodes are mapped to their block device name if there is one, and
 # to their node name otherwise.  (Since 5.2)
 #
-# @x-vcpu-dirty-limit-period: Periodic time (in milliseconds) of dirty limit 
during
-# live migration. Should be in the range 1 to 
1000ms,
-# defaults to 1000ms. (Since 8.1)
+# @x-vcpu-dirty-limit-period: Periodic time (in milliseconds) of dirty
+# limit during live migration. Should be in the range 1 to 1000ms,
+# defaults to 1000ms. (Since 8.1)
 #
 # @vcpu-dirty-limit: Dirtyrate limit (MB/s) during live migration.
-#Defaults to 1. (Since 8.1)
+# Defaults to 1. (Since 8.1)
 #
 # Features:
 #
 # @unstable: Members @x-checkpoint-delay and @x-vcpu-dirty-limit-period
-#are experimental.
+# are experimental.
 #
 # TODO: either fuse back into MigrationParameters, or make
 # MigrationParameters members mandatory
@@ -1180,17 +1179,17 @@
 # Nodes are mapped to their block d

[PATCH v2 4/5] migration: Deprecate block migration

2023-06-22 Thread Juan Quintela
It is obsolete.  It is better to use drive-mirror with NBD instead.

CC: Kevin Wolf 
CC: Eric Blake 
CC: Stefan Hajnoczi 
CC: Hanna Czenczek 

Signed-off-by: Juan Quintela 
---
 docs/about/deprecated.rst | 10 ++
 qapi/migration.json   | 30 +-
 migration/block.c |  3 +++
 migration/options.c   |  9 -
 4 files changed, 46 insertions(+), 6 deletions(-)

diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst
index f727db958e..2d7c48185e 100644
--- a/docs/about/deprecated.rst
+++ b/docs/about/deprecated.rst
@@ -447,3 +447,13 @@ The new way to modify migration is using migration 
parameters.
 ``blk`` functionality can be achieved by setting the
 ``block`` migration capability to ``true``.
 
+block migration (since 8.1)
+'''
+
+Block migration is too inflexible.  It needs to migrate all block
+devices or none.
+
+Please see "QMP invocation for live storage migration with
+``drive-mirror`` + NBD" in docs/interop/live-block-operations.rst for
+a detailed explanation.
+
diff --git a/qapi/migration.json b/qapi/migration.json
index 291af9407e..08dee855cb 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -270,11 +270,16 @@
 # average memory load of virtual CPU indirectly. Note that zero
 # means guest doesn't dirty memory (since 8.1)
 #
+# Features:
+#
+# @deprecated: @disk migration is deprecated.  Use drive-mirror
+# with NBD instead.
+#
 # Since: 0.14
 ##
 { 'struct': 'MigrationInfo',
   'data': {'*status': 'MigrationStatus', '*ram': 'MigrationStats',
-   '*disk': 'MigrationStats',
+   '*disk': { 'type': 'MigrationStats', 'features': ['deprecated'] },
'*vfio': 'VfioStats',
'*xbzrle-cache': 'XBZRLECacheStats',
'*total-time': 'int',
@@ -520,6 +525,9 @@
 #
 # Features:
 #
+# @deprecated: @block migration is deprecated.  Use drive-mirror
+# with NBD instead.
+#
 # @unstable: Members @x-colo and @x-ignore-shared are experimental.
 #
 # Since: 1.2
@@ -529,7 +537,8 @@
'compress', 'events', 'postcopy-ram',
{ 'name': 'x-colo', 'features': [ 'unstable' ] },
'release-ram',
-   'block', 'return-path', 'pause-before-switchover', 'multifd',
+   { 'name': 'block', 'features': [ 'deprecated' ] },
+   'return-path', 'pause-before-switchover', 'multifd',
'dirty-bitmaps', 'postcopy-blocktime', 'late-block-activate',
{ 'name': 'x-ignore-shared', 'features': [ 'unstable' ] },
'validate-uuid', 'background-snapshot',
@@ -819,6 +828,9 @@
 #
 # Features:
 #
+# @deprecated: Member @block-incremental is obsolete. Use
+# drive-mirror with NBD instead.
+#
 # @unstable: Members @x-checkpoint-delay and @x-vcpu-dirty-limit-period
 # are experimental.
 #
@@ -834,7 +846,7 @@
'tls-creds', 'tls-hostname', 'tls-authz', 'max-bandwidth',
'downtime-limit',
{ 'name': 'x-checkpoint-delay', 'features': [ 'unstable' ] },
-   'block-incremental',
+   { 'name': 'block-incremental', 'features': [ 'deprecated' ] },
'multifd-channels',
'xbzrle-cache-size', 'max-postcopy-bandwidth',
'max-cpu-throttle', 'multifd-compression',
@@ -985,6 +997,9 @@
 #
 # Features:
 #
+# @deprecated: Member @block-incremental is obsolete. Use
+# drive-mirror with NBD instead.
+#
 # @unstable: Members @x-checkpoint-delay and @x-vcpu-dirty-limit-period
 # are experimental.
 #
@@ -1013,7 +1028,8 @@
 '*downtime-limit': 'uint64',
 '*x-checkpoint-delay': { 'type': 'uint32',
  'features': [ 'unstable' ] },
-'*block-incremental': 'bool',
+'*block-incremental': { 'type': 'bool',
+'features': [ 'deprecated' ] },
 '*multifd-channels': 'uint8',
 '*xbzrle-cache-size': 'size',
 '*max-postcopy-bandwidth': 'size',
@@ -1188,6 +1204,9 @@
 #
 # Features:
 #
+# @deprecated: Member @block-incremental is obsolete. Use
+# drive-mirror with NBD instead.
+#
 # @unstable: Members @x-checkpoint-delay and
 # @x-vcpu-dirty-limit-period are experimental.
 #
@@ -1213,7 +1232,8 @@
 '*downtime-limit': 'uint64',
 '*x-checkpoint-delay': { 'type': 'uint32',
  'features': [ 'unstable' ] },
-'*block-incremental': 'bool',
+'*block-incremental': { 'type': 'bool',
+'features': [ 'deprecated' ] },
 '*multifd-channels': 'uint8',
 '*xbzrle-cache-size': 'size',
 '*max-postcopy-bandwidth': 'size',
diff --git a/migration/block.c b/migration/block.c
index b29e80bdc4..a095024108 100644
--- a/migration/block.c
+++ b/migration/block.c
@@ -722,6 +722,9 @@ static int block_save_setup(QEMUFile *f, void *opaque)
 trace_migration_block_save("setup"

[PATCH v2 0/5] Migration deprecated parts

2023-06-22 Thread Juan Quintela
On this v2:

- dropped -incoming  deprecation
  Paolo came up with a better solution using keyvalues.

- skipped field is already ready for next pull request, so dropped.

- dropped the RFC bits, normal PATCH.

- Addressed all the review comments.

- Added indentation of migration.json.

- Used the documentation pointer to substitute block migration.

Please review.

[v1]
Hi this series describe the migration parts that have to be deprecated.

- It is an rfc because I doubt that I did the deprecation process right. Hello 
Markus O:-)

- skipped field: It is older than me, I have never known what it stands
  for.  As far as I know it has always been zero.

- inc/blk migrate command options.  They are only used by block
  migration (which I deprecate in the following patch).  And they are really bad.
  grep must_remove_block_options.

- block migration.  Block jobs, or whatever they are called this week,
  are way more flexible.  The current code works, but we broke it here
  and there, and really nobody has stood up to maintain it.  It is quite
  contained and can be left there.  Is anyone really using it?

- old compression method.  It doesn't work.  See the last try from
  Lukas to make a test that works reliably.  I failed at the same task
  years ago.  It is really slow, and if compression is good for you,
  multifd + zlib is going to perform/compress far better.

  I don't know what to do with this code, really.

  * Remove it for this release?  It doesn't work, and hasn't worked
reliably in quite some time.

  * Deprecate it and remove in another couple of releases, i.e. normal
deprecation.

  * Ideas?

- -incoming 

  if you need to set parameters (multifd comes to mind, and preempt has
  the same problem), you really need to use defer.  So what should we do here?

  This part is not urgent, because management apps have a working
  option and are already using "defer", and the code simplification
  if we remove it is not that big.  So we can leave it until 9.0 or
  whatever we think fits.

What do you think?

Later, Juan.

Juan Quintela (5):
  migration: Use proper indentation for migration.json
  migration: migrate 'inc' command option is deprecated.
  migration: migrate 'blk' command option is deprecated.
  migration: Deprecate block migration
  migration: Deprecate old compression method

 docs/about/deprecated.rst |  32 ++
 qapi/migration.json   | 203 --
 migration/block.c |   3 +
 migration/migration.c |  11 +++
 migration/options.c   |  22 -
 5 files changed, 198 insertions(+), 73 deletions(-)


base-commit: 5f9dd6a8ce3961db4ce47411ed2097ad88bdf5fc
prerequisite-patch-id: 99c8bffa9428838925e330eb2881bab476122579
prerequisite-patch-id: 77ba427fd916aeb395e95aa0e7190f84e98e96ab
prerequisite-patch-id: 9983d46fa438d7075a37be883529e37ae41e4228
prerequisite-patch-id: 207f7529924b12dcb57f6557d6db6f79ceb2d682
prerequisite-patch-id: 5ad1799a13845dbf893a28a202b51a6b50d95d90
prerequisite-patch-id: c51959aacd6d65ee84fcd4f1b2aed3dd6f6af879
prerequisite-patch-id: da9dbb6799b2da002c0896574334920097e4c50a
prerequisite-patch-id: c1110ffafbaf5465fb277a20db809372291f7846
prerequisite-patch-id: 8307c92bedd07446214b35b40206eb6793a7384d
prerequisite-patch-id: 0a6106cd4a508d5e700a7ff6c25edfdd03c8ca3d
prerequisite-patch-id: 83205051de22382e75bf4acdf69e59315801fa0d
prerequisite-patch-id: 8c9b3cba89d555c071a410041e6da41806106a7e
prerequisite-patch-id: 0ff62a33b9a242226ccc1f5424a516de803c9fe5
prerequisite-patch-id: 25b8ae1ebe09ace14457c454cfcb23077c37346c
prerequisite-patch-id: 466ea91d5be41fe345dacd4d17bbbe5ce13118c2
prerequisite-patch-id: d1045858f9729ac62eccf2e83ebf95cfebae2cb5
prerequisite-patch-id: 0276ec02073bda5426de39e2f2e81eef080b4f54
prerequisite-patch-id: 7afb4450a163cc1a63ea23831c50214966969131
prerequisite-patch-id: 06c053ce4f41db9675bd1778ae8f6a483641fcef
prerequisite-patch-id: 13ea05d54d741ed08b3bfefa1fc8bedb9c81c782
prerequisite-patch-id: 99c4e2b7101bc8c4b9515129a1bbe6f068053dbf
prerequisite-patch-id: 1e393a196dc7a1ee75f3cc3cebbb591c5422102f
prerequisite-patch-id: 2cf497b41f5024ede0a224b1f5b172226067a534
prerequisite-patch-id: 2a70276ed61d33fc4f3b52560753c05d1cd413be
prerequisite-patch-id: 17ec40f4388b62ba8bf3ac1546c6913f5d1f6079
prerequisite-patch-id: dba969ce9d6cf69c1319661a7d81b1c1c719804d
prerequisite-patch-id: 8d800cda87167314f07320bdb3df936c323e4a40
prerequisite-patch-id: 25d4aaf54ea66f30e426fa38bdd4e0f47303c513
prerequisite-patch-id: 082c9d8584c1daff1e827e44ee3047178e7004a7
prerequisite-patch-id: 0ef73900899425ae2f00751347afdce3739aa954
prerequisite-patch-id: e7db4730b791b71aaf417ee0f65fb6304566aaf8
prerequisite-patch-id: 62d7f28f8196039507ffe362f97723395d7bb704
prerequisite-patch-id: ea8de47bcb54e33bcc67e59e9ed752a4d1fad703
prerequisite-patch-id: 497893ef92e1ea56bd8605e6990a05cb4c7f9293
prerequisite-patch-id: 3dc869c80ee568449bbfa2a9bc427524d0e8970b
prerequisite-patch-id: 52c14b6fb14ed4ccd685385a9fbc6297b762c0ef
prerequisite-patch-id: 23de8371e9e3277

[PATCH v2 2/5] migration: migrate 'inc' command option is deprecated.

2023-06-22 Thread Juan Quintela
Set the 'block-incremental' migration parameter to 'true' instead.

Signed-off-by: Juan Quintela 
---
 docs/about/deprecated.rst |  7 +++
 qapi/migration.json   | 12 ++--
 migration/migration.c |  6 ++
 3 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst
index e1aa0eafc8..cc0001041f 100644
--- a/docs/about/deprecated.rst
+++ b/docs/about/deprecated.rst
@@ -433,3 +433,10 @@ Migration
 ``skipped`` field in Migration stats has been deprecated.  It hasn't
 been used for more than 10 years.
 
+``inc`` migrate command option (since 8.1)
+''''''''''''''''''''''''''''''''''''''''''
+
+The new way to modify migration is using migration parameters.
+``inc`` functionality can be achieved by setting the
+``block-incremental`` migration parameter to ``true``.
+
diff --git a/qapi/migration.json b/qapi/migration.json
index ad8cc57071..8b30f748ef 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -1479,13 +1479,20 @@
 #
 # @blk: do block migration (full disk copy)
 #
-# @inc: incremental disk copy migration
+# @inc: incremental disk copy migration.  This option is deprecated.
+# Set the 'block-incremental' migration parameter to 'true'
+# instead.
 #
 # @detach: this argument exists only for compatibility reasons and is
 # ignored by QEMU
 #
 # @resume: resume one paused migration, default "off". (since 3.0)
 #
+# Features:
+#
+# @deprecated: option @inc should be enabled by setting the
+# 'block-incremental' migration parameter to 'true'.
+#
 # Returns: nothing on success
 #
 # Since: 0.14
@@ -1507,7 +1514,8 @@
 # <- { "return": {} }
 ##
 { 'command': 'migrate',
-  'data': {'uri': 'str', '*blk': 'bool', '*inc': 'bool',
+  'data': {'uri': 'str', '*blk': 'bool',
+   '*inc': { 'type': 'bool', 'features': ['deprecated'] },
'*detach': 'bool', '*resume': 'bool' } }
 
 ##
diff --git a/migration/migration.c b/migration/migration.c
index 7a4ba2e846..abc40e6ef6 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1557,6 +1557,12 @@ static bool migrate_prepare(MigrationState *s, bool blk, 
bool blk_inc,
 {
 Error *local_err = NULL;
 
+if (blk_inc) {
+warn_report("-inc migrate option is deprecated, set the "
+"'block-incremental' migration parameter to 'true'"
+" instead.");
+}
+
 if (resume) {
 if (s->state != MIGRATION_STATUS_POSTCOPY_PAUSED) {
 error_setg(errp, "Cannot resume if there is no "
-- 
2.40.1
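In QMP terms, the replacement flow this patch documents looks roughly like the sketch below (the URI value is invented; qapi/migration.json remains the authoritative schema):

```python
import json

# Deprecated form: pass 'inc' directly to the migrate command.
deprecated = {"execute": "migrate",
              "arguments": {"uri": "tcp:dest:4444", "inc": True}}

# Recommended form: set the 'block-incremental' migration parameter
# first, then start the migration without 'inc'.
recommended = [
    {"execute": "migrate-set-parameters",
     "arguments": {"block-incremental": True}},
    {"execute": "migrate",
     "arguments": {"uri": "tcp:dest:4444"}},
]

for cmd in [deprecated, *recommended]:
    print(json.dumps(cmd))
```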




[PATCH v2 3/5] migration: migrate 'blk' command option is deprecated.

2023-06-22 Thread Juan Quintela
Set the 'block' migration capability to 'true' instead.

Signed-off-by: Juan Quintela 
---
 docs/about/deprecated.rst |  7 +++
 qapi/migration.json   | 10 +++---
 migration/migration.c |  5 +
 3 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst
index cc0001041f..f727db958e 100644
--- a/docs/about/deprecated.rst
+++ b/docs/about/deprecated.rst
@@ -440,3 +440,10 @@ The new way to modify migration is using migration 
parameters.
 ``inc`` functionality can be achieved by setting the
 ``block-incremental`` migration parameter to ``true``.
 
+``blk`` migrate command option (since 8.1)
+''''''''''''''''''''''''''''''''''''''''''
+
+The new way to modify migration is using migration parameters.
+``blk`` functionality can be achieved by setting the
+``block`` migration capability to ``true``.
+
diff --git a/qapi/migration.json b/qapi/migration.json
index 8b30f748ef..291af9407e 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -1477,7 +1477,9 @@
 #
 # @uri: the Uniform Resource Identifier of the destination VM
 #
-# @blk: do block migration (full disk copy)
+# @blk: do block migration (full disk copy). This option is
+# deprecated.  Set the 'block' migration capability to 'true'
+# instead.
 #
 # @inc: incremental disk copy migration.  This option is deprecated.
# Set the 'block-incremental' migration parameter to 'true'
@@ -1491,7 +1493,8 @@
 # Features:
 #
 # @deprecated: option @inc should be enabled by setting the
-# 'block-incremental' migration parameter to 'true'.
+# 'block-incremental' migration parameter to 'true', option @blk
+# should be enabled by setting the 'block' capability to 'true'.
 #
 # Returns: nothing on success
 #
@@ -1514,7 +1517,8 @@
 # <- { "return": {} }
 ##
 { 'command': 'migrate',
-  'data': {'uri': 'str', '*blk': 'bool',
+  'data': {'uri': 'str',
+   '*blk': { 'type': 'bool', 'features': ['deprecated'] },
'*inc': { 'type': 'bool', 'features': ['deprecated'] },
'*detach': 'bool', '*resume': 'bool' } }
 
diff --git a/migration/migration.c b/migration/migration.c
index abc40e6ef6..4c7e8ff5ee 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1563,6 +1563,11 @@ static bool migrate_prepare(MigrationState *s, bool blk, 
bool blk_inc,
 " instead.");
 }
 
+if (blk) {
+warn_report("-blk migrate option is deprecated, set the "
+"'block' capability to 'true' instead.");
+}
+
 if (resume) {
 if (s->state != MIGRATION_STATUS_POSTCOPY_PAUSED) {
 error_setg(errp, "Cannot resume if there is no "
-- 
2.40.1
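In QMP terms, the replacement this patch documents is to enable the 'block' capability before migrating, roughly as sketched below (the URI value is invented; qapi/migration.json is the authoritative schema):

```python
import json

# Deprecated form: pass 'blk' directly to the migrate command.
deprecated = {"execute": "migrate",
              "arguments": {"uri": "tcp:dest:4444", "blk": True}}

# Recommended form: turn on the 'block' capability, then migrate.
recommended = [
    {"execute": "migrate-set-capabilities",
     "arguments": {"capabilities": [
         {"capability": "block", "state": True}]}},
    {"execute": "migrate",
     "arguments": {"uri": "tcp:dest:4444"}},
]

print(json.dumps(recommended, indent=2))
```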




Re: [RFC 4/6] migration: Deprecate -incoming

2023-06-22 Thread Juan Quintela
Peter Xu  wrote:
> On Thu, Jun 22, 2023 at 11:22:56AM +0200, Thomas Huth wrote:
>> Then simply forbid "migrate_set_parameter multifd-channels ..." if the uri
>> has been specified on the command line?
>
> Yeah, actually already in a pull (even though the pr may need a new one..):
>
> https://lore.kernel.org/r/20230622021320.66124-23-quint...@redhat.com

That is a different problem, and different solution.

If you try to set multifd_channels after migration has started, it just
fails, telling you that you can't change it so late.

Later, Juan.




Re: [RFC 4/6] migration: Deprecate -incoming

2023-06-22 Thread Juan Quintela
Peter Xu  wrote:
> On Mon, Jun 12, 2023 at 10:51:08PM +0200, Juan Quintela wrote:
>> Peter Xu  wrote:
>> > On Mon, Jun 12, 2023 at 09:33:42PM +0200, Juan Quintela wrote:
>> >> Only "defer" is recommended.  After setting all migation parameters,
>> >> start incoming migration with "migrate-incoming uri" command.
>> >> 
>> >> Signed-off-by: Juan Quintela 
>> >> ---
>> >>  docs/about/deprecated.rst | 7 +++
>> >>  softmmu/vl.c  | 2 ++
>> >>  2 files changed, 9 insertions(+)
>> >> 
>> >> diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst
>> >> index 47e98dc95e..518672722d 100644
>> >> --- a/docs/about/deprecated.rst
>> >> +++ b/docs/about/deprecated.rst
>> >> @@ -447,3 +447,10 @@ The new way to modify migration is using migration 
>> >> parameters.
>> >>  ``blk`` functionality can be acchieved using
>> >>  ``migrate_set_parameter block-incremental true``.
>> >>  
>> >> +``-incoming uri`` (since 8.1)
>> >> +'
>> >> +
>> >> +Everything except ``-incoming defer`` are deprecated.  This allows to
>> >> +setup parameters before launching the proper migration with
>> >> +``migrate-incoming uri``.
>> >> +
>> >> diff --git a/softmmu/vl.c b/softmmu/vl.c
>> >> index b0b96f67fa..7fe865ab59 100644
>> >> --- a/softmmu/vl.c
>> >> +++ b/softmmu/vl.c
>> >> @@ -2651,6 +2651,8 @@ void qmp_x_exit_preconfig(Error **errp)
>> >>  if (incoming) {
>> >>  Error *local_err = NULL;
>> >>  if (strcmp(incoming, "defer") != 0) {
>> >> +warn_report("-incoming %s is deprecated, use -incoming defer 
>> >> and "
>> >> +" set the uri with migrate-incoming.", incoming);
>> >
>> > I still use uri for all my scripts, alongside with "-global migration.xxx"
>> > and it works.
>> 
>> You know what you are doing (TM).
>> And remember that we don't support -gobal migration.x-foo.
>> Yes, I know, we should drop the "x-" prefixes.
>
> I hope they'll always be there. :) They're pretty handy for tests, when we
> want to boot a VM without the need to script the sequences of qmp cmds.
>
> Yes, we probably should just always drop the x-.  We can always declare
> debugging purpose for all -global migration.* fields.
>
>> 
>> > Shall we just leave it there?  Or is deprecating it helps us in any form?
>> 
>> See the patches two weeks ago when people complained that lisen(.., num)
>> was too low.  And there are other parameters that work the same way
>> (that I conveniently had forgotten).  So the easiest way to get things
>> right is to use "defer" always.  Using -incoming "uri" should only be
>> for people that "know what they are doing", so we had two ways to do it:
>> - review all migration options and see which ones work without defer
>>   and document it
>> - deprecate everything that is not defer.
>> 
>> Anything else is going to be very user unfriendly.
>> What do you think.
>
> IIRC Wei Wang had a series just for that, so after that patchset applied we
> should have fixed all issues cleanly?

No, what he does is always use a very big value for listen.  But that
is it.  Anyway, I don't know how to change the listen backlog value
without restarting the listen call.

> Is there one more thing that's not
> working right there?

Compression has other problems.  But independently of that, they have
the problem that we need to set the parameters before we call incoming.

>> PD.  This series are RFC for multiple reasons O:-)
>
> Happy to know the rest (besides which I know will break my script :).

Thanks, Juan.




Re: [RFC 6/6] migration: Deprecated old compression method

2023-06-22 Thread Juan Quintela
Daniel P. Berrangé  wrote:
> On Mon, Jun 12, 2023 at 09:33:44PM +0200, Juan Quintela wrote:
>> Signed-off-by: Juan Quintela 
>> ---
>>  docs/about/deprecated.rst |  8 
>>  qapi/migration.json   | 92 ---
>>  migration/options.c   | 13 ++
>>  3 files changed, 79 insertions(+), 34 deletions(-)
>> 
>> diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst
>> index 173c5ba5cb..fe7f2bbde8 100644
>> --- a/docs/about/deprecated.rst
>> +++ b/docs/about/deprecated.rst
>> @@ -460,3 +460,11 @@ block migration (since 8.1)
>>  Block migration is too inflexible.  It needs to migrate all block
>>  devices or none.  Use driver_mirror+NBD instead.
>>  
>> +old compression method (since 8.1)
>> +''
>> +
>> +Compression method fails too much.  Too many races.  We are going to
>> +remove it if nobody fixes it.  For starters, migration-test
>> +compression tests are disabled becase they hand randomly.  If you need
>> +compression, use multifd compression methods.
>> +
>> diff --git a/qapi/migration.json b/qapi/migration.json
>> index a8497de48d..40a8b5d124 100644
>> --- a/qapi/migration.json
>> +++ b/qapi/migration.json
>> @@ -244,6 +244,7 @@
>>  #
>>  # @compression: migration compression statistics, only returned if
>>  # compression feature is on and status is 'active' or 'completed'
>> +# It is obsolete and deprecated.  Use multifd compression methods.
>>  # (Since 3.1)
>
> This doesn't give users an indication /why/ we're saying this. Instead
> I'd suggest
>
>   This feature is unreliable and not tested. It is recommended to
>   use multifd migration instead, which offers an alternative reliable
>   and tested compression implementation.

Much better.  Done, thanks.


>>  # @deprecated: @disk migration is deprecated.  Use driver_mirror+NBD
>> -# instead.
>> +# instead. @compression is obsolete use multifd compression
>> +# methods instead.
>
> For @deprecated, are we supposed to list multiple things at once, or
> use a separate @deprecated tag for each one ?

# @unstable: Members @x-colo and @x-ignore-shared are experimental.

This is the only example that I found that is similar.
Only one example.  Markus?

>
> Again I'd suggest rewording
>
> @compression is unreliable and untested. It is recommended to
> use multifd migration, which offers an alternative compression
> implementation that is reliable and tested.

Done.



>> @@ -443,6 +443,11 @@ bool migrate_caps_check(bool *old_caps, bool *new_caps, 
>> Error **errp)
>>  "Use driver_mirror+NBD instead.");
>>  }
>>  
>> +if (new_caps[MIGRATION_CAPABILITY_BLOCK]) {
>
> Surely MIGRATION_CAPABILITY_COMPRESS not BLOCK ?

Good catch.  Copy & paste at its best.

Thanks very much.




Re: [RFC 6/6] migration: Deprecated old compression method

2023-06-22 Thread Juan Quintela
Thomas Huth  wrote:
> On 12/06/2023 21.33, Juan Quintela wrote:
>> Signed-off-by: Juan Quintela 
>> ---
>>   docs/about/deprecated.rst |  8 
>>   qapi/migration.json   | 92 ---
>>   migration/options.c   | 13 ++
>>   3 files changed, 79 insertions(+), 34 deletions(-)
>> diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst
>> index 173c5ba5cb..fe7f2bbde8 100644
>> --- a/docs/about/deprecated.rst
>> +++ b/docs/about/deprecated.rst
>> @@ -460,3 +460,11 @@ block migration (since 8.1)
>>   Block migration is too inflexible.  It needs to migrate all block
>>   devices or none.  Use driver_mirror+NBD instead.
>>   +old compression method (since 8.1)
>> +''
>> +
>> +Compression method fails too much.  Too many races.  We are going to
>> +remove it if nobody fixes it.  For starters, migration-test
>> +compression tests are disabled becase they hand randomly.  If you need
>
> "because they fail randomly" ?

yeap.

>>   # @deprecated: @disk migration is deprecated.  Use driver_mirror+NBD
>> -# instead.
>> +# instead. @compression is obsolete use multifd compression
>
> Use a dot or comma after "obsolete".

fixed.

>> @@ -503,6 +506,7 @@
>>   # Features:
>>   #
>>   # @deprecated: @block migration is deprecated.  Use driver_mirror+NBD
>> +# instead. @compress is obsolete use multifd compression methods
>
> dito

fixed.

>> -# @compress-threads: compression thread count
>> +# @compress-threads: compression thread count. Obsolote and
>
> Obsolete

Fixed.

>> @@ -1182,7 +1209,6 @@
>>'features': [ 'unstable' ] },
>>   '*block-incremental': { 'type': 'bool',
>>   'features': [ 'deprecated' ] },
>> -'*block-incremental': 'bool',
>
> That hunk should go into a previous patch, I think.

Have found it already (it didn't compile).

Thanks, Juan.




Re: [RFC 4/6] migration: Deprecate -incoming

2023-06-22 Thread Peter Xu
On Thu, Jun 22, 2023 at 05:33:29PM +0100, Daniel P. Berrangé wrote:
> On Thu, Jun 22, 2023 at 11:54:43AM -0400, Peter Xu wrote:
> > I can try to move the todo even higher.  Trying to list the initial goals
> > here:
> > 
> > - One extra phase of handshake between src/dst (maybe the time to boost
> >   QEMU_VM_FILE_VERSION) before anything else happens.
> > 
> > - Dest shouldn't need to apply any cap/param, it should get all from src.
> >   Dest still need to be setup with an URI and that should be all it needs.
> > 
> > - Src shouldn't need to worry on the binary version of dst anymore as long
> >   as dest qemu supports handshake, because src can fetch it from dest.
> 
> I'm not sure that works in general. Even if we have a handshake and
> bi-directional comms for live migration, we still haave the save/restore
> to file codepath to deal with. The dst QEMU doesn't exist at the time
> the save process is done, so we can't add logic to VMSate handling that
> assumes knowledge of the dst version at time of serialization.

My current thought was still based on a new cap or anything the user would
need to specify first on both sides (but hopefully the last cap to set on
dest).

E.g. if with a new handshake cap we shouldn't set it on a exec: or file:
protocol migration, and it should just fail on qmp_migrate() telling that
the URI is not supported if the cap is set.  Return path is definitely
required here.

> 
> > - Handshake can always fail gracefully if anything wrong happened, it
> >   normally should mean dest qemu is not compatible with src's setup (either
> >   machine, device, or migration configs) for whatever reason.  Src should
> >   be able to get a solid error from dest if so.
> > 
> > - Handshake protocol should always be self-bootstrap-able, it means when we
> >   change the handshake protocol it should always works with old binaries.
> > 
> >   - When src is newer it should be able to know what's missing on dest and
> > skip the new bits.
> > 
> >   - When dst is newer it should all rely on src (which is older) and it
> > should always understand src's language.
> 
> I'm not convinced it can reliably self-bootstrap in a backwards
> compatible manner, precisely because the current migration stream
> has no handshake and only requires a unidirectional channel.

Yes, please see above.  I meant when we grow the handshake protocol we
should make sure we don't need anything new to be set up either on src/dst
of qemu.  It won't apply to before-handshake binaries.

> I don't think its possible for QEMU to validate that it has a fully
> bi-directional channel, without adding timeouts to its detection which I
> think we should strive to avoid.
> 
> I don't think we actually need self-bootstrapping anyway.
> 
> I think the mgmt app can just indicate the new v2 bi-directional
> protocol when issuing the 'migrate' and 'migrate-incoming'
> commands.  This becomes trivial when Het's refactoring of the
> migrate address QAPI is accepted:
> 
>   https://lists.gnu.org/archive/html/qemu-devel/2023-05/msg04851.html
> 
> eg:
> 
> { "execute": "migrate",
>   "arguments": {
>   "channels": [ { "channeltype": "main",
>   "addr": { "transport": "socket", "type": "inet",
>"host": "10.12.34.9",
> "port": "1050" } } ] } }
> 
> note the 'channeltype' parameter here. If we declare the 'main'
> refers to the existing migration protocol, then we merely need
> to define a new 'channeltype' to use as an indicator for the
> v2 migration handshake protocol.

Using a new channeltype would also work at least on src qemu, but I'm not
sure how dest qemu would know that it needs a handshake in that case,
because it knows nothing until the connection is established.

Maybe we still need QEMU_VM_FILE_VERSION to be boosted at least in this
case, so dest can read this at the very beginning, old binaries will fail
immediately, new binaries will start to talk with v2 language.

> 
> > - All !main channels need to be established later than the handshake - if
> >   we're going to do this anyway we probably should do it altogether to make
> >   channels named, so each channel used in migration needs to have a common
> >   header.  Prepare to deprecate the old tricks of channel orderings.
> 
> Once the primary channel involves a bi-directional handshake,
> we'll trivially ensure ordering - similar to how the existing
> code worked fine in TLS mode which had a bi-directional TLS
> handshake.

I'm not sure I fully get it here.

IIUC tls handshake was mostly transparent to QEMU in this case while we're
relying on gnutls_handshake().  Here IIUC we need to design the roundtrip
messages to sync up two qemus well.

The round-trip messages can contain a lot of things that can be useful to
us: besides knowing what features dest supports and what caps src uses, we
can e.g. also provide a device tree dump from dest and try to match it on
src.
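As a toy model of the versioned greeting being discussed here -- every constant, field layout and rule below is invented for illustration, and is not QEMU's actual migration stream format -- the negotiation could look like:

```python
import struct

MAGIC = 0x51454d32          # hypothetical "QEM2" marker, invented
MIN_VERSION, MAX_VERSION = 2, 3

def encode_greeting(version):
    # Each side sends a fixed-size header: magic + its protocol version.
    return struct.pack(">II", MAGIC, version)

def negotiate(peer_greeting):
    magic, peer_version = struct.unpack(">II", peer_greeting)
    if magic != MAGIC:
        # Old binary with no handshake support: fail immediately.
        raise ValueError("peer does not speak the handshake protocol")
    if peer_version < MIN_VERSION:
        raise ValueError("peer too old even for the handshake baseline")
    # The newer side always adapts to the older one's language.
    return min(peer_version, MAX_VERSION)

# A v3 source talking to a v2 destination settles on v2.
assert negotiate(encode_greeting(2)) == 2
```

The point of the sketch is only the shape of the exchange: a self-identifying header first, then the newer side speaking the older side's dialect, so neither binary needs out-of-band knowledge of the other's version.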

Re: [PATCH][RESEND v5 3/3] Add a Hyper-V Dynamic Memory Protocol driver (hv-balloon)

2023-06-22 Thread Maciej S. Szmigiero

On 22.06.2023 14:52, David Hildenbrand wrote:

On 22.06.23 14:14, Maciej S. Szmigiero wrote:

On 22.06.2023 14:06, David Hildenbrand wrote:

On 22.06.23 13:17, Maciej S. Szmigiero wrote:

On 22.06.2023 13:15, David Hildenbrand wrote:

On 22.06.23 13:12, Maciej S. Szmigiero wrote:

On 22.06.2023 13:01, David Hildenbrand wrote:

[...]


We'd use a memory region container as device memory region (like [1]) and would
have to handle the !memdev case (I can help with that). Into that, you can
map the RAM memory region on demand (and eventually even using multiple slots like
[1]).

(2) Use a single virtual DIMM and (un)plug that on demand. Let the machine code 
handle (un)plugging of the device.


(1) feels cleanest to me, although it will require a bit more work.



I also think approach (1) makes more sense as it avoids memslot metadata
overhead for not-yet-hot-added parts of the memory backing device.

Not sure what you mean that the !memdev case would be problematic in this
case - it is working in the current driver shape so why would adding
potential memory subregions (used in the memdev case) change that?


I'm thinking about the case where you have a hv-balloon device without a memdev.

Without -m X,maxmem=y we don't currently expect to have memory devices around
(and especially not them getting (un)plugged). But why should we "force" the
user to set the "maxmem" option?


I guess it's only a small change to QEMU to allow having hv-balloon
device (without a memdev) even in the case where there's no "maxmem"
option given on the QEMU command line.



I hope I'll find some time soonish to prototype what I have in mind, to see
if it could be made working.



Okay, so I'll wait for your prototype before commencing further work on
the next version of this driver.


About to have something simplistic running -- I think. Want to test with a 
Linux VM, but I don't seem to get it working (also without my changes).


#!/bin/bash

build/qemu-system-x86_64 \
        --enable-kvm \
        -m 4G,maxmem=36G \
        -cpu host,hv-syndbg=on,hv-synic,hv-relaxed,hv-vpindex \
        -smp 16 \
        -nographic \
        -nodefaults \
        -net nic -net user \
        -chardev stdio,nosignal,id=serial \
        -hda Fedora-Cloud-Base-37-1.7.x86_64.qcow2 \
        -cdrom /home/dhildenb/git/cloud-init/cloud-init.iso \
        -device isa-serial,chardev=serial \
        -chardev socket,id=monitor,path=/var/tmp/mon_src,server,nowait \
        -mon chardev=monitor,mode=readline \
        -device vmbus-bridge \
        -object memory-backend-ram,size=2G,id=mem0 \
        -device hv-balloon,id=hv1,memdev=mem0



[root@vm-0 ~]# uname -r
6.3.5-100.fc37.x86_64
[root@vm-0 ~]# modprobe hv_balloon
modprobe: ERROR: could not insert 'hv_balloon': No such device


Any magic flag I am missing? Or is there something preventing this from working
with Linux VMs?



Haven't tested the driver with Linux guests in a long time (as it is
targeting Windows), but I think you need to disable KVM PV interface for
the Hyper-V one to be detected by Linux.

Something like adding "kvm=off" to "-cpu" and seeing in the dmesg whether
the detected hypervisor is now Hyper-V.

Also, you need to disable S4 in the guest for hot-add capability to work
(I'm adding "-global ICH9-LPC.disable_s4=1" with q35 machine for this).

Would also suggest adding "--trace 'hv_balloon_*' --trace 'memory_device_*'"
to QEMU command line to see what's happening.


VM is not happy:

[    1.908595] BUG: kernel NULL pointer dereference, address: 0007
[    1.908837] #PF: supervisor read access in kernel mode
[    1.908837] #PF: error_code(0x) - not-present page
[    1.908837] PGD 0 P4D 0
[    1.908837] Oops:  [#1] PREEMPT SMP NOPTI
[    1.908837] CPU: 13 PID: 492 Comm: (udev-worker) Not tainted 
6.3.5-100.fc37.x86_64 #1
[    1.908837] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
rel-1.16.2-0-gea1b7a073390-p4
[    1.908837] RIP: 0010:acpi_ns_lookup+0x8f/0x4c0
[    1.908837] Code: 8b 3d f5 eb 1c 03 83 05 52 ec 1c 03 01 48 85 ff 0f 84 51 
03 00 00 44 89 c3 4c 89 cb
[    1.908837] RSP: 0018:95b680ad7950 EFLAGS: 00010286
[    1.908837] RAX: 95b680ad79e0 RBX: 0002 RCX: 0003
[    1.908837] RDX:  RSI: 8a0283a3c558 RDI: a4b376e0
[    1.908837] RBP:  R08: 0002 R09: 
[    1.908837] R10: 8a02811034ec R11:  R12: 
[    1.908837] R13: 8a02811034e8 R14: 8a02811034e8 R15: 
[    1.908837] FS:  7f3bb2e7d0c0() GS:8a02bbd4() 
knlGS:
[    1.908837] CS:  0010 DS:  ES:  CR0: 80050033
[    1.908837] CR2: 0007 CR3: 000100a58002 CR4: 00770ee0
[    1.908837] PKRU: 5554
[    1.908837] Call Trace:
[    1.908837]  
[    1.908837]  ? __die+0x23/0x70
[    1.908837]  ? page_fault_oops+0x171/0x4e0
[    1.908837]  ? prepare_alloc_pages.constprop.0+0xf6/0x1a0

Re: [PATCH qemu v2] change the fdt_load_addr variable datatype to handle 64-bit DRAM address

2023-06-22 Thread Daniel Henrique Barboza

(CC-ing Alistair)

On 6/20/23 14:44, ~rlakshmibai wrote:

From: Lakshmi Bai Raja Subramanian 


fdt_load_addr is getting overflowed when there is no DRAM at lower 32 bit 
address space.
To support pure 64-bit DRAM address, fdt_load_addr variable's data type is 
changed to uint64_t
instead of uint32_t.


It's worth mentioning that fdt_load_addr receives the result of
riscv_compute_fdt_addr(), which is a uint64_t.



Signed-off-by: Lakshmi Bai Raja Subramanian 

---


Reviewed-by: Daniel Henrique Barboza 


  hw/riscv/virt.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/riscv/virt.c b/hw/riscv/virt.c
index 95708d890e..c348529ac0 100644
--- a/hw/riscv/virt.c
+++ b/hw/riscv/virt.c
@@ -1244,7 +1244,7 @@ static void virt_machine_done(Notifier *notifier, void 
*data)
  target_ulong start_addr = memmap[VIRT_DRAM].base;
  target_ulong firmware_end_addr, kernel_start_addr;
  const char *firmware_name = riscv_default_firmware_name(&s->soc[0]);
-uint32_t fdt_load_addr;
+uint64_t fdt_load_addr;





  uint64_t kernel_entry = 0;
  BlockBackend *pflash_blk0;
  




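The truncation the patch fixes is easy to demonstrate in isolation (the DRAM address below is made up):

```python
# fdt_load_addr used to be uint32_t; assigning a 64-bit DRAM address to
# it keeps only the low 32 bits, which models C's implicit narrowing.
fdt_addr = 0x2_0000_0000        # hypothetical DRAM base above 4 GiB

as_uint32 = fdt_addr & 0xFFFF_FFFF   # what the old uint32_t kept
as_uint64 = fdt_addr                 # what the patched uint64_t keeps

print(hex(as_uint32), hex(as_uint64))   # prints: 0x0 0x200000000
assert as_uint32 == 0       # the FDT load address is silently lost
assert as_uint64 == fdt_addr
```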
Re: [RFC 5/6] migration: Deprecate block migration

2023-06-22 Thread Juan Quintela
Stefan Hajnoczi  wrote:
> On Mon, Jun 12, 2023 at 09:33:43PM +0200, Juan Quintela wrote:
>> It is obsolete.  It is better to use driver_mirror+NBD instead.
>> 
>> CC: Kevin Wolf 
>> CC: Eric Blake 
>> CC: Stefan Hajnoczi 
>> CC: Hanna Czenczek 
>> 
>> Signed-off-by: Juan Quintela 
>> 
>> ---
>> 
>> Can any of you give one example of how to use driver_mirror+NBD for
>> deprecated.rst?
>
> Please see "QMP invocation for live storage migration with
> ``drive-mirror`` + NBD" in docs/interop/live-block-operations.rst for a
> detailed explanation.

You put here drive-mirror, and everything else blockdev-mirror.

It appears that blockdev-mirror is the new name for drive-mirror, but
as the documentation says drive-mirror + NBD, should I change to
drive-mirror everywhere?

Thanks, Juan.
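Sketched as QMP payloads, the flow from that document is roughly as follows (host, port, device and export names are invented, and the exact arguments may differ -- docs/interop/live-block-operations.rst is authoritative):

```python
# Destination: start an NBD server and export the target drive writable.
destination = [
    {"execute": "nbd-server-start",
     "arguments": {"addr": {"type": "inet",
                            "data": {"host": "dest", "port": "49153"}}}},
    {"execute": "nbd-server-add",
     "arguments": {"device": "drive0", "writable": True}},
]

# Source: mirror the drive into the NBD export; once BLOCK_JOB_READY
# arrives, the usual memory migration can be started.
source = [
    {"execute": "drive-mirror",
     "arguments": {"device": "drive0", "sync": "full", "mode": "existing",
                   "target": "nbd:dest:49153:exportname=drive0"}},
    {"execute": "migrate",
     "arguments": {"uri": "tcp:dest:4444"}},
]

for step in destination + source:
    print(step["execute"])
```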




Re: [RFC 4/6] migration: Deprecate -incoming

2023-06-22 Thread Juan Quintela
Juan Quintela  wrote:
> Only "defer" is recommended.  After setting all migation parameters,
> start incoming migration with "migrate-incoming uri" command.
>
> Signed-off-by: Juan Quintela 

Nack myself.

Dropped in the next submission.  The keyfile properties suggested by
Paolo are a much better suggestion.

Thanks to everybody involved.




Re: [RFC 3/6] migration: migrate 'blk' command option is deprecated.

2023-06-22 Thread Juan Quintela
Daniel P. Berrangé  wrote:
> On Mon, Jun 12, 2023 at 09:33:41PM +0200, Juan Quintela wrote:
>> Use 'migrate_set_capability block true' instead.
>> 
>> Signed-off-by: Juan Quintela 
>> ---
>>  docs/about/deprecated.rst |  7 +++
>>  qapi/migration.json   | 11 +++
>>  migration/migration.c |  5 +
>>  3 files changed, 19 insertions(+), 4 deletions(-)
>> 
>> diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst
>> index c75a3a8f5a..47e98dc95e 100644
>> --- a/docs/about/deprecated.rst
>> +++ b/docs/about/deprecated.rst
>> @@ -440,3 +440,10 @@ The new way to modify migration is using migration 
>> parameters.
>>  ``inc`` functionality can be acchieved using
>>  ``migrate_set_parameter block-incremental true``.
>>  
>> +``blk`` migrate command option (since 8.1)
>> +''
>> +
>> +The new way to modify migration is using migration parameters.
>> +``blk`` functionality can be acchieved using
>> +``migrate_set_parameter block-incremental true``.
>
> Same comments on rewording as the previous patch, so won't repeat them
> all.

Did the same as for the previous one.  Thanks.




Re: [RFC 2/6] migration: migrate 'inc' command option is deprecated.

2023-06-22 Thread Juan Quintela
Daniel P. Berrangé  wrote:
> On Mon, Jun 12, 2023 at 09:33:40PM +0200, Juan Quintela wrote:
>> Use 'migrate_set_parameter block_incremental true' instead.
>> 
>> Signed-off-by: Juan Quintela 
>> ---
>>  docs/about/deprecated.rst |  7 +++
>>  qapi/migration.json   | 11 +--
>>  migration/migration.c |  5 +
>>  3 files changed, 21 insertions(+), 2 deletions(-)
>> 
>> diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst
>> index e1aa0eafc8..c75a3a8f5a 100644
>> --- a/docs/about/deprecated.rst
>> +++ b/docs/about/deprecated.rst
>> @@ -433,3 +433,10 @@ Migration
>>  ``skipped`` field in Migration stats has been deprecated.  It hasn't
>>  been used for more than 10 years.
>>  
>> +``inc`` migrate command option (since 8.1)
>> +''
>> +
>> +The new way to modify migration is using migration parameters.
>> +``inc`` functionality can be acchieved using
>> +``migrate_set_parameter block-incremental true``.
>
> This is a HMP command, but the change affects QMP too. I'd suggest
>
>  ``inc`` functionality can be achieved by setting the
>  ``block-incremental`` migration parameter to ``true``.

Applied all suggestions.  Thanks.




Re: [PATCH v4 13/17] target/riscv: Add Zvkg ISA extension support

2023-06-22 Thread Daniel Henrique Barboza




On 6/22/23 13:16, Max Chou wrote:

From: Nazar Kazakov 

This commit adds support for the Zvkg vector-crypto extension, which
consists of the following instructions:

* vgmul.vv
* vghsh.vv

Translation functions are defined in
`target/riscv/insn_trans/trans_rvvk.c.inc` and helpers are defined in
`target/riscv/vcrypto_helper.c`.

Co-authored-by: Lawrence Hunter 
[max.c...@sifive.com: Replaced vstart checking by TCG op]
Signed-off-by: Lawrence Hunter 
Signed-off-by: Nazar Kazakov 
Signed-off-by: Max Chou 
---


Reviewed-by: Daniel Henrique Barboza 


  target/riscv/cpu.c   |  5 +-
  target/riscv/cpu_cfg.h   |  1 +
  target/riscv/helper.h|  3 +
  target/riscv/insn32.decode   |  4 ++
  target/riscv/insn_trans/trans_rvvk.c.inc | 30 ++
  target/riscv/vcrypto_helper.c| 72 
  6 files changed, 113 insertions(+), 2 deletions(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index c9a9ff80cd..8e60a122d4 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -118,6 +118,7 @@ static const struct isa_ext_data isa_edata_arr[] = {
  ISA_EXT_DATA_ENTRY(zve64d, PRIV_VERSION_1_10_0, ext_zve64d),
  ISA_EXT_DATA_ENTRY(zvfh, PRIV_VERSION_1_12_0, ext_zvfh),
  ISA_EXT_DATA_ENTRY(zvfhmin, PRIV_VERSION_1_12_0, ext_zvfhmin),
+ISA_EXT_DATA_ENTRY(zvkg, PRIV_VERSION_1_12_0, ext_zvkg),
  ISA_EXT_DATA_ENTRY(zvkned, PRIV_VERSION_1_12_0, ext_zvkned),
  ISA_EXT_DATA_ENTRY(zvknha, PRIV_VERSION_1_12_0, ext_zvknha),
  ISA_EXT_DATA_ENTRY(zvknhb, PRIV_VERSION_1_12_0, ext_zvknhb),
@@ -1198,8 +1199,8 @@ void riscv_cpu_validate_set_extensions(RISCVCPU *cpu, 
Error **errp)
   * In principle Zve*x would also suffice here, were they supported
   * in qemu
   */
-if ((cpu->cfg.ext_zvbb || cpu->cfg.ext_zvkned || cpu->cfg.ext_zvknha ||
- cpu->cfg.ext_zvksh) && !cpu->cfg.ext_zve32f) {
+if ((cpu->cfg.ext_zvbb || cpu->cfg.ext_zvkg || cpu->cfg.ext_zvkned ||
+ cpu->cfg.ext_zvknha || cpu->cfg.ext_zvksh) && !cpu->cfg.ext_zve32f) {
  error_setg(errp,
 "Vector crypto extensions require V or Zve* extensions");
  return;
diff --git a/target/riscv/cpu_cfg.h b/target/riscv/cpu_cfg.h
index f859d9e2f5..b125b0b33f 100644
--- a/target/riscv/cpu_cfg.h
+++ b/target/riscv/cpu_cfg.h
@@ -85,6 +85,7 @@ struct RISCVCPUConfig {
  bool ext_zve64d;
  bool ext_zvbb;
  bool ext_zvbc;
+bool ext_zvkg;
  bool ext_zvkned;
  bool ext_zvknha;
  bool ext_zvknhb;
diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 9220af18e6..a4fe1ff5ca 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1241,3 +1241,6 @@ DEF_HELPER_5(vsha2cl_vv, void, ptr, ptr, ptr, env, i32)
  
  DEF_HELPER_5(vsm3me_vv, void, ptr, ptr, ptr, env, i32)

  DEF_HELPER_5(vsm3c_vi, void, ptr, ptr, i32, env, i32)
+
+DEF_HELPER_5(vghsh_vv, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_4(vgmul_vv, void, ptr, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 5ca83e8462..b10497afd3 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -957,3 +957,7 @@ vsha2cl_vv  10 1 . . 010 . 1110111 @r_vm_1
  # *** Zvksh vector crypto extension ***
  vsm3me_vv   10 1 . . 010 . 1110111 @r_vm_1
  vsm3c_vi101011 1 . . 010 . 1110111 @r_vm_1
+
+# *** Zvkg vector crypto extension ***
+vghsh_vv101100 1 . . 010 . 1110111 @r_vm_1
+vgmul_vv101000 1 . 10001 010 . 1110111 @r2_vm_1
diff --git a/target/riscv/insn_trans/trans_rvvk.c.inc 
b/target/riscv/insn_trans/trans_rvvk.c.inc
index af1fb74c38..e5ccb26c45 100644
--- a/target/riscv/insn_trans/trans_rvvk.c.inc
+++ b/target/riscv/insn_trans/trans_rvvk.c.inc
@@ -510,3 +510,33 @@ static inline bool vsm3c_check(DisasContext *s, arg_rmrr *a)
  
  GEN_VV_UNMASKED_TRANS(vsm3me_vv, vsm3me_check, ZVKSH_EGS)

  GEN_VI_UNMASKED_TRANS(vsm3c_vi, vsm3c_check, ZVKSH_EGS)
+
+/*
+ * Zvkg
+ */
+
+#define ZVKG_EGS 4
+
+static bool vgmul_check(DisasContext *s, arg_rmr *a)
+{
+int egw_bytes = ZVKG_EGS << s->sew;
+return s->cfg_ptr->ext_zvkg == true &&
+   vext_check_isa_ill(s) &&
+   require_rvv(s) &&
+   MAXSZ(s) >= egw_bytes &&
+   vext_check_ss(s, a->rd, a->rs2, a->vm) &&
+   s->sew == MO_32;
+}
+
+GEN_V_UNMASKED_TRANS(vgmul_vv, vgmul_check, ZVKG_EGS)
+
+static bool vghsh_check(DisasContext *s, arg_rmrr *a)
+{
+int egw_bytes = ZVKG_EGS << s->sew;
+return s->cfg_ptr->ext_zvkg == true &&
+   opivv_check(s, a) &&
+   MAXSZ(s) >= egw_bytes &&
+   s->sew == MO_32;
+}
+
+GEN_VV_UNMASKED_TRANS(vghsh_vv, vghsh_check, ZVKG_EGS)
diff --git a/target/riscv/vcrypto_helper.c b/target/riscv/vcrypto_helper.c
index 06c8f4adc7..04e6374211 100644
--- a/target/riscv/vcrypto_helper.c
+++ b/target/riscv/vcrypto_helper.c
@@ -851,3 +851,75 @@ void HELPER(vsm3c_

Re: [PATCH v4 12/17] target/riscv: Add Zvksh ISA extension support

2023-06-22 Thread Daniel Henrique Barboza




On 6/22/23 13:16, Max Chou wrote:

From: Lawrence Hunter 

This commit adds support for the Zvksh vector-crypto extension, which
consists of the following instructions:

* vsm3me.vv
* vsm3c.vi
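For orientation, the message expansion these instructions accelerate is built on SM3's P1 permutation; a scalar sketch based on the public SM3 specification (my illustration, not code from this patch):

```c
#include <assert.h>
#include <stdint.h>

/* Rotate left within a 32-bit element (n must be 1..31 here). */
static uint32_t rol32(uint32_t x, unsigned n)
{
    return (x << n) | (x >> (32 - n));
}

/* SM3's P1 permutation, which vsm3me.vv applies element-wise during
 * message expansion (per the public SM3 spec; illustration only). */
static uint32_t sm3_p1(uint32_t x)
{
    return x ^ rol32(x, 15) ^ rol32(x, 23);
}
```

The vector instruction effectively performs this permutation (plus the word-mixing XORs) on eight 32-bit words per element group, which is why the check above requires SEW=32 and an element-group size of 8.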

Translation functions are defined in
`target/riscv/insn_trans/trans_rvvk.c.inc` and helpers are defined in
`target/riscv/vcrypto_helper.c`.

Co-authored-by: Kiran Ostrolenk 
[max.c...@sifive.com: Replaced vstart checking by TCG op]
Signed-off-by: Kiran Ostrolenk 
Signed-off-by: Lawrence Hunter 
Signed-off-by: Max Chou 
---


Reviewed-by: Daniel Henrique Barboza 


  target/riscv/cpu.c   |   5 +-
  target/riscv/cpu_cfg.h   |   1 +
  target/riscv/helper.h|   3 +
  target/riscv/insn32.decode   |   4 +
  target/riscv/insn_trans/trans_rvvk.c.inc |  31 ++
  target/riscv/vcrypto_helper.c| 134 +++
  6 files changed, 176 insertions(+), 2 deletions(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 6bba8ba8c9..c9a9ff80cd 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -121,6 +121,7 @@ static const struct isa_ext_data isa_edata_arr[] = {
  ISA_EXT_DATA_ENTRY(zvkned, PRIV_VERSION_1_12_0, ext_zvkned),
  ISA_EXT_DATA_ENTRY(zvknha, PRIV_VERSION_1_12_0, ext_zvknha),
  ISA_EXT_DATA_ENTRY(zvknhb, PRIV_VERSION_1_12_0, ext_zvknhb),
+ISA_EXT_DATA_ENTRY(zvksh, PRIV_VERSION_1_12_0, ext_zvksh),
  ISA_EXT_DATA_ENTRY(zhinx, PRIV_VERSION_1_12_0, ext_zhinx),
  ISA_EXT_DATA_ENTRY(zhinxmin, PRIV_VERSION_1_12_0, ext_zhinxmin),
  ISA_EXT_DATA_ENTRY(smaia, PRIV_VERSION_1_12_0, ext_smaia),
@@ -1197,8 +1198,8 @@ void riscv_cpu_validate_set_extensions(RISCVCPU *cpu, Error **errp)
   * In principle Zve*x would also suffice here, were they supported
   * in qemu
   */
-if ((cpu->cfg.ext_zvbb || cpu->cfg.ext_zvkned || cpu->cfg.ext_zvknha) &&
-!cpu->cfg.ext_zve32f) {
+if ((cpu->cfg.ext_zvbb || cpu->cfg.ext_zvkned || cpu->cfg.ext_zvknha ||
+ cpu->cfg.ext_zvksh) && !cpu->cfg.ext_zve32f) {
  error_setg(errp,
 "Vector crypto extensions require V or Zve* extensions");
  return;
diff --git a/target/riscv/cpu_cfg.h b/target/riscv/cpu_cfg.h
index 41cce87ffc..f859d9e2f5 100644
--- a/target/riscv/cpu_cfg.h
+++ b/target/riscv/cpu_cfg.h
@@ -88,6 +88,7 @@ struct RISCVCPUConfig {
  bool ext_zvkned;
  bool ext_zvknha;
  bool ext_zvknhb;
+bool ext_zvksh;
  bool ext_zmmul;
  bool ext_zvfh;
  bool ext_zvfhmin;
diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 19f5a8a28d..9220af18e6 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1238,3 +1238,6 @@ DEF_HELPER_5(vaeskf2_vi, void, ptr, ptr, i32, env, i32)
  DEF_HELPER_5(vsha2ms_vv, void, ptr, ptr, ptr, env, i32)
  DEF_HELPER_5(vsha2ch_vv, void, ptr, ptr, ptr, env, i32)
  DEF_HELPER_5(vsha2cl_vv, void, ptr, ptr, ptr, env, i32)
+
+DEF_HELPER_5(vsm3me_vv, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_5(vsm3c_vi, void, ptr, ptr, i32, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index d2cfb2729c..5ca83e8462 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -953,3 +953,7 @@ vaeskf2_vi  101010 1 . . 010 . 1110111 @r_vm_1
  vsha2ms_vv  101101 1 . . 010 . 1110111 @r_vm_1
  vsha2ch_vv  101110 1 . . 010 . 1110111 @r_vm_1
  vsha2cl_vv  10 1 . . 010 . 1110111 @r_vm_1
+
+# *** Zvksh vector crypto extension ***
+vsm3me_vv   10 1 . . 010 . 1110111 @r_vm_1
+vsm3c_vi101011 1 . . 010 . 1110111 @r_vm_1
diff --git a/target/riscv/insn_trans/trans_rvvk.c.inc b/target/riscv/insn_trans/trans_rvvk.c.inc
index 528a0d3b32..af1fb74c38 100644
--- a/target/riscv/insn_trans/trans_rvvk.c.inc
+++ b/target/riscv/insn_trans/trans_rvvk.c.inc
@@ -479,3 +479,34 @@ static bool vsha_check(DisasContext *s, arg_rmrr *a)
  GEN_VV_UNMASKED_TRANS(vsha2ms_vv, vsha_check, ZVKNH_EGS)
  GEN_VV_UNMASKED_TRANS(vsha2cl_vv, vsha_check, ZVKNH_EGS)
  GEN_VV_UNMASKED_TRANS(vsha2ch_vv, vsha_check, ZVKNH_EGS)
+
+/*
+ * Zvksh
+ */
+
+#define ZVKSH_EGS 8
+
+static inline bool vsm3_check(DisasContext *s, arg_rmrr *a)
+{
+int egw_bytes = ZVKSH_EGS << s->sew;
+int mult = 1 << MAX(s->lmul, 0);
+return s->cfg_ptr->ext_zvksh == true &&
+   require_rvv(s) &&
+   vext_check_isa_ill(s) &&
+   !is_overlapped(a->rd, mult, a->rs2, mult) &&
+   MAXSZ(s) >= egw_bytes &&
+   s->sew == MO_32;
+}
+
+static inline bool vsm3me_check(DisasContext *s, arg_rmrr *a)
+{
+return vsm3_check(s, a) && vext_check_sss(s, a->rd, a->rs1, a->rs2, a->vm);
+}
+
+static inline bool vsm3c_check(DisasContext *s, arg_rmrr *a)
+{
+return vsm3_check(s, a) && vext_check_ss(s, a->rd, a->rs2, a->vm);
+}
+
+GEN_VV_UNMASKED_TRANS(vsm3me_vv, vsm3me_check, ZVKSH_EGS)
+GEN_VI_UNMASKED_TRANS(vsm3c_vi, vsm3c_check, ZVKSH_EGS)
diff --git a/target/ris

Re: [PATCH v4 11/17] target/riscv: Add Zvknh ISA extension support

2023-06-22 Thread Daniel Henrique Barboza




On 6/22/23 13:16, Max Chou wrote:

From: Kiran Ostrolenk 

This commit adds support for the Zvknh vector-crypto extension, which
consists of the following instructions:

* vsha2ms.vv
* vsha2c[hl].vv
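For readers less familiar with SHA-2, a scalar model of the small-sigma functions and one message-schedule step that vsha2ms.vv computes four elements at a time (taken from the public SHA-256 spec; an illustration, not the patch's implementation):

```c
#include <assert.h>
#include <stdint.h>

/* Rotate right within a 32-bit element (n must be 1..31 here). */
static uint32_t ror32(uint32_t x, unsigned n)
{
    return (x >> n) | (x << (32 - n));
}

/* SHA-256 small-sigma functions from the public spec. */
static uint32_t sha256_sig0(uint32_t x)
{
    return ror32(x, 7) ^ ror32(x, 18) ^ (x >> 3);
}

static uint32_t sha256_sig1(uint32_t x)
{
    return ror32(x, 17) ^ ror32(x, 19) ^ (x >> 10);
}

/* One message-schedule step: W[i] = sig1(W[i-2]) + W[i-7]
 *  + sig0(W[i-15]) + W[i-16], for i >= 16. */
static uint32_t sha256_msg_step(uint32_t w16, uint32_t w15,
                                uint32_t w7, uint32_t w2)
{
    return sha256_sig1(w2) + w7 + sha256_sig0(w15) + w16;
}
```

The SEW=32 case corresponds to SHA-256 (Zvknha); Zvknhb additionally allows SEW=64 for the SHA-512 variants of the same schedule.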

Translation functions are defined in
`target/riscv/insn_trans/trans_rvvk.c.inc` and helpers are defined in
`target/riscv/vcrypto_helper.c`.

Co-authored-by: Nazar Kazakov 
Co-authored-by: Lawrence Hunter 
[max.c...@sifive.com: Replaced vstart checking by TCG op]
Signed-off-by: Nazar Kazakov 
Signed-off-by: Lawrence Hunter 
Signed-off-by: Kiran Ostrolenk 
Signed-off-by: Max Chou 
---


Reviewed-by: Daniel Henrique Barboza 


  target/riscv/cpu.c   |  11 +-
  target/riscv/cpu_cfg.h   |   2 +
  target/riscv/helper.h|   4 +
  target/riscv/insn32.decode   |   5 +
  target/riscv/insn_trans/trans_rvvk.c.inc |  78 +
  target/riscv/vcrypto_helper.c| 214 +++
  6 files changed, 311 insertions(+), 3 deletions(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index b6c755ba13..6bba8ba8c9 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -119,6 +119,8 @@ static const struct isa_ext_data isa_edata_arr[] = {
  ISA_EXT_DATA_ENTRY(zvfh, PRIV_VERSION_1_12_0, ext_zvfh),
  ISA_EXT_DATA_ENTRY(zvfhmin, PRIV_VERSION_1_12_0, ext_zvfhmin),
  ISA_EXT_DATA_ENTRY(zvkned, PRIV_VERSION_1_12_0, ext_zvkned),
+ISA_EXT_DATA_ENTRY(zvknha, PRIV_VERSION_1_12_0, ext_zvknha),
+ISA_EXT_DATA_ENTRY(zvknhb, PRIV_VERSION_1_12_0, ext_zvknhb),
  ISA_EXT_DATA_ENTRY(zhinx, PRIV_VERSION_1_12_0, ext_zhinx),
  ISA_EXT_DATA_ENTRY(zhinxmin, PRIV_VERSION_1_12_0, ext_zhinxmin),
  ISA_EXT_DATA_ENTRY(smaia, PRIV_VERSION_1_12_0, ext_smaia),
@@ -1195,14 +1197,17 @@ void riscv_cpu_validate_set_extensions(RISCVCPU *cpu, Error **errp)
   * In principle Zve*x would also suffice here, were they supported
   * in qemu
   */
-if ((cpu->cfg.ext_zvbb || cpu->cfg.ext_zvkned) && !cpu->cfg.ext_zve32f) {
+if ((cpu->cfg.ext_zvbb || cpu->cfg.ext_zvkned || cpu->cfg.ext_zvknha) &&
+!cpu->cfg.ext_zve32f) {
  error_setg(errp,
 "Vector crypto extensions require V or Zve* extensions");
  return;
  }
  
-if (cpu->cfg.ext_zvbc && !cpu->cfg.ext_zve64f) {
-error_setg(errp, "Zvbc extension requires V or Zve64{f,d} extensions");
+if ((cpu->cfg.ext_zvbc || cpu->cfg.ext_zvknhb) && !cpu->cfg.ext_zve64f) {
+error_setg(
+errp,
+"Zvbc and Zvknhb extensions require V or Zve64{f,d} extensions");
  return;
  }
  
diff --git a/target/riscv/cpu_cfg.h b/target/riscv/cpu_cfg.h
index 4636d4c84d..41cce87ffc 100644
--- a/target/riscv/cpu_cfg.h
+++ b/target/riscv/cpu_cfg.h
@@ -86,6 +86,8 @@ struct RISCVCPUConfig {
  bool ext_zvbb;
  bool ext_zvbc;
  bool ext_zvkned;
+bool ext_zvknha;
+bool ext_zvknhb;
  bool ext_zmmul;
  bool ext_zvfh;
  bool ext_zvfhmin;
diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 738f20d3ca..19f5a8a28d 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1234,3 +1234,7 @@ DEF_HELPER_4(vaesdm_vs, void, ptr, ptr, env, i32)
  DEF_HELPER_4(vaesz_vs, void, ptr, ptr, env, i32)
  DEF_HELPER_5(vaeskf1_vi, void, ptr, ptr, i32, env, i32)
  DEF_HELPER_5(vaeskf2_vi, void, ptr, ptr, i32, env, i32)
+
+DEF_HELPER_5(vsha2ms_vv, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_5(vsha2ch_vv, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_5(vsha2cl_vv, void, ptr, ptr, ptr, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 7e0295d493..d2cfb2729c 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -948,3 +948,8 @@ vaesdm_vs   101001 1 . 0 010 . 1110111 @r2_vm_1
  vaesz_vs101001 1 . 00111 010 . 1110111 @r2_vm_1
  vaeskf1_vi  100010 1 . . 010 . 1110111 @r_vm_1
  vaeskf2_vi  101010 1 . . 010 . 1110111 @r_vm_1
+
+# *** Zvknh vector crypto extension ***
+vsha2ms_vv  101101 1 . . 010 . 1110111 @r_vm_1
+vsha2ch_vv  101110 1 . . 010 . 1110111 @r_vm_1
+vsha2cl_vv  10 1 . . 010 . 1110111 @r_vm_1
diff --git a/target/riscv/insn_trans/trans_rvvk.c.inc b/target/riscv/insn_trans/trans_rvvk.c.inc
index c618f76e7e..528a0d3b32 100644
--- a/target/riscv/insn_trans/trans_rvvk.c.inc
+++ b/target/riscv/insn_trans/trans_rvvk.c.inc
@@ -401,3 +401,81 @@ static bool vaeskf2_check(DisasContext *s, arg_vaeskf2_vi *a)
  
  GEN_VI_UNMASKED_TRANS(vaeskf1_vi, vaeskf1_check, ZVKNED_EGS)
  GEN_VI_UNMASKED_TRANS(vaeskf2_vi, vaeskf2_check, ZVKNED_EGS)
+
+/*
+ * Zvknh
+ */
+
+#define ZVKNH_EGS 4
+
+#define GEN_VV_UNMASKED_TRANS(NAME, CHECK, EGS)\
+static bool trans_##NAME(DisasContext *s, arg_rmrr *a) \
+{  \
+if (CHECK(s, a)) {   

Re: [PATCH v4 10/17] target/riscv: Add Zvkned ISA extension support

2023-06-22 Thread Daniel Henrique Barboza




On 6/22/23 13:16, Max Chou wrote:

From: Nazar Kazakov 

This commit adds support for the Zvkned vector-crypto extension, which
consists of the following instructions:

* vaesef.[vv,vs]
* vaesdf.[vv,vs]
* vaesdm.[vv,vs]
* vaesz.vs
* vaesem.[vv,vs]
* vaeskf1.vi
* vaeskf2.vi
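As background for the GF(2^8) arithmetic inside these helpers (MixColumns and its inverse), the core primitive is doubling modulo the AES polynomial; a scalar sketch using standard FIPS-197 arithmetic (illustration only, not the patch's code):

```c
#include <assert.h>
#include <stdint.h>

/* GF(2^8) doubling ("xtime") with the AES reduction polynomial
 * x^8 + x^4 + x^3 + x + 1 (0x11b). Standard AES arithmetic from
 * FIPS-197, shown for orientation only. */
static uint8_t aes_xtime(uint8_t b)
{
    return (uint8_t)((b << 1) ^ ((b & 0x80) ? 0x1b : 0x00));
}

/* GF(2^8) multiply by shift-and-add, as used by MixColumns. */
static uint8_t aes_gf_mul(uint8_t a, uint8_t b)
{
    uint8_t r = 0;
    while (b) {
        if (b & 1) {
            r ^= a;      /* add (XOR) current multiple */
        }
        a = aes_xtime(a); /* double in GF(2^8) */
        b >>= 1;
    }
    return r;
}
```

The vector instructions operate on whole 128-bit element groups (four 32-bit words per group), which is why the checks below use an element-group size of 4 with SEW=32.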

Translation functions are defined in
`target/riscv/insn_trans/trans_rvvk.c.inc` and helpers are defined in
`target/riscv/vcrypto_helper.c`.

Co-authored-by: Lawrence Hunter 
Co-authored-by: William Salmon 
[max.c...@sifive.com: Replaced vstart checking by TCG op]
Signed-off-by: Lawrence Hunter 
Signed-off-by: William Salmon 
Signed-off-by: Nazar Kazakov 
Signed-off-by: Max Chou 
---


Reviewed-by: Daniel Henrique Barboza 


  target/riscv/cpu.c   |   3 +-
  target/riscv/cpu_cfg.h   |   1 +
  target/riscv/helper.h|  13 +
  target/riscv/insn32.decode   |  14 ++
  target/riscv/insn_trans/trans_rvvk.c.inc | 177 +
  target/riscv/op_helper.c |   6 +
  target/riscv/vcrypto_helper.c| 308 +++
  7 files changed, 521 insertions(+), 1 deletion(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 4ee5219dbc..b6c755ba13 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -118,6 +118,7 @@ static const struct isa_ext_data isa_edata_arr[] = {
  ISA_EXT_DATA_ENTRY(zve64d, PRIV_VERSION_1_10_0, ext_zve64d),
  ISA_EXT_DATA_ENTRY(zvfh, PRIV_VERSION_1_12_0, ext_zvfh),
  ISA_EXT_DATA_ENTRY(zvfhmin, PRIV_VERSION_1_12_0, ext_zvfhmin),
+ISA_EXT_DATA_ENTRY(zvkned, PRIV_VERSION_1_12_0, ext_zvkned),
  ISA_EXT_DATA_ENTRY(zhinx, PRIV_VERSION_1_12_0, ext_zhinx),
  ISA_EXT_DATA_ENTRY(zhinxmin, PRIV_VERSION_1_12_0, ext_zhinxmin),
  ISA_EXT_DATA_ENTRY(smaia, PRIV_VERSION_1_12_0, ext_smaia),
@@ -1194,7 +1195,7 @@ void riscv_cpu_validate_set_extensions(RISCVCPU *cpu, Error **errp)
   * In principle Zve*x would also suffice here, were they supported
   * in qemu
   */
-if (cpu->cfg.ext_zvbb && !cpu->cfg.ext_zve32f) {
+if ((cpu->cfg.ext_zvbb || cpu->cfg.ext_zvkned) && !cpu->cfg.ext_zve32f) {
  error_setg(errp,
 "Vector crypto extensions require V or Zve* extensions");
  return;
diff --git a/target/riscv/cpu_cfg.h b/target/riscv/cpu_cfg.h
index 0904dc3ae5..4636d4c84d 100644
--- a/target/riscv/cpu_cfg.h
+++ b/target/riscv/cpu_cfg.h
@@ -85,6 +85,7 @@ struct RISCVCPUConfig {
  bool ext_zve64d;
  bool ext_zvbb;
  bool ext_zvbc;
+bool ext_zvkned;
  bool ext_zmmul;
  bool ext_zvfh;
  bool ext_zvfhmin;
diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index fbb0ceca81..738f20d3ca 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1,5 +1,6 @@
  /* Exceptions */
  DEF_HELPER_2(raise_exception, noreturn, env, i32)
+DEF_HELPER_2(restore_cpu_and_raise_exception, noreturn, env, i32)
  
  /* Floating Point - rounding mode */

  DEF_HELPER_FLAGS_2(set_rounding_mode, TCG_CALL_NO_WG, void, env, i32)
@@ -1221,3 +1222,15 @@ DEF_HELPER_6(vandn_vx_b, void, ptr, ptr, tl, ptr, env, i32)
  DEF_HELPER_6(vandn_vx_h, void, ptr, ptr, tl, ptr, env, i32)
  DEF_HELPER_6(vandn_vx_w, void, ptr, ptr, tl, ptr, env, i32)
  DEF_HELPER_6(vandn_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+
+DEF_HELPER_4(vaesef_vv, void, ptr, ptr, env, i32)
+DEF_HELPER_4(vaesef_vs, void, ptr, ptr, env, i32)
+DEF_HELPER_4(vaesdf_vv, void, ptr, ptr, env, i32)
+DEF_HELPER_4(vaesdf_vs, void, ptr, ptr, env, i32)
+DEF_HELPER_4(vaesem_vv, void, ptr, ptr, env, i32)
+DEF_HELPER_4(vaesem_vs, void, ptr, ptr, env, i32)
+DEF_HELPER_4(vaesdm_vv, void, ptr, ptr, env, i32)
+DEF_HELPER_4(vaesdm_vs, void, ptr, ptr, env, i32)
+DEF_HELPER_4(vaesz_vs, void, ptr, ptr, env, i32)
+DEF_HELPER_5(vaeskf1_vi, void, ptr, ptr, i32, env, i32)
+DEF_HELPER_5(vaeskf2_vi, void, ptr, ptr, i32, env, i32)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index aa6d3185a2..7e0295d493 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -75,6 +75,7 @@
  @r_rm...   . . ... . ... %rs2 %rs1 %rm %rd
  @r2_rm   ...   . . ... . ... %rs1 %rm %rd
  @r2  ...   . . ... . ... &r2 %rs1 %rd
+@r2_vm_1 .. . . . ... . ... &rmr vm=1 %rs2 %rd
  @r2_nfvm ... ... vm:1 . . ... . ... &r2nfvm %nf %rs1 %rd
  @r2_vm   .. vm:1 . . ... . ... &rmr %rs2 %rd
  @r1_vm   .. vm:1 . . ... . ... %rd
@@ -934,3 +935,16 @@ vcpop_v 010010 . . 01110 010 . 1010111 @r2_vm
  vwsll_vv110101 . . . 000 . 1010111 @r_vm
  vwsll_vx110101 . . . 100 . 1010111 @r_vm
  vwsll_vi110101 . . . 011 . 1010111 @r_vm
+
+# *** Zvkned vector crypto extension ***
+vaesef_vv   101000 1 . 00011 010 . 1110111 @r2_vm_1
+vaesef_vs   101001 1 . 00011 010 . 1110111 @r2_vm_1
+vaesdf_vv   101000 1 . 00

Re: [PATCH v2 15/16] accel: Rename 'cpu_state' -> 'cpu'

2023-06-22 Thread Richard Henderson

On 6/22/23 18:08, Philippe Mathieu-Daudé wrote:

Most of the codebase uses 'CPUState *cpu' or 'CPUState *cs'.
While 'cpu_state' is kind of explicit, it makes the code
harder to review. Simply rename as 'cpu' like the rest.

Signed-off-by: Philippe Mathieu-Daudé 


I would have chosen 'cs', since 'cpu' is often used for ArchCPU.  But ok.

Acked-by: Richard Henderson 


r~



Re: [RFC 1/6] migration: skipped field is really obsolete.

2023-06-22 Thread Juan Quintela
Daniel P. Berrangé  wrote:
> On Mon, Jun 12, 2023 at 09:33:39PM +0200, Juan Quintela wrote:
>> Has return zero for more than 10 years.  Just mark it deprecated.
>
> Specifically we introduced the field in 1.5.0
>
> commit f1c72795af573b24a7da5eb52375c9aba8a37972
> Author: Peter Lieven 
> Date:   Tue Mar 26 10:58:37 2013 +0100
>
> migration: do not sent zero pages in bulk stage
> 
> during bulk stage of ram migration if a page is a
> zero page do not send it at all.
> the memory at the destination reads as zero anyway.
> 
> even if there is an madvise with QEMU_MADV_DONTNEED
> at the target upon receipt of a zero page I have observed
> that the target starts swapping if the memory is overcommitted.
> it seems that the pages are dropped asynchronously.
> 
> this patch also updates QMP to return the number of
> skipped pages in MigrationStats.
> 
>
>
> but removed its usage in 1.5.3
>
> commit 9ef051e5536b6368a1076046ec6c4ec4ac12b5c6
> Author: Peter Lieven 
> Date:   Mon Jun 10 12:14:19 2013 +0200
>
> Revert "migration: do not sent zero pages in bulk stage"
> 
> Not sending zero pages breaks migration if a page is zero
> at the source but not at the destination. This can e.g. happen
> if different BIOS versions are used at source and destination.
> It has also been reported that migration on pseries is completely
> broken with this patch.
> 
> This effectively reverts commit f1c72795af573b24a7da5eb52375c9aba8a37972.


Thanks for the history O:-)

>> Signed-off-by: Juan Quintela 
>> ---
>>  docs/about/deprecated.rst | 10 ++
>>  qapi/migration.json   | 12 ++--
>>  2 files changed, 20 insertions(+), 2 deletions(-)
>
> Reviewed-by: Daniel P. Berrangé 
>
>
>> diff --git a/qapi/migration.json b/qapi/migration.json
>> index cb7cd3e578..bcae193733 100644
>> --- a/qapi/migration.json
>> +++ b/qapi/migration.json
>> @@ -23,7 +23,8 @@
>>  #
>>  # @duplicate: number of duplicate (zero) pages (since 1.2)
>>  #
>> -# @skipped: number of skipped zero pages (since 1.5)
>> +# @skipped: number of skipped zero pages. Don't use, only provided for
>> +# compatibility (since 1.5)
>
> I'd say
>
>@skipped: number of skipped zero pages. Always zero, only provided for
>compatibility (since 1.5)

Changed.

>>  #
>>  # @normal: number of normal pages (since 1.2)
>>  #
>> @@ -62,11 +63,18 @@
>>  # between 0 and @dirty-sync-count * @multifd-channels.  (since
>>  # 7.1)
>>  #
>> +# Features:
>> +#
>> +# @deprecated: Member @skipped has not been used for a long time.
>
>   @deprecated: Member @skipped is always zero since 1.5.3

Changed.

Thanks.




Re: [PATCH v2 5/5] hw/pci: ensure PCIE devices are plugged into only slot 0 of PCIE port

2023-06-22 Thread Michael S. Tsirkin
On Thu, Jun 22, 2023 at 05:46:40PM +0200, Julia Suvorova wrote:
> On Thu, Jun 22, 2023 at 12:34 PM Ani Sinha  wrote:
> >
> > PCI Express ports only have one slot, so PCI Express devices can only be
> > plugged into slot 0 on a PCIE port. Enforce it.
> >
> > CC: jus...@redhat.com
> > CC: imamm...@redhat.com
> > Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2128929
> > Signed-off-by: Ani Sinha 
> > ---
> >  hw/pci/pci.c | 6 ++
> >  1 file changed, 6 insertions(+)
> >
> > diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> > index bf38905b7d..5f25ab9f5e 100644
> > --- a/hw/pci/pci.c
> > +++ b/hw/pci/pci.c
> > @@ -64,6 +64,7 @@ bool pci_available = true;
> >  static char *pcibus_get_dev_path(DeviceState *dev);
> >  static char *pcibus_get_fw_dev_path(DeviceState *dev);
> >  static void pcibus_reset(BusState *qbus);
> > +static bool pcie_has_upstream_port(PCIDevice *dev);
> >
> >  static Property pci_props[] = {
> >  DEFINE_PROP_PCI_DEVFN("addr", PCIDevice, devfn, -1),
> > @@ -1189,6 +1190,11 @@ static PCIDevice *do_pci_register_device(PCIDevice *pci_dev,
> > name);
> >
> > return NULL;
> > +} else if (pcie_has_upstream_port(pci_dev) && PCI_SLOT(devfn)) {
> > +error_setg(errp, "PCI: slot %d is not valid for %s,"
> > +   " PCI express devices can only be plugged into slot 0.",
> 
> This is not technically correct, because downstream ports and root
> ports are also PCIe devices, and they can have different slots under
> upstream ports and RC. But this error will never be shown for them, so
> it seems fine.

Hmm. Confusing users is not nice ... I agree this might
make people think they cannot use root ports in slot != 0 either.

Would you add "with an upstream port"?
E.g. "PCI Express devices with an upstream port" ?
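For context, the slot number being validated comes from the standard device/function encoding of devfn (bits 7:3 device, bits 2:0 function, per the PCI specification); a minimal sketch mirroring what QEMU's PCI_SLOT()/PCI_FUNC() macros compute (my illustration):

```c
#include <assert.h>

/* Standard PCI devfn encoding: device (slot) number in bits 7:3,
 * function number in bits 2:0 -- the same split QEMU's PCI_SLOT()
 * and PCI_FUNC() macros use. Illustration only. */
static int pci_slot(int devfn)
{
    return (devfn >> 3) & 0x1f;
}

static int pci_func(int devfn)
{
    return devfn & 0x07;
}
```

The patch under discussion rejects any devfn whose slot part is non-zero when the device sits below a PCIe upstream port, since such ports expose only slot 0.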

> 
> Reviewed-by: Julia Suvorova 
> 
> 
> 
> 
> > +   PCI_SLOT(devfn), name);
> > +return NULL;
> >  }
> >
> >  pci_dev->devfn = devfn;
> > --
> > 2.39.1
> >




Re: [PATCH v4 09/17] target/riscv: Add Zvbb ISA extension support

2023-06-22 Thread Daniel Henrique Barboza




On 6/22/23 13:16, Max Chou wrote:

From: Dickon Hood 

This commit adds support for the Zvbb vector-crypto extension, which
consists of the following instructions:

* vrol.[vv,vx]
* vror.[vv,vx,vi]
* vbrev8.v
* vrev8.v
* vandn.[vv,vx]
* vbrev.v
* vclz.v
* vctz.v
* vcpop.v
* vwsll.[vv,vx,vi]
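A scalar model of a few of the element-wise operations listed above, with semantics taken from the public Zvbb specification (illustration only, not the patch's implementation):

```c
#include <assert.h>
#include <stdint.h>

/* vror.[vv,vx,vi]: rotate right within a 32-bit element; the rotate
 * amount is taken modulo the element width. */
static uint32_t ror32(uint32_t x, unsigned n)
{
    n &= 31;
    return n ? (x >> n) | (x << (32 - n)) : x;
}

/* vbrev8.v: reverse the bit order within each byte. */
static uint8_t brev8(uint8_t b)
{
    b = (uint8_t)(((b & 0xf0) >> 4) | ((b & 0x0f) << 4));
    b = (uint8_t)(((b & 0xcc) >> 2) | ((b & 0x33) << 2));
    b = (uint8_t)(((b & 0xaa) >> 1) | ((b & 0x55) << 1));
    return b;
}

/* vcpop.v: population count per element. */
static unsigned cpop32(uint32_t x)
{
    unsigned n = 0;
    while (x) {
        x &= x - 1;  /* clear lowest set bit */
        n++;
    }
    return n;
}
```

The vector forms apply these per element under the usual mask and LMUL rules; the scalar versions above are just for checking expected results.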

Translation functions are defined in
`target/riscv/insn_trans/trans_rvvk.c.inc` and helpers are defined in
`target/riscv/vcrypto_helper.c`.

Co-authored-by: Nazar Kazakov 
Co-authored-by: William Salmon 
Co-authored-by: Kiran Ostrolenk 
[max.c...@sifive.com: Fix imm mode of vror.vi]
Signed-off-by: Nazar Kazakov 
Signed-off-by: William Salmon 
Signed-off-by: Kiran Ostrolenk 
Signed-off-by: Dickon Hood 
Signed-off-by: Max Chou 
---


Reviewed-by: Daniel Henrique Barboza 


  target/riscv/cpu.c   |  11 ++
  target/riscv/cpu_cfg.h   |   1 +
  target/riscv/helper.h|  62 +
  target/riscv/insn32.decode   |  20 +++
  target/riscv/insn_trans/trans_rvvk.c.inc | 164 +++
  target/riscv/vcrypto_helper.c| 138 +++
  6 files changed, 396 insertions(+)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 53b0fcade6..4ee5219dbc 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -111,6 +111,7 @@ static const struct isa_ext_data isa_edata_arr[] = {
  ISA_EXT_DATA_ENTRY(zksed, PRIV_VERSION_1_12_0, ext_zksed),
  ISA_EXT_DATA_ENTRY(zksh, PRIV_VERSION_1_12_0, ext_zksh),
  ISA_EXT_DATA_ENTRY(zkt, PRIV_VERSION_1_12_0, ext_zkt),
+ISA_EXT_DATA_ENTRY(zvbb, PRIV_VERSION_1_12_0, ext_zvbb),
  ISA_EXT_DATA_ENTRY(zvbc, PRIV_VERSION_1_12_0, ext_zvbc),
  ISA_EXT_DATA_ENTRY(zve32f, PRIV_VERSION_1_10_0, ext_zve32f),
  ISA_EXT_DATA_ENTRY(zve64f, PRIV_VERSION_1_10_0, ext_zve64f),
@@ -1189,6 +1190,16 @@ void riscv_cpu_validate_set_extensions(RISCVCPU *cpu, Error **errp)
  return;
  }
  
+/*
+ * In principle Zve*x would also suffice here, were they supported
+ * in qemu
+ */
+if (cpu->cfg.ext_zvbb && !cpu->cfg.ext_zve32f) {
+error_setg(errp,
+   "Vector crypto extensions require V or Zve* extensions");
+return;
+}
+
  if (cpu->cfg.ext_zvbc && !cpu->cfg.ext_zve64f) {
  error_setg(errp, "Zvbc extension requires V or Zve64{f,d} extensions");
  return;
diff --git a/target/riscv/cpu_cfg.h b/target/riscv/cpu_cfg.h
index 5ca19298a7..0904dc3ae5 100644
--- a/target/riscv/cpu_cfg.h
+++ b/target/riscv/cpu_cfg.h
@@ -83,6 +83,7 @@ struct RISCVCPUConfig {
  bool ext_zve32f;
  bool ext_zve64f;
  bool ext_zve64d;
+bool ext_zvbb;
  bool ext_zvbc;
  bool ext_zmmul;
  bool ext_zvfh;
diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index be0f0f1058..fbb0ceca81 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -1159,3 +1159,65 @@ DEF_HELPER_6(vclmul_vv, void, ptr, ptr, ptr, ptr, env, i32)
  DEF_HELPER_6(vclmul_vx, void, ptr, ptr, tl, ptr, env, i32)
  DEF_HELPER_6(vclmulh_vv, void, ptr, ptr, ptr, ptr, env, i32)
  DEF_HELPER_6(vclmulh_vx, void, ptr, ptr, tl, ptr, env, i32)
+
+DEF_HELPER_6(vror_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vror_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vror_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vror_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+
+DEF_HELPER_6(vror_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vror_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vror_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vror_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+
+DEF_HELPER_6(vrol_vv_b, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vrol_vv_h, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vrol_vv_w, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_6(vrol_vv_d, void, ptr, ptr, ptr, ptr, env, i32)
+
+DEF_HELPER_6(vrol_vx_b, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vrol_vx_h, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vrol_vx_w, void, ptr, ptr, tl, ptr, env, i32)
+DEF_HELPER_6(vrol_vx_d, void, ptr, ptr, tl, ptr, env, i32)
+
+DEF_HELPER_5(vrev8_v_b, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_5(vrev8_v_h, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_5(vrev8_v_w, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_5(vrev8_v_d, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_5(vbrev8_v_b, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_5(vbrev8_v_h, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_5(vbrev8_v_w, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_5(vbrev8_v_d, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_5(vbrev_v_b, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_5(vbrev_v_h, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_5(vbrev_v_w, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_5(vbrev_v_d, void, ptr, ptr, ptr, env, i32)
+
+DEF_HELPER_5(vclz_v_b, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_5(vclz_v_h, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_5(vclz_v_w, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_

Re: [PATCH v2 12/16] accel: Remove WHPX unreachable error path

2023-06-22 Thread Richard Henderson

On 6/22/23 18:08, Philippe Mathieu-Daudé wrote:

g_new0() can not fail. Remove the unreachable error path.

https://developer-old.gnome.org/glib/stable/glib-Memory-Allocation.html#glib-Memory-Allocation.description

Reported-by: Richard Henderson 
Signed-off-by: Philippe Mathieu-Daudé 
---
  target/i386/whpx/whpx-all.c | 6 --
  1 file changed, 6 deletions(-)


Reviewed-by: Richard Henderson 

r~



Re: [PATCH v2 09/16] accel: Remove NVMM unreachable error path

2023-06-22 Thread Richard Henderson

On 6/22/23 18:08, Philippe Mathieu-Daudé wrote:

g_malloc0() can not fail. Remove the unreachable error path.

https://developer-old.gnome.org/glib/stable/glib-Memory-Allocation.html#glib-Memory-Allocation.description

Signed-off-by: Philippe Mathieu-Daudé 
---
  target/i386/nvmm/nvmm-all.c | 4 
  1 file changed, 4 deletions(-)


Reviewed-by: Richard Henderson 


r~



Re: [PATCH v2 07/16] accel: Rename HAX 'struct hax_vcpu_state' -> AccelCPUState

2023-06-22 Thread Richard Henderson

On 6/22/23 18:08, Philippe Mathieu-Daudé wrote:

|+ struct AccelvCPUState *accel;|

...

+typedef struct AccelCPUState {
 hax_fd fd;
 int vcpu_id;
 struct hax_tunnel *tunnel;
 unsigned char *iobuf;
-};
+} hax_vcpu_state;



Discussed face to face, but for the record:

Put the typedef in qemu/typedefs.h, so that we can use it immediately in core/cpu.h and 
not need to re-declare it in each accelerator.


Drop the hax_vcpu_state typedef and just use AccelCPUState (since you have to change all of
those lines anyway). Which will eventually allow



+++ b/target/i386/whpx/whpx-all.c
@@ -2258,7 +2258,7 @@ int whpx_init_vcpu(CPUState *cpu)
 
 vcpu->interruptable = true;

 cpu->vcpu_dirty = true;
-cpu->accel = (struct hax_vcpu_state *)vcpu;
+cpu->accel = (struct AccelCPUState *)vcpu;


this cast to go away.
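A minimal sketch of the pattern being suggested, collapsed into one file with hypothetical names (in QEMU the three parts would live in qemu/typedefs.h, the core CPU header, and each accelerator, respectively). The typedef-only forward declaration lets core code hold the pointer without knowing the layout, so the accelerator-side cast disappears:

```c
#include <assert.h>

/* Part 1 -- shared typedefs header: name only, no layout. */
typedef struct AccelCPUState AccelCPUState;

/* Part 2 -- core CPU header: can hold the pointer opaquely.
 * "CPUStateSketch" is a made-up name for this illustration. */
struct CPUStateSketch {
    AccelCPUState *accel;
};

/* Part 3 -- one accelerator supplies the actual definition. */
struct AccelCPUState {
    int vcpu_id;
};

static int accel_vcpu_id(struct CPUStateSketch *cpu)
{
    return cpu->accel->vcpu_id;  /* no cast needed anywhere */
}
```

With this layout, an accelerator assigns its state with `cpu->accel = vcpu;` directly, instead of casting through an accelerator-specific struct type.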


r~



Re: [PATCH v1 02/23] pc/q35: Apply PCI bus BSEL property for Xen PCI device hotplug

2023-06-22 Thread Julia Suvorova
On Thu, Jun 22, 2023 at 9:36 AM Igor Mammedov  wrote:
>
> On Wed, 21 Jun 2023 13:24:42 -0400
> Joel Upham  wrote:
>
> > On Wed, Jun 21, 2023 at 7:28 AM Igor Mammedov  wrote:
> >
> > > On Tue, 20 Jun 2023 13:24:36 -0400
> > > Joel Upham  wrote:
> > >
> > > > On Q35 we still need to assign BSEL property to bus(es) for PCI device
> > > > add/hotplug to work.
> > > > Extend acpi_set_pci_info() function to support Q35 as well. This patch adds new (trivial)
> > > > function find_q35() which returns root PCIBus object on Q35, in a way
> > > > similar to what find_i440fx does.
> > >
> > > I think patch is mostly obsolete, q35 ACPI PCI hotplug is supported in
> > > upstream QEMU.
> > >
> > > Also see comment below.
> > >
> > > I make use of the find_q35() function in later patches, but I agree now a
> > majority of this patch is a bit different.
>
> There is likely an existing alternative already. (probably introduced by ACPI 
> PIC hotplug for q35)

There is a similar function acpi_get_i386_pci_host() in hw/i386/acpi-build.c

Best regards, Julia Suvorova.

> >
> > > >
> > > > Signed-off-by: Alexey Gerasimenko 
> > > > Signed-off-by: Joel Upham 
> > > > ---
> > > >  hw/acpi/pcihp.c  | 4 +++-
> > > >  hw/pci-host/q35.c| 9 +
> > > >  include/hw/i386/pc.h | 3 +++
> > > >  3 files changed, 15 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/hw/acpi/pcihp.c b/hw/acpi/pcihp.c
> > > > index cdd6f775a1..f4e39d7a9c 100644
> > > > --- a/hw/acpi/pcihp.c
> > > > +++ b/hw/acpi/pcihp.c
> > > > @@ -40,6 +40,7 @@
> > > >  #include "qapi/error.h"
> > > >  #include "qom/qom-qobject.h"
> > > >  #include "trace.h"
> > > > +#include "sysemu/xen.h"
> > > >
> > > >  #define ACPI_PCIHP_SIZE 0x0018
> > > >  #define PCI_UP_BASE 0x
> > > > @@ -84,7 +85,8 @@ static void *acpi_set_bsel(PCIBus *bus, void *opaque)
> > > >  bool is_bridge = IS_PCI_BRIDGE(br);
> > > >
> > > >  /* hotplugged bridges can't be described in ACPI ignore them */
> > > > -if (qbus_is_hotpluggable(BUS(bus))) {
> > >
> > > > +/* Xen requires hotplugging to the root device, even on the Q35 chipset */
> > > pls explain what 'root device' is.
> > > Why can't you use root-ports for hotplug?
> > >
> > > Wording may have been incorrect.  Root port is correct. This may not be
> > needed anymore,
> > and may have been left over for when I was debugging PCIe hotplugging
> > problems.
> > I will retest and fix patch once I know more. Xen expects the PCIe device
> > to be on the root port.
> >
> > I can move the function to a different patch that uses it.
> >
> > > > +if (qbus_is_hotpluggable(BUS(bus)) || xen_enabled()) {
> > > >  if (!is_bridge || (!br->hotplugged && info->has_bridge_hotplug)) {
> > > >  bus_bsel = g_malloc(sizeof *bus_bsel);
> > > >
> > > > diff --git a/hw/pci-host/q35.c b/hw/pci-host/q35.c
> > > > index fd18920e7f..fe5fc0f47c 100644
> > > > --- a/hw/pci-host/q35.c
> > > > +++ b/hw/pci-host/q35.c
> > > > @@ -259,6 +259,15 @@ static void q35_host_initfn(Object *obj)
> > > >   qdev_prop_allow_set_link_before_realize,
> > > 0);
> > > >  }
> > > >
> > > > +PCIBus *find_q35(void)
> > > > +{
> > > > +PCIHostState *s = OBJECT_CHECK(PCIHostState,
> > > > +   object_resolve_path("/machine/q35",
> > > NULL),
> > > > +   TYPE_PCI_HOST_BRIDGE);
> > > > +return s ? s->bus : NULL;
> > > > +}
> > > > +
> > > > +
> > > >  static const TypeInfo q35_host_info = {
> > > >  .name   = TYPE_Q35_HOST_DEVICE,
> > > >  .parent = TYPE_PCIE_HOST_BRIDGE,
> > > > diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
> > > > index c661e9cc80..550f8fa221 100644
> > > > --- a/include/hw/i386/pc.h
> > > > +++ b/include/hw/i386/pc.h
> > > > @@ -196,6 +196,9 @@ void pc_madt_cpu_entry(int uid, const CPUArchIdList
> > > *apic_ids,
> > > >  /* sgx.c */
> > > >  void pc_machine_init_sgx_epc(PCMachineState *pcms);
> > > >
> > > > +/* q35.c */
> > > > +PCIBus *find_q35(void);
> > > > +
> > > >  extern GlobalProperty pc_compat_8_0[];
> > > >  extern const size_t pc_compat_8_0_len;
> > > >
> > >
> > >
>
>




Re: [PATCH] migration.json: Don't use space before colon

2023-06-22 Thread Markus Armbruster
Juan Quintela  writes:

> Markus Armbruster  wrote:
>> Juan Quintela  writes:
>>
>>> So all the file is consistent.
>>>
>>> Signed-off-by: Juan Quintela 
>>
>> Reviewed-by: Markus Armbruster 
>>
>> Queued.  thanks!
>
> My deprecated series depend on this, so I will got it through the
> migration tree if you don't care.

Go right ahead.




Re: [PATCH v4 17/17] target/riscv: Expose Zvk* and Zvb[b,c] cpu properties

2023-06-22 Thread Daniel Henrique Barboza




On 6/22/23 13:16, Max Chou wrote:

From: Nazar Kazakov 

Exposes earlier CPU flags allowing the use of the vector cryptography 
extensions.

Signed-off-by: Nazar Kazakov 
Signed-off-by: Max Chou 
---
  target/riscv/cpu.c | 10 ++
  1 file changed, 10 insertions(+)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index c1956dc29b..48d584ab0d 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -1714,6 +1714,16 @@ static Property riscv_cpu_extensions[] = {
  DEFINE_PROP_BOOL("x-zvfh", RISCVCPU, cfg.ext_zvfh, false),
  DEFINE_PROP_BOOL("x-zvfhmin", RISCVCPU, cfg.ext_zvfhmin, false),
  
+/* Vector cryptography extensions */

+DEFINE_PROP_BOOL("x-zvbb", RISCVCPU, cfg.ext_zvbb, false),
+DEFINE_PROP_BOOL("x-zvbc", RISCVCPU, cfg.ext_zvbc, false),
+DEFINE_PROP_BOOL("x-zvkg", RISCVCPU, cfg.ext_zvkg, false),
+DEFINE_PROP_BOOL("x-zvkned", RISCVCPU, cfg.ext_zvkned, false),
+DEFINE_PROP_BOOL("x-zvknha", RISCVCPU, cfg.ext_zvknha, false),
+DEFINE_PROP_BOOL("x-zvknhb", RISCVCPU, cfg.ext_zvknhb, false),
+DEFINE_PROP_BOOL("x-zvksed", RISCVCPU, cfg.ext_zvksed, false),
+DEFINE_PROP_BOOL("x-zvksh", RISCVCPU, cfg.ext_zvksh, false),
+


We usually add the cpu properties in the same commit that the extension was
added, e.g. "x-zvbb" would be added by patch 9. This is not a hard rule, though.

Let's leave this as is and, if a v5 is required for any other reason, you can
put each property into its own patch. For now:

Reviewed-by: Daniel Henrique Barboza 


  DEFINE_PROP_END_OF_LIST(),
  };
  




Re: [PATCH v2 05/16] accel: Destroy HAX vCPU threads once done

2023-06-22 Thread Richard Henderson

On 6/22/23 18:08, Philippe Mathieu-Daudé wrote:

When the vCPU thread finished its processing, destroy
it and signal its destruction to generic vCPU management
layer.

Add a sanity check for the vCPU accelerator context.

Signed-off-by: Philippe Mathieu-Daudé
---
  target/i386/hax/hax-accel-ops.c | 3 +++
  target/i386/hax/hax-all.c   | 1 +
  2 files changed, 4 insertions(+)


Reviewed-by: Richard Henderson 

r~



Re: [PATCH v2 1/7] target/ppc: Add initial flags and helpers for SMT support

2023-06-22 Thread Cédric Le Goater

On 6/22/23 11:33, Nicholas Piggin wrote:

TCG SMT emulation needs to know whether it is running with SMT siblings,
to be able to iterate over siblings in a core, and to serialise
threads to access per-core shared SPRs. Add infrastructure to do these
things.

For now the sibling iteration and serialisation are implemented in a
simple but inefficient way. SMT shared state and sibling access are not
too common, and SMT configurations are mainly useful to test system
code, so performance is not too critical.

Signed-off-by: Nicholas Piggin 


Reviewed-by: Cédric Le Goater 

Thanks,

C.



---
  target/ppc/cpu.h   |  9 +
  target/ppc/cpu_init.c  |  5 +
  target/ppc/translate.c | 20 
  3 files changed, 34 insertions(+)

diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index bfa1777289..0087ce66e2 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -672,6 +672,8 @@ enum {
  POWERPC_FLAG_TM   = 0x0010,
  /* Has SCV (ISA 3.00)
*/
  POWERPC_FLAG_SCV  = 0x0020,
+/* Has >1 thread per core*/
+POWERPC_FLAG_SMT  = 0x0040,
  };
  
  /*

@@ -1270,6 +1272,13 @@ struct CPUArchState {
  uint64_t pmu_base_time;
  };
  
+#define _CORE_ID(cs)\

+(POWERPC_CPU(cs)->env.spr_cb[SPR_PIR].default_value & ~(cs->nr_threads - 1))
+
+#define THREAD_SIBLING_FOREACH(cs, cs_sibling)  \
+CPU_FOREACH(cs_sibling) \
+if (_CORE_ID(cs) == _CORE_ID(cs_sibling))
+
  #define SET_FIT_PERIOD(a_, b_, c_, d_)  \
  do {\
  env->fit_period[0] = (a_);  \
diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
index dccc064053..aeff71d063 100644
--- a/target/ppc/cpu_init.c
+++ b/target/ppc/cpu_init.c
@@ -6755,6 +6755,7 @@ static void ppc_cpu_realize(DeviceState *dev, Error **errp)
  {
  CPUState *cs = CPU(dev);
  PowerPCCPU *cpu = POWERPC_CPU(dev);
+CPUPPCState *env = &cpu->env;
  PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu);
  Error *local_err = NULL;
  
@@ -6786,6 +6787,10 @@ static void ppc_cpu_realize(DeviceState *dev, Error **errp)
  
  pcc->parent_realize(dev, errp);
  
+if (env_cpu(env)->nr_threads > 1) {

+env->flags |= POWERPC_FLAG_SMT;
+}
+
  return;
  
  unrealize:

diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index b62b624682..5d585393c5 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -236,6 +236,26 @@ struct opc_handler_t {
  void (*handler)(DisasContext *ctx);
  };
  
+static inline bool gen_serialize(DisasContext *ctx)

+{
+if (tb_cflags(ctx->base.tb) & CF_PARALLEL) {
+/* Restart with exclusive lock.  */
+gen_helper_exit_atomic(cpu_env);
+ctx->base.is_jmp = DISAS_NORETURN;
+return false;
+}
+return true;
+}
+
+static inline bool gen_serialize_core(DisasContext *ctx)
+{
+if (ctx->flags & POWERPC_FLAG_SMT) {
+return gen_serialize(ctx);
+}
+
+return true;
+}
+
  /* SPR load/store helpers */
  static inline void gen_load_spr(TCGv t, int reg)
  {
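
[Editorial note: the core-ID masking behind _CORE_ID()/THREAD_SIBLING_FOREACH in the patch above can be illustrated outside QEMU. The following is a minimal stand-alone sketch; the flat `pirs` table and the `core_id`/`count_siblings` helpers are invented for illustration only — in QEMU the PIR comes from the SPR callback table and siblings are walked via CPU_FOREACH.]

```c
#include <assert.h>

/* Hypothetical model: PIR values are assigned contiguously, so with
 * nr_threads a power of two, masking off the low bits of the PIR yields
 * a core ID shared by all SMT siblings -- mirroring _CORE_ID(). */
int core_id(int pir, int nr_threads)
{
    return pir & ~(nr_threads - 1);
}

/* Count how many of the given PIRs are SMT siblings of 'pir',
 * analogous to what THREAD_SIBLING_FOREACH iterates over. */
int count_siblings(const int *pirs, int n, int pir, int nr_threads)
{
    int count = 0;
    for (int i = 0; i < n; i++) {
        if (core_id(pirs[i], nr_threads) == core_id(pir, nr_threads)) {
            count++;
        }
    }
    return count;
}
```

With 8 contiguous PIRs and 4 threads per core, PIR 5 has core ID 4 and exactly 4 siblings (PIRs 4-7).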





Re: [PATCH v2 3/7] target/ppc: Add msgsnd/p and DPDES SMT support

2023-06-22 Thread Cédric Le Goater

On 6/22/23 11:33, Nicholas Piggin wrote:

Doorbells in SMT need to coordinate msgsnd/msgclr and DPDES access from
multiple threads that affect the same state.

Signed-off-by: Nicholas Piggin 


Reviewed-by: Cédric Le Goater 

Thanks,

C.




---
  hw/ppc/ppc.c |  6 ++
  include/hw/ppc/ppc.h |  1 +
  target/ppc/excp_helper.c | 30 ++-
  target/ppc/misc_helper.c | 44 ++--
  target/ppc/translate.c   |  8 
  5 files changed, 78 insertions(+), 11 deletions(-)

diff --git a/hw/ppc/ppc.c b/hw/ppc/ppc.c
index 1b1220c423..82e4408c5c 100644
--- a/hw/ppc/ppc.c
+++ b/hw/ppc/ppc.c
@@ -1436,6 +1436,12 @@ int ppc_cpu_pir(PowerPCCPU *cpu)
  return env->spr_cb[SPR_PIR].default_value;
  }
  
+int ppc_cpu_tir(PowerPCCPU *cpu)

+{
+CPUPPCState *env = &cpu->env;
+return env->spr_cb[SPR_TIR].default_value;
+}
+
  PowerPCCPU *ppc_get_vcpu_by_pir(int pir)
  {
  CPUState *cs;
diff --git a/include/hw/ppc/ppc.h b/include/hw/ppc/ppc.h
index 02af03ada2..e095c002dc 100644
--- a/include/hw/ppc/ppc.h
+++ b/include/hw/ppc/ppc.h
@@ -6,6 +6,7 @@
  void ppc_set_irq(PowerPCCPU *cpu, int n_IRQ, int level);
  PowerPCCPU *ppc_get_vcpu_by_pir(int pir);
  int ppc_cpu_pir(PowerPCCPU *cpu);
+int ppc_cpu_tir(PowerPCCPU *cpu);
  
  /* PowerPC hardware exceptions management helpers */

  typedef void (*clk_setup_cb)(void *opaque, uint32_t freq);
diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
index 7d45035447..d40eecb4c7 100644
--- a/target/ppc/excp_helper.c
+++ b/target/ppc/excp_helper.c
@@ -3187,22 +3187,42 @@ void helper_book3s_msgclrp(CPUPPCState *env, target_ulong rb)
  }
  
  /*

- * sends a message to other threads that are on the same
+ * sends a message to another thread  on the same
   * multi-threaded processor
   */
  void helper_book3s_msgsndp(CPUPPCState *env, target_ulong rb)
  {
-int pir = env->spr_cb[SPR_PIR].default_value;
+CPUState *cs = env_cpu(env);
+PowerPCCPU *cpu = POWERPC_CPU(cs);
+CPUState *ccs;
+uint32_t nr_threads = cs->nr_threads;
+int ttir = rb & PPC_BITMASK(57, 63);
  
  helper_hfscr_facility_check(env, HFSCR_MSGP, "msgsndp", HFSCR_IC_MSGP);
  
-if (!dbell_type_server(rb)) {

+if (!dbell_type_server(rb) || ttir >= nr_threads) {
+return;
+}
+
+if (nr_threads == 1) {
+ppc_set_irq(cpu, PPC_INTERRUPT_DOORBELL, 1);
  return;
  }
  
-/* TODO: TCG supports only one thread */

+/* Does iothread need to be locked for walking CPU list? */
+qemu_mutex_lock_iothread();
+THREAD_SIBLING_FOREACH(cs, ccs) {
+PowerPCCPU *ccpu = POWERPC_CPU(ccs);
+uint32_t thread_id = ppc_cpu_tir(ccpu);
+
+if (ttir == thread_id) {
+ppc_set_irq(ccpu, PPC_INTERRUPT_DOORBELL, 1);
+qemu_mutex_unlock_iothread();
+return;
+}
+}
  
-book3s_msgsnd_common(pir, PPC_INTERRUPT_DOORBELL);

+g_assert_not_reached();
  }
  #endif /* TARGET_PPC64 */
  
diff --git a/target/ppc/misc_helper.c b/target/ppc/misc_helper.c

index a058eb24cd..1f1af21f33 100644
--- a/target/ppc/misc_helper.c
+++ b/target/ppc/misc_helper.c
@@ -184,14 +184,31 @@ void helper_store_pcr(CPUPPCState *env, target_ulong value)
   */
  target_ulong helper_load_dpdes(CPUPPCState *env)
  {
+CPUState *cs = env_cpu(env);
+CPUState *ccs;
+uint32_t nr_threads = cs->nr_threads;
  target_ulong dpdes = 0;
  
  helper_hfscr_facility_check(env, HFSCR_MSGP, "load DPDES", HFSCR_IC_MSGP);
  
-/* TODO: TCG supports only one thread */

-if (env->pending_interrupts & PPC_INTERRUPT_DOORBELL) {
-dpdes = 1;
+if (nr_threads == 1) {
+if (env->pending_interrupts & PPC_INTERRUPT_DOORBELL) {
+dpdes = 1;
+}
+return dpdes;
+}
+
+qemu_mutex_lock_iothread();
+THREAD_SIBLING_FOREACH(cs, ccs) {
+PowerPCCPU *ccpu = POWERPC_CPU(ccs);
+CPUPPCState *cenv = &ccpu->env;
+uint32_t thread_id = ppc_cpu_tir(ccpu);
+
+if (cenv->pending_interrupts & PPC_INTERRUPT_DOORBELL) {
+dpdes |= (0x1 << thread_id);
+}
  }
+qemu_mutex_unlock_iothread();
  
  return dpdes;

  }
@@ -199,17 +216,32 @@ target_ulong helper_load_dpdes(CPUPPCState *env)
  void helper_store_dpdes(CPUPPCState *env, target_ulong val)
  {
  PowerPCCPU *cpu = env_archcpu(env);
+CPUState *cs = env_cpu(env);
+CPUState *ccs;
+uint32_t nr_threads = cs->nr_threads;
  
  helper_hfscr_facility_check(env, HFSCR_MSGP, "store DPDES", HFSCR_IC_MSGP);
  
-/* TODO: TCG supports only one thread */

-if (val & ~0x1) {
+if (val & ~(nr_threads - 1)) {
  qemu_log_mask(LOG_GUEST_ERROR, "Invalid DPDES register value "
TARGET_FMT_lx"\n", val);
+val &= (nr_threads - 1); /* Ignore the invalid bits */
+}
+
+if (nr_threads == 1) {
+ppc_set_irq(cpu, PPC_INTERRUPT_DOORBELL, val & 0x1);
 

Re: [PATCH v2 7/7] tests/avocado: Add ppc64 pseries multiprocessor boot tests

2023-06-22 Thread Cédric Le Goater

On 6/22/23 11:33, Nicholas Piggin wrote:

Add multi-thread/core/socket Linux boot tests that ensure the right
topology comes up. Of particular note is a SMT test, which is a new
capability for TCG.

Signed-off-by: Nicholas Piggin 


Reviewed-by: Cédric Le Goater 

Thanks,

C.




---
  tests/avocado/ppc_pseries.py | 60 +---
  1 file changed, 55 insertions(+), 5 deletions(-)

diff --git a/tests/avocado/ppc_pseries.py b/tests/avocado/ppc_pseries.py
index a152cf222e..ff42c770f2 100644
--- a/tests/avocado/ppc_pseries.py
+++ b/tests/avocado/ppc_pseries.py
@@ -14,12 +14,9 @@ class pseriesMachine(QemuSystemTest):
  timeout = 90
  KERNEL_COMMON_COMMAND_LINE = 'printk.time=0 '
  panic_message = 'Kernel panic - not syncing'
+good_message = 'VFS: Cannot open root device'
  
-def test_ppc64_pseries(self):

-"""
-:avocado: tags=arch:ppc64
-:avocado: tags=machine:pseries
-"""
+def do_test_ppc64_linux_boot(self):
  kernel_url = ('https://archives.fedoraproject.org/pub/archive'
'/fedora-secondary/releases/29/Everything/ppc64le/os'
'/ppc/ppc64/vmlinuz')
@@ -31,5 +28,58 @@ def test_ppc64_pseries(self):
  self.vm.add_args('-kernel', kernel_path,
   '-append', kernel_command_line)
  self.vm.launch()
+
+def test_ppc64_linux_boot(self):
+"""
+:avocado: tags=arch:ppc64
+:avocado: tags=machine:pseries
+"""
+
+self.do_test_ppc64_linux_boot()
  console_pattern = 'VFS: Cannot open root device'
  wait_for_console_pattern(self, console_pattern, self.panic_message)
+
+def test_ppc64_linux_smp_boot(self):
+"""
+:avocado: tags=arch:ppc64
+:avocado: tags=machine:pseries
+"""
+
+self.vm.add_args('-smp', '4')
+self.do_test_ppc64_linux_boot()
+console_pattern = 'smp: Brought up 1 node, 4 CPUs'
+wait_for_console_pattern(self, console_pattern, self.panic_message)
+wait_for_console_pattern(self, self.good_message, self.panic_message)
+
+def test_ppc64_linux_smt_boot(self):
+"""
+:avocado: tags=arch:ppc64
+:avocado: tags=machine:pseries
+"""
+
+self.vm.add_args('-smp', '4,threads=4')
+self.do_test_ppc64_linux_boot()
+console_pattern = 'CPU maps initialized for 4 threads per core'
+wait_for_console_pattern(self, console_pattern, self.panic_message)
+console_pattern = 'smp: Brought up 1 node, 4 CPUs'
+wait_for_console_pattern(self, console_pattern, self.panic_message)
+wait_for_console_pattern(self, self.good_message, self.panic_message)
+
+def test_ppc64_linux_big_boot(self):
+"""
+:avocado: tags=arch:ppc64
+:avocado: tags=machine:pseries
+"""
+
+self.vm.add_args('-smp', '16,threads=4,cores=2,sockets=2')
+self.vm.add_args('-m', '512M',
+ '-object', 'memory-backend-ram,size=256M,id=m0',
+ '-object', 'memory-backend-ram,size=256M,id=m1')
+self.vm.add_args('-numa', 'node,nodeid=0,memdev=m0')
+self.vm.add_args('-numa', 'node,nodeid=1,memdev=m1')
+self.do_test_ppc64_linux_boot()
+console_pattern = 'CPU maps initialized for 4 threads per core'
+wait_for_console_pattern(self, console_pattern, self.panic_message)
+console_pattern = 'smp: Brought up 2 nodes, 16 CPUs'
+wait_for_console_pattern(self, console_pattern, self.panic_message)
+wait_for_console_pattern(self, self.good_message, self.panic_message)





Re: [PATCH v2 6/7] tests/avocado: boot ppc64 pseries to Linux VFS mount

2023-06-22 Thread Cédric Le Goater

On 6/22/23 11:33, Nicholas Piggin wrote:

This machine can boot Linux to VFS mount, so don't stop in early boot.

Signed-off-by: Nicholas Piggin 


Reviewed-by: Cédric Le Goater 

Thanks,

C.




---
  tests/avocado/ppc_pseries.py | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/avocado/ppc_pseries.py b/tests/avocado/ppc_pseries.py
index d8b04dc3ea..a152cf222e 100644
--- a/tests/avocado/ppc_pseries.py
+++ b/tests/avocado/ppc_pseries.py
@@ -31,5 +31,5 @@ def test_ppc64_pseries(self):
  self.vm.add_args('-kernel', kernel_path,
   '-append', kernel_command_line)
  self.vm.launch()
-console_pattern = 'Kernel command line: %s' % kernel_command_line
+console_pattern = 'VFS: Cannot open root device'
  wait_for_console_pattern(self, console_pattern, self.panic_message)




