date:20240622

Hello,

This series enables Vulkan Venus context support on virtio-gpu.

All virglrender and almost all Linux kernel prerequisite changes
needed by Venus are already in upstream. For kernel there is a pending
KVM patchset that fixes mapping of compound pages needed for DRM drivers
using TTM [1], othewrwise hostmem blob mapping will fail with a KVM error
from Qemu.

[1] https://lore.kernel.org/kvm/20240229025759.1187910-1-steve...@google.com/

You'll need to use recent Mesa version containing patch that removes
dependency on cross-device feature from Venus that isn't supported by
Qemu [2].

[2]
https://gitlab.freedesktop.org/mesa/mesa/-/commit/087e9a96d13155e26987befae78b6ccbb7ae242b

Example Qemu cmdline that enables Venus:

qemu-system-x86_64 -device virtio-vga-gl,hostmem=4G,blob=true,venus=true \
-machine q35,accel=kvm,memory-backend=mem1 \
-object memory-backend-memfd,id=mem1,size=8G -m 8G

Changes from V14 to V15

- Dropped hostmem mapping state that got unused in v14, suggested by
Akihiko Odaki.

- Moved resource_get_info() from set_scanout_blob() to create_blob(),
suggested by Akihiko Odaki.

- Fixed unitilized variable in create_blob(), spotted by Alex Bennée.

Changes from V13 to V14

- Fixed erronous fall-through in renderer_state's switch-case that was
spotted by Marc-André Lureau.

- Reworked HOSTMEM_MR_FINISH_UNMAPPING handling as was suggested by
Akihiko Odaki. Now it shares the same code path with HOSTMEM_MR_MAPPED.

- Made use of g_autofree in virgl_cmd_resource_create_blob() as was
suggested by Akihiko Odaki.

- Removed virtio_gpu_virgl_deinit() and moved all deinit code to
virtio_gpu_gl_device_unrealize() as was suggested by Marc-André Lureau.

- Replaced HAVE_FEATURE in mseon.build with virglrenderer's VERSION_MAJOR
check as was suggested by Marc-André Lureau.

- Added trace event for cmd-suspension as was suggested by Marc-André Lureau.

- Added patch to replace in-flight printf's with trace events as was
suggested by Marc-André Lureau

Changes from V12 to V13

- Replaced `res->async_unmap_in_progress` flag with a mapping state,
moved it to the virtio_gpu_virgl_hostmem_region like was suggested
by Akihiko Odaki.

- Renamed blob_unmap function and added back cmd_suspended argument
to it. Suggested by Akihiko Odaki.

- Reordered VirtIOGPUGL refactoring patches to minimize code changes
like was suggested by Akihiko Odaki.

- Replaced gl->renderer_inited with gl->renderer_state, like was suggested
by Alex Bennée.

- Added gl->renderer state resetting to gl_device_unrealize(), for
consistency. Suggested by Alex Bennée.

- Added rb's from Alex and Manos.

- Fixed compiling with !HAVE_VIRGL_RESOURCE_BLOB.

Changes from V11 to V12

- Fixed virgl_cmd_resource_create_blob() error handling. Now it doesn't
corrupt resource list and releases resource properly on error. Thanks
to Akihiko Odaki for spotting the bug.

- Added new patch that handles virtio_gpu_virgl_init() failure gracefully,
fixing Qemu crash. Besides fixing the crash, it allows to implement
a cleaner virtio_gpu_virgl_deinit().

- virtio_gpu_virgl_deinit() now assumes that previously virgl was
initialized successfully when it was inited at all. Suggested by
Akihiko Odaki.

- Fixed missed freeing of print_stats timer in virtio_gpu_virgl_deinit()

- Added back blob unmapping or RESOURCE_UNREF that was requested
by Akihiko Odaki. Added comment to the code explaining how
async unmapping works. Added back `res->async_unmap_in_progress`
flag and added comment telling why it's needed.

- Moved cmdq_resume_bh to VirtIOGPUGL and made coding style changes
suggested by Akihiko Odaki.

- Added patches that move fence_poll and print_stats timers to VirtIOGPUGL
for consistency with cmdq_resume_bh.

Changes from V10 to V11

- Replaced cmd_resume bool in struct ctrl_command with
"cmd->finished + !VIRTIO_GPU_FLAG_FENCE" checking as was requested
by Akihiko Odaki.

- Reworked virgl_cmd_resource_unmap/unref_blob() to avoid re-adding
the 'async_unmap_in_progress' flag that was dropped in v9:

1. virgl_cmd_resource_[un]map_blob() now doesn't check itself whether
resource was previously mapped and lets virglrenderer to do the
checking.

2. error returned by virgl_renderer_resource_unmap() is now handled
and reported properly, previously the error wasn't checked. The
virgl_renderer_resource_unmap() fails if resource wasn't mapped.

3. virgl_cmd_resource_unref_blob() now doesn't allow to unref resource
that is mapped, it's a error condition if guest didn't unmap resource
before doing the unref. Previously unref was implicitly unmapping
resource.

Changes from V9 to V10

- Dropped 'async_unmap_in_progress' variable and switched to use
aio_bh_new() isntead of oneshot variant in the "blob commands" patch.

- Further improved error messages by printing error code when actual error
occurrs and using ERR_UNSPEC instead of

[PATCH v15 01/14] virtio-gpu: Use trace events for tracking number of in-flight fences

Replace printf's used for tracking of in-flight fence inc/dec events
with tracing, for consistency with the rest of virtio-gpu code that
uses tracing.

Suggested-by: Marc-André Lureau 
Signed-off-by: Dmitry Osipenko 
---
 hw/display/trace-events   | 2 ++
 hw/display/virtio-gpu-virgl.c | 2 +-
 hw/display/virtio-gpu.c   | 4 ++--
 3 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/hw/display/trace-events b/hw/display/trace-events
index 781f8a33203b..e212710284ae 100644
--- a/hw/display/trace-events
+++ b/hw/display/trace-events
@@ -53,6 +53,8 @@ virtio_gpu_cmd_ctx_submit(uint32_t ctx, uint32_t size) "ctx 
0x%x, size %d"
 virtio_gpu_update_cursor(uint32_t scanout, uint32_t x, uint32_t y, const char 
*type, uint32_t res) "scanout %d, x %d, y %d, %s, res 0x%x"
 virtio_gpu_fence_ctrl(uint64_t fence, uint32_t type) "fence 0x%" PRIx64 ", 
type 0x%x"
 virtio_gpu_fence_resp(uint64_t fence) "fence 0x%" PRIx64
+virtio_gpu_inc_inflight_fences(uint32_t inflight) "in-flight+ %u"
+virtio_gpu_dec_inflight_fences(uint32_t inflight) "in-flight- %u"
 
 # qxl.c
 disable qxl_io_write_vga(int qid, const char *mode, uint32_t addr, uint32_t 
val) "%d %s addr=%u val=%u"
diff --git a/hw/display/virtio-gpu-virgl.c b/hw/display/virtio-gpu-virgl.c
index 9f34d0e6619c..14091b191ec0 100644
--- a/hw/display/virtio-gpu-virgl.c
+++ b/hw/display/virtio-gpu-virgl.c
@@ -525,7 +525,7 @@ static void virgl_write_fence(void *opaque, uint32_t fence)
 g_free(cmd);
 g->inflight--;
 if (virtio_gpu_stats_enabled(g->parent_obj.conf)) {
-fprintf(stderr, "inflight: %3d (-)\r", g->inflight);
+trace_virtio_gpu_dec_inflight_fences(g->inflight);
 }
 }
 }
diff --git a/hw/display/virtio-gpu.c b/hw/display/virtio-gpu.c
index d60b1b2973af..602952a7041b 100644
--- a/hw/display/virtio-gpu.c
+++ b/hw/display/virtio-gpu.c
@@ -1066,7 +1066,7 @@ void virtio_gpu_process_cmdq(VirtIOGPU *g)
 if (g->stats.max_inflight < g->inflight) {
 g->stats.max_inflight = g->inflight;
 }
-fprintf(stderr, "inflight: %3d (+)\r", g->inflight);
+trace_virtio_gpu_inc_inflight_fences(g->inflight);
 }
 } else {
 g_free(cmd);
@@ -1086,7 +1086,7 @@ static void virtio_gpu_process_fenceq(VirtIOGPU *g)
 g_free(cmd);
 g->inflight--;
 if (virtio_gpu_stats_enabled(g->parent_obj.conf)) {
-fprintf(stderr, "inflight: %3d (-)\r", g->inflight);
+trace_virtio_gpu_dec_inflight_fences(g->inflight);
 }
 }
 }
-- 
2.45.2

[PATCH v15 10/14] virtio-gpu: Support blob scanout using dmabuf fd

From: Robert Beckett 

Support displaying blob resources by handling SET_SCANOUT_BLOB
command.

Signed-by: Antonio Caggiano 
Signed-off-by: Robert Beckett 
Signed-off-by: Huang Rui 
Reviewed-by: Antonio Caggiano 
Signed-off-by: Dmitry Osipenko 
---
 hw/display/virtio-gpu-virgl.c  | 109 +
 hw/display/virtio-gpu.c|  12 ++--
 include/hw/virtio/virtio-gpu.h |   7 +++
 3 files changed, 122 insertions(+), 6 deletions(-)

diff --git a/hw/display/virtio-gpu-virgl.c b/hw/display/virtio-gpu-virgl.c
index 3ffea478e723..60befab7efc2 100644
--- a/hw/display/virtio-gpu-virgl.c
+++ b/hw/display/virtio-gpu-virgl.c
@@ -17,6 +17,8 @@
 #include "trace.h"
 #include "hw/virtio/virtio.h"
 #include "hw/virtio/virtio-gpu.h"
+#include "hw/virtio/virtio-gpu-bswap.h"
+#include "hw/virtio/virtio-gpu-pixman.h"
 
 #include "ui/egl-helpers.h"
 
@@ -78,6 +80,7 @@ static void virgl_cmd_create_resource_2d(VirtIOGPU *g,
 res->base.height = c2d.height;
 res->base.format = c2d.format;
 res->base.resource_id = c2d.resource_id;
+res->base.dmabuf_fd = -1;
 QTAILQ_INSERT_HEAD(>reslist, >base, next);
 
 args.handle = c2d.resource_id;
@@ -125,6 +128,7 @@ static void virgl_cmd_create_resource_3d(VirtIOGPU *g,
 res->base.height = c3d.height;
 res->base.format = c3d.format;
 res->base.resource_id = c3d.resource_id;
+res->base.dmabuf_fd = -1;
 QTAILQ_INSERT_HEAD(>reslist, >base, next);
 
 args.handle = c3d.resource_id;
@@ -509,6 +513,106 @@ static void virgl_cmd_get_capset(VirtIOGPU *g,
 g_free(resp);
 }
 
+#if VIRGL_VERSION_MAJOR >= 1
+static void virgl_cmd_set_scanout_blob(VirtIOGPU *g,
+   struct virtio_gpu_ctrl_command *cmd)
+{
+struct virtio_gpu_framebuffer fb = { 0 };
+struct virgl_renderer_resource_info info;
+struct virtio_gpu_virgl_resource *res;
+struct virtio_gpu_set_scanout_blob ss;
+uint64_t fbend;
+
+VIRTIO_GPU_FILL_CMD(ss);
+virtio_gpu_scanout_blob_bswap();
+trace_virtio_gpu_cmd_set_scanout_blob(ss.scanout_id, ss.resource_id,
+  ss.r.width, ss.r.height, ss.r.x,
+  ss.r.y);
+
+if (ss.scanout_id >= g->parent_obj.conf.max_outputs) {
+qemu_log_mask(LOG_GUEST_ERROR, "%s: illegal scanout id specified %d",
+  __func__, ss.scanout_id);
+cmd->error = VIRTIO_GPU_RESP_ERR_INVALID_SCANOUT_ID;
+return;
+}
+
+if (ss.resource_id == 0) {
+virtio_gpu_disable_scanout(g, ss.scanout_id);
+return;
+}
+
+if (ss.width < 16 ||
+ss.height < 16 ||
+ss.r.x + ss.r.width > ss.width ||
+ss.r.y + ss.r.height > ss.height) {
+qemu_log_mask(LOG_GUEST_ERROR, "%s: illegal scanout %d bounds for"
+  " resource %d, rect (%d,%d)+%d,%d, fb %d %d\n",
+  __func__, ss.scanout_id, ss.resource_id,
+  ss.r.x, ss.r.y, ss.r.width, ss.r.height,
+  ss.width, ss.height);
+cmd->error = VIRTIO_GPU_RESP_ERR_INVALID_PARAMETER;
+return;
+}
+
+res = virtio_gpu_virgl_find_resource(g, ss.resource_id);
+if (!res) {
+qemu_log_mask(LOG_GUEST_ERROR, "%s: resource does not exist %d\n",
+  __func__, ss.resource_id);
+cmd->error = VIRTIO_GPU_RESP_ERR_INVALID_RESOURCE_ID;
+return;
+}
+if (virgl_renderer_resource_get_info(ss.resource_id, )) {
+qemu_log_mask(LOG_GUEST_ERROR, "%s: resource does not have info %d\n",
+  __func__, ss.resource_id);
+cmd->error = VIRTIO_GPU_RESP_ERR_INVALID_RESOURCE_ID;
+return;
+}
+if (res->base.dmabuf_fd < 0) {
+res->base.dmabuf_fd = info.fd;
+}
+if (res->base.dmabuf_fd < 0) {
+qemu_log_mask(LOG_GUEST_ERROR, "%s: resource not backed by dmabuf 
%d\n",
+  __func__, ss.resource_id);
+cmd->error = VIRTIO_GPU_RESP_ERR_INVALID_RESOURCE_ID;
+return;
+}
+
+fb.format = virtio_gpu_get_pixman_format(ss.format);
+if (!fb.format) {
+qemu_log_mask(LOG_GUEST_ERROR, "%s: pixel format not supported %d\n",
+  __func__, ss.format);
+cmd->error = VIRTIO_GPU_RESP_ERR_INVALID_PARAMETER;
+return;
+}
+
+fb.bytes_pp = DIV_ROUND_UP(PIXMAN_FORMAT_BPP(fb.format), 8);
+fb.width = ss.width;
+fb.height = ss.height;
+fb.stride = ss.strides[0];
+fb.offset = ss.offsets[0] + ss.r.x * fb.bytes_pp + ss.r.y * fb.stride;
+
+fbend = fb.offset;
+fbend += fb.stride * (ss.r.height - 1);
+fbend += fb.bytes_pp * ss.r.width;
+if (fbend > res->base.blob_size) {
+qemu_log_mask(LOG_GUEST_ERROR, "%s: fb end out of range\n",
+  __func__);
+cmd->error = VIRTIO_GPU_RESP_ERR_INVALID_PARAMETER;
+return;
+}
+
+g->parent_obj.enable = 1;
+if (virtio_gpu_update_dmabuf(g,

[PATCH v15 13/14] virtio-gpu: Register capsets dynamically

From: Pierre-Eric Pelloux-Prayer 

virtio_gpu_virgl_get_num_capsets will return "num_capsets", but we can't
assume that capset_index 1 is always VIRGL2 once we'll support more capsets,
like Venus and DRM capsets. Register capsets dynamically to avoid that problem.

Reviewed-by: Manos Pitsidianakis 
Signed-off-by: Pierre-Eric Pelloux-Prayer 
Signed-off-by: Dmitry Osipenko 
---
 hw/display/virtio-gpu-gl.c |  6 --
 hw/display/virtio-gpu-virgl.c  | 33 +
 include/hw/virtio/virtio-gpu.h |  4 +++-
 3 files changed, 28 insertions(+), 15 deletions(-)

diff --git a/hw/display/virtio-gpu-gl.c b/hw/display/virtio-gpu-gl.c
index 5f27568d3ec8..20a7c316bb23 100644
--- a/hw/display/virtio-gpu-gl.c
+++ b/hw/display/virtio-gpu-gl.c
@@ -138,8 +138,8 @@ static void virtio_gpu_gl_device_realize(DeviceState *qdev, 
Error **errp)
 }
 
 g->parent_obj.conf.flags |= (1 << VIRTIO_GPU_FLAG_VIRGL_ENABLED);
-VIRTIO_GPU_BASE(g)->virtio_config.num_capsets =
-virtio_gpu_virgl_get_num_capsets(g);
+g->capset_ids = virtio_gpu_virgl_get_capsets(g);
+VIRTIO_GPU_BASE(g)->virtio_config.num_capsets = g->capset_ids->len;
 
 #if VIRGL_VERSION_MAJOR >= 1
 g->parent_obj.conf.flags |= 1 << VIRTIO_GPU_FLAG_CONTEXT_INIT_ENABLED;
@@ -171,6 +171,8 @@ static void virtio_gpu_gl_device_unrealize(DeviceState 
*qdev)
 }
 
 gl->renderer_state = RS_START;
+
+g_array_unref(g->capset_ids);
 }
 
 static void virtio_gpu_gl_class_init(ObjectClass *klass, void *data)
diff --git a/hw/display/virtio-gpu-virgl.c b/hw/display/virtio-gpu-virgl.c
index f6cb4fe5b28e..58693dfa2afa 100644
--- a/hw/display/virtio-gpu-virgl.c
+++ b/hw/display/virtio-gpu-virgl.c
@@ -629,19 +629,13 @@ static void virgl_cmd_get_capset_info(VirtIOGPU *g,
 VIRTIO_GPU_FILL_CMD(info);
 
 memset(, 0, sizeof(resp));
-if (info.capset_index == 0) {
-resp.capset_id = VIRTIO_GPU_CAPSET_VIRGL;
-virgl_renderer_get_cap_set(resp.capset_id,
-   _max_version,
-   _max_size);
-} else if (info.capset_index == 1) {
-resp.capset_id = VIRTIO_GPU_CAPSET_VIRGL2;
+
+if (info.capset_index < g->capset_ids->len) {
+resp.capset_id = g_array_index(g->capset_ids, uint32_t,
+   info.capset_index);
 virgl_renderer_get_cap_set(resp.capset_id,
_max_version,
_max_size);
-} else {
-resp.capset_max_version = 0;
-resp.capset_max_size = 0;
 }
 resp.hdr.type = VIRTIO_GPU_RESP_OK_CAPSET_INFO;
 virtio_gpu_ctrl_response(g, cmd, , sizeof(resp));
@@ -1167,12 +1161,27 @@ int virtio_gpu_virgl_init(VirtIOGPU *g)
 return 0;
 }
 
-int virtio_gpu_virgl_get_num_capsets(VirtIOGPU *g)
+static void virtio_gpu_virgl_add_capset(GArray *capset_ids, uint32_t capset_id)
+{
+g_array_append_val(capset_ids, capset_id);
+}
+
+GArray *virtio_gpu_virgl_get_capsets(VirtIOGPU *g)
 {
 uint32_t capset2_max_ver, capset2_max_size;
+GArray *capset_ids;
+
+capset_ids = g_array_new(false, false, sizeof(uint32_t));
+
+/* VIRGL is always supported. */
+virtio_gpu_virgl_add_capset(capset_ids, VIRTIO_GPU_CAPSET_VIRGL);
+
 virgl_renderer_get_cap_set(VIRTIO_GPU_CAPSET_VIRGL2,
   _max_ver,
   _max_size);
+if (capset2_max_ver) {
+virtio_gpu_virgl_add_capset(capset_ids, VIRTIO_GPU_CAPSET_VIRGL2);
+}
 
-return capset2_max_ver ? 2 : 1;
+return capset_ids;
 }
diff --git a/include/hw/virtio/virtio-gpu.h b/include/hw/virtio/virtio-gpu.h
index 775005abb337..83232f4b4bfa 100644
--- a/include/hw/virtio/virtio-gpu.h
+++ b/include/hw/virtio/virtio-gpu.h
@@ -209,6 +209,8 @@ struct VirtIOGPU {
 QTAILQ_HEAD(, VGPUDMABuf) bufs;
 VGPUDMABuf *primary[VIRTIO_GPU_MAX_SCANOUTS];
 } dmabuf;
+
+GArray *capset_ids;
 };
 
 struct VirtIOGPUClass {
@@ -354,6 +356,6 @@ void virtio_gpu_virgl_fence_poll(VirtIOGPU *g);
 void virtio_gpu_virgl_reset_scanout(VirtIOGPU *g);
 void virtio_gpu_virgl_reset(VirtIOGPU *g);
 int virtio_gpu_virgl_init(VirtIOGPU *g);
-int virtio_gpu_virgl_get_num_capsets(VirtIOGPU *g);
+GArray *virtio_gpu_virgl_get_capsets(VirtIOGPU *g);
 
 #endif
-- 
2.45.2

[PATCH v15 07/14] virtio-gpu: Support context-init feature with virglrenderer

From: Huang Rui 

Patch "virtio-gpu: CONTEXT_INIT feature" has added the context_init
feature flags. Expose this feature and support creating virglrenderer
context with flags using context_id if libvirglrenderer is new enough.

Originally-by: Antonio Caggiano 
Signed-off-by: Huang Rui 
Reviewed-by: Antonio Caggiano 
Signed-off-by: Dmitry Osipenko 
---
 hw/display/virtio-gpu-gl.c|  4 
 hw/display/virtio-gpu-virgl.c | 20 ++--
 2 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/hw/display/virtio-gpu-gl.c b/hw/display/virtio-gpu-gl.c
index 0109244276fc..4fe9e6a0c21c 100644
--- a/hw/display/virtio-gpu-gl.c
+++ b/hw/display/virtio-gpu-gl.c
@@ -141,6 +141,10 @@ static void virtio_gpu_gl_device_realize(DeviceState 
*qdev, Error **errp)
 VIRTIO_GPU_BASE(g)->virtio_config.num_capsets =
 virtio_gpu_virgl_get_num_capsets(g);
 
+#if VIRGL_VERSION_MAJOR >= 1
+g->parent_obj.conf.flags |= 1 << VIRTIO_GPU_FLAG_CONTEXT_INIT_ENABLED;
+#endif
+
 virtio_gpu_device_realize(qdev, errp);
 }
 
diff --git a/hw/display/virtio-gpu-virgl.c b/hw/display/virtio-gpu-virgl.c
index ca6f4d6cbb58..b3aa444bcfa5 100644
--- a/hw/display/virtio-gpu-virgl.c
+++ b/hw/display/virtio-gpu-virgl.c
@@ -106,8 +106,24 @@ static void virgl_cmd_context_create(VirtIOGPU *g,
 trace_virtio_gpu_cmd_ctx_create(cc.hdr.ctx_id,
 cc.debug_name);
 
-virgl_renderer_context_create(cc.hdr.ctx_id, cc.nlen,
-  cc.debug_name);
+if (cc.context_init) {
+if (!virtio_gpu_context_init_enabled(g->parent_obj.conf)) {
+qemu_log_mask(LOG_GUEST_ERROR, "%s: context_init disabled",
+  __func__);
+cmd->error = VIRTIO_GPU_RESP_ERR_UNSPEC;
+return;
+}
+
+#if VIRGL_VERSION_MAJOR >= 1
+virgl_renderer_context_create_with_flags(cc.hdr.ctx_id,
+ cc.context_init,
+ cc.nlen,
+ cc.debug_name);
+return;
+#endif
+}
+
+virgl_renderer_context_create(cc.hdr.ctx_id, cc.nlen, cc.debug_name);
 }
 
 static void virgl_cmd_context_destroy(VirtIOGPU *g,
-- 
2.45.2

[PATCH v15 06/14] virtio-gpu: Use pkgconfig version to decide which virgl features are available

New virglrerenderer features were stabilized with release of v1.0.0.
Presence of symbols in virglrenderer.h doesn't guarantee ABI compatibility
with pre-release development versions of libvirglerender. Use virglrenderer
version to decide reliably which virgl features are available.

Reviewed-by: Alex Bennée 
Signed-off-by: Dmitry Osipenko 
---
 hw/display/virtio-gpu-virgl.c | 2 +-
 meson.build   | 5 +
 2 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/hw/display/virtio-gpu-virgl.c b/hw/display/virtio-gpu-virgl.c
index a63d1f540f04..ca6f4d6cbb58 100644
--- a/hw/display/virtio-gpu-virgl.c
+++ b/hw/display/virtio-gpu-virgl.c
@@ -171,7 +171,7 @@ static void virgl_cmd_set_scanout(VirtIOGPU *g,
 struct virgl_renderer_resource_info info;
 void *d3d_tex2d = NULL;
 
-#ifdef HAVE_VIRGL_D3D_INFO_EXT
+#if VIRGL_VERSION_MAJOR >= 1
 struct virgl_renderer_resource_info_ext ext;
 memset(, 0, sizeof(ext));
 ret = virgl_renderer_resource_get_info_ext(ss.resource_id, );
diff --git a/meson.build b/meson.build
index 97e00d6f59b8..838d08ef0f9b 100644
--- a/meson.build
+++ b/meson.build
@@ -2329,10 +2329,7 @@ config_host_data.set('CONFIG_VNC', vnc.found())
 config_host_data.set('CONFIG_VNC_JPEG', jpeg.found())
 config_host_data.set('CONFIG_VNC_SASL', sasl.found())
 if virgl.found()
-  config_host_data.set('HAVE_VIRGL_D3D_INFO_EXT',
-   cc.has_member('struct 
virgl_renderer_resource_info_ext', 'd3d_tex2d',
- prefix: '#include ',
- dependencies: virgl))
+  config_host_data.set('VIRGL_VERSION_MAJOR', virgl.version().split('.')[0])
 endif
 config_host_data.set('CONFIG_VIRTFS', have_virtfs)
 config_host_data.set('CONFIG_VTE', vte.found())
-- 
2.45.2

[PATCH v15 09/14] virtio-gpu: Add virgl resource management

From: Huang Rui 

In a preparation to adding host blobs support to virtio-gpu, add virgl
resource management that allows to retrieve resource based on its ID
and virgl resource wrapper on top of simple resource that will be contain
fields specific to virgl.

Signed-off-by: Huang Rui 
Reviewed-by: Antonio Caggiano 
Signed-off-by: Dmitry Osipenko 
---
 hw/display/virtio-gpu-virgl.c | 76 +++
 1 file changed, 76 insertions(+)

diff --git a/hw/display/virtio-gpu-virgl.c b/hw/display/virtio-gpu-virgl.c
index b3aa444bcfa5..3ffea478e723 100644
--- a/hw/display/virtio-gpu-virgl.c
+++ b/hw/display/virtio-gpu-virgl.c
@@ -22,6 +22,23 @@
 
 #include 
 
+struct virtio_gpu_virgl_resource {
+struct virtio_gpu_simple_resource base;
+};
+
+static struct virtio_gpu_virgl_resource *
+virtio_gpu_virgl_find_resource(VirtIOGPU *g, uint32_t resource_id)
+{
+struct virtio_gpu_simple_resource *res;
+
+res = virtio_gpu_find_resource(g, resource_id);
+if (!res) {
+return NULL;
+}
+
+return container_of(res, struct virtio_gpu_virgl_resource, base);
+}
+
 #if VIRGL_RENDERER_CALLBACKS_VERSION >= 4
 static void *
 virgl_get_egl_display(G_GNUC_UNUSED void *cookie)
@@ -35,11 +52,34 @@ static void virgl_cmd_create_resource_2d(VirtIOGPU *g,
 {
 struct virtio_gpu_resource_create_2d c2d;
 struct virgl_renderer_resource_create_args args;
+struct virtio_gpu_virgl_resource *res;
 
 VIRTIO_GPU_FILL_CMD(c2d);
 trace_virtio_gpu_cmd_res_create_2d(c2d.resource_id, c2d.format,
c2d.width, c2d.height);
 
+if (c2d.resource_id == 0) {
+qemu_log_mask(LOG_GUEST_ERROR, "%s: resource id 0 is not allowed\n",
+  __func__);
+cmd->error = VIRTIO_GPU_RESP_ERR_INVALID_RESOURCE_ID;
+return;
+}
+
+res = virtio_gpu_virgl_find_resource(g, c2d.resource_id);
+if (res) {
+qemu_log_mask(LOG_GUEST_ERROR, "%s: resource already exists %d\n",
+  __func__, c2d.resource_id);
+cmd->error = VIRTIO_GPU_RESP_ERR_INVALID_RESOURCE_ID;
+return;
+}
+
+res = g_new0(struct virtio_gpu_virgl_resource, 1);
+res->base.width = c2d.width;
+res->base.height = c2d.height;
+res->base.format = c2d.format;
+res->base.resource_id = c2d.resource_id;
+QTAILQ_INSERT_HEAD(>reslist, >base, next);
+
 args.handle = c2d.resource_id;
 args.target = 2;
 args.format = c2d.format;
@@ -59,11 +99,34 @@ static void virgl_cmd_create_resource_3d(VirtIOGPU *g,
 {
 struct virtio_gpu_resource_create_3d c3d;
 struct virgl_renderer_resource_create_args args;
+struct virtio_gpu_virgl_resource *res;
 
 VIRTIO_GPU_FILL_CMD(c3d);
 trace_virtio_gpu_cmd_res_create_3d(c3d.resource_id, c3d.format,
c3d.width, c3d.height, c3d.depth);
 
+if (c3d.resource_id == 0) {
+qemu_log_mask(LOG_GUEST_ERROR, "%s: resource id 0 is not allowed\n",
+  __func__);
+cmd->error = VIRTIO_GPU_RESP_ERR_INVALID_RESOURCE_ID;
+return;
+}
+
+res = virtio_gpu_virgl_find_resource(g, c3d.resource_id);
+if (res) {
+qemu_log_mask(LOG_GUEST_ERROR, "%s: resource already exists %d\n",
+  __func__, c3d.resource_id);
+cmd->error = VIRTIO_GPU_RESP_ERR_INVALID_RESOURCE_ID;
+return;
+}
+
+res = g_new0(struct virtio_gpu_virgl_resource, 1);
+res->base.width = c3d.width;
+res->base.height = c3d.height;
+res->base.format = c3d.format;
+res->base.resource_id = c3d.resource_id;
+QTAILQ_INSERT_HEAD(>reslist, >base, next);
+
 args.handle = c3d.resource_id;
 args.target = c3d.target;
 args.format = c3d.format;
@@ -82,12 +145,21 @@ static void virgl_cmd_resource_unref(VirtIOGPU *g,
  struct virtio_gpu_ctrl_command *cmd)
 {
 struct virtio_gpu_resource_unref unref;
+struct virtio_gpu_virgl_resource *res;
 struct iovec *res_iovs = NULL;
 int num_iovs = 0;
 
 VIRTIO_GPU_FILL_CMD(unref);
 trace_virtio_gpu_cmd_res_unref(unref.resource_id);
 
+res = virtio_gpu_virgl_find_resource(g, unref.resource_id);
+if (!res) {
+qemu_log_mask(LOG_GUEST_ERROR, "%s: resource does not exist %d\n",
+  __func__, unref.resource_id);
+cmd->error = VIRTIO_GPU_RESP_ERR_INVALID_RESOURCE_ID;
+return;
+}
+
 virgl_renderer_resource_detach_iov(unref.resource_id,
_iovs,
_iovs);
@@ -95,6 +167,10 @@ static void virgl_cmd_resource_unref(VirtIOGPU *g,
 virtio_gpu_cleanup_mapping_iov(g, res_iovs, num_iovs);
 }
 virgl_renderer_resource_unref(unref.resource_id);
+
+QTAILQ_REMOVE(>reslist, >base, next);
+
+g_free(res);
 }
 
 static void virgl_cmd_context_create(VirtIOGPU *g,
-- 
2.45.2

[PATCH v15 02/14] virtio-gpu: Move fence_poll timer to VirtIOGPUGL

Move fence_poll timer to VirtIOGPUGL for consistency with cmdq_resume_bh
that are used only by GL device.

Signed-off-by: Dmitry Osipenko 
---
 hw/display/virtio-gpu-virgl.c  | 8 +---
 include/hw/virtio/virtio-gpu.h | 3 ++-
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/hw/display/virtio-gpu-virgl.c b/hw/display/virtio-gpu-virgl.c
index 14091b191ec0..91dce90f9176 100644
--- a/hw/display/virtio-gpu-virgl.c
+++ b/hw/display/virtio-gpu-virgl.c
@@ -594,11 +594,12 @@ static void virtio_gpu_print_stats(void *opaque)
 static void virtio_gpu_fence_poll(void *opaque)
 {
 VirtIOGPU *g = opaque;
+VirtIOGPUGL *gl = VIRTIO_GPU_GL(g);
 
 virgl_renderer_poll();
 virtio_gpu_process_cmdq(g);
 if (!QTAILQ_EMPTY(>cmdq) || !QTAILQ_EMPTY(>fenceq)) {
-timer_mod(g->fence_poll, qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + 10);
+timer_mod(gl->fence_poll, qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + 10);
 }
 }
 
@@ -626,6 +627,7 @@ int virtio_gpu_virgl_init(VirtIOGPU *g)
 {
 int ret;
 uint32_t flags = 0;
+VirtIOGPUGL *gl = VIRTIO_GPU_GL(g);
 
 #if VIRGL_RENDERER_CALLBACKS_VERSION >= 4
 if (qemu_egl_display) {
@@ -645,8 +647,8 @@ int virtio_gpu_virgl_init(VirtIOGPU *g)
 return ret;
 }
 
-g->fence_poll = timer_new_ms(QEMU_CLOCK_VIRTUAL,
- virtio_gpu_fence_poll, g);
+gl->fence_poll = timer_new_ms(QEMU_CLOCK_VIRTUAL,
+  virtio_gpu_fence_poll, g);
 
 if (virtio_gpu_stats_enabled(g->parent_obj.conf)) {
 g->print_stats = timer_new_ms(QEMU_CLOCK_VIRTUAL,
diff --git a/include/hw/virtio/virtio-gpu.h b/include/hw/virtio/virtio-gpu.h
index 7a59379f5a7a..bc69fd78a440 100644
--- a/include/hw/virtio/virtio-gpu.h
+++ b/include/hw/virtio/virtio-gpu.h
@@ -196,7 +196,6 @@ struct VirtIOGPU {
 uint64_t hostmem;
 
 bool processing_cmdq;
-QEMUTimer *fence_poll;
 QEMUTimer *print_stats;
 
 uint32_t inflight;
@@ -231,6 +230,8 @@ struct VirtIOGPUGL {
 
 bool renderer_inited;
 bool renderer_reset;
+
+QEMUTimer *fence_poll;
 };
 
 struct VhostUserGPU {
-- 
2.45.2

[PATCH v15 04/14] virtio-gpu: Handle virtio_gpu_virgl_init() failure

virtio_gpu_virgl_init() may fail, leading to a further Qemu crash
because Qemu assumes it never fails. Check virtio_gpu_virgl_init()
return code and don't execute virtio commands on error. Failed
virtio_gpu_virgl_init() will result in a timed out virtio commands
for a guest OS.

Signed-off-by: Dmitry Osipenko 
---
 hw/display/virtio-gpu-gl.c | 30 ++
 include/hw/virtio/virtio-gpu.h | 11 +--
 2 files changed, 31 insertions(+), 10 deletions(-)

diff --git a/hw/display/virtio-gpu-gl.c b/hw/display/virtio-gpu-gl.c
index e06be60dfbfc..21a1e9a05c5d 100644
--- a/hw/display/virtio-gpu-gl.c
+++ b/hw/display/virtio-gpu-gl.c
@@ -29,9 +29,14 @@ static void virtio_gpu_gl_update_cursor_data(VirtIOGPU *g,
  struct virtio_gpu_scanout *s,
  uint32_t resource_id)
 {
+VirtIOGPUGL *gl = VIRTIO_GPU_GL(g);
 uint32_t width, height;
 uint32_t pixels, *data;
 
+if (gl->renderer_state != RS_INITED) {
+return;
+}
+
 data = virgl_renderer_get_cursor_data(resource_id, , );
 if (!data) {
 return;
@@ -65,13 +70,22 @@ static void virtio_gpu_gl_handle_ctrl(VirtIODevice *vdev, 
VirtQueue *vq)
 return;
 }
 
-if (!gl->renderer_inited) {
-virtio_gpu_virgl_init(g);
-gl->renderer_inited = true;
-}
-if (gl->renderer_reset) {
-gl->renderer_reset = false;
+switch (gl->renderer_state) {
+case RS_RESET:
 virtio_gpu_virgl_reset(g);
+/* fallthrough */
+case RS_START:
+if (virtio_gpu_virgl_init(g)) {
+gl->renderer_state = RS_INIT_FAILED;
+return;
+}
+
+gl->renderer_state = RS_INITED;
+break;
+case RS_INIT_FAILED:
+return;
+case RS_INITED:
+break;
 }
 
 cmd = virtqueue_pop(vq, sizeof(struct virtio_gpu_ctrl_command));
@@ -98,9 +112,9 @@ static void virtio_gpu_gl_reset(VirtIODevice *vdev)
  * GL functions must be called with the associated GL context in main
  * thread, and when the renderer is unblocked.
  */
-if (gl->renderer_inited && !gl->renderer_reset) {
+if (gl->renderer_state == RS_INITED) {
 virtio_gpu_virgl_reset_scanout(g);
-gl->renderer_reset = true;
+gl->renderer_state = RS_RESET;
 }
 }
 
diff --git a/include/hw/virtio/virtio-gpu.h b/include/hw/virtio/virtio-gpu.h
index 7ff989a45a5c..6e71d799e5da 100644
--- a/include/hw/virtio/virtio-gpu.h
+++ b/include/hw/virtio/virtio-gpu.h
@@ -224,11 +224,18 @@ struct VirtIOGPUClass {
  Error **errp);
 };
 
+/* VirtIOGPUGL renderer states */
+typedef enum {
+RS_START,   /* starting state */
+RS_INIT_FAILED, /* failed initialisation */
+RS_INITED,  /* initialised and working */
+RS_RESET,   /* inited and reset pending, moves to start after reset */
+} RenderState;
+
 struct VirtIOGPUGL {
 struct VirtIOGPU parent_obj;
 
-bool renderer_inited;
-bool renderer_reset;
+RenderState renderer_state;
 
 QEMUTimer *fence_poll;
 QEMUTimer *print_stats;
-- 
2.45.2

[PATCH v15 11/14] virtio-gpu: Support suspension of commands processing

Check whether command processing has been finished; otherwise, stop
processing commands and retry the command again next time. This allows
us to support asynchronous execution of non-fenced commands needed for
unmapping host blobs safely.

Suggested-by: Akihiko Odaki 
Signed-off-by: Dmitry Osipenko 
---
 hw/display/trace-events | 1 +
 hw/display/virtio-gpu.c | 6 ++
 2 files changed, 7 insertions(+)

diff --git a/hw/display/trace-events b/hw/display/trace-events
index e212710284ae..d26d663f9638 100644
--- a/hw/display/trace-events
+++ b/hw/display/trace-events
@@ -55,6 +55,7 @@ virtio_gpu_fence_ctrl(uint64_t fence, uint32_t type) "fence 
0x%" PRIx64 ", type
 virtio_gpu_fence_resp(uint64_t fence) "fence 0x%" PRIx64
 virtio_gpu_inc_inflight_fences(uint32_t inflight) "in-flight+ %u"
 virtio_gpu_dec_inflight_fences(uint32_t inflight) "in-flight- %u"
+virtio_gpu_cmd_suspended(uint32_t cmd) "cmd 0x%x"
 
 # qxl.c
 disable qxl_io_write_vga(int qid, const char *mode, uint32_t addr, uint32_t 
val) "%d %s addr=%u val=%u"
diff --git a/hw/display/virtio-gpu.c b/hw/display/virtio-gpu.c
index 95091c4b7924..1c6e97fb6931 100644
--- a/hw/display/virtio-gpu.c
+++ b/hw/display/virtio-gpu.c
@@ -1054,6 +1054,12 @@ void virtio_gpu_process_cmdq(VirtIOGPU *g)
 /* process command */
 vgc->process_cmd(g, cmd);
 
+/* command suspended */
+if (!cmd->finished && !(cmd->cmd_hdr.flags & VIRTIO_GPU_FLAG_FENCE)) {
+trace_virtio_gpu_cmd_suspended(cmd->cmd_hdr.type);
+break;
+}
+
 QTAILQ_REMOVE(>cmdq, cmd, next);
 if (virtio_gpu_stats_enabled(g->parent_obj.conf)) {
 g->stats.requests++;
-- 
2.45.2

[PATCH v15 05/14] virtio-gpu: Unrealize GL device

Even though GL GPU doesn't support hotplugging today, free virgl
resources when GL device is unrealized. For consistency.

Signed-off-by: Dmitry Osipenko 
---
 hw/display/virtio-gpu-gl.c | 17 +
 1 file changed, 17 insertions(+)

diff --git a/hw/display/virtio-gpu-gl.c b/hw/display/virtio-gpu-gl.c
index 21a1e9a05c5d..0109244276fc 100644
--- a/hw/display/virtio-gpu-gl.c
+++ b/hw/display/virtio-gpu-gl.c
@@ -150,6 +150,22 @@ static Property virtio_gpu_gl_properties[] = {
 DEFINE_PROP_END_OF_LIST(),
 };
 
+static void virtio_gpu_gl_device_unrealize(DeviceState *qdev)
+{
+VirtIOGPU *g = VIRTIO_GPU(qdev);
+VirtIOGPUGL *gl = VIRTIO_GPU_GL(qdev);
+
+if (gl->renderer_state >= RS_INITED) {
+if (virtio_gpu_stats_enabled(g->parent_obj.conf)) {
+timer_free(gl->print_stats);
+}
+timer_free(gl->fence_poll);
+virgl_renderer_cleanup(NULL);
+}
+
+gl->renderer_state = RS_START;
+}
+
 static void virtio_gpu_gl_class_init(ObjectClass *klass, void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(klass);
@@ -163,6 +179,7 @@ static void virtio_gpu_gl_class_init(ObjectClass *klass, 
void *data)
 vgc->update_cursor_data = virtio_gpu_gl_update_cursor_data;
 
 vdc->realize = virtio_gpu_gl_device_realize;
+vdc->unrealize = virtio_gpu_gl_device_unrealize;
 vdc->reset = virtio_gpu_gl_reset;
 device_class_set_props(dc, virtio_gpu_gl_properties);
 }
-- 
2.45.2

[PATCH v15 14/14] virtio-gpu: Support Venus context

From: Antonio Caggiano 

Request Venus when initializing VirGL and if venus=true flag is set for
virtio-gpu-gl device.

Signed-off-by: Antonio Caggiano 
Signed-off-by: Huang Rui 
Signed-off-by: Dmitry Osipenko 
---
 hw/display/virtio-gpu-gl.c |  2 ++
 hw/display/virtio-gpu-virgl.c  | 22 ++
 hw/display/virtio-gpu.c| 15 +++
 include/hw/virtio/virtio-gpu.h |  3 +++
 4 files changed, 38 insertions(+), 4 deletions(-)

diff --git a/hw/display/virtio-gpu-gl.c b/hw/display/virtio-gpu-gl.c
index 20a7c316bb23..9be452547322 100644
--- a/hw/display/virtio-gpu-gl.c
+++ b/hw/display/virtio-gpu-gl.c
@@ -151,6 +151,8 @@ static void virtio_gpu_gl_device_realize(DeviceState *qdev, 
Error **errp)
 static Property virtio_gpu_gl_properties[] = {
 DEFINE_PROP_BIT("stats", VirtIOGPU, parent_obj.conf.flags,
 VIRTIO_GPU_FLAG_STATS_ENABLED, false),
+DEFINE_PROP_BIT("venus", VirtIOGPU, parent_obj.conf.flags,
+VIRTIO_GPU_FLAG_VENUS_ENABLED, false),
 DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/hw/display/virtio-gpu-virgl.c b/hw/display/virtio-gpu-virgl.c
index 58693dfa2afa..08b0e7e49337 100644
--- a/hw/display/virtio-gpu-virgl.c
+++ b/hw/display/virtio-gpu-virgl.c
@@ -1135,6 +1135,11 @@ int virtio_gpu_virgl_init(VirtIOGPU *g)
 flags |= VIRGL_RENDERER_D3D11_SHARE_TEXTURE;
 }
 #endif
+#if VIRGL_VERSION_MAJOR >= 1
+if (virtio_gpu_venus_enabled(g->parent_obj.conf)) {
+flags |= VIRGL_RENDERER_VENUS | VIRGL_RENDERER_RENDER_SERVER;
+}
+#endif
 
 ret = virgl_renderer_init(g, flags, _gpu_3d_cbs);
 if (ret != 0) {
@@ -1168,7 +1173,7 @@ static void virtio_gpu_virgl_add_capset(GArray 
*capset_ids, uint32_t capset_id)
 
 GArray *virtio_gpu_virgl_get_capsets(VirtIOGPU *g)
 {
-uint32_t capset2_max_ver, capset2_max_size;
+uint32_t capset_max_ver, capset_max_size;
 GArray *capset_ids;
 
 capset_ids = g_array_new(false, false, sizeof(uint32_t));
@@ -1177,11 +1182,20 @@ GArray *virtio_gpu_virgl_get_capsets(VirtIOGPU *g)
 virtio_gpu_virgl_add_capset(capset_ids, VIRTIO_GPU_CAPSET_VIRGL);
 
 virgl_renderer_get_cap_set(VIRTIO_GPU_CAPSET_VIRGL2,
-  _max_ver,
-  _max_size);
-if (capset2_max_ver) {
+   _max_ver,
+   _max_size);
+if (capset_max_ver) {
 virtio_gpu_virgl_add_capset(capset_ids, VIRTIO_GPU_CAPSET_VIRGL2);
 }
 
+if (virtio_gpu_venus_enabled(g->parent_obj.conf)) {
+virgl_renderer_get_cap_set(VIRTIO_GPU_CAPSET_VENUS,
+   _max_ver,
+   _max_size);
+if (capset_max_size) {
+virtio_gpu_virgl_add_capset(capset_ids, VIRTIO_GPU_CAPSET_VENUS);
+}
+}
+
 return capset_ids;
 }
diff --git a/hw/display/virtio-gpu.c b/hw/display/virtio-gpu.c
index a5db2256a4bb..50b5634af13f 100644
--- a/hw/display/virtio-gpu.c
+++ b/hw/display/virtio-gpu.c
@@ -1507,6 +1507,21 @@ void virtio_gpu_device_realize(DeviceState *qdev, Error 
**errp)
 #endif
 }
 
+if (virtio_gpu_venus_enabled(g->parent_obj.conf)) {
+#ifdef VIRGL_VERSION_MAJOR
+#if VIRGL_VERSION_MAJOR >= 1
+if (!virtio_gpu_blob_enabled(g->parent_obj.conf) ||
+!virtio_gpu_hostmem_enabled(g->parent_obj.conf)) {
+error_setg(errp, "venus requires enabled blob and hostmem 
options");
+return;
+}
+#else
+error_setg(errp, "old virglrenderer, venus unsupported");
+return;
+#endif
+#endif
+}
+
 if (!virtio_gpu_base_device_realize(qdev,
 virtio_gpu_handle_ctrl_cb,
 virtio_gpu_handle_cursor_cb,
diff --git a/include/hw/virtio/virtio-gpu.h b/include/hw/virtio/virtio-gpu.h
index 83232f4b4bfa..230fa0c4ee0a 100644
--- a/include/hw/virtio/virtio-gpu.h
+++ b/include/hw/virtio/virtio-gpu.h
@@ -99,6 +99,7 @@ enum virtio_gpu_base_conf_flags {
 VIRTIO_GPU_FLAG_BLOB_ENABLED,
 VIRTIO_GPU_FLAG_CONTEXT_INIT_ENABLED,
 VIRTIO_GPU_FLAG_RUTABAGA_ENABLED,
+VIRTIO_GPU_FLAG_VENUS_ENABLED,
 };
 
 #define virtio_gpu_virgl_enabled(_cfg) \
@@ -117,6 +118,8 @@ enum virtio_gpu_base_conf_flags {
 (_cfg.flags & (1 << VIRTIO_GPU_FLAG_RUTABAGA_ENABLED))
 #define virtio_gpu_hostmem_enabled(_cfg) \
 (_cfg.hostmem > 0)
+#define virtio_gpu_venus_enabled(_cfg) \
+(_cfg.flags & (1 << VIRTIO_GPU_FLAG_VENUS_ENABLED))
 
 struct virtio_gpu_base_conf {
 uint32_t max_outputs;
-- 
2.45.2

[PATCH v15 12/14] virtio-gpu: Handle resource blob commands

From: Antonio Caggiano 

Support BLOB resources creation, mapping and unmapping by calling the
new stable virglrenderer 0.10 interface. Only enabled when available and
via the blob config. E.g. -device virtio-vga-gl,blob=true

Signed-off-by: Antonio Caggiano 
Signed-off-by: Xenia Ragiadakou 
Signed-off-by: Huang Rui 
Signed-off-by: Dmitry Osipenko 
---
 hw/display/virtio-gpu-gl.c |   3 +
 hw/display/virtio-gpu-virgl.c  | 334 +++--
 hw/display/virtio-gpu.c|   6 +-
 include/hw/virtio/virtio-gpu.h |   2 +
 4 files changed, 330 insertions(+), 15 deletions(-)

diff --git a/hw/display/virtio-gpu-gl.c b/hw/display/virtio-gpu-gl.c
index 4fe9e6a0c21c..5f27568d3ec8 100644
--- a/hw/display/virtio-gpu-gl.c
+++ b/hw/display/virtio-gpu-gl.c
@@ -160,6 +160,9 @@ static void virtio_gpu_gl_device_unrealize(DeviceState 
*qdev)
 VirtIOGPUGL *gl = VIRTIO_GPU_GL(qdev);
 
 if (gl->renderer_state >= RS_INITED) {
+#if VIRGL_VERSION_MAJOR >= 1
+qemu_bh_delete(gl->cmdq_resume_bh);
+#endif
 if (virtio_gpu_stats_enabled(g->parent_obj.conf)) {
 timer_free(gl->print_stats);
 }
diff --git a/hw/display/virtio-gpu-virgl.c b/hw/display/virtio-gpu-virgl.c
index 60befab7efc2..f6cb4fe5b28e 100644
--- a/hw/display/virtio-gpu-virgl.c
+++ b/hw/display/virtio-gpu-virgl.c
@@ -26,6 +26,7 @@
 
 struct virtio_gpu_virgl_resource {
 struct virtio_gpu_simple_resource base;
+MemoryRegion *mr;
 };
 
 static struct virtio_gpu_virgl_resource *
@@ -49,6 +50,152 @@ virgl_get_egl_display(G_GNUC_UNUSED void *cookie)
 }
 #endif
 
+#if VIRGL_VERSION_MAJOR >= 1
+typedef enum {
+HOSTMEM_MR_UNMAPPING,
+HOSTMEM_MR_FINISH_UNMAPPING,
+} HostmemMRState;
+
+struct virtio_gpu_virgl_hostmem_region {
+MemoryRegion mr;
+struct VirtIOGPU *g;
+HostmemMRState state;
+};
+
+static struct virtio_gpu_virgl_hostmem_region *
+to_hostmem_region(MemoryRegion *mr)
+{
+return container_of(mr, struct virtio_gpu_virgl_hostmem_region, mr);
+}
+
+static void virtio_gpu_virgl_resume_cmdq_bh(void *opaque)
+{
+VirtIOGPU *g = opaque;
+
+virtio_gpu_process_cmdq(g);
+}
+
+static void virtio_gpu_virgl_hostmem_region_free(void *obj)
+{
+MemoryRegion *mr = MEMORY_REGION(obj);
+struct virtio_gpu_virgl_hostmem_region *vmr;
+VirtIOGPUBase *b;
+VirtIOGPUGL *gl;
+
+vmr = to_hostmem_region(mr);
+vmr->state = HOSTMEM_MR_FINISH_UNMAPPING;
+
+b = VIRTIO_GPU_BASE(vmr->g);
+b->renderer_blocked--;
+
+/*
+ * memory_region_unref() is executed from RCU thread context, while
+ * virglrenderer works only on the main-loop thread that's holding GL
+ * context.
+ */
+gl = VIRTIO_GPU_GL(vmr->g);
+qemu_bh_schedule(gl->cmdq_resume_bh);
+}
+
+static int
+virtio_gpu_virgl_map_resource_blob(VirtIOGPU *g,
+   struct virtio_gpu_virgl_resource *res,
+   uint64_t offset)
+{
+struct virtio_gpu_virgl_hostmem_region *vmr;
+VirtIOGPUBase *b = VIRTIO_GPU_BASE(g);
+MemoryRegion *mr;
+uint64_t size;
+void *data;
+int ret;
+
+if (!virtio_gpu_hostmem_enabled(b->conf)) {
+qemu_log_mask(LOG_GUEST_ERROR, "%s: hostmem disabled\n", __func__);
+return -EOPNOTSUPP;
+}
+
+ret = virgl_renderer_resource_map(res->base.resource_id, , );
+if (ret) {
+qemu_log_mask(LOG_GUEST_ERROR, "%s: failed to map virgl resource: 
%s\n",
+  __func__, strerror(-ret));
+return ret;
+}
+
+vmr = g_new0(struct virtio_gpu_virgl_hostmem_region, 1);
+vmr->g = g;
+
+mr = >mr;
+memory_region_init_ram_ptr(mr, OBJECT(mr), "blob", size, data);
+memory_region_add_subregion(>hostmem, offset, mr);
+memory_region_set_enabled(mr, true);
+
+/*
+ * MR could outlive the resource if MR's reference is held outside of
+ * virtio-gpu. In order to prevent unmapping resource while MR is alive,
+ * and thus, making the data pointer invalid, we will block virtio-gpu
+ * command processing until MR is fully unreferenced and freed.
+ */
+OBJECT(mr)->free = virtio_gpu_virgl_hostmem_region_free;
+
+res->mr = mr;
+
+return 0;
+}
+
+static int
+virtio_gpu_virgl_unmap_resource_blob(VirtIOGPU *g,
+ struct virtio_gpu_virgl_resource *res,
+ bool *cmd_suspended)
+{
+struct virtio_gpu_virgl_hostmem_region *vmr;
+VirtIOGPUBase *b = VIRTIO_GPU_BASE(g);
+MemoryRegion *mr = res->mr;
+int ret;
+
+if (!mr) {
+return 0;
+}
+
+vmr = to_hostmem_region(res->mr);
+
+/*
+ * Perform async unmapping in 3 steps:
+ *
+ * 1. Begin async unmapping with memory_region_del_subregion()
+ *and suspend/block cmd processing.
+ * 2. Wait for res->mr to be freed and cmd processing resumed
+ *asynchronously by virtio_gpu_virgl_hostmem_region_free().
+ * 3. Finish the unmapping with final

[PATCH v15 08/14] virtio-gpu: Don't require udmabuf when blobs and virgl are enabled

The udmabuf usage is mandatory when virgl is disabled and blobs feature
enabled in the Qemu machine configuration. If virgl and blobs are enabled,
then udmabuf requirement is optional. Since udmabuf isn't widely supported
by a popular Linux distros today, let's relax the udmabuf requirement for
blobs=on,virgl=on. Now, a full-featured virtio-gpu acceleration is
available to Qemu users without a need to have udmabuf available in the
system.

Reviewed-by: Antonio Caggiano 
Signed-off-by: Huang Rui 
Reviewed-by: Antonio Caggiano 
Reviewed-by: Marc-André Lureau 
Signed-off-by: Dmitry Osipenko 
---
 hw/display/virtio-gpu.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/display/virtio-gpu.c b/hw/display/virtio-gpu.c
index 602952a7041b..40a9d089710c 100644
--- a/hw/display/virtio-gpu.c
+++ b/hw/display/virtio-gpu.c
@@ -1485,6 +1485,7 @@ void virtio_gpu_device_realize(DeviceState *qdev, Error 
**errp)
 
 if (virtio_gpu_blob_enabled(g->parent_obj.conf)) {
 if (!virtio_gpu_rutabaga_enabled(g->parent_obj.conf) &&
+!virtio_gpu_virgl_enabled(g->parent_obj.conf) &&
 !virtio_gpu_have_udmabuf()) {
 error_setg(errp, "need rutabaga or udmabuf for blob resources");
 return;
-- 
2.45.2

[PATCH v15 03/14] virtio-gpu: Move print_stats timer to VirtIOGPUGL

Move print_stats timer to VirtIOGPUGL for consistency with
cmdq_resume_bh and fence_poll that are used only by GL device.

Signed-off-by: Dmitry Osipenko 
---
 hw/display/virtio-gpu-virgl.c  | 10 ++
 include/hw/virtio/virtio-gpu.h |  2 +-
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/hw/display/virtio-gpu-virgl.c b/hw/display/virtio-gpu-virgl.c
index 91dce90f9176..a63d1f540f04 100644
--- a/hw/display/virtio-gpu-virgl.c
+++ b/hw/display/virtio-gpu-virgl.c
@@ -574,6 +574,7 @@ static struct virgl_renderer_callbacks virtio_gpu_3d_cbs = {
 static void virtio_gpu_print_stats(void *opaque)
 {
 VirtIOGPU *g = opaque;
+VirtIOGPUGL *gl = VIRTIO_GPU_GL(g);
 
 if (g->stats.requests) {
 fprintf(stderr, "stats: vq req %4d, %3d -- 3D %4d (%5d)\n",
@@ -588,7 +589,7 @@ static void virtio_gpu_print_stats(void *opaque)
 } else {
 fprintf(stderr, "stats: idle\r");
 }
-timer_mod(g->print_stats, qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + 1000);
+timer_mod(gl->print_stats, qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + 1000);
 }
 
 static void virtio_gpu_fence_poll(void *opaque)
@@ -651,9 +652,10 @@ int virtio_gpu_virgl_init(VirtIOGPU *g)
   virtio_gpu_fence_poll, g);
 
 if (virtio_gpu_stats_enabled(g->parent_obj.conf)) {
-g->print_stats = timer_new_ms(QEMU_CLOCK_VIRTUAL,
-  virtio_gpu_print_stats, g);
-timer_mod(g->print_stats, qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + 
1000);
+gl->print_stats = timer_new_ms(QEMU_CLOCK_VIRTUAL,
+   virtio_gpu_print_stats, g);
+timer_mod(gl->print_stats,
+  qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + 1000);
 }
 return 0;
 }
diff --git a/include/hw/virtio/virtio-gpu.h b/include/hw/virtio/virtio-gpu.h
index bc69fd78a440..7ff989a45a5c 100644
--- a/include/hw/virtio/virtio-gpu.h
+++ b/include/hw/virtio/virtio-gpu.h
@@ -196,7 +196,6 @@ struct VirtIOGPU {
 uint64_t hostmem;
 
 bool processing_cmdq;
-QEMUTimer *print_stats;
 
 uint32_t inflight;
 struct {
@@ -232,6 +231,7 @@ struct VirtIOGPUGL {
 bool renderer_reset;
 
 QEMUTimer *fence_poll;
+QEMUTimer *print_stats;
 };
 
 struct VhostUserGPU {
-- 
2.45.2

[PATCH] target/ppc/mem_helper.c: Remove a conditional from dcbz_common()

2024-06-22 Thread BALATON Zoltan

Instead of passing a bool and select a value within dcbz_common() let
the callers pass in the right value to avoid this conditional
statement. On PPC dcbz is often used to zero memory and some code uses
it a lot. This change improves the run time of a test case that copies
memory with a dcbz call in every iteration from 6.23 to 5.83 seconds.

Signed-off-by: BALATON Zoltan 
---
This is just a small optimisation removing some of the overhead but
dcbz still seems to be the biggest issue with this test. Removing the
dcbz call it runs in 2 seconds. In a profile I see:
  Children  Self  Command   Shared ObjectSymbol
-   55.01%11.44%  qemu-ppc  qemu-ppc [.] 
dcbz_common.constprop.0
   - 43.57% dcbz_common.constprop.0
  - probe_access
 - page_get_flags
  interval_tree_iter_first
   - 11.44% helper_raise_exception_err
cpu_loop_exit_restore
cpu_loop
cpu_exec
cpu_exec_setjmp.isra.0
cpu_exec_loop.constprop.0
cpu_tb_exec
0x7f262403636e
helper_raise_exception_err
cpu_loop_exit_restore
cpu_loop
cpu_exec
cpu_exec_setjmp.isra.0
cpu_exec_loop.constprop.0
cpu_tb_exec
  - 0x7f26240386a4
   11.20% helper_dcbz
+   43.81%12.28%  qemu-ppc  qemu-ppc [.] probe_access
+   39.31% 0.00%  qemu-ppc  [JIT] tid 9969   [.] 0x7f262400
+   32.45% 4.51%  qemu-ppc  qemu-ppc [.] page_get_flags
+   25.50% 2.10%  qemu-ppc  qemu-ppc [.] 
interval_tree_iter_first
+   24.67%24.67%  qemu-ppc  qemu-ppc [.] 
interval_tree_subtree_search
+   16.75% 1.19%  qemu-ppc  qemu-ppc [.] helper_dcbz
+4.78% 4.78%  qemu-ppc  [JIT] tid 9969   [.] 0x7f26240386be
+3.46% 3.46%  qemu-ppc  libc-2.32.so [.] 
__memset_avx2_unaligned_erms
Any idea how this could be optimised further? (This is running with
qemu-ppc user mode emulation but I think with system it might be even
worse.) Could an inline implementation with TCG vector ops work to
avoid the helper and let it compile to efficient host code? Even if
that could work I don't know how to do that so I'd need some further
advice on this.

 target/ppc/mem_helper.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/target/ppc/mem_helper.c b/target/ppc/mem_helper.c
index f88155ad45..361fd72226 100644
--- a/target/ppc/mem_helper.c
+++ b/target/ppc/mem_helper.c
@@ -271,12 +271,11 @@ void helper_stsw(CPUPPCState *env, target_ulong addr, 
uint32_t nb,
 }
 
 static void dcbz_common(CPUPPCState *env, target_ulong addr,
-uint32_t opcode, bool epid, uintptr_t retaddr)
+uint32_t opcode, int mmu_idx, uintptr_t retaddr)
 {
 target_ulong mask, dcbz_size = env->dcache_line_size;
 uint32_t i;
 void *haddr;
-int mmu_idx = epid ? PPC_TLB_EPID_STORE : ppc_env_mmu_index(env, false);
 
 #if defined(TARGET_PPC64)
 /* Check for dcbz vs dcbzl on 970 */
@@ -309,12 +308,12 @@ static void dcbz_common(CPUPPCState *env, target_ulong 
addr,
 
 void helper_dcbz(CPUPPCState *env, target_ulong addr, uint32_t opcode)
 {
-dcbz_common(env, addr, opcode, false, GETPC());
+dcbz_common(env, addr, opcode, ppc_env_mmu_index(env, false), GETPC());
 }
 
 void helper_dcbzep(CPUPPCState *env, target_ulong addr, uint32_t opcode)
 {
-dcbz_common(env, addr, opcode, true, GETPC());
+dcbz_common(env, addr, opcode, PPC_TLB_EPID_STORE, GETPC());
 }
 
 void helper_icbi(CPUPPCState *env, target_ulong addr)
-- 
2.30.9

Re: [PATCH 11/23] Update ARM AArch64 VM parameter definitions for bsd-user

2024-06-22 Thread Warner Losh

On Tue, Jun 18, 2024 at 4:16 PM Richard Henderson <
richard.hender...@linaro.org> wrote:

> On 6/17/24 11:57, Ajeet Singh wrote:
> > From: Stacey Son 
> >
> > Defined address spaces for FreeBSD/arm64 and added function for
> > getting stack pointer from CPU and setting a return value.
> >
> > Signed-off-by: Stacey Son 
> > Signed-off-by: Warner Losh 
> > Signed-off-by: Ajeet Singh 
> > Co-authored-by: Sean Bruno 
> > Co-authored-by: Warner Losh 
> > ---
> >   bsd-user/aarch64/target_arch_vmparam.h | 68 ++
> >   1 file changed, 68 insertions(+)
> >   create mode 100644 bsd-user/aarch64/target_arch_vmparam.h
>
> Acked-by: Richard Henderson 
>
> > +/* KERNBASE - 512 MB */
> > +#define TARGET_VM_MAXUSER_ADDRESS   (0x7f00ULL - (512 *
> MiB))
> > +#define TARGET_USRSTACK TARGET_VM_MAXUSER_ADDRESS
>
> I will note that this may conflict with -R reserved_size,
> and is an existing issue with the x86_64 port as well.
>

There are indeed existing issues with address space management. We're
working through
them right now in the blitz branch. We have finally found where the atomic
issues were
coming from and it is  not setting the flag saying we want atomic
ops when creating
the CPU structures (that's a quick summary, I'll post more on this later
when we review it).
So I'd suggest, for the moment, allowing this in and fixing it when we get
those details
ironed out. Does that sound OK?

Warner

Re: [PATCH 02/23] Added CPU loop function

2024-06-22 Thread Warner Losh

On Mon, Jun 17, 2024 at 10:24 PM Richard Henderson <
richard.hender...@linaro.org> wrote:

> On 6/17/24 11:57, Ajeet Singh wrote:
> > +/*
> > + * The carry bit is cleared for no error; set for error.
> > + * See arm64/arm64/vm_machdep.c cpu_set_syscall_retval()
> > + */
> > +pstate = pstate_read(env);
> > +if (ret >= 0) {
> > +pstate &= ~PSTATE_C;
> > +env->xregs[0] = ret;
> > +} else if (ret == -TARGET_ERESTART) {
> > +env->pc -= 4;
> > +break;
> > +} else if (ret != -TARGET_EJUSTRETURN) {
> > +pstate |= PSTATE_C;
> > +env->xregs[0] = -ret;
> > +}
> > +pstate_write(env, pstate);
>
> No need for full pstate read/write:
>
>  env->CF = {0,1};
>

If I understand what you're suggesting, the quoted code can be replaced
by the following, faster construct:

/*
 * The carry bit is cleared for no error; set for error.
 * See arm64/arm64/vm_machdep.c cpu_set_syscall_retval()
 */
if (ret >= 0) {
env->CF = 0;
env->xregs[0] = ret;
} else if (ret == -TARGET_ERESTART) {
env->pc -= 4;
break;
} else if (ret != -TARGET_EJUSTRETURN) {
env->CF = 1;
env->xregs[0] = -ret;
}
break;

Is that what you're saying?


> > +break;
> > +
> > +case EXCP_INTERRUPT:
> > +/* Just indicate that signals should be handle ASAP. */
> > +break;
> > +
> > +case EXCP_UDEF:
> > +force_sig_fault(TARGET_SIGILL, TARGET_ILL_ILLOPN, env->pc);
> > +break;
> > +
> > +
> > +case EXCP_PREFETCH_ABORT:
> > +case EXCP_DATA_ABORT:
> > +/* We should only arrive here with EC in {DATAABORT,
> INSNABORT}. */
> > +ec = syn_get_ec(env->exception.syndrome);
>
> Nevermind about my question about syndrome.h vs patch 1.
>

Ah, Since we have to re-roll this patch anyway, maybe moving it is a good
idea?
Honestly, I'm good either way.

Warner


> r~
>

Re: [PATCH 01/23] Add CPU initialization function

2024-06-22 Thread Warner Losh

On Mon, Jun 17, 2024 at 10:17 PM Richard Henderson <
richard.hender...@linaro.org> wrote:

> On 6/17/24 11:57, Ajeet Singh wrote:
> > From: Stacey Son 
> >
> > Addded function to initialize ARM CPU
> > and to check if it supports 64 bit mode
> >
> > Signed-off-by: Ajeet Singh 
> > Signed-off-by: Stacey Son 
> > ---
> >   bsd-user/aarch64/target_arch_cpu.h | 42 ++
> >   1 file changed, 42 insertions(+)
> >   create mode 100644 bsd-user/aarch64/target_arch_cpu.h
> >
> > diff --git a/bsd-user/aarch64/target_arch_cpu.h
> b/bsd-user/aarch64/target_arch_cpu.h
> > new file mode 100644
> > index 00..db5c7062b9
> > --- /dev/null
> > +++ b/bsd-user/aarch64/target_arch_cpu.h
> > @@ -0,0 +1,42 @@
> > +/*
> > + *  ARM AArch64 cpu init and loop
> > + *
> > + * Copyright (c) 2015 Stacey Son
> > + *
> > + * This library is free software; you can redistribute it and/or
> > + * modify it under the terms of the GNU Lesser General Public
> > + * License as published by the Free Software Foundation; either
> > + * version 2 of the License, or (at your option) any later version.
> > + *
> > + * This library is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > + * Lesser General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU Lesser General Public
> > + * License along with this library; if not, see <
> http://www.gnu.org/licenses/>.
> > + */
> > +
> > +#ifndef TARGET_ARCH_CPU_H
> > +#define TARGET_ARCH_CPU_H
> > +
> > +#include "target_arch.h"
> > +#include "target/arm/syndrome.h"
>
> Do you actually need syndrome.h?
>

It's needed, but not for this chunk. It is needed for patch 2 because we
start to use the syndrome functions there to dispatch / decode the traps.
So that should be moved to patch 2 in the next round, I think.

Also

Reviewed-by: Warner Losh 

since this looks correct and I didn't write it :)

Warner


> Otherwise,
> Reviewed-by: Richard Henderson 
>
> r~
>
> > +
> > +#define TARGET_DEFAULT_CPU_MODEL "any"
> > +
> > +static inline void target_cpu_init(CPUARMState *env,
> > +struct target_pt_regs *regs)
> > +{
> > +int i;
> > +
> > +if (!(arm_feature(env, ARM_FEATURE_AARCH64))) {
> > +fprintf(stderr, "The selected ARM CPU does not support 64 bit
> mode\n");
> > +exit(1);
> > +}
> > +for (i = 0; i < 31; i++) {
> > +env->xregs[i] = regs->regs[i];
> > +}
> > +env->pc = regs->pc;
> > +env->xregs[31] = regs->sp;
> > +}
>
>

Re: [PULL 15/23] Revert "host/i386: assume presence of SSE2"

2024-06-22 Thread Richard Henderson


On 6/21/24 23:15, Paolo Bonzini wrote:

This reverts commit b18236897ca15c3db1506d8edb9a191dfe51429c.
The x86-64 instruction set can now be tuned down to x86-64 v1
or i386 Pentium Pro.

Signed-off-by: Paolo Bonzini 
---
  host/include/i386/host/cpuinfo.h | 1 +
  util/bufferiszero.c  | 4 ++--
  util/cpuinfo-i386.c  | 1 +
  3 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/host/include/i386/host/cpuinfo.h b/host/include/i386/host/cpuinfo.h
index 72f6fad61e5..81771733eaa 100644
--- a/host/include/i386/host/cpuinfo.h
+++ b/host/include/i386/host/cpuinfo.h
@@ -14,6 +14,7 @@
  #define CPUINFO_POPCNT  (1u << 4)
  #define CPUINFO_BMI1(1u << 5)
  #define CPUINFO_BMI2(1u << 6)
+#define CPUINFO_SSE2(1u << 7)
  #define CPUINFO_AVX1(1u << 9)
  #define CPUINFO_AVX2(1u << 10)
  #define CPUINFO_AVX512F (1u << 11)
diff --git a/util/bufferiszero.c b/util/bufferiszero.c
index 11c080e02cf..74864f7b782 100644
--- a/util/bufferiszero.c
+++ b/util/bufferiszero.c
@@ -188,14 +188,14 @@ static biz_accel_fn const accel_table[] = {
  
  static unsigned best_accel(void)

  {
-#ifdef CONFIG_AVX2_OPT
  unsigned info = cpuinfo_init();
  
+#ifdef CONFIG_AVX2_OPT

  if (info & CPUINFO_AVX2) {
  return 2;
  }
  #endif
-return 1;
+return info & CPUINFO_SSE2 ? 1 : 0;
  }


Merge conflict with master here -- bufferiszero.c has been split.
This hunk now goes in host/include/i386/host/bufferiszero.c.inc.


r~

[PATCH] hw/usb/hcd-ohci: Set transfer error code with no dev

2024-06-22 Thread Ryan Wendland

When a usb device is disconnected the transfer service functions bails
before appropraite transfer error flags are set.
This patch sets the appropriate condition code OHCI_CC_DEVICENOTRESPONDING
when a device is disconnected and consequently has no response on the USB bus.

Fixes: https://gitlab.com/qemu-project/qemu/-/issues/2081

Signed-off-by: Ryan Wendland 
---
 hw/usb/hcd-ohci.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/hw/usb/hcd-ohci.c b/hw/usb/hcd-ohci.c
index acd6016980..8cd25d74af 100644
--- a/hw/usb/hcd-ohci.c
+++ b/hw/usb/hcd-ohci.c
@@ -980,7 +980,8 @@ static int ohci_service_td(OHCIState *ohci, struct ohci_ed 
*ed)
 dev = ohci_find_device(ohci, OHCI_BM(ed->flags, ED_FA));
 if (dev == NULL) {
 trace_usb_ohci_td_dev_error();
-return 1;
+OHCI_SET_BM(td.flags, TD_CC, OHCI_CC_DEVICENOTRESPONDING);
+goto exit_and_retire;
 }
 ep = usb_ep_get(dev, pid, OHCI_BM(ed->flags, ED_EN));
 if (ohci->async_td) {
@@ -1087,6 +1088,7 @@ static int ohci_service_td(OHCIState *ohci, struct 
ohci_ed *ed)
 ed->head |= OHCI_ED_H;
 }
 
+exit_and_retire:
 /* Retire this TD */
 ed->head &= ~OHCI_DPTR_MASK;
 ed->head |= td.next & OHCI_DPTR_MASK;
-- 
2.34.1

[PULL 17/18] hw/usb/hcd-ohci: Fix ohci_service_td: accept zero-length TDs where CBP=BE+1

From: David Hubbard 

This changes the way the ohci emulation handles a Transfer Descriptor
with "Buffer End" set to "Current Buffer Pointer" - 1, specifically
in the case of a zero-length packet.

The OHCI spec 4.3.1.2 Table 4-2 specifies td.cbp to be zero for a
zero-length packet.  Peter Maydell tracked down commit 1328fe0c32
(hw: usb: hcd-ohci: check len and frame_number variables) where qemu
started checking this according to the spec.

What this patch does is loosen the qemu ohci implementation to allow a
zero-length packet if td.be (Buffer End) is set to td.cbp - 1, and with a
non-zero td.cbp value.

The spec is unclear whether this is valid or not -- it is not the
clearly documented way to send a zero length TD (which is CBP=BE=0),
but it isn't specifically forbidden. Actual hw seems to be ok with it.

Does any OS rely on this behavior? There have been no reports to
qemu-devel of this problem.

This is attempting to have qemu behave like actual hardware,
but this is just a minor change.

With a tiny OS[1] that boots and executes a test, the issue can be seen:

* OS that sends USB requests to a USB mass storage device
  but sends td.cbp = td.be + 1
* qemu 4.2
* qemu HEAD (4e66a0854)
* Actual OHCI controller (hardware)

Command line:
qemu-system-x86_64 -m 20 \
 -device pci-ohci,id=ohci \
 -drive if=none,format=raw,id=d,file=testmbr.raw \
 -device usb-storage,bus=ohci.0,drive=d \
 --trace "usb_*" --trace "ohci_*" -D qemu.log

Results are:

 qemu 4.2   | qemu HEAD  | actual HW
 ---++---
 works fine | ohci_die() | works fine

Tip: if the flags "-serial pty -serial stdio" are added to the command line
the test will output USB requests like this:

Testing qemu HEAD:

> Free mem 2M ohci port2 conn FS
> setup { 80 6 0 1 0 0 8 0 }
> ED info=8 { mps=8 en=0 d=0 } tail=c20920
>   td0 c20880 nxt=c20960 f200 setup cbp=c20900 be=c20907
>   td1 c20960 nxt=c20980 f314in cbp=c20908 be=c2090f
>   td2 c20980 nxt=c20920 f308   out cbp=c20910 be=c2090f ohci20 host err
> usb stopped

And in qemu.log:

usb_ohci_iso_td_bad_cc_overrun ISO_TD start_offset=0x00c20910 > 
next_offset=0x00c2090f

Testing qemu 4.2:

> Free mem 2M ohci port2 conn FS
> setup { 80 6 0 1 0 0 8 0 }
> ED info=8 { mps=8 en=0 d=0 } tail=620920
>   td0 620880 nxt=620960 f200 setup cbp=620900 be=620907   cbp=0 
> be=620907
>   td1 620960 nxt=620980 f314in cbp=620908 be=62090f   cbp=0 
> be=62090f
>   td2 620980 nxt=620920 f308   out cbp=620910 be=62090f   cbp=0 
> be=62090f
>rx { 12 1 0 2 0 0 0 8 }
> setup { 0 5 1 0 0 0 0 0 } tx {}
> ED info=8 { mps=8 en=0 d=0 } tail=620880
>   td0 620920 nxt=620960 f200 setup cbp=620900 be=620907   cbp=0 
> be=620907
>   td1 620960 nxt=620880 f310in cbp=620908 be=620907   cbp=0 
> be=620907
> setup { 80 6 0 1 0 0 12 0 }
> ED info=80001 { mps=8 en=0 d=1 } tail=620960
>   td0 620880 nxt=6209c0 f200 setup cbp=620920 be=620927   cbp=0 
> be=620927
>   td1 6209c0 nxt=6209e0 f314in cbp=620928 be=620939   cbp=0 
> be=620939
>   td2 6209e0 nxt=620960 f308   out cbp=62093a be=620939   cbp=0 
> be=620939
>rx { 12 1 0 2 0 0 0 8 f4 46 1 0 0 0 1 2 3 1 }
> setup { 80 6 0 2 0 0 0 1 }
> ED info=80001 { mps=8 en=0 d=1 } tail=620880
>   td0 620960 nxt=6209a0 f200 setup cbp=620a20 be=620a27   cbp=0 
> be=620a27
>   td1 6209a0 nxt=6209c0 f3140004in cbp=620a28 be=620b27   cbp=620a48 
> be=620b27
>   td2 6209c0 nxt=620880 f308   out cbp=620b28 be=620b27   cbp=0 
> be=620b27
>rx { 9 2 20 0 1 1 4 c0 0 9 4 0 0 2 8 6 50 0 7 5 81 2 40 0 0 7 5 2 2 40 0 0 
> }
> setup { 0 9 1 0 0 0 0 0 } tx {}
> ED info=80001 { mps=8 en=0 d=1 } tail=620900
>   td0 620880 nxt=620940 f200 setup cbp=620a00 be=620a07   cbp=0 
> be=620a07
>   td1 620940 nxt=620900 f310in cbp=620a08 be=620a07   cbp=0 
> be=620a07

[1] The OS disk image has been emailed to phi...@linaro.org, m...@tls.msk.ru,
and kra...@redhat.com:

* testCbpOffBy1.img.xz
* sha256: f87baddcb86de845de12f002c698670a426affb40946025cc32694f9daa3abed

Signed-off-by: David Hubbard 
Reviewed-by: Alex Bennée 
Reviewed-by: Peter Maydell 
Signed-off-by: Peter Maydell 
---
 hw/usb/hcd-ohci.c   | 4 ++--
 hw/usb/trace-events | 1 +
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/hw/usb/hcd-ohci.c b/hw/usb/hcd-ohci.c
index acd60169802..71b54914d32 100644
--- a/hw/usb/hcd-ohci.c
+++ b/hw/usb/hcd-ohci.c
@@ -941,8 +941,8 @@ static int ohci_service_td(OHCIState *ohci, struct ohci_ed 
*ed)
 if ((td.cbp & 0xf000) != (td.be & 0xf000)) {
 len = (td.be & 0xfff) + 0x1001 - (td.cbp & 0xfff);
 } else {
-if (td.cbp > td.be) {
-trace_usb_ohci_iso_td_bad_cc_overrun(td.cbp, td.be);
+if (td.cbp - 1 > td.be) {  /* rely on td.cbp != 0 */
+trace_usb_ohci_td_bad_buf(td.cbp, td.be);
 ohci_die(ohci);
 return 1;

[PULL 06/18] scripts/coverity-scan/COMPONENTS.md: Fix 'char' component

The 'char' component:
 * includes the no-longer-present qemu-char.c, which has been
   long since split into the chardev/ backend code
 * also includes the hw/char devices

Split it into two components:
 * char is the hw/char devices
 * chardev is the chardev backends
with regexes matching our current sources.

Signed-off-by: Peter Maydell 
Reviewed-by: Philippe Mathieu-Daudé 
Message-id: 20240604145934.1230583-3-peter.mayd...@linaro.org
---
 scripts/coverity-scan/COMPONENTS.md | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/scripts/coverity-scan/COMPONENTS.md 
b/scripts/coverity-scan/COMPONENTS.md
index 98d4bcd6a50..fb081a59265 100644
--- a/scripts/coverity-scan/COMPONENTS.md
+++ b/scripts/coverity-scan/COMPONENTS.md
@@ -73,7 +73,10 @@ block
   ~ 
.*/qemu(/block.*|(/include?)/(block|storage-daemon)/.*|(/include)?/hw/(block|ide|nvme)/.*|/qemu-(img|io).*|/util/(aio|async|thread-pool).*)
 
 char
-  ~ .*/qemu(/qemu-char\.c|/include/sysemu/char\.h|(/include)?/hw/char/.*)
+  ~ .*/qemu((/include)?/hw/char/.*)
+
+chardev
+  ~ .*/qemu((/include)?/chardev/.*)
 
 crypto
   ~ 
.*/qemu((/include)?/crypto/.*|/hw/.*/.*crypto.*|(/include/sysemu|/backends)/cryptodev.*)
-- 
2.34.1

[PULL 04/18] hw/arm/xilinx_zynq: Fix IRQ/FIQ routing

From: Sebastian Huber 

Fix the system bus interrupt line to CPU core assignment.

Fixes: ddcf58e044ce0 ("hw/arm/xilinx_zynq: Support up to two CPU cores")
Signed-off-by: Sebastian Huber 
Reviewed-by: Philippe Mathieu-Daudé 
Message-id: 20240610052906.4432-1-sebastian.hu...@embedded-brains.de
Signed-off-by: Peter Maydell 
---
 hw/arm/xilinx_zynq.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/hw/arm/xilinx_zynq.c b/hw/arm/xilinx_zynq.c
index 7f7a3d23fbe..c79661bbc1b 100644
--- a/hw/arm/xilinx_zynq.c
+++ b/hw/arm/xilinx_zynq.c
@@ -252,10 +252,11 @@ static void zynq_init(MachineState *machine)
 zynq_binfo.gic_cpu_if_addr = MPCORE_PERIPHBASE + 0x100;
 sysbus_create_varargs("l2x0", MPCORE_PERIPHBASE + 0x2000, NULL);
 for (n = 0; n < smp_cpus; n++) {
+/* See "hw/intc/arm_gic.h" for the IRQ line association */
 DeviceState *cpudev = DEVICE(zynq_machine->cpu[n]);
-sysbus_connect_irq(busdev, (2 * n) + 0,
+sysbus_connect_irq(busdev, n,
qdev_get_gpio_in(cpudev, ARM_CPU_IRQ));
-sysbus_connect_irq(busdev, (2 * n) + 1,
+sysbus_connect_irq(busdev, smp_cpus + n,
qdev_get_gpio_in(cpudev, ARM_CPU_FIQ));
 }
 
-- 
2.34.1

[PULL 08/18] scripts/coverity-scan/COMPONENTS.md: Fix monitor component

Update the 'monitor' component:
 * qapi/ and monitor/ are now subdirectories
 * add job-qmp.c

Signed-off-by: Peter Maydell 
Reviewed-by: Philippe Mathieu-Daudé 
Message-id: 20240604145934.1230583-5-peter.mayd...@linaro.org
---
 scripts/coverity-scan/COMPONENTS.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/coverity-scan/COMPONENTS.md 
b/scripts/coverity-scan/COMPONENTS.md
index 205ab23b280..3864f8eda07 100644
--- a/scripts/coverity-scan/COMPONENTS.md
+++ b/scripts/coverity-scan/COMPONENTS.md
@@ -97,7 +97,7 @@ migration
   ~ .*/qemu((/include)?/migration/.*)
 
 monitor
-  ~ .*/qemu(/qapi.*|/qobject/.*|/monitor\..*|/[hq]mp\..*)
+  ~ .*/qemu((/include)?/(qapi|qobject|monitor)/.*|/job-qmp.c)
 
 nbd
   ~ .*/qemu(/nbd/.*|/include/block/nbd.*|/qemu-nbd\.c)
-- 
2.34.1

[PULL 00/18] target-arm queue

Hi; here's the latest target-arm pullreq; this is pretty much
just various bugfixes.

-- PMM

The following changes since commit 02d9c38236cf8c9826e5c5be61780ccb4ae0:

  Merge tag 'pull-tcg-20240619' of https://gitlab.com/rth7680/qemu into staging 
(2024-06-19 14:00:39 -0700)

are available in the Git repository at:

  https://git.linaro.org/people/pmaydell/qemu-arm.git 
tags/pull-target-arm-20240622

for you to fetch changes up to 3b36cead6ecc0e40edb8b2f3e253baa01ebc1e9a:

  hw/arm/sbsa-ref: Enable CPU cluster on ARM sbsa machine (2024-06-21 16:24:46 
+0100)


target-arm queue:
 * hw/net/can/xlnx-versal-canfd: Fix sorting of the tx queue
 * hw/arm/xilinx_zynq: Fix IRQ/FIQ routing
 * hw/intc/arm_gic: Fix deactivation of SPI lines
 * hw/timer/a9gtimer: Handle QTest mode in a9_gtimer_get_current_cpu
 * hw/misc: Set valid access size for Exynos4210 RNG
 * hw/arm/sbsa-ref: switch to 1GHz timer frequency
 * hw/arm/sbsa-ref: Enable CPU cluster on ARM sbsa machine
 * hw/arm/virt: allow creation of a second NonSecure UART
 * hw/arm/virt: Avoid unexpected warning from Linux guest on host with Fujitsu 
CPUs
 * scripts/coverity-scan/COMPONENTS.md: update component regexes
 * hw/usb/hcd-dwc2: Handle invalid address access in read and write functions
 * hw/usb/hcd-ohci: Fix ohci_service_td: accept zero-length TDs where CBP=BE+1


David Hubbard (1):
  hw/usb/hcd-ohci: Fix ohci_service_td: accept zero-length TDs where 
CBP=BE+1

Edgar E. Iglesias (1):
  hw/intc/arm_gic: Fix deactivation of SPI lines

Marcin Juszkiewicz (1):
  hw/arm/sbsa-ref: switch to 1GHz timer frequency

Peter Maydell (8):
  scripts/coverity-scan/COMPONENTS.md: Update paths to match gitlab CI
  scripts/coverity-scan/COMPONENTS.md: Fix 'char' component
  scripts/coverity-scan/COMPONENTS.md: Add crypto headers in host/include 
to the crypto component
  scripts/coverity-scan/COMPONENTS.md: Fix monitor component
  scripts/coverity-scan/COMPONENTS.md: Include libqmp in testlibs
  hw/arm/virt: Add serial aliases in DTB
  hw/arm/virt: Rename VIRT_UART and VIRT_SECURE_UART to VIRT_UART[01]
  hw/arm/virt: allow creation of a second NonSecure UART

Sebastian Huber (1):
  hw/arm/xilinx_zynq: Fix IRQ/FIQ routing

Shiva sagar Myana (1):
  hw/net/can/xlnx-versal-canfd: Fix sorting of the tx queue

Xiong Yining (1):
  hw/arm/sbsa-ref: Enable CPU cluster on ARM sbsa machine

Zhenyu Zhang (1):
  hw/arm/virt: Avoid unexpected warning from Linux guest on host with 
Fujitsu CPUs

Zheyu Ma (3):
  hw/timer/a9gtimer: Handle QTest mode in a9_gtimer_get_current_cpu
  hw/usb/hcd-dwc2: Handle invalid address access in read and write functions
  hw/misc: Set valid access size for Exynos4210 RNG

 docs/system/arm/sbsa.rst|   4 ++
 docs/system/arm/virt.rst|   6 +-
 hw/intc/gic_internal.h  |   8 ++-
 include/hw/arm/virt.h   |   5 +-
 hw/arm/sbsa-ref.c   |  23 +---
 hw/arm/virt-acpi-build.c|  22 +---
 hw/arm/virt.c   |  63 ++---
 hw/arm/xilinx_zynq.c|   5 +-
 hw/misc/exynos4210_rng.c|   2 +
 hw/net/can/xlnx-versal-canfd.c  |   5 +-
 hw/timer/a9gtimer.c |   5 ++
 hw/usb/hcd-dwc2.c   |   9 ++-
 hw/usb/hcd-ohci.c   |   4 +-
 hw/usb/trace-events |   1 +
 scripts/coverity-scan/COMPONENTS.md | 107 ++--
 15 files changed, 179 insertions(+), 90 deletions(-)

[PULL 13/18] hw/arm/virt: Rename VIRT_UART and VIRT_SECURE_UART to VIRT_UART[01]

We're going to make the second UART not always a secure-only device.
Rename the constants VIRT_UART and VIRT_SECURE_UART to VIRT_UART0
and VIRT_UART1 accordingly.

Signed-off-by: Peter Maydell 
Reviewed-by: Philippe Mathieu-Daudé 
Message-id: 20240610162343.2131524-3-peter.mayd...@linaro.org
---
 include/hw/arm/virt.h|  4 ++--
 hw/arm/virt-acpi-build.c | 12 ++--
 hw/arm/virt.c| 14 +++---
 3 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index bb486d36b14..1227e7f7f08 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -59,7 +59,7 @@ enum {
 VIRT_GIC_ITS,
 VIRT_GIC_REDIST,
 VIRT_SMMU,
-VIRT_UART,
+VIRT_UART0,
 VIRT_MMIO,
 VIRT_RTC,
 VIRT_FW_CFG,
@@ -69,7 +69,7 @@ enum {
 VIRT_PCIE_ECAM,
 VIRT_PLATFORM_BUS,
 VIRT_GPIO,
-VIRT_SECURE_UART,
+VIRT_UART1,
 VIRT_SECURE_MEM,
 VIRT_SECURE_GPIO,
 VIRT_PCDIMM_ACPI,
diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index c3ccfef026f..eb5796e309b 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -440,10 +440,10 @@ spcr_setup(GArray *table_data, BIOSLinker *linker, 
VirtMachineState *vms)
 .base_addr.width = 32,
 .base_addr.offset = 0,
 .base_addr.size = 3,
-.base_addr.addr = vms->memmap[VIRT_UART].base,
+.base_addr.addr = vms->memmap[VIRT_UART0].base,
 .interrupt_type = (1 << 3),/* Bit[3] ARMH GIC interrupt*/
 .pc_interrupt = 0, /* IRQ */
-.interrupt = (vms->irqmap[VIRT_UART] + ARM_SPI_BASE),
+.interrupt = (vms->irqmap[VIRT_UART0] + ARM_SPI_BASE),
 .baud_rate = 3,/* 9600 */
 .parity = 0,   /* No Parity */
 .stop_bits = 1,/* 1 Stop bit */
@@ -631,11 +631,11 @@ build_dbg2(GArray *table_data, BIOSLinker *linker, 
VirtMachineState *vms)
 
 /* BaseAddressRegister[] */
 build_append_gas(table_data, AML_AS_SYSTEM_MEMORY, 32, 0, 3,
- vms->memmap[VIRT_UART].base);
+ vms->memmap[VIRT_UART0].base);
 
 /* AddressSize[] */
 build_append_int_noprefix(table_data,
-  vms->memmap[VIRT_UART].size, 4);
+  vms->memmap[VIRT_UART0].size, 4);
 
 /* NamespaceString[] */
 g_array_append_vals(table_data, name, namespace_length);
@@ -816,8 +816,8 @@ build_dsdt(GArray *table_data, BIOSLinker *linker, 
VirtMachineState *vms)
  */
 scope = aml_scope("\\_SB");
 acpi_dsdt_add_cpus(scope, vms);
-acpi_dsdt_add_uart(scope, [VIRT_UART],
-   (irqmap[VIRT_UART] + ARM_SPI_BASE));
+acpi_dsdt_add_uart(scope, [VIRT_UART0],
+   (irqmap[VIRT_UART0] + ARM_SPI_BASE));
 if (vmc->acpi_expose_flash) {
 acpi_dsdt_add_flash(scope, [VIRT_FLASH]);
 }
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 61a9d47c026..ffb4983885f 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -165,11 +165,11 @@ static const MemMapEntry base_memmap[] = {
 [VIRT_GIC_ITS] ={ 0x0808, 0x0002 },
 /* This redistributor space allows up to 2*64kB*123 CPUs */
 [VIRT_GIC_REDIST] = { 0x080A, 0x00F6 },
-[VIRT_UART] =   { 0x0900, 0x1000 },
+[VIRT_UART0] =  { 0x0900, 0x1000 },
 [VIRT_RTC] ={ 0x0901, 0x1000 },
 [VIRT_FW_CFG] = { 0x0902, 0x0018 },
 [VIRT_GPIO] =   { 0x0903, 0x1000 },
-[VIRT_SECURE_UART] ={ 0x0904, 0x1000 },
+[VIRT_UART1] =  { 0x0904, 0x1000 },
 [VIRT_SMMU] =   { 0x0905, 0x0002 },
 [VIRT_PCDIMM_ACPI] ={ 0x0907, MEMORY_HOTPLUG_IO_LEN },
 [VIRT_ACPI_GED] =   { 0x0908, ACPI_GED_EVT_SEL_LEN },
@@ -212,11 +212,11 @@ static MemMapEntry extended_memmap[] = {
 };
 
 static const int a15irqmap[] = {
-[VIRT_UART] = 1,
+[VIRT_UART0] = 1,
 [VIRT_RTC] = 2,
 [VIRT_PCIE] = 3, /* ... to 6 */
 [VIRT_GPIO] = 7,
-[VIRT_SECURE_UART] = 8,
+[VIRT_UART1] = 8,
 [VIRT_ACPI_GED] = 9,
 [VIRT_MMIO] = 16, /* ...to 16 + NUM_VIRTIO_TRANSPORTS - 1 */
 [VIRT_GIC_V2M] = 48, /* ...to 48 + NUM_GICV2M_SPIS - 1 */
@@ -939,7 +939,7 @@ static void create_uart(const VirtMachineState *vms, int 
uart,
 qemu_fdt_setprop(ms->fdt, nodename, "clock-names",
  clocknames, sizeof(clocknames));
 
-if (uart == VIRT_UART) {
+if (uart == VIRT_UART0) {
 qemu_fdt_setprop_string(ms->fdt, "/chosen", "stdout-path", nodename);
 qemu_fdt_setprop_string(ms->fdt, "/aliases", "serial0", nodename);
 } else {
@@ -2317,11 +2317,11 @@ static void machvirt_init(MachineState *machine)
 
 fdt_add_pmu_nodes(vms);
 
-create_uart(vms, VIRT_UART, sysmem, serial_hd(0));
+create_uart(vms, VIRT_UART0, sysmem,

[PULL 07/18] scripts/coverity-scan/COMPONENTS.md: Add crypto headers in host/include to the crypto component

host/include/*/host/crypto/ are relatively new headers; add them
to the crypto component.

Signed-off-by: Peter Maydell 
Reviewed-by: Philippe Mathieu-Daudé 
Message-id: 20240604145934.1230583-4-peter.mayd...@linaro.org
---
 scripts/coverity-scan/COMPONENTS.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/coverity-scan/COMPONENTS.md 
b/scripts/coverity-scan/COMPONENTS.md
index fb081a59265..205ab23b280 100644
--- a/scripts/coverity-scan/COMPONENTS.md
+++ b/scripts/coverity-scan/COMPONENTS.md
@@ -79,7 +79,7 @@ chardev
   ~ .*/qemu((/include)?/chardev/.*)
 
 crypto
-  ~ 
.*/qemu((/include)?/crypto/.*|/hw/.*/.*crypto.*|(/include/sysemu|/backends)/cryptodev.*)
+  ~ 
.*/qemu((/include)?/crypto/.*|/hw/.*/.*crypto.*|(/include/sysemu|/backends)/cryptodev.*|/host/include/.*/host/crypto/.*)
 
 disas
   ~ .*/qemu((/include)?/disas.*)
-- 
2.34.1

[PULL 03/18] hw/intc/arm_gic: Fix deactivation of SPI lines

From: "Edgar E. Iglesias" 

Julien reported that he has seen strange behaviour when running
Xen on QEMU using GICv2. When Xen migrates a guest's vCPU from
one pCPU to another while the vCPU is handling an interrupt, the
guest is unable to properly deactivate interrupts.

Looking at it a little closer, our GICv2 model treats
deactivation of SPI lines as if they were PPI's, i.e banked per
CPU core. The state for active interrupts should only be banked
for PPI lines, not for SPI lines.

Make deactivation of SPI lines unbanked, similar to how we
handle writes to GICD_ICACTIVER.

Reported-by: Julien Grall 
Signed-off-by: Edgar E. Iglesias 
Message-id: 20240605143044.2029444-2-edgar.igles...@gmail.com
Reviewed-by: Peter Maydell 
Signed-off-by: Peter Maydell 
---
 hw/intc/gic_internal.h | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/hw/intc/gic_internal.h b/hw/intc/gic_internal.h
index 8d29b40ca10..8ddbf554c69 100644
--- a/hw/intc/gic_internal.h
+++ b/hw/intc/gic_internal.h
@@ -280,6 +280,8 @@ static inline void gic_set_active(GICState *s, int irq, int 
cpu)
 
 static inline void gic_clear_active(GICState *s, int irq, int cpu)
 {
+unsigned int cm;
+
 if (gic_is_vcpu(cpu)) {
 uint32_t *entry = gic_get_lr_entry(s, irq, cpu);
 GICH_LR_CLEAR_ACTIVE(*entry);
@@ -301,11 +303,13 @@ static inline void gic_clear_active(GICState *s, int irq, 
int cpu)
  * the GIC is secure.
  */
 if (!s->security_extn || GIC_DIST_TEST_GROUP(phys_irq, 1 << rcpu)) 
{
-GIC_DIST_CLEAR_ACTIVE(phys_irq, 1 << rcpu);
+cm = phys_irq < GIC_INTERNAL ? 1 << rcpu : ALL_CPU_MASK;
+GIC_DIST_CLEAR_ACTIVE(phys_irq, cm);
 }
 }
 } else {
-GIC_DIST_CLEAR_ACTIVE(irq, 1 << cpu);
+cm = irq < GIC_INTERNAL ? 1 << cpu : ALL_CPU_MASK;
+GIC_DIST_CLEAR_ACTIVE(irq, cm);
 }
 }
 
-- 
2.34.1

[PULL 14/18] hw/arm/virt: allow creation of a second NonSecure UART

For some use-cases, it is helpful to have more than one UART
available to the guest.  If the second UART slot is not already used
for a TrustZone Secure-World-only UART, create it as a NonSecure UART
only when the user provides a serial backend (e.g.  via a second
-serial command line option).

This avoids problems where existing guest software only expects a
single UART, and gets confused by the second UART in the DTB.  The
major example of this is older EDK2 firmware, which will send the
GRUB bootloader output to UART1 and the guest serial output to UART0.
Users who want to use both UARTs with a guest setup including EDK2
are advised to update to EDK2 release edk2-stable202311 or newer.
(The prebuilt EDK2 blobs QEMU upstream provides are new enough.)
The relevant EDK2 changes are the ones described here:
https://bugzilla.tianocore.org/show_bug.cgi?id=4577

Inspired-by: Axel Heider 
Signed-off-by: Peter Maydell 
Tested-by: Laszlo Ersek 
Reviewed-by: Philippe Mathieu-Daudé 
Message-id: 20240610162343.2131524-4-peter.mayd...@linaro.org
---
 docs/system/arm/virt.rst |  6 +-
 include/hw/arm/virt.h|  1 +
 hw/arm/virt-acpi-build.c | 12 
 hw/arm/virt.c| 38 +++---
 4 files changed, 49 insertions(+), 8 deletions(-)

diff --git a/docs/system/arm/virt.rst b/docs/system/arm/virt.rst
index 26fcba00b76..e67e7f0f7c5 100644
--- a/docs/system/arm/virt.rst
+++ b/docs/system/arm/virt.rst
@@ -26,7 +26,7 @@ The virt board supports:
 
 - PCI/PCIe devices
 - Flash memory
-- One PL011 UART
+- Either one or two PL011 UARTs for the NonSecure World
 - An RTC
 - The fw_cfg device that allows a guest to obtain data from QEMU
 - A PL061 GPIO controller
@@ -48,6 +48,10 @@ The virt board supports:
   - A secure flash memory
   - 16MB of secure RAM
 
+The second NonSecure UART only exists if a backend is configured
+explicitly (e.g. with a second -serial command line option) and
+TrustZone emulation is not enabled.
+
 Supported guest CPU types:
 
 - ``cortex-a7`` (32-bit)
diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index 1227e7f7f08..ab961bb6a9b 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -151,6 +151,7 @@ struct VirtMachineState {
 bool ras;
 bool mte;
 bool dtb_randomness;
+bool second_ns_uart_present;
 OnOffAuto acpi;
 VirtGICType gic_version;
 VirtIOMMUType iommu;
diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index eb5796e309b..b2366f24f96 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -79,11 +79,11 @@ static void acpi_dsdt_add_cpus(Aml *scope, VirtMachineState 
*vms)
 }
 
 static void acpi_dsdt_add_uart(Aml *scope, const MemMapEntry *uart_memmap,
-   uint32_t uart_irq)
+   uint32_t uart_irq, int uartidx)
 {
-Aml *dev = aml_device("COM0");
+Aml *dev = aml_device("COM%d", uartidx);
 aml_append(dev, aml_name_decl("_HID", aml_string("ARMH0011")));
-aml_append(dev, aml_name_decl("_UID", aml_int(0)));
+aml_append(dev, aml_name_decl("_UID", aml_int(uartidx)));
 
 Aml *crs = aml_resource_template();
 aml_append(crs, aml_memory32_fixed(uart_memmap->base,
@@ -817,7 +817,11 @@ build_dsdt(GArray *table_data, BIOSLinker *linker, 
VirtMachineState *vms)
 scope = aml_scope("\\_SB");
 acpi_dsdt_add_cpus(scope, vms);
 acpi_dsdt_add_uart(scope, [VIRT_UART0],
-   (irqmap[VIRT_UART0] + ARM_SPI_BASE));
+   (irqmap[VIRT_UART0] + ARM_SPI_BASE), 0);
+if (vms->second_ns_uart_present) {
+acpi_dsdt_add_uart(scope, [VIRT_UART1],
+   (irqmap[VIRT_UART1] + ARM_SPI_BASE), 1);
+}
 if (vmc->acpi_expose_flash) {
 acpi_dsdt_add_flash(scope, [VIRT_FLASH]);
 }
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index ffb4983885f..85556152563 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -906,7 +906,7 @@ static void create_gic(VirtMachineState *vms, MemoryRegion 
*mem)
 }
 
 static void create_uart(const VirtMachineState *vms, int uart,
-MemoryRegion *mem, Chardev *chr)
+MemoryRegion *mem, Chardev *chr, bool secure)
 {
 char *nodename;
 hwaddr base = vms->memmap[uart].base;
@@ -944,6 +944,8 @@ static void create_uart(const VirtMachineState *vms, int 
uart,
 qemu_fdt_setprop_string(ms->fdt, "/aliases", "serial0", nodename);
 } else {
 qemu_fdt_setprop_string(ms->fdt, "/aliases", "serial1", nodename);
+}
+if (secure) {
 /* Mark as not usable by the normal world */
 qemu_fdt_setprop_string(ms->fdt, nodename, "status", "disabled");
 qemu_fdt_setprop_string(ms->fdt, nodename, "secure-status", "okay");
@@ -2317,11 +2319,41 @@ static void machvirt_init(MachineState *machine)
 
 fdt_add_pmu_nodes(vms);
 
-create_uart(vms, VIRT_UART0, sysmem, serial_hd(0));
+/*
+ * The first UART always

[PULL 18/18] hw/arm/sbsa-ref: Enable CPU cluster on ARM sbsa machine

From: Xiong Yining 

Enable CPU cluster support on SbsaQemu platform, so that users can
specify a 4-level CPU hierarchy sockets/clusters/cores/threads. And
this topology can be passed to the firmware through /cpus/topology
Device Tree.

Signed-off-by: Xiong Yining 
Reviewed-by: Marcin Juszkiewicz 
Reviewed-by: Leif Lindholm 
Message-id: 20240607103825.1295328-2-xiongyining1...@phytium.com.cn
Tested-by: Marcin Juszkiewicz 
Signed-off-by: Peter Maydell 
---
 docs/system/arm/sbsa.rst |  4 
 hw/arm/sbsa-ref.c| 11 ++-
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/docs/system/arm/sbsa.rst b/docs/system/arm/sbsa.rst
index 2bf22a1d0b0..2bf3fc8d59d 100644
--- a/docs/system/arm/sbsa.rst
+++ b/docs/system/arm/sbsa.rst
@@ -62,6 +62,7 @@ The devicetree reports:
- platform version
- GIC addresses
- NUMA node id for CPUs and memory
+   - CPU topology information
 
 Platform version
 
@@ -88,3 +89,6 @@ Platform version changes:
 
 0.3
   The USB controller is an XHCI device, not EHCI.
+
+0.4
+  CPU topology information is present in devicetree.
diff --git a/hw/arm/sbsa-ref.c b/hw/arm/sbsa-ref.c
index 87884400e30..ae37a923015 100644
--- a/hw/arm/sbsa-ref.c
+++ b/hw/arm/sbsa-ref.c
@@ -219,7 +219,7 @@ static void create_fdt(SBSAMachineState *sms)
  *fw compatibility.
  */
 qemu_fdt_setprop_cell(fdt, "/", "machine-version-major", 0);
-qemu_fdt_setprop_cell(fdt, "/", "machine-version-minor", 3);
+qemu_fdt_setprop_cell(fdt, "/", "machine-version-minor", 4);
 
 if (ms->numa_state->have_numa_distance) {
 int size = nb_numa_nodes * nb_numa_nodes * 3 * sizeof(uint32_t);
@@ -276,6 +276,14 @@ static void create_fdt(SBSAMachineState *sms)
 g_free(nodename);
 }
 
+/* Add CPU topology description through fdt node topology. */
+qemu_fdt_add_subnode(sms->fdt, "/cpus/topology");
+
+qemu_fdt_setprop_cell(sms->fdt, "/cpus/topology", "sockets", 
ms->smp.sockets);
+qemu_fdt_setprop_cell(sms->fdt, "/cpus/topology", "clusters", 
ms->smp.clusters);
+qemu_fdt_setprop_cell(sms->fdt, "/cpus/topology", "cores", ms->smp.cores);
+qemu_fdt_setprop_cell(sms->fdt, "/cpus/topology", "threads", 
ms->smp.threads);
+
 sbsa_fdt_add_gic_node(sms);
 }
 
@@ -898,6 +906,7 @@ static void sbsa_ref_class_init(ObjectClass *oc, void *data)
 mc->default_ram_size = 1 * GiB;
 mc->default_ram_id = "sbsa-ref.ram";
 mc->default_cpus = 4;
+mc->smp_props.clusters_supported = true;
 mc->possible_cpu_arch_ids = sbsa_ref_possible_cpu_arch_ids;
 mc->cpu_index_to_instance_props = sbsa_ref_cpu_index_to_props;
 mc->get_default_cpu_node_id = sbsa_ref_get_default_cpu_node_id;
-- 
2.34.1

[PULL 12/18] hw/arm/virt: Add serial aliases in DTB

If there is more than one UART in the DTB, then there is no guarantee
on which order a guest is supposed to initialise them.  The standard
solution to this is "serialN" entries in the "/aliases" node of the
dtb which give the nodename of the UARTs.

At the moment we only have two UARTs in the DTB when one is for
the Secure world and one for the Non-Secure world, so this isn't
really a problem. However if we want to add a second NS UART we'll
need the aliases to ensure guests pick the right one.

Signed-off-by: Peter Maydell 
Reviewed-by: Philippe Mathieu-Daudé 
Message-id: 20240610162343.2131524-2-peter.mayd...@linaro.org
---
 hw/arm/virt.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index c7a1f754e72..61a9d47c026 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -284,6 +284,8 @@ static void create_fdt(VirtMachineState *vms)
 }
 }
 
+qemu_fdt_add_subnode(fdt, "/aliases");
+
 /* Clock node, for the benefit of the UART. The kernel device tree
  * binding documentation claims the PL011 node clock properties are
  * optional but in practice if you omit them the kernel refuses to
@@ -939,7 +941,9 @@ static void create_uart(const VirtMachineState *vms, int 
uart,
 
 if (uart == VIRT_UART) {
 qemu_fdt_setprop_string(ms->fdt, "/chosen", "stdout-path", nodename);
+qemu_fdt_setprop_string(ms->fdt, "/aliases", "serial0", nodename);
 } else {
+qemu_fdt_setprop_string(ms->fdt, "/aliases", "serial1", nodename);
 /* Mark as not usable by the normal world */
 qemu_fdt_setprop_string(ms->fdt, nodename, "status", "disabled");
 qemu_fdt_setprop_string(ms->fdt, nodename, "secure-status", "okay");
-- 
2.34.1

[PULL 02/18] hw/arm/sbsa-ref: switch to 1GHz timer frequency

From: Marcin Juszkiewicz 

Updated firmware for QEMU CI is already in merge queue so we can move
platform to be future proof.

All supported cpus work fine with 1GHz timer frequency when firmware is
fresh enough.

Signed-off-by: Marcin Juszkiewicz 
Reviewed-by: Leif Lindholm 
Message-id: 20240531093729.220758-2-marcin.juszkiew...@linaro.org
Signed-off-by: Peter Maydell 
---
 hw/arm/sbsa-ref.c | 12 
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/hw/arm/sbsa-ref.c b/hw/arm/sbsa-ref.c
index e884692f07f..87884400e30 100644
--- a/hw/arm/sbsa-ref.c
+++ b/hw/arm/sbsa-ref.c
@@ -62,16 +62,12 @@
 
 /*
  * Generic timer frequency in Hz (which drives both the CPU generic timers
- * and the SBSA watchdog-timer). Older versions of the TF-A firmware
- * typically used with sbsa-ref (including the binaries in our Avocado test
- * Aarch64SbsarefMachine.test_sbsaref_alpine_linux_max_pauth_impdef
- * assume it is this value.
+ * and the SBSA watchdog-timer). Older (<2.11) versions of the TF-A firmware
+ * assumed 62.5MHz here.
  *
- * TODO: this value is not architecturally correct for an Armv8.6 or
- * better CPU, so we should move to 1GHz once the TF-A fix above has
- * made it into a release and into our Avocado test.
+ * Starting with Armv8.6 CPU 1GHz timer frequency is mandated.
  */
-#define SBSA_GTIMER_HZ 6250
+#define SBSA_GTIMER_HZ 10
 
 enum {
 SBSA_FLASH,
-- 
2.34.1

[PULL 16/18] hw/misc: Set valid access size for Exynos4210 RNG

From: Zheyu Ma 

The Exynos4210 RNG module requires 32-bit (4-byte) accesses to its registers.
According to the User Manual Section 25.3[1], the registers for RNG operations
are 32-bit. This change ensures that the memory region operations for the
RNG module enforce the correct access sizes, preventing invalid memory accesses.

[1] 
http://www.mediafire.com/view/8ly2fqls3c9c31c/Exynos_4412_SCP_Users_Manual_Ver.0.10.00_Preliminary0.pdf

Reproducer:
cat << EOF | qemu-system-aarch64 -display none \
-machine accel=qtest, -m 512M -machine smdkc210 -qtest stdio
readb 0x10830454
EOF

Suggested-by: Philippe Mathieu-Daudé 
Signed-off-by: Zheyu Ma 
Message-id: 20240618163701.3204975-1-zheyum...@gmail.com
Reviewed-by: Peter Maydell 
Signed-off-by: Peter Maydell 
---
 hw/misc/exynos4210_rng.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/misc/exynos4210_rng.c b/hw/misc/exynos4210_rng.c
index 0756bd32059..674d8eece5f 100644
--- a/hw/misc/exynos4210_rng.c
+++ b/hw/misc/exynos4210_rng.c
@@ -217,6 +217,8 @@ static const MemoryRegionOps exynos4210_rng_ops = {
 .read = exynos4210_rng_read,
 .write = exynos4210_rng_write,
 .endianness = DEVICE_NATIVE_ENDIAN,
+.valid.min_access_size = 4,
+.valid.max_access_size = 4,
 };
 
 static void exynos4210_rng_reset(DeviceState *dev)
-- 
2.34.1

[PULL 05/18] scripts/coverity-scan/COMPONENTS.md: Update paths to match gitlab CI

Since commit 83aa1baa069c we have been running the build for Coverity
Scan as a Gitlab CI job, rather than the old setup where it was run
on a local developer's machine.  This is working well, but the
absolute paths of files are different for the Gitlab CI job, which
means that the regexes we use to identify Coverity components no
longer work. With Gitlab CI builds the file paths are of the form
 /builds/qemu-project/qemu/accel/kvm/kvm-all.c

rather than the old
 /qemu/accel/kvm/kvm-all.c

and our regexes all don't match.

Update all the regexes to start with .*/qemu/ . This will hopefully
avoid the need to change them again in future if the build path
changes again.

This change was made with a search-and-replace of (/qemu)?
to .*/qemu .

Signed-off-by: Peter Maydell 
Reviewed-by: Philippe Mathieu-Daudé 
Message-id: 20240604145934.1230583-2-peter.mayd...@linaro.org
---
 scripts/coverity-scan/COMPONENTS.md | 104 ++--
 1 file changed, 52 insertions(+), 52 deletions(-)

diff --git a/scripts/coverity-scan/COMPONENTS.md 
b/scripts/coverity-scan/COMPONENTS.md
index 1537e49cd5a..98d4bcd6a50 100644
--- a/scripts/coverity-scan/COMPONENTS.md
+++ b/scripts/coverity-scan/COMPONENTS.md
@@ -1,157 +1,157 @@
 This is the list of currently configured Coverity components:
 
 alpha
-  ~ (/qemu)?((/include)?/hw/alpha/.*|/target/alpha/.*)
+  ~ .*/qemu((/include)?/hw/alpha/.*|/target/alpha/.*)
 
 arm
-  ~ 
(/qemu)?((/include)?/hw/arm/.*|(/include)?/hw/.*/(arm|allwinner-a10|bcm28|digic|exynos|imx|omap|stellaris|pxa2xx|versatile|zynq|cadence).*|/hw/net/xgmac.c|/hw/ssi/xilinx_spips.c|/target/arm/.*)
+  ~ 
.*/qemu((/include)?/hw/arm/.*|(/include)?/hw/.*/(arm|allwinner-a10|bcm28|digic|exynos|imx|omap|stellaris|pxa2xx|versatile|zynq|cadence).*|/hw/net/xgmac.c|/hw/ssi/xilinx_spips.c|/target/arm/.*)
 
 avr
-  ~ (/qemu)?((/include)?/hw/avr/.*|/target/avr/.*)
+  ~ .*/qemu((/include)?/hw/avr/.*|/target/avr/.*)
 
 cris
-  ~ (/qemu)?((/include)?/hw/cris/.*|/target/cris/.*)
+  ~ .*/qemu((/include)?/hw/cris/.*|/target/cris/.*)
 
 hexagon-gen (component should be ignored in analysis)
-  ~ (/qemu)?(/target/hexagon/.*generated.*)
+  ~ .*/qemu(/target/hexagon/.*generated.*)
 
 hexagon
-  ~ (/qemu)?(/target/hexagon/.*)
+  ~ .*/qemu(/target/hexagon/.*)
 
 hppa
-  ~ (/qemu)?((/include)?/hw/hppa/.*|/target/hppa/.*)
+  ~ .*/qemu((/include)?/hw/hppa/.*|/target/hppa/.*)
 
 i386
-  ~ (/qemu)?((/include)?/hw/i386/.*|/target/i386/.*|/hw/intc/[^/]*apic[^/]*\.c)
+  ~ .*/qemu((/include)?/hw/i386/.*|/target/i386/.*|/hw/intc/[^/]*apic[^/]*\.c)
 
 loongarch
-  ~ (/qemu)?((/include)?/hw/(loongarch/.*|.*/loongarch.*)|/target/loongarch/.*)
+  ~ .*/qemu((/include)?/hw/(loongarch/.*|.*/loongarch.*)|/target/loongarch/.*)
 
 m68k
-  ~ 
(/qemu)?((/include)?/hw/m68k/.*|/target/m68k/.*|(/include)?/hw(/.*)?/mcf.*|(/include)?/hw/nubus/.*)
+  ~ 
.*/qemu((/include)?/hw/m68k/.*|/target/m68k/.*|(/include)?/hw(/.*)?/mcf.*|(/include)?/hw/nubus/.*)
 
 microblaze
-  ~ (/qemu)?((/include)?/hw/microblaze/.*|/target/microblaze/.*)
+  ~ .*/qemu((/include)?/hw/microblaze/.*|/target/microblaze/.*)
 
 mips
-  ~ (/qemu)?((/include)?/hw/mips/.*|/target/mips/.*)
+  ~ .*/qemu((/include)?/hw/mips/.*|/target/mips/.*)
 
 openrisc
-  ~ (/qemu)?((/include)?/hw/openrisc/.*|/target/openrisc/.*)
+  ~ .*/qemu((/include)?/hw/openrisc/.*|/target/openrisc/.*)
 
 ppc
-  ~ 
(/qemu)?((/include)?/hw/ppc/.*|/target/ppc/.*|/hw/pci-host/(uninorth.*|dec.*|prep.*|ppc.*)|/hw/misc/macio/.*|(/include)?/hw/.*/(xics|openpic|spapr).*)
+  ~ 
.*/qemu((/include)?/hw/ppc/.*|/target/ppc/.*|/hw/pci-host/(uninorth.*|dec.*|prep.*|ppc.*)|/hw/misc/macio/.*|(/include)?/hw/.*/(xics|openpic|spapr).*)
 
 riscv
-  ~ 
(/qemu)?((/include)?/hw/riscv/.*|/target/riscv/.*|/hw/.*/(riscv_|ibex_|sifive_).*)
+  ~ 
.*/qemu((/include)?/hw/riscv/.*|/target/riscv/.*|/hw/.*/(riscv_|ibex_|sifive_).*)
 
 rx
-  ~ (/qemu)?((/include)?/hw/rx/.*|/target/rx/.*)
+  ~ .*/qemu((/include)?/hw/rx/.*|/target/rx/.*)
 
 s390
-  ~ (/qemu)?((/include)?/hw/s390x/.*|/target/s390x/.*|/hw/.*/s390_.*)
+  ~ .*/qemu((/include)?/hw/s390x/.*|/target/s390x/.*|/hw/.*/s390_.*)
 
 sh4
-  ~ (/qemu)?((/include)?/hw/sh4/.*|/target/sh4/.*)
+  ~ .*/qemu((/include)?/hw/sh4/.*|/target/sh4/.*)
 
 sparc
-  ~ 
(/qemu)?((/include)?/hw/sparc(64)?.*|/target/sparc/.*|/hw/.*/grlib.*|/hw/display/cg3.c)
+  ~ 
.*/qemu((/include)?/hw/sparc(64)?.*|/target/sparc/.*|/hw/.*/grlib.*|/hw/display/cg3.c)
 
 tricore
-  ~ (/qemu)?((/include)?/hw/tricore/.*|/target/tricore/.*)
+  ~ .*/qemu((/include)?/hw/tricore/.*|/target/tricore/.*)
 
 xtensa
-  ~ (/qemu)?((/include)?/hw/xtensa/.*|/target/xtensa/.*)
+  ~ .*/qemu((/include)?/hw/xtensa/.*|/target/xtensa/.*)
 
 9pfs
-  ~ (/qemu)?(/hw/9pfs/.*|/fsdev/.*)
+  ~ .*/qemu(/hw/9pfs/.*|/fsdev/.*)
 
 audio
-  ~ (/qemu)?((/include)?/(audio|hw/audio)/.*)
+  ~ .*/qemu((/include)?/(audio|hw/audio)/.*)
 
 block
-  ~

[PULL 11/18] hw/usb/hcd-dwc2: Handle invalid address access in read and write functions

From: Zheyu Ma 

This commit modifies the dwc2_hsotg_read() and dwc2_hsotg_write() functions
to handle invalid address access gracefully. Instead of using
g_assert_not_reached(), which causes the program to abort, the functions
now log an error message and return a default value for reads or do
nothing for writes.

This change prevents the program from aborting and provides clear log
messages indicating when an invalid memory address is accessed.

Reproducer:
cat << EOF | qemu-system-aarch64 -display none \
-machine accel=qtest, -m 512M -machine raspi2b -m 1G -nodefaults \
-usb -drive file=null-co://,if=none,format=raw,id=disk0 -device \
usb-storage,port=1,drive=disk0 -qtest stdio
readl 0x3f980dfb
EOF

Signed-off-by: Zheyu Ma 
Reviewed-by: Paul Zimmerman 
Message-id: 20240618135610.3109175-1-zheyum...@gmail.com
Signed-off-by: Peter Maydell 
---
 hw/usb/hcd-dwc2.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/hw/usb/hcd-dwc2.c b/hw/usb/hcd-dwc2.c
index 8cac9c0a062..b4f0652c7d2 100644
--- a/hw/usb/hcd-dwc2.c
+++ b/hw/usb/hcd-dwc2.c
@@ -1128,7 +1128,10 @@ static uint64_t dwc2_hsotg_read(void *ptr, hwaddr addr, 
unsigned size)
 val = dwc2_pcgreg_read(ptr, addr, (addr - HSOTG_REG(0xe00)) >> 2, 
size);
 break;
 default:
-g_assert_not_reached();
+qemu_log_mask(LOG_GUEST_ERROR, "%s: Bad offset 0x%"HWADDR_PRIx"\n",
+  __func__, addr);
+val = 0;
+break;
 }
 
 return val;
@@ -1160,7 +1163,9 @@ static void dwc2_hsotg_write(void *ptr, hwaddr addr, 
uint64_t val,
 dwc2_pcgreg_write(ptr, addr, (addr - HSOTG_REG(0xe00)) >> 2, val, 
size);
 break;
 default:
-g_assert_not_reached();
+qemu_log_mask(LOG_GUEST_ERROR, "%s: Bad offset 0x%"HWADDR_PRIx"\n",
+  __func__, addr);
+break;
 }
 }
 
-- 
2.34.1

[PULL 10/18] hw/timer/a9gtimer: Handle QTest mode in a9_gtimer_get_current_cpu

From: Zheyu Ma 

This commit updates the a9_gtimer_get_current_cpu() function to handle
cases where QTest is enabled. When QTest is used, it returns 0 instead
of dereferencing the current_cpu, which can be NULL. This prevents the
program from crashing during QTest runs.

Reproducer:
cat << EOF | qemu-system-aarch64 -display \
none -machine accel=qtest, -m 512M -machine npcm750-evb -qtest stdio
writel 0xf03fe20c 0x26d7468c
EOF

Signed-off-by: Zheyu Ma 
Reviewed-by: Philippe Mathieu-Daudé 
Message-id: 20240618144009.3137806-1-zheyum...@gmail.com
Signed-off-by: Peter Maydell 
---
 hw/timer/a9gtimer.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/hw/timer/a9gtimer.c b/hw/timer/a9gtimer.c
index a2ac5bdfb99..64d80cdf6a3 100644
--- a/hw/timer/a9gtimer.c
+++ b/hw/timer/a9gtimer.c
@@ -32,6 +32,7 @@
 #include "qemu/log.h"
 #include "qemu/module.h"
 #include "hw/core/cpu.h"
+#include "sysemu/qtest.h"
 
 #ifndef A9_GTIMER_ERR_DEBUG
 #define A9_GTIMER_ERR_DEBUG 0
@@ -48,6 +49,10 @@
 
 static inline int a9_gtimer_get_current_cpu(A9GTimerState *s)
 {
+if (qtest_enabled()) {
+return 0;
+}
+
 if (current_cpu->cpu_index >= s->num_cpu) {
 hw_error("a9gtimer: num-cpu %d but this cpu is %d!\n",
  s->num_cpu, current_cpu->cpu_index);
-- 
2.34.1

[PULL 15/18] hw/arm/virt: Avoid unexpected warning from Linux guest on host with Fujitsu CPUs

From: Zhenyu Zhang 

Multiple warning messages and corresponding backtraces are observed when Linux
guest is booted on the host with Fujitsu CPUs. One of them is shown as below.

[0.032443] [ cut here ]
[0.032446] uart-pl011 900.pl011: ARCH_DMA_MINALIGN smaller than
CTR_EL0.CWG (128 < 256)
[0.032454] WARNING: CPU: 0 PID: 1 at arch/arm64/mm/dma-mapping.c:54
arch_setup_dma_ops+0xbc/0xcc
[0.032470] Modules linked in:
[0.032475] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.14.0-452.el9.aarch64
[0.032481] Hardware name: linux,dummy-virt (DT)
[0.032484] pstate: 6045 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[0.032490] pc : arch_setup_dma_ops+0xbc/0xcc
[0.032496] lr : arch_setup_dma_ops+0xbc/0xcc
[0.032501] sp : 80008003b860
[0.032503] x29: 80008003b860 x28:  x27: aae4b949049c
[0.032510] x26:  x25:  x24: 
[0.032517] x23: 0100 x22:  x21: 
[0.032523] x20: 0001 x19: 2f06c02ea400 x18: 
[0.032529] x17: 208a5f76 x16: 6589dbcb x15: aae4ba071c89
[0.032535] x14:  x13: aae4ba071c84 x12: 455f525443206e61
[0.032541] x11: 68742072656c6c61 x10: 0029 x9 : aae4b7d21da4
[0.032547] x8 : 0029 x7 : 4c414e494d5f414d x6 : 0029
[0.032553] x5 : 000f x4 : aae4b9617a00 x3 : 0001
[0.032558] x2 :  x1 :  x0 : 2f06c029be40
[0.032564] Call trace:
[0.032566]  arch_setup_dma_ops+0xbc/0xcc
[0.032572]  of_dma_configure_id+0x138/0x300
[0.032591]  amba_dma_configure+0x34/0xc0
[0.032600]  really_probe+0x78/0x3dc
[0.032614]  __driver_probe_device+0x108/0x160
[0.032619]  driver_probe_device+0x44/0x114
[0.032624]  __device_attach_driver+0xb8/0x14c
[0.032629]  bus_for_each_drv+0x88/0xe4
[0.032634]  __device_attach+0xb0/0x1e0
[0.032638]  device_initial_probe+0x18/0x20
[0.032643]  bus_probe_device+0xa8/0xb0
[0.032648]  device_add+0x4b4/0x6c0
[0.032652]  amba_device_try_add.part.0+0x48/0x360
[0.032657]  amba_device_add+0x104/0x144
[0.032662]  of_amba_device_create.isra.0+0x100/0x1c4
[0.032666]  of_platform_bus_create+0x294/0x35c
[0.032669]  of_platform_populate+0x5c/0x150
[0.032672]  of_platform_default_populate_init+0xd0/0xec
[0.032697]  do_one_initcall+0x4c/0x2e0
[0.032701]  do_initcalls+0x100/0x13c
[0.032707]  kernel_init_freeable+0x1c8/0x21c
[0.032712]  kernel_init+0x28/0x140
[0.032731]  ret_from_fork+0x10/0x20
[0.032735] ---[ end trace  ]---

In Linux, a check is applied to every device which is exposed through
device-tree node. The warning message is raised when the device isn't
DMA coherent and the cache line size is larger than ARCH_DMA_MINALIGN
(128 bytes). The cache line is sorted from CTR_EL0[CWG], which corresponds
to 256 bytes on the guest CPUs. The DMA coherent capability is claimed
through 'dma-coherent' in their device-tree nodes or parent nodes.
This happens even when the device doesn't implement or use DMA at all,
for legacy reasons.

Fix the issue by adding 'dma-coherent' property to the device-tree root
node, meaning all devices are capable of DMA coherent by default.
This both suppresses the spurious kernel warnings and also guards
against possible future QEMU bugs where we add a DMA-capable device
and forget to mark it as dma-coherent.

Signed-off-by: Zhenyu Zhang 
Reviewed-by: Gavin Shan 
Reviewed-by: Donald Dutile 
Message-id: 20240612020506.307793-1-zheny...@redhat.com
[PMM: tweaked commit message]
Signed-off-by: Peter Maydell 
---
 hw/arm/virt.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 85556152563..0784ee7f466 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -271,6 +271,17 @@ static void create_fdt(VirtMachineState *vms)
 qemu_fdt_setprop_cell(fdt, "/", "#size-cells", 0x2);
 qemu_fdt_setprop_string(fdt, "/", "model", "linux,dummy-virt");
 
+/*
+ * For QEMU, all DMA is coherent. Advertising this in the root node
+ * has two benefits:
+ *
+ * - It avoids potential bugs where we forget to mark a DMA
+ *   capable device as being dma-coherent
+ * - It avoids spurious warnings from the Linux kernel about
+ *   devices which can't do DMA at all
+ */
+qemu_fdt_setprop(fdt, "/", "dma-coherent", NULL, 0);
+
 /* /chosen must exist for load_dtb to fill in necessary properties later */
 qemu_fdt_add_subnode(fdt, "/chosen");
 if (vms->dtb_randomness) {
-- 
2.34.1

[PULL 09/18] scripts/coverity-scan/COMPONENTS.md: Include libqmp in testlibs

Add libqmp to the testlibs component.

Signed-off-by: Peter Maydell 
Reviewed-by: Philippe Mathieu-Daudé 
Message-id: 20240604145934.1230583-6-peter.mayd...@linaro.org
---
 scripts/coverity-scan/COMPONENTS.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/coverity-scan/COMPONENTS.md 
b/scripts/coverity-scan/COMPONENTS.md
index 3864f8eda07..858190be097 100644
--- a/scripts/coverity-scan/COMPONENTS.md
+++ b/scripts/coverity-scan/COMPONENTS.md
@@ -154,7 +154,7 @@ sysemu
   ~ .*/qemu(/include/.*)
 
 testlibs
-  ~ .*/qemu(/tests/qtest(/libqos/.*|/libqtest.*))
+  ~ .*/qemu(/tests/qtest(/libqos/.*|/libqtest.*|/libqmp.*))
 
 tests
   ~ .*/qemu(/tests/.*)
-- 
2.34.1

[PULL 01/18] hw/net/can/xlnx-versal-canfd: Fix sorting of the tx queue

From: Shiva sagar Myana 

Returning an uint32_t casted to a gint from g_cmp_ids causes the tx queue to
become wrongly sorted when executing g_slist_sort. Fix this by always
returning -1 or 1 from g_cmp_ids based on the ID comparison instead.
Also, if two message IDs are the same, sort them by using their index and
transmit the message at the lowest index first.

Signed-off-by: Shiva sagar Myana 
Reviewed-by: Francisco Iglesias 
Message-id: 20240603051732.3334571-1-shivasagar.my...@amd.com
Signed-off-by: Peter Maydell 
---
 hw/net/can/xlnx-versal-canfd.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/hw/net/can/xlnx-versal-canfd.c b/hw/net/can/xlnx-versal-canfd.c
index 47a14cfe633..5f083c21e93 100644
--- a/hw/net/can/xlnx-versal-canfd.c
+++ b/hw/net/can/xlnx-versal-canfd.c
@@ -1312,7 +1312,10 @@ static gint g_cmp_ids(gconstpointer data1, gconstpointer 
data2)
 tx_ready_reg_info *tx_reg_1 = (tx_ready_reg_info *) data1;
 tx_ready_reg_info *tx_reg_2 = (tx_ready_reg_info *) data2;
 
-return tx_reg_1->can_id - tx_reg_2->can_id;
+if (tx_reg_1->can_id == tx_reg_2->can_id) {
+return (tx_reg_1->reg_num < tx_reg_2->reg_num) ? -1 : 1;
+}
+return (tx_reg_1->can_id < tx_reg_2->can_id) ? -1 : 1;
 }
 
 static void free_list(GSList *list)
-- 
2.34.1

[PATCH v4 2/3] hw/clock: Expose 'qtest-clock-period' QOM property for QTests

Expose the clock period via the QOM 'qtest-clock-period' property so it
can be used in QTests. This property is only accessible in QTests (not
via HMP).

Signed-off-by: Inès Varhol 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Luc Michel 
---
 docs/devel/clocks.rst |  6 ++
 hw/core/clock.c   | 16 
 2 files changed, 22 insertions(+)

diff --git a/docs/devel/clocks.rst b/docs/devel/clocks.rst
index 177ee1c90d..3f744f2be1 100644
--- a/docs/devel/clocks.rst
+++ b/docs/devel/clocks.rst
@@ -358,6 +358,12 @@ humans (for instance in debugging), use 
``clock_display_freq()``,
 which returns a prettified string-representation, e.g. "33.3 MHz".
 The caller must free the string with g_free() after use.
 
+It's also possible to retrieve the clock period from a QTest by
+accessing QOM property ``qtest-clock-period`` using a QMP command.
+This property is only present when the device is being run under
+the ``qtest`` accelerator; it is not available when QEMU is
+being run normally.
+
 Calculating expiry deadlines
 
 
diff --git a/hw/core/clock.c b/hw/core/clock.c
index e212865307..cbe7b1bc46 100644
--- a/hw/core/clock.c
+++ b/hw/core/clock.c
@@ -13,6 +13,8 @@
 
 #include "qemu/osdep.h"
 #include "qemu/cutils.h"
+#include "qapi/visitor.h"
+#include "sysemu/qtest.h"
 #include "hw/clock.h"
 #include "trace.h"
 
@@ -158,6 +160,15 @@ bool clock_set_mul_div(Clock *clk, uint32_t multiplier, 
uint32_t divider)
 return true;
 }
 
+static void clock_period_prop_get(Object *obj, Visitor *v, const char *name,
+void *opaque, Error **errp)
+{
+Clock *clk = CLOCK(obj);
+uint64_t period = clock_get(clk);
+visit_type_uint64(v, name, , errp);
+}
+
+
 static void clock_initfn(Object *obj)
 {
 Clock *clk = CLOCK(obj);
@@ -166,6 +177,11 @@ static void clock_initfn(Object *obj)
 clk->divider = 1;
 
 QLIST_INIT(>children);
+
+if (qtest_enabled()) {
+object_property_add(obj, "qtest-clock-period", "uint64",
+clock_period_prop_get, NULL, NULL, NULL);
+}
 }
 
 static void clock_finalizefn(Object *obj)
-- 
2.43.2

[PATCH v4 3/3] tests/qtest: Check STM32L4x5 clock connections

For USART, GPIO and SYSCFG devices, check that clock frequency before
and after enabling the peripheral clock in RCC is correct.

Signed-off-by: Inès Varhol 
Reviewed-by: Peter Maydell 
---
 tests/qtest/stm32l4x5.h | 42 +
 tests/qtest/stm32l4x5_gpio-test.c   | 23 
 tests/qtest/stm32l4x5_syscfg-test.c | 20 --
 tests/qtest/stm32l4x5_usart-test.c  | 26 ++
 4 files changed, 109 insertions(+), 2 deletions(-)
 create mode 100644 tests/qtest/stm32l4x5.h

diff --git a/tests/qtest/stm32l4x5.h b/tests/qtest/stm32l4x5.h
new file mode 100644
index 00..2d21cc666c
--- /dev/null
+++ b/tests/qtest/stm32l4x5.h
@@ -0,0 +1,42 @@
+/*
+ * QTest testcase header for STM32L4X5 :
+ * used for consolidating common objects in stm32l4x5_*-test.c
+ *
+ * Copyright (c) 2024 Arnaud Minier 
+ * Copyright (c) 2024 Inès Varhol 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "libqtest.h"
+
+/* copied from clock.h */
+#define CLOCK_PERIOD_1SEC (10llu << 32)
+#define CLOCK_PERIOD_FROM_HZ(hz) (((hz) != 0) ? CLOCK_PERIOD_1SEC / (hz) : 0u)
+/*
+ * MSI (4 MHz) is used as system clock source after startup
+ * from Reset.
+ * AHB, APB1 and APB2 prescalers are set to 1 at reset.
+ */
+#define SYSCLK_PERIOD CLOCK_PERIOD_FROM_HZ(400)
+#define RCC_AHB2ENR 0x4002104C
+#define RCC_APB1ENR1 0x40021058
+#define RCC_APB1ENR2 0x4002105C
+#define RCC_APB2ENR 0x40021060
+
+
+static inline uint64_t get_clock_period(QTestState *qts, const char *path)
+{
+uint64_t clock_period = 0;
+QDict *r;
+
+r = qtest_qmp(qts, "{ 'execute': 'qom-get', 'arguments':"
+" { 'path': %s, 'property': 'qtest-clock-period'} }", path);
+g_assert_false(qdict_haskey(r, "error"));
+clock_period = qdict_get_int(r, "return");
+qobject_unref(r);
+return clock_period;
+}
+
+
diff --git a/tests/qtest/stm32l4x5_gpio-test.c 
b/tests/qtest/stm32l4x5_gpio-test.c
index 72a7823406..c0686c7b30 100644
--- a/tests/qtest/stm32l4x5_gpio-test.c
+++ b/tests/qtest/stm32l4x5_gpio-test.c
@@ -10,6 +10,7 @@
 
 #include "qemu/osdep.h"
 #include "libqtest-single.h"
+#include "stm32l4x5.h"
 
 #define GPIO_BASE_ADDR 0x4800
 #define GPIO_SIZE  0x400
@@ -505,6 +506,26 @@ static void test_bsrr_brr(const void *data)
 gpio_writel(gpio, ODR, reset(gpio, ODR));
 }
 
+static void test_clock_enable(void)
+{
+/*
+ * For each GPIO, enable its clock in RCC
+ * and check that its clock period changes to SYSCLK_PERIOD
+ */
+unsigned int gpio_id;
+
+for (uint32_t gpio = GPIO_A; gpio <= GPIO_H; gpio += GPIO_B - GPIO_A) {
+gpio_id = get_gpio_id(gpio);
+g_autofree char *path = g_strdup_printf("/machine/soc/gpio%c/clk",
+gpio_id + 'a');
+g_assert_cmpuint(get_clock_period(global_qtest, path), ==, 0);
+/* Enable the gpio clock */
+writel(RCC_AHB2ENR, readl(RCC_AHB2ENR) | (0x1 << gpio_id));
+g_assert_cmpuint(get_clock_period(global_qtest, path), ==,
+ SYSCLK_PERIOD);
+}
+}
+
 int main(int argc, char **argv)
 {
 int ret;
@@ -556,6 +577,8 @@ int main(int argc, char **argv)
 qtest_add_data_func("stm32l4x5/gpio/test_bsrr_brr2",
 test_data(GPIO_D, 0),
 test_bsrr_brr);
+qtest_add_func("stm32l4x5/gpio/test_clock_enable",
+   test_clock_enable);
 
 qtest_start("-machine b-l475e-iot01a");
 ret = g_test_run();
diff --git a/tests/qtest/stm32l4x5_syscfg-test.c 
b/tests/qtest/stm32l4x5_syscfg-test.c
index 506ca08bc2..8eaffe43ea 100644
--- a/tests/qtest/stm32l4x5_syscfg-test.c
+++ b/tests/qtest/stm32l4x5_syscfg-test.c
@@ -10,6 +10,7 @@
 
 #include "qemu/osdep.h"
 #include "libqtest-single.h"
+#include "stm32l4x5.h"
 
 #define SYSCFG_BASE_ADDR 0x4001
 #define SYSCFG_MEMRMP 0x00
@@ -26,7 +27,9 @@
 #define INVALID_ADDR 0x2C
 
 /* SoC forwards GPIOs to SysCfg */
-#define SYSCFG "/machine/soc"
+#define SOC "/machine/soc"
+#define SYSCFG "/machine/soc/syscfg"
+#define SYSCFG_CLK "/machine/soc/syscfg/clk"
 #define EXTI "/machine/soc/exti"
 
 static void syscfg_writel(unsigned int offset, uint32_t value)
@@ -41,7 +44,7 @@ static uint32_t syscfg_readl(unsigned int offset)
 
 static void syscfg_set_irq(int num, int level)
 {
-   qtest_set_irq_in(global_qtest, SYSCFG, NULL, num, level);
+   qtest_set_irq_in(global_qtest, SOC, NULL, num, level);
 }
 
 static void system_reset(void)
@@ -301,6 +304,17 @@ static void test_irq_gpio_multiplexer(void)
 syscfg_writel(SYSCFG_EXTICR1, 0x);
 }
 
+static void test_clock_enable(void)
+{
+g_assert_cmpuint(get_clock_period(global_qtest, SYSCFG_CLK), ==, 0);
+
+/* Enable SYSCFG clock */
+writel(RCC_APB2ENR, readl(RCC_APB2ENR) | (0x1 << 0));
+
+g_assert_cmpuint(get_clock_period(global_qtest, SYSCFG_CLK), ==,
+

[PATCH v4 1/3] hw/misc: Create STM32L4x5 SYSCFG clock

This commit creates a clock in STM32L4x5 SYSCFG and wires it up to the
corresponding clock from STM32L4x5 RCC.

Signed-off-by: Inès Varhol 
Reviewed-by: Peter Maydell 
Reviewed-by: Philippe Mathieu-Daudé 
---
 include/hw/misc/stm32l4x5_syscfg.h |  1 +
 hw/arm/stm32l4x5_soc.c |  2 ++
 hw/misc/stm32l4x5_syscfg.c | 19 +--
 3 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/include/hw/misc/stm32l4x5_syscfg.h 
b/include/hw/misc/stm32l4x5_syscfg.h
index 23bb564150..c450df2b9e 100644
--- a/include/hw/misc/stm32l4x5_syscfg.h
+++ b/include/hw/misc/stm32l4x5_syscfg.h
@@ -48,6 +48,7 @@ struct Stm32l4x5SyscfgState {
 uint32_t swpr2;
 
 qemu_irq gpio_out[GPIO_NUM_PINS];
+Clock *clk;
 };
 
 #endif
diff --git a/hw/arm/stm32l4x5_soc.c b/hw/arm/stm32l4x5_soc.c
index 38f7a2d5d9..fb2afa6cfe 100644
--- a/hw/arm/stm32l4x5_soc.c
+++ b/hw/arm/stm32l4x5_soc.c
@@ -236,6 +236,8 @@ static void stm32l4x5_soc_realize(DeviceState *dev_soc, 
Error **errp)
 
 /* System configuration controller */
 busdev = SYS_BUS_DEVICE(>syscfg);
+qdev_connect_clock_in(DEVICE(>syscfg), "clk",
+qdev_get_clock_out(DEVICE(&(s->rcc)), "syscfg-out"));
 if (!sysbus_realize(busdev, errp)) {
 return;
 }
diff --git a/hw/misc/stm32l4x5_syscfg.c b/hw/misc/stm32l4x5_syscfg.c
index a5a1ce2680..a947a9e036 100644
--- a/hw/misc/stm32l4x5_syscfg.c
+++ b/hw/misc/stm32l4x5_syscfg.c
@@ -26,6 +26,9 @@
 #include "trace.h"
 #include "hw/irq.h"
 #include "migration/vmstate.h"
+#include "hw/clock.h"
+#include "hw/qdev-clock.h"
+#include "qapi/error.h"
 #include "hw/misc/stm32l4x5_syscfg.h"
 #include "hw/gpio/stm32l4x5_gpio.h"
 
@@ -225,12 +228,22 @@ static void stm32l4x5_syscfg_init(Object *obj)
 qdev_init_gpio_in(DEVICE(obj), stm32l4x5_syscfg_set_irq,
   GPIO_NUM_PINS * NUM_GPIOS);
 qdev_init_gpio_out(DEVICE(obj), s->gpio_out, GPIO_NUM_PINS);
+s->clk = qdev_init_clock_in(DEVICE(s), "clk", NULL, s, 0);
+}
+
+static void stm32l4x5_syscfg_realize(DeviceState *dev, Error **errp)
+{
+Stm32l4x5SyscfgState *s = STM32L4X5_SYSCFG(dev);
+if (!clock_has_source(s->clk)) {
+error_setg(errp, "SYSCFG: clk input must be connected");
+return;
+}
 }
 
 static const VMStateDescription vmstate_stm32l4x5_syscfg = {
 .name = TYPE_STM32L4X5_SYSCFG,
-.version_id = 1,
-.minimum_version_id = 1,
+.version_id = 2,
+.minimum_version_id = 2,
 .fields = (VMStateField[]) {
 VMSTATE_UINT32(memrmp, Stm32l4x5SyscfgState),
 VMSTATE_UINT32(cfgr1, Stm32l4x5SyscfgState),
@@ -241,6 +254,7 @@ static const VMStateDescription vmstate_stm32l4x5_syscfg = {
 VMSTATE_UINT32(swpr, Stm32l4x5SyscfgState),
 VMSTATE_UINT32(skr, Stm32l4x5SyscfgState),
 VMSTATE_UINT32(swpr2, Stm32l4x5SyscfgState),
+VMSTATE_CLOCK(clk, Stm32l4x5SyscfgState),
 VMSTATE_END_OF_LIST()
 }
 };
@@ -251,6 +265,7 @@ static void stm32l4x5_syscfg_class_init(ObjectClass *klass, 
void *data)
 ResettableClass *rc = RESETTABLE_CLASS(klass);
 
 dc->vmsd = _stm32l4x5_syscfg;
+dc->realize = stm32l4x5_syscfg_realize;
 rc->phases.hold = stm32l4x5_syscfg_hold_reset;
 }
 
-- 
2.43.2

[PATCH v4 0/3] Check clock connection between STM32L4x5 RCC and peripherals

Among implemented STM32L4x5 devices, USART, GPIO and SYSCFG
have a clock source, but none has a corresponding test in QEMU.

This patch makes sure that all 3 devices create a clock correctly,
adds a QOM property to access clocks' periods from QTests,
and adds QTests checking that clock enable in RCC has the
expected results for all 3 devices.

Thank you for the reviews.

Changes from v3 to v4:
- removed 2nd commit (it was bumping up version id in
`vmstate_stm32l4x5_usart_base`, which is useless when not adding
any fields), it was a misunderstanding
- in `clock.c`, `vmstate_stm32l4x5_usart_base`, renamed `freq_hz` to
`period`
- in `clocks.rst`, specified that `qtest-clock-period` is only usable
from the QTests and not QEMU
- in `qtest/stm32l4x5.h`, used macros from "clock.h" to compute
the expected clock period in the right unit
- in `qtest/stm32l4x5.h`, removed "osdep.h" include

Changes from "v1" to v3:
- adding a commit to expose `qtest-clock-period`, a QOM property for
all clocks, only accessible from QTests, and mention it in clock.rst
- adapt QTests so that they use clock period instead of clock frequency
- remove `clock-freq-hz` QOM property in STM32L4x5 USART and SYSCFG
- dropping the commit migrating GPIO clocks as it's already upstream

Changes from v1 to an unfortunate second "v1":
- upgrading `VMStateDescription` to version 2 to account for
`VMSTATE_CLOCK()`
- QTests : consolidating `get_clock_freq_hz()` in a header
and making appropriate changes in stm32l4x5q_*-test.c

Signed-off-by: Inès Varhol 

Inès Varhol (3):
  hw/misc: Create STM32L4x5 SYSCFG clock
  hw/clock: Expose 'qtest-clock-period' QOM property for QTests
  tests/qtest: Check STM32L4x5 clock connections

 docs/devel/clocks.rst   |  6 +
 include/hw/misc/stm32l4x5_syscfg.h  |  1 +
 tests/qtest/stm32l4x5.h | 42 +
 hw/arm/stm32l4x5_soc.c  |  2 ++
 hw/core/clock.c | 16 +++
 hw/misc/stm32l4x5_syscfg.c  | 19 +++--
 tests/qtest/stm32l4x5_gpio-test.c   | 23 
 tests/qtest/stm32l4x5_syscfg-test.c | 20 --
 tests/qtest/stm32l4x5_usart-test.c  | 26 ++
 9 files changed, 151 insertions(+), 4 deletions(-)
 create mode 100644 tests/qtest/stm32l4x5.h

-- 
2.43.2

Re: [PATCH 09/13] qapi: convert "Note" sections to plain rST

2024-06-22 Thread Markus Armbruster

John Snow  writes:

> On Fri, Jun 21, 2024 at 8:23 AM Markus Armbruster  wrote:

[...]

>> My reason for four spaces is reducing churn.  To see by how much, I
>> redid your change.  I found a few more notes that don't start with a
>> capital letter, or don't end with a period.
>>
>
> ^ Guess I'll re-audit for v2. Hang on to the list of cases you found.

Happy to share my patch.

> (Sorry for the churn, though. I obviously don't mind it as much as you do,
> but I suspect I'm a lot less nimble with fiddling through git history than
> you are and find the value of avoiding churn to be ... lower than you do,
> in general. Respecting reviewer time is a strong argument, I apologize that
> some non-mechanical changes snuck into the patch. The downside of hacking
> together a very large series.)

You did a good job splitting it up.  Minor mistakes are bound to happen.
Got to give the reviewer soemthing to find ;)

[...]

Re: [PATCH 04/13] qapi/parser: preserve indentation in QAPIDoc sections

2024-06-22 Thread Markus Armbruster

John Snow  writes:

> On Fri, Jun 21, 2024 at 2:38 AM Markus Armbruster  wrote:

[...]

>> I'd like you to express more clearly that you're talking about an
>> alternative you rejected.  Perhaps like this:
>>
>>   block-level constructs such as code blocks, lists, and other such
>>   markup.
>>
>>   The alternative would be to somehow undo .get_doc_indented()'s
>>   indentation changes in the new generator.  Much messier.
>>
>> Feel free to add more detail to the last paragraph.
>>
>
> Eh, I just deleted it. I recall running into troubles but I can't
> articulate the precise conditions because as you point out, it's a doomed
> strategy for other reasons - you can't reconstruct the proper indentation.
>
> This patch is still the correct way to go, so I don't have to explain my
> failures at length in the commit message ... I just like giving people
> clues for *why* I decided to implement things a certain way, because I
> often find that more instructive than the "how".

"Why" tends to be much more useful in a commit message than "how".  I
should be able to figure out "how" by reading the patch, whereas for
"why", I may have to read the author's mind.

>  In this case, the "why" is
> probably more properly summarized as "it's a total shitshow in that
> direction, trust me"

The right amount of detail is often not obvious.  Use your judgement.

Re: standardizing i2c device ids

2024-06-22 Thread Markus Armbruster

Patrick Leis  writes:

> Corey and Peter,
>
> My team builds lots of configurations for Qemu boards, and one pain point
> has been that the qom path for a device depends on the device insertion
> order, child[0], child[1] and the like.

Yes.

Discussed in my "Dynamic & heterogeneous machines, initial
configuration: problems" memo, under "Problem 4: The
/machine/unattached/ orphanage".

Copy of the "Problem 4" part appended for your your convenience.  Full
memo archived at
https://lore.kernel.org/qemu-devel/87o7d1i7ky@pond.sub.org/

>  I noticed that the qdev paths for
> devices also exist by their device id property.  By default, this ends up
> being the device type name.

Which kind of devices?

There are onboard devices and user-created devices.

A user-created device's QOM path is "/machine/peripheral/ID" when it was
created with a qdev ID, and "/machine/peripheral-anon/device[N]" (where
N counts up from zero) when it was created without a qdev ID.  N depends
on creation order, which is under the user's control.  Users can and
should avoid relying on their order by supplying an ID.

An onboard device's QOM path is chosen by board code.  For instance, q35
puts the mch device at "/machine/q35/mch".  However, if the board code
neglects to put the device anywhere, the system puts it at
"/machine/unattached/device[N]" (where N counts up from zero).  N
depends on creation order.

N can change at the drop of a hat.  Whether "device[N]" is a stable
interface is unclear.  It would clearly be a bad one.

If (part of) your problem is "/machine/peripheral-anon/device[N]",
supply IDs to bypass it.

If (part of) your problem is "/machine/unattached/device[N]", all I can
offer is the proper solution: fix the board code to put the device in
its proper place instead of abandoning it to the "/machine/unattached/"
orphanage.

>  I was wondering if it made sense to override
> this with the device type plus the smbus address?  I did something similar
> with the i2c mux device, to resolve part of this issue.

I doubt it.

Questions?

= Problem 4: The /machine/unattached/ orphanage =

Is it okay for a QOM object to have no parent?

An object without a parent is not part of the composition tree; it has
no canonical path, and object_get_canonical_path() returns null.

Such objects can behave in wonky ways.  For instance,
object_property_set_link() treats a target object without a parent as
null.  If a linked object somehow loses its parent,
object_property_get_link() will return null even though the underlying C
pointer still points to the poor orphan.

This strongly suggests QOM was designed with the assumption that objects
always have a parent, except during initialization (before they are
connected to anything) and finalization (when no longer connected to
anything).  object_property_try_add_child()'s contract seems to confirm
this:

 * Child properties form the composition tree.  All objects need to be a child
 * of another object.  Objects can only be a child of one object.

Some functions to create objects take the new object's parent as a
parameter.  Example: object_new_with_props(), object_new_with_propv(),
clock_new(), ...

Others set a fixed parent.  For instance, we always add character
backends to "/chardevs/", objects created with object-add in
"/objects/", devices created with device_add in "/machine/peripheral/"
(with ID) or "/machine/peripheral-anon/" (without ID), ...

There are also functions that don't set a parent: object_new(),
object_new_with_class(), qdev_new(), qdev_try_new(), ...  Setting a
parent is the callers job then.  Invites misuse.  I'm aware of one
instance: @current_migration remains without a parent forever.

Not all callers care to set a parent themselves.  Instead, they rely on
the "/machine/unattached/" orphanage:

* qdev_connect_gpio_out_named() needs the input pin to have a parent.
  If it lacks one, it gets added to "/machine/unattached/" with a
  made-up name.

* device_set_realized() ensures realized devices have a parent by adding
  devices lacking one to "/machine/unattached/" with a made-up name.

* portio_list_add() adds a memory region.  If the caller doesn't specify
  the parent, "/machine/unattached/" is assumed.

* memory_region_init() adds a memory region, and may set the parent.  If
  the caller requests setting a parent without specifying one,
  "/machine/unattached/" is assumed.

* qemu_create_machine() adds the main system bus to
  "/machine/unattached/".

Except for the last one, the child names depend on execution order.  For
instance, device_set_realized() uses "device[N]", where N counts up from
zero.

These brittle, made-up names are visible in QMP QOM introspection.
Whether that's a stable interface is unclear.  Better not.

We don't rely on these names in C.  We follow pointers instead.

When we replace C code by configuration, we switch from pointers to
names.  Brittle names become a

Re: [PATCH] docs/cxl: fix some typos

2024-06-22 Thread Hyeongtak Ji

Hello Jonathan,

Thank you for your response.

On Sat, Jun 22, 2024 at 1:10 AM Jonathan Cameron
 wrote:
>
> On Wed, 19 Jun 2024 13:54:59 +0900
> Hyeongtak Ji  wrote:
>
> Hi, some description would be good of how you caught these
> (I'm guessing a close read).

Just to confirm, are you suggesting that the patch should include a
commit message?  I apologize for submitting the patch without any
sufficient explanation.  However, I am not entirely sure if "how I
found these typos" needs to be included in the commit message.  For
your information, I discovered these typos because the ASCII art did
not align with the explanations (yes, a close read).

>
> Whilst checking this I did notice there are some errors in
> the example bus numbering but that's a separate issue.
>
> Jonathan
>
>
> > Signed-off-by: Hyeongtak Ji 
> > ---
> >  docs/system/devices/cxl.rst | 6 +++---
> >  1 file changed, 3 insertions(+), 3 deletions(-)
> >
> > diff --git a/docs/system/devices/cxl.rst b/docs/system/devices/cxl.rst
> > index 10a0e9bc9ff4..e2497e6a098b 100644
> > --- a/docs/system/devices/cxl.rst
> > +++ b/docs/system/devices/cxl.rst
> > @@ -218,17 +218,17 @@ Notes:
> >  A complex configuration here, might be to use the following HDM
> >  decoders in HB0. HDM0 routes CFMW0 requests to RP0 and hence
> >  part of CXL Type3 0. HDM1 routes CFMW0 requests from a
> > -different region of the CFMW0 PA range to RP2 and hence part
> > +different region of the CFMW0 PA range to RP1 and hence part
>
> Good catch.
>
> >  of CXL Type 3 1.  HDM2 routes yet another PA range from within
> >  CFMW0 to be interleaved across RP0 and RP1, providing 2 way
> >  interleave of part of the memory provided by CXL Type3 0 and
> >  CXL Type 3 1. HDM3 routes those interleaved accesses from
> >  CFMW1 that target HB0 to RP 0 and another part of the memory of
> >  CXL Type 3 0 (as part of a 2 way interleave at the system level
> > -across for example CXL Type3 0 and CXL Type3 2.
> > +across for example CXL Type3 0 and CXL Type3 1).
> This one is wrong.  CFMW1 interleaves across both host bridges so we need
> a device below HB0 and one below HB1, so CXL type3 2 is a possible choice
> (could be CXL type3 3 as well, but that doesn't matter.)

Oh, I misunderstood the original explanation.  I will correct it just by
adding the missing parenthesis instead.

>
> >  HDM4 is used to enable system wide 4 way interleave across all
> >  the present CXL type3 devices, by interleaving those (interleaved)
> > -requests that HB0 receives from from CFMW1 across RP 0 and
> > +requests that HB0 receives from CFMW1 across RP 0 and
> Good.
>
> >  RP 1 and hence to yet more regions of the memory of the
> >  attached Type3 devices.  Note this is a representative subset
> >  of the full range of possible HDM decoder configurations in this
>

I will send V2 with a decent explanation and the corrected typo fix.

Kind regards,
Hyeongtak

On Sat, Jun 22, 2024 at 1:10 AM Jonathan Cameron
 wrote:
>
> On Wed, 19 Jun 2024 13:54:59 +0900
> Hyeongtak Ji  wrote:
>
> Hi, some description would be good of how you caught these
> (I'm guessing a close read).
>
> Whilst checking this I did notice there are some errors in
> the example bus numbering but that's a separate issue.
>
> Jonathan
>
>
> > Signed-off-by: Hyeongtak Ji 
> > ---
> >  docs/system/devices/cxl.rst | 6 +++---
> >  1 file changed, 3 insertions(+), 3 deletions(-)
> >
> > diff --git a/docs/system/devices/cxl.rst b/docs/system/devices/cxl.rst
> > index 10a0e9bc9ff4..e2497e6a098b 100644
> > --- a/docs/system/devices/cxl.rst
> > +++ b/docs/system/devices/cxl.rst
> > @@ -218,17 +218,17 @@ Notes:
> >  A complex configuration here, might be to use the following HDM
> >  decoders in HB0. HDM0 routes CFMW0 requests to RP0 and hence
> >  part of CXL Type3 0. HDM1 routes CFMW0 requests from a
> > -different region of the CFMW0 PA range to RP2 and hence part
> > +different region of the CFMW0 PA range to RP1 and hence part
>
> Good catch.
>
> >  of CXL Type 3 1.  HDM2 routes yet another PA range from within
> >  CFMW0 to be interleaved across RP0 and RP1, providing 2 way
> >  interleave of part of the memory provided by CXL Type3 0 and
> >  CXL Type 3 1. HDM3 routes those interleaved accesses from
> >  CFMW1 that target HB0 to RP 0 and another part of the memory of
> >  CXL Type 3 0 (as part of a 2 way interleave at the system level
> > -across for example CXL Type3 0 and CXL Type3 2.
> > +across for example CXL Type3 0 and CXL Type3 1).
> This one is wrong.  CFMW1 interleaves across both host bridges so we need
> a device below HB0 and one below HB1, so CXL type3 2 is a possible choice
> (could be CXL type3 3 as well, but that doesn't matter.)
>
> >  HDM4 is used to enable system wide 4 way interleave across all
> >  the present CXL type3 devices, by interleaving those (interleaved)
> > -requests that

[PULL 21/23] block: rename former bdrv_file_open callbacks

Since there is no bdrv_file_open callback anymore, rename the implementations
so that they end with "_open" instead of "_file_open".  NFS is the exception
because all the functions are named nfs_file_*.

Suggested-by: Kevin Wolf 
Signed-off-by: Paolo Bonzini 
---
 block/blkio.c | 8 
 block/null.c  | 8 
 block/nvme.c  | 8 
 block/ssh.c   | 6 +++---
 4 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/block/blkio.c b/block/blkio.c
index 1a38064ce76..3d9a2e764c3 100644
--- a/block/blkio.c
+++ b/block/blkio.c
@@ -713,7 +713,7 @@ static int blkio_virtio_blk_connect(BlockDriverState *bs, 
QDict *options,
  * for example will fail.
  *
  * In order to open the device read-only, we are using the `read-only`
- * property of the libblkio driver in blkio_file_open().
+ * property of the libblkio driver in blkio_open().
  */
 fd = qemu_open(path, O_RDWR, NULL);
 if (fd < 0) {
@@ -791,8 +791,8 @@ static int blkio_virtio_blk_connect(BlockDriverState *bs, 
QDict *options,
 return 0;
 }
 
-static int blkio_file_open(BlockDriverState *bs, QDict *options, int flags,
-   Error **errp)
+static int blkio_open(BlockDriverState *bs, QDict *options, int flags,
+  Error **errp)
 {
 const char *blkio_driver = bs->drv->protocol_name;
 BDRVBlkioState *s = bs->opaque;
@@ -1088,7 +1088,7 @@ static void blkio_refresh_limits(BlockDriverState *bs, 
Error **errp)
  */
 #define BLKIO_DRIVER_COMMON \
 .instance_size   = sizeof(BDRVBlkioState), \
-.bdrv_open   = blkio_file_open, \
+.bdrv_open   = blkio_open, \
 .bdrv_close  = blkio_close, \
 .bdrv_co_getlength   = blkio_co_getlength, \
 .bdrv_co_truncate= blkio_truncate, \
diff --git a/block/null.c b/block/null.c
index 6fa64d20d86..4730acc1eb2 100644
--- a/block/null.c
+++ b/block/null.c
@@ -77,8 +77,8 @@ static void null_aio_parse_filename(const char *filename, 
QDict *options,
 }
 }
 
-static int null_file_open(BlockDriverState *bs, QDict *options, int flags,
-  Error **errp)
+static int null_open(BlockDriverState *bs, QDict *options, int flags,
+ Error **errp)
 {
 QemuOpts *opts;
 BDRVNullState *s = bs->opaque;
@@ -283,7 +283,7 @@ static BlockDriver bdrv_null_co = {
 .protocol_name  = "null-co",
 .instance_size  = sizeof(BDRVNullState),
 
-.bdrv_open  = null_file_open,
+.bdrv_open  = null_open,
 .bdrv_parse_filename= null_co_parse_filename,
 .bdrv_co_getlength  = null_co_getlength,
 .bdrv_co_get_allocated_file_size = null_co_get_allocated_file_size,
@@ -304,7 +304,7 @@ static BlockDriver bdrv_null_aio = {
 .protocol_name  = "null-aio",
 .instance_size  = sizeof(BDRVNullState),
 
-.bdrv_open  = null_file_open,
+.bdrv_open  = null_open,
 .bdrv_parse_filename= null_aio_parse_filename,
 .bdrv_co_getlength  = null_co_getlength,
 .bdrv_co_get_allocated_file_size = null_co_get_allocated_file_size,
diff --git a/block/nvme.c b/block/nvme.c
index c84914af6dd..3b588b139f6 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -889,7 +889,7 @@ out:
 qemu_vfio_pci_unmap_bar(s->vfio, 0, (void *)regs, 0, sizeof(NvmeBar));
 }
 
-/* Cleaning up is done in nvme_file_open() upon error. */
+/* Cleaning up is done in nvme_open() upon error. */
 return ret;
 }
 
@@ -967,8 +967,8 @@ static void nvme_close(BlockDriverState *bs)
 g_free(s->device);
 }
 
-static int nvme_file_open(BlockDriverState *bs, QDict *options, int flags,
-  Error **errp)
+static int nvme_open(BlockDriverState *bs, QDict *options, int flags,
+ Error **errp)
 {
 const char *device;
 QemuOpts *opts;
@@ -1630,7 +1630,7 @@ static BlockDriver bdrv_nvme = {
 .create_opts  = _create_opts_simple,
 
 .bdrv_parse_filename  = nvme_parse_filename,
-.bdrv_open= nvme_file_open,
+.bdrv_open= nvme_open,
 .bdrv_close   = nvme_close,
 .bdrv_co_getlength= nvme_co_getlength,
 .bdrv_probe_blocksizes= nvme_probe_blocksizes,
diff --git a/block/ssh.c b/block/ssh.c
index 1344822ed85..27d582e0e3d 100644
--- a/block/ssh.c
+++ b/block/ssh.c
@@ -837,8 +837,8 @@ static int connect_to_ssh(BDRVSSHState *s, 
BlockdevOptionsSsh *opts,
 return ret;
 }
 
-static int ssh_file_open(BlockDriverState *bs, QDict *options, int bdrv_flags,
- Error **errp)
+static int ssh_open(BlockDriverState *bs, QDict *options, int bdrv_flags,
+Error **errp)
 {
 BDRVSSHState *s = bs->opaque;
 BlockdevOptionsSsh *opts;
@@ -1362,7 +1362,7 @@ static BlockDriver bdrv_ssh = {
 .protocol_name= "ssh",
 .instance_size

[PULL 15/23] Revert "host/i386: assume presence of SSE2"

This reverts commit b18236897ca15c3db1506d8edb9a191dfe51429c.
The x86-64 instruction set can now be tuned down to x86-64 v1
or i386 Pentium Pro.

Signed-off-by: Paolo Bonzini 
---
 host/include/i386/host/cpuinfo.h | 1 +
 util/bufferiszero.c  | 4 ++--
 util/cpuinfo-i386.c  | 1 +
 3 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/host/include/i386/host/cpuinfo.h b/host/include/i386/host/cpuinfo.h
index 72f6fad61e5..81771733eaa 100644
--- a/host/include/i386/host/cpuinfo.h
+++ b/host/include/i386/host/cpuinfo.h
@@ -14,6 +14,7 @@
 #define CPUINFO_POPCNT  (1u << 4)
 #define CPUINFO_BMI1(1u << 5)
 #define CPUINFO_BMI2(1u << 6)
+#define CPUINFO_SSE2(1u << 7)
 #define CPUINFO_AVX1(1u << 9)
 #define CPUINFO_AVX2(1u << 10)
 #define CPUINFO_AVX512F (1u << 11)
diff --git a/util/bufferiszero.c b/util/bufferiszero.c
index 11c080e02cf..74864f7b782 100644
--- a/util/bufferiszero.c
+++ b/util/bufferiszero.c
@@ -188,14 +188,14 @@ static biz_accel_fn const accel_table[] = {
 
 static unsigned best_accel(void)
 {
-#ifdef CONFIG_AVX2_OPT
 unsigned info = cpuinfo_init();
 
+#ifdef CONFIG_AVX2_OPT
 if (info & CPUINFO_AVX2) {
 return 2;
 }
 #endif
-return 1;
+return info & CPUINFO_SSE2 ? 1 : 0;
 }
 
 #elif defined(__aarch64__) && defined(__ARM_NEON)
diff --git a/util/cpuinfo-i386.c b/util/cpuinfo-i386.c
index ca74ef04f54..90f92a42dc8 100644
--- a/util/cpuinfo-i386.c
+++ b/util/cpuinfo-i386.c
@@ -34,6 +34,7 @@ unsigned __attribute__((constructor)) cpuinfo_init(void)
 if (max >= 1) {
 __cpuid(1, a, b, c, d);
 
+info |= (d & bit_SSE2 ? CPUINFO_SSE2 : 0);
 info |= (c & bit_MOVBE ? CPUINFO_MOVBE : 0);
 info |= (c & bit_POPCNT ? CPUINFO_POPCNT : 0);
 info |= (c & bit_PCLMUL ? CPUINFO_PCLMUL : 0);
-- 
2.45.2

[PULL 17/23] meson: remove dead optimization option

Reviewed-by: Richard Henderson 
Signed-off-by: Paolo Bonzini 
---
 meson.build   | 13 -
 meson_options.txt |  2 --
 scripts/meson-buildoptions.sh |  3 ---
 3 files changed, 18 deletions(-)

diff --git a/meson.build b/meson.build
index 6e694ecd9fe..54e6b09f4fb 100644
--- a/meson.build
+++ b/meson.build
@@ -2874,18 +2874,6 @@ config_host_data.set('CONFIG_AVX2_OPT', 
get_option('avx2') \
 int main(int argc, char *argv[]) { return bar(argv[argc - 1]); }
   '''), error_message: 'AVX2 not available').allowed())
 
-config_host_data.set('CONFIG_AVX512F_OPT', get_option('avx512f') \
-  .require(have_cpuid_h, error_message: 'cpuid.h not available, cannot enable 
AVX512F') \
-  .require(cc.links('''
-#include 
-#include 
-static int __attribute__((target("avx512f"))) bar(void *a) {
-  __m512i x = *(__m512i *)a;
-  return _mm512_test_epi64_mask(x, x);
-}
-int main(int argc, char *argv[]) { return bar(argv[argc - 1]); }
-  '''), error_message: 'AVX512F not available').allowed())
-
 config_host_data.set('CONFIG_AVX512BW_OPT', get_option('avx512bw') \
   .require(have_cpuid_h, error_message: 'cpuid.h not available, cannot enable 
AVX512BW') \
   .require(cc.links('''
@@ -4283,7 +4271,6 @@ summary_info += {'mutex debugging':   
get_option('debug_mutex')}
 summary_info += {'memory allocator':  get_option('malloc')}
 summary_info += {'avx2 optimization': config_host_data.get('CONFIG_AVX2_OPT')}
 summary_info += {'avx512bw optimization': 
config_host_data.get('CONFIG_AVX512BW_OPT')}
-summary_info += {'avx512f optimization': 
config_host_data.get('CONFIG_AVX512F_OPT')}
 summary_info += {'gcov':  get_option('b_coverage')}
 summary_info += {'thread sanitizer':  get_option('tsan')}
 summary_info += {'CFI support':   get_option('cfi')}
diff --git a/meson_options.txt b/meson_options.txt
index 6065ed2d352..0269fa0f16e 100644
--- a/meson_options.txt
+++ b/meson_options.txt
@@ -119,8 +119,6 @@ option('membarrier', type: 'feature', value: 'disabled',
 
 option('avx2', type: 'feature', value: 'auto',
description: 'AVX2 optimizations')
-option('avx512f', type: 'feature', value: 'disabled',
-   description: 'AVX512F optimizations')
 option('avx512bw', type: 'feature', value: 'auto',
description: 'AVX512BW optimizations')
 option('keyring', type: 'feature', value: 'auto',
diff --git a/scripts/meson-buildoptions.sh b/scripts/meson-buildoptions.sh
index 62842d47e88..cfadb5ea86a 100644
--- a/scripts/meson-buildoptions.sh
+++ b/scripts/meson-buildoptions.sh
@@ -95,7 +95,6 @@ meson_options_help() {
   printf "%s\n" '  auth-pamPAM access control'
   printf "%s\n" '  avx2AVX2 optimizations'
   printf "%s\n" '  avx512bwAVX512BW optimizations'
-  printf "%s\n" '  avx512f AVX512F optimizations'
   printf "%s\n" '  blkio   libblkio block device driver'
   printf "%s\n" '  bochs   bochs image format support'
   printf "%s\n" '  bpf eBPF support'
@@ -240,8 +239,6 @@ _meson_option_parse() {
 --disable-avx2) printf "%s" -Davx2=disabled ;;
 --enable-avx512bw) printf "%s" -Davx512bw=enabled ;;
 --disable-avx512bw) printf "%s" -Davx512bw=disabled ;;
---enable-avx512f) printf "%s" -Davx512f=enabled ;;
---disable-avx512f) printf "%s" -Davx512f=disabled ;;
 --enable-gcov) printf "%s" -Db_coverage=true ;;
 --disable-gcov) printf "%s" -Db_coverage=false ;;
 --enable-lto) printf "%s" -Db_lto=true ;;
-- 
2.45.2

[PULL 02/23] target/i386: fix CC_OP dump

POPCNT was missing, and the entries were all out of order after
ADCX/ADOX/ADCOX were moved close to EFLAGS.  Just use designated
initializers.

Fixes: 4885c3c4953 ("target-i386: Use ctpop helper", 2017-01-10)
Fixes: cc155f19717 ("target/i386: rewrite flags writeback for ADCX/ADOX", 
2024-06-11)
Signed-off-by: Paolo Bonzini 
---
 target/i386/cpu-dump.c | 101 +
 1 file changed, 51 insertions(+), 50 deletions(-)

diff --git a/target/i386/cpu-dump.c b/target/i386/cpu-dump.c
index 40697064d92..3bb8e440916 100644
--- a/target/i386/cpu-dump.c
+++ b/target/i386/cpu-dump.c
@@ -28,69 +28,70 @@
 /* x86 debug */
 
 static const char *cc_op_str[CC_OP_NB] = {
-"DYNAMIC",
-"EFLAGS",
+[CC_OP_DYNAMIC] = "DYNAMIC",
 
-"MULB",
-"MULW",
-"MULL",
-"MULQ",
+[CC_OP_EFLAGS] = "EFLAGS",
+[CC_OP_ADCX] = "ADCX",
+[CC_OP_ADOX] = "ADOX",
+[CC_OP_ADCOX] = "ADCOX",
 
-"ADDB",
-"ADDW",
-"ADDL",
-"ADDQ",
+[CC_OP_MULB] = "MULB",
+[CC_OP_MULW] = "MULW",
+[CC_OP_MULL] = "MULL",
+[CC_OP_MULQ] = "MULQ",
 
-"ADCB",
-"ADCW",
-"ADCL",
-"ADCQ",
+[CC_OP_ADDB] = "ADDB",
+[CC_OP_ADDW] = "ADDW",
+[CC_OP_ADDL] = "ADDL",
+[CC_OP_ADDQ] = "ADDQ",
 
-"SUBB",
-"SUBW",
-"SUBL",
-"SUBQ",
+[CC_OP_ADCB] = "ADCB",
+[CC_OP_ADCW] = "ADCW",
+[CC_OP_ADCL] = "ADCL",
+[CC_OP_ADCQ] = "ADCQ",
 
-"SBBB",
-"SBBW",
-"SBBL",
-"SBBQ",
+[CC_OP_SUBB] = "SUBB",
+[CC_OP_SUBW] = "SUBW",
+[CC_OP_SUBL] = "SUBL",
+[CC_OP_SUBQ] = "SUBQ",
 
-"LOGICB",
-"LOGICW",
-"LOGICL",
-"LOGICQ",
+[CC_OP_SBBB] = "SBBB",
+[CC_OP_SBBW] = "SBBW",
+[CC_OP_SBBL] = "SBBL",
+[CC_OP_SBBQ] = "SBBQ",
 
-"INCB",
-"INCW",
-"INCL",
-"INCQ",
+[CC_OP_LOGICB] = "LOGICB",
+[CC_OP_LOGICW] = "LOGICW",
+[CC_OP_LOGICL] = "LOGICL",
+[CC_OP_LOGICQ] = "LOGICQ",
 
-"DECB",
-"DECW",
-"DECL",
-"DECQ",
+[CC_OP_INCB] = "INCB",
+[CC_OP_INCW] = "INCW",
+[CC_OP_INCL] = "INCL",
+[CC_OP_INCQ] = "INCQ",
 
-"SHLB",
-"SHLW",
-"SHLL",
-"SHLQ",
+[CC_OP_DECB] = "DECB",
+[CC_OP_DECW] = "DECW",
+[CC_OP_DECL] = "DECL",
+[CC_OP_DECQ] = "DECQ",
 
-"SARB",
-"SARW",
-"SARL",
-"SARQ",
+[CC_OP_SHLB] = "SHLB",
+[CC_OP_SHLW] = "SHLW",
+[CC_OP_SHLL] = "SHLL",
+[CC_OP_SHLQ] = "SHLQ",
 
-"BMILGB",
-"BMILGW",
-"BMILGL",
-"BMILGQ",
+[CC_OP_SARB] = "SARB",
+[CC_OP_SARW] = "SARW",
+[CC_OP_SARL] = "SARL",
+[CC_OP_SARQ] = "SARQ",
 
-"ADCX",
-"ADOX",
-"ADCOX",
+[CC_OP_BMILGB] = "BMILGB",
+[CC_OP_BMILGW] = "BMILGW",
+[CC_OP_BMILGL] = "BMILGL",
+[CC_OP_BMILGQ] = "BMILGQ",
 
-"CLR",
+[CC_OP_POPCNT] = "POPCNT",
+[CC_OP_CLR] = "CLR",
 };
 
 static void
-- 
2.45.2

[PULL 07/23] target/i386: decode address before going back to translate.c

There are now relatively few unconverted opcodes in translate.c (there
are 13 of them including 8 for x87), and all of them have the same
format with a mod/rm byte and no immediate.  A good next step is
to remove the early bail out to disas_insn_x87/disas_insn_old,
instead giving these legacy translator functions the same prototype
as the other gen_* functions.

To do this, the X86DecodeInsn can be passed down to the places that
used to fetch address bytes from the instruction stream.  To make
sure that everything is done cleanly, the CPUX86State* argument is
removed.

As part of the unification, the gen_lea_modrm() name is now free,
so rename gen_load_ea() to gen_lea_modrm().  This is as good a name
and it makes the changes to translate.c easier to review.

Reviewed-by: Richard Henderson 
Signed-off-by: Paolo Bonzini 
---
 target/i386/tcg/decode-new.h |  14 ++-
 target/i386/tcg/translate.c  | 152 +--
 target/i386/tcg/decode-new.c.inc |  53 ++-
 target/i386/tcg/emit.c.inc   |   2 +-
 4 files changed, 103 insertions(+), 118 deletions(-)

diff --git a/target/i386/tcg/decode-new.h b/target/i386/tcg/decode-new.h
index e4cdf5e3c4f..bebc77bd54b 100644
--- a/target/i386/tcg/decode-new.h
+++ b/target/i386/tcg/decode-new.h
@@ -264,12 +264,13 @@ typedef enum X86VEXSpecial {
 
 typedef struct X86OpEntry  X86OpEntry;
 typedef struct X86DecodedInsn X86DecodedInsn;
+struct DisasContext;
 
 /* Decode function for multibyte opcodes.  */
-typedef void (*X86DecodeFunc)(DisasContext *s, CPUX86State *env, X86OpEntry 
*entry, uint8_t *b);
+typedef void (*X86DecodeFunc)(struct DisasContext *s, CPUX86State *env, 
X86OpEntry *entry, uint8_t *b);
 
 /* Code generation function.  */
-typedef void (*X86GenFunc)(DisasContext *s, X86DecodedInsn *decode);
+typedef void (*X86GenFunc)(struct DisasContext *s, X86DecodedInsn *decode);
 
 struct X86OpEntry {
 /* Based on the is_decode flags.  */
@@ -316,6 +317,14 @@ typedef struct X86DecodedOp {
 };
 } X86DecodedOp;
 
+typedef struct AddressParts {
+int def_seg;
+int base;
+int index;
+int scale;
+target_long disp;
+} AddressParts;
+
 struct X86DecodedInsn {
 X86OpEntry e;
 X86DecodedOp op[3];
@@ -333,3 +342,4 @@ struct X86DecodedInsn {
 uint8_t b;
 };
 
+static void gen_lea_modrm(struct DisasContext *s, X86DecodedInsn *decode);
diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index 08db40681fa..1d845ff66bb 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -29,6 +29,7 @@
 #include "exec/helper-proto.h"
 #include "exec/helper-gen.h"
 #include "helper-tcg.h"
+#include "decode-new.h"
 
 #include "exec/log.h"
 
@@ -1529,14 +1530,6 @@ static inline uint64_t x86_ldq_code(CPUX86State *env, 
DisasContext *s)
 
 /* Decompose an address.  */
 
-typedef struct AddressParts {
-int def_seg;
-int base;
-int index;
-int scale;
-target_long disp;
-} AddressParts;
-
 static AddressParts gen_lea_modrm_0(CPUX86State *env, DisasContext *s,
 int modrm)
 {
@@ -1695,24 +1688,11 @@ static TCGv gen_lea_modrm_1(DisasContext *s, 
AddressParts a, bool is_vsib)
 return ea;
 }
 
-static void gen_lea_modrm(CPUX86State *env, DisasContext *s, int modrm)
-{
-AddressParts a = gen_lea_modrm_0(env, s, modrm);
-TCGv ea = gen_lea_modrm_1(s, a, false);
-gen_lea_v_seg(s, ea, a.def_seg, s->override);
-}
-
-static void gen_nop_modrm(CPUX86State *env, DisasContext *s, int modrm)
-{
-(void)gen_lea_modrm_0(env, s, modrm);
-}
-
 /* Used for BNDCL, BNDCU, BNDCN.  */
-static void gen_bndck(CPUX86State *env, DisasContext *s, int modrm,
+static void gen_bndck(DisasContext *s, X86DecodedInsn *decode,
   TCGCond cond, TCGv_i64 bndv)
 {
-AddressParts a = gen_lea_modrm_0(env, s, modrm);
-TCGv ea = gen_lea_modrm_1(s, a, false);
+TCGv ea = gen_lea_modrm_1(s, decode->mem, false);
 
 tcg_gen_extu_tl_i64(s->tmp1_i64, ea);
 if (!CODE64(s)) {
@@ -1724,8 +1704,9 @@ static void gen_bndck(CPUX86State *env, DisasContext *s, 
int modrm,
 }
 
 /* generate modrm load of memory or register. */
-static void gen_ld_modrm(CPUX86State *env, DisasContext *s, int modrm, MemOp 
ot)
+static void gen_ld_modrm(DisasContext *s, X86DecodedInsn *decode, MemOp ot)
 {
+int modrm = s->modrm;
 int mod, rm;
 
 mod = (modrm >> 6) & 3;
@@ -1733,14 +1714,15 @@ static void gen_ld_modrm(CPUX86State *env, DisasContext 
*s, int modrm, MemOp ot)
 if (mod == 3) {
 gen_op_mov_v_reg(s, ot, s->T0, rm);
 } else {
-gen_lea_modrm(env, s, modrm);
+gen_lea_modrm(s, decode);
 gen_op_ld_v(s, ot, s->T0, s->A0);
 }
 }
 
 /* generate modrm store of memory or register. */
-static void gen_st_modrm(CPUX86State *env, DisasContext *s, int modrm, MemOp 
ot)
+static void gen_st_modrm(DisasContext *s, X86DecodedInsn *decode, MemOp ot)
 {
+int modrm = s->modrm;
 int mod, rm;
 
 mod =

[PULL 01/23] configure: detect --cpu=mipsisa64r6

Treat it as a MIPS64 machine.

Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Thomas Huth 
Signed-off-by: Paolo Bonzini 
---
 configure | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/configure b/configure
index 5ad1674ca5f..8b6a2f16ceb 100755
--- a/configure
+++ b/configure
@@ -450,7 +450,7 @@ case "$cpu" in
 linux_arch=loongarch
 ;;
 
-  mips64*)
+  mips64*|mipsisa64*)
 cpu=mips64
 host_arch=mips
 linux_arch=mips
-- 
2.45.2

[PULL 14/23] Revert "host/i386: assume presence of SSSE3"

This reverts commit 433cd6d94a8256af70a5200f236dc8047c3c1468.
The x86-64 instruction set can now be tuned down to x86-64 v1
or i386 Pentium Pro.

Signed-off-by: Paolo Bonzini 
---
 util/cpuinfo-i386.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/util/cpuinfo-i386.c b/util/cpuinfo-i386.c
index 6d474a6259a..ca74ef04f54 100644
--- a/util/cpuinfo-i386.c
+++ b/util/cpuinfo-i386.c
@@ -38,8 +38,8 @@ unsigned __attribute__((constructor)) cpuinfo_init(void)
 info |= (c & bit_POPCNT ? CPUINFO_POPCNT : 0);
 info |= (c & bit_PCLMUL ? CPUINFO_PCLMUL : 0);
 
-/* NOTE: our AES support requires SSSE3 (PSHUFB) as well. */
-info |= (c & bit_AES) ? CPUINFO_AES : 0;
+/* Our AES support requires PSHUFB as well. */
+info |= ((c & bit_AES) && (c & bit_SSSE3) ? CPUINFO_AES : 0);
 
 /* For AVX features, we must check available and usable. */
 if ((c & bit_AVX) && (c & bit_OSXSAVE)) {
-- 
2.45.2

[PULL 22/23] exec: avoid using C++ keywords in function parameters

From: Roman Kiryanov 

to use the QEMU headers with a C++ compiler.

Signed-off-by: Roman Kiryanov 
Link: https://lore.kernel.org/r/20240618224553.878869-1-r...@google.com
Signed-off-by: Paolo Bonzini 
---
 include/exec/memory.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 1be58f694c9..d7591a60d9f 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -945,7 +945,7 @@ struct MemoryListener {
  * the current transaction.
  */
 void (*log_start)(MemoryListener *listener, MemoryRegionSection *section,
-  int old, int new);
+  int old_val, int new_val);
 
 /**
  * @log_stop:
@@ -964,7 +964,7 @@ struct MemoryListener {
  * the current transaction.
  */
 void (*log_stop)(MemoryListener *listener, MemoryRegionSection *section,
- int old, int new);
+ int old_val, int new_val);
 
 /**
  * @log_sync:
-- 
2.45.2

[PULL 10/23] target/i386: list instructions still in translate.c

Group them so that it is easier to figure out which two-byte opcodes to
tackle together.

Reviewed-by: Richard Henderson 
Signed-off-by: Paolo Bonzini 
---
 target/i386/tcg/decode-new.c.inc | 31 +++
 1 file changed, 31 insertions(+)

diff --git a/target/i386/tcg/decode-new.c.inc b/target/i386/tcg/decode-new.c.inc
index fa51aadfcf2..f01a4f1f1fe 100644
--- a/target/i386/tcg/decode-new.c.inc
+++ b/target/i386/tcg/decode-new.c.inc
@@ -129,6 +129,37 @@
  *
  *(^)  these are the two cases in which Intel and AMD disagree on the
  * primary exception class
+ *
+ * Instructions still in translate.c
+ * -
+ * Generation of TCG opcodes for almost all instructions is in emit.c.inc;
+ * this file interprets the prefixes and opcode bytes down to individual
+ * instruction mnemonics.  There is only a handful of opcodes still using
+ * a switch statement to decode modrm bits 3-5 and prefixes after decoding
+ * is complete; these are relics of the older x86 decoder and their code
+ * generation is performed in translate.c.
+ *
+ * These unconverted opcodes also perform their own effective address
+ * generation using the gen_lea_modrm() function.
+ *
+ * There is nothing particularly complicated about them; simply, they don't
+ * need any nasty hacks in the decoder, and they shouldn't get in the way
+ * of the implementation of new x86 instructions, so they are left alone
+ * for the time being.
+ *
+ * x87:
+ * 0xD8 - 0xDF
+ *
+ * privileged/system:
+ * 0x0F 0x00   group 6 (SLDT, STR, LLDT, LTR, VERR, VERW)
+ * 0x0F 0x01   group 7 (SGDT, SIDT, LGDT, LIDT, SMSW, LMSW, INVLPG,
+ *  MONITOR, MWAIT, CLAC, STAC, XGETBV, XSETBV,
+ *  SWAPGS, RDTSCP)
+ * 0x0F 0xC7 (reg operand) group 9 (RDRAND, RDSEED, RDPID)
+ *
+ * MPX:
+ * 0x0F 0x1A   BNDLDX, BNDMOV, BNDCL, BNDCU
+ * 0x0F 0x1B   BNDSTX, BNDMOV, BNDMK, BNDCN
  */
 
 #define X86_OP_NONE { 0 },
-- 
2.45.2

[PULL 04/23] target/i386: give CC_OP_POPCNT low bits corresponding to MO_TL

Handle it like the other arithmetic cc_ops.  This simplifies a
bit the implementation of bit test instructions.

Reviewed-by: Richard Henderson 
Signed-off-by: Paolo Bonzini 
---
 target/i386/cpu.h   | 13 +++--
 target/i386/tcg/translate.c |  3 +--
 2 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index f54cd93b3f9..8504a7998fd 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1275,6 +1275,7 @@ typedef enum {
 CC_OP_ADCX, /* CC_DST = C, CC_SRC = rest.  */
 CC_OP_ADOX, /* CC_SRC2 = O, CC_SRC = rest.  */
 CC_OP_ADCOX, /* CC_DST = C, CC_SRC2 = O, CC_SRC = rest.  */
+CC_OP_CLR, /* Z and P set, all other flags clear.  */
 
 CC_OP_MULB, /* modify all flags, C, O = (CC_SRC != 0) */
 CC_OP_MULW,
@@ -1331,8 +1332,16 @@ typedef enum {
 CC_OP_BMILGL,
 CC_OP_BMILGQ,
 
-CC_OP_CLR, /* Z set, all other flags clear.  */
-CC_OP_POPCNT, /* Z via CC_DST, all other flags clear.  */
+/*
+ * Note that only CC_OP_POPCNT (i.e. the one with MO_TL size)
+ * is used or implemented, because the translation needs
+ * to zero-extend CC_DST anyway.
+ */
+CC_OP_POPCNTB__, /* Z via CC_DST, all other flags clear.  */
+CC_OP_POPCNTW__,
+CC_OP_POPCNTL__,
+CC_OP_POPCNTQ__,
+CC_OP_POPCNT = sizeof(target_ulong) == 8 ? CC_OP_POPCNTQ__ : 
CC_OP_POPCNTL__,
 
 CC_OP_NB,
 } CCOp;
diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index eb353dc3c9f..934c514e64f 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -1019,8 +1019,6 @@ static CCPrepare gen_prepare_eflags_z(DisasContext *s, 
TCGv reg)
  .imm = CC_Z };
 case CC_OP_CLR:
 return (CCPrepare) { .cond = TCG_COND_ALWAYS };
-case CC_OP_POPCNT:
-return (CCPrepare) { .cond = TCG_COND_EQ, .reg = cpu_cc_dst };
 default:
 {
 MemOp size = (s->cc_op - CC_OP_ADDB) & 3;
@@ -3177,6 +3175,7 @@ static void disas_insn_old(DisasContext *s, CPUState 
*cpu, int b)
 case CC_OP_SHLB ... CC_OP_SHLQ:
 case CC_OP_SARB ... CC_OP_SARQ:
 case CC_OP_BMILGB ... CC_OP_BMILGQ:
+case CC_OP_POPCNT:
 /* Z was going to be computed from the non-zero status of CC_DST.
We can get that same Z value (and the new C value) by leaving
CC_DST alone, setting CC_SRC, and using a CC_OP_SAR of the
-- 
2.45.2

[PULL 00/23] Misc changes for 2024-06-22

The following changes since commit 223696363bb117241ad9c2facbff0c474afa4104:

  Merge tag 'edgar/xilinx-queue-2024-06-17.for-upstream' of 
https://gitlab.com/edgar.iglesias/qemu into staging (2024-06-18 13:08:01 -0700)

are available in the Git repository at:

  https://gitlab.com/bonzini/qemu.git tags/for-upstream

for you to fetch changes up to b9b51004033983589e00fb4697f620b903cfcf0e:

  exec: don't use void* in pointer arithmetic in headers (2024-06-21 18:32:18 
+0200)


* configure: detect --cpu=mipsisa64r6
* target/i386: decode address before going back to translate.c
* meson: allow configuring the x86-64 baseline
* meson: remove dead optimization option
* exec: small changes to allow compilation with C++ in Android emulator


Paolo Bonzini (21):
  configure: detect --cpu=mipsisa64r6
  target/i386: fix CC_OP dump
  target/i386: use cpu_cc_dst for CC_OP_POPCNT
  target/i386: give CC_OP_POPCNT low bits corresponding to MO_TL
  target/i386: convert bit test instructions to new decoder
  target/i386: try not to force EFLAGS computation for CC_OP_ADOX/ADCX
  target/i386: decode address before going back to translate.c
  target/i386: convert CMPXCHG8B/CMPXCHG16B to new decoder
  target/i386: do not check PREFIX_LOCK in old-style decoder
  target/i386: list instructions still in translate.c
  target/i386: assert that cc_op* and pc_save are preserved
  target/i386: remove gen_ext_tl
  Revert "host/i386: assume presence of POPCNT"
  Revert "host/i386: assume presence of SSSE3"
  Revert "host/i386: assume presence of SSE2"
  meson: allow configuring the x86-64 baseline
  meson: remove dead optimization option
  block: make assertion more generic
  block: do not check bdrv_file_open
  block: remove separate bdrv_file_open callback
  block: rename former bdrv_file_open callbacks

Roman Kiryanov (2):
  exec: avoid using C++ keywords in function parameters
  exec: don't use void* in pointer arithmetic in headers

 configure|   2 +-
 meson.build  |  54 +++--
 host/include/i386/host/cpuinfo.h |   2 +
 include/block/block_int-common.h |   3 -
 include/exec/memory.h|   6 +-
 target/i386/cpu.h|  13 +-
 target/i386/tcg/decode-new.h |  19 +-
 tcg/i386/tcg-target.h|   5 +-
 block.c  |  17 +-
 block/blkdebug.c |   2 +-
 block/blkio.c|   8 +-
 block/blkverify.c|   2 +-
 block/curl.c |   8 +-
 block/file-posix.c   |   8 +-
 block/file-win32.c   |   4 +-
 block/gluster.c  |   6 +-
 block/iscsi.c|   4 +-
 block/nbd.c  |   6 +-
 block/nfs.c  |   2 +-
 block/null.c |   8 +-
 block/nvme.c |   8 +-
 block/rbd.c  |   3 +-
 block/ssh.c  |   6 +-
 block/vvfat.c|   2 +-
 target/i386/cpu-dump.c   | 101 
 target/i386/tcg/cc_helper.c  |   2 +-
 target/i386/tcg/translate.c  | 492 ---
 util/bufferiszero.c  |   4 +-
 util/cpuinfo-i386.c  |   6 +-
 target/i386/tcg/decode-new.c.inc | 136 ---
 target/i386/tcg/emit.c.inc   | 249 +++-
 meson_options.txt|   5 +-
 scripts/meson-buildoptions.sh|   6 +-
 33 files changed, 618 insertions(+), 581 deletions(-)
-- 
2.45.2

[PULL 05/23] target/i386: convert bit test instructions to new decoder

Reviewed-by: Richard Henderson 
Signed-off-by: Paolo Bonzini 
---
 target/i386/tcg/decode-new.h |   3 +
 target/i386/tcg/translate.c  | 147 +-
 target/i386/tcg/decode-new.c.inc |  40 ++---
 target/i386/tcg/emit.c.inc   | 149 ++-
 4 files changed, 181 insertions(+), 158 deletions(-)

diff --git a/target/i386/tcg/decode-new.h b/target/i386/tcg/decode-new.h
index f9bf9a60411..e4cdf5e3c4f 100644
--- a/target/i386/tcg/decode-new.h
+++ b/target/i386/tcg/decode-new.h
@@ -190,6 +190,9 @@ typedef enum X86InsnSpecial {
 /* Always locked if it has a memory operand (XCHG) */
 X86_SPECIAL_Locked,
 
+/* Like HasLock, but also operand 2 provides bit displacement into memory. 
 */
+X86_SPECIAL_BitTest,
+
 /* Do not load effective address in s->A0 */
 X86_SPECIAL_NoLoadEA,
 
diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index 934c514e64f..257110ac703 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -708,11 +708,6 @@ static TCGv gen_ext_tl(TCGv dst, TCGv src, MemOp size, 
bool sign)
 return dst;
 }
 
-static void gen_exts(MemOp ot, TCGv reg)
-{
-gen_ext_tl(reg, reg, ot, true);
-}
-
 static void gen_op_j_ecx(DisasContext *s, TCGCond cond, TCGLabel *label1)
 {
 TCGv tmp = gen_ext_tl(NULL, cpu_regs[R_ECX], s->aflag, false);
@@ -2985,7 +2980,7 @@ static void disas_insn_old(DisasContext *s, CPUState 
*cpu, int b)
 int prefixes = s->prefix;
 MemOp dflag = s->dflag;
 MemOp ot;
-int modrm, reg, rm, mod, op, val;
+int modrm, reg, rm, mod, op;
 
 /* now check op code */
 switch (b) {
@@ -3051,146 +3046,6 @@ static void disas_insn_old(DisasContext *s, CPUState 
*cpu, int b)
 }
 break;
 
-//
-/* bit operations */
-case 0x1ba: /* bt/bts/btr/btc Gv, im */
-ot = dflag;
-modrm = x86_ldub_code(env, s);
-op = (modrm >> 3) & 7;
-mod = (modrm >> 6) & 3;
-rm = (modrm & 7) | REX_B(s);
-if (mod != 3) {
-s->rip_offset = 1;
-gen_lea_modrm(env, s, modrm);
-if (!(s->prefix & PREFIX_LOCK)) {
-gen_op_ld_v(s, ot, s->T0, s->A0);
-}
-} else {
-gen_op_mov_v_reg(s, ot, s->T0, rm);
-}
-/* load shift */
-val = x86_ldub_code(env, s);
-tcg_gen_movi_tl(s->T1, val);
-if (op < 4)
-goto unknown_op;
-op -= 4;
-goto bt_op;
-case 0x1a3: /* bt Gv, Ev */
-op = 0;
-goto do_btx;
-case 0x1ab: /* bts */
-op = 1;
-goto do_btx;
-case 0x1b3: /* btr */
-op = 2;
-goto do_btx;
-case 0x1bb: /* btc */
-op = 3;
-do_btx:
-ot = dflag;
-modrm = x86_ldub_code(env, s);
-reg = ((modrm >> 3) & 7) | REX_R(s);
-mod = (modrm >> 6) & 3;
-rm = (modrm & 7) | REX_B(s);
-gen_op_mov_v_reg(s, MO_32, s->T1, reg);
-if (mod != 3) {
-AddressParts a = gen_lea_modrm_0(env, s, modrm);
-/* specific case: we need to add a displacement */
-gen_exts(ot, s->T1);
-tcg_gen_sari_tl(s->tmp0, s->T1, 3 + ot);
-tcg_gen_shli_tl(s->tmp0, s->tmp0, ot);
-tcg_gen_add_tl(s->A0, gen_lea_modrm_1(s, a, false), s->tmp0);
-gen_lea_v_seg(s, s->A0, a.def_seg, s->override);
-if (!(s->prefix & PREFIX_LOCK)) {
-gen_op_ld_v(s, ot, s->T0, s->A0);
-}
-} else {
-gen_op_mov_v_reg(s, ot, s->T0, rm);
-}
-bt_op:
-tcg_gen_andi_tl(s->T1, s->T1, (1 << (3 + ot)) - 1);
-tcg_gen_movi_tl(s->tmp0, 1);
-tcg_gen_shl_tl(s->tmp0, s->tmp0, s->T1);
-if (s->prefix & PREFIX_LOCK) {
-switch (op) {
-case 0: /* bt */
-/* Needs no atomic ops; we suppressed the normal
-   memory load for LOCK above so do it now.  */
-gen_op_ld_v(s, ot, s->T0, s->A0);
-break;
-case 1: /* bts */
-tcg_gen_atomic_fetch_or_tl(s->T0, s->A0, s->tmp0,
-   s->mem_index, ot | MO_LE);
-break;
-case 2: /* btr */
-tcg_gen_not_tl(s->tmp0, s->tmp0);
-tcg_gen_atomic_fetch_and_tl(s->T0, s->A0, s->tmp0,
-s->mem_index, ot | MO_LE);
-break;
-default:
-case 3: /* btc */
-tcg_gen_atomic_fetch_xor_tl(s->T0, s->A0, s->tmp0,
-s->mem_index, ot | MO_LE);
-break;
-}
-tcg_gen_shr_tl(s->tmp4, s->T0, s->T1);
-} else {
-tcg_gen_shr_tl(s->tmp4, s->T0, s->T1);
-switch (op) {
-case 0: /* bt */
-/* Data

[PULL 08/23] target/i386: convert CMPXCHG8B/CMPXCHG16B to new decoder

This moves the last LOCK-enabled instructions to the new decoder.  It is now
possible to assume that PREFIX_LOCK gen_multi0F is called only after checking
that LOCK was not specified.

The gen_cmpxchg8b and gen_cmpxchg16b functions even have the correct
prototype already; the only thing that needs to be done is removing the
gen_lea_modrm() call.

Reviewed-by: Richard Henderson 
Signed-off-by: Paolo Bonzini 
---
 target/i386/tcg/decode-new.h |   2 +
 target/i386/tcg/translate.c  | 121 +--
 target/i386/tcg/decode-new.c.inc |  34 ++---
 target/i386/tcg/emit.c.inc   |  96 
 4 files changed, 124 insertions(+), 129 deletions(-)

diff --git a/target/i386/tcg/decode-new.h b/target/i386/tcg/decode-new.h
index bebc77bd54b..7f23d373ea7 100644
--- a/target/i386/tcg/decode-new.h
+++ b/target/i386/tcg/decode-new.h
@@ -114,6 +114,8 @@ typedef enum X86CPUIDFeature {
 X86_FEAT_CLWB,
 X86_FEAT_CMOV,
 X86_FEAT_CMPCCXADD,
+X86_FEAT_CX8,
+X86_FEAT_CX16,
 X86_FEAT_F16C,
 X86_FEAT_FMA,
 X86_FEAT_FSGSBASE,
diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index 1d845ff66bb..c60f18c7482 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -2298,104 +2298,6 @@ static void gen_sty_env_A0(DisasContext *s, int offset, 
bool align)
 tcg_gen_qemu_st_i128(t, s->tmp0, mem_index, mop);
 }
 
-static void gen_cmpxchg8b(DisasContext *s, X86DecodedInsn *decode)
-{
-TCGv_i64 cmp, val, old;
-TCGv Z;
-
-gen_lea_modrm(s, decode);
-
-cmp = tcg_temp_new_i64();
-val = tcg_temp_new_i64();
-old = tcg_temp_new_i64();
-
-/* Construct the comparison values from the register pair. */
-tcg_gen_concat_tl_i64(cmp, cpu_regs[R_EAX], cpu_regs[R_EDX]);
-tcg_gen_concat_tl_i64(val, cpu_regs[R_EBX], cpu_regs[R_ECX]);
-
-/* Only require atomic with LOCK; non-parallel handled in generator. */
-if (s->prefix & PREFIX_LOCK) {
-tcg_gen_atomic_cmpxchg_i64(old, s->A0, cmp, val, s->mem_index, 
MO_TEUQ);
-} else {
-tcg_gen_nonatomic_cmpxchg_i64(old, s->A0, cmp, val,
-  s->mem_index, MO_TEUQ);
-}
-
-/* Set tmp0 to match the required value of Z. */
-tcg_gen_setcond_i64(TCG_COND_EQ, cmp, old, cmp);
-Z = tcg_temp_new();
-tcg_gen_trunc_i64_tl(Z, cmp);
-
-/*
- * Extract the result values for the register pair.
- * For 32-bit, we may do this unconditionally, because on success (Z=1),
- * the old value matches the previous value in EDX:EAX.  For x86_64,
- * the store must be conditional, because we must leave the source
- * registers unchanged on success, and zero-extend the writeback
- * on failure (Z=0).
- */
-if (TARGET_LONG_BITS == 32) {
-tcg_gen_extr_i64_tl(cpu_regs[R_EAX], cpu_regs[R_EDX], old);
-} else {
-TCGv zero = tcg_constant_tl(0);
-
-tcg_gen_extr_i64_tl(s->T0, s->T1, old);
-tcg_gen_movcond_tl(TCG_COND_EQ, cpu_regs[R_EAX], Z, zero,
-   s->T0, cpu_regs[R_EAX]);
-tcg_gen_movcond_tl(TCG_COND_EQ, cpu_regs[R_EDX], Z, zero,
-   s->T1, cpu_regs[R_EDX]);
-}
-
-/* Update Z. */
-gen_compute_eflags(s);
-tcg_gen_deposit_tl(cpu_cc_src, cpu_cc_src, Z, ctz32(CC_Z), 1);
-}
-
-#ifdef TARGET_X86_64
-static void gen_cmpxchg16b(DisasContext *s, X86DecodedInsn *decode)
-{
-MemOp mop = MO_TE | MO_128 | MO_ALIGN;
-TCGv_i64 t0, t1;
-TCGv_i128 cmp, val;
-
-gen_lea_modrm(s, decode);
-
-cmp = tcg_temp_new_i128();
-val = tcg_temp_new_i128();
-tcg_gen_concat_i64_i128(cmp, cpu_regs[R_EAX], cpu_regs[R_EDX]);
-tcg_gen_concat_i64_i128(val, cpu_regs[R_EBX], cpu_regs[R_ECX]);
-
-/* Only require atomic with LOCK; non-parallel handled in generator. */
-if (s->prefix & PREFIX_LOCK) {
-tcg_gen_atomic_cmpxchg_i128(val, s->A0, cmp, val, s->mem_index, mop);
-} else {
-tcg_gen_nonatomic_cmpxchg_i128(val, s->A0, cmp, val, s->mem_index, 
mop);
-}
-
-tcg_gen_extr_i128_i64(s->T0, s->T1, val);
-
-/* Determine success after the fact. */
-t0 = tcg_temp_new_i64();
-t1 = tcg_temp_new_i64();
-tcg_gen_xor_i64(t0, s->T0, cpu_regs[R_EAX]);
-tcg_gen_xor_i64(t1, s->T1, cpu_regs[R_EDX]);
-tcg_gen_or_i64(t0, t0, t1);
-
-/* Update Z. */
-gen_compute_eflags(s);
-tcg_gen_setcondi_i64(TCG_COND_EQ, t0, t0, 0);
-tcg_gen_deposit_tl(cpu_cc_src, cpu_cc_src, t0, ctz32(CC_Z), 1);
-
-/*
- * Extract the result values for the register pair.  We may do this
- * unconditionally, because on success (Z=1), the old value matches
- * the previous value in RDX:RAX.
- */
-tcg_gen_mov_i64(cpu_regs[R_EAX], s->T0);
-tcg_gen_mov_i64(cpu_regs[R_EDX], s->T1);
-}
-#endif
-
 #include "emit.c.inc"
 
 static void gen_x87(DisasContext *s, X86DecodedInsn *decode)
@@ -2971,29 +2873,10 @@ static void gen_multi0F(DisasContext

[PULL 23/23] exec: don't use void* in pointer arithmetic in headers

From: Roman Kiryanov 

void* pointer arithmetic is a GCC extentension which could not be
available in other build tools (e.g. C++). This changes removes this
assumption.

Signed-off-by: Roman Kiryanov 
Suggested-by: Paolo Bonzini 
Link: https://lore.kernel.org/r/20240620201654.598024-1-r...@google.com
Signed-off-by: Paolo Bonzini 
---
 include/exec/memory.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index d7591a60d9f..08ecd7e195d 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -2796,7 +2796,7 @@ MemTxResult address_space_write_rom(AddressSpace *as, 
hwaddr addr,
 #include "exec/memory_ldst_phys.h.inc"
 
 struct MemoryRegionCache {
-void *ptr;
+uint8_t *ptr;
 hwaddr xlat;
 hwaddr len;
 FlatView *fv;
-- 
2.45.2

[PULL 20/23] block: remove separate bdrv_file_open callback

bdrv_file_open and bdrv_open are completely equivalent, they are
never checked except to see which one to invoke.  So merge them
into a single one.

Signed-off-by: Paolo Bonzini 
---
 include/block/block_int-common.h | 3 ---
 block.c  | 4 +---
 block/blkdebug.c | 2 +-
 block/blkio.c| 2 +-
 block/blkverify.c| 2 +-
 block/curl.c | 8 
 block/file-posix.c   | 8 
 block/file-win32.c   | 4 ++--
 block/gluster.c  | 6 +++---
 block/iscsi.c| 4 ++--
 block/nbd.c  | 6 +++---
 block/nfs.c  | 2 +-
 block/null.c | 4 ++--
 block/nvme.c | 2 +-
 block/rbd.c  | 3 ++-
 block/ssh.c  | 2 +-
 block/vvfat.c| 2 +-
 17 files changed, 30 insertions(+), 34 deletions(-)

diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
index 761276127ed..ebb4e56a503 100644
--- a/include/block/block_int-common.h
+++ b/include/block/block_int-common.h
@@ -248,9 +248,6 @@ struct BlockDriver {
 int GRAPH_UNLOCKED_PTR (*bdrv_open)(
 BlockDriverState *bs, QDict *options, int flags, Error **errp);
 
-/* Protocol drivers should implement this instead of bdrv_open */
-int GRAPH_UNLOCKED_PTR (*bdrv_file_open)(
-BlockDriverState *bs, QDict *options, int flags, Error **errp);
 void (*bdrv_close)(BlockDriverState *bs);
 
 int coroutine_fn GRAPH_UNLOCKED_PTR (*bdrv_co_create)(
diff --git a/block.c b/block.c
index dd14ba85fc3..c1cc313d216 100644
--- a/block.c
+++ b/block.c
@@ -1655,9 +1655,7 @@ bdrv_open_driver(BlockDriverState *bs, BlockDriver *drv, 
const char *node_name,
 bs->opaque = g_malloc0(drv->instance_size);
 
 assert(!drv->bdrv_needs_filename || bs->filename[0]);
-if (drv->bdrv_file_open) {
-ret = drv->bdrv_file_open(bs, options, open_flags, _err);
-} else if (drv->bdrv_open) {
+if (drv->bdrv_open) {
 ret = drv->bdrv_open(bs, options, open_flags, _err);
 } else {
 ret = 0;
diff --git a/block/blkdebug.c b/block/blkdebug.c
index 9da8c9eddc2..c95c818c388 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -1073,7 +1073,7 @@ static BlockDriver bdrv_blkdebug = {
 .is_filter  = true,
 
 .bdrv_parse_filename= blkdebug_parse_filename,
-.bdrv_file_open = blkdebug_open,
+.bdrv_open  = blkdebug_open,
 .bdrv_close = blkdebug_close,
 .bdrv_reopen_prepare= blkdebug_reopen_prepare,
 .bdrv_child_perm= blkdebug_child_perm,
diff --git a/block/blkio.c b/block/blkio.c
index 882e1c297b4..1a38064ce76 100644
--- a/block/blkio.c
+++ b/block/blkio.c
@@ -1088,7 +1088,7 @@ static void blkio_refresh_limits(BlockDriverState *bs, 
Error **errp)
  */
 #define BLKIO_DRIVER_COMMON \
 .instance_size   = sizeof(BDRVBlkioState), \
-.bdrv_file_open  = blkio_file_open, \
+.bdrv_open   = blkio_file_open, \
 .bdrv_close  = blkio_close, \
 .bdrv_co_getlength   = blkio_co_getlength, \
 .bdrv_co_truncate= blkio_truncate, \
diff --git a/block/blkverify.c b/block/blkverify.c
index ec45d8335ed..5a9bf674d9c 100644
--- a/block/blkverify.c
+++ b/block/blkverify.c
@@ -321,7 +321,7 @@ static BlockDriver bdrv_blkverify = {
 .instance_size= sizeof(BDRVBlkverifyState),
 
 .bdrv_parse_filename  = blkverify_parse_filename,
-.bdrv_file_open   = blkverify_open,
+.bdrv_open= blkverify_open,
 .bdrv_close   = blkverify_close,
 .bdrv_child_perm  = bdrv_default_perms,
 .bdrv_co_getlength= blkverify_co_getlength,
diff --git a/block/curl.c b/block/curl.c
index 419f7c89ef2..ef5252d00b5 100644
--- a/block/curl.c
+++ b/block/curl.c
@@ -1034,7 +1034,7 @@ static BlockDriver bdrv_http = {
 
 .instance_size  = sizeof(BDRVCURLState),
 .bdrv_parse_filename= curl_parse_filename,
-.bdrv_file_open = curl_open,
+.bdrv_open  = curl_open,
 .bdrv_close = curl_close,
 .bdrv_co_getlength  = curl_co_getlength,
 
@@ -1053,7 +1053,7 @@ static BlockDriver bdrv_https = {
 
 .instance_size  = sizeof(BDRVCURLState),
 .bdrv_parse_filename= curl_parse_filename,
-.bdrv_file_open = curl_open,
+.bdrv_open  = curl_open,
 .bdrv_close = curl_close,
 .bdrv_co_getlength  = curl_co_getlength,
 
@@ -1072,7 +1072,7 @@ static BlockDriver bdrv_ftp = {
 
 .instance_size  = sizeof(BDRVCURLState),
 .bdrv_parse_filename= curl_parse_filename,
-.bdrv_file_open = curl_open,
+.bdrv_open

[PULL 19/23] block: do not check bdrv_file_open

The set of BlockDrivers that have .bdrv_file_open coincides with those
that have .protocol_name and guess what---checking drv->bdrv_file_open
is done to see if the driver is a protocol.  So check drv->protocol_name
instead.

Signed-off-by: Paolo Bonzini 
---
 block.c | 11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/block.c b/block.c
index 69a2905178a..dd14ba85fc3 100644
--- a/block.c
+++ b/block.c
@@ -926,7 +926,6 @@ BlockDriver *bdrv_find_protocol(const char *filename,
 int i;
 
 GLOBAL_STATE_CODE();
-/* TODO Drivers without bdrv_file_open must be specified explicitly */
 
 /*
  * XXX(hch): we really should not let host device detection
@@ -1983,7 +1982,7 @@ static int bdrv_open_common(BlockDriverState *bs, 
BlockBackend *file,
 open_flags = bdrv_open_flags(bs, bs->open_flags);
 node_name = qemu_opt_get(opts, "node-name");
 
-assert(!drv->bdrv_file_open || file == NULL);
+assert(!drv->protocol_name || file == NULL);
 ret = bdrv_open_driver(bs, drv, node_name, options, open_flags, errp);
 if (ret < 0) {
 goto fail_opts;
@@ -2084,7 +2083,7 @@ static int bdrv_fill_options(QDict **options, const char 
*filename,
 }
 /* If the user has explicitly specified the driver, this choice should
  * override the BDRV_O_PROTOCOL flag */
-protocol = drv->bdrv_file_open;
+protocol = drv->protocol_name;
 }
 
 if (protocol) {
@@ -4123,7 +4122,7 @@ bdrv_open_inherit(const char *filename, const char 
*reference, QDict *options,
 }
 
 /* BDRV_O_PROTOCOL must be set iff a protocol BDS is about to be created */
-assert(!!(flags & BDRV_O_PROTOCOL) == !!drv->bdrv_file_open);
+assert(!!(flags & BDRV_O_PROTOCOL) == !!drv->protocol_name);
 /* file must be NULL if a protocol BDS is about to be created
  * (the inverse results in an error message from bdrv_open_common()) */
 assert(!(flags & BDRV_O_PROTOCOL) || !file);
@@ -5971,7 +5970,7 @@ int64_t coroutine_fn 
bdrv_co_get_allocated_file_size(BlockDriverState *bs)
 return drv->bdrv_co_get_allocated_file_size(bs);
 }
 
-if (drv->bdrv_file_open) {
+if (drv->protocol_name) {
 /*
  * Protocol drivers default to -ENOTSUP (most of their data is
  * not stored in any of their children (if they even have any),
@@ -8030,7 +8029,7 @@ void bdrv_refresh_filename(BlockDriverState *bs)
  *   Both of these conditions are represented by 
generate_json_filename.
  */
 if (primary_child_bs->exact_filename[0] &&
-primary_child_bs->drv->bdrv_file_open &&
+primary_child_bs->drv->protocol_name &&
 !drv->is_filter && !generate_json_filename)
 {
 strcpy(bs->exact_filename, primary_child_bs->exact_filename);
-- 
2.45.2

[PULL 18/23] block: make assertion more generic

.bdrv_needs_filename is only set for drivers that also set bdrv_file_open,
i.e. protocol drivers.

So we can make the assertion always, it will always pass for those drivers
that use bdrv_open.

Signed-off-by: Paolo Bonzini 
---
 block.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block.c b/block.c
index 468cf5e67d7..69a2905178a 100644
--- a/block.c
+++ b/block.c
@@ -1655,8 +1655,8 @@ bdrv_open_driver(BlockDriverState *bs, BlockDriver *drv, 
const char *node_name,
 bs->drv = drv;
 bs->opaque = g_malloc0(drv->instance_size);
 
+assert(!drv->bdrv_needs_filename || bs->filename[0]);
 if (drv->bdrv_file_open) {
-assert(!drv->bdrv_needs_filename || bs->filename[0]);
 ret = drv->bdrv_file_open(bs, options, open_flags, _err);
 } else if (drv->bdrv_open) {
 ret = drv->bdrv_open(bs, options, open_flags, _err);
-- 
2.45.2

[PULL 12/23] target/i386: remove gen_ext_tl

With the introduction of tcg_gen_ext_tl, most uses can be converted directly
because they do not have a NULL destination.  tcg_gen_ext_tl is able to drop
no-ops like "tcg_gen_ext_tl(tcgv, tcgv, MO_TL)" just fine, and the only thing
that gen_ext_tl was adding on top was avoiding the creation of a useless
temporary.  This can be done in the only place where it matters, which is
gen_op_j_ecx.

Reviewed-by: Richard Henderson 
Signed-off-by: Paolo Bonzini 
---
 target/i386/tcg/translate.c | 41 +++--
 1 file changed, 17 insertions(+), 24 deletions(-)

diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index d11c5e1dc13..5c9c992400e 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -697,23 +697,16 @@ static inline TCGv gen_compute_Dshift(DisasContext *s, 
MemOp ot)
 return dshift;
 };
 
-static TCGv gen_ext_tl(TCGv dst, TCGv src, MemOp size, bool sign)
-{
-if (size == MO_TL) {
-return src;
-}
-if (!dst) {
-dst = tcg_temp_new();
-}
-tcg_gen_ext_tl(dst, src, size | (sign ? MO_SIGN : 0));
-return dst;
-}
-
 static void gen_op_j_ecx(DisasContext *s, TCGCond cond, TCGLabel *label1)
 {
-TCGv tmp = gen_ext_tl(NULL, cpu_regs[R_ECX], s->aflag, false);
-
-tcg_gen_brcondi_tl(cond, tmp, 0, label1);
+TCGv lhs;
+if (s->aflag == MO_TL) {
+lhs = cpu_regs[R_ECX];
+} else {
+lhs = tcg_temp_new();
+tcg_gen_ext_tl(lhs, cpu_regs[R_ECX], s->aflag);
+}
+tcg_gen_brcondi_tl(cond, lhs, 0, label1);
 }
 
 static inline void gen_op_jz_ecx(DisasContext *s, TCGLabel *label1)
@@ -886,16 +879,16 @@ static CCPrepare gen_prepare_eflags_c(DisasContext *s, 
TCGv reg)
 case CC_OP_SUBB ... CC_OP_SUBQ:
 /* (DATA_TYPE)CC_SRCT < (DATA_TYPE)CC_SRC */
 size = s->cc_op - CC_OP_SUBB;
-gen_ext_tl(s->cc_srcT, s->cc_srcT, size, false);
-gen_ext_tl(cpu_cc_src, cpu_cc_src, size, false);
+tcg_gen_ext_tl(s->cc_srcT, s->cc_srcT, size);
+tcg_gen_ext_tl(cpu_cc_src, cpu_cc_src, size);
 return (CCPrepare) { .cond = TCG_COND_LTU, .reg = s->cc_srcT,
  .reg2 = cpu_cc_src, .use_reg2 = true };
 
 case CC_OP_ADDB ... CC_OP_ADDQ:
 /* (DATA_TYPE)CC_DST < (DATA_TYPE)CC_SRC */
 size = s->cc_op - CC_OP_ADDB;
-gen_ext_tl(cpu_cc_dst, cpu_cc_dst, size, false);
-gen_ext_tl(cpu_cc_src, cpu_cc_src, size, false);
+tcg_gen_ext_tl(cpu_cc_dst, cpu_cc_dst, size);
+tcg_gen_ext_tl(cpu_cc_src, cpu_cc_src, size);
 return (CCPrepare) { .cond = TCG_COND_LTU, .reg = cpu_cc_dst,
  .reg2 = cpu_cc_src, .use_reg2 = true };
 
@@ -920,7 +913,7 @@ static CCPrepare gen_prepare_eflags_c(DisasContext *s, TCGv 
reg)
 
 case CC_OP_BMILGB ... CC_OP_BMILGQ:
 size = s->cc_op - CC_OP_BMILGB;
-gen_ext_tl(cpu_cc_src, cpu_cc_src, size, false);
+tcg_gen_ext_tl(cpu_cc_src, cpu_cc_src, size);
 return (CCPrepare) { .cond = TCG_COND_EQ, .reg = cpu_cc_src };
 
 case CC_OP_ADCX:
@@ -1050,8 +1043,8 @@ static CCPrepare gen_prepare_cc(DisasContext *s, int b, 
TCGv reg)
 size = s->cc_op - CC_OP_SUBB;
 switch (jcc_op) {
 case JCC_BE:
-gen_ext_tl(s->cc_srcT, s->cc_srcT, size, false);
-gen_ext_tl(cpu_cc_src, cpu_cc_src, size, false);
+tcg_gen_ext_tl(s->cc_srcT, s->cc_srcT, size);
+tcg_gen_ext_tl(cpu_cc_src, cpu_cc_src, size);
 cc = (CCPrepare) { .cond = TCG_COND_LEU, .reg = s->cc_srcT,
.reg2 = cpu_cc_src, .use_reg2 = true };
 break;
@@ -1061,8 +1054,8 @@ static CCPrepare gen_prepare_cc(DisasContext *s, int b, 
TCGv reg)
 case JCC_LE:
 cond = TCG_COND_LE;
 fast_jcc_l:
-gen_ext_tl(s->cc_srcT, s->cc_srcT, size, true);
-gen_ext_tl(cpu_cc_src, cpu_cc_src, size, true);
+tcg_gen_ext_tl(s->cc_srcT, s->cc_srcT, size | MO_SIGN);
+tcg_gen_ext_tl(cpu_cc_src, cpu_cc_src, size | MO_SIGN);
 cc = (CCPrepare) { .cond = cond, .reg = s->cc_srcT,
.reg2 = cpu_cc_src, .use_reg2 = true };
 break;
-- 
2.45.2

[PULL 16/23] meson: allow configuring the x86-64 baseline

Add a Meson option to configure which x86-64 instruction
set to use.  QEMU will now default to x86-64-v1 + cmpxchg16b for
64-bit builds (that corresponds to a Pentium 4 for 32-bit builds).

The baseline can be tuned down to Pentium Pro for 32-bit builds (with
-Dx86_version=0), or up as desired.

Acked-by: Richard Henderson 
Signed-off-by: Paolo Bonzini 
---
 meson.build   | 41 ---
 meson_options.txt |  3 +++
 scripts/meson-buildoptions.sh |  3 +++
 3 files changed, 39 insertions(+), 8 deletions(-)

diff --git a/meson.build b/meson.build
index 97e00d6f59b..6e694ecd9fe 100644
--- a/meson.build
+++ b/meson.build
@@ -336,15 +336,40 @@ if host_arch == 'i386' and not cc.links('''
   qemu_common_flags = ['-march=i486'] + qemu_common_flags
 endif
 
-# Assume x86-64-v2 (minus CMPXCHG16B for 32-bit code)
-if host_arch == 'i386'
-  qemu_common_flags = ['-mfpmath=sse'] + qemu_common_flags
-endif
+# Pick x86-64 baseline version
 if host_arch in ['i386', 'x86_64']
-  qemu_common_flags = ['-mpopcnt', '-msse4.2'] + qemu_common_flags
-endif
-if host_arch == 'x86_64'
-  qemu_common_flags = ['-mcx16'] + qemu_common_flags
+  if get_option('x86_version') == '0' and host_arch == 'x86_64'
+error('x86_64-v1 required for x86-64 hosts')
+  endif
+
+  # add flags for individual instruction set extensions
+  if get_option('x86_version') >= '1'
+if host_arch == 'i386'
+  qemu_common_flags = ['-mfpmath=sse'] + qemu_common_flags
+else
+  # present on basically all processors but technically not part of
+  # x86-64-v1, so only include -mneeded for x86-64 version 2 and above
+  qemu_common_flags = ['-mcx16'] + qemu_common_flags
+endif
+  endif
+  if get_option('x86_version') >= '2'
+qemu_common_flags = ['-mpopcnt'] + qemu_common_flags
+qemu_common_flags = cc.get_supported_arguments('-mneeded') + 
qemu_common_flags
+  endif
+  if get_option('x86_version') >= '3'
+qemu_common_flags = ['-mmovbe', '-mabm', '-mbmi1', '-mbmi2', '-mfma', 
'-mf16c'] + qemu_common_flags
+  endif
+
+  # add required vector instruction set (each level implies those below)
+  if get_option('x86_version') == '1'
+qemu_common_flags = ['-msse2'] + qemu_common_flags
+  elif get_option('x86_version') == '2'
+qemu_common_flags = ['-msse4.2'] + qemu_common_flags
+  elif get_option('x86_version') == '3'
+qemu_common_flags = ['-mavx2'] + qemu_common_flags
+  elif get_option('x86_version') == '4'
+qemu_common_flags = ['-mavx512f', '-mavx512bw', '-mavx512cd', 
'-mavx512dq', '-mavx512vl'] + qemu_common_flags
+  endif
 endif
 
 if get_option('prefer_static')
diff --git a/meson_options.txt b/meson_options.txt
index 7a79dd89706..6065ed2d352 100644
--- a/meson_options.txt
+++ b/meson_options.txt
@@ -370,3 +370,6 @@ option('qemu_ga_version', type: 'string', value: '',
 
 option('hexagon_idef_parser', type : 'boolean', value : true,
description: 'use idef-parser to automatically generate TCG code for 
the Hexagon frontend')
+
+option('x86_version', type : 'combo', choices : ['0', '1', '2', '3', '4'], 
value: '1',
+   description: 'tweak required x86_64 architecture version beyond 
compiler default')
diff --git a/scripts/meson-buildoptions.sh b/scripts/meson-buildoptions.sh
index 58d49a447d5..62842d47e88 100644
--- a/scripts/meson-buildoptions.sh
+++ b/scripts/meson-buildoptions.sh
@@ -82,6 +82,8 @@ meson_options_help() {
   printf "%s\n" '  --with-suffix=VALUE  Suffix for QEMU 
data/modules/config directories'
   printf "%s\n" '   (can be empty) [qemu]'
   printf "%s\n" '  --with-trace-file=VALUE  Trace file prefix for simple 
backend [trace]'
+  printf "%s\n" '  --x86-version=CHOICE tweak required x86_64 architecture 
version beyond'
+  printf "%s\n" '   compiler default [1] (choices: 
0/1/2/3)'
   printf "%s\n" ''
   printf "%s\n" 'Optional features, enabled with --enable-FEATURE and'
   printf "%s\n" 'disabled with --disable-FEATURE, default is enabled if 
available'
@@ -552,6 +554,7 @@ _meson_option_parse() {
 --disable-werror) printf "%s" -Dwerror=false ;;
 --enable-whpx) printf "%s" -Dwhpx=enabled ;;
 --disable-whpx) printf "%s" -Dwhpx=disabled ;;
+--x86-version=*) quote_sh "-Dx86_version=$2" ;;
 --enable-xen) printf "%s" -Dxen=enabled ;;
 --disable-xen) printf "%s" -Dxen=disabled ;;
 --enable-xen-pci-passthrough) printf "%s" -Dxen_pci_passthrough=enabled ;;
-- 
2.45.2

[PULL 13/23] Revert "host/i386: assume presence of POPCNT"

This reverts commit 45ccdbcb24baf99667997fac5cf60318e5e7db51.
The x86-64 instruction set can now be tuned down to x86-64 v1
or i386 Pentium Pro.

Signed-off-by: Paolo Bonzini 
---
 host/include/i386/host/cpuinfo.h | 1 +
 tcg/i386/tcg-target.h| 5 +++--
 util/cpuinfo-i386.c  | 1 +
 3 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/host/include/i386/host/cpuinfo.h b/host/include/i386/host/cpuinfo.h
index c1e94d75ce1..72f6fad61e5 100644
--- a/host/include/i386/host/cpuinfo.h
+++ b/host/include/i386/host/cpuinfo.h
@@ -11,6 +11,7 @@
 #define CPUINFO_ALWAYS  (1u << 0)  /* so cpuinfo is nonzero */
 #define CPUINFO_MOVBE   (1u << 2)
 #define CPUINFO_LZCNT   (1u << 3)
+#define CPUINFO_POPCNT  (1u << 4)
 #define CPUINFO_BMI1(1u << 5)
 #define CPUINFO_BMI2(1u << 6)
 #define CPUINFO_AVX1(1u << 9)
diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index ecc69827287..2f67a97e059 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -111,6 +111,7 @@ typedef enum {
 #endif
 
 #define have_bmi1 (cpuinfo & CPUINFO_BMI1)
+#define have_popcnt   (cpuinfo & CPUINFO_POPCNT)
 #define have_avx1 (cpuinfo & CPUINFO_AVX1)
 #define have_avx2 (cpuinfo & CPUINFO_AVX2)
 #define have_movbe(cpuinfo & CPUINFO_MOVBE)
@@ -142,7 +143,7 @@ typedef enum {
 #define TCG_TARGET_HAS_nor_i32  0
 #define TCG_TARGET_HAS_clz_i32  1
 #define TCG_TARGET_HAS_ctz_i32  1
-#define TCG_TARGET_HAS_ctpop_i321
+#define TCG_TARGET_HAS_ctpop_i32have_popcnt
 #define TCG_TARGET_HAS_deposit_i32  1
 #define TCG_TARGET_HAS_extract_i32  1
 #define TCG_TARGET_HAS_sextract_i32 1
@@ -177,7 +178,7 @@ typedef enum {
 #define TCG_TARGET_HAS_nor_i64  0
 #define TCG_TARGET_HAS_clz_i64  1
 #define TCG_TARGET_HAS_ctz_i64  1
-#define TCG_TARGET_HAS_ctpop_i641
+#define TCG_TARGET_HAS_ctpop_i64have_popcnt
 #define TCG_TARGET_HAS_deposit_i64  1
 #define TCG_TARGET_HAS_extract_i64  1
 #define TCG_TARGET_HAS_sextract_i64 0
diff --git a/util/cpuinfo-i386.c b/util/cpuinfo-i386.c
index 8f2694d88f2..6d474a6259a 100644
--- a/util/cpuinfo-i386.c
+++ b/util/cpuinfo-i386.c
@@ -35,6 +35,7 @@ unsigned __attribute__((constructor)) cpuinfo_init(void)
 __cpuid(1, a, b, c, d);
 
 info |= (c & bit_MOVBE ? CPUINFO_MOVBE : 0);
+info |= (c & bit_POPCNT ? CPUINFO_POPCNT : 0);
 info |= (c & bit_PCLMUL ? CPUINFO_PCLMUL : 0);
 
 /* NOTE: our AES support requires SSSE3 (PSHUFB) as well. */
-- 
2.45.2

[PULL 09/23] target/i386: do not check PREFIX_LOCK in old-style decoder

It is already checked before getting there.

Reviewed-by: Richard Henderson 
Signed-off-by: Paolo Bonzini 
---
 target/i386/tcg/translate.c | 26 --
 1 file changed, 8 insertions(+), 18 deletions(-)

diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index c60f18c7482..501a1ef9313 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -2878,7 +2878,7 @@ static void gen_multi0F(DisasContext *s, X86DecodedInsn 
*decode)
 switch ((modrm >> 3) & 7) {
 case 7:
 if (mod != 3 ||
-(s->prefix & (PREFIX_LOCK | PREFIX_REPNZ))) {
+(s->prefix & PREFIX_REPNZ)) {
 goto illegal_op;
 }
 if (s->prefix & PREFIX_REPZ) {
@@ -2898,7 +2898,7 @@ static void gen_multi0F(DisasContext *s, X86DecodedInsn 
*decode)
 
 case 6: /* RDRAND */
 if (mod != 3 ||
-(s->prefix & (PREFIX_LOCK | PREFIX_REPZ | PREFIX_REPNZ)) ||
+(s->prefix & (PREFIX_REPZ | PREFIX_REPNZ)) ||
 !(s->cpuid_ext_features & CPUID_EXT_RDRAND)) {
 goto illegal_op;
 }
@@ -3058,8 +3058,7 @@ static void gen_multi0F(DisasContext *s, X86DecodedInsn 
*decode)
 
 case 0xd0: /* xgetbv */
 if ((s->cpuid_ext_features & CPUID_EXT_XSAVE) == 0
-|| (s->prefix & (PREFIX_LOCK | PREFIX_DATA
- | PREFIX_REPZ | PREFIX_REPNZ))) {
+|| (s->prefix & (PREFIX_DATA | PREFIX_REPZ | PREFIX_REPNZ))) {
 goto illegal_op;
 }
 tcg_gen_trunc_tl_i32(s->tmp2_i32, cpu_regs[R_ECX]);
@@ -3069,8 +3068,7 @@ static void gen_multi0F(DisasContext *s, X86DecodedInsn 
*decode)
 
 case 0xd1: /* xsetbv */
 if ((s->cpuid_ext_features & CPUID_EXT_XSAVE) == 0
-|| (s->prefix & (PREFIX_LOCK | PREFIX_DATA
- | PREFIX_REPZ | PREFIX_REPNZ))) {
+|| (s->prefix & (PREFIX_DATA | PREFIX_REPZ | PREFIX_REPNZ))) {
 goto illegal_op;
 }
 gen_svm_check_intercept(s, SVM_EXIT_XSETBV);
@@ -3237,8 +3235,7 @@ static void gen_multi0F(DisasContext *s, X86DecodedInsn 
*decode)
 gen_st_modrm(s, decode, ot);
 break;
 case 0xee: /* rdpkru */
-if (s->prefix & (PREFIX_LOCK | PREFIX_DATA
- | PREFIX_REPZ | PREFIX_REPNZ)) {
+if (s->prefix & (PREFIX_DATA | PREFIX_REPZ | PREFIX_REPNZ)) {
 goto illegal_op;
 }
 tcg_gen_trunc_tl_i32(s->tmp2_i32, cpu_regs[R_ECX]);
@@ -3246,8 +3243,7 @@ static void gen_multi0F(DisasContext *s, X86DecodedInsn 
*decode)
 tcg_gen_extr_i64_tl(cpu_regs[R_EAX], cpu_regs[R_EDX], s->tmp1_i64);
 break;
 case 0xef: /* wrpkru */
-if (s->prefix & (PREFIX_LOCK | PREFIX_DATA
- | PREFIX_REPZ | PREFIX_REPNZ)) {
+if (s->prefix & (PREFIX_DATA | PREFIX_REPZ | PREFIX_REPNZ)) {
 goto illegal_op;
 }
 tcg_gen_concat_tl_i64(s->tmp1_i64, cpu_regs[R_EAX],
@@ -3323,7 +3319,6 @@ static void gen_multi0F(DisasContext *s, X86DecodedInsn 
*decode)
 if (prefixes & PREFIX_REPZ) {
 /* bndcl */
 if (reg >= 4
-|| (prefixes & PREFIX_LOCK)
 || s->aflag == MO_16) {
 goto illegal_op;
 }
@@ -3331,7 +3326,6 @@ static void gen_multi0F(DisasContext *s, X86DecodedInsn 
*decode)
 } else if (prefixes & PREFIX_REPNZ) {
 /* bndcu */
 if (reg >= 4
-|| (prefixes & PREFIX_LOCK)
 || s->aflag == MO_16) {
 goto illegal_op;
 }
@@ -3345,7 +3339,7 @@ static void gen_multi0F(DisasContext *s, X86DecodedInsn 
*decode)
 }
 if (mod == 3) {
 int reg2 = (modrm & 7) | REX_B(s);
-if (reg2 >= 4 || (prefixes & PREFIX_LOCK)) {
+if (reg2 >= 4) {
 goto illegal_op;
 }
 if (s->flags & HF_MPX_IU_MASK) {
@@ -3374,7 +3368,6 @@ static void gen_multi0F(DisasContext *s, X86DecodedInsn 
*decode)
 /* bndldx */
 AddressParts a = decode->mem;
 if (reg >= 4
-|| (prefixes & PREFIX_LOCK)
 || s->aflag == MO_16
 || a.base < -1) {
 goto illegal_op;
@@ -3410,7 +3403,6 @@ static void gen_multi0F(DisasContext *s, X86DecodedInsn 
*decode)
 if (mod != 3 && (prefixes & PREFIX_REPZ)) {
 /* bndmk */
 if (reg >= 4
-|| (prefixes & PREFIX_LOCK)
 || s->aflag == MO_16) {

[PULL 03/23] target/i386: use cpu_cc_dst for CC_OP_POPCNT

It is the only CCOp, among those that compute ZF from one of the cc_op_*
registers, that uses cpu_cc_src.  Do not make it the odd one off,
instead use cpu_cc_dst like the others.

Reviewed-by: Richard Henderson 
Signed-off-by: Paolo Bonzini 
---
 target/i386/cpu.h   | 2 +-
 target/i386/tcg/cc_helper.c | 2 +-
 target/i386/tcg/translate.c | 4 ++--
 target/i386/tcg/emit.c.inc  | 4 ++--
 4 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 7e2a9b56aea..f54cd93b3f9 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1332,7 +1332,7 @@ typedef enum {
 CC_OP_BMILGQ,
 
 CC_OP_CLR, /* Z set, all other flags clear.  */
-CC_OP_POPCNT, /* Z via CC_SRC, all other flags clear.  */
+CC_OP_POPCNT, /* Z via CC_DST, all other flags clear.  */
 
 CC_OP_NB,
 } CCOp;
diff --git a/target/i386/tcg/cc_helper.c b/target/i386/tcg/cc_helper.c
index f76e9cb8cfb..301ed954064 100644
--- a/target/i386/tcg/cc_helper.c
+++ b/target/i386/tcg/cc_helper.c
@@ -107,7 +107,7 @@ target_ulong helper_cc_compute_all(target_ulong dst, 
target_ulong src1,
 case CC_OP_CLR:
 return CC_Z | CC_P;
 case CC_OP_POPCNT:
-return src1 ? 0 : CC_Z;
+return dst ? 0 : CC_Z;
 
 case CC_OP_MULB:
 return compute_all_mulb(dst, src1);
diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index ad1819815ab..eb353dc3c9f 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -324,7 +324,7 @@ static const uint8_t cc_op_live[CC_OP_NB] = {
 [CC_OP_ADOX] = USES_CC_SRC | USES_CC_SRC2,
 [CC_OP_ADCOX] = USES_CC_DST | USES_CC_SRC | USES_CC_SRC2,
 [CC_OP_CLR] = 0,
-[CC_OP_POPCNT] = USES_CC_SRC,
+[CC_OP_POPCNT] = USES_CC_DST,
 };
 
 static void set_cc_op_1(DisasContext *s, CCOp op, bool dirty)
@@ -1020,7 +1020,7 @@ static CCPrepare gen_prepare_eflags_z(DisasContext *s, 
TCGv reg)
 case CC_OP_CLR:
 return (CCPrepare) { .cond = TCG_COND_ALWAYS };
 case CC_OP_POPCNT:
-return (CCPrepare) { .cond = TCG_COND_EQ, .reg = cpu_cc_src };
+return (CCPrepare) { .cond = TCG_COND_EQ, .reg = cpu_cc_dst };
 default:
 {
 MemOp size = (s->cc_op - CC_OP_ADDB) & 3;
diff --git a/target/i386/tcg/emit.c.inc b/target/i386/tcg/emit.c.inc
index 11faa70b5e2..fc7477833bc 100644
--- a/target/i386/tcg/emit.c.inc
+++ b/target/i386/tcg/emit.c.inc
@@ -2804,10 +2804,10 @@ static void gen_POPA(DisasContext *s, X86DecodedInsn 
*decode)
 
 static void gen_POPCNT(DisasContext *s, X86DecodedInsn *decode)
 {
-decode->cc_src = tcg_temp_new();
+decode->cc_dst = tcg_temp_new();
 decode->cc_op = CC_OP_POPCNT;
 
-tcg_gen_mov_tl(decode->cc_src, s->T0);
+tcg_gen_mov_tl(decode->cc_dst, s->T0);
 tcg_gen_ctpop_tl(s->T0, s->T0);
 }
 
-- 
2.45.2

[PULL 11/23] target/i386: assert that cc_op* and pc_save are preserved

Now all decoding has been done before any code generation.
There is no need anymore to save and restore cc_op* and
pc_save but, for the time being, assert that this is indeed
the case.

Reviewed-by: Richard Henderson 
Signed-off-by: Paolo Bonzini 
---
 target/i386/tcg/translate.c | 12 +++-
 1 file changed, 3 insertions(+), 9 deletions(-)

diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index 501a1ef9313..d11c5e1dc13 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -3709,15 +3709,9 @@ static void i386_tr_translate_insn(DisasContextBase 
*dcbase, CPUState *cpu)
 case 2:
 /* Restore state that may affect the next instruction. */
 dc->pc = dc->base.pc_next;
-/*
- * TODO: These save/restore can be removed after the table-based
- * decoder is complete; we will be decoding the insn completely
- * before any code generation that might affect these variables.
- */
-dc->cc_op_dirty = orig_cc_op_dirty;
-dc->cc_op = orig_cc_op;
-dc->pc_save = orig_pc_save;
-/* END TODO */
+assert(dc->cc_op_dirty == orig_cc_op_dirty);
+assert(dc->cc_op == orig_cc_op);
+assert(dc->pc_save == orig_pc_save);
 dc->base.num_insns--;
 tcg_remove_ops_after(dc->prev_insn_end);
 dc->base.insn_start = dc->prev_insn_start;
-- 
2.45.2

[PULL 06/23] target/i386: try not to force EFLAGS computation for CC_OP_ADOX/ADCX