Re: [PATCH] target: ppc: Correctly initialize HILE in HID-0 for book3s processors

2023-04-27 Thread Vaibhav Jain


Hi Fabiano,

Thanks for looking into this patch and apologies for the delayed response.
Fabiano Rosas  writes:

> Narayana Murty N  writes:
>
>> On PPC64 the HILE (Hypervisor Interrupt Little Endian) bit in the HID0
>> register needs to be initialized as per ISA 3.0b [1] section
>> 2.10. This bit gets copied to MSR_LE when handling interrupts that
>> are handled in HV mode to establish the endianness mode of the interrupt
>> handler.
>>
>> Qemu's ppc_interrupts_little_endian() depends on HILE to determine host
>> endianness, which is then used to determine the endianness of the guest dump.
>>
>
> Not quite. We use the interrupt endianness as a proxy to guest
> endianness to avoid reading MSR_LE at an inopportune moment when the
> guest is switching endianness.
Agreed

> This is not dependent on host
> endianness. The HILE check is used when taking a memory dump of a
> HV-capable machine such as the emulated powernv.

I think one concern the patch tries to address is that the guest memory dump
file generated for a Big-Endian (BE) guest on a Little-Endian (LE) host is not
readable on the same LE host, since 'crash' doesn't support cross-endianness
dumps. Also, even for an LE guest on an LE host the memory dumps are marked as
BE, making it impossible to analyze any guest memory dumps on the host.

However, setting HILE based on the host endianness of qemu might not be
the right way to fix this problem. Based on an off-list discussion
with Narayana, he is working on another patch which doesn't set HILE
based on host endianness. The problem seems to stem from the
fact that qemu on KVM uses HILE to set up the endianness of the
memory-dump ELF, and since HILE is not set up correctly the memory dumps
are in the wrong endianness.

> I think the actual issue might be that we're calling
> ppc_interrupts_little_endian with hv=true for the dump.
>
Yes, that is currently the case with cpu_get_dump_info(). The excerpt
below from that function sets the endianness of the dump:

    if (ppc_interrupts_little_endian(cpu, cpu->env.has_hv_mode)) {
        info->d_endian = ELFDATA2LSB;
    } else {
        info->d_endian = ELFDATA2MSB;
    }

For a pseries KVM guest, cpu->env.has_hv_mode is already set, hence
ppc_interrupts_little_endian() assumes it is running in 'hv' mode. The new
patch from Narayana will address this.
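
To make the effect of the 'hv' argument concrete, here is a minimal
standalone sketch. It is not QEMU code: the toy_* names, the struct and
its fields are hypothetical stand-ins, and the assumption that the
non-hv path consults a separate guest interrupt-endianness bit (called
'ile' here) is mine. It only illustrates that the same CPU state can
yield a BE or an LE dump header depending on which bit is consulted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* ELF data-encoding values (ELFDATA2LSB = 1, ELFDATA2MSB = 2). */
enum dump_endian { DUMP_ELFDATA2LSB = 1, DUMP_ELFDATA2MSB = 2 };

/* Hypothetical stand-in for the CPU state; not QEMU's real layout. */
struct toy_cpu {
    bool has_hv_mode; /* CPU model is capable of hypervisor mode */
    bool hile;        /* endianness of interrupts taken in HV mode */
    bool ile;         /* assumed: endianness of guest OS interrupts */
};

/* Pick the interrupt-endianness bit that matches the requested mode. */
static bool toy_interrupts_little_endian(const struct toy_cpu *cpu, bool hv)
{
    return hv ? cpu->hile : cpu->ile;
}

/* Mirrors the excerpt: 'hv' decides which bit drives the ELF header. */
static enum dump_endian toy_dump_endian(const struct toy_cpu *cpu, bool hv)
{
    assert(cpu != NULL);
    return toy_interrupts_little_endian(cpu, hv) ? DUMP_ELFDATA2LSB
                                                 : DUMP_ELFDATA2MSB;
}
```

With hile clear and ile set, passing has_hv_mode as 'hv' gives a BE
header even though the guest takes its interrupts in LE, which matches
the symptom described above.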

>> Currently the HILE bit is never set in the HID0 register even if the
>> qemu is running in Little-Endian mode. This causes the guest dumps to be
>> always taken in Big-Endian byte ordering. A guest memory dump of a
>> Little-Endian guest running on Little-Endian qemu guest fails with the
>> crash tool as illustrated below:
>>
>
> Could you describe in more detail what is your setup? Specifically
> whether both guests are running TCG or KVM (info kvm) and the state of
> the nested-hv capability in QEMU command line.
Currently the issue is seen with any pseries KVM guest running on a PowerNV
host.

-- 
Cheers
~ Vaibhav



[PATCH 4/5] virtio: Add shared memory capability

2023-04-27 Thread Gurchetan Singh
From: "Dr. David Alan Gilbert" 

Define a new capability type 'VIRTIO_PCI_CAP_SHARED_MEMORY_CFG' to allow
defining shared memory regions with sizes and offsets of 2^32 and more.
Multiple instances of the capability are allowed and distinguished
by a device-specific 'id'.

Signed-off-by: Dr. David Alan Gilbert 
Signed-off-by: Antonio Caggiano 
Reviewed-by: Gurchetan Singh 
Signed-off-by: Gurchetan Singh 
---
 hw/virtio/virtio-pci.c | 18 ++
 include/hw/virtio/virtio-pci.h |  4 
 2 files changed, 22 insertions(+)

diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index 02fb84a8fa..40a798d794 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -1399,6 +1399,24 @@ static int virtio_pci_add_mem_cap(VirtIOPCIProxy *proxy,
 return offset;
 }
 
+int virtio_pci_add_shm_cap(VirtIOPCIProxy *proxy,
+   uint8_t bar, uint64_t offset, uint64_t length,
+   uint8_t id)
+{
+struct virtio_pci_cap64 cap = {
+.cap.cap_len = sizeof cap,
+.cap.cfg_type = VIRTIO_PCI_CAP_SHARED_MEMORY_CFG,
+};
+
+cap.cap.bar = bar;
+cap.cap.length = cpu_to_le32(length);
+cap.length_hi = cpu_to_le32(length >> 32);
+cap.cap.offset = cpu_to_le32(offset);
+cap.offset_hi = cpu_to_le32(offset >> 32);
+cap.cap.id = id;
+return virtio_pci_add_mem_cap(proxy, &cap.cap);
+}
+
 static uint64_t virtio_pci_common_read(void *opaque, hwaddr addr,
unsigned size)
 {
diff --git a/include/hw/virtio/virtio-pci.h b/include/hw/virtio/virtio-pci.h
index ab2051b64b..5a3f182f99 100644
--- a/include/hw/virtio/virtio-pci.h
+++ b/include/hw/virtio/virtio-pci.h
@@ -264,4 +264,8 @@ unsigned virtio_pci_optimal_num_queues(unsigned fixed_queues);
 void virtio_pci_set_guest_notifier_fd_handler(VirtIODevice *vdev, VirtQueue *vq,
   int n, bool assign,
   bool with_irqfd);
+
+int virtio_pci_add_shm_cap(VirtIOPCIProxy *proxy, uint8_t bar, uint64_t offset,
+   uint64_t length, uint8_t id);
+
 #endif
-- 
2.40.1.495.gc816e09b53d-goog
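
The patch above stores each 64-bit offset and length as two 32-bit
halves (a low word in the base capability, a high word in the 64-bit
extension). A minimal standalone sketch of that split follows; the
split64 struct and both helper names are hypothetical, and the
little-endian conversion done by cpu_to_le32() in the patch is
deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the two 32-bit halves of a 64-bit field. */
struct split64 {
    uint32_t lo; /* bits 0..31  */
    uint32_t hi; /* bits 32..63 */
};

/* Split a 64-bit value into low and high 32-bit words. */
static struct split64 split_u64(uint64_t value)
{
    struct split64 s = {
        .lo = (uint32_t)(value & 0xffffffffu),
        .hi = (uint32_t)(value >> 32),
    };
    return s;
}

/* Reassemble the original value, as a reader of the capability would. */
static uint64_t join_u64(struct split64 s)
{
    return ((uint64_t)s.hi << 32) | s.lo;
}
```

A region of exactly 2^32 bytes therefore has lo == 0 and hi == 1, which
is why a single 32-bit length field could not describe it.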




[PATCH 1/5] hw/display/virtio-gpu-virgl: virtio_gpu_gl -> virtio_gpu_virgl

2023-04-27 Thread Gurchetan Singh
From: Gurchetan Singh 

The virtio-gpu GL device has a heavy dependence on virgl.
Acknowledge this by naming functions accurately.

Signed-off-by: Gurchetan Singh 
Reviewed-by: Philippe Mathieu-Daudé 
---
v1:
 - (Philippe) virtio_gpu_virglrenderer_reset --> virtio_gpu_virgl_reset_renderer

 hw/display/virtio-gpu-gl.c | 27 ++-
 hw/display/virtio-gpu-virgl.c  |  2 +-
 include/hw/virtio/virtio-gpu.h |  2 +-
 3 files changed, 16 insertions(+), 15 deletions(-)

diff --git a/hw/display/virtio-gpu-gl.c b/hw/display/virtio-gpu-gl.c
index e06be60dfb..7d69050b8c 100644
--- a/hw/display/virtio-gpu-gl.c
+++ b/hw/display/virtio-gpu-gl.c
@@ -25,9 +25,10 @@
 
 #include 
 
-static void virtio_gpu_gl_update_cursor_data(VirtIOGPU *g,
- struct virtio_gpu_scanout *s,
- uint32_t resource_id)
+static void
+virtio_gpu_virgl_update_cursor(VirtIOGPU *g,
+   struct virtio_gpu_scanout *s,
+   uint32_t resource_id)
 {
 uint32_t width, height;
 uint32_t pixels, *data;
@@ -48,14 +49,14 @@ static void virtio_gpu_gl_update_cursor_data(VirtIOGPU *g,
 free(data);
 }
 
-static void virtio_gpu_gl_flushed(VirtIOGPUBase *b)
+static void virtio_gpu_virgl_flushed(VirtIOGPUBase *b)
 {
 VirtIOGPU *g = VIRTIO_GPU(b);
 
 virtio_gpu_process_cmdq(g);
 }
 
-static void virtio_gpu_gl_handle_ctrl(VirtIODevice *vdev, VirtQueue *vq)
+static void virtio_gpu_virgl_handle_ctrl(VirtIODevice *vdev, VirtQueue *vq)
 {
 VirtIOGPU *g = VIRTIO_GPU(vdev);
 VirtIOGPUGL *gl = VIRTIO_GPU_GL(vdev);
@@ -71,7 +72,7 @@ static void virtio_gpu_gl_handle_ctrl(VirtIODevice *vdev, VirtQueue *vq)
 }
 if (gl->renderer_reset) {
 gl->renderer_reset = false;
-virtio_gpu_virgl_reset(g);
+virtio_gpu_virgl_reset_renderer(g);
 }
 
 cmd = virtqueue_pop(vq, sizeof(struct virtio_gpu_ctrl_command));
@@ -87,7 +88,7 @@ static void virtio_gpu_gl_handle_ctrl(VirtIODevice *vdev, VirtQueue *vq)
 virtio_gpu_virgl_fence_poll(g);
 }
 
-static void virtio_gpu_gl_reset(VirtIODevice *vdev)
+static void virtio_gpu_virgl_reset(VirtIODevice *vdev)
 {
 VirtIOGPU *g = VIRTIO_GPU(vdev);
 VirtIOGPUGL *gl = VIRTIO_GPU_GL(vdev);
@@ -104,7 +105,7 @@ static void virtio_gpu_gl_reset(VirtIODevice *vdev)
 }
 }
 
-static void virtio_gpu_gl_device_realize(DeviceState *qdev, Error **errp)
+static void virtio_gpu_virgl_device_realize(DeviceState *qdev, Error **errp)
 {
 VirtIOGPU *g = VIRTIO_GPU(qdev);
 
@@ -143,13 +144,13 @@ static void virtio_gpu_gl_class_init(ObjectClass *klass, void *data)
 VirtIOGPUBaseClass *vbc = VIRTIO_GPU_BASE_CLASS(klass);
 VirtIOGPUClass *vgc = VIRTIO_GPU_CLASS(klass);
 
-vbc->gl_flushed = virtio_gpu_gl_flushed;
-vgc->handle_ctrl = virtio_gpu_gl_handle_ctrl;
+vbc->gl_flushed = virtio_gpu_virgl_flushed;
+vgc->handle_ctrl = virtio_gpu_virgl_handle_ctrl;
 vgc->process_cmd = virtio_gpu_virgl_process_cmd;
-vgc->update_cursor_data = virtio_gpu_gl_update_cursor_data;
+vgc->update_cursor_data = virtio_gpu_virgl_update_cursor;
 
-vdc->realize = virtio_gpu_gl_device_realize;
-vdc->reset = virtio_gpu_gl_reset;
+vdc->realize = virtio_gpu_virgl_device_realize;
+vdc->reset = virtio_gpu_virgl_reset;
 device_class_set_props(dc, virtio_gpu_gl_properties);
 }
 
diff --git a/hw/display/virtio-gpu-virgl.c b/hw/display/virtio-gpu-virgl.c
index 1c47603d40..ffe4ec7f3d 100644
--- a/hw/display/virtio-gpu-virgl.c
+++ b/hw/display/virtio-gpu-virgl.c
@@ -599,7 +599,7 @@ void virtio_gpu_virgl_reset_scanout(VirtIOGPU *g)
 }
 }
 
-void virtio_gpu_virgl_reset(VirtIOGPU *g)
+void virtio_gpu_virgl_reset_renderer(VirtIOGPU *g)
 {
 virgl_renderer_reset();
 }
diff --git a/include/hw/virtio/virtio-gpu.h b/include/hw/virtio/virtio-gpu.h
index 2e28507efe..21b0f55bc8 100644
--- a/include/hw/virtio/virtio-gpu.h
+++ b/include/hw/virtio/virtio-gpu.h
@@ -281,7 +281,7 @@ void virtio_gpu_virgl_process_cmd(VirtIOGPU *g,
   struct virtio_gpu_ctrl_command *cmd);
 void virtio_gpu_virgl_fence_poll(VirtIOGPU *g);
 void virtio_gpu_virgl_reset_scanout(VirtIOGPU *g);
-void virtio_gpu_virgl_reset(VirtIOGPU *g);
+void virtio_gpu_virgl_reset_renderer(VirtIOGPU *g);
 int virtio_gpu_virgl_init(VirtIOGPU *g);
 int virtio_gpu_virgl_get_num_capsets(VirtIOGPU *g);
 
-- 
2.40.1.495.gc816e09b53d-goog




[PATCH 3/5] hw/display/virtio-gpu-virgl: define callbacks in realize function

2023-04-27 Thread Gurchetan Singh
From: Gurchetan Singh 

This reduces the amount of renderer-backend-specific code that needs to
be exposed to the GL device.  We only need one realize function
per renderer backend.

Signed-off-by: Gurchetan Singh 
Reviewed-by: Philippe Mathieu-Daudé 
---
v1: - Remove NULL inits (Philippe)
- Use VIRTIO_GPU_BASE where possible (Philippe)

 hw/display/virtio-gpu-gl.c | 15 ++-
 hw/display/virtio-gpu-virgl.c  | 35 --
 include/hw/virtio/virtio-gpu.h |  7 ---
 3 files changed, 31 insertions(+), 26 deletions(-)

diff --git a/hw/display/virtio-gpu-gl.c b/hw/display/virtio-gpu-gl.c
index 2d140e8792..cdc9483e4d 100644
--- a/hw/display/virtio-gpu-gl.c
+++ b/hw/display/virtio-gpu-gl.c
@@ -21,6 +21,11 @@
 #include "hw/virtio/virtio-gpu-pixman.h"
 #include "hw/qdev-properties.h"
 
+static void virtio_gpu_gl_device_realize(DeviceState *qdev, Error **errp)
+{
+virtio_gpu_virgl_device_realize(qdev, errp);
+}
+
 static Property virtio_gpu_gl_properties[] = {
 DEFINE_PROP_BIT("stats", VirtIOGPU, parent_obj.conf.flags,
 VIRTIO_GPU_FLAG_STATS_ENABLED, false),
@@ -31,16 +36,8 @@ static void virtio_gpu_gl_class_init(ObjectClass *klass, void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(klass);
 VirtioDeviceClass *vdc = VIRTIO_DEVICE_CLASS(klass);
-VirtIOGPUBaseClass *vbc = VIRTIO_GPU_BASE_CLASS(klass);
-VirtIOGPUClass *vgc = VIRTIO_GPU_CLASS(klass);
-
-vbc->gl_flushed = virtio_gpu_virgl_flushed;
-vgc->handle_ctrl = virtio_gpu_virgl_handle_ctrl;
-vgc->process_cmd = virtio_gpu_virgl_process_cmd;
-vgc->update_cursor_data = virtio_gpu_virgl_update_cursor;
 
-vdc->realize = virtio_gpu_virgl_device_realize;
-vdc->reset = virtio_gpu_virgl_reset;
+vdc->realize = virtio_gpu_gl_device_realize;
 device_class_set_props(dc, virtio_gpu_gl_properties);
 }
 
diff --git a/hw/display/virtio-gpu-virgl.c b/hw/display/virtio-gpu-virgl.c
index ee5ddb887c..0ff77e9966 100644
--- a/hw/display/virtio-gpu-virgl.c
+++ b/hw/display/virtio-gpu-virgl.c
@@ -401,8 +401,9 @@ static void virgl_cmd_get_capset(VirtIOGPU *g,
 g_free(resp);
 }
 
-void virtio_gpu_virgl_process_cmd(VirtIOGPU *g,
-  struct virtio_gpu_ctrl_command *cmd)
+static void
+virtio_gpu_virgl_process_cmd(VirtIOGPU *g,
+ struct virtio_gpu_ctrl_command *cmd)
 {
 VIRTIO_GPU_FILL_CMD(cmd->cmd_hdr);
 
@@ -637,7 +638,7 @@ static int virtio_gpu_virgl_get_num_capsets(VirtIOGPU *g)
 return capset2_max_ver ? 2 : 1;
 }
 
-void
+static void
 virtio_gpu_virgl_update_cursor(VirtIOGPU *g,
struct virtio_gpu_scanout *s,
uint32_t resource_id)
@@ -661,14 +662,14 @@ virtio_gpu_virgl_update_cursor(VirtIOGPU *g,
 free(data);
 }
 
-void virtio_gpu_virgl_flushed(VirtIOGPUBase *b)
+static void virtio_gpu_virgl_flushed(VirtIOGPUBase *b)
 {
 VirtIOGPU *g = VIRTIO_GPU(b);
 
 virtio_gpu_process_cmdq(g);
 }
 
-void virtio_gpu_virgl_handle_ctrl(VirtIODevice *vdev, VirtQueue *vq)
+static void virtio_gpu_virgl_handle_ctrl(VirtIODevice *vdev, VirtQueue *vq)
 {
 VirtIOGPU *g = VIRTIO_GPU(vdev);
 VirtIOGPUGL *gl = VIRTIO_GPU_GL(vdev);
@@ -700,7 +701,7 @@ void virtio_gpu_virgl_handle_ctrl(VirtIODevice *vdev, VirtQueue *vq)
 virtio_gpu_virgl_fence_poll(g);
 }
 
-void virtio_gpu_virgl_reset(VirtIODevice *vdev)
+static void virtio_gpu_virgl_reset(VirtIODevice *vdev)
 {
 VirtIOGPU *g = VIRTIO_GPU(vdev);
 VirtIOGPUGL *gl = VIRTIO_GPU_GL(vdev);
@@ -719,7 +720,21 @@ void virtio_gpu_virgl_reset(VirtIODevice *vdev)
 
 void virtio_gpu_virgl_device_realize(DeviceState *qdev, Error **errp)
 {
-VirtIOGPU *g = VIRTIO_GPU(qdev);
+VirtIODevice *vdev = VIRTIO_DEVICE(qdev);
+VirtioDeviceClass *vdc = VIRTIO_DEVICE_GET_CLASS(vdev);
+
+VirtIOGPUBase *bdev = VIRTIO_GPU_BASE(qdev);
+VirtIOGPUBaseClass *vbc = VIRTIO_GPU_BASE_GET_CLASS(bdev);
+
+VirtIOGPU *gpudev = VIRTIO_GPU(qdev);
+VirtIOGPUClass *vgc = VIRTIO_GPU_GET_CLASS(gpudev);
+
+vbc->gl_flushed = virtio_gpu_virgl_flushed;
+vgc->handle_ctrl = virtio_gpu_virgl_handle_ctrl;
+vgc->process_cmd = virtio_gpu_virgl_process_cmd;
+vgc->update_cursor_data = virtio_gpu_virgl_update_cursor;
+
+vdc->reset = virtio_gpu_virgl_reset;
 
 #if HOST_BIG_ENDIAN
 error_setg(errp, "virgl is not supported on bigendian platforms");
@@ -737,9 +752,9 @@ void virtio_gpu_virgl_device_realize(DeviceState *qdev, Error **errp)
 return;
 }
 
-g->parent_obj.conf.flags |= (1 << VIRTIO_GPU_FLAG_VIRGL_ENABLED);
-VIRTIO_GPU_BASE(g)->virtio_config.num_capsets =
-virtio_gpu_virgl_get_num_capsets(g);
+VIRTIO_GPU_BASE(gpudev)->conf.flags |= (1 << VIRTIO_GPU_FLAG_VIRGL_ENABLED);
+VIRTIO_GPU_BASE(gpudev)->virtio_config.num_capsets =
+virtio_gpu_virgl_get_num_capsets(gpudev);
 
 virtio_gpu_device_realize(qdev, errp);
 }
diff --git 

[PATCH 0/5] virtio-gpu cleanups and obvious definitions

2023-04-27 Thread Gurchetan Singh
This series refactors the virtio-gpu-gl device in the first three
patches.  The 4th and 5th patches are definitions already in the
virtio-spec and can benefit all three of the following proposals:

https://lists.gnu.org/archive/html/qemu-devel/2023-04/msg03791.html
https://lists.gnu.org/archive/html/qemu-devel/2023-03/msg03972.html
https://lists.gnu.org/archive/html/qemu-devel/2022-09/msg04111.html

All have been reviewed, aside from patch 2 (which is code movement).
Hopefully, we can land these to reduce the patch load on all GPU
modernization attempts?

Antonio Caggiano (1):
  virtio-gpu: CONTEXT_INIT feature

Dr. David Alan Gilbert (1):
  virtio: Add shared memory capability

Gurchetan Singh (3):
  hw/display/virtio-gpu-virgl: virtio_gpu_gl -> virtio_gpu_virgl
  hw/display/virtio-gpu-virgl: make GL device more library agnostic
  hw/display/virtio-gpu-virgl: define callbacks in realize function

 hw/display/virtio-gpu-base.c   |   3 +
 hw/display/virtio-gpu-gl.c | 114 +--
 hw/display/virtio-gpu-virgl.c  | 138 +++--
 hw/virtio/virtio-pci.c |  18 +
 include/hw/virtio/virtio-gpu.h |  11 +--
 include/hw/virtio/virtio-pci.h |   4 +
 6 files changed, 161 insertions(+), 127 deletions(-)

-- 
2.40.1.495.gc816e09b53d-goog




[PATCH 2/5] hw/display/virtio-gpu-virgl: make GL device more library agnostic

2023-04-27 Thread Gurchetan Singh
From: Gurchetan Singh 

We need to:
- Move all virgl functions to their own file
- Only have needed class callbacks in the generic GL device

We plan to use this cleanup for gfxstream in the near future.

Signed-off-by: Gurchetan Singh 
---
 hw/display/virtio-gpu-gl.c | 110 --
 hw/display/virtio-gpu-virgl.c  | 119 +++--
 include/hw/virtio/virtio-gpu.h |  11 +--
 3 files changed, 120 insertions(+), 120 deletions(-)

diff --git a/hw/display/virtio-gpu-gl.c b/hw/display/virtio-gpu-gl.c
index 7d69050b8c..2d140e8792 100644
--- a/hw/display/virtio-gpu-gl.c
+++ b/hw/display/virtio-gpu-gl.c
@@ -15,122 +15,12 @@
 #include "qemu/iov.h"
 #include "qemu/module.h"
 #include "qemu/error-report.h"
-#include "qapi/error.h"
-#include "sysemu/sysemu.h"
 #include "hw/virtio/virtio.h"
 #include "hw/virtio/virtio-gpu.h"
 #include "hw/virtio/virtio-gpu-bswap.h"
 #include "hw/virtio/virtio-gpu-pixman.h"
 #include "hw/qdev-properties.h"
 
-#include 
-
-static void
-virtio_gpu_virgl_update_cursor(VirtIOGPU *g,
-   struct virtio_gpu_scanout *s,
-   uint32_t resource_id)
-{
-uint32_t width, height;
-uint32_t pixels, *data;
-
-data = virgl_renderer_get_cursor_data(resource_id, &width, &height);
-if (!data) {
-return;
-}
-
-if (width != s->current_cursor->width ||
-height != s->current_cursor->height) {
-free(data);
-return;
-}
-
-pixels = s->current_cursor->width * s->current_cursor->height;
-memcpy(s->current_cursor->data, data, pixels * sizeof(uint32_t));
-free(data);
-}
-
-static void virtio_gpu_virgl_flushed(VirtIOGPUBase *b)
-{
-VirtIOGPU *g = VIRTIO_GPU(b);
-
-virtio_gpu_process_cmdq(g);
-}
-
-static void virtio_gpu_virgl_handle_ctrl(VirtIODevice *vdev, VirtQueue *vq)
-{
-VirtIOGPU *g = VIRTIO_GPU(vdev);
-VirtIOGPUGL *gl = VIRTIO_GPU_GL(vdev);
-struct virtio_gpu_ctrl_command *cmd;
-
-if (!virtio_queue_ready(vq)) {
-return;
-}
-
-if (!gl->renderer_inited) {
-virtio_gpu_virgl_init(g);
-gl->renderer_inited = true;
-}
-if (gl->renderer_reset) {
-gl->renderer_reset = false;
-virtio_gpu_virgl_reset_renderer(g);
-}
-
-cmd = virtqueue_pop(vq, sizeof(struct virtio_gpu_ctrl_command));
-while (cmd) {
-cmd->vq = vq;
-cmd->error = 0;
-cmd->finished = false;
-QTAILQ_INSERT_TAIL(&g->cmdq, cmd, next);
-cmd = virtqueue_pop(vq, sizeof(struct virtio_gpu_ctrl_command));
-}
-
-virtio_gpu_process_cmdq(g);
-virtio_gpu_virgl_fence_poll(g);
-}
-
-static void virtio_gpu_virgl_reset(VirtIODevice *vdev)
-{
-VirtIOGPU *g = VIRTIO_GPU(vdev);
-VirtIOGPUGL *gl = VIRTIO_GPU_GL(vdev);
-
-virtio_gpu_reset(vdev);
-
-/*
- * GL functions must be called with the associated GL context in main
- * thread, and when the renderer is unblocked.
- */
-if (gl->renderer_inited && !gl->renderer_reset) {
-virtio_gpu_virgl_reset_scanout(g);
-gl->renderer_reset = true;
-}
-}
-
-static void virtio_gpu_virgl_device_realize(DeviceState *qdev, Error **errp)
-{
-VirtIOGPU *g = VIRTIO_GPU(qdev);
-
-#if HOST_BIG_ENDIAN
-error_setg(errp, "virgl is not supported on bigendian platforms");
-return;
-#endif
-
-if (!object_resolve_path_type("", TYPE_VIRTIO_GPU_GL, NULL)) {
-error_setg(errp, "at most one %s device is permitted", TYPE_VIRTIO_GPU_GL);
-return;
-}
-
-if (!display_opengl) {
-error_setg(errp, "opengl is not available");
-return;
-}
-
-g->parent_obj.conf.flags |= (1 << VIRTIO_GPU_FLAG_VIRGL_ENABLED);
-VIRTIO_GPU_BASE(g)->virtio_config.num_capsets =
-virtio_gpu_virgl_get_num_capsets(g);
-
-virtio_gpu_device_realize(qdev, errp);
-}
-
 static Property virtio_gpu_gl_properties[] = {
 DEFINE_PROP_BIT("stats", VirtIOGPU, parent_obj.conf.flags,
 VIRTIO_GPU_FLAG_STATS_ENABLED, false),
diff --git a/hw/display/virtio-gpu-virgl.c b/hw/display/virtio-gpu-virgl.c
index ffe4ec7f3d..ee5ddb887c 100644
--- a/hw/display/virtio-gpu-virgl.c
+++ b/hw/display/virtio-gpu-virgl.c
@@ -14,6 +14,8 @@
 #include "qemu/osdep.h"
 #include "qemu/error-report.h"
 #include "qemu/iov.h"
+#include "qapi/error.h"
+#include "sysemu/sysemu.h"
 #include "trace.h"
 #include "hw/virtio/virtio.h"
 #include "hw/virtio/virtio-gpu.h"
@@ -584,12 +586,12 @@ static void virtio_gpu_fence_poll(void *opaque)
 }
 }
 
-void virtio_gpu_virgl_fence_poll(VirtIOGPU *g)
+static void virtio_gpu_virgl_fence_poll(VirtIOGPU *g)
 {
 virtio_gpu_fence_poll(g);
 }
 
-void virtio_gpu_virgl_reset_scanout(VirtIOGPU *g)
+static void virtio_gpu_virgl_reset_scanout(VirtIOGPU *g)
 {
 int i;
 
@@ -599,12 +601,12 @@ void virtio_gpu_virgl_reset_scanout(VirtIOGPU *g)
 }
 }
 
-void virtio_gpu_virgl_reset_renderer(VirtIOGPU *g)
+static void 

[PATCH 5/5] virtio-gpu: CONTEXT_INIT feature

2023-04-27 Thread Gurchetan Singh
From: Antonio Caggiano 

The feature can be enabled when a backend wants it.

Signed-off-by: Antonio Caggiano 
Reviewed-by: Marc-André Lureau 
Signed-off-by: Gurchetan Singh 
---
 hw/display/virtio-gpu-base.c   | 3 +++
 include/hw/virtio/virtio-gpu.h | 3 +++
 2 files changed, 6 insertions(+)

diff --git a/hw/display/virtio-gpu-base.c b/hw/display/virtio-gpu-base.c
index a29f191aa8..6c5f1f327f 100644
--- a/hw/display/virtio-gpu-base.c
+++ b/hw/display/virtio-gpu-base.c
@@ -215,6 +215,9 @@ virtio_gpu_base_get_features(VirtIODevice *vdev, uint64_t features,
 if (virtio_gpu_blob_enabled(g->conf)) {
 features |= (1 << VIRTIO_GPU_F_RESOURCE_BLOB);
 }
+if (virtio_gpu_context_init_enabled(g->conf)) {
+features |= (1 << VIRTIO_GPU_F_CONTEXT_INIT);
+}
 
 return features;
 }
diff --git a/include/hw/virtio/virtio-gpu.h b/include/hw/virtio/virtio-gpu.h
index d5808f2ab6..cf24d2e21b 100644
--- a/include/hw/virtio/virtio-gpu.h
+++ b/include/hw/virtio/virtio-gpu.h
@@ -90,6 +90,7 @@ enum virtio_gpu_base_conf_flags {
 VIRTIO_GPU_FLAG_EDID_ENABLED,
 VIRTIO_GPU_FLAG_DMABUF_ENABLED,
 VIRTIO_GPU_FLAG_BLOB_ENABLED,
+VIRTIO_GPU_FLAG_CONTEXT_INIT_ENABLED,
 };
 
 #define virtio_gpu_virgl_enabled(_cfg) \
@@ -102,6 +103,8 @@ enum virtio_gpu_base_conf_flags {
 (_cfg.flags & (1 << VIRTIO_GPU_FLAG_DMABUF_ENABLED))
 #define virtio_gpu_blob_enabled(_cfg) \
 (_cfg.flags & (1 << VIRTIO_GPU_FLAG_BLOB_ENABLED))
+#define virtio_gpu_context_init_enabled(_cfg) \
+(_cfg.flags & (1 << VIRTIO_GPU_FLAG_CONTEXT_INIT_ENABLED))
 
 struct virtio_gpu_base_conf {
 uint32_t max_outputs;
-- 
2.40.1.495.gc816e09b53d-goog




RE: [PATCH v2 11/21] Hexagon (target/hexagon) Short-circuit packet register writes

2023-04-27 Thread Brian Cain
> -----Original Message-----
> From: Taylor Simpson 
> Sent: Thursday, April 27, 2023 6:00 PM
> To: qemu-devel@nongnu.org
> Cc: Taylor Simpson ; richard.hender...@linaro.org;
> phi...@linaro.org; a...@rev.ng; a...@rev.ng; Brian Cain
> ; Matheus Bernardino (QUIC)
> 
> Subject: [PATCH v2 11/21] Hexagon (target/hexagon) Short-circuit packet
> register writes
> 
> In certain cases, we can avoid the overhead of writing to hex_new_value
> and write directly to hex_gpr.  We add need_commit field to DisasContext
> indicating if the end-of-packet commit is needed.  If it is not needed,
> get_result_gpr() and get_result_gpr_pair() can return hex_gpr.
> 
> We pass the ctx->need_commit to helpers when needed.
> 
> Finally, we can early-exit from gen_reg_writes during packet commit.
> 
> There are a few instructions whose semantics write to the result before
> reading all the inputs.  Therefore, the idef-parser generated code is
> incompatible with short-circuit.  We tell idef-parser to skip them.
> 
> For debugging purposes, we add a cpu property to turn off short-circuit.
> When the short-circuit property is false, we skip the analysis and force
> the end-of-packet commit.
> 
> Here's a simple example of the TCG generated for
> 0x004000b4:  0x7800c020 {   R0 = #0x1 }
> 
> BEFORE:
>   004000b4
>  movi_i32 new_r0,$0x1
>  mov_i32 r0,new_r0
> 
> AFTER:
>   004000b4
>  movi_i32 r0,$0x1
> 
> This patch reintroduces a use of check_for_attrib, so we remove the
> G_GNUC_UNUSED added earlier in this series.
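
The commit message above says a packet can skip the end-of-packet commit
only when writing the destination directly is safe. The actual analysis
lives in parts of the patch not shown in this excerpt, so the following
is only a toy model under stated assumptions: every instruction in a
packet must observe the register values from before the packet, each
instruction has at most one destination, and the toy_insn struct and
function name are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NUM_REGS 32

/* Hypothetical instruction summary; not the Hexagon decoder's types. */
struct toy_insn {
    int dst;           /* destination register number, or -1 for none */
    uint32_t src_mask; /* bit n set => the instruction reads register n */
};

/*
 * Return true when some instruction reads a register that an earlier
 * instruction in the same packet writes. A direct write would then
 * expose the new value too early, so the write has to be staged and
 * committed at the end of the packet instead.
 */
static bool toy_packet_needs_commit(const struct toy_insn *insns, size_t n)
{
    uint32_t written = 0;

    for (size_t i = 0; i < n; i++) {
        if (insns[i].src_mask & written) {
            return true;
        }
        if (insns[i].dst >= 0) {
            assert(insns[i].dst < TOY_NUM_REGS);
            written |= UINT32_C(1) << insns[i].dst;
        }
    }
    return false;
}
```

Under this model the single-instruction packet { R0 = #0x1 } from the
example needs no commit, while a register swap such as
{ R0 = R1; R1 = R0 } does.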
> 
> Signed-off-by: Taylor Simpson 
> Reviewed-by: Richard Henderson 
> ---
>  target/hexagon/cpu.h|  1 +
>  target/hexagon/gen_tcg.h|  3 +-
>  target/hexagon/genptr.h |  2 +
>  target/hexagon/helper.h |  2 +-
>  target/hexagon/macros.h | 13 -
>  target/hexagon/translate.h  |  2 +
>  target/hexagon/arch.c   |  3 +-
>  target/hexagon/cpu.c|  5 +-
>  target/hexagon/genptr.c | 30 ---
>  target/hexagon/op_helper.c  |  5 +-
>  target/hexagon/translate.c  | 67 -
>  target/hexagon/gen_helper_funcs.py  |  2 +
>  target/hexagon/gen_helper_protos.py | 10 +++-
>  target/hexagon/gen_idef_parser_funcs.py |  7 +++
>  target/hexagon/gen_tcg_funcs.py |  5 ++
>  target/hexagon/hex_common.py|  3 ++
>  16 files changed, 129 insertions(+), 31 deletions(-)
> 
> diff --git a/target/hexagon/cpu.h b/target/hexagon/cpu.h
> index 81b663ecfb..9252055a38 100644
> --- a/target/hexagon/cpu.h
> +++ b/target/hexagon/cpu.h
> @@ -146,6 +146,7 @@ struct ArchCPU {
> 
>  bool lldb_compat;
>  target_ulong lldb_stack_adjust;
> +bool short_circuit;
>  };
> 
>  #include "cpu_bits.h"
> diff --git a/target/hexagon/gen_tcg.h b/target/hexagon/gen_tcg.h
> index 2b2a6175a5..1f7e535300 100644
> --- a/target/hexagon/gen_tcg.h
> +++ b/target/hexagon/gen_tcg.h
> @@ -592,7 +592,8 @@
>  #define fGEN_TCG_A5_ACS(SHORTCODE) \
>  do { \
>  gen_helper_vacsh_pred(PeV, cpu_env, RxxV, RssV, RttV); \
> -gen_helper_vacsh_val(RxxV, cpu_env, RxxV, RssV, RttV); \
> +gen_helper_vacsh_val(RxxV, cpu_env, RxxV, RssV, RttV, \
> + tcg_constant_tl(ctx->need_commit)); \
>  } while (0)
> 
>  #define fGEN_TCG_S2_cabacdecbin(SHORTCODE) \
> diff --git a/target/hexagon/genptr.h b/target/hexagon/genptr.h
> index 75d0fc262d..420867f934 100644
> --- a/target/hexagon/genptr.h
> +++ b/target/hexagon/genptr.h
> @@ -58,4 +58,6 @@ void gen_set_half(int N, TCGv result, TCGv src);
>  void gen_set_half_i64(int N, TCGv_i64 result, TCGv src);
>  void probe_noshuf_load(TCGv va, int s, int mi);
> 
> +extern const target_ulong reg_immut_masks[TOTAL_PER_THREAD_REGS];
> +
>  #endif
> diff --git a/target/hexagon/helper.h b/target/hexagon/helper.h
> index 73849e3d49..4b750d0351 100644
> --- a/target/hexagon/helper.h
> +++ b/target/hexagon/helper.h
> @@ -29,7 +29,7 @@ DEF_HELPER_FLAGS_4(fcircadd, TCG_CALL_NO_RWG_SE,
> s32, s32, s32, s32, s32)
>  DEF_HELPER_FLAGS_1(fbrev, TCG_CALL_NO_RWG_SE, i32, i32)
>  DEF_HELPER_3(sfrecipa, i64, env, f32, f32)
>  DEF_HELPER_2(sfinvsqrta, i64, env, f32)
> -DEF_HELPER_4(vacsh_val, s64, env, s64, s64, s64)
> +DEF_HELPER_5(vacsh_val, s64, env, s64, s64, s64, i32)
>  DEF_HELPER_FLAGS_4(vacsh_pred, TCG_CALL_NO_RWG_SE, s32, env, s64,
> s64, s64)
>  DEF_HELPER_FLAGS_2(cabacdecbin_val, TCG_CALL_NO_RWG_SE, s64, s64,
> s64)
>  DEF_HELPER_FLAGS_2(cabacdecbin_pred, TCG_CALL_NO_RWG_SE, s32, s64,
> s64)
> diff --git a/target/hexagon/macros.h b/target/hexagon/macros.h
> index 16e72ed0d5..a68446a367 100644
> --- a/target/hexagon/macros.h
> +++ b/target/hexagon/macros.h
> @@ -44,8 +44,17 @@
> reg_field_info[FIELD].offset)
> 
>  #define SET_USR_FIELD(FIELD, VAL) \
> -fINSERT_BITS(env->new_value[HEX_REG_USR], reg_field_info[FIELD].width,
> \
> - 

Re: [PATCH v4 2/7] target/riscv: Move pmp_get_tlb_size apart from get_physical_address_pmp

2023-04-27 Thread LIU Zhiwei



On 2023/4/22 21:03, Weiwei Li wrote:

pmp_get_tlb_size can be separated from get_physical_address_pmp and is only
needed when ret == TRANSLATE_SUCCESS.

Signed-off-by: Weiwei Li 
Signed-off-by: Junqiang Wang 
---
  target/riscv/cpu_helper.c | 16 ++--
  1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
index 075fc0538a..83c9699a6d 100644
--- a/target/riscv/cpu_helper.c
+++ b/target/riscv/cpu_helper.c
@@ -676,14 +676,11 @@ void riscv_cpu_set_mode(CPURISCVState *env, target_ulong newpriv)
   *
   * @env: CPURISCVState
   * @prot: The returned protection attributes
- * @tlb_size: TLB page size containing addr. It could be modified after PMP
- *permission checking. NULL if not set TLB page for addr.
   * @addr: The physical address to be checked permission
   * @access_type: The type of MMU access
   * @mode: Indicates current privilege level.
   */
-static int get_physical_address_pmp(CPURISCVState *env, int *prot,
-target_ulong *tlb_size, hwaddr addr,
+static int get_physical_address_pmp(CPURISCVState *env, int *prot, hwaddr addr,
  int size, MMUAccessType access_type,
  int mode)
  {
@@ -703,9 +700,6 @@ static int get_physical_address_pmp(CPURISCVState *env, int *prot,
  }
  
  *prot = pmp_priv_to_page_prot(pmp_priv);

-if (tlb_size != NULL) {
-*tlb_size = pmp_get_tlb_size(env, addr);
-}
  
  return TRANSLATE_SUCCESS;

  }
@@ -905,7 +899,7 @@ restart:
  }
  
  int pmp_prot;

-int pmp_ret = get_physical_address_pmp(env, &pmp_prot, NULL, pte_addr,
+int pmp_ret = get_physical_address_pmp(env, &pmp_prot, pte_addr,
 sizeof(target_ulong),
 MMU_DATA_LOAD, PRV_S);
  if (pmp_ret != TRANSLATE_SUCCESS) {
@@ -1300,8 +1294,9 @@ bool riscv_cpu_tlb_fill(CPUState *cs, vaddr address, int size,
  prot &= prot2;
  
  if (ret == TRANSLATE_SUCCESS) {

-ret = get_physical_address_pmp(env, &prot_pmp, &tlb_size, pa,
+ret = get_physical_address_pmp(env, &prot_pmp, pa,
 size, access_type, mode);
+tlb_size = pmp_get_tlb_size(env, pa);
  
  qemu_log_mask(CPU_LOG_MMU,

"%s PMP address=" HWADDR_FMT_plx " ret %d prot"
@@ -1333,8 +1328,9 @@ bool riscv_cpu_tlb_fill(CPUState *cs, vaddr address, int size,
__func__, address, ret, pa, prot);
  
  if (ret == TRANSLATE_SUCCESS) {

-ret = get_physical_address_pmp(env, &prot_pmp, &tlb_size, pa,
+ret = get_physical_address_pmp(env, &prot_pmp, pa,
 size, access_type, mode);
+tlb_size = pmp_get_tlb_size(env, pa);
  


Reviewed-by: LIU Zhiwei 

Zhiwei


  qemu_log_mask(CPU_LOG_MMU,
"%s PMP address=" HWADDR_FMT_plx " ret %d prot"




Re: [PATCH v4 1/7] target/riscv: Update pmp_get_tlb_size()

2023-04-27 Thread LIU Zhiwei



On 2023/4/22 21:03, Weiwei Li wrote:

PMP entries before the matched PMP entry (including the matched PMP entry)
may only cover partial of the TLB page, which may make different regions in
that page allow different RWX privs, such as for PMP0 (0x8008~0x800F,
R) and PMP1 (0x80001000~0x80001FFF, RWX) write access to 0x8000 will
match PMP1.


Typo here.

Otherwise,

Reviewed-by: LIU Zhiwei 

Zhiwei


  However we cannot cache the translation result in the TLB since
this will make the write access to 0x8008 bypass the check of PMP0. So we
should check all of them instead of the matched PMP entry in pmp_get_tlb_size()
and set the tlb_size to 1 in this case.
Set tlb_size to TARGET_PAGE_SIZE if PMP is not supported or there are no PMP rules.
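
The rule described above can be sketched in isolation as follows. This
is a simplified model rather than the patch's code: the toy_pmp_region
struct, the fixed 4 KiB page size and the function name are assumptions,
disabled entries are not modelled, and the fall-through return for a
page that no region touches is my reading of the description.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096u

/* Hypothetical PMP region: inclusive bounds, listed in priority order. */
struct toy_pmp_region {
    uint64_t sa; /* start address */
    uint64_t ea; /* end address   */
};

/*
 * Return TOY_PAGE_SIZE when the page containing 'addr' can be cached
 * as a whole, or 1 when the first region touching the page covers only
 * part of it, so the translation must not be cached.
 */
static uint64_t toy_tlb_size(const struct toy_pmp_region *r, size_t n,
                             uint64_t addr)
{
    uint64_t tlb_sa = addr & ~(uint64_t)(TOY_PAGE_SIZE - 1);
    uint64_t tlb_ea = tlb_sa + TOY_PAGE_SIZE - 1;

    for (size_t i = 0; i < n; i++) {
        if (r[i].sa <= tlb_sa && r[i].ea >= tlb_ea) {
            return TOY_PAGE_SIZE; /* first hit covers the whole page */
        }
        if ((r[i].sa >= tlb_sa && r[i].sa <= tlb_ea) ||
            (r[i].ea >= tlb_sa && r[i].ea <= tlb_ea)) {
            return 1;             /* first hit covers part of the page */
        }
    }
    /* Assumed: no region touches the page, so it behaves uniformly. */
    return TOY_PAGE_SIZE;
}
```

With an illustrative 8-byte region listed ahead of a full-page region,
any address in that page yields 1; with the order reversed, the
full-page region wins and the page can be cached.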

Signed-off-by: Weiwei Li 
Signed-off-by: Junqiang Wang 
---
  target/riscv/cpu_helper.c |  7 ++---
  target/riscv/pmp.c| 64 ++-
  target/riscv/pmp.h|  3 +-
  3 files changed, 52 insertions(+), 22 deletions(-)

diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
index 433ea529b0..075fc0538a 100644
--- a/target/riscv/cpu_helper.c
+++ b/target/riscv/cpu_helper.c
@@ -703,11 +703,8 @@ static int get_physical_address_pmp(CPURISCVState *env, int *prot,
  }
  
  *prot = pmp_priv_to_page_prot(pmp_priv);

-if ((tlb_size != NULL) && pmp_index != MAX_RISCV_PMPS) {
-target_ulong tlb_sa = addr & ~(TARGET_PAGE_SIZE - 1);
-target_ulong tlb_ea = tlb_sa + TARGET_PAGE_SIZE - 1;
-
-*tlb_size = pmp_get_tlb_size(env, pmp_index, tlb_sa, tlb_ea);
+if (tlb_size != NULL) {
+*tlb_size = pmp_get_tlb_size(env, addr);
  }
  
  return TRANSLATE_SUCCESS;

diff --git a/target/riscv/pmp.c b/target/riscv/pmp.c
index 1f5aca42e8..ad20a319c1 100644
--- a/target/riscv/pmp.c
+++ b/target/riscv/pmp.c
@@ -601,28 +601,62 @@ target_ulong mseccfg_csr_read(CPURISCVState *env)
  }
  
  /*

- * Calculate the TLB size if the start address or the end address of
- * PMP entry is presented in the TLB page.
+ * Calculate the TLB size. If the PMP rules may make different regions in
+ * the TLB page of 'addr' allow different RWX privs, set the size to 1
+ * (to make the translation result uncached in the TLB and only be used for
+ * a single translation). Set the size to TARGET_PAGE_SIZE otherwise.
   */
-target_ulong pmp_get_tlb_size(CPURISCVState *env, int pmp_index,
-  target_ulong tlb_sa, target_ulong tlb_ea)
+target_ulong pmp_get_tlb_size(CPURISCVState *env, target_ulong addr)
  {
-target_ulong pmp_sa = env->pmp_state.addr[pmp_index].sa;
-target_ulong pmp_ea = env->pmp_state.addr[pmp_index].ea;
+target_ulong pmp_sa;
+target_ulong pmp_ea;
+target_ulong tlb_sa = addr & ~(TARGET_PAGE_SIZE - 1);
+target_ulong tlb_ea = tlb_sa + TARGET_PAGE_SIZE - 1;
+int i;
  
-if (pmp_sa <= tlb_sa && pmp_ea >= tlb_ea) {

+/*
+ * If PMP is not supported or there is no PMP rule, which means the allowed
+ * RWX privs of the page will not affected by PMP or PMP will provide the
+ * same option (disallow accesses or allow default RWX privs) for all
+ * addresses, set the size to TARGET_PAGE_SIZE.
+ */
+if (!riscv_cpu_cfg(env)->pmp || !pmp_get_num_rules(env)) {
  return TARGET_PAGE_SIZE;
-} else {
+}
+
+for (i = 0; i < MAX_RISCV_PMPS; i++) {
+if (pmp_get_a_field(env->pmp_state.pmp[i].cfg_reg) == PMP_AMATCH_OFF) {
+continue;
+}
+
+pmp_sa = env->pmp_state.addr[i].sa;
+pmp_ea = env->pmp_state.addr[i].ea;
+
  /*
- * At this point we have a tlb_size that is the smallest possible size
- * That fits within a TARGET_PAGE_SIZE and the PMP region.
- *
- * If the size is less then TARGET_PAGE_SIZE we drop the size to 1.
- * This means the result isn't cached in the TLB and is only used for
- * a single translation.
+ * Only the first PMP entry that covers the TLB page (wholly or
+ * partially) really matters:
+ * If it covers the whole page, set the size to TARGET_PAGE_SIZE.
+ * The following PMP entries have lower priority and will not affect
+ * the allowed RWX privs of the page.
+ * If it covers only part of the TLB page, set the size to 1 since
+ * the allowed RWX privs for the covered region may be different from
+ * those of the rest of the page.
   */
-return 1;
+if (pmp_sa <= tlb_sa && pmp_ea >= tlb_ea) {
+return TARGET_PAGE_SIZE;
+} else if ((pmp_sa >= tlb_sa && pmp_sa <= tlb_ea) ||
+   (pmp_ea >= tlb_sa && pmp_ea <= tlb_ea)) {
+return 1;
+}
  }
+
+/*
+ * If no PMP entry covers any region of the TLB page, similar to the above
+ * case that there is no PMP rule, PMP will provide the same option
+ * (disallow accesses or allow default RWX privs) for 
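
The page-coverage rule described in the hunk above can be modelled outside
QEMU. The following is a minimal sketch, not the patch itself: PmpRange,
MODEL_PAGE_SIZE and tlb_size_for() are hypothetical names, the entries are
plain inclusive address ranges, and the final return after the loop is an
assumption because the quoted diff is cut off before that point.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u   /* stand-in for TARGET_PAGE_SIZE */

/* Hypothetical, simplified PMP entry: an inclusive address range. */
typedef struct {
    bool enabled;   /* false models an entry with PMP_AMATCH_OFF */
    uint64_t sa;    /* start address, inclusive */
    uint64_t ea;    /* end address, inclusive */
} PmpRange;

/*
 * Return MODEL_PAGE_SIZE if the translation of 'addr' may be cached for the
 * whole page, or 1 if the first enabled entry that touches the page covers
 * only part of it.
 */
static uint64_t tlb_size_for(const PmpRange *rules, size_t n, uint64_t addr)
{
    uint64_t tlb_sa = addr & ~(uint64_t)(MODEL_PAGE_SIZE - 1);
    uint64_t tlb_ea = tlb_sa + MODEL_PAGE_SIZE - 1;

    for (size_t i = 0; i < n; i++) {
        if (!rules[i].enabled) {
            continue;
        }
        /* Entry covers the whole page: lower-priority entries cannot matter. */
        if (rules[i].sa <= tlb_sa && rules[i].ea >= tlb_ea) {
            return MODEL_PAGE_SIZE;
        }
        /* Entry starts or ends inside the page: privileges may differ. */
        if ((rules[i].sa >= tlb_sa && rules[i].sa <= tlb_ea) ||
            (rules[i].ea >= tlb_sa && rules[i].ea <= tlb_ea)) {
            return 1;
        }
    }

    /* Assumption: no entry touches the page, so the page is uniform. */
    return MODEL_PAGE_SIZE;
}
```

Because the loop stops at the first entry that touches the page, a small
high-priority entry inside the page forces a size of 1 even when a later
entry covers the whole page.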

Re: [PATCH 2/2] target/riscv/vector_helper.c: make vext_set_tail_elems_1s() debug only

2023-04-27 Thread Weiwei Li



On 2023/4/28 04:57, Daniel Henrique Barboza wrote:

Commit 3479a814 ("target/riscv: rvv-1.0: add VMA and VTA") added vma and
vta fields in the vtype register, while also defining that QEMU doesn't
need to have a tail agnostic policy to be compliant with the RVV spec.
It ended up removing all tail handling code as well. Later, commit
752614ca ("target/riscv: rvv: Add tail agnostic for vector load / store
instructions") reintroduced the tail agnostic fill for vector load/store
instructions only.

This puts QEMU in a situation where some functions are 1-filling the
tail elements and others aren't. This is still a valid implementation,
but the process of 1-filling the tail elements takes valuable emulation
time that can be used doing anything else. If the spec doesn't demand a
specific tail-agnostic policy, proper software wouldn't expect any
policy to be in place. This means that, more often than not, the work
we're doing by 1-filling tail elements is wasted. We would be better off
if vext_set_tail_elems_1s() were removed entirely from the code.

All this said, there's still a debug value associated with it. So,
instead of removing it, let's gate it with cpu->cfg.debug. This way
software can enable this code if desirable, but for the regular case we
shouldn't waste time with it.

Signed-off-by: Daniel Henrique Barboza 
---
  target/riscv/vector_helper.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index 8e6c99e573..e0a292ac24 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -272,7 +272,7 @@ static void vext_set_tail_elems_1s(CPURISCVState *env, 
target_ulong vl,
  uint32_t vta = vext_vta(desc);
  int k;
  
-if (vta == 0) {

+if (vta == 0 || !riscv_cpu_cfg(env)->debug)  {


I think this is not correct. The 'debug' property is used for the debug spec,
and this feature is controlled by another property, 'rvv_ta_all_1s'.


By the way, cfg.rvv_ta_all_1s has already been ANDed into the vta value, so
an additional check on it is also unnecessary here.
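
To illustrate the point about the flag being folded into vta, here is a
minimal sketch. effective_vta() and should_fill_tail() are hypothetical names,
not QEMU functions; the sketch only assumes that the config switch is ANDed
into the vta bit before the helper sees it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: fold the config switch into the effective vta bit. */
static uint32_t effective_vta(uint32_t vtype_vta, bool ta_all_1s)
{
    return vtype_vta & (ta_all_1s ? 1u : 0u);
}

/* With the switch already folded in, testing vta alone is sufficient. */
static bool should_fill_tail(uint32_t vta)
{
    return vta != 0;
}
```

Under that assumption, a helper that checks "vta == 0" already covers the
case where the property is disabled.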


Regards,

Weiwei Li


  return;
  }
  





Re: [PATCH 1/2] target/riscv/vector_helper.c: skip set tail when vta is zero

2023-04-27 Thread Weiwei Li



On 2023/4/28 04:57, Daniel Henrique Barboza wrote:

The function is a no-op if 'vta' is zero, but we're still doing a lot of
work in it regardless. vext_set_elems_1s() will do nothing every
single time (since vta is zero), so that work is simply wasted.

Skip it altogether in this case. Aside from the code simplification
there's a noticeable emulation performance gain by doing it. For a
regular C binary that does a vector operation like this:

===
  #define SZ 1000

int main ()
{
   int *a = malloc (SZ * sizeof (int));
   int *b = malloc (SZ * sizeof (int));
   int *c = malloc (SZ * sizeof (int));

   for (int i = 0; i < SZ; i++)
 c[i] = a[i] + b[i];
   return c[SZ - 1];
}
===

Emulating it with qemu-riscv64 and RVV takes ~0.3 sec:

$ time ~/work/qemu/build/qemu-riscv64 \
 -cpu rv64,debug=false,vext_spec=v1.0,v=true,vlen=128 ./foo.out

real    0m0.303s
user    0m0.281s
sys     0m0.023s

With this skip we take ~0.275 sec:

$ time ~/work/qemu/build/qemu-riscv64 \
 -cpu rv64,debug=false,vext_spec=v1.0,v=true,vlen=128 ./foo.out

real    0m0.274s
user    0m0.252s
sys     0m0.019s

This performance gain adds up fast when executing heavy benchmarks like
SPEC.

Signed-off-by: Daniel Henrique Barboza 
---


Reviewed-by: Weiwei Li 

Weiwei Li


  target/riscv/vector_helper.c | 11 ---
  1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index f4d0438988..8e6c99e573 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -268,12 +268,17 @@ static void vext_set_tail_elems_1s(CPURISCVState *env, 
target_ulong vl,
 void *vd, uint32_t desc, uint32_t nf,
 uint32_t esz, uint32_t max_elems)
  {
-uint32_t total_elems = vext_get_total_elems(env, desc, esz);
-uint32_t vlenb = riscv_cpu_cfg(env)->vlen >> 3;
+uint32_t total_elems, vlenb, registers_used;
  uint32_t vta = vext_vta(desc);
-uint32_t registers_used;
  int k;
  
+if (vta == 0) {

+return;
+}
+
+total_elems = vext_get_total_elems(env, desc, esz);
+vlenb = riscv_cpu_cfg(env)->vlen >> 3;
+
  for (k = 0; k < nf; ++k) {
  vext_set_elems_1s(vd, vta, (k * max_elems + vl) * esz,
(k * max_elems + max_elems) * esz);
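
The early return added in the hunk above can be modelled with a small
standalone sketch. The names set_elems_1s(), set_tail_elems_1s() and
fill_calls are hypothetical, and vta is passed in directly instead of being
decoded from a descriptor; only the loop bounds mirror the quoted code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Counts how often the inner fill routine is entered (for illustration). */
static unsigned fill_calls;

/* Set bytes [start, end) of 'vd' to all 1s when the agnostic flag is set. */
static void set_elems_1s(uint8_t *vd, uint32_t vta,
                         uint32_t start, uint32_t end)
{
    fill_calls++;
    if (vta == 0 || start >= end) {
        return;
    }
    memset(vd + start, 0xff, end - start);
}

/* Fill the tail of each of the 'nf' fields, skipping all work if vta is 0. */
static void set_tail_elems_1s(uint8_t *vd, uint32_t vta, uint32_t vl,
                              uint32_t esz, uint32_t max_elems, uint32_t nf)
{
    if (vta == 0) {
        return; /* nothing to do: avoid the loop and every inner call */
    }

    for (uint32_t k = 0; k < nf; k++) {
        set_elems_1s(vd, vta, (k * max_elems + vl) * esz,
                     (k * max_elems + max_elems) * esz);
    }
}
```

With vta equal to zero the inner routine is never entered, which is the
saving the commit message measures.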





RE: [PATCH] target/hexagon: fix = vs. == mishap

2023-04-27 Thread Taylor Simpson


> -----Original Message-----
> From: Richard Henderson 
> Sent: Thursday, April 27, 2023 8:33 AM
> To: Paolo Bonzini ; qemu-devel@nongnu.org
> Cc: Taylor Simpson 
> Subject: Re: [PATCH] target/hexagon: fix = vs. == mishap
> 
> On 4/27/23 13:56, Paolo Bonzini wrote:
> > Coverity reports a parameter that is "set but never used".  This is
> > caused by an assignment operator being used instead of equality.
> >
> > Cc: Taylor Simpson
> > Signed-off-by: Paolo Bonzini
> > ---
> >   target/hexagon/idef-parser/parser-helpers.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> Reviewed-by: Richard Henderson 

Queued with next Hexagon update

Thanks,
Taylor



[PATCH v2 20/21] Hexagon (target/hexagon) Move pkt_has_store_s1 to DisasContext

2023-04-27 Thread Taylor Simpson
The pkt_has_store_s1 field is only used for bookkeeping helpers with
a load.  With recent changes that eliminate the need to free TCGv
variables, it makes more sense to make this transient.

These helpers already take the instruction slot as an argument.  We
combine the slot and pkt_has_store_s1 into a single argument called
slotval.
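
As an illustration of combining the two values into one argument, here is a
minimal sketch. The bit layout (flag in bit 0, slot in the bits above it) is
an assumption made for this example, and pack_slotval(),
slotval_has_store_s1() and slotval_slot() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout for this sketch: bit 0 = pkt_has_store_s1, bits 1.. = slot. */
static uint32_t pack_slotval(bool pkt_has_store_s1, uint32_t slot)
{
    return ((uint32_t)pkt_has_store_s1 & 1u) | (slot << 1);
}

/* Recover the flag from the packed value. */
static bool slotval_has_store_s1(uint32_t slotval)
{
    return (slotval & 1u) != 0;
}

/* Recover the slot number from the packed value. */
static uint32_t slotval_slot(uint32_t slotval)
{
    return slotval >> 1;
}
```

Passing one packed value keeps the helper signature unchanged in arity while
removing the need to read the flag from the CPU state.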

Suggested-by: Richard Henderson 
Signed-off-by: Taylor Simpson 
Reviewed-by: Richard Henderson 
---
 target/hexagon/cpu.h|  1 -
 target/hexagon/macros.h | 16 
 target/hexagon/op_helper.h  | 12 
 target/hexagon/translate.h  |  1 -
 target/hexagon/genptr.c |  8 
 target/hexagon/op_helper.c  | 26 +++---
 target/hexagon/translate.c  |  7 ---
 target/hexagon/gen_analyze_funcs.py |  2 --
 target/hexagon/gen_helper_funcs.py  |  7 ++-
 target/hexagon/gen_tcg_funcs.py |  4 ++--
 target/hexagon/hex_common.py|  7 ---
 11 files changed, 51 insertions(+), 40 deletions(-)

diff --git a/target/hexagon/cpu.h b/target/hexagon/cpu.h
index 26952cddcb..72b7d79279 100644
--- a/target/hexagon/cpu.h
+++ b/target/hexagon/cpu.h
@@ -95,7 +95,6 @@ typedef struct CPUArchState {
 target_ulong reg_written[TOTAL_PER_THREAD_REGS];
 
 MemLog mem_log_stores[STORES_MAX];
-target_ulong pkt_has_store_s1;
 target_ulong dczero_addr;
 
 float_status fp_status;
diff --git a/target/hexagon/macros.h b/target/hexagon/macros.h
index 27172193a0..f5ebaf7f54 100644
--- a/target/hexagon/macros.h
+++ b/target/hexagon/macros.h
@@ -173,14 +173,14 @@
 #define MEM_STORE8(VA, DATA, SLOT) \
 MEM_STORE8_FUNC(DATA)(cpu_env, VA, DATA, SLOT)
 #else
-#define MEM_LOAD1s(VA) ((int8_t)mem_load1(env, slot, VA))
-#define MEM_LOAD1u(VA) ((uint8_t)mem_load1(env, slot, VA))
-#define MEM_LOAD2s(VA) ((int16_t)mem_load2(env, slot, VA))
-#define MEM_LOAD2u(VA) ((uint16_t)mem_load2(env, slot, VA))
-#define MEM_LOAD4s(VA) ((int32_t)mem_load4(env, slot, VA))
-#define MEM_LOAD4u(VA) ((uint32_t)mem_load4(env, slot, VA))
-#define MEM_LOAD8s(VA) ((int64_t)mem_load8(env, slot, VA))
-#define MEM_LOAD8u(VA) ((uint64_t)mem_load8(env, slot, VA))
+#define MEM_LOAD1s(VA) ((int8_t)mem_load1(env, pkt_has_store_s1, slot, VA))
+#define MEM_LOAD1u(VA) ((uint8_t)mem_load1(env, pkt_has_store_s1, slot, VA))
+#define MEM_LOAD2s(VA) ((int16_t)mem_load2(env, pkt_has_store_s1, slot, VA))
+#define MEM_LOAD2u(VA) ((uint16_t)mem_load2(env, pkt_has_store_s1, slot, VA))
+#define MEM_LOAD4s(VA) ((int32_t)mem_load4(env, pkt_has_store_s1, slot, VA))
+#define MEM_LOAD4u(VA) ((uint32_t)mem_load4(env, pkt_has_store_s1, slot, VA))
+#define MEM_LOAD8s(VA) ((int64_t)mem_load8(env, pkt_has_store_s1, slot, VA))
+#define MEM_LOAD8u(VA) ((uint64_t)mem_load8(env, pkt_has_store_s1, slot, VA))
 
 #define MEM_STORE1(VA, DATA, SLOT) log_store32(env, VA, DATA, 1, SLOT)
 #define MEM_STORE2(VA, DATA, SLOT) log_store32(env, VA, DATA, 2, SLOT)
diff --git a/target/hexagon/op_helper.h b/target/hexagon/op_helper.h
index 6bd4b07849..8f3764d15e 100644
--- a/target/hexagon/op_helper.h
+++ b/target/hexagon/op_helper.h
@@ -19,10 +19,14 @@
 #define HEXAGON_OP_HELPER_H
 
 /* Misc functions */
-uint8_t mem_load1(CPUHexagonState *env, uint32_t slot, target_ulong vaddr);
-uint16_t mem_load2(CPUHexagonState *env, uint32_t slot, target_ulong vaddr);
-uint32_t mem_load4(CPUHexagonState *env, uint32_t slot, target_ulong vaddr);
-uint64_t mem_load8(CPUHexagonState *env, uint32_t slot, target_ulong vaddr);
+uint8_t mem_load1(CPUHexagonState *env, bool pkt_has_store_s1,
+  uint32_t slot, target_ulong vaddr);
+uint16_t mem_load2(CPUHexagonState *env, bool pkt_has_store_s1,
+   uint32_t slot, target_ulong vaddr);
+uint32_t mem_load4(CPUHexagonState *env, bool pkt_has_store_s1,
+   uint32_t slot, target_ulong vaddr);
+uint64_t mem_load8(CPUHexagonState *env, bool pkt_has_store_s1,
+   uint32_t slot, target_ulong vaddr);
 
 void log_store64(CPUHexagonState *env, target_ulong addr,
  int64_t val, int width, int slot);
diff --git a/target/hexagon/translate.h b/target/hexagon/translate.h
index a9f1ccee24..9697b4de0e 100644
--- a/target/hexagon/translate.h
+++ b/target/hexagon/translate.h
@@ -66,7 +66,6 @@ typedef struct DisasContext {
 TCGCond branch_cond;
 target_ulong branch_dest;
 bool is_tight_loop;
-bool need_pkt_has_store_s1;
 bool short_circuit;
 bool has_hvx_helper;
 TCGv new_value[TOTAL_PER_THREAD_REGS];
diff --git a/target/hexagon/genptr.c b/target/hexagon/genptr.c
index 1ad4d636f8..1e98e2913c 100644
--- a/target/hexagon/genptr.c
+++ b/target/hexagon/genptr.c
@@ -398,6 +398,14 @@ static inline void gen_store_conditional8(DisasContext 
*ctx,
 tcg_gen_movi_tl(hex_llsc_addr, ~0);
 }
 
+#ifndef CONFIG_HEXAGON_IDEF_PARSER
+static TCGv gen_slotval(DisasContext *ctx)
+{
+int slotval = (ctx->pkt->pkt_has_store_s1 & 1) | (ctx->insn->slot 

[PATCH v2 02/21] Hexagon (target/hexagon) Add DisasContext arg to gen_log_reg_write

2023-04-27 Thread Taylor Simpson
Add DisasContext arg to gen_log_reg_write_pair also

Signed-off-by: Taylor Simpson 
Reviewed-by: Richard Henderson 
---
 target/hexagon/gen_tcg.h|  2 +-
 target/hexagon/genptr.h |  2 +-
 target/hexagon/genptr.c | 10 +-
 target/hexagon/idef-parser/parser-helpers.c |  2 +-
 target/hexagon/README   |  2 +-
 target/hexagon/gen_tcg_funcs.py |  8 +---
 6 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/target/hexagon/gen_tcg.h b/target/hexagon/gen_tcg.h
index 329e7a1024..060c11f6c0 100644
--- a/target/hexagon/gen_tcg.h
+++ b/target/hexagon/gen_tcg.h
@@ -515,7 +515,7 @@
 do { \
 TCGv_i64 RddV = get_result_gpr_pair(ctx, HEX_REG_FP); \
 gen_return(ctx, RddV, hex_gpr[HEX_REG_FP]); \
-gen_log_reg_write_pair(HEX_REG_FP, RddV); \
+gen_log_reg_write_pair(ctx, HEX_REG_FP, RddV); \
 } while (0)
 
 /*
diff --git a/target/hexagon/genptr.h b/target/hexagon/genptr.h
index 76e497aa48..75d0fc262d 100644
--- a/target/hexagon/genptr.h
+++ b/target/hexagon/genptr.h
@@ -35,7 +35,7 @@ void gen_store4i(TCGv_env cpu_env, TCGv vaddr, int32_t src, 
uint32_t slot);
 void gen_store8i(TCGv_env cpu_env, TCGv vaddr, int64_t src, uint32_t slot);
 TCGv gen_read_reg(TCGv result, int num);
 TCGv gen_read_preg(TCGv pred, uint8_t num);
-void gen_log_reg_write(int rnum, TCGv val);
+void gen_log_reg_write(DisasContext *ctx, int rnum, TCGv val);
 void gen_log_pred_write(DisasContext *ctx, int pnum, TCGv val);
 void gen_set_usr_field(DisasContext *ctx, int field, TCGv val);
 void gen_set_usr_fieldi(DisasContext *ctx, int field, int x);
diff --git a/target/hexagon/genptr.c b/target/hexagon/genptr.c
index 502c85ae35..12c72cbac9 100644
--- a/target/hexagon/genptr.c
+++ b/target/hexagon/genptr.c
@@ -81,7 +81,7 @@ static TCGv_i64 get_result_gpr_pair(DisasContext *ctx, int 
rnum)
 return result;
 }
 
-void gen_log_reg_write(int rnum, TCGv val)
+void gen_log_reg_write(DisasContext *ctx, int rnum, TCGv val)
 {
 const target_ulong reg_mask = reg_immut_masks[rnum];
 
@@ -93,7 +93,7 @@ void gen_log_reg_write(int rnum, TCGv val)
 }
 }
 
-static void gen_log_reg_write_pair(int rnum, TCGv_i64 val)
+static void gen_log_reg_write_pair(DisasContext *ctx, int rnum, TCGv_i64 val)
 {
 const target_ulong reg_mask_low = reg_immut_masks[rnum];
 const target_ulong reg_mask_high = reg_immut_masks[rnum + 1];
@@ -231,7 +231,7 @@ static inline void gen_write_ctrl_reg(DisasContext *ctx, 
int reg_num,
 if (reg_num == HEX_REG_P3_0_ALIASED) {
 gen_write_p3_0(ctx, val);
 } else {
-gen_log_reg_write(reg_num, val);
+gen_log_reg_write(ctx, reg_num, val);
 if (reg_num == HEX_REG_QEMU_PKT_CNT) {
 ctx->num_packets = 0;
 }
@@ -255,7 +255,7 @@ static inline void gen_write_ctrl_reg_pair(DisasContext 
*ctx, int reg_num,
 tcg_gen_extrh_i64_i32(val32, val);
 tcg_gen_mov_tl(result, val32);
 } else {
-gen_log_reg_write_pair(reg_num, val);
+gen_log_reg_write_pair(ctx, reg_num, val);
 if (reg_num == HEX_REG_QEMU_PKT_CNT) {
 ctx->num_packets = 0;
 ctx->num_insns = 0;
@@ -719,7 +719,7 @@ static void gen_cond_return_subinsn(DisasContext *ctx, 
TCGCond cond, TCGv pred)
 {
 TCGv_i64 RddV = get_result_gpr_pair(ctx, HEX_REG_FP);
 gen_cond_return(ctx, RddV, hex_gpr[HEX_REG_FP], pred, cond);
-gen_log_reg_write_pair(HEX_REG_FP, RddV);
+gen_log_reg_write_pair(ctx, HEX_REG_FP, RddV);
 }
 
 static void gen_endloop0(DisasContext *ctx)
diff --git a/target/hexagon/idef-parser/parser-helpers.c 
b/target/hexagon/idef-parser/parser-helpers.c
index 86511efb62..ae0f60ada4 100644
--- a/target/hexagon/idef-parser/parser-helpers.c
+++ b/target/hexagon/idef-parser/parser-helpers.c
@@ -1318,7 +1318,7 @@ void gen_write_reg(Context *c, YYLTYPE *locp, HexValue 
*reg, HexValue *value)
 value_m = rvalue_materialize(c, locp, _m);
 OUT(c,
 locp,
-"gen_log_reg_write(", >reg.id, ", ",
+"gen_log_reg_write(ctx, ", >reg.id, ", ",
 _m, ");\n");
 }
 
diff --git a/target/hexagon/README b/target/hexagon/README
index ebafc78b1c..fe90df63e8 100644
--- a/target/hexagon/README
+++ b/target/hexagon/README
@@ -87,7 +87,7 @@ tcg_funcs_generated.c.inc
 TCGv RsV = hex_gpr[insn->regno[1]];
 TCGv RtV = hex_gpr[insn->regno[2]];
 gen_helper_A2_add(RdV, cpu_env, RsV, RtV);
-gen_log_reg_write(RdN, RdV);
+gen_log_reg_write(ctx, RdN, RdV);
 }
 
 helper_funcs_generated.c.inc
diff --git a/target/hexagon/gen_tcg_funcs.py b/target/hexagon/gen_tcg_funcs.py
index fcb3384480..d9ccbe63f6 100755
--- a/target/hexagon/gen_tcg_funcs.py
+++ b/target/hexagon/gen_tcg_funcs.py
@@ -387,7 +387,8 @@ def gen_helper_call_imm(f, immlett):
 
 
 def genptr_dst_write_pair(f, tag, regtype, regid):
-f.write(f"gen_log_reg_write_pair({regtype}{regid}N, " 

[PATCH v2 11/21] Hexagon (target/hexagon) Short-circuit packet register writes

2023-04-27 Thread Taylor Simpson
In certain cases, we can avoid the overhead of writing to hex_new_value
and write directly to hex_gpr.  We add need_commit field to DisasContext
indicating if the end-of-packet commit is needed.  If it is not needed,
get_result_gpr() and get_result_gpr_pair() can return hex_gpr.

We pass the ctx->need_commit to helpers when needed.

Finally, we can early-exit from gen_reg_writes during packet commit.

There are a few instructions whose semantics write to the result before
reading all the inputs.  Therefore, the idef-parser generated code is
incompatible with short-circuit.  We tell idef-parser to skip them.

For debugging purposes, we add a cpu property to turn off short-circuit.
When the short-circuit property is false, we skip the analysis and force
the end-of-packet commit.

Here's a simple example of the TCG generated for
0x004000b4:  0x7800c020 {   R0 = #0x1 }

BEFORE:
  004000b4
 movi_i32 new_r0,$0x1
 mov_i32 r0,new_r0

AFTER:
  004000b4
 movi_i32 r0,$0x1
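
The BEFORE/AFTER difference can be mimicked with plain C state. This is a
rough model under stated assumptions, not the TCG code: Model, result_gpr(),
write_reg() and commit_packet() are hypothetical names, and registers are
ordinary integers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_REGS 4

typedef struct {
    uint32_t gpr[NUM_REGS];        /* architectural registers */
    uint32_t new_value[NUM_REGS];  /* staging area, copied at end of packet */
    int reg_log[NUM_REGS];         /* registers written in this packet */
    int reg_log_idx;
    bool need_commit;              /* false: writes may go straight to gpr */
} Model;

/* Pick the destination of a register write for the current packet. */
static uint32_t *result_gpr(Model *m, int rnum)
{
    return m->need_commit ? &m->new_value[rnum] : &m->gpr[rnum];
}

/* Write a register, logging it only when a later commit is required. */
static void write_reg(Model *m, int rnum, uint32_t val)
{
    *result_gpr(m, rnum) = val;
    if (m->need_commit) {
        assert(m->reg_log_idx < NUM_REGS);
        m->reg_log[m->reg_log_idx++] = rnum;
    }
}

/* End-of-packet commit; exits early when the writes were short-circuited. */
static void commit_packet(Model *m)
{
    if (!m->need_commit) {
        return;
    }
    for (int i = 0; i < m->reg_log_idx; i++) {
        int rnum = m->reg_log[i];
        m->gpr[rnum] = m->new_value[rnum];
    }
    m->reg_log_idx = 0;
}
```

In the short-circuit case the write lands in gpr immediately and the commit
step has nothing to copy, which corresponds to the single movi_i32 in the
AFTER listing.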

This patch reintroduces a use of check_for_attrib, so we remove the
G_GNUC_UNUSED added earlier in this series.

Signed-off-by: Taylor Simpson 
Reviewed-by: Richard Henderson 
---
 target/hexagon/cpu.h|  1 +
 target/hexagon/gen_tcg.h|  3 +-
 target/hexagon/genptr.h |  2 +
 target/hexagon/helper.h |  2 +-
 target/hexagon/macros.h | 13 -
 target/hexagon/translate.h  |  2 +
 target/hexagon/arch.c   |  3 +-
 target/hexagon/cpu.c|  5 +-
 target/hexagon/genptr.c | 30 ---
 target/hexagon/op_helper.c  |  5 +-
 target/hexagon/translate.c  | 67 -
 target/hexagon/gen_helper_funcs.py  |  2 +
 target/hexagon/gen_helper_protos.py | 10 +++-
 target/hexagon/gen_idef_parser_funcs.py |  7 +++
 target/hexagon/gen_tcg_funcs.py |  5 ++
 target/hexagon/hex_common.py|  3 ++
 16 files changed, 129 insertions(+), 31 deletions(-)

diff --git a/target/hexagon/cpu.h b/target/hexagon/cpu.h
index 81b663ecfb..9252055a38 100644
--- a/target/hexagon/cpu.h
+++ b/target/hexagon/cpu.h
@@ -146,6 +146,7 @@ struct ArchCPU {
 
 bool lldb_compat;
 target_ulong lldb_stack_adjust;
+bool short_circuit;
 };
 
 #include "cpu_bits.h"
diff --git a/target/hexagon/gen_tcg.h b/target/hexagon/gen_tcg.h
index 2b2a6175a5..1f7e535300 100644
--- a/target/hexagon/gen_tcg.h
+++ b/target/hexagon/gen_tcg.h
@@ -592,7 +592,8 @@
 #define fGEN_TCG_A5_ACS(SHORTCODE) \
 do { \
 gen_helper_vacsh_pred(PeV, cpu_env, RxxV, RssV, RttV); \
-gen_helper_vacsh_val(RxxV, cpu_env, RxxV, RssV, RttV); \
+gen_helper_vacsh_val(RxxV, cpu_env, RxxV, RssV, RttV, \
+ tcg_constant_tl(ctx->need_commit)); \
 } while (0)
 
 #define fGEN_TCG_S2_cabacdecbin(SHORTCODE) \
diff --git a/target/hexagon/genptr.h b/target/hexagon/genptr.h
index 75d0fc262d..420867f934 100644
--- a/target/hexagon/genptr.h
+++ b/target/hexagon/genptr.h
@@ -58,4 +58,6 @@ void gen_set_half(int N, TCGv result, TCGv src);
 void gen_set_half_i64(int N, TCGv_i64 result, TCGv src);
 void probe_noshuf_load(TCGv va, int s, int mi);
 
+extern const target_ulong reg_immut_masks[TOTAL_PER_THREAD_REGS];
+
 #endif
diff --git a/target/hexagon/helper.h b/target/hexagon/helper.h
index 73849e3d49..4b750d0351 100644
--- a/target/hexagon/helper.h
+++ b/target/hexagon/helper.h
@@ -29,7 +29,7 @@ DEF_HELPER_FLAGS_4(fcircadd, TCG_CALL_NO_RWG_SE, s32, s32, 
s32, s32, s32)
 DEF_HELPER_FLAGS_1(fbrev, TCG_CALL_NO_RWG_SE, i32, i32)
 DEF_HELPER_3(sfrecipa, i64, env, f32, f32)
 DEF_HELPER_2(sfinvsqrta, i64, env, f32)
-DEF_HELPER_4(vacsh_val, s64, env, s64, s64, s64)
+DEF_HELPER_5(vacsh_val, s64, env, s64, s64, s64, i32)
 DEF_HELPER_FLAGS_4(vacsh_pred, TCG_CALL_NO_RWG_SE, s32, env, s64, s64, s64)
 DEF_HELPER_FLAGS_2(cabacdecbin_val, TCG_CALL_NO_RWG_SE, s64, s64, s64)
 DEF_HELPER_FLAGS_2(cabacdecbin_pred, TCG_CALL_NO_RWG_SE, s32, s64, s64)
diff --git a/target/hexagon/macros.h b/target/hexagon/macros.h
index 16e72ed0d5..a68446a367 100644
--- a/target/hexagon/macros.h
+++ b/target/hexagon/macros.h
@@ -44,8 +44,17 @@
reg_field_info[FIELD].offset)
 
 #define SET_USR_FIELD(FIELD, VAL) \
-fINSERT_BITS(env->new_value[HEX_REG_USR], reg_field_info[FIELD].width, \
- reg_field_info[FIELD].offset, (VAL))
+do { \
+if (pkt_need_commit) { \
+fINSERT_BITS(env->new_value[HEX_REG_USR], \
+reg_field_info[FIELD].width, \
+reg_field_info[FIELD].offset, (VAL)); \
+} else { \
+fINSERT_BITS(env->gpr[HEX_REG_USR], \
+reg_field_info[FIELD].width, \
+reg_field_info[FIELD].offset, (VAL)); \
+} \
+} while (0)
 #endif
 
 #ifdef QEMU_GENERATE
diff --git a/target/hexagon/translate.h b/target/hexagon/translate.h
index 

[PATCH v2 18/21] Hexagon (target/hexagon) Move new_pred_value to DisasContext

2023-04-27 Thread Taylor Simpson
The new_pred_value array in the CPUHexagonState is only used for
bookkeeping within the translation of a packet.  With recent changes
that eliminate the need to free TCGv variables, these make more sense
to be transient and kept in DisasContext.

Suggested-by: Richard Henderson 
Signed-off-by: Taylor Simpson 
Reviewed-by: Richard Henderson 
---
 target/hexagon/cpu.h|  1 -
 target/hexagon/gen_tcg.h| 12 ++--
 target/hexagon/translate.h  |  2 +-
 target/hexagon/genptr.c | 10 +++---
 target/hexagon/idef-parser/parser-helpers.c |  2 +-
 target/hexagon/op_helper.c  |  2 +-
 target/hexagon/translate.c  | 16 ++--
 target/hexagon/gen_tcg_funcs.py |  2 +-
 8 files changed, 23 insertions(+), 24 deletions(-)

diff --git a/target/hexagon/cpu.h b/target/hexagon/cpu.h
index 22aba20be2..8ce2c4 100644
--- a/target/hexagon/cpu.h
+++ b/target/hexagon/cpu.h
@@ -94,7 +94,6 @@ typedef struct CPUArchState {
 target_ulong this_PC;
 target_ulong reg_written[TOTAL_PER_THREAD_REGS];
 
-target_ulong new_pred_value[NUM_PREGS];
 target_ulong pred_written;
 
 MemLog mem_log_stores[STORES_MAX];
diff --git a/target/hexagon/gen_tcg.h b/target/hexagon/gen_tcg.h
index fabc1eb623..97dfdcb326 100644
--- a/target/hexagon/gen_tcg.h
+++ b/target/hexagon/gen_tcg.h
@@ -581,9 +581,9 @@
 #define fGEN_TCG_SL2_return_f(SHORTCODE) \
 gen_cond_return_subinsn(ctx, TCG_COND_NE, hex_pred[0])
 #define fGEN_TCG_SL2_return_tnew(SHORTCODE) \
-gen_cond_return_subinsn(ctx, TCG_COND_EQ, hex_new_pred_value[0])
+gen_cond_return_subinsn(ctx, TCG_COND_EQ, ctx->new_pred_value[0])
 #define fGEN_TCG_SL2_return_fnew(SHORTCODE) \
-gen_cond_return_subinsn(ctx, TCG_COND_NE, hex_new_pred_value[0])
+gen_cond_return_subinsn(ctx, TCG_COND_NE, ctx->new_pred_value[0])
 
 /*
  * Mathematical operations with more than one definition require
@@ -1118,7 +1118,7 @@
 #define fGEN_TCG_SA1_clrtnew(SHORTCODE) \
 do { \
 tcg_gen_movcond_tl(TCG_COND_EQ, RdV, \
-   hex_new_pred_value[0], tcg_constant_tl(0), \
+   ctx->new_pred_value[0], tcg_constant_tl(0), \
RdV, tcg_constant_tl(0)); \
 } while (0)
 
@@ -1126,7 +1126,7 @@
 #define fGEN_TCG_SA1_clrfnew(SHORTCODE) \
 do { \
 tcg_gen_movcond_tl(TCG_COND_NE, RdV, \
-   hex_new_pred_value[0], tcg_constant_tl(0), \
+   ctx->new_pred_value[0], tcg_constant_tl(0), \
RdV, tcg_constant_tl(0)); \
 } while (0)
 
@@ -1153,9 +1153,9 @@
 gen_cond_jumpr31(ctx, TCG_COND_NE, hex_pred[0])
 
 #define fGEN_TCG_SL2_jumpr31_tnew(SHORTCODE) \
-gen_cond_jumpr31(ctx, TCG_COND_EQ, hex_new_pred_value[0])
+gen_cond_jumpr31(ctx, TCG_COND_EQ, ctx->new_pred_value[0])
 #define fGEN_TCG_SL2_jumpr31_fnew(SHORTCODE) \
-gen_cond_jumpr31(ctx, TCG_COND_NE, hex_new_pred_value[0])
+gen_cond_jumpr31(ctx, TCG_COND_NE, ctx->new_pred_value[0])
 
 /* Count trailing zeros/ones */
 #define fGEN_TCG_S2_ct0(SHORTCODE) \
diff --git a/target/hexagon/translate.h b/target/hexagon/translate.h
index 6dde487566..fdfa1b6fe3 100644
--- a/target/hexagon/translate.h
+++ b/target/hexagon/translate.h
@@ -70,6 +70,7 @@ typedef struct DisasContext {
 bool short_circuit;
 bool has_hvx_helper;
 TCGv new_value[TOTAL_PER_THREAD_REGS];
+TCGv new_pred_value[NUM_PREGS];
 } DisasContext;
 
 static inline void ctx_log_pred_write(DisasContext *ctx, int pnum)
@@ -193,7 +194,6 @@ extern TCGv hex_slot_cancelled;
 extern TCGv hex_branch_taken;
 extern TCGv hex_new_value_usr;
 extern TCGv hex_reg_written[TOTAL_PER_THREAD_REGS];
-extern TCGv hex_new_pred_value[NUM_PREGS];
 extern TCGv hex_pred_written;
 extern TCGv hex_store_addr[STORES_MAX];
 extern TCGv hex_store_width[STORES_MAX];
diff --git a/target/hexagon/genptr.c b/target/hexagon/genptr.c
index c7a8e2ce55..c71bea0530 100644
--- a/target/hexagon/genptr.c
+++ b/target/hexagon/genptr.c
@@ -121,7 +121,11 @@ static void gen_log_reg_write_pair(DisasContext *ctx, int 
rnum, TCGv_i64 val)
 TCGv get_result_pred(DisasContext *ctx, int pnum)
 {
 if (ctx->need_commit) {
-return hex_new_pred_value[pnum];
+if (ctx->new_pred_value[pnum] == NULL) {
+ctx->new_pred_value[pnum] = tcg_temp_new();
+tcg_gen_movi_tl(ctx->new_pred_value[pnum], 0);
+}
+return ctx->new_pred_value[pnum];
 } else {
 return hex_pred[pnum];
 }
@@ -607,7 +611,7 @@ static void gen_cmpnd_cmp_jmp(DisasContext *ctx,
 gen_log_pred_write(ctx, pnum, pred);
 } else {
 TCGv pred = tcg_temp_new();
-tcg_gen_mov_tl(pred, hex_new_pred_value[pnum]);
+tcg_gen_mov_tl(pred, ctx->new_pred_value[pnum]);
 gen_cond_jump(ctx, cond2, pred, pc_off);
 }
 }
@@ -664,7 +668,7 @@ static void 

[PATCH v2 16/21] Hexagon (target/hexagon) Make special new_value for USR

2023-04-27 Thread Taylor Simpson
Precursor to moving new_value from the global state to DisasContext.

USR will need to stay in the global state because some helpers will
set its value.

Signed-off-by: Taylor Simpson 
Reviewed-by: Richard Henderson 
---
 target/hexagon/cpu.h|  1 +
 target/hexagon/genptr.h |  1 +
 target/hexagon/macros.h |  2 +-
 target/hexagon/translate.h  |  1 +
 target/hexagon/genptr.c |  8 ++--
 target/hexagon/translate.c  | 22 +++---
 target/hexagon/README   |  2 +-
 target/hexagon/gen_tcg_funcs.py |  2 +-
 8 files changed, 27 insertions(+), 12 deletions(-)

diff --git a/target/hexagon/cpu.h b/target/hexagon/cpu.h
index 9252055a38..3687f2caa2 100644
--- a/target/hexagon/cpu.h
+++ b/target/hexagon/cpu.h
@@ -86,6 +86,7 @@ typedef struct CPUArchState {
 
 uint8_t slot_cancelled;
 target_ulong new_value[TOTAL_PER_THREAD_REGS];
+target_ulong new_value_usr;
 
 /*
  * Only used when HEX_DEBUG is on, but unconditionally included
diff --git a/target/hexagon/genptr.h b/target/hexagon/genptr.h
index e11ccc2358..a4b43c2910 100644
--- a/target/hexagon/genptr.h
+++ b/target/hexagon/genptr.h
@@ -35,6 +35,7 @@ void gen_store4i(TCGv_env cpu_env, TCGv vaddr, int32_t src, 
uint32_t slot);
 void gen_store8i(TCGv_env cpu_env, TCGv vaddr, int64_t src, uint32_t slot);
 TCGv gen_read_reg(TCGv result, int num);
 TCGv gen_read_preg(TCGv pred, uint8_t num);
+TCGv get_result_gpr(DisasContext *ctx, int rnum);
 TCGv get_result_pred(DisasContext *ctx, int pnum);
 void gen_log_reg_write(DisasContext *ctx, int rnum, TCGv val);
 void gen_log_pred_write(DisasContext *ctx, int pnum, TCGv val);
diff --git a/target/hexagon/macros.h b/target/hexagon/macros.h
index a68446a367..27172193a0 100644
--- a/target/hexagon/macros.h
+++ b/target/hexagon/macros.h
@@ -46,7 +46,7 @@
 #define SET_USR_FIELD(FIELD, VAL) \
 do { \
 if (pkt_need_commit) { \
-fINSERT_BITS(env->new_value[HEX_REG_USR], \
+fINSERT_BITS(env->new_value_usr, \
 reg_field_info[FIELD].width, \
 reg_field_info[FIELD].offset, (VAL)); \
 } else { \
diff --git a/target/hexagon/translate.h b/target/hexagon/translate.h
index 26bcae0395..4c17433a6f 100644
--- a/target/hexagon/translate.h
+++ b/target/hexagon/translate.h
@@ -191,6 +191,7 @@ extern TCGv hex_this_PC;
 extern TCGv hex_slot_cancelled;
 extern TCGv hex_branch_taken;
 extern TCGv hex_new_value[TOTAL_PER_THREAD_REGS];
+extern TCGv hex_new_value_usr;
 extern TCGv hex_reg_written[TOTAL_PER_THREAD_REGS];
 extern TCGv hex_new_pred_value[NUM_PREGS];
 extern TCGv hex_pred_written;
diff --git a/target/hexagon/genptr.c b/target/hexagon/genptr.c
index 0727d4524b..ede1474ea5 100644
--- a/target/hexagon/genptr.c
+++ b/target/hexagon/genptr.c
@@ -68,10 +68,14 @@ static inline void gen_masked_reg_write(TCGv new_val, TCGv 
cur_val,
 }
 }
 
-static TCGv get_result_gpr(DisasContext *ctx, int rnum)
+TCGv get_result_gpr(DisasContext *ctx, int rnum)
 {
 if (ctx->need_commit) {
-return hex_new_value[rnum];
+if (rnum == HEX_REG_USR) {
+return hex_new_value_usr;
+} else {
+return hex_new_value[rnum];
+}
 } else {
 return hex_gpr[rnum];
 }
diff --git a/target/hexagon/translate.c b/target/hexagon/translate.c
index c7a04e34d2..d46a724c1b 100644
--- a/target/hexagon/translate.c
+++ b/target/hexagon/translate.c
@@ -45,6 +45,7 @@ TCGv hex_this_PC;
 TCGv hex_slot_cancelled;
 TCGv hex_branch_taken;
 TCGv hex_new_value[TOTAL_PER_THREAD_REGS];
+TCGv hex_new_value_usr;
 TCGv hex_reg_written[TOTAL_PER_THREAD_REGS];
 TCGv hex_new_pred_value[NUM_PREGS];
 TCGv hex_pred_written;
@@ -547,12 +548,12 @@ static void gen_start_packet(DisasContext *ctx)
 tcg_gen_movi_tl(hex_pred_written, 0);
 }
 
-/* Preload the predicated registers into hex_new_value[i] */
+/* Preload the predicated registers into get_result_gpr(ctx, i) */
 if (ctx->need_commit &&
 !bitmap_empty(ctx->predicated_regs, TOTAL_PER_THREAD_REGS)) {
 int i = find_first_bit(ctx->predicated_regs, TOTAL_PER_THREAD_REGS);
 while (i < TOTAL_PER_THREAD_REGS) {
-tcg_gen_mov_tl(hex_new_value[i], hex_gpr[i]);
+tcg_gen_mov_tl(get_result_gpr(ctx, i), hex_gpr[i]);
 i = find_next_bit(ctx->predicated_regs, TOTAL_PER_THREAD_REGS,
   i + 1);
 }
@@ -664,7 +665,7 @@ static void gen_reg_writes(DisasContext *ctx)
 for (i = 0; i < ctx->reg_log_idx; i++) {
 int reg_num = ctx->reg_log[i];
 
-tcg_gen_mov_tl(hex_gpr[reg_num], hex_new_value[reg_num]);
+tcg_gen_mov_tl(hex_gpr[reg_num], get_result_gpr(ctx, reg_num));
 
 /*
  * ctx->is_tight_loop is set when SA0 points to the beginning of the 
TB.
@@ -1177,10 +1178,14 @@ void hexagon_translate_init(void)
 offsetof(CPUHexagonState, gpr[i]),
 hexagon_regnames[i]);
 

[PATCH v2 06/21] Hexagon (target/hexagon) Remove log_reg_write from op_helper.[ch]

2023-04-27 Thread Taylor Simpson
With the overrides added in prior commits, this function is not used.
Remove references in macros.h.

Signed-off-by: Taylor Simpson 
Reviewed-by: Richard Henderson 
---
 target/hexagon/macros.h| 14 --
 target/hexagon/op_helper.h |  4 
 target/hexagon/op_helper.c | 17 -
 3 files changed, 35 deletions(-)

diff --git a/target/hexagon/macros.h b/target/hexagon/macros.h
index 2cb0647ce2..94a676fbf9 100644
--- a/target/hexagon/macros.h
+++ b/target/hexagon/macros.h
@@ -343,10 +343,6 @@ static inline TCGv gen_read_ireg(TCGv result, TCGv val, 
int shift)
 
 #define fREAD_LR() (env->gpr[HEX_REG_LR])
 
-#define fWRITE_LR(A) log_reg_write(env, HEX_REG_LR, A)
-#define fWRITE_FP(A) log_reg_write(env, HEX_REG_FP, A)
-#define fWRITE_SP(A) log_reg_write(env, HEX_REG_SP, A)
-
 #define fREAD_SP() (env->gpr[HEX_REG_SP])
 #define fREAD_LC0 (env->gpr[HEX_REG_LC0])
 #define fREAD_LC1 (env->gpr[HEX_REG_LC1])
@@ -371,16 +367,6 @@ static inline TCGv gen_read_ireg(TCGv result, TCGv val, 
int shift)
 #define fBRANCH(LOC, TYPE)  fWRITE_NPC(LOC)
 #define fJUMPR(REGNO, TARGET, TYPE) fBRANCH(TARGET, COF_TYPE_JUMPR)
 #define fHINTJR(TARGET) { /* Not modelled in qemu */}
-#define fWRITE_LOOP_REGS0(START, COUNT) \
-do { \
-log_reg_write(env, HEX_REG_LC0, COUNT);  \
-log_reg_write(env, HEX_REG_SA0, START); \
-} while (0)
-#define fWRITE_LOOP_REGS1(START, COUNT) \
-do { \
-log_reg_write(env, HEX_REG_LC1, COUNT);  \
-log_reg_write(env, HEX_REG_SA1, START);\
-} while (0)
 
 #define fSET_OVERFLOW() SET_USR_FIELD(USR_OVF, 1)
 #define fSET_LPCFG(VAL) SET_USR_FIELD(USR_LPCFG, (VAL))
diff --git a/target/hexagon/op_helper.h b/target/hexagon/op_helper.h
index db22b54401..6bd4b07849 100644
--- a/target/hexagon/op_helper.h
+++ b/target/hexagon/op_helper.h
@@ -19,15 +19,11 @@
 #define HEXAGON_OP_HELPER_H
 
 /* Misc functions */
-void write_new_pc(CPUHexagonState *env, bool pkt_has_multi_cof, target_ulong addr);
-
 uint8_t mem_load1(CPUHexagonState *env, uint32_t slot, target_ulong vaddr);
 uint16_t mem_load2(CPUHexagonState *env, uint32_t slot, target_ulong vaddr);
 uint32_t mem_load4(CPUHexagonState *env, uint32_t slot, target_ulong vaddr);
 uint64_t mem_load8(CPUHexagonState *env, uint32_t slot, target_ulong vaddr);
 
-void log_reg_write(CPUHexagonState *env, int rnum,
-   target_ulong val);
 void log_store64(CPUHexagonState *env, target_ulong addr,
  int64_t val, int width, int slot);
 void log_store32(CPUHexagonState *env, target_ulong addr,
diff --git a/target/hexagon/op_helper.c b/target/hexagon/op_helper.c
index 3cc71b69d9..7e9e3f305e 100644
--- a/target/hexagon/op_helper.c
+++ b/target/hexagon/op_helper.c
@@ -52,23 +52,6 @@ G_NORETURN void HELPER(raise_exception)(CPUHexagonState *env, uint32_t excp)
 do_raise_exception_err(env, excp, 0);
 }
 
-void log_reg_write(CPUHexagonState *env, int rnum,
-   target_ulong val)
-{
-HEX_DEBUG_LOG("log_reg_write[%d] = " TARGET_FMT_ld " (0x" TARGET_FMT_lx ")",
-  rnum, val, val);
-if (val == env->gpr[rnum]) {
-HEX_DEBUG_LOG(" NO CHANGE");
-}
-HEX_DEBUG_LOG("\n");
-
-env->new_value[rnum] = val;
-if (HEX_DEBUG) {
-/* Do this so HELPER(debug_commit_end) will know */
-env->reg_written[rnum] = 1;
-}
-}
-
 static void log_pred_write(CPUHexagonState *env, int pnum, target_ulong val)
 {
 HEX_DEBUG_LOG("log_pred_write[%d] = " TARGET_FMT_ld
-- 
2.25.1



[PATCH v2 10/21] Hexagon (target/hexagon) Mark registers as read during packet analysis

2023-04-27 Thread Taylor Simpson
Have gen_analyze_funcs mark the registers that are read by the
instruction.  We also mark the implicit reads using instruction
attributes.

Signed-off-by: Taylor Simpson 
Reviewed-by: Richard Henderson 
---
 target/hexagon/translate.h  | 36 +++
 target/hexagon/attribs_def.h.inc|  6 +++-
 target/hexagon/translate.c  | 20 +
 target/hexagon/gen_analyze_funcs.py | 44 -
 target/hexagon/hex_common.py|  6 
 5 files changed, 97 insertions(+), 15 deletions(-)
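
Note (not part of the patch): a minimal sketch of the read-tracking idea in
plain C, assuming a single 64-bit word is enough to hold the bitmap. The
names PacketRegs, log_reg_read, log_reg_write and reads_overlap_writes are
hypothetical stand-ins for illustration; the patch itself uses QEMU's
DECLARE_BITMAP and set_bit on fields of DisasContext.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_REGS 64 /* hypothetical count, small enough for one word */

/* Hypothetical stand-in for the per-packet bookkeeping in DisasContext. */
typedef struct {
    uint64_t regs_read;    /* bit n set: register n is read by the packet */
    uint64_t regs_written; /* bit n set: register n is written by the packet */
} PacketRegs;

void log_reg_read(PacketRegs *p, int rnum)
{
    assert(rnum >= 0 && rnum < NUM_REGS);
    p->regs_read |= UINT64_C(1) << rnum;
}

/* A register pair marks both rnum and rnum + 1 as read. */
void log_reg_read_pair(PacketRegs *p, int rnum)
{
    log_reg_read(p, rnum);
    log_reg_read(p, rnum + 1);
}

void log_reg_write(PacketRegs *p, int rnum)
{
    assert(rnum >= 0 && rnum < NUM_REGS);
    p->regs_written |= UINT64_C(1) << rnum;
}

/* True when at least one register is both read and written. */
bool reads_overlap_writes(const PacketRegs *p)
{
    return (p->regs_read & p->regs_written) != 0;
}
```

Once the analysis pass has filled both masks, a single AND answers whether
any source register is also a destination, which is the kind of question a
read/write overlap check needs to ask.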

diff --git a/target/hexagon/translate.h b/target/hexagon/translate.h
index 4b9f21c41d..f72228859f 100644
--- a/target/hexagon/translate.h
+++ b/target/hexagon/translate.h
@@ -38,10 +38,12 @@ typedef struct DisasContext {
 int reg_log[REG_WRITES_MAX];
 int reg_log_idx;
 DECLARE_BITMAP(regs_written, TOTAL_PER_THREAD_REGS);
+DECLARE_BITMAP(regs_read, TOTAL_PER_THREAD_REGS);
 DECLARE_BITMAP(predicated_regs, TOTAL_PER_THREAD_REGS);
 int preg_log[PRED_WRITES_MAX];
 int preg_log_idx;
 DECLARE_BITMAP(pregs_written, NUM_PREGS);
+DECLARE_BITMAP(pregs_read, NUM_PREGS);
 uint8_t store_width[STORES_MAX];
 bool s1_store_processed;
 int future_vregs_idx;
@@ -55,8 +57,10 @@ typedef struct DisasContext {
 DECLARE_BITMAP(vregs_select, NUM_VREGS);
 DECLARE_BITMAP(predicated_future_vregs, NUM_VREGS);
 DECLARE_BITMAP(predicated_tmp_vregs, NUM_VREGS);
+DECLARE_BITMAP(vregs_read, NUM_VREGS);
 int qreg_log[NUM_QREGS];
 int qreg_log_idx;
+DECLARE_BITMAP(qregs_read, NUM_QREGS);
 bool pre_commit;
 TCGCond branch_cond;
 target_ulong branch_dest;
@@ -73,6 +77,11 @@ static inline void ctx_log_pred_write(DisasContext *ctx, int pnum)
 }
 }
 
+static inline void ctx_log_pred_read(DisasContext *ctx, int pnum)
+{
+set_bit(pnum, ctx->pregs_read);
+}
+
 static inline void ctx_log_reg_write(DisasContext *ctx, int rnum,
  bool is_predicated)
 {
@@ -99,6 +108,17 @@ static inline void ctx_log_reg_write_pair(DisasContext *ctx, int rnum,
 ctx_log_reg_write(ctx, rnum + 1, is_predicated);
 }
 
+static inline void ctx_log_reg_read(DisasContext *ctx, int rnum)
+{
+set_bit(rnum, ctx->regs_read);
+}
+
+static inline void ctx_log_reg_read_pair(DisasContext *ctx, int rnum)
+{
+ctx_log_reg_read(ctx, rnum);
+ctx_log_reg_read(ctx, rnum + 1);
+}
+
 intptr_t ctx_future_vreg_off(DisasContext *ctx, int regnum,
  int num, bool alloc_ok);
 intptr_t ctx_tmp_vreg_off(DisasContext *ctx, int regnum,
@@ -139,6 +159,17 @@ static inline void ctx_log_vreg_write_pair(DisasContext *ctx,
 ctx_log_vreg_write(ctx, rnum ^ 1, type, is_predicated);
 }
 
+static inline void ctx_log_vreg_read(DisasContext *ctx, int rnum)
+{
+set_bit(rnum, ctx->vregs_read);
+}
+
+static inline void ctx_log_vreg_read_pair(DisasContext *ctx, int rnum)
+{
+ctx_log_vreg_read(ctx, rnum ^ 0);
+ctx_log_vreg_read(ctx, rnum ^ 1);
+}
+
 static inline void ctx_log_qreg_write(DisasContext *ctx,
   int rnum)
 {
@@ -146,6 +177,11 @@ static inline void ctx_log_qreg_write(DisasContext *ctx,
 ctx->qreg_log_idx++;
 }
 
+static inline void ctx_log_qreg_read(DisasContext *ctx, int qnum)
+{
+set_bit(qnum, ctx->qregs_read);
+}
+
 extern TCGv hex_gpr[TOTAL_PER_THREAD_REGS];
 extern TCGv hex_pred[NUM_PREGS];
 extern TCGv hex_this_PC;
diff --git a/target/hexagon/attribs_def.h.inc b/target/hexagon/attribs_def.h.inc
index 9874d1658f..17f86e1c32 100644
--- a/target/hexagon/attribs_def.h.inc
+++ b/target/hexagon/attribs_def.h.inc
@@ -1,5 +1,5 @@
 /*
- *  Copyright(c) 2019-2022 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *  Copyright(c) 2019-2023 Qualcomm Innovation Center, Inc. All Rights Reserved.
  *
  *  This program is free software; you can redistribute it and/or modify
  *  it under the terms of the GNU General Public License as published by
@@ -102,6 +102,10 @@ DEF_ATTRIB(IMPLICIT_WRITES_P1, "Writes Predicate 1", "", "UREG.P1")
 DEF_ATTRIB(IMPLICIT_WRITES_P2, "Writes Predicate 1", "", "UREG.P2")
 DEF_ATTRIB(IMPLICIT_WRITES_P3, "May write Predicate 3", "", "UREG.P3")
 DEF_ATTRIB(IMPLICIT_READS_PC, "Reads the PC register", "", "")
+DEF_ATTRIB(IMPLICIT_READS_P0, "Reads the P0 register", "", "")
+DEF_ATTRIB(IMPLICIT_READS_P1, "Reads the P1 register", "", "")
+DEF_ATTRIB(IMPLICIT_READS_P2, "Reads the P2 register", "", "")
+DEF_ATTRIB(IMPLICIT_READS_P3, "Reads the P3 register", "", "")
 DEF_ATTRIB(IMPLICIT_WRITES_USR, "May write USR", "", "")
 DEF_ATTRIB(WRITES_PRED_REG, "Writes a predicate register", "", "")
 DEF_ATTRIB(COMMUTES, "The operation is communitive", "", "")
diff --git a/target/hexagon/translate.c b/target/hexagon/translate.c
index 6b004b6248..023fc9be1e 100644
--- a/target/hexagon/translate.c
+++ b/target/hexagon/translate.c
@@ -336,6 +336,21 @@ static void mark_implicit_pred_writes(DisasContext *ctx)
 

[PATCH v2 01/21] meson.build Add CONFIG_HEXAGON_IDEF_PARSER

2023-04-27 Thread Taylor Simpson
Enable conditional compilation depending on whether idef-parser
is configured

Signed-off-by: Taylor Simpson 
---
 meson.build | 1 +
 1 file changed, 1 insertion(+)

diff --git a/meson.build b/meson.build
index c44d05a13f..d4e438b033 100644
--- a/meson.build
+++ b/meson.build
@@ -1859,6 +1859,7 @@ endif
 config_host_data.set('CONFIG_GTK', gtk.found())
 config_host_data.set('CONFIG_VTE', vte.found())
 config_host_data.set('CONFIG_GTK_CLIPBOARD', have_gtk_clipboard)
+config_host_data.set('CONFIG_HEXAGON_IDEF_PARSER', get_option('hexagon_idef_parser'))
 config_host_data.set('CONFIG_LIBATTR', have_old_libattr)
 config_host_data.set('CONFIG_LIBCAP_NG', libcap_ng.found())
 config_host_data.set('CONFIG_EBPF', libbpf.found())
-- 
2.25.1



[PATCH v2 08/21] Hexagon (target/hexagon) Clean up pred_written usage

2023-04-27 Thread Taylor Simpson
Only endloop instructions will conditionally write to a predicate.
When there is an endloop instruction, we preload the values into
new_pred_value.

The only place pred_written is needed is when HEX_DEBUG is on.

We remove the last use of check_for_attrib.  However, new uses will be
introduced later in this series, so we mark it with G_GNUC_UNUSED.

Signed-off-by: Taylor Simpson 
Reviewed-by: Richard Henderson 
---
 target/hexagon/genptr.c| 16 +---
 target/hexagon/translate.c | 53 --
 2 files changed, 23 insertions(+), 46 deletions(-)
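
Note (not part of the patch): a minimal sketch of why preloading removes the
need for a run-time "written" mask, assuming a simplified model with one
byte per predicate. PredModel and the three functions are hypothetical names
for illustration; the real code emits TCG ops rather than running C directly.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_PREGS 4

/* Hypothetical model of the predicate state; not QEMU's CPUHexagonState. */
typedef struct {
    uint8_t pred[NUM_PREGS];           /* architectural predicates */
    uint8_t new_pred_value[NUM_PREGS]; /* values staged by the packet */
} PredModel;

/* Start of packet: stage the current value so an untaken write is a no-op. */
void preload_pred(PredModel *m, int pnum)
{
    assert(pnum >= 0 && pnum < NUM_PREGS);
    m->new_pred_value[pnum] = m->pred[pnum];
}

/* Models the endloop behaviour: p3 is written only when lpcfg == 1. */
void endloop_write_p3(PredModel *m, int lpcfg)
{
    if (lpcfg == 1) {
        m->new_pred_value[3] = 0xff;
    }
}

/* End of packet: commit unconditionally, no "written" mask consulted. */
void commit_pred(PredModel *m, int pnum)
{
    assert(pnum >= 0 && pnum < NUM_PREGS);
    m->pred[pnum] = m->new_pred_value[pnum];
}
```

When the conditional write does not happen, the commit copies back the value
that was preloaded, so the predicate keeps its old contents without any
movcond at commit time.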

diff --git a/target/hexagon/genptr.c b/target/hexagon/genptr.c
index cde5cff06a..2014a8068a 100644
--- a/target/hexagon/genptr.c
+++ b/target/hexagon/genptr.c
@@ -137,7 +137,9 @@ void gen_log_pred_write(DisasContext *ctx, int pnum, TCGv val)
 tcg_gen_and_tl(hex_new_pred_value[pnum],
hex_new_pred_value[pnum], base_val);
 }
-tcg_gen_ori_tl(hex_pred_written, hex_pred_written, 1 << pnum);
+if (HEX_DEBUG) {
+tcg_gen_ori_tl(hex_pred_written, hex_pred_written, 1 << pnum);
+}
 set_bit(pnum, ctx->pregs_written);
 }
 
@@ -826,15 +828,13 @@ static void gen_endloop0(DisasContext *ctx)
 
 /*
  *if (lpcfg == 1) {
- *hex_new_pred_value[3] = 0xff;
- *hex_pred_written |= 1 << 3;
+ *p3 = 0xff;
  *}
  */
 TCGLabel *label1 = gen_new_label();
 tcg_gen_brcondi_tl(TCG_COND_NE, lpcfg, 1, label1);
 {
-tcg_gen_movi_tl(hex_new_pred_value[3], 0xff);
-tcg_gen_ori_tl(hex_pred_written, hex_pred_written, 1 << 3);
+gen_log_pred_write(ctx, 3, tcg_constant_tl(0xff));
 }
 gen_set_label(label1);
 
@@ -903,14 +903,12 @@ static void gen_endloop01(DisasContext *ctx)
 
 /*
  *if (lpcfg == 1) {
- *hex_new_pred_value[3] = 0xff;
- *hex_pred_written |= 1 << 3;
+ *p3 = 0xff;
  *}
  */
 tcg_gen_brcondi_tl(TCG_COND_NE, lpcfg, 1, label1);
 {
-tcg_gen_movi_tl(hex_new_pred_value[3], 0xff);
-tcg_gen_ori_tl(hex_pred_written, hex_pred_written, 1 << 3);
+gen_log_pred_write(ctx, 3, tcg_constant_tl(0xff));
 }
 gen_set_label(label1);
 
diff --git a/target/hexagon/translate.c b/target/hexagon/translate.c
index c087f183d0..6b004b6248 100644
--- a/target/hexagon/translate.c
+++ b/target/hexagon/translate.c
@@ -239,7 +239,7 @@ static int read_packet_words(CPUHexagonState *env, DisasContext *ctx,
 return nwords;
 }
 
-static bool check_for_attrib(Packet *pkt, int attrib)
+static G_GNUC_UNUSED bool check_for_attrib(Packet *pkt, int attrib)
 {
 for (int i = 0; i < pkt->num_insns; i++) {
 if (GET_ATTRIB(pkt->insn[i].opcode, attrib)) {
@@ -262,11 +262,6 @@ static bool need_slot_cancelled(Packet *pkt)
 return false;
 }
 
-static bool need_pred_written(Packet *pkt)
-{
-return check_for_attrib(pkt, A_WRITES_PRED_REG);
-}
-
 static bool need_next_PC(DisasContext *ctx)
 {
 Packet *pkt = ctx->pkt;
@@ -414,7 +409,7 @@ static void gen_start_packet(DisasContext *ctx)
 tcg_gen_movi_tl(hex_gpr[HEX_REG_PC], next_PC);
 }
 }
-if (need_pred_written(pkt)) {
+if (HEX_DEBUG) {
 tcg_gen_movi_tl(hex_pred_written, 0);
 }
 
@@ -428,6 +423,17 @@ static void gen_start_packet(DisasContext *ctx)
 }
 }
 
+/*
+ * Preload the predicated pred registers into hex_new_pred_value[pred_num]
+ * Only endloop instructions conditionally write to pred registers
+ */
+if (pkt->pkt_has_endloop) {
+for (int i = 0; i < ctx->preg_log_idx; i++) {
+int pred_num = ctx->preg_log[i];
+tcg_gen_mov_tl(hex_new_pred_value[pred_num], hex_pred[pred_num]);
+}
+}
+
 /* Preload the predicated HVX registers into future_VRegs and tmp_VRegs */
 if (!bitmap_empty(ctx->predicated_future_vregs, NUM_VREGS)) {
 int i = find_first_bit(ctx->predicated_future_vregs, NUM_VREGS);
@@ -532,41 +538,14 @@ static void gen_reg_writes(DisasContext *ctx)
 
 static void gen_pred_writes(DisasContext *ctx)
 {
-int i;
-
 /* Early exit if the log is empty */
 if (!ctx->preg_log_idx) {
 return;
 }
 
-/*
- * Only endloop instructions will conditionally
- * write a predicate.  If there are no endloop
- * instructions, we can use the non-conditional
- * write of the predicates.
- */
-if (ctx->pkt->pkt_has_endloop) {
-TCGv zero = tcg_constant_tl(0);
-TCGv pred_written = tcg_temp_new();
-for (i = 0; i < ctx->preg_log_idx; i++) {
-int pred_num = ctx->preg_log[i];
-
-tcg_gen_andi_tl(pred_written, hex_pred_written, 1 << pred_num);
-tcg_gen_movcond_tl(TCG_COND_NE, hex_pred[pred_num],
-   pred_written, zero,
-   hex_new_pred_value[pred_num],
-   hex_pred[pred_num]);
- 

[PATCH v2 19/21] Hexagon (target/hexagon) Move pred_written to DisasContext

2023-04-27 Thread Taylor Simpson
The pred_written variable in the CPUHexagonState is only used for
bookkeeping within the translation of a packet.  With recent changes
that eliminate the need to free TCGv variables, it makes more sense
for it to be transient and kept in DisasContext.

Suggested-by: Richard Henderson 
Signed-off-by: Taylor Simpson 
Reviewed-by: Richard Henderson 
---
 target/hexagon/cpu.h   | 2 --
 target/hexagon/helper.h| 2 +-
 target/hexagon/translate.h | 2 +-
 target/hexagon/genptr.c| 2 +-
 target/hexagon/op_helper.c | 5 +++--
 target/hexagon/translate.c | 9 -
 6 files changed, 10 insertions(+), 12 deletions(-)

diff --git a/target/hexagon/cpu.h b/target/hexagon/cpu.h
index 8ce2c4..26952cddcb 100644
--- a/target/hexagon/cpu.h
+++ b/target/hexagon/cpu.h
@@ -94,8 +94,6 @@ typedef struct CPUArchState {
 target_ulong this_PC;
 target_ulong reg_written[TOTAL_PER_THREAD_REGS];
 
-target_ulong pred_written;
-
 MemLog mem_log_stores[STORES_MAX];
 target_ulong pkt_has_store_s1;
 target_ulong dczero_addr;
diff --git a/target/hexagon/helper.h b/target/hexagon/helper.h
index 4b750d0351..f3b298beee 100644
--- a/target/hexagon/helper.h
+++ b/target/hexagon/helper.h
@@ -21,7 +21,7 @@
 DEF_HELPER_FLAGS_2(raise_exception, TCG_CALL_NO_RETURN, noreturn, env, i32)
 DEF_HELPER_1(debug_start_packet, void, env)
 DEF_HELPER_FLAGS_3(debug_check_store_width, TCG_CALL_NO_WG, void, env, int, int)
-DEF_HELPER_FLAGS_3(debug_commit_end, TCG_CALL_NO_WG, void, env, int, int)
+DEF_HELPER_FLAGS_4(debug_commit_end, TCG_CALL_NO_WG, void, env, int, int, int)
 DEF_HELPER_2(commit_store, void, env, int)
 DEF_HELPER_3(gather_store, void, env, i32, int)
 DEF_HELPER_1(commit_hvx_stores, void, env)
diff --git a/target/hexagon/translate.h b/target/hexagon/translate.h
index fdfa1b6fe3..a9f1ccee24 100644
--- a/target/hexagon/translate.h
+++ b/target/hexagon/translate.h
@@ -71,6 +71,7 @@ typedef struct DisasContext {
 bool has_hvx_helper;
 TCGv new_value[TOTAL_PER_THREAD_REGS];
 TCGv new_pred_value[NUM_PREGS];
+TCGv pred_written;
 } DisasContext;
 
 static inline void ctx_log_pred_write(DisasContext *ctx, int pnum)
@@ -194,7 +195,6 @@ extern TCGv hex_slot_cancelled;
 extern TCGv hex_branch_taken;
 extern TCGv hex_new_value_usr;
 extern TCGv hex_reg_written[TOTAL_PER_THREAD_REGS];
-extern TCGv hex_pred_written;
 extern TCGv hex_store_addr[STORES_MAX];
 extern TCGv hex_store_width[STORES_MAX];
 extern TCGv hex_store_val32[STORES_MAX];
diff --git a/target/hexagon/genptr.c b/target/hexagon/genptr.c
index c71bea0530..1ad4d636f8 100644
--- a/target/hexagon/genptr.c
+++ b/target/hexagon/genptr.c
@@ -151,7 +151,7 @@ void gen_log_pred_write(DisasContext *ctx, int pnum, TCGv val)
 tcg_gen_and_tl(pred, pred, base_val);
 }
 if (HEX_DEBUG) {
-tcg_gen_ori_tl(hex_pred_written, hex_pred_written, 1 << pnum);
+tcg_gen_ori_tl(ctx->pred_written, ctx->pred_written, 1 << pnum);
 }
 set_bit(pnum, ctx->pregs_written);
 }
diff --git a/target/hexagon/op_helper.c b/target/hexagon/op_helper.c
index 26fba9f5d6..f9021efc7e 100644
--- a/target/hexagon/op_helper.c
+++ b/target/hexagon/op_helper.c
@@ -203,7 +203,8 @@ static void print_store(CPUHexagonState *env, int slot)
 }
 
 /* This function is a handy place to set a breakpoint */
-void HELPER(debug_commit_end)(CPUHexagonState *env, int has_st0, int has_st1)
+void HELPER(debug_commit_end)(CPUHexagonState *env,
+  int pred_written, int has_st0, int has_st1)
 {
 bool reg_printed = false;
 bool pred_printed = false;
@@ -225,7 +226,7 @@ void HELPER(debug_commit_end)(CPUHexagonState *env, int has_st0, int has_st1)
 }
 
 for (i = 0; i < NUM_PREGS; i++) {
-if (env->pred_written & (1 << i)) {
+if (pred_written & (1 << i)) {
 if (!pred_printed) {
 HEX_DEBUG_LOG("Predicates written\n");
 pred_printed = true;
diff --git a/target/hexagon/translate.c b/target/hexagon/translate.c
index 890badac10..b185dda35a 100644
--- a/target/hexagon/translate.c
+++ b/target/hexagon/translate.c
@@ -46,7 +46,6 @@ TCGv hex_slot_cancelled;
 TCGv hex_branch_taken;
 TCGv hex_new_value_usr;
 TCGv hex_reg_written[TOTAL_PER_THREAD_REGS];
-TCGv hex_pred_written;
 TCGv hex_store_addr[STORES_MAX];
 TCGv hex_store_width[STORES_MAX];
 TCGv hex_store_val32[STORES_MAX];
@@ -549,7 +548,8 @@ static void gen_start_packet(DisasContext *ctx)
 }
 }
 if (HEX_DEBUG) {
-tcg_gen_movi_tl(hex_pred_written, 0);
+ctx->pred_written = tcg_temp_new();
+tcg_gen_movi_tl(ctx->pred_written, 0);
 }
 
 /* Preload the predicated registers into get_result_gpr(ctx, i) */
@@ -1004,7 +1004,8 @@ static void gen_commit_packet(DisasContext *ctx)
 tcg_constant_tl(pkt->pkt_has_store_s1 && !pkt->pkt_has_dczeroa);
 
 /* Handy place to set a breakpoint at the end of execution */
-gen_helper_debug_commit_end(cpu_env, has_st0, has_st1);
+

[PATCH v2 05/21] Hexagon (target/hexagon) Add overrides for clr[tf]new

2023-04-27 Thread Taylor Simpson
These instructions have implicit reads from p0, so we don't want
them in helpers when idef-parser is off.

Signed-off-by: Taylor Simpson 
---
 target/hexagon/gen_tcg.h | 16 
 target/hexagon/macros.h  |  4 
 2 files changed, 16 insertions(+), 4 deletions(-)
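
Note (not part of the patch): a minimal sketch of how the two overrides read
in plain C, assuming the usual movcond semantics of
ret = (c1 COND c2) ? v1 : v2. The functions movcond_eq, movcond_ne, clrtnew
and clrfnew are hypothetical models for illustration, not QEMU functions.

```c
#include <assert.h>
#include <stdint.h>

/* Plain C reading of movcond with TCG_COND_EQ. */
uint32_t movcond_eq(uint32_t c1, uint32_t c2, uint32_t v1, uint32_t v2)
{
    return c1 == c2 ? v1 : v2;
}

/* Plain C reading of movcond with TCG_COND_NE. */
uint32_t movcond_ne(uint32_t c1, uint32_t c2, uint32_t v1, uint32_t v2)
{
    return c1 != c2 ? v1 : v2;
}

/* if (p0.new) Rd = #0: Rd is kept only when the new p0 value is zero. */
uint32_t clrtnew(uint32_t rd, uint32_t new_p0)
{
    assert(new_p0 <= 0xff); /* predicate registers hold 8 bits */
    return movcond_eq(new_p0, 0, rd, 0);
}

/* if (!p0.new) Rd = #0: Rd is kept only when the new p0 value is nonzero. */
uint32_t clrfnew(uint32_t rd, uint32_t new_p0)
{
    assert(new_p0 <= 0xff);
    return movcond_ne(new_p0, 0, rd, 0);
}
```

Both overrides select between the old Rd and the constant zero, so no branch
is needed in the generated code.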

diff --git a/target/hexagon/gen_tcg.h b/target/hexagon/gen_tcg.h
index 7c5cb93297..f3e9c280b0 100644
--- a/target/hexagon/gen_tcg.h
+++ b/target/hexagon/gen_tcg.h
@@ -1097,6 +1097,22 @@
 gen_jump(ctx, riV); \
 } while (0)
 
+/* if (p0.new) r0 = #0 */
+#define fGEN_TCG_SA1_clrtnew(SHORTCODE) \
+do { \
+tcg_gen_movcond_tl(TCG_COND_EQ, RdV, \
+   hex_new_pred_value[0], tcg_constant_tl(0), \
+   RdV, tcg_constant_tl(0)); \
+} while (0)
+
+/* if (!p0.new) r0 = #0 */
+#define fGEN_TCG_SA1_clrfnew(SHORTCODE) \
+do { \
+tcg_gen_movcond_tl(TCG_COND_NE, RdV, \
+   hex_new_pred_value[0], tcg_constant_tl(0), \
+   RdV, tcg_constant_tl(0)); \
+} while (0)
+
 #define fGEN_TCG_J2_pause(SHORTCODE) \
 do { \
 uiV = uiV; \
diff --git a/target/hexagon/macros.h b/target/hexagon/macros.h
index 3e162de3a7..2cb0647ce2 100644
--- a/target/hexagon/macros.h
+++ b/target/hexagon/macros.h
@@ -227,12 +227,8 @@ static inline void gen_cancel(uint32_t slot)
 
 #ifdef QEMU_GENERATE
 #define fLSBNEW(PVAL)   tcg_gen_andi_tl(LSB, (PVAL), 1)
-#define fLSBNEW0tcg_gen_andi_tl(LSB, hex_new_pred_value[0], 1)
-#define fLSBNEW1tcg_gen_andi_tl(LSB, hex_new_pred_value[1], 1)
 #else
 #define fLSBNEW(PVAL)   ((PVAL) & 1)
-#define fLSBNEW0(env->new_pred_value[0] & 1)
-#define fLSBNEW1(env->new_pred_value[1] & 1)
 #endif
 
 #ifdef QEMU_GENERATE
-- 
2.25.1



[PATCH v2 21/21] Hexagon (target/hexagon) Move items to DisasContext

2023-04-27 Thread Taylor Simpson
The following items in the CPUHexagonState are only used for bookkeeping
within the translation of a packet.  With recent changes that eliminate
the need to free TCGv variables, these make more sense to be transient
and kept in DisasContext.

The following items are moved
dczero_addr
branch_taken
this_PC

Suggested-by: Richard Henderson 
Signed-off-by: Taylor Simpson 
Reviewed-by: Richard Henderson 
---
 target/hexagon/cpu.h   |  3 ---
 target/hexagon/helper.h|  2 +-
 target/hexagon/macros.h|  6 +-
 target/hexagon/translate.h |  5 ++---
 target/hexagon/genptr.c|  6 +++---
 target/hexagon/op_helper.c |  5 ++---
 target/hexagon/translate.c | 23 +++
 target/hexagon/README  |  2 +-
 8 files changed, 21 insertions(+), 31 deletions(-)

diff --git a/target/hexagon/cpu.h b/target/hexagon/cpu.h
index 72b7d79279..d3e5be7778 100644
--- a/target/hexagon/cpu.h
+++ b/target/hexagon/cpu.h
@@ -78,7 +78,6 @@ typedef struct {
 typedef struct CPUArchState {
 target_ulong gpr[TOTAL_PER_THREAD_REGS];
 target_ulong pred[NUM_PREGS];
-target_ulong branch_taken;
 
 /* For comparing with LLDB on target - see adjust_stack_ptrs function */
 target_ulong last_pc_dumped;
@@ -91,11 +90,9 @@ typedef struct CPUArchState {
  * Only used when HEX_DEBUG is on, but unconditionally included
  * to reduce recompile time when turning HEX_DEBUG on/off.
  */
-target_ulong this_PC;
 target_ulong reg_written[TOTAL_PER_THREAD_REGS];
 
 MemLog mem_log_stores[STORES_MAX];
-target_ulong dczero_addr;
 
 float_status fp_status;
 
diff --git a/target/hexagon/helper.h b/target/hexagon/helper.h
index f3b298beee..fa0ebaf7c8 100644
--- a/target/hexagon/helper.h
+++ b/target/hexagon/helper.h
@@ -21,7 +21,7 @@
 DEF_HELPER_FLAGS_2(raise_exception, TCG_CALL_NO_RETURN, noreturn, env, i32)
 DEF_HELPER_1(debug_start_packet, void, env)
 DEF_HELPER_FLAGS_3(debug_check_store_width, TCG_CALL_NO_WG, void, env, int, int)
-DEF_HELPER_FLAGS_4(debug_commit_end, TCG_CALL_NO_WG, void, env, int, int, int)
+DEF_HELPER_FLAGS_5(debug_commit_end, TCG_CALL_NO_WG, void, env, i32, int, int, int)
 DEF_HELPER_2(commit_store, void, env, int)
 DEF_HELPER_3(gather_store, void, env, i32, int)
 DEF_HELPER_1(commit_hvx_stores, void, env)
diff --git a/target/hexagon/macros.h b/target/hexagon/macros.h
index f5ebaf7f54..bad27d1aeb 100644
--- a/target/hexagon/macros.h
+++ b/target/hexagon/macros.h
@@ -648,7 +648,11 @@ static inline TCGv gen_read_ireg(TCGv result, TCGv val, int shift)
reg_field_info[FIELD].offset)
 
 #ifdef QEMU_GENERATE
-#define fDCZEROA(REG) tcg_gen_mov_tl(hex_dczero_addr, (REG))
+#define fDCZEROA(REG) \
+do { \
+ctx->dczero_addr = tcg_temp_new(); \
+tcg_gen_mov_tl(ctx->dczero_addr, (REG)); \
+} while (0)
 #endif
 
 #define fBRANCH_SPECULATE_STALL(DOTNEWVAL, JUMP_COND, SPEC_DIR, HINTBITNUM, \
diff --git a/target/hexagon/translate.h b/target/hexagon/translate.h
index 9697b4de0e..4dd59c6726 100644
--- a/target/hexagon/translate.h
+++ b/target/hexagon/translate.h
@@ -71,6 +71,8 @@ typedef struct DisasContext {
 TCGv new_value[TOTAL_PER_THREAD_REGS];
 TCGv new_pred_value[NUM_PREGS];
 TCGv pred_written;
+TCGv branch_taken;
+TCGv dczero_addr;
 } DisasContext;
 
 static inline void ctx_log_pred_write(DisasContext *ctx, int pnum)
@@ -189,16 +191,13 @@ static inline void ctx_log_qreg_read(DisasContext *ctx, int qnum)
 
 extern TCGv hex_gpr[TOTAL_PER_THREAD_REGS];
 extern TCGv hex_pred[NUM_PREGS];
-extern TCGv hex_this_PC;
 extern TCGv hex_slot_cancelled;
-extern TCGv hex_branch_taken;
 extern TCGv hex_new_value_usr;
 extern TCGv hex_reg_written[TOTAL_PER_THREAD_REGS];
 extern TCGv hex_store_addr[STORES_MAX];
 extern TCGv hex_store_width[STORES_MAX];
 extern TCGv hex_store_val32[STORES_MAX];
 extern TCGv_i64 hex_store_val64[STORES_MAX];
-extern TCGv hex_dczero_addr;
 extern TCGv hex_llsc_addr;
 extern TCGv hex_llsc_val;
 extern TCGv_i64 hex_llsc_val_i64;
diff --git a/target/hexagon/genptr.c b/target/hexagon/genptr.c
index 1e98e2913c..bd0e11247a 100644
--- a/target/hexagon/genptr.c
+++ b/target/hexagon/genptr.c
@@ -480,9 +480,9 @@ static void gen_write_new_pc_addr(DisasContext *ctx, TCGv addr,
 if (ctx->pkt->pkt_has_multi_cof) {
 /* If there are multiple branches in a packet, ignore the second one */
 tcg_gen_movcond_tl(TCG_COND_NE, hex_gpr[HEX_REG_PC],
-   hex_branch_taken, tcg_constant_tl(0),
+   ctx->branch_taken, tcg_constant_tl(0),
hex_gpr[HEX_REG_PC], addr);
-tcg_gen_movi_tl(hex_branch_taken, 1);
+tcg_gen_movi_tl(ctx->branch_taken, 1);
 } else {
 tcg_gen_mov_tl(hex_gpr[HEX_REG_PC], addr);
 }
@@ -503,7 +503,7 @@ static void gen_write_new_pc_pcrel(DisasContext *ctx, int pc_off,
 ctx->branch_cond = TCG_COND_ALWAYS;
 if (pred != NULL) {
 ctx->branch_cond = 

[PATCH v2 07/21] Hexagon (target/hexagon) Eliminate uses of log_pred_write function

2023-04-27 Thread Taylor Simpson
These instructions have implicit writes to registers, so we don't
want them to be helpers when idef-parser is off.

The following instructions are overridden
S2_cabacdecbin
SA1_cmpeqi

Remove the log_pred_write function from op_helper.c
Remove references in macros.h

Signed-off-by: Taylor Simpson 
Acked-by: Richard Henderson 
---
 target/hexagon/gen_tcg.h   | 16 +++
 target/hexagon/helper.h|  2 +
 target/hexagon/macros.h|  4 --
 target/hexagon/genptr.c|  5 ++
 target/hexagon/op_helper.c | 96 --
 5 files changed, 104 insertions(+), 19 deletions(-)
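
Note (not part of the patch): a minimal sketch of the predicate-write rule
that the removed log_pred_write helper implemented, where a second write to
the same predicate within a packet is and'ed with the first. PredLog and
pred_log_write are hypothetical names for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_PREGS 4

/* Hypothetical per-packet log of staged predicate values. */
typedef struct {
    uint8_t new_pred_value[NUM_PREGS];
    unsigned pred_written; /* bit n set: predicate n already written */
} PredLog;

void pred_log_write(PredLog *log, int pnum, uint32_t val)
{
    assert(pnum >= 0 && pnum < NUM_PREGS);
    uint8_t byte = (uint8_t)(val & 0xff); /* predicates hold 8 bits */

    if (log->pred_written & (1u << pnum)) {
        /* Later writes to the same predicate are and'ed together. */
        log->new_pred_value[pnum] &= byte;
    } else {
        /* The first write stores the value and marks the predicate. */
        log->new_pred_value[pnum] = byte;
        log->pred_written |= 1u << pnum;
    }
}
```

The first write sets the staged value and every further write in the same
packet can only clear bits, which is the behaviour any replacement has to
preserve.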

diff --git a/target/hexagon/gen_tcg.h b/target/hexagon/gen_tcg.h
index f3e9c280b0..2b2a6175a5 100644
--- a/target/hexagon/gen_tcg.h
+++ b/target/hexagon/gen_tcg.h
@@ -595,6 +595,14 @@
 gen_helper_vacsh_val(RxxV, cpu_env, RxxV, RssV, RttV); \
 } while (0)
 
+#define fGEN_TCG_S2_cabacdecbin(SHORTCODE) \
+do { \
+TCGv p0 = tcg_temp_new(); \
+gen_helper_cabacdecbin_pred(p0, RssV, RttV); \
+gen_helper_cabacdecbin_val(RddV, RssV, RttV); \
+gen_log_pred_write(ctx, 0, p0); \
+} while (0)
+
 /*
  * Approximate reciprocal
  * r3,p1 = sfrecipa(r0, r1)
@@ -900,6 +908,14 @@
 #define fGEN_TCG_J4_tstbit0_fp1_jump_t(SHORTCODE) \
 gen_cmpnd_tstbit0_jmp(ctx, 1, RsV, TCG_COND_NE, riV)
 
+/* p0 = cmp.eq(r0, #7) */
+#define fGEN_TCG_SA1_cmpeqi(SHORTCODE) \
+do { \
+TCGv p0 = tcg_temp_new(); \
+gen_comparei(TCG_COND_EQ, p0, RsV, uiV); \
+gen_log_pred_write(ctx, 0, p0); \
+} while (0)
+
 #define fGEN_TCG_J2_jump(SHORTCODE) \
 gen_jump(ctx, riV)
 #define fGEN_TCG_J2_jumpr(SHORTCODE) \
diff --git a/target/hexagon/helper.h b/target/hexagon/helper.h
index ed7f9842f6..73849e3d49 100644
--- a/target/hexagon/helper.h
+++ b/target/hexagon/helper.h
@@ -31,6 +31,8 @@ DEF_HELPER_3(sfrecipa, i64, env, f32, f32)
 DEF_HELPER_2(sfinvsqrta, i64, env, f32)
 DEF_HELPER_4(vacsh_val, s64, env, s64, s64, s64)
 DEF_HELPER_FLAGS_4(vacsh_pred, TCG_CALL_NO_RWG_SE, s32, env, s64, s64, s64)
+DEF_HELPER_FLAGS_2(cabacdecbin_val, TCG_CALL_NO_RWG_SE, s64, s64, s64)
+DEF_HELPER_FLAGS_2(cabacdecbin_pred, TCG_CALL_NO_RWG_SE, s32, s64, s64)
 
 /* Floating point */
 DEF_HELPER_2(conv_sf2df, f64, env, f32)
diff --git a/target/hexagon/macros.h b/target/hexagon/macros.h
index 94a676fbf9..16e72ed0d5 100644
--- a/target/hexagon/macros.h
+++ b/target/hexagon/macros.h
@@ -371,10 +371,6 @@ static inline TCGv gen_read_ireg(TCGv result, TCGv val, int shift)
 #define fSET_OVERFLOW() SET_USR_FIELD(USR_OVF, 1)
 #define fSET_LPCFG(VAL) SET_USR_FIELD(USR_LPCFG, (VAL))
 #define fGET_LPCFG (GET_USR_FIELD(USR_LPCFG))
-#define fWRITE_P0(VAL) log_pred_write(env, 0, VAL)
-#define fWRITE_P1(VAL) log_pred_write(env, 1, VAL)
-#define fWRITE_P2(VAL) log_pred_write(env, 2, VAL)
-#define fWRITE_P3(VAL) log_pred_write(env, 3, VAL)
 #define fPART1(WORK) if (part1) { WORK; return; }
 #define fCAST4u(A) ((uint32_t)(A))
 #define fCAST4s(A) ((int32_t)(A))
diff --git a/target/hexagon/genptr.c b/target/hexagon/genptr.c
index 43f6c6fb9f..cde5cff06a 100644
--- a/target/hexagon/genptr.c
+++ b/target/hexagon/genptr.c
@@ -560,6 +560,11 @@ static void gen_ploopNsi(DisasContext *ctx, int N, int count, int riV)
 {
 gen_ploopNsr(ctx, N, tcg_constant_tl(count), riV);
 }
+
+static inline void gen_comparei(TCGCond cond, TCGv res, TCGv arg1, int arg2)
+{
+gen_compare(cond, res, arg1, tcg_constant_tl(arg2));
+}
 #endif
 
 static void gen_cond_jumpr(DisasContext *ctx, TCGv dst_pc,
diff --git a/target/hexagon/op_helper.c b/target/hexagon/op_helper.c
index 7e9e3f305e..46ccc59106 100644
--- a/target/hexagon/op_helper.c
+++ b/target/hexagon/op_helper.c
@@ -52,21 +52,6 @@ G_NORETURN void HELPER(raise_exception)(CPUHexagonState *env, uint32_t excp)
 do_raise_exception_err(env, excp, 0);
 }
 
-static void log_pred_write(CPUHexagonState *env, int pnum, target_ulong val)
-{
-HEX_DEBUG_LOG("log_pred_write[%d] = " TARGET_FMT_ld
-  " (0x" TARGET_FMT_lx ")\n",
-  pnum, val, val);
-
-/* Multiple writes to the same preg are and'ed together */
-if (env->pred_written & (1 << pnum)) {
-env->new_pred_value[pnum] &= val & 0xff;
-} else {
-env->new_pred_value[pnum] = val & 0xff;
-env->pred_written |= 1 << pnum;
-}
-}
-
 void log_store32(CPUHexagonState *env, target_ulong addr,
  target_ulong val, int width, int slot)
 {
@@ -399,6 +384,87 @@ int32_t HELPER(vacsh_pred)(CPUHexagonState *env,
 return PeV;
 }
 
+int64_t HELPER(cabacdecbin_val)(int64_t RssV, int64_t RttV)
+{
+int64_t RddV = 0;
+size4u_t state;
+size4u_t valMPS;
+size4u_t bitpos;
+size4u_t range;
+size4u_t offset;
+size4u_t rLPS;
+size4u_t rMPS;
+
+state =  fEXTRACTU_RANGE(fGETWORD(1, RttV), 5, 0);
+valMPS = fEXTRACTU_RANGE(fGETWORD(1, RttV), 8, 8);
+bitpos = fEXTRACTU_RANGE(fGETWORD(0, RttV), 4, 0);
+range =  fGETWORD(0, 

[PATCH v2 14/21] Hexagon (target/hexagon) Short-circuit more HVX single instruction packets

2023-04-27 Thread Taylor Simpson
The generated helpers for HVX use pass-by-reference, so they can't
short-circuit when the reads/writes overlap.  The instructions with
overrides are OK because they use tcg_gen_gvec_*.

We add a flag has_hvx_helper to DisasContext and extend gen_analyze_funcs
to set the flag when the instruction is an HVX instruction with a
generated helper.

We add an override for V6_vcombine so that it can be short-circuited
along with a test case in tests/tcg/hexagon/hvx_misc.c

Signed-off-by: Taylor Simpson 
---
 target/hexagon/gen_tcg_hvx.h| 23 +++
 target/hexagon/translate.h  |  1 +
 target/hexagon/translate.c  | 17 +++--
 tests/tcg/hexagon/hvx_misc.c| 21 +
 target/hexagon/gen_analyze_funcs.py |  5 +
 5 files changed, 65 insertions(+), 2 deletions(-)
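
Note (not part of the patch): a minimal sketch of the overlap hazard the
vcombine override guards against, assuming a tiny four-word vector in place
of a real HVX register. Vec and vcombine are hypothetical models for
illustration; the override itself emits tcg_gen_gvec_mov calls and uses the
vtmp field as scratch space.

```c
#include <assert.h>
#include <stdint.h>

#define VEC_WORDS 4 /* hypothetical, far smaller than a real HVX vector */

typedef struct {
    uint32_t w[VEC_WORDS];
} Vec;

/*
 * Model of Vdd = vcombine(Vu, Vv): the low register of the pair gets Vv
 * and the high register gets Vu.  dd points at two consecutive registers.
 */
void vcombine(Vec *dd, const Vec *u, const Vec *v)
{
    assert(dd && u && v);
    if (&dd[0] != u) {
        /* Writing the low half cannot clobber Vu, so copy directly. */
        dd[0] = *v;
        dd[1] = *u;
    } else {
        /* The low half is Vu itself: save it before it is overwritten. */
        Vec tmp = *u;
        dd[0] = *v;
        dd[1] = tmp;
    }
}
```

The case exercised by test_vcombine, v3:2 = vcombine(v2, v3), takes the
second branch here because the low destination register is also the first
source.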

diff --git a/target/hexagon/gen_tcg_hvx.h b/target/hexagon/gen_tcg_hvx.h
index d4aefe8e3f..19680d8505 100644
--- a/target/hexagon/gen_tcg_hvx.h
+++ b/target/hexagon/gen_tcg_hvx.h
@@ -128,6 +128,29 @@ static inline void assert_vhist_tmp(DisasContext *ctx)
 tcg_gen_gvec_mov(MO_64, VdV_off, VuV_off, \
  sizeof(MMVector), sizeof(MMVector))
 
+/*
+ * Vector combine
+ *
+ * Be careful that the source and dest don't overlap
+ */
+#define fGEN_TCG_V6_vcombine(SHORTCODE) \
+do { \
+if (VddV_off != VuV_off) { \
+tcg_gen_gvec_mov(MO_64, VddV_off, VvV_off, \
+ sizeof(MMVector), sizeof(MMVector)); \
+tcg_gen_gvec_mov(MO_64, VddV_off + sizeof(MMVector), VuV_off, \
+ sizeof(MMVector), sizeof(MMVector)); \
+} else { \
+intptr_t tmpoff = offsetof(CPUHexagonState, vtmp); \
+tcg_gen_gvec_mov(MO_64, tmpoff, VuV_off, \
+ sizeof(MMVector), sizeof(MMVector)); \
+tcg_gen_gvec_mov(MO_64, VddV_off, VvV_off, \
+ sizeof(MMVector), sizeof(MMVector)); \
+tcg_gen_gvec_mov(MO_64, VddV_off + sizeof(MMVector), tmpoff, \
+ sizeof(MMVector), sizeof(MMVector)); \
+} \
+} while (0)
+
 /* Vector conditional move */
 #define fGEN_TCG_VEC_CMOV(PRED) \
 do { \
diff --git a/target/hexagon/translate.h b/target/hexagon/translate.h
index 3f6fd3452c..26bcae0395 100644
--- a/target/hexagon/translate.h
+++ b/target/hexagon/translate.h
@@ -68,6 +68,7 @@ typedef struct DisasContext {
 bool is_tight_loop;
 bool need_pkt_has_store_s1;
 bool short_circuit;
+bool has_hvx_helper;
 } DisasContext;
 
 static inline void ctx_log_pred_write(DisasContext *ctx, int pnum)
diff --git a/target/hexagon/translate.c b/target/hexagon/translate.c
index b714a8da96..c7a04e34d2 100644
--- a/target/hexagon/translate.c
+++ b/target/hexagon/translate.c
@@ -378,8 +378,20 @@ static bool need_commit(DisasContext *ctx)
 return true;
 }
 
-if (pkt->num_insns == 1 && !pkt->pkt_has_hvx) {
-return false;
+if (pkt->num_insns == 1) {
+if (pkt->pkt_has_hvx) {
+/*
+ * The HVX instructions with generated helpers use
+ * pass-by-reference, so they need the read/write overlap
+ * check below.
+ * The HVX instructions with overrides are OK.
+ */
+if (!ctx->has_hvx_helper) {
+return false;
+}
+} else {
+return false;
+}
 }
 
 /* Check for overlap between register reads and writes */
@@ -454,6 +466,7 @@ static void analyze_packet(DisasContext *ctx)
 {
 Packet *pkt = ctx->pkt;
 ctx->need_pkt_has_store_s1 = false;
+ctx->has_hvx_helper = false;
 for (int i = 0; i < pkt->num_insns; i++) {
 Insn *insn = &pkt->insn[i];
 ctx->insn = insn;
diff --git a/tests/tcg/hexagon/hvx_misc.c b/tests/tcg/hexagon/hvx_misc.c
index d0e64e035f..c89fe0253d 100644
--- a/tests/tcg/hexagon/hvx_misc.c
+++ b/tests/tcg/hexagon/hvx_misc.c
@@ -454,6 +454,25 @@ static void test_load_cur_predicated(void)
 check_output_w(__LINE__, BUFSIZE);
 }
 
+static void test_vcombine(void)
+{
+for (int i = 0; i < BUFSIZE / 2; i++) {
+asm volatile("v2 = vsplat(%0)\n\t"
+ "v3 = vsplat(%1)\n\t"
+ "v3:2 = vcombine(v2, v3)\n\t"
+ "vmem(%2+#0) = v2\n\t"
+ "vmem(%2+#1) = v3\n\t"
+ :
+ : "r"(2 * i), "r"(2 * i + 1), "r"(&output[2 * i])
+ : "v2", "v3", "memory");
+for (int j = 0; j < MAX_VEC_SIZE_BYTES / 4; j++) {
+expect[2 * i].w[j] = 2 * i + 1;
+expect[2 * i + 1].w[j] = 2 * i;
+}
+}
+check_output_w(__LINE__, BUFSIZE);
+}
+
 int main()
 {
 init_buffers();
@@ -494,6 +513,8 @@ int main()
 test_load_tmp_predicated();
 test_load_cur_predicated();
 
+test_vcombine();
+
 puts(err ? "FAIL" : "PASS");
 return err ? 1 : 0;
 }
diff 

[PATCH v2 09/21] Hexagon (target/hexagon) Don't overlap dest writes with source reads

2023-04-27 Thread Taylor Simpson
When generating TCG, make sure we have read all the operand registers
before writing to the destination registers.

This is a prerequisite for short-circuiting where the source and dest
operands could be the same.

Signed-off-by: Taylor Simpson 
Reviewed-by: Richard Henderson 
---
 target/hexagon/genptr.c | 45 ++---
 1 file changed, 29 insertions(+), 16 deletions(-)

diff --git a/target/hexagon/genptr.c b/target/hexagon/genptr.c
index 2014a8068a..aff9ffe37b 100644
--- a/target/hexagon/genptr.c
+++ b/target/hexagon/genptr.c
@@ -971,6 +971,7 @@ static void gen_cmpi_jumpnv(DisasContext *ctx,
 /* Shift left with saturation */
 static void gen_shl_sat(DisasContext *ctx, TCGv dst, TCGv src, TCGv shift_amt)
 {
+TCGv tmp = tcg_temp_new();/* In case dst == src */
 TCGv usr = get_result_gpr(ctx, HEX_REG_USR);
 TCGv sh32 = tcg_temp_new();
 TCGv dst_sar = tcg_temp_new();
@@ -995,17 +996,17 @@ static void gen_shl_sat(DisasContext *ctx, TCGv dst, TCGv src, TCGv shift_amt)
  */
 
 tcg_gen_andi_tl(sh32, shift_amt, 31);
-tcg_gen_movcond_tl(TCG_COND_EQ, dst, sh32, shift_amt,
+tcg_gen_movcond_tl(TCG_COND_EQ, tmp, sh32, shift_amt,
src, tcg_constant_tl(0));
-tcg_gen_shl_tl(dst, dst, sh32);
-tcg_gen_sar_tl(dst_sar, dst, sh32);
+tcg_gen_shl_tl(tmp, tmp, sh32);
+tcg_gen_sar_tl(dst_sar, tmp, sh32);
 tcg_gen_movcond_tl(TCG_COND_LT, satval, src, tcg_constant_tl(0), min, max);
 
 tcg_gen_setcond_tl(TCG_COND_NE, ovf, dst_sar, src);
 tcg_gen_shli_tl(ovf, ovf, reg_field_info[USR_OVF].offset);
 tcg_gen_or_tl(usr, usr, ovf);
 
-tcg_gen_movcond_tl(TCG_COND_EQ, dst, dst_sar, src, dst, satval);
+tcg_gen_movcond_tl(TCG_COND_EQ, dst, dst_sar, src, tmp, satval);
 }
 
 static void gen_sar(TCGv dst, TCGv src, TCGv shift_amt)
@@ -1228,22 +1229,28 @@ void gen_sat_i32(TCGv dest, TCGv source, int width)
 
 void gen_sat_i32_ovfl(TCGv ovfl, TCGv dest, TCGv source, int width)
 {
-gen_sat_i32(dest, source, width);
-tcg_gen_setcond_tl(TCG_COND_NE, ovfl, source, dest);
+TCGv tmp = tcg_temp_new();/* In case dest == source */
+gen_sat_i32(tmp, source, width);
+tcg_gen_setcond_tl(TCG_COND_NE, ovfl, source, tmp);
+tcg_gen_mov_tl(dest, tmp);
 }
 
 void gen_satu_i32(TCGv dest, TCGv source, int width)
 {
+TCGv tmp = tcg_temp_new();/* In case dest == source */
 TCGv max_val = tcg_constant_tl((1 << width) - 1);
 TCGv zero = tcg_constant_tl(0);
-tcg_gen_movcond_tl(TCG_COND_GTU, dest, source, max_val, max_val, source);
-tcg_gen_movcond_tl(TCG_COND_LT, dest, source, zero, zero, dest);
+tcg_gen_movcond_tl(TCG_COND_GTU, tmp, source, max_val, max_val, source);
+tcg_gen_movcond_tl(TCG_COND_LT, tmp, source, zero, zero, tmp);
+tcg_gen_mov_tl(dest, tmp);
 }
 
 void gen_satu_i32_ovfl(TCGv ovfl, TCGv dest, TCGv source, int width)
 {
-gen_satu_i32(dest, source, width);
-tcg_gen_setcond_tl(TCG_COND_NE, ovfl, source, dest);
+TCGv tmp = tcg_temp_new();/* In case dest == source */
+gen_satu_i32(tmp, source, width);
+tcg_gen_setcond_tl(TCG_COND_NE, ovfl, source, tmp);
+tcg_gen_mov_tl(dest, tmp);
 }
 
 void gen_sat_i64(TCGv_i64 dest, TCGv_i64 source, int width)
@@ -1256,27 +1263,33 @@ void gen_sat_i64(TCGv_i64 dest, TCGv_i64 source, int width)
 
 void gen_sat_i64_ovfl(TCGv ovfl, TCGv_i64 dest, TCGv_i64 source, int width)
 {
+TCGv_i64 tmp = tcg_temp_new_i64(); /* In case dest == source */
 TCGv_i64 ovfl_64;
-gen_sat_i64(dest, source, width);
+gen_sat_i64(tmp, source, width);
 ovfl_64 = tcg_temp_new_i64();
-tcg_gen_setcond_i64(TCG_COND_NE, ovfl_64, dest, source);
+tcg_gen_setcond_i64(TCG_COND_NE, ovfl_64, tmp, source);
+tcg_gen_mov_i64(dest, tmp);
 tcg_gen_trunc_i64_tl(ovfl, ovfl_64);
 }
 
 void gen_satu_i64(TCGv_i64 dest, TCGv_i64 source, int width)
 {
+TCGv_i64 tmp = tcg_temp_new_i64();/* In case dest == source */
 TCGv_i64 max_val = tcg_constant_i64((1LL << width) - 1LL);
 TCGv_i64 zero = tcg_constant_i64(0);
-tcg_gen_movcond_i64(TCG_COND_GTU, dest, source, max_val, max_val, source);
-tcg_gen_movcond_i64(TCG_COND_LT, dest, source, zero, zero, dest);
+tcg_gen_movcond_i64(TCG_COND_GTU, tmp, source, max_val, max_val, source);
+tcg_gen_movcond_i64(TCG_COND_LT, tmp, source, zero, zero, tmp);
+tcg_gen_mov_i64(dest, tmp);
 }
 
 void gen_satu_i64_ovfl(TCGv ovfl, TCGv_i64 dest, TCGv_i64 source, int width)
 {
+TCGv_i64 tmp = tcg_temp_new_i64();/* In case dest == source */
 TCGv_i64 ovfl_64;
-gen_satu_i64(dest, source, width);
+gen_satu_i64(tmp, source, width);
 ovfl_64 = tcg_temp_new_i64();
-tcg_gen_setcond_i64(TCG_COND_NE, ovfl_64, dest, source);
+tcg_gen_setcond_i64(TCG_COND_NE, ovfl_64, tmp, source);
+tcg_gen_mov_i64(dest, tmp);
 tcg_gen_trunc_i64_tl(ovfl, ovfl_64);
 }
 
-- 
2.25.1

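The hazard this patch guards against can be sketched in plain C rather
than TCG.  This is only an illustration under assumed names (sat_n,
sat_ovfl_early_write and sat_ovfl_tmp are hypothetical, not QEMU
functions): when dest and source alias, writing dest before the overflow
comparison loses the original source value, while a temporary keeps the
read ahead of the write.

```c
#include <stdint.h>

/* Saturate a signed 32-bit value to `width` bits (1 <= width <= 32). */
static int32_t sat_n(int32_t value, int width)
{
    int64_t max = ((int64_t)1 << (width - 1)) - 1;
    int64_t min = -max - 1;

    if (value > max) {
        return (int32_t)max;
    }
    if (value < min) {
        return (int32_t)min;
    }
    return value;
}

/* Hazard: *dest is written before the last read of *source. */
static void sat_ovfl_early_write(int32_t *ovfl, int32_t *dest,
                                 const int32_t *source, int width)
{
    *dest = sat_n(*source, width);
    /* If dest == source, this compares the saturated value with itself */
    *ovfl = (*source != *dest);
}

/* Same operation with a temporary: all reads happen before the write. */
static void sat_ovfl_tmp(int32_t *ovfl, int32_t *dest,
                         const int32_t *source, int width)
{
    int32_t tmp = sat_n(*source, width);

    *ovfl = (*source != tmp);
    *dest = tmp;
}
```

Calling both with the same variable as dest and source (value 1000,
width 8) saturates to 127 in either case, but only the version with the
temporary reports the overflow.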


[PATCH v2 15/21] Hexagon (target/hexagon) Add overrides for disabled idef-parser insns

2023-04-27 Thread Taylor Simpson
The following have overrides
S2_insert
S2_insert_rp
S2_asr_r_svw_trun
A2_swiz

These instructions have semantics that write to the destination
before all the operand reads have been completed.  Therefore,
the idef-parser versions were disabled with the short-circuit patch.

Test cases added to tests/tcg/hexagon/read_write_overlap.c

Signed-off-by: Taylor Simpson 
Reviewed-by: Richard Henderson 
---
 target/hexagon/gen_tcg.h   |  18 
 target/hexagon/genptr.c|  99 ++
 tests/tcg/hexagon/read_write_overlap.c | 136 +
 tests/tcg/hexagon/Makefile.target  |   1 +
 4 files changed, 254 insertions(+)
 create mode 100644 tests/tcg/hexagon/read_write_overlap.c

diff --git a/target/hexagon/gen_tcg.h b/target/hexagon/gen_tcg.h
index 1f7e535300..fabc1eb623 100644
--- a/target/hexagon/gen_tcg.h
+++ b/target/hexagon/gen_tcg.h
@@ -1181,6 +1181,24 @@
 tcg_gen_extrl_i64_i32(RdV, tmp); \
 } while (0)
 
+#define fGEN_TCG_S2_insert(SHORTCODE) \
+do { \
+int width = uiV; \
+int offset = UiV; \
+if (width != 0) { \
+if (offset + width > 32) { \
+width = 32 - offset; \
+} \
+tcg_gen_deposit_tl(RxV, RxV, RsV, offset, width); \
+} \
+} while (0)
+#define fGEN_TCG_S2_insert_rp(SHORTCODE) \
+gen_insert_rp(ctx, RxV, RsV, RttV)
+#define fGEN_TCG_S2_asr_r_svw_trun(SHORTCODE) \
+gen_asr_r_svw_trun(ctx, RdV, RssV, RtV)
+#define fGEN_TCG_A2_swiz(SHORTCODE) \
+tcg_gen_bswap_tl(RdV, RsV)
+
 /* Floating point */
 #define fGEN_TCG_F2_conv_sf2df(SHORTCODE) \
 gen_helper_conv_sf2df(RddV, cpu_env, RsV)
diff --git a/target/hexagon/genptr.c b/target/hexagon/genptr.c
index d134d8082a..0727d4524b 100644
--- a/target/hexagon/genptr.c
+++ b/target/hexagon/genptr.c
@@ -1065,6 +1065,105 @@ static void gen_asl_r_r_sat(DisasContext *ctx, TCGv RdV, TCGv RsV, TCGv RtV)
 gen_set_label(done);
 }
 
+static void gen_insert_rp(DisasContext *ctx, TCGv RxV, TCGv RsV, TCGv_i64 RttV)
+{
+/*
+ * int width = fZXTN(6, 32, (fGETWORD(1, RttV)));
+ * int offset = fSXTN(7, 32, (fGETWORD(0, RttV)));
+ * size8u_t mask = ((fCONSTLL(1) << width) - 1);
+ * if (offset < 0) {
+ * RxV = 0;
+ * } else {
+ * RxV &= ~(mask << offset);
+ * RxV |= ((RsV & mask) << offset);
+ * }
+ */
+
+TCGv width = tcg_temp_new();
+TCGv offset = tcg_temp_new();
+TCGv_i64 mask = tcg_temp_new_i64();
+TCGv_i64 result = tcg_temp_new_i64();
+TCGv_i64 tmp = tcg_temp_new_i64();
+TCGv_i64 offset64 = tcg_temp_new_i64();
+TCGLabel *label = gen_new_label();
+TCGLabel *done = gen_new_label();
+
+tcg_gen_extrh_i64_i32(width, RttV);
+tcg_gen_extract_tl(width, width, 0, 6);
+tcg_gen_extrl_i64_i32(offset, RttV);
+tcg_gen_sextract_tl(offset, offset, 0, 7);
+/* Possible values for offset are -64 .. 63 */
+tcg_gen_brcondi_tl(TCG_COND_GE, offset, 0, label);
+/* For negative offsets, zero out the result */
+tcg_gen_movi_tl(RxV, 0);
+tcg_gen_br(done);
+gen_set_label(label);
+/* At this point, possible values of offset are 0 .. 63 */
+tcg_gen_ext_i32_i64(mask, width);
+tcg_gen_shl_i64(mask, tcg_constant_i64(1), mask);
+tcg_gen_subi_i64(mask, mask, 1);
+tcg_gen_extu_i32_i64(result, RxV);
+tcg_gen_ext_i32_i64(tmp, offset);
+tcg_gen_shl_i64(tmp, mask, tmp);
+tcg_gen_andc_i64(result, result, tmp);
+tcg_gen_extu_i32_i64(tmp, RsV);
+tcg_gen_and_i64(tmp, tmp, mask);
+tcg_gen_extu_i32_i64(offset64, offset);
+tcg_gen_shl_i64(tmp, tmp, offset64);
+tcg_gen_or_i64(result, result, tmp);
+tcg_gen_extrl_i64_i32(RxV, result);
+gen_set_label(done);
+}
+
+static void gen_asr_r_svw_trun(DisasContext *ctx, TCGv RdV,
+   TCGv_i64 RssV, TCGv RtV)
+{
+/*
+ * for (int i = 0; i < 2; i++) {
+ * fSETHALF(i, RdV, fGETHALF(0, ((fSXTN(7, 32, RtV) > 0) ?
+ * (fCAST4_8s(fGETWORD(i, RssV)) >> fSXTN(7, 32, RtV)) :
+ * (fCAST4_8s(fGETWORD(i, RssV)) << -fSXTN(7, 32, RtV);
+ * }
+ */
+TCGv shift_amt32 = tcg_temp_new();
+TCGv_i64 shift_amt64 = tcg_temp_new_i64();
+TCGv_i64 tmp64 = tcg_temp_new_i64();
+TCGv tmp32 = tcg_temp_new();
+TCGLabel *label = gen_new_label();
+TCGLabel *zero = gen_new_label();
+TCGLabel *done =  gen_new_label();
+
+tcg_gen_sextract_tl(shift_amt32, RtV, 0, 7);
+/* Possible values of shift_amt32 are -64 .. 63 */
+tcg_gen_brcondi_tl(TCG_COND_LE, shift_amt32, 0, label);
+/* After branch, possible values of shift_amt32 are 1 .. 63 */
+tcg_gen_ext_i32_i64(shift_amt64, shift_amt32);
+for (int i = 0; i < 2; i++) {
+tcg_gen_sextract_i64(tmp64, RssV, i * 32, 32);
+tcg_gen_sar_i64(tmp64, tmp64, shift_amt64);
+tcg_gen_extrl_i64_i32(tmp32, tmp64);
+tcg_gen_deposit_tl(RdV, RdV, tmp32, i * 16, 
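
The C-like comment at the top of gen_insert_rp above can be read as a
plain C reference model, which may help when checking the TCG sequence
by hand.  The sketch below is an assumption-laden illustration, not code
from the patch: insert_rp_ref is a hypothetical name, and the fZXTN and
fSXTN macros are expanded by hand as a 6-bit zero extension and a 7-bit
sign extension.

```c
#include <stdint.h>

/* Reference model of the S2_insert_rp semantics quoted in the comment. */
static uint32_t insert_rp_ref(uint32_t rx, uint32_t rs, uint64_t rtt)
{
    uint32_t word0 = (uint32_t)rtt;          /* fGETWORD(0, RttV) */
    uint32_t word1 = (uint32_t)(rtt >> 32);  /* fGETWORD(1, RttV) */
    /* Zero-extend the low 6 bits: width is 0 .. 63 */
    int width = (int)(word1 & 0x3f);
    /* Sign-extend the low 7 bits: offset is -64 .. 63 */
    int offset = (int)((word0 & 0x7f) ^ 0x40) - 0x40;
    uint64_t mask = ((uint64_t)1 << width) - 1;
    uint64_t result;

    if (offset < 0) {
        /* For negative offsets, zero out the result */
        return 0;
    }
    result = rx;
    result &= ~(mask << offset);
    result |= ((uint64_t)rs & mask) << offset;
    return (uint32_t)result;
}
```

With width 8 and offset 8, inserting 0xab into 0xffffffff yields
0xffffabff; a width of 0 leaves the destination unchanged.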

[PATCH v2 03/21] Hexagon (target/hexagon) Add overrides for loop setup instructions

2023-04-27 Thread Taylor Simpson
These instructions have implicit writes to registers, so we don't
want them to be helpers when idef-parser is off.

Signed-off-by: Taylor Simpson 
Acked-by: Richard Henderson 
---
 target/hexagon/gen_tcg.h | 21 +++
 target/hexagon/genptr.c  | 44 
 2 files changed, 65 insertions(+)

diff --git a/target/hexagon/gen_tcg.h b/target/hexagon/gen_tcg.h
index 060c11f6c0..5774af4a59 100644
--- a/target/hexagon/gen_tcg.h
+++ b/target/hexagon/gen_tcg.h
@@ -663,6 +663,27 @@
 #define fGEN_TCG_J2_callrf(SHORTCODE) \
 gen_cond_callr(ctx, TCG_COND_NE, PuV, RsV)
 
+#define fGEN_TCG_J2_loop0r(SHORTCODE) \
+gen_loop0r(ctx, RsV, riV)
+#define fGEN_TCG_J2_loop1r(SHORTCODE) \
+gen_loop1r(ctx, RsV, riV)
+#define fGEN_TCG_J2_loop0i(SHORTCODE) \
+gen_loop0i(ctx, UiV, riV)
+#define fGEN_TCG_J2_loop1i(SHORTCODE) \
+gen_loop1i(ctx, UiV, riV)
+#define fGEN_TCG_J2_ploop1sr(SHORTCODE) \
+gen_ploopNsr(ctx, 1, RsV, riV)
+#define fGEN_TCG_J2_ploop1si(SHORTCODE) \
+gen_ploopNsi(ctx, 1, UiV, riV)
+#define fGEN_TCG_J2_ploop2sr(SHORTCODE) \
+gen_ploopNsr(ctx, 2, RsV, riV)
+#define fGEN_TCG_J2_ploop2si(SHORTCODE) \
+gen_ploopNsi(ctx, 2, UiV, riV)
+#define fGEN_TCG_J2_ploop3sr(SHORTCODE) \
+gen_ploopNsr(ctx, 3, RsV, riV)
+#define fGEN_TCG_J2_ploop3si(SHORTCODE) \
+gen_ploopNsi(ctx, 3, UiV, riV)
+
 #define fGEN_TCG_J2_endloop0(SHORTCODE) \
 gen_endloop0(ctx)
 #define fGEN_TCG_J2_endloop1(SHORTCODE) \
diff --git a/target/hexagon/genptr.c b/target/hexagon/genptr.c
index 12c72cbac9..4c34da8407 100644
--- a/target/hexagon/genptr.c
+++ b/target/hexagon/genptr.c
@@ -518,6 +518,50 @@ static void gen_compare(TCGCond cond, TCGv res, TCGv arg1, TCGv arg2)
 tcg_gen_movcond_tl(cond, res, arg1, arg2, one, zero);
 }
 
+#ifndef CONFIG_HEXAGON_IDEF_PARSER
+static inline void gen_loop0r(DisasContext *ctx, TCGv RsV, int riV)
+{
+fIMMEXT(riV);
+fPCALIGN(riV);
+gen_log_reg_write(ctx, HEX_REG_LC0, RsV);
+gen_log_reg_write(ctx, HEX_REG_SA0, tcg_constant_tl(ctx->pkt->pc + riV));
+gen_set_usr_fieldi(ctx, USR_LPCFG, 0);
+}
+
+static void gen_loop0i(DisasContext *ctx, int count, int riV)
+{
+gen_loop0r(ctx, tcg_constant_tl(count), riV);
+}
+
+static inline void gen_loop1r(DisasContext *ctx, TCGv RsV, int riV)
+{
+fIMMEXT(riV);
+fPCALIGN(riV);
+gen_log_reg_write(ctx, HEX_REG_LC1, RsV);
+gen_log_reg_write(ctx, HEX_REG_SA1, tcg_constant_tl(ctx->pkt->pc + riV));
+}
+
+static void gen_loop1i(DisasContext *ctx, int count, int riV)
+{
+gen_loop1r(ctx, tcg_constant_tl(count), riV);
+}
+
+static void gen_ploopNsr(DisasContext *ctx, int N, TCGv RsV, int riV)
+{
+fIMMEXT(riV);
+fPCALIGN(riV);
+gen_log_reg_write(ctx, HEX_REG_LC0, RsV);
+gen_log_reg_write(ctx, HEX_REG_SA0, tcg_constant_tl(ctx->pkt->pc + riV));
+gen_set_usr_fieldi(ctx, USR_LPCFG, N);
+gen_log_pred_write(ctx, 3, tcg_constant_tl(0));
+}
+
+static void gen_ploopNsi(DisasContext *ctx, int N, int count, int riV)
+{
+gen_ploopNsr(ctx, N, tcg_constant_tl(count), riV);
+}
+#endif
+
 static void gen_cond_jumpr(DisasContext *ctx, TCGv dst_pc,
TCGCond cond, TCGv pred)
 {
-- 
2.25.1



[PATCH v2 00/21] Hexagon (target/hexagon) short-circuit and move to DisasContext

2023-04-27 Thread Taylor Simpson
This patch series achieves two major goals
Goal 1:  Short-circuit packet semantics
    In certain cases, we can avoid the overhead of writing to
    hex_new_value and write directly to hex_gpr.

    Here's a simple example of the TCG generated for
    0x004000b4:  0x7800c020 {   R0 = #0x1 }

    BEFORE:
      004000b4
     movi_i32 new_r0,$0x1
     mov_i32 r0,new_r0

    AFTER:
      004000b4
     movi_i32 r0,$0x1

Goal 2:  Move bookkeeping items from CPUHexagonState to DisasContext
    Suggested-by: Richard Henderson
    Several fields in CPUHexagonState are only used for bookkeeping
    within the translation of a packet.  With recent changes to eliminate
    the need to free TCGv variables, these make more sense to be
    transient and kept in DisasContext.


This patch series can be divided into 3 main parts
Part 1:  Patches 1-9
    Cleanup in preparation for parts 2 and 3
    The main goal is to move functionality out of generated helpers
Part 2:  Patches 10-15
    Short-circuit packet semantics
Part 3:  Patches 16-21
    Move bookkeeping items from CPUHexagonState to DisasContext


Changes in v2
Address feedback from Richard Henderson
    Cleaner implementation of gen_frame_scramble
    Add g_assert_not_reached() in gen_framecheck
    Move TCGv allocation inside gen_frame_scramble
    Change tcg_gen_brcond_tl to tcg_gen_movcond_tl
    Change static inline to G_GNUC_UNUSED
        Removed in later patch
    Change tcg_gen_not_i64 + tcg_gen_and_i64 to tcg_gen_andc_i64
    Use full constant in gen_slotval

Taylor Simpson (21):
  meson.build Add CONFIG_HEXAGON_IDEF_PARSER
  Hexagon (target/hexagon) Add DisasContext arg to gen_log_reg_write
  Hexagon (target/hexagon) Add overrides for loop setup instructions
  Hexagon (target/hexagon) Add overrides for allocframe/deallocframe
  Hexagon (target/hexagon) Add overrides for clr[tf]new
  Hexagon (target/hexagon) Remove log_reg_write from op_helper.[ch]
  Hexagon (target/hexagon) Eliminate uses of log_pred_write function
  Hexagon (target/hexagon) Clean up pred_written usage
  Hexagon (target/hexagon) Don't overlap dest writes with source reads
  Hexagon (target/hexagon) Mark registers as read during packet analysis
  Hexagon (target/hexagon) Short-circuit packet register writes
  Hexagon (target/hexagon) Short-circuit packet predicate writes
  Hexagon (target/hexagon) Short-circuit packet HVX writes
  Hexagon (target/hexagon) Short-circuit more HVX single instruction
packets
  Hexagon (target/hexagon) Add overrides for disabled idef-parser insns
  Hexagon (target/hexagon) Make special new_value for USR
  Hexagon (target/hexagon) Move new_value to DisasContext
  Hexagon (target/hexagon) Move new_pred_value to DisasContext
  Hexagon (target/hexagon) Move pred_written to DisasContext
  Hexagon (target/hexagon) Move pkt_has_store_s1 to DisasContext
  Hexagon (target/hexagon) Move items to DisasContext

 meson.build |   1 +
 target/hexagon/cpu.h|  10 +-
 target/hexagon/gen_tcg.h| 116 ++-
 target/hexagon/gen_tcg_hvx.h|  23 ++
 target/hexagon/genptr.h |   6 +-
 target/hexagon/helper.h |   6 +-
 target/hexagon/macros.h |  57 ++--
 target/hexagon/op_helper.h  |  16 +-
 target/hexagon/translate.h  |  52 ++-
 target/hexagon/attribs_def.h.inc|   6 +-
 target/hexagon/arch.c   |   3 +-
 target/hexagon/cpu.c|   5 +-
 target/hexagon/genptr.c | 347 
 target/hexagon/idef-parser/parser-helpers.c |   4 +-
 target/hexagon/op_helper.c  | 154 ++---
 target/hexagon/translate.c  | 272 ++-
 tests/tcg/hexagon/hvx_misc.c|  21 ++
 tests/tcg/hexagon/read_write_overlap.c  | 136 
 target/hexagon/README   |   6 +-
 target/hexagon/gen_analyze_funcs.py |  51 ++-
 target/hexagon/gen_helper_funcs.py  |   9 +-
 target/hexagon/gen_helper_protos.py |  10 +-
 target/hexagon/gen_idef_parser_funcs.py |   7 +
 target/hexagon/gen_tcg_funcs.py |  21 +-
 target/hexagon/hex_common.py|  16 +-
 tests/tcg/hexagon/Makefile.target   |   1 +
 26 files changed, 1063 insertions(+), 293 deletions(-)
 create mode 100644 tests/tcg/hexagon/read_write_overlap.c

-- 
2.25.1


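The short-circuit decision described in the cover letter boils down to
asking whether any register written by a packet is also read by it.  A
minimal sketch of that idea in plain C follows, assuming a hypothetical
PacketSummary type with 32-bit read/write masks; the real need_commit()
in translate.c has additional early conditions and uses bitmaps and
logs rather than a single mask.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary of one packet; not a QEMU data structure. */
typedef struct {
    int num_insns;          /* instructions in the packet */
    uint32_t regs_read;     /* bit n set if register n is read */
    uint32_t regs_written;  /* bit n set if register n is written */
} PacketSummary;

/*
 * Return true when writes must go through the temporary "new value"
 * storage and be committed at the end of the packet.
 */
static bool packet_needs_commit(const PacketSummary *p)
{
    /* A single instruction reads its operands before writing results */
    if (p->num_insns == 1) {
        return false;
    }
    /* Any overlap between reads and writes forces the commit step */
    return (p->regs_read & p->regs_written) != 0;
}
```

When the function returns false, the translator can target the
architectural register directly, which is what turns the two-op BEFORE
sequence in the example into the single movi_i32 in AFTER.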

[PATCH v2 17/21] Hexagon (target/hexagon) Move new_value to DisasContext

2023-04-27 Thread Taylor Simpson
The new_value array in the CPUHexagonState is only used for bookkeeping
within the translation of a packet.  With recent changes that eliminate
the need to free TCGv variables, these make more sense to be transient
and kept in DisasContext.

Suggested-by: Richard Henderson 
Signed-off-by: Taylor Simpson 
Reviewed-by: Richard Henderson 
---
 target/hexagon/cpu.h   |  1 -
 target/hexagon/translate.h |  2 +-
 target/hexagon/genptr.c|  6 +-
 target/hexagon/translate.c | 14 +++---
 4 files changed, 9 insertions(+), 14 deletions(-)

diff --git a/target/hexagon/cpu.h b/target/hexagon/cpu.h
index 3687f2caa2..22aba20be2 100644
--- a/target/hexagon/cpu.h
+++ b/target/hexagon/cpu.h
@@ -85,7 +85,6 @@ typedef struct CPUArchState {
 target_ulong stack_start;
 
 uint8_t slot_cancelled;
-target_ulong new_value[TOTAL_PER_THREAD_REGS];
 target_ulong new_value_usr;
 
 /*
diff --git a/target/hexagon/translate.h b/target/hexagon/translate.h
index 4c17433a6f..6dde487566 100644
--- a/target/hexagon/translate.h
+++ b/target/hexagon/translate.h
@@ -69,6 +69,7 @@ typedef struct DisasContext {
 bool need_pkt_has_store_s1;
 bool short_circuit;
 bool has_hvx_helper;
+TCGv new_value[TOTAL_PER_THREAD_REGS];
 } DisasContext;
 
 static inline void ctx_log_pred_write(DisasContext *ctx, int pnum)
@@ -190,7 +191,6 @@ extern TCGv hex_pred[NUM_PREGS];
 extern TCGv hex_this_PC;
 extern TCGv hex_slot_cancelled;
 extern TCGv hex_branch_taken;
-extern TCGv hex_new_value[TOTAL_PER_THREAD_REGS];
 extern TCGv hex_new_value_usr;
 extern TCGv hex_reg_written[TOTAL_PER_THREAD_REGS];
 extern TCGv hex_new_pred_value[NUM_PREGS];
diff --git a/target/hexagon/genptr.c b/target/hexagon/genptr.c
index ede1474ea5..c7a8e2ce55 100644
--- a/target/hexagon/genptr.c
+++ b/target/hexagon/genptr.c
@@ -74,7 +74,11 @@ TCGv get_result_gpr(DisasContext *ctx, int rnum)
 if (rnum == HEX_REG_USR) {
 return hex_new_value_usr;
 } else {
-return hex_new_value[rnum];
+if (ctx->new_value[rnum] == NULL) {
+ctx->new_value[rnum] = tcg_temp_new();
+tcg_gen_movi_tl(ctx->new_value[rnum], 0);
+}
+return ctx->new_value[rnum];
 }
 } else {
 return hex_gpr[rnum];
diff --git a/target/hexagon/translate.c b/target/hexagon/translate.c
index d46a724c1b..5f35bb20e7 100644
--- a/target/hexagon/translate.c
+++ b/target/hexagon/translate.c
@@ -44,7 +44,6 @@ TCGv hex_pred[NUM_PREGS];
 TCGv hex_this_PC;
 TCGv hex_slot_cancelled;
 TCGv hex_branch_taken;
-TCGv hex_new_value[TOTAL_PER_THREAD_REGS];
 TCGv hex_new_value_usr;
 TCGv hex_reg_written[TOTAL_PER_THREAD_REGS];
 TCGv hex_new_pred_value[NUM_PREGS];
@@ -513,6 +512,9 @@ static void gen_start_packet(DisasContext *ctx)
 }
 ctx->s1_store_processed = false;
 ctx->pre_commit = true;
+for (i = 0; i < TOTAL_PER_THREAD_REGS; i++) {
+ctx->new_value[i] = NULL;
+}
 
 analyze_packet(ctx);
 
@@ -1156,7 +1158,6 @@ void gen_intermediate_code(CPUState *cs, TranslationBlock *tb, int *max_insns,
 }
 
 #define NAME_LEN   64
-static char new_value_names[TOTAL_PER_THREAD_REGS][NAME_LEN];
 static char reg_written_names[TOTAL_PER_THREAD_REGS][NAME_LEN];
 static char new_pred_value_names[NUM_PREGS][NAME_LEN];
 static char store_addr_names[STORES_MAX][NAME_LEN];
@@ -1178,15 +1179,6 @@ void hexagon_translate_init(void)
 offsetof(CPUHexagonState, gpr[i]),
 hexagon_regnames[i]);
 
-if (i == HEX_REG_USR) {
-hex_new_value[i] = NULL;
-} else {
-snprintf(new_value_names[i], NAME_LEN, "new_%s", hexagon_regnames[i]);
-hex_new_value[i] = tcg_global_mem_new(cpu_env,
-offsetof(CPUHexagonState, new_value[i]),
-new_value_names[i]);
-}
-
 if (HEX_DEBUG) {
 snprintf(reg_written_names[i], NAME_LEN, "reg_written_%s",
  hexagon_regnames[i]);
-- 
2.25.1



[PATCH v2 13/21] Hexagon (target/hexagon) Short-circuit packet HVX writes

2023-04-27 Thread Taylor Simpson
In certain cases, we can avoid the overhead of writing to future_VRegs
and write directly to VRegs.  We consider HVX reads/writes when computing
ctx->need_commit.  Then, we can early-exit from gen_commit_hvx.

Signed-off-by: Taylor Simpson 
Reviewed-by: Richard Henderson 
---
 target/hexagon/genptr.c|  6 -
 target/hexagon/translate.c | 46 +-
 2 files changed, 50 insertions(+), 2 deletions(-)

diff --git a/target/hexagon/genptr.c b/target/hexagon/genptr.c
index 33f9d78aed..d134d8082a 100644
--- a/target/hexagon/genptr.c
+++ b/target/hexagon/genptr.c
@@ -1104,7 +1104,11 @@ static void gen_log_vreg_write_pair(DisasContext *ctx, intptr_t srcoff, int num,
 
 static intptr_t get_result_qreg(DisasContext *ctx, int qnum)
 {
-return  offsetof(CPUHexagonState, future_QRegs[qnum]);
+if (ctx->need_commit) {
+return  offsetof(CPUHexagonState, future_QRegs[qnum]);
+} else {
+return  offsetof(CPUHexagonState, QRegs[qnum]);
+}
 }
 
 static void gen_vreg_load(DisasContext *ctx, intptr_t dstoff, TCGv src,
diff --git a/target/hexagon/translate.c b/target/hexagon/translate.c
index 4532b8d05e..b714a8da96 100644
--- a/target/hexagon/translate.c
+++ b/target/hexagon/translate.c
@@ -70,6 +70,10 @@ intptr_t ctx_future_vreg_off(DisasContext *ctx, int regnum,
 {
 intptr_t offset;
 
+if (!ctx->need_commit) {
+return offsetof(CPUHexagonState, VRegs[regnum]);
+}
+
 /* See if it is already allocated */
 for (int i = 0; i < ctx->future_vregs_idx; i++) {
 if (ctx->future_vregs_num[i] == regnum) {
@@ -374,7 +378,7 @@ static bool need_commit(DisasContext *ctx)
 return true;
 }
 
-if (pkt->num_insns == 1) {
+if (pkt->num_insns == 1 && !pkt->pkt_has_hvx) {
 return false;
 }
 
@@ -394,6 +398,40 @@ static bool need_commit(DisasContext *ctx)
 }
 }
 
+/* Check for overlap between HVX reads and writes */
+for (int i = 0; i < ctx->vreg_log_idx; i++) {
+int vnum = ctx->vreg_log[i];
+if (test_bit(vnum, ctx->vregs_read)) {
+return true;
+}
+}
+if (!bitmap_empty(ctx->vregs_updated_tmp, NUM_VREGS)) {
+int i = find_first_bit(ctx->vregs_updated_tmp, NUM_VREGS);
+while (i < NUM_VREGS) {
+if (test_bit(i, ctx->vregs_read)) {
+return true;
+}
+i = find_next_bit(ctx->vregs_updated_tmp, NUM_VREGS, i + 1);
+}
+}
+if (!bitmap_empty(ctx->vregs_select, NUM_VREGS)) {
+int i = find_first_bit(ctx->vregs_select, NUM_VREGS);
+while (i < NUM_VREGS) {
+if (test_bit(i, ctx->vregs_read)) {
+return true;
+}
+i = find_next_bit(ctx->vregs_select, NUM_VREGS, i + 1);
+}
+}
+
+/* Check for overlap between HVX predicate reads and writes */
+for (int i = 0; i < ctx->qreg_log_idx; i++) {
+int qnum = ctx->qreg_log[i];
+if (test_bit(qnum, ctx->qregs_read)) {
+return true;
+}
+}
+
 return false;
 }
 
@@ -787,6 +825,12 @@ static void gen_commit_hvx(DisasContext *ctx)
 {
 int i;
 
+/* Early exit if not needed */
+if (!ctx->need_commit) {
+g_assert(!pkt_has_hvx_store(ctx->pkt));
+return;
+}
+
 /*
  *for (i = 0; i < ctx->vreg_log_idx; i++) {
  *int rnum = ctx->vreg_log[i];
-- 
2.25.1



[PATCH v2 12/21] Hexagon (target/hexagon) Short-circuit packet predicate writes

2023-04-27 Thread Taylor Simpson
In certain cases, we can avoid the overhead of writing to hex_new_pred_value
and write directly to hex_pred.  We consider predicate reads/writes when
computing ctx->need_commit.  The get_result_pred() function uses this
field to decide between hex_new_pred_value and hex_pred.  Then, we can
early-exit from gen_pred_writes.

Signed-off-by: Taylor Simpson 
Reviewed-by: Richard Henderson 
---
 target/hexagon/genptr.h|  1 +
 target/hexagon/genptr.c| 15 ---
 target/hexagon/translate.c | 14 +++---
 3 files changed, 24 insertions(+), 6 deletions(-)

diff --git a/target/hexagon/genptr.h b/target/hexagon/genptr.h
index 420867f934..e11ccc2358 100644
--- a/target/hexagon/genptr.h
+++ b/target/hexagon/genptr.h
@@ -35,6 +35,7 @@ void gen_store4i(TCGv_env cpu_env, TCGv vaddr, int32_t src, uint32_t slot);
 void gen_store8i(TCGv_env cpu_env, TCGv vaddr, int64_t src, uint32_t slot);
 TCGv gen_read_reg(TCGv result, int num);
 TCGv gen_read_preg(TCGv pred, uint8_t num);
+TCGv get_result_pred(DisasContext *ctx, int pnum);
 void gen_log_reg_write(DisasContext *ctx, int rnum, TCGv val);
 void gen_log_pred_write(DisasContext *ctx, int pnum, TCGv val);
 void gen_set_usr_field(DisasContext *ctx, int field, TCGv val);
diff --git a/target/hexagon/genptr.c b/target/hexagon/genptr.c
index 5a0f6b5195..33f9d78aed 100644
--- a/target/hexagon/genptr.c
+++ b/target/hexagon/genptr.c
@@ -110,8 +110,18 @@ static void gen_log_reg_write_pair(DisasContext *ctx, int rnum, TCGv_i64 val)
 gen_log_reg_write(ctx, rnum + 1, val32);
 }
 
+TCGv get_result_pred(DisasContext *ctx, int pnum)
+{
+if (ctx->need_commit) {
+return hex_new_pred_value[pnum];
+} else {
+return hex_pred[pnum];
+}
+}
+
 void gen_log_pred_write(DisasContext *ctx, int pnum, TCGv val)
 {
+TCGv pred = get_result_pred(ctx, pnum);
 TCGv base_val = tcg_temp_new();
 
 tcg_gen_andi_tl(base_val, val, 0xff);
@@ -124,10 +134,9 @@ void gen_log_pred_write(DisasContext *ctx, int pnum, TCGv val)
  * straight assignment.  Otherwise, do an and.
  */
 if (!test_bit(pnum, ctx->pregs_written)) {
-tcg_gen_mov_tl(hex_new_pred_value[pnum], base_val);
+tcg_gen_mov_tl(pred, base_val);
 } else {
-tcg_gen_and_tl(hex_new_pred_value[pnum],
-   hex_new_pred_value[pnum], base_val);
+tcg_gen_and_tl(pred, pred, base_val);
 }
 if (HEX_DEBUG) {
 tcg_gen_ori_tl(hex_pred_written, hex_pred_written, 1 << pnum);
diff --git a/target/hexagon/translate.c b/target/hexagon/translate.c
index 5bd71bdcaf..4532b8d05e 100644
--- a/target/hexagon/translate.c
+++ b/target/hexagon/translate.c
@@ -386,6 +386,14 @@ static bool need_commit(DisasContext *ctx)
 }
 }
 
+/* Check for overlap between predicate reads and writes */
+for (int i = 0; i < ctx->preg_log_idx; i++) {
+int pnum = ctx->preg_log[i];
+if (test_bit(pnum, ctx->pregs_read)) {
+return true;
+}
+}
+
 return false;
 }
 
@@ -503,7 +511,7 @@ static void gen_start_packet(DisasContext *ctx)
  * Preload the predicated pred registers into hex_new_pred_value[pred_num]
  * Only endloop instructions conditionally write to pred registers
  */
-if (pkt->pkt_has_endloop) {
+if (ctx->need_commit && pkt->pkt_has_endloop) {
 for (int i = 0; i < ctx->preg_log_idx; i++) {
 int pred_num = ctx->preg_log[i];
 tcg_gen_mov_tl(hex_new_pred_value[pred_num], hex_pred[pred_num]);
@@ -619,8 +627,8 @@ static void gen_reg_writes(DisasContext *ctx)
 
 static void gen_pred_writes(DisasContext *ctx)
 {
-/* Early exit if the log is empty */
-if (!ctx->preg_log_idx) {
+/* Early exit if not needed or the log is empty */
+if (!ctx->need_commit || !ctx->preg_log_idx) {
 return;
 }
 
-- 
2.25.1


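The predicate path in this patch has two parts: get_result_pred() picks
the destination based on need_commit, and gen_log_pred_write() assigns
on the first write to a predicate but ANDs on later writes.  The plain C
model below is a sketch under assumptions: PredModel, result_pred and
log_pred_write are hypothetical names, and marking the predicate as
written inside the write function is an assumption about bookkeeping
that is not shown in the hunks above.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NUM_PREGS 4

/* Hypothetical model of the predicate state; not QEMU's layout. */
typedef struct {
    uint8_t pred[MODEL_NUM_PREGS];      /* architectural predicates */
    uint8_t new_pred[MODEL_NUM_PREGS];  /* staged values, committed later */
    bool written[MODEL_NUM_PREGS];      /* written earlier in this packet */
    bool need_commit;
} PredModel;

/* Choose the staged copy only when a commit step is required. */
static uint8_t *result_pred(PredModel *m, int pnum)
{
    return m->need_commit ? &m->new_pred[pnum] : &m->pred[pnum];
}

static void log_pred_write(PredModel *m, int pnum, uint32_t val)
{
    uint8_t *dst = result_pred(m, pnum);
    uint8_t base_val = (uint8_t)(val & 0xff);

    if (!m->written[pnum]) {
        *dst = base_val;        /* first write: straight assignment */
    } else {
        *dst &= base_val;       /* later writes: do an and */
    }
    m->written[pnum] = true;    /* assumed bookkeeping */
}
```

With need_commit false, two writes of 0xf0 and 0x3c to the same
predicate leave 0x30 in the architectural value and never touch the
staged copy.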

[PATCH v2 04/21] Hexagon (target/hexagon) Add overrides for allocframe/deallocframe

2023-04-27 Thread Taylor Simpson
These instructions have implicit writes to registers, so we don't
want them to be helpers when idef-parser is off.

Signed-off-by: Taylor Simpson 
Reviewed-by: Richard Henderson 
---
 target/hexagon/gen_tcg.h | 32 +++
 target/hexagon/genptr.c  | 47 
 2 files changed, 79 insertions(+)

diff --git a/target/hexagon/gen_tcg.h b/target/hexagon/gen_tcg.h
index 5774af4a59..7c5cb93297 100644
--- a/target/hexagon/gen_tcg.h
+++ b/target/hexagon/gen_tcg.h
@@ -500,6 +500,38 @@
 #define fGEN_TCG_Y2_icinva(SHORTCODE) \
 do { RsV = RsV; } while (0)
 
+/*
+ * allocframe(#uiV)
+ * RxV == r29
+ */
+#define fGEN_TCG_S2_allocframe(SHORTCODE) \
+gen_allocframe(ctx, RxV, uiV)
+
+/* sub-instruction version (no RxV, so handle it manually) */
+#define fGEN_TCG_SS2_allocframe(SHORTCODE) \
+do { \
+TCGv r29 = tcg_temp_new(); \
+tcg_gen_mov_tl(r29, hex_gpr[HEX_REG_SP]); \
+gen_allocframe(ctx, r29, uiV); \
+gen_log_reg_write(ctx, HEX_REG_SP, r29); \
+} while (0)
+
+/*
+ * Rdd32 = deallocframe(Rs32):raw
+ * RddV == r31:30
+ * RsV  == r30
+ */
+#define fGEN_TCG_L2_deallocframe(SHORTCODE) \
+gen_deallocframe(ctx, RddV, RsV)
+
+/* sub-instruction version (no RddV/RsV, so handle it manually) */
+#define fGEN_TCG_SL2_deallocframe(SHORTCODE) \
+do { \
+TCGv_i64 r31_30 = tcg_temp_new_i64(); \
+gen_deallocframe(ctx, r31_30, hex_gpr[HEX_REG_FP]); \
+gen_log_reg_write_pair(ctx, HEX_REG_FP, r31_30); \
+} while (0)
+
 /*
  * dealloc_return
  * Assembler mapped to
diff --git a/target/hexagon/genptr.c b/target/hexagon/genptr.c
index 4c34da8407..43f6c6fb9f 100644
--- a/target/hexagon/genptr.c
+++ b/target/hexagon/genptr.c
@@ -709,6 +709,18 @@ static void gen_cond_callr(DisasContext *ctx,
 gen_set_label(skip);
 }
 
+#ifndef CONFIG_HEXAGON_IDEF_PARSER
+/* frame = ((LR << 32) | FP) ^ (FRAMEKEY << 32)) */
+static TCGv_i64 gen_frame_scramble(void)
+{
+TCGv_i64 frame = tcg_temp_new_i64();
+TCGv tmp = tcg_temp_new();
+tcg_gen_xor_tl(tmp, hex_gpr[HEX_REG_LR], hex_gpr[HEX_REG_FRAMEKEY]);
+tcg_gen_concat_i32_i64(frame, hex_gpr[HEX_REG_FP], tmp);
+return frame;
+}
+#endif
+
 /* frame ^= (int64_t)FRAMEKEY << 32 */
 static void gen_frame_unscramble(TCGv_i64 frame)
 {
@@ -725,6 +737,41 @@ static void gen_load_frame(DisasContext *ctx, TCGv_i64 frame, TCGv EA)
 tcg_gen_qemu_ld64(frame, EA, ctx->mem_idx);
 }
 
+#ifndef CONFIG_HEXAGON_IDEF_PARSER
+/* Stack overflow check */
+static void gen_framecheck(TCGv EA, int framesize)
+{
+/* Not modelled in linux-user mode */
+/* Placeholder for system mode */
+#ifndef CONFIG_USER_ONLY
+g_assert_not_reached();
+#endif
+}
+
+static void gen_allocframe(DisasContext *ctx, TCGv r29, int framesize)
+{
+TCGv r30 = tcg_temp_new();
+TCGv_i64 frame;
+tcg_gen_addi_tl(r30, r29, -8);
+frame = gen_frame_scramble();
+gen_store8(cpu_env, r30, frame, ctx->insn->slot);
+gen_log_reg_write(ctx, HEX_REG_FP, r30);
+gen_framecheck(r30, framesize);
+tcg_gen_subi_tl(r29, r30, framesize);
+}
+
+static void gen_deallocframe(DisasContext *ctx, TCGv_i64 r31_30, TCGv r30)
+{
+TCGv r29 = tcg_temp_new();
+TCGv_i64 frame = tcg_temp_new_i64();
+gen_load_frame(ctx, frame, r30);
+gen_frame_unscramble(frame);
+tcg_gen_mov_i64(r31_30, frame);
+tcg_gen_addi_tl(r29, r30, 8);
+gen_log_reg_write(ctx, HEX_REG_SP, r29);
+}
+#endif
+
 static void gen_return(DisasContext *ctx, TCGv_i64 dst, TCGv src)
 {
 /*
-- 
2.25.1


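The scramble and unscramble comments in this patch describe a simple
XOR of the upper word with FRAMEKEY.  As a worked illustration in plain
C (frame_scramble and frame_unscramble here are hypothetical host-side
helpers, not the TCG generators of similar name), the round trip looks
like this:

```c
#include <stdint.h>

/* frame = ((LR << 32) | FP) ^ (FRAMEKEY << 32) */
static uint64_t frame_scramble(uint32_t lr, uint32_t fp, uint32_t framekey)
{
    return (((uint64_t)lr << 32) | fp) ^ ((uint64_t)framekey << 32);
}

/* frame ^= (int64_t)FRAMEKEY << 32 */
static uint64_t frame_unscramble(uint64_t frame, uint32_t framekey)
{
    return frame ^ ((uint64_t)framekey << 32);
}
```

Only the LR half is affected by the key, which is why the TCG version
can XOR LR with FRAMEKEY as 32-bit values and then concatenate the
result with FP.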

[PATCH v2 5/9] Hexagon (tests/tcg/hexagon) Add v68 HVX tests

2023-04-27 Thread Taylor Simpson
Signed-off-by: Taylor Simpson 
Reviewed-by: Anton Johansson 
---
 tests/tcg/hexagon/v68_hvx.c   |  90 +
 tests/tcg/hexagon/v6mpy_ref.c.inc | 161 ++
 tests/tcg/hexagon/Makefile.target |   3 +
 3 files changed, 254 insertions(+)
 create mode 100644 tests/tcg/hexagon/v68_hvx.c
 create mode 100644 tests/tcg/hexagon/v6mpy_ref.c.inc

diff --git a/tests/tcg/hexagon/v68_hvx.c b/tests/tcg/hexagon/v68_hvx.c
new file mode 100644
index 00..02718722a3
--- /dev/null
+++ b/tests/tcg/hexagon/v68_hvx.c
@@ -0,0 +1,90 @@
+/*
+ *  Copyright(c) 2022-2023 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <stdio.h>
+#include <stdint.h>
+#include <stdbool.h>
+#include <string.h>
+#include <limits.h>
+
+int err;
+
+#include "hvx_misc.h"
+
+MMVector v6mpy_buffer0[BUFSIZE] __attribute__((aligned(MAX_VEC_SIZE_BYTES)));
+MMVector v6mpy_buffer1[BUFSIZE] __attribute__((aligned(MAX_VEC_SIZE_BYTES)));
+
+static void init_v6mpy_buffers(void)
+{
+int counter0 = 0;
+int counter1 = 17;
+for (int i = 0; i < BUFSIZE; i++) {
+for (int j = 0; j < MAX_VEC_SIZE_BYTES / 4; j++) {
+v6mpy_buffer0[i].w[j] = counter0++;
+v6mpy_buffer1[i].w[j] = counter1++;
+}
+}
+}
+
+int v6mpy_ref[BUFSIZE][MAX_VEC_SIZE_BYTES / 4] = {
+#include "v6mpy_ref.c.inc"
+};
+
+static void test_v6mpy(void)
+{
+void *p00 = buffer0;
+void *p01 = v6mpy_buffer0;
+void *p10 = buffer1;
+void *p11 = v6mpy_buffer1;
+void *pout = output;
+
+memset(expect, 0xff, sizeof(expect));
+memset(output, 0xff, sizeof(expect));
+
+for (int i = 0; i < BUFSIZE; i++) {
+asm("v2 = vmem(%0 + #0)\n\t"
+"v3 = vmem(%1 + #0)\n\t"
+"v4 = vmem(%2 + #0)\n\t"
+"v5 = vmem(%3 + #0)\n\t"
+"v5:4.w = v6mpy(v5:4.ub, v3:2.b, #1):v\n\t"
+"vmem(%4 + #0) = v4\n\t"
+: : "r"(p00), "r"(p01), "r"(p10), "r"(p11), "r"(pout)
+: "v2", "v3", "v4", "v5", "memory");
+p00 += sizeof(MMVector);
+p01 += sizeof(MMVector);
+p10 += sizeof(MMVector);
+p11 += sizeof(MMVector);
+pout += sizeof(MMVector);
+
+for (int j = 0; j < MAX_VEC_SIZE_BYTES / 4; j++) {
+expect[i].w[j] = v6mpy_ref[i][j];
+}
+}
+
+check_output_w(__LINE__, BUFSIZE);
+}
+
+int main()
+{
+init_buffers();
+init_v6mpy_buffers();
+
+test_v6mpy();
+
+puts(err ? "FAIL" : "PASS");
+return err ? 1 : 0;
+}
diff --git a/tests/tcg/hexagon/v6mpy_ref.c.inc b/tests/tcg/hexagon/v6mpy_ref.c.inc
new file mode 100644
index 00..8258cddcb1
--- /dev/null
+++ b/tests/tcg/hexagon/v6mpy_ref.c.inc
@@ -0,0 +1,161 @@
+/*
+ *  Copyright(c) 2021-2023 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see .
+ */
+
+{ 0xee11, 0xfcca, 0xc1b3, 0xd0cc,
+  0xe215, 0xf58e, 0xaf37, 0xc310,
+  0xd919, 0xf152, 0x9fbb, 0xb854,
+  0xd31d, 0xf016, 0x933f, 0xb098,
+  0xd021, 0xf1da, 0x89c3, 0xabdc,
+  0xd025, 0xf69e, 0x8347, 0xaa20,
+  0xd329, 0xfe62, 0x7fcb, 0xab64,
+  0xd92d, 0x0926, 0x7f4f, 0xafa8,
+  },
+{ 0xe231, 0x16ea, 0x81d3, 0xb6ec,
+  0xee35, 0x27ae, 0x8757, 0xc130,
+  0xfd39, 0x3b72, 0x8fdb, 0xce74,
+  0x0f3d, 0x5236, 0x9b5f, 0xdeb8,
+  0x2441, 0x6bfa, 0xa9e3, 0xf1fc,
+  0x3c45, 0x88be, 0xbb67, 0x0840,
+  0x5749, 0xa882, 0xcfeb, 0xe684,
+  0x494d, 0x9a46, 0xb16f, 0x02c8,
+  },
+{ 0xf351, 0x440a, 0x4af3, 0x9c0c,
+  

[PATCH v2 6/9] Hexagon (target/hexagon) Add v69 HVX instructions

2023-04-27 Thread Taylor Simpson
The following instructions are added
V6_vasrvuhubrndsat
V6_vasrvuhubsat
V6_vasrvwuhrndsat
V6_vasrvwuhsat
V6_vassign_tmp
V6_vcombine_tmp
V6_vmpyuhvs

Signed-off-by: Taylor Simpson 
Reviewed-by: Anton Johansson 
---
 target/hexagon/gen_tcg_hvx.h | 12 ++
 target/hexagon/attribs_def.h.inc |  8 
 target/hexagon/imported/mmvec/encode_ext.def |  8 
 target/hexagon/imported/mmvec/ext.idef   | 40 
 4 files changed, 68 insertions(+)

diff --git a/target/hexagon/gen_tcg_hvx.h b/target/hexagon/gen_tcg_hvx.h
index d4aefe8e3f..8dceead5e5 100644
--- a/target/hexagon/gen_tcg_hvx.h
+++ b/target/hexagon/gen_tcg_hvx.h
@@ -128,6 +128,18 @@ static inline void assert_vhist_tmp(DisasContext *ctx)
 tcg_gen_gvec_mov(MO_64, VdV_off, VuV_off, \
  sizeof(MMVector), sizeof(MMVector))
 
+#define fGEN_TCG_V6_vassign_tmp(SHORTCODE) \
+tcg_gen_gvec_mov(MO_64, VdV_off, VuV_off, \
+ sizeof(MMVector), sizeof(MMVector))
+
+#define fGEN_TCG_V6_vcombine_tmp(SHORTCODE) \
+do { \
+tcg_gen_gvec_mov(MO_64, VddV_off, VvV_off, \
+ sizeof(MMVector), sizeof(MMVector)); \
+tcg_gen_gvec_mov(MO_64, VddV_off + sizeof(MMVector), VuV_off, \
+ sizeof(MMVector), sizeof(MMVector)); \
+} while (0)
+
 /* Vector conditional move */
 #define fGEN_TCG_VEC_CMOV(PRED) \
 do { \
diff --git a/target/hexagon/attribs_def.h.inc b/target/hexagon/attribs_def.h.inc
index 0ddfb45bdf..3bef60bef3 100644
--- a/target/hexagon/attribs_def.h.inc
+++ b/target/hexagon/attribs_def.h.inc
@@ -69,11 +69,13 @@ DEF_ATTRIB(CVI_VP_VS, "Double vector permute/shft insn executes on HVX", "", "")
 DEF_ATTRIB(CVI_VX, "Multiply instruction executes on HVX", "", "")
 DEF_ATTRIB(CVI_VX_DV, "Double vector multiply insn executes on HVX", "", "")
 DEF_ATTRIB(CVI_VS, "Shift instruction executes on HVX", "", "")
+DEF_ATTRIB(CVI_VS_3SRC, "This shift needs to borrow a source register", "", "")
 DEF_ATTRIB(CVI_VS_VX, "Permute/shift and multiply insn executes on HVX", "", "")
 DEF_ATTRIB(CVI_VA, "ALU instruction executes on HVX", "", "")
 DEF_ATTRIB(CVI_VA_DV, "Double vector alu instruction executes on HVX", "", "")
 DEF_ATTRIB(CVI_4SLOT, "Consumes all the vector execution resources", "", "")
 DEF_ATTRIB(CVI_TMP, "Transient Memory Load not written to register", "", "")
+DEF_ATTRIB(CVI_REMAP, "Register Renaming not written to register file", "", "")
 DEF_ATTRIB(CVI_GATHER, "CVI Gather operation", "", "")
 DEF_ATTRIB(CVI_SCATTER, "CVI Scatter operation", "", "")
 DEF_ATTRIB(CVI_SCATTER_RELEASE, "CVI Store Release for scatter", "", "")
@@ -147,6 +149,8 @@ DEF_ATTRIB(L2FETCH, "Instruction is l2fetch type", "", "")
 DEF_ATTRIB(ICINVA, "icinva", "", "")
 DEF_ATTRIB(DCCLEANINVA, "dccleaninva", "", "")
 
+DEF_ATTRIB(NO_INTRINSIC, "Don't generate an intrinsic", "", "")
+
 /* Documentation Notes */
 DEF_ATTRIB(NOTE_CONDITIONAL, "can be conditionally executed", "", "")
 DEF_ATTRIB(NOTE_NEWVAL_SLOT0, "New-value oprnd must execute on slot 0", "", "")
@@ -155,7 +159,11 @@ DEF_ATTRIB(NOTE_NOPACKET, "solo instruction", "", "")
 DEF_ATTRIB(NOTE_AXOK, "May only be grouped with ALU32 or non-FP XTYPE.", "", "")
 DEF_ATTRIB(NOTE_LATEPRED, "The predicate can not be used as a .new", "", "")
 DEF_ATTRIB(NOTE_NVSLOT0, "Can execute only in slot 0 (ST)", "", "")
+DEF_ATTRIB(NOTE_NOVP, "Cannot be paired with a HVX permute instruction", "", "")
+DEF_ATTRIB(NOTE_VA_UNARY, "Combined with HVX ALU op (must be unary)", "", "")
 
+/* V6 MMVector Notes for Documentation */
+DEF_ATTRIB(NOTE_SHIFT_RESOURCE, "Uses the HVX shift resource.", "", "")
 /* Restrictions to make note of */
 DEF_ATTRIB(RESTRICT_NOSLOT1_STORE, "Packet must not have slot 1 store", "", "")
 DEF_ATTRIB(RESTRICT_LATEPRED, "Predicate can not be used as a .new.", "", "")
diff --git a/target/hexagon/imported/mmvec/encode_ext.def b/target/hexagon/imported/mmvec/encode_ext.def
index b9b62fef8d..402438f566 100644
--- a/target/hexagon/imported/mmvec/encode_ext.def
+++ b/target/hexagon/imported/mmvec/encode_ext.def
@@ -257,6 +257,11 @@ DEF_ENC(V6_vasruhubrndsat, ICLASS_CJ" 1 000 vvv 
vvttt PP 0 u 111 ddd
 DEF_ENC(V6_vasruwuhsat, ICLASS_CJ" 1 000 vvv vvttt PP 1 u 100 
d") //
 DEF_ENC(V6_vasruhubsat,ICLASS_CJ" 1 000 vvv vvttt PP 1 u 101 
d") //
 
+DEF_ENC(V6_vasrvuhubrndsat,"00011101000vPP0u011d")
+DEF_ENC(V6_vasrvuhubsat,"00011101000vPP0u010d")
+DEF_ENC(V6_vasrvwuhrndsat,"00011101000vPP0u001d")
+DEF_ENC(V6_vasrvwuhsat,"00011101000vPP0u000d")
+
 /***
 *
 *  Group #1, Uses Q6 Rt32
@@ -716,6 +721,7 @@ DEF_ENC(V6_vaddclbw,ICLASS_CJ" 1 111 000 v PP 1 
u 001 d") //
 
 DEF_ENC(V6_vavguw,ICLASS_CJ" 1 111 000 v PP 1 u 010 d") //
 DEF_ENC(V6_vavguwrnd,ICLASS_CJ" 1 111 000 v PP 1 

[PATCH v2 0/9] Hexagon (target/hexagon) New architecture support

2023-04-27 Thread Taylor Simpson
Add support for new Hexagon architecture versions v68/v69/v71/v73

 Changes in v2 
Address feedback from Anton Johansson 
Rename v6mpy_ref.h to v6mpy_ref.c.inc
Shorten format of hexagon_v*_cpu_init_functions
Change loop counts MAX_VEC_SIZE_BYTES / 2 to MAX_VEC_SIZE_BYTES / 4
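
For readers wondering why the bound matters: the test loops index the 32-bit word view of a vector, so the lane count is the vector size in bytes divided by 4. The sketch below is only an illustration under stated assumptions: VEC_SIZE_BYTES (128), VecSketch, word_lanes() and fill_words() are hypothetical names, not identifiers from this series.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: 128-byte vectors; the real value comes from MAX_VEC_SIZE_BYTES. */
#define VEC_SIZE_BYTES 128

/* Hypothetical stand-in for MMVector: one vector viewed at three widths. */
typedef union {
    uint8_t  ub[VEC_SIZE_BYTES];
    uint16_t uh[VEC_SIZE_BYTES / 2];
    int32_t  w[VEC_SIZE_BYTES / 4];
} VecSketch;

/* Number of 32-bit word lanes, i.e. the bound for a loop over .w[j]. */
static size_t word_lanes(void)
{
    return sizeof(((VecSketch *)0)->w) / sizeof(int32_t);
}

/* Write every word lane exactly once; returns the number of lanes written. */
static size_t fill_words(VecSketch *v, int32_t start)
{
    size_t n = word_lanes();
    for (size_t j = 0; j < n; j++) {
        v->w[j] = start + (int32_t)j;
    }
    return n;
}
```

With these assumptions a loop over .w[j] bounded by VEC_SIZE_BYTES / 2 would run past the end of the word array, which is presumably what the change avoids.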



Taylor Simpson (9):
  Hexagon (target/hexagon) Add support for v68/v69/v71/v73
  Hexagon (target/hexagon) Add v68 scalar instructions
  Hexagon (tests/tcg/hexagon) Add v68 scalar tests
  Hexagon (target/hexagon) Add v68 HVX instructions
  Hexagon (tests/tcg/hexagon) Add v68 HVX tests
  Hexagon (target/hexagon) Add v69 HVX instructions
  Hexagon (tests/tcg/hexagon) Add v69 HVX tests
  Hexagon (target/hexagon) Add v73 scalar instructions
  Hexagon (tests/tcg/hexagon) Add v73 scalar tests

 configure|   2 +-
 linux-user/hexagon/target_elf.h  |  13 +-
 target/hexagon/cpu.h |   4 +
 target/hexagon/gen_tcg.h |  22 ++
 target/hexagon/gen_tcg_hvx.h |  12 +
 target/hexagon/mmvec/macros.h|   9 +-
 target/hexagon/attribs_def.h.inc |  16 +
 target/hexagon/cpu.c |  14 +-
 target/hexagon/translate.c   |   3 +
 tests/tcg/hexagon/misc.c |  12 +
 tests/tcg/hexagon/v68_hvx.c  |  90 ++
 tests/tcg/hexagon/v68_scalar.c   | 186 +++
 tests/tcg/hexagon/v69_hvx.c  | 318 ++
 tests/tcg/hexagon/v73_scalar.c   |  96 ++
 tests/tcg/hexagon/v6mpy_ref.c.inc| 161 ++
 target/hexagon/gen_idef_parser_funcs.py  |   2 +
 target/hexagon/imported/branch.idef  |   7 +-
 target/hexagon/imported/encode_pp.def|  21 +-
 target/hexagon/imported/ldst.idef|  20 +-
 target/hexagon/imported/mmvec/encode_ext.def |  16 +-
 target/hexagon/imported/mmvec/ext.idef   | 321 ++-
 tests/tcg/hexagon/Makefile.target|  13 +
 22 files changed, 1339 insertions(+), 19 deletions(-)
 create mode 100644 tests/tcg/hexagon/v68_hvx.c
 create mode 100644 tests/tcg/hexagon/v68_scalar.c
 create mode 100644 tests/tcg/hexagon/v69_hvx.c
 create mode 100644 tests/tcg/hexagon/v73_scalar.c
 create mode 100644 tests/tcg/hexagon/v6mpy_ref.c.inc

-- 
2.25.1



[PATCH v2 1/9] Hexagon (target/hexagon) Add support for v68/v69/v71/v73

2023-04-27 Thread Taylor Simpson
Add support for the ELF flags
Move target/hexagon/cpu.[ch] to be v73
Change the compiler flag used by "make check-tcg"

The decbin instruction is removed in Hexagon v73, so check the
version before trying to compile the instruction.
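
A minimal sketch of the version check described above, assuming the compiler predefines __HEXAGON_ARCH__ on Hexagon targets. ARCH_VERSION and decbin_test_enabled() are hypothetical names used only so the example also builds on a non-Hexagon host; the patch itself only adds the CORE_HAS_CABAC macro.

```c
#include <stdbool.h>

/*
 * Assumption: __HEXAGON_ARCH__ is provided by the Hexagon toolchain.
 * The fallback value is hypothetical and only keeps the sketch portable.
 */
#ifdef __HEXAGON_ARCH__
#define ARCH_VERSION __HEXAGON_ARCH__
#else
#define ARCH_VERSION 73
#endif

/* Same shape as the guard added to misc.c: decbin exists up to v71. */
#define CORE_HAS_CABAC (ARCH_VERSION <= 71)

/* Hypothetical helper: reports whether the decbin test was compiled in. */
static bool decbin_test_enabled(void)
{
#if CORE_HAS_CABAC
    return true;   /* the inline asm using decbin would be compiled here */
#else
    return false;  /* v73 and later: the instruction is gone, skip the test */
#endif
}
```

Note the space between the macro name and the parenthesised expression: without it the preprocessor would treat the definition as a function-like macro.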

Signed-off-by: Taylor Simpson 
Reviewed-by: Anton Johansson 
---
 configure |  2 +-
 linux-user/hexagon/target_elf.h   | 13 +
 target/hexagon/cpu.h  |  4 
 target/hexagon/cpu.c  | 14 ++
 tests/tcg/hexagon/misc.c  | 12 
 tests/tcg/hexagon/Makefile.target |  3 +++
 6 files changed, 39 insertions(+), 9 deletions(-)

diff --git a/configure b/configure
index 77c03315f8..01fa77f6c7 100755
--- a/configure
+++ b/configure
@@ -1857,7 +1857,7 @@ fi
 : ${cross_cc_armeb="$cross_cc_arm"}
 : ${cross_cc_cflags_armeb="-mbig-endian"}
 : ${cross_cc_hexagon="hexagon-unknown-linux-musl-clang"}
-: ${cross_cc_cflags_hexagon="-mv67 -O2 -static"}
+: ${cross_cc_cflags_hexagon="-mv73 -O2 -static"}
 : ${cross_cc_cflags_i386="-m32"}
 : ${cross_cc_cflags_ppc="-m32 -mbig-endian"}
 : ${cross_cc_cflags_ppc64="-m64 -mbig-endian"}
diff --git a/linux-user/hexagon/target_elf.h b/linux-user/hexagon/target_elf.h
index b4e9f40527..a0271a0a2a 100644
--- a/linux-user/hexagon/target_elf.h
+++ b/linux-user/hexagon/target_elf.h
@@ -1,5 +1,5 @@
 /*
- *  Copyright(c) 2019-2021 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *  Copyright(c) 2019-2023 Qualcomm Innovation Center, Inc. All Rights Reserved.
  *
  *  This program is free software; you can redistribute it and/or modify
  *  it under the terms of the GNU General Public License as published by
@@ -20,7 +20,7 @@
 
 static inline const char *cpu_get_model(uint32_t eflags)
 {
-/* For now, treat anything newer than v5 as a v67 */
+/* For now, treat anything newer than v5 as a v73 */
 /* FIXME - Disable instructions that are newer than the specified arch */
 if (eflags == 0x04 ||/* v5  */
 eflags == 0x05 ||/* v55 */
@@ -30,9 +30,14 @@ static inline const char *cpu_get_model(uint32_t eflags)
 eflags == 0x65 ||/* v65 */
 eflags == 0x66 ||/* v66 */
 eflags == 0x67 ||/* v67 */
-eflags == 0x8067 /* v67t */
+eflags == 0x8067 ||  /* v67t */
+eflags == 0x68 ||/* v68 */
+eflags == 0x69 ||/* v69 */
+eflags == 0x71 ||/* v71 */
+eflags == 0x8071 ||  /* v71t */
+eflags == 0x73   /* v73 */
) {
-return "v67";
+return "v73";
 }
 return "unknown";
 }
diff --git a/target/hexagon/cpu.h b/target/hexagon/cpu.h
index 81b663ecfb..4d8981d862 100644
--- a/target/hexagon/cpu.h
+++ b/target/hexagon/cpu.h
@@ -43,6 +43,10 @@
 #define CPU_RESOLVING_TYPE TYPE_HEXAGON_CPU
 
 #define TYPE_HEXAGON_CPU_V67 HEXAGON_CPU_TYPE_NAME("v67")
+#define TYPE_HEXAGON_CPU_V68 HEXAGON_CPU_TYPE_NAME("v68")
+#define TYPE_HEXAGON_CPU_V69 HEXAGON_CPU_TYPE_NAME("v69")
+#define TYPE_HEXAGON_CPU_V71 HEXAGON_CPU_TYPE_NAME("v71")
+#define TYPE_HEXAGON_CPU_V73 HEXAGON_CPU_TYPE_NAME("v73")
 
 #define MMU_USER_IDX 0
 
diff --git a/target/hexagon/cpu.c b/target/hexagon/cpu.c
index ab40cfc283..c78fe25c9f 100644
--- a/target/hexagon/cpu.c
+++ b/target/hexagon/cpu.c
@@ -1,5 +1,5 @@
 /*
- *  Copyright(c) 2019-2021 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *  Copyright(c) 2019-2023 Qualcomm Innovation Center, Inc. All Rights Reserved.
  *
  *  This program is free software; you can redistribute it and/or modify
  *  it under the terms of the GNU General Public License as published by
@@ -25,9 +25,11 @@
 #include "fpu/softfloat-helpers.h"
 #include "tcg/tcg.h"
 
-static void hexagon_v67_cpu_init(Object *obj)
-{
-}
+static void hexagon_v67_cpu_init(Object *obj) { }
+static void hexagon_v68_cpu_init(Object *obj) { }
+static void hexagon_v69_cpu_init(Object *obj) { }
+static void hexagon_v71_cpu_init(Object *obj) { }
+static void hexagon_v73_cpu_init(Object *obj) { }
 
 static ObjectClass *hexagon_cpu_class_by_name(const char *cpu_model)
 {
@@ -382,6 +384,10 @@ static const TypeInfo hexagon_cpu_type_infos[] = {
 .class_init = hexagon_cpu_class_init,
 },
 DEFINE_CPU(TYPE_HEXAGON_CPU_V67,  hexagon_v67_cpu_init),
+DEFINE_CPU(TYPE_HEXAGON_CPU_V68,  hexagon_v68_cpu_init),
+DEFINE_CPU(TYPE_HEXAGON_CPU_V69,  hexagon_v69_cpu_init),
+DEFINE_CPU(TYPE_HEXAGON_CPU_V71,  hexagon_v71_cpu_init),
+DEFINE_CPU(TYPE_HEXAGON_CPU_V73,  hexagon_v73_cpu_init),
 };
 
 DEFINE_TYPES(hexagon_cpu_type_infos)
diff --git a/tests/tcg/hexagon/misc.c b/tests/tcg/hexagon/misc.c
index e126751e3a..4fcbb22795 100644
--- a/tests/tcg/hexagon/misc.c
+++ b/tests/tcg/hexagon/misc.c
@@ -18,6 +18,8 @@
 #include 
 #include 
 
+#define CORE_HAS_CABAC (__HEXAGON_ARCH__ <= 71)
+
 typedef unsigned char uint8_t;
 typedef unsigned short uint16_t;
 typedef unsigned int uint32_t;
@@ -245,6 

[PATCH v2 8/9] Hexagon (target/hexagon) Add v73 scalar instructions

2023-04-27 Thread Taylor Simpson
The following instructions are added
J2_callrh
J2_jumprh

Signed-off-by: Taylor Simpson 
Reviewed-by: Anton Johansson 
---
 target/hexagon/gen_tcg.h  | 4 
 target/hexagon/attribs_def.h.inc  | 1 +
 target/hexagon/imported/branch.idef   | 7 ++-
 target/hexagon/imported/encode_pp.def | 2 ++
 4 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/target/hexagon/gen_tcg.h b/target/hexagon/gen_tcg.h
index 598d80d3ce..6f12f665db 100644
--- a/target/hexagon/gen_tcg.h
+++ b/target/hexagon/gen_tcg.h
@@ -653,6 +653,8 @@
 gen_call(ctx, riV)
 #define fGEN_TCG_J2_callr(SHORTCODE) \
 gen_callr(ctx, RsV)
+#define fGEN_TCG_J2_callrh(SHORTCODE) \
+gen_callr(ctx, RsV)
 
 #define fGEN_TCG_J2_callt(SHORTCODE) \
 gen_cond_call(ctx, PuV, TCG_COND_EQ, riV)
@@ -851,6 +853,8 @@
 gen_jump(ctx, riV)
 #define fGEN_TCG_J2_jumpr(SHORTCODE) \
 gen_jumpr(ctx, RsV)
+#define fGEN_TCG_J2_jumprh(SHORTCODE) \
+gen_jumpr(ctx, RsV)
 #define fGEN_TCG_J4_jumpseti(SHORTCODE) \
 do { \
 tcg_gen_movi_tl(RdV, UiV); \
diff --git a/target/hexagon/attribs_def.h.inc b/target/hexagon/attribs_def.h.inc
index 3bef60bef3..69da9776f0 100644
--- a/target/hexagon/attribs_def.h.inc
+++ b/target/hexagon/attribs_def.h.inc
@@ -89,6 +89,7 @@ DEF_ATTRIB(JUMP, "Jump-type instruction", "", "")
 DEF_ATTRIB(INDIRECT, "Absolute register jump", "", "")
 DEF_ATTRIB(CALL, "Function call instruction", "", "")
 DEF_ATTRIB(COF, "Change-of-flow instruction", "", "")
+DEF_ATTRIB(HINTED_COF, "This instruction is a hinted change-of-flow", "", "")
 DEF_ATTRIB(CONDEXEC, "May be cancelled by a predicate", "", "")
 DEF_ATTRIB(DOTNEWVALUE, "Uses a register value generated in this pkt", "", "")
 DEF_ATTRIB(NEWCMPJUMP, "Compound compare and jump", "", "")
diff --git a/target/hexagon/imported/branch.idef b/target/hexagon/imported/branch.idef
index 88f5f48cce..93e2e375a5 100644
--- a/target/hexagon/imported/branch.idef
+++ b/target/hexagon/imported/branch.idef
@@ -1,5 +1,5 @@
 /*
- *  Copyright(c) 2019-2021 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *  Copyright(c) 2019-2023 Qualcomm Innovation Center, Inc. All Rights Reserved.
  *
  *  This program is free software; you can redistribute it and/or modify
  *  it under the terms of the GNU General Public License as published by
@@ -34,6 +34,9 @@ Q6INSN(J2_jump,"jump #r22:2",ATTRIBS(A_JDIR), "direct unconditional jump",
 Q6INSN(J2_jumpr,"jumpr Rs32",ATTRIBS(A_JINDIR), "indirect unconditional jump",
 {fJUMPR(RsN,RsV,COF_TYPE_JUMPR);})
 
+Q6INSN(J2_jumprh,"jumprh Rs32",ATTRIBS(A_JINDIR, A_HINTED_COF), "indirect unconditional jump",
+{fJUMPR(RsN,RsV,COF_TYPE_JUMPR);})
+
 #define OLDCOND_JUMP(TAG,OPER,OPER2,ATTRIB,DESCR,SEMANTICS) \
 Q6INSN(TAG##t,"if (Pu4) "OPER":nt "OPER2,ATTRIB,DESCR,{fBRANCH_SPECULATE_STALL(fLSBOLD(PuV),,SPECULATE_NOT_TAKEN,12,0); if (fLSBOLD(PuV)) { SEMANTICS; }}) \
 Q6INSN(TAG##f,"if (!Pu4) "OPER":nt "OPER2,ATTRIB,DESCR,{fBRANCH_SPECULATE_STALL(fLSBOLDNOT(PuV),,SPECULATE_NOT_TAKEN,12,0); if (fLSBOLDNOT(PuV)) { SEMANTICS; }}) \
@@ -196,6 +199,8 @@ Q6INSN(J2_callrt,"if (Pu4) callr Rs32",ATTRIBS(CINDIR_STD),"indirect conditional
 Q6INSN(J2_callrf,"if (!Pu4) callr Rs32",ATTRIBS(CINDIR_STD),"indirect conditional call if false",
 {fBRANCH_SPECULATE_STALL(fLSBOLDNOT(PuV),,SPECULATE_NOT_TAKEN,12,0);if (fLSBOLDNOT(PuV)) { fCALLR(RsV); }})
 
+Q6INSN(J2_callrh,"callrh Rs32",ATTRIBS(CINDIR_STD, A_HINTED_COF), "hinted indirect unconditional call",
+{ fCALLR(RsV); })
 
 
 
diff --git a/target/hexagon/imported/encode_pp.def b/target/hexagon/imported/encode_pp.def
index 763f465bfd..0cd30a5e85 100644
--- a/target/hexagon/imported/encode_pp.def
+++ b/target/hexagon/imported/encode_pp.def
@@ -524,6 +524,7 @@ DEF_FIELD32(ICLASS_J" 110-  PP-! 
",J_PT,"Predict-taken")
 
 DEF_FIELDROW_DESC32(ICLASS_J"   PP-- ","[#0] PC=(Rs), 
R31=return")
 DEF_ENC32(J2_callr, ICLASS_J"   101s  PP--  ")
+DEF_ENC32(J2_callrh,ICLASS_J"   110s  PP--  ")
 
 DEF_FIELDROW_DESC32(ICLASS_J" 0001  PP-- ","[#1] if (Pu) 
PC=(Rs), R31=return")
 DEF_ENC32(J2_callrt,ICLASS_J" 0001  000s  PPuu  ")
@@ -531,6 +532,7 @@ DEF_ENC32(J2_callrf,ICLASS_J" 0001  001s  PPuu  
")
 
 DEF_FIELDROW_DESC32(ICLASS_J" 0010  PP-- ","[#2] PC=(Rs); 
")
 DEF_ENC32(J2_jumpr,  ICLASS_J" 0010  100s  PP--  ")
+DEF_ENC32(J2_jumprh, ICLASS_J" 0010  110s  PP--  ")
 DEF_ENC32(J4_hintjumpr,  ICLASS_J" 0010  101s  PP--  ")
 
 DEF_FIELDROW_DESC32(ICLASS_J" 0011  PP-- ","[#3] if (Pu) 
PC=(Rs) ")
-- 
2.25.1



[PATCH v2 4/9] Hexagon (target/hexagon) Add v68 HVX instructions

2023-04-27 Thread Taylor Simpson
The following instructions are added
V6_v6mpyvubs10_vxx
V6_v6mpyhubs10_vxx
V6_v6mpyvubs10
V6_v6mpyhubs10
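
As a reading aid for the fGET10BIT macro this patch adds to mmvec/macros.h: it assembles a signed 10-bit coefficient from a 32-bit word, taking the low 8 bits from byte POS and the top 2 bits (sign-extended) from a 2-bit field in byte 3. The sketch below is a standalone model, not the QEMU implementation: field_u32(), field_s32() and get10bit() are hypothetical stand-ins for extract32(), sextract32() and the macro.

```c
#include <stdint.h>

/* Hypothetical stand-in for extract32(): unsigned field of `length` bits. */
static uint32_t field_u32(uint32_t value, int start, int length)
{
    return (value >> start) & ((1u << length) - 1u);
}

/* Hypothetical stand-in for sextract32(): the same field, sign-extended. */
static int32_t field_s32(uint32_t value, int start, int length)
{
    uint32_t field = field_u32(value, start, length);
    uint32_t sign = 1u << (length - 1);
    return (int32_t)((field ^ sign) - sign);
}

/*
 * Signed 10-bit coefficient number `pos` from a 32-bit word (the patch
 * uses positions 0 to 2): low 8 bits from byte `pos`, top 2 bits from
 * bits [24 + 2 * pos, 25 + 2 * pos] of the word.
 */
static int32_t get10bit(uint32_t val, int pos)
{
    /* Shift as unsigned to avoid left-shifting a negative value. */
    uint32_t hi = (uint32_t)field_s32(val, 24 + 2 * pos, 2);
    uint32_t lo = field_u32(val, pos * 8, 8);
    return (int32_t)((hi << 8) | lo);
}
```

For example, get10bit(0x03000080, 0) combines the 2-bit field 0b11 (sign-extended to -1) with the low byte 0x80 and yields -128.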

Signed-off-by: Taylor Simpson 
Reviewed-by: Anton Johansson 
---
 target/hexagon/mmvec/macros.h|   9 +-
 target/hexagon/imported/mmvec/encode_ext.def |   8 +-
 target/hexagon/imported/mmvec/ext.idef   | 281 ++-
 3 files changed, 295 insertions(+), 3 deletions(-)

diff --git a/target/hexagon/mmvec/macros.h b/target/hexagon/mmvec/macros.h
index 1201d778d0..a655634fd1 100644
--- a/target/hexagon/mmvec/macros.h
+++ b/target/hexagon/mmvec/macros.h
@@ -1,5 +1,5 @@
 /*
- *  Copyright(c) 2019-2022 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *  Copyright(c) 2019-2023 Qualcomm Innovation Center, Inc. All Rights Reserved.
  *
  *  This program is free software; you can redistribute it and/or modify
  *  it under the terms of the GNU General Public License as published by
@@ -346,4 +346,11 @@
 #define fUARCH_NOTE_PUMP_2X()
 
 #define IV1DEAD()
+
+#define fGET10BIT(COE, VAL, POS) \
+do { \
+COE = (sextract32(VAL, 24 + 2 * POS, 2) << 8) | \
+   extract32(VAL, POS * 8, 8); \
+} while (0);
+
 #endif
diff --git a/target/hexagon/imported/mmvec/encode_ext.def b/target/hexagon/imported/mmvec/encode_ext.def
index 6fbbe2c422..b9b62fef8d 100644
--- a/target/hexagon/imported/mmvec/encode_ext.def
+++ b/target/hexagon/imported/mmvec/encode_ext.def
@@ -1,5 +1,5 @@
 /*
- *  Copyright(c) 2019-2021 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *  Copyright(c) 2019-2023 Qualcomm Innovation Center, Inc. All Rights Reserved.
  *
  *  This program is free software; you can redistribute it and/or modify
  *  it under the terms of the GNU General Public License as published by
@@ -730,6 +730,8 @@ DEF_ENC(V6_vmaxb, ICLASS_CJ" 1 111 001 v PP 0 
u 101 d") //
 DEF_ENC(V6_vsatuwuh,ICLASS_CJ" 1 111 001 v PP 0 u 110 d") //
 DEF_ENC(V6_vdealb4w, ICLASS_CJ" 1 111 001 v PP 0 u 111 d") //
 
+DEF_ENC(V6_v6mpyvubs10_vxx,ICLASS_CJ" 1 111 001 v PP 1 u 0ii 
x")
+DEF_ENC(V6_v6mpyhubs10_vxx,ICLASS_CJ" 1 111 001 v PP 1 u 1ii 
x")
 
 DEF_ENC(V6_vmpyowh_rnd, ICLASS_CJ" 1 111 010 v PP 0 u 000 d") 
//
 DEF_ENC(V6_vshuffeb,  ICLASS_CJ" 1 111 010 v PP 0 u 001 d") //
@@ -740,6 +742,10 @@ DEF_ENC(V6_vshufoeh,  ICLASS_CJ" 1 111 010 v PP 0 
u 101 d") //
 DEF_ENC(V6_vshufoeb,  ICLASS_CJ" 1 111 010 v PP 0 u 110 d") //
 DEF_ENC(V6_vcombine, ICLASS_CJ" 1 111 010 v PP 0 u 111 d") //
 
+DEF_ENC(V6_v6mpyvubs10,  ICLASS_CJ" 1 111 010 v PP 1 u 0ii d")
+DEF_ENC(V6_v6mpyhubs10,  ICLASS_CJ" 1 111 010 v PP 1 u 1ii d")
+
+
 DEF_ENC(V6_vmpyieoh, ICLASS_CJ" 1 111 011 v PP 0 u 000 d") //
 DEF_ENC(V6_vadduwsat, ICLASS_CJ" 1 111 011 v PP 0 u 001 d") //
 DEF_ENC(V6_vsathub, ICLASS_CJ" 1 111 011 v PP 0 u 010 d") //
diff --git a/target/hexagon/imported/mmvec/ext.idef b/target/hexagon/imported/mmvec/ext.idef
index 8ca5a606e1..c0d169fd4f 100644
--- a/target/hexagon/imported/mmvec/ext.idef
+++ b/target/hexagon/imported/mmvec/ext.idef
@@ -1,5 +1,5 @@
 /*
- *  Copyright(c) 2019-2021 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *  Copyright(c) 2019-2023 Qualcomm Innovation Center, Inc. All Rights Reserved.
  *
  *  This program is free software; you can redistribute it and/or modify
  *  it under the terms of the GNU General Public License as published by
@@ -116,6 +116,10 @@ ITERATOR_INSN_MPY_SLOT_LATE(WIDTH,TAG, SYNTAX2,DESCR,CODE)
 EXTINSN(V6_##TAG, SYNTAX, ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VX_DV),  \
 DESCR, DO_FOR_EACH_CODE(WIDTH, CODE))
 
+#define ITERATOR_INSN_MPY_SLOT_DOUBLE_VEC_VX_FWD(WIDTH,TAG,SYNTAX,DESCR,CODE) \
+EXTINSN(V6_##TAG, SYNTAX, ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VX_DV),  \
+DESCR, DO_FOR_EACH_CODE(WIDTH, CODE))
+
 #define ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(WIDTH,TAG,SYNTAX,SYNTAX2,DESCR,CODE) \
 ITERATOR_INSN_MPY_SLOT_DOUBLE_VEC(WIDTH,TAG,SYNTAX2,DESCR,CODE)
 
@@ -2507,6 +2511,281 @@ EXTINSN(V6_vscattermhw , "vscatter(Rt32,Mu2,Vvv32.w).h=Vw32", ATTRIBS(A_EXTENSIO
 })
 
 
+ITERATOR_INSN_MPY_SLOT_DOUBLE_VEC_VX_FWD(32, v6mpyvubs10_vxx, "Vxx32.w+=v6mpy(Vuu32.ub,Vvv32.b,#u2):v", "",
+fHIDE(size2s_t c00;)
+fGET10BIT(c00, VvvV.v[0].uw[i], 0)
+fHIDE(size2s_t c01;)
+fGET10BIT(c01, VvvV.v[0].uw[i], 1)
+fHIDE(size2s_t c02;)
+fGET10BIT(c02, VvvV.v[0].uw[i], 2)
+
+   fHIDE(size2s_t c10;)
+fGET10BIT(c10, VvvV.v[1].uw[i], 0)
+fHIDE(size2s_t c11;)
+fGET10BIT(c11, VvvV.v[1].uw[i], 1)
+fHIDE(size2s_t c12;)
+fGET10BIT(c12, VvvV.v[1].uw[i], 2)
+
+if (uiV == 0) {
+VxxV.v[1].w[i] += fMPY16US(fGETUBYTE(3,VuuV.v[0].uw[i]), c10);
+VxxV.v[1].w[i] += fMPY16US(fGETUBYTE(2,VuuV.v[1].uw[i]), c11);
+VxxV.v[1].w[i] += fMPY16US(fGETUBYTE(3,VuuV.v[1].uw[i]), c12);
+
+

[PATCH v2 3/9] Hexagon (tests/tcg/hexagon) Add v68 scalar tests

2023-04-27 Thread Taylor Simpson
Signed-off-by: Taylor Simpson 
Reviewed-by: Anton Johansson 
---
 tests/tcg/hexagon/v68_scalar.c| 186 ++
 tests/tcg/hexagon/Makefile.target |   2 +
 2 files changed, 188 insertions(+)
 create mode 100644 tests/tcg/hexagon/v68_scalar.c

diff --git a/tests/tcg/hexagon/v68_scalar.c b/tests/tcg/hexagon/v68_scalar.c
new file mode 100644
index 00..7a8adb1130
--- /dev/null
+++ b/tests/tcg/hexagon/v68_scalar.c
@@ -0,0 +1,186 @@
+/*
+ *  Copyright(c) 2023 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see .
+ */
+
+#include 
+#include 
+#include 
+
+/*
+ *  Test the scalar core instructions that are new in v68
+ */
+
+int err;
+
+static int buffer32[] = { 1, 2, 3, 4 };
+static long long buffer64[] = { 5, 6, 7, 8 };
+
+static void __check32(int line, uint32_t result, uint32_t expect)
+{
+if (result != expect) {
+printf("ERROR at line %d: 0x%08x != 0x%08x\n",
+   line, result, expect);
+err++;
+}
+}
+
+#define check32(RES, EXP) __check32(__LINE__, RES, EXP)
+
+static void __check64(int line, uint64_t result, uint64_t expect)
+{
+if (result != expect) {
+printf("ERROR at line %d: 0x%016llx != 0x%016llx\n",
+   line, result, expect);
+err++;
+}
+}
+
+#define check64(RES, EXP) __check64(__LINE__, RES, EXP)
+
+static inline int loadw_aq(int *p)
+{
+int res;
+asm volatile("%0 = memw_aq(%1)\n\t"
+ : "=r"(res) : "r"(p));
+return res;
+}
+
+static void test_loadw_aq(void)
+{
+int res;
+
+res = loadw_aq(&buffer32[0]);
+check32(res, 1);
+res = loadw_aq(&buffer32[1]);
+check32(res, 2);
+}
+
+static inline long long loadd_aq(long long *p)
+{
+long long res;
+asm volatile("%0 = memd_aq(%1)\n\t"
+ : "=r"(res) : "r"(p));
+return res;
+}
+
+static void test_loadd_aq(void)
+{
+long long res;
+
+res = loadd_aq(&buffer64[2]);
+check64(res, 7);
+res = loadd_aq(&buffer64[3]);
+check64(res, 8);
+}
+
+static inline void release_at(int *p)
+{
+asm volatile("release(%0):at\n\t"
+ : : "r"(p));
+}
+
+static void test_release_at(void)
+{
+release_at(&buffer32[2]);
+check64(buffer32[2], 3);
+release_at(&buffer32[3]);
+check64(buffer32[3], 4);
+}
+
+static inline void release_st(int *p)
+{
+asm volatile("release(%0):st\n\t"
+ : : "r"(p));
+}
+
+static void test_release_st(void)
+{
+release_st(&buffer32[2]);
+check64(buffer32[2], 3);
+release_st(&buffer32[3]);
+check64(buffer32[3], 4);
+}
+
+static inline void storew_rl_at(int *p, int val)
+{
+asm volatile("memw_rl(%0):at = %1\n\t"
+ : : "r"(p), "r"(val) : "memory");
+}
+
+static void test_storew_rl_at(void)
+{
+storew_rl_at(&buffer32[2], 9);
+check64(buffer32[2], 9);
+storew_rl_at(&buffer32[3], 10);
+check64(buffer32[3], 10);
+}
+
+static inline void stored_rl_at(long long *p, long long val)
+{
+asm volatile("memd_rl(%0):at = %1\n\t"
+ : : "r"(p), "r"(val) : "memory");
+}
+
+static void test_stored_rl_at(void)
+{
+stored_rl_at(&buffer64[2], 11);
+check64(buffer64[2], 11);
+stored_rl_at(&buffer64[3], 12);
+check64(buffer64[3], 12);
+}
+
+static inline void storew_rl_st(int *p, int val)
+{
+asm volatile("memw_rl(%0):st = %1\n\t"
+ : : "r"(p), "r"(val) : "memory");
+}
+
+static void test_storew_rl_st(void)
+{
+storew_rl_st(&buffer32[0], 13);
+check64(buffer32[0], 13);
+storew_rl_st(&buffer32[1], 14);
+check64(buffer32[1], 14);
+}
+
+static inline void stored_rl_st(long long *p, long long val)
+{
+asm volatile("memd_rl(%0):st = %1\n\t"
+ : : "r"(p), "r"(val) : "memory");
+}
+
+static void test_stored_rl_st(void)
+{
+stored_rl_st(&buffer64[0], 15);
+check64(buffer64[0], 15);
+stored_rl_st(&buffer64[1], 15);
+check64(buffer64[1], 15);
+}
+
+int main()
+{
+test_loadw_aq();
+test_loadd_aq();
+test_release_at();
+test_release_st();
+test_storew_rl_at();
+test_stored_rl_at();
+test_storew_rl_st();
+test_stored_rl_st();
+
+puts(err ? "FAIL" : "PASS");
+return err ? 1 : 0;
+}
diff --git a/tests/tcg/hexagon/Makefile.target b/tests/tcg/hexagon/Makefile.target
index 59b1b074e9..b7529e23bc 100644
--- a/tests/tcg/hexagon/Makefile.target
+++ b/tests/tcg/hexagon/Makefile.target
@@ -76,6 +76,8 @@ HEX_TESTS += test_vminh
 HEX_TESTS += 

[PATCH v2 7/9] Hexagon (tests/tcg/hexagon) Add v69 HVX tests

2023-04-27 Thread Taylor Simpson
The following instructions are tested
V6_vasrvuhubrndsat
V6_vasrvuhubsat
V6_vasrvwuhrndsat
V6_vasrvwuhsat
V6_vassign_tmp
V6_vcombine_tmp
V6_vmpyuhvs
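
For orientation, the expected values in test_vasrvuhubrndsat are computed per lane as: mask the shift amount to 3 bits, add the rounding constant, shift right, then saturate to an unsigned byte. The sketch below is a scalar model of that reference computation only, mirroring the fVROUND and fVSATUB macros in the test; vasr_uh_ub_rnd_sat() is a hypothetical name, and this is not a statement of the architectural definition.

```c
#include <stdint.h>

/*
 * Scalar model of one lane: round, shift right, saturate to 0..0xff.
 * Only the low 3 bits of the shift byte are used, as in the test.
 */
static uint8_t vasr_uh_ub_rnd_sat(uint16_t val, uint8_t shift_byte)
{
    int shamt = shift_byte & 0x7;
    /* Rounding adds half of the last bit shifted out (nothing if shamt is 0). */
    uint32_t rounded = (uint32_t)val + (shamt > 0 ? (1u << (shamt - 1)) : 0u);
    uint32_t shifted = rounded >> shamt;
    /* The input is unsigned, so only the upper bound can be exceeded. */
    return shifted > 0xff ? 0xff : (uint8_t)shifted;
}
```

The non-rounding variant tested by test_vasrvuhubsat follows the same shape with the rounding constant left out.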

Signed-off-by: Taylor Simpson 
Reviewed-by: Anton Johansson 
---
 tests/tcg/hexagon/v69_hvx.c   | 318 ++
 tests/tcg/hexagon/Makefile.target |   3 +
 2 files changed, 321 insertions(+)
 create mode 100644 tests/tcg/hexagon/v69_hvx.c

diff --git a/tests/tcg/hexagon/v69_hvx.c b/tests/tcg/hexagon/v69_hvx.c
new file mode 100644
index 00..a0d567d142
--- /dev/null
+++ b/tests/tcg/hexagon/v69_hvx.c
@@ -0,0 +1,318 @@
+/*
+ *  Copyright(c) 2023 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see .
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+int err;
+
+#include "hvx_misc.h"
+
+#define fVROUND(VAL, SHAMT) \
+((VAL) + (((SHAMT) > 0) ? (1LL << ((SHAMT) - 1)) : 0))
+
+#define fVSATUB(VAL) \
+((((VAL) & 0xffLL) == (VAL)) ? \
+(VAL) : \
+((((int32_t)(VAL)) < 0) ? 0 : 0xff))
+
+#define fVSATUH(VAL) \
+((((VAL) & 0xffffLL) == (VAL)) ? \
+(VAL) : \
+((((int32_t)(VAL)) < 0) ? 0 : 0xffff))
+
+static void test_vasrvuhubrndsat(void)
+{
+void *p0 = buffer0;
+void *p1 = buffer1;
+void *pout = output;
+
+memset(expect, 0xaa, sizeof(expect));
+memset(output, 0xbb, sizeof(output));
+
+for (int i = 0; i < BUFSIZE / 2; i++) {
+asm("v4 = vmem(%0 + #0)\n\t"
+"v5 = vmem(%0 + #1)\n\t"
+"v6 = vmem(%1 + #0)\n\t"
+"v5.ub = vasr(v5:4.uh, v6.ub):rnd:sat\n\t"
+"vmem(%2) = v5\n\t"
+: : "r"(p0), "r"(p1), "r"(pout)
+: "v4", "v5", "v6", "memory");
+p0 += sizeof(MMVector) * 2;
+p1 += sizeof(MMVector);
+pout += sizeof(MMVector);
+
+for (int j = 0; j < MAX_VEC_SIZE_BYTES / 2; j++) {
+int shamt;
+uint8_t byte0;
+uint8_t byte1;
+
+shamt = buffer1[i].ub[2 * j + 0] & 0x7;
+byte0 = fVSATUB(fVROUND(buffer0[2 * i + 0].uh[j], shamt) >> shamt);
+shamt = buffer1[i].ub[2 * j + 1] & 0x7;
+byte1 = fVSATUB(fVROUND(buffer0[2 * i + 1].uh[j], shamt) >> shamt);
+expect[i].uh[j] = (byte1 << 8) | (byte0 & 0xff);
+}
+}
+
+check_output_h(__LINE__, BUFSIZE / 2);
+}
+
+static void test_vasrvuhubsat(void)
+{
+void *p0 = buffer0;
+void *p1 = buffer1;
+void *pout = output;
+
+memset(expect, 0xaa, sizeof(expect));
+memset(output, 0xbb, sizeof(output));
+
+for (int i = 0; i < BUFSIZE / 2; i++) {
+asm("v4 = vmem(%0 + #0)\n\t"
+"v5 = vmem(%0 + #1)\n\t"
+"v6 = vmem(%1 + #0)\n\t"
+"v5.ub = vasr(v5:4.uh, v6.ub):sat\n\t"
+"vmem(%2) = v5\n\t"
+: : "r"(p0), "r"(p1), "r"(pout)
+: "v4", "v5", "v6", "memory");
+p0 += sizeof(MMVector) * 2;
+p1 += sizeof(MMVector);
+pout += sizeof(MMVector);
+
+for (int j = 0; j < MAX_VEC_SIZE_BYTES / 2; j++) {
+int shamt;
+uint8_t byte0;
+uint8_t byte1;
+
+shamt = buffer1[i].ub[2 * j + 0] & 0x7;
+byte0 = fVSATUB(buffer0[2 * i + 0].uh[j] >> shamt);
+shamt = buffer1[i].ub[2 * j + 1] & 0x7;
+byte1 = fVSATUB(buffer0[2 * i + 1].uh[j] >> shamt);
+expect[i].uh[j] = (byte1 << 8) | (byte0 & 0xff);
+}
+}
+
+check_output_h(__LINE__, BUFSIZE / 2);
+}
+
+static void test_vasrvwuhrndsat(void)
+{
+void *p0 = buffer0;
+void *p1 = buffer1;
+void *pout = output;
+
+memset(expect, 0xaa, sizeof(expect));
+memset(output, 0xbb, sizeof(output));
+
+for (int i = 0; i < BUFSIZE / 2; i++) {
+asm("v4 = vmem(%0 + #0)\n\t"
+"v5 = vmem(%0 + #1)\n\t"
+"v6 = vmem(%1 + #0)\n\t"
+"v5.uh = vasr(v5:4.w, v6.uh):rnd:sat\n\t"
+"vmem(%2) = v5\n\t"
+: : "r"(p0), "r"(p1), "r"(pout)
+: "v4", "v5", "v6", "memory");
+p0 += sizeof(MMVector) * 2;
+p1 += sizeof(MMVector);
+pout += sizeof(MMVector);
+
+for (int j = 0; j < MAX_VEC_SIZE_BYTES / 4; j++) {
+int shamt;
+uint16_t half0;
+uint16_t half1;
+
+

[PATCH v2 2/9] Hexagon (target/hexagon) Add v68 scalar instructions

2023-04-27 Thread Taylor Simpson
The following instructions are added
L2_loadw_aq
L4_loadd_aq
R6_release_at_vi
R6_release_st_vi
S2_storew_rl_at_vi
S4_stored_rl_at_vi
S2_storew_rl_st_vi
S4_stored_rl_st_vi

The release instructions are nops in QEMU.  The others behave as
loads/stores.

The encodings for these instructions changed some "don't care" bits
L2_loadw_locked
L4_loadd_locked
S2_storew_locked
S4_stored_locked

Signed-off-by: Taylor Simpson 
Reviewed-by: Anton Johansson 
---
 target/hexagon/gen_tcg.h| 18 ++
 target/hexagon/attribs_def.h.inc|  7 +++
 target/hexagon/translate.c  |  3 +++
 target/hexagon/gen_idef_parser_funcs.py |  2 ++
 target/hexagon/imported/encode_pp.def   | 19 ++-
 target/hexagon/imported/ldst.idef   | 20 +++-
 6 files changed, 63 insertions(+), 6 deletions(-)

diff --git a/target/hexagon/gen_tcg.h b/target/hexagon/gen_tcg.h
index 329e7a1024..598d80d3ce 100644
--- a/target/hexagon/gen_tcg.h
+++ b/target/hexagon/gen_tcg.h
@@ -1236,6 +1236,24 @@
 uiV = uiV; \
 } while (0)
 
+#define fGEN_TCG_L2_loadw_aq(SHORTCODE) SHORTCODE
+#define fGEN_TCG_L4_loadd_aq(SHORTCODE) SHORTCODE
+
+/* Nothing to do for these in qemu, need to suppress compiler warnings */
+#define fGEN_TCG_R6_release_at_vi(SHORTCODE) \
+do { \
+RsV = RsV; \
+} while (0)
+#define fGEN_TCG_R6_release_st_vi(SHORTCODE) \
+do { \
+RsV = RsV; \
+} while (0)
+
+#define fGEN_TCG_S2_storew_rl_at_vi(SHORTCODE)  SHORTCODE
+#define fGEN_TCG_S4_stored_rl_at_vi(SHORTCODE)  SHORTCODE
+#define fGEN_TCG_S2_storew_rl_st_vi(SHORTCODE)  SHORTCODE
+#define fGEN_TCG_S4_stored_rl_st_vi(SHORTCODE)  SHORTCODE
+
 #define fGEN_TCG_J2_trap0(SHORTCODE) \
 do { \
 uiV = uiV; \
diff --git a/target/hexagon/attribs_def.h.inc b/target/hexagon/attribs_def.h.inc
index 9874d1658f..0ddfb45bdf 100644
--- a/target/hexagon/attribs_def.h.inc
+++ b/target/hexagon/attribs_def.h.inc
@@ -52,6 +52,12 @@ DEF_ATTRIB(REGWRSIZE_4B, "Memory width is 4 bytes", "", "")
 DEF_ATTRIB(REGWRSIZE_8B, "Memory width is 8 bytes", "", "")
 DEF_ATTRIB(MEMLIKE, "Memory-like instruction", "", "")
 DEF_ATTRIB(MEMLIKE_PACKET_RULES, "follows Memory-like packet rules", "", "")
+DEF_ATTRIB(RELEASE, "Releases a lock", "", "")
+DEF_ATTRIB(ACQUIRE, "Acquires a lock", "", "")
+
+DEF_ATTRIB(RLS_INNER, "Store release inner visibility", "", "")
+DEF_ATTRIB(RLS_ALL_THREAD, "Store release among all threads", "", "")
+DEF_ATTRIB(RLS_SAME_THREAD, "Store release with the same thread", "", "")
 
 /* V6 Vector attributes */
 DEF_ATTRIB(CVI, "Executes on the HVX extension", "", "")
@@ -74,6 +80,7 @@ DEF_ATTRIB(CVI_SCATTER_RELEASE, "CVI Store Release for scatter", "", "")
 DEF_ATTRIB(CVI_TMP_DST, "CVI instruction that doesn't write a register", "", "")
 DEF_ATTRIB(CVI_SLOT23, "Can execute in slot 2 or slot 3 (HVX)", "", "")
 
+DEF_ATTRIB(VTCM_ALLBANK_ACCESS, "Allocates in all VTCM schedulers.", "", "")
 
 /* Change-of-flow attributes */
 DEF_ATTRIB(JUMP, "Jump-type instruction", "", "")
diff --git a/target/hexagon/translate.c b/target/hexagon/translate.c
index c087f183d0..5308d05447 100644
--- a/target/hexagon/translate.c
+++ b/target/hexagon/translate.c
@@ -481,6 +481,9 @@ static void mark_store_width(DisasContext *ctx)
 uint8_t width = 0;
 
 if (GET_ATTRIB(opcode, A_SCALAR_STORE)) {
+if (GET_ATTRIB(opcode, A_MEMSIZE_0B)) {
+return;
+}
 if (GET_ATTRIB(opcode, A_MEMSIZE_1B)) {
 width |= 1;
 }
diff --git a/target/hexagon/gen_idef_parser_funcs.py b/target/hexagon/gen_idef_parser_funcs.py
index afe68bdb6f..dc9e396b52 100644
--- a/target/hexagon/gen_idef_parser_funcs.py
+++ b/target/hexagon/gen_idef_parser_funcs.py
@@ -109,6 +109,8 @@ def main():
 continue
 if "A_COF" in hex_common.attribdict[tag]:
 continue
+if ( tag.startswith('R6_release_') ):
+continue
 
 regs = tagregs[tag]
 imms = tagimms[tag]
diff --git a/target/hexagon/imported/encode_pp.def b/target/hexagon/imported/encode_pp.def
index d71c04cd30..763f465bfd 100644
--- a/target/hexagon/imported/encode_pp.def
+++ b/target/hexagon/imported/encode_pp.def
@@ -1,5 +1,5 @@
 /*
- *  Copyright(c) 2019-2021 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *  Copyright(c) 2019-2023 Qualcomm Innovation Center, Inc. All Rights Reserved.
  *
  *  This program is free software; you can redistribute it and/or modify
  *  it under the terms of the GNU General Public License as published by
@@ -382,14 +382,23 @@ DEF_ENC32(L4_return_fnew_pt,  ICLASS_LD" 011 0 000 s PP1110vv ---d")
 DEF_ENC32(L4_return_tnew_pnt, ICLASS_LD" 011 0 000 s PP0010vv ---d")
 DEF_ENC32(L4_return_fnew_pnt, ICLASS_LD" 011 0 000 s PP1010vv ---d")
 

[PATCH v2 9/9] Hexagon (tests/tcg/hexagon) Add v73 scalar tests

2023-04-27 Thread Taylor Simpson
Tests added for the following instructions
J2_callrh
J2_jumprh

Signed-off-by: Taylor Simpson 
Reviewed-by: Anton Johansson 
---
 tests/tcg/hexagon/v73_scalar.c| 96 +++
 tests/tcg/hexagon/Makefile.target |  2 +
 2 files changed, 98 insertions(+)
 create mode 100644 tests/tcg/hexagon/v73_scalar.c

diff --git a/tests/tcg/hexagon/v73_scalar.c b/tests/tcg/hexagon/v73_scalar.c
new file mode 100644
index 00..fee67fc531
--- /dev/null
+++ b/tests/tcg/hexagon/v73_scalar.c
@@ -0,0 +1,96 @@
+/*
+ *  Copyright(c) 2023 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see .
+ */
+
+#include <stdio.h>
+#include <stdint.h>
+#include <stdbool.h>
+
+/*
+ *  Test the scalar core instructions that are new in v73
+ */
+
+int err;
+
+static void __check32(int line, uint32_t result, uint32_t expect)
+{
+if (result != expect) {
+printf("ERROR at line %d: 0x%08x != 0x%08x\n",
+   line, result, expect);
+err++;
+}
+}
+
+#define check32(RES, EXP) __check32(__LINE__, RES, EXP)
+
+static void __check64(int line, uint64_t result, uint64_t expect)
+{
+if (result != expect) {
+printf("ERROR at line %d: 0x%016llx != 0x%016llx\n",
+   line, result, expect);
+err++;
+}
+}
+
+#define check64(RES, EXP) __check64(__LINE__, RES, EXP)
+
+static bool my_func_called;
+
+static void my_func(void)
+{
+my_func_called = true;
+}
+
+static inline void callrh(void *func)
+{
+asm volatile("callrh %0\n\t"
+ : : "r"(func)
+ /* Mark the caller-save registers as clobbered */
+ : "r0", "r1", "r2", "r3", "r4", "r5", "r6", "r7", "r8", "r9",
+   "r10", "r11", "r12", "r13", "r14", "r15", "r28",
+   "p0", "p1", "p2", "p3");
+}
+
+static void test_callrh(void)
+{
+my_func_called = false;
+callrh(&my_func);
+check32(my_func_called, true);
+}
+
+static void test_jumprh(void)
+{
+uint32_t res;
+asm ("%0 = #5\n\t"
+ "r0 = ##1f\n\t"
+ "jumprh r0\n\t"
+ "%0 = #3\n\t"
+ "jump 2f\n\t"
+ "1:\n\t"
+ "%0 = #1\n\t"
+ "2:\n\t"
+ : "=r"(res) : : "r0");
+check32(res, 1);
+}
+
+int main()
+{
+test_callrh();
+test_jumprh();
+
+puts(err ? "FAIL" : "PASS");
+return err ? 1 : 0;
+}
diff --git a/tests/tcg/hexagon/Makefile.target b/tests/tcg/hexagon/Makefile.target
index 558c056148..3172f2e4db 100644
--- a/tests/tcg/hexagon/Makefile.target
+++ b/tests/tcg/hexagon/Makefile.target
@@ -79,6 +79,7 @@ HEX_TESTS += test_vspliceb
 HEX_TESTS += v68_scalar
 HEX_TESTS += v68_hvx
 HEX_TESTS += v69_hvx
+HEX_TESTS += v73_scalar
 
 TESTS += $(HEX_TESTS)
 
@@ -98,6 +99,7 @@ v68_hvx: v68_hvx.c hvx_misc.h v6mpy_ref.c.inc
 v68_hvx: CFLAGS += -mhvx -Wno-unused-function
 v69_hvx: v69_hvx.c hvx_misc.h
 v69_hvx: CFLAGS += -mhvx -Wno-unused-function
+v73_scalar: CFLAGS += -Wno-unused-function
 
 hvx_histogram: hvx_histogram.c hvx_histogram_row.S
$(CC) $(CFLAGS) $(CROSS_CC_GUEST_CFLAGS) $^ -o $@ $(LDFLAGS)
-- 
2.25.1



Re: [PATCH 0/2] target/riscv: RVV 1-fill tail element changes

2023-04-27 Thread Palmer Dabbelt

On Thu, 27 Apr 2023 13:57:06 PDT (-0700), dbarb...@ventanamicro.com wrote:

> Hi,
>
> This series makes changes in vext_set_tail_elems_1s() to be a little
> nicer to the emulation.
>
> First patch makes the function a no-op when vta == 0. Aside from the
> logic simplification we also have a little performance boost.
>
> Second patch makes the function debug only. The logic is explained in
> the commit message, but long story short: we don't have to implement any
> tail-agnostic policy at all to be spec compliant, but this function has
> its uses for debug purposes, so keeping it as a debug option allows users
> to disable it on demand.
>
> Patches are based on top of Alistair's riscv-to-apply.next.
>
> Daniel Henrique Barboza (2):
>   target/riscv/vector_helper.c: skip set tail when vta is zero
>   target/riscv/vector_helper.c: make vext_set_tail_elems_1s() debug only
>
>  target/riscv/vector_helper.c | 11 ---
>  1 file changed, 8 insertions(+), 3 deletions(-)


Reviewed-by: Palmer Dabbelt 

Though this made me think: it'd be nice to have some sort of 
"aggressively do odd things for VTA/VMA" mode in QEMU, as that could 
help shake out bugs in software.
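
For readers following along, the vta == 0 short-circuit described for the
first patch can be sketched roughly as below. This is only an illustration
under stated assumptions: set_tail_bytes_1s() and its byte-offset parameters
are hypothetical and are not the signature of vext_set_tail_elems_1s() in
target/riscv/vector_helper.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not QEMU code): fill the tail bytes
 * [body_bytes, total_bytes) of a vector register image with all 1s.
 * With vta == 0 (tail-undisturbed) the old tail values must be kept,
 * so the function returns early without touching the register.
 */
void set_tail_bytes_1s(uint8_t *vreg, uint32_t vta,
                       size_t body_bytes, size_t total_bytes)
{
    assert(body_bytes <= total_bytes);

    if (vta == 0) {
        return; /* tail-undisturbed: nothing to do */
    }

    /* tail-agnostic: writing all 1s is one of the permitted behaviours */
    memset(vreg + body_bytes, 0xff, total_bytes - body_bytes);
}
```

Calling it with vta == 0 leaves the buffer untouched, while vta == 1
overwrites only the bytes past body_bytes. An "odd things" mode like the one
suggested above could, in principle, swap the memset pattern for something
else, but that is speculation rather than anything in this series.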




Re: [PATCH v3 4/4] configure: add --disable-colo-proxy option

2023-04-27 Thread Lukas Straub
On Thu, 27 Apr 2023 23:29:46 +0300
Vladimir Sementsov-Ogievskiy  wrote:

> Add option to not build filter-mirror, filter-rewriter and
> colo-compare when they are not needed.
> 
> There could be more agile configuration, for example add separate
> options for each filter, but that may be done in future on demand. The
> aim of this patch is to make possible to disable the whole COLO Proxy
> subsystem.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  meson_options.txt |  2 ++
>  net/meson.build   | 14 ++
>  scripts/meson-buildoptions.sh |  3 +++
>  stubs/colo-compare.c  |  7 +++
>  stubs/meson.build |  1 +
>  5 files changed, 23 insertions(+), 4 deletions(-)
>  create mode 100644 stubs/colo-compare.c
> 
> diff --git a/meson_options.txt b/meson_options.txt
> index 2471dd02da..b59e7ae342 100644
> --- a/meson_options.txt
> +++ b/meson_options.txt
> @@ -289,6 +289,8 @@ option('live_block_migration', type: 'feature', value: 'auto',
> description: 'block migration in the main migration stream')
>  option('replication', type: 'feature', value: 'auto',
> description: 'replication support')
> +option('colo_proxy', type: 'feature', value: 'auto',
> +   description: 'colo-proxy support')
>  option('bochs', type: 'feature', value: 'auto',
> description: 'bochs image format support')
>  option('cloop', type: 'feature', value: 'auto',
> diff --git a/net/meson.build b/net/meson.build
> index 87afca3e93..4cfc850c69 100644
> --- a/net/meson.build
> +++ b/net/meson.build
> @@ -1,13 +1,9 @@
>  softmmu_ss.add(files(
>'announce.c',
>'checksum.c',
> -  'colo-compare.c',
> -  'colo.c',
>'dump.c',
>'eth.c',
>'filter-buffer.c',
> -  'filter-mirror.c',
> -  'filter-rewriter.c',
>'filter.c',
>'hub.c',
>'net-hmp-cmds.c',
> @@ -19,6 +15,16 @@ softmmu_ss.add(files(
>'util.c',
>  ))
>  
> +if get_option('replication').allowed() or \
> +get_option('colo_proxy').allowed()
> +  softmmu_ss.add(files('colo-compare.c'))
> +  softmmu_ss.add(files('colo.c'))
> +endif
> +
> +if get_option('colo_proxy').allowed()
> +  softmmu_ss.add(files('filter-mirror.c', 'filter-rewriter.c'))
> +endif
> +

The last discussion didn't really come to a conclusion, but I still
think that 'filter-mirror.c' (which also contains filter-redirect)
should be left unchanged.

>  softmmu_ss.add(when: 'CONFIG_TCG', if_true: files('filter-replay.c'))
>  
>  if have_l2tpv3
> diff --git a/scripts/meson-buildoptions.sh b/scripts/meson-buildoptions.sh
> index d4369a3ad8..036047ce6f 100644
> --- a/scripts/meson-buildoptions.sh
> +++ b/scripts/meson-buildoptions.sh
> @@ -83,6 +83,7 @@ meson_options_help() {
>printf "%s\n" '  capstoneWhether and how to find the capstone 
> library'
>printf "%s\n" '  cloop   cloop image format support'
>printf "%s\n" '  cocoa   Cocoa user interface (macOS only)'
> +  printf "%s\n" '  colo-proxy  colo-proxy support'
>printf "%s\n" '  coreaudio   CoreAudio sound support'
>printf "%s\n" '  crypto-afalgLinux AF_ALG crypto backend driver'
>printf "%s\n" '  curlCURL block device driver'
> @@ -236,6 +237,8 @@ _meson_option_parse() {
>  --disable-cloop) printf "%s" -Dcloop=disabled ;;
>  --enable-cocoa) printf "%s" -Dcocoa=enabled ;;
>  --disable-cocoa) printf "%s" -Dcocoa=disabled ;;
> +--enable-colo-proxy) printf "%s" -Dcolo_proxy=enabled ;;
> +--disable-colo-proxy) printf "%s" -Dcolo_proxy=disabled ;;
>  --enable-coreaudio) printf "%s" -Dcoreaudio=enabled ;;
>  --disable-coreaudio) printf "%s" -Dcoreaudio=disabled ;;
>  --enable-coroutine-pool) printf "%s" -Dcoroutine_pool=true ;;
> diff --git a/stubs/colo-compare.c b/stubs/colo-compare.c
> new file mode 100644
> index 00..ec726665be
> --- /dev/null
> +++ b/stubs/colo-compare.c
> @@ -0,0 +1,7 @@
> +#include "qemu/osdep.h"
> +#include "qemu/notify.h"
> +#include "net/colo-compare.h"
> +
> +void colo_compare_cleanup(void)
> +{
> +}
> diff --git a/stubs/meson.build b/stubs/meson.build
> index 8412cad15f..a56645e2f7 100644
> --- a/stubs/meson.build
> +++ b/stubs/meson.build
> @@ -46,6 +46,7 @@ stub_ss.add(files('target-monitor-defs.c'))
>  stub_ss.add(files('trace-control.c'))
>  stub_ss.add(files('uuid.c'))
>  stub_ss.add(files('colo.c'))
> +stub_ss.add(files('colo-compare.c'))
>  stub_ss.add(files('vmstate.c'))
>  stub_ss.add(files('vm-stop.c'))
>  stub_ss.add(files('win32-kbd-hook.c'))





Re: [PATCH v3 3/4] build: move COLO under CONFIG_REPLICATION

2023-04-27 Thread Lukas Straub
On Thu, 27 Apr 2023 23:29:45 +0300
Vladimir Sementsov-Ogievskiy  wrote:

> We don't allow to use x-colo capability when replication is not
> configured. So, no reason to build COLO when replication is disabled,
> it's unusable in this case.
> 
> Note also that the check in migrate_caps_check() is not the only
> restriction: some functions in migration/colo.c will just abort if
> called with not defined CONFIG_REPLICATION, for example:
> 
> migration_iteration_finish()
>case MIGRATION_STATUS_COLO:
>migrate_start_colo_process()
>colo_process_checkpoint()
>abort()
> 
> It could probably make sense to have possibility to enable COLO without
> REPLICATION, but this requires deeper audit of colo & replication code,
> which may be done later if needed.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 

Reviewed-by: Lukas Straub 

> ---
>  hmp-commands.hx|  2 ++
>  migration/colo.c   | 28 -
>  migration/meson.build  |  6 --
>  migration/migration-hmp-cmds.c |  2 ++
>  migration/options.c| 17 
>  qapi/migration.json| 12 +++
>  stubs/colo.c   | 37 ++
>  stubs/meson.build  |  1 +
>  8 files changed, 62 insertions(+), 43 deletions(-)
>  create mode 100644 stubs/colo.c
> 
> diff --git a/hmp-commands.hx b/hmp-commands.hx
> index bb85ee1d26..fbd0932232 100644
> --- a/hmp-commands.hx
> +++ b/hmp-commands.hx
> @@ -1035,6 +1035,7 @@ SRST
>migration (or once already in postcopy).
>  ERST
>  
> +#ifdef CONFIG_REPLICATION
>  {
>  .name   = "x_colo_lost_heartbeat",
>  .args_type  = "",
> @@ -1043,6 +1044,7 @@ ERST
>"a failover or takeover is needed.",
>  .cmd = hmp_x_colo_lost_heartbeat,
>  },
> +#endif
>  
>  SRST
>  ``x_colo_lost_heartbeat``
> diff --git a/migration/colo.c b/migration/colo.c
> index 07bfa21fea..e4af47eeeb 100644
> --- a/migration/colo.c
> +++ b/migration/colo.c
> @@ -26,9 +26,7 @@
>  #include "qemu/rcu.h"
>  #include "migration/failover.h"
>  #include "migration/ram.h"
> -#ifdef CONFIG_REPLICATION
>  #include "block/replication.h"
> -#endif
>  #include "net/colo-compare.h"
>  #include "net/colo.h"
>  #include "block/block.h"
> @@ -68,7 +66,6 @@ static bool colo_runstate_is_stopped(void)
>  static void secondary_vm_do_failover(void)
>  {
>  /* COLO needs enable block-replication */
> -#ifdef CONFIG_REPLICATION
>  int old_state;
>  MigrationIncomingState *mis = migration_incoming_get_current();
>  Error *local_err = NULL;
> @@ -133,14 +130,10 @@ static void secondary_vm_do_failover(void)
>  if (mis->migration_incoming_co) {
>  qemu_coroutine_enter(mis->migration_incoming_co);
>  }
> -#else
> -abort();
> -#endif
>  }
>  
>  static void primary_vm_do_failover(void)
>  {
> -#ifdef CONFIG_REPLICATION
>  MigrationState *s = migrate_get_current();
>  int old_state;
>  Error *local_err = NULL;
> @@ -181,9 +174,6 @@ static void primary_vm_do_failover(void)
>  
>  /* Notify COLO thread that failover work is finished */
>  qemu_sem_post(>colo_exit_sem);
> -#else
> -abort();
> -#endif
>  }
>  
>  COLOMode get_colo_mode(void)
> @@ -217,7 +207,6 @@ void colo_do_failover(void)
>  }
>  }
>  
> -#ifdef CONFIG_REPLICATION
>  void qmp_xen_set_replication(bool enable, bool primary,
>   bool has_failover, bool failover,
>   Error **errp)
> @@ -271,7 +260,6 @@ void qmp_xen_colo_do_checkpoint(Error **errp)
>  /* Notify all filters of all NIC to do checkpoint */
>  colo_notify_filters_event(COLO_EVENT_CHECKPOINT, errp);
>  }
> -#endif
>  
>  COLOStatus *qmp_query_colo_status(Error **errp)
>  {
> @@ -435,15 +423,11 @@ static int colo_do_checkpoint_transaction(MigrationState *s,
>  }
>  qemu_mutex_lock_iothread();
>  
> -#ifdef CONFIG_REPLICATION
>  replication_do_checkpoint_all(_err);
>  if (local_err) {
>  qemu_mutex_unlock_iothread();
>  goto out;
>  }
> -#else
> -abort();
> -#endif
>  
>  colo_send_message(s->to_dst_file, COLO_MESSAGE_VMSTATE_SEND, _err);
>  if (local_err) {
> @@ -561,15 +545,11 @@ static void colo_process_checkpoint(MigrationState *s)
>  object_unref(OBJECT(bioc));
>  
>  qemu_mutex_lock_iothread();
> -#ifdef CONFIG_REPLICATION
>  replication_start_all(REPLICATION_MODE_PRIMARY, _err);
>  if (local_err) {
>  qemu_mutex_unlock_iothread();
>  goto out;
>  }
> -#else
> -abort();
> -#endif
>  
>  vm_start();
>  qemu_mutex_unlock_iothread();
> @@ -748,7 +728,6 @@ static void colo_incoming_process_checkpoint(MigrationIncomingState *mis,
>  return;
>  }
>  
> -#ifdef CONFIG_REPLICATION
>  replication_get_error_all(_err);
>  if (local_err) {
>  error_propagate(errp, 

Re: [PATCH v3 4/4] configure: add --disable-colo-proxy option

2023-04-27 Thread Vladimir Sementsov-Ogievskiy

On 28.04.23 00:18, Lukas Straub wrote:

> On Thu, 27 Apr 2023 23:29:46 +0300
> Vladimir Sementsov-Ogievskiy  wrote:
>
>> Add option to not build filter-mirror, filter-rewriter and
>> colo-compare when they are not needed.
>>
>> There could be more agile configuration, for example add separate
>> options for each filter, but that may be done in future on demand. The
>> aim of this patch is to make possible to disable the whole COLO Proxy
>> subsystem.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy
>> ---
>>  meson_options.txt |  2 ++
>>  net/meson.build   | 14 ++
>>  scripts/meson-buildoptions.sh |  3 +++
>>  stubs/colo-compare.c  |  7 +++
>>  stubs/meson.build |  1 +
>>  5 files changed, 23 insertions(+), 4 deletions(-)
>>  create mode 100644 stubs/colo-compare.c
>>
>> diff --git a/meson_options.txt b/meson_options.txt
>> index 2471dd02da..b59e7ae342 100644
>> --- a/meson_options.txt
>> +++ b/meson_options.txt
>> @@ -289,6 +289,8 @@ option('live_block_migration', type: 'feature', value: 'auto',
>> description: 'block migration in the main migration stream')
>>  option('replication', type: 'feature', value: 'auto',
>> description: 'replication support')
>> +option('colo_proxy', type: 'feature', value: 'auto',
>> +   description: 'colo-proxy support')
>>  option('bochs', type: 'feature', value: 'auto',
>> description: 'bochs image format support')
>>  option('cloop', type: 'feature', value: 'auto',
>> diff --git a/net/meson.build b/net/meson.build
>> index 87afca3e93..4cfc850c69 100644
>> --- a/net/meson.build
>> +++ b/net/meson.build
>> @@ -1,13 +1,9 @@
>>  softmmu_ss.add(files(
>>    'announce.c',
>>    'checksum.c',
>> -  'colo-compare.c',
>> -  'colo.c',
>>    'dump.c',
>>    'eth.c',
>>    'filter-buffer.c',
>> -  'filter-mirror.c',
>> -  'filter-rewriter.c',
>>    'filter.c',
>>    'hub.c',
>>    'net-hmp-cmds.c',
>> @@ -19,6 +15,16 @@ softmmu_ss.add(files(
>>    'util.c',
>>  ))
>>
>> +if get_option('replication').allowed() or \
>> +get_option('colo_proxy').allowed()
>> +  softmmu_ss.add(files('colo-compare.c'))
>> +  softmmu_ss.add(files('colo.c'))
>> +endif
>> +
>> +if get_option('colo_proxy').allowed()
>> +  softmmu_ss.add(files('filter-mirror.c', 'filter-rewriter.c'))
>> +endif
>> +
>
> The last discussion didn't really come to a conclusion, but I still
> think that 'filter-mirror.c' (which also contains filter-redirect)
> should be left unchanged.



OK for me. I'll wait a bit for more comments and resend with the following
change applied, if there is no other strong opinion:

 @@ -22,7 +22,7 @@ if get_option('replication').allowed() or \
  endif

  if get_option('colo_proxy').allowed()
 -  softmmu_ss.add(files('filter-mirror.c', 'filter-rewriter.c'))
 +  softmmu_ss.add(files('filter-rewriter.c'))
  endif

  softmmu_ss.add(when: 'CONFIG_TCG', if_true: files('filter-replay.c'))

--
Best regards,
Vladimir




[PATCH v10 7/8] raven: disable reentrancy detection for iomem

2023-04-27 Thread Alexander Bulekov
As the code is designed for re-entrant calls from raven_io_ops to
pci-conf, mark raven_io_ops as reentrancy-safe.

Signed-off-by: Alexander Bulekov 
---
 hw/pci-host/raven.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/hw/pci-host/raven.c b/hw/pci-host/raven.c
index 072ffe3c5e..9a11ac4b2b 100644
--- a/hw/pci-host/raven.c
+++ b/hw/pci-host/raven.c
@@ -294,6 +294,13 @@ static void raven_pcihost_initfn(Object *obj)
 memory_region_init(>pci_memory, obj, "pci-memory", 0x3f00);
 address_space_init(>pci_io_as, >pci_io, "raven-io");
 
+/*
+ * Raven's raven_io_ops use the address-space API to access pci-conf-idx
+ * (which is also owned by the raven device). As such, mark the
+ * pci_io_non_contiguous as re-entrancy safe.
+ */
+s->pci_io_non_contiguous.disable_reentrancy_guard = true;
+
 /* CPU address space */
 memory_region_add_subregion(address_space_mem, PCI_IO_BASE_ADDR,
 >pci_io);
-- 
2.39.0




[PATCH v10 3/8] checkpatch: add qemu_bh_new/aio_bh_new checks

2023-04-27 Thread Alexander Bulekov
Advise authors to use the _guarded versions of the APIs, instead.

Reviewed-by: Darren Kenny 
Signed-off-by: Alexander Bulekov 
---
 scripts/checkpatch.pl | 8 
 1 file changed, 8 insertions(+)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index d768171dcf..eeaec436eb 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -2865,6 +2865,14 @@ sub process {
if ($line =~ /\bsignal\s*\(/ && !($line =~ /SIG_(?:IGN|DFL)/)) {
ERROR("use sigaction to establish signal handlers; signal is not portable\n" . $herecurr);
}
+# recommend qemu_bh_new_guarded instead of qemu_bh_new
+if ($realfile =~ /.*\/hw\/.*/ && $line =~ /\bqemu_bh_new\s*\(/) {
+   ERROR("use qemu_bh_new_guarded() instead of qemu_bh_new() to avoid reentrancy problems\n" . $herecurr);
+   }
+# recommend aio_bh_new_guarded instead of aio_bh_new
+if ($realfile =~ /.*\/hw\/.*/ && $line =~ /\baio_bh_new\s*\(/) {
+   ERROR("use aio_bh_new_guarded() instead of aio_bh_new() to avoid reentrancy problems\n" . $herecurr);
+   }
 # check for module_init(), use category-specific init macros explicitly please
if ($line =~ /^module_init\s*\(/) {
ERROR("please use block_init(), type_init() etc. instead of module_init()\n" . $herecurr);
-- 
2.39.0




[PATCH v10 6/8] bcm2835_property: disable reentrancy detection for iomem

2023-04-27 Thread Alexander Bulekov
As the code is designed for re-entrant calls from bcm2835_property to
bcm2835_mbox and back into bcm2835_property, mark iomem as
reentrancy-safe.

Signed-off-by: Alexander Bulekov 
Reviewed-by: Thomas Huth 
---
 hw/misc/bcm2835_property.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/hw/misc/bcm2835_property.c b/hw/misc/bcm2835_property.c
index 890ae7bae5..de056ea2df 100644
--- a/hw/misc/bcm2835_property.c
+++ b/hw/misc/bcm2835_property.c
@@ -382,6 +382,13 @@ static void bcm2835_property_init(Object *obj)
 
 memory_region_init_io(>iomem, OBJECT(s), _property_ops, s,
   TYPE_BCM2835_PROPERTY, 0x10);
+
+/*
+ * bcm2835_property_ops call into bcm2835_mbox, which in-turn reads from
+ * iomem. As such, mark iomem as re-entracy safe.
+ */
+s->iomem.disable_reentrancy_guard = true;
+
 sysbus_init_mmio(SYS_BUS_DEVICE(s), >iomem);
 sysbus_init_irq(SYS_BUS_DEVICE(s), >mbox_irq);
 }
-- 
2.39.0




[PATCH v10 0/8] memory: prevent dma-reentracy issues

2023-04-27 Thread Alexander Bulekov
v8 -> v9:
- Replace trace-events and attempt at making re-entrancy fatal with
  a warn_report message. This message should only be printed if a
  device is broken (and needs to be marked re-entrancy safe), or if
  something in the guest is attempting to trigger unintentional
  re-entrancy.
- Added APIC change to the series

v7 -> v8:
- Disable reentrancy checks for bcm2835_property's iomem (Patch 7)
- Cache DeviceState* in the MemoryRegion to avoid dynamic cast for
  each MemoryRegion access. (Patch 1)
- Make re-entrancy fatal for debug-builds (Patch 8)

v6 -> v7:
- Fix bad qemu_bh_new_guarded calls found by Thomas (Patch 4)
- Add an MR-specific flag to disable reentrancy (Patch 5)
- Disable reentrancy checks for lsi53c895a's RAM-like MR (Patch 6)

Patches 5 and 6 need review. I left the review-tags for Patch 4,
however a few of the qemu_bh_new_guarded calls have changed.
  
v5 -> v6:
- Only apply checkpatch checks to code in paths containing "/hw/"
  (/hw/ and include/hw/)
- Fix a bug in a _guarded call added to hw/block/virtio-blk.c

v4 -> v5:
- Add corresponding checkpatch checks
- Save/restore reentrancy-flag when entering/exiting BHs
- Improve documentation
- Check object_dynamic_cast return value

v3 -> v4: Instead of changing all of the DMA APIs, add an
optional reentrancy guard to the BH API.

v2 -> v3: Bite the bullet and modify the DMA APIs, rather than
attempting to guess DeviceStates in BHs.

These patches aim to solve two types of DMA-reentrancy issues:

1.) mmio -> dma -> mmio case
To solve this, we track whether the device is engaged in I/O by
checking/setting a reentrancy guard within the APIs used for MMIO access.

2.) bh -> dma write -> mmio case
This case is trickier, since we don't have a generic way to associate a
bh with the underlying Device/DeviceState. Thus, this version allows a
device to associate a reentrancy guard with a bh when creating it
(instead of calling qemu_bh_new, you call qemu_bh_new_guarded).

I replaced most of the qemu_bh_new invocations with the guarded analog,
except for the ones where the DeviceState was not trivially accessible.
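
As a rough illustration of case 1 (mmio -> dma -> mmio), the check/set logic
can be sketched as below. This is a standalone model, not the code in
softmmu/memory.c: Guard, Region, guarded_access() and reentrant_handler() are
hypothetical names, and only the disable_reentrancy_guard flag mirrors a
field that appears in this series.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-device guard; the field name is an assumption. */
typedef struct Guard {
    bool engaged_in_io;
} Guard;

/* Hypothetical stand-in for a memory region owned by a device. */
typedef struct Region {
    Guard *guard;                  /* guard of the owning device, may be NULL */
    bool disable_reentrancy_guard; /* per-region opt-out */
    int (*access)(struct Region *r);
} Region;

/* Returns -1 if the access is refused because the device is already in I/O. */
int guarded_access(Region *r)
{
    bool armed = r->guard && !r->disable_reentrancy_guard;
    int ret;

    if (armed) {
        if (r->guard->engaged_in_io) {
            fprintf(stderr, "blocked re-entrant access\n");
            return -1;
        }
        r->guard->engaged_in_io = true;
    }

    ret = r->access(r);

    if (armed) {
        r->guard->engaged_in_io = false;
    }
    return ret;
}

/* Handler that re-enters its own region once, modelling mmio -> dma -> mmio. */
int reentrant_handler(Region *r)
{
    static int depth;
    int inner = 0;

    if (depth == 0) {
        depth++;
        inner = guarded_access(r); /* the nested access */
        depth--;
    }
    return inner;
}
```

With the guard armed, the nested access is refused and the outer call
reports -1; setting disable_reentrancy_guard on the region lets the nested
access through, which is the opt-out the lsi53c895a, bcm2835_property, raven
and apic patches rely on.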

Alexander Bulekov (8):
  memory: prevent dma-reentracy issues
  async: Add an optional reentrancy guard to the BH API
  checkpatch: add qemu_bh_new/aio_bh_new checks
  hw: replace most qemu_bh_new calls with qemu_bh_new_guarded
  lsi53c895a: disable reentrancy detection for script RAM
  bcm2835_property: disable reentrancy detection for iomem
  raven: disable reentrancy detection for iomem
  apic: disable reentrancy detection for apic-msi

 docs/devel/multiple-iothreads.txt |  7 +++
 hw/9pfs/xen-9p-backend.c  |  5 -
 hw/block/dataplane/virtio-blk.c   |  3 ++-
 hw/block/dataplane/xen-block.c|  5 +++--
 hw/char/virtio-serial-bus.c   |  3 ++-
 hw/display/qxl.c  |  9 ++---
 hw/display/virtio-gpu.c   |  6 --
 hw/ide/ahci.c |  3 ++-
 hw/ide/ahci_internal.h|  1 +
 hw/ide/core.c |  4 +++-
 hw/intc/apic.c|  7 +++
 hw/misc/bcm2835_property.c|  7 +++
 hw/misc/imx_rngc.c|  6 --
 hw/misc/macio/mac_dbdma.c |  2 +-
 hw/net/virtio-net.c   |  3 ++-
 hw/nvme/ctrl.c|  6 --
 hw/pci-host/raven.c   |  7 +++
 hw/scsi/lsi53c895a.c  |  6 ++
 hw/scsi/mptsas.c  |  3 ++-
 hw/scsi/scsi-bus.c|  3 ++-
 hw/scsi/vmw_pvscsi.c  |  3 ++-
 hw/usb/dev-uas.c  |  3 ++-
 hw/usb/hcd-dwc2.c |  3 ++-
 hw/usb/hcd-ehci.c |  3 ++-
 hw/usb/hcd-uhci.c |  2 +-
 hw/usb/host-libusb.c  |  6 --
 hw/usb/redirect.c |  6 --
 hw/usb/xen-usb.c  |  3 ++-
 hw/virtio/virtio-balloon.c|  5 +++--
 hw/virtio/virtio-crypto.c |  3 ++-
 include/block/aio.h   | 18 --
 include/exec/memory.h |  5 +
 include/hw/qdev-core.h|  7 +++
 include/qemu/main-loop.h  |  7 +--
 scripts/checkpatch.pl |  8 
 softmmu/memory.c  | 16 
 tests/unit/ptimer-test-stubs.c|  3 ++-
 util/async.c  | 18 +-
 util/main-loop.c  |  5 +++--
 util/trace-events |  1 +
 40 files changed, 180 insertions(+), 41 deletions(-)

-- 
2.39.0




[PATCH v10 4/8] hw: replace most qemu_bh_new calls with qemu_bh_new_guarded

2023-04-27 Thread Alexander Bulekov
This protects devices from bh->mmio reentrancy issues.

Thanks: Thomas Huth  for diagnosing OS X test failure.
Reviewed-by: Darren Kenny 
Reviewed-by: Stefan Hajnoczi 
Reviewed-by: Michael S. Tsirkin 
Reviewed-by: Paul Durrant 
Signed-off-by: Alexander Bulekov 
Reviewed-by: Thomas Huth 
---
 hw/9pfs/xen-9p-backend.c| 5 -
 hw/block/dataplane/virtio-blk.c | 3 ++-
 hw/block/dataplane/xen-block.c  | 5 +++--
 hw/char/virtio-serial-bus.c | 3 ++-
 hw/display/qxl.c| 9 ++---
 hw/display/virtio-gpu.c | 6 --
 hw/ide/ahci.c   | 3 ++-
 hw/ide/ahci_internal.h  | 1 +
 hw/ide/core.c   | 4 +++-
 hw/misc/imx_rngc.c  | 6 --
 hw/misc/macio/mac_dbdma.c   | 2 +-
 hw/net/virtio-net.c | 3 ++-
 hw/nvme/ctrl.c  | 6 --
 hw/scsi/mptsas.c| 3 ++-
 hw/scsi/scsi-bus.c  | 3 ++-
 hw/scsi/vmw_pvscsi.c| 3 ++-
 hw/usb/dev-uas.c| 3 ++-
 hw/usb/hcd-dwc2.c   | 3 ++-
 hw/usb/hcd-ehci.c   | 3 ++-
 hw/usb/hcd-uhci.c   | 2 +-
 hw/usb/host-libusb.c| 6 --
 hw/usb/redirect.c   | 6 --
 hw/usb/xen-usb.c| 3 ++-
 hw/virtio/virtio-balloon.c  | 5 +++--
 hw/virtio/virtio-crypto.c   | 3 ++-
 25 files changed, 66 insertions(+), 33 deletions(-)

diff --git a/hw/9pfs/xen-9p-backend.c b/hw/9pfs/xen-9p-backend.c
index 74f3a05f88..0e266c552b 100644
--- a/hw/9pfs/xen-9p-backend.c
+++ b/hw/9pfs/xen-9p-backend.c
@@ -61,6 +61,7 @@ typedef struct Xen9pfsDev {
 
 int num_rings;
 Xen9pfsRing *rings;
+MemReentrancyGuard mem_reentrancy_guard;
 } Xen9pfsDev;
 
 static void xen_9pfs_disconnect(struct XenLegacyDevice *xendev);
@@ -443,7 +444,9 @@ static int xen_9pfs_connect(struct XenLegacyDevice *xendev)
 xen_9pdev->rings[i].ring.out = xen_9pdev->rings[i].data +
XEN_FLEX_RING_SIZE(ring_order);
 
-xen_9pdev->rings[i].bh = qemu_bh_new(xen_9pfs_bh, &xen_9pdev->rings[i]);
+xen_9pdev->rings[i].bh = qemu_bh_new_guarded(xen_9pfs_bh,
+ &xen_9pdev->rings[i],
+ &xen_9pdev->mem_reentrancy_guard);
 xen_9pdev->rings[i].out_cons = 0;
 xen_9pdev->rings[i].out_size = 0;
 xen_9pdev->rings[i].inprogress = false;
diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index b28d81737e..a6202997ee 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -127,7 +127,8 @@ bool virtio_blk_data_plane_create(VirtIODevice *vdev, VirtIOBlkConf *conf,
 } else {
 s->ctx = qemu_get_aio_context();
 }
-s->bh = aio_bh_new(s->ctx, notify_guest_bh, s);
+s->bh = aio_bh_new_guarded(s->ctx, notify_guest_bh, s,
+   (vdev)->mem_reentrancy_guard);
 s->batch_notify_vqs = bitmap_new(conf->num_queues);
 
 *dataplane = s;
diff --git a/hw/block/dataplane/xen-block.c b/hw/block/dataplane/xen-block.c
index 734da42ea7..d8bc39d359 100644
--- a/hw/block/dataplane/xen-block.c
+++ b/hw/block/dataplane/xen-block.c
@@ -633,8 +633,9 @@ XenBlockDataPlane *xen_block_dataplane_create(XenDevice *xendev,
 } else {
 dataplane->ctx = qemu_get_aio_context();
 }
-dataplane->bh = aio_bh_new(dataplane->ctx, xen_block_dataplane_bh,
-   dataplane);
+dataplane->bh = aio_bh_new_guarded(dataplane->ctx, xen_block_dataplane_bh,
+   dataplane,
+   (xendev)->mem_reentrancy_guard);
 
 return dataplane;
 }
diff --git a/hw/char/virtio-serial-bus.c b/hw/char/virtio-serial-bus.c
index 7d4601cb5d..dd619f0731 100644
--- a/hw/char/virtio-serial-bus.c
+++ b/hw/char/virtio-serial-bus.c
@@ -985,7 +985,8 @@ static void virtser_port_device_realize(DeviceState *dev, 
Error **errp)
 return;
 }
 
-port->bh = qemu_bh_new(flush_queued_data_bh, port);
+port->bh = qemu_bh_new_guarded(flush_queued_data_bh, port,
+   &dev->mem_reentrancy_guard);
 port->elem = NULL;
 }
 
diff --git a/hw/display/qxl.c b/hw/display/qxl.c
index 80ce1e9a93..f1c0eb7dfc 100644
--- a/hw/display/qxl.c
+++ b/hw/display/qxl.c
@@ -2201,11 +2201,14 @@ static void qxl_realize_common(PCIQXLDevice *qxl, Error 
**errp)
 
 qemu_add_vm_change_state_handler(qxl_vm_change_state_handler, qxl);
 
-qxl->update_irq = qemu_bh_new(qxl_update_irq_bh, qxl);
+qxl->update_irq = qemu_bh_new_guarded(qxl_update_irq_bh, qxl,
+  (qxl)->mem_reentrancy_guard);
 qxl_reset_state(qxl);
 
-qxl->update_area_bh = qemu_bh_new(qxl_render_update_area_bh, qxl);
-qxl->ssd.cursor_bh = qemu_bh_new(qemu_spice_cursor_refresh_bh, >ssd);
+qxl->update_area_bh = qemu_bh_new_guarded(qxl_render_update_area_bh, qxl,
+

[PATCH v10 5/8] lsi53c895a: disable reentrancy detection for script RAM

2023-04-27 Thread Alexander Bulekov
As the code is designed to use the memory APIs to access the script ram,
disable reentrancy checks for the pseudo-RAM ram_io MemoryRegion.

In the future, ram_io may be converted from an IO to a proper RAM MemoryRegion.

Reported-by: Fiona Ebner 
Signed-off-by: Alexander Bulekov 
Reviewed-by: Thomas Huth 
Reviewed-by: Darren Kenny 
---
 hw/scsi/lsi53c895a.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/hw/scsi/lsi53c895a.c b/hw/scsi/lsi53c895a.c
index af93557a9a..db27872963 100644
--- a/hw/scsi/lsi53c895a.c
+++ b/hw/scsi/lsi53c895a.c
@@ -2302,6 +2302,12 @@ static void lsi_scsi_realize(PCIDevice *dev, Error 
**errp)
 memory_region_init_io(>io_io, OBJECT(s), _io_ops, s,
   "lsi-io", 256);
 
+/*
+ * Since we use the address-space API to interact with ram_io, disable the
+ * re-entrancy guard.
+ */
+s->ram_io.disable_reentrancy_guard = true;
+
 address_space_init(>pci_io_as, pci_address_space_io(dev), "lsi-pci-io");
 qdev_init_gpio_out(d, >ext_irq, 1);
 
-- 
2.39.0




[PATCH v10 8/8] apic: disable reentrancy detection for apic-msi

2023-04-27 Thread Alexander Bulekov
As the code is designed for re-entrant calls to apic-msi, mark apic-msi
as reentrancy-safe.

Signed-off-by: Alexander Bulekov 
Reviewed-by: Darren Kenny 
---
 hw/intc/apic.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/hw/intc/apic.c b/hw/intc/apic.c
index 20b5a94073..ac3d47d231 100644
--- a/hw/intc/apic.c
+++ b/hw/intc/apic.c
@@ -885,6 +885,13 @@ static void apic_realize(DeviceState *dev, Error **errp)
 memory_region_init_io(>io_memory, OBJECT(s), _io_ops, s, 
"apic-msi",
   APIC_SPACE_SIZE);
 
+/*
+ * apic-msi's apic_mem_write can call into ioapic_eoi_broadcast, which can
+ * write back to apic-msi. As such mark the apic-msi region re-entrancy
+ * safe.
+ */
+s->io_memory.disable_reentrancy_guard = true;
+
 s->timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, apic_timer, s);
 local_apics[s->id] = s;
 
-- 
2.39.0




[PATCH v10 2/8] async: Add an optional reentrancy guard to the BH API

2023-04-27 Thread Alexander Bulekov
Devices can pass their MemReentrancyGuard (from their DeviceState)
when creating new BHs. The async API then toggles the guard
before/after calling the BH callback. This prevents bh->mmio reentrancy
issues.
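
A minimal sketch of the toggle described above, assuming a much simplified
BH structure. The names Guard, BH, bh_call and probe_cb are hypothetical
stand-ins, not the QEMU definitions; saving and restoring the previous flag
value is an assumption of this sketch, not something shown in this message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for MemReentrancyGuard. */
typedef struct {
    bool engaged_in_io;
} Guard;

typedef void BHFunc(void *opaque);

/* Hypothetical, simplified bottom half: callback, argument, optional guard. */
typedef struct {
    BHFunc *cb;
    void *opaque;
    Guard *guard;   /* NULL models a BH created without a guard */
} BH;

/* Run the callback with the guard raised, then put the old value back. */
static void bh_call(BH *bh)
{
    bool last = false;

    assert(bh != NULL && bh->cb != NULL);
    if (bh->guard) {
        last = bh->guard->engaged_in_io;
        bh->guard->engaged_in_io = true;
    }
    bh->cb(bh->opaque);
    if (bh->guard) {
        bh->guard->engaged_in_io = last;
    }
}

/* Test helper: records what the guard looked like while the callback ran. */
typedef struct {
    Guard *guard;
    bool seen_engaged;
} Probe;

static void probe_cb(void *opaque)
{
    Probe *p = opaque;
    p->seen_engaged = p->guard->engaged_in_io;
}
```

With the guard raised for the duration of the callback, any MMIO access the
callback triggers on the same device can be detected and rejected by the
memory core.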

Reviewed-by: Darren Kenny 
Signed-off-by: Alexander Bulekov 
---
 docs/devel/multiple-iothreads.txt |  7 +++
 include/block/aio.h   | 18 --
 include/qemu/main-loop.h  |  7 +--
 tests/unit/ptimer-test-stubs.c|  3 ++-
 util/async.c  | 18 +-
 util/main-loop.c  |  5 +++--
 util/trace-events |  1 +
 7 files changed, 51 insertions(+), 8 deletions(-)

diff --git a/docs/devel/multiple-iothreads.txt 
b/docs/devel/multiple-iothreads.txt
index 343120f2ef..a3e949f6b3 100644
--- a/docs/devel/multiple-iothreads.txt
+++ b/docs/devel/multiple-iothreads.txt
@@ -61,6 +61,7 @@ There are several old APIs that use the main loop AioContext:
  * LEGACY qemu_aio_set_event_notifier() - monitor an event notifier
  * LEGACY timer_new_ms() - create a timer
  * LEGACY qemu_bh_new() - create a BH
+ * LEGACY qemu_bh_new_guarded() - create a BH with a device re-entrancy guard
  * LEGACY qemu_aio_wait() - run an event loop iteration
 
 Since they implicitly work on the main loop they cannot be used in code that
@@ -72,8 +73,14 @@ Instead, use the AioContext functions directly (see 
include/block/aio.h):
  * aio_set_event_notifier() - monitor an event notifier
  * aio_timer_new() - create a timer
  * aio_bh_new() - create a BH
+ * aio_bh_new_guarded() - create a BH with a device re-entrancy guard
  * aio_poll() - run an event loop iteration
 
+The qemu_bh_new_guarded/aio_bh_new_guarded APIs accept a "MemReentrancyGuard"
+argument, which is used to check for and prevent re-entrancy problems. For
+BHs associated with devices, the reentrancy-guard is contained in the
+corresponding DeviceState and named "mem_reentrancy_guard".
+
 The AioContext can be obtained from the IOThread using
 iothread_get_aio_context() or for the main loop using qemu_get_aio_context().
 Code that takes an AioContext argument works both in IOThreads or the main
diff --git a/include/block/aio.h b/include/block/aio.h
index e267d918fd..89bbc536f9 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -23,6 +23,8 @@
 #include "qemu/thread.h"
 #include "qemu/timer.h"
 #include "block/graph-lock.h"
+#include "hw/qdev-core.h"
+
 
 typedef struct BlockAIOCB BlockAIOCB;
 typedef void BlockCompletionFunc(void *opaque, int ret);
@@ -323,9 +325,11 @@ void aio_bh_schedule_oneshot_full(AioContext *ctx, 
QEMUBHFunc *cb, void *opaque,
  * is opaque and must be allocated prior to its use.
  *
  * @name: A human-readable identifier for debugging purposes.
+ * @reentrancy_guard: A guard set when entering a cb to prevent
+ * device-reentrancy issues
  */
 QEMUBH *aio_bh_new_full(AioContext *ctx, QEMUBHFunc *cb, void *opaque,
-const char *name);
+const char *name, MemReentrancyGuard 
*reentrancy_guard);
 
 /**
  * aio_bh_new: Allocate a new bottom half structure
@@ -334,7 +338,17 @@ QEMUBH *aio_bh_new_full(AioContext *ctx, QEMUBHFunc *cb, 
void *opaque,
  * string.
  */
 #define aio_bh_new(ctx, cb, opaque) \
-aio_bh_new_full((ctx), (cb), (opaque), (stringify(cb)))
+aio_bh_new_full((ctx), (cb), (opaque), (stringify(cb)), NULL)
+
+/**
+ * aio_bh_new_guarded: Allocate a new bottom half structure with a
+ * reentrancy_guard
+ *
+ * A convenience wrapper for aio_bh_new_full() that uses the cb as the name
+ * string.
+ */
+#define aio_bh_new_guarded(ctx, cb, opaque, guard) \
+aio_bh_new_full((ctx), (cb), (opaque), (stringify(cb)), guard)
 
 /**
  * aio_notify: Force processing of pending events.
diff --git a/include/qemu/main-loop.h b/include/qemu/main-loop.h
index b3e54e00bc..68e70e61aa 100644
--- a/include/qemu/main-loop.h
+++ b/include/qemu/main-loop.h
@@ -387,9 +387,12 @@ void qemu_cond_timedwait_iothread(QemuCond *cond, int ms);
 
 /* internal interfaces */
 
+#define qemu_bh_new_guarded(cb, opaque, guard) \
+qemu_bh_new_full((cb), (opaque), (stringify(cb)), guard)
 #define qemu_bh_new(cb, opaque) \
-qemu_bh_new_full((cb), (opaque), (stringify(cb)))
-QEMUBH *qemu_bh_new_full(QEMUBHFunc *cb, void *opaque, const char *name);
+qemu_bh_new_full((cb), (opaque), (stringify(cb)), NULL)
+QEMUBH *qemu_bh_new_full(QEMUBHFunc *cb, void *opaque, const char *name,
+ MemReentrancyGuard *reentrancy_guard);
 void qemu_bh_schedule_idle(QEMUBH *bh);
 
 enum {
diff --git a/tests/unit/ptimer-test-stubs.c b/tests/unit/ptimer-test-stubs.c
index f2bfcede93..8c9407c560 100644
--- a/tests/unit/ptimer-test-stubs.c
+++ b/tests/unit/ptimer-test-stubs.c
@@ -107,7 +107,8 @@ int64_t qemu_clock_deadline_ns_all(QEMUClockType type, int 
attr_mask)
 return deadline;
 }
 
-QEMUBH *qemu_bh_new_full(QEMUBHFunc *cb, void *opaque, const char *name)
+QEMUBH *qemu_bh_new_full(QEMUBHFunc *cb, void 

[PATCH v10 1/8] memory: prevent dma-reentracy issues

2023-04-27 Thread Alexander Bulekov
Add a flag to the DeviceState that records when a device is engaged in
PIO/MMIO/DMA. This flag is checked and set prior to calling a device's
MemoryRegion handlers, and set when device code initiates DMA.  The purpose
of this flag is to prevent two types of DMA-based reentrancy issues:

1.) mmio -> dma -> mmio case
2.) bh -> dma write -> mmio case

These issues have led to problems such as stack-exhaustion and
use-after-frees.
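
To illustrate the first case (mmio -> dma -> mmio), here is a minimal
sketch of the check-and-set pattern. Device, guarded_access and
reentrant_handler are hypothetical names for this illustration only; the
actual check is the one added to access_with_adjusted_size() in the diff
below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { ACCESS_OK = 0, ACCESS_ERROR = 1 };

/* Hypothetical stand-in for MemReentrancyGuard. */
typedef struct {
    bool engaged_in_io;
} Guard;

typedef struct Device Device;
typedef int (*IoHandler)(Device *dev);

/* Hypothetical device: a guard, one IO handler, and a counter for the demo. */
struct Device {
    Guard guard;
    IoHandler handler;
    int blocked;    /* number of re-entrant accesses that were rejected */
};

/* Dispatch an access, refusing it if one is already in flight on this device. */
static int guarded_access(Device *dev)
{
    int r;

    assert(dev != NULL);
    if (dev->guard.engaged_in_io) {
        dev->blocked++;
        return ACCESS_ERROR;
    }
    dev->guard.engaged_in_io = true;
    r = dev->handler(dev);
    dev->guard.engaged_in_io = false;
    return r;
}

/* A handler whose "DMA" lands back on the device's own IO region. */
static int reentrant_handler(Device *dev)
{
    (void)guarded_access(dev);   /* inner access is rejected by the guard */
    return ACCESS_OK;
}
```

The outer access completes normally; only the nested one is refused, which
is what stops the recursion before it can exhaust the stack.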

Summary of the problem from Peter Maydell:
https://lore.kernel.org/qemu-devel/cafeaca_23vc7he3iam-jva6w38lk4hjowae5kcknhprd5fp...@mail.gmail.com

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/62
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/540
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/541
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/556
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/557
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/827
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1282
Resolves: CVE-2023-0330

Signed-off-by: Alexander Bulekov 
Reviewed-by: Thomas Huth 
---
 include/exec/memory.h  |  5 +
 include/hw/qdev-core.h |  7 +++
 softmmu/memory.c   | 16 
 3 files changed, 28 insertions(+)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 15ade918ba..e45ce6061f 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -767,6 +767,8 @@ struct MemoryRegion {
 bool is_iommu;
 RAMBlock *ram_block;
 Object *owner;
+/* owner as TYPE_DEVICE. Used for re-entrancy checks in MR access hotpath 
*/
+DeviceState *dev;
 
 const MemoryRegionOps *ops;
 void *opaque;
@@ -791,6 +793,9 @@ struct MemoryRegion {
 unsigned ioeventfd_nb;
 MemoryRegionIoeventfd *ioeventfds;
 RamDiscardManager *rdm; /* Only for RAM */
+
+/* For devices designed to perform re-entrant IO into their own IO MRs */
+bool disable_reentrancy_guard;
 };
 
 struct IOMMUMemoryRegion {
diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index bd50ad5ee1..7623703943 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -162,6 +162,10 @@ struct NamedClockList {
 QLIST_ENTRY(NamedClockList) node;
 };
 
+typedef struct {
+bool engaged_in_io;
+} MemReentrancyGuard;
+
 /**
  * DeviceState:
  * @realized: Indicates whether the device has been fully constructed.
@@ -194,6 +198,9 @@ struct DeviceState {
 int alias_required_for_version;
 ResettableState reset;
 GSList *unplug_blockers;
+
+/* Is the device currently in mmio/pio/dma? Used to prevent re-entrancy */
+MemReentrancyGuard mem_reentrancy_guard;
 };
 
 struct DeviceListener {
diff --git a/softmmu/memory.c b/softmmu/memory.c
index b1a6cae6f5..fe23f0e5ce 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -542,6 +542,18 @@ static MemTxResult access_with_adjusted_size(hwaddr addr,
 access_size_max = 4;
 }
 
+/* Do not allow more than one simultaneous access to a device's IO Regions 
*/
+if (mr->dev && !mr->disable_reentrancy_guard &&
+!mr->ram_device && !mr->ram && !mr->rom_device && !mr->readonly) {
+if (mr->dev->mem_reentrancy_guard.engaged_in_io) {
+warn_report("Blocked re-entrant IO on "
+"MemoryRegion: %s at addr: 0x%" HWADDR_PRIX,
+memory_region_name(mr), addr);
+return MEMTX_ACCESS_ERROR;
+}
+mr->dev->mem_reentrancy_guard.engaged_in_io = true;
+}
+
 /* FIXME: support unaligned access? */
 access_size = MAX(MIN(size, access_size_max), access_size_min);
 access_mask = MAKE_64BIT_MASK(0, access_size * 8);
@@ -556,6 +568,9 @@ static MemTxResult access_with_adjusted_size(hwaddr addr,
 access_mask, attrs);
 }
 }
+if (mr->dev) {
+mr->dev->mem_reentrancy_guard.engaged_in_io = false;
+}
 return r;
 }
 
@@ -1170,6 +1185,7 @@ static void memory_region_do_init(MemoryRegion *mr,
 }
 mr->name = g_strdup(name);
 mr->owner = owner;
+mr->dev = (DeviceState *) object_dynamic_cast(mr->owner, TYPE_DEVICE);
 mr->ram_block = NULL;
 
 if (name) {
-- 
2.39.0




[PATCH 1/2] target/riscv/vector_helper.c: skip set tail when vta is zero

2023-04-27 Thread Daniel Henrique Barboza
The function is a no-op if 'vta' is zero, but we still do a lot of
work in it regardless. vext_set_elems_1s() does nothing every
single time (since vta is zero), so that work is wasted.

Skip it altogether in this case. Aside from the code simplification
there's a noticeable emulation performance gain by doing it. For a
regular C binary that does a vector operation like this:

===
 #include <stdlib.h>

 #define SZ 1000

int main ()
{
  int *a = malloc (SZ * sizeof (int));
  int *b = malloc (SZ * sizeof (int));
  int *c = malloc (SZ * sizeof (int));

  for (int i = 0; i < SZ; i++)
    c[i] = a[i] + b[i];
  return c[SZ - 1];
}
===

Emulating it with qemu-riscv64 and RVV takes ~0.3 sec:

$ time ~/work/qemu/build/qemu-riscv64 \
-cpu rv64,debug=false,vext_spec=v1.0,v=true,vlen=128 ./foo.out

real    0m0.303s
user    0m0.281s
sys     0m0.023s

With this skip we take ~0.275 sec:

$ time ~/work/qemu/build/qemu-riscv64 \
-cpu rv64,debug=false,vext_spec=v1.0,v=true,vlen=128 ./foo.out

real    0m0.274s
user    0m0.252s
sys     0m0.019s

This performance gain adds up fast when executing heavy benchmarks like
SPEC.
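
As a rough illustration of the early return, here is a simplified tail
fill. set_tail_1s() and its byte-based parameters are hypothetical and much
simpler than the real helper, which works per register group; the only
point is that nothing is computed or written when vta is zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical simplified tail fill, not the QEMU helper.
 * Fills bytes [vl_bytes, total_bytes) of vd with 1s when the tail policy
 * is agnostic (vta != 0). Returns the number of bytes written.
 */
static size_t set_tail_1s(uint8_t *vd, uint32_t vta,
                          size_t vl_bytes, size_t total_bytes)
{
    assert(vd != NULL);

    /* Tail-undisturbed policy: bail out before doing any other work. */
    if (vta == 0) {
        return 0;
    }
    /* No tail at all. */
    if (vl_bytes >= total_bytes) {
        return 0;
    }
    memset(vd + vl_bytes, 0xff, total_bytes - vl_bytes);
    return total_bytes - vl_bytes;
}
```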

Signed-off-by: Daniel Henrique Barboza 
---
 target/riscv/vector_helper.c | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index f4d0438988..8e6c99e573 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -268,12 +268,17 @@ static void vext_set_tail_elems_1s(CPURISCVState *env, 
target_ulong vl,
void *vd, uint32_t desc, uint32_t nf,
uint32_t esz, uint32_t max_elems)
 {
-uint32_t total_elems = vext_get_total_elems(env, desc, esz);
-uint32_t vlenb = riscv_cpu_cfg(env)->vlen >> 3;
+uint32_t total_elems, vlenb, registers_used;
 uint32_t vta = vext_vta(desc);
-uint32_t registers_used;
 int k;
 
+if (vta == 0) {
+return;
+}
+
+total_elems = vext_get_total_elems(env, desc, esz);
+vlenb = riscv_cpu_cfg(env)->vlen >> 3;
+
 for (k = 0; k < nf; ++k) {
 vext_set_elems_1s(vd, vta, (k * max_elems + vl) * esz,
   (k * max_elems + max_elems) * esz);
-- 
2.40.0




[PATCH 0/2] target/riscv: RVV 1-fill tail element changes

2023-04-27 Thread Daniel Henrique Barboza
Hi,

This series changes vext_set_tail_elems_1s() to be a little
nicer to the emulation.

First patch makes the function a no-op when vta == 0. Aside from the
logic simplification we also have a little performance boost.

Second patch makes the function debug only. The logic is explained in
the commit message, but long story short: we don't have to implement any
tail-agnostic policy at all to be spec compliant, but this function has
its uses for debug purposes, so keeping it as a debug option allows users
to disable it on demand.

Patches are based on top of Alistair's riscv-to-apply.next.
 
Daniel Henrique Barboza (2):
  target/riscv/vector_helper.c: skip set tail when vta is zero
  target/riscv/vector_helper.c: make vext_set_tail_elems_1s() debug only

 target/riscv/vector_helper.c | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

-- 
2.40.0




[PATCH 2/2] target/riscv/vector_helper.c: make vext_set_tail_elems_1s() debug only

2023-04-27 Thread Daniel Henrique Barboza
Commit 3479a814 ("target/riscv: rvv-1.0: add VMA and VTA") added vma and
vta fields in the vtype register, while also defining that QEMU doesn't
need to have a tail agnostic policy to be compliant with the RVV spec.
It ended up removing all tail handling code as well. Later, commit
752614ca ("target/riscv: rvv: Add tail agnostic for vector load / store
instructions") reintroduced the tail agnostic fill for vector load/store
instructions only.

This puts QEMU in a situation where some functions 1-fill the
tail elements and others don't. This is still a valid implementation,
but 1-filling the tail elements takes valuable emulation
time that could be spent on anything else. If the spec doesn't demand a
specific tail-agnostic policy, well-behaved software shouldn't expect any
policy to be in place. This means that, more often than not, the work
we're doing by 1-filling tail elements is wasted. We would be better off
if vext_set_tail_elems_1s() were removed entirely from the code.

All this said, there's still a debug value associated with it. So,
instead of removing it, let's gate it with cpu->cfg.debug. This way
software can enable this code if desirable, but for the regular case we
shouldn't waste time with it.

Signed-off-by: Daniel Henrique Barboza 
---
 target/riscv/vector_helper.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index 8e6c99e573..e0a292ac24 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -272,7 +272,7 @@ static void vext_set_tail_elems_1s(CPURISCVState *env, 
target_ulong vl,
 uint32_t vta = vext_vta(desc);
 int k;
 
-if (vta == 0) {
+if (vta == 0 || !riscv_cpu_cfg(env)->debug)  {
 return;
 }
 
-- 
2.40.0




Re: [PATCH 03/20] block: bdrv/blk_co_unref() for calls in coroutine context

2023-04-27 Thread Paolo Bonzini
On Thu, 27 Apr 2023, 19:00 Kevin Wolf  wrote:

> By the way, and slightly unrelated, can vrc somehow help with finding
> places that call coroutine wrappers without holding the AioContext lock?
> (This results in an abort() when AIO_WAIT_WHILE() tries to unlock the
> AioContext.) This is one of the classes of bugs we're seeing in 8.0.
>

Seems more like a task for TSA.

Even though C TSA doesn't let you check that the *right* AioContext lock is
taken, it can check statically that *one* such lock is taken, and in
general I would guess it's rare for the wrong AioContext to be locked.
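
A rough sketch of what such a check could look like, assuming TSA here
means Clang's Thread Safety Analysis. CtxLock, any_ctx_lock and the three
functions are hypothetical; a single placeholder capability stands for
"some AioContext lock", which is exactly the "one such lock" limitation
mentioned above. On compilers other than Clang the annotations expand to
nothing.

```c
#include <assert.h>
#include <stdbool.h>

#if defined(__clang__)
#define TSA(x) __attribute__((x))
#else
#define TSA(x)   /* annotations vanish on other compilers */
#endif

/* Hypothetical placeholder capability: "some AioContext lock is held". */
typedef struct TSA(capability("mutex")) CtxLock {
    bool held;
} CtxLock;

static CtxLock any_ctx_lock;

/* Annotated prototypes; the bodies below are excluded from the analysis. */
static void ctx_acquire(void)
    TSA(acquire_capability(any_ctx_lock)) TSA(no_thread_safety_analysis);
static void ctx_release(void)
    TSA(release_capability(any_ctx_lock)) TSA(no_thread_safety_analysis);

/* Stands in for a coroutine wrapper that must run with the lock held. */
static int wrapper_needing_lock(void) TSA(requires_capability(any_ctx_lock));

static void ctx_acquire(void)
{
    any_ctx_lock.held = true;
}

static void ctx_release(void)
{
    any_ctx_lock.held = false;
}

static int wrapper_needing_lock(void)
{
    assert(any_ctx_lock.held);   /* runtime backstop for the demo */
    return 42;
}

/* Correct caller: takes the lock around the wrapper call. */
static int good_caller(void)
{
    int r;

    ctx_acquire();
    r = wrapper_needing_lock();
    ctx_release();
    return r;
}
```

Built with clang -Wthread-safety, a caller that invokes
wrapper_needing_lock() without ctx_acquire() should be reported at compile
time instead of hitting the abort() at runtime.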

Paolo


> Kevin
>
>


Re: [PATCH v3 1/4] block/meson.build: prefer positive condition for replication

2023-04-27 Thread Lukas Straub
On Thu, 27 Apr 2023 23:29:43 +0300
Vladimir Sementsov-Ogievskiy  wrote:

> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> Reviewed-by: Juan Quintela 
> Reviewed-by: Philippe Mathieu-Daudé 

Reviewed-by: Lukas Straub 

> ---
>  block/meson.build | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/block/meson.build b/block/meson.build
> index 382bec0e7d..b9a72e219b 100644
> --- a/block/meson.build
> +++ b/block/meson.build
> @@ -84,7 +84,7 @@ block_ss.add(when: 'CONFIG_WIN32', if_true: 
> files('file-win32.c', 'win32-aio.c')
>  block_ss.add(when: 'CONFIG_POSIX', if_true: [files('file-posix.c'), coref, 
> iokit])
>  block_ss.add(when: libiscsi, if_true: files('iscsi-opts.c'))
>  block_ss.add(when: 'CONFIG_LINUX', if_true: files('nvme.c'))
> -if not get_option('replication').disabled()
> +if get_option('replication').allowed()
>block_ss.add(files('replication.c'))
>  endif
>  block_ss.add(when: libaio, if_true: files('linux-aio.c'))





[PATCH v3 0/4] COLO: improve build options

2023-04-27 Thread Vladimir Sementsov-Ogievskiy
v3:
01: add r-bs
02: improve commit message
03: - improve commit message
    - drop ifdefs from migration/colo.c which are not needed anymore
    - don't move migrate_colo_enabled() (now just migrate_colo()), instead
      modify it in place
    - keep colo-compare.c for now (will be handled in updated 04 patch)
    - so, no colo_compare_cleanup() stub needed for now, neither
      migrate_colo_enabled() stub
    - keep Acked-by.
04: - improve commit message
    - rename to --disable-colo-proxy to match subsystem name in MAINTAINERS
    - don't introduce CONFIG_COLO_PROXY, it actually is not needed
    - colo-compare.c is handled now and included if any of 'replication' and
      'colo-proxy' are enabled
    - so, we add colo_compare_cleanup() stub in a separate stub file

Hi all!

The COLO subsystem seems to be useless when CONFIG_REPLICATION is unset, as we
simply don't allow setting the x-colo capability in this case. So, let's not
compile in unreachable code and interfaces we cannot use when
CONFIG_REPLICATION is unset.

Also, provide a dedicated configure option for the COLO Proxy subsystem.

Vladimir Sementsov-Ogievskiy (4):
  block/meson.build: prefer positive condition for replication
  scripts/qapi: allow optional experimental enum values
  build: move COLO under CONFIG_REPLICATION
  configure: add --disable-colo-proxy option

 block/meson.build  |  2 +-
 hmp-commands.hx|  2 ++
 meson_options.txt  |  2 ++
 migration/colo.c   | 28 -
 migration/meson.build  |  6 --
 migration/migration-hmp-cmds.c |  2 ++
 migration/options.c| 17 
 net/meson.build| 14 +
 qapi/migration.json| 12 +++
 scripts/meson-buildoptions.sh  |  3 +++
 scripts/qapi/types.py  |  2 ++
 stubs/colo-compare.c   |  7 +++
 stubs/colo.c   | 37 ++
 stubs/meson.build  |  2 ++
 14 files changed, 88 insertions(+), 48 deletions(-)
 create mode 100644 stubs/colo-compare.c
 create mode 100644 stubs/colo.c

-- 
2.34.1




[PATCH v3 1/4] block/meson.build: prefer positive condition for replication

2023-04-27 Thread Vladimir Sementsov-Ogievskiy
Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Juan Quintela 
Reviewed-by: Philippe Mathieu-Daudé 
---
 block/meson.build | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/meson.build b/block/meson.build
index 382bec0e7d..b9a72e219b 100644
--- a/block/meson.build
+++ b/block/meson.build
@@ -84,7 +84,7 @@ block_ss.add(when: 'CONFIG_WIN32', if_true: 
files('file-win32.c', 'win32-aio.c')
 block_ss.add(when: 'CONFIG_POSIX', if_true: [files('file-posix.c'), coref, 
iokit])
 block_ss.add(when: libiscsi, if_true: files('iscsi-opts.c'))
 block_ss.add(when: 'CONFIG_LINUX', if_true: files('nvme.c'))
-if not get_option('replication').disabled()
+if get_option('replication').allowed()
   block_ss.add(files('replication.c'))
 endif
 block_ss.add(when: libaio, if_true: files('linux-aio.c'))
-- 
2.34.1




[PATCH v3 3/4] build: move COLO under CONFIG_REPLICATION

2023-04-27 Thread Vladimir Sementsov-Ogievskiy
We don't allow using the x-colo capability when replication is not
configured. So there is no reason to build COLO when replication is disabled;
it's unusable in this case.

Note also that the check in migrate_caps_check() is not the only
restriction: some functions in migration/colo.c will just abort if
called when CONFIG_REPLICATION is not defined, for example:

migration_iteration_finish()
   case MIGRATION_STATUS_COLO:
   migrate_start_colo_process()
   colo_process_checkpoint()
   abort()

It could probably make sense to allow enabling COLO without
REPLICATION, but this requires a deeper audit of the colo & replication code,
which may be done later if needed.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 hmp-commands.hx|  2 ++
 migration/colo.c   | 28 -
 migration/meson.build  |  6 --
 migration/migration-hmp-cmds.c |  2 ++
 migration/options.c| 17 
 qapi/migration.json| 12 +++
 stubs/colo.c   | 37 ++
 stubs/meson.build  |  1 +
 8 files changed, 62 insertions(+), 43 deletions(-)
 create mode 100644 stubs/colo.c

diff --git a/hmp-commands.hx b/hmp-commands.hx
index bb85ee1d26..fbd0932232 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1035,6 +1035,7 @@ SRST
   migration (or once already in postcopy).
 ERST
 
+#ifdef CONFIG_REPLICATION
 {
 .name   = "x_colo_lost_heartbeat",
 .args_type  = "",
@@ -1043,6 +1044,7 @@ ERST
   "a failover or takeover is needed.",
 .cmd = hmp_x_colo_lost_heartbeat,
 },
+#endif
 
 SRST
 ``x_colo_lost_heartbeat``
diff --git a/migration/colo.c b/migration/colo.c
index 07bfa21fea..e4af47eeeb 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -26,9 +26,7 @@
 #include "qemu/rcu.h"
 #include "migration/failover.h"
 #include "migration/ram.h"
-#ifdef CONFIG_REPLICATION
 #include "block/replication.h"
-#endif
 #include "net/colo-compare.h"
 #include "net/colo.h"
 #include "block/block.h"
@@ -68,7 +66,6 @@ static bool colo_runstate_is_stopped(void)
 static void secondary_vm_do_failover(void)
 {
 /* COLO needs enable block-replication */
-#ifdef CONFIG_REPLICATION
 int old_state;
 MigrationIncomingState *mis = migration_incoming_get_current();
 Error *local_err = NULL;
@@ -133,14 +130,10 @@ static void secondary_vm_do_failover(void)
 if (mis->migration_incoming_co) {
 qemu_coroutine_enter(mis->migration_incoming_co);
 }
-#else
-abort();
-#endif
 }
 
 static void primary_vm_do_failover(void)
 {
-#ifdef CONFIG_REPLICATION
 MigrationState *s = migrate_get_current();
 int old_state;
 Error *local_err = NULL;
@@ -181,9 +174,6 @@ static void primary_vm_do_failover(void)
 
 /* Notify COLO thread that failover work is finished */
 qemu_sem_post(&s->colo_exit_sem);
-#else
-abort();
-#endif
 }
 
 COLOMode get_colo_mode(void)
@@ -217,7 +207,6 @@ void colo_do_failover(void)
 }
 }
 
-#ifdef CONFIG_REPLICATION
 void qmp_xen_set_replication(bool enable, bool primary,
  bool has_failover, bool failover,
  Error **errp)
@@ -271,7 +260,6 @@ void qmp_xen_colo_do_checkpoint(Error **errp)
 /* Notify all filters of all NIC to do checkpoint */
 colo_notify_filters_event(COLO_EVENT_CHECKPOINT, errp);
 }
-#endif
 
 COLOStatus *qmp_query_colo_status(Error **errp)
 {
@@ -435,15 +423,11 @@ static int colo_do_checkpoint_transaction(MigrationState 
*s,
 }
 qemu_mutex_lock_iothread();
 
-#ifdef CONFIG_REPLICATION
 replication_do_checkpoint_all(&local_err);
 if (local_err) {
 qemu_mutex_unlock_iothread();
 goto out;
 }
-#else
-abort();
-#endif
 
 colo_send_message(s->to_dst_file, COLO_MESSAGE_VMSTATE_SEND, &local_err);
 if (local_err) {
@@ -561,15 +545,11 @@ static void colo_process_checkpoint(MigrationState *s)
 object_unref(OBJECT(bioc));
 
 qemu_mutex_lock_iothread();
-#ifdef CONFIG_REPLICATION
 replication_start_all(REPLICATION_MODE_PRIMARY, &local_err);
 if (local_err) {
 qemu_mutex_unlock_iothread();
 goto out;
 }
-#else
-abort();
-#endif
 
 vm_start();
 qemu_mutex_unlock_iothread();
@@ -748,7 +728,6 @@ static void 
colo_incoming_process_checkpoint(MigrationIncomingState *mis,
 return;
 }
 
-#ifdef CONFIG_REPLICATION
 replication_get_error_all(&local_err);
 if (local_err) {
 error_propagate(errp, local_err);
@@ -765,9 +744,6 @@ static void 
colo_incoming_process_checkpoint(MigrationIncomingState *mis,
 qemu_mutex_unlock_iothread();
 return;
 }
-#else
-abort();
-#endif
 /* Notify all filters of all NIC to do checkpoint */
 colo_notify_filters_event(COLO_EVENT_CHECKPOINT, &local_err);
 
@@ -874,15 +850,11 @@ void *colo_process_incoming_thread(void *opaque)
 

[PATCH v3 4/4] configure: add --disable-colo-proxy option

2023-04-27 Thread Vladimir Sementsov-Ogievskiy
Add an option to not build filter-mirror, filter-rewriter and
colo-compare when they are not needed.

There could be a more fine-grained configuration, for example separate
options for each filter, but that may be done in the future on demand. The
aim of this patch is to make it possible to disable the whole COLO Proxy
subsystem.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 meson_options.txt |  2 ++
 net/meson.build   | 14 ++
 scripts/meson-buildoptions.sh |  3 +++
 stubs/colo-compare.c  |  7 +++
 stubs/meson.build |  1 +
 5 files changed, 23 insertions(+), 4 deletions(-)
 create mode 100644 stubs/colo-compare.c

diff --git a/meson_options.txt b/meson_options.txt
index 2471dd02da..b59e7ae342 100644
--- a/meson_options.txt
+++ b/meson_options.txt
@@ -289,6 +289,8 @@ option('live_block_migration', type: 'feature', value: 
'auto',
description: 'block migration in the main migration stream')
 option('replication', type: 'feature', value: 'auto',
description: 'replication support')
+option('colo_proxy', type: 'feature', value: 'auto',
+   description: 'colo-proxy support')
 option('bochs', type: 'feature', value: 'auto',
description: 'bochs image format support')
 option('cloop', type: 'feature', value: 'auto',
diff --git a/net/meson.build b/net/meson.build
index 87afca3e93..4cfc850c69 100644
--- a/net/meson.build
+++ b/net/meson.build
@@ -1,13 +1,9 @@
 softmmu_ss.add(files(
   'announce.c',
   'checksum.c',
-  'colo-compare.c',
-  'colo.c',
   'dump.c',
   'eth.c',
   'filter-buffer.c',
-  'filter-mirror.c',
-  'filter-rewriter.c',
   'filter.c',
   'hub.c',
   'net-hmp-cmds.c',
@@ -19,6 +15,16 @@ softmmu_ss.add(files(
   'util.c',
 ))
 
+if get_option('replication').allowed() or \
+get_option('colo_proxy').allowed()
+  softmmu_ss.add(files('colo-compare.c'))
+  softmmu_ss.add(files('colo.c'))
+endif
+
+if get_option('colo_proxy').allowed()
+  softmmu_ss.add(files('filter-mirror.c', 'filter-rewriter.c'))
+endif
+
 softmmu_ss.add(when: 'CONFIG_TCG', if_true: files('filter-replay.c'))
 
 if have_l2tpv3
diff --git a/scripts/meson-buildoptions.sh b/scripts/meson-buildoptions.sh
index d4369a3ad8..036047ce6f 100644
--- a/scripts/meson-buildoptions.sh
+++ b/scripts/meson-buildoptions.sh
@@ -83,6 +83,7 @@ meson_options_help() {
   printf "%s\n" '  capstoneWhether and how to find the capstone 
library'
   printf "%s\n" '  cloop   cloop image format support'
   printf "%s\n" '  cocoa   Cocoa user interface (macOS only)'
+  printf "%s\n" '  colo-proxy  colo-proxy support'
   printf "%s\n" '  coreaudio   CoreAudio sound support'
   printf "%s\n" '  crypto-afalgLinux AF_ALG crypto backend driver'
   printf "%s\n" '  curlCURL block device driver'
@@ -236,6 +237,8 @@ _meson_option_parse() {
 --disable-cloop) printf "%s" -Dcloop=disabled ;;
 --enable-cocoa) printf "%s" -Dcocoa=enabled ;;
 --disable-cocoa) printf "%s" -Dcocoa=disabled ;;
+--enable-colo-proxy) printf "%s" -Dcolo_proxy=enabled ;;
+--disable-colo-proxy) printf "%s" -Dcolo_proxy=disabled ;;
 --enable-coreaudio) printf "%s" -Dcoreaudio=enabled ;;
 --disable-coreaudio) printf "%s" -Dcoreaudio=disabled ;;
 --enable-coroutine-pool) printf "%s" -Dcoroutine_pool=true ;;
diff --git a/stubs/colo-compare.c b/stubs/colo-compare.c
new file mode 100644
index 00..ec726665be
--- /dev/null
+++ b/stubs/colo-compare.c
@@ -0,0 +1,7 @@
+#include "qemu/osdep.h"
+#include "qemu/notify.h"
+#include "net/colo-compare.h"
+
+void colo_compare_cleanup(void)
+{
+}
diff --git a/stubs/meson.build b/stubs/meson.build
index 8412cad15f..a56645e2f7 100644
--- a/stubs/meson.build
+++ b/stubs/meson.build
@@ -46,6 +46,7 @@ stub_ss.add(files('target-monitor-defs.c'))
 stub_ss.add(files('trace-control.c'))
 stub_ss.add(files('uuid.c'))
 stub_ss.add(files('colo.c'))
+stub_ss.add(files('colo-compare.c'))
 stub_ss.add(files('vmstate.c'))
 stub_ss.add(files('vm-stop.c'))
 stub_ss.add(files('win32-kbd-hook.c'))
-- 
2.34.1




[PATCH v3 2/4] scripts/qapi: allow optional experimental enum values

2023-04-27 Thread Vladimir Sementsov-Ogievskiy
QAPI has an 'if' feature for some things, including enum values.
But currently it doesn't work for experimental enum values, because in the
generated QEnumLookup structure, the description of additional
features (for example, "unstable") is not surrounded by the corresponding
"#ifdef"s.

So let's fix it.

We are going to use it in the next commit, to make unstable x-colo
migration capability optional:

  { 'name': 'x-colo', 'features': [ 'unstable' ], 'if': 'CONFIG_COLO' }
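
To see why the guard is needed, here is a rough C sketch of the shape of
such generated code. All DEMO_* names, the feature bit value and the table
layout are hypothetical and only approximate what the generator emits.
DEMO_CONFIG_COLO is deliberately left undefined to model a build without
COLO.

```c
#include <assert.h>

/* Hypothetical bit for the "unstable" special feature. */
#define DEMO_FEATURE_UNSTABLE (1u << 1)

/* The conditional enum value only exists when its condition holds. */
typedef enum {
    DEMO_CAP_XBZRLE,
#if defined(DEMO_CONFIG_COLO)
    DEMO_CAP_X_COLO,
#endif
    DEMO_CAP__MAX,
} DemoCapability;

/*
 * Per-value special features. The x-colo entry must sit under the same
 * condition as the enum value; without the #if this initializer would
 * name an enumerator that does not exist and the build would fail.
 */
static const unsigned char demo_special_features[DEMO_CAP__MAX] = {
    [DEMO_CAP_XBZRLE] = 0,
#if defined(DEMO_CONFIG_COLO)
    [DEMO_CAP_X_COLO] = DEMO_FEATURE_UNSTABLE,
#endif
};

/* Look up the special-feature bits of one enum value. */
static unsigned demo_features_of(DemoCapability cap)
{
    assert(cap < DEMO_CAP__MAX);
    return demo_special_features[cap];
}
```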

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 scripts/qapi/types.py | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/scripts/qapi/types.py b/scripts/qapi/types.py
index c39d054d2c..18f8734047 100644
--- a/scripts/qapi/types.py
+++ b/scripts/qapi/types.py
@@ -61,10 +61,12 @@ def gen_enum_lookup(name: str,
 
 special_features = gen_special_features(memb.features)
 if special_features != '0':
+feats += memb.ifcond.gen_if()
 feats += mcgen('''
 [%(index)s] = %(special_features)s,
 ''',
index=index, special_features=special_features)
+feats += memb.ifcond.gen_endif()
 
 if feats:
 ret += mcgen('''
-- 
2.34.1




Re: [PATCH 0/6] Add RISC-V KVM AIA Support

2023-04-27 Thread Daniel Henrique Barboza

Hi,

The patches seem to be based on an old QEMU code base. E.g. patch 2 does not
have the changes made by 568e0614d097 that was merged in January this year.

Can you please re-send the series based on top of Alistair's riscv-to-apply.next
(https://github.com/alistair23/qemu/tree/riscv-to-apply.next)?


Thanks,


Daniel




On 4/24/23 06:07, Yong-Xuan Wang wrote:

This series introduces support for KVM AIA in the RISC-V architecture. The
implementation follows Anup's KVM AIA implementation in kvmtool
(https://github.com/avpatel/kvmtool.git). To test these patches, a Linux kernel
with KVM AIA support is required, which can be found in the qemu_kvm_aia branch
at https://github.com/yong-xuan/linux.git. This kernel branch is based on the
riscv_aia_v1 branch from https://github.com/avpatel/linux.git and includes two
additional patches.


Yong-Xuan Wang (6):
   update-linux-headers: sync-up header with Linux for KVM AIA support
   target/riscv: support the AIA device emulateion with KVM enabled
   target/riscv: check the in-kernel irqchip support
   target/riscv: Create an KVM AIA irqchip
   target/riscv: update APLIC and IMSIC to support KVM AIA
   target/riscv: select KVM AIA in riscv virt machine

  hw/intc/riscv_aplic.c |  19 +++-
  hw/intc/riscv_imsic.c |  16 ++-
  hw/riscv/virt.c   | 214 +-
  linux-headers/linux/kvm.h |   2 +
  target/riscv/kvm.c|  96 -
  target/riscv/kvm_riscv.h  |  36 +++
  6 files changed, 277 insertions(+), 106 deletions(-)





Re: [PATCH v2 3/4] build: move COLO under CONFIG_REPLICATION

2023-04-27 Thread Vladimir Sementsov-Ogievskiy

On 23.04.23 04:54, Zhang, Chen wrote:



-Original Message-
From: Vladimir Sementsov-Ogievskiy
Sent: Friday, April 21, 2023 4:36 PM
To: Zhang, Chen;qemu-devel@nongnu.org
Cc:qemu-bl...@nongnu.org;michael.r...@amd.com;arm...@redhat.com;
ebl...@redhat.com;jasow...@redhat.com;quint...@redhat.com; Zhang,
Hailiang;phi...@linaro.org;
th...@redhat.com;berra...@redhat.com;marcandre.lur...@redhat.com;
pbonz...@redhat.com;d...@treblig.org;hre...@redhat.com;
kw...@redhat.com;lizhij...@fujitsu.com
Subject: Re: [PATCH v2 3/4] build: move COLO under CONFIG_REPLICATION

On 21.04.23 06:02, Zhang, Chen wrote:



-Original Message-
From: Vladimir Sementsov-Ogievskiy
Sent: Thursday, April 20, 2023 6:53 AM
To:qemu-devel@nongnu.org
Cc:qemu-bl...@nongnu.org;michael.r...@amd.com;

arm...@redhat.com;

ebl...@redhat.com;jasow...@redhat.com;quint...@redhat.com;

Zhang,

Hailiang;phi...@linaro.org;
th...@redhat.com;berra...@redhat.com;

marcandre.lur...@redhat.com;

pbonz...@redhat.com;d...@treblig.org;hre...@redhat.com;
kw...@redhat.com; Zhang, Chen;
lizhij...@fujitsu.com; Vladimir Sementsov-Ogievskiy

Subject: [PATCH v2 3/4] build: move COLO under CONFIG_REPLICATION

We don't allow to use x-colo capability when replication is not
configured. So, no reason to build COLO when replication is disabled,
it's unusable in this case.

Yes, you are right about the current status, because the COLO best practice is
replication + colo live migration + colo proxy.
But that doesn't mean it has to be done in all scenarios, as I explained in V1.
A better way is to first allow using the x-colo capability, and to split
this patch into two config options: --disable-replication and --disable-x-colo.

But what for? We for sure don't have such scenarios now (COLO without
replication), as that has been disallowed since 7e934f5b27eee1b0d7 (by you and
David).

If you think we need such a scenario, I think it should be a separate series
which reverts 7e934f5b27eee1b0d7 and adds a corresponding test and
probably documentation.

Patch 7e934f5b27eee1b0d7 said it is for the current independent disk mode,
and what we talked about before is the shared disk mode.
Thinking again about the COLO shared disk mode, this feature still needs some
enabling work.
It looks OK for now; the build options can be separated when COLO shared disk
mode is enabled.


I've started working on this, and now I see that the check in
migrate_caps_check() is not the only place.

migration/colo.c also has several abort() points. For example,
colo_process_checkpoint will simply abort if CONFIG_REPLICATION is not defined.

So for sure, the current code is not prepared to use COLO with REPLICATION disabled.

If this possibility is needed, it requires more work. Personally, I don't think
that the possibility to enable COLO with REPLICATION disabled is really needed, and
I know nobody who needs it, so that seems to be extra work.


--
Best regards,
Vladimir




Re: [PATCH 06/19] migration/rdma: Unfold last user of acct_update_position()

2023-04-27 Thread Lukas Straub
On Thu, 27 Apr 2023 18:34:36 +0200
Juan Quintela  wrote:

> Signed-off-by: Juan Quintela 

Reviewed-by: Lukas Straub 

> ---
>  migration/ram.c  | 9 -
>  migration/ram.h  | 1 -
>  migration/rdma.c | 4 +++-
>  3 files changed, 3 insertions(+), 11 deletions(-)
> 
> diff --git a/migration/ram.c b/migration/ram.c
> index c249a1f468..7d81c4a39e 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -2629,15 +2629,6 @@ static int ram_find_and_save_block(RAMState *rs)
>  return pages;
>  }
>  
> -void acct_update_position(QEMUFile *f, size_t size)
> -{
> -uint64_t pages = size / TARGET_PAGE_SIZE;
> -
> -stat64_add(_stats.normal_pages, pages);
> -ram_transferred_add(size);
> -qemu_file_credit_transfer(f, size);
> -}
> -
>  static uint64_t ram_bytes_total_with_ignored(void)
>  {
>  RAMBlock *block;
> diff --git a/migration/ram.h b/migration/ram.h
> index 3804753ca3..6fffbeb5f1 100644
> --- a/migration/ram.h
> +++ b/migration/ram.h
> @@ -53,7 +53,6 @@ void mig_throttle_counter_reset(void);
>  
>  uint64_t ram_pagesize_summary(void);
>  int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t 
> len);
> -void acct_update_position(QEMUFile *f, size_t size);
>  void ram_postcopy_migrated_memory_release(MigrationState *ms);
>  /* For outgoing discard bitmap */
>  void ram_postcopy_send_discard_bitmap(MigrationState *ms);
> diff --git a/migration/rdma.c b/migration/rdma.c
> index 7a9b284c3f..7e747b2595 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -2231,7 +2231,9 @@ retry:
>  }
>  
>  set_bit(chunk, block->transit_bitmap);
> -acct_update_position(f, sge.length);
> +stat64_add(_stats.normal_pages, sge.length / 
> qemu_target_page_size());
> +ram_transferred_add(sge.length);
> +qemu_file_credit_transfer(f, sge.length);
>  rdma->total_writes++;
>  
>  return 0;



-- 





Re: [PATCH 05/19] migration/rdma: Split the zero page case from acct_update_position

2023-04-27 Thread Lukas Straub
On Thu, 27 Apr 2023 18:34:35 +0200
Juan Quintela  wrote:

> Now that we have atomic counters, we can do it on the place that we
> need it, no need to do it inside ram.c.
> 
> Signed-off-by: Juan Quintela 

Reviewed-by: Lukas Straub 

> ---
>  migration/ram.c  | 12 
>  migration/ram.h  |  2 +-
>  migration/rdma.c |  7 +--
>  3 files changed, 10 insertions(+), 11 deletions(-)
> 
> diff --git a/migration/ram.c b/migration/ram.c
> index c3981f64e4..c249a1f468 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -2629,17 +2629,13 @@ static int ram_find_and_save_block(RAMState *rs)
>  return pages;
>  }
>  
> -void acct_update_position(QEMUFile *f, size_t size, bool zero)
> +void acct_update_position(QEMUFile *f, size_t size)
>  {
>  uint64_t pages = size / TARGET_PAGE_SIZE;
>  
> -if (zero) {
> -stat64_add(_stats.zero_pages, pages);
> -} else {
> -stat64_add(_stats.normal_pages, pages);
> -ram_transferred_add(size);
> -qemu_file_credit_transfer(f, size);
> -}
> +stat64_add(_stats.normal_pages, pages);
> +ram_transferred_add(size);
> +qemu_file_credit_transfer(f, size);
>  }
>  
>  static uint64_t ram_bytes_total_with_ignored(void)
> diff --git a/migration/ram.h b/migration/ram.h
> index 8692de6ba0..3804753ca3 100644
> --- a/migration/ram.h
> +++ b/migration/ram.h
> @@ -53,7 +53,7 @@ void mig_throttle_counter_reset(void);
>  
>  uint64_t ram_pagesize_summary(void);
>  int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t 
> len);
> -void acct_update_position(QEMUFile *f, size_t size, bool zero);
> +void acct_update_position(QEMUFile *f, size_t size);
>  void ram_postcopy_migrated_memory_release(MigrationState *ms);
>  /* For outgoing discard bitmap */
>  void ram_postcopy_send_discard_bitmap(MigrationState *ms);
> diff --git a/migration/rdma.c b/migration/rdma.c
> index 0af5e944f0..7a9b284c3f 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -17,8 +17,10 @@
>  #include "qemu/osdep.h"
>  #include "qapi/error.h"
>  #include "qemu/cutils.h"
> +#include "exec/target_page.h"
>  #include "rdma.h"
>  #include "migration.h"
> +#include "migration-stats.h"
>  #include "qemu-file.h"
>  #include "ram.h"
>  #include "qemu/error-report.h"
> @@ -2120,7 +2122,8 @@ retry:
>  return -EIO;
>  }
>  
> -acct_update_position(f, sge.length, true);
> +stat64_add(_stats.zero_pages,
> +   sge.length / qemu_target_page_size());
>  
>  return 1;
>  }
> @@ -2228,7 +2231,7 @@ retry:
>  }
>  
>  set_bit(chunk, block->transit_bitmap);
> -acct_update_position(f, sge.length, false);
> +acct_update_position(f, sge.length);
>  rdma->total_writes++;
>  
>  return 0;
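
As a rough illustration of the split in this patch, here is a toy Python model of the two accounting paths. It is only a sketch: `MigStats`, `account_zero_pages` and `account_normal_pages` are hypothetical names, the 4096-byte page size is an assumption, and the real code uses `stat64_add()`, `ram_transferred_add()` and `qemu_file_credit_transfer()` as shown in the diff.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096  # assumed target page size, for illustration only


@dataclass
class MigStats:
    """Toy stand-in for the atomic migration counters."""
    zero_pages: int = 0
    normal_pages: int = 0
    transferred: int = 0


def account_zero_pages(stats, length, page_size=PAGE_SIZE):
    # Zero pages only bump the page counter; no payload bytes are credited.
    stats.zero_pages += length // page_size


def account_normal_pages(stats, length, page_size=PAGE_SIZE):
    # Normal pages bump the page counter and the transferred byte count.
    # (Crediting the QEMUFile rate limit is left out of this model.)
    stats.normal_pages += length // page_size
    stats.transferred += length


stats = MigStats()
account_zero_pages(stats, 2 * PAGE_SIZE)
account_normal_pages(stats, PAGE_SIZE)
print(stats)
```

Once the zero-page case is handled at the call site, the remaining helper has a single code path, which is what makes unfolding it into its last caller straightforward.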



-- 





Re: [PATCH 04/19] migration: Rename RAMStats to MigrationAtomicStats

2023-04-27 Thread Lukas Straub
On Thu, 27 Apr 2023 18:34:34 +0200
Juan Quintela  wrote:

> It is loosely based on MigrationStats, but that name is taken, so this
> is the best one that I came up with.
> 
> Signed-off-by: Juan Quintela 

Reviewed-by: Lukas Straub 
> ---
> 
> If you have any good suggestion for the name, I am all ears.
> ---
>  migration/migration-stats.c | 2 +-
>  migration/migration-stats.h | 4 ++--
>  2 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/migration/migration-stats.c b/migration/migration-stats.c
> index 8c0af9b80a..2f2cea965c 100644
> --- a/migration/migration-stats.c
> +++ b/migration/migration-stats.c
> @@ -14,4 +14,4 @@
>  #include "qemu/stats64.h"
>  #include "migration-stats.h"
>  
> -RAMStats mig_stats;
> +MigrationAtomicStats mig_stats;
> diff --git a/migration/migration-stats.h b/migration/migration-stats.h
> index 197374b4f6..149af932d7 100644
> --- a/migration/migration-stats.h
> +++ b/migration/migration-stats.h
> @@ -34,8 +34,8 @@ typedef struct {
>  Stat64 postcopy_requests;
>  Stat64 precopy_bytes;
>  Stat64 transferred;
> -} RAMStats;
> +} MigrationAtomicStats;
>  
> -extern RAMStats mig_stats;
> +extern MigrationAtomicStats mig_stats;
>  
>  #endif



-- 





Re: [PATCH v11 08/13] tests/qtest: Fix tests when no KVM or TCG are present

2023-04-27 Thread Michael S. Tsirkin
On Wed, Apr 26, 2023 at 03:00:08PM -0300, Fabiano Rosas wrote:
> It is possible to have a build with both TCG and KVM disabled due to
> Xen requiring the i386 and x86_64 binaries to be present in an aarch64
> host.
> 
> If we build with --disable-tcg on the aarch64 host, we will end up
> with a QEMU binary (x86) that supports neither TCG nor KVM.
> 
> Skip tests that crash or hang in the above scenario. Do not include
> any test cases if TCG and KVM are missing.
> 
> Make sure that calls to qtest_has_accel are placed after g_test_init
> in similar fashion to commit ae4b01b349 ("tests: Ensure TAP version is
> printed before other messages") to avoid TAP parsing errors.
> 
> Reviewed-by: Juan Quintela 
> Reviewed-by: Thomas Huth 
> Signed-off-by: Fabiano Rosas 


makes sense to me

Reviewed-by: Michael S. Tsirkin 

> ---
>  tests/qtest/bios-tables-test.c | 11 +--
>  tests/qtest/boot-serial-test.c |  5 +
>  tests/qtest/migration-test.c   |  9 -
>  tests/qtest/pxe-test.c |  8 +++-
>  tests/qtest/vmgenid-test.c |  9 +++--
>  5 files changed, 36 insertions(+), 6 deletions(-)
> 
> diff --git a/tests/qtest/bios-tables-test.c b/tests/qtest/bios-tables-test.c
> index 464f87382e..7fd88b0e9c 100644
> --- a/tests/qtest/bios-tables-test.c
> +++ b/tests/qtest/bios-tables-test.c
> @@ -2045,8 +2045,7 @@ static void test_acpi_virt_oem_fields(void)
>  int main(int argc, char *argv[])
>  {
>  const char *arch = qtest_get_arch();
> -const bool has_kvm = qtest_has_accel("kvm");
> -const bool has_tcg = qtest_has_accel("tcg");
> +bool has_kvm, has_tcg;
>  char *v_env = getenv("V");
>  int ret;
>  
> @@ -2056,6 +2055,14 @@ int main(int argc, char *argv[])
>  
>  g_test_init(, , NULL);
>  
> +has_kvm = qtest_has_accel("kvm");
> +has_tcg = qtest_has_accel("tcg");
> +
> +if (!has_tcg && !has_kvm) {
> +g_test_skip("No KVM or TCG accelerator available");
> +return 0;
> +}
> +
>  if (strcmp(arch, "i386") == 0 || strcmp(arch, "x86_64") == 0) {
>  ret = boot_sector_init(disk);
>  if (ret) {
> diff --git a/tests/qtest/boot-serial-test.c b/tests/qtest/boot-serial-test.c
> index 3aef3a97a9..6dd06aeaf4 100644
> --- a/tests/qtest/boot-serial-test.c
> +++ b/tests/qtest/boot-serial-test.c
> @@ -287,6 +287,11 @@ int main(int argc, char *argv[])
>  
>  g_test_init(, , NULL);
>  
> +if (!qtest_has_accel("tcg") && !qtest_has_accel("kvm")) {
> +g_test_skip("No KVM or TCG accelerator available");
> +return 0;
> +}
> +
>  for (i = 0; tests[i].arch != NULL; i++) {
>  if (g_str_equal(arch, tests[i].arch) &&
>  qtest_has_machine(tests[i].machine)) {
> diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
> index 60dd53d3ec..be73ec3c06 100644
> --- a/tests/qtest/migration-test.c
> +++ b/tests/qtest/migration-test.c
> @@ -2477,7 +2477,7 @@ static bool kvm_dirty_ring_supported(void)
>  
>  int main(int argc, char **argv)
>  {
> -bool has_kvm;
> +bool has_kvm, has_tcg;
>  bool has_uffd;
>  const char *arch;
>  g_autoptr(GError) err = NULL;
> @@ -2486,6 +2486,13 @@ int main(int argc, char **argv)
>  g_test_init(, , NULL);
>  
>  has_kvm = qtest_has_accel("kvm");
> +has_tcg = qtest_has_accel("tcg");
> +
> +if (!has_tcg && !has_kvm) {
> +g_test_skip("No KVM or TCG accelerator available");
> +return 0;
> +}
> +
>  has_uffd = ufd_version_check();
>  arch = qtest_get_arch();
>  
> diff --git a/tests/qtest/pxe-test.c b/tests/qtest/pxe-test.c
> index 62b6eef464..e4b48225a5 100644
> --- a/tests/qtest/pxe-test.c
> +++ b/tests/qtest/pxe-test.c
> @@ -131,11 +131,17 @@ int main(int argc, char *argv[])
>  int ret;
>  const char *arch = qtest_get_arch();
>  
> +g_test_init(, , NULL);
> +
> +if (!qtest_has_accel("tcg") && !qtest_has_accel("kvm")) {
> +g_test_skip("No KVM or TCG accelerator available");
> +return 0;
> +}
> +
>  ret = boot_sector_init(disk);
>  if(ret)
>  return ret;
>  
> -g_test_init(, , NULL);
>  
>  if (strcmp(arch, "i386") == 0 || strcmp(arch, "x86_64") == 0) {
>  test_batch(x86_tests, false);
> diff --git a/tests/qtest/vmgenid-test.c b/tests/qtest/vmgenid-test.c
> index efba76e716..324db08c7a 100644
> --- a/tests/qtest/vmgenid-test.c
> +++ b/tests/qtest/vmgenid-test.c
> @@ -165,13 +165,18 @@ int main(int argc, char **argv)
>  {
>  int ret;
>  
> +g_test_init(, , NULL);
> +
> +if (!qtest_has_accel("tcg") && !qtest_has_accel("kvm")) {
> +g_test_skip("No KVM or TCG accelerator available");
> +return 0;
> +}
> +
>  ret = boot_sector_init(disk);
>  if (ret) {
>  return ret;
>  }
>  
> -g_test_init(, , NULL);
> -
>  qtest_add_func("/vmgenid/vmgenid/set-guid",
> vmgenid_set_guid_test);
>  qtest_add_func("/vmgenid/vmgenid/set-guid-auto",
> -- 
> 2.35.3




Re: [PATCH 03/19] migration: Rename ram_counters to mig_stats

2023-04-27 Thread Lukas Straub
On Thu, 27 Apr 2023 18:34:33 +0200
Juan Quintela  wrote:

> migration_stats is just too long, and it is going to have more than
> ram counters in the near future.
> 
> Signed-off-by: Juan Quintela 

Reviewed-by: Lukas Straub 

> ---
>  migration/migration-stats.c |  2 +-
>  migration/migration-stats.h |  2 +-
>  migration/migration.c   | 32 -
>  migration/multifd.c |  6 ++---
>  migration/ram.c | 48 ++---
>  migration/savevm.c  |  2 +-
>  6 files changed, 46 insertions(+), 46 deletions(-)
> 
> diff --git a/migration/migration-stats.c b/migration/migration-stats.c
> index b0eb5ae73c..8c0af9b80a 100644
> --- a/migration/migration-stats.c
> +++ b/migration/migration-stats.c
> @@ -14,4 +14,4 @@
>  #include "qemu/stats64.h"
>  #include "migration-stats.h"
>  
> -RAMStats ram_counters;
> +RAMStats mig_stats;
> diff --git a/migration/migration-stats.h b/migration/migration-stats.h
> index 2edea0c779..197374b4f6 100644
> --- a/migration/migration-stats.h
> +++ b/migration/migration-stats.h
> @@ -36,6 +36,6 @@ typedef struct {
>  Stat64 transferred;
>  } RAMStats;
>  
> -extern RAMStats ram_counters;
> +extern RAMStats mig_stats;
>  
>  #endif
> diff --git a/migration/migration.c b/migration/migration.c
> index 5ecf3dc381..feb5ab7493 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -909,26 +909,26 @@ static void populate_ram_info(MigrationInfo *info, 
> MigrationState *s)
>  size_t page_size = qemu_target_page_size();
>  
>  info->ram = g_malloc0(sizeof(*info->ram));
> -info->ram->transferred = stat64_get(_counters.transferred);
> +info->ram->transferred = stat64_get(_stats.transferred);
>  info->ram->total = ram_bytes_total();
> -info->ram->duplicate = stat64_get(_counters.zero_pages);
> +info->ram->duplicate = stat64_get(_stats.zero_pages);
>  /* legacy value.  It is not used anymore */
>  info->ram->skipped = 0;
> -info->ram->normal = stat64_get(_counters.normal_pages);
> +info->ram->normal = stat64_get(_stats.normal_pages);
>  info->ram->normal_bytes = info->ram->normal * page_size;
>  info->ram->mbps = s->mbps;
>  info->ram->dirty_sync_count =
> -stat64_get(_counters.dirty_sync_count);
> +stat64_get(_stats.dirty_sync_count);
>  info->ram->dirty_sync_missed_zero_copy =
> -stat64_get(_counters.dirty_sync_missed_zero_copy);
> +stat64_get(_stats.dirty_sync_missed_zero_copy);
>  info->ram->postcopy_requests =
> -stat64_get(_counters.postcopy_requests);
> +stat64_get(_stats.postcopy_requests);
>  info->ram->page_size = page_size;
> -info->ram->multifd_bytes = stat64_get(_counters.multifd_bytes);
> +info->ram->multifd_bytes = stat64_get(_stats.multifd_bytes);
>  info->ram->pages_per_second = s->pages_per_second;
> -info->ram->precopy_bytes = stat64_get(_counters.precopy_bytes);
> -info->ram->downtime_bytes = stat64_get(_counters.downtime_bytes);
> -info->ram->postcopy_bytes = stat64_get(_counters.postcopy_bytes);
> +info->ram->precopy_bytes = stat64_get(_stats.precopy_bytes);
> +info->ram->downtime_bytes = stat64_get(_stats.downtime_bytes);
> +info->ram->postcopy_bytes = stat64_get(_stats.postcopy_bytes);
>  
>  if (migrate_xbzrle()) {
>  info->xbzrle_cache = g_malloc0(sizeof(*info->xbzrle_cache));
> @@ -960,7 +960,7 @@ static void populate_ram_info(MigrationInfo *info, 
> MigrationState *s)
>  if (s->state != MIGRATION_STATUS_COMPLETED) {
>  info->ram->remaining = ram_bytes_remaining();
>  info->ram->dirty_pages_rate =
> -   stat64_get(_counters.dirty_pages_rate);
> +   stat64_get(_stats.dirty_pages_rate);
>  }
>  }
>  
> @@ -1613,10 +1613,10 @@ static bool migrate_prepare(MigrationState *s, bool 
> blk, bool blk_inc,
>  
>  migrate_init(s);
>  /*
> - * set ram_counters compression_counters memory to zero for a
> + * set mig_stats compression_counters memory to zero for a
>   * new migration
>   */
> -memset(_counters, 0, sizeof(ram_counters));
> +memset(_stats, 0, sizeof(mig_stats));
>  memset(_counters, 0, sizeof(compression_counters));
>  
>  return true;
> @@ -2627,7 +2627,7 @@ static MigThrError 
> migration_detect_error(MigrationState *s)
>  static uint64_t migration_total_bytes(MigrationState *s)
>  {
>  return qemu_file_total_transferred(s->to_dst_file) +
> -stat64_get(_counters.multifd_bytes);
> +stat64_get(_stats.multifd_bytes);
>  }
>  
>  static void migration_calculate_complete(MigrationState *s)
> @@ -2691,10 +2691,10 @@ static void migration_update_counters(MigrationState 
> *s,
>   * if we haven't sent anything, we don't want to
>   * recalculate. 1 is a small enough number for our purposes
>   */
> -if (stat64_get(_counters.dirty_pages_rate) &&
> +if (stat64_get(_stats.dirty_pages_rate) &&
>  

Re: [PATCH 02/19] migration: Move ram_stats to its own file migration-stats.[ch]

2023-04-27 Thread Lukas Straub
On Thu, 27 Apr 2023 18:34:32 +0200
Juan Quintela  wrote:

> There is already include/qemu/stats.h, so stats.h was a bad idea.
> We want this file to not depend on anything else, we will move all the
> migration counters/stats to this struct.
> 
> Signed-off-by: Juan Quintela 

Reviewed-by: Lukas Straub 

> ---
>  migration/meson.build   |  1 +
>  migration/migration-stats.c | 17 +++
>  migration/migration-stats.h | 41 +
>  migration/migration.c   |  1 +
>  migration/multifd.c |  1 +
>  migration/ram.c |  3 +--
>  migration/ram.h | 23 -
>  migration/savevm.c  |  1 +
>  8 files changed, 63 insertions(+), 25 deletions(-)
>  create mode 100644 migration/migration-stats.c
>  create mode 100644 migration/migration-stats.h
> 
> diff --git a/migration/meson.build b/migration/meson.build
> index 480ff6854a..da1897fadf 100644
> --- a/migration/meson.build
> +++ b/migration/meson.build
> @@ -19,6 +19,7 @@ softmmu_ss.add(files(
>'fd.c',
>'global_state.c',
>'migration-hmp-cmds.c',
> +  'migration-stats.c',
>'migration.c',
>'multifd.c',
>'multifd-zlib.c',
> diff --git a/migration/migration-stats.c b/migration/migration-stats.c
> new file mode 100644
> index 00..b0eb5ae73c
> --- /dev/null
> +++ b/migration/migration-stats.c
> @@ -0,0 +1,17 @@
> +/*
> + * Migration stats
> + *
> + * Copyright (c) 2012-2023 Red Hat Inc
> + *
> + * Authors:
> + *  Juan Quintela 
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qemu/stats64.h"
> +#include "migration-stats.h"
> +
> +RAMStats ram_counters;
> diff --git a/migration/migration-stats.h b/migration/migration-stats.h
> new file mode 100644
> index 00..2edea0c779
> --- /dev/null
> +++ b/migration/migration-stats.h
> @@ -0,0 +1,41 @@
> +/*
> + * Migration stats
> + *
> + * Copyright (c) 2012-2023 Red Hat Inc
> + *
> + * Authors:
> + *  Juan Quintela 
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +
> +#ifndef QEMU_MIGRATION_STATS_H
> +#define QEMU_MIGRATION_STATS_H
> +
> +#include "qemu/stats64.h"
> +
> +/*
> + * These are the ram migration statistic counters.  It is loosely
> + * based on MigrationStats.  We change to Stat64 any counter that
> + * needs to be updated using atomic ops (can be accessed by more than
> + * one thread).
> + */
> +typedef struct {
> +Stat64 dirty_bytes_last_sync;
> +Stat64 dirty_pages_rate;
> +Stat64 dirty_sync_count;
> +Stat64 dirty_sync_missed_zero_copy;
> +Stat64 downtime_bytes;
> +Stat64 zero_pages;
> +Stat64 multifd_bytes;
> +Stat64 normal_pages;
> +Stat64 postcopy_bytes;
> +Stat64 postcopy_requests;
> +Stat64 precopy_bytes;
> +Stat64 transferred;
> +} RAMStats;
> +
> +extern RAMStats ram_counters;
> +
> +#endif
> diff --git a/migration/migration.c b/migration/migration.c
> index abcadbb619..5ecf3dc381 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -29,6 +29,7 @@
>  #include "migration/global_state.h"
>  #include "migration/misc.h"
>  #include "migration.h"
> +#include "migration-stats.h"
>  #include "savevm.h"
>  #include "qemu-file.h"
>  #include "channel.h"
> diff --git a/migration/multifd.c b/migration/multifd.c
> index 6053012ad9..347999f84a 100644
> --- a/migration/multifd.c
> +++ b/migration/multifd.c
> @@ -19,6 +19,7 @@
>  #include "qapi/error.h"
>  #include "ram.h"
>  #include "migration.h"
> +#include "migration-stats.h"
>  #include "socket.h"
>  #include "tls.h"
>  #include "qemu-file.h"
> diff --git a/migration/ram.c b/migration/ram.c
> index 89be3e3320..a6d5478ef8 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -36,6 +36,7 @@
>  #include "xbzrle.h"
>  #include "ram.h"
>  #include "migration.h"
> +#include "migration-stats.h"
>  #include "migration/register.h"
>  #include "migration/misc.h"
>  #include "qemu-file.h"
> @@ -460,8 +461,6 @@ uint64_t ram_bytes_remaining(void)
> 0;
>  }
>  
> -RAMStats ram_counters;
> -
>  void ram_transferred_add(uint64_t bytes)
>  {
>  if (runstate_is_running()) {
> diff --git a/migration/ram.h b/migration/ram.h
> index 04b05e1b2c..8692de6ba0 100644
> --- a/migration/ram.h
> +++ b/migration/ram.h
> @@ -32,30 +32,7 @@
>  #include "qapi/qapi-types-migration.h"
>  #include "exec/cpu-common.h"
>  #include "io/channel.h"
> -#include "qemu/stats64.h"
>  
> -/*
> - * These are the ram migration statistic counters.  It is loosely
> - * based on MigrationStats.  We change to Stat64 any counter that
> - * needs to be updated using atomic ops (can be accessed by more than
> - * one thread).
> - */
> -typedef struct {
> -Stat64 dirty_bytes_last_sync;
> -Stat64 dirty_pages_rate;
> -Stat64 

Re: [PATCH 01/19] multifd: We already account for this packet on the multifd thread

2023-04-27 Thread Lukas Straub
On Thu, 27 Apr 2023 18:34:31 +0200
Juan Quintela  wrote:

> Signed-off-by: Juan Quintela 

Reviewed-by: Lukas Straub 

> ---
>  migration/multifd.c | 3 ---
>  1 file changed, 3 deletions(-)
> 
> diff --git a/migration/multifd.c b/migration/multifd.c
> index 6a59c03dd2..6053012ad9 100644
> --- a/migration/multifd.c
> +++ b/migration/multifd.c
> @@ -626,10 +626,7 @@ int multifd_send_sync_main(QEMUFile *f)
>  p->packet_num = multifd_send_state->packet_num++;
>  p->flags |= MULTIFD_FLAG_SYNC;
>  p->pending_job++;
> -qemu_file_acct_rate_limit(f, p->packet_len);
>  qemu_mutex_unlock(>mutex);
> -stat64_add(_counters.transferred, p->packet_len);
> -stat64_add(_counters.multifd_bytes, p->packet_len);
>  qemu_sem_post(>sem);
>  }
>  for (i = 0; i < migrate_multifd_channels(); i++) {



-- 





Re: [PATCH 10/13] hw/ide/piix: Reuse PCIIDEState::{cmd,data}_ops

2023-04-27 Thread Bernhard Beschow



On 27 April 2023 10:52:17 UTC, Mark Cave-Ayland wrote:
>On 26/04/2023 21:14, Bernhard Beschow wrote:
>
>> On 26 April 2023 18:18:35 UTC, Bernhard Beschow wrote:
>>> 
>>> 
>>> On 26 April 2023 11:37:48 UTC, Mark Cave-Ayland wrote:
 On 22/04/2023 16:07, Bernhard Beschow wrote:
 
> Now that PCIIDEState::{cmd,data}_ops are initialized in the base class
> constructor there is an opportunity for PIIX to reuse these attributes. 
> This
> resolves usage of ide_init_ioport() which would fall back internally to 
> using
> the isabus global due to NULL being passed as ISADevice by PIIX.
> 
> Signed-off-by: Bernhard Beschow 
> ---
>hw/ide/piix.c | 30 +-
>1 file changed, 13 insertions(+), 17 deletions(-)
> 
> diff --git a/hw/ide/piix.c b/hw/ide/piix.c
> index a3a15dc7db..406a67fa0f 100644
> --- a/hw/ide/piix.c
> +++ b/hw/ide/piix.c
> @@ -104,34 +104,32 @@ static void piix_ide_reset(DeviceState *dev)
>pci_set_byte(pci_conf + 0x20, 0x01);  /* BMIBA: 20-23h */
>}
>-static bool pci_piix_init_bus(PCIIDEState *d, unsigned i, ISABus 
> *isa_bus,
> -  Error **errp)
> +static void pci_piix_init_bus(PCIIDEState *d, unsigned i, ISABus 
> *isa_bus)
>{
>static const struct {
>int iobase;
>int iobase2;
>int isairq;
>} port_info[] = {
> -{0x1f0, 0x3f6, 14},
> -{0x170, 0x376, 15},
> +{0x1f0, 0x3f4, 14},
> +{0x170, 0x374, 15},
>};
> -int ret;
> +MemoryRegion *address_space_io = pci_address_space_io(PCI_DEVICE(d));
>  ide_bus_init(>bus[i], sizeof(d->bus[i]), DEVICE(d), i, 2);
> -ret = ide_init_ioport(>bus[i], NULL, port_info[i].iobase,
> -  port_info[i].iobase2);
> -if (ret) {
> -error_setg_errno(errp, -ret, "Failed to realize %s port %u",
> - object_get_typename(OBJECT(d)), i);
> -return false;
> -}
> +memory_region_add_subregion(address_space_io, port_info[i].iobase,
> +>data_ops[i]);
> +/*
> + * PIIX forwards the last byte of cmd_ops to ISA. Model this using a 
> low
> + * prio so competing memory regions take precedence.
> + */
> +memory_region_add_subregion_overlap(address_space_io, 
> port_info[i].iobase2,
> +>cmd_ops[i], -1);
 
 Interesting. Is this behaviour documented somewhere and/or used in one of 
 your test images at all? If I had seen this myself, I would probably have 
 thought that the addresses were a typo...
>>> 
>>> I first stumbled upon this and wondered why this code was working with 
>>> VIA_IDE (through my pc-via branch). Then I found the correct offsets there 
>>> which are confirmed in the piix datasheet, e.g.: "Secondary Control Block 
>>> Offset: 0374h"
>> 
>> In case you were wondering about the forwarding of the last byte the 
>> datasheet says: "Accesses to byte 3 of the Control Block are forwarded to 
>> ISA where the floppy disk controller responds."
>
>Ahhh okay okay I see what's happening here: the PIIX IDE is assuming that the 
>legacy ioport semantics are in operation here, which as you note above is 
>where the FDC controller is also accessed via the above byte in the IDE 
>control block. This is also why you need to change the address above from 
>0x3f6/0x376 to 0x3f4/0x374 when trying to use the MemoryRegions used for the 
>PCI BARs since the PCI IDE controller specification requires a 4 byte 
>allocation for the Control Block - see sections 2.0 and 2.2.

Yes, it might be the case that PIIX assumes that. Why does it contradict the PCI IDE 
specification? PIIX seems to apply the appropriate "workarounds" here.

>
>And that's fine, because the portio_lists used in ide_init_ioport() set up the 
>legacy IDE ioports so that FDC accesses done in this way can succeed, and the 
>PIIX IDE is hard-coded to legacy mode. So in fact PIIX IDE should keep using 
>ide_init_ioport() rather than trying to re-use the BAR MemoryRegions so I 
>think this patch should just be dropped.

I was hoping to keep that patch...

Best regards,
Bernhard
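
For reference, the address arithmetic discussed above as a minimal Python sketch. It assumes the legacy 0x3f6/0x376 ports sit at byte 2 of the 4-byte Control Block, which is what moving the base from 0x3f6 to 0x3f4 implies; `control_block_ports` is a name made up for illustration.

```python
CONTROL_BLOCK_SIZE = 4  # bytes, per the discussion above


def control_block_ports(base):
    """Map each byte offset of a control block to its I/O port address."""
    return {offset: base + offset for offset in range(CONTROL_BLOCK_SIZE)}


for name, base in (("primary", 0x3F4), ("secondary", 0x374)):
    ports = control_block_ports(base)
    # Byte 2 is the legacy control port; byte 3 is the one the datasheet
    # says is forwarded to ISA, where the floppy controller responds.
    print(f"{name}: legacy port {ports[2]:#x}, forwarded to ISA {ports[3]:#x}")
```

This is why mapping the full 4-byte region at 0x3f4/0x374 overlaps a port owned by another device, and why the patch adds the region with a low priority.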

>
 
>ide_bus_init_output_irq(>bus[i],
>isa_bus_get_irq(isa_bus, 
> port_info[i].isairq));
>  bmdma_init(>bus[i], >bmdma[i], d);
>ide_bus_register_restart_cb(>bus[i]);
> -
> -return true;
>}
>  static void pci_piix_ide_realize(PCIDevice *dev, Error **errp)
> @@ -160,9 +158,7 @@ static void pci_piix_ide_realize(PCIDevice *dev, 
> Error **errp)
>}
>  for (unsigned i = 0; i < 2; i++) {
> -if (!pci_piix_init_bus(d, i, isa_bus, errp)) 

Re: [PATCH v2] meson: Pass -j option to sphinx

2023-04-27 Thread Fabiano Rosas
Daniel P. Berrangé  writes:

> On Thu, Apr 27, 2023 at 02:25:16PM -0300, Fabiano Rosas wrote:
>> Save a bit of build time by passing the number of jobs option to
>> sphinx.
>> 
>> We cannot use the -j option from make because meson does not support
>> setting build time parameters for custom targets. Use nproc instead or
>> the equivalent sphinx option "-j auto", if that is available.
>> 
>> Also make sure our plugins support parallelism and report it properly
>> to sphinx. Particularly, implement the merge_domaindata method in
>> DBusDomain that is used to merge in data from other subprocesses.
>> 
>> before:
>>   $ time make man html
>>   ...
>>   [1/2] Generating docs/QEMU manual with a custom command
>>   [2/2] Generating docs/QEMU man pages with a custom command
>> 
>>   real0m43.157s
>>   user0m42.642s
>>   sys 0m0.576s
>> 
>> after:
>>   $ time make man html
>>   ...
>>   [1/2] Generating docs/QEMU manual with a custom command
>>   [2/2] Generating docs/QEMU man pages with a custom command
>> 
>>   real0m25.014s
>>   user0m51.288s
>>   sys 0m2.085s
>
> On my 12 CPU laptop I see a similar magnitude benefit - about
> 20 seconds is cut from the docs build time - 50 down to 30 secs.
>
> Watching the CPU usage I see sphinx is not very good at keeping
> all CPUs busy. For perhaps 2 seconds I'll see  8 sphinx processes
> burning CPUs, but the majority of the time it'll only be 1 or 2
> sphinx processes.
>
> IOW, we do get a benefit, but it is not nearly as good as one
> might hope for given the number of CPUs potentially available.
>
>> Signed-off-by: Fabiano Rosas 
>> ---
>>  docs/meson.build   | 12 
>>  docs/sphinx/dbusdomain.py  |  4 
>>  docs/sphinx/fakedbusdoc.py |  5 +
>>  docs/sphinx/qmp_lexer.py   |  5 +
>>  4 files changed, 26 insertions(+)
>
> Tested-by: Daniel P. Berrangé 
>
>> 
>> diff --git a/docs/meson.build b/docs/meson.build
>> index f220800e3e..9e4bed6fa0 100644
>> --- a/docs/meson.build
>> +++ b/docs/meson.build
>> @@ -10,6 +10,18 @@ if sphinx_build.found()
>>  SPHINX_ARGS += [ '-W', '-Dkerneldoc_werror=1' ]
>>endif
>>  
>> +  sphinx_version = run_command(SPHINX_ARGS + ['--version'],
>> +   check: false).stdout().split()[1]
>> +  if sphinx_version.version_compare('>=5.1.2')
>> +SPHINX_ARGS += ['-j', 'auto']
>> +  else
>> +nproc = find_program('nproc')
>> +if nproc.found()
>> +  jobs = run_command(nproc, check:false).stdout()
>> +  SPHINX_ARGS += ['-j', jobs]
>> +endif
>> +  endif
>
> Any reason for check: false in these 2 run_command calls?
>

No, I haven't thought about it. I'll change them to true.




Re: [PATCH v2] meson: Pass -j option to sphinx

2023-04-27 Thread Daniel P . Berrangé
On Thu, Apr 27, 2023 at 02:25:16PM -0300, Fabiano Rosas wrote:
> Save a bit of build time by passing the number of jobs option to
> sphinx.
> 
> We cannot use the -j option from make because meson does not support
> setting build time parameters for custom targets. Use nproc instead or
> the equivalent sphinx option "-j auto", if that is available.
> 
> Also make sure our plugins support parallelism and report it properly
> to sphinx. Particularly, implement the merge_domaindata method in
> DBusDomain that is used to merge in data from other subprocesses.
> 
> before:
>   $ time make man html
>   ...
>   [1/2] Generating docs/QEMU manual with a custom command
>   [2/2] Generating docs/QEMU man pages with a custom command
> 
>   real    0m43.157s
>   user    0m42.642s
>   sys     0m0.576s
> 
> after:
>   $ time make man html
>   ...
>   [1/2] Generating docs/QEMU manual with a custom command
>   [2/2] Generating docs/QEMU man pages with a custom command
> 
>   real    0m25.014s
>   user    0m51.288s
>   sys     0m2.085s

On my 12 CPU laptop I see a similar magnitude benefit - about
20 seconds is cut from the docs build time - 50 down to 30 secs.

Watching the CPU usage I see sphinx is not very good at keeping
all CPUs busy. For perhaps 2 seconds I'll see  8 sphinx processes
burning CPUs, but the majority of the time it'll only be 1 or 2
sphinx processes.

IOW, we do get a benefit, but it is not nearly as good as one
might hope for given the number of CPUs potentially available.
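
To put rough numbers on this, the timings quoted above can be turned into a wall-clock speedup and an average number of busy CPUs. This is only a back-of-the-envelope sketch in Python; it uses nothing but the real/user/sys figures from the commit message, and the helper names are made up for illustration.

```python
# Timings copied from the "before" and "after" runs in the commit message.
before = {"real": 43.157, "user": 42.642, "sys": 0.576}
after = {"real": 25.014, "user": 51.288, "sys": 2.085}


def speedup(serial, parallel):
    """Wall-clock speedup: how many times faster the parallel run finished."""
    return serial["real"] / parallel["real"]


def avg_busy_cpus(run):
    """CPU time divided by wall-clock time: the average number of busy CPUs."""
    return (run["user"] + run["sys"]) / run["real"]


print(f"speedup:            {speedup(before, after):.2f}x")
print(f"busy CPUs (before): {avg_busy_cpus(before):.2f}")
print(f"busy CPUs (after):  {avg_busy_cpus(after):.2f}")
```

With these figures the build finishes roughly 1.7x faster while keeping only a little over 2 CPUs busy on average, which fits the observation that most of the time just 1 or 2 sphinx processes are running.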

> Signed-off-by: Fabiano Rosas 
> ---
>  docs/meson.build   | 12 
>  docs/sphinx/dbusdomain.py  |  4 
>  docs/sphinx/fakedbusdoc.py |  5 +
>  docs/sphinx/qmp_lexer.py   |  5 +
>  4 files changed, 26 insertions(+)

Tested-by: Daniel P. Berrangé 

> 
> diff --git a/docs/meson.build b/docs/meson.build
> index f220800e3e..9e4bed6fa0 100644
> --- a/docs/meson.build
> +++ b/docs/meson.build
> @@ -10,6 +10,18 @@ if sphinx_build.found()
>  SPHINX_ARGS += [ '-W', '-Dkerneldoc_werror=1' ]
>endif
>  
> +  sphinx_version = run_command(SPHINX_ARGS + ['--version'],
> +   check: false).stdout().split()[1]
> +  if sphinx_version.version_compare('>=5.1.2')
> +SPHINX_ARGS += ['-j', 'auto']
> +  else
> +nproc = find_program('nproc')
> +if nproc.found()
> +  jobs = run_command(nproc, check:false).stdout()
> +  SPHINX_ARGS += ['-j', jobs]
> +endif
> +  endif

Any reason for check: false in these 2 run_command calls?

They'll both return 0 on success, so I would have thought
'check: true' was more robust at error reporting?


With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




[PATCH v2] meson: Pass -j option to sphinx

2023-04-27 Thread Fabiano Rosas
Save a bit of build time by passing the number of jobs option to
sphinx.

We cannot use the -j option from make because meson does not support
setting build time parameters for custom targets. Use nproc instead or
the equivalent sphinx option "-j auto", if that is available.

Also make sure our plugins support parallelism and report it properly
to sphinx. Particularly, implement the merge_domaindata method in
DBusDomain that is used to merge in data from other subprocesses.
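
As a rough illustration of what that merge step does, here is a standalone Python sketch that models it without importing Sphinx. ToyDomain, ObjectEntry and the org.qemu.* names are hypothetical stand-ins; only the body of merge_domaindata follows the method added by this patch.

```python
from collections import namedtuple

# Stand-in for the per-object record a domain stores. The field names echo
# the ones used by get_objects() in the patch, but this type is hypothetical.
ObjectEntry = namedtuple("ObjectEntry", ["objtype", "docname", "node_id"])


class ToyDomain:
    """Minimal model of a domain's data store, without importing Sphinx."""

    def __init__(self):
        self.data = {"objects": {}}

    def merge_domaindata(self, docnames, otherdata):
        # Copy over only the objects that belong to documents the
        # subprocess was actually responsible for reading.
        for name, obj in otherdata["objects"].items():
            if obj.docname in docnames:
                self.data["objects"][name] = obj


main = ToyDomain()
main.data["objects"]["org.qemu.Foo"] = ObjectEntry("interface", "a", "id-foo")

# Data gathered by a worker that read document "b" only. The "Stale" entry
# refers to a document the worker did not process, so it must be ignored.
worker_data = {
    "objects": {
        "org.qemu.Bar": ObjectEntry("interface", "b", "id-bar"),
        "org.qemu.Stale": ObjectEntry("interface", "c", "id-stale"),
    }
}

main.merge_domaindata(["b"], worker_data)
print(sorted(main.data["objects"]))
```

Filtering on docnames keeps the main process from picking up entries for documents that a given worker did not read.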

before:
  $ time make man html
  ...
  [1/2] Generating docs/QEMU manual with a custom command
  [2/2] Generating docs/QEMU man pages with a custom command

  real    0m43.157s
  user    0m42.642s
  sys     0m0.576s

after:
  $ time make man html
  ...
  [1/2] Generating docs/QEMU manual with a custom command
  [2/2] Generating docs/QEMU man pages with a custom command

  real    0m25.014s
  user    0m51.288s
  sys     0m2.085s

Signed-off-by: Fabiano Rosas 
---
 docs/meson.build   | 12 
 docs/sphinx/dbusdomain.py  |  4 
 docs/sphinx/fakedbusdoc.py |  5 +
 docs/sphinx/qmp_lexer.py   |  5 +
 4 files changed, 26 insertions(+)

diff --git a/docs/meson.build b/docs/meson.build
index f220800e3e..9e4bed6fa0 100644
--- a/docs/meson.build
+++ b/docs/meson.build
@@ -10,6 +10,18 @@ if sphinx_build.found()
 SPHINX_ARGS += [ '-W', '-Dkerneldoc_werror=1' ]
   endif
 
+  sphinx_version = run_command(SPHINX_ARGS + ['--version'],
+   check: false).stdout().split()[1]
+  if sphinx_version.version_compare('>=5.1.2')
+SPHINX_ARGS += ['-j', 'auto']
+  else
+nproc = find_program('nproc')
+if nproc.found()
+  jobs = run_command(nproc, check:false).stdout()
+  SPHINX_ARGS += ['-j', jobs]
+endif
+  endif
+
   # This is a bit awkward but works: create a trivial document and
   # try to run it with our configuration file (which enforces a
   # version requirement). This will fail if sphinx-build is too old.
diff --git a/docs/sphinx/dbusdomain.py b/docs/sphinx/dbusdomain.py
index 2ea95af623..9872fd5bf6 100644
--- a/docs/sphinx/dbusdomain.py
+++ b/docs/sphinx/dbusdomain.py
@@ -400,6 +400,10 @@ def get_objects(self) -> Iterator[Tuple[str, str, str, 
str, str, int]]:
 for refname, obj in self.objects.items():
 yield (refname, refname, obj.objtype, obj.docname, obj.node_id, 1)
 
+def merge_domaindata(self, docnames, otherdata):
+for name, obj in otherdata['objects'].items():
+if obj.docname in docnames:
+self.data['objects'][name] = obj
 
 def setup(app):
 app.add_domain(DBusDomain)
diff --git a/docs/sphinx/fakedbusdoc.py b/docs/sphinx/fakedbusdoc.py
index d2c5079046..2d2e6ef640 100644
--- a/docs/sphinx/fakedbusdoc.py
+++ b/docs/sphinx/fakedbusdoc.py
@@ -23,3 +23,8 @@ def run(self):
 def setup(app: Sphinx) -> Dict[str, Any]:
 """Register a fake dbus-doc directive with Sphinx"""
 app.add_directive("dbus-doc", FakeDBusDocDirective)
+
+return dict(
+parallel_read_safe = True,
+parallel_write_safe = True
+)
diff --git a/docs/sphinx/qmp_lexer.py b/docs/sphinx/qmp_lexer.py
index f7e4c0e198..a59de8a079 100644
--- a/docs/sphinx/qmp_lexer.py
+++ b/docs/sphinx/qmp_lexer.py
@@ -41,3 +41,8 @@ def setup(sphinx):
 sphinx.add_lexer('QMP', QMPExampleLexer)
 except errors.VersionRequirementError:
 sphinx.add_lexer('QMP', QMPExampleLexer())
+
+return dict(
+parallel_read_safe = True,
+parallel_write_safe = True
+)
-- 
2.35.3




[PATCH v10 3/4] qemu-iotests: test zone append operation

2023-04-27 Thread Sam Li
This patch tests zone append writes by reporting the zone wp after
the call completes. The "zap -p" option prints the sector offset
returned on completion, which should be the start sector where the
append write began.
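
The sector values the test expects for the first zone can be derived by hand. The following Python sketch does that arithmetic, assuming 512-byte sectors (as in the "/ 512" comments of the existing test script) and an initially empty zone; the function names are made up for illustration and are not part of qemu-io.

```python
SECTOR_SIZE = 512  # assumed sector size, as in the test script's "/ 512" comments


def to_sector(byte_offset):
    """Convert a byte offset to a sector number."""
    return byte_offset // SECTOR_SIZE


def simulate_appends(zone_start, lengths, count):
    """Return (append_sector, wp_after) for `count` identical appends to an empty zone."""
    wp = zone_start
    results = []
    for _ in range(count):
        append_at = wp        # data lands at the current write pointer
        wp += sum(lengths)    # the write pointer advances by the bytes written
        results.append((hex(to_sector(append_at)), hex(to_sector(wp))))
    return results


# "zap -p 0 0x1000 0x2000" issued twice against the first zone
print(simulate_appends(0, [0x1000, 0x2000], 2))
```

Each append writes 0x3000 bytes, i.e. 0x18 sectors, so the reported append sectors are 0x0 and 0x18 and the wp moves to 0x18 and then 0x30.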

Signed-off-by: Sam Li 
Reviewed-by: Stefan Hajnoczi 
---
 qemu-io-cmds.c | 75 ++
 tests/qemu-iotests/tests/zoned | 16 +++
 tests/qemu-iotests/tests/zoned.out | 16 +++
 3 files changed, 107 insertions(+)

diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
index f35ea627d7..3f75d2f5a6 100644
--- a/qemu-io-cmds.c
+++ b/qemu-io-cmds.c
@@ -1874,6 +1874,80 @@ static const cmdinfo_t zone_reset_cmd = {
 .oneline = "reset a zone write pointer in zone block device",
 };
 
+static int do_aio_zone_append(BlockBackend *blk, QEMUIOVector *qiov,
+  int64_t *offset, int flags, int *total)
+{
+int async_ret = NOT_DONE;
+
+blk_aio_zone_append(blk, offset, qiov, flags, aio_rw_done, &async_ret);
+while (async_ret == NOT_DONE) {
+main_loop_wait(false);
+}
+
+*total = qiov->size;
+return async_ret < 0 ? async_ret : 1;
+}
+
+static int zone_append_f(BlockBackend *blk, int argc, char **argv)
+{
+int ret;
+bool pflag = false;
+int flags = 0;
+int total = 0;
+int64_t offset;
+char *buf;
+int c, nr_iov;
+int pattern = 0xcd;
+QEMUIOVector qiov;
+
+if (optind > argc - 3) {
+return -EINVAL;
+}
+
+if ((c = getopt(argc, argv, "p")) != -1) {
+pflag = true;
+}
+
+offset = cvtnum(argv[optind]);
+if (offset < 0) {
+print_cvtnum_err(offset, argv[optind]);
+return offset;
+}
+optind++;
+nr_iov = argc - optind;
+buf = create_iovec(blk, &qiov, &argv[optind], nr_iov, pattern,
+   flags & BDRV_REQ_REGISTERED_BUF);
+if (buf == NULL) {
+return -EINVAL;
+}
+ret = do_aio_zone_append(blk, &qiov, &offset, flags, &total);
+if (ret < 0) {
+printf("zone append failed: %s\n", strerror(-ret));
+goto out;
+}
+
+if (pflag) {
+printf("After zap done, the append sector is 0x%" PRIx64 "\n",
+   tosector(offset));
+}
+
+out:
+qemu_io_free(blk, buf, qiov.size,
+ flags & BDRV_REQ_REGISTERED_BUF);
+qemu_iovec_destroy(&qiov);
+return ret;
+}
+
+static const cmdinfo_t zone_append_cmd = {
+.name = "zone_append",
+.altname = "zap",
+.cfunc = zone_append_f,
+.argmin = 3,
+.argmax = 4,
+.args = "offset len [len..]",
+.oneline = "append write a number of bytes at a specified offset",
+};
+
 static int truncate_f(BlockBackend *blk, int argc, char **argv);
 static const cmdinfo_t truncate_cmd = {
 .name   = "truncate",
@@ -2672,6 +2746,7 @@ static void __attribute((constructor)) 
init_qemuio_commands(void)
 qemuio_add_command(&zone_close_cmd);
 qemuio_add_command(&zone_finish_cmd);
 qemuio_add_command(&zone_reset_cmd);
+qemuio_add_command(&zone_append_cmd);
 qemuio_add_command(&truncate_cmd);
 qemuio_add_command(&length_cmd);
 qemuio_add_command(&info_cmd);
diff --git a/tests/qemu-iotests/tests/zoned b/tests/qemu-iotests/tests/zoned
index 56f60616b5..3d23ce9cc1 100755
--- a/tests/qemu-iotests/tests/zoned
+++ b/tests/qemu-iotests/tests/zoned
@@ -82,6 +82,22 @@ echo "(5) resetting the second zone"
 $QEMU_IO $IMG -c "zrs 268435456 268435456"
 echo "After resetting a zone:"
 $QEMU_IO $IMG -c "zrp 268435456 1"
+echo
+echo
+echo "(6) append write" # the physical block size of the device is 4096
+$QEMU_IO $IMG -c "zrp 0 1"
+$QEMU_IO $IMG -c "zap -p 0 0x1000 0x2000"
+echo "After appending the first zone firstly:"
+$QEMU_IO $IMG -c "zrp 0 1"
+$QEMU_IO $IMG -c "zap -p 0 0x1000 0x2000"
+echo "After appending the first zone secondly:"
+$QEMU_IO $IMG -c "zrp 0 1"
+$QEMU_IO $IMG -c "zap -p 268435456 0x1000 0x2000"
+echo "After appending the second zone firstly:"
+$QEMU_IO $IMG -c "zrp 268435456 1"
+$QEMU_IO $IMG -c "zap -p 268435456 0x1000 0x2000"
+echo "After appending the second zone secondly:"
+$QEMU_IO $IMG -c "zrp 268435456 1"
 
 # success, all done
 echo "*** done"
diff --git a/tests/qemu-iotests/tests/zoned.out 
b/tests/qemu-iotests/tests/zoned.out
index b2d061da49..fe53ba4744 100644
--- a/tests/qemu-iotests/tests/zoned.out
+++ b/tests/qemu-iotests/tests/zoned.out
@@ -50,4 +50,20 @@ start: 0x80000, len 0x80000, cap 0x80000, wptr 0x100000, 
zcond:14, [type: 2]
 (5) resetting the second zone
 After resetting a zone:
 start: 0x80000, len 0x80000, cap 0x80000, wptr 0x80000, zcond:1, [type: 2]
+
+
+(6) append write
+start: 0x0, len 0x80000, cap 0x80000, wptr 0x0, zcond:1, [type: 2]
+After zap done, the append sector is 0x0
+After appending the first zone firstly:
+start: 0x0, len 0x80000, cap 0x80000, wptr 0x18, zcond:2, [type: 2]
+After zap done, the append sector is 0x18
+After appending the first zone secondly:
+start: 0x0, len 0x80000, cap 0x80000, wptr 0x30, zcond:2, [type: 2]
+After zap done, the append sector is 0x80000
+After 

[PATCH v10 2/4] block: introduce zone append write for zoned devices

2023-04-27 Thread Sam Li
A zone append command is a write operation that specifies the first
logical block of a zone as the write position. When writing to a zoned
block device using zone append, the byte offset of the call may point at
any position within the zone to which the data is being appended. Upon
completion the device will respond with the position where the data has
been written in the zone.
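
The semantics described above can be modelled in a few lines. The Python sketch below is not QEMU code: ToyZone is a hypothetical in-memory zone, and rejecting appends that do not fit in the zone is an assumption of the sketch. It only shows that the caller's offset selects the zone while the data lands at the write pointer, whose old value is reported back.

```python
class ToyZone:
    """Hypothetical in-memory zone; offsets are bytes from the start of the device."""

    def __init__(self, start, size):
        self.start = start
        self.size = size
        self.wp = start              # write pointer: next writable byte
        self.data = bytearray(size)

    def append(self, offset, payload):
        # The caller's offset only selects the zone; it may point anywhere inside it.
        if not self.start <= offset < self.start + self.size:
            raise ValueError("offset is outside this zone")
        # Assumption of this sketch: an append that does not fit is rejected.
        if self.wp + len(payload) > self.start + self.size:
            raise ValueError("not enough space left in the zone")
        written_at = self.wp
        rel = written_at - self.start
        self.data[rel:rel + len(payload)] = payload
        self.wp += len(payload)
        return written_at            # reported back to the caller on completion


zone = ToyZone(start=0x10000, size=0x10000)
first = zone.append(0x10000, b"a" * 0x1000)   # offset at the zone start
second = zone.append(0x1f000, b"b" * 0x2000)  # offset elsewhere in the same zone
print(hex(first), hex(second), hex(zone.wp))
```

Both calls append at the write pointer regardless of the offset passed in, which is why the API takes the offset by pointer and updates it on completion.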

Signed-off-by: Sam Li 
Reviewed-by: Dmitry Fomichev 
Reviewed-by: Stefan Hajnoczi 
---
 block/block-backend.c | 61 +++
 block/file-posix.c| 58 +
 block/io.c| 27 ++
 block/io_uring.c  |  4 ++
 block/linux-aio.c |  3 ++
 block/raw-format.c|  8 
 include/block/block-io.h  |  4 ++
 include/block/block_int-common.h  |  3 ++
 include/block/raw-aio.h   |  4 +-
 include/sysemu/block-backend-io.h |  9 +
 10 files changed, 173 insertions(+), 8 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index 67722eb46d..aa8657e5c8 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1929,6 +1929,45 @@ BlockAIOCB *blk_aio_zone_mgmt(BlockBackend *blk, 
BlockZoneOp op,
 return &acb->common;
 }
 
+static void coroutine_fn blk_aio_zone_append_entry(void *opaque)
+{
+BlkAioEmAIOCB *acb = opaque;
+BlkRwCo *rwco = &acb->rwco;
+
+rwco->ret = blk_co_zone_append(rwco->blk, (int64_t *)(uintptr_t)acb->bytes,
+   rwco->iobuf, rwco->flags);
+blk_aio_complete(acb);
+}
+
+BlockAIOCB *blk_aio_zone_append(BlockBackend *blk, int64_t *offset,
+QEMUIOVector *qiov, BdrvRequestFlags flags,
+BlockCompletionFunc *cb, void *opaque) {
+BlkAioEmAIOCB *acb;
+Coroutine *co;
+IO_CODE();
+
+blk_inc_in_flight(blk);
+acb = blk_aio_get(&blk_aio_em_aiocb_info, blk, cb, opaque);
+acb->rwco = (BlkRwCo) {
+.blk= blk,
+.ret= NOT_DONE,
+.flags  = flags,
+.iobuf  = qiov,
+};
+acb->bytes = (int64_t)(uintptr_t)offset;
+acb->has_returned = false;
+
+co = qemu_coroutine_create(blk_aio_zone_append_entry, acb);
+aio_co_enter(blk_get_aio_context(blk), co);
+acb->has_returned = true;
+if (acb->rwco.ret != NOT_DONE) {
+replay_bh_schedule_oneshot_event(blk_get_aio_context(blk),
+ blk_aio_complete_bh, acb);
+}
+
+return &acb->common;
+}
+
 /*
  * Send a zone_report command.
  * offset is a byte offset from the start of the device. No alignment
@@ -1982,6 +2021,28 @@ int coroutine_fn blk_co_zone_mgmt(BlockBackend *blk, 
BlockZoneOp op,
 return ret;
 }
 
+/*
+ * Send a zone_append command.
+ */
+int coroutine_fn blk_co_zone_append(BlockBackend *blk, int64_t *offset,
+QEMUIOVector *qiov, BdrvRequestFlags flags)
+{
+int ret;
+IO_CODE();
+
+blk_inc_in_flight(blk);
+blk_wait_while_drained(blk);
+GRAPH_RDLOCK_GUARD();
+if (!blk_is_available(blk)) {
+blk_dec_in_flight(blk);
+return -ENOMEDIUM;
+}
+
+ret = bdrv_co_zone_append(blk_bs(blk), offset, qiov, flags);
+blk_dec_in_flight(blk);
+return ret;
+}
+
 void blk_drain(BlockBackend *blk)
 {
 BlockDriverState *bs = blk_bs(blk);
diff --git a/block/file-posix.c b/block/file-posix.c
index c0c83c6631..8fc7f73d2c 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -160,6 +160,7 @@ typedef struct BDRVRawState {
 bool has_write_zeroes:1;
 bool use_linux_aio:1;
 bool use_linux_io_uring:1;
+int64_t *offset; /* offset of zone append operation */
 int page_cache_inconsistent; /* errno from fdatasync failure */
 bool has_fallocate;
 bool needs_alignment;
@@ -1702,7 +1703,7 @@ static ssize_t handle_aiocb_rw_vector(RawPosixAIOData 
*aiocb)
 ssize_t len;
 
 len = RETRY_ON_EINTR(
-(aiocb->aio_type & QEMU_AIO_WRITE) ?
+(aiocb->aio_type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) ?
 qemu_pwritev(aiocb->aio_fildes,
aiocb->io.iov,
aiocb->io.niov,
@@ -1731,7 +1732,7 @@ static ssize_t handle_aiocb_rw_linear(RawPosixAIOData 
*aiocb, char *buf)
 ssize_t len;
 
 while (offset < aiocb->aio_nbytes) {
-if (aiocb->aio_type & QEMU_AIO_WRITE) {
+if (aiocb->aio_type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) {
 len = pwrite(aiocb->aio_fildes,
  (const char *)buf + offset,
  aiocb->aio_nbytes - offset,
@@ -1824,7 +1825,7 @@ static int handle_aiocb_rw(void *opaque)
 }
 
 nbytes = handle_aiocb_rw_linear(aiocb, buf);
-if (!(aiocb->aio_type & QEMU_AIO_WRITE)) {
+if (!(aiocb->aio_type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND))) {
 char *p = buf;
 size_t count = aiocb->aio_nbytes, copy;
 int i;
@@ -2457,8 +2458,12 @@ 

[PATCH v10 4/4] block: add some trace events for zone append

2023-04-27 Thread Sam Li
Signed-off-by: Sam Li 
Reviewed-by: Dmitry Fomichev 
Reviewed-by: Stefan Hajnoczi 
---
 block/file-posix.c | 3 +++
 block/trace-events | 2 ++
 2 files changed, 5 insertions(+)

diff --git a/block/file-posix.c b/block/file-posix.c
index 8fc7f73d2c..5f1745ede8 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -2517,6 +2517,8 @@ out:
 if (!BDRV_ZT_IS_CONV(*wp)) {
 if (type & QEMU_AIO_ZONE_APPEND) {
 *s->offset = *wp;
+trace_zbd_zone_append_complete(bs, *s->offset
+>> BDRV_SECTOR_BITS);
 }
 /* Advance the wp if needed */
 if (offset + bytes > *wp) {
@@ -3559,6 +3561,7 @@ static int coroutine_fn 
raw_co_zone_append(BlockDriverState *bs,
 len += iov_len;
 }
 
+trace_zbd_zone_append(bs, *offset >> BDRV_SECTOR_BITS);
 return raw_co_prw(bs, *offset, len, qiov, QEMU_AIO_ZONE_APPEND);
 }
 #endif
diff --git a/block/trace-events b/block/trace-events
index 3f4e1d088a..32665158d6 100644
--- a/block/trace-events
+++ b/block/trace-events
@@ -211,6 +211,8 @@ file_hdev_is_sg(int type, int version) "SG device found: 
type=%d, version=%d"
 file_flush_fdatasync_failed(int err) "errno %d"
 zbd_zone_report(void *bs, unsigned int nr_zones, int64_t sector) "bs %p report 
%d zones starting at sector offset 0x%" PRIx64 ""
 zbd_zone_mgmt(void *bs, const char *op_name, int64_t sector, int64_t len) "bs 
%p %s starts at sector offset 0x%" PRIx64 " over a range of 0x%" PRIx64 " 
sectors"
+zbd_zone_append(void *bs, int64_t sector) "bs %p append at sector offset 0x%" 
PRIx64 ""
+zbd_zone_append_complete(void *bs, int64_t sector) "bs %p returns append 
sector 0x%" PRIx64 ""
 
 # ssh.c
 sftp_error(const char *op, const char *ssh_err, int ssh_err_code, int 
sftp_err_code) "%s failed: %s (libssh error code: %d, sftp error code: %d)"
-- 
2.40.0




[PATCH v10 1/4] file-posix: add tracking of the zone write pointers

2023-04-27 Thread Sam Li
Since Linux has no user-space API to issue zone append operations to
zoned devices, the file-posix driver is modified to add zone append
emulation using regular writes. To do this, the file-posix driver
tracks the wp location of all zones of the device in an array of
uint64_t. The most significant bit of each wp location indicates
whether the zone is a conventional zone.

A zone's wp can change due to the following operations:
- zone reset: change the wp to the start offset of that zone
- zone finish: change to the end location of that zone
- write to a zone
- zone append
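
A small model may help picture the tracking scheme. The Python sketch below is not the driver code: ToyWps and its methods are hypothetical, it assumes equally sized zones, it assumes a regular write only ever moves the wp forward, and refusing zone append on a conventional zone is an assumption of the sketch as well.

```python
CONV_FLAG = 1 << 63        # most significant bit of a 64-bit wp entry
WP_MASK = CONV_FLAG - 1    # remaining 63 bits hold the byte offset


def is_conventional(entry):
    return bool(entry & CONV_FLAG)


class ToyWps:
    """Hypothetical write-pointer table: one 64-bit entry per zone, offsets in bytes."""

    def __init__(self, zone_size, conventional):
        self.zone_size = zone_size
        self.wp = []
        for i, conv in enumerate(conventional):
            start = i * zone_size
            self.wp.append(start | CONV_FLAG if conv else start)

    def _index(self, offset):
        return offset // self.zone_size

    def reset(self, offset):
        i = self._index(offset)
        if not is_conventional(self.wp[i]):
            self.wp[i] = i * self.zone_size          # back to the zone start

    def finish(self, offset):
        i = self._index(offset)
        if not is_conventional(self.wp[i]):
            self.wp[i] = (i + 1) * self.zone_size    # jump to the zone end

    def write(self, offset, nbytes):
        i = self._index(offset)
        if not is_conventional(self.wp[i]):
            # Assumption: the wp only advances when the write ends beyond it.
            self.wp[i] = max(self.wp[i], offset + nbytes)

    def append(self, offset, nbytes):
        i = self._index(offset)
        if is_conventional(self.wp[i]):
            # Assumption of this sketch: appends need a sequential zone.
            raise ValueError("zone append needs a sequential zone")
        written_at = self.wp[i]
        self.wp[i] += nbytes
        return written_at


wps = ToyWps(zone_size=0x1000, conventional=[True, False, False])
wps.write(0x1000, 0x200)          # regular write at the start of zone 1
pos = wps.append(0x1000, 0x100)   # append lands at the wp of zone 1
wps.finish(0x2000)                # zone 2 becomes full
print(hex(pos), [hex(e & WP_MASK) for e in wps.wp], is_conventional(wps.wp[0]))
```

Keeping the zone type in the top bit lets one array answer both questions (where is the wp, and does this zone have one at all) without a second lookup table.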

Signed-off-by: Sam Li 
---
 block/file-posix.c   | 177 ++-
 include/block/block-common.h |  14 +++
 include/block/block_int-common.h |   5 +
 3 files changed, 193 insertions(+), 3 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index 701acddbca..c0c83c6631 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -1327,9 +1327,93 @@ static int hdev_get_max_segments(int fd, struct stat *st)
 }
 
 #if defined(CONFIG_BLKZONED)
+/*
+ * If the reset_all flag is true, then the wps of zone whose state is
+ * not readonly or offline should be all reset to the start sector.
+ * Else, take the real wp of the device.
+ */
+static int get_zones_wp(BlockDriverState *bs, int fd, int64_t offset,
+unsigned int nrz, bool reset_all)
+{
+struct blk_zone *blkz;
+size_t rep_size;
+uint64_t sector = offset >> BDRV_SECTOR_BITS;
+BlockZoneWps *wps = bs->wps;
+unsigned int j = offset / bs->bl.zone_size;
+unsigned int n = 0, i = 0;
+int ret;
+rep_size = sizeof(struct blk_zone_report) + nrz * sizeof(struct blk_zone);
+g_autofree struct blk_zone_report *rep = NULL;
+
+rep = g_malloc(rep_size);
+blkz = (struct blk_zone *)(rep + 1);
+while (n < nrz) {
+memset(rep, 0, rep_size);
+rep->sector = sector;
+rep->nr_zones = nrz - n;
+
+do {
+ret = ioctl(fd, BLKREPORTZONE, rep);
+} while (ret != 0 && errno == EINTR);
+if (ret != 0) {
+error_report("%d: ioctl BLKREPORTZONE at %" PRId64 " failed %d",
+fd, offset, errno);
+return -errno;
+}
+
+if (!rep->nr_zones) {
+break;
+}
+
+for (i = 0; i < rep->nr_zones; ++i, ++n, ++j) {
+/*
+ * The wp tracking cares only about sequential writes required and
+ * sequential write preferred zones so that the wp can advance to
+ * the right location.
+ * Use the most significant bit of the wp location to indicate the
+ * zone type: 0 for SWR/SWP zones and 1 for conventional zones.
+ */
+if (blkz[i].type == BLK_ZONE_TYPE_CONVENTIONAL) {
+wps->wp[j] |= 1ULL << 63;
+} else {
+switch(blkz[i].cond) {
+case BLK_ZONE_COND_FULL:
+case BLK_ZONE_COND_READONLY:
+/* Zone not writable */
+wps->wp[j] = (blkz[i].start + blkz[i].len) << 
BDRV_SECTOR_BITS;
+break;
+case BLK_ZONE_COND_OFFLINE:
+/* Zone not writable nor readable */
+wps->wp[j] = (blkz[i].start) << BDRV_SECTOR_BITS;
+break;
+default:
+if (reset_all) {
+wps->wp[j] = blkz[i].start << BDRV_SECTOR_BITS;
+} else {
+wps->wp[j] = blkz[i].wp << BDRV_SECTOR_BITS;
+}
+break;
+}
+}
+}
+sector = blkz[i - 1].start + blkz[i - 1].len;
+}
+
+return 0;
+}
+
+static void update_zones_wp(BlockDriverState *bs, int fd, int64_t offset,
+unsigned int nrz)
+{
+if (get_zones_wp(bs, fd, offset, nrz, 0) < 0) {
+error_report("update zone wp failed");
+}
+}
+
 static void raw_refresh_zoned_limits(BlockDriverState *bs, struct stat *st,
  Error **errp)
 {
+BDRVRawState *s = bs->opaque;
 BlockZoneModel zoned;
 int ret;
 
@@ -1380,6 +1464,23 @@ static void raw_refresh_zoned_limits(BlockDriverState 
*bs, struct stat *st,
 if (ret > 0) {
 bs->bl.max_append_sectors = ret >> BDRV_SECTOR_BITS;
 }
+
+ret = get_sysfs_long_val(st, "physical_block_size");
+if (ret >= 0) {
+bs->bl.write_granularity = ret;
+}
+
+/* The refresh_limits() function can be called multiple times. */
+g_free(bs->wps);
+bs->wps = g_malloc(sizeof(BlockZoneWps) +
+sizeof(int64_t) * bs->bl.nr_zones);
+ret = get_zones_wp(bs, s->fd, 0, bs->bl.nr_zones, 0);
+if (ret < 0) {
+error_setg_errno(errp, -ret, "report wps failed");
+bs->wps = NULL;
+return;
+}
+

[PATCH v10 0/4] Add zone append write for zoned device

2023-04-27 Thread Sam Li
This patch series adds the zone append operation on top of the previous
zoned device support series. The file-posix driver is modified to
add zone append emulation using regular writes.

v9:
- address review comments [Stefan]
  * fix get_zones_wp() for wrong offset index
  * fix misuses of QEMU_LOCK_GUARD()
  * free and allocate wps in refresh_limits for now

v8:
- address review comments [Stefan]
  * fix zone_mgmt covering multiple zones case
  * fix memory leak bug of wps in refresh_limits()
  * mv BlockZoneWps field from BlockLimits to BlockDriverState
  * add check_qiov_request() to bdrv_co_zone_append

v7:
- address review comments
  * fix wp assignment [Stefan]
  * fix reset_all cases, skip R/O & offline zones [Dmitry, Damien]
  * fix locking on non-zap related cases [Stefan]
  * cleanups and typo corrections
- add "zap -p" option to qemu-io-cmds [Stefan]

v6:
- add small fixes

v5:
- fix locking conditions and error handling
- drop some trivial optimizations
- add tracing points for zone append

v4:
- fix lock related issues [Damien]
- drop all field in zone_mgmt op [Damien]
- fix state checks in zone_mgmt command [Damien]
- return start sector of wp when issuing zap req [Damien]

v3:
- only read wps when it is locked [Damien]
- allow last smaller zone case [Damien]
- add zone type and state checks in zone_mgmt command [Damien]
- fix RESET_ALL related problems

v2:
- split patch to two patches for better reviewing
- change BlockZoneWps's structure to an array of integers
- use only mutex lock on locking conditions of zone wps
- coding styles and clean-ups

v1:
- introduce zone append write

Sam Li (4):
  file-posix: add tracking of the zone write pointers
  block: introduce zone append write for zoned devices
  qemu-iotests: test zone append operation
  block: add some trace events for zone append

 block/block-backend.c  |  61 
 block/file-posix.c | 230 -
 block/io.c |  27 
 block/io_uring.c   |   4 +
 block/linux-aio.c  |   3 +
 block/raw-format.c |   8 +
 block/trace-events |   2 +
 include/block/block-common.h   |  14 ++
 include/block/block-io.h   |   4 +
 include/block/block_int-common.h   |   8 +
 include/block/raw-aio.h|   4 +-
 include/sysemu/block-backend-io.h  |   9 ++
 qemu-io-cmds.c |  75 ++
 tests/qemu-iotests/tests/zoned |  16 ++
 tests/qemu-iotests/tests/zoned.out |  16 ++
 15 files changed, 474 insertions(+), 7 deletions(-)

-- 
2.40.0




[PATCH v19 6/8] iotests: test new zone operations

2023-04-27 Thread Sam Li
The new block layer APIs of zoned block devices can be tested by:
$ tests/qemu-iotests/check zoned
Run each zone operation on a newly created null_blk device
and see whether it outputs the same zone information.

Signed-off-by: Sam Li 
Reviewed-by: Stefan Hajnoczi 
Acked-by: Kevin Wolf 
Message-id: 20230324090605.28361-7-faithilike...@gmail.com
[Adjust commit message prefix as suggested by Philippe Mathieu-Daudé.
--Stefan]
Signed-off-by: Stefan Hajnoczi 
---
 tests/qemu-iotests/tests/zoned | 89 ++
 tests/qemu-iotests/tests/zoned.out | 53 ++
 2 files changed, 142 insertions(+)
 create mode 100755 tests/qemu-iotests/tests/zoned
 create mode 100644 tests/qemu-iotests/tests/zoned.out

diff --git a/tests/qemu-iotests/tests/zoned b/tests/qemu-iotests/tests/zoned
new file mode 100755
index 0000000000..56f60616b5
--- /dev/null
+++ b/tests/qemu-iotests/tests/zoned
@@ -0,0 +1,89 @@
+#!/usr/bin/env bash
+#
+# Test zone management operations.
+#
+
+seq="$(basename $0)"
+echo "QA output created by $seq"
+status=1 # failure is the default!
+
+_cleanup()
+{
+  _cleanup_test_img
+  sudo -n rmmod null_blk
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ../common.rc
+. ../common.filter
+. ../common.qemu
+
+# This test only runs on Linux hosts with raw image files.
+_supported_fmt raw
+_supported_proto file
+_supported_os Linux
+
+sudo -n true || \
+_notrun 'Password-less sudo required'
+
+IMG="--image-opts -n driver=host_device,filename=/dev/nullb0"
+QEMU_IO_OPTIONS=$QEMU_IO_OPTIONS_NO_FMT
+
+echo "Testing a null_blk device:"
+echo "case 1: if the operations work"
+sudo -n modprobe null_blk nr_devices=1 zoned=1
+sudo -n chmod 0666 /dev/nullb0
+
+echo "(1) report the first zone:"
+$QEMU_IO $IMG -c "zrp 0 1"
+echo
+echo "report the first 10 zones"
+$QEMU_IO $IMG -c "zrp 0 10"
+echo
+echo "report the last zone:"
+$QEMU_IO $IMG -c "zrp 0x3e70000000 2" # 0x3e70000000 / 512 = 0x1f380000
+echo
+echo
+echo "(2) opening the first zone"
+$QEMU_IO $IMG -c "zo 0 268435456"  # 268435456 / 512 = 524288
+echo "report after:"
+$QEMU_IO $IMG -c "zrp 0 1"
+echo
+echo "opening the second zone"
+$QEMU_IO $IMG -c "zo 268435456 268435456" #
+echo "report after:"
+$QEMU_IO $IMG -c "zrp 268435456 1"
+echo
+echo "opening the last zone"
+$QEMU_IO $IMG -c "zo 0x3e70000000 268435456"
+echo "report after:"
+$QEMU_IO $IMG -c "zrp 0x3e70000000 2"
+echo
+echo
+echo "(3) closing the first zone"
+$QEMU_IO $IMG -c "zc 0 268435456"
+echo "report after:"
+$QEMU_IO $IMG -c "zrp 0 1"
+echo
+echo "closing the last zone"
+$QEMU_IO $IMG -c "zc 0x3e70000000 268435456"
+echo "report after:"
+$QEMU_IO $IMG -c "zrp 0x3e70000000 2"
+echo
+echo
+echo "(4) finishing the second zone"
+$QEMU_IO $IMG -c "zf 268435456 268435456"
+echo "After finishing a zone:"
+$QEMU_IO $IMG -c "zrp 268435456 1"
+echo
+echo
+echo "(5) resetting the second zone"
+$QEMU_IO $IMG -c "zrs 268435456 268435456"
+echo "After resetting a zone:"
+$QEMU_IO $IMG -c "zrp 268435456 1"
+
+# success, all done
+echo "*** done"
+rm -f $seq.full
+status=0
diff --git a/tests/qemu-iotests/tests/zoned.out 
b/tests/qemu-iotests/tests/zoned.out
new file mode 100644
index 0000000000..b2d061da49
--- /dev/null
+++ b/tests/qemu-iotests/tests/zoned.out
@@ -0,0 +1,53 @@
+QA output created by zoned
+Testing a null_blk device:
+case 1: if the operations work
+(1) report the first zone:
+start: 0x0, len 0x80000, cap 0x80000, wptr 0x0, zcond:1, [type: 2]
+
+report the first 10 zones
+start: 0x0, len 0x80000, cap 0x80000, wptr 0x0, zcond:1, [type: 2]
+start: 0x80000, len 0x80000, cap 0x80000, wptr 0x80000, zcond:1, [type: 2]
+start: 0x100000, len 0x80000, cap 0x80000, wptr 0x100000, zcond:1, [type: 2]
+start: 0x180000, len 0x80000, cap 0x80000, wptr 0x180000, zcond:1, [type: 2]
+start: 0x200000, len 0x80000, cap 0x80000, wptr 0x200000, zcond:1, [type: 2]
+start: 0x280000, len 0x80000, cap 0x80000, wptr 0x280000, zcond:1, [type: 2]
+start: 0x300000, len 0x80000, cap 0x80000, wptr 0x300000, zcond:1, [type: 2]
+start: 0x380000, len 0x80000, cap 0x80000, wptr 0x380000, zcond:1, [type: 2]
+start: 0x400000, len 0x80000, cap 0x80000, wptr 0x400000, zcond:1, [type: 2]
+start: 0x480000, len 0x80000, cap 0x80000, wptr 0x480000, zcond:1, [type: 2]
+
+report the last zone:
+start: 0x1f380000, len 0x80000, cap 0x80000, wptr 0x1f380000, zcond:1, [type: 2]
+
+
+(2) opening the first zone
+report after:
+start: 0x0, len 0x80000, cap 0x80000, wptr 0x0, zcond:3, [type: 2]
+
+opening the second zone
+report after:
+start: 0x80000, len 0x80000, cap 0x80000, wptr 0x80000, zcond:3, [type: 2]
+
+opening the last zone
+report after:
+start: 0x1f380000, len 0x80000, cap 0x80000, wptr 0x1f380000, zcond:3, [type: 2]
+
+
+(3) closing the first zone
+report after:
+start: 0x0, len 0x80000, cap 0x80000, wptr 0x0, zcond:1, [type: 2]
+
+closing the last zone
+report after:
+start: 0x1f380000, len 0x80000, cap 0x80000, wptr 0x1f380000, zcond:1, [type: 

[PATCH v19 7/8] block: add some trace events for new block layer APIs

2023-04-27 Thread Sam Li
Signed-off-by: Sam Li 
Reviewed-by: Stefan Hajnoczi 
Reviewed-by: Dmitry Fomichev 
Acked-by: Kevin Wolf 
Message-id: 20230324090605.28361-8-faithilike...@gmail.com
Signed-off-by: Stefan Hajnoczi 
---
 block/file-posix.c | 3 +++
 block/trace-events | 2 ++
 2 files changed, 5 insertions(+)

diff --git a/block/file-posix.c b/block/file-posix.c
index 67d4ec6ac5..701acddbca 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -3271,6 +3271,7 @@ static int coroutine_fn 
raw_co_zone_report(BlockDriverState *bs, int64_t offset,
 },
 };
 
+trace_zbd_zone_report(bs, *nr_zones, offset >> BDRV_SECTOR_BITS);
 return raw_thread_pool_submit(handle_aiocb_zone_report, &acb);
 }
 #endif
@@ -3337,6 +3338,8 @@ static int coroutine_fn raw_co_zone_mgmt(BlockDriverState 
*bs, BlockZoneOp op,
 },
 };
 
+trace_zbd_zone_mgmt(bs, op_name, offset >> BDRV_SECTOR_BITS,
+len >> BDRV_SECTOR_BITS);
 ret = raw_thread_pool_submit(handle_aiocb_zone_mgmt, &acb);
 if (ret != 0) {
 error_report("ioctl %s failed %d", op_name, ret);
diff --git a/block/trace-events b/block/trace-events
index 48dbf10c66..3f4e1d088a 100644
--- a/block/trace-events
+++ b/block/trace-events
@@ -209,6 +209,8 @@ file_FindEjectableOpticalMedia(const char *media) "Matching 
using %s"
 file_setup_cdrom(const char *partition) "Using %s as optical disc"
 file_hdev_is_sg(int type, int version) "SG device found: type=%d, version=%d"
 file_flush_fdatasync_failed(int err) "errno %d"
+zbd_zone_report(void *bs, unsigned int nr_zones, int64_t sector) "bs %p report 
%d zones starting at sector offset 0x%" PRIx64 ""
+zbd_zone_mgmt(void *bs, const char *op_name, int64_t sector, int64_t len) "bs 
%p %s starts at sector offset 0x%" PRIx64 " over a range of 0x%" PRIx64 " 
sectors"
 
 # ssh.c
 sftp_error(const char *op, const char *ssh_err, int ssh_err_code, int 
sftp_err_code) "%s failed: %s (libssh error code: %d, sftp error code: %d)"
-- 
2.40.0




[PATCH v19 0/8] Add support for zoned device

2023-04-27 Thread Sam Li
Zoned Block Devices (ZBDs) divide the LBA space into block regions called zones
that are larger than the LBA size. Zones only allow sequential writes, which
reduces write amplification in SSDs, leading to higher throughput and increased
capacity. More details about ZBDs can be found at:

https://zonedstorage.io/docs/introduction/zoned-storage

The zoned device support aims to let guests (virtual machines) access zoned
storage devices on the host (hypervisor) through a virtio-blk device. This
involves extending QEMU's block layer and virtio-blk emulation code. In its
current status, the virtio-blk device is not aware of ZBDs, but the guest sees
host-managed drives as regular drives that will run correctly under the most
common write workloads.

This patch series extends the block layer APIs with the minimum set of zoned
commands necessary to support zoned devices. The commands are Report Zones,
four zone operations, and Zone Append.

There has been a debate on whether to introduce a new zoned_host_device
BlockDriver specifically for zoned devices. In the end, it was decided to stick
to the existing host_device BlockDriver interface and only add the new zoned
operations inside it. The benefit is that this avoids further changes, command
line syntax being one example, in applications like Libvirt that use QEMU zoned
emulation.

It can be tested on a null_blk device using qemu-io or qemu-iotests. For
example, to test zone report using qemu-io:
$ path/to/qemu-io --image-opts -n driver=host_device,filename=/dev/nullb0
-c "zrp offset nr_zones"

v19:
- fix CI related issues [Stefan]

v18:
- use 'sudo -n' in qemuio-tests [Stefan]

v17:
- fix qemuiotests for zoned support patches [Dmitry]

v16:
- update zoned_host device name to host_device [Stefan]
- fix probing zoned device blocksizes [Stefan]
- Use empty fields instead of changing struct size of BlkRwCo [Kevin, Stefan]

v15:
- drop zoned_host_device BlockDriver
- add zoned device option to host_device driver instead of introducing a new
  zoned_host_device BlockDriver [Stefan]

v14:
- address Stefan's comments of probing block sizes

v13:
- add some tracing points for new zone APIs [Dmitry]
- change error handling in zone_mgmt [Damien, Stefan]

v12:
- address review comments
  * drop BLK_ZO_RESET_ALL bit [Damien]
  * fix error messages, style, and typos[Damien, Hannes]

v11:
- address review comments
  * fix possible BLKZONED config compiling warnings [Stefan]
  * fix capacity field compiling warnings on older kernel [Stefan,Damien]

v10:
- address review comments
  * deal with the last small zone case in zone_mgmt operations [Damien]
  * handle the capacity field outdated in old kernel(before 5.9) [Damien]
  * use byte unit in block layer to be consistent with QEMU [Eric]
  * fix coding style related problems [Stefan]

v9:
- address review comments
  * specify units of zone commands requests [Stefan]
  * fix some error handling in file-posix [Stefan]
  * introduce zoned_host_device in the commit message [Markus]

v8:
- address review comments
  * solve patch conflicts and merge sysfs helper functions into one patch
  * add cache.direct=on check in config

v7:
- address review comments
  * modify sysfs attribute helper functions
  * move the input validation and error checking into raw_co_zone_* function
  * fix checks in config

v6:
- drop virtio-blk emulation changes
- address Stefan's review comments
  * fix CONFIG_BLKZONED configs in related functions
  * replace reading fd by g_file_get_contents() in get_sysfs_str_val()
  * rewrite documentation for zoned storage

v5:
- add zoned storage emulation to virtio-blk device
- add documentation for zoned storage
- address review comments
  * fix qemu-iotests
  * fix check to block layer
  * modify interfaces of sysfs helper functions
  * rename zoned device structs according to QEMU styles
  * reorder patches

v4:
- add virtio-blk headers for zoned device
- add configurations for zoned host device
- add zone operations for raw-format
- address review comments
  * fix memory leak bug in zone_report
  * add checks to block layers
  * fix qemu-iotests format
  * fix sysfs helper functions

v3:
- add helper functions to get sysfs attributes
- address review comments
  * fix zone report bugs
  * fix the qemu-io code path
  * use thread pool to avoid blocking ioctl() calls

v2:
- add qemu-io sub-commands
- address review comments
  * modify interfaces of APIs

v1:
- add block layer APIs resembling Linux ZoneBlockDevice ioctls

Sam Li (8):
  block/block-common: add zoned device structs
  block/file-posix: introduce helper functions for sysfs attributes
  block/block-backend: add block layer APIs resembling Linux
ZonedBlockDevice ioctls
  block/raw-format: add zone operations to pass through requests
  block: add zoned BlockDriver check to block layer
  iotests: test new zone operations
  block: add some trace events for new block layer APIs
  docs/zoned-storage: add zoned device documentation

 block.c 

[PATCH v19 1/8] block/block-common: add zoned device structs

2023-04-27 Thread Sam Li
Signed-off-by: Sam Li 
Reviewed-by: Stefan Hajnoczi 
Reviewed-by: Damien Le Moal 
Reviewed-by: Hannes Reinecke 
Reviewed-by: Dmitry Fomichev 
Acked-by: Kevin Wolf 
Message-id: 20230324090605.28361-2-faithilike...@gmail.com
[Adjust commit message prefix as suggested by Philippe Mathieu-Daudé
.
--Stefan]
Signed-off-by: Stefan Hajnoczi 
---
 include/block/block-common.h | 43 
 1 file changed, 43 insertions(+)

diff --git a/include/block/block-common.h b/include/block/block-common.h
index b5122ef8ab..1576fcf2ed 100644
--- a/include/block/block-common.h
+++ b/include/block/block-common.h
@@ -75,6 +75,49 @@ typedef struct BlockDriver BlockDriver;
 typedef struct BdrvChild BdrvChild;
 typedef struct BdrvChildClass BdrvChildClass;
 
+typedef enum BlockZoneOp {
+BLK_ZO_OPEN,
+BLK_ZO_CLOSE,
+BLK_ZO_FINISH,
+BLK_ZO_RESET,
+} BlockZoneOp;
+
+typedef enum BlockZoneModel {
+BLK_Z_NONE = 0x0, /* Regular block device */
+BLK_Z_HM = 0x1, /* Host-managed zoned block device */
+BLK_Z_HA = 0x2, /* Host-aware zoned block device */
+} BlockZoneModel;
+
+typedef enum BlockZoneState {
+BLK_ZS_NOT_WP = 0x0,
+BLK_ZS_EMPTY = 0x1,
+BLK_ZS_IOPEN = 0x2,
+BLK_ZS_EOPEN = 0x3,
+BLK_ZS_CLOSED = 0x4,
+BLK_ZS_RDONLY = 0xD,
+BLK_ZS_FULL = 0xE,
+BLK_ZS_OFFLINE = 0xF,
+} BlockZoneState;
+
+typedef enum BlockZoneType {
+BLK_ZT_CONV = 0x1, /* Conventional random writes supported */
+BLK_ZT_SWR = 0x2, /* Sequential writes required */
+BLK_ZT_SWP = 0x3, /* Sequential writes preferred */
+} BlockZoneType;
+
+/*
+ * Zone descriptor data structure.
+ * Provides information on a zone with all position and size values in bytes.
+ */
+typedef struct BlockZoneDescriptor {
+uint64_t start;
+uint64_t length;
+uint64_t cap;
+uint64_t wp;
+BlockZoneType type;
+BlockZoneState state;
+} BlockZoneDescriptor;
+
 typedef struct BlockDriverInfo {
 /* in bytes, 0 if irrelevant */
 int cluster_size;
-- 
2.40.0
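
As an illustration only (not part of the patch), the sketch below shows how a
caller might interpret a BlockZoneDescriptor. The type definitions are local
copies of the ones added above so that the file compiles on its own; the
helpers zone_bytes_left() and zone_write_allowed() are hypothetical names, and
the rules they encode are a simplification of zoned storage semantics, assuming
that all values are in bytes as the descriptor comment states.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local copies of the types from the patch, so the sketch is self-contained. */
typedef enum BlockZoneState {
    BLK_ZS_NOT_WP = 0x0,
    BLK_ZS_EMPTY = 0x1,
    BLK_ZS_IOPEN = 0x2,
    BLK_ZS_EOPEN = 0x3,
    BLK_ZS_CLOSED = 0x4,
    BLK_ZS_RDONLY = 0xD,
    BLK_ZS_FULL = 0xE,
    BLK_ZS_OFFLINE = 0xF,
} BlockZoneState;

typedef enum BlockZoneType {
    BLK_ZT_CONV = 0x1, /* Conventional random writes supported */
    BLK_ZT_SWR = 0x2,  /* Sequential writes required */
    BLK_ZT_SWP = 0x3,  /* Sequential writes preferred */
} BlockZoneType;

typedef struct BlockZoneDescriptor {
    uint64_t start;
    uint64_t length;
    uint64_t cap;
    uint64_t wp;
    BlockZoneType type;
    BlockZoneState state;
} BlockZoneDescriptor;

/* Hypothetical helper: bytes between the write pointer and the zone capacity. */
static uint64_t zone_bytes_left(const BlockZoneDescriptor *z)
{
    uint64_t end = z->start + z->cap;

    if (z->state == BLK_ZS_FULL || z->state == BLK_ZS_RDONLY ||
        z->state == BLK_ZS_OFFLINE) {
        return 0;
    }
    return z->wp < end ? end - z->wp : 0;
}

/*
 * Hypothetical helper: would a write of 'bytes' at 'offset' be acceptable?
 * A sequential-write-required zone only accepts writes that start at the
 * write pointer; other zone types are only checked against the zone bounds.
 */
static bool zone_write_allowed(const BlockZoneDescriptor *z,
                               uint64_t offset, uint64_t bytes)
{
    if (z->state == BLK_ZS_RDONLY || z->state == BLK_ZS_OFFLINE) {
        return false;
    }
    if (z->type != BLK_ZT_SWR) {
        return offset >= z->start &&
               offset - z->start <= z->length &&
               bytes <= z->length - (offset - z->start);
    }
    return offset == z->wp && bytes <= zone_bytes_left(z);
}
```

The split between length and cap matters here: a zone may span more of the
address space than it can actually store, so the remaining space is measured
against cap rather than length.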




[PATCH v19 3/8] block/block-backend: add block layer APIs resembling Linux ZonedBlockDevice ioctls

2023-04-27 Thread Sam Li
Add zoned device option to host_device BlockDriver. It will be presented only
for zoned host block devices. By adding zone management operations to the
host_device BlockDriver, users can use the new block layer APIs
including Report Zone and four zone management operations
(open, close, finish, reset).

Qemu-io uses the new APIs to perform zoned storage commands of the device:
zone_report(zrp), zone_open(zo), zone_close(zc), zone_reset(zrs),
zone_finish(zf).

For example, to test zone_report, use the following command:
$ ./build/qemu-io --image-opts -n driver=host_device,filename=/dev/nullb0
-c "zrp offset nr_zones"

Signed-off-by: Sam Li 
Reviewed-by: Hannes Reinecke 
Reviewed-by: Stefan Hajnoczi 
Reviewed-by: Dmitry Fomichev 
Acked-by: Kevin Wolf 
Message-id: 20230324090605.28361-4-faithilike...@gmail.com
[Adjust commit message prefix as suggested by Philippe Mathieu-Daudé
 and remove spurious ret = -errno in
raw_co_zone_mgmt().
--Stefan]
Signed-off-by: Stefan Hajnoczi 
---
 block/block-backend.c | 137 +
 block/file-posix.c| 313 +-
 block/io.c|  41 
 include/block/block-io.h  |   9 +
 include/block/block_int-common.h  |  21 ++
 include/block/raw-aio.h   |   6 +-
 include/sysemu/block-backend-io.h |  18 ++
 meson.build   |   4 +
 qemu-io-cmds.c| 149 ++
 9 files changed, 695 insertions(+), 3 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index fc530ded6a..67722eb46d 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1845,6 +1845,143 @@ int coroutine_fn blk_co_flush(BlockBackend *blk)
 return ret;
 }
 
+static void coroutine_fn blk_aio_zone_report_entry(void *opaque)
+{
+BlkAioEmAIOCB *acb = opaque;
+BlkRwCo *rwco = &acb->rwco;
+
+rwco->ret = blk_co_zone_report(rwco->blk, rwco->offset,
+   (unsigned int*)(uintptr_t)acb->bytes,
+   rwco->iobuf);
+blk_aio_complete(acb);
+}
+
+BlockAIOCB *blk_aio_zone_report(BlockBackend *blk, int64_t offset,
+unsigned int *nr_zones,
+BlockZoneDescriptor  *zones,
+BlockCompletionFunc *cb, void *opaque)
+{
+BlkAioEmAIOCB *acb;
+Coroutine *co;
+IO_CODE();
+
+blk_inc_in_flight(blk);
+acb = blk_aio_get(&blk_aio_em_aiocb_info, blk, cb, opaque);
+acb->rwco = (BlkRwCo) {
+.blk= blk,
+.offset = offset,
+.iobuf  = zones,
+.ret= NOT_DONE,
+};
+acb->bytes = (int64_t)(uintptr_t)nr_zones;
+acb->has_returned = false;
+
+co = qemu_coroutine_create(blk_aio_zone_report_entry, acb);
+aio_co_enter(blk_get_aio_context(blk), co);
+
+acb->has_returned = true;
+if (acb->rwco.ret != NOT_DONE) {
+replay_bh_schedule_oneshot_event(blk_get_aio_context(blk),
+ blk_aio_complete_bh, acb);
+}
+
+return &acb->common;
+}
+
+static void coroutine_fn blk_aio_zone_mgmt_entry(void *opaque)
+{
+BlkAioEmAIOCB *acb = opaque;
+BlkRwCo *rwco = &acb->rwco;
+
+rwco->ret = blk_co_zone_mgmt(rwco->blk,
+ (BlockZoneOp)(uintptr_t)rwco->iobuf,
+ rwco->offset, acb->bytes);
+blk_aio_complete(acb);
+}
+
+BlockAIOCB *blk_aio_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
+  int64_t offset, int64_t len,
+  BlockCompletionFunc *cb, void *opaque) {
+BlkAioEmAIOCB *acb;
+Coroutine *co;
+IO_CODE();
+
+blk_inc_in_flight(blk);
+acb = blk_aio_get(&blk_aio_em_aiocb_info, blk, cb, opaque);
+acb->rwco = (BlkRwCo) {
+.blk= blk,
+.offset = offset,
+.iobuf  = (void *)(uintptr_t)op,
+.ret= NOT_DONE,
+};
+acb->bytes = len;
+acb->has_returned = false;
+
+co = qemu_coroutine_create(blk_aio_zone_mgmt_entry, acb);
+aio_co_enter(blk_get_aio_context(blk), co);
+
+acb->has_returned = true;
+if (acb->rwco.ret != NOT_DONE) {
+replay_bh_schedule_oneshot_event(blk_get_aio_context(blk),
+ blk_aio_complete_bh, acb);
+}
+
+return &acb->common;
+}
+
+/*
+ * Send a zone_report command.
+ * offset is a byte offset from the start of the device. No alignment
+ * required for offset.
+ * nr_zones represents IN maximum and OUT actual.
+ */
+int coroutine_fn blk_co_zone_report(BlockBackend *blk, int64_t offset,
+unsigned int *nr_zones,
+BlockZoneDescriptor *zones)
+{
+int ret;
+IO_CODE();
+
+blk_inc_in_flight(blk); /* increase before waiting */
+blk_wait_while_drained(blk);
+GRAPH_RDLOCK_GUARD();
+if (!blk_is_available(blk)) {
+blk_dec_in_flight(blk);
+return -ENOMEDIUM;
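
For readers following the AIO entry points above: they reuse two existing
request fields instead of growing the struct, so the nr_zones pointer travels
in an int64_t and the zone operation travels in a void pointer. The sketch
below reproduces only that casting pattern. FakeRequest and the pack/unpack
helpers are hypothetical names, not QEMU code, and the round trip assumes a
platform where a pointer fits in 64 bits.

```c
#include <stdint.h>

typedef enum BlockZoneOp {
    BLK_ZO_OPEN,
    BLK_ZO_CLOSE,
    BLK_ZO_FINISH,
    BLK_ZO_RESET,
} BlockZoneOp;

/* Hypothetical stand-in for the two reused request fields. */
typedef struct FakeRequest {
    int64_t bytes; /* a length for zone_mgmt, a pointer for zone_report */
    void *iobuf;   /* a buffer for zone_report, an opcode for zone_mgmt */
} FakeRequest;

/* Store a pointer in the integer field; uintptr_t makes the cast well defined. */
static void pack_nr_zones(FakeRequest *req, unsigned int *nr_zones)
{
    req->bytes = (int64_t)(uintptr_t)nr_zones;
}

static unsigned int *unpack_nr_zones(const FakeRequest *req)
{
    return (unsigned int *)(uintptr_t)req->bytes;
}

/* Store a small enum value in the pointer field. */
static void pack_zone_op(FakeRequest *req, BlockZoneOp op)
{
    req->iobuf = (void *)(uintptr_t)op;
}

static BlockZoneOp unpack_zone_op(const FakeRequest *req)
{
    return (BlockZoneOp)(uintptr_t)req->iobuf;
}
```

Because nr_zones is an IN/OUT parameter, the pointer itself (not the value it
points to) has to survive the trip, which is why the value is not simply
copied into the field.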

[PATCH v19 4/8] block/raw-format: add zone operations to pass through requests

2023-04-27 Thread Sam Li
The raw-format driver usually sits on top of the file-posix driver, so it
needs to pass zone command requests through to it.

Signed-off-by: Sam Li 
Reviewed-by: Stefan Hajnoczi 
Reviewed-by: Damien Le Moal 
Reviewed-by: Hannes Reinecke 
Reviewed-by: Dmitry Fomichev 
Acked-by: Kevin Wolf 
Message-id: 20230324090605.28361-5-faithilike...@gmail.com
[Adjust commit message prefix as suggested by Philippe Mathieu-Daudé
.
--Stefan]
Signed-off-by: Stefan Hajnoczi 
---
 block/raw-format.c | 17 +
 1 file changed, 17 insertions(+)

diff --git a/block/raw-format.c b/block/raw-format.c
index 06b8030d9d..f167448462 100644
--- a/block/raw-format.c
+++ b/block/raw-format.c
@@ -317,6 +317,21 @@ raw_co_pdiscard(BlockDriverState *bs, int64_t offset, 
int64_t bytes)
 return bdrv_co_pdiscard(bs->file, offset, bytes);
 }
 
+static int coroutine_fn GRAPH_RDLOCK
+raw_co_zone_report(BlockDriverState *bs, int64_t offset,
+   unsigned int *nr_zones,
+   BlockZoneDescriptor *zones)
+{
+return bdrv_co_zone_report(bs->file->bs, offset, nr_zones, zones);
+}
+
+static int coroutine_fn GRAPH_RDLOCK
+raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
+ int64_t offset, int64_t len)
+{
+return bdrv_co_zone_mgmt(bs->file->bs, op, offset, len);
+}
+
 static int64_t coroutine_fn GRAPH_RDLOCK
 raw_co_getlength(BlockDriverState *bs)
 {
@@ -619,6 +634,8 @@ BlockDriver bdrv_raw = {
 .bdrv_co_pwritev  = &raw_co_pwritev,
 .bdrv_co_pwrite_zeroes = &raw_co_pwrite_zeroes,
 .bdrv_co_pdiscard = &raw_co_pdiscard,
+.bdrv_co_zone_report  = &raw_co_zone_report,
+.bdrv_co_zone_mgmt  = &raw_co_zone_mgmt,
 .bdrv_co_block_status = &raw_co_block_status,
 .bdrv_co_copy_range_from = &raw_co_copy_range_from,
 .bdrv_co_copy_range_to  = &raw_co_copy_range_to,
-- 
2.40.0




[PATCH v19 8/8] docs/zoned-storage: add zoned device documentation

2023-04-27 Thread Sam Li
Add the documentation about the zoned device support to virtio-blk
emulation.

Signed-off-by: Sam Li 
Reviewed-by: Stefan Hajnoczi 
Reviewed-by: Damien Le Moal 
Reviewed-by: Dmitry Fomichev 
Acked-by: Kevin Wolf 
Message-id: 20230324090605.28361-9-faithilike...@gmail.com
[Add index-api.rst to fix "zoned-storage.rst:document isn't included in
any toctree" error.
--Stefan]
Signed-off-by: Stefan Hajnoczi 
---
 docs/devel/index-api.rst   |  1 +
 docs/devel/zoned-storage.rst   | 43 ++
 docs/system/qemu-block-drivers.rst.inc |  6 
 3 files changed, 50 insertions(+)
 create mode 100644 docs/devel/zoned-storage.rst

diff --git a/docs/devel/index-api.rst b/docs/devel/index-api.rst
index 60c0d7459d..7108821746 100644
--- a/docs/devel/index-api.rst
+++ b/docs/devel/index-api.rst
@@ -12,3 +12,4 @@ generated from in-code annotations to function prototypes.
memory
modules
ui
+   zoned-storage
diff --git a/docs/devel/zoned-storage.rst b/docs/devel/zoned-storage.rst
new file mode 100644
index 00..6a36133e51
--- /dev/null
+++ b/docs/devel/zoned-storage.rst
@@ -0,0 +1,43 @@
+=============
+zoned-storage
+=============
+
+Zoned Block Devices (ZBDs) divide the LBA space into block regions called zones
+that are larger than the LBA size. They can only allow sequential writes, which
+can reduce write amplification in SSDs, and potentially lead to higher
+throughput and increased capacity. More details about ZBDs can be found at:
+
+https://zonedstorage.io/docs/introduction/zoned-storage
+
+1. Block layer APIs for zoned storage
+-------------------------------------
+QEMU block layer supports three zoned storage models:
+- BLK_Z_HM: The host-managed zoned model only allows sequential writes access
+to zones. It supports ZBD-specific I/O commands that can be used by a host to
+manage the zones of a device.
+- BLK_Z_HA: The host-aware zoned model allows random write operations in
+zones, making it backward compatible with regular block devices.
+- BLK_Z_NONE: The non-zoned model has no zones support. It includes both
+regular and drive-managed ZBD devices. ZBD-specific I/O commands are not
+supported.
+
+The block device information resides inside BlockDriverState. QEMU uses
+BlockLimits struct(BlockDriverState::bl) that is continuously accessed by the
+block layer while processing I/O requests. A BlockBackend has a root pointer to
+a BlockDriverState graph(for example, raw format on top of file-posix). The
+zoned storage information can be propagated from the leaf BlockDriverState all
+the way up to the BlockBackend. If the zoned storage model in file-posix is
+set to BLK_Z_HM, then block drivers will declare support for zoned host device.
+
+The block layer APIs support commands needed for zoned storage devices,
+including report zones, four zone operations, and zone append.
+
+2. Emulating zoned storage controllers
+--------------------------------------
+When the BlockBackend's BlockLimits model reports a zoned storage device, users
+like the virtio-blk emulation or the qemu-io-cmds.c utility can use block layer
+APIs for zoned storage emulation or testing.
+
+For example, to test zone_report on a null_blk device using qemu-io:
+$ path/to/qemu-io --image-opts -n driver=host_device,filename=/dev/nullb0
+-c "zrp offset nr_zones"
diff --git a/docs/system/qemu-block-drivers.rst.inc 
b/docs/system/qemu-block-drivers.rst.inc
index dfe5d2293d..105cb9679c 100644
--- a/docs/system/qemu-block-drivers.rst.inc
+++ b/docs/system/qemu-block-drivers.rst.inc
@@ -430,6 +430,12 @@ Hard disks
   you may corrupt your host data (use the ``-snapshot`` command
   line option or modify the device permissions accordingly).
 
+Zoned block devices
+  Zoned block devices can be passed through to the guest if the emulated 
storage
+  controller supports zoned storage. Use ``--blockdev host_device,
+  node-name=drive0,filename=/dev/nullb0,cache.direct=on`` to pass through
+  ``/dev/nullb0`` as ``drive0``.
+
 Windows
 ^^^^^^^
 
-- 
2.40.0




[PATCH v19 5/8] block: add zoned BlockDriver check to block layer

2023-04-27 Thread Sam Li
Putting zoned/non-zoned BlockDrivers on top of each other is not
allowed.

Signed-off-by: Sam Li 
Reviewed-by: Stefan Hajnoczi 
Reviewed-by: Hannes Reinecke 
Reviewed-by: Dmitry Fomichev 
Acked-by: Kevin Wolf 
Message-id: 20230324090605.28361-6-faithilike...@gmail.com
[Adjust commit message prefix as suggested by Philippe Mathieu-Daudé
 and clarify that the check is about zoned
BlockDrivers.
--Stefan]
Signed-off-by: Stefan Hajnoczi 
---
 block.c  | 19 +++
 block/file-posix.c   | 12 
 block/raw-format.c   |  1 +
 include/block/block_int-common.h |  5 +
 4 files changed, 37 insertions(+)

diff --git a/block.c b/block.c
index 5ec1a3897e..f67317c2b9 100644
--- a/block.c
+++ b/block.c
@@ -7967,6 +7967,25 @@ void bdrv_add_child(BlockDriverState *parent_bs, 
BlockDriverState *child_bs,
 return;
 }
 
+/*
+ * Non-zoned block drivers do not follow zoned storage constraints
+ * (i.e. sequential writes to zones). Refuse mixing zoned and non-zoned
+ * drivers in a graph.
+ */
+if (!parent_bs->drv->supports_zoned_children &&
+child_bs->bl.zoned == BLK_Z_HM) {
+/*
+ * The host-aware model allows zoned storage constraints and random
+ * write. Allow mixing host-aware and non-zoned drivers. Using
+ * host-aware device as a regular device.
+ */
+error_setg(errp, "Cannot add a %s child to a %s parent",
+   child_bs->bl.zoned == BLK_Z_HM ? "zoned" : "non-zoned",
+   parent_bs->drv->supports_zoned_children ?
+   "support zoned children" : "not support zoned children");
+return;
+}
+
 if (!QLIST_EMPTY(&child_bs->parents)) {
 error_setg(errp, "The node %s already has a parent",
child_bs->node_name);
diff --git a/block/file-posix.c b/block/file-posix.c
index 3b6575d771..67d4ec6ac5 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -776,6 +776,18 @@ static int raw_open_common(BlockDriverState *bs, QDict 
*options,
 goto fail;
 }
 }
+#ifdef CONFIG_BLKZONED
+/*
+ * The kernel page cache does not reliably work for writes to SWR zones
+ * of zoned block device because it can not guarantee the order of writes.
+ */
+if ((bs->bl.zoned != BLK_Z_NONE) &&
+(!(s->open_flags & O_DIRECT))) {
+error_setg(errp, "The driver supports zoned devices, and it requires "
+ "cache.direct=on, which was not specified.");
+return -EINVAL; /* No host kernel page cache */
+}
+#endif
 
 if (S_ISBLK(st.st_mode)) {
 #ifdef __linux__
diff --git a/block/raw-format.c b/block/raw-format.c
index f167448462..1a1dce8da4 100644
--- a/block/raw-format.c
+++ b/block/raw-format.c
@@ -623,6 +623,7 @@ static void raw_child_perm(BlockDriverState *bs, BdrvChild 
*c,
 BlockDriver bdrv_raw = {
 .format_name  = "raw",
 .instance_size= sizeof(BDRVRawState),
+.supports_zoned_children = true,
 .bdrv_probe   = &raw_probe,
 .bdrv_reopen_prepare  = &raw_reopen_prepare,
 .bdrv_reopen_commit   = &raw_reopen_commit,
diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
index 997d539890..3482cfa79e 100644
--- a/include/block/block_int-common.h
+++ b/include/block/block_int-common.h
@@ -137,6 +137,11 @@ struct BlockDriver {
  */
 bool is_format;
 
+/*
+ * Set to true if the BlockDriver supports zoned children.
+ */
+bool supports_zoned_children;
+
 /*
  * Drivers not implementing bdrv_parse_filename nor bdrv_open should have
  * this field set to true, except ones that are defined only by their
-- 
2.40.0
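
Stripped of the error reporting, the bdrv_add_child() check above reduces to a
single predicate. The sketch below is a hypothetical reduction for
illustration, not QEMU code: zoned_child_allowed() is an invented name, and the
enum is a local copy so the file compiles on its own.

```c
#include <stdbool.h>

typedef enum BlockZoneModel {
    BLK_Z_NONE = 0x0, /* Regular block device */
    BLK_Z_HM = 0x1,   /* Host-managed zoned block device */
    BLK_Z_HA = 0x2,   /* Host-aware zoned block device */
} BlockZoneModel;

/*
 * Hypothetical reduction of the check: only a host-managed child is refused,
 * and only when the parent driver has not set supports_zoned_children.
 * Host-aware children pass because they also accept random writes and can be
 * used like a regular device.
 */
static bool zoned_child_allowed(bool parent_supports_zoned_children,
                                BlockZoneModel child_model)
{
    if (child_model == BLK_Z_HM && !parent_supports_zoned_children) {
        return false;
    }
    return true;
}
```

With this patch only bdrv_raw sets supports_zoned_children, so a host-managed
node can sit under the raw format driver but not under the other drivers.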




[PATCH v19 2/8] block/file-posix: introduce helper functions for sysfs attributes

2023-04-27 Thread Sam Li
Use get_sysfs_str_val() to get the string value of the device's
zoned model. Then get_sysfs_zoned_model() can convert it to
QEMU's BlockZoneModel type.

Use get_sysfs_long_val() to get the long value of zoned device
information.

Signed-off-by: Sam Li 
Reviewed-by: Hannes Reinecke 
Reviewed-by: Stefan Hajnoczi 
Reviewed-by: Damien Le Moal 
Reviewed-by: Dmitry Fomichev 
Acked-by: Kevin Wolf 
Message-id: 20230324090605.28361-3-faithilike...@gmail.com
[Adjust commit message prefix as suggested by Philippe Mathieu-Daudé
.
--Stefan]
Signed-off-by: Stefan Hajnoczi 
---
 block/file-posix.c   | 131 +++
 include/block/block_int-common.h |   3 +
 2 files changed, 100 insertions(+), 34 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index c7b723368e..ba15b10eee 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -1202,60 +1202,121 @@ static int hdev_get_max_hw_transfer(int fd, struct 
stat *st)
 #endif
 }
 
-static int hdev_get_max_segments(int fd, struct stat *st)
-{
+/*
+ * Get a sysfs attribute value as character string.
+ */
+static int get_sysfs_str_val(struct stat *st, const char *attribute,
+ char **val) {
 #ifdef CONFIG_LINUX
-char buf[32];
-const char *end;
-char *sysfspath = NULL;
+g_autofree char *sysfspath = NULL;
 int ret;
-int sysfd = -1;
-long max_segments;
+size_t len;
 
-if (S_ISCHR(st->st_mode)) {
-if (ioctl(fd, SG_GET_SG_TABLESIZE, &ret) == 0) {
-return ret;
-}
+if (!S_ISBLK(st->st_mode)) {
 return -ENOTSUP;
 }
 
-if (!S_ISBLK(st->st_mode)) {
-return -ENOTSUP;
+sysfspath = g_strdup_printf("/sys/dev/block/%u:%u/queue/%s",
+major(st->st_rdev), minor(st->st_rdev),
+attribute);
+ret = g_file_get_contents(sysfspath, val, &len, NULL);
+if (ret == -1) {
+return -ENOENT;
 }
 
-sysfspath = g_strdup_printf("/sys/dev/block/%u:%u/queue/max_segments",
-major(st->st_rdev), minor(st->st_rdev));
-sysfd = open(sysfspath, O_RDONLY);
-if (sysfd == -1) {
-ret = -errno;
-goto out;
+/* The file is ended with '\n' */
+char *p;
+p = *val;
+if (*(p + len - 1) == '\n') {
+*(p + len - 1) = '\0';
 }
-ret = RETRY_ON_EINTR(read(sysfd, buf, sizeof(buf) - 1));
+return ret;
+#else
+return -ENOTSUP;
+#endif
+}
+
+static int get_sysfs_zoned_model(struct stat *st, BlockZoneModel *zoned)
+{
+g_autofree char *val = NULL;
+int ret;
+
+ret = get_sysfs_str_val(st, "zoned", &val);
 if (ret < 0) {
-ret = -errno;
-goto out;
-} else if (ret == 0) {
-ret = -EIO;
-goto out;
+return ret;
 }
-buf[ret] = 0;
-/* The file is ended with '\n', pass 'end' to accept that. */
-ret = qemu_strtol(buf, &end, 10, &max_segments);
-if (ret == 0 && end && *end == '\n') {
-ret = max_segments;
+
+if (strcmp(val, "host-managed") == 0) {
+*zoned = BLK_Z_HM;
+} else if (strcmp(val, "host-aware") == 0) {
+*zoned = BLK_Z_HA;
+} else if (strcmp(val, "none") == 0) {
+*zoned = BLK_Z_NONE;
+} else {
+return -ENOTSUP;
 }
+return 0;
+}
 
-out:
-if (sysfd != -1) {
-close(sysfd);
+/*
+ * Get a sysfs attribute value as a long integer.
+ */
+static long get_sysfs_long_val(struct stat *st, const char *attribute)
+{
+#ifdef CONFIG_LINUX
+g_autofree char *str = NULL;
+const char *end;
+long val;
+int ret;
+
+ret = get_sysfs_str_val(st, attribute, &str);
+if (ret < 0) {
+return ret;
+}
+
+/* The file is ended with '\n', pass 'end' to accept that. */
+ret = qemu_strtol(str, &end, 10, &val);
+if (ret == 0 && end && *end == '\0') {
+ret = val;
 }
-g_free(sysfspath);
 return ret;
 #else
 return -ENOTSUP;
 #endif
 }
 
+static int hdev_get_max_segments(int fd, struct stat *st)
+{
+#ifdef CONFIG_LINUX
+int ret;
+
+if (S_ISCHR(st->st_mode)) {
+if (ioctl(fd, SG_GET_SG_TABLESIZE, &ret) == 0) {
+return ret;
+}
+return -ENOTSUP;
+}
+return get_sysfs_long_val(st, "max_segments");
+#else
+return -ENOTSUP;
+#endif
+}
+
+static void raw_refresh_zoned_limits(BlockDriverState *bs, struct stat *st,
+ Error **errp)
+{
+BlockZoneModel zoned;
+int ret;
+
+bs->bl.zoned = BLK_Z_NONE;
+
+ret = get_sysfs_zoned_model(st, &zoned);
+if (ret < 0 || zoned == BLK_Z_NONE) {
+return;
+}
+bs->bl.zoned = zoned;
+}
+
 static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
 {
 BDRVRawState *s = bs->opaque;
@@ -1297,6 +1358,8 @@ static void raw_refresh_limits(BlockDriverState *bs, 
Error **errp)
 bs->bl.max_hw_iov = ret;
 }
 }
+
+raw_refresh_zoned_limits(bs, &st, errp);
 }
 
 static int check_for_dasd(int 
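
The string handling in the helpers above can be tried without GLib or a real
sysfs. The following is a minimal sketch using only the C standard library;
strip_trailing_newline() and parse_zoned_model() are hypothetical names that
mirror the newline trimming in get_sysfs_str_val() and the strcmp() chain in
get_sysfs_zoned_model(). Unlike the patch, the sketch guards against an empty
string before looking at the last character, and it returns -1 rather than an
errno value for an unknown model.

```c
#include <stddef.h>
#include <string.h>

typedef enum BlockZoneModel {
    BLK_Z_NONE = 0x0, /* Regular block device */
    BLK_Z_HM = 0x1,   /* Host-managed zoned block device */
    BLK_Z_HA = 0x2,   /* Host-aware zoned block device */
} BlockZoneModel;

/* sysfs attribute files end with '\n'; drop one trailing newline in place. */
static void strip_trailing_newline(char *s)
{
    size_t len = strlen(s);

    if (len > 0 && s[len - 1] == '\n') {
        s[len - 1] = '\0';
    }
}

/* Returns 0 and sets *model on success, -1 if the string is not recognised. */
static int parse_zoned_model(const char *val, BlockZoneModel *model)
{
    if (strcmp(val, "host-managed") == 0) {
        *model = BLK_Z_HM;
    } else if (strcmp(val, "host-aware") == 0) {
        *model = BLK_Z_HA;
    } else if (strcmp(val, "none") == 0) {
        *model = BLK_Z_NONE;
    } else {
        return -1;
    }
    return 0;
}
```

A caller would read the contents of the queue/zoned attribute into a writable
buffer, strip the newline, and then parse the result.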

Re: [PATCH v20 03/21] target/s390x/cpu topology: handle STSI(15) and build the SYSIB

2023-04-27 Thread Thomas Huth

On 25/04/2023 18.14, Pierre Morel wrote:

On interception of STSI(15.1.x) the System Information Block
(SYSIB) is built from the list of pre-ordered topology entries.

Signed-off-by: Pierre Morel 
---
  MAINTAINERS |   1 +
  include/hw/s390x/cpu-topology.h |  24 +++
  include/hw/s390x/sclp.h |   1 +
  target/s390x/cpu.h  |  72 
  hw/s390x/cpu-topology.c |  13 +-
  target/s390x/kvm/cpu_topology.c | 308 
  target/s390x/kvm/kvm.c  |   5 +-
  target/s390x/kvm/meson.build|   3 +-
  8 files changed, 424 insertions(+), 3 deletions(-)
  create mode 100644 target/s390x/kvm/cpu_topology.c

diff --git a/MAINTAINERS b/MAINTAINERS
index bb7b34d0d8..de9052f753 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1659,6 +1659,7 @@ M: Pierre Morel 
  S: Supported
  F: include/hw/s390x/cpu-topology.h
  F: hw/s390x/cpu-topology.c
+F: target/s390x/kvm/cpu_topology.c


It's somewhat weird to have one file "cpu-topology.c" (in hw/s390x, with a 
dash), and one file cpu_topology.c (in target/s390x, with an underscore) ... 
could you come up with a better naming? Maybe call the new file 
stsi-topology.c or so?



diff --git a/target/s390x/cpu.h b/target/s390x/cpu.h
index bb7cfb0cab..9f97989bd7 100644
--- a/target/s390x/cpu.h
+++ b/target/s390x/cpu.h
@@ -561,6 +561,25 @@ typedef struct SysIB_322 {
  } SysIB_322;
  QEMU_BUILD_BUG_ON(sizeof(SysIB_322) != 4096);



Maybe add a short comment here what MAG stands for (magnitude fields?)?

+#define S390_TOPOLOGY_MAG  6
+#define S390_TOPOLOGY_MAG6 0
+#define S390_TOPOLOGY_MAG5 1
+#define S390_TOPOLOGY_MAG4 2
+#define S390_TOPOLOGY_MAG3 3
+#define S390_TOPOLOGY_MAG2 4
+#define S390_TOPOLOGY_MAG1 5
+/* Configuration topology */
+typedef struct SysIB_151x {
+uint8_t  reserved0[2];
+uint16_t length;
+uint8_t  mag[S390_TOPOLOGY_MAG];
+uint8_t  reserved1;
+uint8_t  mnest;
+uint32_t reserved2;
+char tle[];
+} SysIB_151x;
+QEMU_BUILD_BUG_ON(sizeof(SysIB_151x) != 16);
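
To make the layout behind the size check concrete, here is a standalone sketch
using C11 _Static_assert in place of QEMU_BUILD_BUG_ON. The struct is a local
copy with a different name, and mag_index() is a hypothetical helper that only
restates the S390_TOPOLOGY_MAGn defines (MAG6 is index 0, MAG1 is index 5).
Reading MAG as "magnitude" follows the question asked above and is an
assumption, not something the patch states.

```c
#include <stddef.h>
#include <stdint.h>

#define TOPOLOGY_MAG 6 /* number of entries in mag[] */

/* Local copy of the SYSIB 15.1.x header, for illustration only. */
typedef struct SysIB_151x_sketch {
    uint8_t  reserved0[2];      /* bytes 0-1 */
    uint16_t length;            /* bytes 2-3 */
    uint8_t  mag[TOPOLOGY_MAG]; /* bytes 4-9 */
    uint8_t  reserved1;         /* byte 10 */
    uint8_t  mnest;             /* byte 11 */
    uint32_t reserved2;         /* bytes 12-15 */
    char     tle[];             /* topology list entries follow the header */
} SysIB_151x_sketch;

_Static_assert(sizeof(SysIB_151x_sketch) == 16, "header must be 16 bytes");
_Static_assert(offsetof(SysIB_151x_sketch, mag) == 4, "mag starts at byte 4");
_Static_assert(offsetof(SysIB_151x_sketch, tle) == 16, "TLEs follow the header");

/* Hypothetical helper: array index of the entry for level 1..6. */
static inline int mag_index(int level)
{
    return TOPOLOGY_MAG - level;
}
```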

...


diff --git a/target/s390x/kvm/cpu_topology.c b/target/s390x/kvm/cpu_topology.c
new file mode 100644
index 00..86a286afe2
--- /dev/null
+++ b/target/s390x/kvm/cpu_topology.c
@@ -0,0 +1,308 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * QEMU S390x CPU Topology
+ *
+ * Copyright IBM Corp. 2022,2023
+ * Author(s): Pierre Morel 
+ *
+ */
+#include "qemu/osdep.h"
+#include "cpu.h"
+#include "hw/s390x/pv.h"
+#include "hw/sysbus.h"
+#include "hw/s390x/sclp.h"
+#include "hw/s390x/cpu-topology.h"
+
+/**
+ * fill_container:
+ * @p: The address of the container TLE to fill
+ * @level: The level of nesting for this container
+ * @id: The container receives a uniq ID inside its own container


s/uniq/unique/

 Thomas




Re: [PATCH 03/20] block: bdrv/blk_co_unref() for calls in coroutine context

2023-04-27 Thread Kevin Wolf
Am 27.04.2023 um 16:30 hat Paolo Bonzini geschrieben:
> Il mar 25 apr 2023, 19:32 Kevin Wolf  ha scritto:
> 
> > These functions must not be called in coroutine context, because they
> > need write access to the graph.
> >
> 
> With these patches applied vrc is still complaining about calls to
> bdrv_unref_child from qcow2_do_open and qcow2_do_close.

bdrv_unref_child() is addressed in one of the patches that I'll probably
send in the next batch, so it should be covered without additional work.

> Otherwise, the situation looks pretty good.

Thanks for checking!

By the way, and slightly unrelated, can vrc somehow help with finding
places that call coroutine wrappers without holding the AioContext lock?
(This results in an abort() when AIO_WAIT_WHILE() tries to unlock the
AioContext.) This is one of the classes of bugs we're seeing in 8.0.

Kevin




[PATCH 12/19] migration/rdma: It makes no sense to receive that flag without RDMA

2023-04-27 Thread Juan Quintela
This could only happen if the source sends
RAM_SAVE_FLAG_HOOK (i.e. rdma) and the destination doesn't have CONFIG_RDMA.

Signed-off-by: Juan Quintela 
---
 migration/qemu-file.c | 8 
 1 file changed, 8 deletions(-)

diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index 9b5e14a2ef..014db96984 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -322,14 +322,6 @@ void ram_control_load_hook(QEMUFile *f, uint64_t flags, 
void *data)
 if (ret < 0) {
 qemu_file_set_error(f, ret);
 }
-} else {
-/*
- * Hook is a hook specifically requested by the source sending a flag
- * that expects there to be a hook on the destination.
- */
-if (flags == RAM_CONTROL_HOOK) {
-qemu_file_set_error(f, -EINVAL);
-}
 }
 }
 
-- 
2.40.0




[PATCH 19/19] migration/rdma: Move rdma constants from qemu-file.h to rdma.h

2023-04-27 Thread Juan Quintela
Signed-off-by: Juan Quintela 
---
 migration/qemu-file.h | 17 -
 migration/ram.c   |  2 +-
 migration/rdma.h  | 16 
 3 files changed, 17 insertions(+), 18 deletions(-)

diff --git a/migration/qemu-file.h b/migration/qemu-file.h
index 9c99914b21..5129b6f196 100644
--- a/migration/qemu-file.h
+++ b/migration/qemu-file.h
@@ -29,13 +29,6 @@
 #include "exec/cpu-common.h"
 #include "io/channel.h"
 
-/*
- * Constants used by ram_control_* hooks
- */
-#define RAM_CONTROL_SETUP 0
-#define RAM_CONTROL_ROUND 1
-#define RAM_CONTROL_FINISH 3
-
 QEMUFile *qemu_file_new_input(QIOChannel *ioc);
 QEMUFile *qemu_file_new_output(QIOChannel *ioc);
 int qemu_fclose(QEMUFile *f);
@@ -123,16 +116,6 @@ void qemu_fflush(QEMUFile *f);
 void qemu_file_set_blocking(QEMUFile *f, bool block);
 int qemu_file_get_to_fd(QEMUFile *f, int fd, size_t size);
 
-/* Whenever this is found in the data stream, the flags
- * will be passed to ram_control_load_hook in the incoming-migration
- * side. This lets before_ram_iterate/after_ram_iterate add
- * transport-specific sections to the RAM migration data.
- */
-#define RAM_SAVE_FLAG_HOOK 0x80
-
-#define RAM_SAVE_CONTROL_NOT_SUPP -1000
-#define RAM_SAVE_CONTROL_DELAYED  -2000
-
 QIOChannel *qemu_file_get_ioc(QEMUFile *file);
 
 #endif
diff --git a/migration/ram.c b/migration/ram.c
index a085ce8cae..ac2296d740 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -86,7 +86,7 @@
 #define RAM_SAVE_FLAG_EOS  0x10
 #define RAM_SAVE_FLAG_CONTINUE 0x20
 #define RAM_SAVE_FLAG_XBZRLE   0x40
-/* 0x80 is reserved in qemu-file.h for RAM_SAVE_FLAG_HOOK */
+/* 0x80 is reserved in rdma.h for RAM_SAVE_FLAG_HOOK */
 #define RAM_SAVE_FLAG_COMPRESS_PAGE 0x100
 #define RAM_SAVE_FLAG_MULTIFD_FLUSH 0x200
 /* We can't use any flag that is bigger than 0x200 */
diff --git a/migration/rdma.h b/migration/rdma.h
index ed3650ef67..96ec2cc8f0 100644
--- a/migration/rdma.h
+++ b/migration/rdma.h
@@ -24,6 +24,22 @@ void rdma_start_outgoing_migration(void *opaque, const char 
*host_port,
 
 void rdma_start_incoming_migration(const char *host_port, Error **errp);
 
+/*
+ * Constants used by rdma return codes
+ */
+#define RAM_CONTROL_SETUP 0
+#define RAM_CONTROL_ROUND 1
+#define RAM_CONTROL_FINISH 3
+
+/*
+ * Whenever this is found in the data stream, the flags
+ * will be passed to rdma functions in the incoming-migration
+ * side.
+ */
+#define RAM_SAVE_FLAG_HOOK 0x80
+
+#define RAM_SAVE_CONTROL_NOT_SUPP -1000
+#define RAM_SAVE_CONTROL_DELAYED  -2000
 
 #ifdef CONFIG_RDMA
 int qemu_rdma_registration_handle(QEMUFile *f);
-- 
2.40.0




[PATCH 03/19] migration: Rename ram_counters to mig_stats

2023-04-27 Thread Juan Quintela
migration_stats is just too long, and it is going to have more than
ram counters in the near future.

Signed-off-by: Juan Quintela 
---
 migration/migration-stats.c |  2 +-
 migration/migration-stats.h |  2 +-
 migration/migration.c   | 32 -
 migration/multifd.c |  6 ++---
 migration/ram.c | 48 ++---
 migration/savevm.c  |  2 +-
 6 files changed, 46 insertions(+), 46 deletions(-)

diff --git a/migration/migration-stats.c b/migration/migration-stats.c
index b0eb5ae73c..8c0af9b80a 100644
--- a/migration/migration-stats.c
+++ b/migration/migration-stats.c
@@ -14,4 +14,4 @@
 #include "qemu/stats64.h"
 #include "migration-stats.h"
 
-RAMStats ram_counters;
+RAMStats mig_stats;
diff --git a/migration/migration-stats.h b/migration/migration-stats.h
index 2edea0c779..197374b4f6 100644
--- a/migration/migration-stats.h
+++ b/migration/migration-stats.h
@@ -36,6 +36,6 @@ typedef struct {
 Stat64 transferred;
 } RAMStats;
 
-extern RAMStats ram_counters;
+extern RAMStats mig_stats;
 
 #endif
diff --git a/migration/migration.c b/migration/migration.c
index 5ecf3dc381..feb5ab7493 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -909,26 +909,26 @@ static void populate_ram_info(MigrationInfo *info, MigrationState *s)
 size_t page_size = qemu_target_page_size();
 
 info->ram = g_malloc0(sizeof(*info->ram));
-info->ram->transferred = stat64_get(&ram_counters.transferred);
+info->ram->transferred = stat64_get(&mig_stats.transferred);
 info->ram->total = ram_bytes_total();
-info->ram->duplicate = stat64_get(&ram_counters.zero_pages);
+info->ram->duplicate = stat64_get(&mig_stats.zero_pages);
 /* legacy value.  It is not used anymore */
 info->ram->skipped = 0;
-info->ram->normal = stat64_get(&ram_counters.normal_pages);
+info->ram->normal = stat64_get(&mig_stats.normal_pages);
 info->ram->normal_bytes = info->ram->normal * page_size;
 info->ram->mbps = s->mbps;
 info->ram->dirty_sync_count =
-stat64_get(&ram_counters.dirty_sync_count);
+stat64_get(&mig_stats.dirty_sync_count);
 info->ram->dirty_sync_missed_zero_copy =
-stat64_get(&ram_counters.dirty_sync_missed_zero_copy);
+stat64_get(&mig_stats.dirty_sync_missed_zero_copy);
 info->ram->postcopy_requests =
-stat64_get(&ram_counters.postcopy_requests);
+stat64_get(&mig_stats.postcopy_requests);
 info->ram->page_size = page_size;
-info->ram->multifd_bytes = stat64_get(&ram_counters.multifd_bytes);
+info->ram->multifd_bytes = stat64_get(&mig_stats.multifd_bytes);
 info->ram->pages_per_second = s->pages_per_second;
-info->ram->precopy_bytes = stat64_get(&ram_counters.precopy_bytes);
-info->ram->downtime_bytes = stat64_get(&ram_counters.downtime_bytes);
-info->ram->postcopy_bytes = stat64_get(&ram_counters.postcopy_bytes);
+info->ram->precopy_bytes = stat64_get(&mig_stats.precopy_bytes);
+info->ram->downtime_bytes = stat64_get(&mig_stats.downtime_bytes);
+info->ram->postcopy_bytes = stat64_get(&mig_stats.postcopy_bytes);
 
 if (migrate_xbzrle()) {
 info->xbzrle_cache = g_malloc0(sizeof(*info->xbzrle_cache));
@@ -960,7 +960,7 @@ static void populate_ram_info(MigrationInfo *info, MigrationState *s)
 if (s->state != MIGRATION_STATUS_COMPLETED) {
 info->ram->remaining = ram_bytes_remaining();
 info->ram->dirty_pages_rate =
-   stat64_get(&ram_counters.dirty_pages_rate);
+   stat64_get(&mig_stats.dirty_pages_rate);
 }
 }
 
@@ -1613,10 +1613,10 @@ static bool migrate_prepare(MigrationState *s, bool blk, bool blk_inc,
 
 migrate_init(s);
 /*
- * set ram_counters compression_counters memory to zero for a
+ * set mig_stats compression_counters memory to zero for a
  * new migration
  */
-memset(&ram_counters, 0, sizeof(ram_counters));
+memset(&mig_stats, 0, sizeof(mig_stats));
 memset(&compression_counters, 0, sizeof(compression_counters));
 
 return true;
@@ -2627,7 +2627,7 @@ static MigThrError migration_detect_error(MigrationState *s)
 static uint64_t migration_total_bytes(MigrationState *s)
 {
 return qemu_file_total_transferred(s->to_dst_file) +
-stat64_get(&ram_counters.multifd_bytes);
+stat64_get(&mig_stats.multifd_bytes);
 }
 
 static void migration_calculate_complete(MigrationState *s)
@@ -2691,10 +2691,10 @@ static void migration_update_counters(MigrationState *s,
  * if we haven't sent anything, we don't want to
  * recalculate. 1 is a small enough number for our purposes
  */
-if (stat64_get(&ram_counters.dirty_pages_rate) &&
+if (stat64_get(&mig_stats.dirty_pages_rate) &&
 transferred > 1) {
 s->expected_downtime =
-stat64_get(&ram_counters.dirty_bytes_last_sync) / bandwidth;
+stat64_get(&mig_stats.dirty_bytes_last_sync) / bandwidth;
 }
 
 qemu_file_reset_rate_limit(s->to_dst_file);
diff --git a/migration/multifd.c b/migration/multifd.c
index 
