Re: [PATCH] target/loongarch: Split fcc register to fcc0-7 in gdbstub

2023-08-18 Thread gaosong

Hi, Alex

在 2023/8/9 上午2:40, Alex Bennée 写道:


bibo mao  writes:


I think this is a problem with the LoongArch gdb rather than qemu.
If so, every time gdb changes the register layout, qemu needs to be modified.
There should be compatibility requirements between the gdb client and the gdb server.

Tiezhu,

what is your opinion?


You can always register additional custom regsets which is what we do
for the extended Aarch64 regs. See ->gdb_get_dynamic_xml
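For context, the ->gdb_get_dynamic_xml hook lets a target hand gdb a generated
target-description fragment instead of shipping a static .xml file. A minimal
Python sketch of what such a generated fragment could look like for hypothetical
LoongArch vector registers; the feature name, register names, and sizes here are
illustrative assumptions, not QEMU's actual output:

```python
# Hedged sketch: build a gdb target-description XML fragment the way a
# dynamic regset hook might, for hypothetical LoongArch vector registers.
# The feature name and register naming are assumptions for illustration.

def build_vector_feature_xml(num_regs: int, bitsize: int) -> str:
    lines = ['<feature name="org.gnu.gdb.loongarch.lsx">']  # name is an assumption
    for i in range(num_regs):
        lines.append(
            f'  <reg name="vr{i}" bitsize="{bitsize}" type="uint128" group="vector"/>'
        )
    lines.append('</feature>')
    return '\n'.join(lines)

xml = build_vector_feature_xml(32, 128)
```

The real hook returns such a string to the gdbstub, which serves it to the
client in response to a qXfer:features:read request.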


Thanks for your suggestions. We will use this method for the vector extension.

For this patch:
Acked-by: Song Gao 

Thanks.
Song Gao



Regards
Bibo Mao

在 2023/8/8 18:03, Jiajie Chen 写道:


On 2023/8/8 17:55, Jiajie Chen wrote:


On 2023/8/8 14:10, bibo mao wrote:

I am not familiar with gdb; is there an ABI breakage?
I do not know how a gdb client works with a gdb server of a different version.

There seemed to be no versioning in the process, but rather in-code xml
validation. In gdb, the code only allows the new xml (fcc0-7) and
rejects the old one (fcc), so gdb breaks qemu first and does not consider
backward compatibility with qemu.
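The validation behavior described above can be modeled in a few lines: newer gdb
only recognizes the split fcc0-7 names in the fpu feature, so a description that
still carries the single fcc register is rejected, and gdb falls back to its
builtin description. A toy sketch of that check, not gdb's actual parser:

```python
# Toy model of gdb's in-code target-description validation: only register
# names known to the builtin LoongArch fpu feature are accepted.

KNOWN_FPU_REGS = (
    {f"f{i}" for i in range(32)}
    | {f"fcc{i}" for i in range(8)}
    | {"fcsr"}
)

def accepts_description(reg_names):
    # Rejecting any unknown name makes gdb discard the whole description
    # and fall back to its builtin one.
    return all(name in KNOWN_FPU_REGS for name in reg_names)

old_desc = [f"f{i}" for i in range(32)] + ["fcc", "fcsr"]    # pre-patch QEMU
new_desc = [f"f{i}" for i in range(32)] + \
           [f"fcc{i}" for i in range(8)] + ["fcsr"]          # post-patch QEMU
```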


Not an ABI breakage, but gdb will complain:

warning: while parsing target description (at line 1): Target
description specified unknown architecture "loongarch64"
warning: Could not load XML target description; ignoring
warning: No executable has been specified and target does not support
determining executable automatically.  Try using the "file" command.
Truncated register 38 in remote 'g' packet


Sorry, to be clear, the actual error message is:

(gdb) target extended-remote localhost:1234
Remote debugging using localhost:1234
warning: Architecture rejected target-supplied description
warning: No executable has been specified and target does not support

It rejects the target description xml given by qemu and thus uses the
builtin one. However, there is a mismatch in the fcc registers, so it
will not work if we list the floating point registers.

At the same time, if we are using the loongarch32 target (I recently
posted patches to support this), it will reject the target
description and fall back to loongarch64, making gdb unusable.



And gdb can no longer debug a kernel running in qemu. You can
reproduce this error using the latest qemu (without this patch) and
gdb (13.1 or later).



Regards
Bibo Mao


在 2023/8/8 13:42, Jiajie Chen 写道:

Since GDB 13.1 (GDB commit ea3352172), GDB for LoongArch has used
fcc0-7 instead of a single fcc register. This commit partially reverts commit
2f149c759 (`target/loongarch: Update gdb_set_fpu() and gdb_get_fpu()`)
to match the behavior of GDB.

Note that it is a breaking change for GDB 13.0 or earlier, but it is
also required for GDB 13.1 or later to work.

Signed-off-by: Jiajie Chen 
---
   gdb-xml/loongarch-fpu.xml  |  9 -
   target/loongarch/gdbstub.c | 16 +++-
   2 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/gdb-xml/loongarch-fpu.xml b/gdb-xml/loongarch-fpu.xml
index 78e42cf5dd..e81e3382e7 100644
--- a/gdb-xml/loongarch-fpu.xml
+++ b/gdb-xml/loongarch-fpu.xml
@@ -45,6 +45,13 @@
   <reg name="f29" bitsize="64" type="fputype" group="float"/>
   <reg name="f30" bitsize="64" type="fputype" group="float"/>
   <reg name="f31" bitsize="64" type="fputype" group="float"/>
-  <reg name="fcc" bitsize="64" type="uint64" group="float"/>
+  <reg name="fcc0" bitsize="8" type="uint8" group="float"/>
+  <reg name="fcc1" bitsize="8" type="uint8" group="float"/>
+  <reg name="fcc2" bitsize="8" type="uint8" group="float"/>
+  <reg name="fcc3" bitsize="8" type="uint8" group="float"/>
+  <reg name="fcc4" bitsize="8" type="uint8" group="float"/>
+  <reg name="fcc5" bitsize="8" type="uint8" group="float"/>
+  <reg name="fcc6" bitsize="8" type="uint8" group="float"/>
+  <reg name="fcc7" bitsize="8" type="uint8" group="float"/>
   <reg name="fcsr" bitsize="32" type="uint32" group="float"/>
 </feature>
diff --git a/target/loongarch/gdbstub.c b/target/loongarch/gdbstub.c
index 0752fff924..15ad6778f1 100644
--- a/target/loongarch/gdbstub.c
+++ b/target/loongarch/gdbstub.c
@@ -70,10 +70,9 @@ static int loongarch_gdb_get_fpu(CPULoongArchState *env,
 {
     if (0 <= n && n < 32) {
         return gdb_get_reg64(mem_buf, env->fpr[n].vreg.D(0));
-    } else if (n == 32) {
-        uint64_t val = read_fcc(env);
-        return gdb_get_reg64(mem_buf, val);
-    } else if (n == 33) {
+    } else if (32 <= n && n < 40) {
+        return gdb_get_reg8(mem_buf, env->cf[n - 32]);
+    } else if (n == 40) {
         return gdb_get_reg32(mem_buf, env->fcsr0);
     }
     return 0;
@@ -87,11 +86,10 @@ static int loongarch_gdb_set_fpu(CPULoongArchState *env,
     if (0 <= n && n < 32) {
         env->fpr[n].vreg.D(0) = ldq_p(mem_buf);
         length = 8;
-    } else if (n == 32) {
-        uint64_t val = ldq_p(mem_buf);
-        write_fcc(env, val);
-        length = 8;
-    } else if (n == 33) {
+    } else if (32 <= n && n < 40) {
+        env->cf[n - 32] = ldub_p(mem_buf);
+        length = 1;
+    } else if (n == 40) {
         env->fcsr0 = ldl_p(mem_buf);
         length = 4;
     }








Re: [PATCH v9 6/9] gfxstream + rutabaga: add initial support for gfxstream

2023-08-18 Thread Akihiko Odaki

On 2023/08/19 10:12, Gurchetan Singh wrote:

This adds initial support for gfxstream and cross-domain.  Both
features rely on virtio-gpu blob resources and context types, which
are also implemented in this patch.

gfxstream has a long and illustrious history in Android graphics
paravirtualization.  It has been powering graphics in the Android
Studio Emulator, the main developer platform, for more than a decade.

Originally conceived by Jesse Hall, it was first known as "EmuGL" [a].
The key design characteristic was a 1:1 threading model and
auto-generation, which fit nicely with the OpenGLES spec.  It also
allowed easy layering with ANGLE on the host, which provides the GLES
implementations on Windows or macOS environments.

gfxstream has traditionally been maintained by a single engineer, and
between 2015 and 2021, the goldfish throne passed to Frank Yang.
Historians often remark this glorious reign ("pax gfxstreama" is the
academic term) was comparable to that of Augustus and both Queen
Elizabeths.  Just to name a few accomplishments in a resplendent
panoply: higher versions of GLES, address space graphics, snapshot
support and CTS compliant Vulkan [b].

One major drawback was the use of out-of-tree goldfish drivers.
Android engineers didn't know much about DRM/KMS, and especially TTM,
so a simple guest-to-host pipe was conceived.

Luckily, virtio-gpu 3D started to emerge in 2016 due to the work of
the Mesa/virglrenderer communities.  In 2018, the initial virtio-gpu
port of gfxstream was done by Cuttlefish enthusiast Alistair Delva.
It was a symbol compatible replacement of virglrenderer [c] and named
"AVDVirglrenderer".  This implementation forms the basis of the
current gfxstream host implementation still in use today.

cross-domain support follows a similar arc.  Originally conceived by
Wayland aficionado David Reveman and crosvm enjoyer Zach Reizner in
2018, it initially relied on the downstream "virtio-wl" device.

In 2020 and 2021, virtio-gpu was extended to include blob resources
and multiple timelines by yours truly, features gfxstream/cross-domain
both require to function correctly.

Right now, we stand at the precipice of a truly fantastic possibility:
the Android Emulator powered by upstream QEMU and upstream Linux
kernel.  gfxstream will then be packaged properfully, and app
developers can even fix gfxstream bugs on their own if they encounter
them.

It's been quite the ride, my friends.  Where will gfxstream head next,
nobody really knows.  I wouldn't be surprised if it's around for
another decade, maintained by a new generation of Android graphics
enthusiasts.

Technical details:
   - Very simple initial display integration: just used Pixman
   - Largely, 1:1 mapping of virtio-gpu hypercalls to rutabaga function
 calls

Next steps for Android VMs:
   - The next step would be improving display integration and UI interfaces
 with the goal of the QEMU upstream graphics being in an emulator
 release [d].

Next steps for Linux VMs for display virtualization:
   - For widespread distribution, someone needs to package Sommelier or the
 wayland-proxy-virtwl [e] ideally into Debian main. In addition, newer
 versions of the Linux kernel come with DRM_VIRTIO_GPU_KMS option,
 which allows disabling KMS hypercalls.  If anyone cares enough, it'll
 probably be possible to build a custom VM variant that uses this display
 virtualization strategy.

[a] https://android-review.googlesource.com/c/platform/development/+/34470
[b] 
https://android-review.googlesource.com/q/topic:%22vulkan-hostconnection-start%22
[c] 
https://android-review.googlesource.com/c/device/generic/goldfish-opengl/+/761927
[d] https://developer.android.com/studio/releases/emulator
[e] https://github.com/talex5/wayland-proxy-virtwl

Signed-off-by: Gurchetan Singh 
Tested-by: Alyssa Ross 
Tested-by: Emmanouil Pitsidianakis 
Reviewed-by: Emmanouil Pitsidianakis 
---
v1: Incorporated various suggestions by Akihiko Odaki and Bernard Berschow
 - Removed GET_VIRTIO_GPU_GL / GET_RUTABAGA macros
 - Used error_report(..)
 - Used g_autofree to fix leaks on error paths
 - Removed unnecessary casts
 - added virtio-gpu-pci-rutabaga.c + virtio-vga-rutabaga.c files

v2: Incorporated various suggestions by Akihiko Odaki, Marc-André Lureau and
 Bernard Berschow:
 - Parenthesis in CHECK macro
 - CHECK_RESULT(result, ..) --> CHECK(!result, ..)
 - delay until g->parent_obj.enable = 1
 - Additional cast fixes
 - initialize directly in virtio_gpu_rutabaga_realize(..)
 - add debug callback to hook into QEMU error's APIs

v3: Incorporated feedback from Akihiko Odaki and Alyssa Ross:
 - Autodetect Wayland socket when not explicitly specified
 - Fix map_blob error paths
 - Add comment why we need both `res` and `resource` in create blob
 - Cast and whitespace fixes
 - Big endian check comes before virtio_gpu_rutabaga_init().
 - VirtIOVGARUTABAGA --> VirtIOVGARutabaga

Re: [PATCH v7 9/9] docs/system: add basic virtio-gpu documentation

2023-08-18 Thread Akihiko Odaki

On 2023/08/19 10:17, Gurchetan Singh wrote:



On Fri, Aug 18, 2023 at 5:08 AM Akihiko Odaki wrote:


On 2023/08/18 8:47, Gurchetan Singh wrote:
 >
 >
 > On Wed, Aug 16, 2023 at 10:28 PM Akihiko Odaki wrote:
 >
 >     On 2023/08/17 11:23, Gurchetan Singh wrote:
 >      > From: Gurchetan Singh
 >      >
 >      > This adds basic documentation for virtio-gpu.
 >      >
 >      > Suggested-by: Akihiko Odaki
 >      > Signed-off-by: Gurchetan Singh
 >      > Tested-by: Alyssa Ross
 >      > Tested-by: Emmanouil Pitsidianakis
 >      > Reviewed-by: Emmanouil Pitsidianakis
 >      > ---
 >      > v2: - Incorporated suggestions by Akihiko Odaki
 >      >      - Listed the currently supported capset_names (Bernard)
 >      >
 >      > v3: - Incorporated suggestions by Akihiko Odaki and Alyssa
Ross
 >      >
 >      > v4: - Incorporated suggestions by Akihiko Odaki
 >      >
 >      > v5: - Removed pci suffix from examples
 >      >      - Verified that -device virtio-gpu-rutabaga works.  Strangely
 >      >        enough, I don't remember changing anything, and I remember
 >      >        it not working.  I did rebase to top of tree though.
 >      >      - Fixed meson examples in crosvm docs
 >      >
 >      >   docs/system/device-emulation.rst   |   1 +
 >      >   docs/system/devices/virtio-gpu.rst | 113
 >     +
 >      >   2 files changed, 114 insertions(+)
 >      >   create mode 100644 docs/system/devices/virtio-gpu.rst
 >      >
 >      > diff --git a/docs/system/device-emulation.rst
 >     b/docs/system/device-emulation.rst
 >      > index 4491c4cbf7..1167f3a9f2 100644
 >      > --- a/docs/system/device-emulation.rst
 >      > +++ b/docs/system/device-emulation.rst
 >      > @@ -91,6 +91,7 @@ Emulated Devices
 >      >      devices/nvme.rst
 >      >      devices/usb.rst
 >      >      devices/vhost-user.rst
 >      > +   devices/virtio-gpu.rst
 >      >      devices/virtio-pmem.rst
 >      >      devices/vhost-user-rng.rst
 >      >      devices/canokey.rst
 >      > diff --git a/docs/system/devices/virtio-gpu.rst
 >     b/docs/system/devices/virtio-gpu.rst
 >      > new file mode 100644
 >      > index 00..8c5c708272
 >      > --- /dev/null
 >      > +++ b/docs/system/devices/virtio-gpu.rst
 >      > @@ -0,0 +1,113 @@
 >      > +..
 >      > +   SPDX-License-Identifier: GPL-2.0
 >      > +
 >      > +virtio-gpu
 >      > +==
 >      > +
 >      > +This document explains the setup and usage of the
virtio-gpu device.
 >      > +The virtio-gpu device paravirtualizes the GPU and display
 >     controller.
 >      > +
 >      > +Linux kernel support
 >      > +
 >      > +
 >      > +virtio-gpu requires a guest Linux kernel built with the
 >      > +``CONFIG_DRM_VIRTIO_GPU`` option.
 >      > +
 >      > +QEMU virtio-gpu variants
 >      > +
 >      > +
 >      > +QEMU virtio-gpu device variants come in the following form:
 >      > +
 >      > + * ``virtio-vga[-BACKEND]``
 >      > + * ``virtio-gpu[-BACKEND][-INTERFACE]``
 >      > + * ``vhost-user-vga``
 >      > + * ``vhost-user-pci``
 >      > +
 >      > +**Backends:** QEMU provides a 2D virtio-gpu backend, and two
 >     accelerated
 >      > +backends: virglrenderer ('gl' device label) and rutabaga_gfx
 >     ('rutabaga'
 >      > +device label).  There is a vhost-user backend that runs the
 >     graphics stack
 >      > +in a separate process for improved isolation.
 >      > +
 >      > +**Interfaces:** QEMU further categorizes virtio-gpu device
 >     variants based
 >      > +on the interface exposed to the guest. The interfaces can be
 >     classified
 >      > +into VGA and non-VGA variants. The VGA ones are pre

Re: [PATCH v7 9/9] docs/system: add basic virtio-gpu documentation

2023-08-18 Thread Gurchetan Singh
On Fri, Aug 18, 2023 at 5:08 AM Akihiko Odaki wrote:

> On 2023/08/18 8:47, Gurchetan Singh wrote:
> >
> >
> > On Wed, Aug 16, 2023 at 10:28 PM Akihiko Odaki wrote:
> >
> > On 2023/08/17 11:23, Gurchetan Singh wrote:
> >  > From: Gurchetan Singh
> >  >
> >  > This adds basic documentation for virtio-gpu.
> >  >
> >  > Suggested-by: Akihiko Odaki
> >  > Signed-off-by: Gurchetan Singh
> >  > Tested-by: Alyssa Ross
> >  > Tested-by: Emmanouil Pitsidianakis
> >  > Reviewed-by: Emmanouil Pitsidianakis
> >  > ---
> >  > v2: - Incorporated suggestions by Akihiko Odaki
> >  >  - Listed the currently supported capset_names (Bernard)
> >  >
> >  > v3: - Incorporated suggestions by Akihiko Odaki and Alyssa Ross
> >  >
> >  > v4: - Incorporated suggestions by Akihiko Odaki
> >  >
> >  > v5: - Removed pci suffix from examples
> >  >  - Verified that -device virtio-gpu-rutabaga works.  Strangely
> >  >enough, I don't remember changing anything, and I remember
> >  >it not working.  I did rebase to top of tree though.
> >  >  - Fixed meson examples in crosvm docs
> >  >
> >  >   docs/system/device-emulation.rst   |   1 +
> >  >   docs/system/devices/virtio-gpu.rst | 113
> > +
> >  >   2 files changed, 114 insertions(+)
> >  >   create mode 100644 docs/system/devices/virtio-gpu.rst
> >  >
> >  > diff --git a/docs/system/device-emulation.rst
> > b/docs/system/device-emulation.rst
> >  > index 4491c4cbf7..1167f3a9f2 100644
> >  > --- a/docs/system/device-emulation.rst
> >  > +++ b/docs/system/device-emulation.rst
> >  > @@ -91,6 +91,7 @@ Emulated Devices
> >  >  devices/nvme.rst
> >  >  devices/usb.rst
> >  >  devices/vhost-user.rst
> >  > +   devices/virtio-gpu.rst
> >  >  devices/virtio-pmem.rst
> >  >  devices/vhost-user-rng.rst
> >  >  devices/canokey.rst
> >  > diff --git a/docs/system/devices/virtio-gpu.rst
> > b/docs/system/devices/virtio-gpu.rst
> >  > new file mode 100644
> >  > index 00..8c5c708272
> >  > --- /dev/null
> >  > +++ b/docs/system/devices/virtio-gpu.rst
> >  > @@ -0,0 +1,113 @@
> >  > +..
> >  > +   SPDX-License-Identifier: GPL-2.0
> >  > +
> >  > +virtio-gpu
> >  > +==
> >  > +
> >  > +This document explains the setup and usage of the virtio-gpu
> device.
> >  > +The virtio-gpu device paravirtualizes the GPU and display
> > controller.
> >  > +
> >  > +Linux kernel support
> >  > +
> >  > +
> >  > +virtio-gpu requires a guest Linux kernel built with the
> >  > +``CONFIG_DRM_VIRTIO_GPU`` option.
> >  > +
> >  > +QEMU virtio-gpu variants
> >  > +
> >  > +
> >  > +QEMU virtio-gpu device variants come in the following form:
> >  > +
> >  > + * ``virtio-vga[-BACKEND]``
> >  > + * ``virtio-gpu[-BACKEND][-INTERFACE]``
> >  > + * ``vhost-user-vga``
> >  > + * ``vhost-user-pci``
> >  > +
> >  > +**Backends:** QEMU provides a 2D virtio-gpu backend, and two
> > accelerated
> >  > +backends: virglrenderer ('gl' device label) and rutabaga_gfx
> > ('rutabaga'
> >  > +device label).  There is a vhost-user backend that runs the
> > graphics stack
> >  > +in a separate process for improved isolation.
> >  > +
> >  > +**Interfaces:** QEMU further categorizes virtio-gpu device
> > variants based
> >  > +on the interface exposed to the guest. The interfaces can be
> > classified
> >  > +into VGA and non-VGA variants. The VGA ones are prefixed with
> > virtio-vga
> >  > +or vhost-user-vga while the non-VGA ones are prefixed with
> > virtio-gpu or
> >  > +vhost-user-gpu.
> >  > +
> >  > +The VGA ones always use the PCI interface, but for the non-VGA
> > ones, the
> >  > +user can further pick between MMIO or PCI. For MMIO, the user
> > can suffix
> >  > +the device name with -device, though vhost-user-gpu does not
> > support MMIO.
> >  > +For PCI, the user can suffix it with -pci. Without these
> > suffixes, the
> >  > +platform default will be chosen.
> >  > +
> >  > +virtio-gpu 2d
> >  > +-
> >  > +
> >  > +The default 2D backend only performs 2D operations. The guest
> > needs to
> >  > +employ a software renderer for 3D graphics.
> >  > +
> >  > +Typically, the software renderer is provided by `Mesa`_ or
> > `SwiftShader`_.
> >  > +Mesa's implement

Re: [PATCH v7 8/9] gfxstream + rutabaga: enable rutabaga

2023-08-18 Thread Gurchetan Singh
On Fri, Aug 18, 2023 at 4:58 AM Akihiko Odaki wrote:

> On 2023/08/17 11:23, Gurchetan Singh wrote:
> > From: Gurchetan Singh 
> >
> > This change enables rutabaga to receive virtio-gpu-3d hypercalls
> > when it is active.
> >
> > Signed-off-by: Gurchetan Singh 
> > Tested-by: Alyssa Ross 
> > Tested-by: Emmanouil Pitsidianakis 
> > Reviewed-by: Emmanouil Pitsidianakis 
> > ---
> > v3: Whitespace fix (Akihiko)
> >
> >   hw/display/virtio-gpu-base.c | 3 ++-
> >   hw/display/virtio-gpu.c  | 5 +++--
> >   softmmu/qdev-monitor.c   | 3 +++
> >   softmmu/vl.c | 1 +
> >   4 files changed, 9 insertions(+), 3 deletions(-)
> >
> > diff --git a/hw/display/virtio-gpu-base.c b/hw/display/virtio-gpu-base.c
> > index 4f2b0ba1f3..50c5373b65 100644
> > --- a/hw/display/virtio-gpu-base.c
> > +++ b/hw/display/virtio-gpu-base.c
> > @@ -223,7 +223,8 @@ virtio_gpu_base_get_features(VirtIODevice *vdev,
> uint64_t features,
> >   {
> >   VirtIOGPUBase *g = VIRTIO_GPU_BASE(vdev);
> >
> > -if (virtio_gpu_virgl_enabled(g->conf)) {
> > +if (virtio_gpu_virgl_enabled(g->conf) ||
> > +virtio_gpu_rutabaga_enabled(g->conf)) {
> >   features |= (1 << VIRTIO_GPU_F_VIRGL);
> >   }
> >   if (virtio_gpu_edid_enabled(g->conf)) {
> > diff --git a/hw/display/virtio-gpu.c b/hw/display/virtio-gpu.c
> > index 3e658f1fef..08e170e029 100644
> > --- a/hw/display/virtio-gpu.c
> > +++ b/hw/display/virtio-gpu.c
> > @@ -1361,8 +1361,9 @@ void virtio_gpu_device_realize(DeviceState *qdev,
> Error **errp)
> >   VirtIOGPU *g = VIRTIO_GPU(qdev);
> >
> >   if (virtio_gpu_blob_enabled(g->parent_obj.conf)) {
> > -if (!virtio_gpu_have_udmabuf()) {
> > -error_setg(errp, "cannot enable blob resources without
> udmabuf");
> > +if (!virtio_gpu_have_udmabuf() &&
>
> virtio_gpu_have_udmabuf() emits a warning if udmabuf is not available,
> which is spurious when using Rutabaga.
>
> I think virtio_gpu_have_udmabuf() should be renamed to
> virtio_gpu_init_udmabuf() or something, let it set errp instead of
> emitting a warning, and call it only when Rutabaga is not in use.


Not too familiar with the udmabuf case, so I just reordered the rutabaga check
to avoid the spurious warning.  The udmabuf cleanups should probably go in an
additional patch series.


> That
> clarifies the timing when an error message will be shown.
>


[PATCH v9 2/9] virtio-gpu: CONTEXT_INIT feature

2023-08-18 Thread Gurchetan Singh
From: Antonio Caggiano 

The feature can be enabled when a backend wants it.

Signed-off-by: Antonio Caggiano 
Reviewed-by: Marc-André Lureau 
Signed-off-by: Gurchetan Singh 
Tested-by: Alyssa Ross 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Akihiko Odaki 
---
 hw/display/virtio-gpu-base.c   | 3 +++
 include/hw/virtio/virtio-gpu.h | 3 +++
 2 files changed, 6 insertions(+)

diff --git a/hw/display/virtio-gpu-base.c b/hw/display/virtio-gpu-base.c
index ca1fb7b16f..4f2b0ba1f3 100644
--- a/hw/display/virtio-gpu-base.c
+++ b/hw/display/virtio-gpu-base.c
@@ -232,6 +232,9 @@ virtio_gpu_base_get_features(VirtIODevice *vdev, uint64_t features,
     if (virtio_gpu_blob_enabled(g->conf)) {
         features |= (1 << VIRTIO_GPU_F_RESOURCE_BLOB);
     }
+    if (virtio_gpu_context_init_enabled(g->conf)) {
+        features |= (1 << VIRTIO_GPU_F_CONTEXT_INIT);
+    }
 
     return features;
 }
diff --git a/include/hw/virtio/virtio-gpu.h b/include/hw/virtio/virtio-gpu.h
index 390c4642b8..8377c365ef 100644
--- a/include/hw/virtio/virtio-gpu.h
+++ b/include/hw/virtio/virtio-gpu.h
@@ -93,6 +93,7 @@ enum virtio_gpu_base_conf_flags {
 VIRTIO_GPU_FLAG_EDID_ENABLED,
 VIRTIO_GPU_FLAG_DMABUF_ENABLED,
 VIRTIO_GPU_FLAG_BLOB_ENABLED,
+VIRTIO_GPU_FLAG_CONTEXT_INIT_ENABLED,
 };
 
 #define virtio_gpu_virgl_enabled(_cfg) \
@@ -105,6 +106,8 @@ enum virtio_gpu_base_conf_flags {
 (_cfg.flags & (1 << VIRTIO_GPU_FLAG_DMABUF_ENABLED))
 #define virtio_gpu_blob_enabled(_cfg) \
 (_cfg.flags & (1 << VIRTIO_GPU_FLAG_BLOB_ENABLED))
+#define virtio_gpu_context_init_enabled(_cfg) \
+(_cfg.flags & (1 << VIRTIO_GPU_FLAG_CONTEXT_INIT_ENABLED))
 
 struct virtio_gpu_base_conf {
 uint32_t max_outputs;
-- 
2.42.0.rc1.204.g551eb34607-goog
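The pattern in the patch above, a per-device conf flag gated behind a macro
that get_features() translates into a virtio feature bit, can be sketched as
follows. The bit positions below are illustrative assumptions chosen to mirror
the enum ordering in the diff, not authoritative values:

```python
# Toy model of virtio_gpu_base_get_features(): conf.flags holds per-device
# enable bits, and the negotiated feature word advertises the matching
# VIRTIO_GPU_F_* bits.  Bit positions are illustrative, not QEMU's values.

FLAG_BLOB_ENABLED = 3          # position in conf.flags (assumption)
FLAG_CONTEXT_INIT_ENABLED = 4  # position in conf.flags (assumption)

VIRTIO_GPU_F_RESOURCE_BLOB = 3  # virtio feature bit (assumption)
VIRTIO_GPU_F_CONTEXT_INIT = 4   # virtio feature bit (assumption)

def blob_enabled(conf_flags: int) -> bool:
    return bool(conf_flags & (1 << FLAG_BLOB_ENABLED))

def context_init_enabled(conf_flags: int) -> bool:
    return bool(conf_flags & (1 << FLAG_CONTEXT_INIT_ENABLED))

def get_features(conf_flags: int) -> int:
    features = 0
    if blob_enabled(conf_flags):
        features |= 1 << VIRTIO_GPU_F_RESOURCE_BLOB
    if context_init_enabled(conf_flags):
        features |= 1 << VIRTIO_GPU_F_CONTEXT_INIT
    return features
```

The macro-per-flag convention keeps each backend's realize code readable: a
backend opts in by setting its flag, and feature negotiation picks it up
without further changes.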




[PATCH v9 5/9] gfxstream + rutabaga prep: add needed definitions, fields, and options

2023-08-18 Thread Gurchetan Singh
This modifies the common virtio-gpu.h file to have the fields and
definitions needed by gfxstream/rutabaga, used by VirtIOGPURutabaga.

Signed-off-by: Gurchetan Singh 
Tested-by: Alyssa Ross 
Tested-by: Emmanouil Pitsidianakis 
Reviewed-by: Emmanouil Pitsidianakis 
---
v1: void *rutabaga --> struct rutabaga *rutabaga (Akihiko)
have a separate rutabaga device instead of using GL device (Bernard)

v2: VirtioGpuRutabaga --> VirtIOGPURutabaga (Akihiko)
move MemoryRegionInfo into VirtIOGPURutabaga (Akihiko)
remove 'ctx' field (Akihiko)
remove 'rutabaga_active'

v6: remove command from commit message, refer to docs instead (Manos)

 include/hw/virtio/virtio-gpu.h | 28 
 1 file changed, 28 insertions(+)

diff --git a/include/hw/virtio/virtio-gpu.h b/include/hw/virtio/virtio-gpu.h
index 55973e112f..e2a07e68d9 100644
--- a/include/hw/virtio/virtio-gpu.h
+++ b/include/hw/virtio/virtio-gpu.h
@@ -38,6 +38,9 @@ OBJECT_DECLARE_SIMPLE_TYPE(VirtIOGPUGL, VIRTIO_GPU_GL)
 #define TYPE_VHOST_USER_GPU "vhost-user-gpu"
 OBJECT_DECLARE_SIMPLE_TYPE(VhostUserGPU, VHOST_USER_GPU)
 
+#define TYPE_VIRTIO_GPU_RUTABAGA "virtio-gpu-rutabaga-device"
+OBJECT_DECLARE_SIMPLE_TYPE(VirtIOGPURutabaga, VIRTIO_GPU_RUTABAGA)
+
 struct virtio_gpu_simple_resource {
 uint32_t resource_id;
 uint32_t width;
@@ -94,6 +97,7 @@ enum virtio_gpu_base_conf_flags {
 VIRTIO_GPU_FLAG_DMABUF_ENABLED,
 VIRTIO_GPU_FLAG_BLOB_ENABLED,
 VIRTIO_GPU_FLAG_CONTEXT_INIT_ENABLED,
+VIRTIO_GPU_FLAG_RUTABAGA_ENABLED,
 };
 
 #define virtio_gpu_virgl_enabled(_cfg) \
@@ -108,6 +112,8 @@ enum virtio_gpu_base_conf_flags {
 (_cfg.flags & (1 << VIRTIO_GPU_FLAG_BLOB_ENABLED))
 #define virtio_gpu_context_init_enabled(_cfg) \
 (_cfg.flags & (1 << VIRTIO_GPU_FLAG_CONTEXT_INIT_ENABLED))
+#define virtio_gpu_rutabaga_enabled(_cfg) \
+(_cfg.flags & (1 << VIRTIO_GPU_FLAG_RUTABAGA_ENABLED))
 #define virtio_gpu_hostmem_enabled(_cfg) \
 (_cfg.hostmem > 0)
 
@@ -232,6 +238,28 @@ struct VhostUserGPU {
     bool backend_blocked;
 };
 
+#define MAX_SLOTS 4096
+
+struct MemoryRegionInfo {
+    int used;
+    MemoryRegion mr;
+    uint32_t resource_id;
+};
+
+struct rutabaga;
+
+struct VirtIOGPURutabaga {
+    struct VirtIOGPU parent_obj;
+
+    struct MemoryRegionInfo memory_regions[MAX_SLOTS];
+    char *capset_names;
+    char *wayland_socket_path;
+    char *wsi;
+    bool headless;
+    uint32_t num_capsets;
+    struct rutabaga *rutabaga;
+};
+
 #define VIRTIO_GPU_FILL_CMD(out) do {   \
 size_t s;   \
 s = iov_to_buf(cmd->elem.out_sg, cmd->elem.out_num, 0,  \
-- 
2.42.0.rc1.204.g551eb34607-goog
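The memory_regions[MAX_SLOTS] array in the struct above is a fixed-size pool
of host-mapping slots keyed by resource_id. A small Python sketch of such a
slot table; the first-free-slot allocation policy is an illustrative
assumption, not necessarily what the QEMU code does:

```python
# Sketch of a fixed-size slot table for blob-resource mappings, modeled
# on the memory_regions[MAX_SLOTS] field: each slot is either free or
# owned by one resource_id.

MAX_SLOTS = 4096

class SlotTable:
    def __init__(self):
        self.slots = [None] * MAX_SLOTS   # None == unused slot

    def map_resource(self, resource_id: int) -> int:
        # First-free-slot policy (an assumption for this sketch).
        for i, owner in enumerate(self.slots):
            if owner is None:
                self.slots[i] = resource_id
                return i
        raise RuntimeError("no free memory-region slot")

    def unmap_resource(self, resource_id: int) -> None:
        for i, owner in enumerate(self.slots):
            if owner == resource_id:
                self.slots[i] = None
                return
        raise KeyError(resource_id)
```

A fixed array keeps lookup and teardown simple at the cost of a hard upper
bound on concurrently mapped blobs.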




[PATCH v9 6/9] gfxstream + rutabaga: add initial support for gfxstream

2023-08-18 Thread Gurchetan Singh
This adds initial support for gfxstream and cross-domain.  Both
features rely on virtio-gpu blob resources and context types, which
are also implemented in this patch.

gfxstream has a long and illustrious history in Android graphics
paravirtualization.  It has been powering graphics in the Android
Studio Emulator, the main developer platform, for more than a decade.

Originally conceived by Jesse Hall, it was first known as "EmuGL" [a].
The key design characteristic was a 1:1 threading model and
auto-generation, which fit nicely with the OpenGLES spec.  It also
allowed easy layering with ANGLE on the host, which provides the GLES
implementations on Windows or macOS environments.

gfxstream has traditionally been maintained by a single engineer, and
between 2015 and 2021, the goldfish throne passed to Frank Yang.
Historians often remark this glorious reign ("pax gfxstreama" is the
academic term) was comparable to that of Augustus and both Queen
Elizabeths.  Just to name a few accomplishments in a resplendent
panoply: higher versions of GLES, address space graphics, snapshot
support and CTS compliant Vulkan [b].

One major drawback was the use of out-of-tree goldfish drivers.
Android engineers didn't know much about DRM/KMS, and especially TTM,
so a simple guest-to-host pipe was conceived.

Luckily, virtio-gpu 3D started to emerge in 2016 due to the work of
the Mesa/virglrenderer communities.  In 2018, the initial virtio-gpu
port of gfxstream was done by Cuttlefish enthusiast Alistair Delva.
It was a symbol compatible replacement of virglrenderer [c] and named
"AVDVirglrenderer".  This implementation forms the basis of the
current gfxstream host implementation still in use today.

cross-domain support follows a similar arc.  Originally conceived by
Wayland aficionado David Reveman and crosvm enjoyer Zach Reizner in
2018, it initially relied on the downstream "virtio-wl" device.

In 2020 and 2021, virtio-gpu was extended to include blob resources
and multiple timelines by yours truly, features gfxstream/cross-domain
both require to function correctly.

Right now, we stand at the precipice of a truly fantastic possibility:
the Android Emulator powered by upstream QEMU and upstream Linux
kernel.  gfxstream will then be packaged properly, and app
developers can even fix gfxstream bugs on their own if they encounter
them.

It's been quite the ride, my friends.  Where will gfxstream head next,
nobody really knows.  I wouldn't be surprised if it's around for
another decade, maintained by a new generation of Android graphics
enthusiasts.

Technical details:
  - Very simple initial display integration: just used Pixman
  - Largely, 1:1 mapping of virtio-gpu hypercalls to rutabaga function
calls

Next steps for Android VMs:
  - The next step would be improving display integration and UI interfaces
with the goal of the QEMU upstream graphics being in an emulator
release [d].

Next steps for Linux VMs for display virtualization:
  - For widespread distribution, someone needs to package Sommelier or the
wayland-proxy-virtwl [e] ideally into Debian main. In addition, newer
versions of the Linux kernel come with DRM_VIRTIO_GPU_KMS option,
which allows disabling KMS hypercalls.  If anyone cares enough, it'll
probably be possible to build a custom VM variant that uses this display
virtualization strategy.

[a] https://android-review.googlesource.com/c/platform/development/+/34470
[b] 
https://android-review.googlesource.com/q/topic:%22vulkan-hostconnection-start%22
[c] 
https://android-review.googlesource.com/c/device/generic/goldfish-opengl/+/761927
[d] https://developer.android.com/studio/releases/emulator
[e] https://github.com/talex5/wayland-proxy-virtwl

Signed-off-by: Gurchetan Singh 
Tested-by: Alyssa Ross 
Tested-by: Emmanouil Pitsidianakis 
Reviewed-by: Emmanouil Pitsidianakis 
---
v1: Incorporated various suggestions by Akihiko Odaki and Bernard Berschow
- Removed GET_VIRTIO_GPU_GL / GET_RUTABAGA macros
- Used error_report(..)
- Used g_autofree to fix leaks on error paths
- Removed unnecessary casts
- added virtio-gpu-pci-rutabaga.c + virtio-vga-rutabaga.c files

v2: Incorporated various suggestions by Akihiko Odaki, Marc-André Lureau and
Bernard Berschow:
- Parenthesis in CHECK macro
- CHECK_RESULT(result, ..) --> CHECK(!result, ..)
- delay until g->parent_obj.enable = 1
- Additional cast fixes
- initialize directly in virtio_gpu_rutabaga_realize(..)
- add debug callback to hook into QEMU error's APIs

v3: Incorporated feedback from Akihiko Odaki and Alyssa Ross:
- Autodetect Wayland socket when not explicitly specified
- Fix map_blob error paths
- Add comment why we need both `res` and `resource` in create blob
- Cast and whitespace fixes
- Big endian check comes before virtio_gpu_rutabaga_init().
- VirtIOVGARUTABAGA --> VirtIOVGARutabaga

v4: Incorporated feedback from Akihiko Odaki and Alyssa Ross:

[PATCH v9 8/9] gfxstream + rutabaga: enable rutabaga

2023-08-18 Thread Gurchetan Singh
This change enables rutabaga to receive virtio-gpu-3d hypercalls
when it is active.

Signed-off-by: Gurchetan Singh 
Tested-by: Alyssa Ross 
Tested-by: Emmanouil Pitsidianakis 
Reviewed-by: Emmanouil Pitsidianakis 
---
v3: Whitespace fix (Akihiko)
v9: reorder virtio_gpu_have_udmabuf() after checking if rutabaga
is enabled to avoid spurious warnings (Akihiko)

 hw/display/virtio-gpu-base.c | 3 ++-
 hw/display/virtio-gpu.c  | 5 +++--
 softmmu/qdev-monitor.c   | 3 +++
 softmmu/vl.c | 1 +
 4 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/hw/display/virtio-gpu-base.c b/hw/display/virtio-gpu-base.c
index 4f2b0ba1f3..50c5373b65 100644
--- a/hw/display/virtio-gpu-base.c
+++ b/hw/display/virtio-gpu-base.c
@@ -223,7 +223,8 @@ virtio_gpu_base_get_features(VirtIODevice *vdev, uint64_t features,
 {
     VirtIOGPUBase *g = VIRTIO_GPU_BASE(vdev);
 
-    if (virtio_gpu_virgl_enabled(g->conf)) {
+    if (virtio_gpu_virgl_enabled(g->conf) ||
+        virtio_gpu_rutabaga_enabled(g->conf)) {
         features |= (1 << VIRTIO_GPU_F_VIRGL);
     }
     if (virtio_gpu_edid_enabled(g->conf)) {
diff --git a/hw/display/virtio-gpu.c b/hw/display/virtio-gpu.c
index 3e658f1fef..fe094addef 100644
--- a/hw/display/virtio-gpu.c
+++ b/hw/display/virtio-gpu.c
@@ -1361,8 +1361,9 @@ void virtio_gpu_device_realize(DeviceState *qdev, Error **errp)
     VirtIOGPU *g = VIRTIO_GPU(qdev);
 
     if (virtio_gpu_blob_enabled(g->parent_obj.conf)) {
-        if (!virtio_gpu_have_udmabuf()) {
-            error_setg(errp, "cannot enable blob resources without udmabuf");
+        if (!virtio_gpu_rutabaga_enabled(g->parent_obj.conf) &&
+            !virtio_gpu_have_udmabuf()) {
+            error_setg(errp, "need rutabaga or udmabuf for blob resources");
             return;
         }
 
diff --git a/softmmu/qdev-monitor.c b/softmmu/qdev-monitor.c
index 74f4e41338..1b8005ae55 100644
--- a/softmmu/qdev-monitor.c
+++ b/softmmu/qdev-monitor.c
@@ -86,6 +86,9 @@ static const QDevAlias qdev_alias_table[] = {
 { "virtio-gpu-pci", "virtio-gpu", QEMU_ARCH_VIRTIO_PCI },
 { "virtio-gpu-gl-device", "virtio-gpu-gl", QEMU_ARCH_VIRTIO_MMIO },
 { "virtio-gpu-gl-pci", "virtio-gpu-gl", QEMU_ARCH_VIRTIO_PCI },
+{ "virtio-gpu-rutabaga-device", "virtio-gpu-rutabaga",
+  QEMU_ARCH_VIRTIO_MMIO },
+{ "virtio-gpu-rutabaga-pci", "virtio-gpu-rutabaga", QEMU_ARCH_VIRTIO_PCI },
 { "virtio-input-host-device", "virtio-input-host", QEMU_ARCH_VIRTIO_MMIO },
 { "virtio-input-host-ccw", "virtio-input-host", QEMU_ARCH_VIRTIO_CCW },
 { "virtio-input-host-pci", "virtio-input-host", QEMU_ARCH_VIRTIO_PCI },
diff --git a/softmmu/vl.c b/softmmu/vl.c
index b0b96f67fa..2f98eefdf3 100644
--- a/softmmu/vl.c
+++ b/softmmu/vl.c
@@ -216,6 +216,7 @@ static struct {
 { .driver = "ati-vga",  .flag = &default_vga   },
 { .driver = "vhost-user-vga",   .flag = &default_vga   },
 { .driver = "virtio-vga-gl",.flag = &default_vga   },
+{ .driver = "virtio-vga-rutabaga",  .flag = &default_vga   },
 };
 
 static QemuOptsList qemu_rtc_opts = {
-- 
2.42.0.rc1.204.g551eb34607-goog




[PATCH v9 3/9] virtio-gpu: hostmem

2023-08-18 Thread Gurchetan Singh
From: Gerd Hoffmann 

Use VIRTIO_GPU_SHM_ID_HOST_VISIBLE as id for virtio-gpu.

Signed-off-by: Antonio Caggiano 
Tested-by: Alyssa Ross 
Acked-by: Michael S. Tsirkin 
---
 hw/display/virtio-gpu-pci.c| 14 ++
 hw/display/virtio-gpu.c|  1 +
 hw/display/virtio-vga.c| 33 -
 include/hw/virtio/virtio-gpu.h |  5 +
 4 files changed, 44 insertions(+), 9 deletions(-)

diff --git a/hw/display/virtio-gpu-pci.c b/hw/display/virtio-gpu-pci.c
index 93f214ff58..da6a99f038 100644
--- a/hw/display/virtio-gpu-pci.c
+++ b/hw/display/virtio-gpu-pci.c
@@ -33,6 +33,20 @@ static void virtio_gpu_pci_base_realize(VirtIOPCIProxy *vpci_dev, Error **errp)
 DeviceState *vdev = DEVICE(g);
 int i;
 
+if (virtio_gpu_hostmem_enabled(g->conf)) {
+vpci_dev->msix_bar_idx = 1;
+vpci_dev->modern_mem_bar_idx = 2;
+memory_region_init(&g->hostmem, OBJECT(g), "virtio-gpu-hostmem",
+   g->conf.hostmem);
+pci_register_bar(&vpci_dev->pci_dev, 4,
+ PCI_BASE_ADDRESS_SPACE_MEMORY |
+ PCI_BASE_ADDRESS_MEM_PREFETCH |
+ PCI_BASE_ADDRESS_MEM_TYPE_64,
+ &g->hostmem);
+virtio_pci_add_shm_cap(vpci_dev, 4, 0, g->conf.hostmem,
+   VIRTIO_GPU_SHM_ID_HOST_VISIBLE);
+}
+
 virtio_pci_force_virtio_1(vpci_dev);
 if (!qdev_realize(vdev, BUS(&vpci_dev->bus), errp)) {
 return;
diff --git a/hw/display/virtio-gpu.c b/hw/display/virtio-gpu.c
index bbd5c6561a..48ef0d9fad 100644
--- a/hw/display/virtio-gpu.c
+++ b/hw/display/virtio-gpu.c
@@ -1509,6 +1509,7 @@ static Property virtio_gpu_properties[] = {
  256 * MiB),
 DEFINE_PROP_BIT("blob", VirtIOGPU, parent_obj.conf.flags,
 VIRTIO_GPU_FLAG_BLOB_ENABLED, false),
+DEFINE_PROP_SIZE("hostmem", VirtIOGPU, parent_obj.conf.hostmem, 0),
 DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/hw/display/virtio-vga.c b/hw/display/virtio-vga.c
index e6fb0aa876..c8552ff760 100644
--- a/hw/display/virtio-vga.c
+++ b/hw/display/virtio-vga.c
@@ -115,17 +115,32 @@ static void virtio_vga_base_realize(VirtIOPCIProxy *vpci_dev, Error **errp)
 pci_register_bar(&vpci_dev->pci_dev, 0,
  PCI_BASE_ADDRESS_MEM_PREFETCH, &vga->vram);
 
-/*
- * Configure virtio bar and regions
- *
- * We use bar #2 for the mmio regions, to be compatible with stdvga.
- * virtio regions are moved to the end of bar #2, to make room for
- * the stdvga mmio registers at the start of bar #2.
- */
-vpci_dev->modern_mem_bar_idx = 2;
-vpci_dev->msix_bar_idx = 4;
 vpci_dev->modern_io_bar_idx = 5;
 
+if (!virtio_gpu_hostmem_enabled(g->conf)) {
+/*
+ * Configure virtio bar and regions
+ *
+ * We use bar #2 for the mmio regions, to be compatible with stdvga.
+ * virtio regions are moved to the end of bar #2, to make room for
+ * the stdvga mmio registers at the start of bar #2.
+ */
+vpci_dev->modern_mem_bar_idx = 2;
+vpci_dev->msix_bar_idx = 4;
+} else {
+vpci_dev->msix_bar_idx = 1;
+vpci_dev->modern_mem_bar_idx = 2;
+memory_region_init(&g->hostmem, OBJECT(g), "virtio-gpu-hostmem",
+   g->conf.hostmem);
+pci_register_bar(&vpci_dev->pci_dev, 4,
+ PCI_BASE_ADDRESS_SPACE_MEMORY |
+ PCI_BASE_ADDRESS_MEM_PREFETCH |
+ PCI_BASE_ADDRESS_MEM_TYPE_64,
+ &g->hostmem);
+virtio_pci_add_shm_cap(vpci_dev, 4, 0, g->conf.hostmem,
+   VIRTIO_GPU_SHM_ID_HOST_VISIBLE);
+}
+
 if (!(vpci_dev->flags & VIRTIO_PCI_FLAG_PAGE_PER_VQ)) {
 /*
  * with page-per-vq=off there is no padding space we can use
diff --git a/include/hw/virtio/virtio-gpu.h b/include/hw/virtio/virtio-gpu.h
index 8377c365ef..de4f624e94 100644
--- a/include/hw/virtio/virtio-gpu.h
+++ b/include/hw/virtio/virtio-gpu.h
@@ -108,12 +108,15 @@ enum virtio_gpu_base_conf_flags {
 (_cfg.flags & (1 << VIRTIO_GPU_FLAG_BLOB_ENABLED))
 #define virtio_gpu_context_init_enabled(_cfg) \
 (_cfg.flags & (1 << VIRTIO_GPU_FLAG_CONTEXT_INIT_ENABLED))
+#define virtio_gpu_hostmem_enabled(_cfg) \
+(_cfg.hostmem > 0)
 
 struct virtio_gpu_base_conf {
 uint32_t max_outputs;
 uint32_t flags;
 uint32_t xres;
 uint32_t yres;
+uint64_t hostmem;
 };
 
 struct virtio_gpu_ctrl_command {
@@ -137,6 +140,8 @@ struct VirtIOGPUBase {
 int renderer_blocked;
 int enable;
 
+MemoryRegion hostmem;
+
 struct virtio_gpu_scanout scanout[VIRTIO_GPU_MAX_SCANOUTS];
 
 int enabled_output_bitmask;
-- 
2.42.0.rc1.204.g551eb34607-goog




[PATCH v9 7/9] gfxstream + rutabaga: meson support

2023-08-18 Thread Gurchetan Singh
- Add meson detection of rutabaga_gfx
- Build virtio-gpu-rutabaga.c + associated vga/pci files when
  present

Signed-off-by: Gurchetan Singh 
Tested-by: Alyssa Ross 
Tested-by: Emmanouil Pitsidianakis 
Reviewed-by: Emmanouil Pitsidianakis 
---
v3: Fix alignment issues (Akihiko)

 hw/display/meson.build| 22 ++
 meson.build   |  7 +++
 meson_options.txt |  2 ++
 scripts/meson-buildoptions.sh |  3 +++
 4 files changed, 34 insertions(+)

diff --git a/hw/display/meson.build b/hw/display/meson.build
index 413ba4ab24..e362d625dd 100644
--- a/hw/display/meson.build
+++ b/hw/display/meson.build
@@ -79,6 +79,13 @@ if config_all_devices.has_key('CONFIG_VIRTIO_GPU')
 if_true: [files('virtio-gpu-gl.c', 'virtio-gpu-virgl.c'), pixman, virgl])
 hw_display_modules += {'virtio-gpu-gl': virtio_gpu_gl_ss}
   endif
+
+  if rutabaga.found()
+virtio_gpu_rutabaga_ss = ss.source_set()
+virtio_gpu_rutabaga_ss.add(when: ['CONFIG_VIRTIO_GPU', rutabaga],
+   if_true: [files('virtio-gpu-rutabaga.c'), pixman])
+hw_display_modules += {'virtio-gpu-rutabaga': virtio_gpu_rutabaga_ss}
+  endif
 endif
 
 if config_all_devices.has_key('CONFIG_VIRTIO_PCI')
@@ -95,6 +102,12 @@ if config_all_devices.has_key('CONFIG_VIRTIO_PCI')
  if_true: [files('virtio-gpu-pci-gl.c'), pixman])
 hw_display_modules += {'virtio-gpu-pci-gl': virtio_gpu_pci_gl_ss}
   endif
+  if rutabaga.found()
+virtio_gpu_pci_rutabaga_ss = ss.source_set()
+virtio_gpu_pci_rutabaga_ss.add(when: ['CONFIG_VIRTIO_GPU', 'CONFIG_VIRTIO_PCI', rutabaga],
+   if_true: [files('virtio-gpu-pci-rutabaga.c'), pixman])
+hw_display_modules += {'virtio-gpu-pci-rutabaga': virtio_gpu_pci_rutabaga_ss}
+  endif
 endif
 
 if config_all_devices.has_key('CONFIG_VIRTIO_VGA')
@@ -113,6 +126,15 @@ if config_all_devices.has_key('CONFIG_VIRTIO_VGA')
   virtio_vga_gl_ss.add(when: 'CONFIG_ACPI', if_true: files('acpi-vga.c'),
 if_false: files('acpi-vga-stub.c'))
   hw_display_modules += {'virtio-vga-gl': virtio_vga_gl_ss}
+
+  if rutabaga.found()
+virtio_vga_rutabaga_ss = ss.source_set()
+virtio_vga_rutabaga_ss.add(when: ['CONFIG_VIRTIO_VGA', rutabaga],
+   if_true: [files('virtio-vga-rutabaga.c'), pixman])
+virtio_vga_rutabaga_ss.add(when: 'CONFIG_ACPI', if_true: files('acpi-vga.c'),
+if_false: files('acpi-vga-stub.c'))
+hw_display_modules += {'virtio-vga-rutabaga': virtio_vga_rutabaga_ss}
+  endif
 endif
 
 system_ss.add(when: 'CONFIG_OMAP', if_true: files('omap_lcdc.c'))
diff --git a/meson.build b/meson.build
index 98e68ef0b1..293f388e53 100644
--- a/meson.build
+++ b/meson.build
@@ -1069,6 +1069,12 @@ if not get_option('virglrenderer').auto() or have_system or have_vhost_user_gpu
dependencies: virgl))
   endif
 endif
+rutabaga = not_found
+if not get_option('rutabaga_gfx').auto() or have_system or have_vhost_user_gpu
+  rutabaga = dependency('rutabaga_gfx_ffi',
+ method: 'pkg-config',
+ required: get_option('rutabaga_gfx'))
+endif
 blkio = not_found
 if not get_option('blkio').auto() or have_block
   blkio = dependency('blkio',
@@ -4272,6 +4278,7 @@ summary_info += {'libtasn1':  tasn1}
 summary_info += {'PAM':   pam}
 summary_info += {'iconv support': iconv}
 summary_info += {'virgl support': virgl}
+summary_info += {'rutabaga support':  rutabaga}
 summary_info += {'blkio support': blkio}
 summary_info += {'curl support':  curl}
 summary_info += {'Multipath support': mpathpersist}
diff --git a/meson_options.txt b/meson_options.txt
index aaea5ddd77..dea3bf7d9c 100644
--- a/meson_options.txt
+++ b/meson_options.txt
@@ -224,6 +224,8 @@ option('vmnet', type : 'feature', value : 'auto',
description: 'vmnet.framework network backend support')
 option('virglrenderer', type : 'feature', value : 'auto',
description: 'virgl rendering support')
+option('rutabaga_gfx', type : 'feature', value : 'auto',
+   description: 'rutabaga_gfx support')
 option('png', type : 'feature', value : 'auto',
description: 'PNG support with libpng')
 option('vnc', type : 'feature', value : 'auto',
diff --git a/scripts/meson-buildoptions.sh b/scripts/meson-buildoptions.sh
index 9da3fe299b..9a95b4f782 100644
--- a/scripts/meson-buildoptions.sh
+++ b/scripts/meson-buildoptions.sh
@@ -154,6 +154,7 @@ meson_options_help() {
   printf "%s\n" '  rbd Ceph block device driver'
   printf "%s\n" '  rdmaEnable RDMA-based migration'
   printf "%s\n" '  replication replication support'
+  printf "%s\n" '  rutabaga-gfxrutabaga_gfx support'
   printf "%s\n" '  sdl SDL user interface'
   printf "%s\n" '  sdl-image

[PATCH v9 4/9] virtio-gpu: blob prep

2023-08-18 Thread Gurchetan Singh
From: Antonio Caggiano 

This adds preparatory functions needed to:

 - decode blob cmds
 - track iovecs

Signed-off-by: Antonio Caggiano 
Signed-off-by: Dmitry Osipenko 
Signed-off-by: Gurchetan Singh 
Tested-by: Alyssa Ross 
Tested-by: Emmanouil Pitsidianakis 
Reviewed-by: Emmanouil Pitsidianakis 
---
 hw/display/virtio-gpu.c  | 10 +++---
 include/hw/virtio/virtio-gpu-bswap.h | 18 ++
 include/hw/virtio/virtio-gpu.h   |  5 +
 3 files changed, 26 insertions(+), 7 deletions(-)

diff --git a/hw/display/virtio-gpu.c b/hw/display/virtio-gpu.c
index 48ef0d9fad..3e658f1fef 100644
--- a/hw/display/virtio-gpu.c
+++ b/hw/display/virtio-gpu.c
@@ -33,15 +33,11 @@
 
 #define VIRTIO_GPU_VM_VERSION 1
 
-static struct virtio_gpu_simple_resource*
-virtio_gpu_find_resource(VirtIOGPU *g, uint32_t resource_id);
 static struct virtio_gpu_simple_resource *
 virtio_gpu_find_check_resource(VirtIOGPU *g, uint32_t resource_id,
bool require_backing,
const char *caller, uint32_t *error);
 
-static void virtio_gpu_cleanup_mapping(VirtIOGPU *g,
-   struct virtio_gpu_simple_resource *res);
 static void virtio_gpu_reset_bh(void *opaque);
 
 void virtio_gpu_update_cursor_data(VirtIOGPU *g,
@@ -116,7 +112,7 @@ static void update_cursor(VirtIOGPU *g, struct virtio_gpu_update_cursor *cursor)
   cursor->resource_id ? 1 : 0);
 }
 
-static struct virtio_gpu_simple_resource *
+struct virtio_gpu_simple_resource *
 virtio_gpu_find_resource(VirtIOGPU *g, uint32_t resource_id)
 {
 struct virtio_gpu_simple_resource *res;
@@ -904,8 +900,8 @@ void virtio_gpu_cleanup_mapping_iov(VirtIOGPU *g,
 g_free(iov);
 }
 
-static void virtio_gpu_cleanup_mapping(VirtIOGPU *g,
-   struct virtio_gpu_simple_resource *res)
+void virtio_gpu_cleanup_mapping(VirtIOGPU *g,
+struct virtio_gpu_simple_resource *res)
 {
 virtio_gpu_cleanup_mapping_iov(g, res->iov, res->iov_cnt);
 res->iov = NULL;
diff --git a/include/hw/virtio/virtio-gpu-bswap.h b/include/hw/virtio/virtio-gpu-bswap.h
index 9124108485..dd1975e2d4 100644
--- a/include/hw/virtio/virtio-gpu-bswap.h
+++ b/include/hw/virtio/virtio-gpu-bswap.h
@@ -63,10 +63,28 @@ virtio_gpu_create_blob_bswap(struct virtio_gpu_resource_create_blob *cblob)
 {
 virtio_gpu_ctrl_hdr_bswap(&cblob->hdr);
 le32_to_cpus(&cblob->resource_id);
+le32_to_cpus(&cblob->blob_mem);
 le32_to_cpus(&cblob->blob_flags);
+le32_to_cpus(&cblob->nr_entries);
+le64_to_cpus(&cblob->blob_id);
 le64_to_cpus(&cblob->size);
 }
 
+static inline void
+virtio_gpu_map_blob_bswap(struct virtio_gpu_resource_map_blob *mblob)
+{
+virtio_gpu_ctrl_hdr_bswap(&mblob->hdr);
+le32_to_cpus(&mblob->resource_id);
+le64_to_cpus(&mblob->offset);
+}
+
+static inline void
+virtio_gpu_unmap_blob_bswap(struct virtio_gpu_resource_unmap_blob *ublob)
+{
+virtio_gpu_ctrl_hdr_bswap(&ublob->hdr);
+le32_to_cpus(&ublob->resource_id);
+}
+
 static inline void
 virtio_gpu_scanout_blob_bswap(struct virtio_gpu_set_scanout_blob *ssb)
 {
diff --git a/include/hw/virtio/virtio-gpu.h b/include/hw/virtio/virtio-gpu.h
index de4f624e94..55973e112f 100644
--- a/include/hw/virtio/virtio-gpu.h
+++ b/include/hw/virtio/virtio-gpu.h
@@ -257,6 +257,9 @@ void virtio_gpu_base_fill_display_info(VirtIOGPUBase *g,
 void virtio_gpu_base_generate_edid(VirtIOGPUBase *g, int scanout,
struct virtio_gpu_resp_edid *edid);
 /* virtio-gpu.c */
+struct virtio_gpu_simple_resource *
+virtio_gpu_find_resource(VirtIOGPU *g, uint32_t resource_id);
+
 void virtio_gpu_ctrl_response(VirtIOGPU *g,
   struct virtio_gpu_ctrl_command *cmd,
   struct virtio_gpu_ctrl_hdr *resp,
@@ -275,6 +278,8 @@ int virtio_gpu_create_mapping_iov(VirtIOGPU *g,
   uint32_t *niov);
 void virtio_gpu_cleanup_mapping_iov(VirtIOGPU *g,
 struct iovec *iov, uint32_t count);
+void virtio_gpu_cleanup_mapping(VirtIOGPU *g,
+struct virtio_gpu_simple_resource *res);
 void virtio_gpu_process_cmdq(VirtIOGPU *g);
 void virtio_gpu_device_realize(DeviceState *qdev, Error **errp);
 void virtio_gpu_reset(VirtIODevice *vdev);
-- 
2.42.0.rc1.204.g551eb34607-goog




[PATCH v9 9/9] docs/system: add basic virtio-gpu documentation

2023-08-18 Thread Gurchetan Singh
This adds basic documentation for virtio-gpu.

Suggested-by: Akihiko Odaki 
Signed-off-by: Gurchetan Singh 
Tested-by: Alyssa Ross 
Tested-by: Emmanouil Pitsidianakis 
Reviewed-by: Emmanouil Pitsidianakis 
---
v2: - Incorporated suggestions by Akihiko Odaki
- Listed the currently supported capset_names (Bernard)

v3: - Incorporated suggestions by Akihiko Odaki and Alyssa Ross

v4: - Incorporated suggestions by Akihiko Odaki

v5: - Removed pci suffix from examples
- Verified that -device virtio-gpu-rutabaga works.  Strangely
  enough, I don't remember changing anything, and I remember
  it not working.  I did rebase to top of tree though.
- Fixed meson examples in crosvm docs

v8: - Remove different links for "rutabaga_gfx" and
  "gfxstream-enabled rutabaga" (Akihiko)

 docs/system/device-emulation.rst   |   1 +
 docs/system/devices/virtio-gpu.rst | 112 +
 2 files changed, 113 insertions(+)
 create mode 100644 docs/system/devices/virtio-gpu.rst

diff --git a/docs/system/device-emulation.rst b/docs/system/device-emulation.rst
index 4491c4cbf7..1167f3a9f2 100644
--- a/docs/system/device-emulation.rst
+++ b/docs/system/device-emulation.rst
@@ -91,6 +91,7 @@ Emulated Devices
devices/nvme.rst
devices/usb.rst
devices/vhost-user.rst
+   devices/virtio-gpu.rst
devices/virtio-pmem.rst
devices/vhost-user-rng.rst
devices/canokey.rst
diff --git a/docs/system/devices/virtio-gpu.rst b/docs/system/devices/virtio-gpu.rst
new file mode 100644
index 00..2b3eb536f9
--- /dev/null
+++ b/docs/system/devices/virtio-gpu.rst
@@ -0,0 +1,112 @@
+..
+   SPDX-License-Identifier: GPL-2.0
+
+virtio-gpu
+==========
+
+This document explains the setup and usage of the virtio-gpu device.
+The virtio-gpu device paravirtualizes the GPU and display controller.
+
+Linux kernel support
+
+
+virtio-gpu requires a guest Linux kernel built with the
+``CONFIG_DRM_VIRTIO_GPU`` option.
+
+QEMU virtio-gpu variants
+
+
+QEMU virtio-gpu device variants come in the following form:
+
+ * ``virtio-vga[-BACKEND]``
+ * ``virtio-gpu[-BACKEND][-INTERFACE]``
+ * ``vhost-user-vga``
+ * ``vhost-user-pci``
+
+**Backends:** QEMU provides a 2D virtio-gpu backend, and two accelerated
+backends: virglrenderer ('gl' device label) and rutabaga_gfx ('rutabaga'
+device label).  There is a vhost-user backend that runs the graphics stack
+in a separate process for improved isolation.
+
+**Interfaces:** QEMU further categorizes virtio-gpu device variants based
+on the interface exposed to the guest. The interfaces can be classified
+into VGA and non-VGA variants. The VGA ones are prefixed with virtio-vga
+or vhost-user-vga while the non-VGA ones are prefixed with virtio-gpu or
+vhost-user-gpu.
+
+The VGA ones always use the PCI interface, but for the non-VGA ones, the
+user can further pick between MMIO or PCI. For MMIO, the user can suffix
+the device name with -device, though vhost-user-gpu does not support MMIO.
+For PCI, the user can suffix it with -pci. Without these suffixes, the
+platform default will be chosen.
+
+virtio-gpu 2d
+-
+
+The default 2D backend only performs 2D operations. The guest needs to
+employ a software renderer for 3D graphics.
+
+Typically, the software renderer is provided by `Mesa`_ or `SwiftShader`_.
+Mesa's implementations (LLVMpipe, Lavapipe and virgl below) work out of box
+on typical modern Linux distributions.
+
+.. parsed-literal::
+-device virtio-gpu
+
+.. _Mesa: https://www.mesa3d.org/
+.. _SwiftShader: https://github.com/google/swiftshader
+
+virtio-gpu virglrenderer
+
+
+When using virgl accelerated graphics mode in the guest, OpenGL API calls
+are translated into an intermediate representation (see `Gallium3D`_). The
+intermediate representation is communicated to the host and the
+`virglrenderer`_ library on the host translates the intermediate
+representation back to OpenGL API calls.
+
+.. parsed-literal::
+-device virtio-gpu-gl
+
+.. _Gallium3D: https://www.freedesktop.org/wiki/Software/gallium/
+.. _virglrenderer: https://gitlab.freedesktop.org/virgl/virglrenderer/
+
+virtio-gpu rutabaga
+---
+
+virtio-gpu can also leverage rutabaga_gfx to provide `gfxstream`_
+rendering and `Wayland display passthrough`_.  With the gfxstream rendering
+mode, GLES and Vulkan calls are forwarded to the host with minimal
+modification.
+
+The crosvm book provides directions on how to build a `gfxstream-enabled
+rutabaga`_ and launch a `guest Wayland proxy`_.
+
+This device does require host blob support (``hostmem`` field below). The
+``hostmem`` field specifies the size of the virtio-gpu host memory window.
+This is typically between 256M and 8G.
+
+At least one capset (see colon separated ``capset_names`` below) must be
+specified when starting the device.  The currently supported
+``capset_names`` are ``gfxstream-vulkan`` and ``cross-domain`` on Linux
+guests. Fo

[PATCH v9 1/9] virtio: Add shared memory capability

2023-08-18 Thread Gurchetan Singh
From: "Dr. David Alan Gilbert" 

Define a new capability type 'VIRTIO_PCI_CAP_SHARED_MEMORY_CFG' to allow
defining shared memory regions with sizes and offsets of 2^32 and more.
Multiple instances of the capability are allowed and distinguished
by a device-specific 'id'.

Signed-off-by: Dr. David Alan Gilbert 
Signed-off-by: Antonio Caggiano 
Reviewed-by: Gurchetan Singh 
Signed-off-by: Gurchetan Singh 
Tested-by: Alyssa Ross 
Acked-by: Huang Rui 
Tested-by: Huang Rui 
Reviewed-by: Akihiko Odaki 
---
 hw/virtio/virtio-pci.c | 18 ++
 include/hw/virtio/virtio-pci.h |  4 
 2 files changed, 22 insertions(+)

diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index edbc0daa18..da8c9ea12d 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -1435,6 +1435,24 @@ static int virtio_pci_add_mem_cap(VirtIOPCIProxy *proxy,
 return offset;
 }
 
+int virtio_pci_add_shm_cap(VirtIOPCIProxy *proxy,
+   uint8_t bar, uint64_t offset, uint64_t length,
+   uint8_t id)
+{
+struct virtio_pci_cap64 cap = {
+.cap.cap_len = sizeof cap,
+.cap.cfg_type = VIRTIO_PCI_CAP_SHARED_MEMORY_CFG,
+};
+
+cap.cap.bar = bar;
+cap.cap.length = cpu_to_le32(length);
+cap.length_hi = cpu_to_le32(length >> 32);
+cap.cap.offset = cpu_to_le32(offset);
+cap.offset_hi = cpu_to_le32(offset >> 32);
+cap.cap.id = id;
+return virtio_pci_add_mem_cap(proxy, &cap.cap);
+}
+
 static uint64_t virtio_pci_common_read(void *opaque, hwaddr addr,
unsigned size)
 {
diff --git a/include/hw/virtio/virtio-pci.h b/include/hw/virtio/virtio-pci.h
index ab2051b64b..5a3f182f99 100644
--- a/include/hw/virtio/virtio-pci.h
+++ b/include/hw/virtio/virtio-pci.h
@@ -264,4 +264,8 @@ unsigned virtio_pci_optimal_num_queues(unsigned fixed_queues);
 void virtio_pci_set_guest_notifier_fd_handler(VirtIODevice *vdev, VirtQueue *vq,
   int n, bool assign,
   bool with_irqfd);
+
+int virtio_pci_add_shm_cap(VirtIOPCIProxy *proxy, uint8_t bar, uint64_t offset,
+   uint64_t length, uint8_t id);
+
 #endif
-- 
2.42.0.rc1.204.g551eb34607-goog




[PATCH v2 15/18] target/s390x: Use clmul_64

2023-08-18 Thread Richard Henderson
Use the generic routine for 64-bit carry-less multiply.
Remove our local version of galois_multiply64.

Signed-off-by: Richard Henderson 
---
 target/s390x/tcg/vec_int_helper.c | 58 +++
 1 file changed, 12 insertions(+), 46 deletions(-)

diff --git a/target/s390x/tcg/vec_int_helper.c 
b/target/s390x/tcg/vec_int_helper.c
index ba284b5379..b18d8a6d16 100644
--- a/target/s390x/tcg/vec_int_helper.c
+++ b/target/s390x/tcg/vec_int_helper.c
@@ -21,13 +21,6 @@ static bool s390_vec_is_zero(const S390Vector *v)
 return !v->doubleword[0] && !v->doubleword[1];
 }
 
-static void s390_vec_xor(S390Vector *res, const S390Vector *a,
- const S390Vector *b)
-{
-res->doubleword[0] = a->doubleword[0] ^ b->doubleword[0];
-res->doubleword[1] = a->doubleword[1] ^ b->doubleword[1];
-}
-
 static void s390_vec_and(S390Vector *res, const S390Vector *a,
  const S390Vector *b)
 {
@@ -166,26 +159,6 @@ DEF_VCTZ(16)
 
 /* like binary multiplication, but XOR instead of addition */
 
-static S390Vector galois_multiply64(uint64_t a, uint64_t b)
-{
-S390Vector res = {};
-S390Vector va = {
-.doubleword[1] = a,
-};
-S390Vector vb = {
-.doubleword[1] = b,
-};
-
-while (!s390_vec_is_zero(&vb)) {
-if (vb.doubleword[1] & 0x1) {
-s390_vec_xor(&res, &res, &va);
-}
-s390_vec_shl(&va, &va, 1);
-s390_vec_shr(&vb, &vb, 1);
-}
-return res;
-}
-
 /*
  * There is no carry across the two doublewords, so their order does
  * not matter.  Nor is there partial overlap between registers.
@@ -265,32 +238,25 @@ void HELPER(gvec_vgfma32)(void *v1, const void *v2, const void *v3,
 void HELPER(gvec_vgfm64)(void *v1, const void *v2, const void *v3,
  uint32_t desc)
 {
-S390Vector tmp1, tmp2;
-uint64_t a, b;
+uint64_t *q1 = v1;
+const uint64_t *q2 = v2, *q3 = v3;
+Int128 r;
 
-a = s390_vec_read_element64(v2, 0);
-b = s390_vec_read_element64(v3, 0);
-tmp1 = galois_multiply64(a, b);
-a = s390_vec_read_element64(v2, 1);
-b = s390_vec_read_element64(v3, 1);
-tmp2 = galois_multiply64(a, b);
-s390_vec_xor(v1, &tmp1, &tmp2);
+r = int128_xor(clmul_64(q2[0], q3[0]), clmul_64(q2[1], q3[1]));
+q1[0] = int128_gethi(r);
+q1[1] = int128_getlo(r);
 }
 
 void HELPER(gvec_vgfma64)(void *v1, const void *v2, const void *v3,
   const void *v4, uint32_t desc)
 {
-S390Vector tmp1, tmp2;
-uint64_t a, b;
+uint64_t *q1 = v1;
+const uint64_t *q2 = v2, *q3 = v3, *q4 = v4;
+Int128 r;
 
-a = s390_vec_read_element64(v2, 0);
-b = s390_vec_read_element64(v3, 0);
-tmp1 = galois_multiply64(a, b);
-a = s390_vec_read_element64(v2, 1);
-b = s390_vec_read_element64(v3, 1);
-tmp2 = galois_multiply64(a, b);
-s390_vec_xor(&tmp1, &tmp1, &tmp2);
-s390_vec_xor(v1, &tmp1, v4);
+r = int128_xor(clmul_64(q2[0], q3[0]), clmul_64(q2[1], q3[1]));
+q1[0] = q4[0] ^ int128_gethi(r);
+q1[1] = q4[1] ^ int128_getlo(r);
 }
 
 #define DEF_VMAL(BITS)                                                     \
-- 
2.34.1




[PATCH v2 13/18] crypto: Add generic 64-bit carry-less multiply routine

2023-08-18 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 host/include/generic/host/crypto/clmul.h | 15 +++
 include/crypto/clmul.h   | 19 +++
 crypto/clmul.c   | 18 ++
 3 files changed, 52 insertions(+)
 create mode 100644 host/include/generic/host/crypto/clmul.h

diff --git a/host/include/generic/host/crypto/clmul.h b/host/include/generic/host/crypto/clmul.h
new file mode 100644
index 00..915bfb88d3
--- /dev/null
+++ b/host/include/generic/host/crypto/clmul.h
@@ -0,0 +1,15 @@
+/*
+ * No host specific carry-less multiply acceleration.
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#ifndef GENERIC_HOST_CRYPTO_CLMUL_H
+#define GENERIC_HOST_CRYPTO_CLMUL_H
+
+#define HAVE_CLMUL_ACCEL  false
+#define ATTR_CLMUL_ACCEL
+
+Int128 clmul_64_accel(uint64_t, uint64_t)
+QEMU_ERROR("unsupported accel");
+
+#endif /* GENERIC_HOST_CRYPTO_CLMUL_H */
diff --git a/include/crypto/clmul.h b/include/crypto/clmul.h
index 0ea25a252c..c82d2d7559 100644
--- a/include/crypto/clmul.h
+++ b/include/crypto/clmul.h
@@ -8,6 +8,9 @@
 #ifndef CRYPTO_CLMUL_H
 #define CRYPTO_CLMUL_H
 
+#include "qemu/int128.h"
+#include "host/crypto/clmul.h"
+
 /**
  * clmul_8x8_low:
  *
@@ -61,4 +64,20 @@ uint64_t clmul_16x2_odd(uint64_t, uint64_t);
  */
 uint64_t clmul_32(uint32_t, uint32_t);
 
+/**
+ * clmul_64:
+ *
+ * Perform a 64x64->128 carry-less multiply.
+ */
+Int128 clmul_64_gen(uint64_t, uint64_t);
+
+static inline Int128 clmul_64(uint64_t a, uint64_t b)
+{
+if (HAVE_CLMUL_ACCEL) {
+return clmul_64_accel(a, b);
+} else {
+return clmul_64_gen(a, b);
+}
+}
+
 #endif /* CRYPTO_CLMUL_H */
diff --git a/crypto/clmul.c b/crypto/clmul.c
index 36ada1be9d..abf79cc49a 100644
--- a/crypto/clmul.c
+++ b/crypto/clmul.c
@@ -92,3 +92,21 @@ uint64_t clmul_32(uint32_t n, uint32_t m32)
 }
 return r;
 }
+
+Int128 clmul_64_gen(uint64_t n, uint64_t m)
+{
+uint64_t rl = 0, rh = 0;
+
+/* Bit 0 can only influence the low 64-bit result.  */
+if (n & 1) {
+rl = m;
+}
+
+for (int i = 1; i < 64; ++i) {
+uint64_t mask = -(n & 1);
+rl ^= (m << i) & mask;
+rh ^= (m >> (64 - i)) & mask;
+n >>= 1;
+}
+return int128_make128(rl, rh);
+}
-- 
2.34.1




[PATCH v2 14/18] target/arm: Use clmul_64

2023-08-18 Thread Richard Henderson
Use generic routine for 64-bit carry-less multiply.

Signed-off-by: Richard Henderson 
---
 target/arm/tcg/vec_helper.c | 22 --
 1 file changed, 4 insertions(+), 18 deletions(-)

diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
index ffb4b44ce4..1f93510b85 100644
--- a/target/arm/tcg/vec_helper.c
+++ b/target/arm/tcg/vec_helper.c
@@ -2003,28 +2003,14 @@ void HELPER(gvec_pmul_b)(void *vd, void *vn, void *vm, uint32_t desc)
  */
 void HELPER(gvec_pmull_q)(void *vd, void *vn, void *vm, uint32_t desc)
 {
-intptr_t i, j, opr_sz = simd_oprsz(desc);
+intptr_t i, opr_sz = simd_oprsz(desc);
 intptr_t hi = simd_data(desc);
 uint64_t *d = vd, *n = vn, *m = vm;
 
 for (i = 0; i < opr_sz / 8; i += 2) {
-uint64_t nn = n[i + hi];
-uint64_t mm = m[i + hi];
-uint64_t rhi = 0;
-uint64_t rlo = 0;
-
-/* Bit 0 can only influence the low 64-bit result.  */
-if (nn & 1) {
-rlo = mm;
-}
-
-for (j = 1; j < 64; ++j) {
-uint64_t mask = -((nn >> j) & 1);
-rlo ^= (mm << j) & mask;
-rhi ^= (mm >> (64 - j)) & mask;
-}
-d[i] = rlo;
-d[i + 1] = rhi;
+Int128 r = clmul_64(n[i + hi], m[i + hi]);
+d[i] = int128_getlo(r);
+d[i + 1] = int128_gethi(r);
 }
 clear_tail(d, opr_sz, simd_maxsz(desc));
 }
-- 
2.34.1




[PATCH v2 11/18] target/s390x: Use clmul_32* routines

2023-08-18 Thread Richard Henderson
Use generic routines for 32-bit carry-less multiply.
Remove our local version of galois_multiply32.

Signed-off-by: Richard Henderson 
---
 target/s390x/tcg/vec_int_helper.c | 75 +--
 1 file changed, 22 insertions(+), 53 deletions(-)

diff --git a/target/s390x/tcg/vec_int_helper.c b/target/s390x/tcg/vec_int_helper.c
index 11477556e5..ba284b5379 100644
--- a/target/s390x/tcg/vec_int_helper.c
+++ b/target/s390x/tcg/vec_int_helper.c
@@ -165,22 +165,6 @@ DEF_VCTZ(8)
 DEF_VCTZ(16)
 
 /* like binary multiplication, but XOR instead of addition */
-#define DEF_GALOIS_MULTIPLY(BITS, TBITS)                                   \
-static uint##TBITS##_t galois_multiply##BITS(uint##TBITS##_t a,            \
-                                             uint##TBITS##_t b)            \
-{                                                                          \
-    uint##TBITS##_t res = 0;                                               \
-                                                                           \
-    while (b) {                                                            \
-        if (b & 0x1) {                                                     \
-            res = res ^ a;                                                 \
-        }                                                                  \
-        a = a << 1;                                                        \
-        b = b >> 1;                                                        \
-    }                                                                      \
-    return res;                                                            \
-}
-DEF_GALOIS_MULTIPLY(32, 64)
 
 static S390Vector galois_multiply64(uint64_t a, uint64_t b)
 {
@@ -254,24 +238,29 @@ void HELPER(gvec_vgfma16)(void *v1, const void *v2, const void *v3,
 q1[1] = do_gfma16(q2[1], q3[1], q4[1]);
 }
 
-#define DEF_VGFM(BITS, TBITS)                                              \
-void HELPER(gvec_vgfm##BITS)(void *v1, const void *v2, const void *v3,     \
-                             uint32_t desc)                                \
-{                                                                          \
-    int i;                                                                 \
-                                                                           \
-    for (i = 0; i < (128 / TBITS); i++) {                                  \
-        uint##BITS##_t a = s390_vec_read_element##BITS(v2, i * 2);         \
-        uint##BITS##_t b = s390_vec_read_element##BITS(v3, i * 2);         \
-        uint##TBITS##_t d = galois_multiply##BITS(a, b);                   \
-                                                                           \
-        a = s390_vec_read_element##BITS(v2, i * 2 + 1);                    \
-        b = s390_vec_read_element##BITS(v3, i * 2 + 1);                    \
-        d = d ^ galois_multiply32(a, b);                                   \
-        s390_vec_write_element##TBITS(v1, i, d);                           \
-    }                                                                      \
+static inline uint64_t do_gfma32(uint64_t n, uint64_t m, uint64_t a)
+{
+return clmul_32(n, m) ^ clmul_32(n >> 32, m >> 32) ^ a;
+}
+
+void HELPER(gvec_vgfm32)(void *v1, const void *v2, const void *v3, uint32_t d)
+{
+uint64_t *q1 = v1;
+const uint64_t *q2 = v2, *q3 = v3;
+
+q1[0] = do_gfma32(q2[0], q3[0], 0);
+q1[1] = do_gfma32(q2[1], q3[1], 0);
+}
+
+void HELPER(gvec_vgfma32)(void *v1, const void *v2, const void *v3,
+ const void *v4, uint32_t d)
+{
+uint64_t *q1 = v1;
+const uint64_t *q2 = v2, *q3 = v3, *q4 = v4;
+
+q1[0] = do_gfma32(q2[0], q3[0], q4[0]);
+q1[1] = do_gfma32(q2[1], q3[1], q4[1]);
 }
-DEF_VGFM(32, 64)
 
 void HELPER(gvec_vgfm64)(void *v1, const void *v2, const void *v3,
  uint32_t desc)
@@ -288,26 +277,6 @@ void HELPER(gvec_vgfm64)(void *v1, const void *v2, const void *v3,
 s390_vec_xor(v1, &tmp1, &tmp2);
 }
 
-#define DEF_VGFMA(BITS, TBITS)                                             \
-void HELPER(gvec_vgfma##BITS)(void *v1, const void *v2, const void *v3,    \
-                              const void *v4, uint32_t desc)               \
-{                                                                          \
-    int i;                                                                 \
-                                                                           \
-    for (i = 0; i < (128 / TBITS); i++) {                                  \
-        uint##BITS##_t a = s390_vec_read_element##BITS(v2, i * 2);         \
-        uint##BITS##_t b = s390_vec_read_element##BITS(v3, i * 2);         \
-uint##TBI

[PATCH v2 08/18] target/ppc: Use clmul_16* routines

2023-08-18 Thread Richard Henderson
Use generic routines for 16-bit carry-less multiply.

Signed-off-by: Richard Henderson 
---
 target/ppc/int_helper.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
index 343874863a..10e19d8c9b 100644
--- a/target/ppc/int_helper.c
+++ b/target/ppc/int_helper.c
@@ -1438,6 +1438,14 @@ void helper_vpmsumb(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
 }
 }
 
+void helper_vpmsumh(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
+{
+for (int i = 0; i < 2; ++i) {
+uint64_t aa = a->u64[i], bb = b->u64[i];
+r->u64[i] = clmul_16x2_even(aa, bb) ^ clmul_16x2_odd(aa, bb);
+}
+}
+
 #define PMSUM(name, srcfld, trgfld, trgtyp)   \
 void helper_##name(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)  \
 { \
@@ -1458,7 +1466,6 @@ void helper_##name(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)  \
 } \
 }
 
-PMSUM(vpmsumh, u16, u32, uint32_t)
 PMSUM(vpmsumw, u32, u64, uint64_t)
 
 void helper_VPMSUMD(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
-- 
2.34.1




[PATCH v2 07/18] target/s390x: Use clmul_16* routines

2023-08-18 Thread Richard Henderson
Use generic routines for 16-bit carry-less multiply.
Remove our local version of galois_multiply16.

Signed-off-by: Richard Henderson 
---
 target/s390x/tcg/vec_int_helper.c | 27 ---
 1 file changed, 24 insertions(+), 3 deletions(-)

diff --git a/target/s390x/tcg/vec_int_helper.c b/target/s390x/tcg/vec_int_helper.c
index edff4d6b2b..11477556e5 100644
--- a/target/s390x/tcg/vec_int_helper.c
+++ b/target/s390x/tcg/vec_int_helper.c
@@ -180,7 +180,6 @@ static uint##TBITS##_t galois_multiply##BITS(uint##TBITS##_t a,\
 }                                                                          \
 return res;
\
 }
-DEF_GALOIS_MULTIPLY(16, 32)
 DEF_GALOIS_MULTIPLY(32, 64)
 
 static S390Vector galois_multiply64(uint64_t a, uint64_t b)
@@ -231,6 +230,30 @@ void HELPER(gvec_vgfma8)(void *v1, const void *v2, const 
void *v3,
 q1[1] = do_gfma8(q2[1], q3[1], q4[1]);
 }
 
+static inline uint64_t do_gfma16(uint64_t n, uint64_t m, uint64_t a)
+{
+return clmul_16x2_even(n, m) ^ clmul_16x2_odd(n, m) ^ a;
+}
+
+void HELPER(gvec_vgfm16)(void *v1, const void *v2, const void *v3, uint32_t d)
+{
+uint64_t *q1 = v1;
+const uint64_t *q2 = v2, *q3 = v3;
+
+q1[0] = do_gfma16(q2[0], q3[0], 0);
+q1[1] = do_gfma16(q2[1], q3[1], 0);
+}
+
+void HELPER(gvec_vgfma16)(void *v1, const void *v2, const void *v3,
+ const void *v4, uint32_t d)
+{
+uint64_t *q1 = v1;
+const uint64_t *q2 = v2, *q3 = v3, *q4 = v4;
+
+q1[0] = do_gfma16(q2[0], q3[0], q4[0]);
+q1[1] = do_gfma16(q2[1], q3[1], q4[1]);
+}
+
 #define DEF_VGFM(BITS, TBITS)                                          \
 void HELPER(gvec_vgfm##BITS)(void *v1, const void *v2, const void *v3, \
                              uint32_t desc)                            \
@@ -248,7 +271,6 @@ void HELPER(gvec_vgfm##BITS)(void *v1, const void *v2, const void *v3, \
 s390_vec_write_element##TBITS(v1, i, d);                           \
 }                                                                  \
 }
-DEF_VGFM(16, 32)
 DEF_VGFM(32, 64)
 
 void HELPER(gvec_vgfm64)(void *v1, const void *v2, const void *v3,
@@ -284,7 +306,6 @@ void HELPER(gvec_vgfma##BITS)(void *v1, const void *v2, const void *v3,\
 s390_vec_write_element##TBITS(v1, i, d);                           \
 }                                                                  \
 }
-DEF_VGFMA(16, 32)
 DEF_VGFMA(32, 64)
 
 void HELPER(gvec_vgfma64)(void *v1, const void *v2, const void *v3,
-- 
2.34.1




[PATCH v2 12/18] target/ppc: Use clmul_32* routines

2023-08-18 Thread Richard Henderson
Use generic routines for 32-bit carry-less multiply.

Signed-off-by: Richard Henderson 
---
 target/ppc/int_helper.c | 26 ++
 1 file changed, 6 insertions(+), 20 deletions(-)

diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
index 10e19d8c9b..ce793cf163 100644
--- a/target/ppc/int_helper.c
+++ b/target/ppc/int_helper.c
@@ -1446,28 +1446,14 @@ void helper_vpmsumh(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
 }
 }
 
-#define PMSUM(name, srcfld, trgfld, trgtyp)   \
-void helper_##name(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)  \
-{ \
-int i, j; \
-trgtyp prod[sizeof(ppc_avr_t) / sizeof(a->srcfld[0])];\
-  \
-VECTOR_FOR_INORDER_I(i, srcfld) { \
-prod[i] = 0;  \
-for (j = 0; j < sizeof(a->srcfld[0]) * 8; j++) {  \
-if (a->srcfld[i] & (1ull << j)) { \
-prod[i] ^= ((trgtyp)b->srcfld[i] << j);   \
-} \
-} \
-} \
-  \
-VECTOR_FOR_INORDER_I(i, trgfld) { \
-r->trgfld[i] = prod[2 * i] ^ prod[2 * i + 1]; \
-} \
+void helper_vpmsumw(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
+{
+for (int i = 0; i < 2; ++i) {
+uint64_t aa = a->u64[i], bb = b->u64[i];
+r->u64[i] = clmul_32(aa, bb) ^ clmul_32(aa >> 32, bb >> 32);
+}
 }
 
-PMSUM(vpmsumw, u32, u64, uint64_t)
-
 void helper_VPMSUMD(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
 {
 int i, j;
-- 
2.34.1




[PATCH v2 10/18] target/arm: Use clmul_32* routines

2023-08-18 Thread Richard Henderson
Use generic routines for 32-bit carry-less multiply.
Remove our local version of pmull_d.

Signed-off-by: Richard Henderson 
---
 target/arm/tcg/vec_helper.c | 14 +-
 1 file changed, 1 insertion(+), 13 deletions(-)

diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
index 5def86b573..ffb4b44ce4 100644
--- a/target/arm/tcg/vec_helper.c
+++ b/target/arm/tcg/vec_helper.c
@@ -2055,18 +2055,6 @@ void HELPER(sve2_pmull_h)(void *vd, void *vn, void *vm, uint32_t desc)
 }
 }
 
-static uint64_t pmull_d(uint64_t op1, uint64_t op2)
-{
-uint64_t result = 0;
-int i;
-
-for (i = 0; i < 32; ++i) {
-uint64_t mask = -((op1 >> i) & 1);
-result ^= (op2 << i) & mask;
-}
-return result;
-}
-
 void HELPER(sve2_pmull_d)(void *vd, void *vn, void *vm, uint32_t desc)
 {
 intptr_t sel = H4(simd_data(desc));
@@ -2075,7 +2063,7 @@ void HELPER(sve2_pmull_d)(void *vd, void *vn, void *vm, uint32_t desc)
 uint64_t *d = vd;
 
 for (i = 0; i < opr_sz / 8; ++i) {
-d[i] = pmull_d(n[2 * i + sel], m[2 * i + sel]);
+d[i] = clmul_32(n[2 * i + sel], m[2 * i + sel]);
 }
 }
 #endif
-- 
2.34.1




[PATCH v2 09/18] crypto: Add generic 32-bit carry-less multiply routines

2023-08-18 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 include/crypto/clmul.h |  7 +++
 crypto/clmul.c | 13 +
 2 files changed, 20 insertions(+)

diff --git a/include/crypto/clmul.h b/include/crypto/clmul.h
index c7ad28aa85..0ea25a252c 100644
--- a/include/crypto/clmul.h
+++ b/include/crypto/clmul.h
@@ -54,4 +54,11 @@ uint64_t clmul_16x2_even(uint64_t, uint64_t);
  */
 uint64_t clmul_16x2_odd(uint64_t, uint64_t);
 
+/**
+ * clmul_32:
+ *
+ * Perform a 32x32->64 carry-less multiply.
+ */
+uint64_t clmul_32(uint32_t, uint32_t);
+
 #endif /* CRYPTO_CLMUL_H */
diff --git a/crypto/clmul.c b/crypto/clmul.c
index 2c87cfbf8a..36ada1be9d 100644
--- a/crypto/clmul.c
+++ b/crypto/clmul.c
@@ -79,3 +79,16 @@ uint64_t clmul_16x2_odd(uint64_t n, uint64_t m)
 {
 return clmul_16x2_even(n >> 16, m >> 16);
 }
+
+uint64_t clmul_32(uint32_t n, uint32_t m32)
+{
+uint64_t r = 0;
+uint64_t m = m32;
+
+for (int i = 0; i < 32; ++i) {
+r ^= n & 1 ? m : 0;
+n >>= 1;
+m <<= 1;
+}
+return r;
+}
-- 
2.34.1




[PATCH v2 17/18] host/include/i386: Implement clmul.h

2023-08-18 Thread Richard Henderson
Detect PCLMUL in cpuinfo; implement the accel hook.

Signed-off-by: Richard Henderson 
---
 host/include/i386/host/cpuinfo.h|  1 +
 host/include/i386/host/crypto/clmul.h   | 29 +
 host/include/x86_64/host/crypto/clmul.h |  1 +
 include/qemu/cpuid.h|  3 +++
 util/cpuinfo-i386.c |  1 +
 5 files changed, 35 insertions(+)
 create mode 100644 host/include/i386/host/crypto/clmul.h
 create mode 100644 host/include/x86_64/host/crypto/clmul.h

diff --git a/host/include/i386/host/cpuinfo.h b/host/include/i386/host/cpuinfo.h
index 073d0a426f..7ae21568f7 100644
--- a/host/include/i386/host/cpuinfo.h
+++ b/host/include/i386/host/cpuinfo.h
@@ -27,6 +27,7 @@
 #define CPUINFO_ATOMIC_VMOVDQA  (1u << 16)
 #define CPUINFO_ATOMIC_VMOVDQU  (1u << 17)
 #define CPUINFO_AES (1u << 18)
+#define CPUINFO_PCLMUL  (1u << 19)
 
 /* Initialized with a constructor. */
 extern unsigned cpuinfo;
diff --git a/host/include/i386/host/crypto/clmul.h b/host/include/i386/host/crypto/clmul.h
new file mode 100644
index 00..dc3c814797
--- /dev/null
+++ b/host/include/i386/host/crypto/clmul.h
@@ -0,0 +1,29 @@
+/*
+ * x86 specific clmul acceleration.
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#ifndef X86_HOST_CRYPTO_CLMUL_H
+#define X86_HOST_CRYPTO_CLMUL_H
+
+#include "host/cpuinfo.h"
+#include <immintrin.h>
+
+#if defined(__PCLMUL__)
+# define HAVE_CLMUL_ACCEL  true
+# define ATTR_CLMUL_ACCEL
+#else
+# define HAVE_CLMUL_ACCEL  likely(cpuinfo & CPUINFO_PCLMUL)
+# define ATTR_CLMUL_ACCEL  __attribute__((target("pclmul")))
+#endif
+
+static inline Int128 ATTR_CLMUL_ACCEL
+clmul_64_accel(uint64_t n, uint64_t m)
+{
+union { __m128i v; Int128 s; } u;
+
+u.v = _mm_clmulepi64_si128(_mm_set_epi64x(0, n), _mm_set_epi64x(0, m), 0);
+return u.s;
+}
+
+#endif /* X86_HOST_CRYPTO_CLMUL_H */
diff --git a/host/include/x86_64/host/crypto/clmul.h b/host/include/x86_64/host/crypto/clmul.h
new file mode 100644
index 00..f25eced416
--- /dev/null
+++ b/host/include/x86_64/host/crypto/clmul.h
@@ -0,0 +1 @@
+#include "host/include/i386/host/crypto/clmul.h"
diff --git a/include/qemu/cpuid.h b/include/qemu/cpuid.h
index 35325f1995..b11161555b 100644
--- a/include/qemu/cpuid.h
+++ b/include/qemu/cpuid.h
@@ -25,6 +25,9 @@
 #endif
 
 /* Leaf 1, %ecx */
+#ifndef bit_PCLMUL
+#define bit_PCLMUL  (1 << 1)
+#endif
 #ifndef bit_SSE4_1
 #define bit_SSE4_1  (1 << 19)
 #endif
diff --git a/util/cpuinfo-i386.c b/util/cpuinfo-i386.c
index 3a7b7e0ad1..36783fd199 100644
--- a/util/cpuinfo-i386.c
+++ b/util/cpuinfo-i386.c
@@ -39,6 +39,7 @@ unsigned __attribute__((constructor)) cpuinfo_init(void)
 info |= (c & bit_SSE4_1 ? CPUINFO_SSE4 : 0);
 info |= (c & bit_MOVBE ? CPUINFO_MOVBE : 0);
 info |= (c & bit_POPCNT ? CPUINFO_POPCNT : 0);
+info |= (c & bit_PCLMUL ? CPUINFO_PCLMUL : 0);
 
 /* Our AES support requires PSHUFB as well. */
 info |= ((c & bit_AES) && (c & bit_SSSE3) ? CPUINFO_AES : 0);
-- 
2.34.1




[PATCH v2 02/18] target/arm: Use clmul_8* routines

2023-08-18 Thread Richard Henderson
Use generic routines for 8-bit carry-less multiply.
Remove our local version of pmull_h.

Signed-off-by: Richard Henderson 
---
 target/arm/tcg/vec_internal.h |  5 
 target/arm/tcg/mve_helper.c   |  8 ++
 target/arm/tcg/vec_helper.c   | 53 ---
 3 files changed, 9 insertions(+), 57 deletions(-)

diff --git a/target/arm/tcg/vec_internal.h b/target/arm/tcg/vec_internal.h
index 1f4ed80ff7..c4afba6d9f 100644
--- a/target/arm/tcg/vec_internal.h
+++ b/target/arm/tcg/vec_internal.h
@@ -219,11 +219,6 @@ int16_t do_sqrdmlah_h(int16_t, int16_t, int16_t, bool, bool, uint32_t *);
 int32_t do_sqrdmlah_s(int32_t, int32_t, int32_t, bool, bool, uint32_t *);
 int64_t do_sqrdmlah_d(int64_t, int64_t, int64_t, bool, bool);
 
-/*
- * 8 x 8 -> 16 vector polynomial multiply where the inputs are
- * in the low 8 bits of each 16-bit element
-*/
-uint64_t pmull_h(uint64_t op1, uint64_t op2);
 /*
  * 16 x 16 -> 32 vector polynomial multiply where the inputs are
  * in the low 16 bits of each 32-bit element
diff --git a/target/arm/tcg/mve_helper.c b/target/arm/tcg/mve_helper.c
index 403b345ea3..96ddfb4b3a 100644
--- a/target/arm/tcg/mve_helper.c
+++ b/target/arm/tcg/mve_helper.c
@@ -26,6 +26,7 @@
 #include "exec/exec-all.h"
 #include "tcg/tcg.h"
 #include "fpu/softfloat.h"
+#include "crypto/clmul.h"
 
 static uint16_t mve_eci_mask(CPUARMState *env)
 {
@@ -984,15 +985,12 @@ DO_2OP_L(vmulltuw, 1, 4, uint32_t, 8, uint64_t, DO_MUL)
  * Polynomial multiply. We can always do this generating 64 bits
  * of the result at a time, so we don't need to use DO_2OP_L.
  */
-#define VMULLPH_MASK 0x00ff00ff00ff00ffULL
 #define VMULLPW_MASK 0xULL
-#define DO_VMULLPBH(N, M) pmull_h((N) & VMULLPH_MASK, (M) & VMULLPH_MASK)
-#define DO_VMULLPTH(N, M) DO_VMULLPBH((N) >> 8, (M) >> 8)
 #define DO_VMULLPBW(N, M) pmull_w((N) & VMULLPW_MASK, (M) & VMULLPW_MASK)
 #define DO_VMULLPTW(N, M) DO_VMULLPBW((N) >> 16, (M) >> 16)
 
-DO_2OP(vmullpbh, 8, uint64_t, DO_VMULLPBH)
-DO_2OP(vmullpth, 8, uint64_t, DO_VMULLPTH)
+DO_2OP(vmullpbh, 8, uint64_t, clmul_8x4_even)
+DO_2OP(vmullpth, 8, uint64_t, clmul_8x4_odd)
 DO_2OP(vmullpbw, 8, uint64_t, DO_VMULLPBW)
 DO_2OP(vmullptw, 8, uint64_t, DO_VMULLPTW)
 
diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
index 6712a2c790..cd630ff905 100644
--- a/target/arm/tcg/vec_helper.c
+++ b/target/arm/tcg/vec_helper.c
@@ -23,6 +23,7 @@
 #include "tcg/tcg-gvec-desc.h"
 #include "fpu/softfloat.h"
 #include "qemu/int128.h"
+#include "crypto/clmul.h"
 #include "vec_internal.h"
 
 /*
@@ -1986,21 +1987,11 @@ void HELPER(gvec_ushl_h)(void *vd, void *vn, void *vm, uint32_t desc)
  */
 void HELPER(gvec_pmul_b)(void *vd, void *vn, void *vm, uint32_t desc)
 {
-intptr_t i, j, opr_sz = simd_oprsz(desc);
+intptr_t i, opr_sz = simd_oprsz(desc);
 uint64_t *d = vd, *n = vn, *m = vm;
 
 for (i = 0; i < opr_sz / 8; ++i) {
-uint64_t nn = n[i];
-uint64_t mm = m[i];
-uint64_t rr = 0;
-
-for (j = 0; j < 8; ++j) {
-uint64_t mask = (nn & 0x0101010101010101ull) * 0xff;
-rr ^= mm & mask;
-mm = (mm << 1) & 0xfefefefefefefefeull;
-nn >>= 1;
-}
-d[i] = rr;
+d[i] = clmul_8x8_low(n[i], m[i]);
 }
 clear_tail(d, opr_sz, simd_maxsz(desc));
 }
@@ -2038,22 +2029,6 @@ void HELPER(gvec_pmull_q)(void *vd, void *vn, void *vm, uint32_t desc)
 clear_tail(d, opr_sz, simd_maxsz(desc));
 }
 
-/*
- * 8x8->16 polynomial multiply.
- *
- * The byte inputs are expanded to (or extracted from) half-words.
- * Note that neon and sve2 get the inputs from different positions.
- * This allows 4 bytes to be processed in parallel with uint64_t.
- */
-
-static uint64_t expand_byte_to_half(uint64_t x)
-{
-return  (x & 0x00ff)
- | ((x & 0xff00) << 8)
- | ((x & 0x00ff) << 16)
- | ((x & 0xff00) << 24);
-}
-
 uint64_t pmull_w(uint64_t op1, uint64_t op2)
 {
 uint64_t result = 0;
@@ -2067,29 +2042,16 @@ uint64_t pmull_w(uint64_t op1, uint64_t op2)
 return result;
 }
 
-uint64_t pmull_h(uint64_t op1, uint64_t op2)
-{
-uint64_t result = 0;
-int i;
-for (i = 0; i < 8; ++i) {
-uint64_t mask = (op1 & 0x0001000100010001ull) * 0x;
-result ^= op2 & mask;
-op1 >>= 1;
-op2 <<= 1;
-}
-return result;
-}
-
 void HELPER(neon_pmull_h)(void *vd, void *vn, void *vm, uint32_t desc)
 {
 int hi = simd_data(desc);
 uint64_t *d = vd, *n = vn, *m = vm;
 uint64_t nn = n[hi], mm = m[hi];
 
-d[0] = pmull_h(expand_byte_to_half(nn), expand_byte_to_half(mm));
+d[0] = clmul_8x4_packed(nn, mm);
 nn >>= 32;
 mm >>= 32;
-d[1] = pmull_h(expand_byte_to_half(nn), expand_byte_to_half(mm));
+d[1] = clmul_8x4_packed(nn, mm);
 
 clear_tail(d, 16, simd_maxsz(desc));
 }
@@ -2102,10 +2064,7 @@ void HELPER(sve2_pmull_h)(void *vd, void *vn, void *vm, uint32_t desc)
 uint64_t *d = vd, *n = vn

[PATCH v2 06/18] target/arm: Use clmul_16* routines

2023-08-18 Thread Richard Henderson
Use generic routines for 16-bit carry-less multiply.
Remove our local version of pmull_w.

Signed-off-by: Richard Henderson 
---
 target/arm/tcg/vec_internal.h |  6 --
 target/arm/tcg/mve_helper.c   |  8 ++--
 target/arm/tcg/vec_helper.c   | 13 -
 3 files changed, 2 insertions(+), 25 deletions(-)

diff --git a/target/arm/tcg/vec_internal.h b/target/arm/tcg/vec_internal.h
index c4afba6d9f..3ca1b94ccf 100644
--- a/target/arm/tcg/vec_internal.h
+++ b/target/arm/tcg/vec_internal.h
@@ -219,12 +219,6 @@ int16_t do_sqrdmlah_h(int16_t, int16_t, int16_t, bool, bool, uint32_t *);
 int32_t do_sqrdmlah_s(int32_t, int32_t, int32_t, bool, bool, uint32_t *);
 int64_t do_sqrdmlah_d(int64_t, int64_t, int64_t, bool, bool);
 
-/*
- * 16 x 16 -> 32 vector polynomial multiply where the inputs are
- * in the low 16 bits of each 32-bit element
- */
-uint64_t pmull_w(uint64_t op1, uint64_t op2);
-
 /**
  * bfdotadd:
  * @sum: addend
diff --git a/target/arm/tcg/mve_helper.c b/target/arm/tcg/mve_helper.c
index 96ddfb4b3a..c666a96ba1 100644
--- a/target/arm/tcg/mve_helper.c
+++ b/target/arm/tcg/mve_helper.c
@@ -985,14 +985,10 @@ DO_2OP_L(vmulltuw, 1, 4, uint32_t, 8, uint64_t, DO_MUL)
  * Polynomial multiply. We can always do this generating 64 bits
  * of the result at a time, so we don't need to use DO_2OP_L.
  */
-#define VMULLPW_MASK 0xULL
-#define DO_VMULLPBW(N, M) pmull_w((N) & VMULLPW_MASK, (M) & VMULLPW_MASK)
-#define DO_VMULLPTW(N, M) DO_VMULLPBW((N) >> 16, (M) >> 16)
-
 DO_2OP(vmullpbh, 8, uint64_t, clmul_8x4_even)
 DO_2OP(vmullpth, 8, uint64_t, clmul_8x4_odd)
-DO_2OP(vmullpbw, 8, uint64_t, DO_VMULLPBW)
-DO_2OP(vmullptw, 8, uint64_t, DO_VMULLPTW)
+DO_2OP(vmullpbw, 8, uint64_t, clmul_16x2_even)
+DO_2OP(vmullptw, 8, uint64_t, clmul_16x2_odd)
 
 /*
  * Because the computation type is at least twice as large as required,
diff --git a/target/arm/tcg/vec_helper.c b/target/arm/tcg/vec_helper.c
index cd630ff905..5def86b573 100644
--- a/target/arm/tcg/vec_helper.c
+++ b/target/arm/tcg/vec_helper.c
@@ -2029,19 +2029,6 @@ void HELPER(gvec_pmull_q)(void *vd, void *vn, void *vm, uint32_t desc)
 clear_tail(d, opr_sz, simd_maxsz(desc));
 }
 
-uint64_t pmull_w(uint64_t op1, uint64_t op2)
-{
-uint64_t result = 0;
-int i;
-for (i = 0; i < 16; ++i) {
-uint64_t mask = (op1 & 0x00010001ull) * 0x;
-result ^= op2 & mask;
-op1 >>= 1;
-op2 <<= 1;
-}
-return result;
-}
-
 void HELPER(neon_pmull_h)(void *vd, void *vn, void *vm, uint32_t desc)
 {
 int hi = simd_data(desc);
-- 
2.34.1




[PATCH v2 04/18] target/ppc: Use clmul_8* routines

2023-08-18 Thread Richard Henderson
Use generic routines for 8-bit carry-less multiply.

Signed-off-by: Richard Henderson 
---
 target/ppc/int_helper.c | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
index 834da80fe3..343874863a 100644
--- a/target/ppc/int_helper.c
+++ b/target/ppc/int_helper.c
@@ -26,6 +26,7 @@
 #include "exec/helper-proto.h"
 #include "crypto/aes.h"
 #include "crypto/aes-round.h"
+#include "crypto/clmul.h"
 #include "fpu/softfloat.h"
 #include "qapi/error.h"
 #include "qemu/guest-random.h"
@@ -1425,6 +1426,18 @@ void helper_vbpermq(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
 #undef VBPERMQ_INDEX
 #undef VBPERMQ_DW
 
+/*
+ * There is no carry across the two doublewords, so their order does
+ * not matter.  Nor is there partial overlap between registers.
+ */
+void helper_vpmsumb(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
+{
+for (int i = 0; i < 2; ++i) {
+uint64_t aa = a->u64[i], bb = b->u64[i];
+r->u64[i] = clmul_8x4_even(aa, bb) ^ clmul_8x4_odd(aa, bb);
+}
+}
+
 #define PMSUM(name, srcfld, trgfld, trgtyp)   \
 void helper_##name(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)  \
 { \
@@ -1445,7 +1458,6 @@ void helper_##name(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)  \
 } \
 }
 
-PMSUM(vpmsumb, u8, u16, uint16_t)
 PMSUM(vpmsumh, u16, u32, uint32_t)
 PMSUM(vpmsumw, u32, u64, uint64_t)
 
-- 
2.34.1




[PATCH v2 18/18] host/include/aarch64: Implement clmul.h

2023-08-18 Thread Richard Henderson
Detect PMULL in cpuinfo; implement the accel hook.

Signed-off-by: Richard Henderson 
---
 host/include/aarch64/host/cpuinfo.h  |  1 +
 host/include/aarch64/host/crypto/clmul.h | 41 
 util/cpuinfo-aarch64.c   |  4 ++-
 3 files changed, 45 insertions(+), 1 deletion(-)
 create mode 100644 host/include/aarch64/host/crypto/clmul.h

diff --git a/host/include/aarch64/host/cpuinfo.h b/host/include/aarch64/host/cpuinfo.h
index 769626b098..fe8c3b3fd1 100644
--- a/host/include/aarch64/host/cpuinfo.h
+++ b/host/include/aarch64/host/cpuinfo.h
@@ -10,6 +10,7 @@
 #define CPUINFO_LSE (1u << 1)
 #define CPUINFO_LSE2(1u << 2)
 #define CPUINFO_AES (1u << 3)
+#define CPUINFO_PMULL   (1u << 4)
 
 /* Initialized with a constructor. */
 extern unsigned cpuinfo;
diff --git a/host/include/aarch64/host/crypto/clmul.h b/host/include/aarch64/host/crypto/clmul.h
new file mode 100644
index 00..bb516d8b2f
--- /dev/null
+++ b/host/include/aarch64/host/crypto/clmul.h
@@ -0,0 +1,41 @@
+/*
+ * AArch64 specific clmul acceleration.
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#ifndef AARCH64_HOST_CRYPTO_CLMUL_H
+#define AARCH64_HOST_CRYPTO_CLMUL_H
+
+#include "host/cpuinfo.h"
+#include <arm_neon.h>
+
+/*
+ * 64x64->128 pmull is available with FEAT_PMULL.
+ * Both FEAT_AES and FEAT_PMULL are covered under the same macro.
+ */
+#ifdef __ARM_FEATURE_AES
+# define HAVE_CLMUL_ACCEL  true
+#else
+# define HAVE_CLMUL_ACCEL  likely(cpuinfo & CPUINFO_PMULL)
+#endif
+#if !defined(__ARM_FEATURE_AES) && defined(CONFIG_ARM_AES_BUILTIN)
+# define ATTR_CLMUL_ACCEL  __attribute__((target("+crypto")))
+#else
+# define ATTR_CLMUL_ACCEL
+#endif
+
+static inline Int128 ATTR_CLMUL_ACCEL
+clmul_64_accel(uint64_t n, uint64_t m)
+{
+union { poly128_t v; Int128 s; } u;
+
+#ifdef CONFIG_ARM_AES_BUILTIN
+u.v = vmull_p64((poly64_t)n, (poly64_t)m);
+#else
+asm(".arch_extension aes\n\t"
+"pmull %0.1q, %1.1d, %2.1d" : "=w"(u.v) : "w"(n), "w"(m));
+#endif
+return u.s;
+}
+
+#endif /* AARCH64_HOST_CRYPTO_CLMUL_H */
diff --git a/util/cpuinfo-aarch64.c b/util/cpuinfo-aarch64.c
index ababc39550..1d565b8420 100644
--- a/util/cpuinfo-aarch64.c
+++ b/util/cpuinfo-aarch64.c
@@ -56,12 +56,14 @@ unsigned __attribute__((constructor)) cpuinfo_init(void)
 unsigned long hwcap = qemu_getauxval(AT_HWCAP);
 info |= (hwcap & HWCAP_ATOMICS ? CPUINFO_LSE : 0);
 info |= (hwcap & HWCAP_USCAT ? CPUINFO_LSE2 : 0);
-info |= (hwcap & HWCAP_AES ? CPUINFO_AES: 0);
+info |= (hwcap & HWCAP_AES ? CPUINFO_AES : 0);
+info |= (hwcap & HWCAP_PMULL ? CPUINFO_PMULL : 0);
 #endif
 #ifdef CONFIG_DARWIN
 info |= sysctl_for_bool("hw.optional.arm.FEAT_LSE") * CPUINFO_LSE;
 info |= sysctl_for_bool("hw.optional.arm.FEAT_LSE2") * CPUINFO_LSE2;
 info |= sysctl_for_bool("hw.optional.arm.FEAT_AES") * CPUINFO_AES;
+info |= sysctl_for_bool("hw.optional.arm.FEAT_PMULL") * CPUINFO_PMULL;
 #endif
 
 cpuinfo = info;
-- 
2.34.1




[PATCH v2 05/18] crypto: Add generic 16-bit carry-less multiply routines

2023-08-18 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 include/crypto/clmul.h | 16 
 crypto/clmul.c | 21 +
 2 files changed, 37 insertions(+)

diff --git a/include/crypto/clmul.h b/include/crypto/clmul.h
index 153b5e3057..c7ad28aa85 100644
--- a/include/crypto/clmul.h
+++ b/include/crypto/clmul.h
@@ -38,4 +38,20 @@ uint64_t clmul_8x4_odd(uint64_t, uint64_t);
  */
 uint64_t clmul_8x4_packed(uint32_t, uint32_t);
 
+/**
+ * clmul_16x2_even:
+ *
+ * Perform two 16x16->32 carry-less multiplies.
+ * The odd words of the inputs are ignored.
+ */
+uint64_t clmul_16x2_even(uint64_t, uint64_t);
+
+/**
+ * clmul_16x2_odd:
+ *
+ * Perform two 16x16->32 carry-less multiplies.
 * The even words of the inputs are ignored.
+ */
+uint64_t clmul_16x2_odd(uint64_t, uint64_t);
+
 #endif /* CRYPTO_CLMUL_H */
diff --git a/crypto/clmul.c b/crypto/clmul.c
index 82d873fee5..2c87cfbf8a 100644
--- a/crypto/clmul.c
+++ b/crypto/clmul.c
@@ -58,3 +58,24 @@ uint64_t clmul_8x4_packed(uint32_t n, uint32_t m)
 {
 return clmul_8x4_even_int(unpack_8_to_16(n), unpack_8_to_16(m));
 }
+
+uint64_t clmul_16x2_even(uint64_t n, uint64_t m)
+{
+uint64_t r = 0;
+
+n &= 0xull;
+m &= 0xull;
+
+for (int i = 0; i < 16; ++i) {
+uint64_t mask = (n & 0x00010001ull) * 0xull;
+r ^= m & mask;
+n >>= 1;
+m <<= 1;
+}
+return r;
+}
+
+uint64_t clmul_16x2_odd(uint64_t n, uint64_t m)
+{
+return clmul_16x2_even(n >> 16, m >> 16);
+}
-- 
2.34.1




[PATCH v2 00/18] crypto: Provide clmul.h and host accel

2023-08-18 Thread Richard Henderson
Inspired by Ard Biesheuvel's RFC patches [1] for accelerating
carry-less multiply under emulation.

Changes for v2:
  * Only accelerate clmul_64; keep generic helpers for other sizes.
  * Drop most of the Int128 interfaces, except for clmul_64.
  * Use the same acceleration format as aes-round.h.


r~


[1] https://patchew.org/QEMU/20230601123332.3297404-1-a...@kernel.org/

Richard Henderson (18):
  crypto: Add generic 8-bit carry-less multiply routines
  target/arm: Use clmul_8* routines
  target/s390x: Use clmul_8* routines
  target/ppc: Use clmul_8* routines
  crypto: Add generic 16-bit carry-less multiply routines
  target/arm: Use clmul_16* routines
  target/s390x: Use clmul_16* routines
  target/ppc: Use clmul_16* routines
  crypto: Add generic 32-bit carry-less multiply routines
  target/arm: Use clmul_32* routines
  target/s390x: Use clmul_32* routines
  target/ppc: Use clmul_32* routines
  crypto: Add generic 64-bit carry-less multiply routine
  target/arm: Use clmul_64
  target/s390x: Use clmul_64
  target/ppc: Use clmul_64
  host/include/i386: Implement clmul.h
  host/include/aarch64: Implement clmul.h

 host/include/aarch64/host/cpuinfo.h  |   1 +
 host/include/aarch64/host/crypto/clmul.h |  41 +
 host/include/generic/host/crypto/clmul.h |  15 ++
 host/include/i386/host/cpuinfo.h |   1 +
 host/include/i386/host/crypto/clmul.h|  29 
 host/include/x86_64/host/crypto/clmul.h  |   1 +
 include/crypto/clmul.h   |  83 ++
 include/qemu/cpuid.h |   3 +
 target/arm/tcg/vec_internal.h|  11 --
 crypto/clmul.c   | 112 ++
 target/arm/tcg/mve_helper.c  |  16 +-
 target/arm/tcg/vec_helper.c  | 102 ++---
 target/ppc/int_helper.c  |  64 
 target/s390x/tcg/vec_int_helper.c| 186 ++-
 util/cpuinfo-aarch64.c   |   4 +-
 util/cpuinfo-i386.c  |   1 +
 crypto/meson.build   |   9 +-
 17 files changed, 425 insertions(+), 254 deletions(-)
 create mode 100644 host/include/aarch64/host/crypto/clmul.h
 create mode 100644 host/include/generic/host/crypto/clmul.h
 create mode 100644 host/include/i386/host/crypto/clmul.h
 create mode 100644 host/include/x86_64/host/crypto/clmul.h
 create mode 100644 include/crypto/clmul.h
 create mode 100644 crypto/clmul.c

-- 
2.34.1




[PATCH v2 16/18] target/ppc: Use clmul_64

2023-08-18 Thread Richard Henderson
Use generic routine for 64-bit carry-less multiply.

Signed-off-by: Richard Henderson 
---
 target/ppc/int_helper.c | 17 +++--
 1 file changed, 3 insertions(+), 14 deletions(-)

diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
index ce793cf163..432834c7d5 100644
--- a/target/ppc/int_helper.c
+++ b/target/ppc/int_helper.c
@@ -1456,20 +1456,9 @@ void helper_vpmsumw(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
 
 void helper_VPMSUMD(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
 {
-int i, j;
-Int128 tmp, prod[2] = {int128_zero(), int128_zero()};
-
-for (j = 0; j < 64; j++) {
-for (i = 0; i < ARRAY_SIZE(r->u64); i++) {
-if (a->VsrD(i) & (1ull << j)) {
-tmp = int128_make64(b->VsrD(i));
-tmp = int128_lshift(tmp, j);
-prod[i] = int128_xor(prod[i], tmp);
-}
-}
-}
-
-r->s128 = int128_xor(prod[0], prod[1]);
+Int128 e = clmul_64(a->u64[0], b->u64[0]);
+Int128 o = clmul_64(a->u64[1], b->u64[1]);
+r->s128 = int128_xor(e, o);
 }
 
 #if HOST_BIG_ENDIAN
-- 
2.34.1




[PATCH v2 01/18] crypto: Add generic 8-bit carry-less multiply routines

2023-08-18 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 include/crypto/clmul.h | 41 +
 crypto/clmul.c | 60 ++
 crypto/meson.build |  9 ---
 3 files changed, 107 insertions(+), 3 deletions(-)
 create mode 100644 include/crypto/clmul.h
 create mode 100644 crypto/clmul.c

diff --git a/include/crypto/clmul.h b/include/crypto/clmul.h
new file mode 100644
index 00..153b5e3057
--- /dev/null
+++ b/include/crypto/clmul.h
@@ -0,0 +1,41 @@
+/*
+ * Carry-less multiply operations.
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ *
+ * Copyright (C) 2023 Linaro, Ltd.
+ */
+
+#ifndef CRYPTO_CLMUL_H
+#define CRYPTO_CLMUL_H
+
+/**
+ * clmul_8x8_low:
+ *
+ * Perform eight 8x8->8 carry-less multiplies.
+ */
+uint64_t clmul_8x8_low(uint64_t, uint64_t);
+
+/**
+ * clmul_8x4_even:
+ *
+ * Perform four 8x8->16 carry-less multiplies.
+ * The odd bytes of the inputs are ignored.
+ */
+uint64_t clmul_8x4_even(uint64_t, uint64_t);
+
+/**
+ * clmul_8x4_odd:
+ *
+ * Perform four 8x8->16 carry-less multiplies.
+ * The even bytes of the inputs are ignored.
+ */
+uint64_t clmul_8x4_odd(uint64_t, uint64_t);
+
+/**
+ * clmul_8x4_packed:
+ *
+ * Perform four 8x8->16 carry-less multiplies.
+ */
+uint64_t clmul_8x4_packed(uint32_t, uint32_t);
+
+#endif /* CRYPTO_CLMUL_H */
diff --git a/crypto/clmul.c b/crypto/clmul.c
new file mode 100644
index 00..82d873fee5
--- /dev/null
+++ b/crypto/clmul.c
@@ -0,0 +1,60 @@
+/*
+ * Carry-less multiply operations.
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ *
+ * Copyright (C) 2023 Linaro, Ltd.
+ */
+
+#include "qemu/osdep.h"
+#include "crypto/clmul.h"
+
+uint64_t clmul_8x8_low(uint64_t n, uint64_t m)
+{
+uint64_t r = 0;
+
+for (int i = 0; i < 8; ++i) {
+uint64_t mask = (n & 0x0101010101010101ull) * 0xff;
+r ^= m & mask;
+m = (m << 1) & 0xfefefefefefefefeull;
+n >>= 1;
+}
+return r;
+}
+
+static uint64_t clmul_8x4_even_int(uint64_t n, uint64_t m)
+{
+uint64_t r = 0;
+
+for (int i = 0; i < 8; ++i) {
+uint64_t mask = (n & 0x0001000100010001ull) * 0x;
+r ^= m & mask;
+n >>= 1;
+m <<= 1;
+}
+return r;
+}
+
+uint64_t clmul_8x4_even(uint64_t n, uint64_t m)
+{
+n &= 0x00ff00ff00ff00ffull;
+m &= 0x00ff00ff00ff00ffull;
+return clmul_8x4_even_int(n, m);
+}
+
+uint64_t clmul_8x4_odd(uint64_t n, uint64_t m)
+{
+return clmul_8x4_even(n >> 8, m >> 8);
+}
+
+static uint64_t unpack_8_to_16(uint64_t x)
+{
+return  (x & 0x00ff)
+ | ((x & 0xff00) << 8)
+ | ((x & 0x00ff) << 16)
+ | ((x & 0xff00) << 24);
+}
+
+uint64_t clmul_8x4_packed(uint32_t n, uint32_t m)
+{
+return clmul_8x4_even_int(unpack_8_to_16(n), unpack_8_to_16(m));
+}
diff --git a/crypto/meson.build b/crypto/meson.build
index 5f03a30d34..9ac1a89802 100644
--- a/crypto/meson.build
+++ b/crypto/meson.build
@@ -48,9 +48,12 @@ if have_afalg
 endif
 crypto_ss.add(when: gnutls, if_true: files('tls-cipher-suites.c'))
 
-util_ss.add(files('sm4.c'))
-util_ss.add(files('aes.c'))
-util_ss.add(files('init.c'))
+util_ss.add(files(
+  'aes.c',
+  'clmul.c',
+  'init.c',
+  'sm4.c',
+))
 if gnutls.found()
   util_ss.add(gnutls)
 endif
-- 
2.34.1




[PATCH v2 03/18] target/s390x: Use clmul_8* routines

2023-08-18 Thread Richard Henderson
Use generic routines for 8-bit carry-less multiply.
Remove our local version of galois_multiply8.

Signed-off-by: Richard Henderson 
---
 target/s390x/tcg/vec_int_helper.c | 32 ---
 1 file changed, 29 insertions(+), 3 deletions(-)

diff --git a/target/s390x/tcg/vec_int_helper.c b/target/s390x/tcg/vec_int_helper.c
index 53ab5c5eb3..edff4d6b2b 100644
--- a/target/s390x/tcg/vec_int_helper.c
+++ b/target/s390x/tcg/vec_int_helper.c
@@ -14,6 +14,7 @@
 #include "vec.h"
 #include "exec/helper-proto.h"
 #include "tcg/tcg-gvec-desc.h"
+#include "crypto/clmul.h"
 
 static bool s390_vec_is_zero(const S390Vector *v)
 {
@@ -179,7 +180,6 @@ static uint##TBITS##_t galois_multiply##BITS(uint##TBITS##_t a,\
 }                                                                  \
 return res;                                                        \
 }
-DEF_GALOIS_MULTIPLY(8, 16)
 DEF_GALOIS_MULTIPLY(16, 32)
 DEF_GALOIS_MULTIPLY(32, 64)
 
@@ -203,6 +203,34 @@ static S390Vector galois_multiply64(uint64_t a, uint64_t b)
 return res;
 }
 
+/*
+ * There is no carry across the two doublewords, so their order does
+ * not matter.  Nor is there partial overlap between registers.
+ */
+static inline uint64_t do_gfma8(uint64_t n, uint64_t m, uint64_t a)
+{
+return clmul_8x4_even(n, m) ^ clmul_8x4_odd(n, m) ^ a;
+}
+
+void HELPER(gvec_vgfm8)(void *v1, const void *v2, const void *v3, uint32_t d)
+{
+uint64_t *q1 = v1;
+const uint64_t *q2 = v2, *q3 = v3;
+
+q1[0] = do_gfma8(q2[0], q3[0], 0);
+q1[1] = do_gfma8(q2[1], q3[1], 0);
+}
+
+void HELPER(gvec_vgfma8)(void *v1, const void *v2, const void *v3,
+ const void *v4, uint32_t desc)
+{
+uint64_t *q1 = v1;
+const uint64_t *q2 = v2, *q3 = v3, *q4 = v4;
+
+q1[0] = do_gfma8(q2[0], q3[0], q4[0]);
+q1[1] = do_gfma8(q2[1], q3[1], q4[1]);
+}
+
 #define DEF_VGFM(BITS, TBITS)                                          \
 void HELPER(gvec_vgfm##BITS)(void *v1, const void *v2, const void *v3, \
                              uint32_t desc)                            \
@@ -220,7 +248,6 @@ void HELPER(gvec_vgfm##BITS)(void *v1, const void *v2, const void *v3, \
 s390_vec_write_element##TBITS(v1, i, d);                           \
 }                                                                  \
 }
-DEF_VGFM(8, 16)
 DEF_VGFM(16, 32)
 DEF_VGFM(32, 64)
 
@@ -257,7 +284,6 @@ void HELPER(gvec_vgfma##BITS)(void *v1, const void *v2, const void *v3,\
 s390_vec_write_element##TBITS(v1, i, d);                           \
 }                                                                  \
 }
-DEF_VGFMA(8, 16)
 DEF_VGFMA(16, 32)
 DEF_VGFMA(32, 64)
 
-- 
2.34.1




[PATCH v2 04/23] target/arm: Use tcg_gen_negsetcond_*

2023-08-18 Thread Richard Henderson
Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/tcg/translate-a64.c | 22 +-
 target/arm/tcg/translate.c | 12 
 2 files changed, 13 insertions(+), 21 deletions(-)

diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 5fa1257d32..da686cc953 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -4935,9 +4935,12 @@ static void disas_cond_select(DisasContext *s, uint32_t insn)
 
 if (rn == 31 && rm == 31 && (else_inc ^ else_inv)) {
 /* CSET & CSETM.  */
-tcg_gen_setcond_i64(tcg_invert_cond(c.cond), tcg_rd, c.value, zero);
 if (else_inv) {
-tcg_gen_neg_i64(tcg_rd, tcg_rd);
+tcg_gen_negsetcond_i64(tcg_invert_cond(c.cond),
+   tcg_rd, c.value, zero);
+} else {
+tcg_gen_setcond_i64(tcg_invert_cond(c.cond),
+tcg_rd, c.value, zero);
 }
 } else {
 TCGv_i64 t_true = cpu_reg(s, rn);
@@ -8670,13 +8673,10 @@ static void handle_3same_64(DisasContext *s, int opcode, bool u,
 }
 break;
 case 0x6: /* CMGT, CMHI */
-/* 64 bit integer comparison, result = test ? (2^64 - 1) : 0.
- * We implement this using setcond (test) and then negating.
- */
 cond = u ? TCG_COND_GTU : TCG_COND_GT;
 do_cmop:
-tcg_gen_setcond_i64(cond, tcg_rd, tcg_rn, tcg_rm);
-tcg_gen_neg_i64(tcg_rd, tcg_rd);
+/* 64 bit integer comparison, result = test ? -1 : 0. */
+tcg_gen_negsetcond_i64(cond, tcg_rd, tcg_rn, tcg_rm);
 break;
 case 0x7: /* CMGE, CMHS */
 cond = u ? TCG_COND_GEU : TCG_COND_GE;
@@ -9265,14 +9265,10 @@ static void handle_2misc_64(DisasContext *s, int opcode, bool u,
 }
 break;
 case 0xa: /* CMLT */
-/* 64 bit integer comparison against zero, result is
- * test ? (2^64 - 1) : 0. We implement via setcond(!test) and
- * subtracting 1.
- */
 cond = TCG_COND_LT;
 do_cmop:
-tcg_gen_setcondi_i64(cond, tcg_rd, tcg_rn, 0);
-tcg_gen_neg_i64(tcg_rd, tcg_rd);
+/* 64 bit integer comparison against zero, result is test ? -1 : 0. */
+tcg_gen_negsetcond_i64(cond, tcg_rd, tcg_rn, tcg_constant_i64(0));
 break;
 case 0x8: /* CMGT, CMGE */
 cond = u ? TCG_COND_GE : TCG_COND_GT;
diff --git a/target/arm/tcg/translate.c b/target/arm/tcg/translate.c
index b71ac2d0d5..31d3130e4c 100644
--- a/target/arm/tcg/translate.c
+++ b/target/arm/tcg/translate.c
@@ -2946,13 +2946,11 @@ void gen_gvec_sqrdmlsh_qc(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
 #define GEN_CMP0(NAME, COND)\
 static void gen_##NAME##0_i32(TCGv_i32 d, TCGv_i32 a)   \
 {   \
-tcg_gen_setcondi_i32(COND, d, a, 0);\
-tcg_gen_neg_i32(d, d);  \
+tcg_gen_negsetcond_i32(COND, d, a, tcg_constant_i32(0));\
 }   \
 static void gen_##NAME##0_i64(TCGv_i64 d, TCGv_i64 a)   \
 {   \
-tcg_gen_setcondi_i64(COND, d, a, 0);\
-tcg_gen_neg_i64(d, d);  \
+tcg_gen_negsetcond_i64(COND, d, a, tcg_constant_i64(0));\
 }   \
 static void gen_##NAME##0_vec(unsigned vece, TCGv_vec d, TCGv_vec a) \
 {   \
@@ -3863,15 +3861,13 @@ void gen_gvec_mls(unsigned vece, uint32_t rd_ofs, 
uint32_t rn_ofs,
 static void gen_cmtst_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b)
 {
 tcg_gen_and_i32(d, a, b);
-tcg_gen_setcondi_i32(TCG_COND_NE, d, d, 0);
-tcg_gen_neg_i32(d, d);
+tcg_gen_negsetcond_i32(TCG_COND_NE, d, d, tcg_constant_i32(0));
 }
 
 void gen_cmtst_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
 {
 tcg_gen_and_i64(d, a, b);
-tcg_gen_setcondi_i64(TCG_COND_NE, d, d, 0);
-tcg_gen_neg_i64(d, d);
+tcg_gen_negsetcond_i64(TCG_COND_NE, d, d, tcg_constant_i64(0));
 }
 
 static void gen_cmtst_vec(unsigned vece, TCGv_vec d, TCGv_vec a, TCGv_vec b)
-- 
2.34.1




[PATCH v2 19/23] tcg/i386: Merge tcg_out_movcond{32,64}

2023-08-18 Thread Richard Henderson
Pass a rexw parameter instead of duplicating the functions.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 tcg/i386/tcg-target.c.inc | 28 +++-
 1 file changed, 7 insertions(+), 21 deletions(-)

diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc
index 010432d3a9..1542afd94d 100644
--- a/tcg/i386/tcg-target.c.inc
+++ b/tcg/i386/tcg-target.c.inc
@@ -1593,24 +1593,14 @@ static void tcg_out_cmov(TCGContext *s, TCGCond cond, 
int rexw,
 }
 }
 
-static void tcg_out_movcond32(TCGContext *s, TCGCond cond, TCGReg dest,
-  TCGReg c1, TCGArg c2, int const_c2,
-  TCGReg v1)
+static void tcg_out_movcond(TCGContext *s, int rexw, TCGCond cond,
+TCGReg dest, TCGReg c1, TCGArg c2, int const_c2,
+TCGReg v1)
 {
-tcg_out_cmp(s, c1, c2, const_c2, 0);
-tcg_out_cmov(s, cond, 0, dest, v1);
+tcg_out_cmp(s, c1, c2, const_c2, rexw);
+tcg_out_cmov(s, cond, rexw, dest, v1);
 }
 
-#if TCG_TARGET_REG_BITS == 64
-static void tcg_out_movcond64(TCGContext *s, TCGCond cond, TCGReg dest,
-  TCGReg c1, TCGArg c2, int const_c2,
-  TCGReg v1)
-{
-tcg_out_cmp(s, c1, c2, const_c2, P_REXW);
-tcg_out_cmov(s, cond, P_REXW, dest, v1);
-}
-#endif
-
 static void tcg_out_ctz(TCGContext *s, int rexw, TCGReg dest, TCGReg arg1,
 TCGArg arg2, bool const_a2)
 {
@@ -2564,8 +2554,8 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode 
opc,
 OP_32_64(setcond):
 tcg_out_setcond(s, rexw, args[3], a0, a1, a2, const_a2);
 break;
-case INDEX_op_movcond_i32:
-tcg_out_movcond32(s, args[5], a0, a1, a2, const_a2, args[3]);
+OP_32_64(movcond):
+tcg_out_movcond(s, rexw, args[5], a0, a1, a2, const_a2, args[3]);
 break;
 
 OP_32_64(bswap16):
@@ -2714,10 +2704,6 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode 
opc,
 }
 break;
 
-case INDEX_op_movcond_i64:
-tcg_out_movcond64(s, args[5], a0, a1, a2, const_a2, args[3]);
-break;
-
 case INDEX_op_bswap64_i64:
 tcg_out_bswap64(s, a0);
 break;
-- 
2.34.1




[PATCH v2 12/23] tcg/aarch64: Implement negsetcond_*

2023-08-18 Thread Richard Henderson
Trivial, as aarch64 has an instruction for this: CSETM.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 tcg/aarch64/tcg-target.h |  4 ++--
 tcg/aarch64/tcg-target.c.inc | 12 
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
index 6080fddf73..e3faa9cff4 100644
--- a/tcg/aarch64/tcg-target.h
+++ b/tcg/aarch64/tcg-target.h
@@ -94,7 +94,7 @@ typedef enum {
 #define TCG_TARGET_HAS_mulsh_i320
 #define TCG_TARGET_HAS_extrl_i64_i320
 #define TCG_TARGET_HAS_extrh_i64_i320
-#define TCG_TARGET_HAS_negsetcond_i32   0
+#define TCG_TARGET_HAS_negsetcond_i32   1
 #define TCG_TARGET_HAS_qemu_st8_i32 0
 
 #define TCG_TARGET_HAS_div_i64  1
@@ -130,7 +130,7 @@ typedef enum {
 #define TCG_TARGET_HAS_muls2_i640
 #define TCG_TARGET_HAS_muluh_i641
 #define TCG_TARGET_HAS_mulsh_i641
-#define TCG_TARGET_HAS_negsetcond_i64   0
+#define TCG_TARGET_HAS_negsetcond_i64   1
 
 /*
  * Without FEAT_LSE2, we must use LDXP+STXP to implement atomic 128-bit load,
diff --git a/tcg/aarch64/tcg-target.c.inc b/tcg/aarch64/tcg-target.c.inc
index 35ca80cd56..7d8d114c9e 100644
--- a/tcg/aarch64/tcg-target.c.inc
+++ b/tcg/aarch64/tcg-target.c.inc
@@ -2262,6 +2262,16 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
  TCG_REG_XZR, tcg_invert_cond(args[3]));
 break;
 
+case INDEX_op_negsetcond_i32:
+a2 = (int32_t)a2;
+/* FALLTHRU */
+case INDEX_op_negsetcond_i64:
+tcg_out_cmp(s, ext, a1, a2, c2);
+/* Use CSETM alias of CSINV Wd, WZR, WZR, invert(cond).  */
+tcg_out_insn(s, 3506, CSINV, ext, a0, TCG_REG_XZR,
+ TCG_REG_XZR, tcg_invert_cond(args[3]));
+break;
+
 case INDEX_op_movcond_i32:
 a2 = (int32_t)a2;
 /* FALLTHRU */
@@ -2868,6 +2878,8 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode 
op)
 case INDEX_op_sub_i64:
 case INDEX_op_setcond_i32:
 case INDEX_op_setcond_i64:
+case INDEX_op_negsetcond_i32:
+case INDEX_op_negsetcond_i64:
 return C_O1_I2(r, r, rA);
 
 case INDEX_op_mul_i32:
-- 
2.34.1




[PATCH v2 22/23] tcg/i386: Use shift in tcg_out_setcond

2023-08-18 Thread Richard Henderson
For LT/GE vs zero, shift down the sign bit.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 tcg/i386/tcg-target.c.inc | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc
index 3f3c114efd..16e830051d 100644
--- a/tcg/i386/tcg-target.c.inc
+++ b/tcg/i386/tcg-target.c.inc
@@ -1578,6 +1578,21 @@ static void tcg_out_setcond(TCGContext *s, int rexw, 
TCGCond cond,
 }
 return;
 
+case TCG_COND_GE:
+inv = true;
+/* fall through */
+case TCG_COND_LT:
+/* If arg2 is 0, extract the sign bit. */
+if (const_arg2 && arg2 == 0) {
+tcg_out_mov(s, rexw ? TCG_TYPE_I64 : TCG_TYPE_I32, dest, arg1);
+if (inv) {
+tcg_out_modrm(s, OPC_GRP3_Ev + rexw, EXT3_NOT, dest);
+}
+tcg_out_shifti(s, SHIFT_SHR + rexw, dest, rexw ? 63 : 31);
+return;
+}
+break;
+
 default:
 break;
 }
-- 
2.34.1




[PATCH v2 21/23] tcg/i386: Clear dest first in tcg_out_setcond if possible

2023-08-18 Thread Richard Henderson
Using XOR first is both smaller and more efficient,
though cannot be applied if it clobbers an input.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 tcg/i386/tcg-target.c.inc | 17 -
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc
index 4d7b745a52..3f3c114efd 100644
--- a/tcg/i386/tcg-target.c.inc
+++ b/tcg/i386/tcg-target.c.inc
@@ -1532,6 +1532,7 @@ static void tcg_out_setcond(TCGContext *s, int rexw, 
TCGCond cond,
 int const_arg2)
 {
 bool inv = false;
+bool cleared;
 
 switch (cond) {
 case TCG_COND_NE:
@@ -1581,9 +1582,23 @@ static void tcg_out_setcond(TCGContext *s, int rexw, 
TCGCond cond,
 break;
 }
 
+/*
+ * If dest does not overlap the inputs, clearing it first is preferred.
+ * The XOR breaks any false dependency for the low-byte write to dest,
+ * and is also one byte smaller than MOVZBL.
+ */
+cleared = false;
+if (dest != arg1 && (const_arg2 || dest != arg2)) {
+tgen_arithr(s, ARITH_XOR, dest, dest);
+cleared = true;
+}
+
 tcg_out_cmp(s, arg1, arg2, const_arg2, rexw);
 tcg_out_modrm(s, OPC_SETCC | tcg_cond_to_jcc[cond], 0, dest);
-tcg_out_ext8u(s, dest, dest);
+
+if (!cleared) {
+tcg_out_ext8u(s, dest, dest);
+}
 }
 
 #if TCG_TARGET_REG_BITS == 32
-- 
2.34.1




[PATCH v2 10/23] tcg/ppc: Implement negsetcond_*

2023-08-18 Thread Richard Henderson
In the general case we simply negate.  However with isel we
may load -1 instead of 1 with no extra effort.

Consolidate EQ0 and NE0 logic.  Replace the NE0 zero-extension
with inversion+negation of EQ0, which is never worse and may
eliminate one insn.  Provide a special case for -EQ0.

Reviewed-by: Daniel Henrique Barboza 
Signed-off-by: Richard Henderson 
---
 tcg/ppc/tcg-target.h |   4 +-
 tcg/ppc/tcg-target.c.inc | 127 ---
 2 files changed, 82 insertions(+), 49 deletions(-)

diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h
index ba4fd3eb3a..a143b8f1e0 100644
--- a/tcg/ppc/tcg-target.h
+++ b/tcg/ppc/tcg-target.h
@@ -101,7 +101,7 @@ typedef enum {
 #define TCG_TARGET_HAS_muls2_i320
 #define TCG_TARGET_HAS_muluh_i321
 #define TCG_TARGET_HAS_mulsh_i321
-#define TCG_TARGET_HAS_negsetcond_i32   0
+#define TCG_TARGET_HAS_negsetcond_i32   1
 #define TCG_TARGET_HAS_qemu_st8_i32 0
 
 #if TCG_TARGET_REG_BITS == 64
@@ -142,7 +142,7 @@ typedef enum {
 #define TCG_TARGET_HAS_muls2_i640
 #define TCG_TARGET_HAS_muluh_i641
 #define TCG_TARGET_HAS_mulsh_i641
-#define TCG_TARGET_HAS_negsetcond_i64   0
+#define TCG_TARGET_HAS_negsetcond_i64   1
 #endif
 
 #define TCG_TARGET_HAS_qemu_ldst_i128   \
diff --git a/tcg/ppc/tcg-target.c.inc b/tcg/ppc/tcg-target.c.inc
index 511e14b180..10448aa0e6 100644
--- a/tcg/ppc/tcg-target.c.inc
+++ b/tcg/ppc/tcg-target.c.inc
@@ -1548,8 +1548,20 @@ static void tcg_out_cmp(TCGContext *s, int cond, TCGArg 
arg1, TCGArg arg2,
 }
 
 static void tcg_out_setcond_eq0(TCGContext *s, TCGType type,
-TCGReg dst, TCGReg src)
+TCGReg dst, TCGReg src, bool neg)
 {
+if (neg && (TCG_TARGET_REG_BITS == 32 || type == TCG_TYPE_I64)) {
+/*
+ * X != 0 implies X + -1 generates a carry.
+ * RT = (~X + X) + CA
+ *= -1 + CA
+ *= CA ? 0 : -1
+ */
+tcg_out32(s, ADDIC | TAI(TCG_REG_R0, src, -1));
+tcg_out32(s, SUBFE | TAB(dst, src, src));
+return;
+}
+
 if (type == TCG_TYPE_I32) {
 tcg_out32(s, CNTLZW | RS(src) | RA(dst));
 tcg_out_shri32(s, dst, dst, 5);
@@ -1557,18 +1569,28 @@ static void tcg_out_setcond_eq0(TCGContext *s, TCGType 
type,
 tcg_out32(s, CNTLZD | RS(src) | RA(dst));
 tcg_out_shri64(s, dst, dst, 6);
 }
+if (neg) {
+tcg_out32(s, NEG | RT(dst) | RA(dst));
+}
 }
 
-static void tcg_out_setcond_ne0(TCGContext *s, TCGReg dst, TCGReg src)
+static void tcg_out_setcond_ne0(TCGContext *s, TCGType type,
+TCGReg dst, TCGReg src, bool neg)
 {
-/* X != 0 implies X + -1 generates a carry.  Extra addition
-   trickery means: R = X-1 + ~X + C = X-1 + (-X+1) + C = C.  */
-if (dst != src) {
-tcg_out32(s, ADDIC | TAI(dst, src, -1));
-tcg_out32(s, SUBFE | TAB(dst, dst, src));
-} else {
+if (!neg && (TCG_TARGET_REG_BITS == 32 || type == TCG_TYPE_I64)) {
+/*
+ * X != 0 implies X + -1 generates a carry.  Extra addition
+ * trickery means: R = X-1 + ~X + C = X-1 + (-X+1) + C = C.
+ */
 tcg_out32(s, ADDIC | TAI(TCG_REG_R0, src, -1));
 tcg_out32(s, SUBFE | TAB(dst, TCG_REG_R0, src));
+return;
+}
+tcg_out_setcond_eq0(s, type, dst, src, false);
+if (neg) {
+tcg_out32(s, ADDI | TAI(dst, dst, -1));
+} else {
+tcg_out_xori32(s, dst, dst, 1);
 }
 }
 
@@ -1590,9 +1612,10 @@ static TCGReg tcg_gen_setcond_xor(TCGContext *s, TCGReg 
arg1, TCGArg arg2,
 
 static void tcg_out_setcond(TCGContext *s, TCGType type, TCGCond cond,
 TCGArg arg0, TCGArg arg1, TCGArg arg2,
-int const_arg2)
+int const_arg2, bool neg)
 {
-int crop, sh;
+int sh;
+bool inv;
 
 tcg_debug_assert(TCG_TARGET_REG_BITS == 64 || type == TCG_TYPE_I32);
 
@@ -1605,14 +1628,10 @@ static void tcg_out_setcond(TCGContext *s, TCGType 
type, TCGCond cond,
 if (arg2 == 0) {
 switch (cond) {
 case TCG_COND_EQ:
-tcg_out_setcond_eq0(s, type, arg0, arg1);
+tcg_out_setcond_eq0(s, type, arg0, arg1, neg);
 return;
 case TCG_COND_NE:
-if (TCG_TARGET_REG_BITS == 64 && type == TCG_TYPE_I32) {
-tcg_out_ext32u(s, TCG_REG_R0, arg1);
-arg1 = TCG_REG_R0;
-}
-tcg_out_setcond_ne0(s, arg0, arg1);
+tcg_out_setcond_ne0(s, type, arg0, arg1, neg);
 return;
 case TCG_COND_GE:
 tcg_out32(s, NOR | SAB(arg1, arg0, arg1));
@@ -1621,9 +1640,17 @@ static void tcg_out_setcond(TCGContext *s, TCGType type, 
TCGCond cond,
 case TCG_COND_LT:
 /* Extract the sign bit.  */
 if (type == TCG_TYPE_I32) {
-tcg_out_shri32(s, arg0, arg1, 31);
+

[PATCH v2 20/23] tcg/i386: Use CMP+SBB in tcg_out_setcond

2023-08-18 Thread Richard Henderson
Use the carry bit to optimize some forms of setcond.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 tcg/i386/tcg-target.c.inc | 50 +++
 1 file changed, 50 insertions(+)

diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc
index 1542afd94d..4d7b745a52 100644
--- a/tcg/i386/tcg-target.c.inc
+++ b/tcg/i386/tcg-target.c.inc
@@ -1531,6 +1531,56 @@ static void tcg_out_setcond(TCGContext *s, int rexw, 
TCGCond cond,
 TCGArg dest, TCGArg arg1, TCGArg arg2,
 int const_arg2)
 {
+bool inv = false;
+
+switch (cond) {
+case TCG_COND_NE:
+inv = true;
+/* fall through */
+case TCG_COND_EQ:
+/* If arg2 is 0, convert to LTU/GEU vs 1. */
+if (const_arg2 && arg2 == 0) {
+arg2 = 1;
+goto do_ltu;
+}
+break;
+
+case TCG_COND_LEU:
+inv = true;
+/* fall through */
+case TCG_COND_GTU:
+/* If arg2 is a register, swap for LTU/GEU. */
+if (!const_arg2) {
+TCGReg t = arg1;
+arg1 = arg2;
+arg2 = t;
+goto do_ltu;
+}
+break;
+
+case TCG_COND_GEU:
+inv = true;
+/* fall through */
+case TCG_COND_LTU:
+do_ltu:
+/*
+ * Relying on the carry bit, use SBB to produce -1 if LTU, 0 if GEU.
+ * We can then use NEG or INC to produce the desired result.
+ * This is always smaller than the SETCC expansion.
+ */
+tcg_out_cmp(s, arg1, arg2, const_arg2, rexw);
+tgen_arithr(s, ARITH_SBB, dest, dest);  /* T:-1 F:0 */
+if (inv) {
+tgen_arithi(s, ARITH_ADD, dest, 1, 0);  /* T:0  F:1 */
+} else {
+tcg_out_modrm(s, OPC_GRP3_Ev, EXT3_NEG, dest);  /* T:1  F:0 */
+}
+return;
+
+default:
+break;
+}
+
 tcg_out_cmp(s, arg1, arg2, const_arg2, rexw);
 tcg_out_modrm(s, OPC_SETCC | tcg_cond_to_jcc[cond], 0, dest);
 tcg_out_ext8u(s, dest, dest);
-- 
2.34.1




[PATCH v2 18/23] tcg/i386: Merge tcg_out_setcond{32,64}

2023-08-18 Thread Richard Henderson
Pass a rexw parameter instead of duplicating the functions.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 tcg/i386/tcg-target.c.inc | 24 +++-
 1 file changed, 7 insertions(+), 17 deletions(-)

diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc
index 33f66ba204..010432d3a9 100644
--- a/tcg/i386/tcg-target.c.inc
+++ b/tcg/i386/tcg-target.c.inc
@@ -1527,23 +1527,16 @@ static void tcg_out_brcond2(TCGContext *s, const TCGArg 
*args,
 }
 #endif
 
-static void tcg_out_setcond32(TCGContext *s, TCGCond cond, TCGArg dest,
-  TCGArg arg1, TCGArg arg2, int const_arg2)
+static void tcg_out_setcond(TCGContext *s, int rexw, TCGCond cond,
+TCGArg dest, TCGArg arg1, TCGArg arg2,
+int const_arg2)
 {
-tcg_out_cmp(s, arg1, arg2, const_arg2, 0);
+tcg_out_cmp(s, arg1, arg2, const_arg2, rexw);
 tcg_out_modrm(s, OPC_SETCC | tcg_cond_to_jcc[cond], 0, dest);
 tcg_out_ext8u(s, dest, dest);
 }
 
-#if TCG_TARGET_REG_BITS == 64
-static void tcg_out_setcond64(TCGContext *s, TCGCond cond, TCGArg dest,
-  TCGArg arg1, TCGArg arg2, int const_arg2)
-{
-tcg_out_cmp(s, arg1, arg2, const_arg2, P_REXW);
-tcg_out_modrm(s, OPC_SETCC | tcg_cond_to_jcc[cond], 0, dest);
-tcg_out_ext8u(s, dest, dest);
-}
-#else
+#if TCG_TARGET_REG_BITS == 32
 static void tcg_out_setcond2(TCGContext *s, const TCGArg *args,
  const int *const_args)
 {
@@ -2568,8 +2561,8 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode 
opc,
 tcg_out_brcond(s, rexw, a2, a0, a1, const_args[1],
arg_label(args[3]), 0);
 break;
-case INDEX_op_setcond_i32:
-tcg_out_setcond32(s, args[3], a0, a1, a2, const_a2);
+OP_32_64(setcond):
+tcg_out_setcond(s, rexw, args[3], a0, a1, a2, const_a2);
 break;
 case INDEX_op_movcond_i32:
 tcg_out_movcond32(s, args[5], a0, a1, a2, const_a2, args[3]);
@@ -2721,9 +2714,6 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode 
opc,
 }
 break;
 
-case INDEX_op_setcond_i64:
-tcg_out_setcond64(s, args[3], a0, a1, a2, const_a2);
-break;
 case INDEX_op_movcond_i64:
 tcg_out_movcond64(s, args[5], a0, a1, a2, const_a2, args[3]);
 break;
-- 
2.34.1




[PATCH v2 15/23] tcg/s390x: Implement negsetcond_*

2023-08-18 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
Cc: Thomas Huth 
Cc: qemu-s3...@nongnu.org
---
 tcg/s390x/tcg-target.h |  4 +-
 tcg/s390x/tcg-target.c.inc | 78 +-
 2 files changed, 54 insertions(+), 28 deletions(-)

diff --git a/tcg/s390x/tcg-target.h b/tcg/s390x/tcg-target.h
index 24e207c2d4..cd3d245be0 100644
--- a/tcg/s390x/tcg-target.h
+++ b/tcg/s390x/tcg-target.h
@@ -104,7 +104,7 @@ extern uint64_t s390_facilities[3];
 #define TCG_TARGET_HAS_mulsh_i32  0
 #define TCG_TARGET_HAS_extrl_i64_i32  0
 #define TCG_TARGET_HAS_extrh_i64_i32  0
-#define TCG_TARGET_HAS_negsetcond_i32 0
+#define TCG_TARGET_HAS_negsetcond_i32 1
 #define TCG_TARGET_HAS_qemu_st8_i32   0
 
 #define TCG_TARGET_HAS_div2_i64   1
@@ -139,7 +139,7 @@ extern uint64_t s390_facilities[3];
 #define TCG_TARGET_HAS_muls2_i64  HAVE_FACILITY(MISC_INSN_EXT2)
 #define TCG_TARGET_HAS_muluh_i64  0
 #define TCG_TARGET_HAS_mulsh_i64  0
-#define TCG_TARGET_HAS_negsetcond_i64 0
+#define TCG_TARGET_HAS_negsetcond_i64 1
 
 #define TCG_TARGET_HAS_qemu_ldst_i128 1
 
diff --git a/tcg/s390x/tcg-target.c.inc b/tcg/s390x/tcg-target.c.inc
index a94f7908d6..ecd8aaf2a1 100644
--- a/tcg/s390x/tcg-target.c.inc
+++ b/tcg/s390x/tcg-target.c.inc
@@ -1266,7 +1266,8 @@ static int tgen_cmp(TCGContext *s, TCGType type, TCGCond 
c, TCGReg r1,
 }
 
 static void tgen_setcond(TCGContext *s, TCGType type, TCGCond cond,
- TCGReg dest, TCGReg c1, TCGArg c2, int c2const)
+ TCGReg dest, TCGReg c1, TCGArg c2,
+ bool c2const, bool neg)
 {
 int cc;
 
@@ -1275,11 +1276,27 @@ static void tgen_setcond(TCGContext *s, TCGType type, 
TCGCond cond,
 /* Emit: d = 0, d = (cc ? 1 : d).  */
 cc = tgen_cmp(s, type, cond, c1, c2, c2const, false);
 tcg_out_movi(s, TCG_TYPE_I64, dest, 0);
-tcg_out_insn(s, RIEg, LOCGHI, dest, 1, cc);
+tcg_out_insn(s, RIEg, LOCGHI, dest, neg ? -1 : 1, cc);
 return;
 }
 
- restart:
+switch (cond) {
+case TCG_COND_GEU:
+case TCG_COND_LTU:
+case TCG_COND_LT:
+case TCG_COND_GE:
+/* Swap operands so that we can use LEU/GTU/GT/LE.  */
+if (!c2const) {
+TCGReg t = c1;
+c1 = c2;
+c2 = t;
+cond = tcg_swap_cond(cond);
+}
+break;
+default:
+break;
+}
+
 switch (cond) {
 case TCG_COND_NE:
 /* X != 0 is X > 0.  */
@@ -1292,11 +1309,20 @@ static void tgen_setcond(TCGContext *s, TCGType type, 
TCGCond cond,
 
 case TCG_COND_GTU:
 case TCG_COND_GT:
-/* The result of a compare has CC=2 for GT and CC=3 unused.
-   ADD LOGICAL WITH CARRY considers (CC & 2) the carry bit.  */
+/*
+ * The result of a compare has CC=2 for GT and CC=3 unused.
+ * ADD LOGICAL WITH CARRY considers (CC & 2) the carry bit.
+ */
 tgen_cmp(s, type, cond, c1, c2, c2const, true);
 tcg_out_movi(s, type, dest, 0);
 tcg_out_insn(s, RRE, ALCGR, dest, dest);
+if (neg) {
+if (type == TCG_TYPE_I32) {
+tcg_out_insn(s, RR, LCR, dest, dest);
+} else {
+tcg_out_insn(s, RRE, LCGR, dest, dest);
+}
+}
 return;
 
 case TCG_COND_EQ:
@@ -1310,27 +1336,17 @@ static void tgen_setcond(TCGContext *s, TCGType type, 
TCGCond cond,
 
 case TCG_COND_LEU:
 case TCG_COND_LE:
-/* As above, but we're looking for borrow, or !carry.
-   The second insn computes d - d - borrow, or -1 for true
-   and 0 for false.  So we must mask to 1 bit afterward.  */
+/*
+ * As above, but we're looking for borrow, or !carry.
+ * The second insn computes d - d - borrow, or -1 for true
+ * and 0 for false.  So we must mask to 1 bit afterward.
+ */
 tgen_cmp(s, type, cond, c1, c2, c2const, true);
 tcg_out_insn(s, RRE, SLBGR, dest, dest);
-tgen_andi(s, type, dest, 1);
-return;
-
-case TCG_COND_GEU:
-case TCG_COND_LTU:
-case TCG_COND_LT:
-case TCG_COND_GE:
-/* Swap operands so that we can use LEU/GTU/GT/LE.  */
-if (!c2const) {
-TCGReg t = c1;
-c1 = c2;
-c2 = t;
-cond = tcg_swap_cond(cond);
-goto restart;
+if (!neg) {
+tgen_andi(s, type, dest, 1);
 }
-break;
+return;
 
 default:
 g_assert_not_reached();
@@ -1339,7 +1355,7 @@ static void tgen_setcond(TCGContext *s, TCGType type, 
TCGCond cond,
 cc = tgen_cmp(s, type, cond, c1, c2, c2const, false);
 /* Emit: d = 0, t = 1, d = (cc ? t : d).  */
 tcg_out_movi(s, TCG_TYPE_I64, dest, 0);
-tcg_out_movi(s, TCG_TYPE_I64, TCG_TMP0, 1);
+tcg_out_movi(s, TCG_TYPE_I64, TCG_TMP0, neg ? -1 : 1);
 tcg_out_insn(s, RRFc, LOCGR, dest, TCG_TMP0, cc);
 }
 
@@ -2288,7 +2304,11 @@ static inl

[PATCH v2 14/23] tcg/riscv: Implement negsetcond_*

2023-08-18 Thread Richard Henderson
Reviewed-by: Daniel Henrique Barboza 
Signed-off-by: Richard Henderson 
---
 tcg/riscv/tcg-target.h |  4 ++--
 tcg/riscv/tcg-target.c.inc | 45 ++
 2 files changed, 47 insertions(+), 2 deletions(-)

diff --git a/tcg/riscv/tcg-target.h b/tcg/riscv/tcg-target.h
index b2961fec8e..7e8ac48a7d 100644
--- a/tcg/riscv/tcg-target.h
+++ b/tcg/riscv/tcg-target.h
@@ -120,7 +120,7 @@ extern bool have_zbb;
 #define TCG_TARGET_HAS_ctpop_i32have_zbb
 #define TCG_TARGET_HAS_brcond2  1
 #define TCG_TARGET_HAS_setcond2 1
-#define TCG_TARGET_HAS_negsetcond_i32   0
+#define TCG_TARGET_HAS_negsetcond_i32   1
 #define TCG_TARGET_HAS_qemu_st8_i32 0
 
 #define TCG_TARGET_HAS_movcond_i64  1
@@ -159,7 +159,7 @@ extern bool have_zbb;
 #define TCG_TARGET_HAS_muls2_i640
 #define TCG_TARGET_HAS_muluh_i641
 #define TCG_TARGET_HAS_mulsh_i641
-#define TCG_TARGET_HAS_negsetcond_i64   0
+#define TCG_TARGET_HAS_negsetcond_i64   1
 
 #define TCG_TARGET_HAS_qemu_ldst_i128   0
 
diff --git a/tcg/riscv/tcg-target.c.inc b/tcg/riscv/tcg-target.c.inc
index eeaeb6b6e3..232b616af3 100644
--- a/tcg/riscv/tcg-target.c.inc
+++ b/tcg/riscv/tcg-target.c.inc
@@ -936,6 +936,44 @@ static void tcg_out_setcond(TCGContext *s, TCGCond cond, 
TCGReg ret,
 }
 }
 
+static void tcg_out_negsetcond(TCGContext *s, TCGCond cond, TCGReg ret,
+   TCGReg arg1, tcg_target_long arg2, bool c2)
+{
+int tmpflags;
+TCGReg tmp;
+
+/* For LT/GE comparison against 0, replicate the sign bit. */
+if (c2 && arg2 == 0) {
+switch (cond) {
+case TCG_COND_GE:
+tcg_out_opc_imm(s, OPC_XORI, ret, arg1, -1);
+arg1 = ret;
+/* fall through */
+case TCG_COND_LT:
+tcg_out_opc_imm(s, OPC_SRAI, ret, arg1, TCG_TARGET_REG_BITS - 1);
+return;
+default:
+break;
+}
+}
+
+tmpflags = tcg_out_setcond_int(s, cond, ret, arg1, arg2, c2);
+tmp = tmpflags & ~SETCOND_FLAGS;
+
+/* If intermediate result is zero/non-zero: test != 0. */
+if (tmpflags & SETCOND_NEZ) {
+tcg_out_opc_reg(s, OPC_SLTU, ret, TCG_REG_ZERO, tmp);
+tmp = ret;
+}
+
+/* Produce the 0/-1 result. */
+if (tmpflags & SETCOND_INV) {
+tcg_out_opc_imm(s, OPC_ADDI, ret, tmp, -1);
+} else {
+tcg_out_opc_reg(s, OPC_SUB, ret, TCG_REG_ZERO, tmp);
+}
+}
+
 static void tcg_out_movcond_zicond(TCGContext *s, TCGReg ret, TCGReg test_ne,
int val1, bool c_val1,
int val2, bool c_val2)
@@ -1782,6 +1820,11 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
 tcg_out_setcond(s, args[3], a0, a1, a2, c2);
 break;
 
+case INDEX_op_negsetcond_i32:
+case INDEX_op_negsetcond_i64:
+tcg_out_negsetcond(s, args[3], a0, a1, a2, c2);
+break;
+
 case INDEX_op_movcond_i32:
 case INDEX_op_movcond_i64:
 tcg_out_movcond(s, args[5], a0, a1, a2, c2,
@@ -1910,6 +1953,8 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode 
op)
 case INDEX_op_xor_i64:
 case INDEX_op_setcond_i32:
 case INDEX_op_setcond_i64:
+case INDEX_op_negsetcond_i32:
+case INDEX_op_negsetcond_i64:
 return C_O1_I2(r, r, rI);
 
 case INDEX_op_andc_i32:
-- 
2.34.1




[PATCH v2 16/23] tcg/sparc64: Implement negsetcond_*

2023-08-18 Thread Richard Henderson
Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 tcg/sparc64/tcg-target.h |  4 ++--
 tcg/sparc64/tcg-target.c.inc | 36 ++--
 2 files changed, 28 insertions(+), 12 deletions(-)

diff --git a/tcg/sparc64/tcg-target.h b/tcg/sparc64/tcg-target.h
index 1faadc704b..4bbd825bd8 100644
--- a/tcg/sparc64/tcg-target.h
+++ b/tcg/sparc64/tcg-target.h
@@ -112,7 +112,7 @@ extern bool use_vis3_instructions;
 #define TCG_TARGET_HAS_muls2_i321
 #define TCG_TARGET_HAS_muluh_i320
 #define TCG_TARGET_HAS_mulsh_i320
-#define TCG_TARGET_HAS_negsetcond_i32   0
+#define TCG_TARGET_HAS_negsetcond_i32   1
 #define TCG_TARGET_HAS_qemu_st8_i32 0
 
 #define TCG_TARGET_HAS_extrl_i64_i321
@@ -150,7 +150,7 @@ extern bool use_vis3_instructions;
 #define TCG_TARGET_HAS_muls2_i640
 #define TCG_TARGET_HAS_muluh_i64use_vis3_instructions
 #define TCG_TARGET_HAS_mulsh_i640
-#define TCG_TARGET_HAS_negsetcond_i64   0
+#define TCG_TARGET_HAS_negsetcond_i64   1
 
 #define TCG_TARGET_HAS_qemu_ldst_i128   0
 
diff --git a/tcg/sparc64/tcg-target.c.inc b/tcg/sparc64/tcg-target.c.inc
index ffcb879211..37839f9a21 100644
--- a/tcg/sparc64/tcg-target.c.inc
+++ b/tcg/sparc64/tcg-target.c.inc
@@ -720,7 +720,7 @@ static void tcg_out_movcond_i64(TCGContext *s, TCGCond 
cond, TCGReg ret,
 }
 
 static void tcg_out_setcond_i32(TCGContext *s, TCGCond cond, TCGReg ret,
-TCGReg c1, int32_t c2, int c2const)
+TCGReg c1, int32_t c2, int c2const, bool neg)
 {
 /* For 32-bit comparisons, we can play games with ADDC/SUBC.  */
 switch (cond) {
@@ -760,22 +760,30 @@ static void tcg_out_setcond_i32(TCGContext *s, TCGCond 
cond, TCGReg ret,
 default:
 tcg_out_cmp(s, c1, c2, c2const);
 tcg_out_movi_s13(s, ret, 0);
-tcg_out_movcc(s, cond, MOVCC_ICC, ret, 1, 1);
+tcg_out_movcc(s, cond, MOVCC_ICC, ret, neg ? -1 : 1, 1);
 return;
 }
 
 tcg_out_cmp(s, c1, c2, c2const);
 if (cond == TCG_COND_LTU) {
-tcg_out_arithi(s, ret, TCG_REG_G0, 0, ARITH_ADDC);
+if (neg) {
+tcg_out_arithi(s, ret, TCG_REG_G0, 0, ARITH_SUBC);
+} else {
+tcg_out_arithi(s, ret, TCG_REG_G0, 0, ARITH_ADDC);
+}
 } else {
-tcg_out_arithi(s, ret, TCG_REG_G0, -1, ARITH_SUBC);
+if (neg) {
+tcg_out_arithi(s, ret, TCG_REG_G0, -1, ARITH_ADDC);
+} else {
+tcg_out_arithi(s, ret, TCG_REG_G0, -1, ARITH_SUBC);
+}
 }
 }
 
 static void tcg_out_setcond_i64(TCGContext *s, TCGCond cond, TCGReg ret,
-TCGReg c1, int32_t c2, int c2const)
+TCGReg c1, int32_t c2, int c2const, bool neg)
 {
-if (use_vis3_instructions) {
+if (use_vis3_instructions && !neg) {
 switch (cond) {
 case TCG_COND_NE:
 if (c2 != 0) {
@@ -796,11 +804,11 @@ static void tcg_out_setcond_i64(TCGContext *s, TCGCond 
cond, TCGReg ret,
if the input does not overlap the output.  */
 if (c2 == 0 && !is_unsigned_cond(cond) && c1 != ret) {
 tcg_out_movi_s13(s, ret, 0);
-tcg_out_movr(s, cond, ret, c1, 1, 1);
+tcg_out_movr(s, cond, ret, c1, neg ? -1 : 1, 1);
 } else {
 tcg_out_cmp(s, c1, c2, c2const);
 tcg_out_movi_s13(s, ret, 0);
-tcg_out_movcc(s, cond, MOVCC_XCC, ret, 1, 1);
+tcg_out_movcc(s, cond, MOVCC_XCC, ret, neg ? -1 : 1, 1);
 }
 }
 
@@ -1355,7 +1363,10 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
 tcg_out_brcond_i32(s, a2, a0, a1, const_args[1], arg_label(args[3]));
 break;
 case INDEX_op_setcond_i32:
-tcg_out_setcond_i32(s, args[3], a0, a1, a2, c2);
+tcg_out_setcond_i32(s, args[3], a0, a1, a2, c2, false);
+break;
+case INDEX_op_negsetcond_i32:
+tcg_out_setcond_i32(s, args[3], a0, a1, a2, c2, true);
 break;
 case INDEX_op_movcond_i32:
 tcg_out_movcond_i32(s, args[5], a0, a1, a2, c2, args[3], 
const_args[3]);
@@ -1437,7 +1448,10 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
 tcg_out_brcond_i64(s, a2, a0, a1, const_args[1], arg_label(args[3]));
 break;
 case INDEX_op_setcond_i64:
-tcg_out_setcond_i64(s, args[3], a0, a1, a2, c2);
+tcg_out_setcond_i64(s, args[3], a0, a1, a2, c2, false);
+break;
+case INDEX_op_negsetcond_i64:
+tcg_out_setcond_i64(s, args[3], a0, a1, a2, c2, true);
 break;
 case INDEX_op_movcond_i64:
 tcg_out_movcond_i64(s, args[5], a0, a1, a2, c2, args[3], 
const_args[3]);
@@ -1564,6 +1578,8 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode 
op)
 case INDEX_op_sar_i64:
 case INDEX_op_setcond_i32:
 case INDEX_op_setcond_i64:
+case INDEX_op_negsetcond_i32:
+case INDEX_op_negsetcond_i64:
 return C_O1_I2(r, rZ, rJ);
 
 case INDEX_

[PATCH v2 13/23] tcg/arm: Implement negsetcond_i32

2023-08-18 Thread Richard Henderson
Trivial, as we simply need to load a different constant
in the conditional move.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 tcg/arm/tcg-target.h | 2 +-
 tcg/arm/tcg-target.c.inc | 9 +
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/tcg/arm/tcg-target.h b/tcg/arm/tcg-target.h
index b076d033a9..b064bbda9f 100644
--- a/tcg/arm/tcg-target.h
+++ b/tcg/arm/tcg-target.h
@@ -122,7 +122,7 @@ extern bool use_neon_instructions;
 #define TCG_TARGET_HAS_mulsh_i320
 #define TCG_TARGET_HAS_div_i32  use_idiv_instructions
 #define TCG_TARGET_HAS_rem_i32  0
-#define TCG_TARGET_HAS_negsetcond_i32   0
+#define TCG_TARGET_HAS_negsetcond_i32   1
 #define TCG_TARGET_HAS_qemu_st8_i32 0
 
 #define TCG_TARGET_HAS_qemu_ldst_i128   0
diff --git a/tcg/arm/tcg-target.c.inc b/tcg/arm/tcg-target.c.inc
index 83e286088f..162df38c73 100644
--- a/tcg/arm/tcg-target.c.inc
+++ b/tcg/arm/tcg-target.c.inc
@@ -1975,6 +1975,14 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
 tcg_out_dat_imm(s, tcg_cond_to_arm_cond[tcg_invert_cond(args[3])],
 ARITH_MOV, args[0], 0, 0);
 break;
+case INDEX_op_negsetcond_i32:
+tcg_out_dat_rIN(s, COND_AL, ARITH_CMP, ARITH_CMN, 0,
+args[1], args[2], const_args[2]);
+tcg_out_dat_imm(s, tcg_cond_to_arm_cond[args[3]],
+ARITH_MVN, args[0], 0, 0);
+tcg_out_dat_imm(s, tcg_cond_to_arm_cond[tcg_invert_cond(args[3])],
+ARITH_MOV, args[0], 0, 0);
+break;
 
 case INDEX_op_brcond2_i32:
 c = tcg_out_cmp2(s, args, const_args);
@@ -2112,6 +2120,7 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode 
op)
 case INDEX_op_add_i32:
 case INDEX_op_sub_i32:
 case INDEX_op_setcond_i32:
+case INDEX_op_negsetcond_i32:
 return C_O1_I2(r, r, rIN);
 
 case INDEX_op_and_i32:
-- 
2.34.1




[PATCH v2 09/23] target/tricore: Replace gen_cond_w with tcg_gen_negsetcond_tl

2023-08-18 Thread Richard Henderson
Reviewed-by: Bastian Koppelmann 
Signed-off-by: Richard Henderson 
---
 target/tricore/translate.c | 16 ++--
 1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/target/tricore/translate.c b/target/tricore/translate.c
index 1947733870..6ae5ccbf72 100644
--- a/target/tricore/translate.c
+++ b/target/tricore/translate.c
@@ -2680,13 +2680,6 @@ gen_accumulating_condi(int cond, TCGv ret, TCGv r1, 
int32_t con,
 gen_accumulating_cond(cond, ret, r1, temp, op);
 }
 
-/* ret = (r1 cond r2) ? 0xFFFFFFFF : 0x00000000;*/
-static inline void gen_cond_w(TCGCond cond, TCGv ret, TCGv r1, TCGv r2)
-{
-tcg_gen_setcond_tl(cond, ret, r1, r2);
-tcg_gen_neg_tl(ret, ret);
-}
-
 static inline void gen_eqany_bi(TCGv ret, TCGv r1, int32_t con)
 {
 TCGv b0 = tcg_temp_new();
@@ -5692,7 +5685,8 @@ static void decode_rr_accumulator(DisasContext *ctx)
 gen_helper_eq_h(cpu_gpr_d[r3], cpu_gpr_d[r1], cpu_gpr_d[r2]);
 break;
 case OPC2_32_RR_EQ_W:
-gen_cond_w(TCG_COND_EQ, cpu_gpr_d[r3], cpu_gpr_d[r1], cpu_gpr_d[r2]);
+tcg_gen_negsetcond_tl(TCG_COND_EQ, cpu_gpr_d[r3],
+  cpu_gpr_d[r1], cpu_gpr_d[r2]);
 break;
 case OPC2_32_RR_EQANY_B:
 gen_helper_eqany_b(cpu_gpr_d[r3], cpu_gpr_d[r1], cpu_gpr_d[r2]);
@@ -5729,10 +5723,12 @@ static void decode_rr_accumulator(DisasContext *ctx)
 gen_helper_lt_hu(cpu_gpr_d[r3], cpu_gpr_d[r1], cpu_gpr_d[r2]);
 break;
 case OPC2_32_RR_LT_W:
-gen_cond_w(TCG_COND_LT, cpu_gpr_d[r3], cpu_gpr_d[r1], cpu_gpr_d[r2]);
+tcg_gen_negsetcond_tl(TCG_COND_LT, cpu_gpr_d[r3],
+  cpu_gpr_d[r1], cpu_gpr_d[r2]);
 break;
 case OPC2_32_RR_LT_WU:
-gen_cond_w(TCG_COND_LTU, cpu_gpr_d[r3], cpu_gpr_d[r1], cpu_gpr_d[r2]);
+tcg_gen_negsetcond_tl(TCG_COND_LTU, cpu_gpr_d[r3],
+  cpu_gpr_d[r1], cpu_gpr_d[r2]);
 break;
 case OPC2_32_RR_MAX:
 tcg_gen_movcond_tl(TCG_COND_GT, cpu_gpr_d[r3], cpu_gpr_d[r1],
-- 
2.34.1




[PATCH v2 17/23] tcg/i386: Merge tcg_out_brcond{32,64}

2023-08-18 Thread Richard Henderson
Pass a rexw parameter instead of duplicating the functions.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 tcg/i386/tcg-target.c.inc | 110 +-
 1 file changed, 49 insertions(+), 61 deletions(-)

diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc
index 3045b56002..33f66ba204 100644
--- a/tcg/i386/tcg-target.c.inc
+++ b/tcg/i386/tcg-target.c.inc
@@ -1436,99 +1436,89 @@ static void tcg_out_cmp(TCGContext *s, TCGArg arg1, 
TCGArg arg2,
 }
 }
 
-static void tcg_out_brcond32(TCGContext *s, TCGCond cond,
- TCGArg arg1, TCGArg arg2, int const_arg2,
- TCGLabel *label, int small)
+static void tcg_out_brcond(TCGContext *s, int rexw, TCGCond cond,
+   TCGArg arg1, TCGArg arg2, int const_arg2,
+   TCGLabel *label, bool small)
 {
-tcg_out_cmp(s, arg1, arg2, const_arg2, 0);
+tcg_out_cmp(s, arg1, arg2, const_arg2, rexw);
 tcg_out_jxx(s, tcg_cond_to_jcc[cond], label, small);
 }
 
-#if TCG_TARGET_REG_BITS == 64
-static void tcg_out_brcond64(TCGContext *s, TCGCond cond,
- TCGArg arg1, TCGArg arg2, int const_arg2,
- TCGLabel *label, int small)
-{
-tcg_out_cmp(s, arg1, arg2, const_arg2, P_REXW);
-tcg_out_jxx(s, tcg_cond_to_jcc[cond], label, small);
-}
-#else
-/* XXX: we implement it at the target level to avoid having to
-   handle cross basic blocks temporaries */
+#if TCG_TARGET_REG_BITS == 32
 static void tcg_out_brcond2(TCGContext *s, const TCGArg *args,
-const int *const_args, int small)
+const int *const_args, bool small)
 {
 TCGLabel *label_next = gen_new_label();
 TCGLabel *label_this = arg_label(args[5]);
 
 switch(args[4]) {
 case TCG_COND_EQ:
-tcg_out_brcond32(s, TCG_COND_NE, args[0], args[2], const_args[2],
- label_next, 1);
-tcg_out_brcond32(s, TCG_COND_EQ, args[1], args[3], const_args[3],
- label_this, small);
+tcg_out_brcond(s, 0, TCG_COND_NE, args[0], args[2], const_args[2],
+   label_next, 1);
+tcg_out_brcond(s, 0, TCG_COND_EQ, args[1], args[3], const_args[3],
+   label_this, small);
 break;
 case TCG_COND_NE:
-tcg_out_brcond32(s, TCG_COND_NE, args[0], args[2], const_args[2],
- label_this, small);
-tcg_out_brcond32(s, TCG_COND_NE, args[1], args[3], const_args[3],
- label_this, small);
+tcg_out_brcond(s, 0, TCG_COND_NE, args[0], args[2], const_args[2],
+   label_this, small);
+tcg_out_brcond(s, 0, TCG_COND_NE, args[1], args[3], const_args[3],
+   label_this, small);
 break;
 case TCG_COND_LT:
-tcg_out_brcond32(s, TCG_COND_LT, args[1], args[3], const_args[3],
- label_this, small);
+tcg_out_brcond(s, 0, TCG_COND_LT, args[1], args[3], const_args[3],
+   label_this, small);
 tcg_out_jxx(s, JCC_JNE, label_next, 1);
-tcg_out_brcond32(s, TCG_COND_LTU, args[0], args[2], const_args[2],
- label_this, small);
+tcg_out_brcond(s, 0, TCG_COND_LTU, args[0], args[2], const_args[2],
+   label_this, small);
 break;
 case TCG_COND_LE:
-tcg_out_brcond32(s, TCG_COND_LT, args[1], args[3], const_args[3],
- label_this, small);
+tcg_out_brcond(s, 0, TCG_COND_LT, args[1], args[3], const_args[3],
+   label_this, small);
 tcg_out_jxx(s, JCC_JNE, label_next, 1);
-tcg_out_brcond32(s, TCG_COND_LEU, args[0], args[2], const_args[2],
- label_this, small);
+tcg_out_brcond(s, 0, TCG_COND_LEU, args[0], args[2], const_args[2],
+   label_this, small);
 break;
 case TCG_COND_GT:
-tcg_out_brcond32(s, TCG_COND_GT, args[1], args[3], const_args[3],
- label_this, small);
+tcg_out_brcond(s, 0, TCG_COND_GT, args[1], args[3], const_args[3],
+   label_this, small);
 tcg_out_jxx(s, JCC_JNE, label_next, 1);
-tcg_out_brcond32(s, TCG_COND_GTU, args[0], args[2], const_args[2],
- label_this, small);
+tcg_out_brcond(s, 0, TCG_COND_GTU, args[0], args[2], const_args[2],
+   label_this, small);
 break;
 case TCG_COND_GE:
-tcg_out_brcond32(s, TCG_COND_GT, args[1], args[3], const_args[3],
- label_this, small);
+tcg_out_brcond(s, 0, TCG_COND_GT, args[1], args[3], const_args[3],
+   label_this, small);
 tcg_out_jxx(s, JCC_JNE, label_next, 1);
-tcg_out_brcond32(s, TCG_COND_GEU, args[0], args[2], const_args[

[PATCH v2 00/23] tcg: Introduce negsetcond opcodes

2023-08-18 Thread Richard Henderson
Introduce two new setcond opcode variants which produce -1 instead
of 1 when the condition is true.  For most of our hosts, producing -1 is
just as easy as 1, and avoids requiring a separate negate instruction.

Use the new opcode in tcg/tcg-op-gvec.c for integral expansion of
generic vector operations.  I looked through target/ for obvious
pairings of setcond and neg.

Changes for v2:
  * Drop "tcg/i386: Add cf parameter to tcg_out_cmp" patch.

Patches needing review:
  15: tcg/s390x: Implement negsetcond_*


r~


Cc: Thomas Huth 
Cc: qemu-s3...@nongnu.org


Richard Henderson (23):
  tcg: Introduce negsetcond opcodes
  tcg: Use tcg_gen_negsetcond_*
  target/alpha: Use tcg_gen_movcond_i64 in gen_fold_mzero
  target/arm: Use tcg_gen_negsetcond_*
  target/m68k: Use tcg_gen_negsetcond_*
  target/openrisc: Use tcg_gen_negsetcond_*
  target/ppc: Use tcg_gen_negsetcond_*
  target/sparc: Use tcg_gen_movcond_i64 in gen_edge
  target/tricore: Replace gen_cond_w with tcg_gen_negsetcond_tl
  tcg/ppc: Implement negsetcond_*
  tcg/ppc: Use the Set Boolean Extension
  tcg/aarch64: Implement negsetcond_*
  tcg/arm: Implement negsetcond_i32
  tcg/riscv: Implement negsetcond_*
  tcg/s390x: Implement negsetcond_*
  tcg/sparc64: Implement negsetcond_*
  tcg/i386: Merge tcg_out_brcond{32,64}
  tcg/i386: Merge tcg_out_setcond{32,64}
  tcg/i386: Merge tcg_out_movcond{32,64}
  tcg/i386: Use CMP+SBB in tcg_out_setcond
  tcg/i386: Clear dest first in tcg_out_setcond if possible
  tcg/i386: Use shift in tcg_out_setcond
  tcg/i386: Implement negsetcond_*

 docs/devel/tcg-ops.rst |   6 +
 include/tcg/tcg-op-common.h|   4 +
 include/tcg/tcg-op.h   |   2 +
 include/tcg/tcg-opc.h  |   2 +
 include/tcg/tcg.h  |   1 +
 tcg/aarch64/tcg-target.h   |   2 +
 tcg/arm/tcg-target.h   |   1 +
 tcg/i386/tcg-target.h  |   2 +
 tcg/loongarch64/tcg-target.h   |   3 +
 tcg/mips/tcg-target.h  |   2 +
 tcg/ppc/tcg-target.h   |   2 +
 tcg/riscv/tcg-target.h |   2 +
 tcg/s390x/tcg-target.h |   2 +
 tcg/sparc64/tcg-target.h   |   2 +
 tcg/tci/tcg-target.h   |   2 +
 target/alpha/translate.c   |   7 +-
 target/arm/tcg/translate-a64.c |  22 +-
 target/arm/tcg/translate.c |  12 +-
 target/m68k/translate.c|  24 +-
 target/openrisc/translate.c|   6 +-
 target/sparc/translate.c   |  17 +-
 target/tricore/translate.c |  16 +-
 tcg/optimize.c |  41 +++-
 tcg/tcg-op-gvec.c  |   6 +-
 tcg/tcg-op.c   |  42 +++-
 tcg/tcg.c  |   6 +
 target/ppc/translate/fixedpoint-impl.c.inc |   6 +-
 target/ppc/translate/vmx-impl.c.inc|   8 +-
 tcg/aarch64/tcg-target.c.inc   |  12 +
 tcg/arm/tcg-target.c.inc   |   9 +
 tcg/i386/tcg-target.c.inc  | 255 +
 tcg/ppc/tcg-target.c.inc   | 149 
 tcg/riscv/tcg-target.c.inc |  45 
 tcg/s390x/tcg-target.c.inc |  78 ---
 tcg/sparc64/tcg-target.c.inc   |  36 ++-
 35 files changed, 567 insertions(+), 265 deletions(-)

-- 
2.34.1




[PATCH v2 07/23] target/ppc: Use tcg_gen_negsetcond_*

2023-08-18 Thread Richard Henderson
Tested-by: Nicholas Piggin 
Reviewed-by: Nicholas Piggin 
Reviewed-by: Daniel Henrique Barboza 
Signed-off-by: Richard Henderson 
---
 target/ppc/translate/fixedpoint-impl.c.inc | 6 --
 target/ppc/translate/vmx-impl.c.inc| 8 +++-
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/target/ppc/translate/fixedpoint-impl.c.inc b/target/ppc/translate/fixedpoint-impl.c.inc
index f47f1a50e8..4ce02fd3a4 100644
--- a/target/ppc/translate/fixedpoint-impl.c.inc
+++ b/target/ppc/translate/fixedpoint-impl.c.inc
@@ -342,12 +342,14 @@ static bool do_set_bool_cond(DisasContext *ctx, arg_X_bi *a, bool neg, bool rev)
 uint32_t mask = 0x08 >> (a->bi & 0x03);
 TCGCond cond = rev ? TCG_COND_EQ : TCG_COND_NE;
 TCGv temp = tcg_temp_new();
+TCGv zero = tcg_constant_tl(0);
 
 tcg_gen_extu_i32_tl(temp, cpu_crf[a->bi >> 2]);
 tcg_gen_andi_tl(temp, temp, mask);
-tcg_gen_setcondi_tl(cond, cpu_gpr[a->rt], temp, 0);
 if (neg) {
-tcg_gen_neg_tl(cpu_gpr[a->rt], cpu_gpr[a->rt]);
+tcg_gen_negsetcond_tl(cond, cpu_gpr[a->rt], temp, zero);
+} else {
+tcg_gen_setcond_tl(cond, cpu_gpr[a->rt], temp, zero);
 }
 return true;
 }
diff --git a/target/ppc/translate/vmx-impl.c.inc b/target/ppc/translate/vmx-impl.c.inc
index c8712dd7d8..6d7669aabd 100644
--- a/target/ppc/translate/vmx-impl.c.inc
+++ b/target/ppc/translate/vmx-impl.c.inc
@@ -1341,8 +1341,7 @@ static bool trans_VCMPEQUQ(DisasContext *ctx, arg_VC *a)
 tcg_gen_xor_i64(t1, t0, t1);
 
 tcg_gen_or_i64(t1, t1, t2);
-tcg_gen_setcondi_i64(TCG_COND_EQ, t1, t1, 0);
-tcg_gen_neg_i64(t1, t1);
+tcg_gen_negsetcond_i64(TCG_COND_EQ, t1, t1, tcg_constant_i64(0));
 
 set_avr64(a->vrt, t1, true);
 set_avr64(a->vrt, t1, false);
@@ -1365,15 +1364,14 @@ static bool do_vcmpgtq(DisasContext *ctx, arg_VC *a, bool sign)
 
 get_avr64(t0, a->vra, false);
 get_avr64(t1, a->vrb, false);
-tcg_gen_setcond_i64(TCG_COND_GTU, t2, t0, t1);
+tcg_gen_negsetcond_i64(TCG_COND_GTU, t2, t0, t1);
 
 get_avr64(t0, a->vra, true);
 get_avr64(t1, a->vrb, true);
 tcg_gen_movcond_i64(TCG_COND_EQ, t2, t0, t1, t2, tcg_constant_i64(0));
-tcg_gen_setcond_i64(sign ? TCG_COND_GT : TCG_COND_GTU, t1, t0, t1);
+tcg_gen_negsetcond_i64(sign ? TCG_COND_GT : TCG_COND_GTU, t1, t0, t1);
 
 tcg_gen_or_i64(t1, t1, t2);
-tcg_gen_neg_i64(t1, t1);
 
 set_avr64(a->vrt, t1, true);
 set_avr64(a->vrt, t1, false);
-- 
2.34.1




[PATCH v2 11/23] tcg/ppc: Use the Set Boolean Extension

2023-08-18 Thread Richard Henderson
The SETBC family of instructions requires exactly two insns for
all comparisons, saving 0-3 insns per (neg)setcond.

Tested-by: Nicholas Piggin 
Reviewed-by: Daniel Henrique Barboza 
Signed-off-by: Richard Henderson 
---
 tcg/ppc/tcg-target.c.inc | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/tcg/ppc/tcg-target.c.inc b/tcg/ppc/tcg-target.c.inc
index 10448aa0e6..090f11e71c 100644
--- a/tcg/ppc/tcg-target.c.inc
+++ b/tcg/ppc/tcg-target.c.inc
@@ -447,6 +447,11 @@ static bool tcg_target_const_match(int64_t val, TCGType type, int ct)
 #define TW XO31( 4)
 #define TRAP   (TW | TO(31))
 
+#define SETBCXO31(384)  /* v3.10 */
+#define SETBCR   XO31(416)  /* v3.10 */
+#define SETNBC   XO31(448)  /* v3.10 */
+#define SETNBCR  XO31(480)  /* v3.10 */
+
 #define NOPORI  /* ori 0,0,0 */
 
 #define LVXXO31(103)
@@ -1624,6 +1629,23 @@ static void tcg_out_setcond(TCGContext *s, TCGType type, TCGCond cond,
 arg2 = (uint32_t)arg2;
 }
 
+/* With SETBC/SETBCR, we can always implement with 2 insns. */
+if (have_isa_3_10) {
+tcg_insn_unit bi, opc;
+
+tcg_out_cmp(s, cond, arg1, arg2, const_arg2, 7, type);
+
+/* Re-use tcg_to_bc for BI and BO_COND_{TRUE,FALSE}. */
+bi = tcg_to_bc[cond] & (0x1f << 16);
+if (tcg_to_bc[cond] & BO(8)) {
+opc = neg ? SETNBC : SETBC;
+} else {
+opc = neg ? SETNBCR : SETBCR;
+}
+tcg_out32(s, opc | RT(arg0) | bi);
+return;
+}
+
 /* Handle common and trivial cases before handling anything else.  */
 if (arg2 == 0) {
 switch (cond) {
-- 
2.34.1




[PATCH v2 05/23] target/m68k: Use tcg_gen_negsetcond_*

2023-08-18 Thread Richard Henderson
Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/m68k/translate.c | 24 ++--
 1 file changed, 10 insertions(+), 14 deletions(-)

diff --git a/target/m68k/translate.c b/target/m68k/translate.c
index d08e823b6c..15b3701b8f 100644
--- a/target/m68k/translate.c
+++ b/target/m68k/translate.c
@@ -1350,8 +1350,7 @@ static void gen_cc_cond(DisasCompare *c, DisasContext *s, int cond)
 case 14: /* GT (!(Z || (N ^ V))) */
 case 15: /* LE (Z || (N ^ V)) */
 c->v1 = tmp = tcg_temp_new();
-tcg_gen_setcond_i32(TCG_COND_EQ, tmp, QREG_CC_Z, c->v2);
-tcg_gen_neg_i32(tmp, tmp);
+tcg_gen_negsetcond_i32(TCG_COND_EQ, tmp, QREG_CC_Z, c->v2);
 tmp2 = tcg_temp_new();
 tcg_gen_xor_i32(tmp2, QREG_CC_N, QREG_CC_V);
 tcg_gen_or_i32(tmp, tmp, tmp2);
@@ -1430,9 +1429,8 @@ DISAS_INSN(scc)
 gen_cc_cond(&c, s, cond);
 
 tmp = tcg_temp_new();
-tcg_gen_setcond_i32(c.tcond, tmp, c.v1, c.v2);
+tcg_gen_negsetcond_i32(c.tcond, tmp, c.v1, c.v2);
 
-tcg_gen_neg_i32(tmp, tmp);
 DEST_EA(env, insn, OS_BYTE, tmp, NULL);
 }
 
@@ -2764,13 +2762,14 @@ DISAS_INSN(mull)
 tcg_gen_muls2_i32(QREG_CC_N, QREG_CC_V, src1, DREG(ext, 12));
 /* QREG_CC_V is -(QREG_CC_V != (QREG_CC_N >> 31)) */
 tcg_gen_sari_i32(QREG_CC_Z, QREG_CC_N, 31);
-tcg_gen_setcond_i32(TCG_COND_NE, QREG_CC_V, QREG_CC_V, QREG_CC_Z);
+tcg_gen_negsetcond_i32(TCG_COND_NE, QREG_CC_V,
+   QREG_CC_V, QREG_CC_Z);
 } else {
 tcg_gen_mulu2_i32(QREG_CC_N, QREG_CC_V, src1, DREG(ext, 12));
 /* QREG_CC_V is -(QREG_CC_V != 0), use QREG_CC_C as 0 */
-tcg_gen_setcond_i32(TCG_COND_NE, QREG_CC_V, QREG_CC_V, QREG_CC_C);
+tcg_gen_negsetcond_i32(TCG_COND_NE, QREG_CC_V,
+   QREG_CC_V, QREG_CC_C);
 }
-tcg_gen_neg_i32(QREG_CC_V, QREG_CC_V);
 tcg_gen_mov_i32(DREG(ext, 12), QREG_CC_N);
 
 tcg_gen_mov_i32(QREG_CC_Z, QREG_CC_N);
@@ -3339,14 +3338,13 @@ static inline void shift_im(DisasContext *s, uint16_t insn, int opsize)
 if (!logical && m68k_feature(s->env, M68K_FEATURE_M68K)) {
 /* if shift count >= bits, V is (reg != 0) */
 if (count >= bits) {
-tcg_gen_setcond_i32(TCG_COND_NE, QREG_CC_V, reg, QREG_CC_V);
+tcg_gen_negsetcond_i32(TCG_COND_NE, QREG_CC_V, reg, QREG_CC_V);
 } else {
 TCGv t0 = tcg_temp_new();
 tcg_gen_sari_i32(QREG_CC_V, reg, bits - 1);
 tcg_gen_sari_i32(t0, reg, bits - count - 1);
-tcg_gen_setcond_i32(TCG_COND_NE, QREG_CC_V, QREG_CC_V, t0);
+tcg_gen_negsetcond_i32(TCG_COND_NE, QREG_CC_V, QREG_CC_V, t0);
 }
-tcg_gen_neg_i32(QREG_CC_V, QREG_CC_V);
 }
 } else {
 tcg_gen_shri_i32(QREG_CC_C, reg, count - 1);
@@ -3430,9 +3428,8 @@ static inline void shift_reg(DisasContext *s, uint16_t insn, int opsize)
 /* Ignore the bits below the sign bit.  */
 tcg_gen_andi_i64(t64, t64, -1ULL << (bits - 1));
 /* If any bits remain set, we have overflow.  */
-tcg_gen_setcondi_i64(TCG_COND_NE, t64, t64, 0);
+tcg_gen_negsetcond_i64(TCG_COND_NE, t64, t64, tcg_constant_i64(0));
 tcg_gen_extrl_i64_i32(QREG_CC_V, t64);
-tcg_gen_neg_i32(QREG_CC_V, QREG_CC_V);
 }
 } else {
 tcg_gen_shli_i64(t64, t64, 32);
@@ -5311,9 +5308,8 @@ DISAS_INSN(fscc)
 gen_fcc_cond(&c, s, cond);
 
 tmp = tcg_temp_new();
-tcg_gen_setcond_i32(c.tcond, tmp, c.v1, c.v2);
+tcg_gen_negsetcond_i32(c.tcond, tmp, c.v1, c.v2);
 
-tcg_gen_neg_i32(tmp, tmp);
 DEST_EA(env, insn, OS_BYTE, tmp, NULL);
 }
 
-- 
2.34.1




[PATCH v2 02/23] tcg: Use tcg_gen_negsetcond_*

2023-08-18 Thread Richard Henderson
Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 tcg/tcg-op-gvec.c | 6 ++
 tcg/tcg-op.c  | 6 ++
 2 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c
index a062239804..e260a07c61 100644
--- a/tcg/tcg-op-gvec.c
+++ b/tcg/tcg-op-gvec.c
@@ -3692,8 +3692,7 @@ static void expand_cmp_i32(uint32_t dofs, uint32_t aofs, uint32_t bofs,
 for (i = 0; i < oprsz; i += 4) {
 tcg_gen_ld_i32(t0, cpu_env, aofs + i);
 tcg_gen_ld_i32(t1, cpu_env, bofs + i);
-tcg_gen_setcond_i32(cond, t0, t0, t1);
-tcg_gen_neg_i32(t0, t0);
+tcg_gen_negsetcond_i32(cond, t0, t0, t1);
 tcg_gen_st_i32(t0, cpu_env, dofs + i);
 }
 tcg_temp_free_i32(t1);
@@ -3710,8 +3709,7 @@ static void expand_cmp_i64(uint32_t dofs, uint32_t aofs, uint32_t bofs,
 for (i = 0; i < oprsz; i += 8) {
 tcg_gen_ld_i64(t0, cpu_env, aofs + i);
 tcg_gen_ld_i64(t1, cpu_env, bofs + i);
-tcg_gen_setcond_i64(cond, t0, t0, t1);
-tcg_gen_neg_i64(t0, t0);
+tcg_gen_negsetcond_i64(cond, t0, t0, t1);
 tcg_gen_st_i64(t0, cpu_env, dofs + i);
 }
 tcg_temp_free_i64(t1);
diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index 76d2377669..b4f1f24cab 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -863,8 +863,7 @@ void tcg_gen_movcond_i32(TCGCond cond, TCGv_i32 ret, TCGv_i32 c1,
 } else {
 TCGv_i32 t0 = tcg_temp_ebb_new_i32();
 TCGv_i32 t1 = tcg_temp_ebb_new_i32();
-tcg_gen_setcond_i32(cond, t0, c1, c2);
-tcg_gen_neg_i32(t0, t0);
+tcg_gen_negsetcond_i32(cond, t0, c1, c2);
 tcg_gen_and_i32(t1, v1, t0);
 tcg_gen_andc_i32(ret, v2, t0);
 tcg_gen_or_i32(ret, ret, t1);
@@ -2563,8 +2562,7 @@ void tcg_gen_movcond_i64(TCGCond cond, TCGv_i64 ret, TCGv_i64 c1,
 } else {
 TCGv_i64 t0 = tcg_temp_ebb_new_i64();
 TCGv_i64 t1 = tcg_temp_ebb_new_i64();
-tcg_gen_setcond_i64(cond, t0, c1, c2);
-tcg_gen_neg_i64(t0, t0);
+tcg_gen_negsetcond_i64(cond, t0, c1, c2);
 tcg_gen_and_i64(t1, v1, t0);
 tcg_gen_andc_i64(ret, v2, t0);
 tcg_gen_or_i64(ret, ret, t1);
-- 
2.34.1




[PATCH v2 01/23] tcg: Introduce negsetcond opcodes

2023-08-18 Thread Richard Henderson
Introduce a new opcode for negative setcond.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 docs/devel/tcg-ops.rst   |  6 ++
 include/tcg/tcg-op-common.h  |  4 
 include/tcg/tcg-op.h |  2 ++
 include/tcg/tcg-opc.h|  2 ++
 include/tcg/tcg.h|  1 +
 tcg/aarch64/tcg-target.h |  2 ++
 tcg/arm/tcg-target.h |  1 +
 tcg/i386/tcg-target.h|  2 ++
 tcg/loongarch64/tcg-target.h |  3 +++
 tcg/mips/tcg-target.h|  2 ++
 tcg/ppc/tcg-target.h |  2 ++
 tcg/riscv/tcg-target.h   |  2 ++
 tcg/s390x/tcg-target.h   |  2 ++
 tcg/sparc64/tcg-target.h |  2 ++
 tcg/tci/tcg-target.h |  2 ++
 tcg/optimize.c   | 41 +++-
 tcg/tcg-op.c | 36 +++
 tcg/tcg.c|  6 ++
 18 files changed, 117 insertions(+), 1 deletion(-)

diff --git a/docs/devel/tcg-ops.rst b/docs/devel/tcg-ops.rst
index 6a166c5665..fbde8040d7 100644
--- a/docs/devel/tcg-ops.rst
+++ b/docs/devel/tcg-ops.rst
@@ -498,6 +498,12 @@ Conditional moves
|
| Set *dest* to 1 if (*t1* *cond* *t2*) is true, otherwise set to 0.
 
+   * - negsetcond_i32/i64 *dest*, *t1*, *t2*, *cond*
+
+ - | *dest* = -(*t1* *cond* *t2*)
+   |
+   | Set *dest* to -1 if (*t1* *cond* *t2*) is true, otherwise set to 0.
+
* - movcond_i32/i64 *dest*, *c1*, *c2*, *v1*, *v2*, *cond*
 
  - | *dest* = (*c1* *cond* *c2* ? *v1* : *v2*)
diff --git a/include/tcg/tcg-op-common.h b/include/tcg/tcg-op-common.h
index be382bbf77..a53b15933b 100644
--- a/include/tcg/tcg-op-common.h
+++ b/include/tcg/tcg-op-common.h
@@ -344,6 +344,8 @@ void tcg_gen_setcond_i32(TCGCond cond, TCGv_i32 ret,
  TCGv_i32 arg1, TCGv_i32 arg2);
 void tcg_gen_setcondi_i32(TCGCond cond, TCGv_i32 ret,
   TCGv_i32 arg1, int32_t arg2);
+void tcg_gen_negsetcond_i32(TCGCond cond, TCGv_i32 ret,
+TCGv_i32 arg1, TCGv_i32 arg2);
 void tcg_gen_movcond_i32(TCGCond cond, TCGv_i32 ret, TCGv_i32 c1,
  TCGv_i32 c2, TCGv_i32 v1, TCGv_i32 v2);
 void tcg_gen_add2_i32(TCGv_i32 rl, TCGv_i32 rh, TCGv_i32 al,
@@ -540,6 +542,8 @@ void tcg_gen_setcond_i64(TCGCond cond, TCGv_i64 ret,
  TCGv_i64 arg1, TCGv_i64 arg2);
 void tcg_gen_setcondi_i64(TCGCond cond, TCGv_i64 ret,
   TCGv_i64 arg1, int64_t arg2);
+void tcg_gen_negsetcond_i64(TCGCond cond, TCGv_i64 ret,
+TCGv_i64 arg1, TCGv_i64 arg2);
 void tcg_gen_movcond_i64(TCGCond cond, TCGv_i64 ret, TCGv_i64 c1,
  TCGv_i64 c2, TCGv_i64 v1, TCGv_i64 v2);
 void tcg_gen_add2_i64(TCGv_i64 rl, TCGv_i64 rh, TCGv_i64 al,
diff --git a/include/tcg/tcg-op.h b/include/tcg/tcg-op.h
index d63683c47b..80cfcf8104 100644
--- a/include/tcg/tcg-op.h
+++ b/include/tcg/tcg-op.h
@@ -200,6 +200,7 @@ DEF_ATOMIC2(tcg_gen_atomic_umax_fetch, i64)
 #define tcg_gen_brcondi_tl tcg_gen_brcondi_i64
 #define tcg_gen_setcond_tl tcg_gen_setcond_i64
 #define tcg_gen_setcondi_tl tcg_gen_setcondi_i64
+#define tcg_gen_negsetcond_tl tcg_gen_negsetcond_i64
 #define tcg_gen_mul_tl tcg_gen_mul_i64
 #define tcg_gen_muli_tl tcg_gen_muli_i64
 #define tcg_gen_div_tl tcg_gen_div_i64
@@ -317,6 +318,7 @@ DEF_ATOMIC2(tcg_gen_atomic_umax_fetch, i64)
 #define tcg_gen_brcondi_tl tcg_gen_brcondi_i32
 #define tcg_gen_setcond_tl tcg_gen_setcond_i32
 #define tcg_gen_setcondi_tl tcg_gen_setcondi_i32
+#define tcg_gen_negsetcond_tl tcg_gen_negsetcond_i32
 #define tcg_gen_mul_tl tcg_gen_mul_i32
 #define tcg_gen_muli_tl tcg_gen_muli_i32
 #define tcg_gen_div_tl tcg_gen_div_i32
diff --git a/include/tcg/tcg-opc.h b/include/tcg/tcg-opc.h
index acfa5ba753..5044814d15 100644
--- a/include/tcg/tcg-opc.h
+++ b/include/tcg/tcg-opc.h
@@ -46,6 +46,7 @@ DEF(mb, 0, 0, 1, 0)
 
 DEF(mov_i32, 1, 1, 0, TCG_OPF_NOT_PRESENT)
 DEF(setcond_i32, 1, 2, 1, 0)
+DEF(negsetcond_i32, 1, 2, 1, IMPL(TCG_TARGET_HAS_negsetcond_i32))
 DEF(movcond_i32, 1, 4, 1, IMPL(TCG_TARGET_HAS_movcond_i32))
 /* load/store */
 DEF(ld8u_i32, 1, 1, 1, 0)
@@ -111,6 +112,7 @@ DEF(ctpop_i32, 1, 1, 0, IMPL(TCG_TARGET_HAS_ctpop_i32))
 
 DEF(mov_i64, 1, 1, 0, TCG_OPF_64BIT | TCG_OPF_NOT_PRESENT)
 DEF(setcond_i64, 1, 2, 1, IMPL64)
+DEF(negsetcond_i64, 1, 2, 1, IMPL64 | IMPL(TCG_TARGET_HAS_negsetcond_i64))
 DEF(movcond_i64, 1, 4, 1, IMPL64 | IMPL(TCG_TARGET_HAS_movcond_i64))
 /* load/store */
 DEF(ld8u_i64, 1, 1, 1, IMPL64)
diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
index 0875971719..f00bff9c85 100644
--- a/include/tcg/tcg.h
+++ b/include/tcg/tcg.h
@@ -104,6 +104,7 @@ typedef uint64_t TCGRegSet;
 #define TCG_TARGET_HAS_muls2_i640
 #define TCG_TARGET_HAS_muluh_i640
 #define TCG_TARGET_HAS_mulsh_i640
+#define TCG_TARGET_HAS_negsetcond_i64   0
 /* Turn some undef macros into true macros.  */
 #define TCG_TARGET_HAS_add2_i32 1
 #define TCG_TARGET_HAS_sub2_i32 

[PATCH v2 23/23] tcg/i386: Implement negsetcond_*

2023-08-18 Thread Richard Henderson
Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 tcg/i386/tcg-target.h |  4 ++--
 tcg/i386/tcg-target.c.inc | 27 +++
 2 files changed, 21 insertions(+), 10 deletions(-)

diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index f3cdc6927a..efc5ff8f9d 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -156,7 +156,7 @@ typedef enum {
 #define TCG_TARGET_HAS_muls2_i321
 #define TCG_TARGET_HAS_muluh_i320
 #define TCG_TARGET_HAS_mulsh_i320
-#define TCG_TARGET_HAS_negsetcond_i32   0
+#define TCG_TARGET_HAS_negsetcond_i32   1
 
 #if TCG_TARGET_REG_BITS == 64
 /* Keep 32-bit values zero-extended in a register.  */
@@ -194,7 +194,7 @@ typedef enum {
 #define TCG_TARGET_HAS_muls2_i641
 #define TCG_TARGET_HAS_muluh_i640
 #define TCG_TARGET_HAS_mulsh_i640
-#define TCG_TARGET_HAS_negsetcond_i64   0
+#define TCG_TARGET_HAS_negsetcond_i64   1
 #define TCG_TARGET_HAS_qemu_st8_i32 0
 #else
 #define TCG_TARGET_HAS_qemu_st8_i32 1
diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc
index 16e830051d..e778dc642f 100644
--- a/tcg/i386/tcg-target.c.inc
+++ b/tcg/i386/tcg-target.c.inc
@@ -1529,7 +1529,7 @@ static void tcg_out_brcond2(TCGContext *s, const TCGArg *args,
 
 static void tcg_out_setcond(TCGContext *s, int rexw, TCGCond cond,
 TCGArg dest, TCGArg arg1, TCGArg arg2,
-int const_arg2)
+int const_arg2, bool neg)
 {
 bool inv = false;
 bool cleared;
@@ -1570,11 +1570,13 @@ static void tcg_out_setcond(TCGContext *s, int rexw, TCGCond cond,
  * This is always smaller than the SETCC expansion.
  */
 tcg_out_cmp(s, arg1, arg2, const_arg2, rexw);
-tgen_arithr(s, ARITH_SBB, dest, dest);  /* T:-1 F:0 */
-if (inv) {
-tgen_arithi(s, ARITH_ADD, dest, 1, 0);  /* T:0  F:1 */
-} else {
-tcg_out_modrm(s, OPC_GRP3_Ev, EXT3_NEG, dest);  /* T:1  F:0 */
+tgen_arithr(s, ARITH_SBB + (neg ? rexw : 0), dest, dest); /* T:-1 F:0 */
+if (inv && neg) {
+tcg_out_modrm(s, OPC_GRP3_Ev + rexw, EXT3_NOT, dest); /* T:0 F:-1 */
+} else if (inv) {
+tgen_arithi(s, ARITH_ADD, dest, 1, 0);/* T:0  F:1 */
+} else if (!neg) {
+tcg_out_modrm(s, OPC_GRP3_Ev, EXT3_NEG, dest);/* T:1  F:0 */
 }
 return;
 
@@ -1588,7 +1590,8 @@ static void tcg_out_setcond(TCGContext *s, int rexw, TCGCond cond,
 if (inv) {
 tcg_out_modrm(s, OPC_GRP3_Ev + rexw, EXT3_NOT, dest);
 }
-tcg_out_shifti(s, SHIFT_SHR + rexw, dest, rexw ? 63 : 31);
+tcg_out_shifti(s, (neg ? SHIFT_SAR : SHIFT_SHR) + rexw,
+   dest, rexw ? 63 : 31);
 return;
 }
 break;
@@ -1614,6 +1617,9 @@ static void tcg_out_setcond(TCGContext *s, int rexw, TCGCond cond,
 if (!cleared) {
 tcg_out_ext8u(s, dest, dest);
 }
+if (neg) {
+tcg_out_modrm(s, OPC_GRP3_Ev + rexw, EXT3_NEG, dest);
+}
 }
 
 #if TCG_TARGET_REG_BITS == 32
@@ -2632,7 +2638,10 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
arg_label(args[3]), 0);
 break;
 OP_32_64(setcond):
-tcg_out_setcond(s, rexw, args[3], a0, a1, a2, const_a2);
+tcg_out_setcond(s, rexw, args[3], a0, a1, a2, const_a2, false);
+break;
+OP_32_64(negsetcond):
+tcg_out_setcond(s, rexw, args[3], a0, a1, a2, const_a2, true);
 break;
 OP_32_64(movcond):
 tcg_out_movcond(s, rexw, args[5], a0, a1, a2, const_a2, args[3]);
@@ -3377,6 +3386,8 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
 
 case INDEX_op_setcond_i32:
 case INDEX_op_setcond_i64:
+case INDEX_op_negsetcond_i32:
+case INDEX_op_negsetcond_i64:
 return C_O1_I2(q, r, re);
 
 case INDEX_op_movcond_i32:
-- 
2.34.1




[PATCH v2 06/23] target/openrisc: Use tcg_gen_negsetcond_*

2023-08-18 Thread Richard Henderson
Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/openrisc/translate.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/target/openrisc/translate.c b/target/openrisc/translate.c
index a86360d4f5..7c6f80daf1 100644
--- a/target/openrisc/translate.c
+++ b/target/openrisc/translate.c
@@ -253,9 +253,8 @@ static void gen_mul(DisasContext *dc, TCGv dest, TCGv srca, 
TCGv srcb)
 
 tcg_gen_muls2_tl(dest, cpu_sr_ov, srca, srcb);
 tcg_gen_sari_tl(t0, dest, TARGET_LONG_BITS - 1);
-tcg_gen_setcond_tl(TCG_COND_NE, cpu_sr_ov, cpu_sr_ov, t0);
+tcg_gen_negsetcond_tl(TCG_COND_NE, cpu_sr_ov, cpu_sr_ov, t0);
 
-tcg_gen_neg_tl(cpu_sr_ov, cpu_sr_ov);
 gen_ove_ov(dc);
 }
 
@@ -309,9 +308,8 @@ static void gen_muld(DisasContext *dc, TCGv srca, TCGv srcb)
 
 tcg_gen_muls2_i64(cpu_mac, high, t1, t2);
 tcg_gen_sari_i64(t1, cpu_mac, 63);
-tcg_gen_setcond_i64(TCG_COND_NE, t1, t1, high);
+tcg_gen_negsetcond_i64(TCG_COND_NE, t1, t1, high);
 tcg_gen_trunc_i64_tl(cpu_sr_ov, t1);
-tcg_gen_neg_tl(cpu_sr_ov, cpu_sr_ov);
 
 gen_ove_ov(dc);
 }
-- 
2.34.1




[PATCH v2 03/23] target/alpha: Use tcg_gen_movcond_i64 in gen_fold_mzero

2023-08-18 Thread Richard Henderson
The setcond + neg + and sequence is a complex method of
performing a conditional move.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/alpha/translate.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/target/alpha/translate.c b/target/alpha/translate.c
index 846f3d8091..0839182a1f 100644
--- a/target/alpha/translate.c
+++ b/target/alpha/translate.c
@@ -517,10 +517,9 @@ static void gen_fold_mzero(TCGCond cond, TCGv dest, TCGv src)
 
 case TCG_COND_GE:
 case TCG_COND_LT:
-/* For >= or <, map -0.0 to +0.0 via comparison and mask.  */
-tcg_gen_setcondi_i64(TCG_COND_NE, dest, src, mzero);
-tcg_gen_neg_i64(dest, dest);
-tcg_gen_and_i64(dest, dest, src);
+/* For >= or <, map -0.0 to +0.0. */
+tcg_gen_movcond_i64(TCG_COND_NE, dest, src, tcg_constant_i64(mzero),
+src, tcg_constant_i64(0));
 break;
 
 default:
-- 
2.34.1




[PATCH v2 08/23] target/sparc: Use tcg_gen_movcond_i64 in gen_edge

2023-08-18 Thread Richard Henderson
The setcond + neg + or sequence is a complex method of
performing a conditional move.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/sparc/translate.c | 17 -
 1 file changed, 4 insertions(+), 13 deletions(-)

diff --git a/target/sparc/translate.c b/target/sparc/translate.c
index bd877a5e4a..fa80a91161 100644
--- a/target/sparc/translate.c
+++ b/target/sparc/translate.c
@@ -2916,7 +2916,7 @@ static void gen_edge(DisasContext *dc, TCGv dst, TCGv s1, TCGv s2,
 
 tcg_gen_shr_tl(lo1, tcg_constant_tl(tabl), lo1);
 tcg_gen_shr_tl(lo2, tcg_constant_tl(tabr), lo2);
-tcg_gen_andi_tl(dst, lo1, omask);
+tcg_gen_andi_tl(lo1, lo1, omask);
 tcg_gen_andi_tl(lo2, lo2, omask);
 
 amask = -8;
@@ -2926,18 +2926,9 @@ static void gen_edge(DisasContext *dc, TCGv dst, TCGv s1, TCGv s2,
 tcg_gen_andi_tl(s1, s1, amask);
 tcg_gen_andi_tl(s2, s2, amask);
 
-/* We want to compute
-dst = (s1 == s2 ? lo1 : lo1 & lo2).
-   We've already done dst = lo1, so this reduces to
-dst &= (s1 == s2 ? -1 : lo2)
-   Which we perform by
-lo2 |= -(s1 == s2)
-dst &= lo2
-*/
-tcg_gen_setcond_tl(TCG_COND_EQ, lo1, s1, s2);
-tcg_gen_neg_tl(lo1, lo1);
-tcg_gen_or_tl(lo2, lo2, lo1);
-tcg_gen_and_tl(dst, dst, lo2);
+/* Compute dst = (s1 == s2 ? lo1 : lo1 & lo2). */
+tcg_gen_and_tl(lo2, lo2, lo1);
+tcg_gen_movcond_tl(TCG_COND_EQ, dst, s1, s2, lo1, lo2);
 }
 
 static void gen_alignaddr(TCGv dst, TCGv s1, TCGv s2, bool left)
-- 
2.34.1




[PATCH] target/arm: Fix SME ST1Q

2023-08-18 Thread Richard Henderson
A typo, noted in the bug report, resulted in an
incorrect write offset.

Cc: qemu-sta...@nongnu.org
Fixes: 7390e0e9ab8 ("target/arm: Implement SME LD1, ST1")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1833
Signed-off-by: Richard Henderson 
---
 target/arm/tcg/sme_helper.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c
index 1e67fcac30..296826ffe6 100644
--- a/target/arm/tcg/sme_helper.c
+++ b/target/arm/tcg/sme_helper.c
@@ -379,7 +379,7 @@ static inline void HNAME##_host(void *za, intptr_t off, void *host) \
 {   \
 uint64_t *ptr = za + off;   \
 HOST(host, ptr[BE]);\
-HOST(host + 1, ptr[!BE]);   \
+HOST(host + 8, ptr[!BE]);   \
 }   \
 static inline void VNAME##_v_host(void *za, intptr_t off, void *host)   \
 {   \
-- 
2.34.1




Re: [PATCH 3/3] bsd-user: Remove image_info.start_brk

2023-08-18 Thread Richard Henderson

On 8/18/23 11:22, Warner Losh wrote:

Forgot on 1/3 to mention I'm planning on doing a pull request right after the
release (subject to our release engineer's ok) and can include this there.


Excellent, thanks.

r~



Re: [PATCH 0/3] bsd-user: image_info cleanups

2023-08-18 Thread Warner Losh
These all look good. Right now I'm planning on queueing them for the next
pull request I have, which is all the reviewed changes from my GSoC
student's work and one submission from another user in the bsd-user fork (I
think, that one's up in the air). So next week sometime after the release
(since last time I did a pull request on a late rc, it caused rebase issues
and there's no reason for me not to wait).

Warner

On Fri, Aug 18, 2023 at 11:57 AM Richard Henderson <
richard.hender...@linaro.org> wrote:

> This mirrors some changes I've posted for linux-user,
> removing stuff from image_info which is unused.
>
>
> r~
>
>
> Richard Henderson (3):
>   bsd-user: Remove ELF_START_MMAP and image_info.start_mmap
>   bsd-user: Remove image_info.mmap
>   bsd-user: Remove image_info.start_brk
>
>  bsd-user/arm/target_arch_elf.h| 1 -
>  bsd-user/i386/target_arch_elf.h   | 1 -
>  bsd-user/qemu.h   | 3 ---
>  bsd-user/x86_64/target_arch_elf.h | 1 -
>  bsd-user/elfload.c| 4 +---
>  bsd-user/main.c   | 2 --
>  6 files changed, 1 insertion(+), 11 deletions(-)
>
> --
> 2.34.1
>
>


Re: [PATCH 3/3] bsd-user: Remove image_info.start_brk

2023-08-18 Thread Warner Losh
On Fri, Aug 18, 2023 at 11:57 AM Richard Henderson <
richard.hender...@linaro.org> wrote:

> This has the same value is image_info.brk, which is also logged,
> and is otherwise unused.
>
> Signed-off-by: Richard Henderson 
>

Reviewed-by: Warner Losh 

Same comments as 1/3.

Forgot on 1/3 to mention I'm planning on doing a pull request right after
the
release (subject to our release engineer's ok) and can include this there.

Warner


> ---
>  bsd-user/qemu.h| 1 -
>  bsd-user/elfload.c | 2 +-
>  bsd-user/main.c| 2 --
>  3 files changed, 1 insertion(+), 4 deletions(-)
>
> diff --git a/bsd-user/qemu.h b/bsd-user/qemu.h
> index 898fe3e8b3..61501c321b 100644
> --- a/bsd-user/qemu.h
> +++ b/bsd-user/qemu.h
> @@ -50,7 +50,6 @@ struct image_info {
>  abi_ulong end_code;
>  abi_ulong start_data;
>  abi_ulong end_data;
> -abi_ulong start_brk;
>  abi_ulong brk;
>  abi_ulong rss;
>  abi_ulong start_stack;
> diff --git a/bsd-user/elfload.c b/bsd-user/elfload.c
> index 2d39e59258..baf2f63d2f 100644
> --- a/bsd-user/elfload.c
> +++ b/bsd-user/elfload.c
> @@ -811,7 +811,7 @@ int load_elf_binary(struct bsd_binprm *bprm, struct target_pt_regs *regs,
> bprm->stringp, &elf_ex, load_addr,
> et_dyn_addr, interp_load_addr,
> info);
>  info->load_addr = reloc_func_desc;
> -info->start_brk = info->brk = elf_brk;
> +info->brk = elf_brk;
>  info->start_stack = bprm->p;
>  info->load_bias = 0;
>
> diff --git a/bsd-user/main.c b/bsd-user/main.c
> index 381bb18df8..f913cb55a7 100644
> --- a/bsd-user/main.c
> +++ b/bsd-user/main.c
> @@ -553,8 +553,6 @@ int main(int argc, char **argv)
>  fprintf(f, "page layout changed following binary load\n");
>  page_dump(f);
>
> -fprintf(f, "start_brk   0x" TARGET_ABI_FMT_lx "\n",
> -info->start_brk);
>  fprintf(f, "end_code0x" TARGET_ABI_FMT_lx "\n",
>  info->end_code);
>  fprintf(f, "start_code  0x" TARGET_ABI_FMT_lx "\n",
> --
> 2.34.1
>
>


Re: [PATCH 2/3] bsd-user: Remove image_info.mmap

2023-08-18 Thread Warner Losh
On Fri, Aug 18, 2023 at 11:57 AM Richard Henderson <
richard.hender...@linaro.org> wrote:

> This value is unused.
>
> Signed-off-by: Richard Henderson 
>

Reviewed-by: Warner Losh 

Same comments as patch 1/3.

Warner


> ---
>  bsd-user/qemu.h| 1 -
>  bsd-user/elfload.c | 1 -
>  2 files changed, 2 deletions(-)
>
> diff --git a/bsd-user/qemu.h b/bsd-user/qemu.h
> index 178114b423..898fe3e8b3 100644
> --- a/bsd-user/qemu.h
> +++ b/bsd-user/qemu.h
> @@ -52,7 +52,6 @@ struct image_info {
>  abi_ulong end_data;
>  abi_ulong start_brk;
>  abi_ulong brk;
> -abi_ulong mmap;
>  abi_ulong rss;
>  abi_ulong start_stack;
>  abi_ulong entry;
> diff --git a/bsd-user/elfload.c b/bsd-user/elfload.c
> index 38a3439d2c..2d39e59258 100644
> --- a/bsd-user/elfload.c
> +++ b/bsd-user/elfload.c
> @@ -738,7 +738,6 @@ int load_elf_binary(struct bsd_binprm *bprm, struct
> target_pt_regs *regs,
>  /* OK, This is the point of no return */
>  info->end_data = 0;
>  info->end_code = 0;
> -info->mmap = 0;
>  elf_entry = (abi_ulong) elf_ex.e_entry;
>
>  /* XXX Join this with PT_INTERP search? */
> --
> 2.34.1
>
>


Re: [PATCH 1/3] bsd-user: Remove ELF_START_MMAP and image_info.start_mmap

2023-08-18 Thread Warner Losh
On Fri, Aug 18, 2023 at 11:57 AM Richard Henderson <
richard.hender...@linaro.org> wrote:

> The start_mmap value is write-only.
> Remove the field and the defines that populated it.
>
> Signed-off-by: Richard Henderson 
>

Reviewed-by: Warner Losh 

This one won't interfere with anything, but unless I hear otherwise, I'll
queue it.
It applies to the blitz branch (though it needs a few more targets there),
and none of these files are being modified by Karim Taha, my GSoC student.

Warner

> ---
>  bsd-user/arm/target_arch_elf.h| 1 -
>  bsd-user/i386/target_arch_elf.h   | 1 -
>  bsd-user/qemu.h   | 1 -
>  bsd-user/x86_64/target_arch_elf.h | 1 -
>  bsd-user/elfload.c| 1 -
>  5 files changed, 5 deletions(-)
>
> diff --git a/bsd-user/arm/target_arch_elf.h
> b/bsd-user/arm/target_arch_elf.h
> index 935bce347f..b1c0fd2b32 100644
> --- a/bsd-user/arm/target_arch_elf.h
> +++ b/bsd-user/arm/target_arch_elf.h
> @@ -20,7 +20,6 @@
>  #ifndef TARGET_ARCH_ELF_H
>  #define TARGET_ARCH_ELF_H
>
> -#define ELF_START_MMAP 0x8000
>  #define ELF_ET_DYN_LOAD_ADDR0x50
>
>  #define elf_check_arch(x) ((x) == EM_ARM)
> diff --git a/bsd-user/i386/target_arch_elf.h
> b/bsd-user/i386/target_arch_elf.h
> index cbcd1f08e2..4ac27b02e7 100644
> --- a/bsd-user/i386/target_arch_elf.h
> +++ b/bsd-user/i386/target_arch_elf.h
> @@ -20,7 +20,6 @@
>  #ifndef TARGET_ARCH_ELF_H
>  #define TARGET_ARCH_ELF_H
>
> -#define ELF_START_MMAP 0x8000
>  #define ELF_ET_DYN_LOAD_ADDR0x01001000
>  #define elf_check_arch(x) (((x) == EM_386) || ((x) == EM_486))
>
> diff --git a/bsd-user/qemu.h b/bsd-user/qemu.h
> index 8f2d6a3c78..178114b423 100644
> --- a/bsd-user/qemu.h
> +++ b/bsd-user/qemu.h
> @@ -52,7 +52,6 @@ struct image_info {
>  abi_ulong end_data;
>  abi_ulong start_brk;
>  abi_ulong brk;
> -abi_ulong start_mmap;
>  abi_ulong mmap;
>  abi_ulong rss;
>  abi_ulong start_stack;
> diff --git a/bsd-user/x86_64/target_arch_elf.h
> b/bsd-user/x86_64/target_arch_elf.h
> index b244711888..e51c2faf08 100644
> --- a/bsd-user/x86_64/target_arch_elf.h
> +++ b/bsd-user/x86_64/target_arch_elf.h
> @@ -20,7 +20,6 @@
>  #ifndef TARGET_ARCH_ELF_H
>  #define TARGET_ARCH_ELF_H
>
> -#define ELF_START_MMAP 0x2ab000ULL
>  #define ELF_ET_DYN_LOAD_ADDR0x01021000
>  #define elf_check_arch(x) (((x) == ELF_ARCH))
>
> diff --git a/bsd-user/elfload.c b/bsd-user/elfload.c
> index 1f650bdde8..38a3439d2c 100644
> --- a/bsd-user/elfload.c
> +++ b/bsd-user/elfload.c
> @@ -738,7 +738,6 @@ int load_elf_binary(struct bsd_binprm *bprm, struct
> target_pt_regs *regs,
>  /* OK, This is the point of no return */
>  info->end_data = 0;
>  info->end_code = 0;
> -info->start_mmap = (abi_ulong)ELF_START_MMAP;
>  info->mmap = 0;
>  elf_entry = (abi_ulong) elf_ex.e_entry;
>
> --
> 2.34.1
>
>


[PATCH 1/3] bsd-user: Remove ELF_START_MMAP and image_info.start_mmap

2023-08-18 Thread Richard Henderson
The start_mmap value is write-only.
Remove the field and the defines that populated it.

Signed-off-by: Richard Henderson 
---
 bsd-user/arm/target_arch_elf.h| 1 -
 bsd-user/i386/target_arch_elf.h   | 1 -
 bsd-user/qemu.h   | 1 -
 bsd-user/x86_64/target_arch_elf.h | 1 -
 bsd-user/elfload.c| 1 -
 5 files changed, 5 deletions(-)

diff --git a/bsd-user/arm/target_arch_elf.h b/bsd-user/arm/target_arch_elf.h
index 935bce347f..b1c0fd2b32 100644
--- a/bsd-user/arm/target_arch_elf.h
+++ b/bsd-user/arm/target_arch_elf.h
@@ -20,7 +20,6 @@
 #ifndef TARGET_ARCH_ELF_H
 #define TARGET_ARCH_ELF_H
 
-#define ELF_START_MMAP 0x8000
 #define ELF_ET_DYN_LOAD_ADDR0x50
 
 #define elf_check_arch(x) ((x) == EM_ARM)
diff --git a/bsd-user/i386/target_arch_elf.h b/bsd-user/i386/target_arch_elf.h
index cbcd1f08e2..4ac27b02e7 100644
--- a/bsd-user/i386/target_arch_elf.h
+++ b/bsd-user/i386/target_arch_elf.h
@@ -20,7 +20,6 @@
 #ifndef TARGET_ARCH_ELF_H
 #define TARGET_ARCH_ELF_H
 
-#define ELF_START_MMAP 0x8000
 #define ELF_ET_DYN_LOAD_ADDR0x01001000
 #define elf_check_arch(x) (((x) == EM_386) || ((x) == EM_486))
 
diff --git a/bsd-user/qemu.h b/bsd-user/qemu.h
index 8f2d6a3c78..178114b423 100644
--- a/bsd-user/qemu.h
+++ b/bsd-user/qemu.h
@@ -52,7 +52,6 @@ struct image_info {
 abi_ulong end_data;
 abi_ulong start_brk;
 abi_ulong brk;
-abi_ulong start_mmap;
 abi_ulong mmap;
 abi_ulong rss;
 abi_ulong start_stack;
diff --git a/bsd-user/x86_64/target_arch_elf.h 
b/bsd-user/x86_64/target_arch_elf.h
index b244711888..e51c2faf08 100644
--- a/bsd-user/x86_64/target_arch_elf.h
+++ b/bsd-user/x86_64/target_arch_elf.h
@@ -20,7 +20,6 @@
 #ifndef TARGET_ARCH_ELF_H
 #define TARGET_ARCH_ELF_H
 
-#define ELF_START_MMAP 0x2ab000ULL
 #define ELF_ET_DYN_LOAD_ADDR0x01021000
 #define elf_check_arch(x) (((x) == ELF_ARCH))
 
diff --git a/bsd-user/elfload.c b/bsd-user/elfload.c
index 1f650bdde8..38a3439d2c 100644
--- a/bsd-user/elfload.c
+++ b/bsd-user/elfload.c
@@ -738,7 +738,6 @@ int load_elf_binary(struct bsd_binprm *bprm, struct 
target_pt_regs *regs,
 /* OK, This is the point of no return */
 info->end_data = 0;
 info->end_code = 0;
-info->start_mmap = (abi_ulong)ELF_START_MMAP;
 info->mmap = 0;
 elf_entry = (abi_ulong) elf_ex.e_entry;
 
-- 
2.34.1




[PATCH 3/3] bsd-user: Remove image_info.start_brk

2023-08-18 Thread Richard Henderson
This has the same value as image_info.brk, which is also logged,
and is otherwise unused.

Signed-off-by: Richard Henderson 
---
 bsd-user/qemu.h| 1 -
 bsd-user/elfload.c | 2 +-
 bsd-user/main.c| 2 --
 3 files changed, 1 insertion(+), 4 deletions(-)

diff --git a/bsd-user/qemu.h b/bsd-user/qemu.h
index 898fe3e8b3..61501c321b 100644
--- a/bsd-user/qemu.h
+++ b/bsd-user/qemu.h
@@ -50,7 +50,6 @@ struct image_info {
 abi_ulong end_code;
 abi_ulong start_data;
 abi_ulong end_data;
-abi_ulong start_brk;
 abi_ulong brk;
 abi_ulong rss;
 abi_ulong start_stack;
diff --git a/bsd-user/elfload.c b/bsd-user/elfload.c
index 2d39e59258..baf2f63d2f 100644
--- a/bsd-user/elfload.c
+++ b/bsd-user/elfload.c
@@ -811,7 +811,7 @@ int load_elf_binary(struct bsd_binprm *bprm, struct 
target_pt_regs *regs,
bprm->stringp, &elf_ex, load_addr,
et_dyn_addr, interp_load_addr, info);
 info->load_addr = reloc_func_desc;
-info->start_brk = info->brk = elf_brk;
+info->brk = elf_brk;
 info->start_stack = bprm->p;
 info->load_bias = 0;
 
diff --git a/bsd-user/main.c b/bsd-user/main.c
index 381bb18df8..f913cb55a7 100644
--- a/bsd-user/main.c
+++ b/bsd-user/main.c
@@ -553,8 +553,6 @@ int main(int argc, char **argv)
 fprintf(f, "page layout changed following binary load\n");
 page_dump(f);
 
-fprintf(f, "start_brk   0x" TARGET_ABI_FMT_lx "\n",
-info->start_brk);
 fprintf(f, "end_code0x" TARGET_ABI_FMT_lx "\n",
 info->end_code);
 fprintf(f, "start_code  0x" TARGET_ABI_FMT_lx "\n",
-- 
2.34.1




[PATCH 2/3] bsd-user: Remove image_info.mmap

2023-08-18 Thread Richard Henderson
This value is unused.

Signed-off-by: Richard Henderson 
---
 bsd-user/qemu.h| 1 -
 bsd-user/elfload.c | 1 -
 2 files changed, 2 deletions(-)

diff --git a/bsd-user/qemu.h b/bsd-user/qemu.h
index 178114b423..898fe3e8b3 100644
--- a/bsd-user/qemu.h
+++ b/bsd-user/qemu.h
@@ -52,7 +52,6 @@ struct image_info {
 abi_ulong end_data;
 abi_ulong start_brk;
 abi_ulong brk;
-abi_ulong mmap;
 abi_ulong rss;
 abi_ulong start_stack;
 abi_ulong entry;
diff --git a/bsd-user/elfload.c b/bsd-user/elfload.c
index 38a3439d2c..2d39e59258 100644
--- a/bsd-user/elfload.c
+++ b/bsd-user/elfload.c
@@ -738,7 +738,6 @@ int load_elf_binary(struct bsd_binprm *bprm, struct 
target_pt_regs *regs,
 /* OK, This is the point of no return */
 info->end_data = 0;
 info->end_code = 0;
-info->mmap = 0;
 elf_entry = (abi_ulong) elf_ex.e_entry;
 
 /* XXX Join this with PT_INTERP search? */
-- 
2.34.1




[PATCH 0/3] bsd-user: image_info cleanups

2023-08-18 Thread Richard Henderson
This mirrors some changes I've posted for linux-user,
removing stuff from image_info which is unused.


r~


Richard Henderson (3):
  bsd-user: Remove ELF_START_MMAP and image_info.start_mmap
  bsd-user: Remove image_info.mmap
  bsd-user: Remove image_info.start_brk

 bsd-user/arm/target_arch_elf.h| 1 -
 bsd-user/i386/target_arch_elf.h   | 1 -
 bsd-user/qemu.h   | 3 ---
 bsd-user/x86_64/target_arch_elf.h | 1 -
 bsd-user/elfload.c| 4 +---
 bsd-user/main.c   | 2 --
 6 files changed, 1 insertion(+), 11 deletions(-)

-- 
2.34.1




Re: [PATCH v2 4/8] target/loongarch: Introduce abstract TYPE_LOONGARCH64_CPU

2023-08-18 Thread Richard Henderson

On 8/18/23 10:20, Philippe Mathieu-Daudé wrote:

In preparation of introducing TYPE_LOONGARCH32_CPU, introduce
an abstract TYPE_LOONGARCH64_CPU.

Signed-off-by: Philippe Mathieu-Daudé
---
  target/loongarch/cpu.h |  1 +
  target/loongarch/cpu.c | 12 +---
  2 files changed, 10 insertions(+), 3 deletions(-)


Reviewed-by: Richard Henderson 


r~



Re: [PATCH v2 5/8] target/loongarch: Extract 64-bit specifics to loongarch64_cpu_class_init

2023-08-18 Thread Richard Henderson

On 8/18/23 10:20, Philippe Mathieu-Daudé wrote:

Extract loongarch64 specific code from loongarch_cpu_class_init()
to a new loongarch64_cpu_class_init().

In preparation of supporting loongarch32 cores, rename these
functions using the '64' suffix.

Signed-off-by: Philippe Mathieu-Daudé 
---
  target/loongarch/cpu.c | 27 +--
  1 file changed, 17 insertions(+), 10 deletions(-)

diff --git a/target/loongarch/cpu.c b/target/loongarch/cpu.c
index 34d6c5a31d..356d039560 100644
--- a/target/loongarch/cpu.c
+++ b/target/loongarch/cpu.c
@@ -356,7 +356,7 @@ static bool loongarch_cpu_has_work(CPUState *cs)
  #endif
  }
  
-static void loongarch_la464_initfn(Object *obj)

+static void loongarch64_la464_initfn(Object *obj)


This rename is not relevant to populating the abstract loongarch64 class.

Otherwise,
Reviewed-by: Richard Henderson 


r~



Re: [PATCH v3 8/8] vdpa: Send cvq state load commands in parallel

2023-08-18 Thread Eugenio Perez Martin
On Wed, Jul 19, 2023 at 9:54 AM Hawkins Jiawei  wrote:
>
> This patch enables sending CVQ state load commands
> in parallel at device startup by following steps:
>
>   * Refactor vhost_vdpa_net_load_cmd() to iterate through
> the control commands shadow buffers. This allows different
> CVQ state load commands to use their own unique buffers.
>
>   * Delay the polling and checking of buffers until either
> the SVQ is full or control commands shadow buffers are full.
>
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1578
> Signed-off-by: Hawkins Jiawei 
> ---
>  net/vhost-vdpa.c | 157 +--
>  1 file changed, 96 insertions(+), 61 deletions(-)
>
> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> index 795c9c1fd2..1ebb58f7f6 100644
> --- a/net/vhost-vdpa.c
> +++ b/net/vhost-vdpa.c
> @@ -633,6 +633,26 @@ static uint16_t 
> vhost_vdpa_net_svq_available_slots(VhostVDPAState *s)
>  return vhost_svq_available_slots(svq);
>  }
>
> +/*
> + * Poll SVQ for multiple pending control commands and check the device's ack.
> + *
> + * Caller should hold the BQL when invoking this function.
> + */
> +static ssize_t vhost_vdpa_net_svq_flush(VhostVDPAState *s,
> +size_t cmds_in_flight)
> +{
> +vhost_vdpa_net_svq_poll(s, cmds_in_flight);
> +
> +/* Device should and must use only one byte ack each control command */
> +assert(cmds_in_flight < vhost_vdpa_net_cvq_cmd_page_len());
> +for (int i = 0; i < cmds_in_flight; ++i) {
> +if (s->status[i] != VIRTIO_NET_OK) {
> +return -EIO;
> +}
> +}
> +return 0;
> +}
> +
>  static ssize_t vhost_vdpa_net_load_cmd(VhostVDPAState *s, void **out_cursor,
> void **in_cursor, uint8_t class,
> uint8_t cmd, const struct iovec 
> *data_sg,
> @@ -642,19 +662,41 @@ static ssize_t vhost_vdpa_net_load_cmd(VhostVDPAState 
> *s, void **out_cursor,
>  .class = class,
>  .cmd = cmd,
>  };
> -size_t data_size = iov_size(data_sg, data_num);
> +size_t data_size = iov_size(data_sg, data_num),
> +   left_bytes = vhost_vdpa_net_cvq_cmd_page_len() -
> +(*out_cursor - s->cvq_cmd_out_buffer);
>  /* Buffers for the device */
>  struct iovec out = {
> -.iov_base = *out_cursor,
>  .iov_len = sizeof(ctrl) + data_size,
>  };
>  struct iovec in = {
> -.iov_base = *in_cursor,
>  .iov_len = sizeof(*s->status),
>  };
>  ssize_t r;
>
> -assert(data_size < vhost_vdpa_net_cvq_cmd_page_len() - sizeof(ctrl));
> +if (sizeof(ctrl) > left_bytes || data_size > left_bytes - sizeof(ctrl) ||

I'm ok with this code, but maybe we can simplify the code if we use
two struct iovec as cursors instead of a void **? I think functions
like iov_size and iov_copy already take care of a few checks here.

Apart from that it would be great to merge this call to
vhost_vdpa_net_svq_flush, but I find it very hard to do unless we
scatter it through all callers of vhost_vdpa_net_load_cmd.

Apart from the minor comments I think the series is great, thanks!

> +vhost_vdpa_net_svq_available_slots(s) < 2) {
> +/*
> + * It is time to flush all pending control commands if SVQ is full
> + * or control commands shadow buffers are full.
> + *
> + * We can poll here since we've had BQL from the time
> + * we sent the descriptor.
> + */
> +r = vhost_vdpa_net_svq_flush(s, *in_cursor - (void *)s->status);
> +if (unlikely(r < 0)) {
> +return r;
> +}
> +
> +*out_cursor = s->cvq_cmd_out_buffer;
> +*in_cursor = s->status;
> +left_bytes = vhost_vdpa_net_cvq_cmd_page_len();
> +}
> +
> +out.iov_base = *out_cursor;
> +in.iov_base = *in_cursor;
> +
> +assert(data_size <= left_bytes - sizeof(ctrl));
>  /* Each CVQ command has one out descriptor and one in descriptor */
>  assert(vhost_vdpa_net_svq_available_slots(s) >= 2);
>
> @@ -670,11 +712,11 @@ static ssize_t vhost_vdpa_net_load_cmd(VhostVDPAState 
> *s, void **out_cursor,
>  return r;
>  }
>
> -/*
> - * We can poll here since we've had BQL from the time
> - * we sent the descriptor.
> - */
> -return vhost_vdpa_net_svq_poll(s, 1);
> +/* iterate the cursors */
> +*out_cursor += out.iov_len;
> +*in_cursor += in.iov_len;
> +
> +return 0;
>  }
>
>  static int vhost_vdpa_net_load_mac(VhostVDPAState *s, const VirtIONet *n,
> @@ -685,15 +727,12 @@ static int vhost_vdpa_net_load_mac(VhostVDPAState *s, 
> const VirtIONet *n,
>  .iov_base = (void *)n->mac,
>  .iov_len = sizeof(n->mac),
>  };
> -ssize_t dev_written = vhost_vdpa_net_load_cmd(s, out_cursor, 
> in_cursor,
> -  VIRTIO_NET_CTRL_MAC,
> - 

[PATCH v2 7/8] target/loongarch: Add new object class for loongarch32 cpus

2023-08-18 Thread Philippe Mathieu-Daudé
From: Jiajie Chen 

Add object class stub for future loongarch32 cpus.

Signed-off-by: Jiajie Chen 
Reviewed-by: Richard Henderson 
Signed-off-by: Song Gao 
Message-ID: <20230817093121.1053890-3-gaos...@loongson.cn>
[Rebased on TYPE_LOONGARCH64_CPU introduction]
Signed-off-by: Philippe Mathieu-Daudé 
---
 target/loongarch/cpu.h |  1 +
 target/loongarch/cpu.c | 11 +++
 2 files changed, 12 insertions(+)

diff --git a/target/loongarch/cpu.h b/target/loongarch/cpu.h
index 3235ad081f..b8af491041 100644
--- a/target/loongarch/cpu.h
+++ b/target/loongarch/cpu.h
@@ -382,6 +382,7 @@ struct ArchCPU {
 };
 
 #define TYPE_LOONGARCH_CPU "loongarch-cpu"
+#define TYPE_LOONGARCH32_CPU "loongarch32-cpu"
 #define TYPE_LOONGARCH64_CPU "loongarch64-cpu"
 
 OBJECT_DECLARE_CPU_TYPE(LoongArchCPU, LoongArchCPUClass,
diff --git a/target/loongarch/cpu.c b/target/loongarch/cpu.c
index 356d039560..5082506f10 100644
--- a/target/loongarch/cpu.c
+++ b/target/loongarch/cpu.c
@@ -726,6 +726,10 @@ static void loongarch_cpu_class_init(ObjectClass *c, void 
*data)
 #endif
 }
 
+static void loongarch32_cpu_class_init(ObjectClass *c, void *data)
+{
+}
+
 static gchar *loongarch64_gdb_arch_name(CPUState *cs)
 {
 return g_strdup("loongarch64");
@@ -758,6 +762,13 @@ static const TypeInfo loongarch_cpu_type_infos[] = {
 .class_size = sizeof(LoongArchCPUClass),
 .class_init = loongarch_cpu_class_init,
 },
+{
+.name = TYPE_LOONGARCH32_CPU,
+.parent = TYPE_LOONGARCH_CPU,
+
+.abstract = true,
+.class_init = loongarch32_cpu_class_init,
+},
 {
 .name = TYPE_LOONGARCH64_CPU,
 .parent = TYPE_LOONGARCH_CPU,
-- 
2.41.0




[PATCH v2 8/8] target/loongarch: Add GDB support for loongarch32 mode

2023-08-18 Thread Philippe Mathieu-Daudé
From: Jiajie Chen 

GPRs and PC are 32-bit wide in loongarch32 mode.

Signed-off-by: Jiajie Chen 
Reviewed-by: Richard Henderson 
Signed-off-by: Song Gao 
Message-ID: <20230817093121.1053890-4-gaos...@loongson.cn>
[PMD: Rebased, set gdb_num_core_regs]
Signed-off-by: Philippe Mathieu-Daudé 
---
 configs/targets/loongarch64-softmmu.mak |  2 +-
 target/loongarch/cpu.c  | 10 ++
 target/loongarch/gdbstub.c  | 32 ++
 gdb-xml/loongarch-base32.xml| 45 +
 4 files changed, 81 insertions(+), 8 deletions(-)
 create mode 100644 gdb-xml/loongarch-base32.xml

diff --git a/configs/targets/loongarch64-softmmu.mak 
b/configs/targets/loongarch64-softmmu.mak
index 9abc99056f..f23780fdd8 100644
--- a/configs/targets/loongarch64-softmmu.mak
+++ b/configs/targets/loongarch64-softmmu.mak
@@ -1,5 +1,5 @@
 TARGET_ARCH=loongarch64
 TARGET_BASE_ARCH=loongarch
 TARGET_SUPPORTS_MTTCG=y
-TARGET_XML_FILES= gdb-xml/loongarch-base64.xml gdb-xml/loongarch-fpu.xml
+TARGET_XML_FILES= gdb-xml/loongarch-base32.xml gdb-xml/loongarch-base64.xml 
gdb-xml/loongarch-fpu.xml
 TARGET_NEED_FDT=y
diff --git a/target/loongarch/cpu.c b/target/loongarch/cpu.c
index 5082506f10..f42e8497d6 100644
--- a/target/loongarch/cpu.c
+++ b/target/loongarch/cpu.c
@@ -726,8 +726,18 @@ static void loongarch_cpu_class_init(ObjectClass *c, void 
*data)
 #endif
 }
 
+static gchar *loongarch32_gdb_arch_name(CPUState *cs)
+{
+return g_strdup("loongarch32");
+}
+
 static void loongarch32_cpu_class_init(ObjectClass *c, void *data)
 {
+CPUClass *cc = CPU_CLASS(c);
+
+cc->gdb_num_core_regs = 35;
+cc->gdb_core_xml_file = "loongarch-base32.xml";
+cc->gdb_arch_name = loongarch32_gdb_arch_name;
 }
 
 static gchar *loongarch64_gdb_arch_name(CPUState *cs)
diff --git a/target/loongarch/gdbstub.c b/target/loongarch/gdbstub.c
index 0752fff924..a462e25737 100644
--- a/target/loongarch/gdbstub.c
+++ b/target/loongarch/gdbstub.c
@@ -34,16 +34,25 @@ int loongarch_cpu_gdb_read_register(CPUState *cs, 
GByteArray *mem_buf, int n)
 {
 LoongArchCPU *cpu = LOONGARCH_CPU(cs);
 CPULoongArchState *env = &cpu->env;
+uint64_t val;
 
 if (0 <= n && n < 32) {
-return gdb_get_regl(mem_buf, env->gpr[n]);
+val = env->gpr[n];
 } else if (n == 32) {
 /* orig_a0 */
-return gdb_get_regl(mem_buf, 0);
+val = 0;
 } else if (n == 33) {
-return gdb_get_regl(mem_buf, env->pc);
+val = env->pc;
 } else if (n == 34) {
-return gdb_get_regl(mem_buf, env->CSR_BADV);
+val = env->CSR_BADV;
+}
+
+if (0 <= n && n <= 34) {
+if (is_la64(env)) {
+return gdb_get_reg64(mem_buf, val);
+} else {
+return gdb_get_reg32(mem_buf, val);
+}
 }
 return 0;
 }
@@ -52,15 +61,24 @@ int loongarch_cpu_gdb_write_register(CPUState *cs, uint8_t 
*mem_buf, int n)
 {
 LoongArchCPU *cpu = LOONGARCH_CPU(cs);
 CPULoongArchState *env = &cpu->env;
-target_ulong tmp = ldtul_p(mem_buf);
+target_ulong tmp;
+int read_length;
 int length = 0;
 
+if (is_la64(env)) {
+tmp = ldq_p(mem_buf);
+read_length = 8;
+} else {
+tmp = ldl_p(mem_buf);
+read_length = 4;
+}
+
 if (0 <= n && n < 32) {
 env->gpr[n] = tmp;
-length = sizeof(target_ulong);
+length = read_length;
 } else if (n == 33) {
 env->pc = tmp;
-length = sizeof(target_ulong);
+length = read_length;
 }
 return length;
 }
diff --git a/gdb-xml/loongarch-base32.xml b/gdb-xml/loongarch-base32.xml
new file mode 100644
index 00..af47bbd3da
--- /dev/null
+++ b/gdb-xml/loongarch-base32.xml
@@ -0,0 +1,45 @@
+<?xml version="1.0"?>
+<!-- Copyright (C) 2022-2023 Free Software Foundation, Inc.
+
+     Copying and distribution of this file, with or without modification,
+     are permitted in any medium without royalty provided the copyright
+     notice and this notice are preserved.  -->
+
+<!DOCTYPE feature SYSTEM "gdb-target.dtd">
+<feature name="org.gnu.gdb.loongarch.base">
+  <reg name="r0" bitsize="32" type="uint32" group="general"/>
+  <reg name="r1" bitsize="32" type="uint32" group="general"/>
+  <reg name="r2" bitsize="32" type="uint32" group="general"/>
+  <reg name="r3" bitsize="32" type="uint32" group="general"/>
+  <reg name="r4" bitsize="32" type="uint32" group="general"/>
+  <reg name="r5" bitsize="32" type="uint32" group="general"/>
+  <reg name="r6" bitsize="32" type="uint32" group="general"/>
+  <reg name="r7" bitsize="32" type="uint32" group="general"/>
+  <reg name="r8" bitsize="32" type="uint32" group="general"/>
+  <reg name="r9" bitsize="32" type="uint32" group="general"/>
+  <reg name="r10" bitsize="32" type="uint32" group="general"/>
+  <reg name="r11" bitsize="32" type="uint32" group="general"/>
+  <reg name="r12" bitsize="32" type="uint32" group="general"/>
+  <reg name="r13" bitsize="32" type="uint32" group="general"/>
+  <reg name="r14" bitsize="32" type="uint32" group="general"/>
+  <reg name="r15" bitsize="32" type="uint32" group="general"/>
+  <reg name="r16" bitsize="32" type="uint32" group="general"/>
+  <reg name="r17" bitsize="32" type="uint32" group="general"/>
+  <reg name="r18" bitsize="32" type="uint32" group="general"/>
+  <reg name="r19" bitsize="32" type="uint32" group="general"/>
+  <reg name="r20" bitsize="32" type="uint32" group="general"/>
+  <reg name="r21" bitsize="32" type="uint32" group="general"/>
+  <reg name="r22" bitsize="32" type="uint32" group="general"/>
+  <reg name="r23" bitsize="32" type="uint32" group="general"/>
+  <reg name="r24" bitsize="32" type="uint32" group="general"/>
+  <reg name="r25" bitsize="32" type="uint32" group="general"/>
+  <reg name="r26" bitsize="32" type="uint32" group="general"/>
+  <reg name="r27" bitsize="32" type="uint32" group="general"/>
+  <reg name="r28" bitsize="32" type="uint32" group="general"/>
+  <reg name="r29" bitsize="32" type="uint32" group="general"/>
+  <reg name="r30" bitsize="32" type="uint32" group="general"/>
+  <reg name="r31" bitsize="32" type="uint32" group="general"/>
+  <reg name="orig_a0" bitsize="32" type="uint32" group="general"/>
+  <reg name="pc" bitsize="32" type="code_ptr" group="general"/>
+  <reg name="badv" bitsize="32" type="code_ptr" group="general"/>
+</feature>
-- 
2.41.0
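[Editor's sketch] The 32-/64-bit split in loongarch_cpu_gdb_read_register above boils down to serializing the same register value at two widths. A minimal stand-alone sketch; the helper name, buffer layout, and host-endian encoding are assumptions for illustration, not QEMU's gdbstub API:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Serialize a register value at the width the guest mode dictates,
 * returning the number of bytes written (mirrors gdb_get_reg32/64). */
static int put_reg(uint8_t *buf, uint64_t val, int is_64bit)
{
    if (is_64bit) {
        uint64_t v = val;           /* full 64-bit register */
        memcpy(buf, &v, sizeof(v)); /* host-endian for this sketch */
        return 8;
    } else {
        uint32_t v = (uint32_t)val; /* truncate to loongarch32 width */
        memcpy(buf, &v, sizeof(v));
        return 4;
    }
}
```

The returned length matters: it is what keeps the remote 'g' packet layout consistent with the base32 vs. base64 target description.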




[PATCH v2 3/8] target/loongarch: Fix loongarch_la464_initfn() misses setting LSPW

2023-08-18 Thread Philippe Mathieu-Daudé
From: Song Gao 

Reviewed-by: Richard Henderson 
Signed-off-by: Song Gao 
Reviewed-by: Philippe Mathieu-Daudé 
Message-ID: <20230817093121.1053890-11-gaos...@loongson.cn>
Signed-off-by: Philippe Mathieu-Daudé 
---
 target/loongarch/cpu.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/target/loongarch/cpu.c b/target/loongarch/cpu.c
index dc617be36f..a1ebc20330 100644
--- a/target/loongarch/cpu.c
+++ b/target/loongarch/cpu.c
@@ -391,6 +391,7 @@ static void loongarch_la464_initfn(Object *obj)
 data = FIELD_DP32(data, CPUCFG2, LSX, 1),
 data = FIELD_DP32(data, CPUCFG2, LLFTP, 1);
 data = FIELD_DP32(data, CPUCFG2, LLFTP_VER, 1);
+data = FIELD_DP32(data, CPUCFG2, LSPW, 1);
 data = FIELD_DP32(data, CPUCFG2, LAM, 1);
 env->cpucfg[2] = data;
 
-- 
2.41.0




[PATCH v2 5/8] target/loongarch: Extract 64-bit specifics to loongarch64_cpu_class_init

2023-08-18 Thread Philippe Mathieu-Daudé
Extract loongarch64 specific code from loongarch_cpu_class_init()
to a new loongarch64_cpu_class_init().

In preparation of supporting loongarch32 cores, rename these
functions using the '64' suffix.

Signed-off-by: Philippe Mathieu-Daudé 
---
 target/loongarch/cpu.c | 27 +--
 1 file changed, 17 insertions(+), 10 deletions(-)

diff --git a/target/loongarch/cpu.c b/target/loongarch/cpu.c
index 34d6c5a31d..356d039560 100644
--- a/target/loongarch/cpu.c
+++ b/target/loongarch/cpu.c
@@ -356,7 +356,7 @@ static bool loongarch_cpu_has_work(CPUState *cs)
 #endif
 }
 
-static void loongarch_la464_initfn(Object *obj)
+static void loongarch64_la464_initfn(Object *obj)
 {
 LoongArchCPU *cpu = LOONGARCH_CPU(obj);
 CPULoongArchState *env = &cpu->env;
@@ -695,11 +695,6 @@ static const struct SysemuCPUOps loongarch_sysemu_ops = {
 };
 #endif
 
-static gchar *loongarch_gdb_arch_name(CPUState *cs)
-{
-return g_strdup("loongarch64");
-}
-
 static void loongarch_cpu_class_init(ObjectClass *c, void *data)
 {
 LoongArchCPUClass *lacc = LOONGARCH_CPU_CLASS(c);
@@ -724,16 +719,27 @@ static void loongarch_cpu_class_init(ObjectClass *c, void 
*data)
 cc->disas_set_info = loongarch_cpu_disas_set_info;
 cc->gdb_read_register = loongarch_cpu_gdb_read_register;
 cc->gdb_write_register = loongarch_cpu_gdb_write_register;
-cc->gdb_num_core_regs = 35;
-cc->gdb_core_xml_file = "loongarch-base64.xml";
 cc->gdb_stop_before_watchpoint = true;
-cc->gdb_arch_name = loongarch_gdb_arch_name;
 
 #ifdef CONFIG_TCG
 cc->tcg_ops = &loongarch_tcg_ops;
 #endif
 }
 
+static gchar *loongarch64_gdb_arch_name(CPUState *cs)
+{
+return g_strdup("loongarch64");
+}
+
+static void loongarch64_cpu_class_init(ObjectClass *c, void *data)
+{
+CPUClass *cc = CPU_CLASS(c);
+
+cc->gdb_num_core_regs = 35;
+cc->gdb_core_xml_file = "loongarch-base64.xml";
+cc->gdb_arch_name = loongarch64_gdb_arch_name;
+}
+
 #define DEFINE_LOONGARCH_CPU_TYPE(size, model, initfn) \
 { \
 .parent = TYPE_LOONGARCH##size##_CPU, \
@@ -757,8 +763,9 @@ static const TypeInfo loongarch_cpu_type_infos[] = {
 .parent = TYPE_LOONGARCH_CPU,
 
 .abstract = true,
+.class_init = loongarch64_cpu_class_init,
 },
-DEFINE_LOONGARCH_CPU_TYPE(64, "la464", loongarch_la464_initfn),
+DEFINE_LOONGARCH_CPU_TYPE(64, "la464", loongarch64_la464_initfn),
 };
 
 DEFINE_TYPES(loongarch_cpu_type_infos)
-- 
2.41.0




[PATCH v2 2/8] target/loongarch: Remove duplicated disas_set_info assignment

2023-08-18 Thread Philippe Mathieu-Daudé
Commit 228021f05e ("target/loongarch: Add core definition") sets
disas_set_info to loongarch_cpu_disas_set_info. Probably due to
a failed git-rebase, commit ca61e75071 ("target/loongarch: Add gdb
support") also sets it to the same value. Remove the duplication.

Signed-off-by: Philippe Mathieu-Daudé 
Reviewed-by: Richard Henderson 
---
 target/loongarch/cpu.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/target/loongarch/cpu.c b/target/loongarch/cpu.c
index 7107968699..dc617be36f 100644
--- a/target/loongarch/cpu.c
+++ b/target/loongarch/cpu.c
@@ -723,7 +723,6 @@ static void loongarch_cpu_class_init(ObjectClass *c, void 
*data)
 cc->disas_set_info = loongarch_cpu_disas_set_info;
 cc->gdb_read_register = loongarch_cpu_gdb_read_register;
 cc->gdb_write_register = loongarch_cpu_gdb_write_register;
-cc->disas_set_info = loongarch_cpu_disas_set_info;
 cc->gdb_num_core_regs = 35;
 cc->gdb_core_xml_file = "loongarch-base64.xml";
 cc->gdb_stop_before_watchpoint = true;
-- 
2.41.0




[PATCH v2 6/8] target/loongarch: Add function to check current arch

2023-08-18 Thread Philippe Mathieu-Daudé
From: Jiajie Chen 

Add an is_la64 function to check whether the current cpucfg[1].arch
equals 2 (LA64).

Signed-off-by: Jiajie Chen 
Co-authored-by: Richard Henderson 
Reviewed-by: Richard Henderson 
Signed-off-by: Song Gao 
Message-ID: <20230817093121.1053890-2-gaos...@loongson.cn>
Signed-off-by: Philippe Mathieu-Daudé 
---
 target/loongarch/cpu.h | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/target/loongarch/cpu.h b/target/loongarch/cpu.h
index c50b3a5ef3..3235ad081f 100644
--- a/target/loongarch/cpu.h
+++ b/target/loongarch/cpu.h
@@ -132,6 +132,11 @@ FIELD(CPUCFG1, HP, 24, 1)
 FIELD(CPUCFG1, IOCSR_BRD, 25, 1)
 FIELD(CPUCFG1, MSG_INT, 26, 1)
 
+/* cpucfg[1].arch */
+#define CPUCFG1_ARCH_LA32R   0
+#define CPUCFG1_ARCH_LA321
+#define CPUCFG1_ARCH_LA642
+
 /* cpucfg[2] bits */
 FIELD(CPUCFG2, FP, 0, 1)
 FIELD(CPUCFG2, FP_SP, 1, 1)
@@ -421,6 +426,11 @@ static inline int cpu_mmu_index(CPULoongArchState *env, 
bool ifetch)
 #endif
 }
 
+static inline bool is_la64(CPULoongArchState *env)
+{
+return FIELD_EX32(env->cpucfg[1], CPUCFG1, ARCH) == CPUCFG1_ARCH_LA64;
+}
+
 /*
  * LoongArch CPUs hardware flags.
  */
-- 
2.41.0
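[Editor's sketch] The cpucfg-based mode check is a plain bitfield extraction. A self-contained sketch, assuming ARCH occupies the low two bits of cpucfg[1] as in the patch's CPUCFG1_ARCH_* values; extract32 here is a local re-implementation for illustration, not QEMU's FIELD_EX32 machinery:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CPUCFG1_ARCH_LA32R 0
#define CPUCFG1_ARCH_LA32  1
#define CPUCFG1_ARCH_LA64  2

/* Extract `length` bits starting at bit `start` (0 < length <= 32). */
static inline uint32_t extract32(uint32_t value, int start, int length)
{
    return (value >> start) & (~0u >> (32 - length));
}

/* Assumed layout: arch field in bits [1:0] of cpucfg[1]. */
static inline bool is_la64(uint32_t cpucfg1)
{
    return extract32(cpucfg1, 0, 2) == CPUCFG1_ARCH_LA64;
}
```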




[PATCH v2 4/8] target/loongarch: Introduce abstract TYPE_LOONGARCH64_CPU

2023-08-18 Thread Philippe Mathieu-Daudé
In preparation of introducing TYPE_LOONGARCH32_CPU, introduce
an abstract TYPE_LOONGARCH64_CPU.

Signed-off-by: Philippe Mathieu-Daudé 
---
 target/loongarch/cpu.h |  1 +
 target/loongarch/cpu.c | 12 +---
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/target/loongarch/cpu.h b/target/loongarch/cpu.h
index fa371ca8ba..c50b3a5ef3 100644
--- a/target/loongarch/cpu.h
+++ b/target/loongarch/cpu.h
@@ -377,6 +377,7 @@ struct ArchCPU {
 };
 
 #define TYPE_LOONGARCH_CPU "loongarch-cpu"
+#define TYPE_LOONGARCH64_CPU "loongarch64-cpu"
 
 OBJECT_DECLARE_CPU_TYPE(LoongArchCPU, LoongArchCPUClass,
 LOONGARCH_CPU)
diff --git a/target/loongarch/cpu.c b/target/loongarch/cpu.c
index a1ebc20330..34d6c5a31d 100644
--- a/target/loongarch/cpu.c
+++ b/target/loongarch/cpu.c
@@ -734,9 +734,9 @@ static void loongarch_cpu_class_init(ObjectClass *c, void 
*data)
 #endif
 }
 
-#define DEFINE_LOONGARCH_CPU_TYPE(model, initfn) \
+#define DEFINE_LOONGARCH_CPU_TYPE(size, model, initfn) \
 { \
-.parent = TYPE_LOONGARCH_CPU, \
+.parent = TYPE_LOONGARCH##size##_CPU, \
 .instance_init = initfn, \
 .name = LOONGARCH_CPU_TYPE_NAME(model), \
 }
@@ -752,7 +752,13 @@ static const TypeInfo loongarch_cpu_type_infos[] = {
 .class_size = sizeof(LoongArchCPUClass),
 .class_init = loongarch_cpu_class_init,
 },
-DEFINE_LOONGARCH_CPU_TYPE("la464", loongarch_la464_initfn),
+{
+.name = TYPE_LOONGARCH64_CPU,
+.parent = TYPE_LOONGARCH_CPU,
+
+.abstract = true,
+},
+DEFINE_LOONGARCH_CPU_TYPE(64, "la464", loongarch_la464_initfn),
 };
 
 DEFINE_TYPES(loongarch_cpu_type_infos)
-- 
2.41.0




[PATCH v2 0/8] target/loongarch: Cleanups in preparation of loongarch32 support

2023-08-18 Thread Philippe Mathieu-Daudé
v2:
- Do no rename loongarch_cpu_get/set_pc (rth)
- Rebased Jiajie's patches for convenience
- Added rth's R-b

Jiajie, this series contains few notes I took while
reviewing your series adding loongarch32 support [*].

If your series isn't merged, consider rebasing it on
this one.

Regards,

Phil.

[*] 
https://lore.kernel.org/qemu-devel/20230817093121.1053890-1-gaos...@loongson.cn/

Jiajie Chen (3):
  target/loongarch: Add function to check current arch
  target/loongarch: Add new object class for loongarch32 cpus
  target/loongarch: Add GDB support for loongarch32 mode

Philippe Mathieu-Daudé (4):
  target/loongarch: Log I/O write accesses to CSR registers
  target/loongarch: Remove duplicated disas_set_info assignment
  target/loongarch: Introduce abstract TYPE_LOONGARCH64_CPU
  target/loongarch: Extract 64-bit specifics to
loongarch64_cpu_class_init

Song Gao (1):
  target/loongarch: Fix loongarch_la464_initfn() misses setting LSPW

 configs/targets/loongarch64-softmmu.mak |  2 +-
 target/loongarch/cpu.h  | 12 +
 target/loongarch/cpu.c  | 62 +++--
 target/loongarch/gdbstub.c  | 32 ++---
 gdb-xml/loongarch-base32.xml| 45 ++
 5 files changed, 132 insertions(+), 21 deletions(-)
 create mode 100644 gdb-xml/loongarch-base32.xml

-- 
2.41.0




[PATCH v2 1/8] target/loongarch: Log I/O write accesses to CSR registers

2023-08-18 Thread Philippe Mathieu-Daudé
Various CSR registers have Read/Write fields. We might
want to see the guest trying to change such registers.

Signed-off-by: Philippe Mathieu-Daudé 
---
 target/loongarch/cpu.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/target/loongarch/cpu.c b/target/loongarch/cpu.c
index ad93ecac92..7107968699 100644
--- a/target/loongarch/cpu.c
+++ b/target/loongarch/cpu.c
@@ -544,6 +544,8 @@ static void loongarch_cpu_realizefn(DeviceState *dev, Error 
**errp)
 static void loongarch_qemu_write(void *opaque, hwaddr addr,
  uint64_t val, unsigned size)
 {
+qemu_log_mask(LOG_UNIMP, "[%s]: Unimplemented reg 0x%" HWADDR_PRIx "\n",
+  __func__, addr);
 }
 
 static uint64_t loongarch_qemu_read(void *opaque, hwaddr addr, unsigned size)
-- 
2.41.0




[PATCH v4 0/3] qmp, vhost-user: Remove virtio_list & update virtio introspection

2023-08-18 Thread Jonah Palmer
These patches update a few things related to virtio introspection via.
QMP/HMP commands.

1. Remove 'virtio_list' and instead query the QOM composition tree to
find any active & realized virtio devices.

The 'virtio_list' was duplicating information about virtio devices that
was already available in the QOM composition tree, so there was no need
to keep this list.

2. Add new transport, protocol, and device features as well as support
to introspect vhost-user-gpio devices.

Vhost-user-gpio previously had no support for introspection. Support for
introspecting its vhost-user device is now available in these patches.

3. Move the VhostUserProtocolFeature definition to its corresponding header
file (vhost-user.h). Clean up previous definitions in other files.

VhostUserProtocolFeature was being defined in 3 separate files. Instead
of 3 separate definitions, use one instead and add it to the
vhost-user.h header file.

New vhost-user protocol feature:

 - VHOST_USER_PROTOCOL_F_STATUS

New virtio device features:
---
virtio-blk:
 - VIRTIO_BLK_F_SECURE_ERASE

virtio-net:
 - VIRTIO_NET_F_NOTF_COAL
 - VIRTIO_NET_F_GUEST_USO4
 - VIRTIO_NET_F_GUEST_USO6
 - VIRTIO_NET_F_HOST_USO

virtio/vhost-user-gpio:
 - VIRTIO_GPIO_F_IRQ
 - VHOST_USER_F_PROTOCOL_FEATURES

v4: use 'g_autofree char *' instead of 'gchar *'
remove unneeded object unreferences ('object_unref')
remove 'VHOST_F_LOG_ALL' in virtio-gpio feature map
remove 'VIRTIO_F_RING_RESET' in transport feature map (already
exists)

v3: use recursion and type casting to find realized virtio devices
remove virtio scmi & bluetooth feature mappings
revert virtio scmi & bluetooth case changes in qmp_decode_features
change config define for VIRTIO_GPIO to CONFIG_VHOST_USER_GPIO
move VhostUserProtocolFeature definition to header file

v2: verify virtio devices via. 'TYPE_VIRTIO_DEVICES'
verify path is a virtio device before checking if it's realized
remove 'VIRTIO_BLK_F_ZONED' update (already exists)
add cover letter

Jonah Palmer (3):
  qmp: remove virtio_list, search QOM tree instead
  qmp: update virtio feature maps, vhost-user-gpio introspection
  vhost-user: move VhostUserProtocolFeature definition to header file

 hw/scsi/vhost-user-scsi.c  |   4 -
 hw/virtio/vhost-user-gpio.c|   7 ++
 hw/virtio/vhost-user.c |  21 -
 hw/virtio/virtio-qmp.c | 142 +++--
 hw/virtio/virtio-qmp.h |   7 --
 hw/virtio/virtio.c |   6 --
 include/hw/virtio/vhost-user.h |  21 +
 7 files changed, 93 insertions(+), 115 deletions(-)

-- 
2.39.3




[PATCH v4 2/3] qmp: update virtio feature maps, vhost-user-gpio introspection

2023-08-18 Thread Jonah Palmer
Add new vhost-user protocol feature to vhost-user protocol feature map
and enumeration:
 - VHOST_USER_PROTOCOL_F_STATUS

Add new virtio device features for several virtio devices to their
respective feature mappings:

virtio-blk:
 - VIRTIO_BLK_F_SECURE_ERASE

virtio-net:
 - VIRTIO_NET_F_NOTF_COAL
 - VIRTIO_NET_F_GUEST_USO4
 - VIRTIO_NET_F_GUEST_USO6
 - VIRTIO_NET_F_HOST_USO

virtio/vhost-user-gpio:
 - VIRTIO_GPIO_F_IRQ
 - VHOST_USER_F_PROTOCOL_FEATURES

Add support for introspection on vhost-user-gpio devices.

Signed-off-by: Jonah Palmer 
---

 Jonah: The previous version of this patch included the
 'VIRTIO_F_RING_RESET' feature being added to the virtio transport
 feature map but is no longer included here as it was added in a
 separate patch series that was recently pulled in.

 The previous version also included the 'VHOST_F_LOG_ALL' feature for
 vhost-user-gpio. However, this was removed in this version since
 it's not used by this device, nor does it make sense for it.

 hw/virtio/vhost-user-gpio.c |  7 +++
 hw/virtio/virtio-qmp.c  | 34 +-
 2 files changed, 40 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/vhost-user-gpio.c b/hw/virtio/vhost-user-gpio.c
index 3b013f2d0f..3d7fae3984 100644
--- a/hw/virtio/vhost-user-gpio.c
+++ b/hw/virtio/vhost-user-gpio.c
@@ -205,6 +205,12 @@ static void vu_gpio_guest_notifier_mask(VirtIODevice *vdev, int idx, bool mask)
 vhost_virtqueue_mask(&gpio->vhost_dev, vdev, idx, mask);
 }
 
+static struct vhost_dev *vu_gpio_get_vhost(VirtIODevice *vdev)
+{
+VHostUserGPIO *gpio = VHOST_USER_GPIO(vdev);
+return &gpio->vhost_dev;
+}
+
 static void do_vhost_user_cleanup(VirtIODevice *vdev, VHostUserGPIO *gpio)
 {
 virtio_delete_queue(gpio->command_vq);
@@ -413,6 +419,7 @@ static void vu_gpio_class_init(ObjectClass *klass, void *data)
 vdc->get_config = vu_gpio_get_config;
 vdc->set_status = vu_gpio_set_status;
 vdc->guest_notifier_mask = vu_gpio_guest_notifier_mask;
+vdc->get_vhost = vu_gpio_get_vhost;
 }
 
 static const TypeInfo vu_gpio_info = {
diff --git a/hw/virtio/virtio-qmp.c b/hw/virtio/virtio-qmp.c
index ac5f0ee0ee..9c3284e6c3 100644
--- a/hw/virtio/virtio-qmp.c
+++ b/hw/virtio/virtio-qmp.c
@@ -30,6 +30,7 @@
 #include "standard-headers/linux/virtio_iommu.h"
 #include "standard-headers/linux/virtio_mem.h"
 #include "standard-headers/linux/virtio_vsock.h"
+#include "standard-headers/linux/virtio_gpio.h"
 
 #include CONFIG_DEVICES
 
@@ -53,6 +54,7 @@ enum VhostUserProtocolFeature {
 VHOST_USER_PROTOCOL_F_RESET_DEVICE = 13,
 VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS = 14,
 VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS = 15,
+VHOST_USER_PROTOCOL_F_STATUS = 16,
 VHOST_USER_PROTOCOL_F_MAX
 };
 
@@ -136,6 +138,9 @@ static const qmp_virtio_feature_map_t vhost_user_protocol_map[] = {
 FEATURE_ENTRY(VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS, \
 "VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS: Configuration for "
 "memory slots supported"),
+FEATURE_ENTRY(VHOST_USER_PROTOCOL_F_STATUS, \
+"VHOST_USER_PROTOCOL_F_STATUS: Querying and notifying back-end "
+"device status supported"),
 { -1, "" }
 };
 
@@ -178,6 +183,8 @@ static const qmp_virtio_feature_map_t virtio_blk_feature_map[] = {
 "VIRTIO_BLK_F_DISCARD: Discard command supported"),
 FEATURE_ENTRY(VIRTIO_BLK_F_WRITE_ZEROES, \
 "VIRTIO_BLK_F_WRITE_ZEROES: Write zeroes command supported"),
+FEATURE_ENTRY(VIRTIO_BLK_F_SECURE_ERASE, \
+"VIRTIO_BLK_F_SECURE_ERASE: Secure erase supported"),
 FEATURE_ENTRY(VIRTIO_BLK_F_ZONED, \
 "VIRTIO_BLK_F_ZONED: Zoned block devices"),
 #ifndef VIRTIO_BLK_NO_LEGACY
@@ -301,6 +308,14 @@ static const qmp_virtio_feature_map_t virtio_net_feature_map[] = {
 FEATURE_ENTRY(VIRTIO_NET_F_CTRL_MAC_ADDR, \
 "VIRTIO_NET_F_CTRL_MAC_ADDR: MAC address set through control "
 "channel"),
+FEATURE_ENTRY(VIRTIO_NET_F_NOTF_COAL, \
+"VIRTIO_NET_F_NOTF_COAL: Device supports coalescing notifications"),
+FEATURE_ENTRY(VIRTIO_NET_F_GUEST_USO4, \
+"VIRTIO_NET_F_GUEST_USO4: Driver can receive USOv4"),
+FEATURE_ENTRY(VIRTIO_NET_F_GUEST_USO6, \
+"VIRTIO_NET_F_GUEST_USO6: Driver can receive USOv6"),
+FEATURE_ENTRY(VIRTIO_NET_F_HOST_USO, \
+"VIRTIO_NET_F_HOST_USO: Device can receive USO"),
 FEATURE_ENTRY(VIRTIO_NET_F_HASH_REPORT, \
 "VIRTIO_NET_F_HASH_REPORT: Hash reporting supported"),
 FEATURE_ENTRY(VIRTIO_NET_F_RSS, \
@@ -471,6 +486,18 @@ static const qmp_virtio_feature_map_t virtio_rng_feature_map[] = {
 };
 #endif
 
+/* virtio/vhost-gpio features mapping */
+#ifdef CONFIG_VHOST_USER_GPIO
+static const qmp_virtio_feature_map_t virtio_gpio_feature_map[] = {
+FEATURE_ENTRY(VIRTIO_GPIO_F_IRQ, \
+"VIRTIO_GPIO_F_IRQ: Device supports interrupts on GPIO lines"),
+FEATURE_ENTRY(VHOST_USER_F_PRO

[PATCH v4 3/3] vhost-user: move VhostUserProtocolFeature definition to header file

2023-08-18 Thread Jonah Palmer
Move the definition of VhostUserProtocolFeature to
include/hw/virtio/vhost-user.h.

Remove previous definitions in hw/scsi/vhost-user-scsi.c,
hw/virtio/vhost-user.c, and hw/virtio/virtio-qmp.c.

Previously there were 3 separate definitions of this enum across 3
different files. Now a single definition in the header serves all 3 files.

Signed-off-by: Jonah Palmer 
---
 hw/scsi/vhost-user-scsi.c  |  4 
 hw/virtio/vhost-user.c | 21 -
 hw/virtio/virtio-qmp.c | 22 +-
 include/hw/virtio/vhost-user.h | 21 +
 4 files changed, 22 insertions(+), 46 deletions(-)

diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c
index ee99b19e7a..df6b66cc1a 100644
--- a/hw/scsi/vhost-user-scsi.c
+++ b/hw/scsi/vhost-user-scsi.c
@@ -39,10 +39,6 @@ static const int user_feature_bits[] = {
 VHOST_INVALID_FEATURE_BIT
 };
 
-enum VhostUserProtocolFeature {
-VHOST_USER_PROTOCOL_F_RESET_DEVICE = 13,
-};
-
 static void vhost_user_scsi_set_status(VirtIODevice *vdev, uint8_t status)
 {
 VHostUserSCSI *s = (VHostUserSCSI *)vdev;
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 8dcf049d42..a096335921 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -56,27 +56,6 @@
  */
 #define VHOST_USER_MAX_CONFIG_SIZE 256
 
-enum VhostUserProtocolFeature {
-VHOST_USER_PROTOCOL_F_MQ = 0,
-VHOST_USER_PROTOCOL_F_LOG_SHMFD = 1,
-VHOST_USER_PROTOCOL_F_RARP = 2,
-VHOST_USER_PROTOCOL_F_REPLY_ACK = 3,
-VHOST_USER_PROTOCOL_F_NET_MTU = 4,
-VHOST_USER_PROTOCOL_F_BACKEND_REQ = 5,
-VHOST_USER_PROTOCOL_F_CROSS_ENDIAN = 6,
-VHOST_USER_PROTOCOL_F_CRYPTO_SESSION = 7,
-VHOST_USER_PROTOCOL_F_PAGEFAULT = 8,
-VHOST_USER_PROTOCOL_F_CONFIG = 9,
-VHOST_USER_PROTOCOL_F_BACKEND_SEND_FD = 10,
-VHOST_USER_PROTOCOL_F_HOST_NOTIFIER = 11,
-VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD = 12,
-VHOST_USER_PROTOCOL_F_RESET_DEVICE = 13,
-/* Feature 14 reserved for VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS. */
-VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS = 15,
-VHOST_USER_PROTOCOL_F_STATUS = 16,
-VHOST_USER_PROTOCOL_F_MAX
-};
-
 #define VHOST_USER_PROTOCOL_FEATURE_MASK ((1 << VHOST_USER_PROTOCOL_F_MAX) - 1)
 
 typedef enum VhostUserRequest {
diff --git a/hw/virtio/virtio-qmp.c b/hw/virtio/virtio-qmp.c
index 9c3284e6c3..2e1f9c0e7a 100644
--- a/hw/virtio/virtio-qmp.c
+++ b/hw/virtio/virtio-qmp.c
@@ -17,6 +17,7 @@
 #include "qapi/qapi-commands-qom.h"
 #include "qapi/qmp/qobject.h"
 #include "qapi/qmp/qjson.h"
+#include "hw/virtio/vhost-user.h"
 
 #include "standard-headers/linux/virtio_ids.h"
 #include "standard-headers/linux/vhost_types.h"
@@ -37,27 +38,6 @@
 #define FEATURE_ENTRY(name, desc) (qmp_virtio_feature_map_t) \
 { .virtio_bit = name, .feature_desc = desc }
 
-enum VhostUserProtocolFeature {
-VHOST_USER_PROTOCOL_F_MQ = 0,
-VHOST_USER_PROTOCOL_F_LOG_SHMFD = 1,
-VHOST_USER_PROTOCOL_F_RARP = 2,
-VHOST_USER_PROTOCOL_F_REPLY_ACK = 3,
-VHOST_USER_PROTOCOL_F_NET_MTU = 4,
-VHOST_USER_PROTOCOL_F_BACKEND_REQ = 5,
-VHOST_USER_PROTOCOL_F_CROSS_ENDIAN = 6,
-VHOST_USER_PROTOCOL_F_CRYPTO_SESSION = 7,
-VHOST_USER_PROTOCOL_F_PAGEFAULT = 8,
-VHOST_USER_PROTOCOL_F_CONFIG = 9,
-VHOST_USER_PROTOCOL_F_BACKEND_SEND_FD = 10,
-VHOST_USER_PROTOCOL_F_HOST_NOTIFIER = 11,
-VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD = 12,
-VHOST_USER_PROTOCOL_F_RESET_DEVICE = 13,
-VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS = 14,
-VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS = 15,
-VHOST_USER_PROTOCOL_F_STATUS = 16,
-VHOST_USER_PROTOCOL_F_MAX
-};
-
 /* Virtio transport features mapping */
 static const qmp_virtio_feature_map_t virtio_transport_map[] = {
 /* Virtio device transport features */
diff --git a/include/hw/virtio/vhost-user.h b/include/hw/virtio/vhost-user.h
index 191216a74f..80e2b4a463 100644
--- a/include/hw/virtio/vhost-user.h
+++ b/include/hw/virtio/vhost-user.h
@@ -11,6 +11,27 @@
 #include "chardev/char-fe.h"
 #include "hw/virtio/virtio.h"
 
+enum VhostUserProtocolFeature {
+VHOST_USER_PROTOCOL_F_MQ = 0,
+VHOST_USER_PROTOCOL_F_LOG_SHMFD = 1,
+VHOST_USER_PROTOCOL_F_RARP = 2,
+VHOST_USER_PROTOCOL_F_REPLY_ACK = 3,
+VHOST_USER_PROTOCOL_F_NET_MTU = 4,
+VHOST_USER_PROTOCOL_F_BACKEND_REQ = 5,
+VHOST_USER_PROTOCOL_F_CROSS_ENDIAN = 6,
+VHOST_USER_PROTOCOL_F_CRYPTO_SESSION = 7,
+VHOST_USER_PROTOCOL_F_PAGEFAULT = 8,
+VHOST_USER_PROTOCOL_F_CONFIG = 9,
+VHOST_USER_PROTOCOL_F_BACKEND_SEND_FD = 10,
+VHOST_USER_PROTOCOL_F_HOST_NOTIFIER = 11,
+VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD = 12,
+VHOST_USER_PROTOCOL_F_RESET_DEVICE = 13,
+VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS = 14,
+VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS = 15,
+VHOST_USER_PROTOCOL_F_STATUS = 16,
+VHOST_USER_PROTOCOL_F_MAX
+};
+
 /**
  * VhostUserHostNotifier - notifier information for one queue
  * @rcu: rcu_head for cleanup

[PATCH v4 1/3] qmp: remove virtio_list, search QOM tree instead

2023-08-18 Thread Jonah Palmer
The virtio_list duplicates information about virtio devices that already
exist in the QOM composition tree. Instead of creating this list of
realized virtio devices, search the QOM composition tree instead.

This patch modifies the QMP command qmp_x_query_virtio to instead
recursively search the QOM composition tree for devices of type
'TYPE_VIRTIO_DEVICE'. The device is also checked to ensure it's
realized.

Signed-off-by: Jonah Palmer 
---

 Jonah: In the v2 patches, the qmp_x_query_virtio function was
 iterating through devices found via qmp_qom_list and appending
 "/virtio-backend" to devices' paths to check if they were a virtio
 device.

 This method was messy and involved unneeded string manipulation.

 Instead, we can use recursion with object_get_root to iterate through
 all parent and child device paths to find virtio devices.

 The qmp_find_virtio_device function was also updated to simplify the
 method of determining if a path is to a valid and realized virtio
 device.

 hw/virtio/virtio-qmp.c | 88 +++---
 hw/virtio/virtio-qmp.h |  7 
 hw/virtio/virtio.c |  6 ---
 3 files changed, 32 insertions(+), 69 deletions(-)

diff --git a/hw/virtio/virtio-qmp.c b/hw/virtio/virtio-qmp.c
index 7515b0947b..ac5f0ee0ee 100644
--- a/hw/virtio/virtio-qmp.c
+++ b/hw/virtio/virtio-qmp.c
@@ -667,70 +667,46 @@ VirtioDeviceFeatures *qmp_decode_features(uint16_t device_id, uint64_t bitmap)
 return features;
 }
 
-VirtioInfoList *qmp_x_query_virtio(Error **errp)
+static int query_dev_child(Object *child, void *opaque)
 {
-VirtioInfoList *list = NULL;
-VirtioInfo *node;
-VirtIODevice *vdev;
+VirtioInfoList **vdevs = opaque;
+Object *dev = object_dynamic_cast(child, TYPE_VIRTIO_DEVICE);
+if (dev != NULL && DEVICE(dev)->realized) {
+VirtIODevice *vdev = VIRTIO_DEVICE(dev);
+
+VirtioInfo *info = g_new(VirtioInfo, 1);
+
+/* Get canonical path of device */
+g_autofree char *path = object_get_canonical_path(dev);
 
-QTAILQ_FOREACH(vdev, &virtio_list, next) {
-DeviceState *dev = DEVICE(vdev);
-Error *err = NULL;
-QObject *obj = qmp_qom_get(dev->canonical_path, "realized", &err);
-
-if (err == NULL) {
-GString *is_realized = qobject_to_json_pretty(obj, true);
-/* virtio device is NOT realized, remove it from list */
-if (!strncmp(is_realized->str, "false", 4)) {
-QTAILQ_REMOVE(&virtio_list, vdev, next);
-} else {
-node = g_new(VirtioInfo, 1);
-node->path = g_strdup(dev->canonical_path);
-node->name = g_strdup(vdev->name);
-QAPI_LIST_PREPEND(list, node);
-}
-   g_string_free(is_realized, true);
-}
-qobject_unref(obj);
+info->path = g_strdup(path);
+info->name = g_strdup(vdev->name);
+QAPI_LIST_PREPEND(*vdevs, info);
 }
+return 0;
+}
 
-return list;
+VirtioInfoList *qmp_x_query_virtio(Error **errp)
+{
+VirtioInfoList *vdevs = NULL;
+
+/* Query the QOM composition tree recursively for virtio devices */
+object_child_foreach_recursive(object_get_root(), query_dev_child, &vdevs);
+if (vdevs == NULL) {
+error_setg(errp, "No virtio devices found");
+}
+return vdevs;
 }
 
 VirtIODevice *qmp_find_virtio_device(const char *path)
 {
-VirtIODevice *vdev;
-
-QTAILQ_FOREACH(vdev, &virtio_list, next) {
-DeviceState *dev = DEVICE(vdev);
-
-if (strcmp(dev->canonical_path, path) != 0) {
-continue;
-}
-
-Error *err = NULL;
-QObject *obj = qmp_qom_get(dev->canonical_path, "realized", &err);
-if (err == NULL) {
-GString *is_realized = qobject_to_json_pretty(obj, true);
-/* virtio device is NOT realized, remove it from list */
-if (!strncmp(is_realized->str, "false", 4)) {
-g_string_free(is_realized, true);
-qobject_unref(obj);
-QTAILQ_REMOVE(&virtio_list, vdev, next);
-return NULL;
-}
-g_string_free(is_realized, true);
-} else {
-/* virtio device doesn't exist in QOM tree */
-QTAILQ_REMOVE(&virtio_list, vdev, next);
-qobject_unref(obj);
-return NULL;
-}
-/* device exists in QOM tree & is realized */
-qobject_unref(obj);
-return vdev;
+/* Verify the canonical path is a realized virtio device */
+Object *dev = object_dynamic_cast(object_resolve_path(path, NULL),
+  TYPE_VIRTIO_DEVICE);
+if (!dev || !DEVICE(dev)->realized) {
+return NULL;
 }
-return NULL;
+return VIRTIO_DEVICE(dev);
 }
 
 VirtioStatus *qmp_x_query_virtio_status(const char *path, Error **errp)
@@ -740,7 +716,7 @@ VirtioStatus *qmp_x_query_virtio_status(const char *path, Error **errp)

[PATCH 31/33] linux-user: Bound mmap_min_addr by host page size

2023-08-18 Thread Richard Henderson
Bizarrely, it is possible to set /proc/sys/vm/mmap_min_addr
to a value below the host page size.  Fix that.

Signed-off-by: Richard Henderson 
---
 linux-user/main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/linux-user/main.c b/linux-user/main.c
index 2334d7cc67..1925c275ed 100644
--- a/linux-user/main.c
+++ b/linux-user/main.c
@@ -904,7 +904,7 @@ int main(int argc, char **argv, char **envp)
 if ((fp = fopen("/proc/sys/vm/mmap_min_addr", "r")) != NULL) {
 unsigned long tmp;
 if (fscanf(fp, "%lu", &tmp) == 1 && tmp != 0) {
-mmap_min_addr = tmp;
+mmap_min_addr = MAX(tmp, host_page_size);
 qemu_log_mask(CPU_LOG_PAGE, "host mmap_min_addr=0x%lx\n",
   mmap_min_addr);
 }
-- 
2.34.1




[PATCH 23/33] linux-user: Split out mmap_h_gt_g

2023-08-18 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 linux-user/mmap.c | 288 ++
 1 file changed, 139 insertions(+), 149 deletions(-)

diff --git a/linux-user/mmap.c b/linux-user/mmap.c
index ed82b4bb75..6ab2f35e6f 100644
--- a/linux-user/mmap.c
+++ b/linux-user/mmap.c
@@ -223,7 +223,16 @@ int target_mprotect(abi_ulong start, abi_ulong len, int target_prot)
 return ret;
 }
 
-/* map an incomplete host page */
+/*
+ * Map an incomplete host page.
+ *
+ * Here be dragons.  This case will not work if there is an existing
+ * overlapping host page, which is file mapped, and for which the mapping
+ * is beyond the end of the file.  In that case, we will see SIGBUS when
+ * trying to write a portion of this page.
+ *
+ * FIXME: Work around this with a temporary signal handler and longjmp.
+ */
 static bool mmap_frag(abi_ulong real_start, abi_ulong start, abi_ulong last,
   int prot, int flags, int fd, off_t offset)
 {
@@ -629,19 +638,138 @@ static abi_long mmap_h_lt_g(abi_ulong start, abi_ulong len, int host_prot,
 return mmap_end(start, last, start, pass_last, mmap_flags, page_flags);
 }
 
+/*
+ * Special case host page size > target page size.
+ *
+ * The two special cases are addresses and file offsets that are valid
+ * for the guest but cannot be directly represented by the host.
+ */
+static abi_long mmap_h_gt_g(abi_ulong start, abi_ulong len,
+int target_prot, int host_prot,
+int flags, int page_flags, int fd,
+off_t offset, int host_page_size)
+{
+void *p, *want_p = g2h_untagged(start);
+off_t host_offset = offset & -host_page_size;
+abi_ulong last, real_start, real_last;
+bool misaligned_offset = false;
+size_t host_len;
+
+if (!(flags & (MAP_FIXED | MAP_FIXED_NOREPLACE))) {
+/*
+ * Adjust the offset to something representable on the host.
+ */
+host_len = len + offset - host_offset;
+p = mmap(want_p, host_len, host_prot, flags, fd, host_offset);
+if (p == MAP_FAILED) {
+return -1;
+}
+
+/* Update start to the file position at offset. */
+p += offset - host_offset;
+
+start = h2g(p);
+last = start + len - 1;
+return mmap_end(start, last, start, last, flags, page_flags);
+}
+
+if (!(flags & MAP_ANONYMOUS)) {
+misaligned_offset = (start ^ offset) & (host_page_size - 1);
+
+/*
+ * The fallback for misalignment is a private mapping + read.
+ * This carries none of the semantics required of MAP_SHARED.
+ */
+if (misaligned_offset && (flags & MAP_TYPE) != MAP_PRIVATE) {
+errno = EINVAL;
+return -1;
+}
+}
+
+last = start + len - 1;
+real_start = start & -host_page_size;
+real_last = ROUND_UP(last, host_page_size) - 1;
+
+/*
+ * Handle the start and end of the mapping.
+ */
+if (real_start < start) {
+abi_ulong real_page_last = real_start + host_page_size - 1;
+if (last <= real_page_last) {
+/* Entire allocation a subset of one host page. */
+if (!mmap_frag(real_start, start, last, target_prot,
+   flags, fd, offset)) {
+return -1;
+}
+return mmap_end(start, last, -1, 0, flags, page_flags);
+}
+
+if (!mmap_frag(real_start, start, real_page_last, target_prot,
+   flags, fd, offset)) {
+return -1;
+}
+real_start = real_page_last + 1;
+}
+
+if (last < real_last) {
+abi_ulong real_page_start = real_last - host_page_size + 1;
+if (!mmap_frag(real_page_start, real_page_start, last,
+   target_prot, flags, fd,
+   offset + real_page_start - start)) {
+return -1;
+}
+real_last = real_page_start - 1;
+}
+
+if (real_start > real_last) {
+return mmap_end(start, last, -1, 0, flags, page_flags);
+}
+
+/*
+ * Handle the middle of the mapping.
+ */
+
+host_len = real_last - real_start + 1;
+want_p += real_start - start;
+
+if (flags & MAP_ANONYMOUS) {
+p = mmap(want_p, host_len, host_prot, flags, -1, 0);
+} else if (!misaligned_offset) {
+p = mmap(want_p, host_len, host_prot, flags, fd,
+ offset + real_start - start);
+} else {
+p = mmap(want_p, host_len, host_prot | PROT_WRITE,
+ flags | MAP_ANONYMOUS, -1, 0);
+}
+if (p != want_p) {
+if (p != MAP_FAILED) {
+munmap(p, host_len);
+errno = EEXIST;
+}
+return -1;
+}
+
+if (misaligned_offset) {
+/* TODO: The read could be short. */
+if (pread(fd, p, host_len, offset + real_start - start) != host_len) {
+munmap(p, host_len);
+return -1

[PATCH 20/33] linux-user: Do early mmap placement only for reserved_va

2023-08-18 Thread Richard Henderson
For reserved_va, place all non-fixed maps, then proceed
as for MAP_FIXED.

Signed-off-by: Richard Henderson 
---
 linux-user/mmap.c | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/linux-user/mmap.c b/linux-user/mmap.c
index caa76eb11a..7d482df06d 100644
--- a/linux-user/mmap.c
+++ b/linux-user/mmap.c
@@ -495,17 +495,19 @@ static abi_long target_mmap__locked(abi_ulong start, abi_ulong len,
 host_offset = offset & -host_page_size;
 
 /*
- * If the user is asking for the kernel to find a location, do that
- * before we truncate the length for mapping files below.
+ * For reserved_va, we are in full control of the allocation.
+ * Find a suitable hole and convert to MAP_FIXED.
  */
-if (!(flags & (MAP_FIXED | MAP_FIXED_NOREPLACE))) {
+if (reserved_va && !(flags & (MAP_FIXED | MAP_FIXED_NOREPLACE))) {
 host_len = len + offset - host_offset;
-host_len = ROUND_UP(host_len, host_page_size);
-start = mmap_find_vma(real_start, host_len, TARGET_PAGE_SIZE);
+start = mmap_find_vma(real_start, host_len,
+  MAX(host_page_size, TARGET_PAGE_SIZE));
 if (start == (abi_ulong)-1) {
 errno = ENOMEM;
 return -1;
 }
+start += offset - host_offset;
+flags |= MAP_FIXED;
 }
 
 /*
-- 
2.34.1




[PATCH 25/33] tests/tcg: Extend file in linux-madvise.c

2023-08-18 Thread Richard Henderson
When guest page size > host page size, this test can fail
due to the SIGBUS protection hack.  Avoid this by making
sure that the file size is at least one guest page.

Visible with alpha guest on x86_64 host.

Signed-off-by: Richard Henderson 
---
 tests/tcg/multiarch/linux/linux-madvise.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tests/tcg/multiarch/linux/linux-madvise.c b/tests/tcg/multiarch/linux/linux-madvise.c
index 29d0997e68..539fb3b772 100644
--- a/tests/tcg/multiarch/linux/linux-madvise.c
+++ b/tests/tcg/multiarch/linux/linux-madvise.c
@@ -42,6 +42,8 @@ static void test_file(void)
 assert(ret == 0);
 written = write(fd, &c, sizeof(c));
 assert(written == sizeof(c));
+ret = ftruncate(fd, pagesize);
+assert(ret == 0);
 page = mmap(NULL, pagesize, PROT_READ, MAP_PRIVATE, fd, 0);
 assert(page != MAP_FAILED);
 
-- 
2.34.1




[PATCH 22/33] linux-user: Split out mmap_h_lt_g

2023-08-18 Thread Richard Henderson
Work much harder to get alignment and mapping beyond the end
of the file correct, both of which are exercised by our
test-mmap for alpha (8k pages) on any 4k page host.

Signed-off-by: Richard Henderson 
---
 linux-user/mmap.c | 156 +-
 1 file changed, 125 insertions(+), 31 deletions(-)

diff --git a/linux-user/mmap.c b/linux-user/mmap.c
index 7a0c0c1f35..ed82b4bb75 100644
--- a/linux-user/mmap.c
+++ b/linux-user/mmap.c
@@ -507,6 +507,128 @@ static abi_long mmap_h_eq_g(abi_ulong start, abi_ulong len,
 return mmap_end(start, last, start, last, flags, page_flags);
 }
 
+/*
+ * Special case host page size < target page size.
+ *
+ * The two special cases are increased guest alignment, and mapping
+ * past the end of a file.
+ *
+ * When mapping files into a memory area larger than the file,
+ * accesses to pages beyond the file size will cause a SIGBUS.
+ *
+ * For example, if mmaping a file of 100 bytes on a host with 4K
+ * pages emulating a target with 8K pages, the target expects to
+ * be able to access the first 8K. But the host will trap us on
+ * any access beyond 4K.
+ *
+ * When emulating a target with a larger page-size than the host's,
+ * we may need to truncate file maps at EOF and add extra anonymous
+ * pages up to the targets page boundary.
+ *
+ * This workaround only works for files that do not change.
+ * If the file is later extended (e.g. ftruncate), the SIGBUS
+ * vanishes and the proper behaviour is that changes within the
+ * anon page should be reflected in the file.
+ *
+ * However, this case is rather common with executable images,
+ * so the workaround is important for even trivial tests, whereas
+ * the mmap of a file being extended is less common.
+ */
+static abi_long mmap_h_lt_g(abi_ulong start, abi_ulong len, int host_prot,
+int mmap_flags, int page_flags, int fd,
+off_t offset, int host_page_size)
+{
+void *p, *want_p = g2h_untagged(start);
+off_t fileend_adj = 0;
+int flags = mmap_flags;
+abi_ulong last, pass_last;
+
+if (!(flags & MAP_ANONYMOUS)) {
+struct stat sb;
+
+if (fstat(fd, &sb) == -1) {
+return -1;
+}
+if (offset >= sb.st_size) {
+/*
+ * The entire map is beyond the end of the file.
+ * Transform it to an anonymous mapping.
+ */
+flags |= MAP_ANONYMOUS;
+fd = -1;
+offset = 0;
+} else if (offset + len > sb.st_size) {
+/*
+ * A portion of the map is beyond the end of the file.
+ * Truncate the file portion of the allocation.
+ */
+fileend_adj = offset + len - sb.st_size;
+}
+}
+
+if (flags & (MAP_FIXED | MAP_FIXED_NOREPLACE)) {
+if (fileend_adj) {
+p = mmap(want_p, len, host_prot, flags | MAP_ANONYMOUS, -1, 0);
+} else {
+p = mmap(want_p, len, host_prot, flags, fd, offset);
+}
+if (p != want_p) {
+if (p != MAP_FAILED) {
+munmap(p, len);
+errno = EEXIST;
+}
+return -1;
+}
+
+if (fileend_adj) {
+void *t = mmap(p, len - fileend_adj, host_prot,
+   (flags & ~MAP_FIXED_NOREPLACE) | MAP_FIXED,
+   fd, offset);
+assert(t != MAP_FAILED);
+}
+} else {
+size_t host_len, part_len;
+
+/*
+ * Take care to align the host memory.  Perform a larger anonymous
+ * allocation and extract the aligned portion.  Remap the file on
+ * top of that.
+ */
+host_len = len + TARGET_PAGE_SIZE - host_page_size;
+p = mmap(want_p, host_len, host_prot, flags | MAP_ANONYMOUS, -1, 0);
+if (p == MAP_FAILED) {
+return -1;
+}
+
+part_len = (uintptr_t)p & (TARGET_PAGE_SIZE - 1);
+if (part_len) {
+part_len = TARGET_PAGE_SIZE - part_len;
+munmap(p, part_len);
+p += part_len;
+host_len -= part_len;
+}
+if (len < host_len) {
+munmap(p + len, host_len - len);
+}
+
+if (!(flags & MAP_ANONYMOUS)) {
+void *t = mmap(p, len - fileend_adj, host_prot,
+   flags | MAP_FIXED, fd, offset);
+assert(t != MAP_FAILED);
+}
+
+start = h2g(p);
+}
+
+last = start + len - 1;
+if (fileend_adj) {
+pass_last = ROUND_UP(last - fileend_adj, host_page_size) - 1;
+} else {
+pass_last = last;
+}
+return mmap_end(start, last, start, pass_last, mmap_flags, page_flags);
+}
+
 static abi_long target_mmap__locked(abi_ulong start, abi_ulong len,
 int target_prot, int flags, int page_flags,
 int fd, off_t offset)
@@ -55

[PATCH 18/33] linux-user: Fix sub-host-page mmap

2023-08-18 Thread Richard Henderson
We cannot skip over the_end1 to the_end, because we fail to
record the validity of the guest page with the interval tree.
Remove "the_end" and rename "the_end1" to "the_end".

Signed-off-by: Richard Henderson 
---
 linux-user/mmap.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/linux-user/mmap.c b/linux-user/mmap.c
index 85d16a29c1..e905b1b8f2 100644
--- a/linux-user/mmap.c
+++ b/linux-user/mmap.c
@@ -599,7 +599,7 @@ static abi_long target_mmap__locked(abi_ulong start, abi_ulong len,
target_prot, flags, fd, offset)) {
 return -1;
 }
-goto the_end1;
+goto the_end;
 }
 if (!mmap_frag(real_start, start,
real_start + host_page_size - 1,
@@ -646,7 +646,7 @@ static abi_long target_mmap__locked(abi_ulong start, abi_ulong len,
 passthrough_last = real_last;
 }
 }
- the_end1:
+ the_end:
 if (flags & MAP_ANONYMOUS) {
 page_flags |= PAGE_ANON;
 }
@@ -663,7 +663,6 @@ static abi_long target_mmap__locked(abi_ulong start, abi_ulong len,
 page_set_flags(passthrough_last + 1, last, page_flags);
 }
 }
- the_end:
 trace_target_mmap_complete(start);
 if (qemu_loglevel_mask(CPU_LOG_PAGE)) {
 FILE *f = qemu_log_trylock();
-- 
2.34.1




[PATCH 03/33] linux-user: Remove qemu_host_page_{size, mask} in probe_guest_base

2023-08-18 Thread Richard Henderson
The host SHMLBA is by definition a multiple of the host page size.
Thus the remaining component of qemu_host_page_size is the
target page size.

Signed-off-by: Richard Henderson 
---
 linux-user/elfload.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index 9865f5e825..3648d7048d 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -2731,7 +2731,7 @@ static bool pgb_addr_set(PGBAddrs *ga, abi_ulong guest_loaddr,
 
 /* Add any HI_COMMPAGE not covered by reserved_va. */
 if (reserved_va < HI_COMMPAGE) {
-ga->bounds[n][0] = HI_COMMPAGE & qemu_host_page_mask;
+ga->bounds[n][0] = HI_COMMPAGE & -qemu_real_host_page_size();
 ga->bounds[n][1] = HI_COMMPAGE + TARGET_PAGE_SIZE - 1;
 n++;
 }
@@ -2913,7 +2913,7 @@ void probe_guest_base(const char *image_name, abi_ulong guest_loaddr,
   abi_ulong guest_hiaddr)
 {
 /* In order to use host shmat, we must be able to honor SHMLBA.  */
-uintptr_t align = MAX(SHMLBA, qemu_host_page_size);
+uintptr_t align = MAX(SHMLBA, TARGET_PAGE_SIZE);
 
 /* Sanity check the guest binary. */
 if (reserved_va) {
-- 
2.34.1




[PATCH 13/33] softmmu/physmem: Remove qemu_host_page_size

2023-08-18 Thread Richard Henderson
Use qemu_real_host_page_size() instead.

Signed-off-by: Richard Henderson 
---
 softmmu/physmem.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/softmmu/physmem.c b/softmmu/physmem.c
index 3df73542e1..6881b2d8f8 100644
--- a/softmmu/physmem.c
+++ b/softmmu/physmem.c
@@ -3448,7 +3448,7 @@ int ram_block_discard_range(RAMBlock *rb, uint64_t start, size_t length)
  *fallocate works on hugepages and shmem
  *shared anonymous memory requires madvise REMOVE
  */
-need_madvise = (rb->page_size == qemu_host_page_size);
+need_madvise = (rb->page_size == qemu_real_host_page_size());
 need_fallocate = rb->fd != -1;
 if (need_fallocate) {
 /* For a file, this causes the area of the file to be zero'd
-- 
2.34.1




[PATCH 21/33] linux-user: Split out mmap_h_eq_g

2023-08-18 Thread Richard Henderson
Move the MAP_FIXED_NOREPLACE check for reserved_va earlier.
Move the computation of host_prot earlier.

Signed-off-by: Richard Henderson 
---
 linux-user/mmap.c | 66 +--
 1 file changed, 53 insertions(+), 13 deletions(-)

diff --git a/linux-user/mmap.c b/linux-user/mmap.c
index 7d482df06d..7a0c0c1f35 100644
--- a/linux-user/mmap.c
+++ b/linux-user/mmap.c
@@ -482,6 +482,31 @@ static abi_long mmap_end(abi_ulong start, abi_ulong last,
 return start;
 }
 
+/*
+ * Special case host page size == target page size,
+ * where there are no edge conditions.
+ */
+static abi_long mmap_h_eq_g(abi_ulong start, abi_ulong len,
+int host_prot, int flags, int page_flags,
+int fd, off_t offset)
+{
+void *p, *want_p = g2h_untagged(start);
+abi_ulong last;
+
+p = mmap(want_p, len, host_prot, flags, fd, offset);
+if (p == MAP_FAILED) {
+return -1;
+}
+if ((flags & MAP_FIXED_NOREPLACE) && p != want_p) {
+errno = EEXIST;
+return -1;
+}
+
+start = h2g(p);
+last = start + len - 1;
+return mmap_end(start, last, start, last, flags, page_flags);
+}
+
 static abi_long target_mmap__locked(abi_ulong start, abi_ulong len,
 int target_prot, int flags, int page_flags,
 int fd, off_t offset)
@@ -490,6 +515,7 @@ static abi_long target_mmap__locked(abi_ulong start, abi_ulong len,
 abi_ulong ret, last, real_start, real_last, retaddr, host_len;
 abi_ulong passthrough_start = -1, passthrough_last = 0;
 off_t host_offset;
+int host_prot;
 
 real_start = start & -host_page_size;
 host_offset = offset & -host_page_size;
@@ -498,16 +524,33 @@ static abi_long target_mmap__locked(abi_ulong start, abi_ulong len,
  * For reserved_va, we are in full control of the allocation.
 * Find a suitable hole and convert to MAP_FIXED.
  */
-if (reserved_va && !(flags & (MAP_FIXED | MAP_FIXED_NOREPLACE))) {
-host_len = len + offset - host_offset;
-start = mmap_find_vma(real_start, host_len,
-  MAX(host_page_size, TARGET_PAGE_SIZE));
-if (start == (abi_ulong)-1) {
-errno = ENOMEM;
-return -1;
+if (reserved_va) {
+if (flags & MAP_FIXED_NOREPLACE) {
+/* Validate that the chosen range is empty. */
+if (!page_check_range_empty(start, start + len - 1)) {
+errno = EEXIST;
+return -1;
+}
+flags = (flags & ~MAP_FIXED_NOREPLACE) | MAP_FIXED;
+} else if (!(flags & MAP_FIXED)) {
+size_t real_len = len + offset - host_offset;
+abi_ulong align = MAX(host_page_size, TARGET_PAGE_SIZE);
+
+start = mmap_find_vma(real_start, real_len, align);
+if (start == (abi_ulong)-1) {
+errno = ENOMEM;
+return -1;
+}
+start += offset - host_offset;
+flags |= MAP_FIXED;
 }
-start += offset - host_offset;
-flags |= MAP_FIXED;
+}
+
+host_prot = target_to_host_prot(target_prot);
+
+if (host_page_size == TARGET_PAGE_SIZE) {
+return mmap_h_eq_g(start, len, host_prot, flags,
+   page_flags, fd, offset);
 }
 
 /*
@@ -543,12 +586,10 @@ static abi_long target_mmap__locked(abi_ulong start, abi_ulong len,
 
 if (!(flags & (MAP_FIXED | MAP_FIXED_NOREPLACE))) {
 uintptr_t host_start;
-int host_prot;
 void *p;
 
 host_len = len + offset - host_offset;
 host_len = ROUND_UP(host_len, host_page_size);
-host_prot = target_to_host_prot(target_prot);
 
 /* Note: we prefer to control the mapping address. */
 p = mmap(g2h_untagged(start), host_len, host_prot,
@@ -671,8 +712,7 @@ static abi_long target_mmap__locked(abi_ulong start, abi_ulong len,
 len1 = real_last - real_start + 1;
 want_p = g2h_untagged(real_start);
 
-p = mmap(want_p, len1, target_to_host_prot(target_prot),
- flags, fd, offset1);
+p = mmap(want_p, len1, host_prot, flags, fd, offset1);
 if (p != want_p) {
 if (p != MAP_FAILED) {
 munmap(p, len1);
-- 
2.34.1




[PATCH 24/33] tests/tcg: Remove run-test-mmap-*

2023-08-18 Thread Richard Henderson
These tests are confused: the -p option changes the host page size,
not the guest page size.

Signed-off-by: Richard Henderson 
---
 tests/tcg/alpha/Makefile.target |  3 ---
 tests/tcg/arm/Makefile.target   |  3 ---
 tests/tcg/hppa/Makefile.target  |  3 ---
 tests/tcg/i386/Makefile.target  |  3 ---
 tests/tcg/m68k/Makefile.target  |  3 ---
 tests/tcg/multiarch/Makefile.target |  9 -
 tests/tcg/ppc/Makefile.target   | 12 
 tests/tcg/sh4/Makefile.target   |  3 ---
 tests/tcg/sparc64/Makefile.target   |  6 --
 9 files changed, 45 deletions(-)
 delete mode 100644 tests/tcg/ppc/Makefile.target
 delete mode 100644 tests/tcg/sparc64/Makefile.target

diff --git a/tests/tcg/alpha/Makefile.target b/tests/tcg/alpha/Makefile.target
index b94500a7d9..fdd7ddf64e 100644
--- a/tests/tcg/alpha/Makefile.target
+++ b/tests/tcg/alpha/Makefile.target
@@ -13,6 +13,3 @@ test-cmov: test-cond.c
$(CC) $(CFLAGS) $(EXTRA_CFLAGS) $< -o $@ $(LDFLAGS)
 
 run-test-cmov: test-cmov
-
-# On Alpha Linux only supports 8k pages
-EXTRA_RUNS+=run-test-mmap-8192
diff --git a/tests/tcg/arm/Makefile.target b/tests/tcg/arm/Makefile.target
index 0038cef02c..4b8c9c334e 100644
--- a/tests/tcg/arm/Makefile.target
+++ b/tests/tcg/arm/Makefile.target
@@ -79,6 +79,3 @@ sha512-vector: sha512.c
 ARM_TESTS += sha512-vector
 
 TESTS += $(ARM_TESTS)
-
-# On ARM Linux only supports 4k pages
-EXTRA_RUNS+=run-test-mmap-4096
diff --git a/tests/tcg/hppa/Makefile.target b/tests/tcg/hppa/Makefile.target
index cdd0d572a7..ea5ae2186d 100644
--- a/tests/tcg/hppa/Makefile.target
+++ b/tests/tcg/hppa/Makefile.target
@@ -2,9 +2,6 @@
 #
 # HPPA specific tweaks - specifically masking out broken tests
 
-# On parisc Linux supports 4K/16K/64K (but currently only 4k works)
-EXTRA_RUNS+=run-test-mmap-4096 # run-test-mmap-16384 run-test-mmap-65536
-
 # This triggers failures for hppa-linux about 1% of the time
 # HPPA is the odd target that can't use the sigtramp page;
 # it requires the full vdso with dwarf2 unwind info.
diff --git a/tests/tcg/i386/Makefile.target b/tests/tcg/i386/Makefile.target
index fdf757c6ce..f64d7bfbf5 100644
--- a/tests/tcg/i386/Makefile.target
+++ b/tests/tcg/i386/Makefile.target
@@ -71,9 +71,6 @@ endif
 I386_TESTS:=$(filter-out $(SKIP_I386_TESTS), $(ALL_X86_TESTS))
 TESTS=$(MULTIARCH_TESTS) $(I386_TESTS)
 
-# On i386 and x86_64 Linux only supports 4k pages (large pages are a different hack)
-EXTRA_RUNS+=run-test-mmap-4096
-
 sha512-sse: CFLAGS=-msse4.1 -O3
 sha512-sse: sha512.c
$(CC) $(CFLAGS) $(EXTRA_CFLAGS) $< -o $@ $(LDFLAGS)
diff --git a/tests/tcg/m68k/Makefile.target b/tests/tcg/m68k/Makefile.target
index 1163c7ef03..73a16aedd2 100644
--- a/tests/tcg/m68k/Makefile.target
+++ b/tests/tcg/m68k/Makefile.target
@@ -5,6 +5,3 @@
 
 VPATH += $(SRC_PATH)/tests/tcg/m68k
 TESTS += trap
-
-# On m68k Linux supports 4k and 8k pages (but 8k is currently broken)
-EXTRA_RUNS+=run-test-mmap-4096 # run-test-mmap-8192
diff --git a/tests/tcg/multiarch/Makefile.target b/tests/tcg/multiarch/Makefile.target
index 43bddeaf21..fa1ac190f2 100644
--- a/tests/tcg/multiarch/Makefile.target
+++ b/tests/tcg/multiarch/Makefile.target
@@ -51,18 +51,9 @@ run-plugin-vma-pthread-with-%: vma-pthread
$(call skip-test, $<, "flaky on CI?")
 endif
 
-# We define the runner for test-mmap after the individual
-# architectures have defined their supported pages sizes. If no
-# additional page sizes are defined we only run the default test.
-
-# default case (host page size)
 run-test-mmap: test-mmap
$(call run-test, test-mmap, $(QEMU) $<, $< (default))
 
-# additional page sizes (defined by each architecture adding to EXTRA_RUNS)
-run-test-mmap-%: test-mmap
-   $(call run-test, test-mmap-$*, $(QEMU) -p $* $<, $< ($* byte pages))
-
 ifneq ($(HAVE_GDB_BIN),)
 ifeq ($(HOST_GDB_SUPPORTS_ARCH),y)
 GDB_SCRIPT=$(SRC_PATH)/tests/guest-debug/run-test.py
diff --git a/tests/tcg/ppc/Makefile.target b/tests/tcg/ppc/Makefile.target
deleted file mode 100644
index f5e08c7376..00
--- a/tests/tcg/ppc/Makefile.target
+++ /dev/null
@@ -1,12 +0,0 @@
-# -*- Mode: makefile -*-
-#
-# PPC - included from tests/tcg/Makefile
-#
-
-ifneq (,$(findstring 64,$(TARGET_NAME)))
-# On PPC64 Linux can be configured with 4k (default) or 64k pages (currently broken)
-EXTRA_RUNS+=run-test-mmap-4096 #run-test-mmap-65536
-else
-# On PPC32 Linux supports 4K/16K/64K/256K (but currently only 4k works)
-EXTRA_RUNS+=run-test-mmap-4096 #run-test-mmap-16384 run-test-mmap-65536 run-test-mmap-262144
-endif
diff --git a/tests/tcg/sh4/Makefile.target b/tests/tcg/sh4/Makefile.target
index 47c39a44b6..16eaa850a8 100644
--- a/tests/tcg/sh4/Makefile.target
+++ b/tests/tcg/sh4/Makefile.target
@@ -3,9 +3,6 @@
 # SuperH specific tweaks
 #
 
-# On sh Linux supports 4k, 8k, 16k and 64k pages (but only 4k currently works)
-EXTRA_RUNS+=run-test-mmap-4096 # run-test-mmap-8192 run-test-mmap-16384 run-test-mmap-65536
-
 # This triggers failures for sh4-

[PATCH 19/33] linux-user: Split out mmap_end

2023-08-18 Thread Richard Henderson
Use a subroutine instead of a goto within target_mmap__locked.

Signed-off-by: Richard Henderson 
---
 linux-user/mmap.c | 69 +++
 1 file changed, 40 insertions(+), 29 deletions(-)

diff --git a/linux-user/mmap.c b/linux-user/mmap.c
index e905b1b8f2..caa76eb11a 100644
--- a/linux-user/mmap.c
+++ b/linux-user/mmap.c
@@ -446,6 +446,42 @@ abi_ulong mmap_find_vma(abi_ulong start, abi_ulong size, 
abi_ulong align)
 }
 }
 
+/*
+ * Record a successful mmap within the user-exec interval tree.
+ */
+static abi_long mmap_end(abi_ulong start, abi_ulong last,
+ abi_ulong passthrough_start,
+ abi_ulong passthrough_last,
+ int flags, int page_flags)
+{
+if (flags & MAP_ANONYMOUS) {
+page_flags |= PAGE_ANON;
+}
+page_flags |= PAGE_RESET;
+if (passthrough_start > passthrough_last) {
+page_set_flags(start, last, page_flags);
+} else {
+if (start < passthrough_start) {
+page_set_flags(start, passthrough_start - 1, page_flags);
+}
+page_set_flags(passthrough_start, passthrough_last,
+   page_flags | PAGE_PASSTHROUGH);
+if (passthrough_last < last) {
+page_set_flags(passthrough_last + 1, last, page_flags);
+}
+}
+trace_target_mmap_complete(start);
+if (qemu_loglevel_mask(CPU_LOG_PAGE)) {
+FILE *f = qemu_log_trylock();
+if (f) {
+fprintf(f, "page layout changed following mmap\n");
+page_dump(f);
+qemu_log_unlock(f);
+}
+}
+return start;
+}
+
 static abi_long target_mmap__locked(abi_ulong start, abi_ulong len,
 int target_prot, int flags, int page_flags,
 int fd, off_t offset)
@@ -588,7 +624,7 @@ static abi_long target_mmap__locked(abi_ulong start, abi_ulong len,
 ret = target_mprotect(start, len, target_prot);
 assert(ret == 0);
 }
-goto the_end;
+return mmap_end(start, last, -1, 0, flags, page_flags);
 }
 
 /* handle the start of the mapping */
@@ -599,7 +635,7 @@ static abi_long target_mmap__locked(abi_ulong start, abi_ulong len,
target_prot, flags, fd, offset)) {
 return -1;
 }
-goto the_end;
+return mmap_end(start, last, -1, 0, flags, page_flags);
 }
 if (!mmap_frag(real_start, start,
real_start + host_page_size - 1,
@@ -646,33 +682,8 @@ static abi_long target_mmap__locked(abi_ulong start, abi_ulong len,
 passthrough_last = real_last;
 }
 }
- the_end:
-if (flags & MAP_ANONYMOUS) {
-page_flags |= PAGE_ANON;
-}
-page_flags |= PAGE_RESET;
-if (passthrough_start > passthrough_last) {
-page_set_flags(start, last, page_flags);
-} else {
-if (start < passthrough_start) {
-page_set_flags(start, passthrough_start - 1, page_flags);
-}
-page_set_flags(passthrough_start, passthrough_last,
-   page_flags | PAGE_PASSTHROUGH);
-if (passthrough_last < last) {
-page_set_flags(passthrough_last + 1, last, page_flags);
-}
-}
-trace_target_mmap_complete(start);
-if (qemu_loglevel_mask(CPU_LOG_PAGE)) {
-FILE *f = qemu_log_trylock();
-if (f) {
-fprintf(f, "page layout changed following mmap\n");
-page_dump(f);
-qemu_log_unlock(f);
-}
-}
-return start;
+return mmap_end(start, last, passthrough_start, passthrough_last,
+flags, page_flags);
 }
 
 /* NOTE: all the constants are the HOST ones */
-- 
2.34.1




[PATCH 28/33] accel/tcg: Disconnect TargetPageDataNode from page size

2023-08-18 Thread Richard Henderson
Dynamically size the node for the runtime target page size.

Signed-off-by: Richard Henderson 
---
 accel/tcg/user-exec.c | 13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/accel/tcg/user-exec.c b/accel/tcg/user-exec.c
index 4c1697500a..09dc85c851 100644
--- a/accel/tcg/user-exec.c
+++ b/accel/tcg/user-exec.c
@@ -863,7 +863,7 @@ tb_page_addr_t get_page_addr_code_hostp(CPUArchState *env, vaddr addr,
 typedef struct TargetPageDataNode {
 struct rcu_head rcu;
 IntervalTreeNode itree;
-char data[TPD_PAGES][TARGET_PAGE_DATA_SIZE] __attribute__((aligned));
+char data[] __attribute__((aligned));
 } TargetPageDataNode;
 
 static IntervalTreeRoot targetdata_root;
@@ -901,7 +901,8 @@ void page_reset_target_data(target_ulong start, target_ulong last)
 n_last = MIN(last, n->last);
 p_len = (n_last + 1 - n_start) >> TARGET_PAGE_BITS;
 
-memset(t->data[p_ofs], 0, p_len * TARGET_PAGE_DATA_SIZE);
+memset(t->data + p_ofs * TARGET_PAGE_DATA_SIZE, 0,
+   p_len * TARGET_PAGE_DATA_SIZE);
 }
 }
 
@@ -909,7 +910,7 @@ void *page_get_target_data(target_ulong address)
 {
 IntervalTreeNode *n;
 TargetPageDataNode *t;
-target_ulong page, region;
+target_ulong page, region, p_ofs;
 
 page = address & TARGET_PAGE_MASK;
 region = address & TBD_MASK;
@@ -925,7 +926,8 @@ void *page_get_target_data(target_ulong address)
 mmap_lock();
 n = interval_tree_iter_first(&targetdata_root, page, page);
 if (!n) {
-t = g_new0(TargetPageDataNode, 1);
+t = g_malloc0(sizeof(TargetPageDataNode)
+  + TPD_PAGES * TARGET_PAGE_DATA_SIZE);
 n = &t->itree;
 n->start = region;
 n->last = region | ~TBD_MASK;
@@ -935,7 +937,8 @@ void *page_get_target_data(target_ulong address)
 }
 
 t = container_of(n, TargetPageDataNode, itree);
-return t->data[(page - region) >> TARGET_PAGE_BITS];
+p_ofs = (page - region) >> TARGET_PAGE_BITS;
+return t->data + p_ofs * TARGET_PAGE_DATA_SIZE;
 }
 #else
 void page_reset_target_data(target_ulong start, target_ulong last) { }
-- 
2.34.1




[PATCH 06/33] linux-user/nios2: Remove qemu_host_page_size from init_guest_commpage

2023-08-18 Thread Richard Henderson
Use qemu_real_host_page_size.
If !reserved_va, use MAP_FIXED_NOREPLACE.

Signed-off-by: Richard Henderson 
---
 linux-user/elfload.c | 14 +-
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index 1da77f4f71..b3b9232955 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -1375,10 +1375,14 @@ static bool init_guest_commpage(void)
  0x3a, 0x68, 0x3b, 0x00,  /* trap 0 */
 };
 
-void *want = g2h_untagged(LO_COMMPAGE & -qemu_host_page_size);
-void *addr = mmap(want, qemu_host_page_size, PROT_READ | PROT_WRITE,
-  MAP_ANONYMOUS | MAP_PRIVATE | MAP_FIXED, -1, 0);
+int host_page_size = qemu_real_host_page_size();
+void *want, *addr;
 
+want = g2h_untagged(LO_COMMPAGE & -host_page_size);
+addr = mmap(want, host_page_size, PROT_READ | PROT_WRITE,
+MAP_ANONYMOUS | MAP_PRIVATE |
+(reserved_va ? MAP_FIXED : MAP_FIXED_NOREPLACE),
+-1, 0);
 if (addr == MAP_FAILED) {
 perror("Allocating guest commpage");
 exit(EXIT_FAILURE);
@@ -1387,9 +1391,9 @@ static bool init_guest_commpage(void)
 return false;
 }
 
-memcpy(addr, kuser_page, sizeof(kuser_page));
+memcpy(g2h_untagged(LO_COMMPAGE), kuser_page, sizeof(kuser_page));
 
-if (mprotect(addr, qemu_host_page_size, PROT_READ)) {
+if (mprotect(addr, host_page_size, PROT_READ)) {
 perror("Protecting guest commpage");
 exit(EXIT_FAILURE);
 }
-- 
2.34.1




[PATCH 26/33] linux-user: Deprecate and disable -p pagesize

2023-08-18 Thread Richard Henderson
This option controls the host page size.  From the mis-usage in
our own testsuite, this is easily confused with guest page size.

The only thing that occurs when changing the host page size is
that stuff breaks, because one cannot actually change the host
page size.  Therefore reject all but the no-op setting as part
of the deprecation process.

Signed-off-by: Richard Henderson 
---
 linux-user/main.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/linux-user/main.c b/linux-user/main.c
index c1058abc3c..3dd3310331 100644
--- a/linux-user/main.c
+++ b/linux-user/main.c
@@ -332,10 +332,11 @@ static void handle_arg_ld_prefix(const char *arg)
 
 static void handle_arg_pagesize(const char *arg)
 {
-qemu_host_page_size = atoi(arg);
-if (qemu_host_page_size == 0 ||
-(qemu_host_page_size & (qemu_host_page_size - 1)) != 0) {
-fprintf(stderr, "page size must be a power of two\n");
+unsigned size, want = qemu_real_host_page_size();
+
+if (qemu_strtoui(arg, NULL, 10, &size) || size != want) {
+error_report("Deprecated page size option cannot "
+ "change host page size (%u)", want);
 exit(EXIT_FAILURE);
 }
 }
@@ -496,7 +497,7 @@ static const struct qemu_argument arg_table[] = {
 {"D",  "QEMU_LOG_FILENAME", true, handle_arg_log_filename,
  "logfile", "write logs to 'logfile' (default stderr)"},
 {"p",  "QEMU_PAGESIZE",true,  handle_arg_pagesize,
- "pagesize",   "set the host page size to 'pagesize'"},
+ "pagesize",   "deprecated change to host page size"},
 {"one-insn-per-tb",
"QEMU_ONE_INSN_PER_TB",  false, handle_arg_one_insn_per_tb,
  "",   "run with one guest instruction per emulated TB"},
-- 
2.34.1




[PATCH 09/33] linux-user: Remove REAL_HOST_PAGE_ALIGN from mmap.c

2023-08-18 Thread Richard Henderson
We already have qemu_real_host_page_size() in a local variable.

Signed-off-by: Richard Henderson 
---
 linux-user/mmap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/linux-user/mmap.c b/linux-user/mmap.c
index fc23192a32..48a6ef0af9 100644
--- a/linux-user/mmap.c
+++ b/linux-user/mmap.c
@@ -541,7 +541,7 @@ abi_long target_mmap(abi_ulong start, abi_ulong len, int target_prot,
  * the hosts real pagesize. Additional anonymous maps
  * will be created beyond EOF.
  */
-len = REAL_HOST_PAGE_ALIGN(sb.st_size - offset);
+len = ROUND_UP(sb.st_size - offset, host_page_size);
 }
 }
 
-- 
2.34.1




[PATCH 08/33] linux-user: Remove qemu_host_page_{size, mask} from mmap.c

2023-08-18 Thread Richard Henderson
Use qemu_real_host_page_size instead.

Signed-off-by: Richard Henderson 
---
 linux-user/mmap.c | 66 +++
 1 file changed, 33 insertions(+), 33 deletions(-)

diff --git a/linux-user/mmap.c b/linux-user/mmap.c
index 9aab48d4a3..fc23192a32 100644
--- a/linux-user/mmap.c
+++ b/linux-user/mmap.c
@@ -121,6 +121,7 @@ static int target_to_host_prot(int prot)
 /* NOTE: all the constants are the HOST ones, but addresses are target. */
 int target_mprotect(abi_ulong start, abi_ulong len, int target_prot)
 {
+int host_page_size = qemu_real_host_page_size();
 abi_ulong starts[3];
 abi_ulong lens[3];
 int prots[3];
@@ -145,13 +146,13 @@ int target_mprotect(abi_ulong start, abi_ulong len, int target_prot)
 }
 
 last = start + len - 1;
-host_start = start & qemu_host_page_mask;
+host_start = start & -host_page_size;
 host_last = HOST_PAGE_ALIGN(last) - 1;
 nranges = 0;
 
 mmap_lock();
 
-if (host_last - host_start < qemu_host_page_size) {
+if (host_last - host_start < host_page_size) {
 /* Single host page contains all guest pages: sum the prot. */
 prot1 = target_prot;
 for (abi_ulong a = host_start; a < start; a += TARGET_PAGE_SIZE) {
@@ -161,7 +162,7 @@ int target_mprotect(abi_ulong start, abi_ulong len, int target_prot)
 prot1 |= page_get_flags(a + 1);
 }
 starts[nranges] = host_start;
-lens[nranges] = qemu_host_page_size;
+lens[nranges] = host_page_size;
 prots[nranges] = prot1;
 nranges++;
 } else {
@@ -174,10 +175,10 @@ int target_mprotect(abi_ulong start, abi_ulong len, int target_prot)
 /* If the resulting sum differs, create a new range. */
 if (prot1 != target_prot) {
 starts[nranges] = host_start;
-lens[nranges] = qemu_host_page_size;
+lens[nranges] = host_page_size;
 prots[nranges] = prot1;
 nranges++;
-host_start += qemu_host_page_size;
+host_start += host_page_size;
 }
 }
 
@@ -189,9 +190,9 @@ int target_mprotect(abi_ulong start, abi_ulong len, int target_prot)
 }
 /* If the resulting sum differs, create a new range. */
 if (prot1 != target_prot) {
-host_last -= qemu_host_page_size;
+host_last -= host_page_size;
 starts[nranges] = host_last + 1;
-lens[nranges] = qemu_host_page_size;
+lens[nranges] = host_page_size;
 prots[nranges] = prot1;
 nranges++;
 }
@@ -226,6 +227,7 @@ int target_mprotect(abi_ulong start, abi_ulong len, int target_prot)
 static bool mmap_frag(abi_ulong real_start, abi_ulong start, abi_ulong last,
   int prot, int flags, int fd, off_t offset)
 {
+int host_page_size = qemu_real_host_page_size();
 abi_ulong real_last;
 void *host_start;
 int prot_old, prot_new;
@@ -242,7 +244,7 @@ static bool mmap_frag(abi_ulong real_start, abi_ulong start, abi_ulong last,
 return false;
 }
 
-real_last = real_start + qemu_host_page_size - 1;
+real_last = real_start + host_page_size - 1;
 host_start = g2h_untagged(real_start);
 
 /* Get the protection of the target pages outside the mapping. */
@@ -260,12 +262,12 @@ static bool mmap_frag(abi_ulong real_start, abi_ulong start, abi_ulong last,
  * outside of the fragment we need to map.  Allocate a new host
  * page to cover, discarding whatever else may have been present.
  */
-void *p = mmap(host_start, qemu_host_page_size,
+void *p = mmap(host_start, host_page_size,
target_to_host_prot(prot),
flags | MAP_ANONYMOUS, -1, 0);
 if (p != host_start) {
 if (p != MAP_FAILED) {
-munmap(p, qemu_host_page_size);
+munmap(p, host_page_size);
 errno = EEXIST;
 }
 return false;
@@ -280,7 +282,7 @@ static bool mmap_frag(abi_ulong real_start, abi_ulong start, abi_ulong last,
 /* Adjust protection to be able to write. */
 if (!(host_prot_old & PROT_WRITE)) {
 host_prot_old |= PROT_WRITE;
-mprotect(host_start, qemu_host_page_size, host_prot_old);
+mprotect(host_start, host_page_size, host_prot_old);
 }
 
 /* Read or zero the new guest pages. */
@@ -294,7 +296,7 @@ static bool mmap_frag(abi_ulong real_start, abi_ulong start, abi_ulong last,
 
 /* Put final protection */
 if (host_prot_new != host_prot_old) {
-mprotect(host_start, qemu_host_page_size, host_prot_new);
+mprotect(host_start, host_page_size, host_prot_new);
 }
 return true;
 }
@@ -329,17 +331,18 @@ static abi_ulong mmap_find_vma_reserved(abi_ulong start, abi_ulong size,
  */
 abi_ulong mmap_find_vma(a

[PATCH 12/33] hw/tpm: Remove HOST_PAGE_ALIGN from tpm_ppi_init

2023-08-18 Thread Richard Henderson
The size of the allocation need not match the alignment.

Signed-off-by: Richard Henderson 
---
 hw/tpm/tpm_ppi.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/hw/tpm/tpm_ppi.c b/hw/tpm/tpm_ppi.c
index 7f74e26ec6..91eeafd53a 100644
--- a/hw/tpm/tpm_ppi.c
+++ b/hw/tpm/tpm_ppi.c
@@ -47,8 +47,7 @@ void tpm_ppi_reset(TPMPPI *tpmppi)
 void tpm_ppi_init(TPMPPI *tpmppi, MemoryRegion *m,
   hwaddr addr, Object *obj)
 {
-tpmppi->buf = qemu_memalign(qemu_real_host_page_size(),
-HOST_PAGE_ALIGN(TPM_PPI_ADDR_SIZE));
+tpmppi->buf = qemu_memalign(qemu_real_host_page_size(), TPM_PPI_ADDR_SIZE);
 memory_region_init_ram_device_ptr(&tpmppi->ram, obj, "tpm-ppi",
   TPM_PPI_ADDR_SIZE, tpmppi->buf);
 vmstate_register_ram(&tpmppi->ram, DEVICE(obj));
-- 
2.34.1



