date:20230204

[PATCH] configure: Add 'mkdir build' check

2023-02-04 Thread Dinah Baum

The QEMU configure script goes into an infinite error-printing loop
when run in a read-only directory, because the 'build' dir is never
created.

Checking that 'mkdir build' succeeds and that the directory is
writable prevents this error.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/321

Signed-off-by: Dinah Baum 
---
 configure | 37 ++---
 1 file changed, 30 insertions(+), 7 deletions(-)

diff --git a/configure b/configure
index 64960c6000..fe9028991f 100755
--- a/configure
+++ b/configure
@@ -32,9 +32,11 @@ then
 fi
 
 mkdir build
-touch $MARKER
+if [ -d build ] && [ -w build ]
+then
+touch $MARKER
 
-cat > GNUmakefile <<'EOF'
+cat > GNUmakefile <<'EOF'
 # This file is auto-generated by configure to support in-source tree
 # 'make' command invocation
 
@@ -56,8 +58,15 @@ force: ;
 GNUmakefile: ;
 
 EOF
-cd build
-exec "$source_path/configure" "$@"
+cd build
+exec "$source_path/configure" "$@"
+elif ! [ -d build ]
+then
+echo "ERROR: Unable to create ./build dir, try using a ../qemu/configure build"
+elif ! [ -w build ]
+then
+echo "ERROR: ./build dir not writeable, try using a ../qemu/configure build"
+fi
 fi
 
 # Temporary directory used for files created while
@@ -181,9 +190,12 @@ compile_prog() {
 
 # symbolically link $1 to $2.  Portable version of "ln -sf".
 symlink() {
-  rm -rf "$2"
-  mkdir -p "$(dirname "$2")"
-  ln -s "$1" "$2"
+  if [ -d $source_path/build ] && [ -w $source_path/build ]
+  then
+  rm -rf "$2"
+  mkdir -p "$(dirname "$2")"
+  ln -s "$1" "$2"
+  fi
 }
 
 # check whether a command is available to this shell (may be either an
@@ -2287,7 +2299,18 @@ fi
 ###
 # generate config-host.mak
 
+if ! [ -d $source_path/build ] || ! [ -w $source_path/build ]
+then
+echo "ERROR: ./build dir unusable, exiting"
+# cleanup
+rm -f config.log
+rm -f Makefile.prereqs
+rm -r "$TMPDIR1"
+exit 1
+fi
+
 if ! (GIT="$git" "$source_path/scripts/git-submodule.sh" "$git_submodules_action" "$git_submodules"); then
+echo "BAD"
 exit 1
 fi
 
-- 
2.30.2
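
As a rough illustration of the check this patch adds (create the directory, then confirm it exists and is writable before continuing), here is a minimal C sketch. The configure script itself does this in shell; the function name `ensure_writable_dir` and the error messages below are hypothetical and only meant to show the order of the checks.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: create `path` if needed, then verify that it is a
 * writable directory. Returns 0 on success and -1 on failure, so the caller
 * can stop with an error instead of retrying forever.
 */
int ensure_writable_dir(const char *path)
{
    struct stat st;

    assert(path != NULL);

    /* An already existing directory is fine; any other failure is an error. */
    if (mkdir(path, 0777) != 0 && errno != EEXIST) {
        fprintf(stderr, "ERROR: unable to create %s: %s\n",
                path, strerror(errno));
        return -1;
    }
    /* Counterpart of the shell test [ -d build ]. */
    if (stat(path, &st) != 0 || !S_ISDIR(st.st_mode)) {
        fprintf(stderr, "ERROR: %s is not a directory\n", path);
        return -1;
    }
    /* Counterpart of the shell test [ -w build ]. */
    if (access(path, W_OK) != 0) {
        fprintf(stderr, "ERROR: %s is not writable\n", path);
        return -1;
    }
    return 0;
}
```

A caller would run this once and exit with a non-zero status on failure, which is the behaviour the patch aims for in place of the error loop.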

Re: [PATCH] KVM: dirty ring: check if vcpu is created before dirty_ring_reap_one

2023-02-04 Thread Weinan Liu

Sorry, this patch is wrong.
kvm_dirty_ring_reap_locked holds slots_lock, which may result in a deadlock
while memory_region is being modified.

I am looking for a better way to know that all vcpus have finished being
created before waking the reaper up.


> -----Original Message----- From: "Weinan Liu" Sent: 2023-02-05 00:08:08
> (Sunday) To: qemu-devel@nongnu.org Cc: pet...@redhat.com, dgilb...@redhat.com,
> "Weinan Liu" Subject: [PATCH] KVM: dirty ring: check if vcpu
> is created before dirty_ring_reap_one
> 
> From: Weinan Liu 
> 
> The assertion '(dirty_gfns && ring_size)' in kvm_dirty_ring_reap_one fails
> if the vcpu has not finished being created yet. This bug occasionally occurs
> when I start 200+ qemu instances on my 16G 6-core x86 machine. It can be
> triggered reliably by inserting a 'sleep(10)' into kvm_vcpu_thread_fn as below:
> 
>  static void *kvm_vcpu_thread_fn(void *arg)
>  {
>  CPUState *cpu = arg;
>  int r;
> 
>  rcu_register_thread();
> 
> +sleep(10);
>  qemu_mutex_lock_iothread();
>  qemu_thread_get_self(cpu->thread);
>  cpu->thread_id = qemu_get_thread_id();
>  cpu->can_do_io = 1;
> 
> Here the dirty ring reaper wakes up while a vcpu has not yet finished being
> created.
> 
> Signed-off-by: Weinan Liu 
> ---
>  accel/kvm/kvm-all.c | 9 +
>  1 file changed, 9 insertions(+)
> 
> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> index 7e6a6076b1..840da7630e 100644
> --- a/accel/kvm/kvm-all.c
> +++ b/accel/kvm/kvm-all.c
> @@ -719,6 +719,15 @@ static uint64_t kvm_dirty_ring_reap_locked(KVMState *s, CPUState* cpu)
>  total = kvm_dirty_ring_reap_one(s, cpu);
>  } else {
>  CPU_FOREACH(cpu) {
> +/*
> + * Must ensure kvm_init_vcpu is finished, so cpu->kvm_dirty_gfns is
> + * available.
> + */
> +while (cpu->created == false) {
> +qemu_mutex_unlock_iothread();
> +qemu_mutex_lock_iothread();
> +}
> +
>  total += kvm_dirty_ring_reap_one(s, cpu);
>  }
>  }
> -- 
> 2.25.1


--
Weinan Liu （刘炜楠）

Department of Computer Science and Technology
School of Informatics Xiamen University

[PATCH v2] KVM: dirty ring: check if vcpu is created before dirty_ring_reap_one

2023-02-04 Thread Weinan Liu

The assertion '(dirty_gfns && ring_size)' in kvm_dirty_ring_reap_one fails
if the vcpu has not finished being created yet. This bug occasionally occurs
when I start 200+ qemu instances on my 16G 6-core x86 machine. It can be
triggered reliably by inserting a 'sleep(10)' into kvm_vcpu_thread_fn as below:

 static void *kvm_vcpu_thread_fn(void *arg)
 {
 CPUState *cpu = arg;
 int r;

 rcu_register_thread();

+sleep(10);
 qemu_mutex_lock_iothread();
 qemu_thread_get_self(cpu->thread);
 cpu->thread_id = qemu_get_thread_id();
 cpu->can_do_io = 1;

Here the dirty ring reaper wakes up while a vcpu has not yet finished being
created.

Signed-off-by: Weinan Liu 
---
 accel/kvm/kvm-all.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 7e6a6076b1..0070ad72b8 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -1416,6 +1416,11 @@ static void *kvm_dirty_ring_reaper_thread(void *data)
  */
 sleep(1);
 
+/* ensure kvm_init_vcpu is finished, so cpu->kvm_dirty_gfns is ok */
+if (!phase_check(PHASE_MACHINE_READY)) {
+continue;
+}
+
 /* keep sleeping so that dirtylimit not be interfered by reaper */
 if (dirtylimit_in_service()) {
 continue;
-- 
2.25.1

[PATCH] linux-user: add support for xtensa FDPIC

2023-02-04 Thread Max Filippov

Define xtensa-specific info_is_fdpic and fill in FDPIC-specific
registers in the xtensa version of init_thread.

Signed-off-by: Max Filippov 
---
 include/elf.h|  1 +
 linux-user/elfload.c | 16 +++-
 2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/include/elf.h b/include/elf.h
index 8bf1e72720d5..e8bfe38a9fbd 100644
--- a/include/elf.h
+++ b/include/elf.h
@@ -1619,6 +1619,7 @@ typedef struct elf64_shdr {
 #define ELFOSABI_MODESTO        11  /* Novell Modesto.  */
 #define ELFOSABI_OPENBSD        12  /* OpenBSD.  */
 #define ELFOSABI_ARM_FDPIC  65  /* ARM FDPIC */
+#define ELFOSABI_XTENSA_FDPIC   65  /* Xtensa FDPIC */
 #define ELFOSABI_ARM            97  /* ARM */
 #define ELFOSABI_STANDALONE 255 /* Standalone (embedded) application */
 
diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index 5928c14dfc97..150d1d450396 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -1748,6 +1748,15 @@ static inline void init_thread(struct target_pt_regs *regs,
 regs->windowstart = 1;
 regs->areg[1] = infop->start_stack;
 regs->pc = infop->entry;
+if (info_is_fdpic(infop)) {
+regs->areg[4] = infop->loadmap_addr;
+regs->areg[5] = infop->interpreter_loadmap_addr;
+if (infop->interpreter_loadmap_addr) {
+regs->areg[6] = infop->interpreter_pt_dynamic_addr;
+} else {
+regs->areg[6] = infop->pt_dynamic_addr;
+}
+}
 }
 
 /* See linux kernel: arch/xtensa/include/asm/elf.h.  */
@@ -2207,11 +2216,16 @@ static void zero_bss(abi_ulong elf_bss, abi_ulong last_bss, int prot)
 }
 }
 
-#ifdef TARGET_ARM
+#if defined(TARGET_ARM)
 static int elf_is_fdpic(struct elfhdr *exec)
 {
 return exec->e_ident[EI_OSABI] == ELFOSABI_ARM_FDPIC;
 }
+#elif defined(TARGET_XTENSA)
+static int elf_is_fdpic(struct elfhdr *exec)
+{
+return exec->e_ident[EI_OSABI] == ELFOSABI_XTENSA_FDPIC;
+}
 #else
 /* Default implementation, always false.  */
 static int elf_is_fdpic(struct elfhdr *exec)
-- 
2.30.2

[PATCH 10/10] docs/fuzz: remove mentions of fork-based fuzzing

2023-02-04 Thread Alexander Bulekov

Signed-off-by: Alexander Bulekov 
---
 docs/devel/fuzzing.rst | 22 ++
 1 file changed, 2 insertions(+), 20 deletions(-)

diff --git a/docs/devel/fuzzing.rst b/docs/devel/fuzzing.rst
index 715330c856..3bfcb33fc4 100644
--- a/docs/devel/fuzzing.rst
+++ b/docs/devel/fuzzing.rst
@@ -19,11 +19,6 @@ responsibility to ensure that state is reset between fuzzing-runs.
 Building the fuzzers
 
 
-*NOTE*: If possible, build a 32-bit binary. When forking, the 32-bit fuzzer is
-much faster, since the page-map has a smaller size. This is due to the fact that
-AddressSanitizer maps ~20TB of memory, as part of its detection. This results
-in a large page-map, and a much slower ``fork()``.
-
 To build the fuzzers, install a recent version of clang:
 Configure with (substitute the clang binaries with the version you installed).
 Here, enable-sanitizers, is optional but it allows us to reliably detect bugs
@@ -296,10 +291,9 @@ input. It is also responsible for manually calling ``main_loop_wait`` to ensure
 that bottom halves are executed and any cleanup required before the next input.
 
 Since the same process is reused for many fuzzing runs, QEMU state needs to
-be reset at the end of each run. There are currently two implemented
-options for resetting state:
+be reset at the end of each run. For example, this can be done by rebooting the
+VM, after each run.
 
-- Reboot the guest between runs.
   - *Pros*: Straightforward and fast for simple fuzz targets.
 
   - *Cons*: Depending on the device, does not reset all device state. If the
@@ -308,15 +302,3 @@ options for resetting state:
 reboot.
 
   - *Example target*: ``i440fx-qtest-reboot-fuzz``
-
-- Run each test case in a separate forked process and copy the coverage
-   information back to the parent. This is fairly similar to AFL's "deferred"
-   fork-server mode [3]
-
-  - *Pros*: Relatively fast. Devices only need to be initialized once. No need to
-do slow reboots or vmloads.
-
-  - *Cons*: Not officially supported by libfuzzer. Does not work well for
- devices that rely on dedicated threads.
-
-  - *Example target*: ``virtio-net-fork-fuzz``
-- 
2.39.0

[PATCH 00/10] Retire Fork-Based Fuzzing

2023-02-04 Thread Alexander Bulekov

Hello,
This series removes fork-based fuzzing.
How does fork-based fuzzing work?
 * A single parent process initializes QEMU
 * We identify the devices we wish to fuzz (fuzzer-dependent)
 * Use QTest to PCI enumerate the devices
 * After that we start a fork-server which forks the process and executes
   fuzzer inputs inside the disposable children.

In a normal fuzzing process, everything happens in a single process.

Pros of fork-based fuzzing:
 * We only need to do common configuration once (e.g. PCI enumeration).
 * Fork provides a strong guarantee that fuzzer inputs will not interfere with
   each other
 * The fuzzing process can continue even after a child process crashes
 * We can apply our own timers to child processes to exit slow inputs early

Cons of fork-based fuzzing:
 * Fork-based fuzzing is not supported by libfuzzer. We had to build our own
   fork-server and rely on tricks using linker-scripts and shared-memory to
   support fuzzing. ( https://physics.bu.edu/~alxndr/libfuzzer-forkserver/ )
 * Fork-based fuzzing is currently the main blocker preventing us from enabling
   other fuzzers such as AFL++ on OSS-Fuzz
 * Fork-based fuzzing may be a reason why coverage-builds are failing on
   OSS-Fuzz. Coverage is an important fuzzing metric which would allow us to
   find parts of the code that are not well-covered.
 * Fork-based fuzzing has high overhead. fork() is an expensive system call,
   especially for processes running ASAN (with large/complex VMA layouts).
 * Fork prevents us from effectively fuzzing devices that rely on
   threads (e.g. qxl).

These patches remove fork-based fuzzing and replace it with reboot-based
fuzzing for most cases. Misc notes about this change:
 * libfuzzer appears to be no longer in active development. As such, the
   current implementation of fork-based fuzzing (while having some nice
   advantages) is likely to hold us back in the future. If these changes
   are approved and appear to run successfully on OSS-Fuzz, we should be
   able to easily experiment with other fuzzing engines (AFL++).
 * Some devices do not completely reset their state. This can lead to
   non-reproducible crashes. However, in my local tests, most crashes
   were reproducible. OSS-Fuzz shouldn't send us reports unless it can
   consistently reproduce a crash.
 * In theory, the corpus-format should not change, so the existing
   corpus-inputs on OSS-Fuzz will transfer to the new reset()-able
   fuzzers.
 * Each fuzzing process will now exit after a single crash is found. To
   continue the fuzzing process, use libfuzzer flags such as -jobs=-1
 * We no longer control input-timeouts (those are handled by libfuzzer).
   Since timeouts on oss-fuzz can be many seconds long, I added a limit
   on the number of DMA bytes written.
 

Alexander Bulekov (10):
  hw/sparse-mem: clear memory on reset
  fuzz: add fuzz_reboot API
  fuzz/generic-fuzz: use reboots instead of forks to reset state
  fuzz/generic-fuzz: add a limit on DMA bytes written
  fuzz/virtio-scsi: remove fork-based fuzzer
  fuzz/virtio-net: remove fork-based fuzzer
  fuzz/virtio-blk: remove fork-based fuzzer
  fuzz/i440fx: remove fork-based fuzzer
  fuzz: remove fork-fuzzing scaffolding
  docs/fuzz: remove mentions of fork-based fuzzing

 docs/devel/fuzzing.rst  |  22 +-
 hw/mem/sparse-mem.c |  13 +++-
 meson.build |   4 -
 tests/qtest/fuzz/fork_fuzz.c|  41 --
 tests/qtest/fuzz/fork_fuzz.h|  23 --
 tests/qtest/fuzz/fork_fuzz.ld   |  56 --
 tests/qtest/fuzz/fuzz.c |   6 ++
 tests/qtest/fuzz/fuzz.h |   2 +-
 tests/qtest/fuzz/generic_fuzz.c | 111 +++-
 tests/qtest/fuzz/i440fx_fuzz.c  |  27 +--
 tests/qtest/fuzz/meson.build|   6 +-
 tests/qtest/fuzz/virtio_blk_fuzz.c  |  51 ++---
 tests/qtest/fuzz/virtio_net_fuzz.c  |  54 ++
 tests/qtest/fuzz/virtio_scsi_fuzz.c |  51 ++---
 14 files changed, 72 insertions(+), 395 deletions(-)
 delete mode 100644 tests/qtest/fuzz/fork_fuzz.c
 delete mode 100644 tests/qtest/fuzz/fork_fuzz.h
 delete mode 100644 tests/qtest/fuzz/fork_fuzz.ld

-- 
2.39.0
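
To make the trade-off in this cover letter concrete, here is a small C sketch contrasting the two ways of isolating inputs. It is not QEMU code: `device_state` and `run_input` are hypothetical stand-ins for emulated device state and for executing one fuzzer input, and setting the state back to zero stands in for a reboot.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for emulated device state. */
static int device_state;

/* Hypothetical stand-in for executing one fuzzer input. */
static void run_input(int input)
{
    device_state += input;
}

/*
 * Fork-based isolation: the input runs in a disposable child, so any
 * state it changes is discarded when the child exits.
 */
int fuzz_one_forked(int input)
{
    int status;
    pid_t pid = fork();

    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        run_input(input);
        _Exit(0);
    }
    /* Parent: wait for the child so that runs do not overlap. */
    if (waitpid(pid, &status, 0) != pid) {
        return -1;
    }
    return 0;
}

/*
 * Reset-based isolation: the input runs in the same process, and state is
 * explicitly reset afterwards. Anything the reset misses leaks into the
 * next run.
 */
void fuzz_one_with_reset(int input)
{
    assert(device_state == 0);  /* each run should start from a clean state */
    run_input(input);
    device_state = 0;           /* stand-in for rebooting the VM */
}
```

The forked variant pays for a fork() and a wait per input, while the reset variant depends entirely on how complete the reset is, which matches the pros and cons listed above.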

[PATCH 03/10] fuzz/generic-fuzz: use reboots instead of forks to reset state

2023-02-04 Thread Alexander Bulekov

Signed-off-by: Alexander Bulekov 
---
 tests/qtest/fuzz/generic_fuzz.c | 106 +++-
 1 file changed, 23 insertions(+), 83 deletions(-)

diff --git a/tests/qtest/fuzz/generic_fuzz.c b/tests/qtest/fuzz/generic_fuzz.c
index 7326f6840b..c2e5642150 100644
--- a/tests/qtest/fuzz/generic_fuzz.c
+++ b/tests/qtest/fuzz/generic_fuzz.c
@@ -18,7 +18,6 @@
 #include "tests/qtest/libqtest.h"
 #include "tests/qtest/libqos/pci-pc.h"
 #include "fuzz.h"
-#include "fork_fuzz.h"
 #include "string.h"
 #include "exec/memory.h"
 #include "exec/ramblock.h"
@@ -29,6 +28,8 @@
 #include "generic_fuzz_configs.h"
 #include "hw/mem/sparse-mem.h"
 
+static void pci_enum(gpointer pcidev, gpointer bus);
+
 /*
  * SEPARATOR is used to separate "operations" in the fuzz input
  */
@@ -589,30 +590,6 @@ static void op_disable_pci(QTestState *s, const unsigned char *data, size_t len)
 pci_disabled = true;
 }
 
-static void handle_timeout(int sig)
-{
-if (qtest_log_enabled) {
-fprintf(stderr, "[Timeout]\n");
-fflush(stderr);
-}
-
-/*
- * If there is a crash, libfuzzer/ASAN forks a child to run an
- * "llvm-symbolizer" process for printing out a pretty stacktrace. It
- * communicates with this child using a pipe.  If we timeout+Exit, while
- * libfuzzer is still communicating with the llvm-symbolizer child, we will
- * be left with an orphan llvm-symbolizer process. Sometimes, this appears
- * to lead to a deadlock in the forkserver. Use waitpid to check if there
- * are any waitable children. If so, exit out of the signal-handler, and
- * let libfuzzer finish communicating with the child, and exit, on its own.
- */
-if (waitpid(-1, NULL, WNOHANG) == 0) {
-return;
-}
-
-_Exit(0);
-}
-
 /*
  * Here, we interpret random bytes from the fuzzer, as a sequence of commands.
  * Some commands can be variable-width, so we use a separator, SEPARATOR, to
@@ -669,64 +646,34 @@ static void generic_fuzz(QTestState *s, const unsigned char *Data, size_t Size)
 size_t cmd_len;
 uint8_t op;
 
-if (fork() == 0) {
-struct sigaction sact;
-struct itimerval timer;
-sigset_t set;
-/*
- * Sometimes the fuzzer will find inputs that take quite a long time to
- * process. Often times, these inputs do not result in new coverage.
- * Even if these inputs might be interesting, they can slow down the
- * fuzzer, overall. Set a timeout for each command to avoid hurting
- * performance, too much
- */
-if (timeout) {
-
-sigemptyset(&sact.sa_mask);
-sact.sa_flags   = SA_NODEFER;
-sact.sa_handler = handle_timeout;
-sigaction(SIGALRM, &sact, NULL);
-
-sigemptyset(&set);
-sigaddset(&set, SIGALRM);
-pthread_sigmask(SIG_UNBLOCK, &set, NULL);
-
-memset(&timer, 0, sizeof(timer));
-timer.it_value.tv_sec = timeout / USEC_IN_SEC;
-timer.it_value.tv_usec = timeout % USEC_IN_SEC;
-}
+op_clear_dma_patterns(s, NULL, 0);
+pci_disabled = false;
 
-op_clear_dma_patterns(s, NULL, 0);
-pci_disabled = false;
+QPCIBus *pcibus = qpci_new_pc(s, NULL);
+g_ptr_array_foreach(fuzzable_pci_devices, pci_enum, pcibus);
+qpci_free_pc(pcibus);
 
-while (cmd && Size) {
-/* Reset the timeout, each time we run a new command */
-if (timeout) {
-setitimer(ITIMER_REAL, &timer, NULL);
-}
+while (cmd && Size) {
+/* Reset the timeout, each time we run a new command */
 
-/* Get the length until the next command or end of input */
-nextcmd = memmem(cmd, Size, SEPARATOR, strlen(SEPARATOR));
-cmd_len = nextcmd ? nextcmd - cmd : Size;
+/* Get the length until the next command or end of input */
+nextcmd = memmem(cmd, Size, SEPARATOR, strlen(SEPARATOR));
+cmd_len = nextcmd ? nextcmd - cmd : Size;
 
-if (cmd_len > 0) {
-/* Interpret the first byte of the command as an opcode */
-op = *cmd % (sizeof(ops) / sizeof((ops)[0]));
-ops[op](s, cmd + 1, cmd_len - 1);
+if (cmd_len > 0) {
+/* Interpret the first byte of the command as an opcode */
+op = *cmd % (sizeof(ops) / sizeof((ops)[0]));
+ops[op](s, cmd + 1, cmd_len - 1);
 
-/* Run the main loop */
-flush_events(s);
-}
-/* Advance to the next command */
-cmd = nextcmd ? nextcmd + sizeof(SEPARATOR) - 1 : nextcmd;
-Size = Size - (cmd_len + sizeof(SEPARATOR) - 1);
-g_array_set_size(dma_regions, 0);
+/* Run the main loop */
+flush_events(s);
 }
-_Exit(0);
-} else {
-flush_events(s);
-wait(0);
+/* Advance to the next command *

[PATCH 05/10] fuzz/virtio-scsi: remove fork-based fuzzer

2023-02-04 Thread Alexander Bulekov

Signed-off-by: Alexander Bulekov 
---
 tests/qtest/fuzz/virtio_scsi_fuzz.c | 51 -
 1 file changed, 7 insertions(+), 44 deletions(-)

diff --git a/tests/qtest/fuzz/virtio_scsi_fuzz.c b/tests/qtest/fuzz/virtio_scsi_fuzz.c
index b3220ef6cb..8b26e951ae 100644
--- a/tests/qtest/fuzz/virtio_scsi_fuzz.c
+++ b/tests/qtest/fuzz/virtio_scsi_fuzz.c
@@ -20,7 +20,6 @@
 #include "standard-headers/linux/virtio_pci.h"
 #include "standard-headers/linux/virtio_scsi.h"
 #include "fuzz.h"
-#include "fork_fuzz.h"
 #include "qos_fuzz.h"
 
 #define PCI_SLOT 0x02
@@ -132,48 +131,24 @@ static void virtio_scsi_fuzz(QTestState *s, QVirtioSCSIQueues* queues,
 }
 }
 
-static void virtio_scsi_fork_fuzz(QTestState *s,
-const unsigned char *Data, size_t Size)
-{
-QVirtioSCSI *scsi = fuzz_qos_obj;
-static QVirtioSCSIQueues *queues;
-if (!queues) {
-queues = qvirtio_scsi_init(scsi->vdev, 0);
-}
-if (fork() == 0) {
-virtio_scsi_fuzz(s, queues, Data, Size);
-flush_events(s);
-_Exit(0);
-} else {
-flush_events(s);
-wait(NULL);
-}
-}
-
 static void virtio_scsi_with_flag_fuzz(QTestState *s,
 const unsigned char *Data, size_t Size)
 {
 QVirtioSCSI *scsi = fuzz_qos_obj;
 static QVirtioSCSIQueues *queues;
 
-if (fork() == 0) {
-if (Size >= sizeof(uint64_t)) {
-queues = qvirtio_scsi_init(scsi->vdev, *(uint64_t *)Data);
-virtio_scsi_fuzz(s, queues,
- Data + sizeof(uint64_t), Size - sizeof(uint64_t));
-flush_events(s);
-}
-_Exit(0);
-} else {
+if (Size >= sizeof(uint64_t)) {
+queues = qvirtio_scsi_init(scsi->vdev, *(uint64_t *)Data);
+virtio_scsi_fuzz(s, queues,
+Data + sizeof(uint64_t), Size - sizeof(uint64_t));
 flush_events(s);
-wait(NULL);
 }
+fuzz_reboot(s);
 }
 
 static void virtio_scsi_pre_fuzz(QTestState *s)
 {
 qos_init_path(s);
-counter_shm_init();
 }
 
 static void *virtio_scsi_test_setup(GString *cmd_line, void *arg)
@@ -189,22 +164,10 @@ static void *virtio_scsi_test_setup(GString *cmd_line, void *arg)
 
 static void register_virtio_scsi_fuzz_targets(void)
 {
-fuzz_add_qos_target(&(FuzzTarget){
-.name = "virtio-scsi-fuzz",
-.description = "Fuzz the virtio-scsi virtual queues, forking "
-"for each fuzz run",
-.pre_vm_init = &counter_shm_init,
-.pre_fuzz = &virtio_scsi_pre_fuzz,
-.fuzz = virtio_scsi_fork_fuzz,},
-"virtio-scsi",
-&(QOSGraphTestOptions){.before = virtio_scsi_test_setup}
-);
-
 fuzz_add_qos_target(&(FuzzTarget){
 .name = "virtio-scsi-flags-fuzz",
-.description = "Fuzz the virtio-scsi virtual queues, forking "
-"for each fuzz run (also fuzzes the virtio flags)",
-.pre_vm_init = &counter_shm_init,
+.description = "Fuzz the virtio-scsi virtual queues. "
+"Also fuzzes the virtio flags",
 .pre_fuzz = &virtio_scsi_pre_fuzz,
 .fuzz = virtio_scsi_with_flag_fuzz,},
 "virtio-scsi",
-- 
2.39.0
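
The remaining target treats the first 8 bytes of each input as virtio flags and the rest as the payload. A minimal sketch of that split is shown below. The struct and function names are hypothetical, and the sketch copies the header with memcpy instead of the pointer cast used in the patch, so that it makes no assumption about the alignment of the input buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical container for a fuzz input split into header and payload. */
struct split_input {
    uint64_t flags;                 /* first 8 bytes, in host byte order */
    const unsigned char *payload;   /* remaining bytes */
    size_t payload_size;
};

/*
 * Returns 1 and fills `out` if the input holds at least a 64-bit header;
 * otherwise returns 0 and leaves `out` untouched.
 */
int split_flags_and_payload(const unsigned char *data, size_t size,
                            struct split_input *out)
{
    assert(out != NULL);

    if (data == NULL || size < sizeof(uint64_t)) {
        return 0;
    }
    /* memcpy avoids an unaligned 64-bit load from the input buffer. */
    memcpy(&out->flags, data, sizeof(uint64_t));
    out->payload = data + sizeof(uint64_t);
    out->payload_size = size - sizeof(uint64_t);
    return 1;
}
```

Inputs shorter than 8 bytes are rejected, which corresponds to the `Size >= sizeof(uint64_t)` guard in the diff.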

[PATCH 07/10] fuzz/virtio-blk: remove fork-based fuzzer

2023-02-04 Thread Alexander Bulekov

Signed-off-by: Alexander Bulekov 
---
 tests/qtest/fuzz/virtio_blk_fuzz.c | 51 --
 1 file changed, 7 insertions(+), 44 deletions(-)

diff --git a/tests/qtest/fuzz/virtio_blk_fuzz.c b/tests/qtest/fuzz/virtio_blk_fuzz.c
index a9fb9ecf6c..82575a11d9 100644
--- a/tests/qtest/fuzz/virtio_blk_fuzz.c
+++ b/tests/qtest/fuzz/virtio_blk_fuzz.c
@@ -19,7 +19,6 @@
 #include "standard-headers/linux/virtio_pci.h"
 #include "standard-headers/linux/virtio_blk.h"
 #include "fuzz.h"
-#include "fork_fuzz.h"
 #include "qos_fuzz.h"
 
 #define TEST_IMAGE_SIZE (64 * 1024 * 1024)
@@ -128,48 +127,24 @@ static void virtio_blk_fuzz(QTestState *s, QVirtioBlkQueues* queues,
 }
 }
 
-static void virtio_blk_fork_fuzz(QTestState *s,
-const unsigned char *Data, size_t Size)
-{
-QVirtioBlk *blk = fuzz_qos_obj;
-static QVirtioBlkQueues *queues;
-if (!queues) {
-queues = qvirtio_blk_init(blk->vdev, 0);
-}
-if (fork() == 0) {
-virtio_blk_fuzz(s, queues, Data, Size);
-flush_events(s);
-_Exit(0);
-} else {
-flush_events(s);
-wait(NULL);
-}
-}
-
 static void virtio_blk_with_flag_fuzz(QTestState *s,
 const unsigned char *Data, size_t Size)
 {
 QVirtioBlk *blk = fuzz_qos_obj;
 static QVirtioBlkQueues *queues;
 
-if (fork() == 0) {
-if (Size >= sizeof(uint64_t)) {
-queues = qvirtio_blk_init(blk->vdev, *(uint64_t *)Data);
-virtio_blk_fuzz(s, queues,
- Data + sizeof(uint64_t), Size - sizeof(uint64_t));
-flush_events(s);
-}
-_Exit(0);
-} else {
+if (Size >= sizeof(uint64_t)) {
+queues = qvirtio_blk_init(blk->vdev, *(uint64_t *)Data);
+virtio_blk_fuzz(s, queues,
+Data + sizeof(uint64_t), Size - sizeof(uint64_t));
 flush_events(s);
-wait(NULL);
 }
+fuzz_reboot(s);
 }
 
 static void virtio_blk_pre_fuzz(QTestState *s)
 {
 qos_init_path(s);
-counter_shm_init();
 }
 
 static void drive_destroy(void *path)
@@ -208,22 +183,10 @@ static void *virtio_blk_test_setup(GString *cmd_line, void *arg)
 
 static void register_virtio_blk_fuzz_targets(void)
 {
-fuzz_add_qos_target(&(FuzzTarget){
-.name = "virtio-blk-fuzz",
-.description = "Fuzz the virtio-blk virtual queues, forking "
-"for each fuzz run",
-.pre_vm_init = &counter_shm_init,
-.pre_fuzz = &virtio_blk_pre_fuzz,
-.fuzz = virtio_blk_fork_fuzz,},
-"virtio-blk",
-&(QOSGraphTestOptions){.before = virtio_blk_test_setup}
-);
-
 fuzz_add_qos_target(&(FuzzTarget){
 .name = "virtio-blk-flags-fuzz",
-.description = "Fuzz the virtio-blk virtual queues, forking "
-"for each fuzz run (also fuzzes the virtio flags)",
-.pre_vm_init = &counter_shm_init,
+.description = "Fuzz the virtio-blk virtual queues. "
+"Also fuzzes the virtio flags",
 .pre_fuzz = &virtio_blk_pre_fuzz,
 .fuzz = virtio_blk_with_flag_fuzz,},
 "virtio-blk",
-- 
2.39.0

[PATCH 09/10] fuzz: remove fork-fuzzing scaffolding

2023-02-04 Thread Alexander Bulekov

Fork-fuzzing provides a few pros, but our implementation prevents us
from using fuzzers other than libFuzzer, and may be causing issues such
as coverage-build failures on OSS-Fuzz. It is not a great long-term
solution as it depends on internal implementation details of libFuzzer
(which is no longer in active development). Remove it in favor of other
methods of resetting state between inputs.

Signed-off-by: Alexander Bulekov 
---
 meson.build   |  4 ---
 tests/qtest/fuzz/fork_fuzz.c  | 41 -
 tests/qtest/fuzz/fork_fuzz.h  | 23 --
 tests/qtest/fuzz/fork_fuzz.ld | 56 ---
 tests/qtest/fuzz/meson.build  |  6 ++--
 5 files changed, 3 insertions(+), 127 deletions(-)
 delete mode 100644 tests/qtest/fuzz/fork_fuzz.c
 delete mode 100644 tests/qtest/fuzz/fork_fuzz.h
 delete mode 100644 tests/qtest/fuzz/fork_fuzz.ld

diff --git a/meson.build b/meson.build
index 6d3b665629..8be27c2408 100644
--- a/meson.build
+++ b/meson.build
@@ -215,10 +215,6 @@ endif
 # Specify linker-script with add_project_link_arguments so that it is not placed
 # within a linker --start-group/--end-group pair
 if get_option('fuzzing')
-  add_project_link_arguments(['-Wl,-T,',
-  (meson.current_source_dir() / 'tests/qtest/fuzz/fork_fuzz.ld')],
- native: false, language: all_languages)
-
   # Specify a filter to only instrument code that is directly related to
   # virtual-devices.
   configure_file(output: 'instrumentation-filter',
diff --git a/tests/qtest/fuzz/fork_fuzz.c b/tests/qtest/fuzz/fork_fuzz.c
deleted file mode 100644
index 6ffb2a7937..00
--- a/tests/qtest/fuzz/fork_fuzz.c
+++ /dev/null
@@ -1,41 +0,0 @@
-/*
- * Fork-based fuzzing helpers
- *
- * Copyright Red Hat Inc., 2019
- *
- * Authors:
- *  Alexander Bulekov   
- *
- * This work is licensed under the terms of the GNU GPL, version 2 or later.
- * See the COPYING file in the top-level directory.
- *
- */
-
-#include "qemu/osdep.h"
-#include "fork_fuzz.h"
-
-
-void counter_shm_init(void)
-{
-/* Copy what's in the counter region to a temporary buffer.. */
-void *copy = malloc(&__FUZZ_COUNTERS_END - &__FUZZ_COUNTERS_START);
-memcpy(copy,
-   &__FUZZ_COUNTERS_START,
-   &__FUZZ_COUNTERS_END - &__FUZZ_COUNTERS_START);
-
-/* Map a shared region over the counter region */
-if (mmap(&__FUZZ_COUNTERS_START,
- &__FUZZ_COUNTERS_END - &__FUZZ_COUNTERS_START,
- PROT_READ | PROT_WRITE, MAP_SHARED | MAP_FIXED | MAP_ANONYMOUS,
- 0, 0) == MAP_FAILED) {
-perror("Error: ");
-exit(1);
-}
-
-/* Copy the original data back to the counter-region */
-memcpy(&__FUZZ_COUNTERS_START, copy,
-   &__FUZZ_COUNTERS_END - &__FUZZ_COUNTERS_START);
-free(copy);
-}
-
-
diff --git a/tests/qtest/fuzz/fork_fuzz.h b/tests/qtest/fuzz/fork_fuzz.h
deleted file mode 100644
index 9ecb8b58ef..00
--- a/tests/qtest/fuzz/fork_fuzz.h
+++ /dev/null
@@ -1,23 +0,0 @@
-/*
- * Fork-based fuzzing helpers
- *
- * Copyright Red Hat Inc., 2019
- *
- * Authors:
- *  Alexander Bulekov   
- *
- * This work is licensed under the terms of the GNU GPL, version 2 or later.
- * See the COPYING file in the top-level directory.
- *
- */
-
-#ifndef FORK_FUZZ_H
-#define FORK_FUZZ_H
-
-extern uint8_t __FUZZ_COUNTERS_START;
-extern uint8_t __FUZZ_COUNTERS_END;
-
-void counter_shm_init(void);
-
-#endif
-
diff --git a/tests/qtest/fuzz/fork_fuzz.ld b/tests/qtest/fuzz/fork_fuzz.ld
deleted file mode 100644
index cfb88b7fdb..00
--- a/tests/qtest/fuzz/fork_fuzz.ld
+++ /dev/null
@@ -1,56 +0,0 @@
-/*
- * We adjust linker script modification to place all of the stuff that needs to
- * persist across fuzzing runs into a contiguous section of memory. Then, it is
- * easy to re-map the counter-related memory as shared.
- */
-
-SECTIONS
-{
-  .data.fuzz_start : ALIGN(4K)
-  {
-  __FUZZ_COUNTERS_START = .;
-  __start___sancov_cntrs = .;
-  *(_*sancov_cntrs);
-  __stop___sancov_cntrs = .;
-
-  /* Lowest stack counter */
-  *(__sancov_lowest_stack);
-  }
-}
-INSERT AFTER .data;
-
-SECTIONS
-{
-  .data.fuzz_ordered :
-  {
-  /*
-   * Coverage counters. They're not necessary for fuzzing, but are useful
-   * for analyzing the fuzzing performance
-   */
-  __start___llvm_prf_cnts = .;
-  *(*llvm_prf_cnts);
-  __stop___llvm_prf_cnts = .;
-
-  /* Internal Libfuzzer TracePC object which contains the ValueProfileMap */
-  FuzzerTracePC*(.bss*);
-  /*
-   * In case the above line fails, explicitly specify the (mangled) name of
-   * the object we care about
-   */
-   *(.bss._ZN6fuzzer3TPCE);
-  }
-}
-INSERT AFTER .data.fuzz_start;
-
-SECTIONS
-{
-  .data.fuzz_end : ALIGN(4K)
-  {
-  __FUZZ_COUNTERS_END = .;
-  }
-}
-/*
- * Don't overwrite the SECTIONS in the default linker script. Instead insert the
- * above into the def

[PATCH 08/10] fuzz/i440fx: remove fork-based fuzzer

2023-02-04 Thread Alexander Bulekov

Signed-off-by: Alexander Bulekov 
---
 tests/qtest/fuzz/i440fx_fuzz.c | 27 +--
 1 file changed, 1 insertion(+), 26 deletions(-)

diff --git a/tests/qtest/fuzz/i440fx_fuzz.c b/tests/qtest/fuzz/i440fx_fuzz.c
index b17fc725df..5d6a703481 100644
--- a/tests/qtest/fuzz/i440fx_fuzz.c
+++ b/tests/qtest/fuzz/i440fx_fuzz.c
@@ -18,7 +18,6 @@
 #include "tests/qtest/libqos/pci-pc.h"
 #include "fuzz.h"
 #include "qos_fuzz.h"
-#include "fork_fuzz.h"
 
 
 #define I440FX_PCI_HOST_BRIDGE_CFG 0xcf8
@@ -89,6 +88,7 @@ static void i440fx_fuzz_qtest(QTestState *s,
   size_t Size)
 {
 ioport_fuzz_qtest(s, Data, Size);
+fuzz_reboot(s);
 }
 
 static void pciconfig_fuzz_qos(QTestState *s, QPCIBus *bus,
@@ -145,17 +145,6 @@ static void i440fx_fuzz_qos(QTestState *s,
 pciconfig_fuzz_qos(s, bus, Data, Size);
 }
 
-static void i440fx_fuzz_qos_fork(QTestState *s,
-const unsigned char *Data, size_t Size) {
-if (fork() == 0) {
-i440fx_fuzz_qos(s, Data, Size);
-_Exit(0);
-} else {
-flush_events(s);
-wait(NULL);
-}
-}
-
 static const char *i440fx_qtest_argv = TARGET_NAME " -machine accel=qtest"
" -m 0 -display none";
 static GString *i440fx_argv(FuzzTarget *t)
@@ -163,10 +152,6 @@ static GString *i440fx_argv(FuzzTarget *t)
 return g_string_new(i440fx_qtest_argv);
 }
 
-static void fork_init(void)
-{
-counter_shm_init();
-}
 
 static void register_pci_fuzz_targets(void)
 {
@@ -178,16 +163,6 @@ static void register_pci_fuzz_targets(void)
 .get_init_cmdline = i440fx_argv,
 .fuzz = i440fx_fuzz_qtest});
 
-/* Uses libqos and forks to prevent state leakage */
-fuzz_add_qos_target(&(FuzzTarget){
-.name = "i440fx-qos-fork-fuzz",
-.description = "Fuzz the i440fx using raw qtest commands and "
-   "rebooting after each run",
-.pre_vm_init = &fork_init,
-.fuzz = i440fx_fuzz_qos_fork,},
-"i440FX-pcihost",
-&(QOSGraphTestOptions){}
-);
 
 /*
  * Uses libqos. Doesn't do anything to reset state. Note that if we were to
-- 
2.39.0

[PATCH 04/10] fuzz/generic-fuzz: add a limit on DMA bytes written

2023-02-04 Thread Alexander Bulekov

As we have replaced fork-based fuzzing with reboots, we can no longer
use a timeout+exit() to avoid slow inputs. Libfuzzer has its own timer
that it uses to catch slow inputs; however, these timeouts are usually
seconds to minutes long: more than enough to bog down the fuzzing process.
However, I found that slow inputs often attempt to fill overly large DMA
requests. Thus, we can mitigate most timeouts by setting a cap on the
total number of DMA bytes written by an input.

Signed-off-by: Alexander Bulekov 
---
 tests/qtest/fuzz/generic_fuzz.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/tests/qtest/fuzz/generic_fuzz.c b/tests/qtest/fuzz/generic_fuzz.c
index c2e5642150..eab92cbc23 100644
--- a/tests/qtest/fuzz/generic_fuzz.c
+++ b/tests/qtest/fuzz/generic_fuzz.c
@@ -52,6 +52,7 @@ enum cmds {
 #define USEC_IN_SEC 10
 
 #define MAX_DMA_FILL_SIZE 0x1
+#define MAX_TOTAL_DMA_SIZE 0x1000
 
 #define PCI_HOST_BRIDGE_CFG 0xcf8
 #define PCI_HOST_BRIDGE_DATA 0xcfc
@@ -64,6 +65,7 @@ typedef struct {
 static useconds_t timeout = DEFAULT_TIMEOUT_US;
 
 static bool qtest_log_enabled;
+size_t dma_bytes_written;
 
 MemoryRegion *sparse_mem_mr;
 
@@ -197,6 +199,7 @@ void fuzz_dma_read_cb(size_t addr, size_t len, MemoryRegion *mr)
  */
 if (dma_patterns->len == 0
 || len == 0
+|| dma_bytes_written > MAX_TOTAL_DMA_SIZE
 || (mr != current_machine->ram && mr != sparse_mem_mr)) {
 return;
 }
@@ -269,6 +272,7 @@ void fuzz_dma_read_cb(size_t addr, size_t len, MemoryRegion *mr)
 fflush(stderr);
 }
 qtest_memwrite(qts_global, addr, buf, l);
+dma_bytes_written += l;
 }
 len -= l;
 buf += l;
@@ -648,6 +652,7 @@ static void generic_fuzz(QTestState *s, const unsigned char *Data, size_t Size)
 
 op_clear_dma_patterns(s, NULL, 0);
 pci_disabled = false;
+dma_bytes_written = 0;
 
 QPCIBus *pcibus = qpci_new_pc(s, NULL);
 g_ptr_array_foreach(fuzzable_pci_devices, pci_enum, pcibus);
-- 
2.39.0
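
The cap described in this patch can be illustrated with a minimal, standalone C
sketch. It is not the code from generic_fuzz.c: the names sketch_begin_input()
and sketch_dma_fill() and the 4096-byte limit are assumptions made up for
illustration, since the macro values in the archived diff appear truncated. It
only mirrors the shape of the change: a per-input counter, a check before each
fill, and a reset at the start of every input.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cap, chosen only for this sketch. */
#define SKETCH_MAX_TOTAL_DMA 4096

/* Bytes written by DMA fills for the input currently being run. */
static size_t sketch_dma_bytes_written;

/* Called once per fuzzer input, like the reset added to generic_fuzz(). */
static void sketch_begin_input(void)
{
    sketch_dma_bytes_written = 0;
}

/*
 * Pretend to fill a DMA region of `len` bytes.
 * Returns true if the fill was performed, false if it was skipped.
 */
static bool sketch_dma_fill(size_t len)
{
    /* Same ordering as the patch: check the budget first, then account. */
    if (len == 0 || sketch_dma_bytes_written > SKETCH_MAX_TOTAL_DMA) {
        return false;
    }
    sketch_dma_bytes_written += len;
    return true;
}
```

Because the comparison is a strict ">" made before the write, the running total
can exceed the cap by at most one fill. That is enough to bound slow inputs
without having to split a request.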

[PATCH 02/10] fuzz: add fuzz_reboot API

2023-02-04 Thread Alexander Bulekov

As we are converting most fuzzers to rely on reboots to reset state,
introduce an API to make sure reboots are invoked in a consistent
manner.

Signed-off-by: Alexander Bulekov 
---
 tests/qtest/fuzz/fuzz.c | 6 ++
 tests/qtest/fuzz/fuzz.h | 2 +-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/tests/qtest/fuzz/fuzz.c b/tests/qtest/fuzz/fuzz.c
index eb7520544b..c2d07a4c7e 100644
--- a/tests/qtest/fuzz/fuzz.c
+++ b/tests/qtest/fuzz/fuzz.c
@@ -51,6 +51,12 @@ void flush_events(QTestState *s)
 }
 }
 
+void fuzz_reboot(QTestState *s)
+{
+qemu_system_reset(SHUTDOWN_CAUSE_GUEST_RESET);
+main_loop_wait(true);
+}
+
 static QTestState *qtest_setup(void)
 {
 qtest_server_set_send_handler(&qtest_client_inproc_recv, &fuzz_qts);
diff --git a/tests/qtest/fuzz/fuzz.h b/tests/qtest/fuzz/fuzz.h
index 327c1c5a55..69e2b3877f 100644
--- a/tests/qtest/fuzz/fuzz.h
+++ b/tests/qtest/fuzz/fuzz.h
@@ -103,7 +103,7 @@ typedef struct FuzzTarget {
 } FuzzTarget;
 
 void flush_events(QTestState *);
-void reboot(QTestState *);
+void fuzz_reboot(QTestState *);
 
 /* Use the QTest ASCII protocol or call address_space API directly?*/
 void fuzz_qtest_set_serialize(bool option);
-- 
2.39.0

[PATCH 06/10] fuzz/virtio-net: remove fork-based fuzzer

2023-02-04 Thread Alexander Bulekov

Signed-off-by: Alexander Bulekov 
---
 tests/qtest/fuzz/virtio_net_fuzz.c | 54 +++---
 1 file changed, 5 insertions(+), 49 deletions(-)

diff --git a/tests/qtest/fuzz/virtio_net_fuzz.c b/tests/qtest/fuzz/virtio_net_fuzz.c
index c2c15f07f0..d245ee66a1 100644
--- a/tests/qtest/fuzz/virtio_net_fuzz.c
+++ b/tests/qtest/fuzz/virtio_net_fuzz.c
@@ -16,7 +16,6 @@
 #include "tests/qtest/libqtest.h"
 #include "tests/qtest/libqos/virtio-net.h"
 #include "fuzz.h"
-#include "fork_fuzz.h"
 #include "qos_fuzz.h"
 
 
@@ -115,36 +114,18 @@ static void virtio_net_fuzz_multi(QTestState *s,
 }
 }
 
-static void virtio_net_fork_fuzz(QTestState *s,
-const unsigned char *Data, size_t Size)
-{
-if (fork() == 0) {
-virtio_net_fuzz_multi(s, Data, Size, false);
-flush_events(s);
-_Exit(0);
-} else {
-flush_events(s);
-wait(NULL);
-}
-}
 
-static void virtio_net_fork_fuzz_check_used(QTestState *s,
+static void virtio_net_fuzz_check_used(QTestState *s,
 const unsigned char *Data, size_t Size)
 {
-if (fork() == 0) {
-virtio_net_fuzz_multi(s, Data, Size, true);
-flush_events(s);
-_Exit(0);
-} else {
-flush_events(s);
-wait(NULL);
-}
+virtio_net_fuzz_multi(s, Data, Size, true);
+flush_events(s);
+fuzz_reboot(s);
 }
 
 static void virtio_net_pre_fuzz(QTestState *s)
 {
 qos_init_path(s);
-counter_shm_init();
 }
 
 static void *virtio_net_test_setup_socket(GString *cmd_line, void *arg)
@@ -158,23 +139,8 @@ static void *virtio_net_test_setup_socket(GString *cmd_line, void *arg)
 return arg;
 }
 
-static void *virtio_net_test_setup_user(GString *cmd_line, void *arg)
-{
-g_string_append_printf(cmd_line, " -netdev user,id=hs0 ");
-return arg;
-}
-
 static void register_virtio_net_fuzz_targets(void)
 {
-fuzz_add_qos_target(&(FuzzTarget){
-.name = "virtio-net-socket",
-.description = "Fuzz the virtio-net virtual queues. Fuzz incoming "
-"traffic using the socket backend",
-.pre_fuzz = &virtio_net_pre_fuzz,
-.fuzz = virtio_net_fork_fuzz,},
-"virtio-net",
-&(QOSGraphTestOptions){.before = virtio_net_test_setup_socket}
-);
 
 fuzz_add_qos_target(&(FuzzTarget){
 .name = "virtio-net-socket-check-used",
@@ -182,20 +148,10 @@ static void register_virtio_net_fuzz_targets(void)
 "descriptors to be used. Timeout may indicate improperly handled "
 "input",
 .pre_fuzz = &virtio_net_pre_fuzz,
-.fuzz = virtio_net_fork_fuzz_check_used,},
+.fuzz = virtio_net_fuzz_check_used,},
 "virtio-net",
 &(QOSGraphTestOptions){.before = virtio_net_test_setup_socket}
 );
-fuzz_add_qos_target(&(FuzzTarget){
-.name = "virtio-net-slirp",
-.description = "Fuzz the virtio-net virtual queues with the slirp "
-" backend. Warning: May result in network traffic emitted from the "
-" process. Run in an isolated network environment.",
-.pre_fuzz = &virtio_net_pre_fuzz,
-.fuzz = virtio_net_fork_fuzz,},
-"virtio-net",
-&(QOSGraphTestOptions){.before = virtio_net_test_setup_user}
-);
 }
 
 fuzz_target_init(register_virtio_net_fuzz_targets);
-- 
2.39.0

[PATCH 01/10] hw/sparse-mem: clear memory on reset

2023-02-04 Thread Alexander Bulekov

We use sparse-mem for fuzzing. For long-running fuzzing processes, we
eventually end up with many allocated sparse-mem pages. To avoid this,
clear the allocated pages on system-reset.

Signed-off-by: Alexander Bulekov 
---
 hw/mem/sparse-mem.c | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/hw/mem/sparse-mem.c b/hw/mem/sparse-mem.c
index e6640eb8e7..72f038d47d 100644
--- a/hw/mem/sparse-mem.c
+++ b/hw/mem/sparse-mem.c
@@ -77,6 +77,13 @@ static void sparse_mem_write(void *opaque, hwaddr addr, uint64_t v,
 
 }
 
+static void sparse_mem_enter_reset(Object *obj, ResetType type)
+{
+SparseMemState *s = SPARSE_MEM(obj);
+g_hash_table_remove_all(s->mapped);
+return;
+}
+
 static const MemoryRegionOps sparse_mem_ops = {
 .read = sparse_mem_read,
 .write = sparse_mem_write,
@@ -123,7 +130,8 @@ static void sparse_mem_realize(DeviceState *dev, Error **errp)
 
 assert(s->baseaddr + s->length > s->baseaddr);
 
-s->mapped = g_hash_table_new(NULL, NULL);
+s->mapped = g_hash_table_new_full(NULL, NULL, NULL,
+  (GDestroyNotify)g_free);
 memory_region_init_io(&s->mmio, OBJECT(s), &sparse_mem_ops, s,
   "sparse-mem", s->length);
 sysbus_init_mmio(sbd, &s->mmio);
@@ -131,12 +139,15 @@ static void sparse_mem_realize(DeviceState *dev, Error **errp)
 
 static void sparse_mem_class_init(ObjectClass *klass, void *data)
 {
+ResettableClass *rc = RESETTABLE_CLASS(klass);
 DeviceClass *dc = DEVICE_CLASS(klass);
 
 device_class_set_props(dc, sparse_mem_properties);
 
 dc->desc = "Sparse Memory Device";
 dc->realize = sparse_mem_realize;
+
+rc->phases.enter = sparse_mem_enter_reset;
 }
 
 static const TypeInfo sparse_mem_types[] = {
-- 
2.39.0

[PATCH v6 4/4] hw: replace most qemu_bh_new calls with qemu_bh_new_guarded

2023-02-04 Thread Alexander Bulekov

This protects devices from bh->mmio reentrancy issues.

Reviewed-by: Darren Kenny 
Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Alexander Bulekov 
---
 hw/9pfs/xen-9p-backend.c| 4 +++-
 hw/block/dataplane/virtio-blk.c | 3 ++-
 hw/block/dataplane/xen-block.c  | 5 +++--
 hw/char/virtio-serial-bus.c | 3 ++-
 hw/display/qxl.c| 9 ++---
 hw/display/virtio-gpu.c | 6 --
 hw/ide/ahci.c   | 3 ++-
 hw/ide/core.c   | 3 ++-
 hw/misc/imx_rngc.c  | 6 --
 hw/misc/macio/mac_dbdma.c   | 2 +-
 hw/net/virtio-net.c | 3 ++-
 hw/nvme/ctrl.c  | 6 --
 hw/scsi/mptsas.c| 3 ++-
 hw/scsi/scsi-bus.c  | 3 ++-
 hw/scsi/vmw_pvscsi.c| 3 ++-
 hw/usb/dev-uas.c| 3 ++-
 hw/usb/hcd-dwc2.c   | 3 ++-
 hw/usb/hcd-ehci.c   | 3 ++-
 hw/usb/hcd-uhci.c   | 2 +-
 hw/usb/host-libusb.c| 6 --
 hw/usb/redirect.c   | 6 --
 hw/usb/xen-usb.c| 3 ++-
 hw/virtio/virtio-balloon.c  | 5 +++--
 hw/virtio/virtio-crypto.c   | 3 ++-
 24 files changed, 63 insertions(+), 33 deletions(-)

diff --git a/hw/9pfs/xen-9p-backend.c b/hw/9pfs/xen-9p-backend.c
index 65c4979c3c..f077c1b255 100644
--- a/hw/9pfs/xen-9p-backend.c
+++ b/hw/9pfs/xen-9p-backend.c
@@ -441,7 +441,9 @@ static int xen_9pfs_connect(struct XenLegacyDevice *xendev)
 xen_9pdev->rings[i].ring.out = xen_9pdev->rings[i].data +
XEN_FLEX_RING_SIZE(ring_order);
 
-xen_9pdev->rings[i].bh = qemu_bh_new(xen_9pfs_bh, &xen_9pdev->rings[i]);
+xen_9pdev->rings[i].bh = qemu_bh_new_guarded(xen_9pfs_bh,
+ &xen_9pdev->rings[i],
+ &DEVICE(xen_9pdev)->mem_reentrancy_guard);
 xen_9pdev->rings[i].out_cons = 0;
 xen_9pdev->rings[i].out_size = 0;
 xen_9pdev->rings[i].inprogress = false;
diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index b28d81737e..a6202997ee 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -127,7 +127,8 @@ bool virtio_blk_data_plane_create(VirtIODevice *vdev, VirtIOBlkConf *conf,
 } else {
 s->ctx = qemu_get_aio_context();
 }
-s->bh = aio_bh_new(s->ctx, notify_guest_bh, s);
+s->bh = aio_bh_new_guarded(s->ctx, notify_guest_bh, s,
+   &DEVICE(vdev)->mem_reentrancy_guard);
 s->batch_notify_vqs = bitmap_new(conf->num_queues);
 
 *dataplane = s;
diff --git a/hw/block/dataplane/xen-block.c b/hw/block/dataplane/xen-block.c
index 2785b9e849..e31806b317 100644
--- a/hw/block/dataplane/xen-block.c
+++ b/hw/block/dataplane/xen-block.c
@@ -632,8 +632,9 @@ XenBlockDataPlane *xen_block_dataplane_create(XenDevice *xendev,
 } else {
 dataplane->ctx = qemu_get_aio_context();
 }
-dataplane->bh = aio_bh_new(dataplane->ctx, xen_block_dataplane_bh,
-   dataplane);
+dataplane->bh = aio_bh_new_guarded(dataplane->ctx, xen_block_dataplane_bh,
+   dataplane,
+   &DEVICE(xendev)->mem_reentrancy_guard);
 
 return dataplane;
 }
diff --git a/hw/char/virtio-serial-bus.c b/hw/char/virtio-serial-bus.c
index 7d4601cb5d..dd619f0731 100644
--- a/hw/char/virtio-serial-bus.c
+++ b/hw/char/virtio-serial-bus.c
@@ -985,7 +985,8 @@ static void virtser_port_device_realize(DeviceState *dev, Error **errp)
 return;
 }
 
-port->bh = qemu_bh_new(flush_queued_data_bh, port);
+port->bh = qemu_bh_new_guarded(flush_queued_data_bh, port,
+   &dev->mem_reentrancy_guard);
 port->elem = NULL;
 }
 
diff --git a/hw/display/qxl.c b/hw/display/qxl.c
index ec712d3ca2..c0460c4ef1 100644
--- a/hw/display/qxl.c
+++ b/hw/display/qxl.c
@@ -2201,11 +2201,14 @@ static void qxl_realize_common(PCIQXLDevice *qxl, Error **errp)
 
 qemu_add_vm_change_state_handler(qxl_vm_change_state_handler, qxl);
 
-qxl->update_irq = qemu_bh_new(qxl_update_irq_bh, qxl);
+qxl->update_irq = qemu_bh_new_guarded(qxl_update_irq_bh, qxl,
+  &DEVICE(qxl)->mem_reentrancy_guard);
 qxl_reset_state(qxl);
 
-qxl->update_area_bh = qemu_bh_new(qxl_render_update_area_bh, qxl);
-qxl->ssd.cursor_bh = qemu_bh_new(qemu_spice_cursor_refresh_bh, &qxl->ssd);
+qxl->update_area_bh = qemu_bh_new_guarded(qxl_render_update_area_bh, qxl,
+  &DEVICE(qxl)->mem_reentrancy_guard);
+qxl->ssd.cursor_bh = qemu_bh_new_guarded(qemu_spice_cursor_refresh_bh, &qxl->ssd,
+ &DEVICE(qxl)->mem_reentrancy_guard);
 }
 
 static void qxl_realize_primary(PCIDevice *dev, Error **errp)
diff --git a/hw/display/virtio-gp

[PATCH v6 3/4] checkpatch: add qemu_bh_new/aio_bh_new checks

2023-02-04 Thread Alexander Bulekov

Advise authors to use the _guarded versions of the APIs instead.

Reviewed-by: Darren Kenny 
Signed-off-by: Alexander Bulekov 
---
 scripts/checkpatch.pl | 8 
 1 file changed, 8 insertions(+)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 6ecabfb2b5..fbb71c70f8 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -2865,6 +2865,14 @@ sub process {
if ($line =~ /\bsignal\s*\(/ && !($line =~ /SIG_(?:IGN|DFL)/)) {
ERROR("use sigaction to establish signal handlers; signal is not portable\n" . $herecurr);
}
+# recommend qemu_bh_new_guarded instead of qemu_bh_new
+if ($realfile =~ /.*\/hw\/.*/ && $line =~ /\bqemu_bh_new\s*\(/) {
+   ERROR("use qemu_bh_new_guarded() instead of qemu_bh_new() to avoid reentrancy problems\n" . $herecurr);
+   }
+# recommend aio_bh_new_guarded instead of aio_bh_new
+if ($realfile =~ /.*\/hw\/.*/ && $line =~ /\baio_bh_new\s*\(/) {
+   ERROR("use aio_bh_new_guarded() instead of aio_bh_new() to avoid reentrancy problems\n" . $herecurr);
+   }
 # check for module_init(), use category-specific init macros explicitly please
if ($line =~ /^module_init\s*\(/) {
ERROR("please use block_init(), type_init() etc. instead of module_init()\n" . $herecurr);
-- 
2.39.0

[PATCH v6 2/4] async: Add an optional reentrancy guard to the BH API

2023-02-04 Thread Alexander Bulekov

Devices can pass their MemReentrancyGuard (from their DeviceState)
when creating new BHs. The async API then toggles the guard
before/after calling the BH callback. This prevents bh->mmio reentrancy
issues.

Reviewed-by: Darren Kenny 
Signed-off-by: Alexander Bulekov 
---
 docs/devel/multiple-iothreads.txt |  7 +++
 include/block/aio.h   | 18 --
 include/qemu/main-loop.h  |  7 +--
 tests/unit/ptimer-test-stubs.c|  3 ++-
 util/async.c  | 18 +-
 util/main-loop.c  |  5 +++--
 util/trace-events |  1 +
 7 files changed, 51 insertions(+), 8 deletions(-)

diff --git a/docs/devel/multiple-iothreads.txt b/docs/devel/multiple-iothreads.txt
index 343120f2ef..a3e949f6b3 100644
--- a/docs/devel/multiple-iothreads.txt
+++ b/docs/devel/multiple-iothreads.txt
@@ -61,6 +61,7 @@ There are several old APIs that use the main loop AioContext:
  * LEGACY qemu_aio_set_event_notifier() - monitor an event notifier
  * LEGACY timer_new_ms() - create a timer
  * LEGACY qemu_bh_new() - create a BH
+ * LEGACY qemu_bh_new_guarded() - create a BH with a device re-entrancy guard
  * LEGACY qemu_aio_wait() - run an event loop iteration
 
 Since they implicitly work on the main loop they cannot be used in code that
@@ -72,8 +73,14 @@ Instead, use the AioContext functions directly (see include/block/aio.h):
  * aio_set_event_notifier() - monitor an event notifier
  * aio_timer_new() - create a timer
  * aio_bh_new() - create a BH
+ * aio_bh_new_guarded() - create a BH with a device re-entrancy guard
  * aio_poll() - run an event loop iteration
 
+The qemu_bh_new_guarded/aio_bh_new_guarded APIs accept a "MemReentrancyGuard"
+argument, which is used to check for and prevent re-entrancy problems. For
+BHs associated with devices, the reentrancy-guard is contained in the
+corresponding DeviceState and named "mem_reentrancy_guard".
+
 The AioContext can be obtained from the IOThread using
 iothread_get_aio_context() or for the main loop using qemu_get_aio_context().
 Code that takes an AioContext argument works both in IOThreads or the main
diff --git a/include/block/aio.h b/include/block/aio.h
index 8fba6a3584..3e3bdb9352 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -23,6 +23,8 @@
 #include "qemu/thread.h"
 #include "qemu/timer.h"
 #include "block/graph-lock.h"
+#include "hw/qdev-core.h"
+
 
 typedef struct BlockAIOCB BlockAIOCB;
 typedef void BlockCompletionFunc(void *opaque, int ret);
@@ -331,9 +333,11 @@ void aio_bh_schedule_oneshot_full(AioContext *ctx, QEMUBHFunc *cb, void *opaque,
  * is opaque and must be allocated prior to its use.
  *
  * @name: A human-readable identifier for debugging purposes.
+ * @reentrancy_guard: A guard set when entering a cb to prevent
+ * device-reentrancy issues
  */
 QEMUBH *aio_bh_new_full(AioContext *ctx, QEMUBHFunc *cb, void *opaque,
-const char *name);
+const char *name, MemReentrancyGuard *reentrancy_guard);
 
 /**
  * aio_bh_new: Allocate a new bottom half structure
@@ -342,7 +346,17 @@ QEMUBH *aio_bh_new_full(AioContext *ctx, QEMUBHFunc *cb, void *opaque,
  * string.
  */
 #define aio_bh_new(ctx, cb, opaque) \
-aio_bh_new_full((ctx), (cb), (opaque), (stringify(cb)))
+aio_bh_new_full((ctx), (cb), (opaque), (stringify(cb)), NULL)
+
+/**
+ * aio_bh_new_guarded: Allocate a new bottom half structure with a
+ * reentrancy_guard
+ *
+ * A convenience wrapper for aio_bh_new_full() that uses the cb as the name
+ * string.
+ */
+#define aio_bh_new_guarded(ctx, cb, opaque, guard) \
+aio_bh_new_full((ctx), (cb), (opaque), (stringify(cb)), guard)
 
 /**
  * aio_notify: Force processing of pending events.
diff --git a/include/qemu/main-loop.h b/include/qemu/main-loop.h
index c25f390696..84d1ce57f0 100644
--- a/include/qemu/main-loop.h
+++ b/include/qemu/main-loop.h
@@ -389,9 +389,12 @@ void qemu_cond_timedwait_iothread(QemuCond *cond, int ms);
 
 void qemu_fd_register(int fd);
 
+#define qemu_bh_new_guarded(cb, opaque, guard) \
+qemu_bh_new_full((cb), (opaque), (stringify(cb)), guard)
 #define qemu_bh_new(cb, opaque) \
-qemu_bh_new_full((cb), (opaque), (stringify(cb)))
-QEMUBH *qemu_bh_new_full(QEMUBHFunc *cb, void *opaque, const char *name);
+qemu_bh_new_full((cb), (opaque), (stringify(cb)), NULL)
+QEMUBH *qemu_bh_new_full(QEMUBHFunc *cb, void *opaque, const char *name,
+ MemReentrancyGuard *reentrancy_guard);
 void qemu_bh_schedule_idle(QEMUBH *bh);
 
 enum {
diff --git a/tests/unit/ptimer-test-stubs.c b/tests/unit/ptimer-test-stubs.c
index f5e75a96b6..24d5413f9d 100644
--- a/tests/unit/ptimer-test-stubs.c
+++ b/tests/unit/ptimer-test-stubs.c
@@ -107,7 +107,8 @@ int64_t qemu_clock_deadline_ns_all(QEMUClockType type, int attr_mask)
 return deadline;
 }
 
-QEMUBH *qemu_bh_new_full(QEMUBHFunc *cb, void *opaque, const char *name)
+QEMUBH *qemu_bh_new_full(QEMUBHFunc *cb,

[PATCH v6 1/4] memory: prevent dma-reentracy issues

2023-02-04 Thread Alexander Bulekov

Add a flag to the DeviceState that is set while a device is engaged in PIO/MMIO/DMA.
This flag is set/checked prior to calling a device's MemoryRegion
handlers, and set when device code initiates DMA.  The purpose of this
flag is to prevent two types of DMA-based reentrancy issues:

1.) mmio -> dma -> mmio case
2.) bh -> dma write -> mmio case

These issues have led to problems such as stack-exhaustion and
use-after-frees.

Summary of the problem from Peter Maydell:
https://lore.kernel.org/qemu-devel/cafeaca_23vc7he3iam-jva6w38lk4hjowae5kcknhprd5fp...@mail.gmail.com

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/62
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/540
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/541
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/556
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/557
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/827
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1282

Reviewed-by: Darren Kenny 
Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Alexander Bulekov 
Acked-by: Peter Xu 
---
 include/hw/qdev-core.h |  7 +++
 softmmu/memory.c   | 17 +
 softmmu/trace-events   |  1 +
 3 files changed, 25 insertions(+)

diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index 35fddb19a6..8858195262 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -162,6 +162,10 @@ struct NamedClockList {
 QLIST_ENTRY(NamedClockList) node;
 };
 
+typedef struct {
+bool engaged_in_io;
+} MemReentrancyGuard;
+
 /**
  * DeviceState:
  * @realized: Indicates whether the device has been fully constructed.
@@ -194,6 +198,9 @@ struct DeviceState {
 int alias_required_for_version;
 ResettableState reset;
 GSList *unplug_blockers;
+
+/* Is the device currently in mmio/pio/dma? Used to prevent re-entrancy */
+MemReentrancyGuard mem_reentrancy_guard;
 };
 
 struct DeviceListener {
diff --git a/softmmu/memory.c b/softmmu/memory.c
index 9d64efca26..eefeeae317 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -533,6 +533,7 @@ static MemTxResult access_with_adjusted_size(hwaddr addr,
 uint64_t access_mask;
 unsigned access_size;
 unsigned i;
+DeviceState *dev = NULL;
 MemTxResult r = MEMTX_OK;
 
 if (!access_size_min) {
@@ -542,6 +543,19 @@ static MemTxResult access_with_adjusted_size(hwaddr addr,
 access_size_max = 4;
 }
 
+/* Do not allow more than one simultaneous access to a device's IO Regions */
+if (mr->owner &&
+!mr->ram_device && !mr->ram && !mr->rom_device && !mr->readonly) {
+dev = (DeviceState *) object_dynamic_cast(mr->owner, TYPE_DEVICE);
+if (dev) {
+if (dev->mem_reentrancy_guard.engaged_in_io) {
+trace_memory_region_reentrant_io(get_cpu_index(), mr, addr, size);
+return MEMTX_ERROR;
+}
+dev->mem_reentrancy_guard.engaged_in_io = true;
+}
+}
+
 /* FIXME: support unaligned access? */
 access_size = MAX(MIN(size, access_size_max), access_size_min);
 access_mask = MAKE_64BIT_MASK(0, access_size * 8);
@@ -556,6 +570,9 @@ static MemTxResult access_with_adjusted_size(hwaddr addr,
 access_mask, attrs);
 }
 }
+if (dev) {
+dev->mem_reentrancy_guard.engaged_in_io = false;
+}
 return r;
 }
 
diff --git a/softmmu/trace-events b/softmmu/trace-events
index 22606dc27b..62d04ea9a7 100644
--- a/softmmu/trace-events
+++ b/softmmu/trace-events
@@ -13,6 +13,7 @@ memory_region_ops_read(int cpu_index, void *mr, uint64_t addr, uint64_t value, u
 memory_region_ops_write(int cpu_index, void *mr, uint64_t addr, uint64_t value, unsigned size, const char *name) "cpu %d mr %p addr 0x%"PRIx64" value 0x%"PRIx64" size %u name '%s'"
 memory_region_subpage_read(int cpu_index, void *mr, uint64_t offset, uint64_t value, unsigned size) "cpu %d mr %p offset 0x%"PRIx64" value 0x%"PRIx64" size %u"
 memory_region_subpage_write(int cpu_index, void *mr, uint64_t offset, uint64_t value, unsigned size) "cpu %d mr %p offset 0x%"PRIx64" value 0x%"PRIx64" size %u"
+memory_region_reentrant_io(int cpu_index, void *mr, uint64_t offset, unsigned size) "cpu %d mr %p offset 0x%"PRIx64" size %u"
 memory_region_ram_device_read(int cpu_index, void *mr, uint64_t addr, uint64_t value, unsigned size) "cpu %d mr %p addr 0x%"PRIx64" value 0x%"PRIx64" size %u"
 memory_region_ram_device_write(int cpu_index, void *mr, uint64_t addr, uint64_t value, unsigned size) "cpu %d mr %p addr 0x%"PRIx64" value 0x%"PRIx64" size %u"
 memory_region_sync_dirty(const char *mr, const char *listener, int global) "mr '%s' listener '%s' synced (global=%d)"
-- 
2.39.0
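
The mmio -> dma -> mmio check in this patch can be pictured with a small,
self-contained C sketch. It does not use QEMU's types: SketchDevice,
sketch_access() and sketch_reentrant_handler() are hypothetical names, and the
handler re-enters its own device directly where a real device would get there
through a DMA write. The flag handling follows the hunk in
access_with_adjusted_size(): refuse the access if the flag is already set,
otherwise set it, run the handler, and clear it afterwards.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the MemReentrancyGuard added to DeviceState. */
typedef struct {
    bool engaged_in_io;
} SketchGuard;

/* Stand-in for MEMTX_OK / MEMTX_ERROR. */
typedef enum {
    SKETCH_OK = 0,
    SKETCH_ERROR = 1
} SketchResult;

typedef struct SketchDevice SketchDevice;
typedef void (*SketchHandler)(SketchDevice *dev);

struct SketchDevice {
    SketchGuard guard;
    SketchHandler handler;      /* device callback run on each access */
    int handled;                /* number of accesses that reached the device */
    SketchResult nested_result; /* result of the access made from the handler */
};

/* Dispatch one access to the device, rejecting re-entrant accesses. */
static SketchResult sketch_access(SketchDevice *dev)
{
    if (dev->guard.engaged_in_io) {
        return SKETCH_ERROR;    /* device is already inside a handler */
    }
    dev->guard.engaged_in_io = true;
    dev->handled++;
    if (dev->handler) {
        dev->handler(dev);
    }
    dev->guard.engaged_in_io = false;
    return SKETCH_OK;
}

/* A handler that tries to access its own device again. */
static void sketch_reentrant_handler(SketchDevice *dev)
{
    dev->nested_result = sketch_access(dev);
}
```

With sketch_reentrant_handler installed, the outer access succeeds and the
nested one is rejected, so the handler body runs once instead of recursing.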

[PATCH v6 0/4] memory: prevent dma-reentracy issues

2023-02-04 Thread Alexander Bulekov

These patches aim to solve two types of DMA-reentrancy issues:
  
1.) mmio -> dma -> mmio case
To solve this, we track whether the device is engaged in io by
checking/setting a reentrancy-guard within APIs used for MMIO access.
  
2.) bh -> dma write -> mmio case
This case is trickier, since we don't have a generic way to associate a
bh with the underlying Device/DeviceState. Thus, this version allows a
device to associate a reentrancy-guard with a bh, when creating it.
(Instead of calling qemu_bh_new, you call qemu_bh_new_guarded)
  
I replaced most of the qemu_bh_new invocations with the guarded analog,
except for the ones where the DeviceState was not trivially accessible.

v5 -> v6:
- Only apply checkpatch checks to code in paths containing "/hw/"
  (/hw/ and include/hw/)
- Fix a bug in a _guarded call added to hw/block/virtio-blk.c
v4 -> v5:
- Add corresponding checkpatch checks
- Save/restore reentrancy-flag when entering/exiting BHs
- Improve documentation
- Check object_dynamic_cast return value
  
v3 -> v4: Instead of changing all of the DMA APIs, add an
optional reentrancy guard to the BH API.

v2 -> v3: Bite the bullet and modify the DMA APIs, rather than
attempting to guess DeviceStates in BHs.

Alexander Bulekov (4):
  memory: prevent dma-reentracy issues
  async: Add an optional reentrancy guard to the BH API
  checkpatch: add qemu_bh_new/aio_bh_new checks
  hw: replace most qemu_bh_new calls with qemu_bh_new_guarded

 docs/devel/multiple-iothreads.txt |  7 +++
 hw/9pfs/xen-9p-backend.c  |  4 +++-
 hw/block/dataplane/virtio-blk.c   |  3 ++-
 hw/block/dataplane/xen-block.c|  5 +++--
 hw/char/virtio-serial-bus.c   |  3 ++-
 hw/display/qxl.c  |  9 ++---
 hw/display/virtio-gpu.c   |  6 --
 hw/ide/ahci.c |  3 ++-
 hw/ide/core.c |  3 ++-
 hw/misc/imx_rngc.c|  6 --
 hw/misc/macio/mac_dbdma.c |  2 +-
 hw/net/virtio-net.c   |  3 ++-
 hw/nvme/ctrl.c|  6 --
 hw/scsi/mptsas.c  |  3 ++-
 hw/scsi/scsi-bus.c|  3 ++-
 hw/scsi/vmw_pvscsi.c  |  3 ++-
 hw/usb/dev-uas.c  |  3 ++-
 hw/usb/hcd-dwc2.c |  3 ++-
 hw/usb/hcd-ehci.c |  3 ++-
 hw/usb/hcd-uhci.c |  2 +-
 hw/usb/host-libusb.c  |  6 --
 hw/usb/redirect.c |  6 --
 hw/usb/xen-usb.c  |  3 ++-
 hw/virtio/virtio-balloon.c|  5 +++--
 hw/virtio/virtio-crypto.c |  3 ++-
 include/block/aio.h   | 18 --
 include/hw/qdev-core.h|  7 +++
 include/qemu/main-loop.h  |  7 +--
 scripts/checkpatch.pl |  8 
 softmmu/memory.c  | 17 +
 softmmu/trace-events  |  1 +
 tests/unit/ptimer-test-stubs.c|  3 ++-
 util/async.c  | 18 +-
 util/main-loop.c  |  5 +++--
 util/trace-events |  1 +
 35 files changed, 147 insertions(+), 41 deletions(-)

-- 
2.39.0
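
For the bh -> dma write -> mmio case, the idea of a guarded BH can be sketched
in standalone C. This is an illustration only, not the code from util/async.c
(that part of the series is cut off in this archive): SketchBH, SketchDev,
sketch_bh_call() and sketch_cb() are hypothetical names. The sketch assumes
the behaviour listed under "v4 -> v5" above, namely that the flag is saved
before the callback runs and restored afterwards, and that a NULL guard means
an unguarded BH.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for MemReentrancyGuard. */
typedef struct {
    bool engaged_in_io;
} SketchGuard;

typedef void (*SketchBHFunc)(void *opaque);

/* A bottom half with an optional guard (NULL means unguarded). */
typedef struct {
    SketchBHFunc cb;
    void *opaque;
    SketchGuard *guard;
} SketchBH;

/* A toy device that records what its BH callback observed. */
typedef struct {
    SketchGuard guard;
    bool seen_engaged;
} SketchDev;

/* Run the callback with the guard raised, then restore the old value. */
static void sketch_bh_call(SketchBH *bh)
{
    bool saved = false;

    if (bh->guard) {
        saved = bh->guard->engaged_in_io;
        bh->guard->engaged_in_io = true;
    }
    bh->cb(bh->opaque);
    if (bh->guard) {
        bh->guard->engaged_in_io = saved;
    }
}

/* Callback: note whether the device's guard was raised while it ran. */
static void sketch_cb(void *opaque)
{
    SketchDev *dev = opaque;

    dev->seen_engaged = dev->guard.engaged_in_io;
}
```

While a guarded callback runs, any access that checks the same flag would see
it raised and be rejected. Restoring the saved value rather than clearing the
flag keeps the guard correct if the BH runs while the device is already
engaged in I/O.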

Re: [PULL 00/11] Net patches

2023-02-04 Thread Laurent Vivier


On 2/4/23 15:57, Peter Maydell wrote:

On Thu, 2 Feb 2023 at 06:21, Jason Wang  wrote:


The following changes since commit 13356edb87506c148b163b8c7eb0695647d00c2a:

   Merge tag 'block-pull-request' of https://gitlab.com/stefanha/qemu into staging (2023-01-24 09:45:33 +)

are available in the git repository at:

   https://github.com/jasowang/qemu.git tags/net-pull-request

for you to fetch changes up to 2bd492bca521ee8594f1d5db8dc9aac126fc4f85:

   vdpa: fix VHOST_BACKEND_F_IOTLB_ASID flag check (2023-02-02 14:16:48 +0800)




Something weird has happened here -- this pullreq is trying to
add tests/qtest/netdev-socket.c, but it already exists in the
tree and doesn't have the same contents as the version in your
pull request.

Can you look at what's happened here and fix it up, please?


Thomas and Jason have queued the patch:

  tests/qtest: netdev: test stream and dgram backends

For Jason it's because it's needed by

  net: stream: add a new option to automatically reconnect

For me, both patches (in tree and Jason's one) are identical to my v7
(except the one that is merged does not have Thomas' acked-by).

Jason, you can remove PULL 09/11 from your pull request as it is already
merged [1].

Thanks,
Laurent

[1] c95031a19f0d ("tests/qtest: netdev: test stream and dgram backends")

Re: [PULL 00/22] Linux user for 8.0 patches

2023-02-04 Thread Peter Maydell

On Sat, 4 Feb 2023 at 16:08, Laurent Vivier  wrote:
>
> The following changes since commit 13356edb87506c148b163b8c7eb0695647d00c2a:
>
>   Merge tag 'block-pull-request' of https://gitlab.com/stefanha/qemu into 
> staging (2023-01-24 09:45:33 +)
>
> are available in the Git repository at:
>
>   https://gitlab.com/laurent_vivier/qemu.git 
> tags/linux-user-for-8.0-pull-request
>
> for you to fetch changes up to 3f0744f98b07c6fd2ce9d5840726d0915b2ae7c1:
>
>   linux-user: Allow sendmsg() without IOV (2023-02-03 22:55:12 +0100)
>
> 
> linux-user branch pull request 20230204
>
> Implement execveat()
> un-parent OBJECT(cpu) when closing thread
> Revert fix for glibc >= 2.36 sys/mount.h
> Fix/update strace
> move target_flat.h to target subdirs
> Fix SO_ERROR return code of getsockopt()
> Fix /proc/cpuinfo output for hppa
> Add emulation for MADV_WIPEONFORK and MADV_KEEPONFORK in madvise()
> Implement SOL_ALG encryption support
> linux-user: Allow sendmsg() without IOV
>
> 


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/8.0
for any user-visible changes.

-- PMM

Re: [PATCH] hw/ppc/pegasos2: Fix a typo in a comment

2023-02-04 Thread Daniel Henrique Barboza


Queued in gitlab.com/danielhb/qemu/tree/ppc-next. Thanks,


Daniel

On 2/3/23 16:43, BALATON Zoltan wrote:

Reported-by: Stefan Weil 
Signed-off-by: BALATON Zoltan 
---
  hw/ppc/pegasos2.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/ppc/pegasos2.c b/hw/ppc/pegasos2.c
index 1a13632ba6..a9563f4fb2 100644
--- a/hw/ppc/pegasos2.c
+++ b/hw/ppc/pegasos2.c
@@ -564,7 +564,7 @@ static void dt_isa(PCIBus *bus, PCIDevice *d, FDTInfo *fi)
  qemu_fdt_setprop_string(fi->fdt, fi->path, "device_type", "isa");
  qemu_fdt_setprop_string(fi->fdt, fi->path, "name", "isa");
  
-/* addional devices */
+/* additional devices */
  g_string_printf(name, "%s/lpt@i3bc", fi->path);
  qemu_fdt_add_subnode(fi->fdt, name->str);
  qemu_fdt_setprop_cell(fi->fdt, name->str, "clock-frequency", 0);

Re: [PATCH] tcg: Init temp_subindex in liveness_pass_2

2023-02-04 Thread Philippe Mathieu-Daudé


On 3/2/23 23:59, Richard Henderson wrote:

Correctly handle large types while lowering.

Fixes: fac87bd2a49b ("tcg: Add temp_subindex to TCGTemp")
Signed-off-by: Richard Henderson 
---
  tcg/tcg.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index fd557d55d3..bc60fd0fe8 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -3063,6 +3063,7 @@ static bool liveness_pass_2(TCGContext *s)
  TCGTemp *dts = tcg_temp_alloc(s);
  dts->type = its->type;
  dts->base_type = its->base_type;
+dts->temp_subindex = its->temp_subindex;
  dts->kind = TEMP_EBB;
  its->state_ptr = dts;
  } else {


Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH 4/4] pcie: add trace-poing for power indicator transitions

2023-02-04 Thread Vladimir Sementsov-Ogievskiy


Oops, sorry. Both [4] patches are equal, except that this one has a typo in
the subject.


--
Best regards,
Vladimir

[PATCH 3/4] pcie: drop unused PCIExpressIndicator

2023-02-04 Thread Vladimir Sementsov-Ogievskiy

The enum type is unused. Also, it's the only user of the corresponding
macros, so drop them too.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 include/hw/pci/pcie.h  | 8 
 include/hw/pci/pcie_regs.h | 5 -
 2 files changed, 13 deletions(-)

diff --git a/include/hw/pci/pcie.h b/include/hw/pci/pcie.h
index 798a262a0a..3cc2b15957 100644
--- a/include/hw/pci/pcie.h
+++ b/include/hw/pci/pcie.h
@@ -27,14 +27,6 @@
 #include "hw/pci/pcie_sriov.h"
 #include "hw/hotplug.h"
 
-typedef enum {
-/* for attention and power indicator */
-PCI_EXP_HP_IND_RESERVED = PCI_EXP_SLTCTL_IND_RESERVED,
-PCI_EXP_HP_IND_ON   = PCI_EXP_SLTCTL_IND_ON,
-PCI_EXP_HP_IND_BLINK= PCI_EXP_SLTCTL_IND_BLINK,
-PCI_EXP_HP_IND_OFF  = PCI_EXP_SLTCTL_IND_OFF,
-} PCIExpressIndicator;
-
 typedef enum {
 /* these bits must match the bits in Slot Control/Status registers.
  * PCI_EXP_HP_EV_xxx = PCI_EXP_SLTCTL_xxxE = PCI_EXP_SLTSTA_xxx
diff --git a/include/hw/pci/pcie_regs.h b/include/hw/pci/pcie_regs.h
index 00b595a82e..1fe0bdd25b 100644
--- a/include/hw/pci/pcie_regs.h
+++ b/include/hw/pci/pcie_regs.h
@@ -66,11 +66,6 @@ typedef enum PCIExpLinkWidth {
 
 #define PCI_EXP_SLTCAP_PSN_SHIFT    ctz32(PCI_EXP_SLTCAP_PSN)
 
-#define PCI_EXP_SLTCTL_IND_RESERVED 0x0
-#define PCI_EXP_SLTCTL_IND_ON       0x1
-#define PCI_EXP_SLTCTL_IND_BLINK    0x2
-#define PCI_EXP_SLTCTL_IND_OFF      0x3
-
 #define PCI_EXP_SLTCTL_SUPPORTED            \
 (PCI_EXP_SLTCTL_ABPE |  \
  PCI_EXP_SLTCTL_PDCE |  \
-- 
2.34.1

[PATCH 2/4] pcie_regs: drop duplicated indicator value macros

2023-02-04 Thread Vladimir Sementsov-Ogievskiy

We already have indicator values in
include/standard-headers/linux/pci_regs.h , no reason to reinvent them
in include/hw/pci/pcie_regs.h. (and we already have usage of
PCI_EXP_SLTCTL_PWR_IND_BLINK and PCI_EXP_SLTCTL_PWR_IND_OFF in
hw/pci/pcie.c, so let's be consistent)

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 include/hw/pci/pcie_regs.h |  9 -
 hw/pci/pcie.c  | 13 +++--
 2 files changed, 7 insertions(+), 15 deletions(-)

diff --git a/include/hw/pci/pcie_regs.h b/include/hw/pci/pcie_regs.h
index 963dc2e170..00b595a82e 100644
--- a/include/hw/pci/pcie_regs.h
+++ b/include/hw/pci/pcie_regs.h
@@ -70,15 +70,6 @@ typedef enum PCIExpLinkWidth {
 #define PCI_EXP_SLTCTL_IND_ON   0x1
 #define PCI_EXP_SLTCTL_IND_BLINK    0x2
 #define PCI_EXP_SLTCTL_IND_OFF      0x3
-#define PCI_EXP_SLTCTL_AIC_SHIFT    ctz32(PCI_EXP_SLTCTL_AIC)
-#define PCI_EXP_SLTCTL_AIC_OFF      \
-    (PCI_EXP_SLTCTL_IND_OFF << PCI_EXP_SLTCTL_AIC_SHIFT)
-
-#define PCI_EXP_SLTCTL_PIC_SHIFT    ctz32(PCI_EXP_SLTCTL_PIC)
-#define PCI_EXP_SLTCTL_PIC_OFF      \
-    (PCI_EXP_SLTCTL_IND_OFF << PCI_EXP_SLTCTL_PIC_SHIFT)
-#define PCI_EXP_SLTCTL_PIC_ON       \
-    (PCI_EXP_SLTCTL_IND_ON << PCI_EXP_SLTCTL_PIC_SHIFT)
 
 #define PCI_EXP_SLTCTL_SUPPORTED            \
 (PCI_EXP_SLTCTL_ABPE |  \
diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
index 82ef723983..ccdb2377e1 100644
--- a/hw/pci/pcie.c
+++ b/hw/pci/pcie.c
@@ -634,8 +634,8 @@ void pcie_cap_slot_init(PCIDevice *dev, PCIESlot *s)
  PCI_EXP_SLTCTL_PIC |
  PCI_EXP_SLTCTL_AIC);
 pci_word_test_and_set_mask(dev->config + pos + PCI_EXP_SLTCTL,
-   PCI_EXP_SLTCTL_PIC_OFF |
-   PCI_EXP_SLTCTL_AIC_OFF);
+   PCI_EXP_SLTCTL_PWR_IND_OFF |
+   PCI_EXP_SLTCTL_ATTN_IND_OFF);
 pci_word_test_and_set_mask(dev->wmask + pos + PCI_EXP_SLTCTL,
PCI_EXP_SLTCTL_PIC |
PCI_EXP_SLTCTL_AIC |
@@ -679,7 +679,7 @@ void pcie_cap_slot_reset(PCIDevice *dev)
  PCI_EXP_SLTCTL_PDCE |
  PCI_EXP_SLTCTL_ABPE);
 pci_word_test_and_set_mask(exp_cap + PCI_EXP_SLTCTL,
-   PCI_EXP_SLTCTL_AIC_OFF);
+   PCI_EXP_SLTCTL_ATTN_IND_OFF);
 
 if (dev->cap_present & QEMU_PCIE_SLTCAP_PCP) {
 /* Downstream ports enforce device number 0. */
@@ -694,7 +694,8 @@ void pcie_cap_slot_reset(PCIDevice *dev)
PCI_EXP_SLTCTL_PCC);
 }
 
-pic = populated ? PCI_EXP_SLTCTL_PIC_ON : PCI_EXP_SLTCTL_PIC_OFF;
+pic = populated ?
+PCI_EXP_SLTCTL_PWR_IND_ON : PCI_EXP_SLTCTL_PWR_IND_OFF;
 pci_word_test_and_set_mask(exp_cap + PCI_EXP_SLTCTL, pic);
 }
 
@@ -770,9 +771,9 @@ void pcie_cap_slot_write_config(PCIDevice *dev,
  * control of powered off slots before powering them on.
  */
 if ((sltsta & PCI_EXP_SLTSTA_PDS) && (val & PCI_EXP_SLTCTL_PCC) &&
-(val & PCI_EXP_SLTCTL_PIC) == PCI_EXP_SLTCTL_PIC_OFF &&
+(val & PCI_EXP_SLTCTL_PIC) == PCI_EXP_SLTCTL_PWR_IND_OFF &&
 (!(old_slt_ctl & PCI_EXP_SLTCTL_PCC) ||
-(old_slt_ctl & PCI_EXP_SLTCTL_PIC) != PCI_EXP_SLTCTL_PIC_OFF)) {
+(old_slt_ctl & PCI_EXP_SLTCTL_PIC) != PCI_EXP_SLTCTL_PWR_IND_OFF)) {
 pcie_cap_slot_do_unplug(dev);
 }
 pcie_cap_update_power(dev);
-- 
2.34.1
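
As a rough illustration of why the macros dropped in patch 2/4 are
redundant, the sketch below rebuilds them the old way (raw indicator
value shifted to the field position) so the result can be compared with
the constants from linux/pci_regs.h. The SKETCH_ names and both helpers
are hypothetical, and the numeric values are copied here on the
assumption that they match include/standard-headers/linux/pci_regs.h.

```c
#include <assert.h>
#include <stdint.h>

/* Field masks and values, assumed to match linux/pci_regs.h. */
#define SKETCH_SLTCTL_AIC          0x00c0  /* attention indicator field */
#define SKETCH_SLTCTL_ATTN_IND_OFF 0x00c0
#define SKETCH_SLTCTL_PIC          0x0300  /* power indicator field */
#define SKETCH_SLTCTL_PWR_IND_ON   0x0100
#define SKETCH_SLTCTL_PWR_IND_OFF  0x0300

/* Raw 2-bit encodings, as in the dropped PCI_EXP_SLTCTL_IND_* macros. */
#define SKETCH_IND_ON  0x1
#define SKETCH_IND_OFF 0x3

/* Count trailing zeros of a non-zero mask (portable stand-in for ctz32). */
static unsigned sketch_ctz(uint32_t mask)
{
    unsigned n = 0;

    while ((mask & 1u) == 0) {
        mask >>= 1;
        n++;
    }
    return n;
}

/* Old-style construction: shift the raw encoding into the field. */
static uint16_t sketch_field_value(uint16_t ind, uint16_t mask)
{
    return (uint16_t)(ind << sketch_ctz(mask));
}
```

Under these assumptions the shifted values coincide with the
PCI_EXP_SLTCTL_PWR_IND_* and PCI_EXP_SLTCTL_ATTN_IND_* constants, which
is what makes the substitution in hw/pci/pcie.c behavior-preserving.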

[PATCH 4/4] pcie: add trace-poing for power indicator transitions

2023-02-04 Thread Vladimir Sementsov-Ogievskiy

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 hw/pci/pcie.c   | 20 
 hw/pci/trace-events |  3 +++
 2 files changed, 23 insertions(+)

diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
index ccdb2377e1..1a19368994 100644
--- a/hw/pci/pcie.c
+++ b/hw/pci/pcie.c
@@ -28,6 +28,7 @@
 #include "hw/pci/pcie_regs.h"
 #include "hw/pci/pcie_port.h"
 #include "qemu/range.h"
+#include "trace.h"
 
 //#define DEBUG_PCIE
 #ifdef DEBUG_PCIE
@@ -718,6 +719,20 @@ void pcie_cap_slot_get(PCIDevice *dev, uint16_t *slt_ctl, uint16_t *slt_sta)
 *slt_sta = pci_get_word(exp_cap + PCI_EXP_SLTSTA);
 }
 
+static const char *pcie_sltctl_pic_str(uint16_t sltctl)
+{
+switch (sltctl & PCI_EXP_SLTCTL_PIC) {
+case PCI_EXP_SLTCTL_PWR_IND_ON:
+return "on";
+case PCI_EXP_SLTCTL_PWR_IND_BLINK:
+return "blink";
+case PCI_EXP_SLTCTL_PWR_IND_OFF:
+return "off";
+default:
+return "?";
+}
+}
+
 void pcie_cap_slot_write_config(PCIDevice *dev,
 uint16_t old_slt_ctl, uint16_t old_slt_sta,
 uint32_t addr, uint32_t val, int len)
@@ -762,6 +777,11 @@ void pcie_cap_slot_write_config(PCIDevice *dev,
 sltsta);
 }
 
+if ((val & PCI_EXP_SLTCTL_PIC) != (old_slt_ctl & PCI_EXP_SLTCTL_PIC)) {
+trace_pcie_power_indicator(pcie_sltctl_pic_str(old_slt_ctl),
+   pcie_sltctl_pic_str(val));
+}
+
 /*
  * If the slot is populated, power indicator is off and power
  * controller is off, it is safe to detach the devices.
diff --git a/hw/pci/trace-events b/hw/pci/trace-events
index aaf46bc92d..ec4a5ff43d 100644
--- a/hw/pci/trace-events
+++ b/hw/pci/trace-events
@@ -15,3 +15,6 @@ msix_write_config(char *name, bool enabled, bool masked) "dev %s enabled %d mask
 sriov_register_vfs(const char *name, int slot, int function, int num_vfs) "%s %02x:%x: creating %d vf devs"
 sriov_unregister_vfs(const char *name, int slot, int function, int num_vfs) "%s %02x:%x: Unregistering %d vf devs"
 sriov_config_write(const char *name, int slot, int fun, uint32_t offset, uint32_t val, uint32_t len) "%s %02x:%x: sriov offset 0x%x val 0x%x len %d"
+
+# pcie.c
+pcie_power_indicator(const char *old, const char *new) "%s -> %s"
-- 
2.34.1

[PATCH 4/4] pcie: add trace-point for power indicator transitions

2023-02-04 Thread Vladimir Sementsov-Ogievskiy

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 hw/pci/pcie.c   | 20 
 hw/pci/trace-events |  3 +++
 2 files changed, 23 insertions(+)

diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
index ccdb2377e1..1a19368994 100644
--- a/hw/pci/pcie.c
+++ b/hw/pci/pcie.c
@@ -28,6 +28,7 @@
 #include "hw/pci/pcie_regs.h"
 #include "hw/pci/pcie_port.h"
 #include "qemu/range.h"
+#include "trace.h"
 
 //#define DEBUG_PCIE
 #ifdef DEBUG_PCIE
@@ -718,6 +719,20 @@ void pcie_cap_slot_get(PCIDevice *dev, uint16_t *slt_ctl, uint16_t *slt_sta)
 *slt_sta = pci_get_word(exp_cap + PCI_EXP_SLTSTA);
 }
 
+static const char *pcie_sltctl_pic_str(uint16_t sltctl)
+{
+switch (sltctl & PCI_EXP_SLTCTL_PIC) {
+case PCI_EXP_SLTCTL_PWR_IND_ON:
+return "on";
+case PCI_EXP_SLTCTL_PWR_IND_BLINK:
+return "blink";
+case PCI_EXP_SLTCTL_PWR_IND_OFF:
+return "off";
+default:
+return "?";
+}
+}
+
 void pcie_cap_slot_write_config(PCIDevice *dev,
 uint16_t old_slt_ctl, uint16_t old_slt_sta,
 uint32_t addr, uint32_t val, int len)
@@ -762,6 +777,11 @@ void pcie_cap_slot_write_config(PCIDevice *dev,
 sltsta);
 }
 
+if ((val & PCI_EXP_SLTCTL_PIC) != (old_slt_ctl & PCI_EXP_SLTCTL_PIC)) {
+trace_pcie_power_indicator(pcie_sltctl_pic_str(old_slt_ctl),
+   pcie_sltctl_pic_str(val));
+}
+
 /*
  * If the slot is populated, power indicator is off and power
  * controller is off, it is safe to detach the devices.
diff --git a/hw/pci/trace-events b/hw/pci/trace-events
index aaf46bc92d..ec4a5ff43d 100644
--- a/hw/pci/trace-events
+++ b/hw/pci/trace-events
@@ -15,3 +15,6 @@ msix_write_config(char *name, bool enabled, bool masked) "dev %s enabled %d mask
 sriov_register_vfs(const char *name, int slot, int function, int num_vfs) "%s %02x:%x: creating %d vf devs"
 sriov_unregister_vfs(const char *name, int slot, int function, int num_vfs) "%s %02x:%x: Unregistering %d vf devs"
 sriov_config_write(const char *name, int slot, int fun, uint32_t offset, uint32_t val, uint32_t len) "%s %02x:%x: sriov offset 0x%x val 0x%x len %d"
+
+# pcie.c
+pcie_power_indicator(const char *old, const char *new) "%s -> %s"
-- 
2.34.1
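
To make the effect of the new trace point more concrete, here is a
standalone sketch of the decode-and-compare logic from the patch above.
The SKETCH_ names and sketch_trace_line() are hypothetical; the field
values are assumed to match linux/pci_regs.h, and the real trace output
depends on the trace backend in use.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Power indicator field and values, assumed to match linux/pci_regs.h. */
#define SKETCH_SLTCTL_PIC           0x0300
#define SKETCH_SLTCTL_PWR_IND_ON    0x0100
#define SKETCH_SLTCTL_PWR_IND_BLINK 0x0200
#define SKETCH_SLTCTL_PWR_IND_OFF   0x0300

/* Same shape as pcie_sltctl_pic_str() in the patch. */
static const char *sketch_pic_str(uint16_t sltctl)
{
    switch (sltctl & SKETCH_SLTCTL_PIC) {
    case SKETCH_SLTCTL_PWR_IND_ON:
        return "on";
    case SKETCH_SLTCTL_PWR_IND_BLINK:
        return "blink";
    case SKETCH_SLTCTL_PWR_IND_OFF:
        return "off";
    default:
        return "?";
    }
}

/*
 * Format "old -> new" into buf and return 1 if the power indicator
 * field changed; return 0 and leave buf untouched otherwise.
 */
static int sketch_trace_line(uint16_t old_slt_ctl, uint16_t val,
                             char *buf, size_t len)
{
    if ((val & SKETCH_SLTCTL_PIC) == (old_slt_ctl & SKETCH_SLTCTL_PIC)) {
        return 0;
    }
    snprintf(buf, len, "%s -> %s",
             sketch_pic_str(old_slt_ctl), sketch_pic_str(val));
    return 1;
}
```

A guest write that only touches other Slot Control bits produces no
event in this model, which keeps the trace limited to real indicator
transitions.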

[PATCH 1/4] pcie: pcie_cap_slot_write_config(): use correct macro

2023-02-04 Thread Vladimir Sementsov-Ogievskiy

PCI_EXP_SLTCTL_PIC_OFF is a value, and PCI_EXP_SLTCTL_PIC is a mask.
Happily PCI_EXP_SLTCTL_PIC_OFF is a maximum value for this mask and is
equal to the mask itself. Still the code looks like a bug. Let's make
it more reader-friendly.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 hw/pci/pcie.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
index 924fdabd15..82ef723983 100644
--- a/hw/pci/pcie.c
+++ b/hw/pci/pcie.c
@@ -770,9 +770,9 @@ void pcie_cap_slot_write_config(PCIDevice *dev,
  * control of powered off slots before powering them on.
  */
 if ((sltsta & PCI_EXP_SLTSTA_PDS) && (val & PCI_EXP_SLTCTL_PCC) &&
-(val & PCI_EXP_SLTCTL_PIC_OFF) == PCI_EXP_SLTCTL_PIC_OFF &&
+(val & PCI_EXP_SLTCTL_PIC) == PCI_EXP_SLTCTL_PIC_OFF &&
 (!(old_slt_ctl & PCI_EXP_SLTCTL_PCC) ||
-(old_slt_ctl & PCI_EXP_SLTCTL_PIC_OFF) != PCI_EXP_SLTCTL_PIC_OFF)) {
+(old_slt_ctl & PCI_EXP_SLTCTL_PIC) != PCI_EXP_SLTCTL_PIC_OFF)) {
 pcie_cap_slot_do_unplug(dev);
 }
 pcie_cap_update_power(dev);
-- 
2.34.1
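
The mask-versus-value point in the commit message above can be sketched
in a few lines of standalone C. The SKETCH_ names and helpers are
hypothetical, and the values 0x0300 (mask and "off") and 0x0100 ("on")
are assumed to match the PCIe Slot Control definitions. The sketch shows
that masking with the "off" value happens to work only because it equals
the mask, while the same pattern applied to "on" would give wrong
answers.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_SLTCTL_PIC     0x0300  /* mask of the power indicator field */
#define SKETCH_SLTCTL_PIC_OFF 0x0300  /* value: indicator off */
#define SKETCH_SLTCTL_PIC_ON  0x0100  /* value: indicator on */

/* Old form: masks with the value. */
static int sketch_is_off_old(uint16_t val)
{
    return (val & SKETCH_SLTCTL_PIC_OFF) == SKETCH_SLTCTL_PIC_OFF;
}

/* New form: masks with the field mask, compares with the value. */
static int sketch_is_off_new(uint16_t val)
{
    return (val & SKETCH_SLTCTL_PIC) == SKETCH_SLTCTL_PIC_OFF;
}

/* The same "mask with the value" pattern applied to ON is wrong. */
static int sketch_is_on_buggy(uint16_t val)
{
    return (val & SKETCH_SLTCTL_PIC_ON) == SKETCH_SLTCTL_PIC_ON;
}

static int sketch_is_on_correct(uint16_t val)
{
    return (val & SKETCH_SLTCTL_PIC) == SKETCH_SLTCTL_PIC_ON;
}

/* Count the 16-bit register values on which old and new forms disagree. */
static int sketch_count_mismatches(void)
{
    int mismatches = 0;

    for (uint32_t v = 0; v <= 0xffff; v++) {
        if (sketch_is_off_old((uint16_t)v) != sketch_is_off_new((uint16_t)v)) {
            mismatches++;
        }
    }
    return mismatches;
}
```

So the patch does not change behavior; it only removes a pattern that
would break as soon as it was copied for any value other than "off".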

[PATCH 0/4] pcie: cleanup code and add trace point

2023-02-04 Thread Vladimir Sementsov-Ogievskiy

Hi all!

Here is a tiny code cleanup + one trace point to track power indicator
changes (which may help to analyze
"Hot-unplug failed: guest is busy (power indicator blinking)" error
message).

Vladimir Sementsov-Ogievskiy (4):
  pcie: pcie_cap_slot_write_config(): use correct macro
  pcie_regs: drop duplicated indicator value macros
  pcie: drop unused PCIExpressIndicator
  pcie: add trace-point for power indicator transitions

 include/hw/pci/pcie.h  |  8 
 include/hw/pci/pcie_regs.h | 14 --
 hw/pci/pcie.c  | 33 +++--
 hw/pci/trace-events|  3 +++
 4 files changed, 30 insertions(+), 28 deletions(-)

-- 
2.34.1

Re: [PULL 0/1] M68k next patches

2023-02-04 Thread Peter Maydell

On Wed, 1 Feb 2023 at 09:54, Laurent Vivier  wrote:
>
> The following changes since commit 13356edb87506c148b163b8c7eb0695647d00c2a:
>
>   Merge tag 'block-pull-request' of https://gitlab.com/stefanha/qemu into 
> staging (2023-01-24 09:45:33 +)
>
> are available in the Git repository at:
>
>   https://github.com/vivier/qemu-m68k.git tags/m68k-next-pull-request
>
> for you to fetch changes up to c1fc91b82545a2b8ab73f81e5b7b6b0fec292ea1:
>
>   m68k: fix 'bkpt' instruction in softmmu mode (2023-02-01 10:18:21 +0100)
>
> 
> m68k pull request 20230201
>
> fix 'bkpt' instruction in softmmu mode
>
> 
>
> Laurent Vivier (1):
>   m68k: fix 'bkpt' instruction in softmmu mode
>


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/8.0
for any user-visible changes.

-- PMM

Re: pixman_blt on aarch64

2023-02-04 Thread BALATON Zoltan

This has just bounced. I had hoped to still be able to post after moderation,
but now I'm resending it after subscribing to the pixman list. Meanwhile
I've found this merge request as well:
https://gitlab.freedesktop.org/pixman/pixman/-/merge_requests/71
See the rest of the message below. It looks like this is being worked on, but
I'm not sure how far it is from being resolved. Any info on that?


On Sat, 4 Feb 2023, BALATON Zoltan wrote:

Hello,

I'm trying to involve the pixman list in this thread on qemu-devel list 
started with subject "Display update issue on M1 Macs". See here:


https://lists.nongnu.org/archive/html/qemu-devel/2023-02/msg01033.html

We have found that on aarch64 Macs running macOS the pixman_blt and 
pixman_fill functions are disabled without fallback due to not being able to 
compile the needed assembly code. See detailed discussion below.


Is there a way to fix this in pixman in the near future or provide a fallback 
for this in pixman? Or do I need to add a fallback in QEMU or try using 
something else instead of pixman for these functions?


Thank you,
BALATON Zoltan

On Sat, 4 Feb 2023, Akihiko Odaki wrote:

On 2023/02/03 22:45, BALATON Zoltan wrote:

On Fri, 3 Feb 2023, Akihiko Odaki wrote:
I finally reproduced the issue with MorphOS and ati-vga and figured out 
its cause.


The problem is that pixman_blt() is disabled because its backend is 
written in GNU assembly, and GNU assembler is not available on macOS. 
There is no fallback written in C, unfortunately. The issue is tracked by 
the upstream at:

https://gitlab.freedesktop.org/pixman/pixman/-/issues/59


Hm, OK, but that ticket is only about a compile error and suggests disabling
it; it does not say that it won't work then. Are they aware this is a problem?
Maybe we should write to their mailing list once we're sure what's
happening.


That's a good idea. They may prioritize the issue if they realize that
it disables pixman_blt().


I hit the same problem on Asahi Linux, which is based on Arch Linux ARM. 
It is because Arch Linux copied PKGBUILD from x86 Arch Linux, which 
disables Arm backends. It is easy to enable the backend for the platform 
so I proposed a change at:

https://github.com/archlinuxarm/PKGBUILDs/pull/1985


On macOS one source of pixman most people use is brew.sh where this seems 
to be disabled:


https://github.com/Homebrew/homebrew-core/blob/master/Formula/pixman.rb

another source is macports which has an older version and no such options:

https://github.com/macports/macports-ports/blob/master/graphics/libpixman-devel/Portfile

I wonder if it compiles from macports on aarch64 then.


It's more likely that it is just outdated. It does not carry a patch to fix 
the issue.


I'll wait to see if I can get some more test results and try to check pixman,
but its source is not too clear to me and there are no docs either, so maybe
the best way is to ask on their list. If this is a pixman issue, I hope it can
be fixed there so we don't need to implement a fallback in QEMU.


This is certainly a pixman issue.

If you read the source, you can see that pixman_blt() calls
_pixman_implementation_blt(), which in turn calls the blt member
of pixman_implementation_t. Grepping for "blt =" shows that it is only
assigned in:

pixman/pixman-arm-neon.c
pixman/pixman-arm-simd.c
pixman/pixman-mips-dspr2.c
pixman/pixman-mmx.c
pixman/pixman-sse2.c

For AArch64, only pixman/pixman-arm-neon.c is relevant, and it needs to be 
disabled to build the library on macOS.


Regards,
Akihiko Odaki



Regards,
BALATON Zoltan
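
If a fallback does end up on the QEMU side, it could look roughly like
the sketch below: a plain C row-by-row copy for 32 bpp surfaces, meant
to be tried when pixman_blt() reports failure. sketch_blt32() is a
hypothetical name and not an existing QEMU or pixman function; strides
are counted in 32-bit words in this sketch, and overlapping source and
destination areas are not handled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical C fallback: copy a width x height rectangle of 32-bit
 * pixels from (src_x, src_y) in src to (dst_x, dst_y) in dst.
 * Strides are in units of uint32_t; the areas must not overlap.
 */
static void sketch_blt32(const uint32_t *src, uint32_t *dst,
                         int src_stride, int dst_stride,
                         int src_x, int src_y, int dst_x, int dst_y,
                         int width, int height)
{
    for (int y = 0; y < height; y++) {
        const uint32_t *s = src + (size_t)(src_y + y) * src_stride + src_x;
        uint32_t *d = dst + (size_t)(dst_y + y) * dst_stride + dst_x;

        memcpy(d, s, (size_t)width * sizeof(uint32_t));
    }
}
```

A caller would first try pixman_blt() and only take this path when it
returns false, so platforms with a working backend keep the optimized
code.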

[PULL 10/40] include/qemu/int128: Use Int128 structure for TCI

2023-02-04 Thread Richard Henderson

We are about to allow passing Int128 to/from tcg helper functions,
but libffi doesn't support __int128_t, so use the structure.

In order for atomic128.h to continue working, we must provide
a mechanism to frob between real __int128_t and the structure.
Provide a new union, Int128Alias, for this.  We cannot modify
Int128 itself, as any changed alignment would also break libffi.

Reviewed-by: Alex Bennée 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 include/qemu/atomic128.h | 29 +--
 include/qemu/int128.h| 25 +---
 util/int128.c| 42 
 3 files changed, 87 insertions(+), 9 deletions(-)

diff --git a/include/qemu/atomic128.h b/include/qemu/atomic128.h
index adb9a1a260..d0ba0b9c65 100644
--- a/include/qemu/atomic128.h
+++ b/include/qemu/atomic128.h
@@ -44,13 +44,23 @@
 #if defined(CONFIG_ATOMIC128)
 static inline Int128 atomic16_cmpxchg(Int128 *ptr, Int128 cmp, Int128 new)
 {
-return qatomic_cmpxchg__nocheck(ptr, cmp, new);
+Int128Alias r, c, n;
+
+c.s = cmp;
+n.s = new;
+r.i = qatomic_cmpxchg__nocheck((__int128_t *)ptr, c.i, n.i);
+return r.s;
 }
 # define HAVE_CMPXCHG128 1
 #elif defined(CONFIG_CMPXCHG128)
 static inline Int128 atomic16_cmpxchg(Int128 *ptr, Int128 cmp, Int128 new)
 {
-return __sync_val_compare_and_swap_16(ptr, cmp, new);
+Int128Alias r, c, n;
+
+c.s = cmp;
+n.s = new;
+r.i = __sync_val_compare_and_swap_16((__int128_t *)ptr, c.i, n.i);
+return r.s;
 }
 # define HAVE_CMPXCHG128 1
 #elif defined(__aarch64__)
@@ -89,12 +99,18 @@ Int128 QEMU_ERROR("unsupported atomic")
 #if defined(CONFIG_ATOMIC128)
 static inline Int128 atomic16_read(Int128 *ptr)
 {
-return qatomic_read__nocheck(ptr);
+Int128Alias r;
+
+r.i = qatomic_read__nocheck((__int128_t *)ptr);
+return r.s;
 }
 
 static inline void atomic16_set(Int128 *ptr, Int128 val)
 {
-qatomic_set__nocheck(ptr, val);
+Int128Alias v;
+
+v.s = val;
+qatomic_set__nocheck((__int128_t *)ptr, v.i);
 }
 
 # define HAVE_ATOMIC128 1
@@ -132,7 +148,8 @@ static inline void atomic16_set(Int128 *ptr, Int128 val)
 static inline Int128 atomic16_read(Int128 *ptr)
 {
 /* Maybe replace 0 with 0, returning the old value.  */
-return atomic16_cmpxchg(ptr, 0, 0);
+Int128 z = int128_make64(0);
+return atomic16_cmpxchg(ptr, z, z);
 }
 
 static inline void atomic16_set(Int128 *ptr, Int128 val)
@@ -141,7 +158,7 @@ static inline void atomic16_set(Int128 *ptr, Int128 val)
 do {
 cmp = old;
 old = atomic16_cmpxchg(ptr, cmp, val);
-} while (old != cmp);
+} while (int128_ne(old, cmp));
 }
 
 # define HAVE_ATOMIC128 1
diff --git a/include/qemu/int128.h b/include/qemu/int128.h
index d2b76ca6ac..f62a46b48c 100644
--- a/include/qemu/int128.h
+++ b/include/qemu/int128.h
@@ -3,7 +3,12 @@
 
 #include "qemu/bswap.h"
 
-#ifdef CONFIG_INT128
+/*
+ * With TCI, we need to use libffi for interfacing with TCG helpers.
+ * But libffi does not support __int128_t, and therefore cannot pass
+ * or return values of this type, force use of the Int128 struct.
+ */
+#if defined(CONFIG_INT128) && !defined(CONFIG_TCG_INTERPRETER)
 typedef __int128_t Int128;
 
 static inline Int128 int128_make64(uint64_t a)
@@ -460,8 +465,7 @@ Int128 int128_divu(Int128, Int128);
 Int128 int128_remu(Int128, Int128);
 Int128 int128_divs(Int128, Int128);
 Int128 int128_rems(Int128, Int128);
-
-#endif /* CONFIG_INT128 */
+#endif /* CONFIG_INT128 && !CONFIG_TCG_INTERPRETER */
 
 static inline void bswap128s(Int128 *s)
 {
@@ -472,4 +476,19 @@ static inline void bswap128s(Int128 *s)
 #define INT128_MAX int128_make128(UINT64_MAX, INT64_MAX)
 #define INT128_MIN int128_make128(0, INT64_MIN)
 
+/*
+ * When compiler supports a 128-bit type, define a combination of
+ * a possible structure and the native types.  Ease parameter passing
+ * via use of the transparent union extension.
+ */
+#ifdef CONFIG_INT128
+typedef union {
+Int128 s;
+__int128_t i;
+__uint128_t u;
+} Int128Alias __attribute__((transparent_union));
+#else
+typedef Int128 Int128Alias;
+#endif /* CONFIG_INT128 */
+
 #endif /* INT128_H */
diff --git a/util/int128.c b/util/int128.c
index ed8f25fef1..df6c6331bd 100644
--- a/util/int128.c
+++ b/util/int128.c
@@ -144,4 +144,46 @@ Int128 int128_rems(Int128 a, Int128 b)
 return r;
 }
 
+#elif defined(CONFIG_TCG_INTERPRETER)
+
+Int128 int128_divu(Int128 a_s, Int128 b_s)
+{
+Int128Alias r, a, b;
+
+a.s = a_s;
+b.s = b_s;
+r.u = a.u / b.u;
+return r.s;
+}
+
+Int128 int128_remu(Int128 a_s, Int128 b_s)
+{
+Int128Alias r, a, b;
+
+a.s = a_s;
+b.s = b_s;
+r.u = a.u % b.u;
+return r.s;
+}
+
+Int128 int128_divs(Int128 a_s, Int128 b_s)
+{
+Int128Alias r, a, b;
+
+a.s = a_s;
+b.s = b_s;
+r.i = a.i / b.i;
+return r.s;
+}
+
+Int128 int128_rems(Int128 a_s, Int128 b_s)
+{
+Int128Alias r, a, b;
+
+a.s = a_s;
+
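
A minimal sketch of the aliasing idea described in the commit message
above, assuming a GCC- or Clang-style compiler that provides __int128_t.
SketchInt128 and SketchInt128Alias are hypothetical stand-ins and not
QEMU's definitions; the transparent_union attribute is left out, and the
field order is chosen per host byte order so that the struct overlays
the native type.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the structure form of Int128. */
typedef struct {
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    int64_t hi;
    uint64_t lo;
#else
    uint64_t lo;
    int64_t hi;
#endif
} SketchInt128;

/* Union used to convert between the struct and the native types. */
typedef union {
    SketchInt128 s;
    __int128_t i;
    __uint128_t u;
} SketchInt128Alias;

static SketchInt128 sketch_make128(uint64_t lo, int64_t hi)
{
    SketchInt128 r;

    r.lo = lo;
    r.hi = hi;
    return r;
}

/* Unsigned division done on the native type, passed as structs. */
static SketchInt128 sketch_divu(SketchInt128 a_s, SketchInt128 b_s)
{
    SketchInt128Alias r, a, b;

    a.s = a_s;
    b.s = b_s;
    r.u = a.u / b.u;
    return r.s;
}
```

The struct is what crosses the call boundary, so a foreign-function
layer only ever sees two 64-bit fields, while the arithmetic still uses
the compiler's 128-bit support.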

[PULL 36/40] target/s390x: Implement CC_OP_NZ in gen_op_calc_cc

2023-02-04 Thread Richard Henderson

This case is trivial to implement inline.

Reviewed-by: David Hildenbrand 
Signed-off-by: Richard Henderson 
---
 target/s390x/tcg/translate.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/target/s390x/tcg/translate.c b/target/s390x/tcg/translate.c
index 9ea28b3e52..ac5bd98f04 100644
--- a/target/s390x/tcg/translate.c
+++ b/target/s390x/tcg/translate.c
@@ -625,6 +625,9 @@ static void gen_op_calc_cc(DisasContext *s)
 /* env->cc_op already is the cc value */
 break;
 case CC_OP_NZ:
+tcg_gen_setcondi_i64(TCG_COND_NE, cc_dst, cc_dst, 0);
+tcg_gen_extrl_i64_i32(cc_op, cc_dst);
+break;
 case CC_OP_ABS_64:
 case CC_OP_NABS_64:
 case CC_OP_ABS_32:
-- 
2.34.1
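
In plain C terms, the two TCG ops added above compute a 0/1 condition
code from cc_dst. The sketch below is only a model of that arithmetic
under that reading; sketch_cc_nz() is a hypothetical name and nothing
here is TCG code.

```c
#include <assert.h>
#include <stdint.h>

/* Model of CC_OP_NZ: cc is 1 if the 64-bit result is non-zero, else 0. */
static uint32_t sketch_cc_nz(uint64_t cc_dst)
{
    /* Corresponds to setcond NE against 0 on the 64-bit value. */
    uint64_t cond = (cc_dst != 0);

    /* Corresponds to taking the low 32 bits as the cc value. */
    return (uint32_t)cond;
}
```

Note that the comparison has to happen on the full 64-bit value before
truncation; a value such as 0x100000000 is non-zero even though its low
32 bits are all zero.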

[PULL 33/40] target/s390x: Use Int128 for returning float128

2023-02-04 Thread Richard Henderson

Acked-by: David Hildenbrand 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
v2: Remove extraneous return_low128.
---
 target/s390x/helper.h| 22 +++---
 target/s390x/tcg/insn-data.h.inc | 20 ++---
 target/s390x/tcg/fpu_helper.c| 29 +-
 target/s390x/tcg/translate.c | 51 +---
 4 files changed, 63 insertions(+), 59 deletions(-)

diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index b4170a4256..d40aeb471f 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -31,32 +31,32 @@ DEF_HELPER_4(clcle, i32, env, i32, i64, i32)
 DEF_HELPER_4(clclu, i32, env, i32, i64, i32)
 DEF_HELPER_3(cegb, i64, env, s64, i32)
 DEF_HELPER_3(cdgb, i64, env, s64, i32)
-DEF_HELPER_3(cxgb, i64, env, s64, i32)
+DEF_HELPER_3(cxgb, i128, env, s64, i32)
 DEF_HELPER_3(celgb, i64, env, i64, i32)
 DEF_HELPER_3(cdlgb, i64, env, i64, i32)
-DEF_HELPER_3(cxlgb, i64, env, i64, i32)
+DEF_HELPER_3(cxlgb, i128, env, i64, i32)
 DEF_HELPER_4(cdsg, void, env, i64, i32, i32)
 DEF_HELPER_4(cdsg_parallel, void, env, i64, i32, i32)
 DEF_HELPER_4(csst, i32, env, i32, i64, i64)
 DEF_HELPER_4(csst_parallel, i32, env, i32, i64, i64)
 DEF_HELPER_FLAGS_3(aeb, TCG_CALL_NO_WG, i64, env, i64, i64)
 DEF_HELPER_FLAGS_3(adb, TCG_CALL_NO_WG, i64, env, i64, i64)
-DEF_HELPER_FLAGS_5(axb, TCG_CALL_NO_WG, i64, env, i64, i64, i64, i64)
+DEF_HELPER_FLAGS_5(axb, TCG_CALL_NO_WG, i128, env, i64, i64, i64, i64)
 DEF_HELPER_FLAGS_3(seb, TCG_CALL_NO_WG, i64, env, i64, i64)
 DEF_HELPER_FLAGS_3(sdb, TCG_CALL_NO_WG, i64, env, i64, i64)
-DEF_HELPER_FLAGS_5(sxb, TCG_CALL_NO_WG, i64, env, i64, i64, i64, i64)
+DEF_HELPER_FLAGS_5(sxb, TCG_CALL_NO_WG, i128, env, i64, i64, i64, i64)
 DEF_HELPER_FLAGS_3(deb, TCG_CALL_NO_WG, i64, env, i64, i64)
 DEF_HELPER_FLAGS_3(ddb, TCG_CALL_NO_WG, i64, env, i64, i64)
-DEF_HELPER_FLAGS_5(dxb, TCG_CALL_NO_WG, i64, env, i64, i64, i64, i64)
+DEF_HELPER_FLAGS_5(dxb, TCG_CALL_NO_WG, i128, env, i64, i64, i64, i64)
 DEF_HELPER_FLAGS_3(meeb, TCG_CALL_NO_WG, i64, env, i64, i64)
 DEF_HELPER_FLAGS_3(mdeb, TCG_CALL_NO_WG, i64, env, i64, i64)
 DEF_HELPER_FLAGS_3(mdb, TCG_CALL_NO_WG, i64, env, i64, i64)
-DEF_HELPER_FLAGS_5(mxb, TCG_CALL_NO_WG, i64, env, i64, i64, i64, i64)
-DEF_HELPER_FLAGS_4(mxdb, TCG_CALL_NO_WG, i64, env, i64, i64, i64)
+DEF_HELPER_FLAGS_5(mxb, TCG_CALL_NO_WG, i128, env, i64, i64, i64, i64)
+DEF_HELPER_FLAGS_4(mxdb, TCG_CALL_NO_WG, i128, env, i64, i64, i64)
 DEF_HELPER_FLAGS_2(ldeb, TCG_CALL_NO_WG, i64, env, i64)
 DEF_HELPER_FLAGS_4(ldxb, TCG_CALL_NO_WG, i64, env, i64, i64, i32)
-DEF_HELPER_FLAGS_2(lxdb, TCG_CALL_NO_WG, i64, env, i64)
-DEF_HELPER_FLAGS_2(lxeb, TCG_CALL_NO_WG, i64, env, i64)
+DEF_HELPER_FLAGS_2(lxdb, TCG_CALL_NO_WG, i128, env, i64)
+DEF_HELPER_FLAGS_2(lxeb, TCG_CALL_NO_WG, i128, env, i64)
 DEF_HELPER_FLAGS_3(ledb, TCG_CALL_NO_WG, i64, env, i64, i32)
 DEF_HELPER_FLAGS_4(lexb, TCG_CALL_NO_WG, i64, env, i64, i64, i32)
 DEF_HELPER_FLAGS_3(ceb, TCG_CALL_NO_WG_SE, i32, env, i64, i64)
@@ -79,7 +79,7 @@ DEF_HELPER_3(clfdb, i64, env, i64, i32)
 DEF_HELPER_4(clfxb, i64, env, i64, i64, i32)
 DEF_HELPER_FLAGS_3(fieb, TCG_CALL_NO_WG, i64, env, i64, i32)
 DEF_HELPER_FLAGS_3(fidb, TCG_CALL_NO_WG, i64, env, i64, i32)
-DEF_HELPER_FLAGS_4(fixb, TCG_CALL_NO_WG, i64, env, i64, i64, i32)
+DEF_HELPER_FLAGS_4(fixb, TCG_CALL_NO_WG, i128, env, i64, i64, i32)
 DEF_HELPER_FLAGS_4(maeb, TCG_CALL_NO_WG, i64, env, i64, i64, i64)
 DEF_HELPER_FLAGS_4(madb, TCG_CALL_NO_WG, i64, env, i64, i64, i64)
 DEF_HELPER_FLAGS_4(mseb, TCG_CALL_NO_WG, i64, env, i64, i64, i64)
@@ -89,7 +89,7 @@ DEF_HELPER_FLAGS_3(tcdb, TCG_CALL_NO_RWG_SE, i32, env, i64, i64)
 DEF_HELPER_FLAGS_4(tcxb, TCG_CALL_NO_RWG_SE, i32, env, i64, i64, i64)
 DEF_HELPER_FLAGS_2(sqeb, TCG_CALL_NO_WG, i64, env, i64)
 DEF_HELPER_FLAGS_2(sqdb, TCG_CALL_NO_WG, i64, env, i64)
-DEF_HELPER_FLAGS_3(sqxb, TCG_CALL_NO_WG, i64, env, i64, i64)
+DEF_HELPER_FLAGS_3(sqxb, TCG_CALL_NO_WG, i128, env, i64, i64)
 DEF_HELPER_FLAGS_1(cvd, TCG_CALL_NO_RWG_SE, i64, s32)
 DEF_HELPER_FLAGS_4(pack, TCG_CALL_NO_WG, void, env, i32, i64, i64)
 DEF_HELPER_FLAGS_4(pka, TCG_CALL_NO_WG, void, env, i64, i64, i32)
diff --git a/target/s390x/tcg/insn-data.h.inc b/target/s390x/tcg/insn-data.h.inc
index d0814cb218..517a4500ae 100644
--- a/target/s390x/tcg/insn-data.h.inc
+++ b/target/s390x/tcg/insn-data.h.inc
@@ -306,10 +306,10 @@
 /* CONVERT FROM FIXED */
 F(0xb394, CEFBR,   RRF_e, Z,   0, r2_32s, new, e1, cegb, 0, IF_BFP)
 F(0xb395, CDFBR,   RRF_e, Z,   0, r2_32s, new, f1, cdgb, 0, IF_BFP)
-F(0xb396, CXFBR,   RRF_e, Z,   0, r2_32s, new_P, x1, cxgb, 0, IF_BFP)
+F(0xb396, CXFBR,   RRF_e, Z,   0, r2_32s, new_x, x1, cxgb, 0, IF_BFP)
 F(0xb3a4, CEGBR,   RRF_e, Z,   0, r2_o, new, e1, cegb, 0, IF_BFP)
 F(0xb3a5, CDGBR,   RRF_e, Z,   0, r2_o, new, f1, cdgb, 0, IF_BFP)
-F(0xb3a6, CXGBR,   RRF_e, Z,   0, r2_o, new_P, x1, cxgb, 0, IF_BFP)
+F(0xb3a6, CXGBR,   RRF_e, Z,   0, r2_o, new_x, x1, cxgb, 0, IF_B

[PULL 01/40] accel/tcg: Test CPUJumpCache in tb_jmp_cache_clear_page

2023-02-04 Thread Richard Henderson

From: Eric Auger 

After commit 4e4fa6c12d ("accel/tcg: Complete cpu initialization
before registration"), it looks like the CPUJumpCache pointer can be NULL.
This causes a SIGSEGV when running the debug-wp-migration kvm unit test.

In the first place, it should be clarified why this TCG code is called
with KVM acceleration. This may hide another bug.

Fixes: 4e4fa6c12d ("accel/tcg: Complete cpu initialization before registration")
Signed-off-by: Eric Auger 
Message-Id: <20230203171510.2867451-1-eric.au...@redhat.com>
Signed-off-by: Richard Henderson 
---
 accel/tcg/cputlb.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index 4e040a1cb9..04e270742e 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -100,9 +100,14 @@ static void tlb_window_reset(CPUTLBDesc *desc, int64_t ns,
 
 static void tb_jmp_cache_clear_page(CPUState *cpu, target_ulong page_addr)
 {
-int i, i0 = tb_jmp_cache_hash_page(page_addr);
 CPUJumpCache *jc = cpu->tb_jmp_cache;
+int i, i0;
 
+if (unlikely(!jc)) {
+return;
+}
+
+i0 = tb_jmp_cache_hash_page(page_addr);
 for (i = 0; i < TB_JMP_PAGE_SIZE; i++) {
 qatomic_set(&jc->array[i0 + i].tb, NULL);
 }
-- 
2.34.1

[PULL 30/40] target/s390x: Use Int128 for return from CKSM

2023-02-04 Thread Richard Henderson

Acked-by: Ilya Leoshkevich 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 target/s390x/helper.h | 2 +-
 target/s390x/tcg/mem_helper.c | 7 +++
 target/s390x/tcg/translate.c  | 6 --
 3 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index 25c2dd0b3c..03b29efa3e 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -103,7 +103,7 @@ DEF_HELPER_4(tre, i64, env, i64, i64, i64)
 DEF_HELPER_4(trt, i32, env, i32, i64, i64)
 DEF_HELPER_4(trtr, i32, env, i32, i64, i64)
 DEF_HELPER_5(trXX, i32, env, i32, i32, i32, i32)
-DEF_HELPER_4(cksm, i64, env, i64, i64, i64)
+DEF_HELPER_4(cksm, i128, env, i64, i64, i64)
 DEF_HELPER_FLAGS_5(calc_cc, TCG_CALL_NO_RWG_SE, i32, env, i32, i64, i64, i64)
 DEF_HELPER_FLAGS_2(sfpc, TCG_CALL_NO_WG, void, env, i64)
 DEF_HELPER_FLAGS_2(sfas, TCG_CALL_NO_WG, void, env, i64)
diff --git a/target/s390x/tcg/mem_helper.c b/target/s390x/tcg/mem_helper.c
index 9be42851d8..b0b403e23a 100644
--- a/target/s390x/tcg/mem_helper.c
+++ b/target/s390x/tcg/mem_helper.c
@@ -1350,8 +1350,8 @@ uint32_t HELPER(clclu)(CPUS390XState *env, uint32_t r1, uint64_t a2,
 }
 
 /* checksum */
-uint64_t HELPER(cksm)(CPUS390XState *env, uint64_t r1,
-  uint64_t src, uint64_t src_len)
+Int128 HELPER(cksm)(CPUS390XState *env, uint64_t r1,
+uint64_t src, uint64_t src_len)
 {
 uintptr_t ra = GETPC();
 uint64_t max_len, len;
@@ -1392,8 +1392,7 @@ uint64_t HELPER(cksm)(CPUS390XState *env, uint64_t r1,
 env->cc_op = (len == src_len ? 0 : 3);
 
 /* Return both cksm and processed length.  */
-env->retxl = cksm;
-return len;
+return int128_make128(cksm, len);
 }
 
 void HELPER(pack)(CPUS390XState *env, uint32_t len, uint64_t dest, uint64_t src)
diff --git a/target/s390x/tcg/translate.c b/target/s390x/tcg/translate.c
index 8397fe2bd8..1a7aa9e4ae 100644
--- a/target/s390x/tcg/translate.c
+++ b/target/s390x/tcg/translate.c
@@ -2041,11 +2041,13 @@ static DisasJumpType op_cxlgb(DisasContext *s, DisasOps *o)
 static DisasJumpType op_cksm(DisasContext *s, DisasOps *o)
 {
 int r2 = get_field(s, r2);
+TCGv_i128 pair = tcg_temp_new_i128();
 TCGv_i64 len = tcg_temp_new_i64();
 
-gen_helper_cksm(len, cpu_env, o->in1, o->in2, regs[r2 + 1]);
+gen_helper_cksm(pair, cpu_env, o->in1, o->in2, regs[r2 + 1]);
 set_cc_static(s);
-return_low128(o->out);
+tcg_gen_extr_i128_i64(o->out, len, pair);
+tcg_temp_free_i128(pair);
 
 tcg_gen_add_i64(regs[r2], regs[r2], len);
 tcg_gen_sub_i64(regs[r2 + 1], regs[r2 + 1], len);
-- 
2.34.1
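
The change above replaces a side channel (env->retxl) with a single
128-bit return value carrying both results. A rough model of that
pack/unpack step, using a two-field struct instead of a real 128-bit
type, could look like this. SketchPair and both helpers are
hypothetical; the low/high assignment follows the argument order of
int128_make128(cksm, len) and tcg_gen_extr_i128_i64(o->out, len, pair)
in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a 128-bit pair: low half and high half. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} SketchPair;

/* Helper side: checksum goes in the low half, length in the high half. */
static SketchPair sketch_cksm_result(uint64_t cksm, uint64_t len)
{
    SketchPair pair;

    pair.lo = cksm;
    pair.hi = len;
    return pair;
}

/* Translator side: split the pair back into two 64-bit values. */
static void sketch_extr(SketchPair pair, uint64_t *lo, uint64_t *hi)
{
    *lo = pair.lo;
    *hi = pair.hi;
}
```

Returning both values together means the second result no longer has to
be stashed in CPU state between the helper and the generated code.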

[PULL 00/40] tcg patch queue

2023-02-04 Thread Richard Henderson

The following changes since commit 579510e196a544b42bd8bca9cc61688d4d1211ac:

  Merge tag 'pull-monitor-2023-02-03-v2' of https://repo.or.cz/qemu/armbru into 
staging (2023-02-04 10:19:55 +)

are available in the Git repository at:

  https://gitlab.com/rth7680/qemu.git tags/pull-tcg-20230204

for you to fetch changes up to a2495ede07498ee36b18b03e7038ba30c9871bb2:

  tcg/aarch64: Fix patching of LDR in tb_target_set_jmp_target (2023-02-04 
06:19:43 -1000)


tcg: Add support for TCGv_i128 in parameters and returns.
tcg: Add support for TCGv_i128 in cmpxchg.
tcg: Test CPUJumpCache in tb_jmp_cache_clear_page
tcg: Split out tcg_gen_nonatomic_cmpxchg_i{32,64}
tcg/aarch64: Fix patching of LDR in tb_target_set_jmp_target
target/arm: Use tcg_gen_atomic_cmpxchg_i128
target/i386: Use tcg_gen_atomic_cmpxchg_i128
target/i386: Use tcg_gen_nonatomic_cmpxchg_i{32,64}
target/s390x: Use tcg_gen_atomic_cmpxchg_i128
target/s390x: Use TCGv_i128 in passing and returning float128
target/s390x: Implement CC_OP_NZ in gen_op_calc_cc


Eric Auger (1):
  accel/tcg: Test CPUJumpCache in tb_jmp_cache_clear_page

Ilya Leoshkevich (3):
  tests/tcg/s390x: Add div.c
  tests/tcg/s390x: Add clst.c
  tests/tcg/s390x: Add cdsg.c

Richard Henderson (36):
  tcg: Init temp_subindex in liveness_pass_2
  tcg: Define TCG_TYPE_I128 and related helper macros
  tcg: Handle dh_typecode_i128 with TCG_CALL_{RET,ARG}_NORMAL
  tcg: Allocate objects contiguously in temp_allocate_frame
  tcg: Introduce tcg_out_addi_ptr
  tcg: Add TCG_CALL_{RET,ARG}_BY_REF
  tcg: Introduce tcg_target_call_oarg_reg
  tcg: Add TCG_CALL_RET_BY_VEC
  include/qemu/int128: Use Int128 structure for TCI
  tcg/i386: Add TCG_TARGET_CALL_{RET,ARG}_I128
  tcg/tci: Fix big-endian return register ordering
  tcg/tci: Add TCG_TARGET_CALL_{RET,ARG}_I128
  tcg: Add TCG_TARGET_CALL_{RET,ARG}_I128
  tcg: Add temp allocation for TCGv_i128
  tcg: Add basic data movement for TCGv_i128
  tcg: Add guest load/store primitives for TCGv_i128
  tcg: Add tcg_gen_{non}atomic_cmpxchg_i128
  tcg: Split out tcg_gen_nonatomic_cmpxchg_i{32,64}
  target/arm: Use tcg_gen_atomic_cmpxchg_i128 for STXP
  target/arm: Use tcg_gen_atomic_cmpxchg_i128 for CASP
  target/ppc: Use tcg_gen_atomic_cmpxchg_i128 for STQCX
  tests/tcg/s390x: Add long-double.c
  target/s390x: Use a single return for helper_divs32/u32
  target/s390x: Use a single return for helper_divs64/u64
  target/s390x: Use Int128 for return from CLST
  target/s390x: Use Int128 for return from CKSM
  target/s390x: Use Int128 for return from TRE
  target/s390x: Copy wout_x1 to wout_x1_P
  target/s390x: Use Int128 for returning float128
  target/s390x: Use Int128 for passing float128
  target/s390x: Use tcg_gen_atomic_cmpxchg_i128 for CDSG
  target/s390x: Implement CC_OP_NZ in gen_op_calc_cc
  target/i386: Split out gen_cmpxchg8b, gen_cmpxchg16b
  target/i386: Inline cmpxchg8b
  target/i386: Inline cmpxchg16b
  tcg/aarch64: Fix patching of LDR in tb_target_set_jmp_target

 accel/tcg/tcg-runtime.h          |  11 ++
 include/exec/cpu_ldst.h          |  10 +
 include/exec/helper-head.h       |   7 +
 include/qemu/atomic128.h         |  29 ++-
 include/qemu/int128.h            |  25 ++-
 include/tcg/tcg-op.h             |  15 ++
 include/tcg/tcg.h                |  49 -
 target/arm/helper-a64.h          |   8 -
 target/i386/helper.h             |   6 -
 target/ppc/helper.h              |   2 -
 target/s390x/helper.h            |  54 +++---
 tcg/aarch64/tcg-target.h         |   2 +
 tcg/arm/tcg-target.h             |   2 +
 tcg/i386/tcg-target.h            |  10 +
 tcg/loongarch64/tcg-target.h     |   2 +
 tcg/mips/tcg-target.h            |   2 +
 tcg/riscv/tcg-target.h           |   3 +
 tcg/s390x/tcg-target.h           |   2 +
 tcg/sparc64/tcg-target.h         |   2 +
 tcg/tcg-internal.h               |  17 ++
 tcg/tci/tcg-target.h             |   3 +
 target/s390x/tcg/insn-data.h.inc |  60 +++---
 accel/tcg/cputlb.c               | 119 +++-
 accel/tcg/user-exec.c            |  66 +++
 target/arm/helper-a64.c          | 147 ---
 target/arm/translate-a64.c       | 121 ++--
 target/i386/tcg/mem_helper.c     | 126 -
 target/i386/tcg/translate.c      | 126 +++--
 target/ppc/mem_helper.c          |  44 -
 target/ppc/translate.c           | 102 +-
 target/s390x/tcg/fpu_helper.c    | 103 +-
 target/s390x/tcg/int_helper.c    |  64 +++
 target/s390x/tcg/mem_helper.c    |  77 +---
 target/s390x/tcg/translate.c     | 212 ++---
 tcg/tcg-op.c                     | 393 +--
 tcg/tcg.c                        | 308 ++

[PULL 02/40] tcg: Init temp_subindex in liveness_pass_2

2023-02-04 Thread Richard Henderson

Correctly handle large types while lowering.

Fixes: fac87bd2a49b ("tcg: Add temp_subindex to TCGTemp")
Signed-off-by: Richard Henderson 
---
 tcg/tcg.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index fd557d55d3..bc60fd0fe8 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -3063,6 +3063,7 @@ static bool liveness_pass_2(TCGContext *s)
 TCGTemp *dts = tcg_temp_alloc(s);
 dts->type = its->type;
 dts->base_type = its->base_type;
+dts->temp_subindex = its->temp_subindex;
 dts->kind = TEMP_EBB;
 its->state_ptr = dts;
 } else {
-- 
2.34.1

[PULL 13/40] tcg/tci: Add TCG_TARGET_CALL_{RET,ARG}_I128

2023-02-04 Thread Richard Henderson

Fill in the parameters for libffi for Int128.
Adjust the interpreter to allow for 16-byte return values.
Adjust tcg_out_call to record the return value length.

Call parameters are no longer all the same size, so we
cannot reuse the same call_slots array for every function.
Compute it each time now, but only fill in slots required
for the call we're about to make.

Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 tcg/tci/tcg-target.h     |  3 +++
 tcg/tcg.c                | 19 +
 tcg/tci.c                | 44
 tcg/tci/tcg-target.c.inc | 10 -
 4 files changed, 49 insertions(+), 27 deletions(-)

diff --git a/tcg/tci/tcg-target.h b/tcg/tci/tcg-target.h
index 1414ab4d5b..7140a76a73 100644
--- a/tcg/tci/tcg-target.h
+++ b/tcg/tci/tcg-target.h
@@ -160,10 +160,13 @@ typedef enum {
 #if TCG_TARGET_REG_BITS == 32
 # define TCG_TARGET_CALL_ARG_I32    TCG_CALL_ARG_EVEN
 # define TCG_TARGET_CALL_ARG_I64    TCG_CALL_ARG_EVEN
+# define TCG_TARGET_CALL_ARG_I128   TCG_CALL_ARG_EVEN
 #else
 # define TCG_TARGET_CALL_ARG_I32    TCG_CALL_ARG_NORMAL
 # define TCG_TARGET_CALL_ARG_I64    TCG_CALL_ARG_NORMAL
+# define TCG_TARGET_CALL_ARG_I128   TCG_CALL_ARG_NORMAL
 #endif
+#define TCG_TARGET_CALL_RET_I128    TCG_CALL_RET_NORMAL
 
 #define HAVE_TCG_QEMU_TB_EXEC
 #define TCG_TARGET_NEED_POOL_LABELS
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 098be83b00..865ed5ea0f 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -570,6 +570,22 @@ static GHashTable *helper_table;
 #ifdef CONFIG_TCG_INTERPRETER
 static ffi_type *typecode_to_ffi(int argmask)
 {
+/*
+ * libffi does not support __int128_t, so we have forced Int128
+ * to use the structure definition instead of the builtin type.
+ */
+static ffi_type *ffi_type_i128_elements[3] = {
+&ffi_type_uint64,
+&ffi_type_uint64,
+NULL
+};
+static ffi_type ffi_type_i128 = {
+.size = 16,
+.alignment = __alignof__(Int128),
+.type = FFI_TYPE_STRUCT,
+.elements = ffi_type_i128_elements,
+};
+
 switch (argmask) {
 case dh_typecode_void:
 return &ffi_type_void;
@@ -583,6 +599,8 @@ static ffi_type *typecode_to_ffi(int argmask)
 return &ffi_type_sint64;
 case dh_typecode_ptr:
 return &ffi_type_pointer;
+case dh_typecode_i128:
+return &ffi_type_i128;
 }
 g_assert_not_reached();
 }
@@ -613,6 +631,7 @@ static void init_ffi_layouts(void)
 /* Ignoring the return type, find the last non-zero field. */
 nargs = 32 - clz32(typemask >> 3);
 nargs = DIV_ROUND_UP(nargs, 3);
+assert(nargs <= MAX_CALL_IARGS);
 
 ca = g_malloc0(sizeof(*ca) + nargs * sizeof(ffi_type *));
 ca->cif.rtype = typecode_to_ffi(typemask & 7);
diff --git a/tcg/tci.c b/tcg/tci.c
index eeccdde8bc..022fe9d0f8 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -470,12 +470,9 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 tcg_target_ulong regs[TCG_TARGET_NB_REGS];
 uint64_t stack[(TCG_STATIC_CALL_ARGS_SIZE + TCG_STATIC_FRAME_SIZE)
/ sizeof(uint64_t)];
-void *call_slots[TCG_STATIC_CALL_ARGS_SIZE / sizeof(uint64_t)];
 
 regs[TCG_AREG0] = (tcg_target_ulong)env;
 regs[TCG_REG_CALL_STACK] = (uintptr_t)stack;
-/* Other call_slots entries initialized at first use (see below). */
-call_slots[0] = NULL;
 tci_assert(tb_ptr);
 
 for (;;) {
@@ -498,26 +495,26 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 
 switch (opc) {
 case INDEX_op_call:
-/*
- * Set up the ffi_avalue array once, delayed until now
- * because many TB's do not make any calls. In tcg_gen_callN,
- * we arranged for every real argument to be "left-aligned"
- * in each 64-bit slot.
- */
-if (unlikely(call_slots[0] == NULL)) {
-for (int i = 0; i < ARRAY_SIZE(call_slots); ++i) {
-call_slots[i] = &stack[i];
-}
-}
-
-tci_args_nl(insn, tb_ptr, &len, &ptr);
-
-/* Helper functions may need to access the "return address" */
-tci_tb_ptr = (uintptr_t)tb_ptr;
-
 {
-void **pptr = ptr;
-ffi_call(pptr[1], pptr[0], stack, call_slots);
+void *call_slots[MAX_CALL_IARGS];
+ffi_cif *cif;
+void *func;
+unsigned i, s, n;
+
+tci_args_nl(insn, tb_ptr, &len, &ptr);
+func = ((void **)ptr)[0];
+cif = ((void **)ptr)[1];
+
+n = cif->nargs;
+for (i = s = 0; i < n; ++i) {
+ffi_type *t = cif->arg_types[i];
+call_slots[i] = &stack[s];
+s += DIV_ROUND_UP(t->size, 8);
+}
+
+/* Helper func
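
The per-call slot computation in the tcg/tci.c hunk above can be sketched
in isolation. This is a minimal standalone model, not QEMU code: the names
div_round_up and assign_call_slots are hypothetical, and argument sizes are
passed in directly instead of being read from an ffi_cif. It only shows why
a 16-byte argument shifts every later argument by one extra 64-bit slot.

```c
#include <assert.h>
#include <stddef.h>

/* Round n up to a whole number of d-sized units (same idea as DIV_ROUND_UP). */
static size_t div_round_up(size_t n, size_t d)
{
    return (n + d - 1) / d;
}

/*
 * Hypothetical helper, not part of QEMU: given the byte size of each
 * argument, record the index of the first 64-bit stack slot that each
 * argument occupies.  Returns the total number of slots consumed.
 */
static size_t assign_call_slots(const size_t *arg_sizes, size_t nargs,
                                size_t *first_slot)
{
    size_t s = 0;

    for (size_t i = 0; i < nargs; ++i) {
        assert(arg_sizes[i] > 0);
        first_slot[i] = s;
        /* An 8-byte argument takes one slot, a 16-byte argument two. */
        s += div_round_up(arg_sizes[i], 8);
    }
    return s;
}
```

With argument sizes of 8, 16 and 4 bytes, the arguments start at slots 0, 1
and 3, which is why a single precomputed slot array no longer fits every call.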

[PULL 31/40] target/s390x: Use Int128 for return from TRE

2023-02-04 Thread Richard Henderson

Acked-by: Ilya Leoshkevich 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 target/s390x/helper.h | 2 +-
 target/s390x/tcg/mem_helper.c | 7 +++
 target/s390x/tcg/translate.c  | 7 +--
 3 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index 03b29efa3e..b4170a4256 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -99,7 +99,7 @@ DEF_HELPER_FLAGS_4(unpka, TCG_CALL_NO_WG, i32, env, i64, i32, i64)
 DEF_HELPER_FLAGS_4(unpku, TCG_CALL_NO_WG, i32, env, i64, i32, i64)
 DEF_HELPER_FLAGS_3(tp, TCG_CALL_NO_WG, i32, env, i64, i32)
 DEF_HELPER_FLAGS_4(tr, TCG_CALL_NO_WG, void, env, i32, i64, i64)
-DEF_HELPER_4(tre, i64, env, i64, i64, i64)
+DEF_HELPER_4(tre, i128, env, i64, i64, i64)
 DEF_HELPER_4(trt, i32, env, i32, i64, i64)
 DEF_HELPER_4(trtr, i32, env, i32, i64, i64)
 DEF_HELPER_5(trXX, i32, env, i32, i32, i32, i32)
diff --git a/target/s390x/tcg/mem_helper.c b/target/s390x/tcg/mem_helper.c
index b0b403e23a..49969abda7 100644
--- a/target/s390x/tcg/mem_helper.c
+++ b/target/s390x/tcg/mem_helper.c
@@ -1632,8 +1632,8 @@ void HELPER(tr)(CPUS390XState *env, uint32_t len, uint64_t array,
 do_helper_tr(env, len, array, trans, GETPC());
 }
 
-uint64_t HELPER(tre)(CPUS390XState *env, uint64_t array,
- uint64_t len, uint64_t trans)
+Int128 HELPER(tre)(CPUS390XState *env, uint64_t array,
+   uint64_t len, uint64_t trans)
 {
 uintptr_t ra = GETPC();
 uint8_t end = env->regs[0] & 0xff;
@@ -1668,8 +1668,7 @@ uint64_t HELPER(tre)(CPUS390XState *env, uint64_t array,
 }
 
 env->cc_op = cc;
-env->retxl = len - i;
-return array + i;
+return int128_make128(len - i, array + i);
 }
 
 static inline uint32_t do_helper_trt(CPUS390XState *env, int len,
diff --git a/target/s390x/tcg/translate.c b/target/s390x/tcg/translate.c
index 1a7aa9e4ae..f3e4b70ed9 100644
--- a/target/s390x/tcg/translate.c
+++ b/target/s390x/tcg/translate.c
@@ -4905,8 +4905,11 @@ static DisasJumpType op_tr(DisasContext *s, DisasOps *o)
 
 static DisasJumpType op_tre(DisasContext *s, DisasOps *o)
 {
-gen_helper_tre(o->out, cpu_env, o->out, o->out2, o->in2);
-return_low128(o->out2);
+TCGv_i128 pair = tcg_temp_new_i128();
+
+gen_helper_tre(pair, cpu_env, o->out, o->out2, o->in2);
+tcg_gen_extr_i128_i64(o->out2, o->out, pair);
+tcg_temp_free_i128(pair);
 set_cc_static(s);
 return DISAS_NEXT;
 }
-- 
2.34.1
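
For readers unfamiliar with the pattern, the change above replaces a
side-channel second result (env->retxl) with a single two-part return value.
A minimal sketch of that idea follows, assuming a plain struct as a stand-in
for Int128; Pair128, pair_make and scan_until are hypothetical names, and the
loop is a simplified scan, not the real TRE translation logic.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for a 128-bit value built from two 64-bit halves. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} Pair128;

static Pair128 pair_make(uint64_t lo, uint64_t hi)
{
    Pair128 p = { lo, hi };
    return p;
}

/*
 * Hypothetical scan: walk up to 'len' bytes starting at index 'array'
 * and stop at the first byte equal to 'end'.  The remaining length goes
 * in the low half and the updated address in the high half, so both
 * results travel in one return value.
 */
static Pair128 scan_until(const uint8_t *mem, uint64_t array, uint64_t len,
                          uint8_t end)
{
    uint64_t i;

    assert(mem);
    for (i = 0; i < len; i++) {
        if (mem[array + i] == end) {
            break;
        }
    }
    return pair_make(len - i, array + i);
}
```

The caller then splits the pair back into its two halves, which mirrors what
tcg_gen_extr_i128_i64(o->out2, o->out, pair) does in the translator.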

[PULL 40/40] tcg/aarch64: Fix patching of LDR in tb_target_set_jmp_target

2023-02-04 Thread Richard Henderson

'offset' should be bits [23:5] of LDR instruction, rather than [4:0].

Fixes: d59d83a1c388 ("tcg/aarch64: Reorg goto_tb implementation")
Reviewed-by: Zenghui Yu 
Reported-by: Zenghui Yu 
Signed-off-by: Richard Henderson 
---
 tcg/aarch64/tcg-target.c.inc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tcg/aarch64/tcg-target.c.inc b/tcg/aarch64/tcg-target.c.inc
index fde3b30ad1..a091326f84 100644
--- a/tcg/aarch64/tcg-target.c.inc
+++ b/tcg/aarch64/tcg-target.c.inc
@@ -1914,7 +1914,7 @@ void tb_target_set_jmp_target(const TranslationBlock *tb, int n,
 ptrdiff_t i_offset = i_addr - jmp_rx;
 
 /* Note that we asserted this in range in tcg_out_goto_tb. */
-insn = deposit32(I3305_LDR | TCG_REG_TMP, 0, 5, i_offset >> 2);
+insn = deposit32(I3305_LDR | TCG_REG_TMP, 5, 19, i_offset >> 2);
 }
 qatomic_set((uint32_t *)jmp_rw, insn);
 flush_idcache_range(jmp_rx, jmp_rw, 4);
-- 
2.34.1
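
The effect of the one-line fix can be checked with a small standalone
sketch. Assumptions: deposit32_sketch is an illustrative re-implementation
of a bitfield deposit (not QEMU's own helper), 0x58000000 is the A64 base
encoding of the 64-bit LDR (literal) instruction, and register number 30
stands in for TCG_REG_TMP (X30 per the backend source quoted in this series).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative bitfield deposit: replace 'length' bits of 'value',
 * starting at bit 'start', with the low bits of 'field'.
 */
static uint32_t deposit32_sketch(uint32_t value, int start, int length,
                                 uint32_t field)
{
    assert(start >= 0 && length > 0 && length <= 32 - start);
    uint32_t mask = (~0u >> (32 - length)) << start;
    return (value & ~mask) | ((field << start) & mask);
}

/* A64 "LDR Xt, <label>": imm19 lives in bits [23:5], Rt in bits [4:0]. */
#define LDR_LITERAL_X 0x58000000u

static uint32_t encode_ldr_literal(unsigned rt, int32_t byte_offset)
{
    assert(rt < 32);
    /* The immediate is a word offset, hence the shift right by 2. */
    return deposit32_sketch(LDR_LITERAL_X | rt, 5, 19,
                            (uint32_t)byte_offset >> 2);
}
```

Depositing at (0, 5) instead overwrites the Rt field and leaves imm19 at
zero, which is the behaviour the patch removes.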

[PULL 03/40] tcg: Define TCG_TYPE_I128 and related helper macros

2023-02-04 Thread Richard Henderson

Begin staging in support for TCGv_i128 with Int128.
Define the type enumerator, the typedef, and the
helper-head.h macros.

This cannot yet be used, because you can't allocate
temporaries of this new type.

Reviewed-by: Alex Bennée 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 include/exec/helper-head.h |  7 +++
 include/tcg/tcg.h          | 17 ++---
 2 files changed, 17 insertions(+), 7 deletions(-)

diff --git a/include/exec/helper-head.h b/include/exec/helper-head.h
index bc6698b19f..b8d1140dc7 100644
--- a/include/exec/helper-head.h
+++ b/include/exec/helper-head.h
@@ -26,6 +26,7 @@
 #define dh_alias_int i32
 #define dh_alias_i64 i64
 #define dh_alias_s64 i64
+#define dh_alias_i128 i128
 #define dh_alias_f16 i32
 #define dh_alias_f32 i32
 #define dh_alias_f64 i64
@@ -40,6 +41,7 @@
 #define dh_ctype_int int
 #define dh_ctype_i64 uint64_t
 #define dh_ctype_s64 int64_t
+#define dh_ctype_i128 Int128
 #define dh_ctype_f16 uint32_t
 #define dh_ctype_f32 float32
 #define dh_ctype_f64 float64
@@ -71,6 +73,7 @@
 #define dh_retvar_decl0_noreturn void
 #define dh_retvar_decl0_i32 TCGv_i32 retval
 #define dh_retvar_decl0_i64 TCGv_i64 retval
+#define dh_retvar_decl0_i128 TCGv_i128 retval
 #define dh_retvar_decl0_ptr TCGv_ptr retval
 #define dh_retvar_decl0(t) glue(dh_retvar_decl0_, dh_alias(t))
 
@@ -78,6 +81,7 @@
 #define dh_retvar_decl_noreturn
 #define dh_retvar_decl_i32 TCGv_i32 retval,
 #define dh_retvar_decl_i64 TCGv_i64 retval,
+#define dh_retvar_decl_i128 TCGv_i128 retval,
 #define dh_retvar_decl_ptr TCGv_ptr retval,
 #define dh_retvar_decl(t) glue(dh_retvar_decl_, dh_alias(t))
 
@@ -85,6 +89,7 @@
 #define dh_retvar_noreturn NULL
 #define dh_retvar_i32 tcgv_i32_temp(retval)
 #define dh_retvar_i64 tcgv_i64_temp(retval)
+#define dh_retvar_i128 tcgv_i128_temp(retval)
 #define dh_retvar_ptr tcgv_ptr_temp(retval)
 #define dh_retvar(t) glue(dh_retvar_, dh_alias(t))
 
@@ -95,6 +100,7 @@
 #define dh_typecode_i64 4
 #define dh_typecode_s64 5
 #define dh_typecode_ptr 6
+#define dh_typecode_i128 7
 #define dh_typecode_int dh_typecode_s32
 #define dh_typecode_f16 dh_typecode_i32
 #define dh_typecode_f32 dh_typecode_i32
@@ -104,6 +110,7 @@
 
 #define dh_callflag_i32  0
 #define dh_callflag_i64  0
+#define dh_callflag_i128 0
 #define dh_callflag_ptr  0
 #define dh_callflag_void 0
 #define dh_callflag_noreturn TCG_CALL_NO_RETURN
diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
index c5112da0ef..4d7e4107a9 100644
--- a/include/tcg/tcg.h
+++ b/include/tcg/tcg.h
@@ -270,6 +270,7 @@ typedef struct TCGPool {
 typedef enum TCGType {
 TCG_TYPE_I32,
 TCG_TYPE_I64,
+TCG_TYPE_I128,
 
 TCG_TYPE_V64,
 TCG_TYPE_V128,
@@ -351,13 +352,14 @@ typedef tcg_target_ulong TCGArg;
in tcg/README. Target CPU front-end code uses these types to deal
with TCG variables as it emits TCG code via the tcg_gen_* functions.
They come in several flavours:
-* TCGv_i32 : 32 bit integer type
-* TCGv_i64 : 64 bit integer type
-* TCGv_ptr : a host pointer type
-* TCGv_vec : a host vector type; the exact size is not exposed
- to the CPU front-end code.
-* TCGv : an integer type the same size as target_ulong
- (an alias for either TCGv_i32 or TCGv_i64)
+* TCGv_i32  : 32 bit integer type
+* TCGv_i64  : 64 bit integer type
+* TCGv_i128 : 128 bit integer type
+* TCGv_ptr  : a host pointer type
+* TCGv_vec  : a host vector type; the exact size is not exposed
+  to the CPU front-end code.
+* TCGv  : an integer type the same size as target_ulong
+  (an alias for either TCGv_i32 or TCGv_i64)
The compiler's type checking will complain if you mix them
up and pass the wrong sized TCGv to a function.
 
@@ -377,6 +379,7 @@ typedef tcg_target_ulong TCGArg;
 
 typedef struct TCGv_i32_d *TCGv_i32;
 typedef struct TCGv_i64_d *TCGv_i64;
+typedef struct TCGv_i128_d *TCGv_i128;
 typedef struct TCGv_ptr_d *TCGv_ptr;
 typedef struct TCGv_vec_d *TCGv_vec;
 typedef TCGv_ptr TCGv_env;
-- 
2.34.1
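
As background for dh_typecode_i128 taking the value 7: helper signatures
are packed into a mask of 3-bit fields, with the return type in the lowest
field, so 7 is the last code that fits. The sketch below models that packing
and the argument count computed in init_ffi_layouts elsewhere in this series.
The TYPECODE_* enum and the typemask_* functions are hypothetical names; only
the numeric codes 4, 6 and 7 are taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

enum {
    TYPECODE_I64  = 4,
    TYPECODE_PTR  = 6,
    TYPECODE_I128 = 7,
};

/* Place a 3-bit code in field n: n == 0 is the return, n >= 1 the arguments. */
static uint32_t typemask_field(unsigned code, unsigned n)
{
    assert(code <= 7 && n <= 9);
    return (uint32_t)code << (n * 3);
}

/* Read back the 3-bit code stored in field n. */
static unsigned typemask_code(uint32_t mask, unsigned n)
{
    return (mask >> (n * 3)) & 7;
}

/* Ignoring the return type, count fields up to the last non-zero one. */
static unsigned typemask_nargs(uint32_t mask)
{
    unsigned bits = 0;

    for (uint32_t m = mask >> 3; m != 0; m >>= 1) {
        bits++;
    }
    return (bits + 2) / 3;
}
```

A helper that returns i128 and takes (ptr, i64, i64) therefore encodes 7 in
field 0 and reports three arguments.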

[PULL 26/40] tests/tcg/s390x: Add cdsg.c

2023-02-04 Thread Richard Henderson

From: Ilya Leoshkevich 

Add a simple test to prevent regressions.

Signed-off-by: Ilya Leoshkevich 
Message-Id: <20230201133257.3223115-1-...@linux.ibm.com>
Signed-off-by: Richard Henderson 
---
 tests/tcg/s390x/cdsg.c  | 93 +
 tests/tcg/s390x/Makefile.target |  4 ++
 2 files changed, 97 insertions(+)
 create mode 100644 tests/tcg/s390x/cdsg.c

diff --git a/tests/tcg/s390x/cdsg.c b/tests/tcg/s390x/cdsg.c
new file mode 100644
index 0000000000..800618ff4b
--- /dev/null
+++ b/tests/tcg/s390x/cdsg.c
@@ -0,0 +1,93 @@
+/*
+ * Test CDSG instruction.
+ *
+ * Increment the first half of aligned_quadword by 1, and the second half by 2
+ * from 2 threads. Verify that the result is consistent.
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+#include <assert.h>
+#include <pthread.h>
+#include <stdbool.h>
+#include <stdlib.h>
+
+static volatile bool start;
+typedef unsigned long aligned_quadword[2] __attribute__((__aligned__(16)));
+static aligned_quadword val;
+static const int n_iterations = 100;
+
+static inline int cdsg(unsigned long *orig0, unsigned long *orig1,
+   unsigned long new0, unsigned long new1,
+   aligned_quadword *mem)
+{
+register unsigned long r0 asm("r0");
+register unsigned long r1 asm("r1");
+register unsigned long r2 asm("r2");
+register unsigned long r3 asm("r3");
+int cc;
+
+r0 = *orig0;
+r1 = *orig1;
+r2 = new0;
+r3 = new1;
+asm("cdsg %[r0],%[r2],%[db2]\n"
+"ipm %[cc]"
+: [r0] "+r" (r0)
+, [r1] "+r" (r1)
+, [db2] "+m" (*mem)
+, [cc] "=r" (cc)
+: [r2] "r" (r2)
+, [r3] "r" (r3)
+: "cc");
+*orig0 = r0;
+*orig1 = r1;
+
+return (cc >> 28) & 3;
+}
+
+void *cdsg_loop(void *arg)
+{
+unsigned long orig0, orig1, new0, new1;
+int cc;
+int i;
+
+while (!start) {
+}
+
+orig0 = val[0];
+orig1 = val[1];
+for (i = 0; i < n_iterations;) {
+new0 = orig0 + 1;
+new1 = orig1 + 2;
+
+cc = cdsg(&orig0, &orig1, new0, new1, &val);
+
+if (cc == 0) {
+orig0 = new0;
+orig1 = new1;
+i++;
+} else {
+assert(cc == 1);
+}
+}
+
+return NULL;
+}
+
+int main(void)
+{
+pthread_t thread;
+int ret;
+
+ret = pthread_create(&thread, NULL, cdsg_loop, NULL);
+assert(ret == 0);
+start = true;
+cdsg_loop(NULL);
+ret = pthread_join(thread, NULL);
+assert(ret == 0);
+
+assert(val[0] == n_iterations * 2);
+assert(val[1] == n_iterations * 4);
+
+return EXIT_SUCCESS;
+}
diff --git a/tests/tcg/s390x/Makefile.target b/tests/tcg/s390x/Makefile.target
index 1d454270c0..72ad309b27 100644
--- a/tests/tcg/s390x/Makefile.target
+++ b/tests/tcg/s390x/Makefile.target
@@ -27,6 +27,10 @@ TESTS+=noexec
 TESTS+=div
 TESTS+=clst
 TESTS+=long-double
+TESTS+=cdsg
+
+cdsg: CFLAGS+=-pthread
+cdsg: LDFLAGS+=-pthread
 
 Z13_TESTS=vistr
 $(Z13_TESTS): CFLAGS+=-march=z13 -O2
-- 
2.34.1

[PULL 16/40] tcg: Add basic data movement for TCGv_i128

2023-02-04 Thread Richard Henderson

Add code generation functions for data movement between
TCGv_i128 (mov) and to/from TCGv_i64 (concat, extract).

Reviewed-by: Alex Bennée 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 include/tcg/tcg-op.h |  4
 tcg/tcg-internal.h   | 13 +
 tcg/tcg-op.c         | 20
 3 files changed, 37 insertions(+)

diff --git a/include/tcg/tcg-op.h b/include/tcg/tcg-op.h
index 79b1cf786f..c4276767d1 100644
--- a/include/tcg/tcg-op.h
+++ b/include/tcg/tcg-op.h
@@ -712,6 +712,10 @@ void tcg_gen_extrh_i64_i32(TCGv_i32 ret, TCGv_i64 arg);
 void tcg_gen_extr_i64_i32(TCGv_i32 lo, TCGv_i32 hi, TCGv_i64 arg);
 void tcg_gen_extr32_i64(TCGv_i64 lo, TCGv_i64 hi, TCGv_i64 arg);
 
+void tcg_gen_mov_i128(TCGv_i128 dst, TCGv_i128 src);
+void tcg_gen_extr_i128_i64(TCGv_i64 lo, TCGv_i64 hi, TCGv_i128 arg);
+void tcg_gen_concat_i64_i128(TCGv_i128 ret, TCGv_i64 lo, TCGv_i64 hi);
+
 static inline void tcg_gen_concat32_i64(TCGv_i64 ret, TCGv_i64 lo, TCGv_i64 hi)
 {
 tcg_gen_deposit_i64(ret, lo, hi, 32, 32);
diff --git a/tcg/tcg-internal.h b/tcg/tcg-internal.h
index 33f1d8b411..e542a4e9b7 100644
--- a/tcg/tcg-internal.h
+++ b/tcg/tcg-internal.h
@@ -117,4 +117,17 @@ extern TCGv_i32 TCGV_LOW(TCGv_i64) QEMU_ERROR("32-bit code path is reachable");
 extern TCGv_i32 TCGV_HIGH(TCGv_i64) QEMU_ERROR("32-bit code path is reachable");
 #endif
 
+static inline TCGv_i64 TCGV128_LOW(TCGv_i128 t)
+{
+/* For 32-bit, offset by 2, which may then have TCGV_{LOW,HIGH} applied. */
+int o = HOST_BIG_ENDIAN ? 64 / TCG_TARGET_REG_BITS : 0;
+return temp_tcgv_i64(tcgv_i128_temp(t) + o);
+}
+
+static inline TCGv_i64 TCGV128_HIGH(TCGv_i128 t)
+{
+int o = HOST_BIG_ENDIAN ? 0 : 64 / TCG_TARGET_REG_BITS;
+return temp_tcgv_i64(tcgv_i128_temp(t) + o);
+}
+
 #endif /* TCG_INTERNAL_H */
diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index 326a9180ef..cb83d2375d 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -2747,6 +2747,26 @@ void tcg_gen_extr32_i64(TCGv_i64 lo, TCGv_i64 hi, TCGv_i64 arg)
 tcg_gen_shri_i64(hi, arg, 32);
 }
 
+void tcg_gen_extr_i128_i64(TCGv_i64 lo, TCGv_i64 hi, TCGv_i128 arg)
+{
+tcg_gen_mov_i64(lo, TCGV128_LOW(arg));
+tcg_gen_mov_i64(hi, TCGV128_HIGH(arg));
+}
+
+void tcg_gen_concat_i64_i128(TCGv_i128 ret, TCGv_i64 lo, TCGv_i64 hi)
+{
+tcg_gen_mov_i64(TCGV128_LOW(ret), lo);
+tcg_gen_mov_i64(TCGV128_HIGH(ret), hi);
+}
+
+void tcg_gen_mov_i128(TCGv_i128 dst, TCGv_i128 src)
+{
+if (dst != src) {
+tcg_gen_mov_i64(TCGV128_LOW(dst), TCGV128_LOW(src));
+tcg_gen_mov_i64(TCGV128_HIGH(dst), TCGV128_HIGH(src));
+}
+}
+
 /* QEMU specific operations.  */
 
 void tcg_gen_exit_tb(const TranslationBlock *tb, unsigned idx)
-- 
2.34.1
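
The offset arithmetic in TCGV128_LOW and TCGV128_HIGH can be modelled on
its own. This is a sketch under the assumption that a 128-bit temporary is
laid out as consecutive host-register-sized pieces; half_offset is a
hypothetical function that folds both accessors into one, with endianness
and register width passed as parameters instead of compile-time macros.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model: return the index of the first piece belonging to
 * the low (want_high == false) or high (want_high == true) 64-bit half.
 */
static int half_offset(bool want_high, bool host_big_endian, int reg_bits)
{
    assert(reg_bits == 32 || reg_bits == 64);
    int pieces_per_half = 64 / reg_bits;
    /* Little-endian stores the low half first, big-endian the high half. */
    bool stored_first = (want_high == host_big_endian);
    return stored_first ? 0 : pieces_per_half;
}
```

On a 32-bit big-endian host the low half starts at piece 2, matching the
"offset by 2" remark in the comment in tcg-internal.h.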

[PULL 39/40] target/i386: Inline cmpxchg16b

2023-02-04 Thread Richard Henderson

Use tcg_gen_atomic_cmpxchg_i128 for the atomic case,
and tcg_gen_qemu_ld/st_i128 otherwise.

Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 target/i386/helper.h         |  4 ---
 target/i386/tcg/mem_helper.c | 69
 target/i386/tcg/translate.c  | 44 ---
 3 files changed, 39 insertions(+), 78 deletions(-)

diff --git a/target/i386/helper.h b/target/i386/helper.h
index 2df8049f91..e627a93107 100644
--- a/target/i386/helper.h
+++ b/target/i386/helper.h
@@ -66,10 +66,6 @@ DEF_HELPER_1(rsm, void, env)
 #endif /* !CONFIG_USER_ONLY */
 
 DEF_HELPER_2(into, void, env, int)
-#ifdef TARGET_X86_64
-DEF_HELPER_2(cmpxchg16b_unlocked, void, env, tl)
-DEF_HELPER_2(cmpxchg16b, void, env, tl)
-#endif
 DEF_HELPER_FLAGS_1(single_step, TCG_CALL_NO_WG, noreturn, env)
 DEF_HELPER_1(rechecking_single_step, void, env)
 DEF_HELPER_1(cpuid, void, env)
diff --git a/target/i386/tcg/mem_helper.c b/target/i386/tcg/mem_helper.c
index 814786bb87..3ef84e90d9 100644
--- a/target/i386/tcg/mem_helper.c
+++ b/target/i386/tcg/mem_helper.c
@@ -27,75 +27,6 @@
 #include "tcg/tcg.h"
 #include "helper-tcg.h"
 
-#ifdef TARGET_X86_64
-void helper_cmpxchg16b_unlocked(CPUX86State *env, target_ulong a0)
-{
-uintptr_t ra = GETPC();
-Int128 oldv, cmpv, newv;
-uint64_t o0, o1;
-int eflags;
-bool success;
-
-if ((a0 & 0xf) != 0) {
-raise_exception_ra(env, EXCP0D_GPF, GETPC());
-}
-eflags = cpu_cc_compute_all(env, CC_OP);
-
-cmpv = int128_make128(env->regs[R_EAX], env->regs[R_EDX]);
-newv = int128_make128(env->regs[R_EBX], env->regs[R_ECX]);
-
-o0 = cpu_ldq_data_ra(env, a0 + 0, ra);
-o1 = cpu_ldq_data_ra(env, a0 + 8, ra);
-
-oldv = int128_make128(o0, o1);
-success = int128_eq(oldv, cmpv);
-if (!success) {
-newv = oldv;
-}
-
-cpu_stq_data_ra(env, a0 + 0, int128_getlo(newv), ra);
-cpu_stq_data_ra(env, a0 + 8, int128_gethi(newv), ra);
-
-if (success) {
-eflags |= CC_Z;
-} else {
-env->regs[R_EAX] = int128_getlo(oldv);
-env->regs[R_EDX] = int128_gethi(oldv);
-eflags &= ~CC_Z;
-}
-CC_SRC = eflags;
-}
-
-void helper_cmpxchg16b(CPUX86State *env, target_ulong a0)
-{
-uintptr_t ra = GETPC();
-
-if ((a0 & 0xf) != 0) {
-raise_exception_ra(env, EXCP0D_GPF, ra);
-} else if (HAVE_CMPXCHG128) {
-int eflags = cpu_cc_compute_all(env, CC_OP);
-
-Int128 cmpv = int128_make128(env->regs[R_EAX], env->regs[R_EDX]);
-Int128 newv = int128_make128(env->regs[R_EBX], env->regs[R_ECX]);
-
-int mem_idx = cpu_mmu_index(env, false);
-MemOpIdx oi = make_memop_idx(MO_TE | MO_128 | MO_ALIGN, mem_idx);
-Int128 oldv = cpu_atomic_cmpxchgo_le_mmu(env, a0, cmpv, newv, oi, ra);
-
-if (int128_eq(oldv, cmpv)) {
-eflags |= CC_Z;
-} else {
-env->regs[R_EAX] = int128_getlo(oldv);
-env->regs[R_EDX] = int128_gethi(oldv);
-eflags &= ~CC_Z;
-}
-CC_SRC = eflags;
-} else {
-cpu_loop_exit_atomic(env_cpu(env), ra);
-}
-}
-#endif
-
 void helper_boundw(CPUX86State *env, target_ulong a0, int v)
 {
 int low, high;
diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index b542b084a6..9d9392b009 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -3053,15 +3053,49 @@ static void gen_cmpxchg8b(DisasContext *s, CPUX86State *env, int modrm)
 #ifdef TARGET_X86_64
 static void gen_cmpxchg16b(DisasContext *s, CPUX86State *env, int modrm)
 {
+MemOp mop = MO_TE | MO_128 | MO_ALIGN;
+TCGv_i64 t0, t1;
+TCGv_i128 cmp, val;
+
 gen_lea_modrm(env, s, modrm);
 
-if ((s->prefix & PREFIX_LOCK) &&
-(tb_cflags(s->base.tb) & CF_PARALLEL)) {
-gen_helper_cmpxchg16b(cpu_env, s->A0);
+cmp = tcg_temp_new_i128();
+val = tcg_temp_new_i128();
+tcg_gen_concat_i64_i128(cmp, cpu_regs[R_EAX], cpu_regs[R_EDX]);
+tcg_gen_concat_i64_i128(val, cpu_regs[R_EBX], cpu_regs[R_ECX]);
+
+/* Only require atomic with LOCK; non-parallel handled in generator. */
+if (s->prefix & PREFIX_LOCK) {
+tcg_gen_atomic_cmpxchg_i128(val, s->A0, cmp, val, s->mem_index, mop);
 } else {
-gen_helper_cmpxchg16b_unlocked(cpu_env, s->A0);
+tcg_gen_nonatomic_cmpxchg_i128(val, s->A0, cmp, val, s->mem_index, mop);
 }
-set_cc_op(s, CC_OP_EFLAGS);
+
+tcg_gen_extr_i128_i64(s->T0, s->T1, val);
+tcg_temp_free_i128(cmp);
+tcg_temp_free_i128(val);
+
+/* Determine success after the fact. */
+t0 = tcg_temp_new_i64();
+t1 = tcg_temp_new_i64();
+tcg_gen_xor_i64(t0, s->T0, cpu_regs[R_EAX]);
+tcg_gen_xor_i64(t1, s->T1, cpu_regs[R_EDX]);
+tcg_gen_or_i64(t0, t0, t1);
+tcg_temp_free_i64(t1);
+
+/* Update Z. */
+gen_compute_eflags(s);
+tcg_gen_setcondi_i64(TCG_COND_EQ, t0, t0, 0);
+tcg_gen_deposit_tl(cpu_cc_src, cpu_cc_src,

[PULL 08/40] tcg: Introduce tcg_target_call_oarg_reg

2023-02-04 Thread Richard Henderson

Replace the flat array tcg_target_call_oarg_regs[] with
a function call including the TCGCallReturnKind.

Extend the set of registers for ARM to r0-r3 to match the ABI:
https://github.com/ARM-software/abi-aa/blob/main/aapcs32/aapcs32.rst#result-return

Reviewed-by: Alex Bennée 
Reviewed-by: Daniel Henrique Barboza 
Signed-off-by: Richard Henderson 
---
 tcg/tcg.c                        |  9 ++---
 tcg/aarch64/tcg-target.c.inc     | 10 +++---
 tcg/arm/tcg-target.c.inc         | 10 +++---
 tcg/i386/tcg-target.c.inc        | 16 ++--
 tcg/loongarch64/tcg-target.c.inc | 10 ++
 tcg/mips/tcg-target.c.inc        | 10 ++
 tcg/ppc/tcg-target.c.inc         | 10 ++
 tcg/riscv/tcg-target.c.inc       | 10 ++
 tcg/s390x/tcg-target.c.inc       |  9 ++---
 tcg/sparc64/tcg-target.c.inc     | 12 ++--
 tcg/tci/tcg-target.c.inc         | 12 ++--
 11 files changed, 72 insertions(+), 46 deletions(-)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index 123cde7000..a77483eee8 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -151,6 +151,7 @@ static bool tcg_out_sti(TCGContext *s, TCGType type, TCGArg val,
 TCGReg base, intptr_t ofs);
 static void tcg_out_call(TCGContext *s, const tcg_insn_unit *target,
  const TCGHelperInfo *info);
+static TCGReg tcg_target_call_oarg_reg(TCGCallReturnKind kind, int slot);
 static bool tcg_target_const_match(int64_t val, TCGType type, int ct);
 #ifdef TCG_TARGET_NEED_LDST_LABELS
 static int tcg_out_ldst_finalize(TCGContext *s);
@@ -740,14 +741,16 @@ static void init_call_layout(TCGHelperInfo *info)
 case dh_typecode_s64:
 info->nr_out = 64 / TCG_TARGET_REG_BITS;
 info->out_kind = TCG_CALL_RET_NORMAL;
-assert(info->nr_out <= ARRAY_SIZE(tcg_target_call_oarg_regs));
+/* Query the last register now to trigger any assert early. */
+tcg_target_call_oarg_reg(info->out_kind, info->nr_out - 1);
 break;
 case dh_typecode_i128:
 info->nr_out = 128 / TCG_TARGET_REG_BITS;
 info->out_kind = TCG_CALL_RET_NORMAL; /* TODO */
 switch (/* TODO */ TCG_CALL_RET_NORMAL) {
 case TCG_CALL_RET_NORMAL:
-assert(info->nr_out <= ARRAY_SIZE(tcg_target_call_oarg_regs));
+/* Query the last register now to trigger any assert early. */
+tcg_target_call_oarg_reg(info->out_kind, info->nr_out - 1);
 break;
 case TCG_CALL_RET_BY_REF:
 /*
@@ -4592,7 +4595,7 @@ static void tcg_reg_alloc_call(TCGContext *s, TCGOp *op)
 case TCG_CALL_RET_NORMAL:
 for (i = 0; i < nb_oargs; i++) {
 TCGTemp *ts = arg_temp(op->args[i]);
-TCGReg reg = tcg_target_call_oarg_regs[i];
+TCGReg reg = tcg_target_call_oarg_reg(TCG_CALL_RET_NORMAL, i);
 
 /* ENV should not be modified.  */
 tcg_debug_assert(!temp_readonly(ts));
diff --git a/tcg/aarch64/tcg-target.c.inc b/tcg/aarch64/tcg-target.c.inc
index bd6da72678..fde3b30ad1 100644
--- a/tcg/aarch64/tcg-target.c.inc
+++ b/tcg/aarch64/tcg-target.c.inc
@@ -63,9 +63,13 @@ static const int tcg_target_call_iarg_regs[8] = {
 TCG_REG_X0, TCG_REG_X1, TCG_REG_X2, TCG_REG_X3,
 TCG_REG_X4, TCG_REG_X5, TCG_REG_X6, TCG_REG_X7
 };
-static const int tcg_target_call_oarg_regs[1] = {
-TCG_REG_X0
-};
+
+static TCGReg tcg_target_call_oarg_reg(TCGCallReturnKind kind, int slot)
+{
+tcg_debug_assert(kind == TCG_CALL_RET_NORMAL);
+tcg_debug_assert(slot >= 0 && slot <= 1);
+return TCG_REG_X0 + slot;
+}
 
 #define TCG_REG_TMP TCG_REG_X30
 #define TCG_VEC_TMP TCG_REG_V31
diff --git a/tcg/arm/tcg-target.c.inc b/tcg/arm/tcg-target.c.inc
index 6e9e9b9b3f..d06ac60c15 100644
--- a/tcg/arm/tcg-target.c.inc
+++ b/tcg/arm/tcg-target.c.inc
@@ -79,9 +79,13 @@ static const int tcg_target_reg_alloc_order[] = {
 static const int tcg_target_call_iarg_regs[4] = {
 TCG_REG_R0, TCG_REG_R1, TCG_REG_R2, TCG_REG_R3
 };
-static const int tcg_target_call_oarg_regs[2] = {
-TCG_REG_R0, TCG_REG_R1
-};
+
+static TCGReg tcg_target_call_oarg_reg(TCGCallReturnKind kind, int slot)
+{
+tcg_debug_assert(kind == TCG_CALL_RET_NORMAL);
+tcg_debug_assert(slot >= 0 && slot <= 3);
+return TCG_REG_R0 + slot;
+}
 
 #define TCG_REG_TMP  TCG_REG_R12
 #define TCG_VEC_TMP  TCG_REG_Q15
diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc
index 7b573bd287..2f0a9521bf 100644
--- a/tcg/i386/tcg-target.c.inc
+++ b/tcg/i386/tcg-target.c.inc
@@ -109,12 +109,16 @@ static const int tcg_target_call_iarg_regs[] = {
 #endif
 };
 
-static const int tcg_target_call_oarg_regs[] = {
-TCG_REG_EAX,
-#if TCG_TARGET_REG_BITS == 32
-TCG_REG_EDX
-#endif
-};
+static TCGReg tcg_target_call_oarg_reg(TCGCallReturnKind kind, int slot)
+{
+switch (kind) {
+case TCG_CALL_RET_NORMAL:
+tcg_debug_assert(slot >= 0 && slot <= 1);
+return slot ? TCG_REG_EDX : TCG_REG_EAX;
+default:
+g_assert_

[PULL 11/40] tcg/i386: Add TCG_TARGET_CALL_{RET,ARG}_I128

2023-02-04 Thread Richard Henderson

Fill in the parameters for the host ABI for Int128.
Adjust tcg_target_call_oarg_reg for _WIN64, and
tcg_out_call for i386 sysv.  Allow TCG_TYPE_V128
stores without AVX enabled.

Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 tcg/i386/tcg-target.h     | 10 ++
 tcg/i386/tcg-target.c.inc | 30 +-
 2 files changed, 39 insertions(+), 1 deletion(-)

diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index 5797a55ea0..d4f2a6f8c2 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -100,6 +100,16 @@ typedef enum {
 #endif
 #define TCG_TARGET_CALL_ARG_I32  TCG_CALL_ARG_NORMAL
 #define TCG_TARGET_CALL_ARG_I64  TCG_CALL_ARG_NORMAL
+#if defined(_WIN64)
+# define TCG_TARGET_CALL_ARG_I128    TCG_CALL_ARG_BY_REF
+# define TCG_TARGET_CALL_RET_I128    TCG_CALL_RET_BY_VEC
+#elif TCG_TARGET_REG_BITS == 64
+# define TCG_TARGET_CALL_ARG_I128    TCG_CALL_ARG_NORMAL
+# define TCG_TARGET_CALL_RET_I128    TCG_CALL_RET_NORMAL
+#else
+# define TCG_TARGET_CALL_ARG_I128    TCG_CALL_ARG_NORMAL
+# define TCG_TARGET_CALL_RET_I128    TCG_CALL_RET_BY_REF
+#endif
 
 extern bool have_bmi1;
 extern bool have_popcnt;
diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc
index 2f0a9521bf..883ced8168 100644
--- a/tcg/i386/tcg-target.c.inc
+++ b/tcg/i386/tcg-target.c.inc
@@ -115,6 +115,11 @@ static TCGReg tcg_target_call_oarg_reg(TCGCallReturnKind kind, int slot)
 case TCG_CALL_RET_NORMAL:
 tcg_debug_assert(slot >= 0 && slot <= 1);
 return slot ? TCG_REG_EDX : TCG_REG_EAX;
+#ifdef _WIN64
+case TCG_CALL_RET_BY_VEC:
+tcg_debug_assert(slot == 0);
+return TCG_REG_XMM0;
+#endif
 default:
 g_assert_not_reached();
 }
@@ -1188,9 +1193,16 @@ static void tcg_out_st(TCGContext *s, TCGType type, TCGReg arg,
  * The gvec infrastructure is asserts that v128 vector loads
  * and stores use a 16-byte aligned offset.  Validate that the
  * final pointer is aligned by using an insn that will SIGSEGV.
+ *
+ * This specific instance is also used by TCG_CALL_RET_BY_VEC,
+ * for _WIN64, which must have SSE2 but may not have AVX.
  */
 tcg_debug_assert(arg >= 16);
-tcg_out_vex_modrm_offset(s, OPC_MOVDQA_WxVx, arg, 0, arg1, arg2);
+if (have_avx1) {
+tcg_out_vex_modrm_offset(s, OPC_MOVDQA_WxVx, arg, 0, arg1, arg2);
+} else {
+tcg_out_modrm_offset(s, OPC_MOVDQA_WxVx, arg, arg1, arg2);
+}
 break;
 case TCG_TYPE_V256:
 /*
@@ -1677,6 +1689,22 @@ static void tcg_out_call(TCGContext *s, const tcg_insn_unit *dest,
  const TCGHelperInfo *info)
 {
 tcg_out_branch(s, 1, dest);
+
+#ifndef _WIN32
+if (TCG_TARGET_REG_BITS == 32 && info->out_kind == TCG_CALL_RET_BY_REF) {
+/*
+ * The sysv i386 abi for struct return places a reference as the
+ * first argument of the stack, and pops that argument with the
+ * return statement.  Since we want to retain the aligned stack
+ * pointer for the callee, we do not want to actually push that
+ * argument before the call but rely on the normal store to the
+ * stack slot.  But we do need to compensate for the pop in order
+ * to reset our correct stack pointer value.
+ * Pushing a garbage value back onto the stack is quickest.
+ */
+tcg_out_push(s, TCG_REG_EAX);
+}
+#endif
 }
 
 static void tcg_out_jmp(TCGContext *s, const tcg_insn_unit *dest)
-- 
2.34.1

[PULL 14/40] tcg: Add TCG_TARGET_CALL_{RET,ARG}_I128

2023-02-04 Thread Richard Henderson

Fill in the parameters for the host ABI for Int128 for
those backends which require no extra modification.

Reviewed-by: Alex Bennée 
Reviewed-by: Daniel Henrique Barboza 
Signed-off-by: Richard Henderson 
---
 tcg/aarch64/tcg-target.h     | 2 ++
 tcg/arm/tcg-target.h         | 2 ++
 tcg/loongarch64/tcg-target.h | 2 ++
 tcg/mips/tcg-target.h        | 2 ++
 tcg/riscv/tcg-target.h       | 3 +++
 tcg/s390x/tcg-target.h       | 2 ++
 tcg/sparc64/tcg-target.h     | 2 ++
 tcg/tcg.c                    | 6 +++---
 tcg/ppc/tcg-target.c.inc     | 3 +++
 9 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
index 8d244292aa..c0b0f614ba 100644
--- a/tcg/aarch64/tcg-target.h
+++ b/tcg/aarch64/tcg-target.h
@@ -54,6 +54,8 @@ typedef enum {
 #define TCG_TARGET_CALL_STACK_OFFSET    0
 #define TCG_TARGET_CALL_ARG_I32         TCG_CALL_ARG_NORMAL
 #define TCG_TARGET_CALL_ARG_I64         TCG_CALL_ARG_NORMAL
+#define TCG_TARGET_CALL_ARG_I128        TCG_CALL_ARG_EVEN
+#define TCG_TARGET_CALL_RET_I128        TCG_CALL_RET_NORMAL
 
 /* optional instructions */
 #define TCG_TARGET_HAS_div_i32  1
diff --git a/tcg/arm/tcg-target.h b/tcg/arm/tcg-target.h
index 91b8954804..def2a189e6 100644
--- a/tcg/arm/tcg-target.h
+++ b/tcg/arm/tcg-target.h
@@ -91,6 +91,8 @@ extern bool use_neon_instructions;
 #define TCG_TARGET_CALL_STACK_OFFSET   0
 #define TCG_TARGET_CALL_ARG_I32 TCG_CALL_ARG_NORMAL
 #define TCG_TARGET_CALL_ARG_I64 TCG_CALL_ARG_EVEN
+#define TCG_TARGET_CALL_ARG_I128TCG_CALL_ARG_EVEN
+#define TCG_TARGET_CALL_RET_I128TCG_CALL_RET_BY_REF
 
 /* optional instructions */
 #define TCG_TARGET_HAS_ext8s_i321
diff --git a/tcg/loongarch64/tcg-target.h b/tcg/loongarch64/tcg-target.h
index 8b151e7f6f..17b8193aa5 100644
--- a/tcg/loongarch64/tcg-target.h
+++ b/tcg/loongarch64/tcg-target.h
@@ -92,6 +92,8 @@ typedef enum {
 #define TCG_TARGET_CALL_STACK_OFFSET0
 #define TCG_TARGET_CALL_ARG_I32 TCG_CALL_ARG_NORMAL
 #define TCG_TARGET_CALL_ARG_I64 TCG_CALL_ARG_NORMAL
+#define TCG_TARGET_CALL_ARG_I128TCG_CALL_ARG_NORMAL
+#define TCG_TARGET_CALL_RET_I128TCG_CALL_RET_NORMAL
 
 /* optional instructions */
 #define TCG_TARGET_HAS_movcond_i32  1
diff --git a/tcg/mips/tcg-target.h b/tcg/mips/tcg-target.h
index 7bc8e15293..68b11e4d48 100644
--- a/tcg/mips/tcg-target.h
+++ b/tcg/mips/tcg-target.h
@@ -89,6 +89,8 @@ typedef enum {
 # define TCG_TARGET_CALL_ARG_I64  TCG_CALL_ARG_NORMAL
 #endif
 #define TCG_TARGET_CALL_ARG_I32   TCG_CALL_ARG_NORMAL
+#define TCG_TARGET_CALL_ARG_I128  TCG_CALL_ARG_EVEN
+#define TCG_TARGET_CALL_RET_I128  TCG_CALL_RET_NORMAL
 
 /* MOVN/MOVZ instructions detection */
 #if (defined(__mips_isa_rev) && (__mips_isa_rev >= 1)) || \
diff --git a/tcg/riscv/tcg-target.h b/tcg/riscv/tcg-target.h
index 1337bc1f1e..0deb33701f 100644
--- a/tcg/riscv/tcg-target.h
+++ b/tcg/riscv/tcg-target.h
@@ -85,9 +85,12 @@ typedef enum {
 #define TCG_TARGET_CALL_ARG_I32 TCG_CALL_ARG_NORMAL
 #if TCG_TARGET_REG_BITS == 32
 #define TCG_TARGET_CALL_ARG_I64 TCG_CALL_ARG_EVEN
+#define TCG_TARGET_CALL_ARG_I128TCG_CALL_ARG_EVEN
 #else
 #define TCG_TARGET_CALL_ARG_I64 TCG_CALL_ARG_NORMAL
+#define TCG_TARGET_CALL_ARG_I128TCG_CALL_ARG_NORMAL
 #endif
+#define TCG_TARGET_CALL_RET_I128TCG_CALL_RET_NORMAL
 
 /* optional instructions */
 #define TCG_TARGET_HAS_movcond_i32  0
diff --git a/tcg/s390x/tcg-target.h b/tcg/s390x/tcg-target.h
index e597e47e60..a05b473117 100644
--- a/tcg/s390x/tcg-target.h
+++ b/tcg/s390x/tcg-target.h
@@ -169,6 +169,8 @@ extern uint64_t s390_facilities[3];
 #define TCG_TARGET_CALL_STACK_OFFSET   160
 #define TCG_TARGET_CALL_ARG_I32 TCG_CALL_ARG_EXTEND
 #define TCG_TARGET_CALL_ARG_I64 TCG_CALL_ARG_NORMAL
+#define TCG_TARGET_CALL_ARG_I128TCG_CALL_ARG_BY_REF
+#define TCG_TARGET_CALL_RET_I128TCG_CALL_RET_BY_REF
 
 #define TCG_TARGET_HAS_MEMORY_BSWAP   1
 
diff --git a/tcg/sparc64/tcg-target.h b/tcg/sparc64/tcg-target.h
index 1d6a5c8b07..ffe22b1d21 100644
--- a/tcg/sparc64/tcg-target.h
+++ b/tcg/sparc64/tcg-target.h
@@ -73,6 +73,8 @@ typedef enum {
 #define TCG_TARGET_CALL_STACK_OFFSET(128 + 6*8 + TCG_TARGET_STACK_BIAS)
 #define TCG_TARGET_CALL_ARG_I32 TCG_CALL_ARG_EXTEND
 #define TCG_TARGET_CALL_ARG_I64 TCG_CALL_ARG_NORMAL
+#define TCG_TARGET_CALL_ARG_I128TCG_CALL_ARG_NORMAL
+#define TCG_TARGET_CALL_RET_I128TCG_CALL_RET_NORMAL
 
 #if defined(__VIS__) && __VIS__ >= 0x300
 #define use_vis3_instructions  1
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 865ed5ea0f..163913c95f 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -765,8 +765,8 @@ static void init_call_layout(TCGHelperInfo *info)
 break;
 case dh_typecode_i128:
 info->nr_out = 128 / TCG_TARGET_REG_BITS;
-info->out_kind = TCG_CALL_RET_NORMAL; /* TODO */
-switch (/* TODO */ TCG_CALL_RET_NORMAL)

[PULL 21/40] target/arm: Use tcg_gen_atomic_cmpxchg_i128 for CASP

2023-02-04 Thread Richard Henderson

Signed-off-by: Richard Henderson 
Reviewed-by: Peter Maydell 
Message-Id: <20221112042555.2622152-3-richard.hender...@linaro.org>
---
 target/arm/helper-a64.h|  2 --
 target/arm/helper-a64.c| 43 ---
 target/arm/translate-a64.c | 61 +++---
 3 files changed, 18 insertions(+), 88 deletions(-)

diff --git a/target/arm/helper-a64.h b/target/arm/helper-a64.h
index 94065d1917..ff56807247 100644
--- a/target/arm/helper-a64.h
+++ b/target/arm/helper-a64.h
@@ -50,8 +50,6 @@ DEF_HELPER_FLAGS_2(frecpx_f16, TCG_CALL_NO_RWG, f16, f16, ptr)
 DEF_HELPER_FLAGS_2(fcvtx_f64_to_f32, TCG_CALL_NO_RWG, f32, f64, env)
 DEF_HELPER_FLAGS_3(crc32_64, TCG_CALL_NO_RWG_SE, i64, i64, i64, i32)
 DEF_HELPER_FLAGS_3(crc32c_64, TCG_CALL_NO_RWG_SE, i64, i64, i64, i32)
-DEF_HELPER_5(casp_le_parallel, void, env, i32, i64, i64, i64)
-DEF_HELPER_5(casp_be_parallel, void, env, i32, i64, i64, i64)
 DEF_HELPER_FLAGS_3(advsimd_maxh, TCG_CALL_NO_RWG, f16, f16, f16, ptr)
 DEF_HELPER_FLAGS_3(advsimd_minh, TCG_CALL_NO_RWG, f16, f16, f16, ptr)
 DEF_HELPER_FLAGS_3(advsimd_maxnumh, TCG_CALL_NO_RWG, f16, f16, f16, ptr)
diff --git a/target/arm/helper-a64.c b/target/arm/helper-a64.c
index 7dbdb2c233..0972a4bdd0 100644
--- a/target/arm/helper-a64.c
+++ b/target/arm/helper-a64.c
@@ -505,49 +505,6 @@ uint64_t HELPER(crc32c_64)(uint64_t acc, uint64_t val, uint32_t bytes)
 return crc32c(acc, buf, bytes) ^ 0x;
 }
 
-/* Writes back the old data into Rs.  */
-void HELPER(casp_le_parallel)(CPUARMState *env, uint32_t rs, uint64_t addr,
-  uint64_t new_lo, uint64_t new_hi)
-{
-Int128 oldv, cmpv, newv;
-uintptr_t ra = GETPC();
-int mem_idx;
-MemOpIdx oi;
-
-assert(HAVE_CMPXCHG128);
-
-mem_idx = cpu_mmu_index(env, false);
-oi = make_memop_idx(MO_LE | MO_128 | MO_ALIGN, mem_idx);
-
-cmpv = int128_make128(env->xregs[rs], env->xregs[rs + 1]);
-newv = int128_make128(new_lo, new_hi);
-oldv = cpu_atomic_cmpxchgo_le_mmu(env, addr, cmpv, newv, oi, ra);
-
-env->xregs[rs] = int128_getlo(oldv);
-env->xregs[rs + 1] = int128_gethi(oldv);
-}
-
-void HELPER(casp_be_parallel)(CPUARMState *env, uint32_t rs, uint64_t addr,
-  uint64_t new_hi, uint64_t new_lo)
-{
-Int128 oldv, cmpv, newv;
-uintptr_t ra = GETPC();
-int mem_idx;
-MemOpIdx oi;
-
-assert(HAVE_CMPXCHG128);
-
-mem_idx = cpu_mmu_index(env, false);
-oi = make_memop_idx(MO_LE | MO_128 | MO_ALIGN, mem_idx);
-
-cmpv = int128_make128(env->xregs[rs + 1], env->xregs[rs]);
-newv = int128_make128(new_lo, new_hi);
-oldv = cpu_atomic_cmpxchgo_be_mmu(env, addr, cmpv, newv, oi, ra);
-
-env->xregs[rs + 1] = int128_getlo(oldv);
-env->xregs[rs] = int128_gethi(oldv);
-}
-
 /*
  * AdvSIMD half-precision
  */
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 951b64c9b1..da9f877476 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -2709,53 +2709,28 @@ static void gen_compare_and_swap_pair(DisasContext *s, int rs, int rt,
 tcg_gen_extr32_i64(s2, s1, cmp);
 }
 tcg_temp_free_i64(cmp);
-} else if (tb_cflags(s->base.tb) & CF_PARALLEL) {
-if (HAVE_CMPXCHG128) {
-TCGv_i32 tcg_rs = tcg_constant_i32(rs);
-if (s->be_data == MO_LE) {
-gen_helper_casp_le_parallel(cpu_env, tcg_rs,
-clean_addr, t1, t2);
-} else {
-gen_helper_casp_be_parallel(cpu_env, tcg_rs,
-clean_addr, t1, t2);
-}
-} else {
-gen_helper_exit_atomic(cpu_env);
-s->base.is_jmp = DISAS_NORETURN;
-}
 } else {
-TCGv_i64 d1 = tcg_temp_new_i64();
-TCGv_i64 d2 = tcg_temp_new_i64();
-TCGv_i64 a2 = tcg_temp_new_i64();
-TCGv_i64 c1 = tcg_temp_new_i64();
-TCGv_i64 c2 = tcg_temp_new_i64();
-TCGv_i64 zero = tcg_constant_i64(0);
+TCGv_i128 cmp = tcg_temp_new_i128();
+TCGv_i128 val = tcg_temp_new_i128();
 
-/* Load the two words, in memory order.  */
-tcg_gen_qemu_ld_i64(d1, clean_addr, memidx,
-MO_64 | MO_ALIGN_16 | s->be_data);
-tcg_gen_addi_i64(a2, clean_addr, 8);
-tcg_gen_qemu_ld_i64(d2, a2, memidx, MO_64 | s->be_data);
+if (s->be_data == MO_LE) {
+tcg_gen_concat_i64_i128(val, t1, t2);
+tcg_gen_concat_i64_i128(cmp, s1, s2);
+} else {
+tcg_gen_concat_i64_i128(val, t2, t1);
+tcg_gen_concat_i64_i128(cmp, s2, s1);
+}
 
-/* Compare the two words, also in memory order.  */
-tcg_gen_setcond_i64(TCG_COND_EQ, c1, d1, s1);
-tcg_gen_setcond_i64(TCG_COND_EQ, c2, d2, s2);
-tcg_gen_and_i64(c2, c2, c1);
+tcg_gen_atomic_cmpxchg_i128(cmp, clean_addr, cmp, val, memidx,
+

[PULL 04/40] tcg: Handle dh_typecode_i128 with TCG_CALL_{RET, ARG}_NORMAL

2023-02-04 Thread Richard Henderson

Many hosts pass and return 128-bit quantities like sequential
64-bit quantities.  Treat this just like we currently break
down 64-bit quantities for a 32-bit host.
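
As an illustration only, the breakdown can be sketched as below.
The type u128_parts and the function split_u128 are hypothetical,
and the least-significant-first ordering is an assumption made for
the example; the real ordering of the pieces follows the host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 128-bit value held as two 64-bit halves. */
typedef struct {
    uint64_t lo, hi;
} u128_parts;

/*
 * Split v into 128 / reg_bits register-sized pieces, least
 * significant piece first.  out must have room for that many
 * entries.  Returns the number of pieces written.
 */
static size_t split_u128(u128_parts v, unsigned reg_bits, uint64_t *out)
{
    size_t i, n;
    uint64_t mask;

    assert(reg_bits == 32 || reg_bits == 64);
    n = 128 / reg_bits;
    mask = reg_bits == 64 ? UINT64_MAX : (UINT64_C(1) << reg_bits) - 1;

    for (i = 0; i < n; i++) {
        unsigned bit = (unsigned)i * reg_bits;  /* offset within 128 bits */
        uint64_t half = bit < 64 ? v.lo : v.hi;

        out[i] = (half >> (bit % 64)) & mask;
    }
    return n;
}
```

A 64-bit host gets two pieces and a 32-bit host gets four, which
is the same 128 / TCG_TARGET_REG_BITS count used for nr_out in
the patch.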

Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 tcg/tcg.c | 37 +
 1 file changed, 33 insertions(+), 4 deletions(-)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index bc60fd0fe8..bc7198e5d0 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -707,11 +707,22 @@ static void init_call_layout(TCGHelperInfo *info)
 case dh_typecode_s64:
 info->nr_out = 64 / TCG_TARGET_REG_BITS;
 info->out_kind = TCG_CALL_RET_NORMAL;
+assert(info->nr_out <= ARRAY_SIZE(tcg_target_call_oarg_regs));
+break;
+case dh_typecode_i128:
+info->nr_out = 128 / TCG_TARGET_REG_BITS;
+info->out_kind = TCG_CALL_RET_NORMAL; /* TODO */
+switch (/* TODO */ TCG_CALL_RET_NORMAL) {
+case TCG_CALL_RET_NORMAL:
+assert(info->nr_out <= ARRAY_SIZE(tcg_target_call_oarg_regs));
+break;
+default:
+qemu_build_not_reached();
+}
 break;
 default:
 g_assert_not_reached();
 }
-assert(info->nr_out <= ARRAY_SIZE(tcg_target_call_oarg_regs));
 
 /*
  * Parse and place function arguments.
@@ -733,6 +744,9 @@ static void init_call_layout(TCGHelperInfo *info)
 case dh_typecode_ptr:
 type = TCG_TYPE_PTR;
 break;
+case dh_typecode_i128:
+type = TCG_TYPE_I128;
+break;
 default:
 g_assert_not_reached();
 }
@@ -772,6 +786,19 @@ static void init_call_layout(TCGHelperInfo *info)
 }
 break;
 
+case TCG_TYPE_I128:
+switch (/* TODO */ TCG_CALL_ARG_NORMAL) {
+case TCG_CALL_ARG_EVEN:
+layout_arg_even(&cum);
+/* fall through */
+case TCG_CALL_ARG_NORMAL:
+layout_arg_normal_n(&cum, info, 128 / TCG_TARGET_REG_BITS);
+break;
+default:
+qemu_build_not_reached();
+}
+break;
+
 default:
 g_assert_not_reached();
 }
@@ -1692,11 +1719,13 @@ void tcg_gen_callN(void *func, TCGTemp *ret, int nargs, TCGTemp **args)
 op->args[pi++] = temp_arg(ret);
 break;
 case 2:
+case 4:
 tcg_debug_assert(ret != NULL);
-tcg_debug_assert(ret->base_type == ret->type + 1);
+tcg_debug_assert(ret->base_type == ret->type + ctz32(n));
 tcg_debug_assert(ret->temp_subindex == 0);
-op->args[pi++] = temp_arg(ret);
-op->args[pi++] = temp_arg(ret + 1);
+for (i = 0; i < n; ++i) {
+op->args[pi++] = temp_arg(ret + i);
+}
 break;
 default:
 g_assert_not_reached();
-- 
2.34.1

[PULL 17/40] tcg: Add guest load/store primitives for TCGv_i128

2023-02-04 Thread Richard Henderson

These are not yet considering atomicity of the 16-byte value;
this is a direct replacement for the current target code which
uses a pair of 8-byte operations.
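
A minimal sketch of what "a pair of 8-byte operations" means for
the two endiannesses, working on a plain byte buffer rather than
guest memory.  All names here (u128_parts, load_be64, load_le64,
load16_be, load16_le) are hypothetical and exist only for this
example; nothing in it is atomic across the 16 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 128-bit value held as two 64-bit halves. */
typedef struct {
    uint64_t lo, hi;
} u128_parts;

/* Assemble 8 bytes, most significant byte first. */
static uint64_t load_be64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++) {
        v = (v << 8) | p[i];
    }
    return v;
}

/* Assemble 8 bytes, least significant byte first. */
static uint64_t load_le64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--) {
        v = (v << 8) | p[i];
    }
    return v;
}

/* Big-endian: the high half sits at the lower address. */
static u128_parts load16_be(const uint8_t *p)
{
    u128_parts r;

    assert(p != NULL);
    r.hi = load_be64(p);
    r.lo = load_be64(p + 8);
    return r;
}

/* Little-endian: the low half sits at the lower address. */
static u128_parts load16_le(const uint8_t *p)
{
    u128_parts r;

    assert(p != NULL);
    r.lo = load_le64(p);
    r.hi = load_le64(p + 8);
    return r;
}
```

The only difference between the two variants is which half comes
from the lower address, matching the h/l ordering in the
cpu_ld16_be_mmu and cpu_ld16_le_mmu hunks below.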

Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 include/exec/cpu_ldst.h |  10 +++
 include/tcg/tcg-op.h|   2 +
 accel/tcg/cputlb.c  | 112 +
 accel/tcg/user-exec.c   |  66 
 tcg/tcg-op.c| 134 
 5 files changed, 324 insertions(+)

diff --git a/include/exec/cpu_ldst.h b/include/exec/cpu_ldst.h
index d0c7c0d5fe..09b55cc0ee 100644
--- a/include/exec/cpu_ldst.h
+++ b/include/exec/cpu_ldst.h
@@ -220,6 +220,11 @@ uint32_t cpu_ldl_le_mmu(CPUArchState *env, abi_ptr ptr,
 uint64_t cpu_ldq_le_mmu(CPUArchState *env, abi_ptr ptr,
 MemOpIdx oi, uintptr_t ra);
 
+Int128 cpu_ld16_be_mmu(CPUArchState *env, abi_ptr addr,
+   MemOpIdx oi, uintptr_t ra);
+Int128 cpu_ld16_le_mmu(CPUArchState *env, abi_ptr addr,
+   MemOpIdx oi, uintptr_t ra);
+
 void cpu_stb_mmu(CPUArchState *env, abi_ptr ptr, uint8_t val,
  MemOpIdx oi, uintptr_t ra);
 void cpu_stw_be_mmu(CPUArchState *env, abi_ptr ptr, uint16_t val,
@@ -235,6 +240,11 @@ void cpu_stl_le_mmu(CPUArchState *env, abi_ptr ptr, uint32_t val,
 void cpu_stq_le_mmu(CPUArchState *env, abi_ptr ptr, uint64_t val,
 MemOpIdx oi, uintptr_t ra);
 
+void cpu_st16_be_mmu(CPUArchState *env, abi_ptr addr, Int128 val,
+ MemOpIdx oi, uintptr_t ra);
+void cpu_st16_le_mmu(CPUArchState *env, abi_ptr addr, Int128 val,
+ MemOpIdx oi, uintptr_t ra);
+
 uint32_t cpu_atomic_cmpxchgb_mmu(CPUArchState *env, target_ulong addr,
  uint32_t cmpv, uint32_t newv,
  MemOpIdx oi, uintptr_t retaddr);
diff --git a/include/tcg/tcg-op.h b/include/tcg/tcg-op.h
index c4276767d1..e5f5b63c37 100644
--- a/include/tcg/tcg-op.h
+++ b/include/tcg/tcg-op.h
@@ -845,6 +845,8 @@ void tcg_gen_qemu_ld_i32(TCGv_i32, TCGv, TCGArg, MemOp);
 void tcg_gen_qemu_st_i32(TCGv_i32, TCGv, TCGArg, MemOp);
 void tcg_gen_qemu_ld_i64(TCGv_i64, TCGv, TCGArg, MemOp);
 void tcg_gen_qemu_st_i64(TCGv_i64, TCGv, TCGArg, MemOp);
+void tcg_gen_qemu_ld_i128(TCGv_i128, TCGv, TCGArg, MemOp);
+void tcg_gen_qemu_st_i128(TCGv_i128, TCGv, TCGArg, MemOp);
 
 static inline void tcg_gen_qemu_ld8u(TCGv ret, TCGv addr, int mem_index)
 {
diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index 04e270742e..4812d83961 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -2192,6 +2192,64 @@ uint64_t cpu_ldq_le_mmu(CPUArchState *env, abi_ptr addr,
 return cpu_load_helper(env, addr, oi, ra, helper_le_ldq_mmu);
 }
 
+Int128 cpu_ld16_be_mmu(CPUArchState *env, abi_ptr addr,
+   MemOpIdx oi, uintptr_t ra)
+{
+MemOp mop = get_memop(oi);
+int mmu_idx = get_mmuidx(oi);
+MemOpIdx new_oi;
+unsigned a_bits;
+uint64_t h, l;
+
+tcg_debug_assert((mop & (MO_BSWAP|MO_SSIZE)) == (MO_BE|MO_128));
+a_bits = get_alignment_bits(mop);
+
+/* Handle CPU specific unaligned behaviour */
+if (addr & ((1 << a_bits) - 1)) {
+cpu_unaligned_access(env_cpu(env), addr, MMU_DATA_LOAD,
+ mmu_idx, ra);
+}
+
+/* Construct an unaligned 64-bit replacement MemOpIdx. */
+mop = (mop & ~(MO_SIZE | MO_AMASK)) | MO_64 | MO_UNALN;
+new_oi = make_memop_idx(mop, mmu_idx);
+
+h = helper_be_ldq_mmu(env, addr, new_oi, ra);
+l = helper_be_ldq_mmu(env, addr + 8, new_oi, ra);
+
+qemu_plugin_vcpu_mem_cb(env_cpu(env), addr, oi, QEMU_PLUGIN_MEM_R);
+return int128_make128(l, h);
+}
+
+Int128 cpu_ld16_le_mmu(CPUArchState *env, abi_ptr addr,
+   MemOpIdx oi, uintptr_t ra)
+{
+MemOp mop = get_memop(oi);
+int mmu_idx = get_mmuidx(oi);
+MemOpIdx new_oi;
+unsigned a_bits;
+uint64_t h, l;
+
+tcg_debug_assert((mop & (MO_BSWAP|MO_SSIZE)) == (MO_LE|MO_128));
+a_bits = get_alignment_bits(mop);
+
+/* Handle CPU specific unaligned behaviour */
+if (addr & ((1 << a_bits) - 1)) {
+cpu_unaligned_access(env_cpu(env), addr, MMU_DATA_LOAD,
+ mmu_idx, ra);
+}
+
+/* Construct an unaligned 64-bit replacement MemOpIdx. */
+mop = (mop & ~(MO_SIZE | MO_AMASK)) | MO_64 | MO_UNALN;
+new_oi = make_memop_idx(mop, mmu_idx);
+
+l = helper_le_ldq_mmu(env, addr, new_oi, ra);
+h = helper_le_ldq_mmu(env, addr + 8, new_oi, ra);
+
+qemu_plugin_vcpu_mem_cb(env_cpu(env), addr, oi, QEMU_PLUGIN_MEM_R);
+return int128_make128(l, h);
+}
+
 /*
  * Store Helpers
  */
@@ -2546,6 +2604,60 @@ void cpu_stq_le_mmu(CPUArchState *env, target_ulong addr, uint64_t val,
 cpu_store_helper(env, addr, val, oi, retaddr, helper_le_stq_mmu);
 }
 
+void cpu_st16_be_mmu(CPUArchState *env, abi_ptr addr, Int128 val,
+ MemOpIdx o

[PULL 12/40] tcg/tci: Fix big-endian return register ordering

2023-02-04 Thread Richard Henderson

We expect the backend to require register pairs in
host-endian ordering, thus for big-endian the first
register of a pair contains the high part.
We were forcing R0 to contain the low part for calls.
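
To show why a plain memory copy yields host-endian pair ordering,
here is a small standalone sketch.  The helper names store_ret64
and host_is_little_endian are hypothetical; the two-element array
stands in for a pair of adjacent 32-bit registers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Copy a 64-bit result into two adjacent 32-bit register slots,
 * keeping the bytes in host memory order.
 */
static void store_ret64(uint32_t regs[2], uint64_t result)
{
    assert(regs != NULL);
    memcpy(regs, &result, 8);
}

/* Returns nonzero when the host stores the low byte first. */
static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    uint8_t first;

    memcpy(&first, &probe, 1);
    return first == 1;
}
```

On a little-endian host the first slot receives the low half; on
a big-endian host it receives the high half, with no explicit
endianness test in the copy itself.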

Reviewed-by: Alex Bennée 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 tcg/tci.c | 21 +++--
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index 05a24163d3..eeccdde8bc 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -520,27 +520,28 @@ uintptr_t QEMU_DISABLE_CFI tcg_qemu_tb_exec(CPUArchState *env,
 ffi_call(pptr[1], pptr[0], stack, call_slots);
 }
 
-/* Any result winds up "left-aligned" in the stack[0] slot. */
 switch (len) {
 case 0: /* void */
 break;
 case 1: /* uint32_t */
 /*
+ * The result winds up "left-aligned" in the stack[0] slot.
  * Note that libffi has an odd special case in that it will
  * always widen an integral result to ffi_arg.
  */
-if (sizeof(ffi_arg) == 4) {
-regs[TCG_REG_R0] = *(uint32_t *)stack;
-break;
-}
-/* fall through */
-case 2: /* uint64_t */
-if (TCG_TARGET_REG_BITS == 32) {
-tci_write_reg64(regs, TCG_REG_R1, TCG_REG_R0, stack[0]);
+if (sizeof(ffi_arg) == 8) {
+regs[TCG_REG_R0] = (uint32_t)stack[0];
 } else {
-regs[TCG_REG_R0] = stack[0];
+regs[TCG_REG_R0] = *(uint32_t *)stack;
 }
 break;
+case 2: /* uint64_t */
+/*
+ * For TCG_TARGET_REG_BITS == 32, the register pair
+ * must stay in host memory order.
+ */
+memcpy(&regs[TCG_REG_R0], stack, 8);
+break;
 default:
 g_assert_not_reached();
 }
-- 
2.34.1

[PULL 15/40] tcg: Add temp allocation for TCGv_i128

2023-02-04 Thread Richard Henderson

This enables allocation of i128.  The type is not yet
usable, as we have not yet added data movement ops.

Reviewed-by: Alex Bennée 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 include/tcg/tcg.h | 32 +
 tcg/tcg.c | 60 +--
 2 files changed, 74 insertions(+), 18 deletions(-)

diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
index 4d7e4107a9..59854f95b1 100644
--- a/include/tcg/tcg.h
+++ b/include/tcg/tcg.h
@@ -687,6 +687,11 @@ static inline TCGTemp *tcgv_i64_temp(TCGv_i64 v)
 return tcgv_i32_temp((TCGv_i32)v);
 }
 
+static inline TCGTemp *tcgv_i128_temp(TCGv_i128 v)
+{
+return tcgv_i32_temp((TCGv_i32)v);
+}
+
 static inline TCGTemp *tcgv_ptr_temp(TCGv_ptr v)
 {
 return tcgv_i32_temp((TCGv_i32)v);
@@ -707,6 +712,11 @@ static inline TCGArg tcgv_i64_arg(TCGv_i64 v)
 return temp_arg(tcgv_i64_temp(v));
 }
 
+static inline TCGArg tcgv_i128_arg(TCGv_i128 v)
+{
+return temp_arg(tcgv_i128_temp(v));
+}
+
 static inline TCGArg tcgv_ptr_arg(TCGv_ptr v)
 {
 return temp_arg(tcgv_ptr_temp(v));
@@ -728,6 +738,11 @@ static inline TCGv_i64 temp_tcgv_i64(TCGTemp *t)
 return (TCGv_i64)temp_tcgv_i32(t);
 }
 
+static inline TCGv_i128 temp_tcgv_i128(TCGTemp *t)
+{
+return (TCGv_i128)temp_tcgv_i32(t);
+}
+
 static inline TCGv_ptr temp_tcgv_ptr(TCGTemp *t)
 {
 return (TCGv_ptr)temp_tcgv_i32(t);
@@ -853,6 +868,11 @@ static inline void tcg_temp_free_i64(TCGv_i64 arg)
 tcg_temp_free_internal(tcgv_i64_temp(arg));
 }
 
+static inline void tcg_temp_free_i128(TCGv_i128 arg)
+{
+tcg_temp_free_internal(tcgv_i128_temp(arg));
+}
+
 static inline void tcg_temp_free_ptr(TCGv_ptr arg)
 {
 tcg_temp_free_internal(tcgv_ptr_temp(arg));
@@ -901,6 +921,18 @@ static inline TCGv_i64 tcg_temp_local_new_i64(void)
 return temp_tcgv_i64(t);
 }
 
+static inline TCGv_i128 tcg_temp_new_i128(void)
+{
+TCGTemp *t = tcg_temp_new_internal(TCG_TYPE_I128, false);
+return temp_tcgv_i128(t);
+}
+
+static inline TCGv_i128 tcg_temp_local_new_i128(void)
+{
+TCGTemp *t = tcg_temp_new_internal(TCG_TYPE_I128, true);
+return temp_tcgv_i128(t);
+}
+
 static inline TCGv_ptr tcg_global_mem_new_ptr(TCGv_ptr reg, intptr_t offset,
   const char *name)
 {
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 163913c95f..a4a3da6804 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -1273,26 +1273,45 @@ TCGTemp *tcg_temp_new_internal(TCGType type, bool temp_local)
 tcg_debug_assert(ts->base_type == type);
 tcg_debug_assert(ts->kind == kind);
 } else {
+int i, n;
+
+switch (type) {
+case TCG_TYPE_I32:
+case TCG_TYPE_V64:
+case TCG_TYPE_V128:
+case TCG_TYPE_V256:
+n = 1;
+break;
+case TCG_TYPE_I64:
+n = 64 / TCG_TARGET_REG_BITS;
+break;
+case TCG_TYPE_I128:
+n = 128 / TCG_TARGET_REG_BITS;
+break;
+default:
+g_assert_not_reached();
+}
+
 ts = tcg_temp_alloc(s);
-if (TCG_TARGET_REG_BITS == 32 && type == TCG_TYPE_I64) {
-TCGTemp *ts2 = tcg_temp_alloc(s);
+ts->base_type = type;
+ts->temp_allocated = 1;
+ts->kind = kind;
 
-ts->base_type = type;
-ts->type = TCG_TYPE_I32;
-ts->temp_allocated = 1;
-ts->kind = kind;
-
-tcg_debug_assert(ts2 == ts + 1);
-ts2->base_type = TCG_TYPE_I64;
-ts2->type = TCG_TYPE_I32;
-ts2->temp_allocated = 1;
-ts2->temp_subindex = 1;
-ts2->kind = kind;
-} else {
-ts->base_type = type;
+if (n == 1) {
 ts->type = type;
-ts->temp_allocated = 1;
-ts->kind = kind;
+} else {
+ts->type = TCG_TYPE_REG;
+
+for (i = 1; i < n; ++i) {
+TCGTemp *ts2 = tcg_temp_alloc(s);
+
+tcg_debug_assert(ts2 == ts + i);
+ts2->base_type = type;
+ts2->type = TCG_TYPE_REG;
+ts2->temp_allocated = 1;
+ts2->temp_subindex = i;
+ts2->kind = kind;
+}
 }
 }
 
@@ -3384,9 +3403,14 @@ static void temp_allocate_frame(TCGContext *s, TCGTemp *ts)
 case TCG_TYPE_V64:
 align = 8;
 break;
+case TCG_TYPE_I128:
 case TCG_TYPE_V128:
 case TCG_TYPE_V256:
-/* Note that we do not require aligned storage for V256. */
+/*
+ * Note that we do not require aligned storage for V256,
+ * and that we provide alignment for I128 to match V128,
+ * even if that's above what the host ABI requires.
+ */
 align = 16;
 break;
 default:
-- 
2.34.1

[PULL 35/40] target/s390x: Use tcg_gen_atomic_cmpxchg_i128 for CDSG

2023-02-04 Thread Richard Henderson

Acked-by: Ilya Leoshkevich 
Signed-off-by: Richard Henderson 
---
 target/s390x/helper.h|  2 --
 target/s390x/tcg/insn-data.h.inc |  2 +-
 target/s390x/tcg/mem_helper.c| 52 --
 target/s390x/tcg/translate.c | 55 +++-
 4 files changed, 33 insertions(+), 78 deletions(-)

diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index bccd3bfca6..341bc51ec2 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -35,8 +35,6 @@ DEF_HELPER_3(cxgb, i128, env, s64, i32)
 DEF_HELPER_3(celgb, i64, env, i64, i32)
 DEF_HELPER_3(cdlgb, i64, env, i64, i32)
 DEF_HELPER_3(cxlgb, i128, env, i64, i32)
-DEF_HELPER_4(cdsg, void, env, i64, i32, i32)
-DEF_HELPER_4(cdsg_parallel, void, env, i64, i32, i32)
 DEF_HELPER_4(csst, i32, env, i32, i64, i64)
 DEF_HELPER_4(csst_parallel, i32, env, i32, i64, i64)
 DEF_HELPER_FLAGS_3(aeb, TCG_CALL_NO_WG, i64, env, i64, i64)
diff --git a/target/s390x/tcg/insn-data.h.inc b/target/s390x/tcg/insn-data.h.inc
index 893f4b48db..9d2d35f084 100644
--- a/target/s390x/tcg/insn-data.h.inc
+++ b/target/s390x/tcg/insn-data.h.inc
@@ -276,7 +276,7 @@
 /* COMPARE DOUBLE AND SWAP */
 D(0xbb00, CDS, RS_a,  Z,   r3_D32, r1_D32, new, r1_D32, cs, 0, MO_TEUQ)
 D(0xeb31, CDSY,RSY_a, LD,  r3_D32, r1_D32, new, r1_D32, cs, 0, MO_TEUQ)
-C(0xeb3e, CDSG,RSY_a, Z,   0, 0, 0, 0, cdsg, 0)
+C(0xeb3e, CDSG,RSY_a, Z,   la2, r3_D64, 0, r1_D64, cdsg, 0)
 /* COMPARE AND SWAP AND STORE */
 C(0xc802, CSST,SSF,   CASS, la1, a2, 0, 0, csst, 0)
 
diff --git a/target/s390x/tcg/mem_helper.c b/target/s390x/tcg/mem_helper.c
index 49969abda7..d6725fd18c 100644
--- a/target/s390x/tcg/mem_helper.c
+++ b/target/s390x/tcg/mem_helper.c
@@ -1771,58 +1771,6 @@ uint32_t HELPER(trXX)(CPUS390XState *env, uint32_t r1, uint32_t r2,
 return cc;
 }
 
-void HELPER(cdsg)(CPUS390XState *env, uint64_t addr,
-  uint32_t r1, uint32_t r3)
-{
-uintptr_t ra = GETPC();
-Int128 cmpv = int128_make128(env->regs[r1 + 1], env->regs[r1]);
-Int128 newv = int128_make128(env->regs[r3 + 1], env->regs[r3]);
-Int128 oldv;
-uint64_t oldh, oldl;
-bool fail;
-
-check_alignment(env, addr, 16, ra);
-
-oldh = cpu_ldq_data_ra(env, addr + 0, ra);
-oldl = cpu_ldq_data_ra(env, addr + 8, ra);
-
-oldv = int128_make128(oldl, oldh);
-fail = !int128_eq(oldv, cmpv);
-if (fail) {
-newv = oldv;
-}
-
-cpu_stq_data_ra(env, addr + 0, int128_gethi(newv), ra);
-cpu_stq_data_ra(env, addr + 8, int128_getlo(newv), ra);
-
-env->cc_op = fail;
-env->regs[r1] = int128_gethi(oldv);
-env->regs[r1 + 1] = int128_getlo(oldv);
-}
-
-void HELPER(cdsg_parallel)(CPUS390XState *env, uint64_t addr,
-   uint32_t r1, uint32_t r3)
-{
-uintptr_t ra = GETPC();
-Int128 cmpv = int128_make128(env->regs[r1 + 1], env->regs[r1]);
-Int128 newv = int128_make128(env->regs[r3 + 1], env->regs[r3]);
-int mem_idx;
-MemOpIdx oi;
-Int128 oldv;
-bool fail;
-
-assert(HAVE_CMPXCHG128);
-
-mem_idx = cpu_mmu_index(env, false);
-oi = make_memop_idx(MO_TE | MO_128 | MO_ALIGN, mem_idx);
-oldv = cpu_atomic_cmpxchgo_be_mmu(env, addr, cmpv, newv, oi, ra);
-fail = !int128_eq(oldv, cmpv);
-
-env->cc_op = fail;
-env->regs[r1] = int128_gethi(oldv);
-env->regs[r1 + 1] = int128_getlo(oldv);
-}
-
 static uint32_t do_csst(CPUS390XState *env, uint32_t r3, uint64_t a1,
 uint64_t a2, bool parallel)
 {
diff --git a/target/s390x/tcg/translate.c b/target/s390x/tcg/translate.c
index d422a1e62b..9ea28b3e52 100644
--- a/target/s390x/tcg/translate.c
+++ b/target/s390x/tcg/translate.c
@@ -2224,31 +2224,25 @@ static DisasJumpType op_cs(DisasContext *s, DisasOps *o)
 static DisasJumpType op_cdsg(DisasContext *s, DisasOps *o)
 {
 int r1 = get_field(s, r1);
-int r3 = get_field(s, r3);
-int d2 = get_field(s, d2);
-int b2 = get_field(s, b2);
-DisasJumpType ret = DISAS_NEXT;
-TCGv_i64 addr;
-TCGv_i32 t_r1, t_r3;
 
-/* Note that R1:R1+1 = expected value and R3:R3+1 = new value.  */
-addr = get_address(s, 0, b2, d2);
-t_r1 = tcg_const_i32(r1);
-t_r3 = tcg_const_i32(r3);
-if (!(tb_cflags(s->base.tb) & CF_PARALLEL)) {
-gen_helper_cdsg(cpu_env, addr, t_r1, t_r3);
-} else if (HAVE_CMPXCHG128) {
-gen_helper_cdsg_parallel(cpu_env, addr, t_r1, t_r3);
-} else {
-gen_helper_exit_atomic(cpu_env);
-ret = DISAS_NORETURN;
-}
-tcg_temp_free_i64(addr);
-tcg_temp_free_i32(t_r1);
-tcg_temp_free_i32(t_r3);
+o->out_128 = tcg_temp_new_i128();
+tcg_gen_concat_i64_i128(o->out_128, regs[r1 + 1], regs[r1]);
 
-set_cc_static(s);
-return ret;
+/* Note out (R1:R1+1) = expected value and in2 (R3:R3+1) = new value.  */
+tcg_gen_atomic_cmpxchg_i128(o->out_128, o->addr1, o->out_128, o->in2_128,
+get_mem_index(s), MO

[PULL 37/40] target/i386: Split out gen_cmpxchg8b, gen_cmpxchg16b

2023-02-04 Thread Richard Henderson

Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 target/i386/tcg/translate.c | 48 -
 1 file changed, 31 insertions(+), 17 deletions(-)

diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index 7e0b2a709a..a82131d635 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -2993,6 +2993,34 @@ static void gen_sty_env_A0(DisasContext *s, int offset, bool align)
 #include "emit.c.inc"
 #include "decode-new.c.inc"
 
+static void gen_cmpxchg8b(DisasContext *s, CPUX86State *env, int modrm)
+{
+gen_lea_modrm(env, s, modrm);
+
+if ((s->prefix & PREFIX_LOCK) &&
+(tb_cflags(s->base.tb) & CF_PARALLEL)) {
+gen_helper_cmpxchg8b(cpu_env, s->A0);
+} else {
+gen_helper_cmpxchg8b_unlocked(cpu_env, s->A0);
+}
+set_cc_op(s, CC_OP_EFLAGS);
+}
+
+#ifdef TARGET_X86_64
+static void gen_cmpxchg16b(DisasContext *s, CPUX86State *env, int modrm)
+{
+gen_lea_modrm(env, s, modrm);
+
+if ((s->prefix & PREFIX_LOCK) &&
+(tb_cflags(s->base.tb) & CF_PARALLEL)) {
+gen_helper_cmpxchg16b(cpu_env, s->A0);
+} else {
+gen_helper_cmpxchg16b_unlocked(cpu_env, s->A0);
+}
+set_cc_op(s, CC_OP_EFLAGS);
+}
+#endif
+
 /* convert one instruction. s->base.is_jmp is set if the translation must
be stopped. Return the next pc value */
 static bool disas_insn(DisasContext *s, CPUState *cpu)
@@ -3844,28 +3872,14 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
 if (!(s->cpuid_ext_features & CPUID_EXT_CX16)) {
 goto illegal_op;
 }
-gen_lea_modrm(env, s, modrm);
-if ((s->prefix & PREFIX_LOCK) &&
-(tb_cflags(s->base.tb) & CF_PARALLEL)) {
-gen_helper_cmpxchg16b(cpu_env, s->A0);
-} else {
-gen_helper_cmpxchg16b_unlocked(cpu_env, s->A0);
-}
-set_cc_op(s, CC_OP_EFLAGS);
+gen_cmpxchg16b(s, env, modrm);
 break;
 }
-#endif
+#endif
 if (!(s->cpuid_features & CPUID_CX8)) {
 goto illegal_op;
 }
-gen_lea_modrm(env, s, modrm);
-if ((s->prefix & PREFIX_LOCK) &&
-(tb_cflags(s->base.tb) & CF_PARALLEL)) {
-gen_helper_cmpxchg8b(cpu_env, s->A0);
-} else {
-gen_helper_cmpxchg8b_unlocked(cpu_env, s->A0);
-}
-set_cc_op(s, CC_OP_EFLAGS);
+gen_cmpxchg8b(s, env, modrm);
 break;
 
 case 7: /* RDSEED */
-- 
2.34.1

[PULL 09/40] tcg: Add TCG_CALL_RET_BY_VEC

2023-02-04 Thread Richard Henderson

This will be used by _WIN64 to return i128.  Not yet used,
because allocation is not yet enabled.

Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 tcg/tcg-internal.h |  1 +
 tcg/tcg.c  | 19 +++
 2 files changed, 20 insertions(+)

diff --git a/tcg/tcg-internal.h b/tcg/tcg-internal.h
index 2ec1ea01df..33f1d8b411 100644
--- a/tcg/tcg-internal.h
+++ b/tcg/tcg-internal.h
@@ -37,6 +37,7 @@
 typedef enum {
 TCG_CALL_RET_NORMAL, /* by registers */
 TCG_CALL_RET_BY_REF, /* for i128, by reference */
+TCG_CALL_RET_BY_VEC, /* for i128, by vector register */
 } TCGCallReturnKind;
 
 typedef enum {
diff --git a/tcg/tcg.c b/tcg/tcg.c
index a77483eee8..098be83b00 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -752,6 +752,10 @@ static void init_call_layout(TCGHelperInfo *info)
 /* Query the last register now to trigger any assert early. */
 tcg_target_call_oarg_reg(info->out_kind, info->nr_out - 1);
 break;
+case TCG_CALL_RET_BY_VEC:
+/* Query the single register now to trigger any assert early. */
+tcg_target_call_oarg_reg(TCG_CALL_RET_BY_VEC, 0);
+break;
 case TCG_CALL_RET_BY_REF:
 /*
  * Allocate the first argument to the output.
@@ -4605,6 +4609,21 @@ static void tcg_reg_alloc_call(TCGContext *s, TCGOp *op)
 }
 break;
 
+case TCG_CALL_RET_BY_VEC:
+{
+TCGTemp *ts = arg_temp(op->args[0]);
+
+tcg_debug_assert(ts->base_type == TCG_TYPE_I128);
+tcg_debug_assert(ts->temp_subindex == 0);
+if (!ts->mem_allocated) {
+temp_allocate_frame(s, ts);
+}
+tcg_out_st(s, TCG_TYPE_V128,
+   tcg_target_call_oarg_reg(TCG_CALL_RET_BY_VEC, 0),
+   ts->mem_base->reg, ts->mem_offset);
+}
+/* fall through to mark all parts in memory */
+
 case TCG_CALL_RET_BY_REF:
 /* The callee has performed a write through the reference. */
 for (i = 0; i < nb_oargs; i++) {
-- 
2.34.1

[PULL 38/40] target/i386: Inline cmpxchg8b

2023-02-04 Thread Richard Henderson

Use tcg_gen_atomic_cmpxchg_i64 for the atomic case,
and tcg_gen_nonatomic_cmpxchg_i64 otherwise.
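
For reference, the guest-visible behaviour being inlined can be
modelled in plain C as below.  This is only a sketch: regs8b,
cmpxchg64 and cmpxchg8b_model are hypothetical names, and C11
atomics stand in for the TCG operations.  The non-LOCK path does
a separate load and store, and always stores, as the removed
helper did.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register state touched by CMPXCHG8B. */
typedef struct {
    uint32_t eax, ebx, ecx, edx;
    bool zf;
} regs8b;

/* Returns the old memory value; stores val only if old == cmp. */
static uint64_t cmpxchg64(_Atomic uint64_t *mem, uint64_t cmp,
                          uint64_t val, bool lock)
{
    uint64_t old = cmp;

    if (lock) {
        /* On failure, old is updated to the current memory value. */
        atomic_compare_exchange_strong(mem, &old, val);
    } else {
        /* Not atomic as a whole: separate load, then store. */
        old = atomic_load(mem);
        atomic_store(mem, old == cmp ? val : old);
    }
    return old;
}

static void cmpxchg8b_model(regs8b *r, _Atomic uint64_t *mem, bool lock)
{
    uint64_t cmp, val, old;

    assert(r != NULL && mem != NULL);
    cmp = ((uint64_t)r->edx << 32) | r->eax;    /* EDX:EAX */
    val = ((uint64_t)r->ecx << 32) | r->ebx;    /* ECX:EBX */
    old = cmpxchg64(mem, cmp, val, lock);

    r->zf = (old == cmp);
    if (!r->zf) {
        /* On failure, EDX:EAX receives the value found in memory. */
        r->eax = (uint32_t)old;
        r->edx = (uint32_t)(old >> 32);
    }
}
```

The conditional register writeback corresponds to the x86_64
case described in the comment in the translate.c hunk; for a
32-bit guest the writeback could be done unconditionally.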

Reviewed-by: Alex Bennée 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 target/i386/helper.h |  2 --
 target/i386/tcg/mem_helper.c | 57 
 target/i386/tcg/translate.c  | 54 ++
 3 files changed, 49 insertions(+), 64 deletions(-)

diff --git a/target/i386/helper.h b/target/i386/helper.h
index b7de5429ef..2df8049f91 100644
--- a/target/i386/helper.h
+++ b/target/i386/helper.h
@@ -66,8 +66,6 @@ DEF_HELPER_1(rsm, void, env)
 #endif /* !CONFIG_USER_ONLY */
 
 DEF_HELPER_2(into, void, env, int)
-DEF_HELPER_2(cmpxchg8b_unlocked, void, env, tl)
-DEF_HELPER_2(cmpxchg8b, void, env, tl)
 #ifdef TARGET_X86_64
 DEF_HELPER_2(cmpxchg16b_unlocked, void, env, tl)
 DEF_HELPER_2(cmpxchg16b, void, env, tl)
diff --git a/target/i386/tcg/mem_helper.c b/target/i386/tcg/mem_helper.c
index e3cdafd2d4..814786bb87 100644
--- a/target/i386/tcg/mem_helper.c
+++ b/target/i386/tcg/mem_helper.c
@@ -27,63 +27,6 @@
 #include "tcg/tcg.h"
 #include "helper-tcg.h"
 
-void helper_cmpxchg8b_unlocked(CPUX86State *env, target_ulong a0)
-{
-uintptr_t ra = GETPC();
-uint64_t oldv, cmpv, newv;
-int eflags;
-
-eflags = cpu_cc_compute_all(env, CC_OP);
-
-cmpv = deposit64(env->regs[R_EAX], 32, 32, env->regs[R_EDX]);
-newv = deposit64(env->regs[R_EBX], 32, 32, env->regs[R_ECX]);
-
-oldv = cpu_ldq_data_ra(env, a0, ra);
-newv = (cmpv == oldv ? newv : oldv);
-/* always do the store */
-cpu_stq_data_ra(env, a0, newv, ra);
-
-if (oldv == cmpv) {
-eflags |= CC_Z;
-} else {
-env->regs[R_EAX] = (uint32_t)oldv;
-env->regs[R_EDX] = (uint32_t)(oldv >> 32);
-eflags &= ~CC_Z;
-}
-CC_SRC = eflags;
-}
-
-void helper_cmpxchg8b(CPUX86State *env, target_ulong a0)
-{
-#ifdef CONFIG_ATOMIC64
-uint64_t oldv, cmpv, newv;
-int eflags;
-
-eflags = cpu_cc_compute_all(env, CC_OP);
-
-cmpv = deposit64(env->regs[R_EAX], 32, 32, env->regs[R_EDX]);
-newv = deposit64(env->regs[R_EBX], 32, 32, env->regs[R_ECX]);
-
-{
-uintptr_t ra = GETPC();
-int mem_idx = cpu_mmu_index(env, false);
-MemOpIdx oi = make_memop_idx(MO_TEUQ, mem_idx);
-oldv = cpu_atomic_cmpxchgq_le_mmu(env, a0, cmpv, newv, oi, ra);
-}
-
-if (oldv == cmpv) {
-eflags |= CC_Z;
-} else {
-env->regs[R_EAX] = (uint32_t)oldv;
-env->regs[R_EDX] = (uint32_t)(oldv >> 32);
-eflags &= ~CC_Z;
-}
-CC_SRC = eflags;
-#else
-cpu_loop_exit_atomic(env_cpu(env), GETPC());
-#endif /* CONFIG_ATOMIC64 */
-}
-
 #ifdef TARGET_X86_64
 void helper_cmpxchg16b_unlocked(CPUX86State *env, target_ulong a0)
 {
diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index a82131d635..b542b084a6 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -2995,15 +2995,59 @@ static void gen_sty_env_A0(DisasContext *s, int offset, bool align)
 
 static void gen_cmpxchg8b(DisasContext *s, CPUX86State *env, int modrm)
 {
+TCGv_i64 cmp, val, old;
+TCGv Z;
+
 gen_lea_modrm(env, s, modrm);
 
-if ((s->prefix & PREFIX_LOCK) &&
-(tb_cflags(s->base.tb) & CF_PARALLEL)) {
-gen_helper_cmpxchg8b(cpu_env, s->A0);
+cmp = tcg_temp_new_i64();
+val = tcg_temp_new_i64();
+old = tcg_temp_new_i64();
+
+/* Construct the comparison values from the register pair. */
+tcg_gen_concat_tl_i64(cmp, cpu_regs[R_EAX], cpu_regs[R_EDX]);
+tcg_gen_concat_tl_i64(val, cpu_regs[R_EBX], cpu_regs[R_ECX]);
+
+/* Only require atomic with LOCK; non-parallel handled in generator. */
+if (s->prefix & PREFIX_LOCK) {
+tcg_gen_atomic_cmpxchg_i64(old, s->A0, cmp, val, s->mem_index, MO_TEUQ);
 } else {
-gen_helper_cmpxchg8b_unlocked(cpu_env, s->A0);
+tcg_gen_nonatomic_cmpxchg_i64(old, s->A0, cmp, val,
+  s->mem_index, MO_TEUQ);
 }
-set_cc_op(s, CC_OP_EFLAGS);
+tcg_temp_free_i64(val);
+
+/* Set tmp0 to match the required value of Z. */
+tcg_gen_setcond_i64(TCG_COND_EQ, cmp, old, cmp);
+Z = tcg_temp_new();
+tcg_gen_trunc_i64_tl(Z, cmp);
+tcg_temp_free_i64(cmp);
+
+/*
+ * Extract the result values for the register pair.
+ * For 32-bit, we may do this unconditionally, because on success (Z=1),
+ * the old value matches the previous value in EDX:EAX.  For x86_64,
+ * the store must be conditional, because we must leave the source
+ * registers unchanged on success, and zero-extend the writeback
+ * on failure (Z=0).
+ */
+if (TARGET_LONG_BITS == 32) {
+tcg_gen_extr_i64_tl(cpu_regs[R_EAX], cpu_regs[R_EDX], old);
+} else {
+TCGv zero = tcg_constant_tl(0);
+
+tcg_gen_extr_i64_tl(s->T0, s->T1, old);
+tcg_gen_movcond_tl(TCG_COND_EQ, cpu_regs[R_EAX], Z, zero,
+

[PULL 20/40] target/arm: Use tcg_gen_atomic_cmpxchg_i128 for STXP

2023-02-04 Thread Richard Henderson

Signed-off-by: Richard Henderson 
Reviewed-by: Peter Maydell 
Message-Id: <20221112042555.2622152-2-richard.hender...@linaro.org>
---
 target/arm/helper-a64.h|   6 ---
 target/arm/helper-a64.c| 104 -
 target/arm/translate-a64.c |  60 -
 3 files changed, 35 insertions(+), 135 deletions(-)

diff --git a/target/arm/helper-a64.h b/target/arm/helper-a64.h
index 7b706571bb..94065d1917 100644
--- a/target/arm/helper-a64.h
+++ b/target/arm/helper-a64.h
@@ -50,12 +50,6 @@ DEF_HELPER_FLAGS_2(frecpx_f16, TCG_CALL_NO_RWG, f16, f16, ptr)
 DEF_HELPER_FLAGS_2(fcvtx_f64_to_f32, TCG_CALL_NO_RWG, f32, f64, env)
 DEF_HELPER_FLAGS_3(crc32_64, TCG_CALL_NO_RWG_SE, i64, i64, i64, i32)
 DEF_HELPER_FLAGS_3(crc32c_64, TCG_CALL_NO_RWG_SE, i64, i64, i64, i32)
-DEF_HELPER_FLAGS_4(paired_cmpxchg64_le, TCG_CALL_NO_WG, i64, env, i64, i64, i64)
-DEF_HELPER_FLAGS_4(paired_cmpxchg64_le_parallel, TCG_CALL_NO_WG,
-   i64, env, i64, i64, i64)
-DEF_HELPER_FLAGS_4(paired_cmpxchg64_be, TCG_CALL_NO_WG, i64, env, i64, i64, i64)
-DEF_HELPER_FLAGS_4(paired_cmpxchg64_be_parallel, TCG_CALL_NO_WG,
-   i64, env, i64, i64, i64)
 DEF_HELPER_5(casp_le_parallel, void, env, i32, i64, i64, i64)
 DEF_HELPER_5(casp_be_parallel, void, env, i32, i64, i64, i64)
 DEF_HELPER_FLAGS_3(advsimd_maxh, TCG_CALL_NO_RWG, f16, f16, f16, ptr)
diff --git a/target/arm/helper-a64.c b/target/arm/helper-a64.c
index 77a8502b6b..7dbdb2c233 100644
--- a/target/arm/helper-a64.c
+++ b/target/arm/helper-a64.c
@@ -505,110 +505,6 @@ uint64_t HELPER(crc32c_64)(uint64_t acc, uint64_t val, uint32_t bytes)
 return crc32c(acc, buf, bytes) ^ 0x;
 }
 
-uint64_t HELPER(paired_cmpxchg64_le)(CPUARMState *env, uint64_t addr,
- uint64_t new_lo, uint64_t new_hi)
-{
-Int128 cmpv = int128_make128(env->exclusive_val, env->exclusive_high);
-Int128 newv = int128_make128(new_lo, new_hi);
-Int128 oldv;
-uintptr_t ra = GETPC();
-uint64_t o0, o1;
-bool success;
-int mem_idx = cpu_mmu_index(env, false);
-MemOpIdx oi0 = make_memop_idx(MO_LEUQ | MO_ALIGN_16, mem_idx);
-MemOpIdx oi1 = make_memop_idx(MO_LEUQ, mem_idx);
-
-o0 = cpu_ldq_le_mmu(env, addr + 0, oi0, ra);
-o1 = cpu_ldq_le_mmu(env, addr + 8, oi1, ra);
-oldv = int128_make128(o0, o1);
-
-success = int128_eq(oldv, cmpv);
-if (success) {
-cpu_stq_le_mmu(env, addr + 0, int128_getlo(newv), oi1, ra);
-cpu_stq_le_mmu(env, addr + 8, int128_gethi(newv), oi1, ra);
-}
-
-return !success;
-}
-
-uint64_t HELPER(paired_cmpxchg64_le_parallel)(CPUARMState *env, uint64_t addr,
-  uint64_t new_lo, uint64_t new_hi)
-{
-Int128 oldv, cmpv, newv;
-uintptr_t ra = GETPC();
-bool success;
-int mem_idx;
-MemOpIdx oi;
-
-assert(HAVE_CMPXCHG128);
-
-mem_idx = cpu_mmu_index(env, false);
-oi = make_memop_idx(MO_LE | MO_128 | MO_ALIGN, mem_idx);
-
-cmpv = int128_make128(env->exclusive_val, env->exclusive_high);
-newv = int128_make128(new_lo, new_hi);
-oldv = cpu_atomic_cmpxchgo_le_mmu(env, addr, cmpv, newv, oi, ra);
-
-success = int128_eq(oldv, cmpv);
-return !success;
-}
-
-uint64_t HELPER(paired_cmpxchg64_be)(CPUARMState *env, uint64_t addr,
- uint64_t new_lo, uint64_t new_hi)
-{
-/*
- * High and low need to be switched here because this is not actually a
- * 128bit store but two doublewords stored consecutively
- */
-Int128 cmpv = int128_make128(env->exclusive_high, env->exclusive_val);
-Int128 newv = int128_make128(new_hi, new_lo);
-Int128 oldv;
-uintptr_t ra = GETPC();
-uint64_t o0, o1;
-bool success;
-int mem_idx = cpu_mmu_index(env, false);
-MemOpIdx oi0 = make_memop_idx(MO_BEUQ | MO_ALIGN_16, mem_idx);
-MemOpIdx oi1 = make_memop_idx(MO_BEUQ, mem_idx);
-
-o1 = cpu_ldq_be_mmu(env, addr + 0, oi0, ra);
-o0 = cpu_ldq_be_mmu(env, addr + 8, oi1, ra);
-oldv = int128_make128(o0, o1);
-
-success = int128_eq(oldv, cmpv);
-if (success) {
-cpu_stq_be_mmu(env, addr + 0, int128_gethi(newv), oi1, ra);
-cpu_stq_be_mmu(env, addr + 8, int128_getlo(newv), oi1, ra);
-}
-
-return !success;
-}
-
-uint64_t HELPER(paired_cmpxchg64_be_parallel)(CPUARMState *env, uint64_t addr,
-  uint64_t new_lo, uint64_t new_hi)
-{
-Int128 oldv, cmpv, newv;
-uintptr_t ra = GETPC();
-bool success;
-int mem_idx;
-MemOpIdx oi;
-
-assert(HAVE_CMPXCHG128);
-
-mem_idx = cpu_mmu_index(env, false);
-oi = make_memop_idx(MO_BE | MO_128 | MO_ALIGN, mem_idx);
-
-/*
- * High and low need to be switched here because this is not actually a
- * 128bit store but two doublewords stored consecutively
- */
-cmpv = int128_make128(env->exclusive_high, env->exclusive_val);
-newv = int128_make128(ne

[PULL 19/40] tcg: Split out tcg_gen_nonatomic_cmpxchg_i{32,64}

2023-02-04 Thread Richard Henderson

Normally this is automatically handled by the CF_PARALLEL checks
within tcg_gen_atomic_cmpxchg_i{32,64}, but x86 has a special
case of !PREFIX_LOCK where it always wants the non-atomic version.

Split these out so that x86 does not have to roll its own.
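
As an illustration only, the semantics of the non-atomic expansion can be
modelled in plain C. This is a sketch, not QEMU code: the function name
model_nonatomic_cmpxchg32 is hypothetical, and the guest memory access is
reduced to a host pointer. It mirrors the load / movcond / store sequence
in the patch, including the fact that the store is always performed.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Plain-C model of a non-atomic compare-and-exchange on a 32-bit word.
 * Hypothetical helper for illustration; not a QEMU function.
 * The store always happens: the new value on a match, otherwise the
 * value that was just loaded is written back unchanged.
 */
static uint32_t model_nonatomic_cmpxchg32(uint32_t *mem, uint32_t cmpv,
                                          uint32_t newv)
{
    uint32_t oldv = *mem;                   /* load the current value */
    *mem = (oldv == cmpv) ? newv : oldv;    /* select, then store */
    return oldv;                            /* caller compares with cmpv */
}
```

The caller decides success by comparing the returned value with cmpv,
which is how the x86 cmpxchg8b translation derives Z.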

Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 include/tcg/tcg-op.h |   4 ++
 tcg/tcg-op.c | 154 +++
 2 files changed, 101 insertions(+), 57 deletions(-)

diff --git a/include/tcg/tcg-op.h b/include/tcg/tcg-op.h
index 31bf3d287e..839d91c0c7 100644
--- a/include/tcg/tcg-op.h
+++ b/include/tcg/tcg-op.h
@@ -910,6 +910,10 @@ void tcg_gen_atomic_cmpxchg_i64(TCGv_i64, TCGv, TCGv_i64, TCGv_i64,
 void tcg_gen_atomic_cmpxchg_i128(TCGv_i128, TCGv, TCGv_i128, TCGv_i128,
  TCGArg, MemOp);
 
+void tcg_gen_nonatomic_cmpxchg_i32(TCGv_i32, TCGv, TCGv_i32, TCGv_i32,
+   TCGArg, MemOp);
+void tcg_gen_nonatomic_cmpxchg_i64(TCGv_i64, TCGv, TCGv_i64, TCGv_i64,
+   TCGArg, MemOp);
 void tcg_gen_nonatomic_cmpxchg_i128(TCGv_i128, TCGv, TCGv_i128, TCGv_i128,
 TCGArg, MemOp);
 
diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index 5811ecd3e7..c581ae77c4 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -3325,82 +3325,122 @@ static void * const table_cmpxchg[(MO_SIZE | MO_BSWAP) + 1] = {
 WITH_ATOMIC128([MO_128 | MO_BE] = gen_helper_atomic_cmpxchgo_be)
 };
 
+void tcg_gen_nonatomic_cmpxchg_i32(TCGv_i32 retv, TCGv addr, TCGv_i32 cmpv,
+   TCGv_i32 newv, TCGArg idx, MemOp memop)
+{
+TCGv_i32 t1 = tcg_temp_new_i32();
+TCGv_i32 t2 = tcg_temp_new_i32();
+
+tcg_gen_ext_i32(t2, cmpv, memop & MO_SIZE);
+
+tcg_gen_qemu_ld_i32(t1, addr, idx, memop & ~MO_SIGN);
+tcg_gen_movcond_i32(TCG_COND_EQ, t2, t1, t2, newv, t1);
+tcg_gen_qemu_st_i32(t2, addr, idx, memop);
+tcg_temp_free_i32(t2);
+
+if (memop & MO_SIGN) {
+tcg_gen_ext_i32(retv, t1, memop);
+} else {
+tcg_gen_mov_i32(retv, t1);
+}
+tcg_temp_free_i32(t1);
+}
+
 void tcg_gen_atomic_cmpxchg_i32(TCGv_i32 retv, TCGv addr, TCGv_i32 cmpv,
 TCGv_i32 newv, TCGArg idx, MemOp memop)
 {
-memop = tcg_canonicalize_memop(memop, 0, 0);
+gen_atomic_cx_i32 gen;
+MemOpIdx oi;
 
 if (!(tcg_ctx->gen_tb->cflags & CF_PARALLEL)) {
-TCGv_i32 t1 = tcg_temp_new_i32();
-TCGv_i32 t2 = tcg_temp_new_i32();
-
-tcg_gen_ext_i32(t2, cmpv, memop & MO_SIZE);
-
-tcg_gen_qemu_ld_i32(t1, addr, idx, memop & ~MO_SIGN);
-tcg_gen_movcond_i32(TCG_COND_EQ, t2, t1, t2, newv, t1);
-tcg_gen_qemu_st_i32(t2, addr, idx, memop);
-tcg_temp_free_i32(t2);
-
-if (memop & MO_SIGN) {
-tcg_gen_ext_i32(retv, t1, memop);
-} else {
-tcg_gen_mov_i32(retv, t1);
-}
-tcg_temp_free_i32(t1);
-} else {
-gen_atomic_cx_i32 gen;
-MemOpIdx oi;
-
-gen = table_cmpxchg[memop & (MO_SIZE | MO_BSWAP)];
-tcg_debug_assert(gen != NULL);
-
-oi = make_memop_idx(memop & ~MO_SIGN, idx);
-gen(retv, cpu_env, addr, cmpv, newv, tcg_constant_i32(oi));
-
-if (memop & MO_SIGN) {
-tcg_gen_ext_i32(retv, retv, memop);
-}
+tcg_gen_nonatomic_cmpxchg_i32(retv, addr, cmpv, newv, idx, memop);
+return;
 }
+
+memop = tcg_canonicalize_memop(memop, 0, 0);
+gen = table_cmpxchg[memop & (MO_SIZE | MO_BSWAP)];
+tcg_debug_assert(gen != NULL);
+
+oi = make_memop_idx(memop & ~MO_SIGN, idx);
+gen(retv, cpu_env, addr, cmpv, newv, tcg_constant_i32(oi));
+
+if (memop & MO_SIGN) {
+tcg_gen_ext_i32(retv, retv, memop);
+}
+}
+
+void tcg_gen_nonatomic_cmpxchg_i64(TCGv_i64 retv, TCGv addr, TCGv_i64 cmpv,
+   TCGv_i64 newv, TCGArg idx, MemOp memop)
+{
+TCGv_i64 t1, t2;
+
+if (TCG_TARGET_REG_BITS == 32 && (memop & MO_SIZE) < MO_64) {
+tcg_gen_nonatomic_cmpxchg_i32(TCGV_LOW(retv), addr, TCGV_LOW(cmpv),
+  TCGV_LOW(newv), idx, memop);
+if (memop & MO_SIGN) {
+tcg_gen_sari_i32(TCGV_HIGH(retv), TCGV_LOW(retv), 31);
+} else {
+tcg_gen_movi_i32(TCGV_HIGH(retv), 0);
+}
+return;
+}
+
+t1 = tcg_temp_new_i64();
+t2 = tcg_temp_new_i64();
+
+tcg_gen_ext_i64(t2, cmpv, memop & MO_SIZE);
+
+tcg_gen_qemu_ld_i64(t1, addr, idx, memop & ~MO_SIGN);
+tcg_gen_movcond_i64(TCG_COND_EQ, t2, t1, t2, newv, t1);
+tcg_gen_qemu_st_i64(t2, addr, idx, memop);
+tcg_temp_free_i64(t2);
+
+if (memop & MO_SIGN) {
+tcg_gen_ext_i64(retv, t1, memop);
+} else {
+tcg_gen_mov_i64(retv, t1);
+}
+tcg_temp_free_i64(t1);
 }
 
 void tcg_gen_atomic_cmpxchg_i64(TCGv_i64 retv, TCGv addr, TCGv_i64 cmpv,
 TCGv

[PULL 18/40] tcg: Add tcg_gen_{non}atomic_cmpxchg_i128

2023-02-04 Thread Richard Henderson

This will allow targets to avoid rolling their own.

Reviewed-by: Alex Bennée 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 accel/tcg/tcg-runtime.h   | 11 +
 include/tcg/tcg-op.h  |  5 +++
 tcg/tcg-op.c  | 85 +++
 accel/tcg/atomic_common.c.inc | 45 +++
 4 files changed, 146 insertions(+)

diff --git a/accel/tcg/tcg-runtime.h b/accel/tcg/tcg-runtime.h
index 37cbd722bf..e141a6ab24 100644
--- a/accel/tcg/tcg-runtime.h
+++ b/accel/tcg/tcg-runtime.h
@@ -55,6 +55,17 @@ DEF_HELPER_FLAGS_5(atomic_cmpxchgq_be, TCG_CALL_NO_WG,
 DEF_HELPER_FLAGS_5(atomic_cmpxchgq_le, TCG_CALL_NO_WG,
i64, env, tl, i64, i64, i32)
 #endif
+#ifdef CONFIG_CMPXCHG128
+DEF_HELPER_FLAGS_5(atomic_cmpxchgo_be, TCG_CALL_NO_WG,
+   i128, env, tl, i128, i128, i32)
+DEF_HELPER_FLAGS_5(atomic_cmpxchgo_le, TCG_CALL_NO_WG,
+   i128, env, tl, i128, i128, i32)
+#endif
+
+DEF_HELPER_FLAGS_5(nonatomic_cmpxchgo_be, TCG_CALL_NO_WG,
+   i128, env, tl, i128, i128, i32)
+DEF_HELPER_FLAGS_5(nonatomic_cmpxchgo_le, TCG_CALL_NO_WG,
+   i128, env, tl, i128, i128, i32)
 
 #ifdef CONFIG_ATOMIC64
 #define GEN_ATOMIC_HELPERS(NAME)  \
diff --git a/include/tcg/tcg-op.h b/include/tcg/tcg-op.h
index e5f5b63c37..31bf3d287e 100644
--- a/include/tcg/tcg-op.h
+++ b/include/tcg/tcg-op.h
@@ -907,6 +907,11 @@ void tcg_gen_atomic_cmpxchg_i32(TCGv_i32, TCGv, TCGv_i32, TCGv_i32,
 TCGArg, MemOp);
 void tcg_gen_atomic_cmpxchg_i64(TCGv_i64, TCGv, TCGv_i64, TCGv_i64,
 TCGArg, MemOp);
+void tcg_gen_atomic_cmpxchg_i128(TCGv_i128, TCGv, TCGv_i128, TCGv_i128,
+ TCGArg, MemOp);
+
+void tcg_gen_nonatomic_cmpxchg_i128(TCGv_i128, TCGv, TCGv_i128, TCGv_i128,
+TCGArg, MemOp);
 
 void tcg_gen_atomic_xchg_i32(TCGv_i32, TCGv, TCGv_i32, TCGArg, MemOp);
 void tcg_gen_atomic_xchg_i64(TCGv_i64, TCGv, TCGv_i64, TCGArg, MemOp);
diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index 33ef325f6e..5811ecd3e7 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -3295,6 +3295,8 @@ typedef void (*gen_atomic_cx_i32)(TCGv_i32, TCGv_env, TCGv,
   TCGv_i32, TCGv_i32, TCGv_i32);
 typedef void (*gen_atomic_cx_i64)(TCGv_i64, TCGv_env, TCGv,
   TCGv_i64, TCGv_i64, TCGv_i32);
+typedef void (*gen_atomic_cx_i128)(TCGv_i128, TCGv_env, TCGv,
+   TCGv_i128, TCGv_i128, TCGv_i32);
 typedef void (*gen_atomic_op_i32)(TCGv_i32, TCGv_env, TCGv,
   TCGv_i32, TCGv_i32);
 typedef void (*gen_atomic_op_i64)(TCGv_i64, TCGv_env, TCGv,
@@ -3305,6 +3307,11 @@ typedef void (*gen_atomic_op_i64)(TCGv_i64, TCGv_env, TCGv,
 #else
 # define WITH_ATOMIC64(X)
 #endif
+#ifdef CONFIG_CMPXCHG128
+# define WITH_ATOMIC128(X) X,
+#else
+# define WITH_ATOMIC128(X)
+#endif
 
 static void * const table_cmpxchg[(MO_SIZE | MO_BSWAP) + 1] = {
 [MO_8] = gen_helper_atomic_cmpxchgb,
@@ -3314,6 +3321,8 @@ static void * const table_cmpxchg[(MO_SIZE | MO_BSWAP) + 1] = {
 [MO_32 | MO_BE] = gen_helper_atomic_cmpxchgl_be,
 WITH_ATOMIC64([MO_64 | MO_LE] = gen_helper_atomic_cmpxchgq_le)
 WITH_ATOMIC64([MO_64 | MO_BE] = gen_helper_atomic_cmpxchgq_be)
+WITH_ATOMIC128([MO_128 | MO_LE] = gen_helper_atomic_cmpxchgo_le)
+WITH_ATOMIC128([MO_128 | MO_BE] = gen_helper_atomic_cmpxchgo_be)
 };
 
 void tcg_gen_atomic_cmpxchg_i32(TCGv_i32 retv, TCGv addr, TCGv_i32 cmpv,
@@ -3412,6 +3421,82 @@ void tcg_gen_atomic_cmpxchg_i64(TCGv_i64 retv, TCGv addr, TCGv_i64 cmpv,
 }
 }
 
+void tcg_gen_nonatomic_cmpxchg_i128(TCGv_i128 retv, TCGv addr, TCGv_i128 cmpv,
+TCGv_i128 newv, TCGArg idx, MemOp memop)
+{
+if (TCG_TARGET_REG_BITS == 32) {
+/* Inline expansion below is simply too large for 32-bit hosts. */
+gen_atomic_cx_i128 gen = ((memop & MO_BSWAP) == MO_LE
+  ? gen_helper_nonatomic_cmpxchgo_le 
+  : gen_helper_nonatomic_cmpxchgo_be);
+MemOpIdx oi = make_memop_idx(memop, idx);
+
+tcg_debug_assert((memop & MO_SIZE) == MO_128);
+tcg_debug_assert((memop & MO_SIGN) == 0);
+
+gen(retv, cpu_env, addr, cmpv, newv, tcg_constant_i32(oi));
+} else {
+TCGv_i128 oldv = tcg_temp_new_i128();
+TCGv_i128 tmpv = tcg_temp_new_i128();
+TCGv_i64 t0 = tcg_temp_new_i64();
+TCGv_i64 t1 = tcg_temp_new_i64();
+TCGv_i64 z = tcg_constant_i64(0);
+
+tcg_gen_qemu_ld_i128(oldv, addr, idx, memop);
+
+/* Compare i128 */
+tcg_gen_xor_i64(t0, TCGV128_LOW(oldv), TCGV128_LOW(cmpv));
+tcg_gen_xor_i64(t1, TCGV128_HIGH(oldv), TCGV128_HIGH(cmpv));
+tcg_gen_or_i64(t0, t0, t1);
+
+/* tmpv = equal

[PULL 25/40] tests/tcg/s390x: Add long-double.c

2023-02-04 Thread Richard Henderson

Acked-by: Ilya Leoshkevich 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 tests/tcg/s390x/long-double.c   | 24 
 tests/tcg/s390x/Makefile.target |  1 +
 2 files changed, 25 insertions(+)
 create mode 100644 tests/tcg/s390x/long-double.c

diff --git a/tests/tcg/s390x/long-double.c b/tests/tcg/s390x/long-double.c
new file mode 100644
index 00..757a6262fd
--- /dev/null
+++ b/tests/tcg/s390x/long-double.c
@@ -0,0 +1,24 @@
+/*
+ * Perform some basic arithmetic with long double, as a sanity check.
+ * With small integral numbers, we can cross-check with integers.
+ */
+
+#include <assert.h>
+
+int main()
+{
+int i, j;
+
+for (i = 1; i < 5; i++) {
+for (j = 1; j < 5; j++) {
+long double la = (long double)i + j;
+long double lm = (long double)i * j;
+long double ls = (long double)i - j;
+
+assert(la == i + j);
+assert(lm == i * j);
+assert(ls == i - j);
+}
+}
+return 0;
+}
diff --git a/tests/tcg/s390x/Makefile.target b/tests/tcg/s390x/Makefile.target
index 79250f31dd..1d454270c0 100644
--- a/tests/tcg/s390x/Makefile.target
+++ b/tests/tcg/s390x/Makefile.target
@@ -26,6 +26,7 @@ TESTS+=branch-relative-long
 TESTS+=noexec
 TESTS+=div
 TESTS+=clst
+TESTS+=long-double
 
 Z13_TESTS=vistr
 $(Z13_TESTS): CFLAGS+=-march=z13 -O2
-- 
2.34.1

[PULL 07/40] tcg: Add TCG_CALL_{RET,ARG}_BY_REF

2023-02-04 Thread Richard Henderson

These will be used by some hosts, both 32 and 64-bit, to pass and
return i128.  Not yet used, because allocation is not yet enabled.
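
For readers unfamiliar with the convention, here is a rough plain-C picture
of passing and returning a 128-bit value by reference. It is a sketch under
assumptions: the u128 type and both function names are hypothetical and are
not part of TCG. The caller-side copy corresponds to the reason given in the
patch for allocating "ref_slot" space: the callee may clobber the memory it
is handed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 128-bit value split into two 64-bit halves. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} u128;

/*
 * Callee in "by reference" style: the result is written through 'ret'
 * and the argument arrives as a pointer.  The callee is free to modify
 * '*arg', and this one does.
 */
static void incr128_by_ref(u128 *ret, u128 *arg)
{
    arg->lo += 1;
    if (arg->lo == 0) {
        arg->hi += 1;           /* carry into the high half */
    }
    *ret = *arg;
}

/* Caller side: pass a private copy so the original value stays intact. */
static u128 call_incr128(const u128 *value)
{
    u128 copy = *value;         /* plays the role of the ref_slot copy */
    u128 result;

    incr128_by_ref(&result, &copy);
    return result;
}
```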

Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 tcg/tcg-internal.h |   3 +
 tcg/tcg.c  | 135 -
 2 files changed, 135 insertions(+), 3 deletions(-)

diff --git a/tcg/tcg-internal.h b/tcg/tcg-internal.h
index 6e50aeba3a..2ec1ea01df 100644
--- a/tcg/tcg-internal.h
+++ b/tcg/tcg-internal.h
@@ -36,6 +36,7 @@
  */
 typedef enum {
 TCG_CALL_RET_NORMAL, /* by registers */
+TCG_CALL_RET_BY_REF, /* for i128, by reference */
 } TCGCallReturnKind;
 
 typedef enum {
@@ -44,6 +45,8 @@ typedef enum {
 TCG_CALL_ARG_EXTEND, /* for i32, as a sign/zero-extended i64 */
 TCG_CALL_ARG_EXTEND_U,   /*  ... as a zero-extended i64 */
 TCG_CALL_ARG_EXTEND_S,   /*  ... as a sign-extended i64 */
+TCG_CALL_ARG_BY_REF, /* for i128, by reference, first */
+TCG_CALL_ARG_BY_REF_N,   /*   ... by reference, subsequent */
 } TCGCallArgumentKind;
 
 typedef struct TCGCallArgumentLoc {
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 8923b52044..123cde7000 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -104,8 +104,7 @@ static void tcg_out_ld(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg1,
 static bool tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg);
 static void tcg_out_movi(TCGContext *s, TCGType type,
  TCGReg ret, tcg_target_long arg);
-static void tcg_out_addi_ptr(TCGContext *s, TCGReg, TCGReg, tcg_target_long)
-__attribute__((unused));
+static void tcg_out_addi_ptr(TCGContext *s, TCGReg, TCGReg, tcg_target_long);
 static void tcg_out_exit_tb(TCGContext *s, uintptr_t arg);
 static void tcg_out_goto_tb(TCGContext *s, int which);
 static void tcg_out_op(TCGContext *s, TCGOpcode opc,
@@ -683,6 +682,38 @@ static void layout_arg_normal_n(TCGCumulativeArgs *cum,
 cum->arg_slot += n;
 }
 
+static void layout_arg_by_ref(TCGCumulativeArgs *cum, TCGHelperInfo *info)
+{
+TCGCallArgumentLoc *loc = &info->in[cum->info_in_idx];
+int n = 128 / TCG_TARGET_REG_BITS;
+
+/* The first subindex carries the pointer. */
+layout_arg_1(cum, info, TCG_CALL_ARG_BY_REF);
+
+/*
+ * The callee is allowed to clobber memory associated with
+ * structure pass by-reference.  Therefore we must make copies.
+ * Allocate space from "ref_slot", which will be adjusted to
+ * follow the parameters on the stack.
+ */
+loc[0].ref_slot = cum->ref_slot;
+
+/*
+ * Subsequent words also go into the reference slot, but
+ * do not accumulate into the regular arguments.
+ */
+for (int i = 1; i < n; ++i) {
+loc[i] = (TCGCallArgumentLoc){
+.kind = TCG_CALL_ARG_BY_REF_N,
+.arg_idx = cum->arg_idx,
+.tmp_subindex = i,
+.ref_slot = cum->ref_slot + i,
+};
+}
+cum->info_in_idx += n;
+cum->ref_slot += n;
+}
+
 static void init_call_layout(TCGHelperInfo *info)
 {
 int max_reg_slots = ARRAY_SIZE(tcg_target_call_iarg_regs);
@@ -718,6 +749,14 @@ static void init_call_layout(TCGHelperInfo *info)
 case TCG_CALL_RET_NORMAL:
 assert(info->nr_out <= ARRAY_SIZE(tcg_target_call_oarg_regs));
 break;
+case TCG_CALL_RET_BY_REF:
+/*
+ * Allocate the first argument to the output.
+ * We don't need to store this anywhere, just make it
+ * unavailable for use in the input loop below.
+ */
+cum.arg_slot = 1;
+break;
 default:
 qemu_build_not_reached();
 }
@@ -796,6 +835,9 @@ static void init_call_layout(TCGHelperInfo *info)
 case TCG_CALL_ARG_NORMAL:
 layout_arg_normal_n(&cum, info, 128 / TCG_TARGET_REG_BITS);
 break;
+case TCG_CALL_ARG_BY_REF:
+layout_arg_by_ref(&cum, info);
+break;
 default:
 qemu_build_not_reached();
 }
@@ -811,7 +853,39 @@ static void init_call_layout(TCGHelperInfo *info)
 assert(cum.info_in_idx <= ARRAY_SIZE(info->in));
 /* Validate the backend has enough argument space. */
 assert(cum.arg_slot <= max_reg_slots + max_stk_slots);
-assert(cum.ref_slot <= max_stk_slots);
+
+/*
+ * Relocate the "ref_slot" area to the end of the parameters.
+ * Minimizing this stack offset helps code size for x86,
+ * which has a signed 8-bit offset encoding.
+ */
+if (cum.ref_slot != 0) {
+int ref_base = 0;
+
+if (cum.arg_slot > max_reg_slots) {
+int align = __alignof(Int128) / sizeof(tcg_target_long);
+
+ref_base = cum.arg_slot - max_reg_slots;
+if (align > 1) {
+ref_base = ROUND_UP(ref_base, align);
+}
+}
+assert(ref_base + cum.ref_slot <=

[PULL 05/40] tcg: Allocate objects contiguously in temp_allocate_frame

2023-02-04 Thread Richard Henderson

When allocating a temp to the stack frame, consider the
base type and allocate all parts at once.
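
The idea can be sketched outside of TCG as follows. This is an illustration
under assumptions, not the patch itself: struct part stands in for the few
TCGTemp fields involved, and assign_parts is a hypothetical name. Given the
offset chosen for the whole object, each part receives a consecutive slice.

```c
#include <assert.h>

/* Hypothetical stand-in for the TCGTemp fields touched by the patch. */
struct part {
    long mem_offset;
    int mem_allocated;
};

/*
 * Assign frame offsets to every part of one object, starting at 'off'.
 * 'size' is the size of the whole object, 'part_size' the size of one
 * part.  Returns the number of parts that were assigned.
 */
static int assign_parts(struct part *first, long off, int size, int part_size)
{
    int part_count;

    assert(part_size > 0);
    part_count = size / part_size;

    for (int i = 0; i < part_count; ++i) {
        first[i].mem_offset = off + (long)i * part_size;
        first[i].mem_allocated = 1;
    }
    return part_count;
}
```

For a 16-byte object made of two 8-byte parts placed at offset 32, the parts
end up at offsets 32 and 40.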

Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 tcg/tcg.c | 34 ++
 1 file changed, 26 insertions(+), 8 deletions(-)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index bc7198e5d0..cdfc50b164 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -3267,11 +3267,12 @@ static bool liveness_pass_2(TCGContext *s)
 
 static void temp_allocate_frame(TCGContext *s, TCGTemp *ts)
 {
-int size = tcg_type_size(ts->type);
-int align;
 intptr_t off;
+int size, align;
 
-switch (ts->type) {
+/* When allocating an object, look at the full type. */
+size = tcg_type_size(ts->base_type);
+switch (ts->base_type) {
 case TCG_TYPE_I32:
 align = 4;
 break;
@@ -3302,13 +3303,30 @@ static void temp_allocate_frame(TCGContext *s, TCGTemp *ts)
 tcg_raise_tb_overflow(s);
 }
 s->current_frame_offset = off + size;
-
-ts->mem_offset = off;
 #if defined(__sparc__)
-ts->mem_offset += TCG_TARGET_STACK_BIAS;
+off += TCG_TARGET_STACK_BIAS;
 #endif
-ts->mem_base = s->frame_temp;
-ts->mem_allocated = 1;
+
+/* If the object was subdivided, assign memory to all the parts. */
+if (ts->base_type != ts->type) {
+int part_size = tcg_type_size(ts->type);
+int part_count = size / part_size;
+
+/*
+ * Each part is allocated sequentially in tcg_temp_new_internal.
+ * Jump back to the first part by subtracting the current index.
+ */
+ts -= ts->temp_subindex;
+for (int i = 0; i < part_count; ++i) {
+ts[i].mem_offset = off + i * part_size;
+ts[i].mem_base = s->frame_temp;
+ts[i].mem_allocated = 1;
+}
+} else {
+ts->mem_offset = off;
+ts->mem_base = s->frame_temp;
+ts->mem_allocated = 1;
+}
 }
 
 /* Assign @reg to @ts, and update reg_to_temp[]. */
-- 
2.34.1

[PULL 22/40] target/ppc: Use tcg_gen_atomic_cmpxchg_i128 for STQCX

2023-02-04 Thread Richard Henderson

Note that the previous direct reference to reserve_val,

-   tcg_gen_ld_i64(t1, cpu_env, (ctx->le_mode
-? offsetof(CPUPPCState, reserve_val2)
-: offsetof(CPUPPCState, reserve_val)));

was incorrect because all references should have gone through
cpu_reserve_val.  Create a cpu_reserve_val2 tcg temp to fix this.

Signed-off-by: Richard Henderson 
Reviewed-by: Daniel Henrique Barboza 
Message-Id: <20221112061122.2720163-2-richard.hender...@linaro.org>
---
 target/ppc/helper.h |   2 -
 target/ppc/mem_helper.c |  44 -
 target/ppc/translate.c  | 102 ++--
 3 files changed, 47 insertions(+), 101 deletions(-)

diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index 8dd22a35e4..0beaca5c7a 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -818,6 +818,4 @@ DEF_HELPER_FLAGS_5(stq_le_parallel, TCG_CALL_NO_WG,
void, env, tl, i64, i64, i32)
 DEF_HELPER_FLAGS_5(stq_be_parallel, TCG_CALL_NO_WG,
void, env, tl, i64, i64, i32)
-DEF_HELPER_5(stqcx_le_parallel, i32, env, tl, i64, i64, i32)
-DEF_HELPER_5(stqcx_be_parallel, i32, env, tl, i64, i64, i32)
 #endif
diff --git a/target/ppc/mem_helper.c b/target/ppc/mem_helper.c
index d1163f316c..1578887a8f 100644
--- a/target/ppc/mem_helper.c
+++ b/target/ppc/mem_helper.c
@@ -413,50 +413,6 @@ void helper_stq_be_parallel(CPUPPCState *env, target_ulong addr,
 val = int128_make128(lo, hi);
 cpu_atomic_sto_be_mmu(env, addr, val, opidx, GETPC());
 }
-
-uint32_t helper_stqcx_le_parallel(CPUPPCState *env, target_ulong addr,
-  uint64_t new_lo, uint64_t new_hi,
-  uint32_t opidx)
-{
-bool success = false;
-
-/* We will have raised EXCP_ATOMIC from the translator.  */
-assert(HAVE_CMPXCHG128);
-
-if (likely(addr == env->reserve_addr)) {
-Int128 oldv, cmpv, newv;
-
-cmpv = int128_make128(env->reserve_val2, env->reserve_val);
-newv = int128_make128(new_lo, new_hi);
-oldv = cpu_atomic_cmpxchgo_le_mmu(env, addr, cmpv, newv,
-  opidx, GETPC());
-success = int128_eq(oldv, cmpv);
-}
-env->reserve_addr = -1;
-return env->so + success * CRF_EQ_BIT;
-}
-
-uint32_t helper_stqcx_be_parallel(CPUPPCState *env, target_ulong addr,
-  uint64_t new_lo, uint64_t new_hi,
-  uint32_t opidx)
-{
-bool success = false;
-
-/* We will have raised EXCP_ATOMIC from the translator.  */
-assert(HAVE_CMPXCHG128);
-
-if (likely(addr == env->reserve_addr)) {
-Int128 oldv, cmpv, newv;
-
-cmpv = int128_make128(env->reserve_val2, env->reserve_val);
-newv = int128_make128(new_lo, new_hi);
-oldv = cpu_atomic_cmpxchgo_be_mmu(env, addr, cmpv, newv,
-  opidx, GETPC());
-success = int128_eq(oldv, cmpv);
-}
-env->reserve_addr = -1;
-return env->so + success * CRF_EQ_BIT;
-}
 #endif
 
 /*/
diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index edb3daa9b5..1c17d5a558 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -72,6 +72,7 @@ static TCGv cpu_cfar;
 static TCGv cpu_xer, cpu_so, cpu_ov, cpu_ca, cpu_ov32, cpu_ca32;
 static TCGv cpu_reserve;
 static TCGv cpu_reserve_val;
+static TCGv cpu_reserve_val2;
 static TCGv cpu_fpscr;
 static TCGv_i32 cpu_access_type;
 
@@ -141,8 +142,11 @@ void ppc_translate_init(void)
  offsetof(CPUPPCState, reserve_addr),
  "reserve_addr");
 cpu_reserve_val = tcg_global_mem_new(cpu_env,
- offsetof(CPUPPCState, reserve_val),
- "reserve_val");
+ offsetof(CPUPPCState, reserve_val),
+ "reserve_val");
+cpu_reserve_val2 = tcg_global_mem_new(cpu_env,
+  offsetof(CPUPPCState, reserve_val2),
+  "reserve_val2");
 
 cpu_fpscr = tcg_global_mem_new(cpu_env,
offsetof(CPUPPCState, fpscr), "fpscr");
@@ -3998,78 +4002,66 @@ static void gen_lqarx(DisasContext *ctx)
 /* stqcx. */
 static void gen_stqcx_(DisasContext *ctx)
 {
+TCGLabel *lab_fail, *lab_over;
 int rs = rS(ctx->opcode);
-TCGv EA, hi, lo;
+TCGv EA, t0, t1;
+TCGv_i128 cmp, val;
 
 if (unlikely(rs & 1)) {
 gen_inval_exception(ctx, POWERPC_EXCP_INVAL_INVAL);
 return;
 }
 
+lab_fail = gen_new_label();
+lab_over = gen_new_label();
+
 gen_set_access_type(ctx, ACCESS_RES);
 EA = tcg_temp_new();
 gen_addr_reg_index(ctx, EA);
 
+tcg_gen_brcon

[PULL 34/40] target/s390x: Use Int128 for passing float128

2023-02-04 Thread Richard Henderson

Acked-by: David Hildenbrand 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
v2: Fix SPEC_in1_x1.
---
 target/s390x/helper.h| 32 ++--
 target/s390x/tcg/insn-data.h.inc | 30 +--
 target/s390x/tcg/fpu_helper.c| 88 ++--
 target/s390x/tcg/translate.c | 76 ++-
 4 files changed, 121 insertions(+), 105 deletions(-)

diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index d40aeb471f..bccd3bfca6 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -41,55 +41,55 @@ DEF_HELPER_4(csst, i32, env, i32, i64, i64)
 DEF_HELPER_4(csst_parallel, i32, env, i32, i64, i64)
 DEF_HELPER_FLAGS_3(aeb, TCG_CALL_NO_WG, i64, env, i64, i64)
 DEF_HELPER_FLAGS_3(adb, TCG_CALL_NO_WG, i64, env, i64, i64)
-DEF_HELPER_FLAGS_5(axb, TCG_CALL_NO_WG, i128, env, i64, i64, i64, i64)
+DEF_HELPER_FLAGS_3(axb, TCG_CALL_NO_WG, i128, env, i128, i128)
 DEF_HELPER_FLAGS_3(seb, TCG_CALL_NO_WG, i64, env, i64, i64)
 DEF_HELPER_FLAGS_3(sdb, TCG_CALL_NO_WG, i64, env, i64, i64)
-DEF_HELPER_FLAGS_5(sxb, TCG_CALL_NO_WG, i128, env, i64, i64, i64, i64)
+DEF_HELPER_FLAGS_3(sxb, TCG_CALL_NO_WG, i128, env, i128, i128)
 DEF_HELPER_FLAGS_3(deb, TCG_CALL_NO_WG, i64, env, i64, i64)
 DEF_HELPER_FLAGS_3(ddb, TCG_CALL_NO_WG, i64, env, i64, i64)
-DEF_HELPER_FLAGS_5(dxb, TCG_CALL_NO_WG, i128, env, i64, i64, i64, i64)
+DEF_HELPER_FLAGS_3(dxb, TCG_CALL_NO_WG, i128, env, i128, i128)
 DEF_HELPER_FLAGS_3(meeb, TCG_CALL_NO_WG, i64, env, i64, i64)
 DEF_HELPER_FLAGS_3(mdeb, TCG_CALL_NO_WG, i64, env, i64, i64)
 DEF_HELPER_FLAGS_3(mdb, TCG_CALL_NO_WG, i64, env, i64, i64)
-DEF_HELPER_FLAGS_5(mxb, TCG_CALL_NO_WG, i128, env, i64, i64, i64, i64)
-DEF_HELPER_FLAGS_4(mxdb, TCG_CALL_NO_WG, i128, env, i64, i64, i64)
+DEF_HELPER_FLAGS_3(mxb, TCG_CALL_NO_WG, i128, env, i128, i128)
+DEF_HELPER_FLAGS_3(mxdb, TCG_CALL_NO_WG, i128, env, i128, i64)
 DEF_HELPER_FLAGS_2(ldeb, TCG_CALL_NO_WG, i64, env, i64)
-DEF_HELPER_FLAGS_4(ldxb, TCG_CALL_NO_WG, i64, env, i64, i64, i32)
+DEF_HELPER_FLAGS_3(ldxb, TCG_CALL_NO_WG, i64, env, i128, i32)
 DEF_HELPER_FLAGS_2(lxdb, TCG_CALL_NO_WG, i128, env, i64)
 DEF_HELPER_FLAGS_2(lxeb, TCG_CALL_NO_WG, i128, env, i64)
 DEF_HELPER_FLAGS_3(ledb, TCG_CALL_NO_WG, i64, env, i64, i32)
-DEF_HELPER_FLAGS_4(lexb, TCG_CALL_NO_WG, i64, env, i64, i64, i32)
+DEF_HELPER_FLAGS_3(lexb, TCG_CALL_NO_WG, i64, env, i128, i32)
 DEF_HELPER_FLAGS_3(ceb, TCG_CALL_NO_WG_SE, i32, env, i64, i64)
 DEF_HELPER_FLAGS_3(cdb, TCG_CALL_NO_WG_SE, i32, env, i64, i64)
-DEF_HELPER_FLAGS_5(cxb, TCG_CALL_NO_WG_SE, i32, env, i64, i64, i64, i64)
+DEF_HELPER_FLAGS_3(cxb, TCG_CALL_NO_WG_SE, i32, env, i128, i128)
 DEF_HELPER_FLAGS_3(keb, TCG_CALL_NO_WG, i32, env, i64, i64)
 DEF_HELPER_FLAGS_3(kdb, TCG_CALL_NO_WG, i32, env, i64, i64)
-DEF_HELPER_FLAGS_5(kxb, TCG_CALL_NO_WG, i32, env, i64, i64, i64, i64)
+DEF_HELPER_FLAGS_3(kxb, TCG_CALL_NO_WG, i32, env, i128, i128)
 DEF_HELPER_3(cgeb, i64, env, i64, i32)
 DEF_HELPER_3(cgdb, i64, env, i64, i32)
-DEF_HELPER_4(cgxb, i64, env, i64, i64, i32)
+DEF_HELPER_3(cgxb, i64, env, i128, i32)
 DEF_HELPER_3(cfeb, i64, env, i64, i32)
 DEF_HELPER_3(cfdb, i64, env, i64, i32)
-DEF_HELPER_4(cfxb, i64, env, i64, i64, i32)
+DEF_HELPER_3(cfxb, i64, env, i128, i32)
 DEF_HELPER_3(clgeb, i64, env, i64, i32)
 DEF_HELPER_3(clgdb, i64, env, i64, i32)
-DEF_HELPER_4(clgxb, i64, env, i64, i64, i32)
+DEF_HELPER_3(clgxb, i64, env, i128, i32)
 DEF_HELPER_3(clfeb, i64, env, i64, i32)
 DEF_HELPER_3(clfdb, i64, env, i64, i32)
-DEF_HELPER_4(clfxb, i64, env, i64, i64, i32)
+DEF_HELPER_3(clfxb, i64, env, i128, i32)
 DEF_HELPER_FLAGS_3(fieb, TCG_CALL_NO_WG, i64, env, i64, i32)
 DEF_HELPER_FLAGS_3(fidb, TCG_CALL_NO_WG, i64, env, i64, i32)
-DEF_HELPER_FLAGS_4(fixb, TCG_CALL_NO_WG, i128, env, i64, i64, i32)
+DEF_HELPER_FLAGS_3(fixb, TCG_CALL_NO_WG, i128, env, i128, i32)
 DEF_HELPER_FLAGS_4(maeb, TCG_CALL_NO_WG, i64, env, i64, i64, i64)
 DEF_HELPER_FLAGS_4(madb, TCG_CALL_NO_WG, i64, env, i64, i64, i64)
 DEF_HELPER_FLAGS_4(mseb, TCG_CALL_NO_WG, i64, env, i64, i64, i64)
 DEF_HELPER_FLAGS_4(msdb, TCG_CALL_NO_WG, i64, env, i64, i64, i64)
 DEF_HELPER_FLAGS_3(tceb, TCG_CALL_NO_RWG_SE, i32, env, i64, i64)
 DEF_HELPER_FLAGS_3(tcdb, TCG_CALL_NO_RWG_SE, i32, env, i64, i64)
-DEF_HELPER_FLAGS_4(tcxb, TCG_CALL_NO_RWG_SE, i32, env, i64, i64, i64)
+DEF_HELPER_FLAGS_3(tcxb, TCG_CALL_NO_RWG_SE, i32, env, i128, i64)
 DEF_HELPER_FLAGS_2(sqeb, TCG_CALL_NO_WG, i64, env, i64)
 DEF_HELPER_FLAGS_2(sqdb, TCG_CALL_NO_WG, i64, env, i64)
-DEF_HELPER_FLAGS_3(sqxb, TCG_CALL_NO_WG, i128, env, i64, i64)
+DEF_HELPER_FLAGS_2(sqxb, TCG_CALL_NO_WG, i128, env, i128)
 DEF_HELPER_FLAGS_1(cvd, TCG_CALL_NO_RWG_SE, i64, s32)
 DEF_HELPER_FLAGS_4(pack, TCG_CALL_NO_WG, void, env, i32, i64, i64)
 DEF_HELPER_FLAGS_4(pka, TCG_CALL_NO_WG, void, env, i64, i64, i32)
diff --git a/target/s390x/tcg/insn-data.h.inc b/target/s390x/tcg/insn-data.h.inc
index 517a4500ae..893f4b48db 100644
--- a/target/s390x/tcg/insn-data.h.inc
+++ b/targ

[PULL 23/40] tests/tcg/s390x: Add div.c

2023-02-04 Thread Richard Henderson

From: Ilya Leoshkevich 

Add a basic test to prevent regressions.

Signed-off-by: Ilya Leoshkevich 
Message-Id: <2022110300.2539919-1-...@linux.ibm.com>
Signed-off-by: Richard Henderson 
---
 tests/tcg/s390x/div.c   | 40 +
 tests/tcg/s390x/Makefile.target |  1 +
 2 files changed, 41 insertions(+)
 create mode 100644 tests/tcg/s390x/div.c

diff --git a/tests/tcg/s390x/div.c b/tests/tcg/s390x/div.c
new file mode 100644
index 00..5807295614
--- /dev/null
+++ b/tests/tcg/s390x/div.c
@@ -0,0 +1,40 @@
+#include <assert.h>
+#include <stdint.h>
+
+static void test_dr(void)
+{
+register int32_t r0 asm("r0") = -1;
+register int32_t r1 asm("r1") = -4241;
+int32_t b = 101, q, r;
+
+asm("dr %[r0],%[b]"
+: [r0] "+r" (r0), [r1] "+r" (r1)
+: [b] "r" (b)
+: "cc");
+q = r1;
+r = r0;
+assert(q == -41);
+assert(r == -100);
+}
+
+static void test_dlr(void)
+{
+register uint32_t r0 asm("r0") = 0;
+register uint32_t r1 asm("r1") = 4243;
+uint32_t b = 101, q, r;
+
+asm("dlr %[r0],%[b]"
+: [r0] "+r" (r0), [r1] "+r" (r1)
+: [b] "r" (b)
+: "cc");
+q = r1;
+r = r0;
+assert(q == 42);
+assert(r == 1);
+}
+
+int main(void)
+{
+test_dr();
+test_dlr();
+}
diff --git a/tests/tcg/s390x/Makefile.target b/tests/tcg/s390x/Makefile.target
index 07fcc6d0ce..ab7a3bcfb2 100644
--- a/tests/tcg/s390x/Makefile.target
+++ b/tests/tcg/s390x/Makefile.target
@@ -24,6 +24,7 @@ TESTS+=trap
 TESTS+=signals-s390x
 TESTS+=branch-relative-long
 TESTS+=noexec
+TESTS+=div
 
 Z13_TESTS=vistr
 $(Z13_TESTS): CFLAGS+=-march=z13 -O2
-- 
2.34.1
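
For readers checking the expected values in div.c: DR and DLR divide the 64-bit value held in the even/odd register pair (r0 as the high half, r1 as the low half) by a 32-bit divisor, leaving the remainder in r0 and the quotient in r1. Below is a minimal portable C model of that arithmetic. It is only a sketch: the names model_dr and model_dlr and the two result structs are made up here and are not part of QEMU, and the sketch does not model the fixed-point divide exception raised for a zero divisor or a non-representable quotient.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result types for this sketch only. */
struct divres32 { int32_t q; int32_t r; };
struct divresu32 { uint32_t q; uint32_t r; };

/* Signed model: dividend is hi:lo, taken as one 64-bit signed value. */
static struct divres32 model_dr(int32_t hi, int32_t lo, int32_t b)
{
    struct divres32 res;
    int64_t a = (int64_t)hi * INT64_C(4294967296) + (int64_t)(uint32_t)lo;

    assert(b != 0);              /* the real insn would trap instead */
    res.q = (int32_t)(a / b);    /* assumes the quotient fits in 32 bits */
    res.r = (int32_t)(a % b);    /* C division truncates toward zero */
    return res;
}

/* Unsigned model: dividend is hi:lo, taken as one 64-bit unsigned value. */
static struct divresu32 model_dlr(uint32_t hi, uint32_t lo, uint32_t b)
{
    struct divresu32 res;
    uint64_t a = ((uint64_t)hi << 32) | lo;

    assert(b != 0);              /* the real insn would trap instead */
    res.q = (uint32_t)(a / b);   /* assumes the quotient fits in 32 bits */
    res.r = (uint32_t)(a % b);
    return res;
}
```

With the inputs used by test_dr() and test_dlr(), the model gives -4241 / 101 = -41 remainder -100, and 4243 / 101 = 42 remainder 1, which are the values the test asserts.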

[PULL 29/40] target/s390x: Use Int128 for return from CLST

2023-02-04 Thread Richard Henderson

Reviewed-by: Philippe Mathieu-Daudé 
Acked-by: Ilya Leoshkevich 
Signed-off-by: Richard Henderson 
---
 target/s390x/helper.h |  2 +-
 target/s390x/tcg/mem_helper.c | 11 ---
 target/s390x/tcg/translate.c  |  8 ++--
 3 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index 593f3c8bee..25c2dd0b3c 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -16,7 +16,7 @@ DEF_HELPER_FLAGS_3(divs64, TCG_CALL_NO_WG, i128, env, s64, s64)
 DEF_HELPER_FLAGS_4(divu64, TCG_CALL_NO_WG, i128, env, i64, i64, i64)
 DEF_HELPER_3(srst, void, env, i32, i32)
 DEF_HELPER_3(srstu, void, env, i32, i32)
-DEF_HELPER_4(clst, i64, env, i64, i64, i64)
+DEF_HELPER_4(clst, i128, env, i64, i64, i64)
 DEF_HELPER_FLAGS_4(mvn, TCG_CALL_NO_WG, void, env, i32, i64, i64)
 DEF_HELPER_FLAGS_4(mvo, TCG_CALL_NO_WG, void, env, i32, i64, i64)
 DEF_HELPER_FLAGS_4(mvpg, TCG_CALL_NO_WG, i32, env, i64, i32, i32)
diff --git a/target/s390x/tcg/mem_helper.c b/target/s390x/tcg/mem_helper.c
index cb82cd1c1d..9be42851d8 100644
--- a/target/s390x/tcg/mem_helper.c
+++ b/target/s390x/tcg/mem_helper.c
@@ -886,7 +886,7 @@ void HELPER(srstu)(CPUS390XState *env, uint32_t r1, uint32_t r2)
 }
 
 /* unsigned string compare (c is string terminator) */
-uint64_t HELPER(clst)(CPUS390XState *env, uint64_t c, uint64_t s1, uint64_t s2)
+Int128 HELPER(clst)(CPUS390XState *env, uint64_t c, uint64_t s1, uint64_t s2)
 {
 uintptr_t ra = GETPC();
 uint32_t len;
@@ -904,23 +904,20 @@ uint64_t HELPER(clst)(CPUS390XState *env, uint64_t c, uint64_t s1, uint64_t s2)
 if (v1 == c) {
 /* Equal.  CC=0, and don't advance the registers.  */
 env->cc_op = 0;
-env->retxl = s2;
-return s1;
+return int128_make128(s2, s1);
 }
 } else {
 /* Unequal.  CC={1,2}, and advance the registers.  Note that
the terminator need not be zero, but the string that contains
the terminator is by definition "low".  */
 env->cc_op = (v1 == c ? 1 : v2 == c ? 2 : v1 < v2 ? 1 : 2);
-env->retxl = s2 + len;
-return s1 + len;
+return int128_make128(s2 + len, s1 + len);
 }
 }
 
 /* CPU-determined bytes equal; advance the registers.  */
 env->cc_op = 3;
-env->retxl = s2 + len;
-return s1 + len;
+return int128_make128(s2 + len, s1 + len);
 }
 
 /* move page */
diff --git a/target/s390x/tcg/translate.c b/target/s390x/tcg/translate.c
index 6953b81de7..8397fe2bd8 100644
--- a/target/s390x/tcg/translate.c
+++ b/target/s390x/tcg/translate.c
@@ -2164,9 +2164,13 @@ static DisasJumpType op_clm(DisasContext *s, DisasOps *o)
 
 static DisasJumpType op_clst(DisasContext *s, DisasOps *o)
 {
-gen_helper_clst(o->in1, cpu_env, regs[0], o->in1, o->in2);
+TCGv_i128 pair = tcg_temp_new_i128();
+
+gen_helper_clst(pair, cpu_env, regs[0], o->in1, o->in2);
+tcg_gen_extr_i128_i64(o->in2, o->in1, pair);
+tcg_temp_free_i128(pair);
+
 set_cc_static(s);
-return_low128(o->in2);
 return DISAS_NEXT;
 }
 
-- 
2.34.1

[PULL 28/40] target/s390x: Use a single return for helper_divs64/u64

2023-02-04 Thread Richard Henderson

Pack the quotient and remainder into a single Int128.
Use the divu128 primitive to remove the cpu_abort on
32-bit hosts.

Reviewed-by: Philippe Mathieu-Daudé 
Acked-by: Ilya Leoshkevich 
Signed-off-by: Richard Henderson 
---
v2: Extended div test case to cover these insns.
---
 target/s390x/helper.h |  4 ++--
 target/s390x/tcg/int_helper.c | 38 +--
 target/s390x/tcg/translate.c  | 14 +
 tests/tcg/s390x/div.c | 35 
 4 files changed, 56 insertions(+), 35 deletions(-)

diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index bc828d976b..593f3c8bee 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -12,8 +12,8 @@ DEF_HELPER_3(clcl, i32, env, i32, i32)
 DEF_HELPER_FLAGS_4(clm, TCG_CALL_NO_WG, i32, env, i32, i32, i64)
 DEF_HELPER_FLAGS_3(divs32, TCG_CALL_NO_WG, i64, env, s64, s64)
 DEF_HELPER_FLAGS_3(divu32, TCG_CALL_NO_WG, i64, env, i64, i64)
-DEF_HELPER_FLAGS_3(divs64, TCG_CALL_NO_WG, s64, env, s64, s64)
-DEF_HELPER_FLAGS_4(divu64, TCG_CALL_NO_WG, i64, env, i64, i64, i64)
+DEF_HELPER_FLAGS_3(divs64, TCG_CALL_NO_WG, i128, env, s64, s64)
+DEF_HELPER_FLAGS_4(divu64, TCG_CALL_NO_WG, i128, env, i64, i64, i64)
 DEF_HELPER_3(srst, void, env, i32, i32)
 DEF_HELPER_3(srstu, void, env, i32, i32)
 DEF_HELPER_4(clst, i64, env, i64, i64, i64)
diff --git a/target/s390x/tcg/int_helper.c b/target/s390x/tcg/int_helper.c
index 7260583cf2..eb8e6dd1b5 100644
--- a/target/s390x/tcg/int_helper.c
+++ b/target/s390x/tcg/int_helper.c
@@ -76,46 +76,26 @@ uint64_t HELPER(divu32)(CPUS390XState *env, uint64_t a, uint64_t b64)
 }
 
 /* 64/64 -> 64 signed division */
-int64_t HELPER(divs64)(CPUS390XState *env, int64_t a, int64_t b)
+Int128 HELPER(divs64)(CPUS390XState *env, int64_t a, int64_t b)
 {
 /* Catch divide by zero, and non-representable quotient (MIN / -1).  */
 if (b == 0 || (b == -1 && a == (1ll << 63))) {
 tcg_s390_program_interrupt(env, PGM_FIXPT_DIVIDE, GETPC());
 }
-env->retxl = a % b;
-return a / b;
+return int128_make128(a / b, a % b);
 }
 
 /* 128 -> 64/64 unsigned division */
-uint64_t HELPER(divu64)(CPUS390XState *env, uint64_t ah, uint64_t al,
-uint64_t b)
+Int128 HELPER(divu64)(CPUS390XState *env, uint64_t ah, uint64_t al, uint64_t b)
 {
-uint64_t ret;
-/* Signal divide by zero.  */
-if (b == 0) {
-tcg_s390_program_interrupt(env, PGM_FIXPT_DIVIDE, GETPC());
-}
-if (ah == 0) {
-/* 64 -> 64/64 case */
-env->retxl = al % b;
-ret = al / b;
-} else {
-/* ??? Move i386 idivq helper to host-utils.  */
-#ifdef CONFIG_INT128
-__uint128_t a = ((__uint128_t)ah << 64) | al;
-__uint128_t q = a / b;
-env->retxl = a % b;
-ret = q;
-if (ret != q) {
-tcg_s390_program_interrupt(env, PGM_FIXPT_DIVIDE, GETPC());
+if (b != 0) {
+uint64_t r = divu128(&al, &ah, b);
+if (ah == 0) {
+return int128_make128(al, r);
 }
-#else
-/* 32-bit hosts would need special wrapper functionality - just abort if
-   we encounter such a case; it's very unlikely anyways. */
-cpu_abort(env_cpu(env), "128 -> 64/64 division not implemented\n");
-#endif
 }
-return ret;
+/* divide by zero or overflow */
+tcg_s390_program_interrupt(env, PGM_FIXPT_DIVIDE, GETPC());
 }
 
 uint64_t HELPER(cvd)(int32_t reg)
diff --git a/target/s390x/tcg/translate.c b/target/s390x/tcg/translate.c
index 169f7ee1b2..6953b81de7 100644
--- a/target/s390x/tcg/translate.c
+++ b/target/s390x/tcg/translate.c
@@ -2409,15 +2409,21 @@ static DisasJumpType op_divu32(DisasContext *s, DisasOps *o)
 
 static DisasJumpType op_divs64(DisasContext *s, DisasOps *o)
 {
-gen_helper_divs64(o->out2, cpu_env, o->in1, o->in2);
-return_low128(o->out);
+TCGv_i128 t = tcg_temp_new_i128();
+
+gen_helper_divs64(t, cpu_env, o->in1, o->in2);
+tcg_gen_extr_i128_i64(o->out2, o->out, t);
+tcg_temp_free_i128(t);
 return DISAS_NEXT;
 }
 
 static DisasJumpType op_divu64(DisasContext *s, DisasOps *o)
 {
-gen_helper_divu64(o->out2, cpu_env, o->out, o->out2, o->in2);
-return_low128(o->out);
+TCGv_i128 t = tcg_temp_new_i128();
+
+gen_helper_divu64(t, cpu_env, o->out, o->out2, o->in2);
+tcg_gen_extr_i128_i64(o->out2, o->out, t);
+tcg_temp_free_i128(t);
 return DISAS_NEXT;
 }
 
diff --git a/tests/tcg/s390x/div.c b/tests/tcg/s390x/div.c
index 5807295614..6ad9900e08 100644
--- a/tests/tcg/s390x/div.c
+++ b/tests/tcg/s390x/div.c
@@ -33,8 +33,43 @@ static void test_dlr(void)
 assert(r == 1);
 }
 
+static void test_dsgr(void)
+{
+register int64_t r0 asm("r0") = -1;
+register int64_t r1 asm("r1") = -4241;
+int64_t b = 101, q, r;
+
+asm("dsgr %[r0],%[b]"
+: [r0] "+r" (r0), [r1] "+r" (r1)
+: [b] "r" (b)
+: "cc");
+q = r1;
+r = r0;
+assert(q == -41);
+assert(r == -

[PULL 27/40] target/s390x: Use a single return for helper_divs32/u32

2023-02-04 Thread Richard Henderson

Pack the quotient and remainder into a single uint64_t.

Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: David Hildenbrand 
Signed-off-by: Richard Henderson 
---
v2: Fix operand ordering; use tcg_extr32_i64.
---
 target/s390x/helper.h |  2 +-
 target/s390x/tcg/int_helper.c | 26 +-
 target/s390x/tcg/translate.c  |  8 
 3 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index 93923ca153..bc828d976b 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -10,7 +10,7 @@ DEF_HELPER_FLAGS_4(clc, TCG_CALL_NO_WG, i32, env, i32, i64, i64)
 DEF_HELPER_3(mvcl, i32, env, i32, i32)
 DEF_HELPER_3(clcl, i32, env, i32, i32)
 DEF_HELPER_FLAGS_4(clm, TCG_CALL_NO_WG, i32, env, i32, i32, i64)
-DEF_HELPER_FLAGS_3(divs32, TCG_CALL_NO_WG, s64, env, s64, s64)
+DEF_HELPER_FLAGS_3(divs32, TCG_CALL_NO_WG, i64, env, s64, s64)
 DEF_HELPER_FLAGS_3(divu32, TCG_CALL_NO_WG, i64, env, i64, i64)
 DEF_HELPER_FLAGS_3(divs64, TCG_CALL_NO_WG, s64, env, s64, s64)
 DEF_HELPER_FLAGS_4(divu64, TCG_CALL_NO_WG, i64, env, i64, i64, i64)
diff --git a/target/s390x/tcg/int_helper.c b/target/s390x/tcg/int_helper.c
index 954542388a..7260583cf2 100644
--- a/target/s390x/tcg/int_helper.c
+++ b/target/s390x/tcg/int_helper.c
@@ -34,45 +34,45 @@
 #endif
 
 /* 64/32 -> 32 signed division */
-int64_t HELPER(divs32)(CPUS390XState *env, int64_t a, int64_t b64)
+uint64_t HELPER(divs32)(CPUS390XState *env, int64_t a, int64_t b64)
 {
-int32_t ret, b = b64;
-int64_t q;
+int32_t b = b64;
+int64_t q, r;
 
 if (b == 0) {
 tcg_s390_program_interrupt(env, PGM_FIXPT_DIVIDE, GETPC());
 }
 
-ret = q = a / b;
-env->retxl = a % b;
+q = a / b;
+r = a % b;
 
 /* Catch non-representable quotient.  */
-if (ret != q) {
+if (q != (int32_t)q) {
 tcg_s390_program_interrupt(env, PGM_FIXPT_DIVIDE, GETPC());
 }
 
-return ret;
+return deposit64(q, 32, 32, r);
 }
 
 /* 64/32 -> 32 unsigned division */
 uint64_t HELPER(divu32)(CPUS390XState *env, uint64_t a, uint64_t b64)
 {
-uint32_t ret, b = b64;
-uint64_t q;
+uint32_t b = b64;
+uint64_t q, r;
 
 if (b == 0) {
 tcg_s390_program_interrupt(env, PGM_FIXPT_DIVIDE, GETPC());
 }
 
-ret = q = a / b;
-env->retxl = a % b;
+q = a / b;
+r = a % b;
 
 /* Catch non-representable quotient.  */
-if (ret != q) {
+if (q != (uint32_t)q) {
 tcg_s390_program_interrupt(env, PGM_FIXPT_DIVIDE, GETPC());
 }
 
-return ret;
+return deposit64(q, 32, 32, r);
 }
 
 /* 64/64 -> 64 signed division */
diff --git a/target/s390x/tcg/translate.c b/target/s390x/tcg/translate.c
index a339b277e9..169f7ee1b2 100644
--- a/target/s390x/tcg/translate.c
+++ b/target/s390x/tcg/translate.c
@@ -2395,15 +2395,15 @@ static DisasJumpType op_diag(DisasContext *s, DisasOps *o)
 
 static DisasJumpType op_divs32(DisasContext *s, DisasOps *o)
 {
-gen_helper_divs32(o->out2, cpu_env, o->in1, o->in2);
-return_low128(o->out);
+gen_helper_divs32(o->out, cpu_env, o->in1, o->in2);
+tcg_gen_extr32_i64(o->out2, o->out, o->out);
 return DISAS_NEXT;
 }
 
 static DisasJumpType op_divu32(DisasContext *s, DisasOps *o)
 {
-gen_helper_divu32(o->out2, cpu_env, o->in1, o->in2);
-return_low128(o->out);
+gen_helper_divu32(o->out, cpu_env, o->in1, o->in2);
+tcg_gen_extr32_i64(o->out2, o->out, o->out);
 return DISAS_NEXT;
 }
 
-- 
2.34.1
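
The packing used above can be illustrated outside of QEMU. In the patch, deposit64(q, 32, 32, r) places the remainder in the upper 32 bits of the return value and leaves the quotient in the lower 32 bits, and the translator then splits the two halves again. Below is a minimal sketch of that round trip; sketch_deposit64 is a local stand-in written for this example (modelled on the call in the patch, not copied from QEMU), and pack_div32 and unpack_div32 are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Local stand-in: return 'value' with the 'length'-bit field starting at
 * bit 'start' replaced by the low bits of 'field'.
 */
static uint64_t sketch_deposit64(uint64_t value, int start, int length,
                                 uint64_t field)
{
    uint64_t mask;

    assert(start >= 0 && length > 0 && length <= 64 - start);
    mask = (~UINT64_C(0) >> (64 - length)) << start;
    return (value & ~mask) | ((field << start) & mask);
}

/* Quotient in bits 0..31, remainder in bits 32..63. */
static uint64_t pack_div32(int64_t q, int64_t r)
{
    return sketch_deposit64((uint64_t)q, 32, 32, (uint64_t)r);
}

/* Split the packed value back into its two 32-bit halves. */
static void unpack_div32(uint64_t packed, uint32_t *q, uint32_t *r)
{
    *q = (uint32_t)packed;
    *r = (uint32_t)(packed >> 32);
}
```

Returning both results in one 64-bit value is what allows the helper to stop writing env->retxl, at the cost of one extra extract operation in the translator.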

[PULL 06/40] tcg: Introduce tcg_out_addi_ptr

2023-02-04 Thread Richard Henderson

Implement the function for arm, i386, and s390x, which will use it.
Add stubs for all other backends.

Reviewed-by: Alex Bennée 
Reviewed-by: Daniel Henrique Barboza 
Signed-off-by: Richard Henderson 
---
 tcg/tcg.c|  2 ++
 tcg/aarch64/tcg-target.c.inc |  7 +++
 tcg/arm/tcg-target.c.inc | 20 
 tcg/i386/tcg-target.c.inc|  8 
 tcg/loongarch64/tcg-target.c.inc |  7 +++
 tcg/mips/tcg-target.c.inc|  7 +++
 tcg/ppc/tcg-target.c.inc |  7 +++
 tcg/riscv/tcg-target.c.inc   |  7 +++
 tcg/s390x/tcg-target.c.inc   |  7 +++
 tcg/sparc64/tcg-target.c.inc |  7 +++
 tcg/tci/tcg-target.c.inc |  7 +++
 11 files changed, 86 insertions(+)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index cdfc50b164..8923b52044 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -104,6 +104,8 @@ static void tcg_out_ld(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg1,
 static bool tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg);
 static void tcg_out_movi(TCGContext *s, TCGType type,
  TCGReg ret, tcg_target_long arg);
+static void tcg_out_addi_ptr(TCGContext *s, TCGReg, TCGReg, tcg_target_long)
+__attribute__((unused));
 static void tcg_out_exit_tb(TCGContext *s, uintptr_t arg);
 static void tcg_out_goto_tb(TCGContext *s, int which);
 static void tcg_out_op(TCGContext *s, TCGOpcode opc,
diff --git a/tcg/aarch64/tcg-target.c.inc b/tcg/aarch64/tcg-target.c.inc
index 330d26b395..bd6da72678 100644
--- a/tcg/aarch64/tcg-target.c.inc
+++ b/tcg/aarch64/tcg-target.c.inc
@@ -1102,6 +1102,13 @@ static void tcg_out_movi(TCGContext *s, TCGType type, TCGReg rd,
 tcg_out_insn(s, 3305, LDR, 0, rd);
 }
 
+static void tcg_out_addi_ptr(TCGContext *s, TCGReg rd, TCGReg rs,
+ tcg_target_long imm)
+{
+/* This function is only used for passing structs by reference. */
+g_assert_not_reached();
+}
+
 /* Define something more legible for general use.  */
 #define tcg_out_ldst_r  tcg_out_insn_3310
 
diff --git a/tcg/arm/tcg-target.c.inc b/tcg/arm/tcg-target.c.inc
index 0f5f9f4925..6e9e9b9b3f 100644
--- a/tcg/arm/tcg-target.c.inc
+++ b/tcg/arm/tcg-target.c.inc
@@ -2581,6 +2581,26 @@ static void tcg_out_movi(TCGContext *s, TCGType type,
 tcg_out_movi32(s, COND_AL, ret, arg);
 }
 
+static void tcg_out_addi_ptr(TCGContext *s, TCGReg rd, TCGReg rs,
+ tcg_target_long imm)
+{
+int enc, opc = ARITH_ADD;
+
+/* All of the easiest immediates to encode are positive. */
+if (imm < 0) {
+imm = -imm;
+opc = ARITH_SUB;
+}
+enc = encode_imm(imm);
+if (enc >= 0) {
+tcg_out_dat_imm(s, COND_AL, opc, rd, rs, enc);
+} else {
+tcg_out_movi32(s, COND_AL, TCG_REG_TMP, imm);
+tcg_out_dat_reg(s, COND_AL, opc, rd, rs,
+TCG_REG_TMP, SHIFT_IMM_LSL(0));
+}
+}
+
 /* Type is always V128, with I64 elements.  */
 static void tcg_out_dup2_vec(TCGContext *s, TCGReg rd, TCGReg rl, TCGReg rh)
 {
diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc
index c71c3e664d..7b573bd287 100644
--- a/tcg/i386/tcg-target.c.inc
+++ b/tcg/i386/tcg-target.c.inc
@@ -1069,6 +1069,14 @@ static void tcg_out_movi(TCGContext *s, TCGType type,
 }
 }
 
+static void tcg_out_addi_ptr(TCGContext *s, TCGReg rd, TCGReg rs,
+ tcg_target_long imm)
+{
+/* This function is only used for passing structs by reference. */
+tcg_debug_assert(TCG_TARGET_REG_BITS == 32);
+tcg_out_modrm_offset(s, OPC_LEA, rd, rs, imm);
+}
+
 static inline void tcg_out_pushi(TCGContext *s, tcg_target_long val)
 {
 if (val == (int8_t)val) {
diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
index ce4a153887..b6e2ff6213 100644
--- a/tcg/loongarch64/tcg-target.c.inc
+++ b/tcg/loongarch64/tcg-target.c.inc
@@ -417,6 +417,13 @@ static void tcg_out_addi(TCGContext *s, TCGType type, TCGReg rd,
 }
 }
 
+static void tcg_out_addi_ptr(TCGContext *s, TCGReg rd, TCGReg rs,
+ tcg_target_long imm)
+{
+/* This function is only used for passing structs by reference. */
+g_assert_not_reached();
+}
+
 static void tcg_out_ext8u(TCGContext *s, TCGReg ret, TCGReg arg)
 {
 tcg_out_opc_andi(s, ret, arg, 0xff);
diff --git a/tcg/mips/tcg-target.c.inc b/tcg/mips/tcg-target.c.inc
index 6e000d8e69..d419c4c1fc 100644
--- a/tcg/mips/tcg-target.c.inc
+++ b/tcg/mips/tcg-target.c.inc
@@ -550,6 +550,13 @@ static void tcg_out_movi(TCGContext *s, TCGType type,
 }
 }
 
+static void tcg_out_addi_ptr(TCGContext *s, TCGReg rd, TCGReg rs,
+ tcg_target_long imm)
+{
+/* This function is only used for passing structs by reference. */
+g_assert_not_reached();
+}
+
 static void tcg_out_bswap16(TCGContext *s, TCGReg ret, TCGReg arg, int flags)
 {
 /* ret and arg can't be register tmp0 */
diff --git a/t

[PULL 24/40] tests/tcg/s390x: Add clst.c

2023-02-04 Thread Richard Henderson

From: Ilya Leoshkevich 

Add a basic test to prevent regressions.

Signed-off-by: Ilya Leoshkevich 
Message-Id: <20221025213008.2209006-2-...@linux.ibm.com>
Signed-off-by: Richard Henderson 
---
 tests/tcg/s390x/clst.c  | 82 +
 tests/tcg/s390x/Makefile.target |  1 +
 2 files changed, 83 insertions(+)
 create mode 100644 tests/tcg/s390x/clst.c

diff --git a/tests/tcg/s390x/clst.c b/tests/tcg/s390x/clst.c
new file mode 100644
index 00..ed2fe7326c
--- /dev/null
+++ b/tests/tcg/s390x/clst.c
@@ -0,0 +1,82 @@
+#define _GNU_SOURCE
+#include <stdio.h>
+#include <stdlib.h>
+
+static int clst(char sep, const char **s1, const char **s2)
+{
+const char *r1 = *s1;
+const char *r2 = *s2;
+int cc;
+
+do {
+register int r0 asm("r0") = sep;
+
+asm("clst %[r1],%[r2]\n"
+"ipm %[cc]\n"
+"srl %[cc],28"
+: [r1] "+r" (r1), [r2] "+r" (r2), "+r" (r0), [cc] "=r" (cc)
+:
+: "cc");
+*s1 = r1;
+*s2 = r2;
+} while (cc == 3);
+
+return cc;
+}
+
+static const struct test {
+const char *name;
+char sep;
+const char *s1;
+const char *s2;
+int exp_cc;
+int exp_off;
+} tests[] = {
+{
+.name = "cc0",
+.sep = 0,
+.s1 = "aa",
+.s2 = "aa",
+.exp_cc = 0,
+.exp_off = 0,
+},
+{
+.name = "cc1",
+.sep = 1,
+.s1 = "a\x01",
+.s2 = "aa\x01",
+.exp_cc = 1,
+.exp_off = 1,
+},
+{
+.name = "cc2",
+.sep = 2,
+.s1 = "abc\x02",
+.s2 = "abb\x02",
+.exp_cc = 2,
+.exp_off = 2,
+},
+};
+
+int main(void)
+{
+const struct test *t;
+const char *s1, *s2;
+size_t i;
+int cc;
+
+for (i = 0; i < sizeof(tests) / sizeof(tests[0]); i++) {
+t = &tests[i];
+s1 = t->s1;
+s2 = t->s2;
+cc = clst(t->sep, &s1, &s2);
+if (cc != t->exp_cc ||
+s1 != t->s1 + t->exp_off ||
+s2 != t->s2 + t->exp_off) {
+fprintf(stderr, "%s\n", t->name);
+return EXIT_FAILURE;
+}
+}
+
+return EXIT_SUCCESS;
+}
diff --git a/tests/tcg/s390x/Makefile.target b/tests/tcg/s390x/Makefile.target
index ab7a3bcfb2..79250f31dd 100644
--- a/tests/tcg/s390x/Makefile.target
+++ b/tests/tcg/s390x/Makefile.target
@@ -25,6 +25,7 @@ TESTS+=signals-s390x
 TESTS+=branch-relative-long
 TESTS+=noexec
 TESTS+=div
+TESTS+=clst
 
 Z13_TESTS=vistr
 $(Z13_TESTS): CFLAGS+=-march=z13 -O2
-- 
2.34.1
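
As a cross-check for the table of expected results in clst.c, here is a small portable C model of the comparison, following the logic of the clst helper in mem_helper.c. It is a sketch under stated assumptions: model_clst is a made-up name, and the model ignores the CPU-determined length limit, so it never reports cc 3.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical reference model, not QEMU code.
 * Returns the condition code (0, 1 or 2). On inequality, *s1 and *s2 are
 * advanced to the first differing byte; on equality they are left alone.
 */
static int model_clst(unsigned char term, const char **s1, const char **s2)
{
    const unsigned char *p1 = (const unsigned char *)*s1;
    const unsigned char *p2 = (const unsigned char *)*s2;
    size_t i;

    for (i = 0; ; i++) {
        unsigned char v1 = p1[i];
        unsigned char v2 = p2[i];

        if (v1 == v2) {
            if (v1 == term) {
                return 0;       /* equal: do not advance the pointers */
            }
            continue;
        }

        *s1 += i;
        *s2 += i;
        if (v1 == term) {
            return 1;           /* first string ended first, so it is low */
        }
        if (v2 == term) {
            return 2;           /* second string ended first */
        }
        return v1 < v2 ? 1 : 2;
    }
}
```

Run against the three entries of tests[], the model yields cc 0 at offset 0, cc 1 at offset 1, and cc 2 at offset 2, matching exp_cc and exp_off.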

[PULL 32/40] target/s390x: Copy wout_x1 to wout_x1_P

2023-02-04 Thread Richard Henderson

Make a copy of wout_x1 before modifying it, named wout_x1_P to
emphasize that it operates on the out/out2 pair.  The insns that
use x1_P are data-movement insns that will not change to Int128.

Acked-by: Ilya Leoshkevich 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 target/s390x/tcg/insn-data.h.inc | 12 ++--
 target/s390x/tcg/translate.c |  8 
 2 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/target/s390x/tcg/insn-data.h.inc b/target/s390x/tcg/insn-data.h.inc
index 79c6ab509a..d0814cb218 100644
--- a/target/s390x/tcg/insn-data.h.inc
+++ b/target/s390x/tcg/insn-data.h.inc
@@ -422,7 +422,7 @@
 F(0x3800, LER, RR_a,  Z,   0, e2, 0, cond_e1e2, mov2, 0, IF_AFP1 | IF_AFP2)
 F(0x7800, LE,  RX_a,  Z,   0, m2_32u, 0, e1, mov2, 0, IF_AFP1)
 F(0xed64, LEY, RXY_a, LD,  0, m2_32u, 0, e1, mov2, 0, IF_AFP1)
-F(0xb365, LXR, RRE,   Z,   x2h, x2l, 0, x1, movx, 0, IF_AFP1)
+F(0xb365, LXR, RRE,   Z,   x2h, x2l, 0, x1_P, movx, 0, IF_AFP1)
 /* LOAD IMMEDIATE */
 C(0xc001, LGFI,RIL_a, EI,  0, i2, 0, r1, mov2, 0)
 /* LOAD RELATIVE LONG */
@@ -461,7 +461,7 @@
 C(0xe332, LTGF,RXY_a, GIE, 0, a2, r1, 0, ld32s, s64)
 F(0xb302, LTEBR,   RRE,   Z,   0, e2, 0, cond_e1e2, mov2, f32, IF_BFP)
 F(0xb312, LTDBR,   RRE,   Z,   0, f2, 0, f1, mov2, f64, IF_BFP)
-F(0xb342, LTXBR,   RRE,   Z,   x2h, x2l, 0, x1, movx, f128, IF_BFP)
+F(0xb342, LTXBR,   RRE,   Z,   x2h, x2l, 0, x1_P, movx, f128, IF_BFP)
 /* LOAD AND TRAP */
 C(0xe39f, LAT, RXY_a, LAT, 0, m2_32u, r1, 0, lat, 0)
 C(0xe385, LGAT,RXY_a, LAT, 0, a2, r1, 0, lgat, 0)
@@ -483,7 +483,7 @@
 C(0xb913, LCGFR,   RRE,   Z,   0, r2_32s, r1, 0, neg, neg64)
 F(0xb303, LCEBR,   RRE,   Z,   0, e2, new, e1, negf32, f32, IF_BFP)
 F(0xb313, LCDBR,   RRE,   Z,   0, f2, new, f1, negf64, f64, IF_BFP)
-F(0xb343, LCXBR,   RRE,   Z,   x2h, x2l, new_P, x1, negf128, f128, IF_BFP)
+F(0xb343, LCXBR,   RRE,   Z,   x2h, x2l, new_P, x1_P, negf128, f128, IF_BFP)
 F(0xb373, LCDFR,   RRE,   FPSSH, 0, f2, new, f1, negf64, 0, IF_AFP1 | IF_AFP2)
 /* LOAD COUNT TO BLOCK BOUNDARY */
 C(0xe727, LCBB,RXE,   V,   la2, 0, r1, 0, lcbb, 0)
@@ -552,7 +552,7 @@
 C(0xb911, LNGFR,   RRE,   Z,   0, r2_32s, r1, 0, nabs, nabs64)
 F(0xb301, LNEBR,   RRE,   Z,   0, e2, new, e1, nabsf32, f32, IF_BFP)
 F(0xb311, LNDBR,   RRE,   Z,   0, f2, new, f1, nabsf64, f64, IF_BFP)
-F(0xb341, LNXBR,   RRE,   Z,   x2h, x2l, new_P, x1, nabsf128, f128, IF_BFP)
+F(0xb341, LNXBR,   RRE,   Z,   x2h, x2l, new_P, x1_P, nabsf128, f128, IF_BFP)
 F(0xb371, LNDFR,   RRE,   FPSSH, 0, f2, new, f1, nabsf64, 0, IF_AFP1 | IF_AFP2)
 /* LOAD ON CONDITION */
 C(0xb9f2, LOCR,RRF_c, LOC, r1, r2, new, r1_32, loc, 0)
@@ -577,7 +577,7 @@
 C(0xb910, LPGFR,   RRE,   Z,   0, r2_32s, r1, 0, abs, abs64)
 F(0xb300, LPEBR,   RRE,   Z,   0, e2, new, e1, absf32, f32, IF_BFP)
 F(0xb310, LPDBR,   RRE,   Z,   0, f2, new, f1, absf64, f64, IF_BFP)
-F(0xb340, LPXBR,   RRE,   Z,   x2h, x2l, new_P, x1, absf128, f128, IF_BFP)
+F(0xb340, LPXBR,   RRE,   Z,   x2h, x2l, new_P, x1_P, absf128, f128, IF_BFP)
 F(0xb370, LPDFR,   RRE,   FPSSH, 0, f2, new, f1, absf64, 0, IF_AFP1 | IF_AFP2)
 /* LOAD REVERSED */
 C(0xb91f, LRVR,RRE,   Z,   0, r2_32u, new, r1_32, rev32, 0)
@@ -588,7 +588,7 @@
 /* LOAD ZERO */
 F(0xb374, LZER,RRE,   Z,   0, 0, 0, e1, zero, 0, IF_AFP1)
 F(0xb375, LZDR,RRE,   Z,   0, 0, 0, f1, zero, 0, IF_AFP1)
-F(0xb376, LZXR,RRE,   Z,   0, 0, 0, x1, zero2, 0, IF_AFP1)
+F(0xb376, LZXR,RRE,   Z,   0, 0, 0, x1_P, zero2, 0, IF_AFP1)
 
 /* LOAD FPC */
 F(0xb29d, LFPC,S, Z,   0, m2_32u, 0, 0, sfpc, 0, IF_BFP)
diff --git a/target/s390x/tcg/translate.c b/target/s390x/tcg/translate.c
index f3e4b70ed9..d25b6f3c03 100644
--- a/target/s390x/tcg/translate.c
+++ b/target/s390x/tcg/translate.c
@@ -5518,6 +5518,14 @@ static void wout_x1(DisasContext *s, DisasOps *o)
 }
 #define SPEC_wout_x1 SPEC_r1_f128
 
+static void wout_x1_P(DisasContext *s, DisasOps *o)
+{
+int f1 = get_field(s, r1);
+store_freg(f1, o->out);
+store_freg(f1 + 2, o->out2);
+}
+#define SPEC_wout_x1_P SPEC_r1_f128
+
 static void wout_cond_r1r2_32(DisasContext *s, DisasOps *o)
 {
 if (get_field(s, r1) != get_field(s, r2)) {
-- 
2.34.1

[PATCH] KVM: dirty ring: check if vcpu is created before dirty_ring_reap_one

2023-02-04 Thread Weinan Liu

From: Weinan Liu 

The assertion '(dirty_gfns && ring_size)' in kvm_dirty_ring_reap_one fails
if the vcpu has not finished being created yet. This bug occasionally occurs
when I start 200+ QEMU instances on my 16G, 6-core x86 machine. It can be
triggered reliably by inserting a 'sleep(10)' into kvm_vcpu_thread_fn, as below:

 static void *kvm_vcpu_thread_fn(void *arg)
 {
 CPUState *cpu = arg;
 int r;

 rcu_register_thread();

+sleep(10);
 qemu_mutex_lock_iothread();
 qemu_thread_get_self(cpu->thread);
 cpu->thread_id = qemu_get_thread_id();
 cpu->can_do_io = 1;

In that case the dirty ring reaper wakes up while a vcpu has not yet
finished being created.

Signed-off-by: Weinan Liu 
---
 accel/kvm/kvm-all.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 7e6a6076b1..840da7630e 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -719,6 +719,15 @@ static uint64_t kvm_dirty_ring_reap_locked(KVMState *s, CPUState* cpu)
 total = kvm_dirty_ring_reap_one(s, cpu);
 } else {
 CPU_FOREACH(cpu) {
+/*
+ * Must ensure kvm_init_vcpu is finished, so cpu->kvm_dirty_gfns is
+ * available.
+ */
+while (cpu->created == false) {
+qemu_mutex_unlock_iothread();
+qemu_mutex_lock_iothread();
+}
+
 total += kvm_dirty_ring_reap_one(s, cpu);
 }
 }
-- 
2.25.1

[PULL 16/22] linux-user: Fix /proc/cpuinfo output for hppa

2023-02-04 Thread Laurent Vivier

From: Helge Deller 

The hppa architecture provides its own output for the emulated
/proc/cpuinfo file.

Some userspace applications count (even if that's not the recommended
way) the number of lines which start with "processor:" and assume that
this number then reflects the number of online CPUs. Since the hppa
output doesn't provide any such line, applications may assume "0"
CPUs.  One such issue can be seen in this Debian bug report:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1024653

Avoid such issues by adding a "processor:" line for each of the online
CPUs.

Signed-off-by: Helge Deller 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Richard Henderson 
Reviewed-by: Laurent Vivier 
Message-Id: 
Signed-off-by: Laurent Vivier 
---
 linux-user/syscall.c | 16 +++-
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 1c42df651801..55d53b344b84 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -8232,11 +8232,17 @@ static int open_cpuinfo(CPUArchState *cpu_env, int fd)
 #if defined(TARGET_HPPA)
 static int open_cpuinfo(CPUArchState *cpu_env, int fd)
 {
-dprintf(fd, "cpu family\t: PA-RISC 1.1e\n");
-dprintf(fd, "cpu\t\t: PA7300LC (PCX-L2)\n");
-dprintf(fd, "capabilities\t: os32\n");
-dprintf(fd, "model\t\t: 9000/778/B160L\n");
-dprintf(fd, "model name\t: Merlin L2 160 QEMU (9000/778/B160L)\n");
+int i, num_cpus;
+
+num_cpus = sysconf(_SC_NPROCESSORS_ONLN);
+for (i = 0; i < num_cpus; i++) {
+dprintf(fd, "processor\t: %d\n", i);
+dprintf(fd, "cpu family\t: PA-RISC 1.1e\n");
+dprintf(fd, "cpu\t\t: PA7300LC (PCX-L2)\n");
+dprintf(fd, "capabilities\t: os32\n");
+dprintf(fd, "model\t\t: 9000/778/B160L - "
+"Merlin L2 160 QEMU (9000/778/B160L)\n\n");
+}
 return 0;
 }
 #endif
-- 
2.39.1
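
To illustrate the failure mode described in the commit message, the following sketch counts the lines of a cpuinfo-style text that begin with "processor", which is roughly what the affected applications do. The function name count_processor_lines is hypothetical and the code is not taken from any of those applications.

```c
#include <assert.h>
#include <string.h>

/* Count the lines in 'text' that start with the word "processor". */
static int count_processor_lines(const char *text)
{
    static const char key[] = "processor";
    const char *line = text;
    int count = 0;

    while (*line != '\0') {
        const char *nl;

        if (strncmp(line, key, sizeof(key) - 1) == 0) {
            count++;
        }
        nl = strchr(line, '\n');
        if (nl == NULL) {
            break;              /* last line has no trailing newline */
        }
        line = nl + 1;
    }
    return count;
}
```

Fed the old hppa output, which has no "processor" line, such a counter returns 0. With the new per-CPU blocks it returns one count per online CPU.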

[PULL 03/22] linux-user/strace: Add output for execveat() syscall

2023-02-04 Thread Laurent Vivier

From: Drew DeVault 

Signed-off-by: Drew DeVault 
Message-Id: <20221104081015.706009-1-...@cmpwn.com>
Suggested-by: Helge Deller 
[PMD: Split of bigger patch]
Signed-off-by: Philippe Mathieu-Daudé 
Reviewed-by: Laurent Vivier 
Message-Id: <20221104173632.1052-4-phi...@linaro.org>
Signed-off-by: Laurent Vivier 
---
 linux-user/strace.c| 23 +++
 linux-user/strace.list |  2 +-
 2 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/linux-user/strace.c b/linux-user/strace.c
index 3d11d2f75978..7bccb4f0c067 100644
--- a/linux-user/strace.c
+++ b/linux-user/strace.c
@@ -1104,6 +1104,16 @@ UNUSED static const struct flags clone_flags[] = {
 FLAG_END,
 };
 
+UNUSED static const struct flags execveat_flags[] = {
+#ifdef AT_EMPTY_PATH
+FLAG_GENERIC(AT_EMPTY_PATH),
+#endif
+#ifdef AT_SYMLINK_NOFOLLOW
+FLAG_GENERIC(AT_SYMLINK_NOFOLLOW),
+#endif
+FLAG_END,
+};
+
 UNUSED static const struct flags msg_flags[] = {
 /* send */
 FLAG_GENERIC(MSG_CONFIRM),
@@ -1976,6 +1986,19 @@ print_execve(CPUArchState *cpu_env, const struct syscallname *name,
 print_syscall_epilogue(name);
 }
 
+static void
+print_execveat(CPUArchState *cpu_env, const struct syscallname *name,
+   abi_long arg1, abi_long arg2, abi_long arg3,
+   abi_long arg4, abi_long arg5, abi_long arg6)
+{
+print_syscall_prologue(name);
+print_at_dirfd(arg1, 0);
+print_string(arg2, 0);
+print_execve_argv(arg3, 0);
+print_flags(execveat_flags, arg5, 1);
+print_syscall_epilogue(name);
+}
+
 #if defined(TARGET_NR_faccessat) || defined(TARGET_NR_faccessat2)
 static void
 print_faccessat(CPUArchState *cpu_env, const struct syscallname *name,
diff --git a/linux-user/strace.list b/linux-user/strace.list
index 3a898e2532d3..bb21c054148e 100644
--- a/linux-user/strace.list
+++ b/linux-user/strace.list
@@ -164,7 +164,7 @@
 { TARGET_NR_execve, "execve" , NULL, print_execve, NULL },
 #endif
 #ifdef TARGET_NR_execveat
-{ TARGET_NR_execveat, "execveat" , NULL, NULL, NULL },
+{ TARGET_NR_execveat, "execveat" , NULL, print_execveat, NULL },
 #endif
 #ifdef TARGET_NR_exec_with_loader
 { TARGET_NR_exec_with_loader, "exec_with_loader" , NULL, NULL, NULL },
-- 
2.39.1

[PULL 02/22] linux-user/strace: Extract print_execve_argv() from print_execve()

2023-02-04 Thread Laurent Vivier

From: Drew DeVault 

In order to add print_execveat(), which reuses common code from
print_execve(), extract print_execve_argv() from it.

Signed-off-by: Drew DeVault 
Message-Id: <20221104081015.706009-1-...@cmpwn.com>
[PMD: Split of bigger patch, filled description, fixed style]
Signed-off-by: Philippe Mathieu-Daudé 
Reviewed-by: Laurent Vivier 
Message-Id: <20221104173632.1052-3-phi...@linaro.org>
Signed-off-by: Laurent Vivier 
---
 linux-user/strace.c | 71 +
 1 file changed, 39 insertions(+), 32 deletions(-)

diff --git a/linux-user/strace.c b/linux-user/strace.c
index 25c47f03160d..3d11d2f75978 100644
--- a/linux-user/strace.c
+++ b/linux-user/strace.c
@@ -616,38 +616,6 @@ print_semctl(CPUArchState *cpu_env, const struct syscallname *name,
 }
 #endif
 
-static void
-print_execve(CPUArchState *cpu_env, const struct syscallname *name,
- abi_long arg1, abi_long arg2, abi_long arg3,
- abi_long arg4, abi_long arg5, abi_long arg6)
-{
-abi_ulong arg_ptr_addr;
-char *s;
-
-if (!(s = lock_user_string(arg1)))
-return;
-qemu_log("%s(\"%s\",{", name->name, s);
-unlock_user(s, arg1, 0);
-
-for (arg_ptr_addr = arg2; ; arg_ptr_addr += sizeof(abi_ulong)) {
-abi_ulong *arg_ptr, arg_addr;
-
-arg_ptr = lock_user(VERIFY_READ, arg_ptr_addr, sizeof(abi_ulong), 1);
-if (!arg_ptr)
-return;
-arg_addr = tswapal(*arg_ptr);
-unlock_user(arg_ptr, arg_ptr_addr, 0);
-if (!arg_addr)
-break;
-if ((s = lock_user_string(arg_addr))) {
-qemu_log("\"%s\",", s);
-unlock_user(s, arg_addr, 0);
-}
-}
-
-qemu_log("NULL})");
-}
-
 #ifdef TARGET_NR_ipc
 static void
 print_ipc(CPUArchState *cpu_env, const struct syscallname *name,
@@ -1969,6 +1937,45 @@ print_execv(CPUArchState *cpu_env, const struct syscallname *name,
 }
 #endif
 
+static void
+print_execve_argv(abi_long argv, int last)
+{
+abi_ulong arg_ptr_addr;
+char *s;
+
+qemu_log("{");
+for (arg_ptr_addr = argv; ; arg_ptr_addr += sizeof(abi_ulong)) {
+abi_ulong *arg_ptr, arg_addr;
+
+arg_ptr = lock_user(VERIFY_READ, arg_ptr_addr, sizeof(abi_ulong), 1);
+if (!arg_ptr) {
+return;
+}
+arg_addr = tswapal(*arg_ptr);
+unlock_user(arg_ptr, arg_ptr_addr, 0);
+if (!arg_addr) {
+break;
+}
+s = lock_user_string(arg_addr);
+if (s) {
+qemu_log("\"%s\",", s);
+unlock_user(s, arg_addr, 0);
+}
+}
+qemu_log("NULL}%s", get_comma(last));
+}
+
+static void
+print_execve(CPUArchState *cpu_env, const struct syscallname *name,
+ abi_long arg1, abi_long arg2, abi_long arg3,
+ abi_long arg4, abi_long arg5, abi_long arg6)
+{
+print_syscall_prologue(name);
+print_string(arg1, 0);
+print_execve_argv(arg2, 1);
+print_syscall_epilogue(name);
+}
+
 #if defined(TARGET_NR_faccessat) || defined(TARGET_NR_faccessat2)
 static void
 print_faccessat(CPUArchState *cpu_env, const struct syscallname *name,
-- 
2.39.1

[PULL 17/22] linux-user: Improve strace output of personality() and sysinfo()

2023-02-04 Thread Laurent Vivier

From: Helge Deller 

Make the strace look nicer for those two syscalls.

Signed-off-by: Helge Deller 
Reviewed-by: Richard Henderson 
Reviewed-by: Laurent Vivier 
Message-Id: 
Signed-off-by: Laurent Vivier 
---
 linux-user/strace.list | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/linux-user/strace.list b/linux-user/strace.list
index cf291d02edfe..3a1f61803a39 100644
--- a/linux-user/strace.list
+++ b/linux-user/strace.list
@@ -1049,7 +1049,8 @@
 { TARGET_NR_perfctr, "perfctr" , NULL, NULL, NULL },
 #endif
 #ifdef TARGET_NR_personality
-{ TARGET_NR_personality, "personality" , NULL, NULL, NULL },
+{ TARGET_NR_personality, "personality" , "%s(0x"TARGET_ABI_FMT_lx")", NULL,
+  print_syscall_ret_addr },
 #endif
 #ifdef TARGET_NR_pipe
 { TARGET_NR_pipe, "pipe" , NULL, NULL, NULL },
@@ -1504,7 +1505,7 @@
 { TARGET_NR_sysfs, "sysfs" , NULL, NULL, NULL },
 #endif
 #ifdef TARGET_NR_sysinfo
-{ TARGET_NR_sysinfo, "sysinfo" , NULL, NULL, NULL },
+{ TARGET_NR_sysinfo, "sysinfo" , "%s(%p)", NULL, NULL },
 #endif
 #ifdef TARGET_NR_sys_kexec_load
 { TARGET_NR_sys_kexec_load, "sys_kexec_load" , NULL, NULL, NULL },
-- 
2.39.1
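
The personality() entry above builds its format by C string-literal concatenation: the width-specific conversion is pasted between "%s(0x" and ")". A small sketch of the same idea follows, assuming a made-up TOY_ABI_FMT_lx macro and a hypothetical toy_format_syscall() helper; the actual TARGET_ABI_FMT_lx definition depends on the target ABI and is not reproduced here.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical conversion for a 32-bit guest long; illustration only. */
#define TOY_ABI_FMT_lx "%08lx"

/* Format "name(0x<arg>)" the way a strace.list format string would. */
static int toy_format_syscall(char *buf, size_t size,
                              const char *name, unsigned long arg)
{
    assert(buf != NULL && size > 0);
    /* Adjacent literals are joined by the compiler into one format. */
    return snprintf(buf, size, "%s(0x" TOY_ABI_FMT_lx ")", name, arg);
}
```

With this macro, toy_format_syscall(buf, sizeof(buf), "personality", 0xffffffffUL) leaves personality(0xffffffff) in buf.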

[PULL 12/22] linux-user: Add strace output for clock_getres_time64() and futex_time64()

2023-02-04 Thread Laurent Vivier

From: Helge Deller 

Add the two syscalls to strace output to avoid "Unknown syscall" message.

Signed-off-by: Helge Deller 
Reviewed-by: Laurent Vivier 
Message-Id: <20230115113517.25143-1-del...@gmx.de>
Signed-off-by: Laurent Vivier 
---
 linux-user/strace.list | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/linux-user/strace.list b/linux-user/strace.list
index bb21c054148e..64db8e6b8412 100644
--- a/linux-user/strace.list
+++ b/linux-user/strace.list
@@ -86,6 +86,9 @@
 { TARGET_NR_clock_getres, "clock_getres" , NULL, print_clock_getres,
   print_syscall_ret_clock_getres },
 #endif
+#ifdef TARGET_NR_clock_getres_time64
+{ TARGET_NR_clock_getres_time64, "clock_getres_time64" , NULL, NULL, NULL },
+#endif
 #ifdef TARGET_NR_clock_gettime
 { TARGET_NR_clock_gettime, "clock_gettime" , NULL, print_clock_gettime,
print_syscall_ret_clock_gettime },
@@ -275,6 +278,9 @@
 #ifdef TARGET_NR_futex
 { TARGET_NR_futex, "futex" , NULL, print_futex, NULL },
 #endif
+#ifdef TARGET_NR_futex_time64
+{ TARGET_NR_futex_time64, "futex_time64" , NULL, NULL, NULL },
+#endif
 #ifdef TARGET_NR_futimesat
 { TARGET_NR_futimesat, "futimesat" , NULL, print_futimesat, NULL },
 #endif
-- 
2.39.1

[PULL 19/22] linux-user: Show 4th argument of rt_sigprocmask() in strace

2023-02-04 Thread Laurent Vivier

From: Helge Deller 

Add output for the missing 4th parameter (size_t sigsetsize).

Signed-off-by: Helge Deller 
Reviewed-by: Richard Henderson 
Reviewed-by: Laurent Vivier 
Message-Id: 
Signed-off-by: Laurent Vivier 
---
 linux-user/strace.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/linux-user/strace.c b/linux-user/strace.c
index f38227ba5db5..340010661c4f 100644
--- a/linux-user/strace.c
+++ b/linux-user/strace.c
@@ -3224,7 +3224,8 @@ print_rt_sigprocmask(CPUArchState *cpu_env, const struct syscallname *name,
 }
 qemu_log("%s,", how);
 print_pointer(arg1, 0);
-print_pointer(arg2, 1);
+print_pointer(arg2, 0);
+print_raw_param("%u", arg3, 1);
 print_syscall_epilogue(name);
 }
 #endif
-- 
2.39.1

[PULL 06/22] linux-user: Add missing MAP_HUGETLB and MAP_STACK flags in strace

2023-02-04 Thread Laurent Vivier

From: Helge Deller 

Add two missing mmap flags.

Signed-off-by: Helge Deller 
Reviewed-by: Laurent Vivier 

Message-Id: 
Signed-off-by: Laurent Vivier 
---
 linux-user/strace.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/linux-user/strace.c b/linux-user/strace.c
index 7bccb4f0c067..5027289bdde4 100644
--- a/linux-user/strace.c
+++ b/linux-user/strace.c
@@ -1057,6 +1057,8 @@ UNUSED static const struct flags mmap_flags[] = {
 #ifdef TARGET_MAP_UNINITIALIZED
 FLAG_TARGET(MAP_UNINITIALIZED),
 #endif
+FLAG_TARGET(MAP_HUGETLB),
+FLAG_TARGET(MAP_STACK),
 FLAG_END,
 };
 
-- 
2.39.1

[PULL 15/22] linux-user: Fix SO_ERROR return code of getsockopt()

2023-02-04 Thread Laurent Vivier

From: Helge Deller 

Add translation for the host error return code of:
getsockopt(19, SOL_SOCKET, SO_ERROR, [ECONNREFUSED], [4]) = 0

This fixes the testsuite of the cockpit Debian package with an
hppa-linux guest on an x86-64 host.

Signed-off-by: Helge Deller 
Reviewed-by: Richard Henderson 
Reviewed-by: Laurent Vivier 
Message-Id: 
Signed-off-by: Laurent Vivier 
---
 linux-user/syscall.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 210db5f0be94..1c42df651801 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -2758,8 +2758,13 @@ get_timeout:
 ret = get_errno(getsockopt(sockfd, level, optname, &val, &lv));
 if (ret < 0)
 return ret;
-if (optname == SO_TYPE) {
+switch (optname) {
+case SO_TYPE:
 val = host_to_target_sock_type(val);
+break;
+case SO_ERROR:
+val = host_to_target_errno(val);
+break;
 }
 if (len > lv)
 len = lv;
-- 
2.39.1
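
The SO_ERROR fix above matters because the value returned through getsockopt() is itself an errno, and errno numbers can differ between host and guest. The sketch below illustrates the post-processing step under stated assumptions: TOY_TARGET_ECONNREFUSED is an illustrative guest value, and toy_host_to_target_errno() and toy_fixup_sockopt_value() are hypothetical stand-ins, not QEMU functions.

```c
#include <assert.h>
#include <errno.h>

/* Illustrative guest errno number; assumed for this example only. */
#define TOY_TARGET_ECONNREFUSED 239

enum toy_optname { TOY_SO_TYPE, TOY_SO_ERROR, TOY_SO_OTHER };

/* Hypothetical one-entry translation table from host to guest errno. */
static int toy_host_to_target_errno(int host_err)
{
    assert(host_err >= 0);
    switch (host_err) {
    case ECONNREFUSED:
        return TOY_TARGET_ECONNREFUSED;
    default:
        return host_err;
    }
}

/* Translate the option value only for options that carry an errno. */
static int toy_fixup_sockopt_value(enum toy_optname optname, int val)
{
    switch (optname) {
    case TOY_SO_ERROR:
        return toy_host_to_target_errno(val);
    default:
        return val;
    }
}
```

Without the TOY_SO_ERROR case the guest would see the host's number for "connection refused" and could misreport the failure.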

[PULL 09/22] linux-user: add more netlink protocol constants

2023-02-04 Thread Laurent Vivier

From: Letu Ren 

Currently, qemu strace only prints four protocol constants. This patch
adds the others listed in "linux/netlink.h".

Signed-off-by: Letu Ren 
Message-Id: <20230101141105.12024-1-fantasq...@gmail.com>
Signed-off-by: Laurent Vivier 
---
 linux-user/strace.c | 48 +
 1 file changed, 48 insertions(+)

diff --git a/linux-user/strace.c b/linux-user/strace.c
index 081fc87344ca..f38227ba5db5 100644
--- a/linux-user/strace.c
+++ b/linux-user/strace.c
@@ -506,21 +506,69 @@ print_socket_protocol(int domain, int type, int protocol)
 case NETLINK_ROUTE:
 qemu_log("NETLINK_ROUTE");
 break;
+case NETLINK_UNUSED:
+qemu_log("NETLINK_UNUSED");
+break;
+case NETLINK_USERSOCK:
+qemu_log("NETLINK_USERSOCK");
+break;
+case NETLINK_FIREWALL:
+qemu_log("NETLINK_FIREWALL");
+break;
+case NETLINK_SOCK_DIAG:
+qemu_log("NETLINK_SOCK_DIAG");
+break;
+case NETLINK_NFLOG:
+qemu_log("NETLINK_NFLOG");
+break;
+case NETLINK_XFRM:
+qemu_log("NETLINK_XFRM");
+break;
+case NETLINK_SELINUX:
+qemu_log("NETLINK_SELINUX");
+break;
+case NETLINK_ISCSI:
+qemu_log("NETLINK_ISCSI");
+break;
 case NETLINK_AUDIT:
 qemu_log("NETLINK_AUDIT");
 break;
+case NETLINK_FIB_LOOKUP:
+qemu_log("NETLINK_FIB_LOOKUP");
+break;
+case NETLINK_CONNECTOR:
+qemu_log("NETLINK_CONNECTOR");
+break;
 case NETLINK_NETFILTER:
 qemu_log("NETLINK_NETFILTER");
 break;
+case NETLINK_IP6_FW:
+qemu_log("NETLINK_IP6_FW");
+break;
+case NETLINK_DNRTMSG:
+qemu_log("NETLINK_DNRTMSG");
+break;
 case NETLINK_KOBJECT_UEVENT:
 qemu_log("NETLINK_KOBJECT_UEVENT");
 break;
+case NETLINK_GENERIC:
+qemu_log("NETLINK_GENERIC");
+break;
+case NETLINK_SCSITRANSPORT:
+qemu_log("NETLINK_SCSITRANSPORT");
+break;
+case NETLINK_ECRYPTFS:
+qemu_log("NETLINK_ECRYPTFS");
+break;
 case NETLINK_RDMA:
 qemu_log("NETLINK_RDMA");
 break;
 case NETLINK_CRYPTO:
 qemu_log("NETLINK_CRYPTO");
 break;
+case NETLINK_SMC:
+qemu_log("NETLINK_SMC");
+break;
 default:
 qemu_log("%d", protocol);
 break;
-- 
2.39.1

[PULL 18/22] linux-user: Add emulation for MADV_WIPEONFORK and MADV_KEEPONFORK in madvise()

2023-02-04 Thread Laurent Vivier

From: Helge Deller 

Both parameters have a different value on the parisc platform, so first
translate the target value into a host value for usage in the native
madvise() syscall.

Those parameters are often used by security sensitive applications (e.g.
tor browser, boringssl, ...) which expect the call to return a proper
return code on failure, so return -EINVAL if qemu fails to forward the
syscall to the host OS.

While touching this code, enhance the comments about MADV_DONTNEED.

Tested with testcase of tor browser when running hppa-linux guest on
x86-64 host.

Signed-off-by: Helge Deller 
Acked-by: Ilya Leoshkevich 
Reviewed-by: Laurent Vivier 
Message-Id: 
Signed-off-by: Laurent Vivier 
---
 linux-user/mmap.c | 56 ---
 1 file changed, 43 insertions(+), 13 deletions(-)

diff --git a/linux-user/mmap.c b/linux-user/mmap.c
index 10f5079331c3..28135c9e6aa9 100644
--- a/linux-user/mmap.c
+++ b/linux-user/mmap.c
@@ -857,7 +857,7 @@ abi_long target_mremap(abi_ulong old_addr, abi_ulong old_size,
 return new_addr;
 }
 
-static bool can_passthrough_madv_dontneed(abi_ulong start, abi_ulong end)
+static bool can_passthrough_madvise(abi_ulong start, abi_ulong end)
 {
 ulong addr;
 
@@ -901,23 +901,53 @@ abi_long target_madvise(abi_ulong start, abi_ulong len_in, int advice)
 return -TARGET_EINVAL;
 }
 
+/* Translate for some architectures which have different MADV_xxx values */
+switch (advice) {
+case TARGET_MADV_DONTNEED:  /* alpha */
+advice = MADV_DONTNEED;
+break;
+case TARGET_MADV_WIPEONFORK:/* parisc */
+advice = MADV_WIPEONFORK;
+break;
+case TARGET_MADV_KEEPONFORK:/* parisc */
+advice = MADV_KEEPONFORK;
+break;
+/* we do not care about the other MADV_xxx values yet */
+}
+
 /*
- * A straight passthrough may not be safe because qemu sometimes turns
- * private file-backed mappings into anonymous mappings.
+ * Most advice values are hints, so ignoring and returning success is ok.
+ *
+ * However, some advice values such as MADV_DONTNEED, MADV_WIPEONFORK and
+ * MADV_KEEPONFORK are not hints and need to be emulated.
  *
- * This is a hint, so ignoring and returning success is ok.
+ * A straight passthrough for those may not be safe because qemu sometimes
+ * turns private file-backed mappings into anonymous mappings.
+ * can_passthrough_madvise() helps to check if a passthrough is possible by
+ * comparing mappings that are known to have the same semantics in the host
+ * and the guest. In this case passthrough is safe.
  *
- * This breaks MADV_DONTNEED, completely implementing which is quite
- * complicated. However, there is one low-hanging fruit: mappings that are
- * known to have the same semantics in the host and the guest. In this case
- * passthrough is safe, so do it.
+ * We pass through MADV_WIPEONFORK and MADV_KEEPONFORK if possible and
+ * return failure if not.
+ *
+ * MADV_DONTNEED is passed through as well, if possible.
+ * If passthrough isn't possible, we nevertheless (wrongly!) return
+ * success, which is broken but some userspace programs fail to work
+ * otherwise. Completely implementing such emulation is quite complicated
+ * though.
  */
 mmap_lock();
-if (advice == TARGET_MADV_DONTNEED &&
-can_passthrough_madv_dontneed(start, end)) {
-ret = get_errno(madvise(g2h_untagged(start), len, MADV_DONTNEED));
-if (ret == 0) {
-page_reset_target_data(start, start + len);
+switch (advice) {
+case MADV_WIPEONFORK:
+case MADV_KEEPONFORK:
+ret = -EINVAL;
+/* fall through */
+case MADV_DONTNEED:
+if (can_passthrough_madvise(start, end)) {
+ret = get_errno(madvise(g2h_untagged(start), len, advice));
+if ((advice == MADV_DONTNEED) && (ret == 0)) {
+page_reset_target_data(start, start + len);
+}
 }
 }
 mmap_unlock();
-- 
2.39.1
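
The fall-through switch in the madvise() patch above encodes three outcomes: MADV_WIPEONFORK and MADV_KEEPONFORK fail with EINVAL unless passthrough is possible, MADV_DONTNEED reports success even when it cannot be passed through, and every other value is treated as a hint. The sketch below reproduces only that decision logic, assuming the passthrough check and the host madvise() result are supplied as parameters. The enum and toy_madvise_result() are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

enum toy_advice {
    TOY_DONTNEED,
    TOY_WIPEONFORK,
    TOY_KEEPONFORK,
    TOY_OTHER_HINT
};

/*
 * can_passthrough: result of the mapping check (assumed given).
 * host_ret: what the host madvise() would return, 0 or a negative errno.
 */
static int toy_madvise_result(enum toy_advice advice,
                              bool can_passthrough, int host_ret)
{
    int ret = 0;

    assert(host_ret <= 0);
    switch (advice) {
    case TOY_WIPEONFORK:
    case TOY_KEEPONFORK:
        ret = -EINVAL;          /* not a hint: fail unless passed through */
        /* fall through */
    case TOY_DONTNEED:
        if (can_passthrough) {
            ret = host_ret;
        }
        break;
    default:
        break;                  /* plain hint: ignore, report success */
    }
    return ret;
}
```

Setting the default failure before falling through keeps a single passthrough path for all three advice values, which is the design the patch description outlines.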

[PULL 07/22] linux-user: un-parent OBJECT(cpu) when closing thread

2023-02-04 Thread Laurent Vivier

From: Richard Henderson 

This reinstates commit 52f0c1607671293afcdb2acc2f83e9bccbfa74bb:

While forcing the CPU to unrealize by hand does trigger the clean-up
code we never fully free resources because refcount never reaches
zero. This is because QOM automatically added objects without an
explicit parent to /unattached/, incrementing the refcount.

Instead of manually triggering unrealization just unparent the object
and let the device machinery deal with that for us.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/866
Signed-off-by: Alex Bennée 
Reviewed-by: Laurent Vivier 
Message-Id: <20220811151413.3350684-2-alex.ben...@linaro.org>

The original patch tickled a problem in target/arm, and was reverted.
But that problem is fixed as of commit 3b07a936d3bf.

Signed-off-by: Richard Henderson 
Message-Id: <20230124201019.3935934-1-richard.hender...@linaro.org>
Signed-off-by: Laurent Vivier 
---
 linux-user/syscall.c | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 3e72bd333ede..dbf51e500b4f 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -8756,7 +8756,13 @@ static abi_long do_syscall1(CPUArchState *cpu_env, int num, abi_long arg1,
 if (CPU_NEXT(first_cpu)) {
 TaskState *ts = cpu->opaque;
 
-object_property_set_bool(OBJECT(cpu), "realized", false, NULL);
+if (ts->child_tidptr) {
+put_user_u32(0, ts->child_tidptr);
+do_sys_futex(g2h(cpu, ts->child_tidptr),
+ FUTEX_WAKE, INT_MAX, NULL, NULL, 0);
+}
+
+object_unparent(OBJECT(cpu));
 object_unref(OBJECT(cpu));
 /*
  * At this point the CPU should be unrealized and removed
@@ -8766,11 +8772,6 @@ static abi_long do_syscall1(CPUArchState *cpu_env, int num, abi_long arg1,
 
 pthread_mutex_unlock(&clone_lock);
 
-if (ts->child_tidptr) {
-put_user_u32(0, ts->child_tidptr);
-do_sys_futex(g2h(cpu, ts->child_tidptr),
- FUTEX_WAKE, INT_MAX, NULL, NULL, 0);
-}
 thread_cpu = NULL;
 g_free(ts);
 rcu_unregister_thread();
-- 
2.39.1

[PULL 08/22] linux-user: fix strace build w/out munlockall

2023-02-04 Thread Laurent Vivier

From: Mike Frysinger 

Signed-off-by: Mike Frysinger 
Reviewed-by: Philippe Mathieu-Daudé 
Message-Id: <20230118090144.31155-1-vap...@gentoo.org>
Signed-off-by: Laurent Vivier 
---
 linux-user/strace.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/linux-user/strace.c b/linux-user/strace.c
index 5027289bdde4..081fc87344ca 100644
--- a/linux-user/strace.c
+++ b/linux-user/strace.c
@@ -1360,7 +1360,8 @@ UNUSED static const struct flags termios_lflags[] = {
 FLAG_END,
 };
 
-UNUSED static const struct flags mlockall_flags[] = {
+#ifdef TARGET_NR_mlockall
+static const struct flags mlockall_flags[] = {
 FLAG_TARGET(MCL_CURRENT),
 FLAG_TARGET(MCL_FUTURE),
 #ifdef MCL_ONFAULT
@@ -1368,6 +1369,7 @@ UNUSED static const struct flags mlockall_flags[] = {
 #endif
 FLAG_END,
 };
+#endif
 
 /* IDs of the various system clocks */
 #define TARGET_CLOCK_REALTIME  0
-- 
2.39.1

[PULL 22/22] linux-user: Allow sendmsg() without IOV

2023-02-04 Thread Laurent Vivier

From: Helge Deller 

Applications do call sendmsg() without any IOV, e.g.:
 sendmsg(4, {msg_name=NULL, msg_namelen=0, msg_iov=NULL, msg_iovlen=0,
msg_control=[{cmsg_len=36, cmsg_level=SOL_ALG, cmsg_type=0x2}],
msg_controllen=40, msg_flags=0}, MSG_MORE) = 0
 sendmsg(4, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="The quick brown fox jumps over t"..., iov_len=183}],
msg_iovlen=1, msg_control=[{cmsg_len=20, cmsg_level=SOL_ALG, cmsg_type=0x3}],
msg_controllen=24, msg_flags=0}, 0) = 183

The function do_sendrecvmsg_locked() is used for sendmsg() and recvmsg()
and calls lock_iovec() to lock the IOV into memory. For the first
sendmsg() above it returns NULL and thus wrongly skips the call to the
host sendmsg() syscall, which will break the calling application.

Fix this issue by:
- allowing sendmsg() even with an empty IOV
- skipping recvmsg() if the IOV is NULL
- skipping both if the return code of do_sendrecvmsg_locked() != 0, which
  indicates some failure like EFAULT on the IOV

Tested with the debian "ell" package with hppa guest on x86_64 host.

Signed-off-by: Helge Deller 
Reviewed-by: Laurent Vivier 
Message-Id: <20221212173416.90590-2-del...@gmx.de>
Signed-off-by: Laurent Vivier 
---
 linux-user/syscall.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index a0d2beddaa4e..1e868e9b0e27 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -3293,7 +3293,10 @@ static abi_long do_sendrecvmsg_locked(int fd, struct target_msghdr *msgp,
  target_vec, count, send);
 if (vec == NULL) {
 ret = -host_to_target_errno(errno);
-goto out2;
+/* allow sending packet without any iov, e.g. with MSG_MORE flag */
+if (!send || ret) {
+goto out2;
+}
 }
 msg.msg_iovlen = count;
 msg.msg_iov = vec;
@@ -3345,7 +3348,9 @@ static abi_long do_sendrecvmsg_locked(int fd, struct target_msghdr *msgp,
 }
 
 out:
-unlock_iovec(vec, target_vec, count, !send);
+if (vec) {
+unlock_iovec(vec, target_vec, count, !send);
+}
 out2:
 return ret;
 }
-- 
2.39.1

[PULL 04/22] linux-user/syscall: Extract do_execve() from do_syscall1()

2023-02-04 Thread Laurent Vivier

From: Drew DeVault 

execve() is a particular case of execveat(). In order
to add do_execveat(), first factor do_execve() out.

Signed-off-by: Drew DeVault 
Message-Id: <20221104081015.706009-1-...@cmpwn.com>
[PMD: Split of bigger patch, filled description, fixed style]
Signed-off-by: Philippe Mathieu-Daudé 
Reviewed-by: Laurent Vivier 
Message-Id: <20221104173632.1052-5-phi...@linaro.org>
Signed-off-by: Laurent Vivier 
---
 linux-user/syscall.c | 211 +++
 1 file changed, 114 insertions(+), 97 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 1f8c10f8ef94..11236d16a372 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -8357,6 +8357,119 @@ static int do_openat(CPUArchState *cpu_env, int dirfd, const char *pathname, int
 return safe_openat(dirfd, path(pathname), flags, mode);
 }
 
+static int do_execve(CPUArchState *cpu_env,
+   abi_long pathname, abi_long guest_argp,
+   abi_long guest_envp)
+{
+int ret;
+char **argp, **envp;
+int argc, envc;
+abi_ulong gp;
+abi_ulong addr;
+char **q;
+void *p;
+
+argc = 0;
+
+for (gp = guest_argp; gp; gp += sizeof(abi_ulong)) {
+if (get_user_ual(addr, gp)) {
+return -TARGET_EFAULT;
+}
+if (!addr) {
+break;
+}
+argc++;
+}
+envc = 0;
+for (gp = guest_envp; gp; gp += sizeof(abi_ulong)) {
+if (get_user_ual(addr, gp)) {
+return -TARGET_EFAULT;
+}
+if (!addr) {
+break;
+}
+envc++;
+}
+
+argp = g_new0(char *, argc + 1);
+envp = g_new0(char *, envc + 1);
+
+for (gp = guest_argp, q = argp; gp; gp += sizeof(abi_ulong), q++) {
+if (get_user_ual(addr, gp)) {
+goto execve_efault;
+}
+if (!addr) {
+break;
+}
+*q = lock_user_string(addr);
+if (!*q) {
+goto execve_efault;
+}
+}
+*q = NULL;
+
+for (gp = guest_envp, q = envp; gp; gp += sizeof(abi_ulong), q++) {
+if (get_user_ual(addr, gp)) {
+goto execve_efault;
+}
+if (!addr) {
+break;
+}
+*q = lock_user_string(addr);
+if (!*q) {
+goto execve_efault;
+}
+}
+*q = NULL;
+
+/*
+ * Although execve() is not an interruptible syscall it is
+ * a special case where we must use the safe_syscall wrapper:
+ * if we allow a signal to happen before we make the host
+ * syscall then we will 'lose' it, because at the point of
+ * execve the process leaves QEMU's control. So we use the
+ * safe syscall wrapper to ensure that we either take the
+ * signal as a guest signal, or else it does not happen
+ * before the execve completes and makes it the other
+ * program's problem.
+ */
+p = lock_user_string(pathname);
+if (!p) {
+goto execve_efault;
+}
+
+if (is_proc_myself(p, "exe")) {
+ret = get_errno(safe_execve(exec_path, argp, envp));
+} else {
+ret = get_errno(safe_execve(p, argp, envp));
+}
+
+unlock_user(p, pathname, 0);
+
+goto execve_end;
+
+execve_efault:
+ret = -TARGET_EFAULT;
+
+execve_end:
+for (gp = guest_argp, q = argp; *q; gp += sizeof(abi_ulong), q++) {
+if (get_user_ual(addr, gp) || !addr) {
+break;
+}
+unlock_user(*q, addr, 0);
+}
+for (gp = guest_envp, q = envp; *q; gp += sizeof(abi_ulong), q++) {
+if (get_user_ual(addr, gp) || !addr) {
+break;
+}
+unlock_user(*q, addr, 0);
+}
+
+g_free(argp);
+g_free(envp);
+return ret;
+}
+
 #define TIMER_MAGIC 0x0caf
 #define TIMER_MAGIC_MASK 0x
 
@@ -8867,103 +8980,7 @@ static abi_long do_syscall1(CPUArchState *cpu_env, int num, abi_long arg1,
 return ret;
 #endif
 case TARGET_NR_execve:
-{
-char **argp, **envp;
-int argc, envc;
-abi_ulong gp;
-abi_ulong guest_argp;
-abi_ulong guest_envp;
-abi_ulong addr;
-char **q;
-
-argc = 0;
-guest_argp = arg2;
-for (gp = guest_argp; gp; gp += sizeof(abi_ulong)) {
-if (get_user_ual(addr, gp))
-return -TARGET_EFAULT;
-if (!addr)
-break;
-argc++;
-}
-envc = 0;
-guest_envp = arg3;
-for (gp = guest_envp; gp; gp += sizeof(abi_ulong)) {
-if (get_user_ual(addr, gp))
-return -TARGET_EFAULT;
-if (!addr)
-break;
-envc++;
-}
-
-argp = g_new0(char *, argc + 1);
-envp = g_new0(char *, envc + 1);
-
-for (gp = guest_argp, q = argp; gp;
-

[PULL 20/22] linux-user: Enhance strace output for various syscalls

2023-02-04 Thread Laurent Vivier

From: Helge Deller 

Add appropriate strace printf formats for various Linux syscalls.

Signed-off-by: Helge Deller 
Reviewed-by: Philippe Mathieu-Daudé 
Message-Id: 
Signed-off-by: Laurent Vivier 
---
 linux-user/strace.list | 43 ++
 1 file changed, 23 insertions(+), 20 deletions(-)

diff --git a/linux-user/strace.list b/linux-user/strace.list
index 3a1f61803a39..d8acbeec6093 100644
--- a/linux-user/strace.list
+++ b/linux-user/strace.list
@@ -343,7 +343,7 @@
 { TARGET_NR_getpagesize, "getpagesize" , NULL, NULL, NULL },
 #endif
 #ifdef TARGET_NR_getpeername
-{ TARGET_NR_getpeername, "getpeername" , NULL, NULL, NULL },
+{ TARGET_NR_getpeername, "getpeername" , "%s(%d,%p,%p)", NULL, NULL },
 #endif
 #ifdef TARGET_NR_getpgid
 { TARGET_NR_getpgid, "getpgid" , "%s(%u)", NULL, NULL },
@@ -367,19 +367,19 @@
 { TARGET_NR_getrandom, "getrandom", "%s(%p,%u,%u)", NULL, NULL },
 #endif
 #ifdef TARGET_NR_getresgid
-{ TARGET_NR_getresgid, "getresgid" , NULL, NULL, NULL },
+{ TARGET_NR_getresgid, "getresgid" , "%s(%p,%p,%p)", NULL, NULL },
 #endif
 #ifdef TARGET_NR_getresgid32
 { TARGET_NR_getresgid32, "getresgid32" , NULL, NULL, NULL },
 #endif
 #ifdef TARGET_NR_getresuid
-{ TARGET_NR_getresuid, "getresuid" , NULL, NULL, NULL },
+{ TARGET_NR_getresuid, "getresuid" , "%s(%p,%p,%p)", NULL, NULL },
 #endif
 #ifdef TARGET_NR_getresuid32
 { TARGET_NR_getresuid32, "getresuid32" , NULL, NULL, NULL },
 #endif
 #ifdef TARGET_NR_getrlimit
-{ TARGET_NR_getrlimit, "getrlimit" , NULL, NULL, NULL },
+{ TARGET_NR_getrlimit, "getrlimit" , "%s(%d,%p)", NULL, NULL },
 #endif
 #ifdef TARGET_NR_get_robust_list
 { TARGET_NR_get_robust_list, "get_robust_list" , NULL, NULL, NULL },
@@ -391,10 +391,10 @@
 { TARGET_NR_getsid, "getsid" , "%s(%d)", NULL, NULL },
 #endif
 #ifdef TARGET_NR_getsockname
-{ TARGET_NR_getsockname, "getsockname" , NULL, NULL, NULL },
+{ TARGET_NR_getsockname, "getsockname" , "%s(%d,%p,%p)", NULL, NULL },
 #endif
 #ifdef TARGET_NR_getsockopt
-{ TARGET_NR_getsockopt, "getsockopt" , NULL, NULL, NULL },
+{ TARGET_NR_getsockopt, "getsockopt" , "%s(%d,%d,%d,%p,%p)", NULL, NULL },
 #endif
 #ifdef TARGET_NR_get_thread_area
 #if defined(TARGET_I386) && defined(TARGET_ABI32)
@@ -1059,10 +1059,10 @@
 { TARGET_NR_pivot_root, "pivot_root" , NULL, NULL, NULL },
 #endif
 #ifdef TARGET_NR_poll
-{ TARGET_NR_poll, "poll" , NULL, NULL, NULL },
+{ TARGET_NR_poll, "poll" , "%s(%p,%u,%d)", NULL, NULL },
 #endif
 #ifdef TARGET_NR_ppoll
-{ TARGET_NR_ppoll, "ppoll" , NULL, NULL, NULL },
+{ TARGET_NR_ppoll, "ppoll" , "%s(%p,%u,%p,%p)", NULL, NULL },
 #endif
 #ifdef TARGET_NR_prctl
 { TARGET_NR_prctl, "prctl" , NULL, NULL, NULL },
@@ -1131,7 +1131,7 @@
 { TARGET_NR_reboot, "reboot" , NULL, NULL, NULL },
 #endif
 #ifdef TARGET_NR_recv
-{ TARGET_NR_recv, "recv" , NULL, NULL, NULL },
+{ TARGET_NR_recv, "recv" , "%s(%d,%p,%u,%d)", NULL, NULL },
 #endif
 #ifdef TARGET_NR_recvfrom
 { TARGET_NR_recvfrom, "recvfrom" , NULL, NULL, NULL },
@@ -1191,7 +1191,7 @@
 { TARGET_NR_rt_sigqueueinfo, "rt_sigqueueinfo" , NULL, print_rt_sigqueueinfo, NULL },
 #endif
 #ifdef TARGET_NR_rt_sigreturn
-{ TARGET_NR_rt_sigreturn, "rt_sigreturn" , NULL, NULL, NULL },
+{ TARGET_NR_rt_sigreturn, "rt_sigreturn" , "%s(%p)", NULL, NULL },
 #endif
 #ifdef TARGET_NR_rt_sigsuspend
 { TARGET_NR_rt_sigsuspend, "rt_sigsuspend" , NULL, NULL, NULL },
@@ -1203,16 +1203,19 @@
 { TARGET_NR_rt_tgsigqueueinfo, "rt_tgsigqueueinfo" , NULL, print_rt_tgsigqueueinfo, NULL },
 #endif
 #ifdef TARGET_NR_sched_getaffinity
-{ TARGET_NR_sched_getaffinity, "sched_getaffinity" , NULL, NULL, NULL },
+{ TARGET_NR_sched_getaffinity, "sched_getaffinity" , "%s(%d,%u,%p)", NULL, NULL },
 #endif
 #ifdef TARGET_NR_sched_get_affinity
 { TARGET_NR_sched_get_affinity, "sched_get_affinity" , NULL, NULL, NULL },
 #endif
 #ifdef TARGET_NR_sched_getattr
-{ TARGET_NR_sched_getattr, "sched_getattr" , NULL, NULL, NULL },
+{ TARGET_NR_sched_getattr, "sched_getattr" , "%s(%d,%p,%u,%u)", NULL, NULL },
+#endif
+#ifdef TARGET_NR_sched_setattr
+{ TARGET_NR_sched_setattr, "sched_setattr" , "%s(%p,%p)", NULL, NULL },
 #endif
 #ifdef TARGET_NR_sched_getparam
-{ TARGET_NR_sched_getparam, "sched_getparam" , NULL, NULL, NULL },
+{ TARGET_NR_sched_getparam, "sched_getparam" , "%s(%d,%p)", NULL, NULL },
 #endif
 #ifdef TARGET_NR_sched_get_priority_max
 { TARGET_NR_sched_get_priority_max, "sched_get_priority_max" , NULL, NULL, NULL },
@@ -1227,7 +1230,7 @@
 { TARGET_NR_sched_rr_get_interval, "sched_rr_get_interval" , NULL, NULL, NULL },
 #endif
 #ifdef TARGET_NR_sched_setaffinity
-{ TARGET_NR_sched_setaffinity, "sched_setaffinity" , NULL, NULL, NULL },
+{ TARGET_NR_sched_setaffinity, "sched_setaffinity" , "%s(%d,%u,%p)", NULL, NULL },
 #endif
 #ifdef TARGET_NR_sched_setatt
 { TARGET_NR_sched_setatt, "sched_setatt" , NULL, NULL, NULL },
@@ -1360,23 +1363,23 @@
 { TARGET_NR_setreuid32, "setreuid32" , "%s(%u,%u)", NULL, NULL },
 #endif
 #ifdef TARGET_NR_setrlimit
-{ TARGET_NR_setrlimit, "setrlim

[PULL 05/22] linux-user/syscall: Implement execveat()

2023-02-04 Thread Laurent Vivier

From: Drew DeVault 

References: https://gitlab.com/qemu-project/qemu/-/issues/1007
Signed-off-by: Drew DeVault 
Reviewed-by: Laurent Vivier 
Message-Id: <20221104081015.706009-1-...@cmpwn.com>
Signed-off-by: Philippe Mathieu-Daudé 
Message-Id: <20221104173632.1052-6-phi...@linaro.org>
Signed-off-by: Laurent Vivier 
---
 linux-user/syscall.c | 15 +--
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 11236d16a372..3e72bd333ede 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -696,7 +696,8 @@ safe_syscall4(pid_t, wait4, pid_t, pid, int *, status, int, options, \
 #endif
 safe_syscall5(int, waitid, idtype_t, idtype, id_t, id, siginfo_t *, infop, \
   int, options, struct rusage *, rusage)
-safe_syscall3(int, execve, const char *, filename, char **, argv, char **, envp)
+safe_syscall5(int, execveat, int, dirfd, const char *, filename,
+  char **, argv, char **, envp, int, flags)
 #if defined(TARGET_NR_select) || defined(TARGET_NR__newselect) || \
 defined(TARGET_NR_pselect6) || defined(TARGET_NR_pselect6_time64)
 safe_syscall6(int, pselect6, int, nfds, fd_set *, readfds, fd_set *, writefds, \
@@ -8357,9 +8358,9 @@ static int do_openat(CPUArchState *cpu_env, int dirfd, const char *pathname, int
 return safe_openat(dirfd, path(pathname), flags, mode);
 }
 
-static int do_execve(CPUArchState *cpu_env,
+static int do_execveat(CPUArchState *cpu_env, int dirfd,
abi_long pathname, abi_long guest_argp,
-   abi_long guest_envp)
+   abi_long guest_envp, int flags)
 {
 int ret;
 char **argp, **envp;
@@ -8439,9 +8440,9 @@ static int do_execve(CPUArchState *cpu_env,
 }
 
 if (is_proc_myself(p, "exe")) {
-ret = get_errno(safe_execve(exec_path, argp, envp));
+ret = get_errno(safe_execveat(dirfd, exec_path, argp, envp, flags));
 } else {
-ret = get_errno(safe_execve(p, argp, envp));
+ret = get_errno(safe_execveat(dirfd, p, argp, envp, flags));
 }
 
 unlock_user(p, pathname, 0);
@@ -8979,8 +8980,10 @@ static abi_long do_syscall1(CPUArchState *cpu_env, int num, abi_long arg1,
 unlock_user(p, arg2, 0);
 return ret;
 #endif
+case TARGET_NR_execveat:
+return do_execveat(cpu_env, arg1, arg2, arg3, arg4, arg5);
 case TARGET_NR_execve:
-return do_execve(cpu_env, arg1, arg2, arg3);
+return do_execveat(cpu_env, AT_FDCWD, arg1, arg2, arg3, 0);
 case TARGET_NR_chdir:
 if (!(p = lock_user_string(arg1)))
 return -TARGET_EFAULT;
-- 
2.39.1

[PULL 11/22] Revert "linux-user: fix compat with glibc >= 2.36 sys/mount.h"

2023-02-04 Thread Laurent Vivier

From: Daniel P. Berrangé 

This reverts commit 3cd3df2a9584e6f753bb62a0028bd67124ab5532.

glibc has fixed (in 2.36.9000-40-g774058d729) the problem
that caused a clash when both sys/mount.h and linux/mount.h
are included, and backported this to the 2.36 stable release
too:

  https://sourceware.org/glibc/wiki/Release/2.36#Usage_of_.3Clinux.2Fmount.h.3E_and_.3Csys.2Fmount.h.3E

It is saner for QEMU to remove the workaround it applied for
glibc 2.36 and expect distros to ship the 2.36 maint release
with the fix. This avoids needing to add a further workaround
to QEMU to deal with the fact that linux/btrfs.h now also pulls
in linux/mount.h via linux/fs.h since Linux 6.1.

Signed-off-by: Daniel P. Berrangé 
Reviewed-by: Marc-André Lureau 
Message-Id: <20230110174901.2580297-3-berra...@redhat.com>
Signed-off-by: Laurent Vivier 
---
 linux-user/syscall.c | 18 --
 meson.build  |  2 --
 2 files changed, 20 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index b88f8ee96f0f..210db5f0be94 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -95,25 +95,7 @@
 #include 
 #include 
 #include 
-
-#ifdef HAVE_SYS_MOUNT_FSCONFIG
-/*
- * glibc >= 2.36 linux/mount.h conflicts with sys/mount.h,
- * which in turn prevents use of linux/fs.h. So we have to
- * define the constants ourselves for now.
- */
-#define FS_IOC_GETFLAGS _IOR('f', 1, long)
-#define FS_IOC_SETFLAGS _IOW('f', 2, long)
-#define FS_IOC_GETVERSION  _IOR('v', 1, long)
-#define FS_IOC_SETVERSION  _IOW('v', 2, long)
-#define FS_IOC_FIEMAP  _IOWR('f', 11, struct fiemap)
-#define FS_IOC32_GETFLAGS  _IOR('f', 1, int)
-#define FS_IOC32_SETFLAGS  _IOW('f', 2, int)
-#define FS_IOC32_GETVERSION _IOR('v', 1, int)
-#define FS_IOC32_SETVERSION _IOW('v', 2, int)
-#else
 #include 
-#endif
 #include 
 #if defined(CONFIG_FIEMAP)
 #include 
diff --git a/meson.build b/meson.build
index 6d3b66562975..cccd19f864e3 100644
--- a/meson.build
+++ b/meson.build
@@ -2046,8 +2046,6 @@ config_host_data.set('HAVE_OPTRESET',
  cc.has_header_symbol('getopt.h', 'optreset'))
 config_host_data.set('HAVE_IPPROTO_MPTCP',
  cc.has_header_symbol('netinet/in.h', 'IPPROTO_MPTCP'))
-config_host_data.set('HAVE_SYS_MOUNT_FSCONFIG',
- cc.has_header_symbol('sys/mount.h', 'FSCONFIG_SET_FLAG'))
 
 # has_member
 config_host_data.set('HAVE_SIGEV_NOTIFY_THREAD_ID',
-- 
2.39.1

[PULL 10/22] Revert "linux-user: add more compat ioctl definitions"

2023-02-04 Thread Laurent Vivier

From: Daniel P. Berrangé 

This reverts commit c5495f4ecb0cdaaf2e9dddeb48f1689cdb520ca0.

glibc has fixed (in 2.36.9000-40-g774058d729) the problem
that caused a clash when both sys/mount.h and linux/mount.h
are included, and backported this to the 2.36 stable release
too:

  https://sourceware.org/glibc/wiki/Release/2.36#Usage_of_.3Clinux.2Fmount.h.3E_and_.3Csys.2Fmount.h.3E

It is saner for QEMU to remove the workaround it applied for
glibc 2.36 and expect distros to ship the 2.36 maint release
with the fix. This avoids needing to add a further workaround
to QEMU to deal with the fact that linux/btrfs.h now also pulls
in linux/mount.h via linux/fs.h since Linux 6.1.

Signed-off-by: Daniel P. Berrangé 
Reviewed-by: Marc-André Lureau 
Message-Id: <20230110174901.2580297-2-berra...@redhat.com>
Signed-off-by: Laurent Vivier 
---
 linux-user/syscall.c | 25 -
 1 file changed, 25 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index dbf51e500b4f..b88f8ee96f0f 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -111,31 +111,6 @@
 #define FS_IOC32_SETFLAGS  _IOW('f', 2, int)
 #define FS_IOC32_GETVERSION _IOR('v', 1, int)
 #define FS_IOC32_SETVERSION _IOW('v', 2, int)
-
-#define BLKGETSIZE64 _IOR(0x12,114,size_t)
-#define BLKDISCARD _IO(0x12,119)
-#define BLKIOMIN _IO(0x12,120)
-#define BLKIOOPT _IO(0x12,121)
-#define BLKALIGNOFF _IO(0x12,122)
-#define BLKPBSZGET _IO(0x12,123)
-#define BLKDISCARDZEROES _IO(0x12,124)
-#define BLKSECDISCARD _IO(0x12,125)
-#define BLKROTATIONAL _IO(0x12,126)
-#define BLKZEROOUT _IO(0x12,127)
-
-#define FIBMAP _IO(0x00,1)
-#define FIGETBSZ   _IO(0x00,2)
-
-struct file_clone_range {
-__s64 src_fd;
-__u64 src_offset;
-__u64 src_length;
-__u64 dest_offset;
-};
-
-#define FICLONE _IOW(0x94, 9, int)
-#define FICLONERANGE _IOW(0x94, 13, struct file_clone_range)
-
 #else
 #include 
 #endif
-- 
2.39.1

[PULL 01/22] linux-user/strace: Constify struct flags

2023-02-04 Thread Laurent Vivier

From: Philippe Mathieu-Daudé 

print_flags() takes a const pointer.

Signed-off-by: Philippe Mathieu-Daudé 
Reviewed-by: Laurent Vivier 
Message-Id: <20221104173632.1052-2-phi...@linaro.org>
Signed-off-by: Laurent Vivier 
---
 linux-user/strace.c | 40 
 1 file changed, 20 insertions(+), 20 deletions(-)

diff --git a/linux-user/strace.c b/linux-user/strace.c
index 9ae5a812cd71..25c47f03160d 100644
--- a/linux-user/strace.c
+++ b/linux-user/strace.c
@@ -945,7 +945,7 @@ print_syscall_ret_ioctl(CPUArchState *cpu_env, const struct syscallname *name,
 }
 #endif
 
-UNUSED static struct flags access_flags[] = {
+UNUSED static const struct flags access_flags[] = {
 FLAG_GENERIC(F_OK),
 FLAG_GENERIC(R_OK),
 FLAG_GENERIC(W_OK),
@@ -953,7 +953,7 @@ UNUSED static struct flags access_flags[] = {
 FLAG_END,
 };
 
-UNUSED static struct flags at_file_flags[] = {
+UNUSED static const struct flags at_file_flags[] = {
 #ifdef AT_EACCESS
 FLAG_GENERIC(AT_EACCESS),
 #endif
@@ -963,14 +963,14 @@ UNUSED static struct flags at_file_flags[] = {
 FLAG_END,
 };
 
-UNUSED static struct flags unlinkat_flags[] = {
+UNUSED static const struct flags unlinkat_flags[] = {
 #ifdef AT_REMOVEDIR
 FLAG_GENERIC(AT_REMOVEDIR),
 #endif
 FLAG_END,
 };
 
-UNUSED static struct flags mode_flags[] = {
+UNUSED static const struct flags mode_flags[] = {
 FLAG_GENERIC(S_IFSOCK),
 FLAG_GENERIC(S_IFLNK),
 FLAG_GENERIC(S_IFREG),
@@ -981,14 +981,14 @@ UNUSED static struct flags mode_flags[] = {
 FLAG_END,
 };
 
-UNUSED static struct flags open_access_flags[] = {
+UNUSED static const struct flags open_access_flags[] = {
 FLAG_TARGET(O_RDONLY),
 FLAG_TARGET(O_WRONLY),
 FLAG_TARGET(O_RDWR),
 FLAG_END,
 };
 
-UNUSED static struct flags open_flags[] = {
+UNUSED static const struct flags open_flags[] = {
 FLAG_TARGET(O_APPEND),
 FLAG_TARGET(O_CREAT),
 FLAG_TARGET(O_DIRECTORY),
@@ -1019,7 +1019,7 @@ UNUSED static struct flags open_flags[] = {
 FLAG_END,
 };
 
-UNUSED static struct flags mount_flags[] = {
+UNUSED static const struct flags mount_flags[] = {
 #ifdef MS_BIND
 FLAG_GENERIC(MS_BIND),
 #endif
@@ -1044,7 +1044,7 @@ UNUSED static struct flags mount_flags[] = {
 FLAG_END,
 };
 
-UNUSED static struct flags umount2_flags[] = {
+UNUSED static const struct flags umount2_flags[] = {
 #ifdef MNT_FORCE
 FLAG_GENERIC(MNT_FORCE),
 #endif
@@ -1057,7 +1057,7 @@ UNUSED static struct flags umount2_flags[] = {
 FLAG_END,
 };
 
-UNUSED static struct flags mmap_prot_flags[] = {
+UNUSED static const struct flags mmap_prot_flags[] = {
 FLAG_GENERIC(PROT_NONE),
 FLAG_GENERIC(PROT_EXEC),
 FLAG_GENERIC(PROT_READ),
@@ -1068,7 +1068,7 @@ UNUSED static struct flags mmap_prot_flags[] = {
 FLAG_END,
 };
 
-UNUSED static struct flags mmap_flags[] = {
+UNUSED static const struct flags mmap_flags[] = {
 FLAG_TARGET(MAP_SHARED),
 FLAG_TARGET(MAP_PRIVATE),
 FLAG_TARGET(MAP_ANONYMOUS),
@@ -1092,7 +1092,7 @@ UNUSED static struct flags mmap_flags[] = {
 FLAG_END,
 };
 
-UNUSED static struct flags clone_flags[] = {
+UNUSED static const struct flags clone_flags[] = {
 FLAG_GENERIC(CLONE_VM),
 FLAG_GENERIC(CLONE_FS),
 FLAG_GENERIC(CLONE_FILES),
@@ -1136,7 +1136,7 @@ UNUSED static struct flags clone_flags[] = {
 FLAG_END,
 };
 
-UNUSED static struct flags msg_flags[] = {
+UNUSED static const struct flags msg_flags[] = {
 /* send */
 FLAG_GENERIC(MSG_CONFIRM),
 FLAG_GENERIC(MSG_DONTROUTE),
@@ -1156,7 +1156,7 @@ UNUSED static struct flags msg_flags[] = {
 FLAG_END,
 };
 
-UNUSED static struct flags statx_flags[] = {
+UNUSED static const struct flags statx_flags[] = {
 #ifdef AT_EMPTY_PATH
 FLAG_GENERIC(AT_EMPTY_PATH),
 #endif
@@ -1178,7 +1178,7 @@ UNUSED static struct flags statx_flags[] = {
 FLAG_END,
 };
 
-UNUSED static struct flags statx_mask[] = {
+UNUSED static const struct flags statx_mask[] = {
 /* This must come first, because it includes everything.  */
 #ifdef STATX_ALL
 FLAG_GENERIC(STATX_ALL),
@@ -1226,7 +1226,7 @@ UNUSED static struct flags statx_mask[] = {
 FLAG_END,
 };
 
-UNUSED static struct flags falloc_flags[] = {
+UNUSED static const struct flags falloc_flags[] = {
 FLAG_GENERIC(FALLOC_FL_KEEP_SIZE),
 FLAG_GENERIC(FALLOC_FL_PUNCH_HOLE),
 #ifdef FALLOC_FL_NO_HIDE_STALE
@@ -1246,7 +1246,7 @@ UNUSED static struct flags falloc_flags[] = {
 #endif
 };
 
-UNUSED static struct flags termios_iflags[] = {
+UNUSED static const struct flags termios_iflags[] = {
 FLAG_TARGET(IGNBRK),
 FLAG_TARGET(BRKINT),
 FLAG_TARGET(IGNPAR),
@@ -1265,7 +1265,7 @@ UNUSED static struct flags termios_iflags[] = {
 FLAG_END,
 };
 
-UNUSED static struct flags termios_oflags[] = {
+UNUSED static const struct flags termios_oflags[] = {
 FLAG_TARGET(OPOST),
 FLAG_TARGET(OLCUC),
 FLAG_TARGET(ONLCR),
@@ -1349,7 +1349,7 @@ UNUSED static struct enums termios_cflags_CS

[PULL 21/22] linux-user: Implement SOL_ALG encryption support

2023-02-04 Thread Laurent Vivier

From: Helge Deller 

Add support to handle SOL_ALG packets via sendmsg() and recvmsg().
This allows emulated userspace to use encryption functionality.

Tested with the debian ell package with hppa guest on x86_64 host.

Signed-off-by: Helge Deller 
Reviewed-by: Laurent Vivier 
Message-Id: <20221212173416.90590-1-del...@gmx.de>
Signed-off-by: Laurent Vivier 
---
 linux-user/syscall.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 55d53b344b84..a0d2beddaa4e 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -1829,6 +1829,14 @@ static inline abi_long target_to_host_cmsg(struct msghdr *msgh,
 __get_user(cred->pid, &target_cred->pid);
 __get_user(cred->uid, &target_cred->uid);
 __get_user(cred->gid, &target_cred->gid);
+} else if (cmsg->cmsg_level == SOL_ALG) {
+uint32_t *dst = (uint32_t *)data;
+
+memcpy(dst, target_data, len);
+/* fix endianess of first 32-bit word */
+if (len >= sizeof(uint32_t)) {
+*dst = tswap32(*dst);
+}
 } else {
 qemu_log_mask(LOG_UNIMP, "Unsupported ancillary data: %d/%d\n",
   cmsg->cmsg_level, cmsg->cmsg_type);
-- 
2.39.1
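
The byte-order handling in the hunk above can be illustrated in isolation. The following is a minimal sketch, not QEMU code: `bswap32_sketch()` and `copy_alg_payload_sketch()` are hypothetical names, and the `cross_endian` flag is an assumption standing in for the guest/host endianness check that QEMU's tswap32() performs internally.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: unconditionally reverse the byte order of a 32-bit value. */
static uint32_t bswap32_sketch(uint32_t v)
{
    return ((v & 0x000000ffu) << 24) | ((v & 0x0000ff00u) << 8) |
           ((v & 0x00ff0000u) >> 8)  | ((v & 0xff000000u) >> 24);
}

/*
 * Copy len bytes of ancillary data and, when guest and host byte order
 * differ (cross_endian != 0), swap only the leading 32-bit word.
 * Payloads shorter than 4 bytes are copied unchanged.
 */
static void copy_alg_payload_sketch(void *dst, const void *src, size_t len,
                                    int cross_endian)
{
    uint32_t first;

    assert(dst != NULL && src != NULL);
    memcpy(dst, src, len);
    if (cross_endian && len >= sizeof(first)) {
        /* Go through a local copy to avoid unaligned access to dst. */
        memcpy(&first, dst, sizeof(first));
        first = bswap32_sketch(first);
        memcpy(dst, &first, sizeof(first));
    }
}
```

Only the first word is swapped, mirroring the patch; the rest of the payload is treated as an opaque byte stream.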

[PULL 00/22] Linux user for 8.0 patches

2023-02-04 Thread Laurent Vivier

The following changes since commit 13356edb87506c148b163b8c7eb0695647d00c2a:

  Merge tag 'block-pull-request' of https://gitlab.com/stefanha/qemu into staging (2023-01-24 09:45:33 +)

are available in the Git repository at:

  https://gitlab.com/laurent_vivier/qemu.git tags/linux-user-for-8.0-pull-request

for you to fetch changes up to 3f0744f98b07c6fd2ce9d5840726d0915b2ae7c1:

  linux-user: Allow sendmsg() without IOV (2023-02-03 22:55:12 +0100)


linux-user branch pull request 20230204

Implement execveat()
un-parent OBJECT(cpu) when closing thread
Revert fix for glibc >= 2.36 sys/mount.h
Fix/update strace
move target_flat.h to target subdirs
Fix SO_ERROR return code of getsockopt()
Fix /proc/cpuinfo output for hppa
Add emulation for MADV_WIPEONFORK and MADV_KEEPONFORK in madvise()
Implement SOL_ALG encryption support
linux-user: Allow sendmsg() without IOV



Daniel P. Berrangé (2):
  Revert "linux-user: add more compat ioctl definitions"
  Revert "linux-user: fix compat with glibc >= 2.36 sys/mount.h"

Drew DeVault (4):
  linux-user/strace: Extract print_execve_argv() from print_execve()
  linux-user/strace: Add output for execveat() syscall
  linux-user/syscall: Extract do_execve() from do_syscall1()
  linux-user/syscall: Implement execveat()

Helge Deller (11):
  linux-user: Add missing MAP_HUGETLB and MAP_STACK flags in strace
  linux-user: Add strace output for clock_getres_time64() and
futex_time64()
  linux-user: Improve strace output of getgroups() and setgroups()
  linux-user: Fix SO_ERROR return code of getsockopt()
  linux-user: Fix /proc/cpuinfo output for hppa
  linux-user: Improve strace output of personality() and sysinfo()
  linux-user: Add emulation for MADV_WIPEONFORK and MADV_KEEPONFORK in
madvise()
  linux-user: Show 4th argument of rt_sigprocmask() in strace
  linux-user: Enhance strace output for various syscalls
  linux-user: Implement SOL_ALG encryption support
  linux-user: Allow sendmsg() without IOV

Letu Ren (1):
  linux-user: add more netlink protocol constants

Mike Frysinger (2):
  linux-user: fix strace build w/out munlockall
  linux-user: move target_flat.h to target subdirs

Philippe Mathieu-Daudé (1):
  linux-user/strace: Constify struct flags

Richard Henderson (1):
  linux-user: un-parent OBJECT(cpu) when closing thread

 linux-user/aarch64/target_flat.h   |   1 +
 linux-user/arm/target_flat.h   |   1 +
 linux-user/{ => generic}/target_flat.h |   0
 linux-user/m68k/target_flat.h  |   1 +
 linux-user/microblaze/target_flat.h|   1 +
 linux-user/mmap.c  |  56 +++--
 linux-user/sh4/target_flat.h   |   1 +
 linux-user/strace.c| 189 ++-
 linux-user/strace.list |  64 ++---
 linux-user/syscall.c   | 312 +
 meson.build|   2 -
 11 files changed, 378 insertions(+), 250 deletions(-)
 create mode 100644 linux-user/aarch64/target_flat.h
 create mode 100644 linux-user/arm/target_flat.h
 rename linux-user/{ => generic}/target_flat.h (100%)
 create mode 100644 linux-user/m68k/target_flat.h
 create mode 100644 linux-user/microblaze/target_flat.h
 create mode 100644 linux-user/sh4/target_flat.h

-- 
2.39.1

[PULL 13/22] linux-user: Improve strace output of getgroups() and setgroups()

2023-02-04 Thread Laurent Vivier

From: Helge Deller 

Make the strace output look nicer for those syscalls.

Signed-off-by: Helge Deller 
Reviewed-by: Laurent Vivier 
Message-Id: <20230115210057.445132-1-del...@gmx.de>
Signed-off-by: Laurent Vivier 
---
 linux-user/strace.list | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/linux-user/strace.list b/linux-user/strace.list
index 64db8e6b8412..cf291d02edfe 100644
--- a/linux-user/strace.list
+++ b/linux-user/strace.list
@@ -321,10 +321,10 @@
 { TARGET_NR_getgid32, "getgid32" , NULL, NULL, NULL },
 #endif
 #ifdef TARGET_NR_getgroups
-{ TARGET_NR_getgroups, "getgroups" , NULL, NULL, NULL },
+{ TARGET_NR_getgroups, "getgroups" , "%s(%d,%p)", NULL, NULL },
 #endif
 #ifdef TARGET_NR_getgroups32
-{ TARGET_NR_getgroups32, "getgroups32" , NULL, NULL, NULL },
+{ TARGET_NR_getgroups32, "getgroups32" , "%s(%d,%p)", NULL, NULL },
 #endif
 #ifdef TARGET_NR_gethostname
 { TARGET_NR_gethostname, "gethostname" , NULL, NULL, NULL },
@@ -1304,10 +1304,10 @@
 { TARGET_NR_setgid32, "setgid32" , "%s(%u)", NULL, NULL },
 #endif
 #ifdef TARGET_NR_setgroups
-{ TARGET_NR_setgroups, "setgroups" , NULL, NULL, NULL },
+{ TARGET_NR_setgroups, "setgroups" , "%s(%d,%p)", NULL, NULL },
 #endif
 #ifdef TARGET_NR_setgroups32
-{ TARGET_NR_setgroups32, "setgroups32" , NULL, NULL, NULL },
+{ TARGET_NR_setgroups32, "setgroups32" , "%s(%d,%p)", NULL, NULL },
 #endif
 #ifdef TARGET_NR_sethae
 { TARGET_NR_sethae, "sethae" , NULL, NULL, NULL },
-- 
2.39.1

[PULL 14/22] linux-user: move target_flat.h to target subdirs

2023-02-04 Thread Laurent Vivier

From: Mike Frysinger 

This makes target_flat.h behave like every other target_xxx.h header.
It also makes it actually work -- while the current header says adding
a header to the target subdir overrides the common one, it doesn't.
This is for two reasons:
* meson.build adds -Ilinux-user before -Ilinux-user/$arch
* the compiler search path for "target_flat.h" looks in the same dir
  as the source file before searching -I paths.

This can be seen with the xtensa port -- the subdir settings aren't
used which breaks stack setup.

Move it to the generic/ subdir and add include stubs, as is done for
every other target_xxx.h header.

Signed-off-by: Mike Frysinger 
Reviewed-by: Richard Henderson 
Message-Id: <20230129004625.11228-1-vap...@gentoo.org>
Signed-off-by: Laurent Vivier 
---
 linux-user/aarch64/target_flat.h   | 1 +
 linux-user/arm/target_flat.h   | 1 +
 linux-user/{ => generic}/target_flat.h | 0
 linux-user/m68k/target_flat.h  | 1 +
 linux-user/microblaze/target_flat.h| 1 +
 linux-user/sh4/target_flat.h   | 1 +
 6 files changed, 5 insertions(+)
 create mode 100644 linux-user/aarch64/target_flat.h
 create mode 100644 linux-user/arm/target_flat.h
 rename linux-user/{ => generic}/target_flat.h (100%)
 create mode 100644 linux-user/m68k/target_flat.h
 create mode 100644 linux-user/microblaze/target_flat.h
 create mode 100644 linux-user/sh4/target_flat.h

diff --git a/linux-user/aarch64/target_flat.h b/linux-user/aarch64/target_flat.h
new file mode 100644
index ..bc83224cea12
--- /dev/null
+++ b/linux-user/aarch64/target_flat.h
@@ -0,0 +1 @@
+#include "../generic/target_flat.h"
diff --git a/linux-user/arm/target_flat.h b/linux-user/arm/target_flat.h
new file mode 100644
index ..bc83224cea12
--- /dev/null
+++ b/linux-user/arm/target_flat.h
@@ -0,0 +1 @@
+#include "../generic/target_flat.h"
diff --git a/linux-user/target_flat.h b/linux-user/generic/target_flat.h
similarity index 100%
rename from linux-user/target_flat.h
rename to linux-user/generic/target_flat.h
diff --git a/linux-user/m68k/target_flat.h b/linux-user/m68k/target_flat.h
new file mode 100644
index ..bc83224cea12
--- /dev/null
+++ b/linux-user/m68k/target_flat.h
@@ -0,0 +1 @@
+#include "../generic/target_flat.h"
diff --git a/linux-user/microblaze/target_flat.h b/linux-user/microblaze/target_flat.h
new file mode 100644
index ..bc83224cea12
--- /dev/null
+++ b/linux-user/microblaze/target_flat.h
@@ -0,0 +1 @@
+#include "../generic/target_flat.h"
diff --git a/linux-user/sh4/target_flat.h b/linux-user/sh4/target_flat.h
new file mode 100644
index ..bc83224cea12
--- /dev/null
+++ b/linux-user/sh4/target_flat.h
@@ -0,0 +1 @@
+#include "../generic/target_flat.h"
-- 
2.39.1

pixman_blt on aarch64

2023-02-04 Thread BALATON Zoltan


Hello,

I'm trying to involve the pixman list in this thread, which started on the 
qemu-devel list with the subject "Display update issue on M1 Macs". See here:


https://lists.nongnu.org/archive/html/qemu-devel/2023-02/msg01033.html

We have found that on aarch64 Macs running macOS the pixman_blt and 
pixman_fill functions are disabled without fallback due to not being able 
to compile the needed assembly code. See detailed discussion below.


Is there a way to fix this in pixman in the near future or provide a 
fallback for this in pixman? Or do I need to add a fallback in QEMU or try 
using something else instead of pixman for these functions?


Thank you,
BALATON Zoltan

On Sat, 4 Feb 2023, Akihiko Odaki wrote:

On 2023/02/03 22:45, BALATON Zoltan wrote:

On Fri, 3 Feb 2023, Akihiko Odaki wrote:
I finally reproduced the issue with MorphOS and ati-vga and figured out 
its cause.


The problem is that pixman_blt() is disabled because its backend is 
written in GNU assembly, and GNU assembler is not available on macOS. 
There is no fallback written in C, unfortunately. The issue is tracked by 
the upstream at:

https://gitlab.freedesktop.org/pixman/pixman/-/issues/59


Hm, OK, but that ticket is just about a compile error; it suggests disabling 
the backend and does not say that things won't work afterwards. Are they 
aware this is a problem? Maybe we should write to their mailing list once 
we're sure what's happening.


That's a good idea. They may prioritize the issue if they realize that 
it disables pixman_blt().


I hit the same problem on Asahi Linux, which is based on Arch Linux ARM. 
It is because Arch Linux copied PKGBUILD from x86 Arch Linux, which 
disables Arm backends. It is easy to enable the backend for the platform 
so I proposed a change at:

https://github.com/archlinuxarm/PKGBUILDs/pull/1985


On macOS one source of pixman most people use is brew.sh where this seems 
to be disabled:


https://github.com/Homebrew/homebrew-core/blob/master/Formula/pixman.rb

another source is macports which has an older version and no such options:

https://github.com/macports/macports-ports/blob/master/graphics/libpixman-devel/Portfile

I wonder if it compiles from macports on aarch64 then.


It's more likely that it is just outdated. It does not carry a patch to fix 
the issue.


I'll wait to see if I can get some more test results and try to check 
pixman, but its source is not too clear to me and there are no docs either, 
so maybe the best way is to ask on their list. If this is a pixman issue I 
hope it can be fixed there and we don't need to implement a fallback in QEMU.


This is certainly a pixman issue.

If you read the source, you can see pixman_blt() calls 
_pixman_implementation_blt(). _pixman_implementation_blt() in turn calls the 
blt member of pixman_implementation_t. Grepping for "blt =" shows that it is 
only assigned in:

pixman/pixman-arm-neon.c
pixman/pixman-arm-simd.c
pixman/pixman-mips-dspr2.c
pixman/pixman-mmx.c
pixman/pixman-sse2.c

For AArch64, only pixman/pixman-arm-neon.c is relevant, and it needs to be 
disabled to build the library on macOS.
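
The dispatch described above, and the kind of plain C fallback under discussion, can be sketched as follows. This is a simplified illustration with hypothetical names (`impl_sketch_t`, `impl_blt_sketch()`, `blt_plain_c_sketch()`), not pixman's actual types or signatures; it assumes 32-bit pixels and strides counted in uint32_t units.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct impl_sketch impl_sketch_t;

typedef int (*blt_fn_sketch)(const uint32_t *src, uint32_t *dst,
                             int src_stride, int dst_stride,
                             int width, int height);

/* One backend in a chain; blt stays NULL if the backend provides none. */
struct impl_sketch {
    impl_sketch_t *fallback;
    blt_fn_sketch blt;
};

/* Walk the chain; report failure (0) if no backend handled the blit. */
static int impl_blt_sketch(const impl_sketch_t *imp,
                           const uint32_t *src, uint32_t *dst,
                           int src_stride, int dst_stride,
                           int width, int height)
{
    for (; imp != NULL; imp = imp->fallback) {
        if (imp->blt != NULL &&
            imp->blt(src, dst, src_stride, dst_stride, width, height)) {
            return 1;
        }
    }
    return 0;
}

/* Row-by-row memcpy(): a portable fallback for non-overlapping buffers. */
static int blt_plain_c_sketch(const uint32_t *src, uint32_t *dst,
                              int src_stride, int dst_stride,
                              int width, int height)
{
    assert(src != NULL && dst != NULL);
    for (int y = 0; y < height; y++) {
        memcpy(dst + (size_t)y * dst_stride, src + (size_t)y * src_stride,
               (size_t)width * sizeof(uint32_t));
    }
    return 1;
}

/* Caller-side pattern: try the accelerated path, then fall back. */
static int blt_with_fallback_sketch(const impl_sketch_t *imp,
                                    const uint32_t *src, uint32_t *dst,
                                    int src_stride, int dst_stride,
                                    int width, int height)
{
    if (impl_blt_sketch(imp, src, dst, src_stride, dst_stride, width, height)) {
        return 1;
    }
    return blt_plain_c_sketch(src, dst, src_stride, dst_stride, width, height);
}
```

With every backend's blt left NULL, as happens when the NEON backend is disabled, the chain walk returns 0 and only the caller-side fallback copies any pixels.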


Regards,
Akihiko Odaki



Regards,
BALATON Zoltan
