Re: constructor vs. __constructor__

2023-08-23 Thread Markus Armbruster
Liu Jaloo  writes:

> What's the difference between  "__attribute__((constructor))" and
> "__attribute__((__constructor__))" in qemu source?

Reading the fine manual helps:

You may optionally specify attribute names with ‘__’ preceding and
following the name.  This allows you to use them in header files
without being concerned about a possible macro of the same name. For
example, you may use the attribute name __noreturn__ instead of
noreturn.

https://gcc.gnu.org/onlinedocs/gcc/Attribute-Syntax.html
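
Both spellings attach identical semantics; the underscored form simply
survives a user-defined "constructor" macro. A minimal sketch (an
illustrative example, not code from the QEMU tree; the function name is
made up):

    #include <stdio.h>

    /* Runs before main(); the underscored attribute spelling cannot be
     * broken by a "#define constructor ..." appearing elsewhere. */
    static void __attribute__((__constructor__)) init_hook(void)
    {
        puts("init_hook: called before main");
    }

    int main(void)
    {
        puts("main");
        return 0;
    }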




Re: [PATCH] softmmu: Fix dirtylimit memory leak

2023-08-23 Thread Michael Tokarev

23.08.2023 10:47, alloc.yo...@outlook.com wrote:

From: "alloc.young" 

Fix memory leak in hmp_info_vcpu_dirty_limit,use g_autoptr
handle memory deallocation, alse use g_free to match g_malloc
&& g_new functions.


"..use g_autoptr TO handle.." ("to" is missing).
"alse" - I guess should be "Also".

I think it is better to split this into two parts, one fixing
the memleak and another converting to g_free().

/mjt



constructor vs. __constructor__

2023-08-23 Thread Liu Jaloo
What's the difference between  "__attribute__((constructor))" and
"__attribute__((__constructor__))" in qemu source?


[PATCH] softmmu: Fix dirtylimit memory leak

2023-08-23 Thread alloc . young
From: "alloc.young" 

Fix memory leak in hmp_info_vcpu_dirty_limit; use g_autoptr
to handle memory deallocation.

Signed-off-by: alloc.young 
Reviewed-by: Yong Huang 
---

v1->v2
drop the g_free conversion; focus only on the memory leak fix

---
 softmmu/dirtylimit.c | 16 +++-
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/softmmu/dirtylimit.c b/softmmu/dirtylimit.c
index 3c275ee55b..e3ff53b8fc 100644
--- a/softmmu/dirtylimit.c
+++ b/softmmu/dirtylimit.c
@@ -653,7 +653,8 @@ struct DirtyLimitInfoList *qmp_query_vcpu_dirty_limit(Error **errp)
 
 void hmp_info_vcpu_dirty_limit(Monitor *mon, const QDict *qdict)
 {
-DirtyLimitInfoList *limit, *head, *info = NULL;
+DirtyLimitInfoList *info;
+g_autoptr(DirtyLimitInfoList) head = NULL;
 Error *err = NULL;
 
 if (!dirtylimit_in_service()) {
@@ -661,20 +662,17 @@ void hmp_info_vcpu_dirty_limit(Monitor *mon, const QDict *qdict)
 return;
 }
 
-info = qmp_query_vcpu_dirty_limit();
+head = qmp_query_vcpu_dirty_limit();
 if (err) {
 hmp_handle_error(mon, err);
 return;
 }
 
-head = info;
-for (limit = head; limit != NULL; limit = limit->next) {
+for (info = head; info != NULL; info = info->next) {
 monitor_printf(mon, "vcpu[%"PRIi64"], limit rate %"PRIi64 " (MB/s),"
 " current rate %"PRIi64 " (MB/s)\n",
-limit->value->cpu_index,
-limit->value->limit_rate,
-limit->value->current_rate);
+info->value->cpu_index,
+info->value->limit_rate,
+info->value->current_rate);
 }
-
-g_free(info);
 }
-- 
2.39.3
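
For readers unfamiliar with the GLib idiom used in this patch: g_autoptr(T)
declares a pointer whose registered cleanup function runs automatically when
the variable goes out of scope, so the early returns and the normal exit all
free the list. A minimal sketch of the mechanism with made-up type and
function names (for QAPI-generated types such as DirtyLimitInfoList, QEMU
generates the matching cleanup function, which is what allows the patch to
use g_autoptr directly):

    #include <glib.h>

    typedef struct Node { struct Node *next; } Node;

    static void node_list_free(Node *head)
    {
        while (head) {
            Node *next = head->next;
            g_free(head);
            head = next;
        }
    }
    /* Registers node_list_free as the cleanup for g_autoptr(Node). */
    G_DEFINE_AUTOPTR_CLEANUP_FUNC(Node, node_list_free)

    int main(void)
    {
        g_autoptr(Node) head = g_new0(Node, 1);

        for (Node *n = head; n != NULL; n = n->next) {
            /* ... visit n ... */
        }
        return 0;   /* node_list_free(head) runs automatically here */
    }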




Re: [PATCH] softmmu: Fix dirtylimit memory leak

2023-08-23 Thread 阳 春光



From: Yong Huang 
Sent: August 24, 2023, 8:31
To: alloc.yo...@outlook.com
Cc: qemu-devel@nongnu.org
Subject: Re: [PATCH] softmmu: Fix dirtylimit memory leak



On Wed, Aug 23, 2023 at 3:48 PM <alloc.yo...@outlook.com> wrote:
From: "alloc.young" <alloc.yo...@outlook.com>

Fix memory leak in hmp_info_vcpu_dirty_limit,use g_autoptr
handle memory deallocation, alse use g_free to match g_malloc
&& g_new functions.

Signed-off-by: alloc.young <alloc.yo...@outlook.com>
---
 softmmu/dirtylimit.c | 26 --
 1 file changed, 12 insertions(+), 14 deletions(-)

[...]
diff --git a/softmmu/dirtylimit.c b/softmmu/dirtylimit.c
index 3c275ee55b..fa959d7743 100644
--- a/softmmu/dirtylimit.c
+++ b/softmmu/dirtylimit.c
@@ -100,7 +100,7 @@ static void vcpu_dirty_rate_stat_collect(void)
 stat.rates[i].dirty_rate;
 }

-free(stat.rates);
+g_free(stat.rates);
 }

Code optimization.
 static void *vcpu_dirty_rate_stat_thread(void *opaque)
@@ -171,10 +171,10 @@ void vcpu_dirty_rate_stat_initialize(void)

 void vcpu_dirty_rate_stat_finalize(void)
 {
-free(vcpu_dirty_rate_stat->stat.rates);
+g_free(vcpu_dirty_rate_stat->stat.rates);
 vcpu_dirty_rate_stat->stat.rates = NULL;

-free(vcpu_dirty_rate_stat);
+g_free(vcpu_dirty_rate_stat);
 vcpu_dirty_rate_stat = NULL;
 }

Likewise...
@@ -220,10 +220,10 @@ void dirtylimit_state_initialize(void)

 void dirtylimit_state_finalize(void)
 {
-free(dirtylimit_state->states);
+g_free(dirtylimit_state->states);
 dirtylimit_state->states = NULL;

-free(dirtylimit_state);
+g_free(dirtylimit_state);
 dirtylimit_state = NULL;

Likewise...
 trace_dirtylimit_state_finalize();
@@ -653,7 +653,8 @@ struct DirtyLimitInfoList *qmp_query_vcpu_dirty_limit(Error **errp)

 void hmp_info_vcpu_dirty_limit(Monitor *mon, const QDict *qdict)
 {
-DirtyLimitInfoList *limit, *head, *info = NULL;
+DirtyLimitInfoList *info;
+g_autoptr(DirtyLimitInfoList) head = NULL;
 Error *err = NULL;

 if (!dirtylimit_in_service()) {
@@ -661,20 +662,17 @@ void hmp_info_vcpu_dirty_limit(Monitor *mon, const QDict *qdict)
 return;
 }

-info = qmp_query_vcpu_dirty_limit();
+head = qmp_query_vcpu_dirty_limit();
 if (err) {
 hmp_handle_error(mon, err);
 return;
 }

-head = info;
-for (limit = head; limit != NULL; limit = limit->next) {
+for (info = head; info != NULL; info = info->next) {
 monitor_printf(mon, "vcpu[%"PRIi64"], limit rate %"PRIi64 " (MB/s),"
 " current rate %"PRIi64 " (MB/s)\n",
-limit->value->cpu_index,
-limit->value->limit_rate,
-limit->value->current_rate);
+info->value->cpu_index,
+info->value->limit_rate,
+info->value->current_rate);
 }
-
-g_free(info);
Fix memory leak.
 }
--
2.39.3

I'll choose the memory leak modifications to keep the patch focused on a single
independent issue.

Ok, will send a patch just to fix this issue, thx

Anyway,

Reviewed-by: Hyman Huang(黄勇) <yong.hu...@smartx.com>

--
Best regards


[PATCH 10/13] linux-user: Move shmat and shmdt implementations to mmap.c

2023-08-23 Thread Richard Henderson
Rename from do_* to target_*.  Fix some minor checkpatch errors.

Tested-by: Philippe Mathieu-Daudé 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Warner Losh 
Signed-off-by: Richard Henderson 
---
 linux-user/user-mmap.h |   4 ++
 linux-user/mmap.c  | 138 +++
 linux-user/syscall.c   | 143 ++---
 3 files changed, 146 insertions(+), 139 deletions(-)

diff --git a/linux-user/user-mmap.h b/linux-user/user-mmap.h
index 0f4883eb57..b94bcdcf83 100644
--- a/linux-user/user-mmap.h
+++ b/linux-user/user-mmap.h
@@ -58,4 +58,8 @@ abi_ulong mmap_find_vma(abi_ulong, abi_ulong, abi_ulong);
 void mmap_fork_start(void);
 void mmap_fork_end(int child);
 
+abi_ulong target_shmat(CPUArchState *cpu_env, int shmid,
+   abi_ulong shmaddr, int shmflg);
+abi_long target_shmdt(abi_ulong shmaddr);
+
 #endif /* LINUX_USER_USER_MMAP_H */
diff --git a/linux-user/mmap.c b/linux-user/mmap.c
index 9aab48d4a3..3aeacd1ecd 100644
--- a/linux-user/mmap.c
+++ b/linux-user/mmap.c
@@ -17,6 +17,7 @@
 *  along with this program; if not, see <http://www.gnu.org/licenses/>.
  */
 #include "qemu/osdep.h"
+#include <sys/shm.h>
 #include "trace.h"
 #include "exec/log.h"
 #include "qemu.h"
@@ -27,6 +28,14 @@
 static pthread_mutex_t mmap_mutex = PTHREAD_MUTEX_INITIALIZER;
 static __thread int mmap_lock_count;
 
+#define N_SHM_REGIONS  32
+
+static struct shm_region {
+abi_ulong start;
+abi_ulong size;
+bool in_use;
+} shm_regions[N_SHM_REGIONS];
+
 void mmap_lock(void)
 {
 if (mmap_lock_count++ == 0) {
@@ -981,3 +990,132 @@ abi_long target_madvise(abi_ulong start, abi_ulong len_in, int advice)
 
 return ret;
 }
+
+#ifndef TARGET_FORCE_SHMLBA
+/*
+ * For most architectures, SHMLBA is the same as the page size;
+ * some architectures have larger values, in which case they should
+ * define TARGET_FORCE_SHMLBA and provide a target_shmlba() function.
+ * This corresponds to the kernel arch code defining __ARCH_FORCE_SHMLBA
+ * and defining its own value for SHMLBA.
+ *
+ * The kernel also permits SHMLBA to be set by the architecture to a
+ * value larger than the page size without setting __ARCH_FORCE_SHMLBA;
+ * this means that addresses are rounded to the large size if
+ * SHM_RND is set but addresses not aligned to that size are not rejected
+ * as long as they are at least page-aligned. Since the only architecture
+ * which uses this is ia64 this code doesn't provide for that oddity.
+ */
+static inline abi_ulong target_shmlba(CPUArchState *cpu_env)
+{
+return TARGET_PAGE_SIZE;
+}
+#endif
+
+abi_ulong target_shmat(CPUArchState *cpu_env, int shmid,
+   abi_ulong shmaddr, int shmflg)
+{
+CPUState *cpu = env_cpu(cpu_env);
+abi_ulong raddr;
+void *host_raddr;
+struct shmid_ds shm_info;
+int i, ret;
+abi_ulong shmlba;
+
+/* shmat pointers are always untagged */
+
+/* find out the length of the shared memory segment */
+ret = get_errno(shmctl(shmid, IPC_STAT, &shm_info));
+if (is_error(ret)) {
+/* can't get length, bail out */
+return ret;
+}
+
+shmlba = target_shmlba(cpu_env);
+
+if (shmaddr & (shmlba - 1)) {
+if (shmflg & SHM_RND) {
+shmaddr &= ~(shmlba - 1);
+} else {
+return -TARGET_EINVAL;
+}
+}
+if (!guest_range_valid_untagged(shmaddr, shm_info.shm_segsz)) {
+return -TARGET_EINVAL;
+}
+
+mmap_lock();
+
+/*
+ * We're mapping shared memory, so ensure we generate code for parallel
+ * execution and flush old translations.  This will work up to the level
+ * supported by the host -- anything that requires EXCP_ATOMIC will not
+ * be atomic with respect to an external process.
+ */
+if (!(cpu->tcg_cflags & CF_PARALLEL)) {
+cpu->tcg_cflags |= CF_PARALLEL;
+tb_flush(cpu);
+}
+
+if (shmaddr) {
+host_raddr = shmat(shmid, (void *)g2h_untagged(shmaddr), shmflg);
+} else {
+abi_ulong mmap_start;
+
+/* In order to use the host shmat, we need to honor host SHMLBA.  */
+mmap_start = mmap_find_vma(0, shm_info.shm_segsz, MAX(SHMLBA, shmlba));
+
+if (mmap_start == -1) {
+errno = ENOMEM;
+host_raddr = (void *)-1;
+} else {
+host_raddr = shmat(shmid, g2h_untagged(mmap_start),
+   shmflg | SHM_REMAP);
+}
+}
+
+if (host_raddr == (void *)-1) {
+mmap_unlock();
+return get_errno((intptr_t)host_raddr);
+}
+raddr = h2g((uintptr_t)host_raddr);
+
+page_set_flags(raddr, raddr + shm_info.shm_segsz - 1,
+   PAGE_VALID | PAGE_RESET | PAGE_READ |
+   (shmflg & SHM_RDONLY ? 0 : PAGE_WRITE));
+
+for (i = 0; i < N_SHM_REGIONS; i++) {
+if (!shm_regions[i].in_use) {
+shm_regions[i].in_use = true;
+shm_regions[i].start = raddr;
+

[PATCH 11/13] linux-user: Use WITH_MMAP_LOCK_GUARD in target_{shmat, shmdt}

2023-08-23 Thread Richard Henderson
Move the CF_PARALLEL setting outside of the mmap lock.

Signed-off-by: Richard Henderson 
---
 linux-user/mmap.c | 98 ++-
 1 file changed, 46 insertions(+), 52 deletions(-)

diff --git a/linux-user/mmap.c b/linux-user/mmap.c
index 3aeacd1ecd..f45b2d307c 100644
--- a/linux-user/mmap.c
+++ b/linux-user/mmap.c
@@ -1017,9 +1017,8 @@ abi_ulong target_shmat(CPUArchState *cpu_env, int shmid,
 {
 CPUState *cpu = env_cpu(cpu_env);
 abi_ulong raddr;
-void *host_raddr;
 struct shmid_ds shm_info;
-int i, ret;
+int ret;
 abi_ulong shmlba;
 
 /* shmat pointers are always untagged */
@@ -1044,7 +1043,43 @@ abi_ulong target_shmat(CPUArchState *cpu_env, int shmid,
 return -TARGET_EINVAL;
 }
 
-mmap_lock();
+WITH_MMAP_LOCK_GUARD() {
+void *host_raddr;
+
+if (shmaddr) {
+host_raddr = shmat(shmid, (void *)g2h_untagged(shmaddr), shmflg);
+} else {
+abi_ulong mmap_start;
+
+/* In order to use the host shmat, we need to honor host SHMLBA.  */
+mmap_start = mmap_find_vma(0, shm_info.shm_segsz,
+   MAX(SHMLBA, shmlba));
+
+if (mmap_start == -1) {
+return -TARGET_ENOMEM;
+}
+host_raddr = shmat(shmid, g2h_untagged(mmap_start),
+   shmflg | SHM_REMAP);
+}
+
+if (host_raddr == (void *)-1) {
+return get_errno(-1);
+}
+raddr = h2g(host_raddr);
+
+page_set_flags(raddr, raddr + shm_info.shm_segsz - 1,
+   PAGE_VALID | PAGE_RESET | PAGE_READ |
+   (shmflg & SHM_RDONLY ? 0 : PAGE_WRITE));
+
+for (int i = 0; i < N_SHM_REGIONS; i++) {
+if (!shm_regions[i].in_use) {
+shm_regions[i].in_use = true;
+shm_regions[i].start = raddr;
+shm_regions[i].size = shm_info.shm_segsz;
+break;
+}
+}
+}
 
 /*
  * We're mapping shared memory, so ensure we generate code for parallel
@@ -1057,65 +1092,24 @@ abi_ulong target_shmat(CPUArchState *cpu_env, int shmid,
 tb_flush(cpu);
 }
 
-if (shmaddr) {
-host_raddr = shmat(shmid, (void *)g2h_untagged(shmaddr), shmflg);
-} else {
-abi_ulong mmap_start;
-
-/* In order to use the host shmat, we need to honor host SHMLBA.  */
-mmap_start = mmap_find_vma(0, shm_info.shm_segsz, MAX(SHMLBA, shmlba));
-
-if (mmap_start == -1) {
-errno = ENOMEM;
-host_raddr = (void *)-1;
-} else {
-host_raddr = shmat(shmid, g2h_untagged(mmap_start),
-   shmflg | SHM_REMAP);
-}
-}
-
-if (host_raddr == (void *)-1) {
-mmap_unlock();
-return get_errno((intptr_t)host_raddr);
-}
-raddr = h2g((uintptr_t)host_raddr);
-
-page_set_flags(raddr, raddr + shm_info.shm_segsz - 1,
-   PAGE_VALID | PAGE_RESET | PAGE_READ |
-   (shmflg & SHM_RDONLY ? 0 : PAGE_WRITE));
-
-for (i = 0; i < N_SHM_REGIONS; i++) {
-if (!shm_regions[i].in_use) {
-shm_regions[i].in_use = true;
-shm_regions[i].start = raddr;
-shm_regions[i].size = shm_info.shm_segsz;
-break;
-}
-}
-
-mmap_unlock();
 return raddr;
 }
 
 abi_long target_shmdt(abi_ulong shmaddr)
 {
-int i;
 abi_long rv;
 
 /* shmdt pointers are always untagged */
 
-mmap_lock();
-
-for (i = 0; i < N_SHM_REGIONS; ++i) {
-if (shm_regions[i].in_use && shm_regions[i].start == shmaddr) {
-shm_regions[i].in_use = false;
-page_set_flags(shmaddr, shmaddr + shm_regions[i].size - 1, 0);
-break;
+WITH_MMAP_LOCK_GUARD() {
+for (int i = 0; i < N_SHM_REGIONS; ++i) {
+if (shm_regions[i].in_use && shm_regions[i].start == shmaddr) {
+shm_regions[i].in_use = false;
+page_set_flags(shmaddr, shmaddr + shm_regions[i].size - 1, 0);
+break;
+}
 }
+rv = get_errno(shmdt(g2h_untagged(shmaddr)));
 }
-rv = get_errno(shmdt(g2h_untagged(shmaddr)));
-
-mmap_unlock();
-
 return rv;
 }
-- 
2.34.1
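
The restructuring works because WITH_MMAP_LOCK_GUARD() releases the lock on
every path out of the guarded block, including the new early returns. A
rough sketch of how such a scoped-guard macro can be built in GNU C (a
simplified illustration only, not QEMU's actual definition):

    /* Cleanup handler: runs whenever the guard variable leaves scope,
     * including via return or break inside the guarded block. */
    static inline void mmap_guard_cleanup(int *scope)
    {
        mmap_unlock();
    }

    #define WITH_MMAP_LOCK_GUARD_SKETCH()                              \
        for (__attribute__((cleanup(mmap_guard_cleanup))) int _scope = \
                 (mmap_lock(), 0); !_scope; _scope = 1)

With this shape, a "return -TARGET_ENOMEM;" inside the block drops the lock
before returning, so no unlock call can be forgotten.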




[PATCH 13/13] linux-user: Track shm regions with an interval tree

2023-08-23 Thread Richard Henderson
Remove the fixed size shm_regions[] array.
Remove references when other mappings completely remove
or replace a region.

Signed-off-by: Richard Henderson 
---
 linux-user/mmap.c | 81 +++
 1 file changed, 53 insertions(+), 28 deletions(-)

diff --git a/linux-user/mmap.c b/linux-user/mmap.c
index 44116c014b..8eaf57b208 100644
--- a/linux-user/mmap.c
+++ b/linux-user/mmap.c
@@ -24,18 +24,11 @@
 #include "user-internals.h"
 #include "user-mmap.h"
 #include "target_mman.h"
+#include "qemu/interval-tree.h"
 
 static pthread_mutex_t mmap_mutex = PTHREAD_MUTEX_INITIALIZER;
 static __thread int mmap_lock_count;
 
-#define N_SHM_REGIONS  32
-
-static struct shm_region {
-abi_ulong start;
-abi_ulong size;
-bool in_use;
-} shm_regions[N_SHM_REGIONS];
-
 void mmap_lock(void)
 {
 if (mmap_lock_count++ == 0) {
@@ -73,6 +66,44 @@ void mmap_fork_end(int child)
 }
 }
 
+/* Protected by mmap_lock. */
+static IntervalTreeRoot shm_regions;
+
+static void shm_region_add(abi_ptr start, abi_ptr last)
+{
+IntervalTreeNode *i = g_new0(IntervalTreeNode, 1);
+
+i->start = start;
+i->last = last;
+interval_tree_insert(i, &shm_regions);
+}
+
+static abi_ptr shm_region_find(abi_ptr start)
+{
+IntervalTreeNode *i;
+
+for (i = interval_tree_iter_first(&shm_regions, start, start); i;
+ i = interval_tree_iter_next(i, start, start)) {
+if (i->start == start) {
+return i->last;
+}
+}
+return 0;
+}
+
+static void shm_region_rm_complete(abi_ptr start, abi_ptr last)
+{
+IntervalTreeNode *i, *n;
+
+for (i = interval_tree_iter_first(&shm_regions, start, last); i; i = n) {
+n = interval_tree_iter_next(i, start, last);
+if (i->start >= start && i->last <= last) {
+interval_tree_remove(i, &shm_regions);
+g_free(i);
+}
+}
+}
+
 /*
  * Validate target prot bitmask.
  * Return the prot bitmask for the host in *HOST_PROT.
@@ -729,6 +760,7 @@ abi_long target_mmap(abi_ulong start, abi_ulong len, int target_prot,
 page_set_flags(passthrough_last + 1, last, page_flags);
 }
 }
+shm_region_rm_complete(start, last);
  the_end:
 trace_target_mmap_complete(start);
 if (qemu_loglevel_mask(CPU_LOG_PAGE)) {
@@ -826,6 +858,7 @@ int target_munmap(abi_ulong start, abi_ulong len)
 mmap_lock();
 mmap_reserve_or_unmap(start, len);
 page_set_flags(start, start + len - 1, 0);
+shm_region_rm_complete(start, start + len - 1);
 mmap_unlock();
 
 return 0;
@@ -915,8 +948,10 @@ abi_long target_mremap(abi_ulong old_addr, abi_ulong old_size,
 new_addr = h2g(host_addr);
 prot = page_get_flags(old_addr);
 page_set_flags(old_addr, old_addr + old_size - 1, 0);
+shm_region_rm_complete(old_addr, old_addr + old_size - 1);
 page_set_flags(new_addr, new_addr + new_size - 1,
prot | PAGE_VALID | PAGE_RESET);
+shm_region_rm_complete(new_addr, new_addr + new_size - 1);
 }
 mmap_unlock();
 return new_addr;
@@ -1045,6 +1080,7 @@ abi_ulong target_shmat(CPUArchState *cpu_env, int shmid,
 
 WITH_MMAP_LOCK_GUARD() {
 void *host_raddr;
+abi_ulong last;
 
 if (shmaddr) {
 host_raddr = shmat(shmid, (void *)g2h_untagged(shmaddr), shmflg);
@@ -1066,19 +1102,14 @@ abi_ulong target_shmat(CPUArchState *cpu_env, int shmid,
 return get_errno(-1);
 }
 raddr = h2g(host_raddr);
+last = raddr + shm_info.shm_segsz - 1;
 
-page_set_flags(raddr, raddr + shm_info.shm_segsz - 1,
+page_set_flags(raddr, last,
PAGE_VALID | PAGE_RESET | PAGE_READ |
(shmflg & SHM_RDONLY ? 0 : PAGE_WRITE));
 
-for (int i = 0; i < N_SHM_REGIONS; i++) {
-if (!shm_regions[i].in_use) {
-shm_regions[i].in_use = true;
-shm_regions[i].start = raddr;
-shm_regions[i].size = shm_info.shm_segsz;
-break;
-}
-}
+shm_region_rm_complete(raddr, last);
+shm_region_add(raddr, last);
 }
 
 /*
@@ -1102,23 +1133,17 @@ abi_long target_shmdt(abi_ulong shmaddr)
 /* shmdt pointers are always untagged */
 
 WITH_MMAP_LOCK_GUARD() {
-int i;
-
-for (i = 0; i < N_SHM_REGIONS; ++i) {
-if (shm_regions[i].in_use && shm_regions[i].start == shmaddr) {
-break;
-}
-}
-if (i == N_SHM_REGIONS) {
+abi_ulong last = shm_region_find(shmaddr);
+if (last == 0) {
 return -TARGET_EINVAL;
 }
 
 rv = get_errno(shmdt(g2h_untagged(shmaddr)));
 if (rv == 0) {
-abi_ulong size = shm_regions[i].size;
+abi_ulong size = last - shmaddr + 1;
 
-shm_regions[i].in_use = false;
-page_set_flags(shmaddr, shmaddr + size - 1, 0);
+
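
Taken together, the helpers shown earlier in the diff give the tree map-like
semantics keyed on a region's start address. A short illustration of the
intended usage, with made-up guest addresses:

    /* Hypothetical 8 KiB segment attached at guest address 0x10000. */
    shm_region_add(0x10000, 0x11fff);

    /* shmdt-style lookup: an exact start address yields the last byte. */
    abi_ptr last = shm_region_find(0x10000);    /* == 0x11fff */

    /* A later mapping fully covering the region drops the bookkeeping. */
    shm_region_rm_complete(0xc000, 0x13fff);
    assert(shm_region_find(0x10000) == 0);      /* region record is gone */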

[PATCH 09/13] linux-user: Remove ELF_START_MMAP and image_info.start_mmap

2023-08-23 Thread Richard Henderson
The start_mmap value is write-only.
Remove the field and the defines that populated it.
Logically, this has been replaced by task_unmapped_base.

Tested-by: Helge Deller 
Reviewed-by: Ilya Leoshkevich 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 linux-user/qemu.h|  1 -
 linux-user/elfload.c | 38 --
 2 files changed, 39 deletions(-)

diff --git a/linux-user/qemu.h b/linux-user/qemu.h
index 4f8b55e2fb..12f638336a 100644
--- a/linux-user/qemu.h
+++ b/linux-user/qemu.h
@@ -30,7 +30,6 @@ struct image_info {
 abi_ulong   start_data;
 abi_ulong   end_data;
 abi_ulong   brk;
-abi_ulong   start_mmap;
 abi_ulong   start_stack;
 abi_ulong   stack_limit;
 abi_ulong   entry;
diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index dbc5d430e8..fdb87d80d3 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -143,8 +143,6 @@ static uint32_t get_elf_hwcap(void)
 }
 
 #ifdef TARGET_X86_64
-#define ELF_START_MMAP 0x2ab000ULL
-
 #define ELF_CLASS  ELFCLASS64
 #define ELF_ARCH   EM_X86_64
 
@@ -221,8 +219,6 @@ static bool init_guest_commpage(void)
 #endif
 #else
 
-#define ELF_START_MMAP 0x8000
-
 /*
  * This is used to ensure we don't load something for the wrong architecture.
  */
@@ -308,8 +304,6 @@ static void elf_core_copy_regs(target_elf_gregset_t *regs, const CPUX86State *en
 #ifndef TARGET_AARCH64
 /* 32 bit ARM definitions */
 
-#define ELF_START_MMAP 0x8000
-
 #define ELF_ARCH    EM_ARM
 #define ELF_CLASS   ELFCLASS32
 #define EXSTACK_DEFAULT true
@@ -600,7 +594,6 @@ static const char *get_elf_platform(void)
 
 #else
 /* 64 bit ARM definitions */
-#define ELF_START_MMAP 0x8000
 
 #define ELF_ARCH    EM_AARCH64
 #define ELF_CLASS   ELFCLASS64
@@ -871,7 +864,6 @@ const char *elf_hwcap2_str(uint32_t bit)
 #ifdef TARGET_SPARC
 #ifdef TARGET_SPARC64
 
-#define ELF_START_MMAP 0x8000
 #define ELF_HWCAP  (HWCAP_SPARC_FLUSH | HWCAP_SPARC_STBAR | HWCAP_SPARC_SWAP \
 | HWCAP_SPARC_MULDIV | HWCAP_SPARC_V9)
 #ifndef TARGET_ABI32
@@ -883,7 +875,6 @@ const char *elf_hwcap2_str(uint32_t bit)
 #define ELF_CLASS   ELFCLASS64
 #define ELF_ARCH    EM_SPARCV9
 #else
-#define ELF_START_MMAP 0x8000
 #define ELF_HWCAP  (HWCAP_SPARC_FLUSH | HWCAP_SPARC_STBAR | HWCAP_SPARC_SWAP \
 | HWCAP_SPARC_MULDIV)
 #define ELF_CLASS   ELFCLASS32
@@ -905,7 +896,6 @@ static inline void init_thread(struct target_pt_regs *regs,
 #ifdef TARGET_PPC
 
 #define ELF_MACHINE PPC_ELF_MACHINE
-#define ELF_START_MMAP 0x8000
 
 #if defined(TARGET_PPC64)
 
@@ -1108,8 +1098,6 @@ static void elf_core_copy_regs(target_elf_gregset_t *regs, const CPUPPCState *en
 
 #ifdef TARGET_LOONGARCH64
 
-#define ELF_START_MMAP 0x8000
-
 #define ELF_CLASS   ELFCLASS64
 #define ELF_ARCH    EM_LOONGARCH
 #define EXSTACK_DEFAULT true
@@ -1200,8 +1188,6 @@ static uint32_t get_elf_hwcap(void)
 
 #ifdef TARGET_MIPS
 
-#define ELF_START_MMAP 0x8000
-
 #ifdef TARGET_MIPS64
 #define ELF_CLASS   ELFCLASS64
 #else
@@ -1359,8 +1345,6 @@ static uint32_t get_elf_hwcap(void)
 
 #ifdef TARGET_MICROBLAZE
 
-#define ELF_START_MMAP 0x8000
-
 #define elf_check_arch(x) ( (x) == EM_MICROBLAZE || (x) == EM_MICROBLAZE_OLD)
 
 #define ELF_CLASS   ELFCLASS32
@@ -1401,8 +1385,6 @@ static void elf_core_copy_regs(target_elf_gregset_t *regs, const CPUMBState *env
 
 #ifdef TARGET_NIOS2
 
-#define ELF_START_MMAP 0x8000
-
 #define elf_check_arch(x) ((x) == EM_ALTERA_NIOS2)
 
 #define ELF_CLASS   ELFCLASS32
@@ -1498,8 +1480,6 @@ static void elf_core_copy_regs(target_elf_gregset_t *regs,
 
 #ifdef TARGET_OPENRISC
 
-#define ELF_START_MMAP 0x0800
-
 #define ELF_ARCH EM_OPENRISC
 #define ELF_CLASS ELFCLASS32
 #define ELF_DATA  ELFDATA2MSB
@@ -1536,8 +1516,6 @@ static void elf_core_copy_regs(target_elf_gregset_t *regs,
 
 #ifdef TARGET_SH4
 
-#define ELF_START_MMAP 0x8000
-
 #define ELF_CLASS ELFCLASS32
 #define ELF_ARCH  EM_SH
 
@@ -1618,8 +1596,6 @@ static uint32_t get_elf_hwcap(void)
 
 #ifdef TARGET_CRIS
 
-#define ELF_START_MMAP 0x8000
-
 #define ELF_CLASS ELFCLASS32
 #define ELF_ARCH  EM_CRIS
 
@@ -1635,8 +1611,6 @@ static inline void init_thread(struct target_pt_regs *regs,
 
 #ifdef TARGET_M68K
 
-#define ELF_START_MMAP 0x8000
-
 #define ELF_CLASS   ELFCLASS32
 #define ELF_ARCH    EM_68K
 
@@ -1686,8 +1660,6 @@ static void elf_core_copy_regs(target_elf_gregset_t *regs, const CPUM68KState *e
 
 #ifdef TARGET_ALPHA
 
-#define ELF_START_MMAP (0x300ULL)
-
 #define ELF_CLASS  ELFCLASS64
 #define ELF_ARCH   EM_ALPHA
 
@@ -1705,8 +1677,6 @@ static inline void init_thread(struct target_pt_regs *regs,
 
 #ifdef TARGET_S390X
 
-#define ELF_START_MMAP (0x200ULL)
-
 #define ELF_CLASS  ELFCLASS64
 #define ELF_DATA   ELFDATA2MSB
 #define ELF_ARCH   EM_S390
@@ -1816,7 +1786,6 @@ 

[PATCH 04/13] util/selfmap: Use dev_t and ino_t in MapInfo

2023-08-23 Thread Richard Henderson
Use dev_t instead of a string, and ino_t instead of uint64_t.
The latter is likely to be identical on modern systems but is
more type-correct for usage.

Tested-by: Helge Deller 
Reviewed-by: Ilya Leoshkevich 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 include/qemu/selfmap.h |  4 ++--
 linux-user/syscall.c   |  6 --
 util/selfmap.c | 12 +++-
 3 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/include/qemu/selfmap.h b/include/qemu/selfmap.h
index 7d938945cb..1690a74f4b 100644
--- a/include/qemu/selfmap.h
+++ b/include/qemu/selfmap.h
@@ -20,10 +20,10 @@ typedef struct {
 bool is_exec;
 bool is_priv;
 
+dev_t dev;
+ino_t inode;
 uint64_t offset;
-uint64_t inode;
 const char *path;
-char dev[];
 } MapInfo;
 
 /**
diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index faad3a56df..a562920a84 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -8160,13 +8160,15 @@ static int open_self_maps_1(CPUArchState *cpu_env, int fd, bool smaps)
 }
 
 count = dprintf(fd, TARGET_ABI_FMT_ptr "-" TARGET_ABI_FMT_ptr
-" %c%c%c%c %08" PRIx64 " %s %"PRId64,
+" %c%c%c%c %08" PRIx64 " %02x:%02x %"PRId64,
 h2g(min), h2g(max - 1) + 1,
 (flags & PAGE_READ) ? 'r' : '-',
 (flags & PAGE_WRITE_ORG) ? 'w' : '-',
 (flags & PAGE_EXEC) ? 'x' : '-',
 e->is_priv ? 'p' : 's',
-(uint64_t) e->offset, e->dev, e->inode);
+(uint64_t)e->offset,
+major(e->dev), minor(e->dev),
+(uint64_t)e->inode);
 if (path) {
 dprintf(fd, "%*s%s\n", 73 - count, "", path);
 } else {
diff --git a/util/selfmap.c b/util/selfmap.c
index 4db5b42651..483cb617e2 100644
--- a/util/selfmap.c
+++ b/util/selfmap.c
@@ -30,19 +30,21 @@ IntervalTreeRoot *read_self_maps(void)
 
 if (nfields > 4) {
 uint64_t start, end, offset, inode;
+unsigned dev_maj, dev_min;
 int errors = 0;
 const char *p;
 
errors |= qemu_strtou64(fields[0], &p, 16, &start);
errors |= qemu_strtou64(p + 1, NULL, 16, &end);
errors |= qemu_strtou64(fields[2], NULL, 16, &offset);
+errors |= qemu_strtoui(fields[3], &p, 16, &dev_maj);
+errors |= qemu_strtoui(p + 1, NULL, 16, &dev_min);
errors |= qemu_strtou64(fields[4], NULL, 10, &inode);
 
 if (!errors) {
-size_t dev_len, path_len;
+size_t path_len;
 MapInfo *e;
 
-dev_len = strlen(fields[3]) + 1;
 if (nfields == 6) {
 p = fields[5];
 p += strspn(p, " ");
@@ -52,11 +54,12 @@ IntervalTreeRoot *read_self_maps(void)
 path_len = 0;
 }
 
-e = g_malloc0(sizeof(*e) + dev_len + path_len);
+e = g_malloc0(sizeof(*e) + path_len);
 
 e->itree.start = start;
 e->itree.last = end - 1;
 e->offset = offset;
+e->dev = makedev(dev_maj, dev_min);
 e->inode = inode;
 
 e->is_read  = fields[1][0] == 'r';
@@ -64,9 +67,8 @@ IntervalTreeRoot *read_self_maps(void)
 e->is_exec  = fields[1][2] == 'x';
 e->is_priv  = fields[1][3] == 'p';
 
-memcpy(e->dev, fields[3], dev_len);
 if (path_len) {
-e->path = memcpy(e->dev + dev_len, p, path_len);
+e->path = memcpy(e + 1, p, path_len);
 }
 
interval_tree_insert(&e->itree, root);
-- 
2.34.1
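
The conversion leans on the classic dev_t helpers (major(), minor() and
makedev(), from <sys/sysmacros.h> with glibc); a quick illustration of the
round-trip the patch depends on, with made-up device numbers:

    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/sysmacros.h>      /* major(), minor(), makedev() */

    int main(void)
    {
        /* /proc/self/maps prints the device as e.g. "fd:01". */
        unsigned dev_maj = 0xfd, dev_min = 0x01;
        dev_t dev = makedev(dev_maj, dev_min);

        printf("%02x:%02x\n", major(dev), minor(dev));  /* prints fd:01 */
        return 0;
    }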




[PATCH 07/13] linux-user: Show heap address in /proc/pid/maps

2023-08-23 Thread Richard Henderson
Tested-by: Helge Deller 
Reviewed-by: Ilya Leoshkevich 
Signed-off-by: Richard Henderson 
---
 linux-user/syscall.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 0b91f996b7..0641d8f433 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -8125,6 +8125,8 @@ static void open_self_maps_4(const struct open_self_maps_data *d,
 
 if (test_stack(start, end, info->stack_limit)) {
 path = "[stack]";
+} else if (start == info->brk) {
+path = "[heap]";
 }
 
 /* Except null device (MAP_ANON), adjust offset for this fragment. */
-- 
2.34.1




[PATCH 05/13] linux-user: Use walk_memory_regions for open_self_maps

2023-08-23 Thread Richard Henderson
Replace the by-hand method of region identification with
the official user-exec interface.  Cross-check the region
provided to the callback with the interval tree from
read_self_maps().

Tested-by: Helge Deller 
Reviewed-by: Ilya Leoshkevich 
Signed-off-by: Richard Henderson 
---
 linux-user/syscall.c | 192 ++-
 1 file changed, 115 insertions(+), 77 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index a562920a84..0b91f996b7 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -8095,12 +8095,66 @@ static int open_self_cmdline(CPUArchState *cpu_env, int fd)
 return 0;
 }
 
-static void show_smaps(int fd, unsigned long size)
-{
-unsigned long page_size_kb = TARGET_PAGE_SIZE >> 10;
-unsigned long size_kb = size >> 10;
+struct open_self_maps_data {
+TaskState *ts;
+IntervalTreeRoot *host_maps;
+int fd;
+bool smaps;
+};
 
-dprintf(fd, "Size:  %lu kB\n"
+/*
+ * Subroutine to output one line of /proc/self/maps,
+ * or one region of /proc/self/smaps.
+ */
+
+#ifdef TARGET_HPPA
+# define test_stack(S, E, L)  (E == L)
+#else
+# define test_stack(S, E, L)  (S == L)
+#endif
+
+static void open_self_maps_4(const struct open_self_maps_data *d,
+ const MapInfo *mi, abi_ptr start,
+ abi_ptr end, unsigned flags)
+{
+const struct image_info *info = d->ts->info;
+const char *path = mi->path;
+uint64_t offset;
+int fd = d->fd;
+int count;
+
+if (test_stack(start, end, info->stack_limit)) {
+path = "[stack]";
+}
+
+/* Except null device (MAP_ANON), adjust offset for this fragment. */
+offset = mi->offset;
+if (mi->dev) {
+uintptr_t hstart = (uintptr_t)g2h_untagged(start);
+offset += hstart - mi->itree.start;
+}
+
+count = dprintf(fd, TARGET_ABI_FMT_ptr "-" TARGET_ABI_FMT_ptr
+" %c%c%c%c %08" PRIx64 " %02x:%02x %"PRId64,
+start, end,
+(flags & PAGE_READ) ? 'r' : '-',
+(flags & PAGE_WRITE_ORG) ? 'w' : '-',
+(flags & PAGE_EXEC) ? 'x' : '-',
+mi->is_priv ? 'p' : 's',
+offset, major(mi->dev), minor(mi->dev),
+(uint64_t)mi->inode);
+if (path) {
+dprintf(fd, "%*s%s\n", 73 - count, "", path);
+} else {
+dprintf(fd, "\n");
+}
+
+if (d->smaps) {
+unsigned long size = end - start;
+unsigned long page_size_kb = TARGET_PAGE_SIZE >> 10;
+unsigned long size_kb = size >> 10;
+
+dprintf(fd, "Size:  %lu kB\n"
 "KernelPageSize:%lu kB\n"
 "MMUPageSize:   %lu kB\n"
 "Rss:   0 kB\n"
@@ -8121,91 +8175,75 @@ static void show_smaps(int fd, unsigned long size)
 "Swap:  0 kB\n"
 "SwapPss:   0 kB\n"
 "Locked:0 kB\n"
-"THPeligible:0\n", size_kb, page_size_kb, page_size_kb);
+"THPeligible:0\n"
+"VmFlags:%s%s%s%s%s%s%s%s\n",
+size_kb, page_size_kb, page_size_kb,
+(flags & PAGE_READ) ? " rd" : "",
+(flags & PAGE_WRITE_ORG) ? " wr" : "",
+(flags & PAGE_EXEC) ? " ex" : "",
+mi->is_priv ? "" : " sh",
+(flags & PAGE_READ) ? " mr" : "",
+(flags & PAGE_WRITE_ORG) ? " mw" : "",
+(flags & PAGE_EXEC) ? " me" : "",
+mi->is_priv ? "" : " ms");
+}
 }
 
-static int open_self_maps_1(CPUArchState *cpu_env, int fd, bool smaps)
+/*
+ * Callback for walk_memory_regions, when read_self_maps() fails.
+ * Proceed without the benefit of host /proc/self/maps cross-check.
+ */
+static int open_self_maps_3(void *opaque, target_ulong guest_start,
+target_ulong guest_end, unsigned long flags)
 {
-CPUState *cpu = env_cpu(cpu_env);
-TaskState *ts = cpu->opaque;
-IntervalTreeRoot *map_info = read_self_maps();
-IntervalTreeNode *s;
-int count;
+static const MapInfo mi = { .is_priv = true };
 
-for (s = interval_tree_iter_first(map_info, 0, -1); s;
- s = interval_tree_iter_next(s, 0, -1)) {
-MapInfo *e = container_of(s, MapInfo, itree);
+open_self_maps_4(opaque, &mi, guest_start, guest_end, flags);
+return 0;
+}
 
-if (h2g_valid(e->itree.start)) {
-unsigned long min = e->itree.start;
-unsigned long max = e->itree.last + 1;
-int flags = page_get_flags(h2g(min));
-const char *path;
+/*
+ * Callback for walk_memory_regions, when read_self_maps() succeeds.
+ */
+static int open_self_maps_2(void *opaque, target_ulong guest_start,
+target_ulong guest_end, unsigned long flags)
+{
+

[PATCH 01/13] linux-user: Split out cpu/target_proc.h

2023-08-23 Thread Richard Henderson
Move the various open_cpuinfo functions into new files.
Move the m68k open_hardware function as well.
All other guest architectures get a boilerplate empty file.

Signed-off-by: Richard Henderson 
---
 linux-user/aarch64/target_proc.h |   1 +
 linux-user/alpha/target_proc.h   |   1 +
 linux-user/arm/target_proc.h |   1 +
 linux-user/cris/target_proc.h|   1 +
 linux-user/hexagon/target_proc.h |   1 +
 linux-user/hppa/target_proc.h|  26 
 linux-user/i386/target_proc.h|   1 +
 linux-user/loongarch64/target_proc.h |   1 +
 linux-user/m68k/target_proc.h|  16 +++
 linux-user/microblaze/target_proc.h  |   1 +
 linux-user/mips/target_proc.h|   1 +
 linux-user/mips64/target_proc.h  |   1 +
 linux-user/nios2/target_proc.h   |   1 +
 linux-user/openrisc/target_proc.h|   1 +
 linux-user/ppc/target_proc.h |   1 +
 linux-user/riscv/target_proc.h   |  37 ++
 linux-user/s390x/target_proc.h   | 109 +
 linux-user/sh4/target_proc.h |   1 +
 linux-user/sparc/target_proc.h   |  16 +++
 linux-user/x86_64/target_proc.h  |   1 +
 linux-user/xtensa/target_proc.h  |   1 +
 linux-user/syscall.c | 176 +--
 22 files changed, 226 insertions(+), 170 deletions(-)
 create mode 100644 linux-user/aarch64/target_proc.h
 create mode 100644 linux-user/alpha/target_proc.h
 create mode 100644 linux-user/arm/target_proc.h
 create mode 100644 linux-user/cris/target_proc.h
 create mode 100644 linux-user/hexagon/target_proc.h
 create mode 100644 linux-user/hppa/target_proc.h
 create mode 100644 linux-user/i386/target_proc.h
 create mode 100644 linux-user/loongarch64/target_proc.h
 create mode 100644 linux-user/m68k/target_proc.h
 create mode 100644 linux-user/microblaze/target_proc.h
 create mode 100644 linux-user/mips/target_proc.h
 create mode 100644 linux-user/mips64/target_proc.h
 create mode 100644 linux-user/nios2/target_proc.h
 create mode 100644 linux-user/openrisc/target_proc.h
 create mode 100644 linux-user/ppc/target_proc.h
 create mode 100644 linux-user/riscv/target_proc.h
 create mode 100644 linux-user/s390x/target_proc.h
 create mode 100644 linux-user/sh4/target_proc.h
 create mode 100644 linux-user/sparc/target_proc.h
 create mode 100644 linux-user/x86_64/target_proc.h
 create mode 100644 linux-user/xtensa/target_proc.h

diff --git a/linux-user/aarch64/target_proc.h b/linux-user/aarch64/target_proc.h
new file mode 100644
index 00..43fe29ca72
--- /dev/null
+++ b/linux-user/aarch64/target_proc.h
@@ -0,0 +1 @@
+/* No target-specific /proc support */
diff --git a/linux-user/alpha/target_proc.h b/linux-user/alpha/target_proc.h
new file mode 100644
index 00..43fe29ca72
--- /dev/null
+++ b/linux-user/alpha/target_proc.h
@@ -0,0 +1 @@
+/* No target-specific /proc support */
diff --git a/linux-user/arm/target_proc.h b/linux-user/arm/target_proc.h
new file mode 100644
index 00..43fe29ca72
--- /dev/null
+++ b/linux-user/arm/target_proc.h
@@ -0,0 +1 @@
+/* No target-specific /proc support */
diff --git a/linux-user/cris/target_proc.h b/linux-user/cris/target_proc.h
new file mode 100644
index 00..43fe29ca72
--- /dev/null
+++ b/linux-user/cris/target_proc.h
@@ -0,0 +1 @@
+/* No target-specific /proc support */
diff --git a/linux-user/hexagon/target_proc.h b/linux-user/hexagon/target_proc.h
new file mode 100644
index 00..43fe29ca72
--- /dev/null
+++ b/linux-user/hexagon/target_proc.h
@@ -0,0 +1 @@
+/* No target-specific /proc support */
diff --git a/linux-user/hppa/target_proc.h b/linux-user/hppa/target_proc.h
new file mode 100644
index 00..9340c3b6af
--- /dev/null
+++ b/linux-user/hppa/target_proc.h
@@ -0,0 +1,26 @@
+/*
+ * HPPA specific proc functions for linux-user
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+#ifndef HPPA_TARGET_PROC_H
+#define HPPA_TARGET_PROC_H
+
+static int open_cpuinfo(CPUArchState *cpu_env, int fd)
+{
+int i, num_cpus;
+
+num_cpus = sysconf(_SC_NPROCESSORS_ONLN);
+for (i = 0; i < num_cpus; i++) {
+dprintf(fd, "processor\t: %d\n", i);
+dprintf(fd, "cpu family\t: PA-RISC 1.1e\n");
+dprintf(fd, "cpu\t\t: PA7300LC (PCX-L2)\n");
+dprintf(fd, "capabilities\t: os32\n");
+dprintf(fd, "model\t\t: 9000/778/B160L - "
+"Merlin L2 160 QEMU (9000/778/B160L)\n\n");
+}
+return 0;
+}
+#define HAVE_ARCH_PROC_CPUINFO
+
+#endif /* HPPA_TARGET_PROC_H */
diff --git a/linux-user/i386/target_proc.h b/linux-user/i386/target_proc.h
new file mode 100644
index 00..43fe29ca72
--- /dev/null
+++ b/linux-user/i386/target_proc.h
@@ -0,0 +1 @@
+/* No target-specific /proc support */
diff --git a/linux-user/loongarch64/target_proc.h b/linux-user/loongarch64/target_proc.h
new file mode 100644
index 00..43fe29ca72
--- /dev/null
+++ b/linux-user/loongarch64/target_proc.h
@@ -0,0 +1 @@
+/* No target-specific /proc support */
diff --git 

[PATCH 03/13] linux-user: Emulate /proc/cpuinfo for Alpha

2023-08-23 Thread Richard Henderson
From: Helge Deller 

Add emulation for /proc/cpuinfo for the alpha architecture.

alpha output example:

(alpha-chroot)root@p100:/# cat /proc/cpuinfo
cpu : Alpha
cpu model   : ev67
cpu variation   : 0
cpu revision: 0
cpu serial number   : JA
system type : QEMU
system variation: QEMU_v8.0.92
system revision : 0
system serial number: AY
cycle frequency [Hz]: 25000
timer frequency [Hz]: 250.00
page size [bytes]   : 8192
phys. address bits  : 44
max. addr. space #  : 255
BogoMIPS: 2500.00
platform string : AlphaServer QEMU user-mode VM
cpus detected   : 8
cpus active : 4
cpu active mask : 0095
L1 Icache   : n/a
L1 Dcache   : n/a
L2 cache: n/a
L3 cache: n/a

Cc: Michael Cree 
Signed-off-by: Helge Deller 
Reviewed-by: Richard Henderson 
Message-Id: <20230803214450.647040-4-del...@gmx.de>
Signed-off-by: Richard Henderson 
---
 linux-user/alpha/target_proc.h | 68 +-
 1 file changed, 67 insertions(+), 1 deletion(-)

diff --git a/linux-user/alpha/target_proc.h b/linux-user/alpha/target_proc.h
index 43fe29ca72..dac37dffc9 100644
--- a/linux-user/alpha/target_proc.h
+++ b/linux-user/alpha/target_proc.h
@@ -1 +1,67 @@
-/* No target-specific /proc support */
+/*
+ * Alpha specific proc functions for linux-user
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+#ifndef ALPHA_TARGET_PROC_H
+#define ALPHA_TARGET_PROC_H
+
+static int open_cpuinfo(CPUArchState *cpu_env, int fd)
+{
+int max_cpus = sysconf(_SC_NPROCESSORS_CONF);
+int num_cpus = sysconf(_SC_NPROCESSORS_ONLN);
+unsigned long cpu_mask;
+char model[32];
+const char *p, *q;
+int t;
+
+p = object_class_get_name(OBJECT_CLASS(CPU_GET_CLASS(env_cpu(cpu_env))));
+q = strchr(p, '-');
+t = q - p;
+assert(t < sizeof(model));
+memcpy(model, p, t);
+model[t] = 0;
+
+t = sched_getaffinity(getpid(), sizeof(cpu_mask), (cpu_set_t *)&cpu_mask);
+if (t < 0) {
+if (num_cpus >= sizeof(cpu_mask) * 8) {
+cpu_mask = -1;
+} else {
+cpu_mask = (1UL << num_cpus) - 1;
+}
+}
+
+dprintf(fd,
+"cpu\t\t\t: Alpha\n"
+"cpu model\t\t: %s\n"
+"cpu variation\t\t: 0\n"
+"cpu revision\t\t: 0\n"
+"cpu serial number\t: JA\n"
+"system type\t\t: QEMU\n"
+"system variation\t: QEMU_v" QEMU_VERSION "\n"
+"system revision\t\t: 0\n"
+"system serial number\t: AY\n"
+"cycle frequency [Hz]\t: 25000\n"
+"timer frequency [Hz]\t: 250.00\n"
+"page size [bytes]\t: %d\n"
+"phys. address bits\t: %d\n"
+"max. addr. space #\t: 255\n"
+"BogoMIPS\t\t: 2500.00\n"
+"kernel unaligned acc\t: 0 (pc=0,va=0)\n"
+"user unaligned acc\t: 0 (pc=0,va=0)\n"
+"platform string\t\t: AlphaServer QEMU user-mode VM\n"
+"cpus detected\t\t: %d\n"
+"cpus active\t\t: %d\n"
+"cpu active mask\t\t: %016lx\n"
+"L1 Icache\t\t: n/a\n"
+"L1 Dcache\t\t: n/a\n"
+"L2 cache\t\t: n/a\n"
+"L3 cache\t\t: n/a\n",
+model, TARGET_PAGE_SIZE, TARGET_PHYS_ADDR_SPACE_BITS,
+max_cpus, num_cpus, cpu_mask);
+
+return 0;
+}
+#define HAVE_ARCH_PROC_CPUINFO
+
+#endif /* ALPHA_TARGET_PROC_H */
-- 
2.34.1




[PATCH 06/13] linux-user: Adjust brk for load_bias

2023-08-23 Thread Richard Henderson
PIE executables are usually linked at offset 0 and are
relocated somewhere during load.  The hiaddr needs to
be adjusted to keep the brk next to the executable.

Cc: qemu-sta...@nongnu.org
Fixes: 1f356e8c013 ("linux-user: Adjust initial brk when interpreter is close to executable")
Tested-by: Helge Deller 
Reviewed-by: Ilya Leoshkevich 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 linux-user/elfload.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index d5f67de288..dbc5d430e8 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -3326,7 +3326,7 @@ static void load_elf_image(const char *image_name, int image_fd,
 info->start_data = -1;
 info->end_data = 0;
 /* Usual start for brk is after all sections of the main executable. */
-info->brk = TARGET_PAGE_ALIGN(hiaddr);
+info->brk = TARGET_PAGE_ALIGN(hiaddr + load_bias);
 info->elf_flags = ehdr->e_flags;
 
 prot_exec = PROT_EXEC;
-- 
2.34.1
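
A worked example with illustrative numbers (not taken from the patch): a
PIE executable linked at address 0 with hiaddr = 0x500000 might be placed
at load_bias = 0x555555000000 by the loader.

    brk_old = TARGET_PAGE_ALIGN(0x500000);                  /* below the image */
    brk_new = TARGET_PAGE_ALIGN(0x555555000000 + 0x500000); /* just past it    */

With the fix, the initial brk again sits immediately after the relocated
executable, as the comment above the assignment intends.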




[PATCH 08/13] linux-user: Emulate the Anonymous: keyword in /proc/self/smaps

2023-08-23 Thread Richard Henderson
From: Ilya Leoshkevich 

Core dumps produced by gdb's gcore when connected to qemu's gdbstub
lack stack. The reason is that gdb includes only anonymous memory in
core dumps, which is distinguished by a non-0 Anonymous: value.

Consider the mappings with PAGE_ANON fully anonymous, and the mappings
without it fully non-anonymous.

Signed-off-by: Ilya Leoshkevich 
[rth: Update for open_self_maps_* rewrite]
Signed-off-by: Richard Henderson 
---
 linux-user/syscall.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 0641d8f433..8d96acd085 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -8167,7 +8167,7 @@ static void open_self_maps_4(const struct open_self_maps_data *d,
 "Private_Clean: 0 kB\n"
 "Private_Dirty: 0 kB\n"
 "Referenced:0 kB\n"
-"Anonymous: 0 kB\n"
+"Anonymous: %lu kB\n"
 "LazyFree:  0 kB\n"
 "AnonHugePages: 0 kB\n"
 "ShmemPmdMapped:0 kB\n"
@@ -8180,6 +8180,7 @@ static void open_self_maps_4(const struct open_self_maps_data *d,
 "THPeligible:0\n"
 "VmFlags:%s%s%s%s%s%s%s%s\n",
 size_kb, page_size_kb, page_size_kb,
+(flags & PAGE_ANON ? size_kb : 0),
 (flags & PAGE_READ) ? " rd" : "",
 (flags & PAGE_WRITE_ORG) ? " wr" : "",
 (flags & PAGE_EXEC) ? " ex" : "",
-- 
2.34.1




[PATCH 00/13] linux-user patch queue

2023-08-23 Thread Richard Henderson
Combine a bunch of smaller linux-user patches:

Supersedes: 20230801230842.414421-1-del...@gmx.de
("[PATCH v2 0/3] linux-user: /proc/cpuinfo fix and content emulation for arm")
Supersedes: 20230807122206.655701-1-...@linux.ibm.com
("[PATCH v2] linux-user: Emulate the Anonymous: keyword in /proc/self/smaps")
Supersedes: 20230816181437.572997-1-richard.hender...@linaro.org
("[PATCH 0/6] linux-user: Rewrite open_self_maps")
Supersedes: 20230820204408.327348-1-richard.hender...@linaro.org
("[PATCH 0/4] linux-user: shmat/shmdt improvements")

with some additions.  Patches needing review:

  01-linux-user-Split-out-cpu-target_proc.h.patch
  11-linux-user-Use-WITH_MMAP_LOCK_GUARD-in-target_-shmat.patch
  12-linux-user-Fix-shmdt.patch
  13-linux-user-Track-shm-regions-with-an-interval-tree.patch


r~


Helge Deller (2):
  linux-user: Emulate /proc/cpuinfo on aarch64 and arm
  linux-user: Emulate /proc/cpuinfo for Alpha

Ilya Leoshkevich (1):
  linux-user: Emulate the Anonymous: keyword in /proc/self/smaps

Richard Henderson (10):
  linux-user: Split out cpu/target_proc.h
  util/selfmap: Use dev_t and ino_t in MapInfo
  linux-user: Use walk_memory_regions for open_self_maps
  linux-user: Adjust brk for load_bias
  linux-user: Show heap address in /proc/pid/maps
  linux-user: Remove ELF_START_MMAP and image_info.start_mmap
  linux-user: Move shmat and shmdt implementations to mmap.c
  linux-user: Use WITH_MMAP_LOCK_GUARD in target_{shmat,shmdt}
  linux-user: Fix shmdt
  linux-user: Track shm regions with an interval tree

 include/qemu/selfmap.h   |   4 +-
 linux-user/aarch64/target_proc.h |   1 +
 linux-user/alpha/target_proc.h   |  67 
 linux-user/arm/target_proc.h | 101 ++
 linux-user/cris/target_proc.h|   1 +
 linux-user/hexagon/target_proc.h |   1 +
 linux-user/hppa/target_proc.h|  26 ++
 linux-user/i386/target_proc.h|   1 +
 linux-user/loader.h  |   6 +-
 linux-user/loongarch64/target_proc.h |   1 +
 linux-user/m68k/target_proc.h|  16 +
 linux-user/microblaze/target_proc.h  |   1 +
 linux-user/mips/target_proc.h|   1 +
 linux-user/mips64/target_proc.h  |   1 +
 linux-user/nios2/target_proc.h   |   1 +
 linux-user/openrisc/target_proc.h|   1 +
 linux-user/ppc/target_proc.h |   1 +
 linux-user/qemu.h|   1 -
 linux-user/riscv/target_proc.h   |  37 ++
 linux-user/s390x/target_proc.h   | 109 ++
 linux-user/sh4/target_proc.h |   1 +
 linux-user/sparc/target_proc.h   |  16 +
 linux-user/user-mmap.h   |   4 +
 linux-user/x86_64/target_proc.h  |   1 +
 linux-user/xtensa/target_proc.h  |   1 +
 linux-user/elfload.c | 170 ++---
 linux-user/mmap.c| 168 +
 linux-user/syscall.c | 514 +++
 util/selfmap.c   |  12 +-
 29 files changed, 828 insertions(+), 437 deletions(-)
 create mode 100644 linux-user/aarch64/target_proc.h
 create mode 100644 linux-user/alpha/target_proc.h
 create mode 100644 linux-user/arm/target_proc.h
 create mode 100644 linux-user/cris/target_proc.h
 create mode 100644 linux-user/hexagon/target_proc.h
 create mode 100644 linux-user/hppa/target_proc.h
 create mode 100644 linux-user/i386/target_proc.h
 create mode 100644 linux-user/loongarch64/target_proc.h
 create mode 100644 linux-user/m68k/target_proc.h
 create mode 100644 linux-user/microblaze/target_proc.h
 create mode 100644 linux-user/mips/target_proc.h
 create mode 100644 linux-user/mips64/target_proc.h
 create mode 100644 linux-user/nios2/target_proc.h
 create mode 100644 linux-user/openrisc/target_proc.h
 create mode 100644 linux-user/ppc/target_proc.h
 create mode 100644 linux-user/riscv/target_proc.h
 create mode 100644 linux-user/s390x/target_proc.h
 create mode 100644 linux-user/sh4/target_proc.h
 create mode 100644 linux-user/sparc/target_proc.h
 create mode 100644 linux-user/x86_64/target_proc.h
 create mode 100644 linux-user/xtensa/target_proc.h

-- 
2.34.1




[PATCH 12/13] linux-user: Fix shmdt

2023-08-23 Thread Richard Henderson
If the shm region is not mapped at shmaddr, EINVAL.
Do not unmap the region until the syscall succeeds.
Use mmap_reserve_or_unmap to preserve reserved_va semantics.

Signed-off-by: Richard Henderson 
---
 linux-user/mmap.c | 17 ++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/linux-user/mmap.c b/linux-user/mmap.c
index f45b2d307c..44116c014b 100644
--- a/linux-user/mmap.c
+++ b/linux-user/mmap.c
@@ -1102,14 +1102,25 @@ abi_long target_shmdt(abi_ulong shmaddr)
 /* shmdt pointers are always untagged */
 
 WITH_MMAP_LOCK_GUARD() {
-for (int i = 0; i < N_SHM_REGIONS; ++i) {
+int i;
+
+for (i = 0; i < N_SHM_REGIONS; ++i) {
 if (shm_regions[i].in_use && shm_regions[i].start == shmaddr) {
-shm_regions[i].in_use = false;
-page_set_flags(shmaddr, shmaddr + shm_regions[i].size - 1, 0);
 break;
 }
 }
+if (i == N_SHM_REGIONS) {
+return -TARGET_EINVAL;
+}
+
 rv = get_errno(shmdt(g2h_untagged(shmaddr)));
+if (rv == 0) {
+abi_ulong size = shm_regions[i].size;
+
+shm_regions[i].in_use = false;
+page_set_flags(shmaddr, shmaddr + size - 1, 0);
+mmap_reserve_or_unmap(shmaddr, size);
+}
 }
 return rv;
 }
-- 
2.34.1




[PATCH 02/13] linux-user: Emulate /proc/cpuinfo on aarch64 and arm

2023-08-23 Thread Richard Henderson
From: Helge Deller 

Add emulation for /proc/cpuinfo for arm architecture.
The output below mimics output as seen on debian porterboxes.

aarch64 output example:

processor   : 0
model name  : ARMv8 Processor rev 0 (v8l)
BogoMIPS: 100.00
Features: swp half thumb fast_mult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae aes pmull sha1 sha2 crc32
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x1
CPU part: 0xd07
CPU revision: 0

arm 32-bit output example:

processor   : 0
model name  : ARMv7 Processor rev 5 (armv7l)
BogoMIPS: 100.00
Features: swp half thumb fast_mult vfp edsp thumbee neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x0f
CPU part: 0xc07
CPU revision: 5

Signed-off-by: Helge Deller 
Reviewed-by: Richard Henderson 
Message-Id: <20230803214450.647040-3-del...@gmx.de>
Signed-off-by: Richard Henderson 
---
 linux-user/aarch64/target_proc.h |   2 +-
 linux-user/arm/target_proc.h | 102 +++-
 linux-user/loader.h  |   6 +-
 linux-user/elfload.c | 130 ++-
 4 files changed, 233 insertions(+), 7 deletions(-)

diff --git a/linux-user/aarch64/target_proc.h b/linux-user/aarch64/target_proc.h
index 43fe29ca72..907df4dcd2 100644
--- a/linux-user/aarch64/target_proc.h
+++ b/linux-user/aarch64/target_proc.h
@@ -1 +1 @@
-/* No target-specific /proc support */
+#include "../arm/target_proc.h"
diff --git a/linux-user/arm/target_proc.h b/linux-user/arm/target_proc.h
index 43fe29ca72..ac75af9ca6 100644
--- a/linux-user/arm/target_proc.h
+++ b/linux-user/arm/target_proc.h
@@ -1 +1,101 @@
-/* No target-specific /proc support */
+/*
+ * Arm specific proc functions for linux-user
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+#ifndef ARM_TARGET_PROC_H
+#define ARM_TARGET_PROC_H
+
+static int open_cpuinfo(CPUArchState *cpu_env, int fd)
+{
+ARMCPU *cpu = env_archcpu(cpu_env);
+int arch, midr_rev, midr_part, midr_var, midr_impl;
+target_ulong elf_hwcap = get_elf_hwcap();
+target_ulong elf_hwcap2 = get_elf_hwcap2();
+const char *elf_name;
+int num_cpus, len_part, len_var;
+
+#if TARGET_BIG_ENDIAN
+# define END_SUFFIX "b"
+#else
+# define END_SUFFIX "l"
+#endif
+
+arch = 8;
+elf_name = "v8" END_SUFFIX;
+midr_rev = FIELD_EX32(cpu->midr, MIDR_EL1, REVISION);
+midr_part = FIELD_EX32(cpu->midr, MIDR_EL1, PARTNUM);
+midr_var = FIELD_EX32(cpu->midr, MIDR_EL1, VARIANT);
+midr_impl = FIELD_EX32(cpu->midr, MIDR_EL1, IMPLEMENTER);
+len_part = 3;
+len_var = 1;
+
+#ifndef TARGET_AARCH64
+/* For simplicity, treat ARMv8 as an arm64 kernel with CONFIG_COMPAT. */
+if (!arm_feature(&cpu->env, ARM_FEATURE_V8)) {
+if (arm_feature(&cpu->env, ARM_FEATURE_V7)) {
+arch = 7;
+midr_var = (cpu->midr >> 16) & 0x7f;
+len_var = 2;
+if (arm_feature(&cpu->env, ARM_FEATURE_M)) {
+elf_name = "armv7m" END_SUFFIX;
+} else {
+elf_name = "armv7" END_SUFFIX;
+}
+} else {
+midr_part = cpu->midr >> 4;
+len_part = 7;
+if (arm_feature(&cpu->env, ARM_FEATURE_V6)) {
+arch = 6;
+elf_name = "armv6" END_SUFFIX;
+} else if (arm_feature(&cpu->env, ARM_FEATURE_V5)) {
+arch = 5;
+elf_name = "armv5t" END_SUFFIX;
+} else {
+arch = 4;
+elf_name = "armv4" END_SUFFIX;
+}
+}
+}
+#endif
+
+#undef END_SUFFIX
+
+num_cpus = sysconf(_SC_NPROCESSORS_ONLN);
+for (int i = 0; i < num_cpus; i++) {
+dprintf(fd,
+"processor\t: %d\n"
+"model name\t: ARMv%d Processor rev %d (%s)\n"
+"BogoMIPS\t: 100.00\n"
+"Features\t:",
+i, arch, midr_rev, elf_name);
+
+for (target_ulong j = elf_hwcap; j ; j &= j - 1) {
+dprintf(fd, " %s", elf_hwcap_str(ctz64(j)));
+}
+for (target_ulong j = elf_hwcap2; j ; j &= j - 1) {
+dprintf(fd, " %s", elf_hwcap2_str(ctz64(j)));
+}
+
+dprintf(fd, "\n"
+"CPU implementer\t: 0x%02x\n"
+"CPU architecture: %d\n"
+"CPU variant\t: 0x%0*x\n",
+midr_impl, arch, len_var, midr_var);
+if (arch >= 7) {
+dprintf(fd, "CPU part\t: 0x%0*x\n", len_part, midr_part);
+}
+dprintf(fd, "CPU revision\t: %d\n\n", midr_rev);
+}
+
+if (arch < 8) {
+dprintf(fd, "Hardware\t: QEMU v%s %s\n", QEMU_VERSION,
+cpu->dtb_compatible ? : "");
+dprintf(fd, "Revision\t: \n");
+dprintf(fd, "Serial\t\t: \n");
+}
+return 0;
+}
+#define HAVE_ARCH_PROC_CPUINFO
+
+#endif /* ARM_TARGET_PROC_H */
diff --git 

[PATCH v2] target/loongarch: cpu: Implement get_arch_id callback

2023-08-23 Thread Bibo Mao
Implement the callback for getting the architecture-dependent CPU
ID. The CPU ID is the physical ID described in the ACPI MADT table;
it will be used for CPU hotplug.

Signed-off-by: Bibo Mao 
Reviewed-by: Song Gao 
---

v1->v2:
 remove the unneeded Change-Id.

---
 hw/loongarch/virt.c| 2 ++
 target/loongarch/cpu.c | 8 
 target/loongarch/cpu.h | 1 +
 3 files changed, 11 insertions(+)

diff --git a/hw/loongarch/virt.c b/hw/loongarch/virt.c
index e19b042ce8..6f6b577749 100644
--- a/hw/loongarch/virt.c
+++ b/hw/loongarch/virt.c
@@ -815,6 +815,8 @@ static void loongarch_init(MachineState *machine)
 cpu = cpu_create(machine->cpu_type);
 cpu->cpu_index = i;
 machine->possible_cpus->cpus[i].cpu = OBJECT(cpu);
+lacpu = LOONGARCH_CPU(cpu);
+lacpu->phy_id = machine->possible_cpus->cpus[i].arch_id;
 }
 fdt_add_cpu_nodes(lams);
 
diff --git a/target/loongarch/cpu.c b/target/loongarch/cpu.c
index ad93ecac92..7be3769672 100644
--- a/target/loongarch/cpu.c
+++ b/target/loongarch/cpu.c
@@ -690,6 +690,13 @@ static struct TCGCPUOps loongarch_tcg_ops = {
 static const struct SysemuCPUOps loongarch_sysemu_ops = {
 .get_phys_page_debug = loongarch_cpu_get_phys_page_debug,
 };
+
+static int64_t loongarch_cpu_get_arch_id(CPUState *cs)
+{
+LoongArchCPU *cpu = LOONGARCH_CPU(cs);
+
+return cpu->phy_id;
+}
 #endif
 
 static gchar *loongarch_gdb_arch_name(CPUState *cs)
@@ -715,6 +722,7 @@ static void loongarch_cpu_class_init(ObjectClass *c, void *data)
 cc->set_pc = loongarch_cpu_set_pc;
 cc->get_pc = loongarch_cpu_get_pc;
 #ifndef CONFIG_USER_ONLY
+cc->get_arch_id = loongarch_cpu_get_arch_id;
dc->vmsd = &vmstate_loongarch_cpu;
cc->sysemu_ops = &loongarch_sysemu_ops;
 #endif
diff --git a/target/loongarch/cpu.h b/target/loongarch/cpu.h
index fa371ca8ba..033081593c 100644
--- a/target/loongarch/cpu.h
+++ b/target/loongarch/cpu.h
@@ -371,6 +371,7 @@ struct ArchCPU {
 CPUNegativeOffsetState neg;
 CPULoongArchState env;
 QEMUTimer timer;
+uint32_t  phy_id;
 
 /* 'compatible' string for this CPU for Linux device trees */
 const char *dtb_compatible;
-- 
2.27.0




Re: NVMe ZNS last zone size

2023-08-23 Thread Sam Li
Klaus Jensen  wrote on Thu, Aug 24, 2023 at 02:53:
>
> On Aug 23 22:58, Sam Li wrote:
> > Stefan Hajnoczi  wrote on Wed, Aug 23, 2023 at 22:41:
> > >
> > > On Wed, 23 Aug 2023 at 10:24, Sam Li  wrote:
> > > >
> > > > Hi Stefan,
> > > >
> > > > Stefan Hajnoczi  wrote on Wed, Aug 23, 2023 at 21:26:
> > > > >
> > > > > Hi Sam and Klaus,
> > > > > Val is adding nvme-io_uring ZNS support to libblkio
> > > > > (https://gitlab.com/libblkio/libblkio/-/merge_requests/221) and asked
> > > > > how to test the size of the last zone when the namespace's total size
> > > > > is not a multiple of the zone size.
> > > >
> > > > I think a zone report operation can do the trick. Given zone configs,
> > > > the size of last zone should be [size - (nr_zones - 1) * zone_size].
> > > > Reporting last zone on such devices tells whether the value is
> > > > correct.
> > >
> > > In nvme_ns_zoned_check_calc_geometry() the number of zones is rounded 
> > > down:
> > >
> > >   ns->num_zones = le64_to_cpu(ns->id_ns.nsze) / ns->zone_size;
> > >
> > > Afterwards nsze is recalculated as follows:
> > >
> > >   ns->id_ns.nsze = cpu_to_le64(ns->num_zones * ns->zone_size);
> > >
> > > I interpret this to mean that when the namespace's total size is not a
> > > multiple of the zone size, then the last part will be ignored and not
> > > exposed as a zone.
> >
> > I see. Current ZNS emulation does not support this case.
> >
>
> NVMe Zoned Namespaces requires all zones to be the same size. The
> "trailing zone" is a thing in SMR HDDs.

Thanks! Then qcow2 with ZNS should also ignore the trailing zone.

Sam
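
A worked example of the rounding quoted above, with illustrative numbers:
for a namespace of 1000 logical blocks and a zone size of 300 blocks,

    num_zones = 1000 / 300;     /* = 3; integer division rounds down      */
    nsze      = 3 * 300;        /* = 900; the trailing 100 blocks are     */
                                /* never exposed as a zone                */

so a zone report sees three zones of 300 blocks each and no trailing zone.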



Re: [PATCH] softmmu: Fix dirtylimit memory leak

2023-08-23 Thread Yong Huang
On Wed, Aug 23, 2023 at 3:48 PM  wrote:

> From: "alloc.young" 
>
> Fix memory leak in hmp_info_vcpu_dirty_limit,use g_autoptr
> handle memory deallocation, alse use g_free to match g_malloc
> && g_new functions.
>
> Signed-off-by: alloc.young 
> ---
>  softmmu/dirtylimit.c | 26 --
>  1 file changed, 12 insertions(+), 14 deletions(-)
>
> [...]

> diff --git a/softmmu/dirtylimit.c b/softmmu/dirtylimit.c
> index 3c275ee55b..fa959d7743 100644
> --- a/softmmu/dirtylimit.c
> +++ b/softmmu/dirtylimit.c
> @@ -100,7 +100,7 @@ static void vcpu_dirty_rate_stat_collect(void)
>  stat.rates[i].dirty_rate;
>  }
>
> -free(stat.rates);
> +g_free(stat.rates);
>  }
>
> Code optimization.

>  static void *vcpu_dirty_rate_stat_thread(void *opaque)
> @@ -171,10 +171,10 @@ void vcpu_dirty_rate_stat_initialize(void)
>
>  void vcpu_dirty_rate_stat_finalize(void)
>  {
> -free(vcpu_dirty_rate_stat->stat.rates);
> +g_free(vcpu_dirty_rate_stat->stat.rates);
>  vcpu_dirty_rate_stat->stat.rates = NULL;
>
> -free(vcpu_dirty_rate_stat);
> +g_free(vcpu_dirty_rate_stat);
>  vcpu_dirty_rate_stat = NULL;
>  }
>
> Likewise...

> @@ -220,10 +220,10 @@ void dirtylimit_state_initialize(void)
>
>  void dirtylimit_state_finalize(void)
>  {
> -free(dirtylimit_state->states);
> +g_free(dirtylimit_state->states);
>  dirtylimit_state->states = NULL;
>
> -free(dirtylimit_state);
> +g_free(dirtylimit_state);
>  dirtylimit_state = NULL;
>
> Likewise...

>  trace_dirtylimit_state_finalize();
> @@ -653,7 +653,8 @@ struct DirtyLimitInfoList *qmp_query_vcpu_dirty_limit(Error **errp)
>
>  void hmp_info_vcpu_dirty_limit(Monitor *mon, const QDict *qdict)
>  {
> -DirtyLimitInfoList *limit, *head, *info = NULL;
> +DirtyLimitInfoList *info;
> +g_autoptr(DirtyLimitInfoList) head = NULL;
>  Error *err = NULL;
>
>  if (!dirtylimit_in_service()) {
> @@ -661,20 +662,17 @@ void hmp_info_vcpu_dirty_limit(Monitor *mon, const QDict *qdict)
>  return;
>  }
>
> -info = qmp_query_vcpu_dirty_limit();
> +head = qmp_query_vcpu_dirty_limit();
>  if (err) {
>  hmp_handle_error(mon, err);
>  return;
>  }
>
> -head = info;
> -for (limit = head; limit != NULL; limit = limit->next) {
> +for (info = head; info != NULL; info = info->next) {
>  monitor_printf(mon, "vcpu[%"PRIi64"], limit rate %"PRIi64 " (MB/s),"
>  " current rate %"PRIi64 " (MB/s)\n",
> -limit->value->cpu_index,
> -limit->value->limit_rate,
> -limit->value->current_rate);
> +info->value->cpu_index,
> +info->value->limit_rate,
> +info->value->current_rate);
>  }
> -
> -g_free(info);
>
Fix memory leak.

>  }
> --
> 2.39.3
>
I'll choose the memory leak modifications to keep the patch focused on a
single independent issue.

Anyway,

Reviewed-by: Hyman Huang(黄勇) 

-- 
Best regards


[PATCH v2 1/4] block: remove AIOCBInfo->get_aio_context()

2023-08-23 Thread Stefan Hajnoczi
The synchronous bdrv_aio_cancel() function needs the acb's AioContext so
it can call aio_poll() to wait for cancellation.

It turns out that all users run under the BQL in the main AioContext, so
this callback is not needed.

Remove the callback, mark bdrv_aio_cancel() GLOBAL_STATE_CODE just like
its blk_aio_cancel() caller, and poll the main loop AioContext.

The purpose of this cleanup is to identify bdrv_aio_cancel() as an API
that does not work with the multi-queue block layer.

Signed-off-by: Stefan Hajnoczi 
---
 include/block/aio.h|  1 -
 include/block/block-global-state.h |  2 ++
 include/block/block-io.h   |  1 -
 block/block-backend.c  | 17 -
 block/io.c | 23 ---
 hw/nvme/ctrl.c |  7 ---
 softmmu/dma-helpers.c  |  8 
 util/thread-pool.c |  8 
 8 files changed, 10 insertions(+), 57 deletions(-)

diff --git a/include/block/aio.h b/include/block/aio.h
index 32042e8905..bcc165c974 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -31,7 +31,6 @@ typedef void BlockCompletionFunc(void *opaque, int ret);
 
 typedef struct AIOCBInfo {
 void (*cancel_async)(BlockAIOCB *acb);
-AioContext *(*get_aio_context)(BlockAIOCB *acb);
 size_t aiocb_size;
 } AIOCBInfo;
 
diff --git a/include/block/block-global-state.h 
b/include/block/block-global-state.h
index f347199bff..ac2a605ef5 100644
--- a/include/block/block-global-state.h
+++ b/include/block/block-global-state.h
@@ -185,6 +185,8 @@ void bdrv_drain_all_begin_nopoll(void);
 void bdrv_drain_all_end(void);
 void bdrv_drain_all(void);
 
+void bdrv_aio_cancel(BlockAIOCB *acb);
+
 int bdrv_has_zero_init_1(BlockDriverState *bs);
 int bdrv_has_zero_init(BlockDriverState *bs);
 BlockDriverState *bdrv_find_node(const char *node_name);
diff --git a/include/block/block-io.h b/include/block/block-io.h
index 4415506e40..b078d17bf1 100644
--- a/include/block/block-io.h
+++ b/include/block/block-io.h
@@ -101,7 +101,6 @@ bdrv_co_delete_file_noerr(BlockDriverState *bs);
 
 
 /* async block I/O */
-void bdrv_aio_cancel(BlockAIOCB *acb);
 void bdrv_aio_cancel_async(BlockAIOCB *acb);
 
 /* sg packet commands */
diff --git a/block/block-backend.c b/block/block-backend.c
index 4009ed5fed..a77295a198 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -33,8 +33,6 @@
 
#define NOT_DONE 0x7fffffff /* used while emulated sync operation in progress */
 
-static AioContext *blk_aiocb_get_aio_context(BlockAIOCB *acb);
-
 typedef struct BlockBackendAioNotifier {
 void (*attached_aio_context)(AioContext *new_context, void *opaque);
 void (*detach_aio_context)(void *opaque);
@@ -103,7 +101,6 @@ typedef struct BlockBackendAIOCB {
 } BlockBackendAIOCB;
 
 static const AIOCBInfo block_backend_aiocb_info = {
-.get_aio_context = blk_aiocb_get_aio_context,
 .aiocb_size = sizeof(BlockBackendAIOCB),
 };
 
@@ -1545,16 +1542,8 @@ typedef struct BlkAioEmAIOCB {
 bool has_returned;
 } BlkAioEmAIOCB;
 
-static AioContext *blk_aio_em_aiocb_get_aio_context(BlockAIOCB *acb_)
-{
-BlkAioEmAIOCB *acb = container_of(acb_, BlkAioEmAIOCB, common);
-
-return blk_get_aio_context(acb->rwco.blk);
-}
-
 static const AIOCBInfo blk_aio_em_aiocb_info = {
 .aiocb_size = sizeof(BlkAioEmAIOCB),
-.get_aio_context= blk_aio_em_aiocb_get_aio_context,
 };
 
 static void blk_aio_complete(BlkAioEmAIOCB *acb)
@@ -2434,12 +2423,6 @@ AioContext *blk_get_aio_context(BlockBackend *blk)
 return blk->ctx;
 }
 
-static AioContext *blk_aiocb_get_aio_context(BlockAIOCB *acb)
-{
-BlockBackendAIOCB *blk_acb = DO_UPCAST(BlockBackendAIOCB, common, acb);
-return blk_get_aio_context(blk_acb->blk);
-}
-
 int blk_set_aio_context(BlockBackend *blk, AioContext *new_context,
 Error **errp)
 {
diff --git a/block/io.c b/block/io.c
index 055fcf7438..16245dc93a 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2944,25 +2944,18 @@ int bdrv_load_vmstate(BlockDriverState *bs, uint8_t 
*buf,
 /**/
 /* async I/Os */
 
+/**
+ * Synchronously cancels an acb. Must be called with the BQL held and the acb
+ * must be processed with the BQL held too (IOThreads are not allowed).
+ *
+ * Use bdrv_aio_cancel_async() instead when possible.
+ */
 void bdrv_aio_cancel(BlockAIOCB *acb)
 {
-IO_CODE();
+GLOBAL_STATE_CODE();
 qemu_aio_ref(acb);
 bdrv_aio_cancel_async(acb);
-while (acb->refcnt > 1) {
-if (acb->aiocb_info->get_aio_context) {
-aio_poll(acb->aiocb_info->get_aio_context(acb), true);
-} else if (acb->bs) {
-/* qemu_aio_ref and qemu_aio_unref are not thread-safe, so
- * assert that we're not using an I/O thread.  Thread-safe
- * code should use bdrv_aio_cancel_async exclusively.
- */
-assert(bdrv_get_aio_context(acb->bs) == 
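
The hunk is truncated here in the archive. Based on the commit message above
("poll the main loop AioContext"), the replacement loop presumably ends up
along these lines -- a sketch, not the verbatim patch:

while (acb->refcnt > 1) {
    aio_poll(qemu_get_aio_context(), true);   /* BQL held; main loop only */
}
qemu_aio_unref(acb);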

[PATCH v2 4/4] block-coroutine-wrapper: use qemu_get_current_aio_context()

2023-08-23 Thread Stefan Hajnoczi
Use qemu_get_current_aio_context() in mixed wrappers and coroutine
wrappers so that code runs in the caller's AioContext instead of moving
to the BlockDriverState's AioContext. This change is necessary for the
multi-queue block layer where any thread can call into the block layer.

Most wrappers are IO_CODE where it's safe to use the current AioContext
nowadays. BlockDrivers and the core block layer use their own locks and
no longer depend on the AioContext lock for thread-safety.

The bdrv_create() wrapper invokes GLOBAL_STATE code. Using the current
AioContext is safe because this code is only called with the BQL held
from the main loop thread.

The output of qemu-iotests 051 is sensitive to event loop activity.
Update the output because the monitor BH runs at a different time,
causing prompts to be printed differently in the output.

Signed-off-by: Stefan Hajnoczi 
---
 scripts/block-coroutine-wrapper.py | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/scripts/block-coroutine-wrapper.py 
b/scripts/block-coroutine-wrapper.py
index d4a183db61..f93fe154c3 100644
--- a/scripts/block-coroutine-wrapper.py
+++ b/scripts/block-coroutine-wrapper.py
@@ -88,8 +88,6 @@ def __init__(self, wrapper_type: str, return_type: str, name: 
str,
 raise ValueError(f"no_co function can't be rdlock: 
{self.name}")
 self.target_name = f'{subsystem}_{subname}'
 
-self.ctx = self.gen_ctx()
-
 self.get_result = 's->ret = '
 self.ret = 'return s.ret;'
 self.co_ret = 'return '
@@ -162,7 +160,7 @@ def create_mixed_wrapper(func: FuncDecl) -> str:
 {func.co_ret}{name}({ func.gen_list('{name}') });
 }} else {{
 {struct_name} s = {{
-.poll_state.ctx = {func.ctx},
+.poll_state.ctx = qemu_get_current_aio_context(),
 .poll_state.in_progress = true,
 
 { func.gen_block('.{name} = {name},') }
@@ -186,7 +184,7 @@ def create_co_wrapper(func: FuncDecl) -> str:
 {func.return_type} {func.name}({ func.gen_list('{decl}') })
 {{
 {struct_name} s = {{
-.poll_state.ctx = {func.ctx},
+.poll_state.ctx = qemu_get_current_aio_context(),
 .poll_state.in_progress = true,
 
 { func.gen_block('.{name} = {name},') }
-- 
2.41.0
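
For reference, the mixed wrappers this script emits have roughly the
following shape after the change (a sketch: bdrv_foo, bdrv_co_foo, BdrvFooCo
and bdrv_foo_entry stand in for the generated names):

int bdrv_foo(BlockDriverState *bs)
{
    if (qemu_in_coroutine()) {
        return bdrv_co_foo(bs);             /* direct call, no polling */
    } else {
        BdrvFooCo s = {
            /* was: .poll_state.ctx = bdrv_get_aio_context(bs), */
            .poll_state.ctx = qemu_get_current_aio_context(),
            .poll_state.in_progress = true,
            .bs = bs,
        };

        s.poll_state.co = qemu_coroutine_create(bdrv_foo_entry, &s);
        bdrv_poll_co(&s.poll_state);        /* poll until in_progress clears */
        return s.ret;
    }
}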




[PATCH v2 0/4] block-backend: process I/O in the current AioContext

2023-08-23 Thread Stefan Hajnoczi
v2
- Add patch to remove AIOCBInfo->get_aio_context() [Kevin]
- Add patch to use qemu_get_current_aio_context() in block-coroutine-wrapper so
  that the wrappers use the current AioContext instead of
  bdrv_get_aio_context().

Switch blk_aio_*() APIs over to multi-queue by using
qemu_get_current_aio_context() instead of blk_get_aio_context(). This change
will allow devices to process I/O in multiple IOThreads in the future.

The final patch requires my QIOChannel AioContext series to pass
tests/qemu-iotests/check -qcow2 281 because the nbd block driver is now
accessed from the main loop thread in addition to the IOThread:
https://lore.kernel.org/qemu-devel/20230823234504.1387239-1-stefa...@redhat.com/T/#t

Based-on: 20230823234504.1387239-1-stefa...@redhat.com

Stefan Hajnoczi (4):
  block: remove AIOCBInfo->get_aio_context()
  block-backend: process I/O in the current AioContext
  block-backend: process zoned requests in the current AioContext
  block-coroutine-wrapper: use qemu_get_current_aio_context()

 include/block/aio.h|  1 -
 include/block/block-global-state.h |  2 ++
 include/block/block-io.h   |  1 -
 block/block-backend.c  | 35 --
 block/io.c | 23 +++-
 hw/nvme/ctrl.c |  7 --
 softmmu/dma-helpers.c  |  8 ---
 util/thread-pool.c |  8 ---
 scripts/block-coroutine-wrapper.py |  6 ++---
 9 files changed, 21 insertions(+), 70 deletions(-)

-- 
2.41.0




[PATCH v2 3/4] block-backend: process zoned requests in the current AioContext

2023-08-23 Thread Stefan Hajnoczi
Process zoned requests in the current thread's AioContext instead of in
the BlockBackend's AioContext.

There is no need to use the BlockBackend's AioContext thanks to CoMutex
bs->wps->colock, which protects zone metadata.

Signed-off-by: Stefan Hajnoczi 
---
 block/block-backend.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index 4863be5691..427ebcc0e4 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1890,11 +1890,11 @@ BlockAIOCB *blk_aio_zone_report(BlockBackend *blk, 
int64_t offset,
 acb->has_returned = false;
 
 co = qemu_coroutine_create(blk_aio_zone_report_entry, acb);
-aio_co_enter(blk_get_aio_context(blk), co);
+aio_co_enter(qemu_get_current_aio_context(), co);
 
 acb->has_returned = true;
 if (acb->rwco.ret != NOT_DONE) {
-replay_bh_schedule_oneshot_event(blk_get_aio_context(blk),
+replay_bh_schedule_oneshot_event(qemu_get_current_aio_context(),
  blk_aio_complete_bh, acb);
 }
 
@@ -1931,11 +1931,11 @@ BlockAIOCB *blk_aio_zone_mgmt(BlockBackend *blk, 
BlockZoneOp op,
 acb->has_returned = false;
 
 co = qemu_coroutine_create(blk_aio_zone_mgmt_entry, acb);
-aio_co_enter(blk_get_aio_context(blk), co);
+aio_co_enter(qemu_get_current_aio_context(), co);
 
 acb->has_returned = true;
 if (acb->rwco.ret != NOT_DONE) {
-replay_bh_schedule_oneshot_event(blk_get_aio_context(blk),
+replay_bh_schedule_oneshot_event(qemu_get_current_aio_context(),
  blk_aio_complete_bh, acb);
 }
 
@@ -1971,10 +1971,10 @@ BlockAIOCB *blk_aio_zone_append(BlockBackend *blk, 
int64_t *offset,
 acb->has_returned = false;
 
 co = qemu_coroutine_create(blk_aio_zone_append_entry, acb);
-aio_co_enter(blk_get_aio_context(blk), co);
+aio_co_enter(qemu_get_current_aio_context(), co);
 acb->has_returned = true;
 if (acb->rwco.ret != NOT_DONE) {
-replay_bh_schedule_oneshot_event(blk_get_aio_context(blk),
+replay_bh_schedule_oneshot_event(qemu_get_current_aio_context(),
  blk_aio_complete_bh, acb);
 }
 
-- 
2.41.0




[PATCH v2 2/4] block-backend: process I/O in the current AioContext

2023-08-23 Thread Stefan Hajnoczi
Switch blk_aio_*() APIs over to multi-queue by using
qemu_get_current_aio_context() instead of blk_get_aio_context(). This
change will allow devices to process I/O in multiple IOThreads in the
future.

I audited existing blk_aio_*() callers:
- migration/block.c: blk_mig_lock() protects the data accessed by the
  completion callback.
- The remaining emulated devices and exports run with
  qemu_get_aio_context() == blk_get_aio_context().

Signed-off-by: Stefan Hajnoczi 
---
 block/block-backend.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index a77295a198..4863be5691 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1530,7 +1530,7 @@ BlockAIOCB *blk_abort_aio_request(BlockBackend *blk,
 acb->blk = blk;
 acb->ret = ret;
 
-replay_bh_schedule_oneshot_event(blk_get_aio_context(blk),
+replay_bh_schedule_oneshot_event(qemu_get_current_aio_context(),
  error_callback_bh, acb);
 return >common;
 }
@@ -1584,11 +1584,11 @@ static BlockAIOCB *blk_aio_prwv(BlockBackend *blk, 
int64_t offset,
 acb->has_returned = false;
 
 co = qemu_coroutine_create(co_entry, acb);
-aio_co_enter(blk_get_aio_context(blk), co);
+aio_co_enter(qemu_get_current_aio_context(), co);
 
 acb->has_returned = true;
 if (acb->rwco.ret != NOT_DONE) {
-replay_bh_schedule_oneshot_event(blk_get_aio_context(blk),
+replay_bh_schedule_oneshot_event(qemu_get_current_aio_context(),
  blk_aio_complete_bh, acb);
 }
 
-- 
2.41.0




[PATCH 2/2] io: follow coroutine AioContext in qio_channel_yield()

2023-08-23 Thread Stefan Hajnoczi
The ongoing QEMU multi-queue block layer effort makes it possible for multiple
threads to process I/O in parallel. The nbd block driver is not compatible with
the multi-queue block layer yet because QIOChannel cannot be used easily from
coroutines running in multiple threads. This series changes the QIOChannel API
to make that possible.

In the current API, calling qio_channel_attach_aio_context() sets the
AioContext where qio_channel_yield() installs an fd handler prior to yielding:

  qio_channel_attach_aio_context(ioc, my_ctx);
  ...
  qio_channel_yield(ioc); // my_ctx is used here
  ...
  qio_channel_detach_aio_context(ioc);

This API design has limitations: reading and writing must be done in the same
AioContext and moving between AioContexts involves a cumbersome sequence of API
calls that is not suitable for doing on a per-request basis.

There is no fundamental reason why a QIOChannel needs to run within the
same AioContext every time qio_channel_yield() is called. QIOChannel
only uses the AioContext while inside qio_channel_yield(). The rest of
the time, QIOChannel is independent of any AioContext.

In the new API, qio_channel_yield() queries the AioContext from the current
coroutine using qemu_coroutine_get_aio_context(). There is no need to
explicitly attach/detach AioContexts anymore and
qio_channel_attach_aio_context() and qio_channel_detach_aio_context() are gone.
One coroutine can read from the QIOChannel while another coroutine writes from
a different AioContext.

This API change allows the nbd block driver to use QIOChannel from any thread.
It's important to keep in mind that the block driver already synchronizes
QIOChannel access and ensures that two coroutines never read simultaneously or
write simultaneously.
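
In code, the difference for a typical caller looks roughly like this (a
sketch based on the description above, not taken from a converted user):

/* Old API: reads and writes pinned to one AioContext. */
qio_channel_attach_aio_context(ioc, ctx);
qio_channel_yield(ioc, G_IO_IN);        /* fd handler installed in ctx */
qio_channel_detach_aio_context(ioc);

/* New API: opt in once, then yield from any coroutine in any thread. */
qio_channel_set_follow_coroutine_ctx(ioc, true);
qio_channel_yield(ioc, G_IO_IN);        /* uses the current coroutine's ctx */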

This patch updates all users of qio_channel_attach_aio_context() to the
new API. Most conversions are simple, but vhost-user-server requires a
new qemu_coroutine_yield() call to quiesce the vu_client_trip()
coroutine when not attached to any AioContext.

While the API has become simpler, there is one wart: QIOChannel has a
special case for the iohandler AioContext (used for handlers that must not run
in nested event loops). I didn't find an elegant way to preserve that behavior, so
I added a new API called qio_channel_set_follow_coroutine_ctx(ioc, true|false)
for opting in to the new AioContext model. By default QIOChannel uses the
iohandler AioContext. Code that formerly called
qio_channel_attach_aio_context() now calls
qio_channel_set_follow_coroutine_ctx(ioc, true) once after the QIOChannel is
created.

Signed-off-by: Stefan Hajnoczi 
---
 include/io/channel.h |  34 +++--
 include/qemu/vhost-user-server.h |   1 +
 block/nbd.c  |  11 +--
 io/channel-command.c |  13 +++-
 io/channel-file.c|  18 -
 io/channel-null.c|   3 +-
 io/channel-socket.c  |  18 -
 io/channel-tls.c |   6 +-
 io/channel.c | 120 ++-
 migration/channel-block.c|   3 +-
 nbd/client.c |   2 +-
 nbd/server.c |  14 +---
 scsi/qemu-pr-helper.c|   4 +-
 util/vhost-user-server.c |  27 +--
 14 files changed, 191 insertions(+), 83 deletions(-)

diff --git a/include/io/channel.h b/include/io/channel.h
index 229bf36910..dfbe6f2931 100644
--- a/include/io/channel.h
+++ b/include/io/channel.h
@@ -81,9 +81,11 @@ struct QIOChannel {
 Object parent;
 unsigned int features; /* bitmask of QIOChannelFeatures */
 char *name;
-AioContext *ctx;
+AioContext *read_ctx;
 Coroutine *read_coroutine;
+AioContext *write_ctx;
 Coroutine *write_coroutine;
+bool follow_coroutine_ctx;
 #ifdef _WIN32
 HANDLE event; /* For use with GSource on Win32 */
 #endif
@@ -140,8 +142,9 @@ struct QIOChannelClass {
  int whence,
  Error **errp);
 void (*io_set_aio_fd_handler)(QIOChannel *ioc,
-  AioContext *ctx,
+  AioContext *read_ctx,
   IOHandler *io_read,
+  AioContext *write_ctx,
   IOHandler *io_write,
   void *opaque);
 int (*io_flush)(QIOChannel *ioc,
@@ -498,6 +501,21 @@ int qio_channel_set_blocking(QIOChannel *ioc,
  bool enabled,
  Error **errp);
 
+/**
+ * qio_channel_set_follow_coroutine_ctx:
+ * @ioc: the channel object
+ * @enabled: whether or not to follow the coroutine's AioContext
+ *
+ * If @enabled is true, calls to qio_channel_yield() use the current
+ * coroutine's AioContext. Usually this is desirable.
+ *
+ * If @enabled is false, calls to qio_channel_yield() use the global iohandler
+ * AioContext. This may be used by coroutines that run in the main loop and
+ * do not 

[PATCH 0/2] io: follow coroutine AioContext in qio_channel_yield()

2023-08-23 Thread Stefan Hajnoczi
The ongoing QEMU multi-queue block layer effort makes it possible for multiple
threads to process I/O in parallel. The nbd block driver is not compatible with
the multi-queue block layer yet because QIOChannel cannot be used easily from
coroutines running in multiple threads. This series changes the QIOChannel API
to make that possible.

Stefan Hajnoczi (2):
  io: check there are no qio_channel_yield() coroutines during
->finalize()
  io: follow coroutine AioContext in qio_channel_yield()

 include/io/channel.h |  34 -
 include/qemu/vhost-user-server.h |   1 +
 block/nbd.c  |  11 +--
 io/channel-command.c |  13 +++-
 io/channel-file.c|  18 -
 io/channel-null.c|   3 +-
 io/channel-socket.c  |  18 -
 io/channel-tls.c |   6 +-
 io/channel.c | 124 ++-
 migration/channel-block.c|   3 +-
 nbd/client.c |   2 +-
 nbd/server.c |  14 +---
 scsi/qemu-pr-helper.c|   4 +-
 util/vhost-user-server.c |  27 +--
 14 files changed, 195 insertions(+), 83 deletions(-)

-- 
2.41.0




[PATCH 1/2] io: check there are no qio_channel_yield() coroutines during ->finalize()

2023-08-23 Thread Stefan Hajnoczi
Callers must clean up their coroutines before calling
object_unref(OBJECT(ioc)) to prevent an fd handler leak. Add an
assertion to check this.

This patch is preparation for the fd handler changes that follow.

Signed-off-by: Stefan Hajnoczi 
---
 io/channel.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/io/channel.c b/io/channel.c
index 72f0066af5..c415f3fc88 100644
--- a/io/channel.c
+++ b/io/channel.c
@@ -653,6 +653,10 @@ static void qio_channel_finalize(Object *obj)
 {
 QIOChannel *ioc = QIO_CHANNEL(obj);
 
+/* Must not have coroutines in qio_channel_yield() */
+assert(!ioc->read_coroutine);
+assert(!ioc->write_coroutine);
+
 g_free(ioc->name);
 
 #ifdef _WIN32
-- 
2.41.0




Re: [PULL 00/12] First batch of s390x patches for QEMU 8.2

2023-08-23 Thread Stefan Hajnoczi
On Wed, Aug 23, 2023 at 01:45:32PM +0200, Thomas Huth wrote:
> The following changes since commit b0dd9a7d6dd15a6898e9c585b521e6bec79b25aa:
> 
>   Open 8.2 development tree (2023-08-22 07:14:07 -0700)
> 
> are available in the Git repository at:
> 
>   https://gitlab.com/thuth/qemu.git tags/pull-request-2023-08-23
> 
> for you to fetch changes up to 6c49f685d30ffe81cfa47da2c258904ad28ac368:
> 
>   tests/tcg/s390x: Test VSTRS (2023-08-23 12:07:30 +0200)

Hi Thomas,
Please take a look at the following ubuntu-20.04-s390x-all CI failure:
https://gitlab.com/qemu-project/qemu/-/jobs/4931341536

Stefan




Re: [PATCH v2 0/8] tcg: Document *swap* helper implementations

2023-08-23 Thread Philippe Mathieu-Daudé

On 23/8/23 20:46, Richard Henderson wrote:

On 8/23/23 07:55, Philippe Mathieu-Daudé wrote:

Philippe Mathieu-Daudé (8):
   tcg/tcg-op: Document bswap16_i32() byte pattern
   tcg/tcg-op: Document bswap16_i64() byte pattern
   tcg/tcg-op: Document bswap32_i32() byte pattern
   tcg/tcg-op: Document bswap32_i64() byte pattern
   tcg/tcg-op: Document bswap64_i64() byte pattern
   tcg/tcg-op: Document hswap_i32/64() byte pattern
   tcg/tcg-op: Document wswap_i64() byte pattern
   target/cris: Fix a typo in gen_swapr()


Reviewed-by: Richard Henderson 

and queued to tcg-next with a few tweaks.


Thanks!




Re: [PATCH v2 1/4] softmmu: Support concurrent bounce buffers

2023-08-23 Thread Peter Xu
On Wed, Aug 23, 2023 at 10:08:08PM +0200, Mattias Nissler wrote:
> Peter, thanks for taking a look and providing feedback!
> 
> On Wed, Aug 23, 2023 at 7:35 PM Peter Xu  wrote:
> >
> > On Wed, Aug 23, 2023 at 02:29:02AM -0700, Mattias Nissler wrote:
> > > When DMA memory can't be directly accessed, as is the case when
> > > running the device model in a separate process without shareable DMA
> > > file descriptors, bounce buffering is used.
> > >
> > > It is not uncommon for device models to request mapping of several DMA
> > > regions at the same time. Examples include:
> > >  * net devices, e.g. when transmitting a packet that is split across
> > >several TX descriptors (observed with igb)
> > >  * USB host controllers, when handling a packet with multiple data TRBs
> > >(observed with xhci)
> > >
> > > Previously, qemu only provided a single bounce buffer and would fail DMA
> > > map requests while the buffer was already in use. In turn, this would
> > > cause DMA failures that ultimately manifest as hardware errors from the
> > > guest perspective.
> > >
> > > This change allocates DMA bounce buffers dynamically instead of
> > > supporting only a single buffer. Thus, multiple DMA mappings work
> > > correctly also when RAM can't be mmap()-ed.
> > >
> > > The total bounce buffer allocation size is limited by a new command line
> > > parameter. The default is 4096 bytes to match the previous maximum
> > > buffer size. It is expected that suitable limits will vary quite a bit
> > > in practice depending on device models and workloads.
> > >
> > > Signed-off-by: Mattias Nissler 
> > > ---
> > >  include/sysemu/sysemu.h |  2 +
> > >  qemu-options.hx | 27 +
> > >  softmmu/globals.c   |  1 +
> > >  softmmu/physmem.c   | 84 +++--
> > >  softmmu/vl.c|  6 +++
> > >  5 files changed, 83 insertions(+), 37 deletions(-)
> > >
> > > diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
> > > index 25be2a692e..c5dc93cb53 100644
> > > --- a/include/sysemu/sysemu.h
> > > +++ b/include/sysemu/sysemu.h
> > > @@ -61,6 +61,8 @@ extern int nb_option_roms;
> > >  extern const char *prom_envs[MAX_PROM_ENVS];
> > >  extern unsigned int nb_prom_envs;
> > >
> > > +extern uint64_t max_bounce_buffer_size;
> > > +
> > >  /* serial ports */
> > >
> > >  /* Return the Chardev for serial port i, or NULL if none */
> > > diff --git a/qemu-options.hx b/qemu-options.hx
> > > index 29b98c3d4c..6071794237 100644
> > > --- a/qemu-options.hx
> > > +++ b/qemu-options.hx
> > > @@ -4959,6 +4959,33 @@ SRST
> > >  ERST
> > >  #endif
> > >
> > > +DEF("max-bounce-buffer-size", HAS_ARG,
> > > +QEMU_OPTION_max_bounce_buffer_size,
> > > +"-max-bounce-buffer-size size\n"
> > > +"DMA bounce buffer size limit in bytes 
> > > (default=4096)\n",
> > > +QEMU_ARCH_ALL)
> > > +SRST
> > > +``-max-bounce-buffer-size size``
> > > +Set the limit in bytes for DMA bounce buffer allocations.
> > > +
> > > +DMA bounce buffers are used when device models request memory-mapped 
> > > access
> > > +to memory regions that can't be directly mapped by the qemu process, 
> > > so the
> > > +memory must be read or written via a temporary local buffer for the 
> > > device
> > > +model to work with. This is the case e.g. for I/O memory regions, 
> > > and when
> > > +running in multi-process mode without shared access to memory.
> > > +
> > > +Whether bounce buffering is necessary depends heavily on the device 
> > > model
> > > +implementation. Some devices use explicit DMA read and write 
> > > operations
> > > +which do not require bounce buffers. Some devices, notably storage, 
> > > will
> > > +retry a failed DMA map request after bounce buffer space becomes 
> > > available
> > > +again. Most other devices will bail when encountering map request 
> > > failures,
> > > +which will typically appear to the guest as a hardware error.
> > > +
> > > +Suitable bounce buffer size values depend on the workload and guest
> > > +configuration. A few kilobytes up to a few megabytes are common sizes
> > > +encountered in practice.
> >
> > Does it mean that the default 4K size can still easily fail with some
> > device setup?
> 
> Yes. The thing is that the respective device setup is pretty exotic,
> at least the only setup I'm aware of is multi-process with direct RAM
> access via shared file descriptors from the device process disabled
> (which hurts performance, so few people will run this setup). In
> theory, DMA to an I/O region of some sort would also run into the
> issue even in single process mode, but I'm not aware of such a
> situation. Looking at it from a historic perspective, note that the
> single-bounce-buffer restriction has been present for a decade, and
> thus the issue has been present for years (since multi-process is a
> thing), without it hurting anyone enough to get fixed. But 
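
A standalone model of the size-capped allocation scheme under discussion
(illustrative only -- the counter name and retry policy are assumptions, not
the patch itself):

#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

static _Atomic uint64_t bounce_in_use;
static uint64_t bounce_limit = 4096;    /* -max-bounce-buffer-size default */

static void *bounce_alloc(size_t len)
{
    /* Reserve first; fetch_add returns the usage before this request. */
    uint64_t used = atomic_fetch_add(&bounce_in_use, len);
    if (used + len > bounce_limit) {
        atomic_fetch_sub(&bounce_in_use, len);  /* over budget: undo, fail */
        return NULL;                            /* caller may retry later */
    }
    return malloc(len);
}

static void bounce_free(void *buf, size_t len)
{
    free(buf);
    atomic_fetch_sub(&bounce_in_use, len);      /* release the reservation */
}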

[PULL 01/48] accel/kvm: Widen pc/saved_insn for kvm_sw_breakpoint

2023-08-23 Thread Richard Henderson
From: Anton Johansson via 

Widens the pc and saved_insn fields of kvm_sw_breakpoint from
target_ulong to vaddr. The pc argument of kvm_find_sw_breakpoint is also
widened to match.

Signed-off-by: Anton Johansson 
Reviewed-by: Richard Henderson 
Message-Id: <20230807155706.9580-2-a...@rev.ng>
Signed-off-by: Richard Henderson 
---
 include/sysemu/kvm.h | 6 +++---
 accel/kvm/kvm-all.c  | 3 +--
 2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index 115f0cca79..5670306dbf 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -411,14 +411,14 @@ struct kvm_guest_debug;
 struct kvm_debug_exit_arch;
 
 struct kvm_sw_breakpoint {
-target_ulong pc;
-target_ulong saved_insn;
+vaddr pc;
+vaddr saved_insn;
 int use_count;
 QTAILQ_ENTRY(kvm_sw_breakpoint) entry;
 };
 
 struct kvm_sw_breakpoint *kvm_find_sw_breakpoint(CPUState *cpu,
- target_ulong pc);
+ vaddr pc);
 
 int kvm_sw_breakpoints_active(CPUState *cpu);
 
diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 7b3da8dc3a..76a6d91d15 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -3306,8 +3306,7 @@ bool kvm_arm_supports_user_irq(void)
 }
 
 #ifdef KVM_CAP_SET_GUEST_DEBUG
-struct kvm_sw_breakpoint *kvm_find_sw_breakpoint(CPUState *cpu,
- target_ulong pc)
+struct kvm_sw_breakpoint *kvm_find_sw_breakpoint(CPUState *cpu, vaddr pc)
 {
 struct kvm_sw_breakpoint *bp;
 
-- 
2.34.1




[PULL 32/48] tcg/i386: Merge tcg_out_brcond{32,64}

2023-08-23 Thread Richard Henderson
Pass a rexw parameter instead of duplicating the functions.

Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 tcg/i386/tcg-target.c.inc | 110 +-
 1 file changed, 49 insertions(+), 61 deletions(-)

diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc
index 3045b56002..33f66ba204 100644
--- a/tcg/i386/tcg-target.c.inc
+++ b/tcg/i386/tcg-target.c.inc
@@ -1436,99 +1436,89 @@ static void tcg_out_cmp(TCGContext *s, TCGArg arg1, 
TCGArg arg2,
 }
 }
 
-static void tcg_out_brcond32(TCGContext *s, TCGCond cond,
- TCGArg arg1, TCGArg arg2, int const_arg2,
- TCGLabel *label, int small)
+static void tcg_out_brcond(TCGContext *s, int rexw, TCGCond cond,
+   TCGArg arg1, TCGArg arg2, int const_arg2,
+   TCGLabel *label, bool small)
 {
-tcg_out_cmp(s, arg1, arg2, const_arg2, 0);
+tcg_out_cmp(s, arg1, arg2, const_arg2, rexw);
 tcg_out_jxx(s, tcg_cond_to_jcc[cond], label, small);
 }
 
-#if TCG_TARGET_REG_BITS == 64
-static void tcg_out_brcond64(TCGContext *s, TCGCond cond,
- TCGArg arg1, TCGArg arg2, int const_arg2,
- TCGLabel *label, int small)
-{
-tcg_out_cmp(s, arg1, arg2, const_arg2, P_REXW);
-tcg_out_jxx(s, tcg_cond_to_jcc[cond], label, small);
-}
-#else
-/* XXX: we implement it at the target level to avoid having to
-   handle cross basic blocks temporaries */
+#if TCG_TARGET_REG_BITS == 32
 static void tcg_out_brcond2(TCGContext *s, const TCGArg *args,
-const int *const_args, int small)
+const int *const_args, bool small)
 {
 TCGLabel *label_next = gen_new_label();
 TCGLabel *label_this = arg_label(args[5]);
 
 switch(args[4]) {
 case TCG_COND_EQ:
-tcg_out_brcond32(s, TCG_COND_NE, args[0], args[2], const_args[2],
- label_next, 1);
-tcg_out_brcond32(s, TCG_COND_EQ, args[1], args[3], const_args[3],
- label_this, small);
+tcg_out_brcond(s, 0, TCG_COND_NE, args[0], args[2], const_args[2],
+   label_next, 1);
+tcg_out_brcond(s, 0, TCG_COND_EQ, args[1], args[3], const_args[3],
+   label_this, small);
 break;
 case TCG_COND_NE:
-tcg_out_brcond32(s, TCG_COND_NE, args[0], args[2], const_args[2],
- label_this, small);
-tcg_out_brcond32(s, TCG_COND_NE, args[1], args[3], const_args[3],
- label_this, small);
+tcg_out_brcond(s, 0, TCG_COND_NE, args[0], args[2], const_args[2],
+   label_this, small);
+tcg_out_brcond(s, 0, TCG_COND_NE, args[1], args[3], const_args[3],
+   label_this, small);
 break;
 case TCG_COND_LT:
-tcg_out_brcond32(s, TCG_COND_LT, args[1], args[3], const_args[3],
- label_this, small);
+tcg_out_brcond(s, 0, TCG_COND_LT, args[1], args[3], const_args[3],
+   label_this, small);
 tcg_out_jxx(s, JCC_JNE, label_next, 1);
-tcg_out_brcond32(s, TCG_COND_LTU, args[0], args[2], const_args[2],
- label_this, small);
+tcg_out_brcond(s, 0, TCG_COND_LTU, args[0], args[2], const_args[2],
+   label_this, small);
 break;
 case TCG_COND_LE:
-tcg_out_brcond32(s, TCG_COND_LT, args[1], args[3], const_args[3],
- label_this, small);
+tcg_out_brcond(s, 0, TCG_COND_LT, args[1], args[3], const_args[3],
+   label_this, small);
 tcg_out_jxx(s, JCC_JNE, label_next, 1);
-tcg_out_brcond32(s, TCG_COND_LEU, args[0], args[2], const_args[2],
- label_this, small);
+tcg_out_brcond(s, 0, TCG_COND_LEU, args[0], args[2], const_args[2],
+   label_this, small);
 break;
 case TCG_COND_GT:
-tcg_out_brcond32(s, TCG_COND_GT, args[1], args[3], const_args[3],
- label_this, small);
+tcg_out_brcond(s, 0, TCG_COND_GT, args[1], args[3], const_args[3],
+   label_this, small);
 tcg_out_jxx(s, JCC_JNE, label_next, 1);
-tcg_out_brcond32(s, TCG_COND_GTU, args[0], args[2], const_args[2],
- label_this, small);
+tcg_out_brcond(s, 0, TCG_COND_GTU, args[0], args[2], const_args[2],
+   label_this, small);
 break;
 case TCG_COND_GE:
-tcg_out_brcond32(s, TCG_COND_GT, args[1], args[3], const_args[3],
- label_this, small);
+tcg_out_brcond(s, 0, TCG_COND_GT, args[1], args[3], const_args[3],
+   label_this, small);
 tcg_out_jxx(s, JCC_JNE, label_next, 1);
-tcg_out_brcond32(s, 

[PULL 47/48] docs/devel/tcg-ops: fix missing newlines in "Host vector operations"

2023-08-23 Thread Richard Henderson
From: Mark Cave-Ayland 

This unintentionally causes the mov_vec, ld_vec and st_vec operations
to appear on the same line.

Signed-off-by: Mark Cave-Ayland 
Reviewed-by: Philippe Mathieu-Daudé 
Message-Id: <20230823141740.35974-1-mark.cave-ayl...@ilande.co.uk>
Signed-off-by: Richard Henderson 
---
 docs/devel/tcg-ops.rst | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/docs/devel/tcg-ops.rst b/docs/devel/tcg-ops.rst
index 9e2a931d85..8ae59ea02b 100644
--- a/docs/devel/tcg-ops.rst
+++ b/docs/devel/tcg-ops.rst
@@ -718,7 +718,9 @@ E.g. VECL = 1 -> 64 << 1 -> v128, and VECE = 2 -> 1 << 2 -> 
i32.
 .. list-table::
 
* - mov_vec *v0*, *v1*
+
ld_vec *v0*, *t1*
+
st_vec *v0*, *t1*
 
  - | Move, load and store.
-- 
2.34.1




[PULL 48/48] tcg: spelling fixes

2023-08-23 Thread Richard Henderson
From: Michael Tokarev 

Acked-by: Alex Bennée 
Signed-off-by: Michael Tokarev 
Message-Id: <20230823065335.1919380-4-...@tls.msk.ru>
Signed-off-by: Richard Henderson 
---
 tcg/aarch64/tcg-target.c.inc |  2 +-
 tcg/arm/tcg-target.c.inc | 10 ++
 tcg/riscv/tcg-target.c.inc   |  4 ++--
 3 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/tcg/aarch64/tcg-target.c.inc b/tcg/aarch64/tcg-target.c.inc
index 7d8d114c9e..0931a69448 100644
--- a/tcg/aarch64/tcg-target.c.inc
+++ b/tcg/aarch64/tcg-target.c.inc
@@ -3098,7 +3098,7 @@ static void tcg_target_qemu_prologue(TCGContext *s)
 #if !defined(CONFIG_SOFTMMU)
 /*
  * Note that XZR cannot be encoded in the address base register slot,
- * as that actaully encodes SP.  Depending on the guest, we may need
+ * as that actually encodes SP.  Depending on the guest, we may need
  * to zero-extend the guest address via the address index register slot,
  * therefore we need to load even a zero guest base into a register.
  */
diff --git a/tcg/arm/tcg-target.c.inc b/tcg/arm/tcg-target.c.inc
index 162df38c73..acb5f23b54 100644
--- a/tcg/arm/tcg-target.c.inc
+++ b/tcg/arm/tcg-target.c.inc
@@ -1216,9 +1216,11 @@ static TCGCond tcg_out_cmp2(TCGContext *s, const TCGArg 
*args,
 case TCG_COND_LEU:
 case TCG_COND_GTU:
 case TCG_COND_GEU:
-/* We perform a conditional comparision.  If the high half is
-   equal, then overwrite the flags with the comparison of the
-   low half.  The resulting flags cover the whole.  */
+/*
+ * We perform a conditional comparison.  If the high half is
+ * equal, then overwrite the flags with the comparison of the
+ * low half.  The resulting flags cover the whole.
+ */
 tcg_out_dat_rI(s, COND_AL, ARITH_CMP, 0, ah, bh, const_bh);
 tcg_out_dat_rI(s, COND_EQ, ARITH_CMP, 0, al, bl, const_bl);
 return cond;
@@ -1250,7 +1252,7 @@ static TCGCond tcg_out_cmp2(TCGContext *s, const TCGArg 
*args,
 
 /*
  * Note that TCGReg references Q-registers.
- * Q-regno = 2 * D-regno, so shift left by 1 whlie inserting.
+ * Q-regno = 2 * D-regno, so shift left by 1 while inserting.
  */
 static uint32_t encode_vd(TCGReg rd)
 {
diff --git a/tcg/riscv/tcg-target.c.inc b/tcg/riscv/tcg-target.c.inc
index 232b616af3..9be81c1b7b 100644
--- a/tcg/riscv/tcg-target.c.inc
+++ b/tcg/riscv/tcg-target.c.inc
@@ -69,7 +69,7 @@ static const char * const 
tcg_target_reg_names[TCG_TARGET_NB_REGS] = {
 
 static const int tcg_target_reg_alloc_order[] = {
 /* Call saved registers */
-/* TCG_REG_S0 reservered for TCG_AREG0 */
+/* TCG_REG_S0 reserved for TCG_AREG0 */
 TCG_REG_S1,
 TCG_REG_S2,
 TCG_REG_S3,
@@ -260,7 +260,7 @@ typedef enum {
 /* Zba: Bit manipulation extension, address generation */
 OPC_ADD_UW = 0x083b,
 
-/* Zbb: Bit manipulation extension, basic bit manipulaton */
+/* Zbb: Bit manipulation extension, basic bit manipulation */
 OPC_ANDN   = 0x40007033,
 OPC_CLZ= 0x60001013,
 OPC_CLZW   = 0x6000101b,
-- 
2.34.1




[PULL 42/48] tcg/tcg-op: Document bswap32_i64() byte pattern

2023-08-23 Thread Richard Henderson
From: Philippe Mathieu-Daudé 

Signed-off-by: Philippe Mathieu-Daudé 
Message-Id: <20230823145542.79633-5-phi...@linaro.org>
Signed-off-by: Richard Henderson 
---
 tcg/tcg-op.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index ed0ab218a1..b56ae748b8 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -1822,6 +1822,14 @@ void tcg_gen_bswap16_i64(TCGv_i64 ret, TCGv_i64 arg, int 
flags)
 }
 }
 
+/*
+ * bswap32_i64: 32-bit byte swap on the low bits of a 64-bit value.
+ *
+ * Byte pattern: xxxxabcd -> yyyydcba
+ *
+ * With TCG_BSWAP_IZ, x == zero, else undefined.
+ * With TCG_BSWAP_OZ, y == zero, with TCG_BSWAP_OS y == sign, else undefined.
+ */
 void tcg_gen_bswap32_i64(TCGv_i64 ret, TCGv_i64 arg, int flags)
 {
 /* Only one extension flag may be present. */
@@ -1855,7 +1863,8 @@ void tcg_gen_bswap32_i64(TCGv_i64 ret, TCGv_i64 arg, int 
flags)
 } else {
 tcg_gen_shri_i64(t1, t1, 32);   /*  t1 = ....dc.. */
 }
-tcg_gen_or_i64(ret, t0, t1);/* ret = ssssdcba */
+tcg_gen_or_i64(ret, t0, t1);/* ret = ssssdcba (OS) */
+/*   ....dcba (else) */
 
 tcg_temp_free_i64(t0);
 tcg_temp_free_i64(t1);
-- 
2.34.1
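
The byte pattern above can be checked with host intrinsics; a standalone
check of the TCG_BSWAP_OZ case (not TCG code, __builtin_bswap32 stands in
for the op):

#include <assert.h>
#include <stdint.h>

int main(void)
{
    uint64_t arg = 0x1122334455667788ull;   /* xxxx = 11223344, abcd = 55667788 */
    uint64_t oz = __builtin_bswap32((uint32_t)arg); /* yyyy = 0, dcba = 88776655 */
    assert(oz == 0x0000000088776655ull);
    return 0;
}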




[PULL 45/48] tcg/tcg-op: Document wswap_i64() byte pattern

2023-08-23 Thread Richard Henderson
From: Philippe Mathieu-Daudé 

Document wswap_i64(), added in commit 46be8425ff
("tcg: Implement tcg_gen_{h,w}swap_{i32,i64}").

Reviewed-by: Richard Henderson 
Signed-off-by: Philippe Mathieu-Daudé 
Message-Id: <20230823145542.79633-8-phi...@linaro.org>
Signed-off-by: Richard Henderson 
---
 tcg/tcg-op.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index 58572526b7..02a8cadcc0 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -1950,6 +1950,11 @@ void tcg_gen_hswap_i64(TCGv_i64 ret, TCGv_i64 arg)
 tcg_temp_free_i64(t1);
 }
 
+/*
+ * wswap_i64: Swap 32-bit words within a 64-bit value.
+ *
+ * Byte pattern: abcdefgh -> efghabcd
+ */
 void tcg_gen_wswap_i64(TCGv_i64 ret, TCGv_i64 arg)
 {
 /* Swapping 2 32-bit elements is a rotate. */
-- 
2.34.1




[PULL 44/48] tcg/tcg-op: Document hswap_i32/64() byte pattern

2023-08-23 Thread Richard Henderson
From: Philippe Mathieu-Daudé 

Document hswap_i32() and hswap_i64(), added in commit
46be8425ff ("tcg: Implement tcg_gen_{h,w}swap_{i32,i64}").

Reviewed-by: Richard Henderson 
Signed-off-by: Philippe Mathieu-Daudé 
Message-Id: <20230823145542.79633-7-phi...@linaro.org>
Signed-off-by: Richard Henderson 
---
 tcg/tcg-op.c | 25 ++---
 1 file changed, 18 insertions(+), 7 deletions(-)

diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index 22c682c28e..58572526b7 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -1108,6 +1108,11 @@ void tcg_gen_bswap32_i32(TCGv_i32 ret, TCGv_i32 arg)
 }
 }
 
+/*
+ * hswap_i32: Swap 16-bit halfwords within a 32-bit value.
+ *
+ * Byte pattern: abcd -> cdab
+ */
 void tcg_gen_hswap_i32(TCGv_i32 ret, TCGv_i32 arg)
 {
 /* Swapping 2 16-bit elements is a rotate. */
@@ -1921,19 +1926,25 @@ void tcg_gen_bswap64_i64(TCGv_i64 ret, TCGv_i64 arg)
 }
 }
 
+/*
+ * hswap_i64: Swap 16-bit halfwords within a 64-bit value.
+ * See also include/qemu/bitops.h, hswap64.
+ *
+ * Byte pattern: abcdefgh -> ghefcdab
+ */
 void tcg_gen_hswap_i64(TCGv_i64 ret, TCGv_i64 arg)
 {
 uint64_t m = 0x0000ffff0000ffffull;
 TCGv_i64 t0 = tcg_temp_ebb_new_i64();
 TCGv_i64 t1 = tcg_temp_ebb_new_i64();
 
-/* See include/qemu/bitops.h, hswap64. */
-tcg_gen_rotli_i64(t1, arg, 32);
-tcg_gen_andi_i64(t0, t1, m);
-tcg_gen_shli_i64(t0, t0, 16);
-tcg_gen_shri_i64(t1, t1, 16);
-tcg_gen_andi_i64(t1, t1, m);
-tcg_gen_or_i64(ret, t0, t1);
+/* arg = abcdefgh */
+tcg_gen_rotli_i64(t1, arg, 32); /*  t1 = efghabcd */
+tcg_gen_andi_i64(t0, t1, m);/*  t0 = ..gh..cd */
+tcg_gen_shli_i64(t0, t0, 16);   /*  t0 = gh..cd.. */
+tcg_gen_shri_i64(t1, t1, 16);   /*  t1 = ..efghab */
+tcg_gen_andi_i64(t1, t1, m);/*  t1 = ..ef..ab */
+tcg_gen_or_i64(ret, t0, t1);/* ret = ghefcdab */
 
 tcg_temp_free_i64(t0);
 tcg_temp_free_i64(t1);
-- 
2.34.1
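
The rotate-and-mask trick reads nicely as plain C; a standalone check of the
documented byte pattern (abcdefgh -> ghefcdab), mirroring the sequence in
the patch:

#include <assert.h>
#include <stdint.h>

static uint64_t rotl64(uint64_t v, int n)
{
    return (v << n) | (v >> (64 - n));
}

static uint64_t hswap64(uint64_t v)
{
    const uint64_t m = 0x0000ffff0000ffffull;
    uint64_t t = rotl64(v, 32);     /*   t = efghabcd */
    return ((t & m) << 16)          /*       gh..cd.. */
         | ((t >> 16) & m);         /*     | ..ef..ab = ghefcdab */
}

int main(void)
{
    /* a=01 b=02 ... h=08; expect ghefcdab = 07 08 05 06 03 04 01 02. */
    assert(hswap64(0x0102030405060708ull) == 0x0708050603040102ull);
    return 0;
}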




[PULL 43/48] tcg/tcg-op: Document bswap64_i64() byte pattern

2023-08-23 Thread Richard Henderson
From: Philippe Mathieu-Daudé 

Signed-off-by: Philippe Mathieu-Daudé 
Reviewed-by: Richard Henderson 
Message-Id: <20230823145542.79633-6-phi...@linaro.org>
Signed-off-by: Richard Henderson 
---
 tcg/tcg-op.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index b56ae748b8..22c682c28e 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -1871,6 +1871,11 @@ void tcg_gen_bswap32_i64(TCGv_i64 ret, TCGv_i64 arg, int 
flags)
 }
 }
 
+/*
+ * bswap64_i64: 64-bit byte swap on a 64-bit value.
+ *
+ * Byte pattern: abcdefgh -> hgfedcba
+ */
 void tcg_gen_bswap64_i64(TCGv_i64 ret, TCGv_i64 arg)
 {
 if (TCG_TARGET_REG_BITS == 32) {
-- 
2.34.1




[PULL 41/48] tcg/tcg-op: Document bswap32_i32() byte pattern

2023-08-23 Thread Richard Henderson
From: Philippe Mathieu-Daudé 

Signed-off-by: Philippe Mathieu-Daudé 
Message-Id: <20230823145542.79633-4-phi...@linaro.org>
Signed-off-by: Richard Henderson 
---
 tcg/tcg-op.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index 88c7c60ffe..ed0ab218a1 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -1078,6 +1078,11 @@ void tcg_gen_bswap16_i32(TCGv_i32 ret, TCGv_i32 arg, int 
flags)
 }
 }
 
+/*
+ * bswap32_i32: 32-bit byte swap on a 32-bit value.
+ *
+ * Byte pattern: abcd -> dcba
+ */
 void tcg_gen_bswap32_i32(TCGv_i32 ret, TCGv_i32 arg)
 {
 if (TCG_TARGET_HAS_bswap32_i32) {
-- 
2.34.1




[PULL 19/48] target/arm: Use tcg_gen_negsetcond_*

2023-08-23 Thread Richard Henderson
Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/tcg/translate-a64.c | 22 +-
 target/arm/tcg/translate.c | 12 
 2 files changed, 13 insertions(+), 21 deletions(-)

diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 5fa1257d32..da686cc953 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -4935,9 +4935,12 @@ static void disas_cond_select(DisasContext *s, uint32_t 
insn)
 
 if (rn == 31 && rm == 31 && (else_inc ^ else_inv)) {
 /* CSET & CSETM.  */
-tcg_gen_setcond_i64(tcg_invert_cond(c.cond), tcg_rd, c.value, zero);
 if (else_inv) {
-tcg_gen_neg_i64(tcg_rd, tcg_rd);
+tcg_gen_negsetcond_i64(tcg_invert_cond(c.cond),
+   tcg_rd, c.value, zero);
+} else {
+tcg_gen_setcond_i64(tcg_invert_cond(c.cond),
+tcg_rd, c.value, zero);
 }
 } else {
 TCGv_i64 t_true = cpu_reg(s, rn);
@@ -8670,13 +8673,10 @@ static void handle_3same_64(DisasContext *s, int 
opcode, bool u,
 }
 break;
 case 0x6: /* CMGT, CMHI */
-/* 64 bit integer comparison, result = test ? (2^64 - 1) : 0.
- * We implement this using setcond (test) and then negating.
- */
 cond = u ? TCG_COND_GTU : TCG_COND_GT;
 do_cmop:
-tcg_gen_setcond_i64(cond, tcg_rd, tcg_rn, tcg_rm);
-tcg_gen_neg_i64(tcg_rd, tcg_rd);
+/* 64 bit integer comparison, result = test ? -1 : 0. */
+tcg_gen_negsetcond_i64(cond, tcg_rd, tcg_rn, tcg_rm);
 break;
 case 0x7: /* CMGE, CMHS */
 cond = u ? TCG_COND_GEU : TCG_COND_GE;
@@ -9265,14 +9265,10 @@ static void handle_2misc_64(DisasContext *s, int 
opcode, bool u,
 }
 break;
 case 0xa: /* CMLT */
-/* 64 bit integer comparison against zero, result is
- * test ? (2^64 - 1) : 0. We implement via setcond(!test) and
- * subtracting 1.
- */
 cond = TCG_COND_LT;
 do_cmop:
-tcg_gen_setcondi_i64(cond, tcg_rd, tcg_rn, 0);
-tcg_gen_neg_i64(tcg_rd, tcg_rd);
+/* 64 bit integer comparison against zero, result is test ? -1 : 0. */
+tcg_gen_negsetcond_i64(cond, tcg_rd, tcg_rn, tcg_constant_i64(0));
 break;
 case 0x8: /* CMGT, CMGE */
 cond = u ? TCG_COND_GE : TCG_COND_GT;
diff --git a/target/arm/tcg/translate.c b/target/arm/tcg/translate.c
index b71ac2d0d5..31d3130e4c 100644
--- a/target/arm/tcg/translate.c
+++ b/target/arm/tcg/translate.c
@@ -2946,13 +2946,11 @@ void gen_gvec_sqrdmlsh_qc(unsigned vece, uint32_t 
rd_ofs, uint32_t rn_ofs,
 #define GEN_CMP0(NAME, COND)\
 static void gen_##NAME##0_i32(TCGv_i32 d, TCGv_i32 a)   \
 {   \
-tcg_gen_setcondi_i32(COND, d, a, 0);\
-tcg_gen_neg_i32(d, d);  \
+tcg_gen_negsetcond_i32(COND, d, a, tcg_constant_i32(0));\
 }   \
 static void gen_##NAME##0_i64(TCGv_i64 d, TCGv_i64 a)   \
 {   \
-tcg_gen_setcondi_i64(COND, d, a, 0);\
-tcg_gen_neg_i64(d, d);  \
+tcg_gen_negsetcond_i64(COND, d, a, tcg_constant_i64(0));\
 }   \
 static void gen_##NAME##0_vec(unsigned vece, TCGv_vec d, TCGv_vec a) \
 {   \
@@ -3863,15 +3861,13 @@ void gen_gvec_mls(unsigned vece, uint32_t rd_ofs, 
uint32_t rn_ofs,
 static void gen_cmtst_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b)
 {
 tcg_gen_and_i32(d, a, b);
-tcg_gen_setcondi_i32(TCG_COND_NE, d, d, 0);
-tcg_gen_neg_i32(d, d);
+tcg_gen_negsetcond_i32(TCG_COND_NE, d, d, tcg_constant_i32(0));
 }
 
 void gen_cmtst_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b)
 {
 tcg_gen_and_i64(d, a, b);
-tcg_gen_setcondi_i64(TCG_COND_NE, d, d, 0);
-tcg_gen_neg_i64(d, d);
+tcg_gen_negsetcond_i64(TCG_COND_NE, d, d, tcg_constant_i64(0));
 }
 
 static void gen_cmtst_vec(unsigned vece, TCGv_vec d, TCGv_vec a, TCGv_vec b)
-- 
2.34.1




[PULL 38/48] tcg/i386: Implement negsetcond_*

2023-08-23 Thread Richard Henderson
Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 tcg/i386/tcg-target.h |  4 ++--
 tcg/i386/tcg-target.c.inc | 32 
 2 files changed, 26 insertions(+), 10 deletions(-)

diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index ebc0b1a6ce..8417ea4899 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -150,7 +150,7 @@ typedef enum {
 #define TCG_TARGET_HAS_sextract_i32 1
 #define TCG_TARGET_HAS_extract2_i32 1
 #define TCG_TARGET_HAS_movcond_i32  1
-#define TCG_TARGET_HAS_negsetcond_i32   0
+#define TCG_TARGET_HAS_negsetcond_i32   1
 #define TCG_TARGET_HAS_add2_i32 1
 #define TCG_TARGET_HAS_sub2_i32 1
 #define TCG_TARGET_HAS_mulu2_i32    1
@@ -187,7 +187,7 @@ typedef enum {
 #define TCG_TARGET_HAS_sextract_i64 0
 #define TCG_TARGET_HAS_extract2_i64 1
 #define TCG_TARGET_HAS_movcond_i64  1
-#define TCG_TARGET_HAS_negsetcond_i64   0
+#define TCG_TARGET_HAS_negsetcond_i64   1
 #define TCG_TARGET_HAS_add2_i64 1
 #define TCG_TARGET_HAS_sub2_i64 1
 #define TCG_TARGET_HAS_mulu2_i64    1
diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc
index 16e830051d..0c3d1e4cef 100644
--- a/tcg/i386/tcg-target.c.inc
+++ b/tcg/i386/tcg-target.c.inc
@@ -1529,7 +1529,7 @@ static void tcg_out_brcond2(TCGContext *s, const TCGArg 
*args,
 
 static void tcg_out_setcond(TCGContext *s, int rexw, TCGCond cond,
 TCGArg dest, TCGArg arg1, TCGArg arg2,
-int const_arg2)
+int const_arg2, bool neg)
 {
 bool inv = false;
 bool cleared;
@@ -1570,11 +1570,18 @@ static void tcg_out_setcond(TCGContext *s, int rexw, 
TCGCond cond,
  * This is always smaller than the SETCC expansion.
  */
 tcg_out_cmp(s, arg1, arg2, const_arg2, rexw);
-tgen_arithr(s, ARITH_SBB, dest, dest);  /* T:-1 F:0 */
-if (inv) {
-tgen_arithi(s, ARITH_ADD, dest, 1, 0);  /* T:0  F:1 */
-} else {
-tcg_out_modrm(s, OPC_GRP3_Ev, EXT3_NEG, dest);  /* T:1  F:0 */
+
+/* X - X - C = -C = (C ? -1 : 0) */
+tgen_arithr(s, ARITH_SBB + (neg ? rexw : 0), dest, dest);
+if (inv && neg) {
+/* ~(C ? -1 : 0) = (C ? 0 : -1) */
+tcg_out_modrm(s, OPC_GRP3_Ev + rexw, EXT3_NOT, dest);
+} else if (inv) {
+/* (C ? -1 : 0) + 1 = (C ? 0 : 1) */
+tgen_arithi(s, ARITH_ADD, dest, 1, 0);
+} else if (!neg) {
+/* -(C ? -1 : 0) = (C ? 1 : 0) */
+tcg_out_modrm(s, OPC_GRP3_Ev, EXT3_NEG, dest);
 }
 return;
 
@@ -1588,7 +1595,8 @@ static void tcg_out_setcond(TCGContext *s, int rexw, 
TCGCond cond,
 if (inv) {
 tcg_out_modrm(s, OPC_GRP3_Ev + rexw, EXT3_NOT, dest);
 }
-tcg_out_shifti(s, SHIFT_SHR + rexw, dest, rexw ? 63 : 31);
+tcg_out_shifti(s, (neg ? SHIFT_SAR : SHIFT_SHR) + rexw,
+   dest, rexw ? 63 : 31);
 return;
 }
 break;
@@ -1614,6 +1622,9 @@ static void tcg_out_setcond(TCGContext *s, int rexw, 
TCGCond cond,
 if (!cleared) {
 tcg_out_ext8u(s, dest, dest);
 }
+if (neg) {
+tcg_out_modrm(s, OPC_GRP3_Ev + rexw, EXT3_NEG, dest);
+}
 }
 
 #if TCG_TARGET_REG_BITS == 32
@@ -2632,7 +2643,10 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode 
opc,
arg_label(args[3]), 0);
 break;
 OP_32_64(setcond):
-tcg_out_setcond(s, rexw, args[3], a0, a1, a2, const_a2);
+tcg_out_setcond(s, rexw, args[3], a0, a1, a2, const_a2, false);
+break;
+OP_32_64(negsetcond):
+tcg_out_setcond(s, rexw, args[3], a0, a1, a2, const_a2, true);
 break;
 OP_32_64(movcond):
 tcg_out_movcond(s, rexw, args[5], a0, a1, a2, const_a2, args[3]);
@@ -3377,6 +3391,8 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode 
op)
 
 case INDEX_op_setcond_i32:
 case INDEX_op_setcond_i64:
+case INDEX_op_negsetcond_i32:
+case INDEX_op_negsetcond_i64:
 return C_O1_I2(q, r, re);
 
 case INDEX_op_movcond_i32:
-- 
2.34.1
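
The "X - X - C" identity in the comment can be modeled on the host; a
standalone sketch of what CMP + SBB computes for the unsigned less-than
case, with the carry flag made explicit:

#include <assert.h>
#include <stdint.h>

/* CMP arg1,arg2 sets CF = (arg1 < arg2) unsigned; SBB dest,dest gives -CF. */
static uint32_t negsetcond_ltu(uint32_t arg1, uint32_t arg2)
{
    uint32_t carry = arg1 < arg2;   /* 0 or 1 */
    return 0u - carry;              /* 0 or 0xffffffff */
}

int main(void)
{
    assert(negsetcond_ltu(1, 2) == 0xffffffffu);
    assert(negsetcond_ltu(2, 1) == 0);
    /* The inverted form adds 1: (C ? -1 : 0) + 1 == (C ? 0 : 1). */
    assert(negsetcond_ltu(1, 2) + 1 == 0);
    return 0;
}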




[PULL 46/48] target/cris: Fix a typo in gen_swapr()

2023-08-23 Thread Richard Henderson
From: Philippe Mathieu-Daudé 

Signed-off-by: Philippe Mathieu-Daudé 
Message-Id: <20230823145542.79633-9-phi...@linaro.org>
Signed-off-by: Richard Henderson 
---
 target/cris/translate.c | 20 +++-
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/target/cris/translate.c b/target/cris/translate.c
index 0b3d724281..42103b5558 100644
--- a/target/cris/translate.c
+++ b/target/cris/translate.c
@@ -411,15 +411,17 @@ static inline void t_gen_swapw(TCGv d, TCGv s)
 tcg_gen_or_tl(d, d, t);
 }
 
-/* Reverse the within each byte.
-   T0 = (((T0 << 7) & 0x80808080) |
-   ((T0 << 5) & 0x40404040) |
-   ((T0 << 3) & 0x20202020) |
-   ((T0 << 1) & 0x10101010) |
-   ((T0 >> 1) & 0x08080808) |
-   ((T0 >> 3) & 0x04040404) |
-   ((T0 >> 5) & 0x02020202) |
-   ((T0 >> 7) & 0x01010101));
+/*
+ * Reverse the bits within each byte.
+ *
+ *  T0 = ((T0 << 7) & 0x80808080)
+ * | ((T0 << 5) & 0x40404040)
+ * | ((T0 << 3) & 0x20202020)
+ * | ((T0 << 1) & 0x10101010)
+ * | ((T0 >> 1) & 0x08080808)
+ * | ((T0 >> 3) & 0x04040404)
+ * | ((T0 >> 5) & 0x02020202)
+ * | ((T0 >> 7) & 0x01010101);
  */
 static void t_gen_swapr(TCGv d, TCGv s)
 {
-- 
2.34.1




[PULL 17/48] tcg: Use tcg_gen_negsetcond_*

2023-08-23 Thread Richard Henderson
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 tcg/tcg-op-gvec.c | 6 ++
 tcg/tcg-op.c  | 6 ++
 2 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c
index a062239804..e260a07c61 100644
--- a/tcg/tcg-op-gvec.c
+++ b/tcg/tcg-op-gvec.c
@@ -3692,8 +3692,7 @@ static void expand_cmp_i32(uint32_t dofs, uint32_t aofs, 
uint32_t bofs,
 for (i = 0; i < oprsz; i += 4) {
 tcg_gen_ld_i32(t0, cpu_env, aofs + i);
 tcg_gen_ld_i32(t1, cpu_env, bofs + i);
-tcg_gen_setcond_i32(cond, t0, t0, t1);
-tcg_gen_neg_i32(t0, t0);
+tcg_gen_negsetcond_i32(cond, t0, t0, t1);
 tcg_gen_st_i32(t0, cpu_env, dofs + i);
 }
 tcg_temp_free_i32(t1);
@@ -3710,8 +3709,7 @@ static void expand_cmp_i64(uint32_t dofs, uint32_t aofs, 
uint32_t bofs,
 for (i = 0; i < oprsz; i += 8) {
 tcg_gen_ld_i64(t0, cpu_env, aofs + i);
 tcg_gen_ld_i64(t1, cpu_env, bofs + i);
-tcg_gen_setcond_i64(cond, t0, t0, t1);
-tcg_gen_neg_i64(t0, t0);
+tcg_gen_negsetcond_i64(cond, t0, t0, t1);
 tcg_gen_st_i64(t0, cpu_env, dofs + i);
 }
 tcg_temp_free_i64(t1);
diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index a954004cff..b59a50a5a9 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -863,8 +863,7 @@ void tcg_gen_movcond_i32(TCGCond cond, TCGv_i32 ret, 
TCGv_i32 c1,
 } else {
 TCGv_i32 t0 = tcg_temp_ebb_new_i32();
 TCGv_i32 t1 = tcg_temp_ebb_new_i32();
-tcg_gen_setcond_i32(cond, t0, c1, c2);
-tcg_gen_neg_i32(t0, t0);
+tcg_gen_negsetcond_i32(cond, t0, c1, c2);
 tcg_gen_and_i32(t1, v1, t0);
 tcg_gen_andc_i32(ret, v2, t0);
 tcg_gen_or_i32(ret, ret, t1);
@@ -2563,8 +2562,7 @@ void tcg_gen_movcond_i64(TCGCond cond, TCGv_i64 ret, 
TCGv_i64 c1,
 } else {
 TCGv_i64 t0 = tcg_temp_ebb_new_i64();
 TCGv_i64 t1 = tcg_temp_ebb_new_i64();
-tcg_gen_setcond_i64(cond, t0, c1, c2);
-tcg_gen_neg_i64(t0, t0);
+tcg_gen_negsetcond_i64(cond, t0, c1, c2);
 tcg_gen_and_i64(t1, v1, t0);
 tcg_gen_andc_i64(ret, v2, t0);
 tcg_gen_or_i64(ret, ret, t1);
-- 
2.34.1
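
The movcond fallback above is the classic branchless select; a standalone
model of the negsetcond + and/andc/or sequence:

#include <assert.h>
#include <stdint.h>

static uint32_t movcond32(int cond, uint32_t v1, uint32_t v2)
{
    uint32_t mask = -(uint32_t)(cond != 0); /* negsetcond: all-ones or zero */
    return (v1 & mask) | (v2 & ~mask);      /* and, andc, or */
}

int main(void)
{
    assert(movcond32(1, 0xaaaaaaaa, 0x55555555) == 0xaaaaaaaa);
    assert(movcond32(0, 0xaaaaaaaa, 0x55555555) == 0x55555555);
    return 0;
}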




[PULL 40/48] tcg/tcg-op: Document bswap16_i64() byte pattern

2023-08-23 Thread Richard Henderson
From: Philippe Mathieu-Daudé 

Signed-off-by: Philippe Mathieu-Daudé 
Message-Id: <20230823145542.79633-3-phi...@linaro.org>
Signed-off-by: Richard Henderson 
---
 tcg/tcg-op.c | 27 +++
 1 file changed, 19 insertions(+), 8 deletions(-)

diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index fc3a0ab7fc..88c7c60ffe 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -1767,6 +1767,14 @@ void tcg_gen_ext32u_i64(TCGv_i64 ret, TCGv_i64 arg)
 }
 }
 
+/*
+ * bswap16_i64: 16-bit byte swap on the low bits of a 64-bit value.
+ *
+ * Byte pattern: xxxxxxab -> yyyyyyba
+ *
+ * With TCG_BSWAP_IZ, x == zero, else undefined.
+ * With TCG_BSWAP_OZ, y == zero, with TCG_BSWAP_OS y == sign, else undefined.
+ */
 void tcg_gen_bswap16_i64(TCGv_i64 ret, TCGv_i64 arg, int flags)
 {
 /* Only one extension flag may be present. */
@@ -1785,22 +1793,25 @@ void tcg_gen_bswap16_i64(TCGv_i64 ret, TCGv_i64 arg, 
int flags)
 TCGv_i64 t0 = tcg_temp_ebb_new_i64();
 TCGv_i64 t1 = tcg_temp_ebb_new_i64();
 
-tcg_gen_shri_i64(t0, arg, 8);
+/* arg = ......ab or xxxxxxab */
+tcg_gen_shri_i64(t0, arg, 8);   /*  t0 = .......a or .xxxxxxa */
 if (!(flags & TCG_BSWAP_IZ)) {
-tcg_gen_ext8u_i64(t0, t0);
+tcg_gen_ext8u_i64(t0, t0);  /*  t0 = .......a */
 }
 
 if (flags & TCG_BSWAP_OS) {
-tcg_gen_shli_i64(t1, arg, 56);
-tcg_gen_sari_i64(t1, t1, 48);
+tcg_gen_shli_i64(t1, arg, 56);  /*  t1 = b....... */
+tcg_gen_sari_i64(t1, t1, 48);   /*  t1 = ssssssb. */
 } else if (flags & TCG_BSWAP_OZ) {
-tcg_gen_ext8u_i64(t1, arg);
-tcg_gen_shli_i64(t1, t1, 8);
+tcg_gen_ext8u_i64(t1, arg); /*  t1 = .......b */
+tcg_gen_shli_i64(t1, t1, 8);/*  t1 = ......b. */
 } else {
-tcg_gen_shli_i64(t1, arg, 8);
+tcg_gen_shli_i64(t1, arg, 8);   /*  t1 = xxxxxab. */
 }
 
-tcg_gen_or_i64(ret, t0, t1);
+tcg_gen_or_i64(ret, t0, t1);/* ret = ......ba (OZ) */
+/*   ssssssba (OS) */
+/*   xxxxxaba (no flag) */
 tcg_temp_free_i64(t0);
 tcg_temp_free_i64(t1);
 }
-- 
2.34.1
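
The output flags can be modeled with host intrinsics (a standalone sketch;
__builtin_bswap16 stands in for the TCG op):

#include <assert.h>
#include <stdint.h>

int main(void)
{
    uint16_t ba = __builtin_bswap16(0x0180);    /* ab = 0180 -> ba = 8001 */

    uint64_t oz = ba;                           /* ......ba: zero-extend */
    int64_t  os = (int16_t)ba;                  /* ssssssba: sign-extend */

    assert(oz == 0x0000000000008001ull);
    assert((uint64_t)os == 0xffffffffffff8001ull);  /* s bits set: b >= 0x80 */
    return 0;
}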




[PULL 13/48] tcg/i386: Allow immediate as input to deposit_*

2023-08-23 Thread Richard Henderson
We can use MOVB and MOVW with an immediate just as easily
as with a register input.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 tcg/i386/tcg-target-con-set.h |  2 +-
 tcg/i386/tcg-target.c.inc | 26 ++
 2 files changed, 23 insertions(+), 5 deletions(-)

diff --git a/tcg/i386/tcg-target-con-set.h b/tcg/i386/tcg-target-con-set.h
index 3949d49538..7d00a7dde8 100644
--- a/tcg/i386/tcg-target-con-set.h
+++ b/tcg/i386/tcg-target-con-set.h
@@ -33,7 +33,7 @@ C_O1_I1(r, q)
 C_O1_I1(r, r)
 C_O1_I1(x, r)
 C_O1_I1(x, x)
-C_O1_I2(q, 0, q)
+C_O1_I2(q, 0, qi)
 C_O1_I2(q, r, re)
 C_O1_I2(r, 0, ci)
 C_O1_I2(r, 0, r)
diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc
index ba40dd0f4d..3045b56002 100644
--- a/tcg/i386/tcg-target.c.inc
+++ b/tcg/i386/tcg-target.c.inc
@@ -276,6 +276,7 @@ static bool tcg_target_const_match(int64_t val, TCGType 
type, int ct)
 #define OPC_MOVL_GvEv  (0x8b)  /* loads, more or less */
 #define OPC_MOVB_EvIz   (0xc6)
 #define OPC_MOVL_EvIz  (0xc7)
+#define OPC_MOVB_Ib (0xb0)
 #define OPC_MOVL_Iv (0xb8)
 #define OPC_MOVBE_GyMy  (0xf0 | P_EXT38)
 #define OPC_MOVBE_MyGy  (0xf1 | P_EXT38)
@@ -2750,13 +2751,30 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode 
opc,
 OP_32_64(deposit):
 if (args[3] == 0 && args[4] == 8) {
 /* load bits 0..7 */
-tcg_out_modrm(s, OPC_MOVB_EvGv | P_REXB_R | P_REXB_RM, a2, a0);
+if (const_a2) {
+tcg_out_opc(s, OPC_MOVB_Ib | P_REXB_RM | LOWREGMASK(a0),
+0, a0, 0);
+tcg_out8(s, a2);
+} else {
+tcg_out_modrm(s, OPC_MOVB_EvGv | P_REXB_R | P_REXB_RM, a2, a0);
+}
 } else if (TCG_TARGET_REG_BITS == 32 && args[3] == 8 && args[4] == 8) {
 /* load bits 8..15 */
-tcg_out_modrm(s, OPC_MOVB_EvGv, a2, a0 + 4);
+if (const_a2) {
+tcg_out8(s, OPC_MOVB_Ib + a0 + 4);
+tcg_out8(s, a2);
+} else {
+tcg_out_modrm(s, OPC_MOVB_EvGv, a2, a0 + 4);
+}
 } else if (args[3] == 0 && args[4] == 16) {
 /* load bits 0..15 */
-tcg_out_modrm(s, OPC_MOVL_EvGv | P_DATA16, a2, a0);
+if (const_a2) {
+tcg_out_opc(s, OPC_MOVL_Iv | P_DATA16 | LOWREGMASK(a0),
+0, a0, 0);
+tcg_out16(s, a2);
+} else {
+tcg_out_modrm(s, OPC_MOVL_EvGv | P_DATA16, a2, a0);
+}
 } else {
 g_assert_not_reached();
 }
@@ -3311,7 +3329,7 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode 
op)
 
 case INDEX_op_deposit_i32:
 case INDEX_op_deposit_i64:
-return C_O1_I2(q, 0, q);
+return C_O1_I2(q, 0, qi);
 
 case INDEX_op_setcond_i32:
 case INDEX_op_setcond_i64:
-- 
2.34.1
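
deposit replaces a bitfield inside an existing value, so the byte-sized
cases above map exactly onto x86 partial-register moves. A standalone model,
mirroring deposit32 from include/qemu/bitops.h:

#include <assert.h>
#include <stdint.h>

static uint32_t deposit32(uint32_t value, int start, int length,
                          uint32_t fieldval)
{
    uint32_t mask = (~0u >> (32 - length)) << start;
    return (value & ~mask) | ((fieldval << start) & mask);
}

int main(void)
{
    /* deposit dst,imm at 0..7 is what MOVB $imm,%al encodes. */
    assert(deposit32(0x11223344, 0, 8, 0xab) == 0x112233ab);
    /* bits 8..15 is MOVB $imm,%ah; bits 0..15 is MOVW $imm,%ax. */
    assert(deposit32(0x11223344, 8, 8, 0xab) == 0x1122ab44);
    assert(deposit32(0x11223344, 0, 16, 0xabcd) == 0x1122abcd);
    return 0;
}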




[PULL 16/48] tcg: Introduce negsetcond opcodes

2023-08-23 Thread Richard Henderson
Introduce a new opcode for negative setcond.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 docs/devel/tcg-ops.rst   |  6 ++
 include/tcg/tcg-op-common.h  |  4 
 include/tcg/tcg-op.h |  2 ++
 include/tcg/tcg-opc.h|  2 ++
 include/tcg/tcg.h|  1 +
 tcg/aarch64/tcg-target.h |  2 ++
 tcg/arm/tcg-target.h |  1 +
 tcg/i386/tcg-target.h|  2 ++
 tcg/loongarch64/tcg-target.h |  3 +++
 tcg/mips/tcg-target.h|  2 ++
 tcg/ppc/tcg-target.h |  2 ++
 tcg/riscv/tcg-target.h   |  2 ++
 tcg/s390x/tcg-target.h   |  2 ++
 tcg/sparc64/tcg-target.h |  2 ++
 tcg/tci/tcg-target.h |  2 ++
 tcg/optimize.c   | 41 +++-
 tcg/tcg-op.c | 36 +++
 tcg/tcg.c|  6 ++
 18 files changed, 117 insertions(+), 1 deletion(-)

diff --git a/docs/devel/tcg-ops.rst b/docs/devel/tcg-ops.rst
index 53695e1623..9e2a931d85 100644
--- a/docs/devel/tcg-ops.rst
+++ b/docs/devel/tcg-ops.rst
@@ -498,6 +498,12 @@ Conditional moves
|
| Set *dest* to 1 if (*t1* *cond* *t2*) is true, otherwise set to 0.
 
+   * - negsetcond_i32/i64 *dest*, *t1*, *t2*, *cond*
+
+ - | *dest* = -(*t1* *cond* *t2*)
+   |
+   | Set *dest* to -1 if (*t1* *cond* *t2*) is true, otherwise set to 0.
+
* - movcond_i32/i64 *dest*, *c1*, *c2*, *v1*, *v2*, *cond*
 
  - | *dest* = (*c1* *cond* *c2* ? *v1* : *v2*)
diff --git a/include/tcg/tcg-op-common.h b/include/tcg/tcg-op-common.h
index be382bbf77..a53b15933b 100644
--- a/include/tcg/tcg-op-common.h
+++ b/include/tcg/tcg-op-common.h
@@ -344,6 +344,8 @@ void tcg_gen_setcond_i32(TCGCond cond, TCGv_i32 ret,
  TCGv_i32 arg1, TCGv_i32 arg2);
 void tcg_gen_setcondi_i32(TCGCond cond, TCGv_i32 ret,
   TCGv_i32 arg1, int32_t arg2);
+void tcg_gen_negsetcond_i32(TCGCond cond, TCGv_i32 ret,
+TCGv_i32 arg1, TCGv_i32 arg2);
 void tcg_gen_movcond_i32(TCGCond cond, TCGv_i32 ret, TCGv_i32 c1,
  TCGv_i32 c2, TCGv_i32 v1, TCGv_i32 v2);
 void tcg_gen_add2_i32(TCGv_i32 rl, TCGv_i32 rh, TCGv_i32 al,
@@ -540,6 +542,8 @@ void tcg_gen_setcond_i64(TCGCond cond, TCGv_i64 ret,
  TCGv_i64 arg1, TCGv_i64 arg2);
 void tcg_gen_setcondi_i64(TCGCond cond, TCGv_i64 ret,
   TCGv_i64 arg1, int64_t arg2);
+void tcg_gen_negsetcond_i64(TCGCond cond, TCGv_i64 ret,
+TCGv_i64 arg1, TCGv_i64 arg2);
 void tcg_gen_movcond_i64(TCGCond cond, TCGv_i64 ret, TCGv_i64 c1,
  TCGv_i64 c2, TCGv_i64 v1, TCGv_i64 v2);
 void tcg_gen_add2_i64(TCGv_i64 rl, TCGv_i64 rh, TCGv_i64 al,
diff --git a/include/tcg/tcg-op.h b/include/tcg/tcg-op.h
index d63683c47b..80cfcf8104 100644
--- a/include/tcg/tcg-op.h
+++ b/include/tcg/tcg-op.h
@@ -200,6 +200,7 @@ DEF_ATOMIC2(tcg_gen_atomic_umax_fetch, i64)
 #define tcg_gen_brcondi_tl tcg_gen_brcondi_i64
 #define tcg_gen_setcond_tl tcg_gen_setcond_i64
 #define tcg_gen_setcondi_tl tcg_gen_setcondi_i64
+#define tcg_gen_negsetcond_tl tcg_gen_negsetcond_i64
 #define tcg_gen_mul_tl tcg_gen_mul_i64
 #define tcg_gen_muli_tl tcg_gen_muli_i64
 #define tcg_gen_div_tl tcg_gen_div_i64
@@ -317,6 +318,7 @@ DEF_ATOMIC2(tcg_gen_atomic_umax_fetch, i64)
 #define tcg_gen_brcondi_tl tcg_gen_brcondi_i32
 #define tcg_gen_setcond_tl tcg_gen_setcond_i32
 #define tcg_gen_setcondi_tl tcg_gen_setcondi_i32
+#define tcg_gen_negsetcond_tl tcg_gen_negsetcond_i32
 #define tcg_gen_mul_tl tcg_gen_mul_i32
 #define tcg_gen_muli_tl tcg_gen_muli_i32
 #define tcg_gen_div_tl tcg_gen_div_i32
diff --git a/include/tcg/tcg-opc.h b/include/tcg/tcg-opc.h
index c64dfe558c..6eff3d9106 100644
--- a/include/tcg/tcg-opc.h
+++ b/include/tcg/tcg-opc.h
@@ -46,6 +46,7 @@ DEF(mb, 0, 0, 1, 0)
 
 DEF(mov_i32, 1, 1, 0, TCG_OPF_NOT_PRESENT)
 DEF(setcond_i32, 1, 2, 1, 0)
+DEF(negsetcond_i32, 1, 2, 1, IMPL(TCG_TARGET_HAS_negsetcond_i32))
 DEF(movcond_i32, 1, 4, 1, IMPL(TCG_TARGET_HAS_movcond_i32))
 /* load/store */
 DEF(ld8u_i32, 1, 1, 1, 0)
@@ -111,6 +112,7 @@ DEF(ctpop_i32, 1, 1, 0, IMPL(TCG_TARGET_HAS_ctpop_i32))
 
 DEF(mov_i64, 1, 1, 0, TCG_OPF_64BIT | TCG_OPF_NOT_PRESENT)
 DEF(setcond_i64, 1, 2, 1, IMPL64)
+DEF(negsetcond_i64, 1, 2, 1, IMPL64 | IMPL(TCG_TARGET_HAS_negsetcond_i64))
 DEF(movcond_i64, 1, 4, 1, IMPL64 | IMPL(TCG_TARGET_HAS_movcond_i64))
 /* load/store */
 DEF(ld8u_i64, 1, 1, 1, IMPL64)
diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
index ea7e55eeb8..61d7c81b44 100644
--- a/include/tcg/tcg.h
+++ b/include/tcg/tcg.h
@@ -97,6 +97,7 @@ typedef uint64_t TCGRegSet;
 #define TCG_TARGET_HAS_sextract_i64 0
 #define TCG_TARGET_HAS_extract2_i64 0
 #define TCG_TARGET_HAS_movcond_i64  0
+#define TCG_TARGET_HAS_negsetcond_i64   0
 #define TCG_TARGET_HAS_add2_i64 0
 #define TCG_TARGET_HAS_sub2_i64 0
 #define TCG_TARGET_HAS_mulu2_i64 0
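
To make the new opcode's semantics concrete, here is a hedged C model
(illustrative only, not part of the patch):

#include <stdint.h>

/* negsetcond_i32 with TCG_COND_LT: dest = -(t1 < t2). */
static int32_t negsetcond_lt_i32(int32_t t1, int32_t t2)
{
    return -(int32_t)(t1 < t2);    /* all-ones if true, zero if false */
}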

[PULL 15/48] tcg: Unify TCG_TARGET_HAS_extr[lh]_i64_i32

2023-08-23 Thread Richard Henderson
Replace the separate defines with TCG_TARGET_HAS_extr_i64_i32,
so that the two parts of backend-specific type changing cannot
be out of sync.

Reported-by: Philippe Mathieu-Daudé 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
Message-id: <20230822175127.1173698-1-richard.hender...@linaro.org>
---
 include/tcg/tcg-opc.h| 4 ++--
 include/tcg/tcg.h| 3 +--
 tcg/aarch64/tcg-target.h | 3 +--
 tcg/i386/tcg-target.h| 3 +--
 tcg/loongarch64/tcg-target.h | 3 +--
 tcg/mips/tcg-target.h| 3 +--
 tcg/ppc/tcg-target.h | 3 +--
 tcg/riscv/tcg-target.h   | 3 +--
 tcg/s390x/tcg-target.h   | 3 +--
 tcg/sparc64/tcg-target.h | 3 +--
 tcg/tci/tcg-target.h | 3 +--
 tcg/tcg-op.c | 4 ++--
 tcg/tcg.c| 3 +--
 13 files changed, 15 insertions(+), 26 deletions(-)

diff --git a/include/tcg/tcg-opc.h b/include/tcg/tcg-opc.h
index acfa5ba753..c64dfe558c 100644
--- a/include/tcg/tcg-opc.h
+++ b/include/tcg/tcg-opc.h
@@ -152,10 +152,10 @@ DEF(extract2_i64, 1, 2, 1, IMPL64 | 
IMPL(TCG_TARGET_HAS_extract2_i64))
 DEF(ext_i32_i64, 1, 1, 0, IMPL64)
 DEF(extu_i32_i64, 1, 1, 0, IMPL64)
 DEF(extrl_i64_i32, 1, 1, 0,
-IMPL(TCG_TARGET_HAS_extrl_i64_i32)
+IMPL(TCG_TARGET_HAS_extr_i64_i32)
 | (TCG_TARGET_REG_BITS == 32 ? TCG_OPF_NOT_PRESENT : 0))
 DEF(extrh_i64_i32, 1, 1, 0,
-IMPL(TCG_TARGET_HAS_extrh_i64_i32)
+IMPL(TCG_TARGET_HAS_extr_i64_i32)
 | (TCG_TARGET_REG_BITS == 32 ? TCG_OPF_NOT_PRESENT : 0))
 
 DEF(brcond_i64, 0, 2, 2, TCG_OPF_BB_END | TCG_OPF_COND_BRANCH | IMPL64)
diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
index 0875971719..ea7e55eeb8 100644
--- a/include/tcg/tcg.h
+++ b/include/tcg/tcg.h
@@ -68,8 +68,7 @@ typedef uint64_t TCGRegSet;
 
 #if TCG_TARGET_REG_BITS == 32
 /* Turn some undef macros into false macros.  */
-#define TCG_TARGET_HAS_extrl_i64_i32 0
-#define TCG_TARGET_HAS_extrh_i64_i32 0
+#define TCG_TARGET_HAS_extr_i64_i32 0
 #define TCG_TARGET_HAS_div_i64  0
 #define TCG_TARGET_HAS_rem_i64  0
 #define TCG_TARGET_HAS_div2_i64 0
diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
index ce64de06e5..12765cc281 100644
--- a/tcg/aarch64/tcg-target.h
+++ b/tcg/aarch64/tcg-target.h
@@ -92,8 +92,7 @@ typedef enum {
 #define TCG_TARGET_HAS_muls2_i32 0
 #define TCG_TARGET_HAS_muluh_i32 0
 #define TCG_TARGET_HAS_mulsh_i32 0
-#define TCG_TARGET_HAS_extrl_i64_i32 0
-#define TCG_TARGET_HAS_extrh_i64_i32 0
+#define TCG_TARGET_HAS_extr_i64_i32 0
 #define TCG_TARGET_HAS_qemu_st8_i32 0
 
 #define TCG_TARGET_HAS_div_i64  1
diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index 30cce01ca4..32dd795259 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -159,8 +159,7 @@ typedef enum {
 
 #if TCG_TARGET_REG_BITS == 64
 /* Keep 32-bit values zero-extended in a register.  */
-#define TCG_TARGET_HAS_extrl_i64_i32 1
-#define TCG_TARGET_HAS_extrh_i64_i32 1
+#define TCG_TARGET_HAS_extr_i64_i32 1
 #define TCG_TARGET_HAS_div2_i64 1
 #define TCG_TARGET_HAS_rot_i64  1
 #define TCG_TARGET_HAS_ext8s_i64 1
diff --git a/tcg/loongarch64/tcg-target.h b/tcg/loongarch64/tcg-target.h
index 26f1aab780..c94e0c6044 100644
--- a/tcg/loongarch64/tcg-target.h
+++ b/tcg/loongarch64/tcg-target.h
@@ -130,8 +130,7 @@ typedef enum {
 #define TCG_TARGET_HAS_extract_i64  1
 #define TCG_TARGET_HAS_sextract_i64 0
 #define TCG_TARGET_HAS_extract2_i64 0
-#define TCG_TARGET_HAS_extrl_i64_i32 1
-#define TCG_TARGET_HAS_extrh_i64_i32 1
+#define TCG_TARGET_HAS_extr_i64_i32 1
 #define TCG_TARGET_HAS_ext8s_i64 1
 #define TCG_TARGET_HAS_ext16s_i64   1
 #define TCG_TARGET_HAS_ext32s_i64   1
diff --git a/tcg/mips/tcg-target.h b/tcg/mips/tcg-target.h
index dd2efa795c..bdfa25bef4 100644
--- a/tcg/mips/tcg-target.h
+++ b/tcg/mips/tcg-target.h
@@ -132,8 +132,7 @@ extern bool use_mips32r2_instructions;
 #if TCG_TARGET_REG_BITS == 64
 #define TCG_TARGET_HAS_add2_i32 0
 #define TCG_TARGET_HAS_sub2_i32 0
-#define TCG_TARGET_HAS_extrl_i64_i32 1
-#define TCG_TARGET_HAS_extrh_i64_i32 1
+#define TCG_TARGET_HAS_extr_i64_i32 1
 #define TCG_TARGET_HAS_div_i64  1
 #define TCG_TARGET_HAS_rem_i64  1
 #define TCG_TARGET_HAS_not_i64  1
diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h
index 9a41fab8cc..37b54e6aeb 100644
--- a/tcg/ppc/tcg-target.h
+++ b/tcg/ppc/tcg-target.h
@@ -106,8 +106,7 @@ typedef enum {
 #if TCG_TARGET_REG_BITS == 64
 #define TCG_TARGET_HAS_add2_i32 0
 #define TCG_TARGET_HAS_sub2_i32 0
-#define TCG_TARGET_HAS_extrl_i64_i32 0
-#define TCG_TARGET_HAS_extrh_i64_i32 0
+#define TCG_TARGET_HAS_extr_i64_i32 0
 #define TCG_TARGET_HAS_div_i64  1
 #define TCG_TARGET_HAS_rem_i64  have_isa_3_00
 #define TCG_TARGET_HAS_rot_i64  1
diff --git a/tcg/riscv/tcg-target.h 

[PULL 28/48] tcg/arm: Implement negsetcond_i32

2023-08-23 Thread Richard Henderson
Trivial, as we simply need to load a different constant
in the conditional move.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 tcg/arm/tcg-target.h | 2 +-
 tcg/arm/tcg-target.c.inc | 9 +
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/tcg/arm/tcg-target.h b/tcg/arm/tcg-target.h
index ad66f11574..311a985209 100644
--- a/tcg/arm/tcg-target.h
+++ b/tcg/arm/tcg-target.h
@@ -116,7 +116,7 @@ extern bool use_neon_instructions;
 #define TCG_TARGET_HAS_sextract_i32 use_armv7_instructions
 #define TCG_TARGET_HAS_extract2_i32 1
 #define TCG_TARGET_HAS_movcond_i32  1
-#define TCG_TARGET_HAS_negsetcond_i32   0
+#define TCG_TARGET_HAS_negsetcond_i32   1
 #define TCG_TARGET_HAS_mulu2_i32 1
 #define TCG_TARGET_HAS_muls2_i32 1
 #define TCG_TARGET_HAS_muluh_i32 0
diff --git a/tcg/arm/tcg-target.c.inc b/tcg/arm/tcg-target.c.inc
index 83e286088f..162df38c73 100644
--- a/tcg/arm/tcg-target.c.inc
+++ b/tcg/arm/tcg-target.c.inc
@@ -1975,6 +1975,14 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
 tcg_out_dat_imm(s, tcg_cond_to_arm_cond[tcg_invert_cond(args[3])],
 ARITH_MOV, args[0], 0, 0);
 break;
+case INDEX_op_negsetcond_i32:
+tcg_out_dat_rIN(s, COND_AL, ARITH_CMP, ARITH_CMN, 0,
+args[1], args[2], const_args[2]);
+tcg_out_dat_imm(s, tcg_cond_to_arm_cond[args[3]],
+ARITH_MVN, args[0], 0, 0);
+tcg_out_dat_imm(s, tcg_cond_to_arm_cond[tcg_invert_cond(args[3])],
+ARITH_MOV, args[0], 0, 0);
+break;
 
 case INDEX_op_brcond2_i32:
 c = tcg_out_cmp2(s, args, const_args);
@@ -2112,6 +2120,7 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode 
op)
 case INDEX_op_add_i32:
 case INDEX_op_sub_i32:
 case INDEX_op_setcond_i32:
+case INDEX_op_negsetcond_i32:
 return C_O1_I2(r, r, rIN);
 
 case INDEX_op_and_i32:
-- 
2.34.1
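
What the emitted pair computes, sketched in C (illustrative; the flags are
assumed to have been set by the preceding compare): MVN with #0 materializes
~0 = -1 when the condition holds, and the inverted-condition MOV materializes
0 otherwise.

/* Model of: MVN<cc> dest, #0 ; MOV<!cc> dest, #0 */
static int arm_negsetcond_model(int cond_holds)
{
    return cond_holds ? ~0 : 0;    /* -1 or 0 */
}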




[PULL 27/48] tcg/aarch64: Implement negsetcond_*

2023-08-23 Thread Richard Henderson
Trivial, as aarch64 has an instruction for this: CSETM.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 tcg/aarch64/tcg-target.h |  4 ++--
 tcg/aarch64/tcg-target.c.inc | 12 
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
index bfa3e5aae9..98727ea53b 100644
--- a/tcg/aarch64/tcg-target.h
+++ b/tcg/aarch64/tcg-target.h
@@ -86,7 +86,7 @@ typedef enum {
 #define TCG_TARGET_HAS_sextract_i32 1
 #define TCG_TARGET_HAS_extract2_i32 1
 #define TCG_TARGET_HAS_movcond_i32  1
-#define TCG_TARGET_HAS_negsetcond_i32   0
+#define TCG_TARGET_HAS_negsetcond_i32   1
 #define TCG_TARGET_HAS_add2_i32 1
 #define TCG_TARGET_HAS_sub2_i32 1
 #define TCG_TARGET_HAS_mulu2_i32 0
@@ -123,7 +123,7 @@ typedef enum {
 #define TCG_TARGET_HAS_sextract_i64 1
 #define TCG_TARGET_HAS_extract2_i64 1
 #define TCG_TARGET_HAS_movcond_i64  1
-#define TCG_TARGET_HAS_negsetcond_i64   0
+#define TCG_TARGET_HAS_negsetcond_i64   1
 #define TCG_TARGET_HAS_add2_i64 1
 #define TCG_TARGET_HAS_sub2_i64 1
 #define TCG_TARGET_HAS_mulu2_i64 0
diff --git a/tcg/aarch64/tcg-target.c.inc b/tcg/aarch64/tcg-target.c.inc
index 35ca80cd56..7d8d114c9e 100644
--- a/tcg/aarch64/tcg-target.c.inc
+++ b/tcg/aarch64/tcg-target.c.inc
@@ -2262,6 +2262,16 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
  TCG_REG_XZR, tcg_invert_cond(args[3]));
 break;
 
+case INDEX_op_negsetcond_i32:
+a2 = (int32_t)a2;
+/* FALLTHRU */
+case INDEX_op_negsetcond_i64:
+tcg_out_cmp(s, ext, a1, a2, c2);
+/* Use CSETM alias of CSINV Wd, WZR, WZR, invert(cond).  */
+tcg_out_insn(s, 3506, CSINV, ext, a0, TCG_REG_XZR,
+ TCG_REG_XZR, tcg_invert_cond(args[3]));
+break;
+
 case INDEX_op_movcond_i32:
 a2 = (int32_t)a2;
 /* FALLTHRU */
@@ -2868,6 +2878,8 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode 
op)
 case INDEX_op_sub_i64:
 case INDEX_op_setcond_i32:
 case INDEX_op_setcond_i64:
+case INDEX_op_negsetcond_i32:
+case INDEX_op_negsetcond_i64:
 return C_O1_I2(r, r, rA);
 
 case INDEX_op_mul_i32:
-- 
2.34.1
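
Why the inverted condition is passed to CSINV, sketched under the
architectural definition (an illustrative model, not QEMU code):

#include <stdint.h>

/* CSINV Xd, Xn, Xm, cc yields (cc ? Xn : ~Xm).  CSETM is
 * CSINV Xd, XZR, XZR, invert(cond), hence:
 *   Xd = !cond ? 0 : ~0  =  (cond ? -1 : 0)
 */
static int64_t csetm_model(int cond_holds)
{
    return cond_holds ? ~(int64_t)0 : 0;
}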




[PULL 30/48] tcg/s390x: Implement negsetcond_*

2023-08-23 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 tcg/s390x/tcg-target.h |  4 +-
 tcg/s390x/tcg-target.c.inc | 78 +-
 2 files changed, 54 insertions(+), 28 deletions(-)

diff --git a/tcg/s390x/tcg-target.h b/tcg/s390x/tcg-target.h
index 123a4b1645..50e12ef9d6 100644
--- a/tcg/s390x/tcg-target.h
+++ b/tcg/s390x/tcg-target.h
@@ -96,7 +96,7 @@ extern uint64_t s390_facilities[3];
 #define TCG_TARGET_HAS_sextract_i32   0
 #define TCG_TARGET_HAS_extract2_i32   0
 #define TCG_TARGET_HAS_movcond_i321
-#define TCG_TARGET_HAS_negsetcond_i32 0
+#define TCG_TARGET_HAS_negsetcond_i32 1
 #define TCG_TARGET_HAS_add2_i32   1
 #define TCG_TARGET_HAS_sub2_i32   1
 #define TCG_TARGET_HAS_mulu2_i32  0
@@ -132,7 +132,7 @@ extern uint64_t s390_facilities[3];
 #define TCG_TARGET_HAS_sextract_i64   0
 #define TCG_TARGET_HAS_extract2_i64   0
 #define TCG_TARGET_HAS_movcond_i641
-#define TCG_TARGET_HAS_negsetcond_i64 0
+#define TCG_TARGET_HAS_negsetcond_i64 1
 #define TCG_TARGET_HAS_add2_i64   1
 #define TCG_TARGET_HAS_sub2_i64   1
 #define TCG_TARGET_HAS_mulu2_i64  1
diff --git a/tcg/s390x/tcg-target.c.inc b/tcg/s390x/tcg-target.c.inc
index a94f7908d6..ecd8aaf2a1 100644
--- a/tcg/s390x/tcg-target.c.inc
+++ b/tcg/s390x/tcg-target.c.inc
@@ -1266,7 +1266,8 @@ static int tgen_cmp(TCGContext *s, TCGType type, TCGCond 
c, TCGReg r1,
 }
 
 static void tgen_setcond(TCGContext *s, TCGType type, TCGCond cond,
- TCGReg dest, TCGReg c1, TCGArg c2, int c2const)
+ TCGReg dest, TCGReg c1, TCGArg c2,
+ bool c2const, bool neg)
 {
 int cc;
 
@@ -1275,11 +1276,27 @@ static void tgen_setcond(TCGContext *s, TCGType type, 
TCGCond cond,
 /* Emit: d = 0, d = (cc ? 1 : d).  */
 cc = tgen_cmp(s, type, cond, c1, c2, c2const, false);
 tcg_out_movi(s, TCG_TYPE_I64, dest, 0);
-tcg_out_insn(s, RIEg, LOCGHI, dest, 1, cc);
+tcg_out_insn(s, RIEg, LOCGHI, dest, neg ? -1 : 1, cc);
 return;
 }
 
- restart:
+switch (cond) {
+case TCG_COND_GEU:
+case TCG_COND_LTU:
+case TCG_COND_LT:
+case TCG_COND_GE:
+/* Swap operands so that we can use LEU/GTU/GT/LE.  */
+if (!c2const) {
+TCGReg t = c1;
+c1 = c2;
+c2 = t;
+cond = tcg_swap_cond(cond);
+}
+break;
+default:
+break;
+}
+
 switch (cond) {
 case TCG_COND_NE:
 /* X != 0 is X > 0.  */
@@ -1292,11 +1309,20 @@ static void tgen_setcond(TCGContext *s, TCGType type, 
TCGCond cond,
 
 case TCG_COND_GTU:
 case TCG_COND_GT:
-/* The result of a compare has CC=2 for GT and CC=3 unused.
-   ADD LOGICAL WITH CARRY considers (CC & 2) the carry bit.  */
+/*
+ * The result of a compare has CC=2 for GT and CC=3 unused.
+ * ADD LOGICAL WITH CARRY considers (CC & 2) the carry bit.
+ */
 tgen_cmp(s, type, cond, c1, c2, c2const, true);
 tcg_out_movi(s, type, dest, 0);
 tcg_out_insn(s, RRE, ALCGR, dest, dest);
+if (neg) {
+if (type == TCG_TYPE_I32) {
+tcg_out_insn(s, RR, LCR, dest, dest);
+} else {
+tcg_out_insn(s, RRE, LCGR, dest, dest);
+}
+}
 return;
 
 case TCG_COND_EQ:
@@ -1310,27 +1336,17 @@ static void tgen_setcond(TCGContext *s, TCGType type, 
TCGCond cond,
 
 case TCG_COND_LEU:
 case TCG_COND_LE:
-/* As above, but we're looking for borrow, or !carry.
-   The second insn computes d - d - borrow, or -1 for true
-   and 0 for false.  So we must mask to 1 bit afterward.  */
+/*
+ * As above, but we're looking for borrow, or !carry.
+ * The second insn computes d - d - borrow, or -1 for true
+ * and 0 for false.  So we must mask to 1 bit afterward.
+ */
 tgen_cmp(s, type, cond, c1, c2, c2const, true);
 tcg_out_insn(s, RRE, SLBGR, dest, dest);
-tgen_andi(s, type, dest, 1);
-return;
-
-case TCG_COND_GEU:
-case TCG_COND_LTU:
-case TCG_COND_LT:
-case TCG_COND_GE:
-/* Swap operands so that we can use LEU/GTU/GT/LE.  */
-if (!c2const) {
-TCGReg t = c1;
-c1 = c2;
-c2 = t;
-cond = tcg_swap_cond(cond);
-goto restart;
+if (!neg) {
+tgen_andi(s, type, dest, 1);
 }
-break;
+return;
 
 default:
 g_assert_not_reached();
@@ -1339,7 +1355,7 @@ static void tgen_setcond(TCGContext *s, TCGType type, 
TCGCond cond,
 cc = tgen_cmp(s, type, cond, c1, c2, c2const, false);
 /* Emit: d = 0, t = 1, d = (cc ? t : d).  */
 tcg_out_movi(s, TCG_TYPE_I64, dest, 0);
-tcg_out_movi(s, TCG_TYPE_I64, TCG_TMP0, 1);
+tcg_out_movi(s, TCG_TYPE_I64, TCG_TMP0, neg ? -1 : 1);
 tcg_out_insn(s, RRFc, LOCGR, dest, TCG_TMP0, 
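
The GT/GTU path above leans on a condition-code trick; a hedged C model
(illustrative; 'carry' stands for CC bit 1, which ALCGR adds in):

/* After the compare, CC=2 encodes "greater than" and ALCGR treats
 * (CC & 2) as carry, so with dest previously zeroed:
 *   ALCGR dest,dest  =>  dest = 0 + 0 + carry = (a > b) ? 1 : 0
 *   LCGR  dest,dest  =>  dest = -dest, giving -1/0 for negsetcond
 */
static long s390x_negsetcond_gt(long a, long b)
{
    long carry = (a > b);
    long dest = 0 + 0 + carry;
    return -dest;
}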

[PULL 02/48] accel/hvf: Widen pc/saved_insn for hvf_sw_breakpoint

2023-08-23 Thread Richard Henderson
From: Anton Johansson via 

Widens the pc and saved_insn fields of hvf_sw_breakpoint from
target_ulong to vaddr. The signatures of the other hvf_* functions that
access hvf_sw_breakpoint are widened to match.

Signed-off-by: Anton Johansson 
Reviewed-by: Richard Henderson 
Message-Id: <20230807155706.9580-3-a...@rev.ng>
Signed-off-by: Richard Henderson 
---
 include/sysemu/hvf.h  | 6 +++---
 accel/hvf/hvf-accel-ops.c | 4 ++--
 accel/hvf/hvf-all.c   | 2 +-
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/include/sysemu/hvf.h b/include/sysemu/hvf.h
index 70549b9158..4cbae87ced 100644
--- a/include/sysemu/hvf.h
+++ b/include/sysemu/hvf.h
@@ -39,14 +39,14 @@ DECLARE_INSTANCE_CHECKER(HVFState, HVF_STATE,
 
 #ifdef NEED_CPU_H
 struct hvf_sw_breakpoint {
-target_ulong pc;
-target_ulong saved_insn;
+vaddr pc;
+vaddr saved_insn;
 int use_count;
 QTAILQ_ENTRY(hvf_sw_breakpoint) entry;
 };
 
 struct hvf_sw_breakpoint *hvf_find_sw_breakpoint(CPUState *cpu,
- target_ulong pc);
+ vaddr pc);
 int hvf_sw_breakpoints_active(CPUState *cpu);
 
 int hvf_arch_insert_sw_breakpoint(CPUState *cpu, struct hvf_sw_breakpoint *bp);
diff --git a/accel/hvf/hvf-accel-ops.c b/accel/hvf/hvf-accel-ops.c
index a44cf1c144..3c94c79747 100644
--- a/accel/hvf/hvf-accel-ops.c
+++ b/accel/hvf/hvf-accel-ops.c
@@ -474,7 +474,7 @@ static void hvf_start_vcpu_thread(CPUState *cpu)
cpu, QEMU_THREAD_JOINABLE);
 }
 
-static int hvf_insert_breakpoint(CPUState *cpu, int type, hwaddr addr, hwaddr 
len)
+static int hvf_insert_breakpoint(CPUState *cpu, int type, vaddr addr, vaddr 
len)
 {
 struct hvf_sw_breakpoint *bp;
 int err;
@@ -512,7 +512,7 @@ static int hvf_insert_breakpoint(CPUState *cpu, int type, 
hwaddr addr, hwaddr le
 return 0;
 }
 
-static int hvf_remove_breakpoint(CPUState *cpu, int type, hwaddr addr, hwaddr 
len)
+static int hvf_remove_breakpoint(CPUState *cpu, int type, vaddr addr, vaddr 
len)
 {
 struct hvf_sw_breakpoint *bp;
 int err;
diff --git a/accel/hvf/hvf-all.c b/accel/hvf/hvf-all.c
index 4920787af6..db05b81be5 100644
--- a/accel/hvf/hvf-all.c
+++ b/accel/hvf/hvf-all.c
@@ -51,7 +51,7 @@ void assert_hvf_ok(hv_return_t ret)
 abort();
 }
 
-struct hvf_sw_breakpoint *hvf_find_sw_breakpoint(CPUState *cpu, target_ulong 
pc)
+struct hvf_sw_breakpoint *hvf_find_sw_breakpoint(CPUState *cpu, vaddr pc)
 {
 struct hvf_sw_breakpoint *bp;
 
-- 
2.34.1




[PULL 29/48] tcg/riscv: Implement negsetcond_*

2023-08-23 Thread Richard Henderson
Reviewed-by: Daniel Henrique Barboza 
Signed-off-by: Richard Henderson 
---
 tcg/riscv/tcg-target.h |  4 ++--
 tcg/riscv/tcg-target.c.inc | 45 ++
 2 files changed, 47 insertions(+), 2 deletions(-)

diff --git a/tcg/riscv/tcg-target.h b/tcg/riscv/tcg-target.h
index 0efb3fc0b8..c1132d178f 100644
--- a/tcg/riscv/tcg-target.h
+++ b/tcg/riscv/tcg-target.h
@@ -88,7 +88,7 @@ extern bool have_zbb;
 
 /* optional instructions */
 #define TCG_TARGET_HAS_movcond_i32  1
-#define TCG_TARGET_HAS_negsetcond_i32   0
+#define TCG_TARGET_HAS_negsetcond_i32   1
 #define TCG_TARGET_HAS_div_i32  1
 #define TCG_TARGET_HAS_rem_i32  1
 #define TCG_TARGET_HAS_div2_i32 0
@@ -124,7 +124,7 @@ extern bool have_zbb;
 #define TCG_TARGET_HAS_qemu_st8_i32 0
 
 #define TCG_TARGET_HAS_movcond_i64  1
-#define TCG_TARGET_HAS_negsetcond_i64   0
+#define TCG_TARGET_HAS_negsetcond_i64   1
 #define TCG_TARGET_HAS_div_i64  1
 #define TCG_TARGET_HAS_rem_i64  1
 #define TCG_TARGET_HAS_div2_i64 0
diff --git a/tcg/riscv/tcg-target.c.inc b/tcg/riscv/tcg-target.c.inc
index eeaeb6b6e3..232b616af3 100644
--- a/tcg/riscv/tcg-target.c.inc
+++ b/tcg/riscv/tcg-target.c.inc
@@ -936,6 +936,44 @@ static void tcg_out_setcond(TCGContext *s, TCGCond cond, 
TCGReg ret,
 }
 }
 
+static void tcg_out_negsetcond(TCGContext *s, TCGCond cond, TCGReg ret,
+   TCGReg arg1, tcg_target_long arg2, bool c2)
+{
+int tmpflags;
+TCGReg tmp;
+
+/* For LT/GE comparison against 0, replicate the sign bit. */
+if (c2 && arg2 == 0) {
+switch (cond) {
+case TCG_COND_GE:
+tcg_out_opc_imm(s, OPC_XORI, ret, arg1, -1);
+arg1 = ret;
+/* fall through */
+case TCG_COND_LT:
+tcg_out_opc_imm(s, OPC_SRAI, ret, arg1, TCG_TARGET_REG_BITS - 1);
+return;
+default:
+break;
+}
+}
+
+tmpflags = tcg_out_setcond_int(s, cond, ret, arg1, arg2, c2);
+tmp = tmpflags & ~SETCOND_FLAGS;
+
+/* If intermediate result is zero/non-zero: test != 0. */
+if (tmpflags & SETCOND_NEZ) {
+tcg_out_opc_reg(s, OPC_SLTU, ret, TCG_REG_ZERO, tmp);
+tmp = ret;
+}
+
+/* Produce the 0/-1 result. */
+if (tmpflags & SETCOND_INV) {
+tcg_out_opc_imm(s, OPC_ADDI, ret, tmp, -1);
+} else {
+tcg_out_opc_reg(s, OPC_SUB, ret, TCG_REG_ZERO, tmp);
+}
+}
+
 static void tcg_out_movcond_zicond(TCGContext *s, TCGReg ret, TCGReg test_ne,
int val1, bool c_val1,
int val2, bool c_val2)
@@ -1782,6 +1820,11 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
 tcg_out_setcond(s, args[3], a0, a1, a2, c2);
 break;
 
+case INDEX_op_negsetcond_i32:
+case INDEX_op_negsetcond_i64:
+tcg_out_negsetcond(s, args[3], a0, a1, a2, c2);
+break;
+
 case INDEX_op_movcond_i32:
 case INDEX_op_movcond_i64:
 tcg_out_movcond(s, args[5], a0, a1, a2, c2,
@@ -1910,6 +1953,8 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode 
op)
 case INDEX_op_xor_i64:
 case INDEX_op_setcond_i32:
 case INDEX_op_setcond_i64:
+case INDEX_op_negsetcond_i32:
+case INDEX_op_negsetcond_i64:
 return C_O1_I2(r, r, rI);
 
 case INDEX_op_andc_i32:
-- 
2.34.1
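
The LT/GE-versus-zero fast path relies on shifts replicating the sign bit;
a C sketch of the two cases (illustrative, assuming arithmetic right shift
on signed values):

#include <stdint.h>

/* negsetcond(x < 0):  SRAI ret, x, 63 */
static int64_t neg_lt0(int64_t x) { return x >> 63; }

/* negsetcond(x >= 0): XORI ret, x, -1 ; SRAI ret, ret, 63 */
static int64_t neg_ge0(int64_t x) { return ~x >> 63; }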




[PULL 23/48] target/sparc: Use tcg_gen_movcond_i64 in gen_edge

2023-08-23 Thread Richard Henderson
The setcond + neg + or sequence is a complex method of
performing a conditional move.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/sparc/translate.c | 17 -
 1 file changed, 4 insertions(+), 13 deletions(-)

diff --git a/target/sparc/translate.c b/target/sparc/translate.c
index bd877a5e4a..fa80a91161 100644
--- a/target/sparc/translate.c
+++ b/target/sparc/translate.c
@@ -2916,7 +2916,7 @@ static void gen_edge(DisasContext *dc, TCGv dst, TCGv s1, 
TCGv s2,
 
 tcg_gen_shr_tl(lo1, tcg_constant_tl(tabl), lo1);
 tcg_gen_shr_tl(lo2, tcg_constant_tl(tabr), lo2);
-tcg_gen_andi_tl(dst, lo1, omask);
+tcg_gen_andi_tl(lo1, lo1, omask);
 tcg_gen_andi_tl(lo2, lo2, omask);
 
 amask = -8;
@@ -2926,18 +2926,9 @@ static void gen_edge(DisasContext *dc, TCGv dst, TCGv 
s1, TCGv s2,
 tcg_gen_andi_tl(s1, s1, amask);
 tcg_gen_andi_tl(s2, s2, amask);
 
-/* We want to compute
-dst = (s1 == s2 ? lo1 : lo1 & lo2).
-   We've already done dst = lo1, so this reduces to
-dst &= (s1 == s2 ? -1 : lo2)
-   Which we perform by
-lo2 |= -(s1 == s2)
-dst &= lo2
-*/
-tcg_gen_setcond_tl(TCG_COND_EQ, lo1, s1, s2);
-tcg_gen_neg_tl(lo1, lo1);
-tcg_gen_or_tl(lo2, lo2, lo1);
-tcg_gen_and_tl(dst, dst, lo2);
+/* Compute dst = (s1 == s2 ? lo1 : lo1 & lo2). */
+tcg_gen_and_tl(lo2, lo2, lo1);
+tcg_gen_movcond_tl(TCG_COND_EQ, dst, s1, s2, lo1, lo2);
 }
 
 static void gen_alignaddr(TCGv dst, TCGv s1, TCGv s2, bool left)
-- 
2.34.1
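
The equivalence being exploited, spelled out in C (an illustrative check,
not QEMU code): both formulations agree for every input.

/* old: lo2 |= -(s1 == s2); dst = lo1 & lo2;
 * new: lo2 &= lo1;         dst = (s1 == s2) ? lo1 : lo2;
 * If s1 == s2 both yield lo1, otherwise both yield lo1 & lo2.
 */
static unsigned long edge_result(unsigned long lo1, unsigned long lo2,
                                 unsigned long s1, unsigned long s2)
{
    return (s1 == s2) ? lo1 : (lo1 & lo2);
}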




[PULL 20/48] target/m68k: Use tcg_gen_negsetcond_*

2023-08-23 Thread Richard Henderson
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/m68k/translate.c | 24 ++--
 1 file changed, 10 insertions(+), 14 deletions(-)

diff --git a/target/m68k/translate.c b/target/m68k/translate.c
index d08e823b6c..15b3701b8f 100644
--- a/target/m68k/translate.c
+++ b/target/m68k/translate.c
@@ -1350,8 +1350,7 @@ static void gen_cc_cond(DisasCompare *c, DisasContext *s, 
int cond)
 case 14: /* GT (!(Z || (N ^ V))) */
 case 15: /* LE (Z || (N ^ V)) */
 c->v1 = tmp = tcg_temp_new();
-tcg_gen_setcond_i32(TCG_COND_EQ, tmp, QREG_CC_Z, c->v2);
-tcg_gen_neg_i32(tmp, tmp);
+tcg_gen_negsetcond_i32(TCG_COND_EQ, tmp, QREG_CC_Z, c->v2);
 tmp2 = tcg_temp_new();
 tcg_gen_xor_i32(tmp2, QREG_CC_N, QREG_CC_V);
 tcg_gen_or_i32(tmp, tmp, tmp2);
@@ -1430,9 +1429,8 @@ DISAS_INSN(scc)
 gen_cc_cond(, s, cond);
 
 tmp = tcg_temp_new();
-tcg_gen_setcond_i32(c.tcond, tmp, c.v1, c.v2);
+tcg_gen_negsetcond_i32(c.tcond, tmp, c.v1, c.v2);
 
-tcg_gen_neg_i32(tmp, tmp);
 DEST_EA(env, insn, OS_BYTE, tmp, NULL);
 }
 
@@ -2764,13 +2762,14 @@ DISAS_INSN(mull)
 tcg_gen_muls2_i32(QREG_CC_N, QREG_CC_V, src1, DREG(ext, 12));
 /* QREG_CC_V is -(QREG_CC_V != (QREG_CC_N >> 31)) */
 tcg_gen_sari_i32(QREG_CC_Z, QREG_CC_N, 31);
-tcg_gen_setcond_i32(TCG_COND_NE, QREG_CC_V, QREG_CC_V, QREG_CC_Z);
+tcg_gen_negsetcond_i32(TCG_COND_NE, QREG_CC_V,
+   QREG_CC_V, QREG_CC_Z);
 } else {
 tcg_gen_mulu2_i32(QREG_CC_N, QREG_CC_V, src1, DREG(ext, 12));
 /* QREG_CC_V is -(QREG_CC_V != 0), use QREG_CC_C as 0 */
-tcg_gen_setcond_i32(TCG_COND_NE, QREG_CC_V, QREG_CC_V, QREG_CC_C);
+tcg_gen_negsetcond_i32(TCG_COND_NE, QREG_CC_V,
+   QREG_CC_V, QREG_CC_C);
 }
-tcg_gen_neg_i32(QREG_CC_V, QREG_CC_V);
 tcg_gen_mov_i32(DREG(ext, 12), QREG_CC_N);
 
 tcg_gen_mov_i32(QREG_CC_Z, QREG_CC_N);
@@ -3339,14 +3338,13 @@ static inline void shift_im(DisasContext *s, uint16_t 
insn, int opsize)
 if (!logical && m68k_feature(s->env, M68K_FEATURE_M68K)) {
 /* if shift count >= bits, V is (reg != 0) */
 if (count >= bits) {
-tcg_gen_setcond_i32(TCG_COND_NE, QREG_CC_V, reg, QREG_CC_V);
+tcg_gen_negsetcond_i32(TCG_COND_NE, QREG_CC_V, reg, QREG_CC_V);
 } else {
 TCGv t0 = tcg_temp_new();
 tcg_gen_sari_i32(QREG_CC_V, reg, bits - 1);
 tcg_gen_sari_i32(t0, reg, bits - count - 1);
-tcg_gen_setcond_i32(TCG_COND_NE, QREG_CC_V, QREG_CC_V, t0);
+tcg_gen_negsetcond_i32(TCG_COND_NE, QREG_CC_V, QREG_CC_V, t0);
 }
-tcg_gen_neg_i32(QREG_CC_V, QREG_CC_V);
 }
 } else {
 tcg_gen_shri_i32(QREG_CC_C, reg, count - 1);
@@ -3430,9 +3428,8 @@ static inline void shift_reg(DisasContext *s, uint16_t 
insn, int opsize)
 /* Ignore the bits below the sign bit.  */
 tcg_gen_andi_i64(t64, t64, -1ULL << (bits - 1));
 /* If any bits remain set, we have overflow.  */
-tcg_gen_setcondi_i64(TCG_COND_NE, t64, t64, 0);
+tcg_gen_negsetcond_i64(TCG_COND_NE, t64, t64, tcg_constant_i64(0));
 tcg_gen_extrl_i64_i32(QREG_CC_V, t64);
-tcg_gen_neg_i32(QREG_CC_V, QREG_CC_V);
 }
 } else {
 tcg_gen_shli_i64(t64, t64, 32);
@@ -5311,9 +5308,8 @@ DISAS_INSN(fscc)
 gen_fcc_cond(, s, cond);
 
 tmp = tcg_temp_new();
-tcg_gen_setcond_i32(c.tcond, tmp, c.v1, c.v2);
+tcg_gen_negsetcond_i32(c.tcond, tmp, c.v1, c.v2);
 
-tcg_gen_neg_i32(tmp, tmp);
 DEST_EA(env, insn, OS_BYTE, tmp, NULL);
 }
 
-- 
2.34.1




[PULL 07/48] include/exec: Widen tlb_hit/tlb_hit_page()

2023-08-23 Thread Richard Henderson
From: Anton Johansson via 

tlb_addr is changed from target_ulong to uint64_t to match the type of
a CPUTLBEntry value, and the address is changed to vaddr.

Signed-off-by: Anton Johansson 
Reviewed-by: Richard Henderson 
Message-Id: <20230807155706.9580-8-a...@rev.ng>
Signed-off-by: Richard Henderson 
---
 include/exec/cpu-all.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
index 94f44f1f59..c2c62160c6 100644
--- a/include/exec/cpu-all.h
+++ b/include/exec/cpu-all.h
@@ -397,7 +397,7 @@ QEMU_BUILD_BUG_ON(TLB_FLAGS_MASK & TLB_SLOW_FLAGS_MASK);
  * @addr: virtual address to test (must be page aligned)
  * @tlb_addr: TLB entry address (a CPUTLBEntry addr_read/write/code value)
  */
-static inline bool tlb_hit_page(target_ulong tlb_addr, target_ulong addr)
+static inline bool tlb_hit_page(uint64_t tlb_addr, vaddr addr)
 {
 return addr == (tlb_addr & (TARGET_PAGE_MASK | TLB_INVALID_MASK));
 }
@@ -408,7 +408,7 @@ static inline bool tlb_hit_page(target_ulong tlb_addr, 
target_ulong addr)
  * @addr: virtual address to test (need not be page aligned)
  * @tlb_addr: TLB entry address (a CPUTLBEntry addr_read/write/code value)
  */
-static inline bool tlb_hit(target_ulong tlb_addr, target_ulong addr)
+static inline bool tlb_hit(uint64_t tlb_addr, vaddr addr)
 {
 return tlb_hit_page(tlb_addr, addr & TARGET_PAGE_MASK);
 }
-- 
2.34.1




[PULL 22/48] target/ppc: Use tcg_gen_negsetcond_*

2023-08-23 Thread Richard Henderson
Tested-by: Nicholas Piggin 
Reviewed-by: Nicholas Piggin 
Reviewed-by: Daniel Henrique Barboza 
Signed-off-by: Richard Henderson 
---
 target/ppc/translate/fixedpoint-impl.c.inc | 6 --
 target/ppc/translate/vmx-impl.c.inc| 8 +++-
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/target/ppc/translate/fixedpoint-impl.c.inc 
b/target/ppc/translate/fixedpoint-impl.c.inc
index f47f1a50e8..4ce02fd3a4 100644
--- a/target/ppc/translate/fixedpoint-impl.c.inc
+++ b/target/ppc/translate/fixedpoint-impl.c.inc
@@ -342,12 +342,14 @@ static bool do_set_bool_cond(DisasContext *ctx, arg_X_bi 
*a, bool neg, bool rev)
 uint32_t mask = 0x08 >> (a->bi & 0x03);
 TCGCond cond = rev ? TCG_COND_EQ : TCG_COND_NE;
 TCGv temp = tcg_temp_new();
+TCGv zero = tcg_constant_tl(0);
 
 tcg_gen_extu_i32_tl(temp, cpu_crf[a->bi >> 2]);
 tcg_gen_andi_tl(temp, temp, mask);
-tcg_gen_setcondi_tl(cond, cpu_gpr[a->rt], temp, 0);
 if (neg) {
-tcg_gen_neg_tl(cpu_gpr[a->rt], cpu_gpr[a->rt]);
+tcg_gen_negsetcond_tl(cond, cpu_gpr[a->rt], temp, zero);
+} else {
+tcg_gen_setcond_tl(cond, cpu_gpr[a->rt], temp, zero);
 }
 return true;
 }
diff --git a/target/ppc/translate/vmx-impl.c.inc 
b/target/ppc/translate/vmx-impl.c.inc
index c8712dd7d8..6d7669aabd 100644
--- a/target/ppc/translate/vmx-impl.c.inc
+++ b/target/ppc/translate/vmx-impl.c.inc
@@ -1341,8 +1341,7 @@ static bool trans_VCMPEQUQ(DisasContext *ctx, arg_VC *a)
 tcg_gen_xor_i64(t1, t0, t1);
 
 tcg_gen_or_i64(t1, t1, t2);
-tcg_gen_setcondi_i64(TCG_COND_EQ, t1, t1, 0);
-tcg_gen_neg_i64(t1, t1);
+tcg_gen_negsetcond_i64(TCG_COND_EQ, t1, t1, tcg_constant_i64(0));
 
 set_avr64(a->vrt, t1, true);
 set_avr64(a->vrt, t1, false);
@@ -1365,15 +1364,14 @@ static bool do_vcmpgtq(DisasContext *ctx, arg_VC *a, 
bool sign)
 
 get_avr64(t0, a->vra, false);
 get_avr64(t1, a->vrb, false);
-tcg_gen_setcond_i64(TCG_COND_GTU, t2, t0, t1);
+tcg_gen_negsetcond_i64(TCG_COND_GTU, t2, t0, t1);
 
 get_avr64(t0, a->vra, true);
 get_avr64(t1, a->vrb, true);
 tcg_gen_movcond_i64(TCG_COND_EQ, t2, t0, t1, t2, tcg_constant_i64(0));
-tcg_gen_setcond_i64(sign ? TCG_COND_GT : TCG_COND_GTU, t1, t0, t1);
+tcg_gen_negsetcond_i64(sign ? TCG_COND_GT : TCG_COND_GTU, t1, t0, t1);
 
 tcg_gen_or_i64(t1, t1, t2);
-tcg_gen_neg_i64(t1, t1);
 
 set_avr64(a->vrt, t1, true);
 set_avr64(a->vrt, t1, false);
-- 
2.34.1




[PULL 39/48] tcg/tcg-op: Document bswap16_i32() byte pattern

2023-08-23 Thread Richard Henderson
From: Philippe Mathieu-Daudé 

Signed-off-by: Philippe Mathieu-Daudé 
Message-Id: <20230823145542.79633-2-phi...@linaro.org>
Signed-off-by: Richard Henderson 
---
 tcg/tcg-op.c | 27 +++
 1 file changed, 19 insertions(+), 8 deletions(-)

diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index b59a50a5a9..fc3a0ab7fc 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -1035,6 +1035,14 @@ void tcg_gen_ext16u_i32(TCGv_i32 ret, TCGv_i32 arg)
 }
 }
 
+/*
+ * bswap16_i32: 16-bit byte swap on the low bits of a 32-bit value.
+ *
+ * Byte pattern: xxab -> yyba
+ *
+ * With TCG_BSWAP_IZ, x == zero, else undefined.
+ * With TCG_BSWAP_OZ, y == zero, with TCG_BSWAP_OS y == sign, else undefined.
+ */
 void tcg_gen_bswap16_i32(TCGv_i32 ret, TCGv_i32 arg, int flags)
 {
 /* Only one extension flag may be present. */
@@ -1046,22 +1054,25 @@ void tcg_gen_bswap16_i32(TCGv_i32 ret, TCGv_i32 arg, 
int flags)
 TCGv_i32 t0 = tcg_temp_ebb_new_i32();
 TCGv_i32 t1 = tcg_temp_ebb_new_i32();
 
-tcg_gen_shri_i32(t0, arg, 8);
+/* arg = ..ab (IZ) xxab (!IZ) */
+tcg_gen_shri_i32(t0, arg, 8);   /*  t0 = ...a (IZ) .xxa (!IZ) */
 if (!(flags & TCG_BSWAP_IZ)) {
-tcg_gen_ext8u_i32(t0, t0);
+tcg_gen_ext8u_i32(t0, t0);  /*  t0 = ...a */
 }
 
 if (flags & TCG_BSWAP_OS) {
-tcg_gen_shli_i32(t1, arg, 24);
-tcg_gen_sari_i32(t1, t1, 16);
+tcg_gen_shli_i32(t1, arg, 24);  /*  t1 = b... */
+tcg_gen_sari_i32(t1, t1, 16);   /*  t1 = ssb. */
 } else if (flags & TCG_BSWAP_OZ) {
-tcg_gen_ext8u_i32(t1, arg);
-tcg_gen_shli_i32(t1, t1, 8);
+tcg_gen_ext8u_i32(t1, arg); /*  t1 = ...b */
+tcg_gen_shli_i32(t1, t1, 8);/*  t1 = ..b. */
 } else {
-tcg_gen_shli_i32(t1, arg, 8);
+tcg_gen_shli_i32(t1, arg, 8);   /*  t1 = xab. */
 }
 
-tcg_gen_or_i32(ret, t0, t1);
+tcg_gen_or_i32(ret, t0, t1);/* ret = ..ba (OZ) */
+/* = ssba (OS) */
+/* = xaba (no flag) */
 tcg_temp_free_i32(t0);
 tcg_temp_free_i32(t1);
 }
-- 
2.34.1
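
A worked instance of the documented byte pattern (values invented for
illustration): with arg = 0x5678a1b2, the low half becomes 0xb2a1 and the
flags decide the high half.

#include <stdint.h>

/* TCG_BSWAP_OZ: 0x5678a1b2 -> 0x0000b2a1 (high half zeroed) */
static uint32_t bswap16_oz(uint32_t a)
{
    return ((a >> 8) & 0xff) | ((a & 0xff) << 8);
}

/* TCG_BSWAP_OS: 0x5678a1b2 -> 0xffffb2a1 (0xb2a1 sign-extended) */
static uint32_t bswap16_os(uint32_t a)
{
    return (uint32_t)(int32_t)(int16_t)bswap16_oz(a);
}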




[PULL 24/48] target/tricore: Replace gen_cond_w with tcg_gen_negsetcond_tl

2023-08-23 Thread Richard Henderson
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Bastian Koppelmann 
Signed-off-by: Richard Henderson 
---
 target/tricore/translate.c | 16 ++--
 1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/target/tricore/translate.c b/target/tricore/translate.c
index 1947733870..6ae5ccbf72 100644
--- a/target/tricore/translate.c
+++ b/target/tricore/translate.c
@@ -2680,13 +2680,6 @@ gen_accumulating_condi(int cond, TCGv ret, TCGv r1, 
int32_t con,
 gen_accumulating_cond(cond, ret, r1, temp, op);
 }
 
-/* ret = (r1 cond r2) ? 0xFFFFFFFF ? 0x00000000;*/
-static inline void gen_cond_w(TCGCond cond, TCGv ret, TCGv r1, TCGv r2)
-{
-tcg_gen_setcond_tl(cond, ret, r1, r2);
-tcg_gen_neg_tl(ret, ret);
-}
-
 static inline void gen_eqany_bi(TCGv ret, TCGv r1, int32_t con)
 {
 TCGv b0 = tcg_temp_new();
@@ -5692,7 +5685,8 @@ static void decode_rr_accumulator(DisasContext *ctx)
 gen_helper_eq_h(cpu_gpr_d[r3], cpu_gpr_d[r1], cpu_gpr_d[r2]);
 break;
 case OPC2_32_RR_EQ_W:
-gen_cond_w(TCG_COND_EQ, cpu_gpr_d[r3], cpu_gpr_d[r1], cpu_gpr_d[r2]);
+tcg_gen_negsetcond_tl(TCG_COND_EQ, cpu_gpr_d[r3],
+  cpu_gpr_d[r1], cpu_gpr_d[r2]);
 break;
 case OPC2_32_RR_EQANY_B:
 gen_helper_eqany_b(cpu_gpr_d[r3], cpu_gpr_d[r1], cpu_gpr_d[r2]);
@@ -5729,10 +5723,12 @@ static void decode_rr_accumulator(DisasContext *ctx)
 gen_helper_lt_hu(cpu_gpr_d[r3], cpu_gpr_d[r1], cpu_gpr_d[r2]);
 break;
 case OPC2_32_RR_LT_W:
-gen_cond_w(TCG_COND_LT, cpu_gpr_d[r3], cpu_gpr_d[r1], cpu_gpr_d[r2]);
+tcg_gen_negsetcond_tl(TCG_COND_LT, cpu_gpr_d[r3],
+  cpu_gpr_d[r1], cpu_gpr_d[r2]);
 break;
 case OPC2_32_RR_LT_WU:
-gen_cond_w(TCG_COND_LTU, cpu_gpr_d[r3], cpu_gpr_d[r1], cpu_gpr_d[r2]);
+tcg_gen_negsetcond_tl(TCG_COND_LTU, cpu_gpr_d[r3],
+  cpu_gpr_d[r1], cpu_gpr_d[r2]);
 break;
 case OPC2_32_RR_MAX:
 tcg_gen_movcond_tl(TCG_COND_GT, cpu_gpr_d[r3], cpu_gpr_d[r1],
-- 
2.34.1




[PULL 37/48] tcg/i386: Use shift in tcg_out_setcond

2023-08-23 Thread Richard Henderson
For LT/GE vs zero, shift down the sign bit.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 tcg/i386/tcg-target.c.inc | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc
index 3f3c114efd..16e830051d 100644
--- a/tcg/i386/tcg-target.c.inc
+++ b/tcg/i386/tcg-target.c.inc
@@ -1578,6 +1578,21 @@ static void tcg_out_setcond(TCGContext *s, int rexw, 
TCGCond cond,
 }
 return;
 
+case TCG_COND_GE:
+inv = true;
+/* fall through */
+case TCG_COND_LT:
+/* If arg2 is 0, extract the sign bit. */
+if (const_arg2 && arg2 == 0) {
+tcg_out_mov(s, rexw ? TCG_TYPE_I64 : TCG_TYPE_I32, dest, arg1);
+if (inv) {
+tcg_out_modrm(s, OPC_GRP3_Ev + rexw, EXT3_NOT, dest);
+}
+tcg_out_shifti(s, SHIFT_SHR + rexw, dest, rexw ? 63 : 31);
+return;
+}
+break;
+
 default:
 break;
 }
-- 
2.34.1
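
The same idea in C (illustrative): a logical shift of the (possibly
inverted) value extracts the sign bit as 0 or 1.

#include <stdint.h>

/* setcond(x < 0):  MOV dest, x ; SHR dest, 31 */
static uint32_t setcond_lt0(uint32_t x) { return x >> 31; }

/* setcond(x >= 0): MOV dest, x ; NOT dest ; SHR dest, 31 */
static uint32_t setcond_ge0(uint32_t x) { return ~x >> 31; }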




[PULL 06/48] include/exec: typedef abi_ptr to vaddr in softmmu

2023-08-23 Thread Richard Henderson
From: Anton Johansson via 

In system mode, abi_ptr is primarily used for representing addresses
when accessing guest memory with cpu_[st|ld]*(). Widening it from
target_ulong to vaddr reduces the target dependence of these functions
and is a step towards building accel/ once for system mode.

Signed-off-by: Anton Johansson 
Reviewed-by: Richard Henderson 
Message-Id: <20230807155706.9580-7-a...@rev.ng>
Signed-off-by: Richard Henderson 
---
 include/exec/cpu_ldst.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/exec/cpu_ldst.h b/include/exec/cpu_ldst.h
index da10ba1433..f3ce4eb1d0 100644
--- a/include/exec/cpu_ldst.h
+++ b/include/exec/cpu_ldst.h
@@ -121,8 +121,8 @@ static inline bool guest_range_valid_untagged(abi_ulong 
start, abi_ulong len)
 h2g_nocheck(x); \
 })
 #else
-typedef target_ulong abi_ptr;
-#define TARGET_ABI_FMT_ptr TARGET_FMT_lx
+typedef vaddr abi_ptr;
+#define TARGET_ABI_FMT_ptr "%016" VADDR_PRIx
 #endif
 
 uint32_t cpu_ldub_data(CPUArchState *env, abi_ptr ptr);
-- 
2.34.1




[PULL 26/48] tcg/ppc: Use the Set Boolean Extension

2023-08-23 Thread Richard Henderson
The SETBC family of instructions requires exactly two insns for
all comparisons, saving 0-3 insns per (neg)setcond.

Tested-by: Nicholas Piggin 
Reviewed-by: Daniel Henrique Barboza 
Signed-off-by: Richard Henderson 
---
 tcg/ppc/tcg-target.c.inc | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/tcg/ppc/tcg-target.c.inc b/tcg/ppc/tcg-target.c.inc
index 10448aa0e6..090f11e71c 100644
--- a/tcg/ppc/tcg-target.c.inc
+++ b/tcg/ppc/tcg-target.c.inc
@@ -447,6 +447,11 @@ static bool tcg_target_const_match(int64_t val, TCGType 
type, int ct)
 #define TW XO31( 4)
 #define TRAP   (TW | TO(31))
 
+#define SETBC    XO31(384)  /* v3.10 */
+#define SETBCR   XO31(416)  /* v3.10 */
+#define SETNBC   XO31(448)  /* v3.10 */
+#define SETNBCR  XO31(480)  /* v3.10 */
+
 #define NOP    ORI  /* ori 0,0,0 */
 
 #define LVX    XO31(103)
@@ -1624,6 +1629,23 @@ static void tcg_out_setcond(TCGContext *s, TCGType type, 
TCGCond cond,
 arg2 = (uint32_t)arg2;
 }
 
+/* With SETBC/SETBCR, we can always implement with 2 insns. */
+if (have_isa_3_10) {
+tcg_insn_unit bi, opc;
+
+tcg_out_cmp(s, cond, arg1, arg2, const_arg2, 7, type);
+
+/* Re-use tcg_to_bc for BI and BO_COND_{TRUE,FALSE}. */
+bi = tcg_to_bc[cond] & (0x1f << 16);
+if (tcg_to_bc[cond] & BO(8)) {
+opc = neg ? SETNBC : SETBC;
+} else {
+opc = neg ? SETNBCR : SETBCR;
+}
+tcg_out32(s, opc | RT(arg0) | bi);
+return;
+}
+
 /* Handle common and trivial cases before handling anything else.  */
 if (arg2 == 0) {
 switch (cond) {
-- 
2.34.1
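
For reference, the semantics of the set-boolean forms used above, modeled
in C (a sketch; 'crbit' stands for the tested CR bit):

/* SETBC   rt = crbit ?  1 : 0      SETBCR  rt = crbit ? 0 :  1
 * SETNBC  rt = crbit ? -1 : 0      SETNBCR rt = crbit ? 0 : -1
 */
static long setbc_model(int crbit, int reversed, int neg)
{
    long t = neg ? -1 : 1;
    return (crbit ^ reversed) ? t : 0;
}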




[PULL 35/48] tcg/i386: Use CMP+SBB in tcg_out_setcond

2023-08-23 Thread Richard Henderson
Use the carry bit to optimize some forms of setcond.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 tcg/i386/tcg-target.c.inc | 50 +++
 1 file changed, 50 insertions(+)

diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc
index 1542afd94d..4d7b745a52 100644
--- a/tcg/i386/tcg-target.c.inc
+++ b/tcg/i386/tcg-target.c.inc
@@ -1531,6 +1531,56 @@ static void tcg_out_setcond(TCGContext *s, int rexw, 
TCGCond cond,
 TCGArg dest, TCGArg arg1, TCGArg arg2,
 int const_arg2)
 {
+bool inv = false;
+
+switch (cond) {
+case TCG_COND_NE:
+inv = true;
+/* fall through */
+case TCG_COND_EQ:
+/* If arg2 is 0, convert to LTU/GEU vs 1. */
+if (const_arg2 && arg2 == 0) {
+arg2 = 1;
+goto do_ltu;
+}
+break;
+
+case TCG_COND_LEU:
+inv = true;
+/* fall through */
+case TCG_COND_GTU:
+/* If arg2 is a register, swap for LTU/GEU. */
+if (!const_arg2) {
+TCGReg t = arg1;
+arg1 = arg2;
+arg2 = t;
+goto do_ltu;
+}
+break;
+
+case TCG_COND_GEU:
+inv = true;
+/* fall through */
+case TCG_COND_LTU:
+do_ltu:
+/*
+ * Relying on the carry bit, use SBB to produce -1 if LTU, 0 if GEU.
+ * We can then use NEG or INC to produce the desired result.
+ * This is always smaller than the SETCC expansion.
+ */
+tcg_out_cmp(s, arg1, arg2, const_arg2, rexw);
+tgen_arithr(s, ARITH_SBB, dest, dest);  /* T:-1 F:0 */
+if (inv) {
+tgen_arithi(s, ARITH_ADD, dest, 1, 0);  /* T:0  F:1 */
+} else {
+tcg_out_modrm(s, OPC_GRP3_Ev, EXT3_NEG, dest);  /* T:1  F:0 */
+}
+return;
+
+default:
+break;
+}
+
 tcg_out_cmp(s, arg1, arg2, const_arg2, rexw);
 tcg_out_modrm(s, OPC_SETCC | tcg_cond_to_jcc[cond], 0, dest);
 tcg_out_ext8u(s, dest, dest);
-- 
2.34.1
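
How the CMP+SBB idiom works, in a C sketch (illustrative; CF is the x86
carry flag set by the compare):

#include <stdint.h>

/* CMP a, b sets CF = (a < b) unsigned; SBB dest, dest then computes
 * dest - dest - CF = -CF, i.e. -1 for LTU and 0 for GEU.  NEG turns
 * that into 1/0, and ADD 1 into 0/1 for the inverted conditions.
 */
static int32_t sbb_after_cmp(uint32_t a, uint32_t b)
{
    int32_t cf = (a < b);
    return -cf;
}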




[PULL 36/48] tcg/i386: Clear dest first in tcg_out_setcond if possible

2023-08-23 Thread Richard Henderson
Using XOR first is both smaller and more efficient,
though it cannot be applied if it would clobber an input.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 tcg/i386/tcg-target.c.inc | 17 -
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc
index 4d7b745a52..3f3c114efd 100644
--- a/tcg/i386/tcg-target.c.inc
+++ b/tcg/i386/tcg-target.c.inc
@@ -1532,6 +1532,7 @@ static void tcg_out_setcond(TCGContext *s, int rexw, 
TCGCond cond,
 int const_arg2)
 {
 bool inv = false;
+bool cleared;
 
 switch (cond) {
 case TCG_COND_NE:
@@ -1581,9 +1582,23 @@ static void tcg_out_setcond(TCGContext *s, int rexw, 
TCGCond cond,
 break;
 }
 
+/*
+ * If dest does not overlap the inputs, clearing it first is preferred.
+ * The XOR breaks any false dependency for the low-byte write to dest,
+ * and is also one byte smaller than MOVZBL.
+ */
+cleared = false;
+if (dest != arg1 && (const_arg2 || dest != arg2)) {
+tgen_arithr(s, ARITH_XOR, dest, dest);
+cleared = true;
+}
+
 tcg_out_cmp(s, arg1, arg2, const_arg2, rexw);
 tcg_out_modrm(s, OPC_SETCC | tcg_cond_to_jcc[cond], 0, dest);
-tcg_out_ext8u(s, dest, dest);
+
+if (!cleared) {
+tcg_out_ext8u(s, dest, dest);
+}
 }
 
 #if TCG_TARGET_REG_BITS == 32
-- 
2.34.1




[PULL 25/48] tcg/ppc: Implement negsetcond_*

2023-08-23 Thread Richard Henderson
In the general case we simply negate.  However with isel we
may load -1 instead of 1 with no extra effort.

Consolidate EQ0 and NE0 logic.  Replace the NE0 zero-extension
with inversion+negation of EQ0, which is never worse and may
eliminate one insn.  Provide a special case for -EQ0.

Reviewed-by: Daniel Henrique Barboza 
Signed-off-by: Richard Henderson 
---
 tcg/ppc/tcg-target.h |   4 +-
 tcg/ppc/tcg-target.c.inc | 127 ---
 2 files changed, 82 insertions(+), 49 deletions(-)

diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h
index a2ca0b44ce..8bfb14998e 100644
--- a/tcg/ppc/tcg-target.h
+++ b/tcg/ppc/tcg-target.h
@@ -97,7 +97,7 @@ typedef enum {
 #define TCG_TARGET_HAS_sextract_i32 0
 #define TCG_TARGET_HAS_extract2_i32 0
 #define TCG_TARGET_HAS_movcond_i32  1
-#define TCG_TARGET_HAS_negsetcond_i32   0
+#define TCG_TARGET_HAS_negsetcond_i32   1
 #define TCG_TARGET_HAS_mulu2_i32 0
 #define TCG_TARGET_HAS_muls2_i32 0
 #define TCG_TARGET_HAS_muluh_i32 1
@@ -135,7 +135,7 @@ typedef enum {
 #define TCG_TARGET_HAS_sextract_i64 0
 #define TCG_TARGET_HAS_extract2_i64 0
 #define TCG_TARGET_HAS_movcond_i64  1
-#define TCG_TARGET_HAS_negsetcond_i64   0
+#define TCG_TARGET_HAS_negsetcond_i64   1
 #define TCG_TARGET_HAS_add2_i64 1
 #define TCG_TARGET_HAS_sub2_i64 1
 #define TCG_TARGET_HAS_mulu2_i64 0
diff --git a/tcg/ppc/tcg-target.c.inc b/tcg/ppc/tcg-target.c.inc
index 511e14b180..10448aa0e6 100644
--- a/tcg/ppc/tcg-target.c.inc
+++ b/tcg/ppc/tcg-target.c.inc
@@ -1548,8 +1548,20 @@ static void tcg_out_cmp(TCGContext *s, int cond, TCGArg 
arg1, TCGArg arg2,
 }
 
 static void tcg_out_setcond_eq0(TCGContext *s, TCGType type,
-TCGReg dst, TCGReg src)
+TCGReg dst, TCGReg src, bool neg)
 {
+if (neg && (TCG_TARGET_REG_BITS == 32 || type == TCG_TYPE_I64)) {
+/*
+ * X != 0 implies X + -1 generates a carry.
+ * RT = (~X + X) + CA
+ *= -1 + CA
+ *= CA ? 0 : -1
+ */
+tcg_out32(s, ADDIC | TAI(TCG_REG_R0, src, -1));
+tcg_out32(s, SUBFE | TAB(dst, src, src));
+return;
+}
+
 if (type == TCG_TYPE_I32) {
 tcg_out32(s, CNTLZW | RS(src) | RA(dst));
 tcg_out_shri32(s, dst, dst, 5);
@@ -1557,18 +1569,28 @@ static void tcg_out_setcond_eq0(TCGContext *s, TCGType 
type,
 tcg_out32(s, CNTLZD | RS(src) | RA(dst));
 tcg_out_shri64(s, dst, dst, 6);
 }
+if (neg) {
+tcg_out32(s, NEG | RT(dst) | RA(dst));
+}
 }
 
-static void tcg_out_setcond_ne0(TCGContext *s, TCGReg dst, TCGReg src)
+static void tcg_out_setcond_ne0(TCGContext *s, TCGType type,
+TCGReg dst, TCGReg src, bool neg)
 {
-/* X != 0 implies X + -1 generates a carry.  Extra addition
-   trickery means: R = X-1 + ~X + C = X-1 + (-X+1) + C = C.  */
-if (dst != src) {
-tcg_out32(s, ADDIC | TAI(dst, src, -1));
-tcg_out32(s, SUBFE | TAB(dst, dst, src));
-} else {
+if (!neg && (TCG_TARGET_REG_BITS == 32 || type == TCG_TYPE_I64)) {
+/*
+ * X != 0 implies X + -1 generates a carry.  Extra addition
+ * trickery means: R = X-1 + ~X + C = X-1 + (-X+1) + C = C.
+ */
 tcg_out32(s, ADDIC | TAI(TCG_REG_R0, src, -1));
 tcg_out32(s, SUBFE | TAB(dst, TCG_REG_R0, src));
+return;
+}
+tcg_out_setcond_eq0(s, type, dst, src, false);
+if (neg) {
+tcg_out32(s, ADDI | TAI(dst, dst, -1));
+} else {
+tcg_out_xori32(s, dst, dst, 1);
 }
 }
 
@@ -1590,9 +1612,10 @@ static TCGReg tcg_gen_setcond_xor(TCGContext *s, TCGReg 
arg1, TCGArg arg2,
 
 static void tcg_out_setcond(TCGContext *s, TCGType type, TCGCond cond,
 TCGArg arg0, TCGArg arg1, TCGArg arg2,
-int const_arg2)
+int const_arg2, bool neg)
 {
-int crop, sh;
+int sh;
+bool inv;
 
 tcg_debug_assert(TCG_TARGET_REG_BITS == 64 || type == TCG_TYPE_I32);
 
@@ -1605,14 +1628,10 @@ static void tcg_out_setcond(TCGContext *s, TCGType 
type, TCGCond cond,
 if (arg2 == 0) {
 switch (cond) {
 case TCG_COND_EQ:
-tcg_out_setcond_eq0(s, type, arg0, arg1);
+tcg_out_setcond_eq0(s, type, arg0, arg1, neg);
 return;
 case TCG_COND_NE:
-if (TCG_TARGET_REG_BITS == 64 && type == TCG_TYPE_I32) {
-tcg_out_ext32u(s, TCG_REG_R0, arg1);
-arg1 = TCG_REG_R0;
-}
-tcg_out_setcond_ne0(s, arg0, arg1);
+tcg_out_setcond_ne0(s, type, arg0, arg1, neg);
 return;
 case TCG_COND_GE:
 tcg_out32(s, NOR | SAB(arg1, arg0, arg1));
@@ -1621,9 +1640,17 @@ static void tcg_out_setcond(TCGContext *s, TCGType type, 
TCGCond cond,
 case TCG_COND_LT:
 /* 

[PULL 08/48] accel/tcg: Widen address arg in tlb_compare_set()

2023-08-23 Thread Richard Henderson
From: Anton Johansson via 

Signed-off-by: Anton Johansson 
Reviewed-by: Richard Henderson 
Message-Id: <20230807155706.9580-9-a...@rev.ng>
Signed-off-by: Richard Henderson 
---
 accel/tcg/cputlb.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index 11095c4f5f..bd2cf4b0be 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -1108,7 +1108,7 @@ static void tlb_add_large_page(CPUArchState *env, int 
mmu_idx,
 }
 
 static inline void tlb_set_compare(CPUTLBEntryFull *full, CPUTLBEntry *ent,
-   target_ulong address, int flags,
+   vaddr address, int flags,
MMUAccessType access_type, bool enable)
 {
 if (enable) {
-- 
2.34.1




[PULL 04/48] sysemu/hvf: Use vaddr for hvf_arch_[insert|remove]_hw_breakpoint

2023-08-23 Thread Richard Henderson
From: Anton Johansson via 

Changes the signature of the target-defined functions for
inserting/removing hvf hw breakpoints. The address and length arguments
are now of vaddr type, which both matches the type used internally in
accel/hvf/hvf-all.c and makes the API target-agnostic.

Signed-off-by: Anton Johansson 
Reviewed-by: Richard Henderson 
Message-Id: <20230807155706.9580-5-a...@rev.ng>
Signed-off-by: Richard Henderson 
---
 include/sysemu/hvf.h  | 6 ++
 target/arm/hvf/hvf.c  | 4 ++--
 target/i386/hvf/hvf.c | 4 ++--
 3 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/include/sysemu/hvf.h b/include/sysemu/hvf.h
index 4cbae87ced..4037cd6a73 100644
--- a/include/sysemu/hvf.h
+++ b/include/sysemu/hvf.h
@@ -51,10 +51,8 @@ int hvf_sw_breakpoints_active(CPUState *cpu);
 
 int hvf_arch_insert_sw_breakpoint(CPUState *cpu, struct hvf_sw_breakpoint *bp);
 int hvf_arch_remove_sw_breakpoint(CPUState *cpu, struct hvf_sw_breakpoint *bp);
-int hvf_arch_insert_hw_breakpoint(target_ulong addr, target_ulong len,
-  int type);
-int hvf_arch_remove_hw_breakpoint(target_ulong addr, target_ulong len,
-  int type);
+int hvf_arch_insert_hw_breakpoint(vaddr addr, vaddr len, int type);
+int hvf_arch_remove_hw_breakpoint(vaddr addr, vaddr len, int type);
 void hvf_arch_remove_all_hw_breakpoints(void);
 
 /*
diff --git a/target/arm/hvf/hvf.c b/target/arm/hvf/hvf.c
index 8fce64bbf6..486f90be1d 100644
--- a/target/arm/hvf/hvf.c
+++ b/target/arm/hvf/hvf.c
@@ -2063,7 +2063,7 @@ int hvf_arch_remove_sw_breakpoint(CPUState *cpu, struct 
hvf_sw_breakpoint *bp)
 return 0;
 }
 
-int hvf_arch_insert_hw_breakpoint(target_ulong addr, target_ulong len, int 
type)
+int hvf_arch_insert_hw_breakpoint(vaddr addr, vaddr len, int type)
 {
 switch (type) {
 case GDB_BREAKPOINT_HW:
@@ -2077,7 +2077,7 @@ int hvf_arch_insert_hw_breakpoint(target_ulong addr, 
target_ulong len, int type)
 }
 }
 
-int hvf_arch_remove_hw_breakpoint(target_ulong addr, target_ulong len, int 
type)
+int hvf_arch_remove_hw_breakpoint(vaddr addr, vaddr len, int type)
 {
 switch (type) {
 case GDB_BREAKPOINT_HW:
diff --git a/target/i386/hvf/hvf.c b/target/i386/hvf/hvf.c
index b9cbcc02a8..cb2cd0b02f 100644
--- a/target/i386/hvf/hvf.c
+++ b/target/i386/hvf/hvf.c
@@ -690,12 +690,12 @@ int hvf_arch_remove_sw_breakpoint(CPUState *cpu, struct 
hvf_sw_breakpoint *bp)
 return -ENOSYS;
 }
 
-int hvf_arch_insert_hw_breakpoint(target_ulong addr, target_ulong len, int 
type)
+int hvf_arch_insert_hw_breakpoint(vaddr addr, vaddr len, int type)
 {
 return -ENOSYS;
 }
 
-int hvf_arch_remove_hw_breakpoint(target_ulong addr, target_ulong len, int 
type)
+int hvf_arch_remove_hw_breakpoint(vaddr addr, vaddr len, int type)
 {
 return -ENOSYS;
 }
-- 
2.34.1




[PULL 10/48] target/m68k: Use tcg_gen_deposit_i32 in gen_partset_reg

2023-08-23 Thread Richard Henderson
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 target/m68k/translate.c | 11 ++-
 1 file changed, 2 insertions(+), 9 deletions(-)

diff --git a/target/m68k/translate.c b/target/m68k/translate.c
index e07161d76f..d08e823b6c 100644
--- a/target/m68k/translate.c
+++ b/target/m68k/translate.c
@@ -697,19 +697,12 @@ static inline int ext_opsize(int ext, int pos)
  */
 static void gen_partset_reg(int opsize, TCGv reg, TCGv val)
 {
-TCGv tmp;
 switch (opsize) {
 case OS_BYTE:
-tcg_gen_andi_i32(reg, reg, 0xffffff00);
-tmp = tcg_temp_new();
-tcg_gen_ext8u_i32(tmp, val);
-tcg_gen_or_i32(reg, reg, tmp);
+tcg_gen_deposit_i32(reg, reg, val, 0, 8);
 break;
 case OS_WORD:
-tcg_gen_andi_i32(reg, reg, 0xffff0000);
-tmp = tcg_temp_new();
-tcg_gen_ext16u_i32(tmp, val);
-tcg_gen_or_i32(reg, reg, tmp);
+tcg_gen_deposit_i32(reg, reg, val, 0, 16);
 break;
 case OS_LONG:
 case OS_SINGLE:
-- 
2.34.1
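
The replacement works because a deposit at bit 0 has exactly the
and/extend/or semantics being removed; in C (illustrative):

#include <stdint.h>

/* tcg_gen_deposit_i32(reg, reg, val, 0, 8) keeps reg's top 24 bits
 * and inserts the low 8 bits of val -- the same as the old sequence.
 */
static uint32_t partset_byte(uint32_t reg, uint32_t val)
{
    return (reg & 0xffffff00) | (val & 0xff);
}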




[PULL 31/48] tcg/sparc64: Implement negsetcond_*

2023-08-23 Thread Richard Henderson
Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 tcg/sparc64/tcg-target.h |  4 ++--
 tcg/sparc64/tcg-target.c.inc | 40 +++-
 2 files changed, 32 insertions(+), 12 deletions(-)

diff --git a/tcg/sparc64/tcg-target.h b/tcg/sparc64/tcg-target.h
index 79889db760..3d41c9659b 100644
--- a/tcg/sparc64/tcg-target.h
+++ b/tcg/sparc64/tcg-target.h
@@ -106,7 +106,7 @@ extern bool use_vis3_instructions;
 #define TCG_TARGET_HAS_sextract_i32 0
 #define TCG_TARGET_HAS_extract2_i32 0
 #define TCG_TARGET_HAS_movcond_i32  1
-#define TCG_TARGET_HAS_negsetcond_i32   0
+#define TCG_TARGET_HAS_negsetcond_i32   1
 #define TCG_TARGET_HAS_add2_i32 1
 #define TCG_TARGET_HAS_sub2_i32 1
 #define TCG_TARGET_HAS_mulu2_i32 1
@@ -143,7 +143,7 @@ extern bool use_vis3_instructions;
 #define TCG_TARGET_HAS_sextract_i64 0
 #define TCG_TARGET_HAS_extract2_i64 0
 #define TCG_TARGET_HAS_movcond_i64  1
-#define TCG_TARGET_HAS_negsetcond_i64   0
+#define TCG_TARGET_HAS_negsetcond_i64   1
 #define TCG_TARGET_HAS_add2_i64 1
 #define TCG_TARGET_HAS_sub2_i64 1
 #define TCG_TARGET_HAS_mulu2_i64 0
diff --git a/tcg/sparc64/tcg-target.c.inc b/tcg/sparc64/tcg-target.c.inc
index ffcb879211..f2a346a1bd 100644
--- a/tcg/sparc64/tcg-target.c.inc
+++ b/tcg/sparc64/tcg-target.c.inc
@@ -720,7 +720,7 @@ static void tcg_out_movcond_i64(TCGContext *s, TCGCond 
cond, TCGReg ret,
 }
 
 static void tcg_out_setcond_i32(TCGContext *s, TCGCond cond, TCGReg ret,
-TCGReg c1, int32_t c2, int c2const)
+TCGReg c1, int32_t c2, int c2const, bool neg)
 {
 /* For 32-bit comparisons, we can play games with ADDC/SUBC.  */
 switch (cond) {
@@ -760,22 +760,34 @@ static void tcg_out_setcond_i32(TCGContext *s, TCGCond 
cond, TCGReg ret,
 default:
 tcg_out_cmp(s, c1, c2, c2const);
 tcg_out_movi_s13(s, ret, 0);
-tcg_out_movcc(s, cond, MOVCC_ICC, ret, 1, 1);
+tcg_out_movcc(s, cond, MOVCC_ICC, ret, neg ? -1 : 1, 1);
 return;
 }
 
 tcg_out_cmp(s, c1, c2, c2const);
 if (cond == TCG_COND_LTU) {
-tcg_out_arithi(s, ret, TCG_REG_G0, 0, ARITH_ADDC);
+if (neg) {
+/* 0 - 0 - C = -C = (C ? -1 : 0) */
+tcg_out_arithi(s, ret, TCG_REG_G0, 0, ARITH_SUBC);
+} else {
+/* 0 + 0 + C =  C = (C ? 1 : 0) */
+tcg_out_arithi(s, ret, TCG_REG_G0, 0, ARITH_ADDC);
+}
 } else {
-tcg_out_arithi(s, ret, TCG_REG_G0, -1, ARITH_SUBC);
+if (neg) {
+/* 0 + -1 + C = C - 1 = (C ? 0 : -1) */
+tcg_out_arithi(s, ret, TCG_REG_G0, -1, ARITH_ADDC);
+} else {
+/* 0 - -1 - C = 1 - C = (C ? 0 : 1) */
+tcg_out_arithi(s, ret, TCG_REG_G0, -1, ARITH_SUBC);
+}
 }
 }
 
 static void tcg_out_setcond_i64(TCGContext *s, TCGCond cond, TCGReg ret,
-TCGReg c1, int32_t c2, int c2const)
+TCGReg c1, int32_t c2, int c2const, bool neg)
 {
-if (use_vis3_instructions) {
+if (use_vis3_instructions && !neg) {
 switch (cond) {
 case TCG_COND_NE:
 if (c2 != 0) {
@@ -796,11 +808,11 @@ static void tcg_out_setcond_i64(TCGContext *s, TCGCond 
cond, TCGReg ret,
if the input does not overlap the output.  */
 if (c2 == 0 && !is_unsigned_cond(cond) && c1 != ret) {
 tcg_out_movi_s13(s, ret, 0);
-tcg_out_movr(s, cond, ret, c1, 1, 1);
+tcg_out_movr(s, cond, ret, c1, neg ? -1 : 1, 1);
 } else {
 tcg_out_cmp(s, c1, c2, c2const);
 tcg_out_movi_s13(s, ret, 0);
-tcg_out_movcc(s, cond, MOVCC_XCC, ret, 1, 1);
+tcg_out_movcc(s, cond, MOVCC_XCC, ret, neg ? -1 : 1, 1);
 }
 }
 
@@ -1355,7 +1367,10 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
 tcg_out_brcond_i32(s, a2, a0, a1, const_args[1], arg_label(args[3]));
 break;
 case INDEX_op_setcond_i32:
-tcg_out_setcond_i32(s, args[3], a0, a1, a2, c2);
+tcg_out_setcond_i32(s, args[3], a0, a1, a2, c2, false);
+break;
+case INDEX_op_negsetcond_i32:
+tcg_out_setcond_i32(s, args[3], a0, a1, a2, c2, true);
 break;
 case INDEX_op_movcond_i32:
 tcg_out_movcond_i32(s, args[5], a0, a1, a2, c2, args[3], 
const_args[3]);
@@ -1437,7 +1452,10 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
 tcg_out_brcond_i64(s, a2, a0, a1, const_args[1], arg_label(args[3]));
 break;
 case INDEX_op_setcond_i64:
-tcg_out_setcond_i64(s, args[3], a0, a1, a2, c2);
+tcg_out_setcond_i64(s, args[3], a0, a1, a2, c2, false);
+break;
+case INDEX_op_negsetcond_i64:
+tcg_out_setcond_i64(s, args[3], a0, a1, a2, c2, true);
 break;
 case INDEX_op_movcond_i64:
 tcg_out_movcond_i64(s, args[5], a0, a1, a2, c2, args[3], 

[PULL 05/48] include/exec: Replace target_ulong with abi_ptr in cpu_[st|ld]*()

2023-08-23 Thread Richard Henderson
From: Anton Johansson via 

Changes the address type of the guest memory read/write functions from
target_ulong to abi_ptr. (abi_ptr is currently typedef'd to target_ulong
but that will change in a following commit.) This will reduce the
coupling between accel/ and target/.

Note: Function pointers that point to cpu_[st|ld]*() in target/riscv and
target/rx are also updated in this commit.

Signed-off-by: Anton Johansson 
Reviewed-by: Richard Henderson 
Message-Id: <20230807155706.9580-6-a...@rev.ng>
Signed-off-by: Richard Henderson 
---
 accel/tcg/atomic_template.h  | 16 
 include/exec/cpu_ldst.h  | 24 
 accel/tcg/cputlb.c   | 10 +-
 target/riscv/vector_helper.c |  2 +-
 target/rx/op_helper.c|  6 +++---
 5 files changed, 29 insertions(+), 29 deletions(-)

diff --git a/accel/tcg/atomic_template.h b/accel/tcg/atomic_template.h
index e312acd16d..84c08b1425 100644
--- a/accel/tcg/atomic_template.h
+++ b/accel/tcg/atomic_template.h
@@ -69,7 +69,7 @@
 # define END  _le
 #endif
 
-ABI_TYPE ATOMIC_NAME(cmpxchg)(CPUArchState *env, target_ulong addr,
+ABI_TYPE ATOMIC_NAME(cmpxchg)(CPUArchState *env, abi_ptr addr,
   ABI_TYPE cmpv, ABI_TYPE newv,
   MemOpIdx oi, uintptr_t retaddr)
 {
@@ -87,7 +87,7 @@ ABI_TYPE ATOMIC_NAME(cmpxchg)(CPUArchState *env, target_ulong 
addr,
 }
 
 #if DATA_SIZE < 16
-ABI_TYPE ATOMIC_NAME(xchg)(CPUArchState *env, target_ulong addr, ABI_TYPE val,
+ABI_TYPE ATOMIC_NAME(xchg)(CPUArchState *env, abi_ptr addr, ABI_TYPE val,
MemOpIdx oi, uintptr_t retaddr)
 {
 DATA_TYPE *haddr = atomic_mmu_lookup(env, addr, oi, DATA_SIZE, retaddr);
@@ -100,7 +100,7 @@ ABI_TYPE ATOMIC_NAME(xchg)(CPUArchState *env, target_ulong 
addr, ABI_TYPE val,
 }
 
 #define GEN_ATOMIC_HELPER(X)\
-ABI_TYPE ATOMIC_NAME(X)(CPUArchState *env, target_ulong addr,   \
+ABI_TYPE ATOMIC_NAME(X)(CPUArchState *env, abi_ptr addr,\
 ABI_TYPE val, MemOpIdx oi, uintptr_t retaddr) \
 {   \
 DATA_TYPE *haddr, ret;  \
@@ -131,7 +131,7 @@ GEN_ATOMIC_HELPER(xor_fetch)
  * of CF_PARALLEL's value, we'll trace just a read and a write.
  */
 #define GEN_ATOMIC_HELPER_FN(X, FN, XDATA_TYPE, RET)\
-ABI_TYPE ATOMIC_NAME(X)(CPUArchState *env, target_ulong addr,   \
+ABI_TYPE ATOMIC_NAME(X)(CPUArchState *env, abi_ptr addr,\
 ABI_TYPE xval, MemOpIdx oi, uintptr_t retaddr) \
 {   \
 XDATA_TYPE *haddr, cmp, old, new, val = xval;   \
@@ -172,7 +172,7 @@ GEN_ATOMIC_HELPER_FN(umax_fetch, MAX,  DATA_TYPE, new)
 # define END  _be
 #endif
 
-ABI_TYPE ATOMIC_NAME(cmpxchg)(CPUArchState *env, target_ulong addr,
+ABI_TYPE ATOMIC_NAME(cmpxchg)(CPUArchState *env, abi_ptr addr,
   ABI_TYPE cmpv, ABI_TYPE newv,
   MemOpIdx oi, uintptr_t retaddr)
 {
@@ -190,7 +190,7 @@ ABI_TYPE ATOMIC_NAME(cmpxchg)(CPUArchState *env, 
target_ulong addr,
 }
 
 #if DATA_SIZE < 16
-ABI_TYPE ATOMIC_NAME(xchg)(CPUArchState *env, target_ulong addr, ABI_TYPE val,
+ABI_TYPE ATOMIC_NAME(xchg)(CPUArchState *env, abi_ptr addr, ABI_TYPE val,
MemOpIdx oi, uintptr_t retaddr)
 {
 DATA_TYPE *haddr = atomic_mmu_lookup(env, addr, oi, DATA_SIZE, retaddr);
@@ -203,7 +203,7 @@ ABI_TYPE ATOMIC_NAME(xchg)(CPUArchState *env, target_ulong addr, ABI_TYPE val,
 }
 
 #define GEN_ATOMIC_HELPER(X)\
-ABI_TYPE ATOMIC_NAME(X)(CPUArchState *env, target_ulong addr,   \
+ABI_TYPE ATOMIC_NAME(X)(CPUArchState *env, abi_ptr addr,\
 ABI_TYPE val, MemOpIdx oi, uintptr_t retaddr) \
 {   \
 DATA_TYPE *haddr, ret;  \
@@ -231,7 +231,7 @@ GEN_ATOMIC_HELPER(xor_fetch)
  * of CF_PARALLEL's value, we'll trace just a read and a write.
  */
 #define GEN_ATOMIC_HELPER_FN(X, FN, XDATA_TYPE, RET)\
-ABI_TYPE ATOMIC_NAME(X)(CPUArchState *env, target_ulong addr,   \
+ABI_TYPE ATOMIC_NAME(X)(CPUArchState *env, abi_ptr addr,\
 ABI_TYPE xval, MemOpIdx oi, uintptr_t retaddr) \
 {   \
 XDATA_TYPE *haddr, ldo, ldn, old, new, val = xval;  \
diff --git a/include/exec/cpu_ldst.h b/include/exec/cpu_ldst.h
index 645476f0e5..da10ba1433 100644
--- a/include/exec/cpu_ldst.h
+++ b/include/exec/cpu_ldst.h
@@ -223,31 +223,31 @@ void cpu_stq_mmu(CPUArchState *env, abi_ptr ptr, uint64_t val,
 void cpu_st16_mmu(CPUArchState *env, abi_ptr addr, Int128 val,
   MemOpIdx 

[PULL 14/48] docs/devel/tcg-ops: Bury mentions of trunc_shr_i64_i32()

2023-08-23 Thread Richard Henderson
From: Philippe Mathieu-Daudé 

Commit 609ad70562 ("tcg: Split trunc_shr_i32 opcode into
extr[lh]_i64_i32") removed trunc_shr_i64_i32(). Update the
backend documentation.

Signed-off-by: Philippe Mathieu-Daudé 
Reviewed-by: Richard Henderson 
Message-Id: <20230822162847.71206-1-phi...@linaro.org>
Signed-off-by: Richard Henderson 
---
 docs/devel/tcg-ops.rst | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/docs/devel/tcg-ops.rst b/docs/devel/tcg-ops.rst
index 6a166c5665..53695e1623 100644
--- a/docs/devel/tcg-ops.rst
+++ b/docs/devel/tcg-ops.rst
@@ -882,14 +882,15 @@ sub2_i32, brcond2_i32).
 On a 64 bit target, the values are transferred between 32 and 64-bit
 registers using the following ops:
 
-- trunc_shr_i64_i32
+- extrl_i64_i32
+- extrh_i64_i32
 - ext_i32_i64
 - extu_i32_i64
 
 They ensure that the values are correctly truncated or extended when
 moved from a 32-bit to a 64-bit register or vice-versa. Note that the
-trunc_shr_i64_i32 is an optional op. It is not necessary to implement
-it if all the following conditions are met:
+extrl_i64_i32 and extrh_i64_i32 are optional ops. It is not necessary
+to implement them if all the following conditions are met:
 
 - 64-bit registers can hold 32-bit values
 - 32-bit values in a 64-bit register do not need to stay zero or
-- 
2.34.1
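
For reference, the semantics of the two replacement ops in plain C (a
minimal sketch for readers of the updated documentation, not part of
the patch):

    #include <assert.h>
    #include <stdint.h>

    /* extrl_i64_i32: the low 32 bits; extrh_i64_i32: the high 32 bits. */
    static uint32_t extrl_i64_i32(uint64_t src) { return (uint32_t)src; }
    static uint32_t extrh_i64_i32(uint64_t src) { return (uint32_t)(src >> 32); }

    int main(void)
    {
        uint64_t v = 0x1122334455667788ull;
        assert(extrl_i64_i32(v) == 0x55667788u);
        assert(extrh_i64_i32(v) == 0x11223344u);
        return 0;
    }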




[PULL 11/48] tcg/i386: Drop BYTEH deposits for 64-bit

2023-08-23 Thread Richard Henderson
It is more useful to allow low-part deposits into all registers
than to restrict allocation for high-byte deposits.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 tcg/i386/tcg-target-con-set.h | 2 +-
 tcg/i386/tcg-target-con-str.h | 1 -
 tcg/i386/tcg-target.h | 4 ++--
 tcg/i386/tcg-target.c.inc | 7 +++
 4 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/tcg/i386/tcg-target-con-set.h b/tcg/i386/tcg-target-con-set.h
index 5ea3a292f0..3949d49538 100644
--- a/tcg/i386/tcg-target-con-set.h
+++ b/tcg/i386/tcg-target-con-set.h
@@ -33,7 +33,7 @@ C_O1_I1(r, q)
 C_O1_I1(r, r)
 C_O1_I1(x, r)
 C_O1_I1(x, x)
-C_O1_I2(Q, 0, Q)
+C_O1_I2(q, 0, q)
 C_O1_I2(q, r, re)
 C_O1_I2(r, 0, ci)
 C_O1_I2(r, 0, r)
diff --git a/tcg/i386/tcg-target-con-str.h b/tcg/i386/tcg-target-con-str.h
index 24e6bcb80d..95a30e58cd 100644
--- a/tcg/i386/tcg-target-con-str.h
+++ b/tcg/i386/tcg-target-con-str.h
@@ -19,7 +19,6 @@ REGS('D', 1u << TCG_REG_EDI)
 REGS('r', ALL_GENERAL_REGS)
 REGS('x', ALL_VECTOR_REGS)
 REGS('q', ALL_BYTEL_REGS) /* regs that can be used as a byte operand */
-REGS('Q', ALL_BYTEH_REGS) /* regs with a second byte (e.g. %ah) */
 REGS('L', ALL_GENERAL_REGS & ~SOFTMMU_RESERVE_REGS)  /* qemu_ld/st */
 REGS('s', ALL_BYTEL_REGS & ~SOFTMMU_RESERVE_REGS)/* qemu_st8_i32 data */
 
diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index 2a2e3fffa8..30cce01ca4 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -227,8 +227,8 @@ typedef enum {
 #define TCG_TARGET_HAS_cmpsel_vec   -1
 
 #define TCG_TARGET_deposit_i32_valid(ofs, len) \
-(((ofs) == 0 && (len) == 8) || ((ofs) == 8 && (len) == 8) || \
- ((ofs) == 0 && (len) == 16))
+(((ofs) == 0 && ((len) == 8 || (len) == 16)) || \
+ (TCG_TARGET_REG_BITS == 32 && (ofs) == 8 && (len) == 8))
 #define TCG_TARGET_deposit_i64_validTCG_TARGET_deposit_i32_valid
 
 /* Check for the possibility of high-byte extraction and, for 64-bit,
diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc
index a6b2eae995..ba40dd0f4d 100644
--- a/tcg/i386/tcg-target.c.inc
+++ b/tcg/i386/tcg-target.c.inc
@@ -144,7 +144,6 @@ static TCGReg tcg_target_call_oarg_reg(TCGCallReturnKind kind, int slot)
 # define TCG_REG_L1 TCG_REG_EDX
 #endif
 
-#define ALL_BYTEH_REGS 0x000fu
 #if TCG_TARGET_REG_BITS == 64
 # define ALL_GENERAL_REGS  0xu
 # define ALL_VECTOR_REGS   0xu
@@ -152,7 +151,7 @@ static TCGReg tcg_target_call_oarg_reg(TCGCallReturnKind kind, int slot)
 #else
 # define ALL_GENERAL_REGS  0x00ffu
 # define ALL_VECTOR_REGS   0x00ffu
-# define ALL_BYTEL_REGSALL_BYTEH_REGS
+# define ALL_BYTEL_REGS0x000fu
 #endif
 #ifdef CONFIG_SOFTMMU
 # define SOFTMMU_RESERVE_REGS  ((1 << TCG_REG_L0) | (1 << TCG_REG_L1))
@@ -2752,7 +2751,7 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
 if (args[3] == 0 && args[4] == 8) {
 /* load bits 0..7 */
 tcg_out_modrm(s, OPC_MOVB_EvGv | P_REXB_R | P_REXB_RM, a2, a0);
-} else if (args[3] == 8 && args[4] == 8) {
+} else if (TCG_TARGET_REG_BITS == 32 && args[3] == 8 && args[4] == 8) {
 /* load bits 8..15 */
 tcg_out_modrm(s, OPC_MOVB_EvGv, a2, a0 + 4);
 } else if (args[3] == 0 && args[4] == 16) {
@@ -3312,7 +3311,7 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
 
 case INDEX_op_deposit_i32:
 case INDEX_op_deposit_i64:
-return C_O1_I2(Q, 0, Q);
+return C_O1_I2(q, 0, q);
 
 case INDEX_op_setcond_i32:
 case INDEX_op_setcond_i64:
-- 
2.34.1
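
Background for the constraint change, as a plain-C sketch (illustrative
only, not from the patch): a deposit at offset 0, length 8 writes a
register's low byte, which works for every register in 64-bit mode,
while a deposit at offset 8, length 8 maps to one of the legacy
high-byte registers (%ah/%bh/%ch/%dh) that exist only for the first
four GPRs; hence the old BYTEH/'Q' constraint, now kept for 32-bit
mode only.

    #include <assert.h>
    #include <stdint.h>

    /* ofs 0, len 8: a byte move into e.g. %al. */
    static uint32_t deposit_byte0(uint32_t a, uint32_t b)
    {
        return (a & ~0xffu) | (b & 0xffu);
    }

    /* ofs 8, len 8: a byte move into e.g. %ah. */
    static uint32_t deposit_byte1(uint32_t a, uint32_t b)
    {
        return (a & ~0xff00u) | ((b & 0xffu) << 8);
    }

    int main(void)
    {
        assert(deposit_byte0(0x11223344u, 0xaa) == 0x112233aau);
        assert(deposit_byte1(0x11223344u, 0xaa) == 0x1122aa44u);
        return 0;
    }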




[PULL 12/48] tcg: Fold deposit with zero to and

2023-08-23 Thread Richard Henderson
Inserting a zero into a value, or inserting a value
into zero at offset 0 may be implemented with AND.
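
Concretely, the two folds rest on the identities below; the deposit64()
helper is modeled on QEMU's but written out here for illustration (a
self-contained sketch, not the patch code):

    #include <assert.h>
    #include <stdint.h>

    static uint64_t mask64(unsigned start, unsigned length)
    {
        return (length >= 64 ? ~0ull : (1ull << length) - 1) << start;
    }

    /* Insert the low 'length' bits of fieldval into value at 'start'. */
    static uint64_t deposit64(uint64_t value, unsigned start, unsigned length,
                              uint64_t fieldval)
    {
        uint64_t m = mask64(start, length);
        return (value & ~m) | ((fieldval << start) & m);
    }

    int main(void)
    {
        uint64_t a = 0x1122334455667788ull, b = 0xcafef00ddeadbeefull;

        /* Inserting a value into zero at offset 0: an AND with the mask. */
        assert(deposit64(0, 0, 16, b) == (b & mask64(0, 16)));

        /* Inserting zero into a value: an AND with the inverted mask. */
        assert(deposit64(a, 8, 16, 0) == (a & ~mask64(8, 16)));
        return 0;
    }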

Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 tcg/optimize.c | 37 +
 1 file changed, 37 insertions(+)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index d2156367a3..bbd9bb64c6 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -1279,6 +1279,8 @@ static bool fold_ctpop(OptContext *ctx, TCGOp *op)
 
 static bool fold_deposit(OptContext *ctx, TCGOp *op)
 {
+TCGOpcode and_opc;
+
 if (arg_is_const(op->args[1]) && arg_is_const(op->args[2])) {
 uint64_t t1 = arg_info(op->args[1])->val;
 uint64_t t2 = arg_info(op->args[2])->val;
@@ -1287,6 +1289,41 @@ static bool fold_deposit(OptContext *ctx, TCGOp *op)
 return tcg_opt_gen_movi(ctx, op, op->args[0], t1);
 }
 
+switch (ctx->type) {
+case TCG_TYPE_I32:
+and_opc = INDEX_op_and_i32;
+break;
+case TCG_TYPE_I64:
+and_opc = INDEX_op_and_i64;
+break;
+default:
+g_assert_not_reached();
+}
+
+/* Inserting a value into zero at offset 0. */
+if (arg_is_const(op->args[1])
+&& arg_info(op->args[1])->val == 0
+&& op->args[3] == 0) {
+uint64_t mask = MAKE_64BIT_MASK(0, op->args[4]);
+
+op->opc = and_opc;
+op->args[1] = op->args[2];
+op->args[2] = temp_arg(tcg_constant_internal(ctx->type, mask));
+ctx->z_mask = mask & arg_info(op->args[1])->z_mask;
+return false;
+}
+
+/* Inserting zero into a value. */
+if (arg_is_const(op->args[2])
+&& arg_info(op->args[2])->val == 0) {
+uint64_t mask = deposit64(-1, op->args[3], op->args[4], 0);
+
+op->opc = and_opc;
+op->args[2] = temp_arg(tcg_constant_internal(ctx->type, mask));
+ctx->z_mask = mask & arg_info(op->args[1])->z_mask;
+return false;
+}
+
 ctx->z_mask = deposit64(arg_info(op->args[1])->z_mask,
 op->args[3], op->args[4],
 arg_info(op->args[2])->z_mask);
-- 
2.34.1




[PULL 09/48] accel/tcg: Update run_on_cpu_data static assert

2023-08-23 Thread Richard Henderson
From: Anton Johansson via 

As we are now using vaddr for representing guest addresses, update the
static assert to check that vaddr fits in the run_on_cpu_data union.

Signed-off-by: Anton Johansson 
Reviewed-by: Richard Henderson 
Message-Id: <20230807155706.9580-10-a...@rev.ng>
Signed-off-by: Richard Henderson 
---
 accel/tcg/cputlb.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index bd2cf4b0be..c643d66190 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -74,8 +74,9 @@
 } while (0)
 
 /* run_on_cpu_data.target_ptr should always be big enough for a
- * target_ulong even on 32 bit builds */
-QEMU_BUILD_BUG_ON(sizeof(target_ulong) > sizeof(run_on_cpu_data));
+ * vaddr even on 32 bit builds
+ */
+QEMU_BUILD_BUG_ON(sizeof(vaddr) > sizeof(run_on_cpu_data));
 
 /* We currently can't handle more than 16 bits in the MMUIDX bitmask.
  */
-- 
2.34.1




[PULL 18/48] target/alpha: Use tcg_gen_movcond_i64 in gen_fold_mzero

2023-08-23 Thread Richard Henderson
The setcond + neg + and sequence is a complex method of
performing a conditional move.
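
In scalar terms, the old and new sequences compute the same thing; a
sketch with the -0.0 bit pattern spelled out (illustrative, not the
patch itself):

    #include <assert.h>
    #include <stdint.h>

    static uint64_t old_seq(uint64_t src, uint64_t mzero)
    {
        uint64_t t = -(uint64_t)(src != mzero);  /* setcond + neg: 0 or ~0 */
        return t & src;                          /* and: src, or +0.0      */
    }

    static uint64_t new_seq(uint64_t src, uint64_t mzero)
    {
        return src != mzero ? src : 0;  /* movcond(NE, src, mzero, src, 0) */
    }

    int main(void)
    {
        uint64_t mzero = 1ull << 63;    /* IEEE double -0.0 */
        assert(old_seq(mzero, mzero) == new_seq(mzero, mzero)); /* -> +0.0 */
        assert(old_seq(42, mzero) == new_seq(42, mzero));       /* kept    */
        return 0;
    }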

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/alpha/translate.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/target/alpha/translate.c b/target/alpha/translate.c
index 846f3d8091..0839182a1f 100644
--- a/target/alpha/translate.c
+++ b/target/alpha/translate.c
@@ -517,10 +517,9 @@ static void gen_fold_mzero(TCGCond cond, TCGv dest, TCGv src)
 
 case TCG_COND_GE:
 case TCG_COND_LT:
-/* For >= or <, map -0.0 to +0.0 via comparison and mask.  */
-tcg_gen_setcondi_i64(TCG_COND_NE, dest, src, mzero);
-tcg_gen_neg_i64(dest, dest);
-tcg_gen_and_i64(dest, dest, src);
+/* For >= or <, map -0.0 to +0.0. */
+tcg_gen_movcond_i64(TCG_COND_NE, dest, src, tcg_constant_i64(mzero),
+src, tcg_constant_i64(0));
 break;
 
 default:
-- 
2.34.1




[PULL 33/48] tcg/i386: Merge tcg_out_setcond{32,64}

2023-08-23 Thread Richard Henderson
Pass a rexw parameter instead of duplicating the functions.

Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 tcg/i386/tcg-target.c.inc | 24 +++-
 1 file changed, 7 insertions(+), 17 deletions(-)

diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc
index 33f66ba204..010432d3a9 100644
--- a/tcg/i386/tcg-target.c.inc
+++ b/tcg/i386/tcg-target.c.inc
@@ -1527,23 +1527,16 @@ static void tcg_out_brcond2(TCGContext *s, const TCGArg *args,
 }
 #endif
 
-static void tcg_out_setcond32(TCGContext *s, TCGCond cond, TCGArg dest,
-  TCGArg arg1, TCGArg arg2, int const_arg2)
+static void tcg_out_setcond(TCGContext *s, int rexw, TCGCond cond,
+TCGArg dest, TCGArg arg1, TCGArg arg2,
+int const_arg2)
 {
-tcg_out_cmp(s, arg1, arg2, const_arg2, 0);
+tcg_out_cmp(s, arg1, arg2, const_arg2, rexw);
 tcg_out_modrm(s, OPC_SETCC | tcg_cond_to_jcc[cond], 0, dest);
 tcg_out_ext8u(s, dest, dest);
 }
 
-#if TCG_TARGET_REG_BITS == 64
-static void tcg_out_setcond64(TCGContext *s, TCGCond cond, TCGArg dest,
-  TCGArg arg1, TCGArg arg2, int const_arg2)
-{
-tcg_out_cmp(s, arg1, arg2, const_arg2, P_REXW);
-tcg_out_modrm(s, OPC_SETCC | tcg_cond_to_jcc[cond], 0, dest);
-tcg_out_ext8u(s, dest, dest);
-}
-#else
+#if TCG_TARGET_REG_BITS == 32
 static void tcg_out_setcond2(TCGContext *s, const TCGArg *args,
  const int *const_args)
 {
@@ -2568,8 +2561,8 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
 tcg_out_brcond(s, rexw, a2, a0, a1, const_args[1],
arg_label(args[3]), 0);
 break;
-case INDEX_op_setcond_i32:
-tcg_out_setcond32(s, args[3], a0, a1, a2, const_a2);
+OP_32_64(setcond):
+tcg_out_setcond(s, rexw, args[3], a0, a1, a2, const_a2);
 break;
 case INDEX_op_movcond_i32:
 tcg_out_movcond32(s, args[5], a0, a1, a2, const_a2, args[3]);
@@ -2721,9 +2714,6 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
 }
 break;
 
-case INDEX_op_setcond_i64:
-tcg_out_setcond64(s, args[3], a0, a1, a2, const_a2);
-break;
 case INDEX_op_movcond_i64:
 tcg_out_movcond64(s, args[5], a0, a1, a2, const_a2, args[3]);
 break;
-- 
2.34.1




[PULL 21/48] target/openrisc: Use tcg_gen_negsetcond_*

2023-08-23 Thread Richard Henderson
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/openrisc/translate.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/target/openrisc/translate.c b/target/openrisc/translate.c
index a86360d4f5..7c6f80daf1 100644
--- a/target/openrisc/translate.c
+++ b/target/openrisc/translate.c
@@ -253,9 +253,8 @@ static void gen_mul(DisasContext *dc, TCGv dest, TCGv srca, TCGv srcb)
 
 tcg_gen_muls2_tl(dest, cpu_sr_ov, srca, srcb);
 tcg_gen_sari_tl(t0, dest, TARGET_LONG_BITS - 1);
-tcg_gen_setcond_tl(TCG_COND_NE, cpu_sr_ov, cpu_sr_ov, t0);
+tcg_gen_negsetcond_tl(TCG_COND_NE, cpu_sr_ov, cpu_sr_ov, t0);
 
-tcg_gen_neg_tl(cpu_sr_ov, cpu_sr_ov);
 gen_ove_ov(dc);
 }
 
@@ -309,9 +308,8 @@ static void gen_muld(DisasContext *dc, TCGv srca, TCGv srcb)
 
 tcg_gen_muls2_i64(cpu_mac, high, t1, t2);
 tcg_gen_sari_i64(t1, cpu_mac, 63);
-tcg_gen_setcond_i64(TCG_COND_NE, t1, t1, high);
+tcg_gen_negsetcond_i64(TCG_COND_NE, t1, t1, high);
 tcg_gen_trunc_i64_tl(cpu_sr_ov, t1);
-tcg_gen_neg_tl(cpu_sr_ov, cpu_sr_ov);
 
 gen_ove_ov(dc);
 }
-- 
2.34.1
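
For context, the new opcode computes dest = -(t1 cond t2), folding the
setcond/neg pairs removed above into one op (plain-C sketch, not from
the patch):

    #include <assert.h>
    #include <stdint.h>

    static int64_t setcond_ne(int64_t a, int64_t b)    { return a != b; }
    static int64_t negsetcond_ne(int64_t a, int64_t b) { return -(a != b); }

    int main(void)
    {
        assert(negsetcond_ne(3, 4) == -setcond_ne(3, 4));  /* -1 vs 1 */
        assert(negsetcond_ne(5, 5) == -setcond_ne(5, 5));  /*  0 vs 0 */
        return 0;
    }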




[PULL 34/48] tcg/i386: Merge tcg_out_movcond{32,64}

2023-08-23 Thread Richard Henderson
Pass a rexw parameter instead of duplicating the functions.

Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 tcg/i386/tcg-target.c.inc | 28 +++-
 1 file changed, 7 insertions(+), 21 deletions(-)

diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc
index 010432d3a9..1542afd94d 100644
--- a/tcg/i386/tcg-target.c.inc
+++ b/tcg/i386/tcg-target.c.inc
@@ -1593,24 +1593,14 @@ static void tcg_out_cmov(TCGContext *s, TCGCond cond, int rexw,
 }
 }
 
-static void tcg_out_movcond32(TCGContext *s, TCGCond cond, TCGReg dest,
-  TCGReg c1, TCGArg c2, int const_c2,
-  TCGReg v1)
+static void tcg_out_movcond(TCGContext *s, int rexw, TCGCond cond,
+TCGReg dest, TCGReg c1, TCGArg c2, int const_c2,
+TCGReg v1)
 {
-tcg_out_cmp(s, c1, c2, const_c2, 0);
-tcg_out_cmov(s, cond, 0, dest, v1);
+tcg_out_cmp(s, c1, c2, const_c2, rexw);
+tcg_out_cmov(s, cond, rexw, dest, v1);
 }
 
-#if TCG_TARGET_REG_BITS == 64
-static void tcg_out_movcond64(TCGContext *s, TCGCond cond, TCGReg dest,
-  TCGReg c1, TCGArg c2, int const_c2,
-  TCGReg v1)
-{
-tcg_out_cmp(s, c1, c2, const_c2, P_REXW);
-tcg_out_cmov(s, cond, P_REXW, dest, v1);
-}
-#endif
-
 static void tcg_out_ctz(TCGContext *s, int rexw, TCGReg dest, TCGReg arg1,
 TCGArg arg2, bool const_a2)
 {
@@ -2564,8 +2554,8 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
 OP_32_64(setcond):
 tcg_out_setcond(s, rexw, args[3], a0, a1, a2, const_a2);
 break;
-case INDEX_op_movcond_i32:
-tcg_out_movcond32(s, args[5], a0, a1, a2, const_a2, args[3]);
+OP_32_64(movcond):
+tcg_out_movcond(s, rexw, args[5], a0, a1, a2, const_a2, args[3]);
 break;
 
 OP_32_64(bswap16):
@@ -2714,10 +2704,6 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
 }
 break;
 
-case INDEX_op_movcond_i64:
-tcg_out_movcond64(s, args[5], a0, a1, a2, const_a2, args[3]);
-break;
-
 case INDEX_op_bswap64_i64:
 tcg_out_bswap64(s, a0);
 break;
-- 
2.34.1




[PULL 03/48] sysemu/kvm: Use vaddr for kvm_arch_[insert|remove]_hw_breakpoint

2023-08-23 Thread Richard Henderson
From: Anton Johansson via 

Changes the signature of the target-defined functions for
inserting/removing kvm hw breakpoints. The address and length arguments
are now of vaddr type, which both matches the type used internally in
accel/kvm/kvm-all.c and makes the api target-agnostic.

Signed-off-by: Anton Johansson 
Reviewed-by: Richard Henderson 
Message-Id: <20230807155706.9580-4-a...@rev.ng>
Signed-off-by: Richard Henderson 
---
 include/sysemu/kvm.h   |  6 ++
 target/arm/kvm64.c |  6 ++
 target/i386/kvm/kvm.c  |  8 +++-
 target/ppc/kvm.c   | 13 ++---
 target/s390x/kvm/kvm.c |  6 ++
 5 files changed, 15 insertions(+), 24 deletions(-)

diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index 5670306dbf..19d87b20e8 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -426,10 +426,8 @@ int kvm_arch_insert_sw_breakpoint(CPUState *cpu,
   struct kvm_sw_breakpoint *bp);
 int kvm_arch_remove_sw_breakpoint(CPUState *cpu,
   struct kvm_sw_breakpoint *bp);
-int kvm_arch_insert_hw_breakpoint(target_ulong addr,
-  target_ulong len, int type);
-int kvm_arch_remove_hw_breakpoint(target_ulong addr,
-  target_ulong len, int type);
+int kvm_arch_insert_hw_breakpoint(vaddr addr, vaddr len, int type);
+int kvm_arch_remove_hw_breakpoint(vaddr addr, vaddr len, int type);
 void kvm_arch_remove_all_hw_breakpoints(void);
 
 void kvm_arch_update_guest_debug(CPUState *cpu, struct kvm_guest_debug *dbg);
diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
index 94bbd9661f..4d904a1d11 100644
--- a/target/arm/kvm64.c
+++ b/target/arm/kvm64.c
@@ -49,8 +49,7 @@ void kvm_arm_init_debug(KVMState *s)
 return;
 }
 
-int kvm_arch_insert_hw_breakpoint(target_ulong addr,
-  target_ulong len, int type)
+int kvm_arch_insert_hw_breakpoint(vaddr addr, vaddr len, int type)
 {
 switch (type) {
 case GDB_BREAKPOINT_HW:
@@ -65,8 +64,7 @@ int kvm_arch_insert_hw_breakpoint(target_ulong addr,
 }
 }
 
-int kvm_arch_remove_hw_breakpoint(target_ulong addr,
-  target_ulong len, int type)
+int kvm_arch_remove_hw_breakpoint(vaddr addr, vaddr len, int type)
 {
 switch (type) {
 case GDB_BREAKPOINT_HW:
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index ebfaf3d24c..295228cafb 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -4995,7 +4995,7 @@ MemTxAttrs kvm_arch_post_run(CPUState *cpu, struct kvm_run *run)
 kvm_rate_limit_on_bus_lock();
 }
 
-#ifdef CONFIG_XEN_EMU
+#ifdef CONFIG_XEN_EMU
 /*
  * If the callback is asserted as a GSI (or PCI INTx) then check if
  * vcpu_info->evtchn_upcall_pending has been cleared, and deassert
@@ -5156,8 +5156,7 @@ static int find_hw_breakpoint(target_ulong addr, int len, int type)
 return -1;
 }
 
-int kvm_arch_insert_hw_breakpoint(target_ulong addr,
-  target_ulong len, int type)
+int kvm_arch_insert_hw_breakpoint(vaddr addr, vaddr len, int type)
 {
 switch (type) {
 case GDB_BREAKPOINT_HW:
@@ -5197,8 +5196,7 @@ int kvm_arch_insert_hw_breakpoint(target_ulong addr,
 return 0;
 }
 
-int kvm_arch_remove_hw_breakpoint(target_ulong addr,
-  target_ulong len, int type)
+int kvm_arch_remove_hw_breakpoint(vaddr addr, vaddr len, int type)
 {
 int n;
 
diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
index a8a935e267..91e73620d3 100644
--- a/target/ppc/kvm.c
+++ b/target/ppc/kvm.c
@@ -1444,15 +1444,15 @@ static int find_hw_watchpoint(target_ulong addr, int *flag)
 return -1;
 }
 
-int kvm_arch_insert_hw_breakpoint(target_ulong addr,
-  target_ulong len, int type)
+int kvm_arch_insert_hw_breakpoint(vaddr addr, vaddr len, int type)
 {
-if ((nb_hw_breakpoint + nb_hw_watchpoint) >= ARRAY_SIZE(hw_debug_points)) {
+const unsigned breakpoint_index = nb_hw_breakpoint + nb_hw_watchpoint;
+if (breakpoint_index >= ARRAY_SIZE(hw_debug_points)) {
 return -ENOBUFS;
 }
 
-hw_debug_points[nb_hw_breakpoint + nb_hw_watchpoint].addr = addr;
-hw_debug_points[nb_hw_breakpoint + nb_hw_watchpoint].type = type;
+hw_debug_points[breakpoint_index].addr = addr;
+hw_debug_points[breakpoint_index].type = type;
 
 switch (type) {
 case GDB_BREAKPOINT_HW:
@@ -1488,8 +1488,7 @@ int kvm_arch_insert_hw_breakpoint(target_ulong addr,
 return 0;
 }
 
-int kvm_arch_remove_hw_breakpoint(target_ulong addr,
-  target_ulong len, int type)
+int kvm_arch_remove_hw_breakpoint(vaddr addr, vaddr len, int type)
 {
 int n;
 
diff --git a/target/s390x/kvm/kvm.c b/target/s390x/kvm/kvm.c
index a9e5880349..1b240fc8de 100644
--- a/target/s390x/kvm/kvm.c
+++ b/target/s390x/kvm/kvm.c
@@ -995,8 +995,7 @@ static int insert_hw_breakpoint(target_ulong addr, int len, int

[PULL 00/48] tcg patch queue

2023-08-23 Thread Richard Henderson
The following changes since commit b0dd9a7d6dd15a6898e9c585b521e6bec79b25aa:

  Open 8.2 development tree (2023-08-22 07:14:07 -0700)

are available in the Git repository at:

  https://gitlab.com/rth7680/qemu.git tags/pull-tcg-20230823

for you to fetch changes up to 05e09d2a830a74d651ca6106e2002ec4f7b6997a:

  tcg: spelling fixes (2023-08-23 13:20:47 -0700)


accel/*: Widen pc/saved_insn for *_sw_breakpoint
accel/tcg: Replace remaining target_ulong in system-mode accel
tcg: spelling fixes
tcg: Document bswap, hswap, wswap byte patterns
tcg: Introduce negsetcond opcodes
tcg: Fold deposit with zero to and
tcg: Unify TCG_TARGET_HAS_extr[lh]_i64_i32
tcg/i386: Drop BYTEH deposits for 64-bit
tcg/i386: Allow immediate as input to deposit
target/*: Use tcg_gen_negsetcond_*


Anton Johansson via (9):
  accel/kvm: Widen pc/saved_insn for kvm_sw_breakpoint
  accel/hvf: Widen pc/saved_insn for hvf_sw_breakpoint
  sysemu/kvm: Use vaddr for kvm_arch_[insert|remove]_hw_breakpoint
  sysemu/hvf: Use vaddr for hvf_arch_[insert|remove]_hw_breakpoint
  include/exec: Replace target_ulong with abi_ptr in cpu_[st|ld]*()
  include/exec: typedef abi_ptr to vaddr in softmmu
  include/exec: Widen tlb_hit/tlb_hit_page()
  accel/tcg: Widen address arg in tlb_compare_set()
  accel/tcg: Update run_on_cpu_data static assert

Mark Cave-Ayland (1):
  docs/devel/tcg-ops: fix missing newlines in "Host vector operations"

Michael Tokarev (1):
  tcg: spelling fixes

Philippe Mathieu-Daudé (9):
  docs/devel/tcg-ops: Bury mentions of trunc_shr_i64_i32()
  tcg/tcg-op: Document bswap16_i32() byte pattern
  tcg/tcg-op: Document bswap16_i64() byte pattern
  tcg/tcg-op: Document bswap32_i32() byte pattern
  tcg/tcg-op: Document bswap32_i64() byte pattern
  tcg/tcg-op: Document bswap64_i64() byte pattern
  tcg/tcg-op: Document hswap_i32/64() byte pattern
  tcg/tcg-op: Document wswap_i64() byte pattern
  target/cris: Fix a typo in gen_swapr()

Richard Henderson (28):
  target/m68k: Use tcg_gen_deposit_i32 in gen_partset_reg
  tcg/i386: Drop BYTEH deposits for 64-bit
  tcg: Fold deposit with zero to and
  tcg/i386: Allow immediate as input to deposit_*
  tcg: Unify TCG_TARGET_HAS_extr[lh]_i64_i32
  tcg: Introduce negsetcond opcodes
  tcg: Use tcg_gen_negsetcond_*
  target/alpha: Use tcg_gen_movcond_i64 in gen_fold_mzero
  target/arm: Use tcg_gen_negsetcond_*
  target/m68k: Use tcg_gen_negsetcond_*
  target/openrisc: Use tcg_gen_negsetcond_*
  target/ppc: Use tcg_gen_negsetcond_*
  target/sparc: Use tcg_gen_movcond_i64 in gen_edge
  target/tricore: Replace gen_cond_w with tcg_gen_negsetcond_tl
  tcg/ppc: Implement negsetcond_*
  tcg/ppc: Use the Set Boolean Extension
  tcg/aarch64: Implement negsetcond_*
  tcg/arm: Implement negsetcond_i32
  tcg/riscv: Implement negsetcond_*
  tcg/s390x: Implement negsetcond_*
  tcg/sparc64: Implement negsetcond_*
  tcg/i386: Merge tcg_out_brcond{32,64}
  tcg/i386: Merge tcg_out_setcond{32,64}
  tcg/i386: Merge tcg_out_movcond{32,64}
  tcg/i386: Use CMP+SBB in tcg_out_setcond
  tcg/i386: Clear dest first in tcg_out_setcond if possible
  tcg/i386: Use shift in tcg_out_setcond
  tcg/i386: Implement negsetcond_*

 docs/devel/tcg-ops.rst |  15 +-
 accel/tcg/atomic_template.h|  16 +-
 include/exec/cpu-all.h |   4 +-
 include/exec/cpu_ldst.h|  28 +--
 include/sysemu/hvf.h   |  12 +-
 include/sysemu/kvm.h   |  12 +-
 include/tcg/tcg-op-common.h|   4 +
 include/tcg/tcg-op.h   |   2 +
 include/tcg/tcg-opc.h  |   6 +-
 include/tcg/tcg.h  |   4 +-
 tcg/aarch64/tcg-target.h   |   5 +-
 tcg/arm/tcg-target.h   |   1 +
 tcg/i386/tcg-target-con-set.h  |   2 +-
 tcg/i386/tcg-target-con-str.h  |   1 -
 tcg/i386/tcg-target.h  |   9 +-
 tcg/loongarch64/tcg-target.h   |   6 +-
 tcg/mips/tcg-target.h  |   5 +-
 tcg/ppc/tcg-target.h   |   5 +-
 tcg/riscv/tcg-target.h |   5 +-
 tcg/s390x/tcg-target.h |   5 +-
 tcg/sparc64/tcg-target.h   |   5 +-
 tcg/tci/tcg-target.h   |   5 +-
 accel/hvf/hvf-accel-ops.c  |   4 +-
 accel/hvf/hvf-all.c|   2 +-
 accel/kvm/kvm-all.c|   3 +-
 accel/tcg/cputlb.c |  17 +-
 target/alpha/translate.c   |   7 +-
 target/arm/hvf/hvf.c   |   4 +-
 target/a

Re: [PATCH v3 07/11] softmmu/physmem: Never return directories from file_ram_open()

2023-08-23 Thread Peter Xu
On Wed, Aug 23, 2023 at 05:34:07PM +0200, David Hildenbrand wrote:
> open() does not fail on directories when opening them readonly (O_RDONLY).
> 
> Currently, we succeed opening such directories and fail later during
> mmap(), resulting in a misleading error message.
> 
> $ ./qemu-system-x86_64 \
> -object memory-backend-file,id=ram0,mem-path=tmp,readonly=true,size=1g
>  qemu-system-x86_64: unable to map backing store for guest RAM: No such device
> 
> To identify directories and handle them accordingly in file_ram_open(),
> even when readonly=true was specified, detect whether we just opened a directory
> using fstat() instead. Then, fail file_ram_open() right away, similarly
> to how we now fail if the file does not exist and we want to open the
> file readonly.
> 
> With this change, we get a nicer error message:
>  qemu-system-x86_64: can't open backing store tmp for guest RAM: Is a 
> directory
> 
> Note that only memory-backend-file will end up calling
> memory_region_init_ram_from_file() -> qemu_ram_alloc_from_file() ->
> file_ram_open().
> 
> Reported-by: Thiner Logoer 
> Signed-off-by: David Hildenbrand 

Reviewed-by: Peter Xu 

-- 
Peter Xu
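
The directory check described in the commit message comes down to an
fstat()-based test along these lines (a hypothetical helper for
illustration, not the actual diff):

    #include <assert.h>
    #include <errno.h>
    #include <fcntl.h>
    #include <sys/stat.h>

    /* Return -1 with errno = EISDIR if fd refers to a directory. */
    static int fail_if_directory(int fd)
    {
        struct stat st;

        if (fstat(fd, &st) == 0 && S_ISDIR(st.st_mode)) {
            errno = EISDIR;  /* strerror() gives "Is a directory" */
            return -1;
        }
        return 0;
    }

    int main(void)
    {
        int fd = open(".", O_RDONLY);  /* opening a directory O_RDONLY succeeds */
        assert(fd >= 0);
        assert(fail_if_directory(fd) == -1 && errno == EISDIR);
        return 0;
    }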




Re: [PATCH v3 06/11] softmmu/physmem: Fail creation of new files in file_ram_open() with readonly=true

2023-08-23 Thread Peter Xu
On Wed, Aug 23, 2023 at 05:34:06PM +0200, David Hildenbrand wrote:
> Currently, if a file does not exist yet, file_ram_open() will create new
> empty file and open it writable. However, it even does that when
> readonly=true was specified.
> 
> Specifying O_RDONLY instead to create a new readonly file would
> theoretically work; however, ftruncate() will refuse to resize the new
> empty file and we'll get a warning:
> ftruncate: Invalid argument
> Later we would eventually hit more problems when actually mmap'ing that
> file and accessing it.
> 
> If someone intends to let QEMU open+mmap a file read-only, better
> create+resize+fill that file ahead of time outside of QEMU context.
> 
> We'll now fail with:
> ./qemu-system-x86_64 \
> -object memory-backend-file,id=ram0,mem-path=tmp,readonly=true,size=1g
> qemu-system-x86_64: can't open backing store tmp for guest RAM: No such file 
> or directory
> 
> All use cases of readonly files (R/O NVDIMMs, VM templating) work on
> existing files, so silently creating new files might just hide user
> errors when accidentally specifying a non-existent file.
> 
> Note that only memory-backend-file will end up calling
> memory_region_init_ram_from_file() -> qemu_ram_alloc_from_file() ->
> file_ram_open().
> 
> Move error reporting to the single caller.
> 
> Signed-off-by: David Hildenbrand 

Acked-by: Peter Xu 

-- 
Peter Xu




Re: [PATCH v3 04/11] softmmu/physmem: Remap with proper protection in qemu_ram_remap()

2023-08-23 Thread Peter Xu
On Wed, Aug 23, 2023 at 05:34:04PM +0200, David Hildenbrand wrote:
> Let's remap with the proper protection that we can derive from
> RAM_READONLY.
> 
> Signed-off-by: David Hildenbrand 

Reviewed-by: Peter Xu 

-- 
Peter Xu




Re: [PATCH v3 05/11] softmmu/physmem: Bail out early in ram_block_discard_range() with readonly files

2023-08-23 Thread Peter Xu
On Wed, Aug 23, 2023 at 05:34:05PM +0200, David Hildenbrand wrote:
> fallocate() will fail, let's print a nicer error message.
> 
> Suggested-by: Peter Xu 
> Signed-off-by: David Hildenbrand 

Reviewed-by: Peter Xu 

-- 
Peter Xu




Re: [PATCH v3 02/11] softmmu/physmem: Distinguish between file access mode and mmap protection

2023-08-23 Thread Peter Xu
On Wed, Aug 23, 2023 at 05:34:02PM +0200, David Hildenbrand wrote:
> There is a difference between how we open a file and how we mmap it,
> and we want to support writable private mappings of readonly files. Let's
> define RAM_READONLY and RAM_READONLY_FD flags, to replace the single
> "readonly" parameter for file-related functions.
> 
> In memory_region_init_ram_from_fd() and memory_region_init_ram_from_file(),
> initialize mr->readonly based on the new RAM_READONLY flag.
> 
> While at it, add some RAM_* flags we missed to add to the list of accepted
> flags in the documentation of some functions.
> 
> No change in functionality intended. We'll make use of both flags next
> and start setting them independently for memory-backend-file.
> 
> Signed-off-by: David Hildenbrand 

Acked-by: Peter Xu 

-- 
Peter Xu




Re: [PATCH v2 1/4] softmmu: Support concurrent bounce buffers

2023-08-23 Thread Mattias Nissler
Peter, thanks for taking a look and providing feedback!

On Wed, Aug 23, 2023 at 7:35 PM Peter Xu  wrote:
>
> On Wed, Aug 23, 2023 at 02:29:02AM -0700, Mattias Nissler wrote:
> > When DMA memory can't be directly accessed, as is the case when
> > running the device model in a separate process without shareable DMA
> > file descriptors, bounce buffering is used.
> >
> > It is not uncommon for device models to request mapping of several DMA
> > regions at the same time. Examples include:
> >  * net devices, e.g. when transmitting a packet that is split across
> >several TX descriptors (observed with igb)
> >  * USB host controllers, when handling a packet with multiple data TRBs
> >(observed with xhci)
> >
> > Previously, qemu only provided a single bounce buffer and would fail DMA
> > map requests while the buffer was already in use. In turn, this would
> > cause DMA failures that ultimately manifest as hardware errors from the
> > guest perspective.
> >
> > This change allocates DMA bounce buffers dynamically instead of
> > supporting only a single buffer. Thus, multiple DMA mappings work
> > correctly also when RAM can't be mmap()-ed.
> >
> > The total bounce buffer allocation size is limited by a new command line
> > parameter. The default is 4096 bytes to match the previous maximum
> > buffer size. It is expected that suitable limits will vary quite a bit
> > in practice depending on device models and workloads.
> >
> > Signed-off-by: Mattias Nissler 
> > ---
> >  include/sysemu/sysemu.h |  2 +
> >  qemu-options.hx | 27 +
> >  softmmu/globals.c   |  1 +
> >  softmmu/physmem.c   | 84 +++--
> >  softmmu/vl.c|  6 +++
> >  5 files changed, 83 insertions(+), 37 deletions(-)
> >
> > diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
> > index 25be2a692e..c5dc93cb53 100644
> > --- a/include/sysemu/sysemu.h
> > +++ b/include/sysemu/sysemu.h
> > @@ -61,6 +61,8 @@ extern int nb_option_roms;
> >  extern const char *prom_envs[MAX_PROM_ENVS];
> >  extern unsigned int nb_prom_envs;
> >
> > +extern uint64_t max_bounce_buffer_size;
> > +
> >  /* serial ports */
> >
> >  /* Return the Chardev for serial port i, or NULL if none */
> > diff --git a/qemu-options.hx b/qemu-options.hx
> > index 29b98c3d4c..6071794237 100644
> > --- a/qemu-options.hx
> > +++ b/qemu-options.hx
> > @@ -4959,6 +4959,33 @@ SRST
> >  ERST
> >  #endif
> >
> > +DEF("max-bounce-buffer-size", HAS_ARG,
> > +QEMU_OPTION_max_bounce_buffer_size,
> > +"-max-bounce-buffer-size size\n"
> > +"DMA bounce buffer size limit in bytes (default=4096)\n",
> > +QEMU_ARCH_ALL)
> > +SRST
> > +``-max-bounce-buffer-size size``
> > +Set the limit in bytes for DMA bounce buffer allocations.
> > +
> > +DMA bounce buffers are used when device models request memory-mapped access
> > +to memory regions that can't be directly mapped by the qemu process, so the
> > +memory must be read or written to a temporary local buffer for the device
> > +model to work with. This is the case e.g. for I/O memory regions, and when
> > +running in multi-process mode without shared access to memory.
> > +
> > +Whether bounce buffering is necessary depends heavily on the device model
> > +implementation. Some devices use explicit DMA read and write operations
> > +which do not require bounce buffers. Some devices, notably storage, will
> > +retry a failed DMA map request after bounce buffer space becomes available
> > +again. Most other devices will bail when encountering map request failures,
> > +which will typically appear to the guest as a hardware error.
> > +
> > +Suitable bounce buffer size values depend on the workload and guest
> > +configuration. A few kilobytes up to a few megabytes are common sizes
> > +encountered in practice.
>
> Does it mean that the default 4K size can still easily fail with some
> device setup?

Yes. The thing is that the respective device setup is pretty exotic,
at least the only setup I'm aware of is multi-process with direct RAM
access via shared file descriptors from the device process disabled
(which hurts performance, so few people will run this setup). In
theory, DMA to an I/O region of some sort would also run into the
issue even in single process mode, but I'm not aware of such a
situation. Looking at it from a historic perspective, note that the
single-bounce-buffer restriction has been present for a decade, and
thus the issue has been present for years (since multi-process is a
thing), without it hurting anyone enough to get fixed. But don't get
me wrong - I don't want to downplay anything and very much would like
to see this fixed, but I want to be honest and put things into the
right perspective.
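
To make the sizing discussion concrete: the accounting the patch
describes amounts to a global byte budget checked at map time. A rough
sketch with hypothetical helper names (not the actual patch code):

    #include <stdatomic.h>
    #include <stdint.h>
    #include <stdlib.h>

    static _Atomic uint64_t bounce_in_use;
    static uint64_t bounce_limit = 4096;  /* -max-bounce-buffer-size default */

    static void *bounce_alloc(size_t len)
    {
        uint64_t used = atomic_fetch_add(&bounce_in_use, len);
        if (used + len > bounce_limit) {
            atomic_fetch_sub(&bounce_in_use, len);  /* over budget: fail map */
            return NULL;
        }
        return malloc(len);
    }

    static void bounce_free(void *buf, size_t len)
    {
        free(buf);
        atomic_fetch_sub(&bounce_in_use, len);
    }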

>
> IIUC the whole point of limit here is to make sure the allocation is still
> bounded, while 4K itself is not a hard limit. 

Re: [PATCH 05/24] tcg: spelling fixes

2023-08-23 Thread Richard Henderson

On 8/23/23 01:30, Alex Bennée wrote:


Michael Tokarev  writes:


Signed-off-by: Michael Tokarev 


Acked-by: Alex Bennée 



Queued this one patch to tcg-next.


r~



Re: [PATCH] docs/devel/tcg-ops: fix missing newlines in "Host vector operations" section

2023-08-23 Thread Richard Henderson

On 8/23/23 07:56, Philippe Mathieu-Daudé wrote:

On 23/8/23 16:17, Mark Cave-Ayland wrote:

This unintentionally causes the mov_vec, ld_vec and st_vec operations to appear
on the same line.

Signed-off-by: Mark Cave-Ayland 
---
  docs/devel/tcg-ops.rst | 2 ++
  1 file changed, 2 insertions(+)


Reviewed-by: Philippe Mathieu-Daudé 




Queued to tcg-next.


r~



Re: [PATCH v2 33/58] headers: Add definitions from UEFI spec for volumes, resources, etc...

2023-08-23 Thread Isaku Yamahata
On Fri, Aug 18, 2023 at 05:50:16AM -0400,
Xiaoyao Li  wrote:

> Add UEFI definitions for literals, enums, structs, GUIDs, etc... that
> will be used by TDX to build the UEFI Hand-Off Block (HOB) that is passed
> to the Trusted Domain Virtual Firmware (TDVF).
> 
> All values come from the UEFI specification and TDVF design guide. [1]
> 
> Note, EFI_RESOURCE_MEMORY_UNACCEPTED will be added in future UEFI spec.
> 
> [1] 
> https://software.intel.com/content/dam/develop/external/us/en/documents/tdx-virtual-firmware-design-guide-rev-1.pdf

Nitpick: The specs [1] [2] include unaccepted memory.

[1] UEFI Specification Version 2.10 (released August 2022)
[2] UEFI Platform Initialization Distribution Packaging Specification Version 1.1
-- 
Isaku Yamahata 



Re: [PATCH 21/24] scripts/: spelling fixes

2023-08-23 Thread Michael Tokarev

23.08.2023 22:12, Michael Tokarev wrote:

23.08.2023 17:13, Alex Bennée wrote:


Reviewed-by: Alex Bennée 


Alex, can I keep your R-b of this with the changes
suggested by Eric?


n/m, it's just me with too many patches, screwed up in multiple ways :)

/mjt



Re: [PATCH 21/24] scripts/: spelling fixes

2023-08-23 Thread Michael Tokarev

23.08.2023 17:13, Alex Bennée wrote:


Reviewed-by: Alex Bennée 


Alex, can I keep your R-b of this with the changes
suggested by Eric?

Thank you for the review!

/mjt




Re: [PATCH 24/24] misc/other: spelling fixes

2023-08-23 Thread Michael Tokarev

23.08.2023 21:23, Eric Blake wrote:

On Wed, Aug 23, 2023 at 09:53:35AM +0300, Michael Tokarev wrote:

Signed-off-by: Michael Tokarev 
---
  accel/tcg/tb-maint.c   | 2 +-
  backends/tpm/tpm_ioctl.h   | 2 +-
  chardev/char-socket.c  | 6 +++---
  chardev/char.c | 2 +-
  contrib/plugins/cache.c| 2 +-
  contrib/plugins/lockstep.c | 2 +-
  crypto/afalg.c | 2 +-
  crypto/block-luks.c| 6 +++---
  crypto/der.c   | 2 +-
  crypto/der.h   | 6 +++---
  linux-user/flatload.c  | 2 +-
  linux-user/syscall.c   | 4 ++--
  nbd/client-connection.c| 2 +-
  net/checksum.c | 4 ++--
  net/filter.c   | 4 ++--
  net/vhost-vdpa.c   | 8 
  semihosting/config.c   | 2 +-
  semihosting/syscalls.c | 4 ++--
  softmmu/icount.c   | 2 +-
  softmmu/ioport.c   | 2 +-
  20 files changed, 33 insertions(+), 33 deletions(-)


Wide-ranging; but since the nbd/ change caught my eye (rather, my mail
filters), I went ahead and reviewed the entire patch.


Yeah, it's "all over" - a trade-off between a large number of patches
and patches of large size.

..

With those tweaks,

Reviewed-by: Eric Blake 


Added the tweaks here too. Thank you!

/mjt





Re: [RFC PATCH 2/5] HACK: target/arm/tcg: Add some more caches to cpu=max

2023-08-23 Thread Richard Henderson

On 8/23/23 07:59, Jonathan Cameron via wrote:

On Mon, 14 Aug 2023 11:13:58 +0100
Alex Bennée  wrote:


Jonathan Cameron  writes:


Used to drive the MPAM cache initialization and to exercise more
of the PPTT cache entry generation code. Perhaps a default
L3 cache is acceptable for max?

Signed-off-by: Jonathan Cameron 
---
  target/arm/tcg/cpu64.c | 12 
  1 file changed, 12 insertions(+)

diff --git a/target/arm/tcg/cpu64.c b/target/arm/tcg/cpu64.c
index 8019f00bc3..2af67739f6 100644
--- a/target/arm/tcg/cpu64.c
+++ b/target/arm/tcg/cpu64.c
@@ -711,6 +711,17 @@ void aarch64_max_tcg_initfn(Object *obj)
  uint64_t t;
  uint32_t u;
  
+/*
+ * Expanded cache set
+ */
+cpu->clidr = 0x8204923; /* 4 4 4 4 3 in 3 bit fields */
+cpu->ccsidr[0] = 0x00ff001aull; /* 64KB L1 dcache */
+cpu->ccsidr[1] = 0x00ff001aull; /* 64KB L1 icache */
+cpu->ccsidr[2] = 0x07ff003aull; /* 1MB L2 unified cache */
+cpu->ccsidr[4] = 0x07ff007cull; /* 2MB L3 cache 128B line */
+cpu->ccsidr[6] = 0x7fff007cull; /* 16MB L4 cache 128B line */
+cpu->ccsidr[8] = 0x0007007cull; /* 2048MB L5 cache 128B line */
+


I think Peter in another thread wondered if we should have a generic
function for expanding the cache idr registers based on an abstract lane
definition.



Great!

This response?
https://lore.kernel.org/qemu-devel/cafeaca_lzj1leutmro72fcfqicwtopd+5b-ypcfkv8bg1f+...@mail.gmail.com/


Followed up with

https://lore.kernel.org/qemu-devel/20230811214031.171020-6-richard.hender...@linaro.org/


r~



Re: NVMe ZNS last zone size

2023-08-23 Thread Stefan Hajnoczi
On Wed, 23 Aug 2023 at 14:53, Klaus Jensen  wrote:
>
> On Aug 23 22:58, Sam Li wrote:
> > Stefan Hajnoczi  于2023年8月23日周三 22:41写道:
> > >
> > > On Wed, 23 Aug 2023 at 10:24, Sam Li  wrote:
> > > >
> > > > Hi Stefan,
> > > >
> > > > Stefan Hajnoczi  于2023年8月23日周三 21:26写道:
> > > > >
> > > > > Hi Sam and Klaus,
> > > > > Val is adding nvme-io_uring ZNS support to libblkio
> > > > > (https://gitlab.com/libblkio/libblkio/-/merge_requests/221) and asked
> > > > > how to test the size of the last zone when the namespace's total size
> > > > > is not a multiple of the zone size.
> > > >
> > > > I think a zone report operation can do the trick. Given zone configs,
> > > > the size of the last zone should be [size - (nr_zones - 1) * zone_size].
> > > > Reporting the last zone on such devices tells whether the value is
> > > > correct.
> > >
> > > In nvme_ns_zoned_check_calc_geometry() the number of zones is rounded 
> > > down:
> > >
> > >   ns->num_zones = le64_to_cpu(ns->id_ns.nsze) / ns->zone_size;
> > >
> > > Afterwards nsze is recalculated as follows:
> > >
> > >   ns->id_ns.nsze = cpu_to_le64(ns->num_zones * ns->zone_size);
> > >
> > > I interpret this to mean that when the namespace's total size is not a
> > > multiple of the zone size, then the last part will be ignored and not
> > > exposed as a zone.
> >
> > I see. Current ZNS emulation does not support this case.
> >
>
> NVMe Zoned Namespaces requires all zones to be the same size. The
> "trailing zone" is a thing in SMR HDDs.

Thanks for letting me know.

Stefan
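
The rounding quoted above is easy to check with made-up numbers (purely
illustrative; the values are not from any real device):

    #include <inttypes.h>
    #include <stdio.h>

    int main(void)
    {
        uint64_t nsze = 10 * 1024 + 512;  /* namespace size in blocks */
        uint64_t zone_size = 1024;        /* blocks per zone */

        uint64_t num_zones = nsze / zone_size;     /* rounds down: 10 */
        uint64_t exposed = num_zones * zone_size;  /* recalculated nsze: 10240 */
        uint64_t trailing = nsze - exposed;        /* 512 blocks, not exposed */

        printf("zones=%" PRIu64 " exposed=%" PRIu64 " trailing=%" PRIu64 "\n",
               num_zones, exposed, trailing);
        return 0;
    }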



Re: [PATCH 17/24] xen: spelling fixes

2023-08-23 Thread Michael Tokarev

23.08.2023 21:38, David Woodhouse wrote:

On Wed, 2023-08-23 at 09:53 +0300, Michael Tokarev wrote:


  include/hw/xen/interface/arch-x86/xen-x86_64.h | 2 +-
  include/hw/xen/interface/arch-x86/xen.h    | 2 +-
  include/hw/xen/interface/event_channel.h   | 2 +-
  include/hw/xen/interface/grant_table.h | 2 +-
  include/hw/xen/interface/hvm/hvm_op.h  | 2 +-
  include/hw/xen/interface/io/blkif.h    | 6 +++---
  include/hw/xen/interface/io/fbif.h | 2 +-
  include/hw/xen/interface/io/kbdif.h    | 2 +-
  include/hw/xen/interface/memory.h  | 2 +-
  include/hw/xen/interface/physdev.h | 4 ++--
  include/hw/xen/interface/xen.h | 4 ++--
  12 files changed, 16 insertions(+), 16 deletions(-)


Please don't make changes to these unless you also change them in the
upstream Xen project, from which they're imported.


Aha. I didn't know they're imported from elsewhere.
Is it just include/hw/xen/interface/ or the whole include/hw/xen/?

I guess these changes can be sent to xen project :)


Thanks.


--- a/include/hw/xen/interface/memory.h
+++ b/include/hw/xen/interface/memory.h
@@ -185,5 +185,5 @@ struct xen_machphys_mfn_list {
  /*
   * Pointer to buffer to fill with list of extent starts. If there are
- * any large discontiguities in the machine address space, 2MB gaps in
+ * any large discontinuities in the machine address space, 2MB gaps in
   * the machphys table will be represented by an MFN base of zero.
   */


If you're already correcting that line, why not also correct MB to MiB?


That's not really a spelling fix anymore.  Also, this is not universal: some
still prefer MB over MiB (to mean 1024, not 1000).  Otherwise it's just too many
changes and the thing will never finish :)

Here in this patch, after removing include/hw/xen/, there's just one file
with one fix left, hw/xen/xen_pvdev.c. Hwell.. :)

/mjt



Re: NVMe ZNS last zone size

2023-08-23 Thread Klaus Jensen
On Aug 23 22:58, Sam Li wrote:
> Stefan Hajnoczi  于2023年8月23日周三 22:41写道:
> >
> > On Wed, 23 Aug 2023 at 10:24, Sam Li  wrote:
> > >
> > > Hi Stefan,
> > >
> > > Stefan Hajnoczi  于2023年8月23日周三 21:26写道:
> > > >
> > > > Hi Sam and Klaus,
> > > > Val is adding nvme-io_uring ZNS support to libblkio
> > > > (https://gitlab.com/libblkio/libblkio/-/merge_requests/221) and asked
> > > > how to test the size of the last zone when the namespace's total size
> > > > is not a multiple of the zone size.
> > >
> > > I think a zone report operation can do the trick. Given zone configs,
> > > the size of the last zone should be [size - (nr_zones - 1) * zone_size].
> > > Reporting the last zone on such devices tells whether the value is
> > > correct.
> >
> > In nvme_ns_zoned_check_calc_geometry() the number of zones is rounded down:
> >
> >   ns->num_zones = le64_to_cpu(ns->id_ns.nsze) / ns->zone_size;
> >
> > Afterwards nsze is recalculated as follows:
> >
> >   ns->id_ns.nsze = cpu_to_le64(ns->num_zones * ns->zone_size);
> >
> > I interpret this to mean that when the namespace's total size is not a
> > multiple of the zone size, then the last part will be ignored and not
> > exposed as a zone.
> 
> I see. Current ZNS emulation does not support this case.
> 

NVMe Zoned Namespaces requires all zones to be the same size. The
"trailing zone" is a thing in SMR HDDs.


Cheers,
Klaus



Re: [PATCH v2 0/8] tcg: Document *swap* helper implementations

2023-08-23 Thread Richard Henderson

On 8/23/23 07:55, Philippe Mathieu-Daudé wrote:

Philippe Mathieu-Daudé (8):
   tcg/tcg-op: Document bswap16_i32() byte pattern
   tcg/tcg-op: Document bswap16_i64() byte pattern
   tcg/tcg-op: Document bswap32_i32() byte pattern
   tcg/tcg-op: Document bswap32_i64() byte pattern
   tcg/tcg-op: Document bswap64_i64() byte pattern
   tcg/tcg-op: Document hswap_i32/64() byte pattern
   tcg/tcg-op: Document wswap_i64() byte pattern
   target/cris: Fix a typo in gen_swapr()


Reviewed-by: Richard Henderson 

and queued to tcg-next with a few tweaks.


r~



Re: [PATCH 22/24] tests/: spelling fixes

2023-08-23 Thread Michael Tokarev

23.08.2023 20:55, Eric Blake wrote:

On Wed, Aug 23, 2023 at 08:51:53AM +0300, Michael Tokarev wrote:

Signed-off-by: Michael Tokarev 
---



+++ b/tests/qemu-iotests/298
@@ -142,5 +142,5 @@ class TestTruncate(iotests.QMPTestCase):
  
  # Probably we'll want preallocate filter to keep align to cluster when

-# shrink preallocation, so, ignore small differece
+# shrink preallocation, so, ignore small difference


The sentence as a whole still comes across awkwardly; maybe shorten and reword 
to:


Yeah. I deliberately didn't tweak wording - there are a LOT of places
which need additional tweaking, and the patch set is already way too
large...  It was tempting to tweak some of these but I decided to draw
the line somewhere.


The preallocate filter may keep cluster alignment when shrinking, so
ignore small differences


Hwell :)



+++ b/tests/unit/test-throttle.c
@@ -136,5 +136,5 @@ static void test_compute_wait(void)
  g_assert(double_cmp(bkt.level, (i + 1) * (bkt.max - bkt.avg) / 10));
  /* We can do bursts for the 2 seconds we have configured in
- * burst_length. We have 100 extra miliseconds of burst
+ * burst_length. We have 100 extra milliseconds of burst
   * because bkt.level has been leaking during this time.
   * After that, we have to wait. */
@@ -380,5 +380,5 @@ static void test_is_valid(void)
  /* zero are valids */
  test_is_valid_for_value(0, true);
-/* positives numers are valids */
+/* positives numbers are valids */


Multiple grammar errors in this function.  Should be:

/* negative numbers are invalid */
/* zero is valid */
/* positive numbers are valid */


Actually I missed this "valids" somehow, even after additional
re-reading of the changes.

Still, these are *grammar* errors, not spelling errors, so the
subject line does not fit anymore :)


All other changes in this patch look good.  With additional tweaks,

Reviewed-by: Eric Blake 


Adding an extra comment about the grammar changes to the commit message :)

Thank you!

/mjt


