Re: [PATCH for-8.1] tcg: Use HAVE_CMPXCHG128 instead of CONFIG_CMPXCHG128

2023-07-14 Thread Thomas Huth

On 13/07/2023 22.23, Richard Henderson wrote:

We adjust CONFIG_ATOMIC128 and CONFIG_CMPXCHG128 with
CONFIG_ATOMIC128_OPT in atomic128.h.  It is difficult
to tell when those changes have been applied with the
ifdef we must use with CONFIG_CMPXCHG128.  So instead
use HAVE_CMPXCHG128, which triggers -Werror-undef when
the proper header has not been included.

Improves tcg_gen_atomic_cmpxchg_i128 for s390x host, which
requires CONFIG_ATOMIC128_OPT.  Without this we fall back
to EXCP_ATOMIC to single-step 128-bit atomics, which is
slow enough to cause some tests to time out.

Reported-by: Thomas Huth 
Signed-off-by: Richard Henderson 
---

Thomas, this issue does not quite match the one you bisected, but
other than the cmpxchg, I don't see any see any qemu_{ld,st}_i128
being used in BootLinuxS390X.test_s390_ccw_virtio_tcg.

As far as I can see, this wasn't broken by the addition of
CONFIG_ATOMIC128_OPT, rather that fix didn't go far enough.

Anyway, test_s390_ccw_virtio_tcg now passes in 159s on our host.


Thanks, I can confirm that this fixes the issue for me, too.

Tested-by: Thomas Huth 





Re: [PATCH for-8.1] tcg: Use HAVE_CMPXCHG128 instead of CONFIG_CMPXCHG128

2023-07-13 Thread Richard Henderson

On 7/13/23 22:36, Philippe Mathieu-Daudé wrote:

Hi Richard,

On 13/7/23 22:23, Richard Henderson wrote:

We adjust CONFIG_ATOMIC128 and CONFIG_CMPXCHG128 with
CONFIG_ATOMIC128_OPT in atomic128.h.  It is difficult
to tell when those changes have been applied with the
ifdef we must use with CONFIG_CMPXCHG128.  So instead
use HAVE_CMPXCHG128, which triggers -Werror-undef when
the proper header has not been included.

Improves tcg_gen_atomic_cmpxchg_i128 for s390x host, which
requires CONFIG_ATOMIC128_OPT.  Without this we fall back
to EXCP_ATOMIC to single-step 128-bit atomics, which is
slow enough to cause some tests to time out.

Reported-by: Thomas Huth 
Signed-off-by: Richard Henderson 
---

Thomas, this issue does not quite match the one you bisected, but
other than the cmpxchg, I don't see any see any qemu_{ld,st}_i128
being used in BootLinuxS390X.test_s390_ccw_virtio_tcg.

As far as I can see, this wasn't broken by the addition of
CONFIG_ATOMIC128_OPT, rather that fix didn't go far enough.

Anyway, test_s390_ccw_virtio_tcg now passes in 159s on our host.


IIUC:

If we have CONFIG_ATOMIC128, we use qatomic_cmpxchg__nocheck;
else if we have CONFIG_CMPXCHG128 we use __sync_val_compare_and_swap_16;
in both cases we set HAVE_CMPXCHG128;
otherwise we can not use atomic128 cmpxchg().

(I'm trying to figure why we need both CONFIGs).


Or sometimes we use inline asm, because there's no compiler support at all.
Please see host/include/*/host/atomic16-*.h.


r~



Re: [PATCH for-8.1] tcg: Use HAVE_CMPXCHG128 instead of CONFIG_CMPXCHG128

2023-07-13 Thread Philippe Mathieu-Daudé

Hi Richard,

On 13/7/23 22:23, Richard Henderson wrote:

We adjust CONFIG_ATOMIC128 and CONFIG_CMPXCHG128 with
CONFIG_ATOMIC128_OPT in atomic128.h.  It is difficult
to tell when those changes have been applied with the
ifdef we must use with CONFIG_CMPXCHG128.  So instead
use HAVE_CMPXCHG128, which triggers -Werror-undef when
the proper header has not been included.

Improves tcg_gen_atomic_cmpxchg_i128 for s390x host, which
requires CONFIG_ATOMIC128_OPT.  Without this we fall back
to EXCP_ATOMIC to single-step 128-bit atomics, which is
slow enough to cause some tests to time out.

Reported-by: Thomas Huth 
Signed-off-by: Richard Henderson 
---

Thomas, this issue does not quite match the one you bisected, but
other than the cmpxchg, I don't see any see any qemu_{ld,st}_i128
being used in BootLinuxS390X.test_s390_ccw_virtio_tcg.

As far as I can see, this wasn't broken by the addition of
CONFIG_ATOMIC128_OPT, rather that fix didn't go far enough.

Anyway, test_s390_ccw_virtio_tcg now passes in 159s on our host.


IIUC:

If we have CONFIG_ATOMIC128, we use qatomic_cmpxchg__nocheck;
else if we have CONFIG_CMPXCHG128 we use __sync_val_compare_and_swap_16;
in both cases we set HAVE_CMPXCHG128;
otherwise we can not use atomic128 cmpxchg().

(I'm trying to figure why we need both CONFIGs).



[PATCH for-8.1] tcg: Use HAVE_CMPXCHG128 instead of CONFIG_CMPXCHG128

2023-07-13 Thread Richard Henderson
We adjust CONFIG_ATOMIC128 and CONFIG_CMPXCHG128 with
CONFIG_ATOMIC128_OPT in atomic128.h.  It is difficult
to tell when those changes have been applied with the
ifdef we must use with CONFIG_CMPXCHG128.  So instead
use HAVE_CMPXCHG128, which triggers -Werror-undef when
the proper header has not been included.

Improves tcg_gen_atomic_cmpxchg_i128 for s390x host, which
requires CONFIG_ATOMIC128_OPT.  Without this we fall back
to EXCP_ATOMIC to single-step 128-bit atomics, which is
slow enough to cause some tests to time out.

Reported-by: Thomas Huth 
Signed-off-by: Richard Henderson 
---

Thomas, this issue does not quite match the one you bisected, but
other than the cmpxchg, I don't see any see any qemu_{ld,st}_i128
being used in BootLinuxS390X.test_s390_ccw_virtio_tcg.

As far as I can see, this wasn't broken by the addition of
CONFIG_ATOMIC128_OPT, rather that fix didn't go far enough.

Anyway, test_s390_ccw_virtio_tcg now passes in 159s on our host.


r~
---
 accel/tcg/tcg-runtime.h| 2 +-
 include/exec/helper-proto-common.h | 2 ++
 accel/tcg/cputlb.c | 2 +-
 accel/tcg/user-exec.c  | 2 +-
 tcg/tcg-op-ldst.c  | 2 +-
 accel/tcg/atomic_common.c.inc  | 2 +-
 6 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/accel/tcg/tcg-runtime.h b/accel/tcg/tcg-runtime.h
index 39e68007f9..186899a2c7 100644
--- a/accel/tcg/tcg-runtime.h
+++ b/accel/tcg/tcg-runtime.h
@@ -58,7 +58,7 @@ DEF_HELPER_FLAGS_5(atomic_cmpxchgq_be, TCG_CALL_NO_WG,
 DEF_HELPER_FLAGS_5(atomic_cmpxchgq_le, TCG_CALL_NO_WG,
i64, env, i64, i64, i64, i32)
 #endif
-#ifdef CONFIG_CMPXCHG128
+#if HAVE_CMPXCHG128
 DEF_HELPER_FLAGS_5(atomic_cmpxchgo_be, TCG_CALL_NO_WG,
i128, env, i64, i128, i128, i32)
 DEF_HELPER_FLAGS_5(atomic_cmpxchgo_le, TCG_CALL_NO_WG,
diff --git a/include/exec/helper-proto-common.h 
b/include/exec/helper-proto-common.h
index 4d4b022668..8b67170a22 100644
--- a/include/exec/helper-proto-common.h
+++ b/include/exec/helper-proto-common.h
@@ -7,6 +7,8 @@
 #ifndef HELPER_PROTO_COMMON_H
 #define HELPER_PROTO_COMMON_H
 
+#include "qemu/atomic128.h"  /* for HAVE_CMPXCHG128 */
+
 #define HELPER_H "accel/tcg/tcg-runtime.h"
 #include "exec/helper-proto.h.inc"
 #undef  HELPER_H
diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index c2b81ec569..e0079c9a9d 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -3105,7 +3105,7 @@ void cpu_st16_mmu(CPUArchState *env, target_ulong addr, 
Int128 val,
 #include "atomic_template.h"
 #endif
 
-#if defined(CONFIG_ATOMIC128) || defined(CONFIG_CMPXCHG128)
+#if defined(CONFIG_ATOMIC128) || HAVE_CMPXCHG128
 #define DATA_SIZE 16
 #include "atomic_template.h"
 #endif
diff --git a/accel/tcg/user-exec.c b/accel/tcg/user-exec.c
index d95b875a6a..e7225e10e9 100644
--- a/accel/tcg/user-exec.c
+++ b/accel/tcg/user-exec.c
@@ -1385,7 +1385,7 @@ static void *atomic_mmu_lookup(CPUArchState *env, vaddr 
addr, MemOpIdx oi,
 #include "atomic_template.h"
 #endif
 
-#if defined(CONFIG_ATOMIC128) || defined(CONFIG_CMPXCHG128)
+#if defined(CONFIG_ATOMIC128) || HAVE_CMPXCHG128
 #define DATA_SIZE 16
 #include "atomic_template.h"
 #endif
diff --git a/tcg/tcg-op-ldst.c b/tcg/tcg-op-ldst.c
index 0fcc1618e5..d54c305598 100644
--- a/tcg/tcg-op-ldst.c
+++ b/tcg/tcg-op-ldst.c
@@ -778,7 +778,7 @@ typedef void (*gen_atomic_op_i64)(TCGv_i64, TCGv_env, 
TCGv_i64,
 #else
 # define WITH_ATOMIC64(X)
 #endif
-#ifdef CONFIG_CMPXCHG128
+#if HAVE_CMPXCHG128
 # define WITH_ATOMIC128(X) X,
 #else
 # define WITH_ATOMIC128(X)
diff --git a/accel/tcg/atomic_common.c.inc b/accel/tcg/atomic_common.c.inc
index ee222fd7e7..95a5c5ff12 100644
--- a/accel/tcg/atomic_common.c.inc
+++ b/accel/tcg/atomic_common.c.inc
@@ -41,7 +41,7 @@ CMPXCHG_HELPER(cmpxchgq_be, uint64_t)
 CMPXCHG_HELPER(cmpxchgq_le, uint64_t)
 #endif
 
-#ifdef CONFIG_CMPXCHG128
+#if HAVE_CMPXCHG128
 CMPXCHG_HELPER(cmpxchgo_be, Int128)
 CMPXCHG_HELPER(cmpxchgo_le, Int128)
 #endif
-- 
2.34.1