[RFC PATCH 01/43] target/loongarch: Add vector data type vec_t
Signed-off-by: Song Gao
---
 linux-user/loongarch64/signal.c |  4 ++--
 target/loongarch/cpu.c          |  2 +-
 target/loongarch/cpu.h          | 18 +-
 target/loongarch/gdbstub.c      |  4 ++--
 target/loongarch/machine.c      |  2 +-
 5 files changed, 23 insertions(+), 7 deletions(-)

diff --git a/linux-user/loongarch64/signal.c b/linux-user/loongarch64/signal.c
index 7c7afb652e..40dba974d0 100644
--- a/linux-user/loongarch64/signal.c
+++ b/linux-user/loongarch64/signal.c
@@ -128,7 +128,7 @@ static void setup_sigframe(CPULoongArchState *env,
     fpu_ctx = (struct target_fpu_context *)(info + 1);
     for (i = 0; i < 32; ++i) {
-        __put_user(env->fpr[i], &fpu_ctx->regs[i]);
+        __put_user(env->fpr[i].d, &fpu_ctx->regs[i]);
     }
     __put_user(read_fcc(env), &fpu_ctx->fcc);
     __put_user(env->fcsr0, &fpu_ctx->fcsr);
@@ -193,7 +193,7 @@ static void restore_sigframe(CPULoongArchState *env,
     uint64_t fcc;
     for (i = 0; i < 32; ++i) {
-        __get_user(env->fpr[i], &fpu_ctx->regs[i]);
+        __get_user(env->fpr[i].d, &fpu_ctx->regs[i]);
     }
     __get_user(fcc, &fpu_ctx->fcc);
     write_fcc(env, fcc);
diff --git a/target/loongarch/cpu.c b/target/loongarch/cpu.c
index 290ab4d526..59ae29a3b4 100644
--- a/target/loongarch/cpu.c
+++ b/target/loongarch/cpu.c
@@ -653,7 +653,7 @@ void loongarch_cpu_dump_state(CPUState *cs, FILE *f, int flags)
     /* fpr */
     if (flags & CPU_DUMP_FPU) {
         for (i = 0; i < 32; i++) {
-            qemu_fprintf(f, " %s %016" PRIx64, fregnames[i], env->fpr[i]);
+            qemu_fprintf(f, " %s %016" PRIx64, fregnames[i], env->fpr[i].d);
             if ((i & 3) == 3) {
                 qemu_fprintf(f, "\n");
             }
diff --git a/target/loongarch/cpu.h b/target/loongarch/cpu.h
index e35cf65597..d37df63bde 100644
--- a/target/loongarch/cpu.h
+++ b/target/loongarch/cpu.h
@@ -239,6 +239,22 @@ FIELD(TLB_MISC, ASID, 1, 10)
 FIELD(TLB_MISC, VPPN, 13, 35)
 FIELD(TLB_MISC, PS, 48, 6)
+#define LSX_LEN (128)
+typedef union vec_t vec_t;
+union vec_t {
+    int8_t B[LSX_LEN / 8];
+    int16_t H[LSX_LEN / 16];
+    int32_t W[LSX_LEN / 32];
+    int64_t D[LSX_LEN / 64];
+    __int128 Q[LSX_LEN / 128];
+};
+
+typedef union fpr_t fpr_t;
+union fpr_t {
+    uint64_t d;
+    vec_t vec;
+};
+
 struct LoongArchTLB {
     uint64_t tlb_misc;
     /* Fields corresponding to CSR_TLBELO0/1 */
@@ -251,7 +267,7 @@ typedef struct CPUArchState {
     uint64_t gpr[32];
     uint64_t pc;
-    uint64_t fpr[32];
+    fpr_t fpr[32];
     float_status fp_status;
     bool cf[8];
diff --git a/target/loongarch/gdbstub.c b/target/loongarch/gdbstub.c
index a4d1e28e36..18cba6f8f3 100644
--- a/target/loongarch/gdbstub.c
+++ b/target/loongarch/gdbstub.c
@@ -68,7 +68,7 @@ static int loongarch_gdb_get_fpu(CPULoongArchState *env,
                                  GByteArray *mem_buf, int n)
 {
     if (0 <= n && n < 32) {
-        return gdb_get_reg64(mem_buf, env->fpr[n]);
+        return gdb_get_reg64(mem_buf, env->fpr[n].d);
     } else if (n == 32) {
         uint64_t val = read_fcc(env);
         return gdb_get_reg64(mem_buf, val);
@@ -84,7 +84,7 @@ static int loongarch_gdb_set_fpu(CPULoongArchState *env,
     int length = 0;
     if (0 <= n && n < 32) {
-        env->fpr[n] = ldq_p(mem_buf);
+        env->fpr[n].d = ldq_p(mem_buf);
         length = 8;
     } else if (n == 32) {
         uint64_t val = ldq_p(mem_buf);
diff --git a/target/loongarch/machine.c b/target/loongarch/machine.c
index b1e523ea72..b3598cce3f 100644
--- a/target/loongarch/machine.c
+++ b/target/loongarch/machine.c
@@ -33,7 +33,7 @@ const VMStateDescription vmstate_loongarch_cpu = {
         VMSTATE_UINTTL_ARRAY(env.gpr, LoongArchCPU, 32),
         VMSTATE_UINTTL(env.pc, LoongArchCPU),
-        VMSTATE_UINT64_ARRAY(env.fpr, LoongArchCPU, 32),
+        VMSTATE_UINT64_ARRAY(env.fpr.d, LoongArchCPU, 32),
         VMSTATE_UINT32(env.fcsr0, LoongArchCPU),
         VMSTATE_BOOL_ARRAY(env.cf, LoongArchCPU, 8),
--
2.31.1
Re: [RFC PATCH 01/43] target/loongarch: Add vector data type vec_t
Hi, Richard

On 2022/12/25 at 1:32 AM, Richard Henderson wrote:
> On 12/24/22 00:15, Song Gao wrote:
>> +union vec_t {
>> +    int8_t B[LSX_LEN / 8];
>> +    int16_t H[LSX_LEN / 16];
>> +    int32_t W[LSX_LEN / 32];
>> +    int64_t D[LSX_LEN / 64];
>> +    __int128 Q[LSX_LEN / 128];
>
> Oh, you can't use __int128 directly. It won't compile on 32-bit hosts.

Can we use Int128 after including "qemu/int128.h"? Then some vxx_q instructions can use int128_xx(a, b).

Thanks.
Song Gao
Re: [RFC PATCH 01/43] target/loongarch: Add vector data type vec_t
On 2/12/23 22:24, gaosong wrote:
> Hi, Richard
>
> On 2022/12/25 at 1:32 AM, Richard Henderson wrote:
>> On 12/24/22 00:15, Song Gao wrote:
>>> +union vec_t {
>>> +    int8_t B[LSX_LEN / 8];
>>> +    int16_t H[LSX_LEN / 16];
>>> +    int32_t W[LSX_LEN / 32];
>>> +    int64_t D[LSX_LEN / 64];
>>> +    __int128 Q[LSX_LEN / 128];
>>
>> Oh, you can't use __int128 directly. It won't compile on 32-bit hosts.
>
> Can we use Int128 after including "qemu/int128.h"? Then some vxx_q instructions can use int128_xx(a, b).

Yes, certainly.

r~
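To illustrate why a struct-based 128-bit type is portable where __int128 is not, here is a minimal standalone sketch. The names i128_t, i128_make and i128_add are hypothetical and chosen for this example only; this is not QEMU's Int128 implementation, just the general idea of carrying a 128-bit value as two 64-bit halves so that it builds on 32-bit hosts.

```c
#include <stdint.h>

/* Hypothetical 128-bit integer built from two 64-bit halves. */
typedef struct {
    uint64_t lo;   /* least significant 64 bits */
    int64_t hi;    /* most significant 64 bits, carries the sign */
} i128_t;

static inline i128_t i128_make(uint64_t lo, int64_t hi)
{
    i128_t r = { lo, hi };
    return r;
}

/* 128-bit addition: add the low halves, then propagate the carry. */
static inline i128_t i128_add(i128_t a, i128_t b)
{
    uint64_t lo = a.lo + b.lo;
    /* Unsigned wrap-around of the low half signals a carry. */
    uint64_t carry = lo < a.lo;
    uint64_t hi = (uint64_t)a.hi + (uint64_t)b.hi + carry;
    return i128_make(lo, (int64_t)hi);
}
```

A helper for a quadword instruction would then operate on values of this kind through functions rather than through native arithmetic operators.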
Re: [RFC PATCH 01/43] target/loongarch: Add vector data type vec_t
On 12/24/22 00:15, Song Gao wrote:
> +#define LSX_LEN (128)
> +typedef union vec_t vec_t;
> +union vec_t {
> +    int8_t B[LSX_LEN / 8];
> +    int16_t H[LSX_LEN / 16];
> +    int32_t W[LSX_LEN / 32];
> +    int64_t D[LSX_LEN / 64];
> +    __int128 Q[LSX_LEN / 128];
> +};
> +
> +typedef union fpr_t fpr_t;
> +union fpr_t {
> +    uint64_t d;
> +    vec_t vec;
> +};

You need to think about host endianness with this overlap and indexing.

There are two different models which can be emulated:

(1) target/{arm,s390x}/ has each uint64_t in host-endian order, but the words are indexed little-endian. See, for instance, target/s390x/tcg/vec.h.

(2) target/{ppc,i386}/ has the entire vector in host-endian order. See, for instance, ZMM_* in target/i386/cpu.h.

If you do nothing, I assume this will fail on a big-endian host.

r~
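As a rough illustration of model (1), the standalone sketch below keeps each 64-bit word in host byte order and adjusts the index of smaller elements on big-endian hosts. The macro and function names (VEC_IDX_B, vec_get_b, and so on) are hypothetical and exist only for this example. Host byte order is detected with the GCC/Clang predefined __BYTE_ORDER__ macro, and the sketch assumes a little-endian host when that macro is absent.

```c
#include <assert.h>
#include <stdint.h>

#define LSX_LEN 128

/* Same layout idea as the patch, minus the 128-bit member. */
typedef union vec_t {
    int8_t  B[LSX_LEN / 8];
    int16_t H[LSX_LEN / 16];
    int32_t W[LSX_LEN / 32];
    int64_t D[LSX_LEN / 64];
} vec_t;

/* Assumption: without __BYTE_ORDER__ we treat the host as little-endian. */
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
/* Flip the element index within its 64-bit word on big-endian hosts. */
# define VEC_IDX_B(i) ((i) ^ 7)
# define VEC_IDX_H(i) ((i) ^ 3)
# define VEC_IDX_W(i) ((i) ^ 1)
#else
# define VEC_IDX_B(i) (i)
# define VEC_IDX_H(i) (i)
# define VEC_IDX_W(i) (i)
#endif

/* Read guest element i, where element 0 is the least significant. */
static inline int8_t vec_get_b(const vec_t *v, unsigned i)
{
    assert(i < LSX_LEN / 8);
    return v->B[VEC_IDX_B(i)];
}

static inline int16_t vec_get_h(const vec_t *v, unsigned i)
{
    assert(i < LSX_LEN / 16);
    return v->H[VEC_IDX_H(i)];
}

static inline int32_t vec_get_w(const vec_t *v, unsigned i)
{
    assert(i < LSX_LEN / 32);
    return v->W[VEC_IDX_W(i)];
}
```

With accessors like these, code that indexes bytes, halfwords or words sees the same guest element numbering on either kind of host, while D[] can still be accessed directly.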
Re: [RFC PATCH 01/43] target/loongarch: Add vector data type vec_t
On 12/24/22 00:15, Song Gao wrote:
> +union fpr_t {
> +    uint64_t d;
> +    vec_t vec;
> +};
> +
>  struct LoongArchTLB {
>      uint64_t tlb_misc;
>      /* Fields corresponding to CSR_TLBELO0/1 */
> @@ -251,7 +267,7 @@ typedef struct CPUArchState {
>      uint64_t gpr[32];
>      uint64_t pc;
> -    uint64_t fpr[32];
> +    fpr_t fpr[32];

I didn't spot it right away, because you didn't add ".d" to the tcg register allocation, but if you use tcg/tcg-op-gvec.h (and you really should), then you will also have to remove

    for (i = 0; i < 32; i++) {
        int off = offsetof(CPULoongArchState, fpr[i]);
        cpu_fpr[i] = tcg_global_mem_new_i64(cpu_env, off, fregnames[i]);
    }

because one cannot modify global_mem variables with gvec.

I strongly suggest that you introduce wrappers to load/store fpr values from their env slots. I would name them similarly to gpr_{src,dst}, gen_set_gpr.

r~
Re: [RFC PATCH 01/43] target/loongarch: Add vector data type vec_t
On 12/24/22 00:15, Song Gao wrote:
> +union vec_t {
> +    int8_t B[LSX_LEN / 8];
> +    int16_t H[LSX_LEN / 16];
> +    int32_t W[LSX_LEN / 32];
> +    int64_t D[LSX_LEN / 64];
> +    __int128 Q[LSX_LEN / 128];

Oh, you can't use __int128 directly. It won't compile on 32-bit hosts.

r~
Re: [RFC PATCH 01/43] target/loongarch: Add vector data type vec_t
On 2022/12/25 at 1:24 AM, Richard Henderson wrote:
> On 12/24/22 00:15, Song Gao wrote:
>> +union fpr_t {
>> +    uint64_t d;
>> +    vec_t vec;
>> +};
>> +
>>  struct LoongArchTLB {
>>      uint64_t tlb_misc;
>>      /* Fields corresponding to CSR_TLBELO0/1 */
>> @@ -251,7 +267,7 @@ typedef struct CPUArchState {
>>      uint64_t gpr[32];
>>      uint64_t pc;
>> -    uint64_t fpr[32];
>> +    fpr_t fpr[32];
>
> I didn't spot it right away, because you didn't add ".d" to the tcg register allocation,

Oh, my mistake.

> but if you use tcg/tcg-op-gvec.h (and you really should), then you will also have to remove
>
>     for (i = 0; i < 32; i++) {
>         int off = offsetof(CPULoongArchState, fpr[i]);
>         cpu_fpr[i] = tcg_global_mem_new_i64(cpu_env, off, fregnames[i]);
>     }
>
> because one cannot modify global_mem variables with gvec.

The manual says "The lower 64 bits of each vector register overlap with the floating point register of the same number. In other words, when the basic floating-point instruction is executed to update the floating-point register, the low 64 bits of the corresponding LSX register are also updated to the same value."

So if we don't use fpr_t, we should:
1. Update the low 64 bits of the LSX register after floating-point instruction translation;
2. Update the floating-point registers after LSX instruction translation.

Should we do this, or have I misunderstood?

> I strongly suggest that you introduce wrappers to load/store fpr values from their env slots. I would name them similarly to gpr_{src,dst}, gen_set_gpr.

Got it.

Thanks.
Song Gao
Re: [RFC PATCH 01/43] target/loongarch: Add vector data type vec_t
On 12/27/22 18:34, gaosong wrote:
> The manual says "The lower 64 bits of each vector register overlap with the floating point register of the same number. In other words, when the basic floating-point instruction is executed to update the floating-point register, the low 64 bits of the corresponding LSX register are also updated to the same value."
>
> So if we don't use fpr_t, we should:
> 1. Update the low 64 bits of the LSX register after floating-point instruction translation;
> 2. Update the floating-point registers after LSX instruction translation.
>
> Should we do this, or have I misunderstood?

You should use fpr_t, you should not use cpu_fpr[]. This is the same as aarch64, for instance.

A related question though: does the manual mention whether the fpu instructions only modify the lower 64 bits, or do the high 64-bits become zeroed, nanboxed, or unspecified?

>> I strongly suggest that you introduce wrappers to load/store fpr values from their env slots. I would name them similarly to gpr_{src,dst}, gen_set_gpr.
>
> Got it.

r~
Re: [RFC PATCH 01/43] target/loongarch: Add vector data type vec_t
On 2022/12/29 at 1:30 AM, Richard Henderson wrote:
> On 12/27/22 18:34, gaosong wrote:
>> The manual says "The lower 64 bits of each vector register overlap with the floating point register of the same number. In other words, when the basic floating-point instruction is executed to update the floating-point register, the low 64 bits of the corresponding LSX register are also updated to the same value."
>>
>> So if we don't use fpr_t, we should:
>> 1. Update the low 64 bits of the LSX register after floating-point instruction translation;
>> 2. Update the floating-point registers after LSX instruction translation.
>>
>> Should we do this, or have I misunderstood?
>
> You should use fpr_t, you should not use cpu_fpr[]. This is the same as aarch64, for instance.
>
> A related question though: does the manual mention whether the fpu instructions only modify the lower 64 bits, or do the high 64-bits become zeroed, nanboxed, or unspecified?

They only modify the lower 64 bits; the high 64 bits are unspecified.

Thanks.
Song Gao

>>> I strongly suggest that you introduce wrappers to load/store fpr values from their env slots. I would name them similarly to gpr_{src,dst}, gen_set_gpr.
>>
>> Got it.
>
> r~
Re: [RFC PATCH 01/43] target/loongarch: Add vector data type vec_t
On 12/28/22 17:51, gaosong wrote:
>> A related question though: does the manual mention whether the fpu instructions only modify the lower 64 bits, or do the high 64-bits become zeroed, nanboxed, or unspecified?
>
> They only modify the lower 64 bits; the high 64 bits are unspecified.

These two options are mutually exclusive. If the upper 64 bits are unmodified, then they *are* specified to be the previous contents.

r~
Re: [RFC PATCH 01/43] target/loongarch: Add vector data type vec_t
On 2022/12/29 at 11:13 AM, Richard Henderson wrote:
> On 12/28/22 17:51, gaosong wrote:
>>> A related question though: does the manual mention whether the fpu instructions only modify the lower 64 bits, or do the high 64-bits become zeroed, nanboxed, or unspecified?
>>
>> They only modify the lower 64 bits; the high 64 bits are unspecified.
>
> These two options are mutually exclusive. If the upper 64 bits are unmodified, then they *are* specified to be the previous contents.

My description was not correct. The fpu instruction will modify the low 64 bits, but the high 64 bits are unspecified and their values are "unpredictable".

Thanks.
Song Gao
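The wrapper idea from earlier in the thread, combined with the "low 64 bits written, high 64 bits unpredictable" semantics, can be modelled in plain C as below. This is only a host-side sketch, not TCG translator code: the names ModelVec, ModelEnv, model_get_fpr and model_set_fpr are hypothetical, and an actual translator would emit loads and stores against the env slot rather than call C accessors. It assumes the layout where D[0] holds the low 64 bits of the vector register.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 128-bit vector register; D[0] is assumed to be the low half. */
typedef union {
    int8_t  B[16];
    int16_t H[8];
    int32_t W[4];
    int64_t D[2];
} ModelVec;

/* Hypothetical CPU state holding only the shared FP/vector register file. */
typedef struct {
    ModelVec fpr[32];
} ModelEnv;

/* Read FP register n: the scalar view is the low 64 bits of the vector. */
static inline uint64_t model_get_fpr(const ModelEnv *env, unsigned n)
{
    assert(n < 32);
    return (uint64_t)env->fpr[n].D[0];
}

/*
 * Write FP register n. Only the low 64 bits are stored. The high 64 bits
 * are left as they were, which is one behaviour allowed when the manual
 * calls them unpredictable.
 */
static inline void model_set_fpr(ModelEnv *env, unsigned n, uint64_t val)
{
    assert(n < 32);
    env->fpr[n].D[0] = (int64_t)val;
}
```

Routing every scalar access through a pair of accessors like this keeps the choice of what happens to the high half in one place, so it can be changed later without touching each instruction.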