[RFC PATCH 01/43] target/loongarch: Add vector data type vec_t

2022-12-24 Thread Song Gao
Signed-off-by: Song Gao 
---
 linux-user/loongarch64/signal.c |  4 ++--
 target/loongarch/cpu.c  |  2 +-
 target/loongarch/cpu.h  | 18 +-
 target/loongarch/gdbstub.c  |  4 ++--
 target/loongarch/machine.c  |  2 +-
 5 files changed, 23 insertions(+), 7 deletions(-)

diff --git a/linux-user/loongarch64/signal.c b/linux-user/loongarch64/signal.c
index 7c7afb652e..40dba974d0 100644
--- a/linux-user/loongarch64/signal.c
+++ b/linux-user/loongarch64/signal.c
@@ -128,7 +128,7 @@ static void setup_sigframe(CPULoongArchState *env,
 
 fpu_ctx = (struct target_fpu_context *)(info + 1);
 for (i = 0; i < 32; ++i) {
-__put_user(env->fpr[i], &fpu_ctx->regs[i]);
+__put_user(env->fpr[i].d, &fpu_ctx->regs[i]);
 }
 __put_user(read_fcc(env), &fpu_ctx->fcc);
 __put_user(env->fcsr0, &fpu_ctx->fcsr);
@@ -193,7 +193,7 @@ static void restore_sigframe(CPULoongArchState *env,
 uint64_t fcc;
 
 for (i = 0; i < 32; ++i) {
-__get_user(env->fpr[i], &fpu_ctx->regs[i]);
+__get_user(env->fpr[i].d, &fpu_ctx->regs[i]);
 }
 __get_user(fcc, &fpu_ctx->fcc);
 write_fcc(env, fcc);
diff --git a/target/loongarch/cpu.c b/target/loongarch/cpu.c
index 290ab4d526..59ae29a3b4 100644
--- a/target/loongarch/cpu.c
+++ b/target/loongarch/cpu.c
@@ -653,7 +653,7 @@ void loongarch_cpu_dump_state(CPUState *cs, FILE *f, int flags)
 /* fpr */
 if (flags & CPU_DUMP_FPU) {
 for (i = 0; i < 32; i++) {
-qemu_fprintf(f, " %s %016" PRIx64, fregnames[i], env->fpr[i]);
+qemu_fprintf(f, " %s %016" PRIx64, fregnames[i], env->fpr[i].d);
 if ((i & 3) == 3) {
 qemu_fprintf(f, "\n");
 }
diff --git a/target/loongarch/cpu.h b/target/loongarch/cpu.h
index e35cf65597..d37df63bde 100644
--- a/target/loongarch/cpu.h
+++ b/target/loongarch/cpu.h
@@ -239,6 +239,22 @@ FIELD(TLB_MISC, ASID, 1, 10)
 FIELD(TLB_MISC, VPPN, 13, 35)
 FIELD(TLB_MISC, PS, 48, 6)
 
+#define LSX_LEN   (128)
+typedef union vec_t vec_t;
+union vec_t {
+int8_t   B[LSX_LEN / 8];
+int16_t  H[LSX_LEN / 16];
+int32_t  W[LSX_LEN / 32];
+int64_t  D[LSX_LEN / 64];
+__int128 Q[LSX_LEN / 128];
+};
+
+typedef union fpr_t fpr_t;
+union fpr_t {
+uint64_t d;
+vec_t vec;
+};
+
 struct LoongArchTLB {
 uint64_t tlb_misc;
 /* Fields corresponding to CSR_TLBELO0/1 */
@@ -251,7 +267,7 @@ typedef struct CPUArchState {
 uint64_t gpr[32];
 uint64_t pc;
 
-uint64_t fpr[32];
+fpr_t fpr[32];
 float_status fp_status;
 bool cf[8];
 
diff --git a/target/loongarch/gdbstub.c b/target/loongarch/gdbstub.c
index a4d1e28e36..18cba6f8f3 100644
--- a/target/loongarch/gdbstub.c
+++ b/target/loongarch/gdbstub.c
@@ -68,7 +68,7 @@ static int loongarch_gdb_get_fpu(CPULoongArchState *env,
  GByteArray *mem_buf, int n)
 {
 if (0 <= n && n < 32) {
-return gdb_get_reg64(mem_buf, env->fpr[n]);
+return gdb_get_reg64(mem_buf, env->fpr[n].d);
 } else if (n == 32) {
 uint64_t val = read_fcc(env);
 return gdb_get_reg64(mem_buf, val);
@@ -84,7 +84,7 @@ static int loongarch_gdb_set_fpu(CPULoongArchState *env,
 int length = 0;
 
 if (0 <= n && n < 32) {
-env->fpr[n] = ldq_p(mem_buf);
+env->fpr[n].d = ldq_p(mem_buf);
 length = 8;
 } else if (n == 32) {
 uint64_t val = ldq_p(mem_buf);
diff --git a/target/loongarch/machine.c b/target/loongarch/machine.c
index b1e523ea72..b3598cce3f 100644
--- a/target/loongarch/machine.c
+++ b/target/loongarch/machine.c
@@ -33,7 +33,7 @@ const VMStateDescription vmstate_loongarch_cpu = {
 
 VMSTATE_UINTTL_ARRAY(env.gpr, LoongArchCPU, 32),
 VMSTATE_UINTTL(env.pc, LoongArchCPU),
-VMSTATE_UINT64_ARRAY(env.fpr, LoongArchCPU, 32),
+VMSTATE_UINT64_ARRAY(env.fpr.d, LoongArchCPU, 32),
 VMSTATE_UINT32(env.fcsr0, LoongArchCPU),
 VMSTATE_BOOL_ARRAY(env.cf, LoongArchCPU, 8),
 
-- 
2.31.1




Re: [RFC PATCH 01/43] target/loongarch: Add vector data type vec_t

2023-02-13 Thread gaosong

Hi, Richard

On 2022/12/25 1:32 AM, Richard Henderson wrote:

On 12/24/22 00:15, Song Gao wrote:

+union vec_t {
+    int8_t   B[LSX_LEN / 8];
+    int16_t  H[LSX_LEN / 16];
+    int32_t  W[LSX_LEN / 32];
+    int64_t  D[LSX_LEN / 64];
+    __int128 Q[LSX_LEN / 128];


Oh, you can't use __int128 directly.
It won't compile on 32-bit hosts.



Can we use Int128 after including "qemu/int128.h"?
Then some vxx_q instructions can use int128_xx(a, b).

Thanks.
Song Gao




Re: [RFC PATCH 01/43] target/loongarch: Add vector data type vec_t

2023-02-13 Thread Richard Henderson

On 2/12/23 22:24, gaosong wrote:

Hi, Richard

On 2022/12/25 1:32 AM, Richard Henderson wrote:

On 12/24/22 00:15, Song Gao wrote:

+union vec_t {
+    int8_t   B[LSX_LEN / 8];
+    int16_t  H[LSX_LEN / 16];
+    int32_t  W[LSX_LEN / 32];
+    int64_t  D[LSX_LEN / 64];
+    __int128 Q[LSX_LEN / 128];


Oh, you can't use __int128 directly.
It won't compile on 32-bit hosts.



Can we use Int128 after including "qemu/int128.h"?
Then some vxx_q instructions can use int128_xx(a, b).


Yes, certainly.

r~
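
The host-independent 128-bit type agreed on above can be pictured with a small stand-in. This is only a sketch: the names sketch_int128, sketch_int128_make and sketch_int128_add are hypothetical, and the series itself would use QEMU's Int128 and the int128_*() helpers from "qemu/int128.h" rather than anything defined here. It shows why a struct of two 64-bit halves compiles on hosts without __int128.

```c
#include <stdint.h>

/* Hypothetical stand-in for a portable 128-bit integer (not QEMU's Int128). */
typedef struct {
    uint64_t lo;   /* low 64 bits */
    int64_t  hi;   /* high 64 bits, carries the sign */
} sketch_int128;

static sketch_int128 sketch_int128_make(uint64_t lo, int64_t hi)
{
    sketch_int128 r = { lo, hi };
    return r;
}

/* 128-bit addition built only from 64-bit operations. */
static sketch_int128 sketch_int128_add(sketch_int128 a, sketch_int128 b)
{
    uint64_t lo = a.lo + b.lo;
    /* The unsigned sum wrapped exactly when it is smaller than an operand. */
    uint64_t carry = lo < a.lo;
    /* Add the high halves as unsigned to avoid signed-overflow UB. */
    uint64_t hi = (uint64_t)a.hi + (uint64_t)b.hi + carry;
    return sketch_int128_make(lo, (int64_t)hi);
}
```

A vxx_q helper would then operate on the Q element through such functions instead of using the + operator on __int128.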




Re: [RFC PATCH 01/43] target/loongarch: Add vector data type vec_t

2022-12-24 Thread Richard Henderson

On 12/24/22 00:15, Song Gao wrote:

+#define LSX_LEN   (128)
+typedef union vec_t vec_t;
+union vec_t {
+int8_t   B[LSX_LEN / 8];
+int16_t  H[LSX_LEN / 16];
+int32_t  W[LSX_LEN / 32];
+int64_t  D[LSX_LEN / 64];
+__int128 Q[LSX_LEN / 128];
+};
+
+typedef union fpr_t fpr_t;
+union fpr_t {
+uint64_t d;
+vec_t vec;
+};


You need to think about host endianness with this overlap and indexing.

There are two different models which can be emulated:

(1) target/{arm,s390x}/ has each uint64_t in host-endian order, but the words are indexed 
little-endian.  See, for instance, target/s390x/tcg/vec.h.


(2) target/{ppc,i386}/ has the entire vector in host-endian order.  See, for instance, 
ZMM_* in target/i386/cpu.h.


If you do nothing, I assume this will fail on a big-endian host.


r~
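
Model (1) above can be sketched in plain C as follows. This is a minimal illustration, not the code of any QEMU target: the type sketch_vec_t, the IDX_* macros and the vec_get_*() accessors are hypothetical names, and the byte-order test relies on the GCC/Clang __BYTE_ORDER__ predefine rather than QEMU's own host-endianness macro. The idea is that each 64-bit word stays in host order, so on a big-endian host the index of a sub-word element has to be flipped within its 64-bit word.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: GCC/Clang predefines; QEMU has its own host-endian macro. */
#if defined(__BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
#define SKETCH_HOST_BIG_ENDIAN 1
#else
#define SKETCH_HOST_BIG_ENDIAN 0
#endif

/*
 * Each 64-bit word is stored in host order and the words are indexed
 * little-endian, so only the position inside a word needs fixing up.
 */
#if SKETCH_HOST_BIG_ENDIAN
#define IDX_B(i) ((i) ^ 7)   /* 8 bytes per 64-bit word */
#define IDX_H(i) ((i) ^ 3)   /* 4 halfwords per 64-bit word */
#define IDX_W(i) ((i) ^ 1)   /* 2 words per 64-bit word */
#else
#define IDX_B(i) (i)
#define IDX_H(i) (i)
#define IDX_W(i) (i)
#endif

typedef union {
    uint8_t  B[16];
    uint16_t H[8];
    uint32_t W[4];
    uint64_t D[2];
} sketch_vec_t;

/* Element 0 is always the least significant element of the vector. */
static inline uint8_t vec_get_b(const sketch_vec_t *v, unsigned i)
{
    assert(i < 16);
    return v->B[IDX_B(i)];
}

static inline uint16_t vec_get_h(const sketch_vec_t *v, unsigned i)
{
    assert(i < 8);
    return v->H[IDX_H(i)];
}

static inline uint32_t vec_get_w(const sketch_vec_t *v, unsigned i)
{
    assert(i < 4);
    return v->W[IDX_W(i)];
}
```

With accessors of this kind, helper code never indexes B[], H[] or W[] directly, so the same source gives the same guest-visible element order on little-endian and big-endian hosts.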



Re: [RFC PATCH 01/43] target/loongarch: Add vector data type vec_t

2022-12-24 Thread Richard Henderson

On 12/24/22 00:15, Song Gao wrote:

+union fpr_t {
+uint64_t d;
+vec_t vec;
+};
+
  struct LoongArchTLB {
  uint64_t tlb_misc;
  /* Fields corresponding to CSR_TLBELO0/1 */
@@ -251,7 +267,7 @@ typedef struct CPUArchState {
  uint64_t gpr[32];
  uint64_t pc;
  
-uint64_t fpr[32];

+fpr_t fpr[32];


I didn't spot it right away, because you didn't add ".d" to the tcg register allocation, 
but if you use tcg/tcg-op-gvec.h (and you really should), then you will also have to remove



for (i = 0; i < 32; i++) {
int off = offsetof(CPULoongArchState, fpr[i]);
cpu_fpr[i] = tcg_global_mem_new_i64(cpu_env, off, fregnames[i]);
}


because one cannot modify global_mem variables with gvec.

I strongly suggest that you introduce wrappers to load/store fpr values from their env 
slots.  I would name them similarly to gpr_{src,dst}, gen_set_gpr.



r~
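
The wrapper idea suggested above can be modelled outside of TCG. The sketch below is a plain C stand-in, not translator code: sketch_env_t, sketch_fpr_t, sketch_get_fpr and sketch_set_fpr are hypothetical names, and the memcpy calls stand in for the load and store ops that a real wrapper would emit against the env pointer at the same offset. What it shows is that every access to an FP register goes through one function that computes the slot's offset, instead of through a per-register global.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for fpr_t and CPULoongArchState. */
typedef union {
    uint64_t d;         /* scalar FP view */
    uint64_t vec_d[2];  /* stand-in for the 128-bit vec_t view */
} sketch_fpr_t;

typedef struct {
    uint64_t gpr[32];
    sketch_fpr_t fpr[32];
} sketch_env_t;

/* Byte offset of the scalar view of FP register n inside env. */
static size_t sketch_fpr_offset(int n)
{
    assert(n >= 0 && n < 32);
    return offsetof(sketch_env_t, fpr)
           + (size_t)n * sizeof(sketch_fpr_t)
           + offsetof(sketch_fpr_t, d);
}

/* Load FP register n from its env slot. */
static uint64_t sketch_get_fpr(const sketch_env_t *env, int n)
{
    uint64_t val;
    memcpy(&val, (const char *)env + sketch_fpr_offset(n), sizeof(val));
    return val;
}

/* Store val into the env slot of FP register n. */
static void sketch_set_fpr(sketch_env_t *env, int n, uint64_t val)
{
    memcpy((char *)env + sketch_fpr_offset(n), &val, sizeof(val));
}
```

Because the scalar and vector views share one slot, a value stored through the wrapper is immediately visible through the vector view, with no separate synchronisation step.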



Re: [RFC PATCH 01/43] target/loongarch: Add vector data type vec_t

2022-12-24 Thread Richard Henderson

On 12/24/22 00:15, Song Gao wrote:

+union vec_t {
+int8_t   B[LSX_LEN / 8];
+int16_t  H[LSX_LEN / 16];
+int32_t  W[LSX_LEN / 32];
+int64_t  D[LSX_LEN / 64];
+__int128 Q[LSX_LEN / 128];


Oh, you can't use __int128 directly.
It won't compile on 32-bit hosts.


r~



Re: [RFC PATCH 01/43] target/loongarch: Add vector data type vec_t

2022-12-27 Thread gaosong



On 2022/12/25 1:24 AM, Richard Henderson wrote:

On 12/24/22 00:15, Song Gao wrote:

+union fpr_t {
+    uint64_t d;
+    vec_t vec;
+};
+
  struct LoongArchTLB {
  uint64_t tlb_misc;
  /* Fields corresponding to CSR_TLBELO0/1 */
@@ -251,7 +267,7 @@ typedef struct CPUArchState {
  uint64_t gpr[32];
  uint64_t pc;
  -    uint64_t fpr[32];
+    fpr_t fpr[32];


I didn't spot it right away, because you didn't add ".d" to the tcg
register allocation,

Oh, my mistake.

but if you use tcg/tcg-op-gvec.h (and you really should), then you
will also have to remove



    for (i = 0; i < 32; i++) {
    int off = offsetof(CPULoongArchState, fpr[i]);
    cpu_fpr[i] = tcg_global_mem_new_i64(cpu_env, off, fregnames[i]);
    }


because one cannot modify global_mem variables with gvec.

The manual says: "The lower 64 bits of each vector register overlap with
the floating-point register of the same number. In other words, when a
basic floating-point instruction is executed to update a floating-point
register, the low 64 bits of the corresponding LSX register are also
updated to the same value."

So if we don't use fpr_t, we should:
1. Update the LSX low 64 bits after floating-point instruction translation;
2. Update the floating-point registers after LSX instruction translation.

Should we do this, or have I misunderstood?

I strongly suggest that you introduce wrappers to load/store fpr
values from their env slots.  I would name them similarly to
gpr_{src,dst}, gen_set_gpr.



Got it.

Thanks.
Song Gao




Re: [RFC PATCH 01/43] target/loongarch: Add vector data type vec_t

2022-12-28 Thread Richard Henderson

On 12/27/22 18:34, gaosong wrote:
The manual says: "The lower 64 bits of each vector register overlap with
the floating-point register of the same number. In other words, when a
basic floating-point instruction is executed to update a floating-point
register, the low 64 bits of the corresponding LSX register are also
updated to the same value."

So if we don't use fpr_t, we should:
1. Update the LSX low 64 bits after floating-point instruction translation;
2. Update the floating-point registers after LSX instruction translation.

Should we do this, or have I misunderstood?


You should use fpr_t, you should not use cpu_fpr[].
This is the same as aarch64, for instance.

A related question though: does the manual mention whether the fpu instructions only 
modify the lower 64 bits, or do the high 64-bits become zeroed, nanboxed, or unspecified?



I strongly suggest that you introduce wrappers to load/store fpr values from their env 
slots.  I would name them similarly to gpr_{src,dst}, gen_set_gpr.



Got it.



r~




Re: [RFC PATCH 01/43] target/loongarch: Add vector data type vec_t

2022-12-28 Thread gaosong



On 2022/12/29 1:30 AM, Richard Henderson wrote:

On 12/27/22 18:34, gaosong wrote:
The manual says: "The lower 64 bits of each vector register overlap with
the floating-point register of the same number. In other words, when a
basic floating-point instruction is executed to update a floating-point
register, the low 64 bits of the corresponding LSX register are also
updated to the same value."

So if we don't use fpr_t, we should:
1. Update the LSX low 64 bits after floating-point instruction translation;
2. Update the floating-point registers after LSX instruction translation.

Should we do this, or have I misunderstood?


You should use fpr_t, you should not use cpu_fpr[].
This is the same as aarch64, for instance.

A related question though: does the manual mention whether the fpu 
instructions only modify the lower 64 bits, or do the high 64-bits 
become zeroed, nanboxed, or unspecified?




They only modify the lower 64 bits; the high 64 bits are unspecified.

Thanks.
Song Gao

I strongly suggest that you introduce wrappers to load/store fpr 
values from their env slots.  I would name them similarly to 
gpr_{src,dst}, gen_set_gpr.



Got it.



r~





Re: [RFC PATCH 01/43] target/loongarch: Add vector data type vec_t

2022-12-28 Thread Richard Henderson

On 12/28/22 17:51, gaosong wrote:
A related question though: does the manual mention whether the fpu instructions only 
modify the lower 64 bits, or do the high 64-bits become zeroed, nanboxed, or unspecified?




They only modify the lower 64 bits; the high 64 bits are unspecified.


These two options are mutually exclusive.  If the upper 64 bits are unmodified, then they
*are* specified to be the previous contents.



r~



Re: [RFC PATCH 01/43] target/loongarch: Add vector data type vec_t

2022-12-28 Thread gaosong



On 2022/12/29 11:13 AM, Richard Henderson wrote:

On 12/28/22 17:51, gaosong wrote:
A related question though: does the manual mention whether the fpu 
instructions only modify the lower 64 bits, or do the high 64-bits 
become zeroed, nanboxed, or unspecified?




They only modify the lower 64 bits; the high 64 bits are unspecified.


These two options are mutually exclusive.  If the upper 64 bits are
unmodified, then they *are* specified to be the previous contents.



My description was not correct.
The FPU instructions modify the low 64 bits, but the high 64 bits
are unspecified and their values are "unpredictable".


Thanks.
Song Gao
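
The three behaviours asked about earlier in the thread (upper half kept, zeroed, or NaN-boxed) can be compared with a small model. This is a sketch under assumptions: sketch_fpr_t, upper_policy_t and fpu_write_d are hypothetical names, and treating any of the three as acceptable rests on reading "unpredictable" as "software must not rely on the value", which the thread itself does not spell out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for fpr_t: scalar view over a 128-bit register. */
typedef union {
    uint64_t d;     /* scalar FP view, overlaps D[0] */
    uint64_t D[2];  /* D[0] = low 64 bits, D[1] = high 64 bits */
} sketch_fpr_t;

typedef enum {
    UPPER_KEEP,    /* leave the high 64 bits as they were */
    UPPER_ZERO,    /* clear the high 64 bits */
    UPPER_NANBOX   /* set the high 64 bits to all ones */
} upper_policy_t;

/* Model a scalar FP result being written back to a register. */
static void fpu_write_d(sketch_fpr_t *r, uint64_t val, upper_policy_t p)
{
    assert(r != NULL);
    r->D[0] = val;
    switch (p) {
    case UPPER_KEEP:
        break;
    case UPPER_ZERO:
        r->D[1] = 0;
        break;
    case UPPER_NANBOX:
        r->D[1] = UINT64_MAX;
        break;
    }
}
```

With the union layout from the patch, UPPER_KEEP is what a plain 64-bit store to the register slot produces, so it needs no extra work in the translator.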