Re: [Qemu-devel] [RFC PATCH 4/6] target/ppc: switch FPR, VMX and VSX helpers to access data directly from cpu_env

2018-12-11 Thread Richard Henderson
On 12/11/18 1:21 PM, Mark Cave-Ayland wrote:
>> Note however, that there are other steps that you must add here before using
>> vector operations in the next patch:
>>
>> (1a) The fpr and vsr arrays must be merged, since fpr[n] == vsrh[n].
>>  If this isn't done, then you simply cannot apply one operation
>>  to two disjoint memory blocks.
>>
>> (1b) The vsr and avr arrays should be merged, since vsr[32+n] == avr[n].
>>  This is simply tidiness, matching the layout to the architecture.
>>
>> These steps will modify gdbstub.c, machine.c, and linux-user/.
> 
> The reason I didn't touch the VSR arrays was because I was hoping that this 
> could be
> done as a follow up later; my thought was that since I'd only introduced 
> vector
> operations into the VMX instructions then currently no vector operations 
> could be
> done across the 2 separate memory blocks?

True, until you convert the VSX insns you can delay this.
Though honestly I would consider doing both at once.

>> (2) The vsr array needs to be QEMU_ALIGN(16).  See target/arm/cpu.h.
>> We assert that the host addresses are 16 byte aligned, so that we
>> can eventually use Altivec/VSX in tcg/ppc/.
> 
> That's a good observation. Presumably being on Intel the unaligned accesses 
> would
> still work but just be slower? I've certainly seen the new vector ops being 
> emitted
> in the generated code.

Yes, currently I generate unaligned loads.  It made sense when considering AVX2
and ARM SVE, since I do not increase the alignment requirements to 32-bytes
when using 256-bit vectors.

I do wonder if I should go back and generate aligned loads, just to raise
SIGBUS when one has forgotten the QEMU_ALIGN marker, as a portability aid.


r~



Re: [Qemu-devel] [RFC PATCH 4/6] target/ppc: switch FPR, VMX and VSX helpers to access data directly from cpu_env

2018-12-11 Thread Mark Cave-Ayland
On 10/12/2018 19:05, Richard Henderson wrote:

> On 12/7/18 2:56 AM, Mark Cave-Ayland wrote:
>> Instead of accessing the FPR, VMX and VSX registers through static arrays of
>> TCGv_i64 globals, remove them and change the helpers to load/store data 
>> directly
>> within cpu_env.
>>
>> Signed-off-by: Mark Cave-Ayland 
>> ---
>>  target/ppc/translate.c | 59 
>> ++
>>  1 file changed, 16 insertions(+), 43 deletions(-)
> 
> Reviewed-by: Richard Henderson 
> 
> Note however, that there are other steps that you must add here before using
> vector operations in the next patch:
> 
> (1a) The fpr and vsr arrays must be merged, since fpr[n] == vsrh[n].
>  If this isn't done, then you simply cannot apply one operation
>  to two disjoint memory blocks.
> 
> (1b) The vsr and avr arrays should be merged, since vsr[32+n] == avr[n].
>  This is simply tidiness, matching the layout to the architecture.
> 
> These steps will modify gdbstub.c, machine.c, and linux-user/.

The reason I didn't touch the VSR arrays was because I was hoping that this 
could be
done as a follow up later; my thought was that since I'd only introduced vector
operations into the VMX instructions then currently no vector operations could 
be
done across the 2 separate memory blocks?

> (2) The vsr array needs to be QEMU_ALIGN(16).  See target/arm/cpu.h.
> We assert that the host addresses are 16 byte aligned, so that we
> can eventually use Altivec/VSX in tcg/ppc/.

That's a good observation. Presumably being on Intel the unaligned accesses 
would
still work but just be slower? I've certainly seen the new vector ops being 
emitted
in the generated code.


ATB,

Mark.



Re: [Qemu-devel] [RFC PATCH 4/6] target/ppc: switch FPR, VMX and VSX helpers to access data directly from cpu_env

2018-12-10 Thread Richard Henderson
On 12/7/18 2:56 AM, Mark Cave-Ayland wrote:
> Instead of accessing the FPR, VMX and VSX registers through static arrays of
> TCGv_i64 globals, remove them and change the helpers to load/store data 
> directly
> within cpu_env.
> 
> Signed-off-by: Mark Cave-Ayland 
> ---
>  target/ppc/translate.c | 59 
> ++
>  1 file changed, 16 insertions(+), 43 deletions(-)

Reviewed-by: Richard Henderson 

Note however, that there are other steps that you must add here before using
vector operations in the next patch:

(1a) The fpr and vsr arrays must be merged, since fpr[n] == vsrh[n].
 If this isn't done, then you simply cannot apply one operation
 to two disjoint memory blocks.

(1b) The vsr and avr arrays should be merged, since vsr[32+n] == avr[n].
 This is simply tidiness, matching the layout to the architecture.

These steps will modify gdbstub.c, machine.c, and linux-user/.

(2) The vsr array needs to be QEMU_ALIGN(16).  See target/arm/cpu.h.
We assert that the host addresses are 16 byte aligned, so that we
can eventually use Altivec/VSX in tcg/ppc/.


r~



[Qemu-devel] [RFC PATCH 4/6] target/ppc: switch FPR, VMX and VSX helpers to access data directly from cpu_env

2018-12-07 Thread Mark Cave-Ayland
Instead of accessing the FPR, VMX and VSX registers through static arrays of
TCGv_i64 globals, remove them and change the helpers to load/store data directly
within cpu_env.

Signed-off-by: Mark Cave-Ayland 
---
 target/ppc/translate.c | 59 ++
 1 file changed, 16 insertions(+), 43 deletions(-)

diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index fa3e8dc114..5923c688cd 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -55,15 +55,9 @@
 /* global register indexes */
 static char cpu_reg_names[10*3 + 22*4 /* GPR */
 + 10*4 + 22*5 /* SPE GPRh */
-+ 10*4 + 22*5 /* FPR */
-+ 2*(10*6 + 22*7) /* AVRh, AVRl */
-+ 10*5 + 22*6 /* VSR */
 + 8*5 /* CRF */];
 static TCGv cpu_gpr[32];
 static TCGv cpu_gprh[32];
-static TCGv_i64 cpu_fpr[32];
-static TCGv_i64 cpu_avrh[32], cpu_avrl[32];
-static TCGv_i64 cpu_vsr[32];
 static TCGv_i32 cpu_crf[8];
 static TCGv cpu_nip;
 static TCGv cpu_msr;
@@ -108,39 +102,6 @@ void ppc_translate_init(void)
  offsetof(CPUPPCState, gprh[i]), p);
 p += (i < 10) ? 4 : 5;
 cpu_reg_names_size -= (i < 10) ? 4 : 5;
-
-snprintf(p, cpu_reg_names_size, "fp%d", i);
-cpu_fpr[i] = tcg_global_mem_new_i64(cpu_env,
-offsetof(CPUPPCState, fpr[i]), p);
-p += (i < 10) ? 4 : 5;
-cpu_reg_names_size -= (i < 10) ? 4 : 5;
-
-snprintf(p, cpu_reg_names_size, "avr%dH", i);
-#ifdef HOST_WORDS_BIGENDIAN
-cpu_avrh[i] = tcg_global_mem_new_i64(cpu_env,
- offsetof(CPUPPCState, 
avr[i].u64[0]), p);
-#else
-cpu_avrh[i] = tcg_global_mem_new_i64(cpu_env,
- offsetof(CPUPPCState, 
avr[i].u64[1]), p);
-#endif
-p += (i < 10) ? 6 : 7;
-cpu_reg_names_size -= (i < 10) ? 6 : 7;
-
-snprintf(p, cpu_reg_names_size, "avr%dL", i);
-#ifdef HOST_WORDS_BIGENDIAN
-cpu_avrl[i] = tcg_global_mem_new_i64(cpu_env,
- offsetof(CPUPPCState, 
avr[i].u64[1]), p);
-#else
-cpu_avrl[i] = tcg_global_mem_new_i64(cpu_env,
- offsetof(CPUPPCState, 
avr[i].u64[0]), p);
-#endif
-p += (i < 10) ? 6 : 7;
-cpu_reg_names_size -= (i < 10) ? 6 : 7;
-snprintf(p, cpu_reg_names_size, "vsr%d", i);
-cpu_vsr[i] = tcg_global_mem_new_i64(cpu_env,
-offsetof(CPUPPCState, vsr[i]), p);
-p += (i < 10) ? 5 : 6;
-cpu_reg_names_size -= (i < 10) ? 5 : 6;
 }
 
 cpu_nip = tcg_global_mem_new(cpu_env,
@@ -6696,22 +6657,34 @@ GEN_TM_PRIV_NOOP(trechkpt);
 
 static inline void get_fpr(TCGv_i64 dst, int regno)
 {
-tcg_gen_mov_i64(dst, cpu_fpr[regno]);
+tcg_gen_ld_i64(dst, cpu_env, offsetof(CPUPPCState, fpr[regno]));
 }
 
 static inline void set_fpr(int regno, TCGv_i64 src)
 {
-tcg_gen_mov_i64(cpu_fpr[regno], src);
+tcg_gen_st_i64(src, cpu_env, offsetof(CPUPPCState, fpr[regno]));
 }
 
 static inline void get_avr64(TCGv_i64 dst, int regno, bool high)
 {
-tcg_gen_mov_i64(dst, (high ? cpu_avrh : cpu_avrl)[regno]);
+#ifdef HOST_WORDS_BIGENDIAN
+tcg_gen_ld_i64(dst, cpu_env, offsetof(CPUPPCState,
+  avr[regno].u64[(high ? 0 : 1)]));
+#else
+tcg_gen_ld_i64(dst, cpu_env, offsetof(CPUPPCState,
+  avr[regno].u64[(high ? 1 : 0)]));
+#endif
 }
 
 static inline void set_avr64(int regno, TCGv_i64 src, bool high)
 {
-tcg_gen_mov_i64((high ? cpu_avrh : cpu_avrl)[regno], src);
+#ifdef HOST_WORDS_BIGENDIAN
+tcg_gen_st_i64(src, cpu_env, offsetof(CPUPPCState,
+  avr[regno].u64[(high ? 0 : 1)]));
+#else
+tcg_gen_st_i64(src, cpu_env, offsetof(CPUPPCState,
+  avr[regno].u64[(high ? 1 : 0)]));
+#endif
 }
 
 #include "translate/fp-impl.inc.c"
-- 
2.11.0