Re: [Qemu-devel] [RFC PATCH 4/6] target/ppc: switch FPR, VMX and VSX helpers to access data directly from cpu_env

2018-12-11 Thread Richard Henderson
On 12/11/18 1:21 PM, Mark Cave-Ayland wrote:
>> Note however, that there are other steps that you must add here before using
>> vector operations in the next patch:
>>
>> (1a) The fpr and vsr arrays must be merged, since fpr[n] == vsrh[n].
>>  If this isn't done, then you simply cannot apply one operation
>>  to two disjoint memory blocks.
>>
>> (1b) The vsr and avr arrays should be merged, since vsr[32+n] == avr[n].
>>  This is simply tidiness, matching the layout to the architecture.
>>
>> These steps will modify gdbstub.c, machine.c, and linux-user/.
> 
> The reason I didn't touch the VSR arrays was because I was hoping that this 
> could be
> done as a follow up later; my thought was that since I'd only introduced 
> vector
> operations into the VMX instructions then currently no vector operations 
> could be
> done across the 2 separate memory blocks?

True, until you convert the VSX insns you can delay this.
Though honestly I would consider doing both at once.

>> (2) The vsr array needs to be QEMU_ALIGN(16).  See target/arm/cpu.h.
>> We assert that the host addresses are 16 byte aligned, so that we
>> can eventually use Altivec/VSX in tcg/ppc/.
> 
> That's a good observation. Presumably being on Intel the unaligned accesses 
> would
> still work but just be slower? I've certainly seen the new vector ops being 
> emitted
> in the generated code.

Yes, currently I generate unaligned loads.  It made sense when considering AVX2
and ARM SVE, since I do not increase the alignment requirements to 32-bytes
when using 256-bit vectors.

I do wonder if I should go back and generate aligned loads, just to raise
SIGBUS when one has forgotten the QEMU_ALIGN marker, as a portability aid.


r~



Re: [Qemu-devel] [RFC PATCH 4/6] target/ppc: switch FPR, VMX and VSX helpers to access data directly from cpu_env

2018-12-11 Thread Mark Cave-Ayland
On 10/12/2018 19:05, Richard Henderson wrote:

> On 12/7/18 2:56 AM, Mark Cave-Ayland wrote:
>> Instead of accessing the FPR, VMX and VSX registers through static arrays of
>> TCGv_i64 globals, remove them and change the helpers to load/store data 
>> directly
>> within cpu_env.
>>
>> Signed-off-by: Mark Cave-Ayland 
>> ---
>>  target/ppc/translate.c | 59 
>> ++
>>  1 file changed, 16 insertions(+), 43 deletions(-)
> 
> Reviewed-by: Richard Henderson 
> 
> Note however, that there are other steps that you must add here before using
> vector operations in the next patch:
> 
> (1a) The fpr and vsr arrays must be merged, since fpr[n] == vsrh[n].
>  If this isn't done, then you simply cannot apply one operation
>  to two disjoint memory blocks.
> 
> (1b) The vsr and avr arrays should be merged, since vsr[32+n] == avr[n].
>  This is simply tidiness, matching the layout to the architecture.
> 
> These steps will modify gdbstub.c, machine.c, and linux-user/.

The reason I didn't touch the VSR arrays was because I was hoping that this 
could be
done as a follow up later; my thought was that since I'd only introduced vector
operations into the VMX instructions then currently no vector operations could 
be
done across the 2 separate memory blocks?

> (2) The vsr array needs to be QEMU_ALIGN(16).  See target/arm/cpu.h.
> We assert that the host addresses are 16 byte aligned, so that we
> can eventually use Altivec/VSX in tcg/ppc/.

That's a good observation. Presumably being on Intel the unaligned accesses 
would
still work but just be slower? I've certainly seen the new vector ops being 
emitted
in the generated code.


ATB,

Mark.



Re: [Qemu-devel] [RFC PATCH 4/6] target/ppc: switch FPR, VMX and VSX helpers to access data directly from cpu_env

2018-12-10 Thread Richard Henderson
On 12/7/18 2:56 AM, Mark Cave-Ayland wrote:
> Instead of accessing the FPR, VMX and VSX registers through static arrays of
> TCGv_i64 globals, remove them and change the helpers to load/store data 
> directly
> within cpu_env.
> 
> Signed-off-by: Mark Cave-Ayland 
> ---
>  target/ppc/translate.c | 59 
> ++
>  1 file changed, 16 insertions(+), 43 deletions(-)

Reviewed-by: Richard Henderson 

Note however, that there are other steps that you must add here before using
vector operations in the next patch:

(1a) The fpr and vsr arrays must be merged, since fpr[n] == vsrh[n].
 If this isn't done, then you simply cannot apply one operation
 to two disjoint memory blocks.

(1b) The vsr and avr arrays should be merged, since vsr[32+n] == avr[n].
 This is simply tidiness, matching the layout to the architecture.

These steps will modify gdbstub.c, machine.c, and linux-user/.

(2) The vsr array needs to be QEMU_ALIGN(16).  See target/arm/cpu.h.
We assert that the host addresses are 16 byte aligned, so that we
can eventually use Altivec/VSX in tcg/ppc/.


r~