On 12/11/18 1:21 PM, Mark Cave-Ayland wrote:
>> Note however, that there are other steps that you must add here before using
>> vector operations in the next patch:
>>
>> (1a) The fpr and vsr arrays must be merged, since fpr[n] == vsrh[n].
>>      If this isn't done, then you simply cannot apply one operation
>>      to two disjoint memory blocks.
>>
>> (1b) The vsr and avr arrays should be merged, since vsr[32+n] == avr[n].
>>      This is simply tidiness, matching the layout to the architecture.
>>
>> These steps will modify gdbstub.c, machine.c, and linux-user/.
> 
> The reason I didn't touch the VSR arrays was because I was hoping that this
> could be done as a follow-up later; my thought was that since I'd only
> introduced vector operations into the VMX instructions, currently no vector
> operations could be done across the 2 separate memory blocks?

True, until you convert the VSX insns you can delay this.
Though honestly I would consider doing both at once.
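
Roughly, the merged state could look something like this -- the union and
accessor names below are only illustrative, and host-endian handling of the
VSR doubleword order is elided:

    typedef union {
        uint64_t u64[2];
        uint32_t u32[4];
        uint8_t  u8[16];
    } ppc_vsr_t;

    /* In CPUPPCState -- one array backs all three register files:
     *   fpr[n] lives in one doubleword of vsr[n]   (n = 0..31)
     *   avr[n] is vsr[32 + n]                      (n = 0..31)
     */
    ppc_vsr_t vsr[64];

    static inline uint64_t *cpu_fpr_ptr(CPUPPCState *env, int n)
    {
        return &env->vsr[n].u64[0];
    }

    static inline ppc_vsr_t *cpu_avr_ptr(CPUPPCState *env, int n)
    {
        return &env->vsr[32 + n];
    }

With that in place, gdbstub.c, machine.c and linux-user/ only need to be
pointed at the new accessors rather than at three separate arrays.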

>> (2) The vsr array needs to be QEMU_ALIGNED(16).  See target/arm/cpu.h.
>>     We assert that the host addresses are 16 byte aligned, so that we
>>     can eventually use Altivec/VSX in tcg/ppc/.
> 
> That's a good observation. Presumably being on Intel the unaligned accesses
> would still work but just be slower? I've certainly seen the new vector ops
> being emitted in the generated code.

Yes, currently I generate unaligned loads.  That made sense when considering AVX2
and ARM SVE, since I do not increase the alignment requirement to 32 bytes
when using 256-bit vectors.

I do wonder if I should go back and generate aligned loads, just to raise
SIGBUS when one has forgotten the QEMU_ALIGNED marker, as a portability aid.
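
Purely to illustrate what the marker buys us (check_vsr_align is a made-up
helper name, not the actual tcg-side check):

    /* The marker on the merged array in CPUPPCState ... */
    ppc_vsr_t vsr[64] QEMU_ALIGNED(16);

    /* ... is what lets us assume 16-byte-aligned host addresses.  Without
     * it, an aligned host vector load/store of &env->vsr[n] could fault. */
    static inline void check_vsr_align(CPUPPCState *env, int n)
    {
        assert(((uintptr_t)&env->vsr[n] & 15) == 0);
    }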


r~
