On 12/11/18 1:21 PM, Mark Cave-Ayland wrote:
>> Note however, that there are other steps that you must add here before
>> using vector operations in the next patch:
>>
>> (1a) The fpr and vsr arrays must be merged, since fpr[n] == vsrh[n].
>>      If this isn't done, then you simply cannot apply one operation
>>      to two disjoint memory blocks.
>>
>> (1b) The vsr and avr arrays should be merged, since vsr[32+n] == avr[n].
>>      This is simply tidiness, matching the layout to the architecture.
>>
>> These steps will modify gdbstub.c, machine.c, and linux-user/.
>
> The reason I didn't touch the VSR arrays was because I was hoping that
> this could be done as a follow-up later; my thought was that since I'd
> only introduced vector operations into the VMX instructions, currently
> no vector operations could be done across the 2 separate memory blocks?
True, until you convert the VSX insns you can delay this. Though honestly
I would consider doing both at once.

>> (2) The vsr array needs to be QEMU_ALIGN(16). See target/arm/cpu.h.
>>     We assert that the host addresses are 16-byte aligned, so that we
>>     can eventually use Altivec/VSX in tcg/ppc/.
>
> That's a good observation. Presumably being on Intel the unaligned
> accesses would still work but just be slower? I've certainly seen the
> new vector ops being emitted in the generated code.

Yes, currently I generate unaligned loads. That made sense when considering
AVX2 and ARM SVE, since I do not increase the alignment requirement to
32 bytes when using 256-bit vectors.

I do wonder if I should go back and generate aligned loads, just to raise
SIGBUS when someone has forgotten the QEMU_ALIGN marker, as a portability
aid.

r~