On Wed, Jun 26, 2019 at 05:22:34PM +0200, Richard Henderson wrote: > On 6/21/19 6:34 PM, Andrew Jones wrote: > > +/* > > + * If ARM_MAX_VQ is increased to be greater than 16, then we can no > > + * longer hard code slices to 1 in kvm_arch_put/get_sve(). > > + */ > > +QEMU_BUILD_BUG_ON(ARM_MAX_VQ > 16); > > This seems easy to fix, or simply drop the slices entirely for now, as > otherwise they are a teeny bit confusing.
I can do that, but as I replied down thread, I sort of like it this way for documentation purposes. Anyway, I don't have a strong opinion here, so I'm happy to make reviewers happy :-) > > It's a shame that these slices exist at all. It seems like the kernel could > use the negotiated max sve size to grab the data all at once. > > > + for (n = 0; n < KVM_ARM64_SVE_NUM_ZREGS; n++) { > > + uint64_t *q = aa64_vfp_qreg(env, n); > > +#ifdef HOST_WORDS_BIGENDIAN > > + uint64_t d[ARM_MAX_VQ * 2]; > > + int j; > > + for (j = 0; j < cpu->sve_max_vq * 2; j++) { > > + d[j] = bswap64(q[j]); > > + } > > + reg.addr = (uintptr_t)d; > > +#else > > + reg.addr = (uintptr_t)q; > > +#endif > > + reg.id = KVM_REG_ARM64_SVE_ZREG(n, i); > > + ret = kvm_vcpu_ioctl(cs, KVM_SET_ONE_REG, ®); > > It might be worth splitting this... > > > + for (n = 0; n < KVM_ARM64_SVE_NUM_PREGS; n++) { > > + uint64_t *q = &env->vfp.pregs[n].p[0]; > > +#ifdef HOST_WORDS_BIGENDIAN > > + uint64_t d[ARM_MAX_VQ * 2 / 8]; > > + int j; > > + for (j = 0; j < cpu->sve_max_vq * 2 / 8; j++) { > > + d[j] = bswap64(q[j]); > > + } > > + reg.addr = (uintptr_t)d; > > +#else > > + reg.addr = (uintptr_t)q; > > +#endif > > + reg.id = KVM_REG_ARM64_SVE_PREG(n, i); > > + ret = kvm_vcpu_ioctl(cs, KVM_SET_ONE_REG, ®); > > ... and this (unified w/ reg + size parameters?) to a function because ... > > > + reg.addr = (uintptr_t)&env->vfp.pregs[FFR_PRED_NUM].p[0]; > > + reg.id = KVM_REG_ARM64_SVE_FFR(i); > > + ret = kvm_vcpu_ioctl(cs, KVM_SET_ONE_REG, ®); > > ... you forgot to apply the bswap here. Ah, thanks for catching this. I'll fix it for v3, possibly with the factoring, as you suggest. > > Likewise for the other direction. > > > r~ > > > PS: It's also tempting to drop the ifdefs and, since we know the host supports > sve instructions, and that the host supports sve_max_vq, do the reformatting > as > > uint64_t scratch[ARM_MAX_VQ * 2]; > asm("whilelo p0.d, xzr, %2\n\t" > "ld1d z0.d, p0/z [%1]\n\t" > "str z0, [%0]" > : "=Q"(scratch) > : "Q"(*aa64_vfp_qreg(env, n)), > "r"(cpu->sve_max_vq) > : "p0", "v0"); This is nice, but as we don't have any other asm's in this file, I'm inclined to leave it with the loops/swaps until we can use a builtin, as you suggest below. > > PPS: Ideally, this would be further cleaned up with acle builtins, but those > are still under development for GCC. > Thanks, drew