On 6/21/19 6:34 PM, Andrew Jones wrote: > +/* > + * If ARM_MAX_VQ is increased to be greater than 16, then we can no > + * longer hard code slices to 1 in kvm_arch_put/get_sve(). > + */ > +QEMU_BUILD_BUG_ON(ARM_MAX_VQ > 16);
This seems easy to fix, or simply drop the slices entirely for now, as otherwise they are a teeny bit confusing. It's a shame that these slices exist at all. It seems like the kernel could use the negotiated max sve size to grab the data all at once. > + for (n = 0; n < KVM_ARM64_SVE_NUM_ZREGS; n++) { > + uint64_t *q = aa64_vfp_qreg(env, n); > +#ifdef HOST_WORDS_BIGENDIAN > + uint64_t d[ARM_MAX_VQ * 2]; > + int j; > + for (j = 0; j < cpu->sve_max_vq * 2; j++) { > + d[j] = bswap64(q[j]); > + } > + reg.addr = (uintptr_t)d; > +#else > + reg.addr = (uintptr_t)q; > +#endif > + reg.id = KVM_REG_ARM64_SVE_ZREG(n, i); > + ret = kvm_vcpu_ioctl(cs, KVM_SET_ONE_REG, ®); It might be worth splitting this... > + for (n = 0; n < KVM_ARM64_SVE_NUM_PREGS; n++) { > + uint64_t *q = &env->vfp.pregs[n].p[0]; > +#ifdef HOST_WORDS_BIGENDIAN > + uint64_t d[ARM_MAX_VQ * 2 / 8]; > + int j; > + for (j = 0; j < cpu->sve_max_vq * 2 / 8; j++) { > + d[j] = bswap64(q[j]); > + } > + reg.addr = (uintptr_t)d; > +#else > + reg.addr = (uintptr_t)q; > +#endif > + reg.id = KVM_REG_ARM64_SVE_PREG(n, i); > + ret = kvm_vcpu_ioctl(cs, KVM_SET_ONE_REG, ®); ... and this (unified w/ reg + size parameters?) to a function because ... > + reg.addr = (uintptr_t)&env->vfp.pregs[FFR_PRED_NUM].p[0]; > + reg.id = KVM_REG_ARM64_SVE_FFR(i); > + ret = kvm_vcpu_ioctl(cs, KVM_SET_ONE_REG, ®); ... you forgot to apply the bswap here. Likewise for the other direction. r~ PS: It's also tempting to drop the ifdefs and, since we know the host supports sve instructions, and that the host supports sve_max_vq, do the reformatting as uint64_t scratch[ARM_MAX_VQ * 2]; asm("whilelo p0.d, xzr, %2\n\t" "ld1d z0.d, p0/z [%1]\n\t" "str z0, [%0]" : "=Q"(scratch) : "Q"(*aa64_vfp_qreg(env, n)), "r"(cpu->sve_max_vq) : "p0", "v0"); PPS: Ideally, this would be further cleaned up with acle builtins, but those are still under development for GCC.