This patchseries provides an initial slice of the MVE implementation.
(MVE is "vector instructions for M-profile", also known as Helium.)

This is not complete support by a long way -- it covers only about 35%
of the decode patterns for MVE, and it implements only the slow-path
"we need predication, drop out to a helper function" versions of
insns. I'm sending it out now for two reasons:
 * if there's something I need to change about the general structure
   or the way I'm implementing insns, I want to know now rather than
   after I've implemented the other two thirds of the ISA
 * if I hold onto the whole patchset until I've got a complete MVE
   implementation it'll be 150+ patches, 10000 lines of code, and a
   nightmare to code review

The series covers:
 * framework for MVE decode, including infrastructure for handling
   predication, PSR.ECI, etc
 * tail-predication forms of low-overhead-loop insns (LCTP, WLSTP,
   DLSTP, LETP)
 * basic (non-gather) loads and stores
 * pretty much all the integer 2-operand vector and scalar insns
 * most of the integer 1-operand insns
 * a handful of other insns

(Unfortunately the v8M Arm ARM does not provide a nice neatly separated
list of encodings the way the SVE2 XML does. I ended up just pulling
all the decode patterns out of the Arm ARM insn descriptions and then
hand-sorting them into what looked like common formats. So the insns
implemented aren't following a 100% logical order.)

As noted above, the implementation here is purely the slow-path
fully-generic "call helpers that can handle predication". I do want
to implement a fast-path for "we know we have no predication, so we
can generate inline vector code", but I'd like to do that as a series
of followup patches once the main MVE code has landed. That will
(a) make it easier to review, I hope (b) mean we get to "at least
functional" MVE quicker and (c) allow people to bisect any regressions
to the "add fastpath" patch.

Almost nothing in this patchseries is "live code", because no CPU
sets the ID register bits to turn on MVE.
The exception is the handling of PSR.ECI/ICI, which is enabled at
least as far as the ICI bits go for M-profile CPUs (thus implementing
the previously missing corner-case requirement that trying to execute
a non-continuable insn with non-zero ICI should fault). My view is
that if these patches get through code review we're better off with
them in upstream git rather than outside it; open to arguments to the
contrary.

Patch 1 is RTH's recently posted tcg_remove_ops_after() patch, which
we need for the PSR.ECI handling (which indeed is the justification
for having that new function in the first place).

You can also get this patchset here:
 https://git.linaro.org/people/peter.maydell/qemu-arm.git mve-drop-1

thanks
-- PMM

Peter Maydell (54):
  target/arm: Enable FPSCR.QC bit for MVE
  target/arm: Handle VPR semantics in existing code
  target/arm: Add handling for PSR.ECI/ICI
  target/arm: Let vfp_access_check() handle late NOCP checks
  target/arm: Implement MVE LCTP
  target/arm: Implement MVE WLSTP insn
  target/arm: Implement MVE DLSTP
  target/arm: Implement MVE LETP insn
  target/arm: Add framework for MVE decode
  target/arm: Implement MVE VLDR/VSTR (non-widening forms)
  target/arm: Implement widening/narrowing MVE VLDR/VSTR insns
  target/arm: Implement MVE VCLZ
  target/arm: Implement MVE VCLS
  bitops.h: Provide hswap32(), hswap64(), wswap64() swapping operations
  target/arm: Implement MVE VREV16, VREV32, VREV64
  target/arm: Implement MVE VMVN (register)
  target/arm: Implement MVE VABS
  target/arm: Implement MVE VNEG
  target/arm: Implement MVE VDUP
  target/arm: Implement MVE VAND, VBIC, VORR, VORN, VEOR
  target/arm: Implement MVE VADD, VSUB, VMUL
  target/arm: Implement MVE VMULH
  target/arm: Implement MVE VRMULH
  target/arm: Implement MVE VMAX, VMIN
  target/arm: Implement MVE VABD
  target/arm: Implement MVE VHADD, VHSUB
  target/arm: Implement MVE VMULL
  target/arm: Implement MVE VMLALDAV
  target/arm: Implement MVE VMLSLDAV
  include/qemu/int128.h: Add function to create Int128 from int64_t
  target/arm: Implement MVE VRMLALDAVH, VRMLSLDAVH
  target/arm: Implement MVE VADD (scalar)
  target/arm: Implement MVE VSUB, VMUL (scalar)
  target/arm: Implement MVE VHADD, VHSUB (scalar)
  target/arm: Implement MVE VBRSR
  target/arm: Implement MVE VPST
  target/arm: Implement MVE VQADD and VQSUB
  target/arm: Implement MVE VQDMULH and VQRDMULH (scalar)
  target/arm: Implement MVE VQDMULL scalar
  target/arm: Implement MVE VQDMULH, VQRDMULH (vector)
  target/arm: Implement MVE VQADD, VQSUB (vector)
  target/arm: Implement MVE VQSHL (vector)
  target/arm: Implement MVE VQRSHL
  target/arm: Implement MVE VSHL insn
  target/arm: Implement MVE VRSHL
  target/arm: Implement MVE VQDMLADH and VQRDMLADH
  target/arm: Implement MVE VQDMLSDH and VQRDMLSDH
  target/arm: Implement MVE VQDMULL (vector)
  target/arm: Implement MVE VRHADD
  target/arm: Implement MVE VADC, VSBC
  target/arm: Implement MVE VCADD
  target/arm: Implement MVE VHCADD
  target/arm: Implement MVE VADDV
  target/arm: Make VMOV scalar <-> gpreg beatwise for MVE

Richard Henderson (1):
  tcg: Introduce tcg_remove_ops_after

 include/qemu/bitops.h         |   29 +
 include/qemu/int128.h         |   10 +
 include/tcg/tcg.h             |    1 +
 target/arm/helper-mve.h       |  357 +++++++++
 target/arm/helper.h           |    2 +
 target/arm/internals.h        |   11 +
 target/arm/translate-a32.h    |    4 +
 target/arm/translate.h        |   19 +
 target/arm/mve.decode         |  261 +++++++
 target/arm/t32.decode         |   15 +-
 target/arm/m_helper.c         |   54 +-
 target/arm/mve_helper.c       | 1343 +++++++++++++++++++++++++++++++++
 target/arm/sve_helper.c       |   20 -
 target/arm/translate-m-nocp.c |   16 +-
 target/arm/translate-mve.c    |  865 +++++++++++++++++++++
 target/arm/translate-vfp.c    |  152 +++-
 target/arm/translate.c        |  301 +++++++-
 target/arm/vfp_helper.c       |    3 +-
 tcg/tcg.c                     |   13 +
 target/arm/meson.build        |    3 +
 20 files changed, 3408 insertions(+), 71 deletions(-)
 create mode 100644 target/arm/helper-mve.h
 create mode 100644 target/arm/mve.decode
 create mode 100644 target/arm/mve_helper.c
 create mode 100644 target/arm/translate-mve.c

-- 
2.20.1
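
A footnote on the slow-path "call helpers that can handle predication"
approach described above. The sketch below is a minimal standalone
illustration of the idea, not code from this series. It assumes the
architectural model in which the predicate is a 16-bit mask with one
bit per byte of the 128-bit vector; sketch_vadd_w() and
expand_pred_w() are hypothetical names invented for this example.

```c
#include <stdint.h>

/*
 * Illustrative only: hypothetical names, not the helpers in this series.
 * Expand 4 predicate bits (one per byte) into a 32-bit byte mask.
 */
static uint32_t expand_pred_w(unsigned pred4)
{
    uint32_t m = 0;
    for (int b = 0; b < 4; b++) {
        if (pred4 & (1u << b)) {
            m |= 0xffu << (8 * b);
        }
    }
    return m;
}

/*
 * Predicated 32-bit vector add: compute n + m for every element, but
 * write back only the bytes whose predicate bit is set. Bytes with a
 * clear predicate bit keep the old destination value.
 */
static void sketch_vadd_w(uint32_t d[4], const uint32_t n[4],
                          const uint32_t m[4], uint16_t mask)
{
    for (int e = 0; e < 4; e++, mask >>= 4) {
        uint32_t r = n[e] + m[e];
        uint32_t bytemask = expand_pred_w(mask & 0xf);
        d[e] = (d[e] & ~bytemask) | (r & bytemask);
    }
}
```

With an all-ones mask the merge step is a no-op, which is the case a
later fast-path could presumably handle with inline vector code
instead of a helper call.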