arm: First slice of MVE implementation

Peter Maydell Mon, 07 Jun 2021 09:59:45 -0700

This patchseries provides an initial slice of the MVE
implementation. (MVE is "vector instructions for M-profile", also
known as Helium).


This is not complete support by a long way -- it covers only about 35%
of the decode patterns for MVE, and it implements only the slow-path
"we need predication, drop out to a helper function" versions of
insns. I'm sending it out now for two reasons:

 * if there's something I need to change about the general structure
   or the way I'm implementing insns, I want to know now rather than
   after I've implemented the other two thirds of the ISA

 * if I hold onto the whole patchset until I've got a complete MVE
   implementation it'll be 150+ patches, 10000 lines of code, and
   a nightmare to code review

The series covers:
 * framework for MVE decode, including infrastructure for
   handling predication, PSR.ECI, etc
 * tail-predication forms of low-overhead-loop insns (LCTP, WLSTP,
   DLSTP, LETP)
 * basic (non-gather) loads and stores
 * pretty much all the integer 2-operand vector and scalar insns
 * most of the integer 1-operand insns
 * a handful of other insns

(Unfortunately the v8M Arm ARM does not provide a nice neatly
separated list of encodings the way the SVE2 XML does.  I ended up
just pulling all the decode patterns out of the Arm ARM insn
descriptions and then hand-sorting them into what looked like common
formats. So the insns implemented aren't following a 100% logical
order.)

As noted above, the implementation here is purely the slow-path
fully-generic "call helpers that can handle predication". I do
want to implement a fast-path for "we know we have no predication,
so we can generate inline vector code", but I'd like to do that
as a series of followup patches once the main MVE code has landed.
That will (a) make it easier to review, I hope, (b) mean we get to
"at least functional" MVE quicker, and (c) allow people to bisect
any regressions to the "add fastpath" patch.
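
To illustrate what "a helper that can handle predication" means, here
is a minimal standalone sketch. It is not code from this series: the
names sketch_expand_pred4() and sketch_vadd32_pred() are hypothetical,
and it assumes a 128-bit vector of four 32-bit elements with a 16-bit
predicate mask carrying one bit per byte lane. Bytes whose predicate
bit is clear keep the old destination value.

```c
#include <stdint.h>

/*
 * Hypothetical helper: expand 4 predicate bits (one per byte of a
 * 32-bit element) into a 32-bit byte mask, 0xff per enabled byte.
 */
static uint32_t sketch_expand_pred4(unsigned bits)
{
    uint32_t bytemask = 0;
    for (int i = 0; i < 4; i++) {
        if (bits & (1u << i)) {
            bytemask |= 0xffu << (8 * i);
        }
    }
    return bytemask;
}

/*
 * Hypothetical slow-path helper: 32-bit vector add under predication.
 * 'mask' has one bit per byte lane; bit 0 is the least significant
 * byte of element 0 (an assumption of this sketch).
 */
static void sketch_vadd32_pred(uint32_t d[4], const uint32_t n[4],
                               const uint32_t m[4], uint16_t mask)
{
    for (int e = 0; e < 4; e++) {
        uint32_t r = n[e] + m[e];
        uint32_t bytemask = sketch_expand_pred4(mask >> (4 * e));
        /* Merge: take result bytes where enabled, old bytes elsewhere */
        d[e] = (d[e] & ~bytemask) | (r & bytemask);
    }
}
```

A fast path in this picture would correspond to the case where the
mask is known to be all-ones at translate time, so the merge step
disappears and the add can be emitted inline.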

Almost nothing in this patchseries is "live code", because no CPU sets
the ID register bits to turn on MVE.  The exception is the handling of
PSR.ECI/ICI, which is enabled at least as far as the ICI bits go for
M-profile CPUs (thus fixing the missing corner-case requirement that
trying to execute a non-continuable insn with non-zero ICI should
fault).
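
The ICI corner case can be stated as a one-line rule. The following is
only a sketch of that rule, not the translator code in the series; the
type and function names are hypothetical, and how an insn is classified
as continuable is left to the caller.

```c
#include <stdbool.h>

/* Hypothetical outcome type, for illustration only. */
typedef enum {
    SKETCH_EXECUTE,
    SKETCH_FAULT
} sketch_outcome;

/*
 * A non-zero ICI value is only meaningful for an insn that can be
 * continued after an interruption; any other insn executed with
 * non-zero ICI must fault rather than run.
 */
static sketch_outcome sketch_check_ici(unsigned ici,
                                       bool insn_is_continuable)
{
    if (ici != 0 && !insn_is_continuable) {
        return SKETCH_FAULT;
    }
    return SKETCH_EXECUTE;
}
```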

My view is that if these patches get through code review we're better
off with them in upstream git rather than outside it; open to
arguments to the contrary.

Patch 1 is RTH's recently posted tcg_remove_ops_after() patch,
which we need for the PSR.ECI handling (which indeed is the
justification for having that new function in the first place).

You can also get this patchset here:
 https://git.linaro.org/people/peter.maydell/qemu-arm.git mve-drop-1

thanks
-- PMM

Peter Maydell (54):
  target/arm: Enable FPSCR.QC bit for MVE
  target/arm: Handle VPR semantics in existing code
  target/arm: Add handling for PSR.ECI/ICI
  target/arm: Let vfp_access_check() handle late NOCP checks
  target/arm: Implement MVE LCTP
  target/arm: Implement MVE WLSTP insn
  target/arm: Implement MVE DLSTP
  target/arm: Implement MVE LETP insn
  target/arm: Add framework for MVE decode
  target/arm: Implement MVE VLDR/VSTR (non-widening forms)
  target/arm: Implement widening/narrowing MVE VLDR/VSTR insns
  target/arm: Implement MVE VCLZ
  target/arm: Implement MVE VCLS
  bitops.h: Provide hswap32(), hswap64(), wswap64() swapping operations
  target/arm: Implement MVE VREV16, VREV32, VREV64
  target/arm: Implement MVE VMVN (register)
  target/arm: Implement MVE VABS
  target/arm: Implement MVE VNEG
  target/arm: Implement MVE VDUP
  target/arm: Implement MVE VAND, VBIC, VORR, VORN, VEOR
  target/arm: Implement MVE VADD, VSUB, VMUL
  target/arm: Implement MVE VMULH
  target/arm: Implement MVE VRMULH
  target/arm: Implement MVE VMAX, VMIN
  target/arm: Implement MVE VABD
  target/arm: Implement MVE VHADD, VHSUB
  target/arm: Implement MVE VMULL
  target/arm: Implement MVE VMLALDAV
  target/arm: Implement MVE VMLSLDAV
  include/qemu/int128.h: Add function to create Int128 from int64_t
  target/arm: Implement MVE VRMLALDAVH, VRMLSLDAVH
  target/arm: Implement MVE VADD (scalar)
  target/arm: Implement MVE VSUB, VMUL (scalar)
  target/arm: Implement MVE VHADD, VHSUB (scalar)
  target/arm: Implement MVE VBRSR
  target/arm: Implement MVE VPST
  target/arm: Implement MVE VQADD and VQSUB
  target/arm: Implement MVE VQDMULH and VQRDMULH (scalar)
  target/arm: Implement MVE VQDMULL scalar
  target/arm: Implement MVE VQDMULH, VQRDMULH (vector)
  target/arm: Implement MVE VQADD, VQSUB (vector)
  target/arm: Implement MVE VQSHL (vector)
  target/arm: Implement MVE VQRSHL
  target/arm: Implement MVE VSHL insn
  target/arm: Implement MVE VRSHL
  target/arm: Implement MVE VQDMLADH and VQRDMLADH
  target/arm: Implement MVE VQDMLSDH and VQRDMLSDH
  target/arm: Implement MVE VQDMULL (vector)
  target/arm: Implement MVE VRHADD
  target/arm: Implement MVE VADC, VSBC
  target/arm: Implement MVE VCADD
  target/arm: Implement MVE VHCADD
  target/arm: Implement MVE VADDV
  target/arm: Make VMOV scalar <-> gpreg beatwise for MVE

Richard Henderson (1):
  tcg: Introduce tcg_remove_ops_after

 include/qemu/bitops.h         |   29 +
 include/qemu/int128.h         |   10 +
 include/tcg/tcg.h             |    1 +
 target/arm/helper-mve.h       |  357 +++++++++
 target/arm/helper.h           |    2 +
 target/arm/internals.h        |   11 +
 target/arm/translate-a32.h    |    4 +
 target/arm/translate.h        |   19 +
 target/arm/mve.decode         |  261 +++++++
 target/arm/t32.decode         |   15 +-
 target/arm/m_helper.c         |   54 +-
 target/arm/mve_helper.c       | 1343 +++++++++++++++++++++++++++++++++
 target/arm/sve_helper.c       |   20 -
 target/arm/translate-m-nocp.c |   16 +-
 target/arm/translate-mve.c    |  865 +++++++++++++++++++++
 target/arm/translate-vfp.c    |  152 +++-
 target/arm/translate.c        |  301 +++++++-
 target/arm/vfp_helper.c       |    3 +-
 tcg/tcg.c                     |   13 +
 target/arm/meson.build        |    3 +
 20 files changed, 3408 insertions(+), 71 deletions(-)
 create mode 100644 target/arm/helper-mve.h
 create mode 100644 target/arm/mve.decode
 create mode 100644 target/arm/mve_helper.c
 create mode 100644 target/arm/translate-mve.c

-- 
2.20.1

[PATCH 00/55] target/arm: First slice of MVE implementation
