[RFC PATCH 00/17] (The beginning of) a new i386 decoder

Paolo Bonzini Wed, 24 Aug 2022 10:36:28 -0700

While looking again at Paul's patches for AVX, I came to the conclusion
that the x86 decoder is unsalvageable.  The encoding of x86 is simply too
messy for it to be decoded in code; huge tables, derived as much as possible
from the architecture reference, are the real way to go.


So here is a new, albeit partial, decoder that is based on three principles:

- use mostly table-driven decoding, using tables derived as much as possible
  from the Intel manual, keeping the code as "non-branchy" as possible

- keep address generation and (for ALU operands) memory loads and write back
  as much in common code as possible, to avoid code duplication (this
  is less relevant to non-ALU instructions because read-modify-write
  operations are rare)

- do minimal changes on the old decoder while allowing incremental
  replacement of the old decoder with the new one
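
To illustrate the first two principles, here is a minimal sketch of what
a table entry and the common code around it could look like.  The type
and function names (OperandKind, OpcodeEntry, lookup_one_byte and so on)
are hypothetical and are not taken from the series; only the opcode and
operand assignments follow the Intel opcode map.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical operand kinds, named after the Intel manual's notation. */
typedef enum { OP_NONE, OP_Eb, OP_Ev, OP_Gb, OP_Gv, OP_AL, OP_Ib } OperandKind;

/* Hypothetical table entry: one row per opcode, no per-opcode code. */
typedef struct {
    const char *mnemonic;   /* NULL: opcode not present in this table */
    OperandKind dst, src;
    bool writeback;         /* false for CMP-like ops that only set flags */
} OpcodeEntry;

/* A few rows of the one-byte map, for illustration only. */
static const OpcodeEntry one_byte_table[256] = {
    [0x00] = { "add", OP_Eb, OP_Gb, true  },
    [0x01] = { "add", OP_Ev, OP_Gv, true  },
    [0x02] = { "add", OP_Gb, OP_Eb, true  },
    [0x03] = { "add", OP_Gv, OP_Ev, true  },
    [0x04] = { "add", OP_AL, OP_Ib, true  },
    [0x38] = { "cmp", OP_Eb, OP_Gb, false },
    [0x39] = { "cmp", OP_Ev, OP_Gv, false },
    [0x3c] = { "cmp", OP_AL, OP_Ib, false },
};

static const OpcodeEntry *lookup_one_byte(uint8_t opcode)
{
    const OpcodeEntry *e = &one_byte_table[opcode];
    return e->mnemonic ? e : NULL;
}

/* E and G operands are both encoded in the ModRM byte. */
static bool operand_uses_modrm(OperandKind k)
{
    return k == OP_Eb || k == OP_Ev || k == OP_Gb || k == OP_Gv;
}

/* Common code: whether to fetch ModRM is decided from the table alone. */
static bool entry_needs_modrm(const OpcodeEntry *e)
{
    assert(e != NULL);
    return operand_uses_modrm(e->dst) || operand_uses_modrm(e->src);
}

/* Common code: only an E destination with writeback can store to memory. */
static bool entry_may_store_to_memory(const OpcodeEntry *e)
{
    assert(e != NULL);
    return e->writeback && (e->dst == OP_Eb || e->dst == OP_Ev);
}
```

With this layout, adding an opcode means adding a row; address
generation, operand load and write back stay in the shared helpers and
never need to be repeated per instruction.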

So this series introduces the main decoder flow, integrates it with the
old decoder (which takes care of parsing prefixes and then optionally
drops to the new one based on the first byte of the opcode), and
implements three quarters of the one-byte opcodes.
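
The hand-off between the two decoders can be pictured with the sketch
below.  It is an illustration under stated assumptions, not the code in
the series: the function names are hypothetical, the "opcode < 0xc0"
test is inferred from the patch titles (00-bf is three quarters of the
one-byte map), excluding the 0x0f escape byte is a guess, and REX and
VEX prefixes are ignored for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { USE_OLD_DECODER, USE_NEW_DECODER } DecoderChoice;

/* Legacy prefixes; in this sketch the old decoder consumes these. */
static bool is_legacy_prefix(uint8_t b)
{
    switch (b) {
    case 0x26: case 0x2e: case 0x36: case 0x3e:   /* ES, CS, SS, DS */
    case 0x64: case 0x65:                         /* FS, GS */
    case 0x66: case 0x67:                         /* operand/address size */
    case 0xf0: case 0xf2: case 0xf3:              /* LOCK, REPNE, REP */
        return true;
    default:
        return false;
    }
}

/* Assumption: the new decoder covers 00-bf, minus the 0x0f escape. */
static bool new_decoder_handles(uint8_t opcode)
{
    return opcode < 0xc0 && opcode != 0x0f;
}

/*
 * Skip the prefixes, then pick a decoder from the first opcode byte.
 * *opcode_pos receives the index of that byte (or len if none is left).
 */
static DecoderChoice choose_decoder(const uint8_t *insn, size_t len,
                                    size_t *opcode_pos)
{
    size_t i = 0;

    assert(insn != NULL && opcode_pos != NULL);
    while (i < len && is_legacy_prefix(insn[i])) {
        i++;
    }
    *opcode_pos = i;
    if (i < len && new_decoder_handles(insn[i])) {
        return USE_NEW_DECODER;
    }
    return USE_OLD_DECODER;
}
```

For example, the bytes 66 01 d8 (a 16-bit ADD) would go to the new
decoder with the opcode at index 1, while c3 (RET) would stay with the
old one.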

It is only lightly tested but it can boot to iPXE and run some 64-bit
coreutils just fine; Linux seems to trigger a bug in outsw/l/q emulation
that I haven't checked yet, but still it's enough to show the result of
a couple days of hacking.

The generated code is mostly the same, though marginally worse in some
cases because I prioritized code simplicity.  For example, MOVSXD is not
able to use MO_SL and falls back to MO_UL + sign extension.  One notable
difference is that the new decoder always sign-extends 8-bit immediates,
so for example a "cmpb $0xe9, %dl" instruction will subtract $0xfff...fffe9
from the temporary value.  This is the way Intel intended "Ib" immediates
to work, and the result is the same either way because an 8-bit operation
only uses the low 8 bits of the temporary.
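
The equivalence is plain modular arithmetic and can be checked with a
few lines of standalone C.  This is only a sketch of the arithmetic,
with hypothetical helper names; it does not use TCG or any code from
the series.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend an 8-bit immediate to 64 bits, as the "Ib" notation implies. */
static uint64_t imm8_sign_extended(uint8_t b)
{
    int64_t v = (int64_t)b - ((b & 0x80) ? 256 : 0);
    return (uint64_t)v;
}

/* Zero-extend the same immediate, as an alternative decoder might. */
static uint64_t imm8_zero_extended(uint8_t b)
{
    return (uint64_t)b;
}

/* Low 8 bits of a 64-bit subtraction: the part an 8-bit CMP looks at. */
static uint8_t sub_low8(uint64_t dst, uint64_t imm)
{
    return (uint8_t)(dst - imm);
}

/* Check every possible %dl value against one immediate. */
static void check_all_dl_values(uint8_t imm8)
{
    uint64_t sext = imm8_sign_extended(imm8);
    uint64_t zext = imm8_zero_extended(imm8);

    for (unsigned dl = 0; dl < 256; dl++) {
        assert(sub_low8(dl, sext) == sub_low8(dl, zext));
    }
}
```

Calling check_all_dl_values(0xe9) passes for every register value: the
two immediates differ only in bits 8 and above, and those bits cannot
influence the low byte of the difference.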

Anyway, porting these opcodes is really more of a validation for the
whole concept and a test for the common decoder code; it's probably more
efficient to focus on the SSE and VEX 2-byte and 3-byte opcodes as a path
towards enabling AVX in QEMU, and keep the existing decoder for non-VEX,
non-SSE opcodes.  Getting the conditions right for VEX.L, VEX.W etc. is
going to be, well, vexing because of the way Intel has decided to format
the exception tables in the manual, but it should be feasible to use a
more table-based decoding process for those operations as well.

The series is available at https://gitlab.com/bonzini/qemu.git, branch i386.

Paolo

Paolo Bonzini (17):
  target/i386: extract old decoder to a separate file
  target/i386: introduce insn_get_addr
  target/i386: add core of new i386 decoder
  target/i386: add ALU load/writeback core
  target/i386: add 00-07, 10-17 opcodes
  target/i386: add 08-0F, 18-1F opcodes
  target/i386: add 20-27, 30-37 opcodes
  target/i386: add 28-2f, 38-3f opcodes
  target/i386: add 40-47, 50-57 opcodes
  target/i386: add 48-4f, 58-5f opcodes
  target/i386: add 60-67, 70-77 opcodes
  target/i386: add 68-6f, 78-7f opcodes
  target/i386: add 80-87, 90-97 opcodes
  target/i386: add a0-a7, b0-b7 opcodes
  target/i386: do not clobber A0 in POP translation
  target/i386: add 88-8f, 98-9f opcodes
  target/i386: add a8-af, b8-bf opcodes

 target/i386/tcg/decode-new.c.inc | 1254 +++++++
 target/i386/tcg/decode-old.c.inc | 5707 +++++++++++++++++++++++++++++
 target/i386/tcg/emit.c.inc       |  684 ++++
 target/i386/tcg/translate.c      | 5822 +-----------------------------
 4 files changed, 7740 insertions(+), 5727 deletions(-)
 create mode 100644 target/i386/tcg/decode-new.c.inc
 create mode 100644 target/i386/tcg/decode-old.c.inc
 create mode 100644 target/i386/tcg/emit.c.inc

-- 
2.37.1
