This series implements APX support. It is little more than some Christmas-time hacking, and so far I have tested it only with user-mode emulation. The main reason to do this is to have some initial infrastructure for EVEX without requiring all the complexity of AVX512. It also forces some changes that (hopefully) make QEMU's decoder align a bit more with what Intel processors actually do. These changes (patches 1, 4, 8 and 9 in particular) could be extracted and committed separately, though at this point they would be material for 11.1 anyway.
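Much of the new decoder work deals with the new prefixes. To illustrate, here is a minimal sketch of splitting the REX2 payload byte (the byte after the 0xD5 prefix) into its fields, following my reading of the Intel APX specification: bit 7 is M0, then R4, X4, B4, W, R3, X3, B3 down to bit 0. The names `struct rex2`, `decode_rex2` and `rex2_reg` are hypothetical and are not the helpers used in decode-new.c.inc.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the fields of a REX2 payload byte. */
struct rex2 {
    bool m0;    /* 0: legacy map 0, 1: map 1 (0F xx opcodes) */
    bool w;     /* 64-bit operand size, as in REX.W */
    uint8_t r;  /* R4:R3, high bits of ModRM.reg */
    uint8_t x;  /* X4:X3, high bits of SIB.index */
    uint8_t b;  /* B4:B3, high bits of ModRM.rm or SIB.base */
};

/* Assumed layout: M0 R4 X4 B4 W R3 X3 B3 (bit 7 .. bit 0), not inverted. */
static struct rex2 decode_rex2(uint8_t p)
{
    struct rex2 f;

    f.m0 = (p >> 7) & 1;
    f.w  = (p >> 3) & 1;
    f.r  = (((p >> 6) & 1) << 1) | ((p >> 2) & 1);
    f.x  = (((p >> 5) & 1) << 1) | ((p >> 1) & 1);
    f.b  = (((p >> 4) & 1) << 1) | (p & 1);
    return f;
}

/* 5-bit register number selected by ModRM.reg: R4:R3:reg, i.e. 0..31. */
static unsigned rex2_reg(struct rex2 f, uint8_t modrm)
{
    return ((unsigned)f.r << 3) | ((modrm >> 3) & 7);
}
```

For example, a payload of 0x44 sets R4 and R3, so a ModRM byte whose reg field is 7 selects register 31. Unlike the R/X/B bits of VEX, the REX2 bits are stored as-is rather than inverted.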
There are relatively few *new* instructions (CCMP/CTEST, CFCMOV, PUSH2/POP2), and all of them are trivial except for CCMP/CTEST. Therefore, most of the new code is for decoding. The new data destination (NDD) feature comes almost for free thanks to the existing support for BMI instructions, and variants such as no-flags (NF) and zero-upper (ZU) are quite easy as well. For CCMP/CTEST, I tried to make them reasonably efficient, also thanks to the changes to AF/CF/OF computation that went in over the past year, but that takes quite a few lines of code.

Don't expect any performance gains. APX binaries do produce about 1% fewer TCG ops, but they map to about 1% *more* assembly instructions, at least for x86-on-x86. This is because the optimizer could already produce roughly the same ops as NDD or NF instructions, while the new PUSH2/POP2 instructions include a stack alignment check that isn't there in non-APX code. I don't think it's worth wasting a precious HF_ bit on it, at least not for the next 10 years or so.

Other than testing system emulation, the main decision to take is whether to treat the VEX and EVEX maps as an extension of the regular maps, or to just bite the bullet, copy them over to a new array and define them from scratch. There are some annoying differences already in APX (accepted prefixes, and opcodes moved to a different spot), and there would be even more with AVX512; for example, mask register instructions use opcodes in VEX map 1 that overlap with two-byte 0F opcodes.

I don't think I will have much time to work on this for a few months, since I did it just out of my own interest, but I thought I'd throw it out there for review.
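As background for the CCMP/CTEST discussion above, here is a minimal sketch of what a 64-bit conditional compare computes. It is an eager C model, not QEMU's lazy CC_OP-based implementation. It only models CF/ZF/SF/OF and assumes the 4-bit default flags value (DFV) is packed as OF:SF:ZF:CF in bits 3..0; PF and AF handling is deliberately left out. `cmp_flags64` and `ccmp_flags64` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* EFLAGS bit positions of the four flags covered by the default flags value. */
#define FLAG_CF (1u << 0)
#define FLAG_ZF (1u << 6)
#define FLAG_SF (1u << 7)
#define FLAG_OF (1u << 11)

/* CF/ZF/SF/OF as set by a 64-bit CMP a, b (i.e. a - b); PF and AF not modeled. */
static uint32_t cmp_flags64(uint64_t a, uint64_t b)
{
    uint64_t r = a - b;
    uint32_t f = 0;

    if (a < b) {
        f |= FLAG_CF;                   /* unsigned borrow */
    }
    if (r == 0) {
        f |= FLAG_ZF;
    }
    if (r >> 63) {
        f |= FLAG_SF;
    }
    if (((a ^ b) & (a ^ r)) >> 63) {
        f |= FLAG_OF;                   /* signed overflow */
    }
    return f;
}

/*
 * Hypothetical model of CCMPcc: if the condition holds, behave like CMP;
 * otherwise load the flags from the default flags value, assumed here to
 * be packed as OF:SF:ZF:CF in bits 3..0.
 */
static uint32_t ccmp_flags64(bool cond, uint64_t a, uint64_t b, unsigned dfv)
{
    uint32_t f = 0;

    assert(dfv < 16);
    if (cond) {
        return cmp_flags64(a, b);
    }
    if (dfv & 8) {
        f |= FLAG_OF;
    }
    if (dfv & 4) {
        f |= FLAG_SF;
    }
    if (dfv & 2) {
        f |= FLAG_ZF;
    }
    if (dfv & 1) {
        f |= FLAG_CF;
    }
    return f;
}
```

This is what makes the instruction useful for chained conditions. For `x == 1 && y == 2`, a compiler can emit CMP x, 1; CCMPE y, 2 with a DFV of zero; JE. In the model that is `ccmp_flags64(x == 1, y, 2, 0)`: when x != 1 the second compare is skipped and ZF comes out clear, so the branch is not taken.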
Paolo

Paolo Bonzini (18):
  target/i386/tcg: move check bits out of validate_vex
  target/i386/tcg: add APX support to XSAVE/XRSTOR
  target/i386/tcg: treat VEX as disabling high-byte registers
  target/i386/tcg: add definition for REX2 prefix
  target/i386/tcg: mark XSAVE* as not allowing REX2
  target/i386/tcg: decode REX2 prefix
  target/i386/tcg: implement JMPABS instruction
  target/i386/tcg: fetch modrm early
  target/i386/tcg: move VEX validation early
  target/i386/tcg: extend VEX.vvvv parsing for APX
  target/i386/tcg: decode EVEX prefix
  target/i386/tcg: add ZU writeback
  target/i386/tcg: add decode functionality for APX
  target/i386/tcg: implement CCMP/CTEST
  target/i386/tcg: undo IMUL memory load optimization
  target/i386/tcg: decode APX instructions
  target/i386/tcg: mark APX as supported
  target/i386/tcg: optimize CCMP

 configs/targets/x86_64-bsd-user.mak      |   2 +-
 configs/targets/x86_64-linux-user.mak    |   2 +-
 target/i386/cpu.h                        |   8 +
 target/i386/helper.h                     |   1 +
 target/i386/tcg/decode-new.h             |  20 +
 target/i386/tcg/tcg-cpu.h                |  16 +-
 target/i386/tcg/cc_helper_template.h.inc |  11 +
 target/i386/cpu.c                        |  15 +-
 target/i386/helper.c                     |  11 +
 target/i386/tcg/cc_helper.c              |  10 +
 target/i386/tcg/excp_helper.c            |   5 +
 target/i386/tcg/fpu_helper.c             |  59 +-
 target/i386/tcg/tcg-cpu.c                |   5 +-
 target/i386/tcg/translate.c              | 106 ++-
 target/i386/tcg/decode-new.c.inc         | 937 ++++++++++++++++++++++++-----
 target/i386/tcg/emit.c.inc               | 243 +++++-
 16 files changed, 1226 insertions(+), 225 deletions(-)

-- 
2.52.0
