On 5/6/24 01:09, Paolo Bonzini wrote:
The shift instructions are rewritten instead of reusing code from the old
decoder. Rotates use CC_OP_ADCOX more extensively and generally rely
more on the optimizer, so that the code generators are shared between
the immediate-count and variable-count cases.
In particular, this makes gen_RCL and gen_RCR pretty efficient for the
count == 1 case, which becomes (apart from a few extra movs) something like:
(compute_cc_all if needed)
// save old value for OF calculation
mov cc_src2, T0
// the bulk of RCL is just this!
deposit T0, cc_src, T0, 1, TARGET_LONG_BITS - 1
// compute carry
shr cc_dst, cc_src2, length - 1
and cc_dst, cc_dst, 1
// compute overflow
xor cc_src2, cc_src2, T0
extract cc_src2, cc_src2, length - 1, 1
32-bit MUL and IMUL are also slightly more efficient on 64-bit hosts.
Signed-off-by: Paolo Bonzini<pbonz...@redhat.com>
---
target/i386/tcg/decode-new.h | 1 +
target/i386/tcg/translate.c | 23 +-
target/i386/tcg/decode-new.c.inc | 142 +++++
target/i386/tcg/emit.c.inc | 1014 +++++++++++++++++++++++++++++-
4 files changed, 1169 insertions(+), 11 deletions(-)
Reviewed-by: Richard Henderson <richard.hender...@linaro.org>
r~