On 5/6/24 01:09, Paolo Bonzini wrote:
The shift instructions are rewritten instead of reusing code from the old
decoder.  Rotates use CC_OP_ADCOX more extensively and generally rely
more on the optimizer, so that the code generators are shared between
the immediate-count and variable-count cases.

In particular, this makes gen_RCL and gen_RCR pretty efficient for the
count == 1 case, which becomes (apart from a few extra movs) something like:

   (compute_cc_all if needed)
   // save old value for OF calculation
   mov     cc_src2, T0
   // the bulk of RCL is just this!
   deposit T0, cc_src, T0, 1, TARGET_LONG_BITS - 1
   // compute carry
   shr     cc_dst, cc_src2, length - 1
   and     cc_dst, cc_dst, 1
   // compute overflow
   xor     cc_src2, cc_src2, T0
   extract cc_src2, cc_src2, length - 1, 1
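
For readers who want to check the flag logic, the count == 1 sequence above can be modelled in plain C. This is only a sketch for a 32-bit operand, not the code in emit.c.inc: the names Rcl1Result and rcl1_32 are made up for illustration, and carry_in stands in for the low bit of cc_src after compute_cc_all.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result holder for this sketch; not a QEMU type. */
typedef struct {
    uint32_t value; /* rotated operand (T0 after the deposit) */
    bool cf;        /* new carry flag */
    bool of;        /* new overflow flag */
} Rcl1Result;

/* Model of RCL with count == 1 on a 32-bit operand. */
static Rcl1Result rcl1_32(uint32_t t0, bool carry_in)
{
    const unsigned length = 32;
    Rcl1Result r;

    /* mov cc_src2, T0: keep the old value for the flag computation */
    uint32_t old = t0;

    /* deposit: old carry lands in bit 0, the operand moves up by one */
    r.value = (t0 << 1) | (carry_in ? 1u : 0u);

    /* carry out is the bit that was shifted out of the top */
    r.cf = (old >> (length - 1)) & 1u;

    /* overflow is set when the sign bit changed (old MSB xor new MSB) */
    r.of = ((old ^ r.value) >> (length - 1)) & 1u;

    return r;
}
```

For example, rcl1_32(0x80000000, false) yields value 0 with both CF and OF set, since the top bit is rotated out and the sign bit changes.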

32-bit MUL and IMUL are also slightly more efficient on 64-bit hosts.

Signed-off-by: Paolo Bonzini <pbonz...@redhat.com>
---
  target/i386/tcg/decode-new.h     |    1 +
  target/i386/tcg/translate.c      |   23 +-
  target/i386/tcg/decode-new.c.inc |  142 +++++
  target/i386/tcg/emit.c.inc       | 1014 +++++++++++++++++++++++++++++-
  4 files changed, 1169 insertions(+), 11 deletions(-)

Reviewed-by: Richard Henderson <richard.hender...@linaro.org>

r~
