On 10/19/23 03:46, Paolo Bonzini wrote:
This includes:

- implementing SHA and CMPccXADD instruction extensions

- introducing a new mechanism for flags writeback that avoids a
   tricky failure

- converting the more orthogonal parts of the one-byte opcode
   map, as well as the CMOVcc and SETcc instructions.

Tested by booting several 32-bit and 64-bit guests.

The new decoder produces roughly 2% more ops, but after optimization only
about 0.5% more remain, and almost all of them come from cmp instructions.
For some reason that I have not investigated, these end up with an extra
mov even after optimization:

  (old decoder)                  (new decoder)
                                 sub_i64 tmp0,rax,$0x33
  mov_i64 cc_src,$0x33           mov_i64 cc_dst,tmp0
  sub_i64 cc_dst,rax,$0x33       mov_i64 cc_src,$0x33
  discard cc_src2                discard cc_src2
  discard cc_op                  discard cc_op

It could be easily fixed by not reusing gen_SUB for cmp instructions,
or by debugging what goes on in the optimizer.  However, it does not
result in larger assembly.

This is expected behaviour from the tcg optimizer: we don't forward-propagate outputs at that point. But during register allocation of the "mov cc_dst,tmp0" opcode, we will see that tmp0 is dead and re-assign the register from tmp0 to cc_dst without emitting a host instruction.
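
For reference, here is a minimal sketch of that copy-elision idea. This is
not QEMU's actual tcg/tcg.c allocator; the Temp structure and alloc_mov
helper below are made up for illustration, but they show how a mov whose
source temp is dead can be resolved by renaming the register instead of
emitting a host instruction:

#include <stdbool.h>
#include <stdio.h>

typedef struct {
    const char *name;
    int reg;        /* host register currently holding the temp, -1 if none */
    bool dead;      /* true if this op is the temp's last use */
} Temp;

/* Hypothetical helper: register-allocate "mov dst, src". */
static bool alloc_mov(Temp *dst, Temp *src)
{
    if (src->dead && src->reg >= 0) {
        /* Source dies here: hand its register over to the destination. */
        dst->reg = src->reg;
        src->reg = -1;
        return false;           /* no host instruction emitted */
    }
    dst->reg = 1;               /* otherwise pick a free register ... */
    return true;                /* ... and emit a real host mov */
}

int main(void)
{
    Temp tmp0   = { "tmp0",   0, true };    /* dead after "mov cc_dst,tmp0" */
    Temp cc_dst = { "cc_dst", -1, false };

    bool emitted = alloc_mov(&cc_dst, &tmp0);
    printf("mov cc_dst,tmp0 -> %s (cc_dst now in reg %d)\n",
           emitted ? "host mov emitted" : "register renamed, no code",
           cc_dst.reg);
    return 0;
}

So the extra "mov cc_dst,tmp0" in the TCG dump above costs nothing in the
generated host code.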


r~
