On 10/19/23 03:46, Paolo Bonzini wrote:
> This includes:
> - implementing SHA and CMPccXADD instruction extensions
> - introducing a new mechanism for flags writeback that avoids a
>   tricky failure
> - converting the more orthogonal parts of the one-byte opcode
>   map, as well as the CMOVcc and SETcc instructions.
>
> Tested by booting several 32-bit and 64-bit guests.
>
> The new decoder produces roughly 2% more ops, but after optimization there
> are just 0.5% more and almost all of them come from cmp instructions.
> For some reason that I have not investigated, these end up with an extra
> mov even after optimization:
>
>     sub_i64 tmp0,rax,$0x33          sub_i64 cc_dst,rax,$0x33
>     mov_i64 cc_src,$0x33            mov_i64 cc_src,$0x33
>     mov_i64 cc_dst,tmp0             discard cc_src2
>     discard cc_src2                 discard cc_op
>     discard cc_op
>
> It could be easily fixed by not reusing gen_SUB for cmp instructions,
> or by debugging what goes on in the optimizer. However, it does not
> result in larger assembly.
This is expected behaviour from the TCG optimizer. We don't forward-propagate
outputs at that point. But during register allocation of the "mov_i64
cc_dst,tmp0" opcode, we will see that tmp0 is dead and re-assign the register
from tmp0 to cc_dst without emitting a host instruction.
r~