On 12/15/24 03:06, Paolo Bonzini wrote:
The condition for optimizing repeat instruction is more or less the opposite of what you imagine: almost always the string instruction was_not_ optimized and optimizing the loop relied on goto_tb. This is obviously not great for performance, due to the cost of the exit-to-main-loop check, but also wrong. In fact, after expanding dc->jmp_opt and simplifying "!!x" to "x", the condition for looping used to be:((cflags & CF_NO_GOTO_TB) || (flags & (HF_RF_MASK | HF_TF_MASK | HF_INHIBIT_IRQ_MASK))) && !(cflags & CF_USE_ICOUNT) In other words, setting aside RF (it requires special handling for REP instructions and it was completely missing), repeat instruction were being optimized if TF or inhibit IRQ flags were set. This is certainly wrong for TF, because string instructions trap after every execution, and probably for interrupt shadow too. Get rid of repz_opt completely. The next patches will reintroduce the optimization, applying it in the common case instead of the unlikely and wrong one. While at it, place the CX/ECX/RCX=0 case is at the end of the function, which saves a label and is clearer when reading the generated ops. For clarity, mark the cc_op explicitly as DYNAMIC even if at the end of the translation block; the cc_op can come from either the previous instruction or the string instruction, and currently we rely on a gen_update_cc_op() that is hidden in the bowels of gen_jcc() to spill cc_op and mark it clean. Signed-off-by: Paolo Bonzini<[email protected]> --- target/i386/tcg/translate.c | 60 ++++++++----------------------------- 1 file changed, 13 insertions(+), 47 deletions(-)
It might have been clearer inlining gen_jz_ecx_string as a separate step, but no need to do that now. Yes, this is much clearer code generation.
Reviewed-by: Richard Henderson <[email protected]> r~
