I'll have to apologize to Paolo here, in that (1) during some of the various rewriting and rebasing, the original author credit got dropped and (2) most of the Signed-off-by tags are still there, despite the large changes.

I've fixed the performance regression that Laurent reported on nbench. I've misplaced the spreadsheet that I'd done with the numbers, but as I recall the result of that benchmarking is that, for an -march=i386 guest, most tests had unchanged speed with a few startling improvements, whereas for an -march=i686 guest we had more across-the-board minor improvements. My guess at the time was that this was primarily due to the improved code gen of the cmov insn.

The tree can be found at

  git://github.com/rth7680/qemu.git eflags3

Please review.


r~


Paolo Bonzini (19):
  test-i386: QEMU_PACKED is not defined here
  test-i386: make it compile with a recent gcc
  target-i386: use OT_* consistently
  target-i386: introduce gen_ext_tl
  target-i386: factor setting of s->cc_op handling for string functions
  target-i386: drop cc_op argument of gen_jcc1
  target-i386: move carry computation for inc/dec closer to gen_op_set_cc_op
  target-i386: move eflags computation closer to gen_op_set_cc_op
  target-i386: compute eflags outside rcl/rcr helper
  target-i386: clean up sahf
  target-i386: use gen_jcc1 to compile loopz
  target-i386: factor gen_op_set_cc_op/tcg_gen_discard_tl around computing flags
  target-i386: add helper functions to get other flags
  target-i386: change gen_setcc_slow_T0 to gen_setcc_slow
  target-i386: optimize setcc instructions
  target-i386: use CCPrepare to generate conditional jumps
  target-i386: cleanup temporary macros for CCPrepare
  target-i386: introduce gen_cmovcc1
  target-i386: kill cpu_T3

Richard Henderson (38):
  target-i386: Name the cc_op enumeration
  target-i386: Introduce set_cc_op
  target-i386: Don't clobber s->cc_op in gen_update_cc_op
  target-i386: Use gen_update_cc_op everywhere
  target-i386: do not compute eflags multiple times consecutively
  target-i386: no need to flush out cc_op before gen_eob
  target-i386: Move CC discards to set_cc_op
  target-i386: do not call helper to compute ZF/SF
  target-i386: use inverted setcond when computing NS or NZ
  target-i386: convert gen_compute_eflags_c to TCG
  target-i386: optimize setbe
  target-i386: optimize setle
  target-i386: introduce CCPrepare
  target-i386: introduce gen_prepare_cc
  target-i386: inline gen_prepare_cc_slow
  target-i386: expand cmov via movcond
  target-i386: use gen_op for cmps/scas
  target-i386: introduce gen_jcc1_noeob
  target-i386: Update cc_op before TCG branches
  target-i386: optimize flags checking after sub using CC_SRCT
  target-i386: Don't reference ENV through most of cc helpers
  target-i386: Make helper_cc_compute_{all,c} const
  target-i386: Use CC_SRC2 for ADC and SBB
  target-i386: Tidy prefix parsing
  target-i386: Decode the VEX prefixes
  target-i386: Implement MOVBE
  target-i386: Implement ANDN
  target-i386: Implement BEXTR
  target-i386: Implement BLSR, BLSMSK, BLSI
  target-i386: Implement BZHI
  target-i386: Implement MULX
  target-i386: Implement PDEP, PEXT
  target-i386: Implement SHLX, SARX, SHRX
  target-i386: Implement RORX
  target-i386: Implement ADX extension
  target-i386: Use clz/ctz for bsf/bsr helpers
  target-i386: Implement tzcnt and fix lzcnt
  target-i386: Add CC_OP_CLR

 target-i386/cc_helper.c             |  260 +++--
 target-i386/cc_helper_template.h    |  261 ++---
 target-i386/cpu.c                   |   18 +-
 target-i386/cpu.h                   |   26 +-
 target-i386/helper.c                |   13 +-
 target-i386/helper.h                |   13 +-
 target-i386/int_helper.c            |   69 +-
 target-i386/shift_helper_template.h |   12 +-
 target-i386/translate.c             | 2205 +++++++++++++++++++++--------------
 tests/tcg/test-i386.c               |   10 +-
 10 files changed, 1670 insertions(+), 1217 deletions(-)

-- 
1.8.1.2