Changes v1->v2:

* Patch 1 changes the exact swap condition.  This helps add2, e.g.

	add2 tmp4,tmp5,tmp4,tmp5,c1,c2

  where tmp5, c1, and c2 are all input constants.  Since tmp4 is
  variable, we cannot constant fold this.  But the existing swap
  condition would give

	add2 tmp4,tmp5,tmp4,c2,c1,tmp5

  While not incorrect, we do want to prefer "adc $c2,tmp5" on i686.

* Patch 2 drops the partial constant folding for add2/sub2.  It only
  does the operand ordering for add2.

* Patch 4 is new.  When writing the code for brcond2 et al, it did seem
  silly to do all the gen_args[N] = args[N] copying by hand.  I think
  the patch makes the code more readable.

* Patch 5 has the operand typo fixed that Aurelien noticed.

* Patch 8 is new, adding the extra nop into the opcode stream that was
  suggested on the list.  With this we fully constant fold add2/sub2.

* Patch 9 is new.  While looking at dumps from x86_64 bios boot, I
  noticed that sequences of push/pop insns leave the high-part of %rsp
  dead.  The same is true in general of any 32-bit addition in which
  the high-part isn't "consumed" by cc_dst.

* Patch 10 is new, treating mulu2 similarly to add2.  It triggers
  frequently during the boot of seabios, and should not be expensive.


r~


Richard Henderson (10):
  tcg: Split out swap_commutative as a subroutine
  tcg: Canonicalize add2 operand ordering
  tcg: Swap commutative double-word comparisons
  tcg: Use common code when failing to optimize
  tcg: Optimize double-word comparisons against zero
  tcg: Split out subroutines from do_constant_folding_cond
  tcg: Do constant folding on double-word comparisons
  tcg: Constant fold add2 and sub2
  tcg: Optimize half-dead add2/sub2
  tcg: Optimize mulu2

 tcg/optimize.c | 465 ++++++++++++++++++++++++++++++++++++++-------------------
 tcg/tcg-op.h   |  11 ++
 tcg/tcg.c      |  53 ++++++-
 3 files changed, 377 insertions(+), 152 deletions(-)

-- 
1.7.11.4
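
For readers following the Patch 1 note about add2, here is a rough sketch of one swap condition that leaves the example untouched. It is not the code from tcg/optimize.c. The Temp and Arg types and both function names are hypothetical, and the rule itself (constant goes second; on a tie, prefer the "op a, a, b" form) is an assumption chosen to match the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a TCG temp: only records const-ness. */
typedef struct {
    bool is_const;
} Temp;

/* Operands are modelled as indices into a Temp table (assumption). */
typedef int Arg;

/*
 * Swap *p1 and *p2 when that yields the preferred form:
 *   - a constant ends up as the second operand, or
 *   - on a tie, the destination matches the first operand ("op a, a, b").
 * Returns true if the operands were swapped.
 */
static bool swap_commutative_sketch(const Temp *temps, Arg dest,
                                    Arg *p1, Arg *p2)
{
    assert(temps && p1 && p2);

    Arg a1 = *p1, a2 = *p2;
    int sum = (int)temps[a1].is_const - (int)temps[a2].is_const;

    if (sum > 0 || (sum == 0 && dest == a2)) {
        *p1 = a2;
        *p2 = a1;
        return true;
    }
    return false;
}

/*
 * add2 rl, rh, al, ah, bl, bh: the low pair (al, bl) and the high
 * pair (ah, bh) are ordered independently of each other.
 */
static void canonicalize_add2_sketch(const Temp *temps, Arg args[6])
{
    swap_commutative_sketch(temps, args[0], &args[2], &args[4]);
    swap_commutative_sketch(temps, args[1], &args[3], &args[5]);
}
```

With tmp4 variable and tmp5, c1, c2 constant, the high pair (tmp5, c2) ties on const-ness and the destination tmp5 is already the first operand, so nothing is swapped and the host can still emit "adc $c2,tmp5".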