On Mon, Jul 26, 2021 at 1:27 PM Roger Sayle <ro...@nextmovesoftware.com> wrote:
>
> The following patch to the x86_64 backend improves the code generated
> for a decrement followed by a conditional move. The primary change is
> to recognize that after subtracting one, checking that the result is -1
> (or equivalently that the original value was zero) can be implemented
> using the borrow/carry flag instead of requiring an explicit test
> instruction. This is achieved by a new define_insn_and_split that allows
> combine to split the desired sequence/composite into a *subsi_3 and
> *movsicc_noc.
>
> The other change in this patch is a pair of peephole2 optimizations
> to eliminate register-to-register moves generated during register
> allocation. During reload, the compiler doesn't know that inverting
> the condition of a conditional move can sometimes reduce register
> pressure, but this is easy to tidy up during the peephole2 pass (where
> swapping the order of the insn's operands performs the required
> logic inversion).
>
> Both improvements are demonstrated by the case below:
>
> int foo(int x) {
>   if (x == 0)
>     x = 16;
>   else x--;
>   return x;
> }
>
> Before:
> foo:    leal    -1(%rdi), %eax
>         testl   %edi, %edi
>         movl    $16, %edx
>         cmove   %edx, %eax
>         ret
>
> After:
> foo:    subl    $1, %edi
>         movl    $16, %eax
>         cmovnc  %edi, %eax
>         ret
>
> And the value of the peephole2 clean-up can be seen on its own in:
>
> int bar(int x) {
>   x--;
>   if (x == 0)
>     x = 16;
>   return x;
> }
>
> Before:
> bar:    movl    %edi, %eax
>         movl    $16, %edx
>         subl    $1, %eax
>         cmove   %edx, %eax
>         ret
>
> After:
> bar:    subl    $1, %edi
>         movl    $16, %eax
>         cmovne  %edi, %eax
>         ret
>
> These idioms were inspired by the source code of NIST SciMark4's
> Random_nextDouble function, where the tweaks above result in
> a ~1% improvement in the MonteCarlo benchmark kernel.
>
> This patch has been tested on x86_64-pc-linux-gnu with a
> "make bootstrap" and "make -k check" with no new failures.
>
> Ok for mainline?
>
>
> 2021-07-26  Roger Sayle  <ro...@nextmovesoftware.com>
>
> gcc/ChangeLog
>         * config/i386/i386.md (*dec_cmov<mode>): New define_insn_and_split
>         to generate a conditional move using the carry flag after sub $1.
>         (peephole2): Eliminate a register-to-register move by inverting
>         the condition of a conditional move.
>
> gcc/testsuite/ChangeLog
>         * gcc.target/i386/dec-cmov-1.c: New test.
>         * gcc.target/i386/dec-cmov-2.c: New test.
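
For readers following along, the borrow-flag observation in the patch description can be illustrated in plain C. This is only a sketch, not code from the patch: the function names are hypothetical, it uses unsigned arithmetic so that the wrap-around is well defined, and it assumes a compiler that provides __builtin_sub_overflow (GCC 5 or later, or Clang). The borrow out of x - 1 is set exactly when x was zero, which is why the subtraction's carry flag can stand in for the explicit test.

```c
#include <assert.h>
#include <stdbool.h>
#include <limits.h>

/* Reference semantics of foo() from the patch description,
   written with unsigned so that 0 - 1 wraps instead of overflowing. */
static unsigned dec_or_16_ref(unsigned x)
{
    return x == 0 ? 16u : x - 1u;
}

/* Same result, but the selection is driven by the borrow out of x - 1
   rather than by a separate comparison of x against zero.
   (Hypothetical helper, for illustration only.) */
static unsigned dec_or_16_borrow(unsigned x)
{
    unsigned dec;
    /* Returns true when the unsigned subtraction wraps, i.e. x == 0. */
    bool borrow = __builtin_sub_overflow(x, 1u, &dec);
    return borrow ? 16u : dec;
}

/* Spot-check that both formulations agree on a few boundary values. */
static void check_equivalence(void)
{
    const unsigned samples[] = { 0u, 1u, 2u, 16u, 17u, UINT_MAX };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(dec_or_16_ref(samples[i]) == dec_or_16_borrow(samples[i]));
}
```

Calling check_equivalence() from a small driver exercises both versions; whether a given compiler actually emits sub/cmovnc for either form depends on its version and options.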
Please also allow ia32 in the testcases. Wrap the 64-bit specific
(long long) tests in #ifdef __x86_64__ and add:

/* { dg-additional-options "-march=pentiumpro -mregparm=3" { target ia32 } } */

(cmov generation depends on the ancient ix86_arch_features; it gets
enabled by using -march=pentiumpro.)

OK with the above change.

Thanks,
Uros.
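
A testcase laid out along the lines requested above might look roughly like the following. This is a guess at the shape, not the contents of dec-cmov-1.c or dec-cmov-2.c: the function names, the -O2 option, and the dg-final scan pattern are all assumptions; only the dg-additional-options line and the #ifdef __x86_64__ guard come from the review.

```c
/* { dg-do compile } */
/* { dg-options "-O2" } */   /* assumed optimization level */
/* { dg-additional-options "-march=pentiumpro -mregparm=3" { target ia32 } } */

/* 32-bit variant: built on both x86_64 and ia32. */
int foo_int(int x)
{
  if (x == 0)
    x = 16;
  else
    x--;
  return x;
}

/* 64-bit specific (long long) variant, guarded as requested in review. */
#ifdef __x86_64__
long long foo_ll(long long x)
{
  if (x == 0)
    x = 16;
  else
    x--;
  return x;
}
#endif

/* Assumed check: no explicit test instruction should remain. */
/* { dg-final { scan-assembler-not "test" } } */
```

The -mregparm=3 option passes the argument in a register on ia32, so the generated code stays comparable to the x86_64 output shown in the patch description.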