On Fri, Mar 11, 2016 at 5:41 PM, Michael Meissner <meiss...@linux.vnet.ibm.com> wrote:
> As I was auditing rs6000.md for power9 changes, I noticed that changes I had
> made in 2010 for power7 weren't as effective with power8.
>
> The FCTIWZ/FCTIWUZ instructions convert the scalar floating point value to a
> 32-bit signed/unsigned integer in bits 32-63 of the floating point or vector
> register.  Unfortunately, the hardware does not guarantee that bits 0-31 are
> copies of the sign, so the result cannot be used directly as a valid 64-bit
> integer.  There is no conversion from 32-bit int to floating point.  This
> meant in the power7 days, if you wanted to round a floating point value to
> 32-bit integer, you would need to do:
>
>     convert to 32-bit integer
>     store 32-bit value on the stack
>     load 32-bit value to a GPR
>     sign/zero extend it
>     store 32-bit value to the stack
>     load 32-bit value to a FPR/vector register.
>
> The optimization does a store/load to sign/zero extend, rather than going
> through the GPRs.
>
> On power8, we have a direct move instruction that copies the value between the
> register sets, and the compiler will generate this if the above optimization
> is turned off (which is what this patch does).
>
> There are other ways to sign/zero extend a value in the vector registers
> without doing a move using multiple instructions, but in practice direct move
> seems to be as fast as the other instructions.
>
> I bootstrapped the compiler and there were no regressions with this patch.
>
> I rebuilt the Spec 2006 benchmark suite, and there were 7 benchmarks that
> used this sequence somewhere in the code.  I ran those benchmarks with this
> patch, and compared them to the original benchmarks.  In 6 of the benchmarks,
> the run time was almost precisely the same.  The 416.gamess benchmark was
> about 2% faster, and there were no regressions.
>
> Is this patch ok to apply to the trunk?  I would like to apply it to the gcc 5
> branch as well.  Is this ok also?
>
> [gcc]
> 2016-03-11  Michael Meissner  <meiss...@linux.vnet.ibm.com>
>
>         PR target/70131
>         * config/rs6000/rs6000.md (round32<mode>2_fprs): Do not do the
>         optimization if we have direct move.
>         (roundu32<mode>2_fprs): Likewise.
>
> [gcc/testsuite]
> 2016-03-11  Michael Meissner  <meiss...@linux.vnet.ibm.com>
>
>         PR target/70131
>         * gcc.target/powerpc/ppc-round2.c: New test.

Okay for trunk and GCC 5.

Thanks, David
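
For readers following the thread, here is a minimal C sketch of the source-level shape that the quoted description is about: a floating point value is truncated to a 32-bit integer and then converted back to floating point. The function names are hypothetical and are not taken from the patch or from ppc-round2.c, whose contents are not shown in this message. The two extend helpers only model, in portable C, the sign/zero extension step needed because the upper 32 bits of the register are not guaranteed to be valid; they are an illustration under that assumption, not the code GCC emits.

```c
#include <stdint.h>

/* Hypothetical example: truncate to a signed 32-bit integer (C conversion
   rounds toward zero), then convert the integer back to double. */
double round_trip_signed(double x)
{
    return (double)(int32_t)x;
}

/* Hypothetical example: the same shape for an unsigned 32-bit integer. */
double round_trip_unsigned(double x)
{
    return (double)(uint32_t)x;
}

/* Model of the sign-extension step: `reg` stands for a 64-bit register image
   whose low 32 bits hold the converted value and whose upper 32 bits are
   unspecified.  Only the low 32 bits influence the result. */
int64_t sign_extend_low32(uint64_t reg)
{
    uint32_t lo = (uint32_t)reg;            /* discard the unspecified bits */
    int64_t value = (int64_t)lo;            /* 0 .. 2^32 - 1 */
    if (lo & UINT32_C(0x80000000))          /* sign bit of the 32-bit value */
        value -= INT64_C(0x100000000);      /* map to the negative range */
    return value;
}

/* Model of the zero-extension step for the unsigned case. */
uint64_t zero_extend_low32(uint64_t reg)
{
    return reg & UINT64_C(0xFFFFFFFF);
}
```

Compiling functions like the first two with optimization for power7 and for power8 and comparing the generated assembly would be one way to see whether the store/load sequence or a direct move is used, assuming the patterns named in the ChangeLog match this shape.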