Paolo Bonzini wrote:
>> That said, there is a whole bunch of applications that would kill for
>> -mrecip,
> even for 11bit ones. Games are one of them, for sure ;)
> What about -mrecip=0/1/2 for the number of NR steps? Or would two steps be
> slower than divss?
>
> I was thinking of adding this as a follow-up patch ;) Just look how the
> operations are grouped together.
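
For reference, a minimal sketch of what "number of NR steps" means for a reciprocal. This is not GCC's implementation: approx_recip() is a hypothetical stand-in that emulates a low-precision hardware estimate (such as the one rcpss provides) by truncating 1/x to 11 mantissa bits, and nr_step() and recip_refined() are illustrative names only.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a hardware reciprocal estimate:
   compute 1/x, then keep only the top 11 of the 23 mantissa bits.
   Assumes x is finite and nonzero. */
float approx_recip(float x)
{
    float r = 1.0f / x;
    uint32_t bits;
    memcpy(&bits, &r, sizeof bits);
    bits &= ~((UINT32_C(1) << 12) - 1);   /* clear the low 12 mantissa bits */
    memcpy(&r, &bits, sizeof r);
    return r;
}

/* One Newton-Raphson step for f(r) = 1/r - x:  r' = r * (2 - x*r).
   If r = (1/x)(1 - e), then r' = (1/x)(1 - e*e): the relative
   error is squared by each step. */
float nr_step(float x, float r)
{
    return r * (2.0f - x * r);
}

/* Reciprocal with a selectable number of refinement steps
   (0 = raw estimate only). */
float recip_refined(float x, int steps)
{
    assert(steps >= 0);
    float r = approx_recip(x);
    for (int i = 0; i < steps; i++)
        r = nr_step(x, r);
    return r;
}
```

Under these assumptions, recip_refined(x, 0) carries roughly 11 bits of precision and recip_refined(x, 1) roughly 22, which is already close to the 24-bit significand of a float, so in this sketch a second step has little left to improve.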

As Richard pointed out, having two NR steps does not make sense. For some
cases, doing without Newton-Raphson altogether is enough. (Examples: games,
or SPEC CPU 2006: http://www.hpcwire.com/hpc/1556972.html)

Other compilers have this option, e.g. Pathscale's -OPT:rsqrt=2
[yes, this is used for SPEC runs ;-)]

-- 
           Summary: Support using -mrecip w/o additional Newton-Raphson run
           Product: gcc
           Version: 4.3.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: enhancement
          Priority: P3
         Component: target
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: burnus at gcc dot gnu dot org
GCC target triplet: x86_64-*-* i686-*-*

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32392