[Bug tree-optimization/55760] scalar code non using rsqrtss and rcpss
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55760

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |WONTFIX
           See Also|                            |https://gcc.gnu.org/bugzilla/show_bug.cgi?id=47989
           Keywords|                            |documentation

--- Comment #7 from Andrew Pinski ---
See PR 47989 for the reason why this option is not enabled for scalar code and why it was only enabled for vectorized code.
[Bug tree-optimization/55760] scalar code non using rsqrtss and rcpss
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55760

Marc Glisse changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |glisse at gcc dot gnu.org

--- Comment #6 from Marc Glisse 2013-01-08 23:55:18 UTC ---
(In reply to comment #5)
> we just got "hit" by this great type of code (copysign is unknown to
> scientists)
>
> most probably gcc could optimize it for -Ofast to return copysignf(1.f,x);
> (x/x is optimized in 1)
>
> cat one.cc;c++ -Ofast -mrecip -S one.cc; cat one.s
> #include
> int one(float x) {
>   return x/std::abs(x);
> }

That looks like a completely different issue than this PR. I think you should open a different PR if you don't want it to get lost. It seems easy to add a few lines to fold_binary_loc about it (not the best place, but that's where the others are) near the place that optimizes A / A to 1.0. You could try writing the patch; I don't foresee any trap.
[Bug tree-optimization/55760] scalar code non using rsqrtss and rcpss
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55760

--- Comment #5 from vincenzo Innocente 2013-01-08 15:29:18 UTC ---
we just got "hit" by this great type of code (copysign is unknown to scientists)

most probably gcc could optimize it for -Ofast to return copysignf(1.f,x);
(x/x is optimized in 1)

cat one.cc;c++ -Ofast -mrecip -S one.cc; cat one.s
#include
int one(float x) {
  return x/std::abs(x);
}

        .text
        .align 4,0x90
        .globl __Z3onef
__Z3onef:
LFB86:
        movss   LC0(%rip), %xmm2
        andps   %xmm0, %xmm2
        rcpss   %xmm2, %xmm1
        mulss   %xmm1, %xmm2
        mulss   %xmm1, %xmm2
        addss   %xmm1, %xmm1
        subss   %xmm2, %xmm1
        mulss   %xmm0, %xmm1
        cvttss2si       %xmm1, %eax
        ret
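For readers comparing the two forms discussed in this comment, here is a minimal sketch of the division idiom next to the copysign form. The function names `sign_by_division` and `sign_by_copysign` are hypothetical and not from the report, and both return float rather than int to keep the comparison simple. It only illustrates the source-level rewrite; it is not the GCC fold itself.

```cpp
#include <cmath>

// The idiom from the report: divide x by its magnitude.
// Costs a division (or an rcpss sequence with -mrecip).
float sign_by_division(float x) {
    return x / std::abs(x);
}

// The proposed equivalent: copy the sign of x onto 1.0f.
// No division is needed.
float sign_by_copysign(float x) {
    return std::copysign(1.0f, x);
}
```

The two agree for finite nonzero x. They differ at x = 0 (the division yields NaN under IEEE arithmetic, while copysign yields +1 or -1) and for NaN inputs, which is presumably why such a rewrite would only be considered under flags like -Ofast.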
[Bug tree-optimization/55760] scalar code non using rsqrtss and rcpss
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55760

--- Comment #4 from Dominique d'Humieres 2012-12-20 16:07:11 UTC ---
> is there any reason why rsqrtss and rcpss are not used for scalar code while
> rsqrtps and rcpps are used for loops?

Yep! I don't have the patience to dig through the bugzilla archive right now, but the main reason is a loss of accuracy (especially 1/2.0 != 0.5) leading to problems in some codes (see gas_dyn.f90 in the polyhedron tests).

You can pass options to force the use of rsqrtss and rcpss for scalars:

-mrecip
    This option enables use of RCPSS and RSQRTSS instructions (and their vectorized variants RCPPS and RSQRTPS) with an additional Newton-Raphson step to increase precision instead of DIVSS and SQRTSS (and their vectorized variants) for single-precision floating-point arguments. These instructions are generated only when -funsafe-math-optimizations is enabled together with -finite-math-only and -fno-trapping-math. Note that while the throughput of the sequence is higher than the throughput of the non-reciprocal instruction, the precision of the sequence can be decreased by up to 2 ulp (i.e. the inverse of 1.0 equals 0.99999994).

    Note that GCC implements 1.0f/sqrtf(x) in terms of RSQRTSS (or RSQRTPS) already with -ffast-math (or the above option combination), and doesn't need -mrecip.

    Also note that GCC emits the above sequence with additional Newton-Raphson step for vectorized single-float division and vectorized sqrtf(x) already with -ffast-math (or the above option combination), and doesn't need -mrecip.

-mrecip=opt
    This option controls which reciprocal estimate instructions may be used. opt is a comma-separated list of options, which may be preceded by a `!' to invert the option:

    `all'
        Enable all estimate instructions.
    `default'
        Enable the default instructions, equivalent to -mrecip.
    `none'
        Disable all estimate instructions, equivalent to -mno-recip.
    `div'
        Enable the approximation for scalar division.
    `vec-div'
        Enable the approximation for vectorized division.
    `sqrt'
        Enable the approximation for scalar square root.
    `vec-sqrt'
        Enable the approximation for vectorized square root.

    So, for example, -mrecip=all,!sqrt enables all of the reciprocal approximations, except for square root.
[Bug tree-optimization/55760] scalar code non using rsqrtss and rcpss
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55760

--- Comment #3 from Richard Biener 2012-12-20 15:58:55 UTC ---
(In reply to comment #2)
> Thanks.
> not safe meaning producing incorrect results?

Yes.

> Is it documented?

See the documentation for -mrecip:

...
Note that while the throughput of the sequence is higher than the throughput of the non-reciprocal instruction, the precision of the sequence can be decreased by up to 2 ulp (i.e. the inverse of 1.0 equals 0.99999994).
...
[Bug tree-optimization/55760] scalar code non using rsqrtss and rcpss
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55760

--- Comment #2 from vincenzo Innocente 2012-12-20 15:55:03 UTC ---
Thanks.
not safe meaning producing incorrect results?
Is it documented?
[Bug tree-optimization/55760] scalar code non using rsqrtss and rcpss
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55760

--- Comment #1 from Richard Biener 2012-12-20 15:52:31 UTC ---
Use -mrecip. It's otherwise not safe for SPEC CPU 2006, which is why it is not enabled by default for -ffast-math.