[Bug tree-optimization/55760] scalar code non using rsqrtss and rcpss

2021-08-07 Thread pinskia at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55760

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |WONTFIX
           See Also|                            |https://gcc.gnu.org/bugzilla/show_bug.cgi?id=47989
           Keywords|                            |documentation

--- Comment #7 from Andrew Pinski  ---
See PR 47989 for the reason why this option is not enabled for scalar code and
why it was only enabled for vectorized code.

[Bug tree-optimization/55760] scalar code non using rsqrtss and rcpss

2013-01-08 Thread glisse at gcc dot gnu.org



http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55760



Marc Glisse changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |glisse at gcc dot gnu.org

--- Comment #6 from Marc Glisse  2013-01-08 23:55:18 UTC ---

(In reply to comment #5)
> we just got "hit" by this great type of code (copysign is unknown to
> scientists)
> 
> most probably gcc could optimize it for -Ofast to return copysignf(1.f,x);
> (x/x is optimized in 1)
> 
> cat one.cc;c++ -Ofast -mrecip -S one.cc; cat one.s
> #include <cmath>
> int one(float x) {
>   return x/std::abs(x);
> }



That looks like a completely different issue from this PR; I think you should
open a separate PR if you don't want it to get lost. It seems easy to add a
few lines to fold_binary_loc for it (not the best place, but that's where the
others are) near the place that optimizes A / A to 1.0. You could try writing
the patch; I don't foresee any trap.
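
As a rough illustration of the rewrite being discussed (this is not the fold_binary_loc patch itself), the two forms can be compared in plain C++. The helper names below are made up for this sketch. The two expressions agree for finite, nonzero x, but they differ at x = 0, where 0/0 is NaN while copysign returns +1 or -1.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper names, used only for this sketch.

// The form from the report: divide x by its magnitude.
float sign_by_division(float x) { return x / std::abs(x); }

// The suggested replacement: copy the sign of x onto 1.0f.
float sign_by_copysign(float x) { return std::copysign(1.0f, x); }

// True when both forms give the same value for this input.
bool forms_agree(float x) {
    return sign_by_division(x) == sign_by_copysign(x);
}

// Self-check over a few finite, nonzero inputs, where x / |x| is exactly +-1.
void check_examples() {
    const float samples[] = {3.5f, -2.0f, 1e-20f, -1e20f};
    for (float x : samples) {
        assert(forms_agree(x));
    }
}
```

Whether the substitution is acceptable at x = 0 depends on the flags in effect, which is presumably why the quoted comment ties it to -Ofast.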

[Bug tree-optimization/55760] scalar code non using rsqrtss and rcpss

2013-01-08 Thread vincenzo.innocente at cern dot ch



http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55760



--- Comment #5 from vincenzo Innocente  2013-01-08 15:29:18 UTC ---
we just got "hit" by this great type of code (copysign is unknown to
scientists)

most probably gcc could optimize it for -Ofast to return copysignf(1.f,x);
(x/x is optimized in 1)

cat one.cc;c++ -Ofast -mrecip -S one.cc; cat one.s
#include <cmath>
int one(float x) {
  return x/std::abs(x);
}



        .text
        .align 4,0x90
        .globl __Z3onef
__Z3onef:
LFB86:
        movss   LC0(%rip), %xmm2
        andps   %xmm0, %xmm2
        rcpss   %xmm2, %xmm1
        mulss   %xmm1, %xmm2
        mulss   %xmm1, %xmm2
        addss   %xmm1, %xmm1
        subss   %xmm2, %xmm1
        mulss   %xmm0, %xmm1
        cvttss2si       %xmm1, %eax
        ret
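
Reading the listing above: after rcpss produces an estimate r0 of 1/|x|, the two mulss, the addss and the subss compute (r0 + r0) - (|x| * r0) * r0, which is one Newton-Raphson refinement step; the result is then multiplied by x and truncated to int. The following is a portable sketch of that step only. It does not call the real instruction: coarse_recip is a hypothetical stand-in that injects a relative error of 2^-12 on purpose, and all function names are assumptions made for this example.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for the hardware estimate: the exact reciprocal with
// a relative error of 2^-12 injected deliberately. Real RCPSS output differs.
float coarse_recip(float a) {
    assert(a != 0.0f);
    return (1.0f / a) * (1.0f + 1.0f / 4096.0f);
}

// One Newton-Raphson step, in the same order of operations as the listing:
// r1 = (r0 + r0) - (a * r0) * r0
float refine_recip(float a, float r0) {
    return (r0 + r0) - (a * r0) * r0;
}

// Relative error of r as an approximation of 1/a, evaluated in double.
double rel_error(float a, float r) {
    return std::fabs(static_cast<double>(a) * r - 1.0);
}
```

If r0 = (1/a)(1 + e), the refined value is (1/a)(1 - e^2), so one step roughly squares the relative error. That is why a single step brings a 12-bit estimate close to, but not exactly at, single precision.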

[Bug tree-optimization/55760] scalar code non using rsqrtss and rcpss

2012-12-20 Thread dominiq at lps dot ens.fr



http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55760



--- Comment #4 from Dominique d'Humieres  2012-12-20 16:07:11 UTC ---
> is there any reason why rsqrtss and rcpss are not used for scalar code while
> rsqrtps and rcpps are used for loops?

Yep! I don't have the patience to dig through the Bugzilla archive right now,
but the main reason is a loss of accuracy (especially 1/2.0 != 0.5) leading
to problems in some codes (see gas_dyn.f90 in the polyhedron tests). You can
pass options to force the use of rsqrtss and rcpss for scalars:



-mrecip
This option enables use of RCPSS and RSQRTSS instructions (and their vectorized
variants RCPPS and RSQRTPS) with an additional Newton-Raphson step to increase
precision instead of DIVSS and SQRTSS (and their vectorized variants) for
single-precision floating-point arguments. These instructions are generated
only when -funsafe-math-optimizations is enabled together with
-finite-math-only and -fno-trapping-math. Note that while the throughput of the
sequence is higher than the throughput of the non-reciprocal instruction, the
precision of the sequence can be decreased by up to 2 ulp (i.e. the inverse of
1.0 equals 0.99999994).

Note that GCC implements 1.0f/sqrtf(x) in terms of RSQRTSS (or RSQRTPS) already
with -ffast-math (or the above option combination), and doesn't need -mrecip.

Also note that GCC emits the above sequence with additional Newton-Raphson step
for vectorized single-float division and vectorized sqrtf(x) already with
-ffast-math (or the above option combination), and doesn't need -mrecip.



-mrecip=opt
This option controls which reciprocal estimate instructions may be used. opt is
a comma-separated list of options, which may be preceded by a `!' to invert the
option:

`all'
    Enable all estimate instructions.
`default'
    Enable the default instructions, equivalent to -mrecip.
`none'
    Disable all estimate instructions, equivalent to -mno-recip.
`div'
    Enable the approximation for scalar division.
`vec-div'
    Enable the approximation for vectorized division.
`sqrt'
    Enable the approximation for scalar square root.
`vec-sqrt'
    Enable the approximation for vectorized square root.

So, for example, -mrecip=all,!sqrt enables all of the reciprocal
approximations, except for square root.
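
To see which sub-option touches which kind of expression, a small test file along these lines may help. It is only a sketch: the file name recip.cc and the function names are assumptions, and the mapping in the comments simply restates the documentation quoted above rather than anything verified against a particular GCC version.

```cpp
#include <cassert>
#include <cmath>

// Possible ways to inspect the generated code (hypothetical file name):
//   g++ -O2 -ffast-math -mrecip=div -S recip.cc
//   g++ -O2 -ffast-math '-mrecip=all,!sqrt' -S recip.cc
// The single quotes keep an interactive shell from interpreting the `!'.

// Scalar single-precision division: the case the `div' sub-option covers.
float recip(float x) { return 1.0f / x; }

// Scalar single-precision square root: the case the `sqrt' sub-option covers.
float root(float x) { return std::sqrt(x); }

// Reciprocal square root: per the quoted text, handled with RSQRTSS under
// -ffast-math even without -mrecip.
float inv_root(float x) { return 1.0f / std::sqrt(x); }

// Loose relative comparison, since the estimate sequences are not exact.
bool close_to(float got, float want) {
    return std::fabs(got - want) <= 1e-3f * std::fabs(want);
}

// Sanity check that holds with or without the approximations enabled.
void check_examples() {
    assert(close_to(recip(4.0f), 0.25f));
    assert(close_to(root(9.0f), 3.0f));
    assert(close_to(inv_root(16.0f), 0.25f));
}
```

Comparing the assembly from the two command lines should show whether divss and sqrtss are replaced by the estimate-plus-refinement sequence for each function.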

[Bug tree-optimization/55760] scalar code non using rsqrtss and rcpss

2012-12-20 Thread rguenth at gcc dot gnu.org



http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55760



--- Comment #3 from Richard Biener  2012-12-20 15:58:55 UTC ---
(In reply to comment #2)
> Thanks.
> not safe meaning producing incorrect results?

Yes.

> Is it documented?

See the documentation for -mrecip:

...

Note that while the throughput of the sequence is higher than the throughput
of the non-reciprocal instruction, the precision of the sequence can be
decreased by up to 2 ulp (i.e. the inverse of 1.0 equals 0.99999994).

...

[Bug tree-optimization/55760] scalar code non using rsqrtss and rcpss

2012-12-20 Thread vincenzo.innocente at cern dot ch



http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55760



--- Comment #2 from vincenzo Innocente  2012-12-20 15:55:03 UTC ---
Thanks.
not safe meaning producing incorrect results?
Is it documented?

[Bug tree-optimization/55760] scalar code non using rsqrtss and rcpss

2012-12-20 Thread rguenth at gcc dot gnu.org



http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55760



--- Comment #1 from Richard Biener  2012-12-20 15:52:31 UTC ---
Use -mrecip.  It's otherwise not safe for SPEC CPU 2006 which is why it is not
enabled by default for -ffast-math.
