http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55723

vincenzo Innocente <vincenzo.innocente at cern dot ch> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|SLP vectorization vs loop:  |SLP vectorization vs loop:
                   |SLP more efficient!         |SLP more efficient: loop
                   |                            |vectorization inefficient
                   |                            |in presence of multiple
                   |                            |"blends"

--- Comment #1 from vincenzo Innocente <vincenzo.innocente at cern dot ch> 2012-12-17 19:25:37 UTC ---
Moving the second blending before the polynomial makes the two loops produce
almost identical code.
This is not always possible, though.
Is this a bug in the loop optimizer?

template<typename Float>
inline
Float atan(Float t) {
  constexpr float PIO4F = 0.7853981633974483096f;
  constexpr Float zero = {0};
  // first blend: range reduction for t > tan(pi/8)
  Float z = (t > 0.4142135623730950f) ? (t-1.0f)/(t+1.0f) : t;
  // second blend: pi/4 offset for reduced arguments, placed before the polynomial
  Float ret = ( t > 0.4142135623730950f ) ? zero+PIO4F : zero;
  Float z2 = z * z;
  // polynomial in z2, added onto the blended offset
  ret +=
    ((( 8.05374449538e-2f * z2
        - 1.38776856032E-1f) * z2
      + 1.99777106478E-1f) * z2
     - 3.33329491539E-1f) * z2 * z
    + z;
  return ret;
}
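
To make the reordering in the comment concrete, here is a minimal scalar sketch of the two orderings. It is an illustration only: the names atan_poly, atan_blend_first, atan_blend_last and check_orderings_agree are hypothetical, plain float stands in for the vector type Float, and the "blend last" variant is an assumption about how the code looked before the change, since that version is not shown in this comment. The sketch only shows that the two orderings compute the same value; it says nothing about the code the vectorizer generates for either one.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical scalar stand-ins for the vector code in the report.
constexpr float kThreshold = 0.4142135623730950f;    // tan(pi/8)
constexpr float kPiOver4   = 0.7853981633974483096f; // PIO4F in the report

// Polynomial part, coefficients copied from the atan() in the report.
inline float atan_poly(float z) {
    float z2 = z * z;
    return (((8.05374449538e-2f * z2
              - 1.38776856032E-1f) * z2
             + 1.99777106478E-1f) * z2
            - 3.33329491539E-1f) * z2 * z
           + z;
}

// Ordering shown in the report: both blends precede the polynomial.
inline float atan_blend_first(float t) {
    float z   = (t > kThreshold) ? (t - 1.0f) / (t + 1.0f) : t; // first blend
    float ret = (t > kThreshold) ? kPiOver4 : 0.0f;             // second blend
    ret += atan_poly(z);
    return ret;
}

// ASSUMED earlier ordering (not shown in the report): the second blend
// is applied after the polynomial has been evaluated.
inline float atan_blend_last(float t) {
    float z   = (t > kThreshold) ? (t - 1.0f) / (t + 1.0f) : t; // first blend
    float ret = atan_poly(z);
    ret += (t > kThreshold) ? kPiOver4 : 0.0f;                  // second blend
    return ret;
}

// Hypothetical helper: asserts that both orderings agree for a scalar input,
// within a small tolerance to allow for differing FMA contraction.
inline void check_orderings_agree(float t) {
    assert(std::fabs(atan_blend_first(t) - atan_blend_last(t)) <= 1e-6f);
}
```

Calling check_orderings_agree(0.5f), or comparing atan_blend_first(0.25f) against std::atan(0.25f), is a quick way to confirm that the reordering is a pure scheduling change for small non-negative inputs, so any difference between the loop and SLP versions would come from code generation rather than from the arithmetic.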