On Wed, 2019-01-09 at 11:10 -0500, David Malcolm wrote:
> On Wed, 2019-01-09 at 09:56 +0000, Jonathan Wakely wrote:
> > On Wed, 9 Jan 2019 at 09:50, Andrew Haley wrote:
> > > I don't agree. Sometimes vectorization is critical. It would be nice
> > > to have a warning which would fire if vectorization failed. That would
> > > surely help the OP.
> > 
> > Dave Malcolm has been working on something like that:
> > https://gcc.gnu.org/ml/gcc-patches/2018-09/msg01749.html
> 
> Yes: this code is in trunk for gcc 9, but it doesn't help much for the
> case given elsewhere in this thread:
> 
> #include <cmath>
> 
> extern float data [ 32768 ] ;
> 
> extern void vf1()
> {
>    #pragma vectorize enable
>    for ( int i = 0 ; i < 32768 ; i++ )
>      data [ i ] = std::sqrt ( data [ i ] ) ;
> }
> 
> Compiling on this x86_64 box with -fopt-info-vec-missed shows the
> rather cryptic:
> 
> g++ -c /tmp/sqrt-test.cc -O3 -mavx2 -fopt-info-vec-missed
> /tmp/sqrt-test.cc:8:24: missed: couldn't vectorize loop
> /tmp/sqrt-test.cc:8:24: missed: not vectorized: control flow in loop.
> /home/david/coding/gcc-python/gcc-svn-trunk/install-dogfood/include/c++/9.0.0/cmath:464:27: missed: statement clobbers memory: __builtin_sqrtf (_1);
> 
> and with -fopt-info-vec-all-internals shows:
> 
> g++ -c /tmp/sqrt-test.cc -O3 -mavx2 -fopt-info-vec-all-internals
> 
> Analyzing loop at /tmp/sqrt-test.cc:8
> /tmp/sqrt-test.cc:8:24: note:  === analyze_loop_nest ===
> /tmp/sqrt-test.cc:8:24: note:   === vect_analyze_loop_form ===
> /tmp/sqrt-test.cc:8:24: missed:   not vectorized: control flow in loop.
> /tmp/sqrt-test.cc:8:24: missed:  bad loop form.
> /tmp/sqrt-test.cc:8:24: missed: couldn't vectorize loop
> /tmp/sqrt-test.cc:8:24: missed: not vectorized: control flow in loop.
> /tmp/sqrt-test.cc:5:13: note: vectorized 0 loops in function.
> /home/david/coding/gcc-python/gcc-svn-trunk/install-dogfood/include/c++/9.0.0/cmath:464:27: note:  === vect_slp_analyze_bb ===
> /home/david/coding/gcc-python/gcc-svn-trunk/install-dogfood/include/c++/9.0.0/cmath:464:27: note:   === vect_analyze_data_refs ===
> /home/david/coding/gcc-python/gcc-svn-trunk/install-dogfood/include/c++/9.0.0/cmath:464:27: note:   got vectype for stmt: _1 = data[i_12];
> vector(8) float
> /home/david/coding/gcc-python/gcc-svn-trunk/install-dogfood/include/c++/9.0.0/cmath:464:27: missed:  not vectorized: not enough data-refs in basic block.
> /home/david/coding/gcc-python/gcc-svn-trunk/install-dogfood/include/c++/9.0.0/cmath:464:27: missed: statement clobbers memory: __builtin_sqrtf (_1);
> /tmp/sqrt-test.cc:8:24: note:  === vect_slp_analyze_bb ===
> /tmp/sqrt-test.cc:8:24: note:   === vect_analyze_data_refs ===
> /tmp/sqrt-test.cc:8:24: note:   got vectype for stmt: data[i_12] = _7;
> vector(8) float
> /tmp/sqrt-test.cc:8:24: missed:  not vectorized: not enough data-refs in basic block.
> /tmp/sqrt-test.cc:10:1: note:  === vect_slp_analyze_bb ===
> /tmp/sqrt-test.cc:10:1: note:   === vect_analyze_data_refs ===
> /tmp/sqrt-test.cc:10:1: missed:  not vectorized: not enough data-refs in basic block.
> 
> I had to turn on -fdump-tree-all to try to figure out what that
> "control flow in loop" was; it seems to be a guard against the input
> value being negative:
> 
>   <bb 3> [local count: 1063004407]:
>   # i_12 = PHI <0(2), i_6(7)>
>   # ivtmp_10 = PHI <32768(2), ivtmp_2(7)>
>   # DEBUG i => i_12
>   # DEBUG BEGIN_STMT
>   _1 = data[i_12];
>   # DEBUG __x => _1
>   # DEBUG BEGIN_STMT
>   _7 = .SQRT (_1);
>   if (_1 u>= 0.0)
>     goto <bb 8>; [99.95%]
>   else
>     goto <bb 4>; [0.05%]
> 
>   <bb 8> [local count: 1062472912]:
>   goto <bb 5>; [100.00%]
> 
>   <bb 4> [local count: 531495]:
>   __builtin_sqrtf (_1);
> 
> I'm not sure where that control flow came from: it isn't in
>   sqrt-test.cc.104t.stdarg
> but is in
>   sqrt-test.cc.105t.cdce
> so I think it's coming from the argument-range code in cdce
> (conditional dead call elimination).
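
For anyone puzzling over that dump, here's my reading of the transformed
loop body written back out as C++. It's only a sketch under assumptions:
sqrt_no_errno is a made-up stand-in for the internal .SQRT (a square root
that never sets errno), approximated by saving and restoring errno around
std::sqrt, and cdce_sqrt is likewise a hypothetical name. The detail worth
noticing is that "u>=" is an unordered comparison, so the fast path is taken
for x >= 0 and for NaN, and the real library call only happens for negative
inputs:

```cpp
#include <cerrno>
#include <cmath>

// Hypothetical stand-in for GCC's internal .SQRT: a square root with no
// errno side effect. Portable C++ can't spell that directly, so this
// sketch calls std::sqrt and then restores errno.
inline float sqrt_no_errno(float x) {
    int saved = errno;
    float r = std::sqrt(x);
    errno = saved;
    return r;
}

// Hypothetical C++ rendering of the loop body after cdce.
inline float cdce_sqrt(float x) {
    float r = sqrt_no_errno(x);   // _7 = .SQRT (_1);
    if (!std::isless(x, 0.0f))    // if (_1 u>= 0.0): x >= 0, or x is NaN
        return r;                 // hot path (99.95% in the dump)
    (void)std::sqrt(x);           // <bb 4>: __builtin_sqrtf (_1); result unused,
                                  // the call survives only for its errno write
    return r;
}
```

The slow call's result is thrown away; presumably it's kept purely because
it may write errno, and it's the same call that -fopt-info flags above as
"statement clobbers memory".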
> 
> Arguably the location on the statement is wrong: it's on the loop
> header, when it presumably should be on the std::sqrt call.
> 
> Shall I file a bugzilla about this?

...and -fno-tree-builtin-call-dce eliminates the control flow, but it
still doesn't vectorize the loop; on godbolt.org with:
  -O3 -mavx2 -fopt-info-vec-all -fno-tree-builtin-call-dce
gcc trunk x86_64 gives:

<source>:8:24: missed: couldn't vectorize loop
/opt/compiler-explorer/gcc-trunk-20190109/include/c++/9.0.0/cmath:464:27: missed: statement clobbers memory: _7 = __builtin_sqrtf (_1);
<source>:5:13: note: vectorized 0 loops in function.
/opt/compiler-explorer/gcc-trunk-20190109/include/c++/9.0.0/cmath:464:27: missed: statement clobbers memory: _7 = __builtin_sqrtf (_1);
Compiler returned: 0

...so presumably it doesn't know how to vectorize that builtin call.
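
Here's a small standalone way to see the side effect that I assume is behind
that message (assuming a libm where math_errhandling includes MATH_ERRNO, as
glibc does by default): the same loop as the test case, but over a
caller-supplied array, reporting whether it wrote to errno. A write to errno
is a store to memory other than data[], which would fit the vectorizer
refusing to treat the call as a pure per-element operation. The helper name
is made up for illustration:

```cpp
#include <cerrno>
#include <cmath>
#include <cstddef>

// Hypothetical helper: runs the loop from the test case over data[0..n)
// and reports whether any std::sqrt call wrote to errno.
bool sqrt_loop_touches_errno(float* data, std::size_t n) {
    errno = 0;
    for (std::size_t i = 0; i < n; ++i)
        data[i] = std::sqrt(data[i]);   // may set errno = EDOM if data[i] < 0
    return errno != 0;
}
```

If that's right, it would also explain why -fno-tree-builtin-call-dce alone
doesn't help: the call is still there, and it can still write errno.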

Dave
