Re: [PATCH] [RFC] Higher-level reporting of vectorization problems

2018-07-03 Thread Richard Biener
On Mon, 2 Jul 2018, Richard Sandiford wrote:

> Richard Biener writes:
> > On Fri, 22 Jun 2018, David Malcolm wrote:
> >
> >> NightStrike and I were chatting on IRC last week about
> >> issues with trying to vectorize the following code:
> >> 
> >> #include <vector>
> >> std::size_t f(std::vector<std::vector<float>> const & v) {
> >>   std::size_t ret = 0;
> >>   for (auto const & w: v)
> >>     ret += w.size();
> >>   return ret;
> >> }
> >> 
> >> icc could vectorize it, but gcc couldn't, but neither of us could
> >> immediately figure out what the problem was.
> >> 
> >> Using -fopt-info leads to a wall of text.
> >> 
> >> I tried using my patch here:
> >> 
> >>  "[PATCH] v3 of optinfo, remarks and optimization records"
> >>   https://gcc.gnu.org/ml/gcc-patches/2018-06/msg01267.html
> >> 
> >> It improved things somewhat, by showing:
> >> (a) the nesting structure via indentation, and
> >> (b) the GCC line at which each message is emitted (by using the
> >> "remark" output)
> >> 
> >> but it's still a wall of text:
> >> 
> >>   https://dmalcolm.fedorapeople.org/gcc/2018-06-18/test.cc.remarks.html
> >>   
> >> https://dmalcolm.fedorapeople.org/gcc/2018-06-18/test.cc.d/..%7C..%7Csrc%7Ctest.cc.html#line-4
> >> 
> >> It doesn't yet provide a simple high-level message to a
> >> tech-savvy user on what they need to do to get GCC to
> >> vectorize their loop.
> >
> > Yeah, in particular the vectorizer is way too noisy in its low-level
> > functions.  IIRC -fopt-info-vec-missed is "somewhat" better:
> >
> > t.C:4:26: note: step unknown.
> > t.C:4:26: note: vector alignment may not be reachable
> > t.C:4:26: note: not ssa-name.
> > t.C:4:26: note: use not simple.
> > t.C:4:26: note: not ssa-name.
> > t.C:4:26: note: use not simple.
> > t.C:4:26: note: no array mode for V2DI[3]
> > t.C:4:26: note: Data access with gaps requires scalar epilogue loop
> > t.C:4:26: note: can't use a fully-masked loop because the target doesn't 
> > have the appropriate masked load or store.
> > t.C:4:26: note: not ssa-name.
> > t.C:4:26: note: use not simple.
> > t.C:4:26: note: not ssa-name.
> > t.C:4:26: note: use not simple.
> > t.C:4:26: note: no array mode for V2DI[3]
> > t.C:4:26: note: Data access with gaps requires scalar epilogue loop
> > t.C:4:26: note: op not supported by target.
> > t.C:4:26: note: not vectorized: relevant stmt not supported: _15 = _14 
> > /[ex] 4;
> > t.C:4:26: note: bad operation or unsupported loop bound.
> > t.C:4:26: note: not vectorized: no grouped stores in basic block.
> > t.C:4:26: note: not vectorized: no grouped stores in basic block.
> > t.C:6:12: note: not vectorized: not enough data-refs in basic block.
> >
> >
> >> The pertinent dump messages are:
> >> 
> >> test.cc:4:23: remark: === try_vectorize_loop_1 === 
> >> [../../src/gcc/tree-vectorizer.c:674:try_vectorize_loop_1]
> >> cc1plus: remark:
> >> Analyzing loop at test.cc:4 
> >> [../../src/gcc/dumpfile.c:735:ensure_pending_optinfo]
> >> test.cc:4:23: remark:  === analyze_loop_nest === 
> >> [../../src/gcc/tree-vect-loop.c:2299:vect_analyze_loop]
> >> [...snip...]
> >> test.cc:4:23: remark:   === vect_analyze_loop_operations === 
> >> [../../src/gcc/tree-vect-loop.c:1520:vect_analyze_loop_operations]
> >> [...snip...]
> >> test.cc:4:23: remark:==> examining statement: ‘_15 = _14 /[ex] 4;’ 
> >> [../../src/gcc/tree-vect-stmts.c:9382:vect_analyze_stmt]
> >> test.cc:4:23: remark:vect_is_simple_use: operand ‘_14’ 
> >> [../../src/gcc/tree-vect-stmts.c:10064:vect_is_simple_use]
> >> test.cc:4:23: remark:def_stmt: ‘_14 = _8 - _7;’ 
> >> [../../src/gcc/tree-vect-stmts.c:10098:vect_is_simple_use]
> >> test.cc:4:23: remark:type of def: internal 
> >> [../../src/gcc/tree-vect-stmts.c:10112:vect_is_simple_use]
> >> test.cc:4:23: remark:vect_is_simple_use: operand ‘4’ 
> >> [../../src/gcc/tree-vect-stmts.c:10064:vect_is_simple_use]
> >> test.cc:4:23: remark:op not supported by target. 
> >> [../../src/gcc/tree-vect-stmts.c:5932:vectorizable_operation]
> >> test.cc:4:23: remark:not vectorized: relevant stmt not supported: ‘_15 
> >> = _14 /[ex] 4;’ [../../src/gcc/tree-vect-stmts.c:9565:vect_analyze_stmt]
> >> test.cc:4:23: remark:   bad operation or unsupported loop bound. 
> >> [../../src/gcc/tree-vect-loop.c:2043:vect_analyze_loop_2]
> >> cc1plus: remark: vectorized 0 loops in function. 
> >> [../../src/gcc/tree-vectorizer.c:904:vectorize_loops]
> >> 
> >> In particular, that complaint from
> >>   [../../src/gcc/tree-vect-stmts.c:9565:vect_analyze_stmt]
> >> is coming from:
> >> 
> >>   if (!ok)
> >>     {
> >>       if (dump_enabled_p ())
> >>         {
> >>           dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> >>                            "not vectorized: relevant stmt not ");
> >>           dump_printf (MSG_MISSED_OPTIMIZATION, "supported: ");
> >>           dump_gimple_stmt (MSG_MISSED_OPTIMIZATION, TDF_SLIM, stmt, 0);
> >>         }
> >>
> >>       return false;
> >>     }
> >> 
> >> This got me thinking: the 

Re: [PATCH] [RFC] Higher-level reporting of vectorization problems

2018-07-02 Thread Richard Sandiford
Richard Biener writes:
> On Fri, 22 Jun 2018, David Malcolm wrote:
>
>> NightStrike and I were chatting on IRC last week about
>> issues with trying to vectorize the following code:
>> 
>> #include <vector>
>> std::size_t f(std::vector<std::vector<float>> const & v) {
>>   std::size_t ret = 0;
>>   for (auto const & w: v)
>>     ret += w.size();
>>   return ret;
>> }
>> 
>> icc could vectorize it, but gcc couldn't, but neither of us could
>> immediately figure out what the problem was.
>> 
>> Using -fopt-info leads to a wall of text.
>> 
>> I tried using my patch here:
>> 
>>  "[PATCH] v3 of optinfo, remarks and optimization records"
>>   https://gcc.gnu.org/ml/gcc-patches/2018-06/msg01267.html
>> 
>> It improved things somewhat, by showing:
>> (a) the nesting structure via indentation, and
>> (b) the GCC line at which each message is emitted (by using the
>> "remark" output)
>> 
>> but it's still a wall of text:
>> 
>>   https://dmalcolm.fedorapeople.org/gcc/2018-06-18/test.cc.remarks.html
>>   
>> https://dmalcolm.fedorapeople.org/gcc/2018-06-18/test.cc.d/..%7C..%7Csrc%7Ctest.cc.html#line-4
>> 
>> It doesn't yet provide a simple high-level message to a
>> tech-savvy user on what they need to do to get GCC to
>> vectorize their loop.
>
> Yeah, in particular the vectorizer is way too noisy in its low-level
> functions.  IIRC -fopt-info-vec-missed is "somewhat" better:
>
> t.C:4:26: note: step unknown.
> t.C:4:26: note: vector alignment may not be reachable
> t.C:4:26: note: not ssa-name.
> t.C:4:26: note: use not simple.
> t.C:4:26: note: not ssa-name.
> t.C:4:26: note: use not simple.
> t.C:4:26: note: no array mode for V2DI[3]
> t.C:4:26: note: Data access with gaps requires scalar epilogue loop
> t.C:4:26: note: can't use a fully-masked loop because the target doesn't 
> have the appropriate masked load or store.
> t.C:4:26: note: not ssa-name.
> t.C:4:26: note: use not simple.
> t.C:4:26: note: not ssa-name.
> t.C:4:26: note: use not simple.
> t.C:4:26: note: no array mode for V2DI[3]
> t.C:4:26: note: Data access with gaps requires scalar epilogue loop
> t.C:4:26: note: op not supported by target.
> t.C:4:26: note: not vectorized: relevant stmt not supported: _15 = _14 
> /[ex] 4;
> t.C:4:26: note: bad operation or unsupported loop bound.
> t.C:4:26: note: not vectorized: no grouped stores in basic block.
> t.C:4:26: note: not vectorized: no grouped stores in basic block.
> t.C:6:12: note: not vectorized: not enough data-refs in basic block.
>
>
>> The pertinent dump messages are:
>> 
>> test.cc:4:23: remark: === try_vectorize_loop_1 === 
>> [../../src/gcc/tree-vectorizer.c:674:try_vectorize_loop_1]
>> cc1plus: remark:
>> Analyzing loop at test.cc:4 
>> [../../src/gcc/dumpfile.c:735:ensure_pending_optinfo]
>> test.cc:4:23: remark:  === analyze_loop_nest === 
>> [../../src/gcc/tree-vect-loop.c:2299:vect_analyze_loop]
>> [...snip...]
>> test.cc:4:23: remark:   === vect_analyze_loop_operations === 
>> [../../src/gcc/tree-vect-loop.c:1520:vect_analyze_loop_operations]
>> [...snip...]
>> test.cc:4:23: remark:==> examining statement: ‘_15 = _14 /[ex] 4;’ 
>> [../../src/gcc/tree-vect-stmts.c:9382:vect_analyze_stmt]
>> test.cc:4:23: remark:vect_is_simple_use: operand ‘_14’ 
>> [../../src/gcc/tree-vect-stmts.c:10064:vect_is_simple_use]
>> test.cc:4:23: remark:def_stmt: ‘_14 = _8 - _7;’ 
>> [../../src/gcc/tree-vect-stmts.c:10098:vect_is_simple_use]
>> test.cc:4:23: remark:type of def: internal 
>> [../../src/gcc/tree-vect-stmts.c:10112:vect_is_simple_use]
>> test.cc:4:23: remark:vect_is_simple_use: operand ‘4’ 
>> [../../src/gcc/tree-vect-stmts.c:10064:vect_is_simple_use]
>> test.cc:4:23: remark:op not supported by target. 
>> [../../src/gcc/tree-vect-stmts.c:5932:vectorizable_operation]
>> test.cc:4:23: remark:not vectorized: relevant stmt not supported: ‘_15 = 
>> _14 /[ex] 4;’ [../../src/gcc/tree-vect-stmts.c:9565:vect_analyze_stmt]
>> test.cc:4:23: remark:   bad operation or unsupported loop bound. 
>> [../../src/gcc/tree-vect-loop.c:2043:vect_analyze_loop_2]
>> cc1plus: remark: vectorized 0 loops in function. 
>> [../../src/gcc/tree-vectorizer.c:904:vectorize_loops]
>> 
>> In particular, that complaint from
>>   [../../src/gcc/tree-vect-stmts.c:9565:vect_analyze_stmt]
>> is coming from:
>> 
>>   if (!ok)
>>     {
>>       if (dump_enabled_p ())
>>         {
>>           dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>>                            "not vectorized: relevant stmt not ");
>>           dump_printf (MSG_MISSED_OPTIMIZATION, "supported: ");
>>           dump_gimple_stmt (MSG_MISSED_OPTIMIZATION, TDF_SLIM, stmt, 0);
>>         }
>>
>>       return false;
>>     }
>> 
>> This got me thinking: the user presumably wants to know several
>> things:
>> 
>> * the location of the loop that can't be vectorized (vect_location
>>   captures this)
>> * location of the problematic statement
>> * why it's problematic
>> * the problematic statement itself.
>> 
>> The following 

Re: [PATCH] [RFC] Higher-level reporting of vectorization problems

2018-06-25 Thread Richard Biener
On Fri, 22 Jun 2018, David Malcolm wrote:

> NightStrike and I were chatting on IRC last week about
> issues with trying to vectorize the following code:
> 
> #include <vector>
> std::size_t f(std::vector<std::vector<float>> const & v) {
>   std::size_t ret = 0;
>   for (auto const & w: v)
>     ret += w.size();
>   return ret;
> }
> 
> icc could vectorize it, but gcc couldn't, but neither of us could
> immediately figure out what the problem was.
> 
> Using -fopt-info leads to a wall of text.
> 
> I tried using my patch here:
> 
>  "[PATCH] v3 of optinfo, remarks and optimization records"
>   https://gcc.gnu.org/ml/gcc-patches/2018-06/msg01267.html
> 
> It improved things somewhat, by showing:
> (a) the nesting structure via indentation, and
> (b) the GCC line at which each message is emitted (by using the
> "remark" output)
> 
> but it's still a wall of text:
> 
>   https://dmalcolm.fedorapeople.org/gcc/2018-06-18/test.cc.remarks.html
>   
> https://dmalcolm.fedorapeople.org/gcc/2018-06-18/test.cc.d/..%7C..%7Csrc%7Ctest.cc.html#line-4
> 
> It doesn't yet provide a simple high-level message to a
> tech-savvy user on what they need to do to get GCC to
> vectorize their loop.

Yeah, in particular the vectorizer is way too noisy in its low-level
functions.  IIRC -fopt-info-vec-missed is "somewhat" better:

t.C:4:26: note: step unknown.
t.C:4:26: note: vector alignment may not be reachable
t.C:4:26: note: not ssa-name.
t.C:4:26: note: use not simple.
t.C:4:26: note: not ssa-name.
t.C:4:26: note: use not simple.
t.C:4:26: note: no array mode for V2DI[3]
t.C:4:26: note: Data access with gaps requires scalar epilogue loop
t.C:4:26: note: can't use a fully-masked loop because the target doesn't 
have the appropriate masked load or store.
t.C:4:26: note: not ssa-name.
t.C:4:26: note: use not simple.
t.C:4:26: note: not ssa-name.
t.C:4:26: note: use not simple.
t.C:4:26: note: no array mode for V2DI[3]
t.C:4:26: note: Data access with gaps requires scalar epilogue loop
t.C:4:26: note: op not supported by target.
t.C:4:26: note: not vectorized: relevant stmt not supported: _15 = _14 
/[ex] 4;
t.C:4:26: note: bad operation or unsupported loop bound.
t.C:4:26: note: not vectorized: no grouped stores in basic block.
t.C:4:26: note: not vectorized: no grouped stores in basic block.
t.C:6:12: note: not vectorized: not enough data-refs in basic block.
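
For anyone wondering where "_15 = _14 /[ex] 4" comes from: it looks like
the pointer subtraction hidden inside w.size().  Below is a rough sketch
of that computation, assuming the usual begin/end pointer layout and a
4-byte element type such as float.  MiniVec, mini_size and f_sketch are
made-up names for illustration only, not libstdc++ or GCC code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the inner vector: just a begin/end pointer pair.
struct MiniVec {
  const float *begin;
  const float *end;
};

// size() spelled out: a byte difference followed by an exact division
// by the element size (4 for float on common targets).
std::size_t mini_size(const MiniVec &w) {
  // Roughly "_14 = _8 - _7" in the dump.
  std::ptrdiff_t bytes = reinterpret_cast<const char *>(w.end)
                         - reinterpret_cast<const char *>(w.begin);
  // Roughly "_15 = _14 /[ex] 4": the remainder is known to be zero.
  return static_cast<std::size_t>(
      bytes / static_cast<std::ptrdiff_t>(sizeof(float)));
}

// Same shape as the loop under discussion: sum the inner sizes.
std::size_t f_sketch(const std::vector<MiniVec> &v) {
  std::size_t ret = 0;
  for (const MiniVec &w : v)
    ret += mini_size(w);
  return ret;
}
```

Seen this way, the statement the vectorizer rejects is the division in
mini_size, not the loads or the reduction itself.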


> The pertinent dump messages are:
> 
> test.cc:4:23: remark: === try_vectorize_loop_1 === 
> [../../src/gcc/tree-vectorizer.c:674:try_vectorize_loop_1]
> cc1plus: remark:
> Analyzing loop at test.cc:4 
> [../../src/gcc/dumpfile.c:735:ensure_pending_optinfo]
> test.cc:4:23: remark:  === analyze_loop_nest === 
> [../../src/gcc/tree-vect-loop.c:2299:vect_analyze_loop]
> [...snip...]
> test.cc:4:23: remark:   === vect_analyze_loop_operations === 
> [../../src/gcc/tree-vect-loop.c:1520:vect_analyze_loop_operations]
> [...snip...]
> test.cc:4:23: remark:==> examining statement: ‘_15 = _14 /[ex] 4;’ 
> [../../src/gcc/tree-vect-stmts.c:9382:vect_analyze_stmt]
> test.cc:4:23: remark:vect_is_simple_use: operand ‘_14’ 
> [../../src/gcc/tree-vect-stmts.c:10064:vect_is_simple_use]
> test.cc:4:23: remark:def_stmt: ‘_14 = _8 - _7;’ 
> [../../src/gcc/tree-vect-stmts.c:10098:vect_is_simple_use]
> test.cc:4:23: remark:type of def: internal 
> [../../src/gcc/tree-vect-stmts.c:10112:vect_is_simple_use]
> test.cc:4:23: remark:vect_is_simple_use: operand ‘4’ 
> [../../src/gcc/tree-vect-stmts.c:10064:vect_is_simple_use]
> test.cc:4:23: remark:op not supported by target. 
> [../../src/gcc/tree-vect-stmts.c:5932:vectorizable_operation]
> test.cc:4:23: remark:not vectorized: relevant stmt not supported: ‘_15 = 
> _14 /[ex] 4;’ [../../src/gcc/tree-vect-stmts.c:9565:vect_analyze_stmt]
> test.cc:4:23: remark:   bad operation or unsupported loop bound. 
> [../../src/gcc/tree-vect-loop.c:2043:vect_analyze_loop_2]
> cc1plus: remark: vectorized 0 loops in function. 
> [../../src/gcc/tree-vectorizer.c:904:vectorize_loops]
> 
> In particular, that complaint from
>   [../../src/gcc/tree-vect-stmts.c:9565:vect_analyze_stmt]
> is coming from:
> 
>   if (!ok)
> {
>   if (dump_enabled_p ())
> {
>   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>"not vectorized: relevant stmt not ");
>   dump_printf (MSG_MISSED_OPTIMIZATION, "supported: ");
>   dump_gimple_stmt (MSG_MISSED_OPTIMIZATION, TDF_SLIM, stmt, 0);
> }
> 
>   return false;
> }
> 
> This got me thinking: the user presumably wants to know several
> things:
> 
> * the location of the loop that can't be vectorized (vect_location
>   captures this)
> * location of the problematic statement
> * why it's problematic
> * the problematic statement itself.
> 
> The following is an experiment at capturing that information, by
> recording an "opt_problem" instance describing what the optimization
> problem is, created deep in the callstack when
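
The idea quoted above (record the problem where it is detected, report it
once at the top level) could be sketched roughly as follows.  This is
only an illustration under assumptions, not the code from the patch:
SrcLoc, OptProblem, analyze_stmt, format_loc and describe are all
hypothetical names, and the fields simply mirror the four items in the
list above.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical source position; a stand-in for GCC's location types.
struct SrcLoc {
  std::string file;
  int line;
  int column;
};

// Hypothetical record of one failed optimization attempt.
struct OptProblem {
  SrcLoc loop_loc;        // loop that could not be vectorized
  SrcLoc stmt_loc;        // location of the problematic statement
  std::string reason;     // why it is problematic
  std::string stmt_text;  // the problematic statement itself
};

// Low-level analysis: returns nullptr on success, otherwise a problem
// record created at the point where the failure is detected.
std::unique_ptr<OptProblem> analyze_stmt(const SrcLoc &loop_loc,
                                         const SrcLoc &stmt_loc,
                                         const std::string &stmt_text,
                                         bool supported) {
  if (supported)
    return nullptr;
  return std::make_unique<OptProblem>(OptProblem{
      loop_loc, stmt_loc, "relevant stmt not supported", stmt_text});
}

std::string format_loc(const SrcLoc &l) {
  return l.file + ":" + std::to_string(l.line) + ":"
         + std::to_string(l.column);
}

// Top level: turn the record into a single high-level message.
std::string describe(const OptProblem &p) {
  return format_loc(p.loop_loc) + ": couldn't vectorize loop: " + p.reason
         + ": '" + p.stmt_text + "' (statement at "
         + format_loc(p.stmt_loc) + ")";
}
```

The point of returning a record rather than printing immediately is that
the caller decides how much to show, so the low-level functions no longer
have to emit their own messages.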