Re: [PATCH] [RFC] Higher-level reporting of vectorization problems
On Mon, 2 Jul 2018, Richard Sandiford wrote:

> Richard Biener writes:
> > On Fri, 22 Jun 2018, David Malcolm wrote:
> >
> >> NightStrike and I were chatting on IRC last week about
> >> issues with trying to vectorize the following code:
> >>
> >> #include <vector>
> >> std::size_t f(std::vector> const & v) {
> >>   std::size_t ret = 0;
> >>   for (auto const & w: v)
> >>     ret += w.size();
> >>   return ret;
> >> }
> >>
> >> icc could vectorize it, but gcc couldn't, but neither of us could
> >> immediately figure out what the problem was.
> >>
> >> Using -fopt-info leads to a wall of text.
> >>
> >> I tried using my patch here:
> >>
> >> "[PATCH] v3 of optinfo, remarks and optimization records"
> >> https://gcc.gnu.org/ml/gcc-patches/2018-06/msg01267.html
> >>
> >> It improved things somewhat, by showing:
> >> (a) the nesting structure via indentation, and
> >> (b) the GCC line at which each message is emitted (by using the
> >> "remark" output)
> >>
> >> but it's still a wall of text:
> >>
> >> https://dmalcolm.fedorapeople.org/gcc/2018-06-18/test.cc.remarks.html
> >>
> >> https://dmalcolm.fedorapeople.org/gcc/2018-06-18/test.cc.d/..%7C..%7Csrc%7Ctest.cc.html#line-4
> >>
> >> It doesn't yet provide a simple high-level message to a
> >> tech-savvy user on what they need to do to get GCC to
> >> vectorize their loop.
> >
> > Yeah, in particular the vectorizer is way too noisy in its low-level
> > functions. IIRC -fopt-info-vec-missed is "somewhat" better:
> >
> > t.C:4:26: note: step unknown.
> > t.C:4:26: note: vector alignment may not be reachable
> > t.C:4:26: note: not ssa-name.
> > t.C:4:26: note: use not simple.
> > t.C:4:26: note: not ssa-name.
> > t.C:4:26: note: use not simple.
> > t.C:4:26: note: no array mode for V2DI[3]
> > t.C:4:26: note: Data access with gaps requires scalar epilogue loop
> > t.C:4:26: note: can't use a fully-masked loop because the target doesn't
> > have the appropriate masked load or store.
> > t.C:4:26: note: not ssa-name.
> > t.C:4:26: note: use not simple.
> > t.C:4:26: note: not ssa-name.
> > t.C:4:26: note: use not simple.
> > t.C:4:26: note: no array mode for V2DI[3]
> > t.C:4:26: note: Data access with gaps requires scalar epilogue loop
> > t.C:4:26: note: op not supported by target.
> > t.C:4:26: note: not vectorized: relevant stmt not supported: _15 = _14
> > /[ex] 4;
> > t.C:4:26: note: bad operation or unsupported loop bound.
> > t.C:4:26: note: not vectorized: no grouped stores in basic block.
> > t.C:4:26: note: not vectorized: no grouped stores in basic block.
> > t.C:6:12: note: not vectorized: not enough data-refs in basic block.
> >
> >
> >> The pertinent dump messages are:
> >>
> >> test.cc:4:23: remark: === try_vectorize_loop_1 ===
> >> [../../src/gcc/tree-vectorizer.c:674:try_vectorize_loop_1]
> >> cc1plus: remark:
> >> Analyzing loop at test.cc:4
> >> [../../src/gcc/dumpfile.c:735:ensure_pending_optinfo]
> >> test.cc:4:23: remark: === analyze_loop_nest ===
> >> [../../src/gcc/tree-vect-loop.c:2299:vect_analyze_loop]
> >> [...snip...]
> >> test.cc:4:23: remark: === vect_analyze_loop_operations ===
> >> [../../src/gcc/tree-vect-loop.c:1520:vect_analyze_loop_operations]
> >> [...snip...]
> >> test.cc:4:23: remark:==> examining statement: ‘_15 = _14 /[ex] 4;’
> >> [../../src/gcc/tree-vect-stmts.c:9382:vect_analyze_stmt]
> >> test.cc:4:23: remark:vect_is_simple_use: operand ‘_14’
> >> [../../src/gcc/tree-vect-stmts.c:10064:vect_is_simple_use]
> >> test.cc:4:23: remark:def_stmt: ‘_14 = _8 - _7;’
> >> [../../src/gcc/tree-vect-stmts.c:10098:vect_is_simple_use]
> >> test.cc:4:23: remark:type of def: internal
> >> [../../src/gcc/tree-vect-stmts.c:10112:vect_is_simple_use]
> >> test.cc:4:23: remark:vect_is_simple_use: operand ‘4’
> >> [../../src/gcc/tree-vect-stmts.c:10064:vect_is_simple_use]
> >> test.cc:4:23: remark:op not supported by target.
> >> [../../src/gcc/tree-vect-stmts.c:5932:vectorizable_operation]
> >> test.cc:4:23: remark:not vectorized: relevant stmt not supported: ‘_15
> >> = _14 /[ex] 4;’ [../../src/gcc/tree-vect-stmts.c:9565:vect_analyze_stmt]
> >> test.cc:4:23: remark: bad operation or unsupported loop bound.
> >> [../../src/gcc/tree-vect-loop.c:2043:vect_analyze_loop_2]
> >> cc1plus: remark: vectorized 0 loops in function.
> >> [../../src/gcc/tree-vectorizer.c:904:vectorize_loops]
> >>
> >> In particular, that complaint from
> >> [../../src/gcc/tree-vect-stmts.c:9565:vect_analyze_stmt]
> >> is coming from:
> >>
> >>   if (!ok)
> >>     {
> >>       if (dump_enabled_p ())
> >>         {
> >>           dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> >>                            "not vectorized: relevant stmt not ");
> >>           dump_printf (MSG_MISSED_OPTIMIZATION, "supported: ");
> >>           dump_gimple_stmt (MSG_MISSED_OPTIMIZATION, TDF_SLIM, stmt, 0);
> >>         }
> >>
> >>       return false;
> >>     }
> >>
> >> This got me thinking: the
Re: [PATCH] [RFC] Higher-level reporting of vectorization problems
Richard Biener writes:
> On Fri, 22 Jun 2018, David Malcolm wrote:
>
>> NightStrike and I were chatting on IRC last week about
>> issues with trying to vectorize the following code:
>>
>> #include <vector>
>> std::size_t f(std::vector> const & v) {
>>   std::size_t ret = 0;
>>   for (auto const & w: v)
>>     ret += w.size();
>>   return ret;
>> }
>>
>> icc could vectorize it, but gcc couldn't, but neither of us could
>> immediately figure out what the problem was.
>>
>> Using -fopt-info leads to a wall of text.
>>
>> I tried using my patch here:
>>
>> "[PATCH] v3 of optinfo, remarks and optimization records"
>> https://gcc.gnu.org/ml/gcc-patches/2018-06/msg01267.html
>>
>> It improved things somewhat, by showing:
>> (a) the nesting structure via indentation, and
>> (b) the GCC line at which each message is emitted (by using the
>> "remark" output)
>>
>> but it's still a wall of text:
>>
>> https://dmalcolm.fedorapeople.org/gcc/2018-06-18/test.cc.remarks.html
>>
>> https://dmalcolm.fedorapeople.org/gcc/2018-06-18/test.cc.d/..%7C..%7Csrc%7Ctest.cc.html#line-4
>>
>> It doesn't yet provide a simple high-level message to a
>> tech-savvy user on what they need to do to get GCC to
>> vectorize their loop.
>
> Yeah, in particular the vectorizer is way too noisy in its low-level
> functions. IIRC -fopt-info-vec-missed is "somewhat" better:
>
> t.C:4:26: note: step unknown.
> t.C:4:26: note: vector alignment may not be reachable
> t.C:4:26: note: not ssa-name.
> t.C:4:26: note: use not simple.
> t.C:4:26: note: not ssa-name.
> t.C:4:26: note: use not simple.
> t.C:4:26: note: no array mode for V2DI[3]
> t.C:4:26: note: Data access with gaps requires scalar epilogue loop
> t.C:4:26: note: can't use a fully-masked loop because the target doesn't
> have the appropriate masked load or store.
> t.C:4:26: note: not ssa-name.
> t.C:4:26: note: use not simple.
> t.C:4:26: note: not ssa-name.
> t.C:4:26: note: use not simple.
> t.C:4:26: note: no array mode for V2DI[3]
> t.C:4:26: note: Data access with gaps requires scalar epilogue loop
> t.C:4:26: note: op not supported by target.
> t.C:4:26: note: not vectorized: relevant stmt not supported: _15 = _14
> /[ex] 4;
> t.C:4:26: note: bad operation or unsupported loop bound.
> t.C:4:26: note: not vectorized: no grouped stores in basic block.
> t.C:4:26: note: not vectorized: no grouped stores in basic block.
> t.C:6:12: note: not vectorized: not enough data-refs in basic block.
>
>
>> The pertinent dump messages are:
>>
>> test.cc:4:23: remark: === try_vectorize_loop_1 ===
>> [../../src/gcc/tree-vectorizer.c:674:try_vectorize_loop_1]
>> cc1plus: remark:
>> Analyzing loop at test.cc:4
>> [../../src/gcc/dumpfile.c:735:ensure_pending_optinfo]
>> test.cc:4:23: remark: === analyze_loop_nest ===
>> [../../src/gcc/tree-vect-loop.c:2299:vect_analyze_loop]
>> [...snip...]
>> test.cc:4:23: remark: === vect_analyze_loop_operations ===
>> [../../src/gcc/tree-vect-loop.c:1520:vect_analyze_loop_operations]
>> [...snip...]
>> test.cc:4:23: remark:==> examining statement: ‘_15 = _14 /[ex] 4;’
>> [../../src/gcc/tree-vect-stmts.c:9382:vect_analyze_stmt]
>> test.cc:4:23: remark:vect_is_simple_use: operand ‘_14’
>> [../../src/gcc/tree-vect-stmts.c:10064:vect_is_simple_use]
>> test.cc:4:23: remark:def_stmt: ‘_14 = _8 - _7;’
>> [../../src/gcc/tree-vect-stmts.c:10098:vect_is_simple_use]
>> test.cc:4:23: remark:type of def: internal
>> [../../src/gcc/tree-vect-stmts.c:10112:vect_is_simple_use]
>> test.cc:4:23: remark:vect_is_simple_use: operand ‘4’
>> [../../src/gcc/tree-vect-stmts.c:10064:vect_is_simple_use]
>> test.cc:4:23: remark:op not supported by target.
>> [../../src/gcc/tree-vect-stmts.c:5932:vectorizable_operation]
>> test.cc:4:23: remark:not vectorized: relevant stmt not supported: ‘_15 =
>> _14 /[ex] 4;’ [../../src/gcc/tree-vect-stmts.c:9565:vect_analyze_stmt]
>> test.cc:4:23: remark: bad operation or unsupported loop bound.
>> [../../src/gcc/tree-vect-loop.c:2043:vect_analyze_loop_2]
>> cc1plus: remark: vectorized 0 loops in function.
>> [../../src/gcc/tree-vectorizer.c:904:vectorize_loops]
>>
>> In particular, that complaint from
>> [../../src/gcc/tree-vect-stmts.c:9565:vect_analyze_stmt]
>> is coming from:
>>
>>   if (!ok)
>>     {
>>       if (dump_enabled_p ())
>>         {
>>           dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>>                            "not vectorized: relevant stmt not ");
>>           dump_printf (MSG_MISSED_OPTIMIZATION, "supported: ");
>>           dump_gimple_stmt (MSG_MISSED_OPTIMIZATION, TDF_SLIM, stmt, 0);
>>         }
>>
>>       return false;
>>     }
>>
>> This got me thinking: the user presumably wants to know several
>> things:
>>
>> * the location of the loop that can't be vectorized (vect_location
>> captures this)
>> * location of the problematic statement
>> * why it's problematic
>> * the problematic statement itself.
>>
>> The following
Re: [PATCH] [RFC] Higher-level reporting of vectorization problems
On Fri, 22 Jun 2018, David Malcolm wrote:

> NightStrike and I were chatting on IRC last week about
> issues with trying to vectorize the following code:
>
> #include <vector>
> std::size_t f(std::vector> const & v) {
>   std::size_t ret = 0;
>   for (auto const & w: v)
>     ret += w.size();
>   return ret;
> }
>
> icc could vectorize it, but gcc couldn't, but neither of us could
> immediately figure out what the problem was.
>
> Using -fopt-info leads to a wall of text.
>
> I tried using my patch here:
>
> "[PATCH] v3 of optinfo, remarks and optimization records"
> https://gcc.gnu.org/ml/gcc-patches/2018-06/msg01267.html
>
> It improved things somewhat, by showing:
> (a) the nesting structure via indentation, and
> (b) the GCC line at which each message is emitted (by using the
> "remark" output)
>
> but it's still a wall of text:
>
> https://dmalcolm.fedorapeople.org/gcc/2018-06-18/test.cc.remarks.html
>
> https://dmalcolm.fedorapeople.org/gcc/2018-06-18/test.cc.d/..%7C..%7Csrc%7Ctest.cc.html#line-4
>
> It doesn't yet provide a simple high-level message to a
> tech-savvy user on what they need to do to get GCC to
> vectorize their loop.

Yeah, in particular the vectorizer is way too noisy in its low-level
functions. IIRC -fopt-info-vec-missed is "somewhat" better:

t.C:4:26: note: step unknown.
t.C:4:26: note: vector alignment may not be reachable
t.C:4:26: note: not ssa-name.
t.C:4:26: note: use not simple.
t.C:4:26: note: not ssa-name.
t.C:4:26: note: use not simple.
t.C:4:26: note: no array mode for V2DI[3]
t.C:4:26: note: Data access with gaps requires scalar epilogue loop
t.C:4:26: note: can't use a fully-masked loop because the target doesn't
have the appropriate masked load or store.
t.C:4:26: note: not ssa-name.
t.C:4:26: note: use not simple.
t.C:4:26: note: not ssa-name.
t.C:4:26: note: use not simple.
t.C:4:26: note: no array mode for V2DI[3]
t.C:4:26: note: Data access with gaps requires scalar epilogue loop
t.C:4:26: note: op not supported by target.
t.C:4:26: note: not vectorized: relevant stmt not supported: _15 = _14 /[ex] 4;
t.C:4:26: note: bad operation or unsupported loop bound.
t.C:4:26: note: not vectorized: no grouped stores in basic block.
t.C:4:26: note: not vectorized: no grouped stores in basic block.
t.C:6:12: note: not vectorized: not enough data-refs in basic block.

> The pertinent dump messages are:
>
> test.cc:4:23: remark: === try_vectorize_loop_1 ===
> [../../src/gcc/tree-vectorizer.c:674:try_vectorize_loop_1]
> cc1plus: remark:
> Analyzing loop at test.cc:4
> [../../src/gcc/dumpfile.c:735:ensure_pending_optinfo]
> test.cc:4:23: remark: === analyze_loop_nest ===
> [../../src/gcc/tree-vect-loop.c:2299:vect_analyze_loop]
> [...snip...]
> test.cc:4:23: remark: === vect_analyze_loop_operations ===
> [../../src/gcc/tree-vect-loop.c:1520:vect_analyze_loop_operations]
> [...snip...]
> test.cc:4:23: remark:==> examining statement: ‘_15 = _14 /[ex] 4;’
> [../../src/gcc/tree-vect-stmts.c:9382:vect_analyze_stmt]
> test.cc:4:23: remark:vect_is_simple_use: operand ‘_14’
> [../../src/gcc/tree-vect-stmts.c:10064:vect_is_simple_use]
> test.cc:4:23: remark:def_stmt: ‘_14 = _8 - _7;’
> [../../src/gcc/tree-vect-stmts.c:10098:vect_is_simple_use]
> test.cc:4:23: remark:type of def: internal
> [../../src/gcc/tree-vect-stmts.c:10112:vect_is_simple_use]
> test.cc:4:23: remark:vect_is_simple_use: operand ‘4’
> [../../src/gcc/tree-vect-stmts.c:10064:vect_is_simple_use]
> test.cc:4:23: remark:op not supported by target.
> [../../src/gcc/tree-vect-stmts.c:5932:vectorizable_operation]
> test.cc:4:23: remark:not vectorized: relevant stmt not supported: ‘_15 =
> _14 /[ex] 4;’ [../../src/gcc/tree-vect-stmts.c:9565:vect_analyze_stmt]
> test.cc:4:23: remark: bad operation or unsupported loop bound.
> [../../src/gcc/tree-vect-loop.c:2043:vect_analyze_loop_2]
> cc1plus: remark: vectorized 0 loops in function.
> [../../src/gcc/tree-vectorizer.c:904:vectorize_loops]
>
> In particular, that complaint from
> [../../src/gcc/tree-vect-stmts.c:9565:vect_analyze_stmt]
> is coming from:
>
>   if (!ok)
>     {
>       if (dump_enabled_p ())
>         {
>           dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>                            "not vectorized: relevant stmt not ");
>           dump_printf (MSG_MISSED_OPTIMIZATION, "supported: ");
>           dump_gimple_stmt (MSG_MISSED_OPTIMIZATION, TDF_SLIM, stmt, 0);
>         }
>
>       return false;
>     }
>
> This got me thinking: the user presumably wants to know several
> things:
>
> * the location of the loop that can't be vectorized (vect_location
> captures this)
> * location of the problematic statement
> * why it's problematic
> * the problematic statement itself.
>
> The following is an experiment at capturing that information, by
> recording an "opt_problem" instance describing what the optimization
> problem is, created deep in the callstack when
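
For readers following the dumps in this thread: the rejected statement `_15 = _14 /[ex] 4`, fed by `_14 = _8 - _7`, appears to be the division hidden inside `w.size()`. The following is only a sketch under stated assumptions. The archived text lost the template arguments of the test case, so the element type `int` (any 4-byte type would match the `4` in the dump on typical targets) and the names `elem_t` and `size_by_hand` are illustrative and do not come from the thread. It also assumes the usual implementation strategy where a vector's size is the byte distance between its end and begin pointers divided by the element size.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// ASSUMPTION: the element type is not recoverable from the archived message.
// int is used because it is 4 bytes on typical targets, matching "/[ex] 4".
using elem_t = int;

// The loop under discussion, written out with the assumed element type.
std::size_t f(std::vector<std::vector<elem_t>> const & v) {
  std::size_t ret = 0;
  for (auto const & w : v)
    ret += w.size();
  return ret;
}

// Hypothetical helper: spells out what w.size() typically reduces to.
std::size_t size_by_hand(std::vector<elem_t> const & w) {
  const char *begin = reinterpret_cast<const char *>(w.data());
  const char *end = reinterpret_cast<const char *>(w.data() + w.size());
  std::ptrdiff_t bytes = end - begin;  // corresponds to _14 = _8 - _7
  // The byte count is always a multiple of sizeof (elem_t), which is why
  // the compiler can use an exact division: _15 = _14 /[ex] 4.
  return static_cast<std::size_t>(bytes) / sizeof (elem_t);
}
```

Calling `f` on a vector holding inner vectors of 3, 0 and 2 elements sums their sizes, and `size_by_hand` agrees with `size()` on each inner vector. The point of the sketch is that the statement the vectorizer reports as "op not supported by target" is this exact division, not anything visible in the user's source.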