On Mon, 2018-06-25 at 11:10 +0200, Richard Biener wrote:
> On Fri, 22 Jun 2018, David Malcolm wrote:
> 
> > NightStrike and I were chatting on IRC last week about
> > issues with trying to vectorize the following code:
> > 
> > #include <vector>
> > std::size_t f(std::vector<std::vector<float>> const & v) {
> >     std::size_t ret = 0;
> >     for (auto const & w: v)
> >             ret += w.size();
> >     return ret;
> > }
> > 
> > icc could vectorize it, but gcc couldn't, but neither of us could
> > immediately figure out what the problem was.
> > 
> > Using -fopt-info leads to a wall of text.
> > 
> > I tried using my patch here:
> > 
> >  "[PATCH] v3 of optinfo, remarks and optimization records"
> >   https://gcc.gnu.org/ml/gcc-patches/2018-06/msg01267.html
> > 
> > It improved things somewhat, by showing:
> > (a) the nesting structure via indentation, and
> > (b) the GCC line at which each message is emitted (by using the
> >     "remark" output)
> > 
> > but it's still a wall of text:
> > 
> >   https://dmalcolm.fedorapeople.org/gcc/2018-06-18/test.cc.remarks.
> > html
> >   https://dmalcolm.fedorapeople.org/gcc/2018-06-18/test.cc.d/..%7C.
> > .%7Csrc%7Ctest.cc.html#line-4
> > 
> > It doesn't yet provide a simple high-level message to a
> > tech-savvy user on what they need to do to get GCC to
> > vectorize their loop.
> 
> Yeah, in particular the vectorizer is way too noisy in its low-level
> functions.  IIRC -fopt-info-vec-missed is "somewhat" better:
> 
> t.C:4:26: note: step unknown.
> t.C:4:26: note: vector alignment may not be reachable
> t.C:4:26: note: not ssa-name.
> t.C:4:26: note: use not simple.
> t.C:4:26: note: not ssa-name.
> t.C:4:26: note: use not simple.
> t.C:4:26: note: no array mode for V2DI[3]
> t.C:4:26: note: Data access with gaps requires scalar epilogue loop
> t.C:4:26: note: can't use a fully-masked loop because the target
> doesn't 
> have the appropriate masked load or store.
> t.C:4:26: note: not ssa-name.
> t.C:4:26: note: use not simple.
> t.C:4:26: note: not ssa-name.
> t.C:4:26: note: use not simple.
> t.C:4:26: note: no array mode for V2DI[3]
> t.C:4:26: note: Data access with gaps requires scalar epilogue loop
> t.C:4:26: note: op not supported by target.
> t.C:4:26: note: not vectorized: relevant stmt not supported: _15 =
> _14 
> /[ex] 4;
> t.C:4:26: note: bad operation or unsupported loop bound.
> t.C:4:26: note: not vectorized: no grouped stores in basic block.
> t.C:4:26: note: not vectorized: no grouped stores in basic block.
> t.C:6:12: note: not vectorized: not enough data-refs in basic block.
> 
> 
> > The pertinent dump messages are:
> > 
> > test.cc:4:23: remark: === try_vectorize_loop_1 ===
> > [../../src/gcc/tree-vectorizer.c:674:try_vectorize_loop_1]
> > cc1plus: remark:
> > Analyzing loop at test.cc:4
> > [../../src/gcc/dumpfile.c:735:ensure_pending_optinfo]
> > test.cc:4:23: remark:  === analyze_loop_nest ===
> > [../../src/gcc/tree-vect-loop.c:2299:vect_analyze_loop]
> > [...snip...]
> > test.cc:4:23: remark:   === vect_analyze_loop_operations ===
> > [../../src/gcc/tree-vect-loop.c:1520:vect_analyze_loop_operations]
> > [...snip...]
> > test.cc:4:23: remark:    ==> examining statement: ‘_15 = _14 /[ex]
> > 4;’ [../../src/gcc/tree-vect-stmts.c:9382:vect_analyze_stmt]
> > test.cc:4:23: remark:    vect_is_simple_use: operand ‘_14’
> > [../../src/gcc/tree-vect-stmts.c:10064:vect_is_simple_use]
> > test.cc:4:23: remark:    def_stmt: ‘_14 = _8 - _7;’
> > [../../src/gcc/tree-vect-stmts.c:10098:vect_is_simple_use]
> > test.cc:4:23: remark:    type of def: internal [../../src/gcc/tree-
> > vect-stmts.c:10112:vect_is_simple_use]
> > test.cc:4:23: remark:    vect_is_simple_use: operand ‘4’
> > [../../src/gcc/tree-vect-stmts.c:10064:vect_is_simple_use]
> > test.cc:4:23: remark:    op not supported by target.
> > [../../src/gcc/tree-vect-stmts.c:5932:vectorizable_operation]
> > test.cc:4:23: remark:    not vectorized: relevant stmt not
> > supported: ‘_15 = _14 /[ex] 4;’ [../../src/gcc/tree-vect-
> > stmts.c:9565:vect_analyze_stmt]
> > test.cc:4:23: remark:   bad operation or unsupported loop bound.
> > [../../src/gcc/tree-vect-loop.c:2043:vect_analyze_loop_2]
> > cc1plus: remark: vectorized 0 loops in function.
> > [../../src/gcc/tree-vectorizer.c:904:vectorize_loops]
> > 
> > In particular, that complaint from
> >   [../../src/gcc/tree-vect-stmts.c:9565:vect_analyze_stmt]
> > is coming from:
> > 
> >   if (!ok)
> >     {
> >       if (dump_enabled_p ())
> >         {
> >           dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> >                            "not vectorized: relevant stmt not ");
> >           dump_printf (MSG_MISSED_OPTIMIZATION, "supported: ");
> >           dump_gimple_stmt (MSG_MISSED_OPTIMIZATION, TDF_SLIM,
> > stmt, 0);
> >         }
> > 
> >       return false;
> >     }
> > 
> > This got me thinking: the user presumably wants to know several
> > things:
> > 
> > * the location of the loop that can't be vectorized (vect_location
> >   captures this)
> > * location of the problematic statement
> > * why it's problematic
> > * the problematic statement itself.
> > 
> > The following is an experiment at capturing that information, by
> > recording an "opt_problem" instance describing what the
> > optimization
> > problem is, created deep in the callstack when it occurs, for use
> > later on back at the top of the vectorization callstack.
> 
> Nice idea.  Of course the issue is then for which issues to
> exactly create those.  Like all of the MSG_MISSED_OPTIMIZATION
> dumpings?
> 
> I guess the vectorizer needs some axing of useless messages
> and/or we need a MSG_DEBUG to have an additional level below
> MSG_NOTE.
> 
> > This extra work is only done if dump_enabled_p.
> > 
> > It feels vaguely analogous to an exception object (in terms of
> > packaging up a problem that occurs deep in the stack for reporting
> > back at a higher level).
> > 
> > With this patch, we emit:
> > 
> > ../../src/test.cc: In function ‘std::size_t f(const
> > std::vector<std::vector<float> >&)’:
> > ../../src/test.cc:4:23: remark: couldn't vectorize loop
> >   for (auto const & w: v)
> >                        ^
> > In file included from ../x86_64-pc-linux-gnu/libstdc++-
> > v3/include/vector:64,
> >                  from ../../src/test.cc:1:
> > ../x86_64-pc-linux-gnu/libstdc++-
> > v3/include/bits/stl_vector.h:806:50:
> > note: relevant stmt not supported: ‘_15 = _14 /[ex] 4;’
> >        { return size_type(this->_M_impl._M_finish - this-
> > >_M_impl._M_start); }
> >                           ~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~
> > ~~~~~~~
> > 
> > which reports both the location of the loop and the statement
> > that's
> > problematic (if I'm reading it right, the pointer arithmetic leads
> > to a
> > division by 4, and presumably we're not able to handle that).
> 
> Quite likely because we only handle TRUNC_DIV_EXPR and not
> EXACT_DIV_EXPR
> which we can handle the same semantically (EXACT_DIV_EXPR just gives
> us stronger guarantees).

[...snip...]

I've been experimenting with the idea; here's an updated version of
the patch (on top of the optinfo patch kit, as it happens).

I tried adding more uses of opt_problem, but ran into the issue that it's
hard to find "by hand" the places deep in the callstack where things fail;
I spent a chunk of time stepping through failures, trying to figure out
the best place to capture the opt_problem.

It seemed like something that we could track in C++'s type system.

The following kit solves this by introducing a new class opt_result, which
looks a lot like a bool: it has the same representation in memory.

For instance, at the deepest point of the callstack where the failure
happens, rather than:

     if (!check_something ())
       {
         if (dump_enabled_p ())
           dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
                            "foo is unsupported.\n");
         return false;
       }
     [...lots more checks...]

     // All checks passed:
     return true;

we can capture the cause of the failure via:

     if (!check_something ())
       {
         if (dump_enabled_p ())
           dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
                            "foo is unsupported.\n");
         return opt_result::failure ("foo is unsupported",
                                     stmt);
       }
     [...lots more checks...]

     // All checks passed:
     return opt_result::success ();

which effectively returns true or false, whilst recording any problem.

opt_result::success and opt_result::failure return opt_result values
which "looks like" true/false respectively, via operator bool().
If dump_enabled_p, then opt_result::failure also creates an opt_problem *,
capturing the pertinent data (here, "foo is unsupported" and "stmt").
If dumps are disabled, then opt_problem instances aren't
created, and it's equivalent to just returning a bool (false for failure).

The opt_problem can be propagated via opt_result values back up
the call stack to where it makes most sense to the user.
For instance, rather than:

     bool ok = try_something_that_might_fail ();
     if (!ok)
       {
         if (dump_enabled_p ())
           dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
                            "some message.\n");
         return false;
       }

we can replace the bool with an opt_result, so if dump_enabled_p, we
assume that if try_something_that_might_fail, an opt_problem * will be
created, and we can propagate it up the call chain:

     opt_result ok = try_something_that_might_fail ();
     if (!ok)
       {
         if (dump_enabled_p ())
           dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
                            "some message.\n");
         return ok; // propagating the opt_result
       }

There's currently a way to implicitly construct an opt_result from a
bool, but this is scaffolding: commenting it out allows the compiler to
tell us where we need to capture failure information.

As well as opt_result for "bool", there's a template for wrapping pointers
where non-NULL is "success" and NULL "failure", so that:

      loop_vinfo = vect_analyze_loop_form (loop, shared);
      if (!loop_vinfo)
        {
          if (dump_enabled_p ())
            dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
                             "bad loop form.\n");
          return NULL;
        }

can simply become:

      loop_vinfo = vect_analyze_loop_form (loop, shared);
      if (!loop_vinfo)
        {
          if (dump_enabled_p ())
            dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
                             "bad loop form.\n");
          return loop_vinfo;
        }

where the "loop_vinfo" is now an opt_wrapper<loop_vec_info> and can
capture exactly what went wrong inside vect_analyze_loop_form.

In all cases, the classes act as if the opt_problem were one of its
fields, but the opt_problem is actually stored in a global, so that when
compiled, an opt_wrapper<T> is effectively just a T, so that we're
still just passing e.g. a bool around; the opt_wrapper<T> classes
simply provide type-checking and an API to ensure that we provide
error-messages deep in the callstack at the places where problems
occur, and that we propagate them.  This also avoids having
to manage the ownership of the opt_problem instances.

Using opt_result and opt_wrapper<T> documents the intent of the code
for the places where we represent success values, and allows the C++ type
system to track where the deepest points in the callstack are where we
need to emit the failure messages from.

Some examples, showing how it immediately leads to more meaningful
diagnostics from the vectorizer:

gcc.dg/vect/vect-widen-mult-1.c: In function ‘f’:
gcc.dg/vect/vect-widen-mult-1.c:16:3: remark: loop vectorized using 16 byte 
vectors
   for (__INTPTR_TYPE__ i = 0; i < N; ++i)
   ^~~
gcc.dg/vect/vect-widen-mult-1.c: In function ‘main’:
gcc.dg/vect/vect-widen-mult-1.c:42:10: remark: couldn't vectorize loop
     if (a[i] != (SIGNEDNESS_1 short) ((BASE + i * 5)
         ~^~~
gcc.dg/vect/vect-widen-mult-1.c:42:10: note: control flow in loop 
[../../src/gcc/tree-vect-loop.c:1200:vect_analyze_loop_form_1]
gcc.dg/vect/vect-widen-mult-1.c:34:3: remark: couldn't vectorize loop
   for (int i = 0; i < N; ++i)
   ^~~
gcc.dg/vect/vect-widen-mult-1.c:38:7: note: statement clobbers memory: ‘__asm__ 
__volatile__("" :  :  : "memory");’ 
[../../src/gcc/tree-data-ref.c:5086:find_data_references_in_stmt]
       asm volatile ("" ::: "memory");
       ^~~

by showing the location of the loop that can't be vectorized,
the problematic statement within the loop, and describing what the
problem is.

Note how it also captures the location of where in GCC the problem was
encountered.  Ultimately this would also show hotness information for the
loop in the remark.

I'm not sure exactly what an opt_problem should capture.  There are a few
places in the kit where it looks the "failure" calls might benefit from
being a formatted print API, with format codes for various middle-end
entities (e.g. gimple, tree, --param values, etc).  It's also not clear
to me how this would interact with the optinfo work (e.g. for capturing
things in optimization records), or what the API should look like (though
I like the one in the patch).

(Not yet bootstrapped or regrtested, and has plenty of FIXMEs: I'm posting
this for comment/discussion).

Thoughts?
Does the basic idea look useful and sane? (I think so)

Thanks
Dave

[...snip...]

David Malcolm (5):
  Add opt-problem.h
  Use opt-problem.h in try_vectorize_loop_1
  Add some test coverage
  Use opt_result throughout vectorizer
  Add opt-problem.cc

 gcc/Makefile.in                                |   1 +
 gcc/opt-problem.cc                             |  96 ++++++++
 gcc/opt-problem.h                              | 326 +++++++++++++++++++++++++
 gcc/testsuite/gcc.dg/vect/vect-alias-check-4.c |  11 +-
 gcc/testsuite/gcc.dg/vect/vect-remarks-1.c     |  18 ++
 gcc/tree-data-ref.c                            |  33 +--
 gcc/tree-data-ref.h                            |  10 +-
 gcc/tree-vect-data-refs.c                      | 189 ++++++++------
 gcc/tree-vect-loop.c                           | 212 +++++++++-------
 gcc/tree-vect-slp.c                            |   4 +-
 gcc/tree-vect-stmts.c                          | 130 ++++++----
 gcc/tree-vectorizer.c                          |  34 ++-
 gcc/tree-vectorizer.h                          |  41 ++--
 13 files changed, 842 insertions(+), 263 deletions(-)
 create mode 100644 gcc/opt-problem.cc
 create mode 100644 gcc/opt-problem.h
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-remarks-1.c

-- 
1.8.5.3

Reply via email to