On June 1, 2018 3:40:15 PM GMT+02:00, David Malcolm <dmalc...@redhat.com> wrote: >On Fri, 2018-06-01 at 11:50 +0200, Richard Biener wrote: >> On Tue, May 29, 2018 at 10:33 PM David Malcolm <dmalc...@redhat.com> >> wrote: >> > >> > This was an experiment to try to capture information on a >> > loop optimization. >> > >> > gcc/ChangeLog: >> > * gimple-loop-interchange.cc (should_interchange_loops): >> > Add >> > optinfo note when interchange gives better data locality >> > behavior. >> > (tree_loop_interchange::interchange): Add OPTINFO_SCOPE. >> > Add optinfo for successful and unsuccessful interchanges. >> > (prepare_perfect_loop_nest): Add OPTINFO_SCOPE. Add >> > optinfo note. >> > (pass_linterchange::execute): Add OPTINFO_SCOPE. >> > --- >> > gcc/gimple-loop-interchange.cc | 36 >> > +++++++++++++++++++++++++++++++++++- >> > 1 file changed, 35 insertions(+), 1 deletion(-) >> > >> > diff --git a/gcc/gimple-loop-interchange.cc b/gcc/gimple-loop- >> > interchange.cc >> > index eb35263..cd32288 100644 >> > --- a/gcc/gimple-loop-interchange.cc >> > +++ b/gcc/gimple-loop-interchange.cc >> > @@ -1556,7 +1556,19 @@ should_interchange_loops (unsigned i_idx, >> > unsigned o_idx, >> > ratio = innermost_loops_p ? INNER_STRIDE_RATIO : >> > OUTER_STRIDE_RATIO; >> > /* Do interchange if it gives better data locality behavior. */ >> > if (wi::gtu_p (iloop_strides, wi::mul (oloop_strides, ratio))) >> > - return true; >> > + { >> > + if (optinfo_enabled_p ()) >> > + OPTINFO_NOTE ((gimple *)NULL) // FIXME >> > + << "interchange gives better data locality behavior: " >> > + << "iloop_strides: " >> > + << decu (iloop_strides) >> > + << " > (oloop_strides: " >> > + << decu (oloop_strides) >> > + << " * ratio: " >> > + << decu (ratio) >> > + << ")"; >> >> Just randomly inside the thread. >> >> NOOOOOOOOOO! >> >> :/ > >> Please do _not_ add more stream-like APIs. How do you expect >> translators to deal with those? >> >> Yes, I'm aware of the graphite-* ones and I dislike those very much. >> >> What's wrong with the existing dump API? > >The existing API suffers from a "wall of text" problem: > >* although it's possible to filter based on various criteria (the >optgroup tags, specific passes, success vs failure), it's not possible >to filter base on code hotness: the -fopt-info API works purely in >terms of location_t. So all of the messages about the hottest >functions in the workload are intermingled with all of the other >messages about all of the other functions.
True >* some of the text notes refer to function entry, but all of these are >emitted "at the same level": there's no way to see the nesting of these >function-entry logs, and where other notes are in relation to them. >For example, in: > > test.c:8:3: note: === analyzing loop === > test.c:8:3: note: === analyze_loop_nest === > test.c:8:3: note: === vect_analyze_loop_form === > test.c:8:3: note: === get_loop_niters === >test.c:8:3: note: symbolic number of iterations is (unsigned int) >n_9(D) >test.c:8:3: note: not vectorized: loop contains function calls or data >references that cannot be analyzed > test.c:8:3: note: vectorized 0 loops in function > >there's no way to tell that the "vect_analyze_loop_form" is in fact >inside the call to "analyze_loop_nest", and where the "symbolic number >of iterations" messages is coming from in relation to them. This may >not seem significant here, but I'm quoting a small example; >vectorization typically leads to dozens of messages, with a deep >nesting structure (where that structure isn't visible in the -fopt-info > >output). True. The same issue exists for diagnostics BTW. Indeed, being able to collapse 'sections' in dump files, opt-info or diagnostics sounds useful. Note that for dump files and opt-info the level argument was sort of designed to do that. > >The existing API is throwing data away: > >* as noted above, by working purely with a location_t, the execution >count isn't associated with the messages. The output format purely >gives file/line/column information, but doesn't cover the inlining >chain. For C++ templates it doesn't specify which instance of a >template is being optimized. It might be useful to enhance the interface by allowing different kind of 'locations'. >* there's no metadata about where the messages are coming from. It's >easy to get at the current pass internally, but the messages don't >capture that. Figuring out where a message came from requires grepping >the GCC source code. The prototype I posted captures the __FILE__ and >__LINE__ within the gcc source for every message emitted, and which >pass instance emitted it. The opt info group was supposed to captures this to the level interesting for a user. >* The current output format is of the form: > "FILE:LINE:COLUMN: free-form text\n" >This is only machine-readable up to a point: if a program is parsing >it, all it has is the free-form text. The prototype I posted captures >what kinds of things are in the text (statements, trees, symtab_nodes), >and captures location information for them, so that visualizations of >the dumps can provide useful links. > >There's no API-level grouping of messages, beyond looking for newline >characters. > >I'm probably re-hashing a lot of the material in the cover letter here: >"[PATCH 00/10] RFC: Prototype of compiler-assisted performance >analysis" > https://gcc.gnu.org/ml/gcc-patches/2018-05/msg01675.html > > >I'd like to provide a machine-readable output format that covers the >above - in particular the profile data (whilst retaining -fopt-info for >compatibility). Separation of the data from its presentation. > >Clearly you don't like the stream-like API from the prototype :) Yes :) I wasn't so much complaining about the content but the presentation /API. >So I'm wondering what the API ought to look like, one that would allow >for the kind of "richer" machine-readable output. > >Consider this "-fopt-info" code (from "vect_create_data_ref_ptr"; this >is example 2 from the cover-letter): > > if (dump_enabled_p ()) > { > tree dr_base_type = TREE_TYPE (DR_BASE_OBJECT (dr)); > dump_printf_loc (MSG_NOTE, vect_location, > "create %s-pointer variable to type: ", > get_tree_code_name (TREE_CODE (aggr_type))); > dump_generic_expr (MSG_NOTE, TDF_SLIM, aggr_type); > if (TREE_CODE (dr_base_type) == ARRAY_TYPE) > dump_printf (MSG_NOTE, " vectorizing an array ref: "); > else if (TREE_CODE (dr_base_type) == VECTOR_TYPE) > dump_printf (MSG_NOTE, " vectorizing a vector ref: "); > else if (TREE_CODE (dr_base_type) == RECORD_TYPE) > dump_printf (MSG_NOTE, " vectorizing a record based array ref: "); > else > dump_printf (MSG_NOTE, " vectorizing a pointer ref: "); > dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_BASE_OBJECT (dr)); > dump_printf (MSG_NOTE, "\n"); > } > >where the information is built up piecewise, with conditional logic. >(This existing code isn't particularly amenable to translation either). > >One option would be for "vect_location" to become something other than >a location_t, from which a execution count can be gained (e.g. a gimple >*, from which the location_t and the BB can be accessed). Yes. Like a container that can be initiated from other kind of contexts. Then >"dump_printf_loc" would signify starting an optimization record, and >the final "\n" would signify the end of an optimization record; the >various dump_generic_expr and dump_printf would add structured >information and text entries to theoptimization record. A push/pop style API would maybe work as well. (pushing a level with some meta data) >This has the advantage of retaining the look of the existing API (so >the existing callsites don't need changing), but it changes their >meaning, so that as well as writing to -fopt-info's destinations, it's >also in a sense parsing the dump_* calls and building optimization >records from them. > >AIUI, dump_printf_loc can't be a macro, as it's variadic, so this >approach doesn't allow for capturing the location within GCC for where >the message is emitted. True, though we have __builtin_FILE and friends that can be used as default args. Note a push/pop level API can also buffer intermediate printf style output and annotate/indent it (supporting a dump file emacs/vim mode that can do collapse and expand) >Another approach: there could be something like: > > if (optinfo_enabled_p ()) > { > tree dr_base_type = TREE_TYPE (DR_BASE_OBJECT (dr)); > OPTINFO_VECT_NOTE note; > note.printf ("create %s-pointer variable to type: ", > get_tree_code_name (TREE_CODE (aggr_type))); > note.add_slim (aggr_type); > if (TREE_CODE (dr_base_type) == ARRAY_TYPE) > note.printf (" vectorizing an array ref: "); > else if (TREE_CODE (dr_base_type) == VECTOR_TYPE) > note.printf (MSG_NOTE, " vectorizing a vector ref: "); > else if (TREE_CODE (dr_base_type) == RECORD_TYPE) > note.printf (MSG_NOTE, " vectorizing a record based array ref: "); > else > note.printf (MSG_NOTE, " vectorizing a pointer ref: "); > note.add_slim (DR_BASE_OBJECT (dr)); > } > >which uses a macro to get the gcc source location, and avoids the >special meaning for "\n". This avoids the "<<" - but is kind of just >different syntax for the same - it's hard with this example to avoid >it. Maybe the following raii style that encapsulates the enabling /disabling checks? If (optinfo o = push (msg_optimization,...)) { O.print (...) ; Destructor of o 'pops' } > >Yet another approach: reworking things to support i18n via a pure >printf-style interface, using, say "%S" to mean "TDF_SLIM", it could be >like: > > if (optinfo_enabled_p ()) > { > tree dr_base_type = TREE_TYPE (DR_BASE_OBJECT (dr)); > if (TREE_CODE (dr_base_type) == ARRAY_TYPE) > OPTINFO_VECT_NOTE ("create %s-pointer variable to type: %S" > " vectorizing an array ref: %S", > get_tree_code_name (TREE_CODE (aggr_type)) > aggr_type, > DR_BASE_OBJECT (dr)); > else if (TREE_CODE (dr_base_type) == VECTOR_TYPE) > OPTINFO_VECT_NOTE ("create %s-pointer variable to type: %S" > " vectorizing a vector ref: %S", > get_tree_code_name (TREE_CODE (aggr_type)) > aggr_type, > DR_BASE_OBJECT (dr)); > else if (TREE_CODE (dr_base_type) == RECORD_TYPE) > OPTINFO_VECT_NOTE ("create %s-pointer variable to type: %S" > " vectorizing a record based array ref: %S", > get_tree_code_name (TREE_CODE (aggr_type)) > aggr_type, > DR_BASE_OBJECT (dr)); > else > OPTINFO_VECT_NOTE ("create %s-pointer variable to type: %S" > " vectorizing a pointer ref: %S", > get_tree_code_name (TREE_CODE (aggr_type)) > aggr_type, > DR_BASE_OBJECT (dr)); > } > >or somesuch. The %S would allow for the data to be captured when >emitting optimization records for the messages (not just plain text, >e.g. locations of the expressions). Certainly when exposing things in opt-info we have to be more disciplined. The vectorizer is a bad example here. The original goal of having one set of outputs for both dump files and opt-info is good but I guess the result is less than optimal. Maybe additional detail levels would help here (MSG_Dumpfile_only?) >So those are some ideas. I'm sure there are other ways to fix the >issues with -fopt-info; I'm brainstorming here. Likewise. As said I applaud the attempt improve the situation but I detest a stream API ;) Richard. >[...snip...] > >Thoughts? > >Thanks >Dave