Re: [PATCH 3/5] IPA ICF pass

Jeff Law Tue, 24 Jun 2014 13:32:41 -0700

On 06/13/14 04:44, mliska wrote:

Hello,
    this is core of IPA ICF patchset. It adds new pass and registers all needed 
stuff related to a newly introduced interprocedural optimization.


Algorithm description:
   In LGEN, we visit all read-only variables and functions. For each symbol, a 
hash value based on e.g. number of arguments,
   number of BB, GIMPLE CODES is computed (similar hash is computed for 
read-only variables). This kind of information is streamed
   for LTO.

   In WPA, we build congruence classes for all symbols having a same hash 
value. For functions, these classes are subdivided in WPA by argument type 
comparison. Each reference (a call or a variable reference) to another semantic 
item candidate is marked and stored for further congruence class reduction 
(similar algorithm as Value Numbering:  
www.cs.ucr.edu/~gupta/teaching/553-07/Papers/value.pdf).

   For every congruence class of functions with more than one semantic 
function, we load function body. Having this information, we can
   process complete semantic function equality and subdivide such congruence 
class. Read-only variable class members are also deeply compared.

   After that, we process Value numbering algorithm to do a final subdivision. 
Finally, all items belonging to a congruence class with more than one
   item are merged.

Martin

Changelog:

2014-06-13  Martin Liska  <mli...@suse.cz>
            Jan Hubicka  <hubi...@ucw.cz>

        * Makefile.in: New pass object file added.
        * common.opt: New -fipa-icf flag introduced.
        * doc/invoke.texi: Documentation enhanced for the pass.
        * lto-section-in.c: New LTO section for a summary created by IPA-ICF.
        * lto-streamer.h: New section name introduced.
        * opts.c: Optimization is added to -O2.
        * passes.def: New pass added.
        * timevar.def: New time var for IPA-ICF.
        * tree-pass.h: Pass construction function.
        * ipa-icf.h: New pass header file added.
        * ipa-icf.c: New pass source file added.

You'll note many of my comments are "do you need to ...". You may infact be handling that stuff correctly, they're just things I'd like youto verify are properly handled. If they're properly handled just say so :-)

At a high level, I think this needs to be broken down a bit more. We'vegot two high level concepts in ipa-icf. One is all the equivalencetesting the other is using that information for the icf optimization.

Splitting out the equivalence testing seems like a good thing to do asthere's other contexts where it would be useful.

Overall I think you're on the right path and we just need to iterate abit on this part of the patchset.


@@ -7862,6 +7863,14 @@ it may significantly increase code size
  (see @option{--param ipcp-unit-growth=@var{value}}).
  This flag is enabled by default at @option{-O3}.

+@item -fipa-icf
+@opindex fipa-icf
+Perform Identical Code Folding for functions and read-only variables.
+Behavior is similar to Gold Linker ICF optimization. Symbols proved
+as semantically equivalent are redirected to corresponding symbol. The pass
+sensitively decides for usage of alias, thunk or local redirection.
+This flag is enabled by default at @option{-O2}.

So you've added this at -O2, what is the general compile-time impact?Would it make more sense to instead have it be part of -O3, particularlysince ICF is rarely going to improve performance (sans icache issues).

+
+/* Interprocedural Identical Code Folding for functions and
+   read-only variables.
+
+   The goal of this transformation is to discover functions and read-only
+   variables which do have exactly the same semantics.
+
+   In case of functions,
+   we could either create a virtual clone or do a simple function wrapper
+   that will call equivalent function. If the function is just locally visible,
+   all function calls can be redirected. For read-only variables, we create
+   aliases if possible.
+
+   Optimization pass arranges as follows:
+   1) All functions and read-only variables are visited and internal
+      data structure, either sem_function or sem_variables is created.
+   2) For every symbol from the previoues step, VAR_DECL and FUNCTION_DECL are
+      saved and matched to corresponding sem_items.

s/previoues/previous/

+   3) These declaration are ignored for equality check and are solved
+      by Value Numbering algorithm published by Alpert, Zadeck in 1992.
+   4) We compute hash value for each symbol.
+   5) Congruence classes are created based on hash value. If hash value are
+      equal, equals function is called and symbols are deeply compared.
+      We must prove that all SSA names, declarations and other items
+      correspond.
+   6) Value Numbering is executed for these classes. At the end of the process
+      all symbol members in remaining classes can be mrged.

s/mrged/merged.

+   7) Merge operation creates alias in case of read-only variables. For
+      callgraph node, we must decide if we can redirect local calls,
+      create an alias or a thunk.

Presumably that's the order in which we try to resolve identicalfunctions (first by redirection if it's local, then an alias, then a thunk)?

Is the code conditionalized so that it works properly on targets wherewe can't create aliases? Do we have any such targets that can be easilytested these days? Has this been tested on anything other than x86-64linux? AIX immediately comes to mind as an interesting testing target.


+
+namespace {
+
+func_checker::func_checker (): initialized (false)
+{
+}
+
+/* Itializes internal structures according to given number of

s/Itializes/Initializes/

+   source and target SSA names. The number of source names is SSA_SOURCE,
+   respectively SSA_TARGET.  */
+
+void
+func_checker::initialize (unsigned ssa_source, unsigned ssa_target)
+{
+  release ();
+  initialized = true;
+
+  source_ssa_names.create (ssa_source);
+  target_ssa_names.create (ssa_target);
+
+  for (unsigned int i = 0; i < ssa_source; i++)
+    source_ssa_names.safe_push (-1);
+
+  for (unsigned int i = 0; i < ssa_target; i++)
+    target_ssa_names.safe_push (-1);
+
+  edge_map = new pointer_map <edge> ();
+
+  decl_map = new pointer_map <tree> ();
+}
+
+/* Memory release routine.  */
+
+void
+func_checker::release (void)
+{
+  if (!initialized)
+    return;
+
+  delete edge_map;
+  delete decl_map;
+  source_ssa_names.release();
+  target_ssa_names.release();
+}

So I'm a big fan of RAII style of writing code. While I primarily likeit for locking, I also find it's a good model for memory management.Is there any reason these two functions aren't a suitable ctor/dtor?

+
+/* Verifies that trees T1 and T2 do correspond.  */
+
+bool
+func_checker::compare_ssa_name (tree t1, tree t2)
+{
+  unsigned i1 = SSA_NAME_VERSION (t1);
+  unsigned i2 = SSA_NAME_VERSION (t2);
+
+  if (source_ssa_names[i1] == -1)
+    source_ssa_names[i1] = i2;
+  else if (source_ssa_names[i1] != (int) i2)
+    return false;
+
+  if(target_ssa_names[i2] == -1)
+    target_ssa_names[i2] = i1;
+  else if (target_ssa_names[i2] != (int) i1)
+    return false;
+
+  return true;
+}

Isn't this really checking for equivalence? "do correspond" seemsawkward here.

+
+/* Verification function for edges E1 and E2.  */
+
+bool
+func_checker::compare_edge (edge e1, edge e2)
+{
+  if (e1->flags != e2->flags)
+    return false;

Presumably there's no flags we can safely ignore. So absolute equalityseems reasonable here.

+/* Congruence class constructor for a new class with _ID.  */
+
+congruence_class::congruence_class (unsigned int _id): id(_id)
+{
+  members.create (2);
+}

Is there a dtor which releases this object?

+
+void
+sem_item::setup (bitmap_obstack *stack)
+{
+  gcc_checking_assert (node);
+
+  refs.create (0);
+  tree_refs.create (0);
+  usages.create (0);
+  tree_refs_set = pointer_set_create ();
+  usage_index_bitmap = BITMAP_ALLOC (stack);
+}
+
+sem_item::~sem_item ()
+{
+  if (tree_refs_set)
+    pointer_set_destroy (tree_refs_set);
+
+  for (unsigned i = 0; i < usages.length (); i++)
+    delete usages[i];
+}

Do you need to release refs, tree_refs or the bitmap?

+
+/* Compare two types if are same aliases in case of strict aliasing
+   is enabled.  */
+bool
+sem_item::compare_for_aliasing (tree t1, tree t2)
+{
+  if (flag_strict_aliasing)
+    {
+      alias_set_type s1 = get_deref_alias_set (TREE_TYPE (t1));
+      alias_set_type s2 = get_deref_alias_set (TREE_TYPE (t2));
+
+      return s1 == s2;
+    }
+
+  return true;
+}

Is returning TRUE really the conservatively correct thing to do in theabsence of aliasing information? Isn't that case really "I don't know"in which case the proper return value is FALSE?

+
+/* Semantic function constructor that uses STACK as bitmap memory stack.  */
+
+sem_function::sem_function (bitmap_obstack *stack): sem_item (FUNC, stack),
+  compared_func (NULL)
+{
+  arg_types.create (0);

Does this need to be released?

+
+/* Gets symbol name of the item.  */
+
+const char *
+sem_function::name (void)
+{
+  return node->name ();
+}
+
+/* Gets assembler name of the item.  */
+
+const char *
+sem_function::asm_name (void)
+{
+  return node->asm_name ();
+}

Are these trivial enough that they should be inlined in the classdefinition? As much as I dislike that style of programming, these seemlike classic candidates unless I'm missing something.

+/* References independent hash function.  */
+
+hashval_t
+sem_function::get_hash (void)
+{
+  if(!hash)
+    {
+      hash = 177454; /* Random number for function type.  */
+
+      hash = iterative_hash_object (arg_count, hash);
+      hash = iterative_hash_object (bb_count, hash);
+      hash = iterative_hash_object (edge_count, hash);
+      hash = iterative_hash_object (cfg_checksum, hash);

Does CFG_CHECKSUM encompass the bb/edge counts?

+}

+
+/* Fast equality function based on knowledge known in WPA.  */
+
+bool
+sem_function::equals_wpa (sem_item *item)
+{
+  if (item->type != FUNC)
+    return false;

Can this ever happen?  Either remove to turn into an assert.

+/* Processes function equality comparison.  */
+
+bool
+sem_function::equals_private (sem_item *item)
+{
+  if (item->type != FUNC)
+    return false;
+
+  basic_block bb1, bb2;
+  edge e1, e2;
+  edge_iterator ei1, ei2;
+  int *bb_dict = NULL;
+  bool result = true;
+  tree arg1, arg2;
+
+  compared_func = static_cast<sem_function *> (item);
+
+  gcc_assert (decl != item->decl);
+
+  if (arg_count != compared_func->arg_count
+      || bb_count != compared_func->bb_count
+      || edge_count != compared_func->edge_count
+      || cfg_checksum != compared_func->cfg_checksum)

Again, check if testing the bb/edge counts aren't necessary as we checkthe cfg checksum.

+    SE_EXIT_FALSE();
+
+  if (!equals_wpa (item))
+    return false;
+
+  /* Checking function arguments.  */
+  tree decl1 = DECL_ATTRIBUTES (decl);
+  tree decl2 = DECL_ATTRIBUTES (compared_func->decl);

So are there any attributes we can safely ignore? Probably not.However, we ought to handle the case where the attributes appear indifferent orders.

Also makes me wonder does the comparison of arguments/return valueverify their attributes as well?

+
+/* Returns cgraph_node.  */
+
+struct cgraph_node *
+sem_function::get_node (void)
+{
+  return cgraph (node);
+}
+
+/* Initialize semantic item by info reachable during LTO WPA phase.  */
+
+void
+sem_function::init_wpa (void)
+{
+  parse_tree_args ();
+}

inline? Worth or not worth the headache?

+
+bool
+sem_function::compare_bb (sem_bb_t *bb1, sem_bb_t *bb2, tree func1, tree func2)

So this routine walks down the gimple statements and compares them forequality. Would it make sense to have the equality testing in gimple?That way if someone adds a new gimple code the places they need tocheck/update are at least somewhat more localized?

+
+      for (i = 0; i < size1; ++i)
+       {
+         t1 = gimple_phi_arg (phi1, i)->def;
+         t2 = gimple_phi_arg (phi2, i)->def;
+
+         if (!compare_operand (t1, t2, func1, func2))
+           SE_EXIT_FALSE ();
+
+         e1 = gimple_phi_arg_edge (phi1, i);
+         e2 = gimple_phi_arg_edge (phi2, i);
+
+         if (!checker.compare_edge (e1, e2))
+           SE_EXIT_FALSE ();
+       }

I don't think we guarantee any particular order on the PHI args. ISTMyou'd want to sort them or something so as not to reject a possibleduplicate simply because of ordering issues.

Related, I'm not sure bb indexes are even guaranteed to have any stableordering. So ISTM you'd want to do something like a DFS walk to set aindex for each block, then sort the PHI arguments based on DFS index toget a stable, consistent check here.

+/* For given basic blocks BB1 and BB2 (from functions FUNC1 and FUNC),
+   true value is returned if exception handling regions are equivalent
+   in these blocks.  */
+
+bool
+sem_function::compare_eh_region (eh_region r1, eh_region r2, tree func1,
+                                tree func2)

Similarly, are EH region indexes stable enough to check here?

+
+void
+sem_function::gsi_next_nondebug_stmt (gimple_stmt_iterator &gsi)
+{
+  gimple s;
+
+  s = gsi_stmt (gsi);
+
+  while (gimple_code (s) == GIMPLE_DEBUG)
+    {
+      gsi_next (&gsi);
+      gcc_assert (!gsi_end_p (gsi));
+
+      s = gsi_stmt (gsi);
+    }
+}

Should this be elsewhere? Seems like the concept of asking for the nextnon-debug statement happens often.

+
+/* Iterates GSI statement iterator to the next non-virtual statement.  */
+
+void
+sem_function::gsi_next_nonvirtual_phi (gimple_stmt_iterator &it)
+{
+  gimple phi;
+
+  if (gsi_end_p (it))
+    return;
+
+  phi = gsi_stmt (it);
+  gcc_assert (phi != NULL);
+
+  while (virtual_operand_p (gimple_phi_result (phi)))
+    {
+      gsi_next (&it);
+
+      if (gsi_end_p (it))
+       return;
+
+      phi = gsi_stmt (it);
+    }
+}

Similarly.

+
+/* Verifies that trees T1 and T2 do correspond.  */

Again, don't like the "do correspond" terminology. Seems like we'retesting for equivalence.

A lot of the sem_function::compare_XXX feel like they should beelsewhere, possibly their own module. I could easily see reusing themfor other purposes.

There's a number of algorithms that want to do the same kind of blockhashing & equality testing that you're doing here in ICF. Maybe pullthem into their own file/api?

+/* Gets symbol name of the item.  */
+
+const char *
+sem_variable::name (void)
+{
+  return node->name ();
+}
+
+/* Gets assembler name of the item.  */
+
+const char *
+sem_variable::asm_name (void)
+{
+  return node->asm_name ();
+}

Inline the accessors?

I'm starting to gloss over things.... It feels like we've got too muchstuff in this one file. Breaking things out would help (like forexample the hashing/equality bits). I'd like to see things broken out abit and reposted for further reviewing.


Jeff

Re: [PATCH 3/5] IPA ICF pass

Reply via email to