On Thu, Sep 16, 2021 at 8:13 PM Di Zhao OS
<diz...@os.amperecomputing.com> wrote:
>
> Sorry about updating on this after so long. It took me much time to work out a
> new plan and pass the tests.
>
> The new idea is to use one variable to represent a set of equal variables at
> some basic-block. This variable is called a "equivalence head" or "equiv-head"
> in the code. (There's no-longer a "equivalence map".)
>
> - Initially an SSA_NAME's "equivalence head" is its value number. Temporary
>   equivalence heads are recorded as unary NOP_EXPR results in the vn_nary_op_t
>   map. Besides, when inserting into vn_nary_op_t map, make the new result at
>   front of the vn_pval list, so that when searching for a variable's
>   equivalence head, the first result represents the largest equivalence set at
>   current location.
> - In vn_ssa_aux_t, maintain a list of references to valid_info->nary entry.
>   For recorded equivalences, the reference is result->entry; for normal N-ary
>   operations, the reference is operand->entry.
> - When recording equivalences, if one side A is constant or has more refs, 
> make
>   it the new equivalence head of the other side B. Traverse B's ref-list, if a
>   variable C's previous equiv-head is B, update to A. And re-insert B's n-ary
>   operations by replacing B with A.
> - When inserting and looking for the results of n-ary operations, insert and
>   lookup by the operands' equiv-heads.
>
> So except for the refs in vn_ssa_aux_t, this scheme uses the original
> infrastructure to its best. Quadric search time is avoided at the cost of some
> re-insertions. Test results on SPEC2017 intrate (counts and percentages):
>
>                 |more bb |more bb |more stmt|more stmt|more       |more
>                 |removed |removed |removed  |removed  |nv_nary_ops|nv_nary_ops
>                 |at fre1 |at fre1 |at fre1  |at fre1  |inserted   |inserted
> ------------------------------------------------------------------------------
>  500.perlbench_r| 64     | 1.98%  | 103     | 0.19%   | 11260     | 12.16%
>  502.gcc_r      | 671    | 4.80%  | 317     | 0.23%   | 13964     | 6.09%
>  505.mcf_r      | 5      | 35.71% | 9       | 1.40%   | 32        | 2.52%
>  520.omnetpp    | 132    | 5.45%  | 39      | 0.11%   | 1895      | 3.91%
>  523.xalancbmk_r| 238    | 3.26%  | 313     | 0.36%   | 1417      | 1.27%
>  525.x264_r     | 4      | 1.36%  | 27      | 0.11%   | 1752      | 6.78%
>  531.deepsjeng_r| 1      | 3.45%  | 2       | 0.14%   | 228       | 8.67%
>  541.leela_r    | 2      | 0.63%  | 0       | 0.00%   | 92        | 1.14%
>  548.exchange2_r| 0      | 0.00%  | 3       | 0.04%   | 49        | 1.03%
>  557.xz_r       | 0      | 0.00%  | 3       | 0.07%   | 272       | 7.55%
>
> There're more basic_blocks and statements removed compared with last
> implementation, the reasons are:
> 1) "CONST op CONST" simplification is included. It is missed in previous 
> patch.
> 2) By inserting RHS of statements on equiv-heads, more N-ary operations can be
>    simplified. One example is in 'ssa-fre-97.c' in the patch file.
>
> While jump-threading & vrp also utilize temporary equivalences (so some of the
> newly removed blocks and statements can also be covered by them), I think this
> patch is a supplement, in cases when jump threading cannot take place (the
> original example), or value number info needs to be involved (the
> 'ssa-fre-97.c' example).
>
> Fixed the former issue with non-iterate mode.
>
> About recording the temporary equivalences generated by PHIs (i.e. the
> 'record_equiv_from_previous_cond' stuff), I have to admit it looks strange and
> the code size is large, but I haven't find a better way yet. Consider a piece
> of CFG like the one below, if we want to record x==x2 on the true edge when
> processing bb1, the location (following current practice) will be bb2. But 
> that
> is not useful at bb5 or bb6, because bb2 doesn't dominate them. And I can't
> find a place to record x==x1 when processing bb1.
> If we can record things on edges rather than blocks, say x==x1 on 1->3 and
> x==x2 on 1->2, then perhaps with an extra check for "a!=0", x2 can be a valid
> equiv-head for x since bb5. But I think it lacks efficiency and is not
> persuasive. It is more efficient to find a valid previous predicate when
> processing bb4, because the vn_nary_op_t will be fetched anyway.
>         --------------
>         | if (a != 0) | bb1
>         --------------
>         f |      \ t
>           |    -------
>           |    | bb2 |
>           |    -------
>           |      /
>     -------------------------
>     | x = PHI<x1(1), x2(2)> | bb3
>     -------------------------
>                |
>               ....
>                |
>        --------------
>        | if (a != 0) | bb4
>        --------------
>            |f     \t
>     -------------  -------
> bb7 | where     |  | bb5 |  ==> where "x==x2" is recorded now
>     | "x==x1" is|  -------
>     | recorded  |    \
>     | now       |    ...
>     -------------     |
>                    -------
>                    | bb6 |  ==> where "x==x2" needs to be used
>                    -------
> Although I think I can remove the 'dominator_to_phi_map' and generalize this a
> little, but the major logic will be similar. So I left this unchanged for now,
> and would like to hear your suggestions on the new scheme overall.
>
> Thanks,
> Di Zhao
>
> --------
> Extend FRE with temporary equivalences.

Comments on the patch:

+  /* nary_ref count.  */
+  unsigned num_nary_ref;
+

I think a unsigned short should be enough and that would nicely
pack after value_id together with the bitfield (maybe change that
to unsigned short :1 then).

@@ -7307,17 +7839,23 @@ process_bb (rpo_elim &avail, basic_block bb,
            tree val = gimple_simplify (gimple_cond_code (last),
                                        boolean_type_node, lhs, rhs,
                                        NULL, vn_valueize);
+           vn_nary_op_t vnresult = NULL;
            /* If the condition didn't simplfy see if we have recorded
               an expression from sofar taken edges.  */
            if (! val || TREE_CODE (val) != INTEGER_CST)
              {
-               vn_nary_op_t vnresult;

looks like you don't need vnresult outside of the if()?

+                 /* If no simplified result has been found, try to find one by
+                  * equivalences.  */

no * on the second line.

+/* Find predicated value of vn_nary_op by the operands' equivalences.  Return
+ * NULL_TREE if no known result is found.  */
+
+static tree
+find_predicated_value_by_equivs (vn_nary_op_t vno, basic_block bb,
+                                vn_nary_op_t *vnresult)
+{
+  lookup_equiv_heads (vno->length, vno->op, vno->op, bb);
+  tree result
+    = simplify_nary_op (vno->length, vno->opcode, vno->op, vno->type);

why is it necessary to simplify here?  It looks like the caller
already does this.
I wonder whether it's valid to always perform find_predicated_value_by_equivs
from inside vn_nary_op_get_predicated_value instead of bolting it to only
a single place?

+  if (result)
+    {
+      if (!useless_type_conversion_p (vno->type, TREE_TYPE (result)))
+       result = fold_convert (vno->type, result);
+      return result;
+    }
+  return vn_nary_op_lookup_1 (vno, vnresult);
 }

+/* Record the equivalence of LHS and RHS at basic_block BB.  */

I suppose E->dest instead of BB

+
+static vn_nary_op_t
+val_equiv_insert (tree op1, tree op2, edge e)
+{

+  if (is_gimple_min_invariant (lhs))
+    std::swap (lhs, rhs);
+  if (is_gimple_min_invariant (lhs) || TREE_CODE (lhs) != SSA_NAME)
+    /* Possible if invoked from record_equiv_from_previous_cond.  */
+    return NULL;

Better formulate all of the above in terms of only SSA_NAME checks since...

+  /* Make the hand-side with more recorded n-ary expressions new
+   * equivalence-head, to make fewer re-insertions.  */
+  if (TREE_CODE (rhs) == SSA_NAME
+      && VN_INFO (rhs)->num_nary_ref < VN_INFO (lhs)->num_nary_ref)
+    std::swap (lhs, rhs);

here LHS needs to be an SSA_NAME.

+  /* Record equivalence as unary NOP_EXPR.  */
+  vn_nary_op_t val
+    = vn_nary_op_insert_pieces_predicated_1 (1, NOP_EXPR, TREE_TYPE (lhs),
+                                            &lhs, rhs, 0, bb);

so you are recording an equivalence of lhs to rhs as (NOP_EXPR)lhs
with value rhs?
Presumably you think that's good enough to not overlap with entries
for "real" IL?
In particular why did you choose to _not_ use (2, EQ_EXPR, boolean_type_node,
&[lhs, rhs], boolean_true_node, ...) like I think what would be
inserted for by the
existing code in process_bb via vn_nary_op_insert_pieces_predicated?
It would probably felt more natural if the "new" code bolted on the case where
that is called with an equivalence?

Btw, I didn't find any condition that prevents equivalences being used for
non-integer types?  Are you sure this is all valid for FP compares without
further restrictions?

@@ -4224,36 +4404,345 @@ vn_nary_op_insert_pieces_predicated (unsigned
int length, enum tree_code code,
       fprintf (dump_file, " == %s\n",
               integer_zerop (result) ? "false" : "true");
     }
-  vn_nary_op_t vno1 = alloc_vn_nary_op (length, NULL_TREE, value_id);
-  init_vn_nary_op_from_pieces (vno1, length, code, type, ops);
-  vno1->predicated_values = 1;
-  vno1->u.values = (vn_pval *) obstack_alloc (&vn_tables_obstack,
-                                             sizeof (vn_pval));
-  vno1->u.values->next = NULL;
-  vno1->u.values->result = result;
-  vno1->u.values->n = 1;
-  vno1->u.values->valid_dominated_by_p[0] = pred_e->dest->index;
-  return vn_nary_op_insert_into (vno1, valid_info->nary, true);
+  /* Insert by operands' equiv-heads.  */
+  tree new_ops[3];
+  lookup_equiv_heads (length, ops, new_ops, pred_e->dest);
+  /* Skip if the result is known.  */
+  tree simplified = simplify_nary_op (length, code, new_ops, type);
+  if (simplified)
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+       {
+         fprintf (dump_file, "Result is known: ");
+         print_generic_expr (dump_file, simplified, TDF_NONE);
+         fprintf (dump_file, ", skipped.\n");
+       }
+      return NULL;
+    }

soo - how do the existing predicated expressions change to be handled here?
Why would they ever be simplified?  Should this simplification not better happen
upthread when visiting the conditional?  Maybe that needs to use the equivs
already?

You re-use the predicated value list which I think is great, can you explain
how "latest" is meant in the following?

@@ -4145,7 +4264,12 @@ vn_nary_op_insert_into (vn_nary_op_t vno,
vn_nary_op_table_type *table,
              next = &(*next)->next;
            }
          if (!found)
-           *next = nval;
+           {
+             /* Insert new value at the front, so that lookup_equiv_head can
+              * find the latest result.  */
+             nval->next = vno->u.values;
+             vno->u.values = nval;
+           }


OK, I have to stop now but will try to understand and review further next week.
Sorry for the delays but the patch is quite complex to understand :/

Richard.

> 2021-09-16  Di Zhao  <diz...@os.amperecomputing.com>
>
> gcc/ChangeLog:
>         PR tree-optimization/101186
>         * tree-ssa-sccvn.c (dominated_by_p_w_unex): Moved upward, no change.
>         (vn_nary_op_get_predicated_value): Moved upward, no change.
>         (is_vn_valid_at_bb): Check if vn_pval is valid at BB.
>         (lookup_equiv_head): Lookup the "equivalence head" of given node.
>         (lookup_equiv_heads): Lookup the "equivalence head"s of given nodes.
>         (vn_tracking_edge): Extracted utility function.
>         (init_vn_nary_op_from_stmt): Insert and lookup by "equivalence head"s.
>         (vn_nary_op_insert_into): Insert new value at the front.
>         (vn_nary_op_insert_pieces_predicated_1): Insert as predicated values
>         from pieces.
>         (simplify_nary_op): Try to simplify N-ary expressions.
>         (push_new_nary_ref): Insert a back-reference to vn_nary_op_t.
>         (vn_nary_op_insert_pieces_predicated): Insert by "equivalence head"s.
>         (is_relaxed_cond_code): Whether operation code is a relaxed condition
>         code derived from original code.
>         (branch_may_come_from_another): Whether there's a path from the one
>         true or false destination to another.
>         (val_equiv_insert): Record temporary equivalence.
>         (record_equiv_from_previous_cond): Record equivalences generated by
>         previous conditions on current BB's true and false edges.
>         (find_predicated_value_by_equivs): Find predicated value by the
>         operands' equivalences.
>         (record_dom_to_phi_bb): Record mappings from immediate dominator to
>         basic_block with PHIs.
>         (vn_phi_insert): Record mapping from the immediate dominator to a PHI
>         node.
>         (visit_nary_op): Add lookup previous results of N-ary operations by
>         equivalences.
>         (free_rpo_vn): Free dominator_to_phi_map.
>         (process_bb): Add lookup predicated values by equivalences.
>         (struct unwind_state): Unwind state of back-refs to vn_nary_op_t.
>         (do_unwind): Unwind the back-refs to vn_nary_op_t.
>         (do_rpo_vn): Update back-reference unwind state.
>         * tree-ssa-sccvn.h (struct nary_ref): hold a lists of references to 
> the
>         nary map entries.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.dg/tree-ssa/pr68619-2.c: Disable fre.
>         * gcc.dg/tree-ssa/pr71947-1.c: Disable fre.
>         * gcc.dg/tree-ssa/pr71947-2.c: Disable fre.
>         * gcc.dg/tree-ssa/pr71947-3.c: Disable fre.
>         * gcc.dg/tree-ssa/pr71947-5.c: Disable fre.
>         * gcc.dg/tree-ssa/pr71947-7.c: Disable fre.
>         * gcc.dg/tree-ssa/pr71947-8.c: Disable fre.
>         * gcc.dg/tree-ssa/pr71947-9.c: Disable fre.
>         * gcc.dg/tree-ssa/vrp03.c: Disable fre.
>         * gcc.dg/tree-ssa/ssa-fre-95.c: New test.
>         * gcc.dg/tree-ssa/ssa-fre-96.c: New test.
>         * gcc.dg/tree-ssa/ssa-fre-97.c: New test.
>

Reply via email to