I tried to improve the patch following your advices and to catch more
opportunities. Hope it'll be helpful.

On 6/24/21 8:29 AM, Richard Biener wrote: 
> On Thu, Jun 24, 2021 at 11:55 AM Di Zhao via Gcc-patches <gcc-
> patc...@gcc.gnu.org> wrote:
> 
> I have some reservations about extending the ad-hoc "predicated value" code.
> 
> Some comments on the patch:
> 
> +/* hashtable & helpers to record equivalences at given bb.  */
> +
> +typedef struct val_equiv_s
> +{
> +  val_equiv_s *next;
> +  val_equiv_s *unwind_to;
> +  hashval_t hashcode;
> +  /* SSA name this val_equiv_s is associated with.  */
> +  tree name;
> +  /* RESULT in a vn_pval entry is SSA name of a equivalence.  */
> +  vn_pval *values;
> +} * val_equiv_t;
> 
> all of this (and using a hashtable for recording) is IMHO a bit overkill.
> Since you only ever record equivalences for values the more natural place to
> hook those in is the vn_ssa_aux structure where we also record the 
> availability
> chain.
 
I tried to store the equivalences in the vn_ssa_aux structure, but I didn't  
optimize the second case successfully: I need to record the equivalence
of a PHI expression's result and arguments, but their value number results will
become VARYING first, so they won't be changed. Maybe I'm missing something, or
can I force change a VARYING result?

Besides, the way currently used, equivalences only need to be "predictable"
rather than available, maybe availability chains do not represent them very
well?

> There's little commentary in the new code, in particular function-level
> comments are missing everywhere.

Added more comments.

> There's complexity issues, like I see val_equiv_insert has a "recurse"
> feature but also find_predicated_value_by_equiv is quadratic in the number of
> equivalences of the lhs/rhs.  Without knowing what the recursion on the
> former is for - nothing tells me - I suspect either of both should be 
> redundant.

The intention was, given {A==B, B==C, X==Y, Y==Z} and a previous result of
"C opcode Z", to find the result of "A opcode Y". I removed the "recurse"
feature and modified the searching logic so solve the issue. Now a temporary
hash_set is used to record the equivalences that are visited when searching.

> You seem to record equivalences at possible use points which looks odd at best
> - I'd expected equivalences being recorded at the same point we record
> predicated values and for the current condition, not the one determining some
> other predication.
> What was the motivation to do it the way you do it?

The purpose is to "bring down" what can be known from a previous basic-block
that effectively dominates current block, but not actually does so (in the
example it is because jump threading is hindered by a loop). For example in
this case:

  if (a != 0)
      // Nothing useful can be recorded here, because this BB doesn't dominate
      // the BB that we want to simplify.
      c = b;
  for (unsigned i = 0; i < c; i++)
    {
      if (a != 0)  // The recording is triggered here.
       {
         // c == b will be recorded here, so it can be used for simplification.
         // In gimple it is the equivalence of a PHI's result and argument.
         if (i >= b) 
           foo ();

These requires finding a previous condition that is identical with current
one, so it is convenient to do this in FRE. Besides, as FRE records derived
predicate, so for relative conditions there also might be opportunities for
optimization. In the new patch code this is included.

Besides, to find more opportunities, added a hashmap to store mappings from
immediate dominators to basic-blocks with PHIs of interest.

> Why is the code conditional on 'iterate'?

I haven't worked it out to fit the non-iterate mode, so it now breaks the
if_conversion pass. I think this is because some of the equivalence-recordings
are too optimistic for non-iterate mode.

> You've probably realized that there's no "nice" way to handle temporary
> equivalences in a VN scheme using hashing for expressions (unless you degrade
> hashes a lot).

I modified the code to use TREE_HASH on ssa names. Would that be better?  

> You quote opportunities that are catched with this like
> 
> +  if (a != 0)
> +    {
> +      c = b;
> +    }
> +  for (unsigned i = 0; i < c; i++)
> +    {
> +      if (a != 0)
> +       {
> +         if (i >= b)
> +           /* Should be eliminated.
> +            */
> +           foo ();
> 
> but say other "obvious" cases like
> 
> +  if (a != 0)
> +    {
> +      c = b;
> +    }
> +  for (unsigned i = 0; i < c; i++)
> +    {
> +      if (a != 0)
> +       {
> +           /* Should be zero.  */
>              return b - c;
> 
> are not handled.  That's in line with the current "value predication"
> which mainly aims at catching simple jump threading opportunities; you only
> simplify conditions with the recorded equivalences.  But then the complexity 
> of
> handling equivalences does probably not outweight the opportunities catched -
> can you share some numbers on how many branches are newly known taken
> during VN with this patch during say bootstrap or build of SPEC CPU?

I extended the code a little to cover the cases like "A - A" and "A xor A".

Here are some results on one bootstrap step: 
           | values found | more bb removed | values found in all
           | in fre1      | at fre1         | fre & pre passes   
-----------------------------------------------------------------
 bootstrap | 592          | 40              | 1272

As the code is different for bootstrap, the "more bb removed" metric is not
precise. I also tested on SPEC CPU 2017 integer cases:
                | values found | more bb removed | values found in all
                | in fre1      | at fre1         | fre & pre passes
-----------------------------------------------------------------
 500.perlbench_r| 3            |  0              | 9
 502.gcc_r      | 25           |  39             | 241
 520.omnetpp    | 9            |  6              | 34
 523.xalancbmk_r| 12           |  0              | 35
 541.leela_r    | 2            |  0              | 2

In cases not listed above there's no value found by equivalences. Benefits
after fre1 are not counted as CGF may be different from here (for
523.xalancbmk_r there're 8 more basic-blocks removed at fre3). Although the
chances are not plenty, there might be potential benefits, such as making a
function pure.

> I've hoped to ditch the current "value predication" code by eventually using 
> the
> relation oracle from ranger but did not yet have the chance to look into that.
> Now, the predicated equivalences are likely not something that infrastructure
> can handle?
> 
> In the end I think we should research into maintaining an alternate expression
> table for conditions (those we like to simplify with
> equivalences) and use a data structure that more easily allows to introduce
> (temporary) equivalences.  Like by maintaining back-references of values we've
> recorded a condition for and a way to quickly re-canonicalize conditions.  
> Well -
> it requires some research, as said.
> 
> Richard.
> 
> > Regards,
> > Di Zhao
> >
> > Extend FRE with an "equivalence map" for condition prediction.
> >
> > 2021-06-24  Di Zhao  <diz...@os.amperecomputing.com>
> >

Thanks,
Di Zhao

--------
Extend FRE with an "equivalence map" for condition prediction.

2021-07-18  Di Zhao  <diz...@os.amperecomputing.com>

gcc/ChangeLog:
        PR tree-optimization/101186
        * tree-ssa-sccvn.c (vn_tracking_edge): Extracted utility function.
        (dominated_by_p_w_unex): Moved upward, no change.
        (vn_nary_op_get_predicated_value): Moved upward, no change.
        (struct val_equiv_hasher): Hasher for the "equivalence map".
        (is_vn_valid_at_bb): Check if vn_pval is valid at BB.
        (val_equiv_insert): Insert into "equivalence map".
        (vn_lookup_binary_op_result): Lookup binary expression's result by VN.
        (iterate_val_equivs): Iterate on equivalences and returns a non-NULL
        result returned by callback.
        (find_predicated_binary_by_lhs_equiv): Callback for iterate_val_equivs.
        Lookup a binary operations result by LHS equivalences.
        (find_predicated_binary_by_rhs_equiv): Callback for iterate_val_equivs.
        Lookup a binary operations result by RHS equivalences.
        (find_predicated_binary_by_equiv): Lookup predicated value of a binary
        operation by equivalences.
        (is_relaxed_cond_code): Whether operation code is a relaxed condition
        code derived from original code.
        (branch_may_come_from_another): Whether there's a path from the one
        true or false destination to another.
        (record_equiv_from_previous_edge): Record equivalence relation from a
        previous condition on current bb' true and false edges. 
        (record_equiv_from_previous_cond): Record equivalences generated by
        previous conditions on current BB's true and false edges.
        (vn_nary_op_insert_pieces_predicated): Extract utility function. Insert
        into the "equivalence map" for predicate like "x==y is true".
        (record_dom_to_phi_bb): Record mappings from immediate dominator to
        basic_block with PHIs.
        (vn_phi_lookup): Record mappings from immediate dominator to PHIs.
        (visit_nary_op): Add lookup predicated values of binaries by
        equivalences.
        (free_rpo_vn): Free the "equivalence map".
        (process_bb): Insert into & lookup from the "equivalence map".
        (struct unwind_state): Add "equivalence map" unwind state.
        (do_unwind): Unwind the "equivalence map".
        (do_rpo_vn): Update "equivalence map" unwind state.

gcc/testsuite/ChangeLog:
        PR tree-optimization/101186
        * gcc.dg/tree-ssa/pr71947-1.c: Disable fre.
        * gcc.dg/tree-ssa/pr71947-2.c: Disable fre.
        * gcc.dg/tree-ssa/pr71947-3.c: Disable fre.
        * gcc.dg/tree-ssa/pr71947-5.c: Disable fre.
        * gcc.dg/tree-ssa/vrp03.c: Disable fre.
        * gcc.dg/tree-ssa/ssa-fre-95.c: New test.
        * gcc.dg/tree-ssa/ssa-fre-96.c: New test.

Attachment: tree-optimization-101186-1.patch
Description: tree-optimization-101186-1.patch

Reply via email to