Re: [PATCH v2] ree: Improve ree pass for rs6000 target.

Jeff Law via Gcc-patches Fri, 14 Apr 2023 13:18:42 -0700



On 4/6/23 04:49, Ajit Agarwal wrote:

Hello All:

Eliminate unnecessary redundant extension within basic and across basic blocks. 
For rs6000 target we see redundant zero and sign extension and done to improve 
ree pass to eliminate such redundant zero and sign extension.

Bootstrapped and regtested on powerpc64-linux-gnu.

Thanks & Regards
Ajit


        ree: Improve ree pass for rs6000 target.

        Eliminate unnecessary redundant extension within basic
        and across basic blocks. For rs6000 target we see
        redundant zero and sign extension and done to improve
        ree pass to eliminate such redundant zero and sign
        extension.

        2023-04-06  Ajit Kumar Agarwal  <aagar...@linux.ibm.com>

gcc/ChangeLog:

        * ree.cc (eliminate_across_bbs_p): Add checks to enable extension
        elimination across and within basic blocks.
        (def_arith_p): New function to check definition has arithmetic
        operation.
        (combine_set_extension): Modification to incorporate AND
        and current zero_extend and sign_extend instruction.
        (combline_reaching_defs): Add zero_extend and sign_extend.
        Add FUNCTION_ARG_REGNO_P abi interfaces calls and
        FUNCTION_VALUE_REGNO_P support.
        (merge_def_and_ext): Add calls to eliminate_across_bbs_p and
        zero_extend sign_extend and AND instruction.
        (insn_is_zext_p): New function.
        (add_removable_extension): Add FUNCTION_ARG_REGNO_P abi
        interface calls.
        * common/config/rs6000/rs6000-common.cc: Add REE pass as a
        default rs6000 target pass for O2 and above.

gcc/testsuite/ChangeLog:

        * g++.target/powerpc/zext-elim.C: New testcase.
        * g++.target/powerpc/zext-elim-1.C: New testcase.
        * g++.target/powerpc/zext-elim-2.C: New testcase.
        * g++.target/powerpc/zext-elim-3.C: New testcase.
        * g++.target/powerpc/sext-elim.C: New testcase.

It would be useful to know the kinds of patterns you're trying to improve. I get the sense there's at least three distinct cases you're trying to handle.

One case appears to stem from operations which we know produce a zero extended result. For example x & 0x1. We can kind of view that as a zero extension from narrow modes up through word_mode since we know the upper bits are zero.


Another stems from exploiting ABI characteristics.

Finally, extending to handle cases across basic blocks.

These should be independent changes. So I can easily see this patch should morph into a patch series with at least 4 entries.


1/4 Just moves code around and produces no functional changes.
2/4 Would implement the case where an operation is known to
    produce a zero extended result.
3/4 Would exploit the ABI characteristics to eliminate more
    extensions.
4/4 Would extend REE to work across blocks

With this kind of structure patches #1 and #2 might go in fairly quickly, even if we have to figure out how to handle ABI issues. It's also easier for the reviewer on multiple levels.

---
  gcc/common/config/rs6000/rs6000-common.cc     |   2 +
  gcc/ree.cc                                    | 655 ++++++++++++++----
  gcc/testsuite/g++.target/powerpc/sext-elim.C  |  18 +
  .../g++.target/powerpc/zext-elim-1.C          |  19 +
  .../g++.target/powerpc/zext-elim-2.C          |  11 +
  .../g++.target/powerpc/zext-elim-3.C          |  16 +
  gcc/testsuite/g++.target/powerpc/zext-elim.C  |  30 +
  7 files changed, 606 insertions(+), 145 deletions(-)
  create mode 100644 gcc/testsuite/g++.target/powerpc/sext-elim.C
  create mode 100644 gcc/testsuite/g++.target/powerpc/zext-elim-1.C
  create mode 100644 gcc/testsuite/g++.target/powerpc/zext-elim-2.C
  create mode 100644 gcc/testsuite/g++.target/powerpc/zext-elim-3.C
  create mode 100644 gcc/testsuite/g++.target/powerpc/zext-elim.C

diff --git a/gcc/common/config/rs6000/rs6000-common.cc 
b/gcc/common/config/rs6000/rs6000-common.cc
index 2140c442ba9..968db215028 100644
--- a/gcc/common/config/rs6000/rs6000-common.cc
+++ b/gcc/common/config/rs6000/rs6000-common.cc
@@ -34,6 +34,8 @@ static const struct default_options 
rs6000_option_optimization_table[] =
      { OPT_LEVELS_ALL, OPT_fsplit_wide_types_early, NULL, 1 },
      /* Enable -fsched-pressure for first pass instruction scheduling.  */
      { OPT_LEVELS_1_PLUS, OPT_fsched_pressure, NULL, 1 },
+    /* Enable -free for zero extension and sign extension elimination.*/
+    { OPT_LEVELS_2_PLUS, OPT_free, NULL, 1 },

So if we're going to make this change, then we need to update the documentation as well (there's a section which lists which -f options are enabled at the different -O<n> optimization levels).

It may be better to have the ppc backend enable the REE pass on its own rather than forcing it on for all the targets since it hasn't been tested on all the targets. That's been pretty standard practice for the REE implementation.

      /* Enable -munroll-only-small-loops with -funroll-loops to unroll small
         loops at -O2 and above by default.  */
      { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_funroll_loops, NULL, 1 },
diff --git a/gcc/ree.cc b/gcc/ree.cc
index 413aec7c8eb..8057f0325f4 100644
--- a/gcc/ree.cc
+++ b/gcc/ree.cc
@@ -253,6 +253,101 @@ struct ext_cand

static int max_insn_uid;

+/* Get all the reaching definitions of an instruction. The definitions are
+   desired for REG used in INSN.  Return the definition list or NULL if a
+   definition is missing.  If DEST is non-NULL, additionally push the INSN
+   of the definitions onto DEST.  */
+
+static struct df_link *
+get_defs (rtx_insn *insn, rtx reg, vec<rtx_insn *> *dest)

[ ... ]

So you moved some functions around. This needs to be reflected in the ChangeLog entry. Presumably you did this so that you didn't need to have a prototype for the static functions?

+
+
+/* Identify instruction AND with identical zero extension.  */
+
+static unsigned int
+insn_is_zext_p (rtx insn)

This is *very* poorly named. You're not operating on an INSN at this point anymore. You're operating on an RTX. All INSNs are RTXs, but not all RTXs are INSNs.

The function comment should indicate what the input argument is as well as the return value. We generally would prefer to use booleans for return values rather than integers when we can. So the return type should probably be adjusted.


/* Return TRUE if OP can be considered a zero extension from one or
   more sub-word modes to larger modes up to a full word.

   For example (and:DI (reg) (const_int X))

   Depending on the value of X could be considered a zero extension
   from QI, HI and SI to larger modes up to DImode.  */

+{
+  if (GET_CODE (insn) == AND)
+    {
+      rtx set = XEXP (insn, 0);
+      if (REG_P(set))

Formatting. Space between the macro/function name and the open parenthesis for arguments.

+       {
+         if (XEXP (insn, 1) == const1_rtx)
+           return 1;

This is *way* too specific. You probably want to be looking at the result mode's mask and comparing the constant to that.
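
As a rough standalone illustration of the mask comparison (plain C++, not ree.cc code; the helper name zext_widths_for_mask is made up for this sketch), the test below treats an AND with a constant as a zero extension from every narrow width whose mode mask covers all set bits of the constant. This is one possible reading of the suggestion. A stricter variant would require the constant to equal the mode mask exactly.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of GCC: return the narrow widths (in bits)
// for which "x & mask", computed in a 64-bit register, leaves every bit at
// or above that width cleared.  8, 16 and 32 stand in for QImode, HImode
// and SImode on a 64-bit target.
std::vector<int> zext_widths_for_mask(std::uint64_t mask) {
  std::vector<int> widths;
  for (int width : {8, 16, 32}) {
    // Equivalent of the mode mask: the low `width` bits set.
    std::uint64_t mode_mask = (std::uint64_t{1} << width) - 1;
    // No bit of the constant lies outside the narrow mode.
    if ((mask & ~mode_mask) == 0)
      widths.push_back(width);
  }
  return widths;
}
```

With this reading, a constant of 1 qualifies for all three widths, 0x100 qualifies only for the 16-bit and 32-bit widths, and a constant with bit 32 set qualifies for none.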

+/* Identify instruction AND with identical zero extension.  */
+
+static unsigned int
+insn_is_zext_p (rtx_insn *insn)

The function comment needs improvements similar to the other function. But this time you really are working with an INSN.

+{
+  rtx body = PATTERN (insn);
+
+  if (GET_CODE (body) == PARALLEL)
+     return 0;

You should consider handling the case where you've got a 2 element PARALLEL where the first element is a suitable AND and the second is just a CLOBBER. This case arises on targets that implement condition code registers.

Using single_set may be a good way to handle that. It'll give you back the SET expression after stripping out the CLOBBER.
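
To make the idea concrete, here is a toy model in standalone C++. The types Kind, Op and Elem and the functions toy_single_set and toy_is_and_set are invented for this sketch; they are far simpler than real RTL and the real single_set, which handles more cases. The point is only that a two-element pattern of an AND set plus a CLOBBER is accepted instead of being rejected for being a PARALLEL.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Toy stand-ins for RTL; hypothetical types, not GCC's rtx.
enum class Kind { Set, Clobber, Use };
enum class Op { And, Plus, Other };

struct Elem {
  Kind kind;
  Op src_op;  // Only meaningful when kind == Kind::Set.
};

// Return the index of the only SET in the pattern, skipping CLOBBER and
// USE elements, or nothing if there are zero or several SETs.
std::optional<std::size_t> toy_single_set(const std::vector<Elem>& pattern) {
  std::optional<std::size_t> found;
  for (std::size_t i = 0; i < pattern.size(); ++i) {
    if (pattern[i].kind != Kind::Set)
      continue;
    if (found)
      return std::nullopt;  // A second SET: not a single set.
    found = i;
  }
  return found;
}

// True when the pattern has exactly one SET and its source is an AND.
bool toy_is_and_set(const std::vector<Elem>& pattern) {
  std::optional<std::size_t> idx = toy_single_set(pattern);
  return idx.has_value() && pattern[*idx].src_op == Op::And;
}
```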

+
+  if (GET_CODE(body) == SET && GET_CODE (SET_SRC (body)) == AND)

Formatting nit.  GET_CODE(body) should be GET_CODE (body).

+   {
+     rtx set = XEXP (SET_SRC (body), 0);
+
+     if (REG_P(set) && GET_MODE (SET_DEST (body)) == GET_MODE(set))

Multiple formatting nits here.

+       {
+        if (XEXP (SET_SRC (body), 1) == const1_rtx)
+          return 1;

Same comment as prior function. I don't think we want to restrict this to just (const_int 1).

@@ -359,27 +479,45 @@ combine_set_extension (ext_cand *cand, rtx_insn 
*curr_insn, rtx *orig_set)
+      if (GET_CODE (SET_SRC (cand_pat)) == AND)
+       temp_extension
+       = gen_rtx_fmt_ee (cand->code, cand->mode, XEXP (orig_src, 0),
+                         XEXP (orig_src, 1));
+      else
+       temp_extension

+       = gen_rtx_fmt_e (cand->code, cand->mode, XEXP (orig_src, 0));

Isn't cand->code always AND here? Would it make more sense to use gen_rtx_AND? The reason the other code doesn't here is because I think the candidate could be a zero or sign extension, thus we have to use gen_rtx_fmt_e.

    else if (GET_CODE (orig_src) == IF_THEN_ELSE)
      {
        /* Only IF_THEN_ELSE of phi-type copies are combined.  Otherwise,
-         in general, IF_THEN_ELSE should not be combined.  */
-      return false;
+       in general, IF_THEN_ELSE should not be combined.  Relaxed
+       cases with IF_THEN_ELSE across basic blocls */
+       return true;

Hard to know if this is safe/correct. I would *strongly* suggest breaking this patch up along the lines I've suggested above. That way we can focus on one significant change at a time.

+       temp_extension
+       = gen_rtx_fmt_e (cand->code, cand->mode,orig_src);

Please be careful with formatting. There's always a space after a comma in an argument list.

+
        rtx simplified_temp_extension = simplify_rtx (temp_extension);
+
        if (simplified_temp_extension)
          temp_extension = simplified_temp_extension;
+
        new_set = gen_rtx_SET (new_reg, temp_extension);
      }

You're introducing extraneous newlines. Sometimes that's useful to provide visual indications of related code. But be aware that often folks will object to such changes intermixed in a large patch.

+static bool
+def_arith_p (rtx_insn *insn, rtx orig_src)

Missing a comment indicating what this function does, its arguments and return value. These are important so that when someone looks at your code later they can quickly ascertain the general purpose of the function without having to read and try to understand all the code.

+{
+  if (!REG_P (orig_src)) return true;

Bring the "return true;" down to a new line.

+
+  vec<rtx_insn *> *dest = XCNEWVEC (vec<rtx_insn *>, 4);
+  if (!get_defs (insn, orig_src, dest))
+    return false;
+
+  int i;
+  rtx_insn *def_insn;
+  bool has_arith = false;
+
+  FOR_EACH_VEC_ELT (*dest, i, def_insn)
+    {
+      if (DEBUG_INSN_P (def_insn))
+       {
+         has_arith = true;
+         break;
+       }

Hard to know since there's no function comment, but does it make sense to set HAS_ARITH for a DEBUG_INSN?

+
+      if ((GET_CODE (PATTERN (def_insn)) == SET
+          && (GET_CODE (SET_SRC (PATTERN (def_insn))) == ASHIFT
+          || GET_CODE (SET_SRC (PATTERN (def_insn))) == LSHIFTRT))
+          //|| GET_CODE (SET_SRC (PATTERN (def_insn))) == ZERO_EXTEND))
+          || GET_CODE (PATTERN (def_insn)) == PARALLEL)
+       {
+         has_arith = true;
+         break;
+       }

So there's the commented out line in here. That should be removed. It seems to me that there should be a comment indicating what property we're looking for so that the reader knows why these codes were chosen.

+
+      if (GET_CODE (PATTERN (def_insn)) == SET
+         && (GET_RTX_CLASS (GET_CODE (SET_SRC (PATTERN (def_insn)))) == 
RTX_BIN_ARITH
+         || GET_RTX_CLASS (GET_CODE (SET_SRC (PATTERN (def_insn)))) == 
RTX_COMM_ARITH))
+       {
+         rtx src = XEXP (SET_SRC (PATTERN (def_insn)),0);
+
+         if (GET_CODE (src) == LSHIFTRT
+             || GET_CODE (src) == ASHIFT)
+           {
+             has_arith = true;
+             break;
+           }
+       }

Similarly.  Why are these particular RTX codes relevant?

+}
+
+/* Find feasibility of extension elimination
+   across basic blocks.  */

Needs to document the arguments and return value.

+
+static bool
+eliminate_across_bbs_p (ext_cand *cand,
+                       rtx_insn *def_insn)

Both arguments should probably be on the same line.  The general rule is
if they fit in 80 columns, then don't split them.

+{
+  basic_block bb = BLOCK_FOR_INSN (cand->insn);
+  edge fallthru_edge;
+  edge e;
+  edge_iterator ei;
+
+  FOR_EACH_EDGE (e, ei, bb->preds)
+    {
+      rtx_insn *insn = BB_END (e->src) ? PREV_INSN (BB_END (e->src)) : NULL;
+
+      if (insn && NONDEBUG_INSN_P (insn)
+         && GET_CODE (PATTERN (insn)) == SET && SET_SRC (PATTERN(insn))
+         && GET_CODE (SET_SRC (PATTERN (insn))) == IF_THEN_ELSE)

Filter out the NONDEBUG_INSN_P like this:

        if (!NONDEBUG_INSN_P (insn))
          continue;

Use single_set to ensure you have a proper set. If your code can't handle a single_set inside a clobber, then what you've written is OK, but it should at least be documented.

There's also multiple whitespace problems in your code. I would strongly suggest you review the code formatting guidelines and perhaps use one of the various formatting tools out there to help make sure you're following the guidelines.

So at a high level, what property are you looking for from a CFG standpoint? I'd hazard a guess you're looking for extended basic blocks or dominance.

+
+   rtx set = single_set(cand->insn);
+   /* The destination register of the extension insn must not be
+        used or set between the def_insn and cand->insn exclusive.  */
+   if (INSN_CHAIN_CODE_P (GET_CODE (def_insn))
+       && INSN_CHAIN_CODE_P (cand->code))
+     if ((cand->code == ZERO_EXTEND)
+         && REG_P (SET_DEST (set)) && NEXT_INSN (def_insn)
+         && (reg_used_between_p (SET_DEST (set), def_insn, cand->insn)
+               || reg_set_between_p (SET_DEST (set), def_insn, cand->insn)))
+       return false;

This looks similar to other check(s) in ree.cc. Would it make sense to factor this test into its own function? If so, that would belong in patch #1 of the proposed series.

+
+   if (cand->code == ZERO_EXTEND && (bb != BLOCK_FOR_INSN (def_insn)
+       || DF_INSN_LUID (def_insn) > DF_INSN_LUID (cand->insn)))
+     return false;

Formatting problems with your operations. That makes this much harder to read. Bring the && down to a new line and indent it just past the open paren. The || will need further indentation as well.

+
+   if (insn_is_zext_p (cand->insn)
+       && GET_CODE (PATTERN (BB_END (bb))) != USE)
+     return false;

So why is the USE important here? And if it is, don't you need to check the argument of the use for something?

+
+   if (insn_is_zext_p (cand->insn)
+       && GET_CODE (PATTERN (BB_END (bb))) == USE
+       && REGNO (XEXP (PATTERN (BB_END (bb)), 0)) != REGNO (SET_DEST 
(cand->expr)))
+     return false;

You've repeated some of the tests.  Please clean this up.

+
+
+   if (cand->code == SIGN_EXTEND
+       && GET_CODE ((PATTERN (def_insn))) == SET)
+     {
+       rtx orig_src = SET_SRC (PATTERN (def_insn));
+       machine_mode ext_src_mode;
+
+       ext_src_mode = GET_MODE (XEXP (SET_SRC (cand->expr), 0));
+
+       if (GET_MODE (SET_DEST (PATTERN (def_insn))) != ext_src_mode)
+         return false;
+       if (GET_CODE (orig_src) != PLUS)
+         return false;
+
+       if (!REG_P (XEXP (orig_src, 0)))
+         return false;
+
+       if (!REG_P (XEXP (orig_src,1)))
+         return false;
+
+       if (GET_CODE (orig_src) == PLUS)
+         {
+           bool def_src1
+             = def_arith_p (def_insn,
+                            XEXP (SET_SRC (PATTERN(def_insn)), 0));
+           bool def_src2
+              = def_arith_p (def_insn,
+                            XEXP (SET_SRC (PATTERN(def_insn)), 1));
+
+           if (def_src1 || def_src2)
+             return false;
+       }
+     }

So are you relying on WORD_REGISTER_OPERATIONS here? It's like you're expecting arithmetic to be setting full words for sub-word operations. If so, you need to actually verify WORD_REGISTER_OPERATIONS is in effect and that the mode of the operation is smaller than a word.
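
A related hazard can be seen in a standalone C++ sketch (an illustration only, not derived from the patch; all function names are made up). Even when a 32-bit addition is carried out across a full 64-bit register, the upper bits of the sum are not guaranteed to match the sign extension of the low 32 bits, so whether a following sign extension is redundant depends on the operands.

```cpp
#include <cstdint>

// 64-bit addition of two registers that each hold a sign-extended
// 32-bit value.  Unsigned arithmetic keeps the wraparound well defined.
std::uint64_t full_word_add(std::int32_t a, std::int32_t b) {
  return static_cast<std::uint64_t>(static_cast<std::int64_t>(a)) +
         static_cast<std::uint64_t>(static_cast<std::int64_t>(b));
}

// What an explicit sign extension from 32 to 64 bits would leave in the
// register: the low 32 bits are kept and bit 31 is copied upwards.
std::uint64_t sign_extended_low32(std::uint64_t reg) {
  std::uint64_t low = reg & 0xffffffffull;
  return (low ^ 0x80000000ull) - 0x80000000ull;
}

// True when the extension after the add would not change the register.
bool extension_is_redundant(std::int32_t a, std::int32_t b) {
  std::uint64_t reg = full_word_add(a, b);
  return reg == sign_extended_low32(reg);
}
```

For small operands such as 1 and 2 the extension changes nothing, but for INT32_MAX and 1 the full-word sum is 0x0000000080000000 while the sign-extended 32-bit result is 0xffffffff80000000.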

+
  /* Merge the DEF_INSN with an extension.  Calls combine_set_extension
     on the SET pattern.  */

@@ -710,15 +971,26 @@ merge_def_and_ext (ext_cand *cand, rtx_insn *def_insn, ext_state *state)

    ext_src_mode = GET_MODE (XEXP (SET_SRC (cand->expr), 0));
    sub_rtx = get_sub_rtx (def_insn);

+

    if (sub_rtx == NULL)
      return false;

-  if (GET_MODE (SET_DEST (*sub_rtx)) == ext_src_mode
-         || ((state->modified[INSN_UID (def_insn)].kind
-              == (cand->code == ZERO_EXTEND
+  bool copy_needed
+    = (REGNO (SET_DEST (cand->expr)) != REGNO (XEXP (SET_SRC (cand->expr), 
0)));
+
+  bool feasible =  eliminate_across_bbs_p (cand, def_insn);
+
+  if (!feasible) return false;

Various formatting issues.

+
+  if (((!copy_needed && (insn_is_zext_p (cand->insn)
+       || (cand->code == ZERO_EXTEND || cand->code == SIGN_EXTEND))
+       && (cand->code == SIGN_EXTEND || GET_MODE (SET_DEST (*sub_rtx)) != 
ext_src_mode)
+       && state->modified[INSN_UID (def_insn)].kind == EXT_MODIFIED_NONE))
+       || ((state->modified[INSN_UID (def_insn)].kind
+               == (cand->code == ZERO_EXTEND
                   ? EXT_MODIFIED_ZEXT : EXT_MODIFIED_SEXT))
-             && state->modified[INSN_UID (def_insn)].mode
-                == ext_src_mode))
+            && state->modified[INSN_UID (def_insn)].mode
+               == ext_src_mode))

Ugh. What a mess. This huge condition deserves a comment. There may be formatting issues in here too. Mailers tend to muck up whitespace so it's a bit hard to be 100% sure.

@@ -771,6 +1044,45 @@ combine_reaching_defs (ext_cand *cand, const_rtx set_pat, 
ext_state *state)
    state->defs_list.truncate (0);
    state->copies_list.truncate (0);

+  if (cand->code == ZERO_EXTEND)
+    {
+      rtx orig_src = XEXP (SET_SRC (cand->expr),0);
+      if (!get_defs (cand->insn, orig_src, NULL))
+       {
+         if (GET_MODE (orig_src) == QImode
+             && FUNCTION_ARG_REGNO_P (REGNO (orig_src))
+             && !FUNCTION_VALUE_REGNO_P (REGNO (orig_src)))

Way too specific. You don't want to be checking specific modes. It's also the case that you can't necessarily depend on ABIs to guarantee a particular zero/sign extension. As has been mentioned earlier in this thread we need a way to describe what the ABI guarantees & requires. Just assuming that these are extended is wrong.

          FOR_EACH_VEC_ELT (state->modified_list, i, def_insn)
            {
              ext_modified *modified = &state->modified[INSN_UID (def_insn)];
              if (modified->kind == EXT_MODIFIED_NONE)
                modified->kind = (cand->code == ZERO_EXTEND ? EXT_MODIFIED_ZEXT
-                                                           : 
EXT_MODIFIED_SEXT);
+                                                           : 
EXT_MODIFIED_SEXT);

Looks like a gratuitous whitespace change. If the formatting on this was wrong before, then include the formatting fix in patch #1 of the series.

I think at this point there's enough TODOs for you. I look forward to seeing an updated series.


Jeff
