On 06/02/18 11:32, Richard Earnshaw (lists) wrote:
On 02/02/18 15:43, Kyrill Tkachov wrote:
Hi Richard,

On 02/02/18 15:25, Richard Earnshaw (lists) wrote:
On 02/02/18 15:10, Kyrill Tkachov wrote:
Hi all,

In this [8 Regression] PR we ICE because we can't recognise the insn:
(insn 59 58 43 7 (set (reg:DI 124)
          (rotatert:DI (reg:DI 125 [ c ])
              (subreg:QI (and:SI (reg:SI 128)
                      (const_int 65535 [0xffff])) 0)))
Aren't we heading off down the wrong path here?

(subreg:QI (and:SI (reg:SI 128) (const_int 65535 [0xffff])) 0)

can be simplified to

(subreg:QI (reg:SI 128) 0)

since the AND operation is redundant as we're only looking at the bottom
8 bits.
I have tried implementing such a transformation in combine [1]
but it was not clear that it was universally beneficial.
See the linked thread and PR 70119 for the discussion (the thread
continues into the next month).
Is that really the same thing?  The example there was using a mask that
was narrower than the subreg and thus not redundant.  The case here is
where the mask is completely redundant because we are only looking at
the bottom 8 bits of the result (which are not changed by the AND
operation).

I think you're right, Richard: we can teach simplify-rtx to handle this,
though it makes the fix a bit more involved.  The attached patch implements it.
It adds a simplification rule to simplify_subreg that collapses subregs
of AND-immediate operations when the mask covers the outer mode mask.
This allows us to simplify (subreg:QI (and:SI (reg:SI) (const_int 0xff)) 0)
into (subreg:QI (reg:SI) 0).
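To make the condition concrete, here is a stand-alone C sketch (just an
illustration, not GCC code; the mask_covers_lowpart helper and the hard-coded
bit widths are made up for the example) of the check the new rule performs,
i.e. the equivalent of
(~INTVAL (XEXP (op, 1)) & GET_MODE_MASK (int_outermode)) == 0:

/* Stand-alone illustration of the check added to simplify_subreg:
   the AND immediate may be dropped when it keeps every bit of the
   narrower lowpart mode unchanged.  */
#include <assert.h>
#include <stdint.h>

/* Illustrative helper: does MASK keep all bits of an OUTER_BITS-wide
   lowpart unchanged?  */
static int
mask_covers_lowpart (uint64_t mask, unsigned outer_bits)
{
  uint64_t outer_mask = (outer_bits >= 64
			 ? ~UINT64_C (0)
			 : (UINT64_C (1) << outer_bits) - 1);
  return (~mask & outer_mask) == 0;
}

int
main (void)
{
  /* (subreg:QI (and:SI x 0xffff) 0) -> (subreg:QI x 0): safe to drop.  */
  assert (mask_covers_lowpart (0xffff, 8));
  /* (subreg:QI (and:SI x 0x1f) 0): the AND changes the low 8 bits,
     so it must be kept.  */
  assert (!mask_covers_lowpart (0x1f, 8));
  return 0;
}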
We cannot completely remove the creation of SUBREG-AND-immediate RTXes in the
aarch64 splitters because there are cases where we still need that construct,
in particular when the mask covers just the mode size (i.e. the valid shift
amount range), for example:
(subreg:QI (and:SI (reg:SI) (const_int 31 [0x1F])) 0).
In this case we cannot simplify it to (subreg:QI (reg:SI) 0) in the midend,
and we cannot rely on the aarch64-specific behaviour of the integer LSR, LSL
and ROR instructions to truncate the register shift amount by the mode width,
because these patterns may match the integer SISD alternatives (USHR and
friends) that do not perform an implicit truncation.
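As a quick stand-alone illustration (again just example C, not part of the
patch) of why that AND cannot be dropped under a QImode lowpart:

/* The low 8 bits of the AND result differ from the low 8 bits of the
   register for some values, so the AND is not redundant there.  */
#include <assert.h>
#include <stdint.h>

int
main (void)
{
  uint32_t x = 0x40;			/* 64: a valid 32-bit value.  */
  assert ((uint8_t) (x & 31) == 0);	/* Lowpart of the AND result.  */
  assert ((uint8_t) x == 0x40);		/* Lowpart of the register.  */
  /* The two lowparts differ, so the splitters must keep the
     SUBREG-of-AND form in this case.  */
  return 0;
}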

This patch passes bootstrap and test on arm-none-linux-gnueabihf and
aarch64-none-linux-gnu.
There is a regression on aarch64 in the gcc.target/aarch64/bfxil_1.c testcase
that I address with a separate patch, and an i386 regression that I also
address separately.

Is this version preferable? If so, I'll ping the midend maintainers for the
simplify-rtx.c change.
Thanks,
Kyrill

2018-02-08  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>

    PR target/84164
    * simplify-rtx.c (simplify_subreg): Simplify subreg of masking
    operation when mask covers the outermode bits.
    * config/aarch64/aarch64.md (*aarch64_reg_<mode>3_neg_mask2):
    Use simplify_gen_subreg when creating a SUBREG.
    (*aarch64_reg_<mode>3_minus_mask): Likewise.
    (*aarch64_<optab>_reg_di3_mask2): Use const_int_operand predicate
    for operand 3.

2018-02-08  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>

    PR target/84164
    * gcc.c-torture/compile/pr84164.c: New test.
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 0d13c356d2a9b86c701c15b818e95df1a00abfc5..d9b4a405c9a1e5c09278b2329e5d73f12b0b963d 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -4278,8 +4278,10 @@ (define_insn_and_split "*aarch64_reg_<mode>3_neg_mask2"
     emit_insn (gen_negsi2 (tmp, operands[2]));
 
     rtx and_op = gen_rtx_AND (SImode, tmp, operands[3]);
-    rtx subreg_tmp = gen_rtx_SUBREG (GET_MODE (operands[4]), and_op,
-				     SUBREG_BYTE (operands[4]));
+    rtx subreg_tmp = simplify_gen_subreg (GET_MODE (operands[4]), and_op,
+					   SImode, SUBREG_BYTE (operands[4]));
+
+    gcc_assert (subreg_tmp);
     emit_insn (gen_<optab><mode>3 (operands[0], operands[1], subreg_tmp));
     DONE;
   }
@@ -4305,9 +4307,10 @@ (define_insn_and_split "*aarch64_reg_<mode>3_minus_mask"
     emit_insn (gen_negsi2 (tmp, operands[3]));
 
     rtx and_op = gen_rtx_AND (SImode, tmp, operands[4]);
-    rtx subreg_tmp = gen_rtx_SUBREG (GET_MODE (operands[5]), and_op,
-				     SUBREG_BYTE (operands[5]));
+    rtx subreg_tmp = simplify_gen_subreg (GET_MODE (operands[5]), and_op,
+					   SImode, SUBREG_BYTE (operands[5]));
 
+    gcc_assert (subreg_tmp);
     emit_insn (gen_ashl<mode>3 (operands[0], operands[1], subreg_tmp));
     DONE;
   }
@@ -4318,9 +4321,9 @@ (define_insn "*aarch64_<optab>_reg_di3_mask2"
 	(SHIFT:DI
 	  (match_operand:DI 1 "register_operand" "r")
 	  (match_operator 4 "subreg_lowpart_operator"
-	   [(and:SI (match_operand:SI 2 "register_operand" "r")
-		     (match_operand 3 "aarch64_shift_imm_di" "Usd"))])))]
-  "((~INTVAL (operands[3]) & (GET_MODE_BITSIZE (DImode)-1)) == 0)"
+	    [(and:SI (match_operand:SI 2 "register_operand" "r")
+		     (match_operand 3 "const_int_operand" "n"))])))]
+  "((~INTVAL (operands[3]) & (GET_MODE_BITSIZE (DImode) - 1)) == 0)"
 {
   rtx xop[3];
   xop[0] = operands[0];
diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index 2e7aa5c12952ab1a9b49b5adaf23710327e577d3..af06d7502cebac03cefc689b2646874b8397e767 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -6474,6 +6474,18 @@ simplify_subreg (machine_mode outermode, rtx op,
       return NULL_RTX;
     }
 
+  /* Simplify (subreg:QI (and:SI (reg:SI) (const_int 0xffff)) 0)
+     into (subreg:QI (reg:SI) 0).  */
+  scalar_int_mode int_outermode, int_innermode;
+  if (!paradoxical_subreg_p (outermode, innermode)
+      && is_a <scalar_int_mode> (outermode, &int_outermode)
+      && is_a <scalar_int_mode> (innermode, &int_innermode)
+      && GET_CODE (op) == AND && CONST_INT_P (XEXP (op, 1))
+      && known_eq (subreg_lowpart_offset (outermode, innermode), byte)
+      && (~INTVAL (XEXP (op, 1)) & GET_MODE_MASK (int_outermode)) == 0
+      && validate_subreg (outermode, innermode, XEXP (op, 0), byte))
+    return gen_rtx_SUBREG (outermode, XEXP (op, 0), byte);
+
   /* A SUBREG resulting from a zero extension may fold to zero if
      it extracts higher bits that the ZERO_EXTEND's source bits.  */
   if (GET_CODE (op) == ZERO_EXTEND && SCALAR_INT_MODE_P (innermode))
@@ -6483,7 +6495,7 @@ simplify_subreg (machine_mode outermode, rtx op,
 	return CONST0_RTX (outermode);
     }
 
-  scalar_int_mode int_outermode, int_innermode;
+
   if (is_a <scalar_int_mode> (outermode, &int_outermode)
       && is_a <scalar_int_mode> (innermode, &int_innermode)
       && known_eq (byte, subreg_lowpart_offset (int_outermode, int_innermode)))
diff --git a/gcc/testsuite/gcc.c-torture/compile/pr84164.c b/gcc/testsuite/gcc.c-torture/compile/pr84164.c
new file mode 100644
index 0000000000000000000000000000000000000000..d663fe5d6649e495f3e956e6a3bc938236a4bf91
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/pr84164.c
@@ -0,0 +1,17 @@
+/* PR target/84164.  */
+
+int b, d;
+unsigned long c;
+
+static inline __attribute__ ((always_inline)) void
+bar (int e)
+{
+  e += __builtin_sub_overflow_p (b, d, c);
+  c = c << ((short) ~e & 3) | c >> (-((short) ~e & 3) & 63);
+}
+
+int
+foo (void)
+{
+  bar (~1);
+}
