On Wed, Apr 14, 2021 at 05:31:23PM +0100, Richard Sandiford wrote:
> > +(define_split
> > +  [(set (match_operand:GPI 0 "register_operand")
> > +   (LOGICAL:GPI
> > +     (and:GPI (ashift:GPI (match_operand:GPI 1 "register_operand")
> > +                          (match_operand:QI 2 "aarch64_shift_imm_<mode>"))
> > +              (match_operand:GPI 4 "const_int_operand"))
> > +     (zero_extend:GPI (match_operand 3 "register_operand"))))]
> > +  "can_create_pseudo_p ()
> > +   && REG_P (operands[1])
> > +   && REG_P (operands[3])
> > +   && REGNO (operands[1]) == REGNO (operands[3])
> > +   && ((unsigned HOST_WIDE_INT)
> > +       trunc_int_for_mode (GET_MODE_MASK (GET_MODE (operands[3]))
> > +                      << INTVAL (operands[2]), <MODE>mode)
> > +       == UINTVAL (operands[4]))"
> 
> IMO this would be easier to understand as:
> 
>    && (trunc_int_for_mode (GET_MODE_MASK (GET_MODE (operands[3]))
>                          << INTVAL (operands[2]), <MODE>mode)
>        == INTVAL (operands[4]))
> 
> (At first I thought the cast and UINTVAL were trying to escape the
> sign-extension canonicalisation.)

It is ok to write it that way, you're right, I wrote it with
UINTVAL etc. because I initially didn't use trunc_int_for_mode
but that is wrong for SImode if the mask is shifted into bit 31.

> I'm not sure about this one though.  The REGNO checks mean that this is
> effectively for hard registers only.  I thought one of the reasons for
> make_more_copies was to avoid combining hard registers like this, so I'm
> not sure we should have a pattern that specifically targets them.
> 
> Segher, have I misunderstood?

Yes, this one works only with the hard regs, the problem is that when
the hard regs are there, combiner doesn't try anything else, so without
such splitter it punts on that.
If I add yet another testcase which doesn't have hard registers, like:
unsigned
or_shift2 (void)
{
  unsigned char i = 0;
  asm volatile ("" : "+r" (i));
  return i | (i << 11);
}
then my patch doesn't handle that case, and the only splitter that would
help would need to deal with:
(set (reg/i:SI 0 x0)
    (ior:SI (and:SI (ashift:SI (subreg:SI (reg:QI 97 [ i ]) 0)
                (const_int 11 [0xb]))
            (const_int 522240 [0x7f800]))
        (zero_extend:SI (reg:QI 97 [ i ]))))
I have added another combine splitter for this below.  But as you can
see, what combiner simplification comes with isn't really consistent
and orthogonal, different operations in there look quite differently :(.

> These two look good to me apart from the cast nit.  The last one feels
> like it's more general than just sign_extends though.  I guess it would
> work for any duplicated operation that can be performed in a single
> instruction.

True, but only very small portion of them can actually make it through,
it needs something that combine has been able to propagate into another
instruction.  So if we know about other insns that would look the same
and would actually be ever matched, we can e.g. define an operator predicate
for it, but until we have testcases for that, not sure it is worth it.

Here is an updated patch that handles also the zero extends without hard
registers and doesn't have the UHWI casts (but untested for now except
for the testcase):

2021-04-14  Jakub Jelinek  <ja...@redhat.com>

        PR target/100056
        * config/aarch64/aarch64.md (*<LOGICAL:optab>_<SHIFT:optab><mode>3):
        Add combine splitters for *<LOGICAL:optab>_ashl<mode>3 with
        ZERO_EXTEND, SIGN_EXTEND or AND.

        * gcc.target/aarch64/pr100056.c: New test.

--- gcc/config/aarch64/aarch64.md.jj    2021-04-13 20:41:45.030040848 +0200
+++ gcc/config/aarch64/aarch64.md       2021-04-14 19:07:41.641623978 +0200
@@ -4431,6 +4431,75 @@ (define_insn "*<LOGICAL:optab>_<SHIFT:op
   [(set_attr "type" "logic_shift_imm")]
 )
 
+(define_split
+  [(set (match_operand:GPI 0 "register_operand")
+       (LOGICAL:GPI
+         (and:GPI (ashift:GPI (match_operand:GPI 1 "register_operand")
+                              (match_operand:QI 2 "aarch64_shift_imm_<mode>"))
+                  (match_operand:GPI 4 "const_int_operand"))
+         (zero_extend:GPI (match_operand 3 "register_operand"))))]
+  "can_create_pseudo_p ()
+   && REG_P (operands[1])
+   && REG_P (operands[3])
+   && REGNO (operands[1]) == REGNO (operands[3])
+   && (trunc_int_for_mode (GET_MODE_MASK (GET_MODE (operands[3]))
+                          << INTVAL (operands[2]), <MODE>mode)
+       == INTVAL (operands[4]))"
+  [(set (match_dup 4) (zero_extend:GPI (match_dup 3)))
+   (set (match_dup 0) (LOGICAL:GPI (ashift:GPI (match_dup 4) (match_dup 2))
+                                  (match_dup 4)))]
+  "operands[4] = gen_reg_rtx (<MODE>mode);"
+)
+
+(define_split
+  [(set (match_operand:GPI 0 "register_operand")
+       (LOGICAL:GPI
+         (and:GPI (ashift:GPI (match_operator:GPI 4 "subreg_lowpart_operator"
+                                [(match_operand 1 "register_operand")])
+                              (match_operand:QI 2 "aarch64_shift_imm_<mode>"))
+                  (match_operand:GPI 3 "const_int_operand"))
+         (zero_extend:GPI (match_dup 1))))]
+  "can_create_pseudo_p ()
+   && (trunc_int_for_mode (GET_MODE_MASK (GET_MODE (operands[1]))
+                          << INTVAL (operands[2]), <MODE>mode)
+       == INTVAL (operands[3]))"
+  [(set (match_dup 4) (zero_extend:GPI (match_dup 1)))
+   (set (match_dup 0) (LOGICAL:GPI (ashift:GPI (match_dup 4) (match_dup 2))
+                                  (match_dup 4)))]
+  "operands[4] = gen_reg_rtx (<MODE>mode);"
+)
+
+(define_split
+  [(set (match_operand:GPI 0 "register_operand")
+       (LOGICAL:GPI
+         (and:GPI (ashift:GPI (match_operand:GPI 1 "register_operand")
+                              (match_operand:QI 2 "aarch64_shift_imm_<mode>"))
+                  (match_operand:GPI 4 "const_int_operand"))
+         (and:GPI (match_dup 1) (match_operand:GPI 3 "const_int_operand"))))]
+  "can_create_pseudo_p ()
+   && pow2_or_zerop (UINTVAL (operands[3]) + 1)
+   && (trunc_int_for_mode (UINTVAL (operands[3])
+                          << INTVAL (operands[2]), <MODE>mode)
+       == INTVAL (operands[4]))"
+  [(set (match_dup 4) (and:GPI (match_dup 1) (match_dup 3)))
+   (set (match_dup 0) (LOGICAL:GPI (ashift:GPI (match_dup 4) (match_dup 2))
+                                  (match_dup 4)))]
+  "operands[4] = gen_reg_rtx (<MODE>mode);"
+)
+
+(define_split
+  [(set (match_operand:GPI 0 "register_operand")
+       (LOGICAL:GPI
+         (ashift:GPI (sign_extend:GPI (match_operand 1 "register_operand"))
+                     (match_operand:QI 2 "aarch64_shift_imm_<mode>"))
+         (sign_extend:GPI (match_dup 1))))]
+  "can_create_pseudo_p ()"
+  [(set (match_dup 4) (sign_extend:GPI (match_dup 1)))
+   (set (match_dup 0) (LOGICAL:GPI (ashift:GPI (match_dup 4) (match_dup 2))
+                                  (match_dup 4)))]
+  "operands[4] = gen_reg_rtx (<MODE>mode);"
+)
+
 (define_insn "*<optab>_rol<mode>3"
   [(set (match_operand:GPI 0 "register_operand" "=r")
        (LOGICAL:GPI (rotate:GPI
--- gcc/testsuite/gcc.target/aarch64/pr100056.c.jj      2021-04-14 
18:54:25.885626705 +0200
+++ gcc/testsuite/gcc.target/aarch64/pr100056.c 2021-04-14 19:00:00.837828080 
+0200
@@ -0,0 +1,58 @@
+/* PR target/100056 */
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler-not {\t[us]bfiz\tw[0-9]+, w[0-9]+, 11} } } */
+
+int
+or_shift_u8 (unsigned char i)
+{
+  return i | (i << 11);
+}
+
+int
+or_shift_u3a (unsigned i)
+{
+  i &= 7;
+  return i | (i << 11);
+}
+
+int
+or_shift_u3b (unsigned i)
+{
+  i = (i << 29) >> 29;
+  return i | (i << 11);
+}
+
+int
+or_shift_s16 (signed short i)
+{
+  return i | (i << 11);
+}
+
+int
+or_shift_s8 (signed char i)
+{
+  return i | (i << 11);
+}
+
+int
+or_shift_s13 (int i)
+{
+  i = (i << 19) >> 19;
+  return i | (i << 11);
+}
+
+int
+or_shift_s3 (int i)
+{
+  i = (i << 29) >> 29;
+  return i | (i << 11);
+}
+
+int
+or_shift_u8_asm (unsigned char x)
+{
+  unsigned char i = x;
+  asm volatile ("" : "+r" (i));
+  return i | (i << 11);
+}


        Jakub

Reply via email to