Jakub Jelinek <ja...@redhat.com> writes:
> On Wed, Apr 14, 2021 at 05:31:23PM +0100, Richard Sandiford wrote:
>> > +(define_split
>> > +  [(set (match_operand:GPI 0 "register_operand")
>> > +  (LOGICAL:GPI
>> > +    (and:GPI (ashift:GPI (match_operand:GPI 1 "register_operand")
>> > +                         (match_operand:QI 2 "aarch64_shift_imm_<mode>"))
>> > +             (match_operand:GPI 4 "const_int_operand"))
>> > +    (zero_extend:GPI (match_operand 3 "register_operand"))))]
>> > +  "can_create_pseudo_p ()
>> > +   && REG_P (operands[1])
>> > +   && REG_P (operands[3])
>> > +   && REGNO (operands[1]) == REGNO (operands[3])
>> > +   && ((unsigned HOST_WIDE_INT)
>> > +       trunc_int_for_mode (GET_MODE_MASK (GET_MODE (operands[3]))
>> > +                     << INTVAL (operands[2]), <MODE>mode)
>> > +       == UINTVAL (operands[4]))"
>> 
>> IMO this would be easier to understand as:
>> 
>>    && (trunc_int_for_mode (GET_MODE_MASK (GET_MODE (operands[3]))
>>                         << INTVAL (operands[2]), <MODE>mode)
>>        == INTVAL (operands[4]))
>> 
>> (At first I thought the cast and UINTVAL were trying to escape the
>> sign-extension canonicalisation.)
>
> It is ok to write it that way, you're right, I wrote it with
> UINTVAL etc. because I initially didn't use trunc_int_for_mode
> but that is wrong for SImode if the mask is shifted into bit 31.
>
>> I'm not sure about this one though.  The REGNO checks mean that this is
>> effectively for hard registers only.  I thought one of the reasons for
>> make_more_copies was to avoid combining hard registers like this, so I'm
>> not sure we should have a pattern that specifically targets them.
>> 
>> Segher, have I misunderstood?
>
> Yes, this one works only with the hard regs, the problem is that when
> the hard regs are there, combiner doesn't try anything else, so without
> such splitter it punts on that.
> If I add yet another testcase which doesn't have hard registers, like:
> unsigned
> or_shift2 (void)
> {
>   unsigned char i = 0;
>   asm volatile ("" : "+r" (i));
>   return i | (i << 11);
> }
> then my patch doesn't handle that case, and the only splitter that would
> help would need to deal with:
> (set (reg/i:SI 0 x0)
>     (ior:SI (and:SI (ashift:SI (subreg:SI (reg:QI 97 [ i ]) 0)
>                 (const_int 11 [0xb]))
>             (const_int 522240 [0x7f800]))
>         (zero_extend:SI (reg:QI 97 [ i ]))))
> I have added another combine splitter for this below.  But as you can
> see, what combiner simplification comes with isn't really consistent
> and orthogonal, different operations in there look quite differently :(.

Hmm, OK.  Still, the above looks reasonable on first principles.

>> These two look good to me apart from the cast nit.  The last one feels
>> like it's more general than just sign_extends though.  I guess it would
>> work for any duplicated operation that can be performed in a single
>> instruction.
>
> True, but only very small portion of them can actually make it through,
> it needs something that combine has been able to propagate into another
> instruction.  So if we know about other insns that would look the same
> and would actually be ever matched, we can e.g. define an operator predicate
> for it, but until we have testcases for that, not sure it is worth it.
>
> Here is an updated patch that handles also the zero extends without hard
> registers and doesn't have the UHWI casts (but untested for now except
> for the testcase):
>
> 2021-04-14  Jakub Jelinek  <ja...@redhat.com>
>
>       PR target/100056
>       * config/aarch64/aarch64.md (*<LOGICAL:optab>_<SHIFT:optab><mode>3):
>       Add combine splitters for *<LOGICAL:optab>_ashl<mode>3 with
>       ZERO_EXTEND, SIGN_EXTEND or AND.
>
>       * gcc.target/aarch64/pr100056.c: New test.
>
> --- gcc/config/aarch64/aarch64.md.jj  2021-04-13 20:41:45.030040848 +0200
> +++ gcc/config/aarch64/aarch64.md     2021-04-14 19:07:41.641623978 +0200
> @@ -4431,6 +4431,75 @@ (define_insn "*<LOGICAL:optab>_<SHIFT:op
>    [(set_attr "type" "logic_shift_imm")]
>  )
>  
> +(define_split
> +  [(set (match_operand:GPI 0 "register_operand")
> +     (LOGICAL:GPI
> +       (and:GPI (ashift:GPI (match_operand:GPI 1 "register_operand")
> +                            (match_operand:QI 2 "aarch64_shift_imm_<mode>"))
> +                (match_operand:GPI 4 "const_int_operand"))
> +       (zero_extend:GPI (match_operand 3 "register_operand"))))]
> +  "can_create_pseudo_p ()
> +   && REG_P (operands[1])
> +   && REG_P (operands[3])
> +   && REGNO (operands[1]) == REGNO (operands[3])
> +   && (trunc_int_for_mode (GET_MODE_MASK (GET_MODE (operands[3]))
> +                        << INTVAL (operands[2]), <MODE>mode)
> +       == INTVAL (operands[4]))"
> +  [(set (match_dup 4) (zero_extend:GPI (match_dup 3)))
> +   (set (match_dup 0) (LOGICAL:GPI (ashift:GPI (match_dup 4) (match_dup 2))
> +                                (match_dup 4)))]
> +  "operands[4] = gen_reg_rtx (<MODE>mode);"
> +)
> +
> +(define_split
> +  [(set (match_operand:GPI 0 "register_operand")
> +     (LOGICAL:GPI
> +       (and:GPI (ashift:GPI (match_operator:GPI 4 "subreg_lowpart_operator"
> +                              [(match_operand 1 "register_operand")])
> +                            (match_operand:QI 2 "aarch64_shift_imm_<mode>"))
> +                (match_operand:GPI 3 "const_int_operand"))
> +       (zero_extend:GPI (match_dup 1))))]
> +  "can_create_pseudo_p ()
> +   && (trunc_int_for_mode (GET_MODE_MASK (GET_MODE (operands[1]))
> +                        << INTVAL (operands[2]), <MODE>mode)
> +       == INTVAL (operands[3]))"
> +  [(set (match_dup 4) (zero_extend:GPI (match_dup 1)))
> +   (set (match_dup 0) (LOGICAL:GPI (ashift:GPI (match_dup 4) (match_dup 2))
> +                                (match_dup 4)))]
> +  "operands[4] = gen_reg_rtx (<MODE>mode);"

Minor, but it might be better to use operand 5 for the temporary here.
Otherwise this looks good apart from the open question about whether
we should be doing complex combinations involving hard regs.  Let's see
what Segher says about that side.

Thanks,
Richard

> +)
> +
> +(define_split
> +  [(set (match_operand:GPI 0 "register_operand")
> +     (LOGICAL:GPI
> +       (and:GPI (ashift:GPI (match_operand:GPI 1 "register_operand")
> +                            (match_operand:QI 2 "aarch64_shift_imm_<mode>"))
> +                (match_operand:GPI 4 "const_int_operand"))
> +       (and:GPI (match_dup 1) (match_operand:GPI 3 "const_int_operand"))))]
> +  "can_create_pseudo_p ()
> +   && pow2_or_zerop (UINTVAL (operands[3]) + 1)
> +   && (trunc_int_for_mode (UINTVAL (operands[3])
> +                        << INTVAL (operands[2]), <MODE>mode)
> +       == INTVAL (operands[4]))"
> +  [(set (match_dup 4) (and:GPI (match_dup 1) (match_dup 3)))
> +   (set (match_dup 0) (LOGICAL:GPI (ashift:GPI (match_dup 4) (match_dup 2))
> +                                (match_dup 4)))]
> +  "operands[4] = gen_reg_rtx (<MODE>mode);"
> +)
> +
> +(define_split
> +  [(set (match_operand:GPI 0 "register_operand")
> +     (LOGICAL:GPI
> +       (ashift:GPI (sign_extend:GPI (match_operand 1 "register_operand"))
> +                   (match_operand:QI 2 "aarch64_shift_imm_<mode>"))
> +       (sign_extend:GPI (match_dup 1))))]
> +  "can_create_pseudo_p ()"
> +  [(set (match_dup 4) (sign_extend:GPI (match_dup 1)))
> +   (set (match_dup 0) (LOGICAL:GPI (ashift:GPI (match_dup 4) (match_dup 2))
> +                                (match_dup 4)))]
> +  "operands[4] = gen_reg_rtx (<MODE>mode);"
> +)
> +
>  (define_insn "*<optab>_rol<mode>3"
>    [(set (match_operand:GPI 0 "register_operand" "=r")
>       (LOGICAL:GPI (rotate:GPI
> --- gcc/testsuite/gcc.target/aarch64/pr100056.c.jj    2021-04-14 
> 18:54:25.885626705 +0200
> +++ gcc/testsuite/gcc.target/aarch64/pr100056.c       2021-04-14 
> 19:00:00.837828080 +0200
> @@ -0,0 +1,58 @@
> +/* PR target/100056 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +/* { dg-final { scan-assembler-not {\t[us]bfiz\tw[0-9]+, w[0-9]+, 11} } } */
> +
> +int
> +or_shift_u8 (unsigned char i)
> +{
> +  return i | (i << 11);
> +}
> +
> +int
> +or_shift_u3a (unsigned i)
> +{
> +  i &= 7;
> +  return i | (i << 11);
> +}
> +
> +int
> +or_shift_u3b (unsigned i)
> +{
> +  i = (i << 29) >> 29;
> +  return i | (i << 11);
> +}
> +
> +int
> +or_shift_s16 (signed short i)
> +{
> +  return i | (i << 11);
> +}
> +
> +int
> +or_shift_s8 (signed char i)
> +{
> +  return i | (i << 11);
> +}
> +
> +int
> +or_shift_s13 (int i)
> +{
> +  i = (i << 19) >> 19;
> +  return i | (i << 11);
> +}
> +
> +int
> +or_shift_s3 (int i)
> +{
> +  i = (i << 29) >> 29;
> +  return i | (i << 11);
> +}
> +
> +int
> +or_shift_u8_asm (unsigned char x)
> +{
> +  unsigned char i = x;
> +  asm volatile ("" : "+r" (i));
> +  return i | (i << 11);
> +}
>
>
>       Jakub

Reply via email to