Hi Michael,

On 19 May 2017 at 09:21, Richard Sandiford <richard.sandif...@linaro.org> wrote:
> Thanks for doing this.  Just a couple of comments about the .md stuff:
>
> Michael Collison <michael.colli...@arm.com> writes:
>> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
>> index 5adc5ed..c6ae670 100644
>> --- a/gcc/config/aarch64/aarch64.md
>> +++ b/gcc/config/aarch64/aarch64.md
>> @@ -3999,6 +3999,92 @@
>>   }
>>  )
>>
>> +;; When the LSL, LSR, ASR, ROR instructions operate on all register arguments
>> +;; they truncate the shift/rotate amount by the size of the registers they
>> +;; operate on: 32 for W-regs, 63 for X-regs.  This allows us to optimise away
>> +;; such redundant masking instructions.  GCC can do that automatically when
>> +;; SHIFT_COUNT_TRUNCATED is true, but we can't enable it for TARGET_SIMD
>> +;; because some of the SISD shift alternatives don't perform this truncations.
>> +;; So this pattern exists to catch such cases.
>> +
>> +(define_insn "*aarch64_<optab>_reg_<mode>3_mask1"
>> +  [(set (match_operand:GPI 0 "register_operand" "=r")
>> +        (SHIFT:GPI
>> +          (match_operand:GPI 1 "register_operand" "r")
>> +          (subreg:QI (and:GPI
>> +                       (match_operand:GPI 2 "register_operand" "r")
>> +                       (match_operand 3 "const_int_operand" "n")) 0)))]
>> +  "(~INTVAL (operands[3]) & (GET_MODE_BITSIZE (<MODE>mode)-1)) == 0"
>> +  "<shift>\t%<w>0, %<w>1, %<w>2"
>> +  [(set_attr "type" "shift_reg")]
>> +)
>
> (subreg:QI (...) 0) is only correct for little endian.  For big endian
> it needs to be 3 for SI or 7 for DI.  You could probably handle that
> using something like:
>
>   (match_operator:QI 2 "lowpart_subreg"
>     [(and:GPI ...)])
>
> and defining a lowpart_subreg predicate that checks the SUBREG_BYTE.
> Or just leave the subreg as-is and add !BYTES_BIG_ENDIAN to the insn
> condition.

I can confirm that the new tests pass on little-endian, but fail on big.

Thanks,

Christophe

>> +(define_insn_and_split "*aarch64_reg_<mode>3_neg_mask2"
>> +  [(set (match_operand:GPI 0 "register_operand" "=r")
>> +        (SHIFT:GPI
>> +          (match_operand:GPI 1 "register_operand" "r")
>> +          (subreg:QI (neg:SI (and:SI
>> +                               (match_operand:SI 2 "register_operand" "r")
>> +                               (match_operand 3 "const_int_operand" "n"))) 0)))]
>> +  "((~INTVAL (operands[3]) & (GET_MODE_BITSIZE (<MODE>mode)-1)) == 0)
>> +   && can_create_pseudo_p ()"
>> +  "#"
>> +  "&& true"
>> +  [(const_int 0)]
>> +  {
>> +    rtx tmp = gen_reg_rtx (SImode);
>> +
>> +    emit_insn (gen_negsi2 (tmp, operands[2]));
>> +    rtx tmp2 = simplify_gen_subreg (QImode, tmp, SImode, 0);
>> +    emit_insn (gen_<optab><mode>3 (operands[0], operands[1], tmp2));
>> +    DONE;
>> +  }
>> +)
>
> Insn patterns shouldn't check can_create_pseudo_p, because there's no
> guarantee that the associated split happens before RA.  In this case it
> should be safe to reuse operand 0 after RA if you change it to:
>
>   [(set (match_operand:GPI 0 "register_operand" "=&r")
>         ...)]
>
> and then:
>
>   rtx tmp = (can_create_pseudo_p ()
>              ? gen_reg_rtx (SImode)
>              : gen_lowpart (SImode, operands[0]));
>
> Thanks,
> Richard
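
For reference, a guess at what the lowpart_subreg predicate Richard mentions
might look like.  The predicate name, its placement in predicates.md, and the
use of subreg_lowpart_p are all assumptions on my part, untested:

  ;; Match a subreg whose SUBREG_BYTE selects the low part of the inner value,
  ;; so the pattern works for both little- and big-endian byte offsets.
  (define_predicate "lowpart_subreg"
    (match_code "subreg")
  {
    return subreg_lowpart_p (op);
  })

And a rough, untested sketch of how the second pattern might look with
Richard's two fragments folded in: can_create_pseudo_p dropped from the insn
condition, operand 0 made earlyclobber, and the temporary chosen accordingly.
The (subreg ... 0) byte offset is left as in the original patch, so this
sketch still assumes little-endian:

  (define_insn_and_split "*aarch64_reg_<mode>3_neg_mask2"
    [(set (match_operand:GPI 0 "register_operand" "=&r")
          (SHIFT:GPI
            (match_operand:GPI 1 "register_operand" "r")
            (subreg:QI (neg:SI (and:SI
                                 (match_operand:SI 2 "register_operand" "r")
                                 (match_operand 3 "const_int_operand" "n"))) 0)))]
    "(~INTVAL (operands[3]) & (GET_MODE_BITSIZE (<MODE>mode)-1)) == 0"
    "#"
    "&& true"
    [(const_int 0)]
    {
      /* Before register allocation use a fresh pseudo; after RA the
         earlyclobbered destination cannot overlap the inputs, so it can
         safely serve as the temporary.  */
      rtx tmp = (can_create_pseudo_p ()
                 ? gen_reg_rtx (SImode)
                 : gen_lowpart (SImode, operands[0]));

      emit_insn (gen_negsi2 (tmp, operands[2]));
      rtx tmp2 = simplify_gen_subreg (QImode, tmp, SImode, 0);
      emit_insn (gen_<optab><mode>3 (operands[0], operands[1], tmp2));
      DONE;
    }
  )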