On 16/06/17 22:08, Michael Collison wrote: > This patch improves code generation for literal vector construction by > expanding and exposing the pattern to rtl optimization earlier. The current > implementation delays splitting the pattern until after reload which results > in poor code generation for the following code: > > > #include "arm_neon.h" > > int16x8_t > foo () > { > return vcombine_s16 (vdup_n_s16 (0), vdup_n_s16 (8)); > } > > Trunk generates: > > foo: > movi v1.2s, 0 > movi v0.4h, 0x8 > dup d2, v1.d[0] > ins v2.d[1], v0.d[0] > orr v0.16b, v2.16b, v2.16b > ret > > With the patch we now generate: > > foo: > movi v1.4h, 0x8 > movi v0.4s, 0 > ins v0.d[1], v1.d[0] > ret > > Bootstrapped and tested on aarch64-linux-gnu. Okay for trunk. > > 2017-06-15 Michael Collison <michael.colli...@arm.com> > > * config/aarch64/aarch64-simd.md(aarch64_combine_internal<mode>): > Convert from define_insn_and_split into define_expand > * config/aarch64/aarch64.c(aarch64_split_simd_combine): > Allow register and subreg operands. >
Your changelog entry is confusing. You've deleted the aarch64_combine_internal<mode> pattern entirely, having merged some of its functionality directly into its caller (aarch64_combine<mode>). So I think it should read: * config/aarch64/aarch64-simd.md (aarch64_combine<mode>): Directly call aarch64_split_simd_combine. (aarch64_combine_internal<mode>): Delete pattern. * ... Note also there should be a space between the file name and the open bracket for the first function name. Why don't you need the big-endian code path any more? R. > > pr7057.patch > > > diff --git a/gcc/config/aarch64/aarch64-simd.md > b/gcc/config/aarch64/aarch64-simd.md > index c462164..4a253a9 100644 > --- a/gcc/config/aarch64/aarch64-simd.md > +++ b/gcc/config/aarch64/aarch64-simd.md > @@ -2807,27 +2807,11 @@ > op1 = operands[1]; > op2 = operands[2]; > } > - emit_insn (gen_aarch64_combine_internal<mode> (operands[0], op1, op2)); > - DONE; > -} > -) > > -(define_insn_and_split "aarch64_combine_internal<mode>" > - [(set (match_operand:<VDBL> 0 "register_operand" "=&w") > - (vec_concat:<VDBL> (match_operand:VDC 1 "register_operand" "w") > - (match_operand:VDC 2 "register_operand" "w")))] > - "TARGET_SIMD" > - "#" > - "&& reload_completed" > - [(const_int 0)] > -{ > - if (BYTES_BIG_ENDIAN) > - aarch64_split_simd_combine (operands[0], operands[2], operands[1]); > - else > - aarch64_split_simd_combine (operands[0], operands[1], operands[2]); > + aarch64_split_simd_combine (operands[0], op1, op2); > + > DONE; > } > -[(set_attr "type" "multiple")] > ) > > (define_expand "aarch64_simd_combine<mode>" > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c > index 2e385c4..46bd78b 100644 > --- a/gcc/config/aarch64/aarch64.c > +++ b/gcc/config/aarch64/aarch64.c > @@ -1650,7 +1650,8 @@ aarch64_split_simd_combine (rtx dst, rtx src1, rtx src2) > > gcc_assert (VECTOR_MODE_P (dst_mode)); > > - if (REG_P (dst) && REG_P (src1) && REG_P (src2)) > + if (register_operand (dst, dst_mode) && register_operand (src1, src_mode) > + && register_operand (src2, src_mode)) > { > rtx (*gen) (rtx, rtx, rtx); > >