*ping* Thanks, James
On Tue, Jun 24, 2014 at 09:45:28AM +0100, James Greenhalgh wrote: > > Hi, > > vec_concat ( { a, b }, { c, d }) should give a new vector { a, b, c, d }. > > On big-endian aarch64 targets, we have to think carefully about what this > means as we map GCC's view of endian-ness on to ours. GCC (for reasons I have > yet to understand) likes to describe lane-extracts from a vector as > endian-ness dependant bit-field extracts. This cause major headaches, and > means we have to pretend throughout the backend that lane zero is at the > high bits of a vector register. > > When we have a machine instruction which zeroes the high bits of a vector > register, and we want to describe it in RTL, the natural little-endian view is > vec_concat ( operand, zeroes ). The reality described above implies that the > correct description on big-endian systems is vec_concat ( zeroes, operand ). > > This also affects arm_neon.h intrinsics. When we say vcombine (a, b) we mean > that a should occupy the low 64-bits and b the high 64 bits. We therefore > need to take care to swap the operands to vec_concat when we are targeting > big-endian. > > This patch is messy, but it gives an notable improvement in the PASS > rates for an internal testsuite for Neon intrinsics. > > Tested on aarch64-none-elf and aarch64_be-none-elf with no issues, but no > improvements either. > > OK for trunk? > > Thanks, > James > > --- > gcc/ > > 2014-06-20 James Greenhalgh <james.greenha...@arm.com> > > * config/aarch64/aarch64-simd.md (move_lo_quad_internal_<mode>): New. > (move_lo_quad_internal_be_<mode>): Likewise. > (move_lo_quad_<mode>): Convert to define_expand. > (aarch64_simd_move_hi_quad_<mode>): Gate on BYTES_BIG_ENDIAN. > (aarch64_simd_move_hi_quad_be_<mode>): New. > (move_hi_quad_<mode>): Use appropriate insn for BYTES_BIG_ENDIAN. > (aarch64_combinez<mode>): Gate on BYTES_BIG_ENDIAN. > (aarch64_combinez_be<mode>): New. > (aarch64_combine<mode>): Convert to define_expand. > (aarch64_combine_internal<mode>): New. > (aarch64_simd_combine<mode>): Remove bogus RTL description. > > diff --git a/gcc/config/aarch64/aarch64-simd.md > b/gcc/config/aarch64/aarch64-simd.md > index > 6b81d811b70bd157207f7753027309442ec9e8b5..00e2206b200fd32c6df5987d7317687488e8dadd > 100644 > --- a/gcc/config/aarch64/aarch64-simd.md > +++ b/gcc/config/aarch64/aarch64-simd.md > @@ -942,14 +942,38 @@ (define_insn "<su><maxmin><mode>3" > [(set_attr "type" "neon_minmax<q>")] > ) > > -;; Move into low-half clearing high half to 0. > +;; vec_concat gives a new vector with the low elements from operand 1, and > +;; the high elements from operand 2. That is to say, given op1 = { a, b } > +;; op2 = { c, d }, vec_concat (op1, op2) = { a, b, c, d }. > +;; What that means, is that the RTL descriptions of the below patterns > +;; need to change depending on endianness. > + > +;; Move to the low architectural bits of the register. > +;; On little-endian this is { operand, zeroes } > +;; On big-endian this is { zeroes, operand } > > -(define_insn "move_lo_quad_<mode>" > +(define_insn "move_lo_quad_internal_<mode>" > [(set (match_operand:VQ 0 "register_operand" "=w,w,w") > (vec_concat:VQ > (match_operand:<VHALF> 1 "register_operand" "w,r,r") > (vec_duplicate:<VHALF> (const_int 0))))] > - "TARGET_SIMD" > + "TARGET_SIMD && !BYTES_BIG_ENDIAN" > + "@ > + dup\\t%d0, %1.d[0] > + fmov\\t%d0, %1 > + dup\\t%d0, %1" > + [(set_attr "type" "neon_dup<q>,f_mcr,neon_dup<q>") > + (set_attr "simd" "yes,*,yes") > + (set_attr "fp" "*,yes,*") > + (set_attr "length" "4")] > +) > + > +(define_insn "move_lo_quad_internal_be_<mode>" > + [(set (match_operand:VQ 0 "register_operand" "=w,w,w") > + (vec_concat:VQ > + (vec_duplicate:<VHALF> (const_int 0)) > + (match_operand:<VHALF> 1 "register_operand" "w,r,r")))] > + "TARGET_SIMD && BYTES_BIG_ENDIAN" > "@ > dup\\t%d0, %1.d[0] > fmov\\t%d0, %1 > @@ -960,7 +984,23 @@ (define_insn "move_lo_quad_<mode>" > (set_attr "length" "4")] > ) > > -;; Move into high-half. > +(define_expand "move_lo_quad_<mode>" > + [(match_operand:VQ 0 "register_operand") > + (match_operand:VQ 1 "register_operand")] > + "TARGET_SIMD" > +{ > + if (BYTES_BIG_ENDIAN) > + emit_insn (gen_move_lo_quad_internal_be_<mode> (operands[0], > operands[1])); > + else > + emit_insn (gen_move_lo_quad_internal_<mode> (operands[0], operands[1])); > + DONE; > +} > +) > + > +;; Move operand1 to the high architectural bits of the register, keeping > +;; the low architectural bits of operand2. > +;; For little-endian this is { operand2, operand1 } > +;; For big-endian this is { operand1, operand2 } > > (define_insn "aarch64_simd_move_hi_quad_<mode>" > [(set (match_operand:VQ 0 "register_operand" "+w,w") > @@ -969,12 +1009,25 @@ (define_insn "aarch64_simd_move_hi_quad_ > (match_dup 0) > (match_operand:VQ 2 "vect_par_cnst_lo_half" "")) > (match_operand:<VHALF> 1 "register_operand" "w,r")))] > - "TARGET_SIMD" > + "TARGET_SIMD && !BYTES_BIG_ENDIAN" > "@ > ins\\t%0.d[1], %1.d[0] > ins\\t%0.d[1], %1" > - [(set_attr "type" "neon_ins") > - (set_attr "length" "4")] > + [(set_attr "type" "neon_ins")] > +) > + > +(define_insn "aarch64_simd_move_hi_quad_be_<mode>" > + [(set (match_operand:VQ 0 "register_operand" "+w,w") > + (vec_concat:VQ > + (match_operand:<VHALF> 1 "register_operand" "w,r") > + (vec_select:<VHALF> > + (match_dup 0) > + (match_operand:VQ 2 "vect_par_cnst_hi_half" ""))))] > + "TARGET_SIMD && BYTES_BIG_ENDIAN" > + "@ > + ins\\t%0.d[1], %1.d[0] > + ins\\t%0.d[1], %1" > + [(set_attr "type" "neon_ins")] > ) > > (define_expand "move_hi_quad_<mode>" > @@ -982,9 +1035,13 @@ (define_expand "move_hi_quad_<mode>" > (match_operand:<VHALF> 1 "register_operand" "")] > "TARGET_SIMD" > { > - rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, false); > - emit_insn (gen_aarch64_simd_move_hi_quad_<mode> (operands[0], > - operands[1], p)); > + rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, BYTES_BIG_ENDIAN); > + if (BYTES_BIG_ENDIAN) > + emit_insn (gen_aarch64_simd_move_hi_quad_be_<mode> (operands[0], > + operands[1], p)); > + else > + emit_insn (gen_aarch64_simd_move_hi_quad_<mode> (operands[0], > + operands[1], p)); > DONE; > }) > > @@ -2338,12 +2395,44 @@ (define_insn "*aarch64_combinez<mode>" > (vec_concat:<VDBL> > (match_operand:VDIC 1 "register_operand" "w") > (match_operand:VDIC 2 "aarch64_simd_imm_zero" "Dz")))] > - "TARGET_SIMD" > + "TARGET_SIMD && !BYTES_BIG_ENDIAN" > "mov\\t%0.8b, %1.8b" > [(set_attr "type" "neon_move<q>")] > ) > > -(define_insn_and_split "aarch64_combine<mode>" > +(define_insn "*aarch64_combinez_be<mode>" > + [(set (match_operand:<VDBL> 0 "register_operand" "=&w") > + (vec_concat:<VDBL> > + (match_operand:VDIC 2 "aarch64_simd_imm_zero" "Dz") > + (match_operand:VDIC 1 "register_operand" "w")))] > + "TARGET_SIMD && BYTES_BIG_ENDIAN" > + "mov\\t%0.8b, %1.8b" > + [(set_attr "type" "neon_move<q>")] > +) > + > +(define_expand "aarch64_combine<mode>" > + [(match_operand:<VDBL> 0 "register_operand") > + (match_operand:VDC 1 "register_operand") > + (match_operand:VDC 2 "register_operand")] > + "TARGET_SIMD" > +{ > + rtx op1, op2; > + if (BYTES_BIG_ENDIAN) > + { > + op1 = operands[2]; > + op2 = operands[1]; > + } > + else > + { > + op1 = operands[1]; > + op2 = operands[2]; > + } > + emit_insn (gen_aarch64_combine_internal<mode> (operands[0], op1, op2)); > + DONE; > +} > +) > + > +(define_insn_and_split "aarch64_combine_internal<mode>" > [(set (match_operand:<VDBL> 0 "register_operand" "=&w") > (vec_concat:<VDBL> (match_operand:VDC 1 "register_operand" "w") > (match_operand:VDC 2 "register_operand" "w")))] > @@ -2352,16 +2441,19 @@ (define_insn_and_split "aarch64_combine< > "&& reload_completed" > [(const_int 0)] > { > - aarch64_split_simd_combine (operands[0], operands[1], operands[2]); > + if (BYTES_BIG_ENDIAN) > + aarch64_split_simd_combine (operands[0], operands[2], operands[1]); > + else > + aarch64_split_simd_combine (operands[0], operands[1], operands[2]); > DONE; > } > [(set_attr "type" "multiple")] > ) > > (define_expand "aarch64_simd_combine<mode>" > - [(set (match_operand:<VDBL> 0 "register_operand" "=&w") > - (vec_concat:<VDBL> (match_operand:VDC 1 "register_operand" "w") > - (match_operand:VDC 2 "register_operand" "w")))] > + [(match_operand:<VDBL> 0 "register_operand") > + (match_operand:VDC 1 "register_operand") > + (match_operand:VDC 2 "register_operand")] > "TARGET_SIMD" > { > emit_insn (gen_move_lo_quad_<Vdbl> (operands[0], operands[1]));