*ping*

Thanks,
James

On Tue, Jun 24, 2014 at 09:45:28AM +0100, James Greenhalgh wrote:
> 
> Hi,
> 
> vec_concat ( { a, b }, { c, d }) should give a new vector { a, b, c, d }.
> 
> On big-endian aarch64 targets, we have to think carefully about what this
> means as we map GCC's view of endianness onto ours. GCC (for reasons I have
> yet to understand) likes to describe lane extracts from a vector as
> endianness-dependent bit-field extracts. This causes major headaches, and
> means we have to pretend throughout the backend that lane zero is at the
> high bits of a vector register.
> 
> When we have a machine instruction which zeroes the high bits of a vector
> register, and we want to describe it in RTL, the natural little-endian view is
> vec_concat ( operand, zeroes ). The reality described above implies that the
> correct description on big-endian systems is vec_concat ( zeroes, operand ).
> 
> This also affects arm_neon.h intrinsics. When we say vcombine (a, b) we mean
> that a should occupy the low 64 bits and b the high 64 bits. We therefore
> need to take care to swap the operands to vec_concat when we are targeting
> big-endian.
> 
> This patch is messy, but it gives a notable improvement in the PASS
> rates for an internal testsuite for Neon intrinsics.
> 
> Tested on aarch64-none-elf and aarch64_be-none-elf with no issues, but no
> improvements either.
> 
> OK for trunk?
> 
> Thanks,
> James
> 
> ---
> gcc/
> 
> 2014-06-20  James Greenhalgh  <james.greenha...@arm.com>
> 
>       * config/aarch64/aarch64-simd.md (move_lo_quad_internal_<mode>): New.
>       (move_lo_quad_internal_be_<mode>): Likewise.
>       (move_lo_quad_<mode>): Convert to define_expand.
>       (aarch64_simd_move_hi_quad_<mode>): Gate on BYTES_BIG_ENDIAN.
>       (aarch64_simd_move_hi_quad_be_<mode>): New.
>       (move_hi_quad_<mode>): Use appropriate insn for BYTES_BIG_ENDIAN.
>       (aarch64_combinez<mode>): Gate on BYTES_BIG_ENDIAN.
>       (aarch64_combinez_be<mode>): New.
>       (aarch64_combine<mode>): Convert to define_expand.
>       (aarch64_combine_internal<mode>): New.
>       (aarch64_simd_combine<mode>): Remove bogus RTL description.
> 

> diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
> index 6b81d811b70bd157207f7753027309442ec9e8b5..00e2206b200fd32c6df5987d7317687488e8dadd 100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -942,14 +942,38 @@ (define_insn "<su><maxmin><mode>3"
>    [(set_attr "type" "neon_minmax<q>")]
>  )
>  
> -;; Move into low-half clearing high half to 0.
> +;; vec_concat gives a new vector with the low elements from operand 1, and
> +;; the high elements from operand 2.  That is to say, given op1 = { a, b }
> +;; op2 = { c, d }, vec_concat (op1, op2) = { a, b, c, d }.
> +;; What that means is that the RTL descriptions of the below patterns
> +;; need to change depending on endianness.
> +
> +;; Move to the low architectural bits of the register.
> +;; On little-endian this is { operand, zeroes }
> +;; On big-endian this is { zeroes, operand }
>  
> -(define_insn "move_lo_quad_<mode>"
> +(define_insn "move_lo_quad_internal_<mode>"
>    [(set (match_operand:VQ 0 "register_operand" "=w,w,w")
>          (vec_concat:VQ
>         (match_operand:<VHALF> 1 "register_operand" "w,r,r")
>         (vec_duplicate:<VHALF> (const_int 0))))]
> -  "TARGET_SIMD"
> +  "TARGET_SIMD && !BYTES_BIG_ENDIAN"
> +  "@
> +   dup\\t%d0, %1.d[0]
> +   fmov\\t%d0, %1
> +   dup\\t%d0, %1"
> +  [(set_attr "type" "neon_dup<q>,f_mcr,neon_dup<q>")
> +   (set_attr "simd" "yes,*,yes")
> +   (set_attr "fp" "*,yes,*")
> +   (set_attr "length" "4")]
> +)
> +
> +(define_insn "move_lo_quad_internal_be_<mode>"
> +  [(set (match_operand:VQ 0 "register_operand" "=w,w,w")
> +        (vec_concat:VQ
> +       (vec_duplicate:<VHALF> (const_int 0))
> +       (match_operand:<VHALF> 1 "register_operand" "w,r,r")))]
> +  "TARGET_SIMD && BYTES_BIG_ENDIAN"
>    "@
>     dup\\t%d0, %1.d[0]
>     fmov\\t%d0, %1
> @@ -960,7 +984,23 @@ (define_insn "move_lo_quad_<mode>"
>     (set_attr "length" "4")]
>  )
>  
> -;; Move into high-half.
> +(define_expand "move_lo_quad_<mode>"
> +  [(match_operand:VQ 0 "register_operand")
> +   (match_operand:VQ 1 "register_operand")]
> +  "TARGET_SIMD"
> +{
> +  if (BYTES_BIG_ENDIAN)
> +    emit_insn (gen_move_lo_quad_internal_be_<mode> (operands[0],
> +                                                 operands[1]));
> +  else
> +    emit_insn (gen_move_lo_quad_internal_<mode> (operands[0], operands[1]));
> +  DONE;
> +}
> +)
> +
> +;; Move operand1 to the high architectural bits of the register, keeping
> +;; the low architectural bits of operand2.
> +;; For little-endian this is { operand2, operand1 }
> +;; For big-endian this is { operand1, operand2 }
>  
>  (define_insn "aarch64_simd_move_hi_quad_<mode>"
>    [(set (match_operand:VQ 0 "register_operand" "+w,w")
> @@ -969,12 +1009,25 @@ (define_insn "aarch64_simd_move_hi_quad_
>                  (match_dup 0)
>                  (match_operand:VQ 2 "vect_par_cnst_lo_half" ""))
>         (match_operand:<VHALF> 1 "register_operand" "w,r")))]
> -  "TARGET_SIMD"
> +  "TARGET_SIMD && !BYTES_BIG_ENDIAN"
>    "@
>     ins\\t%0.d[1], %1.d[0]
>     ins\\t%0.d[1], %1"
> -  [(set_attr "type" "neon_ins")
> -   (set_attr "length" "4")]
> +  [(set_attr "type" "neon_ins")]
> +)
> +
> +(define_insn "aarch64_simd_move_hi_quad_be_<mode>"
> +  [(set (match_operand:VQ 0 "register_operand" "+w,w")
> +        (vec_concat:VQ
> +       (match_operand:<VHALF> 1 "register_operand" "w,r")
> +          (vec_select:<VHALF>
> +                (match_dup 0)
> +                (match_operand:VQ 2 "vect_par_cnst_hi_half" ""))))]
> +  "TARGET_SIMD && BYTES_BIG_ENDIAN"
> +  "@
> +   ins\\t%0.d[1], %1.d[0]
> +   ins\\t%0.d[1], %1"
> +  [(set_attr "type" "neon_ins")]
>  )
>  
>  (define_expand "move_hi_quad_<mode>"
> @@ -982,9 +1035,13 @@ (define_expand "move_hi_quad_<mode>"
>    (match_operand:<VHALF> 1 "register_operand" "")]
>   "TARGET_SIMD"
>  {
> -  rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, false);
> -  emit_insn (gen_aarch64_simd_move_hi_quad_<mode> (operands[0],
> -                                                operands[1], p));
> +  rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, BYTES_BIG_ENDIAN);
> +  if (BYTES_BIG_ENDIAN)
> +    emit_insn (gen_aarch64_simd_move_hi_quad_be_<mode> (operands[0],
> +                 operands[1], p));
> +  else
> +    emit_insn (gen_aarch64_simd_move_hi_quad_<mode> (operands[0],
> +                 operands[1], p));
>    DONE;
>  })
>  
> @@ -2338,12 +2395,44 @@ (define_insn "*aarch64_combinez<mode>"
>          (vec_concat:<VDBL>
>          (match_operand:VDIC 1 "register_operand" "w")
>          (match_operand:VDIC 2 "aarch64_simd_imm_zero" "Dz")))]
> -  "TARGET_SIMD"
> +  "TARGET_SIMD && !BYTES_BIG_ENDIAN"
>    "mov\\t%0.8b, %1.8b"
>    [(set_attr "type" "neon_move<q>")]
>  )
>  
> -(define_insn_and_split "aarch64_combine<mode>"
> +(define_insn "*aarch64_combinez_be<mode>"
> +  [(set (match_operand:<VDBL> 0 "register_operand" "=&w")
> +        (vec_concat:<VDBL>
> +        (match_operand:VDIC 2 "aarch64_simd_imm_zero" "Dz")
> +        (match_operand:VDIC 1 "register_operand" "w")))]
> +  "TARGET_SIMD && BYTES_BIG_ENDIAN"
> +  "mov\\t%0.8b, %1.8b"
> +  [(set_attr "type" "neon_move<q>")]
> +)
> +
> +(define_expand "aarch64_combine<mode>"
> +  [(match_operand:<VDBL> 0 "register_operand")
> +   (match_operand:VDC 1 "register_operand")
> +   (match_operand:VDC 2 "register_operand")]
> +  "TARGET_SIMD"
> +{
> +  rtx op1, op2;
> +  if (BYTES_BIG_ENDIAN)
> +    {
> +      op1 = operands[2];
> +      op2 = operands[1];
> +    }
> +  else
> +    {
> +      op1 = operands[1];
> +      op2 = operands[2];
> +    }
> +  emit_insn (gen_aarch64_combine_internal<mode> (operands[0], op1, op2));
> +  DONE;
> +}
> +)
> +
> +(define_insn_and_split "aarch64_combine_internal<mode>"
>    [(set (match_operand:<VDBL> 0 "register_operand" "=&w")
>          (vec_concat:<VDBL> (match_operand:VDC 1 "register_operand" "w")
>                          (match_operand:VDC 2 "register_operand" "w")))]
> @@ -2352,16 +2441,19 @@ (define_insn_and_split "aarch64_combine<
>    "&& reload_completed"
>    [(const_int 0)]
>  {
> -  aarch64_split_simd_combine (operands[0], operands[1], operands[2]);
> +  if (BYTES_BIG_ENDIAN)
> +    aarch64_split_simd_combine (operands[0], operands[2], operands[1]);
> +  else
> +    aarch64_split_simd_combine (operands[0], operands[1], operands[2]);
>    DONE;
>  }
>  [(set_attr "type" "multiple")]
>  )
>  
>  (define_expand "aarch64_simd_combine<mode>"
> -  [(set (match_operand:<VDBL> 0 "register_operand" "=&w")
> -        (vec_concat:<VDBL> (match_operand:VDC 1 "register_operand" "w")
> -  (match_operand:VDC 2 "register_operand" "w")))]
> +  [(match_operand:<VDBL> 0 "register_operand")
> +   (match_operand:VDC 1 "register_operand")
> +   (match_operand:VDC 2 "register_operand")]
>    "TARGET_SIMD"
>    {
>      emit_insn (gen_move_lo_quad_<Vdbl> (operands[0], operands[1]));

