Re: [PATCH] [AArch64, NEON] Improve vpmaxX & vpminX intrinsics

Tejas Belagod Fri, 16 Jan 2015 03:49:53 -0800

On 14/01/15 07:09, Yangfei (Felix) wrote:

On 09/12/14 08:17, Yangfei (Felix) wrote:

On 28 November 2014 at 09:23, Yangfei (Felix) <felix.y...@huawei.com>

wrote:

Hi,
    This patch converts vpmaxX & vpminX intrinsics to use builtin
functions

instead of the previous inline assembly syntax.

    Regtested with aarch64-linux-gnu on QEMU.  Also passed the
glorious

testsuite of Christophe Lyon.

    OK for the trunk?


Hi Felix,   We know from experience that the advsimd intrinsics tend
to be fragile for big endian and in general it is fairly easy to
break the big endian case.  For these advsimd improvements that you
are working on (that we very much appreciate) it is important to run
both little endian and big endian regressions.

Thanks
/Marcus



Okay.  Any plan for the advsimd big-endian improvement?
I rebased this patch over Alan Lawrance's patch:
https://gcc.gnu.org/ml/gcc-patches/2014-12/msg00279.html
No regressions for aarch64_be-linux-gnu target too.  OK for the thunk?


Index: gcc/ChangeLog

=============================================================
======

--- gcc/ChangeLog       (revision 218464)
+++ gcc/ChangeLog       (working copy)
@@ -1,3 +1,18 @@
+2014-12-09  Felix Yang  <felix.y...@huawei.com>
+
+       * config/aarch64/aarch64-simd.md

(aarch64_<maxmin_uns>p<mode>): New

+       pattern.
+       * config/aarch64/aarch64-simd-builtins.def (smaxp, sminp, umaxp,
+       uminp, smax_nanp, smin_nanp): New builtins.
+       * config/aarch64/arm_neon.h (vpmax_s8, vpmax_s16, vpmax_s32,
+       vpmax_u8, vpmax_u16, vpmax_u32, vpmaxq_s8, vpmaxq_s16,

vpmaxq_s32,

+       vpmaxq_u8, vpmaxq_u16, vpmaxq_u32, vpmax_f32, vpmaxq_f32,

vpmaxq_f64,

+       vpmaxqd_f64, vpmaxs_f32, vpmaxnm_f32, vpmaxnmq_f32,

vpmaxnmq_f64,

+       vpmaxnmqd_f64, vpmaxnms_f32, vpmin_s8, vpmin_s16, vpmin_s32,

vpmin_u8,

+       vpmin_u16, vpmin_u32, vpminq_s8, vpminq_s16, vpminq_s32,

vpminq_u8,

+       vpminq_u16, vpminq_u32, vpmin_f32, vpminq_f32, vpminq_f64,

vpminqd_f64,

+       vpmins_f32, vpminnm_f32, vpminnmq_f32, vpminnmq_f64,
+ vpminnmqd_f64,
+

   __extension__ static __inline float32x2_t __attribute__
((__always_inline__))
Index: gcc/config/aarch64/aarch64-simd.md

=============================================================
======

--- gcc/config/aarch64/aarch64-simd.md  (revision 218464)
+++ gcc/config/aarch64/aarch64-simd.md  (working copy)
@@ -1017,6 +1017,28 @@
     DONE;
   })

+;; Pairwise Integer Max/Min operations.
+(define_insn "aarch64_<maxmin_uns>p<mode>"
+ [(set (match_operand:VDQ_BHSI 0 "register_operand" "=w")
+       (unspec:VDQ_BHSI [(match_operand:VDQ_BHSI 1

"register_operand" "w")

+                        (match_operand:VDQ_BHSI 2 "register_operand"

"w")]

+                       MAXMINV))]
+ "TARGET_SIMD"
+ "<maxmin_uns_op>p\t%0.<Vtype>, %1.<Vtype>, %2.<Vtype>"
+  [(set_attr "type" "neon_minmax<q>")]
+)
+


Hi Felix,

Sorry for the delay in getting back to you on this.

If you've rolled aarch64_reduc_<maxmin_uns>_internalv2si into the above
pattern, do you still need it? For all its call points, just point them to
aarch64_<maxmin_uns>p<mode>?

Thanks,
Tejas.



Hello Tejas,

   I didn't do this yet.
   Currently the aarch64_reduc_<maxmin_uns>_internalv2si is only called by 
reduc_<maxmin_uns>_scal_<mode>.
   I find it kind of trouble to handle this due to the use of iterators in the 
caller pattern.
   Are you going to rework this part?

You're right. Nevermind. That restructuring, if we choose to do it, isanother patch. This patch looks good(but I can't approve it).


Thanks,
Tejas.

Re: [PATCH] [AArch64, NEON] Improve vpmaxX & vpminX intrinsics

Reply via email to