[AARCH32][ACLE][NEON] Implement vcvt*_s32_f32 and vcvt*_u32_f32 NEON intrinsics.
This patch implements all the vcvtRQ_s32_f32 and vcvtRQ_u32_f32 vector intrinsics, where R is one of ['', 'a', 'm', 'n', 'p'] and Q is one of ['', 'q']. The intrinsics are implemented with builtin functions that map to the existing neon_vcvt pattern, which was extended to cover the above cross product of all variants. In addition, a new unary type qualifier for builtins, UNOPUS, was added to enable the builtin functions to be called without casts.

Cross tested on arm-none-eabi, arm-none-linux-gnueabi, arm-none-linux-gnueabihf, and armeb-none-eabi.

---

gcc/

2015-XX-XX  Bilyan Borisov

	* config/arm/arm-builtins.c (arm_unopus_qualifiers): New qualifier.
	* config/arm/arm_neon.h (vcvta_s32_f32): New intrinsic.
	(vcvta_u32_f32): Likewise.
	(vcvtaq_s32_f32): Likewise.
	(vcvtaq_u32_f32): Likewise.
	(vcvtm_s32_f32): Likewise.
	(vcvtm_u32_f32): Likewise.
	(vcvtmq_s32_f32): Likewise.
	(vcvtmq_u32_f32): Likewise.
	(vcvtn_s32_f32): Likewise.
	(vcvtn_u32_f32): Likewise.
	(vcvtnq_s32_f32): Likewise.
	(vcvtnq_u32_f32): Likewise.
	(vcvtp_s32_f32): Likewise.
	(vcvtp_u32_f32): Likewise.
	(vcvtpq_s32_f32): Likewise.
	(vcvtpq_u32_f32): Likewise.
	* config/arm/arm_neon_builtins.def (vcvtas): New builtin.
	(vcvtau): Likewise.
	(vcvtps): Likewise.
	(vcvtpu): Likewise.
	(vcvtms): Likewise.
	(vcvtmu): Likewise.
	(vcvtns): Likewise.
	(vcvtnu): Likewise.
	* config/arm/iterators.md (VCVTR_US): New int iterator.
	(VQMOVN): Modified int iterator.
	(rndmode): New int attribute.
	* config/arm/neon.md (neon_vcvt): Modified pattern.
	* config/arm/unspecs.md (UNSPEC_VCVTA_S): New unspec definition.
	(UNSPEC_VCVTA_U): Likewise.
	(UNSPEC_VCVTM_S): Likewise.
	(UNSPEC_VCVTM_U): Likewise.
	(UNSPEC_VCVTN_S): Likewise.
	(UNSPEC_VCVTN_U): Likewise.
	(UNSPEC_VCVTP_S): Likewise.
	(UNSPEC_VCVTP_U): Likewise.

gcc/testsuite/

2015-XX-XX  Bilyan Borisov

	* gcc.target/arm/neon/vcvta_s32_f32_1.c: New.
	* gcc.target/arm/neon/vcvta_u32_f32_1.c: Likewise.
	* gcc.target/arm/neon/vcvtaq_s32_f32_1.c: Likewise.
	* gcc.target/arm/neon/vcvtaq_u32_f32_1.c: Likewise.
	* gcc.target/arm/neon/vcvtm_s32_f32_1.c: Likewise.
	* gcc.target/arm/neon/vcvtm_u32_f32_1.c: Likewise.
	* gcc.target/arm/neon/vcvtmq_s32_f32_1.c: Likewise.
	* gcc.target/arm/neon/vcvtmq_u32_f32_1.c: Likewise.
	* gcc.target/arm/neon/vcvtn_s32_f32_1.c: Likewise.
	* gcc.target/arm/neon/vcvtn_u32_f32_1.c: Likewise.
	* gcc.target/arm/neon/vcvtnq_s32_f32_1.c: Likewise.
	* gcc.target/arm/neon/vcvtnq_u32_f32_1.c: Likewise.
	* gcc.target/arm/neon/vcvtp_s32_f32_1.c: Likewise.
	* gcc.target/arm/neon/vcvtp_u32_f32_1.c: Likewise.
	* gcc.target/arm/neon/vcvtpq_s32_f32_1.c: Likewise.
	* gcc.target/arm/neon/vcvtpq_u32_f32_1.c: Likewise.

diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 11cd17d0b8f3c29ccbe16cb463a17d55ba0fa1e3..d635372c3257b33f9ca353f150e68bb6f9621a8d 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -87,6 +87,12 @@ arm_bswap_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_unsigned, qualifier_unsigned };
 #define BSWAP_QUALIFIERS (arm_bswap_qualifiers)
 
+/* unsigned T (T).  */
+static enum arm_type_qualifiers
+arm_unopus_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_unsigned, qualifier_none };
+#define UNOPUS_QUALIFIERS (arm_unopus_qualifiers)
+
 /* T (T, T [maybe_immediate]).  */
 static enum arm_type_qualifiers
 arm_binop_qualifiers[SIMD_MAX_BUILTIN_ARGS]

diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index 0a33d21f2fcf8a1074fb62e89f4418295d446db5..fc3f5aa24c88f3aec1cfd234f9ca0aec11e1c612 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -6356,6 +6356,106 @@ vcvtq_u32_f32 (float32x4_t __a)
 }
 
 #pragma GCC push_options
+#pragma GCC target ("fpu=neon-fp-armv8")
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+vcvta_s32_f32 (float32x2_t __a)
+{
+  return __builtin_neon_vcvtasv2sf (__a);
+}
+
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+vcvta_u32_f32 (float32x2_t __a)
+{
+  return __builtin_neon_vcvtauv2sf_us (__a);
+}
+
+__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+vcvtaq_s32_f32 (float32x4_t __a)
+{
+  return __builtin_neon_vcvtasv4sf (__a);
+}
+
+__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+vcvtaq_u32_f32 (float32x4_t __a)
+{
+  return __builtin_neon_vcvtauv4sf_us (__a);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+vcvtm_s32_f32
[AARCH64][ACLE][NEON] Implement vcvt*_s64_f64 and vcvt*_u64_f64 NEON intrinsics.
This patch implements all the vcvtR_s64_f64 and vcvtR_u64_f64 vector intrinsics, where R is one of ['', 'a', 'm', 'n', 'p']. Since these intrinsics are identical in semantics to the corresponding scalar variants, they are implemented in terms of them, with appropriate packing and unpacking of the vector arguments. New test cases covering all the intrinsics were also added.

Cross tested on aarch64-none-elf and aarch64-none-linux-gnu. Bootstrapped and tested on aarch64-none-linux-gnu.

---

gcc/

2015-XX-XX  Bilyan Borisov

	* config/aarch64/arm_neon.h (vcvt_s64_f64): New intrinsic.
	(vcvt_u64_f64): Likewise.
	(vcvta_s64_f64): Likewise.
	(vcvta_u64_f64): Likewise.
	(vcvtm_s64_f64): Likewise.
	(vcvtm_u64_f64): Likewise.
	(vcvtn_s64_f64): Likewise.
	(vcvtn_u64_f64): Likewise.
	(vcvtp_s64_f64): Likewise.
	(vcvtp_u64_f64): Likewise.

gcc/testsuite/

2015-XX-XX  Bilyan Borisov

	* gcc.target/aarch64/simd/vcvt_s64_f64_1.c: New.
	* gcc.target/aarch64/simd/vcvt_u64_f64_1.c: Likewise.
	* gcc.target/aarch64/simd/vcvta_s64_f64_1.c: Likewise.
	* gcc.target/aarch64/simd/vcvta_u64_f64_1.c: Likewise.
	* gcc.target/aarch64/simd/vcvtm_s64_f64_1.c: Likewise.
	* gcc.target/aarch64/simd/vcvtm_u64_f64_1.c: Likewise.
	* gcc.target/aarch64/simd/vcvtn_s64_f64_1.c: Likewise.
	* gcc.target/aarch64/simd/vcvtn_u64_f64_1.c: Likewise.
	* gcc.target/aarch64/simd/vcvtp_s64_f64_1.c: Likewise.
	* gcc.target/aarch64/simd/vcvtp_u64_f64_1.c: Likewise.
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h index c78f2524fa62b1ceedce86ee64cadfa67d3b0d0c..1e19a9e2ed96b7b7c5715be41b98e9c1407a74f9 100644 --- a/gcc/config/aarch64/arm_neon.h +++ b/gcc/config/aarch64/arm_neon.h @@ -13218,6 +13218,18 @@ vcvtq_u32_f32 (float32x4_t __a) return __builtin_aarch64_lbtruncuv4sfv4si_us (__a); } +__extension__ static __inline int64x1_t __attribute__ ((__always_inline__)) +vcvt_s64_f64 (float64x1_t __a) +{ + return (int64x1_t) {vcvtd_s64_f64 (__a[0])}; +} + +__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__)) +vcvt_u64_f64 (float64x1_t __a) +{ + return (uint64x1_t) {vcvtd_u64_f64 (__a[0])}; +} + __extension__ static __inline int64x2_t __attribute__ ((__always_inline__)) vcvtq_s64_f64 (float64x2_t __a) { @@ -13280,6 +13292,18 @@ vcvtaq_u32_f32 (float32x4_t __a) return __builtin_aarch64_lrounduv4sfv4si_us (__a); } +__extension__ static __inline int64x1_t __attribute__ ((__always_inline__)) +vcvta_s64_f64 (float64x1_t __a) +{ + return (int64x1_t) {vcvtad_s64_f64 (__a[0])}; +} + +__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__)) +vcvta_u64_f64 (float64x1_t __a) +{ + return (uint64x1_t) {vcvtad_u64_f64 (__a[0])}; +} + __extension__ static __inline int64x2_t __attribute__ ((__always_inline__)) vcvtaq_s64_f64 (float64x2_t __a) { @@ -13342,6 +13366,18 @@ vcvtmq_u32_f32 (float32x4_t __a) return __builtin_aarch64_lflooruv4sfv4si_us (__a); } +__extension__ static __inline int64x1_t __attribute__ ((__always_inline__)) +vcvtm_s64_f64 (float64x1_t __a) +{ + return (int64x1_t) {vcvtmd_s64_f64 (__a[0])}; +} + +__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__)) +vcvtm_u64_f64 (float64x1_t __a) +{ + return (uint64x1_t) {vcvtmd_u64_f64 (__a[0])}; +} + __extension__ static __inline int64x2_t __attribute__ ((__always_inline__)) vcvtmq_s64_f64 (float64x2_t __a) { @@ -13404,6 +13440,18 @@ vcvtnq_u32_f32 (float32x4_t __a) return 
__builtin_aarch64_lfrintnuv4sfv4si_us (__a); } +__extension__ static __inline int64x1_t __attribute__ ((__always_inline__)) +vcvtn_s64_f64 (float64x1_t __a) +{ + return (int64x1_t) {vcvtnd_s64_f64 (__a[0])}; +} + +__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__)) +vcvtn_u64_f64 (float64x1_t __a) +{ + return (uint64x1_t) {vcvtnd_u64_f64 (__a[0])}; +} + __extension__ static __inline int64x2_t __attribute__ ((__always_inline__)) vcvtnq_s64_f64 (float64x2_t __a) { @@ -13466,6 +13514,18 @@ vcvtpq_u32_f32 (float32x4_t __a) return __builtin_aarch64_lceiluv4sfv4si_us (__a); } +__extension__ static __inline int64x1_t __attribute__ ((__always_inline__)) +vcvtp_s64_f64 (float64x1_t __a) +{ + return (int64x1_t) {vcvtpd_s64_f64 (__a[0])}; +} + +__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__)) +vcvtp_u64_f64 (float64x1_t __a) +{ + return (uint64x1_t) {vcvtpd_u64_f64 (__a[0])}; +} + __extension__ static __inline int64x2_t __attribute__ ((__always_inline__)) vcvtpq_s64_f64 (float64x2_t __a) { diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vcvt_s64_f64_1.c b/gcc/testsuite/gcc.target/aarch64/simd/vcvt_s64_f64_1.c new file mode 100644 index ..02f59fc7e58c988141f8f00c8866c71f2d5d660b --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/simd/vcvt_s64_f64_1.c @@ -0,0 +1,25 @@ +/* { dg-do run } *
[AARCH64][ACLE] Implement __ARM_FP_FENV_ROUNDING in aarch64 backend.
This patch implements the __ARM_FP_FENV_ROUNDING macro in the aarch64 backend. AArch64 supports configurable rounding modes, which can be set using the standard C fesetround() function. According to the ACLE 2.0 specification, __ARM_FP_FENV_ROUNDING is defined to 1 only when fesetround() is provided. Since newlib doesn't provide this function, and since ARMv8 AArch64 hardware supports configurable rounding modes for both scalar and SIMD, we only define the macro when building a compiler that targets glibc (i.e. aarch64-none-linux-gnu), which does provide fesetround().

Cross tested on aarch64-none-elf and aarch64-none-linux-gnu. Bootstrapped and tested on aarch64-none-linux-gnu.

---

gcc/

2015-XX-XX  Bilyan Borisov

	* config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): New macro
	definition.

gcc/testsuite/

2015-XX-XX  Bilyan Borisov

	* gcc.target/aarch64/fesetround-checking-baremetal.c: New.
	* gcc.target/aarch64/fesetround-checking-linux.c: Likewise.

diff --git a/gcc/config/aarch64/aarch64-c.c b/gcc/config/aarch64/aarch64-c.c
index ad95c78b9895a33da3e5a0ec6328219b887ade37..0e07d74639340176f51587e37e5ea91b57fdeee1 100644
--- a/gcc/config/aarch64/aarch64-c.c
+++ b/gcc/config/aarch64/aarch64-c.c
@@ -97,8 +97,15 @@ aarch64_update_cpp_builtins (cpp_reader *pfile)
   aarch64_def_or_undef (TARGET_SIMD, "__ARM_FEATURE_NUMERIC_MAXMIN", pfile);
   aarch64_def_or_undef (TARGET_SIMD, "__ARM_NEON", pfile);
-
-
+  /* Since fesetround () is not present in newlib, but is present in glibc,
+     we only define the __ARM_FP_FENV_ROUNDING macro to 1 when glibc is
+     used, i.e. on aarch64-none-linux-gnu.  The OPTION_GLIBC macro is
+     undefined on aarch64-none-elf, and is defined on aarch64-none-linux-gnu
+     to an expression that evaluates to 1 when targeting glibc.  The musl,
+     bionic, and uclibc cases haven't been investigated, so take no action
+     there.
*/ +#ifdef OPTION_GLIBC +aarch64_def_or_undef (OPTION_GLIBC, "__ARM_FP_FENV_ROUNDING", pfile); +#endif aarch64_def_or_undef (TARGET_CRC32, "__ARM_FEATURE_CRC32", pfile); cpp_undef (pfile, "__AARCH64_CMODEL_TINY__"); diff --git a/gcc/testsuite/gcc.target/aarch64/fesetround-checking-baremetal.c b/gcc/testsuite/gcc.target/aarch64/fesetround-checking-baremetal.c new file mode 100644 index ..1202dfa2e81923f0d2c9b6be1eb71e3192d5d974 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/fesetround-checking-baremetal.c @@ -0,0 +1,18 @@ +/* { dg-do compile { target { aarch64*-*-elf* } } } */ + +extern void abort (); + +int +main () +{ + /* According to ACLE 2.0, __ARM_FP_FENV_ROUNDING macro should be defined to 1 + when fesetround () is supported. Since aarch64-none-elf which uses newlib + doesn't support this, we error if the macro is defined to 1. + */ +#if defined (__ARM_FP_FENV_ROUNDING) +#error "According to ACLE 2.0, __ARM_FP_FENV_ROUNDING macro should be defined \ +to 1 only when fesetround () is supported." +#else + return 0; +#endif +} diff --git a/gcc/testsuite/gcc.target/aarch64/fesetround-checking-linux.c b/gcc/testsuite/gcc.target/aarch64/fesetround-checking-linux.c new file mode 100644 index ..d79f384d672bf8d0847b41ab8cc2cc594a805dca --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/fesetround-checking-linux.c @@ -0,0 +1,18 @@ +/* { dg-do compile { target { aarch64*-*-linux* } } } */ + +extern void abort (); + +int +main () +{ + /* According to ACLE 2.0, __ARM_FP_FENV_ROUNDING macro should be defined to 1 + when fesetround () is supported. In our case, this is on + aarch64-none-linux-gnu. + */ +#if defined (__ARM_FP_FENV_ROUNDING) && (__ARM_FP_FENV_ROUNDING == 1) + return 0; +#else +#error "According to ACLE 2.0, __ARM_FP_FENV_ROUNDING macro should be defined \ +to 1 when fesetround () is supported." +#endif +}
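Client code is expected to key on the macro exactly the way the two tests above do. A minimal, portable sketch of the ACLE-prescribed feature test (the function name is illustrative; ACLE only specifies the macro):

```c
/* Compile-time feature test in the style ACLE prescribes: the macro is
   either defined to 1 (fesetround () available in the C library) or
   left undefined (e.g. newlib).  */
static int have_dynamic_rounding (void)
{
#if defined (__ARM_FP_FENV_ROUNDING) && __ARM_FP_FENV_ROUNDING == 1
  return 1;   /* fesetround () may be called to change the rounding mode.  */
#else
  return 0;   /* Only the default rounding mode is guaranteed.  */
#endif
}
```

With this patch, the function returns 1 when compiled for aarch64-none-linux-gnu and 0 for aarch64-none-elf; on any other target it simply reports whatever the compiler defines.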
[AArch32][NEON] Implementing vmaxnmQ_ST and vminnmQ_ST intrinsics.
This patch implements the vmaxnmQ_ST and vminnmQ_ST intrinsics. The current builtin registration code is deficient in that it can't access standard pattern names, to which vmaxnmQ_ST and vminnmQ_ST map directly. Thus, to give the vectoriser access to these intrinsics, we implement them as builtin functions which we expand to the proper standard pattern using a define_expand. This patch also implements the __ARM_FEATURE_NUMERIC_MAXMIN macro, which is defined when __ARM_ARCH >= 8 and which enables the intrinsics.

Cross tested on arm-none-eabi, armeb-none-eabi, arm-none-linux-gnueabi, and arm-none-linux-gnueabihf. Bootstrapped and tested on arm-none-linux-gnueabihf.

---

gcc/

2015-XX-XX  Bilyan Borisov

	* config/arm/arm-c.c (arm_cpu_builtins): New macro definition.
	* config/arm/arm_neon.h (vmaxnm_f32): New intrinsic.
	(vmaxnmq_f32): Likewise.
	(vminnm_f32): Likewise.
	(vminnmq_f32): Likewise.
	* config/arm/arm_neon_builtins.def (vmaxnm): New builtin.
	(vminnm): Likewise.
	* config/arm/neon.md (neon_, VCVTF): New expander.

gcc/testsuite/

2015-XX-XX  Bilyan Borisov

	* gcc.target/arm/simd/vmaxnm_f32_1.c: New.
	* gcc.target/arm/simd/vmaxnmq_f32_1.c: Likewise.
	* gcc.target/arm/simd/vminnm_f32_1.c: Likewise.
	* gcc.target/arm/simd/vminnmq_f32_1.c: Likewise.
diff --git a/gcc/config/arm/arm-c.c b/gcc/config/arm/arm-c.c index 7dee28ec52df68f8c7a60fe66e1b049fed39c1c0..7b63bdcf86c079288611f79ed89d6540b348fe82 100644 --- a/gcc/config/arm/arm-c.c +++ b/gcc/config/arm/arm-c.c @@ -83,6 +83,9 @@ arm_cpu_builtins (struct cpp_reader* pfile) ((TARGET_ARM_ARCH >= 5 && !TARGET_THUMB) || TARGET_ARM_ARCH_ISA_THUMB >=2)); + def_or_undef_macro (pfile, "__ARM_FEATURE_NUMERIC_MAXMIN", + TARGET_ARM_ARCH >= 8); + def_or_undef_macro (pfile, "__ARM_FEATURE_SIMD32", TARGET_INT_SIMD); builtin_define_with_int_value ("__ARM_SIZEOF_MINIMAL_ENUM", diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h index 0a33d21f2fcf8a1074fb62e89f4418295d446db5..2e28621cd3a3bf6682ce5353cfb1fd5d8d6c55ad 100644 --- a/gcc/config/arm/arm_neon.h +++ b/gcc/config/arm/arm_neon.h @@ -2889,6 +2889,34 @@ vmaxq_f32 (float32x4_t __a, float32x4_t __b) return (float32x4_t)__builtin_neon_vmaxfv4sf (__a, __b); } +#pragma GCC push_options +#pragma GCC target ("fpu=neon-fp-armv8") +__extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) +vmaxnm_f32 (float32x2_t a, float32x2_t b) +{ + return (float32x2_t)__builtin_neon_vmaxnmv2sf (a, b); +} + +__extension__ static __inline float32x4_t __attribute__ ((__always_inline__)) +vmaxnmq_f32 (float32x4_t a, float32x4_t b) +{ + return (float32x4_t)__builtin_neon_vmaxnmv4sf (a, b); +} + +__extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) +vminnm_f32 (float32x2_t a, float32x2_t b) +{ + return (float32x2_t)__builtin_neon_vminnmv2sf (a, b); +} + +__extension__ static __inline float32x4_t __attribute__ ((__always_inline__)) +vminnmq_f32 (float32x4_t a, float32x4_t b) +{ + return (float32x4_t)__builtin_neon_vminnmv4sf (a, b); +} +#pragma GCC pop_options + + __extension__ static __inline uint8x16_t __attribute__ ((__always_inline__)) vmaxq_u8 (uint8x16_t __a, uint8x16_t __b) { diff --git a/gcc/config/arm/arm_neon_builtins.def b/gcc/config/arm/arm_neon_builtins.def index 
0b719df760747af7642bd14ab14a9b2144d43359..1d3b6e9b6a08a3cf3b0d6f76bf340208919c9b13 100644 --- a/gcc/config/arm/arm_neon_builtins.def +++ b/gcc/config/arm/arm_neon_builtins.def @@ -126,6 +126,9 @@ VAR6 (BINOP, vmins, v8qi, v4hi, v2si, v16qi, v8hi, v4si) VAR6 (BINOP, vminu, v8qi, v4hi, v2si, v16qi, v8hi, v4si) VAR2 (BINOP, vminf, v2sf, v4sf) +VAR2 (BINOP, vmaxnm, v2sf, v4sf) +VAR2 (BINOP, vminnm, v2sf, v4sf) + VAR3 (BINOP, vpmaxs, v8qi, v4hi, v2si) VAR3 (BINOP, vpmaxu, v8qi, v4hi, v2si) VAR1 (BINOP, vpmaxf, v2sf) diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md index 94c63fd3dbc071291844fbe7732435465fbc0ada..b745199fe25d7afd7d93598e12e74d8efb4863fe 100644 --- a/gcc/config/arm/neon.md +++ b/gcc/config/arm/neon.md @@ -2354,6 +2354,18 @@ [(set_attr "type" "neon_fp_minmax_s")] ) +;; Expander for vnm intrinsics. +(define_expand "neon_" + [(unspec:VCVTF [(match_operand:VCVTF 0 "s_register_operand" "") + (match_operand:VCVTF 1 "s_register_operand" "") + (match_operand:VCVTF 2 "s_register_operand" "")] + VMAXMINFNM)] + "TARGET_NEON && TARGET_FPU_ARMV8" +{ + emit_insn (gen_3 (operands[0], operands[1], operands[2])); + DONE; +}) + ;; Vector forms for the IEEE-754 fmax()/fmin() functions (define_insn "3" [(set (match_operand:VCVTF 0 "s_register_operand" "=w") diff --git a/gcc/testsuite/gcc.target/arm/simd/vmaxnm_f32_1.c b/
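The "nm" in vmaxnm/vminnm is the IEEE 754-2008 maxNum/minNum behaviour: when exactly one operand is a quiet NaN, the numeric operand is returned rather than NaN (unlike vmax/vmin, which propagate the NaN). A portable one-lane model, written without libm; lane_maxnm, lane_minnm, and quiet_nan are illustrative helper names, not code from the patch.

```c
/* One-lane models of VMAXNM/VMINNM: one NaN operand => return the
   other operand; otherwise an ordinary max/min.  The comparison
   x != x is true exactly when x is a NaN.  */
static float lane_maxnm (float a, float b)
{
  if (a != a) return b;
  if (b != b) return a;
  return a > b ? a : b;
}

static float lane_minnm (float a, float b)
{
  if (a != a) return b;
  if (b != b) return a;
  return a < b ? a : b;
}

/* Produce a quiet NaN at run time without pulling in libm; the
   volatile stops the compiler from folding 0/0 at compile time.  */
static float quiet_nan (void)
{
  volatile float z = 0.0f;
  return z / z;
}
```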
[AArch32][NEON] Implementing vmaxnmQ_ST and vminnmQ_ST intrinsics.
This patch implements the vmaxnmQ_ST and vminnmQ_ST intrinsics. It also implements the __ARM_FEATURE_NUMERIC_MAXMIN macro, which is defined when __ARM_ARCH >= 8 and which enables the intrinsics.

Tested on arm-none-eabi, armeb-none-eabi, arm-none-linux-gnueabihf.

---

gcc/

2015-XX-XX  Bilyan Borisov

	* config/arm/arm-c.c (arm_cpu_builtins): New macro definition.
	* config/arm/arm_neon.h (vmaxnm_f32): New intrinsic.
	(vmaxnmq_f32): Likewise.
	(vminnm_f32): Likewise.
	(vminnmq_f32): Likewise.
	* config/arm/arm_neon_builtins.def (vmaxnm): New builtin.
	(vminnm): Likewise.
	* config/arm/iterators.md (VMAXMINNM): New iterator.
	(maxmin): Updated iterator.
	* config/arm/neon.md (neon_v, VCVTF): New pattern.
	* config/arm/unspecs.md (UNSPEC_VMAXNM): New unspec.
	(UNSPEC_VMINNM): Likewise.

gcc/testsuite/

2015-XX-XX  Bilyan Borisov

	* gcc.target/arm/simd/vmaxnm_f32_1.c: New.
	* gcc.target/arm/simd/vmaxnmq_f32_1.c: Likewise.
	* gcc.target/arm/simd/vminnm_f32_1.c: Likewise.
	* gcc.target/arm/simd/vminnmq_f32_1.c: Likewise.
diff --git a/gcc/config/arm/arm-c.c b/gcc/config/arm/arm-c.c index 7dee28ec52df68f8c7a60fe66e1b049fed39c1c0..7b63bdcf86c079288611f79ed89d6540b348fe82 100644 --- a/gcc/config/arm/arm-c.c +++ b/gcc/config/arm/arm-c.c @@ -83,6 +83,9 @@ arm_cpu_builtins (struct cpp_reader* pfile) ((TARGET_ARM_ARCH >= 5 && !TARGET_THUMB) || TARGET_ARM_ARCH_ISA_THUMB >=2)); + def_or_undef_macro (pfile, "__ARM_FEATURE_NUMERIC_MAXMIN", + TARGET_ARM_ARCH >= 8); + def_or_undef_macro (pfile, "__ARM_FEATURE_SIMD32", TARGET_INT_SIMD); builtin_define_with_int_value ("__ARM_SIZEOF_MINIMAL_ENUM", diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h index 0a33d21f2fcf8a1074fb62e89f4418295d446db5..0c8c08cc404cbc446db648d41f0773d0b4798a3a 100644 --- a/gcc/config/arm/arm_neon.h +++ b/gcc/config/arm/arm_neon.h @@ -2907,6 +2907,33 @@ vmaxq_u32 (uint32x4_t __a, uint32x4_t __b) return (uint32x4_t)__builtin_neon_vmaxuv4si ((int32x4_t) __a, (int32x4_t) __b); } +#pragma GCC push_options +#pragma GCC target ("fpu=neon-fp-armv8") +__extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) +vmaxnm_f32 (float32x2_t a, float32x2_t b) +{ + return (float32x2_t)__builtin_neon_vmaxnmv2sf (a, b); +} + +__extension__ static __inline float32x4_t __attribute__ ((__always_inline__)) +vmaxnmq_f32 (float32x4_t a, float32x4_t b) +{ + return (float32x4_t)__builtin_neon_vmaxnmv4sf (a, b); +} + +__extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) +vminnm_f32 (float32x2_t a, float32x2_t b) +{ + return (float32x2_t)__builtin_neon_vminnmv2sf (a, b); +} + +__extension__ static __inline float32x4_t __attribute__ ((__always_inline__)) +vminnmq_f32 (float32x4_t a, float32x4_t b) +{ + return (float32x4_t)__builtin_neon_vminnmv4sf (a, b); +} +#pragma GCC pop_options + __extension__ static __inline int8x8_t __attribute__ ((__always_inline__)) vmin_s8 (int8x8_t __a, int8x8_t __b) { diff --git a/gcc/config/arm/arm_neon_builtins.def b/gcc/config/arm/arm_neon_builtins.def index 
0b719df760747af7642bd14ab14a9b2144d43359..1d3b6e9b6a08a3cf3b0d6f76bf340208919c9b13 100644 --- a/gcc/config/arm/arm_neon_builtins.def +++ b/gcc/config/arm/arm_neon_builtins.def @@ -126,6 +126,9 @@ VAR6 (BINOP, vmins, v8qi, v4hi, v2si, v16qi, v8hi, v4si) VAR6 (BINOP, vminu, v8qi, v4hi, v2si, v16qi, v8hi, v4si) VAR2 (BINOP, vminf, v2sf, v4sf) +VAR2 (BINOP, vmaxnm, v2sf, v4sf) +VAR2 (BINOP, vminnm, v2sf, v4sf) + VAR3 (BINOP, vpmaxs, v8qi, v4hi, v2si) VAR3 (BINOP, vpmaxu, v8qi, v4hi, v2si) VAR1 (BINOP, vpmaxf, v2sf) diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md index 6a541251ed1e5d7c766aca04f0da97ba6d470541..e2f7cea89688c67d841dfef4c5a4e6e003660c63 100644 --- a/gcc/config/arm/iterators.md +++ b/gcc/config/arm/iterators.md @@ -308,6 +308,8 @@ (define_int_iterator VMAXMINF [UNSPEC_VMAX UNSPEC_VMIN]) +(define_int_iterator VMAXMINNM [UNSPEC_VMAXNM UNSPEC_VMINNM]) + (define_int_iterator VPADDL [UNSPEC_VPADDL_S UNSPEC_VPADDL_U]) (define_int_iterator VPADAL [UNSPEC_VPADAL_S UNSPEC_VPADAL_U]) @@ -741,6 +743,7 @@ (UNSPEC_VMIN "min") (UNSPEC_VMIN_U "min") (UNSPEC_VPMAX "max") (UNSPEC_VPMAX_U "max") (UNSPEC_VPMIN "min") (UNSPEC_VPMIN_U "min") + (UNSPEC_VMAXNM "maxnm") (UNSPEC_VMINNM "minnm") ]) (define_int_attr shift_op [ diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md index 62fb6daae9983470faf2c9cc686f5181b8bd7cb6..1b48451b5ee559c332573860d8a3aea0bb3a58ad 100644 --- a/gcc/config/arm/neon.md +++ b/gcc/config/arm/neon.md @@ -2354,6 +2354,16 @@ [(set_attr "type" "neon_fp_minmax_s")] ) +(define_insn "neon_v" + [(set (match_operand:VCVTF 0 &
[PATCH][AARCH64][NEON] Enabling V*HFmode simd immediate loads.
This patch adds support for loading 16-bit floating-point vector immediates (the V*HF modes) using a movi instruction. We leverage the existing code that checks for a repeating 8-bit pattern in the 64/128-bit value formed by concatenating the bit patterns of the vector's individual constant elements. This enables us to load a variety of constants, since the movi instruction also provides an immediate left shift of up to 24 bits (in multiples of 8). A new testcase was added that checks both for the presence of movi instructions and for correctness of the results.

Tested on aarch64-none-elf and aarch64_be-none-elf; bootstrapped on aarch64-none-linux-gnu.

---

gcc/

2015-XX-XX  Bilyan Borisov

	* config/aarch64/aarch64.c (aarch64_simd_container_mode): Added HFmode
	cases.
	(aarch64_vect_float_const_representable_p): Updated comment.
	(aarch64_simd_valid_immediate): Added support for V*HF arguments.
	(aarch64_output_simd_mov_immediate): Added check for HFmode.

gcc/testsuite/

2015-XX-XX  Bilyan Borisov

	* gcc.target/aarch64/fp16/f16_mov_immediate_simd_1.c: New.

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index ae4cfb336a827a63a6baadefcb5646a9dbfb7523..bb6fce0a829d634a7694710e8a8c9a1c3e841abd 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -10250,6 +10250,8 @@ aarch64_simd_container_mode (machine_mode mode, unsigned width)
       return V2DFmode;
     case SFmode:
       return V4SFmode;
+    case HFmode:
+      return V8HFmode;
     case SImode:
       return V4SImode;
     case HImode:
@@ -10266,6 +10268,8 @@ aarch64_simd_container_mode (machine_mode mode, unsigned width)
     {
     case SFmode:
       return V2SFmode;
+    case HFmode:
+      return V4HFmode;
     case SImode:
       return V2SImode;
     case HImode:
@@ -10469,7 +10473,12 @@ sizetochar (int size)
 /* Return true iff x is a uniform vector of floating-point constants,
    and the constant can be represented in quarter-precision form.
Note, as aarch64_float_const_representable - rejects both +0.0 and -0.0, we will also reject +0.0 and -0.0. */ + rejects both +0.0 and -0.0, we will also reject +0.0 and -0.0. + Also note that this won't ever be called for V*HFmode vectors, + since in aarch64_simd_valid_immediate () we check for the mode + and handle these vector types differently from other floating + point vector modes. */ + static bool aarch64_vect_float_const_representable_p (rtx x) { @@ -10505,7 +10514,10 @@ aarch64_simd_valid_immediate (rtx op, machine_mode mode, bool inverse, unsigned int invmask = inverse ? 0xff : 0; int eshift, emvn; - if (GET_MODE_CLASS (mode) == MODE_VECTOR_FLOAT) + /* Ignore V*HFmode vectors, they are handled below with the integer + code. */ + if (GET_MODE_CLASS (mode) == MODE_VECTOR_FLOAT + && GET_MODE_INNER (mode) != HFmode) { if (! (aarch64_simd_imm_zero_p (op, mode) || aarch64_vect_float_const_representable_p (op))) @@ -10530,15 +10542,26 @@ aarch64_simd_valid_immediate (rtx op, machine_mode mode, bool inverse, rtx el = CONST_VECTOR_ELT (op, BYTES_BIG_ENDIAN ? (n_elts - 1 - i) : i); unsigned HOST_WIDE_INT elpart; - gcc_assert (CONST_INT_P (el)); - elpart = INTVAL (el); + if (CONST_INT_P (el)) + elpart = INTVAL (el); + /* Convert HFmode vector element to bit pattern. Logic below will catch + most common constants since for FP16 the sign and exponent are in the + top 6 bits and a movi with a left shift of 8 will catch all powers + of 2 that fit in a 16 bit floating point, and the 2 extra bits left + for the mantissa can cover some more non-power of 2 constants. With + a 0 left shift, we can cover constants of the form 1.xxx since we have + 8 bits only for the mantissa. 
*/ + else if (CONST_DOUBLE_P (el) && GET_MODE_INNER (mode) == HFmode) + elpart = + real_to_target (NULL, CONST_DOUBLE_REAL_VALUE (el), HFmode); + else +gcc_unreachable (); for (unsigned int byte = 0; byte < innersize; byte++) { bytes[idx++] = (elpart & 0xff) ^ invmask; elpart >>= BITS_PER_UNIT; } - } /* Sanity check. */ @@ -11913,7 +11936,10 @@ aarch64_output_simd_mov_immediate (rtx const_vector, lane_count = width / info.element_width; mode = GET_MODE_INNER (mode); - if (GET_MODE_CLASS (mode) == MODE_FLOAT) + /* We handle HFmode vectors separately from the other floating point + vector modes. See aarch64_simd_valid_immediate (), but in short + we use a movi instruction rather than a fmov. */ + if (GET_MODE_CLASS (mode) == MODE_FLOAT && mode != HFmode) { gcc_assert (info.shift == 0 && ! info.mvn); /* For FP zero change it to a CONST_INT 0 and use the integer SIMD diff --git a/gcc/testsuite/gcc.target/aarch64/fp16/f16_mov_immediate_simd_1.c b/gcc/testsuite/gcc.target/aarch64/fp16/f16
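As a sanity check on the comment above: IEEE binary16 has 1 sign bit, 5 exponent bits (bias 15), and 10 mantissa bits, so every power of two representable in FP16 has a bit pattern in which one of the two bytes is zero -- exactly the shape that movi's byte-plus-left-shift encoding can produce. A small illustrative model; fp16_pow2 and movi_encodable are hypothetical helpers, not code from the patch.

```c
#include <stdint.h>

/* IEEE binary16 bit pattern for 2**n, valid for -14 <= n <= 15:
   sign 0, exponent n + 15 in bits 14:10, mantissa 0.  */
static uint16_t fp16_pow2 (int n)
{
  return (uint16_t) ((n + 15) << 10);
}

/* A V*HF constant element can use a (possibly shifted) MOVI when its
   16-bit pattern is a single byte optionally shifted left by 8,
   i.e. when one of its two bytes is zero.  */
static int movi_encodable (uint16_t bits)
{
  return (bits & 0xff) == 0 || (bits >> 8) == 0;
}
```

With a zero shift the low byte carries 2 exponent bits plus 6 mantissa bits, which is how some non-power-of-2 constants of the form 1.xxx also become encodable, matching the comment in the patch.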
Re: [AARCH64][PATCH 3/3] Adding tests to check proper error reporting of out of bounds accesses to vmulx_lane* NEON intrinsics
I've made the change you've requested. Changelog & patch description are below.

Thanks,
Bilyan

---

This patch from the series adds tests that check for the proper error reporting of out-of-bounds accesses to all the vmulx_lane NEON intrinsic variants. The tests were added separately from the previous patch in order to reduce its size.

gcc/testsuite/

2015-XX-XX  Bilyan Borisov

	* gcc.target/aarch64/advsimd-intrinsics/vmulx_lane_f32_indices_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vmulx_lane_f64_indices_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vmulx_laneq_f32_indices_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vmulx_laneq_f64_indices_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vmulxd_lane_f64_indices_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vmulxd_laneq_f64_indices_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vmulxq_lane_f32_indices_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vmulxq_lane_f64_indices_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vmulxq_laneq_f32_indices_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vmulxq_laneq_f64_indices_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vmulxs_lane_f32_indices_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vmulxs_laneq_f32_indices_1.c: New.

On 22/11/15 15:24, James Greenhalgh wrote:

On Fri, Oct 30, 2015 at 09:32:07AM +, Bilyan Borisov wrote:

Implementing vmulx_* and vmulx_lane* NEON intrinsics

Hi all,

This series of patches focuses on the different vmulx_ and vmulx_lane NEON intrinsic variants. All of the existing inline assembly block implementations are replaced with newly defined __builtin functions, and the missing intrinsics are implemented with __builtins as well. The rationale for the change from assembly to __builtin is that the compiler can then perform more optimisations, such as instruction scheduling. A new named md pattern was added for the new fmulx __builtin.
Most vmulx_lane variants have been implemented as a combination of a vdup followed by a vmulx_, rather than as separate __builtins. The remaining vmulx_lane intrinsics (vmulx(s|d)_lane*) were implemented using __aarch64_vget_lane_any () and an appropriate vmulx. Four new nameless md patterns were added to replace all the different types of RTL generated from the combination of these intrinsics during the combine pass. The rationale for this change is that in this way we would be able to optimise away all uses of a dup followed by a fmulx to the appropriate fmulx lane variant instruction. New test cases were added for all the implemented intrinsics. Also new tests were added for the proper error reporting of out-of-bounds accesses to _lane intrinsics. Tested on targets aarch64-none-elf and aarch64_be-none-elf. diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmulx_lane_f32_indices_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmulx_lane_f32_indices_1.c new file mode 100644 index ..5681d5d21bc62e54e308c0a7c171f6f1b8969b71 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmulx_lane_f32_indices_1.c @@ -0,0 +1,16 @@ +#include + +/* { dg-do compile } */ +/* { dg-skip-if "" { *-*-* } { "-fno-fat-lto-objects" } } */ +/* { dg-skip-if "" { arm*-*-* } } */ + +float32x2_t +f_vmulx_lane_f32 (float32x2_t v1, float32x2_t v2) +{ + float32x2_t res; + /* { dg-error "lane -1 out of range 0 - 1" "" { xfail arm*-*-* } 0 } */ + res = vmulx_lane_f32 (v1, v2, -1); + /* { dg-error "lane 2 out of range 0 - 1" "" { xfail arm*-*-* } 0 } */ Given the dg-skip-if directive, do we really need the cfail directive for arm*-*-*, surely the whole test is skipped regardless? Could you respin this patch without the xfails? 
Thanks, James diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmulx_lane_f32_indices_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmulx_lane_f32_indices_1.c new file mode 100644 index ..1494633e606d7b9b77d913dbc99b1a127cc9b661 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmulx_lane_f32_indices_1.c @@ -0,0 +1,16 @@ +#include + +/* { dg-do compile } */ +/* { dg-skip-if "" { *-*-* } { "-fno-fat-lto-objects" } } */ +/* { dg-skip-if "" { arm*-*-* } } */ + +float32x2_t +f_vmulx_lane_f32 (float32x2_t v1, float32x2_t v2) +{ + float32x2_t res; + /* { dg-error "lane -1 out of range 0 - 1" "" { target *-*-* } 0 } */ + res = vmulx_lane_f32 (v1, v2, -1); + /* { dg-error "lane 2 out of range 0 - 1" "" { target *-*-* } 0 } */ + res = vmulx_lane_f32 (v1, v2, 2); + return
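For context on what the combine patterns described above fuse: vmulx_lane (a, b, l) is a dup of lane l of b followed by an element-wise fmulx. A portable model of that composition; fmulx_model, vmulx_lane_model, and lane_model_selftest are illustrative helpers, not code from the series.

```c
#include <math.h>

/* Scalar model of FMULX: identical to multiplication except that
   0 * infinity (any sign combination) returns 2.0 with the XOR of
   the operands' signs, instead of the NaN that FMUL would give.  */
static float fmulx_model (float a, float b)
{
  if ((a == 0.0f && isinf (b)) || (isinf (a) && b == 0.0f))
    return ((!!signbit (a)) ^ (!!signbit (b))) ? -2.0f : 2.0f;
  return a * b;
}

/* vmulx_lane modelled per element: multiply every element of a by
   the selected lane of b -- the dup-then-fmulx combination that the
   new combine patterns collapse into a single FMULX (by element).  */
static void vmulx_lane_model (float *r, const float *a, int n,
                              const float *b, int lane)
{
  for (int i = 0; i < n; i++)
    r[i] = fmulx_model (a[i], b[lane]);
}

/* Small self-check of the lane model on a 2-lane vector.  */
static int lane_model_selftest (void)
{
  float a[2] = { 1.0f, 2.0f }, b[2] = { 3.0f, 4.0f }, r[2];
  vmulx_lane_model (r, a, 2, b, 1);   /* multiply both lanes of a by b[1] */
  return r[0] == 4.0f && r[1] == 8.0f;
}
```

The lane index must be a compile-time constant in 0 .. n-1, which is what the _indices_ tests above verify the compiler diagnoses.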
[AARCH64] Adding constant folding for __builtin_fmulx* with scalar 32 and 64 bit arguments
This patch adds an extension to aarch64_gimple_fold_builtin () that constant folds __builtin_fmulx* calls for 32- and 64-bit floating-point scalar modes. We fold when both arguments are constant, as well as when only one is. The special cases of 0*inf, -0*inf, 0*-inf, and -0*-inf are also handled. The case of vector constant arguments will be dealt with in a future patch, since the tests for it would be obscure and would unnecessarily complicate this patch.

Tests were added to check for proper handling of the constant folding. Tested on targets aarch64-none-elf and aarch64_be-none-elf.

---

gcc/

2015-XX-XX  Bilyan Borisov

	* config/aarch64/aarch64-builtins.c (aarch64_gimple_fold_builtin):
	Added constant folding.

gcc/testsuite/

2015-XX-XX  Bilyan Borisov

	* gcc.target/aarch64/simd/vmulx.x: New.
	* gcc.target/aarch64/simd/vmulx_f64_2.c: Likewise.
	* gcc.target/aarch64/simd/vmulxd_f64_2.c: Likewise.
	* gcc.target/aarch64/simd/vmulxs_f32_2.c: Likewise.

diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index a1998ed550ac801e4d80baae122bf58e394a563f..339054d344900c942d5ce7c047479de3bbb4e61b 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -1362,7 +1362,7 @@ aarch64_gimple_fold_builtin (gimple_stmt_iterator *gsi)
   if (fndecl)
     {
       int fcode = DECL_FUNCTION_CODE (fndecl);
-      int nargs = gimple_call_num_args (stmt);
+      unsigned nargs = gimple_call_num_args (stmt);
       tree *args = (nargs > 0 ?
gimple_call_arg_ptr (stmt, 0) : &error_mark_node); @@ -1386,7 +1386,54 @@ aarch64_gimple_fold_builtin (gimple_stmt_iterator *gsi) new_stmt = gimple_build_assign (gimple_call_lhs (stmt), REDUC_MIN_EXPR, args[0]); break; - + BUILTIN_GPF (BINOP, fmulx, 0) + { + gcc_assert (nargs == 2); + bool a0_cst_p = TREE_CODE (args[0]) == REAL_CST; + bool a1_cst_p = TREE_CODE (args[1]) == REAL_CST; + if (a0_cst_p || a1_cst_p) + { + if (a0_cst_p && a1_cst_p) + { + tree t0 = TREE_TYPE (args[0]); + real_value a0 = (TREE_REAL_CST (args[0])); + real_value a1 = (TREE_REAL_CST (args[1])); + if (real_equal (&a1, &dconst0)) + std::swap (a0, a1); + /* According to real_equal (), +0 equals -0. */ + if (real_equal (&a0, &dconst0) && real_isinf (&a1)) + { + real_value res = dconst2; + res.sign = a0.sign ^ a1.sign; + new_stmt = +gimple_build_assign (gimple_call_lhs (stmt), + REAL_CST, + build_real (t0, res)); + } + else + new_stmt = + gimple_build_assign (gimple_call_lhs (stmt), + MULT_EXPR, + args[0], args[1]); + } + else /* a0_cst_p ^ a1_cst_p. */ + { + real_value const_part = a0_cst_p + ? TREE_REAL_CST (args[0]) : TREE_REAL_CST (args[1]); + if (!real_equal (&const_part, &dconst0) + && !real_isinf (&const_part)) + new_stmt = + gimple_build_assign (gimple_call_lhs (stmt), + MULT_EXPR, args[0], args[1]); + } + } + if (new_stmt) + { + gimple_set_vuse (new_stmt, gimple_vuse (stmt)); + gimple_set_vdef (new_stmt, gimple_vdef (stmt)); + } + break; + } default: break; } diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vmulx.x b/gcc/testsuite/gcc.target/aarch64/simd/vmulx.x new file mode 100644 index ..8968a64a95cb40a466dd77fea4e9f9f63ad707dc --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/simd/vmulx.x @@ -0,0 +1,46 @@ +#define PASS_ARRAY(...) 
{__VA_ARGS__} + +#define SETUP_TEST_CASE_VEC(I, INTRINSIC, BASE_TYPE, TYPE1, TYPE2, \ + VALS1, VALS2, EXPS, LEN, FM, Q_LD, Q_ST, \ + V1, V2) \ + do \ +{ \ + int i##I;\ + BASE_TYPE vec##I##_1_data[] = VALS1; \ + BASE_TYPE vec##I##_2_data[] = VALS2; \ + V1 TYPE1 vec##I##_1 = vld1##Q_LD##_##FM (vec##I##_1_data); \ + V2 TYPE2 vec##I##_2 = vld1##Q_LD##_##FM (vec##I##_2_data); \ + TYPE1 actual##I##_v = INTRINSIC (vec##I##_1, vec##I##_2); \ + volatile BASE_TYPE expected##I[] = EXPS;\ + BASE_TYPE actual##I[LEN]; \ + vst1##Q_ST##_##FM (actual##I, actual##I##_v);\ + for (i##I = 0; i##I < LEN; ++i##I) \ +if (actual##I[i##I] != expected##I[i##I])\ + abort ();\ +} \ + while (0)\ + +#define SETUP_TEST_CASE_SCALAR(I, INTRINSIC, TYPE, VAL1, VAL2, EXP) \ + do \ +{ \ + TYPE vec_##I##_1 = VAL1; \ + TYPE vec_##I##_2 = VAL2; \ + TYPE expected_##I = EXP; \ + volatile TYPE actual_##I = INTRINSIC (vec_##I##_1, vec_##I##_2); \ + if (actual_##I != expected_##I) \ +abort ();\ +} \ + while (0)\ + +/* Functions used to return values that won't be optimised away. */ +float32_t __attribute__
Re: [AARCH64][PATCH 2/3] Implementing vmulx_lane NEON intrinsic variants + Changelog
On 09/11/15 11:03, Bilyan Borisov wrote: On 03/11/15 11:16, James Greenhalgh wrote: On Fri, Oct 30, 2015 at 09:31:08AM +, Bilyan Borisov wrote: In this patch from the series, all vmulx_lane variants have been implemented as a vdup followed by a vmulx. Existing implementations of intrinsics were refactored to use this new approach. Several new nameless md patterns are added that will enable the combine pass to pick up the dup/fmulx combination and replace it with a proper fmulx[lane] instruction. In addition, test cases for all new intrinsics were added. Tested on targets aarch64-none-elf and aarch64_be-none-elf. Hi, I have a small style comment below. gcc/ 2015-XX-XX Bilyan Borisov * config/aarch64/arm_neon.h (vmulx_lane_f32): New. (vmulx_lane_f64): New. (vmulxq_lane_f32): Refactored & moved. (vmulxq_lane_f64): Refactored & moved. (vmulx_laneq_f32): New. (vmulx_laneq_f64): New. (vmulxq_laneq_f32): New. (vmulxq_laneq_f64): New. (vmulxs_lane_f32): New. (vmulxs_laneq_f32): New. (vmulxd_lane_f64): New. (vmulxd_laneq_f64): New. * config/aarch64/aarch64-simd.md (*aarch64_combine_dupfmulx1, VDQSF): New pattern. (*aarch64_combine_dupfmulx2, VDQF): New pattern. (*aarch64_combine_dupfmulx3): New pattern. (*aarch64_combine_vgetfmulx1, VDQF_DF): New pattern. I'm not sure I like the use of 1,2,3 for this naming scheme. Elsewhere in the file, this convention points to the number of operands a pattern requires (for example add3). I think elsewhere in the file we use: "*aarch64_mul3_elt" "*aarch64_mul3_elt_" "*aarch64_mul3_elt_to_128df" "*aarch64_mul3_elt_to_64v2df" Is there a reason not to follow that pattern? Thanks, James Hi, I've made the changes you've requested - the pattern names have been changed to follow better the naming conventions elsewhere in the file. Thanks, Bilyan Hi, You can find the new updated Changelog for this patch below. Thanks, Bilyan --- In this patch from the series, all vmulx_lane variants have been implemented as a vdup followed by a vmulx. 
Existing implementations of intrinsics were refactored to use this new approach. Several new nameless md patterns are added that will enable the combine pass to pick up the dup/fmulx combination and replace it with a proper fmulx[lane] instruction. In addition, test cases for all new intrinsics were added. Tested on targets aarch64-none-elf and aarch64_be-none-elf. gcc/ 2015-XX-XX Bilyan Borisov * config/aarch64/arm_neon.h (vmulx_lane_f32): New. (vmulx_lane_f64): Likewise. (vmulxq_lane_f32): Refactored & moved. (vmulxq_lane_f64): Likewise. (vmulx_laneq_f32): New. (vmulx_laneq_f64): Likewise. (vmulxq_laneq_f32): Likewise. (vmulxq_laneq_f64): Likewise. (vmulxs_lane_f32): Likewise. (vmulxs_laneq_f32): Likewise. (vmulxd_lane_f64): Likewise. (vmulxd_laneq_f64): Likewise. * config/aarch64/aarch64-simd.md (*aarch64_mulx_elt_, VDQSF): New pattern. (*aarch64_mulx_elt, VDQF): Likewise. (*aarch64_mulx_elt_to_64v2df): Likewise. (*aarch64_vgetfmulx, VDQF_DF): Likewise. gcc/testsuite/ 2015-XX-XX Bilyan Borisov * gcc.target/aarch64/simd/vmulx_lane_f32_1.c: New. * gcc.target/aarch64/simd/vmulx_lane_f64_1.c: New. * gcc.target/aarch64/simd/vmulx_laneq_f32_1.c: New. * gcc.target/aarch64/simd/vmulx_laneq_f64_1.c: New. * gcc.target/aarch64/simd/vmulxq_lane_f32_1.c: New. * gcc.target/aarch64/simd/vmulxq_lane_f64_1.c: New. * gcc.target/aarch64/simd/vmulxq_laneq_f32_1.c: New. * gcc.target/aarch64/simd/vmulxq_laneq_f64_1.c: New. * gcc.target/aarch64/simd/vmulxs_lane_f32_1.c: New. * gcc.target/aarch64/simd/vmulxs_laneq_f32_1.c: New. * gcc.target/aarch64/simd/vmulxd_lane_f64_1.c: New. * gcc.target/aarch64/simd/vmulxd_laneq_f64_1.c: New.
Re: [AARCH64][PATCH 2/3] Implementing vmulx_lane NEON intrinsic variants
On 03/11/15 11:16, James Greenhalgh wrote: On Fri, Oct 30, 2015 at 09:31:08AM +, Bilyan Borisov wrote: In this patch from the series, all vmulx_lane variants have been implemented as a vdup followed by a vmulx. Existing implementations of intrinsics were refactored to use this new approach. Several new nameless md patterns are added that will enable the combine pass to pick up the dup/fmulx combination and replace it with a proper fmulx[lane] instruction. In addition, test cases for all new intrinsics were added. Tested on targets aarch64-none-elf and aarch64_be-none-elf. Hi, I have a small style comment below. gcc/ 2015-XX-XX Bilyan Borisov * config/aarch64/arm_neon.h (vmulx_lane_f32): New. (vmulx_lane_f64): New. (vmulxq_lane_f32): Refactored & moved. (vmulxq_lane_f64): Refactored & moved. (vmulx_laneq_f32): New. (vmulx_laneq_f64): New. (vmulxq_laneq_f32): New. (vmulxq_laneq_f64): New. (vmulxs_lane_f32): New. (vmulxs_laneq_f32): New. (vmulxd_lane_f64): New. (vmulxd_laneq_f64): New. * config/aarch64/aarch64-simd.md (*aarch64_combine_dupfmulx1, VDQSF): New pattern. (*aarch64_combine_dupfmulx2, VDQF): New pattern. (*aarch64_combine_dupfmulx3): New pattern. (*aarch64_combine_vgetfmulx1, VDQF_DF): New pattern. I'm not sure I like the use of 1,2,3 for this naming scheme. Elsewhere in the file, this convention points to the number of operands a pattern requires (for example add3). I think elsewhere in the file we use: "*aarch64_mul3_elt" "*aarch64_mul3_elt_" "*aarch64_mul3_elt_to_128df" "*aarch64_mul3_elt_to_64v2df" Is there a reason not to follow that pattern? Thanks, James Hi, I've made the changes you've requested - the pattern names have been changed to follow better the naming conventions elsewhere in the file. 
Thanks, Bilyan diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index 269e00237bb1153ebf42505906ec5b760b04aafe..5ff19094b2fb10b332d186a6de02752b31ed4141 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -2880,6 +2880,79 @@ [(set_attr "type" "neon_fp_mul_")] ) +;; fmulxq_lane_f32, and fmulx_laneq_f32 + +(define_insn "*aarch64_mulx_elt_" + [(set (match_operand:VDQSF 0 "register_operand" "=w") + (unspec:VDQSF + [(match_operand:VDQSF 1 "register_operand" "w") + (vec_duplicate:VDQSF + (vec_select: + (match_operand: 2 "register_operand" "w") + (parallel [(match_operand:SI 3 "immediate_operand" "i")])))] + UNSPEC_FMULX))] + "TARGET_SIMD" + { +operands[3] = GEN_INT (ENDIAN_LANE_N (mode, + INTVAL (operands[3]))); +return "fmulx\t%0, %1, %2.[%3]"; + } + [(set_attr "type" "neon_fp_mul__scalar")] +) + +;; fmulxq_laneq_f32, fmulxq_laneq_f64, fmulx_lane_f32 + +(define_insn "*aarch64_mulx_elt" + [(set (match_operand:VDQF 0 "register_operand" "=w") + (unspec:VDQF + [(match_operand:VDQF 1 "register_operand" "w") + (vec_duplicate:VDQF + (vec_select: + (match_operand:VDQF 2 "register_operand" "w") + (parallel [(match_operand:SI 3 "immediate_operand" "i")])))] + UNSPEC_FMULX))] + "TARGET_SIMD" + { +operands[3] = GEN_INT (ENDIAN_LANE_N (mode, INTVAL (operands[3]))); +return "fmulx\t%0, %1, %2.[%3]"; + } + [(set_attr "type" "neon_fp_mul_")] +) + +;; fmulxq_lane_f64 + +(define_insn "*aarch64_mulx_elt_to_64v2df" + [(set (match_operand:V2DF 0 "register_operand" "=w") + (unspec:V2DF + [(match_operand:V2DF 1 "register_operand" "w") + (vec_duplicate:V2DF + (match_operand:DF 2 "register_operand" "w"))] + UNSPEC_FMULX))] + "TARGET_SIMD" + { +return "fmulx\t%0.2d, %1.2d, %2.d[0]"; + } + [(set_attr "type" "neon_fp_mul_d_scalar_q")] +) + +;; fmulxs_lane_f32, fmulxs_laneq_f32, fmulxd_lane_f64 == fmulx_lane_f64, +;; fmulxd_laneq_f64 == fmulx_laneq_f64 + +(define_insn "*aarch64_vgetfmulx" + [(set (match_operand: 0 
"register_operand" "=w") + (unspec: + [(match_operand: 1 "register_operand" "w") + (vec_select: + (match_operand:VDQF_DF 2 "register_operand" "w") + (parallel [(match_operand:SI 3 "immediate_operand" "i")]))] + UNSPEC_FMULX))] + "TARGET_SIMD" + { +operands[3] = GEN_INT (ENDIAN_LANE_N (mode,
[AARCH64][PATCH 3/3] Adding tests to check proper error reporting of out of bounds accesses to vmulx_lane* NEON intrinsics
Implementing vmulx_* and vmulx_lane* NEON intrinsics Hi all, This series of patches focuses on the different vmulx_ and vmulx_lane NEON intrinsics variants. All of the existing inlined assembly block implementations are replaced with newly defined __builtin functions, and the missing intrinsics are implemented with __builtins as well. The rationale for the change from assembly to __builtin is that the compiler would be able to do more optimisations like instruction scheduling. A new named md pattern was added for the new fmulx __builtin. Most vmulx_lane variants have been implemented as a combination of a vdup followed by a vmulx_, rather than as separate __builtins. The remaining vmulx_lane intrinsics (vmulx(s|d)_lane*) were implemented using __aarch64_vget_lane_any () and an appropriate vmulx. Four new nameless md patterns were added to replace all the different types of RTL generated from the combination of these intrinsics during the combine pass. The rationale for this change is that in this way we would be able to optimise away all uses of a dup followed by a fmulx to the appropriate fmulx lane variant instruction. New test cases were added for all the implemented intrinsics. Also new tests were added for the proper error reporting of out-of-bounds accesses to _lane intrinsics. Tested on targets aarch64-none-elf and aarch64_be-none-elf. Dependencies: patch 2/3 depends on patch 1/3, and patch 3/3 depends on patch 2/3. --- This patch from the series adds tests that check for the proper error reporting of out of bounds accesses to all the vmulx_lane NEON intrinsics variants. The tests were added separately from the previous patch in order to reduce its size. gcc/testsuite/ 2015-XX-XX Bilyan Borisov * gcc.target/aarch64/advsimd-intrinsics/vmulx_lane_f32_indices_1.c: New. * gcc.target/aarch64/advsimd-intrinsics/vmulx_lane_f64_indices_1.c: New. * gcc.target/aarch64/advsimd-intrinsics/vmulx_laneq_f32_indices_1.c: New. 
* gcc.target/aarch64/advsimd-intrinsics/vmulx_laneq_f64_indices_1.c: New. * gcc.target/aarch64/advsimd-intrinsics/vmulxd_lane_f64_indices_1.c: New. * gcc.target/aarch64/advsimd-intrinsics/vmulxd_laneq_f64_indices_1.c: New. * gcc.target/aarch64/advsimd-intrinsics/vmulxq_lane_f32_indices_1.c: New. * gcc.target/aarch64/advsimd-intrinsics/vmulxq_lane_f64_indices_1.c: New. * gcc.target/aarch64/advsimd-intrinsics/vmulxq_laneq_f32_indices_1.c: New. * gcc.target/aarch64/advsimd-intrinsics/vmulxq_laneq_f64_indices_1.c: New. * gcc.target/aarch64/advsimd-intrinsics/vmulxs_lane_f32_indices_1.c: New. * gcc.target/aarch64/advsimd-intrinsics/vmulxs_laneq_f32_indices_1.c: New. diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmulx_lane_f32_indices_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmulx_lane_f32_indices_1.c new file mode 100644 index ..5681d5d21bc62e54e308c0a7c171f6f1b8969b71 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmulx_lane_f32_indices_1.c @@ -0,0 +1,16 @@ +#include + +/* { dg-do compile } */ +/* { dg-skip-if "" { *-*-* } { "-fno-fat-lto-objects" } } */ +/* { dg-skip-if "" { arm*-*-* } } */ + +float32x2_t +f_vmulx_lane_f32 (float32x2_t v1, float32x2_t v2) +{ + float32x2_t res; + /* { dg-error "lane -1 out of range 0 - 1" "" { xfail arm*-*-* } 0 } */ + res = vmulx_lane_f32 (v1, v2, -1); + /* { dg-error "lane 2 out of range 0 - 1" "" { xfail arm*-*-* } 0 } */ + res = vmulx_lane_f32 (v1, v2, 2); + return res; +} diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmulx_lane_f64_indices_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmulx_lane_f64_indices_1.c new file mode 100644 index ..0c8f313d10423d6b189fe3c9081f0a6ef55f9937 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmulx_lane_f64_indices_1.c @@ -0,0 +1,16 @@ +#include + +/* { dg-do compile } */ +/* { dg-skip-if "" { *-*-* } { "-fno-fat-lto-objects" } } */ +/* { dg-skip-if "" { arm*-*-* } } */ + 
+float64x1_t +f_vmulx_lane_f64 (float64x1_t v1, float64x1_t v2) +{ + float64x1_t res; + /* { dg-error "lane -1 out of range 0 - 0" "" { xfail arm*-*-* } 0 } */ + res = vmulx_lane_f64 (v1, v2, -1); + /* { dg-error "lane 1 out of range 0 - 0" "" { xfail arm*-*-* } 0 } */ + res = vmulx_lane_f64 (v1, v2, 1); + return res; +} diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmulx_laneq_f32_indices_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmulx_laneq_f32_indices_1.c new file mode 100644 index ..9053ab6a7bd1bf9eba1d308ceed1524685e031e4 -
[AARCH64][PATCH 2/3] Implementing vmulx_lane NEON intrinsic variants
Implementing vmulx_* and vmulx_lane* NEON intrinsics Hi all, This series of patches focuses on the different vmulx_ and vmulx_lane NEON intrinsics variants. All of the existing inlined assembly block implementations are replaced with newly defined __builtin functions, and the missing intrinsics are implemented with __builtins as well. The rationale for the change from assembly to __builtin is that the compiler would be able to do more optimisations like instruction scheduling. A new named md pattern was added for the new fmulx __builtin. Most vmulx_lane variants have been implemented as a combination of a vdup followed by a vmulx_, rather than as separate __builtins. The remaining vmulx_lane intrinsics (vmulx(s|d)_lane*) were implemented using __aarch64_vget_lane_any () and an appropriate vmulx. Four new nameless md patterns were added to replace all the different types of RTL generated from the combination of these intrinsics during the combine pass. The rationale for this change is that in this way we would be able to optimise away all uses of a dup followed by a fmulx to the appropriate fmulx lane variant instruction. New test cases were added for all the implemented intrinsics. Also new tests were added for the proper error reporting of out-of-bounds accesses to _lane intrinsics. Tested on targets aarch64-none-elf and aarch64_be-none-elf. Dependencies: patch 2/3 depends on patch 1/3, and patch 3/3 depends on patch 2/3. --- In this patch from the series, all vmulx_lane variants have been implemented as a vdup followed by a vmulx. Existing implementations of intrinsics were refactored to use this new approach. Several new nameless md patterns are added that will enable the combine pass to pick up the dup/fmulx combination and replace it with a proper fmulx[lane] instruction. In addition, test cases for all new intrinsics were added. Tested on targets aarch64-none-elf and aarch64_be-none-elf. 
gcc/ 2015-XX-XX Bilyan Borisov * config/aarch64/arm_neon.h (vmulx_lane_f32): New. (vmulx_lane_f64): New. (vmulxq_lane_f32): Refactored & moved. (vmulxq_lane_f64): Refactored & moved. (vmulx_laneq_f32): New. (vmulx_laneq_f64): New. (vmulxq_laneq_f32): New. (vmulxq_laneq_f64): New. (vmulxs_lane_f32): New. (vmulxs_laneq_f32): New. (vmulxd_lane_f64): New. (vmulxd_laneq_f64): New. * config/aarch64/aarch64-simd.md (*aarch64_combine_dupfmulx1, VDQSF): New pattern. (*aarch64_combine_dupfmulx2, VDQF): New pattern. (*aarch64_combine_dupfmulx3): New pattern. (*aarch64_combine_vgetfmulx1, VDQF_DF): New pattern. gcc/testsuite/ 2015-XX-XX Bilyan Borisov * gcc.target/aarch64/simd/vmulx_lane_f32_1.c: New. * gcc.target/aarch64/simd/vmulx_lane_f64_1.c: New. * gcc.target/aarch64/simd/vmulx_laneq_f32_1.c: New. * gcc.target/aarch64/simd/vmulx_laneq_f64_1.c: New. * gcc.target/aarch64/simd/vmulxq_lane_f32_1.c: New. * gcc.target/aarch64/simd/vmulxq_lane_f64_1.c: New. * gcc.target/aarch64/simd/vmulxq_laneq_f32_1.c: New. * gcc.target/aarch64/simd/vmulxq_laneq_f64_1.c: New. * gcc.target/aarch64/simd/vmulxs_lane_f32_1.c: New. * gcc.target/aarch64/simd/vmulxs_laneq_f32_1.c: New. * gcc.target/aarch64/simd/vmulxd_lane_f64_1.c: New. * gcc.target/aarch64/simd/vmulxd_laneq_f64_1.c: New.
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index e7ebbd158d21691791a8d7db8a2616062e50..8d6873a45ad0cdef42f7c632bca38096b9de1787 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -2822,6 +2822,79 @@ [(set_attr "type" "neon_fp_mul_")] ) +;; fmulxq_lane_f32, and fmulx_laneq_f32 + +(define_insn "*aarch64_combine_dupfmulx1" + [(set (match_operand:VDQSF 0 "register_operand" "=w") + (unspec:VDQSF + [(match_operand:VDQSF 1 "register_operand" "w") + (vec_duplicate:VDQSF + (vec_select: + (match_operand: 2 "register_operand" "w") + (parallel [(match_operand:SI 3 "immediate_operand" "i")])))] + UNSPEC_FMULX))] + "TARGET_SIMD" + { +operands[3] = GEN_INT (ENDIAN_LANE_N (mode, + INTVAL (operands[3]))); +return "fmulx\t%0, %1, %2.[%3]"; + } + [(set_attr "type" "neon_fp_mul__scalar")] +) + +;; fmulxq_laneq_f32, fmulxq_laneq_f64, fmulx_lane_f32 + +(define_insn "*aarch64_combine_dupfmulx2" + [(set (match_operand:VDQF 0 "register_operand" "=w") + (unspec:VDQF + [(match_operand:VDQF 1 "register_operand" "w") + (vec_duplicate:V
[AARCH64][PATCH 1/3] Implementing the variants of the vmulx_ NEON intrinsic
Implementing vmulx_* and vmulx_lane* NEON intrinsics Hi all, This series of patches focuses on the different vmulx_ and vmulx_lane NEON intrinsics variants. All of the existing inlined assembly block implementations are replaced with newly defined __builtin functions, and the missing intrinsics are implemented with __builtins as well. The rationale for the change from assembly to __builtin is that the compiler would be able to do more optimisations like instruction scheduling. A new named md pattern was added for the new fmulx __builtin. Most vmulx_lane variants have been implemented as a combination of a vdup followed by a vmulx_, rather than as separate __builtins. The remaining vmulx_lane intrinsics (vmulx(s|d)_lane*) were implemented using __aarch64_vget_lane_any () and an appropriate vmulx. Four new nameless md patterns were added to replace all the different types of RTL generated from the combination of these intrinsics during the combine pass. The rationale for this change is that in this way we would be able to optimise away all uses of a dup followed by a fmulx to the appropriate fmulx lane variant instruction. New test cases were added for all the implemented intrinsics. Also new tests were added for the proper error reporting of out-of-bounds accesses to _lane intrinsics. Tested on targets aarch64-none-elf and aarch64_be-none-elf. Dependencies: patch 2/3 depends on patch 1/3, and patch 3/3 depends on patch 2/3. --- In this patch from the series, a single new md pattern is added: the one for fmulx, from which all necessary __builtin functions are derived. Several intrinsics were refactored to use the new __builtin functions as some of them already had an assembly block implementation. The rest, which had no existing implementation, were also added. A single intrinsic was removed: vmulx_lane_f32, since there was no test case that covered it and, moreover, its implementation was wrong: it was in fact implementing vmulxq_lane_f32. 
In addition, test cases for all new intrinsics were added. Tested on targets aarch64-none-elf and aarch64_be-none-elf. gcc/ 2015-XX-XX Bilyan Borisov * config/aarch64/aarch64-simd-builtins.def: BUILTIN declaration for fmulx... * config/aarch64/aarch64-simd.md: And its corresponding md pattern. * config/aarch64/arm_neon.h (vmulx_f32): Refactored to call fmulx __builtin, also moved. (vmulxq_f32): Same. (vmulx_f64): New, uses __builtin. (vmulxq_f64): Refactored to call fmulx __builtin, also moved. (vmulxs_f32): Same. (vmulxd_f64): Same. (vmulx_lane_f32): Removed, implementation was wrong. * config/aarch64/iterators.md: UNSPEC enum for fmulx. gcc/testsuite/ 2015-XX-XX Bilyan Borisov * gcc/testsuite/gcc.target/aarch64/simd/vmulx_f32_1.c: New. * gcc/testsuite/gcc.target/aarch64/simd/vmulx_f64_1.c: New. * gcc/testsuite/gcc.target/aarch64/simd/vmulxq_f32_1.c: New. * gcc/testsuite/gcc.target/aarch64/simd/vmulxq_f64_1.c: New. * gcc/testsuite/gcc.target/aarch64/simd/vmulxs_f32_1.c: New. * gcc/testsuite/gcc.target/aarch64/simd/vmulxd_f64_1.c: New. diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def index 2c13cfb0823640254f02c202b19ddae78484d537..eed5f2b21997d4ea439dea828a0888cb253ad041 100644 --- a/gcc/config/aarch64/aarch64-simd-builtins.def +++ b/gcc/config/aarch64/aarch64-simd-builtins.def @@ -41,6 +41,7 @@ BUILTIN_VDC (COMBINE, combine, 0) BUILTIN_VB (BINOP, pmul, 0) + BUILTIN_VALLF (BINOP, fmulx, 0) BUILTIN_VDQF_DF (UNOP, sqrt, 2) BUILTIN_VD_BHSI (BINOP, addp, 0) VAR1 (UNOP, addp, 0, di) diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index 541faf982effc7195a5f8d0d82738f76a7e04b4b..e7ebbd158d21691791a8d7db8a2616062e50 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -2810,6 +2810,18 @@ [(set_attr "type" "neon_mul_")] ) +;; fmulx. 
+ +(define_insn "aarch64_fmulx" + [(set (match_operand:VALLF 0 "register_operand" "=w") + (unspec:VALLF [(match_operand:VALLF 1 "register_operand" "w") + (match_operand:VALLF 2 "register_operand" "w")] + UNSPEC_FMULX))] + "TARGET_SIMD" + "fmulx\t%0, %1, %2" + [(set_attr "type" "neon_fp_mul_")] +) + ;; q (define_insn "aarch64_" diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h index 91ada618b79e038eb61e09ecd29af5129de81f51..4a3ef455b0945ed7e77fb3e78621d5010cd4c094 100644 --- a/gcc/config/aarch64/arm_neon.h +++ b/gcc/config/aarch64/arm_neon.h @@ -8509,63 +8509,6 @@ vmulq_n_u32 (uint32x4_t a, uint32_t b) return result; } -__extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) -vmulx_f32 (float32x2_t a, float32x2_t b)
[PATCH] [ARM] Replacing variable swaps that use a temporary variable with a call to std::swap in gcc/config/arm/arm.c
Replacing variable swaps that use a temporary variable with a call to std::swap. Tested against arm-none-eabi target including a variant with neon enabled. 2015-XX-XX Bilyan Borisov * config/arm/arm.c (thumb_output_move_mem_multiple): Replaced operands[4] operands[5] swap with std::swap, removed tmp variable. (arm_evpc_neon_vzip): Replaced in0/in1 and out0/out1 swaps with std::swap, removed x variable. (arm_evpc_neon_vtrn): Replaced in0/in1 and out0/out1 swaps with std::swap, removed x variable. (arm_expand_vec_perm_const_1): Replaced d->op0/d->op1 swap with std::swap, removed x variable. (arm_evpc_neon_vuzp): Replaced in0/in1 and out0/out1 swaps with std::swap, removed x variable. diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index fa4e083adfe215b5820237f3cc6b449dbdefc778..dc549fe470e5a82d740cf8014057ee2cc1d54085 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -25362,17 +25362,12 @@ thumb_load_double_from_address (rtx *operands) const char * thumb_output_move_mem_multiple (int n, rtx *operands) { - rtx tmp; - switch (n) { case 2: if (REGNO (operands[4]) > REGNO (operands[5])) - { - tmp = operands[4]; - operands[4] = operands[5]; - operands[5] = tmp; - } + std::swap (operands[4], operands[5]); + output_asm_insn ("ldmia\t%1!, {%4, %5}", operands); output_asm_insn ("stmia\t%0!, {%4, %5}", operands); break; @@ -27885,7 +27880,7 @@ static bool arm_evpc_neon_vuzp (struct expand_vec_perm_d *d) { unsigned int i, odd, mask, nelt = d->nelt; - rtx out0, out1, in0, in1, x; + rtx out0, out1, in0, in1; rtx (*gen)(rtx, rtx, rtx, rtx); if (GET_MODE_UNIT_SIZE (d->vmode) >= 8) @@ -27929,14 +27924,14 @@ arm_evpc_neon_vuzp (struct expand_vec_perm_d *d) in1 = d->op1; if (BYTES_BIG_ENDIAN) { - x = in0, in0 = in1, in1 = x; + std::swap (in0, in1); odd = !odd; } out0 = d->target; out1 = gen_reg_rtx (d->vmode); if (odd) -x = out0, out0 = out1, out1 = x; +std::swap (out0, out1); emit_insn (gen (out0, in0, in1, out1)); return true; @@ -27948,7 +27943,7 @@ static
bool arm_evpc_neon_vzip (struct expand_vec_perm_d *d) { unsigned int i, high, mask, nelt = d->nelt; - rtx out0, out1, in0, in1, x; + rtx out0, out1, in0, in1; rtx (*gen)(rtx, rtx, rtx, rtx); if (GET_MODE_UNIT_SIZE (d->vmode) >= 8) @@ -27996,14 +27991,14 @@ arm_evpc_neon_vzip (struct expand_vec_perm_d *d) in1 = d->op1; if (BYTES_BIG_ENDIAN) { - x = in0, in0 = in1, in1 = x; + std::swap (in0, in1); high = !high; } out0 = d->target; out1 = gen_reg_rtx (d->vmode); if (high) -x = out0, out0 = out1, out1 = x; +std::swap (out0, out1); emit_insn (gen (out0, in0, in1, out1)); return true; @@ -28089,7 +28084,7 @@ static bool arm_evpc_neon_vtrn (struct expand_vec_perm_d *d) { unsigned int i, odd, mask, nelt = d->nelt; - rtx out0, out1, in0, in1, x; + rtx out0, out1, in0, in1; rtx (*gen)(rtx, rtx, rtx, rtx); if (GET_MODE_UNIT_SIZE (d->vmode) >= 8) @@ -28134,14 +28129,14 @@ arm_evpc_neon_vtrn (struct expand_vec_perm_d *d) in1 = d->op1; if (BYTES_BIG_ENDIAN) { - x = in0, in0 = in1, in1 = x; + std::swap (in0, in1); odd = !odd; } out0 = d->target; out1 = gen_reg_rtx (d->vmode); if (odd) -x = out0, out0 = out1, out1 = x; +std::swap (out0, out1); emit_insn (gen (out0, in0, in1, out1)); return true; @@ -28264,14 +28259,11 @@ arm_expand_vec_perm_const_1 (struct expand_vec_perm_d *d) if (d->perm[0] >= d->nelt) { unsigned i, nelt = d->nelt; - rtx x; for (i = 0; i < nelt; ++i) d->perm[i] = (d->perm[i] + nelt) & (2 * nelt - 1); - x = d->op0; - d->op0 = d->op1; - d->op1 = x; + std::swap (d->op0, d->op1); } if (TARGET_NEON)