[AARCH32][ACLE][NEON] Implement vcvt*_s32_f32 and vcvt*_u32_f32 NEON intrinsics.
This patch implements all the vcvtRQ_s32_f32 and vcvtRQ_u32_f32 vector intrinsics, where R is one of ['', 'a', 'm', 'n', 'p'] and Q is one of ['', 'q']. The intrinsics are implemented with builtin functions that map to the existing neon_vcvt pattern, which was extended to cover the above cross product of all variants. In addition, a new unary type qualifier for builtins, UNOPUS, was added to enable the builtin functions to be called without casts.

Cross tested on arm-none-eabi, arm-none-linux-gnueabi, arm-none-linux-gnueabihf, and armeb-none-eabi.

---

gcc/

2015-XX-XX  Bilyan Borisov

	* config/arm/arm-builtins.c (arm_unopus_qualifiers): New qualifier.
	* config/arm/arm_neon.h (vcvta_s32_f32): New intrinsic.
	(vcvta_u32_f32): Likewise.
	(vcvtaq_s32_f32): Likewise.
	(vcvtaq_u32_f32): Likewise.
	(vcvtm_s32_f32): Likewise.
	(vcvtm_u32_f32): Likewise.
	(vcvtmq_s32_f32): Likewise.
	(vcvtmq_u32_f32): Likewise.
	(vcvtn_s32_f32): Likewise.
	(vcvtn_u32_f32): Likewise.
	(vcvtnq_s32_f32): Likewise.
	(vcvtnq_u32_f32): Likewise.
	(vcvtp_s32_f32): Likewise.
	(vcvtp_u32_f32): Likewise.
	(vcvtpq_s32_f32): Likewise.
	(vcvtpq_u32_f32): Likewise.
	* config/arm/arm_neon_builtins.def (vcvtas): New builtin.
	(vcvtau): Likewise.
	(vcvtps): Likewise.
	(vcvtpu): Likewise.
	(vcvtms): Likewise.
	(vcvtmu): Likewise.
	(vcvtns): Likewise.
	(vcvtnu): Likewise.
	* config/arm/iterators.md (VCVTR_US): New int iterator.
	(VQMOVN): Modified int iterator.
	(rndmode): New int attribute.
	* config/arm/neon.md (neon_vcvt): Modified pattern.
	* config/arm/unspecs.md (UNSPEC_VCVTA_S): New unspec definition.
	(UNSPEC_VCVTA_U): Likewise.
	(UNSPEC_VCVTM_S): Likewise.
	(UNSPEC_VCVTM_U): Likewise.
	(UNSPEC_VCVTN_S): Likewise.
	(UNSPEC_VCVTN_U): Likewise.
	(UNSPEC_VCVTP_S): Likewise.
	(UNSPEC_VCVTP_U): Likewise.

gcc/testsuite/

2015-XX-XX  Bilyan Borisov

	* gcc.target/arm/neon/vcvta_s32_f32_1.c: New.
	* gcc.target/arm/neon/vcvta_u32_f32_1.c: Likewise.
	* gcc.target/arm/neon/vcvtaq_s32_f32_1.c: Likewise.
	* gcc.target/arm/neon/vcvtaq_u32_f32_1.c: Likewise.
	* gcc.target/arm/neon/vcvtm_s32_f32_1.c: Likewise.
	* gcc.target/arm/neon/vcvtm_u32_f32_1.c: Likewise.
	* gcc.target/arm/neon/vcvtmq_s32_f32_1.c: Likewise.
	* gcc.target/arm/neon/vcvtmq_u32_f32_1.c: Likewise.
	* gcc.target/arm/neon/vcvtn_s32_f32_1.c: Likewise.
	* gcc.target/arm/neon/vcvtn_u32_f32_1.c: Likewise.
	* gcc.target/arm/neon/vcvtnq_s32_f32_1.c: Likewise.
	* gcc.target/arm/neon/vcvtnq_u32_f32_1.c: Likewise.
	* gcc.target/arm/neon/vcvtp_s32_f32_1.c: Likewise.
	* gcc.target/arm/neon/vcvtp_u32_f32_1.c: Likewise.
	* gcc.target/arm/neon/vcvtpq_s32_f32_1.c: Likewise.
	* gcc.target/arm/neon/vcvtpq_u32_f32_1.c: Likewise.

diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 11cd17d0b8f3c29ccbe16cb463a17d55ba0fa1e3..d635372c3257b33f9ca353f150e68bb6f9621a8d 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -87,6 +87,12 @@ arm_bswap_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_unsigned, qualifier_unsigned };
 #define BSWAP_QUALIFIERS (arm_bswap_qualifiers)
 
+/* unsigned T (T).  */
+static enum arm_type_qualifiers
+arm_unopus_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_unsigned, qualifier_none };
+#define UNOPUS_QUALIFIERS (arm_unopus_qualifiers)
+
 /* T (T, T [maybe_immediate]).  */
 static enum arm_type_qualifiers
 arm_binop_qualifiers[SIMD_MAX_BUILTIN_ARGS]

diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index 0a33d21f2fcf8a1074fb62e89f4418295d446db5..fc3f5aa24c88f3aec1cfd234f9ca0aec11e1c612 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -6356,6 +6356,106 @@ vcvtq_u32_f32 (float32x4_t __a)
 }
 
 #pragma GCC push_options
+#pragma GCC target ("fpu=neon-fp-armv8")
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+vcvta_s32_f32 (float32x2_t __a)
+{
+  return __builtin_neon_vcvtasv2sf (__a);
+}
+
+__extension__ static __inline uint32x2_t __attribute__ ((__always_inline__))
+vcvta_u32_f32 (float32x2_t __a)
+{
+  return __builtin_neon_vcvtauv2sf_us (__a);
+}
+
+__extension__ static __inline int32x4_t __attribute__ ((__always_inline__))
+vcvtaq_s32_f32 (float32x4_t __a)
+{
+  return __builtin_neon_vcvtasv4sf (__a);
+}
+
+__extension__ static __inline uint32x4_t __attribute__ ((__always_inline__))
+vcvtaq_u32_f32 (float32x4_t __a)
+{
+  return __builtin_neon_vcvtauv4sf_us (__a);
+}
+
+__extension__ static __inline int32x2_t __attribute__ ((__always_inline__))
+vcvtm_s32_f32
[AARCH64][ACLE][NEON] Implement vcvt*_s64_f64 and vcvt*_u64_f64 NEON intrinsics.
This patch implements all the vcvtR_s64_f64 and vcvtR_u64_f64 vector intrinsics, where R is one of ['', 'a', 'm', 'n', 'p']. Since these intrinsics are identical in semantics to the corresponding scalar variants, they are implemented in terms of them, with appropriate packing and unpacking of the vector arguments. New test cases covering all the intrinsics were also added.

Cross tested on aarch64-none-elf and aarch64-none-linux-gnu. Bootstrapped and tested on aarch64-none-linux-gnu.

---

gcc/

2015-XX-XX  Bilyan Borisov

	* config/aarch64/arm_neon.h (vcvt_s64_f64): New intrinsic.
	(vcvt_u64_f64): Likewise.
	(vcvta_s64_f64): Likewise.
	(vcvta_u64_f64): Likewise.
	(vcvtm_s64_f64): Likewise.
	(vcvtm_u64_f64): Likewise.
	(vcvtn_s64_f64): Likewise.
	(vcvtn_u64_f64): Likewise.
	(vcvtp_s64_f64): Likewise.
	(vcvtp_u64_f64): Likewise.

gcc/testsuite/

2015-XX-XX  Bilyan Borisov

	* gcc.target/aarch64/simd/vcvt_s64_f64_1.c: New.
	* gcc.target/aarch64/simd/vcvt_u64_f64_1.c: Likewise.
	* gcc.target/aarch64/simd/vcvta_s64_f64_1.c: Likewise.
	* gcc.target/aarch64/simd/vcvta_u64_f64_1.c: Likewise.
	* gcc.target/aarch64/simd/vcvtm_s64_f64_1.c: Likewise.
	* gcc.target/aarch64/simd/vcvtm_u64_f64_1.c: Likewise.
	* gcc.target/aarch64/simd/vcvtn_s64_f64_1.c: Likewise.
	* gcc.target/aarch64/simd/vcvtn_u64_f64_1.c: Likewise.
	* gcc.target/aarch64/simd/vcvtp_s64_f64_1.c: Likewise.
	* gcc.target/aarch64/simd/vcvtp_u64_f64_1.c: Likewise.
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h index c78f2524fa62b1ceedce86ee64cadfa67d3b0d0c..1e19a9e2ed96b7b7c5715be41b98e9c1407a74f9 100644 --- a/gcc/config/aarch64/arm_neon.h +++ b/gcc/config/aarch64/arm_neon.h @@ -13218,6 +13218,18 @@ vcvtq_u32_f32 (float32x4_t __a) return __builtin_aarch64_lbtruncuv4sfv4si_us (__a); } +__extension__ static __inline int64x1_t __attribute__ ((__always_inline__)) +vcvt_s64_f64 (float64x1_t __a) +{ + return (int64x1_t) {vcvtd_s64_f64 (__a[0])}; +} + +__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__)) +vcvt_u64_f64 (float64x1_t __a) +{ + return (uint64x1_t) {vcvtd_u64_f64 (__a[0])}; +} + __extension__ static __inline int64x2_t __attribute__ ((__always_inline__)) vcvtq_s64_f64 (float64x2_t __a) { @@ -13280,6 +13292,18 @@ vcvtaq_u32_f32 (float32x4_t __a) return __builtin_aarch64_lrounduv4sfv4si_us (__a); } +__extension__ static __inline int64x1_t __attribute__ ((__always_inline__)) +vcvta_s64_f64 (float64x1_t __a) +{ + return (int64x1_t) {vcvtad_s64_f64 (__a[0])}; +} + +__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__)) +vcvta_u64_f64 (float64x1_t __a) +{ + return (uint64x1_t) {vcvtad_u64_f64 (__a[0])}; +} + __extension__ static __inline int64x2_t __attribute__ ((__always_inline__)) vcvtaq_s64_f64 (float64x2_t __a) { @@ -13342,6 +13366,18 @@ vcvtmq_u32_f32 (float32x4_t __a) return __builtin_aarch64_lflooruv4sfv4si_us (__a); } +__extension__ static __inline int64x1_t __attribute__ ((__always_inline__)) +vcvtm_s64_f64 (float64x1_t __a) +{ + return (int64x1_t) {vcvtmd_s64_f64 (__a[0])}; +} + +__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__)) +vcvtm_u64_f64 (float64x1_t __a) +{ + return (uint64x1_t) {vcvtmd_u64_f64 (__a[0])}; +} + __extension__ static __inline int64x2_t __attribute__ ((__always_inline__)) vcvtmq_s64_f64 (float64x2_t __a) { @@ -13404,6 +13440,18 @@ vcvtnq_u32_f32 (float32x4_t __a) return 
__builtin_aarch64_lfrintnuv4sfv4si_us (__a); } +__extension__ static __inline int64x1_t __attribute__ ((__always_inline__)) +vcvtn_s64_f64 (float64x1_t __a) +{ + return (int64x1_t) {vcvtnd_s64_f64 (__a[0])}; +} + +__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__)) +vcvtn_u64_f64 (float64x1_t __a) +{ + return (uint64x1_t) {vcvtnd_u64_f64 (__a[0])}; +} + __extension__ static __inline int64x2_t __attribute__ ((__always_inline__)) vcvtnq_s64_f64 (float64x2_t __a) { @@ -13466,6 +13514,18 @@ vcvtpq_u32_f32 (float32x4_t __a) return __builtin_aarch64_lceiluv4sfv4si_us (__a); } +__extension__ static __inline int64x1_t __attribute__ ((__always_inline__)) +vcvtp_s64_f64 (float64x1_t __a) +{ + return (int64x1_t) {vcvtpd_s64_f64 (__a[0])}; +} + +__extension__ static __inline uint64x1_t __attribute__ ((__always_inline__)) +vcvtp_u64_f64 (float64x1_t __a) +{ + return (uint64x1_t) {vcvtpd_u64_f64 (__a[0])}; +} + __extension__ static __inline int64x2_t __attribute__ ((__always_inline__)) vcvtpq_s64_f64 (float64x2_t __a) { diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vcvt_s64_f64_1.c b/gcc/testsuite/gcc.target/aarch64/simd/vcvt_s64_f64_1.c new file mode 100644 index ..02f59fc7e58c988141f8f00c8866c71f2d5d660b --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/simd/vcvt_s64_f64_1.c @@ -0,0 +1,25 @@ +/* { dg-do run } *
[AARCH64][ACLE] Implement __ARM_FP_FENV_ROUNDING in aarch64 backend.
This patch implements the __ARM_FP_FENV_ROUNDING macro in the aarch64 backend. AArch64 supports configurable rounding modes, which can be set using the standard C fesetround() function. According to the ACLE 2.0 specification, __ARM_FP_FENV_ROUNDING is defined to 1 only when fesetround() is provided. Since newlib doesn't provide this function, and since ARMv8 AArch64 hardware supports configurable rounding modes for both scalar and SIMD, we only define the macro when building a compiler that targets glibc (i.e. aarch64-none-linux-gnu), which does provide fesetround().

Cross tested on aarch64-none-elf and aarch64-none-linux-gnu. Bootstrapped and tested on aarch64-none-linux-gnu.

---

gcc/

2015-XX-XX  Bilyan Borisov

	* config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): New macro
	definition.

gcc/testsuite/

2015-XX-XX  Bilyan Borisov

	* gcc.target/aarch64/fesetround-checking-baremetal.c: New.
	* gcc.target/aarch64/fesetround-checking-linux.c: Likewise.

diff --git a/gcc/config/aarch64/aarch64-c.c b/gcc/config/aarch64/aarch64-c.c
index ad95c78b9895a33da3e5a0ec6328219b887ade37..0e07d74639340176f51587e37e5ea91b57fdeee1 100644
--- a/gcc/config/aarch64/aarch64-c.c
+++ b/gcc/config/aarch64/aarch64-c.c
@@ -97,8 +97,15 @@ aarch64_update_cpp_builtins (cpp_reader *pfile)
   aarch64_def_or_undef (TARGET_SIMD, "__ARM_FEATURE_NUMERIC_MAXMIN", pfile);
   aarch64_def_or_undef (TARGET_SIMD, "__ARM_NEON", pfile);
-
-
+  /* Since fesetround () is not present in newlib, but is present in glibc,
+     we only define the __ARM_FP_FENV_ROUNDING macro to 1 when glibc is
+     used, i.e. on aarch64-none-linux-gnu.  The OPTION_GLIBC macro is
+     undefined on aarch64-none-elf, and is defined on aarch64-none-linux-gnu
+     to an expression that evaluates to 1 when targeting glibc.  The musl,
+     bionic, and uclibc cases haven't been investigated, so take no action
+     there.
*/ +#ifdef OPTION_GLIBC +aarch64_def_or_undef (OPTION_GLIBC, "__ARM_FP_FENV_ROUNDING", pfile); +#endif aarch64_def_or_undef (TARGET_CRC32, "__ARM_FEATURE_CRC32", pfile); cpp_undef (pfile, "__AARCH64_CMODEL_TINY__"); diff --git a/gcc/testsuite/gcc.target/aarch64/fesetround-checking-baremetal.c b/gcc/testsuite/gcc.target/aarch64/fesetround-checking-baremetal.c new file mode 100644 index ..1202dfa2e81923f0d2c9b6be1eb71e3192d5d974 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/fesetround-checking-baremetal.c @@ -0,0 +1,18 @@ +/* { dg-do compile { target { aarch64*-*-elf* } } } */ + +extern void abort (); + +int +main () +{ + /* According to ACLE 2.0, __ARM_FP_FENV_ROUNDING macro should be defined to 1 + when fesetround () is supported. Since aarch64-none-elf which uses newlib + doesn't support this, we error if the macro is defined to 1. + */ +#if defined (__ARM_FP_FENV_ROUNDING) +#error "According to ACLE 2.0, __ARM_FP_FENV_ROUNDING macro should be defined \ +to 1 only when fesetround () is supported." +#else + return 0; +#endif +} diff --git a/gcc/testsuite/gcc.target/aarch64/fesetround-checking-linux.c b/gcc/testsuite/gcc.target/aarch64/fesetround-checking-linux.c new file mode 100644 index ..d79f384d672bf8d0847b41ab8cc2cc594a805dca --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/fesetround-checking-linux.c @@ -0,0 +1,18 @@ +/* { dg-do compile { target { aarch64*-*-linux* } } } */ + +extern void abort (); + +int +main () +{ + /* According to ACLE 2.0, __ARM_FP_FENV_ROUNDING macro should be defined to 1 + when fesetround () is supported. In our case, this is on + aarch64-none-linux-gnu. + */ +#if defined (__ARM_FP_FENV_ROUNDING) && (__ARM_FP_FENV_ROUNDING == 1) + return 0; +#else +#error "According to ACLE 2.0, __ARM_FP_FENV_ROUNDING macro should be defined \ +to 1 when fesetround () is supported." +#endif +}
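Client code is expected to key on the macro exactly the way the two tests above do. A minimal, portable sketch of the ACLE-prescribed feature test (the function name is illustrative; ACLE only specifies the macro):

```c
/* Compile-time feature test in the style ACLE prescribes: the macro is
   either defined to 1 (fesetround () available in the C library) or
   left undefined (e.g. newlib).  */
static int have_dynamic_rounding (void)
{
#if defined (__ARM_FP_FENV_ROUNDING) && __ARM_FP_FENV_ROUNDING == 1
  return 1;   /* fesetround () may be called to change the rounding mode.  */
#else
  return 0;   /* Only the default rounding mode is guaranteed.  */
#endif
}
```

With this patch, the function returns 1 when compiled for aarch64-none-linux-gnu and 0 for aarch64-none-elf; on any other target it simply reports whatever the compiler defines.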
[AArch32][NEON] Implementing vmaxnmQ_ST and vminnmQ_ST intrinsics.
This patch implements the vmaxnmQ_ST and vminnmQ_ST intrinsics. The current builtin registration code is deficient in that it can't access standard pattern names, to which vmaxnmQ_ST and vminnmQ_ST map directly. Thus, to give the vectoriser access to these intrinsics, we implement them as builtin functions which we expand to the proper standard pattern using a define_expand. This patch also implements the __ARM_FEATURE_NUMERIC_MAXMIN macro, which is defined when __ARM_ARCH >= 8 and which enables the intrinsics.

Cross tested on arm-none-eabi, armeb-none-eabi, arm-none-linux-gnueabi, and arm-none-linux-gnueabihf. Bootstrapped and tested on arm-none-linux-gnueabihf.

---

gcc/

2015-XX-XX  Bilyan Borisov

	* config/arm/arm-c.c (arm_cpu_builtins): New macro definition.
	* config/arm/arm_neon.h (vmaxnm_f32): New intrinsic.
	(vmaxnmq_f32): Likewise.
	(vminnm_f32): Likewise.
	(vminnmq_f32): Likewise.
	* config/arm/arm_neon_builtins.def (vmaxnm): New builtin.
	(vminnm): Likewise.
	* config/arm/neon.md (neon_, VCVTF): New expander.

gcc/testsuite/

2015-XX-XX  Bilyan Borisov

	* gcc.target/arm/simd/vmaxnm_f32_1.c: New.
	* gcc.target/arm/simd/vmaxnmq_f32_1.c: Likewise.
	* gcc.target/arm/simd/vminnm_f32_1.c: Likewise.
	* gcc.target/arm/simd/vminnmq_f32_1.c: Likewise.
diff --git a/gcc/config/arm/arm-c.c b/gcc/config/arm/arm-c.c index 7dee28ec52df68f8c7a60fe66e1b049fed39c1c0..7b63bdcf86c079288611f79ed89d6540b348fe82 100644 --- a/gcc/config/arm/arm-c.c +++ b/gcc/config/arm/arm-c.c @@ -83,6 +83,9 @@ arm_cpu_builtins (struct cpp_reader* pfile) ((TARGET_ARM_ARCH >= 5 && !TARGET_THUMB) || TARGET_ARM_ARCH_ISA_THUMB >=2)); + def_or_undef_macro (pfile, "__ARM_FEATURE_NUMERIC_MAXMIN", + TARGET_ARM_ARCH >= 8); + def_or_undef_macro (pfile, "__ARM_FEATURE_SIMD32", TARGET_INT_SIMD); builtin_define_with_int_value ("__ARM_SIZEOF_MINIMAL_ENUM", diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h index 0a33d21f2fcf8a1074fb62e89f4418295d446db5..2e28621cd3a3bf6682ce5353cfb1fd5d8d6c55ad 100644 --- a/gcc/config/arm/arm_neon.h +++ b/gcc/config/arm/arm_neon.h @@ -2889,6 +2889,34 @@ vmaxq_f32 (float32x4_t __a, float32x4_t __b) return (float32x4_t)__builtin_neon_vmaxfv4sf (__a, __b); } +#pragma GCC push_options +#pragma GCC target ("fpu=neon-fp-armv8") +__extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) +vmaxnm_f32 (float32x2_t a, float32x2_t b) +{ + return (float32x2_t)__builtin_neon_vmaxnmv2sf (a, b); +} + +__extension__ static __inline float32x4_t __attribute__ ((__always_inline__)) +vmaxnmq_f32 (float32x4_t a, float32x4_t b) +{ + return (float32x4_t)__builtin_neon_vmaxnmv4sf (a, b); +} + +__extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) +vminnm_f32 (float32x2_t a, float32x2_t b) +{ + return (float32x2_t)__builtin_neon_vminnmv2sf (a, b); +} + +__extension__ static __inline float32x4_t __attribute__ ((__always_inline__)) +vminnmq_f32 (float32x4_t a, float32x4_t b) +{ + return (float32x4_t)__builtin_neon_vminnmv4sf (a, b); +} +#pragma GCC pop_options + + __extension__ static __inline uint8x16_t __attribute__ ((__always_inline__)) vmaxq_u8 (uint8x16_t __a, uint8x16_t __b) { diff --git a/gcc/config/arm/arm_neon_builtins.def b/gcc/config/arm/arm_neon_builtins.def index 
0b719df760747af7642bd14ab14a9b2144d43359..1d3b6e9b6a08a3cf3b0d6f76bf340208919c9b13 100644 --- a/gcc/config/arm/arm_neon_builtins.def +++ b/gcc/config/arm/arm_neon_builtins.def @@ -126,6 +126,9 @@ VAR6 (BINOP, vmins, v8qi, v4hi, v2si, v16qi, v8hi, v4si) VAR6 (BINOP, vminu, v8qi, v4hi, v2si, v16qi, v8hi, v4si) VAR2 (BINOP, vminf, v2sf, v4sf) +VAR2 (BINOP, vmaxnm, v2sf, v4sf) +VAR2 (BINOP, vminnm, v2sf, v4sf) + VAR3 (BINOP, vpmaxs, v8qi, v4hi, v2si) VAR3 (BINOP, vpmaxu, v8qi, v4hi, v2si) VAR1 (BINOP, vpmaxf, v2sf) diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md index 94c63fd3dbc071291844fbe7732435465fbc0ada..b745199fe25d7afd7d93598e12e74d8efb4863fe 100644 --- a/gcc/config/arm/neon.md +++ b/gcc/config/arm/neon.md @@ -2354,6 +2354,18 @@ [(set_attr "type" "neon_fp_minmax_s")] ) +;; Expander for vnm intrinsics. +(define_expand "neon_" + [(unspec:VCVTF [(match_operand:VCVTF 0 "s_register_operand" "") + (match_operand:VCVTF 1 "s_register_operand" "") + (match_operand:VCVTF 2 "s_register_operand" "")] + VMAXMINFNM)] + "TARGET_NEON && TARGET_FPU_ARMV8" +{ + emit_insn (gen_3 (operands[0], operands[1], operands[2])); + DONE; +}) + ;; Vector forms for the IEEE-754 fmax()/fmin() functions (define_insn "3" [(set (match_operand:VCVTF 0 "s_register_operand" "=w") diff --git a/gcc/testsuite/gcc.target/arm/simd/vmaxnm_f32_1.c b/
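The "nm" in vmaxnm/vminnm is the IEEE 754-2008 maxNum/minNum behaviour: when exactly one operand is a quiet NaN, the numeric operand is returned rather than NaN (unlike vmax/vmin, which propagate the NaN). A portable one-lane model, written without libm; lane_maxnm, lane_minnm, and quiet_nan are illustrative helper names, not code from the patch.

```c
/* One-lane models of VMAXNM/VMINNM: one NaN operand => return the
   other operand; otherwise an ordinary max/min.  The comparison
   x != x is true exactly when x is a NaN.  */
static float lane_maxnm (float a, float b)
{
  if (a != a) return b;
  if (b != b) return a;
  return a > b ? a : b;
}

static float lane_minnm (float a, float b)
{
  if (a != a) return b;
  if (b != b) return a;
  return a < b ? a : b;
}

/* Produce a quiet NaN at run time without pulling in libm; the
   volatile stops the compiler from folding 0/0 at compile time.  */
static float quiet_nan (void)
{
  volatile float z = 0.0f;
  return z / z;
}
```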
[AArch32][NEON] Implementing vmaxnmQ_ST and vminnmQ_ST intrinsics.
This patch implements the vmaxnmQ_ST and vminnmQ_ST intrinsics. It also implements the __ARM_FEATURE_NUMERIC_MAXMIN macro, which is defined when __ARM_ARCH >= 8 and which enables the intrinsics.

Tested on arm-none-eabi, armeb-none-eabi, arm-none-linux-gnueabihf.

---

gcc/

2015-XX-XX  Bilyan Borisov

	* config/arm/arm-c.c (arm_cpu_builtins): New macro definition.
	* config/arm/arm_neon.h (vmaxnm_f32): New intrinsic.
	(vmaxnmq_f32): Likewise.
	(vminnm_f32): Likewise.
	(vminnmq_f32): Likewise.
	* config/arm/arm_neon_builtins.def (vmaxnm): New builtin.
	(vminnm): Likewise.
	* config/arm/iterators.md (VMAXMINNM): New iterator.
	(maxmin): Updated iterator.
	* config/arm/neon.md (neon_v, VCVTF): New pattern.
	* config/arm/unspecs.md (UNSPEC_VMAXNM): New unspec.
	(UNSPEC_VMINNM): Likewise.

gcc/testsuite/

2015-XX-XX  Bilyan Borisov

	* gcc.target/arm/simd/vmaxnm_f32_1.c: New.
	* gcc.target/arm/simd/vmaxnmq_f32_1.c: Likewise.
	* gcc.target/arm/simd/vminnm_f32_1.c: Likewise.
	* gcc.target/arm/simd/vminnmq_f32_1.c: Likewise.
diff --git a/gcc/config/arm/arm-c.c b/gcc/config/arm/arm-c.c index 7dee28ec52df68f8c7a60fe66e1b049fed39c1c0..7b63bdcf86c079288611f79ed89d6540b348fe82 100644 --- a/gcc/config/arm/arm-c.c +++ b/gcc/config/arm/arm-c.c @@ -83,6 +83,9 @@ arm_cpu_builtins (struct cpp_reader* pfile) ((TARGET_ARM_ARCH >= 5 && !TARGET_THUMB) || TARGET_ARM_ARCH_ISA_THUMB >=2)); + def_or_undef_macro (pfile, "__ARM_FEATURE_NUMERIC_MAXMIN", + TARGET_ARM_ARCH >= 8); + def_or_undef_macro (pfile, "__ARM_FEATURE_SIMD32", TARGET_INT_SIMD); builtin_define_with_int_value ("__ARM_SIZEOF_MINIMAL_ENUM", diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h index 0a33d21f2fcf8a1074fb62e89f4418295d446db5..0c8c08cc404cbc446db648d41f0773d0b4798a3a 100644 --- a/gcc/config/arm/arm_neon.h +++ b/gcc/config/arm/arm_neon.h @@ -2907,6 +2907,33 @@ vmaxq_u32 (uint32x4_t __a, uint32x4_t __b) return (uint32x4_t)__builtin_neon_vmaxuv4si ((int32x4_t) __a, (int32x4_t) __b); } +#pragma GCC push_options +#pragma GCC target ("fpu=neon-fp-armv8") +__extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) +vmaxnm_f32 (float32x2_t a, float32x2_t b) +{ + return (float32x2_t)__builtin_neon_vmaxnmv2sf (a, b); +} + +__extension__ static __inline float32x4_t __attribute__ ((__always_inline__)) +vmaxnmq_f32 (float32x4_t a, float32x4_t b) +{ + return (float32x4_t)__builtin_neon_vmaxnmv4sf (a, b); +} + +__extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) +vminnm_f32 (float32x2_t a, float32x2_t b) +{ + return (float32x2_t)__builtin_neon_vminnmv2sf (a, b); +} + +__extension__ static __inline float32x4_t __attribute__ ((__always_inline__)) +vminnmq_f32 (float32x4_t a, float32x4_t b) +{ + return (float32x4_t)__builtin_neon_vminnmv4sf (a, b); +} +#pragma GCC pop_options + __extension__ static __inline int8x8_t __attribute__ ((__always_inline__)) vmin_s8 (int8x8_t __a, int8x8_t __b) { diff --git a/gcc/config/arm/arm_neon_builtins.def b/gcc/config/arm/arm_neon_builtins.def index 
0b719df760747af7642bd14ab14a9b2144d43359..1d3b6e9b6a08a3cf3b0d6f76bf340208919c9b13 100644 --- a/gcc/config/arm/arm_neon_builtins.def +++ b/gcc/config/arm/arm_neon_builtins.def @@ -126,6 +126,9 @@ VAR6 (BINOP, vmins, v8qi, v4hi, v2si, v16qi, v8hi, v4si) VAR6 (BINOP, vminu, v8qi, v4hi, v2si, v16qi, v8hi, v4si) VAR2 (BINOP, vminf, v2sf, v4sf) +VAR2 (BINOP, vmaxnm, v2sf, v4sf) +VAR2 (BINOP, vminnm, v2sf, v4sf) + VAR3 (BINOP, vpmaxs, v8qi, v4hi, v2si) VAR3 (BINOP, vpmaxu, v8qi, v4hi, v2si) VAR1 (BINOP, vpmaxf, v2sf) diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md index 6a541251ed1e5d7c766aca04f0da97ba6d470541..e2f7cea89688c67d841dfef4c5a4e6e003660c63 100644 --- a/gcc/config/arm/iterators.md +++ b/gcc/config/arm/iterators.md @@ -308,6 +308,8 @@ (define_int_iterator VMAXMINF [UNSPEC_VMAX UNSPEC_VMIN]) +(define_int_iterator VMAXMINNM [UNSPEC_VMAXNM UNSPEC_VMINNM]) + (define_int_iterator VPADDL [UNSPEC_VPADDL_S UNSPEC_VPADDL_U]) (define_int_iterator VPADAL [UNSPEC_VPADAL_S UNSPEC_VPADAL_U]) @@ -741,6 +743,7 @@ (UNSPEC_VMIN "min") (UNSPEC_VMIN_U "min") (UNSPEC_VPMAX "max") (UNSPEC_VPMAX_U "max") (UNSPEC_VPMIN "min") (UNSPEC_VPMIN_U "min") + (UNSPEC_VMAXNM "maxnm") (UNSPEC_VMINNM "minnm") ]) (define_int_attr shift_op [ diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md index 62fb6daae9983470faf2c9cc686f5181b8bd7cb6..1b48451b5ee559c332573860d8a3aea0bb3a58ad 100644 --- a/gcc/config/arm/neon.md +++ b/gcc/config/arm/neon.md @@ -2354,6 +2354,16 @@ [(set_attr "type" "neon_fp_minmax_s")] ) +(define_insn "neon_v" + [(set (match_operand:VCVTF 0 &
[PATCH][AARCH64][NEON] Enabling V*HFmode simd immediate loads.
This patch adds support for loading 16-bit floating-point vector immediates (the V*HF modes) using a movi instruction. We leverage the existing code that checks for a repeating 8-bit pattern in the 64/128-bit value formed by concatenating the bit patterns of the vector's individual constant elements. This enables us to load a variety of constants, since the movi instruction also provides an immediate left shift of up to 24 bits (in multiples of 8). A new testcase was added that checks both for the presence of movi instructions and for correctness of the results.

Tested on aarch64-none-elf and aarch64_be-none-elf; bootstrapped on aarch64-none-linux-gnu.

---

gcc/

2015-XX-XX  Bilyan Borisov

	* config/aarch64/aarch64.c (aarch64_simd_container_mode): Added HFmode
	cases.
	(aarch64_vect_float_const_representable_p): Updated comment.
	(aarch64_simd_valid_immediate): Added support for V*HF arguments.
	(aarch64_output_simd_mov_immediate): Added check for HFmode.

gcc/testsuite/

2015-XX-XX  Bilyan Borisov

	* gcc.target/aarch64/fp16/f16_mov_immediate_simd_1.c: New.

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index ae4cfb336a827a63a6baadefcb5646a9dbfb7523..bb6fce0a829d634a7694710e8a8c9a1c3e841abd 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -10250,6 +10250,8 @@ aarch64_simd_container_mode (machine_mode mode, unsigned width)
       return V2DFmode;
     case SFmode:
       return V4SFmode;
+    case HFmode:
+      return V8HFmode;
     case SImode:
       return V4SImode;
     case HImode:
@@ -10266,6 +10268,8 @@ aarch64_simd_container_mode (machine_mode mode, unsigned width)
     {
     case SFmode:
       return V2SFmode;
+    case HFmode:
+      return V4HFmode;
     case SImode:
       return V2SImode;
     case HImode:
@@ -10469,7 +10473,12 @@ sizetochar (int size)
 /* Return true iff x is a uniform vector of floating-point constants,
    and the constant can be represented in quarter-precision form.
Note, as aarch64_float_const_representable - rejects both +0.0 and -0.0, we will also reject +0.0 and -0.0. */ + rejects both +0.0 and -0.0, we will also reject +0.0 and -0.0. + Also note that this won't ever be called for V*HFmode vectors, + since in aarch64_simd_valid_immediate () we check for the mode + and handle these vector types differently from other floating + point vector modes. */ + static bool aarch64_vect_float_const_representable_p (rtx x) { @@ -10505,7 +10514,10 @@ aarch64_simd_valid_immediate (rtx op, machine_mode mode, bool inverse, unsigned int invmask = inverse ? 0xff : 0; int eshift, emvn; - if (GET_MODE_CLASS (mode) == MODE_VECTOR_FLOAT) + /* Ignore V*HFmode vectors, they are handled below with the integer + code. */ + if (GET_MODE_CLASS (mode) == MODE_VECTOR_FLOAT + && GET_MODE_INNER (mode) != HFmode) { if (! (aarch64_simd_imm_zero_p (op, mode) || aarch64_vect_float_const_representable_p (op))) @@ -10530,15 +10542,26 @@ aarch64_simd_valid_immediate (rtx op, machine_mode mode, bool inverse, rtx el = CONST_VECTOR_ELT (op, BYTES_BIG_ENDIAN ? (n_elts - 1 - i) : i); unsigned HOST_WIDE_INT elpart; - gcc_assert (CONST_INT_P (el)); - elpart = INTVAL (el); + if (CONST_INT_P (el)) + elpart = INTVAL (el); + /* Convert HFmode vector element to bit pattern. Logic below will catch + most common constants since for FP16 the sign and exponent are in the + top 6 bits and a movi with a left shift of 8 will catch all powers + of 2 that fit in a 16 bit floating point, and the 2 extra bits left + for the mantissa can cover some more non-power of 2 constants. With + a 0 left shift, we can cover constants of the form 1.xxx since we have + 8 bits only for the mantissa. 
*/ + else if (CONST_DOUBLE_P (el) && GET_MODE_INNER (mode) == HFmode) + elpart = + real_to_target (NULL, CONST_DOUBLE_REAL_VALUE (el), HFmode); + else +gcc_unreachable (); for (unsigned int byte = 0; byte < innersize; byte++) { bytes[idx++] = (elpart & 0xff) ^ invmask; elpart >>= BITS_PER_UNIT; } - } /* Sanity check. */ @@ -11913,7 +11936,10 @@ aarch64_output_simd_mov_immediate (rtx const_vector, lane_count = width / info.element_width; mode = GET_MODE_INNER (mode); - if (GET_MODE_CLASS (mode) == MODE_FLOAT) + /* We handle HFmode vectors separately from the other floating point + vector modes. See aarch64_simd_valid_immediate (), but in short + we use a movi instruction rather than a fmov. */ + if (GET_MODE_CLASS (mode) == MODE_FLOAT && mode != HFmode) { gcc_assert (info.shift == 0 && ! info.mvn); /* For FP zero change it to a CONST_INT 0 and use the integer SIMD diff --git a/gcc/testsuite/gcc.target/aarch64/fp16/f16_mov_immediate_simd_1.c b/gcc/testsuite/gcc.target/aarch64/fp16/f16
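As a sanity check on the comment above: IEEE binary16 has 1 sign bit, 5 exponent bits (bias 15), and 10 mantissa bits, so every power of two representable in FP16 has a bit pattern in which one of the two bytes is zero -- exactly the shape that movi's byte-plus-left-shift encoding can produce. A small illustrative model; fp16_pow2 and movi_encodable are hypothetical helpers, not code from the patch.

```c
#include <stdint.h>

/* IEEE binary16 bit pattern for 2**n, valid for -14 <= n <= 15:
   sign 0, exponent n + 15 in bits 14:10, mantissa 0.  */
static uint16_t fp16_pow2 (int n)
{
  return (uint16_t) ((n + 15) << 10);
}

/* A V*HF constant element can use a (possibly shifted) MOVI when its
   16-bit pattern is a single byte optionally shifted left by 8,
   i.e. when one of its two bytes is zero.  */
static int movi_encodable (uint16_t bits)
{
  return (bits & 0xff) == 0 || (bits >> 8) == 0;
}
```

With a zero shift the low byte carries 2 exponent bits plus 6 mantissa bits, which is how some non-power-of-2 constants of the form 1.xxx also become encodable, matching the comment in the patch.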
Re: [AARCH64][PATCH 3/3] Adding tests to check proper error reporting of out of bounds accesses to vmulx_lane* NEON intrinsics
I've made the change you've requested. Changelog & patch description are below.

Thanks,
Bilyan

---

This patch from the series adds tests that check for the proper error reporting of out-of-bounds accesses to all the vmulx_lane NEON intrinsic variants. The tests were added separately from the previous patch in order to reduce its size.

gcc/testsuite/

2015-XX-XX  Bilyan Borisov

	* gcc.target/aarch64/advsimd-intrinsics/vmulx_lane_f32_indices_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vmulx_lane_f64_indices_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vmulx_laneq_f32_indices_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vmulx_laneq_f64_indices_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vmulxd_lane_f64_indices_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vmulxd_laneq_f64_indices_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vmulxq_lane_f32_indices_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vmulxq_lane_f64_indices_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vmulxq_laneq_f32_indices_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vmulxq_laneq_f64_indices_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vmulxs_lane_f32_indices_1.c: New.
	* gcc.target/aarch64/advsimd-intrinsics/vmulxs_laneq_f32_indices_1.c: New.

On 22/11/15 15:24, James Greenhalgh wrote:

On Fri, Oct 30, 2015 at 09:32:07AM +, Bilyan Borisov wrote:

Implementing vmulx_* and vmulx_lane* NEON intrinsics

Hi all,

This series of patches focuses on the different vmulx_ and vmulx_lane NEON intrinsic variants. All of the existing inline assembly block implementations are replaced with newly defined __builtin functions, and the missing intrinsics are implemented with __builtins as well. The rationale for the change from assembly to __builtin is that the compiler can then perform more optimisations, such as instruction scheduling. A new named md pattern was added for the new fmulx __builtin.
Most vmulx_lane variants have been implemented as a combination of a vdup followed by a vmulx_, rather than as separate __builtins. The remaining vmulx_lane intrinsics (vmulx(s|d)_lane*) were implemented using __aarch64_vget_lane_any () and an appropriate vmulx. Four new nameless md patterns were added to replace all the different types of RTL generated from the combination of these intrinsics during the combine pass. The rationale for this change is that in this way we would be able to optimise away all uses of a dup followed by a fmulx to the appropriate fmulx lane variant instruction. New test cases were added for all the implemented intrinsics. Also new tests were added for the proper error reporting of out-of-bounds accesses to _lane intrinsics. Tested on targets aarch64-none-elf and aarch64_be-none-elf. diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmulx_lane_f32_indices_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmulx_lane_f32_indices_1.c new file mode 100644 index ..5681d5d21bc62e54e308c0a7c171f6f1b8969b71 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmulx_lane_f32_indices_1.c @@ -0,0 +1,16 @@ +#include + +/* { dg-do compile } */ +/* { dg-skip-if "" { *-*-* } { "-fno-fat-lto-objects" } } */ +/* { dg-skip-if "" { arm*-*-* } } */ + +float32x2_t +f_vmulx_lane_f32 (float32x2_t v1, float32x2_t v2) +{ + float32x2_t res; + /* { dg-error "lane -1 out of range 0 - 1" "" { xfail arm*-*-* } 0 } */ + res = vmulx_lane_f32 (v1, v2, -1); + /* { dg-error "lane 2 out of range 0 - 1" "" { xfail arm*-*-* } 0 } */ Given the dg-skip-if directive, do we really need the cfail directive for arm*-*-*, surely the whole test is skipped regardless? Could you respin this patch without the xfails? 
Thanks, James diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmulx_lane_f32_indices_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmulx_lane_f32_indices_1.c new file mode 100644 index ..1494633e606d7b9b77d913dbc99b1a127cc9b661 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmulx_lane_f32_indices_1.c @@ -0,0 +1,16 @@ +#include + +/* { dg-do compile } */ +/* { dg-skip-if "" { *-*-* } { "-fno-fat-lto-objects" } } */ +/* { dg-skip-if "" { arm*-*-* } } */ + +float32x2_t +f_vmulx_lane_f32 (float32x2_t v1, float32x2_t v2) +{ + float32x2_t res; + /* { dg-error "lane -1 out of range 0 - 1" "" { target *-*-* } 0 } */ + res = vmulx_lane_f32 (v1, v2, -1); + /* { dg-error "lane 2 out of range 0 - 1" "" { target *-*-* } 0 } */ + res = vmulx_lane_f32 (v1, v2, 2); + return
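For context on what the combine patterns described above fuse: vmulx_lane (a, b, l) is a dup of lane l of b followed by an element-wise fmulx. A portable model of that composition; fmulx_model, vmulx_lane_model, and lane_model_selftest are illustrative helpers, not code from the series.

```c
#include <math.h>

/* Scalar model of FMULX: identical to multiplication except that
   0 * infinity (any sign combination) returns 2.0 with the XOR of
   the operands' signs, instead of the NaN that FMUL would give.  */
static float fmulx_model (float a, float b)
{
  if ((a == 0.0f && isinf (b)) || (isinf (a) && b == 0.0f))
    return ((!!signbit (a)) ^ (!!signbit (b))) ? -2.0f : 2.0f;
  return a * b;
}

/* vmulx_lane modelled per element: multiply every element of a by
   the selected lane of b -- the dup-then-fmulx combination that the
   new combine patterns collapse into a single FMULX (by element).  */
static void vmulx_lane_model (float *r, const float *a, int n,
                              const float *b, int lane)
{
  for (int i = 0; i < n; i++)
    r[i] = fmulx_model (a[i], b[lane]);
}

/* Small self-check of the lane model on a 2-lane vector.  */
static int lane_model_selftest (void)
{
  float a[2] = { 1.0f, 2.0f }, b[2] = { 3.0f, 4.0f }, r[2];
  vmulx_lane_model (r, a, 2, b, 1);   /* multiply both lanes of a by b[1] */
  return r[0] == 4.0f && r[1] == 8.0f;
}
```

The lane index must be a compile-time constant in 0 .. n-1, which is what the _indices_ tests above verify the compiler diagnoses.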
[AARCH64] Adding constant folding for __builtin_fmulx* with scalar 32 and 64 bit arguments
This patch adds an extension to aarch64_gimple_fold_builtin () that constant folds __builtin_fmulx* calls for 32- and 64-bit floating-point scalar modes. We fold when both arguments are constant, as well as when only one is. The special cases of 0*inf, -0*inf, 0*-inf, and -0*-inf are also handled. The case of vector constant arguments will be dealt with in a future patch, since the tests for it would be obscure and would unnecessarily complicate this patch.

Tests were added to check for proper handling of the constant folding. Tested on targets aarch64-none-elf and aarch64_be-none-elf.

---

gcc/

2015-XX-XX  Bilyan Borisov

	* config/aarch64/aarch64-builtins.c (aarch64_gimple_fold_builtin):
	Added constant folding.

gcc/testsuite/

2015-XX-XX  Bilyan Borisov

	* gcc.target/aarch64/simd/vmulx.x: New.
	* gcc.target/aarch64/simd/vmulx_f64_2.c: Likewise.
	* gcc.target/aarch64/simd/vmulxd_f64_2.c: Likewise.
	* gcc.target/aarch64/simd/vmulxs_f32_2.c: Likewise.

diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index a1998ed550ac801e4d80baae122bf58e394a563f..339054d344900c942d5ce7c047479de3bbb4e61b 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -1362,7 +1362,7 @@ aarch64_gimple_fold_builtin (gimple_stmt_iterator *gsi)
   if (fndecl)
     {
       int fcode = DECL_FUNCTION_CODE (fndecl);
-      int nargs = gimple_call_num_args (stmt);
+      unsigned nargs = gimple_call_num_args (stmt);
       tree *args = (nargs > 0 ?
gimple_call_arg_ptr (stmt, 0) : &error_mark_node); @@ -1386,7 +1386,54 @@ aarch64_gimple_fold_builtin (gimple_stmt_iterator *gsi) new_stmt = gimple_build_assign (gimple_call_lhs (stmt), REDUC_MIN_EXPR, args[0]); break; - + BUILTIN_GPF (BINOP, fmulx, 0) + { + gcc_assert (nargs == 2); + bool a0_cst_p = TREE_CODE (args[0]) == REAL_CST; + bool a1_cst_p = TREE_CODE (args[1]) == REAL_CST; + if (a0_cst_p || a1_cst_p) + { + if (a0_cst_p && a1_cst_p) + { + tree t0 = TREE_TYPE (args[0]); + real_value a0 = (TREE_REAL_CST (args[0])); + real_value a1 = (TREE_REAL_CST (args[1])); + if (real_equal (&a1, &dconst0)) + std::swap (a0, a1); + /* According to real_equal (), +0 equals -0. */ + if (real_equal (&a0, &dconst0) && real_isinf (&a1)) + { + real_value res = dconst2; + res.sign = a0.sign ^ a1.sign; + new_stmt = +gimple_build_assign (gimple_call_lhs (stmt), + REAL_CST, + build_real (t0, res)); + } + else + new_stmt = + gimple_build_assign (gimple_call_lhs (stmt), + MULT_EXPR, + args[0], args[1]); + } + else /* a0_cst_p ^ a1_cst_p. */ + { + real_value const_part = a0_cst_p + ? TREE_REAL_CST (args[0]) : TREE_REAL_CST (args[1]); + if (!real_equal (&const_part, &dconst0) + && !real_isinf (&const_part)) + new_stmt = + gimple_build_assign (gimple_call_lhs (stmt), + MULT_EXPR, args[0], args[1]); + } + } + if (new_stmt) + { + gimple_set_vuse (new_stmt, gimple_vuse (stmt)); + gimple_set_vdef (new_stmt, gimple_vdef (stmt)); + } + break; + } default: break; } diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vmulx.x b/gcc/testsuite/gcc.target/aarch64/simd/vmulx.x new file mode 100644 index ..8968a64a95cb40a466dd77fea4e9f9f63ad707dc --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/simd/vmulx.x @@ -0,0 +1,46 @@ +#define PASS_ARRAY(...) 
{__VA_ARGS__} + +#define SETUP_TEST_CASE_VEC(I, INTRINSIC, BASE_TYPE, TYPE1, TYPE2, \ + VALS1, VALS2, EXPS, LEN, FM, Q_LD, Q_ST, \ + V1, V2) \ + do \ +{ \ + int i##I;\ + BASE_TYPE vec##I##_1_data[] = VALS1; \ + BASE_TYPE vec##I##_2_data[] = VALS2; \ + V1 TYPE1 vec##I##_1 = vld1##Q_LD##_##FM (vec##I##_1_data); \ + V2 TYPE2 vec##I##_2 = vld1##Q_LD##_##FM (vec##I##_2_data); \ + TYPE1 actual##I##_v = INTRINSIC (vec##I##_1, vec##I##_2); \ + volatile BASE_TYPE expected##I[] = EXPS;\ + BASE_TYPE actual##I[LEN]; \ + vst1##Q_ST##_##FM (actual##I, actual##I##_v);\ + for (i##I = 0; i##I < LEN; ++i##I) \ +if (actual##I[i##I] != expected##I[i##I])\ + abort ();\ +} \ + while (0)\ + +#define SETUP_TEST_CASE_SCALAR(I, INTRINSIC, TYPE, VAL1, VAL2, EXP) \ + do \ +{ \ + TYPE vec_##I##_1 = VAL1; \ + TYPE vec_##I##_2 = VAL2; \ + TYPE expected_##I = EXP; \ + volatile TYPE actual_##I = INTRINSIC (vec_##I##_1, vec_##I##_2); \ + if (actual_##I != expected_##I) \ +abort ();\ +} \ + while (0)\ + +/* Functions used to return values that won't be optimised away. */ +float32_t __attribute__
Re: [AARCH64][PATCH 2/3] Implementing vmulx_lane NEON intrinsic variants + Changelog
On 09/11/15 11:03, Bilyan Borisov wrote: On 03/11/15 11:16, James Greenhalgh wrote: On Fri, Oct 30, 2015 at 09:31:08AM +, Bilyan Borisov wrote: In this patch from the series, all vmulx_lane variants have been implemented as a vdup followed by a vmulx. Existing implementations of intrinsics were refactored to use this new approach. Several new nameless md patterns are added that will enable the combine pass to pick up the dup/fmulx combination and replace it with a proper fmulx[lane] instruction. In addition, test cases for all new intrinsics were added. Tested on targets aarch64-none-elf and aarch64_be-none-elf. Hi, I have a small style comment below. gcc/ 2015-XX-XX Bilyan Borisov * config/aarch64/arm_neon.h (vmulx_lane_f32): New. (vmulx_lane_f64): New. (vmulxq_lane_f32): Refactored & moved. (vmulxq_lane_f64): Refactored & moved. (vmulx_laneq_f32): New. (vmulx_laneq_f64): New. (vmulxq_laneq_f32): New. (vmulxq_laneq_f64): New. (vmulxs_lane_f32): New. (vmulxs_laneq_f32): New. (vmulxd_lane_f64): New. (vmulxd_laneq_f64): New. * config/aarch64/aarch64-simd.md (*aarch64_combine_dupfmulx1, VDQSF): New pattern. (*aarch64_combine_dupfmulx2, VDQF): New pattern. (*aarch64_combine_dupfmulx3): New pattern. (*aarch64_combine_vgetfmulx1, VDQF_DF): New pattern. I'm not sure I like the use of 1,2,3 for this naming scheme. Elsewhere in the file, this convention points to the number of operands a pattern requires (for example add3). I think elsewhere in the file we use: "*aarch64_mul3_elt" "*aarch64_mul3_elt_" "*aarch64_mul3_elt_to_128df" "*aarch64_mul3_elt_to_64v2df" Is there a reason not to follow that pattern? Thanks, James Hi, I've made the changes you've requested - the pattern names have been changed to follow better the naming conventions elsewhere in the file. Thanks, Bilyan Hi, You can find the new updated Changelog for this patch below. Thanks, Bilyan --- In this patch from the series, all vmulx_lane variants have been implemented as a vdup followed by a vmulx. 
Existing implementations of intrinsics were refactored to use this new approach. Several new nameless md patterns are added that will enable the combine pass to pick up the dup/fmulx combination and replace it with a proper fmulx[lane] instruction. In addition, test cases for all new intrinsics were added. Tested on targets aarch64-none-elf and aarch64_be-none-elf. gcc/ 2015-XX-XX Bilyan Borisov * config/aarch64/arm_neon.h (vmulx_lane_f32): New. (vmulx_lane_f64): Likewise. (vmulxq_lane_f32): Refactored & moved. (vmulxq_lane_f64): Likewise. (vmulx_laneq_f32): New. (vmulx_laneq_f64): Likewise. (vmulxq_laneq_f32): Likewise. (vmulxq_laneq_f64): Likewise. (vmulxs_lane_f32): Likewise. (vmulxs_laneq_f32): Likewise. (vmulxd_lane_f64): Likewise. (vmulxd_laneq_f64): Likewise. * config/aarch64/aarch64-simd.md (*aarch64_mulx_elt_, VDQSF): New pattern. (*aarch64_mulx_elt, VDQF): Likewise. (*aarch64_mulx_elt_to_64v2df): Likewise. (*aarch64_vgetfmulx, VDQF_DF): Likewise. gcc/testsuite/ 2015-XX-XX Bilyan Borisov * gcc.target/aarch64/simd/vmulx_lane_f32_1.c: New. * gcc.target/aarch64/simd/vmulx_lane_f64_1.c: New. * gcc.target/aarch64/simd/vmulx_laneq_f32_1.c: New. * gcc.target/aarch64/simd/vmulx_laneq_f64_1.c: New. * gcc.target/aarch64/simd/vmulxq_lane_f32_1.c: New. * gcc.target/aarch64/simd/vmulxq_lane_f64_1.c: New. * gcc.target/aarch64/simd/vmulxq_laneq_f32_1.c: New. * gcc.target/aarch64/simd/vmulxq_laneq_f64_1.c: New. * gcc.target/aarch64/simd/vmulxs_lane_f32_1.c: New. * gcc.target/aarch64/simd/vmulxs_laneq_f32_1.c: New. * gcc.target/aarch64/simd/vmulxd_lane_f64_1.c: New. * gcc.target/aarch64/simd/vmulxd_laneq_f64_1.c: New.
Re: [AARCH64][PATCH 2/3] Implementing vmulx_lane NEON intrinsic variants
On 03/11/15 11:16, James Greenhalgh wrote: On Fri, Oct 30, 2015 at 09:31:08AM +, Bilyan Borisov wrote: In this patch from the series, all vmulx_lane variants have been implemented as a vdup followed by a vmulx. Existing implementations of intrinsics were refactored to use this new approach. Several new nameless md patterns are added that will enable the combine pass to pick up the dup/fmulx combination and replace it with a proper fmulx[lane] instruction. In addition, test cases for all new intrinsics were added. Tested on targets aarch64-none-elf and aarch64_be-none-elf. Hi, I have a small style comment below. gcc/ 2015-XX-XX Bilyan Borisov * config/aarch64/arm_neon.h (vmulx_lane_f32): New. (vmulx_lane_f64): New. (vmulxq_lane_f32): Refactored & moved. (vmulxq_lane_f64): Refactored & moved. (vmulx_laneq_f32): New. (vmulx_laneq_f64): New. (vmulxq_laneq_f32): New. (vmulxq_laneq_f64): New. (vmulxs_lane_f32): New. (vmulxs_laneq_f32): New. (vmulxd_lane_f64): New. (vmulxd_laneq_f64): New. * config/aarch64/aarch64-simd.md (*aarch64_combine_dupfmulx1, VDQSF): New pattern. (*aarch64_combine_dupfmulx2, VDQF): New pattern. (*aarch64_combine_dupfmulx3): New pattern. (*aarch64_combine_vgetfmulx1, VDQF_DF): New pattern. I'm not sure I like the use of 1,2,3 for this naming scheme. Elsewhere in the file, this convention points to the number of operands a pattern requires (for example add3). I think elsewhere in the file we use: "*aarch64_mul3_elt" "*aarch64_mul3_elt_" "*aarch64_mul3_elt_to_128df" "*aarch64_mul3_elt_to_64v2df" Is there a reason not to follow that pattern? Thanks, James Hi, I've made the changes you've requested - the pattern names have been changed to follow better the naming conventions elsewhere in the file. 
Thanks, Bilyan diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index 269e00237bb1153ebf42505906ec5b760b04aafe..5ff19094b2fb10b332d186a6de02752b31ed4141 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -2880,6 +2880,79 @@ [(set_attr "type" "neon_fp_mul_")] ) +;; fmulxq_lane_f32, and fmulx_laneq_f32 + +(define_insn "*aarch64_mulx_elt_" + [(set (match_operand:VDQSF 0 "register_operand" "=w") + (unspec:VDQSF + [(match_operand:VDQSF 1 "register_operand" "w") + (vec_duplicate:VDQSF + (vec_select: + (match_operand: 2 "register_operand" "w") + (parallel [(match_operand:SI 3 "immediate_operand" "i")])))] + UNSPEC_FMULX))] + "TARGET_SIMD" + { +operands[3] = GEN_INT (ENDIAN_LANE_N (mode, + INTVAL (operands[3]))); +return "fmulx\t%0, %1, %2.[%3]"; + } + [(set_attr "type" "neon_fp_mul__scalar")] +) + +;; fmulxq_laneq_f32, fmulxq_laneq_f64, fmulx_lane_f32 + +(define_insn "*aarch64_mulx_elt" + [(set (match_operand:VDQF 0 "register_operand" "=w") + (unspec:VDQF + [(match_operand:VDQF 1 "register_operand" "w") + (vec_duplicate:VDQF + (vec_select: + (match_operand:VDQF 2 "register_operand" "w") + (parallel [(match_operand:SI 3 "immediate_operand" "i")])))] + UNSPEC_FMULX))] + "TARGET_SIMD" + { +operands[3] = GEN_INT (ENDIAN_LANE_N (mode, INTVAL (operands[3]))); +return "fmulx\t%0, %1, %2.[%3]"; + } + [(set_attr "type" "neon_fp_mul_")] +) + +;; fmulxq_lane_f64 + +(define_insn "*aarch64_mulx_elt_to_64v2df" + [(set (match_operand:V2DF 0 "register_operand" "=w") + (unspec:V2DF + [(match_operand:V2DF 1 "register_operand" "w") + (vec_duplicate:V2DF + (match_operand:DF 2 "register_operand" "w"))] + UNSPEC_FMULX))] + "TARGET_SIMD" + { +return "fmulx\t%0.2d, %1.2d, %2.d[0]"; + } + [(set_attr "type" "neon_fp_mul_d_scalar_q")] +) + +;; fmulxs_lane_f32, fmulxs_laneq_f32, fmulxd_lane_f64 == fmulx_lane_f64, +;; fmulxd_laneq_f64 == fmulx_laneq_f64 + +(define_insn "*aarch64_vgetfmulx" + [(set (match_operand: 0 
"register_operand" "=w") + (unspec: + [(match_operand: 1 "register_operand" "w") + (vec_select: + (match_operand:VDQF_DF 2 "register_operand" "w") + (parallel [(match_operand:SI 3 "immediate_operand" "i")]))] + UNSPEC_FMULX))] + "TARGET_SIMD" + { +operands[3] = GEN_INT (ENDIAN_LANE_N (mode,
[AARCH64][PATCH 3/3] Adding tests to check proper error reporting of out of bounds accesses to vmulx_lane* NEON intrinsics
Implementing vmulx_* and vmulx_lane* NEON intrinsics Hi all, This series of patches focuses on the different vmulx_ and vmulx_lane NEON intrinsics variants. All of the existing inlined assembly block implementations are replaced with newly defined __builtin functions, and the missing intrinsics are implemented with __builtins as well. The rationale for the change from assembly to __builtin is that the compiler would be able to do more optimisations like instruction scheduling. A new named md pattern was added for the new fmulx __builtin. Most vmulx_lane variants have been implemented as a combination of a vdup followed by a vmulx_, rather than as separate __builtins. The remaining vmulx_lane intrinsics (vmulx(s|d)_lane*) were implemented using __aarch64_vget_lane_any () and an appropriate vmulx. Four new nameless md patterns were added to replace all the different types of RTL generated from the combination of these intrinsics during the combine pass. The rationale for this change is that in this way we would be able to optimise away all uses of a dup followed by a fmulx to the appropriate fmulx lane variant instruction. New test cases were added for all the implemented intrinsics. Also new tests were added for the proper error reporting of out-of-bounds accesses to _lane intrinsics. Tested on targets aarch64-none-elf and aarch64_be-none-elf. Dependencies: patch 2/3 depends on patch 1/3, and patch 3/3 depends on patch 2/3. --- This patch from the series adds tests that check for the proper error reporting of out of bounds accesses to all the vmulx_lane NEON intrinsics variants. The tests were added separately from the previous patch in order to reduce its size. gcc/testsuite/ 2015-XX-XX Bilyan Borisov * gcc.target/aarch64/advsimd-intrinsics/vmulx_lane_f32_indices_1.c: New. * gcc.target/aarch64/advsimd-intrinsics/vmulx_lane_f64_indices_1.c: New. * gcc.target/aarch64/advsimd-intrinsics/vmulx_laneq_f32_indices_1.c: New. 
* gcc.target/aarch64/advsimd-intrinsics/vmulx_laneq_f64_indices_1.c: New. * gcc.target/aarch64/advsimd-intrinsics/vmulxd_lane_f64_indices_1.c: New. * gcc.target/aarch64/advsimd-intrinsics/vmulxd_laneq_f64_indices_1.c: New. * gcc.target/aarch64/advsimd-intrinsics/vmulxq_lane_f32_indices_1.c: New. * gcc.target/aarch64/advsimd-intrinsics/vmulxq_lane_f64_indices_1.c: New. * gcc.target/aarch64/advsimd-intrinsics/vmulxq_laneq_f32_indices_1.c: New. * gcc.target/aarch64/advsimd-intrinsics/vmulxq_laneq_f64_indices_1.c: New. * gcc.target/aarch64/advsimd-intrinsics/vmulxs_lane_f32_indices_1.c: New. * gcc.target/aarch64/advsimd-intrinsics/vmulxs_laneq_f32_indices_1.c: New. diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmulx_lane_f32_indices_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmulx_lane_f32_indices_1.c new file mode 100644 index ..5681d5d21bc62e54e308c0a7c171f6f1b8969b71 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmulx_lane_f32_indices_1.c @@ -0,0 +1,16 @@ +#include + +/* { dg-do compile } */ +/* { dg-skip-if "" { *-*-* } { "-fno-fat-lto-objects" } } */ +/* { dg-skip-if "" { arm*-*-* } } */ + +float32x2_t +f_vmulx_lane_f32 (float32x2_t v1, float32x2_t v2) +{ + float32x2_t res; + /* { dg-error "lane -1 out of range 0 - 1" "" { xfail arm*-*-* } 0 } */ + res = vmulx_lane_f32 (v1, v2, -1); + /* { dg-error "lane 2 out of range 0 - 1" "" { xfail arm*-*-* } 0 } */ + res = vmulx_lane_f32 (v1, v2, 2); + return res; +} diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmulx_lane_f64_indices_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmulx_lane_f64_indices_1.c new file mode 100644 index ..0c8f313d10423d6b189fe3c9081f0a6ef55f9937 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmulx_lane_f64_indices_1.c @@ -0,0 +1,16 @@ +#include + +/* { dg-do compile } */ +/* { dg-skip-if "" { *-*-* } { "-fno-fat-lto-objects" } } */ +/* { dg-skip-if "" { arm*-*-* } } */ + 
+float64x1_t +f_vmulx_lane_f64 (float64x1_t v1, float64x1_t v2) +{ + float64x1_t res; + /* { dg-error "lane -1 out of range 0 - 0" "" { xfail arm*-*-* } 0 } */ + res = vmulx_lane_f64 (v1, v2, -1); + /* { dg-error "lane 1 out of range 0 - 0" "" { xfail arm*-*-* } 0 } */ + res = vmulx_lane_f64 (v1, v2, 1); + return res; +} diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmulx_laneq_f32_indices_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmulx_laneq_f32_indices_1.c new file mode 100644 index ..9053ab6a7bd1bf9eba1d308ceed1524685e031e4 -
[AARCH64][PATCH 2/3] Implementing vmulx_lane NEON intrinsic variants
Implementing vmulx_* and vmulx_lane* NEON intrinsics Hi all, This series of patches focuses on the different vmulx_ and vmulx_lane NEON intrinsics variants. All of the existing inlined assembly block implementations are replaced with newly defined __builtin functions, and the missing intrinsics are implemented with __builtins as well. The rationale for the change from assembly to __builtin is that the compiler would be able to do more optimisations like instruction scheduling. A new named md pattern was added for the new fmulx __builtin. Most vmulx_lane variants have been implemented as a combination of a vdup followed by a vmulx_, rather than as separate __builtins. The remaining vmulx_lane intrinsics (vmulx(s|d)_lane*) were implemented using __aarch64_vget_lane_any () and an appropriate vmulx. Four new nameless md patterns were added to replace all the different types of RTL generated from the combination of these intrinsics during the combine pass. The rationale for this change is that in this way we would be able to optimise away all uses of a dup followed by a fmulx to the appropriate fmulx lane variant instruction. New test cases were added for all the implemented intrinsics. Also new tests were added for the proper error reporting of out-of-bounds accesses to _lane intrinsics. Tested on targets aarch64-none-elf and aarch64_be-none-elf. Dependencies: patch 2/3 depends on patch 1/3, and patch 3/3 depends on patch 2/3. --- In this patch from the series, all vmulx_lane variants have been implemented as a vdup followed by a vmulx. Existing implementations of intrinsics were refactored to use this new approach. Several new nameless md patterns are added that will enable the combine pass to pick up the dup/fmulx combination and replace it with a proper fmulx[lane] instruction. In addition, test cases for all new intrinsics were added. Tested on targets aarch64-none-elf and aarch64_be-none-elf. 
gcc/ 2015-XX-XX Bilyan Borisov * config/aarch64/arm_neon.h (vmulx_lane_f32): New. (vmulx_lane_f64): New. (vmulxq_lane_f32): Refactored & moved. (vmulxq_lane_f64): Refactored & moved. (vmulx_laneq_f32): New. (vmulx_laneq_f64): New. (vmulxq_laneq_f32): New. (vmulxq_laneq_f64): New. (vmulxs_lane_f32): New. (vmulxs_laneq_f32): New. (vmulxd_lane_f64): New. (vmulxd_laneq_f64): New. * config/aarch64/aarch64-simd.md (*aarch64_combine_dupfmulx1, VDQSF): New pattern. (*aarch64_combine_dupfmulx2, VDQF): New pattern. (*aarch64_combine_dupfmulx3): New pattern. (*aarch64_combine_vgetfmulx1, VDQF_DF): New pattern. gcc/testsuite/ 2015-XX-XX Bilyan Borisov * gcc.target/aarch64/simd/vmulx_lane_f32_1.c: New. * gcc.target/aarch64/simd/vmulx_lane_f64_1.c: New. * gcc.target/aarch64/simd/vmulx_laneq_f32_1.c: New. * gcc.target/aarch64/simd/vmulx_laneq_f64_1.c: New. * gcc.target/aarch64/simd/vmulxq_lane_f32_1.c: New. * gcc.target/aarch64/simd/vmulxq_lane_f64_1.c: New. * gcc.target/aarch64/simd/vmulxq_laneq_f32_1.c: New. * gcc.target/aarch64/simd/vmulxq_laneq_f64_1.c: New. * gcc.target/aarch64/simd/vmulxs_lane_f32_1.c: New. * gcc.target/aarch64/simd/vmulxs_laneq_f32_1.c: New. * gcc.target/aarch64/simd/vmulxd_lane_f64_1.c: New. * gcc.target/aarch64/simd/vmulxd_laneq_f64_1.c: New.
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index e7ebbd158d21691791a8d7db8a2616062e50..8d6873a45ad0cdef42f7c632bca38096b9de1787 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -2822,6 +2822,79 @@ [(set_attr "type" "neon_fp_mul_")] ) +;; fmulxq_lane_f32, and fmulx_laneq_f32 + +(define_insn "*aarch64_combine_dupfmulx1" + [(set (match_operand:VDQSF 0 "register_operand" "=w") + (unspec:VDQSF + [(match_operand:VDQSF 1 "register_operand" "w") + (vec_duplicate:VDQSF + (vec_select: + (match_operand: 2 "register_operand" "w") + (parallel [(match_operand:SI 3 "immediate_operand" "i")])))] + UNSPEC_FMULX))] + "TARGET_SIMD" + { +operands[3] = GEN_INT (ENDIAN_LANE_N (mode, + INTVAL (operands[3]))); +return "fmulx\t%0, %1, %2.[%3]"; + } + [(set_attr "type" "neon_fp_mul__scalar")] +) + +;; fmulxq_laneq_f32, fmulxq_laneq_f64, fmulx_lane_f32 + +(define_insn "*aarch64_combine_dupfmulx2" + [(set (match_operand:VDQF 0 "register_operand" "=w") + (unspec:VDQF + [(match_operand:VDQF 1 "register_operand" "w") + (vec_duplicate:V
[AARCH64][PATCH 1/3] Implementing the variants of the vmulx_ NEON intrinsic
Implementing vmulx_* and vmulx_lane* NEON intrinsics Hi all, This series of patches focuses on the different vmulx_ and vmulx_lane NEON intrinsics variants. All of the existing inlined assembly block implementations are replaced with newly defined __builtin functions, and the missing intrinsics are implemented with __builtins as well. The rationale for the change from assembly to __builtin is that the compiler would be able to do more optimisations like instruction scheduling. A new named md pattern was added for the new fmulx __builtin. Most vmulx_lane variants have been implemented as a combination of a vdup followed by a vmulx_, rather than as separate __builtins. The remaining vmulx_lane intrinsics (vmulx(s|d)_lane*) were implemented using __aarch64_vget_lane_any () and an appropriate vmulx. Four new nameless md patterns were added to replace all the different types of RTL generated from the combination of these intrinsics during the combine pass. The rationale for this change is that in this way we would be able to optimise away all uses of a dup followed by a fmulx to the appropriate fmulx lane variant instruction. New test cases were added for all the implemented intrinsics. Also new tests were added for the proper error reporting of out-of-bounds accesses to _lane intrinsics. Tested on targets aarch64-none-elf and aarch64_be-none-elf. Dependencies: patch 2/3 depends on patch 1/3, and patch 3/3 depends on patch 2/3. --- In this patch from the series, a single new md pattern is added: the one for fmulx, from which all necessary __builtin functions are derived. Several intrinsics were refactored to use the new __builtin functions as some of them already had an assembly block implementation. The rest, which had no existing implementation, were also added. A single intrinsic was removed: vmulx_lane_f32, since there was no test case that covered it and, moreover, its implementation was wrong: it was in fact implementing vmulxq_lane_f32. 
In addition, test cases for all new intrinsics were added. Tested on targets aarch64-none-elf and aarch64_be-none-elf. gcc/ 2015-XX-XX Bilyan Borisov * config/aarch64/aarch64-simd-builtins.def: BUILTIN declaration for fmulx... * config/aarch64/aarch64-simd.md: And its corresponding md pattern. * config/aarch64/arm_neon.h (vmulx_f32): Refactored to call fmulx __builtin, also moved. (vmulxq_f32): Same. (vmulx_f64): New, uses __builtin. (vmulxq_f64): Refactored to call fmulx __builtin, also moved. (vmulxs_f32): Same. (vmulxd_f64): Same. (vmulx_lane_f32): Removed, implementation was wrong. * config/aarch64/iterators.md: UNSPEC enum for fmulx. gcc/testsuite/ 2015-XX-XX Bilyan Borisov * gcc/testsuite/gcc.target/aarch64/simd/vmulx_f32_1.c: New. * gcc/testsuite/gcc.target/aarch64/simd/vmulx_f64_1.c: New. * gcc/testsuite/gcc.target/aarch64/simd/vmulxq_f32_1.c: New. * gcc/testsuite/gcc.target/aarch64/simd/vmulxq_f64_1.c: New. * gcc/testsuite/gcc.target/aarch64/simd/vmulxs_f32_1.c: New. * gcc/testsuite/gcc.target/aarch64/simd/vmulxd_f64_1.c: New. diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def index 2c13cfb0823640254f02c202b19ddae78484d537..eed5f2b21997d4ea439dea828a0888cb253ad041 100644 --- a/gcc/config/aarch64/aarch64-simd-builtins.def +++ b/gcc/config/aarch64/aarch64-simd-builtins.def @@ -41,6 +41,7 @@ BUILTIN_VDC (COMBINE, combine, 0) BUILTIN_VB (BINOP, pmul, 0) + BUILTIN_VALLF (BINOP, fmulx, 0) BUILTIN_VDQF_DF (UNOP, sqrt, 2) BUILTIN_VD_BHSI (BINOP, addp, 0) VAR1 (UNOP, addp, 0, di) diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index 541faf982effc7195a5f8d0d82738f76a7e04b4b..e7ebbd158d21691791a8d7db8a2616062e50 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -2810,6 +2810,18 @@ [(set_attr "type" "neon_mul_")] ) +;; fmulx. 
+ +(define_insn "aarch64_fmulx" + [(set (match_operand:VALLF 0 "register_operand" "=w") + (unspec:VALLF [(match_operand:VALLF 1 "register_operand" "w") + (match_operand:VALLF 2 "register_operand" "w")] + UNSPEC_FMULX))] + "TARGET_SIMD" + "fmulx\t%0, %1, %2" + [(set_attr "type" "neon_fp_mul_")] +) + ;; q (define_insn "aarch64_" diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h index 91ada618b79e038eb61e09ecd29af5129de81f51..4a3ef455b0945ed7e77fb3e78621d5010cd4c094 100644 --- a/gcc/config/aarch64/arm_neon.h +++ b/gcc/config/aarch64/arm_neon.h @@ -8509,63 +8509,6 @@ vmulq_n_u32 (uint32x4_t a, uint32_t b) return result; } -__extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) -vmulx_f32 (float32x2_t a, float32x2_t b)
[PATCH] [ARM] Replacing variable swaps that use a temporary variable with a call to std::swap in gcc/config/arm/arm.c
Replacing variable swaps that use a temporary variable with a call to std::swap. Tested against arm-none-eabi target including a variant with neon enabled. 2015-XX-XX Bilyan Borisov * config/arm/arm.c (thumb_output_move_mem_multiple): Replaced operands[4] operands[5] swap with std::swap, removed tmp variable. (arm_evpc_neon_vzip): Replaced in0/in1 and out0/out1 swaps with std::swap, removed x variable. (arm_evpc_neon_vtrn): Replaced in0/in1 and out0/out1 swaps with std::swap, removed x variable. (arm_expand_vec_perm_const_1): Replaced d->op0/d->op1 swap with std::swap, removed x variable. (arm_evpc_neon_vuzp): Replaced in0/in1 and out0/out1 swaps with std::swap, removed x variable. diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index fa4e083adfe215b5820237f3cc6b449dbdefc778..dc549fe470e5a82d740cf8014057ee2cc1d54085 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -25362,17 +25362,12 @@ thumb_load_double_from_address (rtx *operands) const char * thumb_output_move_mem_multiple (int n, rtx *operands) { - rtx tmp; - switch (n) { case 2: if (REGNO (operands[4]) > REGNO (operands[5])) - { - tmp = operands[4]; - operands[4] = operands[5]; - operands[5] = tmp; - } + std::swap (operands[4], operands[5]); + output_asm_insn ("ldmia\t%1!, {%4, %5}", operands); output_asm_insn ("stmia\t%0!, {%4, %5}", operands); break; @@ -27885,7 +27880,7 @@ static bool arm_evpc_neon_vuzp (struct expand_vec_perm_d *d) { unsigned int i, odd, mask, nelt = d->nelt; - rtx out0, out1, in0, in1, x; + rtx out0, out1, in0, in1; rtx (*gen)(rtx, rtx, rtx, rtx); if (GET_MODE_UNIT_SIZE (d->vmode) >= 8) @@ -27929,14 +27924,14 @@ arm_evpc_neon_vuzp (struct expand_vec_perm_d *d) in1 = d->op1; if (BYTES_BIG_ENDIAN) { - x = in0, in0 = in1, in1 = x; + std::swap (in0, in1); odd = !odd; } out0 = d->target; out1 = gen_reg_rtx (d->vmode); if (odd) -x = out0, out0 = out1, out1 = x; +std::swap (out0, out1); emit_insn (gen (out0, in0, in1, out1)); return true; @@ -27948,7 +27943,7 @@ static
bool arm_evpc_neon_vzip (struct expand_vec_perm_d *d) { unsigned int i, high, mask, nelt = d->nelt; - rtx out0, out1, in0, in1, x; + rtx out0, out1, in0, in1; rtx (*gen)(rtx, rtx, rtx, rtx); if (GET_MODE_UNIT_SIZE (d->vmode) >= 8) @@ -27996,14 +27991,14 @@ arm_evpc_neon_vzip (struct expand_vec_perm_d *d) in1 = d->op1; if (BYTES_BIG_ENDIAN) { - x = in0, in0 = in1, in1 = x; + std::swap (in0, in1); high = !high; } out0 = d->target; out1 = gen_reg_rtx (d->vmode); if (high) -x = out0, out0 = out1, out1 = x; +std::swap (out0, out1); emit_insn (gen (out0, in0, in1, out1)); return true; @@ -28089,7 +28084,7 @@ static bool arm_evpc_neon_vtrn (struct expand_vec_perm_d *d) { unsigned int i, odd, mask, nelt = d->nelt; - rtx out0, out1, in0, in1, x; + rtx out0, out1, in0, in1; rtx (*gen)(rtx, rtx, rtx, rtx); if (GET_MODE_UNIT_SIZE (d->vmode) >= 8) @@ -28134,14 +28129,14 @@ arm_evpc_neon_vtrn (struct expand_vec_perm_d *d) in1 = d->op1; if (BYTES_BIG_ENDIAN) { - x = in0, in0 = in1, in1 = x; + std::swap (in0, in1); odd = !odd; } out0 = d->target; out1 = gen_reg_rtx (d->vmode); if (odd) -x = out0, out0 = out1, out1 = x; +std::swap (out0, out1); emit_insn (gen (out0, in0, in1, out1)); return true; @@ -28264,14 +28259,11 @@ arm_expand_vec_perm_const_1 (struct expand_vec_perm_d *d) if (d->perm[0] >= d->nelt) { unsigned i, nelt = d->nelt; - rtx x; for (i = 0; i < nelt; ++i) d->perm[i] = (d->perm[i] + nelt) & (2 * nelt - 1); - x = d->op0; - d->op0 = d->op1; - d->op1 = x; + std::swap (d->op0, d->op1); } if (TARGET_NEON)