Re: [PATCH][AArch64] Implement vsqrt_f64 intrinsic
On 17/12/14 00:04, Joseph Myers wrote: On Mon, 15 Dec 2014, James Greenhalgh wrote: @@ -22792,6 +22792,12 @@ vsqrtq_f32 (float32x4_t a) return __builtin_aarch64_sqrtv4sf (a); } +__extension__ static __inline float64x1_t __attribute__ ((__always_inline__)) +vsqrt_f64 (float64x1_t a) +{ + return (float64x1_t) { __builtin_sqrt (a[0]) }; +} Hi Kyrill, Does this introduce an implicit need to link against a maths library if we want arm_neon.h to work correctly? If so, I think we need to take a different approach. At O0 I've started to see: undefined reference to `sqrt' When checking a large arm_neon.h testcase. It does seem strange that the mid-end would convert a __builtin_sqrt back to a library call at O0 when the target has an optab for it, so perhaps there is a bug there to go hunt? __builtin_sqrt has the same semantics as the sqrt library function. This includes setting errno for negative arguments (other than -0 and -NaN). The semantics also include that it's always OK to expand to a call to that library function (generally, __builtin_foo may always expand to a call to foo, if there is such a library function). So my first attempt at this patch had created a target builtin (__builtin_aarch64_sqrtdf) and used that. Eventually though I went for the shorter __builtin_sqrt because I thought we could benefit from the tree-level information about the semantics rather than the RTL-level expansion that the target-specific builtin would provide. But if there's a risk for it to expand to a library function call, I guess it's better to go with the target builtin. I'll prepare a patch. Thanks for the explanations, Kyrill
Re: [PATCH][AArch64] Implement vsqrt_f64 intrinsic
On 17/12/14 00:04, Joseph Myers wrote: On Mon, 15 Dec 2014, James Greenhalgh wrote: @@ -22792,6 +22792,12 @@ vsqrtq_f32 (float32x4_t a) return __builtin_aarch64_sqrtv4sf (a); } +__extension__ static __inline float64x1_t __attribute__ ((__always_inline__)) +vsqrt_f64 (float64x1_t a) +{ + return (float64x1_t) { __builtin_sqrt (a[0]) }; +} Hi Kyrill, Does this introduce an implicit need to link against a maths library if we want arm_neon.h to work correctly? If so, I think we need to take a different approach. At O0 I've started to see: undefined reference to `sqrt' When checking a large arm_neon.h testcase. It does seem strange that the mid-end would convert a __builtin_sqrt back to a library call at O0 when the target has an optab for it, so perhaps there is a bug there to go hunt? __builtin_sqrt has the same semantics as the sqrt library function. This includes setting errno for negative arguments (other than -0 and -NaN). The semantics also include that it's always OK to expand to a call to that library function (generally, __builtin_foo may always expand to a call to foo, if there is such a library function). Thanks for the explanation. I see that there are some intrinsics implemented in terms of __builtin_fabsf, presumable they can be at 'risk' too of having a library call emitted? Kyrill
Re: [PATCH][AArch64] Implement vsqrt_f64 intrinsic
On Thu, 18 Dec 2014, Kyrill Tkachov wrote: I see that there are some intrinsics implemented in terms of __builtin_fabsf, presumable they can be at 'risk' too of having a library call emitted? The semantics of fabsf/fabs/fabsl are to clear the sign bit, never raise any exceptions (even for signaling NaN arguments, subject to any limitations imposed by calling conventions / floating-point load conventions on the processor) and never set errno, as per IEEE 754-2008 (and the C bindings thereto, TS 18661-1:2014). So there is certainly no need to call a library function for __builtin_fabsf, but I don't know whether the implementation guarantees never to generate such a call. -- Joseph S. Myers jos...@codesourcery.com
Re: [PATCH][AArch64] Implement vsqrt_f64 intrinsic
On Mon, 15 Dec 2014, James Greenhalgh wrote: @@ -22792,6 +22792,12 @@ vsqrtq_f32 (float32x4_t a) return __builtin_aarch64_sqrtv4sf (a); } +__extension__ static __inline float64x1_t __attribute__ ((__always_inline__)) +vsqrt_f64 (float64x1_t a) +{ + return (float64x1_t) { __builtin_sqrt (a[0]) }; +} Hi Kyrill, Does this introduce an implicit need to link against a maths library if we want arm_neon.h to work correctly? If so, I think we need to take a different approach. At O0 I've started to see: undefined reference to `sqrt' When checking a large arm_neon.h testcase. It does seem strange that the mid-end would convert a __builtin_sqrt back to a library call at O0 when the target has an optab for it, so perhaps there is a bug there to go hunt? __builtin_sqrt has the same semantics as the sqrt library function. This includes setting errno for negative arguments (other than -0 and -NaN). The semantics also include that it's always OK to expand to a call to that library function (generally, __builtin_foo may always expand to a call to foo, if there is such a library function). -- Joseph S. Myers jos...@codesourcery.com
Re: [PATCH][AArch64] Implement vsqrt_f64 intrinsic
On Mon, Nov 17, 2014 at 05:35:23PM +, Kyrill Tkachov wrote: Hi all, This patch implements the vsqrt_f64 intrinsic in arm_neon.h. There's not much to it, we can reuse __builtin_sqrt. It's a fairly straightforward and self-contained patch, do we still want it at this stage? A new execute test is added. Tested aarch64-none-elf. Thanks, Kyrill 2014-11-17 Kyrylo Tkachov kyrylo.tkac...@arm.com * config/aarch64/arm_neon.h (vsqrt_f64): New intrinsic. 2014-11-17 Kyrylo Tkachov kyrylo.tkac...@arm.com * gcc.target/aarch64/simd/vsqrt_f64_1.c commit d9e42debe2655287eef7b8c3ecf29bbdd11e6425 Author: Kyrylo Tkachov kyrylo.tkac...@arm.com Date: Mon Nov 17 15:02:01 2014 + [AArch64] Implement vsqrt_f64 intrinsic diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h index b3b80b8..c58213a 100644 --- a/gcc/config/aarch64/arm_neon.h +++ b/gcc/config/aarch64/arm_neon.h @@ -22792,6 +22792,12 @@ vsqrtq_f32 (float32x4_t a) return __builtin_aarch64_sqrtv4sf (a); } +__extension__ static __inline float64x1_t __attribute__ ((__always_inline__)) +vsqrt_f64 (float64x1_t a) +{ + return (float64x1_t) { __builtin_sqrt (a[0]) }; +} Hi Kyrill, Does this introduce an implicit need to link against a maths library if we want arm_neon.h to work correctly? If so, I think we need to take a different approach. At O0 I've started to see: undefined reference to `sqrt' When checking a large arm_neon.h testcase. It does seem strange that the mid-end would convert a __builtin_sqrt back to a library call at O0 when the target has an optab for it, so perhaps there is a bug there to go hunt? Thanks, James
Re: [PATCH][AArch64] Implement vsqrt_f64 intrinsic
On 26/11/14 10:14, Christophe Lyon wrote: Hi Kyrill, Hi Christophe, On 21 November 2014 at 16:52, Marcus Shawcroft marcus.shawcr...@gmail.com wrote: On 17 November 2014 17:35, Kyrill Tkachov kyrylo.tkac...@arm.com wrote: 2014-11-17 Kyrylo Tkachov kyrylo.tkac...@arm.com * config/aarch64/arm_neon.h (vsqrt_f64): New intrinsic. 2014-11-17 Kyrylo Tkachov kyrylo.tkac...@arm.com * gcc.target/aarch64/simd/vsqrt_f64_1.c OK /Marcus Your new test fails at the scan-assembly step because all the code is optimized away (even at -O1). Sorry about that, I could have sworn I saw it pass when I initially wrote it... In any case, I've committed this patch (r218117) as obvious to mark one of the variables as volatile to make sure it's not optimised away. I've confirmed that the scan-assembler test fails without this patch and passes with it. Thanks, Kyrill Christophe. diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vsqrt_f64_1.c b/gcc/testsuite/gcc.target/aarch64/simd/vsqrt_f64_1.c index 57fb6bb..7f99bd5 100644 --- a/gcc/testsuite/gcc.target/aarch64/simd/vsqrt_f64_1.c +++ b/gcc/testsuite/gcc.target/aarch64/simd/vsqrt_f64_1.c @@ -11,7 +11,7 @@ extern void abort (void); int main (void) { - float64x1_t in = vcreate_f64(0x3febd3e560634d7bULL); + volatile float64x1_t in = vcreate_f64(0x3febd3e560634d7bULL); float64x1_t result = vsqrt_f64 (in); float64_t expected = 0.9325321502142351;
Re: [PATCH][AArch64] Implement vsqrt_f64 intrinsic
On 27/11/14 10:03, Kyrill Tkachov wrote: On 26/11/14 10:14, Christophe Lyon wrote: Hi Kyrill, Hi Christophe, On 21 November 2014 at 16:52, Marcus Shawcroft marcus.shawcr...@gmail.com wrote: On 17 November 2014 17:35, Kyrill Tkachov kyrylo.tkac...@arm.com wrote: 2014-11-17 Kyrylo Tkachov kyrylo.tkac...@arm.com * config/aarch64/arm_neon.h (vsqrt_f64): New intrinsic. 2014-11-17 Kyrylo Tkachov kyrylo.tkac...@arm.com * gcc.target/aarch64/simd/vsqrt_f64_1.c OK /Marcus Your new test fails at the scan-assembly step because all the code is optimized away (even at -O1). Sorry about that, I could have sworn I saw it pass when I initially wrote it... In any case, I've committed this patch (r218117) as obvious to mark one of the variables as volatile to make sure it's not optimised away. I've confirmed that the scan-assembler test fails without this patch and passes with it. With the following ChangeLog for completeness sake: 2014-11-27 Kyrylo Tkachov kyrylo.tkac...@arm.com * gcc.target/aarch64/simd/vsqrt_f64_1.c: Mark variable volatile. Thanks, Kyrill Christophe.
Re: [PATCH][AArch64] Implement vsqrt_f64 intrinsic
Hi Kyrill, On 21 November 2014 at 16:52, Marcus Shawcroft marcus.shawcr...@gmail.com wrote: On 17 November 2014 17:35, Kyrill Tkachov kyrylo.tkac...@arm.com wrote: 2014-11-17 Kyrylo Tkachov kyrylo.tkac...@arm.com * config/aarch64/arm_neon.h (vsqrt_f64): New intrinsic. 2014-11-17 Kyrylo Tkachov kyrylo.tkac...@arm.com * gcc.target/aarch64/simd/vsqrt_f64_1.c OK /Marcus Your new test fails at the scan-assembly step because all the code is optimized away (even at -O1). Christophe.
Re: [PATCH][AArch64] Implement vsqrt_f64 intrinsic
On 17 November 2014 17:35, Kyrill Tkachov kyrylo.tkac...@arm.com wrote: 2014-11-17 Kyrylo Tkachov kyrylo.tkac...@arm.com * config/aarch64/arm_neon.h (vsqrt_f64): New intrinsic. 2014-11-17 Kyrylo Tkachov kyrylo.tkac...@arm.com * gcc.target/aarch64/simd/vsqrt_f64_1.c OK /Marcus
[PATCH][AArch64] Implement vsqrt_f64 intrinsic
Hi all, This patch implements the vsqrt_f64 intrinsic in arm_neon.h. There's not much to it, we can reuse __builtin_sqrt. It's a fairly straightforward and self-contained patch, do we still want it at this stage? A new execute test is added. Tested aarch64-none-elf. Thanks, Kyrill 2014-11-17 Kyrylo Tkachov kyrylo.tkac...@arm.com * config/aarch64/arm_neon.h (vsqrt_f64): New intrinsic. 2014-11-17 Kyrylo Tkachov kyrylo.tkac...@arm.com * gcc.target/aarch64/simd/vsqrt_f64_1.ccommit d9e42debe2655287eef7b8c3ecf29bbdd11e6425 Author: Kyrylo Tkachov kyrylo.tkac...@arm.com Date: Mon Nov 17 15:02:01 2014 + [AArch64] Implement vsqrt_f64 intrinsic diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h index b3b80b8..c58213a 100644 --- a/gcc/config/aarch64/arm_neon.h +++ b/gcc/config/aarch64/arm_neon.h @@ -22792,6 +22792,12 @@ vsqrtq_f32 (float32x4_t a) return __builtin_aarch64_sqrtv4sf (a); } +__extension__ static __inline float64x1_t __attribute__ ((__always_inline__)) +vsqrt_f64 (float64x1_t a) +{ + return (float64x1_t) { __builtin_sqrt (a[0]) }; +} + __extension__ static __inline float64x2_t __attribute__ ((__always_inline__)) vsqrtq_f64 (float64x2_t a) { diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vsqrt_f64_1.c b/gcc/testsuite/gcc.target/aarch64/simd/vsqrt_f64_1.c new file mode 100644 index 000..57fb6bb --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/simd/vsqrt_f64_1.c @@ -0,0 +1,25 @@ +/* Test the vsqrt_f64 AArch64 SIMD intrinsic. */ + +/* { dg-do run } */ +/* { dg-options -save-temps -O3 } */ + +#include arm_neon.h + +extern void abort (void); + + +int +main (void) +{ + float64x1_t in = vcreate_f64(0x3febd3e560634d7bULL); + float64x1_t result = vsqrt_f64 (in); + float64_t expected = 0.9325321502142351; + + if (result[0] != expected) +abort (); + + return 0; +} + +/* { dg-final { scan-assembler fsqrt\[ \t\]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+\n } } */ +/* { dg-final { cleanup-saved-temps } } */