Re: [PATCH][AArch64] Implement vsqrt_f64 intrinsic

2015-01-09 Thread Kyrill Tkachov


On 17/12/14 00:04, Joseph Myers wrote:

On Mon, 15 Dec 2014, James Greenhalgh wrote:


@@ -22792,6 +22792,12 @@ vsqrtq_f32 (float32x4_t a)
return __builtin_aarch64_sqrtv4sf (a);
  }
  
+__extension__ static __inline float64x1_t __attribute__ ((__always_inline__))

+vsqrt_f64 (float64x1_t a)
+{
+  return (float64x1_t) { __builtin_sqrt (a[0]) };
+}

Hi Kyrill,

Does this introduce an implicit need to link against a maths library if
we want arm_neon.h to work correctly? If so, I think we need to take a
different approach.

At O0 I've started to see:

undefined reference to `sqrt' 

When checking a large arm_neon.h testcase.

It does seem strange that the mid-end would convert a __builtin_sqrt back
to a library call at O0 when the target has an optab for it, so perhaps
there is a bug there to go hunt?

__builtin_sqrt has the same semantics as the sqrt library function.  This
includes setting errno for negative arguments (other than -0 and -NaN).
The semantics also include that it's always OK to expand to a call to that
library function (generally, __builtin_foo may always expand to a call to
foo, if there is such a library function).


So my first attempt at this patch had created a target builtin 
(__builtin_aarch64_sqrtdf) and used that. Eventually though I went for 
the shorter __builtin_sqrt because I thought we could benefit from the 
tree-level information about the semantics rather than the RTL-level 
expansion that the target-specific builtin would provide.
But if there's a risk for it to expand to a library function call, I 
guess it's better to go with the target builtin. I'll prepare a patch.


Thanks for the explanations,
Kyrill








Re: [PATCH][AArch64] Implement vsqrt_f64 intrinsic

2014-12-18 Thread Kyrill Tkachov


On 17/12/14 00:04, Joseph Myers wrote:

On Mon, 15 Dec 2014, James Greenhalgh wrote:


@@ -22792,6 +22792,12 @@ vsqrtq_f32 (float32x4_t a)
return __builtin_aarch64_sqrtv4sf (a);
  }
  
+__extension__ static __inline float64x1_t __attribute__ ((__always_inline__))

+vsqrt_f64 (float64x1_t a)
+{
+  return (float64x1_t) { __builtin_sqrt (a[0]) };
+}

Hi Kyrill,

Does this introduce an implicit need to link against a maths library if
we want arm_neon.h to work correctly? If so, I think we need to take a
different approach.

At O0 I've started to see:

undefined reference to `sqrt' 

When checking a large arm_neon.h testcase.

It does seem strange that the mid-end would convert a __builtin_sqrt back
to a library call at O0 when the target has an optab for it, so perhaps
there is a bug there to go hunt?

__builtin_sqrt has the same semantics as the sqrt library function.  This
includes setting errno for negative arguments (other than -0 and -NaN).
The semantics also include that it's always OK to expand to a call to that
library function (generally, __builtin_foo may always expand to a call to
foo, if there is such a library function).


Thanks for the explanation.
I see that there are some intrinsics implemented in terms of 
__builtin_fabsf, presumable they can be at 'risk' too of having a 
library call emitted?


Kyrill








Re: [PATCH][AArch64] Implement vsqrt_f64 intrinsic

2014-12-18 Thread Joseph Myers
On Thu, 18 Dec 2014, Kyrill Tkachov wrote:

 I see that there are some intrinsics implemented in terms of __builtin_fabsf,
 presumable they can be at 'risk' too of having a library call emitted?

The semantics of fabsf/fabs/fabsl are to clear the sign bit, never raise 
any exceptions (even for signaling NaN arguments, subject to any 
limitations imposed by calling conventions / floating-point load 
conventions on the processor) and never set errno, as per IEEE 754-2008 
(and the C bindings thereto, TS 18661-1:2014).

So there is certainly no need to call a library function for 
__builtin_fabsf, but I don't know whether the implementation guarantees 
never to generate such a call.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH][AArch64] Implement vsqrt_f64 intrinsic

2014-12-16 Thread Joseph Myers
On Mon, 15 Dec 2014, James Greenhalgh wrote:

  @@ -22792,6 +22792,12 @@ vsqrtq_f32 (float32x4_t a)
 return __builtin_aarch64_sqrtv4sf (a);
   }
   
  +__extension__ static __inline float64x1_t __attribute__ 
  ((__always_inline__))
  +vsqrt_f64 (float64x1_t a)
  +{
  +  return (float64x1_t) { __builtin_sqrt (a[0]) };
  +}
 
 Hi Kyrill,
 
 Does this introduce an implicit need to link against a maths library if
 we want arm_neon.h to work correctly? If so, I think we need to take a
 different approach.
 
 At O0 I've started to see:
 
undefined reference to `sqrt' 
 
 When checking a large arm_neon.h testcase.
 
 It does seem strange that the mid-end would convert a __builtin_sqrt back
 to a library call at O0 when the target has an optab for it, so perhaps
 there is a bug there to go hunt?

__builtin_sqrt has the same semantics as the sqrt library function.  This 
includes setting errno for negative arguments (other than -0 and -NaN).  
The semantics also include that it's always OK to expand to a call to that 
library function (generally, __builtin_foo may always expand to a call to 
foo, if there is such a library function).

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH][AArch64] Implement vsqrt_f64 intrinsic

2014-12-15 Thread James Greenhalgh
On Mon, Nov 17, 2014 at 05:35:23PM +, Kyrill Tkachov wrote:
 Hi all,
 
 This patch implements the vsqrt_f64 intrinsic in arm_neon.h.
 There's not much to it, we can reuse __builtin_sqrt.
 It's a fairly straightforward and self-contained patch,
 do we still want it at this stage?
 
 A new execute test is added.
 
 Tested aarch64-none-elf.
 
 
 Thanks,
 Kyrill
 
 2014-11-17  Kyrylo Tkachov  kyrylo.tkac...@arm.com
 
  * config/aarch64/arm_neon.h (vsqrt_f64): New intrinsic.
 
 2014-11-17  Kyrylo Tkachov  kyrylo.tkac...@arm.com
 
  * gcc.target/aarch64/simd/vsqrt_f64_1.c

 commit d9e42debe2655287eef7b8c3ecf29bbdd11e6425
 Author: Kyrylo Tkachov kyrylo.tkac...@arm.com
 Date:   Mon Nov 17 15:02:01 2014 +
 
 [AArch64] Implement vsqrt_f64 intrinsic
 
 diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
 index b3b80b8..c58213a 100644
 --- a/gcc/config/aarch64/arm_neon.h
 +++ b/gcc/config/aarch64/arm_neon.h
 @@ -22792,6 +22792,12 @@ vsqrtq_f32 (float32x4_t a)
return __builtin_aarch64_sqrtv4sf (a);
  }
  
 +__extension__ static __inline float64x1_t __attribute__ ((__always_inline__))
 +vsqrt_f64 (float64x1_t a)
 +{
 +  return (float64x1_t) { __builtin_sqrt (a[0]) };
 +}

Hi Kyrill,

Does this introduce an implicit need to link against a maths library if
we want arm_neon.h to work correctly? If so, I think we need to take a
different approach.

At O0 I've started to see:

   undefined reference to `sqrt' 

When checking a large arm_neon.h testcase.

It does seem strange that the mid-end would convert a __builtin_sqrt back
to a library call at O0 when the target has an optab for it, so perhaps
there is a bug there to go hunt?

Thanks,
James



Re: [PATCH][AArch64] Implement vsqrt_f64 intrinsic

2014-11-27 Thread Kyrill Tkachov


On 26/11/14 10:14, Christophe Lyon wrote:

Hi Kyrill,

Hi Christophe,




On 21 November 2014 at 16:52, Marcus Shawcroft
marcus.shawcr...@gmail.com wrote:

On 17 November 2014 17:35, Kyrill Tkachov kyrylo.tkac...@arm.com wrote:


2014-11-17  Kyrylo Tkachov  kyrylo.tkac...@arm.com

 * config/aarch64/arm_neon.h (vsqrt_f64): New intrinsic.

2014-11-17  Kyrylo Tkachov  kyrylo.tkac...@arm.com

 * gcc.target/aarch64/simd/vsqrt_f64_1.c

OK /Marcus

Your new test fails at the scan-assembly step because all the code is
optimized away (even at -O1).


Sorry about that, I could have sworn I saw it pass when I initially 
wrote it...
In any case, I've committed this patch (r218117) as obvious to mark one 
of the variables as volatile

to make sure it's not optimised away.
I've confirmed that the scan-assembler test fails without this patch and 
passes with it.


Thanks,
Kyrill



Christophe.

diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vsqrt_f64_1.c b/gcc/testsuite/gcc.target/aarch64/simd/vsqrt_f64_1.c
index 57fb6bb..7f99bd5 100644
--- a/gcc/testsuite/gcc.target/aarch64/simd/vsqrt_f64_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vsqrt_f64_1.c
@@ -11,7 +11,7 @@ extern void abort (void);
 int
 main (void)
 {
-  float64x1_t in = vcreate_f64(0x3febd3e560634d7bULL);
+  volatile float64x1_t in = vcreate_f64(0x3febd3e560634d7bULL);
   float64x1_t result = vsqrt_f64 (in);
   float64_t expected = 0.9325321502142351;
 

Re: [PATCH][AArch64] Implement vsqrt_f64 intrinsic

2014-11-27 Thread Kyrill Tkachov


On 27/11/14 10:03, Kyrill Tkachov wrote:

On 26/11/14 10:14, Christophe Lyon wrote:

Hi Kyrill,

Hi Christophe,



On 21 November 2014 at 16:52, Marcus Shawcroft
marcus.shawcr...@gmail.com wrote:

On 17 November 2014 17:35, Kyrill Tkachov kyrylo.tkac...@arm.com wrote:


2014-11-17  Kyrylo Tkachov  kyrylo.tkac...@arm.com

  * config/aarch64/arm_neon.h (vsqrt_f64): New intrinsic.

2014-11-17  Kyrylo Tkachov  kyrylo.tkac...@arm.com

  * gcc.target/aarch64/simd/vsqrt_f64_1.c

OK /Marcus

Your new test fails at the scan-assembly step because all the code is
optimized away (even at -O1).

Sorry about that, I could have sworn I saw it pass when I initially
wrote it...
In any case, I've committed this patch (r218117) as obvious to mark one
of the variables as volatile
to make sure it's not optimised away.
I've confirmed that the scan-assembler test fails without this patch and
passes with it.


With the following ChangeLog for completeness sake:

2014-11-27  Kyrylo Tkachov  kyrylo.tkac...@arm.com

* gcc.target/aarch64/simd/vsqrt_f64_1.c: Mark variable volatile.



Thanks,
Kyrill


Christophe.







Re: [PATCH][AArch64] Implement vsqrt_f64 intrinsic

2014-11-26 Thread Christophe Lyon
Hi Kyrill,


On 21 November 2014 at 16:52, Marcus Shawcroft
marcus.shawcr...@gmail.com wrote:
 On 17 November 2014 17:35, Kyrill Tkachov kyrylo.tkac...@arm.com wrote:

 2014-11-17  Kyrylo Tkachov  kyrylo.tkac...@arm.com

 * config/aarch64/arm_neon.h (vsqrt_f64): New intrinsic.

 2014-11-17  Kyrylo Tkachov  kyrylo.tkac...@arm.com

 * gcc.target/aarch64/simd/vsqrt_f64_1.c

 OK /Marcus

Your new test fails at the scan-assembly step because all the code is
optimized away (even at -O1).

Christophe.


Re: [PATCH][AArch64] Implement vsqrt_f64 intrinsic

2014-11-21 Thread Marcus Shawcroft
On 17 November 2014 17:35, Kyrill Tkachov kyrylo.tkac...@arm.com wrote:

 2014-11-17  Kyrylo Tkachov  kyrylo.tkac...@arm.com

 * config/aarch64/arm_neon.h (vsqrt_f64): New intrinsic.

 2014-11-17  Kyrylo Tkachov  kyrylo.tkac...@arm.com

 * gcc.target/aarch64/simd/vsqrt_f64_1.c

OK /Marcus


[PATCH][AArch64] Implement vsqrt_f64 intrinsic

2014-11-17 Thread Kyrill Tkachov

Hi all,

This patch implements the vsqrt_f64 intrinsic in arm_neon.h.
There's not much to it, we can reuse __builtin_sqrt.
It's a fairly straightforward and self-contained patch,
do we still want it at this stage?

A new execute test is added.

Tested aarch64-none-elf.


Thanks,
Kyrill

2014-11-17  Kyrylo Tkachov  kyrylo.tkac...@arm.com

* config/aarch64/arm_neon.h (vsqrt_f64): New intrinsic.

2014-11-17  Kyrylo Tkachov  kyrylo.tkac...@arm.com

* gcc.target/aarch64/simd/vsqrt_f64_1.ccommit d9e42debe2655287eef7b8c3ecf29bbdd11e6425
Author: Kyrylo Tkachov kyrylo.tkac...@arm.com
Date:   Mon Nov 17 15:02:01 2014 +

[AArch64] Implement vsqrt_f64 intrinsic

diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index b3b80b8..c58213a 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -22792,6 +22792,12 @@ vsqrtq_f32 (float32x4_t a)
   return __builtin_aarch64_sqrtv4sf (a);
 }
 
+__extension__ static __inline float64x1_t __attribute__ ((__always_inline__))
+vsqrt_f64 (float64x1_t a)
+{
+  return (float64x1_t) { __builtin_sqrt (a[0]) };
+}
+
 __extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
 vsqrtq_f64 (float64x2_t a)
 {
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vsqrt_f64_1.c b/gcc/testsuite/gcc.target/aarch64/simd/vsqrt_f64_1.c
new file mode 100644
index 000..57fb6bb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vsqrt_f64_1.c
@@ -0,0 +1,25 @@
+/* Test the vsqrt_f64 AArch64 SIMD intrinsic.  */
+
+/* { dg-do run } */
+/* { dg-options -save-temps -O3 } */
+
+#include arm_neon.h
+
+extern void abort (void);
+
+
+int
+main (void)
+{
+  float64x1_t in = vcreate_f64(0x3febd3e560634d7bULL);
+  float64x1_t result = vsqrt_f64 (in);
+  float64_t expected = 0.9325321502142351;
+
+  if (result[0] != expected)
+abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-assembler fsqrt\[ \t\]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+\n } } */
+/* { dg-final { cleanup-saved-temps } } */