RE: censored naked SSE reciprocals, -mrecip

2007-12-29 Thread Dave Korn
On 29 December 2007 23:04, tbp wrote:

> Now that's blazing fast after-sales service. 

> As an extremely satisfied customer, i want to nominate you for the
> 2007 man of the year short list.

  Hear hear!  Uros works very hard and contributes a lot.  Thank you, Uros!

cheers,
  DaveK
-- 
Can't think of a witty .sigline today



Re: censored naked SSE reciprocals, -mrecip

2007-12-29 Thread tbp
On Dec 29, 2007 4:35 PM, Uros Bizjak <[EMAIL PROTECTED]> wrote:
> Attached patch fixes these problems by using correct shortcuts when
> generating intrinsic functions.
>
> Patch was bootstrapped and regression tested with {,-m32} on
> x86_64-pc-linux-gnu. Patch is committed to SVN.
>
> Thanks a lot for your report,
Now that's blazing fast after-sales service. And i get no less than
two undocumented but functional builtins (as opposed to, say
__builtin_ia32_movddup, which is documented but dysfunctional) for the
same price.
As an extremely satisfied customer, i want to nominate you for the
2007 man of the year short list.


Re: censored naked SSE reciprocals, -mrecip

2007-12-29 Thread Uros Bizjak

Hello!


> i lately had some use for -mrecip but it turned out to come with all
> sorts of strings attached and apparently no opt-out. Briefly, barring
> inline asm, i can't get gcc to emit those ops without a NR fixup.

> Questions:
>   a) is that really by design?


No.

Attached patch fixes these problems by using correct shortcuts when 
generating intrinsic functions.


2007-12-29  Uros Bizjak  <[EMAIL PROTECTED]>

   * config/i386/sse.md ("*divv4sf3"): Rename to "sse_divv4sf3".
   ("*sse_rsqrtv4sf2"): Export.
   ("*sse_sqrtv4sf2"): Ditto.
   * config/i386/i386.c (enum ix86_builtins) [IX86_BUILTIN_RSQRTPS_NR,
   IX86_BUILTIN_SQRTPS_NR]: New constants.
   (struct builtin_description) [IX86_BUILTIN_DIVPS]: Use
   CODE_FOR_sse_divv4sf3.
   [IX86_BUILTIN_SQRTPS]: Use CODE_FOR_sse_sqrtv4sf2.
   [IX86_BUILTIN_SQRTPS_NR]: New.
   [IX86_BUILTIN_RSQRTPS_NR]: Ditto.
   (ix86_init_mmx_sse_builtins): Initialize __builtin_ia32_rsqrtps_nr
   and __builtin_ia32_sqrtps_nr.
   (ix86_builtin_vectorized_function): Convert BUILT_IN_SQRTF to
   IX86_BUILTIN_SQRTPS_NR.
   (ix86_builtin_reciprocal): Convert IX86_BUILTIN_SQRTPS_NR to
   IX86_BUILTIN_RSQRTPS_NR.

Patch was bootstrapped and regression tested with {,-m32} on 
x86_64-pc-linux-gnu. Patch is committed to SVN.


Thanks a lot for your report,
Uros.
Index: sse.md
===
--- sse.md  (revision 131218)
+++ sse.md  (working copy)
@@ -490,7 +490,7 @@
 }
 })
 
-(define_insn "*divv4sf3"
+(define_insn "sse_divv4sf3"
   [(set (match_operand:V4SF 0 "register_operand" "=x")
(div:V4SF (match_operand:V4SF 1 "register_operand" "0")
  (match_operand:V4SF 2 "nonimmediate_operand" "xm")))]
@@ -532,16 +532,7 @@
   [(set_attr "type" "sse")
(set_attr "mode" "SF")])
 
-(define_insn "*sse_rsqrtv4sf2"
-  [(set (match_operand:V4SF 0 "register_operand" "=x")
-   (unspec:V4SF
- [(match_operand:V4SF 1 "nonimmediate_operand" "xm")] UNSPEC_RSQRT))]
-  "TARGET_SSE"
-  "rsqrtps\t{%1, %0|%0, %1}"
-  [(set_attr "type" "sse")
-   (set_attr "mode" "V4SF")])
-
-(define_expand "sse_rsqrtv4sf2"
+(define_expand "rsqrtv4sf2"
   [(set (match_operand:V4SF 0 "register_operand" "")
(unspec:V4SF
  [(match_operand:V4SF 1 "nonimmediate_operand" "")] UNSPEC_RSQRT))]
@@ -556,6 +547,15 @@
 }
 })
 
+(define_insn "sse_rsqrtv4sf2"
+  [(set (match_operand:V4SF 0 "register_operand" "=x")
+   (unspec:V4SF
+ [(match_operand:V4SF 1 "nonimmediate_operand" "xm")] UNSPEC_RSQRT))]
+  "TARGET_SSE"
+  "rsqrtps\t{%1, %0|%0, %1}"
+  [(set_attr "type" "sse")
+   (set_attr "mode" "V4SF")])
+
 (define_insn "sse_vmrsqrtv4sf2"
   [(set (match_operand:V4SF 0 "register_operand" "=x")
(vec_merge:V4SF
@@ -568,14 +568,6 @@
   [(set_attr "type" "sse")
(set_attr "mode" "SF")])
 
-(define_insn "*sqrtv4sf2"
-  [(set (match_operand:V4SF 0 "register_operand" "=x")
-   (sqrt:V4SF (match_operand:V4SF 1 "nonimmediate_operand" "xm")))]
-  "TARGET_SSE"
-  "sqrtps\t{%1, %0|%0, %1}"
-  [(set_attr "type" "sse")
-   (set_attr "mode" "V4SF")])
-
 (define_expand "sqrtv4sf2"
   [(set (match_operand:V4SF 0 "register_operand" "=")
(sqrt:V4SF (match_operand:V4SF 1 "nonimmediate_operand" "")))]
@@ -590,6 +582,14 @@
 }
 })
 
+(define_insn "sse_sqrtv4sf2"
+  [(set (match_operand:V4SF 0 "register_operand" "=x")
+   (sqrt:V4SF (match_operand:V4SF 1 "nonimmediate_operand" "xm")))]
+  "TARGET_SSE"
+  "sqrtps\t{%1, %0|%0, %1}"
+  [(set_attr "type" "sse")
+   (set_attr "mode" "V4SF")])
+
 (define_insn "sse_vmsqrtv4sf2"
   [(set (match_operand:V4SF 0 "register_operand" "=x")
(vec_merge:V4SF
Index: i386.c
===
--- i386.c  (revision 131218)
+++ i386.c  (working copy)
@@ -17093,9 +17093,11 @@ enum ix86_builtins
   IX86_BUILTIN_RCPPS,
   IX86_BUILTIN_RCPSS,
   IX86_BUILTIN_RSQRTPS,
+  IX86_BUILTIN_RSQRTPS_NR,
   IX86_BUILTIN_RSQRTSS,
   IX86_BUILTIN_RSQRTF,
   IX86_BUILTIN_SQRTPS,
+  IX86_BUILTIN_SQRTPS_NR,
   IX86_BUILTIN_SQRTSS,
 
   IX86_BUILTIN_UNPCKHPS,
@@ -17849,7 +17851,7 @@ static const struct builtin_description 
   { OPTION_MASK_ISA_SSE, CODE_FOR_addv4sf3, "__builtin_ia32_addps", 
IX86_BUILTIN_ADDPS, UNKNOWN, 0 },
   { OPTION_MASK_ISA_SSE, CODE_FOR_subv4sf3, "__builtin_ia32_subps", 
IX86_BUILTIN_SUBPS, UNKNOWN, 0 },
   { OPTION_MASK_ISA_SSE, CODE_FOR_mulv4sf3, "__builtin_ia32_mulps", 
IX86_BUILTIN_MULPS, UNKNOWN, 0 },
-  { OPTION_MASK_ISA_SSE, CODE_FOR_divv4sf3, "__builtin_ia32_divps", 
IX86_BUILTIN_DIVPS, UNKNOWN, 0 },
+  { OPTION_MASK_ISA_SSE, CODE_FOR_sse_divv4sf3, "__builtin_ia32_divps", 
IX86_BUILTIN_DIVPS, UNKNOWN, 0 },
   { OPTION_MASK_ISA_SSE, CODE_FOR_sse_vmaddv4sf3,  "__builtin_ia32_addss", 
IX86_BUILTIN_ADDSS, UNKNOWN, 0 },
   { OPTION_MASK_ISA_SSE, CODE_FOR_sse_vmsubv4sf3,  "__builtin_ia32_subss", 
IX86_BUILTIN_SUBSS, UNKNOWN, 0 },
   { OPTI

censored naked SSE reciprocals, -mrecip

2007-12-28 Thread tbp
Merry xmas,

i lately had some use for -mrecip but it turned out to come with all
sorts of strings attached and apparently no opt-out. Briefly, barring
inline asm, i can't get gcc to emit those ops without a NR fixup.

# cat src/pr-recip.c
#include <xmmintrin.h>
typedef float v4sf_t __attribute__ ((__vector_size__ (16)));

__m128 foo(__m128 a) { return _mm_sqrt_ps(a); }
__m128 bar(__m128 a) { return _mm_rsqrt_ps(a); }
__m128 baz(__m128 a) { return _mm_rcp_ps(a); }

v4sf_t nope1(v4sf_t a) { return __builtin_ia32_sqrtps(a); }
v4sf_t nope2(v4sf_t a) { return __builtin_ia32_rsqrtps(a); }
v4sf_t allright(v4sf_t a) { return __builtin_ia32_rcpps(a); }

int main() { return 0; }
# /usr/local/gcc-4.3-20071221/bin/gcc -march=native -ffast-math -mrecip -O2 src/pr-recip.c
... and as can be witnessed in the attached asm dump foo, bar, nope1,
nope2 get mangled (at least on x86-64 linux).
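
For readers unfamiliar with the term, the "NR fixup" is a Newton-Raphson refinement step applied to the low-precision hardware estimate. The scalar sketch below only illustrates the arithmetic of one such step; it is not the code gcc emits, and the function names nr_refine_rsqrt and nr_refine_rcp are hypothetical.

```c
/* One Newton-Raphson step for y = 1/sqrt(a), starting from an
   estimate x0 (e.g. what rsqrtps would return):
       x1 = 0.5 * x0 * (3 - a * x0 * x0)                          */
float nr_refine_rsqrt(float a, float x0)
{
    return 0.5f * x0 * (3.0f - a * x0 * x0);
}

/* One Newton-Raphson step for y = 1/a, starting from an
   estimate x0 (e.g. what rcpps would return):
       x1 = x0 * (2 - a * x0)                                     */
float nr_refine_rcp(float a, float x0)
{
    return x0 * (2.0f - a * x0);
}
```

Each step costs a few extra multiplies and a subtract per vector, which is exactly the overhead one wants to avoid when the raw estimate is already good enough.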

While i can somehow understand the logic behind the automatic
transformation of _mm_sqrt_ps - it can be argued that's what the user
has asked for - there's no obvious way to opt out. But then i really
don't understand why gcc feels the urge to tinker when i specifically
ask for a rsqrt.
To add insult to injury -mrecip, unlike fast-math, doesn't set any
macro so kludging around is a cat / mouse game.
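
To make the asymmetry concrete: gcc predefines __FAST_MATH__ under -ffast-math, so code can branch on it at compile time, whereas nothing comparable is available for -mrecip. A minimal sketch, where BUILT_WITH_FAST_MATH and built_with_fast_math are hypothetical names introduced only for illustration:

```c
/* gcc predefines __FAST_MATH__ when -ffast-math is in effect. */
#ifdef __FAST_MATH__
#  define BUILT_WITH_FAST_MATH 1
#else
#  define BUILT_WITH_FAST_MATH 0
#endif

/* Hypothetical helper: exposes the compile-time setting to callers.
   There is no equivalent test for -mrecip, which is the complaint. */
int built_with_fast_math(void)
{
    return BUILT_WITH_FAST_MATH;
}
```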

Questions:
  a) is that really by design?
  b) what's the official way to dodge fixups when -mrecip is active?
  c) any chance for -mrecip to set __FAST_MATH_NONE_SHALL_PASS__ or something?


dump.asm
Description: Binary data