[Bug target/38825] missed optimization: register renaming in unrolled loop

2016-05-02 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38825

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #8 from Richard Biener  ---
A testcase variant using __restrict was confirmed:

#include <xmmintrin.h>

void bench_3(float * __restrict out, float * __restrict in, float f,
             unsigned int n)
{
  n /= 8;
  __m128 scalar = _mm_set_ps1(f);
  do
    {
      __m128 arg = _mm_load_ps(in);
      __m128 result = _mm_add_ps(arg, scalar);
      _mm_store_ps(out, result);

      arg = _mm_load_ps(in+4);
      result = _mm_add_ps(arg, scalar);
      _mm_store_ps(out+4, result);
      in += 8;
      out += 8;
    }
  while (--n);
}

GCC 4.6 and later optimize this with -frename-registers; on trunk that
option is now enabled by default.

Fixed thus.

[Bug target/38825] missed optimization: register renaming in unrolled loop

2016-04-29 Thread steven at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38825

Steven Bosscher  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |alias
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2016-04-29
                 CC|                            |rguenth at gcc dot gnu.org
     Ever confirmed|0                           |1

--- Comment #7 from Steven Bosscher  ---
Confirmed at the time - and then fallen through the cracks?
Richi, alias stuff so maybe something for you to look at again?

[Bug target/38825] missed optimization: register renaming in unrolled loop

2009-01-13 Thread rguenth at gcc dot gnu dot org


--- Comment #1 from rguenth at gcc dot gnu dot org  2009-01-13 15:08 ---
Try -frename-registers.





[Bug target/38825] missed optimization: register renaming in unrolled loop

2009-01-13 Thread rguenth at gcc dot gnu dot org


--- Comment #2 from rguenth at gcc dot gnu dot org  2009-01-13 15:15 ---
Note that your testcase has moved the load _mm_load_ps(in+4); before the
store _mm_store_ps(out, result); which the compiler cannot do itself because
they may alias.
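
To illustrate the point, here is a minimal scalar sketch (not taken from the report) of why hoisting a load above a store is only valid when the pointers do not overlap. The function names add_in_order, add_hoisted and show_overlap are hypothetical, and plain floats stand in for the __m128 values of the testcase.

```c
#include <stdio.h>

/* Hypothetical scalar stand-in for one unrolled iteration: each store is
   issued before the next load, as in the straightforward source order. */
void add_in_order(float *out, const float *in, float f)
{
    out[0] = in[0] + f;
    out[1] = in[1] + f;   /* this load must observe the store above */
}

/* Same work, but with both loads hoisted above the stores, as in a
   hand-scheduled loop body. */
void add_hoisted(float *out, const float *in, float f)
{
    float a0 = in[0];
    float a1 = in[1];
    out[0] = a0 + f;
    out[1] = a1 + f;
}

/* Runs both variants on a buffer where out == in + 1 and prints the value
   each one leaves in the last element. */
void show_overlap(void)
{
    float a[3] = {1.0f, 2.0f, 3.0f};
    float b[3] = {1.0f, 2.0f, 3.0f};

    add_in_order(a + 1, a, 10.0f);
    add_hoisted(b + 1, b, 10.0f);
    printf("in order: %g, hoisted: %g\n", a[2], b[2]);
}
```

Calling show_overlap() from main shows that the two orderings leave different values in the overlapping buffer, so without a no-alias guarantee such as __restrict the compiler has to keep the source order.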





[Bug target/38825] missed optimization: register renaming in unrolled loop

2009-01-13 Thread tim at klingt dot org


--- Comment #3 from tim at klingt dot org  2009-01-13 15:26 ---
(In reply to comment #1)
> Try -frename-registers.

I forgot to mention: the binaries are compiled with -O3 -mfpmath=sse -msse
(4.2, 4.3 and 4.4).

-frename-registers is enabled by -O3.

(In reply to comment #2)
> Note that your testcase has moved the load _mm_load_ps(in+4); before the
> store _mm_store_ps(out, result); which the compiler cannot do itself because
> they may alias.

I see. However, the generated code is the same when restrict-qualified
pointers are used to tell the compiler that there is no aliasing problem.





[Bug target/38825] missed optimization: register renaming in unrolled loop

2009-01-13 Thread rguenth at gcc dot gnu dot org


--- Comment #4 from rguenth at gcc dot gnu dot org  2009-01-13 15:44 ---
-frename-registers does make a difference for me,

.L2:
        movaps  %xmm0, %xmm2
        movaps  %xmm0, %xmm1
        addps   (%rsi,%rax), %xmm2
        movaps  %xmm2, (%rdi,%rax)
        addps   16(%rsi,%rax), %xmm1
        movaps  %xmm1, 16(%rdi,%rax)
        addq    $32, %rax
        cmpq    %rdx, %rax
        jne     .L2

vs.

.L2:
        movaps  %xmm0, %xmm1
        addps   (%rsi,%rax), %xmm1
        movaps  %xmm1, (%rdi,%rax)
        movaps  %xmm0, %xmm1
        addps   16(%rsi,%rax), %xmm1
        movaps  %xmm1, 16(%rdi,%rax)
        addq    $32, %rax
        cmpq    %rdx, %rax
        jne     .L2

x86_64, -O3 -fschedule-insns [-frename-registers], with restrict added





[Bug target/38825] missed optimization: register renaming in unrolled loop

2009-01-13 Thread tim at klingt dot org


--- Comment #5 from tim at klingt dot org  2009-01-13 16:08 ---
(In reply to comment #4)
> -frename-registers does make a difference for me,

I can reproduce it; however, -frename-registers is supposed to be enabled by
-O3:
t...@thinkpad:~/workspace/nova-server.git$ /usr/local/lib/gcc-snapshot/bin/g++
-Q -O3 --help=optimizer  |grep frename
  -frename-registers                    [enabled]


The resolved aliasing issue is not taken into account, though:

.L23:
        movaps  %xmm0, %xmm2
        movaps  %xmm0, %xmm1
        addps   (%rsi,%rax), %xmm2
        movaps  %xmm2, (%rdi,%rax)
        addps   16(%rsi,%rax), %xmm1
        movaps  %xmm1, 16(%rdi,%rax)
        addq    $32, %rax
        cmpq    %rdx, %rax
        jne     .L23

vs.

.L19:
        movaps  %xmm0, %xmm2
        movaps  %xmm0, %xmm1
        addps   (%rsi,%rax), %xmm2
        addps   16(%rsi,%rax), %xmm1
        movaps  %xmm2, (%rdi,%rax)
        movaps  %xmm1, 16(%rdi,%rax)
        addq    $32, %rax
        cmpq    %rdx, %rax
        jne     .L19





[Bug target/38825] missed optimization: register renaming in unrolled loop

2009-01-13 Thread rguenth at gcc dot gnu dot org


--- Comment #6 from rguenth at gcc dot gnu dot org  2009-01-13 16:37 ---
Yes, the alias sets are not properly transferred to RTL:

;; MEM[base: out, index: ivtmp.58] = result;

(insn 22 21 0 /usr/lib64/gcc/x86_64-suse-linux/4.4/include/xmmintrin.h:951 (set
        (mem:V4SF (plus:DI (reg/v/f:DI 66 [ out ])
                (reg:DI 63 [ ivtmp.58 ])) [2 S16 A128])
        (reg/v:V4SF 64 [ result ])) -1 (nil))

;; result.70 = __builtin_ia32_addps (MEM[base: in, index: ivtmp.58, offset: 16], scalar);

(insn 23 22 24 /usr/lib64/gcc/x86_64-suse-linux/4.4/include/xmmintrin.h:161
    (set (reg:V4SF 75)
        (plus:V4SF (reg/v:V4SF 65 [ scalar ])
            (mem:V4SF (plus:DI (plus:DI (reg/v/f:DI 67 [ in ])
                        (reg:DI 63 [ ivtmp.58 ]))
                    (const_int 16 [0x10])) [2 S16 A128]))) -1 (nil))

As you can see, both use alias set 2.  But it should be noted that with
TARGET_MEM_REF (the MEM[...] expr) type-based aliasing is hosed (which is
unfortunately what restrict relies on).

Thus, with -fno-ivopts we can see different alias sets:

;; *(__v4sf *) out = result;

(insn 14 13 0 /usr/lib64/gcc/x86_64-suse-linux/4.4/include/xmmintrin.h:951 (set
        (mem:V4SF (reg/v/f:DI 62 [ out ]) [6 S16 A128])
        (reg/v:V4SF 60 [ result ])) -1 (nil))

;; result.58 = __builtin_ia32_addps (*(__v4sf *) (in + 16), scalar);

(insn 15 14 16 /usr/lib64/gcc/x86_64-suse-linux/4.4/include/xmmintrin.h:161
    (set (reg:V4SF 67)
        (plus:V4SF (reg/v:V4SF 61 [ scalar ])
            (mem:V4SF (plus:DI (reg/v/f:DI 63 [ in ])
                    (const_int 16 [0x10])) [5 S16 A128]))) -1 (nil))

and re-ordering of mems!

.L2:
        movaps  %xmm0, %xmm2
        movaps  %xmm0, %xmm1
        addps   (%rsi), %xmm2
        addps   16(%rsi), %xmm1
        addq    $32, %rsi
        movaps  %xmm2, (%rdi)
        movaps  %xmm1, 16(%rdi)
        addq    $32, %rdi
        subl    $1, %edx
        jne     .L2

