[Bug target/38825] missed optimization: register renaming in unrolled loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38825

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #8 from Richard Biener ---
A testcase variant using __restrict was confirmed:

#include <xmmintrin.h>

void bench_3(float * __restrict out, float * __restrict in, float f,
             unsigned int n)
{
    n /= 8;
    __m128 scalar = _mm_set_ps1(f);
    do
    {
        __m128 arg = _mm_load_ps(in);
        __m128 result = _mm_add_ps(arg, scalar);
        _mm_store_ps(out, result);

        arg = _mm_load_ps(in+4);
        result = _mm_add_ps(arg, scalar);
        _mm_store_ps(out+4, result);

        in += 8;
        out += 8;
    }
    while (--n);
}

This is optimized with GCC 4.6 and up with -frename-registers or on trunk
where the latter is enabled by default now.

Fixed thus.
[Bug target/38825] missed optimization: register renaming in unrolled loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38825

Steven Bosscher changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |alias
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2016-04-29
                 CC|                            |rguenth at gcc dot gnu.org
     Ever confirmed|0                           |1

--- Comment #7 from Steven Bosscher ---
Confirmed at the time - and then fallen through the cracks?  Richi, alias
stuff so maybe something for you to look at again?
[Bug target/38825] missed optimization: register renaming in unrolled loop
--- Comment #6 from rguenth at gcc dot gnu dot org 2009-01-13 16:37 ---
Yes, the alias sets are not properly transferred to RTL:

;; MEM[base: out, index: ivtmp.58] = result;
(insn 22 21 0 /usr/lib64/gcc/x86_64-suse-linux/4.4/include/xmmintrin.h:951
    (set (mem:V4SF (plus:DI (reg/v/f:DI 66 [ out ])
                (reg:DI 63 [ ivtmp.58 ])) [2 S16 A128])
        (reg/v:V4SF 64 [ result ])) -1 (nil))

;; result.70 = __builtin_ia32_addps (MEM[base: in, index: ivtmp.58, offset: 16], scalar);
(insn 23 22 24 /usr/lib64/gcc/x86_64-suse-linux/4.4/include/xmmintrin.h:161
    (set (reg:V4SF 75)
        (plus:V4SF (reg/v:V4SF 65 [ scalar ])
            (mem:V4SF (plus:DI (plus:DI (reg/v/f:DI 67 [ in ])
                        (reg:DI 63 [ ivtmp.58 ]))
                    (const_int 16 [0x10])) [2 S16 A128]))) -1 (nil))

as you can see both use alias set 2.  But it should be noted that with
TARGET_MEM_REF (the MEM[...] expr) type-based aliasing is hosed (which is
unfortunately what restrict relies on).  Thus, with -fno-ivopts we can see
different alias sets:

;; *(__v4sf *) out = result;
(insn 14 13 0 /usr/lib64/gcc/x86_64-suse-linux/4.4/include/xmmintrin.h:951
    (set (mem:V4SF (reg/v/f:DI 62 [ out ]) [6 S16 A128])
        (reg/v:V4SF 60 [ result ])) -1 (nil))

;; result.58 = __builtin_ia32_addps (*(__v4sf *) (in + 16), scalar);
(insn 15 14 16 /usr/lib64/gcc/x86_64-suse-linux/4.4/include/xmmintrin.h:161
    (set (reg:V4SF 67)
        (plus:V4SF (reg/v:V4SF 61 [ scalar ])
            (mem:V4SF (plus:DI (reg/v/f:DI 63 [ in ])
                    (const_int 16 [0x10])) [5 S16 A128]))) -1 (nil))

and re-ordering of mems!

.L2:
        movaps  %xmm0, %xmm2
        movaps  %xmm0, %xmm1
        addps   (%rsi), %xmm2
        addps   16(%rsi), %xmm1
        addq    $32, %rsi
        movaps  %xmm2, (%rdi)
        movaps  %xmm1, 16(%rdi)
        addq    $32, %rdi
        subl    $1, %edx
        jne     .L2

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38825
[Bug target/38825] missed optimization: register renaming in unrolled loop
--- Comment #5 from tim at klingt dot org 2009-01-13 16:08 ---
(In reply to comment #4)
> -frename-registers does make a difference for me,

i can reproduce it; however, -frename-registers is supposed to be enabled by
-O3:

t...@thinkpad:~/workspace/nova-server.git$ /usr/local/lib/gcc-snapshot/bin/g++ -Q -O3 --help=optimizer |grep frename
  -frename-registers                    [enabled]

the resolved aliasing issue is not taken into account, though:

.L23:
        movaps  %xmm0, %xmm2
        movaps  %xmm0, %xmm1
        addps   (%rsi,%rax), %xmm2
        movaps  %xmm2, (%rdi,%rax)
        addps   16(%rsi,%rax), %xmm1
        movaps  %xmm1, 16(%rdi,%rax)
        addq    $32, %rax
        cmpq    %rdx, %rax
        jne     .L23

vs.

.L19:
        movaps  %xmm0, %xmm2
        movaps  %xmm0, %xmm1
        addps   (%rsi,%rax), %xmm2
        addps   16(%rsi,%rax), %xmm1
        movaps  %xmm2, (%rdi,%rax)
        movaps  %xmm1, 16(%rdi,%rax)
        addq    $32, %rax
        cmpq    %rdx, %rax
        jne     .L19

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38825
[Bug target/38825] missed optimization: register renaming in unrolled loop
--- Comment #4 from rguenth at gcc dot gnu dot org 2009-01-13 15:44 ---
-frename-registers does make a difference for me,

.L2:
        movaps  %xmm0, %xmm2
        movaps  %xmm0, %xmm1
        addps   (%rsi,%rax), %xmm2
        movaps  %xmm2, (%rdi,%rax)
        addps   16(%rsi,%rax), %xmm1
        movaps  %xmm1, 16(%rdi,%rax)
        addq    $32, %rax
        cmpq    %rdx, %rax
        jne     .L2

vs.

.L2:
        movaps  %xmm0, %xmm1
        addps   (%rsi,%rax), %xmm1
        movaps  %xmm1, (%rdi,%rax)
        movaps  %xmm0, %xmm1
        addps   16(%rsi,%rax), %xmm1
        movaps  %xmm1, 16(%rdi,%rax)
        addq    $32, %rax
        cmpq    %rdx, %rax
        jne     .L2

x86_64, -O3 -fschedule-insns [-frename-registers], with restrict added

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38825
[Bug target/38825] missed optimization: register renaming in unrolled loop
--- Comment #3 from tim at klingt dot org 2009-01-13 15:26 ---
(In reply to comment #1)
> Try -frename-registers.

i forgot to mention: the binaries are compiled with -O3 -mfpmath=sse -msse
(4.2, 4.3 and 4.4). -frename-registers is enabled by -O3

(In reply to comment #2)
> Note that your testcase has moved the load _mm_load_ps(in+4); before the
> store _mm_store_ps(out, result); which the compiler cannot do itself because
> they may alias.

i see ... however, the generated code is the same when using restricted
pointers to inform the compiler that there is no aliasing problem

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38825
[Bug target/38825] missed optimization: register renaming in unrolled loop
--- Comment #2 from rguenth at gcc dot gnu dot org 2009-01-13 15:15 ---
Note that your testcase has moved the load _mm_load_ps(in+4); before the
store _mm_store_ps(out, result); which the compiler cannot do itself because
they may alias.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38825
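For readers following the aliasing argument in comment #2, here is a minimal sketch in plain C (scalar code, no intrinsics) of why the load cannot be hoisted above the store without extra information. The names step_in_order, step_hoisted, orders_disagree and check_orders are hypothetical and exist only for this illustration; they are not part of the reported testcase. The first helper stores the low four results before loading the high four inputs, and the second performs both loads first. When out overlaps the second half of in, the two orders produce different memory contents; when the buffers are disjoint, which is what restrict promises, they agree.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: program order of the original loop body.
   The store to out[0..3] happens before the load of in[4..7]. */
static void step_in_order(float *out, const float *in, float f)
{
    for (int i = 0; i < 4; i++)
        out[i] = in[i] + f;
    for (int i = 0; i < 4; i++)
        out[4 + i] = in[4 + i] + f;
}

/* Hypothetical helper: both loads are done before either store,
   i.e. the load of in[4..7] is hoisted above the first store. */
static void step_hoisted(float *out, const float *in, float f)
{
    float lo[4], hi[4];
    for (int i = 0; i < 4; i++)
        lo[i] = in[i] + f;
    for (int i = 0; i < 4; i++)
        hi[i] = in[4 + i] + f;
    for (int i = 0; i < 4; i++)
        out[i] = lo[i];
    for (int i = 0; i < 4; i++)
        out[4 + i] = hi[i];
}

/* Returns 1 if the two orders leave different buffer contents when
   in = buf and out = buf + offset (offset + 8 must not exceed 24). */
int orders_disagree(int offset)
{
    float a[24], b[24];
    for (int i = 0; i < 24; i++)
        a[i] = b[i] = (float)i;
    step_in_order(a + offset, a, 1.0f);
    step_hoisted(b + offset, b, 1.0f);
    return memcmp(a, b, sizeof a) != 0;
}

/* Usage sketch: overlapping buffers disagree, disjoint ones agree. */
void check_orders(void)
{
    assert(orders_disagree(4) == 1); /* out overlaps in[4..7] */
    assert(orders_disagree(8) == 0); /* disjoint buffers */
}
```

Calling check_orders() from a main function exercises both cases. The helpers are deliberately not restrict-qualified, because calling a restrict-qualified function with overlapping buffers would be undefined behaviour.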
[Bug target/38825] missed optimization: register renaming in unrolled loop
--- Comment #1 from rguenth at gcc dot gnu dot org 2009-01-13 15:08 ---
Try -frename-registers.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38825