------- Comment #4 from rguenth at gcc dot gnu dot org 2009-09-15 14:07 -------
With the alias issue fixed I get

good:
.LFB0:
        .cfi_startproc
        movd    srcshift(%rip), %xmm1
        xorl    %eax, %eax
        .p2align 4,,10
        .p2align 3
.L2:
        movdqu  (%rdi,%rax), %xmm0
        pslld   %xmm1, %xmm0
        movdqu  %xmm0, (%rsi,%rax)
        addq    $16, %rax
        cmpq    $1024, %rax
        jne     .L2
        rep
        ret

bad:
.LFB1:
        .cfi_startproc
        movd    srcshift(%rip), %xmm2
        leaq    1024(%rsi), %rax
        .p2align 4,,10
        .p2align 3
.L6:
        movdqu  (%rdi), %xmm0
        addq    $16, %rdi
        movdqu  (%rsi), %xmm1
        pslld   %xmm2, %xmm0
        por     %xmm1, %xmm0
        movdqu  %xmm0, (%rsi)
        addq    $16, %rsi
        cmpq    %rax, %rsi
        jne     .L6
        rep
        ret

which looks good in both cases.  For the original testcase which results
in a runtime alias check we get

bad:
.LFB1:
        .cfi_startproc
        leaq    16(%rdi), %rax
        cmpq    %rax, %rsi
        leaq    16(%rsi), %rax
        seta    %dl
        cmpq    %rax, %rdi
        seta    %al
        orb     %al, %dl
        je      .L10
        leaq    1024(%rsi), %rax
        .p2align 4,,10
        .p2align 3
.L11:
        movdqu  (%rdi), %xmm0
        addq    $16, %rdi
        movd    srcshift(%rip), %xmm1
        pslld   %xmm1, %xmm0
        movdqu  (%rsi), %xmm1
        por     %xmm1, %xmm0
        movdqu  %xmm0, (%rsi)
        addq    $16, %rsi
        cmpq    %rax, %rsi
        jne     .L11
        rep
        ret
.L10:
        movzbl  srcshift(%rip), %ecx
        xorl    %eax, %eax
        .p2align 4,,10
        .p2align 3
.L13:
        movl    (%rdi,%rax), %edx
        sall    %cl, %edx
        orl     %edx, (%rsi,%rax)
        addq    $4, %rax
        cmpq    $1024, %rax
        jne     .L13
        rep
        ret

thus still bad.  It is IRA / reload that moves the srcshift load back into
the loop for some reason.


-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |vmakarov at gcc dot gnu dot
                   |                            |org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34011
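
For readers following the listings above, here is a rough C sketch of the kind of loop they appear to correspond to. It is reconstructed from the assembly only and is not the testcase attached to the PR: the names or_shift and vector_path_is_safe are hypothetical, the type of srcshift is guessed from the movzbl load, and the element count of 256 is inferred from the 1024-byte loop bound. The overlap test mirrors the seta/seta/orb sequence of the runtime alias check, and the local copy of srcshift shows the hoisting one would expect the compiler to keep outside the vectorized loop.

```c
#include <stddef.h>
#include <stdint.h>

#define N 256  /* 256 ints = 1024 bytes, matching the loop bound in the listings */

/* Assumption: a global shift count; the byte type is guessed from movzbl. */
unsigned char srcshift;

/* Sketch of the runtime alias check: the vector loop is taken when one
   pointer lies more than 16 bytes (one vector) above the other. */
static int vector_path_is_safe(const int *src, const int *dst)
{
    uintptr_t s = (uintptr_t)src;
    uintptr_t d = (uintptr_t)dst;
    return d > s + 16 || s > d + 16;
}

/* The "bad" shape: OR the shifted source into the destination.
   srcshift is copied to a local once, so the load is loop-invariant
   by construction; this is the placement the comment expects the
   register allocator to preserve. */
void or_shift(const int *src, int *dst)
{
    const int shift = srcshift;      /* hoisted load of the global */
    for (size_t i = 0; i < N; i++)
        dst[i] |= src[i] << shift;   /* inputs assumed non-negative */
}
```

Compiling something like this with -O2 -ftree-vectorize -S and checking whether the srcshift load sits before or inside the vector loop label is one way to compare against the listings; the result will depend on the GCC version.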