http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49064
Summary: [x86/x64]: broken alias analysis leads vectorizer to emit poor code
Product: gcc
Version: 4.6.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: piotr.wyder...@gmail.com

On an x86 capable of SSE2, or on x64 (which has SSE2 by definition), GCC tries to
vectorize as much integer code as possible, but ends up with code much worse than
without vectorization. The SSE2-based version unnecessarily recomputes all the
m_Data pointers, as demonstrated by the following C++ snippet. I guess the reason
is unsophisticated alias analysis, but the actual reason may in fact be different.

struct X {
    __m128i*    m_Data;
    std::size_t m_Len;

    void xor_all(const X& v1, const X& v2);
    void xor_all2(const X& v1, const X& v2);
};

void X::xor_all(const X& v1, const X& v2) {
    for (std::size_t i = 0; i != m_Len; ++i) {
        m_Data[i] = v1.m_Data[i] ^ v2.m_Data[i];
    }
}

void X::xor_all2(const X& v1, const X& v2) {
    __m128i* p0 = m_Data;
    __m128i* p1 = v1.m_Data;
    __m128i* p2 = v2.m_Data;

    for (std::size_t i = 0; i != m_Len; ++i) {
        p0[i] = p1[i] ^ p2[i];
    }
}

As can be seen, xor_all2 produces nice code and xor_all doesn't:

0000000000447c70 <_ZN1X7xor_allERKS_S1_>:
  447c70:  48 83 7f 08 00          cmpq   $0x0,0x8(%rdi)
  447c75:  74 35                   je     447cac <_ZN1X7xor_allERKS_S1_+0x3c>
  447c77:  31 c0                   xor    %eax,%eax
  447c79:  0f 1f 80 00 00 00 00    nopl   0x0(%rax)
  447c80:  4c 8b 12                mov    (%rdx),%r10
  447c83:  48 89 c1                mov    %rax,%rcx
  447c86:  48 83 c0 01             add    $0x1,%rax
  447c8a:  4c 8b 0e                mov    (%rsi),%r9
  447c8d:  48 c1 e1 04             shl    $0x4,%rcx
  447c91:  4c 8b 07                mov    (%rdi),%r8
  447c94:  66 41 0f 6f 04 0a       movdqa (%r10,%rcx,1),%xmm0
  447c9a:  66 41 0f ef 04 09       pxor   (%r9,%rcx,1),%xmm0
  447ca0:  66 41 0f 7f 04 08       movdqa %xmm0,(%r8,%rcx,1)
  447ca6:  48 39 47 08             cmp    %rax,0x8(%rdi)
  447caa:  75 d4                   jne    447c80 <_ZN1X7xor_allERKS_S1_+0x10>
  447cac:  f3 c3                   repz retq

0000000000447cb0 <_ZN1X8xor_all2ERKS_S1_>:
  447cb0:  48 83 7f 08 00          cmpq   $0x0,0x8(%rdi)
  447cb5:  48 8b 0f                mov    (%rdi),%rcx
  447cb8:  48 8b 36                mov    (%rsi),%rsi
  447cbb:  4c 8b 02                mov    (%rdx),%r8
  447cbe:  74 26                   je     447ce6 <_ZN1X8xor_all2ERKS_S1_+0x36>
  447cc0:  31 c0                   xor    %eax,%eax
  447cc2:  31 d2                   xor    %edx,%edx
  447cc4:  0f 1f 40 00             nopl   0x0(%rax)
  447cc8:  66 41 0f 6f 04 00       movdqa (%r8,%rax,1),%xmm0
  447cce:  48 83 c2 01             add    $0x1,%rdx
  447cd2:  66 0f ef 04 06          pxor   (%rsi,%rax,1),%xmm0
  447cd7:  66 0f 7f 04 01          movdqa %xmm0,(%rcx,%rax,1)
  447cdc:  48 83 c0 10             add    $0x10,%rax
  447ce0:  48 39 57 08             cmp    %rdx,0x8(%rdi)
  447ce4:  75 e2                   jne    447cc8 <_ZN1X8xor_all2ERKS_S1_+0x18>
  447ce6:  f3 c3                   repz retq