[Bug target/108724] [11 Regression] Poor codegen when summing two arrays without AVX or SSE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108724

Jakub Jelinek changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|11.4                        |11.5

--- Comment #11 from Jakub Jelinek ---
GCC 11.4 is being released, retargeting bugs to GCC 11.5.
[Bug target/108724] [11 Regression] Poor codegen when summing two arrays without AVX or SSE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108724

--- Comment #10 from Richard Biener ---
On trunk we're back to vectorizing but as intended with DImode which makes us
save half of the loads and stores and we think the extended required arithmetic
covers up for that (by quite some margin).

        movabsq $9223372034707292159, %rcx
        movq    (%rdx), %rax
        movq    (%rsi), %rsi
        movq    %rcx, %rdx
        andq    %rax, %rdx
        andq    %rsi, %rcx
        xorq    %rsi, %rax
        addq    %rcx, %rdx
        movabsq $-9223372034707292160, %rcx
        andq    %rcx, %rax
        xorq    %rdx, %rax
        movq    %rax, (%rdi)

vs

        movl    (%rdx), %eax
        addl    (%rsi), %eax
        movl    %eax, (%rdi)
        movl    4(%rdx), %eax
        addl    4(%rsi), %eax
        movl    %eax, 4(%rdi)
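
For readers following the DImode sequence above, the arithmetic it performs can be sketched in C roughly as follows. This is only an illustration of the emulated two-lane add, not GCC's implementation; the function names add_v2si_swar, add_v2si_lanes and check_swar are made up for this sketch. The two constants are the movabsq immediates written in hex.

```c
#include <assert.h>
#include <stdint.h>

/* Reference: add the two 32-bit lanes separately, like the scalar
   movl/addl sequence does.  Each lane wraps on its own. */
uint64_t add_v2si_lanes(uint64_t a, uint64_t b)
{
    uint32_t lo = (uint32_t)a + (uint32_t)b;
    uint32_t hi = (uint32_t)(a >> 32) + (uint32_t)(b >> 32);
    return ((uint64_t)hi << 32) | lo;
}

/* Emulated V2SI add in one 64-bit register (hypothetical helper name).
   low31 == 9223372034707292159, top == -9223372034707292160 as signed. */
uint64_t add_v2si_swar(uint64_t a, uint64_t b)
{
    const uint64_t low31 = 0x7fffffff7fffffffULL;
    const uint64_t top   = 0x8000000080000000ULL;

    /* Add the low 31 bits of each lane; with the top bits cleared,
       a carry can reach bit 31 of its own lane but never the next lane. */
    uint64_t sum = (a & low31) + (b & low31);

    /* Fold the top bit of each lane back in with a carry-less add (xor). */
    return sum ^ ((a ^ b) & top);
}

/* Small self-check comparing the two versions on a few inputs. */
void check_swar(void)
{
    const uint64_t v[] = { 0, 1, 0xffffffffULL, 0x80000000ULL,
                           0x7fffffff7fffffffULL, 0xffffffffffffffffULL,
                           0x00000001ffffffffULL, 0x8000000080000000ULL };
    const unsigned n = sizeof v / sizeof v[0];
    for (unsigned i = 0; i < n; i++)
        for (unsigned j = 0; j < n; j++)
            assert(add_v2si_swar(v[i], v[j]) == add_v2si_lanes(v[i], v[j]));
}
```

This needs one 64-bit load per input and one store, against two 32-bit loads, two adds with a memory operand and two stores in the scalar version.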
[Bug target/108724] [11 Regression] Poor codegen when summing two arrays without AVX or SSE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108724

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|tree-optimization           |target
             Target|                            |x86_64-*-* i?86-*-*

--- Comment #9 from Richard Biener ---
And the remaining issue with GCC 11 would be that we fail to account for the
GPR -> XMM move. Or the remaining issue for _all_ branches is that we fail to
realize that emulated "vector" CTORs are even more expensive since we lack a
good way to materialize the CTOR in a GPR (generic RTL expansion fails to
consider using shift + and for example).

Not sure what a good expansion of a V2SImode, V4HImode or V8QImode CTOR to a
GPR DImode reg would look like.
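
As a rough illustration of the "shift + and" idea mentioned above, building a two- or four-element "vector" in a single 64-bit register could look like the C below. This is a sketch only, not a proposal for what the RTL expansion should emit; pack_v2si, pack_v4hi and check_pack are hypothetical names, and the lane order assumes little-endian x86 (element 0 in the low bits).

```c
#include <assert.h>
#include <stdint.h>

/* V2SI-style constructor in a GPR: zero-extend each element (the "and"
   part, expressed here as a cast to uint32_t) and shift element 1 up. */
uint64_t pack_v2si(int32_t e0, int32_t e1)
{
    return ((uint64_t)(uint32_t)e1 << 32) | (uint64_t)(uint32_t)e0;
}

/* V4HI-style constructor: same idea with four 16-bit elements. */
uint64_t pack_v4hi(int16_t e0, int16_t e1, int16_t e2, int16_t e3)
{
    return ((uint64_t)(uint16_t)e3 << 48)
         | ((uint64_t)(uint16_t)e2 << 32)
         | ((uint64_t)(uint16_t)e1 << 16)
         |  (uint64_t)(uint16_t)e0;
}

/* Small self-check of the lane layout, including negative elements. */
void check_pack(void)
{
    assert(pack_v2si(1, 2) == 0x0000000200000001ULL);
    assert(pack_v2si(-1, 2) == 0x00000002ffffffffULL);
    assert(pack_v4hi(1, 2, 3, 4) == 0x0004000300020001ULL);
    assert(pack_v4hi(-1, 0, -1, 0) == 0x0000ffff0000ffffULL);
}
```

Even in this form a V4HI constructor takes three shifts and three ors plus the zero-extensions, which is the kind of cost the comment says is not being accounted for.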