[Bug rtl-optimization/115021] [14/15 regression] unnecessary spill for vpternlog
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115021

--- Comment #5 from Hongtao Liu ---
It's fixed by r15-1100-gec985bc97a0157
[Bug rtl-optimization/115021] [14/15 regression] unnecessary spill for vpternlog
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115021

--- Comment #4 from Hongtao Liu ---
(In reply to Hu Lin from comment #3)
> I found that the compiler allocates memory to the third source register of
> vpternlog in IRA after commit f55cdce3f8dd8503e080e35be59c5f5390f6d95e.
> This causes the generated code to be:
>
>         .cfi_startproc
>         movl    $4, %eax
>         vpsraw  $5, %xmm0, %xmm2
>         vpbroadcastb    %eax, %xmm1
>         movl    $7, %eax
>         vpbroadcastb    %eax, %xmm3
>         vmovdqa %xmm1, %xmm0
>         vpternlogd      $120, %xmm3, %xmm2, %xmm0
>         vmovdqa %xmm3, -24(%rsp)
>         vpsubb  %xmm1, %xmm0, %xmm0
>         ret
>
> And 6a67fdcb3f0cc8be47b49ddd246d0c50c3770800 changes the vector type from
> v16qi to v4si, so the movv4si can't be combined with the vpternlog in
> postreload, and the result is what you see now.

To clarify: the extra spill is caused by r14-4944-gf55cdce3f8dd85;
r14-7026-g6a67fdcb3f0cc8 only causes an extra mov instruction (which is not a
big deal).
[Bug rtl-optimization/115021] [14/15 regression] unnecessary spill for vpternlog
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115021

Hu Lin changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
                 CC|                              |lin1.hu at intel dot com

--- Comment #3 from Hu Lin ---
I found that the compiler allocates memory to the third source register of
vpternlog in IRA after commit f55cdce3f8dd8503e080e35be59c5f5390f6d95e.
This causes the generated code to be:

        .cfi_startproc
        movl    $4, %eax
        vpsraw  $5, %xmm0, %xmm2
        vpbroadcastb    %eax, %xmm1
        movl    $7, %eax
        vpbroadcastb    %eax, %xmm3
        vmovdqa %xmm1, %xmm0
        vpternlogd      $120, %xmm3, %xmm2, %xmm0
        vmovdqa %xmm3, -24(%rsp)
        vpsubb  %xmm1, %xmm0, %xmm0
        ret

And 6a67fdcb3f0cc8be47b49ddd246d0c50c3770800 changes the vector type from
v16qi to v4si, so the movv4si can't be combined with the vpternlog in
postreload, and the result is what you see now.
[Bug rtl-optimization/115021] [14/15 regression] unnecessary spill for vpternlog
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115021

--- Comment #2 from Roger Sayle ---
Here's a reduced test case that should be unaffected by the pending changes
to how V8QI shifts are expanded. Note that the final "t -= t4" is required
to convince the register allocator to "spill".

typedef signed char v16qi __attribute__ ((__vector_size__ (16)));

// sign-extend low 3 bits to a byte.
v16qi foo (v16qi x) {
  v16qi t7 = (v16qi){7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7};
  v16qi t4 = (v16qi){4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4};
  v16qi t = x & t7;
  t ^= t4;
  t -= t4;
  return t;
}

which produces:

foo:    movl    $67372036, %eax
        vmovdqa %xmm0, %xmm2
        vpbroadcastd    %eax, %xmm1
        movl    $117901063, %eax
        vpbroadcastd    %eax, %xmm3
        vmovdqa %xmm1, %xmm0
        vmovdqa %xmm3, -24(%rsp)
        vmovdqa -24(%rsp), %xmm4
        vpternlogd      $120, %xmm2, %xmm4, %xmm0
        vpsubb  %xmm1, %xmm0, %xmm0
        ret
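For readers following the test case, here is a minimal scalar sketch of what one byte lane of foo() computes. It is not part of the bug report: the function names sext3_trick, sext3_reference and sext3_check_all are hypothetical, chosen only for illustration. It shows that masking with 7, XORing with 4 and subtracting 4 sign-extends the low 3 bits, which is why the final "t -= t4" reuses the same constant as the XOR.

```c
#include <stdint.h>

/* Scalar model of one byte lane of foo() from the test case above.
   Hypothetical helper, not from the bug report. */
int8_t sext3_trick(int8_t x)
{
    int8_t t = (int8_t)(x & 7);  /* keep the low 3 bits: 0..7 */
    t = (int8_t)(t ^ 4);         /* flip bit 2, the 3-bit sign bit */
    t = (int8_t)(t - 4);         /* subtract the bias: result is -4..3 */
    return t;
}

/* Reference: interpret the low 3 bits as a two's-complement number. */
int8_t sext3_reference(int8_t x)
{
    int low = x & 7;
    return (int8_t)(low >= 4 ? low - 8 : low);
}

/* Returns 1 if both functions agree on every possible byte value. */
int sext3_check_all(void)
{
    for (int v = -128; v <= 127; v++)
        if (sext3_trick((int8_t)v) != sext3_reference((int8_t)v))
            return 0;
    return 1;
}
```

Calling sext3_check_all() from a small main() exercises all 256 inputs; the vector version applies the same three steps to each of the 16 lanes.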
[Bug rtl-optimization/115021] [14/15 regression] unnecessary spill for vpternlog
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115021

Roger Sayle changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org |roger at nextmovesoftware dot com
   Last reconfirmed|                              |2024-05-10
                 CC|                              |roger at nextmovesoftware dot com
     Ever confirmed|0                             |1
             Status|UNCONFIRMED                   |NEW

--- Comment #1 from Roger Sayle ---
I have a patch for x86 ternlog handling that changes the output for this
testcase (without the pending change to optimize V8QI shifts) to:

foo:    movl    $67372036, %eax
        vpsraw  $5, %xmm0, %xmm0
        vpbroadcastd    %eax, %xmm1
        vpternlogd      $108, .LC0(%rip), %xmm1, %xmm0
        vpsubb  %xmm1, %xmm0, %xmm0
        ret
        .align 16
.LC0:
        .byte   7
        .byte   7
        .byte   7
        .byte   7
        .byte   7
        .byte   7
        .byte   7
        .byte   7
        .byte   7
        .byte   7
        .byte   7
        .byte   7
        .byte   7
        .byte   7
        .byte   7
        .byte   7

which at least doesn't construct the vector with a broadcast, and then
"spill" it to the stack before reading it back from memory. I've no idea
if this is optimal, but it's certainly better than the current "spill".

I'm curious about what has changed to make this code (register allocation)
regress since GCC 13. It was a patch of mine that changed broadcastb to
broadcastd, but that shouldn't have affected reload/register preferencing.
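As a reading aid for the immediates that appear in this thread, the sketch below models a ternary-logic operation bit by bit. It is an illustration only, not code from GCC or from the bug report: ternlog32 and ternlog_check are hypothetical names, and the operand roles (a = destination, b = second source, c = third source, with the imm8 bit selected by index a*4 + b*2 + c) reflect my reading of the instruction's documented behaviour. Under that assumption, $120 (0x78) evaluates a ^ (b & c) and $108 (0x6C) evaluates b ^ (a & c); both are the "XOR with a masked value" step of the test case, with the operands in different positions. The last two checks confirm that the movl constants are the bytes 4 and 7 replicated into a 32-bit word, as needed when broadcastd replaces broadcastb.

```c
#include <stdint.h>

/* Bitwise model of a ternary-logic operation on 32-bit values.
   Hypothetical helper: for each bit position, the three input bits
   form an index (a<<2 | b<<1 | c) that selects one bit of imm. */
uint32_t ternlog32(uint8_t imm, uint32_t a, uint32_t b, uint32_t c)
{
    uint32_t r = 0;
    for (int i = 0; i < 32; i++) {
        unsigned idx = (((a >> i) & 1u) << 2)
                     | (((b >> i) & 1u) << 1)
                     |  ((c >> i) & 1u);
        r |= (uint32_t)((imm >> idx) & 1u) << i;
    }
    return r;
}

/* Returns 1 if the immediates and constants from the thread decode as
   described in the text above. The inputs 0xF0.., 0xCC.., 0xAA.. cover
   all eight (a, b, c) bit combinations in every byte. */
int ternlog_check(void)
{
    const uint32_t a = 0xF0F0F0F0u, b = 0xCCCCCCCCu, c = 0xAAAAAAAAu;

    if (ternlog32(120, a, b, c) != (a ^ (b & c))) return 0;
    if (ternlog32(108, a, b, c) != (b ^ (a & c))) return 0;

    /* Byte 4 and byte 7 replicated across a 32-bit word. */
    if (0x01010101u * 4u != 67372036u)  return 0;
    if (0x01010101u * 7u != 117901063u) return 0;
    return 1;
}
```

In the AT&T syntax used in the listings, the operands are written in reverse order (immediate, third source, second source, destination), so in "vpternlogd $108, .LC0(%rip), %xmm1, %xmm0" the memory constant plays the role of c.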
[Bug rtl-optimization/115021] [14/15 regression] unnecessary spill for vpternlog
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115021

Richard Biener changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
   Target Milestone|---                           |14.2
           Priority|P3                            |P2