https://gcc.gnu.org/bugzilla/show_bug.cgi?id=125357

--- Comment #6 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jakub Jelinek <[email protected]>:

https://gcc.gnu.org/g:4446d3e1045bd3728f8e57ee4af85f7d1b190e4f

commit r17-617-g4446d3e1045bd3728f8e57ee4af85f7d1b190e4f
Author: Jakub Jelinek <[email protected]>
Date:   Wed May 20 08:49:06 2026 +0200

    i386: Use vpaddq + vpermilpd for some non-const permutations [PR125357]

    On Tue, May 19, 2026 at 10:30:16AM +0200, Jakub Jelinek wrote:
    > On Tue, May 19, 2026 at 10:51:37AM +0300, Alexander Monakov wrote:
    > > Thanks for looking at the issue, I really appreciate it. The same problem
    > > exists with 64-bit lanes (V2DF/V2SI modes, we fail to utilize vpermilpd).
    > > The control in that case is in bits 1 and 65 rather than 0 and 64.
    > So, in order to use vpermilpd for
    > __builtin_shuffle (v2di_or_v2df, v2di);
    > one would need to first shift the mask (or vpaddq with itself).
    > Though, that is still shorter than what we emit right now.

    The following seems to work for me.
    -       movl    $1, %eax
    -       vmovq   %rax, %xmm2
    -       vpunpcklqdq     %xmm2, %xmm2, %xmm2
    -       vpand   %xmm2, %xmm1, %xmm1
    -       vpsllq  $3, %xmm1, %xmm1
    -       vpshufb .LC1(%rip), %xmm1, %xmm1
    -       vpaddb  .LC2(%rip), %xmm1, %xmm1
    -       vpshufb %xmm1, %xmm0, %xmm0
    +       vpaddq  %xmm1, %xmm1, %xmm1
    +       vpermilpd       %xmm1, %xmm0, %xmm0
    for both V2DI and V2DF.

    2026-05-20  Jakub Jelinek  <[email protected]>

            PR target/125357
            * config/i386/i386-expand.cc (ix86_expand_vec_perm): For TARGET_AVX
            one_operand_shuffle handle also V2DImode and V2DFmode using vpaddq
            and vpermilpd.

            * gcc.target/i386/avx-pr125357-2.c: New test.
            * gcc.target/i386/avx2-pr125357-2.c: New test.

    Reviewed-by: Hongtao Liu <[email protected]>
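
For readers wondering why adding the mask to itself is enough, here is a minimal scalar sketch in C. It is not code from the patch. It models __builtin_shuffle on two 64-bit lanes as selecting by bit 0 of each mask element (GCC documents that indices are taken modulo the element count), and it models the 128-bit variable form of vpermilpd as selecting by bit 1 of each 64-bit control element, which is the "bits 1 and 65" mentioned above. The names shuffle_ref, vpermilpd_model, shuffle_via_vpermilpd and models_agree are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

/* Scalar model of __builtin_shuffle (v, mask) for two 64-bit lanes.
   Each index is taken modulo 2, so only bit 0 of a mask element matters. */
static void shuffle_ref(const uint64_t v[2], const uint64_t mask[2],
                        uint64_t out[2])
{
    for (int i = 0; i < 2; i++)
        out[i] = v[mask[i] & 1];
}

/* Scalar model of the 128-bit variable vpermilpd (an assumption-level
   simplification): the selector is bit 1 of each 64-bit control element. */
static void vpermilpd_model(const uint64_t v[2], const uint64_t ctrl[2],
                            uint64_t out[2])
{
    for (int i = 0; i < 2; i++)
        out[i] = v[(ctrl[i] >> 1) & 1];
}

/* Models the two-instruction sequence: vpaddq mask, mask (which moves
   bit 0 into bit 1, wrapping modulo 2^64), followed by vpermilpd. */
static void shuffle_via_vpermilpd(const uint64_t v[2], const uint64_t mask[2],
                                  uint64_t out[2])
{
    uint64_t ctrl[2];
    for (int i = 0; i < 2; i++)
        ctrl[i] = mask[i] + mask[i];
    vpermilpd_model(v, ctrl, out);
}

/* Returns 1 when both models pick the same lanes for the mask {m0, m1}. */
int models_agree(uint64_t m0, uint64_t m1)
{
    const uint64_t v[2] = { 0x1111111111111111ULL, 0x2222222222222222ULL };
    const uint64_t mask[2] = { m0, m1 };
    uint64_t a[2], b[2];

    shuffle_ref(v, mask, a);
    shuffle_via_vpermilpd(v, mask, b);
    return a[0] == b[0] && a[1] == b[1];
}
```

The high bits of the mask need no separate masking step in this model, since the doubling discards the top bit and vpermilpd ignores every control bit other than bit 1.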
