[Bug target/80355] Improve __builtin_shuffle on AVX512F
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80355

Jakub Jelinek changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|ASSIGNED                    |RESOLVED

--- Comment #6 from Jakub Jelinek ---
Fixed.
[Bug target/80355] Improve __builtin_shuffle on AVX512F
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80355

--- Comment #5 from CVS Commits ---
The master branch has been updated by Jakub Jelinek:

https://gcc.gnu.org/g:50b5877925ef5ae8e9f913d6d2b5ce0204ebc588

commit r12-2837-g50b5877925ef5ae8e9f913d6d2b5ce0204ebc588
Author: Jakub Jelinek
Date:   Tue Aug 10 12:38:00 2021 +0200

    i386: Allow some V32HImode and V64QImode permutations even without AVX512BW [PR80355]

    When working on the PR, I've noticed we generate terrible code for
    V32HImode or V64QImode permutations for -mavx512f -mno-avx512bw.
    Generally we can't do much with such permutations, but since PR68655
    we can handle at least some, those expressible using V16SImode or
    V8DImode permutations, but that wasn't reachable, because
    ix86_vectorize_vec_perm_const didn't even try, it said without
    TARGET_AVX512BW it can't do anything, and with it can do everything,
    no d.testing_p attempts.

    This patch makes it try it for TARGET_AVX512F && !TARGET_AVX512BW.

    The first hunk is to avoid ICE, expand_vec_perm_even_odd_1 asserts
    d->vmode isn't V32HImode because expand_vec_perm_1 for AVX512BW
    handles already all permutations, but when we let it through without
    !TARGET_AVX512BW, expand_vec_perm_1 doesn't handle it.

    If we want, that hunk can be dropped if we implement in
    expand_vec_perm_even_odd_1 and its helper the even permutation as
    vpmovdw + vpmovdw + vinserti64x4 and odd permutation as
    vpsrld $16 + vpsrld $16 + vpmovdw + vpmovdw + vinserti64x4.

    2021-08-10  Jakub Jelinek

            PR target/80355
            * config/i386/i386-expand.c (expand_vec_perm_even_odd): Return
            false for V32HImode if !TARGET_AVX512BW.
            (ix86_vectorize_vec_perm_const) : If !TARGET_AVX512BW and
            TARGET_AVX512F and d.testing_p, don't fail early, but actually
            check the permutation.

            * gcc.target/i386/avx512f-pr80355-2.c: New test.
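Editorial illustration of "expressible using V16SImode": a 16-bit-element permutation qualifies when it only moves aligned pairs of adjacent elements together, so the same data movement can be described on 32-bit elements. The sketch below is not the contents of avx512f-pr80355-2.c (which is not reproduced in this thread); the typedef and function names are hypothetical, and swapping the 256-bit halves is just one assumed example of such a permutation.

```c
/* 512-bit vectors via the GNU vector extension (names are illustrative). */
typedef short v32hi __attribute__((vector_size(64)));
typedef int   v16si __attribute__((vector_size(64)));

/* Swap the two 256-bit halves, written as a permutation of 32 shorts. */
v32hi swap_halves_v32hi(v32hi x)
{
#if defined(__clang__)
    /* Fallback: Clang, to my knowledge, does not provide __builtin_shuffle. */
    return __builtin_shufflevector(x, x,
        16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31,
         0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15);
#else
    const v32hi sel = {
        16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31,
         0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15 };
    return __builtin_shuffle(x, sel);
#endif
}

/* The same data movement, described on sixteen 32-bit elements instead.
   The casts only reinterpret the 64 bytes; they do not convert values. */
v32hi swap_halves_via_v16si(v32hi x)
{
    v16si y = (v16si) x;
#if defined(__clang__)
    y = __builtin_shufflevector(y, y, 8, 9, 10, 11, 12, 13, 14, 15,
                                0, 1, 2, 3, 4, 5, 6, 7);
#else
    const v16si sel = { 8, 9, 10, 11, 12, 13, 14, 15, 0, 1, 2, 3, 4, 5, 6, 7 };
    y = __builtin_shuffle(y, sel);
#endif
    return (v32hi) y;
}
```

Both functions should return the same vector for any input. Building the first one with something like `gcc -O2 -mavx512f -mno-avx512bw -S` before and after the commit is one way to compare the generated code.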
[Bug target/80355] Improve __builtin_shuffle on AVX512F
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80355

--- Comment #4 from CVS Commits ---
The master branch has been updated by Jakub Jelinek:

https://gcc.gnu.org/g:7665af0b1a964b1baae3a59b22fcc420369c63cf

commit r12-2835-g7665af0b1a964b1baae3a59b22fcc420369c63cf
Author: Jakub Jelinek
Date:   Tue Aug 10 11:34:53 2021 +0200

    i386: Improve single operand AVX512F permutations [PR80355]

    On the following testcase we emit
            vmovdqa32       .LC0(%rip), %zmm1
            vpermd  %zmm0, %zmm1, %zmm0
    and
            vmovdqa64       .LC1(%rip), %zmm1
            vpermq  %zmm0, %zmm1, %zmm0
    instead of
            vshufi32x4      $78, %zmm0, %zmm0, %zmm0
    and
            vshufi64x2      $78, %zmm0, %zmm0, %zmm0
    we can emit with the patch.  We have patterns that match two argument
    permutations for vshuf[if]*, but for one argument it doesn't trigger.
    Either we can add two patterns for that, or we would need to add
    another routine to i386-expand.c that would transform under certain
    condition these cases to the two argument vshuf*, doing it in sse.md
    looked simpler.
    We don't need this for 32-byte vectors, we already emit single insn
    permutation that doesn't need memory op there.

    2021-08-10  Jakub Jelinek

            PR target/80355
            * config/i386/sse.md (*avx512f_shuf_64x2_1_1,
            *avx512f_shuf_32x4_1_1): New define_insn patterns.

            * gcc.target/i386/avx512f-pr80355-1.c: New test.
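Editorial illustration: the testcase itself is not reproduced in this thread, so the following is only a sketch of the kind of single-operand shuffle the commit message describes. The immediate $78 is 0b01001110, which selects 128-bit lanes 2, 3, 0, 1, i.e. it swaps the two 256-bit halves of the register. The typedef and function names are hypothetical.

```c
/* 512-bit vectors via the GNU vector extension (names are illustrative). */
typedef int       v16si __attribute__((vector_size(64)));
typedef long long v8di  __attribute__((vector_size(64)));

/* Swap the 256-bit halves of sixteen 32-bit elements (single operand). */
v16si swap_halves_v16si(v16si x)
{
#if defined(__clang__)
    /* Fallback: Clang, to my knowledge, does not provide __builtin_shuffle. */
    return __builtin_shufflevector(x, x, 8, 9, 10, 11, 12, 13, 14, 15,
                                   0, 1, 2, 3, 4, 5, 6, 7);
#else
    const v16si sel = { 8, 9, 10, 11, 12, 13, 14, 15, 0, 1, 2, 3, 4, 5, 6, 7 };
    return __builtin_shuffle(x, sel);
#endif
}

/* Swap the 256-bit halves of eight 64-bit elements (single operand). */
v8di swap_halves_v8di(v8di x)
{
#if defined(__clang__)
    return __builtin_shufflevector(x, x, 4, 5, 6, 7, 0, 1, 2, 3);
#else
    const v8di sel = { 4, 5, 6, 7, 0, 1, 2, 3 };
    return __builtin_shuffle(x, sel);
#endif
}
```

Compiling with something like `gcc -O2 -mavx512f -S` and reading the assembly shows which form is chosen; per the commit message, the expectation after the patch is a single vshufi32x4 or vshufi64x2 with no constant load from memory.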
[Bug target/80355] Improve __builtin_shuffle on AVX512F
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80355

--- Comment #3 from Jakub Jelinek ---
Created attachment 51278
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51278&action=edit
gcc12-pr80355-2.patch

And this incremental patch makes it handle even similar V32HImode/V64QImode
permutations with -mavx512f -mno-avx512bw.
[Bug target/80355] Improve __builtin_shuffle on AVX512F
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80355

--- Comment #2 from Jakub Jelinek ---
Created attachment 51277
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51277&action=edit
gcc12-pr80355-1.patch

Untested fix.
For 32-byte vectors/AVX512VL we don't need this, we already emit vperm2i128 or
vpermq.
[Bug target/80355] Improve __builtin_shuffle on AVX512F
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80355

Jakub Jelinek changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org |jakub at gcc dot gnu.org
             Status|NEW                         |ASSIGNED
[Bug target/80355] Improve __builtin_shuffle on AVX512F
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80355

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2021-08-07
             Status|UNCONFIRMED                 |NEW
           Severity|normal                      |enhancement
     Ever confirmed|0                           |1

--- Comment #1 from Andrew Pinski ---
Confirmed.