[Bug target/95905] Failure to optimize _mm_unpacklo_epi8 with 0 as right operand to _mm_cvtepu8_epi16
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95905

--- Comment #2 from CVS Commits ---
The master branch has been updated by Jakub Jelinek:

https://gcc.gnu.org/g:b668a06e37f72fd96bacd6769990ec97dac4ac6d

commit r11-6628-gb668a06e37f72fd96bacd6769990ec97dac4ac6d
Author: Jakub Jelinek
Date:   Wed Jan 13 08:02:54 2021 +0100

    i386: Optimize _mm_unpacklo_epi8 of 0 vector as second argument or similar VEC_PERM_EXPRs into pmovzx [PR95905]

    The following patch adds patterns (so far 128-bit only) for permutations
    like { 0 16 1 17 2 18 3 19 4 20 5 21 6 22 7 23 } where the second
    operand is CONST0_RTX CONST_VECTOR to be emitted as pmovzx.

    2021-01-13  Jakub Jelinek

            PR target/95905
            * config/i386/predicates.md (pmovzx_parallel): New predicate.
            * config/i386/sse.md (*sse4_1_zero_extendv8qiv8hi2_3): New
            define_insn_and_split pattern.
            (*sse4_1_zero_extendv4hiv4si2_3): Likewise.
            (*sse4_1_zero_extendv2siv2di2_3): Likewise.

            * gcc.target/i386/pr95905-1.c: New test.
            * gcc.target/i386/pr95905-2.c: New test.
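
For readers who want to see why the permutation { 0 16 1 17 ... 7 23 } with a zero second operand amounts to a byte-to-word zero extension, here is a minimal sketch using GCC's generic vector extensions and __builtin_shuffle. The type and function names (v16qi, interleave_low_with_zero, shuffle_matches_zero_extension) are made up for illustration and are not taken from the patch or its tests, and the check assumes a little-endian target such as x86.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 128-bit vector of 16 unsigned bytes (GCC vector extension). */
typedef uint8_t v16qi __attribute__((vector_size(16)));

/* Selector indices 0..15 pick bytes of x, 16..31 pick bytes of the zero
   vector, so the result is x[0], 0, x[1], 0, ..., x[7], 0.  */
static v16qi interleave_low_with_zero(v16qi x)
{
    const v16qi zero = { 0 };
    const v16qi sel = { 0, 16, 1, 17, 2, 18, 3, 19,
                        4, 20, 5, 21, 6, 22, 7, 23 };
    return __builtin_shuffle(x, zero, sel);
}

/* Returns 1 if the shuffled bytes, reread as eight 16-bit lanes, equal the
   zero-extended low eight input bytes.  Assumes little-endian lane order.  */
static int shuffle_matches_zero_extension(void)
{
    uint8_t in[16];
    uint16_t out[8];
    v16qi x, r;

    assert(sizeof(v16qi) == 16);
    for (int i = 0; i < 16; i++)
        in[i] = (uint8_t)(0xf0 + i);   /* high bit set, so sign extension would differ */

    memcpy(&x, in, sizeof x);
    r = interleave_low_with_zero(x);
    memcpy(out, &r, sizeof out);

    for (int i = 0; i < 8; i++)
        if (out[i] != (uint16_t)in[i])
            return 0;
    return 1;
}
```

Calling shuffle_matches_zero_extension() should return 1 on a little-endian target. Whether a particular compiler version turns the shuffle into a single pmovzxbw depends on the target flags (SSE4.1 or later) and is not something this sketch checks.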
[Bug target/95905] Failure to optimize _mm_unpacklo_epi8 with 0 as right operand to _mm_cvtepu8_epi16
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95905

--- Comment #3 from CVS Commits ---
The master branch has been updated by Jakub Jelinek:

https://gcc.gnu.org/g:b1d1e2b54c6b9cf13f021176ba37d24cc4dc2fe1

commit r11-6636-gb1d1e2b54c6b9cf13f021176ba37d24cc4dc2fe1
Author: Jakub Jelinek
Date:   Wed Jan 13 11:28:48 2021 +0100

    i386, expand: Optimize also 256-bit and 512-bit permutations as vpmovzx if possible [PR95905]

    The following patch implements what I've talked about, i.e. to no longer
    force operands of vec_perm_const into registers in the generic code, but let
    each of the (currently 8) targets force it into registers individually,
    giving the targets better control on if it does that and when and allowing
    them to do something special with some particular operands.
    And then defines the define_insn_and_split for the 256-bit and 512-bit
    permutations into vpmovzx* (only the bw, wd and dq cases, in theory
    we could add define_insn_and_split patterns also for the bd, bq and wq).

    2021-01-13  Jakub Jelinek

            PR target/95905
            * optabs.c (expand_vec_perm_const): Don't force v0 and v1 into
            registers before calling targetm.vectorize.vec_perm_const, only after
            that.
            * config/i386/i386-expand.c (ix86_vectorize_vec_perm_const): Handle
            two argument permutation when one operand is zero vector and only
            after that force operands into registers.
            * config/i386/sse.md (*avx2_zero_extendv16qiv16hi2_1): New
            define_insn_and_split pattern.
            (*avx512bw_zero_extendv32qiv32hi2_1): Likewise.
            (*avx512f_zero_extendv16hiv16si2_1): Likewise.
            (*avx2_zero_extendv8hiv8si2_1): Likewise.
            (*avx512f_zero_extendv8siv8di2_1): Likewise.
            (*avx2_zero_extendv4siv4di2_1): Likewise.
            * config/mips/mips.c (mips_vectorize_vec_perm_const): Force operands
            into registers.
            * config/arm/arm.c (arm_vectorize_vec_perm_const): Likewise.
            * config/sparc/sparc.c (sparc_vectorize_vec_perm_const): Likewise.
            * config/ia64/ia64.c (ia64_vectorize_vec_perm_const): Likewise.
            * config/aarch64/aarch64.c (aarch64_vectorize_vec_perm_const): Likewise.
            * config/rs6000/rs6000.c (rs6000_vectorize_vec_perm_const): Likewise.
            * config/gcn/gcn.c (gcn_vectorize_vec_perm_const): Likewise.  Use
            std::swap.

            * gcc.target/i386/pr95905-2.c: Use scan-assembler-times instead of
            scan-assembler.  Add tests with zero vector as first __builtin_shuffle
            operand.
            * gcc.target/i386/pr95905-3.c: New test.
            * gcc.target/i386/pr95905-4.c: New test.
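
As an illustration of the wider case and of the zero vector appearing as either __builtin_shuffle operand, the following is a rough sketch of a 256-bit word-to-dword permutation written with generic vectors. The names (v16hi, wide_shuffle_matches_zero_extension) and the selector constants are assumptions made for this example, not the contents of pr95905-3.c or pr95905-4.c, and the comparison again assumes a little-endian target.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 256-bit vector of 16 unsigned 16-bit lanes.  GCC lowers this
   to narrower operations when AVX2 is not enabled.  */
typedef uint16_t v16hi __attribute__((vector_size(32)));

/* Returns 1 if both operand orders produce the zero-extended low 8 words.  */
static int wide_shuffle_matches_zero_extension(void)
{
    uint16_t in[16];
    uint32_t out_a[8], out_b[8];
    v16hi x, a, b;
    const v16hi zero = { 0 };
    /* Zero vector as second operand: indices 16..31 select zero lanes.  */
    const v16hi sel_zero_second = { 0, 16, 1, 17, 2, 18, 3, 19,
                                    4, 20, 5, 21, 6, 22, 7, 23 };
    /* Zero vector as first operand: indices 0..15 select zero lanes.  */
    const v16hi sel_zero_first = { 16, 0, 17, 1, 18, 2, 19, 3,
                                   20, 4, 21, 5, 22, 6, 23, 7 };

    assert(sizeof(v16hi) == 32);
    for (int i = 0; i < 16; i++)
        in[i] = (uint16_t)(0xfff0u + i);

    memcpy(&x, in, sizeof x);
    a = __builtin_shuffle(x, zero, sel_zero_second);
    b = __builtin_shuffle(zero, x, sel_zero_first);
    memcpy(out_a, &a, sizeof out_a);
    memcpy(out_b, &b, sizeof out_b);

    /* Each 32-bit lane should hold one zero-extended input word.  */
    for (int i = 0; i < 8; i++)
        if (out_a[i] != (uint32_t)in[i] || out_b[i] != (uint32_t)in[i])
            return 0;
    return 1;
}
```

Both shuffles describe the same zero extension, which is the operation vpmovzxwd performs on a 256-bit destination. The sketch only checks the values; seeing the instruction itself would require compiling with AVX2 enabled and inspecting the assembly.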
[Bug target/95905] Failure to optimize _mm_unpacklo_epi8 with 0 as right operand to _mm_cvtepu8_epi16
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95905

Jakub Jelinek changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
                 CC|                            |jakub at gcc dot gnu.org
             Status|UNCONFIRMED                 |RESOLVED

--- Comment #4 from Jakub Jelinek ---
Fixed.
[Bug target/95905] Failure to optimize _mm_unpacklo_epi8 with 0 as right operand to _mm_cvtepu8_epi16
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95905

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |11.0
[Bug target/95905] Failure to optimize _mm_unpacklo_epi8 with 0 as right operand to _mm_cvtepu8_epi16
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95905

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |linux at carewolf dot com

--- Comment #5 from Andrew Pinski ---
*** Bug 78563 has been marked as a duplicate of this bug. ***
[Bug target/95905] Failure to optimize _mm_unpacklo_epi8 with 0 as right operand to _mm_cvtepu8_epi16
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95905

--- Comment #1 from Gabriel Ravier ---
The same pattern with _mm_unpacklo_epi16/32 and the corresponding SSE4
intrinsics can also be optimized in the same way.
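
To make the equivalence in this comment concrete, here is a small sketch that checks _mm_unpacklo_epi16 and _mm_unpacklo_epi32 against a zero right operand by comparing the result with a scalar zero extension. The SSE4.1 counterparts would be _mm_cvtepu16_epi32 and _mm_cvtepu32_epi64; the sketch deliberately uses only SSE2 intrinsics so it builds without -msse4.1. The function names are hypothetical and the code assumes an x86 target.

```c
#include <assert.h>
#include <emmintrin.h>   /* SSE2: _mm_unpacklo_epi16, _mm_unpacklo_epi32 */
#include <stdint.h>

/* Returns 1 if unpacklo_epi16 with zero equals 16->32-bit zero extension
   of the low four words.  */
static int unpacklo_epi16_zero_extends(void)
{
    const uint16_t in[8] = { 1, 0x8000, 0xffff, 42, 5, 6, 7, 8 };
    uint32_t out[4];
    __m128i x = _mm_loadu_si128((const __m128i *)in);
    __m128i r = _mm_unpacklo_epi16(x, _mm_setzero_si128());

    _mm_storeu_si128((__m128i *)out, r);
    for (int i = 0; i < 4; i++)
        if (out[i] != (uint32_t)in[i])
            return 0;
    return 1;
}

/* Returns 1 if unpacklo_epi32 with zero equals 32->64-bit zero extension
   of the low two dwords.  */
static int unpacklo_epi32_zero_extends(void)
{
    const uint32_t in[4] = { 0x80000000u, 0xffffffffu, 3, 4 };
    uint64_t out[2];
    __m128i x = _mm_loadu_si128((const __m128i *)in);
    __m128i r = _mm_unpacklo_epi32(x, _mm_setzero_si128());

    _mm_storeu_si128((__m128i *)out, r);
    for (int i = 0; i < 2; i++)
        if (out[i] != (uint64_t)in[i])
            return 0;
    return 1;
}
```

Both helpers should return 1 on x86. The inputs include values with the top bit set so that a sign extension would be caught.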