[Bug target/95905] Failure to optimize _mm_unpacklo_epi8 with 0 as right operand to _mm_cvtepu8_epi16

2021-08-21 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95905

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |linux at carewolf dot com

--- Comment #5 from Andrew Pinski  ---
*** Bug 78563 has been marked as a duplicate of this bug. ***

[Bug target/95905] Failure to optimize _mm_unpacklo_epi8 with 0 as right operand to _mm_cvtepu8_epi16

2021-08-21 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95905

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |11.0

[Bug target/95905] Failure to optimize _mm_unpacklo_epi8 with 0 as right operand to _mm_cvtepu8_epi16

2021-01-13 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95905

Jakub Jelinek  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
                 CC|                            |jakub at gcc dot gnu.org
             Status|UNCONFIRMED                 |RESOLVED

--- Comment #4 from Jakub Jelinek  ---
Fixed.

[Bug target/95905] Failure to optimize _mm_unpacklo_epi8 with 0 as right operand to _mm_cvtepu8_epi16

2021-01-13 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95905

--- Comment #3 from CVS Commits  ---
The master branch has been updated by Jakub Jelinek :

https://gcc.gnu.org/g:b1d1e2b54c6b9cf13f021176ba37d24cc4dc2fe1

commit r11-6636-gb1d1e2b54c6b9cf13f021176ba37d24cc4dc2fe1
Author: Jakub Jelinek 
Date:   Wed Jan 13 11:28:48 2021 +0100

i386, expand: Optimize also 256-bit and 512-bit permutations as vpmovzx
if possible [PR95905]

The following patch implements what I've talked about, i.e. to no longer
force operands of vec_perm_const into registers in the generic code, but let
each of the (currently 8) targets force them into registers individually,
giving the targets better control over whether and when that happens and
allowing them to do something special with some particular operands.
It then defines define_insn_and_split patterns for the 256-bit and 512-bit
permutations into vpmovzx* (only the bw, wd and dq cases; in theory we could
add define_insn_and_split patterns also for the bd, bq and wq cases).
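
To illustrate the 256-bit byte-to-word case, here is a scalar C sketch. It
assumes the permutation of interest is the cross-lane interleave
{ 0 32 1 33 ... 15 47 } of a 32-byte vector with a zero vector, which is
what zero-extending the low 16 bytes to 16-bit elements produces on a
little-endian target. All helper names are made up for illustration; this is
a model of the data movement, not code from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NBYTES 32  /* bytes in one 256-bit vector */

/* Build the selector { 0, 32, 1, 33, ..., 15, 47 }.  */
static void build_zext_selector(uint8_t sel[NBYTES])
{
    for (int i = 0; i < NBYTES / 2; i++) {
        sel[2 * i] = (uint8_t)i;                 /* byte i of the data  */
        sel[2 * i + 1] = (uint8_t)(NBYTES + i);  /* byte i of the zeros */
    }
}

/* Two-operand byte permutation: selector values 0..31 index a,
   32..63 index b.  */
static void perm_v32qi(uint8_t out[NBYTES], const uint8_t a[NBYTES],
                       const uint8_t b[NBYTES], const uint8_t sel[NBYTES])
{
    for (int i = 0; i < NBYTES; i++) {
        assert(sel[i] < 2 * NBYTES);
        out[i] = sel[i] < NBYTES ? a[sel[i]] : b[sel[i] - NBYTES];
    }
}

/* Return 1 if permuting a with a zero vector equals zero-extending the
   low 16 bytes of a to 16-bit little-endian elements.  */
static int wide_permute_is_zext(const uint8_t a[NBYTES])
{
    uint8_t sel[NBYTES], p[NBYTES], expect[NBYTES];
    const uint8_t zero[NBYTES] = {0};

    build_zext_selector(sel);
    perm_v32qi(p, a, zero, sel);

    memset(expect, 0, sizeof expect);
    for (int i = 0; i < NBYTES / 2; i++)
        expect[2 * i] = a[i];  /* low byte is data, high byte stays 0 */
    return memcmp(p, expect, NBYTES) == 0;
}
```

Calling wide_permute_is_zext on any 32-byte input should return 1; only the
low 16 bytes of the input contribute to the result.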

2021-01-13  Jakub Jelinek  

PR target/95905
* optabs.c (expand_vec_perm_const): Don't force v0 and v1 into
registers before calling targetm.vectorize.vec_perm_const, only after
that.
* config/i386/i386-expand.c (ix86_vectorize_vec_perm_const): Handle
two argument permutation when one operand is zero vector and only
after that force operands into registers.
* config/i386/sse.md (*avx2_zero_extendv16qiv16hi2_1): New
define_insn_and_split pattern.
(*avx512bw_zero_extendv32qiv32hi2_1): Likewise.
(*avx512f_zero_extendv16hiv16si2_1): Likewise.
(*avx2_zero_extendv8hiv8si2_1): Likewise.
(*avx512f_zero_extendv8siv8di2_1): Likewise.
(*avx2_zero_extendv4siv4di2_1): Likewise.
* config/mips/mips.c (mips_vectorize_vec_perm_const): Force operands
into registers.
* config/arm/arm.c (arm_vectorize_vec_perm_const): Likewise.
* config/sparc/sparc.c (sparc_vectorize_vec_perm_const): Likewise.
* config/ia64/ia64.c (ia64_vectorize_vec_perm_const): Likewise.
* config/aarch64/aarch64.c (aarch64_vectorize_vec_perm_const):
Likewise.
* config/rs6000/rs6000.c (rs6000_vectorize_vec_perm_const):
Likewise.
* config/gcn/gcn.c (gcn_vectorize_vec_perm_const): Likewise.  Use
std::swap.

* gcc.target/i386/pr95905-2.c: Use scan-assembler-times instead of
scan-assembler.  Add tests with zero vector as first __builtin_shuffle
operand.
* gcc.target/i386/pr95905-3.c: New test.
* gcc.target/i386/pr95905-4.c: New test.

[Bug target/95905] Failure to optimize _mm_unpacklo_epi8 with 0 as right operand to _mm_cvtepu8_epi16

2021-01-12 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95905

--- Comment #2 from CVS Commits  ---
The master branch has been updated by Jakub Jelinek :

https://gcc.gnu.org/g:b668a06e37f72fd96bacd6769990ec97dac4ac6d

commit r11-6628-gb668a06e37f72fd96bacd6769990ec97dac4ac6d
Author: Jakub Jelinek 
Date:   Wed Jan 13 08:02:54 2021 +0100

i386: Optimize _mm_unpacklo_epi8 of 0 vector as second argument or similar
VEC_PERM_EXPRs into pmovzx [PR95905]

The following patch adds patterns (so far 128-bit only) for permutations
like { 0 16 1 17 2 18 3 19 4 20 5 21 6 22 7 23 } where the second
operand is CONST0_RTX CONST_VECTOR to be emitted as pmovzx.
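
As a rough illustration of why such a permutation is the same thing as a
zero extension, the following scalar C sketch models a two-operand byte
permutation on 16-byte vectors and compares it with widening the low eight
bytes to 16-bit little-endian elements. The function names are hypothetical
and the code only models the semantics; it is not taken from the patch or
its tests.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of a two-operand byte permutation on 128-bit vectors:
   selector values 0..15 pick from a, 16..31 pick from b.  */
static void perm_v16qi(uint8_t out[16], const uint8_t a[16],
                       const uint8_t b[16], const uint8_t sel[16])
{
    for (size_t i = 0; i < 16; i++) {
        assert(sel[i] < 32);
        out[i] = sel[i] < 16 ? a[sel[i]] : b[sel[i] - 16];
    }
}

/* Scalar model of zero-extending the low 8 bytes of a to eight 16-bit
   elements, stored little-endian as on x86.  */
static void zext_v8qi_v8hi(uint8_t out[16], const uint8_t a[16])
{
    for (size_t i = 0; i < 8; i++) {
        uint16_t wide = a[i];
        out[2 * i] = (uint8_t)(wide & 0xff);
        out[2 * i + 1] = (uint8_t)(wide >> 8);
    }
}

/* Return 1 if the { 0 16 1 17 ... 7 23 } permutation of a with a zero
   vector gives the same bytes as the zero extension.  */
static int interleave_with_zero_is_zext(const uint8_t a[16])
{
    static const uint8_t sel[16] = { 0, 16, 1, 17, 2, 18, 3, 19,
                                     4, 20, 5, 21, 6, 22, 7, 23 };
    const uint8_t zero[16] = {0};
    uint8_t p[16], z[16];

    perm_v16qi(p, a, zero, sel);
    zext_v8qi_v8hi(z, a);
    return memcmp(p, z, sizeof p) == 0;
}
```

The check should hold for every input, because each even result byte is a
data byte and each odd result byte comes from the zero operand.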

2021-01-13  Jakub Jelinek  

PR target/95905
* config/i386/predicates.md (pmovzx_parallel): New predicate.
* config/i386/sse.md (*sse4_1_zero_extendv8qiv8hi2_3): New
define_insn_and_split pattern.
(*sse4_1_zero_extendv4hiv4si2_3): Likewise.
(*sse4_1_zero_extendv2siv2di2_3): Likewise.

* gcc.target/i386/pr95905-1.c: New test.
* gcc.target/i386/pr95905-2.c: New test.

[Bug target/95905] Failure to optimize _mm_unpacklo_epi8 with 0 as right operand to _mm_cvtepu8_epi16

2020-06-25 Thread gabravier at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95905

--- Comment #1 from Gabriel Ravier  ---
The same pattern with _mm_unpacklo_epi16/32 and the corresponding SSE4
intrinsics can also be optimized in the same way.
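
A small scalar C sketch of that observation, generalized over the element
width: interleaving the low half of a 16-byte vector with zeros gives the
same bytes as zero-extending each low element to twice its width. Widths of
1, 2 and 4 bytes are meant to correspond to _mm_unpacklo_epi8/16/32 paired
with _mm_cvtepu8_epi16, _mm_cvtepu16_epi32 and _mm_cvtepu32_epi64. The
helper names are invented for this example, and little-endian element
layout is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Model of a low-half interleave on 16-byte vectors with elements of
   `width` bytes: result is a0 b0 a1 b1 ...  */
static void unpacklo_model(uint8_t out[16], const uint8_t a[16],
                           const uint8_t b[16], size_t width)
{
    assert(width == 1 || width == 2 || width == 4);
    size_t n = 8 / width;  /* elements taken from each operand */
    for (size_t i = 0; i < n; i++) {
        memcpy(out + (2 * i) * width, a + i * width, width);
        memcpy(out + (2 * i + 1) * width, b + i * width, width);
    }
}

/* Model of zero-extending the low 8/width elements of a to 2*width bytes
   each, little-endian.  */
static void zext_model(uint8_t out[16], const uint8_t a[16], size_t width)
{
    assert(width == 1 || width == 2 || width == 4);
    memset(out, 0, 16);
    for (size_t i = 0; i < 8 / width; i++)
        memcpy(out + i * 2 * width, a + i * width, width);
}

/* Return 1 if interleaving a with zeros equals the zero extension.  */
static int unpacklo_zero_is_zext(const uint8_t a[16], size_t width)
{
    const uint8_t zero[16] = {0};
    uint8_t u[16], z[16];

    unpacklo_model(u, a, zero, width);
    zext_model(z, a, width);
    return memcmp(u, z, sizeof u) == 0;
}
```

For any input, unpacklo_zero_is_zext should return 1 for all three widths.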