[Bug target/107548] STV doesn't consider vec_select

2022-12-26 Thread roger at nextmovesoftware dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107548

Roger Sayle changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |FIXED
                 CC|                            |roger at nextmovesoftware dot com
   Target Milestone|---                         |13.0

--- Comment #3 from Roger Sayle ---
This should now be fixed/implemented on mainline.

[Bug target/107548] STV doesn't consider vec_select

2022-12-24 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107548

--- Comment #2 from CVS Commits ---
The master branch has been updated by Roger Sayle:

https://gcc.gnu.org/g:3cf6d0e1830231dd47740e66926499db600b9ae4

commit r13-4886-g3cf6d0e1830231dd47740e66926499db600b9ae4
Author: Roger Sayle 
Date:   Sat Dec 24 22:07:11 2022 +

[Committed] Tweak new gcc.target/i386/pr107548-1.c for -march=cascadelake.

My recently added testcases gcc.target/i386/pr107548-[12].c need to be
tweaked slightly for -march=cascadelake.  Committed as obvious.

2022-12-24  Roger Sayle  

gcc/testsuite/ChangeLog
PR target/107548
* gcc.target/i386/pr107548-1.c: Match both vmovd and movd.
* gcc.target/i386/pr107548-2.c: Match both vpaddq and paddq.

[Bug target/107548] STV doesn't consider vec_select

2022-12-23 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107548

--- Comment #1 from CVS Commits ---
The master branch has been updated by Roger Sayle:

https://gcc.gnu.org/g:0b2c1369d035e92847cca81fd9f7b4e9ab9da710

commit r13-4873-g0b2c1369d035e92847cca81fd9f7b4e9ab9da710
Author: Roger Sayle 
Date:   Fri Dec 23 09:56:30 2022 +

PR target/107548: Handle vec_select in STV on x86.

This patch enhances x86's STV pass to handle VEC_SELECT during general
scalar chain conversion, performing SImode scalar extraction from V4SI
and DImode scalar extraction from V2DI in vector registers.

The motivating test case from bugzilla is:

typedef unsigned int v4si __attribute__((vector_size(16)));

unsigned int f (v4si a, v4si b)
{
  a[0] += b[0];
  return a[0] + a[1];
}

currently with -O2 -march=znver2 this generates:

vpextrd $1, %xmm0, %edx
vmovd   %xmm0, %eax
addl    %edx, %eax
vmovd   %xmm1, %edx
addl    %edx, %eax
ret

which performs three transfers from the vector unit to the scalar unit,
and performs the two additions there.  With this patch, we now generate:

vmovdqa %xmm0, %xmm2
vpshufd $85, %xmm0, %xmm0
vpaddd  %xmm0, %xmm2, %xmm0
vpaddd  %xmm1, %xmm0, %xmm0
vmovd   %xmm0, %eax
ret

which performs the two additions in the vector unit, and then transfers
the result to the scalar unit.  Technically the (cheap) movdqa isn't
needed with better register allocation (or this could be cleaned up
during peephole2), but even so this transform is still a win.
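
To make the equivalence concrete, here is a minimal sketch in C using GCC's generic vector extensions. It is not part of the patch; the names f_scalar and f_vector are made up for illustration. f_scalar is the motivating test case, and f_vector models the new instruction sequence: broadcast element 1 (what vpshufd $85 does), perform both additions on whole vectors (the two vpaddd), then extract lane 0 (the final vmovd).

```c
typedef unsigned int v4si __attribute__((vector_size(16)));

/* The motivating test case: all arithmetic written on scalars. */
unsigned int f_scalar(v4si a, v4si b)
{
  a[0] += b[0];
  return a[0] + a[1];
}

/* Hypothetical model of the code generated after the patch:
   the additions happen on vectors and only lane 0 is read back. */
unsigned int f_vector(v4si a, v4si b)
{
  /* Models vpshufd $85: every lane holds a[1]. */
  v4si a1 = { a[1], a[1], a[1], a[1] };
  v4si sum = a + a1;   /* models the first vpaddd: lane 0 is a[0] + a[1] */
  sum = sum + b;       /* models the second vpaddd: lane 0 also adds b[0] */
  return sum[0];       /* models vmovd: a single vector-to-scalar transfer */
}
```

Only lane 0 of the result is meaningful; the other lanes hold values that are never read, which is why doing the work in the vector unit is safe here.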

2022-12-23  Roger Sayle  

gcc/ChangeLog
PR target/107548
* config/i386/i386-features.cc (scalar_chain::add_insn): The
operands of a VEC_SELECT don't need to be added to the scalar chain.
(general_scalar_chain::compute_convert_gain):
Provide gains for performing STV on a VEC_SELECT.
(general_scalar_chain::convert_insn): Convert VEC_SELECT to pshufd,
psrldq or no-op.
(general_scalar_to_vector_candidate_p): Handle VEC_SELECT of a
single element from a vector register to a scalar register.

gcc/testsuite/ChangeLog
PR target/107548
* gcc.target/i386/pr107548-1.c: New test V4SI case.
* gcc.target/i386/pr107548-2.c: New test V2DI case.
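
For readers wondering what the V2DI case looks like, the following is a sketch of a 64-bit analogue of the motivating example. It is an assumption for illustration, not the contents of pr107548-2.c, and the names g_scalar and g_vector are hypothetical. g_vector models moving element 1 down to lane 0 (the role psrldq would play for V2DI) before adding on whole vectors.

```c
typedef unsigned long long v2di __attribute__((vector_size(16)));

/* Hypothetical DImode analogue of the V4SI test case. */
unsigned long long g_scalar(v2di a, v2di b)
{
  a[0] += b[0];
  return a[0] + a[1];
}

/* Models the vector form: element 1 is shifted into lane 0,
   both additions are vector additions, and lane 0 is extracted. */
unsigned long long g_vector(v2di a, v2di b)
{
  v2di hi = { a[1], 0 };  /* models a byte shift right by 8 (psrldq $8) */
  v2di sum = a + hi;      /* lane 0 is a[0] + a[1] */
  sum = sum + b;          /* lane 0 also adds b[0] */
  return sum[0];          /* single DImode extraction from the vector */
}
```

Compiling both functions with -O2 on a build that includes this patch and comparing the assembly is one way to check whether STV converts the scalar form; the exact output depends on the -march setting.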