[Bug target/104124] Poor optimization for vector splat DW with small consts
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104124

HaoChen Gui changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|UNCONFIRMED                 |RESOLVED

--- Comment #6 from HaoChen Gui ---
fixed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104124

--- Comment #5 from Steven Munroe ---
Thanks
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104124

--- Comment #4 from CVS Commits ---
The master branch has been updated by HaoChen Gui:

https://gcc.gnu.org/g:f3d87219dd502d5c11608ffb83fbe66c79baf784

commit r14-2153-gf3d87219dd502d5c11608ffb83fbe66c79baf784
Author: Haochen Gui
Date:   Wed Jun 28 16:30:44 2023 +0800

    rs6000: Splat vector small V2DI constants with vspltisw and vupkhsw

    This patch adds a new insn for vector splat with small V2DI constants
    on P8. If the value of constant is in RANGE (-16, 15) but not 0 or -1,
    it can be loaded with vspltisw and vupkhsw on P8.

    gcc/
            PR target/104124
            * config/rs6000/altivec.md (*altivec_vupkhs_direct): Rename
            to...
            (altivec_vupkhs_direct): ...this.
            * config/rs6000/predicates.md (vspltisw_vupkhsw_constant_split):
            New predicate to test if a constant can be loaded with vspltisw
            and vupkhsw.
            (easy_vector_constant): Call vspltisw_vupkhsw_constant_p to
            Check if a vector constant can be synthesized with a vspltisw
            and a vupkhsw.
            * config/rs6000/rs6000-protos.h (vspltisw_vupkhsw_constant_p):
            Declare.
            * config/rs6000/rs6000.cc (vspltisw_vupkhsw_constant_p): New
            function to return true if OP mode is V2DI and can be
            synthesized with vupkhsw and vspltisw.
            * config/rs6000/vsx.md (*vspltisw_v2di_split): New insn to load
            up constants with vspltisw and vupkhsw.

    gcc/testsuite/
            PR target/104124
            * gcc.target/powerpc/pr104124.c: New.
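The condition stated in the commit message can be sketched in portable C. This is only an illustrative model, not the code that was added to rs6000.cc: the names small_v2di_splat_ok, model_vspltisw and model_vupkhsw are hypothetical, and the reason given for excluding 0 and -1 is an assumption drawn from comment #3, which notes that those two values can already be loaded with a single instruction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the range test from the commit message: the
   constant must fit the 5-bit signed immediate of vspltisw (-16..15)
   and must not be 0 or -1 (assumed to be handled by cheaper sequences). */
static bool
small_v2di_splat_ok (int64_t value)
{
  if (value == 0 || value == -1)
    return false;
  return value >= -16 && value <= 15;
}

/* Model of vspltisw: replicate a 5-bit signed immediate into four words. */
static void
model_vspltisw (int32_t out[4], int imm)
{
  assert (imm >= -16 && imm <= 15);
  for (int i = 0; i < 4; i++)
    out[i] = imm;
}

/* Model of vupkhsw: sign-extend two of the four words to doublewords.
   All four words are equal after the splat, so which pair is picked
   does not change the result in this model. */
static void
model_vupkhsw (int64_t out[2], const int32_t in[4])
{
  out[0] = (int64_t) in[0];
  out[1] = (int64_t) in[1];
}
```

Calling model_vspltisw and then model_vupkhsw with an immediate such as -7 leaves both doublewords holding -7, which is the two-instruction synthesis the patch describes.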
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104124

--- Comment #3 from Michael Meissner ---
There are two things going on.

1) There is no vspltisd instruction, so we can't generate a single instruction
to load constants other than 0 or -1. Unfortunately, this was not added in
either power9 or power10.

2) On power9 and power10 we have the xxspltib and vextsb2d instructions, and
we generate those if -mcpu=power9.

To add support for new types of constants, the procedure is:

1) You need to modify easy_altivec_constant and gen_altivec_constant in
rs6000.c (or rs6000.cc in GCC 12). Then add new predicates in predicates.md
for these new patterns.

2) Look for the predicates "easy_vector_constant_add_self" and so forth in
predicates.md and add a new predicate there.

3) Then in altivec.md, look for the define_splits that use the various
easy_vector_const_* functions and add a new pattern.
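For comparison, the power9 sequence mentioned in point 2 of the first list can be modelled the same way. This is a rough sketch in portable C rather than GCC code: the function names are hypothetical, and the byte numbering (index 0 as the most significant byte) is an assumption made only for this model.

```c
#include <assert.h>
#include <stdint.h>

/* Model of xxspltib: replicate one 8-bit immediate into all 16 bytes. */
static void
model_xxspltib (uint8_t out[16], uint8_t imm8)
{
  for (int i = 0; i < 16; i++)
    out[i] = imm8;
}

/* Model of vextsb2d: sign-extend the low-order byte of each doubleword.
   With big-endian style indexing those bytes sit at positions 7 and 15.
   The uint8_t -> int8_t conversion relies on two's-complement wrapping,
   which is what GCC implements. */
static void
model_vextsb2d (int64_t out[2], const uint8_t in[16])
{
  out[0] = (int64_t) (int8_t) in[7];
  out[1] = (int64_t) (int8_t) in[15];
}

/* Hypothetical helper: splat a doubleword constant in -128..127 using
   the two modelled instructions. */
static void
model_p9_splat_dw (int64_t out[2], int value)
{
  uint8_t bytes[16];

  assert (value >= -128 && value <= 127);
  model_xxspltib (bytes, (uint8_t) value);
  model_vextsb2d (out, bytes);
}
```

The wider immediate is why this pair covers -128 to 127, whereas a sequence built on vspltisw is limited to the 5-bit range discussed in this bug.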
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104124

Steven Munroe changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #52236|0                           |1
        is obsolete|                            |

--- Comment #2 from Steven Munroe ---
Created attachment 52307
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52307&action=edit
Enhanced test case that also shows CSE failure

The original test case, extended with an example where CSE should common a
splat immediate or even a .rodata load, but fails to do even that.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104124

Steven Munroe changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |munroesj at gcc dot gnu.org

--- Comment #1 from Steven Munroe ---
Created attachment 52236
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52236&action=edit
Attempts to load small int consts to vector DW via splat

Multiple attempts to convince GCC to load small integer (-16 to 15) constants
via splat. Current GCC versions (9/10/11) convert vec_splats() and explicit
vec_splat_s32/vec_unpackl sequences into loads from .rodata. This generates
more instructions, takes more cycles, and causes register pressure that
results in unnecessary spill/reload and load-hit-store rejects.
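The two source idioms named in this comment might look roughly like the sketch below. It is not a copy of attachment 52236: the function names, the constant 5, and the store_dw helper are made up for illustration. The #else branch, which uses GCC generic vectors, is an assumption added only so the file also builds on non-PowerPC hosts.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if defined(__ALTIVEC__) && defined(__POWER8_VECTOR__)
#include <altivec.h>
typedef __vector long long v2di;

/* Idiom 1: ask for the doubleword splat directly. */
static v2di
splat_dw_direct (void)
{
  return vec_splats ((long long) 5);
}

/* Idiom 2: splat a word immediate, then sign-extend words to doublewords. */
static v2di
splat_dw_unpack (void)
{
  return vec_unpackl (vec_splat_s32 (5));
}
#else
/* Fallback for other targets: both functions just return the constant. */
typedef long long v2di __attribute__ ((vector_size (16)));

static v2di
splat_dw_direct (void)
{
  return (v2di) { 5, 5 };
}

static v2di
splat_dw_unpack (void)
{
  return (v2di) { 5, 5 };
}
#endif

/* Copy the two doublewords out so they can be inspected as scalars. */
static void
store_dw (int64_t out[2], v2di v)
{
  assert (sizeof (v) == 2 * sizeof (int64_t));
  memcpy (out, &v, sizeof (v));
}
```

On a PowerPC target, compiling with -O2 -mcpu=power8 -S and reading the assembly shows whether the constant is synthesized in registers or loaded from .rodata.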