https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100865
--- Comment #6 from CVS Commits <cvs-commit at gcc dot gnu.org> --- The master branch has been updated by H.J. Lu <h...@gcc.gnu.org>: https://gcc.gnu.org/g:edafb35bdadf309ebb9d1eddc5549f9e1ad49c09 commit r12-1958-gedafb35bdadf309ebb9d1eddc5549f9e1ad49c09 Author: H.J. Lu <hjl.to...@gmail.com> Date: Wed Jun 2 07:15:45 2021 -0700 x86: Convert CONST_WIDE_INT/CONST_VECTOR to broadcast 1. Update move expanders to convert the CONST_WIDE_INT and CONST_VECTOR operands to vector broadcast from an integer with AVX. 2. Add ix86_gen_scratch_sse_rtx to return a scratch SSE register which won't increase stack alignment requirement and blocks transformation by the combine pass. A small benchmark: https://gitlab.com/x86-benchmarks/microbenchmark/-/tree/memset/broadcast shows that broadcast is a little bit faster on Intel Core i7-8559U: $ make gcc -g -I. -O2 -c -o test.o test.c gcc -g -c -o memory.o memory.S gcc -g -c -o broadcast.o broadcast.S gcc -g -c -o vec_dup_sse2.o vec_dup_sse2.S gcc -o test test.o memory.o broadcast.o vec_dup_sse2.o ./test memory : 147215 broadcast : 121213 vec_dup_sse2: 171366 $ broadcast is also smaller: $ size memory.o broadcast.o text data bss dec hex filename 132 0 0 132 84 memory.o 122 0 0 122 7a broadcast.o $ 3. Update PR 87767 tests to expect integer broadcast instead of broadcast from memory. 4. Update avx512f_cond_move.c to expect integer broadcast. A small benchmark: https://gitlab.com/x86-benchmarks/microbenchmark/-/tree/vpaddd/broadcast shows that integer broadcast is faster than embedded memory broadcast: $ make gcc -g -I. -O2 -march=skylake-avx512 -c -o test.o test.c gcc -g -c -o memory.o memory.S gcc -g -c -o broadcast.o broadcast.S gcc -o test test.o memory.o broadcast.o ./test memory : 425538 broadcast : 375260 $ gcc/ PR target/100865 * config/i386/i386-expand.c (ix86_expand_vector_init_duplicate): New prototype. (ix86_byte_broadcast): New function. (ix86_convert_const_wide_int_to_broadcast): Likewise. (ix86_expand_move): Convert CONST_WIDE_INT to broadcast if mode size is 16 bytes or bigger. (ix86_broadcast_from_integer_constant): New function. (ix86_expand_vector_move): Convert CONST_WIDE_INT and CONST_VECTOR to broadcast if mode size is 16 bytes or bigger. * config/i386/i386-protos.h (ix86_gen_scratch_sse_rtx): New prototype. * config/i386/i386.c (ix86_gen_scratch_sse_rtx): New function. gcc/testsuite/ PR target/100865 * gcc.target/i386/avx512f-broadcast-pr87767-1.c: Expect integer broadcast. * gcc.target/i386/avx512f-broadcast-pr87767-5.c: Likewise. * gcc.target/i386/avx512vl-broadcast-pr87767-1.c: Likewise. * gcc.target/i386/avx512vl-broadcast-pr87767-5.c: Likewise. * gcc.target/i386/avx512f_cond_move.c: Also pass -mprefer-vector-width=512 and expect integer broadcast. * gcc.target/i386/pr100865-1.c: New test. * gcc.target/i386/pr100865-2.c: Likewise. * gcc.target/i386/pr100865-3.c: Likewise. * gcc.target/i386/pr100865-4a.c: Likewise. * gcc.target/i386/pr100865-4b.c: Likewise. * gcc.target/i386/pr100865-5a.c: Likewise. * gcc.target/i386/pr100865-5b.c: Likewise. * gcc.target/i386/pr100865-6a.c: Likewise. * gcc.target/i386/pr100865-6b.c: Likewise. * gcc.target/i386/pr100865-6c.c: Likewise. * gcc.target/i386/pr100865-7a.c: Likewise. * gcc.target/i386/pr100865-7b.c: Likewise. * gcc.target/i386/pr100865-7c.c: Likewise. * gcc.target/i386/pr100865-8a.c: Likewise. * gcc.target/i386/pr100865-8b.c: Likewise. * gcc.target/i386/pr100865-8c.c: Likewise. * gcc.target/i386/pr100865-9a.c: Likewise. * gcc.target/i386/pr100865-9b.c: Likewise. * gcc.target/i386/pr100865-9c.c: Likewise. * gcc.target/i386/pr100865-10a.c: Likewise. * gcc.target/i386/pr100865-10b.c: Likewise. * gcc.target/i386/pr100865-11a.c: Likewise. * gcc.target/i386/pr100865-11b.c: Likewise. * gcc.target/i386/pr100865-11c.c: Likewise. * gcc.target/i386/pr100865-12a.c: Likewise. * gcc.target/i386/pr100865-12b.c: Likewise. * gcc.target/i386/pr100865-12c.c: Likewise.