On Tue, Mar 1, 2022 at 5:23 PM Hongtao Liu <crazy...@gmail.com> wrote:
>
> On Wed, Mar 2, 2022 at 6:49 AM H.J. Lu <hjl.to...@gmail.com> wrote:
> >
> > On Tue, Mar 1, 2022 at 7:06 AM H.J. Lu <hjl.to...@gmail.com> wrote:
> > >
> > > On Mon, Feb 28, 2022 at 9:36 PM Hongtao Liu <crazy...@gmail.com> wrote:
> > > >
> > > > On Tue, Mar 1, 2022 at 10:39 AM H.J. Lu via Gcc-patches
> > > > <gcc-patches@gcc.gnu.org> wrote:
> > > > >
> > > > > On Mon, Feb 28, 2022 at 6:26 PM H.J. Lu <hjl.to...@gmail.com> wrote:
> > > > > >
> > > > > > On Mon, Feb 28, 2022 at 6:03 PM liuhongt <hongtao....@intel.com> wrote:
> > > > > > >
> > > > > > > .. in ix86_expand_vector_move and
> > > > > > > ix86_convert_const_wide_int_to_broadcast (called by the former).
> > > > > > >
> > > > > > > ix86_expand_vector_move is called by emit_move_insn which is used by
> > > > > > > many pre_reload passes, ix86_gen_scratch_sse_rtx will break data flow
> > > > > > > when there's explicit usage of xmm7/xmm15/xmm31.
> > > > > > >
> > > > > > > Bootstrapped and regtested on x86_64-linux-gnu{-m32,}
> > > > > > > for both w/ and w/o --with-cpu=native --with-arch=native.
> > > > > > >
> > > > > > > Ok for trunk?
> > > > > > >
> > > > > > > gcc/ChangeLog:
> > > > > > >
> > > > > > >         PR target/104704
> > > > > > >         * config/i386/i386-expand.cc
> > > > > > >         (ix86_convert_const_wide_int_to_broadcast): Replace
> > > > > > >         ix86_gen_scratch_sse_rtx with gen_reg_rtx.
> > > > > > >         (ix86_expand_vector_move): Ditto.
> > > > > > >         * config/i386/sse.md (*vec_dupv4si): Add alternative $r and
> > > > > > >         corresponding splitter after it.
> > > > > > >
> > > > > > > gcc/testsuite/ChangeLog:
> > > > > > >
> > > > > > >         * gcc.target/i386/incoming-11.c: Revert
> > > > > > >         r12-2665-g7f4c3943f795fd.
> > > > > > >         * gcc.target/i386/pr100865-11b.c: Expect vmovdqa or
> > > > > > >         vmovdqa64.
> > > > > > >         * gcc.target/i386/pr100865-12b.c: Ditto.
> > > > > > >         * gcc.target/i386/pr100865-8b.c: Ditto.
> > > > > > >         * gcc.target/i386/pr100865-9b.c: Ditto.
> > > > > > >         * gcc.target/i386/pr82941-1.c: Expect vzeroupper for ! ia32.
> > > > > > >         * gcc.target/i386/pr82942-1.c: Ditto.
> > > > > > >         * gcc.target/i386/pr82990-1.c: Ditto.
> > > > > > >         * gcc.target/i386/pr82990-3.c: Ditto.
> > > > > > >         * gcc.target/i386/pr82990-5.c: Ditto.
> > > > > > > ---
> > > > > > >  gcc/config/i386/i386-expand.cc               |  6 +--
> > > > > > >  gcc/config/i386/sse.md                       | 41 +++++++++++++++-----
> > > > > > >  gcc/testsuite/gcc.target/i386/incoming-11.c  |  2 +-
> > > > > > >  gcc/testsuite/gcc.target/i386/pr100865-11b.c |  2 +-
> > > > > > >  gcc/testsuite/gcc.target/i386/pr100865-12b.c |  2 +-
> > > > > > >  gcc/testsuite/gcc.target/i386/pr100865-8b.c  |  2 +-
> > > > > > >  gcc/testsuite/gcc.target/i386/pr100865-9b.c  |  2 +-
> > > > > > >  gcc/testsuite/gcc.target/i386/pr82941-1.c    |  3 +-
> > > > > > >  gcc/testsuite/gcc.target/i386/pr82942-1.c    |  3 +-
> > > > > > >  gcc/testsuite/gcc.target/i386/pr82990-1.c    |  3 +-
> > > > > > >  gcc/testsuite/gcc.target/i386/pr82990-3.c    |  3 +-
> > > > > > >  gcc/testsuite/gcc.target/i386/pr82990-5.c    |  3 +-
> > > > > > >  12 files changed, 45 insertions(+), 27 deletions(-)
> > > > > > >
> > > > > > > diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
> > > > > > > index faa0191c6dd..75a28cdd89d 100644
> > > > > > > --- a/gcc/config/i386/i386-expand.cc
> > > > > > > +++ b/gcc/config/i386/i386-expand.cc
> > > > > > > @@ -257,7 +257,7 @@ ix86_convert_const_wide_int_to_broadcast (machine_mode mode, rtx op)
> > > > > > >    machine_mode vector_mode;
> > > > > > >    if (!mode_for_vector (broadcast_mode, nunits).exists (&vector_mode))
> > > > > > >      gcc_unreachable ();
> > > > > > > -  rtx target = ix86_gen_scratch_sse_rtx (vector_mode);
> > > > > > > +  rtx target = gen_reg_rtx (vector_mode);
> > > > > > >
> > > > > > I think ix86_gen_scratch_sse_rtx should check
> > > > > > currently_expanding_gimple_stmt == NULL
> > > > > > to return gen_reg_rtx (vector_mode) instead.
> > > > >
> > > > > Like this:
> > > > >
> > > > > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> > > > > index b2bf90576d5..6c0e4929914 100644
> > > > > --- a/gcc/config/i386/i386.cc
> > > > > +++ b/gcc/config/i386/i386.cc
> > > > > @@ -23786,7 +23786,7 @@ ix86_optab_supported_p (int op, machine_mode mode1, machine_mode,
> > > > >  rtx
> > > > >  ix86_gen_scratch_sse_rtx (machine_mode mode)
> > > > >  {
> > > > > -  if (TARGET_SSE && !lra_in_progress)
> > > > > +  if (TARGET_SSE && currently_expanding_gimple_stmt)
> > > > >      {
> > > > >        unsigned int regno;
> > > > >        if (TARGET_64BIT)
> > > > > (END)
> > > > Looks like it relies on PR104721.
> > >
> > > I have checked the fix for PR104721.
> > >
> >
> > The proposed patch doesn't fix the testcase in:
> >
> The original patch can, then I prefer my patch to
> currently_expanding_gimple_stmt.
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104704
> >
> > I am testing:
> >
> > https://gitlab.com/x86-gcc/gcc/-/merge_requests/28
> >
> > --
> > H.J.

There are 2 kinds of issues in

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104704

1.

__m512d y, z;

int i;

int
do_test (void)
{
  register int xmm31 __asm ("xmm31") = i;
  asm volatile ("" : "+v" (xmm31));
  z = y;
  register int xmm2 __asm ("xmm2") = xmm31;
  asm volatile ("" : "+v" (xmm2));
  return xmm2;
}

2.

char z[128];

int i;

__attribute__((noipa))
int
do_test (void)
{
  register int xmm31 __asm ("xmm31") = i;
  asm volatile ("" : "+v" (xmm31));
  __builtin_memset (&z, 0, sizeof (z));
  register int xmm2 __asm ("xmm2") = xmm31;
  asm volatile ("" : "+v" (xmm2));
  return xmm2;
}

Your patch fixes #1.  I don't think it fixes #2.

-- 
H.J.
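
For readers without an AVX512 machine, the hazard in test case 1 can be illustrated with a toy model in portable C.  This is only a sketch and not GCC code: the register file, the helpers scratch_fixed_hard_reg and scratch_fresh_pseudo, and run_do_test are all hypothetical names made up for illustration.  It assumes, as the patch description implies, that the scratch helper hands back a fixed hard register (modelled here as register 31), and that expanding the move in between writes to that scratch.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only, not GCC internals.  Registers 0..31 stand for hard
   registers; higher numbers stand for fresh pseudo registers.  */
enum { NUM_HARD_REGS = 32, MAX_REGS = 64, SCRATCH_HARD_REG = 31 };

struct regfile
{
  int value[MAX_REGS];
  int next_pseudo;              /* next unused pseudo register number */
};

static void
regfile_init (struct regfile *rf)
{
  for (int r = 0; r < MAX_REGS; r++)
    rf->value[r] = 0;
  rf->next_pseudo = NUM_HARD_REGS;
}

/* Hypothetical stand-in for a scratch helper that always returns the
   same hard register, regardless of who else is using it.  */
static int
scratch_fixed_hard_reg (struct regfile *rf)
{
  (void) rf;
  return SCRATCH_HARD_REG;
}

/* Hypothetical stand-in for allocating a fresh pseudo: the returned
   register number has never been handed out before.  */
static int
scratch_fresh_pseudo (struct regfile *rf)
{
  assert (rf->next_pseudo < MAX_REGS);
  return rf->next_pseudo++;
}

/* Mirrors the shape of test case 1: pin I in register 31, expand a
   move that needs a temporary, then read register 31 back.  */
static int
run_do_test (int i, bool use_fresh_pseudo)
{
  struct regfile rf;
  regfile_init (&rf);

  rf.value[31] = i;             /* register int xmm31 __asm ("xmm31") = i; */

  int scratch = use_fresh_pseudo ? scratch_fresh_pseudo (&rf)
                                 : scratch_fixed_hard_reg (&rf);
  rf.value[scratch] = 0;        /* the move in between writes its scratch */

  return rf.value[31];          /* xmm2 = xmm31; return xmm2; */
}
```

In this model run_do_test (42, true) returns 42, while run_do_test (42, false) returns 0 because the fixed scratch overwrote the pinned value.  The model says nothing about test case 2, where the temporary comes from a different expansion path.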