On Mon, Nov 21, 2022 at 3:17 PM Uros Bizjak <ubiz...@gmail.com> wrote: > > On Mon, Nov 21, 2022 at 6:24 AM Hongtao Liu <crazy...@gmail.com> wrote: > > > > On Mon, Nov 21, 2022 at 10:13 AM liuhongt <hongtao....@intel.com> wrote: > > > > > > When i'm working at [1] for ix86_can_change_mode_class, > > > I notice there're some incorrectness/misoptimization in current > > > RA-related hook. > > > This patch tries to do some fix and tidy up for them: > > > > > > 1. We also need to guard size of TO to be > > > less than TARGET_SSE2 ? 2 : 4 in ix86_can_change_mode_class. > > > 2. Merge VALID_AVX512FP16_SCALAR_MODE plus BFmode > > > into VALID_AVX512F_SCALAR_MODE since we've support 16-bit data move > > > above SSE2, so no need for the condition of AVX512FP16 for those evex > > > sse registers. > > > 3. Allocate DI/HImode to sse register for SSE2 above just like > > > SImode since we've supported 16-bit data move between sse and gpr > > > above SSE2, this will help RA to handle cases like (subreg:HI (reg:V8HI) > > > 0) or else RA will spill it. This enable optimization for > > > pieces-memset-{3,37,39}.c > > > 4. Guard 64/32-bit vector move patterns with ix86_hard_reg_move_ok. > > > > > > [1] https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606373.html > > > > > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. > > > Ok for trunk? > > > > > > gcc/ChangeLog: > > > > > > * config/i386/i386.cc (ix86_can_change_mode_class): Also guard > > > size of TO. > > > (ix86_hard_regno_mode_ok): Remove VALID_AVX512FP16_SCALAR_MODE > > > * config/i386/i386.h (VALID_AVX512FP16_SCALAR_MODE): Merged to > > > .. > > > (VALID_AVX512F_SCALAR_MODE): .. this, also add HImode. > > > (VALID_SSE_REG_MODE): Add DI/HImode. > > > * config/i386/mmx.md (*mov<mode>_internal): Add > > > ix86_hard_reg_move_ok to condition. > > > > > > gcc/testsuite/ChangeLog: > > > > > > * gcc.target/i386/pieces-memset-3.c: Remove xfail. > > > * gcc.target/i386/pieces-memset-37.c: Remove xfail. > > > * gcc.target/i386/pieces-memset-39.c: Remove xfail. > > OK. > > This is somehow tricky part of the compiler, so it would be nice if > the patch can be split to a couple of patches to ease bisecting if > something goes wrong. OTOH, recently there were a couple of similar > changes in this area, and there were no problems. Ok, I'll separate ix86_hard_reg_move_ok part into a separate patch. > > Thanks, > Uros. > > > > --- > > > gcc/config/i386/i386.cc | 9 ++------- > > > gcc/config/i386/i386.h | 16 ++++++++-------- > > > gcc/config/i386/mmx.md | 6 ++++-- > > > gcc/testsuite/gcc.target/i386/pieces-memset-3.c | 4 ++-- > > > gcc/testsuite/gcc.target/i386/pieces-memset-37.c | 4 ++-- > > > gcc/testsuite/gcc.target/i386/pieces-memset-39.c | 4 ++-- > > > 6 files changed, 20 insertions(+), 23 deletions(-) > > > > > > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc > > > index 292b32c5e99..030c26965ab 100644 > > > --- a/gcc/config/i386/i386.cc > > > +++ b/gcc/config/i386/i386.cc > > > @@ -19725,7 +19725,8 @@ ix86_can_change_mode_class (machine_mode from, > > > machine_mode to, > > > the vec_dupv4hi pattern. > > > NB: SSE2 can load 16bit data to sse register via pinsrw. */ > > > int mov_size = MAYBE_SSE_CLASS_P (regclass) && TARGET_SSE2 ? 2 : 4; > > > - if (GET_MODE_SIZE (from) < mov_size) > > > + if (GET_MODE_SIZE (from) < mov_size > > > + || GET_MODE_SIZE (to) < mov_size) > > > return false; > > > } > > > > > > @@ -20089,12 +20090,6 @@ ix86_hard_regno_mode_ok (unsigned int regno, > > > machine_mode mode) > > > || VALID_AVX512F_SCALAR_MODE (mode))) > > > return true; > > > > > > - /* For AVX512FP16, vmovw supports movement of HImode > > > - and HFmode between GPR and SSE registers. */ > > > - if (TARGET_AVX512FP16 > > > - && VALID_AVX512FP16_SCALAR_MODE (mode)) > > > - return true; > > > - > > > /* For AVX-5124FMAPS or AVX-5124VNNIW > > > allow V64SF and V64SI modes for special regnos. */ > > > if ((TARGET_AVX5124FMAPS || TARGET_AVX5124VNNIW) > > > diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h > > > index 3869db8f2d3..d9a1fb0e420 100644 > > > --- a/gcc/config/i386/i386.h > > > +++ b/gcc/config/i386/i386.h > > > @@ -1017,11 +1017,9 @@ extern const char *host_detect_local_cpu (int > > > argc, const char **argv); > > > (VALID_AVX256_REG_MODE (MODE) || (MODE) == OImode) > > > > > > #define VALID_AVX512F_SCALAR_MODE(MODE) > > > \ > > > - ((MODE) == DImode || (MODE) == DFmode || (MODE) == SImode \ > > > - || (MODE) == SFmode) > > > - > > > -#define VALID_AVX512FP16_SCALAR_MODE(MODE) \ > > > - ((MODE) == HImode || (MODE) == HFmode) > > > + ((MODE) == DImode || (MODE) == DFmode > > > \ > > > + || (MODE) == SImode || (MODE) == SFmode \ > > > + || (MODE) == HImode || (MODE) == HFmode || (MODE) == BFmode) > > > > > > #define VALID_AVX512F_REG_MODE(MODE) \ > > > ((MODE) == V8DImode || (MODE) == V8DFmode || (MODE) == V64QImode \ > > > @@ -1045,13 +1043,15 @@ extern const char *host_detect_local_cpu (int > > > argc, const char **argv); > > > || (MODE) == V8HFmode || (MODE) == V4HFmode || (MODE) == V2HFmode \ > > > || (MODE) == V8BFmode || (MODE) == V4BFmode || (MODE) == V2BFmode \ > > > || (MODE) == V4QImode || (MODE) == V2HImode || (MODE) == V1SImode \ > > > - || (MODE) == V2DImode || (MODE) == V2QImode || (MODE) == DFmode \ > > > - || (MODE) == HFmode || (MODE) == BFmode) > > > + || (MODE) == V2DImode || (MODE) == V2QImode \ > > > + || (MODE) == DFmode || (MODE) == DImode \ > > > + || (MODE) == HFmode || (MODE) == BFmode || (MODE) == HImode) > > I realize there's no TARGET_SSE2 for VALID_SSE_REG_MODE in > > hard_regno_mode_ok, it's ok for other modes except HImode which must > > require SSE2, orelse we can move 16-bit from/to integer > > > > So Remove HImode from it, and add it separately in hard_regno_mode_ok > > as + /* Use pinsrw/pextrw to mov 16-bit data from/to sse to/from > > integer. */ > > + if (TARGET_SSE2 && mode == HImode) > > + return true; > > + > > > > > > > > > > #define VALID_SSE_REG_MODE(MODE) \ > > > ((MODE) == V1TImode || (MODE) == TImode \ > > > || (MODE) == V4SFmode || (MODE) == V4SImode \ > > > - || (MODE) == SFmode || (MODE) == TFmode || (MODE) == TDmode) > > > + || (MODE) == SFmode || (MODE) == SImode \ > > > + || (MODE) == TFmode || (MODE) == TDmode) > > > > > > #define VALID_MMX_REG_MODE_3DNOW(MODE) \ > > > ((MODE) == V2SFmode || (MODE) == SFmode) > > > diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md > > > index d5134cc351e..63aff287795 100644 > > > --- a/gcc/config/i386/mmx.md > > > +++ b/gcc/config/i386/mmx.md > > > @@ -133,7 +133,8 @@ (define_insn "*mov<mode>_internal" > > > (match_operand:MMXMODE 1 "nonimm_or_0_operand" > > > "rCo,rC,C,rm,rC,C ,!y,m ,?!y,?!y,r ,C,v,m,v,v,r,*x,!y"))] > > > "(TARGET_MMX || TARGET_MMX_WITH_SSE) > > > - && !(MEM_P (operands[0]) && MEM_P (operands[1]))" > > > + && !(MEM_P (operands[0]) && MEM_P (operands[1])) > > > + && ix86_hardreg_mov_ok (operands[0], operands[1])" > > > { > > > switch (get_attr_type (insn)) > > > { > > > @@ -286,7 +287,8 @@ (define_insn "*mov<mode>_internal" > > > "=r ,m ,v,v,v,m,r,v") > > > (match_operand:V_32 1 "general_operand" > > > "rmC,rC,C,v,m,v,v,r"))] > > > - "!(MEM_P (operands[0]) && MEM_P (operands[1]))" > > > + "!(MEM_P (operands[0]) && MEM_P (operands[1])) > > > + && ix86_hardreg_mov_ok (operands[0], operands[1])" > > > { > > > switch (get_attr_type (insn)) > > > { > > > diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-3.c > > > b/gcc/testsuite/gcc.target/i386/pieces-memset-3.c > > > index 765441a7c4a..4f105f58b26 100644 > > > --- a/gcc/testsuite/gcc.target/i386/pieces-memset-3.c > > > +++ b/gcc/testsuite/gcc.target/i386/pieces-memset-3.c > > > @@ -13,6 +13,6 @@ foo (int x) > > > /* { dg-final { scan-assembler-times "vinserti64x4\[ \\t\]+\[^\n\]*%zmm" > > > 1 } } */ > > > /* { dg-final { scan-assembler-times "vmovdqu64\[ \\t\]+\[^\n\]*%zmm" 1 > > > } } */ > > > /* No need to dynamically realign the stack here. */ > > > -/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" { xfail *-*-* > > > } } } */ > > > +/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */ > > > /* Nor use a frame pointer. */ > > > -/* { dg-final { scan-assembler-not "%\[re\]bp" { xfail *-*-* } } } */ > > > +/* { dg-final { scan-assembler-not "%\[re\]bp" } } */ > > > diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-37.c > > > b/gcc/testsuite/gcc.target/i386/pieces-memset-37.c > > > index 0c5056be54d..fd09bd153ce 100644 > > > --- a/gcc/testsuite/gcc.target/i386/pieces-memset-37.c > > > +++ b/gcc/testsuite/gcc.target/i386/pieces-memset-37.c > > > @@ -10,6 +10,6 @@ foo (int a1, int a2, int a3, int a4, int a5, int a6, > > > int x, char *dst) > > > /* { dg-final { scan-assembler-times "vpbroadcastb\[ \\t\]+\[^\n\]*%ymm" > > > 1 } } */ > > > /* { dg-final { scan-assembler-times "vmovdqu\[ \\t\]+\[^\n\]*%ymm" 2 } > > > } */ > > > /* No need to dynamically realign the stack here. */ > > > -/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" { xfail *-*-* > > > } } } */ > > > +/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */ > > > /* Nor use a frame pointer. */ > > > -/* { dg-final { scan-assembler-not "%\[re\]bp" { xfail *-*-* } } } */ > > > +/* { dg-final { scan-assembler-not "%\[re\]bp" } } */ > > > diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-39.c > > > b/gcc/testsuite/gcc.target/i386/pieces-memset-39.c > > > index e33644c2f10..0ed88b274bd 100644 > > > --- a/gcc/testsuite/gcc.target/i386/pieces-memset-39.c > > > +++ b/gcc/testsuite/gcc.target/i386/pieces-memset-39.c > > > @@ -11,6 +11,6 @@ foo (int a1, int a2, int a3, int a4, int a5, int a6, > > > int x, char *dst) > > > /* { dg-final { scan-assembler-not "vinserti64x4" } } */ > > > /* { dg-final { scan-assembler-times "vmovdqu8\[ \\t\]+\[^\n\]*%zmm" 1 } > > > } */ > > > /* No need to dynamically realign the stack here. */ > > > -/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" { xfail *-*-* > > > } } } */ > > > +/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */ > > > /* Nor use a frame pointer. */ > > > -/* { dg-final { scan-assembler-not "%\[re\]bp" { xfail *-*-* } } } */ > > > +/* { dg-final { scan-assembler-not "%\[re\]bp" } } */ > > > -- > > > 2.27.0 > > > > > > > > > -- > > BR, > > Hongtao
-- BR, Hongtao