[Bug target/103393] [12 Regression] Generating 256bit register usage with -mprefer-avx128 -mprefer-vector-width=128

2021-11-24 Thread rguenther at suse dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103393

--- Comment #7 from rguenther at suse dot de  ---
On Thu, 25 Nov 2021, crazylht at gmail dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103393
> 
> --- Comment #6 from Hongtao.liu  ---
> (In reply to Hongtao.liu from comment #5)
> > (In reply to Richard Biener from comment #3)
> > > (In reply to H.J. Lu from comment #2)
> > > > (In reply to Richard Biener from comment #1)
> > > > > It isn't the vectorizer but memmove inline expansion.  I'm not sure it's
> > > > > really a bug, but there isn't a way to disable %ymm use besides disabling
> > > > > AVX entirely.
> > > > > HJ?
> > > > 
> > > > YMM move is generated by loop distribution which doesn't check
> > > > TARGET_PREFER_AVX128.
> > > 
> > > I think it's generated by gimple_fold_builtin_memory_op, which since Richard's
> > > changes now accepts larger sizes, up to MOVE_MAX * MOVE_RATIO, and ends up
> > > picking an integer mode via
> > > 
> > >   scalar_int_mode mode;
> > >   if (int_mode_for_size (ilen * 8, 0).exists (&mode)
> > >       && GET_MODE_SIZE (mode) * BITS_PER_UNIT == ilen * 8
> > >       && have_insn_for (SET, mode)
> > >       /* If the destination pointer is not aligned we must be able
> > >          to emit an unaligned store.  */
> > >       && (dest_align >= GET_MODE_ALIGNMENT (mode)
> > >           || !targetm.slow_unaligned_access (mode, dest_align)
> > >           || (optab_handler (movmisalign_optab, mode)
> > >               != CODE_FOR_nothing)))
> > > 
> > > not sure if there's another way to validate things.
> > 
> > For a single set operation, shouldn't the total size be less than MOVE_MAX
> > instead of MOVE_MAX * MOVE_RATIO?
> 
> r12-3482 changed MOVE_MAX to MOVE_MAX * MOVE_RATIO.

Yes, IIRC it was specifically to allow vector register moves on
aarch64/arm, which don't seem to have a MOVE_MAX that exceeds
WORD_SIZE.  It looks like x86 carefully tries to have a MOVE_MAX
that honors -mprefer-xxx so as not to exceed a single move size.

Both seem to be in conflict here.  Richard - why could arm/aarch64
not increase MOVE_MAX here?
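
For a concrete illustration (a minimal sketch, not the original testcase from
this PR), the kind of copy that reaches the gimple_fold_builtin_memory_op path
quoted above is a fixed-size 32-byte memmove with known length and pointers:

  /* Sketch only: with something like -O2 -mavx2 -mprefer-vector-width=128
     (or -mavx -mprefer-avx128) the reported regression is that this copy
     can still be inlined as a single 256-bit (ymm) load/store.  */
  #include <cstring>

  struct blob { char bytes[32]; };

  void copy_blob (blob *dst, const blob *src)
  {
    std::memmove (dst, src, sizeof (blob));  /* folded to one 32-byte load/store */
  }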

[Bug target/103393] [12 Regression] Generating 256bit register usage with -mprefer-avx128 -mprefer-vector-width=128

2021-11-24 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103393

--- Comment #6 from Hongtao.liu  ---
(In reply to Hongtao.liu from comment #5)
> (In reply to Richard Biener from comment #3)
> > (In reply to H.J. Lu from comment #2)
> > > (In reply to Richard Biener from comment #1)
> > > > It isn't the vectorizer but memmove inline expansion.  I'm not sure it's
> > > > really a bug, but there isn't a way to disable %ymm use besides disabling
> > > > AVX entirely.
> > > > HJ?
> > > 
> > > YMM move is generated by loop distribution which doesn't check
> > > TARGET_PREFER_AVX128.
> > 
> > I think it's generated by gimple_fold_builtin_memory_op, which since Richard's
> > changes now accepts larger sizes, up to MOVE_MAX * MOVE_RATIO, and ends up
> > picking an integer mode via
> > 
> >   scalar_int_mode mode;
> >   if (int_mode_for_size (ilen * 8, 0).exists (&mode)
> >       && GET_MODE_SIZE (mode) * BITS_PER_UNIT == ilen * 8
> >       && have_insn_for (SET, mode)
> >       /* If the destination pointer is not aligned we must be able
> >          to emit an unaligned store.  */
> >       && (dest_align >= GET_MODE_ALIGNMENT (mode)
> >           || !targetm.slow_unaligned_access (mode, dest_align)
> >           || (optab_handler (movmisalign_optab, mode)
> >               != CODE_FOR_nothing)))
> > 
> > not sure if there's another way to validate things.
> 
> For a single set operation, shouldn't the total size be less than MOVE_MAX
> instead of MOVE_MAX * MOVE_RATIO?

r12-3482 changed MOVE_MAX to MOVE_MAX * MOVE_RATIO.

[Bug target/103393] [12 Regression] Generating 256bit register usage with -mprefer-avx128 -mprefer-vector-width=128

2021-11-24 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103393

Hongtao.liu  changed:

   What|Removed |Added

 CC||crazylht at gmail dot com

--- Comment #5 from Hongtao.liu  ---
(In reply to Richard Biener from comment #3)
> (In reply to H.J. Lu from comment #2)
> > (In reply to Richard Biener from comment #1)
> > > It isn't the vectorizer but memmove inline expansion.  I'm not sure it's
> > > really a bug, but there isn't a way to disable %ymm use besides disabling
> > > AVX entirely.
> > > HJ?
> > 
> > YMM move is generated by loop distribution which doesn't check
> > TARGET_PREFER_AVX128.
> 
> I think it's generated by gimple_fold_builtin_memory_op, which since Richard's
> changes now accepts larger sizes, up to MOVE_MAX * MOVE_RATIO, and ends up
> picking an integer mode via
> 
>   scalar_int_mode mode;
>   if (int_mode_for_size (ilen * 8, 0).exists (&mode)
>       && GET_MODE_SIZE (mode) * BITS_PER_UNIT == ilen * 8
>       && have_insn_for (SET, mode)
>       /* If the destination pointer is not aligned we must be able
>          to emit an unaligned store.  */
>       && (dest_align >= GET_MODE_ALIGNMENT (mode)
>           || !targetm.slow_unaligned_access (mode, dest_align)
>           || (optab_handler (movmisalign_optab, mode)
>               != CODE_FOR_nothing)))
> 
> not sure if there's another way to validate things.

For a single set operation, shouldn't the total size be less than MOVE_MAX
instead of MOVE_MAX * MOVE_RATIO?


  /* If we can perform the copy efficiently with first doing all loads and
     then all stores inline it that way.  Currently efficiently means that
     we can load all the memory with a single set operation and that the
     total size is less than MOVE_MAX * MOVE_RATIO.  */
  src_align = get_pointer_alignment (src);
  dest_align = get_pointer_alignment (dest);
  if (tree_fits_uhwi_p (len)
      && (compare_tree_int
          (len, (MOVE_MAX
                 * MOVE_RATIO (optimize_function_for_size_p (cfun))))
          <= 0)
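
To make the difference between the two bounds concrete, here is a small
stand-alone model (not GCC code; the MOVE_MAX and MOVE_RATIO values are
illustrative assumptions for an x86-like target with a preferred vector
width of 128 bits):

  #include <cstdio>

  int main ()
  {
    const unsigned move_max = 16;   /* assumed: largest single move, in bytes */
    const unsigned move_ratio = 6;  /* assumed: max insn count for inline copies */
    const unsigned lens[] = { 8, 16, 32, 64, 128 };

    for (unsigned len : lens)
      std::printf ("len=%3u  <= MOVE_MAX: %-3s  <= MOVE_MAX*MOVE_RATIO: %s\n",
                   len,
                   len <= move_max ? "yes" : "no",
                   len <= move_max * move_ratio ? "yes" : "no");
    /* A 32-byte copy passes the MOVE_MAX * MOVE_RATIO bound checked above
       even though a single move at the preferred width is only 16 bytes.  */
    return 0;
  }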

[Bug target/103393] [12 Regression] Generating 256bit register usage with -mprefer-avx128 -mprefer-vector-width=128

2021-11-24 Thread jschoen4 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103393

--- Comment #4 from John S  ---
I can confirm from my side that it does appear to be the memmove inline
expansion and not the auto-vectorizer.  It also occurs with
builtin_memset/builtin_memcpy.

For some context, this is an issue that would prevent the use of gcc in my
production environment.  It will certainly impact other use cases outside of my
own as well.  For example, it becomes impossible to combine "-mno-vzeroupper -mavx
-mprefer-vector-width=128" with _mm256_xxx + _mm256_zeroupper()
intrinsics to properly manage the ymm state (clear or not), since the compiler
is now able to insert ymm uses almost anywhere via the memmove inlining.
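
A sketch of the manual ymm-state management pattern described above (the
function and values are made up, not taken from this PR):

  /* The intent of -mno-vzeroupper -mavx -mprefer-vector-width=128 is that
     only explicit _mm256_* uses like this touch ymm state, and
     _mm256_zeroupper() clears it again; ymm moves inserted by the compiler
     for an inlined memcpy/memmove elsewhere defeat that bookkeeping.  */
  #include <immintrin.h>

  void scale8 (float *dst, const float *src)
  {
    __m256 v = _mm256_loadu_ps (src);                /* explicit 256-bit work */
    v = _mm256_mul_ps (v, _mm256_set1_ps (2.0f));
    _mm256_storeu_ps (dst, v);
    _mm256_zeroupper ();                             /* clear upper ymm state */
  }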

Up until now prefer-vector-width has always behaved such that no
auto-generated vector use exceeds the preferred width.  Only explicit use
of the _mm256_/_mm512_ intrinsics or the vector types, i.e. `__m256 var;
__m512 var;`, would result in wider register usage.

I believe Clang/icc behave this way as well, and there are dependencies on
this behavior.  The same also applies with AVX-512 enabled, where ZMM usage
under prefer=128/256 can make the downclocking issues even more pronounced.

[Bug target/103393] [12 Regression] Generating 256bit register usage with -mprefer-avx128 -mprefer-vector-width=128

2021-11-24 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103393

Richard Biener  changed:

   What|Removed |Added

 CC||rearnsha at gcc dot gnu.org

--- Comment #3 from Richard Biener  ---
(In reply to H.J. Lu from comment #2)
> (In reply to Richard Biener from comment #1)
> > It isn't the vectorizer but memmove inline expansion.  I'm not sure it's
> > really a bug, but there isn't a way to disable %ymm use besides disabling
> > AVX entirely.
> > HJ?
> 
> YMM move is generated by loop distribution which doesn't check
> TARGET_PREFER_AVX128.

I think it's generated by gimple_fold_builtin_memory_op, which since Richard's
changes now accepts larger sizes, up to MOVE_MAX * MOVE_RATIO, and ends up
picking an integer mode via

  scalar_int_mode mode;
  if (int_mode_for_size (ilen * 8, 0).exists (&mode)
      && GET_MODE_SIZE (mode) * BITS_PER_UNIT == ilen * 8
      && have_insn_for (SET, mode)
      /* If the destination pointer is not aligned we must be able
         to emit an unaligned store.  */
      && (dest_align >= GET_MODE_ALIGNMENT (mode)
          || !targetm.slow_unaligned_access (mode, dest_align)
          || (optab_handler (movmisalign_optab, mode)
              != CODE_FOR_nothing)))

not sure if there's another way to validate things.

[Bug target/103393] [12 Regression] Generating 256bit register usage with -mprefer-avx128 -mprefer-vector-width=128

2021-11-24 Thread hjl.tools at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103393

--- Comment #2 from H.J. Lu  ---
(In reply to Richard Biener from comment #1)
> It isn't the vectorizer but memmove inline expansion.  I'm not sure it's
> really a bug, but there isn't a way to disable %ymm use besides disabling
> AVX entirely.
> HJ?

YMM move is generated by loop distribution which doesn't check
TARGET_PREFER_AVX128.