Re: [r12-3893 Regression] FAIL: gcc.target/i386/vect-pr97352.c scan-assembler-times vmov.pd 4 on Linux/x86_64

2021-09-28 Thread Richard Biener via Gcc-patches
On Tue, 28 Sep 2021, Richard Biener wrote:

> On Tue, 28 Sep 2021, Hongtao Liu wrote:
> 
> > On Tue, Sep 28, 2021 at 2:59 PM Richard Biener via Gcc-patches
> >  wrote:
> > >
> > > On Mon, 27 Sep 2021, sunil.k.pandey wrote:
> > >
> > > > On Linux/x86_64,
> > > >
> > > > 6390c5047adb75960f86d56582e6322aaa4d9281 is the first bad commit
> > > > commit 6390c5047adb75960f86d56582e6322aaa4d9281
> > > > Author: Richard Biener 
> > > > Date:   Wed Nov 18 09:36:57 2020 +0100
> > > >
> > > > Allow different vector types for stmt groups
> > > >
> > > > caused
> > > >
> > > > FAIL: gcc.dg/vect/bb-slp-17.c -flto -ffat-lto-objects  
> > > > scan-tree-dump-times slp2 "optimized: basic block" 1
> > > > FAIL: gcc.dg/vect/bb-slp-17.c scan-tree-dump-times slp2 "optimized: 
> > > > basic block" 1
> > >
> > > This shows that it is maybe a bad idea to support V2SImode vectorization
> > > with -m32 when we refuse to implement even plus.
> > >
> > > OTOH it's just the mode that's available, autovectorize_vector_modes
> > > doesn't include the corresponding mode but we still pick it up via
> > > the related vector mode for group-size == 2.
> 
> It looks like we could define the vectorize.related_mode hook to
> reject V2SImode when !TARGET_MMX_WITH_SSE - the default implementation
> just checks for vector_mode_supported_p.

Meh, that doesn't work.  We then fall through

  else if (SCALAR_INT_MODE_P (prevailing_mode)
           || !related_vector_mode (prevailing_mode,
                                    inner_mode, nunits).exists (&simd_mode))
    {
      /* Fall back to using mode_for_vector, mostly in the hope of being
         able to use an integer mode.  */
      if (known_eq (nunits, 0U)
          && !multiple_p (GET_MODE_SIZE (prevailing_mode), nbytes, &nunits))
        return NULL_TREE;

      if (!mode_for_vector (inner_mode, nunits).exists (&simd_mode))
        return NULL_TREE;

and return V2SImode anyway from mode_for_vector ...

So - should we only allow integer modes here as the comment suggests?
With that, thus

      if (!mode_for_vector (inner_mode, nunits).exists (&simd_mode)
          || GET_MODE_CLASS (simd_mode) != MODE_INT)
        return NULL_TREE;

we "properly" do _not_ use V2SImode for vectorization on x86 when
!TARGET_MMX_WITH_SSE.  Note that will also not use V2SImode
for vectorizing copies (which are properly supported).  So I'm
not sure rejecting V2SImode outright is "proper" ...
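
As a rough illustration of the fall-through described above, here is a toy C model. It is only a sketch: the toy_* names, the enum values and the require_int_fallback switch are invented for this example and are not GCC's API. It assumes, as stated above, that mode_for_vector still hands back V2SImode once the related-mode lookup has rejected it.

```c
#include <stdbool.h>

/* Toy stand-ins for machine modes and mode classes; not GCC's enums.  */
enum toy_mode { TOY_NONE, TOY_DI, TOY_V2SI };
enum toy_class { TOY_CLASS_NONE, TOY_CLASS_INT, TOY_CLASS_VECTOR_INT };

static enum toy_class
toy_mode_class (enum toy_mode m)
{
  switch (m)
    {
    case TOY_DI:
      return TOY_CLASS_INT;
    case TOY_V2SI:
      return TOY_CLASS_VECTOR_INT;
    default:
      return TOY_CLASS_NONE;
    }
}

/* Models a related_mode hook that rejects V2SI without MMX-with-SSE.  */
static bool
toy_related_vector_mode (bool mmx_with_sse, enum toy_mode *out)
{
  if (!mmx_with_sse)
    return false;
  *out = TOY_V2SI;
  return true;
}

/* Models the fallback lookup: it still finds V2SI (assumption).  */
static bool
toy_mode_for_vector (enum toy_mode *out)
{
  *out = TOY_V2SI;
  return true;
}

/* Models the selection: try the related mode first, then fall back.
   With require_int_fallback set, the fallback only accepts integer
   modes, mirroring the extra MODE_INT check.  */
static enum toy_mode
toy_pick_mode (bool mmx_with_sse, bool require_int_fallback)
{
  enum toy_mode simd_mode = TOY_NONE;
  if (!toy_related_vector_mode (mmx_with_sse, &simd_mode))
    {
      if (!toy_mode_for_vector (&simd_mode)
          || (require_int_fallback
              && toy_mode_class (simd_mode) != TOY_CLASS_INT))
        return TOY_NONE;
    }
  return simd_mode;
}
```

In this model, toy_pick_mode (false, false) still yields TOY_V2SI through the fallback, while toy_pick_mode (false, true) yields TOY_NONE. The second case also loses the vector mode for plain copies, which is the trade-off raised above.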

Richard.


Re: [r12-3893 Regression] FAIL: gcc.target/i386/vect-pr97352.c scan-assembler-times vmov.pd 4 on Linux/x86_64

2021-09-28 Thread Richard Biener via Gcc-patches
On Tue, 28 Sep 2021, Hongtao Liu wrote:

> On Tue, Sep 28, 2021 at 2:59 PM Richard Biener via Gcc-patches
>  wrote:
> >
> > On Mon, 27 Sep 2021, sunil.k.pandey wrote:
> >
> > > On Linux/x86_64,
> > >
> > > 6390c5047adb75960f86d56582e6322aaa4d9281 is the first bad commit
> > > commit 6390c5047adb75960f86d56582e6322aaa4d9281
> > > Author: Richard Biener 
> > > Date:   Wed Nov 18 09:36:57 2020 +0100
> > >
> > > Allow different vector types for stmt groups
> > >
> > > caused
> > >
> > > FAIL: gcc.dg/vect/bb-slp-17.c -flto -ffat-lto-objects  
> > > scan-tree-dump-times slp2 "optimized: basic block" 1
> > > FAIL: gcc.dg/vect/bb-slp-17.c scan-tree-dump-times slp2 "optimized: basic 
> > > block" 1
> >
> > This shows that it is maybe a bad idea to support V2SImode vectorization
> > with -m32 when we refuse to implement even plus.
> >
> > OTOH it's just the mode that's available, autovectorize_vector_modes
> > doesn't include the corresponding mode but we still pick it up via
> > the related vector mode for group-size == 2.

It looks like we could define the vectorize.related_mode hook to
reject V2SImode when !TARGET_MMX_WITH_SSE - the default implementation
just checks for vector_mode_supported_p.

> > > FAIL: gcc.dg/vect/bb-slp-pr65935.c -flto -ffat-lto-objects  
> > > scan-tree-dump-times slp1 "optimized: basic block" 10
> > > FAIL: gcc.dg/vect/bb-slp-pr65935.c scan-tree-dump-times slp1 "optimized: 
> > > basic block" 10
> >
> > We are now vectorizing the SSE tail when vectorizing with AVX.  I'll
> > adjust the testcase to prefer SSE.
> >
> > > FAIL: gcc.target/i386/vect-pr97352.c scan-assembler-times vmov.pd 4
> >
> > With -march=cascadelake we get
> >
> > vpermpd $68, c, %ymm0
> > vpermpd $238, c, %ymm0
> >
> > instead of
> >
> > vmovapd c, %ymm1
> > vinsertf128 $1, %xmm1, %ymm1, %ymm0
> > vperm2f128  $49, %ymm1, %ymm1, %ymm0
> >
> > what's a way to disallow additional -march= from taking effect?  It's
> I usually add -mno-{avx,avx512f} and -mtune=generic or sometimes
> -mprefer-vector-width=* to the testcases.

OK, I will try this route then.

Thanks,
Richard.


Re: [r12-3893 Regression] FAIL: gcc.target/i386/vect-pr97352.c scan-assembler-times vmov.pd 4 on Linux/x86_64

2021-09-28 Thread Hongtao Liu via Gcc-patches
On Tue, Sep 28, 2021 at 2:59 PM Richard Biener via Gcc-patches
 wrote:
>
> On Mon, 27 Sep 2021, sunil.k.pandey wrote:
>
> > On Linux/x86_64,
> >
> > 6390c5047adb75960f86d56582e6322aaa4d9281 is the first bad commit
> > commit 6390c5047adb75960f86d56582e6322aaa4d9281
> > Author: Richard Biener 
> > Date:   Wed Nov 18 09:36:57 2020 +0100
> >
> > Allow different vector types for stmt groups
> >
> > caused
> >
> > FAIL: gcc.dg/vect/bb-slp-17.c -flto -ffat-lto-objects  scan-tree-dump-times 
> > slp2 "optimized: basic block" 1
> > FAIL: gcc.dg/vect/bb-slp-17.c scan-tree-dump-times slp2 "optimized: basic 
> > block" 1
>
> This shows that it is maybe a bad idea to support V2SImode vectorization
> with -m32 when we refuse to implement even plus.
>
> OTOH it's just the mode that's available, autovectorize_vector_modes
> doesn't include the corresponding mode but we still pick it up via
> the related vector mode for group-size == 2.
>
> > FAIL: gcc.dg/vect/bb-slp-pr65935.c -flto -ffat-lto-objects  
> > scan-tree-dump-times slp1 "optimized: basic block" 10
> > FAIL: gcc.dg/vect/bb-slp-pr65935.c scan-tree-dump-times slp1 "optimized: 
> > basic block" 10
>
> We are now vectorizing the SSE tail when vectorizing with AVX.  I'll
> adjust the testcase to prefer SSE.
>
> > FAIL: gcc.target/i386/vect-pr97352.c scan-assembler-times vmov.pd 4
>
> With -march=cascadelake we get
>
> vpermpd $68, c, %ymm0
> vpermpd $238, c, %ymm0
>
> instead of
>
> vmovapd c, %ymm1
> vinsertf128 $1, %xmm1, %ymm1, %ymm0
> vperm2f128  $49, %ymm1, %ymm1, %ymm0
>
> what's a way to disallow additional -march= from taking effect?  It's
I usually add -mno-{avx,avx512f} and -mtune=generic or sometimes
-mprefer-vector-width=* to the testcases.
Or use (?:vinsertf128|vpermpd) in the scan pattern to accept alternative instructions.
> really impossible to cater for all possible ISA variants in these kinds
> of testcases.
Testing with the additional option -march=cascadelake can sometimes find a real regression.
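
A sketch of what such testcase directives might look like. The option set and the expected count are assumptions for illustration, not the contents of vect-pr97352.c or the eventual fix; dg-options and scan-assembler-times are the usual DejaGnu directives in the GCC testsuite.

```c
/* Hypothetical testcase header; the option set is an assumption.  */
/* { dg-do compile } */
/* { dg-options "-O3 -mavx -mno-avx512f -mtune=generic" } */

/* ... test body unchanged ... */

/* Accept either instruction; the expected count of 1 is a placeholder.  */
/* { dg-final { scan-assembler-times "(?:vinsertf128|vpermpd)" 1 } } */
```

Whether these flags are enough to neutralize an extra -march= from the target board would need to be checked against the actual test.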
>
> Richard.



-- 
BR,
Hongtao


Re: [r12-3893 Regression] FAIL: gcc.target/i386/vect-pr97352.c scan-assembler-times vmov.pd 4 on Linux/x86_64

2021-09-28 Thread Richard Biener via Gcc-patches
On Mon, 27 Sep 2021, sunil.k.pandey wrote:

> On Linux/x86_64,
> 
> 6390c5047adb75960f86d56582e6322aaa4d9281 is the first bad commit
> commit 6390c5047adb75960f86d56582e6322aaa4d9281
> Author: Richard Biener 
> Date:   Wed Nov 18 09:36:57 2020 +0100
> 
> Allow different vector types for stmt groups
> 
> caused
> 
> FAIL: gcc.dg/vect/bb-slp-17.c -flto -ffat-lto-objects  scan-tree-dump-times 
> slp2 "optimized: basic block" 1
> FAIL: gcc.dg/vect/bb-slp-17.c scan-tree-dump-times slp2 "optimized: basic 
> block" 1

This shows that it is maybe a bad idea to support V2SImode vectorization
with -m32 when we refuse to implement even plus.

OTOH it's just the mode that's available, autovectorize_vector_modes
doesn't include the corresponding mode but we still pick it up via
the related vector mode for group-size == 2.
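
For illustration, a store group of size two over int is the kind of shape that ends up asking for a two-lane SImode vector. A minimal sketch with made-up function names; whether BB SLP actually vectorizes either function depends on the GCC version and flags and is not verified here.

```c
/* Hypothetical two-element groups; the names are invented.  */

/* A plain copy of two ints: only needs vector load and store.  */
void
copy2 (int *restrict dst, const int *restrict src)
{
  dst[0] = src[0];
  dst[1] = src[1];
}

/* Two int additions: additionally needs a vector plus.  */
void
add2 (int *restrict dst, const int *restrict a, const int *restrict b)
{
  dst[0] = a[0] + b[0];
  dst[1] = a[1] + b[1];
}
```

The first function only moves data, while the second needs the plus operation mentioned above.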

> FAIL: gcc.dg/vect/bb-slp-pr65935.c -flto -ffat-lto-objects  
> scan-tree-dump-times slp1 "optimized: basic block" 10
> FAIL: gcc.dg/vect/bb-slp-pr65935.c scan-tree-dump-times slp1 "optimized: 
> basic block" 10

We are now vectorizing the SSE tail when vectorizing with AVX.  I'll 
adjust the testcase to prefer SSE.

> FAIL: gcc.target/i386/vect-pr97352.c scan-assembler-times vmov.pd 4

With -march=cascadelake we get

vpermpd $68, c, %ymm0
vpermpd $238, c, %ymm0

instead of

vmovapd c, %ymm1
vinsertf128 $1, %xmm1, %ymm1, %ymm0
vperm2f128  $49, %ymm1, %ymm1, %ymm0

what's a way to disallow additional -march= from taking effect?  It's
really impossible to cater for all possible ISA variants in these kinds
of testcases.

Richard.


[r12-3893 Regression] FAIL: gcc.target/i386/vect-pr97352.c scan-assembler-times vmov.pd 4 on Linux/x86_64

2021-09-27 Thread sunil.k.pandey via Gcc-patches
On Linux/x86_64,

6390c5047adb75960f86d56582e6322aaa4d9281 is the first bad commit
commit 6390c5047adb75960f86d56582e6322aaa4d9281
Author: Richard Biener 
Date:   Wed Nov 18 09:36:57 2020 +0100

Allow different vector types for stmt groups

caused

FAIL: gcc.dg/vect/bb-slp-17.c -flto -ffat-lto-objects  scan-tree-dump-times slp2 "optimized: basic block" 1
FAIL: gcc.dg/vect/bb-slp-17.c scan-tree-dump-times slp2 "optimized: basic block" 1
FAIL: gcc.dg/vect/bb-slp-pr65935.c -flto -ffat-lto-objects  scan-tree-dump-times slp1 "optimized: basic block" 10
FAIL: gcc.dg/vect/bb-slp-pr65935.c scan-tree-dump-times slp1 "optimized: basic block" 10
FAIL: gcc.target/i386/vect-pr97352.c scan-assembler-times vmov.pd 4

with GCC configured with

../../gcc/configure 
--prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r12-3893/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check RUNTESTFLAGS="vect.exp=gcc.dg/vect/bb-slp-17.c --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="vect.exp=gcc.dg/vect/bb-slp-17.c --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="vect.exp=gcc.dg/vect/bb-slp-pr65935.c --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="vect.exp=gcc.dg/vect/bb-slp-pr65935.c --target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/vect-pr97352.c --target_board='unix{-m32\ -march=cascadelake}'"

(Please do not reply to this email. For questions about this report, contact me
at skpgkp2 at gmail dot com)