Re: [r12-3893 Regression] FAIL: gcc.target/i386/vect-pr97352.c scan-assembler-times vmov.pd 4 on Linux/x86_64
On Tue, 28 Sep 2021, Richard Biener wrote:

> On Tue, 28 Sep 2021, Hongtao Liu wrote:
>
> > On Tue, Sep 28, 2021 at 2:59 PM Richard Biener via Gcc-patches
> > wrote:
> > >
> > > On Mon, 27 Sep 2021, sunil.k.pandey wrote:
> > >
> > > > On Linux/x86_64,
> > > >
> > > > 6390c5047adb75960f86d56582e6322aaa4d9281 is the first bad commit
> > > > commit 6390c5047adb75960f86d56582e6322aaa4d9281
> > > > Author: Richard Biener
> > > > Date: Wed Nov 18 09:36:57 2020 +0100
> > > >
> > > > Allow different vector types for stmt groups
> > > >
> > > > caused
> > > >
> > > > FAIL: gcc.dg/vect/bb-slp-17.c -flto -ffat-lto-objects
> > > > scan-tree-dump-times slp2 "optimized: basic block" 1
> > > > FAIL: gcc.dg/vect/bb-slp-17.c scan-tree-dump-times slp2 "optimized:
> > > > basic block" 1
> > >
> > > This shows that it is maybe a bad idea to support V2SImode vectorization
> > > with -m32 when we refuse to implement even plus.
> > >
> > > OTOH it's just the mode that's available, autovectorize_vector_modes
> > > doesn't include the corresponding mode but we still pick it up via
> > > the related vector mode for group-size == 2.
>
> It looks like we could define the vectorize.related_mode hook to
> reject V2SImode when !TARGET_MMX_WITH_SSE - the default implementation
> just checks for vector_mode_supported_p.

Meh, that doesn't work.  We then fall through

  else if (SCALAR_INT_MODE_P (prevailing_mode)
           || !related_vector_mode (prevailing_mode,
                                    inner_mode, nunits).exists (&simd_mode))
    {
      /* Fall back to using mode_for_vector, mostly in the hope of being
         able to use an integer mode.  */
      if (known_eq (nunits, 0U)
          && !multiple_p (GET_MODE_SIZE (prevailing_mode), nbytes, &nunits))
        return NULL_TREE;

      if (!mode_for_vector (inner_mode, nunits).exists (&simd_mode))
        return NULL_TREE;

and return V2SImode anyway from mode_for_vector ...

So - should we only allow integer modes here as the comment suggests?
With that, thus

      if (!mode_for_vector (inner_mode, nunits).exists (&simd_mode)
          || GET_MODE_CLASS (simd_mode) != MODE_INT)
        return NULL_TREE;

we "properly" do _not_ use V2SImode for vectorization on x86 when
!TARGET_MMX_WITH_SSE.  Note that this will also not use V2SImode for
vectorizing copies (which are properly supported).  So I'm not sure
rejecting V2SImode outright is "proper" ...

Richard.
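The trade-off described in the message above can be sketched with a small stand-alone model. This is only an illustration under stated assumptions: every `toy_` name is hypothetical, nothing here is GCC code, and the model assumes that `mode_for_vector` hands back the vector mode whenever the target reports it as supported and otherwise a same-sized integer mode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for GCC's mode classes; these are not GCC's types.  */
enum toy_mode_class { TOY_MODE_INT, TOY_MODE_VECTOR_INT };

struct toy_mode
{
  const char *name;
  enum toy_mode_class cls;
};

static const struct toy_mode toy_v2si = { "V2SI", TOY_MODE_VECTOR_INT };
static const struct toy_mode toy_di = { "DI", TOY_MODE_INT };

/* Toy model of mode_for_vector for two 32-bit elements (an assumption):
   return the vector mode when the target supports it, otherwise a
   same-sized integer mode.  */
const struct toy_mode *
toy_mode_for_vector (bool v2si_supported)
{
  return v2si_supported ? &toy_v2si : &toy_di;
}

/* Toy model of the fallback path.  With INT_ONLY set, this mirrors the
   extra GET_MODE_CLASS (simd_mode) != MODE_INT check from the message and
   returns NULL for anything that is not an integer mode.  */
const struct toy_mode *
toy_fallback_mode (bool v2si_supported, bool int_only)
{
  const struct toy_mode *mode = toy_mode_for_vector (v2si_supported);
  if (int_only && mode->cls != TOY_MODE_INT)
    return NULL;
  return mode;
}

/* Self-check of the three cases discussed in the message.  */
void
toy_self_check (void)
{
  /* Without the extra check, V2SI is returned even if arithmetic on it
     is not implemented.  */
  assert (toy_fallback_mode (true, false) == &toy_v2si);
  /* With the extra check, V2SI is rejected outright, which in this model
     also covers the plain-copy case.  */
  assert (toy_fallback_mode (true, true) == NULL);
  /* An integer mode still passes the check.  */
  assert (toy_fallback_mode (false, true) == &toy_di);
}
```

In this model the second case is the concern raised at the end of the message: once the class check is added, nothing distinguishes an unsupported add from a supported copy.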
Re: [r12-3893 Regression] FAIL: gcc.target/i386/vect-pr97352.c scan-assembler-times vmov.pd 4 on Linux/x86_64
On Tue, 28 Sep 2021, Hongtao Liu wrote:

> On Tue, Sep 28, 2021 at 2:59 PM Richard Biener via Gcc-patches
> wrote:
> >
> > On Mon, 27 Sep 2021, sunil.k.pandey wrote:
> >
> > > On Linux/x86_64,
> > >
> > > 6390c5047adb75960f86d56582e6322aaa4d9281 is the first bad commit
> > > commit 6390c5047adb75960f86d56582e6322aaa4d9281
> > > Author: Richard Biener
> > > Date: Wed Nov 18 09:36:57 2020 +0100
> > >
> > > Allow different vector types for stmt groups
> > >
> > > caused
> > >
> > > FAIL: gcc.dg/vect/bb-slp-17.c -flto -ffat-lto-objects
> > > scan-tree-dump-times slp2 "optimized: basic block" 1
> > > FAIL: gcc.dg/vect/bb-slp-17.c scan-tree-dump-times slp2 "optimized: basic
> > > block" 1
> >
> > This shows that it is maybe a bad idea to support V2SImode vectorization
> > with -m32 when we refuse to implement even plus.
> >
> > OTOH it's just the mode that's available, autovectorize_vector_modes
> > doesn't include the corresponding mode but we still pick it up via
> > the related vector mode for group-size == 2.

It looks like we could define the vectorize.related_mode hook to
reject V2SImode when !TARGET_MMX_WITH_SSE - the default implementation
just checks for vector_mode_supported_p.

> > > FAIL: gcc.dg/vect/bb-slp-pr65935.c -flto -ffat-lto-objects
> > > scan-tree-dump-times slp1 "optimized: basic block" 10
> > > FAIL: gcc.dg/vect/bb-slp-pr65935.c scan-tree-dump-times slp1 "optimized:
> > > basic block" 10
> >
> > We are now vectorizing the SSE tail when vectorizing with AVX.  I'll
> > adjust the testcase to prefer SSE.
> >
> > > FAIL: gcc.target/i386/vect-pr97352.c scan-assembler-times vmov.pd 4
> >
> > With -march=cascadelake we get
> >
> >         vpermpd $68, c, %ymm0
> >         vpermpd $238, c, %ymm0
> >
> > instead of
> >
> >         vmovapd c, %ymm1
> >         vinsertf128 $1, %xmm1, %ymm1, %ymm0
> >         vperm2f128 $49, %ymm1, %ymm1, %ymm0
> >
> > what's a way to disallow additional -march= from taking effect?  It's
> I usually add -mno-{avx,avx512f} and -mtune=generic or sometimes
> -mprefer-vector-width=* to the testcases.

OK, I will try this route then.

Thanks,
Richard.
Re: [r12-3893 Regression] FAIL: gcc.target/i386/vect-pr97352.c scan-assembler-times vmov.pd 4 on Linux/x86_64
On Tue, Sep 28, 2021 at 2:59 PM Richard Biener via Gcc-patches wrote:
>
> On Mon, 27 Sep 2021, sunil.k.pandey wrote:
>
> > On Linux/x86_64,
> >
> > 6390c5047adb75960f86d56582e6322aaa4d9281 is the first bad commit
> > commit 6390c5047adb75960f86d56582e6322aaa4d9281
> > Author: Richard Biener
> > Date: Wed Nov 18 09:36:57 2020 +0100
> >
> > Allow different vector types for stmt groups
> >
> > caused
> >
> > FAIL: gcc.dg/vect/bb-slp-17.c -flto -ffat-lto-objects scan-tree-dump-times
> > slp2 "optimized: basic block" 1
> > FAIL: gcc.dg/vect/bb-slp-17.c scan-tree-dump-times slp2 "optimized: basic
> > block" 1
>
> This shows that it is maybe a bad idea to support V2SImode vectorization
> with -m32 when we refuse to implement even plus.
>
> OTOH it's just the mode that's available, autovectorize_vector_modes
> doesn't include the corresponding mode but we still pick it up via
> the related vector mode for group-size == 2.
>
> > FAIL: gcc.dg/vect/bb-slp-pr65935.c -flto -ffat-lto-objects
> > scan-tree-dump-times slp1 "optimized: basic block" 10
> > FAIL: gcc.dg/vect/bb-slp-pr65935.c scan-tree-dump-times slp1 "optimized:
> > basic block" 10
>
> We are now vectorizing the SSE tail when vectorizing with AVX.  I'll
> adjust the testcase to prefer SSE.
>
> > FAIL: gcc.target/i386/vect-pr97352.c scan-assembler-times vmov.pd 4
>
> With -march=cascadelake we get
>
>         vpermpd $68, c, %ymm0
>         vpermpd $238, c, %ymm0
>
> instead of
>
>         vmovapd c, %ymm1
>         vinsertf128 $1, %xmm1, %ymm1, %ymm0
>         vperm2f128 $49, %ymm1, %ymm1, %ymm0
>
> what's a way to disallow additional -march= from taking effect?  It's

I usually add -mno-{avx,avx512f} and -mtune=generic or sometimes
-mprefer-vector-width=* to the testcases, or use
(?:vinsertf128|vpermpd) to accept alternative instructions.

> really impossible to cater for all possible ISA variants in these kind
> of testcases.

Testing with an additional option such as -march=cascadelake can
sometimes find a real regression.

>
> Richard.

--
BR,
Hongtao
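The two suggestions in the reply above (pin the ISA and tuning in the testcase, or accept alternative instructions in the scan pattern) might look roughly like the following in a testsuite file. This is a hypothetical testcase, not the real gcc.target/i386/vect-pr97352.c. The flag combination is an assumption rather than the fix that was eventually applied, and the count in the dg-final directive is a placeholder that has not been checked against any compiler output.

```c
/* Hypothetical testcase for illustration only.  */
/* { dg-do compile } */
/* Pin the ISA and tuning so that extra board flags such as
   -march=cascadelake are less likely to change instruction selection.
   This particular flag set is an assumption, not taken from the thread.  */
/* { dg-options "-O3 -mavx -mno-avx2 -mtune=generic" } */

double a[4], b[4], c[4];

void
foo (void)
{
  /* Each destination repeats a two-element slice of c.  */
  a[0] = c[0]; a[1] = c[1]; a[2] = c[0]; a[3] = c[1];
  b[0] = c[2]; b[1] = c[3]; b[2] = c[2]; b[3] = c[3];
}

/* Alternative to pinning flags: accept either instruction in the scan.
   The count 1 is a placeholder and would need to be confirmed.  */
/* { dg-final { scan-assembler-times "(?:vinsertf128|vpermpd)" 1 } } */
```

Pinning flags keeps the expected assembly stable, while the alternation keeps the test meaningful under extra `-march=` options. As the reply notes, running with those extra options can still expose real regressions, so the second form gives up less coverage.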
Re: [r12-3893 Regression] FAIL: gcc.target/i386/vect-pr97352.c scan-assembler-times vmov.pd 4 on Linux/x86_64
On Mon, 27 Sep 2021, sunil.k.pandey wrote:

> On Linux/x86_64,
>
> 6390c5047adb75960f86d56582e6322aaa4d9281 is the first bad commit
> commit 6390c5047adb75960f86d56582e6322aaa4d9281
> Author: Richard Biener
> Date: Wed Nov 18 09:36:57 2020 +0100
>
> Allow different vector types for stmt groups
>
> caused
>
> FAIL: gcc.dg/vect/bb-slp-17.c -flto -ffat-lto-objects scan-tree-dump-times
> slp2 "optimized: basic block" 1
> FAIL: gcc.dg/vect/bb-slp-17.c scan-tree-dump-times slp2 "optimized: basic
> block" 1

This shows that it is maybe a bad idea to support V2SImode vectorization
with -m32 when we refuse to implement even plus.

OTOH it's just the mode that's available, autovectorize_vector_modes
doesn't include the corresponding mode but we still pick it up via
the related vector mode for group-size == 2.

> FAIL: gcc.dg/vect/bb-slp-pr65935.c -flto -ffat-lto-objects
> scan-tree-dump-times slp1 "optimized: basic block" 10
> FAIL: gcc.dg/vect/bb-slp-pr65935.c scan-tree-dump-times slp1 "optimized:
> basic block" 10

We are now vectorizing the SSE tail when vectorizing with AVX.  I'll
adjust the testcase to prefer SSE.

> FAIL: gcc.target/i386/vect-pr97352.c scan-assembler-times vmov.pd 4

With -march=cascadelake we get

        vpermpd $68, c, %ymm0
        vpermpd $238, c, %ymm0

instead of

        vmovapd c, %ymm1
        vinsertf128 $1, %xmm1, %ymm1, %ymm0
        vperm2f128 $49, %ymm1, %ymm1, %ymm0

what's a way to disallow additional -march= from taking effect?  It's
really impossible to cater for all possible ISA variants in these kinds
of testcases.

Richard.
[r12-3893 Regression] FAIL: gcc.target/i386/vect-pr97352.c scan-assembler-times vmov.pd 4 on Linux/x86_64
On Linux/x86_64,

6390c5047adb75960f86d56582e6322aaa4d9281 is the first bad commit
commit 6390c5047adb75960f86d56582e6322aaa4d9281
Author: Richard Biener
Date: Wed Nov 18 09:36:57 2020 +0100

    Allow different vector types for stmt groups

caused

FAIL: gcc.dg/vect/bb-slp-17.c -flto -ffat-lto-objects scan-tree-dump-times slp2 "optimized: basic block" 1
FAIL: gcc.dg/vect/bb-slp-17.c scan-tree-dump-times slp2 "optimized: basic block" 1
FAIL: gcc.dg/vect/bb-slp-pr65935.c -flto -ffat-lto-objects scan-tree-dump-times slp1 "optimized: basic block" 10
FAIL: gcc.dg/vect/bb-slp-pr65935.c scan-tree-dump-times slp1 "optimized: basic block" 10
FAIL: gcc.target/i386/vect-pr97352.c scan-assembler-times vmov.pd 4

with GCC configured with

../../gcc/configure --prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r12-3893/usr --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld --with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl --enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check RUNTESTFLAGS="vect.exp=gcc.dg/vect/bb-slp-17.c --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="vect.exp=gcc.dg/vect/bb-slp-17.c --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="vect.exp=gcc.dg/vect/bb-slp-pr65935.c --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="vect.exp=gcc.dg/vect/bb-slp-pr65935.c --target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="i386.exp=gcc.target/i386/vect-pr97352.c --target_board='unix{-m32\ -march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me at skpgkp2 at gmail dot com)