On Fri, 17 May 2024, Richard Sandiford wrote:

> Richard Biener via Gcc <gcc@gcc.gnu.org> writes:
> > Hi,
> >
> > I'd like to discuss how to go forward with getting the vectorizer to
> > all-SLP for this stage1.  While there is a personal branch with my
> > ongoing work (users/rguenth/vect-force-slp), branches haven't proved
> > to work well for collaboration.
> 
> Speaking for myself, the problem hasn't been so much the branch as
> lack of time.  I've been pretty swamped the last eight months or so
> (except for the time that I took off, which admittedly was quite a
> bit!), and so I never even got around to properly reading and replying
> to your message after the Cauldron.  It's been on the "this is important,
> I should make time to read and understand it properly" list all this time.
> Sorry about that. :(
> 
> I'm hoping to have time to work/help out on SLP stuff soon.
> 
> > The branch isn't ready to be merged in full but I have been picking
> > improvements to trunk last stage1 and some remaining bits in the past
> > weeks.  I have refrained from merging code paths that cannot be
> > exercised on trunk.
> >
> > There are two important sets of changes on the branch, both critical
> > to get more testing on non-x86 targets.
> >
> >  1. enable single-lane SLP discovery
> >  2. avoid splitting store groups (9315bfc661432c3 and 4336060fe2db8ec
> >     if you fetch the branch)
> >
> > The first point is also most annoying on the testsuite since doing
> > SLP instead of interleaving changes what we dump and thus tests
> > start to fail in random ways when you switch between both modes.
> > On the branch single-lane SLP discovery is gated with
> > --param vect-single-lane-slp.
> >
> > The branch has numerous changes to enable single-lane SLP for some
> > code paths that have SLP not implemented and where I did not bother
> > to try supporting multi-lane SLP at this point.  It also adds more
> > SLP discovery entry points.
> >
> > I'm not sure how to try merging these pieces to allow others to
> > more easily help out.  One possibility is to merge
> > --param vect-single-lane-slp defaulted off and pick dependent
> > changes even when they cause testsuite regressions with
> > vect-single-lane-slp=1.  Alternatively adjust the testsuite by
> > adding --param vect-single-lane-slp=0 and default to 1
> > (or keep the default).
> 
> FWIW, this one sounds good to me (the default to 1 version).
> I.e. mechanically add --param vect-single-lane-slp=0 to any tests
> that fail with the new default.  That means that the tests that need
> fixing are easily greppable for anyone who wants to help.  Sometimes
> it'll just be a test update.  Sometimes it will be new vectoriser code.
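
For illustration, a mechanically updated test might look roughly like
the sketch below.  The file contents, the function name copy_plus_one
and the scan pattern are hypothetical and not taken from an existing
test; the --param spelling is the one used on the branch, and
dg-additional-options is the usual way to add a flag to a single test.

```c
/* Hypothetical vect testcase; only the dg-additional-options line is
   the mechanical change under discussion.  */
/* { dg-do compile } */
/* { dg-require-effective-target vect_int } */
/* { dg-additional-options "--param vect-single-lane-slp=0" } */

#define N 64

int a[N], b[N];

/* A simple loop with a non-grouped store.  */
void
copy_plus_one (void)
{
  for (int i = 0; i < N; i++)
    a[i] = b[i] + 1;
}

/* Illustrative dump scan; the pattern and count are assumptions.  */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
```

Tests still pinned to the old mode could then be found with a plain
grep for "vect-single-lane-slp=0" in the testsuite directories.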

OK.  Meanwhile I figured the most important part is item 2 from above,
since that enables the single-lane case in a grouped access (also
covering single-element interleaving).  This will cover all problematic
cases with respect to vectorizing loads and stores.  It also has less
testsuite fallout, mainly because we have a lot less coverage for
grouped stores without SLP.
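
To make the terminology concrete, here is a minimal sketch of the two
source-level access shapes mentioned above.  The function names and the
trip count are made up for illustration, and the sketch shows only the
loop shapes, not how the vectorizer ends up handling them.

```c
#define N 16

/* Grouped store: each iteration writes two adjacent elements,
   out[2*i] and out[2*i+1], forming a store group of size two.  */
void
store_group (int *out, const int *in)
{
  for (int i = 0; i < N; i++)
    {
      out[2 * i] = in[i] + 1;
      out[2 * i + 1] = in[i] - 1;
    }
}

/* Single-element interleaving: the load has stride two, but only
   one element of each pair is actually used.  */
void
single_element_load (int *out, const int *in)
{
  for (int i = 0; i < N; i++)
    out[i] = in[2 * i];
}
```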

So I'll try to produce a mergeable patch for part 2 and post it
for review next week.

Thanks,
Richard.

> Thanks,
> Richard
> 
> > Or require a clean testsuite with
> > --param vect-single-lane-slp defaulted to 1 but keep the --param
> > for debugging (and allow FAILs with 0).
> >
> > For fun I merged just single-lane discovery of non-grouped stores
> > and have that enabled by default.  On x86_64 this results in the
> > set of FAILs below.
> >
> > Any suggestions?
> >
> > Thanks,
> > Richard.
> >
> > FAIL: gcc.dg/vect/O3-pr39675-2.c scan-tree-dump-times vect "vectorizing 
> > stmts using SLP" 1
> > XPASS: gcc.dg/vect/no-scevccp-outer-12.c scan-tree-dump-times vect "OUTER 
> > LOOP VECTORIZED." 1
> > FAIL: gcc.dg/vect/no-section-anchors-vect-31.c scan-tree-dump-times vect 
> > "Alignment of access forced using peeling" 2
> > FAIL: gcc.dg/vect/no-section-anchors-vect-31.c scan-tree-dump-times vect 
> > "Vectorizing an unaligned access" 0
> > FAIL: gcc.dg/vect/no-section-anchors-vect-64.c scan-tree-dump-times vect 
> > "Alignment of access forced using peeling" 2
> > FAIL: gcc.dg/vect/no-section-anchors-vect-64.c scan-tree-dump-times vect 
> > "Vectorizing an unaligned access" 0
> > FAIL: gcc.dg/vect/no-section-anchors-vect-66.c scan-tree-dump-times vect 
> > "Alignment of access forced using peeling" 1
> > FAIL: gcc.dg/vect/no-section-anchors-vect-66.c scan-tree-dump-times vect 
> > "Vectorizing an unaligned access" 0
> > FAIL: gcc.dg/vect/no-section-anchors-vect-68.c scan-tree-dump-times vect 
> > "Alignment of access forced using peeling" 2
> > FAIL: gcc.dg/vect/no-section-anchors-vect-68.c scan-tree-dump-times vect 
> > "Vectorizing an unaligned access" 0
> > FAIL: gcc.dg/vect/slp-12a.c -flto -ffat-lto-objects  scan-tree-dump-times 
> > vect "vectorizing stmts using SLP" 1
> > FAIL: gcc.dg/vect/slp-12a.c scan-tree-dump-times vect "vectorizing stmts 
> > using SLP" 1
> > FAIL: gcc.dg/vect/slp-19a.c -flto -ffat-lto-objects  scan-tree-dump-times 
> > vect "vectorizing stmts using SLP" 1
> > FAIL: gcc.dg/vect/slp-19a.c scan-tree-dump-times vect "vectorizing stmts 
> > using SLP" 1
> > FAIL: gcc.dg/vect/slp-19b.c -flto -ffat-lto-objects  scan-tree-dump-times 
> > vect "vectorizing stmts using SLP" 1
> > FAIL: gcc.dg/vect/slp-19b.c scan-tree-dump-times vect "vectorizing stmts 
> > using SLP" 1
> > FAIL: gcc.dg/vect/slp-19c.c -flto -ffat-lto-objects  scan-tree-dump-times 
> > vect "vectorized 1 loops" 1
> > FAIL: gcc.dg/vect/slp-19c.c -flto -ffat-lto-objects  scan-tree-dump-times 
> > vect "vectorizing stmts using SLP" 1
> > FAIL: gcc.dg/vect/slp-19c.c scan-tree-dump-times vect "vectorized 1 loops" 
> > 1
> > FAIL: gcc.dg/vect/slp-19c.c scan-tree-dump-times vect "vectorizing stmts 
> > using SLP" 1
> > XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1115.c -flto -ffat-lto-objects  
> > scan-tree-dump vect "vectorized 1 loops"
> > XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1115.c scan-tree-dump vect "vectorized 
> > 1 loops"
> > XPASS: gcc.dg/vect/tsvc/vect-tsvc-s114.c -flto -ffat-lto-objects  
> > scan-tree-dump vect "vectorized 1 loops"
> > XPASS: gcc.dg/vect/tsvc/vect-tsvc-s114.c scan-tree-dump vect "vectorized 1 
> > loops"
> > XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1232.c -flto -ffat-lto-objects  
> > scan-tree-dump vect "vectorized 1 loops"
> > XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1232.c scan-tree-dump vect "vectorized 
> > 1 loops"
> > XPASS: gcc.dg/vect/tsvc/vect-tsvc-s257.c -flto -ffat-lto-objects  
> > scan-tree-dump vect "vectorized 1 loops"
> > XPASS: gcc.dg/vect/tsvc/vect-tsvc-s257.c scan-tree-dump vect "vectorized 1 
> > loops"
> > FAIL: gcc.dg/vect/vect-26.c -flto -ffat-lto-objects  scan-tree-dump-times 
> > vect "Alignment of access forced using peeling" 1
> > FAIL: gcc.dg/vect/vect-26.c -flto -ffat-lto-objects  scan-tree-dump-times 
> > vect "Vectorizing an unaligned access" 0
> > FAIL: gcc.dg/vect/vect-26.c scan-tree-dump-times vect "Alignment of access 
> > forced using peeling" 1
> > FAIL: gcc.dg/vect/vect-26.c scan-tree-dump-times vect "Vectorizing an 
> > unaligned access" 0
> > FAIL: gcc.dg/vect/vect-54.c -flto -ffat-lto-objects  scan-tree-dump-times 
> > vect "Alignment of access forced using peeling" 1
> > FAIL: gcc.dg/vect/vect-54.c -flto -ffat-lto-objects  scan-tree-dump-times 
> > vect "Vectorizing an unaligned access" 0
> > FAIL: gcc.dg/vect/vect-54.c scan-tree-dump-times vect "Alignment of access 
> > forced using peeling" 1
> > FAIL: gcc.dg/vect/vect-54.c scan-tree-dump-times vect "Vectorizing an 
> > unaligned access" 0
> > FAIL: gcc.dg/vect/vect-56.c -flto -ffat-lto-objects  scan-tree-dump-times 
> > vect "Alignment of access forced using peeling" 1
> > FAIL: gcc.dg/vect/vect-56.c -flto -ffat-lto-objects  scan-tree-dump-times 
> > vect "Vectorizing an unaligned access" 1
> > FAIL: gcc.dg/vect/vect-56.c scan-tree-dump-times vect "Alignment of access 
> > forced using peeling" 1
> > FAIL: gcc.dg/vect/vect-56.c scan-tree-dump-times vect "Vectorizing an 
> > unaligned access" 1
> > FAIL: gcc.dg/vect/vect-58.c -flto -ffat-lto-objects  scan-tree-dump-times 
> > vect "Alignment of access forced using peeling" 1
> > FAIL: gcc.dg/vect/vect-58.c -flto -ffat-lto-objects  scan-tree-dump-times 
> > vect "Vectorizing an unaligned access" 0
> > FAIL: gcc.dg/vect/vect-58.c scan-tree-dump-times vect "Alignment of access 
> > forced using peeling" 1
> > FAIL: gcc.dg/vect/vect-58.c scan-tree-dump-times vect "Vectorizing an 
> > unaligned access" 0
> > FAIL: gcc.dg/vect/vect-60.c -flto -ffat-lto-objects  scan-tree-dump-times 
> > vect "Alignment of access forced using peeling" 1
> > FAIL: gcc.dg/vect/vect-60.c -flto -ffat-lto-objects  scan-tree-dump-times 
> > vect "Vectorizing an unaligned access" 1
> > FAIL: gcc.dg/vect/vect-60.c scan-tree-dump-times vect "Alignment of access 
> > forced using peeling" 1
> > FAIL: gcc.dg/vect/vect-60.c scan-tree-dump-times vect "Vectorizing an 
> > unaligned access" 1
> > FAIL: gcc.dg/vect/vect-89-big-array.c -flto -ffat-lto-objects  
> > scan-tree-dump-times vect "Alignment of access forced using peeling" 1
> > FAIL: gcc.dg/vect/vect-89-big-array.c -flto -ffat-lto-objects  
> > scan-tree-dump-times vect "Vectorizing an unaligned access" 0
> > FAIL: gcc.dg/vect/vect-89-big-array.c scan-tree-dump-times vect "Alignment 
> > of access forced using peeling" 1
> > FAIL: gcc.dg/vect/vect-89-big-array.c scan-tree-dump-times vect 
> > "Vectorizing an unaligned access" 0
> > FAIL: gcc.dg/vect/vect-89.c -flto -ffat-lto-objects  scan-tree-dump-times 
> > vect "Alignment of access forced using peeling" 1
> > FAIL: gcc.dg/vect/vect-89.c -flto -ffat-lto-objects  scan-tree-dump-times 
> > vect "Vectorizing an unaligned access" 0
> > FAIL: gcc.dg/vect/vect-89.c scan-tree-dump-times vect "Alignment of access 
> > forced using peeling" 1
> > FAIL: gcc.dg/vect/vect-89.c scan-tree-dump-times vect "Vectorizing an 
> > unaligned access" 0
> > FAIL: gcc.dg/vect/vect-92.c -flto -ffat-lto-objects  scan-tree-dump-times 
> > vect "Alignment of access forced using peeling" 3
> > FAIL: gcc.dg/vect/vect-92.c -flto -ffat-lto-objects  scan-tree-dump-times 
> > vect "Vectorizing an unaligned access" 0 
> > FAIL: gcc.dg/vect/vect-92.c scan-tree-dump-times vect "Alignment of access 
> > forced using peeling" 3
> > FAIL: gcc.dg/vect/vect-92.c scan-tree-dump-times vect "Vectorizing an 
> > unaligned access" 0 
> > FAIL: gcc.dg/vect/vect-early-break_25.c -flto -ffat-lto-objects  
> > scan-tree-dump-times vect "Alignment of access forced using peeling" 1
> > FAIL: gcc.dg/vect/vect-early-break_25.c scan-tree-dump-times vect 
> > "Alignment of access forced using peeling" 1
> > FAIL: gcc.dg/vect/vect-multitypes-1.c -flto -ffat-lto-objects  
> > scan-tree-dump-times vect "Alignment of access forced using peeling" 2
> > FAIL: gcc.dg/vect/vect-multitypes-1.c scan-tree-dump-times vect "Alignment 
> > of access forced using peeling" 2
> > XPASS: gcc.dg/vect/vect-outer-4e.c -flto -ffat-lto-objects  
> > scan-tree-dump-times vect "OUTER LOOP VECTORIZED" 1
> > XPASS: gcc.dg/vect/vect-outer-4e.c scan-tree-dump-times vect "OUTER LOOP 
> > VECTORIZED" 1 
> > FAIL: gcc.dg/vect/vect-peel-1.c -flto -ffat-lto-objects  
> > scan-tree-dump-times vect "Alignment of access forced using peeling" 1
> > FAIL: gcc.dg/vect/vect-peel-1.c -flto -ffat-lto-objects  
> > scan-tree-dump-times vect "Vectorizing an unaligned access" 1
> > FAIL: gcc.dg/vect/vect-peel-1.c scan-tree-dump-times vect "Alignment of 
> > access forced using peeling" 1
> > FAIL: gcc.dg/vect/vect-peel-1.c scan-tree-dump-times vect "Vectorizing an 
> > unaligned access" 1
> > FAIL: gcc.dg/vect/vect-peel-2.c -flto -ffat-lto-objects  
> > scan-tree-dump-times vect "Alignment of access forced using peeling" 1
> > FAIL: gcc.dg/vect/vect-peel-2.c -flto -ffat-lto-objects  
> > scan-tree-dump-times vect "Vectorizing an unaligned access" 1
> > FAIL: gcc.dg/vect/vect-peel-2.c scan-tree-dump-times vect "Alignment of 
> > access forced using peeling" 1
> > FAIL: gcc.dg/vect/vect-peel-2.c scan-tree-dump-times vect "Vectorizing an 
> > unaligned access" 1
> > FAIL: gcc.target/i386/cond_op_fma__Float16-1.c scan-assembler-times 
> > vfmadd132ph[ \\\\t]+[^{\\n]*%ymm[0-9]+{%k[1-7]}(?:\\n|[ \\\\t]+#) 1
> > FAIL: gcc.target/i386/cond_op_fma__Float16-1.c scan-assembler-times 
> > vfmsub132ph[ \\\\t]+[^{\\n]*%ymm[0-9]+{%k[1-7]}(?:\\n|[ \\\\t]+#) 1
> > FAIL: gcc.target/i386/cond_op_fma__Float16-1.c scan-assembler-times 
> > vfnmadd132ph[ \\\\t]+[^{\\n]*%ymm[0-9]+{%k[1-7]}(?:\\n|[ \\\\t]+#) 1
> > FAIL: gcc.target/i386/cond_op_fma__Float16-1.c scan-assembler-times 
> > vfnmsub132ph[ \\\\t]+[^{\\n]*%ymm[0-9]+{%k[1-7]}(?:\\n|[ \\\\t]+#) 1
> > FAIL: gcc.target/i386/cond_op_fma_double-1.c scan-assembler-times 
> > vfmadd132pd[ \\\\t]+[^{\\n]*%ymm[0-9]+{%k[1-7]}(?:\\n|[ \\\\t]+#) 1
> > FAIL: gcc.target/i386/cond_op_fma_double-1.c scan-assembler-times 
> > vfmsub132pd[ \\\\t]+[^{\\n]*%ymm[0-9]+{%k[1-7]}(?:\\n|[ \\\\t]+#) 1
> > FAIL: gcc.target/i386/cond_op_fma_double-1.c scan-assembler-times 
> > vfnmadd132pd[ \\\\t]+[^{\\n]*%ymm[0-9]+{%k[1-7]}(?:\\n|[ \\\\t]+#) 1
> > FAIL: gcc.target/i386/cond_op_fma_double-1.c scan-assembler-times 
> > vfnmsub132pd[ \\\\t]+[^{\\n]*%ymm[0-9]+{%k[1-7]}(?:\\n|[ \\\\t]+#) 1
> > FAIL: gcc.target/i386/cond_op_fma_float-1.c scan-assembler-times 
> > vfmadd132ps[ \\\\t]+[^{\\n]*%ymm[0-9]+{%k[1-7]}(?:\\n|[ \\\\t]+#) 1
> > FAIL: gcc.target/i386/cond_op_fma_float-1.c scan-assembler-times 
> > vfmsub132ps[ \\\\t]+[^{\\n]*%ymm[0-9]+{%k[1-7]}(?:\\n|[ \\\\t]+#) 1
> > FAIL: gcc.target/i386/cond_op_fma_float-1.c scan-assembler-times 
> > vfnmadd132ps[ \\\\t]+[^{\\n]*%ymm[0-9]+{%k[1-7]}(?:\\n|[ \\\\t]+#) 1
> > FAIL: gcc.target/i386/cond_op_fma_float-1.c scan-assembler-times 
> > vfnmsub132ps[ \\\\t]+[^{\\n]*%ymm[0-9]+{%k[1-7]}(?:\\n|[ \\\\t]+#) 1
> > FAIL: gcc.target/i386/pr101950-2.c scan-assembler-times \\txor[ql]\\t 2
> > FAIL: gcc.target/i386/pr88531-2b.c scan-assembler-times vmulps 1
> > FAIL: gcc.target/i386/pr88531-2c.c scan-assembler-times vmulps 1
> > FAIL: gcc.target/i386/vectorize1.c scan-tree-dump vect "vect_cst"
> > FAIL: gfortran.dg/temporary_3.f90   -O2  execution test
> > FAIL: gfortran.dg/vect/fast-math-mgrid-resid.f   -O   scan-tree-dump pcom 
> > "Executing predictive commoning without unrolling"
> > FAIL: gfortran.dg/vect/vect-8.f90   -O   scan-tree-dump-times vect 
> > "vectorized 2[234] loops" 1
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
