On Fri, Oct 30, 2020 at 2:46 AM Richard Biener <rguent...@suse.de> wrote:
>
> On Fri, 30 Oct 2020, Richard Sandiford wrote:
>
> > Richard Biener <rguent...@suse.de> writes:
> > > On Wed, 28 Oct 2020, Christophe Lyon wrote:
> > >
> > >> On Wed, 28 Oct 2020 at 11:27, Christophe Lyon
> > >> <christophe.l...@linaro.org> wrote:
> > >> >
> > >> > On Tue, 27 Oct 2020 at 13:18, Richard Biener <rguent...@suse.de> wrote:
> > >> > >
> > >> > > This makes SLP discovery detect backedges by seeding the bst_map with
> > >> > > the node to be analyzed so it can be picked up from recursive calls.
> > >> > > This removes the need to discover backedges in a separate walk.
> > >> > >
> > >> > > This enables SLP build to handle PHI nodes in full, continuing
> > >> > > the SLP build to non-backedges.  For loop vectorization this
> > >> > > enables outer loop vectorization of nested SLP cycles and for
> > >> > > BB vectorization this enables vectorization of PHIs at CFG merges.
> > >> > >
> > >> > > It also turns code generation into a SCC discovery walk to handle
> > >> > > irreducible regions and nodes only reachable via backedges where
> > >> > > we now also fill in vectorized backedge defs.
> > >> > >
> > >> > > This requires sanitizing the SLP tree for SLP reduction chains even
> > >> > > more, manually filling the backedge SLP def.
> > >> > >
> > >> > > This also exposes the fact that CFG copying (and edge splitting
> > >> > > until I fixed that) ends up with different edge order in the
> > >> > > copy which doesn't play well with the desired 1:1 mapping of
> > >> > > SLP PHI node children and edges for epilogue vectorization.
> > >> > > I've tried to fixup CFG copying here but this really looks
> > >> > > like a dead (or expensive) end there so I've done fixup in
> > >> > > slpeel_tree_duplicate_loop_to_edge_cfg instead for the cases
> > >> > > we can run into.
> > >> > >
> > >> > > There's still NULLs in the SLP_TREE_CHILDREN vectors and I'm
> > >> > > not sure it's possible to eliminate them all this stage1 so the
> > >> > > patch has quite some checks for this case all over the place.
> > >> > >
> > >> > > Bootstrapped and tested on x86_64-unknown-linux-gnu.  SPEC CPU 2017
> > >> > > and SPEC CPU 2006 successfully built and tested.
> > >> > >
> > >> > > Will push soon.
> > >> > >
> > >> > > Richard.
> > >> > >
> > >> > > 2020-10-27  Richard Biener  <rguent...@suse.de>
> > >> > >
> > >> > >         * gimple.h (gimple_expr_type): For PHIs return the type
> > >> > >         of the result.
> > >> > >         * tree-vect-loop-manip.c 
> > >> > > (slpeel_tree_duplicate_loop_to_edge_cfg):
> > >> > >         Make sure edge order into copied loop headers line up with 
> > >> > > the
> > >> > >         originals.
> > >> > >         * tree-vect-loop.c (vect_transform_cycle_phi): Handle nested
> > >> > >         loops with SLP.
> > >> > >         (vectorizable_phi): New function.
> > >> > >         (vectorizable_live_operation): For BB vectorization compute 
> > >> > > insert
> > >> > >         location here.
> > >> > >         * tree-vect-slp.c (vect_free_slp_tree): Deal with NULL
> > >> > >         SLP_TREE_CHILDREN entries.
> > >> > >         (vect_create_new_slp_node): Add overloads with pre-existing 
> > >> > > node
> > >> > >         argument.
> > >> > >         (vect_print_slp_graph): Likewise.
> > >> > >         (vect_mark_slp_stmts): Likewise.
> > >> > >         (vect_mark_slp_stmts_relevant): Likewise.
> > >> > >         (vect_gather_slp_loads): Likewise.
> > >> > >         (vect_optimize_slp): Likewise.
> > >> > >         (vect_slp_analyze_node_operations): Likewise.
> > >> > >         (vect_bb_slp_scalar_cost): Likewise.
> > >> > >         (vect_remove_slp_scalar_calls): Likewise.
> > >> > >         (vect_get_and_check_slp_defs): Handle PHIs.
> > >> > >         (vect_build_slp_tree_1): Handle PHIs.
> > >> > >         (vect_build_slp_tree_2): Continue SLP build, following PHI
> > >> > >         arguments.  Fix memory leak.
> > >> > >         (vect_build_slp_tree): Put stub node into the hash-map so
> > >> > >         we can discover cycles directly.
> > >> > >         (vect_build_slp_instance): Set the backedge SLP def for
> > >> > >         reduction chains.
> > >> > >         (vect_analyze_slp_backedges): Remove.
> > >> > >         (vect_analyze_slp): Do not call it.
> > >> > >         (vect_slp_convert_to_external): Release 
> > >> > > SLP_TREE_LOAD_PERMUTATION.
> > >> > >         (vect_slp_analyze_node_operations): Handle stray failed
> > >> > >         backedge defs by failing.
> > >> > >         (vect_slp_build_vertices): Adjust leaf condition.
> > >> > >         (vect_bb_slp_mark_live_stmts): Handle PHIs, use visited
> > >> > >         hash-set to handle cycles.
> > >> > >         (vect_slp_analyze_operations): Adjust.
> > >> > >         (vect_bb_partition_graph_r): Likewise.
> > >> > >         (vect_slp_function): Adjust split condition to allow CFG
> > >> > >         merges.
> > >> > >         (vect_schedule_slp_instance): Rename to ...
> > >> > >         (vect_schedule_slp_node): ... this.  Move DFS walk to ...
> > >> > >         (vect_schedule_scc): ... this new function.
> > >> > >         (vect_schedule_slp): Call it.  Remove ad-hoc vectorized
> > >> > >         backedge fill code.
> > >> > >         * tree-vect-stmts.c (vect_analyze_stmt): Call
> > >> > >         vectorizable_phi.
> > >> > >         (vect_transform_stmt): Likewise.
> > >> > >         (vect_is_simple_use): Handle vect_backedge_def.
> > >> > >         * tree-vectorizer.c (vec_info::new_stmt_vec_info): Only
> > >> > >         set loop header PHIs to vect_unknown_def_type for loop
> > >> > >         vectorization.
> > >> > >         * tree-vectorizer.h (enum vect_def_type): Add 
> > >> > > vect_backedge_def.
> > >> > >         (enum stmt_vec_info_type): Add phi_info_type.
> > >> > >         (vectorizable_phi): Declare.
> > >> > >
> > >> > >         * gcc.dg/vect/bb-slp-54.c: New test.
> > >> > >         * gcc.dg/vect/bb-slp-55.c: Likewise.
> > >> > >         * gcc.dg/vect/bb-slp-56.c: Likewise.
> > >> > >         * gcc.dg/vect/bb-slp-57.c: Likewise.
> > >> > >         * gcc.dg/vect/bb-slp-58.c: Likewise.
> > >> > >         * gcc.dg/vect/bb-slp-59.c: Likewise.
> > >> > >         * gcc.dg/vect/bb-slp-60.c: Likewise.
> > >> > >         * gcc.dg/vect/bb-slp-61.c: Likewise.
> > >> > >         * gcc.dg/vect/bb-slp-62.c: Likewise.
> > >> > >         * gcc.dg/vect/bb-slp-63.c: Likewise.
> > >> > >         * gcc.dg/vect/bb-slp-64.c: Likewise.
> > >> > >         * gcc.dg/vect/bb-slp-65.c: Likewise.
> > >> > >         * gcc.dg/vect/bb-slp-66.c: Likewise.
> > >> > >         * gcc.dg/vect/vect-outer-slp-1.c: Likewise.
> > >> > >         * gfortran.dg/vect/O3-bb-slp-1.f: Likewise.
> > >> > >         * gfortran.dg/vect/O3-bb-slp-2.f: Likewise.
> > >> > >         * g++.dg/vect/simd-11.cc: Likewise.
> > >> >
> > >> >
> > >> > I've filed https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97616
> > >> > because bb-slp-58.c and bb-slp-59.c fail on arm.
> > >> >
> > >>
> > >> And it causes a regression on aarch64:
> > >> FAIL:    gcc.target/aarch64/sve/reduc_strict_4.c -march=armv8.2-a+sve
> > >> scan-assembler-times \\tfadda\\td[0-9]+, p[0-7], d[0-9]+, z[0-9]+\\.d
> > >> 4
> > >
> > > I'm testing a fix for this one.
> > >
> > >> FAIL:    gcc.target/aarch64/sve/reduc_strict_5.c -march=armv8.2-a+sve
> > >> scan-assembler-times \\tfadda\\td[0-9]+, p[0-7], d[0-9]+, z[0-9]+\\.d
> > >> 6
> > >
> > > That one is more odd.  For some reason we arrive at
> > >
> > > /home/rguenther/src/trunk/gcc/testsuite/gcc.target/aarch64/sve/reduc_strict_5.c:10:21:
> > > note:   vect_is_simple_use: operand 0.0, type of def: constant
> > > /home/rguenther/src/trunk/gcc/testsuite/gcc.target/aarch64/sve/reduc_strict_5.c:10:21:
> > > missed:   Build SLP failed: invalid type of def for variable-length SLP
> > > 0.0
> > >
> > > we previously didn't analyze the initial value (we will later not
> > > build a SLP node for it but the defs are still analyzed by
> > > vect_get_and_check_slp_defs which I can fix).  Sill I want to
> > > know why we cannot create a vector of all zeros.  The check is
> > >
> > >           if ((dt == vect_constant_def
> > >                || dt == vect_external_def)
> > >               && !GET_MODE_SIZE (vinfo->vector_mode).is_constant ()
> > >               && (TREE_CODE (type) == BOOLEAN_TYPE
> > >                   || !can_duplicate_and_interleave_p (vinfo, stmts.length
> > > (),
> > >                                                       type)))
> > >             {
> > >               if (dump_enabled_p ())
> > >                 dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > >                                  "Build SLP failed: invalid type of def "
> > >                                  "for variable-length SLP %T\n", oprnd);
> > >               return -1;
> > >             }
> > >
> > > so the reason must be the group size (12)?  It succeeds with 8.
> > > But then the check looks totally premature at the above point
> > > because for sure an all-same constant can be always duplicated?
> > >
> > > So - what breaks if we simply remove the above?  Richard?
> >
> > Well, I think there are two issues:
> >
> > - Should all-same constants be OK?  Agree they should, but it would
> >   involve extending the check and making a promise that the values
> >   won't change later (and so invalidate the analysis).
> >
> > - Do we need to check a condition at all?  If we don't,
> >   duplicate_and_interleave will ICE when trying to build invalid
> >   variable-length constant or external definitions.  So we need
> >   to check it somewhere, even if it's not here.
>
> OK, I assumed it was present elsewhere already (for non-SLP vect).
>
> Doing the check at its current place makes sense if the "error"
> can be mitigated by swapping [parent] operands or by splitting
> the SLP graph at other lanes.  Possibly the latter could help
> here, but it's a conflict with a more precise check.  A more
> precise check would be where we call vectorizable_* which
> I admit doesn't yet exist for constants / invariants which we
> simply assume can always generate.
>
> There's similar "premature" checking for vect vs. vect and
> vect vs. scalar shift capabilities in vect_build_slp_tree_1
> which there helps to eventually shuffle the correct stmts
> into better matching operand positions ...
>
> So - it would probably help to "duplicate" the check to
> vect_slp_analyze_node_operations (where we call vectorizable_*)
> because we do generate externals in other places where we
> do _not_ have this kind of check (vect_slp_convert_to_external
> and several places in vect_build_slp_tree_2 where we create
> the node from scalar pieces).
>
> Richard.

This caused:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97838

-- 
H.J.

Reply via email to