[4/6] Make the vectoriser do its own DCE

2018-08-28 Thread Richard Sandiford
The vectoriser normally leaves a later DCE pass to remove the scalar
code, but we've accumulated various special cases for things that
DCE can't handle, such as removing the scalar stores that have been
replaced by vector stores, and the scalar calls to internal functions.
(The latter must be removed for correctness, since no underlying scalar
optabs exist for those calls.)

Now that vec_basic_block gives us an easy way of iterating over the
original scalar code (ignoring any new code inserted by the vectoriser),
it seems easier to do the DCE directly.  This involves marking the few
cases in which the vector code needs part of the original scalar code
to be kept around.


2018-08-28  Richard Sandiford  

gcc/
* tree-vectorizer.h (_stmt_vec_info::used_by_vector_code_p): New
member variable.
(vect_mark_used_by_vector_code): Declare.
(vect_remove_dead_scalar_stmts): Likewise.
(vect_transform_stmt): Return void.
(vect_remove_stores): Delete.
* tree-vectorizer.c (vec_info::remove_stmt): Handle phis.
* tree-vect-stmts.c (vect_mark_used_by_vector_code): New function.
(vectorizable_call, vectorizable_simd_clone_call): Don't remove
scalar calls here.
(vectorizable_shift): Mark shift amounts in a vector-scalar shift
as being used by the vector code.
(vectorizable_load): Mark unhoisted scalar loads that feed a
load-and-broadcast operation as being needed by the vector code.
(vect_transform_stmt): Return void.
(vect_remove_stores): Delete.
(vect_maybe_remove_scalar_stmt): New function.
(vect_remove_dead_scalar_stmts): Likewise.
* tree-vect-slp.c (vect_slp_bb): Call vect_remove_dead_scalar_stmts.
(vect_remove_slp_scalar_calls): Delete.
(vect_schedule_slp): Don't call it.  Don't remove scalar stores here.
* tree-vect-loop.c (vectorizable_reduction): Mark scalar phis that
are retained by the vector code.
(vectorizable_live_operation): Mark scalar live-out statements that
are retained by the vector code.
(vect_transform_loop_stmt): Remove seen_store argument.  Mark gconds
in nested loops as being needed by the vector code.  Replace the
outer loop's gcond with a dummy condition.
(vect_transform_loop): Update calls accordingly.  Don't remove
scalar stores or calls here; call vect_remove_dead_scalar_stmts
instead.
* tree-vect-loop-manip.c (vect_update_ivs_after_vectorizer): Don't
remove PHIs that were classified as reductions but never actually
vectorized.

Index: gcc/tree-vectorizer.h
===
--- gcc/tree-vectorizer.h   2018-08-28 12:05:11.466982927 +0100
+++ gcc/tree-vectorizer.h   2018-08-28 12:05:14.014961439 +0100
@@ -925,6 +925,10 @@ struct _stmt_vec_info {
   /* For both loads and stores.  */
   bool simd_lane_access_p;
 
+  /* True if the vectorized code keeps this statement in its current form.
+ Only meaningful for statements that were in the original scalar code.  */
+  bool used_by_vector_code_p;
+
   /* Classifies how the load or store is going to be implemented
  for loop vectorization.  */
   vect_memory_access_type memory_access_type;
@@ -1522,6 +1526,7 @@ extern stmt_vec_info vect_finish_replace
 extern stmt_vec_info vect_finish_stmt_generation (stmt_vec_info, gimple *,
  gimple_stmt_iterator *);
 extern bool vect_mark_stmts_to_be_vectorized (loop_vec_info);
+extern void vect_mark_used_by_vector_code (stmt_vec_info);
 extern tree vect_get_store_rhs (stmt_vec_info);
 extern tree vect_get_vec_def_for_operand_1 (stmt_vec_info, enum vect_def_type);
 extern tree vect_get_vec_def_for_operand (tree, stmt_vec_info, tree = NULL);
@@ -1532,9 +1537,8 @@ extern void vect_get_vec_defs_for_stmt_c
 extern tree vect_init_vector (stmt_vec_info, tree, tree,
   gimple_stmt_iterator *);
 extern tree vect_get_vec_def_for_stmt_copy (vec_info *, tree);
-extern bool vect_transform_stmt (stmt_vec_info, gimple_stmt_iterator *,
+extern void vect_transform_stmt (stmt_vec_info, gimple_stmt_iterator *,
 slp_tree, slp_instance);
-extern void vect_remove_stores (stmt_vec_info);
 extern bool vect_analyze_stmt (stmt_vec_info, bool *, slp_tree, slp_instance,
   stmt_vector_for_cost *);
 extern bool vectorizable_condition (stmt_vec_info, gimple_stmt_iterator *,
@@ -1554,6 +1558,7 @@ extern gcall *vect_gen_while (tree, tree
 extern tree vect_gen_while_not (gimple_seq *, tree, tree, tree);
 extern bool vect_get_vector_types_for_stmt (stmt_vec_info, tree *, tree *);
 extern tree vect_get_mask_type_for_stmt (stmt_vec_info);
+extern void vect_remove_dead_scalar_stmts (vec_info *);
 
 /* In tree-vect-data-refs.c.  */
 extern bool vect_can_force_dr_alignment_p (const_tree, un

Re: [4/6] Make the vectoriser do its own DCE

2018-08-28 Thread Jeff Law
On 08/28/2018 05:23 AM, Richard Sandiford wrote:
> The vectoriser normally leaves a later DCE pass to remove the scalar
> code, but we've accumulated various special cases for things that
> DCE can't handle, such as removing the scalar stores that have been
> replaced by vector stores, and the scalar calls to internal functions.
> (The latter must be removed for correctness, since no underlying scalar
> optabs exist for those calls.)
> 
> Now that vec_basic_block gives us an easy way of iterating over the
> original scalar code (ignoring any new code inserted by the vectoriser),
> it seems easier to do the DCE directly.  This involves marking the few
> cases in which the vector code needs part of the original scalar code
> to be kept around.
> 
> 
> 2018-08-28  Richard Sandiford  
> 
> gcc/
>   * tree-vectorizer.h (_stmt_vec_info::used_by_vector_code_p): New
>   member variable.
>   (vect_mark_used_by_vector_code): Declare.
>   (vect_remove_dead_scalar_stmts): Likewise.
>   (vect_transform_stmt): Return void.
>   (vect_remove_stores): Delete.
>   * tree-vectorizer.c (vec_info::remove_stmt): Handle phis.
>   * tree-vect-stmts.c (vect_mark_used_by_vector_code): New function.
>   (vectorizable_call, vectorizable_simd_clone_call): Don't remove
>   scalar calls here.
>   (vectorizable_shift): Mark shift amounts in a vector-scalar shift
>   as being used by the vector code.
>   (vectorizable_load): Mark unhoisted scalar loads that feed a
>   load-and-broadcast operation as being needed by the vector code.
>   (vect_transform_stmt): Return void.
>   (vect_remove_stores): Delete.
>   (vect_maybe_remove_scalar_stmt): New function.
>   (vect_remove_dead_scalar_stmts): Likewise.
>   * tree-vect-slp.c (vect_slp_bb): Call vect_remove_dead_scalar_stmts.
>   (vect_remove_slp_scalar_calls): Delete.
>   (vect_schedule_slp): Don't call it.  Don't remove scalar stores here.
>   * tree-vect-loop.c (vectorizable_reduction): Mark scalar phis that
>   are retained by the vector code.
>   (vectorizable_live_operation): Mark scalar live-out statements that
>   are retained by the vector code.
>   (vect_transform_loop_stmt): Remove seen_store argument.  Mark gconds
>   in nested loops as being needed by the vector code.  Replace the
>   outer loop's gcond with a dummy condition.
>   (vect_transform_loop): Update calls accordingly.  Don't remove
>   scalar stores or calls here; call vect_remove_dead_scalar_stmts
>   instead.
>   * tree-vect-loop-manip.c (vect_update_ivs_after_vectorizer): Don't
>   remove PHIs that were classified as reductions but never actually
>   vectorized.
Presumably you'll look at killing the actual DCE pass we're running as a
follow-up.

I'm hoping this will help the tendency of the vectorizer to leak
SSA_NAMEs.  Though it's late enough in the pipeline that leaking isn't
that bad.





> Index: gcc/tree-vectorizer.c
> ===
> --- gcc/tree-vectorizer.c 2018-08-28 12:05:11.466982927 +0100
> +++ gcc/tree-vectorizer.c 2018-08-28 12:05:14.014961439 +0100
> @@ -654,8 +654,13 @@ vec_info::remove_stmt (stmt_vec_info stm
>set_vinfo_for_stmt (stmt_info->stmt, NULL);
>gimple_stmt_iterator si = gsi_for_stmt (stmt_info->stmt);
>unlink_stmt_vdef (stmt_info->stmt);
> -  gsi_remove (&si, true);
> -  release_defs (stmt_info->stmt);
> +  if (is_a  (stmt_info->stmt))
> +remove_phi_node (&si, true);
> +  else
> +{
> +  gsi_remove (&si, true);
> +  release_defs (stmt_info->stmt);
> +}
More than once I've wondered if we should factor this into its own
little function.  I'm pretty sure I've seen this style of code
elsewhere.  Your call whether or not to clean that up.

OK with or without creating a little helper for the code above.
jeff


Re: [4/6] Make the vectoriser do its own DCE

2018-08-29 Thread Richard Biener
On Wed, Aug 29, 2018 at 1:01 AM Jeff Law  wrote:
>
> On 08/28/2018 05:23 AM, Richard Sandiford wrote:
> > The vectoriser normally leaves a later DCE pass to remove the scalar
> > code, but we've accumulated various special cases for things that
> > DCE can't handle, such as removing the scalar stores that have been
> > replaced by vector stores, and the scalar calls to internal functions.
> > (The latter must be removed for correctness, since no underlying scalar
> > optabs exist for those calls.)
> >
> > Now that vec_basic_block gives us an easy way of iterating over the
> > original scalar code (ignoring any new code inserted by the vectoriser),
> > it seems easier to do the DCE directly.  This involves marking the few
> > cases in which the vector code needs part of the original scalar code
> > to be kept around.
> >
> >
> > 2018-08-28  Richard Sandiford  
> >
> > gcc/
> >   * tree-vectorizer.h (_stmt_vec_info::used_by_vector_code_p): New
> >   member variable.
> >   (vect_mark_used_by_vector_code): Declare.
> >   (vect_remove_dead_scalar_stmts): Likewise.
> >   (vect_transform_stmt): Return void.
> >   (vect_remove_stores): Delete.
> >   * tree-vectorizer.c (vec_info::remove_stmt): Handle phis.
> >   * tree-vect-stmts.c (vect_mark_used_by_vector_code): New function.
> >   (vectorizable_call, vectorizable_simd_clone_call): Don't remove
> >   scalar calls here.
> >   (vectorizable_shift): Mark shift amounts in a vector-scalar shift
> >   as being used by the vector code.
> >   (vectorizable_load): Mark unhoisted scalar loads that feed a
> >   load-and-broadcast operation as being needed by the vector code.
> >   (vect_transform_stmt): Return void.
> >   (vect_remove_stores): Delete.
> >   (vect_maybe_remove_scalar_stmt): New function.
> >   (vect_remove_dead_scalar_stmts): Likewise.
> >   * tree-vect-slp.c (vect_slp_bb): Call vect_remove_dead_scalar_stmts.
> >   (vect_remove_slp_scalar_calls): Delete.
> >   (vect_schedule_slp): Don't call it.  Don't remove scalar stores here.
> >   * tree-vect-loop.c (vectorizable_reduction): Mark scalar phis that
> >   are retained by the vector code.
> >   (vectorizable_live_operation): Mark scalar live-out statements that
> >   are retained by the vector code.
> >   (vect_transform_loop_stmt): Remove seen_store argument.  Mark gconds
> >   in nested loops as being needed by the vector code.  Replace the
> >   outer loop's gcond with a dummy condition.
> >   (vect_transform_loop): Update calls accordingly.  Don't remove
> >   scalar stores or calls here; call vect_remove_dead_scalar_stmts
> >   instead.
> >   * tree-vect-loop-manip.c (vect_update_ivs_after_vectorizer): Don't
> >   remove PHIs that were classified as reductions but never actually
> >   vectorized.
> Presumably you'll look at killing the actual DCE pass we're running as a
> follow-up.
>
> I'm hoping this will help the tendency of the vectorizer to leak
> SSA_NAMEs.  Though it's late enough in the pipeline that leaking isn't
> that bad.
>
>
>
>
>
> > Index: gcc/tree-vectorizer.c
> > ===
> > --- gcc/tree-vectorizer.c 2018-08-28 12:05:11.466982927 +0100
> > +++ gcc/tree-vectorizer.c 2018-08-28 12:05:14.014961439 +0100
> > @@ -654,8 +654,13 @@ vec_info::remove_stmt (stmt_vec_info stm
> >set_vinfo_for_stmt (stmt_info->stmt, NULL);
> >gimple_stmt_iterator si = gsi_for_stmt (stmt_info->stmt);
> >unlink_stmt_vdef (stmt_info->stmt);
> > -  gsi_remove (&si, true);
> > -  release_defs (stmt_info->stmt);
> > +  if (is_a  (stmt_info->stmt))
> > +remove_phi_node (&si, true);
> > +  else
> > +{
> > +  gsi_remove (&si, true);
> > +  release_defs (stmt_info->stmt);
> > +}
> More than once I've wondered if we should factor this into its own
> little function.  I'm pretty sure I've seen this style of code
> elsewhere.  Your call whether or not to clean that up.

I think "merging" remove_phi_node and gsi_remove would make more sense.
There's the non-obvious semantic difference of the boolean argument so that
would need cleaning up.  Usually removing also involves calling
unlink_stmt_vdef.

Richard.

> OK with or without creating a little helper for the code above.
> jeff