On Mon, 13 Jan 2014, Cong Hou wrote:

> I noticed that LIM could not hoist vector invariant, and that is why
> my first implementation tries to hoist them all.

Yes, I filed PR59786 for this.  I'll see if I can come up with
a fix suitable for stage3.

> In addition, there are two disadvantages of hoisting invariant load +
> lim method:
> 
> First, for some instructions the scalar version is faster than the
> vector version, and in this case hoisting scalar instructions before
> vectorization is better. Those instructions include data
> packing/unpacking, integer multiplication with SSE2, etc..
> 
> Second, it may use more SIMD registers.
> 
> The following code shows a simple example:
> 
> char *a, *b, *c;
> for (int i = 0; i < N; ++i)
>   a[i] = b[0] * c[0] + a[i];
> 
> Vectorizing b[0]*c[0] is worse than loading the result of b[0]*c[0]
> into a vector.

Yes.  I've tried with adjusting the vec_def_type as in the prototype
patch I sent before christmas but that's quite intrusive for this
stage.  You could argue that performing invariant motion is not
really the vectorizers main task and that a combination of hoisting
only the load, later LIM hoisting the rest and then tree-vect-generic.c
demoting vector ops to scalar ops (unimplemented, but also a useful
general optimization) would work as well.

That said, we should definitely have a second look for 4.10.  For now
hoisting the load is an improvement over 4.8 (at least I hope so ;))
which needs to be good enough for 4.9.

I had to fix a latent bug to cure some testsuite fallout so the following
is what I ended up committing.

Jakub, adding the new flag is ok with me.

Thanks,
Richard.

2014-01-14  Richard Biener  <rguent...@suse.de>

        PR tree-optimization/58921
        PR tree-optimization/59006
        * tree-vect-loop-manip.c (vect_loop_versioning): Remove code
        hoisting invariant stmts.
        * tree-vect-stmts.c (vectorizable_load): Insert the splat of
        invariant loads on the preheader edge if possible.

        * gcc.dg/torture/pr58921.c: New testcase.
        * gcc.dg/torture/pr59006.c: Likewise.
        * gcc.dg/vect/pr58508.c: XFAIL no longer handled cases.

Index: gcc/tree-vect-loop-manip.c
===================================================================
*** gcc/tree-vect-loop-manip.c  (revision 206576)
--- gcc/tree-vect-loop-manip.c  (working copy)
*************** vect_loop_versioning (loop_vec_info loop
*** 2435,2507 ****
        }
      }
  
- 
-   /* Extract load statements on memrefs with zero-stride accesses.  */
- 
-   if (LOOP_REQUIRES_VERSIONING_FOR_ALIAS (loop_vinfo))
-     {
-       /* In the loop body, we iterate each statement to check if it is a load.
-        Then we check the DR_STEP of the data reference.  If DR_STEP is zero,
-        then we will hoist the load statement to the loop preheader.  */
- 
-       basic_block *bbs = LOOP_VINFO_BBS (loop_vinfo);
-       int nbbs = loop->num_nodes;
- 
-       for (int i = 0; i < nbbs; ++i)
-       {
-         for (gimple_stmt_iterator si = gsi_start_bb (bbs[i]);
-              !gsi_end_p (si);)
-           {
-             gimple stmt = gsi_stmt (si);
-             stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
-             struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info);
- 
-             if (is_gimple_assign (stmt)
-                 && (!dr
-                     || (DR_IS_READ (dr) && integer_zerop (DR_STEP (dr)))))
-               {
-                 bool hoist = true;
-                 ssa_op_iter iter;
-                 tree var;
- 
-                 /* We hoist a statement if all SSA uses in it are defined
-                    outside of the loop.  */
-                 FOR_EACH_SSA_TREE_OPERAND (var, stmt, iter, SSA_OP_USE)
-                   {
-                     gimple def = SSA_NAME_DEF_STMT (var);
-                     if (!gimple_nop_p (def)
-                         && flow_bb_inside_loop_p (loop, gimple_bb (def)))
-                       {
-                         hoist = false;
-                         break;
-                       }
-                   }
- 
-                 if (hoist)
-                   {
-                     if (dr)
-                       gimple_set_vuse (stmt, NULL);
- 
-                     gsi_remove (&si, false);
-                     gsi_insert_on_edge_immediate (loop_preheader_edge (loop),
-                                                   stmt);
- 
-                     if (dump_enabled_p ())
-                       {
-                         dump_printf_loc
-                             (MSG_NOTE, vect_location,
-                              "hoisting out of the vectorized loop: ");
-                         dump_gimple_stmt (MSG_NOTE, TDF_SLIM, stmt, 0);
-                         dump_printf (MSG_NOTE, "\n");
-                       }
-                     continue;
-                   }
-               }
-             gsi_next (&si);
-           }
-       }
-     }
- 
    /* End loop-exit-fixes after versioning.  */
  
    if (cond_expr_stmt_list)
--- 2435,2440 ----
Index: gcc/tree-vect-stmts.c
===================================================================
*** gcc/tree-vect-stmts.c       (revision 206576)
--- gcc/tree-vect-stmts.c       (working copy)
*************** vectorizable_load (gimple stmt, gimple_s
*** 6368,6379 ****
              /* 4. Handle invariant-load.  */
              if (inv_p && !bb_vinfo)
                {
-                 gimple_stmt_iterator gsi2 = *gsi;
                  gcc_assert (!grouped_load);
!                 gsi_next (&gsi2);
!                 new_temp = vect_init_vector (stmt, scalar_dest,
!                                              vectype, &gsi2);
                  new_stmt = SSA_NAME_DEF_STMT (new_temp);
                }
  
              if (negative)
--- 6368,6406 ----
              /* 4. Handle invariant-load.  */
              if (inv_p && !bb_vinfo)
                {
                  gcc_assert (!grouped_load);
!                 /* If we have versioned for aliasing then we are sure
!                    this is a loop invariant load and thus we can insert
!                    it on the preheader edge.  */
!                 if (LOOP_REQUIRES_VERSIONING_FOR_ALIAS (loop_vinfo))
!                   {
!                     if (dump_enabled_p ())
!                       {
!                         dump_printf_loc (MSG_NOTE, vect_location,
!                                          "hoisting out of the vectorized "
!                                          "loop: ");
!                         dump_gimple_stmt (MSG_NOTE, TDF_SLIM, stmt, 0);
!                         dump_printf (MSG_NOTE, "\n");
!                       }
!                     tree tem = copy_ssa_name (scalar_dest, NULL);
!                     gsi_insert_on_edge_immediate
!                       (loop_preheader_edge (loop),
!                        gimple_build_assign (tem,
!                                             unshare_expr
!                                               (gimple_assign_rhs1 (stmt))));
!                     new_temp = vect_init_vector (stmt, tem, vectype, NULL);
!                   }
!                 else
!                   {
!                     gimple_stmt_iterator gsi2 = *gsi;
!                     gsi_next (&gsi2);
!                     new_temp = vect_init_vector (stmt, scalar_dest,
!                                                  vectype, &gsi2);
!                   }
                  new_stmt = SSA_NAME_DEF_STMT (new_temp);
+                 set_vinfo_for_stmt (new_stmt,
+                                     new_stmt_vec_info (new_stmt, loop_vinfo,
+                                                        bb_vinfo));
                }
  
              if (negative)
Index: gcc/testsuite/gcc.dg/torture/pr58921.c
===================================================================
*** gcc/testsuite/gcc.dg/torture/pr58921.c      (revision 0)
--- gcc/testsuite/gcc.dg/torture/pr58921.c      (working copy)
***************
*** 0 ****
--- 1,17 ----
+ /* { dg-do compile } */
+ 
+ int a[7];
+ int b;
+ 
+ void
+ fn1 ()
+ {
+   for (; b; b++)
+     a[b] = ((a[b] <= 0) == (a[0] != 0));
+ }
+ 
+ int
+ main ()
+ {
+   return 0;
+ }
Index: gcc/testsuite/gcc.dg/torture/pr59006.c
===================================================================
*** gcc/testsuite/gcc.dg/torture/pr59006.c      (revision 0)
--- gcc/testsuite/gcc.dg/torture/pr59006.c      (working copy)
***************
*** 0 ****
--- 1,13 ----
+ /* { dg-do compile } */
+ 
+ int a[8], b;
+ void fn1(void)
+ {
+   int c;
+   for (; b; b++)
+     {
+       int d = a[b];
+       c = a[0] ? d : 0;
+       a[b] = c;
+     }
+ }
Index: gcc/testsuite/gcc.dg/vect/pr58508.c
===================================================================
*** gcc/testsuite/gcc.dg/vect/pr58508.c (revision 206576)
--- gcc/testsuite/gcc.dg/vect/pr58508.c (working copy)
*************** void test5 (int* a, int* b)
*** 66,70 ****
      }
  }
  
! /* { dg-final { scan-tree-dump-times "hoist" 8 "vect" { xfail vect_no_align } 
} } */
  /* { dg-final { cleanup-tree-dump "vect" } } */
--- 66,71 ----
      }
  }
  
! /* { dg-final { scan-tree-dump-times "hoist" 8 "vect" { xfail *-*-* } } } */
! /* { dg-final { scan-tree-dump-times "hoist" 3 "vect" { xfail vect_no_align } 
} } */
  /* { dg-final { cleanup-tree-dump "vect" } } */

Reply via email to