Re: [PATCH] vectorizing conditional expressions (PR tree-optimization/65947)

Richard Biener Tue, 15 Sep 2015 05:10:24 -0700

On Thu, Sep 10, 2015 at 4:51 PM, Alan Hayward <alan.hayw...@arm.com> wrote:
> Hi,
> This patch (attached) adds support for vectorizing conditional expressions
> (PR 65947), for example:
>
> int condition_reduction (int *a, int min_v)
> {
>   int last = 0;
>   for (int i = 0; i < N; i++)
>     if (a[i] < min_v)
>       last = a[i];
>   return last;
> }
>
> To do this the loop is vectorised to create a vector of data results (ie
> of matching a[i] values). Using an induction variable, an additional
> vector is added containing the indexes where the matches occured. In the
> function epilogue this is reduced to a single max value and then used to
> index into the vector of data results.
> When no values are matched in the loop, the indexes vector will contain
> all zeroes, eventually matching the first entry in the data results vector.
>
> To vectorize sucessfully, support is required for REDUC_MAX_EXPR. This is
> supported by aarch64 and arm. On X86 and powerpc, gcc will complain that
> REDUC_MAX_EXPR is not supported for the required modes, failing the
> vectorization. On mips it complains that the required vcond expression is
> not supported. It is suggested the relevant backend experts add the
> required backend support.
>
> Using a simple testcase based around a large number of N and run on an
> aarch64 juno board, with the patch in use, the runtime reduced to 0.8 of
> it's original time.
>
> This patch caused binary differences in three spec2006 binaries on aarch64
> - 4.16.gamess, 435.gromacs and 456.hmmer. Running them on a juno board
> showed no improvement or degregation in runtime.
>
>
> In the near future I hope to submit a further patch (as PR 66558) which
> optimises the case where the result is simply the index of the loop, for
> example:
> int condition_reduction (int *a, int min_v)
> {
>   int last = 0;
>   for (int i = 0; i < N; i++)
>     if (a[i] < min_v)
>       last = i;
>   return last;
> }
> In this case a lot of the new code can be optimized away.
>
> I have run check for aarch64, arm and x86 and have seen no regressions.


Now comments on the patch itself.

+      if (code == COND_EXPR)
+       *v_reduc_type = COND_REDUCTION;

so why not simply use COND_EXPR instead of the new v_reduc_type?

+  if (check_reduction && code != COND_EXPR &&
+      vect_is_slp_reduction (loop_info, phi, def_stmt))

&&s go to the next line

+             /* Reduction of the max index and a reduction of the found
+                values.  */
+             epilogue_cost += add_stmt_cost (target_cost_data, 1,
+                                             vec_to_scalar, stmt_info, 0,
+                                             vect_epilogue);

vec_to_scalar once isn't what the comment suggests.  Instead the
comment suggests twice what a regular reduction would do
but I guess we can "hide" the vec_to_scalar cost and "merge" it
with the broadcast.  Thus make the above two vector_stmt costs?

+             /* A broadcast of the max value.  */
+             epilogue_cost += add_stmt_cost (target_cost_data, 2,
+                                             scalar_to_vec, stmt_info, 0,
+                                             vect_epilogue);

comment suggests a single broadcast.

@@ -3705,7 +3764,7 @@ get_initial_def_for_induction (gimple iv_phi)
          the final vector of induction results:  */
       exit_phi = NULL;
       FOR_EACH_IMM_USE_FAST (use_p, imm_iter, loop_arg)
-        {
+       {
          gimple use_stmt = USE_STMT (use_p);
          if (is_gimple_debug (use_stmt))
            continue;

please avoid unrelated whitespace changes.

+      case COND_EXPR:
+       if (v_reduc_type == COND_REDUCTION)
+         {
...
+       /* Fall through.  */
+
       case MIN_EXPR:
       case MAX_EXPR:
-      case COND_EXPR:

aww, so we could already handle COND_EXPR reductions?  How do they
differ from what you add?  Did you see if that path is even exercised today?

+           /* Create a vector of {init_value, 0, 0, 0...}.  */
+           vec<constructor_elt, va_gc> *v;
+           vec_alloc (v, nunits);
+           CONSTRUCTOR_APPEND_ELT (v, NULL_TREE, init_val);
+           if (SCALAR_FLOAT_TYPE_P (scalar_type))
+             for (i = 1; i < nunits; ++i)
+               CONSTRUCTOR_APPEND_ELT (v, NULL_TREE,
+                                       build_real (scalar_type, dconst0));
+           else
+             for (i = 1; i < nunits; ++i)
+               CONSTRUCTOR_APPEND_ELT (v, NULL_TREE,
+                                       build_int_cst (scalar_type, 0));
+           init_def = build_constructor (vectype, v);

you can unify the float/int case by using build_zero_cst (scalar_type).
Note that you should build a vector constant instead of a constructor
if init_val is a constant.  The convenient way is to build the vector
elements into a tree[] array and use build_vector_stat in that case.

+      /* Find maximum value from the vector of found indexes.  */
+      tree max_index = make_temp_ssa_name (index_scalar_type, NULL, "");

just use make_ssa_name (index_scalar_type);

+      /* Convert the vector of data to the same type as the EQ.  */
+      tree vec_data_cast;
+      if ( TYPE_UNSIGNED (index_vec_type))
+       {

How come it never happens the element
sizes do not match?  (int index type and double data type?)

+      /* Where the max index occured, use the value from the data vector.  */
+      tree vec_and = make_temp_ssa_name (index_vec_type_signed, NULL, "");
+      gimple vec_and_stmt = gimple_build_assign (vec_and, BIT_AND_EXPR,
+                                                vec_compare, vec_data_cast);

that is, don't you need to do some widening/shortening on the comparison result?
(also what happens in the case "or all of the values"?)

Definitely too much VIEW_CONVERT_MAGIC here for my taste ;)

+   This function also handles reduction of condition expressions, for example:
+     for (int i = 0; i < N; i++)
+       if (a[i] < value)
+        last = a[i];
+   This is handled by vectorising the loop and creating an additional vector
+   containing the loop indexes for which "a[i] < value" was true.  In the
+   function epilogue this is reduced to a single max value and then used to
+   index into the vector of results.

I miss a comment that shows the kind of code we transform this to.
"an additional vector containing the loop indexes" can't work - the vector
will not be large enough ;)  Naiively I would have made 'last' a vector,
performing the reduction element-wise and in the epilogue reduce
'last' itself.  And it looks like we are already doing that for

int foo (int *a)
{
  int val = 0;
  for (int i = 0; i < 1024; ++i)
    if (a[i] > val)
      val = a[i];
  return val;
}

I must be missing something.  Yes, I think we can't do index reduction yet,
but pr65947-10.c is alrady handled?

Stopping here.

Thanks,
Richard.

>
> Changelog:
>
>     2015-08-28  Alan Hayward <alan.hayw...@arm.com>
>
>         PR tree-optimization/65947
>         * tree-vect-loop.c
>         (vect_is_simple_reduction_1): Find condition reductions.
>         (vect_model_reduction_cost): Add condition reduction costs.
>         (get_initial_def_for_reduction): Add condition reduction initial
> var.
>         (vect_create_epilog_for_reduction): Add condition reduction epilog.
>         (vectorizable_reduction): Condition reduction support.
>         * tree-vect-stmts.c
>         (vectorizable_condition): Add vect reduction arg
>         * doc/sourcebuild.texi (Vector-specific attributes): Document
>         vect_max_reduc
>
>     testsuite/Changelog:
>
>         PR tree-optimization/65947
>         * lib/target-supports.exp
>         (check_effective_target_vect_max_reduc): Add.
>         * gcc.dg/vect/pr65947-1.c: New test.
>         * gcc.dg/vect/pr65947-2.c: New test.
>         * gcc.dg/vect/pr65947-3.c: New test.
>         * gcc.dg/vect/pr65947-4.c: New test.
>         * gcc.dg/vect/pr65947-5.c: New test.
>         * gcc.dg/vect/pr65947-6.c: New test.
>         * gcc.dg/vect/pr65947-7.c: New test.
>         * gcc.dg/vect/pr65947-8.c: New test.
>         * gcc.dg/vect/pr65947-9.c: New test.
>         * gcc.dg/vect/pr65947-10.c: New test.
>         * gcc.dg/vect/pr65947-11.c: New test.
>
>
>
> Thanks,
> Alan
>
>

Re: [PATCH] vectorizing conditional expressions (PR tree-optimization/65947)

Reply via email to