On Thu, Sep 10, 2015 at 4:51 PM, Alan Hayward <alan.hayw...@arm.com> wrote: > Hi, > This patch (attached) adds support for vectorizing conditional expressions > (PR 65947), for example: > > int condition_reduction (int *a, int min_v) > { > int last = 0; > for (int i = 0; i < N; i++) > if (a[i] < min_v) > last = a[i]; > return last; > } > > To do this the loop is vectorised to create a vector of data results (ie > of matching a[i] values). Using an induction variable, an additional > vector is added containing the indexes where the matches occured. In the > function epilogue this is reduced to a single max value and then used to > index into the vector of data results. > When no values are matched in the loop, the indexes vector will contain > all zeroes, eventually matching the first entry in the data results vector. > > To vectorize sucessfully, support is required for REDUC_MAX_EXPR. This is > supported by aarch64 and arm. On X86 and powerpc, gcc will complain that > REDUC_MAX_EXPR is not supported for the required modes, failing the > vectorization. On mips it complains that the required vcond expression is > not supported. It is suggested the relevant backend experts add the > required backend support. > > Using a simple testcase based around a large number of N and run on an > aarch64 juno board, with the patch in use, the runtime reduced to 0.8 of > it's original time. > > This patch caused binary differences in three spec2006 binaries on aarch64 > - 4.16.gamess, 435.gromacs and 456.hmmer. Running them on a juno board > showed no improvement or degregation in runtime. > > > In the near future I hope to submit a further patch (as PR 66558) which > optimises the case where the result is simply the index of the loop, for > example: > int condition_reduction (int *a, int min_v) > { > int last = 0; > for (int i = 0; i < N; i++) > if (a[i] < min_v) > last = i; > return last; > } > In this case a lot of the new code can be optimized away. > > I have run check for aarch64, arm and x86 and have seen no regressions.
Now comments on the patch itself. + if (code == COND_EXPR) + *v_reduc_type = COND_REDUCTION; so why not simply use COND_EXPR instead of the new v_reduc_type? + if (check_reduction && code != COND_EXPR && + vect_is_slp_reduction (loop_info, phi, def_stmt)) &&s go to the next line + /* Reduction of the max index and a reduction of the found + values. */ + epilogue_cost += add_stmt_cost (target_cost_data, 1, + vec_to_scalar, stmt_info, 0, + vect_epilogue); vec_to_scalar once isn't what the comment suggests. Instead the comment suggests twice what a regular reduction would do but I guess we can "hide" the vec_to_scalar cost and "merge" it with the broadcast. Thus make the above two vector_stmt costs? + /* A broadcast of the max value. */ + epilogue_cost += add_stmt_cost (target_cost_data, 2, + scalar_to_vec, stmt_info, 0, + vect_epilogue); comment suggests a single broadcast. @@ -3705,7 +3764,7 @@ get_initial_def_for_induction (gimple iv_phi) the final vector of induction results: */ exit_phi = NULL; FOR_EACH_IMM_USE_FAST (use_p, imm_iter, loop_arg) - { + { gimple use_stmt = USE_STMT (use_p); if (is_gimple_debug (use_stmt)) continue; please avoid unrelated whitespace changes. + case COND_EXPR: + if (v_reduc_type == COND_REDUCTION) + { ... + /* Fall through. */ + case MIN_EXPR: case MAX_EXPR: - case COND_EXPR: aww, so we could already handle COND_EXPR reductions? How do they differ from what you add? Did you see if that path is even exercised today? + /* Create a vector of {init_value, 0, 0, 0...}. */ + vec<constructor_elt, va_gc> *v; + vec_alloc (v, nunits); + CONSTRUCTOR_APPEND_ELT (v, NULL_TREE, init_val); + if (SCALAR_FLOAT_TYPE_P (scalar_type)) + for (i = 1; i < nunits; ++i) + CONSTRUCTOR_APPEND_ELT (v, NULL_TREE, + build_real (scalar_type, dconst0)); + else + for (i = 1; i < nunits; ++i) + CONSTRUCTOR_APPEND_ELT (v, NULL_TREE, + build_int_cst (scalar_type, 0)); + init_def = build_constructor (vectype, v); you can unify the float/int case by using build_zero_cst (scalar_type). Note that you should build a vector constant instead of a constructor if init_val is a constant. The convenient way is to build the vector elements into a tree[] array and use build_vector_stat in that case. + /* Find maximum value from the vector of found indexes. */ + tree max_index = make_temp_ssa_name (index_scalar_type, NULL, ""); just use make_ssa_name (index_scalar_type); + /* Convert the vector of data to the same type as the EQ. */ + tree vec_data_cast; + if ( TYPE_UNSIGNED (index_vec_type)) + { How come it never happens the element sizes do not match? (int index type and double data type?) + /* Where the max index occured, use the value from the data vector. */ + tree vec_and = make_temp_ssa_name (index_vec_type_signed, NULL, ""); + gimple vec_and_stmt = gimple_build_assign (vec_and, BIT_AND_EXPR, + vec_compare, vec_data_cast); that is, don't you need to do some widening/shortening on the comparison result? (also what happens in the case "or all of the values"?) Definitely too much VIEW_CONVERT_MAGIC here for my taste ;) + This function also handles reduction of condition expressions, for example: + for (int i = 0; i < N; i++) + if (a[i] < value) + last = a[i]; + This is handled by vectorising the loop and creating an additional vector + containing the loop indexes for which "a[i] < value" was true. In the + function epilogue this is reduced to a single max value and then used to + index into the vector of results. I miss a comment that shows the kind of code we transform this to. "an additional vector containing the loop indexes" can't work - the vector will not be large enough ;) Naiively I would have made 'last' a vector, performing the reduction element-wise and in the epilogue reduce 'last' itself. And it looks like we are already doing that for int foo (int *a) { int val = 0; for (int i = 0; i < 1024; ++i) if (a[i] > val) val = a[i]; return val; } I must be missing something. Yes, I think we can't do index reduction yet, but pr65947-10.c is alrady handled? Stopping here. Thanks, Richard. > > Changelog: > > 2015-08-28 Alan Hayward <alan.hayw...@arm.com> > > PR tree-optimization/65947 > * tree-vect-loop.c > (vect_is_simple_reduction_1): Find condition reductions. > (vect_model_reduction_cost): Add condition reduction costs. > (get_initial_def_for_reduction): Add condition reduction initial > var. > (vect_create_epilog_for_reduction): Add condition reduction epilog. > (vectorizable_reduction): Condition reduction support. > * tree-vect-stmts.c > (vectorizable_condition): Add vect reduction arg > * doc/sourcebuild.texi (Vector-specific attributes): Document > vect_max_reduc > > testsuite/Changelog: > > PR tree-optimization/65947 > * lib/target-supports.exp > (check_effective_target_vect_max_reduc): Add. > * gcc.dg/vect/pr65947-1.c: New test. > * gcc.dg/vect/pr65947-2.c: New test. > * gcc.dg/vect/pr65947-3.c: New test. > * gcc.dg/vect/pr65947-4.c: New test. > * gcc.dg/vect/pr65947-5.c: New test. > * gcc.dg/vect/pr65947-6.c: New test. > * gcc.dg/vect/pr65947-7.c: New test. > * gcc.dg/vect/pr65947-8.c: New test. > * gcc.dg/vect/pr65947-9.c: New test. > * gcc.dg/vect/pr65947-10.c: New test. > * gcc.dg/vect/pr65947-11.c: New test. > > > > Thanks, > Alan > >