On 30/09/2015 13:45, "Richard Biener" <richard.guent...@gmail.com> wrote:
>On Wed, Sep 23, 2015 at 5:51 PM, Alan Hayward <alan.hayw...@arm.com> >wrote: >> >> >> On 18/09/2015 14:53, "Alan Hayward" <alan.hayw...@arm.com> wrote: >> >>> >>> >>>On 18/09/2015 14:26, "Alan Lawrence" <alan.lawre...@arm.com> wrote: >>> >>>>On 18/09/15 13:17, Richard Biener wrote: >>>>> >>>>> Ok, I see. >>>>> >>>>> That this case is already vectorized is because it implements >>>>>MAX_EXPR, >>>>> modifying it slightly to >>>>> >>>>> int foo (int *a) >>>>> { >>>>> int val = 0; >>>>> for (int i = 0; i < 1024; ++i) >>>>> if (a[i] > val) >>>>> val = a[i] + 1; >>>>> return val; >>>>> } >>>>> >>>>> makes it no longer handled by current code. >>>>> >>>> >>>>Yes. I believe the idea for the patch is to handle arbitrary >>>>expressions >>>>like >>>> >>>>int foo (int *a) >>>>{ >>>> int val = 0; >>>> for (int i = 0; i < 1024; ++i) >>>> if (some_expression (i)) >>>> val = another_expression (i); >>>> return val; >>>>} >>> >>>Yes, that’s correct. Hopefully my new test cases should cover >>>everything. >>> >> >> Attached is a new version of the patch containing all the changes >> requested by Richard. > >+ /* Compare the max index vector to the vector of found indexes to >find >+ the postion of the max value. This will result in either a >single >+ match or all of the values. */ >+ tree vec_compare = make_ssa_name (index_vec_type_signed); >+ gimple vec_compare_stmt = gimple_build_assign (vec_compare, >EQ_EXPR, >+ induction_index, >+ max_index_vec); > >I'm not sure all targets can handle this. If I deciper the code >correctly then we do > > mask = induction_index == max_index_vec; > vec_and = mask & vec_data; > >plus some casts. So this is basically > > vec_and = induction_index == max_index_vec ? vec_data : {0, 0, ... }; > >without the need to relate the induction index vector type to the data >vector type. >I believe this is also the form all targets support. Ok, I’ll replace this. > >I am missing a comment before all this code-generation that shows the >transform >result with the variable names used in the code-gen. I have a hard >time connecting >things here. Ok, I’ll add some comments. > >+ tree matched_data_reduc_cast = build1 (VIEW_CONVERT_EXPR, >scalar_type, >+ matched_data_reduc); >+ epilog_stmt = gimple_build_assign (new_scalar_dest, >+ matched_data_reduc_cast); >+ new_temp = make_ssa_name (new_scalar_dest, epilog_stmt); >+ gimple_assign_set_lhs (epilog_stmt, new_temp); > >this will leave the stmt unsimplified. scalar sign-changes should use >NOP_EXPR, >not VIEW_CONVERT_EXPR. The easiest fix is to use fold_convert instead. >Also just do like before - first make_ssa_name and then directly use it >in the >gimple_build_assign. We need the VIEW_CONVERT_EXPR for the cases where we have float data values. The index is always integer. > >The patch is somewhat hard to parse with all the indentation changes. A >context >diff would be much easier to read in those contexts. Ok, I’ll make the next patch like that > >+ if (v_reduc_type == COND_REDUCTION) >+ { >+ widest_int ni; >+ >+ if (! max_loop_iterations (loop, &ni)) >+ { >+ if (dump_enabled_p ()) >+ dump_printf_loc (MSG_NOTE, vect_location, >+ "loop count not known, cannot create cond " >+ "reduction.\n"); > >ugh. That's bad. > >+ /* The additional index will be the same type as the condition. >Check >+ that the loop can fit into this less one (because we'll use up >the >+ zero slot for when there are no matches). */ >+ tree max_index = TYPE_MAX_VALUE (cr_index_scalar_type); >+ if (wi::geu_p (ni, wi::to_widest (max_index))) >+ { >+ if (dump_enabled_p ()) >+ dump_printf_loc (MSG_NOTE, vect_location, >+ "loop size is greater than data size.\n"); >+ return false; > >Likewise. We could do better if we made the index type larger. But as a first implementation of this optimisation, I didn’t want to overcomplicate things more. > >@@ -5327,6 +5540,8 @@ vectorizable_reduction (gimple stmt, >gimple_stmt_iterator *gsi, > if (dump_enabled_p ()) > dump_printf_loc (MSG_NOTE, vect_location, "transform reduction.\n"); > >+ STMT_VINFO_TYPE (stmt_info) = reduc_vec_info_type; >+ > /* FORNOW: Multiple types are not supported for condition. */ > if (code == COND_EXPR) > >this change looks odd (or wrong). The type should be _only_ set/changed >during >analysis. The problem is, for COND_EXPRs, this function calls vectorizable_condition(), which sets STMT_VINFO_TYPE to condition_vec_info_type. Therefore we need something to restore it back to reduc_vec_info_type on the non-analysis call. I considered setting STMT_VINFO_TYPE to reduc_vec_info_type directly after the call to vectorizable_condition(), but that looked worse. I could back up the value of STMT_VINFO_TYPE before calling vectorizable_condition() and then restore it after? I think that’ll look a lot better. > >+ >+ /* For cond reductions we need to add an additional conditional >based on >+ the loop index. */ >+ if (v_reduc_type == COND_REDUCTION) >+ { >+ int nunits_out = TYPE_VECTOR_SUBPARTS (vectype_out); >+ int k; >... >+ STMT_VINFO_VECTYPE (index_vec_info) = cr_index_vector_type; >+ set_vinfo_for_stmt (index_condition, index_vec_info); >+ >+ /* Update the phi with the vec cond. */ >+ add_phi_arg (new_phi, cond_name, loop_latch_edge (loop), >+ UNKNOWN_LOCATION); > >same as before - I am missing a comment that shows the generated code >and connects >the local vars used. Ok, I’ll add something > > >+ tree ccompare_name = make_ssa_name (TREE_TYPE (ccompare)); >+ gimple ccompare_stmt = gimple_build_assign (ccompare_name, >ccompare); >+ gsi_insert_before (&vec_stmt_gsi, ccompare_stmt, GSI_SAME_STMT); >+ gimple_assign_set_rhs1 (*vec_stmt, ccompare_name); > >hum - are you sure this works with ncopies > 1? Will it use the >correct vec_stmt? We don’t support this when ncopies >1. In vectorizable_reduction(): if ((double_reduc || v_reduc_type == COND_REDUCTION) && ncopies > 1) { if (dump_enabled_p ()) dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, "multiple types in double reduction or condition " "reduction.\n"); return false; } > >I still dislike the v_reduc_type plastered and passed everywhere. Can >you explore >adding the reduction kind to stmt_info? Ok, I can do that. Thanks for the comments. I’ll put together a patch with the above changes. Thanks, Alan.