On Wed, May 18, 2016 at 5:06 PM, Bin.Cheng <amker.ch...@gmail.com> wrote: > On Tue, May 17, 2016 at 12:08 PM, Richard Biener > <richard.guent...@gmail.com> wrote: >> On Mon, May 16, 2016 at 10:09 AM, Bin.Cheng <amker.ch...@gmail.com> wrote: >>> On Fri, May 13, 2016 at 5:53 PM, Richard Biener >>> <richard.guent...@gmail.com> wrote: >>>> On May 13, 2016 6:02:27 PM GMT+02:00, Bin Cheng <bin.ch...@arm.com> wrote: >>>>>Hi, >>>>>As PR69848 reported, GCC vectorizer now generates comparison outside of >>>>>VEC_COND_EXPR for COND_REDUCTION case, as below: >>>>> >>>>> _20 = vect__1.6_8 != { 0, 0, 0, 0 }; >>>>> vect_c_2.8_16 = VEC_COND_EXPR <_20, { 0, 0, 0, 0 }, vect_c_2.7_13>; >>>>> _21 = VEC_COND_EXPR <_20, ivtmp_17, _19>; >>>>> >>>>>This results in inefficient expanding. With IR like: >>>>> >>>>>vect_c_2.8_16 = VEC_COND_EXPR <vect__1.6_8 != { 0, 0, 0, 0 }, { 0, 0, >>>>>0, 0 }, vect_c_2.7_13>; >>>>> _21 = VEC_COND_EXPR <vect__1.6_8 != { 0, 0, 0, 0 }, ivtmp_17, _19>; >>>>> >>>>>We can do: >>>>>1) Expanding time optimization, for example, reverting comparison >>>>>operator by switching VEC_COND_EXPR operands. This is useful when >>>>>backend only supports some comparison operators. >>>>>2) For backend not supporting vcond_mask patterns, saving one LT_EXPR >>>>>instruction which introduced by expand_vec_cond_expr. >>>>> >>>>>This patch fixes this by propagating comparison into VEC_COND_EXPR even >>>>>if it's used multiple times. For now, GCC does single_use_only >>>>>propagation. Ideally, we may duplicate the comparison before each use >>>>>statement just before expanding, so that TER can successfully backtrack >>>>>it from each VEC_COND_EXPR. Unfortunately I didn't find a good pass to >>>>>do this. Tree-vect-generic.c looks like a good candidate, but it's so >>>>>early that following CSE could undo the transform. Another possible >>>>>fix is to generate comparison inside VEC_COND_EXPR directly in function >>>>>vectorizable_reduction. >>>> >>>> I prefer this for now. >>> Hi Richard, you mean this patch, or the possible fix before your comment? >> >> The possible fix before my comment - make the vectorizer generate >> VEC_COND_EXPRs >> with embedded comparison. > Hi, > Here is updated patch doing that. It's definitely clearer than the > original version. > Bootstrap and test on x86_64. Also checked the expanding time > optimization still happens. Is it OK?
Yes. Thanks, Richard. > Thanks, > bin >> >> Thanks, >> Richard. >>