On Tue, 15 Nov 2016, Yuri Rumyantsev wrote:

> Hi All,
> 
> Here is patch for non-masked epilogue vectoriziation.
> 
> Bootstrap and regression testing did not show any new failures.
> 
> Is it OK for trunk?

Ok for trunk.

I believe we ultimatively want to remove the new
--param and enable this by default with a better cost model.
What immediately comes to my mind when seeing

+  if (epilogue)
+    {
+       unsigned int vector_sizes
+         = targetm.vectorize.autovectorize_vector_sizes ();
+       vector_sizes &= current_vector_size - 1;
+
+       if (!PARAM_VALUE (PARAM_VECT_EPILOGUES_NOMASK))
+         epilogue = NULL;
+       else if (!vector_sizes)
+         epilogue = NULL;
+       else if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
+                && LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) >= 0)
+         {
+           int smallest_vec_size = 1 << ctz_hwi (vector_sizes);
+           int ratio = current_vector_size / smallest_vec_size;
+           int eiters = LOOP_VINFO_INT_NITERS (loop_vinfo)
+             - LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo);
+           eiters = eiters % vf;
+
+           epilogue->nb_iterations_upper_bound = eiters - 1;
+
+           if (eiters < vf / ratio)
+             epilogue = NULL;
+           }

is that we have targetm.vectorize.preferred_simd_mode which
for example with -mprefer-avx128 will first try with 128bit
vectorization.  So if a ! prefered vector size ends up
creating the epilogue we know vectorizing with the prefered
size will fail.  The above also does not take into account
peeling for gaps in which case we know the epilogue runs at
least VF times (but the vectorized epilogue might also need
an epilogue itself if not masked).

The natural next step is the masked epilogue support, after
that the merged masked epilogue one.

Thanks,
Richard.

> Thanks.
> Changelog:
> 
> 2016-11-15  Yuri Rumyantsev  <ysrum...@gmail.com>
> 
> * params.def (PARAM_VECT_EPILOGUES_NOMASK): New.
> * tree-if-conv.c (tree_if_conversion): Make public.
> * * tree-if-conv.h: New file.
> * tree-vect-data-refs.c (vect_analyze_data_ref_dependences) Avoid
> dynamic alias checks for epilogues.
> * tree-vect-loop-manip.c (vect_do_peeling): Return created epilog.
> * tree-vect-loop.c: include tree-if-conv.h.
> (new_loop_vec_info): Add zeroing orig_loop_info field.
> (vect_analyze_loop_2): Don't try to enhance alignment for epilogues.
> (vect_analyze_loop): Add argument ORIG_LOOP_INFO which is not NULL
> if epilogue is vectorized, set up orig_loop_info field of loop_vinfo
> using passed argument.
> (vect_transform_loop): Check if created epilogue should be returned
> for further vectorization with less vf.  If-convert epilogue if
> required. Print vectorization success for epilogue.
> * tree-vectorizer.c (vectorize_loops): Add epilogue vectorization
> if it is required, pass loop_vinfo produced during vectorization of
> loop body to vect_analyze_loop.
> * tree-vectorizer.h (struct _loop_vec_info): Add new field
> orig_loop_info.
> (LOOP_VINFO_ORIG_LOOP_INFO): New.
> (LOOP_VINFO_EPILOGUE_P): New.
> (LOOP_VINFO_ORIG_VECT_FACTOR): New.
> (vect_do_peeling): Change prototype to return epilogue.
> (vect_analyze_loop): Add argument of loop_vec_info type.
> (vect_transform_loop): Return created loop.
> 
> gcc/testsuite/
> 
> * lib/target-supports.exp (check_avx2_hw_available): New.
> (check_effective_target_avx2_runtime): New.
> * gcc.dg/vect/vect-tail-nomask-1.c: New test.
> 
> 
> 2016-11-14 20:04 GMT+03:00 Richard Biener <rguent...@suse.de>:
> > On November 14, 2016 4:39:40 PM GMT+01:00, Yuri Rumyantsev 
> > <ysrum...@gmail.com> wrote:
> >>Richard,
> >>
> >>I checked one of the tests designed for epilogue vectorization using
> >>patches 1 - 3 and found out that build compiler performs vectorization
> >>of epilogues with --param vect-epilogues-nomask=1 passed:
> >>
> >>$ gcc -Ofast -mavx2 t1.c -S --param vect-epilogues-nomask=1 -o
> >>t1.new-nomask.s -fdump-tree-vect-details
> >>$ grep VECTORIZED -c t1.c.156t.vect
> >>4
> >> Without param only 2 loops are vectorized.
> >>
> >>Should I simply add a part of tests related to this feature or I must
> >>delete all not necessary changes also?
> >
> > Please remove all not necessary changes.
> >
> > Richard.
> >
> >>Thanks.
> >>Yuri.
> >>
> >>2016-11-14 16:40 GMT+03:00 Richard Biener <rguent...@suse.de>:
> >>> On Mon, 14 Nov 2016, Yuri Rumyantsev wrote:
> >>>
> >>>> Richard,
> >>>>
> >>>> In my previous patch I forgot to remove couple lines related to aux
> >>field.
> >>>> Here is the correct updated patch.
> >>>
> >>> Yeah, I noticed.  This patch would be ok for trunk (together with
> >>> necessary parts from 1 and 2) if all not required parts are removed
> >>> (and you'd add the testcases covering non-masked tail vect).
> >>>
> >>> Thus, can you please produce a single complete patch containing only
> >>> non-masked epilogue vectoriziation?
> >>>
> >>> Thanks,
> >>> Richard.
> >>>
> >>>> Thanks.
> >>>> Yuri.
> >>>>
> >>>> 2016-11-14 15:51 GMT+03:00 Richard Biener <rguent...@suse.de>:
> >>>> > On Fri, 11 Nov 2016, Yuri Rumyantsev wrote:
> >>>> >
> >>>> >> Richard,
> >>>> >>
> >>>> >> I prepare updated 3 patch with passing additional argument to
> >>>> >> vect_analyze_loop as you proposed (untested).
> >>>> >>
> >>>> >> You wrote:
> >>>> >> tw, I wonder if you can produce a single patch containing just
> >>>> >> epilogue vectorization, that is combine patches 1-3 but rip out
> >>>> >> changes only needed by later patches?
> >>>> >>
> >>>> >> Did you mean that I exclude all support for vectorization
> >>epilogues,
> >>>> >> i.e. exclude from 2-nd patch all non-related changes
> >>>> >> like
> >>>> >>
> >>>> >> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> >>>> >> index 11863af..32011c1 100644
> >>>> >> --- a/gcc/tree-vect-loop.c
> >>>> >> +++ b/gcc/tree-vect-loop.c
> >>>> >> @@ -1120,6 +1120,12 @@ new_loop_vec_info (struct loop *loop)
> >>>> >>    LOOP_VINFO_PEELING_FOR_GAPS (res) = false;
> >>>> >>    LOOP_VINFO_PEELING_FOR_NITER (res) = false;
> >>>> >>    LOOP_VINFO_OPERANDS_SWAPPED (res) = false;
> >>>> >> +  LOOP_VINFO_CAN_BE_MASKED (res) = false;
> >>>> >> +  LOOP_VINFO_REQUIRED_MASKS (res) = 0;
> >>>> >> +  LOOP_VINFO_COMBINE_EPILOGUE (res) = false;
> >>>> >> +  LOOP_VINFO_MASK_EPILOGUE (res) = false;
> >>>> >> +  LOOP_VINFO_NEED_MASKING (res) = false;
> >>>> >> +  LOOP_VINFO_ORIG_LOOP_INFO (res) = NULL;
> >>>> >
> >>>> > Yes.
> >>>> >
> >>>> >> Did you mean also that new combined patch must be working patch,
> >>i.e.
> >>>> >> can be integrated without other patches?
> >>>> >
> >>>> > Yes.
> >>>> >
> >>>> >> Could you please look at updated patch?
> >>>> >
> >>>> > Will do.
> >>>> >
> >>>> > Thanks,
> >>>> > Richard.
> >>>> >
> >>>> >> Thanks.
> >>>> >> Yuri.
> >>>> >>
> >>>> >> 2016-11-10 15:36 GMT+03:00 Richard Biener <rguent...@suse.de>:
> >>>> >> > On Thu, 10 Nov 2016, Richard Biener wrote:
> >>>> >> >
> >>>> >> >> On Tue, 8 Nov 2016, Yuri Rumyantsev wrote:
> >>>> >> >>
> >>>> >> >> > Richard,
> >>>> >> >> >
> >>>> >> >> > Here is updated 3 patch.
> >>>> >> >> >
> >>>> >> >> > I checked that all new tests related to epilogue
> >>vectorization passed with it.
> >>>> >> >> >
> >>>> >> >> > Your comments will be appreciated.
> >>>> >> >>
> >>>> >> >> A lot better now.  Instead of the ->aux dance I now prefer to
> >>>> >> >> pass the original loops loop_vinfo to vect_analyze_loop as
> >>>> >> >> optional argument (if non-NULL we analyze the epilogue of that
> >>>> >> >> loop_vinfo).  OTOH I remember we mainly use it to get at the
> >>>> >> >> original vectorization factor?  So we can pass down an
> >>(optional)
> >>>> >> >> forced vectorization factor as well?
> >>>> >> >
> >>>> >> > Btw, I wonder if you can produce a single patch containing just
> >>>> >> > epilogue vectorization, that is combine patches 1-3 but rip out
> >>>> >> > changes only needed by later patches?
> >>>> >> >
> >>>> >> > Thanks,
> >>>> >> > Richard.
> >>>> >> >
> >>>> >> >> Richard.
> >>>> >> >>
> >>>> >> >> > 2016-11-08 15:38 GMT+03:00 Richard Biener
> >><rguent...@suse.de>:
> >>>> >> >> > > On Thu, 3 Nov 2016, Yuri Rumyantsev wrote:
> >>>> >> >> > >
> >>>> >> >> > >> Hi Richard,
> >>>> >> >> > >>
> >>>> >> >> > >> I did not understand your last remark:
> >>>> >> >> > >>
> >>>> >> >> > >> > That is, here (and avoid the FOR_EACH_LOOP change):
> >>>> >> >> > >> >
> >>>> >> >> > >> > @@ -580,12 +586,21 @@ vectorize_loops (void)
> >>>> >> >> > >> >           && dump_enabled_p ())
> >>>> >> >> > >> >           dump_printf_loc (MSG_OPTIMIZED_LOCATIONS,
> >>vect_location,
> >>>> >> >> > >> >                            "loop vectorized\n");
> >>>> >> >> > >> > -       vect_transform_loop (loop_vinfo);
> >>>> >> >> > >> > +       new_loop = vect_transform_loop (loop_vinfo);
> >>>> >> >> > >> >         num_vectorized_loops++;
> >>>> >> >> > >> >        /* Now that the loop has been vectorized, allow
> >>it to be unrolled
> >>>> >> >> > >> >           etc.  */
> >>>> >> >> > >> >      loop->force_vectorize = false;
> >>>> >> >> > >> >
> >>>> >> >> > >> > +       /* Add new loop to a processing queue.  To make
> >>it easier
> >>>> >> >> > >> > +          to match loop and its epilogue vectorization
> >>in dumps
> >>>> >> >> > >> > +          put new loop as the next loop to process.
> >>*/
> >>>> >> >> > >> > +       if (new_loop)
> >>>> >> >> > >> > +         {
> >>>> >> >> > >> > +           loops.safe_insert (i + 1, new_loop->num);
> >>>> >> >> > >> > +           vect_loops_num = number_of_loops (cfun);
> >>>> >> >> > >> > +         }
> >>>> >> >> > >> >
> >>>> >> >> > >> > simply dispatch to a vectorize_epilogue (loop_vinfo,
> >>new_loop)
> >>>> >> >> > >> f> unction which will set up stuff properly (and also
> >>perform
> >>>> >> >> > >> > the if-conversion of the epilogue there).
> >>>> >> >> > >> >
> >>>> >> >> > >> > That said, if we can get in non-masked epilogue
> >>vectorization
> >>>> >> >> > >> > separately that would be great.
> >>>> >> >> > >>
> >>>> >> >> > >> Could you please clarify your proposal.
> >>>> >> >> > >
> >>>> >> >> > > When a loop was vectorized set things up to immediately
> >>vectorize
> >>>> >> >> > > its epilogue, avoiding changing the loop iteration and
> >>avoiding
> >>>> >> >> > > the re-use of ->aux.
> >>>> >> >> > >
> >>>> >> >> > > Richard.
> >>>> >> >> > >
> >>>> >> >> > >> Thanks.
> >>>> >> >> > >> Yuri.
> >>>> >> >> > >>
> >>>> >> >> > >> 2016-11-02 15:27 GMT+03:00 Richard Biener
> >><rguent...@suse.de>:
> >>>> >> >> > >> > On Tue, 1 Nov 2016, Yuri Rumyantsev wrote:
> >>>> >> >> > >> >
> >>>> >> >> > >> >> Hi All,
> >>>> >> >> > >> >>
> >>>> >> >> > >> >> I re-send all patches sent by Ilya earlier for review
> >>which support
> >>>> >> >> > >> >> vectorization of loop epilogues and loops with low
> >>trip count. We
> >>>> >> >> > >> >> assume that the only patch -
> >>vec-tails-07-combine-tail.patch - was not
> >>>> >> >> > >> >> approved by Jeff.
> >>>> >> >> > >> >>
> >>>> >> >> > >> >> I did re-base of all patches and performed
> >>bootstrapping and
> >>>> >> >> > >> >> regression testing that did not show any new failures.
> >>Also all
> >>>> >> >> > >> >> changes related to new vect_do_peeling algorithm have
> >>been changed
> >>>> >> >> > >> >> accordingly.
> >>>> >> >> > >> >>
> >>>> >> >> > >> >> Is it OK for trunk?
> >>>> >> >> > >> >
> >>>> >> >> > >> > I would have prefered that the series up to
> >>-03-nomask-tails would
> >>>> >> >> > >> > _only_ contain epilogue loop vectorization changes but
> >>unfortunately
> >>>> >> >> > >> > the patchset is oddly separated.
> >>>> >> >> > >> >
> >>>> >> >> > >> > I have a comment on that part nevertheless:
> >>>> >> >> > >> >
> >>>> >> >> > >> > @@ -1608,7 +1614,10 @@ vect_enhance_data_refs_alignment
> >>(loop_vec_info
> >>>> >> >> > >> > loop_vinfo)
> >>>> >> >> > >> >    /* Check if we can possibly peel the loop.  */
> >>>> >> >> > >> >    if (!vect_can_advance_ivs_p (loop_vinfo)
> >>>> >> >> > >> >        || !slpeel_can_duplicate_loop_p (loop,
> >>single_exit (loop))
> >>>> >> >> > >> > -      || loop->inner)
> >>>> >> >> > >> > +      || loop->inner
> >>>> >> >> > >> > +      /* Required peeling was performed in prologue
> >>and
> >>>> >> >> > >> > +        is not required for epilogue.  */
> >>>> >> >> > >> > +      || LOOP_VINFO_EPILOGUE_P (loop_vinfo))
> >>>> >> >> > >> >      do_peeling = false;
> >>>> >> >> > >> >
> >>>> >> >> > >> >    if (do_peeling
> >>>> >> >> > >> > @@ -1888,7 +1897,10 @@ vect_enhance_data_refs_alignment
> >>(loop_vec_info
> >>>> >> >> > >> > loop_vinfo)
> >>>> >> >> > >> >
> >>>> >> >> > >> >    do_versioning =
> >>>> >> >> > >> >         optimize_loop_nest_for_speed_p (loop)
> >>>> >> >> > >> > -       && (!loop->inner); /* FORNOW */
> >>>> >> >> > >> > +       && (!loop->inner) /* FORNOW */
> >>>> >> >> > >> > +        /* Required versioning was performed for the
> >>>> >> >> > >> > +          original loop and is not required for
> >>epilogue.  */
> >>>> >> >> > >> > +       && !LOOP_VINFO_EPILOGUE_P (loop_vinfo);
> >>>> >> >> > >> >
> >>>> >> >> > >> >    if (do_versioning)
> >>>> >> >> > >> >      {
> >>>> >> >> > >> >
> >>>> >> >> > >> > please do that check in the single caller of this
> >>function.
> >>>> >> >> > >> >
> >>>> >> >> > >> > Otherwise I still dislike the new ->aux use and I
> >>believe that simply
> >>>> >> >> > >> > passing down info from the processed parent would be
> >>_much_ cleaner.
> >>>> >> >> > >> > That is, here (and avoid the FOR_EACH_LOOP change):
> >>>> >> >> > >> >
> >>>> >> >> > >> > @@ -580,12 +586,21 @@ vectorize_loops (void)
> >>>> >> >> > >> >             && dump_enabled_p ())
> >>>> >> >> > >> >            dump_printf_loc (MSG_OPTIMIZED_LOCATIONS,
> >>vect_location,
> >>>> >> >> > >> >                             "loop vectorized\n");
> >>>> >> >> > >> > -       vect_transform_loop (loop_vinfo);
> >>>> >> >> > >> > +       new_loop = vect_transform_loop (loop_vinfo);
> >>>> >> >> > >> >         num_vectorized_loops++;
> >>>> >> >> > >> >         /* Now that the loop has been vectorized, allow
> >>it to be unrolled
> >>>> >> >> > >> >            etc.  */
> >>>> >> >> > >> >         loop->force_vectorize = false;
> >>>> >> >> > >> >
> >>>> >> >> > >> > +       /* Add new loop to a processing queue.  To make
> >>it easier
> >>>> >> >> > >> > +          to match loop and its epilogue vectorization
> >>in dumps
> >>>> >> >> > >> > +          put new loop as the next loop to process.
> >>*/
> >>>> >> >> > >> > +       if (new_loop)
> >>>> >> >> > >> > +         {
> >>>> >> >> > >> > +           loops.safe_insert (i + 1, new_loop->num);
> >>>> >> >> > >> > +           vect_loops_num = number_of_loops (cfun);
> >>>> >> >> > >> > +         }
> >>>> >> >> > >> >
> >>>> >> >> > >> > simply dispatch to a vectorize_epilogue (loop_vinfo,
> >>new_loop)
> >>>> >> >> > >> > function which will set up stuff properly (and also
> >>perform
> >>>> >> >> > >> > the if-conversion of the epilogue there).
> >>>> >> >> > >> >
> >>>> >> >> > >> > That said, if we can get in non-masked epilogue
> >>vectorization
> >>>> >> >> > >> > separately that would be great.
> >>>> >> >> > >> >
> >>>> >> >> > >> > I'm still torn about all the rest of the stuff and
> >>question its
> >>>> >> >> > >> > usability (esp. merging the epilogue with the main
> >>vector loop).
> >>>> >> >> > >> > But it has already been approved ... oh well.
> >>>> >> >> > >> >
> >>>> >> >> > >> > Thanks,
> >>>> >> >> > >> > Richard.
> >>>> >> >> > >>
> >>>> >> >> > >>
> >>>> >> >> > >
> >>>> >> >> > > --
> >>>> >> >> > > Richard Biener <rguent...@suse.de>
> >>>> >> >> > > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard,
> >>Graham Norton, HRB 21284 (AG Nuernberg)
> >>>> >> >> >
> >>>> >> >>
> >>>> >> >>
> >>>> >> >
> >>>> >> > --
> >>>> >> > Richard Biener <rguent...@suse.de>
> >>>> >> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham
> >>Norton, HRB 21284 (AG Nuernberg)
> >>>> >>
> >>>> >
> >>>> > --
> >>>> > Richard Biener <rguent...@suse.de>
> >>>> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham
> >>Norton, HRB 21284 (AG Nuernberg)
> >>>>
> >>>
> >>> --
> >>> Richard Biener <rguent...@suse.de>
> >>> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham
> >>Norton, HRB 21284 (AG Nuernberg)
> >
> >
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)

Reply via email to