On 09/21/2016 02:52 AM, Bin.Cheng wrote:
On Wed, Sep 14, 2016 at 5:43 PM, Jeff Law <l...@redhat.com> wrote:
On 09/14/2016 07:21 AM, Richard Biener wrote:

On Tue, Sep 6, 2016 at 8:52 PM, Bin Cheng <bin.ch...@arm.com> wrote:

Hi,
This is the main patch improving the control flow graph for vectorized
loops.  It largely rewrites the loop peeling code in the vectorizer.  As
described in the patch, for a typical loop to be vectorized like:

       preheader:
     LOOP:
       header_bb:
         loop_body
         if (exit_loop_cond) goto exit_bb
         else                goto header_bb
       exit_bb:

This patch peels a prolog and an epilog from the loop and adds guards that
skip PROLOG and EPILOG under various conditions.  As a result, the changed
CFG would look like:

       guard_bb_1:
         if (prefer_scalar_loop) goto merge_bb_1
         else                    goto guard_bb_2

       guard_bb_2:
         if (skip_prolog) goto merge_bb_2
         else             goto prolog_preheader

       prolog_preheader:
     PROLOG:
       prolog_header_bb:
         prolog_body
         if (exit_prolog_cond) goto prolog_exit_bb
         else                  goto prolog_header_bb
       prolog_exit_bb:

       merge_bb_2:

       vector_preheader:
     VECTOR LOOP:
       vector_header_bb:
         vector_body
         if (exit_vector_cond) goto vector_exit_bb
         else                  goto vector_header_bb
       vector_exit_bb:

       guard_bb_3:
         if (skip_epilog) goto merge_bb_3
         else             goto epilog_preheader

       merge_bb_1:

       epilog_preheader:
     EPILOG:
       epilog_header_bb:
         epilog_body
         if (exit_epilog_cond) goto merge_bb_3
         else                  goto epilog_header_bb

       merge_bb_3:


Note this patch peels the prolog and epilog only when necessary, and adds
the corresponding guard conditions/branches only when they are needed.  Also
the first guard/branch could be further improved by merging it with loop
versioning.
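
For readers who find the CFG easier to follow at source level, here is a
hand-written C sketch of the same shape.  It is only an illustration, not
what the vectorizer emits: VF, MIN_VEC_ITERS, the alignment model used for
prolog_iters, and the function names are all assumptions made up for this
example, and the "vector" body is just VF scalar adds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VF 4             /* hypothetical vectorization factor */
#define MIN_VEC_ITERS 8  /* hypothetical profitability threshold */

/* The original scalar LOOP, kept as a reference.  */
long sum_scalar(const int *a, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Hand-peeled version following the guarded CFG.  */
long sum_peeled(const int *a, size_t n)
{
    long s = 0;
    size_t i = 0;
    /* Iterations needed to reach a VF-element boundary (assumed model).  */
    size_t prolog_iters = (VF - ((uintptr_t)a / sizeof(int)) % VF) % VF;

    assert(prolog_iters < VF);
    if (n == 0)               /* the CFG tests after the body: assumes n >= 1 */
        return 0;

    if (n < MIN_VEC_ITERS)    /* guard_bb_1: prefer_scalar_loop */
        goto merge_bb_1;

    if (prolog_iters != 0) {  /* guard_bb_2: PROLOG is skipped when this is 0 */
        do {                  /* PROLOG */
            s += a[i++];
        } while (i < prolog_iters);
    }

    do {                      /* VECTOR LOOP: VF elements per iteration */
        s += (long)a[i] + a[i + 1] + a[i + 2] + a[i + 3];
        i += VF;
    } while (i + VF <= n);

    if (i == n)               /* guard_bb_3: skip_epilog */
        goto merge_bb_3;

merge_bb_1:
    do {                      /* EPILOG */
        s += a[i++];
    } while (i < n);

merge_bb_3:
    return s;
}
```

In this sketch only guard_bb_1 and guard_bb_2 sit on the path to the vector
loop; the short-trip-count case is sent straight to the epilog loop, which
then acts as the scalar loop.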

Before this patch, up to 4 branch instructions need to be executed before
the vectorized loop is reached in the worst case, while the number is
reduced to 2 with this patch.  The patch also does better compile-time
analysis to avoid unnecessary peeling/branching.
From the implementation's point of view, the vectorizer needs to update
induction variables and iteration bounds along with control flow changes.
Unfortunately, the code also becomes much harder to follow because the
slpeel_* functions update SSA themselves, rather than using the update_ssa
interface.
This patch tries to factor out SSA/IV/Niter_bound changes from CFG changes.
This should make the implementation easier to read, and I think it may be a
step toward replacing the slpeel_* functions with generic GIMPLE loop copy
interfaces, as Richard suggested.
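
To make the "iteration bounds" part concrete, the arithmetic that has to
stay consistent across the three loops can be sketched as below.  This is a
standalone illustration under assumed names (split_niters, struct
niters_split), not the vectorizer's own code or data structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record of how the scalar iterations are distributed.  */
struct niters_split {
    size_t prolog;  /* scalar iterations run in PROLOG */
    size_t vector;  /* iterations of the VECTOR LOOP (vf elements each) */
    size_t epilog;  /* scalar iterations left for EPILOG */
};

/* niters: scalar iteration count of the original loop.
   prolog_iters: iterations peeled in front, e.g. for alignment.
   vf: vectorization factor.  */
struct niters_split split_niters(size_t niters, size_t prolog_iters, size_t vf)
{
    struct niters_split r;
    size_t rest;

    assert(vf > 0);
    /* The prolog can never run more iterations than the loop has.  */
    r.prolog = prolog_iters < niters ? prolog_iters : niters;
    rest = niters - r.prolog;
    r.vector = rest / vf;
    r.epilog = rest % vf;
    /* Invariant: every scalar iteration is executed exactly once.  */
    assert(r.prolog + r.vector * vf + r.epilog == niters);
    return r;
}
```

For example, with niters = 10, prolog_iters = 1 and vf = 4, the split is 1
prolog iteration, 2 vector iterations and 1 epilog iteration.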


I've skimmed over the patch and it looks reasonable to me.

Thanks.  I was maybe 15% of the way through the main patch.  Nothing that
gave me cause for concern, but I wasn't ready to ACK it myself yet.
Hi Jeff,
Any update on this one?  Well, it might conflict with the epilogue
vectorization patch set?
I considered Richi's message an ACK for the patch. Sorry if I wasn't clear about that.

This patch may conflict with the epilogue vectorization patch set, but the epilogue vectorization work seems to have stalled, so let's have yours go in now.

Jeff
