Re: [PATCH GCC 6/9]Simplify control flow graph for vectorized loop

Jeff Law Wed, 28 Sep 2016 09:18:44 -0700

On 09/21/2016 02:52 AM, Bin.Cheng wrote:

On Wed, Sep 14, 2016 at 5:43 PM, Jeff Law <l...@redhat.com> wrote:

On 09/14/2016 07:21 AM, Richard Biener wrote:


On Tue, Sep 6, 2016 at 8:52 PM, Bin Cheng <bin.ch...@arm.com> wrote:


Hi,
This is the main patch improving the control flow graph for vectorized
loops.  It largely rewrites the loop peeling code in the vectorizer.  As
described in the patch, for a typical loop to be vectorized like:

       preheader:
     LOOP:
       header_bb:
         loop_body
         if (exit_loop_cond) goto exit_bb
         else                goto header_bb
       exit_bb:

This patch peels a prolog and an epilog from the loop and adds guards that
skip PROLOG and EPILOG under various conditions.  As a result, the changed
CFG would look like:

       guard_bb_1:
         if (prefer_scalar_loop) goto merge_bb_1
         else                    goto guard_bb_2

       guard_bb_2:
         if (skip_prolog) goto merge_bb_2
         else             goto prolog_preheader

       prolog_preheader:
     PROLOG:
       prolog_header_bb:
         prolog_body
         if (exit_prolog_cond) goto prolog_exit_bb
         else                  goto prolog_header_bb
       prolog_exit_bb:

       merge_bb_2:

       vector_preheader:
     VECTOR LOOP:
       vector_header_bb:
         vector_body
         if (exit_vector_cond) goto vector_exit_bb
         else                  goto vector_header_bb
       vector_exit_bb:

       guard_bb_3:
         if (skip_epilog) goto merge_bb_3
         else             goto epilog_preheader

       merge_bb_1:

       epilog_preheader:
     EPILOG:
       epilog_header_bb:
         epilog_body
         if (exit_epilog_cond) goto merge_bb_3
         else                  goto epilog_header_bb

       merge_bb_3:


Note that this patch peels the prolog and epilog only if necessary, and
adds the different guard conditions/branches accordingly.  The first
guard/branch could also be further improved by merging it with loop
versioning.
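
For readers who find the block diagram hard to follow, the guarded CFG
above can be approximated at source level.  This is only a rough sketch,
not code from the patch: the function name sum_peeled, the constant VF,
the 2 * VF threshold and the alignment-based prolog count are all
assumptions made for illustration, and the "vector" body is plain scalar
code standing in for one vector operation.

```c
#include <stddef.h>
#include <stdint.h>

enum { VF = 4 }; /* hypothetical vectorization factor (elements per vector) */

/* Sums a[0..n-1]; block names in the comments refer to the CFG above. */
long sum_peeled(const int *a, size_t n)
{
    long sum = 0;
    size_t i = 0;

    /* guard_bb_1: prefer_scalar_loop when there are too few iterations
       (the 2 * VF threshold is an assumption of this sketch). */
    if (n >= 2 * VF) {
        /* Iterations to peel so that &a[i] reaches a VF-element boundary. */
        size_t misalign = ((uintptr_t)a / sizeof(int)) % VF;
        size_t prolog_iters = (VF - misalign) % VF;

        /* guard_bb_2: skip_prolog when nothing needs peeling. */
        if (prolog_iters != 0) {
            do {                      /* PROLOG */
                sum += a[i];
                i++;
            } while (i < prolog_iters);
        }

        /* VECTOR LOOP: no further guard is needed, because n >= 2 * VF and
           prolog_iters < VF leave at least VF elements to process. */
        do {
            sum += (long)a[i] + a[i + 1] + a[i + 2] + a[i + 3];
            i += VF;
        } while (i + VF <= n);

        /* guard_bb_3: skip_epilog when no iterations remain. */
        if (i == n)
            return sum;               /* merge_bb_3 */
    }

    /* merge_bb_1 followed by EPILOG.  A while loop is used here (rather
       than the do-while shape of the CFG) so that n == 0 is safe. */
    while (i < n) {
        sum += a[i];
        i++;
    }
    return sum;                       /* merge_bb_3 */
}
```

In this sketch only the two tests corresponding to guard_bb_1 and
guard_bb_2 are evaluated before the vector loop is entered.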

Before this patch, up to 4 branch instructions need to be executed before
the vectorized loop is reached in the worst case; with this patch the
number is reduced to 2.  The patch also does better compile-time analysis
to avoid unnecessary peeling/branching.
From the implementation's point of view, the vectorizer needs to update
induction variables and iteration bounds along with control flow changes.
Unfortunately, it also becomes much harder to follow because the slpeel_*
functions update SSA by themselves, rather than using the update_ssa
interface.  This patch tries to factor out SSA/IV/Niter_bound changes from
CFG changes.  This should make the implementation easier to read, and I
think it may be a step toward replacing the slpeel_* functions with generic
GIMPLE loop copy interfaces, as Richard suggested.



I've skimmed over the patch and it looks reasonable to me.


Thanks.  I was maybe 15% of the way through the main patch.  Nothing that
gave me cause for concern, but I wasn't ready to ACK it myself yet.

Hi Jeff,
Any update on this one?  Well, it might conflict with the epilogue
vectorization patch set?

I considered Richi's message an ACK for the patch. Sorry if I wasn't clear about that.

This patch may conflict with the epilogue vectorization patch set, but the epilogue vectorization work seems to have stalled, so let's have yours go in now.


Jeff
