Re: [PATCH v6 0/21]middle-end: Support early break/return auto-vectorization

2023-11-28 Thread Richard Biener
On Mon, 27 Nov 2023, Richard Sandiford wrote:

> Catching up on backlog, so this might already be resolved, but:
> 
> Richard Biener  writes:
> > On Tue, 7 Nov 2023, Tamar Christina wrote:
> >
> >> > -Original Message-
> >> > From: Richard Biener 
> >> > Sent: Tuesday, November 7, 2023 9:43 AM
> >> > To: Tamar Christina 
> >> > Cc: gcc-patches@gcc.gnu.org; nd 
> >> > Subject: RE: [PATCH v6 0/21]middle-end: Support early break/return auto-
> >> > vectorization
> >> > 
> >> > On Mon, 6 Nov 2023, Tamar Christina wrote:
> >> > 
> >> > > > -Original Message-
> >> > > > From: Richard Biener 
> >> > > > Sent: Monday, November 6, 2023 2:25 PM
> >> > > > To: Tamar Christina 
> >> > > > Cc: gcc-patches@gcc.gnu.org; nd 
> >> > > > Subject: Re: [PATCH v6 0/21]middle-end: Support early break/return
> >> > > > auto- vectorization
> >> > > >
> >> > > > On Mon, 6 Nov 2023, Tamar Christina wrote:
> >> > > >
> >> > > > > Hi All,
> >> > > > >
> >> > > > > This patch adds initial support for early break vectorization in 
> >> > > > > GCC.
> >> > > > > The support is added for any target that implements a vector
> >> > > > > cbranch optab, this includes both fully masked and non-masked 
> >> > > > > targets.
> >> > > > >
> >> > > > > Depending on the operation, the vectorizer may also require
> >> > > > > support for boolean mask reductions using Inclusive OR.  This is
> >> > > > > however only checked when the comparison would produce multiple
> >> > statements.
> >> > > > >
> >> > > > > Note: I am currently struggling to get patch 7 correct in all
> >> > > > > cases and could
> >> > > > use
> >> > > > >   some feedback there.
> >> > > > >
> >> > > > > Concretely the kind of loops supported are of the forms:
> >> > > > >
> >> > > > >  for (int i = 0; i < N; i++)
> >> > > > >  {
> >> > > > >    <statements1>
> >> > > > >    if (<condition>)
> >> > > > >      {
> >> > > > >        ...
> >> > > > >        <action>;
> >> > > > >      }
> >> > > > >    <statements2>
> >> > > > >  }
> >> > > > >
> >> > > > > where <action> can be:
> >> > > > >  - break
> >> > > > >  - return
> >> > > > >  - goto
> >> > > > >
> >> > > > > Any number of statements can be used before the <action> occurs.
> >> > > > >
> >> > > > > Since this is an initial version for GCC 14 it has the following
> >> > > > > limitations and
> >> > > > > features:
> >> > > > >
> >> > > > > - Only fixed sized iterations and buffers are supported.  That is to say any
> >> > > > >   vectors loaded or stored must be to statically allocated arrays with known
> >> > > > >   sizes. N must also be known.  This limitation is because our primary target
> >> > > > >   for this optimization is SVE.  For VLA SVE we can't easily do cross page
> >> > > > >   iteration checks. The result is likely to also not be beneficial. For that
> >> > > > >   reason we punt support for variable buffers till we have First-Faulting
> >> > > > >   support in GCC.
> >> > 
> >> > Btw, for this I wonder if you thought about marking memory accesses 
> >> > required
> >> > for the early break condition as required to be vector-size aligned, 
> >> > thus peeling
> >> > or versioning them for alignment?  That should ensure they do not fault.
> >> > 
> >> > OTOH I somehow remember prologue peeling isn't supported for early break
> >> > vectorization?  ..
> >> > 
> >&g

Re: [PATCH v6 0/21]middle-end: Support early break/return auto-vectorization

2023-11-27 Thread Richard Sandiford
Catching up on backlog, so this might already be resolved, but:

Richard Biener  writes:
> On Tue, 7 Nov 2023, Tamar Christina wrote:
>
>> > -Original Message-
>> > From: Richard Biener 
>> > Sent: Tuesday, November 7, 2023 9:43 AM
>> > To: Tamar Christina 
>> > Cc: gcc-patches@gcc.gnu.org; nd 
>> > Subject: RE: [PATCH v6 0/21]middle-end: Support early break/return auto-
>> > vectorization
>> > 
>> > On Mon, 6 Nov 2023, Tamar Christina wrote:
>> > 
>> > > > -Original Message-
>> > > > From: Richard Biener 
>> > > > Sent: Monday, November 6, 2023 2:25 PM
>> > > > To: Tamar Christina 
>> > > > Cc: gcc-patches@gcc.gnu.org; nd 
>> > > > Subject: Re: [PATCH v6 0/21]middle-end: Support early break/return
>> > > > auto- vectorization
>> > > >
>> > > > On Mon, 6 Nov 2023, Tamar Christina wrote:
>> > > >
>> > > > > Hi All,
>> > > > >
>> > > > > This patch adds initial support for early break vectorization in GCC.
>> > > > > The support is added for any target that implements a vector
>> > > > > cbranch optab, this includes both fully masked and non-masked 
>> > > > > targets.
>> > > > >
>> > > > > Depending on the operation, the vectorizer may also require
>> > > > > support for boolean mask reductions using Inclusive OR.  This is
>> > > > > however only checked when the comparison would produce multiple
>> > statements.
>> > > > >
>> > > > > Note: I am currently struggling to get patch 7 correct in all
>> > > > > cases and could
>> > > > use
>> > > > >   some feedback there.
>> > > > >
>> > > > > Concretely the kind of loops supported are of the forms:
>> > > > >
>> > > > >  for (int i = 0; i < N; i++)
>> > > > >  {
>> > > > >    <statements1>
>> > > > >    if (<condition>)
>> > > > >      {
>> > > > >        ...
>> > > > >        <action>;
>> > > > >      }
>> > > > >    <statements2>
>> > > > >  }
>> > > > >
>> > > > > where <action> can be:
>> > > > >  - break
>> > > > >  - return
>> > > > >  - goto
>> > > > >
>> > > > > Any number of statements can be used before the <action> occurs.
>> > > > >
>> > > > > Since this is an initial version for GCC 14 it has the following
>> > > > > limitations and
>> > > > > features:
>> > > > >
>> > > > > - Only fixed sized iterations and buffers are supported.  That is to say any
>> > > > >   vectors loaded or stored must be to statically allocated arrays with known
>> > > > >   sizes. N must also be known.  This limitation is because our primary target
>> > > > >   for this optimization is SVE.  For VLA SVE we can't easily do cross page
>> > > > >   iteration checks. The result is likely to also not be beneficial. For that
>> > > > >   reason we punt support for variable buffers till we have First-Faulting
>> > > > >   support in GCC.
>> > 
>> > Btw, for this I wonder if you thought about marking memory accesses 
>> > required
>> > for the early break condition as required to be vector-size aligned, thus 
>> > peeling
>> > or versioning them for alignment?  That should ensure they do not fault.
>> > 
>> > OTOH I somehow remember prologue peeling isn't supported for early break
>> > vectorization?  ..
>> > 
>> > > > > - any stores in <statements1> should not be to the same objects as in
>> > > > >   <condition>.  Loads are fine as long as they don't have the
>> > > > >   possibility to alias.  More concretely, we block RAW dependencies
>> > > > >   when the intermediate value can't be separated from the store, or
>> > > > >   the store itself can't be moved.
>> > > > > - Prologue

RE: [PATCH v6 0/21]middle-end: Support early break/return auto-vectorization

2023-11-07 Thread Richard Biener
On Tue, 7 Nov 2023, Tamar Christina wrote:

> > -Original Message-
> > From: Richard Biener 
> > Sent: Tuesday, November 7, 2023 9:43 AM
> > To: Tamar Christina 
> > Cc: gcc-patches@gcc.gnu.org; nd 
> > Subject: RE: [PATCH v6 0/21]middle-end: Support early break/return auto-
> > vectorization
> > 
> > On Mon, 6 Nov 2023, Tamar Christina wrote:
> > 
> > > > -Original Message-
> > > > From: Richard Biener 
> > > > Sent: Monday, November 6, 2023 2:25 PM
> > > > To: Tamar Christina 
> > > > Cc: gcc-patches@gcc.gnu.org; nd 
> > > > Subject: Re: [PATCH v6 0/21]middle-end: Support early break/return
> > > > auto- vectorization
> > > >
> > > > On Mon, 6 Nov 2023, Tamar Christina wrote:
> > > >
> > > > > Hi All,
> > > > >
> > > > > This patch adds initial support for early break vectorization in GCC.
> > > > > The support is added for any target that implements a vector
> > > > > cbranch optab, this includes both fully masked and non-masked targets.
> > > > >
> > > > > Depending on the operation, the vectorizer may also require
> > > > > support for boolean mask reductions using Inclusive OR.  This is
> > > > > > however only checked when the comparison would produce multiple
> > statements.
> > > > >
> > > > > Note: I am currently struggling to get patch 7 correct in all
> > > > > cases and could
> > > > use
> > > > >   some feedback there.
> > > > >
> > > > > Concretely the kind of loops supported are of the forms:
> > > > >
> > > > > >  for (int i = 0; i < N; i++)
> > > > > >  {
> > > > > >    <statements1>
> > > > > >    if (<condition>)
> > > > > >      {
> > > > > >        ...
> > > > > >        <action>;
> > > > > >      }
> > > > > >    <statements2>
> > > > > >  }
> > > > > >
> > > > > > where <action> can be:
> > > > > >  - break
> > > > > >  - return
> > > > > >  - goto
> > > > > >
> > > > > > Any number of statements can be used before the <action> occurs.
> > > > >
> > > > > Since this is an initial version for GCC 14 it has the following
> > > > > limitations and
> > > > > features:
> > > > >
> > > > > > - Only fixed sized iterations and buffers are supported.  That is to say any
> > > > > >   vectors loaded or stored must be to statically allocated arrays with known
> > > > > >   sizes. N must also be known.  This limitation is because our primary target
> > > > > >   for this optimization is SVE.  For VLA SVE we can't easily do cross page
> > > > > >   iteration checks. The result is likely to also not be beneficial. For that
> > > > > >   reason we punt support for variable buffers till we have First-Faulting
> > > > > >   support in GCC.
> > 
> > Btw, for this I wonder if you thought about marking memory accesses required
> > for the early break condition as required to be vector-size aligned, thus 
> > peeling
> > or versioning them for alignment?  That should ensure they do not fault.
> > 
> > OTOH I somehow remember prologue peeling isn't supported for early break
> > vectorization?  ..
> > 
> > > > > > - any stores in <statements1> should not be to the same objects as in
> > > > > >   <condition>.  Loads are fine as long as they don't have the
> > > > > >   possibility to alias.  More concretely, we block RAW dependencies
> > > > > >   when the intermediate value can't be separated from the store, or
> > > > > >   the store itself can't be moved.
> > > > > > - Prologue peeling, alignment peeling and loop versioning are
> > > > > >   supported.
> > 
> > .. but here you say it is.  Not sure if peeling for alignment works for VLA 
> > vectors
> > though.  Just to say x86 doesn't support first-faulting loads.
> 
> For VLA we support it through masking.  i.e. if you need to peel N iterations, we
> generate a masked copy of the loop vectorized which masks off the first N bits.
> 
> This is not typically needed, but we do support it.  But the problem with this
> schem

RE: [PATCH v6 0/21]middle-end: Support early break/return auto-vectorization

2023-11-07 Thread Tamar Christina
> -Original Message-
> From: Richard Biener 
> Sent: Tuesday, November 7, 2023 9:43 AM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd 
> Subject: RE: [PATCH v6 0/21]middle-end: Support early break/return auto-
> vectorization
> 
> On Mon, 6 Nov 2023, Tamar Christina wrote:
> 
> > > -Original Message-
> > > From: Richard Biener 
> > > Sent: Monday, November 6, 2023 2:25 PM
> > > To: Tamar Christina 
> > > Cc: gcc-patches@gcc.gnu.org; nd 
> > > Subject: Re: [PATCH v6 0/21]middle-end: Support early break/return
> > > auto- vectorization
> > >
> > > On Mon, 6 Nov 2023, Tamar Christina wrote:
> > >
> > > > Hi All,
> > > >
> > > > This patch adds initial support for early break vectorization in GCC.
> > > > The support is added for any target that implements a vector
> > > > cbranch optab, this includes both fully masked and non-masked targets.
> > > >
> > > > Depending on the operation, the vectorizer may also require
> > > > support for boolean mask reductions using Inclusive OR.  This is
> > > > > however only checked when the comparison would produce multiple
> statements.
> > > >
> > > > Note: I am currently struggling to get patch 7 correct in all
> > > > cases and could
> > > use
> > > >   some feedback there.
> > > >
> > > > Concretely the kind of loops supported are of the forms:
> > > >
> > > > >  for (int i = 0; i < N; i++)
> > > > >  {
> > > > >    <statements1>
> > > > >    if (<condition>)
> > > > >      {
> > > > >        ...
> > > > >        <action>;
> > > > >      }
> > > > >    <statements2>
> > > > >  }
> > > > >
> > > > > where <action> can be:
> > > > >  - break
> > > > >  - return
> > > > >  - goto
> > > > >
> > > > > Any number of statements can be used before the <action> occurs.
> > > >
> > > > Since this is an initial version for GCC 14 it has the following
> > > > limitations and
> > > > features:
> > > >
> > > > > - Only fixed sized iterations and buffers are supported.  That is to say any
> > > > >   vectors loaded or stored must be to statically allocated arrays with known
> > > > >   sizes. N must also be known.  This limitation is because our primary target
> > > > >   for this optimization is SVE.  For VLA SVE we can't easily do cross page
> > > > >   iteration checks. The result is likely to also not be beneficial. For that
> > > > >   reason we punt support for variable buffers till we have First-Faulting
> > > > >   support in GCC.
> 
> Btw, for this I wonder if you thought about marking memory accesses required
> for the early break condition as required to be vector-size aligned, thus 
> peeling
> or versioning them for alignment?  That should ensure they do not fault.
> 
> OTOH I somehow remember prologue peeling isn't supported for early break
> vectorization?  ..
> 
> > > > > - any stores in <statements1> should not be to the same objects as in
> > > > >   <condition>.  Loads are fine as long as they don't have the
> > > > >   possibility to alias.  More concretely, we block RAW dependencies
> > > > >   when the intermediate value can't be separated from the store, or
> > > > >   the store itself can't be moved.
> > > > > - Prologue peeling, alignment peeling and loop versioning are
> > > > >   supported.
> 
> .. but here you say it is.  Not sure if peeling for alignment works for VLA 
> vectors
> though.  Just to say x86 doesn't support first-faulting loads.

For VLA we support it through masking, i.e. if you need to peel N iterations, we
generate a masked copy of the vectorized loop which masks off the first N bits.

This is not typically needed, but we do support it.  But the problem with this
scheme and early break is obviously that the peeled loop needs to be vectorized,
so you kinda end up with the same issue again.  So at the moment it is rejected
for VLA.
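
To make the masking scheme concrete, here is a minimal scalar sketch of peeling by masking.  It is only a model under assumptions: the lane count VF, the helper name masked_sum and the summing loop body are invented for illustration and are not taken from the patch.  Instead of running a scalar prologue, the first "vector" iteration starts at the aligned block and simply leaves the first `skip` lanes inactive.

```c
#define VF 4  /* assumed number of lanes per vector, for illustration */

/* Hypothetical scalar model of a fully masked loop whose first vector
   iteration starts at an aligned block and masks off the first `skip`
   lanes instead of running a scalar prologue.  `n` counts all elements
   from the aligned block start, including the skipped ones.  */
static unsigned masked_sum(const unsigned *block, int skip, int n)
{
  unsigned sum = 0;
  for (int i = 0; i < n; i += VF)
    for (int lane = 0; lane < VF; lane++)
      {
        int idx = i + lane;
        /* A lane is active only past the peeled prefix and inside the array.  */
        int active = idx >= skip && idx < n;
        if (active)
          sum += block[idx];
      }
  return sum;
}
```

The same predicate handles both the peeled prefix and the loop tail, which is why no separate prologue loop is needed in this model.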

Regards,
Tamar

> 
> > > > - Fully masked loops, unmasked loops and partially masked loops
> > > > are supported
> > > > - Any number of loop early exits are supported.
> > > > - No support for epilogue vectorization.  The only epilogue supported is
> the
> > > >   scalar final one.  Peeling code supports it but the code moti

RE: [PATCH v6 0/21]middle-end: Support early break/return auto-vectorization

2023-11-07 Thread Richard Biener
On Mon, 6 Nov 2023, Tamar Christina wrote:

> > -Original Message-
> > From: Richard Biener 
> > Sent: Monday, November 6, 2023 2:25 PM
> > To: Tamar Christina 
> > Cc: gcc-patches@gcc.gnu.org; nd 
> > Subject: Re: [PATCH v6 0/21]middle-end: Support early break/return auto-
> > vectorization
> > 
> > On Mon, 6 Nov 2023, Tamar Christina wrote:
> > 
> > > Hi All,
> > >
> > > This patch adds initial support for early break vectorization in GCC.
> > > The support is added for any target that implements a vector cbranch
> > > optab, this includes both fully masked and non-masked targets.
> > >
> > > Depending on the operation, the vectorizer may also require support
> > > for boolean mask reductions using Inclusive OR.  This is however only
> > > > checked when the comparison would produce multiple statements.
> > >
> > > Note: I am currently struggling to get patch 7 correct in all cases and 
> > > could
> > use
> > >   some feedback there.
> > >
> > > Concretely the kind of loops supported are of the forms:
> > >
> > > >  for (int i = 0; i < N; i++)
> > > >  {
> > > >    <statements1>
> > > >    if (<condition>)
> > > >      {
> > > >        ...
> > > >        <action>;
> > > >      }
> > > >    <statements2>
> > > >  }
> > > >
> > > > where <action> can be:
> > > >  - break
> > > >  - return
> > > >  - goto
> > > >
> > > > Any number of statements can be used before the <action> occurs.
> > >
> > > Since this is an initial version for GCC 14 it has the following
> > > limitations and
> > > features:
> > >
> > > > - Only fixed sized iterations and buffers are supported.  That is to say any
> > > >   vectors loaded or stored must be to statically allocated arrays with known
> > > >   sizes. N must also be known.  This limitation is because our primary target
> > > >   for this optimization is SVE.  For VLA SVE we can't easily do cross page
> > > >   iteration checks. The result is likely to also not be beneficial. For that
> > > >   reason we punt support for variable buffers till we have First-Faulting
> > > >   support in GCC.

Btw, for this I wonder if you thought about marking memory accesses
required for the early break condition as required to be vector-size
aligned, thus peeling or versioning them for alignment?  That should
ensure they do not fault.
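
The reasoning behind the alignment suggestion can be sketched as follows.  This is only an illustration under assumptions: the page size, the vector size and the helper names same_page and peel_count are made up here, and the page size is assumed to be a multiple of the vector size.  A vector-size aligned access never straddles a page boundary, so if the scalar loop would have touched the first element, the whole vector load stays within a mapped page.

```c
#include <stdint.h>

#define PAGE_SIZE 4096u  /* assumed page size */
#define VEC_BYTES 16u    /* assumed vector size in bytes */

/* Returns 1 if the access [addr, addr + VEC_BYTES) stays within one page.  */
static int same_page(uintptr_t addr)
{
  return (addr / PAGE_SIZE) == ((addr + VEC_BYTES - 1) / PAGE_SIZE);
}

/* Number of scalar iterations to peel so that the next access is
   vector-size aligned.  Assumes addr is a multiple of elem_bytes.  */
static unsigned peel_count(uintptr_t addr, unsigned elem_bytes)
{
  uintptr_t misalign = addr % VEC_BYTES;
  return misalign ? (unsigned) ((VEC_BYTES - misalign) / elem_bytes) : 0;
}
```

An unaligned access such as one starting 8 bytes before a page boundary fails the same_page check, which is the case peeling or versioning for alignment would rule out.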

OTOH I somehow remember prologue peeling isn't supported for early
break vectorization?  ..

> > > > - any stores in <statements1> should not be to the same objects as in
> > > >   <condition>.  Loads are fine as long as they don't have the possibility
> > > >   to alias.  More concretely, we block RAW dependencies when the
> > > >   intermediate value can't be separated from the store, or the store
> > > >   itself can't be moved.
> > > > - Prologue peeling, alignment peeling and loop versioning are supported.

.. but here you say it is.  Not sure if peeling for alignment works for
VLA vectors though.  Just to say x86 doesn't support first-faulting
loads.

> > > - Fully masked loops, unmasked loops and partially masked loops are
> > > supported
> > > - Any number of loop early exits are supported.
> > > - No support for epilogue vectorization.  The only epilogue supported is 
> > > the
> > >   scalar final one.  Peeling code supports it but the code motion code 
> > > cannot
> > >   find instructions to make the move in the epilog.
> > > - Early breaks are only supported for inner loop vectorization.
> > >
> > > I have pushed a branch to refs/users/tnfchris/heads/gcc-14-early-break
> > >
> > > With the help of IPA and LTO this still gets hit quite often.  During
> > > bootstrap it hit rather frequently.  Additionally TSVC s332, s481 and
> > > s482 all pass now since these are tests for support for early exit
> > vectorization.
> > >
> > > This implementation does not support completely handling the early
> > > break inside the vector loop itself but instead supports adding checks
> > > such that if we know that we have to exit in the current iteration
> > > then we branch to scalar code to actually do the final VF iterations which
> > handles all the code in <action>.
> > >
> > > For the scalar loop we know that whatever exit you take you have to
> > > > perform at most VF iterations.  For vector code we only care about the
> > > state of fully performed iteration and reset the sca

RE: [PATCH v6 0/21]middle-end: Support early break/return auto-vectorization

2023-11-06 Thread Tamar Christina
> -Original Message-
> From: Richard Biener 
> Sent: Monday, November 6, 2023 2:25 PM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd 
> Subject: Re: [PATCH v6 0/21]middle-end: Support early break/return auto-
> vectorization
> 
> On Mon, 6 Nov 2023, Tamar Christina wrote:
> 
> > Hi All,
> >
> > This patch adds initial support for early break vectorization in GCC.
> > The support is added for any target that implements a vector cbranch
> > optab, this includes both fully masked and non-masked targets.
> >
> > Depending on the operation, the vectorizer may also require support
> > for boolean mask reductions using Inclusive OR.  This is however only
> > checked when the comparison would produce multiple statements.
> >
> > Note: I am currently struggling to get patch 7 correct in all cases and 
> > could
> use
> >   some feedback there.
> >
> > Concretely the kind of loops supported are of the forms:
> >
> >  for (int i = 0; i < N; i++)
> >  {
> >    <statements1>
> >    if (<condition>)
> >      {
> >        ...
> >        <action>;
> >      }
> >    <statements2>
> >  }
> >
> > where <action> can be:
> >  - break
> >  - return
> >  - goto
> >
> > Any number of statements can be used before the <action> occurs.
> >
> > Since this is an initial version for GCC 14 it has the following
> > limitations and
> > features:
> >
> > - Only fixed sized iterations and buffers are supported.  That is to say any
> >   vectors loaded or stored must be to statically allocated arrays with known
> >   sizes. N must also be known.  This limitation is because our primary target
> >   for this optimization is SVE.  For VLA SVE we can't easily do cross page
> >   iteration checks. The result is likely to also not be beneficial. For that
> >   reason we punt support for variable buffers till we have First-Faulting
> >   support in GCC.
> > - any stores in <statements1> should not be to the same objects as in
> >   <condition>.  Loads are fine as long as they don't have the possibility to
> >   alias.  More concretely, we block RAW dependencies when the intermediate
> >   value can't be separated from the store, or the store itself can't be moved.
> > - Prologue peeling, alignment peeling and loop versioning are supported.
> > - Fully masked loops, unmasked loops and partially masked loops are
> > supported
> > - Any number of loop early exits are supported.
> > - No support for epilogue vectorization.  The only epilogue supported is the
> >   scalar final one.  Peeling code supports it but the code motion code 
> > cannot
> >   find instructions to make the move in the epilog.
> > - Early breaks are only supported for inner loop vectorization.
> >
> > I have pushed a branch to refs/users/tnfchris/heads/gcc-14-early-break
> >
> > With the help of IPA and LTO this still gets hit quite often.  During
> > bootstrap it hit rather frequently.  Additionally TSVC s332, s481 and
> > s482 all pass now since these are tests for support for early exit
> vectorization.
> >
> > This implementation does not support completely handling the early
> > break inside the vector loop itself but instead supports adding checks
> > such that if we know that we have to exit in the current iteration
> > then we branch to scalar code to actually do the final VF iterations which
> handles all the code in <action>.
> >
> > For the scalar loop we know that whatever exit you take you have to
> > perform at most VF iterations.  For vector code we only care about the
> > state of fully performed iteration and reset the scalar code to the 
> > (partially)
> remaining loop.
> >
> > That is to say, the first vector loop executes so long as the early
> > exit isn't needed.  Once the exit is taken, the scalar code will
> > perform at most VF extra iterations.  The exact number depends on peeling
> and iteration start and which
> > exit was taken (natural or early).   For this scalar loop, all early exits 
> > are
> > treated the same.
> >
> > When we vectorize we move any statement not related to the early break
> > itself and that would be incorrect to execute before the break (i.e.
> > has side effects) to after the break.  If this is not possible we decline to
> vectorize.
> >
> > This means that we check at the start of iterations whether we are
> > going to exit or not.  During the analysis phase we check whether we
> > are allowed to do this moving of statements.  Also note that we only
> > move the scalar statements,

Re: [PATCH v6 0/21]middle-end: Support early break/return auto-vectorization

2023-11-06 Thread Richard Biener
On Mon, 6 Nov 2023, Tamar Christina wrote:

> Hi All,
> 
> This patch adds initial support for early break vectorization in GCC.
> The support is added for any target that implements a vector cbranch optab,
> this includes both fully masked and non-masked targets.
> 
> Depending on the operation, the vectorizer may also require support for 
> boolean
> mask reductions using Inclusive OR.  This is however only checked when the
> comparison would produce multiple statements.
> 
> Note: I am currently struggling to get patch 7 correct in all cases and could 
> use
>   some feedback there.
> 
> Concretely the kind of loops supported are of the forms:
> 
>  for (int i = 0; i < N; i++)
>  {
>    <statements1>
>    if (<condition>)
>      {
>        ...
>        <action>;
>      }
>    <statements2>
>  }
> 
> where <action> can be:
>  - break
>  - return
>  - goto
> 
> Any number of statements can be used before the <action> occurs.
> 
> Since this is an initial version for GCC 14 it has the following limitations 
> and
> features:
> 
> - Only fixed sized iterations and buffers are supported.  That is to say any
>   vectors loaded or stored must be to statically allocated arrays with known
>   sizes. N must also be known.  This limitation is because our primary target
>   for this optimization is SVE.  For VLA SVE we can't easily do cross page
>   iteration checks. The result is likely to also not be beneficial. For that
>   reason we punt support for variable buffers till we have First-Faulting
>   support in GCC.
> - any stores in <statements1> should not be to the same objects as in
>   <condition>.  Loads are fine as long as they don't have the possibility to
>   alias.  More concretely, we block RAW dependencies when the intermediate
>   value can't be separated from the store, or the store itself can't be moved.
> - Prologue peeling, alignment peeling and loop versioning are supported.
> - Fully masked loops, unmasked loops and partially masked loops are supported
> - Any number of loop early exits are supported.
> - No support for epilogue vectorization.  The only epilogue supported is the
>   scalar final one.  Peeling code supports it but the code motion code cannot
>   find instructions to make the move in the epilog.
> - Early breaks are only supported for inner loop vectorization.
> 
> I have pushed a branch to refs/users/tnfchris/heads/gcc-14-early-break
> 
> With the help of IPA and LTO this still gets hit quite often.  During 
> bootstrap
> it hit rather frequently.  Additionally TSVC s332, s481 and s482 all pass now
> since these are tests for support for early exit vectorization.
> 
> This implementation does not support completely handling the early break 
> inside
> the vector loop itself but instead supports adding checks such that if we know
> that we have to exit in the current iteration then we branch to scalar code to
> actually do the final VF iterations which handles all the code in <action>.
> 
> For the scalar loop we know that whatever exit you take you have to perform at
> most VF iterations.  For vector code we only care about the state of fully
> performed iteration and reset the scalar code to the (partially) remaining 
> loop.
> 
> That is to say, the first vector loop executes so long as the early exit isn't
> needed.  Once the exit is taken, the scalar code will perform at most VF extra
> iterations.  The exact number depends on peeling and iteration start and 
> which
> exit was taken (natural or early).   For this scalar loop, all early exits are
> treated the same.
> 
> When we vectorize we move any statement not related to the early break itself
> and that would be incorrect to execute before the break (i.e. has side 
> effects)
> to after the break.  If this is not possible we decline to vectorize.
> 
> This means that we check at the start of iterations whether we are going to 
> exit
> or not.  During the analysis phase we check whether we are allowed to do this
> moving of statements.  Also note that we only move the scalar statements, but
> only do so after peeling but just before we start transforming statements.
> 
> Codegen:
> 
> for e.g.
> 
> #define N 803
> unsigned vect_a[N];
> unsigned vect_b[N];
> 
> unsigned test4(unsigned x)
> {
>  unsigned ret = 0;
>  for (int i = 0; i < N; i++)
>  {
>    vect_b[i] = x + i;
>    if (vect_a[i] > x)
>      break;
>    vect_a[i] = x;
> 
>  }
>  return ret;
> }
> 
> We generate for Adv. SIMD:
> 
> test4:
>         adrp    x2, .LC0
>         adrp    x3, .LANCHOR0
>         dup     v2.4s, w0
>         add     x3, x3, :lo12:.LANCHOR0
>         movi    v4.4s, 0x4
>         add     x4, x3, 3216
>         ldr     q1, [x2, #:lo12:.LC0]
>         mov     x1, 0
>         mov     w2, 0
>         .p2align 3,,7
> .L3:
>         ldr     q0, [x3, x1]
>         add     v3.4s, v1.4s, v2.4s
>         add     v1.4s, v1.4s, v4.4s
>         cmhi    v0.4s, v0.4s, v2.4s
>         umaxp   v0.4s, v0.4s, v0.4s
>         fmov    x5, d0
>         cbnz    x5, .L6
>         add     w2, w2, 1
>         str     q3, [x1, x4]
>         str

[PATCH v6 0/21]middle-end: Support early break/return auto-vectorization

2023-11-05 Thread Tamar Christina
Hi All,

This patch adds initial support for early break vectorization in GCC.
The support is added for any target that implements a vector cbranch optab,
this includes both fully masked and non-masked targets.

Depending on the operation, the vectorizer may also require support for boolean
mask reductions using Inclusive OR.  This is however only checked when the
comparison would produce multiple statements.

Note: I am currently struggling to get patch 7 correct in all cases and could use
  some feedback there.

Concretely the kind of loops supported are of the forms:

 for (int i = 0; i < N; i++)
 {
   <statements1>
   if (<condition>)
     {
       ...
       <action>;
     }
   <statements2>
 }

where <action> can be:
 - break
 - return
 - goto

Any number of statements can be used before the <action> occurs.
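
As a concrete illustration of the loop shapes described above, the following sketch shows one loop per kind of early exit over a statically sized array.  The function names and the array are hypothetical and are not part of the patch series; whether GCC actually vectorizes each loop depends on the target and the options used.

```c
#define N 803
static unsigned a[N];

/* Early exit via break: index of the first element greater than x, or N.  */
static int first_greater_break(unsigned x)
{
  int i;
  for (i = 0; i < N; i++)
    {
      if (a[i] > x)
        break;
    }
  return i;
}

/* Early exit via return: 1 if x occurs in the array, 0 otherwise.  */
static int contains_return(unsigned x)
{
  for (int i = 0; i < N; i++)
    {
      if (a[i] == x)
        return 1;
    }
  return 0;
}

/* Early exit via goto: index of the first zero element, or -1.  */
static int first_zero_goto(void)
{
  int i;
  for (i = 0; i < N; i++)
    {
      if (a[i] == 0)
        goto found;
    }
  return -1;
found:
  return i;
}
```

All three loops use a compile-time trip count and a statically allocated array, which matches the fixed-size restriction listed for this initial version.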

Since this is an initial version for GCC 14 it has the following limitations and
features:

- Only fixed sized iterations and buffers are supported.  That is to say any
  vectors loaded or stored must be to statically allocated arrays with known
  sizes. N must also be known.  This limitation is because our primary target
  for this optimization is SVE.  For VLA SVE we can't easily do cross page
  iteration checks. The result is likely to also not be beneficial. For that
  reason we punt support for variable buffers till we have First-Faulting
  support in GCC.
- any stores in <statements1> should not be to the same objects as in
  <condition>.  Loads are fine as long as they don't have the possibility to
  alias.  More concretely, we block RAW dependencies when the intermediate value
  can't be separated from the store, or the store itself can't be moved.
- Prologue peeling, alignment peeling and loop versioning are supported.
- Fully masked loops, unmasked loops and partially masked loops are supported
- Any number of loop early exits are supported.
- No support for epilogue vectorization.  The only epilogue supported is the
  scalar final one.  Peeling code supports it but the code motion code cannot
  find instructions to make the move in the epilog.
- Early breaks are only supported for inner loop vectorization.

I have pushed a branch to refs/users/tnfchris/heads/gcc-14-early-break

With the help of IPA and LTO this still gets hit quite often.  During bootstrap
it hit rather frequently.  Additionally TSVC s332, s481 and s482 all pass now
since these are tests for support for early exit vectorization.

This implementation does not support completely handling the early break inside
the vector loop itself but instead supports adding checks such that if we know
that we have to exit in the current iteration then we branch to scalar code to
actually do the final VF iterations which handles all the code in <action>.

For the scalar loop we know that whatever exit you take you have to perform at
most VF iterations.  For vector code we only care about the state of fully
performed iteration and reset the scalar code to the (partially) remaining loop.

That is to say, the first vector loop executes so long as the early exit isn't
needed.  Once the exit is taken, the scalar code will perform at most VF extra
iterations.  The exact number depends on peeling and iteration start and which
exit was taken (natural or early).   For this scalar loop, all early exits are
treated the same.

When we vectorize we move any statement not related to the early break itself
and that would be incorrect to execute before the break (i.e. has side effects)
to after the break.  If this is not possible we decline to vectorize.

This means that we check at the start of iterations whether we are going to exit
or not.  During the analysis phase we check whether we are allowed to do this
moving of statements.  Also note that we only move the scalar statements, and
we do so after peeling but just before we start transforming statements.
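
The scheme described above can be modelled in plain C.  This is only a rough
sketch, not what the vectorizer emits: it assumes a vectorization factor of 4,
emulates each vector operation with a small lane loop, and the function names
(scalar_loop, early_break_loop, same_result) are made up for illustration.
The point is that the exit test for a whole block of VF iterations runs first,
the stores are moved after it, and the scalar loop picks up at the start of
the block that contains the exit.

```c
#include <assert.h>
#include <string.h>

#define N 803
#define VF 4 /* assumed vectorization factor (4 x 32-bit lanes) */

static unsigned vect_a[N];
static unsigned vect_b[N];

/* Reference: the loop as written in the source.  */
static int scalar_loop (unsigned x)
{
  int i;
  for (i = 0; i < N; i++)
    {
      vect_b[i] = x + i;
      if (vect_a[i] > x)
        break;
      vect_a[i] = x;
    }
  return i;
}

/* Model of the transformed loop: block-wise exit check first, stores after.  */
static int early_break_loop (unsigned x)
{
  int i = 0;
  for (; i + VF <= N; i += VF)
    {
      int any = 0;
      for (int l = 0; l < VF; l++)      /* "vector" compare + OR reduction */
        any |= vect_a[i + l] > x;
      if (any)
        break;                          /* leave this block to scalar code */
      for (int l = 0; l < VF; l++)      /* stores moved after the check */
        {
          vect_b[i + l] = x + i + l;
          vect_a[i + l] = x;
        }
    }
  /* Scalar loop: runs at most VF iterations before exiting.  */
  for (; i < N; i++)
    {
      vect_b[i] = x + i;
      if (vect_a[i] > x)
        break;
      vect_a[i] = x;
    }
  return i;
}

/* Set up vect_a so that only element HIT (if in range) exceeds X.  */
static void init (unsigned x, int hit)
{
  memset (vect_b, 0, sizeof vect_b);
  for (int i = 0; i < N; i++)
    vect_a[i] = (i == hit) ? x + 1 : 0;
}

/* Returns 1 if both versions exit at the same index with the same memory.  */
static int same_result (unsigned x, int hit)
{
  static unsigned ref_a[N], ref_b[N];
  init (x, hit);
  int ref_i = scalar_loop (x);
  memcpy (ref_a, vect_a, sizeof vect_a);
  memcpy (ref_b, vect_b, sizeof vect_b);
  init (x, hit);
  int i = early_break_loop (x);
  return i == ref_i
         && memcmp (ref_a, vect_a, sizeof vect_a) == 0
         && memcmp (ref_b, vect_b, sizeof vect_b) == 0;
}
```

Calling same_result (5, 6), for example, places the exiting element in the
middle of the second block; the model leaves that block untouched in the
"vector" loop and lets the scalar loop redo it from index 4.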

Codegen:

for e.g.

#define N 803
unsigned vect_a[N];
unsigned vect_b[N];

unsigned test4(unsigned x)
{
 unsigned ret = 0;
 for (int i = 0; i < N; i++)
 {
   vect_b[i] = x + i;
   if (vect_a[i] > x)
     break;
   vect_a[i] = x;
 }
 return ret;
}

We generate for Adv. SIMD:

test4:
        adrp    x2, .LC0
        adrp    x3, .LANCHOR0
        dup     v2.4s, w0
        add     x3, x3, :lo12:.LANCHOR0
        movi    v4.4s, 0x4
        add     x4, x3, 3216
        ldr     q1, [x2, #:lo12:.LC0]
        mov     x1, 0
        mov     w2, 0
        .p2align 3,,7
.L3:
        ldr     q0, [x3, x1]
        add     v3.4s, v1.4s, v2.4s
        add     v1.4s, v1.4s, v4.4s
        cmhi    v0.4s, v0.4s, v2.4s
        umaxp   v0.4s, v0.4s, v0.4s
        fmov    x5, d0
        cbnz    x5, .L6
        add     w2, w2, 1
        str     q3, [x1, x4]
        str     q2, [x3, x1]
        add     x1, x1, 16
        cmp     w2, 200
        bne     .L3
        mov     w7, 3
.L2:
        lsl     w2, w2, 2
        add     x5, x3, 3216
        add     w6, w2, w0
        sxtw    x4, w2
        ldr     w1, [x3, x4, lsl 2]
        str     w6, [x5, x4, lsl 2]
        cmp     w0, w1
        bcc     .L4
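
The compare, pairwise-max, move and branch sequence in .L3 (cmhi, umaxp, fmov,
cbnz) is the "does any lane want to exit" test.  A portable C model of that
sequence, as a sketch only: the v4u32 type and the helper names are
hypothetical stand-ins for a 4 x 32-bit register and the instructions, not a
real intrinsics API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a 128-bit register holding four 32-bit lanes.  */
typedef struct { uint32_t lane[4]; } v4u32;

static uint32_t max_u32 (uint32_t a, uint32_t b) { return a > b ? a : b; }

/* cmhi: per-lane unsigned "higher than"; all-ones where true, else zero.  */
static v4u32 cmhi (v4u32 a, v4u32 b)
{
  v4u32 r;
  for (int l = 0; l < 4; l++)
    r.lane[l] = a.lane[l] > b.lane[l] ? UINT32_MAX : 0;
  return r;
}

/* umaxp: maximum of adjacent lane pairs, first from A then from B.  */
static v4u32 umaxp (v4u32 a, v4u32 b)
{
  v4u32 r;
  r.lane[0] = max_u32 (a.lane[0], a.lane[1]);
  r.lane[1] = max_u32 (a.lane[2], a.lane[3]);
  r.lane[2] = max_u32 (b.lane[0], b.lane[1]);
  r.lane[3] = max_u32 (b.lane[2], b.lane[3]);
  return r;
}

/* fmov x, d: copy the low 64 bits of the register to a scalar.  */
static uint64_t low64 (v4u32 a)
{
  return (uint64_t) a.lane[0] | ((uint64_t) a.lane[1] << 32);
}

/* Nonzero result corresponds to cbnz taking the early-exit branch.  */
static int any_lane_greater (v4u32 a, v4u32 x)
{
  v4u32 mask = cmhi (a, x);
  return low64 (umaxp (mask, mask)) != 0;
}
```

After the pairwise maximum, the low 64 bits summarise all four lanes of the
mask, so a single scalar compare-and-branch is enough to decide whether the
block contains an exiting iteration.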
   

RE: middle-end: Support early break/return auto-vectorization.

2023-05-15 Thread Tamar Christina via Gcc-patches
Hi,

Yes I hope to upstream it this year.  I'm busy cleaning up a new version of the
patch and hope to send it up for review again next week if all tests pass.

Cheers,
Tamar

From: juzhe.zh...@rivai.ai 
Sent: Monday, May 15, 2023 6:20 AM
To: gcc-patches 
Cc: rguenther ; Tamar Christina ; 
Richard Sandiford 
Subject: middle-end: Support early break/return auto-vectorization.

Hi, this is a very interesting patch and I found it very beneficial after 
applying it to my downstream RVV GCC.
However, the patch has not been updated for a long time.
Is it possible that it will be refined and merged into trunk in the 
future?

Thanks

juzhe.zh...@rivai.ai


middle-end: Support early break/return auto-vectorization.

2023-05-14 Thread juzhe.zh...@rivai.ai
Hi, this is a very interesting patch and I found it very beneficial after 
applying it to my downstream RVV GCC.
However, the patch has not been updated for a long time.
Is it possible that it will be refined and merged into trunk in the 
future?

Thanks


juzhe.zh...@rivai.ai


RE: [PATCH 1/2]middle-end: Support early break/return auto-vectorization.

2022-12-14 Thread Richard Biener via Gcc-patches
On Tue, 13 Dec 2022, Tamar Christina wrote:

> Hi Richi,
> 
> This is a respin of the mid-end patch.  Changes since previous version:
>  - The mismatch in Boolean types is now fixed, and it generates an OR
>    reduction when it needs to.
>  - I've refactored things around to be a bit neater
>  - I've switched to using iterate_fix_dominators which has simplified the
>    loop peeling code a ton.
>  - I've moved the conditionals into the loop structure and use them from
>    there.
>  - I've moved the analysis part early into vect_analyze_data_ref_dependences
>  - I've switched to moving the scalar code instead of the vector code, as
>    moving vector required us to track a lot more complicated things like
>    internal functions.  It was also a lot more work when the loop is unrolled or
>    VF is increased due to unpacking.  I have verified as much as I can that we
>    don't seem to run into trouble doing this.
> 
> Outstanding things:
>   - Split off the SCEV parts from the rest of the patch (and determine the
>     "normal" exit based on the counting IV instead)
>   - Merge vectorizable_early_exit and transform_early_exit
> 
> I'm sending this patch out for you to take a look at the issue we were 
> discussing on IRC (which you can reproduce with testcase 
> gcc.dg/vect/vect-early-break_16.c)
> 
> That should be the last outstanding issue.   Meanwhile I'll finish up the 
> splitting of SCEV and merging the two functions. 
> 
> Any additional comments are appreciated. Will hopefully finish the refactoring 
> today and send out the split patch tomorrow.

Few comments inline.

As said in the earlier review I dislike the "normal_exit" notion and
that the loop machinery is in charge of deciding on it.
get_loop_exit_condition should be unnecessary - the vectorizer should
know the exit it considers normal.  The dumping should be also adjusted,
eventually to also dump the edge as %d -> %d.

> Thanks,
> Tamar
> 
> --- inline copy of patch ---
> 
> diff --git a/gcc/cfgloop.h b/gcc/cfgloop.h
> index 528b1219bc37ad8f114d5cf381c0cff899db31ee..9c7f019a51abfe2de8e1dd7135dea2463b0256a0 100644
> --- a/gcc/cfgloop.h
> +++ b/gcc/cfgloop.h
> @@ -385,6 +385,7 @@ extern basic_block *get_loop_body_in_custom_order (const 
> class loop *, void *,
>  
>  extern auto_vec get_loop_exit_edges (const class loop *, basic_block * 
> = NULL);
>  extern edge single_exit (const class loop *);
> +extern edge normal_exit (const class loop *);
>  extern edge single_likely_exit (class loop *loop, const vec &);
>  extern unsigned num_loop_branches (const class loop *);
>  
> diff --git a/gcc/cfgloop.cc b/gcc/cfgloop.cc
> index 57bf7b1855d4dd20fb3f42388124932d0ca2b48a..97a7373fb6d9514da602d5be01050f2ec66094bc 100644
> --- a/gcc/cfgloop.cc
> +++ b/gcc/cfgloop.cc
> @@ -1812,6 +1812,20 @@ single_exit (const class loop *loop)
>  return NULL;
>  }
>  
> +/* Returns the normal exit edge of LOOP, or NULL if LOOP has either no exit.
> +   If loops do not have the exits recorded, NULL is returned always.  */
> +
> +edge
> +normal_exit (const class loop *loop)
> +{
> +  struct loop_exit *exit = loop->exits->next;
> +
> +  if (!loops_state_satisfies_p (LOOPS_HAVE_RECORDED_EXITS))
> +    return NULL;
> +
> +  return exit->e;
> +}
> +
>  /* Returns true when BB has an incoming edge exiting LOOP.  */
>  
>  bool
> diff --git a/gcc/doc/loop.texi b/gcc/doc/loop.texi
> index 6e8657a074d2447db7ae9b75cbfbb71282b84287..e1de2ac40f87f879ab691f68bd41b3bc21a83bf7 100644
> --- a/gcc/doc/loop.texi
> +++ b/gcc/doc/loop.texi
> @@ -211,6 +211,10 @@ relation, and breath-first search order, respectively.
>  @item @code{single_exit}: Returns the single exit edge of the loop, or
>  @code{NULL} if the loop has more than one exit.  You can only use this
>  function if @code{LOOPS_HAVE_RECORDED_EXITS} is used.
> +function if LOOPS_HAVE_MARKED_SINGLE_EXITS property is used.
> +@item @code{normal_exit}: Returns the natural exit edge of the loop,
> +even if the loop has more than one exit.  The natural exit is the exit
> +that would normally be taken where the loop to be fully executed.
>  @item @code{get_loop_exit_edges}: Enumerates the exit edges of a loop.
>  @item @code{just_once_each_iteration_p}: Returns true if the basic block
>  is executed exactly once during each iteration of a loop (that is, it
> @@ -623,4 +627,4 @@ maximum verbosity the details of a data dependence 
> relations array,
>  @code{dump_dist_dir_vectors} prints only the classical distance and
>  direction vectors for a data dependence relations array, and
>  @code{dump_data_references} prints the details of the data references
> -contained in a data reference array.
> +contained in a data reference array
> diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
> index ffe69d6fcb9c46cf97ba570e85b56e586a0c9b99..a82c7b8f1efa01b02b772c9dd0f5b3dcde817091 100644
> --- a/gcc/doc/sourcebuild.texi
> +++ b/gcc/doc/sourcebuild.texi
> @@ -1637,6 +1637,10 @@ Target supports hardware 

Re: [PATCH 1/2]middle-end: Support early break/return auto-vectorization.

2022-11-18 Thread Richard Biener via Gcc-patches
On Wed, 2 Nov 2022, Tamar Christina wrote:

> Hi All,
> 
> This patch adds initial support for early break vectorization in GCC.
> The support is added for any target that implements a vector cbranch optab.

I'm looking at this now, first some high-level questions.

Why do we need a new cbranch optab?  It seems implementing
a vector comparison and mask test against zero suffices?

You have some elaborate explanation on how peeling works but I
somewhat miss the high-level idea how to vectorize the early
exit.  I've applied the patches and from looking at how
vect-early-break_1.c gets transformed on aarch64 it seems you
vectorize

 for (int i = 0; i < N; i++)
 {
   vect_b[i] = x + i;
   if (vect_a[i] > x)
     break;
   vect_a[i] = x;
 }

as

 for (int i = 0; i < N;)
 {
   if (any (vect_a[i] > x))
     break;
   i += VF;
   vect_b[i] = x + i;
   vect_a[i] = x;
 }
 for (; i < N; i++)
 {
   vect_b[i] = x + i;
   if (vect_a[i] > x)
     break;
   vect_a[i] = x;
 }

As you outline below this requires that the side-effects done as part
of  and  before exiting can be moved after the
exit, basically you need to be able to compute whether any scalar
iteration covered by a vector iteration will exit the loop early.
Code generation wise you'd simply "ignore" code generating early exits
at the place they appear in the scalar code and instead emit them
vectorized in the loop header.

> Concretely the kind of loops supported are of the forms:
> 
>  for (int i = 0; i < N; i++)
>  {
>
>if ()
>  ;
>
>  }
> 
> where  can be:
>  - break
>  - return
> 
> Any number of statements can be used before the  occurs.
> 
> Since this is an initial version for GCC 13 it has the following limitations:
> 
> - Only fixed sized iterations and buffers are supported.  That is to say any
>   vectors loaded or stored must be to statically allocated arrays with known
>   sizes. N must also be known.

Why?

> - any stores in  should not be to the same objects as in
>   .  Loads are fine as long as they don't have the possibility to
>   alias.

I think that's a fundamental limitation - you have to be able to compute 
the early exit condition at the beginning of the vectorized loop.  For
a single alternate exit it might be possible to apply loop rotation to
move things but that can introduce "bad" cross-iteration dependences(?)
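
A small C illustration of why this is a hard limit, as a sketch under
assumptions: the loop below is a made-up example (not from the patch or its
testsuite) in which the store of one iteration feeds the exit test of the
next, and VF is assumed to be 4.  Hoisting the block-wise exit check ahead of
the stores then reads stale values and the loop exits at the wrong iteration.

```c
#include <assert.h>

#define LEN 16
#define VF 4 /* assumed vectorization factor */

/* Reference semantics: a[i + 1] is written before a[i + 1] is tested.  */
static int scalar_ref (unsigned *a, unsigned x)
{
  int i;
  for (i = 0; i < LEN - 1; i++)
    {
      a[i + 1] = a[i] + 1;
      if (a[i] > x)
        break;
    }
  return i;
}

/* Invalid transformation: the exit test for a whole block is evaluated
   before any store of that block has happened.  */
static int hoisted_check (unsigned *a, unsigned x)
{
  int i = 0;
  for (; i + VF <= LEN - 1; i += VF)
    {
      int any = 0;
      for (int l = 0; l < VF; l++)
        any |= a[i + l] > x;            /* sees values from before the stores */
      if (any)
        break;
      for (int l = 0; l < VF; l++)
        a[i + l + 1] = a[i + l] + 1;
    }
  for (; i < LEN - 1; i++)
    {
      a[i + 1] = a[i] + 1;
      if (a[i] > x)
        break;
    }
  return i;
}
```

Starting from an all-zero array with x == 2, scalar_ref exits at i == 3 and
leaves a[5] untouched, while hoisted_check exits at i == 4 and has already
written a[5].  A dependence of this kind is what the analysis has to reject.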

> - No support for prologue peeling.  Since we only support fixed buffers this
>   wouldn't be an issue as we assume the arrays are correctly aligned.

Huh, I don't understand how prologue or epilogue peeling is an issue?  Is
that just because you didn't handle the early exit triggering?

> - Fully masked loops or unmasked loops are supported, but not partially masked
>   loops.
> - Only one additional exit is supported at this time.  The majority of the code
>   will handle n exits, but not all, so at this time this restriction is needed.
> - The early exit must be before the natural loop exit/latch.  The vectorizer is
>   designed in a way to propagate phi-nodes downwards.  As such supporting this
>   inverted control flow is hard.

How do you identify the "natural" exit?  It's the one 
number_of_iterations_exit works on?  Your normal_exit picks the
first from the loop's recorded exit list but I don't think that list
is ordered in any particular way.

"normal_exit" would rather be single_countable_exit () or so?  A loop
already has a list of control_ivs (not sure if we ever have more than
one), I wonder if that can be annotated with the corresponding exit
edge?

I think that vect_analyze_loop_form should record the counting IV
exit edge and that recorded edge should be passed to utilities
like slpeel_can_duplicate_loop_p rather than re-querying 'normal_exit',
for example if we'd have

for (;; ++i, ++j)
  {
    if (i < n)
      break;
    a[i] = 0;
    if (j < m)
      break;
  }

which counting IV we choose as "normal" should be up to the vectorizer,
not up to the loop infrastructure.

The patch should likely be split, doing single_exit () replacements
with, say, LOOP_VINFO_IV_EXIT (..) first.


> - No support for epilogue vectorization.  The only epilogue supported is the
>   scalar final one.
>
> With the help of IPA this still gets hit quite often.  During bootstrap it
> hit rather frequently as well.
> 
> This implementation does not support completely handling the early break inside
> the vector loop itself but instead supports adding checks such that if we know
> that we have to exit in the current iteration then we branch to scalar code to
> actually do the final VF iterations which handles all the code in .
> 
> niters analysis and the majority of the vectorizer with hardcoded single_exit
> have been updated with the use of a new function normal_exit which returns the
> loop's natural exit.
> 
> for niters the natural exit is still what determines the overall iterations as
> that is the O(iters) for the loop.
> 
> For the scalar loop we know that whatever exit you take you have to perform at

Re: [PATCH 1/2]middle-end: Support early break/return auto-vectorization.

2022-11-16 Thread Jeff Law via Gcc-patches



On 11/16/22 05:17, Richard Biener via Gcc-patches wrote:
> On Tue, 15 Nov 2022, Tamar Christina wrote:
>
>> Ping, anyone alive in here?
>
> You have to queue behind all the others waiting, sorry.  But it's on
> my list of things to look at - but it's also a complex thing and thus
> requires a larger chunk of time.

Yea, it was quite an end-of-stage1 this cycle.  I'm slogging through as 
much as I can day-to-day.

jeff




RE: [PATCH 1/2]middle-end: Support early break/return auto-vectorization.

2022-11-16 Thread Richard Biener via Gcc-patches
On Tue, 15 Nov 2022, Tamar Christina wrote:

> Ping, anyone alive in here?

You have to queue behind all the others waiting, sorry.  But it's on
my list of things to look at - but it's also a complex thing and thus
requires a larger chunk of time.

Richard.

> > -Original Message-
> > From: Tamar Christina
> > Sent: Tuesday, November 8, 2022 5:37 PM
> > To: gcc-patches@gcc.gnu.org
> > Cc: nd ; rguent...@suse.de; Richard Sandiford
> > 
> > Subject: RE: [PATCH 1/2]middle-end: Support early break/return auto-
> > vectorization.
> > 
> > ping
> > 
> > > -Original Message-
> > > From: Tamar Christina
> > > Sent: Wednesday, November 2, 2022 2:46 PM
> > > To: gcc-patches@gcc.gnu.org
> > > Cc: nd ; rguent...@suse.de; Richard Sandiford
> > > 
> > > Subject: [PATCH 1/2]middle-end: Support early break/return auto-
> > > vectorization.
> > >
> > > Hi All,
> > >
> > > This patch adds initial support for early break vectorization in GCC.
> > > The support is added for any target that implements a vector cbranch optab.
> > >
> > > Concretely the kind of loops supported are of the forms:
> > >
> > >  for (int i = 0; i < N; i++)
> > >  {
> > >
> > >if ()
> > >  ;
> > >
> > >  }
> > >
> > > where  can be:
> > >  - break
> > >  - return
> > >
> > > Any number of statements can be used before the  occurs.
> > >
> > > Since this is an initial version for GCC 13 it has the following limitations:
> > >
> > > - Only fixed sized iterations and buffers are supported.  That is to say any
> > >   vectors loaded or stored must be to statically allocated arrays with known
> > >   sizes. N must also be known.
> > > - any stores in  should not be to the same objects as in
> > >   .  Loads are fine as long as they don't have the possibility to
> > >   alias.
> > > - No support for prologue peeling.  Since we only support fixed buffers this
> > >   wouldn't be an issue as we assume the arrays are correctly aligned.
> > > - Fully masked loops or unmasked loops are supported, but not partially masked
> > >   loops.
> > > - Only one additional exit is supported at this time.  The majority of the code
> > >   will handle n exits, but not all, so at this time this restriction is needed.
> > > - The early exit must be before the natural loop exit/latch.  The vectorizer is
> > >   designed in a way to propagate phi-nodes downwards.  As such supporting this
> > >   inverted control flow is hard.
> > > - No support for epilogue vectorization.  The only epilogue supported is the
> > >   scalar final one.
> > >
> > > With the help of IPA this still gets hit quite often.  During bootstrap it
> > > hit rather frequently as well.
> > >
> > > This implementation does not support completely handling the early break inside
> > > the vector loop itself but instead supports adding checks such that if we know
> > > that we have to exit in the current iteration then we branch to scalar code to
> > > actually do the final VF iterations which handles all the code in .
> > >
> > > niters analysis and the majority of the vectorizer with hardcoded single_exit
> > > have been updated with the use of a new function normal_exit which returns the
> > > loop's natural exit.
> > >
> > > for niters the natural exit is still what determines the overall iterations as
> > > that is the O(iters) for the loop.
> > >
> > > For the scalar loop we know that whatever exit you take you have to perform at
> > > most VF iterations.
> > >
> > > When the loop is peeled during the copying I have to go through great lengths to
> > > keep the dominators up to date.  All exits from the first loop are rewired to the
> > > loop header of the second loop.  But this can change the immediate dominator.
> > >
> > > We had spoken on IRC about removing the dominators validation call at the end of
> > > slpeel_tree_duplica

RE: [PATCH 1/2]middle-end: Support early break/return auto-vectorization.

2022-11-03 Thread Tamar Christina via Gcc-patches
> -Original Message-
> From: Gcc-patches  bounces+tamar.christina=arm@gcc.gnu.org> On Behalf Of Jeff Law via
> Gcc-patches
> Sent: Wednesday, November 2, 2022 10:33 PM
> To: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH 1/2]middle-end: Support early break/return auto-
> vectorization.
> 
> 
> On 11/2/22 15:50, Bernhard Reutner-Fischer via Gcc-patches wrote:
> > On 2 November 2022 15:45:39 CET, Tamar Christina via Gcc-patches  patc...@gcc.gnu.org> wrote:
> >> Hi All,
> >>
> >> This patch adds initial support for early break vectorization in GCC.
> >> The support is added for any target that implements a vector cbranch optab.
> >>
> >> Concretely the kind of loops supported are of the forms:
> >>
> >> for (int i = 0; i < N; i++)
> >> {
> >>
> >>if ()
> >>  ;
> >>
> >> }
> >>
> >> where  can be:
> >> - break
> >> - return
> > Just curious, but don't we have graphite for splitting loops on control flow,
> > respectively reflow loops to help vectorization like in this case? Did you
> > compare, and if so, what's missing?
> 
> Graphite isn't generally enabled, is largely unmaintained and often makes
> things worse rather than better.
> 

Indeed, but I also couldn't get graphite to do much in this case.

The pass just says

0 loops carried no dependency.
Pass statistics of "graphite": 

and bails out.  None of the other graphite passes seem to put anything
in the dump files either.

That aside, another reason to do it in the vectorizer is that eventually we can
build on it to support general control flow and to vectorize these loops without
needing to peel the loop for ISAs that are fully masked.

Regards,
Tamar

> 
> jeff
> 



Re: [PATCH 1/2]middle-end: Support early break/return auto-vectorization.

2022-11-02 Thread Jeff Law via Gcc-patches



On 11/2/22 15:50, Bernhard Reutner-Fischer via Gcc-patches wrote:
> On 2 November 2022 15:45:39 CET, Tamar Christina via Gcc-patches 
>  wrote:
>> Hi All,
>>
>> This patch adds initial support for early break vectorization in GCC.
>> The support is added for any target that implements a vector cbranch optab.
>>
>> Concretely the kind of loops supported are of the forms:
>>
>> for (int i = 0; i < N; i++)
>> {
>>
>>    if ()
>>      ;
>>
>> }
>>
>> where  can be:
>> - break
>> - return
>
> Just curious, but don't we have graphite for splitting loops on control flow, 
> respectively reflow loops to help vectorization like in this case? Did you 
> compare, and if so, what's missing?


Graphite isn't generally enabled, is largely unmaintained and often 
makes things worse rather than better.



jeff




Re: [PATCH 1/2]middle-end: Support early break/return auto-vectorization.

2022-11-02 Thread Bernhard Reutner-Fischer via Gcc-patches
On 2 November 2022 15:45:39 CET, Tamar Christina via Gcc-patches 
 wrote:
>Hi All,
>
>This patch adds initial support for early break vectorization in GCC.
>The support is added for any target that implements a vector cbranch optab.
>
>Concretely the kind of loops supported are of the forms:
>
> for (int i = 0; i < N; i++)
> {
>   
>   if ()
> ;
>   
> }
>
>where  can be:
> - break
> - return

Just curious, but don't we have graphite for splitting loops on control flow, 
respectively reflow loops to help vectorization like in this case? Did you 
compare, and if so, what's missing?

thanks and cheers,