On Fri, Dec 8, 2017 at 12:17 PM, Richard Biener
<richard.guent...@gmail.com> wrote:
> On Fri, Dec 8, 2017 at 12:46 PM, Bin Cheng <bin.ch...@arm.com> wrote:
>> Hi,
>> This simple patch makes interchange even more conservative for small loops
>> with a constant-initialized simple reduction.  The reason is that undoing
>> such a reduction introduces a new data reference and a cond_expr, which can
>> cost too much in a small loop.
>> Test gcc.target/aarch64/pr62178.c is fixed with this patch.  Is it OK if
>> testing passes?
>
> Shouldn't we do this even for non-constant-initialized simple reductions?
> For any simple reduction we add two DRs that are not innermost; for a
> constant-initialized one we add an additional cond-expr.  So ...
>
> +  /* Conservatively skip interchange for cases that have only a few data
> +     references and a constant-initialized simple reduction, since undoing
> +     the reduction introduces a new data reference as well as a ?: operation.  */
> +  if (num_old_inv_drs + num_const_init_simple_reduc * 2 >= datarefs.length ())
> +    return false;
> +
>
> can you, instead of carrying num_const_init_simple_reduc, simply loop over
> m_reductions and classify them in this function accordingly?  I think we
> want to cost non-constant-init reductions as well.  The ?: can eventually
> count as another DR for cost purposes.
The number of non-constant-init reductions can still be carried in struct
loop_cand?  I am not sure what the advantage of an additional loop over
m_reductions would be, since it collects the same information.
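If I read the suggestion right, it would be something along these lines,
deriving the count on the spot (the per-reduction predicate below is a
placeholder, not an actual struct member):

  /* Placeholder sketch: classify reductions here instead of carrying
     num_const_init_simple_reduc in loop_cand.  Each simple reduction adds
     two non-innermost DRs when undone; a constant-initialized one is
     charged one more "DR" for the ?:.  */
  unsigned extra_drs = 0;
  for (unsigned i = 0; i < m_reductions.length (); ++i)
    {
      extra_drs += 2;
      if (constant_initialized_p (m_reductions[i]))  /* placeholder */
	extra_drs += 1;
    }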
Perhaps the increase in stmts should be counted like:
  num_old_inv_drs + num_const_init_simple_reduc * 2 - num_new_inv_drs
The question is which number this should be compared against.  (We may need
to shift num_new_inv_drs to the other side to avoid wrapping issues.)
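For example, a sketch that keeps num_new_inv_drs on the additive side so the
unsigned counters cannot wrap:

  /* Sketch only: compare the references that stop being innermost against
     the whole set plus those that become invariant, instead of subtracting
     num_new_inv_drs and risking unsigned wrap-around.  */
  if (num_old_inv_drs + num_const_init_simple_reduc * 2
      >= datarefs.length () + num_new_inv_drs)
    return false;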

>
> It looks like we do count the existing DRs for the reduction?  Is that why
> you arrive at the num_const_init_simple_reduc * 2 figure?  (One extra load
> plus one ?:.)
Yes.
> But we don't really know whether the DR was invariant in the outer
> loop (well, I suppose
Hmm, I might misunderstand here.  num_old_inv_drs tracks the number of
references that are invariant with regard to the inner loop, rather than the
outer loop.  The same goes for num_new_inv_drs, which counts references that
become invariant with regard to the (new) inner loop after interchange.  This
invariance information is always known from the data reference, right?
As for the DRs of a reduction, we know they are invariant because we set
their inner-loop stride to zero.
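Concretely, in a nest like the following (illustration only, not the actual
testcase):

  int x[64], b[64], c[64][64];

  for (int i = 0; i < 64; ++i)      /* outer */
    for (int j = 0; j < 64; ++j)    /* inner */
      x[i] += c[j][i] * b[j];

  /* x[i] has stride 0 in the j-loop, so it is invariant wrt the current
     inner loop (counted in num_old_inv_drs).  After interchange, b[j] gets
     stride 0 in the new inner i-loop (counted in num_new_inv_drs), while
     x[i] stops being inner-loop invariant.  */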

> we could remember the DR in m_reductions).
>
> Note that the good thing is that the ?: has an invariant condition, and thus
> vectorization can hoist the mask generation out of the vectorized loop,
> which means it boils down to cheap operations.  My gut feeling is that just
> looking at the number of memory references isn't a good indicator of
> profitability, as the regular stmt workload has a big impact on the
> profitability of vectorization.
It's not specific to vectorization.  The newly generated code also costs too
much in small loops without vectorization.  But yes, the number of mem_refs
may be too inaccurate; maybe we should check against num_stmts instead.
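To make the cost concrete on a small kernel (illustrative source, not the
exact pr62178.c testcase):

  /* Before: constant-initialized simple reduction kept in a scalar.  */
  for (int i = 0; i < 64; ++i)
    {
      int sum = 0;
      for (int j = 0; j < 64; ++j)
	sum += c[j][i] * b[j];
      a[i] = sum;
    }

  /* After undoing the reduction and interchanging, the accumulator lives
     in memory: every iteration loads a[i] (a new data reference) and needs
     a ?: to substitute the constant initializer on the first outer
     iteration.  The condition j == 0 is invariant in the new inner
     i-loop.  */
  for (int j = 0; j < 64; ++j)
    for (int i = 0; i < 64; ++i)
      a[i] = (j == 0 ? 0 : a[i]) + c[j][i] * b[j];

For a loop body this small, those extra statements are a significant
fraction of the total work, with or without vectorization.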

Thanks,
bin
>
> So no ack nor nack...
>
> Richard.
>
>> Thanks,
>> bin
>> 2017-12-08  Bin Cheng  <bin.ch...@arm.com>
>>
>>         * gimple-loop-interchange.cc (struct loop_cand): New field.
>>         (loop_cand::loop_cand): Init new field in constructor.
>>         (loop_cand::classify_simple_reduction): Record simple reduction
>>         initialized with constant value.
>>         (should_interchange_loops): New parameter.  Skip interchange if loop
>>         has few data references and constant initialized simple reduction.
>>         (tree_loop_interchange::interchange): Update call to above function.
>>         (should_interchange_loop_nest): Ditto.
