Re: [PATCH GCC]More conservative interchanging small loops with const initialized simple reduction

Richard Biener Fri, 08 Dec 2017 06:41:21 -0800

On Fri, Dec 8, 2017 at 1:43 PM, Bin.Cheng <amker.ch...@gmail.com> wrote:
> On Fri, Dec 8, 2017 at 12:17 PM, Richard Biener
> <richard.guent...@gmail.com> wrote:
>> On Fri, Dec 8, 2017 at 12:46 PM, Bin Cheng <bin.ch...@arm.com> wrote:
>>> Hi,
>>> This simple patch makes interchange even more conservative for small loops 
>>> with constant initialized simple reduction.
>>> The reason is undoing such reduction introduces new data reference and 
>>> cond_expr, which could cost too much in a small
>>> loop.
>>> Test gcc.target/aarch64/pr62178.c is fixed with this patch.  Is it OK if 
>>> test passes?
>>
>> Shouldn't we do this even for non-constant initialzied simple
>> reduction?  Because for any simple
>> reduction we add two DRs that are not innermost, for constant
>> initialized we add an additional
>> cond-expr.  So ...
>>
>> +  /* Conservatively skip interchange in cases only have few data references
>> +     and constant initialized simple reduction since it introduces new data
>> +     reference as well as ?: operation.  */
>> +  if (num_old_inv_drs + num_const_init_simple_reduc * 2 >= datarefs.length 
>> ())
>> +    return false;
>> +
>>
>> can you, instead of carrying num_const_init_simple_reduc simply loop
>> over m_reductions
>> and classify them in this function accordingly?  I think we want to
>> cost non-constant-init
>> reductions as well.  The :? can eventually count for another DR for
>> cost purposes.
> Number of non-constant-init reductions can still be carried in struct
> loop_cand?  I am not very sure what's the advantage of an additional
> loop over m_reductions getting the same information.
> Perhaps the increase of stmts should be counted like:
>   num_old_inv_drs + num_const_init_simple_reduc * 2 - num_new_inv_drs
> Question is which number should this be compared against.  (we may
> need to shift num_new_inv_drs to the other side for wrapping issue).
>
>>
>> It looks like we do count the existing DRs for the reduction?  Is that
>> why you arrive
>> at the num_const_init_simple_reduc * 2 figure? (one extra load plus one ?:)
> Yes.
>> But we don't really know whether the DR was invariant in the outer
>> loop (well, I suppose
> Hmm, I might misunderstand here.  num_old_inv_drs tracks the number of
> invariant reference with regarding to inner loop, rather than the
> outer loop.  The same to num_new_inv_drs,
> which means a reference become invariant after loop interchange with
> regarding to (the new) inner loop.  This invariant information is
> always known from data reference, right?
> As for DRs for reduction, we know it's invariant because we set its
> inner loop stride to zero.
>
>> we could remember the DR in m_reductions).
>>
>> Note that the good thing is that the ?: has an invariant condition and
>> thus vectorization
>> can hoist the mask generation out of the vectorized loop which means
>> it boils down to
>> cheap operations.  My gut feeling is that just looking at the number
>> of memory references
>> isn't a good indicator of profitability as the regular stmt workload
>> has a big impact on
>> profitability of vectorization.
> It's not specific to vectorization.  The generated new code also costs
> too much in small loops without vectorization.  But yes, # of mem_refs
> may be too inaccurate, maybe we should check against num_stmts.


Not specific to vectorization but the interchange may pay off only when
vectorizing a loop.  Would the loop in loop-interchange-5.c be still
interchanged?  If we remove the multiplication and just keep
c[i][j] = c[i][j] + b[k][j];
?  That is, why is the constant init so special?  Even for non-constant init
we're changing two outer loop DRs to two non-consecutive inner loop DRs.

Richard.

> Thanks,
> bin
>>
>> So no ack nor nack...
>>
>> Richard.
>>
>>> Thanks,
>>> bin
>>> 2017-12-08  Bin Cheng  <bin.ch...@arm.com>
>>>
>>>         * gimple-loop-interchange.cc (struct loop_cand): New field.
>>>         (loop_cand::loop_cand): Init new field in constructor.
>>>         (loop_cand::classify_simple_reduction): Record simple reduction
>>>         initialized with constant value.
>>>         (should_interchange_loops): New parameter.  Skip interchange if loop
>>>         has few data references and constant intitialized simple reduction.
>>>         (tree_loop_interchange::interchange): Update call to above function.
>>>         (should_interchange_loop_nest): Ditto.

Re: [PATCH GCC]More conservative interchanging small loops with const initialized simple reduction

Reply via email to