On Fri, Dec 8, 2017 at 1:43 PM, Bin.Cheng <amker.ch...@gmail.com> wrote: > On Fri, Dec 8, 2017 at 12:17 PM, Richard Biener > <richard.guent...@gmail.com> wrote: >> On Fri, Dec 8, 2017 at 12:46 PM, Bin Cheng <bin.ch...@arm.com> wrote: >>> Hi, >>> This simple patch makes interchange even more conservative for small loops >>> with constant initialized simple reduction. >>> The reason is undoing such reduction introduces new data reference and >>> cond_expr, which could cost too much in a small >>> loop. >>> Test gcc.target/aarch64/pr62178.c is fixed with this patch. Is it OK if >>> test passes? >> >> Shouldn't we do this even for non-constant initialzied simple >> reduction? Because for any simple >> reduction we add two DRs that are not innermost, for constant >> initialized we add an additional >> cond-expr. So ... >> >> + /* Conservatively skip interchange in cases only have few data references >> + and constant initialized simple reduction since it introduces new data >> + reference as well as ?: operation. */ >> + if (num_old_inv_drs + num_const_init_simple_reduc * 2 >= datarefs.length >> ()) >> + return false; >> + >> >> can you, instead of carrying num_const_init_simple_reduc simply loop >> over m_reductions >> and classify them in this function accordingly? I think we want to >> cost non-constant-init >> reductions as well. The :? can eventually count for another DR for >> cost purposes. > Number of non-constant-init reductions can still be carried in struct > loop_cand? I am not very sure what's the advantage of an additional > loop over m_reductions getting the same information. > Perhaps the increase of stmts should be counted like: > num_old_inv_drs + num_const_init_simple_reduc * 2 - num_new_inv_drs > Question is which number should this be compared against. (we may > need to shift num_new_inv_drs to the other side for wrapping issue). > >> >> It looks like we do count the existing DRs for the reduction? Is that >> why you arrive >> at the num_const_init_simple_reduc * 2 figure? (one extra load plus one ?:) > Yes. >> But we don't really know whether the DR was invariant in the outer >> loop (well, I suppose > Hmm, I might misunderstand here. num_old_inv_drs tracks the number of > invariant reference with regarding to inner loop, rather than the > outer loop. The same to num_new_inv_drs, > which means a reference become invariant after loop interchange with > regarding to (the new) inner loop. This invariant information is > always known from data reference, right? > As for DRs for reduction, we know it's invariant because we set its > inner loop stride to zero. > >> we could remember the DR in m_reductions). >> >> Note that the good thing is that the ?: has an invariant condition and >> thus vectorization >> can hoist the mask generation out of the vectorized loop which means >> it boils down to >> cheap operations. My gut feeling is that just looking at the number >> of memory references >> isn't a good indicator of profitability as the regular stmt workload >> has a big impact on >> profitability of vectorization. > It's not specific to vectorization. The generated new code also costs > too much in small loops without vectorization. But yes, # of mem_refs > may be too inaccurate, maybe we should check against num_stmts.
Not specific to vectorization but the interchange may pay off only when vectorizing a loop. Would the loop in loop-interchange-5.c be still interchanged? If we remove the multiplication and just keep c[i][j] = c[i][j] + b[k][j]; ? That is, why is the constant init so special? Even for non-constant init we're changing two outer loop DRs to two non-consecutive inner loop DRs. Richard. > Thanks, > bin >> >> So no ack nor nack... >> >> Richard. >> >>> Thanks, >>> bin >>> 2017-12-08 Bin Cheng <bin.ch...@arm.com> >>> >>> * gimple-loop-interchange.cc (struct loop_cand): New field. >>> (loop_cand::loop_cand): Init new field in constructor. >>> (loop_cand::classify_simple_reduction): Record simple reduction >>> initialized with constant value. >>> (should_interchange_loops): New parameter. Skip interchange if loop >>> has few data references and constant intitialized simple reduction. >>> (tree_loop_interchange::interchange): Update call to above function. >>> (should_interchange_loop_nest): Ditto.