On Tue, Jul 4, 2017 at 12:19 PM, Richard Biener
<richard.guent...@gmail.com> wrote:
> On Mon, Jul 3, 2017 at 4:17 PM, Bin.Cheng <amker.ch...@gmail.com> wrote:
>> On Mon, Jul 3, 2017 at 10:38 AM, Richard Biener
>> <richard.guent...@gmail.com> wrote:
>>> On Tue, Jun 27, 2017 at 12:49 PM, Bin Cheng <bin.ch...@arm.com> wrote:
>>>> Hi,
>>>> For the moment, tree-predcom.c only supports 
>>>> invariant/load-loads/store-loads chains.
>>>> This patch generalizes dead store elimination (or store motion) across 
>>>> loop iterations in
>>>> predictive commoning pass by supporting store-store chain.  As comment in 
>>>> the patch:
>>>>
>>>>    Apart from predictive commoning on Load-Load and Store-Load chains, we
>>>>    also support Store-Store chains -- stores killed by other store can be
>>>>    eliminated.  Given below example:
>>>>
>>>>      for (i = 0; i < n; i++)
>>>>        {
>>>>          a[i] = 1;
>>>>          a[i+2] = 2;
>>>>        }
>>>>
>>>>    It can be replaced with:
>>>>
>>>>      t0 = a[0];
>>>>      t1 = a[1];
>>>>      for (i = 0; i < n; i++)
>>>>        {
>>>>          a[i] = 1;
>>>>          t2 = 2;
>>>>          t0 = t1;
>>>>          t1 = t2;
>>>>        }
>>>>      a[n] = t0;
>>>>      a[n+1] = t1;
>>>>
>>>>    If the loop runs more than 1 iterations, it can be further simplified 
>>>> into:
>>>>
>>>>      for (i = 0; i < n; i++)
>>>>        {
>>>>          a[i] = 1;
>>>>        }
>>>>      a[n] = 2;
>>>>      a[n+1] = 2;
>>>>
>>>>    The interesting part is this can be viewed either as general store 
>>>> motion
>>>>    or general dead store elimination in either intra/inter-iterations way.
>>>>
>>>> There are number of interesting facts about this enhancement:
>>>> a) This patch supports dead store elimination for both across-iteration 
>>>> case and single-iteration
>>>>      case.  For the latter, it is dead store elimination.
>>>> b) There are advantages supporting dead store elimination in predcom, for 
>>>> example, it has
>>>>      complete information about memory address.  On the contrary, DSE pass 
>>>> can only handle
>>>>      memory references with exact the same memory address expression.
>>>> c) It's cheap to support store-stores chain in predcom based on existing 
>>>> code.
>>>> d) As commented, the enhancement can be viewed as either generalized dead 
>>>> store elimination
>>>>      or generalized store motion.  I prefer DSE here.
>>>>
>>>> Bootstrap(O2/O3) in patch series on x86_64 and AArch64.  Is it OK?
>>>
>>> Looks mostly ok.  I have a few questions though.
>>>
>>> +  /* Don't do store elimination if loop has multiple exit edges.  */
>>> +  bool eliminate_store_p = single_exit (loop) != NULL;
>>>
>>> handling this would be an enhancement?  IIRC LIM store-motion handles this
>>> just fine by emitting code on all exits.
>> It is an enhancement with a little bit more complication.  We would
>> need to setup/record finalizer memory references for different exit
>> edges.  I added TODO description for this (and following one).  Is it
>> okay to pick up this in the future?
>
> Yes.
>
>>>
>>> @@ -1773,6 +2003,9 @@ determine_unroll_factor (vec<chain_p> chains)
>>>      {
>>>        if (chain->type == CT_INVARIANT)
>>>         continue;
>>> +      /* Don't unroll when eliminating stores.  */
>>> +      else if (chain->type == CT_STORE_STORE)
>>> +       return 1;
>>>
>>> this is a hard exit value so we do not handle the case where another chain
>>> in the loop would want to unroll? (enhancement?)  I'd have expected to do
>>> the same as for CT_INVARIANT here.
>> I didn't check what change is needed in case of unrolling.  I am not
>> very sure if we should prefer unroll for *load chains or prefer not
>> unroll for store-store chains, because unroll in general increases
>> loop-carried register pressure for store-store chains rather than
>> decreases register pressure for *load chains.
>> I was also thinking if it's possible to restrict unrolling somehow in
>> order to enable predcom at O2.  BTW, this is not common, it only
>> happens once in spec2k6 with factor forced to 1.  So okay if as it is
>> now?
>
> I think it is ok for now with a TODO added.  Please change the comment
> to "we can't handle unrolling when eliminating stores" (it's not clear if we
> can -- did you try?  maybe add a testcase that would ICE)
>
>>
>>>
>>> +      tree init = ref_at_iteration (dr, (int) 0 - i, &stmts);
>>> +      if (!chain->all_always_accessed && tree_could_trap_p (init))
>>> +       {
>>> +         gimple_seq_discard (stmts);
>>> +         return false;
>>>
>>> so this is the only place that remotely cares for not always performed 
>>> stores.
>>> But as-is the patch doesn't seem to avoid speculating stores and thus
>>> violates the C++ memory model, aka, introduces store-data-races?  The LIM
>>> store-motion code was fixed to avoid this by keeping track of whether a BB
>>> has executed to guard the stores done in the compensation code on the loop
>>> exit.
>>>
>>> That said, to "fix" this all && tree_could_trap_p cases would need to be 
>>> removed
>>> (or similarly flag vars be introduced).  Speculating loads that do not
>>> trap is ok
>>> (might only introduce false uninit use reports by tools like valgrind).
>> Hmm, not sure IIUC.  Patch updated, is it correct (though conservative)?
>
> +static bool
> +prepare_initializers_chain_store_elim (struct loop *, chain_p chain)
> +{
> ...
> +      tree init = ref_at_iteration (dr, (int) 0 - i, &stmts);
> +      if (!chain->all_always_accessed && tree_could_trap_p (init))
> +       {
>
> remove && tree_could_trap_p and hoist the other check.
>
> Same in prepare_finalizers_chain.  I think you should check
>
>    if (! chain->all_always_accessed
>       && ! PARAM_VALUE (PARAM_ALLOW_STORE_DATA_RACES))
>      return false;
>
> at the beginning of both functions and retain
>
>     if (! chain->all_always_accessed && tree_could_trap_p ())
>
> in the loops.
>
> Ok with that change and a testcase that would exercise failure/ICE of
> store chains w/ unrolling.
Hmm, now I remember maybe these
all_always_accessed/trap/data_store_race checks can be simplified.  In
function suitable_component_p, we call just_once_each_iteration_p for
all references.  So we shouldn't not end up with
chain->all_always_accessed == false cases, right?  why means we don't
really need to check at all?

Thanks,
bin
>
> Thanks,
> Richard.
>
>> Thanks,
>> bin
>>>
>>> Thanks,
>>> Richard.
>>>
>>>> Thanks,
>>>> bin
>>>> 2017-06-21  Bin Cheng  <bin.ch...@arm.com>
>>>>
>>>>         * tree-predcom.c: Revise general description of pass.
>>>>         (enum chain_type): New enum type for store elimination.
>>>>         (struct chain): New field supporting store elimination.
>>>>         (dump_chain): Dump store-stores chain.
>>>>         (release_chain): Release resources.
>>>>         (split_data_refs_to_components): Compute and create component
>>>>         contains only stores for elimination.
>>>>         (get_chain_last_ref_at): New function.
>>>>         (make_invariant_chain): Initialization.
>>>>         (make_rooted_chain): Specify chain type in parameter.
>>>>         (add_looparound_copies): Skip for store-stores chain.
>>>>         (determine_roots_comp): Compute type of chain and pass it to
>>>>         make_rooted_chain.
>>>>         (initialize_root_vars_store_elim_2): New function.
>>>>         (finalize_eliminated_stores): New function.
>>>>         (remove_stmt): Handle store for elimination.
>>>>         (execute_pred_commoning_chain): Execute predictive commoning on
>>>>         store-store chains.
>>>>         (determine_unroll_factor): Skip unroll for store-stores chain.
>>>>         (prepare_initializers_chain_store_elim): New function.
>>>>         (prepare_initializers_chain): Hanlde store-store chain.
>>>>         (prepare_finalizers_chain, prepare_finalizers): New function.
>>>>         (tree_predictive_commoning_loop): Return integer value indicating
>>>>         if loop is unrolled or lcssa form is corrupted.
>>>>         (tree_predictive_commoning): Rewrite for lcssa form if necessary.
>>>>
>>>> gcc/testsuite/ChangeLog
>>>> 2017-06-21  Bin Cheng  <bin.ch...@arm.com>
>>>>
>>>>         * gcc.dg/tree-ssa/predcom-dse-1.c: New test.
>>>>         * gcc.dg/tree-ssa/predcom-dse-2.c: New test.
>>>>         * gcc.dg/tree-ssa/predcom-dse-3.c: New test.
>>>>         * gcc.dg/tree-ssa/predcom-dse-4.c: New test.
>>>>         * gcc.dg/tree-ssa/predcom-dse-5.c: New test.
>>>>         * gcc.dg/tree-ssa/predcom-dse-6.c: New test.
>>>>         * gcc.dg/tree-ssa/predcom-dse-7.c: New test.
>>>>         * gcc.dg/tree-ssa/predcom-dse-8.c: New test.
>>>>         * gcc.dg/tree-ssa/predcom-dse-9.c: New test.

Reply via email to