On Tue, Jul 4, 2017 at 12:19 PM, Richard Biener <richard.guent...@gmail.com> wrote: > On Mon, Jul 3, 2017 at 4:17 PM, Bin.Cheng <amker.ch...@gmail.com> wrote: >> On Mon, Jul 3, 2017 at 10:38 AM, Richard Biener >> <richard.guent...@gmail.com> wrote: >>> On Tue, Jun 27, 2017 at 12:49 PM, Bin Cheng <bin.ch...@arm.com> wrote: >>>> Hi, >>>> For the moment, tree-predcom.c only supports >>>> invariant/load-loads/store-loads chains. >>>> This patch generalizes dead store elimination (or store motion) across >>>> loop iterations in >>>> predictive commoning pass by supporting store-store chain. As comment in >>>> the patch: >>>> >>>> Apart from predictive commoning on Load-Load and Store-Load chains, we >>>> also support Store-Store chains -- stores killed by other store can be >>>> eliminated. Given below example: >>>> >>>> for (i = 0; i < n; i++) >>>> { >>>> a[i] = 1; >>>> a[i+2] = 2; >>>> } >>>> >>>> It can be replaced with: >>>> >>>> t0 = a[0]; >>>> t1 = a[1]; >>>> for (i = 0; i < n; i++) >>>> { >>>> a[i] = 1; >>>> t2 = 2; >>>> t0 = t1; >>>> t1 = t2; >>>> } >>>> a[n] = t0; >>>> a[n+1] = t1; >>>> >>>> If the loop runs more than 1 iterations, it can be further simplified >>>> into: >>>> >>>> for (i = 0; i < n; i++) >>>> { >>>> a[i] = 1; >>>> } >>>> a[n] = 2; >>>> a[n+1] = 2; >>>> >>>> The interesting part is this can be viewed either as general store >>>> motion >>>> or general dead store elimination in either intra/inter-iterations way. >>>> >>>> There are number of interesting facts about this enhancement: >>>> a) This patch supports dead store elimination for both across-iteration >>>> case and single-iteration >>>> case. For the latter, it is dead store elimination. >>>> b) There are advantages supporting dead store elimination in predcom, for >>>> example, it has >>>> complete information about memory address. On the contrary, DSE pass >>>> can only handle >>>> memory references with exact the same memory address expression. >>>> c) It's cheap to support store-stores chain in predcom based on existing >>>> code. >>>> d) As commented, the enhancement can be viewed as either generalized dead >>>> store elimination >>>> or generalized store motion. I prefer DSE here. >>>> >>>> Bootstrap(O2/O3) in patch series on x86_64 and AArch64. Is it OK? >>> >>> Looks mostly ok. I have a few questions though. >>> >>> + /* Don't do store elimination if loop has multiple exit edges. */ >>> + bool eliminate_store_p = single_exit (loop) != NULL; >>> >>> handling this would be an enhancement? IIRC LIM store-motion handles this >>> just fine by emitting code on all exits. >> It is an enhancement with a little bit more complication. We would >> need to setup/record finalizer memory references for different exit >> edges. I added TODO description for this (and following one). Is it >> okay to pick up this in the future? > > Yes. > >>> >>> @@ -1773,6 +2003,9 @@ determine_unroll_factor (vec<chain_p> chains) >>> { >>> if (chain->type == CT_INVARIANT) >>> continue; >>> + /* Don't unroll when eliminating stores. */ >>> + else if (chain->type == CT_STORE_STORE) >>> + return 1; >>> >>> this is a hard exit value so we do not handle the case where another chain >>> in the loop would want to unroll? (enhancement?) I'd have expected to do >>> the same as for CT_INVARIANT here. >> I didn't check what change is needed in case of unrolling. I am not >> very sure if we should prefer unroll for *load chains or prefer not >> unroll for store-store chains, because unroll in general increases >> loop-carried register pressure for store-store chains rather than >> decreases register pressure for *load chains. >> I was also thinking if it's possible to restrict unrolling somehow in >> order to enable predcom at O2. BTW, this is not common, it only >> happens once in spec2k6 with factor forced to 1. So okay if as it is >> now? > > I think it is ok for now with a TODO added. Please change the comment > to "we can't handle unrolling when eliminating stores" (it's not clear if we > can -- did you try? maybe add a testcase that would ICE) > >> >>> >>> + tree init = ref_at_iteration (dr, (int) 0 - i, &stmts); >>> + if (!chain->all_always_accessed && tree_could_trap_p (init)) >>> + { >>> + gimple_seq_discard (stmts); >>> + return false; >>> >>> so this is the only place that remotely cares for not always performed >>> stores. >>> But as-is the patch doesn't seem to avoid speculating stores and thus >>> violates the C++ memory model, aka, introduces store-data-races? The LIM >>> store-motion code was fixed to avoid this by keeping track of whether a BB >>> has executed to guard the stores done in the compensation code on the loop >>> exit. >>> >>> That said, to "fix" this all && tree_could_trap_p cases would need to be >>> removed >>> (or similarly flag vars be introduced). Speculating loads that do not >>> trap is ok >>> (might only introduce false uninit use reports by tools like valgrind). >> Hmm, not sure IIUC. Patch updated, is it correct (though conservative)? > > +static bool > +prepare_initializers_chain_store_elim (struct loop *, chain_p chain) > +{ > ... > + tree init = ref_at_iteration (dr, (int) 0 - i, &stmts); > + if (!chain->all_always_accessed && tree_could_trap_p (init)) > + { > > remove && tree_could_trap_p and hoist the other check. > > Same in prepare_finalizers_chain. I think you should check > > if (! chain->all_always_accessed > && ! PARAM_VALUE (PARAM_ALLOW_STORE_DATA_RACES)) > return false; > > at the beginning of both functions and retain > > if (! chain->all_always_accessed && tree_could_trap_p ()) > > in the loops. > > Ok with that change and a testcase that would exercise failure/ICE of > store chains w/ unrolling. Hmm, now I remember maybe these all_always_accessed/trap/data_store_race checks can be simplified. In function suitable_component_p, we call just_once_each_iteration_p for all references. So we shouldn't not end up with chain->all_always_accessed == false cases, right? why means we don't really need to check at all?
Thanks, bin > > Thanks, > Richard. > >> Thanks, >> bin >>> >>> Thanks, >>> Richard. >>> >>>> Thanks, >>>> bin >>>> 2017-06-21 Bin Cheng <bin.ch...@arm.com> >>>> >>>> * tree-predcom.c: Revise general description of pass. >>>> (enum chain_type): New enum type for store elimination. >>>> (struct chain): New field supporting store elimination. >>>> (dump_chain): Dump store-stores chain. >>>> (release_chain): Release resources. >>>> (split_data_refs_to_components): Compute and create component >>>> contains only stores for elimination. >>>> (get_chain_last_ref_at): New function. >>>> (make_invariant_chain): Initialization. >>>> (make_rooted_chain): Specify chain type in parameter. >>>> (add_looparound_copies): Skip for store-stores chain. >>>> (determine_roots_comp): Compute type of chain and pass it to >>>> make_rooted_chain. >>>> (initialize_root_vars_store_elim_2): New function. >>>> (finalize_eliminated_stores): New function. >>>> (remove_stmt): Handle store for elimination. >>>> (execute_pred_commoning_chain): Execute predictive commoning on >>>> store-store chains. >>>> (determine_unroll_factor): Skip unroll for store-stores chain. >>>> (prepare_initializers_chain_store_elim): New function. >>>> (prepare_initializers_chain): Hanlde store-store chain. >>>> (prepare_finalizers_chain, prepare_finalizers): New function. >>>> (tree_predictive_commoning_loop): Return integer value indicating >>>> if loop is unrolled or lcssa form is corrupted. >>>> (tree_predictive_commoning): Rewrite for lcssa form if necessary. >>>> >>>> gcc/testsuite/ChangeLog >>>> 2017-06-21 Bin Cheng <bin.ch...@arm.com> >>>> >>>> * gcc.dg/tree-ssa/predcom-dse-1.c: New test. >>>> * gcc.dg/tree-ssa/predcom-dse-2.c: New test. >>>> * gcc.dg/tree-ssa/predcom-dse-3.c: New test. >>>> * gcc.dg/tree-ssa/predcom-dse-4.c: New test. >>>> * gcc.dg/tree-ssa/predcom-dse-5.c: New test. >>>> * gcc.dg/tree-ssa/predcom-dse-6.c: New test. >>>> * gcc.dg/tree-ssa/predcom-dse-7.c: New test. >>>> * gcc.dg/tree-ssa/predcom-dse-8.c: New test. >>>> * gcc.dg/tree-ssa/predcom-dse-9.c: New test.