On Fri, Jun 23, 2017 at 11:48 AM, Richard Biener <richard.guent...@gmail.com> wrote:
> On Fri, Jun 23, 2017 at 12:19 PM, Bin.Cheng <amker.ch...@gmail.com> wrote:
>> On Mon, Jun 19, 2017 at 4:20 PM, Richard Biener
>> <richard.guent...@gmail.com> wrote:
>>> On Mon, Jun 19, 2017 at 3:40 PM, Bin.Cheng <amker.ch...@gmail.com> wrote:
>>>> On Wed, Jun 14, 2017 at 2:54 PM, Richard Biener
>>>> <richard.guent...@gmail.com> wrote:
>>>>> On Mon, Jun 12, 2017 at 7:03 PM, Bin Cheng <bin.ch...@arm.com> wrote:
>>>>>> Hi,
>>>>>> Current primitive cost model merges partitions with data references
>>>>>> sharing the same base address. I believe it's designed to maximize
>>>>>> data reuse in distribution, but that should be done by dedicated
>>>>>> data reusing algorithm. At this stage of merging, we should be
>>>>>> conservative and only merge partitions with the same references.
>>>>>> Bootstrap and test on x86_64 and AArch64. Is it OK?
>>>>>
>>>>> Well, I'd say "conservative" is merging more, not less. For example
>>>>> splitting a[i+1] from a[i] would be bad(?), so I'd see to allow
>>>>> unequal DR_INIT as "equal" for merging. Maybe DR_INIT within a
>>>>> cacheline or so.
>>>>>
>>>>> How many extra distributions in say SPEC do you get from this change
>>>>> alone?
>>>> Hi,
>>>> I collected data for spec2006 only with/without this patch. I am a
>>>> bit surprised that it doesn't change the number of distributed loops.
>>>>>
>>>>> It shows also that having partition->reads_and_writes would be nice
>>>>> ... the code duplication
>>>> Yeah, I merged read/write data references in previous patch, now this
>>>> duplication is gone. Update patch attached. Is it OK?
>>>
>>> + gcc_assert (i < datarefs_vec.length ());
>>> + dr1 = datarefs_vec[i];
>>>
>>> these asserts are superfluous -- vec::operator[] does them as well.
>>>
>>> Ok if you remove them.
>> Done.
>> I realized I made mistakes when measuring the impact of this patch.
>> This patch only apparently causes failure of
>> gcc.dg/tree-ssa/ldist-6.c, so here is the updated patch. I also
>> collected the number of distributed loops in spec2k6 as below:
>>   trunk: 5882
>>   only this patch: 7130
>>   whole patch series: 5237
>> So the conclusion is, this patch does aggressive distribution like
>> ldist-6.c, which means worse data-locality. The following patch does
>> more fusion which mitigates impact of this patch and results in
>> conservative distribution overall.
>
> What changed in the patch? Did you attach the correct one?
No code changed in this one. I just added test case change which
can't be resolved by following patches. ldist-6.c slipped away
because of a bug in patch:
[11/13]Annotate partition by its parallelism execution type
>
> I'm not sure ldist-6.c is a "valid" testcase but I didn't try to see
> where it was reduced from.
>
>> But as we lack a data locality
>> cost model, ldist-6.c remains failed even after applying whole patch
>> series. Hmm, a cache-sensitive cost model is needed for several passes
>> now, distribution, prefetch and (possible) interchange.
>> Richard, do you have second comment based on the new data?
>
> I expected the "only this patch" result somewhat, as said, I'd have
> allowed "related" references to fuse by not requiring equal
> DR_INIT for example.
>
> I suggest to go forward with it in its current form. We can tweak the
> cost model later.
Yeah.
>
> Thanks,
> Richard.
>
>> Thanks,
>> bin
>> 2017-06-20  Bin Cheng  <bin.ch...@arm.com>
>>
>>     * tree-loop-distribution.c (ref_base_address): Delete.
>>     (similar_memory_accesses): Rename ...
>>     (share_memory_accesses): ... to this.  Check if partitions access
>>     the same memory reference.
>>     (distribute_loop): Call share_memory_accesses.
>>
>> gcc/testsuite/ChangeLog
>> 2017-06-20  Bin Cheng  <bin.ch...@arm.com>
>>
>>     * gcc.dg/tree-ssa/ldist-6.c: XFAIL.
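
For readers following the cost-model discussion, the three merging
criteria mentioned in the thread (same base address, same reference,
and Richard's "DR_INIT within a cacheline or so") can be contrasted in
a small standalone sketch. This is not the code in
tree-loop-distribution.c: the types toy_dr and toy_partition and the
functions same_base, same_ref, near_ref and partitions_share are all
hypothetical, and a data reference is reduced to a base id plus a
constant byte offset standing in for DR_INIT.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified data reference: which base object is
   accessed, and a constant byte offset (the role DR_INIT plays).  */
struct toy_dr {
  int base_id;
  long init;
};

/* Hypothetical partition: just the data references it contains.  */
struct toy_partition {
  const struct toy_dr *drs;
  size_t n;
};

typedef bool (*dr_match_fn) (const struct toy_dr *, const struct toy_dr *,
                             long line_bytes);

/* Old criterion as described in the thread: same base address.  */
bool same_base (const struct toy_dr *a, const struct toy_dr *b, long line_bytes)
{
  (void) line_bytes;
  return a->base_id == b->base_id;
}

/* Criterion of the patch as described: the same reference.  */
bool same_ref (const struct toy_dr *a, const struct toy_dr *b, long line_bytes)
{
  (void) line_bytes;
  return a->base_id == b->base_id && a->init == b->init;
}

/* Relaxed criterion suggested in review: same base, and offsets that
   differ by less than an assumed cache line size.  */
bool near_ref (const struct toy_dr *a, const struct toy_dr *b, long line_bytes)
{
  assert (line_bytes > 0);
  if (a->base_id != b->base_id)
    return false;
  long d = a->init - b->init;
  if (d < 0)
    d = -d;
  return d < line_bytes;
}

/* Two partitions are merge candidates if any pair of their data
   references matches under the chosen criterion.  */
bool partitions_share (const struct toy_partition *p1,
                       const struct toy_partition *p2,
                       dr_match_fn match, long line_bytes)
{
  for (size_t i = 0; i < p1->n; i++)
    for (size_t j = 0; j < p2->n; j++)
      if (match (&p1->drs[i], &p2->drs[j], line_bytes))
        return true;
  return false;
}
```

With one partition accessing a[i] (offset 0) and another accessing
a[i+1] (offset 4, assuming 4-byte elements), same_base and near_ref
with a 64-byte line would treat the two as merge candidates, while
same_ref would not. That is the a[i] versus a[i+1] case raised in the
review.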