On Fri, Jun 23, 2017 at 11:48 AM, Richard Biener
<richard.guent...@gmail.com> wrote:
> On Fri, Jun 23, 2017 at 12:19 PM, Bin.Cheng <amker.ch...@gmail.com> wrote:
>> On Mon, Jun 19, 2017 at 4:20 PM, Richard Biener
>> <richard.guent...@gmail.com> wrote:
>>> On Mon, Jun 19, 2017 at 3:40 PM, Bin.Cheng <amker.ch...@gmail.com> wrote:
>>>> On Wed, Jun 14, 2017 at 2:54 PM, Richard Biener
>>>> <richard.guent...@gmail.com> wrote:
>>>>> On Mon, Jun 12, 2017 at 7:03 PM, Bin Cheng <bin.ch...@arm.com> wrote:
>>>>>> Hi,
>>>>>> The current primitive cost model merges partitions with data references
>>>>>> sharing the same base address.  I believe it's designed to maximize
>>>>>> data reuse in distribution, but that should be done by a dedicated
>>>>>> data-reuse algorithm.  At this stage of merging, we should be
>>>>>> conservative and only merge partitions with the same references.
>>>>>> Bootstrapped and tested on x86_64 and AArch64.  Is it OK?
>>>>>
>>>>> Well, I'd say "conservative" is merging more, not less.  For example,
>>>>> splitting a[i+1] from a[i] would be bad(?), so I'd allow unequal
>>>>> DR_INIT to count as "equal" for merging.  Maybe DR_INIT within a
>>>>> cacheline or so.
>>>>>
>>>>> How many extra distributions in, say, SPEC do you get from this change
>>>>> alone?
>>>> Hi,
>>>> I collected data for spec2006 only with/without this patch.  I am a
>>>> bit surprised that it doesn't change the number of distributed loops.
>>>>>
>>>>> It shows also that having partition->reads_and_writes would be nice
>>>>> ...  the code duplication
>>>> Yeah, I merged read/write data references in the previous patch, so this
>>>> duplication is gone now.  Updated patch attached.  Is it OK?
>>>
>>> +      gcc_assert (i < datarefs_vec.length ());
>>> +      dr1 = datarefs_vec[i];
>>>
>>> these asserts are superfluous -- vec::operator[] does them as well.
>>>
>>> Ok if you remove them.
>> Done.
>> I realized I made mistakes when measuring the impact of this patch.
>> This patch alone apparently causes gcc.dg/tree-ssa/ldist-6.c to fail,
>> so here is the updated patch.  I also collected the number of
>> distributed loops in spec2k6, as below:
>>      trunk:  5882
>>      only this patch: 7130
>>      whole patch series: 5237
>> So the conclusion is that this patch distributes more aggressively, as
>> in ldist-6.c, which means worse data locality.  The following patch does
>> more fusion, which mitigates the impact of this patch and results in
>> more conservative distribution overall.
>
> What changed in the patch?  Did you attach the correct one?
No code changed in this one.  I just added the test case change, which
can't be resolved by the following patches.  ldist-6.c slipped through
because of a bug in the patch:

[11/13]Annotate partition by its parallelism execution type

>
> I'm not sure ldist-6.c is a "valid" testcase but I didn't try to see
> where it was reduced from.
>
>>   But as we lack a data locality cost model, ldist-6.c still fails
>> even after applying the whole patch series.  Hmm, a cache-sensitive
>> cost model is needed for several passes now: distribution, prefetch
>> and (possibly) interchange.
>> Richard, do you have further comments based on the new data?
>
> I expected the "only this patch" result somewhat, as said, I'd have
> allowed "related" references to fuse by not requiring equal
> DR_INIT for example.
>
> I suggest going forward with it in its current form.  We can tweak the
> cost model later.
Yeah.
>
> Thanks,
> Richard.
>
>> Thanks,
>> bin
>> 2017-06-20  Bin Cheng  <bin.ch...@arm.com>
>>
>>     * tree-loop-distribution.c (ref_base_address): Delete.
>>     (similar_memory_accesses): Rename ...
>>     (share_memory_accesses): ... to this.  Check if partitions access
>>     the same memory reference.
>>     (distribute_loop): Call share_memory_accesses.
>>
>> gcc/testsuite/ChangeLog
>> 2017-06-20  Bin Cheng  <bin.ch...@arm.com>
>>
>>     * gcc.dg/tree-ssa/ldist-6.c: XFAIL.
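
For readers of the archive, the three merge criteria discussed in this
thread can be sketched roughly as below.  This is only an illustrative
sketch and not the code in tree-loop-distribution.c: DataRef, Partition,
share_base, share_same_ref and share_nearby_ref are hypothetical names,
the base object is reduced to a string, and "DR_INIT within a cacheline"
is read here as an absolute byte distance below a caller-supplied limit,
which is just one possible interpretation.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a data reference: a base object plus a
// constant byte offset (loosely modelled on a base address and DR_INIT).
struct DataRef {
  std::string base;   // e.g. "a" for both a[i] and a[i+1]
  std::int64_t init;  // constant byte offset from the base
};

using Partition = std::vector<DataRef>;

// Old criterion: merge when any two references share a base address.
bool share_base(const Partition &p1, const Partition &p2) {
  for (const DataRef &r1 : p1)
    for (const DataRef &r2 : p2)
      if (r1.base == r2.base)
        return true;
  return false;
}

// Criterion of this patch: merge only when both partitions access the
// same reference (same base and same constant offset).
bool share_same_ref(const Partition &p1, const Partition &p2) {
  for (const DataRef &r1 : p1)
    for (const DataRef &r2 : p2)
      if (r1.base == r2.base && r1.init == r2.init)
        return true;
  return false;
}

// Relaxation suggested in review (assumed form): same base, and offsets
// closer than max_distance bytes, e.g. a cache line size.
bool share_nearby_ref(const Partition &p1, const Partition &p2,
                      std::int64_t max_distance) {
  for (const DataRef &r1 : p1)
    for (const DataRef &r2 : p2) {
      if (r1.base != r2.base)
        continue;
      std::int64_t d = r1.init - r2.init;
      if (d < 0)
        d = -d;
      if (d < max_distance)
        return true;
    }
  return false;
}
```

With a[i] and a[i+1] (offsets 0 and 4 for a 4-byte element), the first and
third predicates would merge the two partitions and the second would not,
which matches the direction of the "only this patch" numbers above.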
