On Fri, Jan 16, 2015 at 3:06 PM, Maxim Kuvyrkov
<maxim.kuvyr...@linaro.org> wrote:
> On Nov 19, 2014, at 12:27 PM, Ramana Radhakrishnan 
> <ramana.radhakrish...@arm.com> wrote:
>
>>
>
> Hi Ramana,
> Hi Vladimir,
>
> I still don't have SPEC2000/SPEC2006 benchmark numbers for this patch.  Since 
> stage3 is about to finish, I'm going to commit the target independent part of 
> the patch now (it was approved by Vladimir a while ago).
>
> I'll commit ARM part of the patch once Ramana is happy with it.
>
> I've addressed all review comments and updated patch is attached.
>
> Any objections?
>
>> On 14/11/14 15:12, Maxim Kuvyrkov wrote:
>>> On Nov 14, 2014, at 8:38 AM, Jeff Law <l...@redhat.com> wrote:
>>>
>>>> On 10/20/14 22:06, Maxim Kuvyrkov wrote:
>>>>> Hi,
>>>>> Ramana, this change requires benchmarking, which I can't easily do
>>>>> at
>>>> the moment. I would appreciate any benchmarking results that you can
>>>> share. In particular, the value of PARAM_SCHED_AUTOPREF_QUEUE_DEPTH
>>>> needs to be tuned/confirmed for Cortex-A15.
>>>> What were the results of that benchmarking?  IIRC I tabled reviewing this 
>>>> work waiting for those results (and I probably should have let you know 
>>>> that.  Sorry,  my bad there).
>>>
>>> I don't have the benchmarking results yet, and I was hoping for ARM to help 
>>> with getting the numbers.  The arm maintainers still need to OK the 
>>> arm-specific portion of the patch, which, I imagine, will happen only of 
>>> benchmark scores improve.
>>
>>
>> I tried benchmarking 78f367cfcfdc9f0a422a362cd85ecc122834d96f from the trees 
>> you gave me links to against the equivalent version on trunk.
>>
>> The results with SPEC2k on A15 were in the noise with the default value for 
>> PARAM_SCHED_AUTOPREF_QUEUE_DEPTH which is 2 in the backend. I'm still 
>> waiting on results for values 0, 1 and 3 and hopefully something will come 
>> back soon for SPEC2k.
>
> After going back to the original benchmark that exposed the autoprefetcher 
> issue (STREAMS) the required value of the parameter to fix the regression 
> turned out to be 5.
>
> I have also discovered that Cortex-A57 also benefits from this optimization.  
> Running STREAMS benchmark on Juno in AArch32 mode showed increased 
> performance by 10-15%, which is not as big as Cortex-A15's 250-300%, but 
> still significant.
>
> In this updated patch the parameter is set to maximum -- look through all 
> scheduling queue of instructions in search of memory ops that are relevant to 
> L2 autoprefetcher.  In practice the setting to maximum will not change much, 
> since there are rarely many instructions in the queue.
>
>>
>>
>>>
>>> @@ -29903,6 +29915,20 @@ arm_first_cycle_multipass_dfa_lookahead (void)
>>>   return issue_rate > 1 ? issue_rate : 0;
>>> }
>>>
>>> +/* Enable modeling of Cortex-A15 L2 auto-prefetcher.  */
>>> +static int
>>> +arm_first_cycle_multipass_dfa_lookahead_guard (rtx insn, int ready_index)
>>> +{
>>> +  switch (arm_tune)
>>> +    {
>>> +    case cortexa15:
>>> +      return autopref_multipass_dfa_lookahead_guard (insn, ready_index);
>>> +
>>> +    default:
>>> +      return 0;
>>> +    }
>>> +}
>>> +
>>
>>
>> It would be better to have this as a flag in the tuning tables rather than 
>> hardcoding for a core here. The backend has been moving in that direction 
>> for all core centric information and it is preferable that be continued.
>>
>> So this logic here should just be
>>
>> if (current_tune->multipass_lookahead)
>>  return autopref_multipass_lookahead_guard (insn, ready_index);
>> else
>>  return 0;
>>
>
> Done.  The autoprefetcher is enabled in Cortex-A15 and Cortex-A57 tuning 
> tables.


The ARM changes look sane. As long as you've tested this with a
bootstrap, I'm ok with it - we probably need a similar change for
AArch64 and Cortex-A57 but I think we should do that for the next
stage1. Thanks for persevering with this.


Ramana

>
> --
> Maxim Kuvyrkov
> www.linaro.org
>
>

Reply via email to