Hi,

This patch tweaks the autoprefetcher heuristic in the scheduler to better group
memory loads and stores together.

From https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91598:

There are two separate changes, both related to the instruction scheduler, that
cause the regression.  The first change, in r253235, is responsible for 70% of
the regression.
===
    haifa-sched: fix autopref_rank_for_schedule qsort comparator
    
            * haifa-sched.c (autopref_rank_for_schedule): Order 'irrelevant' insns
            first, always call autopref_rank_data otherwise.
    
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@253235 138bc75d-0d04-0410-961f-82ee72b054a4
===

After this change, instead of
r1 = [rb + 0]
r2 = [rb + 8]
r3 = [rb + 16]
r4 = <math with r1>
r5 = <math with r2>
r6 = <math with r3>

we get
r1 = [rb + 0]
<math with r1>
r2 = [rb + 8]
<math with r2>
r3 = [rb + 16]
<math with r3>

which, apparently, the Cortex-A53 autoprefetcher doesn't recognize.  This
schedule happens because the "r2 =" load gets lower priority than the
"irrelevant" <math with r1> due to the above patch.

If we think about it, the fact that "r1 = [rb + 0]" can be scheduled means that
the true dependencies of all similar base+offset loads are resolved.  Therefore,
for an autoprefetcher-friendly schedule we should prioritize memory reads before
"irrelevant" instructions.

On the other hand, following similar logic, we want to delay memory stores as
much as possible, so that we start scheduling them only after all potential
producers are scheduled.  I.e., for an autoprefetcher-friendly schedule we
should prioritize "irrelevant" instructions before memory writes.
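To make the intended ordering concrete, here is a minimal standalone sketch of
a comparator that ranks loads first, "irrelevant" instructions second, and
stores last.  It is not the attached patch and not the code in haifa-sched.c:
the types and names (toy_insn, insn_kind, toy_autopref_cmp, toy_sort_ready) are
made up for illustration, and the sketch ignores dependencies and every other
scheduler heuristic.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical instruction classes for this sketch only; not GCC's.  */
enum insn_kind { INSN_LOAD, INSN_IRRELEVANT, INSN_STORE };

struct toy_insn
{
  enum insn_kind kind;
  int offset;  /* base+offset displacement; ignored for INSN_IRRELEVANT */
  int order;   /* original position, used as the final tie-breaker */
};

/* Loads rank first, "irrelevant" insns second, stores last.  */
static int
kind_rank (enum insn_kind kind)
{
  switch (kind)
    {
    case INSN_LOAD:
      return 0;
    case INSN_IRRELEVANT:
      return 1;
    default:
      return 2;
    }
}

/* qsort comparator: order by class, then by ascending offset within
   loads and within stores, then by original position.  */
static int
toy_autopref_cmp (const void *pa, const void *pb)
{
  const struct toy_insn *a = pa;
  const struct toy_insn *b = pb;

  assert (a != NULL && b != NULL);

  int ra = kind_rank (a->kind);
  int rb = kind_rank (b->kind);
  if (ra != rb)
    return ra - rb;

  /* Same class: keep memory accesses in ascending offset order.  */
  if (a->kind != INSN_IRRELEVANT && a->offset != b->offset)
    return a->offset < b->offset ? -1 : 1;

  /* Fall back to original order so the comparison is a total order.  */
  return (a->order > b->order) - (a->order < b->order);
}

/* Sort a toy "ready list" so that the preferred insn comes first.  */
static void
toy_sort_ready (struct toy_insn *ready, size_t n)
{
  qsort (ready, n, sizeof *ready, toy_autopref_cmp);
}
```

With a ready list holding a store, a math insn, and loads at offsets 8 and 0,
toy_sort_ready puts the two loads first (offset 0, then 8), then the math insn,
then the store.  The tie-break on original position is there only to keep the
qsort comparator well-defined in this toy.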

The obvious patch to implement the above is attached.  It recovers 70% of the
regressed performance on this testcase.

OK to commit?

Regards,

--
Maxim Kuvyrkov
www.linaro.org


Attachment: 0001-Improve-autoprefetcher-heuristic-partly-fix-regressi.patch