Hi,

This patch tweaks the autoprefetcher heuristic in the scheduler to better group memory loads and stores together.

From https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91598:

There are two separate changes, both related to the instruction scheduler, that cause the regression. The first change, in r253235, is responsible for 70% of the regression.

===
    haifa-sched: fix autopref_rank_for_schedule qsort comparator

        * haifa-sched.c (autopref_rank_for_schedule): Order 'irrelevant' insns
        first, always call autopref_rank_data otherwise.

    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@253235 138bc75d-0d04-0410-961f-82ee72b054a4
===

After this change, instead of

    r1 = [rb + 0]
    r2 = [rb + 8]
    r3 = [rb + 16]
    r4 = <math with r1>
    r5 = <math with r2>
    r6 = <math with r3>

we get

    r1 = [rb + 0]
    <math with r1>
    r2 = [rb + 8]
    <math with r2>
    r3 = [rb + 16]
    <math with r3>

which, apparently, the cortex-a53 autoprefetcher doesn't recognize. This schedule happens because the "r2 = [rb + 8]" load gets lower priority than the "irrelevant" <math with r1> due to the above patch.

If we think about it, the fact that "r1 = [rb + 0]" can be scheduled means that the true dependencies of all similar base+offset loads are resolved. Therefore, for an autoprefetcher-friendly schedule we should prioritize memory reads before "irrelevant" instructions.

On the other hand, following similar logic, we want to delay memory stores as much as possible, so that we start scheduling them only after all potential producers are scheduled. I.e., for an autoprefetcher-friendly schedule we should prioritize "irrelevant" instructions before memory writes.

An obvious patch implementing the above is attached. It brings back 70% of the regressed performance on this testcase.

OK to commit?

Regards,

--
Maxim Kuvyrkov
www.linaro.org

0001-Improve-autoprefetcher-heuristic-partly-fix-regressi.patch
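
For illustration only, the ordering described in this message (memory reads first, then "irrelevant" instructions, then memory writes) can be sketched as a standalone qsort comparator. This is not the attached patch and not the code in haifa-sched.c: the names toy_insn, mem_kind, toy_autopref_cmp and toy_sort_ready are hypothetical, the sketch assumes all memory accesses share one base register, and it sorts so that element 0 is issued first, which is not necessarily how the real ready list is laid out.

```c
#include <stdlib.h>

/* Hypothetical classification for this sketch.  The enumerator values
   double as the scheduling rank: loads (0) before irrelevant insns (1)
   before stores (2).  */
enum mem_kind { MEM_LOAD, MEM_IRRELEVANT, MEM_STORE };

/* Hypothetical stand-in for an instruction on the ready list.  */
struct toy_insn {
    enum mem_kind kind;
    int offset;  /* base+offset displacement; unused for MEM_IRRELEVANT */
    int id;      /* original position, used as the final tie-breaker */
};

/* Three-way comparison that avoids overflow from plain subtraction.  */
static int cmp_int(int a, int b)
{
    return (a > b) - (a < b);
}

/* qsort comparator: rank by kind first, then keep memory accesses of the
   same kind in increasing offset order, then fall back to the original
   position so the ordering is total and deterministic.  */
static int toy_autopref_cmp(const void *pa, const void *pb)
{
    const struct toy_insn *a = pa;
    const struct toy_insn *b = pb;

    if (a->kind != b->kind)
        return cmp_int(a->kind, b->kind);
    if (a->kind != MEM_IRRELEVANT && a->offset != b->offset)
        return cmp_int(a->offset, b->offset);
    return cmp_int(a->id, b->id);
}

/* Sort a toy ready list so that element 0 is the insn to issue first.  */
void toy_sort_ready(struct toy_insn *ready, size_t n)
{
    qsort(ready, n, sizeof *ready, toy_autopref_cmp);
}
```

With a ready list holding one irrelevant insn, two loads and two stores, toy_sort_ready places the loads first in offset order, then the irrelevant insn, then the stores in offset order. The real scheduler additionally respects dependencies, since only insns whose dependencies are resolved reach the ready list at all.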