Vladimir Makarov wrote:
Luis Machado wrote:
Upon further investigation on facerec's regression, it looks like the
code generated by the IRA-enabled gcc has many more spills than the one
with a disabled IRA, twice or sometimes three times more.
I'm trying to reduce the testcase a bit further so it's simpler to
analyse.
I am going to look at facerec too. Only, please, don't expect all
this problem will be solved soon.
Analysis of 187.facerec problem was actually easier than applu one.
It has one very hot (80%) function localmove::graphRoutines.f90 and
there is only one hot loop in the function. Although the loop is
pretty big because of inlining TopCostFct. The loop contains a few
if-statements and several switch-statements. But only part of the loop
body is hot.
After comparisons of the hot loop parts I found that IRA generates
about 20 insns more which are some stores but mostly load. I did not
find any problem with spilling in reload (that is after fixing one
spilling problem about which I wrote in my previous email). Reload
spills only two registers which lives throughout the hot parts.
So I concluded that the problem was actually in IRA allocation. I
did not find any wrong in IRA implementation of coloring algorithm.
Changing spill first heuristic there (we choose allocno with smaller
number to spill) from
SPILL_COST / (NUMBER_OF_LEFT_CONFLICTS * NUMBER_HARD_REGISTER_NEEDED
+ 1)
to
SPILL_COST * log (NREFS) / (NUMBER_OF_LEFT_CONFLICTS
* NUMBER_HARD_REGISTER_NEEDED
* LIVE_RANGE_LENGTH + 1)
increased facerec score on Power6 from 1760 to 2039 (vs 1850 for the
old RA). It looks very good but unfortunately the overall SPECFP2000
score was not changed much and SPECINT2000 was not change at all.
Some programs score became better the rest became worse.
This is classical example of heuristic approach drawback. You never
get best code using one heuristic for all programs. Sometimes the old
RA will generate better code. What are goal should be is to achieve
better code in "average", i.e. for some credible benchmark like
SPECFP2000 and we can achieve this with IRA.
It is hard to say what heuristic will work best for given program
and make the choice automatically (especially without profiling
because branch probability prediction algorithm is not that accurate).
Although we could add an option to choose different spill heuristics
in hope to achieve best overall score using machine learning algorithm
like Google or Netflix use to find the better prediction
(e.g. neighborhood based search). There is one promising project for
GCC using this approach. It was reported by Grigory Fursin on this
year GCC Summit. I think it is promising because different spill
heuristics results in very different times (about 20% for facerec).
Probably I'll submit the new heuristic because SPECFP2000 score a
bit better (about 0.5%) with using this. But I need some time to
check it on other platforms.