> > OK, so it is about 2%.  Did you check whether lookahead is needed even in
> > the early pass (before reload)?  My guess would be so, but if not, it could
> > cut the cost in half.  For -Ofast/-O3 it looks reasonable to me, but we will
> > need to announce it on the ML.  For other settings I think we need to work
> > on more improvements or cut the expenses.
>
> Yes, it is required before reload.
>
> I have another idea which can be pondered upon.  For now, can we enable
> lookahead with the value 4 (pre-reload) by default?  This will exponentially
> cut the cost of build time.
> I have done some measurements on the build time of some benchmarks (listed
> below) with lookahead value 4.  The 2% increase in build time with value 8 is
> now almost gone.
>
>                  dfa4     no_lookahead
>
> perlbench    -   191s     193s
> bzip2        -    19s      19s
> gcc          -   429s     429s
> mcf          -     3s       3s
> gobmk        -   116s     115s
> hmmer        -    60s      60s
> sjeng        -    18s      17s
> libquantum   -     6s       6s
> h264ref      -   107s     107s
> omnetpp      -   128s     128s
> astar        -     7s       7s
> bwaves       -     5s       5s
> gamess       -  1964s    1957s
> milc         -    18s      18s
> GemsFDTD     -   273s     272s
>
> Lookahead value 4 also helps because the modified decoder model in bdver3.md
> is only two cycles deep (though in hardware it is actually 4 cycles deep).
> This means that we can look another two levels deep for a better schedule.
> GemsFDTD still retains the performance boost of around 6-7% with value 4.
>
> Let me know your thoughts.
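
As a quick sanity check on the quoted table, the aggregate build-time overhead of lookahead 4 can be computed with a short script. This is only a sketch: the numbers are copied from the table above, while the names (`build_times`, `overhead_pct`) are illustrative and not from the thread.

```python
# Build times in seconds, copied from the quoted table: (dfa4, no_lookahead).
build_times = {
    "perlbench": (191, 193),
    "bzip2": (19, 19),
    "gcc": (429, 429),
    "mcf": (3, 3),
    "gobmk": (116, 115),
    "hmmer": (60, 60),
    "sjeng": (18, 17),
    "libquantum": (6, 6),
    "h264ref": (107, 107),
    "omnetpp": (128, 128),
    "astar": (7, 7),
    "bwaves": (5, 5),
    "gamess": (1964, 1957),
    "milc": (18, 18),
    "GemsFDTD": (273, 272),
}

# Sum each column, then express the difference relative to the baseline.
total_dfa4 = sum(dfa4 for dfa4, _ in build_times.values())
total_base = sum(base for _, base in build_times.values())
overhead_pct = 100.0 * (total_dfa4 - total_base) / total_base

print(f"dfa4: {total_dfa4}s  no_lookahead: {total_base}s  "
      f"overhead: {overhead_pct:.2f}%")
```

The totals come to 3344s against 3336s, i.e. roughly 0.24% overall, which agrees with the statement that the 2% increase is almost gone. Most of the 8s difference comes from gamess.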

This seems reasonable.  I would go for a lookahead of 4 for now and 8 for
-Ofast, and we can tune things incrementally based on experience with this
setting.

Uros, Richard, what do you think?

Honza