> > OK, so it is about 2%.  Did you check whether lookahead is needed even in
> > the early pass (before reload)?  My guess would be so, but if not, it could
> > cut the cost in half.  For -Ofast/-O3 it looks reasonable to me, but we will
> > need to announce it on the ML.  For other settings I think we need to work
> > on more improvements or cut the expenses.
> 
> Yes, it is required before reload.  
> 
> I have another idea to consider. Could we, for now, enable lookahead with
> the value 4 (pre reload) by default? This should cut the build-time cost
> sharply.
> I have measured the build time of some benchmarks (listed below) with
> lookahead value 4. The 2% increase in build time seen with value 8 is now
> almost gone.
> 
>                    dfa4       no_lookahead
>  
>  perlbench       - 191s          193s
>  bzip2           - 19s           19s
>  gcc             - 429s          429s
>  mcf             - 3s            3s
>  gobmk           - 116s          115s
>  hmmer           - 60s           60s
>  sjeng           - 18s           17s
>  libquantum      - 6s            6s
>  h264ref         - 107s          107s
>  omnetpp         - 128s          128s
>  astar           - 7s            7s
>  bwaves          - 5s            5s
>  gamess          - 1964s         1957s
>  milc            - 18s           18s
>  GemsFDTD        - 273s          272s
> 
> Lookahead value 4 also helps because the modified decoder model in bdver3.md
> is only two cycles deep (though in hardware it is actually 4 cycles deep).
> This means that we can look another two levels deep for a better schedule.
> GemsFDTD still retains the performance boost of around 6-7% with value 4.
> 
> Let me know your thoughts.

This seems reasonable.  I would go for lookahead of 4 for now and 8 for -Ofast,
and we can tune things incrementally based on experience with this setting.
Uros, Richard, what do you think?
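
For anyone who wants the aggregate rather than the per-benchmark view, here is
a quick sketch that sums the build times from the quoted table and reports the
overall difference.  The numbers are copied from the table above; the variable
names are only illustrative, and this is a plain sum over wall-clock seconds,
not a statistically careful comparison.

```python
# Build times in seconds, copied from the quoted table:
# benchmark -> (dfa4, no_lookahead)
build_times = {
    "perlbench":  (191, 193),
    "bzip2":      (19, 19),
    "gcc":        (429, 429),
    "mcf":        (3, 3),
    "gobmk":      (116, 115),
    "hmmer":      (60, 60),
    "sjeng":      (18, 17),
    "libquantum": (6, 6),
    "h264ref":    (107, 107),
    "omnetpp":    (128, 128),
    "astar":      (7, 7),
    "bwaves":     (5, 5),
    "gamess":     (1964, 1957),
    "milc":       (18, 18),
    "GemsFDTD":   (273, 272),
}

# Sum each column across all benchmarks.
total_dfa4 = sum(dfa4 for dfa4, _ in build_times.values())
total_base = sum(base for _, base in build_times.values())

# Relative build-time change of lookahead 4 versus no lookahead, in percent.
overhead_pct = 100.0 * (total_dfa4 - total_base) / total_base

print(f"dfa4 total:         {total_dfa4}s")
print(f"no_lookahead total: {total_base}s")
print(f"overall change:     {overhead_pct:+.2f}%")
```

On these numbers the totals differ by a few seconds out of several thousand,
and most of that comes from gamess alone, so single-run noise may well account
for a good part of it.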

Honza
