Here is my report on Fortran benchmarking. I compare trunk as of
20080507 (no revision number, sorry) with the IRA branch at rev. 135035.
I run the Polyhedron benchmark
(http://www.polyhedron.co.uk/polyhedron_benchmark_suite0html), which is
probably the most widely used benchmark in the Fortran community. I
don't have many data points, but they are very well converged: I used
the "standard" parameters, which means that each test, with each set
of compilation options, is run between 10 and 100 times, until the
standard deviation of the timings falls below 0.1%.
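The stopping rule described above can be sketched as follows. This is a minimal sketch, not the Polyhedron harness itself: the function name run_until_converged is hypothetical, and it assumes the 0.1% threshold applies to the relative standard error of the mean (standard deviation / sqrt(n) / mean), which is one plausible reading of the "Estim Err %" column in the tables below.

```python
import statistics
from typing import Callable, List


def run_until_converged(
    run_once: Callable[[], float],
    min_runs: int = 10,
    max_runs: int = 100,
    target_rel_err: float = 0.001,  # 0.1%
) -> List[float]:
    """Repeat run_once() until the timings have converged.

    Assumption: "converged" means the relative standard error of the
    mean (stdev / sqrt(n) / mean) is below target_rel_err. The real
    Polyhedron harness may use a different estimator.
    """
    times: List[float] = []
    while len(times) < max_runs:
        times.append(run_once())
        n = len(times)
        if n < min_runs:
            continue  # always collect at least min_runs samples
        rel_err = statistics.stdev(times) / n ** 0.5 / statistics.mean(times)
        if rel_err < target_rel_err:
            break
    return times


if __name__ == "__main__":
    # Hypothetical stand-in for timing one benchmark run (seconds).
    fake_timings = iter([10.00, 10.02] * 50)
    times = run_until_converged(lambda: next(fake_timings))
    print(f"{len(times)} runs, mean {statistics.mean(times):.3f} s")
```

Because the standard error shrinks as runs accumulate, quiet benchmarks stop at the 10-run floor while noisy ones keep going up to the 100-run cap, which matches the spread of the "Number Repeats" column.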
I compile with -march=native -ffast-math -funroll-loops -O3 and run on
a two-socket machine with dual-core processors and 8 GB of RAM;
/proc/cpuinfo reports a Dual-Core AMD Opteron(tm) Processor 2220
running at about 2.8 GHz.
Full timings are below, with a summary here:
Overall, IRA introduces a regression of about 2% in execution time:
the geometric mean execution time goes from 20.85 s to 21.27 s
(+2.0%), and the per-benchmark changes average +2.2%. It also
introduces a 2.7% regression in compilation time, consistent with my
previous mail. Using the CB algorithm doesn't change this
significantly.
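The summary figures can be recomputed from the detailed timings at the end of this mail. The sketch below copies the mainline and IRA (-fira) average run times and computes two summaries: the ratio of the geometric mean execution times, and the plain average of the per-benchmark percentage changes. The helper names geometric_mean and pct_change are hypothetical, written for this illustration.

```python
import math

# Average run times in seconds, copied from the detailed mainline and
# IRA (-fira) timing tables in this mail.
MAINLINE = {
    "ac": 11.32, "aermod": 38.82, "air": 12.04, "capacita": 78.93,
    "channel": 10.12, "doduc": 35.21, "fatigue": 8.60, "gas_dyn": 10.08,
    "induct": 34.38, "linpk": 26.17, "mdbx": 16.41, "nf": 29.72,
    "protein": 57.54, "rnflow": 31.42, "test_fpu": 18.07, "tfft": 6.91,
}
IRA = {
    "ac": 11.50, "aermod": 41.10, "air": 12.00, "capacita": 83.01,
    "channel": 10.15, "doduc": 33.94, "fatigue": 8.52, "gas_dyn": 10.18,
    "induct": 44.79, "linpk": 25.70, "mdbx": 16.05, "nf": 29.94,
    "protein": 58.29, "rnflow": 31.47, "test_fpu": 17.92, "tfft": 6.86,
}


def geometric_mean(values):
    """exp of the mean of the natural logs of the values."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


def pct_change(new, old):
    """Relative change from old to new, in percent."""
    return 100.0 * (new / old - 1.0)


per_test = {name: pct_change(IRA[name], MAINLINE[name]) for name in MAINLINE}
geo_main = geometric_mean(MAINLINE.values())
geo_ira = geometric_mean(IRA.values())
mean_of_changes = sum(per_test.values()) / len(per_test)

print(f"geometric mean time: {geo_main:.2f} s -> {geo_ira:.2f} s "
      f"({pct_change(geo_ira, geo_main):+.2f}%)")
print(f"average of per-test changes: {mean_of_changes:+.2f}%")
```

With these inputs the first summary comes out at roughly +2.0% and the second at roughly +2.25%; the gap comes from induct, whose +30% weighs more in a plain average of percentages than in a geometric mean of ratios.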
The performance regression is mainly due to one testcase, induct,
which takes a 30% hit with IRA. If its performance were the same with
IRA as with the old allocator, the switch would be (for this
benchmark) performance-neutral: the other 15 tests average +0.4%. So I
investigated induct, and found that the IRA branch compiler without
-fira is already 30% slower on it than trunk. Is this an issue with
the IRA branch, or has the branch simply not been merged with trunk
recently, so that it misses a recent large improvement on induct? I'd
appreciate it if you could enlighten me on this point.
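The claim that the switch would be roughly performance-neutral without induct can be checked with a few lines of arithmetic. This sketch takes the IRA column of the comparison table below, computes the overall change as the geometric mean of the time ratios, and then recomputes it with induct's change set to zero. The helper names geomean_change_pct and what_if are hypothetical.

```python
import math

# Per-benchmark change in execution time with IRA, in percent, copied
# from the "IRA" column of the comparison table in this mail.
IRA_CHANGE_PCT = {
    "ac": 1.59, "aermod": 5.87, "air": -0.33, "capacita": 5.17,
    "channel": 0.30, "doduc": -3.61, "fatigue": -0.93, "gas_dyn": 0.99,
    "induct": 30.28, "linpk": -1.80, "mdbx": -2.19, "nf": 0.74,
    "protein": 1.30, "rnflow": 0.16, "test_fpu": -0.83, "tfft": -0.72,
}


def geomean_change_pct(changes):
    """Overall change in percent, as the geometric mean of the time ratios."""
    logs = [math.log(1.0 + c / 100.0) for c in changes.values()]
    return 100.0 * (math.exp(sum(logs) / len(logs)) - 1.0)


def what_if(changes, name, new_pct=0.0):
    """Copy of `changes` with one benchmark's change overridden."""
    return {**changes, name: new_pct}


print(f"as measured:      {geomean_change_pct(IRA_CHANGE_PCT):+.2f}%")
print(f"induct unchanged: "
      f"{geomean_change_pct(what_if(IRA_CHANGE_PCT, 'induct')):+.2f}%")
```

This gives roughly +2.0% as measured and about +0.3% with induct unchanged, i.e. within the scatter of the individual tests.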
So, other than that small question, everything seems mostly good on
the Fortran performance front.
Cheers,
FX
Comparison of execution times (best viewed in a fixed-width font):
Benchmark       Execution time, compared to mainline
Name                 IRA    IRA-CB
---------       --------  --------
ac                +1.59%    +6.80%
aermod            +5.87%    +3.14%
air               -0.33%    -0.83%
capacita          +5.17%    +2.58%
channel           +0.30%     0.00%
doduc             -3.61%    -3.61%
fatigue           -0.93%    -2.67%
gas_dyn           +0.99%    +2.48%
induct           +30.28%   +29.64%
linpk             -1.80%    -1.57%
mdbx              -2.19%    -2.74%
nf                +0.74%    -0.30%
protein           +1.30%    +1.58%
rnflow            +0.16%    +0.22%
test_fpu          -0.83%    -0.39%
tfft              -0.72%    +0.14%
----------------------------------
arithmetic mean   +2.25%    +2.16%
Detailed timings for mainline:
Benchmark   Compile  Executable  Ave Run  Number   Estim
Name         (secs)     (bytes)   (secs) Repeats   Err %
---------   -------  ----------  ------- -------  ------
ac             7.36     1175251    11.32      15  0.0938
aermod        90.03     2424785    38.82      14  0.0866
air            6.83     1365405    12.04      19  0.0983
capacita       2.54     1235764    78.93      23  0.0785
channel        1.66     1254613    10.12      19  0.0885
doduc         13.20     1416729    35.21      13  0.0870
fatigue        6.20     1299862     8.60      12  0.0951
gas_dyn        6.45     1269413    10.08     100  0.1026
induct        19.57     1593762    34.38      10  0.0965
linpk          1.43     1162116    26.17      77  0.2626
mdbx           3.37     1192451    16.41      24  0.0939
nf             7.65     1217536    29.72      68  0.1240
protein       12.77     1342400    57.54      10  0.0942
rnflow        12.81     1357019    31.42      12  0.0976
test_fpu      11.78     1331485    18.07      24  0.0879
tfft           1.13     1173880     6.91      24  0.0991
Geometric Mean Execution Time = 20.85 seconds
Detailed timings for the IRA branch with -fira:
Benchmark   Compile  Executable  Ave Run  Number   Estim
Name         (secs)     (bytes)   (secs) Repeats   Err %
---------   -------  ----------  ------- -------  ------
ac             6.06     1158971    11.50      15  0.0979
aermod        94.22     2421896    41.10      12  0.0725
air            7.07     1352645    12.00      23  0.0899
capacita       2.89     1221980    83.01      25  0.1860
channel        1.81     1241539    10.15      31  0.0879
doduc         15.20     1404025    33.94      10  0.0628
fatigue        6.17     1273630     8.52      14  0.0966
gas_dyn        7.79     1256267    10.18      32  0.0920
induct        14.28     1567935    44.79      12  0.0772
linpk          1.44     1145546    25.70      77  0.0920
mdbx           3.54     1181755    16.05      15  0.0588
nf             7.73     1205207    29.94      66  0.0890
protein       12.89     1325392    58.29      10  0.0458
rnflow        12.45     1340531    31.47      12  0.0570
test_fpu      12.18     1312704    17.92      58  0.0797
tfft           1.28     1158396     6.86      32  0.0853
Geometric Mean Execution Time = 21.27 seconds
Detailed timings for the IRA branch with -fira -fira-algorithm=CB:
Benchmark   Compile  Executable  Ave Run  Number   Estim
Name         (secs)     (bytes)   (secs) Repeats   Err %
---------   -------  ----------  ------- -------  ------
ac             6.33     1158907    12.09      14  0.0943
aermod        89.54     2421640    40.04      14  0.0877
air            7.44     1352613    11.94      30  0.0841
capacita       2.79     1221980    80.97      25  0.2601
channel        1.75     1241411    10.12      24  0.0909
doduc         14.12     1403417    33.94      10  0.0438
fatigue        5.90     1273630     8.37      16  0.0884
gas_dyn        7.01     1256267    10.33      38  0.0855
induct        13.74     1568287    44.57      13  0.0978
linpk          2.50     1145546    25.76      78  0.2625
mdbx           3.53     1181979    15.96      49  0.0619
nf             7.91     1205207    29.63      68  0.1055
protein       12.36     1325264    58.45      10  0.0717
rnflow        11.78     1340083    31.49      17  0.0892
test_fpu      11.49     1311040    18.00      18  0.0615
tfft           1.24     1158492     6.92      25  0.0807
Geometric Mean Execution Time = 21.25 seconds
--
FX Coudert
http://www.homepages.ucl.ac.uk/~uccafco/