Re: food for optimizer developers

Vladimir Makarov Wed, 11 Aug 2010 14:05:01 -0700

On 08/10/2010 09:51 PM, Ralf W. Grosse-Kunstleve wrote:

> I wrote a Fortran to C++ conversion program that I used to convert selected
> LAPACK sources. Comparing runtimes with different compilers I get:
>
>                           absolute  relative
> ifort 11.1.072            1.790s    1.00
> gfortran 4.4.4            2.470s    1.38
> g++ 4.4.4                 2.922s    1.63

To get a full picture, it would be nice to see icc times too.

> This is under Fedora 13, 64-bit, 12-core Opteron 2.2GHz
>
> All files needed to easily reproduce the results are here:
>
>    http://cci.lbl.gov/lapack_fem/
>
> See the README file or the example commands below.
>
> Questions:
>
> - Is there a way to make the g++ version as fast as ifort?

I think it is more important (and harder) to make gfortran closer to ifort.

I cannot speak to your fragment of LAPACK. But about 15 years ago I worked on manual LAPACK optimization for an Alpha processor. As I remember, LAPACK is quite a memory-bound benchmark. The hottest spot was matrix multiplication, which is used in many places in LAPACK. The matrix multiplication in LAPACK is already moderately optimized by using a temporary variable, and that makes it 1.5 times faster (if the cache is not big enough to hold the matrices) than the normal algorithm. But proper loop optimizations (mostly tiling) could improve it by more than 4 times.

So I guess and hope the Graphite project will finally improve LAPACK by implementing tiling.
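
To illustrate the tiling idea, here is a minimal C sketch. It is not LAPACK or Graphite code: the row-major layout, the function names, and the tile edge TILE = 32 are assumptions chosen for illustration, and a real tile size would have to be tuned to the cache. The first function is the plain loop nest with the accumulator held in a temporary variable; the second does the same work in TILE x TILE blocks so that each block of the operands is reused while it is still in cache.

```c
#include <stddef.h>

/* Hypothetical tile edge; a real value must be tuned to the cache size. */
enum { TILE = 32 };

static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/* Plain C = A * B for n x n row-major matrices (layout is an assumption).
   The accumulator t is the "temporary variable" form of the inner loop. */
void matmul_plain(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            double t = 0.0;
            for (size_t k = 0; k < n; k++)
                t += a[i * n + k] * b[k * n + j];
            c[i * n + j] = t;
        }
}

/* Tiled C = A * B: the three outer loops walk over blocks, the three
   inner loops do the multiplication inside one block. */
void matmul_tiled(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n * n; i++)
        c[i] = 0.0;

    for (size_t ii = 0; ii < n; ii += TILE)
        for (size_t kk = 0; kk < n; kk += TILE)
            for (size_t jj = 0; jj < n; jj += TILE) {
                size_t imax = min_sz(ii + TILE, n);
                size_t kmax = min_sz(kk + TILE, n);
                size_t jmax = min_sz(jj + TILE, n);
                for (size_t i = ii; i < imax; i++)
                    for (size_t k = kk; k < kmax; k++) {
                        double aik = a[i * n + k];  /* reused across j */
                        for (size_t j = jj; j < jmax; j++)
                            c[i * n + j] += aik * b[k * n + j];
                    }
            }
}
```

Both functions perform the same multiplications and additions; only the order in which memory is touched differs, which is why tiling helps a memory-bound kernel without changing the amount of arithmetic.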

After solving the memory-bound problem, loop vectorization is another important optimization which could improve LAPACK. Unfortunately, GCC vectorizes fewer loops than ifort (about 2 times fewer when I last checked). I did not analyze the reason for this.

After solving the vectorization problem, another important lower-level loop optimization is modulo scheduling (even though modern x86/x86_64 processors are out of order), because OOO processors can look only through a few branches. And as I remember, the Intel compiler does modulo scheduling frequently. GCC's modulo scheduling is quite constrained.
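
Modulo scheduling is a compiler technique for software pipelining, that is, overlapping work from different loop iterations. The following C sketch shows the idea by hand at source level. It is only an illustration under assumptions: the function names are hypothetical, x and y are assumed not to overlap, and a real compiler does this on machine instructions, not on C statements.

```c
#include <stddef.h>

/* Plain loop: y[i] = a * x[i] + y[i]. */
void axpy_plain(size_t n, double a, const double *x, double *y)
{
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* Hand-pipelined version: the loads for iteration i + 1 are issued
   before the multiply-add and store of iteration i. */
void axpy_pipelined(size_t n, double a, const double *x, double *y)
{
    if (n == 0)
        return;

    double xv = x[0], yv = y[0];              /* prologue: first loads */
    for (size_t i = 0; i + 1 < n; i++) {
        double xn = x[i + 1], yn = y[i + 1];  /* load stage, iteration i + 1 */
        y[i] = a * xv + yv;                   /* compute stage, iteration i */
        xv = xn;
        yv = yn;
    }
    y[n - 1] = a * xv + yv;                   /* epilogue: last compute */
}
```

The prologue and epilogue are the price of the overlap: the steady-state loop body mixes two iterations, so the first load and the last compute have to be peeled off.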

Those are my thoughts, but I might be wrong because I have no time to confirm my speculations. If you really want to help GCC developers, you could do a comparative analysis of the code generated by ifort and gfortran and find which optimizations GCC misses. GCC has few resources, and the developers who could solve these problems are very busy. Intel's optimizing compiler team (besides researchers) is much bigger than the whole GCC community. Taking this into account, and that they have much more information about their processors, I don't think gfortran will generate better or equal code for floating-point benchmarks in the near future.
