On Thu, Aug 12, 2010 at 08:47:34PM +0200, Toon Moene wrote:
> Steve Kargl wrote:
>
> ># gfc4x 9.814 9.358 8.622 9.810 Note1 9.172 8.958 9.022
>
> Column 5 compiled with -march=native -O2 -ffast-math
>
> ># Note 1: STOP DLAMC1 failure (10)
>
> That's probably because a standard compile of the LAPACK sources only
> compiles {S|D}LAM* with -O0.
>
> The code is simply not written for any higher optimization (i.e., it
> assumes the compiler more or less compiles it "literally").
>

Your observation reinforces the notion that doing benchmarks properly
is difficult. I forgot about the LAPACK inquiry routines.

One would think that, some 20+ years after F90, Dongarra and colleagues
would use the intrinsic numeric inquiry functions. Although the
accumulated time is small, DLAMCH() is called 2642428 times during
execution. Everything returned by DLAMCH() can be reduced to a
compile-time constant.

-- 
Steve
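
A minimal sketch of the compile-time-constant idea, not LAPACK code.
The module name machine_constants and the parameter names in it are
hypothetical, chosen only for illustration. It uses the standard F90
numeric inquiry intrinsics (EPSILON, TINY, HUGE, RADIX, DIGITS,
MINEXPONENT, MAXEXPONENT) in named constants, so the compiler folds
every value and no run-time probing is needed.

```fortran
module machine_constants
  ! Hypothetical module: machine parameters as named constants.
  implicit none
  private
  integer, parameter, public :: dp = kind(1.0d0)

  ! Each value is an initialization expression, fixed at compile time.
  real(dp), parameter, public :: eps  = epsilon(1.0_dp)      ! spacing at 1.0
  real(dp), parameter, public :: rmin = tiny(1.0_dp)         ! smallest normal
  real(dp), parameter, public :: rmax = huge(1.0_dp)         ! largest finite
  integer,  parameter, public :: base = radix(1.0_dp)        ! number base
  integer,  parameter, public :: ndig = digits(1.0_dp)       ! mantissa digits
  integer,  parameter, public :: emin = minexponent(1.0_dp)  ! min exponent
  integer,  parameter, public :: emax = maxexponent(1.0_dp)  ! max exponent
end module machine_constants

program show_constants
  use machine_constants
  implicit none
  ! Print the constants; nothing here depends on optimization level.
  print *, 'eps  = ', eps
  print *, 'rmin = ', rmin
  print *, 'rmax = ', rmax
  print *, 'base = ', base, ' digits = ', ndig
  print *, 'emin = ', emin, ' emax = ', emax
end program show_constants
```

These are not guaranteed to be drop-in replacements for the DLAMCH()
return values. DLAMCH('E'), for example, may differ from EPSILON() by a
factor of two depending on the rounding convention assumed, so each
value would need to be checked against the LAPACK definition.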