Robert G. Brown wrote:
Perhaps fortunately (perhaps not) there is a lot less variation in system performance with system design than there once was. Everybody uses one of a few CPUs, one of a few chipsets, generic memory, standardized peripherals. There can be small variations from system to system, but in many cases one can get a pretty good idea of the nonlinear "performance fingerprint" of a given CPU/OS/compiler family (e.g. opteron/linux/gcc) all at once and have it not be crazy wrong or unintelligible as you vary similar systems from different manufacturers or vary clock speed within the family. There are enough exceptions that it isn't wise to TRUST this rule, but it is still likely correct within 10% or so.
I agree that this rule holds for almost all codes ... that fit entirely in cache and that do not try to benefit from specific optimisations.
HPC codes, however, are always pushing the limits, which means you will always stumble on some bottleneck somewhere. Once you remove that bottleneck, you stumble on the next one. And each bottleneck masks all the others until you remove it.
E.g. it was already mentioned in this thread that one should not forget to pay attention to storage. Yet people often run parallel codes in which each process performs heavy I/O, without a storage system adapted to that load.
Or another example: GotoBLAS is well known to outperform netlib BLAS. However, in an application calling many dgemm's on small matrices (up to 50x50), netlib BLAS will _really_ (i.e. by a factor of 30) outperform GotoBLAS, because GotoBLAS 'loses' time aligning the matrices etc., which becomes significant for small matrices.
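The small-matrix regime above is easy to probe yourself. Here is a minimal, hypothetical micro-benchmark sketch: it times many products of 50x50 matrices via whatever BLAS the local NumPy happens to be linked against. Which BLAS that is decides the outcome, so the numbers you get are illustrative only and not a reproduction of the factor-30 figure quoted above; the function name and parameters are my own, not from the thread.

```python
import time
import numpy as np

def time_small_dgemms(n=50, reps=2000, seed=0):
    """Time `reps` products of n-by-n matrices -- the small-matrix regime
    discussed above. Returns (elapsed seconds, last product)."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((n, n))
    b = rng.standard_normal((n, n))
    t0 = time.perf_counter()
    for _ in range(reps):
        c = a @ b  # dispatches to the dgemm of whichever BLAS NumPy links
    return time.perf_counter() - t0, c

if __name__ == "__main__":
    elapsed, _ = time_small_dgemms()
    print(f"{elapsed:.3f} s for 2000 dgemm calls on 50x50 matrices")
```

Running the same script against two NumPy builds linked to different BLAS libraries is a crude but effective way to see whether per-call overhead (alignment, copying, thread startup) dominates at these matrix sizes.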
toon

_______________________________________________
Beowulf mailing list, [email protected]
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
