Re: [Beowulf] Performance characterising a HPC application

Patrick Geoffray Thu, 22 Mar 2007 23:33:25 -0800

Greg,

Greg Lindahl wrote:

Compare the latency numbers in HPC Challenge to the 2-node ping-pong
latency reported by vendors. For some vendors, it's the same number.
For others, the latency from using all the nodes is much, much higher.

The ring test in HPCC is rather poorly implemented: only 3 iterations to
measure something of the same order of magnitude as the precision of
MPI_Wtime(). Someone just failed Benchmarking 101. If you replace a
gettimeofday() implementation of MPI_Wtime() with a cycle-counter one,
then the numbers change quite a bit.
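
To make the timing point concrete, here is a minimal sketch in Python (not the HPCC code) of how timer granularity limits a short measurement. The 2 us latency and 1 us timer step used in the example are assumed values, chosen only for illustration, and the function names are hypothetical.

```python
import time

def timer_resolution_ns(clock_ns, samples=50):
    """Smallest positive step observed between consecutive clock reads."""
    best = None
    for _ in range(samples):
        t0 = clock_ns()
        t1 = clock_ns()
        while t1 <= t0:          # spin until the clock visibly advances
            t1 = clock_ns()
        step = t1 - t0
        if best is None or step < best:
            best = step
    return best

def worst_case_error_pct(latency_ns, resolution_ns, iterations):
    """Timer granularity as a percentage of the total timed interval."""
    return 100.0 * resolution_ns / (latency_ns * iterations)

def min_iterations(latency_ns, resolution_ns, max_error_pct):
    """Iterations needed so one timer step stays below max_error_pct."""
    needed = resolution_ns * 100
    budget = latency_ns * max_error_pct
    return -(-needed // budget)  # integer ceiling division

if __name__ == "__main__":
    # Observed step of two Python clocks on this machine (values vary).
    print("time.time_ns step:", timer_resolution_ns(time.time_ns), "ns")
    print("perf_counter_ns step:", timer_resolution_ns(time.perf_counter_ns), "ns")
    # Assumed example: 2 us latency, 1 us timer step, 3 iterations.
    print("error with 3 iterations: %.1f%%" % worst_case_error_pct(2000, 1000, 3))
    print("iterations for 1% error:", min_iterations(2000, 1000, 1))
```

Under those assumed numbers, three iterations leave the timer step at a sizeable fraction of the interval being timed, which is why either more iterations or a finer clock changes the reported latency.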

However, I agree with you, this is the right way to measure the
sensitivity to concurrent traffic.

Note that the new MVAPICH has message coalescing, which causes its

It is unbelievable that so few people denounce it. It is clearly
implemented only to cheat on a micro-benchmark. What's next? Checking
that the buffer to send is identical to the previous one to avoid
sending "redundant" messages in ping-pong?!?

message each to lots of other nodes before synchronizing. Message rate
benchmarks like "base" HPCC Gups get no benefit from message
coalescing.

HPCC Gups already does some sort of coalescing. If updates are going to
the same process, then they are put in the same bucket. The size of
messages depends on the number of updates in the buckets, so a smaller
number of nodes means bigger messages. I don't understand why they would
do that; it defeats the goal of scalability testing.
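
A rough Python sketch of the bucketing effect described above, not the HPCC source: updates are grouped by destination process, so the same batch yields larger per-destination messages when there are fewer processes. The batch size of 1024, the table size, the block distribution of the table, and all function names are assumptions made for illustration.

```python
import random

def bucket_updates(updates, nprocs, table_size):
    """Group table indices by owning process (block distribution assumed)."""
    local_size = table_size // nprocs   # assumes nprocs divides table_size
    buckets = [[] for _ in range(nprocs)]
    for index in updates:
        buckets[index // local_size].append(index)
    return buckets

def average_message_size(buckets):
    """Average number of updates per destination bucket."""
    return sum(len(b) for b in buckets) / len(buckets)

def demo(nprocs, batch=1024, table_size=1 << 20, seed=0):
    """Average bucket size for one batch of uniformly random updates."""
    rng = random.Random(seed)
    updates = [rng.randrange(table_size) for _ in range(batch)]
    return average_message_size(bucket_updates(updates, nprocs, table_size))

if __name__ == "__main__":
    for nprocs in (4, 16, 64, 256):
        print(nprocs, "processes:", demo(nprocs), "updates per bucket")
```

With a fixed batch, the average bucket shrinks in proportion to the process count, so a small run and a large run are not sending messages of the same size.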

HPC Challenge is much better than what has come before, but it too can

I think HPCC is somewhat of a regression compared to the NAS benchmarks,
for example. The communication benchmarks are too analytic, not
functional enough.

intra-node. And guess what? HPCC results are hard to come by, even though
it's pretty easy to run.

And HPCC is a pain in the bottom to compile and run. HPL is not really a
shining example of a straightforward build process and configless
operation, so why build HPCC on top of it? Is autoconf still too
bleeding edge these days? Argh! And what about the three dozen
parameters in the config file?!? It's just insane.

I like the NAS benchmarks. You can run each of them independently, and
you only choose the problem size and the number of processes. Easy to
run, easy to compare. Pallas is nice too, anybody can run it.

Trust me, I'd love to see microbenchmarks which attack the real issues
that speed up applications. But usually they miss the mark, and my
attempt to create a new one (message rate) is now destroyed by message
coalescing. I should have used an N-node benchmark instead.

If you want to show the impact of concurrent communications, something
latency-based like the HPCC ring test is the best way (possibly with
more nodes). The millions of packets per second of a stream-based
benchmark are lovely for the marketing folks, but have little meaning
for real codes that do even a minimum of computation. However, an
alltoall on many cores/nodes would exercise the same metric (many
sends/recvs on the same NIC at the same time), while being harder to
cheat and much more meaningful IMHO.
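
As a back-of-the-envelope illustration of the "many sends/recvs on the same NIC" point, the Python sketch below counts how many inter-node messages each node's NIC carries in one alltoall. It is only a counting model, not an MPI benchmark; the one-NIC-per-node layout, the block mapping of ranks to nodes, and the function name are assumptions.

```python
from collections import Counter

def alltoall_nic_load(nodes, cores_per_node):
    """Count inter-node sends and receives per node for one alltoall.

    Assumes one NIC per node and a block mapping of ranks to nodes
    (rank // cores_per_node); intra-node pairs are not counted.
    """
    nranks = nodes * cores_per_node
    sends = Counter()
    recvs = Counter()
    for src in range(nranks):
        for dst in range(nranks):
            src_node = src // cores_per_node
            dst_node = dst // cores_per_node
            if src_node != dst_node:
                sends[src_node] += 1
                recvs[dst_node] += 1
    return sends, recvs

if __name__ == "__main__":
    for nodes, cores in ((2, 1), (4, 2), (16, 4)):
        sends, recvs = alltoall_nic_load(nodes, cores)
        print(nodes, "nodes x", cores, "cores:",
              sends[0], "sends and", recvs[0], "recvs per NIC")
```

In this model the per-NIC message count grows with both the node count and the cores per node, which is the concurrency a 2-node ping-pong never exercises.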


Patrick
--
Patrick Geoffray
Myricom, Inc.
http://www.myri.com