Re: [OMPI devel] Best bw/lat performance for microbenchmark/debug utility

Patrick Geoffray Thu, 29 Jun 2006 19:35:25 -0400

Jeff Squyres (jsquyres) wrote:

-----Original Message-----
From: devel-boun...@open-mpi.org[mailto:devel-boun...@open-mpi.org] On Behalf Of Patrick Geoffray
Sent: Wednesday, June 28, 2006 1:23 PM
To: Open MPI Developers
Subject: Re: [OMPI devel] Best bw/lat performance for microbenchmark/debug utility
Josh Aune wrote:
I am writing up some interconnect/network debugging software that is
centered around ompi. What is the best set of functions to

I was assuming that you would be testing latency/bandwidth, but Patrick
is correct in stating that there are many more things to test than just
those two metrics.

There are a lot of metrics, but most of them require a deep understanding of the MPI semantics and implementation details to make sense. The art of micro-benchmarking is to choose the metrics and explain why they matter. It's obvious for latency/bandwidth, a bit less so for unexpected messages and host overhead, and definitely hard for overlap and progress. And that's just for point-to-point.

To avoid reinventing the wheel, I would suggest that Josh develop a micro-benchmark test suite to compute a very detailed set of LogP-derived parameters, i.e. for all message sizes:

* Send overhead (o.s) and recv overhead (o.r). These overheads will likely be either constant or linear over various message size ranges; it would be great to compute the ranges automatically. Memory registration cost is accounted for here, so it would also be useful to measure with and without the registration cache.

* Latency (L).

* Send gap (g.s) and recv gap (g.r). For large messages, they will likely be identical and represent the link bandwidth. For smaller messages, the send gap is the gap of a fan-out pattern (1->N) and the recv gap is the gap of a flat gather (N->1). It's important not to have the send or recv overhead hiding the send or recv gap; several processes could be used to divide the send/recv overhead.

* Unexpected overhead (o.u). Overhead added to (o.r) when the message is not immediately matched.

* Overlap availability (a), that is, the percentage of communication time that you can overlap with real host computation.
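The first item in the list asks for the constant or linear ranges of the overheads to be computed automatically. One possible approach, given here only as a minimal sketch, is to grow a range greedily for as long as a single least-squares line still fits the samples. The function names, the (size, overhead) sample layout, and the tolerance `tol` are assumptions made for this illustration; they do not come from any existing benchmark suite.

```python
from typing import List, Tuple

# Hypothetical sample layout: (message size in bytes, measured overhead in microseconds)
Sample = Tuple[int, float]


def fit_line(points: List[Sample]) -> Tuple[float, float]:
    """Least-squares fit y = a + b*x over the points; returns (a, b)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        # A single point (or identical sizes): treat the range as constant.
        return mean_y, 0.0
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    b = sxy / sxx
    return mean_y - b * mean_x, b


def max_residual(points: List[Sample], a: float, b: float) -> float:
    """Largest absolute distance between a sample and the fitted line."""
    return max(abs(y - (a + b * x)) for x, y in points)


def find_ranges(samples: List[Sample], tol: float = 0.05):
    """Split samples into size ranges that one line fits within tol.

    Returns a list of (first_size, last_size, intercept, slope).
    A slope close to zero means the overhead is constant over that range.
    """
    samples = sorted(samples)
    ranges = []
    start = 0
    while start < len(samples):
        # Start each range with two points when they are available.
        end = min(start + 2, len(samples))
        while end < len(samples):
            candidate = samples[start:end + 1]
            a, b = fit_line(candidate)
            if max_residual(candidate, a, b) > tol:
                break  # the next sample no longer fits this line
            end += 1
        a, b = fit_line(samples[start:end])
        ranges.append((samples[start][0], samples[end - 1][0], a, b))
        start = end
    return ranges
```

Calling `find_ranges` on the o.s or o.r samples for all message sizes yields one tuple per range. The greedy split is a deliberately simple choice: it is sensitive to noise and to the value of `tol`, so real measurements would probably need averaging over several runs before fitting.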

From these parameters, you can derive pretty much all characteristics of an interconnect without contention.
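As a rough illustration of such a derivation, the sketch below combines the parameters for one message size using the usual LogP conventions: a single message costs o.s + L + o.r, and back-to-back messages are spaced by the largest of the gaps and overheads. These formulas, the `Params` class, and the function names are assumptions for this example rather than something stated in the thread. It also assumes that L is measured per message size and that (a) is expressed as a fraction between 0 and 1.

```python
from dataclasses import dataclass


@dataclass
class Params:
    """Hypothetical container for one message size; all times in microseconds."""
    size: int    # message size in bytes
    o_s: float   # send overhead
    o_r: float   # recv overhead
    L: float     # latency
    g_s: float   # send gap
    g_r: float   # recv gap
    a: float     # overlap availability as a fraction (0.0 to 1.0)


def one_way_time(p: Params) -> float:
    """Time for one message from send call to completed receive (LogP: o + L + o)."""
    return p.o_s + p.L + p.o_r


def message_interval(p: Params) -> float:
    """Spacing between back-to-back messages: the slowest stage dominates."""
    return max(p.g_s, p.g_r, p.o_s, p.o_r)


def stream_time(p: Params, n: int) -> float:
    """Time until the n-th back-to-back message has been received."""
    return p.o_s + (n - 1) * message_interval(p) + p.L + p.o_r


def bandwidth_mb_per_s(p: Params) -> float:
    """Sustained bandwidth; bytes per microsecond equals 10^6 bytes per second."""
    return p.size / message_interval(p)


def overlappable_time(p: Params) -> float:
    """Part of the one-way time that host computation could overlap."""
    return p.a * one_way_time(p)
```

With made-up values such as `Params(size=1048576, o_s=2.0, o_r=3.0, L=5.0, g_s=1000.0, g_r=1000.0, a=0.9)`, `one_way_time` gives 10.0 microseconds and `stream_time(p, 3)` gives 2010.0 microseconds. None of this accounts for contention, which matches the scope stated above.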


Patrick
--
Patrick Geoffray
Myricom, Inc.
http://www.myri.com
