I've used my NetPIPE communication benchmark (http://netpipe.cs.ksu.edu)
to measure the performance of Open MPI and other implementations on
Comet at SDSC (FDR InfiniBand; graph attached; I've measured the same
results elsewhere too).
The uni-directional performance is good at 50 Gbps, the bi-directional
performance is nearly double that at 97 Gbps, and the aggregate bandwidth
from measuring 24 bi-directional ping-pongs across the link between 2 nodes
is a little lower than I'd like to see but still respectable; MVAPICH gives
similar results. All of these were measured by reusing the same source and
destination buffers each time.
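
For reference, the cached measurements follow the usual ping-pong pattern
sketched below. This is a minimal standalone example of that pattern, not
NetPIPE itself, and the 1 MB message size and repeat count are purely
illustrative; the point is that the source and destination buffers are
allocated once and reused every iteration, so the MPI library can keep them
registered with the IB card across repeats.

/* Minimal cached-buffer ping-pong sketch (run with 2 ranks, e.g.
 * mpirun -np 2).  Not NetPIPE; sizes and repeat counts are illustrative. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int nbytes = 1 << 20;          /* 1 MB message (illustrative) */
    const int reps   = 100;
    char *sbuf = malloc(nbytes);         /* allocated once ...          */
    char *rbuf = malloc(nbytes);         /* ... and reused every repeat */

    double t0 = MPI_Wtime();
    for (int i = 0; i < reps; i++) {
        if (rank == 0) {
            MPI_Send(sbuf, nbytes, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(rbuf, nbytes, MPI_BYTE, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(rbuf, nbytes, MPI_BYTE, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(sbuf, nbytes, MPI_BYTE, 0, 0, MPI_COMM_WORLD);
        }
    }
    double t1 = MPI_Wtime();

    if (rank == 0)
        printf("average round trip: %g us\n", 1e6 * (t1 - t0) / reps);

    free(sbuf);
    free(rbuf);
    MPI_Finalize();
    return 0;
}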
When I measure using the --nocache flag, the data comes from a new buffer
in main memory each time, and is therefore not already registered with the
IB card, and it likewise lands in a new buffer in main memory on the
receiving side. In that case I see a loss in performance of at least 20%.
Could someone please give me a short description of whether this is due to
the data being copied into a memory buffer that is already registered with
the IB card, or whether it is the cost of registering the new memory with
the IB card for its first use?
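
To put a rough number on the second possibility, the sketch below (my own
illustration, not anything taken from inside an MPI library) times
ibv_reg_mr() on a freshly malloc'd buffer, which is approximately the
pinning cost that would be paid each iteration if the new memory were
registered directly. The 1 MB region size is just an assumption, and it
needs to be compiled with -libverbs on a node with an HCA.

/* Times registration of new, unregistered memory with the IB card.
 * Illustrative sketch only. */
#include <infiniband/verbs.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

static double now_us(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1e6 + ts.tv_nsec / 1e3;
}

int main(void)
{
    int n = 0;
    struct ibv_device **devs = ibv_get_device_list(&n);
    if (!devs || n == 0) { fprintf(stderr, "no IB devices found\n"); return 1; }

    struct ibv_context *ctx = ibv_open_device(devs[0]);
    if (!ctx) { fprintf(stderr, "ibv_open_device failed\n"); return 1; }
    struct ibv_pd *pd = ibv_alloc_pd(ctx);
    if (!pd) { fprintf(stderr, "ibv_alloc_pd failed\n"); return 1; }

    const size_t len = 1 << 20;              /* 1 MB region (assumption) */
    for (int i = 0; i < 5; i++) {
        char *buf = malloc(len);             /* new, unregistered memory */
        double t0 = now_us();
        struct ibv_mr *mr = ibv_reg_mr(pd, buf, len, IBV_ACCESS_LOCAL_WRITE);
        double t1 = now_us();
        if (!mr) { fprintf(stderr, "ibv_reg_mr failed\n"); return 1; }
        printf("iter %d: registering %zu bytes took %.1f us\n",
               i, len, t1 - t0);
        ibv_dereg_mr(mr);
        free(buf);
    }

    ibv_dealloc_pd(pd);
    ibv_close_device(ctx);
    ibv_free_device_list(devs);
    return 0;
}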
I also see huge performance losses in this case when the message size is
not a multiple of 8 bytes (multiples of 8 are the tops of the spikes). I've
seen this in the past when a memory copy was involved and the copy routine
switched to a byte-by-byte copy for lengths that were not multiples of 8.
While I don't know how many apps fall into the worst-case scenario that the
--nocache measurements represent, I could certainly see large bioinformatics
runs being affected, since their message lengths are not going to be
multiples of 8 bytes.
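
To be clear about the mechanism I mean, the sketch below is my assumption
about the kind of bounce-buffer copy involved (not code taken from any MPI
implementation): it only uses 8-byte word moves when the length is an exact
multiple of 8 and otherwise falls back to copying the entire message byte by
byte, which would produce exactly the kind of spikes at multiples of 8 seen
in the graph.

#include <stddef.h>
#include <stdint.h>

/* Hypothetical copy routine: fast 64-bit path only for lengths (and
 * pointers) that are multiples of 8, byte-by-byte otherwise.  A better
 * routine would copy the aligned bulk with word moves and only the short
 * tail byte by byte, or simply call memcpy(). */
static void copy_word_or_bytes(void *dst, const void *src, size_t n)
{
    if (n % 8 == 0 &&
        (uintptr_t)dst % 8 == 0 && (uintptr_t)src % 8 == 0) {
        uint64_t *d = dst;                 /* one 64-bit move per 8 bytes */
        const uint64_t *s = src;
        for (size_t i = 0; i < n / 8; i++)
            d[i] = s[i];
    } else {
        unsigned char *d = dst;            /* whole message goes byte by byte */
        const unsigned char *s = src;
        for (size_t i = 0; i < n; i++)
            d[i] = s[i];
    }
}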
Dave Turner
--
Work: [email protected] (785) 532-7791
2219 Engineering Hall, Manhattan KS 66506
Home: [email protected]
cell: (785) 770-5929
[Attachment: np.comet.openmpi.pdf]
