On Aug 12, 2007, at 3:49 PM, Gleb Natapov wrote:
- Mellanox tested MVAPICH with the header caching; latency was around 1.4us
- Mellanox tested MVAPICH without the header caching; latency was around 1.9us
As far as I remember Mellanox's results, and according to our testing,
the difference between MVAPICH with header caching and OMPI is 0.2-0.3us,
not 0.5us. And MVAPICH without header caching is actually worse than
OMPI for small messages.
I guess reading the graph that Pasha sent is difficult; Pasha -- can
you send the actual numbers?
Given that OMPI is the lone outlier around 1.9us, I think we have no
choice except to implement the header caching and/or examine our
header to see if we can shrink it. Mellanox has volunteered to
implement header caching in the openib btl.
I think we do have a choice: don't implement header caching, just
change the osu_latency benchmark to send each message with a different
tag :)
If only. :-)
But that misses the point (and the fact that all the common ping-pong
benchmarks use a single tag: NetPIPE, IMB, osu_latency, etc.). *All
other MPI's* give us latency around 1.4us, but Open MPI is around
1.9us. So we need to do something.
Are we optimizing for a benchmark? Yes. But we have to do it. Many
people know that these benchmarks are fairly useless, but not enough
-- too many customers do not, and education is not enough. "Sure
this MPI looks slower but, really, it isn't. Trust me; my name is
Joe Isuzu." That's a hard sell.
I am not against header caching per se, but if it complicates the code
even a little bit, I don't think we should implement it just to benefit
one synthetic benchmark (AFAIR, before header caching was implemented in
MVAPICH, mpi_latency actually sent messages with different tags).
That may be true and a reason for us to wail and gnash our teeth, but
it doesn't change the current reality.
Also, there is really nothing to cache in the openib BTL; the openib
BTL header is only 4 bytes long. The caching would have to be done in
OB1, and there it would affect every other interconnect.
Surely there is *something* we can do -- what, exactly, is the
objection to peeking inside the PML header down in the btl? Is it
really so horrible for a btl to look inside the upper layer's
header? I agree that the PML looking into a btl header would
[obviously] be Bad.
All this being said -- is there another way to lower our latency?
My main goal here is to lower the latency. If header caching is
unattractive, then another method would be fine.
--
Jeff Squyres
Cisco Systems