I've talked with both Brian and Rich about the measurements and they are
ok with the new findings. I also have not received any other comments
to the negative on putting 1097 into the v1.2 branch. So I would like
to instruct Tim Mattox to bring over the 1097 change to v1.2 branch and
make a new 1.2 RC.
thanks,
--td
Terry Dontje wrote:
Nikolay and Community,
Sorry to be so late in responding to your email but I've been working
with Pak to determine whether my hasty decision as RM yesterday was
hasty or not. To answer your question, we are still trying to determine
if the message queue support can go in or not and the below is my
perspective on whether we should.
Community,
A couple things have transpired in the last 24 hours from when we had
our concall. As Jeff surmised earlier this morning Pak did accidentally
have debugging enabled which did skew the numbers quite a bit. After
making sure debugging was disabled for both builds (v1.2 and the tmp
branch with the message queue fixes) we then fretted over the numbers.
It looks to me that there is quite a bit of variance in the numbers that
the OSU latency, IMB latency and mpi_ping all produce.
For example, using the OSU latency tests we see the MX MTL has a .01
us difference between v1.2 and the tmp branch (in favor of v1.2).
However, the mean, trimmed mean and median have about a .02-.07 us difference
(in favor of the tmp branch). To me the data looks pretty much the same,
and the fact that we are measuring the averages (i.e. none of the tests
pick out the minimum value) makes these numbers even harder to really
nail down IMHO. I've essentially seen this effect for the other tests
(IMB and mpi_ping).
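
To make the comparison above concrete, here is a minimal sketch of how the
minimum, mean, trimmed mean and median of a set of latency samples could be
computed side by side. The sample values, the 10% trim fraction and the
function names are all made up for illustration; they are not the numbers
or scripts Pak and I actually used.

```python
import statistics


def trimmed_mean(samples, trim_fraction=0.1):
    """Mean after dropping trim_fraction of the samples from each end."""
    ordered = sorted(samples)
    k = int(len(ordered) * trim_fraction)  # samples to drop per end
    kept = ordered[k:len(ordered) - k] if k else ordered
    return statistics.mean(kept)


def summarize(samples):
    """Return the summary statistics discussed above for one run."""
    return {
        "min": min(samples),
        "mean": statistics.mean(samples),
        "trimmed_mean": trimmed_mean(samples),
        "median": statistics.median(samples),
        "stdev": statistics.stdev(samples),
    }


# Hypothetical latency samples in microseconds (NOT measured data).
v12_us = [3.41, 3.44, 3.40, 3.52, 3.43, 3.47, 3.42, 3.61, 3.45, 3.44]
tmp_us = [3.42, 3.43, 3.41, 3.49, 3.44, 3.45, 3.40, 3.55, 3.46, 3.43]

for name, samples in (("v1.2", v12_us), ("tmp", tmp_us)):
    stats = summarize(samples)
    print(name, {key: round(value, 3) for key, value in stats.items()})
```

If the standard deviation within one build is larger than the gap between
the two builds' means or medians, the two sets of numbers are hard to tell
apart, which is the situation I am describing.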
For the SM timings using the mpi_ping tests we have seen a range of
average latencies from 1.47-1.5 us for both the tmp and v1.2 so they
seem like moral equivalents to me. Rich Graham has led me to believe
that he might get more consistent numbers but we are not able to and so
I can only deduce that the numbers are essentially the same.
In conclusion, I believe the performance of both the CM PML (MX MTL) and
the SM BTL is about the same between the tmp branch and v1.2. Because
of this I would like to request that the 1097 cmr be put into 1.2.4. If
others disagree with my assessment above I think a discussion will need
to ensue and I would welcome further testing by others that may show
that the changes have regressed performance (or not). I would like to
set a timeout of 12 noon ET for others to comment whether these new
findings put our fears at ease. At that time, if no dissenting
comments have been received I would like to ask Tim to pull in these
changes and rebuild 1.2.4.
thanks,
--td
Nikolay Piskun wrote:
Hi,
Just to verify, before I start testing this: there will be no
message queue debugging support in this version, correct? That all
goes into the 1.3 release.
Best Regards,
P.S. It looks like it is time for us to be more formally involved in
this work.
Nikolay Piskun
Director of Continuing Engineering, TotalView Technologies
24 Prime Parkway, Natick, MA 01760
http://www.totalviewtech.com
------------------------------------------------------------------------
_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel