Gleb --
How about making a tarball with this patch in it that can be thrown at
everyone's MTT? (we can put the tarball on www.open-mpi.org somewhere)
On Dec 11, 2007, at 4:14 PM, Richard Graham wrote:
I will reiterate my concern. The code that is there now is mostly nine years old (with some mods made when it was brought over to Open MPI). It took about two months of testing on systems with 5-13-way network parallelism to track down all KNOWN race conditions. This code is at the center of MPI correctness, so I am VERY concerned about changing it w/o some very strong reasons. Not opposed, just very cautious.
Rich
On 12/11/07 11:47 AM, "Gleb Natapov" <gl...@voltaire.com> wrote:
On Tue, Dec 11, 2007 at 08:36:42AM -0800, Andrew Friedley wrote:
Possibly, though I have results from a benchmark I've written indicating the reordering happens at the sender. I believe I found it was due to the QP striping trick I use to get more bandwidth -- if you back down to one QP (there's a define in the code you can change), the reordering rate drops.
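Roughly, the striping does something like the sketch below (hypothetical names, not the actual ud BTL code; just the general idea of why consecutive frags can leave in a different order):

    #include <stddef.h>
    #include <stdint.h>

    /* Sketch: fragments are round-robined across several UD QPs to get more
     * aggregate bandwidth.  Each QP drains its send queue independently, so
     * frag N+1 posted to a different QP can hit the wire (and arrive) before
     * frag N. */
    #define NUM_QPS 4                 /* assumed; the real define lives in the BTL */

    struct frag { uint32_t seq; void *data; size_t len; };

    extern int post_send(int qp_index, struct frag *f);  /* stand-in for the ibv_post_send() plumbing */

    static int next_qp = 0;

    int stripe_send(struct frag *frags, int nfrags)
    {
        for (int i = 0; i < nfrags; i++) {
            /* round-robin: consecutive fragments go to different QPs */
            int qp = next_qp;
            next_qp = (next_qp + 1) % NUM_QPS;
            if (post_send(qp, &frags[i]) != 0)
                return -1;
        }
        return 0;
    }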
Ah, OK. My assumption was just from looking at the code, so I may be wrong.
Also, I do not make any recursive calls to progress -- at least not directly in the BTL; I can't speak for the upper layers. The reason I process many completions at once is that it is a big help in turning around receive buffers, making it harder to run out of buffers and drop frags. I want to say there was some performance benefit as well, but I can't say for sure.
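For what it's worth, the receive path is roughly this shape (ibv_poll_cq is the real verbs call; the helpers are hypothetical stand-ins for the BTL's buffer management):

    #include <infiniband/verbs.h>

    #define WC_BATCH 500              /* cf. MCA_BTL_UD_NUM_WC */

    extern void process_recv(struct ibv_wc *wc);        /* hand the frag to the PML */
    extern void repost_recv_buffer(struct ibv_wc *wc);  /* recycle the buffer */

    /* Draining a big batch per progress call returns receive buffers to the
     * free list quickly, so the QP is less likely to run dry and drop frags. */
    void btl_ud_progress_recv(struct ibv_cq *cq)
    {
        struct ibv_wc wc[WC_BATCH];
        int n = ibv_poll_cq(cq, WC_BATCH, wc);

        for (int i = 0; i < n; i++) {
            process_recv(&wc[i]);
            repost_recv_buffer(&wc[i]);
        }
    }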
Currently, the upper layers of Open MPI may call the BTL progress function recursively. I hope this will change some day.
Andrew
Gleb Natapov wrote:
On Tue, Dec 11, 2007 at 08:03:52AM -0800, Andrew Friedley wrote:
Try UD; frags are reordered at a very high rate, so it should be a good test.
Good idea, I'll try this. BTW, I think the reason for such a high rate of reordering in UD is that it polls for MCA_BTL_UD_NUM_WC completions (500) and processes them one by one; if the progress function is called recursively, the next 500 completions will be reordered relative to the previous ones (the reordering happens on the receiver, not the sender).
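In other words, something like the following (simplified, hypothetical names; the real loop is in the ud BTL):

    #include <infiniband/verbs.h>

    extern void handle_completion(struct ibv_wc *wc);   /* may call back into the upper layers */

    /* Handling a completion may re-enter this function via a recursive
     * progress call.  The recursive call polls and delivers a NEWER batch of
     * completions before completions i+1..n-1 of the OLDER batch have been
     * handled, so the receiver sees frags out of order even if the wire
     * delivered them in order. */
    void ud_component_progress(struct ibv_cq *cq)
    {
        struct ibv_wc wc[500];               /* MCA_BTL_UD_NUM_WC */
        int n = ibv_poll_cq(cq, 500, wc);

        for (int i = 0; i < n; i++)
            handle_completion(&wc[i]);       /* recursion can happen here */
    }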
Andrew
Richard Graham wrote:
Gleb,
I would suggest that before this is checked in, it be tested on a system that has N-way network parallelism, where N is as large as you can find. This is a key bit of code for MPI correctness, and mishandled out-of-order operations will break it, so you want to maximize the chance of such operations occurring.
Rich
On 12/11/07 10:54 AM, "Gleb Natapov" <gl...@voltaire.com> wrote:
Hi,
I did a rewrite of the matching code in OB1. I made it much simpler and half the size (which is good: less code, fewer bugs). I also got rid of the huge macros, which is very helpful if you need to debug something. There is no performance degradation; actually, I even see a very small performance improvement. I ran MTT with this patch and the result is the same as on trunk. I would like to commit this to the trunk. The patch is attached for everybody to try.
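For anyone who hasn't dug into OB1, the general shape of sequence-number matching is roughly the sketch below (hypothetical types and names, not the actual patch): the sender stamps frags with a monotonically increasing sequence number, and the receiver only matches the next expected one, parking early arrivals on a per-peer out-of-order list.

    #include <stdint.h>

    struct frag {
        uint16_t     seq;        /* per-peer ordering sequence number */
        struct frag *next;
        /* ... match header, payload, etc. ... */
    };

    struct peer {
        uint16_t     next_seq;   /* next sequence number we expect */
        struct frag *ooo_list;   /* frags that arrived early, unsorted */
    };

    extern void match_and_deliver(struct peer *p, struct frag *f);

    void handle_frag(struct peer *p, struct frag *f)
    {
        if (f->seq != p->next_seq) {
            /* Out of order: park it until the gap is filled. */
            f->next = p->ooo_list;
            p->ooo_list = f;
            return;
        }

        match_and_deliver(p, f);
        p->next_seq++;

        /* See whether any parked frags have now become in-order. */
        int progressed = 1;
        while (progressed) {
            progressed = 0;
            for (struct frag **fp = &p->ooo_list; *fp; fp = &(*fp)->next) {
                if ((*fp)->seq == p->next_seq) {
                    struct frag *g = *fp;
                    *fp = g->next;
                    match_and_deliver(p, g);
                    p->next_seq++;
                    progressed = 1;
                    break;
                }
            }
        }
    }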
--
Gleb.
--
Gleb.
--
Gleb.
--
Jeff Squyres
Cisco Systems