On Wed, Dec 12, 2007 at 03:46:10PM -0500, Richard Graham wrote:
> This is better than nothing, but really not very helpful for looking at the
> specific issues that can arise with this, unless these systems have several
> parallel networks, with tests that will generate a lot of parallel network
> traffic and can self-check for out-of-order receives - i.e. this needs to be
> encoded into the payload for verification purposes. There are some
> out-of-order scenarios that need to be generated and checked. I think that
> George may have a system that will be good for this sort of testing.
>
I am running various tests with multiple networks right now. I use several IB
BTLs and the TCP BTL simultaneously. I see many reordered messages, and all
tests have been OK so far, but as far as I know they don't encode the message
sequence in the payload. I'll change one of them to do so.
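
Roughly what I have in mind is the sketch below - hand-written here for
illustration, not one of the actual tests (MSG_COUNT and MSG_SIZE are
arbitrary). The sender stamps a per-message sequence number at the start of
every payload and the receiver checks that messages are matched in order, so
a matching bug exposed by network-level reordering fails loudly instead of
passing silently:

#include <mpi.h>
#include <stdio.h>
#include <string.h>

#define MSG_COUNT 10000
#define MSG_SIZE  8192   /* big enough to be fragmented across the BTLs */

int main(int argc, char **argv)
{
    int rank, size, i, seq;
    char buf[MSG_SIZE];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size < 2)
        MPI_Abort(MPI_COMM_WORLD, 1);

    if (rank == 0) {
        for (i = 0; i < MSG_COUNT; i++) {
            memset(buf, 0, MSG_SIZE);
            memcpy(buf, &i, sizeof(i));        /* sequence number in the payload */
            MPI_Send(buf, MSG_SIZE, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
        }
    } else if (rank == 1) {
        for (i = 0; i < MSG_COUNT; i++) {
            MPI_Recv(buf, MSG_SIZE, MPI_BYTE, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            memcpy(&seq, buf, sizeof(seq));
            if (seq != i) {                    /* matched out of order */
                fprintf(stderr, "expected seq %d, got %d\n", i, seq);
                MPI_Abort(MPI_COMM_WORLD, 2);
            }
        }
    }

    MPI_Finalize();
    return 0;
}

Run it with two ranks over the networks in question; any out-of-order match
aborts the run immediately.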
> Rich
>
>
> On 12/12/07 3:20 PM, "Gleb Natapov" <gl...@voltaire.com> wrote:
>
> > On Wed, Dec 12, 2007 at 11:57:11AM -0500, Jeff Squyres wrote:
> >> Gleb --
> >>
> >> How about making a tarball with this patch in it that can be thrown at
> >> everyone's MTT? (we can put the tarball on www.open-mpi.org somewhere)
> > I don't have access to www.open-mpi.org, but I can send you the patch.
> > I can send you a tarball too, but I prefer not to abuse email.
> >
> >>
> >> On Dec 11, 2007, at 4:14 PM, Richard Graham wrote:
> >>
> >>> I will re-iterate my concern. The code that is there now is mostly nine
> >>> years old (with some mods made when it was brought over to Open MPI).
> >>> It took about 2 months of testing on systems with 5-13 way network
> >>> parallelism to track down all KNOWN race conditions. This code is at
> >>> the center of MPI correctness, so I am VERY concerned about changing
> >>> it w/o some very strong reasons. Not opposed, just very cautious.
> >>>
> >>> Rich
> >>>
> >>>
> >>> On 12/11/07 11:47 AM, "Gleb Natapov" <gl...@voltaire.com> wrote:
> >>>
> >>>> On Tue, Dec 11, 2007 at 08:36:42AM -0800, Andrew Friedley wrote:
> >>>>> Possibly, though I have results from a benchmark I've written
> >>>>> indicating the reordering happens at the sender. I believe I found
> >>>>> it was due to the QP striping trick I use to get more bandwidth --
> >>>>> if you back down to one QP (there's a define in the code you can
> >>>>> change), the reordering rate drops.
> >>>> Ah, OK. My assumption was just from looking into the code, so I may
> >>>> be wrong.
> >>>>
> >>>>> Also I do not make any recursive calls to progress -- at least not
> >>>>> directly in the BTL; I can't speak for the upper layers. The reason
> >>>>> I do many completions at once is that it is a big help in turning
> >>>>> around receive buffers, making it harder to run out of buffers and
> >>>>> drop frags. I want to say there was some performance benefit as well
> >>>>> but I can't say for sure.
> >>>> Currently the upper layers of Open MPI may call the BTL progress
> >>>> function recursively. I hope this will change some day.
> >>>>
> >>>>> Andrew
> >>>>>
> >>>>> Gleb Natapov wrote:
> >>>>>> On Tue, Dec 11, 2007 at 08:03:52AM -0800, Andrew Friedley wrote:
> >>>>>>> Try UD, frags are reordered at a very high rate so it should be a
> >>>>>>> good test.
> >>>>>> Good idea, I'll try this. BTW I think the reason for such a high
> >>>>>> rate of reordering in UD is that it polls for MCA_BTL_UD_NUM_WC
> >>>>>> completions (500) and processes them one by one, and if the
> >>>>>> progress function is called recursively the next 500 completions
> >>>>>> will be reordered relative to the previous ones (the reordering
> >>>>>> happens on the receiver, not the sender).
> >>>>>>
> >>>>>>> Andrew
> >>>>>>>
> >>>>>>> Richard Graham wrote:
> >>>>>>>> Gleb,
> >>>>>>>> I would suggest that before this is checked in, it be tested on a
> >>>>>>>> system that has N-way network parallelism, where N is as large as
> >>>>>>>> you can find. This is a key bit of code for MPI correctness, and
> >>>>>>>> out-of-order operations will break it, so you want to maximize
> >>>>>>>> the chance for such operations.
> >>>>>>>>
> >>>>>>>> Rich
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On 12/11/07 10:54 AM, "Gleb Natapov" <gl...@voltaire.com> wrote:
> >>>>>>>>
> >>>>>>>>> Hi,
> >>>>>>>>>
> >>>>>>>>> I did a rewrite of the matching code in OB1.
> >>>>>>>>> I made it much simpler and 2 times smaller (which is good:
> >>>>>>>>> less code, fewer bugs). I also got rid of the huge macros -
> >>>>>>>>> very helpful if you need to debug something. There is no
> >>>>>>>>> performance degradation; actually I even see a very small
> >>>>>>>>> performance improvement. I ran MTT with this patch and the
> >>>>>>>>> result is the same as on trunk. I would like to commit this
> >>>>>>>>> to the trunk. The patch is attached for everybody to try.
> >>>>>>>>>
> >>>>>>>>> --
> >>>>>>>>> Gleb.
> >>>>>>
> >>>>>> --
> >>>>>> Gleb.
> >>>>
> >>>> --
> >>>> Gleb.
> >>
> >>
> >> --
> >> Jeff Squyres
> >> Cisco Systems
> >
> > --
> > Gleb.

--
Gleb.
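
P.S. For anyone who hasn't looked at the UD path: the batch-polling pattern
discussed above boils down to roughly the following. This is a hand-written
sketch of the generic libibverbs pattern, not the actual btl_ud source;
NUM_WC and process_frag() are placeholders:

#include <infiniband/verbs.h>

#define NUM_WC 500                 /* analogous to MCA_BTL_UD_NUM_WC */

static void process_frag(struct ibv_wc *wc)
{
    /* Hand the fragment to the upper layer. In Open MPI this can end up
     * re-entering progress (the recursion mentioned above). */
    (void)wc;
}

int progress_recv(struct ibv_cq *cq)
{
    struct ibv_wc wc[NUM_WC];
    int n, i;

    /* Grab up to NUM_WC receive completions in one shot... */
    n = ibv_poll_cq(cq, NUM_WC, wc);
    if (n < 0)
        return n;

    /* ...then process them one by one. If process_frag() re-enters this
     * function, the next batch of completions is handled before the rest
     * of this one, so fragments reach the matching logic out of order on
     * the receiver. */
    for (i = 0; i < n; i++)
        process_frag(&wc[i]);

    return n;
}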