On Wed, Dec 12, 2007 at 11:57:11AM -0500, Jeff Squyres wrote:
> Gleb --
>
> How about making a tarball with this patch in it that can be thrown at
> everyone's MTT? (We can put the tarball on www.open-mpi.org somewhere.)

I don't have access to www.open-mpi.org, but I can send you the patch.
I can send you a tarball too, but I'd prefer not to abuse email.
>
> On Dec 11, 2007, at 4:14 PM, Richard Graham wrote:
> > I will re-iterate my concern. The code that is there now is mostly
> > nine years old (with some mods made when it was brought over to Open
> > MPI). It took about two months of testing on systems with 5-13-way
> > network parallelism to track down all KNOWN race conditions. This
> > code is at the center of MPI correctness, so I am VERY concerned
> > about changing it without some very strong reasons. Not opposed,
> > just very cautious.
> >
> > Rich
> >
> > On 12/11/07 11:47 AM, "Gleb Natapov" <gl...@voltaire.com> wrote:
> >
> >> On Tue, Dec 11, 2007 at 08:36:42AM -0800, Andrew Friedley wrote:
> >>> Possibly, though I have results from a benchmark I've written
> >>> indicating the reordering happens at the sender. I believe I found
> >>> it was due to the QP striping trick I use to get more bandwidth --
> >>> if you back down to one QP (there's a define in the code you can
> >>> change), the reordering rate drops.
> >> Ah, OK. My assumption was just from looking at the code, so I may
> >> be wrong.
> >>
> >>> Also, I do not make any recursive calls to progress -- at least not
> >>> directly in the BTL; I can't speak for the upper layers. The reason
> >>> I do many completions at once is that it is a big help in turning
> >>> around receive buffers, making it harder to run out of buffers and
> >>> drop frags. I want to say there was some performance benefit as
> >>> well, but I can't say for sure.
> >> Currently the upper layers of Open MPI may call a BTL progress
> >> function recursively. I hope this will change some day.
> >>
> >>> Andrew
> >>>
> >>> Gleb Natapov wrote:
> >>>> On Tue, Dec 11, 2007 at 08:03:52AM -0800, Andrew Friedley wrote:
> >>>>> Try UD; frags are reordered at a very high rate, so it should be
> >>>>> a good test.
> >>>> Good idea, I'll try this.
> >>>> BTW, I think the reason for such a high rate of reordering in UD
> >>>> is that it polls for MCA_BTL_UD_NUM_WC completions (500) and
> >>>> processes them one by one, and if the progress function is called
> >>>> recursively, the next 500 completions will be reordered with
> >>>> respect to the previous completions (reordering happens on the
> >>>> receiver, not the sender).
> >>>>
> >>>>> Andrew
> >>>>>
> >>>>> Richard Graham wrote:
> >>>>>> Gleb,
> >>>>>> I would suggest that before this is checked in, it be tested on
> >>>>>> a system that has N-way network parallelism, where N is as large
> >>>>>> as you can find. This is a key bit of code for MPI correctness,
> >>>>>> and out-of-order operations will break it, so you want to
> >>>>>> maximize the chance of such operations.
> >>>>>>
> >>>>>> Rich
> >>>>>>
> >>>>>> On 12/11/07 10:54 AM, "Gleb Natapov" <gl...@voltaire.com> wrote:
> >>>>>>
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> I did a rewrite of the matching code in OB1. I made it much
> >>>>>>> simpler and two times smaller (which is good: less code, fewer
> >>>>>>> bugs). I also got rid of huge macros -- very helpful if you
> >>>>>>> need to debug something. There is no performance degradation;
> >>>>>>> actually, I even see a very small performance improvement. I
> >>>>>>> ran MTT with this patch and the result is the same as on trunk.
> >>>>>>> I would like to commit this to the trunk. The patch is attached
> >>>>>>> for everybody to try.
> >>>>>>>
> >>>>>>> --
> >>>>>>> Gleb.
> >>>>>>> _______________________________________________
> >>>>>>> devel mailing list
> >>>>>>> de...@open-mpi.org
> >>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>
> --
> Jeff Squyres
> Cisco Systems

--
Gleb.