On Wed, Dec 12, 2007 at 03:46:10PM -0500, Richard Graham wrote:
> This is better than nothing, but really not very helpful for looking at the
> specific issues that can arise with this, unless these systems have several
> parallel networks, with tests that generate a lot of parallel network
> traffic and can self-check for out-of-order receives - i.e., the sequence
> needs to be encoded into the payload for verification purposes.  There are
> some out-of-order scenarios that need to be generated and checked.  I think
> that George may have a system that will be good for this sort of testing.
> 
I am running various tests with multiple networks right now. I use
several IB BTLs and the TCP BTL simultaneously. I see many reordered
messages, and all tests have passed so far, but as far as I know they
don't encode the message sequence in the payload. I'll change one of
them to do so.
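
Something along these lines - a minimal sketch of the kind of check I
mean, with made-up names, not one of the existing tests:

    #include <mpi.h>
    #include <stdio.h>

    #define NMSGS 10000
    #define LEN   256

    int main(int argc, char **argv)
    {
        int rank;
        unsigned int buf[LEN];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            for (unsigned int seq = 0; seq < NMSGS; seq++) {
                buf[0] = seq;  /* encode the sequence number in the payload */
                MPI_Send(buf, LEN, MPI_UNSIGNED, 1, 0, MPI_COMM_WORLD);
            }
        } else if (rank == 1) {
            for (unsigned int seq = 0; seq < NMSGS; seq++) {
                MPI_Recv(buf, LEN, MPI_UNSIGNED, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                /* MPI guarantees ordering for the same source/tag/comm,
                 * so any mismatch here means matching broke it */
                if (buf[0] != seq) {
                    fprintf(stderr, "out of order: got %u, expected %u\n",
                            buf[0], seq);
                    MPI_Abort(MPI_COMM_WORLD, 1);
                }
            }
        }
        MPI_Finalize();
        return 0;
    }

Run it over several BTLs at once (as in my setup above) so frags
actually get striped and reordered underneath.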

> Rich
> 
> 
> On 12/12/07 3:20 PM, "Gleb Natapov" <gl...@voltaire.com> wrote:
> 
> > On Wed, Dec 12, 2007 at 11:57:11AM -0500, Jeff Squyres wrote:
> >> Gleb --
> >> 
> >> How about making a tarball with this patch in it that can be thrown at
> >> everyone's MTT? (we can put the tarball on www.open-mpi.org somewhere)
> > I don't have access to www.open-mpi.org, but I can send you the patch.
> > I can send you a tarball too, but I'd prefer not to abuse email.
> > 
> >> 
> >> 
> >> On Dec 11, 2007, at 4:14 PM, Richard Graham wrote:
> >> 
> >>> I will re-iterate my concern.  The code that is there now is mostly
> >>> nine years old (with some mods made when it was brought over to Open
> >>> MPI).  It took about two months of testing on systems with 5-13-way
> >>> network parallelism to track down all KNOWN race conditions.  This
> >>> code is at the center of MPI correctness, so I am VERY concerned about
> >>> changing it w/o some very strong reasons.  Not opposed, just very
> >>> cautious.
> >>> 
> >>> Rich
> >>> 
> >>> 
> >>> On 12/11/07 11:47 AM, "Gleb Natapov" <gl...@voltaire.com> wrote:
> >>> 
> >>>> On Tue, Dec 11, 2007 at 08:36:42AM -0800, Andrew Friedley wrote:
> >>>>> Possibly, though I have results from a benchmark I've written
> >>>>> indicating the reordering happens at the sender.  I believe I found
> >>>>> it was due to the QP striping trick I use to get more bandwidth --
> >>>>> if you back down to one QP (there's a define in the code you can
> >>>>> change), the reordering rate drops.
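
To make the striping trick concrete, roughly this happens on every send
- a sketch with invented names, not the actual UD BTL source:

    #include <infiniband/verbs.h>

    #define NUM_QPS 4    /* the compile-time knob; 1 = no striping */

    static struct ibv_qp *qps[NUM_QPS];
    static int next_qp;

    static int post_frag(struct ibv_send_wr *wr)
    {
        struct ibv_send_wr *bad_wr;
        /* Round-robin over the QPs: frags posted back to back go out
         * on different QPs, so the HCA/network may complete them in
         * either order - reordering introduced on the sender side. */
        int ret = ibv_post_send(qps[next_qp], wr, &bad_wr);
        next_qp = (next_qp + 1) % NUM_QPS;
        return ret;
    }
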
> >>>> Ah, OK. My assumption came just from looking at the code, so I may
> >>>> be wrong.
> >>>> 
> >>>>> 
> >>>>> Also I do not make any recursive calls to progress -- at least not
> >>>>> directly in the BTL; I can't speak for the upper layers.  The
> >>>>> reason I do many completions at once is that it is a big help in
> >>>>> turning around receive buffers, making it harder to run out of
> >>>>> buffers and drop frags.  I want to say there was some performance
> >>>>> benefit as well, but I can't say for sure.
> >>>> Currently the upper layers of Open MPI may call a BTL progress
> >>>> function recursively. I hope this will change some day.
> >>>> 
> >>>>> 
> >>>>> Andrew
> >>>>> 
> >>>>> Gleb Natapov wrote:
> >>>>>> On Tue, Dec 11, 2007 at 08:03:52AM -0800, Andrew Friedley wrote:
> >>>>>>> Try UD; frags are reordered at a very high rate, so it should be
> >>>>>>> a good test.
> >>>>>> Good idea, I'll try this. BTW, I think the reason for such a high
> >>>>>> rate of reordering in UD is that it polls for MCA_BTL_UD_NUM_WC
> >>>>>> completions (500) and processes them one by one; if the progress
> >>>>>> function is called recursively, the next 500 completions will be
> >>>>>> reordered relative to the previous ones (the reordering happens on
> >>>>>> the receiver, not the sender).
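
The pattern I'm describing, as a minimal sketch (invented names, not
the actual ofud BTL source):

    #include <infiniband/verbs.h>

    #define NUM_WC 500                /* cf. MCA_BTL_UD_NUM_WC */

    static struct ibv_cq *cq;         /* the BTL's receive CQ */

    static void process_frag(struct ibv_wc *wc)
    {
        (void)wc;
        /* Hand the frag up to the PML.  In real life this path can
         * call btl_progress() again - the recursion in question. */
    }

    static void btl_progress(void)
    {
        struct ibv_wc wc[NUM_WC];
        int n = ibv_poll_cq(cq, NUM_WC, wc);  /* drain up to 500 at once */

        for (int i = 0; i < n; i++) {
            /* If process_frag() re-enters btl_progress(), the
             * recursive call polls and delivers a NEWER batch before
             * wc[i+1..n-1] here are handled, so those frags are seen
             * out of order by the receiver. */
            process_frag(&wc[i]);
        }
    }
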
> >>>>>> 
> >>>>>>> Andrew
> >>>>>>> 
> >>>>>>> Richard Graham wrote:
> >>>>>>>> Gleb,
> >>>>>>>>  I would suggest that before this is checked in, it be tested on
> >>>>>>>> a system that has N-way network parallelism, where N is as large
> >>>>>>>> as you can find.  This is a key bit of code for MPI correctness,
> >>>>>>>> and out-of-order operations will break it, so you want to
> >>>>>>>> maximize the chance for such operations.
> >>>>>>>> 
> >>>>>>>> Rich
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>> On 12/11/07 10:54 AM, "Gleb Natapov" <gl...@voltaire.com> wrote:
> >>>>>>>> 
> >>>>>>>>> Hi,
> >>>>>>>>> 
> >>>>>>>>>   I did a rewrite of the matching code in OB1. I made it much
> >>>>>>>>> simpler and two times smaller (which is good: less code, fewer
> >>>>>>>>> bugs). I also got rid of the huge macros - very helpful if you
> >>>>>>>>> need to debug something. There is no performance degradation;
> >>>>>>>>> actually, I even see a very small performance improvement. I ran
> >>>>>>>>> MTT with this patch and the result is the same as on the trunk.
> >>>>>>>>> I would like to commit this to the trunk. The patch is attached
> >>>>>>>>> for everybody to try.
> >>>>>>>>> 
> >>>>>>>>> --
> >>>>>>>>> Gleb.
> >>>>>> 
> >>>>>> --
> >>>>>> Gleb.
> >>>> 
> >>>> --
> >>>> Gleb.
> >>> 
> >> 
> >> 
> >> -- 
> >> Jeff Squyres
> >> Cisco Systems
> > 
> > --
> > Gleb.
> 

--
Gleb.
