On Wed, Feb 13, 2008 at 09:05:24AM -0500, Jeff Squyres wrote:
> Actually, we should then also print out a different error message when
> RNR occurs on PP QPs, too.  It should be something along the lines of
> "flow control problem occurred; this shouldn't happen..." (right now
> it says RNR happened, and goes into detail about what that means -- but
> that's not the real problem).
>
Good point.
> I'll do that as well.  Thanks!
>
>
> On Feb 13, 2008, at 12:59 AM, Gleb Natapov wrote:
>
> > On Tue, Feb 12, 2008 at 05:41:13PM -0500, Jeff Squyres wrote:
> >> I see that in the OOB CPC for the openib BTL, when setting up the send
> >> side of the QP, we set the rnr_retry value depending on whether the
> >> remote receive queue is a per-peer or SRQ:
> >>
> >>   - SRQ: btl_openib_rnr_retry MCA param value
> >>   - PP: 0
> >>
> >> The rationale given in a comment is that setting the RNR retry to 0 is a
> >> good way to find bugs in our flow control.
> >>
> >> Do we really want this in production builds?  Or do we want 0 for
> >> developer builds and the same btl_openib_rnr_retry value for PP
> >> queues?
> >>
> > The comment is mine and IMO it should stay that way for production
> > builds. SW flow control either works or it doesn't, and if it doesn't I
> > prefer to know about it immediately. Setting PP to some value greater
> > than 0 just delays the manifestation of the problem, and in the case of
> > iWarp such a possibility doesn't even exist.
> >
> > --
> > Gleb.
> > _______________________________________________
> > devel mailing list
> > de...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/devel
>
>
> --
> Jeff Squyres
> Cisco Systems

--
Gleb.