Actually, we should then also print out a different error message when RNR occurs in PP QPs. It should be something along the lines of "flow control problem occurred; this shouldn't happen..." (right now it says RNR happened, and goes into detail about what that means -- but that's not the real problem).

I'll do that as well.
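
As a rough sketch of the message selection I have in mind (not the actual openib BTL code; the enum, the function name, and the SRQ wording below are all made up for illustration, only the PP text comes from the suggestion above):

```c
#include <stddef.h>

/* Hypothetical QP classification; the real BTL tracks this differently. */
typedef enum { SKETCH_QP_PER_PEER, SKETCH_QP_SRQ } sketch_qp_type_t;

/* Choose the text to print when an RNR retry-exceeded error is seen.
 * On a per-peer QP, RNR means our software flow control is broken,
 * so say that instead of explaining what RNR is. */
static const char *sketch_rnr_error_message(sketch_qp_type_t type)
{
    if (type == SKETCH_QP_PER_PEER) {
        return "flow control problem occurred; this shouldn't happen";
    }
    /* Placeholder wording for the SRQ case (an assumption, not the
     * current message). */
    return "receiver not ready (RNR) retry count exceeded";
}
```

The point is only that the PP branch reports a flow control bug, while the SRQ branch keeps describing RNR itself.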


On Feb 13, 2008, at 12:59 AM, Gleb Natapov wrote:

On Tue, Feb 12, 2008 at 05:41:13PM -0500, Jeff Squyres wrote:
I see that in the OOB CPC for the openib BTL, when setting up the send
side of the QP, we set the rnr_retry value depending on whether the
remote receive queue is per-peer (PP) or SRQ:

- SRQ: btl_openib_rnr_retry MCA param value
- PP: 0

The rationale given in a comment is that setting the RNR to 0 is a
good way to find bugs in our flow control.
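
The selection described above can be sketched as follows. This is a minimal illustration, not the code from the OOB CPC: the enum and function names are hypothetical, and mca_rnr_retry simply stands in for the btl_openib_rnr_retry MCA param value. In libibverbs the result would end up in the rnr_retry field of struct ibv_qp_attr.

```c
#include <stdint.h>

/* Hypothetical stand-in for the type of the remote receive queue. */
typedef enum { SKETCH_RQ_PER_PEER, SKETCH_RQ_SRQ } sketch_rq_type_t;

/* Pick the rnr_retry value for the send side of a QP.
 *   SRQ: use the configured value (btl_openib_rnr_retry).
 *   PP:  use 0, so any RNR immediately surfaces as an error and
 *        exposes a bug in the software flow control. */
static uint8_t sketch_choose_rnr_retry(sketch_rq_type_t remote_type,
                                       uint8_t mca_rnr_retry)
{
    if (remote_type == SKETCH_RQ_SRQ) {
        return mca_rnr_retry;
    }
    return 0;
}
```

For example, with a configured value of 7, an SRQ peer gets 7 and a PP peer gets 0.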

Do we really want this in production builds?  Or do we want 0 for
developer builds and the same btl_openib_rnr_retry value for PP queues?

The comment is mine and IMO it should stay that way for production
builds. SW flow control either works or it doesn't, and if it doesn't I
prefer to know about it immediately. Setting PP to some value greater
than 0 just delays the manifestation of the problem, and in the case of
iWarp such a possibility doesn't even exist.

--
                        Gleb.


--
Jeff Squyres
Cisco Systems
