I see that in the OOB CPC for the openib BTL, when setting up the send side of the QP, we set the rnr_retry value depending on whether the remote receive queue is per-peer (PP) or a shared receive queue (SRQ):

- SRQ: btl_openib_rnr_retry MCA param value
- PP: 0

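The rationale given in a comment is that setting rnr_retry to 0 is a good way to find bugs in our flow control: with no retries, any receiver-not-ready (RNR) NAK shows up immediately as an error instead of being retried.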

Do we really want this in production builds? Or do we want 0 for developer builds and the same btl_openib_rnr_retry value for PP queues?

Or should we offer a finer-grained control, such as:

- btl_openib_rnr_retry_pp: value to use for per-peer queues, -1=use the default
- btl_openib_rnr_retry_srq: value to use for SRQs, -1=use the default
- btl_openib_rnr_retry: value to use as the default for _pp and _srq
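
To make the fallback concrete, here is a rough sketch of how the three values could be resolved when setting up the QP. This is not the existing openib code: the struct, the enum, the RNR_RETRY_USE_DEFAULT macro and resolve_rnr_retry() are hypothetical names made up for illustration, and I'm assuming the resolved value has to land in the 0..7 range that the 3-bit rnr_retry field allows (7 meaning retry indefinitely).

```c
#include <assert.h>

/* Hypothetical sentinel for "-1 = use the default" as proposed above. */
#define RNR_RETRY_USE_DEFAULT (-1)

/* Hypothetical holder for the three proposed MCA parameter values. */
struct rnr_retry_params {
    int rnr_retry;      /* btl_openib_rnr_retry: shared default */
    int rnr_retry_pp;   /* btl_openib_rnr_retry_pp: per-peer override */
    int rnr_retry_srq;  /* btl_openib_rnr_retry_srq: SRQ override */
};

/* Type of the remote receive queue that the send side targets. */
enum remote_rq_type { REMOTE_RQ_PP, REMOTE_RQ_SRQ };

/* Pick the queue-specific value, falling back to the shared default
 * when the queue-specific parameter is -1. */
static int resolve_rnr_retry(const struct rnr_retry_params *p,
                             enum remote_rq_type type)
{
    int specific = (REMOTE_RQ_PP == type) ? p->rnr_retry_pp
                                          : p->rnr_retry_srq;
    int value = (RNR_RETRY_USE_DEFAULT == specific) ? p->rnr_retry
                                                    : specific;

    /* rnr_retry is a 3-bit field; 7 means retry indefinitely. */
    assert(value >= 0 && value <= 7);
    return value;
}
```

With this scheme, setting btl_openib_rnr_retry_pp to 0 and leaving btl_openib_rnr_retry_srq at -1 would reproduce today's behavior, while leaving both at -1 would make PP and SRQ queues use the same btl_openib_rnr_retry value.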

--
Jeff Squyres
Cisco Systems
