On Jan 1, 2008, at 1:11 PM, Andrew Friedley wrote:

We would like to add SDP support for OPENMPI.

I have a few points -- this is the first:

I would do this patch slightly differently. I prefer to have as few #if's as possible, so I'd do it to always have the struct members and logic for the MCA-enable/disable of SDP support, but only actually enable it if HAVE_DECL_AF_INET_SDP. Hence, the number of #if's is dramatically reduced -- you only need to #if the parts of the code that actually try to use AF_INET_SDP (etc.).
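To make the "few #if's" idea concrete, here is a minimal sketch, not the actual TCP component code. The struct, its member, and both function names are hypothetical. The `#ifndef` default is only there so the snippet compiles on its own; in a real build configure would define HAVE_DECL_AF_INET_SDP to 0 or 1.

```c
#include <stdbool.h>
#include <sys/socket.h>

/* Assumption: configure defines HAVE_DECL_AF_INET_SDP to 0 or 1.
 * Default to 0 here only so this sketch compiles standalone. */
#ifndef HAVE_DECL_AF_INET_SDP
#define HAVE_DECL_AF_INET_SDP 0
#endif

/* Hypothetical component state: the member exists on every platform,
 * so the MCA enable/disable logic needs no #if at all. */
struct tcp_sdp_config {
    bool sdp_requested;   /* value of a (hypothetical) MCA enable param */
};

/* Returns the address family to pass to socket().  The only #if is
 * around the code that actually names AF_INET_SDP. */
static int tcp_pick_address_family(const struct tcp_sdp_config *cfg)
{
#if HAVE_DECL_AF_INET_SDP
    if (cfg->sdp_requested) {
        return AF_INET_SDP;
    }
#else
    (void) cfg;           /* member is still present, just unused here */
#endif
    return AF_INET;
}

/* Value that a read-only "was SDP compiled in?" param could report. */
static bool tcp_sdp_compiled_in(void)
{
    return HAVE_DECL_AF_INET_SDP != 0;
}
```

With this shape, code that sets or reads `sdp_requested` is identical on every platform, and only the one function that touches AF_INET_SDP is conditional.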

I'd also ditch the --enable-sdp; I think configure can figure that stuff out by itself without an --enable switch. Perhaps if people really want the ability to turn SDP off at configure time, --disable-sdp could be useful. But that might not be too useful.

Don't forget that we always have the "bool" type available; you can use that for logicals (instead of int).

I'd also add another MCA param that is read-only that indicates whether SDP support was compiled in or not (i.e., HAVE_DECL_AF_INET_SDP is 1, and therefore there was a value for AF_INET_SDP). This will allow you to query ompi_info and see if your OMPI was configured for SDP support.

That way, you can have a consistent set of MCA params for the TCP components regardless of platform. I think that's somewhat important. To be user-friendly, I'd also emit a warning if someone tries to enable SDP support and it's not available. Note that SDP could be unavailable for multiple reasons:

- wasn't available at compile time
- isn't available for the peer IP address that was used

Hence, if HAVE_DECL_AF_INET_SDP==1 and using AF_INET_SDP fails to that peer, it might be desirable to try to fail over to using AF_INET_something_else. I'm still technically on vacation :-), so I didn't look *too* closely at your patch, but I think you're doing that (failing over if AF_INET_SDP doesn't work because of EAFNOSUPPORT), which is good.
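A rough sketch of that kind of failover, under some assumptions: the function name is made up, it takes the address families as plain ints so it does not depend on AF_INET_SDP being declared, and it only covers failure at socket() time. A failure to a particular peer may only show up at connect(), which this does not handle.

```c
#include <errno.h>
#include <sys/socket.h>

/*
 * Try to open a stream socket in preferred_af; if that fails with
 * EAFNOSUPPORT, retry with fallback_af.  Returns the fd (and stores
 * the family that was actually used in *used_af), or -1 with errno
 * set.  Hypothetical helper, not an existing Open MPI function.
 */
static int open_stream_with_failover(int preferred_af, int fallback_af,
                                     int *used_af)
{
    int fd = socket(preferred_af, SOCK_STREAM, 0);
    if (fd >= 0) {
        *used_af = preferred_af;
        return fd;
    }
    if (errno != EAFNOSUPPORT) {
        return -1;        /* some other failure: do not mask it */
    }

    fd = socket(fallback_af, SOCK_STREAM, 0);
    if (fd >= 0) {
        *used_af = fallback_af;
    }
    return fd;
}
```

A caller would pass AF_INET_SDP as preferred_af (inside the HAVE_DECL_AF_INET_SDP guard) and AF_INET as fallback_af, then compare *used_af against what the user asked for to decide whether to say anything.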

I would think the following would apply:

- Error (or warning?): user requests SDP and HAVE_DECL_AF_INET_SDP is 0
- Error (or warning?): user requests SDP and HAVE_DECL_AF_INET_SDP is 1, but using AF_INET_SDP failed
- Not an error: user does not request SDP, but HAVE_DECL_AF_INET_SDP is 1 and AF_INET_SDP works
- Not an error: user does not request SDP, but HAVE_DECL_AF_INET_SDP is 1 and AF_INET_SDP does not work, but is able to fail over to AF_INET_something_else
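The cases above boil down to a small decision function. This is only a sketch: the enum and function names are hypothetical, "report" stands for "error or warning" since that choice is still open, and the two combinations not listed above (SDP requested and it works; SDP not requested and not compiled in) are assumed to be fine.

```c
#include <stdbool.h>

/* SDP_OUTCOME_REPORT means "emit an error or a warning"; which of the
 * two is deliberately left undecided here. */
enum sdp_outcome { SDP_OUTCOME_OK, SDP_OUTCOME_REPORT };

/*
 * requested   - user asked for SDP via the (hypothetical) MCA param
 * compiled_in - HAVE_DECL_AF_INET_SDP was 1 at build time
 * sdp_worked  - using AF_INET_SDP actually succeeded at run time
 */
static enum sdp_outcome sdp_classify(bool requested, bool compiled_in,
                                     bool sdp_worked)
{
    if (!requested) {
        return SDP_OUTCOME_OK;      /* nothing was promised to the user */
    }
    if (!compiled_in) {
        return SDP_OUTCOME_REPORT;  /* requested, but not built in */
    }
    /* requested and built in: report only if SDP failed at run time */
    return sdp_worked ? SDP_OUTCOME_OK : SDP_OUTCOME_REPORT;
}
```

In other words, something gets reported only when the user explicitly asked for SDP and did not get it.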

With all this, the support is still somewhat inconsistent -- you could be using an OMPI that has HAVE_DECL_AF_INET_SDP==0, but you're running on a system that has SDP available.

Perhaps a more general approach would be to [perhaps additionally] provide an MCA param to allow the user to specify the AF_* value? (AF_INET_SDP is a standardized value, right? I.e., will it be the same on all Linux variants [and someday Solaris]?)

SDP can be used to accelerate job start ( oob over sdp ) and IPoIB
performance.

I fail to see the reason to pollute the TCP btl with IB-specific SDP stuff.

For the oob, this is arguable, but doesn't SDP allow for *transparent* socket replacement at runtime ? In this case, why not use this mechanism
and keep the code clean ?

Patrick's got a good point: is there a reason not to do this? (LD_PRELOAD and the like) Is it problematic with the remote orted's?

Furthermore, why would a user choose to use SDP and TCP/IPoIB when the
OpenIB BTL is available using the native verbs interface?  FWIW, this
same sort of question gets asked of the uDAPL BTL -- the answer there
being that the uDAPL BTL runs in places the OpenIB BTL does not.  Is
this true here as well?


Andrew's got a point here, too -- accelerating the TCP BTL with SDP seems kinda pointless. I'm guessing that you did it because it was just about the same work as was done in the TCP OOB (for which we have no corresponding verbs interface). Is that right?

--
Jeff Squyres
Cisco Systems
