Re: [OMPI devel] SDP support for OPEN-MPI

Jeff Squyres Tue, 1 Jan 2008 21:07:46 -0500

On Jan 1, 2008, at 1:11 PM, Andrew Friedley wrote:

We would like to add SDP support for OPENMPI.


I have a few points -- this is the first:

I would do this patch slightly differently. I prefer to have as few #if's as possible, so I'd do it to always have the struct members and logic for the MCA-enable/disable of SDP support, but only actually enable it if HAVE_DECL_AF_INET_SDP. Hence, the number of #if's is dramatically reduced -- you only need to #if the parts of the code that actually try to use AF_INET_SDP (etc.).

I'd also ditch the --enable-sdp; I think configure can figure that stuff out by itself without an --enable switch. Perhaps if people really want the ability to turn SDP off at configure time, --disable-sdp could be useful. But that might not be too useful.

Don't forget that we always have the "bool" type available; you can use that for logicals (instead of int).

I'd also add another MCA param that is read-only that indicates whether SDP support was compiled in or not (i.e., HAVE_DECL_AF_INET_SDP is 1, and therefore there was a value for AF_INET_SDP). This will allow you to query ompi_info and see if your OMPI was configured for SDP support.
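To make the suggestions above concrete, here is a minimal sketch, not taken from the patch. The struct, its field names, and both functions are hypothetical; the only name from this thread is HAVE_DECL_AF_INET_SDP, which is assumed to be set to 0 or 1 by configure (the sketch defaults it to 0 so that it compiles standalone). The state is always present and uses bool, and the single #if sits where AF_INET_SDP is actually referenced.

```c
#include <stdbool.h>
#include <sys/socket.h>

/* Assumption: configure defines HAVE_DECL_AF_INET_SDP to 0 or 1.
 * Default to 0 here so the sketch builds without a configure step. */
#ifndef HAVE_DECL_AF_INET_SDP
#define HAVE_DECL_AF_INET_SDP 0
#endif

/* Hypothetical component state: the members exist on every platform,
 * so the set of parameters is the same everywhere. */
struct tcp_sdp_config {
    bool sdp_requested;   /* would mirror a user-settable MCA param */
    bool sdp_compiled_in; /* would mirror a read-only MCA param */
};

static void tcp_sdp_config_init(struct tcp_sdp_config *cfg, bool user_wants_sdp)
{
    cfg->sdp_requested = user_wants_sdp;
    cfg->sdp_compiled_in = (HAVE_DECL_AF_INET_SDP != 0);
}

/* The only place that needs an #if: where AF_INET_SDP is actually used. */
static int tcp_choose_family(const struct tcp_sdp_config *cfg)
{
#if HAVE_DECL_AF_INET_SDP
    if (cfg->sdp_requested) {
        return AF_INET_SDP;
    }
#else
    (void) cfg; /* SDP not compiled in: the request cannot be honored */
#endif
    return AF_INET;
}
```

With this layout, a build without SDP still carries both fields; sdp_compiled_in is simply false and the chosen family is always AF_INET.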

That way, you can have a consistent set of MCA params for the TCP components regardless of platform. I think that's somewhat important. To be user-friendly, I'd also emit a warning if someone tries to enable SDP support and it's not available. Note that SDP could be unavailable for multiple reasons:


- wasn't available at compile time
- isn't available for the peer IP address that was used

Hence, if HAVE_DECL_AF_INET_SDP==1 and using AF_INET_SDP fails to that peer, it might be desirable to try to fail over to using AF_INET_something_else. I'm still technically on vacation :-), so I didn't look *too* closely at your patch, but I think you're doing that (failing over if AF_INET_SDP doesn't work because of EAFNOSUPPORT), which is good.
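A rough sketch of that kind of failover, written without reference to the actual patch: both function names are hypothetical, and plain AF_INET stands in for "AF_INET_something_else" as an assumption. Only EAFNOSUPPORT triggers the retry; any other error is returned to the caller unchanged.

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/socket.h>

/* Decide whether a failed socket() call should be retried with AF_INET.
 * Only "address family not supported" qualifies, and only if we were
 * not already asking for AF_INET. */
static bool should_fail_over(int family, int err)
{
    return family != AF_INET && err == EAFNOSUPPORT;
}

/* Hypothetical helper: try the preferred family (e.g. AF_INET_SDP) and
 * fall back to AF_INET if the family is unsupported on this host.
 * *fell_back reports whether the fallback socket is the one returned. */
static int open_stream_socket(int preferred_family, bool *fell_back)
{
    int fd = socket(preferred_family, SOCK_STREAM, 0);

    *fell_back = false;
    if (fd < 0 && should_fail_over(preferred_family, errno)) {
        fd = socket(AF_INET, SOCK_STREAM, 0);
        *fell_back = (fd >= 0);
    }
    return fd; /* negative on failure, with errno set by socket() */
}
```

The caller can use fell_back to decide whether a warning is warranted, for example only when the user explicitly asked for SDP.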


I would think the following would apply:

- Error (or warning?): user requests SDP and HAVE_DECL_AF_INET_SDP is 0

- Error (or warning?): user requests SDP and HAVE_DECL_AF_INET_SDP is 1, but using AF_INET_SDP failed

- Not an error: user does not request SDP, but HAVE_DECL_AF_INET_SDP is 1 and AF_INET_SDP works

- Not an error: user does not request SDP, but HAVE_DECL_AF_INET_SDP is 1 and AF_INET_SDP does not work, but is able to fail over to AF_INET_something_else
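The cases above could be folded into one small decision function. This is only a sketch: the enum and function names are made up, and the combinations the list does not mention (for example, SDP requested and working) are assumed to be silent.

```c
#include <stdbool.h>

/* Hypothetical result type: whether to report anything to the user. */
enum sdp_report {
    SDP_REPORT_NONE,    /* not an error */
    SDP_REPORT_PROBLEM  /* error or warning, whichever is chosen */
};

/* requested:   the user asked for SDP
 * compiled_in: HAVE_DECL_AF_INET_SDP was 1 at build time
 * sdp_worked:  AF_INET_SDP could actually be used to reach the peer */
static enum sdp_report sdp_classify(bool requested, bool compiled_in,
                                    bool sdp_worked)
{
    /* Something is reported only when the user explicitly asked for SDP
     * and did not get it, for either of the two reasons. */
    if (requested && (!compiled_in || !sdp_worked)) {
        return SDP_REPORT_PROBLEM;
    }
    return SDP_REPORT_NONE;
}
```

Keeping the policy in one function means the error-versus-warning question only has to be settled in one place.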

With all this, the support is still somewhat inconsistent -- you could be using an OMPI that has HAVE_DECL_AF_INET_SDP==0, but you're running on a system that has SDP available.

Perhaps a more general approach would be to [perhaps additionally] provide an MCA param to allow the user to specify the AF_* value? (AF_INET_SDP is a standardized value, right? I.e., will it be the same on all Linux variants [and someday Solaris]?)

SDP can be used to accelerate job start ( oob over sdp ) and IPoIB
performance.
I fail to see the reason to pollute the TCP btl with IB-specific SDP stuff.
For the oob, this is arguable, but doesn't SDP allow for *transparent* socket replacement at runtime? In this case, why not use this mechanism
and keep the code clean?

Patrick's got a good point: is there a reason not to do this? (LD_PRELOAD and the like) Is it problematic with the remote orted's?

Furthermore, why would a user choose to use SDP and TCP/IPoIB when the
OpenIB BTL is available using the native verbs interface?  FWIW, this
same sort of question gets asked of the uDAPL BTL -- the answer there
being that the uDAPL BTL runs in places the OpenIB BTL does not.  Is
this true here as well?

Andrew's got a point here, too -- accelerating the TCP BTL with SDP seems kinda pointless. I'm guessing that you did it because it was just about the same work as was done in the TCP OOB (for which we have no corresponding verbs interface). Is that right?


--
Jeff Squyres
Cisco Systems
