Hi all,
Thanks for the responses.

> -----Original Message-----
> From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On
> Behalf Of Jeff Squyres
> Sent: Wednesday, January 02, 2008 4:08 AM
> To: Open MPI Developers
> Subject: Re: [OMPI devel] SDP support for OPEN-MPI
> 
> On Jan 1, 2008, at 1:11 PM, Andrew Friedley wrote:
> 
> >>> We would like to add SDP support for OPENMPI.
> 
> I have a few points -- this is the first:
> 
> I would do this patch slightly differently.  I prefer to have as few
> #if's as possible, so I'd do it to always have the struct members and
> logic for the MCA-enable/disable of SDP support, but only actually
> enable it if HAVE_DECL_AF_INET_SDP.  Hence, the number of #if's is
> dramatically reduced -- you only need to #if the parts of the code
> that actually try to use AF_INET_SDP (etc.).
> 
> I'd also ditch the --enable-sdp; I think configure can figure that
> stuff out by itself without an --enable switch.  Perhaps if people
> really want the ability to turn SDP off at configure time, --disable-
> sdp could be useful.  But that might not be too useful.
Unfortunately, AF_INET_SDP is not defined in glibc and there is no easy
way to check for it during configure; each application that uses SDP
defines AF_INET_SDP in its own headers.
Since a user may compile on a machine without SDP support, and to
minimize the number of #if's, we can always compile the code with SDP
support.
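For example, something along these lines (just a sketch of what we have
in mind; 27 is the value used by current OFED stacks but, as noted
further down, it is not standardized):

/* Sketch only: define AF_INET_SDP ourselves when the system headers
 * do not provide it, so the SDP code paths always compile. */
#ifndef AF_INET_SDP
#define AF_INET_SDP 27
#endif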
> 
> Don't forget that we always have the "bool" type available; you can
> use that for logicals (instead of int).
> 
> I'd also add another MCA param that is read-only that indicates
> whether SDP support was compiled in or not (i.e.,
> HAVE_DECL_AF_INET_SDP is 1, and therefore there was a value for
> AF_INET_SDP).  This will allow you to query ompi_info and see if your
> OMPI was configured for SDP support.
> 
> That way, you can have a consistent set of MCA params for the TCP
> components regardless of platform.  I think that's somewhat
> important.  To be user-friendly, I'd also emit a warning if someone
> tries to enable SDP support and it's not available.  Note that SDP
> could be unavailable for multiple reasons:
> 
> - wasn't available at compile time
> - isn't available for the peer IP address that was used
> 
> Hence, if HAVE_DECL_AF_INET_SDP==1 and using AF_INET_SDP fails to that
> peer, it might be desirable to try to fail over to using
> AF_INET_something_else.  I'm still technically on vacation :-), so I
> didn't look *too* closely at your patch, but I think you're doing that
> (failing over if AF_INET_SDP doesn't work because of EAFNOSUPPORT),
> which is good.
This is actually not implemented yet.
Supporting failover requires opening AF_INET sockets in addition to the
SDP sockets, which can cause problems on large clusters.
If one of the machines does not support SDP, the user will get an error.
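If we do add it later, the per-connection fallback would look roughly
like this (sketch only, not in the current patch; the helper name is
made up and error reporting is omitted):

#include <sys/socket.h>
#include <errno.h>

#ifndef AF_INET_SDP
#define AF_INET_SDP 27
#endif

/* Try an SDP socket first; fall back to plain TCP only when the SDP
 * address family is not supported on this node. */
static int open_stream_socket(void)
{
    int sd = socket(AF_INET_SDP, SOCK_STREAM, 0);
    if (sd < 0 && EAFNOSUPPORT == errno) {
        sd = socket(AF_INET, SOCK_STREAM, 0);
    }
    return sd;   /* still < 0 => report through the usual error path */
}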
> 
> I would think the following would apply:
> 
> - Error (or warning?): user requests SDP and HAVE_DECL_AF_INET_SDP is 0
> - Error (or warning?): user requests SDP and HAVE_DECL_AF_INET_SDP is
> 1, but using AF_INET_SDP failed
> - Not an error: user does not request SDP, but HAVE_DECL_AF_INET_SDP
> is 1 and AF_INET_SDP works
> - Not an error: user does not request SDP, but HAVE_DECL_AF_INET_SDP
> is 1 and AF_INET_SDP does not work, but is able to fail over to
> AF_INET_something_else
> 
> With all this, the support is still somewhat inconsistent -- you could
> be using an OMPI that has HAVE_DECL_AF_INET_SDP==0, but you're running
> on a system that has SDP available.
> 
> Perhaps a more general approach would be to [perhaps additionally]
> provide an MCA param to allow the user to specify the AF_* value?
> (AF_INET_SDP is a standardized value, right?  I.e., will it be the
> same on all Linux variants [and someday Solaris]?)
I didn't find any standard for it; the value seems to have been chosen
somewhat arbitrarily, since it was originally 26 and was changed to 27
because of a conflict with the kernel's defines.
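So exposing the value through an MCA parameter sounds reasonable. A
minimal sketch of what that could look like in the TCP component (the
parameter name is made up, and it assumes the usual
mca_base_param_reg_int() registration call):

/* Let the user query/override the address-family value used for SDP
 * sockets; defaults to whatever we compiled in (currently 27). */
int af_sdp_value;
mca_base_param_reg_int(&mca_btl_tcp_component.super.btl_version,
                       "sdp_address_family",
                       "Address family value used for SDP sockets",
                       false, false,
                       AF_INET_SDP, &af_sdp_value);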
> 
> >>> SDP can be used to accelerate job start ( oob over sdp ) and IPoIB
> >>> performance.
> >>
> >> I fail to see the reason to pollute the TCP btl with IB-specific
> >> SDP stuff.
> >>
> >> For the oob, this is arguable, but doesn't SDP allow for
> >> *transparent*
> >> socket replacement at runtime ? In this case, why not use this
> >> mechanism
> >> and keep the code clean ?
> 
> Patrick's got a good point: is there a reason not to do this?
> (LD_PRELOAD and the like)  Is it problematic with the remote orted's?
Yes, it's problematic with the remote orted's, and it is not as
transparent as you might think.
Since we can't pass environment variables to the orted's at runtime, we
would have to preload the SDP library in every remote environment (i.e.
in bashrc). That makes all applications use SDP instead of AF_INET,
which means you can't choose a specific protocol for a specific
application; you either use SDP or AF_INET for everything.
SDP can also be enabled through the appropriate
/usr/local/ofed/etc/libsdp.conf configuration, but an ordinary user
usually has no access to it.
(http://www.cisco.com/univercd/cc/td/doc/product/svbu/ofed/ofed_1_1/ofed_ug/sdp.htm#wp952927)

> 
> > Furthermore, why would a user choose to use SDP and TCP/IPoIB when the
> > OpenIB BTL is available using the native verbs interface?  FWIW, this
> > same sort of question gets asked of the uDAPL BTL -- the answer there
> > being that the uDAPL BTL runs in places the OpenIB BTL does not.  Is
> > this true here as well?
> 
> 
> Andrew's got a good point here, too -- accelerating the TCP BTL with
> SDP seems kinda pointless.  I'm guessing that you did it because it
> was just about the same work as was done in the TCP OOB (for which we
> have no corresponding verbs interface).  Is that right?
Indeed. But it also seems that SDP has lower overhead than VERBS in some
cases.

Tests with Sandia's overlapping benchmark 
http://www.cs.sandia.gov/smb/overhead.html#mozTocId316713

VERBS results
msgsize iterations  iter_t      work_t      overhead    base_t      avail(%)
0       1000        16.892      15.309      1.583       7.029       77.5
2       1000        16.852      15.332      1.520       7.144       78.7
4       1000        16.932      15.312      1.620       7.128       77.3
8       1000        16.985      15.319      1.666       7.182       76.8
16      1000        16.886      15.297      1.589       7.219       78.0
32      1000        16.988      15.311      1.677       7.251       76.9
64      1000        16.944      15.299      1.645       7.457       77.9

SDP results
msgsize iterations  iter_t      work_t      overhead    base_t      avail(%)
0       1000        134.902     128.089     6.813       54.691      87.5
2       1000        135.064     128.196     6.868       55.283      87.6
4       1000        135.031     128.356     6.675       55.039      87.9
8       1000        130.460     125.908     4.552       52.010      91.2
16      1000        135.432     128.694     6.738       55.615      87.9
32      1000        135.228     128.494     6.734       55.627      87.9
64      1000        135.470     128.540     6.930       56.583      87.8

IPoIB results
msgsize iterations  iter_t      work_t      overhead    base_t      avail(%)
0       1000        252.953     247.053     5.900       119.977     95.1
2       1000        253.336     247.285     6.051       121.573     95.0
4       1000        254.147     247.041     7.106       122.110     94.2
8       1000        254.613     248.011     6.602       121.840     94.6
16      1000        255.662     247.952     7.710       124.738     93.8
32      1000        255.569     248.057     7.512       127.095     94.1
64      1000        255.867     248.308     7.559       132.858     94.3

> 
> --
> Jeff Squyres
> Cisco Systems
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
