Re: [OMPI devel] [OMPI commits] Git: open-mpi/ompi branch master updated. dev-1731-g8e30579

2015-05-16 Thread Jeff Squyres (jsquyres)
I looked at this in a bit more detail this morning.

SHORT VERSION
-------------

I think that the real issue is that we shouldn't be setting KEEPALIVE on the 
listening sockets (we should only be setting these values on accepted/connected 
sockets).

I submitted a PR for this: https://github.com/open-mpi/ompi/pull/588. There are 
several commits on it; the key commit is 
https://github.com/jsquyres/ompi/commit/ece484c.

The OS X kernel reports its default keep alive values in milliseconds, and the 
Linux kernel reports its default values in seconds.  But I can't find 
definitive references to the units for setsockopt(TCP_KEEPALIVE) (and friends) 
-- empirical testing on OS X shows that TCP_KEEPALIVE is definitely specified 
in *seconds*.

MORE DETAIL
-----------

We only need to set KEEPALIVE on the accepted/connected sockets; there's no 
point in setting KEEPALIVE -- or any of our usual socket options (e.g., RCVBUF, 
NODELAY, etc.) -- on the listening sockets because the listening sockets are 
not used to send/receive anything.

Specifically: not setting these values for the listening sockets avoids the 
memory leak that started this whole kerfuffle.
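
A minimal sketch of that approach, not the code in the PR: a helper that is
called only on a socket returned by accept() or connect(), never on the
listener. The name set_keepalive and its parameters are hypothetical. The
idle-time option is named TCP_KEEPIDLE on Linux and TCP_KEEPALIVE on OS X, and
the values are assumed to be in seconds.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: enable keepalive on an accepted/connected TCP socket.
 * Values are passed in seconds (an assumption).  Returns 0 on success and
 * -1 if any setsockopt() call fails. */
static int set_keepalive(int sd, int idle_sec, int intvl_sec)
{
    int on = 1;

    assert(sd >= 0);
    if (setsockopt(sd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0) {
        return -1;
    }
#if defined(TCP_KEEPIDLE)        /* Linux, FreeBSD, NetBSD */
    if (setsockopt(sd, IPPROTO_TCP, TCP_KEEPIDLE,
                   &idle_sec, sizeof(idle_sec)) < 0) {
        return -1;
    }
#elif defined(TCP_KEEPALIVE)     /* OS X spelling of the idle time */
    if (setsockopt(sd, IPPROTO_TCP, TCP_KEEPALIVE,
                   &idle_sec, sizeof(idle_sec)) < 0) {
        return -1;
    }
#else
    (void)idle_sec;              /* no idle-time option on this platform */
#endif
#if defined(TCP_KEEPINTVL)
    if (setsockopt(sd, IPPROTO_TCP, TCP_KEEPINTVL,
                   &intvl_sec, sizeof(intvl_sec)) < 0) {
        return -1;
    }
#else
    (void)intvl_sec;             /* no probe-interval option here */
#endif
    return 0;
}
```

A caller would invoke set_keepalive(sd, 10, 5) right after accept() or
connect() succeeds, and leave the listening socket's options untouched.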

As for whether the units are in seconds vs. milliseconds: I found references to 
how the Linux and OS X kernels report the values (Linux: seconds, OS X: 
milliseconds), but I can't find any references to the units used when setting 
the values via setsockopt().  Most references I found imply *seconds*.

I think either OS X may be buggy, or the behavior of keepalives on listening 
sockets is just different vs. keep alive behavior on connected sockets.  On OS 
X, when I setsockopt(sd, SOL_SOCKET, SO_KEEPALIVE, ...) to a value of 10, I see 
that the listener socket doesn't generate its first EAGAIN for 10 seconds.  But 
then it generates EAGAINs constantly after that, regardless of the 
TCP_KEEPINTVL value that is set.  Specifically, I tried setting TCP_KEEPINTVL 
to 1 and 1000 -- there was no change in frequency of the EAGAINs after the 
first one was generated.

--> But that may be moot because KEEPALIVE behavior on an unconnected socket 
may be ... weird/undefined/whatever.
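
Whatever the cause of the EAGAINs, an accept loop can be written so that they
cost nothing. The following is only a sketch, assuming a non-blocking
listener; drain_listener and handle_conn are hypothetical names, not Open
MPI's oob/tcp code. The point is that per-connection state is created only
after accept() has actually returned a socket.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: accept every pending connection on a non-blocking
 * listener.  Returns the number of connections accepted, or -1 on a real
 * error.  If handle_conn is NULL, accepted sockets are simply closed. */
static int drain_listener(int listen_sd, void (*handle_conn)(int sd))
{
    int accepted = 0;

    assert(listen_sd >= 0);
    for (;;) {
        int sd = accept(listen_sd, NULL, NULL);
        if (sd >= 0) {
            /* Allocate per-connection state only here, after success. */
            if (handle_conn != NULL) {
                handle_conn(sd);
            } else {
                close(sd);
            }
            ++accepted;
            continue;
        }
        if (errno == EINTR) {
            continue;            /* interrupted by a signal: retry */
        }
        if (errno == EAGAIN || errno == EWOULDBLOCK) {
            return accepted;     /* nothing pending: not an error */
        }
        return -1;               /* genuine failure */
    }
}
```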

However, the OS X kernel clearly reports its default keepalive values in 
milliseconds, and the Linux kernel clearly reports its default values in 
seconds.  I ran the following commands on my systems, based on information I 
found on http://www.gnugk.org/keepalive.html:

Linux / RHEL 6.5 / 2.6.32 kernel (this is clearly in seconds):

$ sysctl net.ipv4.tcp_keepalive_time
net.ipv4.tcp_keepalive_time = 1800
$ sysctl net.ipv4.tcp_keepalive_intvl
net.ipv4.tcp_keepalive_intvl = 75

Linux / Ubuntu 14.04.2 / 3.16.0 kernel (this is clearly in seconds):

$ sysctl net.ipv4.tcp_keepalive_time
net.ipv4.tcp_keepalive_time = 7200
$ sysctl net.ipv4.tcp_keepalive_intvl
net.ipv4.tcp_keepalive_intvl = 75

OS X 10.10 / Yosemite (this is clearly in milliseconds):

$ sysctl net.inet.tcp.keepidle
net.inet.tcp.keepidle: 720
$ sysctl net.inet.tcp.keepintvl
net.inet.tcp.keepintvl: 75000




> On May 15, 2015, at 8:42 AM, Ralph Castain  wrote:
> 
> Did some more digging, and it turns out that Linux specifies the keep alive 
> time interval in seconds - and Mac (for some strange reason) uses 
> milliseconds. Hence the difference in behavior.
> 
> So I could replace the current commit with one that multiplies the keep alive 
> interval by 1000x if we are on a Mac. However, we don't really need keep 
> alive at all on the Mac, so I'm wondering if we shouldn't just leave it 
> turned off?
> 
> I confess I don't care either way
> Ralph
> 
> 
> On Thu, May 14, 2015 at 10:46 PM, George Bosilca  wrote:
> In the worst case, i.e. no other solution is possible, OS X can be identified 
> by the existence of the macro __APPLE__. There is no need to have 
> OPAL_HAVE_MAC.
> 
>   George.
> 
> On Thu, May 14, 2015 at 11:12 PM, Ralph Castain  wrote:
> Interesting - as I said, I'll take a look. In either case, the keep alive on 
> the Mac is unnecessary as it is always a standalone scenario - no value in 
> running it. So the "fix" does no harm and just saves some useless overhead.
> 
> 
> On Thu, May 14, 2015 at 9:00 PM, George Bosilca  wrote:
> I'm sorry Ralph what you proposed is not really a fix. My comment is based on 
> a real execution of exactly the command you provided with lldb attached to 
> the process. What I see is millions of 
> OBJ_NEW(mca_oob_tcp_pending_connection_t) because the EAGAIN is not correctly 
> handled.
> 
>   George.
> 
> 
> On Thu, May 14, 2015 at 10:56 PM, Ralph Castain  wrote:
> Yes - this is the fix for that issue
> 
> 
> On Thu, May 14, 2015 at 8:54 PM, Howard Pritchard  wrote:
> Is this by any chance associated with issue 579?
> 
> 
> 2015-05-14 20:49 GMT-06:00 Ralph Castain :
> I'll look at the lines you cite, but that clearly isn't the problem we are 
> seeing here. I can verify that because the test case:
> 
> mpirun -n 1 sleep 1000
> 
> does not open up any connections at all. Thus, th

Re: [OMPI devel] [OMPI commits] Git: open-mpi/ompi branch master updated. dev-1731-g8e30579

2015-05-16 Thread Chris Samuel
On Sat, 16 May 2015 12:49:51 PM Jeff Squyres wrote:

> Linux / RHEL 6.5 / 2.6.32 kernel (this is clearly in seconds):
>
> $ sysctl net.ipv4.tcp_keepalive_time
> net.ipv4.tcp_keepalive_time = 1800

I suspect that's a local customisation, all Linux systems I've got access to 
(including RHEL 6.4/6.5/6.6) report:

net.ipv4.tcp_keepalive_time = 7200

All the best,
Chris
-- 
 Christopher Samuel        Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci



Re: [OMPI devel] [OMPI commits] Git: open-mpi/ompi branch master updated. dev-1731-g8e30579

2015-05-16 Thread Paul Hargrove
AIX, Solaris and {Free,Open,Net}BSD results are also inconsistent with
regard to the units used for reporting:

AIX$ no -o tcp_keepidle -o tcp_keepintvl
tcp_keepidle = 14400
tcp_keepintvl = 150

{phargrov@solaris11-amd64 ~}$ ndd -get /dev/tcp tcp_keepalive_interval
720

[phargrov@freebsd10-amd64 ~]$ sysctl net.inet.tcp.keepidle net.inet.tcp.keepintvl
net.inet.tcp.keepidle: 720
net.inet.tcp.keepintvl: 75000

[openbsd5-amd64 ~]$ sysctl net.inet.tcp.keepidle net.inet.tcp.keepintvl
net.inet.tcp.keepidle=14400
net.inet.tcp.keepintvl=150

[netbsd6-amd64 ~]$ /sbin/sysctl net.inet.tcp.keepidle net.inet.tcp.keepintvl
net.inet.tcp.keepidle = 14400
net.inet.tcp.keepintvl = 150


At least AIX documents these values as having units of HALF SECONDS.
I suspect that is also true of the OpenBSD and NetBSD values above,
because then all keepidle values seen so far are the same 2 hours (except
Jeff's one RHEL-6.5 system).


I *was* able to find the units used for setting these documented on several
systems:

On Linux, FreeBSD and NetBSD the respective tcp(4) man pages all document
TCP_KEEPIDLE and TCP_KEEPINTVL socket options as taking *seconds* for their
arguments.
Even AIX-7.1's setsockopt manpage says seconds are used to set these two
socket options.

My OS X 10.8 system's tcp(4) has different names (TCP_KEEPALIVE and
TCP_CONNECTIONTIMEOUT) which are documented as corresponding to the sysctl
values, but it *does* agree that units of seconds are used to set these
options.

I didn't find OpenBSD or Solaris docs ("grep -rl TCP_KEEP /usr/share/man"
didn't find any matches).

So in summary:

+ sysctl (or equiv) reports in non-standardized units (including seconds,
half-seconds and milliseconds).
+ setsockopt() uses seconds on all systems I found documented (Linux, OS X,
FreeBSD, NetBSD and AIX)
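
To make the mismatch concrete, here is a small sketch that normalizes a
sysctl-reported value to the seconds that setsockopt() expects. The enum and
function names are made up for illustration, and the half-second unit for
OpenBSD and NetBSD is only my suspicion from above, not something I found
documented.

```c
#include <assert.h>

/* Hypothetical: how many reported ticks make up one second. */
enum report_unit {
    UNIT_SECONDS      = 1,     /* Linux sysctl */
    UNIT_HALF_SECONDS = 2,     /* AIX (documented); Open/NetBSD (suspected) */
    UNIT_MILLISECONDS = 1000   /* OS X, FreeBSD sysctl */
};

/* Convert a reported keepalive value to whole seconds (truncating). */
static long reported_to_seconds(long reported, enum report_unit unit)
{
    assert(unit > 0);
    return reported / (long)unit;
}
```

With this, reported_to_seconds(14400, UNIT_HALF_SECONDS) and
reported_to_seconds(7200, UNIT_SECONDS) both give 7200 seconds (2 hours), and
reported_to_seconds(75000, UNIT_MILLISECONDS) gives the same 75 seconds as
reported_to_seconds(150, UNIT_HALF_SECONDS).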

-Paul

P.S.
re: AIX - seriously "no" (I am guessing (n)etwork (o)ptions) as the command
name!


On Sat, May 16, 2015 at 6:25 AM, Chris Samuel  wrote:

> On Sat, 16 May 2015 12:49:51 PM Jeff Squyres wrote:
>
> > Linux / RHEL 6.5 / 2.6.32 kernel (this is clearly in seconds):
> >
> > $ sysctl net.ipv4.tcp_keepalive_time
> > net.ipv4.tcp_keepalive_time = 1800
>
> I suspect that's a local customisation, all Linux systems I've got access
> to
> (including RHEL 6.4/6.5/6.6) report:
>
> net.ipv4.tcp_keepalive_time = 7200
>
> All the best,
> Chris
> --
>  Christopher Samuel        Senior Systems Administrator
>  VLSCI - Victorian Life Sciences Computation Initiative
>  Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
>  http://www.vlsci.org.au/  http://twitter.com/vlsci



-- 
Paul H. Hargrove  phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900


Re: [OMPI devel] [OMPI commits] Git: open-mpi/ompi branch master updated. dev-1731-g8e30579

2015-05-16 Thread Chris Samuel
On Sat, 16 May 2015 02:59:35 PM Paul Hargrove wrote:

> I didn't find OpenBSD or Solaris docs ("grep -rl TCP_KEEP /usr/share/man"
> didn't find any matches).

This seems to document it for an unspecified version of Solaris:

http://docs.oracle.com/cd/E19120-01/open.solaris/819-2724/fsvdg/index.html

For OpenBSD this bugzilla entry for Firefox from early last year claims they 
are only available via sysctl options there, not via setsockopt():

https://bugzilla.mozilla.org/show_bug.cgi?id=970550#c8

There is some (meagre) documentation of those options here:

http://nixdoc.net/man-pages/OpenBSD/sysctl.3.html

The last documented change on the OpenBSD site was in 3.5, which says:

http://www.openbsd.org/plus35.html

# Reset the TCP keepalive timer to tcp.keepidle (normally four hours)
# after the three-way handshake completes. (syncache sets it to
# tcp.keepinittime, normally 150 seconds).

Hope that helps!

All the best,
Chris
-- 
 Christopher Samuel        Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/  http://twitter.com/vlsci