Re: [OMPI devel] [OMPI commits] Git: open-mpi/ompi branch master updated. dev-1731-g8e30579
I looked at this in a bit more detail this morning.

SHORT VERSION
-------------

I think that the real issue is that we shouldn't be setting KEEPALIVE on the listening sockets (we should only be setting these values on accepted/connected sockets). I submitted a PR for this: https://github.com/open-mpi/ompi/pull/588; there are several commits on it, and the key commit is https://github.com/jsquyres/ompi/commit/ece484c.

The OS X kernel reports its default keepalive values in milliseconds, and the Linux kernel reports its default values in seconds. But I can't find definitive references to the units for setsockopt(TCP_KEEPALIVE) (and friends) -- empirical testing on OS X shows that TCP_KEEPALIVE is definitely specified in *seconds*.

MORE DETAIL
-----------

We only need to set KEEPALIVE on the accepted/connected sockets; there's no point in setting KEEPALIVE -- or any of our usual socket options (e.g., RCVBUF, NODELAY, etc.) -- on the listening sockets because the listening sockets are not used to send/receive anything. Specifically: not setting these values for the listening sockets avoids the memory leak that started this whole kerfuffle.

As for whether the units are in seconds vs. milliseconds: I found references to how the Linux and OS X kernels report the values (Linux: seconds, OS X: milliseconds), but I can't find any references to the units used when setting the values via setsockopt(). Most references I found imply *seconds*.

I think either OS X may be buggy, or the behavior of keepalives on listening sockets is just different from keepalive behavior on connected sockets. On OS X, when I setsockopt(sd, SOL_SOCKET, SO_KEEPALIVE, ...) to a value of 10, I see that the listener socket doesn't generate its first EAGAIN for 10 seconds. But then it generates EAGAINs constantly after that, regardless of the TCP_KEEPINTVL value that is set. Specifically, I tried setting TCP_KEEPINTVL to 1 and 1000 -- there was no change in the frequency of the EAGAINs after the first one was generated.
--> But that may be moot because KEEPALIVE behavior on an unconnected socket may be ... weird/undefined/whatever.

However, the OS X kernel clearly reports its default keepalive values in milliseconds, and the Linux kernel clearly reports its default values in seconds. I ran the following commands on my systems, based on information I found on http://www.gnugk.org/keepalive.html:

Linux / RHEL 6.5 / 2.6.32 kernel (this is clearly in seconds):

$ sysctl net.ipv4.tcp_keepalive_time
net.ipv4.tcp_keepalive_time = 1800
$ sysctl net.ipv4.tcp_keepalive_intvl
net.ipv4.tcp_keepalive_intvl = 75

Linux / Ubuntu 14.04.2 / 3.16.0 kernel (this is clearly in seconds):

$ sysctl net.ipv4.tcp_keepalive_time
net.ipv4.tcp_keepalive_time = 7200
$ sysctl net.ipv4.tcp_keepalive_intvl
net.ipv4.tcp_keepalive_intvl = 75

OS X 10.10 / Yosemite (this is clearly in milliseconds):

$ sysctl net.inet.tcp.keepidle
net.inet.tcp.keepidle: 7200000
$ sysctl net.inet.tcp.keepintvl
net.inet.tcp.keepintvl: 75000

> On May 15, 2015, at 8:42 AM, Ralph Castain wrote:
>
> Did some more digging, and it turns out that Linux specifies the keep alive
> time interval in seconds - and Mac (for some strange reason) uses
> milliseconds. Hence the difference in behavior.
>
> So I could replace the current commit with one that multiplies the keep alive
> interval by 1000x if we are on a Mac. However, we don't really need keep
> alive at all on the Mac, so I'm wondering if we shouldn't just leave it
> turned off?
>
> I confess I don't care either way
> Ralph
>
>
> On Thu, May 14, 2015 at 10:46 PM, George Bosilca wrote:
> In the worst case, i.e. no other solution is possible, OS X can be identified
> by the existence of the macro __APPLE__. There is no need to have
> OPAL_HAVE_MAC.
>
> George.
>
> On Thu, May 14, 2015 at 11:12 PM, Ralph Castain wrote:
> Interesting - as I said, I'll take a look. In either case, the keep alive on
> the Mac is unnecessary as it is always a standalone scenario - no value in
> running it.
> So the "fix" does no harm and just saves some useless overhead.
>
>
> On Thu, May 14, 2015 at 9:00 PM, George Bosilca wrote:
> I'm sorry Ralph what you proposed is not really a fix. My comment is based on
> a real execution of exactly the command you provided with lldb attached to
> the process. What I see is millions of
> OBJ_NEW(mca_oob_tcp_pending_connection_t) because the EAGAIN is not correctly
> handled.
>
> George.
>
>
> On Thu, May 14, 2015 at 10:56 PM, Ralph Castain wrote:
> Yes - this is the fix for that issue
>
>
> On Thu, May 14, 2015 at 8:54 PM, Howard Pritchard wrote:
> Is this by any chance associated with issue 579?
>
>
> 2015-05-14 20:49 GMT-06:00 Ralph Castain:
> I'll look at the lines you cite, but that clearly isn't the problem we are
> seeing here. I can verify that because the test case:
>
> mpirun -n 1 sleep 1000
>
> does not open up any connections at all. Thus, th
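
A minimal sketch of the approach argued for in the message above: apply keepalive settings only to a socket returned by accept() or connect(), never to the listening socket. The helper name enable_keepalive and its parameters are hypothetical (this is not the code from the PR), and the choice of TCP_KEEPIDLE where it is defined, falling back to TCP_KEEPALIVE (the OS X name), is an assumption based on the option names mentioned in this thread. Both values are passed in seconds.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper, not taken from the Open MPI tree.
 * Enables keepalive on an accepted/connected socket.
 * idle_secs:  idle time before the first probe, in seconds.
 * intvl_secs: time between probes, in seconds.
 * Returns 0 on success, -1 if any setsockopt() call fails. */
static int enable_keepalive(int sd, int idle_secs, int intvl_secs)
{
    int on = 1;

    /* SO_KEEPALIVE itself is only an on/off switch. */
    if (setsockopt(sd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0) {
        return -1;
    }

    /* The idle-time option has different names on different systems. */
#if defined(TCP_KEEPIDLE)
    if (setsockopt(sd, IPPROTO_TCP, TCP_KEEPIDLE,
                   &idle_secs, sizeof(idle_secs)) < 0) {
        return -1;
    }
#elif defined(TCP_KEEPALIVE)
    if (setsockopt(sd, IPPROTO_TCP, TCP_KEEPALIVE,
                   &idle_secs, sizeof(idle_secs)) < 0) {
        return -1;
    }
#else
    (void)idle_secs;   /* no per-socket idle option on this platform */
#endif

#if defined(TCP_KEEPINTVL)
    if (setsockopt(sd, IPPROTO_TCP, TCP_KEEPINTVL,
                   &intvl_secs, sizeof(intvl_secs)) < 0) {
        return -1;
    }
#else
    (void)intvl_secs;  /* no per-socket interval option on this platform */
#endif

    return 0;
}
```

A caller would invoke this on the descriptor returned by accept(), for example enable_keepalive(client_sd, 10, 5), and leave the listening descriptor untouched.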
Re: [OMPI devel] [OMPI commits] Git: open-mpi/ompi branch master updated. dev-1731-g8e30579
On Sat, 16 May 2015 12:49:51 PM Jeff Squyres wrote:

> Linux / RHEL 6.5 / 2.6.32 kernel (this is clearly in seconds):
>
> $ sysctl net.ipv4.tcp_keepalive_time
> net.ipv4.tcp_keepalive_time = 1800

I suspect that's a local customisation, all Linux systems I've got access to (including RHEL 6.4/6.5/6.6) report:

net.ipv4.tcp_keepalive_time = 7200

All the best,
Chris
--
Christopher Samuel        Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
http://www.vlsci.org.au/      http://twitter.com/vlsci
Re: [OMPI devel] [OMPI commits] Git: open-mpi/ompi branch master updated. dev-1731-g8e30579
AIX, Solaris and {Free,Open,Net}BSD results are also not consistent with regard to the units used for reporting:

AIX$ no -o tcp_keepidle -o tcp_keepintvl
tcp_keepidle = 14400
tcp_keepintvl = 150

{phargrov@solaris11-amd64 ~}$ ndd -get /dev/tcp tcp_keepalive_interval
7200000

[phargrov@freebsd10-amd64 ~]$ sysctl net.inet.tcp.keepidle net.inet.tcp.keepintvl
net.inet.tcp.keepidle: 7200000
net.inet.tcp.keepintvl: 75000

[openbsd5-amd64 ~]$ sysctl net.inet.tcp.keepidle net.inet.tcp.keepintvl
net.inet.tcp.keepidle=14400
net.inet.tcp.keepintvl=150

[netbsd6-amd64 ~]$ /sbin/sysctl net.inet.tcp.keepidle net.inet.tcp.keepintvl
net.inet.tcp.keepidle = 14400
net.inet.tcp.keepintvl = 150

At least AIX documents these values as having units of HALF SECOND. I suspect that that is also true of the OpenBSD and NetBSD values above, because then all keepidle values seen so far are the same 2 hours (except Jeff's one RHEL-6.5 system).

I *was* able to find the units used for setting these documented on several systems:

On Linux, FreeBSD and NetBSD the respective tcp(4) man pages all document the TCP_KEEPIDLE and TCP_KEEPINTVL socket options as taking *seconds* for their arguments. Even AIX-7.1's setsockopt manpage says seconds are used to set these two socket options.

My OS X 10.8 system's tcp(4) has different names (TCP_KEEPALIVE and TCP_CONNECTIONTIMEOUT) which are documented as corresponding to the sysctl values, but it *does* agree that units of seconds are used to set these options.

I didn't find OpenBSD or Solaris docs ("grep -rl TCP_KEEP /usr/share/man" didn't find any matches).

So in summary:
+ sysctl (or equiv) reports in non-standardized units (including seconds, half-seconds and milliseconds).
+ setsockopt() uses seconds on all systems I found documented (Linux, OS X, FreeBSD, NetBSD and AIX)

-Paul

P.S. re: AIX - seriously "no" (I am guessing (n)etwork (o)ptions) as the command name!
On Sat, May 16, 2015 at 6:25 AM, Chris Samuel wrote:

> On Sat, 16 May 2015 12:49:51 PM Jeff Squyres wrote:
>
> > Linux / RHEL 6.5 / 2.6.32 kernel (this is clearly in seconds):
> >
> > $ sysctl net.ipv4.tcp_keepalive_time
> > net.ipv4.tcp_keepalive_time = 1800
>
> I suspect that's a local customisation, all Linux systems I've got access to
> (including RHEL 6.4/6.5/6.6) report:
>
> net.ipv4.tcp_keepalive_time = 7200
>
> All the best,
> Chris
> --
> Christopher Samuel        Senior Systems Administrator
> VLSCI - Victorian Life Sciences Computation Initiative
> Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
> http://www.vlsci.org.au/      http://twitter.com/vlsci
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2015/05/17411.php
>

--
Paul H. Hargrove                          phhargr...@lbl.gov
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department               Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
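
The unit mismatch summarized in the message above can be made concrete with a small conversion sketch. The enum and function names are hypothetical. The unit assigned to each platform follows what the posters in this thread report (AIX documenting half seconds, OS X and FreeBSD appearing to report milliseconds, Linux reporting seconds), so treat that mapping as an assumption rather than a reference.

```c
/* Units in which a platform's sysctl (or equivalent) reports its
 * keepalive defaults, per the observations in this thread. */
enum keepalive_unit {
    KA_UNIT_SECONDS,       /* e.g. Linux net.ipv4.tcp_keepalive_time */
    KA_UNIT_HALF_SECONDS,  /* e.g. AIX tcp_keepidle (documented) */
    KA_UNIT_MILLISECONDS   /* e.g. OS X net.inet.tcp.keepintvl */
};

/* Hypothetical helper: convert a reported value to whole seconds,
 * the unit that setsockopt() is documented to take on the systems
 * surveyed in this thread. */
static long keepalive_to_seconds(long raw, enum keepalive_unit unit)
{
    switch (unit) {
    case KA_UNIT_HALF_SECONDS:
        return raw / 2;
    case KA_UNIT_MILLISECONDS:
        return raw / 1000;
    case KA_UNIT_SECONDS:
    default:
        return raw;
    }
}
```

Under these assumptions, the AIX tcp_keepidle value of 14400 half seconds and the Linux tcp_keepalive_time value of 7200 seconds both come out as 7200 seconds, and the 75000 ms and 150 half-second intervals both come out as 75 seconds.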
Re: [OMPI devel] [OMPI commits] Git: open-mpi/ompi branch master updated. dev-1731-g8e30579
On Sat, 16 May 2015 02:59:35 PM Paul Hargrove wrote:

> I didn't find OpenBSD or Solaris docs ("grep -rl TCP_KEEP /usr/share/man"
> didn't find any matches).

This seems to document it for an unspecified version of Solaris:

http://docs.oracle.com/cd/E19120-01/open.solaris/819-2724/fsvdg/index.html

For OpenBSD this bugzilla entry for Firefox from early last year claims they are only available via sysctl options there, not via setsockopt():

https://bugzilla.mozilla.org/show_bug.cgi?id=970550#c8

There is some (meagre) documentation of those options here:

http://nixdoc.net/man-pages/OpenBSD/sysctl.3.html

The last (documented) change on the OpenBSD site was in 3.5, saying:

http://www.openbsd.org/plus35.html

# Reset the TCP keepalive timer to tcp.keepidle (normally four hours)
# after the three-way handshake completes. (syncache sets it to
# tcp.keepinittime, normally 150 seconds).

Hope that helps!

All the best,
Chris
--
Christopher Samuel        Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
http://www.vlsci.org.au/      http://twitter.com/vlsci