Re: Latest stable (r287104) bash leaves zombies on exit
On 08/28/2015 18:18, Konstantin Belousov wrote:
> On Fri, Aug 28, 2015 at 05:52:42PM +0200, Michiel Boland wrote:
> > set -e
> > for a in `seq 1000`
> > do
> >   echo -n "$a "
> >   xterm -e ssh nonexisting
> > done
> > echo ""
> >
> > (The idea here is that 'ssh nonexisting' should do some work and then exit,
> > "xterm -e false", etc. don't appear to trigger the bug.)
> >
> > Prior to the patch, one of the xterms would hang after the counter
> > reaches a random (reasonably small) number.
> > After the patch the script runs till completion.
>
> Thank you for testing. Funny detail is that your loop does not hang for
> me, I see flapping xterms until the completion. How many cpus does your
> machine have ?

I have a Q8300 (4 cpus) - I guess the timing matters.

Do I understand correctly that the problem is that if you install a signal
handler with signal() (which is what xterm does) and pull in libthr.so somehow,
then there is no thr_sighandler inserted?

I condensed the xterm problem into a small C program. Compile in such a way
that the delay loop does not get optimized out, and link with -lpthread.
Eventually, when executed often enough, this will hang in the same fashion as
xterm does.

#include <sys/cdefs.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static void
reapchild(int sig __unused)
{
	wait(NULL);
}

static void
delay(void)
{
	long i, n;

	n = random() % 100;
	if (n < 0) {
		n = -n;
	}
	for (i = 0; i < n; i++)
		;
}

int
main()
{
	int p[2];
	char dummy;

	srandomdev();
	if (signal(SIGCHLD, reapchild) == SIG_ERR) {
		perror("signal");
		exit(1);
	}
	if (pipe(p) == -1) {
		perror("pipe");
		exit(1);
	}
	switch (fork()) {
	case -1:
		perror("fork");
		exit(1);
	case 0:
		close(p[1]);
		read(p[0], &dummy, 1);
		_exit(0);
	}
	close(p[1]);
	read(p[0], &dummy, 1);
	delay();
	exit(0);
}

> Below is a slightly improved version of the change, to avoid unnecessary
> relocations. Would be good to rebuild the world and confirm that you
> see no regression (the patch also affects rtld in some way).

Ok, I will try this patch later today.
Cheers, Michiel ___ freebsd-stable@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: Latest stable (r287104) bash leaves zombies on exit
On Sat, Aug 29, 2015 at 01:43:36PM +0200, Michiel Boland wrote:
> Do I understand correctly that the problem is that if you install a signal
> handler with signal() (which is what xterm does) and pull in libthr.so
> somehow, then there is no thr_sighandler inserted?

Yes. The problem does not exist for sigaction(2).
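Since the reply above says the problem does not occur with sigaction(2), here is a minimal sketch of a SIGCHLD reaper installed that way instead of with signal(). The names install_sigchld_handler, reap_children and reaped_children are made up for this illustration and come from neither xterm nor the patch. The waitpid() loop with WNOHANG is a common idiom, not something the thread prescribes.

```c
#define _POSIX_C_SOURCE 200809L	/* assumption: plain POSIX interfaces only */
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Number of children reaped so far (hypothetical bookkeeping). */
static volatile sig_atomic_t reaped_children = 0;

/* SIGCHLD handler: reap every exited child without blocking. */
static void
reap_children(int sig)
{
	int saved_errno = errno;	/* waitpid() may clobber errno */

	(void)sig;
	while (waitpid(-1, NULL, WNOHANG) > 0)
		reaped_children++;
	errno = saved_errno;
}

/*
 * Hypothetical helper: install `handler` for SIGCHLD with sigaction(2)
 * rather than signal(3). Returns 0 on success, -1 on error.
 */
int
install_sigchld_handler(void (*handler)(int))
{
	struct sigaction sa;

	assert(handler != NULL);
	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = handler;
	sigemptyset(&sa.sa_mask);
	/* Restart interrupted syscalls; ignore stop/continue notifications. */
	sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
	return (sigaction(SIGCHLD, &sa, NULL));
}
```

A program would call install_sigchld_handler(reap_children) once at startup, before the first fork().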
FreeBSD_stable_10 - Build #1658 - Failure
FreeBSD_stable_10 - Build #1658 - Failure:

Build information: https://jenkins.FreeBSD.org/job/FreeBSD_stable_10/1658/
Full change log: https://jenkins.FreeBSD.org/job/FreeBSD_stable_10/1658/changes
Full build log: https://jenkins.FreeBSD.org/job/FreeBSD_stable_10/1658/console

Change summaries:

No changes

The end of the build log:

Started by an SCM change
Building remotely on jenkins-10.freebsd.org (FreeBSD-10) in workspace /builds/FreeBSD_stable_10
java.io.IOException: remote file operation failed: /builds/FreeBSD_stable_10 at hudson.remoting.Channel@5a762a5f:jenkins-10.freebsd.org: hudson.remoting.ChannelClosedException: channel is already closed
	at hudson.FilePath.act(FilePath.java:987)
	at hudson.FilePath.act(FilePath.java:969)
	at hudson.FilePath.mkdirs(FilePath.java:1152)
	at hudson.model.AbstractProject.checkout(AbstractProject.java:1275)
	at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:610)
	at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:86)
	at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:532)
	at hudson.model.Run.execute(Run.java:1741)
	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
	at hudson.model.ResourceController.execute(ResourceController.java:98)
	at hudson.model.Executor.run(Executor.java:381)
Caused by: hudson.remoting.ChannelClosedException: channel is already closed
	at hudson.remoting.Channel.send(Channel.java:550)
	at hudson.remoting.Request.call(Request.java:129)
	at hudson.remoting.Channel.call(Channel.java:752)
	at hudson.FilePath.act(FilePath.java:980)
	... 10 more
Caused by: java.io.IOException
	at hudson.remoting.Channel.close(Channel.java:1110)
	at hudson.slaves.ChannelPinger$1.onDead(ChannelPinger.java:118)
	at hudson.remoting.PingThread.ping(PingThread.java:126)
	at hudson.remoting.PingThread.run(PingThread.java:85)
Caused by: java.util.concurrent.TimeoutException: Ping started at 1440849388364 hasn't completed by 1440849628560
	... 2 more
[WARNINGS] Skipping publisher since build result is FAILURE
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any
Re: Latest stable (r287104) bash leaves zombies on exit
On Fri, Aug 28, 2015 at 07:18:47PM +0300, Konstantin Belousov wrote:
> On Fri, Aug 28, 2015 at 05:52:42PM +0200, Michiel Boland wrote:
> > set -e
> > for a in `seq 1000`
> > do
> >   echo -n "$a "
> >   xterm -e ssh nonexisting
> > done
> > echo ""
> >
> > (The idea here is that 'ssh nonexisting' should do some work and then exit,
> > "xterm -e false", etc. don't appear to trigger the bug.)
> >
> > Prior to the patch, one of the xterms would hang after the counter
> > reaches a random (reasonably small) number.
> > After the patch the script runs till completion.
>
> Thank you for testing. Funny detail is that your loop does not hang for
> me, I see flapping xterms until the completion. How many cpus does your
> machine have ?
>
> Below is a slightly improved version of the change, to avoid unnecessary
> relocations. Would be good to rebuild the world and confirm that you
> see no regression (the patch also affects rtld in some way).

Looks good to me, except that I think a vforked child (in system() and
posix_spawn*()) should use the system calls and not libthr's wrappers.
This reduces the probability of weird things happening between vfork and
exec, and also avoids an unexpected error when
posix_spawnattr_setsigdefault()'s mask contains SIGTHR.

--
Jilles Tjoelker
Re: Latest stable (r287104) bash leaves zombies on exit
On Sat, Aug 29, 2015 at 03:01:38PM +0200, Jilles Tjoelker wrote:
> Looks good to me, except that I think a vforked child (in system() and
> posix_spawn*()) should use the system calls and not libthr's wrappers.
> This reduces the probability of weird things happening between vfork and
> exec, and also avoids an unexpected error when
> posix_spawnattr_setsigdefault()'s mask contains SIGTHR.

Thank you for the review, I agree with the note about vfork. Updated
patch is below. Also, I removed the PIC_PROLOGUE from the i386 setjmp,
it has no use after the plt calls are removed.

diff --git a/lib/libc/amd64/gen/setjmp.S b/lib/libc/amd64/gen/setjmp.S
index c26f52f..826220e 100644
--- a/lib/libc/amd64/gen/setjmp.S
+++ b/lib/libc/amd64/gen/setjmp.S
@@ -55,7 +55,7 @@ ENTRY(setjmp)
 	movq	$0,%rsi			/* (sigset_t*)set */
 	leaq	72(%rcx),%rdx		/* 9,10; (sigset_t*)oset */
 	/* stack is 16-byte aligned */
-	call	PIC_PLT(CNAME(_sigprocmask))
+	call	__libc_sigprocmask
 	popq	%rdi
 	movq	%rdi,%rcx
 	movq	0(%rsp),%rdx		/* retval */
@@ -82,7 +82,7 @@ ENTRY(__longjmp)
 	leaq	72(%rdx),%rsi		/* (sigset_t*)set */
 	movq	$0,%rdx			/* (sigset_t*)oset */
 	subq	$0x8,%rsp		/* make the stack 16-byte aligned */
-	call	PIC_PLT(CNAME(_sigprocmask))
+	call	__libc_sigprocmask
 	addq	$0x8,%rsp
 	popq	%rsi
 	popq	%rdi			/* jmpbuf */
diff --git a/lib/libc/amd64/gen/sigsetjmp.S b/lib/libc/amd64/gen/sigsetjmp.S
index 9a20556..1e8e77c 100644
--- a/lib/libc/amd64/gen/sigsetjmp.S
+++ b/lib/libc/amd64/gen/sigsetjmp.S
@@ -63,7 +63,7 @@ ENTRY(sigsetjmp)
 	movq	$0,%rsi			/* (sigset_t*)set */
 	leaq	72(%rcx),%rdx		/* 9,10 (sigset_t*)oset */
 	/* stack is 16-byte aligned */
-	call	PIC_PLT(CNAME(_sigprocmask))
+	call	__libc_sigprocmask
 	popq	%rdi
 2:	movq	%rdi,%rcx
 	movq	0(%rsp),%rdx		/* retval */
@@ -91,7 +91,7 @@ ENTRY(__siglongjmp)
 	leaq	72(%rdx),%rsi		/* (sigset_t*)set */
 	movq	$0,%rdx			/* (sigset_t*)oset */
 	subq	$0x8,%rsp		/* make the stack 16-byte aligned */
-	call	PIC_PLT(CNAME(_sigprocmask))
+	call	__libc_sigprocmask
 	addq	$0x8,%rsp
 	popq	%rsi
 	popq	%rdi			/* jmpbuf */
diff --git a/lib/libc/compat-43/sigcompat.c b/lib/libc/compat-43/sigcompat.c
index 199846f..a8cef1c 100644
--- a/lib/libc/compat-43/sigcompat.c
+++ b/lib/libc/compat-43/sigcompat.c
@@ -59,7 +59,7 @@ sigvec(signo, sv, osv)
 	} else
 		sap = NULL;
 	osap = osv != NULL ? &osa : NULL;
-	ret = _sigaction(signo, sap, osap);
+	ret = __libc_sigaction(signo, sap, osap);
 	if (ret == 0 && osv != NULL) {
 		osv->sv_handler = osa.sa_handler;
 		osv->sv_flags = osa.sa_flags ^ SV_INTERRUPT;
@@ -77,7 +77,7 @@ sigsetmask(mask)
 
 	sigemptyset(&set);
 	set.__bits[0] = mask;
-	n = _sigprocmask(SIG_SETMASK, &set, &oset);
+	n = __libc_sigprocmask(SIG_SETMASK, &set, &oset);
 	if (n)
 		return (n);
 	return (oset.__bits[0]);
@@ -92,7 +92,7 @@ sigblock(mask)
 
 	sigemptyset(&set);
 	set.__bits[0] = mask;
-	n = _sigprocmask(SIG_BLOCK, &set, &oset);
+	n = __libc_sigprocmask(SIG_BLOCK, &set, &oset);
 	if (n)
 		return (n);
 	return (oset.__bits[0]);
@@ -105,7 +105,7 @@ sigpause(int mask)
 
 	sigemptyset(&set);
 	set.__bits[0] = mask;
-	return (_sigsuspend(&set));
+	return (__libc_sigsuspend(&set));
 }
 
 int
@@ -113,11 +113,11 @@ xsi_sigpause(int sig)
 {
 	sigset_t set;
 
-	if (_sigprocmask(SIG_BLOCK, NULL, &set) == -1)
+	if (__libc_sigprocmask(SIG_BLOCK, NULL, &set) == -1)
 		return (-1);
 	if (sigdelset(&set, sig) == -1)
 		return (-1);
-	return (_sigsuspend(&set));
+	return (__libc_sigsuspend(&set));
 }
 
 int
@@ -128,7 +128,7 @@ sighold(int sig)
 	sigemptyset(&set);
 	if (sigaddset(&set, sig) == -1)
 		return (-1);
-	return (_sigprocmask(SIG_BLOCK, &set, NULL));
+	return (__libc_sigprocmask(SIG_BLOCK, &set, NULL));
 }
 
 int
@@ -138,7 +138,7 @@ sigignore(int sig)
 
 	bzero(&sa, sizeof(sa));
 	sa.sa_handler = SIG_IGN;
-	return (_sigaction(sig, &sa, NULL));
+	return (__libc_sigaction(sig, &sa, NULL));
 }
 
 int
@@ -149,7 +149,7 @@ sigrelse(int sig)
 	sigemptyset(&set);
 	if (sigaddset(&set, sig) == -1)
 		return (-1);
-	return (_sigprocmask(SIG_UNBLOCK, &set, NULL));
+	return (__libc_sigprocmask(SIG_UNBLOCK, &set, NULL));
 }
 
 void
@@ -161,26 +161,26 @@ void
 	sigemptyset(&set);
 	if (sigadd
Re: Latest stable (r287104) bash leaves zombies on exit
On 08/29/2015 15:41, Konstantin Belousov wrote:
> On Sat, Aug 29, 2015 at 03:01:38PM +0200, Jilles Tjoelker wrote:
> > Looks good to me, except that I think a vforked child (in system() and
> > posix_spawn*()) should use the system calls and not libthr's wrappers.
> > This reduces the probability of weird things happening between vfork and
> > exec, and also avoids an unexpected error when
> > posix_spawnattr_setsigdefault()'s mask contains SIGTHR.
>
> Thank you for the review, I agree with the note about vfork. Updated
> patch is below. Also, I removed the PIC_PROLOGUE from the i386 setjmp,
> it has no use after the plt calls are removed.

I verified the patch. The getumask part of lib/libc/gen/setmode.c was
rejected on stable/10 (probably due to other changes in ^/head.)

Cheers
Michiel
Re: Latest stable (r287104) bash leaves zombies on exit
On Sat, Aug 29, 2015 at 04:02:43PM +0200, Michiel Boland wrote:
> I verified the patch. The getumask part of lib/libc/gen/setmode.c part was
> rejected on stable/10 (probably due to other changes in ^/head.)

Thank you. The setmode bits are from Jilles' r280713. I will merge
this revision when doing the MFC, unless Jilles does it first.

The change is committed to HEAD as r287292, MFC set to 1 week.
I will ask for EN after merge to stable/10.
Re: Latest stable (r287104) bash leaves zombies on exit
On Sat, Aug 29, 2015 at 04:41:30PM +0300, Konstantin Belousov wrote:
> On Sat, Aug 29, 2015 at 03:01:38PM +0200, Jilles Tjoelker wrote:
> > Looks good to me, except that I think a vforked child (in system() and
> > posix_spawn*()) should use the system calls and not libthr's wrappers.
> > This reduces the probability of weird things happening between vfork and
> > exec, and also avoids an unexpected error when
> > posix_spawnattr_setsigdefault()'s mask contains SIGTHR.
>
> Thank you for the review, I agree with the note about vfork. Updated
> patch is below. Also, I removed the PIC_PROLOGUE from the i386 setjmp,
> it has no use after the plt calls are removed.
>
> [snip]
>
> diff --git a/lib/libc/gen/posix_spawn.c b/lib/libc/gen/posix_spawn.c
> index e3124b2..673c760 100644
> --- a/lib/libc/gen/posix_spawn.c
> +++ b/lib/libc/gen/posix_spawn.c
> @@ -118,15 +118,18 @@ process_spawnattr(const posix_spawnattr_t sa)
>  			return (errno);
>  	}
>  
> -	/* Set signal masks/defaults */
> +	/*
> +	 * Set signal masks/defaults.
> +	 * Use unwrapped syscall, libthr is in undefined state after vfork().
> +	 */
>  	if (sa->sa_flags & POSIX_SPAWN_SETSIGMASK) {
> -		_sigprocmask(SIG_SETMASK, &sa->sa_sigmask, NULL);
> +		__libc_sigprocmask(SIG_SETMASK, &sa->sa_sigmask, NULL);
>  	}
>  
>  	if (sa->sa_flags & POSIX_SPAWN_SETSIGDEF) {
>  		for (i = 1; i <= _SIG_MAXSIG; i++) {
>  			if (sigismember(&sa->sa_sigdefault, i))
> -				if (_sigaction(i, &sigact, NULL) != 0)
> +				if (__libc_sigaction(i, &sigact, NULL) != 0)
>  					return (errno);
>  		}
>  	}

Hmm, the comments say direct syscalls are being used, but in fact
libthr's interposer is called. The change to system() does correctly use
__sys_sigprocmask().

--
Jilles Tjoelker
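For readers less familiar with the caller's side of process_spawnattr(), here is a rough userland sketch of how the POSIX_SPAWN_SETSIGMASK and POSIX_SPAWN_SETSIGDEF attributes handled in the quoted hunk are requested. It uses only the public posix_spawn(3) interface and does not touch libc or libthr internals. The helper name spawn_and_wait and the choice of SIGINT and SIGQUIT are assumptions made for the example.

```c
#define _POSIX_C_SOURCE 200809L	/* assumption: plain POSIX interfaces only */
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/*
 * Hypothetical helper: run argv[0] (looked up in PATH) with an empty signal
 * mask and with SIGINT/SIGQUIT reset to their default actions, then wait
 * for it. Returns the child's exit status, or -1 on any failure.
 */
int
spawn_and_wait(char *const argv[])
{
	posix_spawnattr_t attr;
	sigset_t mask, defaults;
	pid_t pid;
	int error, status;

	assert(argv != NULL && argv[0] != NULL);

	if (posix_spawnattr_init(&attr) != 0)
		return (-1);

	sigemptyset(&mask);		/* child starts with nothing blocked */
	sigemptyset(&defaults);
	sigaddset(&defaults, SIGINT);	/* signals reset to SIG_DFL in child */
	sigaddset(&defaults, SIGQUIT);

	error = posix_spawnattr_setsigmask(&attr, &mask);
	if (error == 0)
		error = posix_spawnattr_setsigdefault(&attr, &defaults);
	if (error == 0)
		error = posix_spawnattr_setflags(&attr,
		    POSIX_SPAWN_SETSIGMASK | POSIX_SPAWN_SETSIGDEF);
	if (error == 0)
		error = posix_spawnp(&pid, argv[0], NULL, &attr, argv, environ);
	posix_spawnattr_destroy(&attr);
	if (error != 0)
		return (-1);

	/* Retry if a signal interrupts the wait. */
	while (waitpid(pid, &status, 0) == -1) {
		if (errno != EINTR)
			return (-1);
	}
	return (WIFEXITED(status) ? WEXITSTATUS(status) : -1);
}
```

The sigdefault set passed here is what the loop over sa_sigdefault in the hunk walks, and each member it finds becomes one sigaction call in the child.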
NFS Client changing it's source address? (FreeBSD 10.2)
Hello,

I have a server running FreeBSD 10.2. It has several NFS mounts.
Frequently my NFS mounts hang (v3). After a little investigation it looks
like FreeBSD has chosen a wrong source address for its connections and
all packets are departing from the wrong interface.

Sockstat output:

[root@host004 ~]# sockstat -4 | grep 2049
root     ssh        14689 3  tcp4   10.4.2.4:20499        10.4.2.5:22
?        ?          ?     ?  tcp4   10.13.37.4:672        10.13.37.2:2049
?        ?          ?     ?  tcp4   79.x.x.210:905        10.13.37.2:2049
?        ?          ?     ?  tcp4   79.x.x.210:992        10.13.37.2:2049

tcpdump confirms that the NFS connections are trying to get out via the
79.x.x.x interface.

My fstab for the NFS mounts looks like:

10.13.37.2:/tank/hostingbase    /opt/jails/hostingbase  nfs     nfsv3,ro,noatime,async,noauto   0 0

/opt/jails/hostingbase          /opt/jails/test01       nullfs  ro,noatime,noauto               0 0
10.13.37.2:/tank/hosting/test   /opt/jails/test01/opt   nfs     nfsv3,noatime,async,rw,noauto   0 0
tmpfs                           /opt/jails/test01/shm   tmpfs   rw,size=51200,noauto            0 0

/opt/jails/hostingbase          /opt/jails/test2        nullfs  ro,noatime,noauto               0 0
10.13.37.2:/tank/hosting/test2  /opt/jails/test2/opt    nfs     nfsv3,noatime,async,rw,noauto   0 0
tmpfs                           /opt/jails/test2/shm    tmpfs   rw,size=51200,noauto            0 0

The change of source address looks to be happening after an NFS
connection is re-established. At first everything works, I leave the
server idling (it's a test server) and after that the mounts are hanging.

10.2-RELEASE #0 r28 is the current running version.

Regards,

Frank de Bot
FreeBSD_stable_10 - Build #1659 - Fixed
FreeBSD_stable_10 - Build #1659 - Fixed:

Build information: https://jenkins.FreeBSD.org/job/FreeBSD_stable_10/1659/
Full change log: https://jenkins.FreeBSD.org/job/FreeBSD_stable_10/1659/changes
Full build log: https://jenkins.FreeBSD.org/job/FreeBSD_stable_10/1659/console

Change summaries:

No changes
Re: NFS Client changing it's source address? (FreeBSD 10.2)
Frank de Bot wrote:
> Hello,
>
> I have a server running FreeBSD 10.2. It has several NFS mounts.
> Frequently my NFS mount hang (v3). After a little investigation it looks
> like FreeBSD has chosen a wrong source address for it's connections and
> all packets are departing from the wrong interface.
>
> Sockstat output:
>
> [root@host004 ~]# sockstat -4 | grep 2049
> root     ssh        14689 3  tcp4   10.4.2.4:20499        10.4.2.5:22
> ?        ?          ?     ?  tcp4   10.13.37.4:672        10.13.37.2:2049
> ?        ?          ?     ?  tcp4   79.x.x.210:905        10.13.37.2:2049
> ?        ?          ?     ?  tcp4   79.x.x.210:992        10.13.37.2:2049
>
> tcpdump confirms nfs connection are trying to get out via the 79.x.x.x
> interface
>
> My fstab for the nfs mounts look like:
>
> 10.13.37.2:/tank/hostingbase    /opt/jails/hostingbase  nfs     nfsv3,ro,noatime,async,noauto   0 0
>
> /opt/jails/hostingbase          /opt/jails/test01       nullfs  ro,noatime,noauto               0 0
> 10.13.37.2:/tank/hosting/test   /opt/jails/test01/opt   nfs     nfsv3,noatime,async,rw,noauto   0 0
> tmpfs                           /opt/jails/test01/shm   tmpfs   rw,size=51200,noauto            0 0
>
> /opt/jails/hostingbase          /opt/jails/test2        nullfs  ro,noatime,noauto               0 0
> 10.13.37.2:/tank/hosting/test2  /opt/jails/test2/opt    nfs     nfsv3,noatime,async,rw,noauto   0 0
> tmpfs                           /opt/jails/test2/shm    tmpfs   rw,size=51200,noauto            0 0
>
> The change of source address looks to be happening after a nfs
> connection is re-established. At first everything works, I leave the
> server idling (it's a test server) and after that the mounts are hanging
>

If the client side of the kernel RPC needs a new TCP connection to the
server, it will do a soconnect(). It will be done by whatever process/thread
is doing the NFS I/O syscall at the time that the connection isn't available.

I'd guess that a process/thread that is running in a jail that can't see the
10.13.37 network does this soconnect() and then it is broken.
I am not conversant w.r.t. jails and I don't know of any clever way around this.

A couple of "hackish" workarounds that might work:
- Reconfigure your NFS server so that it never drops idle TCP connections.
  (FreeBSD never drops an NFS TCP connection until umount, but some others
  like Solaris NFS servers drop idle connections.)
- Make your NFS server accessible through the 79.n.n network instead of
  10.13.37.
- Write a daemon that does a stat() of a file on the NFS mount once per
  minute, so the NFS connection is never idle long enough to be disconnected.

rick

> 10.2-RELEASE #0 r28 is the current running version.
>
> Regards,
>
> Frank de Bot
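The third workaround, a small daemon that calls stat() once per minute, could be sketched roughly as follows. The function names keepalive_once and keepalive_loop and the path are placeholders invented for this illustration. Two things are assumed rather than taken from the thread: that the program runs on the host (not inside a jail), so any reconnect is made by a process that can see the 10.13.37 network, and that the stat() actually reaches the server. The NFS client's attribute cache may answer some calls locally, so the polled path and the mount's cache settings may need tuning.

```c
#define _POSIX_C_SOURCE 200809L	/* assumption: plain POSIX interfaces only */
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Poke the mount once: returns 0 if stat() succeeded, -1 otherwise. */
int
keepalive_once(const char *path)
{
	struct stat sb;

	assert(path != NULL);
	if (stat(path, &sb) == -1) {
		perror(path);
		return (-1);
	}
	return (0);
}

/*
 * Call keepalive_once() every `interval` seconds. A negative `rounds`
 * loops forever; otherwise the loop stops after `rounds` attempts.
 * Returns the number of failed attempts.
 */
int
keepalive_loop(const char *path, unsigned int interval, long rounds)
{
	int failures = 0;

	while (rounds != 0) {
		if (keepalive_once(path) == -1)
			failures++;
		if (rounds > 0)
			rounds--;
		if (rounds != 0)
			sleep(interval);	/* no sleep after the last round */
	}
	return (failures);
}
```

A daemon built on this would call something like keepalive_loop("/opt/jails/hostingbase/.keepalive", 60, -1) from main(), where the file name is a placeholder and the 60 seconds is taken from the "once per minute" suggestion.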
Re: NFS Client changing it's source address? (FreeBSD 10.2)
I wrote:
> - Reconfigure your NFS server so that it never drops idle TCP connections.
>   (FreeBSD never drops an NFS TCP connection until umount, but some others
>   like Solaris NFS servers drop idle connections.)

Oops, I forgot that the kernel RPC (I wasn't the author) does drop idle TCP
connection(s) from client(s) after 6 minutes without RPC activity.
I can't remember if the client side times out for NFS?
The only time the idle timeout is disabled in the server is for NFSv4.1 with
a backchannel on the TCP connection.

Disabling it on the server is a one-line source change, but there isn't a
sysctl for it (maybe there should be?).

rick