Re: Latest stable (r287104) bash leaves zombies on exit
On 08/27/2015 22:16, Konstantin Belousov wrote:
[...]
> I just verified that the signal handler is correctly wrapped for me, on
> the latest stable/10. Both with the pre-linked libthr.so and with the
> library loaded dynamically at runtime. I used the test program at the
> end of the message, put breakpoint on the sigusr2_handler, and looked at
> the backtrace, which must include thr_sighandler(). It did in my case,
> for binary built with and without -lpthread.
>
> Can you verify the presence of thr_sighandler() in the backtrace for
> this test program, on your system ?

Verified, see below.

Cheers
Michiel

Breakpoint 1, sigusr2_handler (signo=31, si=0x7fffe430, u=0x7fffe0c0) at rtld_sigresolv.c:24
24	wait(NULL);
Current language:  auto; currently minimal
(gdb) bt
#0  sigusr2_handler (signo=31, si=0x7fffe430, u=0x7fffe0c0) at rtld_sigresolv.c:24
#1  0x00080100d947 in handle_signal (actp=<value optimized out>, sig=31, info=0x7fffe430, ucp=0x7fffe0c0) at /usr/src/lib/libthr/thread/thr_sig.c:243
#2  0x00080100d158 in thr_sighandler (sig=<value optimized out>, info=<value optimized out>, _ucp=<value optimized out>) at /usr/src/lib/libthr/thread/thr_sig.c:188
#3  <signal handler called>
#4  thr_kill () at thr_kill.S:3
#5  0x000800965066 in __raise (s=<value optimized out>) at /usr/src/lib/libc/gen/raise.c:51
#6  0x00400c72 in atexit_code () at rtld_sigresolv.c:31
#7  0x00080093d406 in __cxa_finalize (dso=0x0) at /usr/src/lib/libc/stdlib/atexit.c:200
#8  0x0008008de92c in exit (status=0) at /usr/src/lib/libc/stdlib/exit.c:67
#9  0x00400946 in _start (ap=<value optimized out>, cleanup=<value optimized out>) at /usr/src/lib/csu/amd64/crt1.c:78
#10 0x000800621000 in ?? ()
#11 0x in ?? ()
_______________________________________________
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: ia64 stable/10 r286316: hang at Entering /boot/kernel/kernel
From kostik...@gmail.com Thu Aug 27 18:22:37 2015
>
> On Thu, Aug 27, 2015 at 01:12:16PM +0100, Anton Shterenlikht wrote:
> > ia64 stable/10 r286315 boots, but r286316 hangs at
> > Entering /boot/kernel/kernel. Please advise
>
> To state an obvious thing. The commit which you pointed to, changes the
> code which is not executed at that early kernel boot stage. The
> revision cannot cause the consequences you described.

yes, I'm surprised too.

> I think that you either have build-environment issue which randomly
> pops up, or there is some other boot-time issue which is sporadic.
>
> The only suggestion I have, try many boots with kernels which look
> either good or bad, I would be not surprised if statistic would be
> completely different from binary good/bad outcome. Otherwise, I do not
> have an idea.

I doubt it's a random or a sporadic issue. I did a bisection, as
suggested, during which I built world/kernel on 7 revisions, and when I
narrowed it down to 50, a further 4 kernels. All kernels <= 286315 boot,
all kernels >= 286316 do not. I think if it were something random, it
wouldn't be such a clear cut picture.

What about my loader.conf:

# cat /boot/loader.conf
zfs_load=YES
# soft limits
kern.dfldsiz=536748032	# default soft limit for process data
kern.dflssiz=536748032	# default soft limit for stack
# hard limits
kern.maxdsiz=536748032	# hard limit for process data
kern.maxssiz=536748032	# hard limit for stack
kern.maxtsiz=536748032	# hard limit for text size
# processes may not exceed these limits.
#

My memory:

real memory  = 8589934592 (8192 MB)
avail memory = 8387649536 (7999 MB)

I'll try disabling all these settings in loader.conf and see if makes a
difference. But these settings have been there for a few years with no
problems.

Anton
Re: ia64 stable/10 r286316: hang at Entering /boot/kernel/kernel
To add a very small (useless) data point to this, I have an Atom device
that, very occasionally, hangs before the boot stage (at the little
slash, prior to the daemon boot menu offering you the chance to select
another kernel etc).

I haven't worked out the rhyme or reason yet, so it's probably a red
herring, but it's frustrated me when I have to dig out the monitor and
keyboard again. At least it did with 10.1-release, yet to have it happen
with stable.

Cheers,
Joe

On 28/08/2015 8:30 PM, Anton Shterenlikht wrote:
> [...]
Re: Latest stable (r287104) bash leaves zombies on exit
On Fri, Aug 28, 2015 at 08:08:27AM +0200, Michiel Boland wrote:
> On 08/27/2015 22:16, Konstantin Belousov wrote:
> [...]
> > I just verified that the signal handler is correctly wrapped for me,
> > on the latest stable/10. Both with the pre-linked libthr.so and with
> > the library loaded dynamically at runtime. I used the test program at
> > the end of the message, put breakpoint on the sigusr2_handler, and
> > looked at the backtrace, which must include thr_sighandler(). It did
> > in my case, for binary built with and without -lpthread.
> >
> > Can you verify the presence of thr_sighandler() in the backtrace for
> > this test program, on your system ?
>
> Verified, see below.
>
> Cheers
> Michiel
>
> Breakpoint 1, sigusr2_handler (signo=31, si=0x7fffe430, u=0x7fffe0c0) at rtld_sigresolv.c:24
> 24	wait(NULL);
> Current language:  auto; currently minimal
> (gdb) bt
> #0  sigusr2_handler (signo=31, si=0x7fffe430, u=0x7fffe0c0) at rtld_sigresolv.c:24
> #1  0x00080100d947 in handle_signal (actp=<value optimized out>, sig=31, info=0x7fffe430, ucp=0x7fffe0c0) at /usr/src/lib/libthr/thread/thr_sig.c:243
> #2  0x00080100d158 in thr_sighandler (sig=<value optimized out>, info=<value optimized out>, _ucp=<value optimized out>) at /usr/src/lib/libthr/thread/thr_sig.c:188
> #3  <signal handler called>

I suppose this is with the binary built without -lpthread ?

I probably have an idea what is going wrong. Please try the patch below.
Libc does not use the interposed sig{procmask,action,suspend} entries
itself, which resulted in e.g. signal(3) breaking libthr hooks.

> #4  thr_kill () at thr_kill.S:3
> #5  0x000800965066 in __raise (s=<value optimized out>) at /usr/src/lib/libc/gen/raise.c:51
> #6  0x00400c72 in atexit_code () at rtld_sigresolv.c:31
> #7  0x00080093d406 in __cxa_finalize (dso=0x0) at /usr/src/lib/libc/stdlib/atexit.c:200
> #8  0x0008008de92c in exit (status=0) at /usr/src/lib/libc/stdlib/exit.c:67
> #9  0x00400946 in _start (ap=<value optimized out>, cleanup=<value optimized out>) at /usr/src/lib/csu/amd64/crt1.c:78
> #10 0x000800621000 in ?? ()
> #11 0x in ?? ()

diff --git a/lib/libc/amd64/gen/setjmp.S b/lib/libc/amd64/gen/setjmp.S
index c26f52f..826220e 100644
--- a/lib/libc/amd64/gen/setjmp.S
+++ b/lib/libc/amd64/gen/setjmp.S
@@ -55,7 +55,7 @@ ENTRY(setjmp)
 	movq	$0,%rsi			/* (sigset_t*)set */
 	leaq	72(%rcx),%rdx		/* 9,10; (sigset_t*)oset */
 	/* stack is 16-byte aligned */
-	call	PIC_PLT(CNAME(_sigprocmask))
+	call	__libc_sigprocmask
 	popq	%rdi
 	movq	%rdi,%rcx
 	movq	0(%rsp),%rdx		/* retval */
@@ -82,7 +82,7 @@ ENTRY(__longjmp)
 	leaq	72(%rdx),%rsi		/* (sigset_t*)set */
 	movq	$0,%rdx			/* (sigset_t*)oset */
 	subq	$0x8,%rsp		/* make the stack 16-byte aligned */
-	call	PIC_PLT(CNAME(_sigprocmask))
+	call	__libc_sigprocmask
 	addq	$0x8,%rsp
 	popq	%rsi
 	popq	%rdi			/* jmpbuf */
diff --git a/lib/libc/amd64/gen/sigsetjmp.S b/lib/libc/amd64/gen/sigsetjmp.S
index 9a20556..1e8e77c 100644
--- a/lib/libc/amd64/gen/sigsetjmp.S
+++ b/lib/libc/amd64/gen/sigsetjmp.S
@@ -63,7 +63,7 @@ ENTRY(sigsetjmp)
 	movq	$0,%rsi			/* (sigset_t*)set */
 	leaq	72(%rcx),%rdx		/* 9,10 (sigset_t*)oset */
 	/* stack is 16-byte aligned */
-	call	PIC_PLT(CNAME(_sigprocmask))
+	call	__libc_sigprocmask
 	popq	%rdi
 2:	movq	%rdi,%rcx
 	movq	0(%rsp),%rdx		/* retval */
@@ -91,7 +91,7 @@ ENTRY(__siglongjmp)
 	leaq	72(%rdx),%rsi		/* (sigset_t*)set */
 	movq	$0,%rdx			/* (sigset_t*)oset */
 	subq	$0x8,%rsp		/* make the stack 16-byte aligned */
-	call	PIC_PLT(CNAME(_sigprocmask))
+	call	__libc_sigprocmask
 	addq	$0x8,%rsp
 	popq	%rsi
 	popq	%rdi			/* jmpbuf */
diff --git a/lib/libc/compat-43/sigcompat.c b/lib/libc/compat-43/sigcompat.c
index 199846f..a8cef1c 100644
--- a/lib/libc/compat-43/sigcompat.c
+++ b/lib/libc/compat-43/sigcompat.c
@@ -59,7 +59,7 @@ sigvec(signo, sv, osv)
 	} else
 		sap = NULL;
 	osap = osv != NULL ? &osa : NULL;
-	ret = _sigaction(signo, sap, osap);
+	ret = __libc_sigaction(signo, sap, osap);
 	if (ret == 0 && osv != NULL) {
 		osv->sv_handler = osa.sa_handler;
 		osv->sv_flags = osa.sa_flags ^ SV_INTERRUPT;
@@ -77,7 +77,7 @@ sigsetmask(mask)
 
 	sigemptyset(&set);
 	set.__bits[0] = mask;
-	n = _sigprocmask(SIG_SETMASK, &set, &oset);
+	n = __libc_sigprocmask(SIG_SETMASK, &set, &oset);
 	if (n)
 		return (n);
 	return (oset.__bits[0]);
@@ -92,7 +92,7 @@
Re: ia64 stable/10 r286316: hang at Entering /boot/kernel/kernel
On Fri, Aug 28, 2015 at 11:30:18AM +0100, Anton Shterenlikht wrote:
> [...]
> I'll try disabling all these settings in loader.conf and see if makes
> a difference. But these settings have been there for a few years with
> no problems.

In the initial range you mentioned, there were some changes related to
the handling of the userspace stacks. But again, the problem occurs too
early for a userspace-related modification to affect the outcome.

Maybe try the latest stable/10 kernel with the problematic revision
r286316 reversed? This might add more points to Marcel's note about
some static relocation table processed early.
Re: ia64 stable/10 r286316: hang at Entering /boot/kernel/kernel
From me...@bristol.ac.uk Fri Aug 28 11:34:20 2015
>
> What about my loader.conf:
>
> # cat /boot/loader.conf
> zfs_load=YES
> # soft limits
> kern.dfldsiz=536748032	# default soft limit for process data
> kern.dflssiz=536748032	# default soft limit for stack
> # hard limits
> kern.maxdsiz=536748032	# hard limit for process data
> kern.maxssiz=536748032	# hard limit for stack
> kern.maxtsiz=536748032	# hard limit for text size
> # processes may not exceed these limits.
> #
>
> My memory:
>
> real memory  = 8589934592 (8192 MB)
> avail memory = 8387649536 (7999 MB)
>
> I'll try disabling all these settings in loader.conf and see if makes
> a difference. But these settings have been there for a few years with
> no problems.
>
> Anton

yes, this does help:

# uname -a
FreeBSD 10.2-PRERELEASE FreeBSD 10.2-PRERELEASE #12 r286316: Thu Aug 27 11:03:44 BST 2015 r...@mech-as28.men.bris.ac.uk:/usr/obj/usr/src/sys/GENERIC ia64
#

I guess I now need to check if it's zfs or the limits.

Anton
Re: Latest stable (r287104) bash leaves zombies on exit
On 08/28/2015 12:01, Konstantin Belousov wrote:
[...]
> I probably have an idea what is going wrong. Please try the patch
> below. Libc does not use the interposed sig{procmask,action,suspend}
> entries itself, which resulted in e.g. signal(3) breaking libthr hooks.

I'm trying now, and it did appear to get rid of the zombies.

Here's a quick test.

set -e
for a in `seq 1000`
do
	echo -n $a
	xterm -e ssh nonexisting
done
echo

(The idea here is that 'ssh nonexisting' should do some work and then
exit, xterm -e false, etc. don't appear to trigger the bug.)

Prior to the patch, one of the xterms would hang after the counter
reaches a random (reasonably small) number.

After the patch the script runs till completion.

Cheers
Michiel
Re: Latest stable (r287104) bash leaves zombies on exit
On Fri, Aug 28, 2015 at 05:52:42PM +0200, Michiel Boland wrote:
> set -e
> for a in `seq 1000`
> do
> 	echo -n $a
> 	xterm -e ssh nonexisting
> done
> echo
>
> (The idea here is that 'ssh nonexisting' should do some work and then
> exit, xterm -e false, etc. don't appear to trigger the bug.)
>
> Prior to the patch, one of the xterms would hang after the counter
> reaches a random (reasonably small) number.
>
> After the patch the script runs till completion.

Thank you for testing. Funny detail is that your loop does not hang for
me, I see flapping xterms until the completion. How many cpus does your
machine have ?

Below is a slightly improved version of the change, to avoid unnecessary
relocations. Would be good to rebuild the world and confirm that you see
no regression (the patch also affects rtld in some way).

diff --git a/lib/libc/amd64/gen/setjmp.S b/lib/libc/amd64/gen/setjmp.S
index c26f52f..826220e 100644
--- a/lib/libc/amd64/gen/setjmp.S
+++ b/lib/libc/amd64/gen/setjmp.S
@@ -55,7 +55,7 @@ ENTRY(setjmp)
 	movq	$0,%rsi			/* (sigset_t*)set */
 	leaq	72(%rcx),%rdx		/* 9,10; (sigset_t*)oset */
 	/* stack is 16-byte aligned */
-	call	PIC_PLT(CNAME(_sigprocmask))
+	call	__libc_sigprocmask
 	popq	%rdi
 	movq	%rdi,%rcx
 	movq	0(%rsp),%rdx		/* retval */
@@ -82,7 +82,7 @@ ENTRY(__longjmp)
 	leaq	72(%rdx),%rsi		/* (sigset_t*)set */
 	movq	$0,%rdx			/* (sigset_t*)oset */
 	subq	$0x8,%rsp		/* make the stack 16-byte aligned */
-	call	PIC_PLT(CNAME(_sigprocmask))
+	call	__libc_sigprocmask
 	addq	$0x8,%rsp
 	popq	%rsi
 	popq	%rdi			/* jmpbuf */
diff --git a/lib/libc/amd64/gen/sigsetjmp.S b/lib/libc/amd64/gen/sigsetjmp.S
index 9a20556..1e8e77c 100644
--- a/lib/libc/amd64/gen/sigsetjmp.S
+++ b/lib/libc/amd64/gen/sigsetjmp.S
@@ -63,7 +63,7 @@ ENTRY(sigsetjmp)
 	movq	$0,%rsi			/* (sigset_t*)set */
 	leaq	72(%rcx),%rdx		/* 9,10 (sigset_t*)oset */
 	/* stack is 16-byte aligned */
-	call	PIC_PLT(CNAME(_sigprocmask))
+	call	__libc_sigprocmask
 	popq	%rdi
 2:	movq	%rdi,%rcx
 	movq	0(%rsp),%rdx		/* retval */
@@ -91,7 +91,7 @@ ENTRY(__siglongjmp)
 	leaq	72(%rdx),%rsi		/* (sigset_t*)set */
 	movq	$0,%rdx			/* (sigset_t*)oset */
 	subq	$0x8,%rsp		/* make the stack 16-byte aligned */
-	call	PIC_PLT(CNAME(_sigprocmask))
+	call	__libc_sigprocmask
 	addq	$0x8,%rsp
 	popq	%rsi
 	popq	%rdi			/* jmpbuf */
diff --git a/lib/libc/compat-43/sigcompat.c b/lib/libc/compat-43/sigcompat.c
index 199846f..a8cef1c 100644
--- a/lib/libc/compat-43/sigcompat.c
+++ b/lib/libc/compat-43/sigcompat.c
@@ -59,7 +59,7 @@ sigvec(signo, sv, osv)
 	} else
 		sap = NULL;
 	osap = osv != NULL ? &osa : NULL;
-	ret = _sigaction(signo, sap, osap);
+	ret = __libc_sigaction(signo, sap, osap);
 	if (ret == 0 && osv != NULL) {
 		osv->sv_handler = osa.sa_handler;
 		osv->sv_flags = osa.sa_flags ^ SV_INTERRUPT;
@@ -77,7 +77,7 @@ sigsetmask(mask)
 
 	sigemptyset(&set);
 	set.__bits[0] = mask;
-	n = _sigprocmask(SIG_SETMASK, &set, &oset);
+	n = __libc_sigprocmask(SIG_SETMASK, &set, &oset);
 	if (n)
 		return (n);
 	return (oset.__bits[0]);
@@ -92,7 +92,7 @@ sigblock(mask)
 
 	sigemptyset(&set);
 	set.__bits[0] = mask;
-	n = _sigprocmask(SIG_BLOCK, &set, &oset);
+	n = __libc_sigprocmask(SIG_BLOCK, &set, &oset);
 	if (n)
 		return (n);
 	return (oset.__bits[0]);
@@ -105,7 +105,7 @@ sigpause(int mask)
 
 	sigemptyset(&set);
 	set.__bits[0] = mask;
-	return (_sigsuspend(&set));
+	return (__libc_sigsuspend(&set));
 }
 
 int
@@ -113,11 +113,11 @@ xsi_sigpause(int sig)
 {
 	sigset_t set;
 
-	if (_sigprocmask(SIG_BLOCK, NULL, &set) == -1)
+	if (__libc_sigprocmask(SIG_BLOCK, NULL, &set) == -1)
 		return (-1);
 	if (sigdelset(&set, sig) == -1)
 		return (-1);
-	return (_sigsuspend(&set));
+	return (__libc_sigsuspend(&set));
 }
 
 int
@@ -128,7 +128,7 @@ sighold(int sig)
 	sigemptyset(&set);
 	if (sigaddset(&set, sig) == -1)
 		return (-1);
-	return (_sigprocmask(SIG_BLOCK, &set, NULL));
+	return (__libc_sigprocmask(SIG_BLOCK, &set, NULL));
 }
 
 int
@@ -138,7 +138,7 @@ sigignore(int sig)
 
 	bzero(&sa, sizeof(sa));
 	sa.sa_handler = SIG_IGN;
-	return (_sigaction(sig, &sa, NULL));
+	return (__libc_sigaction(sig, &sa, NULL));
 }
 
 int
@@ -149,7 +149,7 @@ sigrelse(int sig)
 	sigemptyset(&set);
 	if (sigaddset(&set, sig) == -1)
Re: Panic [page fault] in _ieee80211_crypto_delkey(): stable/10/amd64 @r286878
On Wed, Aug 19, 2015 at 01:01:24PM -0700, David Wolfskill wrote:
> On Wed, Aug 19, 2015 at 12:25:38PM -0700, Adrian Chadd wrote:
> > ...
> > But we definitely have enough to put into a PR..
> > ...
>
> Bug 202494 - Panic [page fault] in _ieee80211_crypto_delkey()
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=202494

Caught another one (panic) -- this time, after having run
"wlandebug +crypto".

Above-cited PR updated to reflect updated status and debugging info.

Peace,
david
-- 
David H. Wolfskill				da...@catwhisker.org
Those who would murder in the name of God or prophet are blasphemous cowards.

See http://www.catwhisker.org/~david/publickey.gpg for my public key.
Re: ia64 stable/10 r286316: hang at Entering /boot/kernel/kernel
On Aug 28, 2015, at 3:35 AM, Konstantin Belousov <kostik...@gmail.com> wrote:
> Maybe try the latest stable/10 kernel with the problematic revision
> r286316 reversed? This might add more points to Marcel's note about
> some static relocation table processed early.

I built a kernel off of revision 286315 and got this:

eris% objdump -R kernel | grep FPTR64LSB | wc -l
    5377

We only reserve room for 4096 relocations, so we’re over as it is. A
kernel off of revision 286316 gave me this:

eris% objdump -R kernel | grep FPTR64LSB | wc -l
    5377

Same. Odd, but ok. It’s possible that the memory layout changed such
that we now scribble over something that’s important.

To be sure: Anton, can you apply the following patch and tell me if it
makes a difference? It doubles the space we set aside for relocations.

Index: sys/ia64/ia64/locore.S
===================================================================
--- sys/ia64/ia64/locore.S	(revision 286316)
+++ sys/ia64/ia64/locore.S	(working copy)
@@ -357,5 +357,5 @@
 	.align	16
 	.global	fptr_storage
 fptr_storage:
-	.space	4096*16	// XXX
+	.space	8192*16	// XXX
 fptr_storage_end:

-- 
Marcel Moolenaar
mar...@xcllnt.net