weird hangs in current (ghc, gnucash)
Hi!

I've just updated my kernel from 10.99.10 to 10.99.10 (~ Oct 11 to Oct
20) to test the rge(4) changes, and started a bulk build, and the
packages using ghc seem to wait for something and make no progress.

In one of my sandboxes there is a hs-data-array-byte build but it's not
doing anything.

The log stops at:

===> Creating toolchain wrappers for hs-data-array-byte-0.1.0.1nb2
===> Configuring for hs-data-array-byte-0.1.0.1nb2
=> Checking for portability problems in extracted files
[1 of 2] Compiling Main ( Setup.hs, Setup.o )

From ps:

pbulk 26131 0.0 0.1 1073923564 140684 ? Il 8:23PM 0:00.23 /usr/pkg/lib/ghc-9.4.7/bin/./ghc-9.4.7 -B/usr/pkg/lib/ghc-9.4.7/lib -package-env - --make Setup -dynamic

(btw, that is a really huge process size?!)

Attaching with gdb shows me:

[Switching to LWP 20090 of process 26131]
0x7195fa607a1a in ___lwp_park60 () from /usr/lib/libc.so.12
(gdb) bt
#0 0x7195fa607a1a in ___lwp_park60 () from /usr/lib/libc.so.12
#1 0x7195fa97dc4d in pthread_cond_timedwait () from /usr/lib/libpthread.so.1
#2 0x7195faae1472 in waitCondition (pCond=pCond@entry=0x7195fa22f010, pMut=pMut@entry=0x7195fa22f038) at rts/posix/OSThreads.c:143
#3 0x7195faa903e1 in waitForWorkerCapability (task=) at rts/Capability.c:707
#4 yieldCapability (pCap=pCap@entry=0x7195f77fff10, task=task@entry=0x7195fa22f000, gcAllowed=gcAllowed@entry=true) at rts/Capability.c:1011
#5 0x7195faab0026 in scheduleYield (task=0x7195fa22f000, pcap=0x7195f77fff08) at rts/Schedule.c:709
#6 schedule (initialCapability=initialCapability@entry=0x7195fab21cc0 , task=task@entry=0x7195fa22f000) at rts/Schedule.c:319
#7 0x7195faab20b9 in scheduleWorker (cap=cap@entry=0x7195fab21cc0 , task=task@entry=0x7195fa22f000) at rts/Schedule.c:2668
#8 0x7195faab78a2 in workerStart (task=0x7195fa22f000) at rts/Task.c:444
#9 0x7195fa97f2df in pthread__create_tramp () from /usr/lib/libpthread.so.1
#10 0x7195fa5f0c60 in ?? () from /usr/lib/libc.so.12
#11 0x0020 in ?? ()
#12 0x in ?? ()
(gdb) thread apply all bt

Thread 6 (LWP 26131 of process 26131 ""):
#0 0x7195fa607a1a in ___lwp_park60 () from /usr/lib/libc.so.12
#1 0x7195fa97dc4d in pthread_cond_timedwait () from /usr/lib/libpthread.so.1
#2 0x7195faae1472 in waitCondition (pCond=pCond@entry=0x7195fa2b2010, pMut=pMut@entry=0x7195fa2b2038) at rts/posix/OSThreads.c:143
#3 0x7195faa903e1 in waitForWorkerCapability (task=) at rts/Capability.c:707
#4 yieldCapability (pCap=pCap@entry=0x7f7fff2287c0, task=task@entry=0x7195fa2b2000, gcAllowed=gcAllowed@entry=true) at rts/Capability.c:1011
#5 0x7195faab0026 in scheduleYield (task=0x7195fa2b2000, pcap=0x7f7fff2287b8) at rts/Schedule.c:709
#6 schedule (initialCapability=initialCapability@entry=0x7195fab21cc0 , task=task@entry=0x7195fa2b2000) at rts/Schedule.c:319
#7 0x7195faab2069 in scheduleWaitThread (tso=0x4200406ce8, ret=ret@entry=0x0, pcap=pcap@entry=0x7f7fff228940) at rts/Schedule.c:2651
#8 0x7195faaa85fb in rts_evalLazyIO (cap=cap@entry=0x7f7fff228940, p=p@entry=0x1071e60, ret=ret@entry=0x0) at rts/RtsAPI.c:566
#9 0x7195faaabb48 in hs_main (argc=, argv=, main_closure=0x1071e60, rts_config=...) at rts/RtsMain.c:72
#10 0x01063124 in main ()

Thread 5 (LWP 7329 of process 26131 "ghc_ticker"):
#0 0x7195fa607a1a in ___lwp_park60 () from /usr/lib/libc.so.12
#1 0x7195fa97dc4d in pthread_cond_timedwait () from /usr/lib/libpthread.so.1
#2 0x7195faae1472 in waitCondition (pCond=pCond@entry=0x7195fab21bc0 , pMut=pMut@entry=0x7195fab21b80 ) at rts/posix/OSThreads.c:143
#3 0x7195faae040e in itimer_thread_func (_handle_tick=0x7195faab9c57 ) at rts/posix/ticker/Pthread.c:140
#4 0x7195fa97f2df in pthread__create_tramp () from /usr/lib/libpthread.so.1
#5 0x7195fa5f0c60 in ?? () from /usr/lib/libc.so.12
#6 0x in ?? ()

Thread 4 (LWP 15032 of process 26131 "ghc_worker"):
#0 0x7195fa5a030a in _sys___kevent100 () from /usr/lib/libc.so.12
#1 0x7195fa97a8a7 in __kevent100 () from /usr/lib/libpthread.so.1
#2 0x7195fba014f2 in base_GHCziEventziKQueue_new12_info () from /usr/pkg/lib/ghc-9.4.7/lib/x86_64-netbsd-ghc-9.4.7/libHSbase-4.17.2.0-ghc9.4.7.so
#3 0x in ?? ()

Thread 3 (LWP 17781 of process 26131 "ghc_worker"):
#0 0x7195fa5a016a in poll () from /usr/lib/libc.so.12
#1 0x7195fa97ae63 in poll () from /usr/lib/libpthread.so.1
#2 0x7195fba0ff55 in ?? () from /usr/pkg/lib/ghc-9.4.7/lib/x86_64-netbsd-ghc-9.4.7/libHSbase-4.17.2.0-ghc9.4.7.so
#3 0x in ?? ()

Thread 2 (LWP 23219 of process 26131 "ghc_worker"):
#0 0x7195fa607a1a in ___lwp_park60 () from /usr/lib/libc.so.12
#1 0x7195fa97dc4d in pthread_cond_timedwait () from /usr/lib/libpthread.so.1
#2 0x7195faae1472 in waitCondition (pCond=pCond@entry=0x7195fa2b2190, pMut=pMut@entry=0x7195fa2b21b8) at rts/posix/OSThreads.c:143
#3
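For readers less used to such traces: most of the threads above are parked in waitCondition(), i.e. blocked on a condition variable until some other thread signals it. The following is only a rough Python sketch of that wait/notify pattern, not GHC RTS code. The names wait_for_flag and demo are made up for illustration, and the timeouts exist only so the demo terminates; nothing here says whether the waits in the trace carry a timeout.

```python
import threading


def wait_for_flag(cond, state, timeout):
    """Park on `cond` until state['ready'] is true or `timeout` seconds pass."""
    with cond:
        # wait_for() re-checks the predicate after every wakeup and returns
        # the predicate's final value (False if the timeout expired first).
        return cond.wait_for(lambda: state["ready"], timeout=timeout)


def demo():
    cond = threading.Condition()
    state = {"ready": False}

    # Case 1: nobody signals, so the waiter stays parked until the timeout.
    timed_out = not wait_for_flag(cond, state, timeout=0.2)

    # Case 2: another thread sets the flag and notifies, so the waiter resumes.
    results = []
    waiter = threading.Thread(
        target=lambda: results.append(wait_for_flag(cond, state, timeout=5.0))
    )
    waiter.start()
    with cond:
        state["ready"] = True
        cond.notify_all()
    waiter.join()
    return timed_out, results[0]


if __name__ == "__main__":
    print(demo())
```

A waiter whose wakeup never arrives looks, from the outside, exactly like case 1 without a timeout: it simply never comes back.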
Re: weird hangs in current (ghc, gnucash)
On Sun, Oct 22, 2023 at 10:37:54PM +0200, Thomas Klausner wrote:
> I've just updated my kernel from 10.99.10 to 10.99.10 (~ Oct 11 to Oct
> 20) to test the rge(4) changes, and started a bulk build, and the
> packages using ghc seem to wait for something and make no progress.
...
> I see one other new weird behaviour on that machine - gnucash doesn't
> finish starting up.

I've backed out ad's changes from the 13th, and both problems are gone.

I'll attach my local change.

Andrew, can you please take a look?

Thanks,
 Thomas

Module Name: src
Committed By: ad
Date: Fri Oct 13 18:48:56 UTC 2023

Modified Files:
 src/sys/kern: kern_condvar.c kern_sleepq.c
 src/sys/rump/librump/rumpkern: locks.c locks_up.c
 src/sys/sys: condvar.h lwp.h

Log Message:
Add cv_fdrestart() (better name suggestions welcome):

Like cv_broadcast(), but make any LWPs that share the same file descriptor
table as the caller return ERESTART when resuming. Used to dislodge LWPs
waiting for I/O that prevent a file descriptor from being closed, without
upsetting access to the file (not descriptor) made from another direction.

To generate a diff of this commit:
cvs rdiff -u -r1.59 -r1.60 src/sys/kern/kern_condvar.c
cvs rdiff -u -r1.83 -r1.84 src/sys/kern/kern_sleepq.c
cvs rdiff -u -r1.86 -r1.87 src/sys/rump/librump/rumpkern/locks.c
cvs rdiff -u -r1.12 -r1.13 src/sys/rump/librump/rumpkern/locks_up.c
cvs rdiff -u -r1.17 -r1.18 src/sys/sys/condvar.h
cvs rdiff -u -r1.227 -r1.228 src/sys/sys/lwp.h

Module Name: src
Committed By: ad
Date: Fri Oct 13 18:50:39 UTC 2023

Modified Files:
 src/sys/kern: uipc_socket.c uipc_syscalls.c
 src/sys/sys: socketvar.h

Log Message:
Use cv_fdrestart() to implement fo_restart.

To generate a diff of this commit:
cvs rdiff -u -r1.305 -r1.306 src/sys/kern/uipc_socket.c
cvs rdiff -u -r1.208 -r1.209 src/sys/kern/uipc_syscalls.c
cvs rdiff -u -r1.165 -r1.166 src/sys/sys/socketvar.h

Module Name: src
Committed By: ad
Date: Fri Oct 13 19:07:09 UTC 2023

Modified Files:
 src/sys/ddb: db_command.c db_interface.h db_xxx.c
 src/sys/kern: sys_pipe.c
 src/sys/sys: pipe.h
 src/usr.bin/fstat: fstat.c

Log Message:
Simplify/streamline pipes a little bit:

- Allocate only one struct pipe not two (no need to be bidirectional here).
- Then use f_flag (FREAD/FWRITE) to figure out what to do in the fileops.
- Never wake the other side or acquire long-term (I/O) lock unless needed.
- Whenever possible, defer wakeups until after locks have been released.
- Do some things locklessly in pipe_ioctl() and pipe_poll().

Some notable results:

- -30% latency on a 486DX2/66 doing 1 byte ping-pong within a single process.
- 2.5x less lock contention during "make cleandir" of src on a 48 CPU machine.
- 1.5x bandwidth with 1kB messages on the same 48 CPU machine (8kB: same b/w).

To generate a diff of this commit:
cvs rdiff -u -r1.186 -r1.187 src/sys/ddb/db_command.c
cvs rdiff -u -r1.41 -r1.42 src/sys/ddb/db_interface.h
cvs rdiff -u -r1.77 -r1.78 src/sys/ddb/db_xxx.c
cvs rdiff -u -r1.164 -r1.165 src/sys/kern/sys_pipe.c
cvs rdiff -u -r1.39 -r1.40 src/sys/sys/pipe.h
cvs rdiff -u -r1.118 -r1.119 src/usr.bin/fstat/fstat.c

ad.backed.out.diff.gz
Description: Binary data
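The pipe commit quotes a "1 byte ping-pong within a single process" figure and says pipe_poll() was changed, so a small userland smoke test of that path may be handy when comparing kernels. What follows is only a sketch under assumptions: it is not the benchmark behind the quoted numbers, the name pipe_ping_pong and the one-second wait are invented here, and it is not known to trigger the hang reported in this thread. It writes one byte into a pipe, checks that the read end is reported readable, and reads the byte back.

```python
import os
import select


def pipe_ping_pong(rounds, wait_seconds=1.0):
    """Write one byte to a pipe and read it back, `rounds` times.

    Returns the number of completed round trips. Raises if the read end
    is not reported readable within `wait_seconds` or the byte differs.
    """
    rfd, wfd = os.pipe()
    completed = 0
    try:
        for i in range(rounds):
            payload = bytes([i % 256])
            if os.write(wfd, payload) != 1:
                raise OSError("short write on pipe")
            # A queued byte must make the read end show up as readable.
            readable, _, _ = select.select([rfd], [], [], wait_seconds)
            if not readable:
                raise TimeoutError("read end never became readable")
            if os.read(rfd, 1) != payload:
                raise ValueError("byte came back changed")
            completed += 1
    finally:
        os.close(rfd)
        os.close(wfd)
    return completed
```

Calling pipe_ping_pong(1000) on a healthy system should return 1000; a TimeoutError would point at a wakeup or poll problem on the pipe rather than at the application.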
Re: weird hangs in current (ghc, gnucash)
On Sun, Oct 22, 2023 at 11:06:25PM +0200, Thomas Klausner wrote:
> On Sun, Oct 22, 2023 at 10:37:54PM +0200, Thomas Klausner wrote:
> > I've just updated my kernel from 10.99.10 to 10.99.10 (~ Oct 11 to Oct
> > 20) to test the rge(4) changes, and started a bulk build, and the
> > packages using ghc seem to wait for something and make no progress.
> ...
> > I see one other new weird behaviour on that machine - gnucash doesn't
> > finish starting up.
>
> I've backed out ad's changes from the 13th, and both problems are gone.
>
> I'll attach my local change.
>
> Andrew, can you please take a look?

Two test cases to see the problem I have:

1. start gnucash, it doesn't finish starting up, the splash screen hangs.

2. cd /usr/pkgsrc/devel/hs-data-array-byte && make
   The 'build' step has two parts, it hangs after the first one.

 Thomas
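Since both test cases fail by hanging rather than by exiting with an error, anyone scripting them across kernels needs a deadline. A minimal sketch, assuming Python is available on the test machine; run_with_deadline is a made-up helper name and any deadline value is a guess to be tuned per machine.

```python
import subprocess
import sys


def run_with_deadline(argv, seconds, cwd=None):
    """Run `argv`; return its exit status, or None if it outlived the deadline."""
    try:
        return subprocess.run(argv, cwd=cwd, timeout=seconds).returncode
    except subprocess.TimeoutExpired:
        # subprocess.run() kills and reaps the direct child before raising.
        # Grandchildren (e.g. compilers started by make) may still be around.
        print(f"no result after {seconds}s: {argv}", file=sys.stderr)
        return None
```

For test case 2 this could be called as run_with_deadline(["make"], 1800, cwd="/usr/pkgsrc/devel/hs-data-array-byte"), where the 1800 seconds is an arbitrary choice; a return value of None then means "hung" as far as the script can tell.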
Re: weird hangs in current (ghc, gnucash)
... and probably

3. PR kern/57660
   https://gnats.netbsd.org/cgi-bin/query-pr-single.pl?number=57660

Markus

On Sun, 22 Oct 2023 at 23:10, Thomas Klausner wrote:
>
> On Sun, Oct 22, 2023 at 11:06:25PM +0200, Thomas Klausner wrote:
> > On Sun, Oct 22, 2023 at 10:37:54PM +0200, Thomas Klausner wrote:
> > > I've just updated my kernel from 10.99.10 to 10.99.10 (~ Oct 11 to Oct
> > > 20) to test the rge(4) changes, and started a bulk build, and the
> > > packages using ghc seem to wait for something and make no progress.
> > ...
> > > I see one other new weird behaviour on that machine - gnucash doesn't
> > > finish starting up.
> >
> > I've backed out ad's changes from the 13th, and both problems are gone.
> >
> > I'll attach my local change.
> >
> > Andrew, can you please take a look?
>
> Two test cases to see the problem I have:
>
> 1. start gnucash, it doesn't finish starting up, the splash screen hangs.
>
> 2. cd /usr/pkgsrc/devel/hs-data-array-byte && make
>    The 'build' step has two parts, it hangs after the first one.
>
> Thomas
Re: weird hangs in current (ghc, gnucash)
This weird hang still takes place on

❯ uname -a
NetBSD ymir.lorien.lan 10.99.10 NetBSD 10.99.10 (GENERIC) #13: Mon Oct 30 19:45:39 GMT 2023 sysbu...@ymir.lorien.lan:/dumps/sysbuild/amd64/obj/home/sysbuild/src/sys/arch/amd64/compile/GENERIC amd64

- again during building a haskell package:

===> Configuring for hs-tagged-0.8.8
[1 of 2] Compiling Main ( Setup.lhs, Setup.o )

Htop gives weird output for the process not-yet-created:

11506 root 63 0 33283 873 S 0.0 0.0 0:00.00 | `- make
20458 root 62 0 34832 613 S 0.0 0.0 0:00.00 | `- /bin/sh -c set -e; test -n "" && echo 1>&2 "ERROR:" && exit 1; exec 3<&0;??? whil
24942 root 63 0 33296 882 S 0.0 0.0 0:00.00 | `- /usr/bin/make _MAKE=/usr/bin/make OPSYS=NetBSD OS_VERSION=10.99.10 OPSYS_VERSION=109910 LOWE
21643 root 58 0 34302 606 S 0.0 0.0 0:00.00 | `- /bin/sh -c set -e;? if test -n "" && /usr/pkg/sbin/pkg_info -K /usr/pkg/pkgdb -qe hs
19149 root 63 0 34367 920 S 0.0 0.0 0:00.00 | `- /usr/bin/make LOWER_OPSYS=netbsd _PKGSRC_BARRIER=yes ALLOW_VULNERABLE_PACKAGES= reinst
23303 root 58 0 33685 603 S 0.0 0.0 0:00.00 | `- /bin/sh -c set -e; ulimit -d `ulimit -H -d`; ulimit -v `ulimit -H -v`; cd /usr/pkgs
27078 root 21 0 256G 37735 S 0.0 0.9 0:00.00 | `- /usr/pkg/lib/ghc-9.6.3/bin/./ghc-9.6.3 -B/usr/pkg/lib/ghc-9.6.3/lib -package-env
22058 root -22 0 0 0 Z 0.0 0.0 0:00.00 | `- gcc <==
---

I guess it is back to the kernel from the 9th of October.

Chavdar

-

On Mon, 23 Oct 2023 at 09:27, Chavdar Ivanov wrote:
>
> I can confirm that after reverting to the kernel from 9th of October
> devel/happy builds OK.
>
> On Mon, 23 Oct 2023 at 05:56, Markus Kilbinger wrote:
>>
>> ... and probably
>>
>> 3. PR kern/57660
>>    https://gnats.netbsd.org/cgi-bin/query-pr-single.pl?number=57660
>>
>> Markus
>>
>> On Sun, 22 Oct 2023 at 23:10, Thomas Klausner wrote:
>> >
>> > On Sun, Oct 22, 2023 at 11:06:25PM +0200, Thomas Klausner wrote:
>> > > On Sun, Oct 22, 2023 at 10:37:54PM +0200, Thomas Klausner wrote:
>> > > > I've just updated my kernel from 10.99.10 to 10.99.10 (~ Oct 11 to Oct
>> > > > 20) to test the rge(4) changes, and started a bulk build, and the
>> > > > packages using ghc seem to wait for something and make no progress.
>> > > ...
>> > > > I see one other new weird behaviour on that machine - gnucash doesn't
>> > > > finish starting up.
>> > >
>> > > I've backed out ad's changes from the 13th, and both problems are gone.
>> > >
>> > > I'll attach my local change.
>> > >
>> > > Andrew, can you please take a look?
>> >
>> > Two test cases to see the problem I have:
>> >
>> > 1. start gnucash, it doesn't finish starting up, the splash screen hangs.
>> >
>> > 2. cd /usr/pkgsrc/devel/hs-data-array-byte && make
>> >    The 'build' step has two parts, it hangs after the first one.
>> >
>> > Thomas
>
>
>
> --
> --
Re: weird hangs in current (ghc, gnucash)
Should we back out ad's changes until he has time to look at them?
 Thomas

On Wed, Nov 01, 2023 at 09:36:01AM +, Chavdar Ivanov wrote:
> This weird hang still takes place on
>
> ❯ uname -a
> NetBSD ymir.lorien.lan 10.99.10 NetBSD 10.99.10 (GENERIC) #13: Mon Oct 30 19:45:39 GMT 2023 sysbu...@ymir.lorien.lan:/dumps/sysbuild/amd64/obj/home/sysbuild/src/sys/arch/amd64/compile/GENERIC amd64
>
> - again during building a haskell package:
>
> ===> Configuring for hs-tagged-0.8.8
> [1 of 2] Compiling Main ( Setup.lhs, Setup.o )
>
>
> Htop gives weird output for the process not-yet-created:
>
> 11506 root 63 0 33283 873 S 0.0 0.0 0:00.00 | `- make
> 20458 root 62 0 34832 613 S 0.0 0.0 0:00.00 | `- /bin/sh -c set -e; test -n "" && echo 1>&2 "ERROR:" && exit 1; exec 3<&0;??? whil
> 24942 root 63 0 33296 882 S 0.0 0.0 0:00.00 | `- /usr/bin/make _MAKE=/usr/bin/make OPSYS=NetBSD OS_VERSION=10.99.10 OPSYS_VERSION=109910 LOWE
> 21643 root 58 0 34302 606 S 0.0 0.0 0:00.00 | `- /bin/sh -c set -e;? if test -n "" && /usr/pkg/sbin/pkg_info -K /usr/pkg/pkgdb -qe hs
> 19149 root 63 0 34367 920 S 0.0 0.0 0:00.00 | `- /usr/bin/make LOWER_OPSYS=netbsd _PKGSRC_BARRIER=yes ALLOW_VULNERABLE_PACKAGES= reinst
> 23303 root 58 0 33685 603 S 0.0 0.0 0:00.00 | `- /bin/sh -c set -e; ulimit -d `ulimit -H -d`; ulimit -v `ulimit -H -v`; cd /usr/pkgs
> 27078 root 21 0 256G 37735 S 0.0 0.9 0:00.00 | `- /usr/pkg/lib/ghc-9.6.3/bin/./ghc-9.6.3 -B/usr/pkg/lib/ghc-9.6.3/lib -package-env
> 22058 root -22 0 0 0 Z 0.0 0.0 0:00.00 | `- gcc <==
> ---
>
>
> I guess it is back to the kernel from the 9th of October.
>
> Chavdar
>
> -
>
> On Mon, 23 Oct 2023 at 09:27, Chavdar Ivanov wrote:
> >
> > I can confirm that after reverting to the kernel from 9th of October
> > devel/happy builds OK.
> >
> > On Mon, 23 Oct 2023 at 05:56, Markus Kilbinger wrote:
> >>
> >> ... and probably
> >>
> >> 3. PR kern/57660
> >>    https://gnats.netbsd.org/cgi-bin/query-pr-single.pl?number=57660
> >>
> >> Markus
> >>
> >> On Sun, 22 Oct 2023 at 23:10, Thomas Klausner wrote:
> >> >
> >> > On Sun, Oct 22, 2023 at 11:06:25PM +0200, Thomas Klausner wrote:
> >> > > On Sun, Oct 22, 2023 at 10:37:54PM +0200, Thomas Klausner wrote:
> >> > > > I've just updated my kernel from 10.99.10 to 10.99.10 (~ Oct 11 to Oct
> >> > > > 20) to test the rge(4) changes, and started a bulk build, and the
> >> > > > packages using ghc seem to wait for something and make no progress.
> >> > > ...
> >> > > > I see one other new weird behaviour on that machine - gnucash doesn't
> >> > > > finish starting up.
> >> > >
> >> > > I've backed out ad's changes from the 13th, and both problems are gone.
> >> > >
> >> > > I'll attach my local change.
> >> > >
> >> > > Andrew, can you please take a look?
> >> >
> >> > Two test cases to see the problem I have:
> >> >
> >> > 1. start gnucash, it doesn't finish starting up, the splash screen hangs.
> >> >
> >> > 2. cd /usr/pkgsrc/devel/hs-data-array-byte && make
> >> >    The 'build' step has two parts, it hangs after the first one.
> >> >
> >> > Thomas
> >
> >
> >
> > --
> >
> >
> > --
>
Re: weird hangs in current (ghc, gnucash)
On Wed, Nov 01, 2023 at 10:49:12AM +0100, Thomas Klausner wrote:
> Should we back out ad's changes until he has time to look at them?

I just did that on behalf of core.
Can you test if this solves your problem?

Martin
Re: weird hangs in current (ghc, gnucash)
On Thu, 2 Nov 2023 at 10:33, Martin Husemann wrote:
>
> On Wed, Nov 01, 2023 at 10:49:12AM +0100, Thomas Klausner wrote:
> > Should we back out ad's changes until he has time to look at them?
>
> I just did that on behalf of core.
> Can you test if this solves your problem?

I rebuilt the system a few hours ago. Now previously failing packages
(e.g. devel/hs-assoc) build. I can restart my present rolling
replace, I guess.

>
> Martin

Chavdar

--
Re: weird hangs in current (ghc, gnucash)
On Thu, Nov 02, 2023 at 11:33:54AM +0100, Martin Husemann wrote:
> On Wed, Nov 01, 2023 at 10:49:12AM +0100, Thomas Klausner wrote:
> > Should we back out ad's changes until he has time to look at them?
>
> I just did that on behalf of core.
> Can you test if this solves your problem?

Thank you, both my test cases work again with a GENERIC.
 Thomas