Hi Thomas,

glad to see that this is observed elsewhere, too.
Maybe the following bugs resonate with your observations:

kern/54207 [serious/high]: -current locks up solidly when pkgsrc building adapta-gtk-theme-3.95.0.11
  Looks like a locking issue in layerfs* (nullfs). (AMD 1800X, 64GB)

kern/54210 [serious/high]: NetBSD-8 processes presumably not exiting
  Not tested with -current, but it may be there too. (Intel(R) Xeon(R) Gold 6134 CPU @ 3.20GHz, ~380GB)

At this time I am not too confident that -current is reliably able to do a pkgsrc build, though I have occasionally seen bulk builds that did finish. Most of the time I run into hard lockups with no information about the system state available (no console, no X, no network, no DDB).

Frank

On 06/28/19 10:46, Thomas Klausner wrote:
Hi!

I've set up a new machine for bulk building. I have tried various things, but in the end it always hangs in tstile.

The first try was what I currently use: tmpfs sandboxes with nullfs-mounted /bin, /lib, ... When it hung, the suspicion was that nullfs was at fault. (The same setup works fine on my current machine.)

The second try was tmpfs with copied-in /bin, /lib, ... and NFS-mounted packages/distfiles/pkgsrc (from localhost). That also hung, so the suspicion was that tmpfs or NFS was broken.

The last try was building in the root file system, i.e. not even in a sandbox (chroot). The only tmpfs is on /dev. distfiles/pkgsrc/packages are on spinning rust; / is on an ld@nvme. With 8 MAKE_JOBS this finished one pkgsrc build (some packages didn't build because of missing distfiles, or because they break randomly, like rust). When I restarted the bulk build with 24 MAKE_JOBS, it hung after ~4 hours.

I have the following systat output:

    2 users Load 8.78 7.19 3.62 Fri Jun 28 04:27:32

    Proc:r d s Csw Traps SysCal Intr Soft Fault PAGING SWAPPING
    24 10 7548 265849 157956 3504 2399 265476 in out in out
    ops
    56.2% Sy 1.2% Us 0.0% Ni 0.0% In 42.5% Id pages
    | | | | | | | | | | |
    ============================> 670 forks
    fkppw
    Anon 294104 % zero 62161268 5572 Interrupts fksvm
    Exec 14116 % wired 16296 1968 TLB shootdown pwait
    File 24587740 18% inact 43756 100 cpu0 timer relck
    Meta 2606694 % bufs 495676 msi1 vec 0 rlkok
    (kB) real swaponly free 9 msix2 vec 0 noram
    Active 24835908 100033996 9 msix2 vec 1 57262 ndcpy
    Namei Sys-cache Proc-cache msix2 vec 2 27906 fltcp
    Calls hits % hits % 3427 ioapic1 pin 12 87178 zfod
    125076 122834 98 80 0 59 ioapic2 pin 0 35775 cow
    msix7 vec 0 8192 fmin
    Disks: seeks xfers bytes %busy 10922 ftarg
    ld0 1969 16130K 34.8 itarg
    dk0 1969 16130K 34.8 flnan
    wd0 pdfre
    dk1 pdscn
    dk2

and this from top:

    load averages: 5.13, 6.53, 3.56; up 1+16:08:05 04:28:13
    59 processes: 2 runnable, 55 sleeping, 2 on CPU
    CPU states: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 99.9% idle
    Memory: 24G Act, 43M Inact, 16M Wired, 14M Exec, 23G File, 95G Free
    Swap: 163G Total, 163G Free

      PID USERNAME PRI NICE  SIZE   RES STATE      TIME   WCPU    CPU COMMAND
    10353 pbulk     77    0  185M  172M select/0   0:13  4.74%  4.54% bjam
    12120 wiz      109    0   83M   59M tstile/1 165:46  1.46%  1.46% systat
        0 root       0    0    0K   93M CPU/31    35:39  0.00%  0.00% [system]
      219 root      85    0   32M 2676K kqueue/4   7:34  0.00%  0.00% syslogd
    13354 wiz       85    0   89M 4948K select/0   0:52  0.00%  0.00% sshd
      380 root      85    0   30M   16M pause/4    0:04  0.00%  0.00% ntpd
    10918 wiz       43    0   25M 2872K CPU/3      0:01  0.00%  0.00% top
        1 root      85    0   20M 1756K wait/29    0:01  0.00%  0.00% init
     5594 pbulk      0    0    0K    0K RUN/0      0:00  0.00%  0.00% bjam
    22861 pbulk      0    0    0K    0K RUN/0      0:00  0.00%  0.00% bjam
      747 root     117    0   20M 2080K tstile/8   0:00  0.00%  0.00% cron
    16473 pbulk    117    0   18M 1564K tstile/2   0:00  0.00%  0.00% cp
     9705 pbulk    117    0   15M 1564K bioloc/5   0:00  0.00%  0.00% cp
     7301 pbulk    117    0   15M 1560K tstile/2   0:00  0.00%  0.00% cp
    22971 pbulk    117    0   19M 1520K tstile/1   0:00  0.00%  0.00% cp
    10013 pbulk    117    0   15M 1520K tstile/1   0:00  0.00%  0.00% cp
     3411 pbulk    117    0   15M 1520K tstile/3   0:00  0.00%  0.00% cp
     5212 pbulk    117    0   15M 1520K tstile/2   0:00  0.00%  0.00% cp
     7072 pbulk    117    0   18M 1516K tstile/2   0:00  0.00%  0.00% cp
     8880 pbulk    117    0   15M 1516K tstile/2   0:00  0.00%  0.00% cp
     5869 pbulk    117    0   15M 1516K tstile/0   0:00  0.00%  0.00% cp
    10159 pbulk    117    0   15M 1516K tstile/1   0:00  0.00%  0.00% cp
    11783 pbulk    117    0   15M 1516K tstile/7   0:00  0.00%  0.00% cp
     7205 pbulk    117    0   15M 1512K tstile/1   0:00  0.00%  0.00% cp
    18676 pbulk    109    0   15M 1516K tstile/3   0:00  0.00%  0.00% cp
     7802 pbulk    109    0   15M 1516K tstile/2   0:00  0.00%  0.00% cp
      622 pbulk    109    0   15M 1512K tstile/2   0:00  0.00%  0.00% cp
    29434 pbulk    109    0 9576K  680K tstile/2   0:00  0.00%  0.00% cp
     2686 root      85    0   86M 6824K select/2   0:00  0.00%  0.00% sshd
    10052 root      85    0   89M 6784K select/2   0:00  0.00%  0.00% sshd
      674 root      85    0   70M 5056K wait/18    0:00  0.00%  0.00% login
    19345 wiz       85    0   86M 4960K select/3   0:00  0.00%  0.00% sshd
      652 postfix   85    0   57M 4848K kqueue/4   0:00  0.00%  0.00% qmgr
     4466 postfix   85    0   59M 4560K kqueue/0   0:00  0.00%  0.00% pickup
      441 root      85    0   70M 3412K select/2   0:00  0.00%  0.00% sshd
      656 root      85    0   57M 3328K kqueue/0   0:00  0.00%  0.00% master
      278 root      85    0   45M 2232K nfsd/31    0:00  0.00%  0.00% nfsd
      639 root      85    0   16M 2128K pause/0    0:00  0.00%  0.00% ksh
    21402 root      85    0   20M 1988K wait/0     0:00  0.00%  0.00% sh
    23371 root      85    0   20M 1972K wait/0     0:00  0.00%  0.00% sh
     3940 wiz       85    0   16M 1948K pause/23   0:00  0.00%  0.00% ksh
     8843 wiz       85    0   16M 1948K pause/5    0:00  0.00%  0.00% ksh
      227 root      85    0   20M 1940K select/1   0:00  0.00%  0.00% rpcbind
      698 root      85    0   20M 1836K ttyraw/3   0:00  0.00%  0.00% getty
      542 root      85    0   20M 1832K ttyraw/2   0:00  0.00%  0.00% getty
      535 root      85    0   20M 1832K ttyraw/0   0:00  0.00%  0.00% getty
      531 root      85    0   25M 1644K kqueue/3   0:00  0.00%  0.00% inetd
      329 root      85    0   24M 1524K select/2   0:00  0.00%  0.00% mountd
      436 root      85    0   20M 1516K kqueue/2   0:00  0.00%  0.00% powerd

On the console I see that it's currently trying to build boost-headers, so it's not even something compile-heavy.

The machine is still in this state and I have a PS/2 keyboard attached, so let me know if you want to check something out.

I'll attach the dmesg from 8.99.42 (it's currently at 8.99.48). The kernel config is:

    include "arch/amd64/conf/GENERIC"
    options FONT_GO_MONO12x23
    no options FONT_BOLD16x32
    no options FONT_BOLD8x16

It's a 16-core AMD Threadripper system with 128GB RAM.

What could be going wrong here? I'm running out of ideas.

Thomas