Re: NetBSD/amd64 7.0 domU freezes while running pbulk.sh (was Re: Raspberry Pi 2, nfs mount hangs after some time)
On Thu, 26 Nov 2015, Emile `iMil' Heitor wrote:

> If this issue _is_ NFS related, which I doubt now, it is then read-related,
> as the build is done in tmpfs.

Pushing the logic further, I just tried with pkgsrc itself being in tmpfs, and
it froze even faster.

Emile `iMil' Heitor
http://imil.net | http://www.NetBSD.org | http://gcu.info
ASCII ribbon campaign against HTML email & vCards
Re: NetBSD/amd64 7.0 domU freezes while running pbulk.sh (was Re: Raspberry Pi 2, nfs mount hangs after some time)
On Thu, Nov 26, 2015 at 07:13:04PM +0100, Emile `iMil' Heitor wrote:
> On Thu, 26 Nov 2015, Manuel Bouyer wrote:
>
> >what does 'show uvm' report ?
>
> db{0}> show uvm
> Current UVM status:
> pagesize=4096 (0x1000), pagemask=0xfff, pageshift=12
> , ncolors=8 7444115 VM pages: 53990 active, 1807 inactive, 1 wired, 7302474
> fre

OK, so it's not an "out of memory" issue.

--
Manuel Bouyer
     NetBSD: 26 years of experience will always make the difference
--
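As a rough cross-check of that conclusion, the arithmetic can be sketched in Python. The page size and page counts are copied from the quoted `show uvm` output; the helper name `free_memory_summary` is hypothetical and only illustrates the calculation, it is not part of any NetBSD tool.

```python
# Values copied from the `show uvm` output quoted in this thread.
PAGE_SIZE = 4096        # pagesize, in bytes
TOTAL_PAGES = 7444115   # "VM pages"
FREE_PAGES = 7302474    # free pages


def free_memory_summary(total_pages, free_pages, page_size):
    """Return free memory in GiB and as a fraction of all VM pages.

    Hypothetical helper for illustration only.
    """
    free_gib = free_pages * page_size / 2**30
    free_fraction = free_pages / total_pages
    return {"free_gib": free_gib, "free_fraction": free_fraction}


summary = free_memory_summary(TOTAL_PAGES, FREE_PAGES, PAGE_SIZE)
print(f"free: {summary['free_gib']:.1f} GiB "
      f"({summary['free_fraction']:.1%} of VM pages)")
```

Nearly all pages are free, and the full listing posted in this thread also shows every pagedaemon and swap counter at zero, which is consistent with memory exhaustion not being the cause.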
Re: NetBSD/amd64 7.0 domU freezes while running pbulk.sh (was Re: Raspberry Pi 2, nfs mount hangs after some time)
On Thu, 26 Nov 2015, Manuel Bouyer wrote:

> what does 'show uvm' report ?

db{0}> show uvm
Current UVM status:
  pagesize=4096 (0x1000), pagemask=0xfff, pageshift=12
  , ncolors=8 7444115 VM pages: 53990 active, 1807 inactive, 1 wired, 7302474 free
  pages 26626 anon, 25029 file, 4143 exec
  freemin=4096, free-target=5461, wired-max=2481371
  cpu0: faults=2789285, traps=2796277, intrs=1396531, ctxswitch=1585247
    softint=792597, syscalls=1259477
  cpu1: faults=1696656, traps=1698017, intrs=180486, ctxswitch=127378
    softint=36160, syscalls=608644
  cpu2: faults=1207093, traps=1208093, intrs=160266, ctxswitch=65538
    softint=18178, syscalls=412550
  cpu3: faults=1434344, traps=1435516, intrs=174413, ctxswitch=100028
    softint=24126, syscalls=512909
  cpu4: faults=1273978, traps=1275187, intrs=161384, ctxswitch=68847
    softint=19305, syscalls=424913
  cpu5: faults=1622825, traps=1624084, intrs=171817, ctxswitch=105319
    softint=31330, syscalls=510165
  cpu6: faults=1734292, traps=1735749, intrs=170374, ctxswitch=99131
    softint=26841, syscalls=551106
  cpu7: faults=1392652, traps=1393985, intrs=166582, ctxswitch=81469
    softint=20174, syscalls=442880
  cpu8: faults=1492063, traps=1493265, intrs=166791, ctxswitch=88768
    softint=24325, syscalls=492824
  cpu9: faults=1579170, traps=1580406, intrs=167471, ctxswitch=89049
    softint=23423, syscalls=506804
  cpu10: faults=2153399, traps=2154831, intrs=184225, ctxswitch=149924
    softint=40597, syscalls=828691
  cpu11: faults=3136585, traps=3138031, intrs=219926, ctxswitch=251413
    softint=67270, syscalls=1262227
  cpu12: faults=4211510, traps=4213265, intrs=222549, ctxswitch=273560
    softint=78470, syscalls=1584403
  cpu13: faults=3938228, traps=3940765, intrs=252763, ctxswitch=368601
    softint=110598, syscalls=1636441
  cpu14: faults=1720207, traps=1721476, intrs=183332, ctxswitch=138148
    softint=43486, syscalls=759336
  cpu15: faults=1547431, traps=1548457, intrs=177462, ctxswitch=126099
    softint=36803, syscalls=657976
  fault counts:
    noram=0, noanon=0, pgwait=0, pgrele=0
    ok relocks(total)=19975519(19975516), anget(retrys)=1498606(0), amapcopy=1862658
    neighbor anon/obj pg=1672558/779689, gets(lock/unlock)=20195408/19975523
    cases: anon=1035657, anoncow=462949, obj=18439642, prcopy=1755801, przero=11148433
  daemon and swap counts:
    woke=0, revs=0, scans=0, obscans=0, anscans=0
    busy=0, freed=0, reactivate=0, deactivate=0
    pageouts=0, pending=0, nswget=0
    nswapdev=0, swpgavail=0
    swpages=0, swpginuse=0, swpgonly=0, paging=0

Emile `iMil' Heitor
http://imil.net | http://www.NetBSD.org | http://gcu.info
ASCII ribbon campaign against HTML email & vCards
Re: NetBSD/amd64 7.0 domU freezes while running pbulk.sh (was Re: Raspberry Pi 2, nfs mount hangs after some time)
On 26 Nov 2015, at 17:22, Emile `iMil' Heitor wrote:
>
> 24269   1 3  14     0 a0003f446a80 cat          nfsrcv
>  1868   1 3  15     0 a0003dca59c0 getty        nfsrcv
>  2354   1 3   5     0 a0003ed51b00 cron         nfsrcv
>
>  2086   1 3  14   100 a0003eca46a0 qmgr         nfsrcv
>   677   1 3   5     0 a0003dd6fa40 syslogd      nfsrcv
>     0 131 3   3   200 a0003d75d140 nfskqpoll    nfsrcv

Looks like an NFS problem: too many threads in nfsrcv...

--
J. Hannken-Illjes - hann...@eis.cs.tu-bs.de - TU Braunschweig (Germany)
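The observation about threads piling up in nfsrcv can be made systematic by tallying LWPs per wait channel. The following is a minimal sketch in Python, not an existing tool: the sample rows are a subset of the ddb `ps` listing posted in this thread with the columns re-spaced, and the function name `wait_channel_counts` is hypothetical. It assumes each complete row has eight whitespace-separated fields with the wait channel last.

```python
from collections import Counter

# Subset of the ddb `ps` rows from this thread, columns re-spaced.
# Assumed column order: PID LID S CPU FLAGS STRUCT-LWP NAME WAIT
PS_ROWS = """\
24269 1 3 14 0 a0003f446a80 cat nfsrcv
1868 1 3 15 0 a0003dca59c0 getty nfsrcv
2354 1 3 5 0 a0003ed51b00 cron nfsrcv
2086 1 3 14 100 a0003eca46a0 qmgr nfsrcv
677 1 3 5 0 a0003dd6fa40 syslogd nfsrcv
0 131 3 3 200 a0003d75d140 nfskqpoll nfsrcv
2033 1 3 13 0 a0003eca4ac0 pickup nfskqdet
2055 1 3 4 0 a0003edbd700 master tstile
490 1 3 5 80 a0003f447680 sshd select
"""


def wait_channel_counts(ps_text):
    """Count LWPs per wait channel (hypothetical helper).

    Rows that do not have exactly eight fields, for example an LWP
    with no wait channel, are skipped rather than guessed at.
    """
    counts = Counter()
    for line in ps_text.splitlines():
        fields = line.split()
        if len(fields) != 8:
            continue
        counts[fields[-1]] += 1
    return counts


counts = wait_channel_counts(PS_ROWS)
for channel, n in counts.most_common():
    print(f"{channel:10s} {n}")
```

Run over the complete listing, a tally like this shows at a glance which wait channels dominate; unrelated daemons such as getty, cron and syslogd all sleeping in nfsrcv is the pattern pointed out above.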
Re: NetBSD/amd64 7.0 domU freezes while running pbulk.sh (was Re: Raspberry Pi 2, nfs mount hangs after some time)
On Thu, Nov 26, 2015 at 05:22:16PM +0100, Emile `iMil' Heitor wrote:
> On Thu, 26 Nov 2015, Emile `iMil' Heitor wrote:
>
> >Again, as there's no log at all, what would help debugging this behaviour?
>
> FWIW, some ddb output (ddb is triggered by hitting + on domU's console):

What does 'show uvm' report?

--
Manuel Bouyer
     NetBSD: 26 years of experience will always make the difference
--
Re: NetBSD/amd64 7.0 domU freezes while running pbulk.sh (was Re: Raspberry Pi 2, nfs mount hangs after some time)
On Thu, Nov 26, 2015 at 03:55:27PM +0100, Emile `iMil' Heitor wrote:
> [...]
> Again, as there's no log at all, what would help debugging this behaviour?

Can you enter ddb on the console? (On a PV domU this is done with '+++', not
break, which doesn't exist for xl console.)

--
Manuel Bouyer
     NetBSD: 26 years of experience will always make the difference
--
Re: NetBSD/amd64 7.0 domU freezes while running pbulk.sh (was Re: Raspberry Pi 2, nfs mount hangs after some time)
On Thu, 26 Nov 2015, Emile `iMil' Heitor wrote:

> Again, as there's no log at all, what would help debugging this behaviour?

FWIW, some ddb output (ddb is triggered by hitting + on domU's console):

fatal breakpoint trap in supervisor mode
trap type 1 code 0 rip 8012e5ad cs e030 rflags 202 cr2 7f7ff6c1e049 ilevel 8 rsp a0051864cc58
curlwp 0xa00035538840 pid 0.2 lowest kstack 0xa0051864a2c0
Stopped in pid 0.2 (system) at netbsd:breakpoint+0x5: leave
breakpoint() at netbsd:breakpoint+0x5
xencons_tty_input() at netbsd:xencons_tty_input+0xb2
xencons_handler() at netbsd:xencons_handler+0x65
intr_biglock_wrapper() at netbsd:intr_biglock_wrapper+0x19
evtchn_do_event() at netbsd:evtchn_do_event+0x281
do_hypervisor_callback() at netbsd:do_hypervisor_callback+0x143
hypervisor_callback() at netbsd:hypervisor_callback+0x9e
idle_loop() at netbsd:idle_loop+0xe8
ds      3c80
es      c780
fs      c040
gs      7524
rdi     a0003a62b330
rsi     8437d01f
rbp     a0051864cc58
rbx     8437d01f
rdx     2b
rcx     2b
rax     1
r8      0
r9      805fc780 cpu_info_primary
r10     cdd9f51e239cbb87
r11     246
r12     a0003d754c00
r13     8437d020
r14     a0003a62b330
r15     1
rip     8012e5ad breakpoint+0x5
cs      e030
rflags  202
rsp     a0051864cc58
ss      e02b
netbsd:breakpoint+0x5: leave
db{0}> ps
  PID LID S CPU FLAGS STRUCT LWP * NAME         WAIT
16259   1 3  13     0 a0003f435080 awk          netio
24269   1 3  14     0 a0003f446a80 cat          nfsrcv
23301   1 3  14    80 a0003f3f8a60 sh           wait
11835   1 3  14    80 a0003f40c0c0 bmake        wait
 9493   1 3  14    80 a0003f26d4a0 sh           wait
 4831   1 3  14    80 a0003f42d320 bmake        wait
28513   1 3  14    80 a0003f3f09e0 sh           wait
 6232   1 3  15    80 a0003ed15ae0 sh           wait
 1172   1 3  15    80 a0003f083000 bmake        wait
16544   1 3  15    80 a0003f439500 sh           wait
18141   1 3  15    80 a0003f430420 sh           wait
 2349   1 3  15    80 a0003f448280 bash         wait
  490   1 3   5    80 a0003f447680 sshd         select
 2135   1 3   0    80 a0003f445640 sshd         select
 2234   1 3   4    80 a0003f2be580 bash         ttyraw
  381   1 3  12    80 a0003f2be9a0 sshd         select
  382   1 3  13    80 a0003e9cc680 sshd         select
 1868   1 3  15     0 a0003dca59c0 getty        nfsrcv
 2354   1 3   5     0 a0003ed51b00 cron         nfsrcv
 1675   1 3  11    80 a0003edbd2e0 inetd        kqueue
 2105   1 3  12    80 a0003ee0d720 nrpe         select
 2086   1 3  14   100 a0003eca46a0 qmgr         nfsrcv
 2033   1 3  13     0 a0003eca4ac0 pickup       nfskqdet
 2055   1 3   4     0 a0003edbd700 master       tstile
 1640  13 5  11  1000 a0003efff740 python2.7
 1640   9 3  11    80 a0003ee0db40 python2.7    kqueue
 1640   8 3  12    80 a0003ed152a0 python2.7    kqueue
 1640   1 3   9    80 a0003e0f6240 python2.7    select
 1555   1 3  13    80 a0003e0f6660 sshd         select
 1407   1 3  13    80 a0003dd15a20 powerd       kqueue
  892   1 3   2    80 a0003e099640 rpc.lockd    select
  884   1 3  15    80 a0003e099a60 rpc.statd    select
  686   1 3   7    80 a0003dd151e0 rpcbind      select
  677   1 3   5     0 a0003dd6fa40 syslogd      nfsrcv
    1   1 3   8    80 a0003d75b100 init         wait
    0 131 3   3   200 a0003d75d140 nfskqpoll    nfsrcv
    0 129 3   4   200 a0003dc4e160 aiodoned     aiodoned
    0 128 3   7   200 a0003dc4e580 ioflush      syncer
    0 127 3   0   200 a0003dc4e9a0 pgdaemon     pgdaemon
    0 124 3  14   200 a0003d75a920 nfsio        nfsiod
    0 123 3  13   200 a0003d75a500 nfsio        nfsiod
    0 122 3   9   200 a0003d75a0e0 nfsio        nfsiod
    0 121 3  15   200 a0003d75b940 nfsio        nfsiod
    0 120 3   0   200 a0003d75b520 cryptoret    crypto_w
    0 119 3   0   200 a0003d7530c0 unpgc        unpgc
    0 118 3   0   200 a0003d75c960 xen_balloon  xen_balloon
    0 117 3   9   200 a0003d75c540 vmem_rehash  vmem_rehash
    0 116 3   0   200 a0003d75d980 xenbus       r
NetBSD/amd64 7.0 domU freezes while running pbulk.sh (was Re: Raspberry Pi 2, nfs mount hangs after some time)
On Mon, 2 Nov 2015, Emile `iMil' Heitor wrote:

> I've been trying to get rid of those hangs for weeks now and have tried
> every mount flag combination without success: the system freezes randomly,
> leaving the whole OS unresponsive. There's no log and no kernel message. The
> domU actually responds to network solicitations (ping, telnet 22...), but
> once it's frozen it is impossible to run any command; it will just hang.
> The exact same setup has been running successfully since Sept 2014 on
> NetBSD 6.1/amd64.
>
> Any idea how to get some valuable information to help track down this
> awful behaviour?

A bit of follow-up. I've been trying many workarounds during the past weeks,
and right now I'm not convinced it even is an NFS problem. I've set up a tmpfs
bulk build directory, and even that way NetBSD 7.0 freezes randomly after a
couple of minutes while processing `pbulk.sh'.

What I can say:

- the server is a fresh diskless NetBSD 7.0 domU (PXE/NFS)
- there is no information at all about the freeze, not even on the console
- I've only witnessed those freezes when calling `pbulk.sh' (couldn't get
  further anyway)
- `cvs co pkgsrc' does not freeze; I ran it many times without issues
- the domU stays up for days if no operation is made
- I started this domU on various dom0s to rule out a hardware problem, and
  always had the same symptoms
- I tried a custom 7.0_STABLE kernel without success

If this issue _is_ NFS related, which I doubt now, it is then read-related, as
the build is done in tmpfs.

Again, as there's no log at all, what would help debugging this behaviour?

Emile `iMil' Heitor
http://imil.net | http://www.NetBSD.org | http://gcu.info
ASCII ribbon campaign against HTML email & vCards