Re: panic: Assertion lock == sq->sq_lock failed at /usr/src-13/sys/kern/subr_sleepqueue.c:371
On 03/05/2020 15:13, Gary Jennejohn wrote:
> On Sun, 3 May 2020 14:11:09 +0100 Grzegorz Junka wrote:
>> I don't have a partition that I could use for swap. I have two whole disks added to ZFS. Maybe on the boot drive, but that would require repartitioning and I have Windows/FreeBSD there, so not so straightforward.
>
> As the dumpon man page states, by the time a crash dump is needed the file systems are dead. There is no way to dump to a ZFS file system. That's why a raw partition is required. The other option would be netdump. See the dumpon man page.

I will consider a separate partition next time I partition my disk. For now I will have to ignore panics and dumps. I tried netdump and it didn't work: it couldn't ARP the netdumpd server.

--GrzegorzJ
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: lock order reversal and poudriere
On 03/05/2020 15:00, Niclas Zeising wrote:
> On 2020-05-02 20:36, Kurt Jaeger wrote:
>> I don't know, either 8-} bz@ is in Cc:, so he'll probably know what to do.
>>
>>> How do I know if I have got a backtrace? Are those errors:
>>>
>>> pid 43297 (conftest), jid 5, uid 0: exited on signal 11
>>>
>>> related or is it a different issue?
>>
>> I think that's a different issue.
>
> conftest is what configure scripts build and run when they probe the system. Configure works a lot by compiling (and sometimes running) small snippets of code to figure out what's going on. Sometimes those snippets core dump. It's all normal.

Good to know. It's mostly conftest but sometimes others too:

pid 37407 (cc), jid 9, uid 0: exited on signal 6
pid 95358 (conftest), jid 3, uid 0: exited on signal 11
pid 70242 (conftest), jid 9, uid 0: exited on signal 11
pid 27480 (ngc27183), jid 3, uid 0: exited on signal 11

Regards
--GrzegorzJ
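To see at a glance which programs are dying and with which signal, the "exited on signal" lines can be tallied. This is a minimal sketch that assumes the kernel messages have exactly the format quoted in this thread; the function name count_signal_exits is made up for illustration and is not an existing tool.

```shell
# Tally "exited on signal" kernel messages by program name and signal.
# count_signal_exits is a hypothetical helper name, not a FreeBSD tool.
# Reads log lines on stdin; lines in any other format are ignored.
count_signal_exits() {
    sed -n 's/^pid [0-9]* (\([^)]*\)), jid [0-9]*, uid [0-9]*: exited on signal \([0-9]*\).*$/\1 signal \2/p' |
        sort | uniq -c | sort -rn
}
```

It could be fed from the kernel message buffer, for example with `dmesg | count_signal_exits`. Signal 11 is SIGSEGV and signal 6 is SIGABRT, so a long tail of conftest entries with signal 11 matches the configure behaviour described above, while other program names may deserve a closer look.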
Re: lock order reversal and poudriere
On 02/05/2020 10:08, Grzegorz Junka wrote:
> I am compiling some packages with poudriere on 13-current kernel. I noticed some strange messages printed into the terminal and dmesg:
>
> lock order reversal:
>  1st 0xf8010ca78250 zfs (zfs) @ /usr/src-13/sys/kern/vfs_mount.c:1005
>  2nd 0xf8010cd37250 devfs (devfs) @ /usr/src-13/sys/kern/vfs_mount.c:1016
> stack backtrace:
> #0 0x80c2d5f1 at witness_debugger+0x71
> #1 0x80b92f18 at lockmgr_lock_flags+0x188
> #2 0x80cae744 at _vn_lock+0x54
> #3 0x80c90756 at vfs_domount+0xd16
> #4 0x80c8efd1 at vfs_donmount+0x871
> #5 0x80c8e729 at sys_nmount+0x69
> #6 0x81060c40 at amd64_syscall+0x140
> #7 0x810370a0 at fast_syscall_common+0x101
> pid 17216 (conftest), jid 6, uid 0: exited on signal 11
> pid 51159 (conftest), jid 6, uid 0: exited on signal 11
> pid 23833 (conftest), jid 3, uid 0: exited on signal 11
> pid 4916 (conftest), jid 3, uid 0: exited on signal 11
>
> (... then there is a bunch of similar ones, then ...)
>
> pid 14504 (conftest), jid 3, uid 0: exited on signal 11
> pid 27466 (conftest), jid 6, uid 0: exited on signal 11
> pid 43297 (conftest), jid 5, uid 0: exited on signal 11
> lock order reversal:
>  1st 0xfe00bc68c030 filedesc structure (filedesc structure) @ /usr/src-13/sys/kern/sys_generic.c:1557
>  2nd 0xf803baeddbd8 tmpfs (tmpfs) @ /usr/src-13/sys/kern/vfs_vnops.c:1553
> stack backtrace:
> #0 0x80c2d5f1 at witness_debugger+0x71
> #1 0x80b946b5 at lockmgr_xlock+0x55
> #2 0x80cae744 at _vn_lock+0x54
> #3 0x80cad0da at vn_poll+0x3a
> #4 0x80c33e19 at kern_poll+0x419
> #5 0x80c340df at sys_ppoll+0x6f
> #6 0x81060c40 at amd64_syscall+0x140
> #7 0x810370a0 at fast_syscall_common+0x101
> pid 37533 (conftest), jid 5, uid 0: exited on signal 11
> pid 43474 (conftest), jid 5, uid 0: exited on signal 11

I restarted the compilation and am again seeing similar LORs:

lock order reversal:
 1st 0xf80115d32068 zfs (zfs) @ /usr/src-13/sys/kern/vfs_mount.c:1005
 2nd 0xf800243d6808 devfs (devfs) @ /usr/src-13/sys/kern/vfs_mount.c:1016
stack backtrace:
#0 0x80c2d5f1 at witness_debugger+0x71
#1 0x80b92f18 at lockmgr_lock_flags+0x188
#2 0x80cae744 at _vn_lock+0x54
#3 0x80c90756 at vfs_domount+0xd16
#4 0x80c8efd1 at vfs_donmount+0x871
#5 0x80c8e729 at sys_nmount+0x69
#6 0x81060c40 at amd64_syscall+0x140
#7 0x810370a0 at fast_syscall_common+0x101

lock order reversal:
 1st 0xfe00a7aa49b0 filedesc structure (filedesc structure) @ /usr/src-13/sys/kern/sys_generic.c:1557
 2nd 0xf800aa2cdbd8 zfs (zfs) @ /usr/src-13/sys/kern/vfs_vnops.c:1553
stack backtrace:
#0 0x80c2d5f1 at witness_debugger+0x71
#1 0x80b946b5 at lockmgr_xlock+0x55
#2 0x80cae744 at _vn_lock+0x54
#3 0x80cad0da at vn_poll+0x3a
#4 0x80c33e19 at kern_poll+0x419
#5 0x80c339f0 at sys_poll+0x50
#6 0x81060c40 at amd64_syscall+0x140
#7 0x810370a0 at fast_syscall_common+0x101

The page to report still returns 404 :)

-- GrzegorzJ
Re: panic: Assertion lock == sq->sq_lock failed at /usr/src-13/sys/kern/subr_sleepqueue.c:371
On 03/05/2020 08:05, Gary Jennejohn wrote:
> On Sat, 02 May 2020 16:28:46 -0700 Chris wrote:
>>>>> Another thing is that I don't quite understand why the crash couldn't be dumped.
>>>>>
>>>>> root@crayon2:~ # swapinfo
>>>>> Device                1K-blocks Used    Avail Capacity
>>>>> /dev/zvol/tank3/swap   33554432    0 33554432     0%
>>>>>
>>>>> There is no entry in /etc/fstab though, should it be there too?
>>>>
>>>> How about your rc.conf(5)? You need to define a dumpdev within it as:
>>>>
>>>> # Set dumpdev to "AUTO" to enable crash dumps, "NO" to disable
>>>> dumpdev="YES"
>>>>
>>>> Which defaults to the location of: /var/crash
>>>
>>> Yes, of course I have 'dumpdev="AUTO"'. Should it be "YES" instead?
>>
>> Yes, it should of course be AUTO. I was distracted at the time of writing. Sorry. Does /var/crash exist? That _should_ be enough. Assuming /var/crash is writable.
>
> Sorry, but read the man page for rc.conf. This is the entry for dumpdev:
>
>   dumpdev (str) Indicates the device (usually a swap partition) to which a crash dump should be written in the event of a system crash. If the value of this variable is "AUTO", the first suitable swap device listed in /etc/fstab will be used as dump device. Otherwise, the value of this variable is passed as the argument to dumpon(8). To disable crash dumps, set this variable to "NO".
>
> If there are no swap devices in /etc/fstab then "AUTO" will not work. But a partition can be specified. I have dumpdev="/dev/ada0p5" in my rc.conf. /var/crash is the target for crash dumps after the system is re-booted.

/var/crash existed but might not have had the right permissions. I think it was 755 whereas the handbook recommends 700. Shouldn't matter though.

I don't have anything about swap in fstab since I am using Root on ZFS. swapinfo correctly recognizes the swap partition and uses it. This is the typical usage while I am compiling ports:

last pid: 85116;  load averages: 8.95, 8.50, 8.34  up 0+18:06:31  13:02:32
72 processes: 14 running, 57 sleeping, 1 zombie
CPU: 0.0% user, 90.5% nice, 9.5% system, 0.0% interrupt, 0.0% idle
Mem: 993M Active, 594M Inact, 6400K Laundry, 12G Wired, 2225M Free
ARC: 6160M Total, 3093M MFU, 2657M MRU, 214M Anon, 100M Header, 193M Other
     5300M Compressed, 5861M Uncompressed, 1.11:1 Ratio
Swap: 32G Total, 61M Used, 32G Free

The crash happened in similar conditions, so there should be nothing preventing the crash from being dumped to the ZFS swap, unless dumpon isn't smart enough to use ZFS swap.

I don't have a partition that I could use for swap. I have two whole disks added to ZFS. Maybe on the boot drive, but that would require repartitioning and I have Windows/FreeBSD there, so not so straightforward.

--GrzegorzJ
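The rc.conf discussion in this message boils down to naming the dump device explicitly when /etc/fstab lists no swap device, since "AUTO" then finds nothing. A sketch of the relevant entries, assuming a raw swap partition exists; /dev/ada0p5 is only the device quoted in this thread and has to be replaced with a real partition on the machine in question:

```shell
# /etc/rc.conf fragment (sketch, not a tested configuration).
# /dev/ada0p5 is the example device from this thread; substitute your own
# raw swap partition. Per the thread, a swap zvol on ZFS is not expected
# to be usable as a dump device.
dumpdev="/dev/ada0p5"   # passed as the argument to dumpon(8) at boot
dumpdir="/var/crash"    # where savecore(8) stores the dump after reboot
```

After the next boot, `dumpon -l` should print the configured dump device, which is a quick way to check that the setting took effect before a panic happens.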
Re: panic: Assertion lock == sq->sq_lock failed at /usr/src-13/sys/kern/subr_sleepqueue.c:371
On 02/05/2020 21:18, Mark Johnston wrote:
>> OK, I found this handbook https://www.freebsd.org/doc/en/books/developers-handbook/book.html#kerneldebug
>>
>> Obviously something must have been misconfigured that I can't dump the core now. Is there anything I can fetch from the system while I am in db> or should I just forget it and restart?
>
> It would be useful to see the output of "bt", "show lockchain" and "alltrace" if possible. The latter command will produce a lot of output though.

Sorry, had to restart. I tried "netdump -s someIP -g someGateway", which forced netdump into a loop (of requesting ARP for someIP and failing) that I couldn't stop. I only have the photo of the crash itself, which ends at sleepq_add before going to panic. I can transcribe it by hand if it might be of any use.

--GrzegorzJ
Re: panic: Assertion lock == sq->sq_lock failed at /usr/src-13/sys/kern/subr_sleepqueue.c:371
On 02/05/2020 20:43, Chris wrote:
> On Sat, 2 May 2020 20:19:56 +0100 Grzegorz Junka li...@gjunka.com said
>> On 02/05/2020 14:56, Grzegorz Junka wrote:
>>> On 02/05/2020 14:15, Grzegorz Junka wrote:
>>>> cpuid = 3
>>>> time = 1588422616
>>>> KDB: stack backtrace:
>>>> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe00b27e86b0
>>>> vpanic() at vpanic+0x182/frame 0xfe00b27e8700
>>>> panic() at panic+0x43/frame ...
>>>> sleepq_add()
>>>> ...
>>>>
>>>> I see db> in the terminal. I tried "dump" but it says, Cannot dump: no dump device specified.
>>>>
>>>> Is there a guide how to deal with those, i.e. to gather information required to investigate issues?
>>
>> Another thing is that I don't quite understand why the crash couldn't be dumped.
>>
>> root@crayon2:~ # swapinfo
>> Device                1K-blocks Used    Avail Capacity
>> /dev/zvol/tank3/swap   33554432    0 33554432     0%
>>
>> There is no entry in /etc/fstab though, should it be there too?
>
> How about your rc.conf(5)? You need to define a dumpdev within it as:
>
> # Set dumpdev to "AUTO" to enable crash dumps, "NO" to disable
> dumpdev="YES"
>
> Which defaults to the location of: /var/crash

Yes, of course I have 'dumpdev="AUTO"'. Should it be "YES" instead?
Re: panic: Assertion lock == sq->sq_lock failed at /usr/src-13/sys/kern/subr_sleepqueue.c:371
On 02/05/2020 14:56, Grzegorz Junka wrote:
> On 02/05/2020 14:15, Grzegorz Junka wrote:
>> cpuid = 3
>> time = 1588422616
>> KDB: stack backtrace:
>> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe00b27e86b0
>> vpanic() at vpanic+0x182/frame 0xfe00b27e8700
>> panic() at panic+0x43/frame ...
>> sleepq_add()
>> ...
>>
>> I see db> in the terminal. I tried "dump" but it says, Cannot dump: no dump device specified.
>>
>> Is there a guide how to deal with those, i.e. to gather information required to investigate issues?

Another thing is that I don't quite understand why the crash couldn't be dumped.

root@crayon2:~ # swapinfo
Device                1K-blocks Used    Avail Capacity
/dev/zvol/tank3/swap   33554432    0 33554432     0%

There is no entry in /etc/fstab though, should it be there too?

-- GrzegorzJ
Re: panic: Assertion lock == sq->sq_lock failed at /usr/src-13/sys/kern/subr_sleepqueue.c:371
On 02/05/2020 15:40, Conrad Meyer wrote:
> Hi Grzegorz,
>
> If you have another machine connected by network that you can install and start netdumpd on; IPv4 configured on a supported network device before the machine panicked; and a recent CURRENT; you should be able to initiate a kernel dump over the network with 'netdump -s server-ip' in DDB.
>
> In more complicated situations you might also need to specify '-g gateway-ip -c client-ip -i interface', but for servers on the LAN or available via the default gateway route, the former ought to work.

Thanks Conrad. That doesn't seem to work. netdump -s reports "Failed to ARP server" then "failed to locate MAC address". Both systems are in the same local network and the system that crashed did have a network configured prior to the crash. In fact, I was logged in over ssh in one of the terminals. I tried both through a switch and with the machines connected directly. I tried to specify the interface and the client IP. Is there a way to specify the MAC directly?

GrzegorzJ
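Besides typing the netdump command at the db> prompt, dumpon(8) can configure netdump ahead of time, so that a later panic can dump over the network without entering addresses in DDB. This is a sketch only: the addresses and the interface name em0 are placeholders rather than values from this thread, and it is not a fix for the ARP failure described above.

```shell
# Run as root on the machine that may panic, before the panic happens.
# 192.0.2.10 (this client), 192.0.2.1 (the netdumpd server) and em0 are
# placeholders; add "-g <gateway>" if the server is not on the local network.
dumpon -c 192.0.2.10 -s 192.0.2.1 em0

# Show which dump device is currently configured.
dumpon -l
```

Configuring this while the system is still healthy also makes it easier to confirm beforehand that the two machines can reach each other.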
Re: panic: Assertion lock == sq->sq_lock failed at /usr/src-13/sys/kern/subr_sleepqueue.c:371
On 02/05/2020 14:15, Grzegorz Junka wrote:
> cpuid = 3
> time = 1588422616
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe00b27e86b0
> vpanic() at vpanic+0x182/frame 0xfe00b27e8700
> panic() at panic+0x43/frame ...
> sleepq_add()
> ...
>
> I see db> in the terminal. I tried "dump" but it says, Cannot dump: no dump device specified.
>
> Is there a guide how to deal with those, i.e. to gather information required to investigate issues?

OK, I found this handbook https://www.freebsd.org/doc/en/books/developers-handbook/book.html#kerneldebug

Obviously something must have been misconfigured that I can't dump the core now. Is there anything I can fetch from the system while I am in db> or should I just forget it and restart?

GrzegorzJ
panic: Assertion lock == sq->sq_lock failed at /usr/src-13/sys/kern/subr_sleepqueue.c:371
cpuid = 3
time = 1588422616
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe00b27e86b0
vpanic() at vpanic+0x182/frame 0xfe00b27e8700
panic() at panic+0x43/frame ...
sleepq_add()
...

I see db> in the terminal. I tried "dump" but it says, Cannot dump: no dump device specified.

Is there a guide how to deal with those, i.e. to gather information required to investigate issues?
Re: lock order reversal and poudriere
On 02/05/2020 10:54, Kurt Jaeger wrote:
> Hi!
>
>> I am compiling some packages with poudriere on 13-current kernel. I noticed some strange messages printed into the terminal and dmesg:
>>
>> lock order reversal:
>> [...]
>> Are those the debug messages that aren't visible on non-current kernel and should they be reported?
>
> Yes, they should be checked and reported. For more details see:
>
> http://sources.zabbadoz.net/freebsd/lor.html
>
> There's a webpage with a list of all known LORs and a way to report new LORs.

Thanks Kurt. I can't find those two specific LORs in the list on that page. The page also says to report them using a link, which leads to 404 :-), or on this mailing list, which I did. I am not sure what else I should do.

How do I know if I have got a backtrace? Are those errors:

pid 43297 (conftest), jid 5, uid 0: exited on signal 11

related or is it a different issue?

GrzegorzJ
lock order reversal and poudriere
I am compiling some packages with poudriere on 13-current kernel. I noticed some strange messages printed into the terminal and dmesg:

lock order reversal:
 1st 0xf8010ca78250 zfs (zfs) @ /usr/src-13/sys/kern/vfs_mount.c:1005
 2nd 0xf8010cd37250 devfs (devfs) @ /usr/src-13/sys/kern/vfs_mount.c:1016
stack backtrace:
#0 0x80c2d5f1 at witness_debugger+0x71
#1 0x80b92f18 at lockmgr_lock_flags+0x188
#2 0x80cae744 at _vn_lock+0x54
#3 0x80c90756 at vfs_domount+0xd16
#4 0x80c8efd1 at vfs_donmount+0x871
#5 0x80c8e729 at sys_nmount+0x69
#6 0x81060c40 at amd64_syscall+0x140
#7 0x810370a0 at fast_syscall_common+0x101
pid 17216 (conftest), jid 6, uid 0: exited on signal 11
pid 51159 (conftest), jid 6, uid 0: exited on signal 11
pid 23833 (conftest), jid 3, uid 0: exited on signal 11
pid 4916 (conftest), jid 3, uid 0: exited on signal 11

(... then there is a bunch of similar ones, then ...)

pid 14504 (conftest), jid 3, uid 0: exited on signal 11
pid 27466 (conftest), jid 6, uid 0: exited on signal 11
pid 43297 (conftest), jid 5, uid 0: exited on signal 11
lock order reversal:
 1st 0xfe00bc68c030 filedesc structure (filedesc structure) @ /usr/src-13/sys/kern/sys_generic.c:1557
 2nd 0xf803baeddbd8 tmpfs (tmpfs) @ /usr/src-13/sys/kern/vfs_vnops.c:1553
stack backtrace:
#0 0x80c2d5f1 at witness_debugger+0x71
#1 0x80b946b5 at lockmgr_xlock+0x55
#2 0x80cae744 at _vn_lock+0x54
#3 0x80cad0da at vn_poll+0x3a
#4 0x80c33e19 at kern_poll+0x419
#5 0x80c340df at sys_ppoll+0x6f
#6 0x81060c40 at amd64_syscall+0x140
#7 0x810370a0 at fast_syscall_common+0x101
pid 37533 (conftest), jid 5, uid 0: exited on signal 11
pid 43474 (conftest), jid 5, uid 0: exited on signal 11

Poudriere doesn't really report any problems:

# poudriere status
SET  PORTS JAIL BUILD                STATUS         QUEUE BUILT FAIL SKIP IGNORE REMAIN TIME     LOGS
kde5 gui   13   2020-05-01_10h17m52s parallel_build  2040   792    0    0      0   1248 22:48:00 /usr/local/poudriere/data/logs/bulk/13-gui-kde5/2020-05-01_10h17m52s

Are those the debug messages that aren't visible on non-current kernel and should they be reported?

GrzegorzJ