[Bug 204340] [panic] nfsd, em, msix, fatal trap 9
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204340 --- Comment #31 from Rick Macklem --- And thanks go to you for doing the testing and commits. -- You are receiving this mail because: You are on the CC list for the bug. ___ freebsd-net@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
[Bug 204340] [panic] nfsd, em, msix, fatal trap 9
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204340

Andriy Gapon changed:

           What    |Removed      |Added
         Status    |In Progress  |Closed
     Resolution    |---          |FIXED

--- Comment #30 from Andriy Gapon ---
Rick, thank you again!
[Bug 204340] [panic] nfsd, em, msix, fatal trap 9
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204340

--- Comment #29 from commit-h...@freebsd.org ---
A commit references this bug:

Author: avg
Date: Tue Feb 21 09:29:47 UTC 2017
New revision: 314034
URL: https://svnweb.freebsd.org/changeset/base/314034

Log:
  MFC r313735: add svcpool_close to handle killed nfsd threads

  PR: 204340
  Reported by: Panzura
  Reviewed by: rmacklem
  Approved by: rmacklem

Changes:
_U  stable/10/
  stable/10/sys/fs/nfsserver/nfs_nfsdkrpc.c
  stable/10/sys/rpc/svc.c
  stable/10/sys/rpc/svc.h
[Bug 204340] [panic] nfsd, em, msix, fatal trap 9
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204340

--- Comment #28 from commit-h...@freebsd.org ---
A commit references this bug:

Author: avg
Date: Tue Feb 21 09:29:11 UTC 2017
New revision: 314033
URL: https://svnweb.freebsd.org/changeset/base/314033

Log:
  MFC r313735: add svcpool_close to handle killed nfsd threads

  PR: 204340
  Reported by: Panzura
  Approved by: rmacklem
  Obtained from: rmacklem

Changes:
_U  stable/11/
  stable/11/sys/fs/nfsserver/nfs_nfsdkrpc.c
  stable/11/sys/rpc/svc.c
  stable/11/sys/rpc/svc.h
[Bug 204340] [panic] nfsd, em, msix, fatal trap 9
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204340

--- Comment #27 from commit-h...@freebsd.org ---
A commit references this bug:

Author: avg
Date: Tue Feb 14 17:49:08 UTC 2017
New revision: 313735
URL: https://svnweb.freebsd.org/changeset/base/313735

Log:
  add svcpool_close to handle killed nfsd threads

  This patch adds a new function to the server krpc called
  svcpool_close(). It is similar to svcpool_destroy(), but does not free
  the data structures, so that the pool can be used again.

  This function is then used instead of svcpool_destroy(),
  svcpool_create() when the nfsd threads are killed.

  PR: 204340
  Reported by: Panzura
  Approved by: rmacklem
  Obtained from: rmacklem
  MFC after: 1 week

Changes:
  head/sys/fs/nfsserver/nfs_nfsdkrpc.c
  head/sys/rpc/svc.c
  head/sys/rpc/svc.h
[Bug 204340] [panic] nfsd, em, msix, fatal trap 9
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204340

--- Comment #26 from Andriy Gapon ---
(In reply to Rick Macklem from comment #25)

Rick, thank you very much!
We tested this patch and it makes nfsd more robust with respect to SIGKILL.
We haven't seen any regressions.
[Bug 204340] [panic] nfsd, em, msix, fatal trap 9
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204340

Rick Macklem changed:

           What    |Removed      |Added
 Attachment #179512|0            |1
        is obsolete|             |

--- Comment #25 from Rick Macklem ---
Created attachment 179661
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=179661&action=edit
add svcpool_close (cleaned up version)

This patch implements the same logic as 179512, but is cleaned up by factoring out the code common to svcpool_destroy() and svcpool_close() and placing it in svcpool_cleanup().

Semantically equivalent to 179512 for testing.
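The factoring described above can be sketched in userland C. This is only an illustration under stated assumptions: struct pool and the pool_*() functions are hypothetical stand-ins, not the real krpc SVCPOOL code in sys/rpc/svc.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a krpc service pool (not the real SVCPOOL). */
struct pool {
	int  nxprts;   /* number of registered transports */
	bool closed;   /* set once the pool has been shut down */
};

static struct pool *
pool_create(void)
{
	struct pool *p = calloc(1, sizeof(*p));

	assert(p != NULL);
	return (p);
}

/* Code shared by close and destroy: unregister every transport. */
static void
pool_cleanup(struct pool *p)
{
	p->nxprts = 0;
	p->closed = true;
}

/* Models svcpool_close(): tear down state but keep the structure. */
static void
pool_close(struct pool *p)
{
	pool_cleanup(p);
}

/* Reopen a closed pool; possible only because close did not free it. */
static void
pool_reuse(struct pool *p)
{
	p->closed = false;
}

/* Models svcpool_destroy(): same cleanup, then free the structure. */
static void
pool_destroy(struct pool *p)
{
	pool_cleanup(p);
	free(p);
}
```

In this model a caller that only wants to restart the service threads would call pool_close() followed by pool_reuse(), so no pointer held elsewhere ever refers to freed memory; pool_destroy() is left for final teardown.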
[Bug 204340] [panic] nfsd, em, msix, fatal trap 9
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204340

Rick Macklem changed:

           What    |Removed      |Added
         Status    |Closed       |In Progress
     Resolution    |FIXED        |---

--- Comment #24 from Rick Macklem ---
Please test the patch I just attached. (4th one)
I think it might make the code less fragile to nfsd threads being signalled.
I have not been able to create a crash with the patch during limited testing.

Since avg@'s crash occurred with the other patches, I have reopened the PR.
[Bug 204340] [panic] nfsd, em, msix, fatal trap 9
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204340

--- Comment #23 from Rick Macklem ---
Created attachment 179512
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=179512&action=edit
add svcpool_close so that svcpool_destroy doesn't get called when nfsd threads are killed

This patch adds a new function to the server krpc called svcpool_close(). It is similar to svcpool_destroy(), but does not free the data structures, so that the pool can be used again.

This function is then used instead of svcpool_destroy(), svcpool_create() when the nfsd threads are killed.

These crashes are caused because the data structures were free'd by svcpool_destroy() when the nfsd threads were killed off (or signalled somehow). By avoiding the svcpool_destroy() call, the crashes should be avoided.
[Bug 204340] [panic] nfsd, em, msix, fatal trap 9
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204340

--- Comment #22 from Andriy Gapon ---
(In reply to Rick Macklem from comment #20)

Rick, thank you very much for the explanation!
I knew that nfsd processes were special as they 'lend their stacks to kernel' or something like that. But I failed to realise that this put restrictions on the signals as well.

I should also explain that kill -9 was not used to shut down nfsd or as a replacement for the normal nfsd management. It was used just to demonstrate the problem. I think that originally the problem happened when gdb was used on an nfsd process.

I understand that the nfsd processes are special. But the situation seems to be a bit fragile. The current design is old and proven. But perhaps we could switch to using kernel processes, or maybe we could mark the nfsd processes with a special flag so as to prevent them from being killed with SIGKILL or stopped with SIGSTOP (i.e. prevent normal signal delivery for all signals).

Lastly, just to clarify, should we avoid using debuggers / SIGSTOP with nfsd?
[Bug 204340] [panic] nfsd, em, msix, fatal trap 9
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204340

--- Comment #21 from Rick Macklem ---
Oh, and if you get a crash when shutting down the nfsd threads via "kill -USR1 ", then let us know, since something is still broken.

(If you are curious, the "(master)" nfsd is the one that gets new TCP connections and it must stop doing that before the kernel threads are terminated. Otherwise you can easily get a socket upcall after the data structures have been free'd.)

rick
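The ordering described above (stop accepting new connections first, then terminate the workers) can be sketched in userland C. This is a hedged illustration and not nfsd's source: struct master_state, install_usr1_handler(), master_shutdown() and master_poll() are hypothetical names invented for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <string.h>

/* Set by the handler; the handler does nothing else. */
static volatile sig_atomic_t shutdown_requested;

static void
on_usr1(int sig)
{
	(void)sig;
	shutdown_requested = 1;
}

/* Install the SIGUSR1 handler; returns 0 on success, -1 on error. */
static int
install_usr1_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_usr1;
	sigemptyset(&sa.sa_mask);
	return (sigaction(SIGUSR1, &sa, NULL));
}

/* Hypothetical master state: is it accepting, are workers running? */
struct master_state {
	bool accepting;
	bool workers_running;
};

/* Orderly shutdown: stop taking connections before stopping workers. */
static void
master_shutdown(struct master_state *m)
{
	m->accepting = false;        /* no new sockets, so no new upcalls */
	m->workers_running = false;  /* only now terminate the workers */
}

/* Called from the master's main loop between accepts. */
static void
master_poll(struct master_state *m)
{
	if (shutdown_requested)
		master_shutdown(m);
}
```

SIGKILL cannot be caught, so a process killed that way never reaches a step like master_shutdown(); that is the sense in which only the SIGUSR1 path gives the master a chance to do things in order.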
[Bug 204340] [panic] nfsd, em, msix, fatal trap 9
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204340

--- Comment #20 from Rick Macklem ---
Well, you should never kill the nfsd with SIGKILL. The way that is intended to be "safe" is to send SIGUSR1 to the "(master)" nfsd. It will shut things down and kill off the other threads.

I'm not sure that I know of any way to make SIGKILL safe.

I'm pretty sure "man nfsd" says this, but maybe it needs more emphasis? I also believe that /etc/rc.d/nfsd does this correctly.

rick
[Bug 204340] [panic] nfsd, em, msix, fatal trap 9
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204340

--- Comment #19 from Andriy Gapon ---
(In reply to Rick Macklem from comment #18)
> Andriy Gapon, did your crash occur when the machine was being shut down (or the nfsd threads were being killed off)?

Yes, it did.
I am told that this was easy to reproduce by SIGKILL to nfsd.
[Bug 204340] [panic] nfsd, em, msix, fatal trap 9
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204340

--- Comment #18 from Rick Macklem ---
Andriy Gapon, did your crash occur when the machine was being shut down (or the nfsd threads were being killed off)?

If not, it is not caused by what these patches were intended for. (They are in 10.3.)

Your crash basically indicates that either:
1 - The sg_group was free'd when a socket upcall was still in progress. Since the sg_group structures aren't free'd until the nfsd threads are killed (shutdown or ??), I don't think this can happen during normal operation.
OR
2 - The xprt structure that referenced the sg_group was free'd prematurely and the sg pointer was bogus.

If it was #2, I think I can come up with a simple patch to avoid this. (Basically acquire a reference count on the xprt structure during the socket upcall.)

rick
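The idea behind case 2, holding a reference on the xprt for the duration of the socket upcall, might look like the following userland sketch. struct xprt and the xprt_*()/upcall_*() names are hypothetical stand-ins for SVCXPRT and its macros, and the sketch deliberately ignores the locking a real kernel would need to make taking the reference itself safe.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for SVCXPRT. */
struct xprt {
	int  refs;       /* reference count */
	bool destroyed;  /* true once the last reference is dropped */
};

static void
xprt_acquire(struct xprt *x)
{
	assert(!x->destroyed);
	x->refs++;
}

/* Dropping the last reference "destroys" the transport. */
static void
xprt_release(struct xprt *x)
{
	assert(x->refs > 0);
	if (--x->refs == 0)
		x->destroyed = true;
}

/* The upcall pins the transport while it runs... */
static void
upcall_begin(struct xprt *x)
{
	xprt_acquire(x);
}

/* ...and unpins it when done, possibly triggering the destroy. */
static void
upcall_end(struct xprt *x)
{
	xprt_release(x);
}
```

With this shape, an owner that drops its reference while an upcall is in flight no longer destroys the structure under the upcall; the destroy is deferred until upcall_end().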
[Bug 204340] [panic] nfsd, em, msix, fatal trap 9
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204340

Andriy Gapon changed:

           What    |Removed      |Added
             CC    |             |a...@freebsd.org

--- Comment #17 from Andriy Gapon ---
Rick,

we got what looks like a very similar crash in FreeBSD 10.3:

db_trace_self_wrapper+0x2a
kdb_backtrace+0x37
vpanic+0xf7
panic+0x67
trap_fatal+0x264
trap_pfault+0x216
trap+0x32b
calltrap+0x8
__mtx_lock_sleep+0xa2
xprt_active+0xe7
svc_vc_soupcall+0x25
sowakeup+0x69
tcp_do_segment+0x319e
tcp_input+0x701
ip_input+0x14c
netisr_dispatch_src+0x228
ether_demux+0x1a5
ether_nh_input+0x1fc
netisr_dispatch_src+0x228
tcp_lro_flush+0x2b
ixgbe_rxeof+0x30d
ixgbe_msix_que+0x88
intr_event_execute_handlers+0x102
ithread_loop+0x9a
fork_exit+0x11f
fork_trampoline+0xe

It's this lock:

void
xprt_active(SVCXPRT *xprt)
{
        SVCGROUP *grp = xprt->xp_group;

        mtx_lock(&grp->sg_lock);

Do you have any suggestions?
Thank you!
[Bug 204340] [panic] nfsd, em, msix, fatal trap 9
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204340

Rick Macklem changed:

           What    |Removed      |Added
     Resolution    |---          |FIXED
         Status    |In Progress  |Closed

--- Comment #16 from Rick Macklem ---
The patch that fixes this has been committed to head as r291150 and stable/10 as r291869.
[Bug 204340] [panic] nfsd, em, msix, fatal trap 9
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204340

Rick Macklem changed:

           What    |Removed       |Added
          Flags    |mfc-stable10? |mfc-stable10+
[Bug 204340] [panic] nfsd, em, msix, fatal trap 9
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204340

--- Comment #15 from commit-h...@freebsd.org ---
A commit references this bug:

Author: rmacklem
Date: Sat Nov 21 23:55:46 UTC 2015
New revision: 291150
URL: https://svnweb.freebsd.org/changeset/base/291150

Log:
  When the nfsd threads are terminated, the NFSv4 server state
  (opens, locks, etc) is retained, which I believe is correct behaviour.
  However, for NFSv4.1, the server also retained a reference to the xprt
  (RPC transport socket structure) for the backchannel. This caused
  svcpool_destroy() to not call SVC_DESTROY() for the xprt and allowed
  a socket upcall to occur after the mutexes in the svcpool were destroyed,
  causing a crash.
  This patch fixes the code so that the backchannel xprt structure is
  dereferenced just before svcpool_destroy() is called, so the code
  does do an SVC_DESTROY() on the xprt, which shuts down the socket upcall.

  Tested by: g_amana...@yahoo.com
  PR: 204340
  MFC after: 2 weeks

Changes:
  head/sys/fs/nfs/nfs_var.h
  head/sys/fs/nfsserver/nfs_nfsdkrpc.c
  head/sys/fs/nfsserver/nfs_nfsdstate.c
[Bug 204340] [panic] nfsd, em, msix, fatal trap 9
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204340

--- Comment #14 from Rick Macklem ---
Also, although it was good to get this crash resolved, I would recommend trying to avoid doing nfsd thread restarts. (If by any chance you were doing this to avoid access problems during export/mount updates, the "-S" option on mount suspends/resumes the nfsd threads, which should be preferable to killing them off and restarting them.)

rick
[Bug 204340] [panic] nfsd, em, msix, fatal trap 9
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204340

Rick Macklem changed:

           What    |Removed      |Added
          Flags    |mfc-stable9? |mfc-stable9-

--- Comment #13 from Rick Macklem ---
A variant of the first patch has already been committed to head by mav@. He also noted to me that the second patch isn't needed. The "else" case doesn't have any other thread manipulating the list, so the thread mutex isn't needed.

Since you are using NFSv4.1, I'm pretty sure that the third patch is the one that fixes the problem. I will commit this to head soon and MFC it before closing this PR. (Oh, and the NFSv4.1 server stuff isn't in stable/9, so the fix only applies to 10 and head.)

Thanks for doing the testing, so this could get resolved.

rick
[Bug 204340] [panic] nfsd, em, msix, fatal trap 9
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204340

--- Comment #12 from g_amana...@yahoo.com ---
The three last patches resolve the bug.

@rick thank you.

@koobs MFC?
[Bug 204340] [panic] nfsd, em, msix, fatal trap 9
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204340

--- Comment #11 from g_amana...@yahoo.com ---
I am currently testing with the three last patches applied as I do have NFSv4.1 clients running on Linux. Give me a couple of days and I will provide feedback.

Thank you for the insight.
[Bug 204340] [panic] nfsd, em, msix, fatal trap 9
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204340

Mark Linimon changed:

           What    |Removed      |Added
       Keywords    |             |patch
[Bug 204340] [panic] nfsd, em, msix, fatal trap 9
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204340

--- Comment #10 from Rick Macklem ---
I have just added 2 more patches that might be relevant to the crashes.

When the nfsd threads are terminated, this is what is supposed to happen:
- All nfsd threads running in svc_run_internal() return to svc_run().
- svc_run() waits for all these threads to return.
- After svc_run() returns, the nfsd calls svcpool_destroy().
- svcpool_destroy() unregisters all the xprts (which represent the TCP sockets).
- At this point, the reference count should be 1 for all xprts.
  --> Then svcpool_destroy() calls SVC_RELEASE(xprt) for all of them, which drops the reference count to 0 and calls SVC_DESTROY().
  --> This actually calls svc_vc_destroy(), which shuts down the socket upcall and after that, destroys the mutexes.

My best guess w.r.t. the crashes is that the reference count gets messed up on an xprt, so it doesn't get SVC_DESTROY()'d. Then a socket upcall calls xprt_active() after the mutex has been destroyed and BOOM.

The two patches should be applied along with the first one.

The second patch fixes the one other place that I can spot where the server side krpc code isn't quite SMP safe. Although unlikely, it is conceivable that this could cause the crashes.

The third patch makes sure that the backchannel xprt is dereferenced before the call to svcpool_destroy(). This one seems a more likely culprit, but only if you have clients doing NFSv4.1 mounts against the server.

If you could try the second patch (and the third if you have NFSv4.1 mounts), that would be appreciated.

One final comment: I am assuming that you are terminating the nfsd threads by sending a SIGUSR1 to the nfsd master. This is the only way the nfsd threads should be terminated. (If you are using /etc/rc.d/nfsd, it should be doing that, but you might try using "kill -USR1 " directly, just in case the shell script is busted.)

This pretty well exhausts what I can see that might cause the crashes and I can't reproduce a crash here, so hopefully you can make some progress from here.

Good luck with it, rick
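The teardown sequence and the suspected failure above can be modelled in a few lines of userland C. This is a sketch under stated assumptions, not the krpc code: struct xprt, struct pool, xprt_release(), pool_destroy() and upcall_can_crash() are hypothetical names that only mimic the roles of SVC_RELEASE(), SVC_DESTROY() and svcpool_destroy().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a transport (one per TCP socket). */
struct xprt {
	int  refs;          /* reference count */
	bool upcall_armed;  /* socket upcall still installed */
};

/* Hypothetical stand-in for the service pool. */
struct pool {
	bool mutexes_valid; /* false once the pool mutexes are destroyed */
};

/* Models SVC_RELEASE(): the last reference triggers SVC_DESTROY(). */
static void
xprt_release(struct xprt *x)
{
	assert(x->refs > 0);
	if (--x->refs == 0)
		x->upcall_armed = false; /* SVC_DESTROY() removes the upcall */
}

/* Models svcpool_destroy(): release each xprt, then destroy mutexes. */
static void
pool_destroy(struct pool *p, struct xprt **xprts, int n)
{
	for (int i = 0; i < n; i++)
		xprt_release(xprts[i]);
	p->mutexes_valid = false;
}

/* An upcall is dangerous if it can still fire after the mutexes go. */
static bool
upcall_can_crash(const struct pool *p, const struct xprt *x)
{
	return (x->upcall_armed && !p->mutexes_valid);
}
```

In this model an xprt that enters pool_destroy() with a count of 2 (for example, an extra backchannel reference) keeps its upcall armed after the mutexes are gone, while releasing that extra reference first brings the count back to 1 and the destroy path disarms the upcall as intended.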
[Bug 204340] [panic] nfsd, em, msix, fatal trap 9
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204340

--- Comment #9 from Rick Macklem ---
Created attachment 163300
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=163300&action=edit
patch that makes NFSv4.1 server release the backchannel xprt

This patch fixes the NFSv4.1 server so that it SVC_RELEASE()s the backchannel xprt when the nfsd threads are terminated. This patch only affects NFSv4.1, so it doesn't matter unless you are running NFSv4.1 client mounts.
[Bug 204340] [panic] nfsd, em, msix, fatal trap 9
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204340

--- Comment #8 from Rick Macklem ---
Created attachment 163299
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=163299&action=edit
patch that locks mutex when request queue is updated

This patch fixes the only other issue I can spot in the kernel server RPC. There was a case where the request queue was being updated, but the st_lock mutex was not held. This bug could have conceivably resulted in a corrupted request queue.
[Bug 204340] [panic] nfsd, em, msix, fatal trap 9
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204340

--- Comment #7 from g_amana...@yahoo.com ---
This is a fresh dump with the second patch applied; it seems the same. The problem seems to occur in xprt_active().

Unread portion of the kernel message buffer:
stack pointer           = 0x28:0xfe01ee7ee430
frame pointer           = 0x28:0xfe01ee7ee4b0
code segment            = base 0x0, limit 0xf, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 12 (irq266: em1:rx0)
trap number             = 9
panic: general protection fault
cpuid = 0
KDB: stack backtrace:
#0 0x80984e30 at kdb_backtrace+0x60
#1 0x809489e6 at vpanic+0x126
#2 0x809488b3 at panic+0x43
#3 0x80d4ab6b at trap_fatal+0x36b
#4 0x80d4a7ec at trap+0x75c
#5 0x80d30882 at calltrap+0x8
#6 0x80b4a725 at xprt_active+0x45
#7 0x80b4e185 at svc_vc_soupcall+0x35
#8 0x809bcc52 at sowakeup+0x82
#9 0x80aea942 at tcp_do_segment+0x2b22
#10 0x80ae7720 at tcp_input+0x12b0
#11 0x80a77f57 at ip_input+0x97
#12 0x80a177d2 at netisr_dispatch_src+0x62
#13 0x80a0eb76 at ether_demux+0x126
#14 0x80a0f81e at ether_nh_input+0x35e
#15 0x80a177d2 at netisr_dispatch_src+0x62
#16 0x804e121b at em_rxeof+0x2eb
#17 0x804e1663 at em_msix_rx+0x33
Uptime: 10h5m21s
Dumping 3079 out of 8134 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

Reading symbols from /boot/kernel/zfs.ko.symbols...done.
Loaded symbols for /boot/kernel/zfs.ko.symbols
Reading symbols from /boot/kernel/opensolaris.ko.symbols...done.
Loaded symbols for /boot/kernel/opensolaris.ko.symbols
Reading symbols from /boot/kernel/if_tap.ko.symbols...done.
Loaded symbols for /boot/kernel/if_tap.ko.symbols
Reading symbols from /boot/kernel/if_bridge.ko.symbols...done.
Loaded symbols for /boot/kernel/if_bridge.ko.symbols
Reading symbols from /boot/kernel/bridgestp.ko.symbols...done.
Loaded symbols for /boot/kernel/bridgestp.ko.symbols
Reading symbols from /boot/kernel/aio.ko.symbols...done.
Loaded symbols for /boot/kernel/aio.ko.symbols
Reading symbols from /boot/kernel/coretemp.ko.symbols...done.
Loaded symbols for /boot/kernel/coretemp.ko.symbols
Reading symbols from /boot/kernel/ipmi.ko.symbols...done.
Loaded symbols for /boot/kernel/ipmi.ko.symbols
Reading symbols from /boot/kernel/smbus.ko.symbols...done.
Loaded symbols for /boot/kernel/smbus.ko.symbols
Reading symbols from /boot/kernel/aesni.ko.symbols...done.
Loaded symbols for /boot/kernel/aesni.ko.symbols
Reading symbols from /boot/kernel/crypto.ko.symbols...done.
Loaded symbols for /boot/kernel/crypto.ko.symbols
Reading symbols from /boot/kernel/mps.ko.symbols...done.
Loaded symbols for /boot/kernel/mps.ko.symbols
Reading symbols from /boot/kernel/vmm.ko.symbols...done.
Loaded symbols for /boot/kernel/vmm.ko.symbols
Reading symbols from /boot/kernel/nmdm.ko.symbols...done.
Loaded symbols for /boot/kernel/nmdm.ko.symbols
Reading symbols from /boot/kernel/geom_eli.ko.symbols...done.
Loaded symbols for /boot/kernel/geom_eli.ko.symbols
Reading symbols from /boot/kernel/ums.ko.symbols...done.
Loaded symbols for /boot/kernel/ums.ko.symbols
Reading symbols from /boot/kernel/ipfw.ko.symbols...done.
Loaded symbols for /boot/kernel/ipfw.ko.symbols
Reading symbols from /boot/kernel/dummynet.ko.symbols...done.
Loaded symbols for /boot/kernel/dummynet.ko.symbols
Reading symbols from /boot/kernel/ipfw_nat.ko.symbols...done.
Loaded symbols for /boot/kernel/ipfw_nat.ko.symbols
Reading symbols from /boot/kernel/libalias.ko.symbols...done.
Loaded symbols for /boot/kernel/libalias.ko.symbols
Reading symbols from /boot/kernel/ctl.ko.symbols...done.
Loaded symbols for /boot/kernel/ctl.ko.symbols
Reading symbols from /boot/kernel/iscsi.ko.symbols...done.
Loaded symbols for /boot/kernel/iscsi.ko.symbols
Reading symbols from /boot/kernel/nullfs.ko.symbols...done.
Loaded symbols for /boot/kernel/nullfs.ko.symbols
Reading symbols from /boot/kernel/fdescfs.ko.symbols...done.
Loaded symbols for /boot/kernel/fdescfs.ko.symbols
#0 doadump (textdump=) at pcpu.h:219
219 pcpu.h: No such file or directory.
in pcpu.h
(kgdb) list *0xfe01ee7ee430
No source file for address 0xfe01ee7ee430.
(kgdb) backtrace
#0 doadump (textdump=) at pcpu.h:219
#1 0x80948642 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:451
#2 0x80948a25 in vpanic (fmt=, ap=) at /usr/src/sys/kern/kern_shutdown.c:758
#3 0x809488b3 in panic (fmt=0x0) at /usr/src/sys/kern/kern_shutdown.c:687
#4 0x80d4ab6b in trap_fatal (frame=, eva=) at /usr/src/sys/amd64/amd64/trap.c:851
#5 0x80d4a7ec in trap (frame=) at /usr/src/sys/amd64/amd64/trap.c:203
#6 0x80d30882 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:236
#7 0x8092e980 in __mtx_lock_sleep (c=0xfe
[Bug 204340] [panic] nfsd, em, msix, fatal trap 9
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204340

--- Comment #6 from g_amana...@yahoo.com ---
(In reply to Rick Macklem from comment #4)

It still crashes. I am going to get a dump tomorrow.
[Bug 204340] [panic] nfsd, em, msix, fatal trap 9
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204340

--- Comment #5 from g_amana...@yahoo.com ---
(In reply to Rick Macklem from comment #4)

I am testing it, so far it seems ok. Give me a couple of days to test it thoroughly.
[Bug 204340] [panic] nfsd, em, msix, fatal trap 9
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204340

Rick Macklem changed:

           What    |Removed      |Added
 Attachment #163160|0            |1
        is obsolete|             |

--- Comment #4 from Rick Macklem ---
Created attachment 163217
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=163217&action=edit
Patch to fix sg_threadcount++ so the mutex is held when done

I took a closer look at svc.c and the only way that the race I think might have happened can occur is if svc_run() doesn't wait for all threads to terminate.

I also can now see that the first patch wouldn't have fixed anything, so I'm not surprised it didn't work.

The only thing I can see that is broken in the code and might allow svc_run() to return before all threads have terminated is:
- The thread count sg_threadcount is incremented when the sg_lock mutex isn't held.
  --> This could conceivably result in a corrupted sg_threadcount, which would allow svc_run() to return before all threads have terminated.

This second patch fixes the code so that sg_threadcount++ is always done when the sg_lock mutex is held. If you can test this one instead of the last one, that would be appreciated.

I do know this patch fixes the above problem, but I don't know if this is the cause of your crashes.

Also, this bug seems to have existed in the code forever and all that r267228 did was switch from not holding the pool mutex to not holding the group mutex. So Alexander, you are off the hook, I think. ;-) I've left you on the cc, since you probably know this code better than anyone else and might have some insight w.r.t. this crash.
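The locking rule described above, that the thread count is only ever changed with the group mutex held, can be shown with a small userland sketch. It borrows the field names sg_lock and sg_threadcount from the discussion, but struct group and the group_*() functions are hypothetical and use POSIX mutexes rather than the kernel's mtx(9).

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for a krpc thread group. */
struct group {
	pthread_mutex_t sg_lock;
	int             sg_threadcount;
};

static void
group_init(struct group *g)
{
	int rc = pthread_mutex_init(&g->sg_lock, NULL);

	assert(rc == 0);
	(void)rc;
	g->sg_threadcount = 0;
}

/* A starting thread bumps the count while holding the lock. */
static void
group_thread_enter(struct group *g)
{
	pthread_mutex_lock(&g->sg_lock);
	g->sg_threadcount++;
	pthread_mutex_unlock(&g->sg_lock);
}

/* An exiting thread drops the count; returns how many remain. */
static int
group_thread_exit(struct group *g)
{
	int left;

	pthread_mutex_lock(&g->sg_lock);
	assert(g->sg_threadcount > 0);
	left = --g->sg_threadcount;
	pthread_mutex_unlock(&g->sg_lock);
	return (left);
}
```

A waiter that decides "all threads are gone" by reading the count under the same lock can then trust a value of 0; an unlocked increment racing with a locked decrement is what could leave the count wrong.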
[Bug 204340] [panic] nfsd, em, msix, fatal trap 9
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204340

--- Comment #3 from g_amana...@yahoo.com ---
(In reply to Rick Macklem from comment #2)

Thanks for the insight. However, I just tried the patch proposed and it still crashes.
[Bug 204340] [panic] nfsd, em, msix, fatal trap 9
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204340

Kubilay Kocak changed:

           What    |Removed      |Added
          Flags    |             |mfc-stable9?, mfc-stable10?
       Keywords    |             |crash
[Bug 204340] [panic] nfsd, em, msix, fatal trap 9
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204340

Rick Macklem changed:

           What    |Removed                  |Added
         Status    |New                      |In Progress
             CC    |                         |rmack...@freebsd.org
       Assignee    |freebsd-b...@freebsd.org |rmack...@freebsd.org

--- Comment #2 from Rick Macklem ---
Created attachment 163160
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=163160&action=edit
patch that might fix this problem

I think this crash might have been caused by a race between svcpool_destroy() and the socket upcall.

The code in svcpool_destroy() assumes that SVC_RELEASE(xprt) drops the ref cnt to 0, so that SVC_DESTROY() is called.
--> SVC_DESTROY() shuts down the socket upcall.
--> If the ref cnt doesn't go to 0, svcpool_destroy() will mtx_destroy() the mutexes prematurely.

I am not sure, but the race might have been introduced by r267228 since, prior to this, there was a single mutex for the pool, held while all xprts are unregistered. After r267228, there is a group of mutexes, where the code only held one at a time, so I think an xprt might get re-registered on another group after that group has had all of its xprts de-registered.

The attached little patch moves the mtx_lock() calls to a separate loop before the xprt_unregister loops, so that all locks are held while all are de-registered.

I've added mav@ to the cc list, since he might be the guy that actually understands this.

Anyhow, if you could test the attached patch with msi interrupts re-enabled and see if the crashes go away, that would be great. (I don't think that this indicates that the em(4) driver is broken. I suspect that it just affects timing of the interrupts that tripped over this race.)
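The locking pattern proposed above, taking every group lock in one loop before unregistering anything, can be sketched in userland C. This only illustrates the pattern (comment #4 later concludes that this patch would not have fixed the crash). struct group, struct pool, NGROUPS and the pool_*() functions are hypothetical, and POSIX mutexes stand in for the kernel's mtx_lock().

```c
#include <assert.h>
#include <pthread.h>

#define NGROUPS 4 /* arbitrary group count for the sketch */

/* Hypothetical per-group state: its own lock and transport count. */
struct group {
	pthread_mutex_t lock;
	int             nxprts;
};

struct pool {
	struct group groups[NGROUPS];
};

static void
pool_init(struct pool *p)
{
	for (int g = 0; g < NGROUPS; g++) {
		int rc = pthread_mutex_init(&p->groups[g].lock, NULL);

		assert(rc == 0);
		(void)rc;
		p->groups[g].nxprts = 0;
	}
}

/*
 * Unregister everything with all group locks held, so nothing can be
 * re-registered on an already-emptied group while the walk is running.
 */
static void
pool_unregister_all(struct pool *p)
{
	int g;

	/* First loop: take every lock, always in index order. */
	for (g = 0; g < NGROUPS; g++)
		pthread_mutex_lock(&p->groups[g].lock);

	/* Second loop: unregister while all locks are held. */
	for (g = 0; g < NGROUPS; g++)
		p->groups[g].nxprts = 0;

	/* Release in reverse order. */
	for (g = NGROUPS - 1; g >= 0; g--)
		pthread_mutex_unlock(&p->groups[g].lock);
}
```

Taking the locks in a fixed order is a design choice made for the sketch: it keeps two callers of pool_unregister_all() from deadlocking against each other.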
[Bug 204340] [panic] nfsd, em, msix, fatal trap 9
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204340

Sean Bruno changed:

           What    |Removed      |Added
       Keywords    |             |IntelNetworking
             CC    |             |sbr...@freebsd.org
[Bug 204340] [panic] nfsd, em, msix, fatal trap 9
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204340

g_amana...@yahoo.com changed:

           What    |Removed      |Added
             CC    |             |freebsd-net@FreeBSD.org