Re: Trap 12 in vm_page_alloc_after()
< said:

> If you're using a Skylake, I suspect that you can set the
> hw.skz63_enable tunable to 0 as a workaround, assuming you're not using
> any code that relies on Intel TSX.  (I don't think there's anything in
> the base system that does.)  There are some details in
> https://reviews.freebsd.org/D18374

It is definitely a Skylake (although it took searching to find that
out, since we don't identify processors by Intel codename).  I've set
that tunable, but I won't know whether it helps until the next
(scheduled or unscheduled) reboot.

-GAWollman

___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: Trap 12 in vm_page_alloc_after()
On Sun, Nov 25, 2018 at 11:35:30PM -0500, Garrett Wollman wrote:
> < said:
>
> > On Sun, Nov 18, 2018 at 08:24:38PM -0500, Garrett Wollman wrote:
> >> Has anyone seen this before?  It's on a busy NFS server, but hasn't
> >> been observed on any of our other NFS servers.
> >>
> >>
> >> Fatal trap 12: page fault while in kernel mode
>
> >> --- trap 0xc, rip = 0x809a903d, rsp = 0xfe17eb8d0710, rbp =
> >> 0xfe17eb8d0750 ---
> >> vm_page_alloc_after() at vm_page_alloc_after+0x15d/frame 0xfe17eb8d0750
>
> > What is the line number for vm_page_alloc_after+0x15d ?
> > Do you have NUMA enabled on 11 ?
>
> If gdb is to be believed, the trap is at line 1687:
>
>         /*
>          * At this point we had better have found a good page.
>          */
>         KASSERT(m != NULL, ("missing page"));
>         free_count = vm_phys_freecnt_adj(m, -1);
> >>      if ((m->flags & PG_ZERO) != 0)
>                 vm_page_zero_count--;
>         mtx_unlock(&vm_page_queue_free_mtx);
>         vm_page_alloc_check(m);
>
> The faulting instruction is:
>
> 0x809a903d :    testb  $0x8,0x5a(%r14)
>
> There are no options matching /numa/i in the configuration.  (This is
> a non-debugging configuration so the KASSERT is inoperative, I
> assume.)  I have about a dozen other servers with the same kernel and
> they're not crashing, but obviously they all have different loads and
> sets of active clients.

If you're using a Skylake, I suspect that you can set the
hw.skz63_enable tunable to 0 as a workaround, assuming you're not using
any code that relies on Intel TSX.  (I don't think there's anything in
the base system that does.)  There are some details in
https://reviews.freebsd.org/D18374
Re: Trap 12 in vm_page_alloc_after()
< said:

> On Sun, Nov 18, 2018 at 08:24:38PM -0500, Garrett Wollman wrote:
>> Has anyone seen this before?  It's on a busy NFS server, but hasn't
>> been observed on any of our other NFS servers.
>>
>>
>> Fatal trap 12: page fault while in kernel mode

>> --- trap 0xc, rip = 0x809a903d, rsp = 0xfe17eb8d0710, rbp =
>> 0xfe17eb8d0750 ---
>> vm_page_alloc_after() at vm_page_alloc_after+0x15d/frame 0xfe17eb8d0750

> What is the line number for vm_page_alloc_after+0x15d ?
> Do you have NUMA enabled on 11 ?

If gdb is to be believed, the trap is at line 1687:

        /*
         * At this point we had better have found a good page.
         */
        KASSERT(m != NULL, ("missing page"));
        free_count = vm_phys_freecnt_adj(m, -1);
>>      if ((m->flags & PG_ZERO) != 0)
                vm_page_zero_count--;
        mtx_unlock(&vm_page_queue_free_mtx);
        vm_page_alloc_check(m);

The faulting instruction is:

0x809a903d :    testb  $0x8,0x5a(%r14)

There are no options matching /numa/i in the configuration.  (This is
a non-debugging configuration so the KASSERT is inoperative, I
assume.)  I have about a dozen other servers with the same kernel and
they're not crashing, but obviously they all have different loads and
sets of active clients.

-GAWollman
Re: Trap 12 in vm_page_alloc_after()
On Sun, Nov 18, 2018 at 08:24:38PM -0500, Garrett Wollman wrote:
> Has anyone seen this before?  It's on a busy NFS server, but hasn't
> been observed on any of our other NFS servers.
>
>
> Fatal trap 12: page fault while in kernel mode
> cpuid = 35; apic id = 35
> fault virtual address   = 0x5a
> fault code              = supervisor read data, page not present
> instruction pointer     = 0x20:0x809a903d
> stack pointer           = 0x28:0xfe17eb8d0710
> frame pointer           = 0x28:0xfe17eb8d0750
> code segment            = base 0x0, limit 0xf, type 0x1b
>                         = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags        = interrupt enabled, resume, IOPL = 0
> current process         = 878 (nfsd: service)
> trap number             = 12
> panic: page fault
> cpuid = 35
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe17eb8d03c0
> vpanic() at vpanic+0x177/frame 0xfe17eb8d0420
> panic() at panic+0x43/frame 0xfe17eb8d0480
> trap_fatal() at trap_fatal+0x35f/frame 0xfe17eb8d04d0
> trap_pfault() at trap_pfault+0x49/frame 0xfe17eb8d0530
> trap() at trap+0x2c7/frame 0xfe17eb8d0640
> calltrap() at calltrap+0x8/frame 0xfe17eb8d0640
> --- trap 0xc, rip = 0x809a903d, rsp = 0xfe17eb8d0710, rbp =
> 0xfe17eb8d0750 ---
> vm_page_alloc_after() at vm_page_alloc_after+0x15d/frame 0xfe17eb8d0750
> kmem_back() at kmem_back+0xf2/frame 0xfe17eb8d07c0
> kmem_malloc() at kmem_malloc+0x60/frame 0xfe17eb8d07f0
> keg_alloc_slab() at keg_alloc_slab+0xe2/frame 0xfe17eb8d0860
> keg_fetch_slab() at keg_fetch_slab+0x14e/frame 0xfe17eb8d08b0
> zone_fetch_slab() at zone_fetch_slab+0x64/frame 0xfe17eb8d08e0
> zone_import() at zone_import+0x3f/frame 0xfe17eb8d0930
> uma_zalloc_arg() at uma_zalloc_arg+0x3d9/frame 0xfe17eb8d09a0
> zil_alloc_lwb() at zil_alloc_lwb+0x9c/frame 0xfe17eb8d09e0
> zil_lwb_write_issue() at zil_lwb_write_issue+0x2f8/frame 0xfe17eb8d0a40
> zil_commit_impl() at zil_commit_impl+0x95f/frame 0xfe17eb8d0b80
> zfs_freebsd_fsync() at zfs_freebsd_fsync+0xa7/frame 0xfe17eb8d0bb0
> VOP_FSYNC_APV() at VOP_FSYNC_APV+0x82/frame 0xfe17eb8d0be0
> nfsvno_fsync() at nfsvno_fsync+0xe0/frame 0xfe17eb8d0c50
> nfsrvd_commit() at nfsrvd_commit+0xe8/frame 0xfe17eb8d0e20
> nfsrvd_dorpc() at nfsrvd_dorpc+0x621/frame 0xfe17eb8d0ff0
> nfssvc_program() at nfssvc_program+0x557/frame 0xfe17eb8d11a0
> svc_run_internal() at svc_run_internal+0xe09/frame 0xfe17eb8d12e0
> svc_thread_start() at svc_thread_start+0xb/frame 0xfe17eb8d12f0
> fork_exit() at fork_exit+0x83/frame 0xfe17eb8d1330
> fork_trampoline() at fork_trampoline+0xe/frame 0xfe17eb8d1330
> --- trap 0xc, rip = 0x80087101a, rsp = 0x7fffe688, rbp = 0x7fffe930
> ---
>
>
> At this point the system was frozen: it did not attempt to reboot
> automatically and was not in the debugger.  I had to do a remote reset
> via the BMC.  The kernel is 11.2 r336644 (so no errata applied), but
> none of the SAs and ENs released so far look like they touch this
> region of code.

What is the line number for vm_page_alloc_after+0x15d ?
Do you have NUMA enabled on 11 ?
Trap 12 in vm_page_alloc_after()
Has anyone seen this before?  It's on a busy NFS server, but hasn't
been observed on any of our other NFS servers.


Fatal trap 12: page fault while in kernel mode
cpuid = 35; apic id = 35
fault virtual address   = 0x5a
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0x809a903d
stack pointer           = 0x28:0xfe17eb8d0710
frame pointer           = 0x28:0xfe17eb8d0750
code segment            = base 0x0, limit 0xf, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 878 (nfsd: service)
trap number             = 12
panic: page fault
cpuid = 35
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe17eb8d03c0
vpanic() at vpanic+0x177/frame 0xfe17eb8d0420
panic() at panic+0x43/frame 0xfe17eb8d0480
trap_fatal() at trap_fatal+0x35f/frame 0xfe17eb8d04d0
trap_pfault() at trap_pfault+0x49/frame 0xfe17eb8d0530
trap() at trap+0x2c7/frame 0xfe17eb8d0640
calltrap() at calltrap+0x8/frame 0xfe17eb8d0640
--- trap 0xc, rip = 0x809a903d, rsp = 0xfe17eb8d0710, rbp = 0xfe17eb8d0750 ---
vm_page_alloc_after() at vm_page_alloc_after+0x15d/frame 0xfe17eb8d0750
kmem_back() at kmem_back+0xf2/frame 0xfe17eb8d07c0
kmem_malloc() at kmem_malloc+0x60/frame 0xfe17eb8d07f0
keg_alloc_slab() at keg_alloc_slab+0xe2/frame 0xfe17eb8d0860
keg_fetch_slab() at keg_fetch_slab+0x14e/frame 0xfe17eb8d08b0
zone_fetch_slab() at zone_fetch_slab+0x64/frame 0xfe17eb8d08e0
zone_import() at zone_import+0x3f/frame 0xfe17eb8d0930
uma_zalloc_arg() at uma_zalloc_arg+0x3d9/frame 0xfe17eb8d09a0
zil_alloc_lwb() at zil_alloc_lwb+0x9c/frame 0xfe17eb8d09e0
zil_lwb_write_issue() at zil_lwb_write_issue+0x2f8/frame 0xfe17eb8d0a40
zil_commit_impl() at zil_commit_impl+0x95f/frame 0xfe17eb8d0b80
zfs_freebsd_fsync() at zfs_freebsd_fsync+0xa7/frame 0xfe17eb8d0bb0
VOP_FSYNC_APV() at VOP_FSYNC_APV+0x82/frame 0xfe17eb8d0be0
nfsvno_fsync() at nfsvno_fsync+0xe0/frame 0xfe17eb8d0c50
nfsrvd_commit() at nfsrvd_commit+0xe8/frame 0xfe17eb8d0e20
nfsrvd_dorpc() at nfsrvd_dorpc+0x621/frame 0xfe17eb8d0ff0
nfssvc_program() at nfssvc_program+0x557/frame 0xfe17eb8d11a0
svc_run_internal() at svc_run_internal+0xe09/frame 0xfe17eb8d12e0
svc_thread_start() at svc_thread_start+0xb/frame 0xfe17eb8d12f0
fork_exit() at fork_exit+0x83/frame 0xfe17eb8d1330
fork_trampoline() at fork_trampoline+0xe/frame 0xfe17eb8d1330
--- trap 0xc, rip = 0x80087101a, rsp = 0x7fffe688, rbp = 0x7fffe930 ---


At this point the system was frozen: it did not attempt to reboot
automatically and was not in the debugger.  I had to do a remote reset
via the BMC.  The kernel is 11.2 r336644 (so no errata applied), but
none of the SAs and ENs released so far look like they touch this
region of code.

-GAWollman