Re: Trap 12 in vm_page_alloc_after()

2018-11-28 Thread Garrett Wollman
< said:

> If you're using a Skylake, I suspect that you can set the
> hw.skz63_enable tunable to 0 as a workaround, assuming you're not using
> any code that relies on Intel TSX.  (I don't think there's anything in
> the base system that does.)  There are some details in
> https://reviews.freebsd.org/D18374

It is definitely a Skylake (although it took searching to find that
out, since we don't identify processors by Intel codename).  I've
set that tunable, but I won't know whether it helps until the next
(scheduled or unscheduled) reboot.

-GAWollman

___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: Trap 12 in vm_page_alloc_after()

2018-11-28 Thread Mark Johnston
On Sun, Nov 25, 2018 at 11:35:30PM -0500, Garrett Wollman wrote:
> <  said:
> 
> > On Sun, Nov 18, 2018 at 08:24:38PM -0500, Garrett Wollman wrote:
> >> Has anyone seen this before?  It's on a busy NFS server, but hasn't
> >> been observed on any of our other NFS servers.
> >> 
> >> 
> >> Fatal trap 12: page fault while in kernel mode
> 
> >> --- trap 0xc, rip = 0x809a903d, rsp = 0xfe17eb8d0710, rbp = 0xfe17eb8d0750 ---
> >> vm_page_alloc_after() at vm_page_alloc_after+0x15d/frame 0xfe17eb8d0750
> 
> > What is the line number for vm_page_alloc_after+0x15d ?
> > Do you have NUMA enabled on 11 ?
> 
> If gdb is to be believed, the trap is at line 1687:
> 
> /*
>  *  At this point we had better have found a good page.
>  */
> KASSERT(m != NULL, ("missing page"));
> free_count = vm_phys_freecnt_adj(m, -1);
> >>  if ((m->flags & PG_ZERO) != 0)
> vm_page_zero_count--;
> mtx_unlock(&vm_page_queue_free_mtx);
> vm_page_alloc_check(m);
> 
> The faulting instruction is:
> 
> 0x809a903d :   testb  $0x8,0x5a(%r14)
> 
> There are no options matching /numa/i in the configuration.  (This is
> a non-debugging configuration so the KASSERT is inoperative, I
> assume.)  I have about a dozen other servers with the same kernel and
> they're not crashing, but obviously they all have different loads and
> sets of active clients.

If you're using a Skylake, I suspect that you can set the
hw.skz63_enable tunable to 0 as a workaround, assuming you're not using
any code that relies on Intel TSX.  (I don't think there's anything in
the base system that does.)  There are some details in
https://reviews.freebsd.org/D18374


Re: Trap 12 in vm_page_alloc_after()

2018-11-25 Thread Garrett Wollman
< said:

> On Sun, Nov 18, 2018 at 08:24:38PM -0500, Garrett Wollman wrote:
>> Has anyone seen this before?  It's on a busy NFS server, but hasn't
>> been observed on any of our other NFS servers.
>> 
>> 
>> Fatal trap 12: page fault while in kernel mode

>> --- trap 0xc, rip = 0x809a903d, rsp = 0xfe17eb8d0710, rbp = 0xfe17eb8d0750 ---
>> vm_page_alloc_after() at vm_page_alloc_after+0x15d/frame 0xfe17eb8d0750

> What is the line number for vm_page_alloc_after+0x15d ?
> Do you have NUMA enabled on 11 ?

If gdb is to be believed, the trap is at line 1687:

/*
 *  At this point we had better have found a good page.
 */
KASSERT(m != NULL, ("missing page"));
free_count = vm_phys_freecnt_adj(m, -1);
>>  if ((m->flags & PG_ZERO) != 0)
vm_page_zero_count--;
mtx_unlock(&vm_page_queue_free_mtx);
vm_page_alloc_check(m);

The faulting instruction is:

0x809a903d :   testb  $0x8,0x5a(%r14)

There are no options matching /numa/i in the configuration.  (This is
a non-debugging configuration so the KASSERT is inoperative, I
assume.)  I have about a dozen other servers with the same kernel and
they're not crashing, but obviously they all have different loads and
sets of active clients.

-GAWollman



Re: Trap 12 in vm_page_alloc_after()

2018-11-18 Thread Konstantin Belousov
On Sun, Nov 18, 2018 at 08:24:38PM -0500, Garrett Wollman wrote:
> Has anyone seen this before?  It's on a busy NFS server, but hasn't
> been observed on any of our other NFS servers.
> 
> 
> Fatal trap 12: page fault while in kernel mode
> cpuid = 35; apic id = 35
> fault virtual address   = 0x5a
> fault code  = supervisor read data, page not present
> instruction pointer = 0x20:0x809a903d
> stack pointer   = 0x28:0xfe17eb8d0710
> frame pointer   = 0x28:0xfe17eb8d0750
> code segment= base 0x0, limit 0xf, type 0x1b
> = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags= interrupt enabled, resume, IOPL = 0
> current process = 878 (nfsd: service)
> trap number = 12
> panic: page fault
> cpuid = 35
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe17eb8d03c0
> vpanic() at vpanic+0x177/frame 0xfe17eb8d0420
> panic() at panic+0x43/frame 0xfe17eb8d0480
> trap_fatal() at trap_fatal+0x35f/frame 0xfe17eb8d04d0
> trap_pfault() at trap_pfault+0x49/frame 0xfe17eb8d0530
> trap() at trap+0x2c7/frame 0xfe17eb8d0640
> calltrap() at calltrap+0x8/frame 0xfe17eb8d0640
> --- trap 0xc, rip = 0x809a903d, rsp = 0xfe17eb8d0710, rbp = 0xfe17eb8d0750 ---
> vm_page_alloc_after() at vm_page_alloc_after+0x15d/frame 0xfe17eb8d0750
> kmem_back() at kmem_back+0xf2/frame 0xfe17eb8d07c0
> kmem_malloc() at kmem_malloc+0x60/frame 0xfe17eb8d07f0
> keg_alloc_slab() at keg_alloc_slab+0xe2/frame 0xfe17eb8d0860
> keg_fetch_slab() at keg_fetch_slab+0x14e/frame 0xfe17eb8d08b0
> zone_fetch_slab() at zone_fetch_slab+0x64/frame 0xfe17eb8d08e0
> zone_import() at zone_import+0x3f/frame 0xfe17eb8d0930
> uma_zalloc_arg() at uma_zalloc_arg+0x3d9/frame 0xfe17eb8d09a0
> zil_alloc_lwb() at zil_alloc_lwb+0x9c/frame 0xfe17eb8d09e0
> zil_lwb_write_issue() at zil_lwb_write_issue+0x2f8/frame 0xfe17eb8d0a40
> zil_commit_impl() at zil_commit_impl+0x95f/frame 0xfe17eb8d0b80
> zfs_freebsd_fsync() at zfs_freebsd_fsync+0xa7/frame 0xfe17eb8d0bb0
> VOP_FSYNC_APV() at VOP_FSYNC_APV+0x82/frame 0xfe17eb8d0be0
> nfsvno_fsync() at nfsvno_fsync+0xe0/frame 0xfe17eb8d0c50
> nfsrvd_commit() at nfsrvd_commit+0xe8/frame 0xfe17eb8d0e20
> nfsrvd_dorpc() at nfsrvd_dorpc+0x621/frame 0xfe17eb8d0ff0
> nfssvc_program() at nfssvc_program+0x557/frame 0xfe17eb8d11a0
> svc_run_internal() at svc_run_internal+0xe09/frame 0xfe17eb8d12e0
> svc_thread_start() at svc_thread_start+0xb/frame 0xfe17eb8d12f0
> fork_exit() at fork_exit+0x83/frame 0xfe17eb8d1330
> fork_trampoline() at fork_trampoline+0xe/frame 0xfe17eb8d1330
> --- trap 0xc, rip = 0x80087101a, rsp = 0x7fffe688, rbp = 0x7fffe930 ---
> 
> 
> At this point the system was frozen: it did not attempt to reboot
> automatically and was not in the debugger.  I had to do a remote reset
> via the BMC.  The kernel is 11.2 r336644 (so no errata applied), but
> none of the SAs and ENs released so far look like they touch this
> region of code.

What is the line number for vm_page_alloc_after+0x15d ?
Do you have NUMA enabled on 11 ?


Trap 12 in vm_page_alloc_after()

2018-11-18 Thread Garrett Wollman
Has anyone seen this before?  It's on a busy NFS server, but hasn't
been observed on any of our other NFS servers.


Fatal trap 12: page fault while in kernel mode
cpuid = 35; apic id = 35
fault virtual address   = 0x5a
fault code  = supervisor read data, page not present
instruction pointer = 0x20:0x809a903d
stack pointer   = 0x28:0xfe17eb8d0710
frame pointer   = 0x28:0xfe17eb8d0750
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 878 (nfsd: service)
trap number = 12
panic: page fault
cpuid = 35
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe17eb8d03c0
vpanic() at vpanic+0x177/frame 0xfe17eb8d0420
panic() at panic+0x43/frame 0xfe17eb8d0480
trap_fatal() at trap_fatal+0x35f/frame 0xfe17eb8d04d0
trap_pfault() at trap_pfault+0x49/frame 0xfe17eb8d0530
trap() at trap+0x2c7/frame 0xfe17eb8d0640
calltrap() at calltrap+0x8/frame 0xfe17eb8d0640
--- trap 0xc, rip = 0x809a903d, rsp = 0xfe17eb8d0710, rbp = 0xfe17eb8d0750 ---
vm_page_alloc_after() at vm_page_alloc_after+0x15d/frame 0xfe17eb8d0750
kmem_back() at kmem_back+0xf2/frame 0xfe17eb8d07c0
kmem_malloc() at kmem_malloc+0x60/frame 0xfe17eb8d07f0
keg_alloc_slab() at keg_alloc_slab+0xe2/frame 0xfe17eb8d0860
keg_fetch_slab() at keg_fetch_slab+0x14e/frame 0xfe17eb8d08b0
zone_fetch_slab() at zone_fetch_slab+0x64/frame 0xfe17eb8d08e0
zone_import() at zone_import+0x3f/frame 0xfe17eb8d0930
uma_zalloc_arg() at uma_zalloc_arg+0x3d9/frame 0xfe17eb8d09a0
zil_alloc_lwb() at zil_alloc_lwb+0x9c/frame 0xfe17eb8d09e0
zil_lwb_write_issue() at zil_lwb_write_issue+0x2f8/frame 0xfe17eb8d0a40
zil_commit_impl() at zil_commit_impl+0x95f/frame 0xfe17eb8d0b80
zfs_freebsd_fsync() at zfs_freebsd_fsync+0xa7/frame 0xfe17eb8d0bb0
VOP_FSYNC_APV() at VOP_FSYNC_APV+0x82/frame 0xfe17eb8d0be0
nfsvno_fsync() at nfsvno_fsync+0xe0/frame 0xfe17eb8d0c50
nfsrvd_commit() at nfsrvd_commit+0xe8/frame 0xfe17eb8d0e20
nfsrvd_dorpc() at nfsrvd_dorpc+0x621/frame 0xfe17eb8d0ff0
nfssvc_program() at nfssvc_program+0x557/frame 0xfe17eb8d11a0
svc_run_internal() at svc_run_internal+0xe09/frame 0xfe17eb8d12e0
svc_thread_start() at svc_thread_start+0xb/frame 0xfe17eb8d12f0
fork_exit() at fork_exit+0x83/frame 0xfe17eb8d1330
fork_trampoline() at fork_trampoline+0xe/frame 0xfe17eb8d1330
--- trap 0xc, rip = 0x80087101a, rsp = 0x7fffe688, rbp = 0x7fffe930 ---


At this point the system was frozen: it did not attempt to reboot
automatically and was not in the debugger.  I had to do a remote reset
via the BMC.  The kernel is 11.2 r336644 (so no errata applied), but
none of the SAs and ENs released so far look like they touch this
region of code.

-GAWollman
