Re: zfs panic RELENG_12

2020-12-22 Thread mike tancsa
On 12/22/2020 10:09 AM, mike tancsa wrote:
> On 12/22/2020 10:07 AM, Mark Johnston wrote:
>> Could you go to frame 11 and print zone->uz_name and
>> bucket->ub_bucket[18]?  I'm wondering if the item pointer was mangled
>> somehow.
> Thank you for looking!
>
> (kgdb) frame 11
>
> #11 0x80ca47d4 in bucket_drain (zone=0xf800037da000,
> bucket=0xf801c7fd5200) at /usr/src/sys/vm/uma_core.c:758
> 758 zone->uz_release(zone->uz_arg, bucket->ub_bucket,
> bucket->ub_cnt);
> (kgdb) p zone->uz_name
> $1 = 0x8102118a "mbuf_jumbo_9k"
> (kgdb) p bucket->ub_bucket[18]
> $2 = (void *) 0xf80de4654000
> (kgdb) p bucket->ub_bucket   
> $3 = 0xf801c7fd5218
>
> (kgdb)
>
Not sure if it's a coincidence or not, but previously I was running with
arc limited to ~30G of the 64G of RAM on the box.  I removed that
limit a few weeks ago after upgrading the box to RELENG_12 to pull in
the OpenSSL changes.  The panic seems to happen under disk load. I have
3 zfs pools that are pretty busy receiving snapshots. One day a week, we
write a full set to a 4th zfs pool on some geli-attached drives via USB
for offsite cold storage.  The crashes happened during that extra
disk work.  gstat shows most of the 12 drives off 2 mrsas controllers at
or close to 100% busy during the 18 hours it takes to dump out the files.

Trying a new cold storage run now with the arc limit back to
vfs.zfs.arc_max=29334498304

    ---Mike
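For scale, the vfs.zfs.arc_max value above is a byte count. A minimal Python sketch that converts it, assuming "64G of RAM" means 64 GiB:

```python
# Convert the vfs.zfs.arc_max byte count quoted above into GiB and a share of RAM.
# Assumption: "64G of RAM" is taken to mean 64 GiB.
ARC_MAX_BYTES = 29334498304
RAM_BYTES = 64 * 2**30


def arc_summary(arc_bytes, ram_bytes):
    """Return (size in GiB, percentage of RAM) for an ARC limit given in bytes."""
    return arc_bytes / 2**30, 100.0 * arc_bytes / ram_bytes


gib, pct = arc_summary(ARC_MAX_BYTES, RAM_BYTES)
print(f"arc_max = {gib:.2f} GiB ({ARC_MAX_BYTES / 1e9:.2f} GB), {pct:.1f}% of RAM")
```

It prints "arc_max = 27.32 GiB (29.33 GB), 42.7% of RAM", so the "~30G" limit is about 29.3 GB in decimal units.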



___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: zfs panic RELENG_12

2020-12-22 Thread mike tancsa
On 12/22/2020 10:07 AM, Mark Johnston wrote:
>
> Could you go to frame 11 and print zone->uz_name and
> bucket->ub_bucket[18]?  I'm wondering if the item pointer was mangled
> somehow.

Thank you for looking!

(kgdb) frame 11

#11 0x80ca47d4 in bucket_drain (zone=0xf800037da000,
bucket=0xf801c7fd5200) at /usr/src/sys/vm/uma_core.c:758
758 zone->uz_release(zone->uz_arg, bucket->ub_bucket,
bucket->ub_cnt);
(kgdb) p zone->uz_name
$1 = 0x8102118a "mbuf_jumbo_9k"
(kgdb) p bucket->ub_bucket[18]
$2 = (void *) 0xf80de4654000
(kgdb) p bucket->ub_bucket   
$3 = 0xf801c7fd5218

(kgdb)

    ---Mike
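The frame 11 values are internally consistent: bucket->ub_bucket ($3 = 0xf801c7fd5218) sits 0x18 bytes past bucket=0xf801c7fd5200, which is what a 24-byte bucket header would give on amd64. The Python/ctypes sketch below checks that arithmetic. The header layout (a two-pointer list link followed by two 16-bit counters and a flexible array of item pointers) is an assumption for illustration, not copied from sys/vm/uma_int.h, and the sketch assumes a 64-bit (LP64) Python build. The helper slot_address is hypothetical.

```python
import ctypes


# Assumed, simplified bucket header layout (illustration only).
class ListEntry(ctypes.Structure):
    _fields_ = [("le_next", ctypes.c_void_p), ("le_prev", ctypes.c_void_p)]


class Bucket(ctypes.Structure):
    _fields_ = [
        ("ub_link", ListEntry),
        ("ub_cnt", ctypes.c_int16),
        ("ub_entries", ctypes.c_int16),
        ("ub_bucket", ctypes.c_void_p * 0),  # flexible array of item pointers
    ]


PTR_SIZE = ctypes.sizeof(ctypes.c_void_p)


def slot_address(bucket_addr, index):
    """Address of ub_bucket[index] for a bucket at bucket_addr (assumed layout)."""
    return bucket_addr + Bucket.ub_bucket.offset + index * PTR_SIZE


# Values as printed in the frame 11 kgdb session.
BUCKET = 0xF801C7FD5200
UB_BUCKET = 0xF801C7FD5218

print(hex(Bucket.ub_bucket.offset), hex(UB_BUCKET - BUCKET))
print(hex(slot_address(BUCKET, 18)))
```

On a 64-bit build both offsets print as 0x18, and slot 18 (the pointer Mark asked about) lives at 0xf801c7fd52a8.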



Re: zfs panic RELENG_12

2020-12-22 Thread Mark Johnston
On Tue, Dec 22, 2020 at 09:05:01AM -0500, mike tancsa wrote:
> Hmmm, another one. Not sure if this is hardware as it seems different?
> 
> 
> 
> Fatal trap 12: page fault while in kernel mode
> cpuid = 11; apic id = 0b
> fault virtual address   = 0x0
> fault code  = supervisor write data, page not present
> instruction pointer = 0x20:0x80ca0826
> stack pointer   = 0x28:0xfe00bc0f8540
> frame pointer   = 0x28:0xfe00bc0f8590
> code segment    = base 0x0, limit 0xf, type 0x1b
>     = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags    = interrupt enabled, resume, IOPL = 0
> current process = 33 (dom0)
> trap number = 12
> panic: page fault
> cpuid = 11
> time = 1608641071
> KDB: stack backtrace:
> #0 0x80a3fe85 at kdb_backtrace+0x65
> #1 0x809f406b at vpanic+0x17b
> #2 0x809f3ee3 at panic+0x43
> #3 0x80e3fe71 at trap_fatal+0x391
> #4 0x80e3fecf at trap_pfault+0x4f
> #5 0x80e3f516 at trap+0x286
> #6 0x80e19318 at calltrap+0x8
> #7 0x80ca47d4 at bucket_cache_drain+0x134
> #8 0x80c9e302 at zone_drain_wait+0xa2
> #9 0x80ca2bbd at uma_reclaim_locked+0x6d
> #10 0x80ca2af4 at uma_reclaim+0x34
> #11 0x80cc5321 at vm_pageout_worker+0x421
> #12 0x80cc4ee3 at vm_pageout+0x193
> #13 0x809b55be at fork_exit+0x7e
> #14 0x80e1a34e at fork_trampoline+0xe
> Uptime: 5d20h37m16s
> Dumping 16057 out of 65398 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
> 
> __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
> 55  __asm("movq %%gs:%P1,%0" : "=r" (td) : "n"
> (offsetof(struct pcpu,
> (kgdb) bt

Could you go to frame 11 and print zone->uz_name and
bucket->ub_bucket[18]?  I'm wondering if the item pointer was mangled
somehow.
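Mark's question is whether the item pointer handed to zone_release was mangled. One rough way to frame such a check, sketched in Python under stated assumptions: a plausible item pointer is non-NULL, falls inside its slab's item area, and lands on an item boundary. The function item_index and its parameters data_start, item_size and ipers (items per slab) are hypothetical; real UMA keg and slab layouts are not modeled here. MJUM9BYTES (9 * 1024) is the 9k jumbo cluster size used by the mbuf_jumbo_9k zone.

```python
from typing import Optional

MJUM9BYTES = 9 * 1024  # 9k jumbo cluster size


def item_index(item: int, data_start: int, item_size: int, ipers: int) -> Optional[int]:
    """Return the index of `item` within a slab, or None if it cannot be valid.

    Hypothetical model: the slab holds `ipers` items of `item_size` bytes laid
    out back to back starting at `data_start`.
    """
    if item == 0:
        return None  # a NULL pointer is never a valid cached item
    offset = item - data_start
    if offset < 0 or offset >= item_size * ipers:
        return None  # outside the slab's item area
    if offset % item_size != 0:
        return None  # not on an item boundary, e.g. a corrupted pointer
    return offset // item_size


# Illustrative values only; not taken from the crash dump.
base = 0x10000
print(item_index(base, base, MJUM9BYTES, 1))      # first item -> 0
print(item_index(base + 8, base, MJUM9BYTES, 1))  # misaligned -> None
```

A pointer that fails a check like this would support the mangling theory; one that passes only shows it is plausible, not that the bucket or slab is intact.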


Re: zfs panic RELENG_12

2020-12-22 Thread mike tancsa
Hmmm, another one. Not sure if this is hardware as it seems different?



Fatal trap 12: page fault while in kernel mode
cpuid = 11; apic id = 0b
fault virtual address   = 0x0
fault code  = supervisor write data, page not present
instruction pointer = 0x20:0x80ca0826
stack pointer   = 0x28:0xfe00bc0f8540
frame pointer   = 0x28:0xfe00bc0f8590
code segment    = base 0x0, limit 0xf, type 0x1b
    = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags    = interrupt enabled, resume, IOPL = 0
current process = 33 (dom0)
trap number = 12
panic: page fault
cpuid = 11
time = 1608641071
KDB: stack backtrace:
#0 0x80a3fe85 at kdb_backtrace+0x65
#1 0x809f406b at vpanic+0x17b
#2 0x809f3ee3 at panic+0x43
#3 0x80e3fe71 at trap_fatal+0x391
#4 0x80e3fecf at trap_pfault+0x4f
#5 0x80e3f516 at trap+0x286
#6 0x80e19318 at calltrap+0x8
#7 0x80ca47d4 at bucket_cache_drain+0x134
#8 0x80c9e302 at zone_drain_wait+0xa2
#9 0x80ca2bbd at uma_reclaim_locked+0x6d
#10 0x80ca2af4 at uma_reclaim+0x34
#11 0x80cc5321 at vm_pageout_worker+0x421
#12 0x80cc4ee3 at vm_pageout+0x193
#13 0x809b55be at fork_exit+0x7e
#14 0x80e1a34e at fork_trampoline+0xe
Uptime: 5d20h37m16s
Dumping 16057 out of 65398 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

__curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
55  __asm("movq %%gs:%P1,%0" : "=r" (td) : "n"
(offsetof(struct pcpu,
(kgdb) bt
#0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
#1  doadump (textdump=) at
/usr/src/sys/kern/kern_shutdown.c:371
#2  0x809f3c85 in kern_reboot (howto=260) at
/usr/src/sys/kern/kern_shutdown.c:451
#3  0x809f40c3 in vpanic (fmt=, ap=) at /usr/src/sys/kern/kern_shutdown.c:880
#4  0x809f3ee3 in panic (fmt=) at
/usr/src/sys/kern/kern_shutdown.c:807
#5  0x80e3fe71 in trap_fatal (frame=0xfe00bc0f8480, eva=0)
at /usr/src/sys/amd64/amd64/trap.c:921
#6  0x80e3fecf in trap_pfault (frame=0xfe00bc0f8480,
usermode=, signo=, ucode=)
at /usr/src/sys/amd64/amd64/trap.c:739
#7  0x80e3f516 in trap (frame=0xfe00bc0f8480) at
/usr/src/sys/amd64/amd64/trap.c:405
#8  
#9  0x80ca0826 in slab_free_item (keg=0xf800037fa380,
slab=0xf80de4656fb0, item=) at
/usr/src/sys/vm/uma_core.c:3357
#10 zone_release (zone=, bucket=0xf801c7fd5218,
cnt=) at /usr/src/sys/vm/uma_core.c:3404
#11 0x80ca47d4 in bucket_drain (zone=0xf800037da000,
bucket=0xf801c7fd5200) at /usr/src/sys/vm/uma_core.c:758
#12 bucket_cache_drain (zone=0xf800037da000) at
/usr/src/sys/vm/uma_core.c:915
#13 0x80c9e302 in zone_drain_wait (zone=0xf800037da000,
waitok=1) at /usr/src/sys/vm/uma_core.c:1037
#14 0x80ca2bbd in zone_drain (zone=0xf800037da000) at
/usr/src/sys/vm/uma_core.c:1056
#15 zone_foreach (zfunc=) at /usr/src/sys/vm/uma_core.c:1985
#16 uma_reclaim_locked (kmem_danger=) at
/usr/src/sys/vm/uma_core.c:3737
#17 0x80ca2af4 in uma_reclaim () at /usr/src/sys/vm/uma_core.c:3757
#18 0x80cc5321 in vm_pageout_lowmem () at
/usr/src/sys/vm/vm_pageout.c:1890
#19 vm_pageout_worker (arg=) at
/usr/src/sys/vm/vm_pageout.c:1966
#20 0x80cc4ee3 in vm_pageout () at /usr/src/sys/vm/vm_pageout.c:2126
#21 0x809b55be in fork_exit (callout=0x80cc4d50
<vm_pageout>, arg=0x0, frame=0xfe00bc0f8b00) at
/usr/src/sys/kern/kern_fork.c:1080
#22 
(kgdb) bt full
#0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
    td = 
#1  doadump (textdump=) at
/usr/src/sys/kern/kern_shutdown.c:371
    error = 
    coredump = 
#2  0x809f3c85 in kern_reboot (howto=260) at
/usr/src/sys/kern/kern_shutdown.c:451
    once = 
#3  0x809f40c3 in vpanic (fmt=, ap=) at /usr/src/sys/kern/kern_shutdown.c:880
    buf = "page fault", '\000' 
    other_cpus = {__bits = {2047, 0, 0, 0}}
    td = 0xf80004964740
    newpanic = 
    bootopt = 
#4  0x809f3ee3 in panic (fmt=) at
/usr/src/sys/kern/kern_shutdown.c:807
    ap = {{gp_offset = 16, fp_offset = 48, overflow_arg_area =
0xfe00bc0f82c0, reg_save_area = 0xfe00bc0f8260}}
#5  0x80e3fe71 in trap_fatal (frame=0xfe00bc0f8480, eva=0)
at /usr/src/sys/amd64/amd64/trap.c:921
    softseg = {ssd_base = 0, ssd_limit = 1048575, ssd_type = 27,
ssd_dpl = 0, ssd_p = 1, ssd_long = 1, ssd_def32 = 0, ssd_gran = 1}
    code = 
    type = 
    ss = 40
    handled = 
#6  0x80e3fecf in trap_pfault (frame=0xfe00bc0f8480,
usermode=, signo=, ucode=)
at /usr/src/sys/amd64/amd64/trap.c:739
    td = 0xf80004964740
    p = 
    eva = 0
    map = 
    ftype = 
    rv = 
#7  0x80e3f516 in trap (frame=0xfe00bc0f8480) at
/usr/src/sys/amd64/amd64/trap.c:405
    ksi = {ksi_link = {tqe_next = 0