Re: dump -X of large LVM based FFSv2 with WAPBL panics

Jaromír Doleček Wed, 15 Nov 2017 16:48:39 -0800

Hi,

Could you check whether a full forced fsck (fsck -f) resolves this?


I saw several such persistent panics while I was debugging WAPBL. Even
after the kernel fixes, I still had panics around ffs_newvnode() caused by
on-disk corruption left over from previous runs, so this is worth trying.
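
A forced check could look roughly like the sketch below. The mount point
and device are the ones from the report; the helper name forced_fsck and
the RUN dry-run variable are made up for illustration, and it assumes /p
has an fstab entry so that a plain "mount /p" works.

```shell
# Sketch only: unmount, force a full check, remount.
# forced_fsck and RUN are illustrative names, not existing tools.
# Set RUN=echo to print the commands instead of executing them.
forced_fsck() {
    mnt=$1    # mount point, e.g. /p
    dev=$2    # device, e.g. /dev/mapper/vg0-photo
    ${RUN:-} umount "$mnt" &&
    ${RUN:-} fsck -f "$dev" &&
    ${RUN:-} mount "$mnt"
}
```

Running "RUN=echo forced_fsck /p /dev/mapper/vg0-photo" first shows the
three commands without touching the filesystem.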

Some day I plan to add a counter, so that boot would force an fsck every
X boots even when the filesystem is clean, similar to what Linux does with
ext3/4.
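
The counter idea can be sketched in a few lines of shell. This is only an
illustration of the logic, not existing NetBSD code: the function name, the
counter file and the interval argument are all hypothetical.

```shell
# Sketch of a "force fsck every N boots" counter (hypothetical, not NetBSD code).
# Prints "force" and resets the counter once the interval is reached,
# otherwise stores the incremented count and prints "skip".
boot_fsck_decision() {
    count_file=$1    # hypothetical file holding the boot count
    interval=$2      # force a check every $interval boots
    count=0
    [ -r "$count_file" ] && count=$(cat "$count_file")
    count=${count:-0}            # treat an empty file as zero
    count=$((count + 1))
    if [ "$count" -ge "$interval" ]; then
        echo 0 > "$count_file"
        echo force
    else
        echo "$count" > "$count_file"
        echo skip
    fi
}
```

A boot script would call this once per boot and run "fsck -f" instead of
the normal check whenever it prints "force".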

Jaromir

2017-11-15 12:56 GMT+01:00 Matthias Petermann <[email protected]>:

> Hello,
>
> on my system I have observed a serious panic when doing FFSv2 dumps under
> certain conditions. I did some googling on my own and found some references
> regarding the lead symptom
>
>         "ffs_newvnode: ino=113 on /p: gen 55fd2f1f/55fd2f1f has non zero blocks ffffffffffffff00 or size 0"
>
> but all of them were marked as solved back in 2016. So I wanted to share my
> observation here, in the hope that somebody can give me some pointers on how
> the issue could be narrowed down further.
>
> 1) Given:
>
> - NetBSD 8.0_BETA (Kernel built from branches/netbsd-8 around 2017-11-06)
>
>         NetBSD nuc.local 8.0_BETA NetBSD 8.0_BETA (XEN3_DOM0_XHCI) #0: Mon Nov 6 14:31:17 CET 2017 [email protected]:/s/src/sys/arch/amd64/compile/XEN3_DOM0_XHCI amd64
>
> - A large (392 GB) LVM volume hosting a FFSv2 filesystem with WAPBL enabled
>   (/dev/mapper/vg0-photo mounted at /p)
>
> - (An external USB 3.0 Drive)
>
> 2) What I tried:
>
> - make a dump of the aforementioned filesystem, using snapshots
>
>     # dump -X -0auf /mnt/photo.0.dump /p
>
> 3) What happens then:
>
> - the system crashes, leaving a core dump with the following indication:
>
>     ffs_newvnode: ino=113 on /p: gen 55fd2f1f/55fd2f1f has non zero blocks ffffffffffffff00 or size 0
>     fatal page fault in supervisor mode
>     trap type 6 code 0x2 rip 0xffffffff8022c0cc cs 0x8 rflags 0x10246 cr2 0xfffffe82deaddf1d ilevel 0x3 rsp 0xfffffe810e6b1eb8
>     curlwp 0xfffffe827f736000 pid 0.4 lowest kstack 0xfffffe810e6ae2c0
>     panic: trap
>     cpu0: Begin traceback...
>     vpanic() at netbsd:vpanic+0x140
>     snprintf() at netbsd:snprintf
>     trap() at netbsd:trap+0xc6b
>     --- trap (number 6) ---
>     mutex_enter() at netbsd:mutex_enter+0xc
>     biodone2() at netbsd:biodone2+0x9b
>     biodone2() at netbsd:biodone2+0x9b
>     biointr() at netbsd:biointr+0x3a
>     softint_dispatch() at netbsd:softint_dispatch+0xd3
>     DDB lost frame for netbsd:Xsoftintr+0x4f, trying 0xfffffe810e6b1ff0
>     Xsoftintr() at netbsd:Xsoftintr+0x4f
>     --- interrupt ---
>     0:
>     cpu0: End traceback...
>
>     dumping to dev 0,1 (offset=168119, size=2076255):
>     dump
>
> - gdb backtrace shows:
>
>     (gdb) target kvm netbsd.3.core
>     0xffffffff80229545 in cpu_reboot ()
>     (gdb) bt
>     #0  0xffffffff80229545 in cpu_reboot ()
>     #1  0xffffffff809a4afc in vpanic ()
>     #2  0xffffffff809a4bb0 in panic ()
>     #3  0xffffffff8022b176 in trap ()
>     #4  0xffffffff8020113e in alltraps ()
>     #5  0xffffffff8022c0cc in mutex_enter ()
>     #6  0xffffffff80a029f5 in wapbl_biodone ()
>     #7  0xffffffff809e2f20 in biodone2 ()
>     #8  0xffffffff809e2f20 in biodone2 ()
>     #9  0xffffffff809e303e in biointr ()
>     #10 0xffffffff8097bc1d in softint_dispatch ()
>     #11 0xffffffff80223eef in Xsoftintr ()
>     (gdb)
>
> 4) What I tried afterwards:
>
> - make a dump of the aforementioned filesystem, using NO snapshots
>
>     # dump -0auf /mnt/photo.0.dump /p
>
>     -> works
>
> - umount the filesystem, enforcing a manual fsck
>
>     -> no problems
>
> - dumpfs -s /dev/mapper/vg0-photo
>
>     nuc# dumpfs -s /dev/mapper/vg0-photo
>     file system: /dev/mapper/vg0-photo
>     format  FFSv2
>     endian  little-endian
>     location 65536  (-b 128)
>     magic   19540119        time    Wed Nov 15 12:26:52 2017
>     superblock location     65536   id      [ 59f8026a 16319237 ]
>     cylgrp  dynamic inodes  FFSv2   sblock  FFSv2   fslevel 5
>     nbfree  4461561 ndir    1865    nifree  24770027        nffree  2079
>     ncg     530     size    100663296       blocks  99102949
>     bsize   32768   shift   15      mask    0xffff8000
>     fsize   4096    shift   12      mask    0xfffff000
>     frag    8       shift   3       fsbtodb 3
>     bpg     23742   fpg     189936  ipg     46848
>     minfree 5%      optim   time    maxcontig 2     maxbpg  4096
>     symlinklen 120  contigsumsize 2
>     maxfilesize 0x000800800805ffff
>     nindir  4096    inopb   128
>     avgfilesize 16384       avgfpdir 64
>     sblkno  24      cblkno  32      iblkno  40      dblkno  2968
>     sbsize  4096    cgsize  32768
>     csaddr  2968    cssize  12288
>     cgrotor 0       fmod    0       ronly   0       clean   0x01
>     wapbl version 0x1       location 2      flags 0x0
>     wapbl loc0 402688128    loc1 131072     loc2 512        loc3 3
>     flags   none
>     fsmnt   /p
>     volname         swuid   0
>
> 5) Further observations:
>
> - dump -X of other FSs on the same machine seems to work fine, but
>   these FSs are smaller
>
> I'd be glad to help identify the root cause further.
>
> Best regards,
> Matthias
>
> --
> Matthias Petermann <[email protected]> | www.petermann-it.de
> GnuPG: 0x5C3E6D75 | 5930 86EF 7965 2BBA 6572  C3D7 7B1D A3C3 5C3E 6D75
>
