Re: panic: Solaris(panic): blkptr invalid CHECKSUM1
Regarding Harry Schmalzbauer's message from 30.09.2017 19:25 (localtime):
> Regarding Harry Schmalzbauer's message from 30.09.2017 18:30 (localtime):
>> Bad surprise.
>> Most likely I forgot to stop a PCIe passthrough NIC before shutting down
>> that (bhyve(8)) guest – jhb@ helped me identify this as the root
>> cause of the severe memory corruption I regularly had (on stable-11).
>>
>> This time the corruption obviously affected ZFS's RAM area.
>>
>> What I hadn't expected was the panic.
>> The machine has a memory disk as root, so luckily I can still boot (from
>> ZFS -> mdpreload rootfs) into single-user mode, but an early rc stage
>> (most likely mounting the ZFS datasets) leads to the following panic:
>>
>> Trying to mount root from ufs:/dev/ufs/cetusROOT []...
>> panic: Solaris(panic): blkptr at 0xfe0005b6b000 has invalid CHECKSUM 1
>> cpuid = 1
>> KDB: stack backtrace:
>> #0 0x805e3837 at kdb_backtrace+0x67
>> #1 0x805a2286 at vpanic+0x186
>> #2 0x805a20f3 at panic+0x43
>> #3 0x81570192 at vcmn_err+0xc2
>> #4 0x812d7dda at zfs_panic_recover+0x5a
>> #5 0x812ff49b at zfs_blkptr_verify+0x8b
>> #6 0x812ff72c at zio_read+0x2c
>> #7 0x812761de at arc_read+0x6de
>> #8 0x81298b4d at traverse_prefetch_metadata+0xbd
>> #9 0x812980ed at traverse_visitbp+0x39d
>> #10 0x81298c27 at traverse_dnode+0xc7
>> #11 0x812984a3 at traverse_visitbp+0x753
>> #12 0x8129788b at traverse_impl+0x22b
>> #13 0x81297afc at traverse_pool+0x5c
>> #14 0x812cce06 at spa_load+0x1c06
>> #15 0x812cc302 at spa_load+0x1102
>> #16 0x812cac6e at spa_load_best+0x6e
>> #17 0x812c73a1 at spa_open_common+0x101
>> Uptime: 37s
>> Dumping 1082 out of 15733 MB:..2%..…
>> Dump complete
>> mps0: Sending StopUnit: path (xpt0:mps0:0:2:): handle 12
>> mps0: Incrementing SSU count
>> …
>>
>> Haven't done any scrub attempts yet – the expectation is to get all
>> datasets of the striped mirror pool back...
>>
>> Any hints highly appreciated.
> Now it seems I'm in really big trouble.
> Regular import doesn't work (also not if booted from cd9660).
> I get all pools listed, but trying to import (unmounted) leads to the
> same panic as initially reported – because rc is just doing the same.
>
> I booted into single-user mode (which works since the boot pool isn't
> affected and root is a memory disk from the boot pool)
> and set vfs.zfs.recover=1.
> But this time I don't even get the list of pools to import; 'zpool
> import' instantaneously leads to that panic:
>
> Solaris: WARNING: blkptr at 0xfe0005a8e000 has invalid CHECKSUM 1
> Solaris: WARNING: blkptr at 0xfe0005a8e000 has invalid COMPRESS 0
> Solaris: WARNING: blkptr at 0xfe0005a8e000 DVA 0 has invalid VDEV 2337865727
> Solaris: WARNING: blkptr at 0xfe0005a8e000 DVA 1 has invalid VDEV 289407040
> Solaris: WARNING: blkptr at 0xfe0005a8e000 DVA 2 has invalid VDEV 3959586324
>
> Fatal trap 12: page fault while in kernel mode
> cpuid = 0; apic id = 00
> fault virtual address = 0x50
> fault code = supervisor read data, page not present
> instruction pointer = 0x20:0x812de904
> stack pointer = 0x28:0xfe043f6bcbc0
> frame pointer = 0x28:0xfe043f6bcbc0
> code segment = base 0x0, limit 0xf, type 0x1b
>   = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags = interrupt enabled, resume, IOPL = 0
> current process = 44 (zpool)
> trap number = 12
> panic: page fault
> cpuid = 0
> …

OpenIndiana also panics at a regular import. Unfortunately I don't know
the equivalent of vfs.zfs.recover in OI.
panic[cpu1]/thread=ff06dafe8be0: blkptr at ff06dbe63000 has invalid CHECKSUM 1

Warning - stack not written to the dump buffer
ff001f67f070 genunix:vcmn_err+42 ()
ff001f67f0e0 zfs:zfs_panic_recover+51 ()
ff001f67f140 zfs:zfs_blkptr_verify+8d ()
ff001f67f220 zfs:zio_read+55 ()
ff001f67f310 zfs:arc_read+662 ()
ff001f67f370 zfs:traverse_prefetch_metadata+b5 ()
ff001f67f450 zfs:traverse_visitbp+1c3 ()
ff001f67f4e0 zfs:traverse_dnode+af ()
ff001f67f5c0 zfs:traverse_visitbp+6dd ()
ff001f67f720 zfs:traverse_impl+1a6 ()
ff001f67f830 zfs:traverse_pool+9f ()
ff001f67f8a0 zfs:spa_load_verify+1e6 ()
ff001f67f990 zfs:spa_load_impl+e1c ()
ff001f67fa30 zfs:spa_load+14e ()
ff001f67fad0 zfs:spa_load_best+7a ()
ff001f67fb90 zfs:spa_import+1b0 ()
ff001f67fbe0 zfs:zfs_ioc_pool_import+10f ()
ff001f67fc80 zfs:zfsdev_ioctl+4b7 ()
ff001f67fcc0 genunix:cdev_ioctl+39 ()
ff001f67fd10 specfs:spec_ioctl+60 ()
ff001f67fda0 genunix:fop_ioctl+55 ()
ff001f67fec0 genunix:ioctl+9b ()
ff001f67ff10 unix:brand_sys_sysenter+1c9 ()

This is an important lesson. My impression was that it's not possible to
corrupt a complete pool, and that there's always a way to recover.
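For anyone landing in the same state: before scrubbing or rewriting anything, the standard last-resort import options are worth knowing. The sketch below uses a placeholder pool name ("tank") and mountpoint; whether any of these gets past a damaged blkptr depends entirely on where the corruption landed, so this is a generic checklist, not a fix confirmed for this pool.

```shell
# Last-resort recovery attempts for a pool whose current metadata is damaged.
# "tank" is a placeholder pool name; run from a rescue/live environment.

# 1) Import read-only so nothing further is written to the pool:
zpool import -o readonly=on -f -R /mnt tank

# 2) Rewind import: discard the last few transaction groups.
#    -n performs a dry run and reports what would be lost:
zpool import -F -n tank
zpool import -F tank

# 3) Extreme rewind (-X, used together with -F): tries much older
#    uberblocks and can discard considerably more recent data:
zpool import -F -X tank

# 4) Inspect without importing, via zdb on the exported pool:
zdb -e -u tank        # print the uberblock
zdb -e -bcsvL tank    # traverse and checksum-verify blocks (read-only)
```

If even the read-only import panics, the rewind attempts are best made from a rescue system where a panic costs nothing.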
Re: panic: Solaris(panic): blkptr invalid CHECKSUM1
Regarding Harry Schmalzbauer's message from 30.09.2017 18:30 (localtime):
> Bad surprise.
> Most likely I forgot to stop a PCIe passthrough NIC before shutting down
> that (bhyve(8)) guest – jhb@ helped me identify this as the root
> cause of the severe memory corruption I regularly had (on stable-11).
>
> This time the corruption obviously affected ZFS's RAM area.
>
> What I hadn't expected was the panic.
> The machine has a memory disk as root, so luckily I can still boot (from
> ZFS -> mdpreload rootfs) into single-user mode, but an early rc stage
> (most likely mounting the ZFS datasets) leads to the following panic:
>
> Trying to mount root from ufs:/dev/ufs/cetusROOT []...
> panic: Solaris(panic): blkptr at 0xfe0005b6b000 has invalid CHECKSUM 1
> cpuid = 1
> KDB: stack backtrace:
> #0 0x805e3837 at kdb_backtrace+0x67
> #1 0x805a2286 at vpanic+0x186
> #2 0x805a20f3 at panic+0x43
> #3 0x81570192 at vcmn_err+0xc2
> #4 0x812d7dda at zfs_panic_recover+0x5a
> #5 0x812ff49b at zfs_blkptr_verify+0x8b
> #6 0x812ff72c at zio_read+0x2c
> #7 0x812761de at arc_read+0x6de
> #8 0x81298b4d at traverse_prefetch_metadata+0xbd
> #9 0x812980ed at traverse_visitbp+0x39d
> #10 0x81298c27 at traverse_dnode+0xc7
> #11 0x812984a3 at traverse_visitbp+0x753
> #12 0x8129788b at traverse_impl+0x22b
> #13 0x81297afc at traverse_pool+0x5c
> #14 0x812cce06 at spa_load+0x1c06
> #15 0x812cc302 at spa_load+0x1102
> #16 0x812cac6e at spa_load_best+0x6e
> #17 0x812c73a1 at spa_open_common+0x101
> Uptime: 37s
> Dumping 1082 out of 15733 MB:..2%..…
> Dump complete
> mps0: Sending StopUnit: path (xpt0:mps0:0:2:): handle 12
> mps0: Incrementing SSU count
> …
>
> Haven't done any scrub attempts yet – the expectation is to get all
> datasets of the striped mirror pool back...
>
> Any hints highly appreciated.

Now it seems I'm in really big trouble.
Regular import doesn't work (also not if booted from cd9660).
I get all pools listed, but trying to import (unmounted) leads to the
same panic as initially reported – because rc is just doing the same.

I booted into single-user mode (which works since the boot pool isn't
affected and root is a memory disk from the boot pool) and set
vfs.zfs.recover=1.
But this time I don't even get the list of pools to import; 'zpool
import' instantaneously leads to that panic:

Solaris: WARNING: blkptr at 0xfe0005a8e000 has invalid CHECKSUM 1
Solaris: WARNING: blkptr at 0xfe0005a8e000 has invalid COMPRESS 0
Solaris: WARNING: blkptr at 0xfe0005a8e000 DVA 0 has invalid VDEV 2337865727
Solaris: WARNING: blkptr at 0xfe0005a8e000 DVA 1 has invalid VDEV 289407040
Solaris: WARNING: blkptr at 0xfe0005a8e000 DVA 2 has invalid VDEV 3959586324

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0x50
fault code = supervisor read data, page not present
instruction pointer = 0x20:0x812de904
stack pointer = 0x28:0xfe043f6bcbc0
frame pointer = 0x28:0xfe043f6bcbc0
code segment = base 0x0, limit 0xf, type 0x1b
  = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 44 (zpool)
trap number = 12
panic: page fault
cpuid = 0
KDB: stack backtrace:
#0 0x805e3837 at kdb_backtrace+0x67
#1 0x805a2286 at vpanic+0x186
#2 0x805a20f3 at panic+0x43
#3 0x808a4922 at trap_fatal+0x322
#4 0x808a4979 at trap_pfault+0x49
#5 0x808a41f8 at trap+0x298
#6 0x80889fb1 at calltrap+0x8
#7 0x812e58a3 at vdev_mirror_child_select+0x53
#8 0x812e535e at vdev_mirror_io_start+0x2ee
#9 0x81303aa1 at zio_vdev_io_start+0x161
#10 0x8130054c at zio_execute+0xac
#11 0x812ffe7b at zio_nowait+0xcb
#12 0x812761f3 at arc_read+0x6f3
#13 0x81298b4d at traverse_prefetch_metadata+0xbd
#14 0x812980ed at traverse_visitbp+0x39d
#15 0x81298c27 at traverse_dnode+0xc7
#16 0x812984a3 at traverse_visitbp+0x753
#17 0x8129788b at traverse_impl+0x22b

Now I hope some ZFS guru can help me out.
Needless to say, the bits on this mirrored pool are important to me – no
production data, but lots of intermediate work...

Thanks,

-harry

___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
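For completeness, a hedged sketch of what the single-user-mode attempt described above can look like on FreeBSD. vfs.zfs.recover=1 downgrades zfs_panic_recover() panics (such as the blkptr CHECKSUM one) to warnings; the spa_load_verify_* names are an assumption on my part — tunables by these names exist on later FreeBSD/OpenZFS and skip the import-time traversal that the backtraces show crashing, but confirm them with `sysctl -a | grep zfs` before relying on them:

```shell
# Hedged sketch: tunables to try in single-user mode before re-attempting
# the import (names as found on FreeBSD; confirm on your release).

# Turn zfs_panic_recover() panics into warnings; can also be set as a
# loader tunable before boot:
sysctl vfs.zfs.recover=1

# Skip the spa_load_verify traversal seen in the backtraces
# (ASSUMED names; they may differ or not exist on older releases):
sysctl vfs.zfs.spa_load_verify_metadata=0
sysctl vfs.zfs.spa_load_verify_data=0

# Then retry, read-only first ("tank" is a placeholder pool name):
zpool import -o readonly=on -f tank
```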
panic: Solaris(panic): blkptr invalid CHECKSUM1
Bad surprise.
Most likely I forgot to stop a PCIe passthrough NIC before shutting down
that (bhyve(8)) guest – jhb@ helped me identify this as the root cause
of the severe memory corruption I regularly had (on stable-11).

This time the corruption obviously affected ZFS's RAM area.

What I hadn't expected was the panic.
The machine has a memory disk as root, so luckily I can still boot (from
ZFS -> mdpreload rootfs) into single-user mode, but an early rc stage
(most likely mounting the ZFS datasets) leads to the following panic:

Trying to mount root from ufs:/dev/ufs/cetusROOT []...
panic: Solaris(panic): blkptr at 0xfe0005b6b000 has invalid CHECKSUM 1
cpuid = 1
KDB: stack backtrace:
#0 0x805e3837 at kdb_backtrace+0x67
#1 0x805a2286 at vpanic+0x186
#2 0x805a20f3 at panic+0x43
#3 0x81570192 at vcmn_err+0xc2
#4 0x812d7dda at zfs_panic_recover+0x5a
#5 0x812ff49b at zfs_blkptr_verify+0x8b
#6 0x812ff72c at zio_read+0x2c
#7 0x812761de at arc_read+0x6de
#8 0x81298b4d at traverse_prefetch_metadata+0xbd
#9 0x812980ed at traverse_visitbp+0x39d
#10 0x81298c27 at traverse_dnode+0xc7
#11 0x812984a3 at traverse_visitbp+0x753
#12 0x8129788b at traverse_impl+0x22b
#13 0x81297afc at traverse_pool+0x5c
#14 0x812cce06 at spa_load+0x1c06
#15 0x812cc302 at spa_load+0x1102
#16 0x812cac6e at spa_load_best+0x6e
#17 0x812c73a1 at spa_open_common+0x101
Uptime: 37s
Dumping 1082 out of 15733 MB:..2%..…
Dump complete
mps0: Sending StopUnit: path (xpt0:mps0:0:2:): handle 12
mps0: Incrementing SSU count
…

Haven't done any scrub attempts yet – the expectation is to get all
datasets of the striped mirror pool back...

Any hints highly appreciated.

-harry
find(1)'s "newer" primary expression gives wrong results with symbolic links
Hello,

using find(1)'s 'newer' primary expression is broken with symbolic links
(and has been for a very long time). Anyone who is using find for
timestamp comparisons should pay special attention to symbolic links.

The man page states for "-P" (which is the default) that it causes
»the file information and file type (see stat(2)) returned for each
symbolic link to be those of the link itself«.
That's not the case; -P doesn't work as expected. See
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=222698 for details.

I hope someone with better C skills will find the root of the problem as
soon as possible. I'm trying to find it myself, but I'd happily be
proven not to be the fastest ;-)

-harry
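For reference, a minimal reproduction sketch of the reported -P behavior (file names invented for illustration): per the man page, the default -P mode should use the symlink's own stat information, so a freshly touched link ought to match -newer even when its target is old.

```shell
# Reproduction sketch for find -P vs. -newer with symlinks.
# File names are invented; touch -h sets the mtime of the link itself.
cd "$(mktemp -d)"
touch -t 201701010000 old_file        # old target
touch -t 201709010000 new_file        # reference file for -newer
ln -s old_file link_to_old
touch -h -t 201710010000 link_to_old  # the link itself is newest of all

# With -P (the default), the comparison should use the link's own mtime,
# so link_to_old should be reported as newer than new_file:
find . -newer new_file
```

On a system exhibiting the behavior described in PR 222698, link_to_old is missing from that output because find compares the target's (old_file's) timestamp instead of the link's own.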