Re: panic: Solaris(panic): blkptr invalid CHECKSUM1

2017-09-30 Thread Harry Schmalzbauer
 Regarding Harry Schmalzbauer's message from 30.09.2017 19:25 (localtime):
>  Regarding Harry Schmalzbauer's message from 30.09.2017 18:30 (localtime):
>>  Bad surprise.
>> Most likely I forgot to stop a PCIe passthrough NIC before shutting down
>> that (bhyve(8)) guest – jhb@ helped me identify this as the root
>> cause of the severe memory corruption I regularly had (on stable-11).
>>
>> This time, the corruption obviously affected ZFS's in-memory data.
>>
>> What I hadn't expected is the panic.
>> The machine has a memory disk as root, so luckily I can still boot (from
>> ZFS –> mdpreload rootfs) into single-user mode, but an early rc stage
>> (most likely the one mounting ZFS datasets) leads to the following panic:
>>
>> Trying to mount root from ufs:/dev/ufs/cetusROOT []...
>> panic: Solaris(panic): blkptr at 0xfe0005b6b000 has invalid CHECKSUM 1
>> cpuid = 1
>> KDB: stack backtrace:
>> #0 0x805e3837 at kdb_backtrace+0x67
>> #1 0x805a2286 at vpanic+0x186
>> #2 0x805a20f3 at panic+0x43
>> #3 0x81570192 at vcmn_err+0xc2
>> #4 0x812d7dda at zfs_panic_recover+0x5a
>> #5 0x812ff49b at zfs_blkptr_verify+0x8b
>> #6 0x812ff72c at zio_read+0x2c
>> #7 0x812761de at arc_read+0x6de
>> #8 0x81298b4d at traverse_prefetch_metadata+0xbd
>> #9 0x812980ed at traverse_visitbp+0x39d
>> #10 0x81298c27 at traverse_dnode+0xc7
>> #11 0x812984a3 at traverse_visitbp+0x753
>> #12 0x8129788b at traverse_impl+0x22b
>> #13 0x81297afc at traverse_pool+0x5c
>> #14 0x812cce06 at spa_load+0x1c06
>> #15 0x812cc302 at spa_load+0x1102
>> #16 0x812cac6e at spa_load_best+0x6e
>> #17 0x812c73a1 at spa_open_common+0x101
>> Uptime: 37s
>> Dumping 1082 out of 15733 MB:..2%..…
>> Dump complete
>> mps0: Sending StopUnit: path (xpt0:mps0:0:2:): handle 12
>> mps0: Incrementing SSU count
>> …
>>
>> I haven't attempted a scrub yet – the expectation is to get all datasets
>> of the striped mirror pool back...
>>
>> Any hints highly appreciated.
> Now it seems I'm in really big trouble.
> A regular import doesn't work (also not when booted from cd9660).
> I get all pools listed, but trying to import (unmounted) leads to the
> same panic as initially reported – because rc is just doing the same.
>
> I booted into single-user mode (which works since the bootpool isn't
> affected and root is a memory disk from the bootpool)
> and set vfs.zfs.recover=1.
> But this time I don't even get the list of pools to import; 'zpool
> import' instantaneously leads to this panic:
>
> Solaris: WARNING: blkptr at 0xfe0005a8e000 has invalid CHECKSUM 1
> Solaris: WARNING: blkptr at 0xfe0005a8e000 has invalid COMPRESS 0
> Solaris: WARNING: blkptr at 0xfe0005a8e000 DVA 0 has invalid VDEV
> 2337865727
> Solaris: WARNING: blkptr at 0xfe0005a8e000 DVA 1 has invalid VDEV
> 289407040
> Solaris: WARNING: blkptr at 0xfe0005a8e000 DVA 2 has invalid VDEV
> 3959586324
>
>
> Fatal trap 12: page fault while in kernel mode
> cpuid = 0; apic id = 00
> fault virtual address   = 0x50
> fault code  = supervisor read data, page not present
> instruction pointer = 0x20:0x812de904
> stack pointer   = 0x28:0xfe043f6bcbc0
> frame pointer   = 0x28:0xfe043f6bcbc0
> code segment= base 0x0, limit 0xf, type 0x1b
> = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags= interrupt enabled, resume, IOPL = 0
> current process = 44 (zpool)
> trap number = 12
> panic: page fault
> cpuid = 0

…

OpenIndiana also panics on a regular import.
Unfortunately I don't know the equivalent of vfs.zfs.recover in OI.

panic[cpu1]/thread=ff06dafe8be0: blkptr at ff06dbe63000 has
invalid CHECKSUM 1

Warning - stack not written to the dump buffer
ff001f67f070 genunix:vcmn_err+42 ()
ff001f67f0e0 zfs:zfs_panic_recover+51 ()
ff001f67f140 zfs:zfs_blkptr_verify+8d ()
ff001f67f220 zfs:zio_read+55 ()
ff001f67f310 zfs:arc_read+662 ()
ff001f67f370 zfs:traverse_prefetch_metadata+b5 ()
ff001f67f450 zfs:traverse_visitbp+1c3 ()
ff001f67f4e0 zfs:traverse_dnode+af ()
ff001f67f5c0 zfs:traverse_visitbp+6dd ()
ff001f67f720 zfs:traverse_impl+1a6 ()
ff001f67f830 zfs:traverse_pool+9f ()
ff001f67f8a0 zfs:spa_load_verify+1e6 ()
ff001f67f990 zfs:spa_load_impl+e1c ()
ff001f67fa30 zfs:spa_load+14e ()
ff001f67fad0 zfs:spa_load_best+7a ()
ff001f67fb90 zfs:spa_import+1b0 ()
ff001f67fbe0 zfs:zfs_ioc_pool_import+10f ()
ff001f67fc80 zfs:zfsdev_ioctl+4b7 ()
ff001f67fcc0 genunix:cdev_ioctl+39 ()
ff001f67fd10 specfs:spec_ioctl+60 ()
ff001f67fda0 genunix:fop_ioctl+55 ()
ff001f67fec0 genunix:ioctl+9b ()
ff001f67ff10 unix:brand_sys_sysenter+1c9 ()

This is an important lesson.
My impression was that it's not possible to corrupt a complete pool and that
there's always a way to recover...
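
For reference, the illumos counterpart of FreeBSD's vfs.zfs.recover appears to
be the zfs_recover kernel variable; the following is only a sketch from common
ZFS recovery lore, not verified on this OpenIndiana installation:

# set it on the running kernel via the kernel debugger ...
echo 'zfs_recover/W 1' | mdb -kw
# ... optionally downgrade assertion panics to warnings as well
echo 'aok/W 1' | mdb -kw
# ... or persistently in /etc/system (takes effect after a reboot)
printf 'set zfs:zfs_recover = 1\nset aok = 1\n' >> /etc/system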

Re: panic: Solaris(panic): blkptr invalid CHECKSUM1

2017-09-30 Thread Harry Schmalzbauer
 Regarding Harry Schmalzbauer's message from 30.09.2017 18:30 (localtime):
>  Bad surprise.
> Most likely I forgot to stop a PCIe passthrough NIC before shutting down
> that (bhyve(8)) guest – jhb@ helped me identify this as the root
> cause of the severe memory corruption I regularly had (on stable-11).
>
> This time, the corruption obviously affected ZFS's in-memory data.
>
> What I hadn't expected is the panic.
> The machine has a memory disk as root, so luckily I can still boot (from
> ZFS –> mdpreload rootfs) into single-user mode, but an early rc stage
> (most likely the one mounting ZFS datasets) leads to the following panic:
>
> Trying to mount root from ufs:/dev/ufs/cetusROOT []...
> panic: Solaris(panic): blkptr at 0xfe0005b6b000 has invalid CHECKSUM 1
> cpuid = 1
> KDB: stack backtrace:
> #0 0x805e3837 at kdb_backtrace+0x67
> #1 0x805a2286 at vpanic+0x186
> #2 0x805a20f3 at panic+0x43
> #3 0x81570192 at vcmn_err+0xc2
> #4 0x812d7dda at zfs_panic_recover+0x5a
> #5 0x812ff49b at zfs_blkptr_verify+0x8b
> #6 0x812ff72c at zio_read+0x2c
> #7 0x812761de at arc_read+0x6de
> #8 0x81298b4d at traverse_prefetch_metadata+0xbd
> #9 0x812980ed at traverse_visitbp+0x39d
> #10 0x81298c27 at traverse_dnode+0xc7
> #11 0x812984a3 at traverse_visitbp+0x753
> #12 0x8129788b at traverse_impl+0x22b
> #13 0x81297afc at traverse_pool+0x5c
> #14 0x812cce06 at spa_load+0x1c06
> #15 0x812cc302 at spa_load+0x1102
> #16 0x812cac6e at spa_load_best+0x6e
> #17 0x812c73a1 at spa_open_common+0x101
> Uptime: 37s
> Dumping 1082 out of 15733 MB:..2%..…
> Dump complete
> mps0: Sending StopUnit: path (xpt0:mps0:0:2:): handle 12
> mps0: Incrementing SSU count
> …
>
> I haven't attempted a scrub yet – the expectation is to get all datasets
> of the striped mirror pool back...
>
> Any hints highly appreciated.

Now it seems I'm in really big trouble.
A regular import doesn't work (also not when booted from cd9660).
I get all pools listed, but trying to import (unmounted) leads to the
same panic as initially reported – because rc is just doing the same.

I booted into single-user mode (which works since the bootpool isn't
affected and root is a memory disk from the bootpool)
and set vfs.zfs.recover=1.
But this time I don't even get the list of pools to import; 'zpool
import' instantaneously leads to this panic:

Solaris: WARNING: blkptr at 0xfe0005a8e000 has invalid CHECKSUM 1
Solaris: WARNING: blkptr at 0xfe0005a8e000 has invalid COMPRESS 0
Solaris: WARNING: blkptr at 0xfe0005a8e000 DVA 0 has invalid VDEV
2337865727
Solaris: WARNING: blkptr at 0xfe0005a8e000 DVA 1 has invalid VDEV
289407040
Solaris: WARNING: blkptr at 0xfe0005a8e000 DVA 2 has invalid VDEV
3959586324


Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x50
fault code  = supervisor read data, page not present
instruction pointer = 0x20:0x812de904
stack pointer   = 0x28:0xfe043f6bcbc0
frame pointer   = 0x28:0xfe043f6bcbc0
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 44 (zpool)
trap number = 12
panic: page fault
cpuid = 0
KDB: stack backtrace:
#0 0x805e3837 at kdb_backtrace+0x67
#1 0x805a2286 at vpanic+0x186
#2 0x805a20f3 at panic+0x43
#3 0x808a4922 at trap_fatal+0x322
#4 0x808a4979 at trap_pfault+0x49
#5 0x808a41f8 at trap+0x298
#6 0x80889fb1 at calltrap+0x8
#7 0x812e58a3 at vdev_mirror_child_select+0x53
#8 0x812e535e at vdev_mirror_io_start+0x2ee
#9 0x81303aa1 at zio_vdev_io_start+0x161
#10 0x8130054c at zio_execute+0xac
#11 0x812ffe7b at zio_nowait+0xcb
#12 0x812761f3 at arc_read+0x6f3
#13 0x81298b4d at traverse_prefetch_metadata+0xbd
#14 0x812980ed at traverse_visitbp+0x39d
#15 0x81298c27 at traverse_dnode+0xc7
#16 0x812984a3 at traverse_visitbp+0x753
#17 0x8129788b at traverse_impl+0x22b

Now I hope some ZFS guru can help me out. Needless to say, the bits on this
mirrored pool are important to me – no production data, but lots of
intermediate work...
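
For the record, the usual non-destructive first attempts, sketched with an
illustrative pool name 'cetuspool' (whether they get past the blkptr check at
all likely still depends on vfs.zfs.recover):

# read-only import under an alternate root, without mounting any datasets
zpool import -o readonly=on -N -f -R /mnt cetuspool
# dry-run of a recovery (rewind) import to an earlier transaction group
zpool import -F -n cetuspool
# inspect datasets of the exported pool without importing it at all
zdb -e -d cetuspool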

Thanks,

-harry


panic: Solaris(panic): blkptr invalid CHECKSUM1

2017-09-30 Thread Harry Schmalzbauer
 Bad surprise.
Most likely I forgot to stop a PCIe passthrough NIC before shutting down
that (bhyve(8)) guest – jhb@ helped me identify this as the root
cause of the severe memory corruption I regularly had (on stable-11).
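
For completeness, the teardown order that presumably avoids this, sketched
under the assumption that the guest is called 'guest0' (name is illustrative):

# destroy the guest explicitly, so the passed-through NIC is reset and
# stops DMAing into host memory ...
bhyvectl --vm=guest0 --destroy
# ... and only then power off the host
shutdown -p now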

This time, the corruption obviously affected ZFS's in-memory data.

What I hadn't expected is the panic.
The machine has a memory disk as root, so luckily I can still boot (from
ZFS –> mdpreload rootfs) into single-user mode, but an early rc stage
(most likely the one mounting ZFS datasets) leads to the following panic:

Trying to mount root from ufs:/dev/ufs/cetusROOT []...
panic: Solaris(panic): blkptr at 0xfe0005b6b000 has invalid CHECKSUM 1
cpuid = 1
KDB: stack backtrace:
#0 0x805e3837 at kdb_backtrace+0x67
#1 0x805a2286 at vpanic+0x186
#2 0x805a20f3 at panic+0x43
#3 0x81570192 at vcmn_err+0xc2
#4 0x812d7dda at zfs_panic_recover+0x5a
#5 0x812ff49b at zfs_blkptr_verify+0x8b
#6 0x812ff72c at zio_read+0x2c
#7 0x812761de at arc_read+0x6de
#8 0x81298b4d at traverse_prefetch_metadata+0xbd
#9 0x812980ed at traverse_visitbp+0x39d
#10 0x81298c27 at traverse_dnode+0xc7
#11 0x812984a3 at traverse_visitbp+0x753
#12 0x8129788b at traverse_impl+0x22b
#13 0x81297afc at traverse_pool+0x5c
#14 0x812cce06 at spa_load+0x1c06
#15 0x812cc302 at spa_load+0x1102
#16 0x812cac6e at spa_load_best+0x6e
#17 0x812c73a1 at spa_open_common+0x101
Uptime: 37s
Dumping 1082 out of 15733 MB:..2%..…
Dump complete
mps0: Sending StopUnit: path (xpt0:mps0:0:2:): handle 12
mps0: Incrementing SSU count
…

I haven't attempted a scrub yet – the expectation is to get all datasets
of the striped mirror pool back...

Any hints highly appreciated.
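
(A possible way to get past the boot-time panic while investigating, noted as
an assumption about the rc ordering rather than a verified recipe: keep rc
from touching the data pool at startup and import it manually later.)

# in single-user mode: disable automatic ZFS import/mount at the next boot
sysrc zfs_enable="NO"
# alternatively, move the cached pool list aside so nothing gets imported
mv /boot/zfs/zpool.cache /boot/zfs/zpool.cache.bad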

-harry

find(1)'s "newer" primary expression gives wrong results with symbolic links

2017-09-30 Thread Harry Schmalzbauer


Hello,

Utilizing find(1)'s 'newer' primary expression has been broken with symbolic
links for a very long time.

Anyone using find(1) for timestamp comparisons should pay special attention
to symbolic links.
The man page says of "-P" (which is the default) that it causes »the file
information and file type (see stat(2)) returned for each symbolic link to be
those of the link itself«.

That's not the case; -P doesn't work as expected, see
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=222698
for details.
I hope someone with better C skills will find the root of the problem as soon
as possible.  I'm trying to find it myself, but I'd happily be proven not to
be the fastest ;-)
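
A minimal illustration of the expected -P behaviour (file names are made up;
the symlink is created last, so with -P its own timestamp, not the target's,
should satisfy -newer):

# old target, then a reference file, then the symlink
touch -t 201601010000 target
touch reference
sleep 1
ln -s target link
# with -P (the default) the link's own mtime should be compared, so
# 'link' would be expected to show up here:
find -P . -name link -newer reference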

-harry
