Re: ZFS cache devs UNAVAIL
on 23/10/2012 05:24 Matthew D. Fuller said the following:
> On Tue, Oct 23, 2012 at 12:57:34AM +0200 I heard the voice of
> Michael Schmiedgen, and lo! it spake thus:
>> after an update to CURRENT 2012-10-17 my ZFS cache devs are marked
>> UNAVAIL after boot. These two devs are SSD partitions that are listed
>> with some weird numbers (see below). Before that they were listed
>> fine as ada0p1 and ada1p1.
>
> I saw this after my update to 10.0-CURRENT #0 r241541: Sun Oct 14. In
> my case, it's ada2p2 which is the cache that comes up unavail on boot.
> One notable thing may be that p1 is used for ZIL, and comes up fine.
>
>         NAME        STATE     READ WRITE CKSUM
>         d           ONLINE       0     0     0
>           mirror-0  ONLINE       0     0     0
>             ada1p3  ONLINE       0     0     0
>             ada0p3  ONLINE       0     0     0
>         logs
>           ada2p1    ONLINE       0     0     0
>         cache
>           ada2p2    ONLINE       0     0     0
>
> I notice that you also have a second partition on your drives that's
> part of another pool. Maybe it's related to something giving up after
> assigning one partition from the drive to zpool somewhere? Though in
> your case it's p2 that's working and p1 that's wandered off, so maybe
> that's not it...

Guys, could you please reproduce the problem with vfs.zfs.debug=1 in loader.conf and share the dmesg?
Thank you.

--
Andriy Gapon
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: ZFS cache devs UNAVAIL
Hi Andriy,

my dmesg is listed below.

Thanks,
Michael

FreeBSD 10.0-CURRENT #0: Tue Oct 23 00:14:32 CEST 2012
    root@gizeh.smoke:/usr/obj/usr/src/sys/GIZEH amd64
CPU: Intel(R) Xeon(R) CPU E3110 @ 3.00GHz (2992.57-MHz K8-class CPU)
  Origin = GenuineIntel Id = 0x10676 Family = 0x6 Model = 0x17 Stepping = 6
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x8e3fd<SSE3,DTES64,MON,DS_CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1>
  AMD Features=0x20100800<SYSCALL,NX,LM>
  AMD Features2=0x1<LAHF>
  TSC: P-state invariant, performance statistics
real memory = 6442450944 (6144 MB)
avail memory = 6145687552 (5860 MB)
Event timer LAPIC quality 400
ACPI APIC Table: PTLTD APIC
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
FreeBSD/SMP: 1 package(s) x 2 core(s)
 cpu0 (BSP): APIC ID: 0
 cpu1 (AP): APIC ID: 1
ioapic0 Version 2.0 irqs 0-23 on motherboard
kbd1 at kbdmux0
acpi0: PTLTDXSDT on motherboard
acpi0: Power Button (fixed)
cpu0: ACPI CPU on acpi0
cpu1: ACPI CPU on acpi0
hpet0: High Precision Event Timer iomem 0xfed0-0xfed003ff irq 0,8 on acpi0
Timecounter HPET frequency 14318180 Hz quality 950
Event timer HPET frequency 14318180 Hz quality 450
Event timer HPET1 frequency 14318180 Hz quality 440
Event timer HPET2 frequency 14318180 Hz quality 440
Event timer HPET3 frequency 14318180 Hz quality 440
atrtc0: AT realtime clock port 0x70-0x71 on acpi0
Event timer RTC frequency 32768 Hz quality 0
attimer0: AT timer port 0x40-0x43,0x50-0x53 on acpi0
Timecounter i8254 frequency 1193182 Hz quality 0
Event timer i8254 frequency 1193182 Hz quality 100
Timecounter ACPI-safe frequency 3579545 Hz quality 850
acpi_timer0: 24-bit timer at 3.579545MHz port 0x1008-0x100b on acpi0
pcib0: ACPI Host-PCI bridge port 0xcf8-0xcff on acpi0
pci0: ACPI PCI bus on pcib0
pcib1: ACPI PCI-PCI bridge irq 16 at device 1.0 on pci0
pci1: ACPI PCI bus on pcib1
vgapci0: VGA-compatible display port 0x2000-0x207f mem 0xd200-0xd2ff,0xc000-0xcfff,0xd000-0xd1ff irq 16 at device 0.0 on pci1
nvidia0: GeForce 8600 GT on vgapci0
vgapci0: child nvidia0 requested pci_enable_io
vgapci0: child nvidia0 requested pci_enable_io
em0: Intel(R) PRO/1000 Network Connection 7.3.2 port 0x1820-0x183f mem 0xd330-0xd331,0xd3324000-0xd3324fff irq 16 at device 25.0 on pci0
em0: Using an MSI interrupt
em0: Ethernet address: 00:30:48:93:f0:06
uhci0: Intel 82801I (ICH9) USB controller port 0x1840-0x185f irq 16 at device 26.0 on pci0
usbus0 on uhci0
uhci1: Intel 82801I (ICH9) USB controller port 0x1860-0x187f irq 17 at device 26.1 on pci0
usbus1 on uhci1
uhci2: Intel 82801I (ICH9) USB controller port 0x1880-0x189f irq 18 at device 26.2 on pci0
usbus2 on uhci2
ehci0: Intel 82801I (ICH9) USB 2.0 controller mem 0xd3326800-0xd3326bff irq 18 at device 26.7 on pci0
usbus3: EHCI version 1.0
usbus3 on ehci0
hdac0: Intel 82801I HDA Controller mem 0xd332-0xd3323fff irq 16 at device 27.0 on pci0
pcib2: ACPI PCI-PCI bridge irq 16 at device 28.0 on pci0
pci5: ACPI PCI bus on pcib2
pcib3: ACPI PCI-PCI bridge irq 16 at device 28.4 on pci0
pci13: ACPI PCI bus on pcib3
ahci0: Marvell 88SE912x AHCI SATA controller port 0x3030-0x3037,0x3024-0x3027,0x3028-0x302f,0x3020-0x3023,0x3000-0x300f mem 0xd300-0xd30007ff irq 16 at device 0.0 on pci13
ahci0: AHCI v1.20 with 8 6Gbps ports, Port Multiplier not supported
ahcich0: AHCI channel at channel 0 on ahci0
ahcich1: AHCI channel at channel 1 on ahci0
ahcich2: AHCI channel at channel 2 on ahci0
ahcich3: AHCI channel at channel 3 on ahci0
ahcich4: AHCI channel at channel 4 on ahci0
ahcich5: AHCI channel at channel 5 on ahci0
ahcich6: AHCI channel at channel 6 on ahci0
ahcich7: AHCI channel at channel 7 on ahci0
atapci0: Marvell 88SE912x UDMA133 controller port 0x3048-0x304f,0x303c-0x303f,0x3040-0x3047,0x3038-0x303b,0x3010-0x301f mem 0xd3000800-0xd300080f irq 17 at device 0.1 on pci13
uhci3: Intel 82801I (ICH9) USB controller port 0x18a0-0x18bf irq 23 at device 29.0 on pci0
usbus4 on uhci3
uhci4: Intel 82801I (ICH9) USB controller port 0x18c0-0x18df irq 22 at device 29.1 on pci0
usbus5 on uhci4
uhci5: Intel 82801I (ICH9) USB controller port 0x18e0-0x18ff irq 18 at device 29.2 on pci0
usbus6 on uhci5
ehci1: Intel 82801I (ICH9) USB 2.0 controller mem 0xd3326c00-0xd3326fff irq 23 at device 29.7 on pci0
usbus7: EHCI version 1.0
usbus7 on ehci1
pcib4: ACPI PCI-PCI bridge at device 30.0 on pci0
pci17: ACPI PCI bus on pcib4
atapci1: ITE IT8212F UDMA133 controller port 0x4020-0x4027,0x4014-0x4017,0x4018-0x401f,0x4010-0x4013,0x4000-0x400f irq 23 at device 4.0 on pci17
ata2: ATA channel at channel 0 on atapci1
ata3: ATA channel at channel 1 on atapci1
isab0: PCI-ISA bridge at device 31.0 on pci0
isa0: ISA bus on isab0
ahci1: Intel ICH9 AHCI SATA controller port 0x1c70-0x1c77,0x1c64-0x1c67,0x1c68-0x1c6f,0x1c60-0x1c63,0x1c00-0x1c1f mem 0xd3326000-0xd33267ff irq 17 at device 31.2 on pci0
Re: ZFS cache devs UNAVAIL
on 23/10/2012 20:56 Michael Schmiedgen said the following:
> FreeBSD 10.0-CURRENT #0: Tue Oct 23 00:14:32 CEST 2012
>     root@gizeh.smoke:/usr/obj/usr/src/sys/GIZEH amd64
> ...
> vdev_geom_open_by_path:519[1]: guid mismatch for provider /dev/ada0p1: 5267967234359339128 != 0.

Thank you for this valuable information.
Do you have a rough estimate of when you started to experience this issue?
Could you please also provide output of the following command captured right after a reboot and then after you re-add the cache disks?

$ zdb -lll /dev/ada0p

--
Andriy Gapon
Re: ZFS cache devs UNAVAIL
On Tue, Oct 23, 2012 at 11:08:55PM +0300 I heard the voice of
Andriy Gapon, and lo! it spake thus:
> Do you have a rough estimate of when you started to experience this
> issue?

I saw it with r241541 and not with my previous kernel (strings says it was r238937; July 31). So not a very narrow range for me. The major changes in ZFS in that interval that I saw at a glance through the log were the TRIM and the tasting-for-root-pool. But I don't have any reason to suspect them other than "hey, these are high-profile".

--
Matthew Fuller (MF4839) | fulle...@over-yonder.net
Systems/Network Administrator | http://www.over-yonder.net/~fullermd/
On the Internet, nobody can hear you scream.
Re: ZFS cache devs UNAVAIL
on 23/10/2012 23:08 Andriy Gapon said the following:
> on 23/10/2012 20:56 Michael Schmiedgen said the following:
>> FreeBSD 10.0-CURRENT #0: Tue Oct 23 00:14:32 CEST 2012
>>     root@gizeh.smoke:/usr/obj/usr/src/sys/GIZEH amd64
>> ...
>> vdev_geom_open_by_path:519[1]: guid mismatch for provider /dev/ada0p1: 5267967234359339128 != 0.
>
> Thank you for this valuable information.
> Do you have a rough estimate of when you started to experience this issue?
> Could you please also provide output of the following command captured right
> after a reboot and then after you re-add the cache disks?
>
> $ zdb -lll /dev/ada0p

I still would like to get the above information if possible.
But here is a patch that you can try:

--- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
+++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
@@ -270,14 +270,16 @@ vdev_geom_read_config(struct g_consumer *cp, nvlist_t **config)
 			continue;
 
 		if (nvlist_lookup_uint64(*config, ZPOOL_CONFIG_POOL_STATE,
-		    &state) != 0 || state >= POOL_STATE_DESTROYED) {
+		    &state) != 0 || state == POOL_STATE_DESTROYED ||
+		    state > POOL_STATE_L2CACHE) {
 			nvlist_free(*config);
 			*config = NULL;
 			continue;
 		}
 
-		if (nvlist_lookup_uint64(*config, ZPOOL_CONFIG_POOL_TXG,
-		    &txg) != 0 || txg == 0) {
+		if (state != POOL_STATE_SPARE && state != POOL_STATE_L2CACHE &&
+		    (nvlist_lookup_uint64(*config, ZPOOL_CONFIG_POOL_TXG,
+		    &txg) != 0 || txg == 0)) {
 			nvlist_free(*config);
 			*config = NULL;
 			continue;

I think that I introduced this bug because I used some old OpenSolaris code as an inspiration and completely missed the new states.

--
Andriy Gapon
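For readers who want to see the effect of the patch in isolation, here is a minimal standalone sketch of the two label checks, before and after. It is not the kernel code: the helper names label_ok_old and label_ok_new are hypothetical, the nvlist lookup is reduced to a have_txg flag, and the enum values are assumed to follow pool_state_t from the ZFS headers (the "state: 4" that zdb prints for the cache partitions in this thread is consistent with that assumption).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror the first entries of pool_state_t in the ZFS headers. */
enum pool_state {
	POOL_STATE_ACTIVE = 0,
	POOL_STATE_EXPORTED,
	POOL_STATE_DESTROYED,
	POOL_STATE_SPARE,
	POOL_STATE_L2CACHE,
	POOL_STATE_UNINITIALIZED
};

/* The cache partitions in this thread report "state: 4" in zdb. */
static_assert(POOL_STATE_L2CACHE == 4, "L2CACHE is expected to be state 4");

/*
 * Hypothetical model of the check before the patch: every state from
 * DESTROYED upwards is rejected, and a label without a non-zero txg
 * is rejected as well.
 */
bool
label_ok_old(uint64_t state, bool have_txg, uint64_t txg)
{
	if (state >= POOL_STATE_DESTROYED)
		return (false);
	if (!have_txg || txg == 0)
		return (false);
	return (true);
}

/*
 * Hypothetical model of the check after the patch: SPARE and L2CACHE
 * are valid states, and the txg requirement is skipped for them.
 */
bool
label_ok_new(uint64_t state, bool have_txg, uint64_t txg)
{
	if (state == POOL_STATE_DESTROYED || state > POOL_STATE_L2CACHE)
		return (false);
	if (state != POOL_STATE_SPARE && state != POOL_STATE_L2CACHE &&
	    (!have_txg || txg == 0))
		return (false);
	return (true);
}
```

Under these assumptions, label_ok_old(POOL_STATE_L2CACHE, false, 0) is false while label_ok_new(POOL_STATE_L2CACHE, false, 0) is true, which would match a cache device being rejected at boot before the patch and accepted after it.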
Re: ZFS cache devs UNAVAIL
Hi Andy,

thank you for your reply. I will test your patch right now and give you feedback.

On 10/23/12 22:08, Andriy Gapon wrote:
> on 23/10/2012 20:56 Michael Schmiedgen said the following:
>> ...
>> vdev_geom_open_by_path:519[1]: guid mismatch for provider /dev/ada0p1: 5267967234359339128 != 0.
>
> Thank you for this valuable information.
> Do you have a rough estimate of when you started to experience this issue?

I have experienced this since my 2012-10-17 build. I build every 3-5 weeks.

> Could you please also provide output of the following command captured right
> after a reboot and then after you re-add the cache disks?
>
> $ zdb -lll /dev/ada0p

Here is the data in the UNAVAIL state:

# zdb -lll /dev/ada0p1
LABEL 0
    version: 5000
    state: 4
    guid: 5267967234359339128
LABEL 1
    version: 5000
    state: 4
    guid: 5267967234359339128
LABEL 2
    version: 5000
    state: 4
    guid: 5267967234359339128
LABEL 3
    version: 5000
    state: 4
    guid: 5267967234359339128

# zdb -lll /dev/ada1p1
LABEL 0
    version: 5000
    state: 4
    guid: 5693315451104805234
LABEL 1
    version: 5000
    state: 4
    guid: 5693315451104805234
LABEL 2
    version: 5000
    state: 4
    guid: 5693315451104805234
LABEL 3
    version: 5000
    state: 4
    guid: 5693315451104805234

Here is the data after re-adding the two devs:

# zdb -lll /dev/ada0p1
LABEL 0
    version: 5000
    state: 4
    guid: 13019058935211054376
LABEL 1
    version: 5000
    state: 4
    guid: 13019058935211054376
LABEL 2
    version: 5000
    state: 4
    guid: 13019058935211054376
LABEL 3
    version: 5000
    state: 4
    guid: 13019058935211054376

# zdb -lll /dev/ada1p1
LABEL 0
    version: 5000
    state: 4
    guid: 1347428618237802818
LABEL 1
    version: 5000
    state: 4
    guid: 1347428618237802818
LABEL 2
    version: 5000
    state: 4
    guid: 1347428618237802818
LABEL 3
    version: 5000
    state: 4
    guid: 1347428618237802818

I will post the data after build/install/reboot soon.

Thanks,
Michael
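As a reading aid for the zdb output above, the numeric state can be given a name and the four label copies of a device can be compared. The following is only a sketch: pool_state_name and labels_agree are hypothetical helpers, not part of zdb or libzfs, and the name table assumes the numbering of pool_state_t in the ZFS headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Map a label "state" number to a name; order assumed from pool_state_t. */
const char *
pool_state_name(uint64_t state)
{
	static const char *const names[] = {
		"ACTIVE", "EXPORTED", "DESTROYED", "SPARE", "L2CACHE",
		"UNINITIALIZED", "UNAVAIL", "POTENTIALLY_ACTIVE"
	};

	if (state >= sizeof(names) / sizeof(names[0]))
		return ("UNKNOWN");
	return (names[state]);
}

/* Return true if all n label copies of one device carry the same GUID. */
bool
labels_agree(const uint64_t *guids, size_t n)
{
	assert(n > 0);
	for (size_t i = 1; i < n; i++) {
		if (guids[i] != guids[0])
			return (false);
	}
	return (true);
}
```

With that table, state 4 reads as L2CACHE, and feeding the four GUIDs reported for /dev/ada0p1 into labels_agree returns true both before and after the re-add, since each run shows one GUID repeated across labels 0 to 3.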
Re: ZFS cache devs UNAVAIL
On 10/23/12 22:23, Andriy Gapon wrote:
> on 23/10/2012 23:08 Andriy Gapon said the following:
>> on 23/10/2012 20:56 Michael Schmiedgen said the following:
>>> ...
>>> vdev_geom_open_by_path:519[1]: guid mismatch for provider /dev/ada0p1: 5267967234359339128 != 0.
>>
>> Thank you for this valuable information.
>> Do you have a rough estimate of when you started to experience this issue?
>> Could you please also provide output of the following command captured right
>> after a reboot and then after you re-add the cache disks?
>>
>> $ zdb -lll /dev/ada0p
>
> I still would like to get the above information if possible.
> But here is a patch that you can try:
>
> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
> @@ -270,14 +270,16 @@ vdev_geom_read_config(struct g_consumer *cp, nvlist_t **config)
>  			continue;
>  
>  		if (nvlist_lookup_uint64(*config, ZPOOL_CONFIG_POOL_STATE,
> -		    &state) != 0 || state >= POOL_STATE_DESTROYED) {
> +		    &state) != 0 || state == POOL_STATE_DESTROYED ||
> +		    state > POOL_STATE_L2CACHE) {
>  			nvlist_free(*config);
>  			*config = NULL;
>  			continue;
>  		}
>  
> -		if (nvlist_lookup_uint64(*config, ZPOOL_CONFIG_POOL_TXG,
> -		    &txg) != 0 || txg == 0) {
> +		if (state != POOL_STATE_SPARE && state != POOL_STATE_L2CACHE &&
> +		    (nvlist_lookup_uint64(*config, ZPOOL_CONFIG_POOL_TXG,
> +		    &txg) != 0 || txg == 0)) {
>  			nvlist_free(*config);
>  			*config = NULL;
>  			continue;

This works for me. Thank you very much! :)

For the zdb data see below; it has not changed since patch-apply/re-add/reboot.

Michael

# zdb -lll /dev/ada0p1
LABEL 0
    version: 5000
    state: 4
    guid: 13019058935211054376
LABEL 1
    version: 5000
    state: 4
    guid: 13019058935211054376
LABEL 2
    version: 5000
    state: 4
    guid: 13019058935211054376
LABEL 3
    version: 5000
    state: 4
    guid: 13019058935211054376

# zdb -lll /dev/ada1p1
LABEL 0
    version: 5000
    state: 4
    guid: 1347428618237802818
LABEL 1
    version: 5000
    state: 4
    guid: 1347428618237802818
LABEL 2
    version: 5000
    state: 4
    guid: 1347428618237802818
LABEL 3
    version: 5000
    state: 4
    guid: 1347428618237802818
Re: ZFS cache devs UNAVAIL
On 10/23/12 23:57, Florian Smeets wrote:
> My NAS experienced the same problem. I thought the old IDE SSD had just
> died of old age, which is why I didn't investigate further yet. :)

I have two physical SSDs, with both first partitions striped as cache for my main zpool (cache devs gone UNAVAIL) and both second partitions used for a mirrored temp zpool (ONLINE). So I saw good chances to *not* blame the hardware. ;)

> With the patch the cache device is back.

Works here, too.

Michael
Re: ZFS cache devs UNAVAIL
On 23.10.12 22:23, Andriy Gapon wrote:
> on 23/10/2012 23:08 Andriy Gapon said the following:
>> on 23/10/2012 20:56 Michael Schmiedgen said the following:
>>> FreeBSD 10.0-CURRENT #0: Tue Oct 23 00:14:32 CEST 2012
>>>     root@gizeh.smoke:/usr/obj/usr/src/sys/GIZEH amd64
>>> ...
>>> vdev_geom_open_by_path:519[1]: guid mismatch for provider /dev/ada0p1: 5267967234359339128 != 0.
>>
>> Thank you for this valuable information.
>> Do you have a rough estimate of when you started to experience this issue?
>> Could you please also provide output of the following command captured right
>> after a reboot and then after you re-add the cache disks?
>>
>> $ zdb -lll /dev/ada0p
>
> I still would like to get the above information if possible.
> But here is a patch that you can try:
>
> I think that I introduced this bug because I used some old OpenSolaris code as
> an inspiration and completely missed the new states.

My NAS experienced the same problem. I thought the old IDE SSD had just died of old age, which is why I didn't investigate further yet. :)

With the patch the cache device is back.

Thanks,
Florian
Re: ZFS cache devs UNAVAIL
On Tue, Oct 23, 2012 at 12:57:34AM +0200 I heard the voice of
Michael Schmiedgen, and lo! it spake thus:
> after an update to CURRENT 2012-10-17 my ZFS cache devs are marked
> UNAVAIL after boot. These two devs are SSD partitions that are listed
> with some weird numbers (see below). Before that they were listed
> fine as ada0p1 and ada1p1.

I saw this after my update to 10.0-CURRENT #0 r241541: Sun Oct 14. In my case, it's ada2p2 which is the cache that comes up unavail on boot. One notable thing may be that p1 is used for ZIL, and comes up fine.

        NAME        STATE     READ WRITE CKSUM
        d           ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            ada1p3  ONLINE       0     0     0
            ada0p3  ONLINE       0     0     0
        logs
          ada2p1    ONLINE       0     0     0
        cache
          ada2p2    ONLINE       0     0     0

I notice that you also have a second partition on your drives that's part of another pool. Maybe it's related to something giving up after assigning one partition from the drive to zpool somewhere? Though in your case it's p2 that's working and p1 that's wandered off, so maybe that's not it...

--
Matthew Fuller (MF4839) | fulle...@over-yonder.net
Systems/Network Administrator | http://www.over-yonder.net/~fullermd/
On the Internet, nobody can hear you scream.