Re: ZFS cache devs UNAVAIL

2012-10-23 Thread Andriy Gapon
on 23/10/2012 05:24 Matthew D. Fuller said the following:
 On Tue, Oct 23, 2012 at 12:57:34AM +0200 I heard the voice of
 Michael Schmiedgen, and lo! it spake thus:

 after an update to CURRENT 2012-10-17 my ZFS cache devs are marked
 UNAVAIL after boot. These two devs are SSD partitions that are listed
 with some weird numbers (see below). Before that they were listed
 fine as ada0p1 and ada1p1.
 
 I saw this after my update to 10.0-CURRENT #0 r241541: Sun Oct 14.
 
 In my case, it's ada2p2 which is the cache that comes up unavail on
 boot.  One notable thing may be that p1 is used for ZIL, and comes up
 fine.
 
 NAME        STATE     READ WRITE CKSUM
 d           ONLINE       0     0     0
   mirror-0  ONLINE       0     0     0
     ada1p3  ONLINE       0     0     0
     ada0p3  ONLINE       0     0     0
 logs
   ada2p1    ONLINE       0     0     0
 cache
   ada2p2    ONLINE       0     0     0
 
 
 I notice that you also have a second partition on your drives that's
 part of another pool.  Maybe it's related to something giving up after
 assigning one partition from the drive to zpool somewhere?  Though in
 your case it's p2 that's working and p1 that's wandered off, so maybe
 that's not it...
 
 

Guys,

could you please reproduce the problem with vfs.zfs.debug=1 in loader.conf and
share the dmesg?  Thank you.

-- 
Andriy Gapon
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: ZFS cache devs UNAVAIL

2012-10-23 Thread Michael Schmiedgen
Hi Andriy,

my dmesg is listed below.

Thanks,
  Michael


FreeBSD 10.0-CURRENT #0: Tue Oct 23 00:14:32 CEST 2012
root@gizeh.smoke:/usr/obj/usr/src/sys/GIZEH amd64
CPU: Intel(R) Xeon(R) CPU   E3110  @ 3.00GHz (2992.57-MHz K8-class CPU)
  Origin = GenuineIntel  Id = 0x10676  Family = 0x6  Model = 0x17  Stepping = 6
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x8e3fd<SSE3,DTES64,MON,DS_CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1>
  AMD Features=0x20100800<SYSCALL,NX,LM>
  AMD Features2=0x1<LAHF>
  TSC: P-state invariant, performance statistics
real memory  = 6442450944 (6144 MB)
avail memory = 6145687552 (5860 MB)
Event timer LAPIC quality 400
ACPI APIC Table: PTLTD  APIC  
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
FreeBSD/SMP: 1 package(s) x 2 core(s)
 cpu0 (BSP): APIC ID:  0
 cpu1 (AP): APIC ID:  1
ioapic0 Version 2.0 irqs 0-23 on motherboard
kbd1 at kbdmux0
acpi0: PTLTDXSDT on motherboard
acpi0: Power Button (fixed)
cpu0: ACPI CPU on acpi0
cpu1: ACPI CPU on acpi0
hpet0: High Precision Event Timer iomem 0xfed0-0xfed003ff irq 0,8
on acpi0
Timecounter HPET frequency 14318180 Hz quality 950
Event timer HPET frequency 14318180 Hz quality 450
Event timer HPET1 frequency 14318180 Hz quality 440
Event timer HPET2 frequency 14318180 Hz quality 440
Event timer HPET3 frequency 14318180 Hz quality 440
atrtc0: AT realtime clock port 0x70-0x71 on acpi0
Event timer RTC frequency 32768 Hz quality 0
attimer0: AT timer port 0x40-0x43,0x50-0x53 on acpi0
Timecounter i8254 frequency 1193182 Hz quality 0
Event timer i8254 frequency 1193182 Hz quality 100
Timecounter ACPI-safe frequency 3579545 Hz quality 850
acpi_timer0: 24-bit timer at 3.579545MHz port 0x1008-0x100b on acpi0
pcib0: ACPI Host-PCI bridge port 0xcf8-0xcff on acpi0
pci0: ACPI PCI bus on pcib0
pcib1: ACPI PCI-PCI bridge irq 16 at device 1.0 on pci0
pci1: ACPI PCI bus on pcib1
vgapci0: VGA-compatible display port 0x2000-0x207f mem
0xd200-0xd2ff,0xc000-0xcfff,0xd000-0xd1ff irq 16
at device 0.0 on pci1
nvidia0: GeForce 8600 GT on vgapci0
vgapci0: child nvidia0 requested pci_enable_io
vgapci0: child nvidia0 requested pci_enable_io
em0: Intel(R) PRO/1000 Network Connection 7.3.2 port 0x1820-0x183f mem
0xd330-0xd331,0xd3324000-0xd3324fff irq 16 at device 25.0 on pci0
em0: Using an MSI interrupt
em0: Ethernet address: 00:30:48:93:f0:06
uhci0: Intel 82801I (ICH9) USB controller port 0x1840-0x185f irq 16 at
device 26.0 on pci0
usbus0 on uhci0
uhci1: Intel 82801I (ICH9) USB controller port 0x1860-0x187f irq 17 at
device 26.1 on pci0
usbus1 on uhci1
uhci2: Intel 82801I (ICH9) USB controller port 0x1880-0x189f irq 18 at
device 26.2 on pci0
usbus2 on uhci2
ehci0: Intel 82801I (ICH9) USB 2.0 controller mem
0xd3326800-0xd3326bff irq 18 at device 26.7 on pci0
usbus3: EHCI version 1.0
usbus3 on ehci0
hdac0: Intel 82801I HDA Controller mem 0xd332-0xd3323fff irq 16 at
device 27.0 on pci0
pcib2: ACPI PCI-PCI bridge irq 16 at device 28.0 on pci0
pci5: ACPI PCI bus on pcib2
pcib3: ACPI PCI-PCI bridge irq 16 at device 28.4 on pci0
pci13: ACPI PCI bus on pcib3
ahci0: Marvell 88SE912x AHCI SATA controller port
0x3030-0x3037,0x3024-0x3027,0x3028-0x302f,0x3020-0x3023,0x3000-0x300f
mem 0xd300-0xd30007ff irq 16 at device 0.0 on pci13
ahci0: AHCI v1.20 with 8 6Gbps ports, Port Multiplier not supported
ahcich0: AHCI channel at channel 0 on ahci0
ahcich1: AHCI channel at channel 1 on ahci0
ahcich2: AHCI channel at channel 2 on ahci0
ahcich3: AHCI channel at channel 3 on ahci0
ahcich4: AHCI channel at channel 4 on ahci0
ahcich5: AHCI channel at channel 5 on ahci0
ahcich6: AHCI channel at channel 6 on ahci0
ahcich7: AHCI channel at channel 7 on ahci0
atapci0: Marvell 88SE912x UDMA133 controller port
0x3048-0x304f,0x303c-0x303f,0x3040-0x3047,0x3038-0x303b,0x3010-0x301f
mem 0xd3000800-0xd300080f irq 17 at device 0.1 on pci13
uhci3: Intel 82801I (ICH9) USB controller port 0x18a0-0x18bf irq 23 at
device 29.0 on pci0
usbus4 on uhci3
uhci4: Intel 82801I (ICH9) USB controller port 0x18c0-0x18df irq 22 at
device 29.1 on pci0
usbus5 on uhci4
uhci5: Intel 82801I (ICH9) USB controller port 0x18e0-0x18ff irq 18 at
device 29.2 on pci0
usbus6 on uhci5
ehci1: Intel 82801I (ICH9) USB 2.0 controller mem
0xd3326c00-0xd3326fff irq 23 at device 29.7 on pci0
usbus7: EHCI version 1.0
usbus7 on ehci1
pcib4: ACPI PCI-PCI bridge at device 30.0 on pci0
pci17: ACPI PCI bus on pcib4
atapci1: ITE IT8212F UDMA133 controller port
0x4020-0x4027,0x4014-0x4017,0x4018-0x401f,0x4010-0x4013,0x4000-0x400f
irq 23 at device 4.0 on pci17
ata2: ATA channel at channel 0 on atapci1
ata3: ATA channel at channel 1 on atapci1
isab0: PCI-ISA bridge at device 31.0 on pci0
isa0: ISA bus on isab0
ahci1: Intel ICH9 AHCI SATA controller port
0x1c70-0x1c77,0x1c64-0x1c67,0x1c68-0x1c6f,0x1c60-0x1c63,0x1c00-0x1c1f
mem 0xd3326000-0xd33267ff irq 17 at device 31.2 on pci0

Re: ZFS cache devs UNAVAIL

2012-10-23 Thread Andriy Gapon
on 23/10/2012 20:56 Michael Schmiedgen said the following:
 FreeBSD 10.0-CURRENT #0: Tue Oct 23 00:14:32 CEST 2012
 root@gizeh.smoke:/usr/obj/usr/src/sys/GIZEH amd64
...
 vdev_geom_open_by_path:519[1]: guid mismatch for provider /dev/ada0p1:
 5267967234359339128 != 0.

Thank you for this valuable information.

Do you have a rough estimate of when you started to experience this issue?

Could you please also provide output of the following command captured right
after a reboot and then after you re-add the cache disks?
$ zdb -lll /dev/ada0p


-- 
Andriy Gapon


Re: ZFS cache devs UNAVAIL

2012-10-23 Thread Matthew D. Fuller
On Tue, Oct 23, 2012 at 11:08:55PM +0300 I heard the voice of
Andriy Gapon, and lo! it spake thus:
 
 Do you have a rough estimate of when you started to experience this issue?

I saw it with r241541 and not with my previous kernel (strings says it
was r238937; July 31).  So not a very narrow range for me.  The major
changes in ZFS in that interval that I saw at a glance in the log were
the TRIM and the tasting-for-root-pool.  But I don't have any reason to
suspect them other than hey, these are high-profile.


-- 
Matthew Fuller (MF4839)   |  fulle...@over-yonder.net
Systems/Network Administrator |  http://www.over-yonder.net/~fullermd/
   On the Internet, nobody can hear you scream.


Re: ZFS cache devs UNAVAIL

2012-10-23 Thread Andriy Gapon
on 23/10/2012 23:08 Andriy Gapon said the following:
 on 23/10/2012 20:56 Michael Schmiedgen said the following:
 FreeBSD 10.0-CURRENT #0: Tue Oct 23 00:14:32 CEST 2012
 root@gizeh.smoke:/usr/obj/usr/src/sys/GIZEH amd64
 ...
 vdev_geom_open_by_path:519[1]: guid mismatch for provider /dev/ada0p1:
 5267967234359339128 != 0.
 
 Thank you for this valuable information.
 
 Do you have a rough estimate of when you started to experience this issue?
 
 Could you please also provide output of the following command captured right
 after a reboot and then after you re-add the cache disks?
 $ zdb -lll /dev/ada0p
 
 

I still would like to get the above information if possible.
But here is a patch that you can try:

--- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
+++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
@@ -270,14 +270,16 @@ vdev_geom_read_config(struct g_consumer *cp, nvlist_t **config)
 			continue;
 
 		if (nvlist_lookup_uint64(*config, ZPOOL_CONFIG_POOL_STATE,
-		    &state) != 0 || state >= POOL_STATE_DESTROYED) {
+		    &state) != 0 || state == POOL_STATE_DESTROYED ||
+		    state > POOL_STATE_L2CACHE) {
 			nvlist_free(*config);
 			*config = NULL;
 			continue;
 		}
 
-		if (nvlist_lookup_uint64(*config, ZPOOL_CONFIG_POOL_TXG,
-		    &txg) != 0 || txg == 0) {
+		if (state != POOL_STATE_SPARE && state != POOL_STATE_L2CACHE &&
+		    (nvlist_lookup_uint64(*config, ZPOOL_CONFIG_POOL_TXG,
+		    &txg) != 0 || txg == 0)) {
 			nvlist_free(*config);
 			*config = NULL;
 			continue;



I think that I introduced this bug because I used some old OpenSolaris code as
an inspiration and completely missed the new states.
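
To make the effect of the two hunks concrete, here is a minimal standalone
sketch, not the kernel code itself.  It assumes the ZFS pool_state numbering
(ACTIVE = 0 through L2CACHE = 4, which agrees with the "state: 4" that zdb
prints for the cache partitions in this thread) and assumes the pre-patch
comparison was ">=", since the operator did not survive in the archived diff.
The names label_ok_old() and label_ok_new() are hypothetical, and the txg
argument stands in for the result of the ZPOOL_CONFIG_POOL_TXG lookup, with 0
meaning "absent or zero".

```c
#include <stdbool.h>
#include <stdint.h>

/* Pool states as stored in a vdev label (assumed numbering, see above). */
enum pool_state {
	POOL_STATE_ACTIVE = 0,
	POOL_STATE_EXPORTED,
	POOL_STATE_DESTROYED,
	POOL_STATE_SPARE,
	POOL_STATE_L2CACHE
};

/*
 * Hypothetical model of the check before the patch: every state at or
 * above DESTROYED is dropped, which also drops SPARE and L2CACHE labels.
 */
static bool
label_ok_old(uint64_t state, uint64_t txg)
{
	if (state >= POOL_STATE_DESTROYED)
		return (false);
	if (txg == 0)
		return (false);
	return (true);
}

/*
 * Hypothetical model of the check after the patch: only DESTROYED and
 * unknown states are dropped, and spare/cache labels are not required
 * to carry a non-zero pool txg.
 */
static bool
label_ok_new(uint64_t state, uint64_t txg)
{
	if (state == POOL_STATE_DESTROYED || state > POOL_STATE_L2CACHE)
		return (false);
	if (state != POOL_STATE_SPARE && state != POOL_STATE_L2CACHE &&
	    txg == 0)
		return (false);
	return (true);
}
```

Under these assumptions label_ok_old(POOL_STATE_L2CACHE, 0) is false while
label_ok_new(POOL_STATE_L2CACHE, 0) is true.  A rejected label would plausibly
leave the guid read from the provider at 0, which would fit the
"5267967234359339128 != 0" mismatch in the debug output.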

-- 
Andriy Gapon


Re: ZFS cache devs UNAVAIL

2012-10-23 Thread Michael Schmiedgen
Hi Andy,

thank you for your reply. I will test your patch right now
and give you feedback.

On 10/23/12 22:08, Andriy Gapon wrote:
 on 23/10/2012 20:56 Michael Schmiedgen said the following:
 ...
 vdev_geom_open_by_path:519[1]: guid mismatch for provider /dev/ada0p1:
 5267967234359339128 != 0.
 
 Thank you for this valuable information.
 
 Do you have a rough estimate of when you started to experience this issue?

I have experienced this since my 2012-10-17 build. I build every
3-5 weeks.

 Could you please also provide output of the following command captured right
 after a reboot and then after you re-add the cache disks?
 $ zdb -lll /dev/ada0p



Here is the data in the UNAVAIL state:

# zdb -lll /dev/ada0p1

LABEL 0

version: 5000
state: 4
guid: 5267967234359339128

LABEL 1

version: 5000
state: 4
guid: 5267967234359339128

LABEL 2

version: 5000
state: 4
guid: 5267967234359339128

LABEL 3

version: 5000
state: 4
guid: 5267967234359339128

# zdb -lll /dev/ada1p1

LABEL 0

version: 5000
state: 4
guid: 5693315451104805234

LABEL 1

version: 5000
state: 4
guid: 5693315451104805234

LABEL 2

version: 5000
state: 4
guid: 5693315451104805234

LABEL 3

version: 5000
state: 4
guid: 5693315451104805234



Here is the data after re-adding the two devs:

# zdb -lll /dev/ada0p1

LABEL 0

version: 5000
state: 4
guid: 13019058935211054376

LABEL 1

version: 5000
state: 4
guid: 13019058935211054376

LABEL 2

version: 5000
state: 4
guid: 13019058935211054376

LABEL 3

version: 5000
state: 4
guid: 13019058935211054376


# zdb -lll /dev/ada1p1

LABEL 0

version: 5000
state: 4
guid: 1347428618237802818

LABEL 1

version: 5000
state: 4
guid: 1347428618237802818

LABEL 2

version: 5000
state: 4
guid: 1347428618237802818

LABEL 3

version: 5000
state: 4
guid: 1347428618237802818
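
For anyone who wants to compare such dumps mechanically, here is a rough
sketch in C.  It assumes the label text looks exactly like the output above
("state: N" and "guid: N" lines, one pair per label); the struct and the
function summarize_labels() are made up for illustration and are not part of
zdb or libzfs.  Note that a guid such as 13019058935211054376 does not fit in
a signed 64-bit integer, so the sketch parses into uint64_t.

```c
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical summary of one "zdb -lll" dump. */
struct label_summary {
	int	 nlabels;	/* number of "state:" lines seen */
	uint64_t state;		/* last state value seen */
	uint64_t guid;		/* last guid value seen */
	bool	 consistent;	/* all labels agree on state and guid */
};

static struct label_summary
summarize_labels(const char *text)
{
	struct label_summary s = { 0, 0, 0, true };
	int nguids = 0;

	for (const char *p = text; *p != '\0'; ) {
		size_t len = strcspn(p, "\n");
		char line[128];
		uint64_t v;

		/* Copy one line so sscanf() cannot run into the next one. */
		snprintf(line, sizeof(line), "%.*s", (int)len, p);
		if (sscanf(line, " state: %" SCNu64, &v) == 1) {
			if (s.nlabels++ > 0 && v != s.state)
				s.consistent = false;
			s.state = v;
		} else if (sscanf(line, " guid: %" SCNu64, &v) == 1) {
			if (nguids++ > 0 && v != s.guid)
				s.consistent = false;
			s.guid = v;
		}
		p += len;
		if (*p == '\n')
			p++;
	}
	return (s);
}
```

Fed the four labels of one device from the dumps above, this would report
state 4 and a single guid with consistent set to true; the guid itself differs
between the UNAVAIL dump and the dump taken after re-adding the devices.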


I will post the data after build/install/reboot soon.

Thanks,
  Michael



Re: ZFS cache devs UNAVAIL

2012-10-23 Thread Michael Schmiedgen
On 10/23/12 22:23, Andriy Gapon wrote:
 on 23/10/2012 23:08 Andriy Gapon said the following:
 on 23/10/2012 20:56 Michael Schmiedgen said the following:
 ...
 vdev_geom_open_by_path:519[1]: guid mismatch for provider /dev/ada0p1:
 5267967234359339128 != 0.

 Thank you for this valuable information.

 Do you have a rough estimate of when you started to experience this issue?

 Could you please also provide output of the following command captured right
 after a reboot and then after you re-add the cache disks?
 $ zdb -lll /dev/ada0p


 
 I still would like to get the above information if possible.
 But here is a patch that you can try:
 
 --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
 +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
 @@ -270,14 +270,16 @@ vdev_geom_read_config(struct g_consumer *cp, nvlist_t **config)
  			continue;
  
  		if (nvlist_lookup_uint64(*config, ZPOOL_CONFIG_POOL_STATE,
 -		    &state) != 0 || state >= POOL_STATE_DESTROYED) {
 +		    &state) != 0 || state == POOL_STATE_DESTROYED ||
 +		    state > POOL_STATE_L2CACHE) {
  			nvlist_free(*config);
  			*config = NULL;
  			continue;
  		}
  
 -		if (nvlist_lookup_uint64(*config, ZPOOL_CONFIG_POOL_TXG,
 -		    &txg) != 0 || txg == 0) {
 +		if (state != POOL_STATE_SPARE && state != POOL_STATE_L2CACHE &&
 +		    (nvlist_lookup_uint64(*config, ZPOOL_CONFIG_POOL_TXG,
 +		    &txg) != 0 || txg == 0)) {
  			nvlist_free(*config);
  			*config = NULL;
  			continue;

This works for me. Thank you very much! :) For zdb data see below,
it has not changed since patch-apply/readd/reboot.

  Michael



# zdb -lll /dev/ada0p1

LABEL 0

version: 5000
state: 4
guid: 13019058935211054376

LABEL 1

version: 5000
state: 4
guid: 13019058935211054376

LABEL 2

version: 5000
state: 4
guid: 13019058935211054376

LABEL 3

version: 5000
state: 4
guid: 13019058935211054376

# zdb -lll /dev/ada1p1

LABEL 0

version: 5000
state: 4
guid: 1347428618237802818

LABEL 1

version: 5000
state: 4
guid: 1347428618237802818

LABEL 2

version: 5000
state: 4
guid: 1347428618237802818

LABEL 3

version: 5000
state: 4
guid: 1347428618237802818



Re: ZFS cache devs UNAVAIL

2012-10-23 Thread Michael Schmiedgen
On 10/23/12 23:57, Florian Smeets wrote:
 My NAS experienced the same problem. I thought the old IDE SSD had just died
 of old age, that's why I didn't investigate further yet. :)

I have 2 physical SSDs: the first partition of each is striped
as cache for my main zpool (the cache devs that went UNAVAIL), and
the second partition of each forms a mirrored temp zpool (ONLINE).
So I saw good chances to *not* blame the hardware. ;)

 With the patch the cache device is back.

Works here, too.

Michael



Re: ZFS cache devs UNAVAIL

2012-10-23 Thread Florian Smeets
On 23.10.12 22:23, Andriy Gapon wrote:
 on 23/10/2012 23:08 Andriy Gapon said the following:
 on 23/10/2012 20:56 Michael Schmiedgen said the following:
 FreeBSD 10.0-CURRENT #0: Tue Oct 23 00:14:32 CEST 2012
 root@gizeh.smoke:/usr/obj/usr/src/sys/GIZEH amd64
 ...
 vdev_geom_open_by_path:519[1]: guid mismatch for provider /dev/ada0p1:
 5267967234359339128 != 0.

 Thank you for this valuable information.

 Do you have a rough estimate of when you started to experience this issue?

 Could you please also provide output of the following command captured right
 after a reboot and then after you re-add the cache disks?
 $ zdb -lll /dev/ada0p


 
 I still would like to get the above information if possible.
 But here is a patch that you can try:
 
 
 I think that I introduced this bug because I used some old OpenSolaris code as
 an inspiration and completely missed the new states.
 

My NAS experienced the same problem. I thought the old IDE SSD had just died
of old age, that's why I didn't investigate further yet. :)

With the patch the cache device is back.

Thanks,
Florian





Re: ZFS cache devs UNAVAIL

2012-10-22 Thread Matthew D. Fuller
On Tue, Oct 23, 2012 at 12:57:34AM +0200 I heard the voice of
Michael Schmiedgen, and lo! it spake thus:
 
 after an update to CURRENT 2012-10-17 my ZFS cache devs are marked
 UNAVAIL after boot. These two devs are SSD partitions that are listed
 with some weird numbers (see below). Before that they were listed
 fine as ada0p1 and ada1p1.

I saw this after my update to 10.0-CURRENT #0 r241541: Sun Oct 14.

In my case, it's ada2p2 which is the cache that comes up unavail on
boot.  One notable thing may be that p1 is used for ZIL, and comes up
fine.

NAME        STATE     READ WRITE CKSUM
d           ONLINE       0     0     0
  mirror-0  ONLINE       0     0     0
    ada1p3  ONLINE       0     0     0
    ada0p3  ONLINE       0     0     0
logs
  ada2p1    ONLINE       0     0     0
cache
  ada2p2    ONLINE       0     0     0


I notice that you also have a second partition on your drives that's
part of another pool.  Maybe it's related to something giving up after
assigning one partition from the drive to zpool somewhere?  Though in
your case it's p2 that's working and p1 that's wandered off, so maybe
that's not it...


-- 
Matthew Fuller (MF4839)   |  fulle...@over-yonder.net
Systems/Network Administrator |  http://www.over-yonder.net/~fullermd/
   On the Internet, nobody can hear you scream.