Re: [zfs-discuss] Metadata corrupted
> Were you able to fix this problem in the end?

Unfortunately, no. I believe Matthew Ahrens took a look at it and couldn't find the cause or a fix. We had to destroy the pool and re-create it from scratch. Fortunately, this happened during our ZFS testing period, so no critically important data was lost, but I am still a bit shaken by the incident.

Since then we did eventually adopt ZFS, and it has been running well without further such problems for over a year now. This leads me to believe it was either a software bug, or a hardware failure that triggered a fatal condition in software that should have been resilient to errors in a redundant configuration. I am sincerely hoping that this has been fixed, on purpose or by accident.

Cheers,
Siegfried

This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] NFS and Tar/Star Performance
On 12-Jun-07, at 9:02 AM, eric kustarz wrote:

> Comparing a ZFS pool made out of a single disk to a single UFS
> filesystem would be a fair comparison. What does your storage look like?

The storage looks like:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c0t0d0  ONLINE       0     0     0
            c0t1d0  ONLINE       0     0     0
            c0t2d0  ONLINE       0     0     0
            c0t4d0  ONLINE       0     0     0
            c0t5d0  ONLINE       0     0     0
            c0t6d0  ONLINE       0     0     0

All disks are local SATA/300 drives attached to a Marvell card using the SATA framework. The drives are consumer models with 16MB cache.

I agree it's not a fair comparison, especially with raidz over 6 drives. However, a performance difference of 10x is fairly large. I do not have a single drive available right now to test ZFS against UFS, but I have done similar tests in the past with one ZFS drive (without write cache, etc.) vs. a UFS drive of the same brand/size. The ZFS drive was still on the order of 10x slower over NFS.

What could cause such a large difference? Is there a way to measure NFS COMMIT latency?

Cheers,
Siegfried
[zfs-discuss] NFS and Tar/Star Performance
This is an old topic, discussed many times at length. However, I still wonder whether there are any workarounds to this issue other than disabling the ZIL, since it makes ZFS over NFS almost unusable (a whole order of magnitude slower). My understanding is that the ball is in NFS's court due to ZFS's design.

The test results are below: a Solaris 10u3 AMD64 server with a Mac client over gigabit ethernet. The filesystem is on a 6-disk raidz1 pool; the test is untarring (with bzip2) the Linux 2.6.21 source code. The archive is stored locally and extracted remotely.

Locally
---
tar xfvj linux-2.6.21.tar.bz2
    real 4m4.094s, user 0m44.732s, sys 0m26.047s
star xfv linux-2.6.21.tar.bz2
    real 1m47.502s, user 0m38.573s, sys 0m22.671s

Over NFS
---
tar xfvj linux-2.6.21.tar.bz2
    real 48m22.685s, user 0m45.703s, sys 0m59.264s
star xfv linux-2.6.21.tar.bz2
    real 49m13.574s, user 0m38.996s, sys 0m35.215s
star -no-fsync -x -v -f linux-2.6.21.tar.bz2
    real 49m32.127s, user 0m38.454s, sys 0m36.197s

The performance seems pretty bad; let's see how other protocols fare.

Over Samba
---
tar xfvj linux-2.6.21.tar.bz2
    real 4m34.952s, user 0m44.325s, sys 0m27.404s
star xfv linux-2.6.21.tar.bz2
    real 4m2.998s, user 0m44.121s, sys 0m29.214s
star -no-fsync -x -v -f linux-2.6.21.tar.bz2
    real 4m13.352s, user 0m44.239s, sys 0m29.547s

Over AFP
---
tar xfvj linux-2.6.21.tar.bz2
    real 3m58.405s, user 0m43.132s, sys 0m40.847s
star xfv linux-2.6.21.tar.bz2
    real 19m44.212s, user 0m38.535s, sys 0m38.866s
star -no-fsync -x -v -f linux-2.6.21.tar.bz2
    real 3m21.976s, user 0m42.529s, sys 0m39.529s

Samba and AFP are much faster, except the fsync'ed star over AFP. Is this a ZFS or an NFS issue?

Over NFS to a non-ZFS drive
---
tar xfvj linux-2.6.21.tar.bz2
    real 5m0.211s, user 0m45.330s, sys 0m50.118s
star xfv linux-2.6.21.tar.bz2
    real 3m26.053s, user 0m43.069s, sys 0m33.726s
star -no-fsync -x -v -f linux-2.6.21.tar.bz2
    real 3m55.522s, user 0m42.749s, sys 0m35.294s

It looks like ZFS is the culprit here.
The untarring is much faster to a single 80 GB UFS drive than to the 6-disk raid-z array over NFS.

Cheers,
Siegfried

PS. Getting netatalk to compile on amd64 Solaris required some changes, since i386 wasn't being defined anymore, and somehow it thought the architecture was sparc64 for some linking steps.
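For what it's worth, the numbers above look like per-file commit cost rather than raw throughput: tar/star create thousands of small files, and the NFS client has to see each one committed to stable storage before moving on, while Samba/AFP let the server cache. A small local sketch (a hypothetical illustration of the effect, not the NFS code path) shows how an fsync() after every small file compares with letting the page cache batch the writes:

```python
import os
import tempfile
import time

def untar_like(root, nfiles=200, size=4096, sync_each=False):
    """Write many small files, optionally fsync'ing each one
    (crudely mimicking a client committing every file to disk)."""
    os.makedirs(root, exist_ok=True)
    payload = b"x" * size
    t0 = time.time()
    for i in range(nfiles):
        with open(os.path.join(root, f"f{i}"), "wb") as f:
            f.write(payload)
            if sync_each:
                f.flush()
                os.fsync(f.fileno())   # force the data to stable storage
    return time.time() - t0

with tempfile.TemporaryDirectory() as d:
    fast = untar_like(os.path.join(d, "batched"), sync_each=False)
    slow = untar_like(os.path.join(d, "synced"), sync_each=True)
    print(f"batched: {fast:.3f}s  fsync-per-file: {slow:.3f}s")
```

On a spinning disk the fsync-per-file case is typically many times slower, which is the same shape as the NFS-vs-Samba gap reported here; the exact ratio depends on the disk and filesystem.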
Re: [zfs-discuss] Saving scrub results before scrub completes
problem where if I send certain files over the network (CIFS or NFS), the machine slows to a crawl until it is "hung". This is reproducible every time with the same "special" files, but it does not happen locally, only over the network. I already posted about this in network-discuss and am currently investigating the issue.

> Additionally, you can look at the corefile using mdb and take a look
> at the vdev error stats. Here's an example (hopefully the formatting
> doesn't get messed up):

Excellent information, thanks! It looks like there are no read/write/chksum errors. I now at least have a way of checking the scrub results until the panic is fixed (hopefully someday).

Siegfried

> > ::spa -v
> ADDR          STATE    NAME
> 060004473680  ACTIVE   test
>
> ADDR          STATE    AUX  DESCRIPTION
> 060004bcb500  HEALTHY  -    root
> 060004bcafc0  HEALTHY  -      /dev/dsk/c0t2d0s0
>
> > 060004bcb500::vdev -re
> ADDR          STATE    AUX  DESCRIPTION
> 060004bcb500  HEALTHY  -    root
>
>             READ      WRITE      FREE  CLAIM  IOCTL
> OPS         0         0          0     0      0
> BYTES       0         0          0     0      0
> EREAD       0
> EWRITE      0
> ECKSUM      0
>
> 060004bcafc0  HEALTHY  -    /dev/dsk/c0t2d0s0
>
>             READ      WRITE      FREE  CLAIM  IOCTL
> OPS         0x17      0x1d2      0     0      0
> BYTES       0x19c000  0x11da000  0     0      0
> EREAD       0
> EWRITE      0
> ECKSUM      0
>
> This will show you any read/write/cksum errors.
>
> Thanks,
> George

Siegfried Nikolaivich wrote:

> Hello All, I am wondering if there is a way to save the scrub results
> right before the scrub is complete. After upgrading to Solaris 10U3 I
> still have ZFS panicking right as the scrub completes. The scrub
> results seem to be "cleared" when the system boots back up, so I never
> get a chance to see them. Does anyone know of a simple way?
[zfs-discuss] Saving scrub results before scrub completes
Hello All,

I am wondering if there is a way to save the scrub results right before the scrub is complete. After upgrading to Solaris 10U3, I still have ZFS panicking right as the scrub completes. The scrub results seem to be "cleared" when the system boots back up, so I never get a chance to see them. Does anyone know of a simple way?
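Until the panic itself is fixed, one low-tech workaround (a sketch under my own assumptions, not a supported tool — the pool name `tank` and the log path are examples) is to append `zpool status -v` output to a file at intervals while the scrub runs, so the last report survives the reboot:

```python
import os
import shutil
import subprocess
import time

def log_status(cmd, logfile, interval=60, iterations=10):
    """Append the output of `cmd` to `logfile` every `interval` seconds,
    fsync'ing each snapshot so the latest report survives a panic.
    `iterations` bounds the loop for this sketch; a real run would loop
    until the scrub ends."""
    with open(logfile, "a") as out:
        for _ in range(iterations):
            result = subprocess.run(cmd, capture_output=True, text=True)
            out.write(time.strftime("--- %Y-%m-%d %H:%M:%S ---\n"))
            out.write(result.stdout)
            out.flush()
            os.fsync(out.fileno())   # make it stable before a possible panic
            time.sleep(interval)

# Only meaningful on a host that actually has the zpool command.
if shutil.which("zpool"):
    log_status(["zpool", "status", "-v", "tank"], "/var/tmp/scrub-status.log")
```

Crude, but the last appended snapshot taken just before completion should show roughly what the final scrub report would have said.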
Re: [zfs-discuss] Panic while scrubbing
On 24-Oct-06, at 9:47 PM, James McPherson wrote:

> Could you look through your msgbuf and/or /var/adm/messages and find
> the full text of when these Illegal Request errors were logged. That
> will give an idea of where to look next.

OK, it doesn't look like it's the controller; I ran some tests and it functions just as well as it used to. I have no idea why it keeps panicking during the scrub... it doesn't seem hardware related.

Cheers,
Siegfried
Re: [zfs-discuss] Panic while scrubbing
On 24-Oct-06, at 9:47 PM, James McPherson wrote:

> On 10/25/06, Siegfried Nikolaivich <[EMAIL PROTECTED]> wrote:
> > And this is shown on the rest of the ports:
> > c0t?d0 Soft Errors: 6 Hard Errors: 0 Transport Errors: 0
> > Vendor: ATA Product: ST3320620AS Revision: C Serial No:
> > Size: 320.07GB <320072932864 bytes>
> > Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
> > Illegal Request: 6 Predictive Failure Analysis: 0
>
> Hmm. All your disks attached to the same controller and showing
> entries in the Illegal Request field. What's the common component
> between them - the cable?

I guess the common component between them is the power supply. Each drive has its own SATA cable connected directly to the controller.

> Could you look through your msgbuf and/or /var/adm/messages and find
> the full text of when these Illegal Request errors were logged. That
> will give an idea of where to look next.

That is the part I can't figure out. Nowhere does it say "Illegal Request" except when I run iostat -nE.

I found out that the "Illegal Request" count can be incremented on the ZFS drives by starting a scrub. For example:

# iostat -nE
...
c0t2d0 Soft Errors: 8 Hard Errors: 0 Transport Errors: 0
Vendor: ATA Product: ST3320620AS Revision: C Serial No:
Size: 320.07GB <320072932864 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 8 Predictive Failure Analysis: 0
c0t3d0 Soft Errors: 24 Hard Errors: 0 Transport Errors: 0
Vendor: ATA Product: ST3320620AS Revision: C Serial No:
Size: 320.07GB <320072932864 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 24 Predictive Failure Analysis: 0
...
# zpool scrub tank
# iostat -nE
...
c0t2d0 Soft Errors: 9 Hard Errors: 0 Transport Errors: 0
Vendor: ATA Product: ST3320620AS Revision: C Serial No:
Size: 320.07GB <320072932864 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 9 Predictive Failure Analysis: 0
c0t3d0 Soft Errors: 24 Hard Errors: 0 Transport Errors: 0
Vendor: ATA Product: ST3320620AS Revision: C Serial No:
Size: 320.07GB <320072932864 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 24 Predictive Failure Analysis: 0
...
# zpool scrub -s tank

(no panic at this point)

Happens every time.

Thanks,
Siegfried
Re: [zfs-discuss] Panic while scrubbing
On 24-Oct-06, at 9:11 PM, James McPherson wrote:

> this error from the marvell88sx driver is of concern. The 10b8b decode
> and disparity error messages make me think that you have a bad piece
> of hardware. I hope it's not your controller but I can't tell without
> more data. You should have a look at the iostat -En output for the
> device on marvell88sx instance #0, attached as port 3. If there are
> any error counts above 0 then - after checking /var/adm/messages for
> medium errors - you should probably replace the disk.

I have just tried another 'zpool scrub' and got the same result: a panic right when the scrub finishes (no errors found during or after the panic). So the problem is reproducible (and might not be an intermittent hardware malfunction).

It is funny that I get the marvell88sx driver error for port 3, as that is the Solaris UFS drive, whereas the rest of the ports are set up for ZFS. Since the scrub seems to be causing the panic, I don't see why an error on the root drive would be the root cause. Note that this error comes in the log after it has started dumping the panic: "genunix: [ID 111219 kern.notice] dumping to /dev/dsk/c0t3d0s1, offset 860356608, content: kernel"

By the way, this is what iostat -En shows for port 3:

c0t3d0 Soft Errors: 24 Hard Errors: 0 Transport Errors: 0
Vendor: ATA Product: ST3320620AS Revision: C Serial No:
Size: 320.07GB <320072932864 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 24 Predictive Failure Analysis: 0

And this is shown on the rest of the ports:

c0t?d0 Soft Errors: 6 Hard Errors: 0 Transport Errors: 0
Vendor: ATA Product: ST3320620AS Revision: C Serial No:
Size: 320.07GB <320072932864 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 6 Predictive Failure Analysis: 0

Thanks,
Siegfried
[zfs-discuss] Panic while scrubbing
Hello,

I am not sure if I am posting in the correct forum, but it seems somewhat ZFS related, so I thought I'd share it.

While the machine was idle, I started a scrub. Around the time the scrubbing was supposed to be finished, the machine panicked. This might be related to the 'metadata corruption' that happened to me earlier. Here is the log; any ideas?

Oct 24 20:13:51 FServe unix: [ID 836849 kern.notice]
Oct 24 20:13:51 FServe ^Mpanic[cpu0]/thread=fe8000311c80:
Oct 24 20:13:51 FServe genunix: [ID 683410 kern.notice] BAD TRAP: type=e (#pf Page fault) rp=fe80003119c0 addr=fe00e24c6218
Oct 24 20:13:51 FServe unix: [ID 10 kern.notice]
Oct 24 20:13:51 FServe unix: [ID 839527 kern.notice] sched:
Oct 24 20:13:51 FServe unix: [ID 753105 kern.notice] #pf Page fault
Oct 24 20:13:51 FServe unix: [ID 532287 kern.notice] Bad kernel fault at addr=0xfe00e24c6218
Oct 24 20:13:51 FServe unix: [ID 243837 kern.notice] pid=0, pc=0xfb92c360, sp=0xfe8000311ab0, eflags=0x10282
Oct 24 20:13:51 FServe unix: [ID 211416 kern.notice] cr0: 8005003b cr4: 6f0
Oct 24 20:13:51 FServe unix: [ID 354241 kern.notice] cr2: fe00e24c6218 cr3: a22b000 cr8: c
Oct 24 20:13:51 FServe unix: [ID 592667 kern.notice] rdi: 84233e88 rsi: fe00e24c6208 rdx: 3f8038931883
Oct 24 20:13:51 FServe unix: [ID 592667 kern.notice] rcx: 0 r8: 1 r9:
Oct 24 20:13:51 FServe unix: [ID 592667 kern.notice] rax: 2 rbx: fe80eb90f7c0 rbp: fe8000311ab0
Oct 24 20:13:51 FServe unix: [ID 592667 kern.notice] r10: a5de7488 r11: 1 r12: 84233e88
Oct 24 20:13:51 FServe unix: [ID 592667 kern.notice] r13: 2 r14: fe80eb90f7c0 r15: 84233dd8
Oct 24 20:13:51 FServe unix: [ID 592667 kern.notice] fsb: 8000 gsb: fbc24060 ds: 43
Oct 24 20:13:51 FServe unix: [ID 592667 kern.notice] es: 43 fs: 0 gs: 1c3
Oct 24 20:13:51 FServe unix: [ID 592667 kern.notice] trp: e err: 0 rip: fb92c360
Oct 24 20:13:51 FServe unix: [ID 592667 kern.notice] cs: 28 rfl: 10282 rsp: fe8000311ab0
Oct 24 20:13:51 FServe unix: [ID 266532 kern.notice] ss: 30
Oct 24 20:13:51 FServe unix: [ID 10 kern.notice]
Oct 24 20:13:51 FServe genunix: [ID 655072 kern.notice] fe80003118d0 unix:real_mode_end+58d1 ()
Oct 24 20:13:51 FServe genunix: [ID 655072 kern.notice] fe80003119b0 unix:trap+d77 ()
Oct 24 20:13:51 FServe genunix: [ID 655072 kern.notice] fe80003119c0 unix:_cmntrap+13f ()
Oct 24 20:13:51 FServe genunix: [ID 655072 kern.notice] fe8000311ab0 genunix:avl_insert+60 ()
Oct 24 20:13:51 FServe genunix: [ID 655072 kern.notice] fe8000311ae0 genunix:avl_add+33 ()
Oct 24 20:13:51 FServe genunix: [ID 655072 kern.notice] fe8000311b60 zfs:vdev_queue_io_to_issue+1ec ()
Oct 24 20:13:51 FServe genunix: [ID 655072 kern.notice] fe8000311ba0 zfs:zfsctl_ops_root+33c6e7a1 ()
Oct 24 20:13:51 FServe genunix: [ID 655072 kern.notice] fe8000311bc0 zfs:vdev_disk_io_done+11 ()
Oct 24 20:13:51 FServe genunix: [ID 655072 kern.notice] fe8000311bd0 zfs:vdev_io_done+12 ()
Oct 24 20:13:51 FServe genunix: [ID 655072 kern.notice] fe8000311be0 zfs:zio_vdev_io_done+1b ()
Oct 24 20:13:51 FServe genunix: [ID 655072 kern.notice] fe8000311c60 genunix:taskq_thread+bc ()
Oct 24 20:13:51 FServe genunix: [ID 655072 kern.notice] fe8000311c70 unix:thread_start+8 ()
Oct 24 20:13:51 FServe unix: [ID 10 kern.notice]
Oct 24 20:13:51 FServe genunix: [ID 672855 kern.notice] syncing file systems...
Oct 24 20:13:51 FServe genunix: [ID 904073 kern.notice] done
Oct 24 20:13:52 FServe genunix: [ID 111219 kern.notice] dumping to /dev/dsk/c0t3d0s1, offset 860356608, content: kernel
Oct 24 20:13:52 FServe marvell88sx: [ID 812950 kern.warning] WARNING: marvell88sx0: error on port 3:
Oct 24 20:13:52 FServe marvell88sx: [ID 517869 kern.info] device disconnected
Oct 24 20:13:52 FServe marvell88sx: [ID 517869 kern.info] device connected
Oct 24 20:13:52 FServe marvell88sx: [ID 517869 kern.info] SError interrupt
Oct 24 20:13:52 FServe marvell88sx: [ID 131198 kern.info] SErrors:
Oct 24 20:13:52 FServe marvell88sx: [ID 517869 kern.info] Recovered communication error
Oct 24 20:13:52 FServe marvell88sx: [ID 517869 kern.info] PHY ready change
Oct 24 20:13:52 FServe marvell88sx: [ID 517869 kern.info] 10-bit to 8-bit decode error
Oct 24 20:13:52 FServe marvell88sx: [ID 517869 kern.info] Disparity error
Oct 24 20:13:57 FServe genunix: [ID 409368 kern.notice] ^M100% done: 150751 pages dumped, compression ratio 4.23,
Oct 24 20:13:57 FServe genunix: [ID 851671 kern.notice] dump succeeded

Thanks,
Siegfried
[zfs-discuss] Re: Re: Re: Metadata corrupted
> On Mon, Oct 09, 2006 at 11:08:14PM -0700, Matthew Ahrens wrote:
> You may also want to try 'fmdump -eV' to get an idea of what those
> faults were.

I am not sure how to interpret the results; maybe you can help me. It looks like the following, with many more similar pages following:

% fmdump -eV
TIME                           CLASS
Oct 07 2006 17:28:48.265102839 ereport.fs.zfs.checksum
nvlist version: 0
    class = ereport.fs.zfs.checksum
    ena = 0x933872163a1
    detector = (embedded nvlist)
    nvlist version: 0
        version = 0x0
        scheme = zfs
        pool = 0xbe23c6961def3450
        vdev = 0x46f50fe03a3fd818
    (end detector)
    pool = tank
    pool_guid = 0xbe23c6961def3450
    pool_context = 0
    vdev_guid = 0x46f50fe03a3fd818
    vdev_type = disk
    vdev_path = /dev/dsk/c0t1d0s0
    parent_guid = 0x3bb6ede3be1cf975
    parent_type = raidz
    zio_err = 0
    zio_offset = 0x1c3644ae00
    zio_size = 0xac00
    zio_objset = 0x20
    zio_object = 0x78
    zio_level = 0
    zio_blkid = 0xafaf
    __ttl = 0x1
    __tod = 0x45284640 0xfcd25f7

Oct 07 2006 17:31:24.616729701 ereport.fs.zfs.checksum
nvlist version: 0
    class = ereport.fs.zfs.checksum
    ena = 0xb7a0bad55900401
    detector = (embedded nvlist)
    nvlist version: 0
        version = 0x0
        scheme = zfs
        pool = 0xbe23c6961def3450
        vdev = 0xa543197df30d1460
    (end detector)
    pool = tank
    pool_guid = 0xbe23c6961def3450
    pool_context = 0
    vdev_guid = 0xa543197df30d1460
    vdev_type = disk
    vdev_path = /dev/dsk/c0t2d0s0
    parent_guid = 0x3bb6ede3be1cf975
    parent_type = raidz
    zio_err = 0
    zio_offset = 0x30d218e00
    zio_size = 0xac00
    zio_objset = 0x20
    zio_object = 0xea
    zio_level = 0
    zio_blkid = 0x7577
    __ttl = 0x1
    __tod = 0x452846dc 0x24c28c65

Oct 07 2006 17:31:24.903968466 ereport.fs.zfs.checksum
nvlist version: 0
    class = ereport.fs.zfs.checksum
    ena = 0xb7b1da39251
    detector = (embedded nvlist)
    nvlist version: 0
        version = 0x0
        scheme = zfs
        pool = 0xbe23c6961def3450
        vdev = 0x46f50fe03a3fd818
    (end detector)
    pool = tank
    pool_guid = 0xbe23c6961def3450
    pool_context = 0
    vdev_guid = 0x46f50fe03a3fd818
    vdev_type = disk
    vdev_path = /dev/dsk/c0t1d0s0
    parent_guid = 0x3bb6ede3be1cf975
    parent_type = raidz
    zio_err = 0
    zio_offset = 0x30e558800
    zio_size = 0xac00
    zio_objset = 0x20
    zio_object = 0xea
    zio_level = 0
    zio_blkid = 0x7724
    __ttl = 0x1
    __tod = 0x452846dc 0x35e176d2

Oct 07 2006 17:31:52.178481693 ereport.fs.zfs.checksum
nvlist version: 0
    class = ereport.fs.zfs.checksum
    ena = 0xbe0bb6f3b11
    detector = (embedded nvlist)
    nvlist version: 0
        version = 0x0
        scheme = zfs
        pool = 0xbe23c6961def3450
        vdev = 0xa543197df30d1460
    (end detector)
    pool = tank
    pool_guid = 0xbe23c6961def3450
    pool_context = 0
    vdev_guid = 0xa543197df30d1460
    vdev_type = disk
    vdev_path = /dev/dsk/c0t2d0s0
    parent_guid = 0x3bb6ede3be1cf975
    parent_type = raidz
    zio_err = 0
    zio_offset = 0x375e12800
    zio_size = 0xac00
    zio_objset = 0x20
    zio_object = 0xec
    zio_level = 0
    zio_blkid = 0x7788
    __ttl = 0x1
    __tod = 0x452846f8 0xaa36a1d

Cheers,
Albert
[zfs-discuss] Re: Re: Re: Metadata corrupted
> Yeah, good catch. So this means that it seems to be able to read the
> label off of each device OK, and the labels look good. I'm not sure
> what else would cause us to be unable to open the pool... Can you try
> running 'zpool status -v'?

The command seems to return the same thing:

% zpool status -v
  pool: tank
 state: FAULTED
status: The pool metadata is corrupted and the pool cannot be opened.
action: Destroy and re-create the pool from a backup source.
   see: http://www.sun.com/msg/ZFS-8000-CS
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank        FAULTED      0     0     6  corrupted data
          raidz     ONLINE       0     0     6
            c0t0d0  ONLINE       0     0     0
            c0t1d0  ONLINE       0     0     0
            c0t2d0  ONLINE       0     0     0
            c0t4d0  ONLINE       0     0     0

I can provide you with SSH access if you want.

Thanks,
Siegfried
[zfs-discuss] Re: Re: Metadata corrupted
> zdb -l /dev/dsk/c0t1d0

Sorry for posting again, but I think you might have meant /dev/dsk/c0t1d0s0 there. The only difference between the following outputs is the guid for each device.

# zdb -l /dev/dsk/c0t0d0s0

LABEL 0
    version=2
    name='tank'
    state=0
    txg=225992
    pool_guid=13701012839440790608
    top_guid=4302888056402016629
    guid=15193057146179069576
    vdev_tree
        type='raidz'
        id=0
        guid=4302888056402016629
        metaslab_array=13
        metaslab_shift=33
        ashift=9
        asize=1280238944256
        children[0]: type='disk' id=0 guid=15193057146179069576 path='/dev/dsk/c0t0d0s0' whole_disk=1 DTL=122
        children[1]: type='disk' id=1 guid=5113010407673419800 path='/dev/dsk/c0t1d0s0' whole_disk=1 DTL=121
        children[2]: type='disk' id=2 guid=11908389868437050464 path='/dev/dsk/c0t2d0s0' whole_disk=1 DTL=120
        children[3]: type='disk' id=3 guid=800140628824658935 path='/dev/dsk/c0t4d0s0' devid='id1,[EMAIL PROTECTED]/a' whole_disk=1 DTL=119

LABEL 1
    version=2
    name='tank'
    state=0
    txg=225992
    pool_guid=13701012839440790608
    top_guid=4302888056402016629
    guid=15193057146179069576
    vdev_tree
        type='raidz'
        id=0
        guid=4302888056402016629
        metaslab_array=13
        metaslab_shift=33
        ashift=9
        asize=1280238944256
        children[0]: type='disk' id=0 guid=15193057146179069576 path='/dev/dsk/c0t0d0s0' whole_disk=1 DTL=122
        children[1]: type='disk' id=1 guid=5113010407673419800 path='/dev/dsk/c0t1d0s0' whole_disk=1 DTL=121
        children[2]: type='disk' id=2 guid=11908389868437050464 path='/dev/dsk/c0t2d0s0' whole_disk=1 DTL=120
        children[3]: type='disk' id=3 guid=800140628824658935 path='/dev/dsk/c0t4d0s0' devid='id1,[EMAIL PROTECTED]/a' whole_disk=1 DTL=119

LABEL 2
    version=2
    name='tank'
    state=0
    txg=225992
    pool_guid=13701012839440790608
    top_guid=4302888056402016629
    guid=15193057146179069576
    vdev_tree
        type='raidz'
        id=0
        guid=4302888056402016629
        metaslab_array=13
        metaslab_shift=33
        ashift=9
        asize=1280238944256
        children[0]: type='disk' id=0 guid=15193057146179069576 path='/dev/dsk/c0t0d0s0' whole_disk=1 DTL=122
        children[1]: type='disk' id=1 guid=5113010407673419800 path='/dev/dsk/c0t1d0s0' whole_disk=1 DTL=121
        children[2]: type='disk' id=2 guid=11908389868437050464 path='/dev/dsk/c0t2d0s0' whole_disk=1 DTL=120
        children[3]: type='disk' id=3 guid=800140628824658935 path='/dev/dsk/c0t4d0s0' devid='id1,[EMAIL PROTECTED]/a' whole_disk=1 DTL=119

LABEL 3
    version=2
    name='tank'
    state=0
    txg=225992
    pool_guid=13701012839440790608
    top_guid=4302888056402016629
    guid=15193057146179069576
    vdev_tree
        type='raidz'
        id=0
        guid=4302888056402016629
        metaslab_array=13
        metaslab_shift=33
        ashift=9
        asize=1280238944256
        children[0]: type='disk' id=0 guid=15193057146179069576 path='/dev/dsk/c0t0d0s0' whole_disk=1 DTL=122
        children[1]: type='disk' id=1 guid=5113010407673419800 path='/dev/dsk/c0t1d0s0' whole_disk=1 DTL=121
        children[2]: type='disk'
[zfs-discuss] Re: Re: Metadata corrupted
> > zdb -v tank

Forgot to add "zdb: can't open tank: error 5" to the end of the output of that command.
[zfs-discuss] Re: Re: Metadata corrupted
Thanks for the response, Matthew.

> > I don't think it's a hardware issue because it seems to be still
> > working fine, and has been for months.
>
> "Working fine", except that you can't access your pool, right? :-)

Well, the computer and disk controller worked fine when I tried them in Linux with a different set of disks. Even if one of the disks or the controller failed, I do not think that should destroy the pool, should it?

> We might be able to figure out more exactly what went wrong if you can
> send the output of:
>
> zpool status -x
> zdb -v tank (which might not work)
> zdb -l /dev/dsk/c0t1d0
> zdb -l /dev/dsk/... (for each of the other devices in the pool)

# zpool status -x
  pool: tank
 state: FAULTED
status: The pool metadata is corrupted and the pool cannot be opened.
action: Destroy and re-create the pool from a backup source.
   see: http://www.sun.com/msg/ZFS-8000-CS
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank        FAULTED      0     0     6  corrupted data
          raidz     ONLINE       0     0     6
            c0t0d0  ONLINE       0     0     0
            c0t1d0  ONLINE       0     0     0
            c0t2d0  ONLINE       0     0     0
            c0t4d0  ONLINE       0     0     0

# zdb -v tank
    version=2
    name='tank'
    state=0
    txg=4
    pool_guid=13701012839440790608
    vdev_tree
        type='root'
        id=0
        guid=13701012839440790608
        children[0]
            type='raidz'
            id=0
            guid=4302888056402016629
            metaslab_array=13
            metaslab_shift=33
            ashift=9
            asize=1280238944256
            children[0]: type='disk' id=0 guid=15193057146179069576 path='/dev/dsk/c0t0d0s0' whole_disk=1
            children[1]: type='disk' id=1 guid=5113010407673419800 path='/dev/dsk/c0t1d0s0' whole_disk=1
            children[2]: type='disk' id=2 guid=11908389868437050464 path='/dev/dsk/c0t2d0s0' whole_disk=1
            children[3]: type='disk' id=3 guid=800140628824658935 path='/dev/dsk/c0t4d0s0' devid='id1,[EMAIL PROTECTED]/a' whole_disk=1

# zdb -l /dev/dsk/c0t0d0
LABEL 0: failed to unpack label 0
LABEL 1: failed to unpack label 1
LABEL 2: failed to unpack label 2
LABEL 3: failed to unpack label 3

The result is the same for the other 3 disks (c0t1d0, c0t2d0 and c0t4d0).

I am new to ZFS, so I am not sure what these results tell me; they don't look too good. What I find strange is that zdb -v tank shows a devid for the 4th child, but not for the others.

Any ideas?

Thanks,
Siegfried
[zfs-discuss] Re: Metadata corrupted
> status: The pool metadata is corrupted and the pool cannot be opened.

Is there at least a way to determine what caused this error? Is it a hardware issue? Is it a possible defect in ZFS? I don't think it's a hardware issue, because the hardware seems to still be working fine, and has been for months. It's important to have this information so that I/we can prevent it from happening next time.

Thanks,
Siegfried
[zfs-discuss] Re: A versioning FS
> So, if I build it, people will want it? ;)

I think implementing this feature would help Apple adopt ZFS for Time Machine, which is essentially a versioning FS in practice. Actually, I don't know if Apple does it this way, but you could bump file versions using kernel notifications of file changes (as Spotlight does).

Cheers
[zfs-discuss] Metadata corrupted
I was in the middle of doing a large transfer to my ZFS pool over CIFS. Near the end of the transfer, the Solaris machine froze. Both ethernet links were down. I walked over to the machine and pushed the reset button, as it wouldn't respond to any key-presses. After the machine booted up, I got an unpleasant surprise:

FServe% zpool status
  pool: tank
 state: FAULTED
status: The pool metadata is corrupted and the pool cannot be opened.
action: Destroy and re-create the pool from a backup source.
   see: http://www.sun.com/msg/ZFS-8000-CS
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank        FAULTED      0     0     6  corrupted data
          raidz     ONLINE       0     0     6
            c0t0d0  ONLINE       0     0     0
            c0t1d0  ONLINE       0     0     0
            c0t2d0  ONLINE       0     0     0
            c0t4d0  ONLINE       0     0     0

The machine has worked fine for months, and now after a freeze/reboot all my data is gone? Is there a way to recover any of it? I have some very important files on it. I am running Solaris 10 06/06.

Thanks
[zfs-discuss] Re: x86 CPU Choice for ZFS
> But for ZFS, it has been said often that it currently performs much
> better with a 64bit address space, such as that with Opterons and
> other AMD64 CPUs. I think this would play a bigger part in a ZFS
> server performing well than just MHZ and cache size.

I will no doubt be selecting a 64-bit capable CPU. My main concern is whether a dual-core vs. a single-core processor will give ZFS any noticeable performance gain. Is ZFS multi-threaded in any way? I will also be heavily using NFS and possibly Samba, but a single-core processor with a much higher clock speed is much cheaper than the dual-core offerings from AMD.

Also, there is a price premium for extra L2 cache. Would the ZFS checksumming and parity calculations benefit at all from a larger L2 cache, say 1MB? Or would they fit fine inside 512KB? I know it depends on the application, but some general info on this subject would help my selection.

Thanks
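On the cache question: my understanding (from the ZFS on-disk format description; treat this as a sketch, not the kernel's actual C code) is that the default fletcher checksums are simple streaming sums whose entire state is four accumulators, so the data is touched once and the computation is more clock- and memory-bandwidth-bound than L2-bound. A minimal Python sketch over 32-bit words, ignoring the 64-bit wraparound the real implementation uses:

```python
def fletcher4(words):
    """Fletcher4 over a sequence of 32-bit words (illustrative sketch;
    the in-kernel version is C and lets four 64-bit accumulators wrap)."""
    a = b = c = d = 0
    for w in words:
        a += w   # running sum of the data words
        b += a   # running sum of the running sums, and so on
        c += b
        d += c
    return (a, b, c, d)

# The whole working set is (a, b, c, d) - four registers, no big tables -
# so a larger L2 mostly just helps hold the data being checksummed.
print(fletcher4([1, 2]))   # -> (3, 4, 5, 6)
```

RAID-Z parity is similarly a streaming XOR over the data. If that reading is right, a higher clock (and memory bandwidth) should matter more for checksum/parity work than going from 512KB to 1MB of L2, though I have not benchmarked it.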
[zfs-discuss] x86 CPU Choice for ZFS
Hello,

What kind of x86 CPU does ZFS prefer? In particular, what kind of CPU is optimal when using RAID-Z with a large number of disks (8)? Does L2 cache size play a big role: 256KB vs 512KB vs 1MB? Are there any performance improvements when using a dual-core or quad-processor machine?

I am choosing a CPU for a system intended primarily for ZFS and am wondering whether paying the extra price for a larger cache or going dual-core will provide any benefits, or whether it would be better to put the money towards a higher-clocked CPU.