Re: [zfs-discuss] Has anyone seen zpool corruption with VirtualBox shared folders?

2010-09-22 Thread Orvar Korvar
Now this is a testament to the power of ZFS. Only ZFS is sensitive enough to 
report these errors to you. Had you run another filesystem, you would never 
have been told that your data was slowly being corrupted by faulty hardware. 

:o)
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Has anyone seen zpool corruption with VirtualBox shared folders?

2010-09-20 Thread Warren Strange
Just following up...

I reran the memtest diagnostics and let them run overnight again. This time I 
did see some memory errors, which would be the most likely explanation for the 
corruption I am seeing.

Faulty hardware strikes again.


Thanks to all for the advice.

Warren


 In this case, the data on each side of the mirror is the same, with a high
 degree of confidence. So the source of the corruption is likely to be the
 same -- some common component: CPU, RAM, HBA, I/O path, etc. You can rule
 out the disks as suspects. With some additional experiments you can
 determine whether the corruption occurred during the write or the read.
  -- richard

-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Has anyone seen zpool corruption with VirtualBox shared folders?

2010-09-12 Thread Jeff Savit

Hi Warren,

This may not help much, except perhaps as a way to eliminate possible 
causes, but I ran b134 with VirtualBox and guests on ZFS for quite a 
long time without any such symptoms. My pool is a simple, unmirrored 
one, so the difference may be there. I used shared folders without 
incident. Guests include Linux (several distros, including RH), Windows, 
Solaris, BSD.


--Jeff

On 09/12/10 11:05 AM, Warren Strange wrote:

I posted the following to the VirtualBox forum. I would be interested in 
finding out if anyone else has ever seen zpool corruption with VirtualBox as a 
host on OpenSolaris:

-
I am running OpenSolaris b134 as a VirtualBox host, with a Linux guest.

I have experienced 6-7 instances of my zpool getting corrupted.  I am wondering 
if anyone else has ever seen this before.

This is on a mirrored zpool - using drives from two different manufacturers 
(i.e. it is very unlikely both drives would fail at the same time, with the 
same blocks going bad). I initially thought I might have a memory problem - 
which could explain the simultaneous disk failures. After running memory 
diagnostics for 24 hours with no errors reported, I am beginning to suspect it 
might be something else.

I am using shared folders from the guest - mounted at guest boot up time.

Is it possible that the Solaris vboxsf shared folder kernel driver is causing 
corruption? Being in the kernel, would it allow bypassing of the normal zfs 
integrity mechanisms? Or is it possible there is some locking issue or race 
condition that triggers the corruption?

Anecdotally, when I see the corruption the sequence of events seems to be:

- dmesg reports various vbox drivers being loaded (normal - just loading the 
drivers)
- Guest boots - gets just past the GRUB boot screen to the initial Red Hat boot 
screen.
- The Guest hangs and never boots.
- zpool status -v reports corrupted files. The files are on the zpool 
containing the shared folders and the VirtualBox images.


Thoughts?



--


Jeff Savit | Principal Sales Consultant
Phone: 602.824.6275
Email: jeff.sa...@oracle.com | Blog: http://blogs.sun.com/jsavit
Oracle North America Commercial Hardware
Operating Environments & Infrastructure S/W Pillar
2355 E Camelback Rd | Phoenix, AZ 85016



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Has anyone seen zpool corruption with VirtualBox shared folders?

2010-09-12 Thread Richard Elling
On Sep 12, 2010, at 11:05 AM, Warren Strange wrote:

 I posted the following to the VirtualBox forum. I would be interested in 
 finding out if anyone else has ever seen zpool corruption with VirtualBox as 
 a host on OpenSolaris:
 
 -
 I am running OpenSolaris b134 as a VirtualBox host, with a Linux guest.
 
 I have experienced 6-7 instances of my zpool getting corrupted.  I am 
 wondering if anyone else has ever seen this before. 
 
 This is on a mirrored zpool - using drives from two different manufacturers 
 (i.e. it is very unlikely both drives would fail at the same time, with the 
 same blocks going bad). I initially thought I might have a memory problem - 
 which could explain the simultaneous disk failures. After running memory 
 diagnostics for 24 hours with no errors reported, I am beginning to suspect 
 it might be something else.

So we are clear, you are running VirtualBox on ZFS, rather than ZFS on 
VirtualBox?

 I am using shared folders from the guest - mounted at guest boot up time. 
 
 Is it possible that the Solaris vboxsf shared folder kernel driver is causing 
 corruption? Being in the kernel, would it allow bypassing of the normal zfs 
 integrity mechanisms? Or is it possible there is some locking issue or race 
 condition that triggers the corruption?
 
 Anecdotally, when I see the corruption the sequence of events seems to be:
 
 - dmesg reports various vbox drivers being loaded (normal - just loading the 
 drivers)
 - Guest boots - gets just past the GRUB boot screen to the initial Red Hat boot 
 screen. 
 - The Guest hangs and never boots. 
 - zpool status -v reports corrupted files. The files are on the zpool 
 containing the shared folders and the VirtualBox images.
 
 
 Thoughts?

Bad power supply, HBA, cables, or other common cause.
To help you determine the sort of corruption, for mirrored pools FMA will record
the nature of the discrepancies.
fmdump -eV
will show a checksum error and the associated bitmap comparisons.
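
If the error log is busy, it can help to restrict fmdump to the checksum
ereports and pull out just the interesting fields. Something along these
lines should do it (a sketch; -c is the standard class filter, and the field
names are the ones that appear in the ereports themselves):

   # only the ZFS checksum ereports, verbose
   fmdump -eV -c ereport.fs.zfs.checksum

   # or a quick summary of which vdevs were hit and what the checksums were
   fmdump -eV -c ereport.fs.zfs.checksum | \
       egrep 'vdev_path|zio_offset|cksum_(expected|actual)'
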
 -- richard

-- 
OpenStorage Summit, October 25-27, Palo Alto, CA
http://nexenta-summit2010.eventbrite.com
ZFS and performance consulting
http://www.RichardElling.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Has anyone seen zpool corruption with VirtualBox shared folders?

2010-09-12 Thread Warren Strange
 So we are clear, you are running VirtualBox on ZFS,
 rather than ZFS on VirtualBox?
 

Correct


 
 Bad power supply, HBA, cables, or other common cause.
 To help you determine the sort of corruption, for
 mirrored pools FMA will record
 the nature of the discrepancies.
   fmdump -eV
 will show a checksum error and the associated bitmap
 comparisons.

Below are the errors reported from the two disks. Not sure if anything looks 
suspicious (other than the obvious checksum error).




Sep 10 2010 12:49:42.315641690 ereport.fs.zfs.checksum
nvlist version: 0
class = ereport.fs.zfs.checksum
ena = 0x95816e82e2900401
detector = (embedded nvlist)
nvlist version: 0
version = 0x0
scheme = zfs
pool = 0xf3cb5e110f2c88ec
vdev = 0x961d9b28c1440020
(end detector)

pool = tank
pool_guid = 0xf3cb5e110f2c88ec
pool_context = 0
pool_failmode = wait
vdev_guid = 0x961d9b28c1440020
vdev_type = disk
vdev_path = /dev/dsk/c8t5d0s0
vdev_devid = id1,s...@sata_wdc_wd15eads-00p_wd-wcavu0351361/a
parent_guid = 0xdae51838a62627b9
parent_type = mirror
zio_err = 50
zio_offset = 0x1ef6813a00
zio_size = 0x2
zio_objset = 0x10
zio_object = 0x1402f
zio_level = 0
zio_blkid = 0x76f
cksum_expected = 0x405288851d24 0x100655c808fa2072 0xa89d11a403482052 0xf1041fd6f838c6eb
cksum_actual = 0x40528884fd24 0x100655c803286072 0xa89d111c8af30052 0xf0fbe93b4f02c6eb
cksum_algorithm = fletcher4
__ttl = 0x1
__tod = 0x4c8a7dc6 0x12d04f5a

Sep 10 2010 12:49:42.315641636 ereport.fs.zfs.checksum
nvlist version: 0
class = ereport.fs.zfs.checksum
ena = 0x95816e82e2900401
detector = (embedded nvlist)
nvlist version: 0
version = 0x0
scheme = zfs
pool = 0xf3cb5e110f2c88ec
vdev = 0x969570b704d5bff1
(end detector)

pool = tank
pool_guid = 0xf3cb5e110f2c88ec
pool_context = 0
pool_failmode = wait
vdev_guid = 0x969570b704d5bff1
vdev_type = disk
vdev_path = /dev/dsk/c8t4d0s0
vdev_devid = id1,s...@sata_st31500341as9vs3b4cp/a
parent_guid = 0xdae51838a62627b9
parent_type = mirror
zio_err = 50
zio_offset = 0x1ef6813a00
zio_size = 0x2
zio_objset = 0x10
zio_object = 0x1402f
zio_level = 0
zio_blkid = 0x76f
cksum_expected = 0x405288851d24 0x100655c808fa2072 0xa89d11a403482052 0xf1041fd6f838c6eb
cksum_actual = 0x40528884fd24 0x100655c803286072 0xa89d111c8af30052 0xf0fbe93b4f02c6eb
cksum_algorithm = fletcher4
__ttl = 0x1
__tod = 0x4c8a7dc6 0x12d04f24
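
For what it's worth, the expected/actual checksum pairs above are identical
on both vdevs, so the same block apparently went bad in the same way on each
side of the mirror. If it is useful to confirm which file those ereports
point at, the zio_objset/zio_object numbers can be mapped back through zdb.
A sketch (the dataset name is a placeholder, and the object number is given
in decimal, 0x1402f = 81967):

   # find the dataset whose objset ID is 0x10 (16)
   zdb -d tank | grep 'ID 16,'

   # dump that object; for a plain file the verbose output includes its path
   zdb -dddd tank/<dataset> 81967

zpool status -v should name the same file, so this is only a cross-check.
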

-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Has anyone seen zpool corruption with VirtualBox shared folders?

2010-09-12 Thread Richard Elling
Comments below...

On Sep 12, 2010, at 2:56 PM, Warren Strange wrote:
 So we are clear, you are running VirtualBox on ZFS,
 rather than ZFS on VirtualBox?
 
 
 Correct
 
 
 Bad power supply, HBA, cables, or other common cause.
 To help you determine the sort of corruption, for
 mirrored pools FMA will record
 the nature of the discrepancies.
  fmdump -eV
 will show a checksum error and the associated bitmap
 comparisons.
 
 Below are the errors reported from the two disks. Not sure if anything looks 
 suspicious (other than the obvious checksum error)
 
 Sep 10 2010 12:49:42.315641690 ereport.fs.zfs.checksum
 nvlist version: 0
   class = ereport.fs.zfs.checksum
   ena = 0x95816e82e2900401
   detector = (embedded nvlist)
   nvlist version: 0
   version = 0x0
   scheme = zfs
   pool = 0xf3cb5e110f2c88ec
   vdev = 0x961d9b28c1440020
   (end detector)
 
   pool = tank
   pool_guid = 0xf3cb5e110f2c88ec
   pool_context = 0
   pool_failmode = wait
   vdev_guid = 0x961d9b28c1440020
   vdev_type = disk
   vdev_path = /dev/dsk/c8t5d0s0
   vdev_devid = id1,s...@sata_wdc_wd15eads-00p_wd-wcavu0351361/a
   parent_guid = 0xdae51838a62627b9
   parent_type = mirror
   zio_err = 50
   zio_offset = 0x1ef6813a00
   zio_size = 0x2
   zio_objset = 0x10
   zio_object = 0x1402f
   zio_level = 0
   zio_blkid = 0x76f
   cksum_expected = 0x405288851d24 0x100655c808fa2072 0xa89d11a403482052 0xf1041fd6f838c6eb
   cksum_actual = 0x40528884fd24 0x100655c803286072 0xa89d111c8af30052 0xf0fbe93b4f02c6eb
   cksum_algorithm = fletcher4
   __ttl = 0x1
   __tod = 0x4c8a7dc6 0x12d04f5a
 
 Sep 10 2010 12:49:42.315641636 ereport.fs.zfs.checksum
 nvlist version: 0
   class = ereport.fs.zfs.checksum
   ena = 0x95816e82e2900401
   detector = (embedded nvlist)
   nvlist version: 0
   version = 0x0
   scheme = zfs
   pool = 0xf3cb5e110f2c88ec
   vdev = 0x969570b704d5bff1
   (end detector)
 
   pool = tank
   pool_guid = 0xf3cb5e110f2c88ec
   pool_context = 0
   pool_failmode = wait
   vdev_guid = 0x969570b704d5bff1
   vdev_type = disk
   vdev_path = /dev/dsk/c8t4d0s0
   vdev_devid = id1,s...@sata_st31500341as9vs3b4cp/a
   parent_guid = 0xdae51838a62627b9
   parent_type = mirror
   zio_err = 50
   zio_offset = 0x1ef6813a00
   zio_size = 0x2
   zio_objset = 0x10
   zio_object = 0x1402f
   zio_level = 0
   zio_blkid = 0x76f
   cksum_expected = 0x405288851d24 0x100655c808fa2072 0xa89d11a403482052 0xf1041fd6f838c6eb
   cksum_actual = 0x40528884fd24 0x100655c803286072 0xa89d111c8af30052 0xf0fbe93b4f02c6eb
   cksum_algorithm = fletcher4
   __ttl = 0x1
   __tod = 0x4c8a7dc6 0x12d04f24

In the case where one side of the mirror is corrupted and the other is correct, 
you will be shown the difference between the two, in the form of an abbreviated 
bitmap.

In this case, the data on each side of the mirror is the same, with a high 
degree of confidence. So the source of the corruption is likely to be the 
same -- some common component: CPU, RAM, HBA, I/O path, etc. You can rule out 
the disks as suspects. With some additional experiments you can determine 
whether the corruption occurred during the write or the read.
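
One possible such experiment (a sketch; the file path below is only a
placeholder): clear the error counters, scrub, and then force a re-read of
an affected file while watching the per-vdev CKSUM column.

   zpool clear tank          # reset the error counters
   zpool scrub tank          # re-read and verify every allocated block
   zpool status -v tank      # errors that return on every scrub mean the
                             # bad data really is on disk (write path)
   # if scrubs come back clean, re-read the suspect file a few times instead
   dd if=/tank/path/to/file of=/dev/null bs=1M
   zpool status -v tank      # intermittent new CKSUM errors on re-read
                             # point at the read path (RAM, HBA, cabling)
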
 -- richard

-- 
OpenStorage Summit, October 25-27, Palo Alto, CA
http://nexenta-summit2010.eventbrite.com

Richard Elling
rich...@nexenta.com   +1-760-896-4422
Enterprise class storage for everyone
www.nexenta.com





___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss