Re: Analysis of disk file block with ZFS checksum error

2008-05-30 Thread Pawel Jakub Dawidek
On Mon, Feb 11, 2008 at 12:39:08PM -0700, Joe Peterson wrote:
 Gavin Atkinson wrote:
  Are the datestamps (Thu Jan 24 23:20:58 2008) found within the corrupt
  block before or after the datestamp of the file it was found within?
  i.e. was the corrupt block on the disk before or after the mp3 was
  written there?
 
 Hi Gavin, those dates are later than the original copy (I do not have
 the file timestamps to prove this, but according to my email record, I
 am pretty sure of this).  So the corrupt block is later than the
 original write.
 
 If this is the case, I assume that the block got written, by mistake,
 into the middle of the mp3 file.  Someone else suggested that it could
 be caused by a bad transfer block number or bad drive command (corrupted
 on the way to the drive, since these are not checksummed in the
 hardware).  If the block went to the wrong place, AND if it was a HW
 glitch, I suppose the best ZFS could then do is retry the write (if its
 failure was even detected - still not sure if ZFS does a re-check of the
 disk data checksum after the disk write), not knowing until the later
 scrub that the block had corrupted a file.

ZFS doesn't verify checksum after write, it would be pointless for two
reasons:
1. The read will come most likely from disk cache and not from the
   stable storage.
2. This would kill performance.

ZFS tests the checksum only on read. What you observe is either a
misdirected read/write (you asked to read/write sector X, but the data
was read from or written to sector Y) or a phantom write (you asked to
write, but the data never reached the disk, so you have old data there).
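
To make the two failure modes above concrete, here is a minimal sketch in
Python.  Everything in it is an assumption made for illustration (the
ToyDisk class, its land_on and drop arguments, the use of SHA-256); it is
not ZFS code.  It only shows why a checksum that is stored apart from the
data and tested on read notices both a misdirected write and a phantom
write.

```python
import hashlib


class ToyDisk:
    """Hypothetical in-memory disk mapping sector number -> bytes."""

    def __init__(self):
        self.sectors = {}

    def write(self, sector, data, land_on=None, drop=False):
        if drop:
            return  # phantom write: acknowledged, but nothing is stored
        # misdirected write: the data lands on a sector nobody asked for
        self.sectors[sector if land_on is None else land_on] = data

    def read(self, sector):
        return self.sectors.get(sector, b"")


def checksum(data):
    return hashlib.sha256(data).hexdigest()


def read_verified(disk, sector, expected):
    """Return (data, ok); ok is False when the checksum does not match."""
    data = disk.read(sector)
    return data, checksum(data) == expected


# Misdirected write: a block meant for sector 9 overwrites sector 5.
disk = ToyDisk()
disk.write(5, b"mp3 audio")
expected_mp3 = checksum(b"mp3 audio")  # kept apart from the data itself
disk.write(9, b"browser history", land_on=5)
_, misdirected_ok = read_verified(disk, 5, expected_mp3)

# Phantom write: the new contents never reach the disk, old data remains.
disk.write(7, b"old contents")
disk.write(7, b"new contents", drop=True)
_, phantom_ok = read_verified(disk, 7, checksum(b"new contents"))

print(misdirected_ok, phantom_ok)
```

Both flags come out False: in each case the read itself succeeds, and only
the comparison against the separately stored checksum reveals the problem.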

-- 
Pawel Jakub Dawidek   http://www.wheel.pl
[EMAIL PROTECTED]   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




Re: Analysis of disk file block with ZFS checksum error

2008-03-04 Thread Eric Anderson

Joe Peterson wrote:

Gavin Atkinson wrote:

Are the datestamps (Thu Jan 24 23:20:58 2008) found within the corrupt
block before or after the datestamp of the file it was found within?
i.e. was the corrupt block on the disk before or after the mp3 was
written there?


Hi Gavin, those dates are later than the original copy (I do not have
the file timestamps to prove this, but according to my email record, I
am pretty sure of this).  So the corrupt block is later than the
original write.

If this is the case, I assume that the block got written, by mistake,
into the middle of the mp3 file.  Someone else suggested that it could
be caused by a bad transfer block number or bad drive command (corrupted
on the way to the drive, since these are not checksummed in the
hardware).  If the block went to the wrong place, AND if it was a HW
glitch, I suppose the best ZFS could then do is retry the write (if its
failure was even detected - still not sure if ZFS does a re-check of the
disk data checksum after the disk write), not knowing until the later
scrub that the block had corrupted a file.

I think that anything is possible, but I know I was getting periodic DMA
timeouts, etc. around that time.  I hesitate, although it is tempting,
to use this evidence to focus blame purely on bad HW, given that others
seem to be seeing DMA problems too, and there is reasonable doubt
whether their problems are HW related or not.  In my case, I have been
free of DMA errors (cross your fingers) after re-installing FreeBSD
completely (giving it a larger boot partition and redoing the ZFS slice
too), and before this, I changed the IDE cable just to eliminate one
more variable.  Therefore, there are too many variables to reach a firm
conclusion, since even if the cable was bad, I never saw one DMA error
or other indication of anything wrong with HW from the Linux side (and
I've been using that HW with both Linux and FreeBSD 6.2 for months now -
no apparent flakiness of any kind on either system).  So either it *was*
bad and FreeBSD 7.0 was being more honest, FreeBSD's drivers and/or
ZFS was stressing the HW and revealing weaknesses in the cable, or it
was a SW issue that got cleared somehow when I re-installed.

Is it possible that the problem lies in the ATA drivers in FreeBSD or
even in ZFS and just looks like HW issues?  I do not have enough
info/expertise to know.  If not, then it may very well be true that HW
problems are pretty widespread (and that disk HW cannot, in fact, be
trusted), and there really *is* a strong need for ZFS *now* to protect
our data.  If there is a possibility that SW could be involved, any
hints on how to further debug this would be of great help to those still
experiencing recent DMA errors.  I just want to be more sure one way or
the other, but I know this issue is not an easy one (however, it's the
kind of problem that should receive the highest priority, IMHO).


I'm not sure what happened to this thread, but I also had a lot of 
similar issues.  I was using SATA, and using a mirrored pair of SATA 
drives, brand new.  It was suggested that my controller was junk.


I'm starting to think there is a timing issue or some such problem with 
ZFS, since I can use the same drives in a gmirror with UFS, and never 
have any data problems (md5 checksums confirm it over-and-over).  I 
highly doubt that everyone is seeing similar issues and it just is 
because ZFS is so intense.  I've had plenty of systems under severe disk 
load that have never exhibited corrupt files because of something like 
this.
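
For anyone who wants to repeat the md5 comparison mentioned above, a
minimal Python sketch follows.  The function names and the chunk size are
assumptions chosen for illustration, not the commands actually used here;
the idea is simply to hash a known-good copy and the copy under test and
compare the digests.

```python
import hashlib


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops when read() returns b"" at EOF
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a, path_b):
    """True when both files hash to the same MD5 digest."""
    return md5_of(path_a) == md5_of(path_b)
```

Running same_content() on the original and on the copy read back from the
pool after each heavy-I/O run gives a repeatable pass/fail check.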


I wish we could get our hands on this issue..  Seems like some common 
threads are ATA/SATA disks.  Is your setup running 32bit or 64bit 
FreeBSD?  (if you already mentioned it, I'm sorry, I missed it)


Eric



___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Analysis of disk file block with ZFS checksum error

2008-03-04 Thread Jeremy Chadwick
On Tue, Mar 04, 2008 at 07:25:35AM -0600, Eric Anderson wrote:
 I'm starting to think there is a timing issue or some such problem with 
 ZFS, since I can use the same drives in a gmirror with UFS, and never have 
 any data problems (md5 checksums confirm it over-and-over).  I highly doubt 
 that everyone is seeing similar issues and it just is because ZFS is so 
 intense.  I've had plenty of systems under severe disk load that have never 
 exhibited corrupt files because of something like this.

One thing that hasn't been mentioned (or maybe it has been but I missed
it): FreeBSD's ZFS port is version 6, while Solaris is up to version 10.

Is it possible that the problem folks are experiencing, including the
infamous deadlock or crash on heavy I/O between UFS/UFS2 and ZFS
filesystems, could've been fixed between versions 6 and 10?

I myself use gstripe(8) and UFS2 (no softupdates) on two identical SATA
disks.  I do nightly backups so if I lose a disk, I'm OK.  My transfer
rates are quite good (~143MB/sec read, ~130MB/sec write -- really!) on
the stripe, and in the past 2 weeks I have spent a LOT of time copying
over 150GB of data back and forth between the stripe and the backup disk
without any issues.  All disks are on an ICH7 controller.

 I wish we could get our hands on this issue..  Seems like some common 
 threads are ATA/SATA disks.  Is your setup running 32bit or 64bit FreeBSD?  
 (if you already mentioned it, I'm sorry, I missed it)

So far the reports have shown that it's not specific to either i386 or
amd64, and that it's not specific to any type of hardware (motherboard,
controller, etc.).  Joe's setup is very different from mine, for
example.

If the same disks are fine when used with UFS/UFS2, then I'd say it's
less of an ATA subsystem bug, and more of an oddity with ZFS on FreeBSD.
If it's reproducible, that would be helpful to developers.

Regarding ATA/SATA though, there are reports of DMA timeouts and other
oddities happening on ATA/SATA disks on FreeBSD.  When I was using ZFS
not too long ago, I experienced that problem when doing heavy I/O
(copying data from a standard UFS2 disk to a ZFS RAIDZ pool).  It's been
the only time I've seen this problem.

http://lists.freebsd.org/pipermail/freebsd-stable/2008-January/040013.html

The drive showed no signs of errors (SMART stats look fine, no
mechanical noises or other oddities).  I've since replaced it out of
pure paranoia with a disk identical to the ones on the gstripe(8).

Regarding those issues (DMA errors, etc.), Scott Long has offered to
help, but needs systems which can reproduce the problem reliably and
have remote access (serial highly recommended).

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |



Re: Analysis of disk file block with ZFS checksum error

2008-03-04 Thread Joe Peterson
Eric Anderson wrote:
 I'm starting to think there is a timing issue or some such problem with 
 ZFS, since I can use the same drives in a gmirror with UFS, and never 
 have any data problems (md5 checksums confirm it over-and-over).  I 
 highly doubt that everyone is seeing similar issues and it just is 
 because ZFS is so intense.  I've had plenty of systems under severe disk 
 load that have never exhibited corrupt files because of something like 
 this.

I also wondered this - i.e. if ZFS was triggering a certain timing
behavior that revealed the problem.  Still, if this is the case, it
seems to me that the problem lies in the ATA subsystem, since it should
prevent higher-level things like ZFS from being able to create bad
timings (or am I not thinking of this correctly?).

Also, I think there were some reports of problems with DMA/ATA when
*not* using ZFS.

 I wish we could get our hands on this issue..  Seems like some common 
 threads are ATA/SATA disks.  Is your setup running 32bit or 64bit 
 FreeBSD?  (if you already mentioned it, I'm sorry, I missed it)

This was on 32bit FreeBSD with PATA.  I am the one who had no SMART
issues and no DMA errors reported under Linux.  Changing the cable may
have fixed it, since I did not see errors in some further testing, but
even if so, my theory is that there is some edge case (timing?) that the
FreeBSD ATA drivers were sensitive to, and perhaps my change of cables
pushed the problem to the other side of the threshold.  Since I never
saw errors under Linux (and I've been using that cable for a couple of
years), I do not necessarily think the cable was actually defective.

-Joe


Re: Analysis of disk file block with ZFS checksum error

2008-02-13 Thread junics-fbsdstable

Joe Peterson wrote:

*cut*

I suppose the best ZFS could then do is retry the write (if its
failure was even detected - still not sure if ZFS does a re-check of the
disk data checksum after the disk write), not knowing until the later
scrub that the block had corrupted a file.
  

*cut*

Disclaimer: I have only experimented with ZFS in a VM and read much of 
the documentation, but never used it properly. Please correct me if I 
am wrong.


1) If it were able to verify written data directly after a write, then 
it would probably be an optional feature. I don't recall such an option 
when I experimented, nor can I find it in the online man pages (DOS 
actually had something like: set verify=on)
2) It would cause a lot of head seeking and kill performance, unless 
queued into an elevator seek batch job when the disks are idle. 
(Wikipedia: Elevator_algorithm)
3) It would need to disable all disk read caching to really verify what 
was written to the surface correctly. Probably a complex problem 
considering all the different types of hardware out there, also in 
keeping ZFS portable.
4) ZFS is designed to be run in a redundant configuration, so once it 
reads the bad block on request or scrub then it would be able to 
overwrite the bad block from the redundant data. (See details on self 
healing in the ZFS docs)
4.1) If your ZFS is up to date then you could probably set the copies=2 
property on the filesystem and do a poor man's RAID1, if it is a 
hardware problem that is... _All_ metadata is already written at least 
twice, even in a single disk configuration. I think it will try to keep 
the copies apart by 1/8 of the total space.
4.2) Overwriting bad blocks plays nice with internal disk sector 
relocation. Pending sectors in smartctl -a is a thing of the past :)


I actually have two bad disks that I probably will try it on, once 7.0 
is released. They are heat damaged so bad sectors are popping up 
semi-frequently.




Re: Analysis of disk file block with ZFS checksum error

2008-02-11 Thread Joe Peterson
Gavin Atkinson wrote:
 Are the datestamps (Thu Jan 24 23:20:58 2008) found within the corrupt
 block before or after the datestamp of the file it was found within?
 i.e. was the corrupt block on the disk before or after the mp3 was
 written there?

Hi Gavin, those dates are later than the original copy (I do not have
the file timestamps to prove this, but according to my email record, I
am pretty sure of this).  So the corrupt block is later than the
original write.

If this is the case, I assume that the block got written, by mistake,
into the middle of the mp3 file.  Someone else suggested that it could
be caused by a bad transfer block number or bad drive command (corrupted
on the way to the drive, since these are not checksummed in the
hardware).  If the block went to the wrong place, AND if it was a HW
glitch, I suppose the best ZFS could then do is retry the write (if its
failure was even detected - still not sure if ZFS does a re-check of the
disk data checksum after the disk write), not knowing until the later
scrub that the block had corrupted a file.

I think that anything is possible, but I know I was getting periodic DMA
timeouts, etc. around that time.  I hesitate, although it is tempting,
to use this evidence to focus blame purely on bad HW, given that others
seem to be seeing DMA problems too, and there is reasonable doubt
whether their problems are HW related or not.  In my case, I have been
free of DMA errors (cross your fingers) after re-installing FreeBSD
completely (giving it a larger boot partition and redoing the ZFS slice
too), and before this, I changed the IDE cable just to eliminate one
more variable.  Therefore, there are too many variables to reach a firm
conclusion, since even if the cable was bad, I never saw one DMA error
or other indication of anything wrong with HW from the Linux side (and
I've been using that HW with both Linux and FreeBSD 6.2 for months now -
no apparent flakiness of any kind on either system).  So either it *was*
bad and FreeBSD 7.0 was being more honest, FreeBSD's drivers and/or
ZFS was stressing the HW and revealing weaknesses in the cable, or it
was a SW issue that got cleared somehow when I re-installed.

Is it possible that the problem lies in the ATA drivers in FreeBSD or
even in ZFS and just looks like HW issues?  I do not have enough
info/expertise to know.  If not, then it may very well be true that HW
problems are pretty widespread (and that disk HW cannot, in fact, be
trusted), and there really *is* a strong need for ZFS *now* to protect
our data.  If there is a possibility that SW could be involved, any
hints on how to further debug this would be of great help to those still
experiencing recent DMA errors.  I just want to be more sure one way or
the other, but I know this issue is not an easy one (however, it's the
kind of problem that should receive the highest priority, IMHO).

-Joe


Re: Analysis of disk file block with ZFS checksum error

2008-02-11 Thread Gavin Atkinson
On Fri, 2008-02-08 at 17:15 -0700, Joe Peterson wrote:
 Chris Dillon wrote:
  That is a chunk of a Mozilla Mork-format database.  Perhaps the  
  Firefox URL history or address book from Thunderbird.
 
 Interesting (thanks to all who recognized Mork).  I do use Firefox and
 Thunderbird, so it's feasible, but how the heck would a piece of one of
 those files find its way into 1/2 of a ZFS block in one of my mp3 files?
 I wonder if it could have been done on write when the file was copied
 to the ZFS pool (maybe some write-caching issue?), but I thought ZFS
 would have verified the block after write.  It seems unlikely that it
 would get changed later - I never rewrote that file after the original
 copy...

Are the datestamps (Thu Jan 24 23:20:58 2008) found within the corrupt
block before or after the datestamp of the file it was found within?
i.e. was the corrupt block on the disk before or after the mp3 was
written there?

You could possibly confirm this by grepping for that datestamp in the
files in your home directory, and with the aid of 
http://developer.mozilla.org/en/docs/Mork_Structure#Rows, try to
establish exactly what the datestamp means (ie was it the time you
visited a URL, etc).
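
The search suggested above can be sketched in a few lines of Python.  The
function name is an assumption for illustration, and each file is read
whole into memory, which is fine for profile files but not for very large
ones.

```python
import os


def files_containing(root, needle):
    """Yield paths under root whose raw bytes contain needle."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as fh:
                    if needle in fh.read():
                        yield path
            except OSError:
                continue  # unreadable file, socket, dangling link, etc.


# Example call (left commented out so the sketch has no side effects):
# for hit in files_containing(os.path.expanduser("~"),
#                             b"Thu Jan 24 23:20:58 2008"):
#     print(hit)
```

Searching for bytes rather than text avoids decoding errors on binary
files.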

Gavin


Analysis of disk file block with ZFS checksum error

2008-02-08 Thread Joe Peterson
In my experimentation with the ZFS filesystem, I encountered one case of
a file block with a checksum mismatch.  Doing a zpool scrub revealed
it, and trying to read the file yielded an error - only the part of the
file before the bad block was read (ZFS aborts reading at this point,
which makes sense), resulting in a short file.  The reason the CKSUM
error is not fixable is because my ZFS pool contains only one device (no
mirror or RAIDZ), but I do have the original/good version of the file
affected.  Here's the output of zpool status (new scrub in process):

  pool: tank
 state: ONLINE
status: One or more devices has experienced an error resulting in data
corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: scrub in progress, 64.36% done, 0h18m to go
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     2
          hda6      ONLINE       0     0     2

errors: Permanent errors have been detected in the following files:

/mnt/tank/fbsd/home/joe/music/jukebox/christmas/Esquivel/
Merry_XMas_from_the_SpaceAge_Bachelor_Pad/07-Snowfall.mp3


I was curious about what actually happened: was this a ZFS bug, trouble
with its metadata, or truly a bad block?  In order to determine this, I
modified ZFS's source code temporarily to ignore the checksum mismatch
and let the file read fully.  What I then got was the full-length file
and no errors, showing that there were no disk read errors associated
with the read (I had already assumed this from the fact that zpool
status showed only a non-zero CKSUM count).  However, I may have seen
other error counts previously (ZFS resets them to zero on, e.g.,
reboot).  I received no errors when originally copying this file *to*
the ZFS pool - only on subsequent reads/scrubs.

(Note that I have posted before about DMA errors in my log for the disk
I am using, but I have had nothing but successful SeaTools tests
(surface scans) of the drive.  Jeremy Chadwick had similar issues, as
did others, so I think it is worth investigating if there is some
OS/software cause rather than real HW issues.  This is one reason I
wanted to investigate my ZFS checksum issue more deeply.)

I also have a good backup of the file in question, so I now have two
copies of the file: one good, and one with a bad block.  The file is
3575936 bytes long, and recordsize (in ZFS) is 128K, making the file
about 27 blocks long.  Curiously, the bad section of the file is exactly
65536 bytes long (1/2 a block).  The bad block starts at exactly the 5th
128K block, counting from zero (byte 655360 or hex a0000).
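
The arithmetic can be checked with a short Python sketch.  It assumes the
bad region begins at hex a0000, which is what the file positions quoted
elsewhere in this thread (0009fff0 just before it, a000f and af6c1 inside
it) imply; the variable names are made up for illustration.

```python
FILE_SIZE = 3575936        # bytes, from the post
RECORDSIZE = 128 * 1024    # ZFS recordsize of 128K
BAD_START = 0xA0000        # assumed first bad byte
BAD_LEN = 65536            # length of the bad section

# 27 full records plus a partial one
full_blocks, tail_bytes = divmod(FILE_SIZE, RECORDSIZE)

# Which record the bad data starts in, and how far into it
block_index, offset_in_block = divmod(BAD_START, RECORDSIZE)

# Last bad byte, and how much of one record is affected
bad_end = BAD_START + BAD_LEN - 1
fraction_of_block = BAD_LEN / RECORDSIZE

print(full_blocks, tail_bytes, block_index, offset_in_block,
      hex(bad_end), fraction_of_block)
```

This gives block index 5 with offset 0 inside the block, so the bad data
is record-aligned and covers the first half of that record (a0000 through
affff).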

I wanted to see the characteristics of the bad data.  Was just one bit
flipped randomly?  No.  Is it just one bit or set of bits in the bytes
that are affected?  It doesn't seem so.  Were there any other strange
patterns here?  Well, yes, and maybe someone out there with more
knowledge/experience in disk modes of failure will recognize something
(I have included some data below).

For one thing (as I mentioned), only 65536 bytes are bad (and it's
exactly this many, with a few good bytes thrown in, but not far from
what random chance would produce).  Also, all bad bytes have a
zero in the high bit - interesting?  Also, near the end of the block,
the bad bytes all go to zero, strangely coincident with the first good
zero in that bad block - not sure if that's coincidence or not.  Also, I
calculated the number of "Bits same" (matching bits) in the good vs. bad
bytes, and it appears fairly random, so it appears that the bad bytes
are very random in nature and not correlated much at all with the good
bytes.

So except for the fact that the 2nd half (65536 bytes) of the ZFS block
is good, the bad block seems to consist of random data, except for the
string of zero bytes near the end and the zero high-bit.  It's not as if
one bit on the disk flipped - it affects the whole (1/2) block.  Does
this seem like a disk error, a controller error/bug, or a cable problem
(I recently put a new cable on, so I doubt this)?  It seems to me something
more systemic rather than a random bit error - opinions are more than
welcome.

Here is some info from a python program I wrote to look at the data
(I've left out spans of essentially uninteresting portions showing
similar stuff, but I can get you the whole thing if interested):

File pos  Good  Bad  Match  Good (bin)  Bad (bin)  Bits same
0009fff0  d9    d9   Yes    11011001    11011001   8
0009fff1  05    05   Yes    00000101    00000101   8
0009fff2  c1    c1   Yes    11000001    11000001   8
0009fff3  81    81   Yes    10000001    10000001   8
0009fff4  5f    5f   Yes    01011111    01011111   8
0009fff5  66    66   Yes    01100110    01100110   8
0009fff6  5e    5e   Yes    01011110    01011110   8
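
The program itself was not posted, so the following is only a guess at
its core: a minimal Python sketch that produces the same columns from two
byte strings.  The function names and the exact column spacing are
assumptions.

```python
def bits_same(a, b):
    """Count the bit positions (0 to 8) where bytes a and b agree."""
    return 8 - bin(a ^ b).count("1")


def compare(good, bad, start=0):
    """Yield one formatted row per byte position, numbered from start."""
    for pos, (g, b) in enumerate(zip(good, bad), start):
        match = "Yes" if g == b else "No"
        yield (f"{pos:08x}  {g:02x}  {b:02x}  {match:<3}  "
               f"{g:08b}  {b:08b}  {bits_same(g, b)}")


# Example call on two files (commented out; the paths are placeholders):
# with open("good.mp3", "rb") as f1, open("bad.mp3", "rb") as f2:
#     for row in compare(f1.read(), f2.read()):
#         print(row)
```

For unrelated random bytes the "Bits same" count averages 4, which is the
baseline to compare against when judging whether the bad bytes are
correlated with the good ones.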

Re: Analysis of disk file block with ZFS checksum error

2008-02-08 Thread Joe Peterson
Mark Day wrote:
 Based on the subset of data you posted, the bad data looks like ASCII
 text.
 The bad data from offset a0000 to a000f is:

 ${138AFE{@
 @$$}1

 The bad data from offset af6c1 to af6c8 is:

 392A9}@

 I don't recognize the content beyond that, but I'd guess that somehow
 the
 contents of some other file managed to overwrite that portion of the bad
 file.  As for how that happened, I don't know.  But if someone
 recognizes
 where the bad content came from, that might be a clue.


Gary/Mark,

Good eye!  Yes, it indeed does appear to be ASCII.  I *thought*
something in the repetition when I originally did an od -a looked
interesting.
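
The two observations fit together: bytes with a zero high bit are exactly
what 7-bit ASCII text looks like.  A minimal Python sketch of that check
follows; the function names are assumptions, and the sample is the
16-byte excerpt Mark quoted.

```python
def is_seven_bit(chunk):
    """True when every byte has its high bit clear, as ASCII text does."""
    return all(b < 0x80 for b in chunk)


def as_text(chunk):
    """Render bytes for display, showing non-printable ones as '.'."""
    return "".join(
        chr(b) if 32 <= b < 127 or b == 10 else "." for b in chunk
    )


sample = b"${138AFE{@\n@$$}1"   # the 16 bytes quoted from the bad region
print(is_seven_bit(sample))
print(as_text(sample))
```

Running is_seven_bit() over the good bytes of an mp3 would almost
certainly return False, since compressed audio uses all eight bits.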

I dumped the whole bad section as a string, and here's (partly) what I get:

${138AFE{@
@$$}138AFE}@

@$${138AFF{@
[A3:^80(^91^2146F)]
@$$}138AFF}@

@$${138B00{@
@$$}138B00}@

@$${138B01{@
[181:^80(^91^2146F)]
@$$}138B01}@

@$${138B02{@
@$$}138B02}@

@$${138B03{@
[2C:^80(^91^2146F)]
@$$}138B03}@

@$${138B04{@
@$$}138B04}@

.
.
.

@$${138B8B{@
(21470=Thu Jan 24 23:20:58 2008)
[117:^80(^91^21470)]
@$$}138B8B}@

.
.
.

@$${138C18{@
(21472=1201242069)[-2:^80(^82^85)(^83^1B5)(^84=b)(^85=1)(^86=0)(^87=0)
(^88=0)(^89^2146C)(^8A=)(^8B=40)(^8C=2e)(^8D^84)(^8E=0)(^90^21472)
(^91^21460)]
@$$}138C18}@

@$${138C19{@
(21473=a72f78)[2:^80(^89^21473)]
@$$}138C19}@

@$${138C1A{@
@$$}138C1A}@

.
.
.


and more of the same.  Note the date string.  There are several like
that.  Anyone recognize this text format?
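
One of the numeric values above, 1201242069, looks like it could be a
Unix timestamp.  A minimal Python sketch to test that guess follows; the
UTC-7 offset is an assumption taken from the -0700 zone on the mail
headers in this thread, not something stated in the data.

```python
from datetime import datetime, timedelta, timezone

MORK_VALUE = 1201242069                      # from the (21472=...) cell
MAIL_OFFSET = timezone(timedelta(hours=-7))  # assumed local zone, UTC-7

as_utc = datetime.fromtimestamp(MORK_VALUE, tz=timezone.utc)
as_local = as_utc.astimezone(MAIL_OFFSET)

# Same layout as the date string found in the bad block
formatted = as_local.strftime("%a %b %d %H:%M:%S %Y")
print(formatted)
```

Under those assumptions this prints Thu Jan 24 23:21:09 2008, eleven
seconds after the date string in the dump, which supports reading the
value as seconds since the epoch.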

-Joe


Re: Analysis of disk file block with ZFS checksum error

2008-02-08 Thread Alfred Perlstein
* Joe Peterson [EMAIL PROTECTED] [080208 14:58] wrote:
 Mark Day wrote:
  Based on the subset of data you posted, the bad data looks like ASCII
  text.
  The bad data from offset a0000 to a000f is:
 
  ${138AFE{@
  @$$}1
 
  The bad data from offset af6c1 to af6c8 is:
 
  392A9}@
 
  I don't recognize the content beyond that, but I'd guess that somehow
  the
  contents of some other file managed to overwrite that portion of the bad
  file.  As for how that happened, I don't know.  But if someone
  recognizes
  where the bad content came from, that might be a clue.
 
 
 Gary/Mark,
 
 Good eye!  Yes, it indeed does appear to be ASCII.  I *thought*
 something in the repetition when I originally did an od -a looked
 interesting.
 
 I dumped the whole bad section as a string, and here's (partly) what I get:
 
 ${138AFE{@
 @$$}138AFE}@
 
 @$${138AFF{@
 [A3:^80(^91^2146F)]
 @$$}138AFF}@
 
 @$${138B00{@

Looks like terminal output/codes that have been stripped...

-Alfred


Re: Analysis of disk file block with ZFS checksum error

2008-02-08 Thread Bartosz Fabianowski
I'd say that's the mork database format [1,2], as used by Mozilla 
products, for example in the Firefox history.dat file.


- Bartosz

[1] http://www.mozilla.org/mailnews/arch/mork/primer.txt
[2] http://www.jwz.org/hacks/mork.pl


Re: Analysis of disk file block with ZFS checksum error

2008-02-08 Thread Gary Corcoran

Joe Peterson wrote:

In my experimentation with the ZFS filesystem, I encountered one case of
a file block with a checksum mismatch.  Doing a zpool scrub revealed
it, and trying to read the file yielded an error - only the part of the
file before the bad block was read (ZFS aborts reading at this point,
which makes sense), resulting in a short file.  The reason the CKSUM
error is not fixable is because my ZFS pool contains only one device (no
mirror or RAIDZ), but I do have the original/good version of the file
affected.  Here's the output of zpool status (new scrub in process):

  pool: tank
 state: ONLINE
status: One or more devices has experienced an error resulting in data
corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: scrub in progress, 64.36% done, 0h18m to go
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     2
          hda6      ONLINE       0     0     2

errors: Permanent errors have been detected in the following files:

/mnt/tank/fbsd/home/joe/music/jukebox/christmas/Esquivel/
Merry_XMas_from_the_SpaceAge_Bachelor_Pad/07-Snowfall.mp3


I was curious about what actually happened: was this a ZFS bug, trouble
with its metadata, or truly a bad block?  In order to determine this, I
modified ZFS's source code temporarily to ignore the checksum mismatch
and let the file read fully.  What I then got was the full-length file
and no errors, showing that there were no disk read errors associated
with the read (I already had assumed this from the fact that zpool
status showed only a non-zero CKSUM count), however, I may have seen
other error counts previously (ZFS resets them to zero on, e.g.,
reboot).  I received no errors when originally copying this file *to*
the ZFS pool - only on subsequent reads/scrubs.

(Note that I have posted before about DMA errors in my log for the disk
I am using, but I have had nothing but successful SeaTools tests
(surface scans) of the drive.  Jeremy Chadwick had similar issues, as
did others, so I think it is worth investigating if there is some
OS/software cause rather than real HW issues.  This is one reason I
wanted to investigate my ZFS checksum issue more deeply.)

I also have a good backup of the file in question, so I now have two
copies of the file: one good, and one with a bad block.  The file is
3575936 bytes long, and recordsize (in ZFS) is 128K, making the file
about 27 blocks long.  Curiously, the bad section of the file is exactly
65536 bytes long (1/2 a block).  The bad block starts at exactly the 5th
128K block, counting from zero (byte 655360 or hex a0000).

I wanted to see the characteristics of the bad data.  Was just one bit
flipped randomly?  No.  It is just one bit or set of bits in the bytes
that are affected?  It doesn't seem so.  Were there any other stange
patterns here?  Well, yes, and maybe someout out there with more
knowledge/experience in disk modes of failure will recognize something
(I have included some data below).

For one thing (as I mentioned), only 65536 bytes are bad (and it's
exactly this many, with a few good bytes thrown in, but not far from
what matches random chance would produce.  Also, all bad bytes have a
zero in the high bit - interesting?  Also, near the end of the block,
the bad bytes all go to zero, strangely coincident with the first good
zero in that bad block - not sure if that's coincidence or not.  Also, I
calculated the number of Bits same (matching bits) in the good vs. bad
bytes, and it appears fairly random, so it appears that the bad bytes
are very random in nature and not correlated much at all with the good
bytes.

So except for the fact that the 2nd half (65536 bytes) of the ZFS block
are good, the bad block seems to consist of random data, except for the
string of zero bytes near the end and the zero high-bit.  It's not as if
one bit on the disk flipped - it affects the whole (1/2) block.  Does
this seem like a disk error, controller error/bug, cable problem (I
recently put a new cable on, so I doubt this).  It seems to me something
more systemic rather than a random bit error - opinions are more than
welcome.

Here is some info from a python program I wrote to look at the data
(I've left out spans of essentially uninteresting portions showing
similar stuff, but I can get you the whole thing if interested):

File pos  Good  Bad  Match  Good (bin)  Bad (bin)  Bits same
0009fff0  d9    d9   Yes    11011001    11011001   8
0009fff1  05    05   Yes    00000101    00000101   8
0009fff2  c1    c1   Yes    11000001    11000001   8
0009fff3  81    81   Yes    10000001    10000001   8
0009fff4  5f    5f   Yes    01011111    01011111   8
0009fff5  66    66   Yes    01100110    01100110   8
0009fff6  5e    5e   Yes    01011110    01011110   8

Re: Analysis of disk file block with ZFS checksum error

2008-02-08 Thread Mark Day

On Feb 8, 2008, at 2:29 PM, Joe Peterson wrote:


For one thing (as I mentioned), only 65536 bytes are bad (and it's
exactly this many, with a few good bytes thrown in, but not far from
what matches random chance would produce.  Also, all bad bytes have a
zero in the high bit - interesting?  Also, near the end of the block,
the bad bytes all go to zero, strangely coincident with the first  
good
zero in that bad block - not sure if that's coincidence or not.   
Also, I
calculated the number of Bits same (matching bits) in the good vs.  
bad

bytes, and it appears fairly random, so it appears that the bad bytes
are very random in nature and not correlated much at all with the good
bytes.

So except for the fact that the 2nd half (65536 bytes) of the ZFS  
block
are good, the bad block seems to consist of random data, except for  
the
string of zero bytes near the end and the zero high-bit.  It's not  
as if

one bit on the disk flipped - it affects the whole (1/2) block.  Does
this seem like a disk error, controller error/bug, cable problem (I
recently put a new cable on, so I doubt this).  It seems to me  
something

more systemic rather than a random bit error - opinions are more than
welcome.


Based on the subset of data you posted, the bad data looks like ASCII
text.

The bad data from offset a0000 to a000f is:

${138AFE{@
@$$}1

The bad data from offset af6c1 to af6c8 is:

392A9}@

I don't recognize the content beyond that, but I'd guess that somehow the
contents of some other file managed to overwrite that portion of the bad
file.  As for how that happened, I don't know.  But if someone recognizes
where the bad content came from, that might be a clue.
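
A quick way to test the "looks like ASCII" guess on a whole block is to count how many bytes have the high bit clear and how many are printable. The sketch below is only an illustration: `ascii_profile` is a hypothetical helper, and the sample is the 16-byte fragment quoted above, assuming the line break between its two lines is a single newline byte.

```python
def ascii_profile(block: bytes) -> dict:
    """Summarise how text-like a block is."""
    printable = sum(
        1 for b in block
        if 0x20 <= b <= 0x7E or b in (0x09, 0x0A, 0x0D)  # printable ASCII, tab, LF, CR
    )
    return {
        "total": len(block),
        "high_bit_clear": sum(1 for b in block if b < 0x80),
        "printable": printable,
    }


# Assumption: the two quoted lines are separated by one "\n" byte.
sample = b"${138AFE{@\n@$$}1"
print(ascii_profile(sample))
```

Compressed audio such as mp3 should have roughly half of its bytes with the high bit set, so a 64K run with none is a strong hint that the data is text from somewhere else.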

-Mark

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Analysis of disk file block with ZFS checksum error

2008-02-08 Thread Chris Dillon

Quoting Joe Peterson [EMAIL PROTECTED]:


I dumped the whole bad section as a string, and here's (partly) what I get:


[...edited for brevity...]


@$${138B8B{@
(21470=Thu Jan 24 23:20:58 2008)
[117:^80(^91^21470)]
@$$}138B8B}@

@$${138C18{@
(21472=1201242069)[-2:^80(^82^85)(^83^1B5)(^84=b)(^85=1)(^86=0)(^87=0)
(^88=0)(^89^2146C)(^8A=)(^8B=40)(^8C=2e)(^8D^84)(^8E=0)(^90^21472)
(^91^21460)]
@$$}138C18}@

@$${138C19{@
(21473=a72f78)[2:^80(^89^21473)]
@$$}138C19}@

@$${138C1A{@
@$$}138C1A}@

and more of the same.  Note the date string.  There are several like
that.  Anyone recognize this text format?


That is a chunk of a Mozilla Mork-format database.  Perhaps the  
Firefox URL history or address book from Thunderbird.
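
The dump also contains the cell `(21472=1201242069)` shortly after the date string. Assuming that number is a Unix timestamp in seconds (the thread does not confirm this), it can be decoded as below. The UTC-7 offset is also an assumption, taken from the -0700 zone on the poster's mail elsewhere in the thread.

```python
from datetime import datetime, timedelta, timezone

raw = 1201242069  # value from the "(21472=1201242069)" cell in the dump

# Interpret the value as seconds since the Unix epoch (assumption).
utc = datetime.fromtimestamp(raw, tz=timezone.utc)

# Assumption: the machine that wrote the Mork file was running at UTC-7.
local = utc.astimezone(timezone(timedelta(hours=-7)))

print(utc.isoformat())
print(local.strftime("%Y-%m-%d %H:%M:%S"))
```

If both assumptions hold, the local time comes out within a few seconds of the "Thu Jan 24 23:20:58 2008" string in the same dump, which would mean the stray data was written well after the mp3 was originally copied.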


--

Chris Dillon - NetEng/SysAdm
Reeds Spring R-IV School District
Technology Department
175 Elementary Rd.
Reeds Spring, MO  65737
Voice: 417-272-8266   Fax: 417-272-0015




Re: Analysis of disk file block with ZFS checksum error

2008-02-08 Thread Dan Nelson
In the last episode (Feb 08), Joe Peterson said:
 Mark Day wrote:
  Based on the subset of data you posted, the bad data looks like
  ASCII text. The bad data from offset a0000 to a000f is:
 
  ${138AFE{@
  @$$}1
 
  The bad data from offset af6c1 to af6c8 is:
 
  392A9}@
 
  I don't recognize the content beyond that, but I'd guess that
  somehow the contents of some other file managed to overwrite that
  portion of the bad file.  As for how that happened, I don't know. 
  But if someone recognizes where the bad content came from, that
  might be a clue.
 
 Good eye!  Yes, it indeed does appear to be ASCII.  I *thought*
 something in the repetition when I originally did an od -a looked
 interesting.
 
 I dumped the whole bad section as a string, and here's (partly) what I get:
 
 @$${138B8B{@
 (21470=Thu Jan 24 23:20:58 2008)
 [117:^80(^91^21470)]
 @$$}138B8B}@
...
 @$${138C18{@
 (21472=1201242069)[-2:^80(^82^85)(^83^1B5)(^84=b)(^85=1)(^86=0)(^87=0)
 (^88=0)(^89^2146C)(^8A=)(^8B=40)(^8C=2e)(^8D^84)(^8E=0)(^90^21472)
 (^91^21460)]
 @$$}138C18}@
 
 and more of the same.  Note the date string.  There are several like
 that.  Anyone recognize this text format?

It's a Mork database from the Mozilla project:

http://developer.mozilla.org/en/docs/Mork_Structure#Rows
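
To find how much of a damaged block is Mork data, one can scan for the `@$${ID{@` and `@$$}ID}@` markers visible in the dump, which (as far as the Mork notes go) open and close a group. The sketch below is a rough pattern match, not a Mork parser; `find_groups` is a hypothetical helper and the sample is copied from the quoted dump.

```python
import re

# Opening marker as it appears in the dump, e.g. b"@$${138B8B{@"
OPEN = re.compile(rb"@\$\$\{([0-9A-Fa-f]+)\{@")


def find_groups(data: bytes):
    """Return (id, start, end) for each opener that is followed by a closer with the same id."""
    groups = []
    for m in OPEN.finditer(data):
        gid = m.group(1)
        closer = re.compile(rb"@\$\$\}" + re.escape(gid) + rb"\}@")
        c = closer.search(data, m.end())
        if c:
            groups.append((gid.decode("ascii"), m.start(), c.end()))
    return groups


sample = (b"@$${138B8B{@\n(21470=Thu Jan 24 23:20:58 2008)\n"
          b"[117:^80(^91^21470)]\n@$$}138B8B}@\n")
for gid, start, end in find_groups(sample):
    print(gid, start, end)
```

An opener with no matching closer inside the bad region would suggest the foreign data was cut at the block boundary rather than written as a whole file.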

-- 
Dan Nelson
[EMAIL PROTECTED]


Re: Analysis of disk file block with ZFS checksum error

2008-02-08 Thread Joe Peterson
Chris Dillon wrote:
 That is a chunk of a Mozilla Mork-format database.  Perhaps the  
 Firefox URL history or address book from Thunderbird.

Interesting (thanks to all who recognized Mork).  I do use Firefox and
Thunderbird, so it's feasible, but how the heck would a piece of one of
those files find its way into 1/2 of a ZFS block in one of my mp3 files?
I wonder if it could have been done on write when the file was copied
to the ZFS pool (maybe some write-caching issue?), but I thought ZFS
would have verified the block after write.  It seems unlikely that it
would get changed later - I never rewrote that file after the original
copy...

-Joe


Re: Analysis of disk file block with ZFS checksum error

2008-02-08 Thread Julian Elischer

Joe Peterson wrote:

Chris Dillon wrote:
That is a chunk of a Mozilla Mork-format database.  Perhaps the  
Firefox URL history or address book from Thunderbird.


Interesting (thanks to all who recognized Mork).  I do use Firefox and
Thunderbird, so it's feasible, but how the heck would a piece of one of
those files find its way into 1/2 of a ZFS block in one of my mp3 files?
I wonder if it could have been done on write when the file was copied
to the ZFS pool (maybe some write-caching issue?), but I thought ZFS
would have verified the block after write.  It seems unlikely that it
would g


it could be an old file..
what kind of disks?
I had a scenario where 3ware controllers were just failing to write to
a drive in the array, so old data showed through.

it was possible by looking to see where the boundary between good and 
bad was, to identify the culprit..


the filesystem and the partitions and the raids all were on different
alignments so the only part of the system that had a boundary that
aligned with the bad data was the physical stripes laid down by the
controller.  It was 64k stripes and 64k data missing, exactly on
stripe boundaries. Due to the fact that FreeBSD had partitioned the
drive starting at 63 blocks in, nothing else aligned with the problem.




-Joe
___
[EMAIL PROTECTED] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-fs
To unsubscribe, send any mail to [EMAIL PROTECTED]




Re: Analysis of disk file block with ZFS checksum error

2008-02-08 Thread Joe Peterson
Julian Elischer wrote:
 it could be an old file..
 what kind of disks?

It's a Seagate ST3500630A parallel ATA drive.

 I had a scenario where 3ware controllers were just failing to write to
 a drive in the array, so old data showed through.

I have an Intel ICH4 controller - nothing unusual.

 the filesystem and the partitions and the raids all were on different
 alignments so the only part of the system that had a boundary that
 aligned with the bad data was the physical stripes laid down by the
 controller.  It was 64k stripes and 64k data missing, exactly on
 stripe boundaries. Due to the fact that FreeBSD had partitioned the
 drive starting at 63 blocks in, nothing else aligned with the problem.

Hmm, well this is a straightforward disk situation - never used RAID on
this drive.  Given what is happening, I wonder about the chances of it
being HW, OS, or a filesystem issue.

-Joe


Re: Analysis of disk file block with ZFS checksum error

2008-02-08 Thread Julian Elischer

Joe Peterson wrote:

Julian Elischer wrote:

it could be an old file..
what kind of disks?


It's a Seagate ST3500630A parallel ATA drive.


I had a scenario where 3ware controllers were just failing to write to
a drive in the array, so old data showed through.


I have an Intel ICH4 controller - nothing unusual.


the filesystem and the partitions and the raids all were on different
alignments so the only part of the system that had a boundary that
aligned with the bad data was the physical stripes laid down by the
controller.  It was 64k stripes and 64k data missing, exactly on
stripe boundaries. Due to the fact that FreeBSD had partitioned the
drive starting at 63 blocks in, nothing else aligned with the problem.


Hmm, well this is a straightforward disk situation - never used RAID on
this drive.  Given what is happening, I wonder about the chances of it
being HW, OS, or a filesystem issue.

-Joe


still, see whether the 64k lines up with the drive or with
the filesystem (if the filesystem is not on an exact 64k boundary
of the drive).
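
The alignment check suggested here comes down to modular arithmetic. The sketch below assumes 512-byte sectors and, purely as an example, a filesystem starting 63 sectors into the drive as in the 3ware case described earlier; the real start sector of this pool, and the on-disk offset of the bad block within it, would have to be looked up separately. `alignment` is a hypothetical helper.

```python
SECTOR = 512          # bytes per sector (assumption)
CHUNK = 64 * 1024     # size of the damaged region, 64K


def alignment(offset_in_fs: int, fs_start_sector: int = 63):
    """Return (offset mod 64K relative to the filesystem, offset mod 64K relative to the drive)."""
    abs_offset = fs_start_sector * SECTOR + offset_in_fs
    return offset_in_fs % CHUNK, abs_offset % CHUNK


# Hypothetical example: damage starting at byte 0xA0000 within the filesystem.
fs_mod, drive_mod = alignment(0xA0000)
print(fs_mod, drive_mod)
```

A result of zero in only one of the two positions points at the layer whose 64K boundaries the damage follows; with a 63-sector start the two can never both be zero, which is what made the culprit identifiable in the 3ware case.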