https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=288881
Bug ID: 288881
Summary: ZFS checksum error at `software' level
Product: Base System
Version: 14.3-RELEASE
Hardware: amd64
OS: Any
Status: New
Severity: Affects Only Me
Priority: ---
Component: kern
Assignee: [email protected]
Reporter: [email protected]
On one of our systems we regularly get checksum errors in one specific place,
always at exactly the same offset:
Aug 2 08:52:43 hostname ZFS[26329]: checksum mismatch, zpool=zroot
path=/dev/ada0p3 offset=1793168560128 size=32768
Aug 8 05:52:38 hostname ZFS[62461]: checksum mismatch, zpool=zroot
path=/dev/ada0p3 offset=1793168560128 size=32768
With zpool status each time confirming:
NAME          STATE     READ WRITE CKSUM
zroot         ONLINE       0     0     0
  raidz3-0    ONLINE       0     0     0
    ada0p3    ONLINE       0     0     2
    ada1p3    ONLINE       0     0     0
    ....
Each time, a zpool scrub fixed the issue without further ado and without any
further errors, and the issue did not get worse when left unattended for a few
months.
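To double-check that the reports really point at the same place, a minimal Python sketch along the following lines can parse the syslog lines and count how often each location shows up. The regular expression and the helper names `parse_mismatch` and `count_locations` are our own and hypothetical; the message format is assumed only from the two lines quoted above (with the wrapped lines rejoined), not from any documented format.

```python
import re
from collections import Counter

# Pattern inferred from the log lines quoted in this report; the exact
# syslog format is an assumption, not a documented interface.
MISMATCH_RE = re.compile(
    r"checksum mismatch, zpool=(?P<pool>\S+)\s+path=(?P<path>\S+)\s+"
    r"offset=(?P<offset>\d+)\s+size=(?P<size>\d+)"
)


def parse_mismatch(line):
    """Return (pool, path, offset, size) for a checksum-mismatch line, else None."""
    m = MISMATCH_RE.search(line)
    if m is None:
        return None
    return (m["pool"], m["path"], int(m["offset"]), int(m["size"]))


def count_locations(lines):
    """Count how often each (pool, path, offset, size) location is reported."""
    return Counter(loc for loc in map(parse_mismatch, lines) if loc is not None)


# The two reports quoted above, with the mail line wrapping undone.
LOG_LINES = [
    "Aug 2 08:52:43 hostname ZFS[26329]: checksum mismatch, zpool=zroot "
    "path=/dev/ada0p3 offset=1793168560128 size=32768",
    "Aug 8 05:52:38 hostname ZFS[62461]: checksum mismatch, zpool=zroot "
    "path=/dev/ada0p3 offset=1793168560128 size=32768",
]

if __name__ == "__main__":
    for (pool, path, offset, size), n in count_locations(LOG_LINES).items():
        print(f"{pool} {path} offset={offset} size={size}: {n} report(s)")
```

Feeding the full, unwrapped output of the messages files into `count_locations` instead of `LOG_LINES` would show whether any other location was ever reported.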
We assumed that this was a hardware error and recently replaced the disk,
swapping an HGST HMS5C4040BLE640 for an ST4000NM0245.
From the kernel messages:
# bzcat messages*bz2 | grep ada0 | grep Serial
Aug 5 18:45:57 hostname kernel: ada0: Serial Number PL2331LAGUP9BJ
Aug 9 10:26:33 hostname kernel: ada0: Serial Number ZC112BE5
We can see that the swap was successful, and that the new disk is indeed ada0.
Resilvering went without error. However, we now see the very same `error'
appearing again:
Aug 14 11:50:59 hostname ZFS[83330]: checksum mismatch, zpool=zroot
path=/dev/ada0p3 offset=1793168592896 size=32768
Aug 14 11:50:59 hostname ZFS[83810]: checksum mismatch, zpool=zroot
path=/dev/ada0p3 offset=1793168560128 size=32768
Aug 15 03:56:04 hostname root[23278]: hostname - ZFS pool - HEALTH
fault
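The two offsets reported on Aug 14 are exactly 32768 bytes apart, so together they describe one contiguous 64 KiB region, and the first of them is the offset that was already failing on the old disk. A small Python sketch of that arithmetic follows; the helper names `regions_adjacent` and `describe` are hypothetical, and nothing here assumes whether the reported offset is relative to the start of the partition or already includes any label area.

```python
# Offsets and size copied from the Aug 14 log lines in this report.
OLD_OFFSET = 1793168560128  # also reported on Aug 2 and Aug 8 (old disk)
NEW_OFFSET = 1793168592896  # first seen after the disk replacement
SIZE = 32768                # size= field of every report


def regions_adjacent(a, b, size):
    """True if two regions of `size` bytes starting at a and b touch end to start."""
    lo, hi = sorted((a, b))
    return lo + size == hi


def describe(offset, size):
    """Summarise one reported region: byte range and alignment of its start."""
    return {
        "start": offset,
        "end": offset + size,  # exclusive end of the region
        "aligned_4k": offset % 4096 == 0,
        "aligned_to_size": offset % size == 0,
    }


if __name__ == "__main__":
    print("adjacent:", regions_adjacent(OLD_OFFSET, NEW_OFFSET, SIZE))
    for off in (OLD_OFFSET, NEW_OFFSET):
        print(describe(off, SIZE))
```

Both regions start on a 4096-byte boundary but not on a 32768-byte boundary, which may be worth keeping in mind when comparing against on-disk sector or stripe layouts.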
So I am now starting to doubt that this is a hardware issue, and am wondering
whether it is a software issue and what can be done to narrow this down.