On Wed, Apr 19, 2017 at 11:44 AM, Henk Slager <eye...@gmail.com> wrote:

> I also have a WD40EZRX and the fs on it is also almost exclusively a
> btrfs receive target and it has now for the second time csum (just 5)
> errors. Extended selftest at 16K hours shows no problem and I am not
> fully sure if this is a magnetic media error case or something else.

I have now located the 20K (sequential) run of bad csums in a 4G file,
plus the physical chunk address. I then read that 1G chunk out to a
file and wrote it back to the same disk location. No I/O errors in
dmesg, so my assumption is that the drive has remapped the 20K bad
spot to good spare sectors. Or it was a btrfs or luks fault, or just
a spurious random write caused by some SW/HW glitch.
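
For the record, a rough sketch of the read-writeback step. All names
and offsets below are hypothetical; the real values come from the
csum error messages in dmesg and from the chunk tree (e.g. btrfs
inspect-internal dump-tree -t chunk):

  # Made-up device and offsets, for illustration only.
  DEV=/dev/mapper/luks-wd40ezrx     # dm-crypt device under the fs
  CHUNK_PHYS_MiB=$((123 * 1024))    # physical start of the 1G data chunk
  CHUNK_MiB=1024                    # 1GiB chunk size

  # Read the whole chunk out to a scratch file ...
  dd if=$DEV of=/tmp/chunk.img bs=1M skip=$CHUNK_PHYS_MiB count=$CHUNK_MiB

  # ... and write it straight back to the same location (with the fs
  # unmounted!), which should make the drive remap weak sectors to spares.
  dd if=/tmp/chunk.img of=$DEV bs=1M seek=$CHUNK_PHYS_MiB conv=notrunc,fsync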

As a way of locking the bad area in place, I did a cp --reflink of
the 4G file to the root of the fs, and a read-writeback of the 20K
spot in the 4G file on the send-source fs. Now, after another
differential receive, I can remove all but the latest snapshot. The 5
csum errors will then sit there, fixed in place, as long as I don't
balance. Then, just before I do a btrfs replace (if I decide to), I
delete the 4G file and make sure the cleaner has finished, so that
the replace will not fail on the 5 bad csums.
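
In shell terms, the locking plus pre-replace steps look roughly like
this; all paths, snapshot names, devices and offsets are hypothetical
stand-ins for the real ones:

  # Pin the extents of the affected 4G file at the receive target, so
  # their space is not freed/reused when old snapshots are deleted.
  cp --reflink=always /mnt/target/snap-latest/big.img /mnt/target/big.img.pin

  # Read-writeback of the 20K bad spot in the file on the send-source
  # fs; BAD_OFF_4K is made up, the real offset comes from the csum errors.
  BAD_OFF_4K=123456                 # offset in 4K blocks
  dd if=/mnt/source/big.img of=/tmp/spot bs=4K skip=$BAD_OFF_4K count=5
  dd if=/tmp/spot of=/mnt/source/big.img bs=4K seek=$BAD_OFF_4K conv=notrunc

  # ... differential send/receive, drop all but the latest snapshot ...

  # Just before the replace: drop the pinned copy and wait for the
  # cleaner, so replace does not stumble over the 5 bad csums.
  rm /mnt/target/big.img.pin
  btrfs subvolume sync /mnt/target  # wait for deleted subvols to be cleaned
  sync
  btrfs replace start /dev/mapper/old-luks /dev/mapper/new-luks /mnt/target

The reflink copy costs no extra space, it just keeps the extents with
the bad csums referenced so their blocks stay where they are.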

The fs on the WD40EZRX is just another clone/backup, but with a
fairly complex subvolume tree. The above actions plus a replace are
more fun, and faster, than cloning again or recreating the tree with
rsync etc. I have done similar things in the past, when csum errors
were clearly due to btrfs bugs and the HDDs were fine.