On Mon, Oct 30, 2017 at 2:57 AM, Zak Kohler <y...@y2kbugger.com> wrote:

> $ sudo btrfs scrub start --offline --progress /dev/disk/by-id/WD-XX1
> Scrub result:
> Tree bytes scrubbed: 5234425856
> Tree extents scrubbed: 638968
> Data bytes scrubbed: 4353723670528
> Data extents scrubbed: 374300
> Data bytes without csum: 533200896
> Read error: 0
> Verify error: 0
> Csum error: 150
>
> $ sudo btrfs scrub start --offline --progress /dev/disk/by-id/WD-XX2
> Scrub result:
> Tree bytes scrubbed: 5234425856
> Tree extents scrubbed: 638967
> Data bytes scrubbed: 4353723314176
> Data extents scrubbed: 374300
> Data bytes without csum: 533200896
> Read error: 0
> Verify error: 0
> Csum error: 238
>
> $ sudo btrfs scrub start --offline --progress /dev/disk/by-id/WD-XX3
> Scrub result:
> Tree bytes scrubbed: 5234491392
> Tree extents scrubbed: 638975
> Data bytes scrubbed: 4353723572224
> Data extents scrubbed: 374300
> Data bytes without csum: 533200896
> Read error: 0
> Verify error: 0
> Csum error: 175 #first run
> Csum error: 112 #second run...
> Csum error: 285 #third run...
>
> But I ran the /dev/disk/by-id/WD-XX3 device three times and you can
> see the result...


I expect these commands are the same, and involve all three drives in
the offline scrub each time. So you have five different results, but
all five involve csum errors. So the errors have a certain transience
to them, hence inconsistent results.
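
To make the spread concrete, here is a minimal sketch that pulls the csum error counts out of the quoted output. The `quoted` string is just the five "Csum error" lines copied from above; the variable names are mine and nothing here talks to btrfs itself.

```python
import re

# The five "Csum error" lines copied verbatim from the quoted scrub output.
quoted = """\
Csum error: 150
Csum error: 238
Csum error: 175 #first run
Csum error: 112 #second run...
Csum error: 285 #third run...
"""

# Extract the integer that follows "Csum error:" at the start of each line.
counts = [int(n) for n in re.findall(r"^Csum error: (\d+)", quoted, flags=re.M)]

lowest, highest = min(counts), max(counts)
all_distinct = len(set(counts)) == len(counts)

print("counts:", counts)
print("range:", lowest, "to", highest)
print("every run differs:", all_distinct)
```

No two runs agree, and the three runs against the same WD-XX3 path differ from each other as much as the runs against different paths do.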

But the online scrub consistently reports zero errors. That to me
sounds like a bug in the offline scrub code. Maybe it's confused, and
reports data without csums (nodatacow) as csum errors? That does not
explain the inconsistency though.
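
A rough check of that idea, using only the numbers quoted above. The 4096-byte block size is my assumption (the usual default sectorsize); the actual value for this filesystem is not stated in the thread.

```python
# "Data bytes without csum" as reported by each of the three quoted commands.
no_csum_bytes = {
    "WD-XX1": 533200896,
    "WD-XX2": 533200896,
    "WD-XX3": 533200896,
}

# Csum error counts from the five quoted runs.
csum_errors = [150, 238, 175, 112, 285]

# ASSUMPTION: 4 KiB blocks; not confirmed anywhere in the thread.
ASSUMED_BLOCK_SIZE = 4096

# The no-csum figure is identical for every run...
no_csum_is_stable = len(set(no_csum_bytes.values())) == 1

# ...while the csum error count is different every time.
errors_are_stable = len(set(csum_errors)) == 1

no_csum_blocks = no_csum_bytes["WD-XX1"] // ASSUMED_BLOCK_SIZE

print("no-csum bytes stable across runs:", no_csum_is_stable)
print("csum errors stable across runs:", errors_are_stable)
print("no-csum blocks (assumed 4 KiB):", no_csum_blocks)
```

The amount of data without csums is the same in every run, so if it were simply being miscounted as csum errors I would expect a stable error count, not one that moves between 112 and 285.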

And then you're getting a consistent failure, but at an inconsistent
location, with Btrfs send, ostensibly due to an IO error, which sounds
like it's hitting a bad csum check.

It is entirely possible to get transient errors like this somewhere in
a storage stack that are not otherwise reported by the error detection
code in that layer. The thing I really don't understand is how you're
getting zero errors with conventional online scrub, every time.

On my tiny 23G installation I'm traveling with, I get the same results
with all three scrub methods on an NVMe drive. Zero errors. The
slightly larger spinning rust drives are not with me so I can't check
them for a while.



-- 
Chris Murphy