On 07/15/2016 12:28 PM, Goffredo Baroncelli wrote:
On 2016-07-14 23:20, Chris Mason wrote:
On 07/12/2016 05:50 PM, Goffredo Baroncelli wrote:
Hi All,
I developed a new btrfs command, "btrfs insp phy" [1], to further
investigate this bug [2]. Using "btrfs insp phy" I wrote a
script that triggers the bug. The bug is not triggered on every run,
but it is most of the time.
Basically, the script creates a raid5 filesystem (using three
loop devices backed by three files called disk[123].img) and then
creates a file on it. Then, using "btrfs insp phy", it computes the
physical placement of the file's data on the devices.
First the script checks that the on-disk contents are the expected
ones (for data1, data2 and parity), then it corrupts them:
test1: the parity is corrupted, then scrub is run. Then the (data1,
data2, parity) data on the disk are checked. This test passes
every time.
test2: data2 is corrupted, then scrub is run. Then the (data1,
data2, parity) data on the disk are checked. This test fails most of
the time: the data on the disk is not correct; the parity is wrong.
Scrub sometimes reports "WARNING: errors detected during scrubbing,
corrected" and sometimes reports "ERROR: there are uncorrectable
errors". But this seems unrelated to whether the data is
actually corrupted or not.
test3: like test2, but data1 is corrupted. The
results are the same as above.
test4: data2 is corrupted, then the file is read. The system doesn't
return an error (the data seems to be fine), but data2 on the disk
is still corrupted.
Note: data1, data2 and parity are the disk elements of the raid5
stripe.
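
For readers following the tests: with three devices, each raid5
stripe holds two data elements and one parity element, and the parity
is the byte-wise XOR of the two data elements. The sketch below is a
minimal Python illustration of the consistency check and the rebuild
that the tests expect scrub to perform. The function names are
illustrative only; they are not part of btrfs or of the script
mentioned above.

```python
def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equally sized blocks."""
    if len(a) != len(b):
        raise ValueError("blocks must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def stripe_is_consistent(data1: bytes, data2: bytes, parity: bytes) -> bool:
    """True when parity matches data1 XOR data2."""
    return xor_blocks(data1, data2) == parity


def rebuild(surviving: bytes, parity: bytes) -> bytes:
    """Recover the missing data element from the other one and the parity."""
    return xor_blocks(surviving, parity)


if __name__ == "__main__":
    data1 = b"\xaa" * 16
    data2 = b"\x55" * 16
    parity = xor_blocks(data1, data2)

    # Simulate test2: data2 is overwritten on disk.
    corrupted = b"\x00" * 16
    print(stripe_is_consistent(data1, corrupted, parity))
    # A correct repair recomputes data2 from data1 and parity.
    print(rebuild(data1, parity) == data2)
```

Note that XOR alone cannot tell which of the three elements is the
bad one; it only shows that the stripe is inconsistent.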
Conclusion:
most of the time, it seems that btrfs-raid5 is not capable of
rebuilding parity and data. Worse, the message returned by scrub is
inconsistent with the status on the disk. The tests didn't fail every
time; this complicates the diagnosis. However, my script fails most
of the time.
Interesting, thanks for taking the time to write this up. Is the
failure specific to scrub? Or is parity rebuild in general also
failing in this case?
Test #4 handles this case: I corrupt the data, and when I read
it the data is good. So parity is used but the data on the platter
are still bad.
However, I have to point out that this kind of test is very
difficult to do: the file cache could cause stale data to be read, so
suggestions on how to flush the cache are welcome (I do some syncs,
unmount the filesystem and perform "echo 3 >/proc/sys/vm/drop_caches",
but sometimes it seems not to be enough).
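
The sync and drop_caches steps described above can be sketched in
Python as follows. This is only an illustration of those two steps
(the unmount is left out), and the function name and its parameter
are made up for the example. Writing to the real control file needs
root.

```python
import os


def drop_caches(control_file="/proc/sys/vm/drop_caches"):
    """Flush dirty pages, then ask the kernel to drop its clean caches."""
    # drop_caches only frees clean pages, so write dirty ones back first.
    os.sync()
    with open(control_file, "w") as f:
        # 3 = page cache plus dentries and inodes
        f.write("3\n")
```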
O_DIRECT should handle the cache flushing for you.
-chris
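
A minimal sketch of an O_DIRECT read, as suggested above, for anyone
who wants to try it from Python on Linux. O_DIRECT requires the
buffer, the offset and the length to be aligned; the 4096-byte block
size here is an assumption, and the real requirement depends on the
device and filesystem. The helper names are illustrative.

```python
import mmap
import os

BLOCK = 4096  # assumed alignment, not queried from the device


def aligned_range(offset, length, block=BLOCK):
    """Widen [offset, offset + length) to block boundaries.

    Returns (start, size) of the aligned region to read.
    """
    start = (offset // block) * block
    end = -(-(offset + length) // block) * block  # round up
    return start, end - start


def read_direct(path, offset, length, block=BLOCK):
    """Read `length` bytes at `offset`, bypassing the page cache."""
    if length <= 0:
        return b""
    start, size = aligned_range(offset, length, block)
    fd = os.open(path, os.O_RDONLY | os.O_DIRECT)
    try:
        # An anonymous mmap is page-aligned, which satisfies O_DIRECT
        # as long as `block` does not exceed the page size.
        buf = mmap.mmap(-1, size)
        try:
            n = os.preadv(fd, [buf], start)
            skip = offset - start
            # Trim the aligned read back to the requested window.
            return bytes(buf[skip:min(n, skip + length)])
        finally:
            buf.close()
    finally:
        os.close(fd)
```

For example, read_direct("/mnt/file", 0, 4096) would read the first
4 KiB of a file without going through the page cache; the path is
hypothetical. Some filesystems (tmpfs, for instance) reject O_DIRECT
and os.open() will raise OSError there.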