On 2016-07-14 23:20, Chris Mason wrote:
>
>
> On 07/12/2016 05:50 PM, Goffredo Baroncelli wrote:
>> Hi All,
>>
>> I developed a new btrfs command "btrfs insp phy"[1] to further
>> investigate this bug [2]. Using "btrfs insp phy" I developed a
>> script to trigger the bug. The bug is not always triggered, but
>> most of the time it is.
>>
>> Basically the script creates a raid5 filesystem (using three
>> loop devices on three files called disk[123].img); on this
>> filesystem it creates a file. Then, using "btrfs insp phy", the
>> physical placement of the data on the devices is computed.
>>
>> First the script checks that the data are correct (for data1,
>> data2 and parity), then it corrupts the data:
>>
>> test1: the parity is corrupted, then scrub is run. Then the (data1,
>> data2, parity) data on the disk are checked. This test passes
>> every time.
>>
>> test2: data2 is corrupted, then scrub is run. Then the (data1,
>> data2, parity) data on the disk are checked. This test fails most
>> of the time: the data on the disk is not correct; the parity is
>> wrong. Scrub sometimes reports "WARNING: errors detected during
>> scrubbing, corrected" and sometimes reports "ERROR: there are
>> uncorrectable errors". But this seems unrelated to whether the
>> data is corrupted or not.
>>
>> test3: like test2, but data1 is corrupted. The results are the
>> same as above.
>>
>> test4: data2 is corrupted, then the file is read. The system
>> doesn't return an error (the data seems to be fine); but data2 on
>> the disk is still corrupted.
>>
>> Note: data1, data2, parity are the disk elements of the raid5
>> stripe.
>>
>> Conclusion:
>>
>> Most of the time, it seems that btrfs-raid5 is not capable of
>> rebuilding parity and data. Worse, the message returned by scrub
>> is inconsistent with the status on the disk. The tests didn't fail
>> every time; this complicates the diagnosis. However, my script
>> fails most of the time.
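For reference, the per-stripe check described above can be sketched as follows. This is only a minimal sketch in Python, not the actual test script: it assumes a three-device raid5 where the parity element is the byte-wise XOR of the two data elements, and the function names and sample bytes are made up for illustration.

```python
# Minimal sketch (assumption: 3-device raid5, parity = data1 XOR data2).
# Function names and sample contents are hypothetical, for illustration only.

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Return the byte-wise XOR of two equal-length stripe elements."""
    if len(a) != len(b):
        raise ValueError("stripe elements must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def parity_is_consistent(data1: bytes, data2: bytes, parity: bytes) -> bool:
    """Check that the parity element matches the two data elements."""
    return xor_bytes(data1, data2) == parity


def rebuild(surviving: bytes, parity: bytes) -> bytes:
    """Rebuild the missing data element from the other one and the parity."""
    return xor_bytes(surviving, parity)


# Hypothetical stripe contents standing in for what is read from the disks.
data1 = b"adaaaaaa"
data2 = b"bdbbbbbb"
parity = xor_bytes(data1, data2)

# A healthy stripe passes the check.
print(parity_is_consistent(data1, data2, parity))        # True

# Overwriting data2 (as in test2) makes the stripe inconsistent ...
print(parity_is_consistent(data1, b"XXXXXXXX", parity))  # False

# ... but data2 can still be recovered from data1 and the parity.
print(rebuild(data1, parity) == data2)                   # True
```

After a successful scrub, all three elements read back from the disks should pass parity_is_consistent() again and match their original contents; in test2 and test3 that is what does not happen most of the time.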
>
> Interesting, thanks for taking the time to write this up. Is the
> failure specific to scrub? Or is parity rebuild in general also
> failing in this case?

Test #4 handles this case: I corrupt the data, and when I read the file the data returned is good. So the parity is used, but the data on the platter is still bad.

However, I have to point out that this kind of test is very difficult to do: the file cache can cause old data to be read, so suggestions about how to flush the cache are welcome (I do some sync, unmount the filesystem and perform "echo 3 >/proc/sys/vm/drop_caches", but sometimes it seems not to be enough).

>
> -chris
>

BR
G.Baroncelli

--
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5