On Thu, Jun 28, 2018 at 11:37 AM, Goffredo Baroncelli <kreij...@libero.it> wrote:
> Regarding your point 3), it must be pointed out that in the case of NOCOW
> files, even having the same transid is not enough. It is still possible
> that a copy is updated before a power failure prevents the super-block
> update.
> I think that the only way to prevent this from happening is:
> 1) using a data journal (which means that all data is copied twice)
> OR
> 2) using a cow filesystem (with cow enabled of course !)

There is no power failure in this example, so that's really off the
table when considering whether Btrfs or mdadm/lvm raid does better in
the same situation with a nodatacow file.

I think here is the problem in the Btrfs nodatacow case: Btrfs doesn't
have a way of distrusting nodatacow files on a previously missing drive
that hasn't been balanced. There is no such thing as nometadatacow, so
no matter what, it figures out there's a problem and uses the good copy
of metadata, but it never "marks" the previously missing device as
suspicious. When it comes time to read a nodatacow file, Btrfs just
blindly reads off one of the drives; without csums it has no mechanism
for questioning the formerly missing drive.

That is actually a really weird and unique kind of write hole for Btrfs
raid1 when the data is nodatacow.

I have to agree with Remi. This is a flaw in the design or a bad bug,
however you want to consider it, because mdadm/lvm do not behave this
way in the exact same situation.

And an open question I have about scrub is whether it only ever checks
csums, meaning nodatacow files are never scrubbed, or whether the copies
are at least compared to each other.

As for fixes:

- At mount time, when Btrfs sees from the supers that there is a
  transid mismatch, do not read nodatacow files from the lower-transid
  device until an auto balance has completed. Right now Btrfs doesn't
  have an abbreviated balance that "replays" the events between two
  transids. Basically it would work like send/receive, but for balance,
  to catch up a previously missing device. Right now we have to do a
  full balance, which is a brutal penalty for a briefly missing drive.
  Again, mdadm and lvm do better here by default.

- Fix the performance issues of COW with disk images. ZFS doesn't even
  have a nodatacow option, and they're running VM images on ZFS, and it
  doesn't sound like they're running into ridiculous performance
  penalties that make it impractical to use.

--
Chris Murphy
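
To make the first fix a bit more concrete, here is a rough Python sketch of the read policy described above. None of this is real Btrfs code: `Device`, `stale_devices`, `readable_mirrors` and the `caught_up` flag are all hypothetical names made up for illustration. It also assumes the "stale" marking is simply recomputed from the superblock transids and ignores how it would be persisted.

```python
from dataclasses import dataclass


@dataclass
class Device:
    """Hypothetical stand-in for one raid1 member; not a Btrfs structure."""
    name: str
    super_transid: int       # generation recorded in this device's superblock
    caught_up: bool = False  # assumed flag: set once a balance/resync is done


def stale_devices(devices):
    """Return names of devices whose superblock transid lags the newest one."""
    newest = max(d.super_transid for d in devices)
    return {d.name for d in devices
            if d.super_transid < newest and not d.caught_up}


def readable_mirrors(devices, has_csum):
    """Pick the devices a read may be served from.

    Checksummed extents can be verified after the read, so every mirror is
    allowed. nodatacow extents have no csum, so stale mirrors are skipped.
    """
    if has_csum:
        return [d.name for d in devices]
    stale = stale_devices(devices)
    return [d.name for d in devices if d.name not in stale]


devs = [Device("sda", super_transid=120), Device("sdb", super_transid=97)]
print(readable_mirrors(devs, has_csum=True))   # both mirrors are allowed
print(readable_mirrors(devs, has_csum=False))  # only the up-to-date mirror
devs[1].caught_up = True
print(readable_mirrors(devs, has_csum=False))  # both again after catch-up
```

The point of the sketch is only that the decision needs nothing beyond what the supers already tell us at mount time: the lower-transid device is known, so nodatacow reads could avoid it until it has been caught up.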