On 2015-10-19 02:19, Erkki Seppala wrote:
Hugo Mills <h...@carfax.org.uk> writes:
    It has to be disabled because if you enable it, there's a race
condition: since you're overwriting existing data (rather than CoWing
it), you can't update the checksums atomically. So, in the interests
of consistency, checksums are disabled.

I suppose this has been suggested before, but couldn't it store both the
new and the old checksums and be satisfied if either of them match?
Actually, I don't think that's been suggested before. Read on, however, for an explanation of why we don't do that.

The user is probably not happy that a partial write is going to be
difficult to read from the device due to a checksum error, but there is
no promise of recently-overwritten data state with traditional
filesystems either in case of sudden powerdown, assuming there is no
data journaling.
And that is exactly the case with how things are now: when something is marked NOCOW, it has essentially zero guarantee of data consistency after a crash. As things are now, though, there is a guarantee that you can still read the file. Using checksums like you suggest would result in it being unreadable most of the time after a crash, because it's statistically unlikely that we wrote the _whole_ block (IOW, we can't guarantee without COW that the data was completely written). The reasons:

a. While some disks do atomically write single sectors, most don't, and if the power dies while the disk is writing a single sector, there is no certainty about what that sector will read back as.

b. Assuming that item a is not an issue, one block in BTRFS is usually multiple sectors on disk, and a majority of disks have volatile write caches, so it is not unlikely that the power will die partway through writing the block.

c. In the event that neither item a nor item b is an issue (for example, you have a storage controller with a non-volatile write cache, have write caching turned off on the disks, and it's a smart enough storage controller that it only removes writes from the cache after they return), there is still the small but distinct possibility that the crash will cause either corruption in the write cache or some other hardware-related issue.
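To make the partial-write argument concrete, here is a rough sketch in Python of the "accept either checksum" idea applied to a block that was only partly overwritten. Everything in it is an assumption for illustration: the 512-byte sector and 4096-byte block sizes, the helper names torn_block and readable, and the use of zlib.crc32 as a stand-in checksum (it is not the checksum BTRFS itself uses). It is also optimistic, in that it pretends each sector is written atomically and in order, which item a says you can't count on.

```python
import zlib

SECTOR = 512                      # assumed on-disk sector size
BLOCK = 4096                      # assumed filesystem block size
SECTORS_PER_BLOCK = BLOCK // SECTOR


def torn_block(old: bytes, new: bytes, sectors_written: int) -> bytes:
    """Block contents if only the first `sectors_written` sectors of `new` reached the disk."""
    cut = sectors_written * SECTOR
    return new[:cut] + old[cut:]


def readable(on_disk: bytes, old_sum: int, new_sum: int) -> bool:
    """Hypothetical rule: accept the block if it matches either stored checksum."""
    return zlib.crc32(on_disk) in (old_sum, new_sum)


# Old and new contents differ in every sector.
old = bytes([0xAA]) * BLOCK
new = bytes([0x55]) * BLOCK
old_sum, new_sum = zlib.crc32(old), zlib.crc32(new)

# Simulate power loss after 0, 1, ..., 8 sectors of the overwrite.
outcomes = [
    readable(torn_block(old, new, n), old_sum, new_sum)
    for n in range(SECTORS_PER_BLOCK + 1)
]

for n, ok in enumerate(outcomes):
    print(f"{n} of {SECTORS_PER_BLOCK} sectors written: "
          f"{'readable' if ok else 'checksum error'}")
```

Only the two endpoints (nothing written, or everything written) pass the check; every interrupted write in between is rejected under this rule, whereas today a NOCOW file in that state can still be read.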

