On Sat, Aug 12, 2017 at 01:51:46PM +0200, Christoph Anton Mitterer wrote:
> On Sat, 2017-08-12 at 00:42 -0700, Christoph Hellwig wrote:
> > And how are you going to write your data and checksum atomically
> > when doing in-place updates?
> 
> Maybe I misunderstand something, but what's the big deal with not
> doing it atomically (I assume you mean in terms of actually writing
> to the physical medium)? Isn't that already a problem anyway in the
> case of a crash?
With normal CoW operations, the atomicity is achieved by constructing
a completely new metadata tree containing both changes (references to
the data, and the csum metadata), and then atomically changing the
superblock to point to the new tree, so it really is atomic. With
nodatacow, that approach doesn't work, because the new data replaces
the old on the physical medium, so you'd have to make the data write
atomic with the superblock write -- which can't be done, because it's
(at least) two distinct writes.

> And isn't that the case also with all forms of e.g. software RAID
> (when not having a journal)?
> 
> And as I've said, what's the worst thing that can happen? Either way,
> the data would not have been completely written - with or without
> checksumming. Then what's the difference in trying the checksumming
> (and doing it successfully in all non-crash cases)?
> 
> My understanding was (but that may be wrong of course, I'm not a
> filesystem expert at all) that the worst that can happen is that data
> and csum aren't *both* fully written (in all possible combinations),
> so we'd have four cases in total:
> 
> data=good csum=good => fine
> data=bad  csum=bad  => doesn't matter whether csum or not and
>                        whether atomic or not
> data=bad  csum=good => the csum will tell us that the data is bad
> data=good csum=bad  => the only real problem: data would actually be
>                        good, but csum is not

I don't think this is a particularly good description of the problem.
I'd say it's more like this:

If you write data and metadata separately (which you have to do in the
nodatacow case), and the system halts between the two writes, then you
either have the new data with the old csum, or the old data with the
new csum. Both data and csum are "good", but good from different
states of the FS. In both cases (data first or metadata first), the
csum doesn't match the data, and so you now have an I/O error reported
when trying to read that data.

You can't easily fix this, because when the data and csum don't match,
you need to know the _reason_ they don't match -- is it because the
machine was interrupted during the write (in which case you can fix
it), or is it because something wrote to the disk directly and the
data is now toast (in which case you shouldn't fix the I/O error)?

Basically, nodatacow bypasses the very mechanisms that are meant to
provide consistency in the filesystem.
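To make the commit sequence concrete, here is a minimal sketch in
Python of that CoW commit. It's a toy in-memory model: the Tree and
Superblock names and the single root reference are illustrative
assumptions for this sketch, not btrfs's actual structures.

import zlib
from dataclasses import dataclass

@dataclass(frozen=True)
class Tree:
    """Immutable metadata tree: data extents plus their checksums."""
    data: dict    # extent id -> bytes
    csums: dict   # extent id -> crc32 of those bytes

class Superblock:
    def __init__(self, tree):
        self.root = tree  # the one pointer that gets swapped atomically

    def commit(self, extent, payload):
        # Build a completely new tree off to the side, holding both
        # changes together: the new data reference and the new csum.
        # Nothing visible changes yet.
        new_tree = Tree({**self.root.data, extent: payload},
                        {**self.root.csums, extent: zlib.crc32(payload)})
        # The single atomic step: repoint the superblock. A crash
        # before this line leaves the old tree intact and consistent.
        self.root = new_tree

sb = Superblock(Tree({1: b"old"}, {1: zlib.crc32(b"old")}))
sb.commit(1, b"new")
assert zlib.crc32(sb.root.data[1]) == sb.root.csums[1]

Whichever tree the root reference points to, the data and csums in it
always agree -- that's exactly the property nodatacow gives up.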
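And here is the torn-write case in the same toy model: data and csum
are two separate in-place writes, and a crash between them leaves a
mismatch that reads back as an I/O error. Again, just an illustrative
sketch, not btrfs code.

import zlib

disk = {"data": b"old contents", "csum": zlib.crc32(b"old contents")}

def nodatacow_write(payload, crash_between=False):
    disk["data"] = payload              # write 1: data, in place
    if crash_between:
        raise SystemExit("power lost")  # simulated crash
    disk["csum"] = zlib.crc32(payload)  # write 2: csum metadata

def read():
    if zlib.crc32(disk["data"]) != disk["csum"]:
        # New data with the old csum: both writes were individually
        # fine, but they come from different states of the FS. A torn
        # write is indistinguishable from real corruption here, so the
        # only safe response is to refuse the read.
        raise IOError("csum mismatch")
    return disk["data"]

try:
    nodatacow_write(b"new contents", crash_between=True)
except SystemExit:
    pass

try:
    read()
except IOError as e:
    print("read failed:", e)  # the I/O error described above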
Hugo.

-- 
Hugo Mills             | vi vi vi: the Editor of the Beast.
hugo@... carfax.org.uk |
http://carfax.org.uk/  |
PGP: E2AB1DE4          |