On Sat, Aug 12, 2017 at 01:51:46PM +0200, Christoph Anton Mitterer wrote:
> On Sat, 2017-08-12 at 00:42 -0700, Christoph Hellwig wrote:
> > And how are you going to write your data and checksum atomically when
> > doing in-place updates?
> 
> Maybe I misunderstand something, but what's the big deal with not
> doing it atomically (I assume you mean in terms of actually writing
> to the physical medium)? Isn't that already a problem anyway in the
> case of a crash?

   With normal CoW operations, the atomicity is achieved by
constructing a completely new metadata tree containing both changes
(references to the data, and the csum metadata), and then atomically
changing the superblock to point to the new tree, so it really is
atomic.
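
   To make that concrete, here's a toy C sketch of the commit
sequence. It is not btrfs's actual on-disk format -- the layout,
offsets and struct are invented for illustration -- but it shows why
the commit is atomic: all the new blocks land in free space, and the
only in-place write is a single sector-sized superblock update.

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

#define SUPERBLOCK_OFF 0                /* invented layout */
#define SECTOR 512

struct superblock {
        uint64_t root;                  /* offset of the current tree */
        char     pad[SECTOR - sizeof(uint64_t)];
};

static int cow_commit(int fd, uint64_t new_root_off,
                      const void *new_tree, size_t len)
{
        struct superblock sb = { .root = new_root_off };

        /* 1. Write the new tree (data refs + csum items) into free
         *    space; the old tree stays intact at its old offset.    */
        if (pwrite(fd, new_tree, len, new_root_off) != (ssize_t)len)
                return -1;
        if (fsync(fd))                  /* make it durable first     */
                return -1;

        /* 2. One sector-sized write flips the root pointer. A crash
         *    before it leaves the old consistent tree; a crash after
         *    it leaves the new consistent tree.                     */
        if (pwrite(fd, &sb, sizeof sb, SUPERBLOCK_OFF)
            != (ssize_t)sizeof sb)
                return -1;
        return fsync(fd);
}

int main(void)
{
        char tree[SECTOR] = "new tree: extent refs + csums";
        int fd = open("cow-demo.img", O_RDWR | O_CREAT | O_TRUNC, 0600);

        if (fd < 0 || cow_commit(fd, SECTOR, tree, sizeof tree))
                perror("cow-demo");
        if (fd >= 0)
                close(fd);
        return 0;
}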

   With nodatacow, that approach doesn't work, because the new data
replaces the old on the physical medium, so you'd have to make the
data write atomic with the superblock write -- which can't be done,
because it's (at least) two distinct writes.
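
   In the same toy layout (reusing the includes from the sketch
above), the nodatacow update path looks like this; note that there is
no single pointer flip that can publish the data and its csum
together:

/* companion to the sketch above: two distinct writes, with a crash
 * window between them that no write ordering can close             */
static int nocow_update(int fd, uint64_t data_off,
                        const void *data, size_t len,
                        uint64_t csum_off, uint32_t csum)
{
        /* write 1: clobbers the old data in place */
        if (pwrite(fd, data, len, data_off) != (ssize_t)len)
                return -1;

        /* <-- a crash here leaves NEW data with the OLD csum */

        /* write 2: updates the csum item in the metadata */
        if (pwrite(fd, &csum, sizeof csum, csum_off)
            != (ssize_t)sizeof csum)
                return -1;

        /* reversing the two writes only flips the failure mode:
         * OLD data with the NEW csum                               */
        return fsync(fd);
}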

> And isn't that the case also with all forms of e.g. software RAID (when
> not having a journal)?
> 
> And as I've said, what's the worst thing that can happen? Either way
> the data would not have been completely written - with or without
> checksumming. So what's the difference in trying the checksumming
> (which would succeed in all non-crash cases)?
> My understanding was (though that may of course be wrong, I'm not a
> filesystem expert at all) that the worst that can happen is that data
> and csum aren't *both* fully written (in all possible combinations),
> so we'd have four cases in total:
> 
> data=good csum=good => fine
> data=bad  csum=bad  => doesn't matter whether csummed or not, or
>                        whether atomic or not
> data=bad  csum=good => the csum will tell us that the data is bad
> data=good csum=bad  => the only real problem; the data would actually
>                        be good, but the csum is not

   I don't think this is a particularly good description of the
problem. I'd say it's more like this:

   If you write data and metadata separately (which you have to do in
the nodatacow case), and the system halts between the two writes, then
you either have the new data with the old csum, or the old data with
the new csum. Both the data and the csum are individually "good", but
good from different states of the FS. In both cases (data first or
metadata first), the csum doesn't match the data, and so you now have
an I/O error reported when trying to read that data.
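
   A toy demonstration of what that looks like at read time, with a
made-up additive checksum standing in for btrfs's crc32c:

#include <stdint.h>
#include <stdio.h>

static uint32_t toy_csum(const char *buf, size_t len)
{
        uint32_t sum = 0;

        while (len--)
                sum = sum * 31 + (unsigned char)*buf++;
        return sum;
}

int main(void)
{
        char old_data[] = "old contents";
        char new_data[] = "new contents";

        /* the csum item written before the crash was computed over
         * the OLD data                                              */
        uint32_t stored = toy_csum(old_data, sizeof old_data);

        /* the crash landed between the two writes, so the block on
         * disk now holds the NEW data                               */
        uint32_t actual = toy_csum(new_data, sizeof new_data);

        if (actual != stored)
                fprintf(stderr, "csum mismatch: the read fails with "
                                "EIO, though neither half is corrupt\n");
        return 0;
}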

   You can't easily fix this, because when the data and csum don't
match, you need to know the _reason_ they don't match -- is it because
the machine was interrupted during the write (in which case you can
fix it), or is it because something wrote to the disk directly, behind
the filesystem's back, and the data is now toast (in which case you
shouldn't fix the I/O error, as that would hide real corruption)?

   Basically, nodatacow bypasses the very mechanisms that are meant to
provide consistency in the filesystem.

   Hugo.

-- 
Hugo Mills             | vi vi vi: the Editor of the Beast.
hugo@... carfax.org.uk |
http://carfax.org.uk/  |
PGP: E2AB1DE4          |
