On 08/12/2017 02:12 PM, Hugo Mills wrote:
> On Sat, Aug 12, 2017 at 01:51:46PM +0200, Christoph Anton Mitterer wrote:
>> On Sat, 2017-08-12 at 00:42 -0700, Christoph Hellwig wrote:
[...]      
>>               good, but csum is not
> 
>    I don't think this is a particularly good description of the
> problem. I'd say it's more like this:
> 
>    If you write data and metadata separately (which you have to do in
> the nodatacow case), and the system halts between the two writes, then
> you either have the new data with the old csum, or the old data with
> the new csum. Both data and csum are "good", but good from different
> states of the FS. In both cases (data first or metadata first), the
> csum doesn't match the data, and so you now have an I/O error reported
> when trying to read that data.
> 
>    You can't easily fix this, because when the data and csum don't
> match, you need to know the _reason_ they don't match -- is it because
> the machine was interrupted during write (in which case you can fix
> it), or is it because the hard disk has had someone write data to it
> directly, and the data is now toast (in which case you shouldn't fix
> the I/O error)?

I am still inclined to think that this kind of problem could be solved using a
journal: if you track which blocks are updated in a transaction, together with
their checksums, then you can always rebuild the data/checksum pair when the
transaction is interrupted.

When a transaction is interrupted:
- all COW data are discarded
- some NOCOW data might have been written
- all metadata (which are COW) are discarded

Suppose BTRFS logs, for each transaction, which NOCOW data blocks will be
updated and their new checksums. Then, if a transaction is interrupted, you know
which blocks have to be checked, and you can verify whether the checksum matches
and correct any mismatch. Logging the checksum as well could help to identify
whether:
- the data is old
- the data is updated
- the updated data is correct
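
To make the idea more concrete, here is a minimal sketch of such a replay step.
Everything in it is hypothetical: the names (JournalEntry, classify, replay), the
in-memory dicts standing in for the disk and the csum tree, and zlib.crc32 as a
stand-in for the real checksum (btrfs uses crc32c). It is an illustration of the
proposal, not of any existing btrfs code.

```python
import zlib
from dataclasses import dataclass


def csum(data: bytes) -> int:
    # Stand-in checksum for illustration only (btrfs itself uses crc32c).
    return zlib.crc32(data)


@dataclass
class JournalEntry:
    # Hypothetical journal record: logged before the NOCOW write is issued.
    block: int      # block number that the transaction intends to overwrite
    new_csum: int   # checksum of the data that is about to be written


def classify(data: bytes, committed_csum: int, entry: JournalEntry) -> str:
    """Decide which state an on-disk NOCOW block is in after a crash."""
    actual = csum(data)
    if actual == entry.new_csum:
        return "new"    # the updated data reached the disk and is correct
    if actual == committed_csum:
        return "old"    # the write never happened; the old csum is still valid
    return "torn"       # neither state: partial write or real corruption


def replay(disk: dict, csum_tree: dict, journal: list) -> dict:
    """Check only the journaled blocks and fix the csums that can be fixed."""
    result = {}
    for entry in journal:
        state = classify(disk[entry.block], csum_tree[entry.block], entry)
        if state == "new":
            # The data is good, only the (COW) csum update was lost.
            csum_tree[entry.block] = entry.new_csum
        result[entry.block] = state
    return result
```

Only the "torn" case would still have to be reported as an I/O error; in the
other two cases the reason for the mismatch is known, so the csum can be trusted
or repaired.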

The same approach could also be used to solve the infamous RAID5/6 write hole:
by logging which blocks are updated, you can work out which parity has to be
checked and rebuilt when a transaction is aborted.
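
As a rough sketch of the parity case, assuming single (RAID5-style) XOR parity
and a journal that records the set of stripes touched by the transaction: the
names (xor_parity, repair_stripes) and the dict layout are made up for the
example, and it assumes the data blocks themselves have already been validated
(e.g. via their checksums).

```python
from functools import reduce


def xor_parity(blocks: list) -> bytes:
    # Byte-wise XOR of all data blocks in one stripe (RAID5-style parity).
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def repair_stripes(stripes: dict, parity: dict, dirty: set) -> list:
    """Recheck parity only for the stripes the journal marked as updated.

    stripes: stripe index -> list of data blocks (assumed already validated)
    parity:  stripe index -> parity block currently on disk
    dirty:   stripe indexes logged by the interrupted transaction
    Returns the stripe indexes whose parity had to be rebuilt.
    """
    rebuilt = []
    for idx in sorted(dirty):
        expected = xor_parity(stripes[idx])
        if parity[idx] != expected:
            parity[idx] = expected
            rebuilt.append(idx)
    return rebuilt
```

The point is that only the journaled stripes need to be examined after a crash,
instead of scrubbing the whole array.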

> 
>    Basically, nodatacow bypasses the very mechanisms that are meant to
> provide consistency in the filesystem.
> 
>    Hugo.
> 


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5