On 2017-08-14 20:32, Christoph Anton Mitterer wrote:
> On Mon, 2017-08-14 at 15:46 +0800, Qu Wenruo wrote:
>> The problem here is: if you enable csums and the data is updated
>> correctly but only the metadata is trashed, then you can't even read
>> out the correct data.

> So what?
> This problem occurs anyway *only* in case of a crash, and *only* if
> nodatacow+checksumming were used.
> A case in which, currently, the user can either only hope that his data
> is fine (unless higher levels provide some checksumming means [0]), or
> needs to recover from a backup anyway.

Let's make the combinations and their results in the power-loss case clear:

Datacow + Datasum: Good old data

Datacow + nodatasum: Good old data

Nodatacow + datasum: Good old data (data not committed yet) or -EIO (data updated). This combination isn't supported yet, so I just assume it would use the current csum-checking behavior.

Nodatacow + nodatasum: Good old data (data not committed yet) or uncertain data.

The uncertain part is how it should behave once the data has been updated.

If we really need to implement nodatacow + datasum, I prefer to make it consistent with the nodatacow + nodatasum behavior: at least read out the data and give some csum warning, instead of refusing to read and returning -EIO.
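
To make the table above concrete, here is a minimal C sketch of the read-side policy being proposed. Everything here is made up for illustration (the function and enum names do not exist in the btrfs code base); it is just the decision table in code form, not the actual kernel read path:

/*
 * Hypothetical sketch of the table above: none of these names exist
 * in btrfs, they only express the policy being discussed.
 */
#include <stdbool.h>

enum read_result {
	GOOD_OLD_DATA,		/* reader sees consistent pre-crash contents */
	UNCERTAIN_DATA,		/* reader sees whatever landed on disk */
	WARN_AND_RETURN_DATA,	/* proposed: log a csum warning, return the data */
};

enum read_result read_after_power_loss(bool datacow, bool datasum,
				       bool data_reached_disk)
{
	if (datacow || !data_reached_disk)
		return GOOD_OLD_DATA;	/* CoW data, or the overwrite never hit disk */

	if (!datasum)
		return UNCERTAIN_DATA;	/* nodatacow + nodatasum */

	/*
	 * nodatacow + datasum: the new data is on disk, but the committed
	 * csum tree still describes the old data, so verification fails.
	 * Current csum-checking behavior would return -EIO here; the
	 * proposal is to warn and return the data anyway.
	 */
	return WARN_AND_RETURN_DATA;
}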


> Intuitively I'd also say it's much less likely that the data (which is
> larger in terms of space) is written correctly while the checksum is not.
> Or is it?

Checksums are protected by mandatory metadata CoW, so metadata updates are always atomic.
The checksum will either be updated correctly or not updated at all, unlike the data.

And this is quite likely to happen. When committing the filesystem, we write the data first, then the metadata (both may still be cached by the disk controller, but at least we have submitted the requests to the disk), then flush all data and metadata to disk, and finally update the superblock.

Since metadata is updated with CoW, until the new superblock is written to disk we are always reading the old metadata trees (including the csum tree).

So if power loss happens between the data being written to disk and the final superblock update, it's quite likely to hit the problem. And considering the data/metadata ratio, we spend more time flushing data than metadata, which increases the likelihood even further.
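
A very simplified user-space model of that commit ordering (not actual btrfs code; dev_fd stands in for the block device and the offsets are made up), just to show where the window sits:

/*
 * Simplified model of the commit ordering described above.
 * Not btrfs code; offsets and sizes are made up for illustration.
 */
#include <err.h>
#include <sys/types.h>
#include <unistd.h>

#define DATA_OFF   (1024 * 1024)        /* nodatacow data: overwritten in place */
#define META_OFF   (16 * 1024 * 1024)   /* CoW metadata: always a new location  */
#define SUPER_OFF  (64 * 1024)

static void xpwrite(int fd, const void *buf, size_t len, off_t off)
{
	if (pwrite(fd, buf, len, off) != (ssize_t)len)
		err(1, "pwrite");
}

void commit(int dev_fd,
	    const void *data, size_t dlen,
	    const void *new_meta, size_t mlen,
	    const void *new_super, size_t slen)
{
	/* 1. Data extents first (for nodatacow this overwrites the old data). */
	xpwrite(dev_fd, data, dlen, DATA_OFF);

	/* 2. CoW'd metadata (including the csum tree) goes to a new location;
	 *    the old trees on disk are left untouched. */
	xpwrite(dev_fd, new_meta, mlen, META_OFF);

	/* 3. Flush everything written so far to stable media. */
	if (fsync(dev_fd))
		err(1, "fsync");

	/* 4. Only now point the superblock at the new trees ... */
	xpwrite(dev_fd, new_super, slen, SUPER_OFF);

	/* 5. ... and flush again.  Power loss anywhere before this completes
	 *    means the next mount reads the OLD superblock and therefore the
	 *    OLD csum tree, while step 1 may already have overwritten the
	 *    data in place. */
	if (fsync(dev_fd))
		err(1, "fsync");
}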


> [0] And when I investigated back when the discussion came up the first
> time, and some list member claimed that most typical cases (DBs, VM
> images) would do their own checksumming anyway... I came to the
> conclusion that most did not even support it, and even if they did,
> it's not enabled by default and not really *full* checksumming in
> most cases.



>> As the btrfs csum checker will just prevent you from reading out any
>> data which doesn't match its csum.
> As I've said before, a tool could be provided that re-computes the
> checksums then (making the data accessible again)... or one could
> simply mount the fs with nochecksum or some other special option which
> allows bypassing any checks.

Just as you pointed out, such csum bypassing should be the prerequisite for nodatacow+datasum.
And unfortunately, we don't have such a facility yet.


>> Now it's not just data corruption, but data loss then.
> I think the former is worse than the latter. The latter gives you a
> chance of noticing it, and either recovering from a backup, regenerating
> the data (if possible), or manually marking the data as "good" (though
> corrupted) again.

This depends.
If the upper layer has its own error-detection mechanism, like keeping a special file fsynced before each write (or just call it a journal), then allowing it to read out the possibly corrupted data gives it a chance to find the data good and continue.
While just returning -EIO kills that chance entirely.

BTW, normal user-space programs can handle a csum mismatch better than -EIO.
For example, zip files carry their own checksums, but the tools can't handle -EIO at all.
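
As a user-space illustration of that point, here is a small sketch assuming zlib's crc32() as the application-level checksum and a made-up record layout (4-byte stored CRC followed by the payload); build with -lz:

/*
 * Sketch: application-level checksum vs. -EIO from the filesystem.
 * The record layout (4-byte stored CRC32, then the payload) is made up.
 */
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <zlib.h>

/* Returns 0 = good, 1 = present but corrupted, -1 = could not read at all. */
int read_record(int fd, char *buf, size_t len)
{
	uint32_t stored;

	if (read(fd, &stored, sizeof(stored)) != (ssize_t)sizeof(stored) ||
	    read(fd, buf, len) != (ssize_t)len) {
		/*
		 * If the filesystem rejects the read because of a csum
		 * mismatch, we land here with errno == EIO and never see
		 * the bytes at all; the application's own check below
		 * never gets a chance to run.
		 */
		fprintf(stderr, "read failed: %s\n", strerror(errno));
		return -1;
	}

	/* The data was readable, so the application can judge it itself. */
	uint32_t actual = (uint32_t)crc32(0L, (const Bytef *)buf, (uInt)len);
	if (actual != stored) {
		fprintf(stderr, "record corrupt (crc 0x%08x != 0x%08x), "
				"recover from journal/backup or accept it\n",
			actual, stored);
		return 1;
	}
	return 0;
}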

Thanks,
Qu



> Cheers,
> Chris.
