On Mon, Aug 14, 2017 at 09:54:48PM +0200, Christoph Anton Mitterer wrote:
> On Mon, 2017-08-14 at 11:53 -0400, Austin S. Hemmelgarn wrote:
> > Quite a few applications actually _do_ have some degree of secondary
> > verification or protection from a crash. Go look at almost any
> > database software.
>
> Then please give proper references for this! This is from 2015, where
> you claimed this already. I looked up all the bigger DBs, and they
> either couldn't do it at all, didn't do it by default, or it required
> application support (i.e. from the programs using the DB):
> https://www.spinics.net/lists/linux-btrfs/msg50258.html
>
> > It usually will not have checksumming, but it will almost always
> > have support for a journal, which is enough to cover the particular
> > data loss scenario we're talking about (unexpected unclean
> > shutdown).
>
> I don't think that's what we're talking about: we're talking about
> people wanting checksumming to notice e.g. silent data corruption.
> The crash case is only the corner case of what happens if the data is
> written correctly but the csums are not.
We use the crcs to catch storage gone wrong, both in terms of simple
things like cabling, bus errors, drives gone crazy or exotic problems
like every time I reboot the box a handful of sectors return EFI
partition table headers instead of the data I wrote. You don't need
data center scale for this to happen, but it does help...
So, we do catch crc errors in prod, and they do keep us from replicating
bad data over good data. Some databases also crc, and all drives have
correction bits of some kind. There's nothing wrong with crcs
happening at lots of layers.
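To make the "catch storage gone wrong" point concrete, here is a minimal sketch of what crc verification at one layer buys you. This is illustrative Python, not btrfs code: the names write_block/read_block are invented for the example, and zlib.crc32 stands in for the crc32c that btrfs actually uses.

```python
import zlib

def write_block(store, lba, data):
    # Store the payload together with a crc computed at write time.
    store[lba] = (data, zlib.crc32(data))

def read_block(store, lba):
    data, crc = store[lba]
    if zlib.crc32(data) != crc:
        # A mismatch is a clearly defined error (surfaced as EIO),
        # rather than silently handing back wrong data.
        raise IOError("crc mismatch on lba %d" % lba)
    return data

store = {}
write_block(store, 0, b"hello world")
assert read_block(store, 0) == b"hello world"

# Flip one bit in the stored payload to simulate silent corruption
# on the media (cabling, bus errors, a drive gone crazy...).
data, crc = store[0]
store[0] = (bytes([data[0] ^ 1]) + data[1:], crc)

detected = False
try:
    read_block(store, 0)
except IOError:
    detected = True
print("corruption detected:", detected)
```

The same check can live at several layers at once (drive ECC, filesystem crcs, application crcs) without conflict; each layer catches what the ones below it missed.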
Btrfs couples the crcs with COW because it's the least complicated way
to protect against:
* bits flipping
* IO getting lost on the way to the drive, leaving stale but valid data
in place
* IO from sector A going to sector B instead, overwriting valid data
with other valid data.
It's possible to protect against all three without COW, but all
solutions have their own tradeoffs and this is the setup we chose. It's
easy to trust and easy to debug and at scale that really helps.
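The second failure mode above (a lost write leaving stale but valid data) is the one that shows why btrfs keeps the crc in the COW'd metadata rather than inside the data block itself. A self-checksummed block that never got overwritten still matches its own old crc; a crc stored in the parent tree, updated copy-on-write along with the pointer, does not. A toy model of that, with invented names and zlib.crc32 standing in for crc32c:

```python
import zlib

blocks = {}    # lba -> (payload, crc embedded in the block itself)
metadata = {}  # lba -> crc the *current* tree expects for that block

def write(lba, payload, data_write_lost=False):
    if not data_write_lost:
        blocks[lba] = (payload, zlib.crc32(payload))
    # The metadata update goes through COW and, in this scenario,
    # reaches the drive even though the data write was dropped.
    metadata[lba] = zlib.crc32(payload)

write(7, b"old contents")
write(7, b"new contents", data_write_lost=True)  # IO lost on the way down

payload, embedded = blocks[7]
# The stale block still agrees with its own embedded crc...
self_consistent = zlib.crc32(payload) == embedded
# ...but the crc in the COW metadata exposes the lost write.
caught_by_metadata = zlib.crc32(payload) != metadata[7]
print("stale data self-consistent:", self_consistent)
print("caught by COW metadata:", caught_by_metadata)
```

A misdirected write (sector A landing on sector B) is caught the same way: the data at B is internally valid, but it is not the data the tree's checksum for B expects.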
In general, production storage environments prefer clearly defined
errors when the storage has the wrong data. EIOs happen often, and you
want to be able to quickly pitch the bad data and replicate in good
data.
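"Pitch the bad data and replicate in good data" is roughly what btrfs read-repair and scrub do on mirrored profiles. A sketch of the idea, under the assumption of two simple in-memory mirrors (the function name and layout are invented for illustration):

```python
import zlib

def read_with_repair(mirrors, expected_crc):
    for copy in mirrors:
        if zlib.crc32(copy) == expected_crc:
            # Found a good copy: rewrite any mirrors that fail the crc.
            for j in range(len(mirrors)):
                if zlib.crc32(mirrors[j]) != expected_crc:
                    mirrors[j] = copy
            return copy
    # No copy verifies: report a clearly defined error to the caller.
    raise IOError("all mirrors bad")

expected_crc = zlib.crc32(b"good data")
mirrors = [b"g00d data", b"good data"]  # mirror 0 silently corrupted

data = read_with_repair(mirrors, expected_crc)
print("read ok, mirror 0 repaired:", mirrors[0] == b"good data")
```

Only when every copy fails the check does the error propagate as EIO, which is exactly the clearly defined failure production storage wants.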
My real goal is to make COW fast enough that we can leave it on for the
database applications too. Obviously I haven't quite finished that one
yet ;) But I'd rather keep the building block of all the other btrfs
features in place than try to do crcs differently.
-chris