Re: RedHat 7.4 Release Notes: "Btrfs has been deprecated" - wut?

Qu Wenruo Wed, 16 Aug 2017 07:03:33 -0700


On 2017-08-16 21:12, Chris Mason wrote:

On Mon, Aug 14, 2017 at 09:54:48PM +0200, Christoph Anton Mitterer wrote:
On Mon, 2017-08-14 at 11:53 -0400, Austin S. Hemmelgarn wrote:
Quite a few applications actually _do_ have some degree of secondary
verification or protection from a crash.  Go look at almost any
database software.
Then please give proper references for this!

This is from 2015, where you already claimed this, and I looked up all
the bigger DBs: they either couldn't do it at all, didn't do it by
default, or required application support (i.e. from the programs
using the DB)
https://www.spinics.net/lists/linux-btrfs/msg50258.html
It usually will not have checksumming, but it will almost
always have support for a journal, which is enough to cover the
particular data loss scenario we're talking about (unexpected
unclean shutdown).
I don't think that is what we are talking about:
We are talking about people wanting checksumming to notice e.g. silent data
corruption.

The crash case is only the corner case of what happens if the data
is written correctly but the csums are not.
We use the crcs to catch storage gone wrong, both in terms of simple things like cabling, bus errors, drives gone crazy or exotic problems like every time I reboot the box a handful of sectors return EFI partition table headers instead of the data I wrote. You don't need data center scale for this to happen, but it does help...
So, we do catch crc errors in prod and they do keep us from replicating bad data over good data. Some databases also crc, and all drives have correction bits of some kind. There's nothing wrong with crcs happening at lots of layers.
Btrfs couples the crcs with COW because it's the least complicated way to protect against:
* bits flipping
* IO getting lost on the way to the drive, leaving stale but valid data in place
* IO from sector A going to sector B instead, overwriting valid data with other valid data.
It's possible to protect against all three without COW, but all solutions have their own tradeoffs and this is the setup we chose. It's easy to trust and easy to debug and at scale that really helps.
In general, production storage environments prefer clearly defined errors when the storage has the wrong data. EIOs happen often, and you want to be able to quickly pitch the bad data and replicate in good data.
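
To make the three failure modes above concrete, here is a minimal toy sketch. It is not btrfs code and does not reflect the btrfs on-disk format: ToyCowStore, ChecksumError and is_readable are made-up names, and zlib.crc32 is only a stand-in for the checksum btrfs actually uses. The one idea it borrows is that the checksum lives in metadata, apart from the data, and that every write goes to a fresh location, so a flipped bit, a lost write and a misdirected write all show up as a clearly defined read error.

```python
import zlib


class ChecksumError(IOError):
    """Stands in for the EIO a filesystem would return on a csum mismatch."""


class ToyCowStore:
    """Toy model: data sectors plus separate metadata of name -> (sector, crc)."""

    def __init__(self, nsectors=8, sector_size=16):
        self.disk = [bytes(sector_size) for _ in range(nsectors)]
        self.meta = {}       # name -> (sector, crc of the data we meant to write)
        self.next_free = 0   # COW: every write goes to a fresh sector

    def write(self, name, data, lost=False, lands_on=None):
        sector = self.next_free
        self.next_free += 1
        if lost:
            pass                        # the IO never reaches the drive
        elif lands_on is not None:
            self.disk[lands_on] = data  # the IO hits a different sector
        else:
            self.disk[sector] = data
        # Metadata always records where the data should be and its checksum.
        self.meta[name] = (sector, zlib.crc32(data))

    def read(self, name):
        sector, crc = self.meta[name]
        data = self.disk[sector]
        if zlib.crc32(data) != crc:
            raise ChecksumError(f"csum mismatch for {name!r} at sector {sector}")
        return data


def is_readable(store, name):
    """True if the stored data still matches its recorded checksum."""
    try:
        store.read(name)
        return True
    except ChecksumError:
        return False
```

Calling write() with lost=True models the second bullet, and lands_on=<other sector> models the third; in both cases read() raises instead of silently returning stale or foreign data.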

Btrfs csum is really good, especially for cases like RAID1/5/6, where the csum can provide extra info about which mirror/stripe/parity can be trusted, with minimal space wasted.
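
As a rough illustration of the RAID1 case (a sketch only, not how btrfs implements it): when two mirrors disagree, nothing in the mirrors themselves says which one is right, but an independently stored checksum does. read_mirrored and repair are hypothetical helper names, and zlib.crc32 is again just a stand-in checksum.

```python
import zlib


def read_mirrored(copies, expected_crc):
    """Return (good_data, bad_indices) for a list of mirror copies.

    good_data is the first copy whose checksum matches expected_crc;
    bad_indices lists every copy that fails the check.
    """
    good = None
    bad = []
    for i, data in enumerate(copies):
        if zlib.crc32(data) == expected_crc:
            if good is None:
                good = data
        else:
            bad.append(i)
    if good is None:
        raise IOError("all mirrors fail the checksum")
    return good, bad


def repair(copies, expected_crc):
    """Overwrite every bad mirror with a verified good copy, in place."""
    good, bad = read_mirrored(copies, expected_crc)
    for i in bad:
        copies[i] = good
    return good
```

For example, with copies = [b"paxload", b"payload"] and the checksum of b"payload", read_mirrored reports index 0 as bad, and repair rewrites it from the good mirror.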

The DM layer should really have the ability to verify its data at that point, as btrfs does.

My real goal is to make COW fast enough that we can leave it on for the database applications too.

Yes, most of the complexity of nodatasum/nodatacow comes from those special workloads.

BTW, when Fujitsu tested the postgresql workload on btrfs, the results were quite interesting.

For HDD, when the number of clients is low, btrfs shows an obvious performance drop.
The problem seems to be mandatory metadata COW, which leads to superblock FUA updates.
When the number of clients grows, the difference between btrfs and other fses gets much smaller; the bottleneck is the HDD itself.

For SSD, when the number of clients is low, btrfs has almost the same performance as other fses, and nodatacow/nodatasum makes only a marginal difference.

But when the number of clients grows, btrfs falls far behind other fses.

The reason seems to be related to how postgresql commits its transactions: it always fsyncs its journal sequentially, without concurrency.
Btrfs, meanwhile, needs to wait for its data writes before updating its log tree, so most of its time is wasted waiting on data IO.
In that case, nodatacow does improve performance, by allowing btrfs to update its log tree without waiting for data IO.

But in both cases, CoW itself, such as allocating new extents or calculating csums, is not the main cause of the btrfs slowdown.

That is to say, nodatacow is not as important as we used to think.

If we can get rid of nodatacow/nodatasum, there will be far fewer things for us developers to consider, and fewer related bugs.


Thanks,
Qu

Obviously I haven't quite finished that one yet ;) But I'd rather keep the building block of all the other btrfs features in place than try to do crcs differently.
-chris
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

