> We use the crcs to catch storage gone wrong, [ ... ]

And that's an opportunistically feasible idea given that current
CPUs can do that in real time.

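
To make the "catch storage gone wrong" point concrete, here is a
minimal sketch of the idea, not of how Btrfs implements it. The
names write_block and read_block and the 4096-byte block size are
hypothetical, and zlib.crc32 is the plain CRC-32, used here as a
stand-in for the CRC-32C that Btrfs uses by default.

```python
import zlib

BLOCK_SIZE = 4096  # hypothetical block size, chosen only for this sketch


def write_block(data: bytes) -> tuple[bytes, int]:
    """Return the block together with the CRC computed at write time."""
    return data, zlib.crc32(data)


def read_block(data: bytes, stored_crc: int) -> bytes:
    """Return the block if it still matches its stored CRC, else raise."""
    if zlib.crc32(data) != stored_crc:
        raise IOError("checksum mismatch: block changed after it was written")
    return data


block, crc = write_block(b"\x00" * BLOCK_SIZE)
read_block(block, crc)  # an intact block passes verification

damaged = bytearray(block)
damaged[100] ^= 0x01  # flip a single bit, as misbehaving storage might
try:
    read_block(bytes(damaged), crc)
    detected = False
except IOError:
    detected = True
```

The check only detects the damage; recovering the data needs a
second good copy from somewhere, which is a separate matter.
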
> [ ... ] It's possible to protect against all three without COW,
> but all solutions have their own tradeoffs and this is the setup
> we chose. It's easy to trust and easy to debug and at scale that
> really helps.

Indeed all filesystem designs have pathological workloads, and
system administrators and application developers who are "more
prepared" know which one is best for which workload, or try to
figure it out.

> Some databases also crc, and all drives have correction bits
> of some kind. There's nothing wrong with crcs happening at lots
> of layers.

Well, there is: in theory checksumming should be end-to-end, that
is entirely application level, so that applications that don't need
it don't pay the price. But having it done at other layers can help
the very many applications that don't do it and should, it is
cheap, and it can help when troubleshooting exactly where the
problem is. It is an opportunistic thing to do.

> [ ... ] My real goal is to make COW fast enough that we can
> leave it on for the database applications too. Obviously I
> haven't quite finished that one yet ;) [ ... ]

And this worries me because it portends the usual "marketing" goal
of making Btrfs all things to all workloads, the "OpenStack of
filesystems", with little consideration for complexity,
maintainability, or even sometimes reality.

The reality is that all known storage media have hugely anisotropic
performance envelopes, as to functionality, cost, speed, and
reliability alike, and there is no way to have an automagic
filesystem that "just works" in all cases, despite the constant
demands for one by "less prepared" storage administrators and
application developers.

The reality is also that if one such filesystem could automagically
adapt to cover optimally the performance envelopes of every
possible device and workload, it would be so complex as to be
unmaintainable in practice.

So Btrfs, in its base "Rodeh" functionality, with COW, checksums,
subvolumes, snapshots, *on a single device*, works pretty well and
reliably, and it is already very useful for most workloads. Some
people also like some of its exotic complexities, like in-place
compression and defragmentation, but they come at a high cost.

For workloads that inflict lots of small random in-place updates on
storage, like tablespaces for DBMSes etc., perhaps simpler, less
featureful storage abstraction layers are more appropriate, from
OCFS2 to simple DM/LVM2 LVs, and Btrfs NOCOW approximates them
well.

BTW as to the specifics of DBMSes and filesystems, there is a
classic paper making eminently reasonable, practical suggestions
that have been ignored for only 35 years and some:

  %A M. R. Stonebraker
  %T Operating system support for database management
  %J CACM
  %V 24
  %D JUL 1981
  %P 412-418