> We use the crcs to catch storage gone wrong, [ ... ]

And that is feasible, opportunistically, given that current CPUs
can compute such checksums in real time.

> [ ... ] It's possible to protect against all three without COW,
> but all solutions have their own tradeoffs and this is the setup
> we chose. It's easy to trust and easy to debug and at scale that
> really helps.

Indeed all filesystem designs have pathological workloads, and
system administrators and application developers who are "more
prepared" know which design is best for which workload, or try to
figure it out.

> Some databases also crc, and all drives have correction bits
> of some kind. There's nothing wrong with crcs happening at lots
> of layers.

Well, there is: in theory checksumming should be end-to-end, that
is, done entirely at the application level, so that applications
that don't need it don't pay the price. But having it done at
other layers helps the very many applications that don't do it
and should, it is cheap, and it can help when troubleshooting
exactly where the problem is. It is an opportunistic thing to do.
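
To make "end-to-end" concrete, here is a minimal sketch of a check
done entirely at the application level. It is an illustration under
assumptions, not anything Btrfs or any DBMS actually does: the names
pack_record and unpack_record are hypothetical, and it uses the
plain CRC-32 from Python's zlib module rather than the CRC-32C that
filesystems tend to use.

```python
import struct
import zlib


def pack_record(payload: bytes) -> bytes:
    """Append a 4-byte big-endian CRC-32 of the payload (hypothetical record format)."""
    return payload + struct.pack(">I", zlib.crc32(payload))


def unpack_record(record: bytes) -> bytes:
    """Return the payload, or raise ValueError if the stored CRC does not match."""
    if len(record) < 4:
        raise ValueError("record too short to hold a checksum")
    payload = record[:-4]
    (stored,) = struct.unpack(">I", record[-4:])
    if zlib.crc32(payload) != stored:
        # Whatever layer damaged the data, the application notices here.
        raise ValueError("checksum mismatch")
    return payload
```

An application that wants the check pays for it on every record; one
that does not simply never calls these functions, which is the point
of doing it at this level.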

> [ ... ] My real goal is to make COW fast enough that we can
> leave it on for the database applications too.  Obviously I
> haven't quite finished that one yet ;) [ ... ]

And this worries me because it portends the usual "marketing" goal
of making Btrfs all things to all workloads, the "OpenStack of
filesystems", with little consideration for complexity,
maintainability, or even sometimes reality.

The reality is that all known storage media have hugely
anisotropic performance envelopes, as to functionality, cost,
speed, and reliability, and there is no way to have an automagic
filesystem that "just works" in all cases, despite the constant
demands for one by "less prepared" storage administrators and
application developers. The reality is also that if one such
filesystem could automagically adapt to cover optimally the
performance envelopes of every possible device and workload, it
would be so complex as to be unmaintainable in practice.

So Btrfs, in its base "Rodeh" functionality, with COW, checksums,
subvolumes, snapshots, *on a single device*, works pretty well and
reliably and it is already very useful, for most workloads. Some
people also like some of its exotic complexities like in-place
compression and defragmentation, but they come at a high cost.

For workloads that inflict lots of small random in-place updates
on storage, like tablespaces for DBMSes etc., perhaps simpler, less
featureful storage abstraction layers are more appropriate, from
OCFS2 to simple DM/LVM2 LVs, and Btrfs NOCOW approximates them
well.
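
As a rough illustration of why that workload is pathological for
COW, here is a toy model, not a description of the real Btrfs
allocator: a file starts as one contiguous extent, every COW update
relocates the touched block to the next free physical block, and an
in-place update leaves the mapping alone. The function names and
the append-only allocator are assumptions made for the sketch.

```python
import random


def count_extents(mapping):
    """Count runs of logically adjacent blocks that are also physically adjacent."""
    extents = 1
    for prev, cur in zip(mapping, mapping[1:]):
        if cur != prev + 1:
            extents += 1
    return extents


def simulate(n_blocks, n_updates, cow, seed=0):
    """Apply random single-block updates and return the resulting extent count."""
    rng = random.Random(seed)
    mapping = list(range(n_blocks))  # logical block i starts at physical block i
    next_free = n_blocks             # toy allocator: always append at the end
    for _ in range(n_updates):
        block = rng.randrange(n_blocks)
        if cow:
            mapping[block] = next_free  # COW: the new version goes elsewhere
            next_free += 1
        # in-place (NOCOW-like): the physical location does not change
    return count_extents(mapping)


if __name__ == "__main__":
    print("in-place extents:", simulate(1024, 200, cow=False))
    print("COW extents:     ", simulate(1024, 200, cow=True))
```

In this model the in-place file stays a single extent however many
updates it takes, while the COW file fragments with nearly every
update; real filesystems mitigate this in various ways, at a cost.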

BTW as to the specifics of DBMSes and filesystems, there is a
classic paper making eminently reasonable, practical suggestions
that have been ignored for only some 35 years:

  %A M. R. Stonebraker
  %T Operating system support for database management
  %J CACM
  %V 24
  %D JUL 1981
  %P 412-418