On 1/8/21 6:30 PM, Goffredo Baroncelli wrote:
On 1/8/21 2:05 AM, Zygo Blaxell wrote:
On Thu, May 28, 2020 at 08:34:47PM +0200, Goffredo Baroncelli wrote:

[...]

I've been testing these patches for a while now.  They enable an
interesting use case that can't otherwise be done safely, sanely or
cheaply with btrfs.

Thanks Zygo for this feedback. As usual, you are a source of very interesting
considerations.

Normally if we have an array of, say, 10 spinning disks, and we want to
implement a writeback cache layer with SSD, we would need 10 distinct SSD
devices to avoid reducing btrfs's ability to recover from drive failures.
The writeback cache will be modified on both reads and writes, data and
metadata, so we need high endurance SSDs if we want them to make it to
the end of their warranty.  The SSD firmware has to not have crippling
performance bugs while under heavy write load, which means we are now
restricted to an expensive subset of high endurance SSDs targeted at
the enterprise/NAS/video production markets...and we need 10 of them!

NVME has fairly draconian restrictions on drive count, and getting
anything close to 10 of them into a btrfs filesystem can be an expensive
challenge.  (I'm not counting solutions that use USB-to-NVME bridges
because those don't count as "sane" or "safe").

We can share the cache between disks, but not safely in writeback mode,
because a failure in one SSD could affect multiple logical btrfs disks.
Strictly speaking we can't do it safely in any cache mode, but at least
with a writethrough cache we can recover the btrfs by throwing the SSDs
away.
[...]

Hi Zygo,

could you elaborate on the last sentence? What I understood is that in
write-through mode the ordering (and the barriers) are preserved,
so this mode should be safe (bugs aside).

If this is true, it would be possible to have a btrfs multi-(spinning-)disk
setup with only one SSD acting as a cache. Of course, it would work only
in write-through mode, and the main benefit would be caching the data
for subsequent reads.
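
If this is the case, a toy model of why the write-through cache is
disposable could look like the following (hypothetical Python, only to
check my understanding; the names are mine and it has nothing to do with
the real dm-cache/bcache code):

  # Minimal model of a write-through cache: every write reaches the
  # backing (spinning) disk before completion, so the SSD never holds
  # the only copy of a block and can be thrown away at any time.

  class WriteThroughCache:
      def __init__(self, backing):
          self.backing = backing      # block number -> data (spinning disk)
          self.cache = {}             # SSD contents, purely an accelerator

      def write(self, blk, data):
          self.backing[blk] = data    # data hits the spinning disk...
          self.cache[blk] = data      # ...and the SSD copy is updated too

      def read(self, blk):
          # served from the SSD when possible, otherwise from the disk
          return self.cache.get(blk, self.backing.get(blk))

  disk = {}
  ssd = WriteThroughCache(disk)
  ssd.write(1, b"metadata")
  ssd.cache.clear()                   # "throw the SSD away"
  assert ssd.read(1) == b"metadata"   # the backing device is still complete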

Does anyone have further experience? Has anyone tried to
recover a BTRFS filesystem after the cache disk died?

Oh... wait... Now I understand: if the caching disk returns bad data (but
without reporting an error), the bad data may be written to the other
disks. In this case a single failure (the cache disk) may affect
all the other disks, and the redundancy is lost...
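
A toy model of this hazard, again only to check my understanding
(hypothetical Python, not the real implementation):

  # One shared SSD write-back cache in front of two btrfs member
  # devices that hold the RAID1 copies of the same block.

  diskA, diskB = {}, {}            # two spinning disks, mirrored copies
  dirty = {}                       # shared write-back cache: key -> (disk, data)

  def cached_write(disk, blk, data):
      # write-back: completion is reported before data reaches the disk
      dirty[(id(disk), blk)] = (disk, data)

  def writeback():
      for (_, blk), (disk, data) in dirty.items():
          disk[blk] = data
      dirty.clear()

  # btrfs writes both mirror copies, but through the same cache device
  cached_write(diskA, 1, b"good data")
  cached_write(diskB, 1, b"good data")

  # the SSD silently corrupts its contents before the write-back...
  for key, (disk, _) in dirty.items():
      dirty[key] = (disk, b"garbage")

  writeback()
  # ...and a single failing device has corrupted *both* copies
  assert diskA[1] == diskB[1] == b"garbage"

If I understand correctly, the btrfs checksums would still detect the bad
copies on read, but with both mirrors corrupted there is no good copy left
to repair from.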

BR
G.Baroncelli
