On 10/10/18 21:08, Ilya Dryomov wrote:
On Wed, Oct 10, 2018 at 8:48 PM Kjetil Joergensen <kje...@medallia.com> wrote:
Hi,

We tested bcache, dm-cache/lvmcache, and one more whose name eludes me with 
PCIe NVMe on top of large spinning rust drives behind a SAS3 expander - and 
decided this was not for us.

This was probably jewel with filestore, and our primary reason for trying to go down this 
path was that leveldb compaction was killing us, and putting omap/leveldb and things in 
separate locations was "so-so" supported (IIRC: some were explicitly 
supported, some you could do with a bit of symlink or mount trickery).

The caching worked - although, when we started testing power failure 
survivability (power cycle the entire rig, wait for recovery, repeat), we ended 
up with seriously corrupted XFS filesystems on top of the cached block 
device within a handful of power cycles. We did not test fully disabling the 
spinning rust on-device cache (which was the leading hypothesis for why this 
actually failed, potentially combined with the ordering of FLUSH+FUA ending up 
slightly funky given the rather asymmetric commit latency). Just to 
rule out anything else, we did run the same power-fail test regimen for days 
without the nvme-over-spinning-rust caching, without triggering the same 
filesystem corruption.

So yeah - I'd recommend looking at e.g. bluestore and sticking rocksdb, journal and 
anything else performance critical on faster storage instead.

If you do decide to go down the dm-cache/lvmcache/(other cache) road - I'd 
recommend thoroughly testing failure scenarios like e.g. power loss so you don't 
find out accidentally when you do have a multi-failure-domain outage. :)
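
For anyone wanting to script the power-loss check described above, here is a minimal sketch, not the tooling used in the tests mentioned in this thread. It writes fixed-size checksummed records to a file on the filesystem under test, acknowledges each one only after fsync returns, and after the power cycle checks that every acknowledged record is still intact. All names (make_record, write_records, verify_records, RECORD_SIZE) are made up for illustration, and it assumes a POSIX system for os.pwrite. The acknowledgements have to be recorded somewhere that survives the power cut, for instance the console of another host.

```python
import hashlib
import os
import struct

RECORD_SIZE = 4096
HEADER = struct.Struct("<Q32s")  # sequence number + SHA-256 of the payload


def make_record(seq):
    """Build one fixed-size, self-describing record for sequence number seq."""
    payload = seq.to_bytes(8, "little") * ((RECORD_SIZE - HEADER.size) // 8)
    return HEADER.pack(seq, hashlib.sha256(payload).digest()) + payload


def write_records(path, count, on_ack=print):
    """Write count records, calling on_ack(seq) only after fsync returns."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        for seq in range(count):
            os.pwrite(fd, make_record(seq), seq * RECORD_SIZE)
            os.fsync(fd)  # the record must be durable before it is acknowledged
            on_ack(seq)
    finally:
        os.close(fd)


def verify_records(path, acked):
    """Return the sequence numbers below acked that are missing or corrupt."""
    bad = []
    with open(path, "rb") as f:
        for seq in range(acked):
            f.seek(seq * RECORD_SIZE)
            record = f.read(RECORD_SIZE)
            if len(record) != RECORD_SIZE:
                bad.append(seq)  # file is shorter than what was acknowledged
                continue
            stored_seq, digest = HEADER.unpack_from(record)
            payload = record[HEADER.size:]
            if stored_seq != seq or hashlib.sha256(payload).digest() != digest:
                bad.append(seq)
    return bad
```

Run write_records until the power is cut, note the last sequence number that was acknowledged, and after recovery call verify_records(path, last_acked + 1). A non-empty result means acknowledged data was lost or damaged. This only covers file contents; filesystem metadata damage of the kind described above would still need a separate check such as a read-only xfs_repair -n pass.
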
Yeah, definitely do a lot of pulling disks and power cycle testing.
dm-cache had a data-corruption-on-power-loss bug in 4.9+:

    https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5b1fe7bec8a8d0cc547a22e7ddc2bd59acd67de4

Thanks,

                 Ilya

Thanks a lot for the feedback. So I deduce that this is not mainstream yet. I watched a presentation given by Sage last year on Bluestore; there was a slide on bcache/dm-cache in the Future section. Is this still on the table? Or maybe the push for all-flash, and possibly SPDK down the road, will make such caching less important?

/Maged
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com